January 21, 2019

Geek Speak: Kmeans Clustering with College Football Defenses

Editor's Note: Paul Chimenti is a marketing analyst and provides statistical analysis for Seldom Used Reserve. The tables below originated with analysis done by Paul. For more detailed information on methodology, data, assumptions, etc., please contact chimenti80@gmail.com.

The data below covers games from 2011 to 2013 played between “Big 5” teams only (including teams such as Louisville that will be in a Big 5 conference this season) and is an attempt to “cluster” defenses together using Kmeans clustering. If you're not familiar with Kmeans clustering, you may want to read this page before attempting to digest the data.
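To make the idea concrete, here is a minimal Python sketch of Kmeans (Lloyd's algorithm) applied to per-team defensive averages. It is not Paul's actual method: the feature set (yards allowed, points allowed and turnovers per game), the z-score scaling, the choice of k=2 and the team numbers are all made-up assumptions for illustration. Scaling matters because yards per game run in the hundreds while turnovers are single digits, so unscaled yards would dominate the distance calculation.

```python
import math
import random

# Made-up per-game defensive averages: (yards allowed, points allowed, turnovers).
# These are hypothetical teams and numbers, not the article's data.
TEAMS = {
    "Team A": (300.0, 18.0, 2.0),
    "Team B": (310.0, 19.0, 1.9),
    "Team C": (450.0, 35.0, 1.0),
    "Team D": (440.0, 33.0, 1.1),
}


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k random starting teams
    labels = None
    for _ in range(max_iters):
        # Assignment step: each team joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no team changed clusters, so the result has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member teams.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an emptied cluster's centroid where it was
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


names = list(TEAMS)
labels, centroids = kmeans(standardize(list(TEAMS.values())), k=2)
cluster_of = {name: lab + 1 for name, lab in zip(names, labels)}
print(cluster_of)
```

With these toy numbers the two stingy defenses land together and the two porous ones land together. Which group gets printed as cluster 1 depends only on the random starting centroids, which is why a cluster number says nothing about one group being better than another.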

First, let’s clarify what the data includes (and doesn’t include):

• Includes all games between two “Big 5” teams (ACC, SEC, Big 10, PAC 12 and Big 12), Notre Dame and teams (like Louisville) that will be in a Big 5 conference this season.
• For teams like Louisville (and Notre Dame), only games against Big 5 teams are included.
• Does not include games against FCS teams or games with teams outside of the Big 5; e.g., Clemson vs. Citadel and Clemson vs. Troy are not included.
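The inclusion rules above amount to a simple filter: keep a game only if it falls in the 2011-2013 window and both participants count as “Big 5” for this study. Below is a minimal Python sketch, assuming a hypothetical, partial team-to-conference lookup and made-up schedule rows; the article does not describe how the filtering was actually done.

```python
# Conferences that count as "Big 5" in the article.
BIG_5 = {"ACC", "SEC", "Big 10", "PAC 12", "Big 12"}

# Hypothetical, partial lookup. Louisville is listed under the ACC because the
# article counts teams that will be in a Big 5 conference this season.
CONFERENCE = {
    "Clemson": "ACC",
    "Florida State": "ACC",
    "Louisville": "ACC",
    "Alabama": "SEC",
    "Oregon": "PAC 12",
}
# Independents the article includes explicitly.
ALSO_INCLUDED = {"Notre Dame"}


def counts_as_big5(team):
    """True if the team is in a Big 5 conference or is an explicitly included independent."""
    return CONFERENCE.get(team) in BIG_5 or team in ALSO_INCLUDED


def in_scope(season, team1, team2):
    """Keep a game only if it is from 2011-2013 and both teams count as Big 5."""
    return 2011 <= season <= 2013 and counts_as_big5(team1) and counts_as_big5(team2)


# Made-up schedule rows: (season, team, opponent).
games = [
    (2013, "Clemson", "Florida State"),
    (2013, "Clemson", "Citadel"),     # FCS opponent, so excluded
    (2012, "Louisville", "Alabama"),  # hypothetical pairing for illustration
    (2010, "Clemson", "Alabama"),     # outside the 2011-2013 window
]
kept = [g for g in games if in_scope(*g)]
print(kept)
```

Any team missing from the lookup is treated as outside the Big 5, so a real run would need a complete lookup for every opponent.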

The data does not purport to tell you which defensive style (“cluster”) is better than another (Cluster 1 is not necessarily better than Cluster 2, just different); rather, it gives you an idea of which defenses have similar attributes.

I have to admit that I was surprised to see Michigan in Cluster 1, but allowing 348.1 yards per game is not a bad average these days, even if the Wolverines do play in the Big 10.

D Cluster 1 2013
Cluster 2 is where the Tigers reside, and that's probably about right over the last three seasons. I would classify these as “decent” but not top-tier defenses. An interesting side note here (at least for me) is that Arizona gave up 95.4 more yards per game than Clemson but only 1.8 more points. Perhaps turnovers were the key, as Arizona averaged half a turnover more per game than Clemson.

D Cluster 2 2013

Cluster 3 introduces us to some of the more problematic defenses and includes one that many see as “good”: Ohio State. Again, remember that this is a three-season window, not a look back at 2013 alone, and the clustering is not a referendum on “good” or “bad” but rather a grouping of similar defenses.

D Cluster 3 2013

Each team in Cluster 4 gave up at least 411.6 yards per game and a minimum of 29.9 points per game.  Ouch.
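A statement like “every team gave up at least X yards and Y points” is just the minimum of each stat taken within the cluster. A small Python sketch with made-up rows (hypothetical team names and numbers, not the article's table):

```python
def cluster_floors(rows):
    """Return {cluster: (lowest yards allowed, lowest points allowed)} across member teams."""
    floors = {}
    for _team, cluster, ypg, ppg in rows:
        low_y, low_p = floors.get(cluster, (float("inf"), float("inf")))
        floors[cluster] = (min(low_y, ypg), min(low_p, ppg))
    return floors


# Made-up rows: (team, cluster, yards allowed per game, points allowed per game).
rows = [
    ("Team W", 4, 425.0, 31.5),
    ("Team X", 4, 418.0, 33.0),
    ("Team Y", 4, 440.0, 30.5),
    ("Team Z", 1, 340.0, 20.0),
]
floors = cluster_floors(rows)
print(floors[4])  # (418.0, 30.5)
```

Note that the two floors can come from different teams (here Team X sets the yardage floor and Team Y the points floor), which is consistent with phrasing each minimum separately.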

It’s also notable that 4 ACC teams reside in this cluster.  And remember how there were 5 Big 12 teams (of 10 conference teams) in Cluster 1 on the offensive side?  Well, there are 5 Big 12 teams in Cluster 4 on defense.

D Cluster 4 2013


Geek Speak: Kmeans Clustering with College Football Offenses

Editor's Note: Paul Chimenti is a marketing analyst and provides statistical analysis for Seldom Used Reserve. The tables below originated with analysis done by Paul. For more detailed information on methodology, data, assumptions, etc., please contact chimenti80@gmail.com.

The data below covers games from 2011 to 2013 played between “Big 5” teams only (including teams such as Louisville that will be in a Big 5 conference this season) and is an attempt to “cluster” offenses together using Kmeans clustering. If you're not familiar with Kmeans clustering, you may want to read this page before attempting to digest the data.

First, let’s clarify what the data includes (and doesn’t include):

• Includes all games between two “Big 5” teams (ACC, SEC, Big 10, PAC 12 and Big 12), Notre Dame and teams (like Louisville) that will be in a Big 5 conference this season.
• For teams like Louisville (and Notre Dame), only games against Big 5 teams are included.
• Does not include games against FCS teams or games with teams outside of the Big 5; e.g., Clemson vs. Citadel and Clemson vs. Troy are not included.

The data does not purport to tell you which offensive style (“cluster”) is better than another (Cluster 1 is not necessarily better than Cluster 2, just different); rather, it gives you an idea of which offenses have similar attributes.

Most of these are givens – you won’t get much argument about Clemson, Oregon, Oklahoma State and Texas Tech being clustered together.

But what about Indiana? The data shows the Hoosiers performed worse than the other Cluster 1 teams in every category except turnovers.

However, the Hoosiers played fast, averaging 2.85 plays per minute of possession, which, by the way, was second in Cluster 1 only to Oregon's 2.86.
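Plays per minute of possession is a pace measure: offensive plays divided by time of possession in minutes. The article doesn't say whether Paul averaged per-game rates or divided multi-season totals; this Python sketch assumes totals, and the game rows are made up.

```python
def mmss_to_seconds(clock):
    """Convert a 'MM:SS' time-of-possession string to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def plays_per_minute(plays, possession_seconds):
    """Offensive plays per minute of possession (higher means a faster pace)."""
    if possession_seconds <= 0:
        raise ValueError("possession time must be positive")
    return plays / (possession_seconds / 60)


# Made-up per-game rows: (offensive plays, time of possession as "MM:SS").
games = [(72, "25:16"), (80, "27:30"), (68, "24:05")]
total_plays = sum(p for p, _ in games)
total_seconds = sum(mmss_to_seconds(t) for _, t in games)
pace = plays_per_minute(total_plays, total_seconds)
print(round(pace, 2))  # 2.86
```

Dividing totals weights games with more possession time more heavily than averaging each game's rate would, so the two approaches can differ slightly.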
O Cluster 1 2013
In other words, Indiana played fast but not efficiently, and in that sense it makes sense that the Hoosiers are included in Cluster 1.

Cluster 2 also makes sense to me, as it contains some very good but not fast-paced offenses like Florida State, Alabama, Georgia and Louisville.
O Cluster 2 2013

On the other hand, many would question Auburn's placement in Cluster 3, as I did. But remember, this is a three-year window, so the horrid Tiger offense of 2011 most likely offset the torrid Auburn offense of 2013.
O Cluster 3 2013

Cluster 1 confirms a couple of long-held assumptions of mine.

First, the Big 12 has been home to the “fastest” offenses in college football, with 5 of the 11 teams in the cluster coming from that league (and there are only 10 teams in the conference). Three more come from the PAC 12. That means 8 of the 11 teams in Cluster 1 come from conferences known for fast-paced offenses.

Second, Clemson is the lone ACC team in Cluster 1, which gives the Tigers an advantage over the rest of the ACC in general, Florida State's dominance notwithstanding.

That doesn't mean Clemson will win every game (offense is only half the battle), but unless the opponent has a stout defense, as Florida State does, the Tigers should have a decided advantage in most ACC games.