August 16, 2017

Yards Still Matter

If you spend any time on this site you know that one of my favorite college football metrics is also one of the most straightforward – total yards. With the exception of yards per play, and it’s close, you’d be hard-pressed to find a metric that correlates more strongly with winning than the total yardage differential between two teams. Why? Because yards = points.

Every time I hear someone say yards don’t matter – and provide an example of a game where a team with fewer yards won – my head wants to explode. Yes, there are exceptions to the rule. Just like there are exceptions to “You have to win the turnover battle” or “You have to run the ball to win”. For every example you provide of a team winning despite having fewer total yards, I can provide more where a team won the turnover battle and lost.

If you made it through freshman stats then the graph below should tell you a story.  I’ve plotted every college football game from 2011 through 2015 (D1 vs. D1 only, 3,580 games and 7,160 data points) with yards and points.  Notice the slope of the line that goes through the data points.  It’s pointing up, as in more yards means more points.

Points and Yards Graph 2011-2015

This is significant for Clemson because one of the narratives of the offseason is that the Tigers defense lost a lot of talent and will have to “rebuild”. Fair enough.

In the period covered, teams that reached the magic 500-yard mark won 79% of the time, without regard to any other metric. Clemson has reeled off 10 straight (and counting) 500-yard games. The mere fact that your offense gains 500 yards means you are highly likely to win.

But it gets better.

The Clemson defense gave up 313.0 yards per game in 2015. For argument’s sake, let’s say the Tigers’ 2016 defense regresses to “average”, which in NCAA terms in 2015 was 400 yards per game. That’s an additional 87 yards per game given up (a 27.8% increase).

In our mythical game Clemson gets 500 (or more) yards and gives up 400 yards (regressed to average).  What are the chances the Tigers win?  Over the same time period (2011-2015) teams with this profile won 94% of the time (671-43).

Yards+Points Diff
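
If you want to check a profile like this against your own game data, here is a minimal Python sketch. The game rows and the field names (`gained`, `allowed`, `won`) are made up for illustration; only the 671-43 record comes from the numbers above.

```python
def win_pct(wins, losses):
    """Winning percentage (0 to 1) from a win-loss record."""
    return wins / (wins + losses)


def profile_record(games, min_gained=500, max_allowed=400):
    """Count wins and losses for teams that gained at least `min_gained`
    yards and allowed at most `max_allowed` yards in the same game."""
    wins = losses = 0
    for g in games:
        if g["gained"] >= min_gained and g["allowed"] <= max_allowed:
            if g["won"]:
                wins += 1
            else:
                losses += 1
    return wins, losses


# Hypothetical team-game rows, NOT real results.
sample_games = [
    {"gained": 540, "allowed": 380, "won": True},
    {"gained": 510, "allowed": 395, "won": True},
    {"gained": 505, "allowed": 400, "won": False},
    {"gained": 620, "allowed": 450, "won": True},   # allowed too many yards
    {"gained": 430, "allowed": 300, "won": True},   # gained too few yards
]

print(profile_record(sample_games))       # (2, 1)
print(round(win_pct(671, 43) * 100, 1))   # 94.0
```

The last line simply confirms that a 671-43 record works out to the 94% quoted above.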

To recap, the Clemson offense is likely going to be so good that the Tigers can absorb a defensive regression to “average” and still have a high probability of winning.

Obviously, there are no guarantees and every game is an independent data point on a graph such as the one above. The Tigers may reach 400 yards in one game (reducing the odds of winning) and 600 (very high probability of winning) in the next.

The point is that a team likely to gain 500 yards in any given game while giving up an “average” amount in the same game is still likely to win, which is also why our early win probabilities have the Tigers favored in all 12 games.

As we saw in the championship game, 500 yards of offense a game is not a 100% guarantee of a win, but 500 yards of offense combined with holding your opponent to 400 or fewer is about as close as it gets.

Geek Speak: Why Every Yard Matters, The Relationship Between Yards & Points

One of the most important statistics in football is also one of the most basic – total yards. If you knew nothing else about two teams, be it records, point spread, who was favored and who was the underdog or any other in-game stat, but you knew who had more total yards, you would have a 75% chance of picking the winner.

Closer to home, Clemson went 9-1 in games where they outgained opponents and 1-2 when being outgained.

It’s usually at this point that football savants remind me of the teams who gained more yards than their opponents and lost as proof that I’m “wrong”.  Maybe they feel like I’m taking the physicality and strategy out of football by assigning a value to each yard gained (more on this below), but really the point of this is to reinforce the importance of each and every yard gained or lost, emphasizing the importance of the physical nature and accompanying strategic moves that are part of the game.

Many prefer black and white, yes or no and disdain odds and/or probabilities.  The only metric they are interested in is points.  Getting more than the other team guarantees a win 100% of the time. Nothing else matters.

Yet if a team starts at its own 10, drives 40 yards, punts and backs its opponent up inside the 10, the team gets 0 points for that drive – but those 40 yards have a value. Field position has been changed, and so have the odds of winning because of those 40 yards that yielded 0 points.

Every yard is important, at least while the game is in doubt.

Clemson scored on offense, defense, and special teams in 2014.  However, the vast majority of touchdowns (and therefore points) came on offense (89.6% of Clemson touchdowns came on offense) and involved gaining some amount of yards.  Sometimes it takes a lot of yards, sometimes just a few, but by and large you score by gaining yards.

The graph below plots points and yards of every game over the last four years (between 2 FBS teams). The slope should tell you all you need to know.

Yards & Points

Yes, there are outliers, but the picture tells a story in 3 words – yards equal points.

Better than that generic “yards equal points” phrase, we are able to determine exactly how many points a team can expect to score based on yards gained.  Even more intriguing than that is the close to perfect symmetry of the numbers below.  For almost every 11 yard increment one additional point can be expected.

Expected Points Per Total Offense

There are no exceptions, meaning there is no instance where gaining more yards means you should expect fewer points. It sounds amazingly obvious, but you’d be surprised at how many football fans believe total yards is an irrelevant metric.

Of course, the yards you gain are only part of the game and are therefore relative.  You can expect to score 36 points if you gain 490 yards, but if your defense gives up 510 yards you will most likely lose.

That doesn’t mitigate the overall point, which remains valid – yards are important because the more yards you gain the more points you are likely to score – without exception, statistically speaking – and that means the more likely you are to win.

Total yards is certainly not the only metric I use when determining win probabilities, but it’s an important one that I give significant weight in my calculation.
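
For anyone who wants to play with the “one point per 11 yards” rule of thumb, here is a rough Python sketch. It assumes a straight line anchored at the one pair quoted above (490 yards, 36 points); that straight-line shape is my simplification for illustration, not the actual table behind the chart.

```python
YARDS_PER_POINT = 11   # "one additional point per 11 yards", from the text
ANCHOR_YARDS = 490     # anchor pair quoted in the text:
ANCHOR_POINTS = 36     # 490 yards -> about 36 expected points


def expected_points(yards):
    """Straight-line approximation through the anchor pair (an assumption)."""
    return ANCHOR_POINTS + (yards - ANCHOR_YARDS) / YARDS_PER_POINT


for yards in (380, 490, 501, 600):
    print(yards, round(expected_points(yards), 1))
```

The further you move from the anchor, the less this line should be trusted, since the text only says the relationship holds for “almost every” 11-yard increment.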

 

The Difference Between Winners and Losers

Below are the metrics across the 686 games between FBS teams for the 2014 season to this point. Bowl games will be added once complete.

First, here’s a look at the averages for each metric.  A couple of things stand out – the plays are almost equal, with winners running a paltry two more plays per game.  It’s what they do with those plays (Yards/Play) that matters.  Secondly, many people say total yards don’t matter. They do.  More on this in a minute.

By Category 2014

Here’s a look at home vs. away and favorites vs. underdogs. A 56% winning percentage for home teams seems a bit low given that most Power 5 teams have 1 or 2 “gimme” games on their schedule. When you look at conference games this number is traditionally much closer to 50%. More important than playing at home is being the favorite. The old adage “the best team usually wins” is in large part true, assuming the best team is favored.

HomeAway 2014

These numbers show the % of teams that have the better number for each category. For instance, the winning team has more plays in 54.8% of the games, more total yards in 77.7% of the games (told you yards were important) and higher yards per play in 79.6% of the games. Yards per play and total yards are the two most important stats in my book. Secondly, we often hear about “winning” the turnover battle. Since 22.9% of the time the turnover battle is even it’s more important to not lose the turnover battle (win or be even) – 82.3% of winning teams are at least even on turnovers.

PCT 2014
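
Percentages like these can be tallied with a few lines of Python. This is only a sketch: the box scores and field names below are invented, and ties are counted separately so that “at least even” can be reported the way it is for turnovers above.

```python
def winner_edge_rates(games, stat, higher_is_better=True):
    """Return (better, even): the share of games in which the winner had the
    better number for `stat`, and the share in which the teams were tied."""
    better = even = 0
    for g in games:
        w_val, l_val = g["winner"][stat], g["loser"][stat]
        if w_val == l_val:
            even += 1
        elif (w_val > l_val) == higher_is_better:
            better += 1
    n = len(games)
    return better / n, even / n


# Hypothetical box scores, NOT real games.
games = [
    {"winner": {"yards": 480, "turnovers": 1}, "loser": {"yards": 350, "turnovers": 2}},
    {"winner": {"yards": 410, "turnovers": 1}, "loser": {"yards": 390, "turnovers": 1}},
    {"winner": {"yards": 300, "turnovers": 0}, "loser": {"yards": 420, "turnovers": 3}},
    {"winner": {"yards": 505, "turnovers": 2}, "loser": {"yards": 440, "turnovers": 1}},
]

yards_better, _ = winner_edge_rates(games, "yards")
to_better, to_even = winner_edge_rates(games, "turnovers", higher_is_better=False)
print(yards_better)          # 0.75
print(to_better + to_even)   # 0.75 -> winner at least even on turnovers
```

By the same arithmetic, the figures quoted above would put winners outright ahead on turnovers in roughly 82.3 - 22.9 = 59.4% of games.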

Fun with Point Spreads

Below are the straight up winning % and % against the spread for favored teams from 2011-2013, FBS vs. FBS only, including Bowl Games.

The straight up column contains 2,105 games – 11 “Pick ’em” games were excluded because there was no “favorite”.  The games that ended as a “Push” are not included in the “Cover % by Spread” table.

Going by spread, the biggest upset of the last 3 years was Louisiana-Monroe’s upset of Arkansas in week 2 of the 2012 season when the Warhawks were 30 point underdogs.

Biggest cover? Florida State beat Idaho 80-14 last season to cover a 59 point spread.

The most frequent spread over the last 3 years? No surprise – 3 points – as it showed up in 133 games (6.3%).

There are some strange anomalies in the data. For example, 11-point favorites win 64.5% of the time and cover only 48.4% of the time, but 11.5-point favorites are 18-0 and cover two-thirds of the time.

There’s also a weird little thing between 31.5 and 32.5 spreads where the teams are 23-0 straight up (expected) and 18-5 ATS (not so expected) which is very dissimilar to the spreads immediately preceding (3-7 ATS) and after (0-3).

 

Spread 1 thru 14

Spread 14.5 thru 28

Spread 28.5 thru 42

Spread 42.5 Plus
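
For readers less familiar with how “straight up” and “against the spread” (ATS) differ, here is a small Python sketch of how a favorite’s result is graded. The function name and the return labels are my own; the Florida State score and spread are the ones mentioned above, and the other two scores are hypothetical.

```python
def favorite_ats_result(fav_points, dog_points, spread):
    """Grade a favorite against the spread.

    `spread` is the number of points the favorite is giving (positive).
    Returns "cover", "push", or "miss".
    """
    margin = fav_points - dog_points
    if margin > spread:
        return "cover"
    if margin == spread:
        return "push"
    return "miss"   # includes outright losses by the favorite


print(favorite_ats_result(80, 14, 59))   # cover: won by 66, needed more than 59
print(favorite_ats_result(27, 24, 3))    # push: hypothetical score
print(favorite_ats_result(24, 21, 11))   # miss: hypothetical score, won but did not cover
```

The third case is why an 11-point favorite can win far more often than it covers.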

 

Geek Speak: Kmeans Clustering with College Football Defenses

Editor’s Note: Paul Chimenti is a marketing analyst and provides statistical analysis for Seldom Used Reserve. The tables below originated with analysis done by Paul. For more detailed information on methodology, data, assumptions, etc., please contact chimenti80@gmail.com.

The data below includes games from 2011-2013 and includes games between “Big 5” teams only (or those that will be this season such as Louisville) and is an attempt to “cluster” defenses together using Kmeans clustering. If you’re not familiar with Kmeans clustering you may want to read this page prior to attempting to digest the data.

First, let’s clarify what the data includes (and doesn’t include):

• Includes all games between two “Big 5” teams (ACC, SEC, Big 10, PAC 12 and Big 12), Notre Dame and teams (like Louisville) that will be in a Big 5 conference this season.
• For teams like Louisville (and Notre Dame) only games against Big 5 teams are included.
• Does not include games against FCS teams or games with teams outside of the Big 5 – e.g. Clemson vs. Citadel and Clemson vs. Troy are not included.

The data does not purport to tell you which defensive style (“cluster”) is better than another – Cluster 1 is not necessarily better than cluster 2, just different – but rather gives you an idea of defenses with similar attributes.
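
To give a feel for what Kmeans clustering does, here is a bare-bones Python sketch of the standard assign-and-average loop (Lloyd’s algorithm). It is not Paul’s actual analysis, which was done in a statistics package: the teams and numbers are hypothetical, only two metrics are used, and the starting centroids are picked by hand.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    move each centroid to the mean of its points, repeat until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)   # empty cluster: leave centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical defenses: (yards allowed per game, points allowed per game).
defenses = {
    "Team A": (310.0, 18.5),
    "Team B": (325.0, 20.1),
    "Team C": (405.0, 27.8),
    "Team D": (420.0, 30.2),
    "Team E": (365.0, 24.0),
    "Team F": (372.0, 25.1),
}
points = list(defenses.values())
start = [points[0], points[2], points[4]]   # arbitrary starting centroids

labels, centers = kmeans(points, start)
for team, label in zip(defenses, labels):
    print(team, "-> cluster", label + 1)
```

Note that the cluster numbers are just labels, which is the same point made above about Cluster 1 not being “better” than Cluster 2. With real data you would normally put the metrics on a common scale first, otherwise yards (the bigger numbers) dominate the distance.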

I’ll have to admit that I was surprised to see Michigan in Cluster 1, but 348.1 yards per game is not a bad average these days, even if the Wolverines do play in the Big 10.

D Cluster 1 2013
Cluster 2 is where the Tigers reside and that’s probably about right over the last 3 seasons. I would classify these as “decent”, but not top tier defenses. An interesting side note here (at least for me) is that Arizona gave up 95.4 more yards per game than Clemson, but only 1.8 more points. Perhaps turnovers were the key as the Sun Devils averaged half a turnover more per game than Clemson.

D Cluster 2 2013

Cluster 3 introduces us to some of the more problematic defenses and includes one that many see as “good” – Ohio State. Again, we have to remember this is a 3 season window, not a look back at 2013, and the clustering is not a referendum on “good” or “bad”, but rather a grouping of like defenses.

D Cluster 3 2013

Each team in Cluster 4 gave up at least 411.6 yards per game and a minimum of 29.9 points per game.  Ouch.

It’s also notable that 4 ACC teams reside in this cluster.  And remember how there were 5 Big 12 teams (of 10 conference teams) in Cluster 1 on the offensive side?  Well, there are 5 Big 12 teams in Cluster 4 on defense.

D Cluster 4 2013

 

Geek Speak: Kmeans Clustering with College Football Offenses

Editor’s Note: Paul Chimenti is a marketing analyst and provides statistical analysis for Seldom Used Reserve. The tables below originated with analysis done by Paul. For more detailed information on methodology, data, assumptions, etc., please contact chimenti80@gmail.com.

The data below includes games from 2011-2013 and includes games between “Big 5” teams only (or those that will be this season such as Louisville) and is an attempt to “cluster” offenses together using Kmeans clustering. If you’re not familiar with Kmeans clustering you may want to read this page prior to attempting to digest the data.

First, let’s clarify what the data includes (and doesn’t include):

• Includes all games between two “Big 5” teams (ACC, SEC, Big 10, PAC 12 and Big 12), Notre Dame and teams (like Louisville) that will be in a Big 5 conference this season.
• For teams like Louisville (and Notre Dame) only games against Big 5 teams are included.
• Does not include games against FCS teams or games with teams outside of the Big 5 – e.g. Clemson vs. Citadel and Clemson vs. Troy are not included.

The data does not purport to tell you which offensive style (“cluster”) is better than another – Cluster 1 is not necessarily better than cluster 2, just different – but rather gives you an idea of offenses with similar attributes.

Most of these are givens – you won’t get much argument about Clemson, Oregon, Oklahoma State and Texas Tech being clustered together.

But what about Indiana? The data shows they performed worse in every category than other cluster 1 teams except for turnovers.

However, the Hoosiers played fast, averaging 2.85 plays per minute of possession, which by the way was second only to Oregon’s 2.86 in cluster 1.
O Cluster 1 2013
In other words, Indiana played fast but not efficiently, and in that sense it makes sense they’re included in Cluster 1.

Cluster 2 also makes sense to me as it contains some very good, but not fast paced offenses like Florida State, Alabama, Georgia and Louisville.
O Cluster 2 2013

Conversely, many would question Auburn in the 3rd cluster, as I did. But remember, this is a 3 year window so the horrid Tiger offense of 2011 most likely offset the torrid Auburn offense of 2013.
O Cluster 3 2013

Cluster 1 confirms a couple of long held assumptions of mine.

First, the Big 12 has had the “fastest” offenses in college football, with 5 of the 11 teams in the cluster coming from that league (there are only 10 teams in the conference). 3 more come from the PAC 12. That means 8 of the 11 teams in Cluster 1 come from conferences that are known for fast-paced offenses.

Secondly, Clemson is the lone ACC team in cluster 1 and that gives the Tigers an advantage over the rest of the ACC in general, Florida State’s dominance notwithstanding.

That doesn’t mean Clemson will win every game (offense is only half the battle), but absent a stout defense (i.e. Florida State) it means the Tigers have a decided advantage in most ACC games.

Geek Speak – Yards Per Play (YPP) vs. Total Yards

Much as I did with the total yard metric, I plotted yards per play (YPP) for the 2,116 FBS vs FBS games for the last three years.

Not surprisingly, the curves and results are nearly the same. In fact, the YPP metric has a slight edge – 79.0% to 77.7%.

YPP Chart and Graph 2014

However, two things stand out to me:

  1. With the exception of the last range, in which both are at 100%, the total yard metric has a higher percentage than the YPP metric in each range. How then does the YPP metric have a higher overall percentage? There are many more games (328 to 169) over the last three ranges in the YPP metric, which weights those ranges much more heavily.
  2. While the percentages go progressively higher without exception in the total yards metric, the YPP metric actually dips (slightly) from the 3-3.49 range to the 3.50-3.99 range. The sample size is small, only an average of 26 games per year fit in this range, so it’s likely to be an anomaly and will work itself out over time.
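
The weighting effect in the first point is easy to reproduce. The Python sketch below uses made-up numbers with just two ranges (not the real tables) in which one metric has the higher winning percentage in every range yet the lower percentage overall, simply because the other metric has more of its games in the high-percentage range.

```python
def overall_pct(ranges):
    """Overall winning percentage from per-range (wins, games) pairs.
    Equivalent to averaging the range percentages weighted by games played."""
    wins = sum(w for w, _ in ranges)
    games = sum(g for _, g in ranges)
    return 100 * wins / games


# Made-up numbers chosen only to show the effect, NOT the real tables.
total_yards = [(540, 900), (95, 100)]    # 60% and 95% by range
ypp         = [(348, 600), (372, 400)]   # 58% and 93% by range

print(overall_pct(total_yards))   # 63.5
print(overall_pct(ypp))           # 72.0
```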

There’s not a lot of difference in these metrics in my mind and that was part of my point in the total yard post. YPP is a simple and easy calculation, but you could easily use a metric that doesn’t even require a calculation (total yards) and get similar results.

50,000 Foot View of College Football

The charts below tell the big picture story of college football from 2011-2013 and cover 2,116 games between two FBS teams.

Some things that I found within the data:

  1. Almost all categories for winners increased (far right column) over the 3 seasons.
  2. Losing teams had reduced numbers in most categories in 2013 compared to 2012.
  3. Turnovers have remained remarkably consistent for both winners – 1.3 per game across all 3 seasons – and losers (slight variation in 2013).
  4. Winning teams average more penalty yards than losers.
  5. While the losing teams yard per pass average has remained constant, the winning teams have increased their yard per pass metric 2.5% over the 3 seasons.
  6. Both have increased their yards per rush, but winners have increased at a higher rate.
  7. For winners, average rush yards increased by 9.2% and yards per rush by 5.1% from 2011 to 2013.
  8. Scoring is up for both winners (5.7%) and losers (2.9%).
  9. Both winners and losers have increased plays and total yards, but winners have increased at a higher rate than losers.
  10. As a whole, these numbers tend to lend credence to the theory that offenses are moving faster and have the upper hand (known as the Saban/Bielema Complex)

These numbers lay the foundation for an upcoming analysis by Paul Chimenti, who holds an MS in Mathematical Sciences with a Statistics Concentration. Paul is using a statistics package that will arrange offenses and defenses in “clusters” based on metrics from the 2011-2013 seasons.

Winning Teams

2011-2013 Winners

Losing Teams

Losers 2011-2013

Geek Speak: Total Yards Matter – 2014 Version

While I don’t believe total yardage is the “end-all, be-all of football” it’s pretty clear to me that total yards are an important stat in college football.

Besides the obvious – it generally takes yards to score points – I have some numbers that back up this theory.

There are many guys smarter than me that say total yards mean little, are an “overrated” or “simplistic” metric and spend many hours devising complicated formulas to prove why that is.

I’m not smart enough to understand all of the mathematics behind those theories, but my general operating theory is “the simpler the better”.

It’s difficult to find a simpler metric than total yards, and this seems to give those smarter than me fits.

Specifically, outgaining your opponent is important. The more the better. If you think about it, outgaining your opponent takes into account many factors that occur during the game. For example, if you turn the ball over consistently you are likely to gain fewer yards, score fewer points and win less often. Using the difference between the teams’ total yardage also means defense is factored into the equation.

So while gaining yards is important, this analysis looks at the difference in yardage between winners and losers. Another way to put it is, if Team A gains 600 yards and gives up 575 yards in game 1 and gains 125 yards and gives up 100 in game 2, Team A has the same odds of winning both games.

It’s not about the number of yards you gain, it’s about the difference between the number of yards you gain and the number of yards your opponent gains.

The charts and graphs below cover 2,116 games (6 games resulted in teams having exactly the same number of yards) between Division I teams from 2011 through 2013 and tell a simple story: Outgain your opponent and you will likely win. The more you outgain your opponent the higher your odds of winning.

Winning Pct by TYA Chart

Winning Pct by TYA Graph

A little further proof that yards matter? Teams with more yards than their opponents cover 64.6% of the time. And, as with the winning %, the higher the yardage differential the more likely a team is to cover, without exception.

 

Cover Pct by TYA Chart

Cover Pct by TYA Graph

Using the Pearson Coefficient I found a solid 0.606149 correlation between total yard differential and winning.
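
For the curious, a Pearson coefficient like this can be computed by hand in a few lines of Python. The sketch below assumes the result is coded as 1 for a win and 0 for a loss, which is my assumption about the setup, and the eight team-games are hypothetical, so the number it prints is not the 0.606149 above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical team-games: yardage differential and result (1 = win, 0 = loss).
yard_diff = [150, 80, 35, -20, 60, -45, -110, -170]
won       = [1,   1,  1,  1,   0,  0,   0,    0]

r = pearson(yard_diff, won)
print(round(r, 3))
```

A value near +1 would mean the team with more yards almost always wins, a value near 0 would mean yardage tells you nothing, and anything around 0.6 sits in “strong but not perfect” territory.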

How did Clemson fare using this metric in 2013? I’ve previously posted on why I wasn’t that worried as Clemson fell behind in the Orange Bowl vs. Ohio State. The Tigers were 9-1 (the loss was to South Carolina) when they outgained their opponent and 1-1 when being outgained (beat Georgia, lost to Florida State). Against the spread the Tigers were 6-5 when outgaining an opponent and 1-1 when being outgained.

No, total yards aren’t the end-all, be-all of football. But total yards, specifically when compared to your opponent’s total yards, matter and this simple metric can also increase the odds of picking the team that’ll not only win, but cover the spread, too.

It’s important not to confuse correlation with causation and I’m not saying having more total yards causes teams to win by itself. Other factors (turnovers, for example) can cause a team to have more (turnovers gained) or fewer (turnovers lost) total yards and win or lose the game.

I’m saying total yards is an important factor in determining winners and losers, more than many want to acknowledge.

 

Small Ball = Smaller Odds of Winning

Last May Clemson entered the top of the 9th inning with a 7-2 lead against third-ranked North Carolina in their second game of the ACC Tournament. Despite losing the opening game, the Tigers were in good shape to host a regional, with a 5-run lead and 3 outs standing between them and their 40th win against 18 losses.

To that point in the game the Tigers had managed to scratch out 7 runs on 9 hits, 8 of which were singles. Small ball was winning.

North Carolina also entered the 9th with 9 hits, but the Tar Heels only had 2 runs to show for their efforts. The Tiger pitching staff had held a power-laden (20th in home runs and 31st in slugging) North Carolina team to one extra base hit to this point.

A walk, 3 singles and a sacrifice fly brought home a couple of runs, but Clemson still led by 3 with two outs and two on. The odds still favored the Tigers, but the great equalizer was waiting in the wings. As Brian Holberton’s bat met Scott Firth’s pitch and the ball sailed over the fence to tie the game, all of the missed opportunities that plagued the Tar Heels that evening were erased.

Clemson went on to lose that game in 14 innings, was shut out the next day and relegated to traveling to Columbia for the second straight year for a regional.

For all the talk about the lack of offense in college baseball since the bat changes after the 2010 season, power and slugging still rule offensively, and teams that have those qualities have a much larger margin of error than small ball teams.

Much as I did with football last summer, I took a look at college baseball and again found some oft-repeated themes don’t meet the statistical test as “important” to winning and losing. I looked at 23 metrics for all 296 Division I teams for the 2013 season. Some findings were mundane, some surprising.

With the current state of the bats in college baseball who would have guessed that stolen bases and sacrifices are far and away the metrics that have the least correlation to scoring runs?

Small ball seems to be what the majority of college baseball teams have turned to in the “dead bat era”. It’s not unusual to see the 3-hole or cleanup hitter sacrifice. Get a guy on, steal or bunt him over and hope someone knocks him in. Play for one run at a time. The power game is gone. There’s no use in playing for the big inning. Hang around. Keep it close.

Except that’s not what the data shows gives you the best chance of winning.

On some level the statistics below are obvious: not surprisingly, on the offensive side runs correlate most highly with winning %.
Baseball Batting Metrics Pearson Wins

Having confirmed that runs lead to wins, we needed to determine how teams score runs by finding the correlation to runs for each of the other metrics. It’s also no surprise that hits is the metric that correlates most highly to runs.
Baseball Batting Metrics Pearson Runs

The surprise in the data is not at the top, but the bottom. Stolen bases and sacrifices are not only at the bottom, but they are not close to any of the other metrics in terms of correlation to runs.

With the aforementioned changes in the bat several years back it would seem logical to assume a higher correlation between runs, stolen bases and sacrifices, but that’s simply not so.

Sacrifices and stolen bases are what teams do to generate runs when they don’t hit as well or with as much power as other teams and not something better hitting teams do to score runs. Teams that score runs (and have better odds of winning) generally hit for power (slugging %, OPS).
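
In case the power metrics are unfamiliar, here is how slugging %, on-base % and OPS are calculated, as a short Python sketch. The formulas are the standard ones; the two season lines are hypothetical and were built to have the same number of hits so the difference in power stands out.

```python
def slugging(singles, doubles, triples, homers, at_bats):
    """Slugging % = total bases / at bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


def on_base(hits, walks, hbp, at_bats, sac_flies):
    """On-base % = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def ops(obp, slg):
    """OPS = on-base % plus slugging %."""
    return obp + slg


# Hypothetical team seasons: 570 hits in 2,000 at bats for both teams.
power_slg = slugging(singles=400, doubles=110, triples=15, homers=45, at_bats=2000)
small_slg = slugging(singles=480, doubles=70, triples=10, homers=10, at_bats=2000)
obp = on_base(hits=570, walks=200, hbp=40, at_bats=2000, sac_flies=20)

print(round(power_slg, 3), round(ops(obp, power_slg), 3))
print(round(small_slg, 3), round(ops(obp, small_slg), 3))
```

Same batting average, same on-base %, but the team with more extra base hits slugs well over 70 points higher.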

To some extent coaches play with the hands they are dealt, field and personnel-wise. But many small ball coaches recruit small ball type players and it becomes a self-fulfilling prophecy on some level.

The findings above are not earth shattering, but they do confirm that a lot of sacrifices and stolen bases and not a lot of power (slugging %, OPS) reduces your chances of scoring runs and leaves these teams a much smaller margin of error. Small ball type teams have to capitalize on a higher percentage of opportunities on offense and rely more on pitching and defense.

So while small ball is often cheered and celebrated as indicating a “well-coached” team, it also means the odds of scoring runs, and therefore winning games, are reduced.

Looking at the pitching and defensive metrics, it’s ERA that correlates most highly with wins which, again, is not surprising.
Baseball Pitching Metrics Pearson Wins

But what metric drives a team’s ERA the most? You might be mildly surprised (as I was) to find that it’s WHIP, which as the table below shows correlates extremely highly with ERA. It’s not splitting the atom to say the two are related, but I wonder how many of us were aware of just how closely these metrics are tied?
Baseball Pitching Metrics Pitching ERA

On the defensive side a surprise is how low on the list fielding is. I’ve been of the belief that fielding is one of the most important metrics in driving wins and apparently I was wrong. There’s a correlation, but it’s only moderate.

Hits per 9 innings is also important, but WHIP dominates: walks and hits are each low on the list individually, yet the combination of the two per inning is a powerful indicator of winning and losing.
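
As a reminder of what goes into these two pitching numbers, here is a tiny Python sketch using the standard definitions. The staff totals are hypothetical, and innings are assumed to be given as a plain decimal rather than the box score style of .1 and .2 for thirds of an inning.

```python
def whip(walks, hits, innings):
    """WHIP = (walks + hits) / innings pitched."""
    return (walks + hits) / innings


def era(earned_runs, innings):
    """ERA = earned runs allowed per 9 innings."""
    return 9 * earned_runs / innings


# Hypothetical staff totals over 500 innings.
print(whip(walks=180, hits=470, innings=500))   # 1.3
print(era(earned_runs=200, innings=500))        # 3.6
```

Since WHIP counts every walk and hit per inning, it captures nearly all of the baserunners a staff allows, which fits with how tightly it tracks ERA in the table.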

So what’s all this about? It doesn’t take a rocket scientist to understand that scoring runs is tougher for teams that have little power and rely on sacrifice bunts and stolen bases to score. A walk, stolen base and single is much less dangerous than a walk, single and 3 run homer. You generally need 3 of the former to equal 1 of the latter.

But the drum beat is nearly non-stop that small ball is the way to win in college baseball with the current bats. These numbers indicate that not only is that not true statistically, but relying on these tactics could in fact retard scoring (by definition you are “sacrificing” a chance for a hit and almost always giving the defense an out when attempting to advance a runner by bunting in most situations).

The truth is small ball wins some games and some teams and coaches are better at it than others, which is a benefit when playing another small ball team. But the odds are against teams using this approach against a team with more power, even during the dead bat era.

These numbers suggest that disadvantage is bigger than most of us realized.