UPDATE: I've posted a link to Excel files with spreadsheets I used in this analysis at the bottom of this page. There's a bonus file, which has the Pomeroy rankings of each ACC team. If you're a mathematical type of person and think that what I've done here is unsophisticated (at best), please see if you can do something better. Post it in the comments and I'll give it some love in a future blog post.
I’ll warn you right now: This might get a little math-y.
Today, I wrote a column in which I took a statistical look at two commonly held and related Duke basketball precepts:
1. Duke struggles down the stretch
2. These struggles are caused in part by Coach K’s habit of playing his top players too many minutes
I thought I'd use this space to fill in some of the math-heavy details that I glossed over in the paper-and-ink version of the column, and to show some of the graphs I made reference to in the column without printing.
I started my analysis with a scatterplot. Each ACC game since 2004 was assigned a point, where the x value was the ACC game number (i.e., 1 is the conference opener and 19 is the ACC championship game) and the y value was efficiency margin:
Efficiency margin = (Points per possession) – (points allowed per possession)
(For a full explanation of tempo-free, per-possession statistics, you can’t do better than this website, which is also where I got all of my data for the scatterplots. The CliffsNotes version is as follows: Tempo-free stats allow teams to be compared without regard for style of play; if a team scores more points per possession than its opponents, it will win. The larger the per-possession difference, the larger the margin of victory relative to the pace at which the game is played. Efficiency margin is a worthwhile stat for comparing teams because, all else being equal, a team will tend to increase its margin of victory as long as it is possible to do so. Thus, a "very good" team will beat a bad team by more points than a mere "good" team will.)
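To make the definition concrete, here's a minimal sketch of the efficiency-margin calculation for a single game. The box-score numbers are placeholders I made up, not actual Duke data:

```python
# Hypothetical box-score numbers for one game (placeholders, not real data).
points_scored = 78
points_allowed = 70
possessions = 68  # both teams get roughly the same number of possessions

off_eff = points_scored / possessions   # points per possession
def_eff = points_allowed / possessions  # points allowed per possession
efficiency_margin = off_eff - def_eff

print(round(efficiency_margin, 3))  # (78 - 70) / 68 ≈ 0.118
```

A margin above zero means the team outscored its opponent on a per-possession basis, regardless of how fast the game was played.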
I ran linear regression models on each season individually and on all points pooled together. Predictably, most individual seasons failed to produce statistically robust trends due to the limited sample size, though every season from 2004 to 2009 showed a downward trend. When I ran the regression on all points simultaneously, however, the downward trend was much clearer: the p-value for the (negative) slope was 0.0002, which is highly statistically significant, and the R-squared value was 0.12.
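For anyone who wants to try this sort of fit themselves, here's a minimal ordinary-least-squares sketch in pure Python. The (game, margin) pairs below are invented placeholders, not my actual ACC data:

```python
# Minimal ordinary-least-squares fit; the data points are placeholders.
def linfit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared: fraction of variance in y explained by the fitted line
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# x = ACC game number (1..19), y = efficiency margin (made-up values)
games = [1, 5, 9, 13, 17, 19]
margins = [0.20, 0.15, 0.12, 0.08, 0.05, 0.02]
slope, intercept, r2 = linfit(games, margins)
print(slope < 0)  # a negative slope means performance declines late in the season
```

Libraries like scipy (`scipy.stats.linregress`) will also hand you the p-value on the slope directly, which is what I reported above.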
The R-squared value (along with the highly significant p-value) intrigued me. It’s not a very high number, but it indicates that roughly 12% of the variation in Duke’s efficiency margin is explained by when in the season the game is played. One would expect this value to be somewhere around zero.
To make sure that this result was not just a function of increasingly competitive games as the season wore on (perhaps teams try harder as the postseason approaches?), I extended my analysis to cover two additional teams – UNC (for obvious reasons) and Michigan State (because the Spartans have a reputation for getting better as the season goes on). As you can see, both teams had essentially flat slopes for each season as well as for the six seasons collectively.
So I was reasonably convinced that Duke’s performance does decline over the course of the season. Determining whether this decline was related to the number of minutes played by Duke’s star players proved to be more difficult.
I made another scatterplot where each season got one point: The y-coordinate was the percentage of minutes played by Duke’s starters, and the x-coordinate was the slope of Duke’s decline. This graph was somewhat intractable to linear regression; however, the Blue Devils’ best seasons (those in which the decline over the course of ACC play was smallest) coincided with the seasons in which Duke’s starters played the most minutes. The R-squared value here was 0.075 – not that impressive.
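The y-coordinate for each season comes from a simple share calculation. Here's a sketch with placeholder minutes, not Duke's real rotation:

```python
# Hypothetical starters' minutes per game for one season (placeholders).
# A team plays 200 player-minutes per game: 5 players on the floor x 40 minutes.
starter_minutes_per_game = [36, 34, 33, 30, 28]  # the five starters

starter_share = sum(starter_minutes_per_game) / 200
print(starter_share)  # 161 / 200 = 0.805
```

One such (slope, starter share) pair per season gives you the six points on the scatterplot.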
Since the precept dealt more with playing a few guys too many minutes and less with playing too few guys, I then looked at seasons in which Duke used several players heavily. Since 2004, there have been three seasons in which three players each averaged more than 32 minutes per game and three seasons where this was not the case. In those three heavy-usage seasons, the slope of the decline was less steep. Likewise, in the three seasons where one player averaged over 35 minutes per game, the slope of the decline was shallower than in the three other seasons. Of course, the sample sizes here are exceedingly small, and I don’t think it’s really worth drawing any conclusions from these data.
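The season-grouping step is easy to express in code. The minutes-per-game figures below are placeholders for illustration, not actual Duke rosters:

```python
# Hypothetical minutes per game for each season's three most-used players
# (placeholder numbers, not actual Duke rosters).
seasons = {
    2004: [37.5, 33.1, 32.6],
    2005: [36.0, 31.8, 30.2],
    2006: [36.8, 33.9, 32.4],
    2007: [34.1, 31.5, 29.9],
    2008: [33.0, 32.5, 32.2],
    2009: [35.6, 30.1, 28.7],
}

# Seasons in which three players each topped 32 minutes per game
heavy_three = [yr for yr, mins in seasons.items()
               if sum(m > 32 for m in mins) >= 3]
# Seasons in which the most-used player averaged more than 35 minutes
heavy_one = [yr for yr, mins in seasons.items() if max(mins) > 35]

print(sorted(heavy_three))  # [2004, 2006, 2008]
print(sorted(heavy_one))    # [2004, 2005, 2006, 2009]
```

With the seasons split into groups like this, you can compare the average decline slope within each group, which is all the comparison above amounts to.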
So basically, I’ve proven to myself (and hopefully you) that the late-season decline is real, but I can’t demonstrate any causes.
Your guess is as good as mine.