I was going to flesh this idea out and refine it for a proper paper/poster for NESSIS, but since I have to be in a wedding that weekend (sigh), here are my current raw thoughts on Russell Westbrook. I figured it was best to get these ideas out now … before I become all consumed by The Finals.
I’ve been thinking a lot about Russell Westbrook and his historic triple-double season. Partially I’ve been thinking about how arbitrary the number 10 is, and how setting 10 to be a significant cutoff is similar to setting 0.05 as a p-value cutoff. But also I have been thinking about stat padding. It’s been pretty clear that Westbrook’s teammates would let him get rebounds, but there’s also been a bit of a debate about how he accrues assists. The idea being that once he gets to 10, he stops trying to get assists. Now this could mean that he passes less, or his teammates don’t shoot as much, or whatever. I’m not concerned with the mechanism, just the timing. For now.
I’ll be examining play-by-play data and box-score data from the NBA for the 2016-2017 season. This data is publicly available from http://www.nba.com. The play-by-play contains rich event data for each game. The box score records which players started the game and which players were on the court at the start of each quarter. Data, in CSV format, can be found here.
Let’s look at the time to assist for every assist Westbrook gets and see if it significantly changes for assists 1-10 vs 11+. I thought about looking at every assist by number and doing a survival analysis, but soon ran into problems with sparsity and granularity. Westbrook had games with up to 22 assists, so trying to look at them individually got cumbersome. Instead I decided to group assists as follows: 1-7, 8-10 and 11+. I reasoned that Westbrook’s accrual rate for the first several assists would follow one pattern, which would then increase as he approached 10, and then taper off for assists 11+.
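As a minimal sketch of the grouping described above, the function below maps the ordinal number of an assist within a game to one of the three groups. The function name and the string labels are illustrative assumptions, not taken from the original analysis code.

```python
def assist_group(assist_number: int) -> str:
    """Map the ordinal of an assist within a game to its analysis group.

    Hypothetical helper: the cut points (7 and 10) come from the text,
    the name and labels are made up for illustration.
    """
    if assist_number < 1:
        raise ValueError("assist numbers start at 1")
    if assist_number <= 7:
        return "1-7"
    if assist_number <= 10:
        return "8-10"
    return "11+"


# Label every assist in a hypothetical 12-assist game.
labels = [assist_group(n) for n in range(1, 13)]
```

Keeping the cut points in one place like this makes it easy to try a different grouping later.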
I freely admit that may not be the best strategy and am open to suggestions.
I also split out which games I would examine into 3 groups: all games, games where he got at least 11 assists, and games where he got between 11 and 17 assists. This was to try to account for right censoring from the end of the game. In other words, when we look at all games, we include games where he only got, say, 7 assists, and therefore we cannot hope to observe the difference in time to assist 8 vs assist 12. Choosing to cut at 17 assists was arbitrary and I am open to changing it to fewer or more.
Our main metric of interest is the time between assists, i.e. how many seconds of player time (so time when Westbrook is on the floor) occur between assists.
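One way to compute this metric is sketched below. It assumes we already have, for one game, the game-clock second of each assist and a sorted list of on-court stints as (start, end) pairs in game seconds. Both inputs and both function names are hypothetical; deriving the stints from the play-by-play and box score is a separate step not shown here.

```python
def player_seconds(game_second, stints):
    """On-court seconds the player has accumulated by `game_second`.

    `stints` is an assumed input: sorted, non-overlapping (start, end)
    pairs in game seconds during which the player was on the floor.
    """
    total = 0.0
    for start, end in stints:
        if game_second <= start:
            break  # later stints have not begun yet
        total += min(game_second, end) - start
    return total


def times_between_assists(assist_seconds, stints):
    """Seconds of player time between consecutive assists.

    The first gap is measured from the player's first second on the floor.
    """
    gaps = []
    previous = 0.0
    for t in sorted(assist_seconds):
        clock = player_seconds(t, stints)
        gaps.append(clock - previous)
        previous = clock
    return gaps


# Made-up example: on the floor for 0-600s and 900-1500s of game time,
# with assists at game seconds 100, 500 and 1000.
example_gaps = times_between_assists([100, 500, 1000], [(0, 600), (900, 1500)])
```

In the made-up example the third gap counts only on-court time: the 300 seconds on the bench between the two stints are skipped.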
First, let us take a look at some basic statistics, where we examine the mean, median, and standard deviation for the time to assist broken down by group and by the different sets of games. Again, this is in seconds of player time.
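A summary like this can be produced with nothing more than the standard library. The sketch below assumes the data has already been reduced to (group label, seconds between assists) pairs; the function name and the input shape are assumptions for illustration, and the numbers in the example are made up.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def summarize_by_group(rows):
    """Mean, median and standard deviation of time to assist per group.

    `rows` is an assumed input: an iterable of
    (group_label, seconds_between_assists) pairs.
    """
    grouped = defaultdict(list)
    for group, seconds in rows:
        grouped[group].append(seconds)

    summary = {}
    for group, values in grouped.items():
        summary[group] = {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            # The sample standard deviation needs at least two observations.
            "sd": stdev(values) if len(values) > 1 else float("nan"),
        }
    return summary


# Made-up numbers, purely to show the shape of the output.
demo = summarize_by_group([("1-7", 100), ("1-7", 200), ("11+", 300)])
```

Running the same function once per set of games (all games, 11+ assists, 11-17 assists) would give the three sets of statistics discussed here.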
We can see that if we look at all games, it appears that the time between assists goes down on average once Westbrook gets past 10 assists. However this sample of games includes games where he got upwards of 22 assists, which, given the finite length of games, means assists would tend to happen more frequently. Limiting ourselves to games with at least 11 assists, or games with 11-17 assists gives a view of a more typical game with many assists. We see in (1b) and (1c) that time to assist increases on average once Westbrook got his 10th assist.
However, these basic statistics only account for assists that Westbrook actually achieved; they do not account for any right censoring. That is, say Westbrook gets 9 assists in a game in the first half alone, and doesn’t record another assist all game despite playing, say, 20 minutes in the second half. If the game were to go on indefinitely, Westbrook eventually would record that 10th assist, say after 22 minutes. But since we never observe that hypothetical 10th assist, that contribution of 22 minutes isn’t included. Nor is even the 20 minutes of assist-less play. This basic censoring problem is why we use survival models.
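To make the censoring idea concrete, here is a rough sketch of how one game could be turned into survival-style records. Each row is (assist number, seconds of player time, observed?), and the time left on the floor after the last assist becomes a final censored row. The function name and the row layout are assumptions, not the structure used in the original analysis.

```python
def assist_intervals(assist_clock, total_player_seconds):
    """Build (assist_number, duration, observed) rows for one game.

    `assist_clock` is an assumed input: the player-time second at which
    each assist happened, sorted. `total_player_seconds` is the player's
    total time on the floor in that game.
    """
    rows = []
    previous = 0.0
    for number, t in enumerate(assist_clock, start=1):
        rows.append((number, t - previous, True))  # assist actually observed
        previous = t

    # Time played after the last assist: the next assist never arrived,
    # so we only know its waiting time is at least this long (right censored).
    remaining = total_player_seconds - previous
    if remaining > 0:
        rows.append((len(assist_clock) + 1, remaining, False))
    return rows


# Made-up game: assists at 100s and 300s of player time, 1500s played in total.
records = assist_intervals([100, 300], 1500)
```

The last row of the made-up game carries 1200 seconds of assist-less play that a plain mean or median over observed assists would silently drop.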
Next we can plot Kaplan Meier survival curves for Westbrook’s assists broken down by group and by the different sets of games. I used similar curves when looking at how players accrue personal fouls – and I’ll borrow my language from there:
A survival curve, in general, is used to map the length of time that elapses before an event occurs. Here, they give the probability that a player has “survived” to a certain time without recording an assist (grouped as explained above). These curves are useful for understanding how a player accrues assists while accounting for the total length of time during which a player is followed, and allows us to compare how different assists are accrued.
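For readers who want to see what goes into such a curve, below is a bare-bones sketch of the Kaplan-Meier (product-limit) estimator in plain Python. It is not the code used for the plots here; the function name and input layout are assumptions, and in practice a survival analysis library would be the sensible choice.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate at each distinct event time.

    `durations` are waiting times in seconds; `observed` flags whether the
    assist happened (True) or the interval was right censored (False).
    Returns a list of (time, survival_probability) pairs.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Collect everything that happens at time t (events and censorings).
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events:
            # Product-limit step: multiply by the share that "survives" time t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Made-up data: four waiting times, one of them censored at 200 seconds.
demo_curve = kaplan_meier([100, 200, 200, 300], [True, True, False, True])
```

Censored intervals never cause a drop in the curve, but they do stay in the at-risk count until their censoring time, which is how the estimator uses the partial information they carry.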
Here it is very easy to see that the time between assists increases significantly once Westbrook has 10 assists. This difference is apparent regardless of which subset of games we look at, though the increase is more pronounced when we ignore games with fewer than 11 assists. We can also see that the time between assists doesn’t differ significantly between the first 7 assists and assists 8 through 10.
Finally, we could put the data into a conditional risk set model for ordered events. I’m not sure this is the best model to use for this data structure, given that I grouped the assists, but it will do for now. I recommend not looking at the actual numbers and just noticing that yes, there is a significant difference between the baseline and the group of 11+ assists.
If interested, we can find the hazard ratios associated with each assist group. To do so we exponentiate the coefficients, since each coefficient is the log hazard ratio relative to the baseline group of the 1st through 7th assists. For example, looking at the final column, we see that, in games where Westbrook had between 11 and 17 assists, he was 63% less likely to record an assist greater than 10 versus how likely he was to record one of his first 7 assists (the baseline group). Interpreting coefficients is very annoying at times. The takeaway here is yes, there is a statistically significant difference.
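The exponentiation step is small enough to sketch directly. In the snippet below the coefficient is back-calculated from the 63% figure quoted above, on the assumption that "63% less likely" corresponds to a hazard ratio of 0.37; it is not copied from the model output, and the function names are made up for illustration.

```python
import math


def hazard_ratio(coefficient):
    """Convert a log hazard ratio (a model coefficient) to a hazard ratio."""
    return math.exp(coefficient)


def percent_change_in_hazard(coefficient):
    """Percent change in the hazard relative to the baseline group."""
    return (hazard_ratio(coefficient) - 1) * 100


# ASSUMPTION: back-calculated from "63% less likely", i.e. a hazard ratio
# of 0.37. This is not a number taken from the fitted model.
coefficient_11_plus = math.log(0.37)

ratio = hazard_ratio(coefficient_11_plus)
change = percent_change_in_hazard(coefficient_11_plus)
```

A negative coefficient gives a hazard ratio below 1 (assists arrive more slowly than in the baseline group), and a positive one gives a ratio above 1.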
Based on some simple analysis, it appears that the time between Russell Westbrook’s assists increased once he reached 10 assists. This may contribute to the narrative that he stopped trying to get assists after he reached 10. Perhaps this is because he stopped passing, or perhaps it’s because his teammates just shot less effectively on would-be-assisted shots after 10. Additionally, there are many other factors that could contribute to the increase in time between assists. Perhaps there is general game fatigue, and assist rates drop off for all players. Maybe those games were particularly close in score and therefore Westbrook chose to take jump shots himself or drive to the basket.
What’s great is that a lot of these ideas can be explored using the data. We could look at play-by-play data and see if Russ was passing at the same rates before and after assist number 10. We could test if assist rates decline overall in the NBA as games progress. I’m not sure which potential confounding explanations are worth running down at the moment. Please, please, please, let me know in the comments, via email, or on Twitter if you have any suggestions or ideas.
REMINDER: The above analysis is something I threw together in the days between my graduation celebrations and The Finals starting and isn’t as robust or detailed as I might like. Take with a handful of salt.