NBA Fouls – Data, basic stats and visualizations

Part 2 of my series on DeMarcus Cousins and how NBA players accrue personal fouls.
Part 1 can be found here

I’ll be pulling edited sections from the paper I wrote with Udam Saini for the 2017 Sloan Sports Analytics Conference research paper competition. A full, finalized version of the paper will be available at a later date.

The goal of this project is to examine how NBA players accrue fouls and if it is possible to mitigate their foul tendencies through simple coaching decisions. Let’s start with getting some data and looking at basic foul rates.

Data

We examine play-by-play data and box-score data from the NBA for the 2011-2012, 2012-2013, 2013-2014, 2014-2015, and 2015-2016 seasons. This data is publicly available from http://www.nba.com. The play-by-play contains rich event data for each game. The box score includes data on which players started the game and which players were on the court at the start of each quarter. The data, in CSV format, can be found here.

Using the box-score data and the substitutions in the play-by-play for each game, we can determine how much time any given player has actively played in the current game at each event in the play-by-play data. We look only at active player time, rather than game time, to accurately determine how often a player commits a foul. Throughout, any discussion of time refers to actual playing time; that is, individual person time for each player. Using player playing time should control for substitution patterns, as a player in foul trouble will likely not play until later in the game. If we used game time, it would artificially increase the time between fouls. Additionally, censoring times were generated for each player in each game. For example, if a player committed only 3 fouls in a game, an entry was generated for his 4th foul, with foul time equal to the max player time and an indicator that the foul did not occur. This is important, as we need to account for censored fouls in our analysis.
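To make the bookkeeping concrete, here is a minimal Python sketch of the two steps described above: converting game time into personal playing time, and generating a censored entry for the next foul that never happened. This is not the code used for the paper. The function names, the record layout, and the representation of a player's time on court as (start, end) stints in elapsed game seconds are all assumptions made for illustration, as is the choice to store the gap since the previous foul alongside the cumulative time.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FoulRecord:
    foul_number: int    # 1 through 6
    player_time: float  # personal playing time (seconds) at the foul, or at censoring
    gap: float          # playing time (seconds) since the previous foul, or since the player's first second on court
    observed: bool      # True if the foul occurred, False if it is censored


def player_time_at(event_time: float, stints: List[Tuple[float, float]]) -> float:
    """Seconds the player has been on court up to event_time.

    stints is a hypothetical list of (start, end) intervals in elapsed game seconds.
    """
    return sum(max(0.0, min(end, event_time) - start) for start, end in stints)


def foul_records(foul_times: List[float], total_played: float) -> List[FoulRecord]:
    """One record per observed foul, plus one censored record for the next foul.

    foul_times: personal playing time (seconds) at each foul, in order.
    total_played: the player's total playing time (seconds) in the game.
    """
    records = []
    previous = 0.0
    for number, t in enumerate(foul_times, start=1):
        records.append(FoulRecord(number, t, t - previous, True))
        previous = t
    # A player who fouls out (6 fouls) has no next foul to censor.
    if len(foul_times) < 6:
        records.append(
            FoulRecord(len(foul_times) + 1, total_played, total_played - previous, False)
        )
    return records


# Hypothetical game: on court for 0-400s and 1000-1500s of game time.
stints = [(0.0, 400.0), (1000.0, 1500.0)]
time_played = player_time_at(1200.0, stints)  # 400 + 200 = 600 seconds

# Hypothetical game: 3 fouls in 2000 seconds of playing time.
records = foul_records([300.0, 900.0, 1500.0], 2000.0)
```

In the second example the last record is the generated entry for the 4th foul: its time is the full 2000 seconds played and its indicator marks the foul as not having occurred.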

For now, let us consider only centers in our analysis, to minimize the effects of differing fouling patterns between NBA positions. Overall, we will further limit ourselves to Al Horford, Andrew Bogut, Brook Lopez, DeMarcus Cousins, Dwight Howard, Marc Gasol, Robin Lopez, and Tyson Chandler.
In our analysis, we will focus on DeMarcus Cousins, Al Horford, and Robin Lopez as these three centers exhibit three distinct trends that we see in other centers that we analyzed. All centers considered share many of the same characteristics in our analysis as well.

Summary Statistics

Even simple analysis and statistics can give us some insight into how NBA players accrue up to 6 personal fouls over the course of a game. Table 1 gives a few summary statistics. Table 1a gives basic statistics for most of DeMarcus Cousins’ fouls from the 2011-2012 through 2015-2016 seasons. We can see that on average, Cousins commits his 1st personal foul after about 500 seconds (or about 8 minutes and 20 seconds) of personal playing time. By contrast, he commits his 4th foul about 300 seconds (or about 5 minutes) of personal playing time after committing his 3rd foul. Table 1b gives the same statistics for Al Horford, where we see that his 1st foul comes after an average of about 823 seconds, while he commits his 4th foul an average of about 311 seconds after his 3rd. From these numbers, it might appear that Horford is more “tilted,” given that his time between fouls shrinks more than Cousins’ does.

Table 1: Summary statistics for Cousins, Horford, Lopez and all centers pooled. Gives the average time to each foul by number, the number of games with exactly that many fouls, and the number of games with at least that many fouls. Cousins had 11 games with only 1 foul, but 115 with 5 or more.

However, the tables also show that Horford had 80 games in which he recorded only a single foul and only 37 games in which he recorded 4 or more fouls. By contrast, Cousins had only 11 games with a single foul and 193 games with 4 or more. Because games often end before a player commits all six fouls, many of the foul times are right-censored by the end of the game. These censored foul times are not included in simple summary statistics, so merely examining the average time to foul does not accurately reflect all the differences between players or how those players individually accrue fouls.
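The two count columns in Table 1 (games with exactly that many fouls, and games with at least that many) are simple to compute from a list of per-game foul totals. A small Python sketch follows; the function name and the input list are hypothetical and are not drawn from the actual data.

```python
from collections import Counter
from typing import Dict, List, Tuple


def foul_count_table(fouls_per_game: List[int]) -> Dict[int, Tuple[int, int]]:
    """Map each foul number k (1-6) to (games with exactly k fouls, games with at least k fouls)."""
    exactly = Counter(fouls_per_game)
    table = {}
    for k in range(1, 7):
        at_least = sum(n for fouls, n in exactly.items() if fouls >= k)
        table[k] = (exactly.get(k, 0), at_least)
    return table


# Hypothetical foul totals for six games.
table = foul_count_table([1, 3, 3, 5, 6, 0])
```

A game counted under "at least k" but with exactly k - 1 fouls is where a censored entry for foul k comes from, which is why the averages alone can mislead.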

Visualization – Survival Curves

Next, we visualize foul rates for each player by using Kaplan-Meier survival curves. A survival curve, in general, is used to map the length of time that elapses before an event occurs. Here, the curves give the probability that a player has “survived” to a certain time without committing a particular numbered foul. These curves are useful for understanding how a player accrues fouls while accounting for the total length of time during which a player is followed, and they allow us to compare how the different fouls are accrued.
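For readers who have not seen the estimator before, below is a minimal pure-Python sketch of the standard Kaplan-Meier (product-limit) calculation. It is meant only to show the mechanics, not to reproduce the code behind the figures. At each time where at least one foul is observed, the running survival probability is multiplied by one minus the fraction of at-risk games in which the foul occurred at that time. Censored games stay in the at-risk count until their censoring time and then drop out. The function name and the sample numbers are made up for illustration.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with at least one observed event.

    durations: time to the foul, or to censoring, for each game.
    observed: True if the foul occurred, False if the game ended first.
    """
    event_times = sorted({t for t, o in zip(durations, observed) if o})
    curve = []
    survival = 1.0
    for t in event_times:
        # Games still being followed just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Fouls actually observed at time t.
        events = sum(1 for d, o in zip(durations, observed) if o and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical times (seconds) to a given foul in five games; two are censored.
durations = [100.0, 200.0, 200.0, 300.0, 400.0]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

With these numbers the curve steps down at 100, 200, and 300 seconds to roughly 0.8, 0.6, and 0.3. Treating the two censored games as if the foul had occurred would push the curve down faster, and dropping them would discard the information that those players went that long without fouling.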

Figure 1: Survival curves for Cousins, Horford and Lopez. Displays the probability that a player has “survived” to a certain time without committing a particular foul by number. These curves include fouls that are censored by the game ending, which obscures some of the patterns.

Figure 1a gives the overall survival curves for Cousins. From the graph, there appears to be some evidence that his time to foul decreases as he accrues fouls, because there is layering between the fouls. While the trend may seem small, it is much starker than that for other centers, as we can see in Figure 1b for Al Horford and Figure 1c for Robin Lopez. Their curves appear much more random. The survival curve for Al Horford’s 6th foul seems abnormal, as does Robin Lopez’s to a lesser extent. This abnormality is likely explained by the small sample sizes for 6 fouls, as seen in Table 1.

I’d like to note here that it is important to use survival curves in this scenario, as they account for censoring. If we were to just look at the densities of fouls for a given player, we might falsely see a very different trend. Figure 2 shows raw foul densities for DeMarcus Cousins, and there is a clear ordering for the fouls.

Figure 2: Foul densities for DeMarcus Cousins. From this graph it would appear that there is a big difference in foul times. However, these graphs do not take censoring into account.

As mentioned above, if games were infinitely long, and players continued to play, we would observe every player until he committed his 6th personal foul and was removed from the game. As games are of finite length, many fouls are censored due to the end of follow-up time. Therefore, it makes sense that the 5th and 6th fouls would be subject to sampling bias. For example, if the 5th foul is committed with 4 minutes left in the game, we will never observe a 6th foul that comes 5 minutes later. To help adjust for this censoring, we considered limiting analysis to only games where all 6 fouls were committed. However, this limitation severely restricts the sample sizes for all players. Instead, we will examine games for each player where he committed a minimum of five fouls and limit our analysis to the first four fouls. This foul restriction gives us a larger sample size, though it keeps us from gaining understanding about how players accrue their 5th and 6th fouls.
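The restriction itself is a simple filter. Here is a small Python sketch, assuming a hypothetical mapping from game ID to the ordered list of times between fouls for one player. The function name, parameters, and data layout are illustrative only.

```python
from typing import Dict, List


def first_fouls_from_high_foul_games(
    games: Dict[str, List[float]], min_fouls: int = 5, keep: int = 4
) -> Dict[str, List[float]]:
    """Keep games with at least min_fouls fouls, and only the first `keep` foul times from each.

    games: hypothetical mapping of game ID -> ordered times between fouls (seconds).
    """
    return {
        game_id: gaps[:keep]
        for game_id, gaps in games.items()
        if len(gaps) >= min_fouls
    }


# Hypothetical games: only "g2" and "g3" have five or more fouls.
games = {
    "g1": [500.0, 400.0, 350.0],
    "g2": [450.0, 300.0, 280.0, 200.0, 150.0],
    "g3": [600.0, 320.0, 310.0, 250.0, 100.0, 90.0],
}
restricted = first_fouls_from_high_foul_games(games)
```

Because every retained game has at least five fouls, each of the first four fouls in it was actually observed, so none of the retained foul times is censored.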

Figure 3: Survival curves for the first 4 fouls for Cousins, Horford, and Lopez when the player commits a minimum of 5 fouls. Cousins has a clear ordering to how he commits these fouls given that his time to foul decreases as he accrues fouls. Horford does not display this trend.

Figures 3a, 3b, and 3c show the survival curves for Cousins, Horford, and Lopez in games with a minimum of 5 fouls. Cousins displays much clearer ordering: the more fouls he has accrued, the sooner he tends to commit the next one. However, Horford and Lopez show much less distinction between the fouls. Lopez shows some ordering, especially for his 4th foul, but Horford’s curves are fairly random.

Of course, we are not controlling for nearly enough variables, and the sample size is sadly limited. Areas for further research will be discussed in full later. For now, however, we have a nice way to model and visualize fouls so we can understand them better moving forward.