This is Part 2 of my series on Catch & Shoot jumpers
Part 1 can be found here
Last time, we ended by looking at a basic logistic regression predicting success of a shot, conditioned on whether a shot was: a catch & shoot (C&S), a three point attempt, and open. This time we will start considering effective field goal percentage (EFG%), which gives an additional bonus to three point shots.
For anybody unaware of the difference between FG% and EFG%, here is the brief but informative definition from Basketball Reference:
“Effective Field Goal Percentage; the formula is (FG + 0.5 * 3P) / FGA. This statistic adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal. For example, suppose Player A goes 4 for 10 with 2 threes, while Player B goes 5 for 10 with 0 threes. Each player would have 10 points from field goals, and thus would have the same effective field goal percentage (50%).”
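As a quick sanity check on the quoted formula, here is a minimal Python sketch that reproduces the Player A / Player B example. The function and variable names are my own and purely illustrative.

```python
def fg_pct(fg, fga):
    """Plain field goal percentage: makes over attempts."""
    return fg / fga


def efg_pct(fg, three_p, fga):
    """Effective field goal percentage: (FG + 0.5 * 3P) / FGA."""
    return (fg + 0.5 * three_p) / fga


# The two players from the quoted definition.
players = {
    "Player A": {"fg": 4, "three_p": 2, "fga": 10},
    "Player B": {"fg": 5, "three_p": 0, "fga": 10},
}

for name, p in players.items():
    fg = fg_pct(p["fg"], p["fga"])
    efg = efg_pct(p["fg"], p["three_p"], p["fga"])
    print(f"{name}: FG% = {fg:.1%}, EFG% = {efg:.1%}")
```

Player A has the lower FG% but the same EFG% as Player B, because the two made threes each count as 1.5 makes.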
Let’s start our investigation into EFG% by comparing EFG% to FG% for C&S vs pull-up jumpers, split out by all shots, 3 point shots, and 2 point shots.
|FG%||All shots||3 point||2 point|
|EFG%||All shots||3 point||2 point|
By using EFG% instead of FG% it becomes much clearer that C&S is a better shot than a pull-up jumper.
We could also split the data by whether or not these shots were open (as we first saw in part 1).
We see that, of course, open shots are better than defended shots. However, using EFG% also shows that a defended C&S is better than an open pull-up. Even without seeing the raw numbers, we suspect these results come from a large number of C&S shots being 3-point attempts.
So we could stratify further and look at C&S vs 3-point vs openness. And while it would be easy to make a number of stratified 2×2 tables, at a certain point it makes more sense to just use a model and account for as many variables as possible that could affect FG% or EFG%. That is not to say that examining raw percentages is a bad idea. After all, tables are a simple way to compare different kinds of shots, and since we have a large number of shots, we won’t really run into any sparsity problems.
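For readers who want to build the stratified tables themselves, the following is a rough sketch of how the C&S × 3-point × openness breakdown could be computed. The shot records are made-up placeholders, not the data used in this series, and the tuple layout is an assumption for illustration only.

```python
from collections import defaultdict

# Hypothetical shot records: (is_catch_and_shoot, is_three, is_open, made).
# These rows are invented for illustration and are NOT real shot data.
shots = [
    (True, True, True, True),
    (True, True, True, False),
    (True, True, False, False),
    (True, False, True, True),
    (False, True, False, False),
    (False, False, True, True),
    (False, False, False, True),
    (False, False, False, False),
]


def stratified_rates(shots):
    """Return {(cs, three, open): (attempts, fg_pct, efg_pct)} per stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [attempts, makes]
    for cs, three, is_open, made in shots:
        cell = counts[(cs, three, is_open)]
        cell[0] += 1
        cell[1] += int(made)

    table = {}
    for (cs, three, is_open), (attempts, makes) in counts.items():
        fg = makes / attempts
        # Within a stratum every shot has the same value, so EFG% is just
        # FG% scaled by 1.5 for threes and left unchanged for twos.
        efg = fg * (1.5 if three else 1.0)
        table[(cs, three, is_open)] = (attempts, fg, efg)
    return table


for stratum, (attempts, fg, efg) in sorted(stratified_rates(shots).items()):
    print(stratum, attempts, f"{fg:.3f}", f"{efg:.3f}")
```

Each added variable doubles the number of cells, which is the practical reason a model becomes more attractive than a growing stack of tables.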
But I don’t want to spend too long just looking at basic statistics. So, let’s continue down our previous path of looking at a simple regression to predict shot success, and see how we can improve it. However, we quickly run into two potential problems.
The first problem is one we touched on previously – looking at confounders. We want to understand variables that affect both whether a shot is successful and a player’s decision to take a specific type of shot. Last time we looked at defender distance as a potential confounder. This time we will also consider the shot clock. If there are only a few seconds left on the clock, a player may not have time to drive to the basket, and will have to just shoot. For future analyses, I’d want to explore other variables that are potential confounders such as game time remaining, the score, and who the closest defender is. But for now, let’s keep things relatively simple.
The second problem we will face is more complicated – how do we model EFG%? Modeling FG% is easy because our outcome is binary: a shot is successful or not. Logistic regression requires a binary outcome, so we can’t just give successful 3 point shots an outcome of 1.5. Most statistics software will allow us to use weights in a quasibinomial framework, but I can’t think of a good way to use weights to get at EFG%. Weights are used to create pseudo-populations that up-weight or down-weight certain shots depending on how representative they are. The problem with giving a successful 3 point shot a weight of 1.5 is that it doesn’t make the outcome 1.5; rather, it increases the representation of the characteristics of that shot.
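To make the weighting problem concrete, here is a small sketch using Player A from the EFG% definition (4 for 10 with 2 threes). It relies on the fact that an intercept-only weighted logistic regression fits the weighted mean of the outcomes, so the weighted mean stands in for the regression here. The function names and the way misses are split between twos and threes are assumptions for illustration.

```python
# Player A: 2 made threes, 2 made twos, 6 misses. Each record is (made, is_three).
# Labeling all misses as twos is an arbitrary choice; it does not change the result,
# because only made threes receive the extra weight.
shots = [(1, True)] * 2 + [(1, False)] * 2 + [(0, False)] * 6


def weighted_make_rate(shots, three_weight=1.5):
    """Weighted mean of the binary outcome, with made threes up-weighted.

    This is what an intercept-only weighted logistic fit would estimate.
    """
    numerator = 0.0
    denominator = 0.0
    for made, is_three in shots:
        weight = three_weight if (made and is_three) else 1.0
        numerator += weight * made
        denominator += weight  # the weight also inflates the denominator
    return numerator / denominator


def efg(shots):
    """EFG% computed directly: made threes count as 1.5 makes over all attempts."""
    credit = sum(made * (1.5 if is_three else 1.0) for made, is_three in shots)
    return credit / len(shots)


print(f"weighted make rate: {weighted_make_rate(shots):.4f}")
print(f"EFG%:               {efg(shots):.4f}")
```

The weighted rate works out to 5/11 rather than 5/10: the made threes count as 1.5 shots in the denominator as well as the numerator, which is exactly the "increased representation" described above.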
If anybody has a way to examine EFG% using a weighted regression, please let me know. I only spent a few days thinking about this, and while I have a workaround, I would love to be able to show this analysis using just a simple regression framework. But I cannot, for the life of me, think of a way to do it. I tried for a while to reframe the problem by using functionals instead of trying to target a regression parameter, but I still don’t think it works.
So what is my workaround? Don’t look at EFG%. Instead, split out 3 point shots and 2 point shots and examine them separately. 3 point shots and 2 point shots are different enough that trying to pool them into a single population will obscure the differences and lead to analytical problems, especially since it may be naive to assume a constant treatment effect of C&S for both 2-point and 3-point shots. We could also split out the two kinds of shots and instead look at the expected number of points per shot. Stephen Shea has touched on this, which makes me think it is a good avenue for further investigation.
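One way to see how the points-per-shot idea relates to EFG% is the short derivation below. The notation is my own assumption: $FG_2$ and $FG_3$ are made 2-point and 3-point field goals, and $p_2(x)$, $p_3(x)$ are hypothetical make probabilities from separate 2-point and 3-point models given shot characteristics $x$. Only points from field goals are counted.

```latex
\begin{align*}
\mathrm{EFG\%}
  &= \frac{FG + 0.5 \cdot 3P}{FGA}
   = \frac{(FG_2 + FG_3) + 0.5\, FG_3}{FGA} \\
  &= \frac{2\, FG_2 + 3\, FG_3}{2 \cdot FGA}
   = \frac{1}{2} \cdot \frac{\text{points from field goals}}{FGA}
\end{align*}

% With separate models for each shot value, the expected points for a shot
% with characteristics x would be:
\[
E[\text{points} \mid x] =
\begin{cases}
2\, p_2(x) & \text{for a 2-point attempt} \\
3\, p_3(x) & \text{for a 3-point attempt}
\end{cases}
\]
```

Under these assumptions, EFG% is half of the field-goal points per shot, so modeling the two shot values separately and then converting to expected points loses nothing relative to EFG%.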
On a more philosophical level, there always seems to be this strong desire to collapse everything down to a single number. We see this a lot when we try to invent statistics that fully capture how good a player is with one number. And while I agree there is value in a single statistic, I also think there is value in nuance and increased granularity. My goal is to examine C&S shots, and there is no harm in splitting that out by shot value.
But I freely admit that I may be missing something obvious and there is an easy way to use EFG%. Again, if you have any ideas, please let me know.
Next time in this series, we will dive into causal effects of catch & shoot vs pull-up jumpers.