What follows is the work Sameer Deshpande and I did for the 2019 NFL Big Data Bowl. We will be presenting this work at the Finals on February 27th.
Consider two passing plays during the game between the Los Angeles Rams and visiting Indianapolis Colts in the first week of the 2017 season.
The first passing play was a short pass in the first quarter from Colts quarterback Scott Tolzien intended for T.Y. Hilton, which was intercepted by Trumaine Johnson and returned for a Rams touchdown.
The second passing play was a long pass from Rams quarterback Jared Goff to Cooper Kupp, resulting in a Rams touchdown (time stamp 3:39).
In this work, we consider the question: which play had the better route(s)?
From one perspective, we could argue that Kupp’s route was better than Hilton’s; after all, it resulted in the offense scoring while the first play resulted in a turnover and a defensive score. However, evaluating a decision based only on its outcome is not always appropriate or productive. Two recent examples of similar plays come to mind: Pete Carroll’s decision to pass the ball from the 1 yard line in Super Bowl XLIX and the “Philly Special” in Super Bowl LII. Had the results of these two plays been reversed, Pete Carroll might have been celebrated and Doug Pederson criticized.
All this is to say, we shouldn’t condition on the observed outcome alone.
If evaluating plays solely by their outcomes is inadequate, on what basis should we compare routes? Intuitively, we might tend to prefer routes which maximize the receiver’s chance of catching the pass, or completion probability.
If we let y be a binary indicator of whether a pass was caught and let x be a collection of covariates summarizing information about the pass, we can consider a logistic regression model of completion probability:
$$\log\left(\frac{P(y = 1 \mid x)}{P(y = 0 \mid x)}\right) = f(x),$$
or equivalently $P(y = 1 \mid x) = \left[1 + e^{-f(x)}\right]^{-1}$, for some unknown function f.
If we knew the function f, a first pass at assessing a route would be to plug in the relevant covariates x and see whether the forecasted completion probability exceeds some threshold, say 50%. If so, regardless of whether the receiver actually caught the pass, we could say that the route was run and the ball was placed in such a way as to give the receiver a better chance than not of catching the pass.
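To make the thresholding idea concrete, here is a minimal sketch of how a value of f(x) on the log-odds scale maps to a completion probability and a "better chance than not" call. The function names and the value 0.8 are made up for illustration; they are not estimates from our data.

```python
import math


def completion_probability(log_odds: float) -> float:
    """Inverse-logit: map f(x), the log-odds of a catch, to P(y = 1 | x)."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def better_than_not(log_odds: float, threshold: float = 0.5) -> bool:
    """True if the forecasted completion probability exceeds the threshold."""
    return completion_probability(log_odds) > threshold


# Hypothetical value of f(x) for a single pass (an assumption, not a fitted value).
f_x = 0.8
p = completion_probability(f_x)
print(f"completion probability: {p:.3f}, better than not: {better_than_not(f_x)}")
```

Note that a 50% threshold on the probability scale is the same as asking whether f(x) is positive, since a log-odds of zero corresponds to even odds.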
Wait a minute, what’s f and what are the inputs x, you might ask? We’ll go into all of the gory details later but suffice it to say: x contains what we’ll call “time of release” variables, which are recorded the moment the ball is thrown, and “time of arrival” variables, which are recorded when the receiver tries to catch the ball. Intuitively, we might expect that catch probability depends on both of these. And f, well, f is probably some crazy non-linear function of a bunch of variables. See Part 2 for more details.
We could then directly compare the forecasted completion probabilities of the two plays mentioned above; if it turned out that the Tolzien interception had a higher completion probability than the Kupp touchdown, that play would not seem as bad, despite the much worse outcome [spoiler: it wasn’t].
But why stop there? There are usually multiple eligible receivers running routes on a given pass play. What can we say about the non-targeted receivers? In particular, if the quarterback threw to a different location along a possibly different receiver’s route, can we predict the catch probability? It turns out this is challenging for two fundamental reasons.
First, even if we knew the true function f, we are essentially trying to deduce what might have happened in a counterfactual world where the quarterback had thrown the ball to a different player at a different time, with the defense reacting differently. On such a counterfactual pass, we do not observe any “time of arrival” variables that may be predictive of completion probability. Figure 1 illustrates this issue, showing schematics for an observed pass (left panel) and a hypothetical pass (right panel). In both passes, there are two receivers running routes; we have colored the route of the intended receiver on both passes blue and the route of the other receiver gray.
Before proceeding, let’s pause for a moment to distinguish between our use of the term “counterfactual” and its use in causal inference.
Sameer and I are both fairly embedded in the world of causal inference (though he doesn’t have a Twitter handle, email, and website that prominently display his love of all things causal. Rejoinder from Sameer: Bayes is bae. I make no apologies.) and it feels weird to use the term “counterfactual” and not elaborate.
The general causal framework of counterfactuals supposes that we change some treatment or exposure variable and asks what happens to downstream outcomes. In contrast, in this work, we consider changing a midstream variable, the location of the intended receiver when the ball arrives, and then impute both upstream and downstream variables like the time of the pass and the receiver separation at the time the ball arrives. We use “counterfactual” interchangeably with “hypothetical” because, while an unobserved pass is hypothetical, the intended receiver of that pass is not. We hope our more liberal usage is not a source of further confusion below.
Ok, I’ve said my piece.
The second fundamental challenge: we typically do not know the function f and must therefore estimate it using the observed data. Even if we knew how to overcome the issue of unobserved “time of arrival” inputs for the hypothetical passes, we would still be uncertain about f itself. So we’re going to need to estimate f in a way that lets us quantify uncertainty about downstream functionals of f. In doing so, estimation uncertainty about f propagates to the uncertainty about the hypothetical completion probabilities.
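As a rough illustration of what propagating uncertainty means here, suppose a Bayesian fit hands us many posterior draws of f(x) at one fixed input x. Pushing each draw through the inverse-logit gives a whole distribution of completion probabilities rather than a single number. In the sketch below the draws come from a normal distribution purely as a stand-in; that choice, the seed, and the function names are assumptions for illustration, not our fitted posterior.

```python
import math
import random
import statistics


def inv_logit(z: float) -> float:
    """Map log-odds to a probability without overflowing math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def summarize_probability(log_odds_draws, level=0.95):
    """Posterior mean and an equal-tailed interval for P(y = 1 | x)."""
    probs = sorted(inv_logit(z) for z in log_odds_draws)
    lower = probs[int((1 - level) / 2 * (len(probs) - 1))]
    upper = probs[int((1 + level) / 2 * (len(probs) - 1))]
    return statistics.fmean(probs), lower, upper


# Stand-in posterior draws of f(x) at a single x (assumed, not from a real fit).
rng = random.Random(2019)
log_odds_draws = [rng.gauss(0.5, 0.4) for _ in range(2000)]

post_mean, lower, upper = summarize_probability(log_odds_draws)
print(f"posterior mean {post_mean:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

The wider the spread of the draws of f(x), the wider the resulting interval for the completion probability, which is exactly the downstream uncertainty we want to keep track of.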
So to recap: we’re positing there’s some true function f that takes in “time of release” variables and “time of arrival” variables and outputs the log-odds of a receiver catching the pass. We don’t know this function so we need to estimate f. We then want to take this estimate and plug in inputs about hypothetical passes to predict the completion probability for every receiver involved at all times during a play. Unfortunately, we don’t actually know the value of the “time of arrival” variables for the hypothetical passes.
If you’re still with us, you might be thinking “Wait a second! I can sidestep the fact that we never observe the hypothetical “time of arrival” variables by letting f only depend on “time of release” variables.” And you’d technically be right! But it strains credulity to believe, for instance, that how far a receiver is from his closest defender doesn’t affect his chances of catching the ball. So, restricting f to not depend on “time of arrival” variables seems like a decidedly arbitrary solution to our first challenge. Technically, we’d need to first establish that a model of catch probability that accounts for “time of arrival” variables predicts better than one that does not. But we’re willing to make this intuitive assumption for now.
OK, so we want to evaluate a function that we’re uncertain about at inputs about which we’re also uncertain. We overcome the two challenges in this work. Using tracking, play, and game data from the first 6 weeks of the 2017 NFL season, we developed Expected Hypothetical Completion Probability (EHCP).
At a high level, our framework consists of two steps:
- We estimate the log-odds of a catch as a function of several characteristics of each observed pass in our data.
- We simulate the characteristics of the hypothetical pass that we do not directly observe and compute the average completion probability of the hypothetical pass.
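The two steps above can be sketched in a few lines. This is only a toy under loud assumptions: the log-odds function is linear in two made-up covariates (air yards and receiver separation), the "posterior draws" of its coefficients are random numbers, and the unobserved separation is simulated from an arbitrary gamma distribution. None of these are the model or the simulation scheme we actually use; the point is just the shape of the computation, namely averaging the completion probability over simulated "time of arrival" values, once per posterior draw of f.

```python
import math
import random


def inv_logit(z: float) -> float:
    """Map log-odds to a probability without overflowing math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def toy_log_odds(coef, air_yards, separation):
    """Hypothetical stand-in for f: linear in two illustrative covariates."""
    intercept, b_air, b_sep = coef
    return intercept + b_air * air_yards + b_sep * separation


def ehcp_per_draw(coef_draws, air_yards, separation_draws):
    """For each draw of f, average the catch probability over the
    simulated values of the unobserved time-of-arrival variable."""
    results = []
    for coef in coef_draws:
        probs = [inv_logit(toy_log_odds(coef, air_yards, s))
                 for s in separation_draws]
        results.append(sum(probs) / len(probs))
    return results


rng = random.Random(0)
# Step 1 stand-in: pretend these are posterior draws of f's coefficients.
coef_draws = [(rng.gauss(0.2, 0.1), rng.gauss(-0.03, 0.01), rng.gauss(0.3, 0.05))
              for _ in range(200)]
# Step 2 stand-in: simulate the separation (in yards) we never observed.
separation_draws = [rng.gammavariate(2.0, 1.5) for _ in range(500)]

ehcp_draws = ehcp_per_draw(coef_draws, air_yards=12.0,
                           separation_draws=separation_draws)
print(f"EHCP, averaged over draws of f: {sum(ehcp_draws) / len(ehcp_draws):.3f}")
```

Keeping one averaged value per draw of f, instead of collapsing everything to a single number, means the spread of `ehcp_draws` still reflects how uncertain we are about f.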
In Part 2 of this blog post series, we will describe our Bayesian procedure for fitting a catch probability model like in the equation above and outline the EHCP framework.
In Part 3, we will discuss the results of our catch probability model and illustrate the EHCP framework on several routes.
Finally, in Part 4, we will conclude with a discussion of potential methodological improvements and refinements and potential uses of our EHCP framework.