Expected Hypothetical Completion Probability – Part 1

What follows is the work Sameer Deshpande and I did for the 2019 NFL Big Data Bowl. We will be presenting this work at the Finals on February 27th.


Consider two passing plays during the game between the Los Angeles Rams and visiting Indianapolis Colts in the first week of the 2017 season.

The first passing play was a short pass in the first quarter from Colts quarterback Scott Tolzien intended for T.Y. Hilton which was intercepted by Trumaine Johnson and returned for a Rams touchdown.

The second passing play was a long pass from Rams quarterback Jared Goff to Cooper Kupp, resulting in a Rams touchdown (time stamp 3:39).

In this work, we consider the question: which play had the better route(s)?

From one perspective, we could argue that Kupp’s route was better than Hilton’s; after all, it resulted in the offense scoring while the first play resulted in a turnover and a defensive score. However, evaluating a decision based only on its outcome is not always appropriate or productive. Two recent examples of similar plays come to mind: Pete Carroll’s decision to pass the ball from the 1 yard line in Super Bowl XLIX and the “Philly Special” in Super Bowl LII. Had the results of these two plays been reversed, Pete Carroll might have been celebrated and Doug Pederson criticized.

All this is to say, we shouldn’t condition on the observed outcome alone.

If evaluating plays solely by their outcomes is inadequate, on what basis should we compare routes? Intuitively, we might tend to prefer routes which maximize the receiver’s chance of catching the pass, or completion probability.

If we let y be a binary indicator of whether a pass was caught and let x be a collection of covariates summarizing information about the pass, we can consider a logistic regression model of completion probability:

\log{\left(\frac{P(y = 1 | x)}{P(y = 0 | x)}\right)} = f(x),

or equivalently P(y = 1 | x) = \left[1 + \text{e}^{-f(x)}\right]^{-1}, for some unknown function f.

If we knew the function f, a first pass at assessing a route would be to plug in the relevant covariates x and see whether the forecasted completion probability exceeded some threshold, say 50%. If so, regardless of whether the receiver actually caught the pass, we could say that the route was run and the ball was placed in such a way as to give the receiver a better chance than not of catching the pass.
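To make the plug-in idea concrete, here is a minimal sketch in Python. The function `toy_f`, its coefficients, and the feature names `separation_yards` and `air_yards` are all made up for illustration; they are not the fitted model or the covariates from our paper. The only part taken from the text is the inverse-logit formula and the 50% threshold.

```python
import math


def completion_probability(log_odds: float) -> float:
    """Invert the logit: P(y = 1 | x) = 1 / (1 + exp(-f(x))).

    The two branches avoid overflow in exp() for very negative log-odds.
    """
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


def toy_f(x: dict) -> float:
    # Hypothetical stand-in for the unknown f; the coefficients are invented.
    return 0.4 * x["separation_yards"] - 0.05 * x["air_yards"]


# Hypothetical covariates for a single pass.
pass_features = {"separation_yards": 3.0, "air_yards": 12.0}

p = completion_probability(toy_f(pass_features))
better_than_even = p > 0.5  # the 50% threshold from the text
print(round(p, 3), better_than_even)
```

Note that the threshold check never looks at whether the pass was actually caught, which is the whole point: the assessment depends on the inputs, not the outcome.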

Wait a minute, what’s f and what’re the inputs x, you might ask? We’ll go into all of the gory details later but suffice it to say: x contains what we’ll call “time of release” variables, which are recorded the moment the ball is thrown, and “time of arrival” variables, which are recorded when the receiver tries to catch the ball. Intuitively, we might expect that catch probability depends on both of these. And f, well, f is probably some crazy non-linear function of a bunch of variables. See Part 2 for more details.

We could then directly compare the forecasted completion probabilities of the two plays mentioned above; if it turned out that the Tolzien interception had a higher completion probability than the Kupp touchdown, that play would not seem as bad, despite the much worse outcome [spoiler: it wasn’t].

But why stop there? There are usually multiple eligible receivers running routes on a given pass play. What can we say about the non-targeted receivers? In particular, if the quarterback threw to a different location along a possibly different receiver’s route, can we predict the catch probability? It turns out, this is challenging for two fundamental reasons.

First, even if we knew the true function f, we are essentially trying to deduce what might have happened in a counterfactual world where the quarterback had thrown the ball to a different player at a different time, with the defense reacting differently. On such a counterfactual pass, we do not observe any “time of arrival” variables that may be predictive of completion probability. Figure 1 illustrates this issue, showing schematics for an observed pass (left panel) and a hypothetical pass (right panel). In both passes, there are two receivers running routes; we have colored the route of the intended receiver on both passes blue and the route of the other receiver in gray.

Figure 1: Schematic of what we directly observe on an actual pass (left panel) from our dataset and what we cannot observe for a hypothetical pass (right panel). In both passes, there are two receivers running routes. The targeted receiver is denoted with a circle and the defender closest to the receiver is denoted with an X. Unobservables are colored red while observables are colored blue.

Before proceeding, let’s pause for a moment to distinguish between our use of the term “counterfactual” and its use in causal inference.

Sameer and I are both fairly embedded in the world of causal inference (though he doesn’t have a Twitter handle, email, and website that prominently display his love of all things causal. Rejoinder from Sameer: Bayes is bae. I make no apologies.) and it feels weird to use the term “counterfactual” and not elaborate.

The general causal framework of counterfactuals supposes that we change some treatment or exposure variable and asks what happens to downstream outcomes. In contrast, in this work, we consider changing a midstream variable, the location of the intended receiver when the ball arrives, and then impute both upstream and downstream variables like the time of the pass and the receiver separation at the time the ball arrives. We use the word “counterfactual” interchangeably with “hypothetical” because, while an unobserved pass is hypothetical, the intended receiver of that pass is not. We hope our more liberal usage is not a source of further confusion below.

Ok, I’ve said my piece.

The second fundamental challenge: we typically do not know the function f and must therefore estimate it using the observed data. Even if we knew how to overcome the issue of unobserved “time of arrival” inputs for the hypothetical passes, estimation uncertainty about f will propagate to the forecasts of hypothetical completion probabilities. So we’re going to need to estimate f in a way that lets us quantify uncertainty about downstream functionals of f, such as those hypothetical completion probabilities.

So to recap: we’re positing there’s some true function f that takes in “time of release” variables and “time of arrival” variables and outputs the log-odds of a receiver catching the pass. We don’t know this function so we need to estimate f. We then want to take this estimate and plug in inputs about hypothetical passes to predict the completion probability for every receiver involved at all times during a play. Unfortunately, we don’t actually know the value of the “time of arrival” variables for the hypothetical passes.

If you’re still with us, you might be thinking “Wait a second! I can sidestep the fact that we never observe the hypothetical “time of arrival” variables by letting f only depend on “time of release” variables.” And you’d technically be right! But it strains credulity to believe, for instance, that how far a receiver is from his closest defender doesn’t affect his chances of catching the ball. So, restricting f to not depend on “time of arrival” variables seems like a decidedly arbitrary solution to our first challenge. Technically, we’d need to first establish that models of catch probability that account for “time of arrival” variables predict better than ones that do not. But we’re willing to make this intuitive assumption for now.

OK, so we want to evaluate a function that we’re uncertain about at inputs about which we’re also uncertain. We overcome the two challenges in this work. Using tracking, play, and game data from the first 6 weeks of the 2017 NFL season, we developed Expected Hypothetical Completion Probability (EHCP).

At a high level, our framework consists of two steps:

  1. We estimate the log-odds of a catch as a function of several characteristics of each observed pass in our data.
  2. We simulate the characteristics of the hypothetical pass that we do not directly observe and compute the average completion probability of the hypothetical pass.
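The two steps above can be sketched as a small Monte Carlo loop. This is only a toy under loud assumptions: the "posterior draws" of f are random linear functions with invented coefficients, and the unobserved “time of arrival” variable is a single made-up separation distance drawn from a truncated normal. Neither is the model or the imputation scheme from our paper; the sketch only shows how uncertainty about f and uncertainty about the inputs can both be carried into the final number.

```python
import math
import random
import statistics


def sigmoid(z: float) -> float:
    """Map log-odds to a probability, avoiding overflow for negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_f_draw(rng: random.Random):
    """Hypothetical stand-in for one posterior draw of f (invented coefficients)."""
    b_sep = rng.gauss(0.4, 0.05)
    b_air = rng.gauss(-0.05, 0.01)
    return lambda separation, air_yards: b_sep * separation + b_air * air_yards


def simulate_separation(rng: random.Random) -> float:
    """Hypothetical simulator for an unobserved time-of-arrival variable (yards)."""
    return max(0.0, rng.gauss(2.5, 1.0))


def ehcp_per_draw(f_draws, air_yards: float, n_sims: int, rng: random.Random):
    """For each draw of f, average the completion probability over simulated inputs."""
    averages = []
    for f in f_draws:
        probs = [sigmoid(f(simulate_separation(rng), air_yards)) for _ in range(n_sims)]
        averages.append(statistics.fmean(probs))
    return averages


rng = random.Random(0)
f_draws = [make_f_draw(rng) for _ in range(200)]  # step 1 (faked here)
per_draw = ehcp_per_draw(f_draws, air_yards=15.0, n_sims=100, rng=rng)  # step 2

point_estimate = statistics.fmean(per_draw)
cuts = statistics.quantiles(per_draw, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
interval = (cuts[0], cuts[-1])               # rough 95% uncertainty interval
print(round(point_estimate, 3), tuple(round(v, 3) for v in interval))
```

Keeping one average per draw of f, instead of pooling everything into a single mean, is what lets the spread across draws serve as an uncertainty interval for the hypothetical completion probability.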

In Part 2 of this blog post series, we will describe our Bayesian procedure for fitting a catch probability model like in the equation above and outline the EHCP framework.

In Part 3, we will discuss the results of our catch probability model and illustrate the EHCP framework on several routes.

Finally, in Part 4, we will conclude with a discussion of potential methodological improvements and refinements and potential uses of our EHCP framework.



Expected Hypothetical Completion Probability – Quick Post

What follows is some info on the work Sameer Deshpande and I did for the 2019 NFL Big Data Bowl. We will be presenting this work at the BDB Finals at the NFL Combine in Indianapolis on February 27th.

We are in the process of putting together a series of blog posts that will explain our method in, hopefully, an easily digestible way. Until then, we wanted to share a copy of the paper as it was submitted to the contest.

Expected Hypothetical Completion Probability – link to pdf

We note that there are a few caveats:

1. This is very much proof-of-concept. EHCP is a modular framework that involves lots of pieces. We have put the pieces together, but none are optimized at the moment.

2. There are many technical and conceptual details to discuss. We’re going to dive into many of these details in the coming blog posts. Additionally, we’re happy to discuss the paper with particularly interested parties.

That being said,

3. Please be patient! We’re posting the paper we submitted to the Big Data Bowl contest. We recognize that the write-up is somewhat technical and terse when it comes to the finer details of our methodology. Over the last few weeks, we’ve received some great feedback and questions from some of our friends and colleagues in sports and academia. Our plan in the next several posts is to respond to this feedback and hopefully address a bunch of initial questions. So please be patient with us; if you send us a bunch of burning questions and we don’t respond, it’s not entirely because we’re avoiding you.

That being said, here is a quick FAQ:

Q: Did you think about including other variables, such as QB pressure, time from snap to throw, defensive schemes, player information, etc.?
A: We considered many variables, but had to limit scope due to time constraints. Incorporating additional variables is a clear opportunity for further work.

Q: Why BART?
A: Over an ever-growing range of problems, BART has demonstrated really great predictive performance with minimal hyperparameter tuning and without the need to pre-specify a specific functional relationship between inputs and outputs. While we didn’t do it in our analysis, BART can also be adapted to do feature selection. At the same time, it’s totally plausible that another regression technique would be effective for the problem.

Q: Did you consider other outcomes like YAC or expected yards gained?
A: We did. Ultimately, we may want to maximize the expected value of a play given the input variables x, E[value | x], which we can further decompose as:
E[value | x] = E[value | catch, x] * P(catch | x) + E[value | no catch, x] * P(no catch | x)
We focused on the P(catch | x) part.
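As a quick worked illustration of that decomposition, here is a tiny Python sketch. The numbers are invented (a 60% catch probability, 8 yards of value on a catch, 0 on an incompletion), and `expected_play_value` is a hypothetical helper, not something from our paper.

```python
def expected_play_value(p_catch: float, value_if_catch: float, value_if_no_catch: float) -> float:
    """E[value | x] = E[value | catch, x] * P(catch | x) + E[value | no catch, x] * P(no catch | x)."""
    if not 0.0 <= p_catch <= 1.0:
        raise ValueError("p_catch must be a probability in [0, 1]")
    return value_if_catch * p_catch + value_if_no_catch * (1.0 - p_catch)


# Invented inputs for a single hypothetical pass.
example = expected_play_value(p_catch=0.6, value_if_catch=8.0, value_if_no_catch=0.0)
print(round(example, 2))
```

With these made-up inputs the expected value comes out to about 4.8 yards; a better estimate of P(catch | x) feeds directly into the first term.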

Q: Wait a minute! You need to do a better job of modeling the conditional distribution of the unobserved variables on the observed ones. There is no way they are independent. Especially since they may change as the route develops.
A: That’s not really a question, but we agree. Handling the missing variables is one of the modular parts of the framework and can be optimized independently of the other parts. It is an interesting missing data question in its own right.

Q: Who is the best QB/WR?
A: We didn’t have enough data to draw any strong conclusions. Jameis Winston looked great in the data we had available to us.

Q: Does this run on the blockchain?
A: 😑

Q: Did y’all try deep lear–
A: No.