# Kathy Explains all of Statistics in 30 Seconds and “How to Succeed in Sports Analytics” in 30 Seconds

I spent the weekend of October 19-21 in Pittsburgh at the 2018 CMU Sports Analytics Conference. One of the highlights of the weekend was Sam Ventura asking me to explain causal inference in 15 seconds. I couldn’t quite do it, but the attempt morphed into trying to explain all of statistics in 30 seconds, which I then had to repeat a few times over the weekend. Figured I’d post it so people can stop asking. I’m expanding slightly.

### Kathy Explains all of Statistics in 30 Seconds

Broadly speaking, statistics can be broken up into three categories: description, prediction, and inference.

• Description
  • Summaries
  • Visualizations
• Prediction
  • Mapping inputs to outputs
  • Predicting outcomes and distributions
• Inference/Causal Inference
  • Prediction if the world had been different
  • Counterfactual/potential outcome prediction

I’ll give an example in the sports analytics world, specifically basketball (this part is what I will say if I only have 30 seconds):

• Description
  • Slicing your data to look at the distribution of points per game (or per 100 possessions or whatever) scored by different lineups
• Prediction
  • Predicting the number of points your team will score in a game given your planned lineups
• Inference/Causal Inference
  • Prediction of change in points per game if you ran totally new lineups versus the normal lineups
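For the code-inclined, here is a minimal Python sketch of the same three ideas. Everything in it is an assumption made for illustration: the numbers are made-up toy data (not real stats), and the names `describe`, `predict`, and `average_effect` are hypothetical. The causal part cheats on purpose by writing down both potential outcomes for each game; in real data you only ever observe one of them, which is exactly why causal inference is hard.

```python
from statistics import mean

# Made-up toy data: (lineup, points per 100 possessions) for a handful of stints.
games = [
    ("starters", 112), ("starters", 108), ("starters", 110),
    ("bench", 98), ("bench", 102),
]

def describe(games):
    """Description: summarize what already happened, sliced by lineup."""
    by_lineup = {}
    for lineup, points in games:
        by_lineup.setdefault(lineup, []).append(points)
    return {lineup: mean(points) for lineup, points in by_lineup.items()}

def predict(summary, planned_share):
    """Prediction: map inputs (planned share of possessions per lineup)
    to an output (expected points per 100 possessions)."""
    return sum(summary[lineup] * share for lineup, share in planned_share.items())

# Made-up potential outcomes: what each game WOULD have looked like under
# the normal lineups and under totally new lineups. Only a simulation can
# show both; reality shows one per game.
potential_outcomes = [
    {"normal": 110, "new": 113},
    {"normal": 105, "new": 104},
    {"normal": 108, "new": 112},
]

def average_effect(potential_outcomes):
    """Causal inference: average change if the world had been different."""
    return mean(game["new"] - game["normal"] for game in potential_outcomes)

summary = describe(games)
forecast = predict(summary, {"starters": 0.7, "bench": 0.3})
effect = average_effect(potential_outcomes)
```

Here `summary` is the descriptive slice, `forecast` is the prediction for a planned 70/30 split, and `effect` is the counterfactual comparison. The first two only need the data you have; the third needs assumptions about data you can never fully observe.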

My day job is working for a tech healthcare company, and the following are the examples I normally use in that world:

• Description
  • Distributions of patient information for emergency department admissions stratified by length of stay
• Prediction
  • Predicting length of stay based on patient information present on admission
• Inference/Causal Inference
  • Prediction of change in length of stay if a chest pain patient had a stress test versus a cardiac catheterization

So, it’s not *all* of statistics. But I think it’s important to understand the different parts of statistics. They have different uses and different interpretations.

### More thoughts from the conference

Any time I am at a sports conference, someone asks “how does one succeed in/break into the field?” Many others have written about this topic, and I’ve started to see a lot of common themes. So….

### How to Succeed in Sports Analytics in 30 Seconds

Success in sports analytics/statistics seems to require these four abilities:

• Domain expertise
• Communication
• Statistics
• Coding/programming/CS type skills

Imagine that each area has a max of 10 points. You gotta have at least 5/10 in every category and then, like, at least 30 of the possible 40 points overall. Yes, I am speaking very vaguely. But the point is, you don’t have to be great at everything, but you do have to be great at something and decent at everything.
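If you want the vague rule written down anyway, here is a minimal sketch in Python. The function name `clears_the_bar`, the parameter names, and the example scores are all hypothetical; the thresholds (a floor of 5 in each area, 30 overall) are just the hand-wavy numbers from above, not a real rubric.

```python
def clears_the_bar(scores, floor=5, total=30):
    """Return True if every skill is at least `floor` and the sum is at least `total`.

    `scores` maps each of the four areas to a 0-10 self-rating.
    """
    return min(scores.values()) >= floor and sum(scores.values()) >= total

# Hypothetical self-ratings, purely for illustration.
great_at_some_decent_at_all = {
    "domain": 5, "communication": 9, "statistics": 9, "coding": 7,
}
great_but_one_gap = {
    "domain": 10, "communication": 10, "statistics": 10, "coding": 4,
}
decent_at_everything = {
    "domain": 6, "communication": 6, "statistics": 6, "coding": 6,
}
```

Under these assumptions the first profile clears the bar (no score under 5, 30 points total), the second fails on the floor despite 34 points, and the third fails on the total with only 24. That is the "great at something, decent at everything" idea in two comparisons.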

I don’t feel like I actually know that much about basketball or baseball, or any sport really. I didn’t play any sport in college, and generally when I watch games, I’m just enjoying the game. While watching the Red Sox in the playoffs I don’t really pay attention to the distribution of David Price’s pitches, I just enjoy watching him pitch. Hell, I spend more time wondering what Fortnite skins Price has. I’ve been guessing Dark Voyager, but he also seems like the kind of guy to buy a new phone just to get the Galaxy skin. Anyway. I’m not an expert, but I do know enough to talk sensibly about sports and to help people with more expertise refine and sharpen their questions.

And I know statistics. And years of teaching during graduate school helped me get pretty damn good at explaining complicated statistical concepts in ways that most people can understand. Plus I can code (though not as well as others). Sports teams are chock full of sports experts; they need experts in other areas too.

These four skills are key to succeeding in any sort of analytical job. I’m not a medical expert, but I work with medical experts in my job and complement their skills with my own.

### Concluding thoughts from the conference

Man, no matter what a talk is about, there are always the questions/comments of “did you think about this other variable” (yes, but it wasn’t available in the data), “could you do it in this other sport…” (is there data available on that sport?), “what about this one example when the opposite happened?” (-_-), “you need to be clearer about how effects are mediated downstream, there’s no way this is a direct effect even if you’ve controlled for all the confounding” (ok that one’s usually me), etc.

Next time, we are going to make bingo cards.