Kathy Explains all of Statistics in 30 Seconds and “How to Succeed in Sports Analytics” in 30 Seconds

I spent the weekend of October 19-21 in Pittsburgh at the 2018 CMU Sports Analytics Conference. One of the highlights of the weekend was Sam Ventura asking me to explain causal inference in 15 seconds. I couldn’t quite do it, but it morphed into trying to explain all of statistics in 30 seconds. Which I then had to repeat a few times over the weekend. Figured I’d post it so people can stop asking. I’m expanding slightly.

Kathy Explains all of Statistics in 30 Seconds

Broadly speaking, statistics can be broken up into three categories: description, prediction, and inference.

  • Description
    • Summaries
    • Visualizations
  • Prediction
    • Mapping inputs to outputs
    • Predicting outcomes and distributions
  • Inference/Causal Inference
    • Prediction if the world had been different
    • Counterfactual/potential outcome prediction

I’ll give an example in the sports analytics world, specifically basketball (this part is what I will say if I only have 30 seconds):

  • Description
    • Slicing your data to look at the distribution of points per game (or per 100 possessions or whatever) scored by different lineups
  • Prediction
    • Predicting the number of points your team will score in a game given your planned lineups
  • Inference/Causal Inference
    • Prediction of change in points per game if you ran totally new lineups versus the normal lineups
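
If it helps to see the three categories side by side, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the games are made up, the names (`simulate_game`, `predict_points`, the "new" and "normal" lineups) are hypothetical, and the +3 point effect is built in by construction. In the simulation the coach leans on the new lineup against strong opponents, so the raw comparison of lineups and the actual counterfactual difference disagree.

```python
import random
import statistics

random.seed(0)


def simulate_game():
    """One hypothetical game. All numbers are made up for illustration."""
    strong_opponent = random.random() < 0.5
    # Assumption: the coach favors the new lineup against strong opponents,
    # so opponent strength affects both the lineup choice and the score.
    p_new = 0.8 if strong_opponent else 0.2
    lineup = "new" if random.random() < p_new else "normal"
    base = 90 if strong_opponent else 100
    noise = random.gauss(0, 5)
    # Potential outcomes: points in this same game under each lineup.
    points_if_normal = base + noise
    points_if_new = base + 3 + noise  # true effect is +3 by construction
    observed = points_if_new if lineup == "new" else points_if_normal
    return {
        "strong": strong_opponent,
        "lineup": lineup,
        "points": observed,
        "if_normal": points_if_normal,
        "if_new": points_if_new,
    }


games = [simulate_game() for _ in range(2000)]


def subset(lineup, strong=None):
    """Games played with a given lineup, optionally by opponent strength."""
    return [
        g for g in games
        if g["lineup"] == lineup and (strong is None or g["strong"] == strong)
    ]


def mean_points(rows):
    return statistics.mean(g["points"] for g in rows)


# Description: summarize observed points by lineup.
description = {name: mean_points(subset(name)) for name in ("normal", "new")}


# Prediction: map inputs (lineup, opponent strength) to expected points,
# here using nothing fancier than the mean of matching past games.
def predict_points(lineup, strong):
    return mean_points(subset(lineup, strong))


# Causal inference: what would have changed if the lineup had been different?
# Only a simulation lets us see both potential outcomes for the same game.
true_effect = statistics.mean(g["if_new"] - g["if_normal"] for g in games)
# The raw difference in observed means mixes the lineup effect with opponent strength.
naive_diff = description["new"] - description["normal"]
# Comparing within opponent-strength groups, then averaging the two groups
# (they are equally likely in this simulation), adjusts for that.
adjusted = statistics.mean(
    predict_points("new", s) - predict_points("normal", s) for s in (True, False)
)

print("description:", description)
print("prediction (new lineup, strong opponent):", predict_points("new", True))
print("true effect:", true_effect)
print("naive difference:", naive_diff)
print("adjusted difference:", adjusted)
```

With real data you never get the `if_normal` and `if_new` columns together, which is the whole reason the third category needs its own methods and its own interpretation.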

My day job is working for a tech healthcare company, and the following are the examples I normally use in that world:

  • Description
    • Distributions of patient information for emergency department admissions stratified by length of stay
  • Prediction
    • Predicting length of stay based on patient information present on admission
  • Inference/Causal Inference
    • Prediction of change in length of stay if a chest pain patient had a stress test vs. cardiac catheterization

So, it’s not *all* of statistics. But I think it’s important to understand the different parts of statistics. They have different uses and different interpretations.

More thoughts from the conference

Any time I am at a sports conference there is always the question of “how does one succeed in/break into the field?” Many others have written about this topic, but I’ve started to see a lot of common themes. So….

How to Succeed in Sports Analytics in 30 Seconds

Success in sports analytics/statistics seems to require these 4 abilities:

  • Domain expertise
  • Communication
  • Statistics
  • Coding/programming/CS type skills

Imagine that each area has a max of 10 points. You gotta have at least 5/10 in every category and then like, at least 30 points overall. Yes I am speaking very vaguely. But the point is, you don’t have to be great at everything, but you do have to be great at something and decent at everything.
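
For anyone who wants the vague rule written down, here is a minimal sketch. The function name `meets_bar`, the category keys, and the example scores are all hypothetical; the thresholds (at least 5 in every category, at least 30 overall) are just the rough numbers from the paragraph above, not a real rubric.

```python
CATEGORIES = ("domain", "communication", "statistics", "coding")


def meets_bar(scores, floor=5, total=30):
    """Check the rough rule: decent at everything, strong enough overall.

    `scores` maps each category name to a 0-10 self-rating (hypothetical).
    """
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    decent_everywhere = all(scores[c] >= floor for c in CATEGORIES)
    enough_overall = sum(scores[c] for c in CATEGORIES) >= total
    return decent_everywhere and enough_overall


# Great at two things, decent at the rest: passes.
print(meets_bar({"domain": 5, "communication": 9, "statistics": 9, "coding": 7}))
# Brilliant at three things but below the floor on one: fails.
print(meets_bar({"domain": 10, "communication": 10, "statistics": 10, "coding": 4}))
# Merely decent at everything: fails on the total.
print(meets_bar({"domain": 5, "communication": 5, "statistics": 5, "coding": 5}))
```

Note that with four categories, 30 points works out to an average of 7.5, so the floor of 5 only gets you in the door if you are well above it somewhere else.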

I don’t feel like I actually know that much about basketball or baseball, or any sport really. I didn’t play any sport in college, and generally when I watch games, I’m just enjoying the game. While watching the Red Sox in the playoffs I don’t really pay attention to the distribution of David Price’s pitches, I just enjoy watching him pitch. Hell, I spend more time wondering what Fortnite skins Price has. I’ve been guessing Dark Voyager, but he also seems like the kind of guy to buy a new phone just to get the Galaxy skin. Anyway. I’m not an expert, but I do know enough to talk sensibly about sports and to help people with more expertise refine and sharpen their questions.

And I know statistics. And years of teaching during graduate school helped me get pretty damn good at explaining complicated statistical concepts in ways that most people can understand. Plus I can code (though not as well as others). Sports teams are chock-full of sports experts; they need experts in other areas too.

These four skills are key to succeeding in any sort of analytical job. I’m not a medical expert, but I work with medical experts in my job and complement their skills with my own.

Concluding thoughts from the conference

Man, no matter what a talk is about, there are always the questions/comments of “did you think about this other variable” (yes, but it wasn’t available in the data), “could you do it in this other sport…” (is there data available on that sport?), “what about this one example when the opposite happened?” (-_-), “you need to be clearer about how effects are mediated downstream, there’s no way this is a direct effect even if you’ve controlled for all the confounding” (ok that one’s usually me), etc.

Next time, we are going to make bingo cards.


SSAC17 “Recap”

The weekend has come and gone and so has the 2017 Sloan Sports Analytics Conference. This was the third time I attended the conference and easily the most enjoyable experience I have had to date.

Many others have recapped a lot of the compelling analytics content, so I don’t feel compelled to repeat much of that. Moreover, I don’t have the journalistic abilities yet to condense everything I learned into a nice blog entry. AND I have a proper dissertation committee meeting this week, followed by the ENAR biometrics conference next week. Between the two, I haven’t been burdened with an abundance of time. So here are some thoughts on the conference, which will inevitably spiral into larger thoughts on the field as a whole.

My experience at SSAC this year was a weird mix of trying to see famous people speak, trying to hear interesting analytics/statistics talks, and trying to meet as many people as possible. In previous years I didn’t know anyone and wasn’t thinking seriously about a career in this field, so I prioritized panels with famous speakers. This was great for maximizing entertainment value. But now that I am making a proper attempt to pursue sports analytics as a career, it was clear I needed to actually understand where the field is and where it’s going… while still taking time to see big names where possible. Because who can resist Nate Silver and Mark Cuban or Nate Silver and Adam Silver? It’s clear that experiences at SSAC will vary greatly depending on interests and goals.

It’s also interesting to be at the conference while in a position of actively looking for a job. During almost every conversation I had, I was trying to maintain a balance between a number of potentially conflicting motivations. Mostly, I just wanted to nerd out and talk about sports stats with like-minded people. But I also wanted to make sure the work I am doing is in the right direction and get advice on how to be better. How can I improve my work not just to be better intrinsically, but also to have a bigger impact? And then at a certain point, especially if I was talking to somebody working for a team, I’d think “is this person on a team that is hiring? Would they want to hire me?” I’m better at networking than I used to be, but at the end of the day, I am still a somewhat awkward stats nerd. One big takeaway from the conference for me was that I need to be more aggressive and confident in general. It’s easy to have imposter syndrome. I eventually felt generally okay with the other stats folks, but at a conference with a lot of MBAs, it can be intimidating to talk to new people. Especially since I was in the minority at SSAC.

Yep, I’m going to talk about diversity for a minute. There are a lot of men at Sloan. A lot of white men. And of the women who are there, few are statisticians. I was lucky enough to meet Diana Ma who does analytics for the Indiana Pacers. We hugged out of sheer joy of finding another woman in sports stats. Diana is the first woman I have met in person who works for a team in any sport. I’ve been in STEM for most of my life, and I’m used to being in scenarios that are majority white male, but SSAC takes the cake. Conference attendees are, for the most part, aware of the demographic disparities. Not just about the lack of women, but the lack of any other minorities. And there are always conversations about how to increase the diversity of the conference and the field overall. I don’t have a good answer, but I’m glad people (including Daryl Morey) are talking about it.

Side note to the jerks on Twitter, and elsewhere, questioning why diversity is important – this is for you. Even if you want to argue that diversity adds nothing to the end product, equality is important. Not everyone who had interest in the conference had access. And not everyone who might have had interest had access to resources to foster that growing interest.

Moreover, I distinctly remember being at SSAC in 2015 and hearing somebody say that women shouldn’t bother with this field, because it is such a man’s world. I can’t remember if it was on a panel or a conversation I overheard, but it struck a huge chord with me and was a large part of why I eschewed the field for so long. Fortunately, I am lucky enough to have incredibly supportive friends, family, and mentors.

Which brings me to a final, big takeaway from the conference this year. Success in sports analytics has a large component of luck. From the family into which you were born, to the school you attended and the TA you happen to have for a class, to who re-tweets you, to who you randomly happen to be sitting next to at a panel. Don’t get me wrong, you also need skill. You need to be good enough that when you are lucky enough to make a connection or have your blog post re-tweeted, people find value in it and pay attention.

Our entire careers are about quantifying uncertainty and randomness in the data we examine; we should acknowledge the randomness in our lives.

Anyway. I met a lot of really awesome people. I’m going to avoid trying to name everyone, because I’m sure I’ll forget somebody and then feel bad. But needless to say, everyone was friendly, smart, and incredibly welcoming. I wish the conference were a few days longer so things wouldn’t be so rushed, interesting panels wouldn’t overlap, and I’d have more time to chat with everyone. It’s all well and good to talk over email or the phone, but in-person conversations are ideal. Maybe next year I just won’t sleep.

I hope everyone makes it out to NESSIS in September.


Random thoughts

  • Does specializing in a sport early really increase the risk of injury later in life? I think so. So do a lot of other people. But I also spoke to some folks this weekend who don’t buy it. Awesome. Let’s run the numbers. And then do it a few more times to make sure we have reproducible work.
  • I was on the Hot Takedown live recording. I may or may not have totally whiffed a question about the Warriors. Caught the tail end of John Urschel’s segment. He’s great.
  • Highlight of the weekend was a ~20 minute 1-on-1 conversation with John Urschel. We talked about causal inference, Voronoi diagrams, and Super Bowl win probability models. He is a giant nerd and an incredibly warm person.
  • Mark Cuban does not like Donald Trump.
  • I love when athletes are on panels. Especially random additions. It’s nice to see Sue Bird and Shane Battier every year, but they are used to it by now. Luis Scola was a last-minute addition to a few panels, and he gave thoughtful insights into how players use analytics.
  • Luis Scola is a very tall man. So were a lot of the men at this conference. I am 5’4″.
  • I know I want to pursue a career in sports analytics/statistics, but I have no idea of the best avenue. Should I try to join a team? The NBA? An independent company? Should I go get a regular job that will pay more and/or be less time intensive and pursue my own projects on the side? Which of these paths makes the most impact from the diversity side of things?
  • I wonder if Bob Myers would be my agent when negotiating a job offer.
  • Zach Lowe’s voice is as enjoyable in person as it is on his podcast.
  • I still have yet to meet/introduce myself to Mike Zarren. Which is insane given the number of events I have been at with him, my love for the Celtics, and the fact that I personally know another member of the Celtics’ analytics team. At this point, I almost want to see if I can meet Brad Stevens and Danny Ainge before meeting Mike.
  • The name tags this year were not conducive to reading names. I wonder if that was intentional.
  • Hynes >>>>>> BCEC
  • Were we supposed to get two drink tickets? I only got one. But I feel like I got two last year.
  • Years ago I did a project on optimal strategy for penalty kicks in the World Cup. I should update that.
  • I have so many ideas for projects. So many. But this pesky PhD thing is going to get in the way for a few months. I’ll be able to put some stuff out, but school is going to take priority for a while.
  • I love sports analytics so much.