Are you doing data science, or #DataScience?

#DataScience is everywhere – it’s eating the world, promising to solve all of your business problems with a little data and a lot of technology. Unfortunately, it looks like actual data science is being lost somewhere in the hype.


Modeling Strike Zones With Neural Networks Part 3

In part 1 I looked at building a neural network model of a batter strike zone in R. In part 2 I talked about using that model to estimate the top and bottom of that batter’s individual strike zone. At long last, this post will use that information to model an umpire strike zone, which was the whole point all along!

Modeling Strike Zones with Neural Networks Part 2

In part 1 I looked at building a neural network model of a batter strike zone in R. In this post I will show you how to use that model to estimate the boundaries of his personal strike zone. As a reminder, the reasons we want individual batter strike zones are:

  1. Batter heights and stances vary significantly
  2. The PITCHf/x sz_top and sz_bot fields have problems

Ok, so let’s get right to the R!

Modeling strike zones with neural networks

In the 2016 Hardball Times Annual I wrote about evaluating umpire consistency. The analysis implements an idea Tom Tango originally blogged about. Here on my own blog I’m going to go a little more in depth on the underlying methodology and potential improvements. Along the way I’ll also relay some R tips I’ve picked up that are useful for sabermetric analysis, particularly on large datasets like PITCHf/x.

[If you haven’t already read the THT Annual, I strongly encourage you to pick up a copy (available on Amazon). Besides the background on my own work there are 300+ pages of great analysis and research.]
