Working with the Lahman Baseball Database

March 1, 2012

I often get emails from folks who have downloaded the database and aren’t quite sure what to do with it.  Many of them are expecting a simple spreadsheet and aren’t familiar with what a relational database is or how to work with one.  They wonder why there are no player names in the batting table, for example.

For some of those people, this database is probably more than they need, and it’s not worth their time to learn the skills necessary to work with the database in Microsoft Access, or to learn some basic SQL.  I point those people to the Play Index at Baseball-Reference, the downloadable Myers app, or other sources that will allow them to monkey with the numbers without a crash course in computer science.

But for many people, my baseball database is a great excuse for digging into SQL or programming languages, and I want to use this post to suggest some resources that might help.
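To make the batting-table question concrete: in a relational database, names are stored once in a separate table of players, and the batting rows refer to each player only by an ID.  A join on that ID puts the two together.  Here is a minimal sketch in Python using the built-in sqlite3 module.  The table and column names (Master, Batting, playerID, nameFirst, nameLast, yearID, HR) are my assumptions about the layout, so check them against the documentation in your download, and the two rows are a tiny hand-typed sample rather than the real import.

```python
import sqlite3


def top_home_run_seasons(conn, limit=5):
    """Return (first name, last name, year, HR) rows, highest HR totals first."""
    # Batting rows carry only playerID; the names come from the joined Master table.
    query = """
        SELECT m.nameFirst, m.nameLast, b.yearID, b.HR
        FROM Batting AS b
        JOIN Master AS m ON m.playerID = b.playerID
        ORDER BY b.HR DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# Build a tiny in-memory stand-in for the real database (assumed table layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Master (playerID TEXT PRIMARY KEY, nameFirst TEXT, nameLast TEXT)")
conn.execute("CREATE TABLE Batting (playerID TEXT, yearID INTEGER, HR INTEGER)")
conn.executemany(
    "INSERT INTO Master VALUES (?, ?, ?)",
    [("ruthba01", "Babe", "Ruth"), ("gehrilo01", "Lou", "Gehrig")],
)
conn.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?)",
    [("ruthba01", 1927, 60), ("gehrilo01", 1927, 47)],
)

rows = top_home_run_seasons(conn)
for first, last, year, hr in rows:
    print(first, last, year, hr)
```

The same SELECT statement should carry over to MySQL or Access with little or no change once the real tables are loaded; only the connection code would differ.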

One of the best tutorials I’ve seen was written by Colin Wyers for The Hardball Times.  It walks through the steps of installing MySQL and importing the baseball database.

There’s also a nice book by Joe Adler called Baseball Hacks, although it’s a bit dated now.

Here are a handful of other resources for people interested in crunching baseball statistics:


