Park Factors

Oh, it feels good to be back. Yesterday, I spent a full day at the office programming, and then programmed for a few more hours once I got home. I had been feeling “stuck” at a job I stopped liking, so it’s been a while since I’ve felt that passion for something I was working on. It feels good to want to continue my work into the wee hours of the night.

What was I working on? Calculating park factors from Retrosheet files. I spent the last day and a half coming up with a scheme to update park factors with each run that is scored.
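A rough sketch of what a per-run update might look like follows. It is not the actual scheme, which isn't spelled out here. It assumes the simplest run-based park factor: runs per game (both teams combined) in a team's home games, divided by runs per game in that team's road games. The names `ParkTotals`, `ParkFactorTracker`, `start_game`, and `record_run` are hypothetical, as are the team codes in the usage note.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ParkTotals:
    """Running totals for the team that calls one park home (assumed model)."""
    home_runs: int = 0   # runs by both teams in this team's home games
    home_games: int = 0
    road_runs: int = 0   # runs by both teams in this team's road games
    road_games: int = 0

    def park_factor(self):
        # Undefined until both the home and road rates can be computed.
        if self.home_games == 0 or self.road_games == 0 or self.road_runs == 0:
            return None
        home_rate = self.home_runs / self.home_games
        road_rate = self.road_runs / self.road_games
        return home_rate / road_rate


class ParkFactorTracker:
    """Keeps the totals current so the factor can be read after any run."""

    def __init__(self):
        self.totals = defaultdict(ParkTotals)

    def start_game(self, home_team, away_team):
        # Count the game once, when it begins.
        self.totals[home_team].home_games += 1
        self.totals[away_team].road_games += 1

    def record_run(self, home_team, away_team, runs=1):
        # Every run counts toward the home team's home totals and the
        # visiting team's road totals, whichever side scored it.
        self.totals[home_team].home_runs += runs
        self.totals[away_team].road_runs += runs
        return self.totals[home_team].park_factor()
```

Calling `start_game("NYA", "BOS")` and then `record_run("NYA", "BOS")` for each run keeps the totals current at all times. Since only four counters per team are stored, each update is a couple of additions and one division, which is what makes updating on every run plausible at all.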

Retrosheet files cover completed games from past seasons, so one doesn’t necessarily have to update park factors with each event. Instead, you could parse the events from these files and calculate each stadium’s park factor once a full season’s worth of event files has been parsed. Of course, this would be a faster and easier method, but …

I plan on turning this application into something more than just a database derived from Retrosheet files. At some point, I hope to be gathering stats from a live feed, or generating my own. When that day comes, I not only want to insert these events into a database in real time, I want to update various advanced metrics in “real time” as well.

Clearly, this is no small task. Take, for example, FIP-based pitching WAR. Suppose C.C. Sabathia is pitching in Yankee Stadium. Obviously, the events in this game are going to affect Sabathia’s ERA, FIP, and WAR, as well as the park factor of Yankee Stadium. But the change in Yankee Stadium’s park factor also affects A.J. Burnett’s WAR, because he too pitches at “The Stadium” (btw, I am not a Yankee fan, this is just an example). These events also change the league averages for the AL, which in turn affect the WAR of every pitcher in the AL.
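To make the fan-out concrete, here is a small sketch that lists whose WAR goes stale after a single event. It assumes a deliberately simplified model in which a pitcher's WAR depends on his league's averages and on the park factor of every park he has pitched in. The function `stale_pitchers`, the `appearances` mapping, and the placeholder pitchers other than Sabathia and Burnett are all hypothetical.

```python
def stale_pitchers(park, league, appearances):
    """Return the pitchers whose WAR needs recalculating after an event.

    `appearances` maps a pitcher's name to (league, set of parks pitched in).
    A pitcher is stale if he pitches in the affected league (the league
    averages moved) or has pitched in the affected park (its factor moved).
    """
    stale = set()
    for pitcher, (pitcher_league, parks) in appearances.items():
        if pitcher_league == league or park in parks:
            stale.add(pitcher)
    return stale


# Hypothetical data: two real names from the example, three placeholders.
appearances = {
    "C.C. Sabathia": ("AL", {"Yankee Stadium"}),
    "A.J. Burnett": ("AL", {"Yankee Stadium"}),
    "AL pitcher elsewhere": ("AL", {"Fenway Park"}),
    "NL interleague visitor": ("NL", {"Yankee Stadium", "Citi Field"}),
    "NL pitcher elsewhere": ("NL", {"Citi Field"}),
}

stale = stale_pitchers("Yankee Stadium", "AL", appearances)
print(sorted(stale))
```

Under this model, one run in the Bronx touches every AL pitcher plus any NL pitcher who has visited Yankee Stadium, so only the NL pitcher who never pitched there is spared.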

The less complicated way to handle this would be to have a cron job run after every game is completed and recalculate the league averages and park factors, as well as the ERA, FIP, and WAR of every pitcher. But c’mon! This is 2009, and people want things in “real time”.

I don’t know what the solution is yet … I’m thinking something involving Amazon’s Simple Queue Service. Whatever the solution is, if it’s even feasible, I’m sure it’ll be fun trying to get there.

