Calculating FIP and WAR from Retrosheet Data, Part 1

Ok, I finally have the beginnings (Finally have the beginnings? That doesn’t sound right.) of FIP and WAR calculations for starting pitchers. After parsing pitching data out of retrosheet files from the 2007 season, I calculated FIP and WAR for SP on a per-game level. By taking a weighted average of these game-level FIP and WAR values, one can derive a pitcher’s FIP and WAR for an entire season.

In a previous post, I mentioned my intent to update such metrics after parsing each event from a retrosheet file or a stats feed. Though I intend to eventually update these metrics in “real-time”, I am currently calculating FIP and WAR after parsing an entire season’s worth of data. For now, I want to get these values into a database and make sure my math is correct. Once done, I’ll go back and optimize things so that FIP, WAR, and other stats can update with each relevant event.

I’ll now go through the process in detail, using James Shields’ 2007 season as an example. Before going through the details, here are a few disclaimers:

  1. This is my first attempt at calculating FIP and WAR, and I am by no means an expert on the process. I make a few assumptions, which I will explain, that may be incorrect. If I find out that any of these assumptions are wrong, I’ll go back and update this post with the corrections.
  2. MySQL tables are NOT optimized for displaying leaderboards or other practical uses. Optimizing tables will be an ongoing process.

My parsing script places a starting pitcher’s stats for an individual game into a table with the following structure:

CREATE TABLE `pitcher_game_start` (
  `rs_game_id` char(12) NOT NULL,
  `rs_player_id` char(8) NOT NULL,
  `rs_park_id` char(5) NOT NULL,
  `rs_league_id` char(2) NOT NULL,
  `rs_team_id` char(3) NOT NULL,
  `year` smallint(6) NOT NULL,
  `HR` tinyint(4) NOT NULL,
  `BB` tinyint(4) NOT NULL,
  `HBP` tinyint(4) NOT NULL,
  `IBB` tinyint(4) NOT NULL,
  `SO` tinyint(4) NOT NULL,
  `ER` tinyint(4) NOT NULL,
  `R` tinyint(4) NOT NULL,
  `IP_outs` tinyint(4) NOT NULL,
  PRIMARY KEY  (`rs_player_id`,`rs_game_id`),
  KEY `year_league` (`year`,`rs_league_id`),
  KEY `park_year` (`rs_park_id`,`year`),
  KEY `game_id` (`rs_game_id`),
  KEY `pylt` (`rs_player_id`,`year`,`rs_league_id`,`rs_team_id`),
  KEY `tly` (`rs_team_id`,`rs_park_id`,`year`),
  KEY `ty` (`rs_team_id`,`year`)
) ENGINE=MyISAM

A field of significant interest is `rs_league_id`. This field describes the league of the home team. For interleague games, the home team’s league will be used for the pitchers on both sides. Additionally, the league average ERA and RA, both used in calculating FIP and WAR, will be based on the home team’s league.

Here are Shields’ rows in `pitcher_game_start` for 2007:

rs_game_id rs_player_id rs_park_id rs_league_id rs_team_id year HR BB HBP IBB SO ER R IP_outs
SEA200709140 shiej002 SEA03 AL TBA 2007 0 2 0 0 5 1 1 24
TBA200709090 shiej002 STP01 AL TBA 2007 1 0 0 0 5 2 2 21
TBA200709030 shiej002 STP01 AL TBA 2007 0 1 0 0 6 1 4 21
BAL200708290 shiej002 BAL12 AL TBA 2007 1 2 0 0 6 4 4 24
TBA200708240 shiej002 STP01 AL TBA 2007 1 1 0 0 8 2 2 25
TBA200708190 shiej002 STP01 AL TBA 2007 0 1 0 0 6 2 2 18
BOS200708130 shiej002 BOS07 AL TBA 2007 0 0 0 0 5 1 1 18
DET200708080 shiej002 DET05 AL TBA 2007 1 0 0 0 9 1 1 21
TBA200708030 shiej002 STP01 AL TBA 2007 1 3 1 0 6 3 3 20
TBA200707280 shiej002 STP01 AL TBA 2007 0 1 1 0 7 3 3 15
NYA200707220 shiej002 NYC16 AL TBA 2007 1 4 0 0 1 10 10 10
TBA200707170 shiej002 STP01 AL TBA 2007 0 1 0 0 2 2 2 21
TBA200707120 shiej002 STP01 AL TBA 2007 3 1 0 0 2 5 6 18
KCA200707060 shiej002 KAN06 AL TBA 2007 0 0 1 0 6 4 4 23
CLE200707010 shiej002 CLE08 AL TBA 2007 1 0 1 0 10 2 3 19
TBA200706260 shiej002 STP01 AL TBA 2007 2 0 0 0 11 5 5 21
ARI200706200 shiej002 PHO01 NL TBA 2007 2 0 0 0 4 6 6 15
COL200706150 shiej002 DEN02 NL TBA 2007 2 1 0 0 2 5 5 18
FLO200706090 shiej002 MIA01 NL TBA 2007 1 0 2 0 9 2 2 21
TBA200706040 shiej002 STP01 AL TBA 2007 2 1 0 0 5 2 2 22
TBA200705300 shiej002 STP01 AL TBA 2007 0 1 0 0 5 3 3 27
CHA200705250 shiej002 CHI12 AL TBA 2007 1 3 2 0 2 4 4 21
TBA200705200 shiej002 STP01 AL TBA 2007 0 3 0 0 8 1 1 21
TBA200705150 shiej002 LBV01 AL TBA 2007 1 0 0 0 5 3 3 24
BAL200705090 shiej002 BAL12 AL TBA 2007 0 1 0 0 4 0 0 27
TBA200705030 shiej002 STP01 AL TBA 2007 0 2 1 0 8 3 3 22
OAK200704270 shiej002 OAK01 AL TBA 2007 1 1 0 0 9 1 1 24
TBA200704220 shiej002 STP01 AL TBA 2007 1 1 0 0 12 2 2 24
TBA200704160 shiej002 STP01 AL TBA 2007 2 2 0 0 5 5 5 19
TEX200704110 shiej002 ARL02 AL TBA 2007 3 1 0 0 8 4 5 21
TBA200704060 shiej002 STP01 AL TBA 2007 0 2 1 0 3 3 3 20

Next, I created a table for league constants with the following structure:

CREATE TABLE `league_constants` (
  `rs_league_id` char(2) NOT NULL,
  `year` smallint(6) NOT NULL,
  `league_constant_name` varchar(32) NOT NULL,
  `league_constant_value` float(8,3) NOT NULL DEFAULT ’0.000′,
  PRIMARY KEY  (`rs_league_id`,`year`,`league_constant_name`(8)),
  KEY `year` (`year`)
) ENGINE=MyISAM

The constants I am concerned with for these calculations are: ERA, RA, and FIP_ERA. ERA and RA are the league average ERA and RA, and FIP_ERA is the constant used to translate the league average ERA to the league average FIP. I ran the following script to populate the table with said constants:

<?php

$sql =
"SELECT  `year`, `rs_league_id`,
  SUM(9*`ER`) / (SUM(`IP_outs`) / 3)  as sp_era,
  SUM(9*`R`) / (SUM(`IP_outs`) / 3)  as sp_ra,
  SUM(9*`ER` + 2*`SO` – 3*(`BB`-`IBB`+`HBP`) – 13*`HR`) / (SUM(`IP_outs`) / 3) AS sp_fip_era
FROM `pitcher_game_start`
FORCE INDEX (`year_league`)
GROUP BY `year`, `rs_league_id`"
;
mysql_query($sql, $link);

$inserts = array();
$res = mysql_query($sql, $link);
while($row = mysql_fetch_assoc($res)){
  $league = $row[‘rs_league_id’];
  $year = $row[‘year’];
  unset($row[‘rs_league_id’]);
  unset($row[‘year’]);
       
       
  foreach($row as $key => $value){
    $inserts[] = "(‘$league‘, ‘$year‘, ‘$key‘, ‘$value‘)";
  }
}

$sql =
"INSERT IGNORE INTO `league_constants` (`rs_league_id`,`year`,`league_constant_name`, `league_constant_value`)
VALUES "
. implode(‘,’, $inserts) . "
ON DUPLICATE KEY UPDATE `league_constant_value` = VALUES(`league_constant_value`)"
;
mysql_query($sql, $link);

And here are the values in the table:

rs_league_id year league_constant_name league_constant_value
AL 2007 sp_era 4.686
AL 2007 sp_ra 5.064
AL 2007 sp_fip_era 3.408
NL 2007 sp_era 4.567
NL 2007 sp_ra 4.909
NL 2007 sp_fip_era 3.280

All league constants are derived from innings pitched solely by starting pitchers. When calculating WAR for RP, probably in a later post, I’ll use another set of constants that averages values from innings pitched by relievers.

Now, we have everything we need to start calculating FIP and WAR for each game. I used Dave Cameron’s 7-part description of WAR on fangraphs as a guide. I created the following table to store game details for starting pitcher’s along with calculated FIPs and WARs:

CREATE TABLE `pitcher_game_start_fip` (
  `rs_game_id` char(12) NOT NULL,
  `rs_player_id` char(8) NOT NULL,
  `rs_park_id` char(5) NOT NULL,
  `rs_league_id` char(2) NOT NULL,
  `rs_team_id` char(3) NOT NULL,
  `year` smallint(6) NOT NULL,
  `IP_outs` tinyint(4) NOT NULL,
  `PF` float(5,3) NOT NULL DEFAULT ’1.000′,
  `fip_era_const` float(5,3) NOT NULL DEFAULT ’0.000′,
  `avg_era` float(5,3) NOT NULL DEFAULT ’0.000′,
  `avg_ra` float(5,3) NOT NULL DEFAULT ’0.000′,
  `ERA` float(4,2) NOT NULL,
  `base_fip` float(4,2) NOT NULL,
  `FIP` float(4,2) NOT NULL,
  `fip_ra` float(4,2) NOT NULL,
  `fip_ra_prk` float(4,2) NOT NULL,
  `R_per_W` float(5,3) NOT NULL DEFAULT ’0.000′,
  `RAR` float(4,2) NOT NULL,
  `WAR` float(4,2) NOT NULL,
  PRIMARY KEY  (`rs_player_id`,`rs_game_id`),
  KEY `year_league` (`year`,`rs_league_id`),
  KEY `park_year` (`rs_park_id`,`year`),
  KEY `game_id` (`rs_game_id`),
  KEY `pylt` (`rs_player_id`,`year`,`rs_league_id`,`rs_team_id`),
  KEY `tly` (`rs_team_id`,`rs_park_id`,`year`),
  KEY `ty` (`rs_team_id`,`year`)
) ENGINE=MyISAM

The first step is to insert the general game info, as well as PF, ERA, and base_fip columns into the table.

  • PF (park factor) comes from another table called, park_effects.
  • If not already obvious, ERA = (ER*9) / (IP_outs/3)
  • base_fip = (-2*SO + 3*(BB-IBB+HBP) + 13*HR) / (IP_outs/3)

I used the following code snippet to populate those values:

for($i = 2007; $i <= 2007; $i++){

  .
  .
  .
  .
       
  //basic info, ERA, base_fip, PF
  $sql =
  "INSERT IGNORE INTO `pitcher_game_start_fip` (`rs_game_id`, `rs_player_id`, `rs_park_id`, `rs_league_id`, `rs_team_id`, `year`, `IP_outs`, `PF`, `ERA`, `base_fip`)
  SELECT
    s.`rs_game_id`, s.`rs_player_id`, s.`rs_park_id`, s.`rs_league_id`, s.`rs_team_id`, s.`year`, s.`IP_outs`,
    p.`PF`,
    (s.`ER`*9) / (s.`IP_outs`/3) as ERA,
    (-2*s.`SO` + 3*(s.`BB`-s.`IBB`+s.`HBP`) + 13*s.`HR`) / (s.`IP_outs`/3) as base_fip
    FROM `pitcher_game_start` s
    LEFT JOIN `games` g ON s.`rs_game_id` = g.`rs_game_id`
    LEFT JOIN `park_effects` p ON s.`rs_park_id` = p.`rs_park_id` AND s.`year` = p.`year` AND g.`rs_home_team_id` = p.`rs_team_id`
    WHERE s.`year` = $i"
;
  mysql_query($sql, $link);

        .
        .
        .
        .
}

Next, I populated the table with the appropriate values for league average ERA, league average RA, and the league constant used to translate ERA to FIP. I used values from the `league_constants` table, calculated earlier, with the following piece of code:

for($i = 2007; $i <= 2007; $i++){
  //get league constants
  $league_consts = array();
  $sql = "SELECT * FROM `league_constants` WHERE `year` = $i";
  $res = mysql_query($sql, $link);
  while($row = mysql_fetch_assoc($res)){
    $league_consts[$row[‘rs_league_id’]][$row[‘league_constant_name’]] = $row[‘league_constant_value’];
  }

  .
  .
  .
  .
       
  //fip_era_const, avg_era, avg_ra
  foreach($league_consts as $league_id => $consts){
    $sql =
    "UPDATE `pitcher_game_start_fip`
      SET
        `fip_era_const` = ‘"
.$consts[‘sp_fip_era’]."’,
        `avg_era` = ‘"
.$consts[‘sp_era’]."’,
        `avg_ra` = ‘"
.$consts[‘sp_ra’]."’
      WHERE
        `year` = ‘.$i.’ AND
        `rs_league_id` = ‘"
.$league_id."’";
    mysql_query($sql, $link);
  }
       
  .
  .
  .
  .
}

And now, calculate FIP and WAR for the starting pitcher of each game. Here are the necessary formulas:

  1. Calculate FIP by adding the league FIP constant to the pitcher’s base FIP:
    FIP = base_fip + fip_era_const
  2. Turn FIP into a value reflecting runs rather than earned runs. To do so, multiply FIP by the league RA/ERA ratio:
    fip_ra = FIP * (avg_ra/avg_era)
  3. Adjust fip_ra to the park setting:
    fip_ra_prk = fip_ra / PF
  4. Calculate how many runs are required to win in the current setting. This is formula comes from Tango’s +2*1.5 formula: ((League RA + Pitcher’s RA)/2)+2)*1.5
    R_per_W = (((((54-IP_outs)*avg_ra) + (IP_outs*fip_ra_prk)) / 54) + 2) * 1.5
  5. And finally, calculate WAR. An average starter has a winning% of .500. A replacement level starter has a winning percentage of .380% (.12 less than .500). So, if we calculate the pitcher’s wins above replacement, multiplied by IP/9 inning game, we get WAR for the game in question:
    WAR = ((.12 + ((avg_ra – fip_ra_prk) / R_per_W)) * (IP_outs/3)) / 9

Here’s the last part of the script, making the above calculations.

for($i = 2007; $i <= 2007; $i++){
  .
  .
  .
  .    
  //calculate FIP, fip_ra, fip_ra_prk, R_per_W, WAR, RAR
  $sql =
  "UPDATE `pitcher_game_start_fip`
    SET
      `FIP` = @fip:= (`base_fip` + `fip_era_const`),
      `fip_ra` = @fip_ra:= ((@fip * avg_ra) / avg_era),
      `fip_ra_prk` = @frp:= (@fip_ra / `PF`),
      `R_per_W` = @rpw:= (((  ( ((54-`IP_outs`)*`avg_ra`) + (`IP_outs`*@frp) ) / 54) + 2) * 1.5),
      `WAR` = @war:= (((.12 + ((`avg_ra` – @frp) / @rpw)) * (`IP_outs`/3)) / 9),
      `RAR` = @war * @rpw
    WHERE `year` = "
.$i;
  mysql_query($sql, $link);
}

And here are the results for starts by James Shields:

rs_game_id rs_player_id rs_park_id rs_league_id rs_team_id year IP_outs PF fip_era_const avg_era avg_ra fip_ra_const ERA base_fip FIP fip_ra fip_ra_prk R_per_W RAR WAR
SEA200709140 shiej002 SEA03 AL TBA 2007 24 0.948 3.408 4.686 5.064 3.786 1.12 -0.50 2.91 3.14 3.31 9.430 2.56 0.27
TBA200709090 shiej002 STP01 AL TBA 2007 21 0.876 3.408 4.686 5.064 3.786 2.57 0.43 3.84 4.15 4.73 10.404 1.23 0.12
TBA200709030 shiej002 STP01 AL TBA 2007 21 0.876 3.408 4.686 5.064 3.786 1.29 -1.29 2.12 2.29 2.61 9.166 2.76 0.30
BAL200708290 shiej002 BAL12 AL TBA 2007 24 1.109 3.408 4.686 5.064 3.786 4.50 0.88 4.29 4.63 4.18 10.006 1.85 0.19
TBA200708240 shiej002 STP01 AL TBA 2007 25 0.876 3.408 4.686 5.064 3.786 2.16 0.00 3.41 3.68 4.20 9.999 1.91 0.19
TBA200708190 shiej002 STP01 AL TBA 2007 18 0.876 3.408 4.686 5.064 3.786 3.00 -1.50 1.91 2.06 2.35 9.241 2.55 0.28
BOS200708130 shiej002 BOS07 AL TBA 2007 18 1.177 3.408 4.686 5.064 3.786 1.50 -1.67 1.74 1.88 1.60 8.862 3.02 0.34
DET200708080 shiej002 DET05 AL TBA 2007 21 1.051 3.408 4.686 5.064 3.786 1.29 -0.71 2.70 2.92 2.77 9.260 2.65 0.29
TBA200708030 shiej002 STP01 AL TBA 2007 20 0.876 3.408 4.686 5.064 3.786 4.05 1.95 5.36 5.79 6.61 11.455 -0.13 -0.01
TBA200707280 shiej002 STP01 AL TBA 2007 15 0.876 3.408 4.686 5.064 3.786 5.40 -1.60 1.81 1.95 2.23 9.415 2.20 0.23
NYA200707220 shiej002 NYC16 AL TBA 2007 10 1.070 3.408 4.686 5.064 3.786 27.00 6.90 10.31 11.14 10.41 12.081 -1.44 -0.12
TBA200707170 shiej002 STP01 AL TBA 2007 21 0.876 3.408 4.686 5.064 3.786 2.57 -0.14 3.27 3.53 4.03 9.994 1.74 0.17
TBA200707120 shiej002 STP01 AL TBA 2007 18 0.876 3.408 4.686 5.064 3.786 7.50 6.33 9.74 10.52 12.01 14.071 -3.51 -0.25
KCA200707060 shiej002 KAN06 AL TBA 2007 23 1.033 3.408 4.686 5.064 3.786 4.70 -1.17 2.24 2.42 2.34 8.856 3.22 0.36
CLE200707010 shiej002 CLE08 AL TBA 2007 19 1.132 3.408 4.686 5.064 3.786 2.84 -0.63 2.78 3.00 2.65 9.323 2.48 0.27
TBA200706260 shiej002 STP01 AL TBA 2007 21 0.876 3.408 4.686 5.064 3.786 6.43 0.57 3.98 4.30 4.91 10.505 1.10 0.10
ARI200706200 shiej002 PHO01 NL TBA 2007 15 1.111 3.280 4.567 4.909 3.622 10.80 3.60 6.88 7.40 6.66 11.092 -0.23 -0.02
COL200706150 shiej002 DEN02 NL TBA 2007 18 1.160 3.280 4.567 4.909 3.622 7.50 4.17 7.45 8.01 6.90 11.361 -0.42 -0.04
FLO200706090 shiej002 MIA01 NL TBA 2007 21 1.068 3.280 4.567 4.909 3.622 2.57 0.14 3.42 3.68 3.44 9.508 2.03 0.21
TBA200706040 shiej002 STP01 AL TBA 2007 22 0.876 3.408 4.686 5.064 3.786 2.45 2.59 6.00 6.48 7.40 12.023 -0.73 -0.06
TBA200705300 shiej002 STP01 AL TBA 2007 27 0.876 3.408 4.686 5.064 3.786 3.00 -0.78 2.63 2.84 3.24 9.229 2.93 0.32
CHA200705250 shiej002 CHI12 AL TBA 2007 21 1.084 3.408 4.686 5.064 3.786 5.14 3.43 6.84 7.39 6.82 11.619 -0.28 -0.02
TBA200705200 shiej002 STP01 AL TBA 2007 21 0.876 3.408 4.686 5.064 3.786 1.29 -1.00 2.41 2.60 2.97 9.375 2.50 0.27
TBA200705150 shiej002 LBV01 AL TBA 2007 24 1.257 3.408 4.686 5.064 3.786 3.38 0.38 3.79 4.09 3.26 9.391 2.61 0.28
BAL200705090 shiej002 BAL12 AL TBA 2007 27 1.109 3.408 4.686 5.064 3.786 0.00 -0.56 2.85 3.08 2.78 8.879 3.35 0.38
TBA200705030 shiej002 STP01 AL TBA 2007 22 0.876 3.408 4.686 5.064 3.786 3.68 -0.95 2.46 2.66 3.03 9.354 2.57 0.27
OAK200704270 shiej002 OAK01 AL TBA 2007 24 0.833 3.408 4.686 5.064 3.786 1.12 -0.25 3.16 3.41 4.10 9.951 1.92 0.19
TBA200704220 shiej002 STP01 AL TBA 2007 24 0.876 3.408 4.686 5.064 3.786 2.25 -1.00 2.41 2.60 2.97 9.200 2.84 0.31
TBA200704160 shiej002 STP01 AL TBA 2007 19 0.876 3.408 4.686 5.064 3.786 7.11 3.47 6.88 7.43 8.48 12.402 -1.36 -0.11
TEX200704110 shiej002 ARL02 AL TBA 2007 21 0.979 3.408 4.686 5.064 3.786 5.14 3.71 7.12 7.69 7.86 12.225 -1.03 -0.08
TBA200704060 shiej002 STP01 AL TBA 2007 20 0.876 3.408 4.686 5.064 3.786 4.05 0.45 3.86 4.17 4.76 10.427 1.15 0.11

To get Shields’ WAR for the 2007 season, we can take a weighted average of his starts, using the following query:

mysql> SELECT SUM(WAR) FROM pitcher_game_start_fip WHERE rs_player_id = ‘shiej002′ AND year = 2007;
+———-+
| SUM(WAR) |
+———-+
|     4.74 |
+———-+
1 row IN SET (0.00 sec)

Alternatively, you could calculate the pitcher’s weighted park factor, weighted league ERA, and weighted league RA and use these values with the pitcher’s IP, HR, SO, BB, etc for the season.

Well, there you have it. The next thing to do is to store season FIP’s, WAR’s and other values for more practical comparisons of players. Then, I’ll have to calculate the WAR of relief pitchers, something a bit more complicated.

This entry was posted in Baseball, Tech and tagged , , , , , . Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>