Ok, I finally have the beginnings (Finally have the beginnings? That doesn’t sound right.) of FIP and WAR calculations for starting pitchers. After parsing pitching data out of retrosheet files from the 2007 season, I calculated FIP and WAR for SP on a per-game level. By taking a weighted average of these game-level FIP and WAR values, one can derive a pitcher’s FIP and WAR for an entire season.
In a previous post, I mentioned my intent to update such metrics after parsing each event from a retrosheet file or a stats feed. Though I intend to eventually update these metrics in “real-time”, I am currently calculating FIP and WAR after parsing an entire season’s worth of data. For now, I want to get these values into a database and make sure my math is correct. Once done, I’ll go back and optimize things so that FIP, WAR, and other stats can update with each relevant event.
I’ll now go through the process in detail, using James Shields’ 2007 season as an example. Before going through the details, here are a few disclaimers:
- This is my first attempt at calculating FIP and WAR, and I am by no means an expert on the process. I make a few assumptions, which I will explain, that may be incorrect. If I find out that any of these assumptions are wrong, I’ll go back and update this post with the corrections.
- MySQL tables are NOT optimized for displaying leaderboards or other practical uses. Optimizing tables will be an ongoing process.
My parsing script places a starting pitcher’s stats for an individual game into a table with the following structure:
CREATE TABLE `pitcher_game_start` (
`rs_game_id` char(12) NOT NULL,
`rs_player_id` char(8) NOT NULL,
`rs_park_id` char(5) NOT NULL,
`rs_league_id` char(2) NOT NULL,
`rs_team_id` char(3) NOT NULL,
`year` smallint(6) NOT NULL,
`HR` tinyint(4) NOT NULL,
`BB` tinyint(4) NOT NULL,
`HBP` tinyint(4) NOT NULL,
`IBB` tinyint(4) NOT NULL,
`SO` tinyint(4) NOT NULL,
`ER` tinyint(4) NOT NULL,
`R` tinyint(4) NOT NULL,
`IP_outs` tinyint(4) NOT NULL,
PRIMARY KEY (`rs_player_id`,`rs_game_id`),
KEY `year_league` (`year`,`rs_league_id`),
KEY `park_year` (`rs_park_id`,`year`),
KEY `game_id` (`rs_game_id`),
KEY `pylt` (`rs_player_id`,`year`,`rs_league_id`,`rs_team_id`),
KEY `tly` (`rs_team_id`,`rs_park_id`,`year`),
KEY `ty` (`rs_team_id`,`year`)
) ENGINE=MyISAM
A field of significant interest is `rs_league_id`. This field describes the league of the home team. For interleague games, the home team’s league will be used for the pitchers on both sides. Additionally, the league average ERA and RA, both used in calculating FIP and WAR, will be based on the home team’s league.
Here are Shields’ rows in `pitcher_game_start` for 2007:
| rs_game_id |
rs_player_id |
rs_park_id |
rs_league_id |
rs_team_id |
year |
HR |
BB |
HBP |
IBB |
SO |
ER |
R |
IP_outs |
| SEA200709140 |
shiej002 |
SEA03 |
AL |
TBA |
2007 |
0 |
2 |
0 |
0 |
5 |
1 |
1 |
24 |
| TBA200709090 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
1 |
0 |
0 |
0 |
5 |
2 |
2 |
21 |
| TBA200709030 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
0 |
1 |
0 |
0 |
6 |
1 |
4 |
21 |
| BAL200708290 |
shiej002 |
BAL12 |
AL |
TBA |
2007 |
1 |
2 |
0 |
0 |
6 |
4 |
4 |
24 |
| TBA200708240 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
1 |
1 |
0 |
0 |
8 |
2 |
2 |
25 |
| TBA200708190 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
0 |
1 |
0 |
0 |
6 |
2 |
2 |
18 |
| BOS200708130 |
shiej002 |
BOS07 |
AL |
TBA |
2007 |
0 |
0 |
0 |
0 |
5 |
1 |
1 |
18 |
| DET200708080 |
shiej002 |
DET05 |
AL |
TBA |
2007 |
1 |
0 |
0 |
0 |
9 |
1 |
1 |
21 |
| TBA200708030 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
1 |
3 |
1 |
0 |
6 |
3 |
3 |
20 |
| TBA200707280 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
0 |
1 |
1 |
0 |
7 |
3 |
3 |
15 |
| NYA200707220 |
shiej002 |
NYC16 |
AL |
TBA |
2007 |
1 |
4 |
0 |
0 |
1 |
10 |
10 |
10 |
| TBA200707170 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
0 |
1 |
0 |
0 |
2 |
2 |
2 |
21 |
| TBA200707120 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
3 |
1 |
0 |
0 |
2 |
5 |
6 |
18 |
| KCA200707060 |
shiej002 |
KAN06 |
AL |
TBA |
2007 |
0 |
0 |
1 |
0 |
6 |
4 |
4 |
23 |
| CLE200707010 |
shiej002 |
CLE08 |
AL |
TBA |
2007 |
1 |
0 |
1 |
0 |
10 |
2 |
3 |
19 |
| TBA200706260 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
2 |
0 |
0 |
0 |
11 |
5 |
5 |
21 |
| ARI200706200 |
shiej002 |
PHO01 |
NL |
TBA |
2007 |
2 |
0 |
0 |
0 |
4 |
6 |
6 |
15 |
| COL200706150 |
shiej002 |
DEN02 |
NL |
TBA |
2007 |
2 |
1 |
0 |
0 |
2 |
5 |
5 |
18 |
| FLO200706090 |
shiej002 |
MIA01 |
NL |
TBA |
2007 |
1 |
0 |
2 |
0 |
9 |
2 |
2 |
21 |
| TBA200706040 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
2 |
1 |
0 |
0 |
5 |
2 |
2 |
22 |
| TBA200705300 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
0 |
1 |
0 |
0 |
5 |
3 |
3 |
27 |
| CHA200705250 |
shiej002 |
CHI12 |
AL |
TBA |
2007 |
1 |
3 |
2 |
0 |
2 |
4 |
4 |
21 |
| TBA200705200 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
0 |
3 |
0 |
0 |
8 |
1 |
1 |
21 |
| TBA200705150 |
shiej002 |
LBV01 |
AL |
TBA |
2007 |
1 |
0 |
0 |
0 |
5 |
3 |
3 |
24 |
| BAL200705090 |
shiej002 |
BAL12 |
AL |
TBA |
2007 |
0 |
1 |
0 |
0 |
4 |
0 |
0 |
27 |
| TBA200705030 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
0 |
2 |
1 |
0 |
8 |
3 |
3 |
22 |
| OAK200704270 |
shiej002 |
OAK01 |
AL |
TBA |
2007 |
1 |
1 |
0 |
0 |
9 |
1 |
1 |
24 |
| TBA200704220 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
1 |
1 |
0 |
0 |
12 |
2 |
2 |
24 |
| TBA200704160 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
2 |
2 |
0 |
0 |
5 |
5 |
5 |
19 |
| TEX200704110 |
shiej002 |
ARL02 |
AL |
TBA |
2007 |
3 |
1 |
0 |
0 |
8 |
4 |
5 |
21 |
| TBA200704060 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
0 |
2 |
1 |
0 |
3 |
3 |
3 |
20 |
Next, I created a table for league constants with the following structure:
CREATE TABLE `league_constants` (
`rs_league_id` char(2) NOT NULL,
`year` smallint(6) NOT NULL,
`league_constant_name` varchar(32) NOT NULL,
`league_constant_value` float(8,3) NOT NULL DEFAULT ’0.000′,
PRIMARY KEY (`rs_league_id`,`year`,`league_constant_name`(8)),
KEY `year` (`year`)
) ENGINE=MyISAM
The constants I am concerned with for these calculations are: ERA, RA, and FIP_ERA. ERA and RA are the league average ERA and RA, and FIP_ERA is the constant used to translate the league average ERA to the league average FIP. I ran the following script to populate the table with said constants:
<?php
$sql =
"SELECT `year`, `rs_league_id`,
SUM(9*`ER`) / (SUM(`IP_outs`) / 3) as sp_era,
SUM(9*`R`) / (SUM(`IP_outs`) / 3) as sp_ra,
SUM(9*`ER` + 2*`SO` – 3*(`BB`-`IBB`+`HBP`) – 13*`HR`) / (SUM(`IP_outs`) / 3) AS sp_fip_era
FROM `pitcher_game_start`
FORCE INDEX (`year_league`)
GROUP BY `year`, `rs_league_id`";
mysql_query($sql, $link);
$inserts = array();
$res = mysql_query($sql, $link);
while($row = mysql_fetch_assoc($res)){
$league = $row[‘rs_league_id’];
$year = $row[‘year’];
unset($row[‘rs_league_id’]);
unset($row[‘year’]);
foreach($row as $key => $value){
$inserts[] = "(‘$league‘, ‘$year‘, ‘$key‘, ‘$value‘)";
}
}
$sql =
"INSERT IGNORE INTO `league_constants` (`rs_league_id`,`year`,`league_constant_name`, `league_constant_value`)
VALUES " . implode(‘,’, $inserts) . "
ON DUPLICATE KEY UPDATE `league_constant_value` = VALUES(`league_constant_value`)";
mysql_query($sql, $link);
And here are the values in the table:
| rs_league_id |
year |
league_constant_name |
league_constant_value |
| AL |
2007 |
sp_era |
4.686 |
| AL |
2007 |
sp_ra |
5.064 |
| AL |
2007 |
sp_fip_era |
3.408 |
| NL |
2007 |
sp_era |
4.567 |
| NL |
2007 |
sp_ra |
4.909 |
| NL |
2007 |
sp_fip_era |
3.280 |
All league constants are derived from innings pitched solely by starting pitchers. When calculating WAR for RP, probably in a later post, I’ll use another set of constants that averages values from innings pitched by relievers.
Now, we have everything we need to start calculating FIP and WAR for each game. I used Dave Cameron’s 7-part description of WAR on fangraphs as a guide. I created the following table to store game details for starting pitcher’s along with calculated FIPs and WARs:
CREATE TABLE `pitcher_game_start_fip` (
`rs_game_id` char(12) NOT NULL,
`rs_player_id` char(8) NOT NULL,
`rs_park_id` char(5) NOT NULL,
`rs_league_id` char(2) NOT NULL,
`rs_team_id` char(3) NOT NULL,
`year` smallint(6) NOT NULL,
`IP_outs` tinyint(4) NOT NULL,
`PF` float(5,3) NOT NULL DEFAULT ’1.000′,
`fip_era_const` float(5,3) NOT NULL DEFAULT ’0.000′,
`avg_era` float(5,3) NOT NULL DEFAULT ’0.000′,
`avg_ra` float(5,3) NOT NULL DEFAULT ’0.000′,
`ERA` float(4,2) NOT NULL,
`base_fip` float(4,2) NOT NULL,
`FIP` float(4,2) NOT NULL,
`fip_ra` float(4,2) NOT NULL,
`fip_ra_prk` float(4,2) NOT NULL,
`R_per_W` float(5,3) NOT NULL DEFAULT ’0.000′,
`RAR` float(4,2) NOT NULL,
`WAR` float(4,2) NOT NULL,
PRIMARY KEY (`rs_player_id`,`rs_game_id`),
KEY `year_league` (`year`,`rs_league_id`),
KEY `park_year` (`rs_park_id`,`year`),
KEY `game_id` (`rs_game_id`),
KEY `pylt` (`rs_player_id`,`year`,`rs_league_id`,`rs_team_id`),
KEY `tly` (`rs_team_id`,`rs_park_id`,`year`),
KEY `ty` (`rs_team_id`,`year`)
) ENGINE=MyISAM
The first step is to insert the general game info, as well as PF, ERA, and base_fip columns into the table.
- PF (park factor) comes from another table called, park_effects.
- If not already obvious, ERA = (ER*9) / (IP_outs/3)
- base_fip = (-2*SO + 3*(BB-IBB+HBP) + 13*HR) / (IP_outs/3)
I used the following code snippet to populate those values:
for($i = 2007; $i <= 2007; $i++){
.
.
.
.
//basic info, ERA, base_fip, PF
$sql =
"INSERT IGNORE INTO `pitcher_game_start_fip` (`rs_game_id`, `rs_player_id`, `rs_park_id`, `rs_league_id`, `rs_team_id`, `year`, `IP_outs`, `PF`, `ERA`, `base_fip`)
SELECT
s.`rs_game_id`, s.`rs_player_id`, s.`rs_park_id`, s.`rs_league_id`, s.`rs_team_id`, s.`year`, s.`IP_outs`,
p.`PF`,
(s.`ER`*9) / (s.`IP_outs`/3) as ERA,
(-2*s.`SO` + 3*(s.`BB`-s.`IBB`+s.`HBP`) + 13*s.`HR`) / (s.`IP_outs`/3) as base_fip
FROM `pitcher_game_start` s
LEFT JOIN `games` g ON s.`rs_game_id` = g.`rs_game_id`
LEFT JOIN `park_effects` p ON s.`rs_park_id` = p.`rs_park_id` AND s.`year` = p.`year` AND g.`rs_home_team_id` = p.`rs_team_id`
WHERE s.`year` = $i";
mysql_query($sql, $link);
.
.
.
.
}
Next, I populated the table with the appropriate values for league average ERA, league average RA, and the league constant used to translate ERA to FIP. I used values from the `league_constants` table, calculated earlier, with the following piece of code:
for($i = 2007; $i <= 2007; $i++){
//get league constants
$league_consts = array();
$sql = "SELECT * FROM `league_constants` WHERE `year` = $i";
$res = mysql_query($sql, $link);
while($row = mysql_fetch_assoc($res)){
$league_consts[$row[‘rs_league_id’]][$row[‘league_constant_name’]] = $row[‘league_constant_value’];
}
.
.
.
.
//fip_era_const, avg_era, avg_ra
foreach($league_consts as $league_id => $consts){
$sql =
"UPDATE `pitcher_game_start_fip`
SET
`fip_era_const` = ‘".$consts[‘sp_fip_era’]."’,
`avg_era` = ‘".$consts[‘sp_era’]."’,
`avg_ra` = ‘".$consts[‘sp_ra’]."’
WHERE
`year` = ‘.$i.’ AND
`rs_league_id` = ‘".$league_id."’";
mysql_query($sql, $link);
}
.
.
.
.
}
And now, calculate FIP and WAR for the starting pitcher of each game. Here are the necessary formulas:
- Calculate FIP by adding the league FIP constant to the pitcher’s base FIP:
FIP = base_fip + fip_era_const
- Turn FIP into a value reflecting runs rather than earned runs. To do so, multiply FIP by the league RA/ERA ratio:
fip_ra = FIP * (avg_ra/avg_era)
- Adjust fip_ra to the park setting:
fip_ra_prk = fip_ra / PF
- Calculate how many runs are required to win in the current setting. This is formula comes from Tango’s +2*1.5 formula: ((League RA + Pitcher’s RA)/2)+2)*1.5
R_per_W = (((((54-IP_outs)*avg_ra) + (IP_outs*fip_ra_prk)) / 54) + 2) * 1.5
- And finally, calculate WAR. An average starter has a winning% of .500. A replacement level starter has a winning percentage of .380% (.12 less than .500). So, if we calculate the pitcher’s wins above replacement, multiplied by IP/9 inning game, we get WAR for the game in question:
WAR = ((.12 + ((avg_ra – fip_ra_prk) / R_per_W)) * (IP_outs/3)) / 9
Here’s the last part of the script, making the above calculations.
for($i = 2007; $i <= 2007; $i++){
.
.
.
.
//calculate FIP, fip_ra, fip_ra_prk, R_per_W, WAR, RAR
$sql =
"UPDATE `pitcher_game_start_fip`
SET
`FIP` = @fip:= (`base_fip` + `fip_era_const`),
`fip_ra` = @fip_ra:= ((@fip * avg_ra) / avg_era),
`fip_ra_prk` = @frp:= (@fip_ra / `PF`),
`R_per_W` = @rpw:= ((( ( ((54-`IP_outs`)*`avg_ra`) + (`IP_outs`*@frp) ) / 54) + 2) * 1.5),
`WAR` = @war:= (((.12 + ((`avg_ra` – @frp) / @rpw)) * (`IP_outs`/3)) / 9),
`RAR` = @war * @rpw
WHERE `year` = ".$i;
mysql_query($sql, $link);
}
And here are the results for starts by James Shields:
| rs_game_id |
rs_player_id |
rs_park_id |
rs_league_id |
rs_team_id |
year |
IP_outs |
PF |
fip_era_const |
avg_era |
avg_ra |
fip_ra_const |
ERA |
base_fip |
FIP |
fip_ra |
fip_ra_prk |
R_per_W |
RAR |
WAR |
| SEA200709140 |
shiej002 |
SEA03 |
AL |
TBA |
2007 |
24 |
0.948 |
3.408 |
4.686 |
5.064 |
3.786 |
1.12 |
-0.50 |
2.91 |
3.14 |
3.31 |
9.430 |
2.56 |
0.27 |
| TBA200709090 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
21 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
2.57 |
0.43 |
3.84 |
4.15 |
4.73 |
10.404 |
1.23 |
0.12 |
| TBA200709030 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
21 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
1.29 |
-1.29 |
2.12 |
2.29 |
2.61 |
9.166 |
2.76 |
0.30 |
| BAL200708290 |
shiej002 |
BAL12 |
AL |
TBA |
2007 |
24 |
1.109 |
3.408 |
4.686 |
5.064 |
3.786 |
4.50 |
0.88 |
4.29 |
4.63 |
4.18 |
10.006 |
1.85 |
0.19 |
| TBA200708240 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
25 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
2.16 |
0.00 |
3.41 |
3.68 |
4.20 |
9.999 |
1.91 |
0.19 |
| TBA200708190 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
18 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
3.00 |
-1.50 |
1.91 |
2.06 |
2.35 |
9.241 |
2.55 |
0.28 |
| BOS200708130 |
shiej002 |
BOS07 |
AL |
TBA |
2007 |
18 |
1.177 |
3.408 |
4.686 |
5.064 |
3.786 |
1.50 |
-1.67 |
1.74 |
1.88 |
1.60 |
8.862 |
3.02 |
0.34 |
| DET200708080 |
shiej002 |
DET05 |
AL |
TBA |
2007 |
21 |
1.051 |
3.408 |
4.686 |
5.064 |
3.786 |
1.29 |
-0.71 |
2.70 |
2.92 |
2.77 |
9.260 |
2.65 |
0.29 |
| TBA200708030 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
20 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
4.05 |
1.95 |
5.36 |
5.79 |
6.61 |
11.455 |
-0.13 |
-0.01 |
| TBA200707280 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
15 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
5.40 |
-1.60 |
1.81 |
1.95 |
2.23 |
9.415 |
2.20 |
0.23 |
| NYA200707220 |
shiej002 |
NYC16 |
AL |
TBA |
2007 |
10 |
1.070 |
3.408 |
4.686 |
5.064 |
3.786 |
27.00 |
6.90 |
10.31 |
11.14 |
10.41 |
12.081 |
-1.44 |
-0.12 |
| TBA200707170 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
21 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
2.57 |
-0.14 |
3.27 |
3.53 |
4.03 |
9.994 |
1.74 |
0.17 |
| TBA200707120 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
18 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
7.50 |
6.33 |
9.74 |
10.52 |
12.01 |
14.071 |
-3.51 |
-0.25 |
| KCA200707060 |
shiej002 |
KAN06 |
AL |
TBA |
2007 |
23 |
1.033 |
3.408 |
4.686 |
5.064 |
3.786 |
4.70 |
-1.17 |
2.24 |
2.42 |
2.34 |
8.856 |
3.22 |
0.36 |
| CLE200707010 |
shiej002 |
CLE08 |
AL |
TBA |
2007 |
19 |
1.132 |
3.408 |
4.686 |
5.064 |
3.786 |
2.84 |
-0.63 |
2.78 |
3.00 |
2.65 |
9.323 |
2.48 |
0.27 |
| TBA200706260 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
21 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
6.43 |
0.57 |
3.98 |
4.30 |
4.91 |
10.505 |
1.10 |
0.10 |
| ARI200706200 |
shiej002 |
PHO01 |
NL |
TBA |
2007 |
15 |
1.111 |
3.280 |
4.567 |
4.909 |
3.622 |
10.80 |
3.60 |
6.88 |
7.40 |
6.66 |
11.092 |
-0.23 |
-0.02 |
| COL200706150 |
shiej002 |
DEN02 |
NL |
TBA |
2007 |
18 |
1.160 |
3.280 |
4.567 |
4.909 |
3.622 |
7.50 |
4.17 |
7.45 |
8.01 |
6.90 |
11.361 |
-0.42 |
-0.04 |
| FLO200706090 |
shiej002 |
MIA01 |
NL |
TBA |
2007 |
21 |
1.068 |
3.280 |
4.567 |
4.909 |
3.622 |
2.57 |
0.14 |
3.42 |
3.68 |
3.44 |
9.508 |
2.03 |
0.21 |
| TBA200706040 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
22 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
2.45 |
2.59 |
6.00 |
6.48 |
7.40 |
12.023 |
-0.73 |
-0.06 |
| TBA200705300 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
27 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
3.00 |
-0.78 |
2.63 |
2.84 |
3.24 |
9.229 |
2.93 |
0.32 |
| CHA200705250 |
shiej002 |
CHI12 |
AL |
TBA |
2007 |
21 |
1.084 |
3.408 |
4.686 |
5.064 |
3.786 |
5.14 |
3.43 |
6.84 |
7.39 |
6.82 |
11.619 |
-0.28 |
-0.02 |
| TBA200705200 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
21 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
1.29 |
-1.00 |
2.41 |
2.60 |
2.97 |
9.375 |
2.50 |
0.27 |
| TBA200705150 |
shiej002 |
LBV01 |
AL |
TBA |
2007 |
24 |
1.257 |
3.408 |
4.686 |
5.064 |
3.786 |
3.38 |
0.38 |
3.79 |
4.09 |
3.26 |
9.391 |
2.61 |
0.28 |
| BAL200705090 |
shiej002 |
BAL12 |
AL |
TBA |
2007 |
27 |
1.109 |
3.408 |
4.686 |
5.064 |
3.786 |
0.00 |
-0.56 |
2.85 |
3.08 |
2.78 |
8.879 |
3.35 |
0.38 |
| TBA200705030 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
22 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
3.68 |
-0.95 |
2.46 |
2.66 |
3.03 |
9.354 |
2.57 |
0.27 |
| OAK200704270 |
shiej002 |
OAK01 |
AL |
TBA |
2007 |
24 |
0.833 |
3.408 |
4.686 |
5.064 |
3.786 |
1.12 |
-0.25 |
3.16 |
3.41 |
4.10 |
9.951 |
1.92 |
0.19 |
| TBA200704220 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
24 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
2.25 |
-1.00 |
2.41 |
2.60 |
2.97 |
9.200 |
2.84 |
0.31 |
| TBA200704160 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
19 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
7.11 |
3.47 |
6.88 |
7.43 |
8.48 |
12.402 |
-1.36 |
-0.11 |
| TEX200704110 |
shiej002 |
ARL02 |
AL |
TBA |
2007 |
21 |
0.979 |
3.408 |
4.686 |
5.064 |
3.786 |
5.14 |
3.71 |
7.12 |
7.69 |
7.86 |
12.225 |
-1.03 |
-0.08 |
| TBA200704060 |
shiej002 |
STP01 |
AL |
TBA |
2007 |
20 |
0.876 |
3.408 |
4.686 |
5.064 |
3.786 |
4.05 |
0.45 |
3.86 |
4.17 |
4.76 |
10.427 |
1.15 |
0.11 |
To get Shields’ WAR for the 2007 season, we can take a weighted average of his starts, using the following query:
mysql> SELECT SUM(WAR) FROM pitcher_game_start_fip WHERE rs_player_id = ‘shiej002′ AND year = 2007;
+———-+
| SUM(WAR) |
+———-+
| 4.74 |
+———-+
1 row IN SET (0.00 sec)
Alternatively, you could calculate the pitcher’s weighted park factor, weighted league ERA, and weighted league RA and use these values with the pitcher’s IP, HR, SO, BB, etc for the season.
Well, there you have it. The next thing to do is to store season FIP’s, WAR’s and other values for more practical comparisons of players. Then, I’ll have to calculate the WAR of relief pitchers, something a bit more complicated.
Calculating FIP and WAR from Retrosheet Data, Part 1
Ok, I finally have the beginnings (Finally have the beginnings? That doesn’t sound right.) of FIP and WAR calculations for starting pitchers. After parsing pitching data out of retrosheet files from the 2007 season, I calculated FIP and WAR for SP on a per-game level. By taking a weighted average of these game-level FIP and WAR values, one can derive a pitcher’s FIP and WAR for an entire season.
In a previous post, I mentioned my intent to update such metrics after parsing each event from a retrosheet file or a stats feed. Though I intend to eventually update these metrics in “real-time”, I am currently calculating FIP and WAR after parsing an entire season’s worth of data. For now, I want to get these values into a database and make sure my math is correct. Once done, I’ll go back and optimize things so that FIP, WAR, and other stats can update with each relevant event.
I’ll now go through the process in detail, using James Shields’ 2007 season as an example. Before going through the details, here are a few disclaimers:
My parsing script places a starting pitcher’s stats for an individual game into a table with the following structure:
`rs_game_id` char(12) NOT NULL,
`rs_player_id` char(8) NOT NULL,
`rs_park_id` char(5) NOT NULL,
`rs_league_id` char(2) NOT NULL,
`rs_team_id` char(3) NOT NULL,
`year` smallint(6) NOT NULL,
`HR` tinyint(4) NOT NULL,
`BB` tinyint(4) NOT NULL,
`HBP` tinyint(4) NOT NULL,
`IBB` tinyint(4) NOT NULL,
`SO` tinyint(4) NOT NULL,
`ER` tinyint(4) NOT NULL,
`R` tinyint(4) NOT NULL,
`IP_outs` tinyint(4) NOT NULL,
PRIMARY KEY (`rs_player_id`,`rs_game_id`),
KEY `year_league` (`year`,`rs_league_id`),
KEY `park_year` (`rs_park_id`,`year`),
KEY `game_id` (`rs_game_id`),
KEY `pylt` (`rs_player_id`,`year`,`rs_league_id`,`rs_team_id`),
KEY `tly` (`rs_team_id`,`rs_park_id`,`year`),
KEY `ty` (`rs_team_id`,`year`)
) ENGINE=MyISAM
A field of significant interest is `rs_league_id`. This field describes the league of the home team. For interleague games, the home team’s league will be used for the pitchers on both sides. Additionally, the league average ERA and RA, both used in calculating FIP and WAR, will be based on the home team’s league.
Here are Shields’ rows in `pitcher_game_start` for 2007:
Next, I created a table for league constants with the following structure:
`rs_league_id` char(2) NOT NULL,
`year` smallint(6) NOT NULL,
`league_constant_name` varchar(32) NOT NULL,
`league_constant_value` float(8,3) NOT NULL DEFAULT ’0.000′,
PRIMARY KEY (`rs_league_id`,`year`,`league_constant_name`(8)),
KEY `year` (`year`)
) ENGINE=MyISAM
The constants I am concerned with for these calculations are: ERA, RA, and FIP_ERA. ERA and RA are the league average ERA and RA, and FIP_ERA is the constant used to translate the league average ERA to the league average FIP. I ran the following script to populate the table with said constants:
$sql =
"SELECT `year`, `rs_league_id`,
SUM(9*`ER`) / (SUM(`IP_outs`) / 3) as sp_era,
SUM(9*`R`) / (SUM(`IP_outs`) / 3) as sp_ra,
SUM(9*`ER` + 2*`SO` – 3*(`BB`-`IBB`+`HBP`) – 13*`HR`) / (SUM(`IP_outs`) / 3) AS sp_fip_era
FROM `pitcher_game_start`
FORCE INDEX (`year_league`)
GROUP BY `year`, `rs_league_id`";
mysql_query($sql, $link);
$inserts = array();
$res = mysql_query($sql, $link);
while($row = mysql_fetch_assoc($res)){
$league = $row[‘rs_league_id’];
$year = $row[‘year’];
unset($row[‘rs_league_id’]);
unset($row[‘year’]);
foreach($row as $key => $value){
$inserts[] = "(‘$league‘, ‘$year‘, ‘$key‘, ‘$value‘)";
}
}
$sql =
"INSERT IGNORE INTO `league_constants` (`rs_league_id`,`year`,`league_constant_name`, `league_constant_value`)
VALUES " . implode(‘,’, $inserts) . "
ON DUPLICATE KEY UPDATE `league_constant_value` = VALUES(`league_constant_value`)";
mysql_query($sql, $link);
And here are the values in the table:
All league constants are derived from innings pitched solely by starting pitchers. When calculating WAR for RP, probably in a later post, I’ll use another set of constants that averages values from innings pitched by relievers.
Now, we have everything we need to start calculating FIP and WAR for each game. I used Dave Cameron’s 7-part description of WAR on fangraphs as a guide. I created the following table to store game details for starting pitcher’s along with calculated FIPs and WARs:
`rs_game_id` char(12) NOT NULL,
`rs_player_id` char(8) NOT NULL,
`rs_park_id` char(5) NOT NULL,
`rs_league_id` char(2) NOT NULL,
`rs_team_id` char(3) NOT NULL,
`year` smallint(6) NOT NULL,
`IP_outs` tinyint(4) NOT NULL,
`PF` float(5,3) NOT NULL DEFAULT ’1.000′,
`fip_era_const` float(5,3) NOT NULL DEFAULT ’0.000′,
`avg_era` float(5,3) NOT NULL DEFAULT ’0.000′,
`avg_ra` float(5,3) NOT NULL DEFAULT ’0.000′,
`ERA` float(4,2) NOT NULL,
`base_fip` float(4,2) NOT NULL,
`FIP` float(4,2) NOT NULL,
`fip_ra` float(4,2) NOT NULL,
`fip_ra_prk` float(4,2) NOT NULL,
`R_per_W` float(5,3) NOT NULL DEFAULT ’0.000′,
`RAR` float(4,2) NOT NULL,
`WAR` float(4,2) NOT NULL,
PRIMARY KEY (`rs_player_id`,`rs_game_id`),
KEY `year_league` (`year`,`rs_league_id`),
KEY `park_year` (`rs_park_id`,`year`),
KEY `game_id` (`rs_game_id`),
KEY `pylt` (`rs_player_id`,`year`,`rs_league_id`,`rs_team_id`),
KEY `tly` (`rs_team_id`,`rs_park_id`,`year`),
KEY `ty` (`rs_team_id`,`year`)
) ENGINE=MyISAM
The first step is to insert the general game info, as well as PF, ERA, and base_fip columns into the table.
I used the following code snippet to populate those values:
.
.
.
.
//basic info, ERA, base_fip, PF
$sql =
"INSERT IGNORE INTO `pitcher_game_start_fip` (`rs_game_id`, `rs_player_id`, `rs_park_id`, `rs_league_id`, `rs_team_id`, `year`, `IP_outs`, `PF`, `ERA`, `base_fip`)
SELECT
s.`rs_game_id`, s.`rs_player_id`, s.`rs_park_id`, s.`rs_league_id`, s.`rs_team_id`, s.`year`, s.`IP_outs`,
p.`PF`,
(s.`ER`*9) / (s.`IP_outs`/3) as ERA,
(-2*s.`SO` + 3*(s.`BB`-s.`IBB`+s.`HBP`) + 13*s.`HR`) / (s.`IP_outs`/3) as base_fip
FROM `pitcher_game_start` s
LEFT JOIN `games` g ON s.`rs_game_id` = g.`rs_game_id`
LEFT JOIN `park_effects` p ON s.`rs_park_id` = p.`rs_park_id` AND s.`year` = p.`year` AND g.`rs_home_team_id` = p.`rs_team_id`
WHERE s.`year` = $i";
mysql_query($sql, $link);
.
.
.
.
}
Next, I populated the table with the appropriate values for league average ERA, league average RA, and the league constant used to translate ERA to FIP. I used values from the `league_constants` table, calculated earlier, with the following piece of code:
//get league constants
$league_consts = array();
$sql = "SELECT * FROM `league_constants` WHERE `year` = $i";
$res = mysql_query($sql, $link);
while($row = mysql_fetch_assoc($res)){
$league_consts[$row[‘rs_league_id’]][$row[‘league_constant_name’]] = $row[‘league_constant_value’];
}
.
.
.
.
//fip_era_const, avg_era, avg_ra
foreach($league_consts as $league_id => $consts){
$sql =
"UPDATE `pitcher_game_start_fip`
SET
`fip_era_const` = ‘".$consts[‘sp_fip_era’]."’,
`avg_era` = ‘".$consts[‘sp_era’]."’,
`avg_ra` = ‘".$consts[‘sp_ra’]."’
WHERE
`year` = ‘.$i.’ AND
`rs_league_id` = ‘".$league_id."’";
mysql_query($sql, $link);
}
.
.
.
.
}
And now, calculate FIP and WAR for the starting pitcher of each game. Here are the necessary formulas:
FIP = base_fip + fip_era_const
fip_ra = FIP * (avg_ra/avg_era)
fip_ra_prk = fip_ra / PF
R_per_W = (((((54-IP_outs)*avg_ra) + (IP_outs*fip_ra_prk)) / 54) + 2) * 1.5
WAR = ((.12 + ((avg_ra – fip_ra_prk) / R_per_W)) * (IP_outs/3)) / 9
Here’s the last part of the script, making the above calculations.
.
.
.
.
//calculate FIP, fip_ra, fip_ra_prk, R_per_W, WAR, RAR
$sql =
"UPDATE `pitcher_game_start_fip`
SET
`FIP` = @fip:= (`base_fip` + `fip_era_const`),
`fip_ra` = @fip_ra:= ((@fip * avg_ra) / avg_era),
`fip_ra_prk` = @frp:= (@fip_ra / `PF`),
`R_per_W` = @rpw:= ((( ( ((54-`IP_outs`)*`avg_ra`) + (`IP_outs`*@frp) ) / 54) + 2) * 1.5),
`WAR` = @war:= (((.12 + ((`avg_ra` – @frp) / @rpw)) * (`IP_outs`/3)) / 9),
`RAR` = @war * @rpw
WHERE `year` = ".$i;
mysql_query($sql, $link);
}
And here are the results for starts by James Shields:
To get Shields’ WAR for the 2007 season, we can take a weighted average of his starts, using the following query:
+———-+
| SUM(WAR) |
+———-+
| 4.74 |
+———-+
1 row IN SET (0.00 sec)
Alternatively, you could calculate the pitcher’s weighted park factor, weighted league ERA, and weighted league RA and use these values with the pitcher’s IP, HR, SO, BB, etc for the season.
Well, there you have it. The next thing to do is to store season FIP’s, WAR’s and other values for more practical comparisons of players. Then, I’ll have to calculate the WAR of relief pitchers, something a bit more complicated.