12 April 2009

A bit about my projections

My projections are not entirely original work, and I do not claim them to be. I put a good deal of time into them, but it was really just fine-tuning other people's work to my own purposes. I came up with several good ideas on which I lacked the time to follow through, but I hope to add them in the future.

The basic data of my projections are an average of the Bill James, CHONE, Marcel, Oliver, and ZiPS projections, all of which are available on Fangraphs. I converted the raw projections into a rate per AB. I understand that PA would be a better denominator, but not all of my data included PA.
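As a rough illustration of that step, here is a minimal Python sketch. It assumes each system's line is first converted to a per-AB rate and the rates are then averaged with equal weight; the order of those two operations, the function names, and the sample numbers are all assumptions made for the example.

```python
from statistics import mean

def rate_per_ab(projection, stat):
    """Convert a raw counting-stat projection into a per-AB rate."""
    return projection[stat] / projection["AB"]

def consensus_rate(projections, stat):
    """Equal-weight average of the per-AB rate across systems (an assumption)."""
    return mean(rate_per_ab(p, stat) for p in projections)

# Hypothetical projections for one hitter from three systems.
systems = [
    {"AB": 500, "HR": 20},
    {"AB": 600, "HR": 30},
    {"AB": 400, "HR": 18},
]

hr_rate = consensus_rate(systems, "HR")  # home runs per AB
```

Multiplying `hr_rate` by a projected AB total then gives the counting stat back.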

For playing time projections, I did a bit more work. I used the Bill James and Marcel projections as the basic guide (part A), because CHONE, Oliver, and ZiPS, to my eye, too often give an inaccurate idea of playing time, especially for younger players. They project more of what the player could do than what the player will get the opportunity to do. I used MLBDepthcharts.com, which I continued to check throughout the offseason, as the other side of my playing time projections (part B). I converted each player's projected role into an approximate AB total, lowering it if the player was injured or is considered prone to injury. I then considered his likely place in the lineup, again using MLBDepthcharts.com, and multiplied the projected AB total by a lineup position multiplier, which I obtained by weighting the last five seasons of aggregate MLB data by batting order position (5-4-3-2-1), via baseball-reference.com. I then averaged part A and part B, figuring that preseason expectations are not all going to be accurate, but neither are the projections based on a player's history. By combining the player's history with his role at the start of the season, I hoped to obtain a better idea of what that player would contribute during the season. My team offense was based on wOBA.
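The playing time arithmetic can be sketched roughly as follows. This is only an illustration: it assumes the 5-4-3-2-1 weights run from the most recent season to the oldest, and that the lineup multiplier is the ratio of a batting order slot's AB to the average slot's AB. The function names and all numbers are hypothetical.

```python
def weighted_recent_average(values_newest_first, weights=(5, 4, 3, 2, 1)):
    """Weighted mean of five seasons; heaviest weight on the newest (assumed)."""
    if len(values_newest_first) != len(weights):
        raise ValueError("need one value per weight")
    total = sum(w * v for w, v in zip(weights, values_newest_first))
    return total / sum(weights)

def projected_ab(part_a_ab, role_ab, lineup_multiplier):
    """Average part A (history-based) with part B (role AB times lineup slot)."""
    part_b_ab = role_ab * lineup_multiplier
    return (part_a_ab + part_b_ab) / 2

# Hypothetical leadoff-slot AB ratios for the last five seasons, newest first.
leadoff_ratios = [1.10, 1.08, 1.09, 1.11, 1.07]
lineup_multiplier = weighted_recent_average(leadoff_ratios)

# Hypothetical player: 520 AB from part A, a full-time role worth 500 AB.
ab = projected_ab(520, 500, lineup_multiplier)
```

An injury adjustment would simply lower `role_ab` before the multiplier is applied.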

For pitchers, I did much the same, using IP instead of AB. I used a very rough idea of how many IP to expect from each pitcher, and considered the depth and strength of the bullpen in giving an admittedly arbitrary part B to each reliever. My team pitching was based on FIP.

For fielding, I went for simplicity, using the player's AB total to indicate their playing time. I used the CHONE defensive projections (baseballprojection.com) as one side of my defensive projections. For the other side, I gathered each player's UZR/150 and games played at each position for the last three years, courtesy of Fangraphs, and weighted the UZR/150 figures according to games played. Running out of time, I then used the CHONE projection and my UZR projection at each position, combined with the expected role of the player, to venture an estimate of the runs the player would cost or save over the course of the season. For instance, a DH would be mostly 0, but if they were likely to spend some time in the field, then I would give them maybe 10% of their UZR/150 for the position, based on how much time they would likely be a defender. It was very approximate, and I considered players without MLB experience to be league average. If they were expected to play multiple roles (as they usually are), I considered them average at C, 2B, 3B, LF, and RF; slightly above average at 1B; and slightly below average at SS and CF.
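A small sketch of the UZR side of that estimate, assuming a plain games-weighted mean of three seasons of UZR/150 and a simple "share of time in the field" scaling like the 10% DH example. The names and sample values are hypothetical, and the blend with the CHONE number is left out.

```python
def games_weighted_uzr150(seasons):
    """Games-weighted mean UZR/150 from (games, uzr_per_150) pairs."""
    total_games = sum(games for games, _ in seasons)
    if total_games == 0:
        return 0.0  # no MLB experience: treat as league average
    return sum(games * uzr for games, uzr in seasons) / total_games

def runs_saved(uzr150, fielding_share):
    """Scale a full-season rating by the share of a season spent in the field."""
    return uzr150 * fielding_share

# Hypothetical three-year record at one position: (games, UZR/150).
history = [(150, 6.0), (100, 3.0), (50, -3.0)]
rating = games_weighted_uzr150(history)

# Hypothetical DH with a -8.0 rating who fields about 10% of the time.
dh_runs = runs_saved(-8.0, 0.10)
```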

I then combined the offensive, pitching, and defensive contributions to come up with estimated runs scored and runs allowed totals. I used these to determine a Pythagorean win expectation. I then went through each team's schedule and used the log5 formula to determine the teams' win expectancies against each other, taking the Pythagorean expectation as each team's natural winning percentage. I did not assign a winner to each game or series; I simply added the win expectancies accrued from each game to arrive at a projected win total.
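For readers who want the formulas, here is a minimal sketch of that last step. The Pythagorean exponent of 2 is an assumption (the classic form; other exponents are in common use), and the team names, winning percentages, and three-game schedule are made up for the example.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean win expectation; exponent 2 is assumed here."""
    rs = runs_scored ** exponent
    return rs / (rs + runs_allowed ** exponent)

def log5(p_a, p_b):
    """Probability that team A beats team B, given their winning percentages."""
    return (p_a * (1 - p_b)) / (p_a * (1 - p_b) + p_b * (1 - p_a))

def projected_wins(team, opponents, pct):
    """Sum the per-game win expectancies; no game is assigned a winner."""
    return sum(log5(pct[team], pct[opp]) for opp in opponents)

# Hypothetical natural winning percentages and a tiny schedule for team A.
pct = {"A": 0.6, "B": 0.5, "C": 0.4}
wins = projected_wins("A", ["B", "B", "C"], pct)
```

Against a .500 opponent, log5 returns a team's own winning percentage, which is a handy sanity check.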

And there you have it. I simplified many things, and I have so many ideas for improving the system, but I did this all in Excel with limited time, and could not pursue (yet) all of my ideas. Some of those:

-Creating a more formulaic basis for each player's defensive ratings at each position, taking their defense at every position into account, based on playing time. For instance, a player with 6 games' experience at 2B but 300 games at SS would have a 2B rating almost entirely derived from their SS rating and a positional adjustment. Meanwhile, a player with 100 games each at 2B, 3B, and SS would have a 2B rating based almost half on their 2B data, with the rest based on their other infield ratings. I am tentatively using 250 games (over 3 years) as the cutoff point; a player with 250 games at 2B would use only 2B data, a player with 200 games at 2B would use 2B data for 80% of their projection and data from other positions for the rest, and a player with fewer than 250 games played in total would have the remainder averaged in as league average (0).

-Estimating each player's RBI and R based on: their position in the batting order, their wOBA (RBI), their SLG (RBI), their OBP (R), the preceding batters' OBP (RBI), the following batters' SLG (R), and the following batters' wOBA (R). This would require fairly accurately projecting each team's wOBA/OBP/SLG for each batting order slot, taking into account the different hitters likely to spend time there. It would also require extensive math work in figuring out what correlations (if any) there are between these rates and the actual scoring of runs. This data may or may not matter for my team projections (I would need to be convinced it was more accurate than using wOBA). However, it would help for individual projections, although mostly in a way that common sense would also do.

-Studying the factors that influence a pitcher's decisions. Do certain types of pitcher get more no-decisions per start than others? How do the relative strengths of the offense and bullpen/defense affect which starts become no-decisions (left with lead, left with deficit)? I don't think I'll have the time to study this in-depth, honestly, for a few years, but it's definitely something I would like to look into.

-Park factors. I'm not sure which projections take park factors into account, and whether they project for a neutral park or for a home park, or even which team they project for free agents (i.e. when did they project?). Thus, we can assume the Rockies and Rangers might score more runs, and maybe the A's and Padres won't score as many.

-Taking replacement level into consideration. A team is not likely to give much time to players who perform significantly below replacement level. It should be assumed that replacement-level substitutes will be found.

-Age. I considered slightly penalizing teams with an older age by projecting, in the aggregate, that they would receive fewer AB from their starters, and more AB from their bench players and replacement/AAA players than projected. This would account for the likelihood that someone will go down, without unduly penalizing a given player who shows no signs of being frail. Age will catch up to someone, say the odds.

-Finally... considering the effects of trades. It is beyond the scope of what I would be doing, but a serious projectionist could consider the expected records of teams in a continuous manner as the season progresses. He could then project at which point a team would decide that they were out of the running, or that they needed another bat/arm to contend. Then a look at who might be available and what money might be available could lead to a projection as to the level of production a team could reasonably expect to gain/lose from trading. Obviously, projecting specific trades would be difficult, but projecting that the team has roughly a 75% chance of acquiring one of 4-5 available pitchers would help. You could add 75% of the average run-saving production of those pitchers, demoting a lesser performer, in projecting the rest of the season. This would influence the final standings.
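To make the first idea in the list (the defensive ratings) a little more concrete, here is one way the 250-game cutoff could work. This is only a sketch under my own assumptions: games at the target position count first, games at other positions (with the rating already position-adjusted) fill in next, and whatever is left up to 250 is filled with league average (0). The function name and the sample ratings are hypothetical.

```python
def blended_rating(own_games, own_rating, other_games, other_rating, cutoff=250):
    """Blend a position's rating with other-position and league-average data.

    own_rating and other_rating are runs per 150 games; other_rating is
    assumed to be already adjusted to the target position.
    """
    own = min(own_games, cutoff)
    borrowed = min(other_games, cutoff - own)
    league = cutoff - own - borrowed  # remainder counts as league average (0)
    return (own * own_rating + borrowed * other_rating + league * 0.0) / cutoff

# 200 games at 2B: 80% own data, 20% from other positions.
mostly_own = blended_rating(200, 5.0, 100, -5.0)

# 6 games at 2B, 300 at SS: almost entirely the adjusted SS rating.
mostly_ss = blended_rating(6, 20.0, 300, 4.0)

# Fewer than 250 games in total: the remainder is league average.
thin_record = blended_rating(100, 2.0, 50, 4.0)
```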

I probably skipped some nice ideas that I might detail later. Hopefully someone takes the time to solve some of these issues, and hopefully I do so as well.
