May 13, 2010


Those following the debate over at the CardRunners league site will know that the accuracy and precision of player "projections" (or whatever you wish to call your impressions of a player) has come up throughout. There are those in the fantasy community who have taken to talking about "false accuracy," arguing that projections are merely a crutch, a security blanket, and that because we can never truly know what a player will do in the future, precision is unimportant. Regular readers also know that this is not something I believe to be true.

Far too much has been written at CR for me to dig through and try to find the exact quotes, so forgive (and correct) me if this isn't perfectly accurate, but essentially, those on the "intuition" side of the debate have said something to the effect of, "We've done projections and models before and have discovered that they have limits. Fantasy baseball is a complex game that can never be fully comprehended or predicted, and we will never be 100 percent accurate with our projections, so instead of being quantitative about it, we're just going to use our intuition instead. It's not important if Derek Jeter steals 25 bases or 21 bases; having a general idea is enough."

This line of thinking seems to be becoming more and more prevalent in the fantasy industry, even among those with reputations as "stat" guys. One of the analysts at the forefront of this movement is Ron Shandler, who had this to say in his Baseball Forecaster last year:

The player knowledge is not "Vladimir Guerrero is projected to hit 25 home runs, drive in 100 runs, and bat .300." These are lifeless pieces of projected data. Vlad could hit 27 HRs, or 22. He could bat .317 or .292. There are dozens of variables that may impact the actual numbers. The only knowledge that you can count on even a little is "Vlad Guerrero is a fading slugger who will get regular at-bats on a contending team." Anything more definitive is pointless.

I disagree with this.

Shandler, more recently, repackaged his point using Miguel Cabrera as his example (I've condensed it to be succinct -- follow the link if you wish to read his entire explanation):

Coming into the 2009 season, we had projected that (Cabrera would) hit 39 home runs. That's his M.O. -- he hits home runs in the 30s...

Cabrera finished 2009 with 34 HR. The difference between 34 and 39 is a bit more noticeable but does not substantively change who Cabrera is...

What's more, we already know that there will be a minimum 30 percent error bar around whatever number we attach to his projected home run output. For a 30ish home run hitter, that could be a variance of 10 home runs. Suddenly, my 39 HR projection doesn't look so bad ...

Maybe, but by that logic, a 21 HR projection wouldn't look so bad either, and I think we can all agree that a 21 HR projection for Miguel Cabrera would be way off-base.

So I have to ask, why do we need to attach a "39" to his projected home run output?...

Perhaps we should just project that Cabrera will hit HRs "in the 30s." It's a wide enough range that not only covers our error bar -- essentially taking into account some of the variability of playing time -- but, oddly, also increases our accuracy. I'm more apt to be correct projecting Cabrera to hit HRs in the high-30's than projecting him to hit exactly 37.

Well, of course it's going to increase our accuracy. We're getting 10 guesses instead of one. "In the 30s" will never be wrong when "37" is right.

I think the problem here (and perhaps I'm misinterpreting) is that we seem to assume that this error bar is static, that Miguel Cabrera has one fixed error bar and that we're completely justified in projecting any number of home runs within that error bar, and that if we do this we'll be fine. But that's not true. Every number from 30 to 39 is going to have its own individual error bar. Because that "30 percent error bar" exists, if all we were to say is that "Miguel Cabrera will hit HRs in the 30s," we must implicitly be saying that we're projecting him to hit 35 HRs.

Why? Because if we're not saying that he's going to hit 35 HRs, then that error bar looks very different:

Neither of these bars covers all the HRs in the 30s. So by trying to remove precision, we may just be fooling ourselves. Strange as it sounds, we end up implying precision anyway, just not the good kind that's well thought-out. Rather, it's the rounding-off kind. So if we're going to be precise no matter what, why not make it count? Put me down for 37 home runs.

My point essentially boils down to this: if you're using a range of outcomes as your projection, you're using an error bar. And if you're using an error bar, you're necessarily implying a precise projection -- whatever happens to be in the middle of that bar.
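The arithmetic behind this point can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of Shandler's argument: "in the 30s" is treated as the symmetric range 30 to 39, the implied projection is taken to be its midpoint, and the helper names (`implied_projection`, `bar_around`, `covers`) are made up for the example.

```python
def implied_projection(low, high):
    """Midpoint of a symmetric range, i.e. the point projection it implies."""
    return (low + high) / 2


def bar_around(center, low, high):
    """A bar with the same width as [low, high], re-centered on `center`."""
    half_width = (high - low) / 2
    return (center - half_width, center + half_width)


def covers(bar, low, high):
    """True if `bar` contains every value from low to high."""
    return bar[0] <= low and bar[1] >= high


low, high = 30, 39  # assumption: "HRs in the 30s" means 30 through 39
midpoint = implied_projection(low, high)  # 34.5, i.e. roughly 35 HR

# Only the bar centered on the midpoint covers the whole range.
for center in (30, midpoint, 39):
    bar = bar_around(center, low, high)
    print(center, bar, covers(bar, low, high))
```

Under these assumptions, a bar of the same width centered on 30 runs from 25.5 to 34.5, and one centered on 39 runs from 34.5 to 43.5. Only the bar centered on about 35 spans all of the 30s.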

In a similar vein, Shandler posited in a March newsletter that perhaps we shouldn't take consensus No. 1 pick Albert Pujols with the first pick in the draft.

If I were to go into a public draft with the No. 1 seed and select Ryan Braun first, or Joe Mauer, or Carl Crawford ... well, I'd certainly keep the bloggers and tweeters busy for a few days.

But consider some facts...

Over the past six years, from 2004-2009, the player who was the consensus No. 1 pick has NEVER finished first. Never. This is a period when Pujols has been just as dominant as he is now. A-Rod was dominant during this time as well. Neither finished No. 1 in a year that the ADPs predicted they would.

In three of the past six years, the player who DID finish No. 1 wasn't even ranked in the top 15 coming into the season. Ichiro Suzuki, Derrek Lee and Jose Reyes all finished No. 1 in a year when they couldn't crack the pre-season top 15.

Yes, but how could we have known that Ichiro or Lee or Reyes would have been No. 1? Had we known, they would have been taken No. 1, wouldn't they have? Or at least they would have been in the top 15. Just because a non-top 15 player has a good chance of finishing No. 1 doesn't mean we know who it is. Sure, we could have guessed and taken Ichiro, but we could have just as easily guessed and taken Alfonso Soriano or David Ortiz and been wrong. Using hindsight to say that Ichiro should have been picked No. 1 is an example of the Texas Sharpshooter Fallacy.

You have a decision to make on draft day. You can avoid public ridicule and select Pujols and Hanley as your top two picks. Or you can consider other options. You can go for across-the-board consistency with a Ryan Braun or Chase Utley. You can hop onto the rising trends of a Matt Kemp or Justin Upton. You can play the speed scarcity card with a Carl Crawford or Jacoby Ellsbury. You can play the position scarcity card with A-Rod or Joe Mauer.

You're probably reading this and thinking, Justin Upton with a No. 1 pick? Well, unless you are convinced he will come back to you in Round 2, remember that we would have been saying the exact same thing about Ichiro, Derrek Lee and Jose Reyes just a few years ago.

I'm not convinced that Shandler is looking at this the right way. Sure, odds are Albert Pujols will not be the No. 1 most productive player in fantasy baseball. Odds are, some player other than Albert Pujols will be the No. 1 fantasy player. But, sitting with the No. 1 pick in the draft, we are not given the choice of "Albert Pujols" or "the field." We're given the choice of "Albert Pujols" or "Hanley Ramirez" or "Chase Utley" or "David Eckstein." And therein lies the problem with the analysis. While Pujols may not have a great chance at finishing No. 1, he stands a greater chance than anyone else. Maybe it breaks down like this:

Chances of finishing 2010 as the No. 1 fantasy baseball player

Albert Pujols: 12%
Hanley Ramirez: 10%
Alex Rodriguez: 8%
Chase Utley: 7.5%
Matt Kemp: 6%
...
David Eckstein: 0.0001%

Twelve percent may not be that high (and I'm just making these numbers up, they may be way off), but it's higher than anyone else's on the list. Were the list of available options to look like this, it'd be a different story:

Chances of finishing 2010 as the No. 1 fantasy baseball player

Albert Pujols: 12%
Someone other than Albert Pujols: 88%
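The difference between the two lists can be shown with a small Python sketch. It reuses the made-up percentages from the first list (they are illustrative guesses, not estimates), and `best_pick` is a hypothetical helper name. The point is that a draft pick has to be one individual player, so the relevant comparison is between players, not between one player and the field.

```python
# Made-up probabilities of finishing No. 1 (illustrative only, not estimates).
p_finish_first = {
    "Albert Pujols": 0.12,
    "Hanley Ramirez": 0.10,
    "Alex Rodriguez": 0.08,
    "Chase Utley": 0.075,
    "Matt Kemp": 0.06,
}


def best_pick(probabilities):
    """Return the single player with the highest chance of finishing No. 1."""
    return max(probabilities, key=probabilities.get)


pick = best_pick(p_finish_first)

# "The field" is everyone else combined. It is large, but it cannot be drafted.
field = 1 - p_finish_first[pick]

print(pick, p_finish_first[pick], round(field, 2))
```

Even with the field at 88 percent, no single player in it beats the 12 percent assigned to Pujols, so he remains the best available choice under these numbers.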

To put it another way, it makes no sense to take a guy like Jacoby Ellsbury just because Pujols has some chance of not being the No. 1 player. Let's do one more thought exercise using a graph of Pujols' and Ellsbury's expected value distributions based upon the "30 percent error bar" (assuming reasonable $40 Pujols and $33 Ellsbury projections).

Sure, a scenario exists -- and a somewhat likely scenario, at that -- where Ellsbury is more valuable than Pujols. But we can't take Ellsbury over Pujols just because in a few scenarios he could end up being more valuable. In far more scenarios, Pujols will be more valuable (and in one he's almost $20 more valuable!). And this will be the case for every single player who is an alternative to Pujols (assuming you have Pujols ranked No. 1 on your cheat sheet, of course).
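To put a rough number on "somewhat likely," here is a minimal Python sketch. It rests on several assumptions that go beyond the text: each player's value is modeled as a normal distribution centered on his projection, the "30 percent error bar" is read as roughly two standard deviations on either side, and the two players' outcomes are treated as independent. The helper names `value_dist` and `prob_first_beats_second` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

ERROR_BAR = 0.30  # the "30 percent error bar"


def value_dist(projection, error_bar=ERROR_BAR):
    """Assumed model: normal value distribution around the projection.

    Assumption: the error bar spans about two standard deviations,
    so sigma = projection * error_bar / 2.
    """
    sigma = projection * error_bar / 2
    return NormalDist(mu=projection, sigma=sigma)


def prob_first_beats_second(a, b):
    """P(a > b) for independent normals, via the distribution of a - b."""
    diff = NormalDist(mu=a.mean - b.mean, sigma=sqrt(a.variance + b.variance))
    return 1 - diff.cdf(0)


pujols = value_dist(40)    # $40 projection
ellsbury = value_dist(33)  # $33 projection

p_ellsbury = prob_first_beats_second(ellsbury, pujols)
print(f"P(Ellsbury out-earns Pujols) = {p_ellsbury:.2f}")
```

Under these particular assumptions, Ellsbury comes out ahead a bit less than one time in five. A different reading of the error bar would change that figure, but as long as Pujols carries the higher projection and the spreads are comparable, he remains the more likely of the two to be more valuable.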

If you like Jacoby Ellsbury better than Albert Pujols, sure, take him No. 1 if you can't trade the pick and he won't be there in the second round. But don't take Ellsbury just because Pujols' chances of finishing No. 1 are low in an absolute sense.

I know it seems like I'm picking on Shandler a lot here, but that's not my intention. I have the utmost respect for Shandler and what he has done for the fantasy industry. I used him for most of my examples precisely because he is such an influential figure, because he is at the forefront of this line of reasoning, and because he has been so vocal about it in recent years. I think Ron is a very intelligent and talented fantasy player; I just don't agree with him in this instance.
