The problems with defensive stats
Let's suppose that you're a normal baseball fan with normal interests and an understanding that smart writers and operators think fancy defensive statistics are important.
Curious, you visit Fangraphs.com and see that
Later in the day, looking to settle a bet, you look up
In the evening, suspicious, you sign up for a subscription to BillJamesOnline.net and check their defensive evaluations. By now you're not shocked to see that the top five here has nothing in common with the competing ones:
Exasperated, you give up and mutter that these guys need to get their act together. Is this a fair reaction? Well, yes. If the problem were that one metric was right and its rivals wrong -- that Fangraphs' UZR, for example, was reliable, while the Plus/Minus system at
The basic problem with fancy defensive statistics is that they aren't statistics at all in the sense normal baseball fans understand the word. It is a fact that Cory Snyder hit .272 in 1988. On its own, this means nearly nothing, but by putting it together with other statistics, which account for playing time, his hitting environment and such, we can derive an evaluative metric that tells us how good a hitter he was in 1988.
This can't be done with defense because we lack statistics that we can rely on to describe what happens on the field.
Of the other problems with defensive metrics, two strike me as especially significant. One is that they don't obviously correlate with winning. From 2002 through 2009, for example, the correlation between team winning percentage and team UZR was .19 -- less than half that of fielding percentage. The comparison is ungenerous, as UZR correlates reasonably well with runs allowed without even accounting for pitching, but it points up that these metrics are not nearly as robust as the likes of OPS.
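For readers who want to see what a figure like that .19 involves, here is a minimal sketch of a Pearson correlation between a team defensive rating and winning percentage. The function name and the six team seasons are invented for illustration and are not real data; the only number taken from the text is the .19 itself, which, squared, works out to roughly 4 percent of the variation in winning percentage.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up team seasons, for illustration only (NOT real UZR or standings).
team_uzr = [25.0, -10.0, 5.0, -30.0, 15.0, 0.0]
team_win_pct = [0.540, 0.520, 0.470, 0.480, 0.500, 0.510]

r = pearson_r(team_uzr, team_win_pct)
print(f"r = {r:.2f}, share of variance explained = {r * r:.2f}")

# The figure quoted in the text: r = .19 explains about 3.6% of the variance.
print(f"0.19 squared = {0.19 ** 2:.4f}")
```

Running the same function on real team-season data would be the way to check the quoted figure; the sketch only shows the arithmetic behind it.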
The other issue is this: As a rule of thumb, researchers figure that three years' worth of defensive data is equivalent to one year of offensive data. Practically, this means that a single year's rating in any advanced defensive metric is next to useless, at least if what we're interested in is true talent. Just as a .300 hitter might hit .200 or .400 over two months, so might an average fielder run up excellent or lousy defensive numbers in a given year. It also means that we're trying to measure a moving target. By the time those three years' worth of data are in the books, the actual talent of the player in question will have changed.
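The sample-size point can be illustrated with a small simulation. This is a sketch under loud assumptions: the fielder is exactly average (true talent of zero runs), each season's rating is that talent plus normally distributed noise, and the noise level of 8 runs is a hypothetical number picked for the example, not an estimate for any real metric. It also holds talent fixed, so it says nothing about the moving-target problem.

```python
import random
import statistics

def simulate_ratings(true_talent, noise_sd, seasons, trials, seed=0):
    """Simulate `trials` ratings, each the average of `seasons` noisy seasons."""
    rng = random.Random(seed)
    ratings = []
    for _ in range(trials):
        season_values = [rng.gauss(true_talent, noise_sd) for _ in range(seasons)]
        ratings.append(statistics.fmean(season_values))
    return ratings

NOISE_SD = 8.0  # hypothetical single-season noise, in runs (an assumption)

one_year = simulate_ratings(0.0, NOISE_SD, seasons=1, trials=10_000)
three_year = simulate_ratings(0.0, NOISE_SD, seasons=3, trials=10_000)

spread_one = statistics.pstdev(one_year)
spread_three = statistics.pstdev(three_year)

print(f"spread of single-season ratings: {spread_one:.1f} runs")
print(f"spread of three-season averages: {spread_three:.1f} runs")
```

Under these assumptions the three-season average should scatter less than a single season by a factor of about the square root of three, which is the usual reason for preferring multi-year ratings.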
There are other problems with these metrics, but these are the main ones. The question, then, is this: Allowing that despite their flaws these new numbers are extremely useful, a massive step forward in our understanding of baseball, how is a normal fan to use them?
First, it has to be understood that these metrics are attempting to measure performance, not talent. A lousy defender might score well in UZR in a given year because of a flaw in the system, but it's also possible he's just played well, the same way a lousy hitter might bat .320 for no apparent reason. Keeping this distinction in mind is important.
Second, one has to appreciate that fielding performance is best measured over periods of time longer than a single season. Saying that Ichiro Suzuki is a good right fielder and that his UZR of 9.6 this year proves it is silly. Saying that he's a good right fielder and that he's scored extremely well in UZR for nearly a decade is a lot more defensible.
Third, it's probably better to think of defensive metrics more as a way of classifying players than of measuring their precise value. That Yunel Escobar rates above Alexei Ramirez by two runs in Plus/Minus means less than nothing; the takeaway from their ratings is that both score as A+ defenders in that system, something that should be taken into account along with other evaluative metrics, observation and their reputations when ranking them against one another.
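One way to picture the classifying approach is to collapse runs-saved figures into coarse letter grades, so that a two-run gap stops looking like a finding. The sketch below is an assumption from top to bottom: the cutoffs, the grade labels and the two sample ratings are invented for illustration and do not come from Plus/Minus, UZR or any published scale.

```python
# Hypothetical cutoffs in runs saved; each grade applies at or above its floor.
DEFAULT_CUTOFFS = ((10.0, "A"), (3.0, "B"), (-3.0, "C"), (-10.0, "D"))

def grade(runs_saved, cutoffs=DEFAULT_CUTOFFS):
    """Map a runs-saved rating to a coarse letter grade."""
    for floor, letter in cutoffs:
        if runs_saved >= floor:
            return letter
    return "F"  # anything below the lowest floor

# Two made-up shortstops separated by two runs (not real ratings).
shortstop_x = 18.0
shortstop_y = 16.0

print(grade(shortstop_x), grade(shortstop_y))
```

Both invented players land in the same bucket, which is the point: the grade carries the information the metric can support, and the ordering within the bucket is left to other evidence.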
Basically, the point is to take these measures as a flashlight in a dark room. There isn't enough light in the room to tell if an object is seven or nine feet high, but there is enough to tell it's tall. The light keeps going out, but it works often enough to be useful. So long as we're aware of its limitations, it can help us. If we pretend it can do more than it can, though, we're going to be actively worse off than we would be if we'd just tossed the light aside and groped around the room with our bare hands.