Let's suppose that you're a normal baseball fan with normal interests and an understanding that smart writers and operators think fancy defensive statistics are important.
Curious, you visit Fangraphs.com and see that Carl Crawford, Tony Gwynn Jr., Marlon Byrd, Alexei Ramirez and Ichiro Suzuki rate as the five best defenders in the major leagues, with Crawford having saved 19 runs and the others about half as many. "Fair enough," you say.
Later in the day, looking to settle a bet, you look up Cory Snyder's 1988 batting average on Baseball-Reference.com and while there, happen on their defensive leader boards -- which, you note, have Justin Upton, Angel Pagan, Aubrey Huff, Brett Gardner and Michael Bourn in the lead. "Huh," you say.
In the evening, suspicious, you sign up for a subscription to BillJamesOnline.net and check their defensive evaluations. You're by now not shocked to see that the top five here has almost nothing in common with the competing ones: Yunel Escobar, Alexei Ramirez, Brendan Ryan, Ryan Zimmerman and Chase Headley make up the list, with the first four all credited with more runs saved than Carl Crawford.
Exasperated, you give up and mutter that these guys need to get their act together. Is this a fair reaction? Well, yes. If the problem were that one metric was right and its rivals wrong -- that Fangraphs' UZR, for example, was reliable, while the Plus/Minus system at Bill James' site was not -- that would be understandable. The truth, though, is more frustrating.
The basic problem with fancy defensive statistics is that they aren't statistics at all in the sense normal baseball fans understand the word. It is a fact that Cory Snyder hit .272 in 1988. On its own, this means nearly nothing, but by putting it together with other statistics, which account for playing time, his hitting environment and such, we can derive an evaluative metric that tells us how good a hitter he was in 1988.
This can't be done with defense because we lack statistics that we can rely on to describe what happens on the field. Colin Wyers of Baseball Prospectus wrote an excellent piece about the problem recently, and it might fairly be summed up as "garbage in, garbage out." The underlying data that feeds systems like Plus/Minus is subjective, not objective, and prone to varying kinds of bias. This does a lot to explain why they arrive at different results.
Of the other problems with defensive metrics, two strike me as especially significant. One is that they don't obviously correlate to winning. From 2002 through 2009, for example, the correlation between team winning percentage and team UZR was .19 -- less than half that of fielding percentage. This is ungenerous, as UZR correlates reasonably well to runs allowed without even accounting for pitching, but points up that these metrics are not nearly as robust as the likes of OPS.
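For readers who want to see what a figure like that .19 is actually measuring, here is a minimal sketch of the usual (Pearson) correlation calculation. The team numbers below are invented purely for illustration; they are not real UZR totals or standings, and the function name is my own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical team-seasons, made up for this example only.
team_uzr = [25.0, -10.0, 5.0, -30.0, 12.0, 0.0]
team_win_pct = [0.540, 0.520, 0.450, 0.480, 0.610, 0.500]

r = pearson_r(team_uzr, team_win_pct)
print(f"correlation between team UZR and winning percentage: {r:.2f}")
```

A value near 1 would mean good-fielding teams almost always win more; a value near 0 means the metric tells you little about the standings on its own.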
The other issue is this: As a rule of thumb, researchers figure that three years' worth of defensive data is equivalent to one year of offensive data. Practically, this means that a single year's rating in any advanced defensive metric is next to useless, at least if what we're interested in is true talent. Just as a .300 hitter might hit .200 or .400 over two months, so might an average fielder run up excellent or lousy defensive numbers in a given year. It also means that we're trying to measure a moving target. By the time those three years' worth of data are in the books, the actual talent of the player in question will have changed.
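The small-sample point can be sketched with a toy simulation. Everything here is an assumption made for illustration: a perfectly average fielder (true talent of zero runs), random season-to-season noise with a made-up standard deviation of eight runs, and talent that never changes, which the paragraph above notes is not true in real life. It does not model the three-to-one rule of thumb itself; it only shows how much steadier an average of several seasons is than any one season.

```python
import random
import statistics


def simulate_ratings(true_talent, season_noise_sd, n_seasons, trials, seed=0):
    """Average rating over n_seasons, repeated for many simulated careers."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    averages = []
    for _ in range(trials):
        seasons = [rng.gauss(true_talent, season_noise_sd) for _ in range(n_seasons)]
        averages.append(sum(seasons) / n_seasons)
    return averages


NOISE_SD = 8.0  # assumed, not taken from any real defensive metric

one_year = simulate_ratings(0.0, NOISE_SD, n_seasons=1, trials=5000)
three_year = simulate_ratings(0.0, NOISE_SD, n_seasons=3, trials=5000)

spread_one = statistics.pstdev(one_year)
spread_three = statistics.pstdev(three_year)

print(f"spread of single-season ratings: {spread_one:.1f} runs")
print(f"spread of three-year averages:   {spread_three:.1f} runs")
```

Under these assumptions the three-year averages cluster noticeably closer to the fielder's true level, by a factor of roughly the square root of three.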
There are other problems with these metrics, but these are the main ones. The question, then, is this: Allowing that despite their flaws these new numbers are extremely useful, a massive step forward in our understanding of baseball, how is a normal fan to use them?
First, it has to be understood that these metrics are attempting to measure performance, not talent. A lousy defender might score well in UZR in a given year because of a flaw in the system, but it's also possible he's just played well, the same way a lousy hitter might bat .320 for no apparent reason. Keeping this distinction in mind is important.
Second, one has to appreciate that fielding performance is best measured over periods of time longer than a single season. Saying that Ichiro Suzuki is a good right fielder and that his UZR of 9.6 this year proves it is silly. Saying that he's a good right fielder and that he's scored extremely well in UZR for nearly a decade is a lot more defensible.
Third, it's probably better to think of defensive metrics more as a way of classifying players than of measuring their precise value. That Yunel Escobar rates above Alexei Ramirez by two runs in Plus/Minus means less than nothing; the takeaway from their ratings is that both score as A+ defenders in that system, something that should be taken into account along with other evaluative metrics, observation and their reputations when ranking them against one another.
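One way to picture the classifying approach is to sort runs-saved totals into coarse grades and compare the grades rather than the numbers. The cutoffs and the two sample totals below are hypothetical, picked only to show the idea; no defensive system I know of publishes these particular bands.

```python
# Hypothetical grade bands: (minimum runs saved, grade), checked from the top down.
TIERS = [
    (15.0, "A+"),
    (8.0, "A"),
    (3.0, "B"),
    (-3.0, "C"),
    (-8.0, "D"),
]


def defensive_tier(runs_saved):
    """Map a runs-saved total to a coarse grade using the assumed bands above."""
    for minimum, grade in TIERS:
        if runs_saved >= minimum:
            return grade
    return "F"


# Two made-up shortstops separated by two runs land in the same bucket.
player_a = 21.0
player_b = 19.0
print(defensive_tier(player_a), defensive_tier(player_b))
```

The two-run gap disappears once the numbers are bucketed, which is the level of precision these metrics can actually support.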
Basically, the point is to take these measures as a flashlight in a dark room. There isn't enough light in the room to tell if an object is seven or nine feet high, but there is enough to tell it's tall. The light keeps going out, but it works often enough to be useful. So long as we're aware of its limitations, it can help us. If we pretend it can do more than it can, though, we're going to be actively worse off than we would be if we'd just tossed the light aside and groped around the room with our bare hands.