Chipper Jones is not a .400 hitter. That doesn't mean he won't hit .400.
What we have on our hands is a classic case of the irresistible force against the immovable object. On the one hand, it's exceptionally unlikely that a player who has hit .310 over a 15-year major league career suddenly woke up one morning at 36 years old and became a .400 hitter. Jones is seeing the ball exceptionally well, and apart from frequent problems with injury, he has aged relatively gracefully. But he's undoubtedly also squeezed a few lucky hits in between the shortstop and the second baseman, and had a few Texas Leaguers drop in.
On the other hand, it is also exceptionally unlikely that a player who is really and truly "just" a .310 hitter can hit .420 in 219 at-bats based on luck alone. Just how unlikely? The probability of a .310 hitter getting at least 92 hits in 219 tries is 0.023 percent. That's a one-in-4,423 chance, for those of you who like your odds Vegas style.
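For readers who want to check that figure, here is a minimal sketch of the calculation. It assumes that every at-bat is an independent trial with the same .310 chance of a hit (a plain binomial model); the function name is mine, for illustration only.

```python
from math import comb

def prob_at_least(hits: int, at_bats: int, avg: float) -> float:
    """P(X >= hits) when X ~ Binomial(at_bats, avg)."""
    return sum(
        comb(at_bats, k) * avg**k * (1 - avg) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )

# A "true" .310 hitter collecting at least 92 hits in 219 at-bats
p = prob_at_least(92, 219, 0.310)
print(f"{p:.4%} (about 1 in {1 / p:,.0f})")
```

The independence assumption is a simplification, since real at-bats vary with the pitcher, the park and the batter's health, but it is the usual starting point for this kind of question.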
The truth, then, lies somewhere in between. But we'd like to know exactly where in between it lies. Jones is going to have to hit about .385 for the rest of the season to finish with a .400 average. If he's really a .370 hitter, that is well within the realm of possibility. If, on the other hand, Jones's talent is that of a .320 hitter, he has his work cut out for him -- he'll have to do the equivalent of winning the lottery twice in a row. Let's break this problem down into its two essential steps.
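The arithmetic behind that "about .385" target is easy to reproduce. The sketch below starts from Jones's 92 hits in 219 at-bats; the remaining at-bat totals are hypothetical values chosen for illustration, since the real number depends on health and walks.

```python
def required_average(hits, at_bats, remaining_at_bats, target=0.400):
    """Batting average needed over the remaining at-bats to finish at `target`."""
    total_at_bats = at_bats + remaining_at_bats
    hits_needed = target * total_at_bats - hits
    return hits_needed / remaining_at_bats

# Hypothetical remaining at-bat totals (assumptions, not from the article)
for remaining in (280, 330, 380):
    print(remaining, round(required_average(92, 219, remaining), 3))
```

Note that the fewer at-bats remain, the lower the bar: a larger share of the final average is already banked at .420.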
The first step is to determine Jones's true level of talent. By "true level of talent", I mean what Jones would hit if you gave him an infinite number of at-bats, devoid of the vagaries of luck and small sample sizes. Before the season began, we had some rough notion of what Jones's level of ability was based on the performance of the comparable players in his PECOTA forecast. This took the form of a bell curve, centered around Chipper's usual batting averages of about .310 or .320, but with some higher and lower figures possible. (A technical note to my regular readers: what you see in the chart below is not our usual way of doing a PECOTA forecast. Instead, I have generated a normal distribution based on the performance of Chipper's comparables, after regressing the comparables' batting averages to the mean.)
At this point, however, we have significantly more information about Chipper than we did at the start of the season. Information like this: Chipper really, really knows how to hit a baseball. So the idea is to come up with a new estimate of Jones' talent based on what we've learned about him this year.
The process for doing this is a little involved, and requires the use of something called Bayes' Theorem. But the basic intuition is as follows: sure, it seemed unlikely at the start of the season that Jones was a .360 hitter. But we also know that it's much, much likelier for a .360 hitter to sustain a .420 batting average over the first ten weeks of the season than a .310 or a .290 hitter. What Bayes' Theorem gives us is a way to balance these two pieces of information. (I've used this process before to evaluate hot and cold starts, and it's proven to have pretty good predictive power.)
Sparing everyone some math, our solution from Bayes' Theorem is that Jones is really and truly about a .350 hitter -- specifically, our estimate is that he should hit about .348 the rest of the way out. But there is some uncertainty around this estimate too. It's plausible that Jones has become a .360 or a .370 hitter who has gotten a little lucky, and it's plausible that he's still more like a .320 or .330 hitter who has gotten a lot lucky. What we can say almost for certain is that: (i) Jones isn't really a .400 hitter, but that (ii) he's also almost certainly better than the .310-.320 range we pegged him at before the season began.
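For those who do want some of the math, here is a rough sketch of how a Bayesian update of this kind can be done on a grid of candidate batting averages. To be clear about what is assumed: the prior here is a plain normal curve with a mean of .315 and a standard deviation of .022, numbers I picked for illustration rather than the PECOTA-derived curve described above, and the likelihood treats at-bats as independent trials.

```python
from math import exp, log

def posterior_mean(hits, at_bats, prior_mean, prior_sd,
                   lo=0.200, hi=0.500, steps=3001):
    """Posterior mean of true batting average on an evenly spaced grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    log_weights = []
    for p in grid:
        # Normal prior (assumed shape) and binomial likelihood, both in logs
        log_prior = -0.5 * ((p - prior_mean) / prior_sd) ** 2
        log_like = hits * log(p) + (at_bats - hits) * log(1 - p)
        log_weights.append(log_prior + log_like)
    peak = max(log_weights)  # subtract the peak to avoid underflow
    weights = [exp(v - peak) for v in log_weights]
    return sum(p * w for p, w in zip(grid, weights)) / sum(weights)

# Hypothetical prior: mean .315, sd .022 (illustrative assumptions)
print(round(posterior_mean(92, 219, 0.315, 0.022), 3))
```

With these assumed inputs the estimate falls well above the preseason .315 but well short of the observed .420, which is the balancing act described above. A wider prior would pull the answer toward .420; a narrower one would hold it closer to .315.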
Now that we have some better idea of what Jones is likely to hit over the medium-term, we can take a fairer crack at the short term. How likely is Chipper to stay hot enough to finish with a .400 average? This process was taken care of by a simulation I designed, in which we played out the rest of the season 1,000 times. The way that the simulation worked was as follows:
1. Figure out how likely Jones is to stay healthy. Since Chipper has had persistent problems with injuries, we cannot rule out the possibility that he will lose his battle with the .400 mark by attrition. Since 2003, Jones has spent roughly 13 percent of his time on the disabled list. So what I did was to break the remainder of the season into seven 15-day intervals. In each interval, Jones was assigned a 13 percent chance of being disabled. If he was disabled, obviously, he wasn't assigned any plate appearances during that period. Ironically, spending some time on the DL might actually be helpful to Chipper's cause, since it means that he'll need fewer at-bats over which to sustain a .400 average. On the other hand, if he spends more than about 30 days on the DL, he probably will not finish with enough plate appearances to qualify for the batting title (502 is the minimum).
2. Figure out how many plate appearances Jones gets for each game on the active roster. The next step is to assign Chipper a specific number of plate appearances per game. So far in 2008, Jones has gotten six plate appearances in a game four times, five plate appearances 24 times, four plate appearances 27 times, three plate appearances twice, two plate appearances once, one plate appearance once, and zero plate appearances six times (when he got the day off). For each remaining game of the season, the simulation randomly selects a number in proportion to this array of possibilities, and assigns that many plate appearances to Chipper.
3. Assign a pitcher to Jones for each plate appearance. Next, I randomly selected a National League pitcher for Jones to face in each plate appearance. The pitchers were weighted by the number of batters that they have faced thus far in the 2008 season, so starting pitchers come up more often than relievers. Pitchers from the Braves were excluded, while pitchers from Jones's home division in the NL East were weighted double.
4. Estimate the probability that a plate appearance becomes an at-bat. In order for Jones to have an opportunity to get a hit, he cannot draw a walk, be hit by a pitch, or record a sacrifice. So what I did was to estimate the probability that one of these non-at-bat outcomes takes place against the pitcher that Jones was assigned. This was determined through an old Bill James invention called the log5 formula, which can estimate the probability of different offensive outcomes for any batter-pitcher matchup. Overall, we estimate that 15 or 16 percent of Jones's plate appearances will not turn into at-bats. This is because Jones walks quite frequently. This, it should be said, is also an asset for a player trying to hit .400, since things like walks essentially run time off of Chipper's clock while still contributing to the plate appearances he needs to qualify for the batting title. This is perhaps the only way in which attempting to hit .400 might be easier than trying to hit 60 home runs: getting pitched around can work to your benefit.
5. Given an at-bat, estimate the probability of a hit. Once we determine that a Jones plate appearance becomes an at-bat, we then estimate the probability that Jones gets a hit based on his batting average and the batting average allowed by the pitcher (again using the log5 formula). The only wrinkle is that Jones's batting average changes from simulation to simulation based on the probability curve that we developed before. Since we don't know exactly what Jones's long-run batting average is, he sometimes gets to take all of his plate appearances for the rest of the season as a .370 hitter, but other times as a .320 hitter.
6. Rinse and repeat. From there, it's simply a matter of repeating the simulation as many times as we like to estimate Chipper's season-ending batting average.
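The six steps above can be sketched in code roughly as follows. This is an illustration under stated assumptions, not the simulation that produced the numbers in this article. The plate appearance tallies, the 13 percent DL rate, the seven intervals, the 502 minimum and the .348 talent estimate come from the text; the pitcher pool is synthetic, and the league rates, the spread of the talent curve and the 14 games per interval are placeholder values of my own.

```python
import random
from itertools import accumulate

PA_VALUES = [6, 5, 4, 3, 2, 1, 0]       # plate appearances in a game
PA_WEIGHTS = [4, 24, 27, 2, 1, 1, 6]    # how often each has occurred so far
HITS, AT_BATS = 92, 219
PA_SO_FAR = sum(v * w for v, w in zip(PA_VALUES, PA_WEIGHTS))  # 261
JONES_NON_AB = 1 - AT_BATS / PA_SO_FAR  # about 16% of PAs are not at-bats
INTERVALS, DL_RATE, QUALIFY_PA = 7, 0.13, 502
GAMES_PER_INTERVAL = 14                 # assumption
TALENT_MEAN, TALENT_SD = 0.348, 0.020   # the sd is an assumption
LEAGUE_AVG, LEAGUE_NON_AB = 0.260, 0.100  # assumed league rates

def log5(batter, pitcher, league):
    """Bill James's log5 estimate for a batter-pitcher matchup."""
    num = batter * pitcher / league
    return num / (num + (1 - batter) * (1 - pitcher) / (1 - league))

def make_pitcher_pool(rng, n=150):
    """Synthetic stand-in for real NL pitchers (hypothetical data)."""
    pool, weights = [], []
    for _ in range(n):
        avg_allowed = rng.gauss(LEAGUE_AVG, 0.020)
        non_ab = max(rng.gauss(LEAGUE_NON_AB, 0.020), 0.02)
        batters_faced = rng.randint(50, 400)
        in_nl_east = rng.random() < 4 / 15   # 4 of the 15 non-Braves NL teams
        pool.append((avg_allowed, non_ab))
        weights.append(batters_faced * (2 if in_nl_east else 1))
    return pool, list(accumulate(weights))

def simulate_season(rng, pool, cum_weights):
    talent = rng.gauss(TALENT_MEAN, TALENT_SD)  # one talent draw per season
    hits, at_bats, pa = HITS, AT_BATS, PA_SO_FAR
    for _ in range(INTERVALS):
        if rng.random() < DL_RATE:
            continue                            # on the DL: no PAs this interval
        for _ in range(GAMES_PER_INTERVAL):
            n_pa = rng.choices(PA_VALUES, weights=PA_WEIGHTS)[0]
            for _ in range(n_pa):
                avg_allowed, non_ab = rng.choices(pool, cum_weights=cum_weights)[0]
                pa += 1
                if rng.random() < log5(JONES_NON_AB, non_ab, LEAGUE_NON_AB):
                    continue                    # walk, HBP or sacrifice
                at_bats += 1
                if rng.random() < log5(talent, avg_allowed, LEAGUE_AVG):
                    hits += 1
    return hits / at_bats, pa

def estimate(n_sims=1000, seed=2008):
    """Share of simulated seasons at .400 or better with a qualifying PA total."""
    rng = random.Random(seed)
    pool, cum_weights = make_pitcher_pool(rng)
    seasons = [simulate_season(rng, pool, cum_weights) for _ in range(n_sims)]
    return sum(avg >= 0.400 and pa >= QUALIFY_PA for avg, pa in seasons) / n_sims

print(estimate())
```

Because the pitcher pool and several rates here are made up, the share this prints should not be expected to match the results reported below; the point is only to show how the pieces fit together.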
Overall, out of our 1,000 simulations, Jones hit .400 or better and had enough plate appearances to qualify for the batting title 125 times. So this is your answer: we estimate that Jones has about a 12-13 percent chance of finishing with a .400 average. On six additional occasions, Jones finished with a .400 batting average but missed qualifying for the batting title due to injury; these results are not counted toward his total.
Chipper's highest batting average in any one simulation run was .438 (!). His lowest was .318. His average season-ending figure was about .378. So, whether or not Jones finishes above the .400 mark, he is more likely than not to hit better than .372, which is the best figure recorded thus far in the aughts (the record is shared by Nomar Garciaparra and Todd Helton, both of whom hit .372 in 2000).
Overall, this paints a somewhat brighter picture for Chipper than I was anticipating. There are undoubtedly some elements of luck in Chipper's performance to date, but he's run too far ahead of the curve for too long into the season for us not to take it at least somewhat seriously. With a few more timely hits, and perhaps a fortuitous 15-day injury or two, this could make for the most exciting individual record-chase since 1998.