BABIP is computed with hits minus home runs as the numerator and at-bats minus home runs minus strikeouts as the denominator. The result is the batting average on any balls that stay in the yard. One can nitpick that this includes balls caught in foul territory and doubles hit off outfield walls too high to catch, but such instances are insignificant when compared to the totality of batted balls. More details are available via an Internet search, but McCracken's primary finding was that BABIP was remarkably similar for all major league pitchers.
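For those who like to see the arithmetic spelled out, here is a minimal Python sketch of the calculation exactly as described above. The function name and the sample stat line are made up for illustration and do not belong to any real pitcher. Many published versions of the formula also add sacrifice flies to the denominator; this sketch sticks with the simpler form used in this column.

```python
def babip(hits, home_runs, at_bats, strikeouts):
    """Batting average on balls in play: (H - HR) / (AB - HR - K)."""
    balls_in_play = at_bats - home_runs - strikeouts
    if balls_in_play <= 0:
        raise ValueError("no balls in play, so BABIP is undefined")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line: 200 hits, 20 homers, 800 at-bats, 180 strikeouts.
# Numerator is 200 - 20 = 180, denominator is 800 - 20 - 180 = 600.
print(f"{babip(200, 20, 800, 180):.3f}")  # prints 0.300
```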
Before we get involved with some interesting math, something needs to be clarified. Often you will hear or read something like "what this means if you or I were to pitch, the BABIP against us would be the same as it is against Roy Halladay or Tim Lincecum." Sorry, but this is not the case, as the original studies were done using major league pitching as the input. In fact, there is even variation within the data at the minor league level. The point is that the conclusions are only valid for the hitters and pitchers in The Show.
As a scientist, I was admittedly fascinated by McCracken's discovery. But as a baseball fan, I would be lying if I said I was not a little bummed out by the revelation. Granted, I fully understand that striking guys out and limiting walks are skills that can be demonstrated by better pitchers, but intuitively, I wanted to believe that better pitchers induced weaker contact, thereby limiting hits. Thanks for nothing, Voros.
But alas, there is hope. Note I said hope, not proof, but there is hope that the scientist in me can help assuage the little kid in me. Let us look at some BABIP numbers from the last five seasons. This data was all gathered from www.Baseball-Reference.com.
By way of a brief introduction, let us break down BABIP into components. Batted balls come in three varieties: grounders, line drives and fly balls. On balls in play, on average, 72.4 percent of all line drives, 23.8 percent of all ground balls and 13.8 percent of all fly balls result in hits. If we make the leap of faith that line drive percentage against is not a skill, a supposition backed by some Mastersball research, then there will be some BABIP deviation based on a pitcher's ground ball and fly ball tendencies, a factor our research shows is under the pitcher's control. And while this is a subject for a different essay, there is a balance between the fewer hits on balls in play a fly ball hurler allows and the increased number of home runs.
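To make that batted-ball arithmetic concrete, here is a minimal Python sketch that weights the three hit rates quoted above by a pitcher's batted-ball mix. The hit rates come from the paragraph above; the function name and the two sample profiles are invented for illustration and do not describe any real pitcher. Both profiles hold line drives at the same share, in keeping with the leap of faith that line drive rate is not a skill.

```python
# Share of each batted-ball type that falls for a hit, as quoted in the column.
HIT_RATE = {"line_drive": 0.724, "ground_ball": 0.238, "fly_ball": 0.138}


def expected_babip(ld_share, gb_share, fb_share):
    """Weighted average of the hit rates, using the batted-ball mix as weights."""
    total = ld_share + gb_share + fb_share
    if total <= 0:
        raise ValueError("batted-ball shares must sum to a positive number")
    weighted_hits = (
        HIT_RATE["line_drive"] * ld_share
        + HIT_RATE["ground_ball"] * gb_share
        + HIT_RATE["fly_ball"] * fb_share
    )
    return weighted_hits / total


# Hypothetical profiles with identical line drive shares (19 percent each).
ground_ball_pitcher = expected_babip(0.19, 0.50, 0.31)
fly_ball_pitcher = expected_babip(0.19, 0.35, 0.46)

print(f"ground ball pitcher: {ground_ball_pitcher:.3f}")  # prints 0.299
print(f"fly ball pitcher:    {fly_ball_pitcher:.3f}")     # prints 0.284
```

Under these assumed mixes, the fly ball pitcher comes out roughly 15 points lower on balls in play, which is the deviation described above. The sketch says nothing about the home runs that come along with those extra fly balls.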
With that said, it is the extent to which line drives are out of the pitcher's control that intrigues me. For a pitcher to induce a lower than expected BABIP, it stands to reason the associated skill is in large part the ability to allow fewer line drives than average.
What we will do is look at some data splits and compare BABIP. Note that in each and every case, the BABIP you would intuitively expect to be better indeed is. The data presented will be the average BABIP in each instance spanning the 2004-09 campaigns.
The BABIP for right-handed pitchers versus right-handed batters was .294 but rose to .305 versus left-handed batters. Similarly, with lefty pitchers against lefty batters, the BABIP was .297 but rose to .303 against righty hitters. Obviously, this does not prove anything about the skills of an individual pitcher, but it does suggest BABIP is not a completely random phenomenon, as in both instances the pitcher held batters of the same handedness to a lower BABIP. To add further credence to this idea, the accepted skills of K/9 and BB/9 also follow this same pattern, with the advantage going to a pitcher with the same handedness as the batter.
The BABIP for pitchers working at home is .297 as compared to .303 on the road. By itself, this does not mean much, but when taken together with the fact that, again, K/9 and BB/9 numbers follow the same path, it can be argued that in the aggregate, home pitchers are more skilled with respect to reducing BABIP, albeit slightly.
In the interest of space, only data in the extreme counts will be presented. In a 3-0 count, the BABIP is .315 as compared to just .286 when it is 0-2. The intermediate counts also fall in line. So again, we are not unveiling anything about a specific pitcher's skills, but globally, in a count that intuitively favors pitchers, the BABIP is decidedly lower than in a hitter's count. So if nothing else, this suggests BABIP is not totally random.
The last set of splits really piques my interest for reasons that will be elucidated in a bit. The BABIP with runners on base is .304 as compared to .296 with the bases empty. Looking at these numbers a bit more closely, the BABIP with a runner on first is .313. With runners at first and third, it is .309. Curiously, it is only .292 with runners on first and second. For the record, while a five-year average was presented, the above held true for each individual year. In addition, the corresponding K/9 and BB/9 numbers display the same pattern, namely that pitchers show a higher skill level with the bases empty.
Here is the part that has me thinking. There is a tangible difference between a pitcher working with the bases empty and with runners aboard. With ducks on the pond, a pitcher generally works from the stretch. However, with the sacks clear, starters will use the full windup. I have always been of the mind that each starting pitcher is actually two different guys, the windup guy and the stretch guy. That is, the skills like K/9 and BB/9 may very well be different depending on how the pitcher delivers the ball.
Admittedly, there are at least two shortcomings within this data. The first is that many relievers work from the stretch regardless of the game situation. The second is that some pitchers will work from the windup with the bases loaded or men on second and third. Neither of these scenarios is incorporated into the presented data, so the precise BABIP for pitchers using the windup versus the stretch is not discernible.
Even with the above caveat, there is reasonable evidence to suggest that one thing that may differentiate one pitcher's skills from the next is how much effectiveness each loses when switching from the windup to the stretch. It is entirely plausible that for whatever reason, certain pitchers retain more of their effectiveness working from the stretch. This may even help explain how some hurlers are able to leave a higher percentage of their allowed runners on base, leading to a better ERA than expected.
There is one more pertinent point to be made. Recall the oddity that the BABIP dropped with runners on first and second. The primary difference between having runners on first and second versus only on first or on first and third is that a stolen base attempt is less likely with runners on first and second. So even though the pitcher is still going from the stretch, he is less concerned about a steal attempt.
Hmm, to be fair, this does introduce another variable: whether or not the first baseman is holding the runner on, and whether the middle infielders are cheating to cover second for a steal or are at double play depth. This altered defensive alignment may impact BABIP. But again, the purpose of this discussion is not to draw conclusions; I am not attempting to show beyond a shadow of a doubt that pitchers can affect BABIP. I am merely trying to open some eyes to the possibility and perhaps persuade those who do this for a living to collect the necessary data to further test these hypotheses. But I digress.
This is a fantasy column, so let us conclude with some potential fantasy applications of the above. As most are aware, presently, the primary utility of examining BABIP data in-season is to help gauge if performance is fortunate or unfortunate. As explained, most pitchers generally cluster around .300, so if a guy is sporting a BABIP lower than that, he is deemed lucky and a spike in WHIP and ERA is expected as the BABIP regresses towards the mean. And if it is above .300, improvements in his ratios are likely as it regresses downward. The concept is analogous for hitters, except they establish their individual baselines and are considered lucky if their BABIP is above that level.
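As a rough illustration of that mainstream use, the Python sketch below converts the gap between a pitcher's BABIP and the .300 clustering point into a count of hits above or below what the baseline would predict. The function name, the default baseline and the sample stat line are assumptions for illustration only, and the result says nothing about whether the gap is luck or skill.

```python
LEAGUE_BABIP = 0.300  # approximate clustering point cited above, not a measured value


def hits_above_baseline(hits, home_runs, at_bats, strikeouts, baseline=LEAGUE_BABIP):
    """In-play hits allowed minus the number the baseline BABIP would predict."""
    balls_in_play = at_bats - home_runs - strikeouts
    in_play_hits = hits - home_runs
    return in_play_hits - baseline * balls_in_play


# Hypothetical two-month line: 60 hits, 5 homers, 220 at-bats, 50 strikeouts.
# That is 55 hits on 165 balls in play, a .333 BABIP.
extra = hits_above_baseline(60, 5, 220, 50)
print(f"{extra:+.1f} hits versus a .300 baseline")  # prints +5.5
```

A positive number is the "unlucky" case in the conventional reading, and a negative number is the "lucky" one.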
To be frank, you did not suffer through the previous 1,500 or so words for this, as that type of analysis has moved into the mainstream. Instead, what I would like to do is propose that we may, in fact, be overlooking some skill elements, or the lack thereof, in a small sample when incorporating BABIP into the analysis. Specifically, there may be a couple of instances where what we assumed to be strictly bad luck may have also been some bad pitching.
At this point, it is borderline Pavlovian to write off all pitchers' high BABIP as bad luck. But what if the high BABIP is fueled by a high line drive percentage? Is allowing a bunch of line drives bad luck or bad pitching? So at minimum, perhaps also noting a pitcher's line drive percentage is necessary to gauge his effectiveness. And if he is sporting a high line drive percentage, perhaps it cannot automatically be assumed his BABIP will regress, until he figures out why he has lost effectiveness. Looking at K/9 and BB/9 hand in hand with this can help judge how much of the effect is skill.
Something else to at least consider is the domino effect that having runners on base has on pitchers' skills, including the higher BABIP that comes with switching to the stretch. If a pitcher is uncharacteristically wild, the trouble may snowball as he then works from the stretch. Or even if a pitcher gives up a couple of unlucky hits, the fact he is forced to work more from the stretch may result in numbers worse than he actually pitched, which may cloud our opinion of how he will fare in ensuing contests. The reverse is also true. We may overestimate the potential of a guy who was lucky on balls in play and therefore did not have to go from the stretch as much, artificially suppressing his numbers.
Obviously the small sample size caution applies to the above analysis, but in this era of leagues with daily transactions and weekly leagues with liberal movement between active and reserve rosters, it is necessary to at least attempt to get a better grip on performance over small samples. I say attempt, because ultimately, this could be an exercise in futility, at least with the data currently tracked.