Quality control: The numbers behind John Lowe's quality start stat
Earlier this week, John Lowe, a writer for the Detroit Free Press for the past 29 years — and an extremely well-respected one at that — announced his retirement. If you don't know Lowe by name, you almost certainly know at least one facet of his legacy, for he invented a statistic: the quality start.
The year was 1985, and Lowe was writing for the Philadelphia Inquirer (he moved along to Detroit the next year). Noting the decline of the complete game and the evolving philosophy of managers with regard to their expectations for their starting pitchers, he strove to find a descriptive stat that recognized this change. As he told Murray Chass in 2011:
"I got the idea in 1983 and '84," Lowe said. "I was hearing managers saying they were looking for six innings from their pitchers. I heard Whitey Herzog say 'all I want from my pitchers is six good innings.'"
That's where six innings came from. And the runs? "Six and two is too stingy, six and four is too much. I wasn't going to get into a more than or less than. This was new and had to be understandable."
Why the need for a new statistic? "I didn’t like ERA as a definitive stat," Lowe said. "One bad start could wreck your ERA. But I never said don't look at wins and losses."
Thus was born the quality start, credited to a starter for any outing in which he pitches at least six innings and allows no more than three earned runs. Due to the vagaries of offensive and bullpen support, that starter may not actually get a win for his effort, but by and large, he's done a good job of keeping his team in the game. Note that while Lowe didn't explicitly distinguish between earned and unearned runs in answering Chass, he clearly wanted something in the same ballpark as ERA.
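Lowe's rule reduces to a two-condition check. A minimal sketch in Python (the function name and signature are my own, not anything from the article):

```python
def is_quality_start(innings_pitched: float, earned_runs: int) -> bool:
    """Quality start per Lowe's definition: at least six innings pitched
    and no more than three earned runs allowed."""
    return innings_pitched >= 6.0 and earned_runs <= 3

# A seven-inning, two-run outing qualifies; a five-inning shutout does not.
print(is_quality_start(7.0, 2))   # → True
print(is_quality_start(5.0, 0))   # → False
```

Note that the outcome of the game plays no part in the definition; that independence from run support is the whole point of the stat.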
Historically speaking, quality starts occur in a bit more than half of all games, with their rate essentially mirroring scoring levels. One stat's peak is the other's nadir; since 1950, the extremes on either side for quality starts have coincided with the highest and lowest scoring levels in that span. In 1968, the "Year of the Pitcher," teams scored an average of 3.42 runs per game (the lowest level since 1908) and pitchers made quality starts 62.6 percent of the time. In 1996, when teams averaged 5.04 runs per game (the highest level since 1936), pitchers made quality starts in 45.8 percent of games.
In 2014, when teams scored an average of 4.07 runs per game, pitchers made quality starts 54.0 percent of the time, up from 52.6 percent of the time in 2013. As you'd expect, the best pitchers made quality starts with the highest frequency. Here are the top 10 in each of the AL and NL in terms of rate:
| Rank | Pitcher | Team(s) | GS | QS | QS% |
| --- | --- | --- | --- | --- | --- |
| 1 | Jon Lester | Red Sox/Athletics | 32 | 27 | 84.4 |
| 2 | Chris Sale | White Sox | 26 | 21 | 80.8 |
For those tables, I stuck with the 162-inning cutoff used for ERA qualifiers, which means that Masahiro Tanaka (80 percent in 20 starts), Jacob deGrom (77.3 percent in 22 starts) and Hyun-Jin Ryu (73.1 percent in 26 starts) were bumped, though they at least deserve mention. Those short-season pitchers aside, that's just about everybody you'd place on a Cy Young ballot in either league, not to mention just about everybody from the Wins Above Replacement leaderboards. Everybody who's anybody is making quality starts.
And yet the stat has its critics, and not without reason. One primary criticism is that it rewards mediocrity or worse by granting credit for an outing that produces a 4.50 ERA (six innings, three earned runs). During the season that Lowe invented the stat, the major league ERA was 3.89; it had been exactly 4.00 in 1977 and 1979 but hadn't been around 4.50 since 1936, part of a much higher-scoring era. A 4.50 mark was out of step with the times, though Lowe couldn't have foreseen that scoring levels would rise sharply in the 1990s and 2000s. The major league ERA during the strike-shortened 1994 season was 4.51, and from that year until 2007, it was within one-tenth of a run (0.10) of 4.50 nine times, topping it five times with a high of 4.77 in 2000.
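That 4.50 figure follows directly from scaling three earned runs over six innings to a nine-inning game; a quick check (the helper name is mine, not the article's):

```python
def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned run average: earned runs scaled to a nine-inning game."""
    return 9 * earned_runs / innings_pitched

# The minimum-quality outing: six innings, three earned runs.
print(era(3, 6.0))  # → 4.5
```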
Cumulatively, the major league ERA was 4.48 for the 1994-2007 stretch, before scoring levels began to taper off significantly; it's been below 4.00 in three of the past four seasons, with this year's 3.74 mark the lowest since 1989 (3.71). Over a broader swath of history, the major league ERA is lower; from 1961 (the start of the expansion era) to 2014, it's 4.00, while from 1973 (the start of the designated hitter era) to 2014, it's 4.10, and from 1993 (the start of the more recent expansion wave, and the point when scoring and home run levels began to soar) to 2014, it's 4.35.
Given that, the 4.50 ERA threshold doesn't represent average, but actually functions more like a replacement level — a baseline that has real value when it comes to measuring performance. That isn't to say that 4.50 is an ideal place to set that line, given the degree to which scoring levels fluctuate over time. As I'll show below, a 4.50 ERA produces a winning percentage that's below .500 but still well above the .320 percentage that is used by both Baseball-Reference.com and FanGraphs (which is to say that a team full of replacement level players would win 32 percent of its games).
Critics of the quality start stat also complain that it preserves the artificial distinction between earned and unearned runs, a distinction set more than a century ago, when fielders' mitts were nonexistent or rudimentary and errors abounded. In 1901, the year the American League came into existence, 32 percent of all runs were unearned, whereas they've accounted for less than 10 percent of all runs every year since 1991, and just 8.3 percent this year.
Given that it's the pitcher's job to prevent all runs, earned or unearned, and that we now have more sophisticated ways to account for the separation between pitching and defense (such as the Fielding Independent Pitching stat), there's certainly an argument to be made for ditching the distinction — far beyond the realm of quality starts. But in order to understand the stat better, I'm ignoring that for now.
Back to what we'll call the 4.50 case — by which I mean exactly six innings and three earned runs. A few years ago, it inspired me to dig deep into the concept of quality starts. What its critics don't realize is how rarely the 4.50 case occurs; from 1950-2010 (the span I used when I wrote my piece in 2011, given that RetroSheet data only went back so far), it accounted for 5.9 percent of quality starts and just 3.0 percent of all starts. In 2014, it accounted for 8.5 percent of quality starts and just 4.6 percent of all starts. In those games alone — games in which starters delivered exactly six innings with three earned runs allowed — teams went 98-124, for a .441 winning percentage.
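The .441 figure is the standard wins-over-decisions calculation, applied to the 98-124 record above; sketched here (the function name is my own):

```python
def winning_pct(wins: int, losses: int) -> float:
    """Winning percentage: wins divided by total decisions."""
    return wins / (wins + losses)

# 2014 teams whose starter went exactly six innings with three earned runs.
print(round(winning_pct(98, 124), 3))  # → 0.441
```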
Given that, you might think that there isn't much separation between teams that receive a quality start versus teams that don't, but you'd be far off base. Historically, teams receiving quality starts win around two-thirds of the time. For the post-1960 expansion era, their winning percentage is .674; for 2014, it was .660. What's more, there's a massive gulf in collective performance between pitchers who put up quality starts versus those who don't; generally, the former group has an ERA a bit below 2.00, the latter above 7.00. In 2014, the split was a 1.88 ERA for those pitching quality starts, 6.97 for the rest. That's actually very similar to the performance of all pitchers in wins (1.82 this year) and losses (7.33).
Many who are conceptually on board with the idea of counting some kind of quality starts take issue with the thresholds that Lowe defined. ROOT Sports, the channel that broadcasts Mariners games and thus King Felix’s starts, introduced two variants: the "ultra quality start" (at least seven innings, with no more than two earned runs) and the "mega quality start" (at least eight innings, with no more than one earned run). CBS Sports' Dayn Perry offered the "dominant start," dispensing with the earned/unearned run distinction and considering only starts of at least eight innings with no more than one run allowed.
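For reference, these variants translate into predicates just as simply as the original; a sketch with function names of my own choosing:

```python
def ultra_quality_start(ip: float, er: int) -> bool:
    """ROOT Sports' variant: at least seven innings, no more than two earned runs."""
    return ip >= 7.0 and er <= 2

def mega_quality_start(ip: float, er: int) -> bool:
    """ROOT Sports' stricter variant: at least eight innings, no more than one earned run."""
    return ip >= 8.0 and er <= 1

def dominant_start(ip: float, runs: int) -> bool:
    """Dayn Perry's variant: at least eight innings, no more than one run of any kind."""
    return ip >= 8.0 and runs <= 1

print(ultra_quality_start(7.0, 2))  # → True
print(mega_quality_start(8.0, 2))   # → False
```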
All of those are reasonable places to draw the line, perhaps more useful than six innings and three earned runs, thresholds I've defended in the past given their broad applicability across large swaths of baseball history. But having spent hours sifting through Play Index data for a notion of where a better place to draw the line might be, I'm struck by the diminishing returns. For example, if we were to redefine a quality start as seven or more innings, three or fewer earned runs, we'd find that in 2014, that described just 29.1 percent of starts (compared to 54.0 for the 6/3 thresholds), and teams receiving such starts had a .707 winning percentage (up from .660). The "ultra quality start" definition describes 25.2 percent of 2014 starts, and teams post a .741 winning percentage. Such starts are much scarcer commodities, but does drawing the line there tell us significantly more than the original stat?
I'd argue not, for if we want more sophistication in recognition of quality, we can turn to FIP or WAR. Having something distinct from those two — easy enough to calculate at a glance when combing through the morning box scores — is why I continue to use the original definition of the quality start and to marvel at the stat's simple elegance. Thus, my tip of the cap to the retiring John Lowe, for yet another quality piece of work.