Mathematicians Predict Cy Young Winners

This year's Cy Young Award winners in baseball will be announced Nov. 8 (American League) and Nov. 10 (National League) by the Baseball Writers' Association of America, whose members vote on the award.

But mathematicians Rebecca Sparks and David Abrahamson, a husband-and-wife team who teach at Rhode Island College, couldn't wait to find out who wins the pitching award. So they developed a math formula that predicts which pitchers will place first through third in Cy Young voting.

They predict Chris Carpenter of the St. Louis Cardinals and Mariano Rivera of the New York Yankees will snag the coveted awards.

Update: Math is Wrong
Nov. 8: Bartolo Colon won the American League Cy Young Award on Tuesday in a surprisingly one-sided vote, becoming the first Angels pitcher in 41 years to take home the honor.

Colon, who led the league with 21 wins, was listed first on 17 ballots and second on the other 11 for 118 points in voting by the Baseball Writers' Association of America. He was the only pitcher named on every ballot, easily beating out New York Yankees reliever Mariano Rivera, who received 68 points.

Rivera got eight first-place votes, while 2004 winner Johan Santana of the Minnesota Twins received three and finished third.

-- Associated Press

Sparks and Abrahamson announced their prediction today. They had presented their model in the April 2005 issue of Math Horizons, a magazine published by the Mathematical Association of America (MAA).

Unusual approach

Every season, the baseball writers' association selects two sportswriters from every city in the major leagues to vote for first-, second- and third-place choices. The ballots are due right after the regular season ends.

"The identities of the voters change frequently," Sparks and Abrahamson write in their Math Horizons article, "but we will see that their voting results follow a predictable course."

The researchers structured their formula to predict the voting results for starting pitchers, who almost always win the award, rather than relief pitchers, who are rarely the recipients. However, their formula reveals a lack of standout American League starting pitchers this year, suggesting that the AL award will go to relief pitcher Mariano Rivera for his extraordinary 2005 season.

The researchers did not consider which pitchers should win the award, or which qualities were most important in a pitcher. They simply aimed to develop a mathematical formula that would best match the voting results.

The formula computes a score for each pitcher on a scale from roughly 0 to 10. To be successful, it must assign the top score in a given season to the pitcher who places first in Cy Young voting, the next-highest score to the pitcher who places second, and the third-highest score to the pitcher who places third.

To calculate the scores, they first chose four key pitching statistics: wins, losses, strikeouts, and ERA (earned run average, the average number of earned runs a pitcher gives up per nine innings). They also included a fifth statistic, the winning percentage of the pitcher's team, because they thought it influences the voting results.

Math help

But the main question, according to the two researchers, is how much importance the voters placed on each of those five categories. Do voters, consciously or unconsciously, generally value a pitcher's number of wins more than his number of strikeouts? Does a pitcher on a first-place team really have a better chance of winning the award than a pitcher with slightly better stats on a last-place team?

The tools of mathematics can answer this seemingly subjective question. First, the researchers looked up the statistics in those five categories for starting pitchers between 1993 and 2002 and compared them to the Cy Young voting results for those years.

Then, to determine the relative importance of each of the five categories in the voting results, they turned to a mathematical method, dating to the 1940s, called linear programming. First developed by economists (who won the Nobel Prize for work that employed it) and mathematician George Dantzig, the idea is to find the missing numbers (in this case, the relative importance or "weight" of each pitching category in the voting) in order to satisfy certain constraints (i.e., a formula that would correctly yield the first- through third-place results for Cy Young balloting).
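
One way to write down the kind of problem described above is sketched here. It assumes the score is a plain weighted sum of the five statistics, which the article implies but does not state outright; the symbols $w_k$, $x_{p,k}$, $S_p$ and the margin $\varepsilon$ are notation introduced only for this illustration, and the objective (maximizing the margin) is an assumed choice, not necessarily the one the researchers used.

```latex
% Score of pitcher p: weighted sum of the five statistics x_{p,1},...,x_{p,5}
S_p = \sum_{k=1}^{5} w_k \, x_{p,k}

% For each league-season with actual finishers p_1, p_2, p_3
% and every other pitcher q in that league-season:
\begin{aligned}
\text{maximize}   \quad & \varepsilon \\
\text{subject to} \quad & S_{p_1} \ge S_{p_2} + \varepsilon, \\
                        & S_{p_2} \ge S_{p_3} + \varepsilon, \\
                        & S_{p_3} \ge S_{q} + \varepsilon, \\
                        & \textstyle\sum_{k=1}^{5} |w_k| \le 1 .
\end{aligned}
```

Every constraint is linear in the unknown weights, which is what makes linear programming applicable; the last line only fixes the overall scale of the weights so the margin cannot grow without bound.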

Analyzing the 1993 to 2002 data, they concluded that a pitcher's number of wins carried almost three times as much weight in the voting as his earned run average.

ERA, in turn, was about one-and-a-half times more important than strikeouts, and about twice as important as the winning percentage of the pitcher's team. Almost completely insignificant, according to the model, was a pitcher's number of losses, which seemed to have very little bearing on the voting results.
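
The relative weights described above can be turned into a rough scoring sketch. Only the ratios come from the article (wins about three times ERA, ERA about 1.5 times strikeouts and about twice team winning percentage, losses near zero). Everything else is an assumption made for illustration: the linear form of the score, the scaling ranges, the 0.05 stand-in for the losses weight, and the two made-up pitchers. The researchers' actual coefficients and scaling are not given here.

```python
# Relative weights implied by the article, with ERA fixed at 1.0.
RELATIVE = {
    "wins": 3.0,            # "almost three times" the weight of ERA
    "era": 1.0,
    "strikeouts": 1 / 1.5,  # reading "one-and-a-half times more" as a 1.5x ratio
    "team_pct": 0.5,        # ERA is about twice as important
    "losses": 0.05,         # assumed stand-in for "almost completely insignificant"
}

# Assumed (hypothetical) ranges used to map each statistic onto 0..1.
RANGES = {
    "wins": (0, 25),
    "era": (1.5, 6.0),
    "strikeouts": (0, 350),
    "team_pct": (0.3, 0.7),
    "losses": (0, 20),
}
LOWER_IS_BETTER = {"era", "losses"}


def normalized_weights(relative):
    """Rescale the relative weights so they sum to 1."""
    total = sum(relative.values())
    return {k: v / total for k, v in relative.items()}


def scale(stat, value):
    """Map a raw statistic to 0..1, flipping stats where lower is better."""
    lo, hi = RANGES[stat]
    x = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    return 1.0 - x if stat in LOWER_IS_BETTER else x


def score(pitcher, weights):
    """Weighted sum of scaled statistics, stretched to a 0-to-10 scale."""
    return 10 * sum(weights[s] * scale(s, pitcher[s]) for s in weights)


WEIGHTS = normalized_weights(RELATIVE)

# Two made-up pitchers: A has more wins, B has the better ERA and strikeouts.
pitcher_a = {"wins": 21, "losses": 8, "strikeouts": 160, "era": 3.50, "team_pct": 0.580}
pitcher_b = {"wins": 16, "losses": 7, "strikeouts": 240, "era": 2.90, "team_pct": 0.510}

print(round(score(pitcher_a, WEIGHTS), 2), round(score(pitcher_b, WEIGHTS), 2))
```

Under these assumed inputs, the pitcher with more wins outscores the one with the better ERA and strikeout total, which is the pattern the weights describe.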

Hindcasting success

By taking each pitcher's statistics in these five categories and adjusting their values according to these relative weights, the researchers' formula correctly yielded all but one of the first-, second- and third-place vote-getters in each league from 1993 to 2002. Recently, they incorporated the data for the 2003 and 2004 seasons into the model, and predicted three out of four Cy Young winners (the fourth was a reliever). By looking at the 2003 and 2004 statistics, they again found that the relative weights of the five categories were almost exactly the same as in the earlier data.
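
The hindcasting check described above amounts to ranking each season's pitchers by score and comparing the top three with the actual voting order. A minimal sketch, assuming the scores have already been computed; the function names, season labels, pitcher names, and numbers are hypothetical placeholders, not the researchers' data.

```python
def top_three(scores):
    """Return the three highest-scoring names, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:3]


def hindcast_hits(seasons):
    """Count how many actual top-three placements the scores reproduce.

    `seasons` maps a label to (scores, actual_order), where `actual_order`
    lists the real first- through third-place finishers.
    """
    hits = total = 0
    for scores, actual in seasons.values():
        predicted = top_three(scores)
        for place, name in enumerate(actual):
            total += 1
            hits += predicted[place] == name
    return hits, total


# Hypothetical placeholder data for two league-seasons.
seasons = {
    "season 1": ({"A": 6.8, "B": 6.1, "C": 5.9, "D": 5.2}, ["A", "B", "C"]),
    "season 2": ({"E": 7.1, "F": 6.0, "G": 6.3, "H": 4.9}, ["E", "F", "G"]),
}

hits, total = hindcast_hits(seasons)
print(f"{hits} of {total} placements reproduced")
```

In the placeholder data, the second season swaps second and third place, so only one of its three placements counts as a hit.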

Using their formula, the researchers come up with the following predictions for the first three places in the 2005 National League voting:

• Chris Carpenter, St. Louis (6.4257 points)
• Dontrelle Willis, Florida (6.3420)
• Roy Oswalt, Houston (5.9064)

According to Abrahamson, voters may drift away from their past behavior by voting for Roger Clemens or Andy Pettitte ahead of Roy Oswalt this year.

Clemens and Pettitte are better-known veterans who may have a somewhat higher profile in the news media than Oswalt.

In the American League, the top starters (not the predicted winners) in their model are, in order:

• Bartolo Colon, LA/Anaheim (5.8074)
• Johan Santana, Minnesota (5.3671)
• Jon Garland, Chicago (5.0730)

No standout

The model shows that there is no standout starter in the American League this year. Bartolo Colon, the top starter according to their model, has a total score of less than 6, a far cry from many AL Cy Young award winners in years past, such as Barry Zito (6.75, 2002) and Pedro Martinez (7.54, 1999).

"Our model quantifies the fact that there is no AL pitcher who will knock the voters' socks off," says Abrahamson. Therefore, Sparks says the two are "very confident" that the AL Cy Young Award will go to Mariano Rivera, a relief pitcher who had a particularly outstanding year. A Cy Young for Rivera, they say, would also serve as a kind of "lifetime achievement award," as Rivera, who has never earned the award, is likely nearing the end of a very distinguished career.

The researchers think that their mathematical approach, known generally as "constrained optimization," might work for other sports awards, such as the most valuable player in various leagues. It also might help provide insights into how magazines rank corporations, or top colleges. But the point of their approach, they say, is to show how the methods of mathematics can apply in many unexpected everyday situations.

"The moral is always the same for the mathematical modeler," they write in their Math Horizons article. "More often than we may know, there is a pattern out there. We just have to keep thinking creatively, and we have got a good chance of finding it."