26 March 2012

Parlor Games and Predicting Presidential Elections

At the NYT, Nate Silver has a great post up on the ability of political scientists to predict elections based on "fundamentals" such as the state of the economy. Silver asks:
Can political scientists “predict winners and losers with amazing accuracy long before the campaigns start”?
His answer?
The answer to this question, at least since 1992, has been emphatically not. Some of their forecasts have been better than others, but their track record as a whole is very poor.

And the models that claim to be able to predict elections based solely on the fundamentals — that is, without looking to horse-race factors like polls or approval ratings — have done especially badly. Many of these models claim to explain as much as 90 percent of the variance in election outcomes without looking at a single poll. In practice, they have had almost literally no predictive power, whether looked at individually or averaged together.

When I was in political science graduate school in the early 1990s, predicting elections was a hot niche in the field. I thought it was a dubious endeavor then and haven't encountered any evidence to change my mind. (One of my very first blog posts at Prometheus back in 2004 was on this subject.)

Silver identifies "nearly 60 forecasting models published by political scientists or economists in advance of the 1992 through 2008 elections" and then engages in one of my favorite academic exercises -- he compares the predictions to what actually happened, noting (rather remarkably for a field engaged in prediction) that he'd never done such an exercise and "[n]or, to my knowledge, has anybody else done so in a comprehensive way."

An effective evaluation of predictions requires that a few "ground rules" (Silver's phrase) be followed. For instance, the prediction must have been made by academics in advance of the event being predicted, using only data available at the time the forecast was made. Importantly, Silver highlights the need to include all predictions that were made. He observes a "trick" of the trade:
But one “trick” that some of the forecasters use is to highlight the version of the forecast that seems to match the polls or the consensus view, while burying the others in the fine print. One forecaster in 2008, for instance, published four versions of his model, two of which showed a clear advantage for John McCain and the other two for Barack Obama. Since there was abundant evidence by the late summer that Mr. Obama was the favorite, he recommended that people look at the pro-Obama versions. However, he was presumably doing so because he was aware of factors like polls that he hadn’t originally deemed useful to include in his model. We treat all these versions equally: if it was published and has your name on it, it counts. 
That "trick" doesn't happen in other fields, does it? ;-) In a paper that I wrote in 2009 evaluating hurricane landfall forecasts, I gave this "trick" a fancier name -- "when the hot hand fallacy meets the guaranteed winner scam" (discussed here in PDF), and long-time readers may recall RMS and the monkeys, but I digress.

Silver evaluates the standard error in predictions and finds not just quantitatively poor performance, but results that are literally all over the place.
  • 1992: "little consensus among the models, a high standard error, and much less accuracy than claimed."
  • 1996: "the models had many of the same problems"
  • 2000: "the models had their worst year" 
  • 2004: "similar to 1992 or 1996" 
  • 2008: "the divide in the models was especially wide"
In total, 18 of the 58 models — more than 30 percent — missed by a margin outside their 95 percent confidence interval, something that is supposed to happen only one time in 20 (or about three times out of 58).
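As a rough check on the arithmetic behind that 18-of-58 figure, here is a minimal Python sketch. It rests on an assumption of mine, not Silver's: that each model's miss is an independent event with a 5 percent chance. That is surely too generous, since the models share data and elections, so treat the result as an order-of-magnitude illustration only. The function name `binomial_tail` is just an illustrative helper.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Figures quoted from Silver's post.
n_models = 58             # forecasts evaluated
n_misses = 18             # forecasts that fell outside their own 95% interval
nominal_miss_rate = 0.05  # what a 95% interval implies ("one time in 20")

observed_rate = n_misses / n_models
expected_misses = n_models * nominal_miss_rate

# ASSUMPTION: misses are independent across models (almost certainly not true).
tail = binomial_tail(n_models, n_misses, nominal_miss_rate)

print(f"observed miss rate: {observed_rate:.1%}")
print(f"expected misses at the nominal rate: {expected_misses:.1f}")
print(f"P(at least {n_misses} misses | independence): {tail:.2e}")
```

Under that admittedly crude independence assumption the tail probability comes out vanishingly small, which is another way of saying the stated confidence intervals were far too narrow.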
Silver promises that an upcoming post will look at the predictive skill of the models against a naive baseline, an approach that is necessary to quantitatively evaluate the value added by the forecasting methodology. He says that "The results are not going to be flattering." Back in 2004 I wrote that "such models are little more than parlor games for academics" and Silver's analysis doesn't really compel me to change my mind.

Silver has a book coming out later this year on predictions, and if it matches the high quality of this post (and many of his others) it'll be fantastic.