26 March 2012

Parlor Games and Predicting Presidential Elections

At the NYT Nate Silver has a great post up on the ability of political scientists to predict elections based on "fundamentals" such as the state of the economy. Silver asks:
Can political scientists “predict winners and losers with amazing accuracy long before the campaigns start”?
His answer?
The answer to this question, at least since 1992, has been emphatically not. Some of their forecasts have been better than others, but their track record as a whole is very poor.

And the models that claim to be able to predict elections based solely on the fundamentals — that is, without looking to horse-race factors like polls or approval ratings — have done especially badly. Many of these models claim to explain as much as 90 percent of the variance in election outcomes without looking at a single poll. In practice, they have had almost literally no predictive power, whether looked at individually or averaged together.

When I was in political science graduate school in the early 1990s, predicting elections was a hot niche in the field. I thought it was a dubious endeavor then and haven't encountered any evidence to change my mind. (One of my very first blog posts at Prometheus back in 2004 was on this subject.)

Silver identifies "nearly 60 forecasting models published by political scientists or economists in advance of the 1992 through 2008 elections" and then engages in one of my favorite academic exercises -- he compares the predictions to what actually happened, noting (rather remarkably for a field engaged in prediction) that he'd never done such an exercise and "[n]or, to my knowledge, has anybody else done so in a comprehensive way."

An effective evaluation of predictions requires that a few "ground rules" (Silver's phrase) be followed, such as: the prediction must have been made by the academics in advance of the event, using only data available at the time the forecast was made. Importantly, Silver highlights the need to include all predictions that were made. He observes a "trick" of the trade:
But one “trick” that some of the forecasters use is to highlight the version of the forecast that seems to match the polls or the consensus view, while burying the others in the fine print. One forecaster in 2008, for instance, published four versions of his model, two of which showed a clear advantage for John McCain and the other two for Barack Obama. Since there was abundant evidence by the late summer that Mr. Obama was the favorite, he recommended that people look at the pro-Obama versions. However, he was presumably doing so because he was aware of factors like polls that he hadn’t originally deemed useful to include in his model. We treat all these versions equally: if it was published and has your name on it, it counts. 
That "trick" doesn't happen in other fields, does it? ;-) In a paper that I wrote in 2009 evaluating hurricane landfall forecasts, I gave this "trick" a fancier name -- "when the hot hand fallacy meets the guaranteed winner scam" (discussed here in PDF), and long-time readers may recall RMS and the monkeys, but I digress.

Silver evaluates the standard error in predictions and finds not just quantitatively poor performance, but results that are literally all over the place.
  • 1992: "little consensus among the models, a high standard error, and much less accuracy than claimed."
  • 1996: "the models had many of the same problems"
  • 2000: "the models had their worst year"
  • 2004: "similar to 1992 or 1996"
  • 2008: "the divide in the models was especially wide"
In total, 18 of the 58 models — more than 30 percent — missed by a margin outside their 95 percent confidence interval, something that is supposed to happen only one time in 20 (or about three times out of 58).
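That calibration claim is easy to check with a little arithmetic. If each model's 95 percent confidence interval were honest, the number of misses among 58 models would be roughly binomial with n = 58 and p = 0.05 -- only roughly, since the models are not independent of one another, so treat this as an illustrative sketch rather than a formal test:

```python
from math import comb

n, p = 58, 0.05              # 58 models, nominal 5% miss rate for a 95% interval
expected = n * p             # about 2.9 misses expected if the intervals were honest

# probability of observing 18 or more misses under the binomial model
p_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(18, n + 1))

print(f"expected misses: {expected:.1f}")   # ~2.9, matching Silver's "about three"
print(f"P(18+ misses):   {p_tail:.1e}")     # vanishingly small
```

Even granting plenty of correlation among the models, 18 misses is wildly inconsistent with the advertised 95 percent intervals.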
Silver promises that an upcoming post will look at the predictive skill of the models against a naive baseline, an approach that is necessary to quantitatively evaluate the value-added by the forecasting methodology. He says that "The results are not going to be flattering." Back in 2004 I wrote that "such models are little more than parlor games for academics" and Silver's analysis doesn't really compel me to change my mind.
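The comparison Silver promises -- forecasts against a naive baseline -- is straightforward to set up. One common form is a mean-squared-error skill score, where a score at or below zero means the model adds no value over the baseline. Here is a minimal sketch; the vote shares, the 52% baseline, and the "model" forecasts below are all invented for illustration, not Silver's data:

```python
def skill_score(forecasts, baseline, actuals):
    """Fraction of the baseline's mean-squared error removed by the forecasts.
    1.0 = perfect, 0.0 = no better than the baseline, negative = worse."""
    def mse(preds):
        return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(actuals)
    return 1.0 - mse(forecasts) / mse(baseline)

# Hypothetical incumbent-party vote shares (%); the naive baseline always
# predicts a long-run mean of 52%.
actuals   = [53.5, 54.7, 50.3, 51.2, 53.7]
baseline  = [52.0] * len(actuals)
forecasts = [56.0, 50.0, 55.0, 49.0, 58.0]  # made-up "fundamentals model" output

print(round(skill_score(forecasts, baseline, actuals), 3))
```

With these invented numbers the score comes out well below zero -- the "model" is several times worse than simply guessing the long-run mean, which is exactly the kind of unflattering result Silver is hinting at.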

Silver has a book coming out later this year on predictions and if it is of the high quality of this post (and many of his others) it'll be fantastic.


  1. Very enjoyable, thank you. For some reason reminded me of Olympic predictions!

    Did you come across 'Future Babble' by Gardner, published 2011 ('why expert predictions fail and why we believe them anyway'), based I think on Tetlock's work? I was less interested in the 'why we believe' part than in the very good examples of complete failure of predictions even in what one might consider more tractable fields. He's strong on economic predictions.


    Here's an example of prediction-testing from my field, from a VERY smart hedge fund called Marshall Wace. They were faced with '000's of stock analysts at investment banks and brokers, chucking out predictions right, left and centre. How to sort the wheat from the chaff, especially when even the good analysts might not believe their 'buy/sell'? It might just be a condition of the job, or of the corporate client relationship, to slap on a recommendation of some sort, i.e. very few of the predictions were in reality meant to be predictions. Some of those analysts HAD TO BE really smart and worth listening to, but there was no point back-testing because they couldn't tell what value the analyst placed on the rec.

    So they said to the entire field: we've set up a system where, when you really believe in your 'buy' or 'sell', you enter it into the system. The analyst was blind - he had no idea whether any position was initiated by MW - but the carrot was an 'audited track record' for the analyst AND directed commission flows to that firm, and hence to that analyst's bonus, if they were good. So his market value and compensation would rise.

    It worked. The analysts Darwin-ed themselves into winners and losers, and after some time testing, MW set up a huge fund called, amusingly, TIPS on the back of it. (By clever system design and transparency to the regulator and compliance departments they managed to swerve any allegations of front-running or privileged information at the analyst end: no analyst would be cretinous enough to enter a 'trade' into a trackable system based on that, and MW were clean - they couldn't know what a recommendation was based on.)

    I don't know how MW ran their selection-testing at their end, how they tried to discern lucky from smart, but I'd guess they used a naive baseline of historic beta versus sector, and given the focus on a single objective of 'does it make money with an acceptable Sharpe Ratio in a hedged portfolio setting' it shouldn't be that difficult. They weren't short of data.
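    A screen along the lines guessed at above -- rank each analyst's hedged recommendations by an annualized Sharpe ratio -- can be sketched in a few lines. The monthly return series here is invented purely for illustration:

```python
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=12):
    """Annualized Sharpe ratio of a series of periodic returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * periods_per_year ** 0.5

# Invented monthly returns for one analyst's hedged recommendations
monthly = [0.012, -0.004, 0.009, 0.015, -0.002, 0.007,
           0.011, 0.003, -0.006, 0.010, 0.008, 0.004]

print(round(sharpe_ratio(monthly), 2))
```

    Ranking analysts by a statistic like this against a sector-beta baseline is one plausible way to tell lucky from smart -- given enough data.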

    The part that was a thing of beauty was the 'blind' features. An analyst would be rewarded via 'hypothecated' commission flows to his firm (very probably in quite different stocks) for good predictions whether or not they were in fact implemented - he had no idea.

  2. I've always wondered when Nate will wander into the statistical politics of climate change. You know he's a liberal; you know he's into statistics; you know he dabbles in cross-curricular material (much like yourself)... It seems perfect for him to toss his hat in regardless of field relevancy. Krugman does it all the time. It would be neat to see him apply the same sort of rigor in the realm of proxies, attribution, and correlation.

  3. Having been cognizant of every US Presidential election campaign since 1972, I have become convinced that the process is almost entirely irrational [however rational we may be as individuals]. How anyone could suppose that the outcome of a completely irrational process can be predicted on the basis of some "fundamental" process I don't know.

    If predictions based upon various fundamental processes were possible it would HAVE to be because these "fundamentals" actually 'cause' the election result in some real way. By what mechanism could Housing Starts, the Prime Rate, the Unemployment Rate, etc. [however artfully massaged] CAUSE a particular candidate's success? How?

    Think about it for a moment, suppose the Prime Rate really did cause particular candidates to win or lose elections and so to prevent unfair advantage to candidate X or Y NEWS of the Prime Rate was kept from the public to keep elections fair. Would the Prime Rate still have its [magic] effect upon the electorate? - or - would we see that it is the NEWS of the Prime Rate fed very artfully to the electorate through that mass psychogenic illness inducing 'process' known as the 'news media' that 'causes' the magical electoral effect?

    My opinion is that these predictive models are nothing more than very sophistry-cated RANDOM NUMBER GENERATORS. Someone with more math skills than me could probably demonstrate this mathematically [if we had the code].

    I would tend to prefer the prognostication of a psychic octopus before one of these so-called models.


  4. The irony, in this context, is that while human behavior is completely characterized (within the known limits), as with the climate system it is too unwieldy to predict. The only suggestion I would offer is that as individuals form ever larger cooperatives, their perceptions of reality, principles, actions, etc. tend to converge.

  5. dart throwing chimps (per Tetlock, Philip) http://www.amazon.com/Expert-Political-Judgment-Good-Know/dp/0691123020

  6. Silver's observations are interesting; at the same time he is on the horns of a dilemma.

    The outcome of elections is determined by something(s). Polls are merely a means of measuring the effects of those things, prior to the election itself which is essentially just another poll with consequences.

    Whatever their reasons, voters do not decide how to vote based on polls - that would be essentially circular reasoning. Polls predict elections by measuring those reasons indirectly. They may have greater accuracy, but they are not actually predictions or forecasts - whatever the explanation of poll results, it comes after the fact.

  7. The only presidential prediction model familiar to me is Abramowitz's and, according to Silver's numbers, it does pretty well. It depends on GDP growth and party incumbency, but also includes the approval rating 17 months out. It appears to overstate an incumbent's vote share by about 2%, but that systematic error is easily corrected. The variation after correction is only about +-1%.

    The bottom line is that presidential prediction models can be reasonably good IF they include a variable based on voter sentiment, which is already a measure of many complex criteria and how they affect each other.