29 May 2012

Hot Hands and Guaranteed Winners

In a 2009 paper I laid out an argument that explored what happens when "the guaranteed winner scam meets the hot hand fallacy" (PDF). It went as follows, drawing upon two dynamics:
The first of these dynamics is what might be called the ‘guaranteed winner scam’. It works like this: select 65,536 people and tell them that you have developed a methodology that allows for 100 per cent accurate prediction of the winner of next weekend’s big football game. You split the group of 65,536 into equal halves and send one half a guaranteed prediction of victory for one team, and the other half a guaranteed win on the other team. You have ensured that your prediction will be viewed as correct by 32,768 people. Each week you can proceed in this fashion. By the time eight weeks have gone by there will be 256 people anxiously waiting for your next week’s selection because you have demonstrated remarkable predictive capabilities, having provided them with eight perfect picks. Presumably they will now be ready to pay a handsome price for the predictions you offer in week nine.
The second,
. . . is the ‘hot hand fallacy’ which was coined to describe how people misinterpret random sequences, based on how they view the tendency of basketball players to be ‘streak shooters’ or have the ‘hot hand’ (Gilovich et al., 1985). The ‘hot hand fallacy’ holds that the probability in a random process of a ‘hit’ (i.e. a made basket or a successful hurricane landfall forecast) is higher after a ‘hit’ than the baseline probability. In other words, people often see patterns in random signals that they then use, incorrectly, to ascribe information about the future.
In the paper I used the dynamics to explain why there is not likely to be convergence on the skill of hurricane landfall forecasts anytime soon. The existence of (essentially) an infinite number of models of hurricane landfalls coupled with the certainty that unfolding experience will closely approximate a subset of available models creates a context ripe for seeing spurious relationships and chasing randomness. However, the basic argument has much more general applicability.
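The arithmetic of the guaranteed winner scam is easy to check. Here is a minimal sketch (the numbers come from the scam description above; the code itself is only illustrative):

```python
# Sketch of the "guaranteed winner scam" arithmetic described above.
# Each week, exactly half of the remaining audience receives a "correct" pick.
recipients = 65536
for week in range(1, 9):
    recipients //= 2  # only the half that saw the winning pick stays convinced
    print(f"week {week}: {recipients} people have seen only correct picks")

# After eight weeks, 256 people have witnessed eight perfect predictions.
assert recipients == 256
```

No prediction skill is involved at any point; the scammer simply spends the audience to manufacture a streak.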

A new paper by Nattavudh Powdthavee and Yohanes E. Riyanto of the Institute for the Study of Labor in Bonn, Germany provides some empirical support for this argument. The paper -- titled "Why Do People Pay for Useless Advice? Implications of Gambler's and Hot-Hand Fallacies in False-Expert Setting" -- looks "experimentally whether people can be induced to believe in a non-existent expert, and subsequently pay for what can only be described as transparently useless advice about future chance events."

In the study the authors operationalized the dynamics of "the guaranteed winner scam meets the hot hand fallacy" using coin flips, while going to great lengths to ensure that participants were aware that the coin being flipped was fair (i.e., the flips were random), even going so far as to have the participants furnish the coin.
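To see how cheaply such "expertise" arises, a quick simulation (my own sketch, not the authors' code) of how often a purely random predictor produces a perfect streak over the first four rounds:

```python
import random

random.seed(42)

TRIALS = 100_000
STREAK = 4  # rounds of free "predictions" before the subject is asked to pay

# Each round, a random "prediction" matches a fair coin flip with probability 1/2,
# so a perfect 4-round streak occurs about 1/16 of the time by chance alone.
perfect = sum(
    all(random.random() < 0.5 for _ in range(STREAK))
    for _ in range(TRIALS)
)
print(perfect / TRIALS)  # roughly 0.0625
```

With a large enough pool of subjects (or self-styled forecasters), some of them will always see an impressive streak from a predictor with no skill at all.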

They found that upon receiving an accurate "prediction" of the subsequent coin flip, many participants were willing to abandon any assumption of randomness and pay for a prediction of the next toss:
On average, the probability of buying the prediction in Round 2 for people who received a correct prediction in Round 1 was 5 percentage points higher than those who previously received an incorrect prediction in Round 1 (P=0.046). The effect is monotonic and well-defined; probabilities of buying were 15 percentage points (P=0.000), 19 percentage points (P=0.000), and 28 percentage points (P=0.000) higher in Rounds 3, 4, and 5 . . .
 The authors identify two interesting results:
The first is that observations of a short streak of successful predictions of a truly random event are sufficient to generate a significant belief in the hot hand of an agent; the perception which also did not revert even in the final round of coin flip. By contrast, the equally unlikely streak of incorrect predictions also generated a relatively weak belief in the existent of an “unlucky” agent whose luck was perceived to be likely to revert before the game finishes; evidence which was reflected in an increase in the subject’s propensity to buy in the final round of coin flip.
The study also looked at whether characteristics of the participants might be related to their behavior, finding: "there is no statistical evidence that some people are systematically more (or less) susceptible to the measured effects."

What does this study mean for how we think about science in decision making?

While the authors focus on "false" experts, the findings have much broader relevance in the context of "true" experts. The simple reason for this is that the distribution of legitimate scientific findings about many complex subjects covers an enormous range of possible outcomes. Not all of these outcomes can simultaneously be correct -- whether they concern the past, causality, or projections of the future.

In the example that I use from my paper cited above, I explain how a single scientific paper on hurricane landfalls provides 20 scientifically legitimate predictions of how many hurricanes would hit the US over the subsequent 5 years:
Consider, for example, Jewson et al. (2009) which presents a suite of 20 different models that lead to predictions of 2007–2012 landfall activity to be from more than 8 per cent below the 1900–2006 mean to 43 per cent above that mean, with 18 values falling in between. Over the next five years it is virtually certain that one or more of these models will have provided a prediction that will be more accurate than the long-term historical baseline (i.e. will be skilful). A broader review of the literature beyond this one paper would show an even wider range of predictions. The user of these predictions has no way of knowing whether the skill was the result of true predictive skill or just chance, given a very wide range of available predictions. And because the scientific community is constantly introducing new methods of prediction the ‘guaranteed winner scam’ can go on forever with little hope for certainty.
Such models are of far more than academic interest -- they guide hundreds of billions of dollars in investment and financial decisions related to insurance and reinsurance. What if such decisions rest on an intellectual house of cards? How would we know?
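The Jewson et al. point can be illustrated with a toy Monte Carlo experiment (entirely hypothetical numbers; only the qualitative point carries over): give 20 models nothing but noise, and in most trials some model will still beat the climatological baseline.

```python
import random

random.seed(1)

TRIALS = 10_000
N_MODELS = 20
wins = 0
for _ in range(TRIALS):
    truth = random.gauss(0, 1)                 # the outcome that actually unfolds
    baseline_error = abs(truth)                # baseline predicts the long-term mean (0)
    # 20 "models" whose forecasts are pure noise, unrelated to the truth
    model_errors = [abs(random.gauss(0, 1) - truth) for _ in range(N_MODELS)]
    if min(model_errors) < baseline_error:     # at least one noise model looks "skilful"
        wins += 1
print(wins / TRIALS)  # well above 0.85: some model almost always looks skilful
```

The apparent "winner" carries no information about which model, if any, will be accurate next time -- exactly the guaranteed winner scam, arising without any scammer.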

The general issue is that a bigger problem than discerning legitimate from illegitimate expertise is figuring out how to use all of the legitimate expertise at our disposal. The dynamics of the "guaranteed winner scam meets the hot hand fallacy" also present a challenge for experts themselves in interpreting the results of research in light of evolving experience. Experts are people too, and they will be subject to the same incentives and obstacles in interpreting information as were found by Powdthavee and Riyanto.
The dominant strategies in political discourse used to deal with this situation of too much legitimate science are to argue that there is one true perspective (the argument from consensus) or that experts can be judged according to their non-expert characteristics (argument by association). My experiences over the past decade or so related to extreme events and climate change provide a great example of how such strategies play out in practice, among both experts and activists.

As we have learned, neither strategy is actually a good substitute for evaluating knowledge claims and understanding that uncertainty and ignorance are often irreducible, and decisions must be made accordingly.


  1. These statistical points pretty much sum up paleoclimatology for the past 15 years or so.

    And probably much of climatology too.

  2. Life is an exercise in risk management; or there are no mortal gods and people should not be deceived to believe there are. Even people who place their faith in divinity acknowledge that their consciousness was endowed with the condition that they will be judged individually; and as mortal beings we are neither omniscient nor omnipotent.

    As for knowledge, there is what we know, don't know, and are incapable of knowing. When we restrict our faith to a constrained frame of reference or an "objective" reality, our knowledge can reasonably be classified as the first and second. When we consider knowledge supported by limited, circumstantial evidence, unreproducible and untestable, which is likely and often a permanent condition, then it should be classified as the third and not conflated with science.

    The Earth's system is incompletely characterized and unwieldy. Its behavior is influenced by subterranean, terrestrial, atmospheric, extraterrestrial, and, yes, human (or conscious) effects. The system can be modeled in the long-term as chaotic with limited accuracy determined by its behavioral envelope.

    While the need to express precision where it is not forthcoming may be desirable and comforting, it only serves to distort reality and undermines a proper assessment of risk. It is a distraction taken with various motives and most notably for material gain. A further distortion occurs when the risk is not exclusively born by the actor(s), which is most prevalent with sovereign actors.

    That said, if only Hitler would have occupied himself with his fascination for cute kittens, it would have saved the world tens of millions of lives. However, even without his input, there seem to be an unending supply of individuals (e.g. Stalin, Zedong) and cooperatives (e.g. communists, socialists) who suffer from delusions of grandeur and desire to control or destroy other sentient beings. They have a "god" complex and do not respect individual dignity.

  3. "Hot dice" would be a better name for the fallacy.

    Bio metric analysis can observe streaks not as random processes, but by the details of the players motion. A free throw shooter may be "cold" not because of statistical variation, but because muscle soreness (or lack of focus on proper mechanics), is interfering with some element of performing the shot.

    Feeding a player with a "hot hand" (and not feeding somebody gone cold) is not statistically naive, but recognizes (and takes advantage of) the naturally (or "man made" e.g. a hangover) variations on the center point for that day's group of samples.

  4. Very amusing after your post on US hurricane landfalls.

  5. Yes, HI was stupid to push, for however many hours, the unabomber comparison to AGW believers. So how stupid was pushing the claim, now definitively shown to be a fraud, that skeptics were threatening the lives of Australian academics for many months? More to the point, where were the lines of AGW believers, as there were lines of skeptics, condemning the fraudulent indictment of skeptics?
    And I would point out that HI's mistake is one. AGW hypesters regularly engage in scurrilous and phony claims regarding skeptics. The focus on HI, and your unpleasant Hitlerian riff above, seems a bit contrived when contrasted against the years of AGW promoters doing even more to skeptics.
    Or is the underlying unmentionable assumption that since skeptics are the good guys, they must be held to higher standards?

  6. "Bio metric analysis can observe streaks not as random processes, but by the details of the players motion."

    I bet it has almost no skill at doing so before the shot is made.

    And if it cannot be used before, but only to validate known information afterwards, what is the point of it?

    You appear to have not realised the strength of fallacies laid out by Roger. "Hot hands" can be post facto rationalised, for sure. But that rationalisation is totally meaningless, both as an explanation and as a method of prediction.

  7. -3-jzulauf, -1-mooloo

    I have never liked the basketball analogy for the "hot hand" fallacy. I have no problem with the logic of the fallacy (obviously), however, having spent my fair share of time on the court, I am pretty sure that some days are better than others.

    See Tony Parker last night? ;-)

  8. Roger...are you aware that someone apparently has paid to run this image (alone without editorial content) as an ad on your site?


  9. Of course some days are better than others. That's luck for you.

    Personally I can't stand basketball. And the tendency of journalists to build narratives around its statistics confirms it for me. XKCD is on the button: http://xkcd.com/904/.

    (My sport is rugby.)

    I do have a soft spot for Tony Parker though. Because I love the way the French describe him as "French".

  10. Mooloo... what biometrics can measure can be observed in real-time. The "hot hand" is observable and usable not in the prediction of a given shot, but in terms of the center of the player's *current* distribution.

    Certain types of athletes are *very* good at reading the body language of other players. Batters do not have enough time to read and react to a pitch from the time it leaves the pitcher's hand, and thus are reading the pitcher. Good receivers can read the defensive player and "create separation" when they sense the defender committing to a "fake"; fencers, martial artists and the like respond to subtle motions. Thus the subtle motion differences between "hot" and "cold" are certainly conceivably (and likely actually) observable by player and coach alike.

    On a related topic, some studies have shown that "momentum" is measurable in blood chemistry (for example testosterone levels).

  11. Jzulauf,

    I agree that the best players in sports can read a player before they finish an action. They can predict a fast ball, a faked pass or a feint attack.

    But this is not the same as spotting hot and cold.

    What you are talking about is an ability to read physical movements to make a judgement about whether an action is well performed or not. That's unexceptional. Even poor golfers know immediately they hit the ball, long before they check the path, whether they have hit it well or not.

    But the "hot hands" fallacy is that you can reliably predict before the action is made an increased success in the outcome, because the athlete (or whatever) is currently running "hot".

  12. There are times when athletes have it and times when they don't. See e.g. Tiger Woods lately. If Tiger is about to hit his tee shot on 18 and he's hit the fairway 10 of 13 times previously that day, his odds of hitting the fairway are not the same as those days when he's hit 2 of 13 fairways previously. On the good day, the swing thought and execution are likely different (perhaps very different as analysis of his swing often shows) than on the bad day. Hitting the fairway on 18 isn't just a matter of good luck vs. bad luck with the odds unchanged.

    The same can be said of pitchers, quarterbacks, basketball shooters, tennis players and most every athlete. Every day, you look for that key that grooves your stroke/pitch/throw/hit/whatever. Some days you get in the groove and are consistently more successful. And some days are a just a damn struggle.

    Any stats guru who thinks every day is the same and it's all just streaky dice throws simply doesn't understand athletic performance.

  13. Put another way, at the end of his career Mickey Mantle struggled with injuries and often hangovers. The stats guru wants to pretend that his performance on any given day was just a stratomatic game determined by the roll of the dice. As Gen. McAuliffe said, "Nuts." Mickey's odds of belting one out were much better when his knee and wrist weren't throbbing with pain and his head wasn't pounding from a killer headache.