09 August 2010

Skill in Prediction, Part IIb, The Naive Prediction

OK, I received 14 usable, independent naive predictions from my request (thanks all), which you can see in the graph above.  The time series is of course the CRU global temperature anomaly (for January only, multiplied by 100 and then offset by +100, just to throw you off ;-).

As you can see, I received a very wide range of proposed naive forecasts.  The red line shows the average of the various forecasts. It has only a very small trend.

I was motivated to do this little experiment after reading Julia Hargreaves' recent essay on the skill of Jim Hansen's 1988 Scenario B forecast.  That analysis used as its baseline a naive forecast that simply extends the final value of the observed data into the future (which several people suggested in this exercise).  Not surprisingly, it found that Hansen's forecast was skillful compared to this baseline, as shown below.
My initial reaction was that this naive forecast set far too low a threshold for a skill test.  But then again, I knew what the dataset was and how history had played out, so I could simply have been reflecting my own biases in that judgment.  I therefore decided to conduct this little blog experiment as a blind test, using my readers, to see what might result (and thanks to Dr. Hargreaves for the CRU data that she used).  What you readers came up with is little different from what Hargreaves used.

Upon reflection, it would have been better to stop the CRU data in 1988 when asking for the naive forecasts.  However, given that the dataset through 2009 has more of a trend than it does through 1988, I can conclude that Hargreaves' use of a zero-trend baseline is certainly justifiable.  But, as you can see from the spread of naive forecasts in the figure at the top, many other possible naive trends are also justifiable.
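
To make the skill arithmetic concrete, here is a minimal sketch in Python.  The function names are mine, and this is the standard mean-squared-error skill score, not necessarily the exact formula Hargreaves used:

import numpy as np

def persistence_baseline(train, horizon):
    # "Zero trend": carry the last observed value forward.
    return np.full(horizon, train[-1])

def trend_baseline(train, horizon):
    # Extrapolate a least-squares linear trend fitted to the training data.
    t = np.arange(len(train))
    slope, intercept = np.polyfit(t, train, 1)
    return intercept + slope * np.arange(len(train), len(train) + horizon)

def skill_score(obs, forecast, baseline):
    # 1 is a perfect forecast, 0 is no better than the baseline, negative is worse.
    return 1.0 - np.mean((obs - forecast) ** 2) / np.mean((obs - baseline) ** 2)

The same forecast can come out skillful against the flat persistence baseline and unskillful against the extrapolated trend, which is exactly why the choice of baseline matters.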

This helps to illustrate the fact that the selection of the naive trend against which to measure skill is in many respects arbitrary and almost certainly influenced by extra-scientific considerations.  Were I a forecaster whose salary depended on a skillful forecast, I'd certainly argue for an easy-to-beat metric of skill!  You can imagine all sorts of these types of issues becoming wrapped up in a debate over the appropriate naive forecast used to determine skill.

Some lessons from this exercise:

1. In situations where an evaluation of skill is being conducted and there are not well-established naive baselines, it is probably not a good idea to have the same person/group doing the evaluation come up with the naive forecast against which skill will be judged -- especially after the forecast has been made and observations collected.

2. Related, metrics of skill should be negotiated and agreed upon at the time that a forecast is issued, to avoid these sorts of problems.  Those in the climate community who issue long-term forecasts (such as those associated with climate change) generally do not have a systematic approach to forecast verification.

Evaluating the skill of forecasts is important, even if it takes a long time for that judgment to occur.

13 comments:

  1. The wisdom of Roger's crowd seems to indicate a unit root rather than a deterministic trend (compare the shape of the confidence intervals here and here; that also happens to be the conclusion supported by the time-series econometrics, kind of a neat result for a quick-and-dirty blog poll).

    From the abstract of the linked paper: "In general, assessments of predictions based on today's climate models should use Bayesian methods, in which the inevitable subjective decisions are made explicit."
    What a good idea.
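
    For anyone who wants to poke at the unit-root question directly, a minimal sketch using statsmodels' ADF test (the random-walk stand-in data below is made up; swap in the scaled CRU series):

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    # Stand-in data: a synthetic random walk (replace with the scaled CRU series).
    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(0.0, 10.0, 52))  # roughly one value per year

    # regression='ct' tests against the alternative of stationarity around a
    # constant plus a deterministic trend.
    adf_stat, pvalue, usedlag, nobs, crit, icbest = adfuller(series, regression='ct')
    print(adf_stat, pvalue)  # a large p-value means a unit root cannot be rejected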

  2. Armstrong and Green have a similar model validation exercise, but arrive at generally different conclusions from James and Jules when it comes to climate model skillfulness:

    http://www.masterresource.org/2009/11/simple-model-leaves-expensive-climate-models-cold/

    -Chip

  3. jstults left this comment and Google apparently ate it:

    "The wisdom of Roger's crowd seems to indicate a unit root rather than a deterministic trend (compare shape of confidence intervals here and here, that also happens to be the conclusion supported by the time-series econometrics, kind of a neat result for a quick-and-dirty blog poll).

    From the abstract of the linked paper: 'In general, assessments of predictions based on today's climate models should use Bayesian methods, in which the inevitable subjective decisions are made explicit.'
    What a good idea."

  4. Links from 'eaten' post:
    - unit root comment, confidence intervals are Figures 3 and 4
    - Bayesian good idea

  5. @ jstults

    Indeed. A naive forecast for the trend should properly be represented by a fan of potential outcomes. For a series with a stochastic trend, as this data exhibits, that fan will be quite wide: increasingly divergent and unbounded. It will NOT be a narrower, increasing but bounded range (like the one provided by Lucia).
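
    A minimal sketch of that fan in Python (the innovation size and starting value are made-up illustration numbers, not values fitted to the CRU series):

    import numpy as np

    rng = np.random.default_rng(0)
    n_paths, horizon = 5000, 30  # number of simulated paths, years ahead
    sigma = 10.0                 # assumed innovation std dev, in the scaled units
    last_obs = 140.0             # hypothetical final observed value

    # Random-walk (unit-root) fan: simulate many paths from the last observation.
    paths = last_obs + np.cumsum(rng.normal(0.0, sigma, (n_paths, horizon)), axis=1)

    for h in (1, 10, 30):
        print(h, paths[:, h - 1].std())  # spread grows like sigma * sqrt(h): unbounded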

  6. Roger, you are misusing "naive."  If nothing else, a naive Bayes classifier has a very particular meaning, which none of your estimates meets, because they are all based on relationships with other physical/philosophical/policy constructs.

    Another piece of advice, don't get into a statistical pissing match with James or Julia. It always ends badly for you.

  7. In case anyone wondered why that curve didn't *look* like CRUTEM, it was just the Januarys:

    http://www.woodfortrees.org/plot/hadcrut3vgl/from:1958/to:2010/every:12/scale:100/offset:100

    ... which means that the 2-year oscillation I was seeing at inflection points was scraping the Nyquist frequency of the annual sampling, and thus was most likely an alias of something and non-physical.
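
    A toy illustration of that folding (the 8-month period below is made up precisely so that it lands on the alias):

    import numpy as np

    # A made-up 8-month cycle in a monthly series.
    months = np.arange(120)
    signal = np.cos(2 * np.pi * months / 8.0)

    # Keep only the Januarys (every 12th sample). Annual sampling has a Nyquist
    # period of 2 years, and the 8-month cycle folds exactly onto it.
    januarys = signal[::12]
    print(januarys)  # alternates 1, -1, 1, -1: a spurious 2-year "oscillation"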

    On the other hand, Roger, if someone *should* discover a 2-year oscillation associated with climate regime changes, I'd appreciate them naming it "the Hall effect" :-)

  8. -6-Josh

    Thanks for the catch, fixed in the text!

  9. I guess splicing GISTEMP onto the series wasn't such a bad guess after all.

    The ensemble mean of all the naive forecasts would have been an interesting thing to compute.

  10. Roger:
    Well, I was certainly naive about naive.

    OK, now that I understand the context: why wouldn't the naive estimate be the trend that existed before the current "experiment" started?  In other words, what was the trend prior to any change in the rate of increase in GHG emissions, i.e., the supposedly natural long-term forcing?  Judging from the slope of temperatures from 1850 to, say, 1930 or 1940, it looks to me like that naive assumption would be a positive trend.

  11. Roger:
    Do you have a digitized version of the data for the entire period shown in the graph?

  12. So the initial framing of this problem was that a baseline based on a naive prediction can sometimes be more powerful than, or indistinguishable from, a "skilled" prediction.

    But now we've got a whole host of naive baselines and they're all over the map. What exactly are we supposed to take as the baseline for purposes of comparison?

  13. -12-Nathan Smith

    The arbitrary nature of baseline selection is one lesson to take from this exercise;-)
