We employ the percentage improvement in mean square error over a climatological forecast (MSEClim) as our skill score measure. Three climatologies are used here: a rolling prior 10-year climatology for the replicated real-time forecast period 1988-2002, a fixed 51-year climatology 1952-2002 for the cross-validated hindcast period 1952-2002, and a fixed 53-year climatology 1950-2002 for the cross-validated hindcast period 1950-2002. MSEClim is the standard skill metric recommended by the World Meteorological Organisation for verification of deterministic seasonal forecasts [WMO, 2002]. It is a robust skill measure which is immune to the bias problems associated with the correlation and percentage of variance skill measures. Positive skill indicates that the model does better than a climatology forecast, while negative skill indicates that it does worse.
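One common way to write this skill score is sketched below. The notation is our own assumption and does not come from the cited WMO document: x_i is the observed value in year i, f_i the model forecast, c_i the climatology forecast valid for that year, and n the number of years verified.

```latex
\mathrm{MSE}_{\mathrm{Clim}}
  = \left( 1 - \frac{\sum_{i=1}^{n} (f_i - x_i)^2}{\sum_{i=1}^{n} (c_i - x_i)^2} \right) \times 100\%
```

Under this form a perfect forecast scores 100%, a forecast identical to climatology scores 0%, and a forecast with larger squared error than climatology scores below 0%.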
We compute the statistical significance of the MSEClim skill using the bootstrap method [Efron, 1979; also see Efron and Gong, 1983]. The bootstrap tests the hypothesis that the model forecasts are more skilful than those from climatology to a given level of significance. We apply the bootstrap by randomly selecting (with replacement) 15 (1988-2002), 51 (1952-2002) or 53 (1950-2002) actual values, together with their associated predicted and climatology forecast values, to provide a fresh set of hindcasts for which the MSEClim skill measure can be calculated. This process is repeated 2,000 times and the resulting skill values are histogrammed to give the distribution of the skill score. Provided the original data are independent (in distribution and in order), the distribution of these recalculated values maps the uncertainty in the forecast skill about the original value over a 15-year, 51-year or 53-year period. 95% two-tailed confidence intervals for this uncertainty are then readily obtained. Where the lower boundary of this 95% confidence interval has an MSEClim skill value greater than 0%, the model forecast has skill better than climatology to 97.5% confidence.
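The resampling procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the published results: the function names, the fixed random seed, and the simple empirical-percentile rule used to read off the interval are all our own choices.

```python
import random


def mse_clim_skill(obs, fcst, clim):
    """Percentage improvement in mean square error over climatology."""
    n = len(obs)
    mse_fcst = sum((f - x) ** 2 for f, x in zip(fcst, obs)) / n
    mse_clim = sum((c - x) ** 2 for c, x in zip(clim, obs)) / n
    if mse_clim == 0.0:
        # Skill is undefined when climatology matches every observation.
        raise ValueError("climatology MSE is zero; skill is undefined")
    return 100.0 * (1.0 - mse_fcst / mse_clim)


def bootstrap_skill_interval(obs, fcst, clim, n_resamples=2000, seed=0):
    """Return (lower, upper, skills) for a 95% two-tailed interval.

    Each resample draws whole years with replacement, so an observed
    value always stays paired with its own forecast and climatology.
    """
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    n = len(obs)
    skills = []
    for _ in range(n_resamples):
        idx = rng.choices(range(n), k=n)  # sample n years with replacement
        skills.append(
            mse_clim_skill(
                [obs[i] for i in idx],
                [fcst[i] for i in idx],
                [clim[i] for i in idx],
            )
        )
    ordered = sorted(skills)
    # Empirical 2.5th and 97.5th percentiles via integer index arithmetic.
    lo_idx = (25 * n_resamples) // 1000
    hi_idx = max(lo_idx, (975 * n_resamples) // 1000 - 1)
    return ordered[lo_idx], ordered[hi_idx], skills
```

If the returned lower bound is greater than 0%, the forecast would be judged more skilful than climatology at the 97.5% level in the sense described above. The sketch assumes the three input sequences have equal length (15, 51 or 53 values for the periods used here).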
- Efron, B., Bootstrap methods: another look at the jackknife,
The Annals of Statistics, 7, 1-26, 1979.
- Efron, B. and G. Gong, A leisurely look at the bootstrap, the jackknife, and
cross-validation, The American Statistician, 37,
- WMO, Standardised Verification System (SVS) for Long-Range Forecasts (LRF),
New Attachment II-9 to the Manual on the GDPS (WMO-No. 485), Volume I, WMO,