We use the percentage improvement in mean square error over a climatological
forecast (MSEClim) as our skill measure. The climatologies
used here are a rolling prior 10-year climatology for the replicated real-time
forecast period 1988-2002, a fixed 51-year climatology 1952-2002 for the
cross-validated hindcast period 1952-2002 and a fixed 53-year climatology
1950-2002 for the cross-validated hindcast period 1950-2002.
MSEClim is the standard skill metric recommended by the World
Meteorological Organisation for verification of deterministic seasonal forecasts
[*WMO*, 2002]. MSEClim is a robust skill measure which is
immune to the bias problems associated with the correlation and percentage of
variance skill measures. Positive skill indicates that the model does better
than a climatology forecast, while negative skill indicates that it does worse.
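
As a minimal sketch of the measure described above, the code below assumes the usual reading of "percentage improvement in mean square error", namely 100 × (1 − MSE of the model / MSE of climatology). The function names `mean_square_error` and `mse_clim_skill` and the sample numbers are illustrative assumptions, not taken from the study. The climatology is passed as one value per forecast year, so either a rolling or a fixed climatology can be supplied.

```python
from typing import Sequence


def mean_square_error(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Mean of the squared forecast-minus-observed differences."""
    if len(forecast) != len(observed) or len(observed) == 0:
        raise ValueError("forecast and observed must be non-empty and equal in length")
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)


def mse_clim_skill(
    observed: Sequence[float],
    model: Sequence[float],
    climatology: Sequence[float],
) -> float:
    """Percentage improvement in MSE of the model over a climatology forecast.

    Assumed definition: 100 * (1 - MSE_model / MSE_climatology).
    Positive values mean the model beats climatology; negative values mean
    it does worse.
    """
    mse_model = mean_square_error(model, observed)
    mse_clim = mean_square_error(climatology, observed)
    if mse_clim == 0.0:
        raise ValueError("climatology forecast is perfect; skill is undefined")
    return 100.0 * (1.0 - mse_model / mse_clim)


# Illustrative numbers only (not data from the study).
observed = [10.0, 12.0, 8.0, 14.0]
climatology = [11.0, 11.0, 11.0, 11.0]  # one climatology value per year
model = [10.0, 13.0, 9.0, 13.0]
skill = mse_clim_skill(observed, model, climatology)
```

Under this definition a perfect forecast scores 100%, a forecast identical to climatology scores 0%, and there is no lower bound on negative skill.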

We compute the statistical significance of the MSEClim skill
using the bootstrap method [*Efron*, 1979; also see
*Efron and Gong*, 1983]. The bootstrap tests the hypothesis that the
model forecasts are more skilful than those from climatology to a given level
of significance. We apply the bootstrap by randomly selecting (with
replacement) 15 (1988-2002), 51 (1952-2002) or 53 (1950-2002) actual values
together with their associated predicted and climatology forecast values to
provide a fresh set of hindcasts for which the MSEClim skill
measure can be calculated. This process is repeated 2,000 times and the results
are histogrammed to give the distribution of the skill score. Provided the original
data are independent (in distribution and in order), the distribution of these
recalculated values maps the uncertainty in the forecast skill about the
original value over a 15-year or 51(53)-year period. 95% two-tailed confidence
intervals for this uncertainty are then readily obtained. Where the lower
boundary of this 95% confidence interval lies above an MSEClim skill
value of 0%, the model forecast has skill better than climatology to
97.5% confidence, since only 2.5% of the resampled skill values fall below
that boundary.
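
The resampling procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `bootstrap_skill_interval`, the percentile-index convention, the fixed random seed, and the choice to redraw a resample whose climatology error is exactly zero are all assumptions, and the sample data are invented. Skill is again taken to be 100 × (1 − MSE of the model / MSE of climatology).

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_skill_interval(
    observed: Sequence[float],
    model: Sequence[float],
    climatology: Sequence[float],
    n_resamples: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-tailed percentile bootstrap interval for the MSE skill (in percent)."""
    n = len(observed)
    if n == 0 or len(model) != n or len(climatology) != n:
        raise ValueError("inputs must be non-empty and equal in length")

    # Squared errors per year. Resampling these by a shared index is the same
    # as resampling (actual, predicted, climatology) triples together.
    sq_model = [(m - o) ** 2 for m, o in zip(model, observed)]
    sq_clim = [(c - o) ** 2 for c, o in zip(climatology, observed)]
    if sum(sq_clim) == 0.0:
        raise ValueError("climatology forecast is perfect; skill is undefined")

    rng = random.Random(seed)
    skills: List[float] = []
    while len(skills) < n_resamples:
        # Draw n years with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        mse_model = sum(sq_model[i] for i in idx) / n
        mse_clim = sum(sq_clim[i] for i in idx) / n
        if mse_clim == 0.0:
            continue  # assumption: redraw when the skill is undefined
        skills.append(100.0 * (1.0 - mse_model / mse_clim))

    # Percentile interval: drop an equal share of values from each tail.
    skills.sort()
    k = min(round((1.0 - level) / 2.0 * n_resamples), (n_resamples - 1) // 2)
    return skills[k], skills[n_resamples - 1 - k]


# Invented 15-year example (not data from the study).
observed = [8.0, 12.0, 7.0, 14.0, 9.0, 11.0, 6.0, 13.0, 10.0, 15.0,
            5.0, 12.0, 9.0, 14.0, 8.0]
climatology = [10.2] * 15
model = [o + (1.0 if i % 2 == 0 else -1.0) for i, o in enumerate(observed)]
lower, upper = bootstrap_skill_interval(observed, model, climatology)
skilful = lower > 0.0  # lower bound above 0% implies skill over climatology
```

With a 95% two-tailed interval, 2.5% of the resampled values lie below `lower`, which is why a lower bound above 0% corresponds to 97.5% confidence.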

- Efron, B., Bootstrap methods: another look at the jackknife,
*The Annals of Statistics*, **7**, 1-26, 1979.

- Efron, B. and G. Gong, A leisurely look at the bootstrap, the jackknife, and
cross-validation,
*The American Statistician*, **37**, 36-48, 1983.

- WMO, Standardised Verification System (SVS) for Long-Range Forecasts (LRF),
New Attachment II-9 to the Manual on the GDPS (WMO-No. 485), Volume I, WMO,
Geneva, 2002.