The Potential for Improved Statistical Seasonal Climate Forecasts.
Wasyl Drosdowsky
Bureau of Meteorology Research Centre,
and
Rob Allan
CSIRO Division of Atmospheric Research.
Paper presented at the Symposium on
"Applications of Seasonal Climate Forecasting in Agricultural and Natural
ecosystems - the Australian Experience",
held in Brisbane, Queensland, November 1997.
Abstract
From its beginning, statistical climate prediction has been hampered
by poor observational data coverage, both spatially and temporally; incomplete
theoretical understanding of the climate system; availability of only basic
statistical techniques; and limited computational capabilities. Progress
in recent years, and the potential for further improvement in the future,
are the result of improvements in all these areas. These topics
are all connected, and improvement in one area leads to, or requires, improvement
in the others.
Until recently, most seasonal climate outlooks, such as that provided
by the Bureau of Meteorology National Climate Centre (NCC), have used the
Southern Oscillation Index (SOI) as the sole or major predictor, and simple
forecast techniques such as linear regression. Now seasonal forecasts are
being issued based on large scale sea surface temperature (SST) and global
circulation anomalies. The availability of large multi-variable data sets
has increased, and will continue to increase, the use of more advanced
statistical techniques beyond the simple linear regression approach. The
availability of many additional potential predictors, and of different
statistical forecasting systems, greatly increases the possibility of
artificial skill, which can be minimised by rigorous cross validation
techniques.
Much of the potential predictability in Australian seasonal rainfall
one season ahead may be realised by a system utilising global SST patterns
as predictors. However, climate variability encompasses other parameters,
and occurs on a variety of time scales ranging from intra-seasonal through
inter-annual (ENSO) through to decadal. This leaves considerable scope
for improving or expanding seasonal climate prediction to parameters other
than rainfall, to increased lead times, and to different target season
lengths.
Introduction
An historical account of the current seasonal climate outlook service
provided by the Bureau of Meteorology National Climate Centre (NCC) is
given by de Hoedt et al. (1998). This service currently uses the
Southern Oscillation Index (SOI) as its sole predictor. Elsewhere, particularly
at the National Centers for Environmental Prediction (NCEP), Washington,
and the Hadley Centre for Climate Prediction and Research of the United
Kingdom Meteorological Office (UKMO), empirical seasonal forecasts are being
issued based on large scale sea surface temperature (SST) anomalies, amongst
other predictors. A similar scheme is under development in the Bureau of
Meteorology Research Centre (BMRC), where it has been run in quasi-operational
mode for the past twelve months (Drosdowsky and Chambers, 1998).
In order to assess the potential for improved statistical seasonal climate forecasts for Australia beyond that provided by either the current system or the BMRC SST based forecast system, we need to consider the basic requirements for such a forecast system. These include:
(a) long time series of reliable data, of both the predictand and any potential predictors,
(b) appropriate statistical methodologies to analyse and find relationships between this data,
(c) the computing power, including data storage capacity, and ultimately,
(d) a knowledge of the science or the theoretical background to verify the physical plausibility of the predictor - predictand relationships.
Until very recently, statistical climate prediction has been limited
in all of these respects.
Data
Most observational data networks have been designed for weather forecasting,
and while these may have adequate spatial coverage at present, this has
not always been the case. As a result very few stations have the long,
continuous and homogeneous records necessary for seasonal prediction. In
the past 10 to 15 years much effort has been directed towards the compilation
of historical data, for example the global SST data in the Comprehensive
Ocean Atmosphere Data Set (COADS) (Slutz et al., 1985). This compilation
of ship data has been further refined to interpolate missing values, and
include other sources of SST such as satellite radiances, resulting in
the UKMO Global sea-Ice and Sea Surface Temperature (GISST) data set (Rayner
et al., 1998), and the NCEP SST reanalysis (Reynolds and Smith, 1994).
Other global surface and upper air data sets being developed include the
Global Historical Climatology Network (GHCN) data set (Vose et al., 1992),
the Global Mean Sea Level Pressure (GMSLP) data set from the UKMO (Allan
et al., 1996; Basnett and Parker, 1997), and the Comprehensive Aerological
Reference Data Set (CARDS) (Eskridge et al., 1995). Over the past few years
this array of observed data has been subjected to comprehensive global
reanalyses using current, state-of-the-art analysis systems at NCEP (Kalnay
et al., 1996) and the European Centre for Medium Range Weather Forecasts
(ECMWF) (Gibson et al., 1997).
Computing Power.
The global reanalysis projects at NCEP and ECMWF are feasible due to
the enormous computing and data storage facilities available at these centres.
The full potential of these large data sets can now also be realised by
national meteorological services. In Australia, access to historical data
has been greatly enhanced with the introduction of the Australian Data
Archive for Meteorology (ADAM) (Lee, 1994) system within the National Climate
Centre. This now allows virtually the entire historical data base to be
kept on disk and accessed easily from anywhere within the Bureau of Meteorology
and by external users.
Statistical Methods.
The availability of these large multi-variable data sets and increased
computing capabilities has increased, and will continue to increase, the
use of more advanced statistical techniques, beyond the simple traditional
methods such as linear correlation and regression. These statistical
techniques are used for a number of purposes.
Multivariate analysis techniques such as Principal Component Analysis (PCA)
or Empirical Orthogonal Functions (EOF) analysis, Singular Value Decomposition
(SVD), cluster analysis, and Canonical Correlation analysis (CCA), are
now commonplace in climate research and seasonal climate forecasting, being
used as data reduction techniques, and also to explore patterns or modes
of variability in the data (see reviews of these techniques in Mann and
Park, 1998). More powerful forecast techniques such as linear discriminant
analysis (LDA) and nonlinear methods such as neural networks are being
applied to seasonal forecasts.
An essential requirement of all forecast schemes, whether statistical
or dynamical, is an estimate of the skill to be expected from the scheme.
Here we need to distinguish between the in-sample skill estimated by using
all the data, ie the model fit, and true out-of-sample skill obtained by
testing the model on independent data. This latter skill is usually obtained
through a cross-validation or "leave-one-out" procedure. The difference
between these two estimates is sometimes referred to as the artificial
skill.
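The distinction between in-sample and leave-one-out skill can be illustrated with a minimal sketch. It assumes a single-predictor least-squares regression on synthetic data, and measures skill by mean squared error (lower is better); none of these choices are taken from the forecast systems discussed here, and the names are hypothetical.

```python
import random

def fit_line(x, y):
    """Least-squares intercept and slope for y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

def in_sample_mse(x, y):
    """Error of the model fitted to, and evaluated on, all the data."""
    a, b = fit_line(x, y)
    return mse([a + b * xi for xi in x], y)

def loo_mse(x, y):
    """Leave-one-out error: each case is predicted from the other N-1."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return mse(preds, y)

# Synthetic predictor and predictand (illustrative values only).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

fit_error = in_sample_mse(x, y)
cv_error = loo_mse(x, y)
# The gap between the two estimates corresponds to the artificial skill.
artificial_skill = cv_error - fit_error
```

For least-squares regression the leave-one-out error is never smaller than the in-sample error, so the gap is non-negative; its size depends on the sample length and the number of fitted parameters.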
A more important source of possible artificial skill in seasonal climate
predictions is the availability of many additional potential predictors,
advanced statistical techniques and forecast methodologies and the computing
power to apply them. When selecting between different potential predictors
or different forecast models, a nested cross validation procedure is
necessary to obtain a true out-of-sample skill estimate. Alternatively,
an improved theoretical understanding of the climate system, and experience
with global climate models, may lead to the formulation of appropriate
a priori hypotheses which can be tested by the improved data base, rather
than engaging in haphazard "fishing expeditions" which will almost always
find some statistical relationship between the predictand and a large number
of potential predictors. Simply selecting the best set of predictors or
best model will lead to increased artificial skill.
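The danger of such "fishing expeditions" can be sketched with purely random numbers. In the hypothetical example below, neither the "rainfall" series nor any of the 200 candidate predictors contains real signal, yet picking the candidate with the largest absolute correlation still yields an apparently useful relationship. The sample sizes are arbitrary assumptions.

```python
import random

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
n_years, n_candidates = 40, 200

# Pure noise: by construction no candidate has any real predictive value.
rainfall = [rng.gauss(0, 1) for _ in range(n_years)]
candidates = [[rng.gauss(0, 1) for _ in range(n_years)]
              for _ in range(n_candidates)]

abs_r = [abs(correlation(c, rainfall)) for c in candidates]
best_r = max(abs_r)                # what a search over candidates reports
mean_r = sum(abs_r) / len(abs_r)   # what a typical candidate achieves
```

The selected value `best_r` overstates the skill that could be expected on independent data, which is why the selection step itself has to be included in the cross validation.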
The BMRC SST based prediction system.
To illustrate the interaction between the various factors described
above we present a brief description of the BMRC SST based statistical
system. The predictand is the Australian seasonal rainfall anomaly, on
a uniform one degree grid over the entire continent. This gridded data set
was developed by NCC using all the available monthly rainfall records in
the ADAM data base (Jones and Weymouth, 1998). The predictor to be used
is the global or near global SST anomaly field. Both these data sets consist
of many hundreds of individual data gridpoints. To relate one field to
another directly would mean examining many thousands of possible predictor
- predictand combinations. Clearly these will not be all independent as
there is appreciable spatial correlation in both fields. To reduce the
number of predictors and predictands to manageable size, and at the same
time produce largely uncorrelated predictors, the coherent variability
of both fields is summarised by rotated principal component analysis. For
the rainfall data, retaining the first nine components (Figure 1) accounts
for about 60% of the variance, while the first twelve components (Figure 2)
account for about 50% of the SST data variance. This procedure reduces the
number of potential predictors to around twenty to thirty when not all SST
components are considered but some lagged values (at sufficiently large lags
to reduce the autocorrelations to insignificant levels) are also used. The
number of actual forecasts
required for each season is then reduced from over 600 to just the nine
rainfall principal components. An additional benefit is that the resulting
forecast, when interpolated back to the original gridpoints in some appropriate
manner will be much smoother spatially than that produced by forecasting
for each individual gridpoint.
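The data reduction step can be sketched as follows. This is a simplified illustration, not the procedure used for Figures 1 and 2: it extracts only the leading, unrotated principal component of a small synthetic field by power iteration on the covariance matrix, and omits the VARIMAX rotation. The field size and noise level are assumptions.

```python
import random

def covariance_matrix(rows):
    """Covariance between gridpoints; rows are time steps (seasons)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centred

def leading_component(cov, iterations=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(cov)
    v = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j]
                     for i in range(p) for j in range(p))
    return eigenvalue, v

# Synthetic anomaly field: six gridpoints sharing one coherent signal.
rng = random.Random(3)
n_seasons, n_gridpoints = 60, 6
field = []
for _ in range(n_seasons):
    signal = rng.gauss(0, 1)
    field.append([signal + rng.gauss(0, 0.5) for _ in range(n_gridpoints)])

cov, centred = covariance_matrix(field)
eigenvalue, loadings = leading_component(cov)

# Fraction of total variance carried by the leading component.
variance_fraction = eigenvalue / sum(cov[i][i] for i in range(n_gridpoints))

# Scores: the single time series that replaces the six gridpoint series.
scores = [sum(l * a for l, a in zip(loadings, row)) for row in centred]
```

Here six correlated gridpoint series are summarised by one score series; the real fields are reduced in the same spirit from several hundred gridpoints to nine or twelve components.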
Having reduced the predictors and predictands to a small number of uncorrelated
principal components we need to decide on a prediction methodology; the
method chosen for this forecast scheme is LDA, mainly due to its ability
to produce probability forecasts. In LDA the rainfall for each season in
the training set is categorised according to its rank in the cumulative
frequency distribution, with the driest third of years assigned to tercile
1 (Below normal), the middle third to tercile 2 (Near normal), and the
wettest third to tercile 3 (Above normal). LDA then uses Bayes' theorem
to estimate the probability of a new observation of SSTs belonging to a
particular category, based on the distribution of the observations in the
training set. This approach is identical to that employed by Ward and Folland
(1991) to predict rainfall in Northeast Brazil using eigenvectors of global
SST.
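The tercile categorisation and the Bayes' theorem step can be sketched for the simplest case of a single predictor. The example assumes Gaussian class-conditional distributions with a pooled variance, which is the usual LDA assumption, and uses synthetic data; the function names are hypothetical and this is not the operational code.

```python
import math
import random

def tercile_categories(values):
    """Assign 0 (below), 1 (near) or 2 (above normal) by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    cats = [0] * n
    for rank, i in enumerate(order):
        cats[i] = min(3 * rank // n, 2)
    return cats

def fit_lda(predictor, cats, n_classes=3):
    """Class means, prior probabilities and pooled within-class variance."""
    n = len(predictor)
    means, priors, ss = [], [], 0.0
    for k in range(n_classes):
        members = [x for x, c in zip(predictor, cats) if c == k]
        m = sum(members) / len(members)
        means.append(m)
        priors.append(len(members) / n)
        ss += sum((x - m) ** 2 for x in members)
    return means, priors, ss / (n - n_classes)

def lda_probabilities(x, model):
    """Posterior probability of each tercile given predictor value x."""
    means, priors, var = model
    # Bayes' theorem: prior times Gaussian likelihood, then normalise.
    # The Gaussian constant cancels because the variance is shared.
    weights = [p * math.exp(-(x - m) ** 2 / (2 * var))
               for m, p in zip(means, priors)]
    total = sum(weights)
    return [w / total for w in weights]

# Synthetic training set: an SST score and a related rainfall anomaly.
rng = random.Random(4)
sst = [rng.gauss(0, 1) for _ in range(45)]
rain = [0.8 * s + rng.gauss(0, 0.6) for s in sst]

model = fit_lda(sst, tercile_categories(rain))
probs = lda_probabilities(2.0, model)  # forecast for a strongly positive score
```

The result is a set of three probabilities summing to one, which is the probability forecast format referred to above.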
The estimate of hindcast skill (ie based on the training set of data)
can be obtained in a number of ways, of which two are most common. The first
is the "Holdout Method", in which the data is split into a development set to
which the LDA model is fitted, and a test sample (usually 1/4 to 1/3 of
the total) to which the model is applied and from which the hindcast skill
is calculated. The second method, and that used here, is the "Leave-One-Out"
or cross validation method. This is similar to the Holdout method except
that the split is into N-1 cases in the development sample, and only one
test case. However this procedure is then repeated over all N cases, leaving
one out each time. Skill is then evaluated over all N hindcasts. In either
method the development and test samples must be kept independent to prevent
the introduction of artificial skill.
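The two ways of splitting the data can be written down as index sets. In this sketch the holdout sample is taken as the last quarter of the record, which is an assumption; the text only specifies its approximate size.

```python
def holdout_split(n_cases, test_fraction=0.25):
    """One split: development indices and a test sample of about 1/4."""
    n_test = max(1, round(n_cases * test_fraction))
    indices = list(range(n_cases))
    return indices[:-n_test], indices[-n_test:]

def leave_one_out_splits(n_cases):
    """N splits: N-1 development cases and a single test case each time."""
    for i in range(n_cases):
        yield [j for j in range(n_cases) if j != i], [i]

development, test = holdout_split(40)
loo = list(leave_one_out_splits(40))
```

In both cases the development and test indices are disjoint, which is the independence requirement stated above.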
This discussion applies to the simple case where the form of the LDA
model, ie the predictors to be used, is known or specified in advance.
When the structure of the model is not known, ie when we need to select
the best subset of predictors from a large pool of potential predictors,
then the selection procedure itself must also be cross-validated, as described
in detail by Elsner and Schmertmann (1994). For the BMRC SST forecast system,
we have examined the use of up to ten SST principal components, lagged
by one and three months for a total of twenty potential predictors (Drosdowsky
and Chambers, 1998). With such a large number of potential predictors it
is not feasible to include every possible subset of size up to 20 predictors;
in this case the search has been restricted to the best model involving
a subset of at most two predictors. This still results in over 200 possible
forecast models, and, as shown by Drosdowsky and Chambers (1998), an enormous
potential for artificial skill as measured by the difference between the
"in-sample" and the cross-validated "out of sample" skill. Nevertheless,
the true cross-validated skill of this system exceeds that obtained from
the SOI alone, particularly through the "autumn predictability barrier"
when El Nino Southern Oscillation (ENSO) events typically change phase.
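A cross-validated selection procedure of this kind can be sketched as a nested loop. The example makes several simplifying assumptions that differ from the system described here: models are single-predictor linear regressions rather than LDA models with up to two predictors, skill is measured by mean squared error, and the data are random noise. The point is only the structure: the predictor is re-selected inside every outer fold, using the training cases alone.

```python
import random

def fit_line(x, y):
    """Least-squares intercept and slope for y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def loo_mse(x, y):
    """Ordinary leave-one-out error for one fixed predictor."""
    total = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (a + b * x[i] - y[i]) ** 2
    return total / len(x)

def select_best(predictors, y):
    """Index of the predictor with the lowest leave-one-out error."""
    scores = [loo_mse(p, y) for p in predictors]
    return scores.index(min(scores))

def nested_loo_mse(predictors, y):
    """Outer leave-one-out loop with selection repeated in each fold."""
    n = len(y)
    total = 0.0
    for i in range(n):
        y_train = y[:i] + y[i + 1:]
        train = [p[:i] + p[i + 1:] for p in predictors]
        best = select_best(train, y_train)  # never sees the held-out case
        a, b = fit_line(train[best], y_train)
        total += (a + b * predictors[best][i] - y[i]) ** 2
    return total / n

rng = random.Random(2)
n_years = 30
rainfall = [rng.gauss(0, 1) for _ in range(n_years)]
pool = [[rng.gauss(0, 1) for _ in range(n_years)] for _ in range(20)]

# Naive estimate: select on all the data, then quote that model's error.
naive = loo_mse(pool[select_best(pool, rainfall)], rainfall)
# Nested estimate: selection is itself cross-validated.
nested = nested_loo_mse(pool, rainfall)
```

With noise-only predictors one would typically expect `nested` to exceed `naive`, the difference being artificial skill introduced by the selection step, although this ordering is not guaranteed for every random sample.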
There are two major problems with the use of this scheme in operational
seasonal prediction. Firstly, some of the predictor choices are difficult
to justify on physical grounds, being simply the best in a statistical
sense. Secondly, there is no requirement for temporal continuity of the
selected predictors, which can result in major shifts in the forecasts
from month to month. To overcome these problems the operational scheme
has been restricted to the use of the first two SST components, which are
known to be related to Australian rainfall variability. Retaining the one
and three month lags results in four potential predictors, and it is feasible
to examine the 15 possible subsets of any size. The resulting cross-validated
skill is shown in Figure 3. Again this shows
some improvement over the SOI alone, especially at the same lead time,
with most of the increased skill due to the inclusion of the second, Indian
Ocean, SST component.
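The model counts quoted for the two configurations follow from simple enumeration, as the short sketch below shows. The predictor labels are hypothetical names introduced for illustration.

```python
from itertools import combinations

def candidate_models(predictors, max_size):
    """All non-empty predictor subsets containing at most max_size members."""
    return [subset
            for k in range(1, max_size + 1)
            for subset in combinations(predictors, k)]

# Research configuration: ten SST components at two lags, subsets of size <= 2.
pool_20 = [f"SST{c}_lag{lag}" for c in range(1, 11) for lag in (1, 3)]
research_models = candidate_models(pool_20, 2)

# Operational configuration: two SST components at two lags, any subset size.
pool_4 = [f"SST{c}_lag{lag}" for c in (1, 2) for lag in (1, 3)]
operational_models = candidate_models(pool_4, 4)

print(len(research_models), len(operational_models))  # prints "210 15"
```

The 210 research models comprise 20 single-predictor and 190 two-predictor models, consistent with "over 200"; the four-predictor pool gives 2^4 - 1 = 15 subsets.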
Improved Future Forecast Systems.
The potential for improvement of current seasonal climate prediction
systems, and specifically the NCC system, depends on what is meant by "improved".
In the narrow sense of more accurate or skilful forecasts in the current
format, there may be only limited potential for improvement. Much of the
potential predictability in Australian seasonal rainfall one season ahead
may be realised by the system utilising global SST patterns as predictors.
From the vast array of new potential predictors which will be available
in the near future, only a small portion will display variability truly
independent of the SST data.
Climate variability occurs on a variety of time scales ranging from
intra-seasonal through inter-annual (ENSO) through to multi-decadal. More
significantly the relationship between predictor and predictand can also
vary on decadal to multi-decadal time scales, for example the changing
relationships over time between the SOI and seasonal rain over most of
Australia (Nicholls et al., 1996, 1997). Wider physical evidence for these
changes is now coming from concerted studies of the climate system. One
line focusing on detailed global analyses of historical atmospheric pressure
and SST compilations using techniques such as EOFs and SVD has isolated
several modes of variability operating on decadal to secular time scales
(Allan et al., 1998; Mann and Park, 1998). This research indicates that
not only do these climatic modes display ENSO-like structure and rainfall
relationships at low frequencies (Allan, 1998), but that they interact
with interannual ENSO signals to provide important modulations of the phenomenon.
Consequently, protracted El Nino and La Nina event sequences, such as the
1990-1995 El Nino period, are manifest through the superposition of interannual
ENSO and decadal ENSO-like modes in the climate system. In addition, long-term
changes in ENSO, such as the climate shift in its characteristics since
the 1970s, are seen to result from the operation of multidecadal ENSO-like
fluctuations.
This low frequency variability can confound statistical forecasts for average conditions over a three-month season. If, however, we regard seasonal climate prediction in a wider sense including the intraseasonal and decadal variability, then there is considerable scope for improvement, or expansion of seasonal climate prediction.
Areas of possible expansion of seasonal climate prediction include:
(a) Introduction of different target season length. Many parameters,
particularly in the tropics, display significant variability on the 30
to 90 day "intraseasonal oscillation" time scale. While this variability
can be aliased onto the seasonal time scale and confound seasonal predictions,
it may itself be predictable if the target "season" is much shorter than
the traditional three months. Closely related to shorter target seasons
is the prediction of significant events such as dates of last frost or
seasonal changes such as monsoon onset in northern Australia or the "winter
break" in southern Australia.
(b) Increased lead times for seasonal forecasts. The current system
based on the SOI has essentially zero lead time, while the SST-based system
has been designed with a one month lead. Forecasts issued by NCEP for the
United States have lead times ranging from one to twelve months ahead.
Appropriate realisable lead times for seasonal (and subseasonal) forecasts
need to be established by consultation with the users of the forecasts.
(c) Prediction of other parameters besides rainfall. BMRC is currently
developing a system for prediction of seasonal maximum and minimum temperatures,
and has explored the feasibility of the prediction of seasonal extremes.
Many other agriculturally useful potential predictands have also been identified.
Realisation of these extensions will require the availability of data
sets of both the predictands and potential predictors on the appropriate
time scales. For example, the introduction of sub-monthly target seasons
may require daily or pentad (five day) resolution data. This in turn requires
much greater computing and data storage capabilities than required for
monthly or seasonal data. Some research effort has already been directed
at these possible improvements by BMRC, especially the extension to other
parameters such as temperature as mentioned earlier, and examination of
intraseasonal variability of Australian rainfall. The SST-based seasonal
prediction scheme will also be tested with much longer lead times.
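As a small illustration of the pentad resolution mentioned above, daily values can be averaged in consecutive five-day blocks. Dropping a trailing partial block is an assumption made here for simplicity.

```python
def pentad_means(daily):
    """Average consecutive five-day blocks; a trailing partial block is dropped."""
    n_pentads = len(daily) // 5
    return [sum(daily[5 * i:5 * i + 5]) / 5 for i in range(n_pentads)]

# A 365-day year yields 73 pentads, against 12 monthly or 4 seasonal values.
year_of_pentads = pentad_means([0.0] * 365)
```

The roughly six-fold increase in record length relative to monthly data gives a sense of the additional storage and computing demands.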
References
Allan, R.J., J.A. Lindesay, and D.E. Parker, 1996: El Nino Southern Oscillation
and Climatic Variability, CSIRO Publishing, Melbourne, 405 pp.
Allan, R.J., 1998: ENSO and climatic variability in the last 150 years.
In Diaz, H.F. and V. Markgraf (eds), El Nino and the Southern Oscillation:
Multiscale variability and its impacts on natural ecosystems and society.
Cambridge University Press, Cambridge, UK (In press).
Allan, R.J., C.K. Folland, D.E. Parker, M.E. Mann, I.N. Smith and N.A.
Rayner, 1998: ENSO and large-scale modes of climatic variability in global
instrumental data. Nature (In Preparation).
Basnett, T.A. and D.E. Parker, 1997: Development of the Global Mean
Sea Level pressure data set GMSLP2. Climate Research Technical Note CRTN79,
Hadley Centre, Meteorological Office; Bracknell, U.K, 16 pp.
de Hoedt, G.C., R.C. Stone, and M. Voice, 1998: The development and
delivery of current seasonal climate forecasting capabilities in Australia.
(This conference)
Drosdowsky, W., and L.E. Chambers, 1998: Near global sea surface temperature
anomalies as predictors of Australian seasonal rainfall. BMRC Research
Report no. 65.
Elsner, J.B., and C.P. Schmertmann, 1994: Assessing forecast skill through
cross validation. Wea. and Forec., 9, 619-624.
Eskridge, R.E., O.A. Alduchov, I.V. Chernykh, Z. Panmao, A.C. Polansky,
and S.R. Doty, 1995: A Comprehensive Aerological Reference Data Set (CARDS).
Rough and systematic errors. Bull. Amer. Meteor. Soc., 76, 1759-1775.
Gibson, J.K., P. Kallberg, S. Uppala, A. Nomura, A. Hernandez, and E.
Serrano, 1997: ERA Description. ECMWF Re-Analysis Project Report Series, 1.
Kalnay, E., and Co-Authors, 1996: The NCEP/NCAR 40-Year Reanalysis Project.
Bull. Amer. Meteor. Soc., 77, 437-471.
Lee, D.M., 1994: Australian Data Archive for Meteorology (ADAM) Manual:
Unpublished National Climate Centre Report. 64 pp.
Mann, M.E. and J. Park, 1998: Oscillatory spatiotemporal signal detection
in climate studies. Adv. Geophys. (In press).
Nicholls, N., W. Drosdowsky and B. Lavery, 1997: Australian rainfall
variability and change. Weather, 52, 66-72.
Nicholls, N., B. Lavery, C. Frederiksen and W. Drosdowsky, 1996: Recent
apparent changes in relationships between the El Nino Southern Oscillation
and Australian rainfall and temperature. Geophys. Res. Lett., 23, 3357-3360.
Rayner, N.A., E.B. Horton, D.E. Parker and C.K. Folland, 1998: The GISST2.3
and GISST3.0 data sets. Climate Research Technical Note CRTN??, Hadley
Centre, Meteorological Office; Bracknell, U.K (in press).
Reynolds, R.W., and T.M. Smith, 1994: Improved global sea surface temperature
analyses using optimal interpolation. J. Climate, 7, 929-948.
Slutz, R.J., S.J. Lubker, J.D. Hiscox, S.D. Woodruff, R.L. Jenne, D.H.
Joseph, P.M. Steurer, and J.D. Elms, 1985: Comprehensive Ocean Atmosphere
Data Set; Release 1. NOAA Environmental Research Laboratories, Climate Research
Program, Boulder, CO. 268 pp. (NTIS PB86-105723)
Vose, R.S., R.L. Schmoyer, P.M. Steurer, R. Heim, T.R. Karl, and J.K. Eischeid,
1992: The Global Historical Climatology Network: Long-Term Monthly Temperature,
Precipitation, Sea Level Pressure, and Station Pressure Data. ORNL/CDIAC-53,
NDP-041. CDIAC, Oak Ridge, Tennessee. 315 pp.
Ward, N.M., and C.K. Folland, 1991: Prediction of seasonal rainfall
in the north Nordeste of Brazil using eigenvectors of sea surface temperature.
Intl. J. Climatol., 11, 711-743.
Figure Captions.
Figure 1. Spatial pattern of loadings
and associated scores (time series) of the first nine gridded Australian
rainfall VARIMAX rotated principal components of the standardised monthly
anomalies of the data set. Contour interval is 0.2, with zero contour heavy,
negative contours dashed and areas above +0.2 and below -0.2 shaded.
Figure 2. Spatial pattern of loadings and associated scores (time series)
of the first twelve VARIMAX rotated principal components of the standardised
monthly anomalies of the GISST data set. Contour interval is 0.2, with zero
contour heavy, negative contours dashed and areas above +0.2 and below
-0.2 shaded. (SST1-6, SST7-12)
Figure 3. Independent, "out of sample"
double cross-validation LEPS scores for seasonal rainfall hindcasts for
the period 1950-1993. Predictors used in the hindcasts are the best combination
of any size selected from a pool of four potential predictors, these being
the first two SST principal components shown in Figure 2 lagged by one
and three months.
Experimental results described in these pages are from research systems developed in BMRC and are not part of the Bureau of Meteorology's operational products & services.