BMRC is now part of CAWCR: The Centre for Australian Weather and Climate Research.
For more information on The Centre please go to http://www.cawcr.gov.au


The Potential for Improved Statistical Seasonal Climate Forecasts.

Wasyl Drosdowsky

Bureau of Meteorology Research Centre,

and

Rob Allan

CSIRO Division of Atmospheric Research.





Paper presented at the Symposium on
"Applications of Seasonal Climate Forecasting in Agricultural and Natural ecosystems - the Australian Experience",
held in Brisbane, Queensland, November 1997.


Abstract



From its beginning, statistical climate prediction has been hampered by poor observational data coverage, both spatially and temporally; incomplete theoretical understanding of the climate system; the availability of only basic statistical techniques; and limited computational capabilities. Progress in recent years, and the potential for further improvement in the future, result from advances in all these areas. These topics are all connected, and improvement in one area leads to, or requires, improvement in the others.

Until recently, most seasonal climate outlooks, such as that provided by the Bureau of Meteorology National Climate Centre (NCC), have used the Southern Oscillation Index (SOI) as the sole or major predictor, and simple forecast techniques such as linear regression. Now seasonal forecasts are being issued based on large scale sea surface temperature (SST) and global circulation anomalies. The availability of large multi-variable data sets has increased, and will continue to increase, the use of more advanced statistical techniques beyond the simple linear regression approach. The availability of many additional potential predictors, and of different statistical forecasting systems, greatly increases the possibility of artificial skill, which can be minimised by rigorous cross-validation techniques.

Much of the potential predictability in Australian seasonal rainfall one season ahead may be realised by a system utilising global SST patterns as predictors. However, climate variability encompasses other parameters, and occurs on a variety of time scales ranging from intra-seasonal through inter-annual (ENSO) to decadal. This leaves considerable scope for improvement, or for expansion of seasonal climate prediction to other parameters besides rainfall, to increased lead times, and to different target season lengths.




Introduction

An historical account of the current seasonal climate outlook service provided by the Bureau of Meteorology National Climate Centre (NCC) is given by de Hoedt et al. (1998). This service currently uses the Southern Oscillation Index (SOI) as its sole predictor. Elsewhere, particularly at the National Centers for Environmental Prediction (NCEP), Washington, and the Hadley Centre for Climate Prediction and Research of the United Kingdom Meteorological Office (UKMO), empirical seasonal forecasts are being issued based on large scale sea surface temperature (SST) anomalies, amongst other predictors. A similar scheme is under development, and has been run in quasi-operational mode for the past twelve months, at the Bureau of Meteorology Research Centre (BMRC) (Drosdowsky and Chambers, 1998).

In order to assess the potential for improved statistical seasonal climate forecasts for Australia beyond that provided by either the current system or the BMRC SST based forecast system, we need to consider the basic requirements for such a forecast system. These include:

(a) long time series of reliable data, of both the predictand and any potential predictors,

(b) appropriate statistical methodologies to analyse and find relationships between this data,

(c) the computing power, including data storage capacity, and ultimately,

(d) a knowledge of the science or the theoretical background to verify the physical plausibility of the predictor - predictand relationships.

Until very recently, statistical climate prediction has been restricted in all these requirements.

Data

Most observational data networks have been designed for weather forecasting, and while these may have adequate spatial coverage at present, this has not always been the case. As a result, very few stations have the long, continuous and homogeneous records necessary for seasonal prediction. In the past 10 to 15 years much effort has been directed towards the compilation of historical data, for example the global SST data in the Comprehensive Ocean Atmosphere Data Set (COADS) (Slutz et al., 1985). This compilation of ship data has been further refined to interpolate missing values, and to include other sources of SST such as satellite radiances, resulting in the UKMO Global sea-Ice and Sea Surface Temperature (GISST) data set (Rayner et al., 1998), and the NCEP SST reanalysis (Reynolds and Smith, 1994). Other global surface and upper air data sets being developed include the Global Historical Climatology Network (GHCN) data set (Vose et al., 1992), the Global Mean Sea Level Pressure (GMSLP) data set from the UKMO (Allan et al., 1996; Basnett and Parker, 1997), and the Comprehensive Aerological Reference Data Set (CARDS) (Eskridge et al., 1995). Over the past few years this array of observed data has been subjected to comprehensive global reanalyses using current, state-of-the-art analysis systems at NCEP (Kalnay et al., 1996) and the European Centre for Medium Range Weather Forecasts (ECMWF) (Gibson et al., 1997).


Computing Power.

The global reanalysis projects at NCEP and ECMWF are feasible due to the enormous computing and data storage facilities available at these centres. The full potential of these large data sets can now also be realised by national meteorological services. In Australia, access to historical data has been greatly enhanced with the introduction of the Australian Data Archive for Meteorology (ADAM) (Lee, 1994) system within the National Climate Centre. This now allows virtually the entire historical data base to be kept on disk and accessed easily from anywhere within the Bureau of Meteorology and by external users.

Statistical Methods.

The availability of these large multi-variable data sets and increased computing capabilities has increased, and will continue to increase, the use of more advanced statistical techniques beyond the simple traditional methods such as linear correlation and regression. These statistical techniques are used for a number of purposes. Multivariate analysis techniques such as Principal Component Analysis (PCA) or Empirical Orthogonal Function (EOF) analysis, Singular Value Decomposition (SVD), cluster analysis, and Canonical Correlation Analysis (CCA) are now commonplace in climate research and seasonal climate forecasting, being used as data reduction techniques and also to explore patterns or modes of variability in the data (see reviews of these techniques in Mann and Park, 1998). More powerful forecast techniques such as linear discriminant analysis (LDA), and nonlinear methods such as neural networks, are being applied to seasonal forecasts.

An essential requirement of all forecast schemes, whether statistical or dynamical, is an estimate of the skill to be expected from the scheme. Here we need to distinguish between the in-sample skill estimated by using all the data, i.e. the model fit, and the true out-of-sample skill obtained by testing the model on independent data. This latter skill is usually obtained through a cross-validation or "leave-one-out" procedure. The difference between these two estimates is sometimes referred to as the artificial skill.
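The distinction can be made concrete with a minimal sketch in Python. Everything in it is an assumption made for illustration: the synthetic `soi` and `rain` series, the 40-season sample, and the use of the mean squared error of a simple linear regression as an (inverse) measure of skill. It is not the NCC or BMRC procedure.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def in_sample_mse(xs, ys):
    """Error of the model fitted to, and evaluated on, all the data."""
    a, b = fit_line(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def loo_mse(xs, ys):
    """Leave-one-out error: each case is predicted by a model that never saw it."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / len(xs)

# Synthetic data (assumption): a weak linear link between predictor and rainfall.
rng = random.Random(0)
soi = [rng.gauss(0.0, 1.0) for _ in range(40)]
rain = [0.5 * s + rng.gauss(0.0, 1.0) for s in soi]

fit_error = in_sample_mse(soi, rain)
cv_error = loo_mse(soi, rain)
print("in-sample MSE:", fit_error)
print("leave-one-out MSE:", cv_error)
print("error hidden by the in-sample estimate:", cv_error - fit_error)
```

For a least-squares fit the leave-one-out error is never smaller than the in-sample error, so the in-sample estimate is always the more optimistic of the two.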

A more important source of possible artificial skill in seasonal climate predictions is the availability of many additional potential predictors, advanced statistical techniques and forecast methodologies, and the computing power to apply them. Simply selecting the best set of predictors or the best model will lead to increased artificial skill. To obtain a true out-of-sample skill estimate when selecting between different potential predictors or different forecast models, a nested cross-validation procedure is necessary. Alternatively, an improved theoretical understanding of the climate system, and experience with global climate models, may lead to the formulation of appropriate a priori hypotheses which can be tested with the improved data base, rather than engaging in haphazard "fishing expeditions" which will almost always find some statistical relationship between the predictand and a large number of potential predictors.
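The effect of a "fishing expedition" can be sketched as follows. The example is hypothetical throughout: the predictand and all 50 candidate predictors are independent random noise, so there is no real relationship to find, and the names `best_predictor`, `in_sample_r` and `nested_r` are introduced here for illustration only. The in-sample correlation of the best predictor is compared with a nested leave-one-out estimate in which the selection step is repeated inside every fold.

```python
import random

def corr(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def best_predictor(predictors, ys, years):
    """Index of the predictor with the largest |r| over the given years only."""
    target = [ys[t] for t in years]
    return max(range(len(predictors)),
               key=lambda k: abs(corr([predictors[k][t] for t in years], target)))

# Pure noise (assumption): any apparent skill is artificial by construction.
rng = random.Random(1)
n_years, n_predictors = 30, 50
rain = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
predictors = [[rng.gauss(0.0, 1.0) for _ in range(n_years)]
              for _ in range(n_predictors)]

all_years = list(range(n_years))
k_best = best_predictor(predictors, rain, all_years)
in_sample_r = abs(corr(predictors[k_best], rain))

# Nested cross-validation: selection AND fitting exclude the held-out year.
hindcasts = []
for held_out in all_years:
    train = [t for t in all_years if t != held_out]
    k = best_predictor(predictors, rain, train)
    a, b = fit_line([predictors[k][t] for t in train], [rain[t] for t in train])
    hindcasts.append(a + b * predictors[k][held_out])
nested_r = corr(hindcasts, rain)

print("in-sample |r| of selected predictor:", in_sample_r)
print("nested cross-validated r:", nested_r)
```

Because the held-out year plays no part in choosing the predictor, the nested estimate is not inflated by the search itself.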

The BMRC SST based prediction system.

To illustrate the interaction between the various factors described above we present a brief description of the BMRC SST based statistical system. The predictand is the Australian seasonal rainfall anomaly, on a uniform one degree grid over the entire continent. This gridded data set was developed by NCC using all the available monthly rainfall records in the ADAM data base (Jones and Weymouth, 1998). The predictor to be used is the global or near-global SST anomaly field. Both these data sets consist of many hundreds of individual gridpoints. To relate one field to another directly would mean examining many thousands of possible predictor - predictand combinations. Clearly these will not all be independent, as there is appreciable spatial correlation in both fields. To reduce the number of predictors and predictands to a manageable size, and at the same time produce largely uncorrelated predictors, the coherent variability of both fields is summarised by rotated principal component analysis. For the rainfall data, retaining the first nine components (Figure 1) accounts for about 60% of the variance, while the first twelve components (Figure 2) account for about 50% of the SST data variance. This procedure reduces the number of potential predictors to around twenty to thirty, given that not all SST components are considered but some lagged values (at lags large enough to reduce the autocorrelations to insignificant levels) are also used. The number of actual forecasts required for each season is then reduced from over 600 to just the nine rainfall principal components. An additional benefit is that the resulting forecast, when interpolated back to the original gridpoints in some appropriate manner, will be much smoother spatially than that produced by forecasting for each individual gridpoint.
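The data reduction step can be illustrated with a small sketch. It is a simplification under stated assumptions: the field is synthetic (eight gridpoints sharing one common signal), only the leading component is extracted, by power iteration on the correlation matrix, and no VARIMAX rotation is applied, whereas the system described above retains several rotated components. The function names are hypothetical.

```python
import random

def standardise(column):
    """Convert a gridpoint time series to standardised anomalies."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in column]

def covariance(columns):
    """Covariance matrix of zero-mean series (one list per gridpoint)."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]

def leading_component(cov, iterations=200):
    """Largest eigenvalue and unit eigenvector of cov by power iteration."""
    p = len(cov)
    v = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v

# Synthetic field (assumption): 60 seasons at 8 spatially correlated gridpoints.
rng = random.Random(2)
n_seasons, n_points = 60, 8
signal = [rng.gauss(0.0, 1.0) for _ in range(n_seasons)]
field = [standardise([s + 0.7 * rng.gauss(0.0, 1.0) for s in signal])
         for _ in range(n_points)]

cov = covariance(field)
eigenvalue, loadings = leading_component(cov)
total_variance = sum(cov[i][i] for i in range(n_points))
fraction = eigenvalue / total_variance

# Scores: the time series obtained by projecting the field onto the loadings.
scores = [sum(loadings[j] * field[j][t] for j in range(n_points))
          for t in range(n_seasons)]

print("fraction of variance in the leading component:", fraction)
```

A single score series then stands in for all eight gridpoint series, which is the sense in which the number of predictors or predictands is reduced.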

Having reduced the predictors and predictands to a small number of uncorrelated principal components we need to decide on a prediction methodology; the method chosen for this forecast scheme is LDA, mainly due to its ability to produce probability forecasts. In LDA the rainfall for each season in the training set is categorised according to its rank in the cumulative frequency distribution, with the driest third of years assigned to tercile 1 (Below normal), the middle third to tercile 2 (Near normal), and the wettest third to tercile 3 (Above normal). LDA then uses Bayes' theorem to estimate the probability of a new observation of SSTs belonging to a particular category, based on the distribution of the observations in the training set. This approach is identical to that employed by Ward and Folland (1991) to predict rainfall in Northeast Brazil using eigenvectors of global SST.
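The tercile categorisation and the Bayes' theorem step can be sketched for a single predictor. This is a minimal illustration, not the BMRC scheme: the data are synthetic, only one predictor is used where the scheme uses SST principal components, and each category is assumed to be Gaussian with a common (pooled) variance, which is the standard LDA assumption. All function and variable names are hypothetical.

```python
import math
import random

def tercile_categories(values):
    """1 = driest third, 2 = middle third, 3 = wettest third, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    categories = [0] * len(values)
    for rank, i in enumerate(order):
        categories[i] = 1 + (3 * rank) // len(values)
    return categories

def fit_lda(x, categories):
    """Class means, pooled within-class variance and prior probabilities."""
    groups = {c: [xi for xi, ci in zip(x, categories) if ci == c]
              for c in (1, 2, 3)}
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    pooled = sum((xi - means[c]) ** 2
                 for c, g in groups.items() for xi in g) / (len(x) - 3)
    priors = {c: len(g) / len(x) for c, g in groups.items()}
    return means, pooled, priors

def posterior(x_new, means, pooled, priors):
    """Bayes' theorem: P(category | x) is proportional to prior * likelihood."""
    weights = {c: priors[c] * math.exp(-(x_new - means[c]) ** 2 / (2.0 * pooled))
               for c in means}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Synthetic training set (assumption): 45 seasons, rainfall linked to one predictor.
rng = random.Random(3)
sst = [rng.gauss(0.0, 1.0) for _ in range(45)]
rain = [0.8 * s + 0.6 * rng.gauss(0.0, 1.0) for s in sst]

cats = tercile_categories(rain)
means, pooled, priors = fit_lda(sst, cats)
probs = posterior(1.5, means, pooled, priors)
for c, label in ((1, "Below normal"), (2, "Near normal"), (3, "Above normal")):
    print(label, round(probs[c], 3))
```

The output is a probability for each of the three categories rather than a single rainfall value, which is the property that motivated the choice of LDA.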

Hindcast skill (i.e. skill based on the training set of data) can be estimated in a number of ways, of which two are most common. The first is the "Holdout Method", in which the data are split into a development set, to which the LDA model is fitted, and a test sample (usually 1/4 to 1/3 of the total), to which the model is applied and from which the hindcast skill is calculated. The second method, and that used here, is the "Leave-One-Out" or cross-validation method. This is similar to the Holdout method except that the split is into N-1 cases in the development sample and only one test case. This procedure is then repeated over all N cases, leaving one out each time, and skill is evaluated over all N hindcasts. In either method the development and test samples must be kept independent to prevent the introduction of artificial skill.
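The two ways of splitting the data can be written down directly. In this sketch the cases are simply indexed 0 to N-1; taking the final quarter of the record as the holdout test sample is an assumption, since the text does not say how the test cases are chosen, and N = 44 corresponds to the 1950-1993 hindcast period quoted for Figure 3.

```python
def holdout_split(n_cases, test_fraction=0.25):
    """One split: development cases first, the last block held out for testing."""
    n_test = max(1, round(n_cases * test_fraction))
    cases = list(range(n_cases))
    return cases[:-n_test], cases[-n_test:]

def leave_one_out_splits(n_cases):
    """N splits: each case in turn is the single test case."""
    for held_out in range(n_cases):
        development = [i for i in range(n_cases) if i != held_out]
        yield development, [held_out]

n_cases = 44  # e.g. the seasons of 1950-1993
development, test = holdout_split(n_cases)
print("holdout:", len(development), "development cases,", len(test), "test cases")

loo = list(leave_one_out_splits(n_cases))
print("leave-one-out:", len(loo), "splits of", len(loo[0][0]), "+ 1 cases")

# Independence check: no case appears on both sides of any split.
independent = all(set(dev).isdisjoint(tst) for dev, tst in loo)
print("development and test samples disjoint:", independent)
```

The holdout method yields one skill estimate from a small test sample, whereas leave-one-out yields N hindcasts at the cost of refitting the model N times.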

This discussion applies to the simple case where the form of the LDA model, i.e. the set of predictors to be used, is known or specified in advance. When the structure of the model is not known, i.e. when we need to select the best subset of predictors from a large pool of potential predictors, then the selection procedure itself must also be cross-validated, as described in detail by Elsner and Schmertmann (1994). For the BMRC SST forecast system, we have examined the use of up to ten SST principal components, lagged by one and three months, for a total of twenty potential predictors (Drosdowsky and Chambers, 1998). With such a large number of potential predictors it is not feasible to include every possible subset of size up to 20 predictors; in this case the search has been restricted to the best model involving a subset of at most two predictors. This still results in over 200 possible forecast models, and, as shown by Drosdowsky and Chambers (1998), an enormous potential for artificial skill as measured by the difference between the "in-sample" and the cross-validated "out of sample" skill. Nevertheless, the true cross-validated skill of this system exceeds that obtained from the SOI alone, particularly through the "autumn predictability barrier" when El Nino Southern Oscillation (ENSO) events typically change phase.

There are two major problems with the use of this scheme in operational seasonal prediction. Firstly, some of the predictor choices are difficult to justify on physical grounds, being simply the best in a statistical sense. Secondly, there is no requirement for temporal continuity of the selected predictors, which can result in major shifts in the forecasts from month to month. To overcome these problems the operational scheme has been restricted to the use of the first two SST components, which are known to be related to Australian rainfall variability. Retaining the one and three month lags results in four potential predictors, and it is feasible to examine the 15 possible subsets of any size. The resulting cross-validated skill is shown in Figure 3. Again this shows some improvement over the SOI alone, especially at the same lead time, with most of the increased skill due to the inclusion of the second, Indian Ocean, SST component.
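The model counts quoted above follow from simple enumeration, as the short sketch below shows. The predictor labels such as `SST1_lag1` are hypothetical names used only to make the subsets readable.

```python
from itertools import combinations

def candidate_models(predictor_names, max_size):
    """All non-empty predictor subsets containing at most max_size predictors."""
    return [subset
            for size in range(1, max_size + 1)
            for subset in combinations(predictor_names, size)]

# Ten SST components at lags of one and three months: twenty predictors.
full_pool = [f"SST{pc}_lag{lag}" for pc in range(1, 11) for lag in (1, 3)]
# Operational restriction: the first two components only, same two lags.
restricted_pool = [f"SST{pc}_lag{lag}" for pc in (1, 2) for lag in (1, 3)]

n_full = len(candidate_models(full_pool, 2))
n_restricted = len(candidate_models(restricted_pool, 4))
print(n_full)        # 20 single predictors + 190 pairs = 210
print(n_restricted)  # 4 + 6 + 4 + 1 = 15
```

Each of these candidate models has to be refitted inside every cross-validation fold, which is why restricting the pool matters for both computation and artificial skill.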


Improved Future Forecast Systems.

The potential for improvement of current seasonal climate prediction systems, and specifically the NCC system, depends on what is meant by "improved". In the narrow sense of more accurate or skilful forecasts in the current format, there may be only limited potential for improvement. Much of the potential predictability in Australian seasonal rainfall one season ahead may be realised by the system utilising global SST patterns as predictors. Of the vast array of new potential predictors which will be available in the near future, only a small portion will display variability truly independent of the SST data.

Climate variability occurs on a variety of time scales ranging from intra-seasonal through inter-annual (ENSO) to multi-decadal. More significantly, the relationship between predictor and predictand can also vary on decadal to multi-decadal time scales, for example the changing relationships over time between the SOI and seasonal rainfall over most of Australia (Nicholls et al., 1996; Nicholls et al., 1997). Wider physical evidence for these changes is now coming from concerted studies of the climate system. One line of research, focusing on detailed global analyses of historical atmospheric pressure and SST compilations using techniques such as EOFs and SVD, has isolated several modes of variability operating on decadal to secular time scales (Allan et al., 1998; Mann and Park, 1998). This research indicates that not only do these climatic modes display ENSO-like structure and rainfall relationships at low frequencies (Allan, 1998), but that they interact with interannual ENSO signals to provide important modulations of the phenomenon. Consequently, protracted El Nino and La Nina event sequences, such as the 1990-1995 El Nino period, are manifest through the superposition of interannual ENSO and decadal ENSO-like modes in the climate system. In addition, long-term changes in ENSO, such as the climate shift in its characteristics since the 1970s, are seen to result from the operation of multidecadal ENSO-like fluctuations.

This low frequency variability can confound statistical forecasts for average conditions over a three-month season. If, however, we regard seasonal climate prediction in a wider sense including the intraseasonal and decadal variability, then there is considerable scope for improvement, or expansion of seasonal climate prediction.

Areas of possible expansion of seasonal climate prediction include:

(a) Introduction of different target season length. Many parameters, particularly in the tropics, display significant variability on the 30 to 90 day "intraseasonal oscillation" time scale. While this variability can be aliased onto the seasonal time scale and confound seasonal predictions, it may itself be predictable if the target "season" is much shorter than the traditional three months. Closely related to shorter target seasons is the prediction of significant events such as dates of last frost or seasonal changes such as monsoon onset in northern Australia or the "winter break" in southern Australia.

(b) Increased lead times for seasonal forecasts. The current system based on the SOI has essentially zero lead time, while the SST-based system has been designed with a one month lead. Forecasts issued by NCEP for the United States have lead times ranging from one to twelve months ahead. Appropriate realisable lead times for seasonal (and subseasonal) forecasts need to be established by consultation with the users of the forecasts.

(c) Prediction of other parameters besides rainfall. BMRC is currently developing a system for prediction of seasonal maximum and minimum temperatures, and has explored the feasibility of the prediction of seasonal extremes. Many other agriculturally useful potential predictands have also been identified.

Realisation of these extensions will require the availability of data sets of both the predictands and potential predictors on the appropriate time scales. For example, the introduction of sub-monthly target seasons may require daily or pentad (five day) resolution data. This in turn requires much greater computing and data storage capabilities than are required for monthly or seasonal data. Some research effort has already been directed at these possible improvements by BMRC, especially the extension to other parameters such as temperature, as mentioned earlier, and the examination of intraseasonal variability of Australian rainfall. The SST-based seasonal prediction scheme will also be tested with much longer lead times.

References

Allan, R.J., J.A. Lindesay, and D.E. Parker 1996: El Nino Southern Oscillation and Climatic Variability, CSIRO Publishing, Melbourne, 405 pp.

Allan, R.J., 1998: ENSO and climatic variability in the last 150 years. In Diaz, H.F. and V. Markgraf (eds), El Nino and the Southern Oscillation: Multiscale variability and its impacts on natural ecosystems and society. Cambridge University Press, Cambridge, UK (In press).

Allan, R.J., C.K. Folland, D.E. Parker, M.E. Mann, I.N. Smith and N.A. Rayner, 1998: ENSO and large-scale modes of climatic variability in global instrumental data. Nature (In Preparation).

Basnett, T.A. and D.E. Parker, 1997: Development of the Global Mean Sea Level pressure data set GMSLP2. Climate Research Technical Note CRTN79, Hadley Centre, Meteorological Office; Bracknell, U.K, 16 pp.

de Hoedt, G.C., R.C. Stone, and M. Voice, 1998: The development and delivery of current seasonal climate forecasting capabilities in Australia. (This conference)

Drosdowsky, W., and L.E. Chambers, 1998: Near global sea surface temperature anomalies as predictors of Australian seasonal rainfall. BMRC Research Report no. 65.

Elsner, J.B., and C.P. Schmertmann, 1994: Assessing forecast skill through cross validation. Wea. and Forec., 9, 619-624.

Eskridge, R.E., O.A. Alduchov, I.V. Chernykh, Z. Panmao, A.C. Polansky, and S.R. Doty, 1995: A Comprehensive Aerological Reference Data Set (CARDS). Rough and systematic errors. Bull. Amer. Meteor. Soc., 76, 1759-1775.

Gibson, J.K., P. Kallberg, S. Uppala, A. Nomura, A. Hernandez, and E. Serrano, 1997: ERA Description. ECMWF Re-Analysis Project Report Series, 1.

Kalnay, E., and co-authors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437-471.

Lee, D.M., 1994: Australian Data Archive for Meteorology (ADAM) Manual. Unpublished National Climate Centre Report. 64 pp.

Mann, M.E. and J. Park, 1998: Oscillatory spatiotemporal signal detection in climate studies. Adv. Geophys. (In press).

Nicholls, N., W. Drosdowsky and B. Lavery, 1997: Australian rainfall variability and change. Weather, 52, 66-72.

Nicholls, N., B. Lavery, C. Frederiksen and W. Drosdowsky, 1996: Recent apparent changes in relationships between the El Nino Southern Oscillation and Australian rainfall and temperature. Geophys. Res. Lett., 23, 3357-3360.

Rayner, N.A, E.B. Horton, D.E. Parker and C. K. Folland, 1998: The GISST2.3 and GISST3.0 data sets. Climate Research Technical Note CRTN??, Hadley Centre, Meteorological Office; Bracknell, U.K (in press).

Reynolds, R.W., and T.M. Smith, 1994: Improved global sea surface temperature analyses using optimal interpolation. J. Climate, 7, 929-948.

Slutz, R.J., S.J. Lubker, J.D. Hiscox, S.D. Woodruff, R.L. Jenne, D.H. Joseph, P.M. Steurer, and J.D. Elms, 1985: Comprehensive Ocean Atmosphere Data Set; Release 1. NOAA Environmental Research Laboratories, Climate Research Program, Boulder, CO. 268 pp. (NTIS PB86-105723)

Vose, R.S., R.L. Schmoyer, P.M. Steurer, R. Heim, T.R. Karl, and J.K. Eischeid, 1992: The Global Historical Climatology Network: Long-Term Monthly Temperature, Precipitation, Sea Level Pressure, and Station Pressure Data. ORNL/CDIAC-53, NDP-041. CDIAC, Oak Ridge, Tennessee. 315 pp.

Ward, N.M., and C.K. Folland, 1991: Prediction of seasonal rainfall in the north Nordeste of Brazil using eigenvectors of sea surface temperature. Intl. J. Climatol., 11, 711-743.

Figure Captions.

Figure 1. Spatial pattern of loadings and associated scores (time series) of the first nine VARIMAX rotated principal components of the standardised monthly anomalies of the gridded Australian rainfall data set. Contour interval is 0.2, with zero contour heavy, negative contours dashed and areas above +0.2 and below -0.2 shaded.

Figure 2. Spatial pattern of loadings and associated scores (time series) of the first twelve VARIMAX rotated principal components of the standardised monthly anomalies of the GISST data set. Contour interval is 0.2, with zero contour heavy, negative contours dashed and areas above +0.2 and below -0.2 shaded. (SST1-6, SST7-12)

Figure 3. Independent, "out of sample" double cross-validation LEPS scores for seasonal rainfall hindcasts for the period 1950-1993. Predictors used in the hindcasts are the best combination of any size selected from a pool of four potential predictors, these being the first two SST principal components shown in Figure 2 lagged by one and three months.



Experimental results described in these pages are from research systems developed in BMRC and are not part of the Bureau of Meteorology's operational products & services.

