Volume 55, Issue 8 p. 7400-7418
Research Article
Open Access

Effectiveness of CMIP5 Decadal Experiments for Interannual Rainfall Prediction Over Australia

Dipayan Choudhury
School of Civil and Environmental Engineering, The University of New South Wales, Sydney, New South Wales, Australia
ARC Centre of Excellence for Climate System Science, The University of New South Wales, Sydney, New South Wales, Australia
Now at Center for Climate Physics, Institute for Basic Science, Busan, South Korea

Rajeshwar Mehrotra
School of Civil and Environmental Engineering, The University of New South Wales, Sydney, New South Wales, Australia

Ashish Sharma (Corresponding Author)
School of Civil and Environmental Engineering, The University of New South Wales, Sydney, New South Wales, Australia
Correspondence to: A. Sharma, [email protected]

Alexander Sen Gupta
ARC Centre of Excellence for Climate System Science, The University of New South Wales, Sydney, New South Wales, Australia
Climate Change Research Centre, The University of New South Wales, Sydney, New South Wales, Australia

Bellie Sivakumar
School of Civil and Environmental Engineering, The University of New South Wales, Sydney, New South Wales, Australia
Department of Civil Engineering, Indian Institute of Technology Bombay, Mumbai, India
First published: 06 August 2019
Citations: 9

Abstract

Phase five of the Coupled Model Intercomparison Project enabled a range of decadal modeling experiments where climate models were initialized with observations and allowed to evolve freely for 10–30 years. However, climate models struggle to realistically simulate rainfall and the skill of rainfall prediction in decadal experiments is poor. Here, we examine how predictions of sea surface temperature anomaly (SSTA) indices from Coupled Model Intercomparison Project Phase 5 decadal experiments can provide skillful rainfall forecasts at interannual timescales for Australia. Forecasts of commonly used SSTA indices relevant to Australian seasonal rainfall are derived from decadal hindcasts of six different climate models and corrected for model drift. The corrected indices are then combined to form a multimodel ensemble. The resultant forecasts are used as predictors in a statistical rainfall model developed in this study. As SSTA forecasts lose skill with increasing lead time, a new methodology for predicting interannual rainfall is proposed. We allow our statistical prediction model to evolve with lead time while accounting for the loss of skill in SSTA forecasts instead of using one statistical model for all lead times. Results in this pilot study across two of the largest climate zones in Australia show that SSTA outputs from the decadal experiments provide enhanced skill in rainfall prediction over using the conventional model (based purely on lagged observed indices) up to a maximum of three years ahead. This methodology could be used more broadly for other regions around the world where rainfall variability is known to have strong links to ocean temperatures.

Key Points

  • A novel model combination approach is developed and applied to Australian rainfall prediction using CMIP5 decadal experiment outputs
  • The method statistically combines independent models based on both empirical (lagged) and dynamical (concurrent) relationships
  • Significant improvements in rainfall forecasting skill over using just the empirical models are noted out to a lead of 3 years

1 Introduction

Reliable and accurate forecasting of seasonal rainfall is of high value and, thus, has attracted considerable research efforts in recent years. Many different systems have been developed and used by meteorology departments around the world to issue operational forecasts. These forecasting systems are based on dynamical models, statistical models, or a combination of both. Dynamical systems predict rainfall using numerical weather prediction models or climate forecasting models. Krishnamurthy et al. (2019) used the models from the Geophysical Fluid Dynamics Laboratory for forecasting summer rainfall over the Intra-Americas Sea and found that the high-resolution atmosphere-only model forced with observed sea surface temperature anomaly (SSTA) led to the best simulation of the Caribbean low-level jet and hence the highest forecast skill. Where high-resolution simulations are not available, regional climate models nested in coupled general circulation models (GCMs) have been used for seasonal rainfall forecasting studies over Vietnam (Phan-Van et al., 2018), West Africa (Siegmund et al., 2015), China (Yuan et al., 2012), and globally (Liu et al., 2014). Statistical systems employ relationships between large-scale climate predictors and rainfall to issue such forecasts. For instance, the Indian Meteorological Department issues long-range forecasts of the Indian summer monsoon rainfall using SSTA and surface pressure anomalies over the North Atlantic and the tropical Indian and Pacific Oceans as predictors alongside land surface air temperature anomalies over Europe and north central Pacific zonal wind anomalies (Rajeevan et al., 2007). Extremes in Indian summer monsoon rainfall have also been linked to and predicted with useful skill using indices of the El Niño–Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD; Krishnaswamy et al., 2014; Rajagopalan & Molnar, 2014). More recently, B Wang et al. 
(2015) suggested that including new predictors such as central Pacific ENSO, Asian low, and spring North and South Pacific Highs could significantly improve the Indian Monsoon forecast skill. Similar statistical forecasting studies have also been undertaken for seasonal rainfall over Thailand (Babel et al., 2017), East Asian Summer Monsoon over China (Xing et al., 2016), and West Africa (Sittichok et al., 2016).

For Australia, the Bureau of Meteorology (BoM) issues official forecasts of seasonal climate using a combination of dynamical and statistical approaches. These approaches are called calibration, bridging, and merging schemes and follow the logic outlined below (Schepen et al., 2014).
  1. Calibration: GCMs are used to obtain raw rainfall forecasts, which are then bias corrected based on previous skill between forecasts and observations. This method is good for seasons and regions over which the rainfall forecasts are known to be skillful.
  2. Bridging: GCMs are used to obtain forecasts of large-scale climate indices (CIs; based on SSTA indices, winds, etc.), which are then passed through a statistical rainfall prediction model. This method is good for seasons and regions over which rainfall is known to be strongly linked to CIs. The statistical model is based on a Bayesian Joint Probability scheme (BJP, Q J Wang et al., 2009), which forms individual regression models between rainfall and concurrent GCM SSTA indices (Schepen et al., 2012) and then combines these regression models using a Bayesian Model Averaging (BMA) procedure (Q J Wang et al., 2012) to forecast seasonal rainfall.
  3. Merging: The dynamical (calibration) and statistical (bridging) rainfall forecasts are combined using BMA.

Rainfall variability in Australia has long been linked to large-scale changes in the atmospheric circulation patterns driven by temperature changes over the Pacific and Indian Oceans (Ashok et al., 2003; Chiew et al., 1998; Drosdowsky, 1993b; Kirono et al., 2010; Nicholls, 1984b; Ramsay et al., 2008; Risbey et al., 2009; Taschetto et al., 2009; Ummenhofer et al., 2011). The positive (negative) phase of ENSO leads to drying (wetting) over large parts of Australia (Power et al., 2006; G Wang & Hendon, 2007). The IOD, the difference in SSTA between the central Indian Ocean and the Maritime Continent, has been shown to affect winter and spring rainfall (Ashok et al., 2003; Cai et al., 2011; Verdon & Franks, 2005) and has been linked to droughts over Australia (Ummenhofer et al., 2009). The El Niño Modoki, characterized by warm SSTAs over the central tropical Pacific flanked by colder SSTAs to the east and west, has been shown to affect winter rainfall and wind anomalies over Australia (Ashok et al., 2007; Ashok et al., 2009) and has been linked to a shorter and more intense Australian monsoon season (Taschetto et al., 2009). These indices also affect climate in other parts of the planet. For instance, ENSO affects rainfall over the southwest of the United States (Andrade & Sellers, 1988) and central Asia (Mariotti, 2007), influences winter rainfall over Europe (Zanchettin et al., 2008), and leads to global climate impacts (McPhaden et al., 2006). The El Niño Modoki affects the climate over South America, New Zealand, Japan, and India alongside Australia (Ashok et al., 2007). The IOD influences precipitation over India (Ashok et al., 2001), East Asia (Guan & Yamagata, 2003), and Africa (Black, 2005; Ummenhofer et al., 2009) besides Australia. In addition to these tropical Pacific and Indian Ocean climate patterns, Australian rainfall has also been linked to SSTAs over the South Tasman Sea (Drosdowsky, 1993a) and the Indonesian Seas (Nicholls, 1984a). Schepen et al. 
(2012) showed these indices to be useful predictors of Australian rainfall, and a subset of these indices is used for operational forecasting by the BoM. More recently, the tropical transbasin variability (TBV) index (Chikamoto et al., 2015), a dipole between the tropical Pacific and the tropical Atlantic, has been shown to significantly affect Australian rainfall (Choudhury, Sen Gupta, et al., 2016; Johnson et al., 2018). Moreover, the TBV has been reported to have multiyear predictability, much longer than that associated with other such SSTA indices (e.g., ENSO; Chikamoto et al., 2015; Choudhury, Sen Gupta, et al., 2016). Chikamoto et al. (2015) suggest that the longer-term predictability of TBV compared to ENSO arises primarily from coupled feedback processes between the Atlantic, Indian, and Pacific Oceans via Walker circulation displacements. They used partially coupled experiments to show the importance of Atlantic SSTAs in modulating Pacific variability, and suggest that the coupled Indian-Atlantic-Pacific system, described by the TBV index, exhibits longer-term variability than the tropical Pacific alone, leading to longer-term predictability beyond the higher predictability expected from TBV's larger spatial extent. However, a detailed explanation for how this happens has yet to be provided.

A set of experiments was conducted as part of the Phase 5 of the Coupled Model Intercomparison Project (CMIP5) to bridge the gap between seasonal forecasting systems and climate projections (Kirtman et al., 2013). These “near-term” or “decadal” experiments are a collaboration between the World Climate Research Programme and the Working Group on Seasonal to Interannual Prediction and provide simulations over a timescale of 10–30 years. Forecasts at decadal timescales would be beneficial for many industries, including agriculture, water supply, energy, and fisheries. These experiments are initialized with the observed ocean state, like seasonal predictions, and also account for changes in external forcings, such as greenhouse gases, aerosols, and solar activity, ignored in seasonal forecasts. To isolate the predictable part of the climate response from the unpredictable short-term climate variability, multiple simulations (called ensembles) of the same model with slightly perturbed initial conditions are carried out. To evaluate the merit of such experiments, multiple modeling groups have run hindcast experiments initialized every year or every 5 years from the year 1960 to 2005.

GCMs are known to have significant difficulties in realistically simulating rainfall (Collins et al., 2011; Lin, 2007; Perkins et al., 2007; Stephens et al., 2010; Sun et al., 2006). This is in part because precipitation is strongly affected by local processes that occur over short timescales, which climate models struggle to reproduce. However, studies suggest that GCMs are better at forecasting sea surface temperature at timescales longer than annual, including in the decadal prediction experiments (Chikamoto et al., 2015; Choudhury et al., 2015; Corti et al., 2012; Gonzalez & Goddard, 2016; Mehta et al., 2013). This can be attributed to the fact that SSTAs, especially over regions linked to large-scale climate phenomena like ENSO, generally have a longer memory and evolve more slowly. It should be noted that the representation of these large-scale climate patterns in climate models may contain biases that might result in changes in predictability (Guemas et al., 2012; Meehl & Teng, 2012). Despite this, using forecasts of SSTA indices as predictors of rainfall may prove more skillful than using rainfall forecasts directly. Previous studies have shown that simple statistical rainfall forecasting systems based on large-scale climate patterns can outperform rainfall forecasts from sophisticated numerical weather prediction models (Rajeevan et al., 2007; B Wang et al., 2015; Westra & Sharma, 2010; Wilks, 2011). For Australia, the merit of using SSTA fields driven by a multimodel combination approach (Kaiser Khan et al., 2014), rather than a single SSTA field, to issue seasonal rainfall forecasts has previously been documented by Khan et al. (2015). The “bridging” component of the operational forecasting system of the BoM is such a statistical model based on rainfall-CI relationships.

A challenge in the above approach, however, is accounting for the reduced skill of predicted SSTA indices as lead times increase. This temporal decay of prediction skill also applies to decadal experiments, compounded by the presence of climate drift, which introduces significant systematic biases in the simulated variables (Mehrotra et al., 2014). A new approach is needed that accounts for model drift, the deterioration of forecast skill with lead time, and the fact that the same predictability is not achievable for all climate variables of interest. In view of this, the present study aims to introduce a novel methodology for forecasting rainfall (or any other variable) that has low direct predictability (because it is poorly simulated in models) by using a suite of available information while letting the statistical prediction model evolve temporally. The model is referred to as a hierarchical linear combination (HLC) model and is used to forecast rainfall at seasonal to interannual timescales using SSTA indices as predictors. Two sets of predictor indices are considered: observed indices available at the start of the forecast and concurrent predicted SSTA indices derived from the decadal experiments. The rationale is that multiple SSTA predictors are necessary in order to obtain the best possible rainfall forecast, including observed indices (which provide skill based on a lagged relationship with rainfall) and concurrent modeled indices (which provide skill where there is a concurrent relationship between an index and rainfall). The relationships between observed rainfall and observed SSTA, and between observed rainfall and predicted SSTA, are used to predict monthly rainfall in a leave-one-out cross-validation framework over the period 1960–2010. This predicted rainfall is then compared with the observed rainfall to evaluate the skill of the HLC model. 
The prediction skill obtained is compared with the case where the concurrent modeled SSTA indices are not considered (which provides the lower limit of predictability) and where modeled SSTA indices are replaced with concurrent observed SSTA indices (representing the scenario where the decadal predictions are “perfect,” which provides the upper limit of predictability). Using this novel but simplified version of the existing operational statistical model, this study aims to quantify the merit in including SSTA predictions from the CMIP5 decadal experiments in Australian rainfall forecasting at an interannual timescale.

2 Models and Methods

2.1 SST and Rainfall Data

Decadal hindcasts integrated between 1960 and 2010 from five CMIP5 models are considered in this study. These models are CanCM4 (i1), GFDL-CM2.1, MIROC5, MPI-ESM-LR, and HadCM3 (i2 and i3). While data from other models are available at the CMIP5 data archive, only these models provided annual initializations at the time of the commencement of the study; other models provided decadal hindcast data initialized every 5 years over 1960–2010. Choudhury, Sharma, et al. (2016) reported that decadal hindcast experiments initialized every 5 years could contain spurious imprinting of the observed ENSO signal during 1960–2010, leading to biased results over the tropical Pacific even after drift correction. The selected model decadal hindcasts are initialized every year from 1960 to 2000, thus making a total of 41 overlapping decadal projections, and encompass both full-field initialization (CanCM4i1, GFDL-CM2.1, and HadCM3i3) and anomaly initialization (MIROC5, MPI-ESM-LR, and HadCM3i2) methods. The models have a minimum of 3 and a maximum of 10 ensemble members for each year (Table 1). The SSTA indices (defined in the next section) derived from the ensemble mean of each of the 41 forecasts of each model are drift corrected using a leave-one-out cross-validation scheme. The standard drift correction method involves estimating the lead-time-dependent mean drift, which is simply the average of the biases of all but one hindcast ensemble, and subtracting it from each of the ensembles (ICPO, 2011). However, here we use the best drift correction method for each index and model following the results of Choudhury et al. (2017). Most predicted indices are corrected either by the initial-condition-based drift correction method (Fučkar et al., 2014) or the trend-based drift correction method (Kharin et al., 2012). 
These methods add dependence on the observed initial conditions, or time-dependent trend adjustment terms, to the standard mean drift correction. A thorough description of these drift correction methods and their effects on SSTA can be found in Choudhury et al. (2017). Indices from all the models are then combined to form the multimodel ensemble mean (MME). Choudhury et al. (2017) demonstrated that drift correcting individual models prior to averaging leads to the most skillful forecasts. Figure S1 in the supporting information shows the impact of this drift correction. CIs for each decade from each of the models are drift corrected and then averaged to form the mean time series of each index from the drift-corrected MME (blue). This is compared with the case where individual predictions are not drift corrected before averaging to form the MME (Raw, black) and with the mean time series from the observations (Obs, red). Drift correcting individual models before averaging leads to a much better match with the observed decadal time series than the raw values.
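To make the baseline procedure concrete, the standard lead-time-dependent mean drift correction with leave-one-out cross-validation can be sketched as below. This is a minimal sketch: the array layout and function name are illustrative, and the initial-condition-based and trend-based variants used for most indices add further adjustment terms not shown here.

```python
import numpy as np

def loo_mean_drift_correct(hindcasts, observations):
    """Leave-one-out mean drift correction (standard method; ICPO, 2011).

    hindcasts    : (n_starts, n_leads) ensemble-mean index, one row per start year
    observations : (n_starts, n_leads) observed index aligned with each hindcast
    Returns the drift-corrected hindcasts, same shape.
    """
    n_starts, _ = hindcasts.shape
    corrected = np.empty_like(hindcasts, dtype=float)
    for k in range(n_starts):
        mask = np.ones(n_starts, dtype=bool)
        mask[k] = False  # leave the hindcast being corrected out of the drift estimate
        # lead-time-dependent mean drift from all other start years
        drift = (hindcasts[mask] - observations[mask]).mean(axis=0)
        corrected[k] = hindcasts[k] - drift
    return corrected
```

With a bias that is constant across start years, the leave-one-out estimate recovers the observed series exactly; in practice the drift varies with lead time and initialization month, which the per-lead averaging accommodates.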

Table 1. Models Used in This Study
Model | Modeling group | Initialization method | Ensemble size | Historical run period
MIROC5 | Atmospheric and Ocean Research Institute (AORI), National Institute for Environmental Studies (NIES), and Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Japan | Anomaly initialization | 6 | 1850/01 to 2005/12
CanCM4 (i1) | Canadian Centre for Climate Modelling and Analysis (CCCma), Canada | Full-field initialization | 10 | 1961/01 to 2005/12
HadCM3 (i2 and i3) | Met Office Hadley Centre, United Kingdom | i2 = anomaly initialization; i3 = full-field initialization | 10 | 1859/12 to 2005/12
GFDL-CM2p1 | National Oceanic and Atmospheric Administration (NOAA), Geophysical Fluid Dynamics Laboratory (GFDL), USA | Full-field initialization | 10 | 1861/01 to 2040/12
MPI-ESM-LR | Max Planck Institute of Meteorology (MPI-M), Germany | Anomaly initialization | 3 | 1850/01 to 2005/12

Observed SSTA data are obtained from the Hadley Centre Global Sea Ice and Sea Surface Temperature (HadISST) data set (Rayner, 2003).

Observed gridded monthly rainfall over Australia during 1961–2010 are obtained from the Australian Water Availability Project (Jones et al., 2009), at a 0.5° × 0.5° resolution.

2.2 SSTA Indices

The choice of SSTA indices here is governed by two factors. The first factor is the relationship each index has with seasonal rainfall on a concurrent basis. The second factor is the predictability each index exhibits based on CMIP5 decadal hindcasts. An SSTA index with low predictability but high correlation with rainfall is of limited use in forecasting for the long lead times of interest. This is a key reason why dynamical precipitation simulations are of limited value in formulating seasonal precipitation forecasts, as their distinction from noise disappears within a few months from initialization (Mehrotra et al., 2014).

Ten SSTA CIs that have been shown to have a significant relationship with Australian rainfall are considered as potential predictors. They are Niño 3, Niño 4, Niño 3.4, the El Niño Modoki index (EMI), the Dipole Mode index (DMI), the Indian Ocean East Pole Index (EPI), the Indian Ocean West Pole Index (WPI), the Indonesian Index (II), the Tasman Sea Index (TSI), and the TBV (see Table 2 for details). The first nine indices have been used as a basis for operational seasonal forecasts (Schepen et al., 2012), while the TBV has recently been shown to have significant relationships with Australian rainfall by Choudhury, Sen Gupta, et al. (2016). Many of these indices are also useful predictors for rainfall in other parts of the world. The anomalies for the observations (HadISST) are calculated with respect to an 1871–2015 climatology, and those for the decadal experiments are calculated from their respective historical runs (usually 1860 to 2005).

Table 2. Climate Indices Used in the Study
Climate index Description Group
Niño 3 Average SSTA over 5°N to 5°S, 150–90°W Pacific
Niño 4 Average SSTA over 5°N to 5°S, 160°E to 150°W Pacific
Niño 3.4 Average SSTA over 5°N to 5°S, 170–120°W Pacific
EMI (C) Average SSTA over 10°N to 10°S, 165°E to 140°W Pacific
EMI (E) Average SSTA over 5°N to 15°S, 110–70°W Pacific
EMI (W) Average SSTA over 20°N to 10°S, 125–145°E Pacific
EMI (El Niño Modoki Index) EMI (C) − 0.5 (EMI (W) + EMI (E)) Pacific
WPI (Indian Ocean West Pole Index) Average SSTA over 10°N to 10°S, 50–70°E Indian
EPI (Indian Ocean East Pole Index) Average SSTA over 0°N to 10°S, 90–110°E Indian
DMI (Dipole Mode index) WPI-EPI Indian
II (Indonesian index) Average SSTA over 0°N to 10°S, 120–130°E Indian
TSI (Tasman Sea index) Average SSTA over 30–40°S, 150–160°E Extratropical
TBV (A) Average SSTA over 15°N to 15°S, 40°W to 60°E Atlantic-Indian
TBV (P) Average SSTA over 15°N to 15°S, 180–150°W Pacific
TBV (Tropical Transbasin Variability Index) TBV (P) − TBV (A) Pacific-Atlantic Transbasin
  • Note. See Schepen et al. (2012), Chikamoto et al. (2015), and Choudhury, Sen Gupta, et al. (2016) for details. The indices are calculated from monthly mean SSTA. The anomalies for HadISST are calculated with respect to 1871–2015, and those for the decadal experiments are calculated from their respective historical runs (Table 1). SSTA = sea surface temperature anomaly.
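To make the index definitions in Table 2 concrete, the box averages and composite indices can be computed from a gridded monthly SSTA field along the following lines. This is a sketch with illustrative function names; longitudes are assumed on a 0–360° grid, and the TBV Atlantic-Indian box (40°W to 60°E) wraps across the prime meridian.

```python
import numpy as np

def box_mean(ssta, lats, lons, lat_range, lon_range):
    """Area-weighted mean SSTA over a lat/lon box.
    ssta: (time, lat, lon); lats, lons: 1-D coordinates, lons in [0, 360)."""
    la = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lo1, lo2 = lon_range
    if lo1 <= lo2:
        lo = (lons >= lo1) & (lons <= lo2)
    else:  # box crosses the 0/360 meridian, e.g., 40W-60E
        lo = (lons >= lo1) | (lons <= lo2)
    box = ssta[:, la, :][:, :, lo]
    w = np.cos(np.deg2rad(lats[la]))  # cosine-latitude (area) weights
    return (box * w[None, :, None]).sum(axis=(1, 2)) / (w.sum() * lo.sum())

def emi(ssta, lats, lons):
    """El Nino Modoki index: EMI(C) - 0.5 * (EMI(W) + EMI(E)), boxes per Table 2."""
    c = box_mean(ssta, lats, lons, (-10, 10), (165, 220))  # 10N-10S, 165E-140W
    e = box_mean(ssta, lats, lons, (-15, 5), (250, 290))   # 5N-15S, 110-70W
    w = box_mean(ssta, lats, lons, (-10, 20), (125, 145))  # 20N-10S, 125-145E
    return c - 0.5 * (w + e)

def tbv(ssta, lats, lons):
    """Tropical transbasin variability index: TBV(P) - TBV(A), boxes per Table 2."""
    p = box_mean(ssta, lats, lons, (-15, 15), (180, 210))  # 15N-15S, 180-150W
    a = box_mean(ssta, lats, lons, (-15, 15), (320, 60))   # 15N-15S, 40W-60E (wraps)
    return p - a
```

The simple cosine-latitude weighting above assumes no missing values inside the boxes; a masked-array or NaN-aware variant would be needed for real SST grids with land points.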

The predictability of decadal hindcast indices was ascertained using random skill as a reference, separately for the first nine indices using six CMIP5 GCMs (Choudhury et al., 2015) and for TBV using a single GCM with multiple (10) ensembles (Chikamoto et al., 2015). An assessment of this predictability is reproduced in Figure 1, with significant predictability associated with TBV up to 36 months from initialization. This high predictability of TBV could stem from the larger spatial extent of TBV and/or from the connection between the Pacific and Atlantic Oceans. The high correlation between Australian rainfall and the TBV in certain regions (Choudhury, Sen Gupta, et al., 2016; Johnson et al., 2018) offers the potential for useful rainfall forecasts that extend beyond normal seasonal timescales. The other SSTA indices exhibit shorter predictive timescales, with most becoming equivalent to “noise” within the first 9 months. This is consistent with past research on the limited ability of climate models to cross the “spring barrier” in predicting the El Niño–Southern Oscillation (Ruiz et al., 2005).

Figure 1. Predictability horizons of the climate indices used in this study (the values for the first nine indices are from Choudhury et al., 2015; those for TBV are from Choudhury, Sen Gupta, et al., 2016, blue bar, and from Chikamoto et al., 2015, gray bar). The predictability horizon is obtained by comparing model errors with a distribution of random model errors, derived from a Monte Carlo resampling of 10,000 times, for each calendar month. The predictability horizon for an index and model is the month before the model error exceeds the 5% level of the random model error distribution. Further details can be found in Choudhury et al. (2015). MME = multimodel average; EMI = El Niño Modoki index; DMI = Dipole Mode index; EPI = Indian Ocean East Pole Index; WPI = Indian Ocean West Pole Index; II = Indonesian Index; TSI = Tasman Sea Index; TBV = Tropical Transbasin Variability Index.
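The predictability-horizon test can be sketched as follows, assuming an RMSE error measure and a permutation-style resampling to build the random-error distribution. The exact error metric and resampling scheme follow Choudhury et al. (2015) and may differ in detail from this illustration; the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def predictability_horizon(fcst, obs, n_mc=10000, alpha=0.05):
    """fcst, obs: (n_starts, n_leads) index values.
    Returns the last lead (0-based) at which the forecast RMSE is still below
    the alpha-quantile of a Monte Carlo distribution of 'random' RMSEs, built
    by breaking the forecast-target pairing at that lead via permutation."""
    _, n_leads = fcst.shape
    horizon = -1  # -1 means no skill beyond random even at the first lead
    for l in range(n_leads):
        rmse = np.sqrt(np.mean((fcst[:, l] - obs[:, l]) ** 2))
        rand = np.empty(n_mc)
        for m in range(n_mc):
            shuffled = rng.permutation(fcst[:, l])  # random forecast-target pairing
            rand[m] = np.sqrt(np.mean((shuffled - obs[:, l]) ** 2))
        if rmse < np.quantile(rand, alpha):
            horizon = l  # still significantly better than random at this lead
        else:
            break
    return horizon
```

In the paper's setup this comparison is done per calendar month and per index/model, with 10,000 resamples; here a single pooled series is tested for brevity.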

2.3 HLC Model

The HLC is a simplified version of the “bridging” (BJP-BMA) method, the statistical approach used operationally by the Australian BoM to forecast seasonal rainfall from CIs (Schepen & Wang, 2015). The HLC hierarchically combines linear regression models of predictands (such as rainfall) and predictors (such as CIs), using weights representative of the error covariance in each regression model, in a double (nested) leave-one-out cross-validation framework. Unlike a multiple linear regression model, the HLC uses individual linear regression models of the predictors and the predictand, while the combining weights are calculated based on the errors in each of these models (equation 2). Given the limited amount of data available, it is imperative to use a strict cross-validation scheme to obtain robust results. We assume that the values of rainfall and climate indices are all known at the start of the forecast (Lead 0) and then progressively forecast the rainfall values at subsequent lead times using n individual linear regressions, one for each of the n CI predictors (called xi). These are then combined using weights representative of the errors in each of these linear regressions, ascertained based on the covariance of the errors across all linear regressions developed. Use of the error covariance as the basis for this combination approach ensures that common information in the models being combined does not get overweighted. For each grid point and season, rainfall at a particular lead time (say, l) is assumed to depend on the initial observed CIs (at lead 0, xi,0) and the concurrent MME CIs (at lead l, xi,l). A schematic of this rainfall prediction model is shown in Figure 2, followed by a detailed explanation of the process.

Figure 2. Schematic of the hierarchical linear combination model. The final prediction at lead l, Ŷl, consists of two component models: the empirical model, Ye, and the dynamical model, Yd. Ye uses lagged observed values at the start of the forecast (xi,0) to predict the response at subsequent leads, while Yd uses MME values at concurrent leads (xi,l) to predict the response at those lead times. Here, ne and nd are the numbers of predictors used for the empirical and dynamical components, respectively. The top and middle rows show the skill of the empirical (Ye) and dynamical (Yd) components at predicting the response as lead time increases. The bottom row shows the weights we and wd, representative of the skill of Ye and Yd, used to combine the two component models; the current prediction system requires the weights to always sum to 1. The skill of the two models in the top two rows is represented as distributions of the predictions (brown curves) with respect to the observations (black vertical line). The performance of Ye deteriorates faster with lead time because the lagged relationships between the observed predictors and the predictand decay rapidly with lead time. Yd initially has lower skill than Ye, since the MME predictors are not perfect; however, as lead time increases, the skill of the concurrent MME predictors (Yd) overtakes that of the lagged observed predictors (Ye). This relative skill of the two component models is also reflected in the values of we and wd in the bottom row. Note that at Lead 5 both components get similar weights although neither shows much skill; this is because the current prediction model requires the weights to always sum to 1. MME = multimodel average.
Consider Yl to be the response (a function of the rainfall amount) at a lead time of l from initialization. As represented in Figure 2, the predictive models formulated for each of the climate indices can be classified into two sets. These are
Y_l^e = G^e(x_{1,0}, x_{2,0}, …, x_{n_e,0}) + e_l^e ;  Y_l^d = G^d(x_{1,l}, x_{2,l}, …, x_{n_d,l}) + e_l^d    (1)
In the notation above, superscript e denotes empirical and represents the empirical (lagged) relationship established between observed CIs at lead time 0 and the response at lead l, while superscript d denotes dynamical and uses the drift corrected modeled indices at the concurrent lead time l, ascertained as a MME to reduce structural biases in individual model simulations. G is a vector of the regression coefficients while x is a vector of individual predictors (CIs). The residual error term e is estimated independently for each lead time (l). Note that dynamical, here, simply refers to predictor values at the same lead time as the predictand. These dynamical values are still used in a statistical modeling framework. n represents the total number of predictors (CIs) used for each timestep. Each of the respective residual error terms can be ascertained using a robust cross-validation procedure (Chowdhury & Sharma, 2009). The two modeled responses can then be combined through a Best Linear Unbiased Estimator of the following form:
Ŷ_l = w_l Y_l^e + (1 − w_l) Y_l^d    (2)
where w_l = (σ_d² − σ_ed) / (σ_e² + σ_d² − 2σ_ed) is the weight associated with the first (empirical) prediction model, we, and (1 − w_l) is the weight of the second (dynamical) prediction model, wd; here σ_e² and σ_d² are the error variances of the two prediction models and σ_ed is the covariance of their errors. Use of the error covariance as the basis for this combination approach ensures that common information in the models being combined does not get overweighted. More details can be found in Bates and Granger (1969) and Wasko et al. (2013), with extensions to higher-order combinations reported in Kaiser Khan et al. (2014).
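A minimal sketch of this error-covariance (Bates-Granger) weighting follows, with illustrative names; the error series would come from the cross-validation procedure described in the text, and the operational scheme extends this to higher-order combinations.

```python
import numpy as np

def blue_combine(y_e, y_d, err_e, err_d):
    """Combine empirical and dynamical predictions with Bates-Granger weights.

    y_e, y_d     : predictions from the empirical and dynamical component models
    err_e, err_d : cross-validation error series of each component model
    Returns (combined prediction, weight on the empirical model).
    """
    cov = np.cov(err_e, err_d)  # 2x2 error covariance matrix
    s_ee, s_dd, s_ed = cov[0, 0], cov[1, 1], cov[0, 1]
    # weight on the empirical model; the covariance term keeps shared
    # information from being overweighted
    w = (s_dd - s_ed) / (s_ee + s_dd - 2.0 * s_ed)
    w = np.clip(w, 0.0, 1.0)  # weights stay in [0, 1] and sum to 1
    return w * y_e + (1.0 - w) * y_d, w
```

When the empirical model's errors are much smaller than the dynamical model's (short leads), w approaches 1; as the lagged relationships decay with lead time, the weight shifts toward the concurrent MME model.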
In this study, to maintain consistency with the BOM's operational seasonal prediction model, the form of the two component models, Ge() and Gd(), is assumed to be weighted general additive. This is based on the bridging component (BMA procedure) in BoM forecasting system for merging independent forecasts with relatively stable weights, as defined by Q J Wang et al. (2012). Specifically, each such model is defined as follows:
G(x) = Σ_{i=1}^{n} ω_i (m_i x_i + c_i)    (3)
where mi and ci are the regression coefficients (slope and intercept, respectively) for a prediction model using xi as a predictor, and ωi is the weight assigned to that individual model.
This rainfall prediction model (M) is run at a monthly time step. All 10 CIs are considered from the observations. However, since we are interested in annual timescales, we have dropped indices whose predictability in the decadal experiments is less than 6 months (as per Figure 1; Choudhury et al., 2015). That is, decadal predictions of the four indices with the lowest predictability timescales (i.e., DMI, EPI, II, and TSI) are excluded from the dynamical component of the HLC model. In order to assess the ability of the model to forecast into the future, a cross-validation framework is adopted. The framework description uses the following convention:
  1. The components relevant to the dynamical model are denoted by d and use concurrent predictors at lead l.
  2. The components relevant to the empirical model are denoted by e and use lagged predictors at Lead 0.
  3. The predictors from the observations have “Obs” as a subscript while those from the decadal multimodel ensemble have “MME.”
In a leave-one-out cross-validation scheme, the HLC procedure involves the following steps:
  1. As mentioned in section 2.1, a total of 41 overlapping decades (1960–1970, 1961–1971, … , 2000–2010) are considered in this study.
  2. As a first step, both rainfall and CIs are transformed using the extended form of the Box-Cox transform, introduced by Yeo and Johnson (2000), following the “bridging” approach of Q J Wang et al. (2009). They have shown this method of normalizing the data to perform best for Australian rainfall, with the approach being an integral part of the seasonal forecasting procedure adopted by BoM. The transformation coefficient for the rainfall is noted. The discussion henceforth assumes normalized variables are used as predictors and predictands. Similarly, the predictors and predictands represented in equations 2 and 3 earlier denote such transformed variables.
  3. For each grid point and lead time (in months), leave out one decadal forecast initialized in a specified year for validation, denoting it "val1," and use the forecasts from the remaining initializations for training, denoted "tr1." The val1 decade is the one that will finally be predicted using this model, and thus no information about its rainfall value, y_val1, is used.
  4. As a result, there are 40 × 1 training rainfall values (y_tr1; 40 being the number of years for which decadal forecasts are issued once one forecast decade is left out for cross validation), 40 × n_e training initial observed CIs (x^e_Obs, with n_e = 10 in the present study, the number of empirical, observed predictors), and 40 × n_d training concurrent (drift-corrected) dynamical multimodel-averaged CIs (x^d_MME, with n_d = 6, the SSTA indices with the longer predictability horizons; Choudhury et al., 2015). Note that the observed CIs are used at the initialization time step (denoted 0) while the dynamical CIs are used at the concurrent time step for which the forecasts are made (denoted l). Hence, the empirical case carries lagged observed information and the dynamical case carries concurrent predicted information in the predictor variables.
  5. Next perform the following sequential operations:
  a. Form individual linear regressions between the n_e + n_d (16) predictors and the predictand in a second leave-one-out cross-validation scheme, leaving one additional forecast out for validation ("val") and using the other 39 as training data ("tr"). Note the difference from "val1" and "tr1." Ascertain the linear regression coefficients, (m, c)_tr (2 × 16), between y_tr (39 × 1) and each of the 16 predictors:

ŷ_tr = m_i x_tr,i + c_i,  i = 1, …, 16    (4)

Then apply the coefficients (m, c)_tr to the validation set of predictors to obtain (n_e + n_d) estimates of the validation rainfall (1 × 16) and hence the residual error terms (1 × 16):

ε_val,i = y_val − (m_i x_val,i + c_i),  i = 1, …, 16    (5)

  b. Repeat step 5a for all 40 possible cross-validation cases (∀ val ∈ tr1) to obtain the final error estimate E (40 × 16):

E = [ε_val,i], ∀ val ∈ tr1    (6)

  c. Calculate the weights w (16 × 1) by minimizing the mean squared error loss using the covariance matrix of these errors, Σ = cov(E), as per Timmermann (2006):

w = Σ⁻¹1 / (1ᵀ Σ⁻¹ 1)    (7)

Refer to Kaiser Khan et al. (2014) for details on weight estimation using the covariance matrix of the errors. The weights are constrained to be positive and must always sum to one.

  d. Next, fit individual regressions of the entire y_tr1 (40 × 1) response vector on x^e_Obs (40 × 10) and x^d_MME (40 × 6) to obtain the new regression coefficients, (m, c)_tr1 (2 × 16; note the difference from (m, c)_tr):

ŷ_tr1 = m_i x_tr1,i + c_i,  i = 1, …, 16    (8)

  e. Then predict the originally left-out forecast (y_val1) by applying (m, c)_tr1 (2 × 16) in the 16 individual linear regressions to the observed CI values at the start of the forecast (x^e_Obs,val1, 1 × 10) and the concurrent CIs from the MME (x^d_MME,val1, 1 × 6) for that particular decade:

ŷ_val1,i = m_i x_val1,i + c_i,  i = 1, …, 16    (9)

  f. Combine these independent regression estimates using the weights w from step 5c to get the predicted rainfall per lead time l and per grid point, denoted ŷ_val1:

ŷ_val1 = Σ_{i=1}^{16} w_i ŷ_val1,i    (10)

  6. Use the transformation coefficient from Step 2 to inverse-transform the predicted rainfall values to obtain the final rainfall forecast for each month and grid point.
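The weight estimation of step 5c can be sketched in a few lines. The unconstrained minimum-MSE solution is w = Σ⁻¹1/(1ᵀΣ⁻¹1); the clip-and-renormalize step below is a simple stand-in for the positivity constraint (the authors' exact constrained estimator may differ), and the function name is ours:

```python
import numpy as np

def combination_weights(errors):
    """Step 5c / equation (7): MSE-minimizing combination weights from
    an (n_cases x n_models) matrix of cross-validated residuals
    (40 x 16 in the paper)."""
    sigma = np.cov(errors, rowvar=False)
    n = sigma.shape[0]
    ones = np.ones(n)
    # Solve Sigma * w0 = 1 instead of inverting; a small ridge guards
    # against a near-singular error covariance matrix.
    w0 = np.linalg.solve(sigma + 1e-8 * np.eye(n), ones)
    w = w0 / (ones @ w0)
    w = np.clip(w, 0.0, None)  # simple stand-in for the positivity constraint
    return w / w.sum()
```

Because the full covariance matrix (not just the error variances) enters the solution, predictors whose errors are highly correlated share weight rather than being counted twice.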

While the rainfall forecasting model is run for each month, results are presented as annual rainfall means, since the study is focused on investigating rainfall prediction at interannual timescales.
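Putting steps 2–6 together, the double leave-one-out procedure for one grid point and one lead time might look as follows. This is an illustrative sketch under our own simplifications (function and variable names are not from the paper; the inverse Yeo-Johnson back-transform of step 6 is omitted for brevity, so the prediction is returned in transformed space):

```python
import numpy as np
from scipy.stats import yeojohnson

def hlc_predict(y, X, val1):
    """y: (41,) decadal rainfall values; X: (41, 16) predictors
    (10 lagged observed CIs followed by 6 concurrent MME CIs);
    val1: index of the decade held out for final validation."""
    yt, lam = yeojohnson(y)                  # step 2: normalize
    idx = np.arange(len(y))
    tr1 = idx[idx != val1]                   # step 3: outer split
    n_pred = X.shape[1]
    # steps 5a-5b: inner leave-one-out residuals per predictor
    errs = np.empty((len(tr1), n_pred))
    for k, val in enumerate(tr1):
        tr = tr1[tr1 != val]
        for i in range(n_pred):
            m, c = np.polyfit(X[tr, i], yt[tr], 1)
            errs[k, i] = yt[val] - (m * X[val, i] + c)
    # step 5c: weights from the error covariance (equation (7))
    sigma = np.cov(errs, rowvar=False) + 1e-8 * np.eye(n_pred)
    w0 = np.linalg.solve(sigma, np.ones(n_pred))
    w = np.clip(w0 / w0.sum(), 0.0, None)
    w /= w.sum()
    # steps 5d-5f: refit on all of tr1, predict val1, combine
    pred = 0.0
    for i in range(n_pred):
        m, c = np.polyfit(X[tr1, i], yt[tr1], 1)
        pred += w[i] * (m * X[val1, i] + c)
    return pred, lam                         # step 6 would invert the transform
```

Looping `val1` over all 41 decades yields the full set of cross-validated forecasts whose correlation with observations is reported in section 3.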

The above-mentioned rainfall prediction procedure uses a combination of lagged observed information and concurrent predicted information to form the rainfall forecasts. To reiterate, the empirical component uses observed SSTA indices at lead 0 while the dynamical component uses predictors at the concurrent lead time. The current prediction model is therefore denoted M_{eObs+dMME} in the remainder of this study. Two other variants are also considered:
  1. M_{eObs}: Predicting rainfall using the same HLC method, but considering only the initial observed SSTA values at the start of the forecast (x^e_Obs) as predictors. Without any information about concurrent forecast SSTA indices, this amounts to predicting rainfall from lagged SSTA-rainfall relationships and serves as the lower bound on predictability.
  2. M_{eObs+dObs}: Predicting rainfall using the same HLC method, but considering concurrent observed SSTA indices (x^d_Obs) instead of the concurrent forecast indices (x^d_MME), alongside the lagged observed SSTA indices (x^e_Obs). This indicates the skill that could be achieved if our decadal forecasts were perfect and serves as the upper limit of our prediction ability.

It should be noted that M_{eObs+dMME} and M_{eObs+dObs} have 16 predictors, while M_{eObs} has 10. The HLC approach is also more parsimonious than a multivariate regression model with as many predictor variables: individual regression models are fitted for each predictor, and the model weights only modulate the relative importance of each such model, so the final prediction retains the same stability as the individual regression models.

3 Analysis, Results, and Discussion

In this study, primary results are presented for Queensland (QLD) and the Northern Monsoon Region (NMR), which occupy the largest areas (21% and 20%, respectively) among the 10 Australian climate regions defined by Timbal et al. (2008; Figure 3). These two regions have relatively high rainfall and strong seasonality and are important for agricultural production. They have also been shown to have a strong relationship with tropical Indo-Pacific SST (Kaiser Khan et al., 2014) and the highest skill in predicting next-season rainfall from lagged climate indices using the BMA approach (the bridging component of the BoM forecasting model; Q. J. Wang & Robertson, 2011). The performance of the different prediction models (M_{eObs}, M_{eObs+dObs}, M_{eObs+dMME}) is assessed using the correlation between observed and predicted rainfall over the regions. The aim of the current study is not to develop the best rainfall prediction model but to quantify the merit of using CMIP5 decadal outputs in rainfall prediction and to estimate the improvement limits that could be expected if the decadal prediction experiments were perfect.

Figure 3.
The boundaries of the ten climate zones: Tasmania (TAS), Southwest of Western Australia (SWA), Nullarbor Plain (NUL), the Southwest of Eastern Australia (SEA), the Southern part of the Murray-Darling basin (SMD), the South-East Coast (SEC), the Mid-East Coast (MEC), Queensland (QLD), the Northern Monsoon Region (NMR), and the Northwest of Western Australia (NWA). These 10 climate zones are based on the rotated Empirical Orthogonal Functions for rainfall suggested by Drosdowsky (1993b) and adopted by Timbal et al. (2008) for the BoM statistical downscaling model technical report. These climate zone definitions have also been used previously for Australian rainfall prediction skill studies such as Mehrotra et al. (2014). Results in this study are presented for the QLD and NMR regions.

We expect M_{eObs} to have its highest skill at short lead times, with skill reducing over time as the lagged relationship between the indices and rainfall diminishes (Figure 2). M_{eObs+dObs} should have the highest skill among the three prediction models and maintain a high skill score at all lead times. We would expect M_{eObs+dMME} to lie between these two extremes. Specifically, M_{eObs+dMME} should (i) have skill similar to M_{eObs+dObs} (and M_{eObs}) at the start of the forecast, since the GCM forecasts have their highest skill there; (ii) decay with increasing lead time, while remaining better than M_{eObs} and lower than M_{eObs+dObs}; and (iii) eventually become statistically indistinguishable from M_{eObs} as the GCM predictions lose skill and stop adding useful information to the prediction model. At higher leads, we would expect more predictors with low weights (since the weights must sum to 1) and no preference between empirical and dynamical predictors, as neither would provide useful information.

Figure 4 shows the areal mean correlation skills of the three prediction models (M_{eObs}, M_{eObs+dObs}, M_{eObs+dMME}) for annual mean rainfall over QLD and NMR. The gray dashed line marks the 90% significance level for correlation corresponding to 39 degrees of freedom (0.257, since we have 41 decades in total). M_{eObs} starts with a correlation skill of around 0.4 and 0.3 for QLD and NMR, respectively, which drops to close to zero with increasing lead time; it shows significant correlation with observed rainfall only at a lead of 1 year for both regions. M_{eObs+dObs} shows significant skill at all five leads. The skill of M_{eObs+dObs} should ideally not change with lead time, since the concurrent relationship between observed predictors and rainfall is independent of time; however, factors unrelated to the predictors that affect rainfall can become more or less important over time, leading to the small deviations seen in the prediction skill. The skill of M_{eObs+dObs} is between 0.42 and 0.52 for QLD and between 0.3 and 0.4 for NMR. This potential skill of rainfall forecasts is greater for QLD than NMR because of the stronger statistical relationship between QLD rainfall and the CIs (Khan et al., 2015).

Figure 4.
Model prediction skill for (a) QLD and (b) NMR regions. Correlation between the annual mean observed and predicted rainfall (based on 41 decadal forecasts) using M_{eObs} (model based only on initial observed CI values; red, lower limit), M_{eObs+dObs} (model based on initial observed CI values and concurrent observed CI values; blue, upper limit), and M_{eObs+dMME} (model based on initial observed CI values and concurrent MME CI values; black) at five annual leads. The dashed gray line represents the 90% significance level of correlation values with 39 degrees of freedom (0.257). QLD = Queensland; NMR = Northern Monsoon Region; MME = multimodel average; CI = climate index.

M_{eObs+dMME} starts with correlations between those of M_{eObs} and M_{eObs+dObs} (because of the high skill of the GCMs at short lead times), then deteriorates and eventually becomes similar to M_{eObs}. This also shows that skill is being lost in the dynamical predictors even at a lead of 1 year. The biggest improvements from using M_{eObs+dMME} over M_{eObs} are seen at the early leads of 2 and 3 years, where the climate models are still skillful enough to add information over just the lagged observed indices. The biggest drop in skill occurs between leads 3 and 4, presumably because none of the predicted indices adds skill to the prediction model by then. The correlation of M_{eObs+dMME} remains significant until year 3 for QLD, even though the lagged observations (M_{eObs}) provide significant information only for the first year; the dynamical MME predictors thus add value for an additional 2 years. This is not the case for NMR, where both M_{eObs} and M_{eObs+dMME} show significant correlations only for the first year. At higher leads, the skill of M_{eObs+dMME} should collapse to that of M_{eObs}, since there is no predictability in the CIs at leads of 4 and 5 years.
However, our results show the skill of M_{eObs+dMME} to be higher than that of M_{eObs} even at leads 4 and 5, although the longest predictability among the indices is 36 months (for TBV). This could be because a combination of indices provides skill at longer lead times through as-yet-unknown physical links in the climate system, or simply because of the larger number of predictors in M_{eObs+dMME} compared to M_{eObs}. Nevertheless, the skill of M_{eObs+dMME} at these leads is an insignificant ~0.1, and hence this issue is not investigated further.

Figures 5 and 6 show the spatial distribution of the improvement in correlation skill of M_{eObs+dMME} with respect to M_{eObs}. Hatching marks the regions where the improvements in correlation skill are significant (at a 90% level based on 39 degrees of freedom). Grid cells in the northwest of NMR that had missing rainfall values in certain decades have been left out of the analysis. Both regions show the lowest fraction of significant improvements at Lead 1. For QLD, the area showing improvement increases from Lead 1 to Lead 3 and then reduces again, while the share of area improving significantly stays around 20% for leads of 2–4 years. The largest improvements occur at Lead 3, where 87% of the area improves and 26% improves significantly over M_{eObs}. There is no consistent spatial pattern in the areas that improve, although the quality of the observed rainfall data might contribute to uncertainties in the improvements (Viney & Bates, 2004). For NMR, the largest improvement is also at Lead 3 (87% of the area), although 65% of the area shows significant improvements at Lead 2. This is important: although the areal mean correlation of M_{eObs+dMME} over NMR at Lead 2 is insignificant (Figure 4b), 65% of the region shows significant improvement (Figure 6b). This suggests that if a smaller domain in the northern part of NMR were examined, enhanced predictability could be seen for the second year (unlike in Figure 4b).
To compare these results with the concurrent observations-based model, Figures S2 and S3 in the supporting information show the spatial improvement in correlation skill of M_{eObs+dObs} with respect to M_{eObs}. From Lead 2 onward, M_{eObs+dObs} shows significant improvement over around 70% of the area in both regions. Again, the area that improves is larger for QLD than NMR, which can be attributed to the stronger statistical relationships between the CIs and QLD rainfall (Khan et al., 2015). Figures S2 and S3 represent the value that the decadal prediction experiments could add through the current rainfall prediction model if they were perfect.

Figure 5.
Difference in correlation from using M_{eObs+dMME} over M_{eObs} for four annual leads over Queensland (a–d). The percentage of grid cells that improved is given for each lead. The share of grid cells showing improvements greater than 0.257 (the 90% significance level for correlation based on 39 degrees of freedom) is hatched and noted in each plot.
Figure 6.
Same as Figure 5, but for Northern Monsoon Region (NMR).

Figures 7 and 8 show the weights of the more important predictors (CIs; weights > 0.0625, the mean for 16 predictors) from M_{eObs+dMME} for different lead years for QLD and NMR. The number of predictors with high weights, and their rankings, change with lead time, reflecting which predictors provide the most information to the prediction model. Information from the other predictors may be small or may already be available from a different predictor. However, since the current prediction system requires the weights to sum to 1, predictors still receive weights at higher leads even though they may be adding very little information. Generally, the fraction of high-weight CIs that come from the MME increases with lead time, with the distinction between the two sets of predictors being weakest at leads of 1 and 4 years. With increasing lead time, MME predictors with higher predictability (Figure 1) become more important for both regions. For QLD (Figure 7), both lagged observed CIs and concurrent MME predictors get high weights at a lead of 1 year, although the fraction of lagged CIs is higher. With increasing lead time, the proportion of predictors from the concurrent MME (x^d_MME) increases, though at least the two strongest predictors are always lagged observed CIs (x^e_Obs). For NMR (Figure 8), Lead 1 shows the expected result of lagged observed indices providing the most information to the prediction model. The strongest lagged observed predictors (x^e_Obs) are DMI, TSI, II, and EMI. Schepen et al. (2012) showed EMI to be the best predictor for northern NMR and QLD in the MAM season and for southern NMR and QLD in the SON season, and II, DMI, and TSI to be the best at monthly lags over northern Australia in the DJF season, although SON rainfall was also strongly related to the ENSO indices. The best-performing predictors from the concurrent model CIs (x^d_MME) are TBV, EMI, and WPI. These indices show the best match with the average decadal time series of the observations in Figure S1, and were also shown to have the highest predictability from the current set of models (Figure 1; Choudhury et al., 2015). However, the ENSO indices from the concurrent MME CIs also show up as important predictors at certain leads in both regions, possibly because the lagged observed CIs contribute even less information to the prediction model there. For comparison, Figures S4 and S5 in the supporting information show the weights of the predictors from the concurrent observations-based model (M_{eObs+dObs}). Among the concurrent observed indices (x^d_Obs), EMI and TBV show the highest weights for all lead years; Niño 4 is the strongest concurrent predictor among the ENSO indices, as can be expected from Risbey et al. (2009), who showed northern and northeastern Australia to have the highest correlation with Niño 4 among the ENSO indices. Among the lagged indices (x^e_Obs), DMI, TSI, and II still show up as important predictors for QLD and NMR, suggesting that the weights in M_{eObs+dMME} are robust.

Figure 7.
Ranking of the predictors in M_{eObs+dMME} with high weights (greater than 0.0625, since there are 16 predictors in total) for four lead years over QLD (a–d). Note that the number of predictors with such weights differs between lead times. The lagged indices (x^e_Obs) are shown in blue ("eObs," representing initial observed CIs) while the concurrent MME predictors (x^d_MME) are in orange ("dMME," representing concurrent MME CIs). The weights of the individual predictors are given on the x axis. MME = multimodel average; EMI = El Niño Modoki index; DMI = Indian Ocean Dipole Mode Index; EPI = Indian Ocean East Pole Index; WPI = Indian Ocean West Pole Index; II = Indonesian Index; TSI = Tasman Sea Index; TBV = Tropical Transbasin Variability Index; CI = climate index; QLD = Queensland.
Figure 8.
Same as Figure 7, but for the Northern Monsoon Region (NMR).

4 Conclusions

There has been substantial research investigating predictions at the decadal timescale, primarily because of the socioeconomic benefits they could provide. Various modeling groups have run such decadal experiments and shared their outputs for researchers to investigate. While the decadal experiments show promising results, predictions of rainfall remain poor compared to those of other variables. Given the practical importance of seasonal and longer-term rainfall forecasts, this study describes an alternative way of using decadal experiments for rainfall predictions over Australia. A hierarchical model combination approach is used to combine regression models based on SSTA indices to obtain the best rainfall forecast. The model combines regressions based on both observed SSTA indices and the decadal predictions of SSTA indices using weights that are allowed to evolve freely over time, penalizing predictors with weaker relationships to rainfall. To estimate the full potential of the current model, concurrent decadal SST outputs are also replaced with observed SST values to determine the upper limit of predictability that can be obtained from the current setup (were our decadal models able to provide perfect forecasts). This study presents an application of this method for interannual rainfall prediction over Australia.

Simulations of relevant SSTA indices from the decadal experiments were used as predictors within a simplified version of the operational seasonal rainfall prediction model used by the Australian BoM, to investigate whether they add merit beyond using lagged observed SSTA indices. Analyses over two of the largest climate zones of Australia (Queensland and the Northern Monsoon Region) show the current rainfall prediction model (which includes concurrent SST predictions from the CMIP5 decadal experiments) to be significantly skillful (at a 90% level) out to a maximum of 3 years for QLD and 1 year for NMR. Although at higher leads most grid cells in the regions showed improvements over just using lagged indices as predictors, only around 20% showed significant improvements. Thus, these results in their current form may not warrant use in hydrologic planning. By comparison, if our decadal models could provide perfect forecasts, significant improvements would be expected over 70% of the regions at all leads.

It is also important to note that the current prediction model's ability to predict rainfall at longer timescales relies heavily on indices with high interannual predictability, such as TBV. While the connections between the Pacific and Atlantic Oceans are becoming increasingly clear (McGregor et al., 2018), the mechanisms giving rise to the longer predictability are not fully understood. The Atlantic-Pacific teleconnection is poorly simulated in the models, and the representation (and response) of TBV in the models is subject to large uncertainties. TBV has so far received rather limited attention, and further work is needed to examine whether the TBV-Australian rainfall relationship is stable through time.

The results presented in this study suggest that there is significant merit in adding SSTA outputs from CMIP5 decadal experiments to interannual rainfall prediction (out to 1 year ahead for NMR and 3 years for QLD), and significant improvements at higher leads might be possible for certain subregions within NMR. These two areas also have very distinct wet and dry seasons, so the interannual rainfall skill reported here may not scale down comparably to wet season rainfall prediction; however, many climate modes are phase locked to particular seasons, so higher skill could be expected for certain regions during certain seasons. These results will also be updated as the next phase of decadal prediction outputs from CMIP6 becomes available. Alongside this, there are possibilities for refining and improving these results through a range of additional investigations, such as (a) using information from the spatial distribution of prediction skill more effectively; (b) considering other predictors with longer memory, such as warm water volume or other upper ocean heat content measures, or using temporal gradients of the SST indices (instead of just spatial gradients or the indices themselves); and (c) supplementing these results from the decadal experiments with the existing seasonal forecasts from the BoM to enhance the usability of such predictions. Further, as the archives of decadal prediction outputs grow, these results for interannual rainfall prediction can be reevaluated. The methodology presented here can be used for other variables, although its efficacy may differ. As the method combines information from relevant predictors while favoring the more skillful ones, it can be used for variables with low predictability provided there are other covarying variables that are more predictable in the decadal simulations.

Acknowledgments

The World Climate Research Program's Working Group on Coupled Modelling is responsible for CMIP. We thank the climate modeling groups for producing and making available their model outputs used in our study (see Table 1, accessible online through http://pcmdi9.llnl.gov/esgf-web-fe/). For CMIP, the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. This research was undertaken with the assistance of resources from the National Computational Infrastructure (NCI), which is supported by the Australian Government. We acknowledge the Australian Research Council (ARC) for funding this research. We also acknowledge the support and encouragement from the Australian Bureau of Meteorology and the New South Wales State Water Corporation. Bellie Sivakumar also acknowledges the support from the Australian Research Council (ARC) through the Future Fellowship Grant (FT110100328). We would like to thank the three reviewers and the associate editor for their comments, which have helped improve the overall quality of the manuscript.