Contribution of land surface initialization to subseasonal forecast skill: First results from a multi-model experiment
Abstract
[1] The second phase of the Global Land-Atmosphere Coupling Experiment (GLACE-2) is aimed at quantifying, with a suite of long-range forecast systems, the degree to which realistic land surface initialization contributes to the skill of subseasonal precipitation and air temperature forecasts. Results, which focus here on North America, show significant contributions to temperature prediction skill out to two months across large portions of the continent. For precipitation forecasts, contributions to skill are much weaker but are still significant out to 45 days in some locations. Skill levels increase markedly when calculations are conditioned on the magnitude of the initial soil moisture anomaly.
1. Introduction
[2] To forecast precipitation, air temperature, and other meteorological quantities weeks to months in advance, prediction systems must take advantage of Earth system components with implicit memory or predictability at such timescales, components that can in turn transfer this predictability to the atmosphere. The ocean is thus a critical component of today's seasonal prediction systems. The ocean, however, has limited impact on continental midlatitude areas during summer [e.g., Dirmeyer et al., 2003]. In these areas, another component of the climate system, soil moisture, is accordingly more important. The timescales of soil moisture memory span weeks to a couple of months [Entin et al., 2000]. During summer, high (low) soil moisture anomalies can lead to high (low) evaporation anomalies, and the associated increased (decreased) evaporative cooling of the surface can lead to a cooling (warming) of the overlying air [Fischer et al., 2007]. Depending on conditions, the evaporation anomalies may also lead to precipitation anomalies [e.g., Beljaars et al., 1996].
[3] The use of realistic soil moisture initialization is becoming standard practice in long-range forecasting [e.g., Vitart et al., 2008], on the assumption that better initialization will improve forecasts. This assumption reflects a long history of exploration of the coupled soil moisture-atmosphere system and is supported to varying degrees by several recent uncoordinated forecasting studies [e.g., Koster et al., 2004a; Douville, 2009]. The second phase of the Global Land-Atmosphere Coupling Experiment (GLACE-2), a project jointly sponsored by the World Climate Research Programme's Global Energy and Water Cycle Experiment (GEWEX) and Climate Variability and Predictability (CLIVAR) project, is designed to evaluate that assumption for the first time in a coordinated, comprehensive, and systematic manner, with a wide variety of state-of-the-art long-range forecasting systems and with a forecast collection substantial enough for robust statistics. The overall goal of GLACE-2 is to provide, for today's models, a consensus view of the degree to which soil moisture initialization contributes to forecast skill at the subseasonal scale.
[4] First results from GLACE-2 are presented here. The results provide some optimism for the usefulness of soil moisture initialization, particularly when forecast skill calculations are conditioned on the size of the initial soil moisture anomaly.
2. GLACE-2 Forecast Experiments
2.1. Overview of Experiment
[5] The basic design of the experiment, followed (sometimes with second-order modifications) by all participating groups, involves running two parallel sets of 2-month retrospective forecasts, with the forecast for each start date comprising ten ensemble members (usually differing from each other by slight variations in atmospheric initial conditions). In the first set of forecasts, the initial land prognostic variables for all ensemble members are set to the same realistic values, using the approach described in section 2.2. In the second set, the initial land states are chosen randomly for each ensemble member from a background distribution. (For some systems, this background distribution reflects the contemporaneous sea surface temperature, or SST, distribution; for most systems, it does not.) SSTs for both sets are initialized to realistic values (Reynolds and Smith [1994], with updates), and during the forecast period, modelers either let the SSTs evolve within a coupled ocean model or let them decay to climatology using persistence timescales derived from the SST observations. For most systems, atmospheric initial conditions were derived from reanalysis for both sets.
[6] Forecasted precipitation, P, and near-surface air temperature, T, values (ensemble means) for each of the two sets are compared to observations to generate a skill score for that set. Subtracting the skill score of the second series (i.e., obtained without realistic land initialization) from that of the first series (obtained with realistic land initialization) isolates the contribution of land initialization to forecast skill, the quantity of interest in this paper.
[7] To ensure a reasonable sample for statistical analysis, both sets consist of 100 independent forecasts, one for each of ten start dates (April 1, April 15, May 1, … August 15) in each of the ten years from 1986 through 1995. In this paper, we focus on forecasts at subseasonal leads, beyond the realm of short-term weather forecasts. Forecasted P and T were averaged over days 16–30, days 31–45, and days 46–60. For the June–August forecast periods considered in this paper, this allows (for a given model) 60 independent forecasts of P and T at each of these three leads for the calculation of skill.
2.2. Estimation of Realistic Land Initial Conditions
[8] For the first series of forecasts, each modeling group produced its own set of realistic land initial conditions by driving its own land model globally offline (i.e., disconnected from the host atmospheric model) with realistic fields of precipitation, radiation, and other meteorological forcings over the years 1984–1995, essentially using the approach employed in the second phase of the Global Soil Wetness Project (GSWP-2 [Dirmeyer et al., 2006]). (FSU/COAPS used a coupled initialization approach instead.) Most groups used the forcing datasets of GSWP-2, though some used similar data extracted from Sheffield et al. [2006]. The land surface prognostic fields simulated offline for 1 June 1990, for example, were used to initialize the 1 June 1990 forecast with the prediction system.
[9] Because the climatic forcing of atmospheric models is biased relative to that of nature, however, the land fields generated offline were “scaled” before they were used to initialize the forecasts. In essence, the value of a variable produced with the offline system was converted to a standard normal deviate for the date in question, and this standardized value was combined with the corresponding mean and standard deviation of the atmospheric model to produce the initialization value. (Some groups, in fact, used a more rigorous scaling approach, and FSU/COAPS avoided scaling altogether by using coupled land surface data assimilation.) Imposed limits ensured that the soil moistures produced were not unrealistic (e.g., above porosity). This scaling ensures that a relatively wet soil moisture condition is translated, to first order, to a correspondingly wet condition within the atmospheric model's climate.
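The basic scaling step described above can be illustrated with a minimal sketch. It is not the code used by any GLACE-2 group: the function name, the argument names, the use of the sample standard deviation, and the optional limits are all assumptions made for illustration.

```python
from statistics import mean, stdev


def scale_to_model_climate(offline_value, offline_clim, model_clim,
                           lower=0.0, upper=None):
    """Map an offline land-model value into the forecast model's climate.

    offline_clim and model_clim are hypothetical multi-year samples of the
    same variable for the same calendar date (offline run vs. host model).
    """
    # Standard normal deviate relative to the offline climatology.
    z = (offline_value - mean(offline_clim)) / stdev(offline_clim)
    # Re-express the deviate with the host model's mean and spread.
    scaled = mean(model_clim) + z * stdev(model_clim)
    # Imposed limits keep the value plausible (e.g., not above porosity).
    if upper is not None:
        scaled = min(scaled, upper)
    return max(scaled, lower)
```

In this sketch, a value one standard deviation wetter than the offline mean is initialized one standard deviation wetter than the host model's mean, unless the imposed limits intervene.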
2.3. Skill Metrics
[10] The daily precipitation observations used for forecast evaluation were derived from station measurements [Higgins et al., 2000], and the daily temperature observations used were estimated by averaging the minimum and maximum daily temperatures (again derived from station observations) stored in the Hadley Centre archives (http://hadobs.metoffice.com/hadghcnd/). All model and observational data were regridded to the same 2° × 2.5° global grid prior to analysis. Furthermore, prior to analysis, all 15-day P and T values for a given model or observational source were standardized: the mean for the given time of year (for that dataset) was subtracted, and the difference was divided by the standard deviation for that time of year (for that dataset). This step was critical for the estimation of a “consensus vision” of skill in section 3.
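As an illustration of the standardization step, the sketch below converts each 15-day value to an anomaly in standard deviation units, separately for each time of year. The dictionary layout, the period labels, and the choice of the sample standard deviation are assumptions, not details taken from the GLACE-2 processing.

```python
from statistics import mean, stdev


def standardize_by_period(values_by_period):
    """Standardize 15-day averages separately for each time of year.

    values_by_period maps a (hypothetical) period label, e.g. "Jun 1-15",
    to the list of values for that period across the years of one dataset.
    """
    standardized = {}
    for period, values in values_by_period.items():
        period_mean = mean(values)
        period_std = stdev(values)
        # Subtract the seasonal mean, then divide by the seasonal spread.
        standardized[period] = [(v - period_mean) / period_std for v in values]
    return standardized
```

Because every model and the observations are standardized against their own climatologies, values from different sources become comparable and can be pooled on a single scatter plot.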
[11] Forecasted 15-day average values of P and T (at 15-day, 30-day, and 45-day lead times) falling within any of six boreal summer periods (essentially, the first and last halves of June, July, and August) were compared to the corresponding observational averages by plotting the standardized forecast/observation pairs on a scatter plot and then computing the square of the correlation coefficient (r2) between them. Before squaring, negative correlations were set to zero, since they were assumed to reflect sampling noise. The resulting r2 value represents the fraction of explained variance and is our metric of forecast skill. The difference (hereafter, rdiff2) between r2 for the forecast series with realistic land initialization and that for the series without it serves to quantify the contribution of land initialization to skill.
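The skill metric and its difference can be written compactly. The following is a minimal sketch under the definitions in this section; the function names are hypothetical, and the inputs are assumed to be already standardized forecast and observation series of equal length.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def skill_r2(forecast, observed):
    """r2 skill: negative correlations are set to zero before squaring."""
    return max(pearson_r(forecast, observed), 0.0) ** 2


def land_contribution(fcst_realistic, fcst_random, observed):
    """rdiff2: skill with realistic land initialization minus skill without."""
    return skill_r2(fcst_realistic, observed) - skill_r2(fcst_random, observed)
```

For example, a series that matches the observations perfectly, paired with a control series that is anticorrelated with them, gives an rdiff2 of 1, because the negative correlation of the control is treated as zero skill.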
2.4. Participants
[12] Ten modeling systems, itemized in Table 1, provided the data analyzed in section 3. A few systems performed only half of the forecasts, namely those beginning on the first of the month, and thus contributed 30 rather than 60 JJA forecast periods.
System Name | Reference | JJA Forecast Periods Contributed |
---|---|---|
Canadian Centre for Climate Modelling and Analysis (CCCma) CanCM3 | Scinocca et al. [2008] | 30 |
Center for Ocean-Land-Atmosphere Studies (COLA) GCM V3.2 | Misra et al. [2007] | 60 |
European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecast System | Vitart et al. [2008], Balsamo et al. [2009] (http://www.ecmwf.int/research/ifsdocs/CY33r1/index.html) | 60 |
European Centre/Hamburg forecast system (ECHAM/JSBACH) | Roeckner et al. [2003], Raddatz et al. [2007] | 30 |
Florida State University/Center for Ocean-Atmosphere Prediction Studies (FSU/COAPS) model | Shin et al. [2005], Cocke and LaRow [2000] | 60 |
Geophysical Fluid Dynamics Laboratory (GFDL) Global Atmospheric Model | GFDL Global Atmospheric Model Development Team [2004], Delworth et al. [2006] | 30 |
NASA/Global Modeling and Assimilation Office (GMAO) seasonal forecast system (pre-GEOS5 version) | Bacmeister et al. [2000] | 60 |
National Center for Atmospheric Research (NCAR) Community Atmospheric Model 3.0 | Collins et al. [2006] | 60 |
National Center for Atmospheric Research (NCAR) Community Atmospheric Model 3.5/Community Land Model 3.5 | Neale et al. [2008], Oleson et al. [2008] | 60 |
National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS/Noah) | Moorthi et al. [2001], Ek et al. [2003] | 60 |
3. Results
[13] The contribution of land initialization to skill (rdiff2) differs significantly among the models, as will be addressed in an upcoming GLACE-2 overview paper. Here, we provide some key results from a multi-model analysis. We focus now on the United States region of North America, a region for which two key considerations are met: the models are known to have substantial intrinsic land-related precipitation variability [Koster et al., 2004b], and observational data are known to be comprehensive.
3.1. Consensus Estimate of Land Contribution to Skill
[14] Figures 1a and 1b (All Dates) show the rdiff2 levels for precipitation and temperature, respectively, produced when the standardized forecasts of all of the models are plotted against observations on the same scatter plot. (Note the unevenly spaced levels near the zero value on the color bar.) The dots on Figures 1a and 1b indicate grid cells for which the plotted rdiff2 values differ significantly from zero at the 95% confidence level. The particular rdiff2 value associated with this significance level varies geographically and was computed with a Monte Carlo analysis that accounts for correlations in model behavior, which reduce the effective degrees of freedom – given these correlations, the effective degrees of freedom for N models and 60 forecasted periods may be less than 60N. (In the Monte Carlo analysis, to test the null hypothesis of zero land-related skill, sets of contemporaneous model forecasts were compared to repeated shufflings of the standardized observational data.) Statistical significance can also be gauged qualitatively by comparing the amounts of positive and negative values in each plot. Under the null hypothesis of zero skill from land initialization, roughly the same number of negative values should appear as positive values; thus, if positive values overwhelmingly dominate a plot, we can reasonably conclude that land initialization-derived skill is indeed nonzero.
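The shuffling test can be sketched as follows for a single grid cell. This is a simplified illustration, not the analysis code: it assumes every model supplies a forecast for every observed period, pools all models on one scatter plot, and shuffles the observations once per iteration so that contemporaneous model forecasts stay together. The function names, the number of shuffles, and the seed are hypothetical.

```python
import random
from math import sqrt


def r2_clipped(x, y):
    """Squared Pearson correlation, with negative correlations set to zero."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return max(sxy / sqrt(sxx * syy), 0.0) ** 2


def pooled_rdiff2(realistic, randomized, observed):
    """rdiff2 from one scatter plot pooling all models.

    realistic and randomized are lists of per-model forecast series,
    each aligned period-by-period with observed.
    """
    obs = [o for _ in realistic for o in observed]
    with_land = [v for series in realistic for v in series]
    without_land = [v for series in randomized for v in series]
    return r2_clipped(with_land, obs) - r2_clipped(without_land, obs)


def null_threshold(realistic, randomized, observed,
                   n_shuffles=1000, level=0.95, seed=0):
    """rdiff2 value exceeded by chance in only (1 - level) of the shuffles."""
    rng = random.Random(seed)
    shuffled = list(observed)
    null_values = []
    for _ in range(n_shuffles):
        # One shuffle is shared by all models, preserving their mutual
        # correlations while breaking any link to the observations.
        rng.shuffle(shuffled)
        null_values.append(pooled_rdiff2(realistic, randomized, shuffled))
    null_values.sort()
    return null_values[min(int(level * n_shuffles), n_shuffles - 1)]
```

An rdiff2 value computed from the unshuffled observations would then be flagged as significant at a grid cell if it exceeds the threshold returned for that cell.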

[15] For precipitation, rdiff2 levels for the 15-day averages at the 15-day lead (days 16–30) and 30-day lead (days 31–45) are positive and significantly different from zero within a west-east swath that cuts across the continent. The magnitude of rdiff2 in this swath, however, is quite small: somewhere between 0.01 and 0.05. (Note that these are differences between small numbers that themselves decrease with lead time.) In other words, the multi-model consensus view is that precipitation forecasts are improved by realistic land initialization in this area, but only by a small and possibly inconsequential amount. Even slighter skill improvements are seen for the 15-day average at the 45-day lead.
[16] For air temperature, Figure 1b (All Dates) shows that at all three leads (i.e., out to the 45-day lead, for days 46–60), land initialization's contribution to skill is larger and significant across much of the continent. The rdiff2 values often exceed 0.05 and sometimes exceed 0.1.
3.2. Conditional Skill Levels
[17] The multi-model scatter plots underlying the rdiff2 calculations discussed above form the starting point for a conditional skill analysis: the quantification of P and T forecast skill conditioned on the size of the initial (and local) soil moisture anomaly. The idea is that more extreme values of soil moisture initial conditions may have a more substantial impact on a forecast than non-extreme values (values close to the long-term mean for the given time of year). In the multi-model scatter plot, an rdiff2 value can be calculated for a selected subset of the points – those points for which the initial root zone soil moisture in the local area (the 3 × 3 grid cell region centered on the point in question) lies at the dry or wet ends of the full spectrum of simulated values. Here, we examine rdiff2 for three different subsets of the full suite of forecasts: (1) those forecasts for which the initial soil moisture lies in either the upper or lower tercile of all simulated values, a total of 40 independent forecast start dates for each model; (2) those for which the initial soil moisture lies in the first or fifth quintile of all simulated values, a total of 24 independent forecast start dates; and (3) those for which the initial soil moisture lies in the first or tenth decile of all simulated values, a total of 12 independent forecast start dates. Of course, the rdiff2 calculations for the third case are based on more than 12 forecasts, since the 12 forecasts from a number of models are considered together in these scatter plots.
[18] This sub-setting approach implicitly assumes that soil moisture impacts are mostly local; analyses of potential remote impacts are saved for future study. It is important to note that with this “local” assumption, the rdiff2 levels produced at different grid cells are based on a different selection of forecast dates (although a significant spatial correlation of soil moisture will limit these differences). To avoid the further complication of having different start dates used for different models (based on the different, model-dependent states generated in the GLACE-2 initialization procedure), the start dates used in each sub-setting are derived from an independent analysis of soil moisture [Koster et al., 2009] for the 1986–1995 period. Daily root zone soil moistures from seven land models participating in GSWP-2, many of which did not participate in GLACE-2, were standardized and then averaged on the day prior to each of the forecast start dates. The 60 averages so obtained were accordingly ranked and sampled.
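The ranking and sampling of start dates can be illustrated with a short sketch. It assumes the 60 standardized soil moisture averages for a grid cell are available as a list; the function name and the tie-handling implied by a simple sort are assumptions.

```python
def extreme_start_dates(anomalies, n_bins):
    """Indices of start dates in the driest and wettest of n_bins equal bins.

    anomalies: one standardized soil moisture value per start date.
    n_bins: 3 for terciles, 5 for quintiles, 10 for deciles.
    """
    n = len(anomalies)
    per_bin = n // n_bins
    # Rank start dates from driest to wettest.
    order = sorted(range(n), key=lambda i: anomalies[i])
    # Keep the lowest and highest bins, returned in chronological order.
    return sorted(order[:per_bin] + order[n - per_bin:])
```

With 60 values, this selection yields 40, 24, and 12 start dates for terciles, quintiles, and deciles, respectively, matching the counts given in the text. The rdiff2 calculation is then repeated using only the forecast/observation pairs for the selected dates.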
[19] Results from the conditional skill analysis are shown under Extreme Terciles, Extreme Quintiles, and Extreme Deciles in Figures 1a and 1b. For both P and T, rdiff2 levels tend to increase with the “extreme nature” of the initial soil moisture anomaly. The implication is straightforward: a more extreme initial anomaly is more likely to contribute to forecast skill. Noise, however, also increases as larger extremes are considered because fewer data points are used to compute the statistics; the significance levels for each sub-setting were recomputed to reflect the smaller sample size.
[20] For P, rdiff2 in some places increases to 0.05 or more when conditioning on extreme quintiles or deciles, even at the 45-day lead (days 46–60). The increases are more striking for T. When considering extreme deciles, rdiff2 for temperature exceeds 0.25 in many places, for all leads. For both P and T, the positive rdiff2 values generally strongly dominate the negative values, further supporting the idea that these skill increases are real, i.e., that realistic land initialization does contribute skill to subseasonal forecasts.
4. Summary and Discussion
[21] GLACE-2 is designed to address, with a broad suite of forecast systems and an extensive sample of forecast start dates, a fundamental question in subseasonal forecasting: to what extent does realistic land surface initialization increase the skill of forecasts? Here, we present the first results from GLACE-2 in the form of data composited across the participating models. Figure 1 (All Dates) suggests that over North America, realistic initialization has a very small, if sometimes significant, impact on precipitation forecasts out to a 30-day lead (days 31–45) and a somewhat larger impact on temperature forecasts, even out to a 45-day lead (days 46–60). Despite their small size, the skill contributions are important given the small skill levels obtained at these leads with prediction systems, particularly in midlatitudes during summer. (Such small skill levels are seen, for example, in the GLACE-2 results that do not use land initialization and in 31-day-lead DEMETER and CFS forecasts, as recently processed by Lavers et al. [2009].)
[22] The GLACE-2 framework lends itself to a variety of additional analyses, including studies of intrinsic land-related model predictability, local versus remote land surface impacts, probabilistic forecasting, optimal weighting strategies for multi-model forecasts, and the potential for asymmetry in land impacts – perhaps dry anomalies contribute more to skill than wet anomalies. Such questions will be addressed in future work. Here, we address the question of conditional forecast skill. At the beginning of a forecast, a forecaster would know the magnitude of the initial soil moisture anomaly in a region of interest. If that anomaly is large – if it lies, for example, in an extreme tercile, quintile, or decile of its background distribution – the ensuing forecast can be interpreted in the context of the conditional skill contributions shown in Extreme Terciles, Extreme Quintiles, and Extreme Deciles of Figure 1. A larger initial soil moisture anomaly is more likely to contribute to a more accurate precipitation and air temperature forecast, particularly in the regions indicated in Figure 1. When considering the benefit of realistic land initialization to subseasonal prediction, the larger conditional skill contributions in Figure 1 are a particular source of optimism.
Acknowledgments
[23] Coordinators of the GLACE-2 project gratefully acknowledge financial support from NOAA's Climate Prediction Program for the Americas and NASA's Terrestrial Hydrology program. The various participants in the project (see Table 1) were able to perform the GLACE-2 simulations thanks to financial and computational support from their home institutions and/or from the institutions hosting the numerical prediction systems. We thank WCRP's GEWEX and CLIVAR projects for their sponsorship of this project.