Model forecast skill and sensitivity to initial conditions in the seasonal Sea Ice Outlook
Abstract
We explore the skill of predictions of September Arctic sea ice extent from dynamical models participating in the Sea Ice Outlook (SIO). Forecasts submitted in August, at roughly 2 month lead times, are skillful. However, skill is lower in forecasts submitted to SIO, which began in 2008, than in hindcasts (retrospective forecasts) of the last few decades. The multimodel mean SIO predictions offer slightly higher skill than the single-model SIO predictions, but neither beats a damped persistence forecast at longer than 2 month lead times. The models are largely unsuccessful at predicting each other, indicating a large difference in model physics and/or initial conditions. Motivated by this, we perform an initial condition sensitivity experiment with four SIO models, applying a fixed −1 m perturbation to the initial sea ice thickness. The significant range of the response among the models suggests that different model physics make a significant contribution to forecast uncertainty.
1 Introduction
The rapid loss of Arctic sea ice, especially during the summer, that has taken place in recent decades has resulted in growing interest in the predictability of sea ice on seasonal timescales, in part spurred by expanding socioeconomic activities in the region. Since 2008, the Study of Environmental Arctic Change has led an effort to collect and synthesize September Sea Ice Outlooks (SIOs) from the Arctic research community. The outlooks are predictions of September sea ice extent produced in late spring and early summer using a range of methods (heuristic, statistical, and dynamical models) and published on three separate dates in early June, early July, and early August. Results from SIO are collected and summarized by the Sea Ice Prediction Network (SIPN) and published on the Internet by the Arctic Research Consortium of the United States (see http://www.arcus.org/sipn/sea-ice-outlook).
Forecast skill in SIO over 2008–2013 was evaluated by Stroeve et al. [2014]. To measure skill, individual submissions to SIO were considered as single deterministic forecasts. Overall, skill was only marginally better than a linear trend forecast, and statistical predictions were found to have slightly higher skill than dynamical model predictions. Unexpectedly, forecast skill did not significantly improve as the forecast lead time decreased.
While the application of dynamical models to real-world sea ice prediction is in its infancy, considerable progress has been made over the last decade in assessing and understanding potential predictability in dynamical models, particularly fully-coupled global climate models (GCMs), using a “perfect-model” approach [e.g., Koenigk and Mikolajewicz, 2009; Holland et al., 2010; Blanchard-Wrigglesworth et al., 2011a; Day et al., 2014a]. These studies agree that significant initial-value predictability is about 1–2 years for sea ice area (or extent) and 3–4 years for sea ice volume, while forced predictability (arising from external forcing) emerges about 5 years after forecast initialization. Thus, seasonal sea ice forecasting is an initial-value problem. Additionally, several studies have recently found skill in retrospective forecasts (hindcasts) of September sea ice extent from spring or early summer initializations over the recent satellite-based observation record [e.g., Sigmond et al., 2013; Chevallier et al., 2013; Wang et al., 2013; Peterson et al., 2014; Msadek et al., 2014; Guemas et al., 2014].
There are two primary sources of uncertainty in seasonal predictions: poorly known initial conditions and model uncertainty. In perfect-model studies, initial conditions are known perfectly and there are no model uncertainties, in the sense that the model is used to predict itself. The chaotic growth of prescribed infinitesimal errors in the initial conditions is the only source of forecast error, and thus, the skill found in perfect-model studies is considered to be the upper limit of predictability for a specific model.
When these models are used to forecast the real world, both sources of error need to be considered. Initial conditions are often poorly constrained, and model physics are an approximation of the real world. To date, there have been few efforts to assess how potential predictability of sea ice depends on either source of error. Considering the first source of error (imperfect initial conditions), Day et al. [2014b] and Msadek et al. [2014] find that correct initialization of sea ice thickness is key for skillful predictions, while considering the second source of error (model uncertainty), Juricke et al. [2014] assess how different sea ice strength formulations affect sea ice predictability.
While fully attributing the gap between potential and actual forecast skill to each source of error is a complex problem, progress can be made in understanding current forecast skill. In this paper we first assess the skill of dynamical models in SIO and consider how it compares with the skill from all published perfect-model and hindcast experiments, and with that of a simple statistical benchmark. To help isolate the role of model uncertainty in forecast error, we then show results from an initial-condition perturbation experiment using four models that have submitted to SIO in the past. In this experiment, conducted as part of the 2014 SIPN international workshop, we assess the response of the models to an identical perturbation in the sea ice thickness initial conditions that were used for the 2013 SIO.
2 SIO Model Forecast Skill
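We measure forecast skill with the root-mean-square error (RMSE) of SIO model forecasts of September sea ice extent relative to observations, computed both for the individual forecasts and for the multimodel ensemble mean:

$$\mathrm{RMSE}_{sio-obs} = \sqrt{\frac{1}{N}\sum_{i}\sum_{j=1}^{J}\left(x_{ij} - x_i\right)^2} \qquad (1)$$

$$\mathrm{RMSE}_{mean(sio)-obs} = \sqrt{\frac{1}{N}\sum_{i}\left(\bar{x}_i - x_i\right)^2} \qquad (2)$$

where $x_{ij}$ is the forecast of model j for year i, $x_i$ is the observed value for year i, $\bar{x}_i$ is the multimodel ensemble mean of forecasts, j is the model, i is the year, J is the total number of forecasts for each submission call, and N is the total number of variables in the summation. We use the National Snow and Ice Data Center September sea ice extent values [Fetterer et al., 2002, updated 2014] for $x_i$ in equations 1 and 2. For all years, SIO model forecasts total 35, 43, and 37 for the June, July, and August submission dates, respectively.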
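The two RMSE measures used in this section can be illustrated with a minimal Python sketch. The function names, the dictionary layout, and the extent values are assumptions made for illustration; they are not SIO data and not the authors' code.

```python
import math

def rmse_individual(forecasts, observed):
    """RMSE of every individual forecast against the observation of its year.

    forecasts: dict mapping year -> list of forecast extents (one per model)
    observed:  dict mapping year -> observed September extent
    """
    squared_errors = [
        (x_ij - observed[year]) ** 2
        for year, members in forecasts.items()
        for x_ij in members
    ]
    # N is the number of terms in the summation (all forecasts, all years).
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def rmse_multimodel_mean(forecasts, observed):
    """RMSE of the yearly multimodel mean forecast against observations."""
    squared_errors = []
    for year, members in forecasts.items():
        mean_forecast = sum(members) / len(members)
        squared_errors.append((mean_forecast - observed[year]) ** 2)
    # Here N is the number of years with at least one forecast.
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up extents in 10^6 km^2, for illustration only.
example_forecasts = {2012: [4.0, 5.0], 2013: [5.0, 7.0]}
example_observed = {2012: 4.0, 2013: 5.0}

rmse_single = rmse_individual(example_forecasts, example_observed)
rmse_mean = rmse_multimodel_mean(example_forecasts, example_observed)
print(round(rmse_single, 3), round(rmse_mean, 3))
```

In this toy case the multimodel mean has the lower RMSE because averaging partly cancels the errors of individual forecasts.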
SIO forecasts are generally initialized in the weeks preceding the submission date, and this time-varying initialization is illustrated by the line plots in Figure 1. To compare with the results of SIO models, we add to Figure 1 the RMSE of previously published perfect-model studies, the RMSE of bias-corrected and detrended hindcast experiments, and the RMSE for all SIO submissions (including statistical and heuristic methods from Stroeve et al. [2014]). We also include the RMSE of a damped anomaly persistence forecast, whereby the anomaly from the linear trend in the month preceding the forecast is applied to the following September linear trend value, scaled by the autocorrelation coefficient between both months and the ratio of the standard deviations of both months [Van den Dool, 2006]. We also show the RMSE value equal to σ, where σ is the standard deviation of observed detrended September sea ice extent. This value represents the lower threshold of skill [Collins, 2002] and, by definition, is also the predictability offered by a linear trend forecast, while an RMSE value of zero represents perfect skill. We apply an F test to calculate the degree of statistical significance of the difference among various RMSE values. Details of all experiments displayed in Figure 1 are shown in Table 1.
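The damped anomaly persistence benchmark described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the trends are fitted by ordinary least squares on the years before the target year, the standard deviation ratio is taken as September over the preceding month, and all names and numbers are hypothetical rather than taken from the paper.

```python
import math

def linear_trend(years, values):
    """Least-squares slope and intercept of values against years."""
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
             / sum((t - mean_t) ** 2 for t in years))
    return slope, mean_v - slope * mean_t

def detrend(years, values):
    """Anomalies of values about their linear trend."""
    slope, intercept = linear_trend(years, values)
    return [v - (slope * t + intercept) for t, v in zip(years, values)]

def std(xs):
    """Population standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def correlation(a, b):
    """Pearson correlation of two equally long series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (std(a) * std(b))

def damped_persistence(years, prior_month, september, target_year, prior_value):
    """Forecast September extent for target_year from the prior-month extent."""
    prior_anom = detrend(years, prior_month)
    sept_anom = detrend(years, september)
    # Damping factor: correlation times ratio of standard deviations (assumed
    # here to be September over the preceding month).
    damping = correlation(prior_anom, sept_anom) * std(sept_anom) / std(prior_anom)
    p_slope, p_icpt = linear_trend(years, prior_month)
    s_slope, s_icpt = linear_trend(years, september)
    anomaly = prior_value - (p_slope * target_year + p_icpt)
    return (s_slope * target_year + s_icpt) + damping * anomaly

# Synthetic extents in 10^6 km^2, for illustration only.
train_years = [2000, 2001, 2002, 2003, 2004, 2005]
august = [10.1, 9.7, 9.9, 9.6, 9.8, 9.4]
sept = [6.05, 5.7, 5.65, 5.35, 5.3, 4.95]
forecast = damped_persistence(train_years, august, sept, 2006, 9.8)
print(round(forecast, 3))
```

When the correlation between the two months is near zero, the damping factor vanishes and the forecast reduces to the September linear trend value.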
| Model | Experiment Type | Period | Reference |
|---|---|---|---|
| GFDL CM3 | perfect model | present-day control run | Tietsche et al. [2014] |
| HadGEM1.2 | perfect model | present-day control run | Tietsche et al. [2014] |
| MPI-ESM-LR | perfect model | present-day control run | Tietsche et al. [2014] |
| EC-EARTH V2 | perfect model | present-day control run | Tietsche et al. [2014] |
| NCAR CCSM4 | perfect model | present-day control run | Blanchard-Wrigglesworth et al. [2011a] |
| CanSIPS | hindcast | 1979–2009 | Sigmond et al. [2013] |
| CNRM-CM5.1 | hindcast | 1990–2009 | Chevallier et al. [2013] |
| MetOffice GLOSEA5 | hindcast | 1996–2009 | Peterson et al. [2014] |
| NOAA CFSv2 | hindcast | 1982–2007 | Wang et al. [2013] |
| GFDL CM2.1 | hindcast | 1982–2013 | Msadek et al. [2014] |
| GFDL CM2.5 FLOR | hindcast | 1982–2013 | Msadek et al. [2014] |
| SIO models | forecast | 2009–2014 | Stroeve et al. [2014] |
While other metrics are commonly used in predictability studies, such as the anomaly correlation coefficient, or ACC (see Collins [2002] for a discussion of different metrics), we choose to use RMSE since it arguably provides a more physically intuitive estimate of predictability error—an actual sea ice extent value rather than a correlation coefficient. Moreover, calculating an ACC requires a minimum sample size of n = 3 for each model in SIO and would thus eliminate from our sample size those models that have only one or two yearly submissions, which represent 35% of the total sample size.
Perfect-model forecasts tend to have higher skill than the hindcasts, yet the skill in both beats the skill threshold from damped persistence. Interestingly, the skill of the hindcasts improves little at shorter lead times, much less than the improvement in skill of damped persistence. Conversely, individual SIO model forecasts (RMSEsio − obs) exhibit very high RMSE values and hence do not beat damped persistence or even the σ threshold of a linear trend forecast for the June and July initializations, indicating no skill. The RMSEsio − obs is significantly different from all hindcast RMSE values at the 99% level. SIO model skill is slightly higher when considering the SIO model-mean predictions (RMSEmean(sio) − obs) yet only marginally beats the damped persistence threshold for June initialization. However, given the insignificant correlation of sea ice extent anomalies between May and September [Blanchard-Wrigglesworth et al., 2011b], damped persistence is not expected to yield significant skill at this lead time. The RMSEmean(sio) − obs is not significantly different from the hindcast RMSE values at the 95% level.
To assess how well the models predict each other, we also compute the RMSE of each SIO model forecast relative to the other SIO model forecasts for the same year and submission date:

$$\mathrm{RMSE}_{sio-sio} = \sqrt{\frac{1}{N}\sum_{i}\sum_{j}\sum_{k \neq j}\left(x_{ij} - x_{ik}\right)^2} \qquad (3)$$

where $x_{ij}$ and $x_{ik}$ are the forecasts of models j and k for year i, and N is the total number of variables in the summation.

This metric can shed light on whether there is a consistent bias (in model physics and/or initial conditions) among models, as a small value would indicate small intermodel differences in initial conditions and physics, whether the models skillfully simulate observations or not. If the models predict each other better than observations, this would be evidence of common errors in the physics and initial conditions employed across the models. In the case that a common bias exists across models, one would expect RMSEsio − sio to be lower than RMSEsio − obs. However, if these RMSE values are equal, then it would indicate the models have the same skill (or lack of skill) in predicting each other or observations. Figure 1 shows that RMSEsio − sio is only marginally lower than RMSEsio − obs, and not statistically significantly different at the 95% level, indicating that SIO models are as poor at predicting each other as they are at predicting observations. This result suggests that there are significant differences in physics and/or initial conditions across SIO models.
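The intermodel comparison can be illustrated with a short Python sketch that treats every forecast as a "prediction" of every other forecast made for the same year. The function name and the extent values are hypothetical, and counting each pair once (rather than twice) is an assumption that does not change the resulting RMSE.

```python
import math
from itertools import combinations

def rmse_between_models(forecasts):
    """RMSE of forecasts against each other, pooled over all years.

    forecasts: dict mapping year -> list of forecast extents (one per model)
    """
    squared_diffs = [
        (a - b) ** 2
        for members in forecasts.values()
        for a, b in combinations(members, 2)  # every model pair, counted once
    ]
    # Years with a single forecast contribute no pairs.
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))

# Made-up extents in 10^6 km^2, for illustration only.
pair_forecasts = {2012: [4.0, 5.0, 6.0], 2013: [5.0, 7.0]}
rmse_models = rmse_between_models(pair_forecasts)
print(round(rmse_models, 3))
```

A value well below the RMSE against observations would point to errors shared across models, whereas comparable values, as reported here, point to large intermodel differences.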
3 Sensitivity of SIO Models to Initial Conditions
Forecast systems may produce forecasts that disagree with one another by using different initial conditions and/or different model physics. Dynamical models that participate in SIO range from regional ice-ocean models to global fully-coupled models, while the observations that are assimilated and the ice-ocean reanalyses that are used as initial conditions vary substantially (M. Chevallier, personal communication, 2015). In order to investigate the role of model uncertainty in forecast error and spread, we next test the sensitivity of the forecast to sea ice thickness initial conditions of dynamical models in SIO. To achieve this, prior to the 2014 SIO workshop, modeling groups that submitted to SIO in 2013 were invited to perform an initial-condition perturbation experiment in which the sea ice thickness was reduced by 1 m relative to the initial conditions used for the 2013 SIO, referred to as the control run hereafter.
Four groups performed the experiment with the following models: National Center for Atmospheric Research (NCAR) Community Climate System Model version 4 (CCSM4) [Gent et al., 2011], NASA Global Modeling and Assimilation Office (GMAO) Goddard Earth Observing System 5 Atmosphere-Ocean General Circulation Model (AOGCM) [Suarez et al., 2008], NOAA Climate Forecast System version 2 (CFSv2) [Saha et al., 2014], and Pan-Arctic Ice Ocean Modeling and Assimilation System (PIOMAS) [Zhang and Rothrock, 2003]. For the SIO forecast, NCAR CCSM4 utilizes a global fully-coupled GCM initialized with sea ice thickness anomalies taken from PIOMAS. To create an ensemble forecast, initial atmospheric states are varied across ensemble members by taking consecutive days in the control run centered around the initialization date. NASA GMAO and NOAA CFSv2 are global fully-coupled seasonal forecasting systems. PIOMAS is a regional ice-ocean model that assimilates sea ice concentration and is forced by NOAA/National Centers for Environmental Prediction reanalysis data. In forecast mode, PIOMAS produces a seven-member ensemble, each member with prescribed atmospheric conditions taken from reanalysis of the previous 7 years. Further details on the models and the methodology employed for the experiment and SIO can be found in Table S1 in the supporting information and online at the SIO URL (http://www.arcus.org/search-program/seaiceoutlook/2013/july). NCAR CCSM4, NASA GMAO, and NOAA CFSv2 are initialized in early May, while PIOMAS is initialized in early June. Initial sea ice thickness was reduced in all four models by 1 m, except in regions where the original sea ice thickness is less than 1.5 m. In these regions, the perturbation was reduced linearly as a function of the original control sea ice thickness, as p = h/1.5, where p is the perturbation and h is the control ice thickness (both in meters). Initial sea ice area and extent were kept unchanged from the control run.
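The thickness perturbation rule described above is simple enough to state as code. The following Python sketch applies it to a single grid-cell thickness; the function and constant names are hypothetical, and how each modeling group applied the rule on its own grid is not specified here.

```python
FULL_PERTURBATION_M = 1.0   # reduction applied to thick ice, in meters
TAPER_THRESHOLD_M = 1.5     # below this control thickness the reduction tapers

def perturbation(h):
    """Thickness reduction p (m) for a control thickness h (m)."""
    if h >= TAPER_THRESHOLD_M:
        return FULL_PERTURBATION_M
    return h / TAPER_THRESHOLD_M  # p = h / 1.5, linear in h

def perturbed_thickness(h):
    """Initial thickness (m) used in the perturbed forecast."""
    return h - perturbation(h)

for h in (3.0, 1.5, 0.75, 0.0):
    print(h, perturbation(h), perturbed_thickness(h))
```

With this rule, ice thinner than 1.5 m keeps one third of its control thickness (h − h/1.5 = h/3), so no grid cell is thinned to zero, which is consistent with the initial sea ice area and extent remaining unchanged.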
The 1 m sea ice thickness perturbation is arbitrary but was chosen to be large enough to provide a high signal-to-noise response in the experiments; it corresponds to a 2–4 σ anomaly of sea ice thickness [Blanchard-Wrigglesworth and Bitz, 2014].
Figure 2 shows monthly sea ice area in all models for the control and perturbation forecasts. In all four models, sea ice area in the perturbed forecast is smaller relative to the control, as would be expected from the reduced sea ice thickness in the initial conditions. However, the response varies across all four models throughout the forecast integration period (May through September for NCAR CCSM4, NASA GMAO, and NOAA CFSv2, and June through September for PIOMAS). Figure 3a shows the mean monthly sea ice area difference between the perturbed and control forecasts. In NCAR CCSM4 the perturbed forecast loses 2.5 × 10⁶ km² relative to the control between May and July, and subsequently, the loss in sea ice area stabilizes. NASA GMAO follows a similar trajectory until July, but subsequently, the loss in sea ice area in the perturbed forecast relative to the control increases to almost 4.5 × 10⁶ km² by August; indeed, NASA GMAO loses most sea ice by August (see Figure 2c). NOAA CFSv2 has a rapid initial loss of 2–2.5 × 10⁶ km² by June, which then stabilizes for the remainder of the forecast, while PIOMAS has a strong initial loss in sea ice area over the first 2 months (June–July), as its summer sea ice area becomes strongly reduced (see Figure 2b). By September, the range of response across all four models is a loss of 1.9 × 10⁶ km² to 4.4 × 10⁶ km². It is unclear why the models have such a wide range of response, and as shown in Figure S3, the range is only weakly related to the range of sea ice thickness initial conditions.


We also explore how the growth of the forecast ensemble spread of individual models (a measure of potential predictability) responds to the initial condition perturbation (see Figure 3b). In NCAR CCSM4, the spread of the perturbed forecast ensemble for June and July grows faster than that of the control forecast ensemble, indicating a more rapid loss of potential predictability in the perturbed forecasts in these months. However, in August and September both ensemble spreads are comparable (not significantly different at the 95% level), indicating that potential predictability is similar. PIOMAS has a different response: while its control forecast ensemble spread grows comparably to that of the control forecast ensemble in NCAR CCSM4, the perturbed forecast ensemble has a much smaller spread after July, which is visually apparent in Figure 2b. Lastly, NOAA CFSv2 has a much slower growth in its control forecast ensemble spread than either NCAR CCSM4 or PIOMAS, while its perturbed forecast ensemble spread is not significantly different from the control forecast ensemble spread. Thus, all three models that performed a multiple-run perturbation forecast exhibit different responses in the growth of forecast ensemble spread.
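The two diagnostics discussed in this section, the ensemble-mean area difference and the ensemble spread, can be sketched in Python as follows. Taking the spread as the sample standard deviation across ensemble members is an assumption, since the exact definition is not given in the text, and the data layout and values are hypothetical.

```python
import statistics

def ensemble_mean_difference(control, perturbed):
    """Monthly ensemble-mean sea ice area difference (perturbed minus control)."""
    return {
        month: statistics.fmean(perturbed[month]) - statistics.fmean(control[month])
        for month in control
    }

def ensemble_spread(ensemble):
    """Monthly spread, taken here as the standard deviation across members."""
    return {month: statistics.stdev(members) for month, members in ensemble.items()}

# Made-up areas in 10^6 km^2 for a three-member ensemble, for illustration only.
control_run = {"Jun": [10.0, 10.2, 9.8], "Sep": [5.0, 6.0, 4.0]}
perturbed_run = {"Jun": [8.0, 8.4, 7.6], "Sep": [3.0, 3.5, 2.5]}

area_difference = ensemble_mean_difference(control_run, perturbed_run)
control_spread = ensemble_spread(control_run)
perturbed_spread = ensemble_spread(perturbed_run)
print(area_difference, control_spread, perturbed_spread)
```

Comparing how the two spreads grow from month to month gives a simple view of whether the perturbation changes the rate at which potential predictability is lost.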
4 Discussion
The low skill in SIO models relative to the hindcasts is surprising, particularly considering that some of the models that have produced hindcasts have also submitted outlooks to SIO (e.g., NOAA CFSv2 and MetOffice GLOSEA5). It is possible that summer sea ice extent has been less predictable in recent years compared to previous decades (the time period for which most of the hindcasts were produced, see Table 1), a result also found within the Geophysical Fluid Dynamics Laboratory (GFDL) hindcasts [Msadek et al., 2014]—indeed, thinner sea ice conditions may lead to lower predictability [e.g., Holland et al., 2010] and may offset the expected forecast improvement offered by improved data availability for initialization in recent years [Msadek et al., 2014]. One may also consider how sea ice persistence has evolved throughout the satellite era, since sea ice predictability is intrinsically linked with persistence of sea ice anomalies [Day et al., 2014a], and it has recently been shown that the persistence and reemergence of sea ice anomalies show significant interannual variability [Bushuk et al., 2015]. Thus, reduced persistence throughout the summer may be indicative of reduced potential predictability. However, neither persistence nor the skill offered by damped persistence has decreased in recent years (Figure S1).
Interestingly, some climate reanalysis systems that are used to provide initial conditions for hindcast and forecast experiments show reduced fidelity in simulating observed September sea ice extent variability in recent years compared to the earlier period of the satellite observation record (Figure S2) [Msadek et al., 2014, Figure 1b], and it is plausible that this would imply a reduction in forecast or hindcast skill during recent years. It is worth noting that the RMSE of SIO submissions of both NOAA CFSv2 and MetOffice GLOSEA5 models is significantly higher than their hindcast RMSE values shown in Figure 1, and no better than RMSEsio − obs (not shown). Care must be taken, however, when interpreting this result given the small sample size of individual model submissions to SIO (10 and 7 submissions during 2011–2014 from NOAA CFSv2 and MetOffice GLOSEA5, respectively).
The multimodel mean of SIO forecasts offers slightly higher skill than the single-model ensemble prediction, a typical feature seen in other areas of seasonal climate prediction [e.g., Hagedorn et al., 2005], yet neither significantly beats a damped persistence forecast, perhaps helping explain why statistical models show higher skill than dynamical models in SIO [Stroeve et al., 2014]. Furthermore, models are equally unskilled at predicting each other, indicative of a large difference in model physics and/or initial conditions.
Four dynamical models that have participated in SIO have significantly different responses to identical initial condition perturbations. This result implies that different model physics can result in significantly different forecast sensitivity to identical initial condition perturbations. Thus, model physics are likely responsible for a significant component of the large spread in the multimodel September SIOs. Improving the skill of September sea ice predictions will thus depend not only on better observations and assimilation of the models' initial conditions but also on further improving model physics.
Acknowledgments
We thank Michael Sigmond, Bill Merryfield, Woo-Sung Lee, and Rym Msadek for making the CanSIPS, NOAA CFSv2, GFDL CMOR, and ECDA hindcast skill scores available and Larry Hamilton for assistance with the SIO data sets. We thank two anonymous reviewers for helpful suggestions on how to improve the paper. The SIO data in the paper are available at http://www.arcus.org/sipn/sea-ice-outlook. EBW and CMB were supported by ONR grant N0014-13-1-0793. JZ was supported by ONR grant N00014-12-1-0112.
The Editor thanks two anonymous reviewers for their assistance in evaluating this paper.





