Bright Prospects for Arctic Sea Ice Prediction on Subseasonal Time Scales
Abstract
With retreating sea ice and increasing human activities in the Arctic comes a growing need for reliable sea ice forecasts up to months ahead. We exploit the subseasonal-to-seasonal prediction database and provide the first thorough assessment of the skill of operational forecast systems in predicting the location of the Arctic sea ice edge on these time scales. We find large differences in skill between the systems, with some showing a lack of predictive skill even at short weather time scales and the best producing skillful forecasts more than 1.5 months ahead. This highlights that the area of subseasonal prediction in the Arctic is in an early stage but also that the prospects are bright, especially for late summer forecasts. To fully exploit this potential, it is argued that it will be imperative to reduce systematic model errors and develop advanced data assimilation capacity.
Plain Language Summary
The need for reliable forecasts for the sea ice evolution from weeks to months in advance has substantially grown in the last decade. Sea ice forecasts are of critical importance to manage the opportunities and risks that come with increasing socioeconomic activities in the rapidly changing Arctic, which, despite the reduction of the sea ice cover, remains an extreme environment. The position of the sea ice edge is a key parameter for potential forecast users, such as Arctic mariners. However, little is known about the ability of current operational subseasonal forecast systems to predict the evolution of the ice edge. Therefore, we assess for the first time the skill of state-of-the-art forecast systems, using a new verification metric that quantifies the accuracy of the ice edge position in a meaningful way. Our results demonstrate that subseasonal sea ice predictions are in an early stage, although skillful predictions 1.5 months ahead are already possible. We argue that relatively modest investments into reducing initial state and model errors will lead to major returns in predictive skill.
1 Introduction
The observed rapid retreat of Arctic sea ice and the prospect of a virtually ice-free Arctic Ocean in late summer by the middle of this century (Collins et al., 2013; Overland & Wang, 2013; Stroeve et al., 2007; Wang & Overland, 2009) have fueled socioeconomic interests in the region (Emmerson & Lahn, 2012; Stephenson et al., 2011). As a consequence there is a growing demand for reliable predictions of Arctic weather and sea ice across a wide range of time scales to reduce the risks that come with enhanced activities in the high north (Jung et al., 2016).
Much of what is known about the skill of existing systems in predicting Arctic sea ice is based on the Sea Ice Outlook (Stroeve et al., 2014), an effort of the international research community that since 2008 has been aiming to build and evaluate seasonal sea ice prediction capabilities. So far, Sea Ice Outlook dynamical predictions have shown limited skill, with simple statistical forecasts being of comparable quality (Blanchard-Wrigglesworth et al., 2017; Stroeve et al., 2014). On the other hand, perfect-model studies suggest significant potential predictability at seasonal time scales (Goessling et al., 2016; Guemas et al., 2016; Tietsche et al., 2014), indicating that there is scope for major improvements. On much shorter weather time scales (up to ∼10 days ahead) high-resolution forecast systems are increasingly being used by operational ice services (Carrieres et al., 2017; Sea-ice information services in the world, edition 2017, 2017), and recent research has begun to explore the predictability of sea ice on these shorter time scales (e.g., Mohammadi-Aragh et al., 2018).
The potential for skillful predictions of Arctic sea ice on subseasonal-to-seasonal (S2S) time scales has improved considerably through recent developments. Recognizing the urgent need for a better representation of the sea ice-ocean system, forecast centers are moving toward using fully coupled models (Smith et al., 2015). This also holds for shorter weather time scales, where features such as the location of the sea ice edge can feed back significantly to the atmosphere, thereby influencing the further evolution of the coupled system (Jung et al., 2016). This development toward using coupled models is reflected by the fact that 6 out of 11 forecast systems contributing to the recently established S2S Prediction database (Vitart et al., 2012, 2016) include dynamical sea ice components. These dynamical models replace relatively crude schemes where the sea ice state is simply persisted from its initial state and/or relaxed toward climatological conditions. In fact, the S2S database constitutes an unprecedented opportunity for a thorough assessment of state-of-the-art operational predictions of Arctic sea ice on subseasonal time scales. Numerous reforecasts are available for each of the contributing systems, which is critical for making robust statements about the skill and the associated uncertainties. Furthermore, the forecasts cover the whole annual cycle, making it possible to determine seasonal variations in skill. To our knowledge, this study represents the first assessment of these systems in the Arctic, showing that the field of subseasonal prediction of Arctic sea ice is in an early stage, but also highlighting that prospects for skillful predictions are bright.
2 Data
The ensemble forecasts analyzed here have been obtained from the database of the S2S Prediction project. Here we consider only those six systems that include a sea ice model coupled to an atmospheric and ocean model, thereby producing actual dynamical sea ice forecasts. The only exception is the older European Centre for Medium-Range Weather Forecasts (ECMWF) forecast system (ECMWF Pres.) where the sea ice state is persisted for the first 15 days of the forecast and then relaxed toward climatology. Archiving of real-time ensemble forecasts in the S2S database started only in January 2015. However, corresponding reforecasts are available for approximately the previous two decades. The S2S forecast systems exhibit different forecast lengths, initialization frequencies, ensemble sizes, data assimilation methods, and model physics (Supporting Information S1, Table S1). Despite these differences, some forecast centers share model components, typically the ocean or sea ice model, including the extreme case of the UK Met Office (UKMO) and the Korea Meteorological Administration (KMA), which share the same forecasting system altogether. Differences in ensemble size and initialization frequency exist between real-time forecasts and the corresponding reforecasts. The initialization strategy also varies among the systems: some feature a balanced assimilation among sea ice, ocean, and atmospheric components (ECMWF, UKMO, KMA, and National Centers for Environmental Prediction [NCEP]), whereas Météo-France (MF) and China Meteorological Administration (CMA) adopt a two-tier initialization strategy. To ensure a sufficiently large sample size, while allowing comparability between the systems, our analysis is focused on the common reforecast period 1999–2010. The sea ice concentration fields from the S2S database are provided on a 1.5° × 1.5° longitude-latitude grid, although the sea ice models are run at higher resolution (from 0.25° to 1°).
The verification is carried out against daily sea ice concentration data from passive microwave (PMW) satellite measurements. As for the forecast data, we use the 15% sea ice concentration contour to determine the location of the ice edge. The main observational product used here is the Global sea ice Concentration data record (OSI-SAF, 2016). Discrepancies between true and observed ice edge locations are mainly caused by the summer melting over sea ice and snow. These are interpreted as open water by PMW sensors (Kwok, 2002; Notz, 2014) and cause a northward shift of the ice edge (Comiso & Nishio, 2008). However, since most of the forecast centers also assimilate PMW measurements, we expect this systematic error to be propagated also to the forecasts and to have a limited impact on our analysis.
3 Methods
We apply the recently introduced Spatial Probability Score (SPS; Goessling & Jung, 2018) as verification metric, which can be regarded as the extension of the Integrated Ice Edge Error (IIEE; Goessling et al., 2016) to probabilistic ice edge forecasts. These metrics are specifically designed to capture the accuracy of the forecasted ice edge and to overcome the limitations of more widely used metrics such as the difference in pan-Arctic sea ice extent or area. The latter only evaluate the total extent of the ice cover, but fail to provide useful information about its spatial distribution. In contrast, the SPS and the IIEE account not only for differences in total sea ice extent but also for ice that is forecast at a wrong location.
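As a minimal illustration of the metric, the sketch below assumes the SPS is the area integral of the squared difference between the forecast and observed probabilities of ice presence, with ice presence defined by the 15% concentration threshold. The function names, the flat list-of-cells layout, and the toy numbers are hypothetical and are not taken from the analysis code used in this study.

```python
from typing import List, Sequence

ICE_EDGE_THRESHOLD = 0.15  # 15% sea ice concentration defines the ice edge


def ice_probability(ensemble_sic: Sequence[Sequence[float]]) -> List[float]:
    """Per-cell fraction of ensemble members whose concentration exceeds the threshold."""
    n_members = len(ensemble_sic)
    n_cells = len(ensemble_sic[0])
    return [
        sum(member[i] > ICE_EDGE_THRESHOLD for member in ensemble_sic) / n_members
        for i in range(n_cells)
    ]


def spatial_probability_score(
    p_forecast: Sequence[float],
    p_observed: Sequence[float],
    cell_area: Sequence[float],
) -> float:
    """Area-weighted sum of squared probability differences (assumed SPS definition)."""
    return sum(
        area * (pf - po) ** 2
        for pf, po, area in zip(p_forecast, p_observed, cell_area)
    )


# Toy example: two ensemble members, three grid cells of 100 km^2 each.
forecast_members = [[0.9, 0.2, 0.0], [0.8, 0.1, 0.0]]
observation = [[0.95, 0.5, 0.0]]  # a single observed state acts as a one-member ensemble
areas = [100.0, 100.0, 100.0]

p_fc = ice_probability(forecast_members)
p_ob = ice_probability(observation)
sps = spatial_probability_score(p_fc, p_ob, areas)
```

In this toy case only the middle cell contributes: half of the members place ice there while the observation shows ice, so the score equals a quarter of that cell's area. For a deterministic forecast the probabilities are 0 or 1 and the score reduces to the area where forecast and observation disagree.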
The decomposition of the IIEE for the ensemble-median ice edge into Overestimation (O) and Underestimation (U) or, alternatively, Absolute Extent Error (AEE) and Misplacement Error (ME; Goessling et al., 2016) adds information to the SPS and provides insights into the origin of forecast errors. O is the spatial integral of all areas where the forecast sea ice concentration is above 15% but the observed sea ice concentration is below 15%; U is the spatial integral of all areas where the forecast sea ice concentration is below 15% but the observed sea ice concentration is above 15%. The AEE component represents the total difference in sea ice extent between forecast and observation, while the ME component accounts for sea ice that is forecast at a wrong location. A more extensive description of the verification metrics can be found in Text S1.
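The two decompositions can be sketched as follows. The definitions of O and U follow the text above; the relations IIEE = O + U, AEE = |O − U|, and ME = 2·min(O, U) are assumed here from the definitions in Goessling et al. (2016). The function name, the flat list-of-cells layout, and the example values are hypothetical.

```python
from typing import Dict, Sequence


def iiee_components(
    forecast_sic: Sequence[float],
    observed_sic: Sequence[float],
    cell_area: Sequence[float],
    threshold: float = 0.15,
) -> Dict[str, float]:
    """Integrate over- and underestimated areas and derive the IIEE decompositions."""
    over = 0.0   # forecast shows ice, observation does not
    under = 0.0  # observation shows ice, forecast does not
    for fc, ob, area in zip(forecast_sic, observed_sic, cell_area):
        fc_ice = fc > threshold
        ob_ice = ob > threshold
        if fc_ice and not ob_ice:
            over += area
        elif ob_ice and not fc_ice:
            under += area
    return {
        "O": over,
        "U": under,
        "IIEE": over + under,
        "AEE": abs(over - under),    # net difference in extent
        "ME": 2.0 * min(over, under),  # ice placed at the wrong location
    }


# Toy example with four unit-area cells: one overestimated, two underestimated.
components = iiee_components(
    forecast_sic=[0.9, 0.5, 0.0, 0.0],
    observed_sic=[0.9, 0.0, 0.6, 0.7],
    cell_area=[1.0, 1.0, 1.0, 1.0],
)
```

Under these assumed relations AEE and ME always sum to the IIEE, so the percentages of O and U, or of AEE and ME, quoted in the results can be read as shares of the same total error.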
The computation of verification scores is conducted on a per-grid cell basis. Therefore, it is necessary to remap either the forecast data or the observations (or both) to a common grid and to investigate the impact of the forecast and observation resolution on our results. In the analysis, the observational data were remapped by first-order conservative remapping to the relatively coarse-resolution forecast data. Further details on the role of resolution in observations and forecasts can be found in Text S2. Only grid cells that are classified as ocean (including sea ice) in all models and in the observations were used (see the resulting land-mask in Figure S4). Employing a common conservative land-mask guarantees an unbiased comparison of the skill of different forecast systems.
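The common land-mask can be thought of as the intersection of the individual ocean masks once all fields are on the same grid. The following sketch illustrates that idea only; it assumes the masks are already remapped and stored as flat lists of booleans, and the function names and example masks are hypothetical.

```python
from typing import List, Sequence


def common_ocean_mask(ocean_masks: Sequence[Sequence[bool]]) -> List[bool]:
    """A cell is kept only if every model and the observations classify it as ocean."""
    return [all(cell_flags) for cell_flags in zip(*ocean_masks)]


def masked_area(cell_area: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Set the area of excluded cells to zero so they drop out of area integrals."""
    return [area if keep else 0.0 for area, keep in zip(cell_area, mask)]


# Toy example: two model masks and one observational mask on a three-cell grid.
masks = [
    [True, True, False],   # hypothetical model A
    [True, False, False],  # hypothetical model B
    [True, True, True],    # hypothetical observational product
]
mask = common_ocean_mask(masks)
areas = masked_area([100.0, 100.0, 100.0], mask)
```

Zeroing the area of excluded cells means the same scoring routine can be reused for every system without further changes.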
A meaningful assessment of the forecast skill requires the introduction of observation-based benchmarks based on the same metric employed for measuring the forecast error. If the forecast error is lower than that of a benchmark, the dynamical forecasting system has some predictive skill. Otherwise, the observational record can be used to build a better forecast. We have followed two strategies to construct a meaningful benchmark. First, we defined a climatological benchmark forecast as the 10-member ensemble of states observed at the same time of the year during those 10 years preceding the respective forecast target time. Second, we defined a persistence benchmark based on the observed sea ice conditions one month before the forecast target time (Blanchard-Wrigglesworth et al., 2010). The climatological benchmark is more restrictive than the persistence benchmark for most of the year (see Text S3 and Figure S1) and is therefore used to assess the skill of the S2S systems.
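The construction of the climatological benchmark and the skill criterion can be sketched as follows. The sketch assumes the observations for the target calendar day are available per year as flat lists of concentrations; the dictionary layout, function names, and toy values are hypothetical.

```python
from typing import Dict, List, Sequence


def climatological_benchmark(
    obs_by_year: Dict[int, Sequence[float]],
    target_year: int,
    n_years: int = 10,
    threshold: float = 0.15,
) -> List[float]:
    """Ice-presence probability from the n_years observed states preceding target_year."""
    members = [obs_by_year[year] for year in range(target_year - n_years, target_year)]
    n_cells = len(members[0])
    return [
        sum(member[i] > threshold for member in members) / n_years
        for i in range(n_cells)
    ]


def has_skill(sps_forecast: float, sps_benchmark: float) -> bool:
    """A forecast is skillful if its error is lower than that of the benchmark."""
    return sps_forecast < sps_benchmark


# Toy record for one calendar day: cell 0 is always ice covered,
# cell 1 is ice covered only in the last three years before the target.
obs_by_year = {
    year: [0.9, 0.8 if year >= 2004 else 0.0] for year in range(1997, 2007)
}
p_clim = climatological_benchmark(obs_by_year, target_year=2007)
```

The resulting probabilities can be scored against the observation at the target time with the same metric as the dynamical forecast. A persistence benchmark would instead use the single observed state one month before the target time, giving probabilities of 0 or 1.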
4 Results
4.1 Annual-Mean Sea Ice Forecast Skill
The annual-mean skill of different forecasts in predicting the Arctic sea ice edge can be inferred from Figure 1. The most striking feature is that the forecast skill varies substantially across the different systems. Compared to the climatological benchmark, the CMA and MF systems do not show any predictive skill, even at initialization time. On the other hand, the ECMWF system shows predictive skill all the way to a lead time of 45 days. The other systems (KMA, NCEP, and UKMO) are comparable to ECMWF for short lead times; the error growth is larger, however, leading to a faster loss of predictive skill.

The wide range of error growth rates among the different models is in stark contrast to what can be found for predictions of atmospheric fields, which are much more similar in terms of skill (Jung & Matsueda, 2016). This highlights the fact that the field of sea ice prediction with weather and climate models is still in its infancy.
Although the skill of ECMWF, KMA, NCEP, and UKMO at initial time is much better than that of MF and CMA, initial errors are still quite large (half the values of the climatological benchmark; Figure 1). Given that, based on satellite data, the sea ice conditions should be reasonably well known at the time of the initialization, the large initial errors suggest that there is still substantial scope for improving the data assimilation procedure and thereby the prediction skill of subseasonal forecast systems.
The skill of the UKMO and KMA systems is almost identical (Figure 1) because they share the same forecasting system. However, given that they represent independent forecast realizations (ensemble members) of the chaotic climate system, their agreement demonstrates that the data available in the S2S database allow robust conclusions to be drawn about the skill of sea ice forecasts. Furthermore, noting that the UKMO ensemble is larger than the KMA ensemble (Table S1), the slightly higher skill of UKMO compared to KMA suggests that ensemble size matters for sea ice edge predictions.
4.2 Seasonal Variations in Forecast Skill and Origins of Error
The results discussed so far were based on annually averaged values. However, since high latitudes experience very different physical conditions at different times of the year, it appears likely that the predictability of Arctic sea ice is seasonally dependent. In this section, this seasonality will be further explored.
Despite the specific biases affecting each system, a general feature of the SPS, including the climatological benchmark, is a pronounced seasonal cycle with two peaks at the end of the winter and summer seasons (Figure 2). This pattern can be explained by a corresponding seasonality of the ice edge length, which reaches its maxima in late winter and in summer. In general, a longer edge simply implies on average a larger area where forecast and observations can disagree.

The ECMWF system achieves the largest skill in late summer, when actual predictions remain much better at all lead times than climatological forecasts, which exhibit particularly low skill in this period (Figure 2, top left). A possible explanation for this is that around September the uncertainty in the ice edge location is the largest due to higher mobility of the ice. However, the ECMWF forecast system is able to capture a relatively large fraction of that variability and therefore the forecast error is not larger around September than at other times of the year. Lower relative skill is found from October through July; during this time of the year only short-term forecasts out to ∼18 days achieve meaningful skill compared to the climatological benchmark.
The error components provide further insights into the performance of the ECMWF forecast system. An evident feature is a peak in SPS in July for short lead times (Initial, Day 8 and Day 18; Figure 2, ECMWF). This reflects a less accurate initialization of the ice edge compared to the rest of the year. The O,U error decomposition (Figure S2) reveals that the peak is associated with a development of a substantial model bias: The initial position of the ice edge is systematically underestimated (O ≈ 0% and U ≈ 100%) from July to October.
Interestingly, the forecasts less accurately initialized in July produce comparably skillful long-range (Day 45) predictions for late summer, with an approximate balance between O and U (O ≈ 40% and U ≈ 60%, Figure S2) and the ME dominating over the AEE (ME ≈ 70% and AEE ≈ 30%, Figure S3). A possible reason for this apparent contradiction is that the skill in late September, which marks the beginning of the freezing season, is related to sources of predictability residing in components of the climate system other than the sea ice. For example, the heat content stored in the surface ocean could influence the sea ice edge position in the early freezing season (Blanchard-Wrigglesworth et al., 2010; Text S3). The underestimation of the initial ice edge in the ECMWF system continues until late September, affecting the forecasts at longer lead times in October. The striking transition at the beginning of the freezing season, when the underestimation and the AEE components start to dominate, hints at a delayed onset of the ice growth season in the ECMWF system.
A seasonal cycle similar to that of ECMWF can be found for UKMO, KMA, and NCEP, at least for forecasts out to 8–18 days, which still show some skill. For longer lead times (beyond Day 18), UKMO and KMA show a rapid error growth in August and September. The decomposition of the forecast error reveals that this deterioration of skill is associated with the development of a substantial model bias that is reflected by an underestimation of the integrated Arctic sea ice extent (O ≈ 10% and U ≈ 90%, Figure S2, KMA and UKMO). The NCEP system exhibits notable differences in how the initially similar imbalances evolve with lead time (Figure S2, NCEP). In particular, the dominance of overestimation in January and February increases, and an initially balanced state in August and September turns overestimation-dominated with lead time, pointing to positive model biases for sea ice extent during these months. In contrast, a rapid transition from overestimation-dominated to underestimation-dominated errors around the end of September hints at a delayed onset of the ice growth season in the model, similar to that in the ECMWF system.
The CMA system, which is outperformed by the climatological benchmark for all lead times and times of the year, exhibits particularly large errors from August to October (Figure 2, CMA). From July to September the skill increases (i.e., the SPS decreases) with lead time, implying that very large initial errors during this part of the year are amended over the course of the forecast model integration toward a more realistic state. Furthermore, the CMA system considerably overestimates the Arctic sea ice extent from November to June, and underestimates the extent even more strongly from July to October (Figure S2, CMA). Moreover, the CMA system features a series of negative SPS spikes in spring; the cause of these can be tracked down to a single forecast bust associated with an erroneous initialization on 25 March 2007.
The MF system is approximately as skillful as the climatological benchmark from October to April, with only a weak dependence on lead time (Figure 2, MF). During the melting season from May to September, however, the MF system is less skillful and exhibits large initial errors that are slightly amended with growing lead time. Errors in long-term prediction in September are dominated by an underestimation of the pan-Arctic sea ice cover, whereas biases play a minor role in the MF system at other times of the year. This suggests that a more accurate initialization of the MF system might already be sufficient to improve ice edge forecasts of this system considerably.
4.3 The Benefit of Using a More Realistic Representation of Sea Ice and Ocean
ECMWF updated its operational forecast system in November 2016. Until then, sea ice conditions were determined based on the persistence of the initial conditions for the first 15 forecast days, followed by a relaxation toward average sea ice conditions observed during the 5 years preceding the forecast target time (ECMWF Pres.). The change to a more advanced approach, in which sea ice dynamics and thermodynamics are explicitly represented by a sea ice model, provides a unique opportunity to study the impact of this critical development of the forecast system. Note that the system update also included an increase of the ocean model resolution from 1° to 0.25°. For our assessment, we exploit the fact that reforecasts for 1999–2010 are available for both versions of the ECMWF system. Figure S4 illustrates recent forecasts from the two ECMWF system versions in comparison with the observed sea ice edge derived from different PMW products (OSI-SAF, 2016; Spreen et al., 2008).
The accuracy of the ice edge location in the initial conditions is similar for the two versions of the ECMWF system; with increasing lead time, however, the version with explicit sea ice physics included quickly outperforms the older version with simple sea ice treatment (Figures 1 and 2). This highlights that investments in forecast system development can lead to major advances in predictive skill.
Not surprisingly, using persistence, even for short lead times, leads to an overestimation of sea ice during the melting season from April to August and an underestimation during the growing season from October to February (Figure S2, ECMWF Pres., dark and light blue lines). Around Day 18 of the forecasts, the older version of the ECMWF system exhibits an intermittent increase in skill that is a result of the gradual transition from initial-state persistence toward average conditions of previous years (Figure 1). In fact, the temporary decrease of the SPS from Day 19 to Day 22 suggests that the older version could have benefited from an earlier transition toward climatological sea ice fields.
4.4 Case Study: The Summer of 2007
Some of our main results can be further illustrated by considering subseasonal sea ice forecasts for September of the exceptional summer of 2007, which was the first in a series of summers with anomalously low Arctic sea ice extent. Not surprisingly, the climatological forecast clearly overestimates the ice extent in large parts of the Arctic (Figure 3). The ECMWF system clearly captures the observed sea ice edge in its 30-day forecast. The ECMWF ensemble spread appears reasonable, with probabilities transitioning smoothly from 0 to 1 along the observed ice edge. This indicates that the ensemble is reliable, that is, neither underdispersive nor overdispersive. In contrast, the NCEP forecast, although clearly more skillful than the climatology, is overconfident regarding the ice edge location, with probabilities transitioning sharply from 0 to 1 in disagreement with the observed ice edge. The UKMO and KMA systems produce very similar forecasts, including a region at about 170°W where the amount of sea ice is strongly underestimated, also confirming the similarity of the systems. The CMA model is a clear outlier in the sense that initialization and model errors lead to the complete absence of Arctic sea ice during this time of the year. The MF forecast is characterized mostly by overestimation of the ice extent in the Siberian sector, combined with an underestimation along eastern Greenland. This misplacement suggests that the MF system does not capture the particularly high sea ice transport through Fram Strait which occurred in summer 2007. In this specific year, the persistence benchmark provides a better representation of the September ice edge than other empirical schemes based on the climatological sea ice state (ECMWF Pres. and the climatological benchmark forecast). This suggests that the use of the climatological benchmark has particularly pronounced drawbacks in unusual years such as 2007, which are more common in a rapidly changing climate.

5 Discussion
This paper provides the first overview of the subseasonal skill of state-of-the-art coupled forecast systems in predicting the sea ice edge in the Arctic. By exploiting the recently established S2S database, we find a surprisingly large range of skills with some of the systems showing no skill at all, even at short weather time scales, and the best system producing skillful forecasts up to 45 days in advance. The fact that prediction skill is largest in late summer suggests that useful long-range forecasts can be provided to stakeholders during a time of the year when marine operations peak.
Our analysis of error components has revealed that seasonally dependent model biases play a critical role. This calls for dedicated efforts to improve the realism of coupled models in the Arctic, with the ultimate aim of reducing systematic model errors. Bias correction could be a means to improve real-time forecasts. In fact, a method specifically designed to bias-correct ice edge forecasts has been recently proposed (Director et al., 2017), and the reforecasts needed for bias correction are available in the S2S database. However, the biases in some of the models are comparable to or even larger than the anomalies one would like to predict, which suggests that nonlinearity may be an issue.
The large differences in the accuracy of the initial conditions for sea ice between the systems are related to how the forecasts are initialized, that is, the way observations are assimilated into the coupled models. A major difference between the CMA and MF systems and the other (more skillful) systems is that the former two systems do not directly assimilate any sea ice observations into their models, unlike the other systems that assimilate sea ice concentration. In principle, one could have expected to see some skill also for the CMA and MF systems because (i) they do assimilate other ocean variables that affect the sea ice, in particular sea surface temperature, and (ii) the evolution of the atmosphere, which largely drives sea ice anomalies, is constrained through the assimilation of atmospheric observations. However, our results indicate that these aspects are not sufficient to generate realistic sea ice initial states and that direct assimilation of sea ice observations is required.
Even the systems with a more accurate initialization of sea ice (ECMWF, UKMO, KMA, and NCEP) exhibit considerable ice edge initial errors that amount to about half of the error of the climatological benchmark. This agrees well with the assessments of the Arctic sea ice cover in reanalyses by Chevallier et al. (2017) and Uotila et al. (2018) who found a substantial spread in the sea ice edge position between reanalyses, particularly in late summer. Several mechanisms could contribute to the initial error: one is that adjustments of sea ice concentrations based on other assimilated variables (in particular, sea surface temperature) to obtain more consistent states introduce inaccuracies in the ice edge location. Constraints related to delays in the availability of observational sea ice products might also contribute to the initial errors, although it is not obvious whether such constraints applying to real-time operations are also an issue for the reforecasts.
We conclude that the accuracy of sea ice initial states needs further research and will be critical to advance the field of Arctic sea ice forecasting on subseasonal time scales. While for short-range summer predictions (below 10 days) or subseasonal winter predictions, a correct initialization of the sea ice concentration field might be sufficient to achieve skillful forecasts of the ice edge, for longer timescales the role of the sea ice thickness initialization will be crucial, especially during the melting season. In this regard, new satellite observational products have the potential to improve sea ice initial conditions considerably. Of particular interest are, for example, sea ice thickness observations from multiple instruments, with a proven potential to help constrain sea ice initial states (Day et al., 2014; Mu et al., 2017).
Sea ice prediction is a central element of major international efforts such as the Polar Prediction Project along with its flagship activity, the Year of Polar Prediction (Jung et al., 2016), suggesting that there is an opportunity for resource mobilization and international coordination that promises imminent progress. These factors, and the already achieved progress documented by our analysis, indicate that the prospects for subseasonal prediction of Arctic sea ice are bright.
Acknowledgments
We are very grateful to the World Climate Research Program (WCRP) and to the World Weather Research Program (WWRP), to operational forecast centers and individuals that contribute to the S2S database, as we are grateful to all those involved in implementing and maintaining the database. We thank Steffen Tietsche for providing ECMWF forecast data on the native grid; and we thank him as well as Matthieu Chevallier and Ed Blockley for very helpful discussions. We also acknowledge the OSI-SAF consortium, the University of Bremen, and the NSIDC for making their sea ice concentration products available. L.Z. and H.F.G. acknowledge the financial support by the Federal Ministry of Education and Research of Germany in the framework of SSIP (grant 01LN1701A). T.J. acknowledges the funding from the European Union's Horizon 2020 Research and Innovation program project APPLICATE (grant 727862). All data analyzed here are openly available. The S2S forecasts database is hosted at ECMWF and at CMA; the data can be retrieved from the ECMWF data portal at http://apps.ecmwf.int/datasets/data/s2s/levtype=sfc/type=cf/. The OSI-SAF sea ice concentration product can be retrieved from the MET Norway FTP server at ftp://osisaf.met.no/reprocessed/ice/conc/v1p2/.





