Impact of Ocean and Sea Ice Initialisation On Seasonal Prediction Skill in the Arctic

There is a growing demand for skillful prediction systems in the Arctic. Using the Norwegian Climate Prediction Model that combines the fully coupled Norwegian Earth System Model and the Ensemble Kalman filter, we present a system that performs both, weakly coupled data assimilation (wCDA) when assimilating ocean hydrography (by updating the ocean alone) and strongly coupled data assimilation when assimilating sea ice concentration (SIC) (by jointly updating the sea ice and ocean). We assess the seasonal prediction skill of this version of the Norwegian Climate Prediction Model, the first climate prediction system using strongly coupled data assimilation, by performing retrospective predictions (hindcasts) for the period 1985 to 2010. To better understand the origins of the prediction skill of Arctic sea ice, we compare this version with a version that solely performs wCDA of ocean hydrography. The reanalysis that assimilates just ocean data exhibits skillful hydrography in the upper Arctic Ocean and features an improved sea ice state, such as improved summer SIC in the Barents Sea, or reduced biases in sea ice thickness. Skillful prediction of SIE up to 10–12 lead months are only found during winter in regions of a relatively deep ocean mixed layer outside the Arctic basin. Additional DA of SIC data notably further corrects the initial seaice state, confirming the applicability of the results of Kimmritz et al. (2018) in a historical setting. The resulting prediction skill of SIE is widely enhanced compared to predictions initialized through wCDA of only ocean data. Particularly high skill is found for July‐initialized autumn SIE predictions.


Introduction
During recent decades, surface air temperatures in the Arctic increased at least twice as much as on the rest of the globe (Cohen et al., 2014). Prolonged melting seasons go hand in hand with a transition of thicker perennial to thinner seasonal ice and a strong decline in September sea ice extent (Kwok & Rothrock, 2009;Maslanik et al., 2011;Stroeve et al., 2014). These and related changes are affecting the ecosystem, local communities, and economic activities in the Arctic, such as tourism, fisheries, shipping, and resource exploitation (Liu & Kronbak, 2010). On a global scale, changes in the Arctic have been linked to enhanced occurrences of extreme weather events in high to middle latitudes (Cohen et al., 2014;Tang et al., 2013;Zhou, 2017). Yet, an in-depth understanding of the involved mechanisms is lacking so far due to the sparsity of the observational network. Continuous and reliable reconstructions that are based on fusing these sparse observations into a dynamically consistent model could facilitate the study of climate mechanisms. Such historical reanalyses furthermore provide initial conditions for skillful dynamical predictions to better mitigate harmful consequences and benefit from the changing Arctic. In retrospective prediction experiments we assess the impact of our assimilation approaches on the seasonal prediction skill in the Arctic. Retrospective predictions assume perfect historical external forcing, as standard when assessing the impact of assimilation approaches on the seasonal prediction skill (Bushuk et al., 2017;Wang et al., 2019). The difference between historical simulations and retrospective predictions provide an estimate of the skill that can be gained with the same system in real predictions. Our main focus lies on sea ice extent (SIE, area of ice coverage excluding regions with sea ice concentration (SIC) ≤15%) utilizing the good coverage of observed SICs, sea surface temperatures (SSTs), and ocean hydrographic profiles.
Dynamical prediction systems for the Arctic have only recently become available. While perfect model studies (i.e., the model is unbiased) suggest skill for Arctic sea ice up to three lead years (Bushuk et al., 2019;Day et al., 2014;Tietsche et al., 2014), these systems have up to now demonstrated skill up to only a few months ahead in a realistic setup (Blanchard-Wrigglesworth et al., 2010;Bushuk et al., 2017). For instance, utilizing SIC observations, Wang et al. (2013) found skill for Arctic SIE anomalies for up to three lead months with largest skill in late summer. Initializing with reanalyses for the ocean and the atmosphere, and with reconstructions for sea ice, Krikken et al. (2016) skillfully predicted winter SIE in the North Atlantic sector up to six lead months. Notably longer skill of up to 9-11 lead months was found in Bushuk et al. (2017), utilizing observations from ocean hydrography and the atmosphere. Along the Arctic shelf seas, the authors skillfully predicted summer SIE up to one-to-four lead months. In each of these systems data were assimilated in a weakly coupled manner (Laloyaux et al., 2016); that is, data assimilation (DA) is applied to each model component independently, and the information of this update is propagated to the other model components only during the model integration as (thermodynamical) dynamical responses. The consistency of the initial conditions is expected to be improved with strongly coupled DA (Laloyaux et al., 2016), where information from the observations is instantaneously propagated across several model components using error crosscovariances between different model variables (Penny & Hamill, 2017). However, as the spatial and temporal scales of the atmosphere, the ocean and the sea ice are widely separated, a joint update of these Earth system components is highly demanding. Considering only the ocean and sea ice with their more comparable scales is a more feasible yet still challenging approach. The potential of such an initialization for climate prediction is poorly analyzed.
One of the challenges lies in the strong and flow-dependent coupling between the sea ice and the upper ocean, partly characterized by nonstationary and anisotropic crosscovariances (Lisaeter et al., 2003;Sakov et al., 2012). Neglecting these relations has the potential to introduce large errors into the updated model variables via spurious crosscorrelations and to eventually propagate these errors to the entire system (Kimmritz et al., 2018). On the contrary, the use of a flow-dependent assimilation method, such as the Ensemble Kalman Filter (Evensen, 2003), to jointly update the sea ice and the upper ocean can lead to noticeable improvements. The multivariate nature of flow-dependent assimilation schemes could also positively affect unobserved quantities. For instance, assimilating SIC has shown to reduce biases of sea ice thickness (SIT) or to improve SST (Kimmritz et al., 2018;Massonnet et al., 2015;Sakov et al., 2012;Xie et al., 2016). Furthermore, the valuable transition from single-category to multicategory seaice models (Hunke & Lipscomb, 2008;Vancoppenolle et al., 2009) should go hand in hand with an adaptation of the applied assimilation method. Thus, the benefits of the improved model can be transferred to the reanalysis product (Massonnet et al., 2015). However, it has to be taken into account that the issue of non-Gaussian distributed variables (such as SIC or SIT) is exacerbated in multicategory seaice models compared to single-category ones. For such variables, the linear analysis update used in the EnKF would return nonphysical values. Post-processing to maintain physical ranges likely introduces biases into the system. Considering these challenges, Kimmritz et al. (2018) developed a robust and efficient scheme to assimilate SIC strongly coupled with the ocean within a fully coupled Earth system model. The scheme was tested in a controlled (perfect model) environment to identify and minimize drifts in the performance. We show here that the results of the optimal version of Kimmritz et al. (2018) can be reproduced in a historical (real world) setup. Using additional ocean hydrography observations, this is the first climate prediction system that applies strongly coupled DA with a fully coupled Earth system model. This system yields skillful sea ice retrospective predictions, particularly for July-initialized predictions of September minimum SIE and in the North Atlantic sector.

10.1029/2019MS001825
Knowledge of the physical mechanisms determining this skill can help to further enhance the prediction system, for example, by guiding the optimization of the assimilation setup or the choice of the assimilated observations. The ocean provides a crucial source of sea ice predictability with longer potential predictability of SIE on the Atlantic than on the Pacific side . In the Nordic Seas, the dominant part of SIE variability has been explained by anomalies of the ocean heat transport through the Barents Sea Opening (Smedsrud et al., 2013;Onarheim et al., 2015;Li et al., 2017;Årthun et al., 2019). With the heat capacity of a deep mixed layer, summer SST anomalies as well as spring SST anomalies stored underneath the summer mixed layer and reemerging in late fall form a source of predictability for SIE anomalies in this region (Blanchard-Wrigglesworth et al., 2011;Bushuk et al., 2017;Ordonez et al., 2018).
Within the Arctic basin, anomalies in SST play only a subordinate role for sea ice predictability (see e.g. Timmermans, 2015, for the Beaufort Sea), and SIT anomalies during ice growth serve as a crucial predictor for SIC in the next melting season (Bushuk et al., 2017;Chevallier & Salas-Melia, 2014;Ordonez et al., 2018). However, this relationship weakens in the course of a thinning Arctic sea ice cover (Msadek et al., 2014). Within the Arctic basin, SIC and SIT are regulated by the ocean circulation, such as the Beaufort Gyre and the Transpolar drift (Davis et al., 2014), which in turn are determined by the atmosphere. Particularly, winds, which rapidly lose their predictability, impact the Arctic circulation (Kwok et al., 2013;Proshutinsky et al., 2009;Tietsche et al., 2014;Venegas & Mysak, 2000). In the North Pacific and the North Atlantic, northward transport of moist air (Woods & Caballero, 2016) and strong near-surface winds (Sorteberg & Kvingedal, 2006) significantly influence the sea ice. In particular, cyclones strongly impact the sea ice export through Fram Strait (Smedsrud et al., 2011). Spurious representations of these atmospheric features limit the predictability in the Arctic.
Despite this variety of findings, a lot of mechanisms are yet poorly understood. We will focus on two questions. First, anomalies in the ocean hydrography provide a source of predictability of sea ice mainly in the North Atlantic sector. But the impact of assimilating ocean hydrography anomalies on the Arctic circulation and thus the sea ice state within the Arctic basin is a so far hardly addressed question. To fill this gap, we study a version of a fully coupled Earth system model, which has been initialized using anomalies of the ocean surface and subsurface observations. Second, despite the crucial role of SIT for the prediction skill for SIE in the Arctic basin, the observational network of SIT has been very sparse until only rather recently. Assimilating the comparably wellobserved SIC has shown some assets in reducing biases in modeled SIT. We will address the question to what extent these improvements enhance the prediction skill of SIE. In the course of our discussions, we will also touch on the issue of assimilating anomalies or the entire variable in a fully coupled Earth system model. This paper is organized as follows. In section 2 we introduce the experimental design. Section 3 is dedicated to the study of the upper Arctic Ocean and the Arctic sea ice state of the reanalysis products, where in one case we perform weakly coupled, anomaly DA of ocean hydrography and SST observations, and in a second case we additionally assimilate observed SIC anomalies strongly coupled with the upper ocean. In section 4 we assess the seasonal prediction skill for Arctic SIE based on the initialization described in section 3. We finish with a summary and discussion.

The Norwegian Climate Prediction Model
The Norwegian Climate Prediction Model (Counillon et al., 2014;Counillon et al., 2016;Kimmritz et al., 2018;Wang et al., 2016;Wang et al., 2017) combines the fully coupled Norwegian Earth system model  and a version of the Ensemble Kalman Filter (Evensen, 2003), an advanced DA technique.
NorESM is based on the Community Earth System Model version 1.0.3 (Vertenstein et al., 2012). It consists of an updated version  of the isopycnal coordinate ocean model MICOM (Bleck et al., 1992), the Los Alamos seaice model (Gent et al., 2011;Holland et al., 2012), the Community Land Model Version 4 (Lawrence et al., 2011;Oleson et al., 2010), a modified version of the Community Atmosphere Model Version 4 (Neale et al., 2010), and the Version 7 coupler (Craig et al., 2011). The ocean model is equipped with a bulk upper ocean mixed layer consisting of two density evolving layers. The 51 isopycnal layers underneath enable a minimized diapycnal vertical mixing and ensure optimized conservation of water mass properties. Potential densities, referenced to 2,000 dbar, are defined in the range 1,028.202-1,037.800 kg m −3 and thus fit best to characteristic water masses. Released salt due to freezing is evenly distributed below the upper ocean mixed layer down to the layer with a density deviating up to 0.4 kg m −3 from the ocean surface density with a maximal depth of 500 m . The seaice model formulates the internal sea ice stresses via the elastic-viscous-plastic sea ice rheology (Hunke & Dukowicz, 1997) and distinguishes between N different ice thickness categories (here N=5, Bitz et al., 2001) enabling an improved representation of thermodynamic and dynamic processes in the sea ice. Melt ponds and aerosols parameterizations are used as detailed in Holland et al. (2012). The delta-Eddington short-wave radiation transfer is implemented following Briegleb and Light (2007). NorESM is run with a latitude/longitude resolution of 1.9 • × 2.5 • for the atmosphere and the land, and a 1 • resolution for the ocean and the sea ice. This version was used in CMIP5 Tjiputra et al., 2013).
The EnKF consists of a sequential Monte Carlo integration followed by a linear analysis update. The method is multivariate, meaning that it propagates the information from the observed to the non-observed variables. Updates are flowdependent as they are estimated from the evolving ensemble of the model.
To prevent a filter collapse, we use a deterministic version of the EnKF (Sakov & Oke, 2008) instead of the standard stochastic EnKF in which observations are independently perturbed (Evensen, 2006). The DEnKF overestimates the analyzed error covariance by adding a semidefinite positive term to the theoretical error covariance given by the Kalman filter, which reduces the amount of inflation needed. This deterministic version outperforms the standard stochastic EnKF in particular for small ensemble size (Sakov & Oke, 2008).
To sustain the ensemble spread, we additionally apply the moderation and the prescreening method (Sakov et al., 2012). In the moderation technique, the observation error in the update of the error covariance is increased (here by a factor 2). The prescreening method aims to avoid an unbalanced analysis by increasing the observation error whenever the innovation is too large compared to the forecast error (here, larger than two standard deviations of the prior ensemble spread, kfactor=2). At the beginning of the reanalysis we inflate the observation error by a factor of 8 and reduce this gradually within 8 months ensuring a slow assimilation start to avoid possible initialization shocks. As the system dimension is high compared to the model ensemble size, we use a local analysis framework with a smooth distance-weighted localization function (Gaspari & Cohn, 1999;Hamill et al., 2001). The localization radius for ocean variables varies with latitude and ranges between 800 km and 2,500 km as described in Wang et al. (2017). For the assimilation of SIC we use a fixed radius of 800 km as applied in Kimmritz et al. (2018) and in agreement with Massonnet et al. (2015).
Assimilating the observations as they are (full-field DA, FF-DA) and thus minimizing the initial model error can cause a drift during the model propagation phase due to model biases and thus deteriorate the prediction skill. To avoid such a drift, we apply anomaly DA (Anom-DA); that is we remove the climatologies from the model and the observations before assimilation and only update the anomalies from the monthly mean climatological state. For a detailed discussion on that topic we refer to Weber et al. (2015), Carrassi et al. (2014). When assimilating SST and SIC anomalies we only use the nearest observation, while all hydrographic profiles within the local area are retained. Assimilation within the ocean component is carried out in an isopycnal framework as described in Wang et al. (2017). Observation errors are assumed to be uncorrelated.

Design of Reanalyses and Retrospective Predictions
We have run two 30-member ensemble reanalysis as well as the retrospective prediction (hindcast) experiments that are branched from the first nine members of the reanalyses. As a reference, we use a historical 30-member ensemble simulation of NorESM, in which all model components evolved freely, henceforth denoted as FREE. The initial states of FREE and the reanalyses, V1 and V2, are taken from a NorESM ensemble run integrated from 1850 to 1980 with historical forcing, that itself is initialized with 10-year spaced states of a preindustrial control simulation. The external forcings used to derive the reanalyses and retrospective predictions comply with the CMIP5's historical experiment (see Bentsen et al., 2013, for details). To derive the reanalysis products, we assimilate monthly averaged observations in the middle of each month. Thus, increments are applied only once per month using the closest observations within a month's time frame. The output from FREE, V1 and V2 are monthly averages from 1980 to 2010.

Journal of Advances in Modeling Earth Systems
10.1029/2019MS001825 (Gouretski & Reseghetti, 2010) comprising profiles of global ocean temperature and salinity from 1900 until the present. The data set is divided into different quality categories. We only retain data of excellent quality (category 1). Data uncertainties are determined as in Wang et al. (2017). To derive anomalies, we use the climatology based on the EN4 objective analysis from 1980-2010. HadISST.2.1.0.0 SST is a monthly stochastic reconstruction from the Met Office Hadley Center (Titchner & Rayner, 2014), provided with global coverage on a regular 1 • grid from 1850 to 2010. We assimilate the ensemble mean of the ten realizations of the data set and use the ensemble spread as a space-time varying estimator of the uncertainty of the data. The climatology of this product used for the anomaly assimilation is calculated from the period 1980-2010. The V1 implementation follows the weakly coupled DA approach of Wang et al. (2017), meaning that the assimilation only updates the ocean component. Information of this update is propagated to the other model components only during the model integration phase between the assimilation steps.
V2 is a version of NorCPM in which, in addition to the aforementioned ocean observations, anomalies of the HadISST.2.1.0.0 SIC product are assimilated. This product is provided monthly on a 1 • grid from 1850 to 2010 covering both hemispheres (Titchner & Rayner, 2014). Observation uncertainties are not provided with the SIC product, and we use a constant value of 20%, which is in agreement with the large summer uncertainties as found in Mathiot et al. (2012). As before, the climatology is derived from the period 1980-2010. The assimilation in V2 is realized in two successive steps: (1) weaklycoupled Anom-DA of only hydrographic data following Wang et al. (2017) and (2) strongly-coupled Anom-DA of SST and SIC updating the ice component and the ocean mixed layer. The assimilation of SIC and post-processing follow Kimmritz et al. (2018).
For initializing the retrospective predictions we used the states of the first nine ensemble members of FREE, V1, and V2. This choice has no implications for our conclusions, as all members are equally likely. Seasonal retrospective predictions are initialized in the middle of January, April, July, and October from 1985 to 2010 and are run for 13 months. Note that as the reanalyses have been derived by assimilating in the middle of each month using observational data until the end of that month, lead month 1 of a retrospective prediction is the following month.

Validation Metrics
Our analysis is based on ensemble averages of the monthly averaged output. As common with anomaly assimilation studies, we neglect the climatology of a variable and consider the anomaly from that state. The (seasonal) climatology used for all systems is the mean of FREE for each month calculated over the period 1980-2010. The climatological bias is defined as the difference in climatologies between the model and the observations. Given the decomposition of a (climatologically) bias-corrected variable v into a linear trend, v ′ , and detrended variability,ṽ, the decomposition indicates which part of v is attributed to the linear trend (first summand) and which part to the detrended variability (second summand); corr denotes the Pearson's correlation coefficient and v standard deviation of v.
To assess the degree to which the detrended variability of the reanalysis is synchronized with the one observed, we use correlations between detrended variables of the model and the observations. Statistical significance is tested with a Student's ttest at the significance level of 5%. The degrees of freedom are estimated from the auto-correlation as described in von Storch and Zwiers (1999). The combined effect of phase and amplitude errors averaged over time or space will be studied using with time average · and areaweighted spatial average · Ω . The area weight is set to zero in cells where observations are absent for the whole study period, x is the model variable, and y the corresponding observed variable.

The Ocean State
On interannual and shorter time scales, sea ice is crucially determined by the upper ocean (Timmermans, 2015;Årthun et al., 2017), which renders a skillfully initialized upper ocean a prerequisite for skillful seasonal prediction of sea ice. We assess the skill of the upper ocean in the reanalyses by using the objective analysis EN4 (Good et al., 2013). While objective analyses minimize the difference with the observations, our reanalyses made use of the same raw observations to produce a dynamically consistent reconstruction. As a consequence, reanalyses tend to differ more from the observations than the EN4 product (Murphy, 1993). An agreement between the EN4 and a NorCPM product indicates that the dynamical solution determined by the EnKF matches the EN4.2.1 data set with minimal error. However, the quality of the EN4 analysis is questionable in the Arctic as observations are sparse in this area. This should be kept in mind in the following.
The upper 200 m heat (HC200, Figure 1a) and the upper 200 m salinity (SC200, Figure 1b) content of FREE correlate very poorly with the EN4 objective analysis. In FREE the internal variability is removed by the ensemble averaging of the unsynchronized members and the ensemble mean mostly depicts the response to the external forcing (e.g., the solar cycle or volcanic eruptions). Thus, whenever FREE has no significant detrended correlation skill, we can assume that the linear trend is a good firstorder approximation of the model response to the external forcing and it is reasonable to assume that the detrended variability is a good approximation of the internal variability.
In contrast to FREE, Anom-DA of SST and hydrographic profiles (V1) results in a well-represented detrended variability of the upper ocean in the North Pacific and North Atlantic sector, and the regions influenced by the Atlantic water inflow. Additionally, the skill in SC200 (Figure 1e) is noticeably larger in the Amerasian basin compared to FREE (Figure 1b), which may reflect changes in the circulation of the ocean and the atmosphere (Proshutinsky et al., 2009). Note that the skill in SC200 is larger here compared to HC200, as V1 captures well the detrended variability of SC200 in EN4, in particular, two strong negative anomalies in 1992 and 2008. On the contrary, HC200 in EN4 does not have such outliers and the detrended variability of EN4 is poorly represented in V1 (not shown). Indications for a changed ocean circulation are the slightly decreased SSH in V1 (Figure 1f) compared to FREE (Figure 1c), a higher climatological freshwater content in the upper Siberian shelf seas (not shown) and changes in the water mass transport: In contrast to FREE, the exports through Fram Strait and the Canadian Archipelago in V1 exhibit a multiannual variability (Figures 1j  and 1k). As well, while the trend in the export through Fram Strait has been decreasing in FREE (from 1.71 mt/s [in 1980] to 1.6 mt/s [in 2010]), V1 has an increasing trend (from 1.43 mt/s to 1.65 mt/s). On the contrary, the export through the Canadian Archipelago has been increasing in FREE (from 1.8mt/s to 1.91mt/s), while we observe a decrease in V1 (from 1.8mt/s to 1.71mt/s). These changes in the ocean are accompanied by adaptations of the sea level pressure and the surface air temperature (not shown). A study of these changes goes beyond the scope of this paper. Yet, further analysis would be required to confirm the mentioned changes in the general circulation and to better understand the underlying mechanisms.
In V2, Anom-DA of SIC in addition to SST and hydrographic profiles only slightly affects the skill in the hydrography of the upper ocean. We suspect that the degradation of the skill in the chaotic Gulf stream region (see Figures 1g and 1h) arises from updating only in the mixed layer when assimilating SST anomalies in V2 (unlike in V1 where SST anomalies impact the entire water column). For an assessment of the isolated impact of the DA of SIC on the upper ocean we refer to Kimmritz et al. (2018).

The Sea Ice State
In the Arctic basin, skillful SIT in the ice growth season transfers to the prediction skill of summer SIE (Blanchard-Wrigglesworth et al., 2010;Bushuk et al., 2017;Guemas et al., 2016;Ordonez et al., 2018). To assess the skill of SIT in the reanalyses we compare the model results with the independent ICESat product ( Figure 2a). Data are only available between October 2003 and March 2008 and have poor accuracy near the ice edge, see, e.g., Zygmuntowska et al. (2014).
As also documented in Bentsen et al. (2013), we find a strong bias in FREE of more than 1 m thicker ice in the Arctic basin (Figure 2b), which was reported to be caused by a too weak summer snowmelt and a consequently delayed beginning of the melting. Anom-DA of ocean hydrography (V1) reduces this positive bias in SIT by about 30 cm on average (Figure 2c). Changes in the Arctic SIT have been linked dynamically to the Arctic oscillation, the Beaufort Gyre and the Transpolar Drift, and thermodynamically to the oceanic heat content and associated ocean-ice and ocean-atmosphere heat fluxes (Rigor et al., 2002;Lei et al., 2018;Årthun et al., 2019). Recall, that in section 3.1 we found indications of altered dynamic and thermodynamic states in the ocean and the atmosphere as a result of assimilating anomalies of ocean hydrography. In an experiment similar to V1 but without updating the ocean in the icecovered area, the resulting bias in SIT was very similar to that of FREE (not shown). Thus, we suspect the changes in the climatology of SIT to originate from constraining the ocean under the sea ice. Further, the slightly stronger decline in sea ice mass export through Fram and Davis Strait found in V1 compared to FREE (Figures 2e and 2f) and the increased water mass export (see section 3.1) points to the relevance of thermodynamic processes in the reduction of the bias in SIT. To better understand the underlying mechanisms, further study would be required. Additional Anom-DA of SIC (V2) reduces the bias in SIT farther with the most pronounced reduction (up to 1 m) in the first year ice region of the Arctic shelf seas. In an idealized setup, Kimmritz et al. (2018) have shown that updating the multicategory sea ice states during the assimilation efficiently improves the variability and the mean-state of SIT. The particularly strong improvement for firstyear ice is explained by the high correlation between SIT and SIC for ice in the thinnest ice category (Massonnet et al., 2015). Here, we verify that the SIT state in firstyear ice can be improved by the same mechanisms also in a real framework.
To assess the skill in SIC, we compare our model results against the HadISST SIC product, which has been assimilated in the V2 system. FREE has a (climatological) positive bias (too high SIC values) in the first year ice regions, except for the Labrador and the Bering Seas (Figure 3b), where the bias is negative (too low SIC values). Similar (inverse) patterns found for the biases in SST  relate the biases in SIC to biases in the upper ocean heat budget. Yet, atmospheric biases in NorESM, such as in the sea level pressure field over the Nordic Seas and Fram Strait (see Pontoppidan et al., 2018), may also contribute to the biases in SIC. Anom-DA of ocean hydrography (V1) and additional SIC (V2) keeps the climatological bias in SIC unchanged (not shown). While this is expected for Anom-DA of SIC (V2), it is not as straightforward for Anom-DA of ocean hydrography, particularly against the backdrop of a modified bias in SIT. This robustness in the climatology of SIC indicates a linear relationship between anomalies in SIC and anomalies in the upper ocean heat budget (Bushuk et al., 2017). Differences in the time-evolving rmse t of (seasonally biascorrected) Arctic SIC (see Figure 3a and equation (2)) between FREE (12.07% on average) and V1 (11.83% on average) are only minor. Bulk contributions of these errors are located in the first year ice regions <see also> (Kimmritz et al., 2018), indicating a poorly represented detrended variability of SIC in the summer season in both products. In fact, despite an improved JJA skill in V1 in the East Siberian, the Chukchi and the Barents Seas compared to FREE (see Figures 3f and 3g), the JJA correlation skill in SIC for both, FREE and V1, are generally poor. Additional Anom-DA of SIC (V2) reduces the rmse in SIC(9.03% on average) and also enhances the summer correlation skill in SIC (Figure 3h) in large areas in the Arctic except for the multiyear ice region in the Central Arctic, where SIC is saturated. These improvements are expected as we assimilate SIC anomalies. The correlation skill in winter SIC (Figures 3c-3e) is high for all systems in wide areas of the Arctic basin. It might indicate that the external forcing determines the skill, for instance by a global warming signal. In the Central Arctic close to the pole correlations are not significant as the detrended variability is very low here.
The so far studied detrended variability of the sea ice cover only explains a part of its entire variability. Another important contribution is the trend, which is usually excluded in the study of seasonal predictions of SIE, as it is presumed to be known. Here, we consider the linear trend in SIE over the period 1980-2010. In the North Atlantic, the observed trend in SIE has been reported to stronger contribute to the observed variability in SIE than in the North Pacific (Divine & Dick, 2006;Martin et al., 1998). Our bar plots in Figure 4 confirm these findings (compare Regions 6-8, 13, and 14 with regions 9 and 10), where the fraction of the trend is indicated by the gray bars (following equation (1)), while the blue bars represent the fraction of the detrended variability of SIE. Also, we find a strong seasonality of this fraction in the shelf seas of the Arctic basin. While in spring, the variability is almost entirely defined by the detrended variability, the declining summer SIE trend explains a large part of the variability in these regions, compared to other seasons. In contrast to the Arctic basin, a strong declining trend in SIE can define only a small fraction of the variability in the North Atlantic sector, for example, the winter Labrador or the Greenland-Iceland-Norwegian (GIN) Seas, which might be linked to a strong detrended variability of the subpolar gyre. The capability of our reanalyses to skillfully represent the trend in observed SIE (black line in Figure 4, based on HadISST2 SIC) varies strongly with region, season and the reanalysis product itself (see Figure 4). In FREE, the trend in SIE is underestimated (red lines in Figure 4), likely due to the model biases in SIT and because part of the observed trend could reflect internal climate variability (Svendsen et al., 2018) that is not captured by FREE. Anom-DA of the ocean hydrography (V1, purple lines in Figure 4) improves the trend in the Arctic shelf seas (Regions 1-5) and in the Bering Sea (Region 9) particularly in the freezing season when SIC strongly correlates with SST (Bushuk et al., 2017;Timmermans & Krishfield, 2018). Additional Anom-DA of SIC (V2, magenta lines in Figure 4) corrects the trend in the melting season in the Arctic shelf seas. Limits in the skill of the trend are due to remaining biases in SIT and a largely freely evolving (chaotic) atmospheric state, which determines the melting process and transport of sea ice (Smedsrud et al., 2011;Woods & Caballero, 2016) and thus affects the sea ice coverage. The poor trend during spring and early summer in the seas close to the open Atlantic (Regions 6 and 7 in Figure 4) and Pacific (Regions 9 and10 in Figure 4) depicts the accumulated impact from biases in the upper ocean and the atmosphere.

Seasonal Retrospective Predictions
In this section, we assess the skill of the retrospective predictions initialized in January, April, July, and October from 1985 to 2010 with focus on the Arctic. The ensemble retrospective predictions are initialized with the first nine members of the reanalyses products discussed in section 3 and are run for 13 months using perfect historical forcing.

The Ocean State
On a seasonal scale, SST variability has been reported to strongly affect air-sea interaction and to be a key factor of climate predictability in the Arctic, particularly for SIE, (e.g., Blanchard-Wrigglesworth et al., 2010;Shukla, 1998;Timmermans & Krishfield, 2018). The large heat capacity of a thick mixed layer, as found in the North Atlantic and also (to a lesser degree) in the Pacific sector, enables winter SST anomalies to persist longer in the upper ocean compared to other Arctic regions. Persistence of skillful summer SST anomalies was found to transfer to persistent and skillful SIE anomalies in ice-covered regions close to the open Atlantic (Bushuk et al., 2017). In these regions, spring SST anomalies are stored underneath the shallower, yet insulating summer mixed layer. These anomalies provide a crucial source of SIE prediction skill in the upcoming freezing season when the mixed layer deepens and SST anomalies are entrained into the mixed layer (Alexander & Deser, 1995). Figure 5g depicts the climatological mixed layer of NorESM in the Arctic region, being deepest in the North Atlantic sector. Thus, we expect skill in SST to persist longer in the North Atlantic compared to other regions in the Arctic. The very thin mixed layer in the Arctic basin indicates a very short persistence of SST anomalies. The skill in SST due to external forcing is represented by the correlations in FREE after 6 and 12 lead months (Figures 5a and 5d). It is poor except for the eastern North Atlantic, where external forcing drives the (detrended) variability in the upper ocean, and likely also the trend. Assim-

Journal of Advances in Modeling Earth Systems
10.1029/2019MS001825 Figure 6. Rmse s of (detrended, bias-corrected) SIE for FREE for each of the regions as defined in Figure 4, normalized by the area of the region. A dark red (blue) box means that the rmse s reached 20% (0%) of the region.
ilating the ocean surface and subsurface hydrography (V1) strongly enhances the prediction skill in SST. In regions with a deep mixed layer of more than 200 m, such as the subpolar gyre region, the skill resides up to 12 months, while in the Barents Sea with its on average shallower mixed layer the good skill at lead month 6 is further reduced at lead month 12. In the (ice-free) region with moderately deep mixed layer south of the Bering Shelf, the skill in SST is moderate with correlation coefficients up to 0.4 at 6 lead months and weaker at 12 lead months (not shown). Beside the moderate depth of the mixed layer, we relate this lower skill to the atmosphere, which plays an important role for the surface ocean in the midlatitude North Pacific (Wu & Kinter, 2010). Additional assimilation of SIC (V2) modestly enhances the skill in SST in the Barents Sea and the Hudson Bay up to 12 lead months. We suspect that Anom-DA of SIC positively impacts the net surface heat flux, which has been reported to determine the winter SST anomalies (Alexander & Deser, 1995).

The Sea Ice State
Among the pivotal characteristics of the sea ice state, coverage, thickness and volume, only the ice coverage has been sufficiently observed during the study period to enable a thorough evaluation of its prediction skill. Focusing on SIE, we are interested in how well each seasonal retrospective prediction can reproduce the detrended variability of observed (HadISST2-derived) SIE anomalies dependent on the initialization month, the target month, and the respective lead time.
The rmse of the (bias-free and detrended) SIE anomalies in FREE in Figure 6 reflects the impact of the external forcing within NorESM on the detrended variability of SIE and thus does not change with lead month. The rmse is large in seasons of a strong decline in observed SIE (see Figure 4), such as in the Arctic shelf seas (Regions 1-5) and in the bays west of Greenland (Regions 13 and 14). Values in the GIN and in the Bering seas are comparably low because a part of the considered region is ice-free throughout the entire Figure 7. Differences between the rmse of the V1 and V2 initialized retrospective predictions (odd/even rows) and that of FREE. The range of the color bar is set to 25% of the maximum FREE rmse (see Figure 6) in each region. Blue (red) colors denote reduction (increase) of the rmses compared to FREE. The number in the lower right corner of each plot represents the percentage of the regional area that corresponds to 25% rmse reduction/increase (dark blue/red color) in that region.
period. Within the Arctic basin and in the bays west of Greenland, rmses are low when the region is (nearly) icecovered. Like the rmses, the detrended correlations between FREE and observations suggest no skill (not shown).
Anom-DA of ocean hydrography (V1) reduces the large rmses found in FREE (Figure 7). The general independence of lead months indicates that the added value is controlled by the ocean that has a comparably long memory. In the Barents and the Labrador Seas, the rmse of SIE (Figure 7, Regions 6 and 8) is reduced nearly all year round, while correlations ( Figure 8) are skillful only in winter and spring. Recall that SST anomalies in V1 showed skill up to 12 lead months in these regions (Figure 5e) due to a comparably deep mixed layer (Bitz et al., 2005;Årthun et al., 2017). The early lead month prediction skill in winter-to-spring SIE is related to a highly anticorrelated SIC and upper ocean heat budget, particularly strong in winter-to-spring (Bushuk et al., 2017). In summer, sea ice is usually little and weak and thus prone to errors in these regions.
In the Barents Sea, the correlation skill of the July-initialized retrospective prediction (Figure 8, Region 8) reemerges at lead month 6 (January) and remains skillful into spring (up to 8 months lead time, see also section 4.1). We relate this reemergence in the skill of SIE with the deepening of the mixed layer in autumn, which leads to a reemergence of the spring SST anomalies stored beneath the mixed layer during summer (Blanchard-Wrigglesworth et al., 2011;Bushuk et al., 2017). The rapid loss of the correlation skill in the Labrador Sea is presumably due to a too vigorous subpolar gyre and biases in the SST (see Bentsen et al., 2013;Bushuk et al., 2017, which is based on FF-DA of the ocean, among others). The GIN Seas (Figure 8, Region 7) are influenced by the atmospherically driven sea ice export through the Fram Strait (Koenigk et al., 2006;Tsukernik et al., 2010). The noninitialized atmosphere, as well as biases in SIT (see Section 3.2) explain the poor skill there. In the North Pacific, predominantly in the Sea of Okhotsk, the rmse is degraded in the freezing season ( Figure 6, Region 10). Here, skill in winter SIE has been linked to the upper ocean heat budget (Bushuk et al., 2017), and the formation of sea ice is dominantly atmospherically driven (Onarheim et al., 2018). V1 neither has a skillful upper ocean heat budget in this region (Figure 1e) nor has the atmosphere been initialized. In the Arctic shelf seas, the rmse is reduced in summer and autumn up to one year ahead ( Figure 6, upper row), which is likely linked to an improved representation of SIT (see also Bushuk et al., 2017, and Figure 2). In the Chukchi and the Kara Seas, V1 has on average 1-1.5 m thinner ice than the remaining Arctic shelf seas (not shown). This enables an improved response of the sea ice state to dynamical and thermodynamical processes in the upper ocean leading to a stronger error reduction in these regions ( Figure 6, Regions 1 and 4). However, within the Arctic basin (first row in Figure 8) assimilating ocean hydrography is skillful only in the Beaufort Sea. Here, the October-initialized retrospective prediction carries skill throughout the winter yielding skill in summer-to-fall up to 12 lead months. The reemerging skill appears to be a consequence of skillful initial near-surface summer temperatures in the Beaufort Sea where surface temperature anomalies stored beneath the surface impact the sea ice cover of later seasons (Timmermans, 2015). The high correlations found in the winter season in the Laptev and the East Siberian Seas and the Canadian Archipelago originate from reductions in winter sea ice cover, both in the observations and in the retrospective predictions. Note that these high correlation coefficients are not significant as a consequence of too few effective degrees of freedom in the time series. In the seas with no skill in winter, for example, in the Chukchi Sea, the retrospective predictions or the observations do not show detrended variability for the particular target and lead month.
V2 lacks this reemerging summer correlation skill found in V1 (see Row 2, Region 5 in Figure 8), likely because only the mixed layer is updated when assimilating observed SST (unlike in V1) and SIC (see also Kimmritz et al., 2018). However, the summer SIE retrospective predictions in V2 are strongly improved in all Arctic shelf seas due to smaller biases in SIT compared to V1 (Blanchard-Wrigglesworth et al., 2010;Bushuk et al., 2017;Holland et al., 2012;Massonnet et al., 2015). Rmses in V2 are reduced up to 25% in the Arctic shelf seas compared to FREE and correlations are strongly improved (Figure 7 and 8, Row 2) up to four lead months. In V2, the summer skill in the Kara Sea (region 1) is strong throughout all considered lead months. We suspect that the rather isolated location of this sea maintains the initially high skill throughout the winter. In the remainder of the Arctic shelf seas (Regions 2-5) the skill persists only until the ice reaches full coverage. Further south, in the Barents Sea, the correlation skill found in V1 is further enhanced and prolonged in V2, mostly in late winter to spring. We relate this improvement to a more consistent representation of the combined sea ice-and-upper ocean state in V2 compared to V1, where only weakly coupled Anom-DA of ocean hydrography has been performed. The skill in the GIN Seas is remarkably improved for all seasons. For instance, the rmse in V2 is reduced by up to 25% compared to FREE and correlations in V2

Summary and Discussion
In this study, we employed the approach to assimilate SIC in a fully coupled Earth system model as developed in the controlled perfect model experimental setup of Kimmritz et al. (2018). We showed that the results of that study can be successfully reproduced in a historical setup using real observations. In combination with a weakly coupled assimilation of ocean hydrography, this system is the first reported climate prediction system that applies joint updates of the ocean and sea ice with a fully coupled Earth system model. It is capable to provide skillful seasonal retrospective predictions in the Arctic, particularly for the challenging July-initialized prediction of SIE up to September (Liu et al., 2019). In the North Atlantic, the retrospective prediction shows skill up to 10-12 lead months in the winter season, while the skill is lower in the North Pacific.
To improve our understanding of the underlying mechanisms leading to this skill, we first studied a system that only used SSTs and ocean subsurface hydrography data in a weakly coupled manner. This system shows high initial skill in the upper ocean hydrography, particularly in the North Pacific and the North Atlantic, that transfers to skillful retrospective predictions of SST up to 12 lead months in regions of a comparably deep mixed layer: in the larger subpolar gyre region from the Labrador Sea to the Atlantic inflow, and along the Norwegian coast up to the Barents Sea. In these regions, upper ocean heat content anomalies are a source of winter sea ice predictability. While the correlation skill in SIE in the Barents Sea reemerges in the winter season leading to skill up to 8 lead months, model biases in SST reduce the skill in the Labrador Sea. In the GIN Seas, which are determined by the export through Fram Strait and the atmosphere, we could not derive skillful retrospective predictions with DA of ocean data alone. Note that Wang et al. (2019) obtained better prediction skill in SIE in the Labrador, GIN, and Barents Seas despite worse prediction skill in SST. The authors used the same system (NorCPM) but assimilated only SST data. This indicates that our strategy to assimilate hydrographic profiles has the potential for improvement. We suspect that the degradation originates in a too large localization radius in the vicinity of sea ice. Within the Arctic basin, DA of ocean hydrography affected the oceanic and the atmospheric circulation, causing a reduced bias in SIT in the entire Arctic Basin. A separate study is required to identify the underlying mechanisms of the changed circulation and its impact on the sea ice state. However, improvements in SIT are only little reflected in the prediction skill of SIE.
In a second step we assessed the benefit of additionally assimilating SIC data to jointly update the sea ice and the ocean state in the mixed layer. The additional DA of SIC adds only little information to the upper ocean compared to the ocean only initialization. The joint update of the sea ice and the upper ocean yields minor prediction skill in SST in the Barents, GIN, and Labrador Seas. In the North Atlantic and the North Pacific, skill in SIE is enhanced and prolonged (up to 10 months lead time in the Barents Sea) compared to the case of weakly coupled DA of ocean only data. These improvements indicate the potential of joint updates of the upper ocean and the sea ice state for the prediction skill of SIE in this region. The most prominent benefit of additional DA of SIC is found in the Arctic shelf seas. Here, Anom-DA of SIC yields skillful correlation of SIC (particularly in summer), and strongly reduces the biases in SIT. We relate this bias reduction in SIT to the DA in different thickness categories of the sea ice model and the enhanced correlation skill in SIC to the joint update of the upper ocean and the sea ice state (Kimmritz et al., 2018). Based on this improved initial seaice state, July-initialized retrospective predictions of SIE now are skillful up to four lead months. A similar skill is found in Bushuk et al. (2017), where observations from the ocean and the atmosphere were used, indicating that the added skill in the Arctic basin due to (6-hourly) weakly coupled DA of atmospheric observations is comparable to the skill added by the (monthly) strongly coupled DA of SIC. Assimilating atmospheric observations is expected to be beneficial as the atmosphere strongly influences the ocean and the sea ice state (Pickart et al., 2003;Smedsrud et al., 2011;Woods & Caballero, 2016), and due to the good atmospheric observing network compared to the ocean and the sea ice. However, DA of atmospheric variables in a fully coupled Earth system is challenging. Due to the characteristic time scales of the atmosphere, monthly DA of the atmospheric observations is not suitable. DA of aggregated atmospheric states as done in Lu et al. (2015) allows some benefits, but most of the potentially relevant short-scale atmospheric information is lost in monthly aggregates of atmospheric states (Spensberger et al., 2017). High-frequency DA of atmospheric observations as done in Bushuk et al. (2017) would drastically increase the costs of our assimilation system, as the entire ensemble needs to stop for each assimilation step. Furthermore, the large differences in the typical scales of the different system components need to be considered, as well as the nonlinear interactions between these. Assimilating observations of the different Earth system components in a model consistent way is one of today's challenges of the DA community (Penny et al., 2019). And thorough studies are necessary to guide the choice of assimilated observations, including aspects of efficiency, choice of assimilation method and quality of the observational network.
To derive the reanalyses we applied anomaly DA (Anom-DA) of SIC, in contrast to full-field DA (FF-DA) that has been used earlier (Kimmritz et al., 2018) in a (bias-free) perfect model study. The rmse of SIC is similarly reduced in both cases. FF-DA of ocean and sea ice observations has been successfully applied for real-time operational forecasting systems, as, for example, introduced in Sakov et al. (2012). However, for seasonal to decadal predictions in a realistic setup, we prefer Anom-DA of sea ice to FF-DA, as the system tends to propagate back to its climatology on annual to decadal time scales. Moreover, possible biases in the observations, as found in the HadISST2 SIC product, may be propagated into the state of the Earth system. Yet, a drawback of Anom-DA in a highly nonlinear system is the nonlinear feedback of biases (Weber et al., 2015), which has the potential to limit the prediction skill. We may be facing this issue for instance in the Labrador Sea, where the strong SST bias in NorESM likely negatively affects the prediction skill in SIE.
In the analysis of the prediction skill, trends are usually excluded, as prediction systems are generally expected to capture trends (Bushuk et al., 2017). We found that the ability of our prediction systems to capture (linear) trends in SIE varies by region and season. Trends in SIE in FREE are too weak. In the Arctic shelf seas and the Atlantic water inflow regions, DA of ocean data improves these trends in the freezing season. This is expected in the Atlantic water inflow regions due to the tight link between the upper ocean heat budget and SIC, particularly in the ice growth season. The improvements found in the Arctic basin could originate in several mechanisms. To better understand the underlying mechanisms requires further study. As expected, additional DA of SIC improves the trends in SIE in the melting season as well.
To further improve the prediction skill, the system can be enhanced as follows: (1) Updating the entire water column when assimilating SST in the case of sea ice-ocean DA. Here, we only updated the mixed layer, which leaves the potential for better skill in the upper ocean in the Atlantic inflow region.
(2) Increasing the error of hydrographic observations underneath the sea ice. In the North Atlantic, the prediction skill of our ocean only DA is worse than assimilating only the nearest SST observation in the same system (Wang et al., 2019). We suspect that the skill reduction in our system is due to a large localization radius for hydrographic data here. Because hydrographic data are sparse underneath or close to the sea ice, the assimilation uses the less accurate climatology.
(3) Joint updates of the ocean and the sea ice. This is more efficient than weakly coupled DA (Kimmritz et al., 2018). We expect further increased skill if not only SIC is assimilated strongly coupled with the ocean but also vice versa. Sea ice can be post-processed following Kimmritz et al. (2018). We plan to assess the added skill of a combination of these three suggested improvements in the future.