Nonstationary Relationship Between Autumn Arctic Sea Ice and the Winter North Atlantic Oscillation

The North Atlantic Oscillation (NAO) has a dominating influence on wintertime weather in the North Atlantic region, and therefore, it is of great interest to predict the NAO several months ahead. While state‐of‐the‐art dynamical forecast models appear to be increasingly skillful in predicting the NAO, statistical methods with comparable or higher predictive skill are still often used. An inherent problem with statistical methods is that any empirical relationship between predictors and the NAO may be valid for some periods but subject to change over time. Here we use a set of new centennial reanalyses and large‐ensemble simulations with multiple climate models to discover clear evidence of nonstationarity in the lagged correlation between autumn Barents‐Kara sea ice and the winter NAO. This nonstationarity leads us to question the causality and/or robustness of the ice‐NAO link. We caution against indiscriminately using Barents‐Kara sea ice to predict the NAO.


Introduction
Skillful forecasting of wintertime weather anomalies several months in advance appears attainable. As the North Atlantic Oscillation (NAO) is the dominant mode of variations in atmospheric circulation anomalies in the extratropical North Atlantic (Hurrell, 1995), it is often the benchmark target for seasonal prediction studies in that region. Recently, dynamical seasonal forecasting models have shown skill in predicting NAO anomalies a season to a year ahead (Baker et al., 2018;Dunstone et al., 2016). The source of this skill is still not fully understood, but one potential factor of many is the autumn sea ice in the Kara Sea (Scaife et al., 2014). By combining empirical predictors and dynamical models, Dobrynin et al. (2018) used a subsampling approach to obtain very high NAO prediction skill. Statistical models often show even higher skill than state-of-the-art dynamical systems for forecasting the NAO. In a recent study by Wang et al. (2017), anomalies of sea ice concentration, sea surface temperature (SST), and snow cover were used to predict the NAO (and surface variables) several months ahead. In their empirical model, autumn Barents-Kara (BK) sea ice was the strongest predictor of the winter NAO. Hall et al. (2017) also identified autumn sea ice as a key statistical predictor of the winter NAO over the period 1980-2016. It is worth noting that in the analyses of Wang et al. (2017) and Hall et al. (2017), the skill of NAO predictions was highly sensitive to the inclusion, or not, of the large downward trend in sea ice, with higher skill when using detrended time series. This sensitivity raises potential concerns about the robustness of the ice-NAO link.
An argument against relying on predictions based on past training data is that empirical relationships may change over time. Kolstad and Årthun (2018) recently showed that seasonal anomalies of the NAO and Central England Temperature were strongly linked to Northeast Atlantic SST anomalies several months before during the satellite era (i.e., after 1979), but according to a twentieth century reanalysis, these relationships did not hold before that. Another example of nonstationarity of teleconnections is that the lagged correlation between Eurasian snow cover and midlatitude weather (Cohen & Entekhabi, 1999) did not emerge until the 1970s (Douville et al., 2017;Peings et al., 2013).
The aim of this study is to investigate the stationarity of NAO predictability based on Arctic sea ice in the BK Seas. Many studies have explored the influence of sea ice variations on the atmospheric circulation (e.g., Alexander et al., 2004;Herman & Johnson, 1978;Murray & Simmonds, 1995;Petoukhov & Semenov, 2010). More specifically, the link between autumn sea ice anomalies and subsequent winter weather has been widely studied (e.g., Caian et al., 2018;Francis et al., 2009;García-Serrano et al., 2015;Honda et al., 2009;Hopsch et al., 2012;Jaiser et al., 2012;Koenigk et al., 2016;Kretschmer et al., 2016). One physical mechanism that has been proposed is a stratospheric pathway linking low sea ice to negative NAO (Dethloff et al., 2019;Nakamura et al., 2015;Sun et al., 2015). While many of the aforementioned studies have observed a positive correlation between interannual variability in autumn BK sea ice and the winter NAO over recent decades, this link is not robust in model simulations. Model experiments with specified reductions in sea ice have yielded a full spectrum of NAO responses, from a significant positive shift (Cassano et al., 2014;Screen et al., 2014), to no significant effect (Singarayer et al., 2006;Strey et al., 2010), to a significant negative shift (Magnusdottir et al., 2004;Seierstad & Bader, 2009).
It is not clear why models diverge in this regard (Screen et al., 2018;Smith et al., 2019), but one possible explanation could be that the relationship between autumn sea ice and winter NAO is nonstationary, perhaps due to the response being dependent on the background climate state (Smith et al., 2017), or the response could be nonlinear with respect to the magnitude or spatial pattern of sea ice loss (Chen et al., 2016;McKenna et al., 2018;Overland et al., 2016;Peings & Magnusdottir, 2014;Petoukhov & Semenov, 2010;Semenov & Latif, 2015;Sun et al., 2015). Furthermore, causality of the observed relationship is still questionable. Even in those model experiments that reproduce the observed relationship between reduced sea ice and the negative NAO phase, it is often unclear what is the specific contribution from autumn sea ice loss. Recently, Blackport and Screen (2019) performed model experiments to isolate the role of autumn sea ice loss from that in other seasons. These authors found that although year-round sea ice loss enhanced the negative phase of the winter NAO, solely autumn sea ice loss had no discernable effect on the winter NAO.
In summary, there are many open questions regarding the apparent link between autumn sea ice and the winter NAO. Here we seek to address some of those questions, specifically: Has the observed correlation between autumn sea ice and the winter NAO been stationary over time? Is the observed ice-NAO correlation in recent decades unprecedented or usual in a longer-term context? According to large climate model ensembles, how much variation in the ice-NAO correlation arises due to external climate drivers versus internal climate variability?

Data and Methods
For the period after 1979 we used the ERA-Interim reanalysis (Dee et al., 2011). We also used longer reanalyses: version 2c of NCEP's Twentieth Century Reanalysis (Compo et al., 2011;20CR henceforth), which covers 1850-2014, but as recommended, we only used the period after 1865 due to a bias in marine observations between 1851 and 1865 (NOAA ESRL PSD, n.d.), and the ECMWF's CERA-20C (Laloyaux et al., 2018) and ERA-20C (Poli et al., 2016), which end in 2010 and go back to 1901 and 1900, respectively. CERA-20C is a coupled reanalysis with 10 ensemble members, and we used each member individually. 20CR consists of 56 ensemble members, but we used ensemble means of each variable.
We also used all simulations from the following ensembles: the 40-member Community Earth System Model Large Ensemble Project (Kay et al., 2014), which covers 1920-2100 and is referred to henceforth as CESM (we only used data up to 2080); the 100-member Max Planck Institute Grand Ensemble (MPI hereafter), which covers 1850-2100 (Bittner et al., 2016); the 50-member Canadian Earth System Model Large Ensemble (CanESM2 hereafter; Kirchmeier-Young et al., 2017), covering 1950-2080; and a 30-member ensemble based on the GFDL ESM2M model (GFDL hereafter; Rodgers et al., 2015), which starts in 1950 and ends in 2100. Up until 2005 the models used observed external forcing, and after that they followed the RCP8.5 emissions scenario (Meinshausen et al., 2011).
Following Hall et al. (2017), we computed time series for BK sea ice based on area averages from 70°N to 85°N and between 30°E and 90°E. Our results were not sensitive to the choice of boundaries. Sea ice data were taken from all the reanalyses and models, and in addition, we used sea ice from the data set compiled by Walsh et al. (2017) based on satellite data and historical sources. The sea ice data series are not consistent across the observational-based data sets. In 20CR version 2c, the sea ice is based on the COBE-SST2 data set (Hirahara et al., 2014), and the sea ice in ERA-20C is taken from the Hadley Centre Sea Ice and Sea Surface Temperature data set version 2.1 (HadISST2; Titchner & Rayner, 2014). As CERA-20C is a coupled oceanatmosphere reanalysis, its sea ice is simulated, but it is constrained by observations as both the ocean and atmosphere were nudged toward observations.
The NAO index based on pressure data from Stykkisholmur, Iceland, and Lisbon, Portugal, was downloaded from NCAR's Climate Data Guide (Hurrell & National Center for Atmospheric Research Staff, 2018) and standardized. For the climate models and reanalyses, the NAO index was computed in several steps. First, time series of sea level pressure (SLP) data for the nearest grid points to the two stations were compiled. Second, these were standardized individually. Third, a new time series consisting of the difference between the standardized SLP data from Lisbon and Stykkisholmur was created. Fourth, that time series was standardized.
We define R as the interannual Pearson's correlation between area-averaged BK sea ice in October and the NAO index in the subsequent winter from December to February. When R refers to the correlation between sea ice based on Walsh et al. (2017) and Hurrell's NAO index, we label it "H/W." Most of the analysis is based on 30-year time series, none of which were detrended (see discussion in section 4). To assess the statistical significance of R and other correlations, we created 10,000 pairs of time series consisting of 30 random numbers drawn from the normal distribution and computed the correlations between each pair. We then defined the 2.5th and 97.5th percentiles of the 10,000 correlation coefficients as the thresholds for significance at the 5% level.  Figure 1b shows R for each possible 30-year period, now starting in 1865. We find that R is significantly positive in the most recent period and in agreement with the values of R derived using ERA-Interim sea ice. Looking at the period before 1979, R is only significantly positive for 30-year periods starting in the late 1800s and is significantly negative or nearly so for some 30-year periods starting in the 1940s. This suggests that R is highly nonstationary and can even assume negative values. This nonstationarity is supported by all three reanalyses, although there are clear differences between the data sets. In particular, the coupled reanalysis CERA-20C diverges from the other data sources, as it is the only one with significantly negative R for periods starting in the 1910s. There is a marked shift in 20CR and ERA-20C from low sea ice variability before about 1960 to much higher variability after that, and this could be a reason for the weak correlations in those data sets in the pre-satellite era.

Results
To illustrate how the correlation between autumn BK ice and North Atlantic winter SLP has varied in time and space, Figure 2 shows maps of grid point correlations for the three 30-year periods for which R was most negative, closest to zero, and most positive (according to CERA-20C). For the 30-year period with the most negative R (starting in 1916), the pattern resembles the negative phase of the NAO (Figure 2a). The correlations for the 30-year period of neutral R (starting in 1933) bear little resemblance to the NAO. For the 30-year period with the most positive R (starting in 1980), the spatial pattern is consistent with a positive NAO index, even if the northern center of action is far from Iceland (Figure 2c). These results demonstrate that the temporal variability of R is associated with radical differences in the circulation patterns linked to sea ice variability and is not just due to subtle changes in the NAO centers of action (Pedersen et al., 2016).  We now turn to the question of how much R varies due to internal climate variability, as depicted by large ensembles of climate model simulations. Figure 3a shows the time evolution of R between 1920 and 2080 in 40 realizations of the Community Earth System Model, which differ from each other only in their temporal depiction of internal climate variability. The ensemble-mean R is close to zero in all 30-year periods and shows no trend. Individual ensemble members, however, show significant (multi) decadal variability, with R fluctuating between positive and negative values. This shows two things: first, this model confirms the nonstationarity of R, and second, this nonstationarity appears to arise due to internal climate variability and is not a forced response to external climate drivers. We note that because the temporal variability of R is internally driven, we do not expect the simulated variability of R to be in phase with the observed variability of R. However, if we preferentially select the ensemble member that best matches the observations, we find that this individual ensemble member fairly well captures the observed time evolution of R, suggesting the model is adequately capturing the magnitude and time scales of variability in R. Other single-member evolutions of R supported this finding (not shown). Results based on simulations from three other large ensembles are shown in Figures 3b-3d. In each case we observe (1) that there is no discernible forced trend in R; (2) that R fluctuates between positive and negative values due to internal variability but is rarely significantly positive or negative for long time periods; and (3) that individual ensemble members show internally driven variability of R on (multi)decadal time scales, consistent with observations. Last, Figure 4a summarizes the range of R-values found in observations, reanalyses, and models. Given that we find no forced variability (i.e., trends) in R, we combine all possible 30-year periods, and for each model, all ensemble members, into a single distribution. It is clear that the recent period, as depicted by ERA-Interim, is unusual in having a median value of R that is significant. The data sources covering longer time periods all have median values of R that are far from significant. However, the positive values in ERA-Interim are not without precedent. All the reanalyses and models display occurrences of positive correlation as high or higher than seen in ERA-Interim. We note that the MPI model suggests R can reach as high as almost 0.8, which is appreciably higher than observed in recent decades. With the exception of ERA-Interim, all the data sources suggest that R can assume significantly negative values.

Discussion
Our results indicate that R does not have consistent magnitude or even a preferred sign. Long-term observations, reanalyses, and climate models all suggest that R is most often small and statistically insignificant; however, large decadal variability can give rise to rare 30-year periods of significantly positive or significantly negative R values. While we found some minor differences in the frequency distributions of R between data sets, this result is robust in each of the climate models and reanalyses. It appears that recent decades, particularly the 1980s, 1990s, and 2000s, are one such example of a rare 30-year period with highly positive R. Since around 2010, R has decreased.
Even though the climate models display no multidecadal trends in the mean value of R, the projected sea ice decline through the 21st century may have an impact on the decadal variability of R. To investigate this possibility, we repeated the analysis after first removing the forced sea ice decline (by subtracting the ensemble Figure 4. Box plots illustrating the range of correlations between Barents-Kara (BK) sea ice concentration (SIC) in October and the winter (December-January-February) North Atlantic Oscillation index for each data source. Each box extends from the lower (Q 1 ) to the upper quartile (Q 3 ) of R, and the horizontal lines show the median. The upper "whiskers" extend to the highest data points lower than Q 3 + 1.5 IQR, where IQR is the interquartile range defined as IQR = Q 3 -Q 1 and the lower whiskers extend to the lowest data points greater than Q 1 -1.5 IQR. The circles show outliers, and the dashed lines indicate the positive and negative thresholds for statistical significance at the 5% level for 30-year periods. mean). As shown in Figure 4b, the ranges of R were indistinguishable from those shown in Figure 4a. The implication is that-insofar as the climate models are realistic-the loss of sea ice does not lead to a forced change of the mean or variability of the ice-NAO correlation.
There are several possible interpretations of the nonstationarity in R, which we briefly discuss here. First, the nonstationarity could imply that the ice-NAO relationship is noncausal and that R fluctuates simply due to internal climate variability. To test this interpretation, we created another set of R for the climate models where instead of using matching 30-year periods for sea ice and NAO, we correlated random 30-year periods of sea ice data with other random 30-year periods of NAO data. Figure 4c shows that the resulting ranges for R were indistinguishable from those shown in Figure 4a. This suggests that the spread of R seen in Figure 4a could arise purely from random sampling of two unrelated physical variables. Second, if there is a causal relationship, it could be very weak and masked by internal variability. An alternative third perspective is that the ice-NAO relationship is an example of necessary but insufficient causation, leading to intermittency in R (Overland et al., 2016). A specific case of necessary causation that could explain the nonstationarity of R is dependence on the background climate state. For example, the magnitude and even sign of the NAO response to sea ice may depend on the background climatological flow via the refraction of planetary waves (Smith et al., 2017). In such a way it is plausible that decadal variability in the North Atlantic climate could control the magnitude and sign of R. Further work is required to ascertain which of these valid interpretations is/are correct.
The nonstationarity in R serves as a warning about the potential pitfalls of statistical predictions. Empirical prediction models and machine learning based on a certain training period may fail when applied to a different prediction period. Such nonstationarity is an argument for using and improving dynamical models, which might be able to simulate the processes that give rise to nonstationarity. Indeed, past studies have found that the skill of dynamical NAO forecasts varies over time (Kumar & Chen, 2018;O'Reilly et al., 2017;Weisheimer et al., 2019), which might indicate nonstationarity in the drivers of NAO variability. Anecdotally, skill in predicting the NAO  shows broadly consistent multidecadal variability to that shown here for R. It is unclear whether the strength of R is important for the skill of NAO predictions or whether both NAO prediction skill and R vary due a common influence, such as the mean background state.

Conclusions
We have shown that the interannual correlation between autumn BK sea ice and the winter NAO is nonstationary and displays considerable (multi)decadal variability. The strong positive correlation observed in recent decades is a rare occurrence in the historical record, but it is within the range of internal variability as simulated by climate models. Large climate model ensembles suggest that internal climate variability can give rise to infrequent periods of significant positive or significant negative correlation but that the correlation is most often weak. We find no evidence for variability in the ice-NAO correlation due to external climate drivers, such as anthropogenic greenhouse gases. Although the recent observed high correlation can be explained purely by internal variability, we cannot fully rule out a contribution from external forcing superimposed on the internal variability, in part because it remains unclear whether or not climate models can adequately represent physical links between sea ice and the NAO. Also, some modeling studies do suggest a forced response of the NAO to sea ice loss, albeit weak compared to internal variability. Our results reveal that the ice-NAO correlation is nonrobust and raise further questions about the causality of this proposed connection, or at least, the strength of any causal relationship. Given the clear nonstationarity of the ice-NAO relationship, we caution against the use of autumn sea ice as a statistical predictor of the NAO. CanESM2, and MPI large ensembles, respectively, ahead of their public release and Marius Årthun for taking part in the initial phase of our study. The complete CESM and MPI large ensembles have been publicly released and are obtainable at http://www.cesm. ucar.edu/projects/community-projects/ LENS/ and https://www.mpimet.mpg. de/en/grand-ensemble/, respectively. Selected variables (including those used here) from the CanESM2 and GFDL large ensembles have been collated by the U.S. CLIVAR Large Ensembles Working Group and made publicly available online (https://www.earthsystemgrid.org/dataset/ucar.cgd.ccsm4. CLIVAR_LE.html).