Enhancing Skill of Initialized Decadal Predictions Using a Dynamic Model of Drift

Since near‐term predictions of present‐day climate are controlled by both initial condition predictability and boundary condition predictability, initialized prediction experiments aim to augment the external‐forcing‐related predictability realized in uninitialized projections with initial‐condition‐related predictability by appropriate observation‐based initialization. However, and notwithstanding the considerable effort expended in finding such “good” initial states, a striking feature of current, state‐of‐the‐art, initialized decadal hindcasts is their tendency to quickly drift away from the initialized state, with attendant loss of skill. We derive a dynamical model for such drift, and after validating it we show that including a recalibrated version of the model in a postprocessing framework is able to significantly augment the skill of initialized predictions beyond that achieved by a use of current techniques of postprocessing alone. We also show that the new methodology provides further crucial insights into issues related to initialized predictions.


Introduction
The ability to predict near-term changes in climate, here understood as up to a few decades, is as scientifically challenging (e.g., see Meehl et al., 2009Meehl et al., , 2014, and references therein) as it is important. The importance of such an ability stems, for example, from the fact that our attempts to assess societal and ecological impacts of climate change and propose, study, and plan adaptation and mitigation strategies are only as accurate and useful as the climate change predictions are correct (e.g., see MacLeod et al., 2012). As such, a series of coordinated decadal prediction experiments have been put in place to help improve understanding of the various issues related to decadal predictions (e.g., see Taylor et al., 2012 andEyring et al., 2016).
Since near-term variability of present-day climate is influenced by both natural variability and external forcing (Lorenz, 1975), two types of such decadal prediction experiments are typically considered: uninitialized projections (UP) that account for changes in climate due to changes in external forcing and initialized decadal prediction experiments which aim to augment the external-forcing-related predictability realized in UP with natural-variability-related predictability by appropriate observation-based initialization. While uninitialized climate change simulations date back to the earliest attempts at climate modeling, interest in initialized (decadal) predictions (IP) is more recent. Indeed, interest in the latter became mainstream after initial studies such as Smith et al. (2007), Keenlyside et al. (2008), and Pohlmann et al. (2009), about a decade ago, suggested that the predictive aspects of simulations of climate over the past half a century could benefit from observation-based initialization. In the context of such IPs, a primary objective is the improvement in skill of such predictions, beyond that achieved in UPs.
A comparison of predictions of past climate (hindcast) to observations (verification data) typically reveals biases in the former (bottom panel of Figure 1). Such "model bias" can be attributed, on the one hand, to errors in initial conditions (ICs) and, on the other, to "model error." Here, by model error we refer to deficiencies in the model such as insufficient resolution and inaccurate parameterizations of unresolved processes which lead to the model's inability to accurately simulate the delicate dynamical balance of processes that underlies both the mean state and the modes of variability of the climate system. It is easy to see how IP efforts can be undermined by model bias: If model bias were to be large in comparison to interannual climate variability (size of the signal to be predicted), the skill in such predictions would necessarily be small (bottom panel of Figure 1). As such, reducing model bias is a top priority of modeling centers (e.g., see Hawkins et al., 2014;Hurrell et al., 2009). But it is also a very challenging task. Consequently, while it is an area of active, ongoing research, progress is typically incremental and slow but steady (e.g., see Drews et al., 2015). The main question we address in this article is the following: Given the IPs (initialized predictions) in cyan, how can they be postprocessed to be more predictive of the observations in black? Answering this question leads to being able to predict near-term changes in climate better. The development of a better postprocessing method has a greater impact in the imperfect model scenario (bottom panel); Coupled Model Intercomparison Project models fall in this category. Our answer is based on first developing a dynamic model that describes the dominant behavior of the IPs (cyan and blue). After validating the model, we show that the new postprocessing method better predicts observations over the hindcast period, when compared to state-of-the-art methods for postprocessing alone. The drift model developed suggests that a faster loss of memory of ICs (initial conditions), as in the cyan IPs, is more typical (cf. IPs in blue) and provides other insights (discussed in the final section).
Given this state of affairs, for current, state-of-the-art IPs to be useful, bias correction in a postprocessing fashion is essential (e.g., see Choudhury et al., 2017;Doblas-Reyes et al., 2013;Kharin et al., 2012).
In IPs, a wide range of techniques are used for either full-field or anomaly initialization and to generate the associated ensemble of initial perturbations (e.g., see Hazeleger et al., 2013;Magnusson et al., 2013;Polkova et al., 2019;Smith et al., 2013). Such techniques range from data assimilation in the ocean component alone (e.g., Pohlmann et al., 2009;Nadiga et al., 2013) to coupled data assimilation (e.g., Yang et al., 2013) to partially coupled spin-up (Thoma et al., 2015), to spinning up ocean and sea ice components with reanalysis products (e.g., Yeager et al., 2018) and others (e.g., see Matei et al., 2012). Even though this is an area of active research, a salient aspect of IPs is that the initial state in these simulations, at a particular start date, is closer to observations than the state of the (possibly an ensemble of) UPs at that same date.
Notwithstanding the considerable effort expended in finding such "good" initial states, a striking feature of current, state-of-the-art, initialized decadal hindcasts is their tendency to drift away from the initialized state (e.g., Hazeleger et al., 2013;Magnusson et al., 2013;Meehl et al., 2014;Smith et al., 2013; bottom panel of Figure 1; Figures 2 and 3; Figure 1 of Kim et al., 2012, and others). Since model drift and model bias are sometimes used interchangeably, for definiteness, we will refer to the early behavior of IPs that leads to a (sometimes quick) departure from its initialized state as "initialization-induced drift" or "initialization drift" Figure 2. Evolution of the ensemble mean of IPs started at four different times is shown (cyan downward trending lines). When bias corrected using TBC, the resulting predictions I (C) are shown in yellow. Drift computed using the dynamical drift model described in section 2 and supporting information is shown in magenta and captures the dominant behavior of the IPs. Also resulting from the model is the attracting trajectory shown in blue. This is seen to follow the uninitialized projection trajectory shown in red; the correspondence is even greater with RDM2 (see Figure  S3). These elements are recombined statistically and bias corrected using TBC to obtain the corrected predictions I (RC) which are shown in green. The new procedure improves hindcast skill of the IPs: averaged over all lead times and start years, not only is root-mean-square error reduced by 10-15%, but anomaly correlation coefficient increases by about 4-6% (see Tables 1 and S1). IP = initialized predictions; TBC = trend-based bias correction; SST = surface temperature; NCAR = National Center for Atmospheric Research; CESM = Community Earth System Model. or simply drift. A primary reason for such drift is related to the fact that the climate system loses memory of its IC and transitions from a regime of IC-related (equivalently natural-variability-related) predictability to external forcing-related predictability in near-term predictions. When such a transition occurs in a model that is biased, initialization drift is exacerbated and the IP trajectories appear to almost jump away from the initialized state (bottom panel of Figure 1). Other reasons may include different bias characteristics in the equilibrated states of the system used to obtain the initial state and those used for prediction. Given that fixing model bias and developing initialization methods are already tasks that are actively worked on and are of high priority at modeling centers, currently, we concern ourselves with the more pragmatic question of whether one can profitably take into account initialization drift in a postprocessing framework to enhance the skill of IPs. Toward this end, we develop a dynamical model for initialization drift and show that a postprocessing methodology based on this model of drift is able to significantly enhance the skill of IPs when compared to using existing state-of-the-art techniques of bias correction alone. We also show that the new methodology provides further crucial insights into issues related to IPs of climate.
In order to illustrate each of the points discussed above, motivate the new model of drift and the postprocessing method presented, and facilitate further related discussions, we start with two schematics in Figure 1. In both of these panels, one is interested in predicting the black trajectory. In each of these two panels, an uninitialized-projection ensemble is shown in red, and two IP ensembles are shown in blue and cyan. The individual ensemble members are shown in thin lines, and the ensemble mean is shown in thick lines. The top panel corresponds to a perfect model scenario, whereas the bottom panel corresponds to an imperfect model scenario and is more realistic.
In each of the panels, the uninitialized projections are seen to capture the upward trend of the black trajectory. The upward trend of the black trajectory itself should be thought of as caused by external forcing. As such, the UPs are seen to capture predictability that arises from external forcing. Next, the schematics illustrate the point that sophisticated state estimation procedures have permitted a proper initialization of the IPs whereby the IPs are coincident with observations at the start of 1972. In the initial period up to about 2 years later, the IP ensembles are seen to be closer to the observations than the UPs are to the observations. This is illustrative of the IPs being able to augment the forcing-related predictability of UPs with natural-variability-related predictability by proper initialization. In these IP ensembles the transition from natural-variability-related predictability to external-forcing-related predictability is seen to occur between a year and 2 years.
The difference in behavior of the two (blue and cyan) IP ensembles is because of the different rates at which they lose memory of ICs. That is, the cyan IP ensemble is seen to remain closer to observations for a shorter period of time than the blue IP ensemble because the cyan system loses memory of the ICs faster. In this case, it may also be noted parenthetically that while the tendency of the climate system to lose memory of its initial condition and transition from natural-variability-related predictability to external-forcing-related predictability is verified in models, unlike in the weather context, the relevant time scale on which this occurs in the climate context is model dependent.
Next, in the perfect model scenario (top panel), because the model represents the various processes and small differences between such processes that lead to the observed state and its variability perfectly, there is no systematic deviation between the modeled state and its variability and the observed state and its variability. However, because it is difficult to achieve such levels of modeling accuracy, it is almost always the case that models exhibit biases. Because the observed mean state typically results from a dynamical balance between various processes (small differences between large numbers), errors in modeling lead to a biased ("cooler") mean state in the lower schematic. Finally, a faster loss of memory of ICs in the cyan IP ensemble in conjunction with model bias is seen to result in an almost-jump-like behavior wherein the cyan IP trajectories are seen to quickly move away from the observations.
The main question we address in this article can also be illustrated in the schematics of Figure 1: Given the IPs in cyan, how can they be postprocessed to be more predictive of the observations in black? Answering this question leads to being able to predict near-term changes in climate better. Development of a better postprocessing method has a greater impact in the imperfect model scenario (bottom panel); CMIP models fall in this category. Our answer is based on first developing a dynamical model that describes the dominant behavior of the IPs in cyan. After validating the model, we show that the new postprocessing method better predicts observations over the hindcast period, when compared to state-of-the-art methods for postprocessing. The drift model developed suggests that a faster loss of memory of ICs, as in the cyan IPs, is more typical (cf. IPs in blue) and provides other insights that we will discuss in the final section.

A Dynamic Model of Drift
In the postprocessing framework, state-of-the-art methods of bias correction are purely statistical in nature and comprise mean bias correction, trend-based bias correction (TBC), and initial-state-based bias correction (e.g., Choudhury et al., 2017;Doblas-Reyes et al., 2013;Kharin et al., 2012, and others). Symbolically, given P j , a generic prediction with a start year j and at lead time , the corrected prediction P (C) is obtained as where B j (P j , V j ) is a statistical approximation of the bias of P j with respect to V j which is the verification or reference product; it is one of B (M) , or B (T) , or B (I) for the three schemes mentioned above. Details of these techniques are given and discussed in the supporting information (SI) In keeping with the usage of the term initialization drift as discussed in section 1, to be clear, when current methods of bias correction are applied to IPs, they indeed correct for such drift. What we are interested in examining is whether the dynamical aspects of initialization drift have a further useful role to play: For example, such an approach may lead to insights into IPs and eventually lead to their improvement. Or better yet, such an approach will, right away, also lead to further enhancement of the skill of IPs beyond that achieved by current bias correction techniques alone.
We now transition from the schematics of Figure 1 to considering a particular instance of near-term prediction. Here, the quantity of interest (QoI) is the average of the sea surface temperature (SST) between 60 • N and 60 • S in the decadal prediction large ensemble (DPLE) experiment at the National Center for Atmospheric Research (NCAR) that uses the Community Earth System Model (CESM). Not only does the climate model used in this study have a large user base, but also these recent and sophisticated experiments represent the state-of-the-art in IPs and have been made available as a community resource. (See Yeager et al., 2018, and SI for details.) In Figure 2, the cyan lines correspond to the ensemble mean of the IPs started at four different times and which extend over lead times up to 10 years. We consider a set of predictions started at 62 successive years , but just four of them are shown for illustration without clutter.
In Figure 2, the (rather rapid) drift of the IP trajectories away from their respective initialized states and toward a seemingly different state is evident (cf. cyan line in bottom panel of Figure 1). Next, toward developing a model of this behavior, we decompose the IP with a start year j and at lead time (I j ) into an initialization drift D j and a residualÎ : and focus on a model for D. The model for D is derived in SI starting from a suitably averaged version of a scalar transport equation. The model for D that results from that derivation, with D j0 = I j0 and j = 0 + 1 j, however, can be intuitively seen to capture the behavior of IPs either in the schematic of Figure 1 or in the more realistic scenario of Figure 2: It captures the transition from natural-variability-related predictability achieved through initialization to external-forcing-related predictability in terms of losing memory of ICs at a rate of j per year and eventually being attracted to A(t).
Given the ensemble mean of the IPs I j , the attracting trajectory A(t) and parameters 0 and 1 are found using nonlinear least squares that minimizes the squared sum of the residual D j − I j across all js and all s. We consider two alternatives for inferring A: A ≡ a 0 + a 1 j + a 2 j 2 (Drift Model DM) or A ≡ {a j } (alternative DM2). While there is no conceptual difference between these two models, the statistical parsimony of DM makes it better suited when IPs are performed for only a few start years. Thus, at this point, using only the IPs I j , we have identified dynamical features that consist of (i) an attracting trajectory A(t) (blue line in Figure 2 and which roughly follows the UP), (ii) a drifting component of the IP D j that takes the initialized state at the time of initialization toward the attracting trajectory at later times (magenta lines), and (iii) a residual componentÎ (difference between cyan and magenta lines).
The drifting of IPs away from the initialized state toward the attracting state A(t) conforms with our notion of the climate system forgetting its IC with time and reverting to a preferred state (climatology in the real sys-tem; a biased state in the imperfect modeling context). At this stage, notwithstanding the fact that the drift behavior that we have successfully modeled is valid dynamical behavior, it may be worthwhile to momentarily adopt a point of view that the drift component has no further useful predictive information in it. However, on doing so, and comparing the skill of the residual componentÎ alone against traditional bias correction techniques, the poorer skill ofÎ reiterates the likelihood of the drift component containing, at least, some predictive information. That is, drift leads to IPs losing skill, but because it is explainable and expected dynamical behavior, it likely contains some information that can be used to enhance the skill of IPs. In the next section, we focus attention on how the validated drift model can be used to recover such information. Finally, we also note that if the behavior of the near-global mean of SST in CESM-DPLE is in any way generic, it appears that the drift away from the initialized state is rather quick-quicker than the multiyear to decadal scale at which IC-related predictability is generally thought of as transitioning over to forcing-related predictability (e.g., see Branstator & Teng, 2010Meehl et al., 2014).

The Recalibrated Drift Model
The validation of the drift model in Figure 2 highlights and reiterates the point of view that one of the main problems underlying IPs is that the model attractor is significantly removed from the initialized state (which itself is related to observations). Consequently, one can imagine that if the (inferred) attractor is empirically corrected to be closer to the initialized state, the skill of the IPs will improve. We therefore consider a correction of the attractorÂ given bŷ Here, the corrected attractor is obtained using the trend-based bias correction with respect to predictions based solely on the IC of the IPs I (0) .
Next, since the premise of the work is that the drift of IPs is valid dynamical behavior (even though it leads to loss of predictive skill in IPs), and since we have modified the "model attractor," it follows that the drift should be modified as well: We will get to the actual determination of the parameters involved momentarily, but note that, for example, 0 and 1 are statistical degrees of freedom. Note that this step may be justified given the observation that while the tendency of the climate system to lose memory of its initial condition is verified in models, unlike in the weather context, the relevant time scale on which this occurs in the climate context is model dependent. Finally, on having reencoded the useful component of drift D j in the modified driftD , we propose that the modified drift be recombined with the residual trajectoryÎ to form the recombined (or recalibrated; see next) IP I (R) as where 0 and 1 are further statistical degrees of freedom.
Up to this point, the only inputs to the analyses are the IPs I j . If the statistical degrees of freedom introduced in (4) and (5) are determined so as to minimize the difference between I (R) and I (0) in the least squares sense, it may be anticipated that I (R) will outperform I j in terms of skill metrics. Conventional bias correction cannot be applied at this point since observations have not been brought into the picture yet. Pragmatically, however, one is most interested, instead, in the final improvements over the hindcast period given observations. Toward examining such metrics and making comparisons to the state of the art, given the verification product V(t) over the hindcast period, we correct the recalibrated prediction I (R) using trend-based bias correction: Note. Both RMSE and ACC are improved (RMSE is reduced, while ACC is augmented). This is seen in each of the four cases considered with two different models (NCAR-CESM-DPLE, GFDL-CM2.1), with a surface-related QoI versus a more depth-averaged QoI (NCAR SST, NCAR ALFWC), and with a more global QoI versus a more regional QoI (NCAR SST, NCAR SPNA)-QoIs with different inertia, bias, and interannual variability characteristics. Improvements with respect to the other two commonly used bias correction techniques are given in the table in  This is necessary, not only because there are (typically) differences between the ICs of IPs I j0 and observations V(t) but also because we have focused on correcting IPs with respect to one dynamical aspect (drift) when there are likely other reasons for the IPs not tracking observations.
Thus, the methodology is given by (1)-(6) and involves two nonlinear least squares problems-one to find

10.1029/2019GL084223
increase ACC) for each of the QoIs considered. Improvements with respect to the other two commonly used bias correction techniques tend to be greater, and they are given in Table S1.

Discussion and Conclusion
Starting from the fact that near-term predictions of climate are affected by both natural variability and external forcing, we developed a model for initialization drift, an early component of evolution of IPs that leads to a quick departure from the initialized state to a preferred (biased) state. This model permitted an identification of the preferred attracting state, the drift component along with a time scale for the loss of memory of IC, and a residual component of the IP trajectory. After recovering useful predictive information from the drift component and encoding it in the form of a modified drift, we proposed the recalibrated drift model that recombines the modified drift and the residual component in statistical fashion to minimize RMSE with respect to observations. RDM was then shown to reduce RMSE and increase ACC in a few situations when compared against current methods of bias correction. The situations considered included QoIs in two different models, an SST-related QoI versus a depth-integrated QoI, and a more global QoI versus a more regional QoI. Since the different QoIs considered exhibit different characteristics such as inertia, interannual variability, and bias, we expect broader applicability of the proposed technique.
Our modeling of the initialization-related drift provides further insights into issues related to IP. For example, in Figure 2, for the SST-related QoI considered there, the attracting trajectory inferred using only the IPs approximately follows the trajectory of the UP. The similarity is even stronger when the alternative model of drift RDM2 is used ( Figure S3). However, in the same model, when a different SST-related QoI was considered that was more regional, in this case averaged over the subpolar North Atlantic, the trajectory of the inferred attracting state is far removed from the trajectory of the QoI in the UP, as seen in Figure 3. Indeed, in contrast to the attracting nature of the inferred trajectory, the IPs are seen to evolve away from the UP. Notably, here the trajectory of the QoI in the UP is closer to observations than in the IPs. On the one hand, this begs questions such as (a) Is there strong external forcing in this region, and if so what is it? (b) Does external-forcing-related predictability dominate in these regions? (c) What aspects of initialization in the subpolar region contribute to IPs being dramatically bad? On the other hand, it raises questions about how the state that corresponds to the inferred attracting trajectory and that corresponding to the UP can be dynamically reconciled or if there are multiple preferred states. The proposed drift model also helps sharpen the question of why the (average) e-folding time related to the loss of memory in the SPNA context is much longer than that for the other QoIs considered and as to why the loss of memory of ICs in the latter QoIs is much faster than has been anticipated in the literature (e.g., Meehl et al., 2014, and others). Understanding such issues raised by the analysis will likely lead to a better understanding of IP systems and may help improve them.
We anticipate that further analysis of the proposed method, for example, toward better understanding details of its working, its range of applicability, its limitations, and how it can be improved, will be useful. For example, (a) our derivation of the drift model assumes an average over a semienclosed basin, and consequently we applied the method to QoIs that were averaged or integrated over large regions. From inertia considerations alone, predictions for such QoIs have a better chance of being skillful. How much and under what circumstances can this assumption be violated and the model still be useful? What improvements are to be expected for spatially distributed fields? (b) It is encouraging that RMSE was reduced in each of the cases considered, but as to why there was also a simultaneous improvement in ACC in each of the cases is not something that we can currently explain. In this context it is also interesting to note that (Table S1) unlike with the current methods of bias correction, with the new method and its alternative, low RMSE and high ACC occur concurrently.