Internal Variability and Disequilibrium Confound Estimates of Climate Sensitivity From Observations
Abstract
An emerging literature suggests that estimates of equilibrium climate sensitivity (ECS) derived from recent observations and energy balance models are biased low because models project more positive climate feedback in the far future. Here we use simulations from the Coupled Model Intercomparison Project Phase 5 (CMIP5) to show that across models, ECS inferred from the recent historical period (1979–2005) is indeed almost uniformly lower than that inferred from simulations subject to abrupt increases in CO2 radiative forcing. However, ECS inferred from simulations in which sea surface temperatures are prescribed according to observations is lower still. ECS inferred from simulations with prescribed sea surface temperatures is strongly linked to changes to tropical marine low clouds. However, feedbacks from these clouds are a weak constraint on long-term model ECS. One interpretation is that observations of recent climate changes constitute a poor direct proxy for long-term sensitivity.
Plain Language Summary
Even if we remove the uncertainty associated with human behavior, we still don't know exactly how hot it is going to get. This is because warming associated with increased atmospheric carbon dioxide triggers climate changes that themselves can accelerate or decelerate the warming. The equilibrium climate sensitivity (ECS) is defined as the eventual warming in response to a doubling of atmospheric CO2, and it is tempting to estimate this quantity from recent observations. However, in climate models, the ECS inferred from recent decades is lower than the eventual warming for two reasons. First, it takes the climate many centuries to fully come to equilibrium, and models indicate that we should expect even more warming in the future. Second, the conditions experienced in the real world seem to have given rise to especially low estimates of ECS, perhaps purely by chance. Climate models indicate that not only are ECS estimates based on recent decades lower than the eventual warming, but they may not even be predictive of that warming. A climate model that shows strong warming in response to recent real-world conditions does not necessarily have high long-term sensitivity, and vice versa.
1 Introduction
Equilibrium climate sensitivity (ECS), defined as the equilibrium global mean temperature response to the doubling of atmospheric carbon dioxide concentration, can be inferred by interpreting observations of temperature change, top-of-atmosphere energy imbalance, and effective radiative forcing in an energy balance framework (see Gregory et al., 2002; Lewis & Curry, 2015; Otto et al., 2013, among many examples).
However, a range of recent work (e.g., Armour, 2017; Gregory & Andrews, 2016; Zhou et al., 2016) suggests that such estimates may underestimate equilibrium warming. The explanation for the bias has been interpreted in a variety of ways. The total increase in radiative forcing experienced contains contributions from a range of forcing agents, each with a possibly unique efficacy (Hansen et al., 2005), that is, a different global mean temperature response per unit change in radiative forcing. In this view the high efficacy of tropospheric aerosols and land use change, which have formed a relatively large fraction of historical effective radiative forcing, biases estimates of ECS relative to a future forcing dominated by greenhouse gases (Marvel et al., 2016; Shindell, 2014). An alternative view notes that estimates of climate sensitivity are partly governed by the partitioning between surface temperature changes and deep ocean heat uptake and that the efficacy of this ocean heat uptake (Rose et al., 2014; Rugenstein et al., 2016; Winton et al., 2010)—the rate at which heat is mixed into the deep ocean—may have been anomalously low during this historical period. This explanation is intimately tied to the 1998–2013 “hiatus” (Fyfe & Gillett, 2014) in which sea surface temperature patterns were dominated by anomalously cool conditions in the eastern tropical Pacific (Kosaka & Xie, 2013). Finally, the energy balance framework, like the forcing-adjustment-feedback paradigm on which it is based, assumes that perturbations to the climate system are small enough that feedbacks can be considered constant, but recent experience (Armour et al., 2013; Gregory et al., 2015) shows that this assumption rarely holds even for the quadrupled-CO2 state from which ECS is frequently inferred.
What these interpretations have in common is the idea that relationships between forcing and response are mediated by the spatial pattern of surface warming, especially the ocean surface, which warms more slowly than land. Using experiments with preindustrial forcings and observed sea surface temperatures (SSTs) and sea ice, Gregory and Andrews (2016) established, in the HadGEM climate model, that the climate feedback parameter over the 1979–2005 Atmospheric Model Intercomparison Project (AMIP) period is systematically higher than the equilibrium feedback parameter. Because ECS is inversely related to this parameter, this implies an underestimate of the equilibrium sensitivity. Zhou et al. (2016) proposed a physical mechanism for this difference: in the CESM climate model, the particular SST pattern realized during the AMIP time period leads to enhanced tropical marine low cloud cover and a negative shortwave cloud feedback that does not continue in the long term.
Here we show that both results apply more broadly to the collection of models participating in CMIP5. We use a perfect-model paradigm to reaffirm that the temporal evolution of sea surface temperature makes it nearly inevitable that any estimate of ECS based on observations of the recent past would be lower than the “true” ECS and to demonstrate that the particular manifestation of surface warming to which the Earth has been subject acts to amplify this bias. We show that the first component of bias could result from disequilibrium and the emergence of feedbacks in the future that have not been excited during historical warming, while the second component may arise from the confounding of internal variability (noise) with forced response (signal).
2 Methods
We exploit data from the Coupled Model Intercomparison Project, Phase 5 (CMIP5; Taylor et al., 2012) to investigate the roles of SST patterns in apparent and long-term climate sensitivity in a multimodel framework. We use three sets of experiments: (1) atmosphere-only simulations for the period 1979–2005, using observed SSTs, sea ice concentrations, and anthropogenic and natural forcings, labeled “amip”; (2) “historical” simulations using the same forcings but in which sea ice and SSTs are predicted by fully coupled models; and (3) “abrupt4xCO2” simulations with coupled atmosphere-ocean models in which atmospheric carbon dioxide is abruptly quadrupled from preindustrial concentrations. In the perfect-model framework the historical simulations constitute a plausible sample of the time-evolving SST patterns that might have occurred due to internal variability and increasing forcing, while the amip experiments are constrained to reflect the pattern of SST and sea ice that was actually experienced. Long-term change is inferred from abrupt4xCO2 simulations, which are generally integrated for 140 years following CO2 quadrupling. This duration is not nearly long enough to allow the deep ocean to approach equilibrium but allows significantly different patterns of ocean warming to emerge after the initial decades when internal variability influences model feedbacks.

In their absence we use, for every model, the time series of historical forcing estimated in Myhre et al. (2013) and a canonical value of $F_{2\times\mathrm{CO}_2}$ = 3.7 W m−2. Any impacts on our results of systematic relationships between ECS and F are mitigated by likely correlations between the $F_{2\times\mathrm{CO}_2}$ and F terms appearing in the numerator and denominator.
A climate feedback parameter λ is inferred from these transient quantities as the regression slope of y = ΔF − ΔQ against x = ΔT; ECS is then estimated as $\mathrm{ECS} = F_{2\times\mathrm{CO}_2}/\lambda$. ECS estimates are not highly sensitive to the methodology used; regressing over 5 year running means or simply subtracting the first decade from the last (not shown) yields similar results.
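The regression-based estimate described above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and the synthetic anomaly series are hypothetical, and the only value taken from the text is the canonical 3.7 W m−2 forcing for doubled CO2.

```python
F_2X = 3.7  # W m^-2, canonical doubled-CO2 forcing quoted in the text


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed against x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


def apparent_ecs(delta_t, delta_f, delta_q, f_2x=F_2X):
    """Return (ECS, lambda), with lambda the slope of (dF - dQ) against dT."""
    y = [f - q for f, q in zip(delta_f, delta_q)]
    lam = ols_slope(delta_t, y)
    return f_2x / lam, lam


# Synthetic (hypothetical) annual anomalies built so that dF - dQ = 1.85 * dT.
true_lambda = 1.85
delta_t = [0.02 * i for i in range(27)]   # K
delta_f = [0.05 * i for i in range(27)]   # W m^-2
delta_q = [f - true_lambda * t for f, t in zip(delta_f, delta_t)]

ecs, lam = apparent_ecs(delta_t, delta_f, delta_q)
print(f"lambda = {lam:.2f} W m^-2 K^-1, ECS = {ecs:.2f} K")
```

Because the synthetic series are noise free, the fitted slope recovers the prescribed feedback parameter; with real model output the scatter about the regression line is one source of the spread discussed in section 3.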
Estimates of long-term ECS are those reported in Caldwell et al. (2014), which were obtained by regressing annual mean temperature anomalies against top of the atmosphere (TOA) energy balance changes (Forster et al., 2013; Gregory et al., 2004). Because feedbacks in most models become more positive as time scales approach equilibrium (Armour, 2017; Gregory et al., 2015; Proistosescu & Huybers, 2017), these estimates derived from 140 years of the abrupt4xCO2 experiments are themselves likely to underestimate the eventual climate response after many centuries.
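For readers unfamiliar with the regression approach of Gregory et al. (2004), the sketch below shows the general idea under stated assumptions: the TOA imbalance N in an abrupt4xCO2 run is regressed on the temperature anomaly, the temperature at which the fitted line reaches N = 0 is taken as the equilibrium warming for quadrupled CO2, and that value is halved to represent a doubling. The halving step, the function names, and the synthetic data are assumptions made for illustration and are not necessarily the exact procedure used to produce the reported values.

```python
def ols_fit(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def gregory_ecs(delta_t, toa_imbalance):
    """Equilibrium warming for doubled CO2 from an abrupt4xCO2-style regression."""
    slope, intercept = ols_fit(delta_t, toa_imbalance)
    warming_4x = -intercept / slope   # temperature anomaly where fitted N reaches zero
    return warming_4x / 2.0           # assumed scaling from 4xCO2 to 2xCO2


# Synthetic (hypothetical) 140 year run: N = 7.4 - 1.2 * dT, with dT relaxing upward.
delta_t = [5.0 * (1.0 - 0.97 ** (year + 1)) for year in range(140)]
toa_imbalance = [7.4 - 1.2 * t for t in delta_t]

long_term = gregory_ecs(delta_t, toa_imbalance)
print(f"long-term ECS = {long_term:.2f} K")
```

A single straight-line fit of this kind assumes a constant feedback parameter over the 140 years, which is the assumption the text notes is likely to lead to an underestimate of the eventual response.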
3 Results
3.1 Apparent Equilibrium Climate Sensitivities in AMIP, Historical, and Long-Term Simulations
Figure 1 shows the distributions of ECS inferred from the CMIP5 amip, historical, and abrupt4xCO2 simulations. The median value of ECS inferred from amip simulations (1.8°C) is significantly lower than the median inferred from historical simulations (2.3°C). Because amip and historical simulations use the same forcing over the same time period, this suggests that the specific realization of internal variability experienced in recent decades provides an unusually low estimate of ECS. This interpretation is subject to the caveats of the perfect-model framework, including our assumption that the models as a group provide realistic descriptions of the mechanisms underlying observed climate variability.

The median ECS value inferred from historical simulations is, in turn, smaller than the median “long-term” value (3.1°C). This suggests, as in Rose et al. (2014) and Armour (2017), that disequilibrium effects contribute to an underestimate of ECS because model climate feedbacks become more positive in the far future.
The historical ECS range is large, with 90% of the samples in the range 1.6–3.9°C. This variability reflects intermodel variations in the inferred ECS and neglects the intrinsic uncertainty in estimating ECS over a 26 year period; the resulting spread is comparable to the spread in long-term ECS estimates (2.2–4.4°C). The distribution inferred from amip simulations is more sharply peaked, although the long tail of high values means that the 5–95% range (1.3–3.5°C) is not substantially smaller. This sharper peak of ECS values arises because amip runs represent a single set of SST conditions by construction, while historical simulations represent a far wider range of possible SST patterns and hence pattern-dependent feedbacks. Still, the amip spread is quite large, given that all models are forced by identical SST patterns and, we assume, similar radiative forcings.
Supporting information Figure S1 shows the inferred ECS for each of the CMIP5 ensemble members on a model-by-model basis. The intramodel spread in ECS inferred from historical simulations is larger in all models than the spread inferred from AMIP simulations; this is unsurprising, because all members of the AMIP ensemble experience the same SST patterns while the SST patterns in the unconstrained historical ensembles sample multiple realizations of internal variability. However, the intramodel spread in ECS inferred from AMIP simulations is nonzero and almost 1 K in some cases. The same general circulation model, forced by the same radiative forcing and experiencing the same SSTs and sea ice, can yield different inferred climate sensitivities over this short 26 year period. This suggests that while variations in SST pattern dominate differences in ECS inferred from amip simulations, internal variability in the atmosphere and over land remains an important source of variability in ECS estimates drawn from short observational records.
3.2 Relationships Between Sensitivities Calculated in Different Experiments
Many modeling centers submitted multiple amip and historical simulations. The distributions in Figure 1 are calculated by giving each of these ensemble members associated with each CMIP5 model equal weight. We can gain further insight into the differences between amip sensitivities and historical sensitivities by comparing ensemble means on a model-by-model basis. Figure 2a, comparable to Figure S1 in Gregory and Andrews (2016), shows the ensemble mean amip sensitivity and historical sensitivity for each model used to construct Figure 1. In most models, the ensemble mean ECS estimate from historical simulations exceeds the ensemble mean ECS estimate from AMIP simulations.

The ensemble mean equilibrium climate sensitivity estimated from historical simulations is, in turn, generally less than the “long-term” ECS (Figure 2b). There is a weak but positive correlation (R = 0.39) between the ensemble mean ECS inferred from a model's AMIP simulations and the ECS inferred from the ensemble average over that model's historical simulations. This suggests that models with low sensitivity to the specific SST patterns of the AMIP period also tend to have low sensitivity to other SST patterns realizable over the same time period. The correlation between a model's ensemble mean “historical” ECS and its long-term ECS is more robust (R = 0.66). This larger correlation is likely due to the ensemble averaging process, which damps internal variability relative to the forced response. If only one member of each ensemble is included, the correlation between historical and long-term ECS is much smaller (R = 0.38), albeit significantly different from 0. However, there is no correlation (R = −0.07) between a model's inferred ECS in response to amip conditions and its long-term sensitivity (Figure 2c). Across the collection of CMIP5 models there is no simple relationship between the response to the SST pattern experienced in recent decades and the long-term response.
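The model-by-model comparison can be sketched as follows: ensemble members are first averaged within each model, and a Pearson correlation is then computed across models. The model labels and ECS values below are invented placeholders, not CMIP5 results, and the helper names are hypothetical.

```python
from math import sqrt


def ensemble_means(members_by_model):
    """Average the inferred ECS over the ensemble members of each model."""
    return {model: sum(vals) / len(vals) for model, vals in members_by_model.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Placeholder ECS values (K) per ensemble member; purely illustrative.
amip_members = {"model_a": [1.0, 2.0], "model_b": [2.0, 2.0], "model_c": [3.0, 4.0]}
hist_members = {"model_a": [2.0], "model_b": [2.5, 3.5], "model_c": [4.0]}

amip_mean = ensemble_means(amip_members)
hist_mean = ensemble_means(hist_members)

# Compare only models present in both experiments, in a fixed order.
models = sorted(set(amip_mean) & set(hist_mean))
r = pearson_r([amip_mean[m] for m in models], [hist_mean[m] for m in models])
print(f"R = {r:.2f} across {len(models)} models")
```

Replacing each ensemble mean with a single member would reintroduce internal variability into both axes, which is consistent with the weaker single-member correlation reported in the text.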
Figure 2d shows histograms of ECS bias for amip and historical simulations. This bias is defined as the difference between the ECS inferred from a historical or amip simulation and the ECS calculated from the corresponding abrupt4xCO2 simulation on a model-by-model basis. The mean amip bias (1.4°C) is much larger than the mean historical bias (0.8°C), even when considering the outliers (all members of the GISS physics version 1 and 3 ensembles) where the amip ECS exceeds the abrupt4xCO2 ECS. This suggests that amip SST conditions led to an especially severe underestimate of ECS. This underestimate may result from internal variability: the climate just happened to experience a particular SST pattern that caused a lower-than-average bias. It could also arise from the failure of the coupled models to reproduce aspects of the forced response.
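A minimal sketch of the bias calculation follows. The sign convention is an assumption: bias is taken here as the long-term (abrupt4xCO2) ECS minus the ECS inferred from an individual amip or historical member, so that positive values indicate an underestimate, in line with the positive mean values quoted above. All numbers and names are hypothetical placeholders.

```python
def ecs_bias(inferred_by_model, long_term_by_model):
    """Bias of each ensemble member relative to its own model's long-term ECS.

    Assumed convention: bias = long-term ECS - inferred ECS (positive = underestimate).
    """
    biases = []
    for model, members in inferred_by_model.items():
        reference = long_term_by_model[model]
        biases.extend(reference - value for value in members)
    return biases


# Placeholder values in K; not taken from any CMIP5 model.
long_term = {"model_a": 3.0, "model_b": 4.0}
amip_inferred = {"model_a": [1.5, 2.0], "model_b": [2.5]}

amip_bias = ecs_bias(amip_inferred, long_term)
mean_amip_bias = sum(amip_bias) / len(amip_bias)
print(f"mean amip bias = {mean_amip_bias:.2f} K over {len(amip_bias)} members")
```

Each member is compared with the long-term value of its own model, so models that contribute more members carry more weight in the mean unless the biases are first averaged within each model.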
3.3 Tropical Marine Low Clouds as a Controlling Factor
The intermodel and intramodel spread in ECS inferred from amip experiments is striking, as different CMIP models are forced with the same SST and sea ice conditions by construction. What, then, explains the differences in amip climate sensitivity? Zhou et al. (2016) use experiments with a single climate model to suggest that decadal variations in cloud cover, particularly tropical marine low clouds, strongly bias estimates of ECS over the recent historical period relative to long-term estimates. As shown below, this inference holds over a broader range of CMIP5 models.
Following Qu et al. (2014), we calculate the average change in tropical marine low cloud cover ΔLCC in the regions shown in Figure 3a for AMIP and historical experiments. Low clouds are identified by their apparent cloud-top pressures greater than 680 hPa using the ISCCP simulator (Klein & Jakob, 1999; Webb et al., 2001). Using the ISCCP simulator, which diagnoses the changes as would be observed from the TOA, limits the number of models but ensures that the cloud changes are precisely those that impact the TOA radiation budget and hence the feedbacks. Changes are defined as the (1996–2005) average minus the (1979–1988) average.
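The epoch-difference definition of change can be written compactly. The sketch below assumes a regional-mean annual time series of low cloud cover has already been computed; the series used here is synthetic and the function names are hypothetical.

```python
def epoch_mean(years, values, start, end):
    """Mean of values whose year falls within [start, end], inclusive."""
    selected = [v for y, v in zip(years, values) if start <= y <= end]
    return sum(selected) / len(selected)


def delta_lcc(years, lcc):
    """Change in low cloud cover: (1996-2005) average minus (1979-1988) average."""
    return epoch_mean(years, lcc, 1996, 2005) - epoch_mean(years, lcc, 1979, 1988)


# Synthetic (hypothetical) regional-mean low cloud cover in percent, rising 0.1 per year.
years = list(range(1979, 2006))
lcc = [30.0 + 0.1 * (year - 1979) for year in years]

change = delta_lcc(years, lcc)
print(f"dLCC = {change:.2f} percentage points")
```

Differencing two decadal means rather than fitting a trend reduces the influence of individual years, at the cost of discarding the 1989–1995 portion of the record.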

The differing responses of low clouds in these regions to the same imposed pattern of sea surface temperature explain almost three quarters of the variance in ECS estimates in the amip simulations (Figure 3b): low apparent sensitivity is associated with increases in low cloud cover. The relationship in historical simulations over the same period (Figure 3c) is similarly negative (R = −0.74) and, despite the small number of models providing ISCCP simulator data, significantly different from 0 (p < 0.01). This suggests that low cloud cover changes in these subtropical regions are important in determining the ECS inferred over the short observational period regardless of SST pattern.
Relatively few models incorporated the ISCCP simulator, so not all CMIP5 models are represented in Figure 3. However, the relationships between tropical marine low cloud cover change and inferred ECS are unlikely to be artifacts of using a subset of CMIP5 models; changes in tropical marine shortwave radiative cloud effect (a standard CMIP5 output available for many more models) yield similar results (supporting information).
These results, combined with those presented in Figure 2, suggest a need for caution in extrapolating ECS estimated from observations of recent change to the far future. CMIP5 models' sensitivity to the observed SST patterns appears to be largely controlled by low cloud changes in the stratocumulus regions. This process is also highly explanatory of the spread in historical ECS estimates, suggesting that low inferred sensitivities for amip simulations may be an artifact of the particular SST pattern experienced—itself a combination of external forcing and internal variability—and the changes in low clouds induced by this pattern. However, observed low cloud changes in the stratocumulus regions are not predictive of future climate. Figure 2c indicates that there is no simple relationship between a model's amip sensitivity, largely controlled by low cloud changes, and a model's long-term sensitivity.
4 Conclusions
Our results are built on several antecedents. Gregory and Andrews (2016) established that the climate feedback parameter in the HadGEM model estimated from amip-like experiments with prescribed SST observed over 1979–2005 is systematically higher than the equilibrium feedback parameter from this model, implying that estimates of ECS from recent decades would be biased low. The link to cloud feedback was established by Zhou et al. (2016), who used the CESM model to demonstrate that the particular observed SST pattern leads to enhanced tropical marine low cloud cover and a negative shortwave cloud feedback that does not continue in the long term. More recently, Silvers et al. (2017) showed similar responses in the Geophysical Fluid Dynamics Laboratory (GFDL) models, contrasting increases in tropical low cloud cover during the recent low-sensitivity period with decreases in tropical low clouds during an earlier period of higher sensitivity.
We have shown that both results apply more broadly to the collection of models participating in CMIP5. In such models, the ECS inferred from simulations using observed SSTs is lower than ECS in coupled simulations over the same time period in which SSTs are allowed to evolve. These historical simulations, in turn, yield lower ECS estimates than simulations in which CO2 is abruptly quadrupled.
We suggest that a low bias in ECS inferred from temperature change, imbalance, and forcing over recent decades is due to climate disequilibrium. Models tend to project more positive feedbacks in the far future than in the recent past, and taking these centennial-scale modes into account largely reconciles historical and long-term estimates (Proistosescu & Huybers, 2017). But, to the extent that the models participating in CMIP provide a reliable sample of possible ocean states, those recent decades appear to have experienced a pattern of sea surface temperatures that excited unusually negative feedback in tropical marine low clouds, leading to an even lower estimate of climate sensitivity than would have been expected under more usual historical conditions. The amip conditions appear to be unusual in the historical context due to the particular manifestation of internal variability experienced. There also remains the nonexclusive possibility that coupled models simply fail to capture important aspects of the real-world climate response to forcing. However the spatial pattern of SST arose during the AMIP period, the fact remains that this SST pattern yields lower estimates of ECS in models than most patterns produced by coupled historical simulations.
If greenhouse gas emissions continue to increase, then the resulting increase in radiative forcing will reduce the role of internal variability, enhancing the signal-to-noise ratio. Nonetheless, the climate may pass through a series of states where the feedbacks do not resemble those found at equilibrium. This suggests that there are no direct analogues to be found in the recent past; the only way to experience equilibrium climate is to wait for equilibrium. Moreover, while intermodel differences in amip ECS are largely explained by different subtropical stratocumulus cloud changes, a model's amip ECS is not predictive of its long-term ECS. Evidently, other feedbacks are dominant in the long term, eventual stratocumulus changes are unrelated to recent stratocumulus changes, or both. This suggests that ECS estimates inferred from recent observations are not only biased but do not necessarily provide any simple constraint on future climate sensitivity.
Acknowledgments
Climate modeling at GISS is supported by the NASA Modeling, Analysis and Prediction Program, and resources supporting this work were provided by the NASA High-End Computing (HEC) Program through the NASA Center for Climate Simulation (NCCS) at Goddard Space Flight Center. We acknowledge the World Climate Research Program's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups (listed in Table S1 of this paper) for producing and making available their model output. For CMIP the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. K. M. is supported by the Regional and Global Climate Modeling Program of the U.S. Department of Energy under grant DE-SC0014423. R. P. is supported by the Regional and Global Climate Modeling Program of the U.S. Department of Energy under grant DE-SC0012549 and by the National Science Foundation under grant ATM-1138394.





