Equilibrium Climate Sensitivity Estimated by Equilibrating Climate Models

The methods to quantify equilibrium climate sensitivity are still debated. We collect millennial-length simulations of coupled climate models and show that the global mean equilibrium warming is higher than those obtained using extrapolation methods from shorter simulations. Specifically, 27 simulations with 15


Estimating Equilibrium Climate Sensitivity
The equilibrium climate sensitivity (ECS) is defined as the global-and time-mean, surface air warming once radiative equilibrium is reached in response to doubling the atmospheric CO 2 concentration above pre-industrial levels.It is by far the most commonly and continuously applied concept to assess our understanding of the climate system as simulated in climate models, and it is used to compare models, observations, and paleo-proxies (Charney et al., 1979;Houghton et al., 1990;Knutti et al., 2017;Stocker, 2013).Due to the large heat capacity of the oceans, the climate system takes millennia to equilibrate to a forcing, but performing such a long simulation with a climate model is often computationally not feasible.As a result, many modeling studies use extrapolation methods on short, typically 150-year long, simulations to project equilibrium conditions (Andrews et al., 2012(Andrews et al., , 2015;;Calel & Stainforth, 2017;Collins et al., 2013;Forster, 2016;Lewis & Curry, 2015;Taylor et al., 2011;Otto et al., 2013).These so-called effective climate sensitivities (Gregory et al., 2004;Murphy, 1995) are often reported as ECS values (Brient & Schneider, 2016;Forster, 2016;Hargreaves & Annan, 2016;Tian, 2015).Research provides evidence for decadal-to-centennial changes of feedbacks (e.g., Armour et al., 2013;Gregory et al., 2004;Murphy, 1995;Paynter et al., 2018;Proistosescu & Huybers, 2017;Senior & Mitchell, 2000;Winton et al., 2010) but the behavior on longer time scales has not been compared among models.Here, we utilize LongRunMIP, a large set of millennia-long coupled general circulations models (GCMs) to estimate the true equilibrium warming, study the centennial-to-millennial behavior of the climate system under elevated radiative forcing, and test extrapolation methods.LongRun-MIP is a model intercomparison project (MIP) of opportunity in that its initial contributions were preexisting Geophysical Research Letters 10.1029/2019GL083898  09 (16.04 -19.19) 11.20 (11.00 -11.42) 12.19 11.87 (11.51 -12.33)     simulations, without a previously agreed upon protocol.The minimum contribution is a simulation of at least 1,000 years with a constant CO 2 forcing level.The collection consists mostly of doubling or quadrupling step forcing simulations ("abrupt2x", "abrupt4x", … ) as well as annual increments of 1% CO 2 increases reaching and sustaining doubled or quadrupled concentrations ("1pct2x", "1pct4x").Table 1 lists the simulations and models used here, while Rugenstein et al. (2019) documents the entire modeling effort and each contribution in detail.

10.1029/2019GL083898
The equilibration of top of the atmosphere (TOA) radiative imbalance and surface temperature anomaly of the simulations are depicted in Figure 1.Throughout the manuscript, we show anomalies as the difference to the mean of the unforced control simulation with pre-industrial CO 2 concentrations.Light gray dots indicate annual means of the first 150 years of a step forcing simulation, requested by the Coupled Model Intercomparion Project Phase 5 and 6 protocols (CMIP5 and CMIP6; Eyring et al., 2016;Taylor et al., 2011) and widely used to infer ECS (Andrews et al., 2012;Geoffroy et al., 2013).We refer to this time scale as "decadal to centennial".Colors indicate the "centennial to millennial" time scale we explore here.The diminishing distances to the reference line at TOA = 0 indicate that most simulations archive near equilibrium by the end of the simulations.However, even if a simulation has an equilibrated TOA imbalance of near zero, the surface temperature, surface heat fluxes, or ocean temperatures can still show a trend (discussed in Rugenstein et al., 2019).
Throughout the manuscript, we use "ΔT [specification] " for a true or estimated equilibrium warming, for a range of forcing levels not only CO 2 doubling (Table 1).We define the best estimate of equilibrium warming, ΔT best est , as the temperature-axis intersect of the regression of annual means of TOA imbalance and surface temperature anomaly over the simulations' final 15% of global mean warming (black lines in Figure 1).The lower right panel in Figure 1 illustrates that all simulations eventually warm significantly more (measured by ΔT best est ) than predicted by the most commonly used method to estimate the equilibrium temperature by extrapolating a least-square regression of the first 150 years of the same step forcing simulation (Gregory et al., 2004;Flato et al., 2013), denoted here as "ΔT est 1−150 ".For simulations that have gradual forcings (e.g., 1pct2x), we use 150-year long step forcing simulations of the same model to calculate ΔT est 1−150 .The median increase of ΔT best est over ΔT est 1−150 is 17% for all simulations and 16% for the subset of CO 2 doubling and quadrupling simulations.While ΔT est 1−150 implies a constant feedback parameter (the slope of the regression line), other extrapolation methods allow for a time-dependent feedback parameter, but still typically underestimate ΔT best est : Using years 20-150 in linear regression (ΔT est 20−150 ; e.g., Andrews et al., 2015;Armour, 2017) results in a median equilibrium warming estimate which is 7% lower than ΔT best est , both for all simulations and the subset of CO 2 doubling and quadrupling.The two-layer model including ocean heat uptake efficacy (ΔT EBM− ; e.g., Geoffroy et al., 2013;Winton et al., 2010) results in a multimodel median equilibrium warming estimate which is 9% lower than ΔT best est , again both for all simulations and the subset of CO 2 doubling and quadrupling.Both methods are described and illustrated in the supporting information.
ΔT best est of any forcing level can be scaled down to doubling CO 2 levels to estimate equilibrium warming for CO 2 doubling.We do so by assuming that the temperature scales with the forcing level, which depends logarithmically on the CO 2 concentration (Myhre et al., 1998), and by assuming no feedback temperature dependence (e.g., Mauritsen et al., 2018 andRohrschneider et al., 2019, see discussion below).The estimate of equilibrium warming for CO 2 doubling range from 2.42 to 5.83 K (excluding FAMOUS abrupt4x at 8.55 K; see Table 1 and Figure 1).Note that the simulation abrupt4x of the model FAMOUS warms anomalously strongly.As this simulation represents a physically possible result, we do not exclude it from the analysis (see more details in supporting information section 3).The results are qualitatively the same if ΔT best est is defined by regressing over the last 20% instead of 15% of warming or instead time averaging the surface warming toward the end of every simulation without taking the information of the TOA imbalance into account.Supporting information section 1 discusses different options and choices to determine ΔT best est .

Global Feedback Evolution
Current extrapolation methods underestimate the equilibrium response because climate feedbacks change with the degree of equilibration (Andrews et al., 2015;Armour, 2017;Knutti & Rugenstein, 2015;Murphy, 1995;Paynter et al., 2018;Proistosescu & Huybers, 2017;Rugenstein et al., 2016;Senior & Mitchell, 2000).We define the global net TOA feedback as the local tangent in temperature-TOA space (ΔTOA/ΔT) computed by a least square regression of all global and annual means of netTOA imbalance and surface temperature anomaly within a temperature bin, which is moved in steps of 0.1 K throughout the temperature space to obtain the continuous local slope of the point cloud (sketched out in supporting information Figure S2a).We decompose the net TOA imbalance into its clear sky and cloud radiative effects (CRE; e.g., Ceppi & Gregory, 2017;Soden & Held, 2006;Wetherald & Manabe, 1988) in the shortwave and longwave (Figure 2a).The feedbacks change continuously -not on obviously separable time scales -in some models more 10.1029/2019GL083898 at the beginning of the simulations (e.g., CESM104), in some models after 150 years (e.g., GISSE2R) or, in some models, intermittently throughout the simulation (e.g., MPIESM11 or HadGEM2).The shortwave CRE dominates the magnitude and the timing of the net feedback change, and can be counteracted by the longwave CRE.The reduction of the shortwave clear sky feedback associated with ice albedo, lapse rate, and water vapor is a function of temperature and occurs on centennial to millennial time scales.Longwave clear sky changes, when present, contribute to the increase of the sensitivity with equilibration time and temperature.The net feedback parameter can be composed of a subtle balance of different components at any time, and the forced signal is not obviously linked to the feedback arising from internal variability, defined by regressing all available annual and global means of TOA imbalance and surface temperature anomalies (relative to the mean) of the control simulations (circles in Figure 2a; Brown et al., 2014;Colman & Hanson, 2017;Roe, 2009;Zhou et al., 2015).
Models which are more sensitive than other models -have feedbacks which are more positive -at the beginning of the simulation are generally also more sensitive towards the end.The model spread in the magnitude of feedbacks does not substantially reduce in time, while the feedback parameter change varies from negligible to an order of magnitude.We quantify the continuous changes across models by considering different time periods, namely Years 1-20, 21-150, and 151-1,000 (Figure 2b), in each of which we regress all points.In addition to the increase of the feedback parameter between Years 1-20 and 21-150, which has been documented for CMIP5 models (Andrews et al., 2015;Ceppi & Gregory, 2017;Geoffroy et al., 2013;Proistosescu & Huybers, 2017), there is a further increase from centennial to millennial time scales.
Previous research has shown that the change in feedbacks over time can come about through a dependence of feedback processes on the increasing temperature (Bloch-Johnson et al., 2015;Caballero & Huber, 2013;Hansen et al., 1984;Jonko et al., 2013;Meraner et al., 2013) due to evolving surface warming patterns and feedback processes ("pattern effect"; Armour et al., 2013;Gregory & Andrews, 2016;Haugstad et al., 2017;Paynter et al., 2018;Rugenstein et al., 2016;Senior & Mitchell, 2000;Winton et al., 2010), or both at the same time (Rohrschneider et al., 2019).There is no published method which clearly differentiates between time/pattern and temperature/state dependence, and simulations with several forcing levels are needed to disentangle them.The relationship between forcing and CO 2 concentrations is a matter of debate (Etminan et al., 2016) and further complicates the analysis, as time, temperature, and forcing level dependence might compensate to some degree (Gregory et al., 2015).As not all models contributed several forcing levels, we focus in the following on robust pattern changes in surface temperatures and feedbacks, which occur in most or all simulations irrespective of their overall temperature anomaly or forcing level.

Pattern Evolution of Surface Warming and Feedbacks
The evolution of surface warming patterns during the decadal, centennial, and millennial periods displays a fast establishment of a land-sea warming contrast, Arctic amplification, and the delayed warming over the Southern Ocean that have been studied on annual to centennial time scales (Figure 3; Armour et al., 2016;Collins et al., 2013;Li et al., 2013;Senior & Mitchell, 2000).Arctic amplification does not change substantially, whereas Antarctic amplification strengthens by approximately 50% on centennial to millennial time scales (Rugenstein et al., 2019;Salzmann, 2017).The warming in the northern North Atlantic reflects the strengthening of the Atlantic meridional overturning circulation, after the initial decline (Jansen et al., 2018;Li et al., 2013;Rind et al., 2018;Rugenstein et al., 2016;Stouffer & Manabe, 2003).
In the Pacific, at all times, the temperatures in absolute terms are higher in the west compared to the east Pacific.The eastern equatorial Pacific warms more than the warm pool in most simulations, a phenomenon reminiscent of the positive phase of the El-Ni ño-Southern Oscillation (ENSO; "ENSO-like warming" ; Andrews et al., 2015;Luo et al., 2017;Tierney et al., 2019;Song & Zhang, 2014).This tendency can last several millennia, but significantly reduces or stops in most simulations after a few hundred years.Similar to the equatorial east Pacific, the south east Pacific warms more than the warm pool (Andrews & Webb, 2018;Zhou et al., 2016).However, models display a large variance in the time scales of warming in these two regions, that is, the warm pool can initially warm faster or slower than the south east Pacific.Across the Pacific, the change in surface warming pattern is reminiscent of the Interdecadal Pacific Oscillation (IPO; Figure 3d).In many models, the reduction of the Walker circulation coincides with the decadal to centennial ENSO/IPO-like warming pattern, but it does not obviously coincide with surface warming pattern changes on the millennial time scale, indicating that subtropical ocean gyre advection and upwelling play a more  prominent role on longer time scales (Andrews & Webb, 2018;Fedorov et al., 2015;Knutson & Manabe, 1995;Kohyama et al., 2017;Luo et al., 2017;Song & Zhang, 2014;Zhou et al., 2017).The mechanisms and spread of model responses in the Pacific are still under investigation.
Feedbacks defined as the local tangent in temperature-TOA space as used in Figure 2a contain a signal from both the internal variability and the forced response.In order to isolate the forced response, we take the difference of the means at the beginning and end of the time periods discussed above.We call this definition of feedbacks the finite difference approach, as it represents a change across a time period (Figure S2b).10.1029/2019GL083898 millennial time scales, our models show that feedbacks become less negative almost everywhere, switching from slightly negative to positive in parts of the Southern Ocean and North Atlantic region, and become less destabilizing in the tropical Pacific (Figure 4c).The feedback pattern change from decadal to centennial time scales (Figure 4d) is reversed in many regions on centennial to millennial time scales (Figure 4e), particularly in the entire Pacific basin, the Atlantic, and parts of Asia and North America.This "pattern flip" is dominated by longwave CRE (Figure S8) and mirrors, in the Pacific, the reduction in ENSO/IPO-like surface warming patterns discussed for the surface temperature evolution.
Note that the local temperature is not part of the calculation of the local contribution in feedback changes.Due to the far-field effects of local feedbacks (e.g., Ceppi & Gregory, 2017;Dong et al., 2019;Kang & Xie, 2014;Liu et al., 2018;Rose et al., 2014;Rugenstein et al., 2016;Zhou et al., 2016Zhou et al., , 2017)), the relation between the local feedback contribution (Figure 4) and the local temperatures (Figure 3) is not straightforward.There is a strong correspondence between changes of TOA fluxes and temperature patterns in the Pacific on decadal to millennial time scales: Stronger (weaker) local warming coincides with a more positive (negative) local feedback contribution.However, there is no clear correspondence directly after the application of the forcing or over land and the Southern Ocean through time.Although the spatial patterns of changing temperature and radiative feedbacks vary among models, the large-scale features discussed here occur robustly across most models and forcing levels and also occur in the 1pct2x and 1pct4x simulations, which are not included in the figures.

Regions Accounting for Changing Global Feedbacks
We quantify the contribution of the tropics, extra tropics, and polar regions to the global feedback change (Figures 4d and e) by adding up all feedback contributions of the respective areas indicated by the gray triangles and expressing them as percentages of the total.We note that the total is the global feedback parameter, that is, the slope of the point clouds in Figure 1 which is indicated on the top right of each panel.These percentages reflect the role played by TOA fluxes in each region, which is not the same as the role played by surface warming in each region, as noted above.Whereas the tropics account for the bulk of the change (58% on decadal to centennial and 47% on centennial to millennial time scales), the middle latitudes become more important with time (Northern and Southern Hemisphere combined for 41% on decadal to centennial and for 66% on centennial to millennial time scales).The high latitudes, dominated by the shortwave clear sky feedback (Figure S12), play only a minor role in influencing the global response at all time scales.The regional accounting of global feedback changes permits us to test competing explanations regarding the spatial feedback pattern by placing them in a common temporal framework.Primary regions controlling the global feedback evolution have been suggested to be the Southern Hemisphere middle to high latitudes (Senior & Mitchell, 2000) and the Northern Hemisphere subpolar regions (Rose & Rayborn, 2016;Trossman et al., 2016), and the Tropics (Andrews et al., 2015;Block & Mauritsen, 2013;Ceppi & Gregory, 2019;Jonko et al., 2013;Meraner et al., 2013), especially in the Pacific (Andrews & Webb, 2018;Ceppi & Gregory, 2017).
The simulations robustly show a delayed warming in the Southern Hemisphere relative to the Northern Hemisphere throughout the millennia-long integrations, which correlates with the time evolution of net TOA and shortwave CRE (not shown).This behavior lends support to the hypothesis of Senior and Mitchell (2000) who propose that feedbacks change through time due to the slow warming rates of the Southern Ocean relative to the upper atmospheric levels.This reduced lapse rate increases atmospheric static stability (and thus, the shortwave cloud response) in the transient part of the simulation, but decreasingly less so towards equilibrium.
The extra-tropical cloud response in the model mean is non-negligible in the Southern Ocean and North Atlantic on decadal to centennial time scales, as proposed by Rose and Rencurrel (2016) and Trossman et al. (2016).However, it comes to dominate the global response only on centennial to millennial time scales and when both hemispheres are considered.

10.1029/2019GL083898
We find that the longwave clear sky feedback does moderately increase in many models as the temperature or the forcing level increases, mainly in the tropics and Northern Hemisphere middle latitudes (Figures 2a,S4,and S5).This is in accordance with the proposed argument that the tropics govern the global feedback evolution because the water vapor feedback increases with warming (Andrews et al., 2015;Block & Mauritsen, 2013;Jonko et al., 2013;Meraner et al., 2013), possibly following the rising tropical tropopause (Meraner et al., 2013;Mauritsen et al., 2018).
Recent work has focused on the relative influence of the Pacific, specifically the relative influence of temperatures of the warm pool compared to other regions.Feedbacks in regions of atmospheric deep convections have a far-field and global effect, while feedbacks in regions of atmospheric subsidence have only a local or regional influence (Andrews & Webb, 2018;Barsugli & Sardeshmukh, 2002;Ceppi & Gregory, 2019;Dong et al., 2019;Zhou et al., 2017).With the available fields in the LongRunMIP archive, we cannot quantify the relative importance of water vapor and lapse rate feedbacks.However, the shortwave and longwave cloud response (Figures S6-S8) in the models qualitatively agree with the proposed change of tropospheric stability patterns on decadal to centennial time scales (Andrews & Webb, 2018;Ceppi & Gregory, 2017), especially in the Pacific region.In contrast, on centennial to millennial time scales, the tropical Pacific response becomes less important compared to the middle latitudes and the net tropical CRE does not change anymore (Figure S6).

Implications
We demonstrate that the evolution of the global feedback response is dominated by the middle latitudes on centennial to millennial and the tropics on decadal to centennial time scales.The global net feedback change is a result of a subtle balance of different regions and different TOA components at all times; even more so in single simulations than in the model mean shown here.This motivates process-based feedback studies in individual models as well as multimodel ensembles to draw robust conclusions and increase physical understanding of processes.To relate the time scales and model behavior to the observational record and paleo proxies, a better understanding of (a) the atmospheric versus oceanic drivers of surface temperature patterns in both the coupled climate models and the real world and (b) the local and far-field interactions of tropospheric stability, clouds, and surface temperatures need to be achieved.Note that climate models have typical and persistent biases in regions we identify as important, mainly the equatorial Pacific, Southern Ocean, and ocean upwelling regions.The pattern effect of the real world might act on time scales which are different than the ones of the climate models.
Our results show that radiative feedbacks, usually called "fast," act continuously less stabilizing on the climate system as the models approach equilibrium.As a result, the equilibrium warming is higher than estimated with common extrapolation methods from short simulations for all models and simulations in the LongRunMIP archive.ECS has been historically used as a model characterization (Charney et al., 1979), but some studies propose that it is not the most adequate measure for estimating changes expected over the next decades and until the end of the century (e.g., Knutti et al., 2017;Shiogama et al., 2016;Otto et al., 2013).Alternative climate sensitivity measures are the effective climate sensitivity computed on different time scales, the transient climate response to gradually increasing CO 2 (transient climate response), or the transient climate response to cumulative carbon emissions (e.g., Allen & Frame, 2007;Gregory et al., 2015;Grose et al., 2018;Millar et al., 2015).Beyond not being an accurate indicator of the equilibrium response, these alternative climate sensitivity measures capture the models in different degrees of equilibration.We show that it is an open question how different measures of sensitivity relate to each other.A recent study shows that ΔT est 1−150 correlates better than TCR with the end-of-21st-century warming across model (Grose et al., 2018, see also Gregory et al., 2015).Thus, we underscore the need of comparing models, observations, and paleo proxies on well-defined measures of climate sensitivity, which ensure they are in the same state of equilibration.

Figure 1 .
Figure 1.Evolution of global and annual mean top of the atmosphere (TOA) imbalance and surface temperature anomalies (14 small panels).The first 150 years of step forcing simulations are depicted in light gray, the LongRunMIP gain in colors.For 1pct simulations, only the period with stabilized CO 2 concentrations is shown.The black line shows the linear regression of TOA imbalance and surface warming for the last 15% of warming.The panel on the lower right shows the ratio ΔT best est / ΔT est 1−150 , see text for definitions.A dot at the lower end of the bar indicates with 90% confidence that ΔT best est and ΔT est 1−150 obtained by resampling 10,000 times do not overlap.The gray hashed bar in the background is the median of all simulations (1.17).FAMOUS abrupt4x ends outside of the depicted range at 1.53.Table 1 specifies the model versions and names, length of simulations, and numerical values for different climate sensitivity estimates.

Figure a )
Figure a) Time evolution of global feedbacks in four characteristic models.Net TOA feedback (gray) is the sum of its components: the cloud effects in the shortwave (red) and longwave (blue), and clear sky feedbacks in the shortwave (salmon) and longwave (light blue).Circles at the right of each panel indicate the feedbacks arising from internal variability; shading and vertical lines shows the 2.5-97.5% confidence intervals.Panel titles give the model name and length of the simulation.Time periods of 1-20 years and 150-1000 years are shaded gray in the background.(b) Feedback evolution in the step forcing simulations of CCSM3, CESM104, CNRMCM6, ECHAM5MPIOM, FAMOUS, GISSE2R, HadCM3L, HadGEM2, IPSLCM5A, MPIESM11, and MPIESM12, seeTable 1 for naming convention.Lines show all simulations, dots represent median values and bars spans all but the two highest and two lowest simulations.Figures S4 and S5 show the feedback evolution for all available simulations.

Figure 3 .
Figure 3. Multi-model mean normalized patterns of surface warming (local warming divided by global warming) between the average of (a) the control simulation and Year 15-25, (b) Years 15-25 and 140-160, (c) Years 140-160 and 800-1,000, and their differences (d and e) for the same models and simulations as in Figure 2b.For models contributing several simulations, these are averaged.Stippling in Panels (d) and (e) indicates that 9 out of 11 models agree in the sign of change.

Figure 4 .
Figure 4. Time evolution of feedback patterns.Model-mean of local contribution to the change in global feedbacks (local top of the atmosphere anomaly divided by global warming during the period indicated in the panel titles; see text for definitions) (a-c) and their differences (d, e).The global feedback value is shown in the panel title.Regionally aggregated contributions to the global values are indicated with percent numbers and gray triangles (22 • S-22 • N, 22 • S/N-66 • S/N, 66 • S/N-90 • S/N, representing 40%, 27%, and 4% of the global surface area respectively).Model and simulation selection, weighting, and stippling are the same as in Figure 3. Supporting information Figures S6-12 show all top of the atmosphere components.
Figures S13 and S14 show overlays of Figure 3 and 4 for a better comparison.A local correspondence does not necessarily indicate a strong local feedback (i.e., local TOA divided by local surface temperature change), as both the local TOA and the surface in one region could be forced by another region.A closer investigation of local and far-field influence of feedbacks is under investigation.

Table 1
Estimates of Equilibrium Warming and their Uncertainties for Each Simulation ΔT best est / ΔT est 1−150 , see text for definitions.A dot at the lower end of the bar indicates with 90% confidence that ΔT best est and ΔT est 1−150 obtained by resampling 10,000 times do not overlap.The gray hashed bar in the background is the median of all simulations (1.17).FAMOUS abrupt4x ends outside of the depicted range at 1.53.Table1specifies the model versions and names, length of simulations, and numerical values for different climate sensitivity estimates.
Table 1 for naming convention.Lines show all simulations, dots represent median values and bars spans all but the two highest and two lowest simulations.Figures S4 and S5 show the feedback evolution for all available simulations.