Multivariate Estimations of Equilibrium Climate Sensitivity from Short Transient Warming Simulations

One of the most used metrics to gauge the effects of climate change is the equilibrium climate sensitivity, defined as the long-term (equilibrium) temperature increase resulting from instantaneous doubling of atmospheric CO$_2$. Since global climate models cannot be fully equilibrated in practice, extrapolation techniques are used to estimate the equilibrium state from transient warming simulations. Because of the abundance of climate feedbacks - spanning a wide range of temporal scales - it is hard to extract long-term behaviour from short-time series; predominantly used techniques are only capable of detecting the single most dominant eigenmode, thus hampering their ability to give accurate long-term estimates. Here, we present an extension to those methods by incorporating data from multiple observables in a multi-component linear regression model. This way, not only the dominant but also the next-dominant eigenmodes of the climate system are captured, leading to better long-term estimates from short, non-equilibrated time series.

For conceptual models and earth system models of intermediate complexity, it is possible to let a simulation run until the system is fully equilibrated (Holden et al., 2014). However, for more refined models, including contemporary and future state-of-the-art global climate models, this is not viable (Rugenstein et al., 2019); equilibrating those models simply takes too much computing power. With the current trend -and need -to build models with higher temporal and spatial resolutions Duffy et al., 2003;Govindasamy et al., 2003;Haarsma et al., 2016), this is not expected to resolve itself in the near future. Hence, the equilibrium climate sensitivity of these models is instead estimated by extrapolating transient warming simulations -way before these models have reached equilibrium (Knutti & Hegerl, 2008;Rugenstein et al., 2020;Dai et al., 2020;Knutti et al., 2017). There are several techniques to perform such extrapolation that use different physical and mathematical properties of the system to give sensible estimates for the true equilibrium climate sensitivity of a model (Knutti & Hegerl, 2008;Dai et al., 2020;Gregory et al., 2004;. The main problem with equilibrium estimations lies with the abundance of feedbacks present in the climate system (Von der Heydt et al., 2016. These feedbacks are quite diverse and span a wide range of spatial and temporal scales; these include, for example, the very fast Planck feedback, the slower ice-albedo feedback and the even slower ocean circulation feedbacks. Estimation techniques deal differently with this problem, for instance by incorporating multiple time scales directly in the estimation method , by explicit modelling of long-term (ocean) heat uptake  or more indirectly by ignoring initial fast warming behaviour (Rugenstein et al., 2020;Dai et al., 2020).
The most predominantly used estimation technique is the one developed by Gregory et al. (2004). In this technique, the top-of-atmosphere radiative imbalance (∆R) is fitted using a linear regression against the temperature increase (∆T ). However, recently, it has become clear that ∆R and ∆T do not always adhere to such linear relationship . Typically, there is an initial fast warming which is followed by one or several slower additional (less substantial) warming processes. Hence, estimates made by this method depend heavily on the time period used in the regression and typically underestimate the equilibrium warming. Most of the times, this problem is largely circumvented by ignoring the first part of a simulation that contains the initial fast processes; the regression is then applied only on the last part of the simulation.
The thus ignored data does however still contain information about the dynamics of the system -even beyond the initial fast warming. The issue here is that this information cannot be extracted using a one-dimensional linear regression; that kind of fit will only ever recover the one process (i.e. one eigenmode of decay to equilibrium) that is most dominantly present on the time scale of the regression data. In this paper, we present an extension to this technique that is capable of capturing multiple eigenmodes by incorporating additional observables into a multi-component linear regression (abbreviated as MC-LR) model. Subsequently, we show the potential efficiency of this technique using both low dimensional conceptual models and modern global climate models.

Method: a Multi-Component Linear Regression Model
In the linear regime of the decay to equilibrium, the evolution of any observable O (e.g. global mean temperature increase or top-of-atmosphere radiative imbalance) is given by the sum of exponentials (i.e. the eigenvalue decomposition), capturing the behaviour on different time scales based on the different eigenmodes of the system. Specifically, denoting the equilibrium value of an observable by O * , this evolution follows where λ j denote the eigenvalues and β If only one eigenmode would be present (or relevant, as other eigenmodes are exponentially small on the time scale of the data), the evolution of the global mean surface temperature increase ∆T and the top-of-atmosphere radiative imbalance ∆R can be combined into the linear relation Since ∆R * = 0, this readily gives rise to the commonly used regression model by Gregory et al. (2004), where a := and f := −a∆T * are to be determined from the used regression data. In this case, the equilibrium warming is estimated by ∆T est * := − 1 a f . If multiple eigenmodes are relevant, there no longer is such linear relationship between ∆R and ∆T (as time t cannot be eliminated from the equations anymore) and this technique breaks down. It is, however, possible to extend the technique by taking additional observables into account: if N eigenmodes are relevant, one must use two sets of N observables, denoted here by X and Y ; using a similar procedure, the equations for their evolutions can be combined together (e.g. using basic matrix computations) to obtain the linear relation where A is a N × N matrix. If the set of observables in Y only contains observables that tend to zero in equilibrium (i.e. Y * = 0), this gives rise to a new multi-component linear regression model with A and F := −A X * to be determined by the regression data. Here, equilibrium estimates are given by the vector X est * := −A −1 F and contain equilibrium estimates for all observables in X. The method by Gregory et al. (2004) is a special example of this regression model, where N = 1, X = ∆T and Y = ∆R. Here, this model is extended by adding one or two observables to the data vectors X and Y (i.e. N = 2 or N = 3, lining up with previous studies by Caldeira and Myhrvold (2013); Tsutsui (2017); ). Specifically, the mean global effective top-of-atmosphere short-wave albedo α and long-wave emissivity ε are considered as additional observables. Their values are added to the data vector X and the values of their (numerical) time-derivatives -that tend to zero in equilibrium -to Y . The fits in this study are all made using standard least squares regression.
A different and much more extensive take on the rationale behind the technique can be found in Supporting Information Text S1.

Results: Conceptual Models
First, we present the results on a variant of the conceptual Budyko-Sellers energy balance model for global mean surface temperature (Budyko, 1969;Sellers, 1969). This model has been extended such that albedo and emissivity are no longer instantaneous processes, but will settle slowly over time. Moreover, white noise has been added to simulate climate variability. Thus, a three-component stochastic ordinary differential equation is created, which has been simulated in MATLAB with an Euler-Maruyama scheme. A more extensive description of the model can be found in Supporting Information Text S2.
Output of this model has been analyzed using the previously described MC-LR technique with the use of some or all of the observables. The resulting estimates for the equilibrium climate increase ∆T est * (t) are given in Figure 1 for simulation runs with moderate noise (figures for other noise levels can be found in Supporting Information Figures S3 and S4). These estimates are given as functions of model time: the value for time t indicates the estimate is made with model output up to time t only. To evaluate the various estimation techniques and track their accuracy depending on the amount of data used, the remaining relative error is computed: the maximum in relative error of the estimates occurring after the current time (i.e. when more data points are used). This gives a better impression of the kind of error to expect when using data up to time t. Mathematically, the remaining error is defined as where ∆T * is the true equilibrium warming (determined numerically via Newton's method). For this kind of low-dimensional models, it is clear that the multi-component linear regression leads to better estimations of the real equilibrium warming than conventional techniques ( Figure 1). Although the estimations for very short time series are not very accurate, estimations for slightly longer time series quickly pick up and are much better compared to the linear 'Gregory' fit ( Figure 1a), because also the longer time dynamics are taken into account (and are accurately fitted; see Supporting Information Text S2 and Figure S5). It takes some tens of (arbitrary) time units for the new estimates to get within 0.1K of the actual equilibrium value, whereas hundreds of time units are needed for the conventional technique ( Figure 1b). Moreover, it also seems that the MC-LR technique still works reliable in case of noise.

Results: LongRunMIP Models
The MC-LR technique has also been tested on more detailed global climate models. Specifically, data is taken from abrupt 4 × CO 2 forcing experiments of models participating in LongRunMIP, a model intercomparison project that focuses on millennia-long simulation runs (Rugenstein et al., 2019). Because of these long time series, a relative accurate value for the true equilibrium temperature can be determined, which is needed to adequately assess the performance of the estimation techniques.
With the use of the model output, various techniques have been used to estimate equilibrium warming for all models. In Figure 2, a Gregory (∆T, ∆R)-plot is given along with results of commonly used estimation techniques for one of the models (CESM 1.0.4) when applied on data up to model year 300. This illustrates the capabilities of the various techniques in capturing the behaviour of the model system over different time scales. Clearly, the classical Gregory method mainly captures initial fast warming from the data. Hence, it is common practice to ignore an arbitrary number of years from the start of the simulation run -that show the initial fast warmingin a Gregory fit (Rugenstein et al., 2020;Dai et al., 2020). That technique has also been tested here, where the initial 20 years have been excluded. In contrast, the multi-component linear regression technique does not rely on such arbitrary choices for data selection and outperforms both of these classical methods. Certainly, there also exist other alternative estimation techniques that aim to extract long-term behaviour from short simulation runs (of which two have been added to Figure 2). However, these often amount to fitting an explicit low-dimensional model to transient simulations (e.g.  and/or a non-linear regression (e.g. . The proposed MC-LR method does neither -and furthermore seems to perform similar or better than the mentioned other methods. The results for other time frames are shown in Figure 3. Here, as before, estimates ∆T est * (t) are functions of time, which only use data up to a given time t for the estimation, and remaining relative errors have been computed as well. These results show that the MC-LR method also performs better on other time frames; in particular, when data for more than 150 years is being used, a multi-component linear regression that utilises both albedo and emissivity leads to better estimates compared to the classical Gregory methods. Especially on a century time scale this leads to significant improvements. Detailed results for all models can be found in Supporting Informaiton Figures S8-S18.
To further disseminate the results and to assess the effectiveness over the range of models, in Figure 4 the remaining errors are given for all considered models at given times t = 150 years (CMIP protocol, ), t = 300 years and t = 500 years. These results indicate that the MC-LR method can lead to more accurate equilibrium warming estimates. This new approach also better captures the long-term dynamics than the classical Gregory method when used on all data (with the HadGEM2 model for t = 150 years being the exception, where performance is similar). Moreover, the MC-LR method also tends to outperform the Gregory method that ignores the first 20 years of data when t > 150 years. For t = 150 years, results vary much per model. This is closely related to the difference in model behaviour: if dynamics happen on two dominant time scales, and the Gregory plot has an inflection point around (the arbitrarily chosen) year 20, this Gregory method works In the plot, red dots denote all data points (the later in the run, the smaller in size). The blue line shows the linear 'Gregory' fit when all data from years 1 to 300 is used and the yellow line the Gregory fit when the first 20 years are ignored. The green line shows the 3 exponent fit . The cyan line indicates a fit to the EBM-ε model that includes ocean heat uptake . The magenta line visualises the newly introduced multi-component linear regression that, in this case, utilises both albedo and emissivity (for this visualisation only -and not for any of the fits in this paper -averaged data from the experiment are used). The stars ( ) are the estimated equilibrium warming values from the different methods. Finally, dotted and dashed black lines indicate linear Gregory fits for the first and last part of the simulation that can be used for comparison -and that show how the various estimation methods capture dynamics on multiple time scales.
well (see for example the model MPI-ESM 1.1); otherwise, the MC-LR method will (eventually) outperform it. A more in-depth discussion per model is included in Supporting Information Text S3.5.

Discussion
In this paper, we have introduced a new equilibrium climate sensitivity estimation technique -the multicomponent linear regression (MC-LR) -that better captures the long-term behaviour compared to conventional techniques. This MC-LR method has one prime rationale: a perturbed climate system evolves according to a linear system (given that the radiative perturbation is small). This linear evolution is recovered through the multi-component linear regression (i.e. regression to Y = A X + F ). Although, here, only data from one transient simulation is used in the fits, data from multiple runs (with the same radiative forcing) can also be put together -possibly leading to even better estimates. As the goal of the method is to recover the eigenmodes in the linear regime of the system, such combination of runs seems extremely beneficial if runs follow the evolution of different eigenmodes. Indeed, it seems plausible -and an interesting direction for further research -that a small ensemble of short runs, each with a different perturbation of the initial system state, will better estimate the coefficients of the fitted linear system (i.e. A and F ) without compromising in terms of total computing power.
The most difficult -and the most important -aspect of the MC-LR method is the choice of the observables used in the regression data. It is key that this data well-represents the different eigenmodes of the system. If too few are used, not all eigenmodes are found; if too many (or redundant ones) are used, estimates become unusable (as data becomes linearly dependent, which causes the fitted matrix A to become near singular). In this study, we have focused on the use of (effective top-of-atmosphere) albedo and/or emissivity -observables that can be computed from datasets that are already normally used for climate sensitivity (Gregory et al., 2004;. However, the use of other, more curated observables might -and probably will -work better. For instance, the very long-term ocean dynamics might be better represented in data on ocean heat uptake Raper, Gregory, & Stouffer, 2002;Li, von Storch, & Marotzke, 2013). It also seems natural to capture the known climate feedbacks, e.g. surface albedo, water vapour and lapse rate (Von der Heydt et al., 2020). One should beware though that all these (feedback) processes together combine to the system's eigenmodes in non-straightforward ways. For example, summing feedbacks -like is commonly done in climate literature -only makes sense in systems that only have one component; in systems with multiple components, processes and eigenmodes are not linked directly like this. Nevertheless, a careful inclusion of these feedbacks might lead to even better estimates and may further shorten the needed length of simulation runs.
The method described in this study does not only lead to better estimates for the equilibrium climate sensitivity, but can also be used to develop extensions of climate sensitivity, by incorporating other observables. Regression of the multi-component model Y = A X + F leads to equilibrium estimates for all the observables in X as X est * = −A −1 F . This estimate can be seen as a multi-variate metric for climate sensitivity, in contrast to classical uni-variate metrics that focus only on changes in global temperature. Such multivariate metrics can better describe and quantify the changes that occur to the climate system due to changes in radiative forcing. In fact, many -if not all -climate subsystems and ecosystems do not depend critically on the global mean surface temperature, but on other observables such as the amount of precipitation or ocean heat transport (Lenton et   Here, only the best MC-LR method is depicted for each model (because of differences in model dynamics, which observables yield the best estimates differs per model). A complete list for all variants of the estimation techniques can be found in Supporting Information Figure S6, and a scatter plot of the results in this figure can be found in Supporting Information Figure S5. Rockström et al., 2009). Estimating those directly -rather than considering them enslaved to the global mean surface temperature -will possibly lead to better projections for those (sub)systems. Accurate estimations of equilibrium climate sensitivity are hard to come by, mostly due to the lengthy computation times needed to fully equilibrate modern global climate models. Going forward, it seems the more and more realistic state-of-the-art models will only take longer and longer to equilibrate (even considering developments in computer hardware). In particular, for high-resolution simulations with ultra fine numerical grids such equilibration runs are just not a practical option. For these kind of simulations it is vital to have extrapolation techniques that only need relatively short transient simulations to estimate the system's long-term behaviour. Once fully developed, such methods -the one introduced in this study being a first step towards them -can help to design the kind, amount and length of the experiments performed with these high-resolution models, indicating an optimum between accurate (multi-variate) climate sensitivity estimation and computing time.  Ultimately, warming of Earth's climate can only happen if more energy enters than leaves the Earth system. The 3 amount of energy that enters or leaves is given by the (top-of-atmosphere) radiative imbalance ∆R, which is the 4 incoming short-wave solar radiation minus outgoing short-and long-wave radiation (∆R = R SW,↓ − R SW,↑ − 5 R LW,↑ ). As the amount of outgoing radiation is determined by many (feedback) processes and the Earth's current 6 temperature, the radiative imbalance can be seen as function of these. That is, where β := (β 1 , . . . , β N ) indicate other state variables that influence the radiative fluxes (e.g., sea ice, surface 8 albedo, ocean heat content, cloud formation). Note that, for the generality of this reasoning, neither the exact 9 processes nor the precise relevant state variables, need to be known.

10
When the system is close to equilibrium -a typical assumption when studying equilibrium climate sensitivity -11 expression (S1.1) can be linearised via a Taylor expansion to obtain Here, T * is the equilibrium temperature and β * := (β * 1 , . . . , β * N ) denotes the equilibrium values of the other state 13 variables. Since there is, by definition, no radiative imbalance in equilibrium, ∆R * := ∆R(T * , β * ) = 0. Hence 14 this expression reduces to Since, T = T 0 + ∆T (T 0 being the initial, non-forced temperature), a slight rewriting of this expression gives Thus, there is a linear relationship between ∆R and ∆T of the form Even without explicit knowledge of the various functions in this section, this gives a recipe to determine equilibrium 23 warming: close to equilibrium, there is a linear relationship between the time series for ∆R and ∆T . Linear 24 regression of those gives approximate values for constants a and f in (S1.7) and an estimation for the equilibrium 25 warming is given by ∆T est One of the reasons this method is used so often is its relative simplicity: almost no system knowledge is needed 27 to perform this procedure and results are good provided the system is close enough to equilibrium. However, it 28 is often not clear if the system is close enough to equilibrium: in fact, the relation between ∆R and ∆T often is 29 not linear but nonlinear -and this is not always obvious from short time series Armour, 30 2017; . A practical work-around is to only use the last part of the time series; this 31 way, the initial non-linear parts are not taken into account and only the more close to equilibrium data is used, 32 which better satisfies the assumptions of this technique and gives better long-term predictions. Yet, in practice,

33
it is not always easy to determine how many years of data should be ignored. Moreover, the signal-to-noise ratio 34 tends to be smaller in the later parts of the data, leading to imprecise estimations if too much data is ignored.

35
It is also possible to automatise the detection of the inflection point in the Gregory plot; for instance, a double 36 linear function can be fitted: (S1.10) This technique, dubbed the 'double Gregory method' here, is able to find the inflection point of a Gregory plot if 38 the data clearly shows behaviour on two time scales (and estimates the equilibrium warming as ∆T est However, if there are multiple time scales, these time scales overlap, there is a lot of noise or anything else that 40 obscures the two-part splitting of the Gregory plot, this automatic detection tends to fail at detecting the long-term 41 trend as well. Also note that this 'double Gregory' method is similar to the mode decomposition in Proistosescu 42 and Huybers (2017); in fact, in case of (only) two very distinct modes, both techniques are essentially equivalent.

46
Both of these hold true 'close to equilibrium'. However, the requirement for the second approximation is much 47 stricter: it only holds when the system is much closer to equilibrium and the decay to equilibrium happens 48 approximately on one eigenmode only (i.e. feedback processes are virtually instantaneous). In this section, this 49 will be explained in more detail by considering general properties of dynamical systems.

50
Consider the dynamical system 51 y = f ( y), ( y(t) ∈ R n ) (S1.11) (the prime denoting the derivative with respect to time) and let y * be a fixed point of the system (i.e. f ( y * ) = 0).

52
For y 'close to' the fixed point y * , (S1.11) can be linearized to capture the leading order dynamics of the difference 53 z := y − y * : 54 z = A z, (S1.12) where A = Df (y * ) ∈ R n×n . From (S1.12) it follows that 55 z(t) = e At z(0). (S1.13) The dynamics close to equilibrium are thus determined by the eigenvalues and eigenvectors of matrix A. For 56 illustrative purposes, assume that A has n distinct real eigenvalues that are ordered as λ 1 < λ 2 < . . . < λ n (other 57 possibilities do not alter the reasoning, but give rise to distracting bookkeeping that are non-essential here). In 58 this case, the evolution of z is equivalently given by  Figure S1. In general, linear multi-component systems will result in non-linear behaviour z(t) z (t) Figure S1 -Exaggerated sketch of a possible evolution of z1 as in (S1.15), for λ1 < λ2 < 0.
of the system's individual components.

66
The nonlinearity is the result of the presence of multiple eigenmodes that act on different time scales (i.e. are 67 associated with different eigenvalues). When the system has evolved for long -and thus is very close to equilibrium 68 -the exponent e λ1t is much smaller than e λ2t (i.e. for t 1, e λ1t e λ2t ). Thus, z 1 (t) ≈ c 2 v 2,1 e λ2t , which is 69 equivalent to z 1 = λ 2 z 1 . So, only in this situation, there is a linear relationship between z 1 and z 1 -and does the 70 linearisation of the feedback processes in (S1.4) hold true. That is, for t 1, the system dynamics can generically 71 be captured by a 1-component linear system; otherwise, multiple components are necessary to accurately track 72 the system dynamics.

73
Shifting attention back to equilibrium estimations, these general considerations also give insight. Similar to before, 74 since y = y 0 + ∆ y (with y 0 the initial, unforced state), it follows that z = ∆ y − ∆ y * . Hence, (S1.12) becomes 75 ∆ y = A∆ y + A∆ y * . (S1.16) If the whole system state y can be observed, a linear fit of ∆ y and ∆ y to (S1.16) gives approximate values of 76 A (and thus to all of its eigenvalues and eigenvectors) and F := A∆ y * , such that the equilibrium system state is 77 estimated by ∆y est * := A −1 F .

78
If only one component of ∆ y can be observed, a linear regression only finds the projection of the full dynamics; 79 that is, only the most dominant dynamics present in the data is captured. Going back to the example system, a 80 linear regression of ∆y 1 and ∆y 1 to ∆y 1 = a∆y 1 + f gives different results over time:

85
To illustrate this, in Figure S2, an example is given of a trajectory in (∆y 1 , ∆y 1 )-space, along with linear regression 86 results when applied on various subsets of the data. scales, the next eigenmode is needed. By ignoring the first part of the data, this feat is achieved. However, as 94 argued above, it is also possible to capture these other eigenmodes by employing a larger set of observables, from 95 which the relevant eigenmodes -and thus the long-time dynamics -can be derived from this early data (that is 96 typically ignored in Gregory plots).

97
Also note that it is not necessary to observe the state space directly, nor to use the same observables on the 98 left and right hand sides (as is also not done in the Gregory plots either). That is, it is possible to use any m 99 observables Y = (Y 1 , . . . , Y m ) T that tend to 0 in equilibrium, and any m observables X = (X 1 , . . . , X m ) T , some of 100 which need to be estimated in equilibrium. Provided these form a good representation of the system's dynamics, 101 these can be regressed to to obtain the equilibrium estimate X est * system is close enough to the equilibrium (as transforming (S1.16) to (S1.17) requires another Taylor expansion 104 for general observables). However, if observables are linear combinations of the components of the state space -105 such as globally averaged surface temperature and radiative imbalance -there is no such additional caveat.

106
Finally, we note that this method, and specifically (S1.17), can also be derived directly from the equation Y = 107 Y ( X, β) using a generalisation of the reasoning in section S1.
the (ultimate) albedo and emissivity for a given temperature T , but these values are approached slowly -not 121 instantaneously. Furthermore, Q 0 is incoming solar radiation, −Q 0 α the reflected solar radiation, −εσT 4 the 122 outgoing long wave blackbody 'Planck' radiation and µ models the effect of CO 2 in the atmosphere, where we set 123 µ = µ 0 + A 0 log (C/C 0 ) (with C/C 0 the factor of the considered CO 2 increase). Finally νξ(t) is Gaussian white 124 noise, representing natural climate variability.

130
Numerical simulations were performed via discretisation of (S2.1) with a standard Euler-Maruyama scheme and 131 numerical time step ∆t = 0.001 was used. Output of the model includes time series T (t), α(t), and ε(t). From 132 these, time series for the increment of the observables have been computes: Derivatives needed for some of the estimation 134 techniques have been computed numerically as forward differences.

136
For each individual realisation of the model, estimations for the real equilibrium temperature increase, ∆T est * , 137 have been made using the following different techniques:

142
(4) Fit the linear system Linear regression for all of these techniques has been standard least squares regression. Examples of the results of 149 these estimation techniques at various noise strengths ν are given in Figure S3. The results are given as function 150 of time t: estimates ∆T est * (t) contain the estimation value when regression is applied only on data up to time t.

152
The previously described estimation techniques all converge to the true equilibrium warming ∆T * . However, they 153 do so at different rates. Therefore, to assess and quantify their power, it is of interest to quantify how fast and 154 how accurate they are. For this, the remaining relative error can be used -the maximum of the relative errors 155 that occurs for later predictions. More precisely, the relative error at time t is given by , (S2.2) and the remaining relative error is

162
An ensemble run of 100 simulations has been performed for noise levels ν = 0, ν = 0.25, ν = 0.5 and ν = 1. The results on the remaining error are given in Figure S4.

165
Furthermore, in this simple model it is possible to determine the real Jacobian of the new equilibrium state and 166 its eigenvalues. Although the fitted matrix A does not have the same eigenvalues (since dT dt = ∆R C T ), the Jacobian 167 matrix can be recovered from matrix A through a row operation (i.e. by multiplying the '∆R row' by C T ). Hence, 168 also the values for the eigenvalues can be compared, to see how successful the various methods are in tracking 169 these. Here, especially the rate at which the smallest eigenvalues are captured is relevant for accurate long-term 170 predictions. The results, for the system without noise, are given in Figure S5.  As explained in section S1.1, this is some interpolation of the real system's eigenvalues; in this case, certainly the 192 fitted eigenvalue is nowhere near λ 3 , and thus a large underestimation of the real equilibrium warming is made 193 when using this method. Finally, the 'double Gregory' technique does fit two eigenvalues; one turns out to be 194 close to the highest eigenvalue, λ 3 (and the other an interpolation of the two remaining real eigenvalues), which 195 explains why estimates using this technique are quite good (for low noise levels).

196
Lastly, in the presented results, standard linear regression (i.e. least squares minimalisation) was used. Hence, 197 the fitted matrix A has no non-zero entries. The real Jacobian, however, does have zero entries. These could be 198 identified when using a slightly different linear regression technique that also minimizes for instance the L 1 or 199 L 2 norm of the fitted coefficients, such as the Lasso regression (e.g ?) or Ridge regression (e.g ?). This could 200 result in (slightly) better estimates, but requires some hyper-fitting to obtain adequate fitting settings (which are 201 probably different settings for different models and experiments). As the obtained fitted eigenvalues and estimates  Figure S3 -Estimated values for one single realisation of (S2.1) using the different estimation techniques described in the text, for various noise strenghts ν.

206
To test the equilibrium estimation techniques on complex models, data from LongRunMIP (Rugenstein et al.,207 2019) was used. These model runs have a significantly longer simulation time compared to the typically used 208 150 year runs (the minimum requirement for models participating in most model intercomparison projects such 209 as CMIP6, ). Hence, the model runs used here are typically relatively equilibrated at the 210 end of the simulation. Therefore, bounds on the real model equilibrium warming can be found, allowing for a 211 quantitative assessment of the equilibrium estimation techniques.

212
In this study, we have used data from the abrupt 4×CO 2 experiments on the models in LongRunMIP. That is, 213 eleven models, of various levels of complexity, have been considered. Data for all of these models include at least 214 1000 years of simulation. An overview of the models is given in table S1.

216
For every model, model output on surface temperature and radiative fluxes has been downloaded from the 217 LongRunMIP database. Specifically, we have downloaded the globally averaged yearly datasets for near surface 218 temperature ('tas'), incoming short-wave radiation at top-of-atmosphere ('rsdt'), outgoing short-wave radiation 219 at top-of-atmosphere ('rsut') and outgoing long-wave radiation at top-of-atmosphere ('rlut'). These datasets have 220 been downloaded for the abrupt 4×CO 2 experiments and the pre-Industrial control runs.

222
The downloaded (globally averaged yearly) data has been used to compute the following observables:

226
Table S1 -Overview of models used in this study, alongside the length of their abrupt 4×CO2 experiment run and their pre-Industrial run and the obtained best fit value and 5%-95% range for the model's equilibrium warming.

233
Above, the general procedure to process the data has been explained. There are, however, a few exceptions to this 234 general scheme for some of the models -as per instructions of the LongRunMIP database readme (Rugenstein 235 et al., 2019). The following list contains the details of these slight alterations per model.

GISS-E2-R :
The control run has some discrepancies as months for the different variables do not line up.

237
Therefore, the first 300 years of data for the datasets 'tas' and 'rlut' have been ignored as there is no 238 corresponding data for 'rsut' and 'rsdt'.

CCSM3 :
The model is out of equilibrium in the first years, both for the control and for the experiment run.

240
Therefore -as per instruction -we have not used the control mean for the first years, but rather have taken 241 the anomaly year by year for the first 20 years of the simulation. For the remainder of the simulation, the 242 mean is taken as normal.

244
For each model output, estimates of the equilibrium warming ∆T * have been made with the following techniques:

247
(3) Fit ∆R = a∆T + f on all but the first 20 years of data. Estimation is then given by ∆T est * (t) := −a −1 f ;

248
(4) Fit a double-linear function to ∆R and ∆T .

253
(7) Fit the linear system To be able to assess the performance of different estimation techniques, it would be best to have the real equilibrium 262 warming ∆T * . However, even in these century-long model runs, the models have not equilibrated fully and 263 therefore the real warming is not known. Hence, we resort to a best estimate of the real model equilibrium 264 warming, which is defined as ∆T est,best * . Following Rugenstein et al. (2020), we define this best estimate as the 265 increment predicted by a linear 'Gregory' fit on the last 15% of warming. Concretely: we take the last 50 years 266 of a run and average the radiative imbalance in those years. We then take 85% of this average and determine the 267 first year (of the whole run) in which the radiative imbalance is below this value. The best estimate is then based 268 on a regression of all years of the simulation run following this year. In subplots (d) of Figures S8-S18 a plot of 269 ∆T and ∆R is given along with the 'best estimate' fit.

270
As the value for ∆T est,best * significantly incluences the outcome of any comparison and since its value is uncertain 271 even from the long simulation runs, instead of using a single value a range of probably 'best estimates' are used.

272
This range is denoted as R ∆T * . To find reasonable bounds for this range, we have resampled the data used to

286
To assess the different estimation techniques, the remaining relative error is used. Since the exact equilibrium 287 warming ∆T * is not known for these models, the error is computed as distance to the found range of possible best 288 estimates R ∆T * . Thus, the relative error is defined as where d (x, R) denotes the distance between the point x and the range R. That is, The remaining relative error is again given by the maximum of the relative errors that follow at later times: tends to work best -especially when data for at least a few hundred years is available. In particular, the MC-LR 301 estimate outperforms the classical Gregory methods -as can be seen most clearly from Figure S7 and Figure 4 302 in the main text. In the rest of this section, the results per model are discussed in slightly more detail.

303
CCSM3: Although the range R ∆T * is quite small -and the resampling of the last 15% of warming thus does 304 not show much variance -the best estimate range seems a bit on the high side; only estimates that take 305 X = (∆T, ∆α) T lie within R ∆T * when using all of the available data. Nevertheless, the Gregory plot shows 306 the plausibility of a late and slow additional warming (that is apparently not picked up completely by the 307 other methods); a longer run of the model is necessary to get a decisive answer on this.

308
The estimates of the MC-LR method with temperature and albedo do have many poles in years 400 -700 309 and overall gives higher estimates. The other techniques do have similar estimates, although the Gregory method needs more than 1, 000 years of data to get close to that value. Other methods perform similarly, too gentle slopes when a limited amount of data is used.
IPSL-CM5A-LR: Because of the short run, the model IPSL-CM5A-LR is not equilibrated at the end of the 379 run. This led to problems in finding a good and plausible best fit range (see section S3.4.1). The final 380 estimated values also show a large spread, and it indeed seems that the true equilibrium warming of the 381 model is high -especially considering the late curvature in the Gregory plot, that has been mentioned before 382 in Rugenstein et al. (2020).

383
For times t < 200 years, the lowest error is obtained with a MC-LR method that uses emissivity. Then, for 384 larger times t < 500 years, the lowest error is obtained with fits that use the albedo; for t > 500 years, the