CMIP6 Historical Simulations (1850–2014) With GISS‐E2.1

Simulations of the CMIP6 historical period 1850–2014, characterized by the emergence of anthropogenic climate drivers like greenhouse gases, are presented for different configurations of the NASA Goddard Institute for Space Studies (GISS) Earth System ModelE2.1. The GISS‐E2.1 ensembles are more sensitive to greenhouse gas forcing than their CMIP5 predecessors (GISS‐E2) but warm less during recent decades due to a forcing reduction that is attributed to greater longwave opacity in the GISS‐E2.1 pre‐industrial simulations. This results in an atmosphere less sensitive to increases in opacity from rising greenhouse gas concentrations, demonstrating the importance of the base climatology to forcing and forced climate trends. Most model versions match observed temperature trends since 1979 from the ocean to the stratosphere. The choice of ocean model is important to the transient climate response, as found previously in CMIP5 GISS‐E2: the model that more efficiently exports heat to the deep ocean shows a smaller rise in tropospheric temperature. Model sea level rise over the historical period is traced to excessive drawdown of aquifers to meet irrigation demand with a smaller contribution from thermal expansion. This shows how fully coupled models can provide indirect observational constraints upon forcing, in this case, constraining irrigation rates with observed sea level changes. The overall agreement of GISS‐E2.1 with observed trends is familiar from evaluation of its predecessors, as is the conclusion that these trends are almost entirely anthropogenic in origin.


Introduction
In an era of climate change and the frequent occurrence of record-setting measurements, there is intense interest in anticipating future changes. The Coupled Model Intercomparison Project (CMIP) was conceived to facilitate coordinated experiments of past and future climate among modeling groups (Meehl et al., 1997). Contrasting behavior between climate simulations would be attributed to model differences rather than differences in experimental design. The coordinated experiments allow models from different research centers to be meaningfully compared to each other and observations, while providing benchmarks for the comparison of successive generations of models.
A key element of CMIP Phase 6 (CMIP6 Eyring et al., 2016) is the simulation of climate change observed during the so-called historical period from 1850 to 2014. The comparison of observed changes to simulated trends driven by changes in atmospheric composition and other climate drivers is an incomplete but nonetheless useful test of model climate sensitivity that is important for assessing the likelihood of future changes projected by the models. Historical simulations have been performed with multiple generations of the NASA Goddard Institute for Space Studies (GISS) Earth system model (Hansen, Sato, Ruedy, Lacis, et al., 1997;Hansen et al., 2002Hansen et al., , 2007Hansen et al., , 1988Miller et al., 2014). Over previous CMIP generations, GISS has assessed robustness of the modeled response by simulating the historical period with different combinations of model components. Since CMIP3, we have carried out experiments with two ocean versions coupled to the same atmospheric model, and for CMIP5, GISS analyzed the impact of different treatments of atmospheric composition and the aerosol indirect effect. For CMIP6, we continue to use multiple versions of the coupled model. The use of a common atmosphere allows the effect of ocean treatments to be highlighted, while different treatments of atmospheric composition test robustness of certain interactions and forcings.
Characterization of forcing is the basis for understanding the contribution of individual drivers to future changes as well as attributing past and currently observed changes to the forcings (e.g., Hansen et al., 2005;Marvel et al., 2015). For each CMIP generation, the forcings are better characterized, the number of forcings recognized as important to observed trends is increased, and the representation of physical processes impacted by the forcings is expanded. Interactions between expanded physical representations within the model can reveal limitations and the need for refinement that were obscure prior to their coupling.
The GISS contribution to CMIP6 includes varying ocean components, composition interactivity, vertical and horizontal resolution, carbon cycle, and forcings. We report here on a subset of this contribution based upon GISS-E2.1, an updated and more skillful version of the GISS-E2 model used in CMIP5 (Miller et al., 2014;Nazarenko et al., 2015;Schmidt et al., 2014). The seasonal climatology of GISS-E2.1 during the satellite period  is evaluated elsewhere by Kelley et al. (2020). The present article is a complementary analysis of climate trends during the CMIP6 historical period (1850-2014) simulated by GISS-E2.1 for comparison to observed changes.
In section 2, we summarize the model versions and describe the format of the historical experiments. In section 3, we document forcings that perturb the model's pre-industrial climate, noting differences with respect to the GISS CMIP5 simulations that impact the climate response. In section 4, we evaluate model climate trends as coherently expressed across disparate climate variables and the trend dependence upon the forcing along with the atmosphere and ocean components. In section 5, we present a summary of results and implications.

Models and Experiments
A full description of GISS-E2.1, including development since CMIP5, is given by Kelley et al. (2020). Here we summarize the main features of the atmosphere and ocean components, while describing how the historical ensembles arise from the pre-industrial experiments.  (Sun & Bleck, 2006), where "R" and "G" are replaced by "H." The suffix "CC" refers to an interactive carbon cycle (Ito et al., 2020;Latto & Romanou, 2018;Lerner et al., 2020;Romanou et al., 2013Romanou et al., , 2014. The domain of GISS-E2.2 includes the mesosphere and has a better stratospheric circulation Rind et al., 2020), while GISS-E3 incorporates a new cubed-sphere grid and improved model physics.

Atmospheric Model
The horizontal and vertical resolution of the atmospheric component of GISS-E2.1 is unchanged since CMIP5: 2 • latitude by 2.5 • longitude with 40 vertical layers extending from the surface to 0.1 hPa in the lower mesosphere. Nonetheless, GISS-E2.1 shows substantial improvement in its seasonal climatology, along with variability like the Madden-Julian Oscillation (Del Genio et al., 2015;Kim et al., 2012) and teleconnection patterns associated with El Niño and the Southern Oscillation (ENSO) and the Pacific Decadal Oscillation .
Three versions of the GISS-E2.1 atmospheric component have been submitted to CMIP6. Atmospheric composition is prescribed in the first version, denoted here as NINT (for "non-interactive") and in the CMIP6 archive as physics-version=1 ("p1"). In two other versions, aerosols and ozone are calculated prognostically using either the One-Moment Aerosol (OMA) model or else the MATRIX model ("Multiconfiguration Aerosol TRacker of mIXing state" Bauer et al., 2008Bauer et al., , 2020. The OMA version is designated as physics-version=3 ("p3") in the CMIP6 archive to signify continuity with its CMIP5 predecessor, the TCADI model from which OMA was derived. (The relation of different CMIP generations of ModelE is summarized in Figure 1) The MATRIX version of GISS-E2.1 is designated by physics-version=5 ("p5") in the CMIP6 archive.
Aerosols and ozone in the NINT model are prescribed from "AMIP-style" OMA simulations of the historical period with forcings described in section 3 . Sea surface temperature (SST) and sea ice fraction are available from Rayner (2003) after 1870. Prior to that, these variables are specified using an average from the decade 1876-1885, a period chosen for its spatial coverage of measurements prior to significant warming. The use of the OMA model to prescribe NINT composition gives greater consistency between the two model versions. For example, ozone variations are in phase with the solar forcing in all the GISS CMIP6 ensembles. This consistency allows differences in behavior between NINT and OMA to be more directly attributed to feedbacks between composition and climate variations. In addition, the OMA version of GISS-E2.1 provides atmospheric composition for NINT based upon a state-of-the-art model, in contrast to the CMIP5 NINT version, where some constituents like ozone were calculated with the previous-generation (CMIP3) version of ModelE. OMA aerosol species provided to NINT include sulfate, nitrate, ammonium, organic carbon and three types of black carbon (BC). These species are primarily anthropogenic in origin, although there are non-anthropogenic sources from biomass burning and lightning. OMA also provides NINT with sea salt and dust aerosols that vary temporally through climate variables like wind speed. Anthropogenic dust sources are not represented in OMA, even though high dust concentration is observed in regions of cultivation (e.g., Ginoux et al., 2012;Mahowald et al., 2004;Tegen et al., 1996). Nonetheless, some anthropogenic trends in dust concentration can be driven indirectly by irrigation that increases soil moisture , as well as by the formation of sulfate and nitrate coatings on the dust particle surface that increases the rate of wet removal (Bauer & Koch, 2005;Bauer et al., 2004).
Atmospheric composition from AMIP-style OMA and MATRIX experiments is described by Bauer et al. (2020). Aerosol concentrations are determined by simulating an extensive series of chemical reactions in the atmosphere, which includes prognostic budgets for other radiatively active constituents like methane, nitrous oxide and chlorofluorocarbons (CFCs). The surface concentration of each of these gases is prescribed during the historical period using measurements (Meinshausen et al., 2017), with their vertical and spatial distributions determined by a suite of physical and chemical processes. Photochemical oxidation of methane is a source of stratospheric water vapor that alters the column longwave opacity. Ozone in the troposphere and stratosphere is calculated prognostically based upon the prescribed concentration of constituents like halogens or emission of tropospheric pollutants (Hoesly et al., 2018;Meinshausen et al., 2017), photochemistry and heterogeneous reactions on aerosols and polar stratospheric cloud droplets. Photolysis reactions associated with ozone are modulated by variations in the solar photon flux as described in section 3.4 (Matthes et al., 2017), although at present this flux does not vary with the changing concentration of radiatively active constituents like volcanic aerosols or ozone itself.
Aerosol extinction is linearly related to aerosol mass. Solar extinction by BC is increased by a factor of 1.5 relative to its mass to represent enhanced absorption by sulfate and nitrate coatings that increase internal reflection (Bauer et al., 2007), while dust LW extinction is increased by a factor of 1.3 to represent scattering (Dufresne et al., 2002;Schmidt et al., 2006). Along with their direct radiative effect, aerosols perturb climate through microphysical adjustments of cloud properties and the associated cloud radiative fluxes: the aerosol indirect effect (AIE; e.g., Boucher et al., 2013). In the NINT model, cloud fraction is increased according to the number concentration of sulfate, nitrate and carbonaceous aerosols above their pre-industrial levels: a heuristic representation of changes to cloud radiative forcing by aerosol nucleation of cloud droplets (Hansen et al., 2005). The relation between cloud fraction and aerosol number is logarithmic to represent the declining effect of adding anthropogenic aerosols to highly polluted air where nucleation sites are already abundant. This heuristic representation of the AIE within the NINT model is tuned to give roughly −1 W m −2 of forcing over the historical period, consistent with more physically sophisticated models constrained by observations (Boucher et al., 2013).
The NINT AIE directly relates anthropogenic aerosol mass to a change in cloud cover. Radiative properties of observed clouds depend upon cloud droplet number (CDN), which depends upon the concentration of both anthropogenic and natural aerosols, but additionally upon droplet size and their distribution with height within the cloud. The OMA and MATRIX models incorporate this more comprehensive view by relating aerosol mass only to the CDN at cloud base, and simulating cloud microphysical processes that convert vapor to liquid water to determine droplet size and the cloud optical depth. CDN is derived from aerosol number and updraft speed using empirical relations for convective clouds (Menon & Rotstayn, 2006, their Figure 1) and for stratiform clouds (Lohmann et al., 2007, their Equation 1). These relations are derived from case studies of observed cloud properties simulated by more physically complex microphysical models (e.g., Segal et al., 2004). In MATRIX, aerosols serve as droplet nucleation sites depending upon their mass and number along with the cloud updraft velocity following Abdul-Razzak et al. (1998) and Abdul-Razzak and Ghan (2000).
The OMA and MATRIX versions of GISS-E2.1 represent only the first indirect effect relating aerosol number to cloud optical thickness (Twomey, 1977). Development of the second indirect effect relating the influence of aerosols upon the growth of cloud droplets to precipitable size (Albrecht, 1989) is underway in GISS-E3. In contrast to the NINT model, the OMA and MATRIX calculations of cloud radiative effect are not adjusted to give a specific indirect forcing.
For the NINT pre-industrial control simulations (denoted as "piControl" in the CMIP6 archive), aerosol and ozone concentrations vary with calendar month based upon a decadal average of the OMA AMIP Note. Aerosols and ozone in the p1 (NINT) "f1" and "f2" simulations are taken from a four-member ensemble average of a prior OMA AMIP model version and a five-member ensemble average of the corrected OMA AMIP model, respectively. The doi of the latter ensemble is 10.22033/ESGF/CMIP6.6984, and its output in the CMIP6 archive is identified by amip_r[1-5]i1p3f1. The OMA experiments identified by "f1" in the CMIP6 archive use the corrected model.
The first year of each piControl experiment is unrelated to the length of its coupled spin-up.
simulation between 1850 and 1859. For NINT experiments during the CMIP6 historical period, the prescribed concentration is updated using OMA AMIP values that vary by month and year.

Ocean Models
Each of the three physics versions of the GISS-E2.1 atmospheric component is coupled to one of two ocean general circulation models (OGCMs).
The coupled model denoted "GISS-E2-1-G" in the CMIP6 archive (and "E2.1-G" here) uses the GISS Ocean version 1 (GO1), a descendent of the Russell OGCM used in CMIP5 (whose coupled version was denoted "GISS-E2-R" or else "E2-R"). Both GO1 and the Russell OGCM have horizontal resolution of 1 • latitude by 1.25 • longitude, so that four ocean grid boxes nest within a single AGCM grid box. GO1 has 40 vertical layers, 8 more than its CMIP5 predecessor. The additional layers improve vertical resolution in the upper ocean, including the thermocline. GO1 has 24 layers above 1,400 m depth compared to 15 in the CMIP5 version. Above 200 m depth, GO1 has 11 layers compared to just 6 in its predecessor. This approximately doubles vertical resolution in shallow, marginal seas where GO1 shows reduced salinity biases. The strongest mixing is now directed along isopycnals (Redi, 1982), eliminating unintended cross-isopycnal diffusion present in the GISS CMIP5 simulations.
The HYCOM model calculates the ocean circulation on 32 isopycnal layers, an increase of 6 over its CMIP5 precursor. These are replaced by height coordinates in regions of weak stratification. Meridional grid spacing is 1 • at the equator and decreases poleward with the cosine of latitude. Zonal resolution is a uniform 1 • outside of the Northern Hemisphere (NH) high latitudes, where a separate bi-polar grid eliminates the singularity of spherical coordinates. Because of the varying horizontal resolution associated with the NH polar grid and varying meridional spacing elsewhere, output from the coupled HYCOM model ("GISS-E2-1-H" or "E2.1-H") is archived on a uniform 1 • by 1 • grid at 32 depths extending from the surface to 5500 m.
The combination of the three physics versions of the GISS-E2.1 AGCM and the two ocean models results in six distinct coupled models that have been submitted to CMIP6. At the time of writing, less model output was available for the MATRIX coupled models, so we present results only for the NINT and OMA ensembles.

Spin-Up of Pre-Industrial Control Runs and Branching
Each historical simulation departs from a control simulation that is near equilibrium with respect to pre-industrial atmospheric composition.
The pre-industrial control runs resulting from the different combinations of atmospheric model physics versions and ocean models are listed in Table 1.
During integration of the NINT historical experiments, we identified a coding error in the OMA model that caused an underestimate of stratospheric ozone destruction by heterogeneous reactions on the surface of volcanic aerosols. (The surface area of volcanic aerosols that host these reactions was too small.) This error had a small effect upon the pre-industrial climate, but restricted ozone destruction by halogens following a volcanic eruption (Muthers et al., 2015;Solomon, 1999), leading to an ozone increase following Pinatubo in contradiction to observations (Hofmann & Oltmans, 1993). The AMIP OMA simulations were rerun to generate new ozone and aerosol inputs for a second set of NINT pre-industrial control and historical runs. The NINT model output with the original and corrected composition is distinguished by "f1" and "f2," respectively, in the CMIP6 ripfarchive code. The atmospheric component of GISS-E2.1 used in these two NINT simulations is identical; all that varies is the prescribed ozone and aerosol composition. In this article, we emphasize the results of the NINT f2 and OMA f1 historical ensembles, both of which are based upon the corrected model. In the troposphere, NINT f1 and f2 variables that we have examined are statistically indistinguishable, so we show the NINT f1 results for stratospheric variables, mostly deferring tropospheric diagnostics of this configuration to the Supporting Information.
The E2.1-G NINT f1 pre-industrial ocean was started from rest using a present-day climatology of temperature and salinity Levitus, Burgett, et al., 1994), and run for two millennia with pre-industrial atmospheric composition before a small correction was made to the sea ice calculation. The model was integrated for another 200 years before a small fix to lake ice was applied. After an additional 50 years, net radiation at top of atmosphere (TOA) was roughly 0.1 W m −2 (with an interannual standard deviation of 0.35 W m −2 ). This was considered close enough to equilibrium to start the piControl simulation and branching of the first historical ensemble member. After nearly three millennia of additional integration, the NINT f1 piControl was used to initialize the hundred-year spin-up of the corresponding (E2.1-G) OMA model, after which the OMA piControl and historical ensembles were started. The E2.1-G NINT f2 model branched from the f1 piControl 500 years after the OMA model; after 50 years of spin-up, the f2 piControl and historical ensemble were initialized. The E2.1-H NINT and OMA coupled configurations followed a similar path toward equilibrium and initialization of the piControl and historical experiments.
The global and annual-mean surface air temperature for the NINT and OMA piControl experiments is shown in black in Figure 2, along with low-frequency variations in red that are identified using a loess fit (described below). For the NINT experiments, temperature variations are on the order of a tenth of a degree K, with higher variability in the E2.1-G coupled model than in E2.1-H. Unforced variability of surface air temperature in each piControl experiment is larger than in its CMIP5 predecessor. (The standard deviation about the loess fit is indicated in the upper right corner of each panel of Figure 2, with the CMIP5 GISS-E2 value in parentheses.) This contrast is associated with stronger ENSO variations in E2.1-G . In general, surface trends are small for each model in Figure 2. Sub-surface trends are more apparent. Figure 3 shows drift in ocean heat content, consistent with the small radiation imbalance at TOA. For the NINT f2 model, the ocean heat content decreases by 550 ZJ over 450 years. This change is comparable in magnitude to an estimate of uptake since the pre-industrial (Zanna et al., 2019), suggesting that the model deep ocean is still adjusting toward equilibrium (Figure 3b; 1 ZJ equals 10 21 J).
The first member of each historical ensemble branches at the start of its pre-industrial control run (Table 1). The remaining members are initialized every 20 years, a duration long enough so that the mid-latitude gyre circulations and tropical ENSO are decorrelated and independent between ensemble members (Blumenthal, 1991). (The dates of branching are shown by the blue triangles in Figure 2) To offset the larger unforced variability in E2.1-G compared to its CMIP5 E2-R predecessor, which can obscure forced variations, we chose a ten-member ensemble size, compared to five for E2-R and (CMIP6) E2.1-H, whose piControl variability is smaller. Table 2 lists the historical ensembles including the branching date in the piControl simulation that corresponds to the start of the first member of each historical ensemble.

Definition of Anomalies During the Historical Period
The historical ensemble members are perturbed from the pre-industrial control experiment by the forcings described in section 3. However, each member initially retains some drift and low-frequency variability from the control run, in part because the latter may not have reached equilibrium. To isolate changes due to the forcing, we construct anomalies by subtracting from each historical member the corresponding low-frequency drift of the PI model. This drift is identified using a loess fit to the PI control experiment (Cleveland & Devlin, 1988), based upon the loess function in the R statistical (or stats) library (e.g., Teetor, 2011).
The loess fit derives from a local linear regression that adapts to a larger range of time scales in the piControl simulations than a simple linear or quadratic fit, similar to a low-pass filter. This is useful if the remaining approach to equilibrium is largest in the early part of the control run, so that minimal detrending is necessary thereafter. This is the case for the E2.1-H NINT model whose trend is largest in the first hundred years ( Figure 2d). A disadvantage of loess or low-pass fits is that they also capture low-frequency unforced oscillations in a control experiment. Where forcing in the historical experiment drives oscillations out of phase with the control variability, there will be spurious variations in the anomaly that are due to this temporal mismatch and not the forcing.
We choose our smoothing duration near 200 years to capture drift at longer time scales while excluding higher frequency oscillations. Despite this, Figure 2a shows a temporary warming around Year 4900 whose origin between residual disequilibrium and internal variability cannot be distinguished. Pre-industrial control variations at the surface are on the order of a tenth of a degree K or less, small compared to the magnitude of the forced anomalies observed by the end of the historical period, so this distinction is not of practical importance. In general, we find that drift correction is important for ocean heat content and sea level, but does not affect atmospheric trends noticeably.

Forcing
We characterize perturbations to the pre-industrial climate using the effective radiative forcing (ERF), defined as the difference in TOA net radiation between two AMIP-style simulations with pre-industrial SST and sea ice, where one simulation contains the forcing agent: for example, an increase in greenhouse gas (GHG) concentration (Hansen et al., 2005). The difference is constructed from 30-year averages after a 1-year adjustment period, following the RFMIP-ERF protocol Pincus et al., 2016;Smith et al., 2020). The ERF gives the radiative imbalance prior to the slow return by the ocean to equilibrium but after rapid adjustments in the atmosphere to the original forcing: for example, through stratospheric temperature or clouds, including their modification by aerosols.
Unforced variability in both simulations introduces uncertainty in the ERF. Within the 30-year averaging period, interannual variations of TOA net radiation are similar in magnitude for all forcings; we combine variance across the individual forcings to compute a common standard deviation of 0.15 W m −2 , so that the standard error of the ERF, constructed from the difference of two 30-year average fluxes, is 0.04 W m −2 . Thus, drivers whose global ERFs have magnitudes less than 0.08 W m −2 cannot be distinguished statistically from 0 with more than 95% confidence without longer runs. (This same criterion applies when comparing ERF values computed from two forcing agents introduced in the same model. For an identical forcing agent applied to NINT and OMA, this range should be multiplied by √ 2, so that statistically distinct values are separated by at least 0.11 W m −2 .) This uncertainty is problematic for highly localized forcing agents like land use and irrigation, whose global ERFs are small. Regional ERF values are also obscured by unforced variations of TOA net radiation.
Below, we describe the forcings that perturb GISS-E2.1 during the historical period. Global ERF for the E2.1-G NINT f2 model version is shown for the years 2000 and 2014 in Figure 4. The year 2014 is chosen to characterize the present-day forcing; the year 2000 allows comparison to forcing computed with previous model versions. Comparison of forcing between the E2.1-G NINT f2 and OMA ensembles is given by Figure 5.
Forcing in the E2.1-G and E2.1-H configurations is expected to be similar, due to their common AGCM and pre-industrial climates that are only slightly different.

Long-Lived Greenhouse Gases
Long-lived greenhouse gases (GHGs) have nearly uniform concentrations as a result of atmospheric mixing. In the CMIP6 historical ensembles, concentrations of GHGs are annual averages of values taken from Meinshausen et al. (2017). We use "Option 2" from that study and calculate radiative forcing from prescribed concentrations of CO 2 , CH 4 , N 2 O, CFC-11 (eq,) and CFC-12, where CFC-11 (eq.) includes CFC-11 plus a radiative equivalent of 38 other long-lived GHGs. Concentrations are uniform with height throughout the troposphere.
In the NINT ensembles, the NH concentrations of CH 4 , N 2 O and CFCs are increased compared to their Southern Hemisphere (SH) values to account for hemispheric contrasts of anthropogenic sources. In the OMA historical experiments, the vertical dependence of CH 4 is calculated prognostically, but its surface value is tethered to the prescribed NINT concentration and meridional contrast. The prognostic concentrations of N 2 O and CFC calculated by the OMA model are replaced in the radiative calculation with the prescribed CMIP6 values used by the NINT model.
In the stratosphere, oxidation of methane is a source of water vapor (LeTexier et al., 1988) that is calculated prognostically in the OMA and MATRIX ensembles. For the NINT model, the source is implemented empirically by increasing stratospheric water according to the methane concentration at the surface, but lagged by 2 years to account for vertical transport (Miller et al., 2014). This corresponds to a forcing around 0.1 W m −2 (Hansen et al., 2005).
Radiative forcing by GHGs is markedly smaller in E2.1-G than in the CMIP5 E2-R. Figure 4 shows that in Year 2000, ERF for the E2.1-G NINT configuration is 2.38 W m −2 in contrast to the E2-R value of 3.25 W m −2 . This reduction by one-quarter is greater than the fractional contrast in forcing associated with the doubling or quadrupling of CO 2 (Figure 4), suggesting that other GHGs also contribute to the reduction.
The spatial distribution of the GHG ERF is shown in Figure 6 for both the CMIP6 and CMIP5 versions of the GISS NINT configuration. The forcing is largest mainly at low latitudes where outgoing longwave radiation is highest. The contrast is fairly uniform with latitude, although slightly higher in the SH mid-latitudes ( Figure 6c). Column water vapor (or "precipitable water") and longwave (LW) cloud radiative forcing are larger in the E2.1-G NINT pre-industrial climate, compared to their CMIP5 E2-R predecessors (Figure 7). The radiative effect of the precipitable water contrast is related to the percentage difference (Goody & Yung, 1989;Huang & Shahabadi, 2014), which is largest in the SH (Figure 7c). Contrasts in LW cloud radiative forcing vary more with latitude but show a smaller hemispheric contrast (Figure 7f). We interpret the smaller GHG forcing in the GISS CMIP6 ensembles as the result of a column that is more opaque to LW radiation due to higher concentrations of water vapor and increased LW cloud forcing, and therefore less sensitive to further increases in LW opacity by GHGs. The reduction of the E2.1-G ERF compared to the CMIP5 E2-R value brings the GHG forcing closer to that of the CMIP3 ModelE (Hansen et al., 2007).  Skeie et al. (2020) show that GISS-E2.1 simulates the observed change in column ozone trend at springtime high latitudes between the late 20th century and first decade of the 21st century. For Year 2014, the NINT ozone ERF is 0.27 W m −2 and statistically indistinguishable from the OMA value (Figures 4 and 5). During this year, the concentration of volcanic aerosols is small as is the forcing difference with respect to the uncorrected NINT f1 configuration that underestimates ozone destruction by heterogeneous chemistry ( Figure S3). This correction is nonetheless important to regional ozone concentration and forcing. The NINT f2 ERF for 2014 is generally positive at all latitudes ( Figure 8e) except over Antarctica, where the stratospheric ozone hole caused by heterogeneous reactions with ozone depleting substances is apparent. In contrast, negative forcing over Antarctica is nearly absent in the f1 ensemble ( Figure S7e).

Aerosols
For Year 2000, the NINT f2 ERF corresponding to aerosol direct radiative forcing is −0.35 W m −2 , more negative than the CMIP5 NINT ERF of −0.22 W m −2 (Figure 4). The rapid adjustment includes the semi-direct effect of aerosols . The ERF is largest over land and coastal regions, especially in polluted regions like East Asia and the Indo-Gangetic Plain along with regions of cultivation and fertilizer use like the North American Great Plains (Figure 9a). For OMA, emission rather than concentration is prescribed so that the latter is intertwined with cloud processes. Thus, the total aerosol forcing cannot be decomposed into separate direct and indirect contributions. However, the NINT aerosol concentration is derived from a prescribed-SST version of OMA, so the direct effect of the two models should be similar.
The ERF associated with the aerosol indirect effect is larger in magnitude than the direct forcing and dominates the total aerosol ERF, especially where aerosol emissions coincide with clouds, such as the Guinea coast of NH Africa and the Amazon basin (Figures 9b and 9c). For the NINT configuration, the magnitude of the AIE is imposed, as noted in section 2.1. However, for the OMA version, where the first indirect effect is calculated using a physically based parameterization without tuning, the magnitude of total aerosol forcing is only slightly reduced (Figure 9e). Aerosols perturb climate where their deposition upon snow and ice increases shortwave absorption (Flanner et al., 2007;Skiles et al., 2018;Warren & Wiscombe, 1985). In GISS-E2.1, surface albedo is reduced by BC deposition (Bauer et al., 2013;Hansen & Nazarenko, 2003). Forcing is locally positive, as seen in the Himalayas (Figure 8b) where prolific sources of BC are near the bright snowy surface. In the OMA configuration, the global ERF is slightly positive: 0.10 W m −2 ( Figure 5). The NINT ERF is globally negative but not statistically distinct from 0 ( Figure 4).

Solar
Incoming solar radiation at TOA is derived from the annual average of the spectrally resolved flux at 1 AU compiled by Matthes et al. (2017). The annual flux is derived from two empirical models that relate measurements of solar irradiance by satellite (Rottman, 2005) to proxies like sunspot number and facular darkening that are available farther back in time. The satellite measurements are themselves a source of uncertainty due to gaps in temporal coverage and unknown offsets between non-overlapping instruments (e.g., Fröhlich, 2006). The same CMIP6 data set provides the photon flux that is used to calculate photolysis rates. The NINT f2 global ERF for solar in 2000 is 0.06 W m −2 , down from the CMIP5 NINT ERF of 0.28 W m −2 (Figure 4). For the year 2014, solar ERF for both CMIP6 NINT and OMA models is small and indistinguishable from 0 ( Figure 5). Both of these years are near the maximum irradiance measured during their respectively solar cycles. The latest cycle (24) is unusually weak compared to recent predecessors (Jiang et al., 2015).

Volcanic Aerosols
We use the recommended CMIP6 data set for stratospheric aerosols created by volcanic eruptions (ftp://iacftp.ethz.ch/pub_read/luo/CMIP6/Readme_Data_Description.pdf), a change from prior GISS CMIP experiments, where volcanic aerosols were derived from an update of Sato et al. (1993).
The CMIP6 volcanic prescription is based upon a combination of direct measurements and model reconstructions. During the satellite era (after 1979), radiance measurements are used to retrieve the aerosol extinction and particle size as an evolving function of height and latitude (Thomason et al., 2018). This period includes major eruptions by El Chichon and Pinatubo. Prior to the satellite era, stratospheric aerosol properties are derived using a chemical transport model (Arfeuille et al., 2014). The injection of sulfur dioxide following an eruption is estimated using measurements from multiple ice cores. Oxidation and conversion to sulfate aerosol is calculated by a model that simulates nucleation, condensation, coagulation, and deposition with transport prescribed from climatology as a function of latitude and height. Photometric extinction is used as a constraint when available. The model simulates extinction by sulfate aerosols for specific historical eruptions identified by the ice cores. During quiescent periods, aerosol properties are taken from satellite retrievals averaged between 1996 and 2005. Aerosol properties following eruptions are different after 1979 when retrievals replace model estimates.
The CMIP6 prescription moves one eruption identified by Sato et al. (1993) from the late 1850s to 1861 (Table 3). GISS prescribes total extinction at 0.55 μm using the CMIP6 prescription, but specifies its own version of the remaining Mie parameters as described by Hansen et al. (2005). During the historical period, total extinction at 0.55 μm is generally reduced compared to Sato et al. (1993). Extinction is now largest following the eruption of Pinatubo rather than Krakatau. Globally, the effective radius of volcanic aerosols is reduced.
During the piControl simulations, volcanic aerosol properties are held constant at a value equal to their average during the historical period. This is intended to minimize the climate trend resulting from successive eruptions during the historical period by requiring that the time-average volcanic forcing be 0. Negative forcing following eruptions after 1850 is punctuated by positive forcing during the intervening quiescent periods ( Figure 10). For example, in 2000, during the quiescent period following Pinatubo, CMIP6 volcanic forcing is 0.09 W m −2 (Figure 4). The assumption of zero forcing during the historical period results in a forcing discontinuity at 1850, when the constant pre-industrial concentration of volcanic aerosols is replaced by a smaller value until the Makian eruption in 1861 (Table 3).

Orbital
Orbital forcing results from variations in the eccentricity of Earth's orbit, the obliquity of its rotation axis compared to the orbital plane and the date of perihelion. As for CMIP5, these parameters are prescribed following Berger (1978). During the historical period, orbital variations in TOA insolation are small and are calculated only to allow continuity with Last Millennium simulations where changes are larger.

Land Use
The expansion of cultivation and grazing with human population over the historical period reduces the fraction of natural vegetation types within each land grid box. We inadvertently implemented land use in GISS-E2.1 using the same areal fractions of cropland and pasture as prescribed for the GISS CMIP5 model (Miller et al., 2014). Departures from the CMIP6 protocol are described by Ito et al. (2020). After 2005, the end of the CMIP5 historical period, land use is held constant.
Land use affects surface albedo. Forcing is negative where crops replace darker forests; forcing is small where crops expand into grassland and positive where bright bare soils are planted. Land use alters transpiration, which increases where the leaf area index is increased. Reforestation is not included in our prescription. The "pasture" category includes both managed pastures and unmanaged rangeland. The latter is found within natural vegetation categories and its surface properties are generally unmodified by grazing, so its inclusion within the pasture category leads to an excessive prescription of cultivated area. This overestimate is largest in grid boxes with abundant natural grasslands and arid shrublands where rangelands are identified. The inclusion of unmanaged rangeland in the cultivated category nearly doubles the area of simulated land use. Despite the overestimation of land use area, neither NINT or OMA global forcing in 2014 are distinct from 0 ( Figure 5). Nonetheless, negative values of ERF are significant over the Great Plains of North America (Figure 8c), consistent with the GISS CMIP5 ERF, due to replacement of forest by lighter crops. Figure 8c shows large positive forcing over the Arabian Peninsula, where cultivation expands to cover bright bare soil.

Irrigation
Irrigation was implemented within ModelE after CMIP5 (Cook et al., 2011Puma & Cook, 2010;Shukla et al., 2014). Water demand for irrigation in GISS-E2.1 was calculated between 1900 and 2005 as described by Wada et al. (2014) using irrigation areal extent from Siebert et al. (2015). Nineteenth century irrigation is estimated by extrapolating backwards from the trend between 1900 and 1929. Irrigation is held constant after 2005 using the flux from 2004.
The irrigated flux is taken from lakes and runoff within the same grid box and deposited into the soil column (Puma & Cook, 2010). When lakes and runoff are insufficient to meet demand, water is assumed to be supplied by a aquifer of unlimited volume. The aquifer draw represents an addition of water to ModelE.
Irrigated water may run off or evaporate, but some is diverted into deep soil layers that have a long return time to the ocean and atmosphere. As a consequence, irrigation initially has the potential to sequester water in the soil column and reduce sea level by transferring moisture from the ocean to the land. Eventually enough water accumulates in the deep layers that a large fraction of the irrigated flux remains near the surface where it is subject to evapotranspiration and runoff. Crops and vegetation share the same soil column in GISS-E2.1. Thus, saturation of the deep soil layers through the irrigation flux takes longer than if water were applied solely to the crop fraction of the soil column.
Global radiative forcing by irrigation is statistically indistinguishable from 0 in the NINT model but slightly positive in the OMA model ( Figure 5). Forcing is positive over the Indo-Gangetic Plain (Figure 8d), a site of extensive irrigation during the twentieth century (Puma & Cook, 2010). Moistening of the atmosphere by irrigation increases LW opacity, which contributes to positive forcing. The surface can nonetheless cool, as moisture supplied by irrigation shifts the compensation of solar heating away from the turbulent sensible heat flux toward efficient cooling by evaporation.

All Forcing
For Year 2014, the NINT ERF for the addition of all forcing agents simultaneously is 2.08 W m −2 and is dominated by the contribution from rising greenhouse gas concentrations (Figures 4 and 10). For the OMA Figure 11. Ensemble, global, and annual-average surface air temperature (K) during the historical simulations (1850-2014). (a) NINT f2 and (c) OMA models; (b and d) same but emphasizing the period after 1970. Intra-ensemble variability at plus and minus two standard deviations is marked by pink and light blue shading for the E2.1-G and E2.1-H models, respectively, with light purple marking their overlap. This variability is smoothed using an 11-year window that damps periods less than a decade. Temperatures from the corresponding CMIP5 models (E2-R and E2-H) are plotted as dotted lines. The observed GISTEMP record is in black. Temperatures are anomalies, defined relative to their late-nineteenth century average . Vertical gray lines show major volcanic eruptions listed in Table 3. configuration, the all-forcings ERF is 2.19 W m −2 ( Figure 5), marginally larger than the NINT value, with aerosols making the largest contribution to the contrast.
For Year 2000, the CMIP6 NINT ERF is 1.79 W m −2 , nearly 1 W m −2 smaller than the CMIP5 NINT value of 2.72 W m −2 . Most of this change is due to a reduction of GHG forcing in the CMIP6 model. The total aerosol forcing is nearly unchanged between the two generations of NINT experiments, but solar and volcanic forcing combine for an additional reduction of 0.46 W m −2 .
The spatial distribution of the ERF is generally positive as a result of increasing LW opacity due to rising GHG concentrations ( Figure 8a). However, the imprint of more regionalized forcings is apparent: for example, land use over the North American Great Plains and western Arabian Peninsula, aerosols over southeast Asia, the Guinea coast and Amazon basin, BC deposited upon snow in the Himalayas, irrigation over the Indo-Gangetic Plain, and the near offset of GHG forcing over SH high latitudes by stratospheric ozone loss.

Temperature
Surface air temperature for the historical ensembles is plotted in Figure 11 along with the GISTEMP Land-Ocean Temperature Index (LOTI; Lenssen et al., 2019). (Table 4 summarizes the observations used in this study.) Anomalies are defined relative to the start of the GISTEMP record in the late nineteenth century (1880-1900), when the forcing is still small. Over the historical period, the GISS-E2.1 models warm by 0.1-0.2 K less than their GISS CMIP5 counterparts. The reduced warming is in spite of a higher transient climate response (TCR) for the GISS-E2.1 models compared to GISS-E2 (Table 5, excerpted from Table 6 of Kelley et al., 2020), suggesting that the CMIP6 reduction is a consequence of smaller total forcing (Figure 4). The E2.1-H version has a higher TCR than E2.1-G, consistent with its higher warming in both the NINT and OMA versions. (Table 5).
Because E2.1-G and E2.1-H share a common atmospheric model, the warming and TCR contrasts suggest that HYCOM exports heat from the surface to the deep ocean more slowly (as shown in section 4.3).
Regional warming is shown in Figure 12 as the temperature difference between the start of the GISTEMP LOTI (1880-1899) and the end of the historical period (2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014). Both the models and observations show their largest warming at high latitudes, especially in the NH, a robust feature of coupled models (Dai et al., 2019;Holland & Bitz, 2003). In E2.1-G, there is cooling in the region of North Atlantic deepwater formation near the outlet of the Labrador Sea that matches the location of the observed warming minimum (Figures 12a and 12b). This minimum is not reproduced by the E2.1-H ensembles (Figures 12d and 12h).
Despite the broad hemispheric agreement of model temperature with the observations, regional disagreements are apparent. The extratropical continents in both hemispheres warm less than observed in both the NINT and OMA simulations. The contrast is especially large over the Indo-Gangetic Plain and East Asia, regions cooled by irrigation and strong negative forcing by anthropogenic aerosols (Figures 8 and 9).   In contrast, there is excessive warming by all model versions in the Greenland and Norwegian Seas. In the Tropics, model warming is "ENSO-like" with greater magnitude in the equatorial Eastern Pacific compared to observations (Cane et al., 1997).
Differences between the OMA and NINT ensembles are largest at NH high latitudes, where meridional amplification of warming by OMA is smaller. This moderation is most prominent in the vicinity of the Norwegian Sea in both OMA models (Figures 12e and 12f).
Upper air temperatures retrieved from the Microwave Sounding Unit (MSU) and Advanced Microwave Sounding Unit (AMSU) are plotted in Figure 13. Retrievals of lower stratosphere (TLS) and middle troposphere (TMT) temperature, representing weighted averages with respect to height, are provided by Remote Sensing Systems (Mears & Wentz, 2016, RSS Version 4.0). TLS is weighted most heavily near 20 km; TMT has its largest contribution from radiation originating near 10 km, but has significant weight extending into the lower stratosphere. The weighting with respect to height varies with location and season. For comparison to the retrievals, we average model temperature over a range of altitude that is globally and temporarily invariant (Hansen et al., 1998), an approximation that has been shown to have a small effect upon annual trends compared to other uncertainties, including differences among retrieval algorithms (Santer et al., 1999;Thorne et al., 2011).
The TLS retrievals show abrupt warming of the lower stratosphere following a volcanic eruption. Recovery is initially rapid followed by a slower decadal cooling and return to pre-eruption values (Figure 13a and 13b). In the last half-century, this cycle is superimposed upon a multi-decadal cooling trend that has been attributed to increasing concentrations of ozone depleting substances and greenhouse gases (Gillett et al., 2011;Manabe & Weatherald, 1967). Each of the GISS-E2.1 ensembles reproduces these transient warmings and recent cooling. Throughout the historical period, TLS from the NINT f2 ensemble closely tracks the OMA value, reflecting the prescription of atmospheric composition in the former model from an AMIP version of the latter.
At the start of the historical period, the lower stratosphere in the OMA and NINT f2 models cools abruptly, prior to any volcanic eruption (Figures 13a and 13b). The anomaly is outside two standard deviations of intra-ensemble variability, and is significantly different from the initial pre-industrial temperature. We interpret this cooling as the result of a discontinuous removal of volcanic aerosols. As noted in section 3.5, aerosol concentration during the pre-industrial is constant and equal to its average during the historical period. At the start of the historical period, concentration drops abruptly and remains low until the Makian eruption in 1861. The removal of volcanic aerosols causes cooling in the OMA and NINT f2 ensembles by the same mechanism that the injection of aerosols into the stratosphere following an eruption causes rapid warming through longwave absorption (McCormick et al., 1995).
In contrast, this cooling is absent in the NINT f1 ensemble that also generally lacks the slow decadal recovery following each eruption that is present in the other model versions. This absence is associated with a consistent warm offset of the NINT f1 TLS during quiescent periods. Until the introduction of CFCs into the stratosphere in the mid-twentieth century, TLS is tightly coupled to column-integrated ozone in both NINT models (Figure 14). At the start of the historical period, column ozone drops in the NINT f2 ensemble, associated with a reduced concentration of volcanic aerosols that host heterogeneous chemical reactions on the aerosol surface. These reactions compete for nitrogen and reduce the concentration of NO x and the associated catalytic destruction of ozone (Muthers et al., 2015;Tie & Brasseur, 1995). The abrupt reduction of volcanic aerosols at the start of the NINT f2 and OMA ensembles causes catalytic ozone destruction to increase, associated with cooling of the lower stratosphere. In contrast, heterogeneous reactions are muted in the NINT f1 model by the misspecification of surface area for volcanic aerosols. Ozone destruction in NINT f1 is nearly unchanged by the removal of volcanic aerosols in 1850, resulting in a steady concentration and temperature of the lower stratosphere.
By the late twentieth century, CFC concentrations have increased and the effect of heterogeneous reactions upon ozone is now dominated by destruction through the presence of chlorine. Figure 14shows that column ozone decreases following the Pinatubo eruption in the NINT f2 and OMA ensembles, as observed (Hofmann & Oltmans, 1993), and in contrast to the increase in the NINT f1 model. The contrasting relation of TLS and ozone in the NINT f2 model between nineteenth and late-twentieth century eruptions shows that the representation of heterogeneous chemistry in OMA is flexible and able to reproduce the distinct limits dominated by chlorine and nitrogen associated with high and low background concentrations of CFCs (Tie & Brasseur, 1995).
Retrievals of TMT show a warming trend starting in the late twentieth century punctuated by abrupt warming associated with El Niño events and cooling following major volcanic eruptions (Figure 13c and 13d). (TMT trends also include an offsetting contribution from cooling in the lower stratosphere.) The warming trend and post-volcanic cooling are reproduced by each ensemble.
Temperature trends between 1979 and 2014 at the surface and MSU retrieval layers are shown in Figure 15. This period was chosen for overlap between the MSU retrievals and historical ensembles. Observational uncertainty reflects interannual variations, but not instrument or retrieval uncertainty. For surface temperature, this effect is on the order of 0.01 K per decade and relatively unimportant. Retrieval uncertainty for MSU is higher: 0.05 K per decade (Thorne et al., 2011).
The figure shows surface and tropospheric warming but stratospheric cooling, a contrast that is consistent with anthropogenic forcing, especially rising greenhouse gas concentrations and ozone depletion (e.g., Gillett et al., 2011;Manabe & Weatherald, 1967). The NINT f1 and f2 ensembles have tropospheric trends that are indistinguishable, showing the limited effect of the erroneous post-eruption ozone chemistry in the original ensemble (Table 6). The difference is larger and significant in the stratosphere.   0.17 ± 0.03 0.13 ± 0.04 −0.22 ± 0.10 Note. Model trends are calculated from the ensemble mean, while their uncertainty range bounds the central 95th percentile of the distribution of ensemble member trends. Observed trends are based upon GISTEMP (surface) and RSS v.4.0 MSU channels TMT (middle troposphere) and TLS (lower stratosphere). Uncertainty of observed trends is related to interannual variability but does not include measurement or retrieval error. Different methods of MSU retrieval contribute additional TMT trend uncertainty near 0.05 K per decade (cf. Thorne et al., 2011).
For the lower stratosphere, the observed cooling is consistent with the distribution of model trends for each ensemble. In the mid-troposphere and surface, most ensembles have mean warming trends that are excessive compared to observations, but each E2.1-G ensemble has members whose trend is close to the observed value.
We recomputed trends with the observational record extended to 2018, which introduces an unusually large El Niño event. The observed surface and tropospheric trends increase by 0.02 K per decade, which reduces the model discrepancy (Table 6). This extension has little effect upon agreement in the lower stratosphere.
At the surface and mid-troposphere, E2.1-H ensembles with the HYCOM ocean model show larger warming than the E2.1-G ensembles. This contrast is largest in the OMA ensembles, consistent with the greater contrast in TCR (Table 5.)

Sea Ice
Sea ice area is shown for each ensemble as an annual hemispheric percentage in Figure 16, along with the percentage retrieved by the National Snow and Ice Data Center (NSIDC Fetterer et al., 2017). Sea ice area is greater in both hemispheres in the GISS-E2.1 ensembles compared to their GISS CMIP5 predecessors. This represents an improvement in the SH, where the E2.1-G ensemble means are in good agreement with the NSIDC retrievals at the start of the satellite period. Observed SH sea ice trends are obscured by interannual variability (Figure 16). E2.1-G trends are weak, but SH ice loss in the E2.1-H ensembles is comparable to NH trends and unrealistically large. Freshwater from melting ice sheets is one source of sea ice variability not represented in the models (Golledge et al., 2019;Rye et al., 2014). Anomalous meltwater increases surface stratification, reducing vertical mixing and the upward transport of ocean heat that leads to sea ice loss.
In the NH, ensemble-mean area is up to 30% higher than the NSIDC retrieval, with a larger bias than their GISS CMIP5 counterparts. Kelley et al. (2020) attribute this increase to excessive radiative loss at high latitudes related to brighter clouds. Figure 16 includes an estimate of sea ice extent that extends prior to the satellite period (Brennan et al., 2020). The estimate addresses the sparsity of sea ice observations prior to the mid-twentieth century by assimilating comparatively ubiquitous temperature measurements into an ESM to infer sea ice extent, defined as the total area of grid cells whose areal fraction of ice exceeds 15%. By excluding open water within a grid cell, ice area is generally smaller. For the NSIDC retrieval grid, ice extent is larger than area by about 20%. To facilitate comparison with model area, we give ice extent the same average between 1979 and 2013 as the NSIDC area.
NH ice extent shows a decline in the early twentieth century interrupted by a brief recovery following the eruption of Mount Agung before a steeper decline during the satellite period. Throughout the historical Figure 16. Ensemble and annual sea ice area as a hemispheric percentage during the historical simulations . For comparison to observations, total ice area is shown instead of a detrended anomaly. Unforced piControl trends are small compared to recent historical trends ( Figures S1 and S2). The pink and light blue shading marks plus and minus two standard deviations of intra-ensemble variability for the E2.1-G and E2.1-H NINT f2 models, respectively, with light purple marking their overlap. Sea ice from the corresponding CMIP5 models (E2-R and E2-H) are plotted as thinner lines. The observed hemispheric percentage derived from the NSIDC sea ice area index (v.3.0) is in black. NH sea ice extent calculated from assimilated temperature measurements is shown in purple with shading to denote plus and minus two standard deviations of uncertainty (Brennan et al., 2020). Ice extent is assigned the NSIDC area average between 1979 and 2013 to reduce their offset.
period, the GISS-E2.1 ensembles have too much NH sea ice compared to their GISS CMIP5 predecessors or the Brennan et al. (2020) assimilation. The GISS-E2.1 ensembles also underestimate the early twentieth-century decrease and partial recovery with the E2.1-H OMA ensemble showing the best correlation. The GISS CMIP5 ensembles start from a more realistic baseline and show a declining twentieth-century trend as observed, although they lack the post-Agung recovery.
Over the satellite period, the observed decline in hemispheric ice area is generally within the range simulated by the GISS-E2.1 models ( Figure 17). The GISS-E2.1 ensembles have greater success reproducing the decline of the sea ice maximum in March compared to the September minimum.
Regional sea ice trends are shown in Figure 18. The largest NH areal loss is at the seasonal ice margin in the Greenland and Norwegian Seas. This loss coincides with localized regions of atmospheric warming that is excessive compared to observations (Figures 12c,12d,12g,and 12h), suggesting that the removal of sea ice allows the ocean to warm the atmosphere. Sea ice differences between the NINT and OMA ensembles are small, even though the latter is colder over an extensive region at high latitudes (Figure 12e and 12f). Over most of this region, absolute temperatures are far below the melting point of sea ice, which is thus insensitive to small temperature contrasts.  In the SH, melting is largest along the Antarctic coast rather than at the seasonal ice margin.

Ocean Heat Content
Ocean heat content (OHC) is proportional to temperature, but only potential temperature is available in the CMIP6 archive for all GISS-E2.1 models. We approximate OHC as the column integral of the product of mass, potential temperature and the specific heat of seawater. Potential temperature includes the effect of compression due to the weight of the overlying water column, so departures from temperature are smallest near the surface, increasing on the order of 0.1 K per kilometer (Pawlowicz, 2013). During the historical period, temperature anomalies are largely confined to the upper ocean so fractional differences in trends should be small. We use a constant value of specific heat that is independent of temperature and salinity, but chosen to minimize errors in heat content (Griffies et al., 2016;McDougall, 2003).
Ocean heat uptake by models and observations during the historical period is a few hundred ZJ ( Figure 19). Direct measurements of global temperature are available only since the mid-twentieth century and mainly from above the thermocline. To evaluate the model at earlier times and over the entire column, we plot an assimilation model estimate of heat uptake that uses observed SST to infer sub-surface temperature anomalies (Zanna et al., 2019). Uptake increases steadily over the historical period with a brief hiatus following major volcanic eruptions. For most of the ensembles, this increase is relative to a non-negligible trend during the pre-industrial control experiment (Figure 3). For the E2.1-G NINT f2 ensemble, for example, the magnitude of the pre-industrial trend is half the increase during historical period, demonstrating the importance of subtracting OHC drift. Column uptake since the nineteenth century is estimated near 450 ZJ (Zanna et al., 2019), almost identical to uptake by the E2.1-H OMA ensemble (Figure 19c). Uptake for the NINT ensembles is smaller, between 350 and 380 ZJ (Figure 19a), although all GISS-E2.1 ensembles show a stronger trend in recent decades than observed.
Heat uptake in the GISS CMIP6 models is smaller and closer to the Zanna et al. (2019) estimate than the CMIP5 predecessors to E2.1-G and E2.1-H whose column uptake was 800 and 700 ZJ, respectively (Miller et al., 2014). This is consistent with the smaller total ERF for GISS-E2.1 that results from the reduction of GHG forcing (Figure 4). Figures 19a and 19c show GISS-E2.1 model heat uptake above roughly 700 and 2,000 m. (Vertical layering in the GISS and HYCOM oceans is slightly offset from these depths.) Heat uptake within a particular depth range is given by the uptake difference between the two depths. For example, for the E2.1-G NINT f2 ensemble in 2014, uptake between 685 and 1,908 m is the difference between 160 and 210 ZJ: roughly 50 ZJ (Figure 19a). The difference between uptake by the entire column and that part above 1,908 m is about 140 ZJ, which corresponds to the uptake below this depth.  Domingues et al. (2008) by . Upper ocean heat uptake by 1950 at the start of the observational record is not known, so the time average of uptake between 1950 and 2009 is given the same value as the E2.1-H OMA ensemble: only observed and modeled trends are compared. Upper ocean uptake by the E2.1-H ensembles is largest and close to the observed trend; E2.1-G trends are weaker.
The regional distribution of ocean heat anomalies is shown in Figure 20 for the full column as well as the layers above roughly 700 m and below roughly 2,000 m. In the Pacific, heat uptake within E2.1-G NINT f2 is largest above the thermocline of the subtropical gyre where there is downward Ekman pumping (Figure 20c). In the North Atlantic, upper ocean heat uptake is generally positive except at the outlet of the Labrador Sea, a region of deepwater formation that is coincident with an anomalous reduction in surface air temperature (Figure 12a). We interpret cooling in this region as a consequence of weakening of meridional overturning that begins near the end of the historical period (Figure 21), reducing the convergence of ocean heat into regions of deepwater formation. Despite this weakening, overturning continues to deliver heat below 2,000 m as seen by uptake along the deep western boundary currents adjacent to North and South America (Figure 20e). E2.1-G NINT f2 also exhibits large heat uptake below 2,000 m in the Southern Ocean.
E2.1-H NINT f2 shows more extensive warming within the upper ocean (Figure 20d). Atlantic overturning in this model shows almost no weakening during the historical period (Figure 21), consistent with positive heat uptake near the Labrador Sea and the absence of atmospheric cooling (Figures 12d and 20d). Below 2,000 m, anomalous heat is seen along the western boundary of the deep Atlantic, equatorward of the Labrador Sea, but uptake is smaller than in E2.1-G and elsewhere there is little heat in this layer.
The greater heat uptake by E2.1-H and its confinement to the upper ocean is consistent with the model's greater TCR (Table 5). Lerner et al. (2020) show that the mixed layer in E2.1-G is too deep in the North Atlantic, suggesting that there is excessive transport to the deep ocean. The greater fraction of heat storage below 700 m in E2.1-G is consistent with its CMIP5 E2-R predecessor, despite an incorrect implementation of isopycnal diffusion in the earlier model that resulted in excessive downward transport across isopycnals (Miller et al., 2014). The contrasting vertical extent of heat invasion common to both the CMIP5 and CMIP6 generations of GISS ocean models suggests that this error was not central; the contrasting depths of heat uptake reflect a more fundamental difference in the response of each ocean model to radiative forcing.

Sea Level
Sea level rises both through steric and eustatic effects. Steric rise results from ocean expansion, equivalent to a decrease in seawater density, either through warming or freshening. The latter occurs through melting of sea ice, for example. Eustatic sea level rise results from an increase of ocean mass as through the melting of continental ice. Conversely, eustatic fall results from an onshore transport of water vapor and sequestration of precipitation in lakes, soil and snow fields. E2.1-G has a mass-conserving ocean whose sea level trends are diagnosed exactly. The E2.1-H ocean is volume conserving so sea level is computed diagnostically through linearization of the seawater equation of state and the excess of precipitation and runoff over evaporation.
(The E2.1-H eustatic value was incorrectly diagnosed, so only the steric value is available.) Since posting output from the historical ensembles to the CMIP6 archive, we discovered an error in the initialization of lake volume that caused a spurious sea level decline. We corrected the initialization and created an extra (eleventh) member of the E2.1-G NINT f2 ensemble. (Sea level shows little variation among members, so we considered a single corrected realization to be sufficient to diagnose historical changes.) The total change in sea level during the historical period for the corrected model is shown in Figures 22a  and 22b. The model and observed sea level are assumed to have identical averages during their period of overlap, so only changes are comparable. Model sea level is fairly uniform until the early twentieth century; thereafter it rises 26 cm by the end of the historical period. Over the entire historical period, the total model rise is close to the observed rise estimated since 1880 . However, the recent model trend exceeds that computed from a combination of tide gauges and satellite altimetry , and the attribution is different as described below.
The thermosteric contribution to sea level rise is shown in Figures 22c and 22d. Because this is related to thermal changes during the historical period, we assume that the thermosteric trend from the ensembles with incorrect lake initialization remain informative. For E2.1-G, the thermosteric contribution is 2 cm over the historical period; this contribution is 2 times larger for both the E2.1-H NINT f2 and OMA ensembles, a contrast that is consistent with each model's greater surface warming and ocean heat uptake (Figures 11,19a,and 19c). (The E2.1-H value is taken from the total steric value with the assumption that salinity effects are secondary.) The model thermosteric contribution over the entire historical period is comparable to that estimated from observations since 1961 by . The observed thermosteric rise contributes slightly less than half of the total increase during recent decades. Most of the remaining observed rise is eustatic and attributed to melting of continental ice sheets like Greenland and Antarctica along with mountain glaciers.
Land ice mass is fixed in GISS-E2.1, so it cannot contribute to the eustatic rise in Figure 22. In the model, the main contribution to the total is eustatic (Figure 23a), but in contrast to the observations, the source of the eustatic rise is irrigation (Figure 23b). When there is insufficient lake and river water to supply the prescribed irrigation demand, water is drawn from an aquifer of unlimited volume. The aquifer draw increases the mass of total model water and causes a rise in sea level. Over the period 1980-2009, this rise is 7 cm, an order of magnitude larger than the observationally based estimate of Grogan et al. (2017). Figure 23b shows that other contributions to eustatic change like snowmelt in the warming climate or groundwater sequestration would be more important in comparison to a realistic aquifer draw.

Precipitation
The change in global precipitation over the historical period is shown in Figure 24. For all model ensembles, precipitation increases steadily during the final half century of the historical period except in the aftermath of a major volcanic eruption. Any observed trend is obscured by interannual variations in the Global Precipitation and Climatology Project (GPCP) analysis (Adler et al., 2003(Adler et al., , 2018, where extrema track El Niño and La Niña events whose influence is diminished in the models by ensemble averaging. Nonetheless, Adler et al. (2017) find no significant GPCP trend after removing the influence of ENSO and volcanos. This discrepancy may result from a large unforced signal in the remaining variability or limitations of the models. The largest model increase is associated with E2.1-H ensembles that show the greatest tropospheric warming (Figures 11 and 15). The precipitation increase results from offsetting changes by GHGs and aerosols. The former increase tropospheric radiative divergence (balanced mainly by additional latent heating associated with precipitation); the latter dim the surface and reduce evaporation (Allen & Ingram, 2002;Andrews et al., 2009;Miller et al., 2004;Samset et al., 2016). To isolate the effect of GHGs, we regressed precipitation with the temperature increase following an abrupt quadrupling of CO 2 . The initial rapid adjustment of precipitation is negative but followed by a precipitation increase of 2.3% for each degree K of warming. Both the adjustment and the sensitivity of precipitation to warming are three-quarters of the corresponding values for the GISS CMIP5 ensembles. This is also the ratio of Year 2000 GHG forcing between the GISS CMIP6 and CMIP5 configurations ( Figure 4). We have attributed the CMIP6 reduction of GHG forcing in GISS-E2.1 Figure 24. Ensemble and global annual precipitation (×0.1 mm day −1 ) during the historical ensembles (1850-2014). Pink shading marks plus and minus two standard deviations of intra-ensemble variability of the E2.1-G NINT f2 configuration. This variability is smoothed using an 11-year window that damps periods less than a decade. The vertical gray lines mark volcanic eruptions. The GPCP anomaly is constructed to have the same time average as the E2.1-G NINT f2 ensemble between 1979 and 2014.

10.1029/2019MS002034
to its greater pre-industrial opacity to longwave radiation (Figure 7). The increase in opacity and decrease of forcing is also consistent with the reduction in the rapid adjustment of precipitation following the abrupt CO 2 increase and the subsequent increase of precipitation with warming.

Conclusions
We describe the GISS-E2.1 response to forcing during the CMIP6 historical period (1850-2014) using distinct atmospheric physics versions coupled to two different ocean models.
As observed, each model ensemble shows roughly 1 K of warming at the surface during the historical period with cooling in the stratosphere. The choice of ocean model is important to the simulated climate trends.
For the specific period of 1979-2014, surface warming in the E2.1-G ensembles is consistent with observations, but warming in E2.1-H is marginally too rapid. The E2.1-G ensembles warm more slowly by exporting a greater fraction of ocean heat beneath the thermocline, a contrast that was also exhibited by the CMIP5 E2-R and E2-H configurations. While confinement of anomalous heat within the upper ocean in the E2.1-H ensembles is in better agreement with an estimate of OHC based upon historical SST (Zanna et al., 2019), more observations of heat uptake below the thermocline are needed to evaluate the models. Only the E2.1-G ensembles are able to reproduce a distinctive feature of the observed surface temperature: namely, a slight cooling in the North Atlantic that in the ensembles is associated with a recent reduction in the rate of meridional overturning.
Resemblance of the NINT trends to observed values is aided by the calibration of the aerosol indirect effect to −1 W m −2 , consistent with current estimates based upon models constrained by observations (Myhre et al., 2013). However, the OMA model versions warm similarly while calculating the first indirect effect with a physically based parameterization. Preliminary experiments with ModelE3 that include a physically based second indirect effect show that model climate sensitivity is only slightly increased and remains within the canonical range (Sherwood et al., 2020). The dominant indirect contribution to aerosol forcing shows the importance of continued efforts to represent interactions between aerosols and clouds mechanistically.
Warming of the GISS-E2.1 ensembles is reduced compared to the GISS CMIP5 simulations, despite the higher TCR of the former. This reduction is due to smaller total forcing in the GISS-E2.1 configurations that follows the reduction of GHG forcing by one-quarter. We attribute this reduction to greater longwave opacity of the GISS-E2.1 pre-industrial climate, which has higher water vapor concentration and greater cloud longwave forcing than the GISS CMIP5 model (Figure 7). This demonstrates the importance of the unperturbed climate to forcing and forced climate trends. The LW opacity increase and forcing decrease is also related to a lower sensitivity of precipitation to warming in GISS-E2.1 compared to the CMIP5 GISS-E2.
Comparison of multiple model configurations allows the importance of different climate processes to be examined. The NINT f2 ensemble corrects an error in the prescription of volcanic aerosol surface area available for heterogenous chemical reactions. Comparison with the NINT f1 ensembles shows the effect of these reactions upon ozone and the associated temperature variations in the lower stratosphere following a volcanic eruption (Figure 13). Comparison of the eruption response across the historical period demonstrates that whether ozone is destroyed or enhanced by these reactions depends upon the concentration of CFCs.
Climate forcing by irrigation was introduced into GISS-E2.1 after CMIP5. Irrigation is important for simulating regional trends, especially those associated with the Asian Summer Monsoon (Cook et al., 2011;Puma & Cook, 2010;Shukla et al., 2014). Aspects of the irrigation flux are not well-constrained by observations, including its decomposition into contributions from surface reservoirs like lakes as opposed to aquifers that are replenished over geologic durations that are long compared to the historical period. We show that sea level rise in the GISS historical ensembles is comparable to the observed change , but results almost entirely from an unrealistically large draw upon aquifer fossil water. This shows how coupled models can reveal unexpected limitations of atmosphere-only representations, guiding model development and providing indirect observational constraints upon forcing, in this case, constraining irrigation rates with observed sea level changes.
Model comparison to observed climate trends is a first step to understanding these changes and assessing the reliability of predictions of future change. The overall agreement of GISS-E2.1 with observed trends is familiar from evaluation of its predecessors, as is the conclusion that these trends are almost entirely

10.1029/2019MS002034
anthropogenic in origin. This agreement points to larger climate change and disruption in the coming decades, even when assessed by a model with mid-range sensitivity to forcing like GISS-E2.1.

Data Availability Statement
CMIP6 standard variables analyzed in this study are available from the Earth System Grid Federation. Additional variables like MSU temperatures and OHC can be downloaded online (from https://portal.nccs. nasa.gov/GISS_modelE/E2.1). Table 4 provides links to the measurements and retrievals used to evaluate the models.