Present-Day Greenland Ice Sheet Climate and Surface Mass Balance in CESM2

The response of the Greenland Ice Sheet (GrIS) to a warmer climate is uncertain on long time scales. Climate models, such as those participating in the Coupled Model Intercomparison Project phase 6 (CMIP6), are used to assess this uncertainty. The Community Earth System Model version 2.1 (CESM2) is a CMIP6 model capable of running climate simulations with either one-way coupling (fixed ice sheet geometry) or two-way coupling (dynamic geometry) to the GrIS. The model features prognostic snow albedo, online downscaling using elevation classes, and a firn pack to refreeze percolating melt water. Here we evaluate the representation of the GrIS surface energy balance and surface mass balance in CESM2 at 1 ◦ resolution with fixed GrIS geometry. CESM2 agrees closely with ERA-Interim reanalysis data for key controls on GrIS SMB: surface pressure, sea ice extent, 500 hPa geopotential height, wind speed, and 700 hPa air temperature. Cloudsat-CALIPSO data show that supercooled liquid-containing clouds are adequately represented, whereas comparisons to Moderate Resolution Imaging Spectroradiometer and CM SAF Cloud, Albedo, and Surface Radiation data set from Advanced Very High Resolution Radiometer data second edition data suggest that CESM2 underestimates surface albedo. The seasonal cycle and spatial patterns of surface energy balance and surface mass balance components in CESM2 agree well with regional climate model RACMO2.3p2, with GrIS-integrated melt, refreezing, and runoff bracketed by RACMO2 counterparts at 11 and 1 km. Time series of melt, runoff, and SMB show a break point around 1990, similar to RACMO2. These results suggest that GrIS SMB is realistic in CESM2, which adds confidence to coupled ice sheet-climate experiments that aim to assess the GrIS contribution to future sea level rise.


Introduction
The Greenland Ice Sheet (GrIS) is the Earth's second largest freshwater reservoir (after the Antarctic Ice Sheet), with the potential to raise global mean sea level by about 7.4 m were it to melt completely (Morlighem et al., 2017). During 2012-2016, the GrIS lost 247±15 Gt yr −1 or 0.69±0.04 mm sea level equivalent . Mass loss is expected to increase in a warming climate (Church et al., 2013), with implications for global sea level rise, marine biology (e.g., Bhatia et al., 2013), and ocean circulation (Böning et al., 2016;Fichefet et al., 2003;Gerdes et al., 2006;Gillard et al., 2016;Hu et al., 2012). A large uncertainty in future GrIS mass loss is the sensitivity of the surface mass balance (SMB) to atmospheric warming, with estimates ranging from 5 to 13 cm of sea level equivalent in 2100 under a high emissions scenario (Fettweis, Franco, et al., 2013). Accurately representing ice sheet surface processes in global climate models, such as those participating in the Coupled Model Intercomparison Project phase 6 (CMIP6, Eyring et al., 2016), could reduce this uncertainty and improve our understanding of feedbacks, for example, with ocean circulation (Fyke et al., 2018;Little et al., 2016). On centennial timescales, dynamic feedbacks become important, and ice sheet volume and extent must be modeled with a dynamical ice sheet model (Levermann & Winkelmann, 2016;Le clec'h et al., 2019). This is recognized by the Ice Sheet Model Intercomparison Project for CMIP6 (ISMIP6), which provides protocols for coupled ice sheet-climate model experiments (Nowicki et al., 2016). The evolution of a dynamical ice sheet model is sensitive to the applied SMB (Khan et al., 2015), underscoring the need for a realistic representation of ice sheet surface climate and snow/firn properties in climate models.
Calculating SMB directly within the global model offers the advantage of performing long transient coupled ice sheet-climate simulations within a single modeling framework, also in paleoconfigurations with large ice sheet domains (Ziemen et al., 2014). Global models pose challenges in terms of unresolved physical processes, regional biases, and limited spatial resolution (Fyke et al., 2018;Vizcaino, 2014). Nevertheless, some Earth system models (ESMs) now incorporate full energy balance models and multilayer snow models to calculate ice sheet surface melting in a physically realistic way (Alexander et al., 2019;Cullather et al., 2014;Punge et al., 2012;Shannon et al., 2019). As noted by Rae et al. (2012), adequate energy balance schemes are a critical prerequisite for modeling SMB realistically, with more detailed snow schemes agreeing better with observations than simpler models. Furthermore, some ESMs address the challenge of spatial resolution by using elevation tiles or elevation classes (ECs) that resolve SMB at multiple vertical levels within a single grid cell Shannon et al., 2019). The use of ECs is motivated by the observation that the SMB in ablation zones typically varies nonlinearly with height (due, e.g., to the snow-albedo feedback). ECs can be essential for resolving narrow ablation zones in a coarse-resolution model (cf. van Kampenhout et al., 2019, their figure S1).
The Community Earth System Model version 1 (CESM1) was among the first ESMs to produce a realistic present-day GrIS SMB using (1) explicit calculation of snow albedo, (2) ECs, and (3) melt rates calculated through energy balance modeling (Sellevold et al., 2019;Vizcaíno et al., 2013). CESM1 was not used, however, in coupled ice sheet-climate experiments because the required infrastructure changes-specifically, dynamic land cover change with mass and energy conservation-were not ready at the time. CESM1 had a number of physics limitations. For example, snow height was limited to a maximum of 1 m water equivalent (w.e.) which is insufficient to simulate firn and limits the refreezing capacity Vizcaíno et al., 2014). Further, compensating errors were found in vertical gradients of snowfall, melt, and refreezing (Sellevold et al., 2019). CESM version 2 (CESM2) aims to address these shortcomings. Most importantly, dynamic land cover is now operational (Lawrence et al., 2019) allowing two-way coupled ice sheet-climate simulations, for example, in the context of ISMIP6. Further, snowpack/firn physics have been improved, and the maximum firn depth has been increased to 10 m w.e. (van Kampenhout et al., 2017), which is relevant to GrIS SMB projections since firn can store and refreeze surface meltwater .
Here, we assess the skill of CESM2 in simulating GrIS SMB, with the goal of providing confidence in GrIS simulations and suggesting directions for future development. First, we use best-estimate reanalysis data to evaluate large-scale circulation patterns during the late twentieth century . As explained above, consistency with present-day reanalysis data is a relevant goal for global models, since biases may lead to unrealistic SMB (or an SMB that is right for the wrong reasons), which will have implications for SMB projections under external forcing. Second, we use remotely sensed cloud water path observations to evaluate CESM2 clouds over Greenland, since clouds have a strong impact on the radiative budget at the surface (McIlhattan et al., 2017). Of particular importance are liquid-bearing clouds, which are thought to be the leading cause of the extreme GrIS melt event in July 2012 (Bennartz et al., 2013;Van Tricht, Lhermitte, Lenaerts, Gorodetskaya, L'Ecuyer, et al., 2016), as well as other widespread melt events (Cullather & Nowicki, 2018). Third, CESM2 surface albedo is evaluated using a combination of remote sensing data and RCM output. Finally, best-estimate RCM output is used to assess monthly mean CESM2 energy and mass fluxes over the GrIS, with a focus on the seasonal cycle and spatial distribution. In order to account 10.1029/2019JF005318 for the increased (vertical) detail that is obtained through ECs, we downscale CESM2 fluxes offline from each EC to the RCM grid.
The layout of the paper is as follows. Section 2 describes CESM2, with a focus on SMB computations and the EC method, the experimental setup, and the reference data. Section 3 presents the main results, including a detailed model evaluation and a brief time series analysis. Section 4 reflects on the development process toward CESM2 and discusses directions for future development. Conclusions are found in section 5.

Model Description
CESM2 is a climate modeling framework that encompasses the Earth's major physical and biochemical components operating on decadal to centennial time scales, including the atmosphere, oceans, terrestrial systems, sea ice, and land ice. The public release of CESM2.1 in December 2018 included source code, forcing files for CMIP6 Diagnostic, Evaluation and Characterization of Klima experiments, and scientifically validated configuration files. This is the version referred to here as "CESM2". Ocean dynamics and physics are modeled by the Parallel Ocean Program version 2 (updated from Smith et al., 2010) and sea ice dynamics and physics by CICE, the Los Alamos Sea Ice Model (Hunke et al., 2015). The atmospheric component is the Community Atmosphere Model version 6 (CAM6, Gettelman et al., 2019) which features a new subgrid cloud parametrization (CLUBB, ; Bogenschutz et al., 2013), new cloud microphysics (Morrison-Gettelman (MG2), Gettelman & Morrison, 2014;, and a new surface drag parametrization (Beljaars et al., 2004) that replaces the former Turbulent Mountain Stress scheme. The land surface component is the Community Land Model version 5 (CLM5), which contains many updated parametrizations and which introduces dynamic land units-a requirement for coupled ice sheet-climate experiments (Lawrence et al., 2019).
Due to the global nature of CESM2, regional biases will likely remain. Relevant to our study is a bias in CAM6 that causes unrealistically high rainfall over the interior GrIS, similar to CAM4 , and may be due to incorrect cloud liquid-to-rain conversion rates for polar conditions. As rainfall is simulated even at very low (< −20 • C) near-surface temperatures, a bias correction was implemented into CLM to reduce the impact of rain on the mass and energy budget (e.g., on albedo). During a precipitation event, the phase of precipitation is recalculated by CLM, based on a near-surface temperature threshold that depends on the land cover type. Within each CLM grid cell, up to five different land cover types are simulated-natural vegetation, lake, urban, glacier, and crop-that each may be subdivided into multiple columns and patches. For all nonglacier land cover types, all precipitation is assumed to be snow below a near-surface temperature threshold of 0 • C and rain above 2 • C. Between these threshold temperatures, a linear ramp is assumed. For the glacier land cover type, both the lower (−2 • C) and higher threshold (0 • C) are adjusted to allow for more glacier melt in summer. The physical justification for this difference is that atmospheric inversions are more common over glaciers than over other land cover types, such that atmospheric temperatures above the surface layer can above the melting point when temperatures are below freezing near the surface. Any heat released or taken up by the phase change of precipitation is accounted by adjusting the sensible heat flux to the atmosphere, to ensure energy conservation. Note that the phase change is not unidirectional, and there may be grid points or specific events where snow is converted into rain, for example, in the lowest ECs where reference height temperature is higher due to downscaling (see section 2.4).
The last CESM2 component of interest to this study is the Community Ice Sheet Model 2.1 (CISM, Lipscomb et al., 2019), which solves the dynamics and thermodynamics of the ice sheet. CISM incorporates a hierarchy of (parallel) Stokes flow solvers: the shallow ice approximation, the shallow shelf approximation, a depth-integrated higher-order solver, and a 3-D higher-order solver. Parametrizations for basal sliding, iceberg calving, and sub-ice-shelf melt are included as well. For the GrIS, CISM runs on a 4 km polar stereographic (Cartesian) grid on which it receives a downscaled temperature and SMB field from CLM when coupled within CESM2. CISM participated in the initMIP-Greenland study (Goelzer et al., 2018), the ISMIP6 standalone Greenland projections (Goelzer et al., 2020), and the two-way coupled ISMIP6 experiments (Nowicki et al., 2016). In our study, however, CISM exclusively operates in one-way coupled mode, that is, as a diagnostic component.

Surface Energy Balance
Surface energy balance (SEB) calculations over both ice sheets-and including Antarctic ice shelves-are handled by CLM. The SEB equation may be written as follows (Greuell & Konzelmann, 1994): where F net is the net surface energy flux, R net is the net radiative flux, which is the sum of the net shortwave (SW net ) and net longwave (LW net ) fluxes, and SHF and LHF are the turbulent fluxes of sensible and latent heat, respectively, all defined positive toward the surface, that is, downward. The net surface energy flux has two components, the conductive or ground heat flux (GHF) and the melt heat flux (MHF), which in CLM is computed separately for snow (variable FSM) and ice (variable QICE_MELT). The resulting SEB equation reads where GHF is defined positive toward the surface, that is, upward. Strictly speaking, equations (1) and (2) represent the top few centimeters of snow or ice rather than an infinitesimally thin surface, since CLM allows for shortwave radiation penetration and subsurface melt.
The net shortwave flux SW net depends on surface insolation (SW d ) and effective surface albedo ( ): Ice albedo is assumed constant in space and time, and its value was modified in CESM2 to 0.5 in the visible and 0.3 in the near-infrared parts of the spectrum, based on a compilation of Moderate Resolution Imaging Spectroradiometer (MODIS) ice albedo values. (Values in CESM1 were 0.6 and 0.4, respectively.) Snow albedo is solved using the two-stream, multilayer SNow, ICe, and Aerosol Radiation model, which accounts for radiation penetration, snow grain metamorphism, and snow impurities (Flanner & Zender, 2005;Flanner et al., 2007). In CESM1, initial snow grain size was fixed to a constant value of 54.5 μm; this is a poorly constrained parameter that affects the strength of the albedo-melt feedback. In CESM2, initial grain size is a function of temperature, with larger grains at higher temperatures: with T ref the temperature at the atmospheric reference height of 2 m. This choice is informed by a remote sensing study using ICESat that indicates snow grain size ranges from 190 to 300 μm over ice sheets (Yang et al., 2017) and is supported by an independent study suggesting that the original SNow, ICe, and Aerosol Radiation formulation underpredicts grain size by about 160 μm (Sandells et al., 2017).
The net longwave flux LW net is calculated as follows: rad with emissivity (set to 0.97 over snow and ice), Stefan-Boltzmann constant , and radiative temperature T rad . The assumption of gray body radiation with constant emissivity is not necessarily realistic-in particular in the polar regions (Kuo et al., 2018)-and therefore may need to be revised in the future. Finally, the turbulent fluxes SHF and LHF are calculated using Monin-Obukhov similarity theory, with a momentum roughness length of 10 −2 m over ice and 2.4 × 10 −3 m over snow-covered surfaces, which is of the same order of magnitude as in the regional models MAR (Alexander et al., 2019) and RACMO2 (1-5 ×10 −3 m). For further details on the SEB calculation, the reader is referred to the CLM Technical Documentation (https:// escomp.github.io/ctsm-docs/doc/build/html/tech_note/Fluxes/CLM50_Tech_Note_Fluxes.html).

Snow Model and the Computation of SMB
CLM5 incorporates a multilayer snow model with up to 12 layers and a maximum snow/firn depth of H max = 10 m w.e. Snow layers can vary in height and prognostically evolve snow density, temperature, grain size, and impurities (Oleson, 2013). During accumulation events, snow mass is advected downward whenever a layer reaches its maximum height, and any snow in excess of H max is removed from the bottom. Liquid water is allowed to percolate and refreeze, and the irreducible water content for capillary retention is set to 3.3% (Oleson, 2013). Parametrizations for fresh snow density and snow compaction are updated in CESM2 and are described in van Kampenhout et al. (2017). At typical ESM horizontal resolutions of ∼1 • , subgrid snow cover heterogeneity can be large due to subgrid topographic variability. Snow cover fraction is parameterized as a function of snow height and snow density following Swenson and Lawrence (2012), with uniform snow cover assumed when the snow pack exceeds 2 m w.e. in depth.
In most CESM2 configurations, as well as in this study, the dynamical ice sheet model CISM is active only as a diagnostic component, which implies that changes in ice volume are not considered by the coupled model. At its upper surface boundary, CISM is forced by temperature and SMB as calculated in CLM. A common definition of SMB in glaciology reads (e.g., Lenaerts et al., 2019;van den Broeke et al., 2016): That is, SMB equals total precipitation (PR, snowfall plus rainfall) minus runoff (RU), sublimation at the surface (SU sfc ), sublimation of drifting snow (SU ds ), and drifting snow erosion (ER ds ), all expressed in mm w.e. yr −1 . If subsurface processes (such as refreezing) are considered, as is the case here, this definition is formally referred to as climatic mass balance (Cogley et al., 2011). When used locally (i.e., not area integrated), equation (4) describes the specific SMB and in that case accumulation is defined as SMB > 0 and ablation as SMB < 0. We note that drifting snow is not explicitly modeled in CESM2, so that SU ds = ER ds = 0 throughout this paper. Further, the SMB calculated internally in CLM differs from equation (4) altogether, and reads where RU cap is the mass flux due to capping of excess snow, RU ice is runoff due to bare ice melt, and SU ice is sublimation from bare ice. When the snowpack reaches its maximum thickness (H = H max ) and precipitation falls, definition (5) is equivalent to definition (4), since the capping flux will equal the precipitation flux. When the snowpack is absent (H = 0) and ice melts, the two definitions are also equivalent, since the runoff flux RU equals the bare ice runoff flux RU ice . Differences occur when the snowpack gains or loses mass through accumulation or ablation. These changes are not taken into account by the CESM2 definition, which therefore yields SMB = 0 in those cases. The reason for the distinct definition (5) in CLM is technical in nature and relates to two-way coupled configurations with an active ice sheet model. One drawback of the CLM definition is that it prevents one from forcing an ice sheet model with a realistic SMB on subannual time scales, as pointed out by Lipscomb et al. (2013). Future work is planned to adapt the computed SMB to include changes in snow mass, in agreement with the usual definition (4). We remark that, in the current study, the difference in SMB definition is largely irrelevant since most analysis is carried out for individual SMB components.

Downscaling With ECs: Online and Offline
Over glaciated surfaces, CLM employs a second method-ECs-to account for subgrid topographic variability, in addition to the snow cover fraction parametrization described in the previous section. Within the glacier land cover type, the typical number of ECs is 10, each associated with a certain elevation bin and allowed to evolve independently of one another. The weight assigned to each EC is determined by the topography of an external high-resolution topographic map. Over Greenland, the elevation is provided by CISM at a resolution of 4 km. In one-way coupled runs (i.e., the ice sheet is not evolving), CISM topography is assumed constant at observed values according to Morlighem et al. (2014) and is integrated into the CAM orography for consistency. ECs with zero weight are marked "virtual" and do not contribute to the grid cell mean value of any state or flux. At 1 • resolution (finite volume), the mean EC topographic height over the GrIS and peripheral glaciers and ice caps (GICs) is 2,085 m. By contrast, mean CAM surface height is 1,871 m over the same area, which shows that the use of ECs over Greenland increases the mean height by 214 m and thus cools the climate with respect to a simulation without ECs. This cooling might have implications for regional climate, such as increased sea ice cover in the North Atlantic (Sellevold et al., 2019).
Each EC, implemented as a CLM column, maintains independent elevation, soil/ice layers, snow layers, SEB, and SMB. Atmospheric temperature is downscaled to ECs using a constant lapse rate of 6 • C km −1 , which falls into the range suggested by observational and model-based studies (Erokhina et al., 2017;Fausto et al., 2009;Hanna et al., 2005), but does not capture the seasonal cycle in lapse rate indicated by those same studies. Near-surface specific humidity is also downscaled, assuming that relative humidity is constant with elevation. New in CESM2 is an EC correction for LW d with a linear lapse rate of 32 W m −2 km −1 , a value inferred from Van Tricht, Lhermitte, Lenaerts, Gorodetskaya, van Lipzig, et al. (2016) (their Figure 6). As no radiation should be added to (or removed from) the CLM grid cell mean, LW d is normalized after this downscaling. Also, to prevent extreme outcomes, the elevation correction is restricted to a maximum relative change of 50%, that is, LW d, downscaled is bounded by [0.5, 1.5] · LW d, atm .
Each EC resolves SMB according to equation (5), resulting in a vertically resolved SMB that is subsequently downscaled by the CESM2 coupler to the high-resolution ice sheet grid by applying bilinear and linear interpolation in the horizontal and vertical directions, respectively. Along with the downscaled ice surface temperature, this downscaled SMB serves as a boundary condition to CISM. Note that the downscaled fields sometimes contain artifacts due to grid imprinting, examples of which we briefly discuss in the supporting information (Text S1). In case of two-way coupling, a global SMB normalization is applied to ensure mass conservation (for details, see https://escomp.github.io/cism-docs/cism-in-cesm/versions/release-cesm2.0/ html/index.html). Energy conservation is not an issue as there is no energy exchange between CLM and CISM. The EC infrastructure is not specific to CISM; any ice sheet model could potentially receive surface forcing through the CESM2 coupler infrastructure.
Alongside this online downscaling, EC-indexed variables in CLM can be analyzed directly, for example, for evaluation purposes. In this study, EC-indexed SEB and SMB components are downscaled offline to an 11 km RCM grid for comparison to RCM output. Our offline downscaling follows a two-step procedure, similar to the online downscaling. First, EC topography and variables of interest are bilinearly interpolated to the target grid. Then, the variable of interest is vertically downscaled toward the target elevation by using the 3-D fields from the previous steps. No corrections are made to preserve area-integrated mass or energy, so these offline fluxes may differ from fluxes that were downscaled online.

Description of the Experiments
The main data set used in this paper originates from historical (1850-2014) coupled atmosphere-ocean experiments that were carried out using the finite-volume dynamical core at ∼ 1 • horizontal resolution (0.9 • × 1.25 • ) for CMIP6. To limit the effect of decadal variability in the freely evolving CESM2 data-beyond the standard practice of taking 30-year climatologies-we calculate climatologies using a composite of six historical members named HIST-01 to HIST-06. Thus, over the reference period 1961-1990, a composite climatology is computed using a total of 180 model years. Climatologies are computed using grid cell mean output variables in both CAM and CLM. Whereas standard output is used for CAM variables, we only consider CLM variables that are specific to the glaciated part of the grid cell (i.e., variables that are suffixed _ICE). The three-dimensional, EC-indexed CLM variables are not available in these six historical experiments since they are nonstandard output. Hence, offline-downscaled data (section 2.4) were generated from another experiment: a dedicated CESM2 historical simulation (HIST-EC) that was branched off HIST-03 in the year 1950. The climatology from this experiment resembles the composite climatology but follows a unique climate trajectory.

Reference Data
Since the CESM2 climate evolves freely, it does not correspond to actual historical weather and climate nor does it necessarily reproduce the correct phasing in modes of interannual and decadal climate variability (Hanna et al., 2018). Polar RCMs forced by reanalyses, on the other hand, do reproduce historical meteorological observations and SMB measurements across the GrIS with increasingly good agreement (Fettweis et al., 2017;Langen et al., 2017;Niwano et al., 2018;Noël et al., 2018;van den Broeke et al., 2016). In this study we choose to evaluate mean CESM2 climate and SMB by comparison to polar RCM output from the Royal Netherlands Meteorological Institute (KNMI) regional atmospheric climate model version 2.3p2 (RACMO2 hereafter), which has been extensively validated over the GrIS using in situ measurements (Noël et al., 2018). Mean climatological data are compared over 1961-1990, a period during which Greenland climate was relatively stable with no obvious trends in individual SMB components (Fettweis, Franco, et al., 2013;Mouginot et al., 2019;van den Broeke et al., 2016). Along with the native 11 km RACMO2 data, we use statistically downscaled data at 1 km, which are more accurate in the GrIS ablation zones . When compared to 182 SMB measurements from the GrIS accumulation zone, RACMO2 at 11 km yields an r 2 of 0.85, bias of −21.8 mm yr −1 , and root-mean-square error (RMSE) of 71.7 mm yr −1 (Noël et al., 2018, their Figure  11a), whereas a comparison at 1 km against 1,073 ablation zone SMB measurements yields an r 2 of 0.72, bias of 120 mm yr −1 , and RMSE of 870 mm yr −1 (Noël et al., 2018, their Figure 11c).

10.1029/2019JF005318
The large-scale climate in CESM2 is evaluated using ERA-Interim reanalysis data (Dee et al., 2011), which are the same data used to force RACMO2. ERA-Interim data are available only after 1979, so the period of evaluation is changed to 1979-1999 with no impact on the conclusions (not shown).
Finally, CESM2 surface albedo is compared to two independent satellite-derived albedo products. The first product is the CM SAF Cloud, Albedo, and Surface Radiation data set from Advanced Very High Resolution Radiometer data second edition (CLARA-A2, Karlsson et al., 2017), which originates from sensors aboard polar-orbiting, operational meteorological satellites. The second product is the MODIS (16 day albedo version 6 product (MCD43A3v6), which was calculated by inverting surface reflectance measurements in MODIS instruments aboard NASA's Terra and Aqua satellites (Stroeve et al., 2013). In contrast to Alexander et al. (2014) we use the integrated diffuse "white-sky" MODIS albedo rather than the direct beam "black-sky" albedo, since white-sky albedo is not dependent on the solar zenith angle and therefore is more fit for comparisons to models. Monthly climatologies are calculated for each product by averaging monthly 50th percentile (median) albedo over the periods 2000-2017 (MODIS) and 2000-2015 (CLARA-A2). Figure 1 compares CESM2 surface pressure, geopotential height at 500 hPa, and sea ice cover to data-assimilated equivalents in ERA-Interim. The dominant feature in both seasons is the polar vortex-a permanent low-pressure system located around the North Pole-which is stronger in winter than in summer (Figures 1a and 1d). The prevalent atmospheric flow over Greenland is southwesterly and is slightly more zonal in summer than in winter. Most moisture is delivered to the ice sheet by storms that originate in the North Atlantic (Sodemann et al., 2008), as evidenced by the climatological Icelandic Low, centered southwest of Iceland. Sea ice extent is typically greatest in March, extending into Baffin Bay and the Labrador Sea, and least in September, when sea ice is mainly confined to the Arctic basin. CESM2 captures these general features of the regional climate in both winter and summer (Figures 1b and 1e). During winter, surface pressure (white contours) is lower in CESM2 (by ∼5 hPa) across the entire Arctic domain. In concert, the polar vortex extends further south around Iceland, leading to a negative geopotential height anomaly up to −4 decameter (dam, Figure 1c), which is not significant. In summer, the polar vortex is slightly weaker in CESM2, with its central geopotential height overestimated by 2 dams (Figure 1f), although the southward expansion of the polar vortex appears exaggerated. September sea ice extent is slightly underestimated in CESM2 (Figure 1e), but there is good agreement near Greenland, suggesting that substantial GrIS SMB biases due to sea ice biases are unlikely (Noël et al., 2014;Stroeve et al., 2017).

Large-Scale Circulation: Comparison to ERA-Interim
The 500 hPa zonal wind speed ( Figure S1) generally decreases from midlatitudes, where it is driven by the polar jet stream, toward the central Arctic, where meridional pressure differences are smaller (Figures S1a and S1d). CESM2 captures these general features of the zonal wind pattern (Figures S1b and S1e). During summer, there are significant anomalies in CESM2, with weaker winds across the High Arctic and stronger winds outside the Arctic, likely linked to the more widespread polar vortex in that season (Figure 1e). Zonal wind speeds are lower than observed over northern Greenland in both seasons. There are few significant differences in the 500 hPa meridional wind speed ( Figure S2) and none over Greenland. Figure S3 shows 700 hPa temperature, a variable that is strongly tied to Greenland melt and runoff (Fettweis, Franco, et al., 2013;Fettweis, Hanna, et al., 2013). Again, CESM2 succeeds in reproducing the spatial patterns and magnitude of the seasonal cycle in ERA-Interim ( Figures S3a, S3b, S3d, and S3e). In wintertime, however, CESM2 has a widespread warm bias (1-2 • C) across the Arctic and over Greenland ( Figure S3c). During summertime, this warm bias is confined to the central Arctic Ocean, with cold anomalies of ∼1 • C over Eurasia and the Canadian Arctic. We hypothesize that these cold anomalies are a thermal response to excessive summer snow cover (not shown), and we note that the negative bias does not extend over most of Greenland.  Figure 2 compares modeled cloud LWP and IWP to Cloudsat-CALIPSO and indicates that CESM2 simulates a substantial amount of liquid cloud throughout the year (Figure 2a). Integrated over the GrIS, the mean annual LWP is 29.9 g m −2 , compared to 20.2 g m −2 in CloudSat-CALIPSO, an overestimation which is most pronounced in April-May and July-November. The overestimation is most pronounced over the southern part of the ice sheet, due in part to the crude representation of topography  and in part to a general overestimation of liquid cloud in the North Atlantic ( Figure 2a). IWP, on the other hand, is underestimated in CESM2 compared to the observations (Figure 2b). CESM2 has few ice clouds over the GrIS throughout the year, with an annual mean IWP of 8.7 g m −2 compared to 37.2 g m −2 in CloudSat-CALIPSO. For the SEB and SMB, this persistent underestimation in IWP is deemed of secondary importance, given the limited sensitivity of longwave radiation to ice clouds compared to liquid clouds (Morrison et al., 2012). Figure 3a shows near-surface wind speed U 10m averaged over the GrIS in CESM2 and RACMO2. CESM2 reproduces the seasonal cycle of U 10m , although wind speed is underestimated throughout the year (−1.2 m s −1 ). The wind speed bias is attributed to the coarse spatial resolution in CESM2-rather than its physics-as CAM has been found to better resolve strong katabatic flow over the steep ice sheet slopes at higher spatial resolution . CESM2 captures the general spatial pattern of U 10m , with wind speeds increasing between the GrIS interior and the margins (Figure 4a). Over the marginal tundra, where topographic gradients are smaller and the katabatic winds break down (Figure 4a), CESM2 simulates low wind speeds, a result that agrees with RACMO2.

Near-Surface Climate: Comparison to RACMO2
DJF T 2m ( • K), (d) DJF near-surface temperature gradient ( • K), T 2m minus radiative skin temperature T skin . CESM2 data stem from a single historical member (HIST-EC) over the period 1961-1990, the same period over which the RACMO2 data are averaged. CESM2 data in the second column have been bilinearly downscaled to the 11 km RACMO2 grid using EC output over the glacier land cover type (section 2.4). Model topography is shown at 500 m intervals (first row, dashed gray lines). Note. Listed are the spatial correlation coefficient r 2 (unitless), the mean bias, and root-mean-square error (RMSE). Rows labeled "composite" represent area-integrated 1 • grid cell averages from a composite of 180 model years (section 2.5) which are compared against area-integrated RACMO2 output. Rows labeled "HIST-EC" represent the HIST-EC experiment, of which r 2 , bias in parentheses, and RMSE were calculated using offline downscaled EC output at 11 km (section 2.4). The area of the contiguous GrIS is 1,693,317 km 2 at 11 km resolution, and SEB fluxes are defined positive toward the surface. Figure 3b shows GrIS-mean 2-m temperature T 2m in CESM2 and RACMO2 and indicates good agreement between the two models. The simulated December-February (DJF) T 2m in CESM2 is −30.4 • C, which is 0.3 • C lower than RACMO2, whereas the mean June-August (JJA) average is −7.2 • C, which is 0.4 • C higher (Table 1). However, no correction has been applied for topographic height, which may be relevant since T 2m depends strongly on elevation. Mean GrIS topography in CESM2 is 2,156 m (CLM variable TOPO_COL), which exceeds that of RACMO2 (2,119 m) by 37 m, because of the subgrid tiling with ECs in CESM2 (see section 2.4). This suggests that a positive correction of +0.22 • C should be applied to CESM2 T 2m to account for this difference in elevation, assuming a lapse rate of 6 • C km −1 .
A map of downscaled T 2m during JJA is shown in Figure 4b and compared to RACMO2. The spatial correlation between the two models is high over the GrIS (r 2 = 0.97, Table 1). CESM2 is slightly warmer than RACMO2 in the interior, and colder near the margins, with the largest absolute differences along the northern and eastern margins (Figure 4b). Our current understanding is that CESM2 fails to simulate the tundra microclimate well and allows perennial snow to build up, leading to a permanent summer cold bias ( Figure  S4) which may feed back to the ice sheet. Although RACMO2 predicts a snow-free tundra in summer, this may be for the wrong reasons. RACMO2 uses the simplified surface scheme of ERA-Interim (Dutra et al., 2010) over bare land grid cells, with a single snow layer that does not allow for meltwater percolation or refreezing and a simplified albedo formulation (Ettema et al., 2010). In CESM2, a multilayer snow model is used over both tundra and ice sheet, an approach which is more physically realistic but also prone to biases.  Figure 4c shows T 2m during DJF. There are strong positive anomalies relative to RACMO2 in the interior, especially toward the north where anomalies exceed 3 • C, whereas the margins have negative anomalies (< −3 • C locally). The overall RMSE is 2.3 • C ( Table 1). The positive anomalies could be induced by large-scale circulation, as CESM2 is slightly warmer at 700 hPa than ERA-Interim (Figures S3a and S3b) or could relate to clouds. The negative T 2m anomalies near the margins are probably caused by the coarse 1 • resolution in CESM2, which mixes in cold air prevailing over the tundra (leftmost panel in Figure 4c). Indeed, DJF T skin has much less widespread negative anomalies ( Figure S5 in the supporting information). The cold tundra air at reference height may in turn be explained by the poor representation of strong inversions in stable boundary layers in CESM2, which are ubiquitous during the polar night over flat surfaces (e.g., Vignon et al., 2018). As a proxy for the near-surface inversion strength, Figure 4d shows T 2m -T skin . Over the principally flat tundra surface, RACMO2 simulates strong inversions (>10 • C) which CESM2 does not capture, presumably due to its limited vertical resolution. The midpoint of the lowest atmospheric layer in CAM6 lies at 993 hPa with a reference pressure of 1,000 hPa, that is, about 60 m above the surface.

Albedo and SEB Components
Next, we evaluate GrIS surface albedo and SEB components (equation (2)) with a focus on summer, when the majority of meltwater is produced. Figure 5 shows July surface albedo in CESM2, MODIS, CLARA-A2, and RACMO2. We expect higher albedo in the models since the satellite products do not account for variations in zenith angles and cloudy conditions. This is true for RACMO2, which shows higher surface albedo over the majority of the GrIS, with an average anomaly of 0.06 relative to MODIS (Figure 5a). CESM2, on the other hand, simulates surface albedo which is not substantially higher than the remote sensing products and simulates a lower albedo than RACMO2 year-round ( Figure 3c). The correlation in the high-albedo accumulation zone is fairly good, as illustrated by Figure 5b, but in the lower-albedo ablation zone there is 10.1029/2019JF005318 Figure 6. Comparison of CESM2 and RACMO2 radiative surface energy fluxes in JJA, (a) downwelling shortwave radiation, (b) downwelling longwave radiation, (c) net shortwave radiation, (d) net longwave radiation. Data as in Figure 4. a wide spread, suggesting that RACMO2 and CESM2 locate ablation zones differently. The overall spatial correlation is r 2 = 0.77 in JJA (Table 1). The high albedo values over the northern tundra regions are due to the excessive tundra snow cover in CESM2, as discussed in section 3.3.
The discrepancy in albedo explains why GrIS-wide SW net is higher in CESM2 than RACMO2, even though summer insolation is slightly lower in CESM2 (Figures 3d and 3e). Averaged over JJA, the ice sheet surface in CESM2 receives 6 W m −2 less shortwave radiation, yet absorbs 8 W m −2 more (Table 1). Figure 6a shows that CESM2 simulates less insolation around the ice sheet margins, where ablation zones are located, and across the southern dome, with local differences exceeding −25 W m −2 . On the other hand, CESM2 simulates more insolation across the northern interior (2-10 W m −2 ), and these anomalies are explained by cloud frequency and cloud optical thickness. CESM2 has different cloud microphysics than RACMO2 and runs at a coarser spatial resolution which spatially smooths orographic uplift, condensation, and therefore cloud formation. Indeed, CAM cloud cover over the southern dome is substantially decreased at higher spatial resolutions . At the same time, it should be noted that the physics of polar cloud formation is notoriously difficult to model. Even regional models struggle to accurately simulate SW d , as indicated by a recent comparison of RACMO2.3p2 data to 42,456 daily in situ measurements which resulted in an RMSE of 27 W −2 and a bias of 3.8 W m −2 (Noël et al., 2018).
Due to its lower albedo, CESM2 simulates greater SW net across most of the island (Figure 6b). Notable exceptions include the far south, where insolation is substantially lower (Figure 6a), and some ablation areas such as the northeastern GrIS, where CESM2 does not expose enough bare ice. The opposite can also be observed. For instance, large SW net anomalies exceeding 20 W m −2 along the northwestern margin indicate ablation zones in CESM2 which are not present in RACMO2.
Figures 3f-3h shows the GrIS-averaged seasonal cycle in LW d , LW u , and LW net in the CESM2 composite and RACMO2, generally indicating good agreement. One striking difference is the lower LW net in CESM2 during winter (−7 W m −2 ), which is matched by a similar anomaly in LW d (−7 W m −2 , Table 1). Figure 6c shows the spatial distribution of LW d in HIST-EC. As a result of longwave radiation EC corrections (section 2.4), strong gradients can be observed near the margins in the downscaled CESM2 product. Overall, the JJA spatial correlation to RACMO2 is much higher (r 2 = 0.93) than that in SW d (r 2 = 0.50, Table 1), which may be partly attributed to the longwave downscaling. Further, the LW d anomaly pattern is similar to SW d in reverse (Figure 6c) which suggests that these differences are driven by the same mechanism. Indeed, whereas clouds reflect and scatter solar radiation back to space, and thus have a cooling effect in the shortwave part of the spectrum, they absorb thermal radiation and have a warming effect in the longwave part. An obvious example is the southern GrIS, where an excess cloud cover leads to both a negative anomaly in SW d ( Figure 6a) and a positive anomaly in LW d (Figure 6c).
Adding the shortwave and longwave radiation, we find that CESM2 simulates a larger net radiative flux in summer (+7.5 W m −2 , Table 1) almost everywhere across the GrIS (Figure 6d). In most locations, R net anomalies are dominated by positive SW net , except in the south, where the longwave anomalies are more pronounced (Figures 6b and 6c). Negative R net anomalies are found in some ablation areas, notably the northeastern GrIS (Figure 6d). The spatial correlations of SW net , LW net , and R net are r 2 = 0.71, r 2 = 0.27, and r 2 = 0.75, respectively ( Table 1). The low spatial correlation of LW net is attributed to its weaker dependency on elevation and latitude.
The seasonal cycles in the nonradiative SEB terms (SHF, LHF, GHF, and MHF) are averaged over the GrIS and shown in Figures 3i-3l. The seasonal cycles of these fluxes are generally similar between the two models, but some differences remain. For instance, SHF is higher in CESM2 during DJF (+6 W m −2 ) which we interpret as a compensating effect for the lower R net in these months (−7 W m −2 , Table 1). The radiative deficit may also explain the slightly elevated GHF in winter (+1 W m −2 ), indicating more heat conduction toward the surface in CESM2. In contrast, a radiative surplus in CESM2 during JJA (+8 W m −2 ) likely underpins a lower SHF (−3 W m −2 ), LHF (−3 W m −2 ), and GHF (−1 W m −2 , Table 1). Spatially, this can be seen when comparing the positive R net anomalies (Figure 6d) to the negative anomalies in SHF, LHF, and GHF (Figures 7a-7c) and observing a large overlap in their areas. This serves as a reminder that all nonradiative heat fluxes are linked to radiation through the SEB and that we should be cautious in drawing bold conclusions from nonradiative SEB components, since cloud-induced radiation biases likely play a role.

10.1029/2019JF005318
There are several more remarks to make about Figure 7a. For instance, the most pronounced anomalies in SHF are near the margins, where the negative wind speed bias is largest (Figure 4a), weakening turbulent heat exchange. Still, EC downscaling is enhancing SHF along the ice sheet margins because of the increased temperature gradients between atmosphere and surface in low-lying regions (section 2.3). This is most apparent in the south and west, where SHF exceeds 10 W m −2 locally (Figure 7a), and less so in the north and east, where wind speeds are lowest ( Figure S6a in the supporting information). Also, bare ice exposure is less frequent in the north and east, thus limiting the inversion strength as suggested by Figure  S6d. Finally, the SHF is negative in southern Greenland (Figure 7a, first two panels), indicating that heat is extracted from the surface by SHF, probably as a result of the radiative surplus (LW d mainly). The overall spatial correlation of JJA SHF to RACMO2 is r 2 = 0.44 (Table 1). Figure 7b shows that CESM2 simulates more negative LHF (i.e., more sublimation) than RACMO2 across most of the GrIS, which we link to the radiative surplus in our model. Very near the ice sheet margin, narrow bands are found where the absolute LHF flux becomes positive that are not present in RACMO2. These positive LHF anomalies are explained by the EC downscaling procedure in which the relative humidity is kept constant with height. At low elevations, specific humidity is enhanced which leads to a weaker humidity gradient to the ice surface and therefore reduced sublimation. Positive LHF (i.e., condensation) in the GrIS ablation zone is a common feature for the lowest parts of the ablation zone in West Greenland during summer (van den Broeke et al., 2009), so we consider this plausible, even though not simulated by RACMO2. The spatial correlation of JJA LHF is r 2 = 0.38 (Table 1).
Another interesting finding from Figures 7a and 7b is that SHF and LHF in RACMO2 are both strongly negative over the marginal tundras, indicating convection and evaporation over the dark tundra surface during summer. This is to some extent simulated by CESM2, except in the north where its turbulent fluxes are weak (Figures 7a and 7b). We hypothesize that this relates to tundra snow cover not melting away early enough in the season, if at all. Also, we recall that RACMO2 adopts a simplified snow model over its tundra grid cells and may therefore not be entirely representative either.
Negative GHF in Figure 7c indicates that heat is conducted from the surface downward, which occurs when the surface is warmer than layers below. Positive GHF indicates heat conduction toward the surface, which can occur at night or after refreezing events. Indeed, locations with positive GHF typically have high refreezing rates (Figure 7e). Overall, the sign of summer GHF is well captured by CESM2, although the flux is weaker than in RACMO2, especially in the refreezing zones in the south and along the northern margins, suggesting that subsurface temperature gradients are smaller in CESM2. Negative GHF anomalies found in the GrIS interior can be explained by the radiation surplus in CESM2. The spatial correlation of JJA GHF is r 2 = 0.67 (Table 1).
The seasonal cycle of MHF in Figure 3l shows good agreement of CESM2 and RACMO2 in GrIS-averaged melt, with a mean JJA bias of 1 W m −2 (Table 1). Figure 7d shows that most of the positive MHF anomalies occur along the western margin. In some places, such as in the northwest, these anomalies can be explained by the coarse atmospheric resolution and the lack of orographically forced snowfall in CESM2, resulting in spurious ablation zones . On the other hand, RACMO2 at 11 km can underestimate melt with respect to observations (Hermann et al., 2018), with studies suggesting that statistical downscaling is needed to resolve narrow ablation zones (Noël et al., 2018). Thus, the fact that CESM2 resolves some narrow ablation zones which are missing in RACMO2 at 11 km is not necessarily unrealistic. Positive MHF anomalies away from the margin, and thus less impacted by EC downscaling, may be explained by the radiation surplus in CESM2. However, most of this additional meltwater refreezes (Figure 7e).
Negative MHF anomalies are found along the northern and eastern margins of the GrIS (Figure 7d), consistent with the underestimation of R net in these areas (Figure 6d). These differences, however, may not be purely radiation driven. Instead, overestimated snowfall or the lack of explicit erosion/sublimation by drifting snow in CESM2 could result in an albedo that is too high. Overall, the spatial correlation in MHF between the two models is high (r 2 = 0.88).
One caveat in these results is that offline downscaling to 11 km appears to increase the GrIS-integrated MHF. To be precise, the mean JJA bias in HIST-EC increases from 0.8 to 1.6 W m −2 when going from online  Figure 4. Note that all color scales are nonlinear and that those in (e) differ from the rest. Listed statistics were calculated from HIST-EC output downscaled to 11 km.  -1990 1961-1990 1961-1990 (180) Area ( EC downscaling to offline EC downscaling (Table 1). One likely explanation is that the two methods use different topography.

SMB Components: Comparison to RACMO2 at 11 km
The spatial distribution of GrIS snowfall (the main mode of mass gain, cf. Equation (4)) is captured reasonably well by CESM2 (Figure 8a), notably the positive north-south gradient and the high accumulation area in the southeast. Small-scale orographically driven features, however, are missing due to the coarser resolution.
Overall, CESM2 simulates more snowfall than RACMO2 across large parts of the GrIS interior ( Figure 8a). Positive biases exceeding 500 mm yr −1 are found over the southern dome and the high accumulation areas in the southeast, as a result of the coarse resolution and weak topographic gradients . At a higher spatial resolution, snow would fall closer to the coast, which explains why CESM2 has a negative snowfall bias along the southeastern margin. Integrated over the GrIS, CESM2 simulates 705 ± 67 Gt yr −1 of snowfall during the late twentieth century, about 9% more than RACMO2 (Table 2).
Rainfall is shown in Figure 8b. Comparing the first panel (CESM2 at 1 • ) and the second (CESM2 at 11 km), it is clear that the phase repartitioning scheme (section 2.1) removes most of the rain that CAM simulates over the GrIS interior. Still, rainfall at the surface is overestimated in CESM2 (59 ± 14 Gt yr −1 ) compared to RACMO2 (19-33 Gt, Table 2). The largest rainfall anomalies (>300 mm yr −1 ) are found along the southeastern and southwestern margin ( Figure 8b) and may in fact be exaggerated by the repartitioning scheme. That is, snow may be converted to rain here, based on downscaled temperature. Figure 8c shows annual mean melt (in mm yr −1 ). Since most melt takes place during summer, the results in Figure 8c resemble the JJA MHF (in W m −2 , Figure 8e) and are therefore not discussed in detail. The total GrIS-integrated melt is 367 ± 74 Gt yr −1 during the late twentieth century (Table 2) and is partitioned between ∼1/3 ice melt (128 Gt) and ∼2/3 snow melt (239 Gt). CESM2 melt is bracketed by the RACMO2 values at 11 km (324 Gt) and 1 km (462 Gt), indicating good agreement. Part of the liquid water from melt and rain is refrozen by CESM2, totalling 198 ± 41 Gt yr −1 over the GrIS (Table 2). This estimate is realistic, given the RACMO2 values of 157 and 249 Gt. However, refreezing in CESM2 has different spatial features than RACMO2 (r 2 = 0.66). For example, there is a pronounced maximum along the western margin that is absent in RACMO2 (Figure 7e). Also, CESM2 misses the strong refreezing zone in the southern and southeastern GrIS, likely as a result of lower snowfall (Figure 8a). Figure 8d shows runoff in CESM2 and RACMO2 (r 2 = 0.74). Naturally, the runoff patterns are similar to patterns of melt (Figure 8c), but there are differences. For instance, strong positive runoff anomalies are found along the southeast margin, where CESM2 simulates less refreezing than RACMO2. Integrated over the GrIS, CESM2 simulates 224 ± 45 Gt yr −1 of runoff, a value that again is bracketed by RACMO2 at 11 km (184 Gt) and 1 km (260 Gt, Table 2). . Stippling denotes differences that are not significant (t test, p < 0.05). Figure S8 is the same figure but including GICs and floating glacier tongues. Figure 8e shows the mean annual evaporation/sublimation flux. The downscaled sublimation field has the same general pattern as RACMO2 (r 2 = 0.56). (Sublimation is not a separate output variable of CLM, so it is assumed that over glaciers, 100% of the evaporation/sublimation flux represents sublimation. This is a fair assumption, given that surface water ponding, forming supraglacial lakes, is not modeled over the glacier land cover type.) Due to coarser resolution, however, the CESM2 pattern is smoother than that of RACMO2, which has an irregular, incised pattern near the margins as a result of drifting snow sublimation (a process not included in CESM2). The difference map in Figure 8e shows that CESM2 simulates less sublimation in low-lying regions, which is plausible as discussed previously in section 3.4. Integrated over the GrIS, the CESM2 sublimation rate is 32 ± 3 Gt yr −1 , which compares well with the RACMO2 estimate of 31 Gt yr −1 (Table 2). However, there is evidence of compensating errors, with too strong sublimation during summer and net riming during winter (Figure 3j).

SMB: Comparison to RACMO2 at 1 km
So far, spatial comparisons to RACMO2 have been made at the native RACMO2 resolution of 11 km, the resolution at which all SEB and SMB fluxes are available. Noël et al. (2016) introduced statistical downscaling of mass fluxes to 1 km, demonstrating improved skill against observations. Figure 9 compares CESM2 SMB with RACMO2 at 1 km, using offline EC downscaling (section 2.4). Overall, CESM2 captures the main spatial features of GrIS SMB (r 2 = 0.61), with narrow ablation zones near the margins, an even narrower equilibrium zone where SMB ≈ 0, and the accumulation zone in the interior. Many of the SMB anomalies in the southern GrIS are explained by resolution-related differences in snowfall (cf. Figure 8a). The ablation zones in the west are generally well positioned but are too wide toward the north because of a lack of orographic snowfall. In contrast, ablation zones in the northern and eastern parts of the GrIS are less extensive in CESM2. Factors that likely play a role here are (1) overestimation of snowfall (Figure 8a), (2) weaker SHF in CESM2 (Figure 7a), caused in part by the inability to resolve summer tundra temperatures (Figure 4b), and (3) the missing process of drifting snow erosion and sublimation in CESM2. As a result, the overall ablation extent in CESM2 (7.5%) is less than that of RACMO2 (8.4% at 1 km, Table 2). Integrated SMB in CESM2 is 508 ± 73 Gt yr −1 , which exceeds RACMO2 by 61 Gt at 11 km and by 115 Gt at 1 km (Table 2). Figure 10 shows time series of precipitation, melt, refreezing, runoff, and SMB for all historical CESM2 members analyzed in this paper. Before 1990, there are no discernible trends in any component. Mean SMB during 1850-1990 is 503 ± 78 Gt yr −1 close to the value of 508 Gt yr −1 in Table 2. Good matches are also found for the individual SMB components in Table 2, suggesting that our period of analysis ) is representative of the entire historical period up to ∼1990. Around 1990, however, there is a break point-an abrupt transition from zero trend to a strong trend-in melt, refreezing, runoff, and SMB, driven by increased melt (Figure 10). For SMB, this break point occurs in 1993, with a 95% confidence interval ranging from 1985 to 2001, using the method of Muggeo (2003Muggeo ( , 2017. The timing of the break point is consistent with reanalysis-based methods, which estimate that SMB started to decrease in the early to mid-1990s (Fettweis, Franco, et al., 2013;van den Broeke et al., 2016), as illustrated by the superimposed RACMO2 1 km data in Figure 10. Trends in individual SMB components, however, are not always consistent with RACMO2. For example, the trend in CESM2 precipitation after 1993 is +1.1 Gt yr −2 , whereas RACMO2 simulates a decrease: −1.7 Gt yr −2 . The melt trend is comparable in the two models (+8.8 vs. +9.9 Gt yr −2 ), but CESM simulates a higher increase in refreezing (+4.7 vs. +2.5 Gt yr −2 ). As a result, the runoff trend is lower in CESM2 (+5.0 vs. +7.4 Gt yr −2 ), and the decrease in SMB is less than half of that in RACMO2 (−3.9 vs. −9.1 Gt yr −2 ). These trends can be different across individual ensemble members due to internal variability and may not always be significant.

Time Series of SMB Components, 1850-2014
A recent study by Noël et al. (2019) forced RACMO2 using 6-hourly output from CESM2 and found a stronger runoff increase at 1 km in the CESM2-forced run (+138 Gt yr −1 during 1991-2012 relative to  than in the standard ERA-Interim-forced run (+100 Gt yr −1 ). In contrast, the runoff trend diagnosed directly from CESM2 is smaller than that of the ERA-Interim forced RACMO2. This discrepancy suggests that the sensitivity of GrIS SMB to climate change is underestimated in CESM2 compared to the statistically downscaled RACMO2.

Discussion
Coupled ice sheet-climate simulations with the GrIS are a major contribution of the CESM2 community to ISMIP6, and GrIS SMB was therefore considered carefully during the development of CESM2. In this section we reflect on some of the development decisions and suggest directions for future improvements. One key change between CESM1 and CESM2 is that CISM is now enabled by default as a diagnostic component in all coupled experiments. Previously, CISM and therefore CLM ECs over the GrIS were active only in dedicated runs, and GrIS SMB could not be diagnosed from most simulations. Now, GrIS SMB is monitored routinely during development, and potential problems are easily flagged.

Model Development to Improve Ice Sheet SMB Toward CESM2
Numerous studies have shown the importance of clouds on GrIS SEB and SMB (e.g., Bennartz et al., 2013;Cullather & Nowicki, 2018;Van Tricht, Lhermitte, Lenaerts, Gorodetskaya, L'Ecuyer, et al., 2016). The previous version of CESM used in GrIS SMB studies was CESM1(CAM4)  10.1029/2019JF005318 , which suffers from an excessive occurrence of liquid-containing clouds over Greenland, as shown in an upcoming study led by J.T.M. Lenaerts. In retrospect, this may explain some of the CESM1(CAM4) surface biases, such as a general warm bias over the GrIS interior, increased LW d at the expense of SW d , and unrealistic rainfall in the interior . In CESM1(CAM5), by contrast, cloud LWP was greatly underestimated, giving major surface radiative biases over Greenland (Lacour et al., 2018;McIlhattan et al., 2017) and the Arctic region as a whole (Kay et al., 2016). This model version, used by the CESM Large Ensemble (Kay et al., 2015), was therefore not suitable for GrIS SMB studies. CAM6 largely resolved this problem with several improvements in the cloud microphysics scheme. The importance of cloud LWP (and to a lesser degree, IWP) motivated us to include Figure 2 in the current paper, in which we compare CESM2 cloud water to observations.
We deem albedo to be the most important surface property to simulate accurately when modeling Greenland melt. ESMs generally have many parameters controlling ice and snow albedo. In CESM, ice albedo is given by a simple fixed value which has been lowered in CESM2 (based on observational evidence) to obtain larger melt rates for bare ice. Snow albedo is a function of many parameters and factors (e.g., initial snow grain size, snow grain growth, refreezing grain size, and impurities) and is sensitive to external forcing, such as the magnitude and timing of snowfall and rain. Following the decision to correct for the bias of interior GrIS rainfall (detailed in section 2.1), snow albedo was unrealistically high at some point in the development cycle, leading to steep SMB gradients near the margins. Multiple trials eventually led to initial snow grain size becoming a temperature-dependent parameter (section 2.2), which lowered the GrIS snow albedo without deteriorating model performance elsewhere.
In CESM2, near-surface wind speed is not only relevant for the turbulent fluxes but also partly controls snow density through the wind-dependent fresh snow density and drifting snow compaction, both of which were introduced to avoid low snow densities in cold and windy environments (van Kampenhout et al., 2017). In CAM5, however, near-surface wind speeds were biased low due to the use of the Turbulent Mountain Stress surface drag parametrization (Lindvall et al., 2012). A new surface drag parametrization was introduced in CAM6 ( Beljaars et al., 2004) that raised low-level wind speed to reasonable values (see Figures 3a and 4) and improved GrIS SEB.

Directions for Future Model Development to Improve Ice Sheet SMB
Based on the previous discussion and the results presented in this paper, we make suggestions for improving GrIS SMB in future versions of CESM.
1. A major outstanding model bias is the high CAM6 rainfall over the interior GrIS (Figure 8b), which is currently alleviated by phase repartitioning in CLM. This solution is suboptimal, as there is reason to believe that phase repartitioning degrades SMB gradients. Moreover, supercooled rain could be key to setting off melt-albedo feedbacks in northern Greenland. To resolve this bias, cloud microphysics in polar regions need to be improved. While the precipitation phase is still being repartitioned, the current temperature-only formulation could perhaps be improved upon by adding humidity as a predictor as suggested by Jennings et al. (2018). 2. In order to better resolve stable boundary layers and temperature inversions, CAM vertical resolution could be increased near the surface. Currently, the lowest atmospheric layer in CAM6 has a thickness of ∼120 m, whereas ∼10 m may be desirable over ice sheets (Vignon et al., 2018). 3. Adding drifting snow erosion and sublimation to CLM could improve the SMB simulation, for example, by widening ablation zones in northern Greenland. 4. It is desirable to reassess snow grain size, an important control on snow albedo. CESM2 albedo appears to be biased low across most of the interior ( Figure 5). 5. Artifacts caused by EC downscaling (supporting information Text S1) could be prevented by introducing spatial information to the vertical downscaling. For example, one approach could be to calculate SMB gradients per GrIS drainage basin, as done in Goelzer et al. (2019). This would also make the method less dependent on the CAM resolution. Another approach is to use variable resolution grids, as done in van . If done right, this will remove the need to use ECs altogether.

Conclusions
In this study we evaluated GrIS climate, clouds, SEB, and SMB in CESM2, which is the first version of CESM capable of ice sheet-climate simulations with dynamic land cover changes. We used output from six 10.1029/2019JF005318 fully coupled CMIP6 historical experiments at 1 • horizontal resolution with fixed ice sheet topography. In addition, we used EC-indexed output from a single historical experiment (HIST-EC) that was downscaled offline to 11 and 1 km grids, allowing for high-resolution comparisons of individual SEB/SMB components against a state-of-the-art regional climate model, RACMO2.3p2.
CESM2 at 1 • simulates reasonably well the Greenland large-scale climate, cloud LWP, surface climate, SEB, and SMB. A few biases remain, some of which can be linked to the coarse grid resolution, in both the horizontal direction (wind speed, precipitation, and tundra microclimate) and the vertical (stable boundary layers during winter). Compared to RACMO2 and satellite data (MODIS and CLARA-A2), snow albedo is biased low in CESM2. As a result, we find a JJA radiation surplus with respect to RACMO2 across most of the ice sheet that is compensating for weaker turbulent fluxes in CESM2, which in turn are linked to weaker near-surface winds.
Mean GrIS-integrated SMB during 1961-1990 is 508 ± 73 Gt, with precipitation being the leading SMB component that is overestimated compared to RACMO2. Mean GrIS-integrated values of melt, refreezing, and runoff are bracketed by RACMO2 values simulated at 11 and 1 km ( Table 2). The extent of the northern and eastern GrIS ablation areas, however, is underestimated in CESM2, and the total ablation area is 11% smaller than in RACMO2. Time series analysis shows that SMB was stable over the historical period up to ∼1990, after which it declines due to increased melt and runoff. Compared to reanalysis-forced RACMO2, CESM2 simulates a comparable trend in melt after 1993, but the trend in refreezing is larger. As a result, simulated trends in runoff and SMB are smaller than that in RACMO2. The timing of the break point in SMB is similar to that in reanalysis-forced RACMO2.
To conclude, CESM2 simulates a GrIS SMB field for the present-day geometry which is physically realistic given the known model limitations and which adds confidence to coupled ice sheet-climate experiments that assess the GrIS contribution to sea level rise on decadal to millennial time scales in past and future climates.