The Community Land Model Version 5: Description of New Features, Benchmarking, and Impact of Forcing Uncertainty

The Community Land Model (CLM) is the land component of the Community Earth System Model (CESM) and is used in several global and regional modeling systems. In this paper, we introduce model developments included in CLM version 5 (CLM5), which is the default land component for CESM2. We assess an ensemble of simulations, including prescribed and prognostic vegetation state, multiple forcing data sets, and CLM4, CLM4.5, and CLM5, against a range of metrics including from the International Land Model Benchmarking (ILAMBv2) package. CLM5 includes new and updated processes and parameterizations: (1) dynamic land units, (2) updated parameterizations and structure for hydrology and snow (spatially explicit soil depth, dry surface layer, revised groundwater scheme, revised canopy interception and canopy snow processes, updated fresh snow density, simple firn model, and Model for Scale Adaptive River Transport), (3) plant hydraulics and hydraulic redistribution, (4) revised nitrogen cycling (flexible leaf stoichiometry, leaf N optimization for photosynthesis, and carbon costs for plant nitrogen uptake), (5) global crop model with six crop types and time‐evolving irrigated areas and fertilization rates, (6) updated urban building energy, (7) carbon isotopes, and (8) updated stomatal physiology. New optional features include demographically structured dynamic vegetation model (Functionally Assembled Terrestrial Ecosystem Simulator), ozone damage to plants, and fire trace gas emissions coupling to the atmosphere. Conclusive establishment of improvement or degradation of individual variables or metrics is challenged by forcing uncertainty, parametric uncertainty, and model structural complexity, but the multivariate metrics presented here suggest a general broad improvement from CLM4 to CLM5.


Introduction
Land models are classically used as tools to integrate terrestrial contributions and responses to weather, climate variability, and climate change. In addition, modern land models are increasingly expected to provide insight into weather and climate impacts of societally relevant quantities such as water availability, crop and timber yields, wildfire risk, human heat stress, and other ecosystem services (Bonan & Doney, 2018). The Community Land Model (CLM), which is the land component of the Community Earth System Model (CESM), has been developed and expanded over the last decade to provide an increasingly comprehensive platform that researchers can use to address these types of questions. More explicitly, CLM has been developed in accordance with two central themes: (1) Terrestrial ecosystems, through their cycling of energy, water, momentum, carbon, nitrogen, and other trace gases, are important determinants of weather and climate, and (2) the land is a critical interface through which climate variability and climate change influence humans and ecosystems and through which humans and ecosystems can affect global environmental change.
Here, we introduce the Community Land Model version 5 (CLM5, http://www.cesm.ucar.edu/models/cesm2.0/land/), which builds on progress made in CLM4 and CLM4.5 (Oleson et al., 2013). CLM is a community-developed model, with CLM5 representing the outcome of model development and analysis efforts by a diverse group of scientists and software engineers from many institutions. Priorities for model development are set collectively by the CLM research and development community and are broadly focused on the enhancement of the capacity of the model to be applied to emerging questions that lie at the intersection of weather and climate with terrestrial processes. Examples of scientific topics that have driven CLM5 development include the following:
• assessment of the response and vulnerability of ecosystems to climate change and disturbances (human and natural) and the possibility for ecosystem management to mitigate climate change;
• quantification of the role of terrestrial processes in diurnal to interannual weather and climate variability including influence on droughts, floods, and extremes;
• establishment of the availability of water resources under climate variability and climate change;
• quantification of key land feedbacks to climate change including the permafrost climate-carbon feedback and snow- and vegetation-albedo feedbacks;
• representation and quantification of impacts of anthropogenic land cover and land use change on climate and the carbon cycle;
• assessment of how land surface heterogeneity affects land-atmosphere interactions and carbon cycling; and
• examination of the impact of model structural and parameter uncertainty and exploration of parameter optimization techniques.
The overarching development philosophy also rests on the notion that terrestrial systems are highly coupled and that development in one set of model processes can modify, and often improve, the simulation of other model processes (e.g., improvements in the representation of soil hydrology are likely to improve carbon cycle simulations and vice versa) and can also expose problems in other parts of the model. Core biogeophysical and biogeochemical parameterization development has been complemented with expansions to model functionality (e.g., introduction of a global interactive crop model with fertilization and irrigation and introduction of an embedded ice sheet model) and model structural updates (e.g., increased soil vertical resolution and spatially variable soil depth). Many of the improvements adopted for CLM5 were independently developed by separate research groups for a range of reasons and applications. Therefore, a principal goal of this manuscript is to catalog and describe the full set of CLM5 model developments so that model users are aware of the new features of the model, including known strengths and limitations (section 2). The model simulations and meteorological forcing data sets employed are described in section 3. We include a high-level assessment of the integrated impact of these developments on the overall performance of the model, utilizing the International Land Model Benchmarking package (ILAMB, Collier et al., 2018), ecosystem experiment data, and other metrics (section 4). A summary and discussion are provided in section 5. A full technical description of the model is available online (http://www.cesm.ucar.edu/models/cesm2/land/CLM50_Tech_Note.pdf).

CLM4
CLM4 was released in June 2010 along with the Community Climate System Model version 4 (CCSM4). CLM4 has been used in CCSM4 (Gent et al., 2011) and CESM1 (Hurrell et al., 2013). CLM4 is described in Lawrence et al. (2011), and a full technical description is available in Oleson et al. (2010; http://www.cesm.ucar.edu/models/cesm1.2/clm/CLM4_Tech_Note.pdf). Briefly, CLM4 included more sophisticated representations of soil hydrology and snow processes than its predecessor, CLM3.5. In particular, new treatments of soil column-groundwater interactions, soil evaporation, aerodynamic parameters for sparse/dense canopies, vertical burial of vegetation by snow (Wang & Zeng, 2009), snow cover fraction, snow aging, black carbon and dust deposition, and vertical distribution of solar energy for snow were implemented (Flanner et al., 2007). CLM4 was the first version in the CLM series to include a prognostic aboveground and belowground carbon-nitrogen cycle (CLM4CN, Thornton et al., 2007) as well as the ability to represent transient land cover change. CLM4 added a representation of organic soil and deep ground into the existing mineral soil treatment to enable more realistic modeling of permafrost and active layer dynamics. An urban canyon model, to contrast rural and urban energy balance and climate, was also introduced.
processes including a revised canopy radiation scheme and canopy scaling of leaf processes, colimitations on photosynthesis (Bonan et al., 2011; Bonan et al., 2012), and temperature acclimation of photosynthesis. Hydrology updates included modifications such that hydraulic properties of frozen soils are determined by liquid water content only rather than total water content, introduction of an ice impedance function, and allowance for a perched water table above icy permafrost ground. The snow cover fraction parameterization was revised to reflect hysteresis in fractional snow cover, for a given snow depth, between accumulation and melt phases. The lake model was thoroughly revised (Subin et al., 2012). A surface water store was introduced, replacing the wetland land unit. The surface energy flux calculation was modified to separately simulate snow-covered, water-covered, and snow/water-free portions of vegetated and crop land units, and snow-covered and snow-free portions of glacier land units. Globally constant river flow velocity was replaced with variable flow velocity based on mean grid cell slope. A vertically resolved soil biogeochemistry scheme was introduced with base decomposition rates varying with depth and modified by soil temperature, water, and oxygen limitation and also including vertical mixing of soil carbon and nitrogen due to bioturbation, cryoturbation, and diffusion. Litter and soil carbon and nitrogen pool structure as well as nitrification and denitrification were modified to reflect the Century model. The fire model was replaced with a model that includes representations of natural and anthropogenic ignition sources and suppression as well as agricultural, deforestation, and peat fires (Li et al., 2012). The biogenic volatile organic compounds model was updated to MEGAN2.1 (Guenther et al., 2012).
Further additions to CLM4.5 included a methane production, oxidation, and emissions model (Riley et al., 2011) and an extension of the crop model to include interactive fertilization, organ pools (Drewniak et al., 2013), and irrigation (Sacks et al., 2009). Multiple urban density classes, rather than the single dominant urban density class used in CLM4, were modeled in the urban land unit. Carbon isotopes (13C and 14C) for natural vegetation were introduced. A summary of the changes included in CLM4.5 relative to CLM4 is listed in Table 1.

CLM5
CLM5 is the default land model for CESM2 (http://www.cesm.ucar.edu/models/cesm2/). Developments for CLM5 build on the progress made in CLM4.5. Most major components of the model have been updated with notable changes made to soil and plant hydrology, snow density, river modeling, carbon and nitrogen cycling and coupling, crop modeling as well as new surface characterization and transient land use data sets and increased flexibility to represent landscape dynamics through specified or prognostic transitions in land unit weights. Much of the development reflects a push toward more mechanistic treatment of key hydrologic and ecological processes and more comprehensive and explicit representation of anthropogenic land management.
Prior versions of CLM mainly included a single option for most parameterizations. With our new CLM codebase management philosophy, where new parameterizations or model structural decisions were defined for CLM5, we also maintained the CLM4.5 parameterization or configuration, thereby allowing users to switch back and forth between alternative parameterizations via namelist control. In this section, we briefly describe the full set of model developments. Except where explicitly noted, all described new parameterizations or features are active by default in CLM5. For full details of new and old CLM5 parameterizations, including equations and parameter values, we refer the reader to the cited papers and to the full technical description of CLM5 (http://www.cesm.ucar.edu/models/cesm2/land/CLM50_Tech_Note.pdf). Additional documentation including information about how to access the code, tutorials about how to run the model, developer's guides, and model output diagnostics can be found online (http://www.cesm.ucar.edu/models/cesm2/land/). A schematic representation of the primary processes and functionality represented in CLM5 is shown in Figure 1. A summary of the changes in CLM5 relative to CLM4.5 is listed in Table 1 for reference.

Dynamic Land Unit Weights and Plant Functional Type Distribution
CLM5 includes a new capacity to update land unit weights during a simulation, either through a data set or prognostically, a capability whose absence previously prevented the representation of important specified or dynamic transitions. Spatial land surface heterogeneity in CLM is represented as a nested subgrid hierarchy in which grid cells are composed of multiple land units, columns, and patches (Figure 2). Each grid cell can have a different number of land units, each land unit can have a different number of columns, and each column can have multiple patches each with a specific plant functional type (PFT) or crop functional type (CFT). The first subgrid level, the land unit, is intended to capture the broadest spatial patterns of subgrid heterogeneity. The CLM5 land units are vegetated, lake, urban, glacier, and crop. New within CLM5 is the capacity to adjust the fractional area of each land unit throughout the course of a simulation either as specified through a land use data set (e.g., deforestation for agriculture and transition of a fraction of vegetated land unit to crop land unit) or through prognosed initiation or loss of glacier area (e.g., initiation of glacier area and transition of fraction of vegetated land unit to glacier land unit; only possible when two-way ice sheet interactions are activated). For natural vegetation, CLM operates under the assumption that all PFTs on the natural vegetation land unit compete for water and nitrogen and that all PFTs share the same soil column state (temperature, moisture, carbon, and nitrogen). Note that prior research has shown that for some applications, particularly for studies of land cover change impacts on climate, it may be preferable for each PFT to operate on its own soil column to avoid implicit energy transfer from one PFT to another (Meier et al., 2018; Schultz et al., 2016).
On the crop land unit, each CFT (irrigated and unirrigated) resides on its own soil column and therefore operates based on its own soil moisture and nitrogen conditions. The CLM5 surface data sets are created as in CLM4 and CLM4.5 but with updated methodology as described here. Present-day global land cover descriptions are generated at 1-km resolution using updated versions of the data and methods used for CLM4 and CLM4.5 (Lawrence & Chase, 2007). The basis for the land cover description comes from MODIS land cover (MCD12Q1 v5.1), vegetation continuous fields (MOD44B v5.1), LAI (MCD15A2 v5), and albedo (MCD43B3 v5) products for the years 2001-2015 (https://lpdaac. usgs.gov/dataset_discovery/modis). Additional information for tree leaf type and longevity are provided by the AVHRR continuous fields tree cover product (Defries et al., 2000). Global crop distributions are provided by the monthly irrigated and rainfed crop areas around the year 2000 (MIRCA2000) data set of Portmann et al. (2010). Canopy height data for tree PFTs are provided by the Geoscience Laser Altimeter System on the ICESat satellite as processed by Simard et al. (2011). The LUH2 historical and future  Goldewijk et al., 2017) for 850-2014 and from Integrated Assessment Model teams for multiple alternative scenarios of the future for 2015-2100. The LUH2 time series describes annual changes in primary and secondary forest and nonforest land units, along with five crop groups, managed pasture, rangeland, and urban areas. The LUH2 data also include information on wood harvest, both in terms of the mass of carbon extracted and the total harvest areal fraction (CLM5 uses carbon mass). Annual crop management is specified by crop type through industrial fertilizer application and the fraction of each crop irrigated. Finally, the CLM surface data sets and transient land use data sets are produced with the CLM Land Use Data Tool (http://www.cgd.ucar.edu/iam/projects/thesis/thesis-clm-landuse-tool.html). 
This tool takes the present-day land cover distribution and merges it with historical or future LUH2 transitions and management information and translates them into CLM PFT and CFT distributions and management information.

Soil Hydrology
CLM5 includes several structural and parameterization improvements that increase the realism of the soil hydrology representation in the model. To resolve a deficiency in the seasonality of soil evaporation and soil water storage in semiarid regions, the empirical soil evaporation resistance parameterization is replaced with a mechanistically based parameterization where soil evaporation is controlled by the rate of diffusion of water vapor through a dry surface layer (Swenson & Lawrence, 2014). To account for spatial variation in soil thickness and columnar water holding capacity, CLM is updated so that different soil thicknesses (by default ranging from 0.4- to 8.5-m depth) can be applied for each soil column (Brunke et al., 2016). The default spatially explicit soil depths are derived from a spatially explicit soil thickness data product. The explicit treatment of soil thickness with underlying bedrock (currently assumed to be impermeable, i.e., zero flux bottom boundary condition) means that the soil saturated and unsaturated zones and associated water table depth are modeled explicitly. This allows for the deprecation of the unconfined aquifer parameterization, which was used as part of the groundwater representation in CLM4 and CLM4.5. Note that an added benefit of the explicit representation of spatially varying soil thickness underlain by impermeable bedrock is that it removes a logical inconsistency between the treatment of soil hydrologic and soil thermal calculations that existed in CLM4 and CLM4.5. The default model soil layer resolution is increased, especially within the top 3 m, in part to more accurately simulate active layer thickness (ALT) within the permafrost zone.
The default configuration includes a total of 25 ground layers that extend to a depth of ~50 m. The first five (0.4-m-thick soils) up to 20 (8.5-m-thick soils) layers in each column are considered soil and are hydrologically and biogeochemically active. The number of soil layers is specified independently for each column based on the imposed soil thickness data set. The remaining ground layers in each column are considered to be dry bedrock. Note that since the number of active soil layers varies from grid cell to grid cell, users need to be careful when doing spatial averaging of soil moisture or carbon/nitrogen quantities since bedrock layers have very small prescribed constant soil moisture and carbon/nitrogen values.
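To illustrate the averaging caveat, the following is a minimal sketch (not CLM or CLM post-processing code) of a layer-by-layer spatial mean that skips bedrock layers. The function name, the list-based data layout, and the use of simple area weights are assumptions made for illustration only.

```python
def layer_means_over_active_soil(profiles, n_active, weights):
    """Area-weighted mean of a per-layer quantity, layer by layer.

    profiles : list of per-column lists (one value per ground layer)
    n_active : number of hydrologically active soil layers in each column
    weights  : area weight of each column (hypothetical, e.g. cell area)

    A column contributes to layer k only if layer k is soil (k < n_active),
    so the small prescribed bedrock values do not dilute the mean.
    """
    n_layers = len(profiles[0])
    means = []
    for k in range(n_layers):
        num = 0.0
        den = 0.0
        for profile, n, w in zip(profiles, n_active, weights):
            if k < n:  # layer k is soil in this column, not bedrock
                num += w * profile[k]
                den += w
        # None marks layers that are bedrock in every column
        means.append(num / den if den > 0.0 else None)
    return means


# Two columns, three ground layers; 0.001 stands in for a bedrock value.
example = layer_means_over_active_soil(
    profiles=[[0.3, 0.2, 0.001], [0.1, 0.001, 0.001]],
    n_active=[2, 1],
    weights=[1.0, 1.0],
)
```

A naive mean of the second layer in this example would be close to 0.1, whereas the masked mean is 0.2 because only the first column has soil at that depth.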
An adaptive time-stepping solution to the Richards equation is introduced (Clark & Kavetski, 2010; Kavetski et al., 2001). This improves the accuracy and stability of the numerical soil water solution by allowing for multiple substeps within the standard 30-min model time step when required. In test simulations, all instances of numerical instability in the Richards equation solution (i.e., negative soil moisture updates) were eliminated at a cost of an increase of less than 3% in model runtime.
Substepping is invoked (i.e., instabilities occur in the Richards equation solution) most frequently when and where the number of soil layers is small, which can be due to frozen soils or shallow bedrock. The process of subtracting the hydrostatic equilibrium soil moisture distribution from the vertical soil moisture profile before solving the Richards equation, proposed in Zeng and Decker (2009) and included in CLM4 and CLM4.5, has been deprecated because it is inconsistent with standard approaches used in soil hydrology (De Rooij, 2010).
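The idea of substepping within a fixed model time step can be sketched generically as follows. This is not the CLM5 solver: the cited schemes control the substep with error estimates, whereas this sketch simply halves the substep whenever a trial explicit update would produce a negative value. The function name, the `tendency` callback, and `min_dt` are hypothetical.

```python
def advance_with_substeps(theta, tendency, dt_total, min_dt=1e-6):
    """Advance the state `theta` over `dt_total` with explicit Euler substeps.

    theta    : list of non-negative state values (e.g. layer soil moisture)
    tendency : function returning d(theta)/dt for a given state
    A trial update that drives any value negative is rejected and retried
    with half the substep. The substep is never lengthened again here,
    which keeps the sketch short but is not optimal.
    """
    t = 0.0
    dt = dt_total
    n_accepted = 0
    while dt_total - t > 1e-12:
        dt = min(dt, dt_total - t)  # do not step past the end of the interval
        trial = [x + dt * r for x, r in zip(theta, tendency(theta))]
        if min(trial) < 0.0:
            if dt <= min_dt:
                raise RuntimeError("substep fell below min_dt")
            dt *= 0.5  # reject the trial and retry with a shorter substep
            continue
        theta = trial
        t += dt
        n_accepted += 1
    return theta, n_accepted


# Fast linear drainage: one full step would overshoot to a negative value.
state, n_steps = advance_with_substeps([1.0], lambda th: [-3.0 * x for x in th], 1.0)
```

In this toy case the full step is rejected twice and the interval is completed in four accepted substeps, whereas a slowly varying state is advanced in a single step at no extra cost.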

Atmospheric Surface Layer Stability
In Monin-Obukhov stability theory (Foken, 2006), atmospheric stability is characterized by a length scale L, called the Obukhov length, which is used to nondimensionalize the distance to the surface using the variable zeta = (z − d)/L, where z is the reference height and d is the displacement height. In CLM4.5, the stability variable zeta is constrained to be less than or equal to 2. Using temperature and friction velocity measurements from a subalpine forest flux tower, Burns et al. (2018) showed that CLM4.5 exhibited a large and persistent nighttime low bias of canopy temperature and friction velocity. In that study, they alleviated this bias by implementing the Handorf et al. (1999) stability function in very stable conditions. For CLM5, we approximate the Handorf et al. (1999) stability function for very stable conditions by setting the maximum zeta value to 0.5. Ongoing development work since CLM5 was finalized indicates that this need for a maximum zeta value can be eliminated when a vegetation biomass heat storage capacity is explicitly modeled. Stability corrections and the applicability of Monin-Obukhov similarity remain active research topics, which have recently leveraged high-resolution turbulent simulations such as direct numerical simulations.
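The effect of the cap on zeta can be illustrated with a minimal sketch. Only the upper bounds of 2 (CLM4.5) and 0.5 (CLM5) come from the text; the function name and argument names are hypothetical, and unstable (negative) values are simply passed through here.

```python
def stability_parameter(z, d, obukhov_length, zeta_max):
    """Return zeta = (z - d) / L, capped at zeta_max in stable conditions.

    z              : reference height (m)
    d              : displacement height (m)
    obukhov_length : Obukhov length L (m); positive when stable
    zeta_max       : upper bound on zeta (2.0 for CLM4.5, 0.5 for CLM5)
    """
    zeta = (z - d) / obukhov_length
    return min(zeta, zeta_max)


# A very stable night: uncapped zeta would be (30 - 10) / 5 = 4.
zeta_clm45 = stability_parameter(30.0, 10.0, 5.0, zeta_max=2.0)
zeta_clm5 = stability_parameter(30.0, 10.0, 5.0, zeta_max=0.5)
```

Lowering the cap limits how strongly the stability correction can suppress turbulent exchange in very stable conditions.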
CLM4.5 includes an additional modification to undercanopy stability designed to increase aerodynamic resistance between the canopy and the ground in stable conditions (Sakaguchi & Zeng, 2009). Due to biases in surface to lowest atmosphere layer temperature profiles, also noted by Burns et al. (2018), it was found that the undercanopy stability parameterization did not perform as intended. Consequently, this undercanopy stability parameterization is inactive in CLM5. Within-canopy and undercanopy stability remains an active area of research.

Snow, Glaciers, and Ice Sheets
Several changes are included that are mainly targeted at improving the simulation of surface mass balance, the difference between annual accumulation and ablation, over ice sheets. New parameterizations for fresh snow density (updated temperature effects and wind effects), destructive metamorphism (the change in snow crystals from six-sided shapes to rounded, bonded ice grains due to disturbance, molecular motion, and pressure), and compaction by overburden pressure and drifting snow are included (van Kampenhout et al., 2017). For reference, fresh snow density as a function of temperature and wind speed is shown in figure 1 of van Kampenhout et al. (2017). The maximum number of snow layers and snow amount is increased from five layers and 1-m snow water equivalent to 12 layers and 10-m snow water equivalent, to allow for the formation of firn in regions of persistent snow cover (e.g., glaciers and ice sheets; van Kampenhout et al., 2017). The snow capping routine, which sets a limit on the maximum amount of accumulated snow, has been fixed to correctly allow surface snow density and grain size to refresh when new snow falls. The grain size of freshly fallen snow has been made a function of air temperature to address unrealistically high albedos over ice sheets. Instead of applying a fresh snow grain size of 54 μm at all temperatures, fresh snow grain size is set to 54 μm below −30°C and to 204.5 μm above 0°C, with a linear ramp applied between these temperatures. The parameters for snow grain aging are maintained.
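The temperature dependence of fresh snow grain size described above can be written as a short piecewise-linear function. The end points (54 μm at −30°C, 204.5 μm at 0°C) are taken from the text; the function name is hypothetical and this is a sketch rather than the model code.

```python
def fresh_snow_grain_size_um(air_temp_c):
    """Fresh snow grain size (micrometers) as a function of air temperature (deg C)."""
    t_cold, t_warm = -30.0, 0.0        # ramp end points (deg C)
    size_cold, size_warm = 54.0, 204.5  # grain sizes at the end points (um)
    if air_temp_c <= t_cold:
        return size_cold
    if air_temp_c >= t_warm:
        return size_warm
    # Linear interpolation between the two end points
    frac = (air_temp_c - t_cold) / (t_warm - t_cold)
    return size_cold + frac * (size_warm - size_cold)


midpoint_size = fresh_snow_grain_size_um(-15.0)  # halfway up the ramp
```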
Multiple elevation classes (10 elevation classes by default) are specified on the glacier land unit to account for the strong topographic elevation gradients present over many glaciers and ice sheets (Lipscomb et al., 2013). Atmospheric surface temperature, potential temperature, specific humidity, density, and pressure are downscaled from the mean grid cell elevation to each glacier column elevation using a specified lapse rate (6.0 K km⁻¹) and an assumption of uniform relative humidity. Longwave radiation is downscaled by assuming a linear decrease in downwelling longwave radiation with increasing elevation (0.032 W·m⁻²·m⁻¹, bounded to 0.5 to 1.5 times the grid cell mean value and then normalized to conserve grid cell total energy; Tricht et al., 2016). This downscaling allows lower-elevation columns within a glacier land unit to undergo surface melting while columns at higher elevations remain frozen.
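A minimal sketch of the temperature and longwave downscaling is given below. The lapse rate (read here as 6.0 K per km), the longwave gradient, and the 0.5 to 1.5 bounds come from the text. The function names, the column area fractions, and the choice of a single multiplicative rescaling for the normalization step are assumptions for illustration; the model's actual normalization may differ.

```python
LAPSE_RATE_K_PER_M = 6.0 / 1000.0  # 6.0 K per km (unit assumed to be kelvin)
LW_GRADIENT = 0.032                # W m-2 decrease per m of elevation gain


def downscale_temperature(t_grid, z_grid, z_col):
    """Shift the grid cell mean temperature to the column elevation."""
    return t_grid - LAPSE_RATE_K_PER_M * (z_col - z_grid)


def downscale_longwave(lw_grid, z_grid, z_cols, area_fracs):
    """Downscale longwave to columns, bound it, and renormalize.

    The renormalization rescales all columns by one factor so that the
    area-weighted mean equals the grid cell mean (assumed method).
    """
    raw = [lw_grid - LW_GRADIENT * (z - z_grid) for z in z_cols]
    bounded = [min(max(v, 0.5 * lw_grid), 1.5 * lw_grid) for v in raw]
    weighted_sum = sum(f * v for f, v in zip(area_fracs, bounded))
    scale = lw_grid * sum(area_fracs) / weighted_sum
    return [v * scale for v in bounded]


# Two elevation classes, 500 m below and above the grid cell mean elevation.
t_high = downscale_temperature(270.0, 1000.0, 2000.0)
lw_cols = downscale_longwave(300.0, 1000.0, [500.0, 1500.0], [0.5, 0.5])
```

In this example the lower column receives more longwave radiation and the higher column less, while the area-weighted mean remains 300 W m⁻².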
In typical configurations (e.g., by default in CESM2 and CLM5 land-only simulations), CLM5 computes ice sheet surface mass balance, but ice sheets do not evolve. CLM5 can also be coupled bidirectionally to CISM2.1 (Lipscomb et al., 2019) and thereby simulate an evolving Greenland ice sheet. The introduction of the capability to adjust land unit weights during a simulation (section 2.3.1) means that a glacier can incept, grow, shrink, or disappear during a simulation when two-way coupling between the land and ice sheet model is active. By default, two-way coupling is not active in CESM2 or CLM5 land-only simulations, including the simulations assessed here.
Vegetation canopy precipitation interception is updated to track liquid and solid water phases separately, with intercepted snow subject to unloading events due to wind or above-freezing temperatures similar to Roesch et al. (2001). Interception snow mass compares favorably with in situ measurements from Storck et al. (2002). Additionally, the snow-covered fraction of the canopy, which is calculated based on the canopy snow mass and LAI, is used within the canopy radiation and surface albedo calculations.
Finally, CLM5 partitions total precipitation into rain and snow according to a linear temperature ramp. This partitioning occurs irrespective of what phase precipitation is calculated by the atmosphere model. For most land units, this ramp generates all snow below 0°C, all rain above 2°C, and a mix of rain and snow for intermediate temperatures. For glaciers, the end points of the ramp are −2 and 0°C, respectively. To ensure energy conservation, a sensible heat flux correction term is applied when the phase of precipitation coming from the atmosphere is changed.
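The rain-snow partitioning can be sketched as follows. The ramp end points are those given in the text; the function names and the `is_glacier` flag are hypothetical, and the sensible heat flux correction is not represented.

```python
def snow_fraction(air_temp_c, is_glacier=False):
    """Fraction of total precipitation that falls as snow."""
    # Ramp end points (deg C): all snow at or below the first, all rain at or above the second
    t_all_snow, t_all_rain = (-2.0, 0.0) if is_glacier else (0.0, 2.0)
    if air_temp_c <= t_all_snow:
        return 1.0
    if air_temp_c >= t_all_rain:
        return 0.0
    return (t_all_rain - air_temp_c) / (t_all_rain - t_all_snow)


def partition_precipitation(total, air_temp_c, is_glacier=False):
    """Split total precipitation into (rain, snow), conserving the total."""
    snow = total * snow_fraction(air_temp_c, is_glacier)
    return total - snow, snow


rain, snow = partition_precipitation(4.0, 1.0)  # 1 deg C over a non-glacier land unit
```

At 1°C a non-glacier land unit receives an even mix of rain and snow, whereas a glacier land unit at the same temperature receives only rain.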

Rivers
The River Transport Model (RTM) used in CLM4.5 is replaced with the physically more realistic Model for Scale Adaptive River Transport (MOSART). Note that the river model is treated as a separate coupled component in CESM and therefore is not technically part of CLM, but we include it in this manuscript because of the clear relationship with and dependence on CLM; that is, MOSART receives surface and subsurface runoff from CLM. MOSART represents an upgrade over RTM in several ways. RTM utilizes a simple linear reservoir method to calculate streamflow, while MOSART is based on the more physically based kinematic wave method. MOSART also provides more information on river conditions; that is, RTM only simulates streamflow whereas MOSART additionally simulates time-varying channel velocities, channel water depth, and channel surface water variations. In MOSART, surface runoff is routed across hillslopes and then discharged along with subsurface runoff into a tributary subnetwork before entering the main channel. MOSART assumes that all the tributaries within a spatial unit (either regular lat/lon grid or watershed) can be treated as a single hypothetical subnetwork channel with a transport capacity equivalent to all the tributaries combined. Correspondingly, three routing processes are represented in MOSART: (1) hillslope routing: surface runoff is routed as overland flow into the subnetwork channel, while subsurface runoff directly enters the subnetwork channel; (2) subnetwork channel routing: the subnetwork channel receives water from the hillslopes, routes water through the channel, and discharges it into the main channel; and (3) main channel routing: the main channel receives water from the subnetwork channel and/or inflow, if any, from upstream, and discharges the water downstream or to the ocean.
The capability to simulate flooding (water transfer from rivers back onto land under flood stages) that was implemented into RTM for CLM4.5 is retained for MOSART but is not active by default. The representation of wetlands is unchanged from CLM4.5 wherein wetlands are no longer their own prescribed land unit but instead are captured through a prognostic surface water storage that accounts for fine spatial-scale variations in surface elevation (see technical description for details).

Vegetation Physiology
A plant hydraulic stress (PHS) routine is introduced that explicitly models water transport through the vegetation according to a simple hydraulic framework (Kennedy et al., 2019). The plant hydraulics routine solves for vegetation water potentials (root, xylem, and leaf) according to an electric circuit analogy, in which the flow (current) is the soil-to-leaf water supply (sap) which is set to meet the transpiration flux (demand) at every time step; that is, no storage is assumed. Explicit prognosis of plant tissue water status improves the physical basis for many processes represented in CLM, such as the dynamics of root water uptake profiles, and the attenuation of photosynthesis and transpiration with drought, which was exaggerated in previous model versions (e.g., Powell et al., 2013). In PHS, "unstressed" (atmospheric demand-driven) stomatal conductance is modulated for drought stress using a function of leaf water potential, requiring vegetation to regulate stomatal conductance to avoid excessively negative leaf water potential and thus plant desiccation and embolism in the xylem. This more mechanistic representation of vegetation water stress replaces the soil moisture stress (SMS) parameterization in prior versions of CLM in which water stress was calculated through a plant wilting factor that was based on soil water matric potential relative to PFT-dependent parameters for fully closed and fully open stomata, weighted by layer root fractions. An emergent feature of the plant hydraulics scheme (wherein water moves along water potential gradients within the soil-root-stem-leaf system) is a plant-mediated vertical hydraulic redistribution of soil water from wet to dry soil layers, which thus leads to important nighttime and seasonal hydraulic redistribution, physically constrained by the plant hydraulic parameterization (Kennedy et al., 2019).
To prevent unrealistically high soil evaporative losses of soil water due to continuous hydraulic redistribution, root water uptake and hydraulic redistribution are not allowed to occur in the 2-cm-thick surface soil layer.
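The electric circuit analogy can be illustrated with a deliberately simplified series circuit. In this sketch the conductances are constants and there is a single soil layer, both of which are simplifying assumptions; the function name, argument names, and units are hypothetical, and the sketch is not the PHS implementation.

```python
def water_potentials_in_series(psi_soil, transpiration, k_soil_root, k_root_xylem, k_xylem_leaf):
    """Solve root, xylem, and leaf water potentials for a no-storage series circuit.

    With no storage, the same flow (set equal to the transpiration demand)
    passes through every segment, so the potential drop across a segment is
    flow / conductance, by analogy with Ohm's law.
    """
    psi_root = psi_soil - transpiration / k_soil_root
    psi_xylem = psi_root - transpiration / k_root_xylem
    psi_leaf = psi_xylem - transpiration / k_xylem_leaf
    return psi_root, psi_xylem, psi_leaf


# Hypothetical values in arbitrary but mutually consistent units.
psi_root, psi_xylem, psi_leaf = water_potentials_in_series(
    psi_soil=-0.1, transpiration=2.0,
    k_soil_root=10.0, k_root_xylem=20.0, k_xylem_leaf=40.0,
)
```

Higher transpiration demand or drier soil drives leaf water potential more negative, which is the quantity used to modulate stomatal conductance under drought stress.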
PHS advances the physical and empirical basis of the CLM vegetation hydrodynamics scheme. Previously used soil moisture stress functions (as in SMS) tend to lack either a strong physical or empirical justification and are a major source of uncertainty in land models. PHS, in adopting a plant hydraulic framework, incorporates more physical root water uptake, following Darcy's law, and a stress formulation based on avoiding excessive xylem tension. Likewise, PHS opens avenues for better empirical constraints on vegetation water use. The model parameters have physical meaning, and new prognostic vegetation water potential can be validated with field observations and, potentially, satellite remote sensing products (e.g., Anderegg et al., 2018). In CLM5, maximum stomatal conductance is obtained from the Medlyn "empirical-optimal" conductance model (Medlyn et al., 2011), rather than the Ball-Berry stomatal conductance model that was utilized in CLM4.5 and prior versions of the model. The Ball-Berry implementation used a single slope parameter for all C3 plants. In a recent study, Lin et al. (2015) estimated PFT-dependent slope parameters for the Medlyn model, which have been successfully used in CABLE. The slope parameters used in CLM5 are from CABLE. Note that the slope parameter value is indicative of the plant's water use strategy: PFTs with a high slope parameter have high stomatal conductance per unit photosynthesis and therefore a low water use efficiency (WUE). As discussed by Franks et al. (2017) and Franks et al. (2018), the primary difference between the two stomatal models, after accounting for different slope parameters, relates to the effects of extreme low and high vapor pressure deficit on stomatal conductance.
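The role of the slope parameter can be seen in the commonly cited form of the Medlyn et al. (2011) model, g_s = g_0 + 1.6 (1 + g_1/√D) A/C_a. The sketch below evaluates that expression; the function name, argument names, units, and the example parameter values are illustrative assumptions and are not CLM5 parameter values.

```python
import math


def medlyn_conductance(assimilation, co2, vpd_kpa, g1, g0=0.0):
    """Stomatal conductance to water vapor from the Medlyn et al. (2011) form.

    assimilation : net photosynthesis A (umol m-2 s-1)
    co2          : CO2 concentration C_a (umol mol-1)
    vpd_kpa      : vapor pressure deficit D (kPa)
    g1           : slope parameter (kPa^0.5)
    g0           : residual conductance (mol m-2 s-1)
    """
    return g0 + 1.6 * (1.0 + g1 / math.sqrt(vpd_kpa)) * assimilation / co2


# Same photosynthesis and environment, two hypothetical slope parameters.
gs_high_slope = medlyn_conductance(10.0, 400.0, 1.0, g1=4.0)
gs_low_slope = medlyn_conductance(10.0, 400.0, 1.0, g1=2.0)
```

For the same photosynthetic rate, the higher slope parameter gives the higher conductance, which corresponds to more water lost per unit carbon gained and hence a lower WUE.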
Two other relatively minor changes are included in CLM5.
(1) The trigger for stress deciduous PFT phenology is augmented with an antecedent precipitation requirement (Dahlin et al., 2015). This additional trigger was implemented to reduce the occurrence of anomalous green-up during the dry season in many semiarid regions, which was driven by upward water movement from wet to dry soil layers that triggered unrealistic leaf-out even when there had been no recent rainfall. More recent work has demonstrated a broad array of stress deciduous phenology strategies that are not possible to resolve in the current CLM PFT scheme (Adole et al., 2018; Dahlin et al., 2017), but this complexity could potentially be represented in the Functionally Assembled Terrestrial Ecosystem Simulator (FATES; see section 2.3.12).
(2) The rooting profiles, which were inconsistent for water and carbon in CLM4.5, were updated to be consistent in CLM5. The Jackson et al. (1996) rooting profile is preferred over the Zeng (2001) profile as it produces more realistic vertical soil C profiles, though the Zeng (2001) profile is retained as an option.
Lastly, ozone damage to vegetation is included as an optional feature in CLM5. The ozone damage parameterization is the same as implemented by  based on ozone damage response data compiled by Lombardozzi et al. (2013). Ozone damage to vegetation is applied directly and independently to photosynthesis and stomatal conductance for three broad PFT classes (broadleaf trees and shrubs, needleleaf trees and shrubs, and crops and grasses) based on the cumulative uptake of ozone. Cumulative uptake of ozone is calculated as the ozone concentration multiplied by stomatal conductance, integrated through time, to account for the fact that ozone primarily damages vegetation once it enters the leaf and total damage is dependent on the time period of exposure. The damage decays over the growing season to account for the fact that plants acquire new, undamaged leaves throughout the growing season and also decays over the leaf life span for evergreen plant types.

2.3.7. Carbon Dynamics
CLM5 applies a fixed C allocation scheme for woody vegetation where allocation to aboveground and belowground biomass is held constant. The decision not to use the dynamic allocation scheme based on net primary productivity (NPP), as was used in CLM4 and CLM4.5, was driven by the fact that observations indicate that plant biomass saturates with increasing productivity, which is inconsistent with the behavior in CLM4 and CLM4.5 where biomass perpetually increases with increasing productivity (Negrón-Juárez et al., 2015). Because the prior allocation rules implicitly led to a saturation of leaf carbon allocation, this change does lead to a possible trade-off between accuracy of biomass and accuracy of leaf area and remains a large uncertainty and an area of active research.
Soil carbon decomposition processes are unchanged from CLM4.5 to CLM5, but assessment with a new metric for the temperature sensitivity of apparent soil carbon turnover times pointed to the need to adjust the parameter that controls intrinsic depth limitation on soil carbon turnover toward a weaker depth limitation (rather than the strong depth limitation in CLM4.5) and to adjust the parameter that controls soil moisture limitation on soil carbon turnover rates in dry soils to a wetter soil moisture level than that used in CLM4.5. Note that vertical C and N processes are only calculated for hydrologically active soil layers (see section 2.3.2), which vary in space. The concept of FUN assumes that N uptake requires the expenditure of energy in the form of C (in CLM4.5 there was no C expenditure for N uptake), often a significantly large portion of NPP (Doughty et al., 2018; Marschner, 1995), and further, that there are numerous potential sources of N in the environment which a plant may exchange for C: symbiotic biological N fixation, arbuscular-mycorrhizal and ecto-mycorrhizal (two types of root fungus) uptake, direct root uptake, and leaf N retranslocation. The ratio of C expended to N acquired is therefore the C cost, or exchange rate, of N acquisition. This C is assumed to be respired as it is used for N acquisition. As FUN calculates the rate of symbiotic N fixation, this N is passed straight to the plant, as opposed to passing through the soil mineral N pool. CLM5 now separately calculates rates of free-living N fixation as a function of evapotranspiration (modified from Cleveland et al., 1999), which is added to the soil inorganic ammonium (NH4+) pool. Previous versions of CLM added the N fixation flux, which was calculated as a function of NPP (without an associated C cost; Cleveland et al., 1999; Thornton et al., 2007; Wieder et al., 2015), to the soil mineral N pool.
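The notion of a C cost as an exchange rate can be illustrated with a small sketch. The rule used here, spending C on each pathway in proportion to the inverse of its cost, is an assumption chosen for illustration and should not be read as the FUN algorithm itself; the pathway names and cost values are hypothetical.

```python
def acquire_nitrogen(carbon_available, cost_by_pathway):
    """Split C among N-acquisition pathways and return (C spent, N gained).

    carbon_available: C that can be spent on N acquisition (g C)
    cost_by_pathway:  mapping of pathway name to C cost (g C per g N)

    Assumed rule: C spent on a pathway is proportional to 1 / cost,
    so cheaper pathways attract more of the available C.
    """
    weights = {name: 1.0 / cost for name, cost in cost_by_pathway.items()}
    total_weight = sum(weights.values())
    carbon_spent = {name: carbon_available * w / total_weight
                    for name, w in weights.items()}
    nitrogen_gained = {name: carbon_spent[name] / cost_by_pathway[name]
                       for name in cost_by_pathway}
    return carbon_spent, nitrogen_gained


# Hypothetical costs: fixation is the most expensive pathway here.
costs = {"fixation": 20.0, "mycorrhizal": 10.0, "direct_root": 5.0}
spent, gained = acquire_nitrogen(7.0, costs)
effective_cost = sum(spent.values()) / sum(gained.values())
print(spent, gained, round(effective_cost, 2))
```

With these numbers most of the N comes from the cheapest pathway, and fixation contributes little even though it is permitted, consistent with the idea that an available pathway is not necessarily used when cheaper ones exist.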
The static plant carbon:nitrogen (C:N) ratios utilized in CLM4 and CLM4.5 are replaced with variable plant C:N ratios, as in Zaehle and Friend (2010), which allows plants to adjust their C:N ratio, and therefore their leaf N content, with the cost of N uptake (Ghimire et al., 2016). The implementation of a flexible C:N ratio means that the model no longer relies on instantaneous down-regulation of potential photosynthesis rates based on soil mineral N availability to represent nutrient limitation. Furthermore, stomatal conductance in CLM5 is based on the N-limited photosynthesis rate rather than on potential N-unlimited photosynthesis as in CLM4 and CLM4.5, thereby allowing for more realistic coupling between plant C and water cycles (Medlyn et al., 2016).
Finally, the Leaf Use of Nitrogen for Assimilation (LUNA; Ali et al., 2016; Xu et al., 2012) model is incorporated. The model allocates N to maximize daily net photosynthetic carbon gain under the following two key assumptions: (1) N allocated for light capture, electron transport, and carboxylation are colimiting; and (2) respiratory nitrogen is allocated to maintain dark respiration determined by Vcmax25. Compared to traditional photosynthetic capacity models, a key advantage of LUNA is that it is able to predict potential acclimation of photosynthetic capacities for different environmental conditions as determined by temperature, radiation, CO2 concentrations, day length, and humidity. Importantly, the inclusion of LUNA means that Vcmax25, the maximum rate of carboxylation, is a prognostic model quantity, dependent on leaf N per unit area and environmental conditions, whereas it was fixed for each PFT in CLM4 and CLM4.5.

Land Management Processes
Representation of human management of the land (agriculture and wood harvest) is augmented in several ways. Critically, the introduction of the capability to dynamically adjust land unit weights during a simulation means that the crop model can be run coincidentally with prescribed land use change, which significantly expands the capabilities of the model. The CLM4.5 crop model is extended to operate globally through the addition of rice and sugarcane as well as tropical varieties of corn and soybean (Badger & Dirmeyer, 2015; Levis et al., 2018). These crop types are added to the existing temperate corn, temperate soybean, spring wheat, and cotton crop types. Industrial N fertilization amounts and irrigation-equipped area are updated annually based on crop type and geographic region through the land use time series data set. The irrigation trigger is updated to remove the dependence on the CLM4.5 plant SMS calculation (replaced in CLM5 with PHS, section 2.3.6) and instead uses a target soil moisture level, which was tuned to get reasonable irrigation amounts. Additional minor changes to the crop model include the following: (1) Crop phenological triggers vary by latitude for selected crop types, which is a temporary solution that generates more realistic global crop planting dates outside of the temperate regions for which the growing degree day-based crop planting window was originally parameterized (though serious crop planting window errors still occur), and (2) grain C and N are transferred during crop harvest into a 1-year product pool, with the C needed to seed the next season's crops removed from grain C, while the rest of the crop vegetation residue is transferred to litter C and N pools. To better match wood harvest inventories specified in the LUH2 data set, mass-based, rather than area-based, wood harvest is applied.
Shifting cultivation is represented by calculating unrepresented gross transitions in the LUH2 time series and then removing aboveground C to account for the conversion of the gross forest PFTs to crop or pasture PFTs not included by the net transitions. Shifting cultivation is an optional feature of CLM5 and is off by default and in all simulations considered in this paper.
Changes to urban modeling capabilities include the introduction of several human heat stress indices for both urban and rural areas that are calculated and output by default (Buzan et al., 2015). A more sophisticated and realistic building space heating and air conditioning submodel that prognoses interior building air temperature and includes more realistic space heating and air conditioning waste heat factors is incorporated (Oleson & Feddema, 2019).

Fire
The fire parameterization in CLM5 simulates four types of fire: agricultural fires in cropland, deforestation fires in tropical closed forests, peat fires, and nonpeat fires outside cropland and tropical closed forests (see Li & Lawrence, 2017, for details; Li, Wigmosta, et al., 2013; Li et al., 2012). Burned area is affected by climate and weather conditions, vegetation composition and structure, and human activity. Once burned area is determined, the impact of the fire is calculated, including biomass and peat C losses, fire-induced vegetation mortality, adjustment of the vegetation C:N pools, and fire C and other trace gas emissions. The fire model is mainly unchanged from CLM4.5 except for a modified scheme for the dependence of fire occurrence and spread on fuel wetness for nonpeat fires outside cropland and tropical closed forests and the removal of the dependence of agricultural fires on fuel load. The CLM5 fire model, when coupled to the Community Atmosphere Model, can simulate and transfer emissions of total C, aerosols (e.g., black C, organic C, and DMS), greenhouse gases (e.g., CO2, N2O, and CH4), and other trace gases (e.g., CO, NO, NO2, NH3, HONO, SO2, and over 15 nonmethane hydrocarbon species) to the atmosphere (Ford et al., 2018). Fire emissions are estimated at the PFT level from total fire C emissions, a conversion factor from C to dry matter (DM; 0.5 g C/g DM), and emission factors (g species/g DM) that convert DM burned into emissions. The emission factors for each species used in CLM5 are derived from up-to-date inventories compiled from field and laboratory studies (Andreae & Merlet, 2001, updated to 2016; Akagi et al., 2011, updated to 2014; and references therein). The vertical distribution of fire emissions is derived from PFT-dependent maximum injection heights (4.3 km for needleleaf trees, 3 km for other boreal and temperate trees, 2.5 km for tropical trees, 2 km for shrublands, and 1 km for grass and croplands). These injection heights are compiled from satellite-based observations of fire smoke plumes (Val Martin et al., 2010; Val Martin et al., 2018). The fire emissions module is not active by default in CESM2 but is available as a research option.
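The emission calculation described above reduces to two conversions, sketched here. The conversion factor and injection heights are taken from the text; the function name, the dictionary keys, and the example emission factor are hypothetical and serve only as illustration.

```python
C_PER_DRY_MATTER = 0.5  # g C per g DM, as stated in the text

# PFT-dependent maximum injection heights (km), as listed in the text.
MAX_INJECTION_HEIGHT_KM = {
    "needleleaf_tree": 4.3,
    "other_boreal_temperate_tree": 3.0,
    "tropical_tree": 2.5,
    "shrubland": 2.0,
    "grass_cropland": 1.0,
}


def species_emission(fire_c_emission, emission_factor):
    """Convert fire C emissions into emissions of one species.

    fire_c_emission: total fire C emission for a PFT (g C)
    emission_factor: g species emitted per g dry matter burned
    """
    dry_matter_burned = fire_c_emission / C_PER_DRY_MATTER
    return dry_matter_burned * emission_factor


# 100 g C burned corresponds to 200 g DM; the emission factor used here
# is a made-up value, not one from the cited inventories.
print(species_emission(100.0, 0.004))
print(MAX_INJECTION_HEIGHT_KM["needleleaf_tree"])
```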

Parameters
Parameters of CLM5 were defined where possible from literature values and meta-analyses, with some adjustments made to reduce large model biases, while accounting for errors in observational data sets and in the globally applied model structure. Default parameter values for all model parameters can be found in the CLM5 technical description. A brief description of the rationale for the values used for selected parameters is included here. Note that during the process of finalizing the CLM5 parameter set, we found several instances where parameter value trade-offs needed to be made related to joint goals of relatively small biases for quantities such as GPP and LAI and reasonably high PFT survivability rates (see section 4.2). Fisher et al. (2019) provided a more detailed assessment of CLM5 C and N cycle sensitivity to parametric uncertainty as well as additional discussion of parameter definition for CLM5. Note that ILAMB was not used during the parameter adjustment process.

Plant Hydraulics Parameters
The plant hydraulics scheme introduces four new parameters for each PFT (Kennedy et al., 2019): the water potential at which half of the hydraulic conductivity of each plant element (root, stem, shaded leaf, and sunlit leaf) is lost (p50), the conductivity of the soil-root interface (krmax), the conductivities at the interfaces between each of the plant elements (kmax), and the cavitation vulnerability curve shape-fitting parameter (ck). The code is structured so that in future investigations, parameter values for each plant element can be adjusted individually, but in the released version all plant elements use the same value. Estimates of p50 across PFTs are obtained from analysis of the data set presented by Choat et al. (2010). Large data sets on comparable plant tissue conductivities (kmax and krmax) are not widely available. Further, because the resistances of the plant and roots act in series, the minimum conductivity among the plant elements largely controls the overall plant conductivity. Plant conductivities are therefore calibrated as follows: kmax values are set uniformly high, and krmax is considered a free tuning parameter. The introduction of PHS represents the first instance where a plant hydrodynamic model has been applied globally across all biomes in CLM, or indeed, in any ESM of which we are aware. Consequently, the plant hydraulics parameter values included in the released CLM5, which were defined in a generally ad hoc manner, should be considered an initial estimate of reasonable parameter values that can and should be refined as required.
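To show how p50 and ck shape the loss of conductivity, the sketch below uses one functional form that is consistent with the definition of p50 (half of the conductivity is lost when the water potential equals p50). The specific form, the function name, and the parameter values are assumptions for illustration; Kennedy et al. (2019) and the CLM5 technical description give the formulation actually used.

```python
def conductance_fraction(psi, p50, ck):
    """Fraction of maximum conductance remaining at water potential psi.

    psi and p50 are negative water potentials (MPa); ck is a
    dimensionless shape parameter. Assumed form: 2 ** -((psi / p50) ** ck).
    """
    # Clamp so that non-negative psi (no tension) gives no loss.
    ratio = max(psi / p50, 0.0)
    return 2.0 ** (-(ratio ** ck))


# Hypothetical parameter values; larger ck gives a sharper transition.
P50, CK = -2.25, 3.95
for psi in (0.0, -1.0, -2.25, -4.0):
    print(psi, round(conductance_fraction(psi, P50, CK), 3))

# Segment conductance is then kmax scaled by this fraction, for example
# k = kmax * conductance_fraction(psi, P50, CK).
```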

Vegetation Parameters
Several vegetation parameters were updated relative to those used in CLM4 or CLM4.5.
(1) PFT-specific values for the slope of the Medlyn stomatal conductance (medlynslope) were adapted from Medlyn et al. (2011) as documented in Franks et al. (2017). (2) PFT-specific values of the respiration model intercept (lmr_intercept_atkin) were derived from Atkin et al. (2015). (3) Leaf longevity (leaf_long), target leaf CN ratio (leafcn), and specific leaf area (slatop) were all derived from the mean PFT-specific values identified in the TRY database (Kattge et al., 2011). With our final set of default CLM5 parameters, the productivity for boreal and temperate needleleaf evergreen trees is too high, particularly when the LUNA model is active. To calibrate model performance, leafcn was increased to one standard deviation above the mean reported value for these PFTs.
The parameters for carbon allocation are as follows: ratio of new coarse root to new stem allocation, croot_stem; ratio of new fine root to new leaf allocation, froot_leaf; and ratio of new stem to new leaf allocation, stem_leaf. The ratios of tissue biomass are the basis for the fixed carbon allocation scheme used in CLM5, which is an oversimplification of real allometric ratios that vary as plants age. Thus, it is difficult to directly connect the parametric allocation ratios used in CLM5 to those obtained from databases. The CLM5 allocation parameters (croot_stem, froot_leaf, stem_leaf, and the ratio of new live wood to new total wood) were initially derived from an analysis by Ghimire et al. (2016) but were further adjusted to reduce large biases in LAI in deciduous PFTs. CLM4.5 down-regulated leaf allocation with high NPP, whereas CLM5 adopts a fixed allocation scheme to rectify issues with woody biomass accumulation in tropical forests identified by Negrón-Juárez et al. (2015). For CLM5, allocation to stems and roots was increased for many PFTs, potentially compensating for the removal of a variable allocation parameterization, and potentially also contributing to low growth and survival in more marginal climate areas. This set of parametric trade-offs reflects the need for a whole-plant-based (as opposed to big leaf, tissue-based) allocation scheme, as is envisaged for future generations of the model (Fisher et al., 2018).
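The three allocation ratios determine how each unit of newly allocated C is divided among tissues. The sketch below works this out under simplifying assumptions: growth respiration and the live wood versus dead wood split are ignored, and the ratio values are hypothetical rather than CLM5 defaults.

```python
def allocation_fractions(froot_leaf, stem_leaf, croot_stem):
    """Fractions of new C going to each tissue under fixed ratios.

    For every unit of C allocated to new leaf, froot_leaf goes to fine
    root, stem_leaf to stem, and stem_leaf * croot_stem to coarse root.
    """
    total = 1.0 + froot_leaf + stem_leaf * (1.0 + croot_stem)
    return {
        "leaf": 1.0 / total,
        "fine_root": froot_leaf / total,
        "stem": stem_leaf / total,
        "coarse_root": stem_leaf * croot_stem / total,
    }


# Hypothetical ratios chosen for easy arithmetic.
fractions = allocation_fractions(froot_leaf=1.0, stem_leaf=2.0, croot_stem=0.5)
print(fractions)
```

Because the ratios are fixed, raising stem_leaf or froot_leaf lowers the leaf fraction at every productivity level, which is the source of the trade-off between biomass and LAI biases noted in the text.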

Nitrogen Model Parameters
The introduction of the FUN model to CLM5 adds numerous parameters describing the costs of N acquisition from the environment and control on the flexibility of the tissue C:N ratios. Many of these parameter values are constrained by data but still include some uncertainty since they represent processes (N uptake, fixation, and allocation) that are sparsely documented in the literature. Nitrogen cycle models in general have large structural and parametric uncertainty. The maximum fraction of net carbon assimilation that can be spent (at a PFT level) on fixation is a proxy for the fraction of N fixers (FUN_fracfixers) in an ecosystem. FUN_fracfixers is set at 0.25 for each PFT and 0 for all CFTs except temperate and tropical soy where it equals 1. Note that although FUN_fracfixers allows fixation, this does not necessarily mean it occurs if there are cheaper C costs for N acquisition from other pathways. Parameters for fixation cost (a_fix, b_fix, c_fix, and s_fix) were derived from Houlton et al. (2008). The relative values of the six parameters of the active cost of N uptake (akc_active, akn_active, ekc_active, ekn_active, kc_nonmyc, and kn_nonmyc) were taken from Brzostek et al. (2014). These parameters shape the C cost curves for the mycorrhizal and direct root uptake pathways. Note that N uptake costs of some PFTs were adjusted from Brzostek et al. (2014) values to reduce biases in GPP, especially broadleaf tropical deciduous trees and C4 grass, which Brzostek et al. (2014) did not provide. The parameters that adjust C expenditure on N uptake with changing environmental cost and existing tissue ratios (fun_cn_flex_a, fun_cn_flex_b, and fun_cn_flex_c) were determined via an off-line calibration exercise to achieve variations in tissue C:N ratios for the typical modeled N-cost range to be consistent with the range of observations. 
These parameters allowed FUN, which was originally parameterized for models with fixed plant C:N ratios, to work with the variable plant C:N ratios in CLM5. The fraction of ectomycorrhizal fungi (per_ecm) was derived from Shi et al. (2016).

FATES
Included as an option with CLM5 is FATES. FATES is a cohort model of vegetation competition and coexistence, allowing a representation of the biosphere which accounts for the division of the vegetated land into successional stages and for competition for light between height-structured cohorts of representative trees of various PFTs. FATES allows the prediction of biome boundaries directly from plant physiological traits via their competitive interactions and includes the SPITFIRE model of Thonicke et al. (2010), modular allometry and allocation schemes, flexible trait-based PFT definition, interactive logging, and plant hydrodynamics based on Christoffersen et al. (2016). FATES fast-timescale physiological processes are based on CLM but resolved for a height-structured and multi-PFT canopy. FATES is not active by default in CLM5 and is not active within any simulations assessed in this manuscript. Open-source development and application of the codebase is ongoing (https://github.com/NGEET/fates).

Data Assimilation Capabilities
The capabilities for conducting data assimilation with CLM5 using the Data Assimilation Research Testbed (DART, Anderson et al., 2009) continue to improve, particularly with respect to computational efficiency. The CLM-DART system relies heavily on the CESM multi-instance capability and other workflows. The latest distribution of DART includes full support for CLM5, both in terms of the initial setup scripts provided to create a multi-instance case suitable for DA and the assimilation scripts called by CESM, and for the DART executables themselves. CLM-DART has the ability to assimilate many land observation types using the general DART framework, including in situ and remote sensing measurements of soil moisture and temperature, eddy covariance flux tower measurements of carbon and water fluxes, and most recently LAI and aboveground biomass (Fox et al., 2018). Previous work with CLM-DART has concentrated on hydrometeorology and describes capabilities to assimilate snow cover fraction (Zhang et al., 2014), AMSR-E brightness temperature for snow depth (Kwon et al., 2016), soil moisture, and GRACE total water storage (Zhao & Yang, 2018). Work is underway to add the capability to assimilate solar-induced fluorescence and the latest generation of spaceborne soil moisture observations.

Simulations and Assessment
3.1. Simulations
Table 2 lists the CLM4, CLM4.5, and CLM5 simulations that have been performed. This set of experiments provides a comprehensive assessment of CLM across model generations and across common CLM configurations, as well as the basis to assess the sensitivity to forcing data sets. The assessment of three model versions allows readers to understand the progression of model performance and provides context for CESM1 versus CESM2. These include simulations that apply LAI prescribed from satellite phenology (SP) and simulations with prognostic vegetation state and active biogeochemistry (BGC). Note that only CLM5 has the capability to dynamically simulate crop management and crop management change through time, so this simulation is defined as CLM5 BGC crop. All simulations were completed at a resolution of 0.9° latitude by 1.25° longitude and, except where indicated, include all required historical or future CLM forcings (as applicable for each configuration), including time series of CO2, aerosol deposition, N deposition, and land use change. The projection period (2015-2300) simulations, which used the "anomaly forcing" method, and the no land use change simulations are not assessed here but are available to the community via the data portal. The +N and +CO2 simulations are 20-year-long simulations starting in year 1995 that replicate the CLM4, CLM4.5, and CLM5 BGC simulations but with a step increase of (1) nitrogen deposition (5 g N·m−2·year−1 above ambient evenly distributed over the year) and (2)
The standard CLM spin-up protocol is used to achieve carbon, water, and energy equilibrium at the start of the simulation.
The year 1850 equilibrium conditions are calculated by integrating over a repeating 20-year period of an atmospheric reanalysis data set (i.e., years 1901 to 1920 from the forcing data sets described below) along with fixed atmospheric CO2, N deposition, aerosol deposition, and land use (note that wood harvest is set to zero during spin-up). As with earlier versions of CLM, it is prohibitively expensive to run the full model for the period of time required to achieve a quasi steady state. Thus, the spin-up procedure involves a new "accelerated decomposition" methodology, updated from that introduced in Thornton and Rosenbloom (2005) and Koven et al. (2013), with modifications for CLM5 to both add a geographic term to the acceleration and also accelerate the stem and coarse root C turnover. During the accelerated decomposition phase, the decomposition rates of the slow C pools (e.g., the long turnover time soil C and coarse woody debris pools) are artificially increased to allow faster convergence on the equilibrium state (see section 21.8 of the CLM5 technical description for details). The CLM historical simulations assessed here were initialized from spin-up simulations that consisted of ~400 years in accelerated mode, followed by an additional 400-800 years in "normal mode." Though the length of time for spin-up varies across configurations, by the end of the spin-up, the global total ecosystem C is drifting by less than 0.02 Pg C/year, and fewer than 5% of grid cells are out of C balance by more than 1 g C·m−2·year−1. For CLM5, initial/cold start (prior to spin-up) soil C and N stocks are increased substantially over earlier model versions, which was done to permit vegetation establishment in harsh environments (where the need for plants to pay for N uptake can inhibit growth under marginal conditions). In some high-latitude grid cells, however, vegetation does not survive, and soil C turnover is slow due to cold climate conditions. In these locations, the high initial soil C stocks do not deplete during the accelerated spin-up, which leads to unrealistically high equilibrium soil C stocks in those grid cells. To circumvent this undesirable feature, the C stocks of the slow C pools are set to zero where vegetation C is <0.1 g C/m2 by the end of the accelerated spin-up phase.
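The two equilibrium criteria and the reset of slow pools can be written as simple checks. This is a sketch only: the function names are hypothetical, and it assumes that the 5% criterion is a plain count of grid cells rather than an area-weighted fraction, which the text does not specify.

```python
def spinup_converged(global_c_drift, cell_imbalances,
                     drift_limit=0.02, cell_limit=1.0, max_fraction=0.05):
    """Check the two spin-up criteria quoted in the text.

    global_c_drift:  drift of global total ecosystem C (Pg C/year)
    cell_imbalances: per-grid-cell C imbalance (g C m-2 year-1)
    """
    n_out = sum(1 for value in cell_imbalances if abs(value) > cell_limit)
    fraction_out = n_out / len(cell_imbalances)
    return abs(global_c_drift) < drift_limit and fraction_out < max_fraction


def reset_slow_pools(slow_c, veg_c, veg_threshold=0.1):
    """Zero the slow C pools where vegetation C (g C/m2) is below threshold."""
    return [0.0 if veg < veg_threshold else slow
            for slow, veg in zip(slow_c, veg_c)]


# One cell out of 100 exceeds the per-cell limit: still converged.
print(spinup_converged(0.01, [0.1] * 99 + [2.0]))
# The first cell has no surviving vegetation, so its slow pool is reset.
print(reset_slow_pools([500.0, 800.0], [0.05, 120.0]))
```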

Meteorological Forcing Data Sets
For comparison, we utilize three historical meteorology/climate forcing data sets, which are drawn from standard forcing data sets that will be used within LS3MIP (Van den Hurk et al., 2016).

GSWP3v1
The Global Soil Wetness Project forcing data set (GSWP3) is the default forcing data set for LS3MIP (Van den Hurk et al., 2016) and LUMIP land-only simulations. It is a 3-hourly 0.5° global forcing product that was developed for the third phase of GSWP (http://hydro.iis.u-tokyo.ac.jp/GSWP3/). It is based on the 20th Century Reanalysis version 2 performed with the NCEP model (Compo et al., 2011). The reanalysis was dynamically downscaled to T248 (0.5°) resolution with the Global Spectral Model using a spectral nudging technique (Yoshimura & Kanamitsu, 2008). Bias corrections for temperature, precipitation, and longwave and shortwave radiation were made using CRU TS v3.21 (Climate Research Unit, Jones & Harris, 2013), GPCCv7 (Global Precipitation Climatology Centre, Schneider et al., 2014), and Surface Radiation Budget data sets, respectively. A wind-induced undercatch correction was applied.

CRUNCEPv7
CRUNCEP is the default forcing data set used in the Global Carbon Project TRENDY simulations (Le Quéré et al., 2018) and MsTMIP simulations (Huntzinger et al., 2013). It is also a secondary forcing data set for LS3MIP land-only simulations. It is a 6-hourly 0.5° global forcing product which is a combination of the CRU TS v3.24 monthly climate data set (Jones & Harris, 2013) and NCEP reanalysis (Kalnay et al., 1996). The reanalysis is only used to generate diurnal and daily anomalies added to CRU TS monthly means. Precipitation, temperature, cloudiness, and relative humidity are all based on CRU, while longwave radiation, pressure, and wind speed are taken directly from NCEP.
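The idea of adding reanalysis anomalies to an observed monthly mean can be sketched as follows. The additive form shown is an assumption that suits a variable such as temperature; the actual CRUNCEP processing may differ, particularly for precipitation, and the function name and numbers are hypothetical.

```python
def blend_monthly_mean_with_anomalies(monthly_mean, reanalysis_series):
    """Impose sub-monthly reanalysis variability on an observed monthly mean.

    monthly_mean:      observed monthly mean (e.g., from a gridded climatology)
    reanalysis_series: sub-monthly reanalysis values for the same month
    """
    reanalysis_mean = sum(reanalysis_series) / len(reanalysis_series)
    # Anomalies are departures of the reanalysis from its own monthly mean.
    return [monthly_mean + (value - reanalysis_mean)
            for value in reanalysis_series]


# Reanalysis values average 8 (degrees C) but the observed mean is 10:
# the variability is kept while the mean is shifted to match observations.
blended = blend_monthly_mean_with_anomalies(10.0, [4.0, 8.0, 12.0, 8.0])
print(blended, sum(blended) / len(blended))
```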

WATCH/WFDEI
WATCH is a 3-hourly or 6-hourly 0.5° global forcing product. It uses the CRU TS2.1 (Mitchell & Jones, 2005) and GPCCv6 data sets to provide the mean climate and the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis (ERA-40) product to distribute the mean monthly climate to daily and hourly estimates. Years 1958-2001 are based directly on ERA-40, whereas years 1901-1957 are based on reordered ERA-40 data. Corrections have been applied for seasonal- and decadal-scale variations in the effects of tropospheric and stratospheric aerosol loading on solar radiation, thereby accounting for the effects of global "dimming" and "brightening." Additional detail about the WATCH data set is available in Weedon et al. (2011). Note that simulations with WATCH forcing only run through year 2001. We also utilize the WFDEI product, which applies the WATCH methodology to the ERA-Interim reanalysis data set (Weedon et al., 2014). The WFDEI data set covers the period 1979-2012. Due to the short record, we only use the WFDEI data set for SP simulations.

ILAMB
The International Land Model Benchmarking (ILAMBv2.1, Collier et al., 2018) package is used to assess the models. ILAMB is an open-source land model evaluation system that operates on global-, regional-, and site-level data and provides a hierarchical scoring system to indicate model fidelity. The ILAMBv2 version used here integrates analysis for 28 variables utilizing more than 60 data sets and data products. For each variable, ILAMB produces statistics, maps, time series, and metrics for annual mean, bias, relative bias, RMSE, seasonal cycle phase, spatial distribution, interannual variability, and variable-to-variable assessments. Both global and regional assessments are included.
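As a rough illustration of how error statistics such as bias and RMSE can be turned into scores between 0 and 1, consider the sketch below. The exponential mapping and the choice of the observed variability as the normalizing scale are assumptions made for illustration; the actual ILAMB definitions are given by Collier et al. (2018). All function names and data values are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def bias(model, obs):
    """Mean model-minus-observation difference."""
    return mean(model) - mean(obs)


def rmse(model, obs):
    """Root-mean-square error between paired values."""
    return math.sqrt(mean([(m - o) ** 2 for m, o in zip(model, obs)]))


def error_score(error, scale):
    """Map an error to (0, 1]; a perfect match scores 1 (assumed mapping)."""
    return math.exp(-abs(error) / scale)


obs = [1.0, 2.0, 3.0, 4.0]
model = [2.0, 3.0, 4.0, 5.0]
# Normalize by the spread of the observations about their mean.
obs_spread = math.sqrt(mean([(o - mean(obs)) ** 2 for o in obs]))
print(bias(model, obs), rmse(model, obs),
      round(error_score(bias(model, obs), obs_spread), 3))
```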
The CLM diagnostics package provides a vast set of additional plots and tables, including plots for many variables that are not included in ILAMB as well as seasonal comparisons against selected observed data sets. CLM diagnostic package results are available here for reference (http://www.cesm.ucar.edu/experiments/cesm2.0/land/diagnostics/clm_diag_PCKG.html).

Results
In this section, we present a representative sample of analyses that are selected to emphasize strengths and weaknesses of CLM5, relative to CLM4 and CLM4.5, as well as to highlight new features of the model. Due to the breadth of model improvements and the scope of the model output, the assessment presented here is necessarily incomplete. Companion manuscripts focused on CLM5 for the CESM2 Special Issue provide more in-depth assessment of specific aspects of the model (CO 2 and N-additions response, Wieder et al.,

Assessment with ILAMB
Encouragingly, there is a general progression in the quality of the simulations across model generations. CLM5 outperforms CLM4 for the majority of assessed variables (Figure 3, see also http://www.cesm.ucar.edu/experiments/cesm2.0/land/diagnostics/clm_diag_ILAMB.html). We refer the reader to ILAMB output where vast amounts of additional figures and statistics are available. The improvements from CLM4.5 to CLM5 are comparatively subtle with several variables showing improvement (biomass, burned area, LAI, net ecosystem carbon balance, latent heat, terrestrial water storage, albedo, net ecosystem exchange, and ecosystem respiration) but others showing degradation (soil carbon, runoff, surface net radiation, and CO2). The broad improvements across model generations are an emergent feature of the comprehensive model development activities described in section 2. Definitive identification of the source of particular improvements (or degradation) is beyond the scope of this paper, but some insight is provided in the analyses below. Note that ILAMB results should be interpreted carefully. The summary scores shown in Figure 3 reflect integrated scores across multiple metrics (RMSE, bias, interannual variability, spatial pattern, etc.) and for some variables also multiple observational data sets. An overall improved or degraded score for a particular variable can be a result of a mix of scores for individual metrics. For runoff, for example, the overall score is degraded in CLM5 which, when one drills down into ILAMB output, comes from a combination of degraded interannual variability, improved spatial distribution, and a slightly greater mean bias (shifting from a low bias in CLM4 to a high bias of similar magnitude in CLM5 when forced with GSWP3v1; when forced with CRUNCEPv7, all model versions show a large low bias in runoff). Consequently, the overall reduced score for runoff should be considered within this more nuanced perspective.
ILAMB scores indicate a degradation in the simulations of soil carbon stocks from CLM4.5 to CLM5, but the observed estimates are known to be highly uncertain. An alternative soil carbon metric that evaluates the models against apparent soil carbon turnover time shows an improvement from CLM4.5 to CLM5 (section 4.6). This apparent disagreement between two metrics of soil carbon highlights one of the challenges of benchmarking. When there is disagreement across metrics, we argue that the metric that emphasizes a model process is more meaningful than one that simply evaluates a stock or flux. Consequently, in this instance, our interpretation (based partly on expert judgment) is that the representation of soil carbon is actually slightly improved in CLM5, even though the ILAMB assessment indicates otherwise. We refer the readers to Collier et al. (2018) for more information on how observed data set uncertainty is accounted for in ILAMB and note that improved treatment of observational data uncertainty is ongoing within the ILAMB project.
ILAMB also assesses functional relationships between two variables (e.g., precipitation vs. GPP or LAI). CLM5 performs better than CLM4 or CLM4.5 for the majority of the functional relationships assessed (Figure 4), suggesting improved process representation in CLM5. In particular, the relationships between GPP and climate variables such as solar radiation and precipitation are improved, though there is a slight degradation (CLM4.5 to CLM5) of the relationship between GPP and surface air temperature. Relationships between burned area and climate are also improved (see ILAMB plots), with burned area correctly peaking at average annual precipitation rates of 2.5 to 5 mm/day, an ecoclimatic regime that is dry enough for fire but productive enough to establish fuel loads.
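A variable-to-variable relationship of this kind can be approximated by binning one variable by the other and summarizing each bin. The sketch below assumes NumPy and uses hypothetical names (`binned_relationship`, `bin_edges`); it illustrates the general idea only and is not the ILAMB implementation.

```python
import numpy as np

def binned_relationship(x, y, bin_edges):
    """Mean and standard deviation of y within each bin of x.

    Bins are half-open, [edge_i, edge_{i+1}); values outside the edges are
    ignored. Empty bins are returned as NaN.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.digitize returns 1-based bin numbers; shift to 0-based indices.
    idx = np.digitize(x, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    means = np.full(n_bins, np.nan)
    stds = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = y[idx == b]
        if in_bin.size > 0:
            means[b] = in_bin.mean()
            stds[b] = in_bin.std()
    return means, stds

# Hypothetical grid-cell values: precipitation (mm/day) and LAI (m2/m2).
precip = [0.5, 0.7, 1.5, 2.5, 2.6]
lai = [1.0, 3.0, 2.0, 4.0, 6.0]
means, stds = binned_relationship(precip, lai, bin_edges=[0, 1, 2, 3])
```

The same function applied to model and observational fields yields two curves that can then be compared bin by bin.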
The ILAMB system was designed to probe model performance across both timescales and spatial scales. At the global scale, the seasonal cycle of atmospheric CO2 deduced from CLM carbon fluxes improved substantially from CLM4 to CLM5, especially in the mid-to-high northern latitudes. However, the magnitude of interannual variability has degraded, especially in the tropics. For all CLM model versions, the Northern Hemisphere interannual variability is at most one third of that observed at NOAA marine boundary layer sites.
Using ILAMB, we can also identify a significant sensitivity of simulation output to the forcing data set (Figure 5). While all of the forcing data sets used in this study are observationally derived, each one employs a different methodology for downscaling and bias correction and can therefore potentially be assessed with ILAMB. GSWP3-forced simulations score best for most of the forcing variables (assessed forcing variables are surface air temperature, precipitation, surface relative humidity, and surface downward shortwave and longwave radiation), with relative humidity being the exception. Generally, CLM5 scores best for simulations forced with the GSWP3 forcing data set. The fact that model output variables score better with the best (according to ILAMB) forcing data set suggests, not surprisingly, that land models are likely to perform better with more accurate forcing, particularly when functional relationships are represented reasonably by the model. As noted in section 3.1, it is beyond the scope and aim of this paper to provide an assessment of the performance of CLM5 within CESM2. However, we direct interested readers to the ILAMB results for CESM1/CLM4 versus CESM2/CLM5 that we provide on the ILAMB webpage associated with this paper. In those results, we see that the land climate forcing variables (e.g., surface air temperature, downwelling shortwave and longwave radiation, and surface relative humidity) are generally marginally improved in CESM2 (with the exception of precipitation, which shows slight degradation). The assessed land carbon, water, and energy variables show similar improvements in the coupled simulations (i.e., from CESM1 to CESM2) as they do in land-only simulations (CLM4 to CLM5).
The modest improvement in coupled model land forcing quantities, combined with the consistent and relatively strong improvements in both land-only and coupled simulations, implies that the improvement in land surface variables derives from developments in CLM rather than from improvements in other components of CESM.

PFT-Level Assessment
Biases in the annual monthly maximum LAI for selected PFTs are shown in Figure 6 and for all PFTs in supporting information Figure S2. CLM5 shows reduced root mean square error compared to MODIS LAI (Table 3) for nine out of 14 PFTs compared to CLM4.5. Broadleaf evergreen tropical trees, broadleaf deciduous temperate trees, and C4 grasses showed the biggest improvement.
During the course of the development of CLM5, we tested the model with parameter sets that resulted in considerable areal fractions of the vegetation not surviving for one or more PFTs. This result led us to routinely track survival percentage throughout the model development process. Survival percentage for each PFT is reported in Table 3. In general, survival percentage is slightly higher in CLM5. Survival fraction plots in Whittaker space are shown in Figure S3. We can see, unsurprisingly, that for most PFTs survival fractions are low in dry and warm climates or in very cold climates. CLM PFTs have the same parameters across their entire geographical range and thus do not account for geographical trait variations, which could nonetheless regulate surface fluxes (Giardina et al., 2018). Land models where PFTs or their parameters are more disaggregated, for example, into those adapted for more and less productive environments (e.g., CLM-FATES), should in principle be able to circumvent this issue. It is important to note that in CLM, once a PFT dies (i.e., vegetation C goes to zero) in a particular grid cell, that PFT cannot grow back during the course of the simulation, even if climate conditions become more amenable for survival.
Figure 4. ILAMB variable-to-variable comparison summary diagram for CLM4BGC, CLM4.5BGC, and CLM5BGC for GSWP3v1 forcing. See Collier et al. (2018) for details on this metric. Right panels show example ILAMB relationship plot for a particular variable-to-variable comparison between climatological annual precipitation and LAI. Black line, repeated in each plot, is the observationally derived relationship. Error bars indicate the ±1 standard deviation of LAI for all grid cells that lie within that precipitation bin. Values in parentheses indicate ILAMB score for that comparison.
Maximum carboxylation rate at 25°C (Vcmax25) values, representing the leaf canopy average, are shown for each PFT and each model version in Table 3 and are compared to the synthesized Kattge et al. (2009) observational estimates. In CLM4 and CLM4.5, the Vcmax25 values are prescribed, with the values in CLM4.5 specifically calibrated to reflect the data in Kattge et al. (2009), except for broadleaf evergreen tropical trees, for which values were adjusted upward to produce viable tropical forest photosynthesis levels. In CLM5, Vcmax25 is a prognostic quantity (see section 2.3.7), and the values shown in the table represent a spatially weighted average monthly maximum Vcmax25 for each PFT. With the model's current parameterization, CLM5 predicts Vcmax25 values that are lower than the observational estimates for most PFTs, especially C3 grasses (Table 3 and Figure S4). The discrepancy may be partially related to the fact that observed values of Vcmax25 may not represent the environmental conditions (e.g., shading) experienced by the plants in CLM, in addition to challenges associated with the limited spatial representativeness of the observed values.
The ability of the model to represent photosynthesis and LAI reasonably well even with such low Vcmax25 values is potentially indicative of a structural problem in the leaf-level versus canopy-scaled value (as discussed in Rogers et al., 2017), which will be investigated further using off-line tools such as those presented by Walker et al. (2018). The prognostic Vcmax25 values produced in CLM5 should be perceived as an initial effort to incorporate parameterizations that can simulate changes in leaf N allocation and photosynthetic capacity under environmental change. Further investigation is needed to improve the model representation of photosynthetic capacity.
Figure 5 (caption; beginning missing): ... (prescribed vegetation, left) and CLM5BGC (prognostic vegetation and carbon cycle, right) forced with three alternative forcing data sets (GSWP3v1, CRUNCEPv7, and WFDEI/WATCH). Note that the CLM5BGC WATCH-forced runs only run through year 2001, which means that CLM5BGC-WATCH runs are evaluated over a different set of observational years. Gray color for CLM5-WATCH for terrestrial water storage is because there are not enough years of overlap between observations and model. Note that a different set of forcing data sets is used for SP versus BGC simulations (WFDEI for SP and WATCH for BGC), which affects the relative scores even for forcing variables such as precipitation, which is the same for CRUNCEPv7 and GSWP3v1 in SP versus BGC.
Simulated canopy height and canopy height biases with respect to those derived from ICESat (Simard et al., 2011) are shown for all tree PFTs for CLM5BGC in Figure S5. On average, boreal needleleaf evergreen trees are too tall by 5-10 m, while tropical broadleaf evergreen trees and temperate and boreal deciduous trees are too short by 5-10 m. These biases are related to simulated plant biomass as well as to uncertainties in the specified allometric relationships between biomass and height. Biases in canopy height will affect the land surface roughness length and therefore turbulent heat flux exchange between the land and the atmosphere.

Hydrology
The main changes to soil hydrology (see section 2.3.2) are (1) introduction of spatially variable soil depth (depth to bedrock), (2) replacement of the unconfined aquifer that existed below the soil column with a no-flux bottom boundary condition, and (3) a revised soil evaporation parameterization that accounts for the rate of diffusion of water vapor through a dry surface layer (Swenson & Lawrence, 2014). Figure 7 illustrates the impacts of these new features for two example grid cells in the southwest and southeast United States. At the southwest U.S. grid cell, one can see that ET is too variable compared to the observations for CLM4 and CLM4.5. With the dry surface layer in CLM5, soil evaporative water losses are restricted, resulting in improved ET seasonality. Water from snow melt and spring rains then infiltrates deeper into the soil column (which is 8.5 m deep at this location), providing a source of moisture for evaporation into the summer months. At the southeast U.S. grid cell, we can observe a different feature of the new model. The shallow 1-m-thick soil prescribed at this location in CLM5 cannot store much water. Consequently, we can see strong drying throughout the soil column in the low precipitation year of 1993, which then restricts ET from summer into fall, in agreement with observations. In CLM4 and CLM4.5, ET is unrealistically supported through this period by soil water that is stored deeper in the standard 3.5-m-thick soils.
ILAMB and CLM diagnostics package results indicate only relatively small changes in the quality of annual streamflow for the 50 largest rivers. In particular, mean flow for the Amazon and Congo rivers is increased and shows better agreement with observed flows, with the improvement mainly due to reduction of the excessively high tropical forest ET that was seen in CLM4. The mean bias in global annual mean river flow is slightly degraded, with CLM5 showing a high bias in global river discharge in both SP and BGC configurations (the bias is larger in BGC mode). On the other hand, the global annual mean bias and bias/RMSE scores for ET show nominal improvement in CLM5. We also note that differences in simulated runoff and ET between forcing data sets are larger than the differences across model versions. Assessment of the impact of hydrology changes on simulated land-atmosphere interactions is beyond the scope of this manuscript. However, we can infer that the relationship is likely to differ by examining the simulated soil moisture residence time (SMRT) across models. SMRT is the e-folding decay timescale of soil moisture due to evapotranspiration and is an integrative measure of soil-plant-atmosphere dynamics. We calculate SMRT for the root zone (0-0.5 m) from daily soil moisture curves during post-rain periods using a procedure similar to the estimation of a base flow recession constant (Vogel & Kroll, 1996). This residence time metric is reflective of the evapotranspiration dry-down response timescale (Teuling et al., 2006). In Figure 8, SMRT as simulated by CLM5 is shown for the continental United States for the May to October warm season and is compared to observationally derived estimates from the North American Soil Moisture Database.
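As a rough illustration of an e-folding timescale estimate, the sketch below fits an exponential decay to a single post-rain dry-down by linear regression on the logarithm of soil moisture. The functional form theta(t) = residual + a exp(-t/tau), the `residual` parameter, and the function name are assumptions made here for illustration; this is not the procedure used in the paper, which follows a base flow recession analogue.

```python
import numpy as np

def efolding_time(days, soil_moisture, residual=0.0):
    """Estimate the e-folding decay time (in days) of one dry-down period.

    Assumes theta(t) = residual + a * exp(-t / tau) and fits
    ln(theta - residual) = ln(a) - t / tau by least squares.
    """
    t = np.asarray(days, dtype=float)
    anomaly = np.asarray(soil_moisture, dtype=float) - residual
    if np.any(anomaly <= 0.0):
        raise ValueError("soil moisture must stay above the residual value")
    slope, _intercept = np.polyfit(t, np.log(anomaly), 1)
    if slope >= 0.0:
        raise ValueError("series is not drying; no decay timescale defined")
    return -1.0 / slope

# Synthetic dry-down with a known 25-day timescale (made-up values).
t = np.arange(30.0)
theta = 0.10 + 0.20 * np.exp(-t / 25.0)
tau = efolding_time(t, theta, residual=0.10)
```

In practice one would apply such a fit to many post-rain segments per grid cell or station and summarize the resulting timescales, for example with a median.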
In general, the SMRT as simulated by CLM5 compares well with observations except for the western United States, where observations show a wide range of residence times from less than 60 days to greater than 90 days whereas CLM5 shows uniformly longer residence times (120 days or more). At least some of the western U.S. discrepancy could be attributed to the poorly resolved topographic gradients at the nominal 1° resolution of these simulations. Figures 8b and 8c compare the SMRT in CLM5 with that in CLM4 and CLM4.5. Overall, the SMRT in CLM5 has increased across much of the eastern United States and decreased in parts of the western United States compared to both CLM4 and CLM4.5. Identification of the source of the changes in residence time is beyond the scope of this paper, but the spatially explicit soil depths, the introduction of the dry surface layer parameterization for soil evaporation, and soil moisture dynamics associated with the PHS routine are all likely to be factors. Averaged across the continental United States domain, SMRT is higher by 15% compared to CLM4.5 and 1.5% compared to CLM4. Dirmeyer et al. (2016) concluded that the SMRT in CLM4 was 18% too low, so the lengthened residence time in CLM5 may represent a change in the desired direction.
The residence time metric suggests improvements in CLM5 compared to CLM4 and CLM4.5, with CLM5 showing a generally higher SMRT across the majority of the soil moisture observing network, as one would expect with generally deeper soils and stronger soil evaporation limitations associated with the dry surface layer parameterization. In many regions this moves the model closer to observed estimates (Table S1), though caution is warranted when comparing CLM SMRT with observationally derived SMRT because of uncertainties from a number of sources, including the different types of sensors and measurement techniques at each site, the substantial spatial-scale mismatch between grid cells and observational sites, and uncertainties in model parameterizations (Dirmeyer et al., 2016). We repeated our calculations using the soil moisture memory metric employed in Dirmeyer et al. (2016) and found a similar change in CLM5 compared to CLM4 and CLM4.5 (not shown). Changes in SMRT are likely to impact a range of land-atmosphere interaction phenomena, including land-driven climate predictability.

Journal of Advances in Modeling Earth Systems, 10.1029/2018MS001583

PHS and ET Partitioning
The PHS configuration implements new parameterizations for root water uptake and water stress for CLM5. For comparison, we also ran CLM5 with PHS replaced with the SMS parameterization included in prior CLM versions (see section 2.3.6). One of the broadest impacts of PHS is a decrease in the coefficient of variation of GPP (CV_GPP) and transpiration (CV_ET) (Figures 9d and 9h). The global distributions of CV_GPP and CV_ET both shift toward lower values with PHS (Figures 9c and 9g), corresponding to global reductions in CV of 8.0% and 12.5% for GPP and ET, respectively, relative to SMS. Decreases in CV_GPP tend to occur in water-limited ecosystems with seasonal rainfall, such as the Sahel region of Africa and northern Australia (Figure 9d). PHS incorporates more flexible root water uptake (Kennedy et al., 2019), which can utilize more of the soil column to buffer shortfalls in precipitation, acting to reduce variability imposed by precipitation variations. CV_ET decreases follow roughly the same patterns, reflecting the coupling of transpiration and photosynthesis through stomatal conductance (Figure 9h). With PHS, vegetation water stress is sensitive to atmospheric demand for transpiration and tends to narrow the range of transpiration values, which results in relatively larger reductions in CV_ET as compared to CV_GPP. In some regions, variability increases with PHS, primarily at high latitudes (e.g., eastern Siberia) and in arid regions. Such increases in CV_GPP and CV_ET are generally associated with increases in the mean fluxes of GPP and ET in these regions with PHS.
Other mechanisms unrepresented in CLM, including adaptive responses of Vcmax25 to dry conditions and biochemical responses to stress (Keenan et al., 2009; Niinemets & Keenan, 2014), could in principle increase interannual variability of these fluxes; thus, the decrease in variability seen here is not necessarily indicative of a structural degradation or inappropriate PHS parameters.
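For reference, the coefficient of variation and a relative change between two configurations can be computed as in the minimal sketch below. The time sampling of the flux series (for example annual versus monthly values) and the use of the sample standard deviation are assumptions of this sketch, not details taken from the analysis above, and the numbers are invented.

```python
import numpy as np

def coefficient_of_variation(series):
    """Sample standard deviation divided by the mean of a flux time series."""
    values = np.asarray(series, dtype=float)
    return values.std(ddof=1) / values.mean()

def relative_change_percent(new, reference):
    """Percent change of `new` relative to `reference`."""
    return 100.0 * (new - reference) / reference

# Hypothetical values: a CV of 0.50 with one scheme and 0.46 with another
# corresponds to a relative reduction of 8%.
cv_example = coefficient_of_variation([1.0, 2.0, 3.0])
change = relative_change_percent(0.46, 0.50)
```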
The partitioning of evapotranspiration into transpiration, canopy evaporation, and soil evaporation is a key emergent process simulated by land models and is essential to assess ecosystem WUE. In Figure 10, we show the transpiration fraction from each model compared to estimates of transpiration fraction from the Water, Energy, and Carbon with Artificial Neural Networks data set (WECANN; Alemohammad et al., 2017; available at https://gentinelab.eee.columbia.edu/content/datasets). In prescribed vegetation configurations, CLM5SP shows better agreement with WECANN transpiration fraction than either CLM4SP or CLM4.5SP, especially in the tropics. Globally, the contribution of soil evaporation to ET is diminished in CLM5 relative to CLM4 and CLM4.5, resulting in a higher percentage of ET coming from transpiration (CLM4SP and CLM4.5SP 53%; CLM5SP 60%; Table 4), in line with recent isotopic data estimates of 61% ± 15% (Jasechko et al., 2013; Schlesinger & Jasechko, 2014). However, in prognostic vegetation mode, biases in simulated LAI lead to poorer agreement with WECANN for ET partitioning for all model versions. In particular, low LAI biases for tropical deciduous trees (Figure S2), especially in the Sahel and southern Africa, appear to correlate with low biases in transpiration fraction, though errors in the observations may also contribute. Table 4 shows global percentages for transpiration, soil evaporation, and canopy evaporation for the CLM versions. Note that simulations forced with CRUNCEPv7 show a higher proportion of ET coming from canopy evaporation than GSWP3v1-forced simulations. This difference is likely due to the temporal frequency of the forcing precipitation (6-hourly for CRUNCEPv7 and 3-hourly for GSWP3v1), which can have a strong impact on canopy evaporation.

Permafrost and Snow Density
Permafrost is a key feature of the earth system, and uncertainty regarding the strength of the permafrost climate-carbon feedback is considerable (McGuire et al., 2018; Schuur et al., 2015). The permafrost climate-carbon feedback is a challenging research problem that depends on many features of a land modeling system. A known deficiency in prior versions of CLM was an unrealistically low fresh snow density, which led to excessive snow insulation of the ground, particularly at low snow depths. Several changes to fresh snow density and snow densification were introduced in CLM5 (van Kampenhout et al., 2017), resulting generally in denser snow for both seasonal and perennial snowpacks. The denser snow over Greenland and Antarctica is an improvement and, along with the deeper snowpack, allows the model to more realistically represent firn and the transition from snow to ice. The denser surface snowpack also largely eliminates excessive subsnow surface melt that occasionally occurred in CLM4 and CLM4.5 in very cold climates where the simulated near-surface thermal conductivity was unrealistically low.
The changes to modeled snow density also have beneficial impacts on permafrost distribution and ALT (the depth to which permafrost soils thaw each summer). In Figure 11, maps of ALT and February snow density are shown for CLM4.5 and CLM5 with GSWP3v1 and CRUNCEPv7. These maps reveal that there are strong relationships between the forcing data set, the snow density formulation, and simulated ALT. Snow is denser across the permafrost domain in CLM5 (225 to 275 kg/m³) compared to CLM4.5 (<200 to 225 kg/m³). This denser snow in CLM5 is more consistent with the values of 230 to 330 kg/m³ reported for northwest Alaska (Sturm et al., 2010). The denser snow reduces snow insulation and results in colder soils and shallower ALT in CLM5 compared to CLM4.5.

It is also relevant to note the impact of the forcing data set on snow density and ALT simulations. Snow tends to be less dense with CRUNCEPv7 forcing than with GSWP3v1 forcing. Taken in isolation, this should lead to shallower ALT with GSWP3v1 forcing, but instead ALT is generally deeper, which appears to be due largely to greater downwelling longwave and shortwave radiation in the GSWP3v1 forcing data. The large differences in simulated permafrost distribution and ALT between the two forcing data sets reveal an important aspect of uncertainty in permafrost modeling (which propagates to uncertainty in modeled soil carbon stocks, as discussed below). ILAMB output indicates that downwelling longwave radiation, downwelling solar radiation, and humidity variables all score significantly higher across the Arctic land domain with GSWP3v1 (other forcing quantities are roughly equivalent across these two forcing data sets), which suggests that GSWP3v1 forcing may be more appropriate for permafrost studies. If we consider just the GSWP3v1-forced simulations, we see that CLM4.5, with its low-density snow, exhibits ALT that is unrealistically deep (ALT >1 m deep across nearly the entire permafrost domain) while CLM5, with its denser snow, is more realistic. These results are an indirect indication that the CLM5 snow density parameterizations may represent an improvement.

Carbon and Nitrogen Fluxes and Stocks
Table 5 lists the simulated global total carbon stocks and annual mean fluxes for the different model versions compared to available data products. Global GPP agrees best with available data products for CLM5 (119 Pg C/year in CLM5BGC, 134 Pg C/year in CLM4BGC, and 118 Pg C/year for the FLUXNET-MTE observed GPP estimate; values are for the area of land intersection between model and observations, that is, grid cells where ...).
The latitudinal variation of CUE simulated by CLM5 seems plausible, based on published estimates (Campioli et al., 2015; DeLucia et al., 2007; Malhi et al., 2011; Vicca et al., 2012), but deserves further investigation. All three model versions reasonably replicate the global totals for vegetation carbon stocks, but the spatial distribution differs across models. ILAMB results show that CLM4BGC placed too much carbon into tropical rainforests and too little carbon into boreal forests, especially across Europe and Siberia. To first order, the biases are reversed in CLM5BGC, with too little carbon in the tropical rainforests and too much carbon across the boreal forests, largely reflecting the spatial pattern of GPP biases but likely also related to changes in C allocation in CLM5.
Soil C stock patterns are more realistic in CLM4.5BGC and CLM5BGC than in CLM4BGC because of the introduction of vertically resolved soil biogeochemistry in CLM4.5, which allows the model to generate large C stocks across the northern high-latitude permafrost domain, as observed. The relationship between apparent soil C turnover times (defined as the ratio of mean soil C stocks over climatological annual mean NPP) and mean air temperature is more realistic in CLM4.5 and CLM5 (Figure 12; metric reproduced as in Koven et al., 2017), with both of these model versions at least partially capturing the transition to longer apparent soil C turnover times in cold climates. This metric suggests that CLM5 apparent soil C turnover times are slightly improved over CLM4.5, with a steeper increase in turnover times at cold temperatures as well as a broader spread of turnover times in warm climates associated with soil wetness (short turnover times in warm-wet climates and long turnover times in warm-dry climates). Because of the greater permafrost extent and colder permafrost soil temperatures in CLM5 when forced by CRUNCEPv7 than by GSWP3v1, the stocks of soil C to 3-m depth are a factor of 2 larger when forced by CRUNCEPv7 (4,000 Pg C) than when forced by GSWP3v1 (1,925 Pg C), demonstrating the extreme sensitivity of simulated permafrost soil C stocks to simulated permafrost conditions.
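The apparent turnover time defined above (soil C stock divided by climatological annual mean NPP) can be computed per grid cell as in the following sketch. The function name, the units, the `min_npp` mask threshold, and the example values are assumptions for illustration; the full metric, including the regression against air temperature, is described in Koven et al. (2017).

```python
import numpy as np

def apparent_turnover_time(soil_c, npp, min_npp=1e-6):
    """Apparent soil C turnover time in years.

    soil_c : mean soil C stock per grid cell (e.g., g C / m2)
    npp    : climatological annual mean NPP (e.g., g C / m2 / year)
    Cells whose NPP is not above `min_npp` are returned as NaN instead of
    producing a division by zero.
    """
    soil_c = np.asarray(soil_c, dtype=float)
    npp = np.asarray(npp, dtype=float)
    out = np.full_like(soil_c, np.nan)
    np.divide(soil_c, npp, out=out, where=npp > min_npp)
    return out

# Hypothetical cells: warm productive, cold low-productivity, unvegetated.
tau = apparent_turnover_time([10000.0, 30000.0, 5000.0], [1000.0, 100.0, 0.0])
```

With these made-up numbers the warm cell turns over in 10 years and the cold cell in 300 years, the kind of contrast the metric is designed to expose.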
The spatial distribution and global sums of terrestrial N inputs and losses remain poorly constrained with data and highly variable among versions of CLM. Table 6 shows published estimates of global terrestrial N fluxes and corresponding estimates from the GSWP3-forced BGC simulations. Within CLM, N inputs come from N deposition and N fixation. Inputs from N deposition are consistent among model versions, with forcings coming from Lamarque et al. (2010), and show broad agreement with observationally derived estimates (Fowler et al., 2013). Estimates of global N fixation show greater spread among models. The empirical approach applied in CLM4 and CLM4.5 estimated biological N fixation rates as a function of NPP (Cleveland et al., 1999). CLM5 calculates both symbiotic and free-living N fixation. Total N fixation in CLM5 is lower than in previous versions of the model and lies within the range of estimates of N fixation rates (Vitousek et al., 2013). Finally, with the ability to simulate a global interactive crop model, CLM5 provides opportunities to estimate anthropogenic changes to the terrestrial N cycle through planting N fixing crops and fertilizer application. The N fixation rates simulated by soy in the model are well below upscaled estimates of agricultural N fixation (Herridge et al., 2008), but simulated fertilization rates appear to be on target with observational estimates (Fowler et al., 2013). CLM simulates N losses through leaching, gaseous emissions, and biomass removal. Successive model versions show increasing hydrological N losses, though these have not been evaluated against data. Houlton et al. (2015) pointed out that gaseous N losses were too high in CLM4. The same is likely true with CLM5, which still suffers from a poorly implemented representation of soil N dynamics, resulting in a high bias in gaseous (as opposed to hydrologic) N losses. With intensification of land use and land management, CLM5 also shows anthropogenically driven N losses associated with wood harvest, crop harvest, and land use change. These N loss fluxes, as well as gaseous N emissions (including NOx emissions due to fire and soil N2O fluxes), remain poorly constrained and an area for future model evaluation and development.
Table note (beginning missing): ... (Collier et al., 2018). Observations are GPCC for precipitation, GLEAM for ET, Dai et al. (2009) for runoff, and WECANN.

CO2 and N-Addition Response
Over the course of model development, CLM (BGC configurations) transitioned from a model that exhibited strong N limitation of the terrestrial carbon cycle (CLM4) to a model that showed greater responsiveness to elevated concentrations of CO2 in the atmosphere (CLM5; Wieder et al., 2019), consistent with recent observations that suggest that there has been only weak N limitation of CO2 fertilization (Campbell et al., 2017). Specifically, the carbon cycle simulated by CLM4 showed an unrealistically strong nitrogen limitation (Bonan & Levis, 2010) and a lower than observed response to CO2 enrichment (Figure 13; Hoffman et al., 2014; Walker et al., 2014; Zaehle et al., 2014). With revisions to the photosynthesis parameterization and soil biogeochemical model (Bonan et al., 2011; Koven et al., 2013), CLM4.5 showed a lower sensitivity to N enrichment than its predecessor that was more in line with observations (LeBauer & Treseder, 2008), but it still exhibited lower sensitivity to CO2 enrichment than observations from Free-Air CO2 Enrichment sites (Ainsworth & Long, 2004). CLM5 includes a suite of model developments focused on improving the representation of vegetation C-N dynamics (outlined in section 2.3.6). The globally integrated response of terrestrial ecosystems to N and CO2 enrichment suggests that CLM5 shows improved agreement with observed ecosystem response to these environmental manipulations (Figure 13; Ainsworth & Long, 2004; LeBauer & Treseder, 2008), though the globally integrated improved agreement with these syntheses should not be overinterpreted. Besides capturing the appropriate magnitude of the response of terrestrial C pools and fluxes to N enrichment, simulations with CLM5 also show increases in foliar N content and ecosystem C use efficiency that are consistent with observations (Campioli et al., 2015; Vicca et al., 2012; Wieder et al., 2019).
Similarly, foliar N content and Vcmax decline under elevated CO2, again consistent with observations (Ainsworth & Long, 2004). Together, these results suggest that CLM5 ...
Confronting land models with perturbations that are similar to experimental manipulations also exposes shortcomings in the model's structural assumptions and parameterizations. For example, although the bulk C cycle response to N enrichment simulated by CLM5 appears more appropriate than that of CLM4 or CLM4.5, the model still fails to capture observed shifts in plant C allocation toward greater aboveground productivity or decreases in heterotrophic respiration that are commonly seen in nutrient addition experiments (Janssens et al., 2010; Liu & Greaver, 2010). Similarly, terrestrial sensitivities to elevated CO2 simulated by CLM5 seem more in line with observed responses, but the model achieves higher productivity by increasing LAI and nitrogen fixation rates beyond what is likely to occur in natural ecosystems (Ainsworth & Long, 2004; Hungate et al., 2004; Medlyn et al., 2015; Terrer et al., 2018). Indeed, results from experimental manipulations emphasize that acclimation as well as changes to plant allocation (which are not represented in CLM5) and stoichiometry are important aspects of terrestrial ecosystem responses to global change drivers (Liu & Greaver, 2010; Luo et al., 2006; Reich et al., 2006). Despite its improvements, CLM5 still has limited capacity to capture these responses, highlighting priority areas that should be addressed in future model developments. Specifically, understanding and modeling appropriate changes in aboveground and belowground C and N allocation remains uncertain, especially in response to global change (Giardina et al., 2005; Terrer et al., 2018). This is an outstanding challenge to be addressed in land models and evaluated with observations from experimental manipulations.
Despite these limitations, the overall transition toward the use of optimality theories in N cycle representation in CLM5 and in integrating N processes directly into plant physiology, rather than the post hoc reconciliation of N-unlimited and N-limited rates of GPP in CLM4, appears to broadly move the model in the right direction, though there is much work still to do (e.g., resolve limitations in representation of soil nutrient competition between plants, microbes, and mineral surfaces; Zhu et al., 2016).

Land Carbon Accumulation Over Historical Period
The global land C accumulation trends exhibit clear differences across model versions (Figure 14). As noted above, CLM4 produces an unrealistically strong nutrient limitation on photosynthesis, which limits that model's capacity for C uptake even as atmospheric CO2 increases. Consequently, in CLM4 land use and land cover change (LULCC) C loss fluxes dominate over the CO2 fertilization response, resulting in an accumulated land C loss of ~60 Pg C over the period 1850 to 2014, which is outside the observational estimates of −8 Pg C (range +32 to −52 Pg C, 1850-2010; Hoffman et al., 2014). CLM4.5, on the other hand, shows C uptake and accumulation in response to CO2 fertilization that is perhaps too strong, especially under the GSWP3v1-forced simulation. The CLM5 land C accumulation curve lies in between CLM4 and CLM4.5 and appears to result in the best match with observational estimates, for the historical period as well as the global carbon project era (1950Le Quéré et al., 2014). These results are also reflected by the comparatively high scores for the Global Net Ecosystem Carbon metric in ILAMB for CLM5 (Figure 3).
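An accumulated land C curve of the kind discussed above is, by construction, the running sum of the annual net land carbon flux. The short sketch below shows the bookkeeping with invented numbers; the sign convention (positive values mean net land uptake) and the function name are assumptions of this sketch.

```python
import numpy as np

def accumulated_land_carbon(annual_net_flux):
    """Running sum of annual net land C flux (Pg C/year) in Pg C.

    Positive flux is taken to mean net uptake by the land (assumed convention).
    """
    return np.cumsum(np.asarray(annual_net_flux, dtype=float))

# Made-up fluxes: early losses (e.g., from land use change) followed by
# later uptake, so the accumulated curve first declines and then recovers.
fluxes = [-1.0, -0.5, 0.5, 2.0]
accumulated = accumulated_land_carbon(fluxes)
```

Because the curve integrates every earlier year, differences that arise early in a simulation persist in the accumulated total to the end of the period.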
Although it is tempting to infer that the more realistic responses of CO2 and N additions in CLM5 are responsible for the improved emergent behavior of the model with respect to the historical land C accumulation, historical C accumulation is a function of several sometimes counteracting processes that control C fluxes and stocks, and thus, these changes should be interpreted cautiously. These processes include deforestation and wood harvest fluxes and the dependency of these fluxes on initial forest vegetation C stocks, C uptake responses to increasing CO2 and N deposition trends, and vegetation and soil C responses to climate trends and variability. Furthermore, and importantly, as noted above, Fisher et al. (2019) demonstrate that CLM5 responses to CO2 and N fertilization exhibit strong sensitivity to several uncertain parameters. Nonetheless, the improvement in this important emergent behavior of the model is intriguing and is investigated in more depth in Bonan et al. (2019).
Also apparent in Figure 14 is a strong sensitivity to atmospheric forcing with accumulated land C for the period 1850 to 2014 differing between runs forced with GSWP3v1 and CRUNCEPv7 by 50, 20, and 10 Pg C for CLM4, CLM4.5, and CLM5, respectively. The divergence in C accumulation between runs with different forcing data sets arises early in the period, mainly prior to 1950, when CO2 fertilization would have been relatively small and LULCC fluxes dominate. This implies, then, that LULCC C fluxes can differ substantially even within a model version forced with exactly the same LULCC time series but under different estimates of historical climate forcing. We hypothesize that the simulated preindustrial (year 1850) vegetation C stocks and their regional distribution can impart a strong influence on historical LULCC C fluxes.
Figure 12. Metric for apparent soil carbon turnover time versus mean air temperature, as in Koven et al. (2017) for observations and CLMBGC model versions. Turnover time is calculated in observations and models as the ratio of mean carbon stocks (SOM) over climatological annual mean carbon inputs (NPP). Each dot represents one grid cell, color coded by mean annual precipitation. The best fit regression curve for the observational data with 50% prediction intervals is shown as black lines for the models. RMSE represents the agreement with the best fit curves. See Koven et al. (2017) for full description of this metric. Observations for soil organic matter (SOM) are merged from Harmonized World Soil Database (FAO/IIASA/ISRIC/ISSCAS/JRC, 2012) and Northern Circumpolar Soil Carbon Database (Hugelius et al., 2013). Observed NPP estimate is from MODIS (Zhao et al., 2005).
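The apparent turnover time used in Figure 12 is defined as the ratio of mean soil carbon stock to climatological annual mean carbon input (NPP). The following is a minimal sketch of that ratio on a per-grid-cell basis, not the implementation used for the figure. The function name, the units, the `min_npp` masking threshold, and the sample values are all assumptions made for illustration.

```python
import numpy as np


def apparent_turnover_time(som_stock, npp, min_npp=1e-3):
    """Apparent soil C turnover time (years) as stock divided by input.

    som_stock : mean soil organic matter C stock per grid cell (assumed kg C m-2)
    npp       : climatological annual mean NPP per grid cell (assumed kg C m-2 yr-1)
    min_npp   : hypothetical threshold; cells with NPP at or below it are
                returned as NaN to avoid dividing by near-zero inputs
    """
    som, inputs = np.broadcast_arrays(
        np.asarray(som_stock, dtype=float), np.asarray(npp, dtype=float)
    )
    tau = np.full(som.shape, np.nan)
    valid = inputs > min_npp  # NaN inputs compare False and stay masked
    tau[valid] = som[valid] / inputs[valid]
    return tau


# Illustrative values only, not taken from the paper or any data set.
som_example = [10.0, 20.0, 5.0]
npp_example = [0.5, 0.1, 0.0]
tau = apparent_turnover_time(som_example, npp_example)
```

With these made-up inputs the first two cells have turnover times of 20 and 200 years, and the third cell is masked because it has no carbon input.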
Finally, while the long-term land carbon accumulation agrees better with observed estimates, which are derived from atmospheric CO2 and ocean C inventories, the interannual variation in land C accumulation appears to be degraded in CLM5 (larger low bias in variability), based on a comparison of interannual variability of atmospheric CO2 simulated from the CLM fluxes compared to that observed (see ILAMB CO2 diagnostics). Throughout the Northern Hemisphere, interannual variability is at most one third of that observed at NOAA marine boundary layer sites. The drivers and implications of this degradation from CLM4 to CLM4.5 to CLM5 require further investigation, since climate-driven variations at interannual timescales may provide useful information about future climate-driven changes in terrestrial carbon stocks (Cox et al., 2013; Keppel-Aleks et al., 2018). Preliminary investigation suggests that although the plant hydraulics scheme does tend to reduce variability in GPP and transpiration (see section 4.4), it does not appear to be primarily responsible for the reduced C flux variability in CLM5, with the reduced variability potentially resulting from increased interannual synchronicity between NPP and ecosystem respiration.

Water Use Efficiency
Quantification of changes in WUE (carbon uptake per unit of water loss) due to climate change and rising atmospheric CO2 levels is challenging (Cheng et al., 2017). Changes in WUE will have strong implications for water availability, food and fiber production, as well as the C sink capacity of terrestrial ecosystems. Though this topic has received considerable recent attention in the literature (e.g., Cheng et al., 2017; Frank et al., 2015; Huang et al., 2015; Keenan et al., 2013), there is still no consensus on how the coupled terrestrial carbon and water cycles have changed or will change in the future.
A key feature of CLM5 is a more realistic coupling of N limitation and stomatal conductance, with stomatal conductance in CLM5 based on the N-limited photosynthesis (Ghimire et al., 2016) rather than on N-unlimited potential photosynthesis as it was in CLM4 and CLM4.5. This more realistic coupling has consequences for WUE and WUE trends since changes in N limitation will propagate directly into simulated transpiration. The increase in global WUE (defined here as GPP/transpiration) over the historical period is considerably stronger in CLM5 compared to CLM4 and CLM4.5 (Figure 15). Global GPP trends are comparable across models, though CLM5 marginally exhibits the strongest increase, while CLM4 shows the weakest increase, at least partially due to the high N limitation in that version (see section 4.7). Global transpiration trends, on the other hand, diverge considerably across versions with CLM4 and CLM5 showing a declining trend in transpiration during 1980 to 2014 and CLM4.5 showing an increasing trend over the same period. Spatially, the increase in WUE is larger almost everywhere in CLM5 than in the other model versions, but the driver of the WUE change differs considerably by region. In the tropics, the CLM5 increase in WUE is driven by both increased GPP and somewhat reduced transpiration (Figure S6). In the boreal forest and across the mid-to-high northern latitudes, the historical increases in GPP are high, but transpiration is largely unchanged or is weakly increased. Deeper analysis of the WUE trends and their interaction with CO2 fertilization and LAI, N limitation, and soil moisture limitation trends across model versions and compared against available estimates of historical WUE trends is worthy of additional study but is beyond the scope of this paper.
(Lamarque et al., 2010). e Soy N fix is also included in global estimate of symbiotic N fixation listed above in the table. Data here are from CFT output for nitrogen fixation.
Figure 13. Simulated effect sizes of nitrogen versus CO2 enrichment on global rates of net primary productivity (NPP) that was calculated for CLM4BGC, CLM4.5BGC, and CLM5BGC (brown, turquoise, and purple symbols, respectively; GSWP3 simulations). Observational constraints for the nitrogen response (aboveground NPP from LeBauer & Treseder, 2008) and CO2 response (DM production from Ainsworth & Long, 2004) are shown with the vertical and horizontal lines, respectively (mean ± 95% confidence interval).
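The WUE definition used in this section (GPP divided by transpiration) and the comparison of trends over 1980 to 2014 can be illustrated with a short sketch. This is not the analysis code behind Figure 15. The function names, the units, and the synthetic GPP and transpiration series below are assumptions chosen only to show the calculation, and an ordinary least-squares slope is assumed as the trend estimator.

```python
import numpy as np


def wue_series(gpp, transpiration):
    """Water use efficiency as GPP / transpiration, element by element."""
    return np.asarray(gpp, dtype=float) / np.asarray(transpiration, dtype=float)


def linear_trend(years, values):
    """Least-squares slope of values against years (units of values per year).

    Years are centered before fitting to keep the fit well conditioned.
    """
    years = np.asarray(years, dtype=float)
    slope, _intercept = np.polyfit(years - years.mean(), values, 1)
    return slope


# Synthetic annual global totals, NOT model output or observations.
years = np.arange(1980, 2015)
gpp = 120.0 + 0.3 * (years - 1980)        # hypothetical GPP, rising linearly
transpiration = np.full(years.shape, 40.0)  # hypothetical constant transpiration

wue = wue_series(gpp, transpiration)
wue_trend = linear_trend(years, wue)
```

In this synthetic case the WUE increase comes entirely from GPP, similar in spirit to the mid-to-high latitude behavior described above. Replacing the constant transpiration with a declining series would add a second contribution to the WUE trend, as described for the tropics.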

Crops
Agricultural management practices can have a considerable impact on climate (Bagley et al., 2015; Davin et al., 2014; Lombardozzi et al., 2018; Mueller et al., 2017; Thiery et al., 2017), highlighting the importance of representing agriculture in ESMs. CLM5 is the first version of CLM that includes transient representation of crop distribution and management, and the inclusion of managed agriculture in CLM5 does affect carbon, water, and energy fluxes from the land surface. The representation of crops in CLM5 also allows the model to track crop yields through time. The crop yields simulated by CLM5 increase from 1.1 tons/ha in 1850 to ~3 tons/ha in 2010 (Figure 16c). For the crop types represented in CLM5, the simulated yields match observations for the same crop types from the United Nations Food and Agriculture Organization (UN-FAO) from the start of available observations in 1961 through approximately 1980. Yields in CLM5 level off after that time, whereas the UN-FAO yields steadily increase, with the discrepancy likely due to the fact that crop representation in CLM5 does not include processes associated with intensification, such as increasing planting density. The spatial distribution of crop yields illustrates that CLM5 underestimates crop yields throughout the Northern Hemisphere compared to UN-FAO, particularly in the Central United States, Europe, and Southwestern Asia, but overestimates crop yields throughout much of the tropics (Figures 16a and 16b). Yields of individual crops are generally similar to UN-FAO estimates, though CLM5 underestimates corn yields throughout most temperate regions. The management techniques represented in CLM5 also impact the magnitude of crop yields. Globally, agricultural expansion and fertilization have large impacts on increasing crop yields, and irrigation has a smaller impact due to the fact that less than ~25% of cropland area is irrigated. Irrigation is quite important for crop yields within irrigated areas, however.
Note that due to the inflexibility of the planting windows in CLM5, planting dates in some regions, such as India (too early), are unrealistic. A more flexible climate-driven planting date scheme is planned for future model versions.

Urban
To evaluate behavior of the updated urban model and building properties data, observations from five urban flux tower sites and a global anthropogenic heat flux (AHF) data set were used. In simulations described in Oleson and Feddema (2019), radiative and turbulent fluxes, surface temperatures, and AHF were found to be generally improved compared to the previous version. The simulation of global and regional AHF is also significantly improved, mainly due to the new building energy model. For example, large positive biases in AHF over the United States and Europe, evident in the previous model version, are reduced such that simulated values are now within 1% and 11% of observations, respectively. The increased simulation fidelity and new capabilities of the model should enhance its utility for research into the combined effects of urbanization and global climate change.

Summary and Discussion
As with prior CLM versions, the development of CLM5 was an extensive community effort involving researchers from many different institutions and culminating in the integration of numerous disparate development efforts. The resulting updated model represents a significant advancement relative to prior model versions. CLM5 includes new default and optional functionality, improved flexibility in model configurations and land cover transitions (natural vegetation ↔ glacier, natural vegetation ↔ crop), as well as more mechanistic and ecologically relevant representations of the physics, biology, and human land management processes that govern terrestrial states and fluxes.

Journal of Advances in Modeling Earth Systems
Benchmarking packages such as ILAMB mark a significant enhancement in our ability to evaluate land model representations of water, energy, and carbon cycles. Broadly, ILAMB and other metrics presented here indicate that the simulation quality is improved in CLM5 over CLM4 and CLM4.5, although differences between CLM4.5 and CLM5 are less distinct, and particular variables or metrics show degraded performance. However, even with the deployment of advanced model assessment tools and metrics, in many cases a clear and unambiguous demonstration of improvement or degradation for a complex model such as CLM remains challenging. We find, for example, perhaps unsurprisingly, that climate and weather forcing uncertainty confound the interpretation of impacts of model structural advances. The impact of parameter uncertainty is not assessed here (see Fisher et al., 2019 for a partial parameter sensitivity assessment of CLM5). Nonetheless, we interpret the broad indications of improvement across multiple variables and metrics (>30) as suggesting genuine progress, which (hopefully) is grounded in the upgraded model parameterizations and more comprehensive process representation.
We stress, however, that model users should consider improvements or degradation identified in ILAMB or other metrics presented here with caution due to observed data limitations related to data scale applicability, measurement uncertainties, inconsistencies across multiple observational data sets for one or more variables (e.g., water and energy budgets derived from the available observationally based ILAMB data sets do not close), as well as limitations in the metrics included in ILAMB. Improved methods within ILAMB to account for observational data uncertainty are critical and are a priority for the ILAMB project. An improvement or a degradation for a particular variable or metric does not on its own imply that the model is suited or not suited for research related to that particular variable. For example, ILAMB indicates that snow water equivalent is degraded in CLM5 relative to CLM4. This apparent degradation occurs despite several mechanistic improvements to snow physics that have been introduced between CLM4 and CLM5. The lower ILAMB score for snow water equivalent for CLM5 could indicate a real model snow simulation performance degradation (due to structural or parametric problems introduced during development from CLM4 to CLM5), but it could also potentially be attributed to inaccuracies in the forcing data, biases in the observed data set used in ILAMB, or limitations in the ILAMB metrics themselves. Consequently, CLM5 users interested in applying the model for research into snow processes will need to balance knowledge of the snow physics and snow physics structural advances against the ILAMB score decrease and against their own assessment of snow simulations to decide whether or not the model is "fit-for-purpose." More explicit process representation enables new types of observations to be applied for evaluation of CLM.
For example, since CLM5 implements prognostic, rather than prescribed, leaf photosynthetic traits, observations of Vcmax25 and Jmax25 can be used as a means for assessing the model. Similarly, the introduction of plant hydraulics opens up the potential to employ several observational quantities that were previously not applicable to CLM including mid-day stomatal conductance, leaf water potential, and sap flow. This list could continue, but in general, the expansion of CLM science to more realistically represent physical and ecological processes also opens new opportunities to evaluate the model with diverse observational data sets. Identifying, developing, and applying these and other new data products to constrain the more realistic representations of physical and ecological processes is likely to be a fruitful avenue for research and model development going forward.
Open-source development of CLM is ongoing (https://github.com/escomp/ctsm). Model users and developers are encouraged to provide feedback, report bugs, and contribute model developments. New model features and parameterizations are in development for future versions of CLM including multiple lines of FATES development, explicit treatment of biomass heat storage, a representative hillslope formulation that permits water to flow laterally within a grid cell according to topographic or water table gradients, and a multilayer canopy parameterization as well as ongoing projects on agriculture (e.g., more realistic crop phenology and allocation, Peng et al., 2018; tillage, Levis et al., 2014; and biofuel crops), water management (e.g., multiple sources of irrigation water and reservoirs), and forestry. As these development projects come to fruition, they will be made available to the CLM research community for use.