Volume 13, Issue 7 e2020MS002421
Research Article
Open Access

Improving CLM5.0 Biomass and Carbon Exchange Across the Western United States Using a Data Assimilation System

Brett Raczka (Corresponding Author)

School of Biological Sciences, University of Utah, Salt Lake City, UT, USA

Now at National Center for Atmospheric Research, Boulder, CO, USA

Correspondence to: B. Raczka

Timothy J. Hoar

National Center for Atmospheric Research, Boulder, CO, USA

Henrique F. Duarte

Department of Atmospheric Sciences, University of Utah, Salt Lake City, UT, USA

Now at Earth System Science Center, National Institute for Space Research, São José dos Campos, Brazil

Andrew M. Fox

Joint Center for Satellite Data Assimilation, Boulder, CO, USA

Jeffrey L. Anderson

National Center for Atmospheric Research, Boulder, CO, USA

David R. Bowling

School of Biological Sciences, University of Utah, Salt Lake City, UT, USA

Department of Atmospheric Sciences, University of Utah, Salt Lake City, UT, USA

John C. Lin

Department of Atmospheric Sciences, University of Utah, Salt Lake City, UT, USA

First published: 19 June 2021

Abstract

The Western United States is dominated by natural lands that play a critical role in carbon balance, water quality, and timber reserves. This region is also particularly vulnerable to forest mortality from drought, insect attack, and wildfires, thus requiring constant monitoring to assess ecosystem health. Carbon monitoring techniques are challenged by the complex mountainous terrain; thus, there is an opportunity for data assimilation systems that combine land surface models and satellite-derived observations to provide improved carbon monitoring. Here, we use the Data Assimilation Research Testbed to adjust the Community Land Model (CLM5.0) with remotely sensed observations of leaf area and above-ground biomass. The adjusted simulation significantly reduced the above-ground biomass and leaf area, leading to a reduction in both photosynthesis and respiration fluxes. These flux reductions largely offset each other; thus, both the adjusted and free simulations projected a weak land carbon sink. This result differed from a separate observation-constrained model (FLUXCOM) that projected strong carbon uptake by the land. Simulation diagnostics suggested that water limitation had an important influence on the magnitude and spatial pattern of carbon uptake through photosynthesis. We recommend that additional observations important for water cycling (e.g., snow water equivalent, land surface temperature) be included to improve the accuracy of the spatial pattern in carbon uptake. Furthermore, the assimilation system should be enhanced to maximize the number of simulated state variables that are adjusted, especially those related to the recommended observed quantities, including water cycling and soil carbon.

Key Points

  • Assimilating observations of biomass and leaf area reduces simulated biomass and projects a weak land carbon sink across the Western United States

  • Our estimate of carbon exchange contrasts with an independent FLUXCOM estimate that shows a significant carbon sink in the Western United States

  • Water cycle observations should be used to complement biomass observations to improve the spatial pattern of modeled carbon fluxes

Plain Language Summary

The Western United States is dominated by natural lands that play a critical role in carbon balance (e.g., trees, soils), water quality, and timber reserves. This region is also particularly vulnerable to tree death from drought, insect attack, and wildfires, thus requiring constant monitoring to assess its health. Traditional carbon monitoring techniques are usually not possible within mountainous terrain, so we used satellite observations of leaf area and forest biomass to improve model simulations of the Western United States. When we accounted for observations of trees, our model estimates showed reduced amounts of biomass and relatively little transfer of CO2 from the atmosphere to the land (the land absorbs carbon from the atmosphere through photosynthesis). Our best estimate of carbon absorbed by the land was much less than other model estimates. This suggests our method better accounted for the current condition of the trees, including death from fire, insect attack, and drought. Our model estimate of biomass and carbon balance across the Western United States can be improved further by considering more observations of the land surface related to soil moisture and soil carbon.

1 Introduction

The Western United States holds a significant fraction of the country's carbon reserves (M. Liu et al., 2015; Schimel et al., 2000), which are highly sensitive to variation in weather and to longer-term changes in climate. The vast majority of annual precipitation in this region falls during the winter, predominantly as snow at higher elevations (Hunter et al., 2006; Monson et al., 2002), and winters are followed by warm, dry summers. The winter snowpack is a critical reservoir for water resources that helps bridge the precipitation gap during the dry summer months (Hu et al., 2010). The Western United States also has one of the fastest warming trends since 1950 (0.2°C decade⁻¹) within the contiguous US (NOAA, 2020), a trend that shifts more winter precipitation from snow to rain (Knowles et al., 2006). The warming trend reduces spring snowpack and, combined with increased surface evaporation, reduces soil moisture, photosynthesis, and land-atmosphere carbon uptake (e.g., Ma et al., 2012). The reduced soil moisture exerts physiological stress on forests and can increase drought-induced mortality (Anderegg et al., 2016), insect attacks (Hicke et al., 2013; Negron et al., 2008), and wildfires (Wiedinmyer & Neff, 2007; Williams et al., 2016). Given the sensitivity of these ecosystems to weather, climate, and disturbance, it is critical to monitor forest health by estimating biomass and terrestrial carbon uptake.

The Western United States is characterized by complex, mountainous terrain that challenges techniques for monitoring carbon fluxes, whether through the eddy covariance approach (Yi et al., 2008) or through atmospheric inversions that require CO2 observations (Lin et al., 2017). Complex terrain promotes complex flow patterns that violate assumptions of the eddy covariance approach, and atmospheric flow is difficult to characterize because it interacts with the fine-scale heterogeneity of the terrain. Given these challenges and the overall sparsity of observations, some approaches to estimating regional land-atmosphere carbon exchange have combined terrestrial biosphere models (TBMs) with atmospheric inverse analyses and eddy covariance-based flux observations (Desai et al., 2011; Jung et al., 2020; Sun et al., 2010). Atmospheric inversions rely on an observation network of atmospheric CO2 and infer the land-atmosphere carbon flux using atmospheric transport models. TBMs, on the other hand, represent land surface processes and are designed to upscale estimates of carbon exchange spatially without requiring direct observations of atmospheric CO2 or carbon exchange.

TBMs are driven by meteorological data to simulate land surface carbon dynamics through a representation of radiative transfer, energy balance, and carbon, water, and nutrient cycling. While TBMs have become increasingly mechanistic and detailed in their representation of ecological processes (Arora et al., 2020), significant limitations remain. Errors within TBMs arise from inaccuracies in meteorological data sets (Abatzoglou, 2013; Thornton & Running, 1999) and in the initialization of state variables (Bonan & Doney, 2018; Dietze, 2017), as well as from the poor representation of key ecological processes (Keenan et al., 2012; Luo et al., 2015; Trugman et al., 2018). Although the exact contribution of each of these factors to model error depends on the time scale of the simulation, initial conditions contribute the most to uncertainty for short-term hindcasting or forecasting (Bonan & Doney, 2018; Dietze, 2017). To reduce this type of uncertainty, the observed state may be prescribed directly in the model (e.g., Antonarakis et al., 2011; Moore et al., 2008). Nevertheless, portions of the system state remain unobservable (e.g., the subsurface), thereby necessitating a spin-up in which the TBM is run for many centuries to generate a near-equilibrium system state (e.g., Thornton & Rosenbloom, 2005). However, this spun-up system state is still subject to biases from meteorological data and structural uncertainty, and it also depends on the accuracy of disturbance and land-use change maps to properly represent the present-day state of the system. Thus, data assimilation frameworks have been developed to adjust TBMs to better represent observations.

Data assimilation (DA) is the process of combining a model with observations to improve the accuracy of the simulated system. DA can be used to adjust either model parameters (i.e., intrinsic values that influence the model behavior) or the model state (i.e., emergent properties of the model influenced by parameters and other factors); here, we focus on model state assimilation. DA has been applied extensively to better simulate Earth system processes, including those related to the atmosphere (Kalnay, 2003; Lynch, 2006), oceans (e.g., Karspeck et al., 2013), and subsurface hydrology (e.g., Beven & Freer, 2001). DA has also been used to understand land surface processes of carbon and water exchange (e.g., Fox et al., 2009; Raupach et al., 2005; Trudinger et al., 2007; Williams et al., 2009), leading to the development of land surface DA systems (LDAS) (e.g., Albergel et al., 2017; Kumar, Jasinski, et al., 2019; McNally et al., 2017; Xia et al., 2019). Particularly relevant for the contiguous US (CONUS), satellite-derived observations of leaf area index (LAI) were assimilated into the Noah-MP TBM, which improved the simulation of water cycling (Kumar, Jasinski, et al., 2019). This adjustment of LAI also improved the simulation of the carbon cycling processes of gross primary productivity (GPP) and ecosystem respiration (ER) (Kumar, Mocko, et al., 2019). Assimilating multiple satellite-derived data streams of LAI and soil moisture improved the simulation of LAI and GPP across the CONUS (Albergel et al., 2018). More generally, using DA to fuse satellite-derived observations of LAI, GPP proxies (e.g., SIF, NDVI, and EVI), surface soil moisture, and evapotranspiration to constrain global to regional estimates of land surface water and carbon cycling continues to be a promising area of research (Schimel & Schneider, 2019; Smith et al., 2020).

The Data Assimilation Research Testbed (DART; Anderson et al., 2009) has been coupled with components of the Community Earth System Model (CESM), including the atmosphere (CAM; Raeder, Anderson, Collins, T. J. Hoar, et al., 2012) and land (CLM; Kwon et al., 2016; Zhang et al., 2014), as well as with other complex models. The LDAS version (CLM-DART) has been shown to improve the representation of biomass at a site in New Mexico (Fox et al., 2018) as well as globally (Ling et al., 2019). CLM has also been used extensively to investigate water and carbon cycling within the Western United States (Burns et al., 2018; Duarte et al., 2017; Raczka et al., 2016, 2019; Wieder et al., 2017).

In this analysis, we go beyond previous CLM-DART assimilation studies by implementing the latest CLM release (CLM5.0) with updates to DART (CLM5-DART). We simultaneously assimilate satellite-derived observations of LAI and above-ground biomass (AGB) to provide observation-constrained estimates of carbon exchange across the complex terrain of the Western United States. We ask: does assimilating observations of AGB and LAI significantly change the magnitude and pattern of simulated biomass? How does this assimilation affect the modeled net carbon exchange across the Western United States? Unlike previous assimilation studies, we also quantify the uncertainty of this observation-constrained representation of the carbon cycle by varying components of the assimilation system, including the adjusted state variables, the type of observation, observation acceptance, and the approach used to estimate observation uncertainty. Given that the Western United States is a strongly water-limited region, we also examine the role of water limitation in the carbon cycle and the extent to which CLM captures hydrological variables.

2 Materials and Methods

2.1 CLM5-DART Overview

We used an ensemble DA system to provide improved estimates of biomass and carbon flux across the Western United States. This assimilation system has three main components: (a) the Community Land Model v5.0 (CLM5.0; Lawrence et al., 2019), which represents the land surface within the Community Earth System Model v2.0 (CESM2.0; Danabasoglu et al., 2020); (b) DART; and (c) satellite-derived observations of above-ground biomass and leaf area index. An overview of CLM5-DART is provided in Figure 1. All abbreviations are defined in Table S1. CLM5.0 simulates the carbon, nitrogen, water, and energy cycles across the land surface (Section 2.2). During a free CLM5.0 simulation, meteorological data are prescribed to CLM5.0, and the model alone (with no observations included) simulates the land surface behavior. The meteorological data are prescribed from an 80-member ensemble DART reanalysis (Raeder, Anderson, Collins, & T. Hoar, 2012) generated with the Community Atmosphere Model (CAM4.0).

Figure 1. An overview of the assimilation system (Community Land Model v5 [CLM5]-Data Assimilation Research Testbed [DART]) that combines observations with ensemble simulations from CLM5.0. Through the use of an Ensemble Kalman Filter, satellite-derived observations are used to adjust the simulated model state of Community Land Model v5.0 (CLM5.0). In this analysis, the state variables related to above-ground biomass and leaf area are adjusted within CLM5-DART, whereas the fluxes, soil carbon, and soil water are downstream variables (Section 2.3). In general, this adjustment decreased the magnitude of the carbon pools from the initial to the posterior carbon states.

The 80 CAM4 ensemble members are designed to sample the uncertainty in the atmospheric reanalysis (Raeder, Anderson, Collins, Hoar, et al., 2012), which creates spread across the 80-member ensemble of CLM5.0 simulations. During an assimilation run, DART (Section 2.3) adjusts the simulated state of the land surface to better match observations (Section 2.4). The adjustment step (assimilation) depends on the observation uncertainty and the covariance between the CLM5.0 state components and the observations. This assimilation step and the subsequent improved forecast simulation are intended to reduce model simulation error that can arise from either meteorological or model structural error. A more complete description of each component of the assimilation system is provided in Sections 2.2-2.4.

2.2 Community Land Model CLM5.0

CLM5.0 is the land surface component of the Earth system model CESM2.0. The details of CLM5.0 are provided in the supplement (Text S1) and elsewhere (Lawrence et al., 2019). Here, we provide an overview emphasizing the model components important for the water-limited Western United States, where CLM has been applied before (Buotte et al., 2019; Burns et al., 2018; Duarte et al., 2017; Raczka et al., 2016, 2019; Wieder et al., 2017). The surface topography is represented by elevation and slope, which influence the evolution of snow cover, liquid water runoff, and infiltration into the soil. The soil depth is spatially explicit (Pelletier et al., 2016), with a description of soil texture (% sand and clay) that influences both subsurface drainage and the ability of plant roots to extract water. Soil evaporation is represented through diffusive transport of water vapor through a dry surface layer (Swenson & Lawrence, 2014). Lateral subsurface water flow and terrain aspect are not explicitly represented.

The vegetated land surface within CLM5.0 is represented by grid cells divided into areas of natural vegetation (e.g., trees, grasses, and shrubs), with as many as 14 plant functional types (PFTs), and crops, with 64 possible crop functional types. Similarly, soil properties are classified by columns. Individual PFTs and columns are not assigned a specific location within the grid cell. Photosynthesis in C3 plants is based on the approach of Farquhar et al. (1980), which simulates leaf-level photosynthesis by taking the minimum of the Rubisco-limited, light-limited, and product-limited rates of carboxylation. Carbon uptake is coupled to the nitrogen cycle through the Leaf Use of Nitrogen for Assimilation (LUNA) model (Ali et al., 2016; Xu et al., 2012), which calculates a dynamic photosynthetic capacity through the optimal use of foliar nitrogen. Plants can expend carbon to gain access to nitrogen, following the Fixation and Uptake of Nitrogen (FUN) model (Shi et al., 2016). CLM5.0 also adopts variable carbon:nitrogen allocation ratios to plant tissue, which provides a more realistic relationship between nitrogen limitation and stomatal conductance (Ghimire et al., 2016). The maximum leaf stomatal conductance depends primarily on the water vapor pressure deficit (Medlyn et al., 2011).
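The two leaf-level relationships named above can be illustrated in a few lines. This is a minimal sketch, not CLM5.0 code: it assumes the three limiting rates have already been computed, and the function names and the values of g0, g1, and the inputs are hypothetical. The conductance follows the published Medlyn et al. (2011) form, in which conductance falls as the vapor pressure deficit D rises.

```python
import math


def limiting_rate(w_c: float, w_j: float, w_p: float) -> float:
    """Carboxylation rate as the minimum of the Rubisco-limited (w_c),
    light-limited (w_j), and product-limited (w_p) rates (umol CO2 m-2 s-1)."""
    return min(w_c, w_j, w_p)


def medlyn_gs(a_net: float, c_a: float, vpd_kpa: float, g0: float, g1: float) -> float:
    """Stomatal conductance to water vapor (mol m-2 s-1) in the Medlyn et al. (2011)
    form g_s = g0 + 1.6 * (1 + g1 / sqrt(D)) * A / C_a.

    a_net   net assimilation A (umol m-2 s-1)
    c_a     CO2 mole fraction at the leaf surface C_a (umol mol-1)
    vpd_kpa vapor pressure deficit D (kPa), must be positive
    g0, g1  intercept (mol m-2 s-1) and slope (kPa^0.5) parameters
    """
    if vpd_kpa <= 0.0:
        raise ValueError("vapor pressure deficit must be positive")
    return g0 + 1.6 * (1.0 + g1 / math.sqrt(vpd_kpa)) * a_net / c_a


# Hypothetical inputs: light limitation sets the rate here.
rate = limiting_rate(w_c=12.0, w_j=9.0, w_p=15.0)
# Same assimilation, drier air: conductance drops as D increases.
gs_moist = medlyn_gs(a_net=8.0, c_a=400.0, vpd_kpa=1.0, g0=0.01, g1=4.0)
gs_dry = medlyn_gs(a_net=8.0, c_a=400.0, vpd_kpa=4.0, g0=0.01, g1=4.0)
print(rate, round(gs_moist, 3), round(gs_dry, 3))  # 9.0 0.17 0.106
```

In CLM5.0 these pieces interact with the LUNA, FUN, and plant hydraulic stress components, so the sketch only shows the shape of each relationship in isolation.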

Water-carbon coupling in CLM5.0 follows a plant hydraulic stress (PHS) approach (Kennedy et al., 2019). The PHS model explicitly represents water transport through plant tissue and calculates the water potential of roots, xylem, and leaves. The model balances water supply and demand by reducing the maximum stomatal conductance to avoid excessively high xylem tension and low leaf water potential. The extent to which soil water limitation affects photosynthesis is quantified through a diagnostic soil moisture stress value (β_t; see Text S1), which can be used to evaluate the impact of water limitation on the simulated carbon cycling processes.

CLM5.0 assumes a fixed, empirical plant mortality rate of 2% year⁻¹ (Lawrence et al., 2018). The model can simulate prognostic fire-related mortality; however, the fire module was turned off for our simulations because it tends to overestimate burned area (Buotte et al., 2019). Despite the lack of an explicit representation of disturbance-driven mortality within CLM, the assimilation framework (CLM5-DART) accounts for mortality-related disturbances by adjusting the model based on AGB and LAI observations. We used the CLM5.0.03 release tag for all of the simulations discussed in this manuscript.
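To make the fixed mortality rate concrete, the sketch below converts 2% year⁻¹ into a per-time-step loss and applies it for one model year. It assumes a constant per-second rate applied at every 30-min time step (the step length mentioned in Section 2.6), a 365-day calendar, and no new growth; it is an illustration, not the CLM5.0 mortality routine.

```python
import math

ANNUAL_RATE = 0.02              # 2% yr-1 mortality rate quoted in the text
SECONDS_PER_YEAR = 365 * 86400  # assumes a 365-day (no-leap) calendar
DT = 1800                       # 30-min model time step (s)


def loss_fraction_per_step(annual_rate: float = ANNUAL_RATE, dt: int = DT) -> float:
    """Fraction of a pool removed per time step when the annual rate is applied
    as a constant per-second rate."""
    return annual_rate / SECONDS_PER_YEAR * dt


def pool_after_one_year(pool: float, annual_rate: float = ANNUAL_RATE) -> float:
    """Remove the per-step fraction at every time step for one year (no growth)."""
    frac = loss_fraction_per_step(annual_rate)
    for _ in range(SECONDS_PER_YEAR // DT):
        pool -= pool * frac
    return pool


remaining = pool_after_one_year(10.0)   # hypothetical 10 kg C m-2 stem pool
print(round(10.0 - remaining, 3))       # 0.198 (kg C m-2 lost in one year)
```

Because the loss is taken from the remaining pool at each step, the pool decays approximately as exp(−0.02 t), so slightly less than 2% of the starting pool is removed in a year.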

2.3 Data Assimilation Research Testbed

We coupled CLM5.0 with DART, an open-source ensemble DA software system developed by the National Center for Atmospheric Research (Anderson et al., 2009). DART has previously been coupled with other geophysical models, including the atmosphere and ocean components of CESM (e.g., Karspeck et al., 2013, 2018; Raeder, Anderson, Collins, Hoar, et al., 2012). Most relevant for our application, DART has been coupled with CLM4.5 for a site-level simulation in New Mexico (Fox et al., 2018) and with CLM4.0 for a global simulation (Ling et al., 2019). In general, we followed an approach similar to Fox et al. (2018), except that the assimilation domain was expanded to the Western United States (Section 2.7). We used the Ensemble Adjustment Kalman Filter (EAKF; Anderson, 2001), which, unlike the more traditional Ensemble Kalman Filter, calculates the analysis spread deterministically from the prior and observation uncertainty. We applied “adaptive inflation” to increase the ensemble spread and help prevent the ensemble forecast from drifting away from the observations (Anderson, 2007; Anderson et al., 2009). It is considered “adaptive” because the inflation values can vary in time, space, and by state variable. The inflation algorithm is designed to inflate state variables that are correlated with the observation; in general, the larger the mismatch between the observation and the prior ensemble estimate, the more the inflation increases. This increase in ensemble spread also helps in the calculation of the state variable covariance, which is important for model updates during the assimilation step.

We applied a spatial localization function (Gaspari & Cohn, 1999) with a cutoff value (half of the distance beyond which the observation has no influence) of 0.015 radians. The observation outlier threshold was set to 3 standard deviations of the combined prior distribution and observation likelihood away from the model ensemble mean. Observations outside this range were rejected and not used in the assimilation. This approach protects against unrealistic observations not caught by quality assurance/quality control (QA/QC) and also prevents large increments during the assimilation that may destabilize the model. Large increments may occur when the observation uncertainty is underestimated or the observations are subject to systematic biases. For example, some LAI data products are known to contain systematically low values, in part because of interference from cloud cover and snow cover (Fang et al., 2013; Garrigues et al., 2008; Jin et al., 2017). The model-observation residual may also be excessively large because of underlying model structural error.
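The core observation-space steps described above can be sketched for a single scalar observation. This is a minimal NumPy illustration under stated assumptions, not DART's Fortran implementation: the inflation factor `lam` is passed in rather than estimated adaptively, errors are treated as Gaussian, and the function names are hypothetical. The outlier test follows the 3-standard-deviation rule in the text, and the EAKF step shifts and contracts the prior ensemble deterministically so that its mean and variance match the product of the prior and the observation likelihood.

```python
import numpy as np


def inflate(prior: np.ndarray, lam: float) -> np.ndarray:
    """Multiply the ensemble variance by lam while keeping the ensemble mean."""
    mean = prior.mean()
    return mean + np.sqrt(lam) * (prior - mean)


def is_outlier(prior_obs: np.ndarray, obs: float, obs_var: float,
               threshold: float = 3.0) -> bool:
    """True if the observation lies more than `threshold` standard deviations of the
    combined prior + observation distribution from the prior ensemble mean."""
    spread = np.sqrt(prior_obs.var(ddof=1) + obs_var)
    return bool(abs(prior_obs.mean() - obs) / spread > threshold)


def eakf_increments(prior_obs: np.ndarray, obs: float, obs_var: float) -> np.ndarray:
    """Deterministic EAKF increments for one scalar observation (Anderson, 2001)."""
    prior_mean = prior_obs.mean()
    prior_var = prior_obs.var(ddof=1)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    # Shift the ensemble to the posterior mean and contract its spread; no
    # random perturbations are drawn, unlike the stochastic EnKF.
    posterior = post_mean + np.sqrt(post_var / prior_var) * (prior_obs - prior_mean)
    return posterior - prior_obs


# Hypothetical 4-member prior LAI ensemble (m2 m-2), inflated, and one observation.
prior_lai = inflate(np.array([1.0, 2.0, 3.0, 4.0]), lam=1.1)
obs_lai, obs_var = 2.0, 0.2**2
accepted = not is_outlier(prior_lai, obs_lai, obs_var)
posterior_lai = prior_lai + eakf_increments(prior_lai, obs_lai, obs_var) if accepted else prior_lai
```

A rejected observation leaves the prior unchanged, which is how the outlier check guards against large, destabilizing increments.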

The DART system customizes how the model state variables are influenced by the observations (AGB, LAI). Specifically, CLM state variables may be directly adjusted by the EAKF within DART (adjusted variables) or indirectly influenced through the internal dynamics of CLM (downstream variables). In our case, for example, the adjusted state variables are the components of AGB (leaf carbon, live stem carbon, and dead stem carbon) and LAI (i.e., leaf carbon and leaf-area-specific parameters). In this way, unobserved variables are adjusted because of their strong relationship to the observed variables. The EAKF updates the adjusted state variables through their ensemble correlation with the observed variables, and the magnitude of the assimilation update scales with the strength of this correlation. This is a powerful feature of ensemble Kalman filters: any unobserved state variable that is correlated with an observation can be updated.
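The regression step that spreads an observation increment to a correlated, possibly unobserved, state variable can be sketched as follows. Assumptions: a single observation, ensemble arrays for one grid cell, and distances in radians; the helper names are hypothetical. The localization weight uses the Gaspari and Cohn (1999) fifth-order function with the 0.015-radian half-width quoted above, so an observation has no influence beyond 0.03 radians. The update is proportional to the regression coefficient cov(x, y)/var(y), which is why strongly correlated variables receive larger adjustments.

```python
import numpy as np


def gaspari_cohn(dist: float, half_width: float = 0.015) -> float:
    """Gaspari & Cohn (1999) compactly supported weight: 1 at zero distance,
    0 at and beyond 2 * half_width (distances and half_width in radians)."""
    r = abs(dist) / half_width
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r < 2.0:
        return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
                - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    return 0.0


def regress_increments(state: np.ndarray, prior_obs: np.ndarray,
                       obs_increments: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """Increments for one state variable: regression coefficient of the state on the
    observed quantity across ensemble members, times the localization weight."""
    cov = np.cov(state, prior_obs, ddof=1)[0, 1]
    slope = cov / prior_obs.var(ddof=1)
    return weight * slope * obs_increments


# Hypothetical ensembles: leaf carbon (g C m-2) tracking simulated LAI (m2 m-2).
lai = np.array([1.0, 2.0, 3.0, 4.0])
leaf_c = 50.0 * lai
lai_inc = np.array([-0.2, -0.4, -0.6, -0.8])
leaf_c_inc = regress_increments(leaf_c, lai, lai_inc, weight=gaspari_cohn(0.0))
```

For a state variable in a neighboring grid cell, the same call would use `gaspari_cohn(distance)` as the weight, shrinking the update smoothly to zero at twice the half-width.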

In theory, the number of adjusted state variables within the assimilation system should be maximized (and the number of downstream variables minimized) to accelerate the rate at which the model state comes into equilibrium with the observations and to keep the entire model state internally consistent. In practice, however, the decision whether to adjust a state variable during the assimilation update should be based on physical realism and the strength of the variable's correlation with the observations. For these reasons, we selected adjusted state variables that were physically related and correlated to the observations, in general accordance with Fox et al. (2018). We performed a range of assimilations in which we varied the number (4, 13, and 19) of adjusted state variables (Section 2.8). These simulations include AGB carbon-related variables (state-4), above- and below-ground carbon and nitrogen variables (state-13), and, in addition, litter carbon and nitrogen variables (state-19). The complete list of adjusted state variables is provided in Table 1. A preliminary analysis guided our decision to use state-13 as the standard assimilation on which we base our analysis and results (Sections 3.2 and 4.2).

Table 1. Complete List of the Adjusted State Variables Within CLM5-DART for Each Assimilation Run

State-4 assimilation
  • Leaf carbon (0.5 kg C m⁻²)
  • Live stem carbon (0.25 kg C m⁻²)
  • Dead stem carbon (15 kg C m⁻²)
  • Leaf area index (10 m² m⁻²)

State-13 assimilation (adds to the state-4 variables)
  • Fine root carbon (0.8 kg C m⁻²)
  • Live coarse root carbon (0.020 kg C m⁻²)
  • Dead coarse root carbon (6.0 kg C m⁻²)
  • Leaf nitrogen (0.01 kg N m⁻²)
  • Fine root nitrogen (0.02 kg N m⁻²)
  • Live coarse root nitrogen (0.0008 kg N m⁻²)
  • Dead coarse root nitrogen (0.02 kg N m⁻²)
  • Live stem nitrogen (0.003 kg N m⁻²)
  • Dead stem nitrogen (0.07 kg N m⁻²)

State-19 assimilation (adds to the state-13 variables)
  • Litter carbon, slow (20 kg C m⁻²)
  • Litter carbon, medium (30 kg C m⁻²)
  • Litter carbon, fast (3.0 kg C m⁻²)
  • Litter nitrogen, slow (0.2 kg N m⁻²)
  • Litter nitrogen, medium (0.35 g N m⁻²)
  • Litter nitrogen, fast (0.04 kg N m⁻²)

  • Note. The value in parentheses after each state variable is the upper bound imposed on the adjustment implemented by DART. Each assimilation (the number indicates the total number of adjusted state variables) includes the adjusted state variables listed under it plus those of all preceding assimilations. The state-13 assimilation is the standard assimilation that we compare against other estimates of biomass and carbon flux, including the free run and FLUXCOM. The other assimilations (state-4, state-19) are used as diagnostics.
  • DART, Data Assimilation Research Testbed.

In contrast to the adjusted state variables, downstream state variables are all other variables within the CLM state vector that are not updated during the assimilation. These variables are still influenced indirectly by the assimilation through biophysical and feedback mechanisms within the CLM simulation. For example, adjustment of AGB-related state variables by the EAKF indirectly influences other CLM state variables, including soil carbon and nitrogen pools and water variables. After the EAKF adjustment, we also imposed an upper limit on each state variable to prevent unrealistic values and promote overall model stability (Table 1). We set the upper limit values to approximately double the domain-wide maximum value from the free simulation. The upper limit is fixed throughout the assimilation. The DART software used for the model runs was taken from Subversion revision 12752.
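A sketch of the fixed upper limit, using the state-4 bounds from Table 1. The dictionary keys are descriptive labels rather than CLM5.0 variable names, and the helper that derives a bound as roughly twice the free-run domain maximum is a hypothetical illustration of the rule stated above (the published bounds are rounded). Only an upper cap is applied, as the text describes.

```python
import numpy as np

# Upper limits for the state-4 adjusted variables (Table 1).
UPPER_BOUNDS = {
    "leaf_carbon": 0.5,        # kg C m-2
    "live_stem_carbon": 0.25,  # kg C m-2
    "dead_stem_carbon": 15.0,  # kg C m-2
    "leaf_area_index": 10.0,   # m2 m-2
}


def bound_from_free_run(free_run_values) -> float:
    """Roughly double the domain-wide maximum of a free-run field."""
    return 2.0 * float(np.max(free_run_values))


def cap_posterior(posterior: np.ndarray, variable: str) -> np.ndarray:
    """Limit post-EAKF values to the fixed upper bound for `variable`."""
    return np.minimum(posterior, UPPER_BOUNDS[variable])


print(cap_posterior(np.array([0.3, 0.7]), "leaf_carbon"))  # [0.3 0.5]
```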

2.4 Above-Ground Biomass and Leaf Area Observations

We used satellite-derived global data sets of AGB and LAI observations to adjust CLM5.0 during the assimilations. The global AGB data set was based on measurements from an ensemble of passive microwave sensors sensitive to vegetation optical depth (VOD; Y. Y. Liu et al., 2015), combined with an empirical relationship between VOD and AGB (Saatchi et al., 2011), to provide an AGB map at annual resolution from 1993 to 2012. Although other biomass maps exist for the Western United States, almost all of them either do not cover the required years (1998–2011) or are limited to a specific vegetation type (e.g., forest). The annual AGB product uncertainty was taken from the 95% uncertainty bounds of the empirical relationship between VOD and AGB (Y. Y. Liu et al., 2015). On average, the relative uncertainty of this AGB product was 5%. We downscaled the AGB data from annual to monthly time steps through linear interpolation and assumed that the monthly uncertainty was the same as the annual uncertainty (5%). To first order, a linear interpolation of AGB captures the tendency of vegetation to allocate new growth throughout the year but ignores specific phenological events such as spring leaf flush and senescence. Nevertheless, this approach is sufficient to remove the systematic biases in AGB within CLM5.0.
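The annual-to-monthly downscaling of AGB can be sketched with linear interpolation. Assumptions not stated in the text: each annual value is placed at mid-year, each month at its midpoint, the years are consecutive, and months before the first or after the last mid-year point are held at the nearest annual value (the behavior of `numpy.interp` outside its range). The function name is hypothetical. The 5% relative uncertainty is carried over unchanged to each monthly value, as described above.

```python
import numpy as np


def annual_to_monthly(first_year: int, agb_annual: np.ndarray, rel_unc: float = 0.05):
    """Linearly interpolate consecutive annual AGB values to monthly values.

    Returns (decimal time of each month midpoint, monthly AGB, monthly 1-sigma
    uncertainty), with the relative uncertainty kept equal to the annual one.
    """
    n_years = len(agb_annual)
    t_annual = first_year + np.arange(n_years) + 0.5                 # mid-year
    t_monthly = first_year + (np.arange(12 * n_years) + 0.5) / 12.0  # mid-month
    agb_monthly = np.interp(t_monthly, t_annual, agb_annual)
    return t_monthly, agb_monthly, rel_unc * agb_monthly


# Hypothetical two-year grid-cell series (same units as the AGB product).
t, agb_m, sigma_m = annual_to_monthly(2000, np.array([10.0, 12.0]))
```

Because the interpolation is piecewise linear between annual values, spring leaf flush and autumn senescence are absent from the monthly series, which is the limitation noted above.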

The LAI data product (GIMMS LAI3g) was generated from a neural network algorithm that used the third-generation Global Inventory Modeling and Mapping Studies (GIMMS) Normalized Difference Vegetation Index (NDVI3g) and the Terra Moderate Resolution Imaging Spectroradiometer (MODIS) LAI and FPAR products (Zhu et al., 2013). We used the LAI3g product files (15-day frequency) from the first half of each month for our assimilations. The native-resolution uncertainty estimate was based on the root-mean-square error (RMSE) between GIMMS LAI3g and field-measured LAI (0.68 m² m⁻²; Zhu et al., 2013). The RMSE was calculated from 45 data sets from 29 field sites spanning six biomes that represent different land-cover classifications (Zhu et al., 2013).

2.5 Spatial Aggregation of Observations and Uncertainty Propagation

We aggregated (spatially averaged) the observations from the native product resolution for LAI (0.083° × 0.083°) and AGB (0.25° × 0.25°) to the CLM resolution (0.95° × 1.25°) to avoid a mismatch in land surface representation. The source of this mismatch is that a CLM grid cell is a statistical representation of multiple PFTs and soil columns without specific subgrid locations (Section 2.2). This lack of fine-scale information makes it impossible to relate specific PFTs/columns to the fine-scale observations. Currently, DART calculates a weighted average of all PFTs and columns within a grid cell, which is more directly comparable to observations aggregated to CLM resolution. It is possible to design DART to adjust only the subgrid PFTs/columns most related to the observed biome types (e.g., forest, grasslands, and shrubs); however, our observation data sets do not include that level of biome-specific information. In practice, failure to aggregate the observations to the resolution of CLM led to model instability during the assimilation (data not shown), presumably because of the mismatch in scale between the observed and simulated land surface properties. Potential alternatives to this approach are discussed in Text S2.
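The spatial aggregation step can be sketched as a simple block average: every fine-resolution pixel whose center falls inside a coarse CLM grid cell contributes equally, and missing pixels are skipped. This is an illustration under assumptions (equal pixel weights, no area weighting on the sphere, a hypothetical function name), not the preprocessing code used in the study. The returned counts give the number of observations N per grid cell.

```python
import numpy as np


def block_average(lat, lon, values, lat_edges, lon_edges):
    """Average fine-resolution values into coarse cells defined by ascending edges.

    Returns (means, counts); cells with no valid pixels have mean NaN and count 0.
    """
    lat, lon, values = (np.asarray(a, dtype=float) for a in (lat, lon, values))
    valid = np.isfinite(values)
    i = np.digitize(lat[valid], lat_edges) - 1  # row index of each valid pixel
    j = np.digitize(lon[valid], lon_edges) - 1  # column index of each valid pixel
    n_rows, n_cols = len(lat_edges) - 1, len(lon_edges) - 1
    inside = (i >= 0) & (i < n_rows) & (j >= 0) & (j < n_cols)
    sums = np.zeros((n_rows, n_cols))
    counts = np.zeros((n_rows, n_cols), dtype=int)
    np.add.at(sums, (i[inside], j[inside]), values[valid][inside])
    np.add.at(counts, (i[inside], j[inside]), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return means, counts


# Hypothetical pixels on a 2 x 2 coarse grid; the NaN pixel is ignored.
means, counts = block_average([0.1, 0.2, 1.1, 0.3], [0.1, 0.1, 0.1, 1.5],
                              [1.0, 3.0, 5.0, np.nan], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
```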

The spatial averaging of the observations also required propagating the observation uncertainty from fine to coarse spatial resolution. Because a detailed characterization of the native-resolution uncertainty was not available, we took the following approach. We assumed the uncertainty of the AGB product was fully systematic because the VOD-AGB empirical relationship was based on a single biome (tropics) that differed from our experimental region (Y. Y. Liu et al., 2015). Therefore, during the aggregation step, we assumed the AGB product uncertainty did not decrease with sample size but maintained the same 5% relative uncertainty (Section 2.4). The LAI product, on the other hand, was designed to be applied globally across all biomes (Zhu et al., 2013); thus, we assumed random error without spatial correlation and used a random error propagation formula in which the uncertainty decreases by a factor of 1/√N (Taylor, 1997), where N is the number of observations within a CLM grid cell (0.95° × 1.25°). This provided a reduced aggregated LAI uncertainty (0.2 m² m⁻²). To test the impact of the AGB uncertainty on the assimilation, we performed a diagnostic assimilation (“low” uncertainty) in which, as for the LAI observations, we assumed that the AGB uncertainty decreased by a factor of 1/√N.
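The two error-propagation assumptions above differ only in whether averaging reduces the error. A minimal sketch follows; the function name, the grid-cell mean, and the pixel counts are hypothetical, and because the real N varies by grid cell the example does not attempt to reproduce the 0.2 m² m⁻² value reported above.

```python
import math


def aggregated_sigma(sigma_native: float, n_obs: int, systematic: bool) -> float:
    """1-sigma uncertainty of a grid-cell mean of n_obs native-resolution observations.

    systematic=True  -> fully correlated errors: averaging does not reduce them.
    systematic=False -> independent random errors: uncertainty shrinks by 1/sqrt(N).
    """
    if n_obs < 1:
        raise ValueError("need at least one observation")
    return sigma_native if systematic else sigma_native / math.sqrt(n_obs)


# AGB (standard case): 5% of a hypothetical grid-cell mean of 8.0 stays 5%.
agb_sigma = aggregated_sigma(0.05 * 8.0, n_obs=100, systematic=True)
# LAI: native RMSE of 0.68 m2 m-2 and a hypothetical N = 16 pixels.
lai_sigma = aggregated_sigma(0.68, n_obs=16, systematic=False)
# "Low" AGB uncertainty diagnostic: treat AGB errors as random too.
agb_sigma_low = aggregated_sigma(0.05 * 8.0, n_obs=100, systematic=False)
```

With N = 100 the "low" AGB uncertainty is ten times smaller than the standard case, which illustrates how strongly the systematic-versus-random choice controls the observation weight in the EAKF.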

2.6 Coupling CLM5.0 and DART

Data assimilation within DART was carried out by running CLM5.0 for a duration that matched the monthly frequency of the observations. A single assimilation cycle worked as follows: (a) a month-long 80-member ensemble simulation was performed with CLM5.0; (b) the CLM restart file containing the complete model state vector was output at the end of the simulation; (c) the state vector within the restart file was updated using the EAKF; and (d) the updated restart file was reinserted into CLM, and the simulation was restarted for the next month. The 80-member ensemble simulation was run in CESM multi-instance mode, such that only one executable was required. The model state at the end of the month-long simulation period is known as the forecast, or prior, state of the ensemble simulation because it has not yet been constrained by the observations. At this point, with prior inflation enabled, the spread among ensemble members was increased. Next, an observation operator was applied to the CLM state vector to convert from model space to observation space. For example, the leaf carbon state variable was multiplied by the PFT's specific leaf area and by the fractional PFT coverage to provide a weighted grid-cell LAI average to compare against the observed LAI. The EAKF then performed an update that converted the prior LAI to the analysis, or posterior, LAI. These posterior observation-space values for LAI were then regressed back into model space (leaf carbon) and inserted back into the CLM restart file. Similarly, the modeled AGB was compared against the observed AGB by summing modeled leaf carbon, live stem carbon, and dead stem carbon, weighted by the fractional coverage of each PFT.
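The two observation operators described above can be sketched for a single grid cell. Assumptions: per-PFT arrays for one grid cell, fractional PFT coverage of the cell as the weights, and consistent units (for example, leaf carbon in g C m⁻² and specific leaf area in m² g C⁻¹, so that their product is LAI in m² m⁻²). The function names and PFT values are hypothetical, and the sketch ignores any further subgrid weighting details of CLM5.0.

```python
import numpy as np


def lai_operator(leaf_c, sla, pft_frac) -> float:
    """Grid-cell LAI: per-PFT leaf carbon times specific leaf area, weighted by
    the fractional coverage of each PFT."""
    leaf_c, sla, pft_frac = map(np.asarray, (leaf_c, sla, pft_frac))
    return float(np.sum(pft_frac * sla * leaf_c))


def agb_operator(leaf_c, live_stem_c, dead_stem_c, pft_frac) -> float:
    """Grid-cell AGB: per-PFT leaf + live stem + dead stem carbon, weighted by the
    fractional coverage of each PFT."""
    total = np.asarray(leaf_c) + np.asarray(live_stem_c) + np.asarray(dead_stem_c)
    return float(np.sum(np.asarray(pft_frac) * total))


# Hypothetical grid cell with a conifer PFT (60%) and a grass PFT (40%).
frac = [0.6, 0.4]
lai = lai_operator(leaf_c=[100.0, 50.0], sla=[0.01, 0.03], pft_frac=frac)
agb = agb_operator([100.0, 50.0], [200.0, 0.0], [5000.0, 0.0], frac)  # g C m-2
```

Because both operators are linear in the carbon pools, the ensemble regression described in Section 2.3 maps their increments back onto leaf and stem carbon without further transformation.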

Imposing EAKF updates upon the adjusted state variables within CLM can lead to model instability. Once CLM has been adjusted by the EAKF and restarted, the downstream state variables must come into equilibrium with the adjusted state variables. This can present challenges for the carbon, nitrogen, water, and energy balance checks within CLM, because the adjustments initially violate conservation of mass and energy. To prevent model failure, we turned off the CLM balance checks for the first 30-min model time step of each monthly simulation and turned them back on afterward. During preliminary assimilations, we also encountered balance check failures after the first time step; we found we could overcome these by assimilating monthly observations and by aggregating the observations to match the coarser spatial resolution of CLM5.0 (Section 2.5).
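The logic of suspending the balance check on the first time step can be illustrated with a generic storage budget. This is a hypothetical sketch, not CLM's balance-check code: the function names, the single-pool budget, and the tolerance are assumptions chosen only to show why an analysis increment, which no model flux accounts for, produces a residual on the restart step.

```python
def balance_error(begin_storage, end_storage, inputs, outputs, dt):
    """Residual of a storage budget over one step: change in storage minus net flux * dt."""
    return (end_storage - begin_storage) - (inputs - outputs) * dt

def balance_ok(step_since_restart, begin_storage, end_storage,
               inputs, outputs, dt, tol=1e-6):
    """Hypothetical check that is skipped on the first step after a DART restart.

    In this simplified picture the stored pools on that step include the EAKF
    increment, which no model flux accounts for, so the residual is not a
    genuine model error.
    """
    if step_since_restart == 0:
        return True
    return abs(balance_error(begin_storage, end_storage, inputs, outputs, dt)) < tol

dt = 1800.0  # 30-min time step in seconds
print(balance_ok(0, 100.0, 150.0, 0.0, 0.0, dt))  # True: check skipped after restart
print(balance_ok(1, 100.0, 150.0, 0.0, 0.0, dt))  # False: 50 units unaccounted for
```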

2.7 Experimental Region: Western United States

All simulations were performed within the Western United States (11 states), with the model domain bounded by the US borders with Mexico and Canada and extending from the Pacific Ocean to the eastern edge of Colorado (31.30°N–49.00°N, 124.40°W–102.05°W). This region is characterized by complex, mountainous terrain that includes the Rocky Mountains through the interior, the Marine West Coast Forests (e.g., Coast Range), and the Western Mountain Ranges (e.g., Cascades, Klamath, and Sierra Nevada Ranges) (Sleeter et al., 2012). Lower elevation ecoregions include Mediterranean California, Cold Deserts (e.g., Great Basin), and Warm Deserts (e.g., southern New Mexico and Arizona). Especially in the interior regions, the vegetation distribution depends strongly on elevation, latitude, predominant wind direction, and aspect (Bailey, 1995). Common vegetation types include Engelmann spruce and subalpine fir in the subalpine zone; ponderosa pine, aspen, juniper, and oak at intermediate elevations (montane zone); and sagebrush, oak, pinyon-juniper woodland, and blue grama grass in the foothill zone, with grasslands and shrublands commonly found in low-elevation valleys (Drummond, 2012).

In general, the Western United States is a semiarid region where the vast majority of precipitation falls during winter, primarily as snow at the highest elevations; this snowmelt is critical for water supply and plant growth in the surrounding ecoregions (Drummond, 2012). Annual precipitation is highest in the Pacific Northwest, which has a sharp west-to-east precipitation gradient ranging from 2,510 mm in the Coast Range to 120 mm in the Central Basin, and regional mean annual temperatures ranging from 7 to 13.5°C (Hudiburg et al., 2009). For comparison, the Southern Rocky Mountain Region is characterized by mean annual temperatures of 2 to 10°C and mean annual precipitation between 260 and 1,020 mm (Bailey, 1995).

2.8 Experimental Setup

We performed a series of CLM5-DART simulations (Table 2) to understand the impact of observations upon carbon-cycle behavior and to test a range of DART settings to quantify the uncertainty of the simulations. All simulations used the CLM5 BGC-Crop component set with the default representation of plant hydraulic stress (Lawrence et al., 2019) and with the fire module turned off. To provide near present-day conditions for the assimilation runs, a single-instance simulation was started from near bare-ground conditions and spun up for 200 years in accelerated decomposition mode, followed by 1,000 years in standard mode. This was followed by a transient simulation from 1850 to 2000. The spin-up phase used preindustrial atmospheric CO2 (285 ppmv) and prescribed year-1850 conditions for nitrogen deposition, aerosol deposition, and land-cover/use. The transient phase used prescribed, transient atmospheric CO2, nitrogen and aerosol deposition, and land-cover/use, along with cycled GRIDMET meteorology (Abatzoglou, 2013; Buotte et al., 2019). During both the spin-up and transient simulations, GRIDMET meteorology was prescribed to CLM5.0 by cycling the years 1980–2009. The GRIDMET data product is especially useful for complex, mountainous terrain because it combines temporally rich meteorological data from the North American Land Data Assimilation System (NLDAS-2; Mitchell et al., 2004) with high spatial resolution data from the Parameter-elevation Regressions on Independent Slopes Model (PRISM; Daly et al., 2008).
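Cycling a fixed 30-year meteorological record (1980–2009) through multi-century spin-up and transient runs amounts to a modular mapping from model year to forcing year. The sketch below assumes, hypothetically, that model year 1850 is aligned with forcing year 1980; the alignment actually used in these simulations is not stated here, and `forcing_year` is an illustrative helper rather than a CESM setting.

```python
def forcing_year(model_year, first=1980, last=2009, align=1850):
    """Forcing year used when a fixed meteorological record is cycled.

    `align` is the model year mapped to forcing year `first` (hypothetical value);
    the record then repeats every (last - first + 1) years.
    """
    n_years = last - first + 1
    return first + (model_year - align) % n_years

print([forcing_year(y) for y in (1850, 1879, 1880, 2000)])  # [1980, 2009, 1980, 1980]
```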

Table 2. The List of CLM5-DART Simulations Performed in This Analysis
| Simulation name | Observations assimilated, outlier threshold? | AGB observation uncertainty | Observation spatial/temporal resolution | Assimilation loops (1998–2011) | Adaptive inflation? | Adjusted CLM5 state variables | CLM5 spatial resolution |
|---|---|---|---|---|---|---|---|
| Free | None | N/A | N/A | 0 | N/A | 0 | 0.95° × 1.25° |
| State-13 | AGB/LAI, yes | High, systematic | 0.95° × 1.25°, monthly | 3 | Yes | 13 | 0.95° × 1.25° |
| State-13* | AGB/LAI, no | High, systematic | 0.95° × 1.25°, monthly | 3 | Yes | 13 | 0.95° × 1.25° |
| State-4 | AGB/LAI, yes | High, systematic | 0.95° × 1.25°, monthly | 3 | Yes | 4 | 0.95° × 1.25° |
| State-19 | AGB/LAI, yes | High, systematic | 0.95° × 1.25°, monthly | 3 | Yes | 19 | 0.95° × 1.25° |
| LAI only | LAI, yes | High, systematic | 0.95° × 1.25°, monthly | 1 | Yes | 4 | 0.95° × 1.25° |
| AGB only | AGB, yes | High, systematic | 0.95° × 1.25°, monthly | 1 | Yes | 4 | 0.95° × 1.25° |
| Low obs. uncertainty | AGB/LAI, yes | Low, random | 0.95° × 1.25°, monthly | 1 | Yes | 4 | 0.95° × 1.25° |
  • Note. The above-ground biomass (AGB) and leaf area index (LAI) observations were available at finer spatial resolution than listed but were aggregated to match the spatial resolution of CLM5.0. All simulations were performed for the model years 1998–2011; for the assimilation runs, this period was looped up to three times (see the Assimilation loops column), assimilating observations on each pass, to approach equilibrium. The adjusted state variables are the subset of variables within the CLM5 state vector that are adjusted by the EAKF within DART. All simulations listed above assimilate data except for the free run. The state-13 assimilation run is the standard assimilation used in the analysis for CLM5-DART; all other assimilation runs are used as diagnostics. The state-13 assimilation used an observation outlier threshold, whereas state-13* did not.
  • EAKF, Ensemble Adjustment Kalman Filter; DART, Data Assimilation Research Testbed.

After the transient simulation was completed, we performed an ensemble spin-up simulation to create a minimum amount of ensemble spread (Section 2.1) prior to the assimilation. This ensemble spin-up cycled the 80-member CAM4 meteorology (1998–2011) three times in sequence, for a total of 39 years. We used near present-day conditions (i.e., year 2000) to prescribe atmospheric CO2, nitrogen deposition, aerosol deposition, and land-cover. The end state of this ensemble spin-up was used as the initial condition for all of the simulations (assimilation and free runs) shown within this manuscript. As in the ensemble spin-up, the assimilation and free runs also used the CAM4 meteorology and near present-day atmospheric conditions and land-cover.

The standard assimilation run (Table 2) used observations of LAI and AGB (Section 2.4), assumed high observation uncertainty (Section 2.5), and adjusted 13 state variables (state-13) within CLM (Section 2.3; Table 1). The standard assimilation consisted of three loops, each repeating the years 1998–2011 and assimilating observations each time. The purpose of the looping was to remove the transient response to the model adjustment by allowing the adjusted and downstream state variables (Section 2.3) to approach equilibrium. The analysis of the assimilation runs is primarily limited to loop 3; however, we also present loops 1 and 2 within the figures to demonstrate the rate of adjustment.

The assimilation run was compared against FLUXCOM, an ensemble of carbon flux estimates based on a machine learning approach in which remote sensing and meteorological data sets (RS + METEO) are trained against eddy covariance flux data (Jung et al., 2020). Similar to the LAI data product used in CLM5-DART, FLUXCOM uses MODIS satellite data to characterize the vegetation state in terms of LAI and vegetation indices (e.g., normalized difference vegetation index). The ensemble spread of FLUXCOM represents a range of machine learning approaches (n = 3), meteorological data sets (n = 5), and eddy covariance flux partitioning methods (n = 2). In addition, we performed a series of diagnostic assimilations (Table 2) to test how uncertainties in the setup of the assimilation system affect the CLM5-DART estimates of AGB, LAI, and carbon fluxes. These diagnostics tested the type of observations assimilated (LAI only, AGB only), the observation uncertainty (low obs. uncertainty), and the number of adjusted state variables (state-4, state-19). We evaluated the success of an assimilation through observation acceptance statistics, the RMSE between the assimilation and observations, and the total spread of the ensemble. In general, the goal was to reduce the discrepancy between observations and the assimilation run by maximizing the number of accepted observations (Section 2.3) and minimizing the RMSE between the observations and simulation. We also evaluated the influence of the observation outlier threshold upon the observation acceptance rate and the assimilation through a diagnostic run (state-13*) in which all observations are accepted. The total spread statistic is the square root of the sum of the CLM5 ensemble variance and the observation error variance, and it should be comparable to the RMSE; agreement between the two indicates that the adaptive inflation algorithm is functioning properly.
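The three evaluation statistics can be computed for one monthly assimilation step as sketched below. The outlier test follows the common DART-style rule of rejecting an observation whose distance from the prior ensemble mean exceeds a threshold multiple of the expected spread; the threshold value of 3 is a hypothetical default here (the value used in this study is set in Section 2.3), and the function names and toy numbers are illustrative assumptions. RMSE and total spread are computed over all supplied observations.

```python
import numpy as np

def outlier_accept(prior_ens, obs, obs_var, threshold=3.0):
    """Accept an observation if the prior-mean innovation is within `threshold`
    times sqrt(prior ensemble variance + observation error variance)."""
    expected_sd = np.sqrt(prior_ens.var(ddof=1) + obs_var)
    return abs(prior_ens.mean() - obs) <= threshold * expected_sd

def assimilation_diagnostics(prior_ens, obs, obs_var, threshold=3.0):
    """Return (RMSE, total spread, acceptance rate) for one assimilation time step.

    prior_ens: array (n_obs, n_members) of ensemble values in observation space
    obs, obs_var: arrays (n_obs,) of observed values and observation error variances
    """
    obs = np.asarray(obs, dtype=float)
    obs_var = np.asarray(obs_var, dtype=float)
    accepted = np.array([outlier_accept(e, o, v, threshold)
                         for e, o, v in zip(prior_ens, obs, obs_var)])
    ens_mean = prior_ens.mean(axis=1)
    ens_var = prior_ens.var(axis=1, ddof=1)
    rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))
    total_spread = np.sqrt(np.mean(ens_var + obs_var))
    return rmse, total_spread, accepted.mean()

# Toy values: two observations, two ensemble members each.
prior = np.array([[1.0, 3.0], [10.0, 12.0]])
rmse, spread, rate = assimilation_diagnostics(prior, obs=[2.0, 2.0], obs_var=[1.0, 1.0])
# The second observation lies 9 units from its prior mean, beyond 3 * sqrt(3), so rate == 0.5.
```

When the ensemble is well calibrated, `spread` and `rmse` should be of similar magnitude over many time steps; a persistently larger RMSE suggests the ensemble is underdispersed.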

3 Results

3.1 Simulated Biomass and Carbon Exchange (State-13 Assimilation)

The assimilation of observations into CLM substantially reduced the simulated biomass when compared to the free run, with domain average reductions in AGB and LAI of 31% and 27%, respectively (Figures 2, 3, S1, and S2; Table 3).

Figure 2. Comparison across the Western United States of the CLM5-DART assimilation (loops 1–3; light, medium, and dark red lines with shaded regions, respectively) against a CLM5 free run (no observations; black lines and shaded regions) and observations (blue dots) of leaf area (LAI) and above-ground biomass (AGB). The panels provide simulated values of (a) LAI and (b) AGB, and carbon fluxes of (c) gross primary productivity, (d) ecosystem respiration, and (e) cumulative NEP. The assimilation runs were performed across the 1998–2011 time window three times (loops 1–3) to reach equilibrium. Only loop 3 is considered for evaluating assimilation performance; loops 1–2 are shown for reference only. The shaded regions within (a)–(d) represent the spread of the 80-member ensemble, with the middle solid line representing the ensemble average and the top and bottom solid lines representing ±1 SD. Where shading or individual loops are not visible in (a)–(d), either the ensemble spread is very small or the loops are superimposed. For clarity, only the median simulation is provided for cumulative NEP.

Figure 3. Similar to the assimilation runs in Figure 2, except specific to the major plant functional types (PFTs) within the Western United States domain. Leaf area index (LAI) is provided for (a) temperate evergreen trees (ENFT), (b) boreal evergreen trees (ENBT), (e) C3 grass, and (f) shrubs. Above-ground biomass (AGB) is also provided for (c) ENFT and (d) ENBT. The shaded regions for each simulation represent the spread of the 80-member ensemble, with the middle solid line representing the ensemble average and the top and bottom solid lines representing ±1 SD. The AGB and LAI behavior of all PFTs and crop functional types (CFTs) within the Western United States domain is described in Figures S4 and S5 and Text S3.

Figure 4. The ensemble average carbon fluxes (2001–2010) for (a–c) gross primary productivity, (d–f) ecosystem respiration, and (g–i) NEP for the free simulation (left column), the CLM5-DART assimilation (middle column), and FLUXCOM (right column). FLUXCOM is a data-constrained modeled estimate of carbon flux based on 30 (GPP, ER) and 15 (NEP) ensemble members that span a range of meteorological data sets, machine learning techniques, and flux partitioning methods (Jung et al., 2020).

Table 3. Comparison of the Average Simulated AGB, LAI, and Carbon Fluxes for 2001–2010 for the Western United States Domain
| Simulation name | AGB (kg C m−2) | LAI (m2 m−2) | GPP (g C m−2 month−1) | ER (g C m−2 month−1) | NEP (g C m−2 month−1) |
|---|---|---|---|---|---|
| Free | 1.98 | 1.31 | 48.18 | 47.18 | 1.00 |
| CLM5-DART | 1.36 | 0.96 | 38.49 | 37.21 | 1.28 |
| FLUXCOM | N/A | N/A | 43.56 | 36.87 | 8.01 |
| State-4 | 1.44 | 0.92 | 37.01 | 37.15 | −0.05 |
| State-13* | 1.33 | 0.88 | 36.61 | 34.96 | 1.65 |
| State-19 | 1.33 | 0.93 | 37.08 | 39.52 | −2.43 |
  • Note. The free simulation is CLM5.0 without observations. The state simulations are CLM5-DART assimilation runs (loop 3 only) that adjust a varying number of state variables (4, 13, and 19). FLUXCOM is a separate data-constrained estimate of carbon flux (Jung et al., 2020). The state-13 assimilation is the standard assimilation used in this analysis for CLM5-DART. All other “state” assimilation runs are used as diagnostics.
  • DART, Data Assimilation Research Testbed.

Whereas LAI was reduced across almost the entire Western United States, the reduction in simulated AGB was limited to the high elevation and Pacific Northwest regions (Figure S3). The assimilation reduced the AGB and LAI of the natural vegetation types only (Figure 3), whereas crop types were left unaffected (Figures S4 and S5). For example, the overall reduction in AGB (−607 g C m−2) consisted primarily of reductions in the temperate evergreen needleleaf (−392 g C m−2) and boreal evergreen needleleaf (−202 g C m−2) PFTs (Figure 3). The overall reduction in LAI (−0.38 m2 m−2) consisted of PFT-specific reductions for temperate evergreen needleleaf (−0.08 m2 m−2), boreal evergreen needleleaf (−0.06 m2 m−2), broadleaf deciduous temperate shrub (−0.05 m2 m−2), and C3 nonarctic grass (−0.24 m2 m−2). This substantial decrease in biomass, in turn, decreased the domain average GPP and ER by 20% and 21%, respectively, and increased the strength of the land carbon sink from 1.0 to 1.3 g C m−2 month−1 (Table 3).
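The relative reductions quoted in the text follow directly from the domain averages in Table 3, as this short worked check shows. The values are copied from Table 3; only the arithmetic is added.

```python
# Domain-average values for 2001-2010, copied from Table 3.
free = {"AGB": 1.98, "LAI": 1.31, "GPP": 48.18, "ER": 47.18}
dart = {"AGB": 1.36, "LAI": 0.96, "GPP": 38.49, "ER": 37.21}

def percent_reduction(before, after):
    """Relative reduction of `after` with respect to `before`, in percent."""
    return 100.0 * (before - after) / before

for name in free:
    print(f"{name}: {percent_reduction(free[name], dart[name]):.0f}% reduction")

# NEP = GPP - ER reproduces the Table 3 NEP for these two runs.
for label, run in (("Free", free), ("CLM5-DART", dart)):
    print(f"{label} NEP: {run['GPP'] - run['ER']:.2f} g C m-2 month-1")

# Output:
# AGB: 31% reduction
# LAI: 27% reduction
# GPP: 20% reduction
# ER: 21% reduction
# Free NEP: 1.00 g C m-2 month-1
# CLM5-DART NEP: 1.28 g C m-2 month-1
```

Because GPP and ER fall by nearly the same absolute amount (about 9.7 and 10.0 g C m−2 month−1), their reductions largely cancel in NEP.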

The acceptance rate of observations during the assimilation influenced how quickly the ensemble mean approached the observations (Figure 5). Overall, LAI observations were accepted at a higher rate than AGB observations, with 84% and 73% of observations accepted on average during loop 3, respectively (Figure 5). This resulted in a forecast RMSE of 0.38 m2 m−2 for LAI and 104 g C m−2 for AGB. The acceptance of AGB observations increased strongly across the three assimilation loops (48%, 66%, and 73%), accompanied by a decline in forecast RMSE (178, 111, and 104 g C m−2, respectively). The acceptance rate of LAI observations, on the other hand, remained steady at 80%, 84%, and 84% during the three assimilation loops. The LAI observations had a higher acceptance rate during the winter months (94%) than during the summer months (72%).

Figure 5. The RMSE and total spread statistics for the assimilation run (state-13) for leaf area in (a) loop 1 and (b) loop 3, and for above-ground biomass in (c) loop 1 and (d) loop 3, for the Western United States. The RMSE and total spread are provided for the prior ("pr": black and teal) and posterior ("po": gray and light teal) estimates. Also provided are the total observations available (pink circles) versus the observations assimilated (pink asterisks) during each monthly assimilation time step.

Estimates of ecosystem respiration from the CLM5-DART assimilation (state-13) and the median FLUXCOM ensemble were relatively similar across the Western United States (Table 3; Figures 4 and 6). GPP, on the other hand, was lower for CLM5-DART (38.5 g C m−2 month−1) than for FLUXCOM (43.6 g C m−2 month−1) and, when combined with ER, yielded a weak net carbon sink across the Western United States (1.3 g C m−2 month−1). In contrast, FLUXCOM projected a much stronger carbon sink across the Western United States during this period (8.0 g C m−2 month−1). CLM5-DART simulated a weak carbon source for most of the high, mountainous terrain and a carbon sink for the lower terrain, including the Eastern Plains (Figures 4 and S3). FLUXCOM projected nearly the opposite, with a strong carbon sink in the high mountainous terrain and near-neutral carbon exchange elsewhere (Figure 4). Other regions where net carbon exchange deviated between FLUXCOM and CLM5-DART include the Pacific Northwest and the Eastern Plains of Colorado and Montana. In terms of seasonal timing, FLUXCOM and CLM5-DART projected a similar peak in carbon uptake during June/July; however, the CLM assimilation projected the land as a strong carbon source during the nonsummer seasons, whereas FLUXCOM projected the land to be close to carbon neutral during this time (Figure 6).

Figure 6. A comparison of carbon fluxes (a, c, e) and cumulative carbon fluxes (b, d, f) between the free run (black) and the CLM5-DART assimilation (red) against FLUXCOM (yellow), averaged over the Western United States. The year 2005 (a, c, e) is shown as an example of seasonal carbon flux behavior and is representative of all years between 2001 and 2010. The shaded regions for the free and assimilated simulations represent the spread of the 80-member CLM5 ensemble, with the middle solid line representing the ensemble average and the top and bottom solid lines representing ±1 SD. The shaded regions for FLUXCOM represent the mean deviation of the 30 (gross primary productivity, ecosystem respiration) and 15 (NEP) ensemble members from the median value (middle solid line). The FLUXCOM ensemble members represent a range of meteorological data, machine learning approaches, and flux partitioning methods (Jung et al., 2020).

The carbon uptake of the CLM5-DART assimilation was limited by soil moisture availability (Figure 7). The domain average GPP was highest during meteorological spring (39.8 g C m−2 month−1) and summer (76.8 g C m−2 month−1), with $\beta_t$ water limitation values (low values correspond to strong limitation) of 0.69 and 0.71, respectively. Although $\beta_t$ was lower in winter than in other seasons, the diagnostic is affected by low GPP and frozen surface soils and becomes a poor indicator of GPP limitation for that season. The Pearson correlation coefficient between GPP and $\beta_t$ was relatively high for spring (0.67) and summer (0.64) compared to fall (0.30) and winter (0.56). This indicates that water limitation is a stronger control upon GPP in spring and summer than in other seasons, when factors such as temperature, sunlight, and LAI become increasingly important. Snow water equivalent (SWE), an important contributor to soil moisture during the spring and summer seasons, was generally underestimated across the Western United States during spring (17 mm), summer (0.0 mm), fall (6 mm), and winter (33 mm). The Southern Colorado Rockies, Sierra Nevada, and Cascade mountain ranges were generally devoid of simulated SWE throughout all seasons. For reference, historical site observations (1998–2010) within the Southern Colorado Rockies and Cascades show substantial SWE, with peak median values occurring between the last week of March and the first week of April (NRCS, 2021). Furthermore, the median date of complete snowmelt in these areas falls between mid-June and mid-July.
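A seasonal correlation between GPP and $\beta_t$ (BTRAN) can be computed as sketched below. The text does not state whether the correlations were computed across grid cells or through time; this sketch assumes, as an illustration only, a Pearson correlation across grid cells of seasonal-mean fields. The helper names and the synthetic data are hypothetical.

```python
import numpy as np

# Meteorological seasons by calendar month.
SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}

def seasonal_means(monthly, months):
    """Average a (n_months, n_cells) field over all months belonging to each season.

    Simplification: December is pooled with January/February of the same record,
    not shifted to the following winter.
    """
    months = np.asarray(months)
    return {s: monthly[np.isin(months, m)].mean(axis=0) for s, m in SEASONS.items()}

def seasonal_correlation(gpp, btran, months):
    """Pearson r across grid cells between seasonal-mean GPP and seasonal-mean BTRAN."""
    g, b = seasonal_means(gpp, months), seasonal_means(btran, months)
    return {s: float(np.corrcoef(g[s], b[s])[0, 1]) for s in SEASONS}

# Toy data (hypothetical): 2 years of monthly output on 10 grid cells.
months = np.tile(np.arange(1, 13), 2)
btran = np.random.default_rng(1).uniform(0.2, 1.0, size=(24, 10))
gpp = 50.0 * btran + 3.0  # GPP perfectly linear in BTRAN in this toy case
r = seasonal_correlation(gpp, btran, months)
```

In this synthetic case every seasonal correlation is 1 by construction; with model output, lower values indicate that factors other than soil moisture (e.g., temperature, light, LAI) also control GPP.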

Figure 7. The CLM5-DART (state-13) assimilation average behavior (1998–2010) for (a–d) BTRAN ($\beta_t$), (e–h) gross primary productivity, and (i–l) snow water equivalent (SWE) for meteorological spring, summer, fall, and winter. The $\beta_t$ value is a diagnostic for soil moisture limitation upon GPP, with low (∼0) and high (∼1) values indicating strong and weak limitation, respectively.

3.2 Sensitivity of Assimilation to CLM5.0 Adjusted State Variables

The simulated average net carbon uptake across the Western United States was sensitive to the choice of adjusted state variables during the assimilation (−2.4 to 1.3 g C m−2 month−1); however, no combination of adjusted state variables brought the CLM5-DART assimilation into close agreement with FLUXCOM (8.0 g C m−2 month−1; Table 3). The CLM5 state variables that were adjusted in all of the simulations (state-4, state-13, and state-19), for example, LAI and AGB, showed a relatively small range of values for loop 3 (LAI: 0.92–0.96 m2 m−2, AGB: 1.33–1.44 kg C m−2; Figures 8a and 8b). In contrast, state variables that were adjusted in some assimilations but downstream in others showed the largest ranges, including, for example, root carbon (492–580 g C m−2) and especially litter carbon (0.2–2.6 kg C m−2) (Figures 8 and S6). The relatively large increase in the litter carbon pool for state-19 compared to the other assimilations contributed to a strong boost in heterotrophic respiration and promoted a net emission of carbon from the land (Table 3).

Figure 8. Comparison between Western United States assimilation runs that vary the number of state variables adjusted by the Data Assimilation Research Testbed. The assimilation runs state-4, state-13, and state-19 adjust 4, 13, and 19 state variables, respectively (Table 1). This figure shows a sampling of the simulated state variables, including (a) leaf area, (b) above-ground biomass, (c) soil carbon, (d) root carbon, and (e) cumulative NEP. All panels show loop 3 of each assimilation. The shaded regions in (a)–(d) represent ±1 SD of the 80-member ensemble. The state-13* assimilation is identical to state-13 except that the observation outlier threshold is turned off.

3.3 Sensitivity of Assimilation to Observation Type, Uncertainty, and Acceptance Rate

Overall, the simulation was more sensitive to the type of observation assimilated, with cumulative NEP differing by 180 g C m−2 (Figure 9), but less sensitive to acceptance rate and observation uncertainty, with cumulative NEP differing by only 40 and 65 g C m−2, respectively (Figures 8, S6, and S8). The LAI-only assimilation behaved similarly to the LAI/AGB assimilation (difference of 14 g C m−2 in cumulative NEP), whereas the AGB-only assimilation was nearly identical to the free simulation (difference of 35 g C m−2 in cumulative NEP; Figure 9). This apparently greater impact of LAI than of AGB observations upon the assimilation reflected both the high sensitivity of carbon cycling within CLM5 to leaf area and the low acceptance rate (5%) of observations during the AGB-only assimilation (not shown). Both AGB and LAI observations were necessary to achieve a higher acceptance rate of AGB observations (48%), which prevented unrealistic simulated AGB (Figure 9b). The inclusion of LAI observations provided an initial adjustment to simulated AGB because leaf carbon is a state variable common to both simulated AGB and LAI (Section 2.6). This adjustment from the LAI observations brought the simulated AGB closer to the AGB observations and thus reduced the number of AGB observations rejected by the observation outlier threshold. In terms of observation uncertainty, the low observation uncertainty assimilation imposed a faster adjustment upon simulated AGB than the assimilation with standard uncertainty assumptions (Section 2.5; Figure S7). In terms of observation acceptance, adding more observations to the assimilation by turning off the observation outlier threshold (state-13*) tended to decrease AGB and LAI more than in the standard assimilation (state-13; Figures 8 and S6) and led to decreases in both GPP (−1.88 g C m−2 month−1) and ER (−2.25 g C m−2 month−1). These decreases in GPP and ER largely offset, leading to a small increase in NEP (0.37 g C m−2 month−1; Table 3).

Figure 9. Comparison of assimilation behavior when both above-ground biomass (AGB) and leaf area index (LAI) observations (peach), AGB observations only (brown), and LAI observations only (green) are assimilated. The AGB/LAI assimilation is loop 1 from the state-4 assimilation run. In panels where the AGB/LAI simulation (peach) is not completely visible, it is covered by the LAI-only simulation (green). The shaded regions (a and b) represent ±1 SD of the 80-member ensemble.

4 Discussion

4.1 Assimilation of AGB and LAI Observations

Satellite-derived observations of AGB and LAI were assimilated into a land surface model in an effort to improve the representation of AGB, LAI, and carbon exchange across the Western United States. The observations substantially reduced the simulated AGB and LAI compared to a free (no observation) simulation (Figures 2 and S3), bringing the assimilation run into closer agreement with the observations (Figure 5). This adjustment in LAI and AGB led to slightly stronger simulated net carbon uptake across the Western United States from 2001 to 2010 (Figures 4, 6, and S3) through a reduction of carbon release at higher elevations and an increase in uptake at lower elevations (Eastern Plains). Overall, the assimilation of observations brought the CLM5-DART estimate of carbon uptake into closer agreement with FLUXCOM, yet large differences remained between the two. Most striking was the difference in carbon uptake across the higher, mountainous terrain, where CLM5-DART projected near-neutral carbon exchange and FLUXCOM projected strong carbon uptake.

Although both CLM5-DART and FLUXCOM are observation-constrained estimates of carbon uptake, both have limitations and neither should be interpreted as truth. The FLUXCOM carbon uptake is more consistent with flux tower data demonstrating that higher elevation forests across the Western United States are carbon sinks (e.g., Anderson-Teixeira et al., 2010; Blanken et al., 2009; Duarte et al., 2017; Kwon et al., 2018). This similarity reflects the machine learning approach in FLUXCOM, which is trained on flux tower data (Jung et al., 2020). However, the FLUXCOM approach is also limited by the sparse availability and quality of flux data (Section 1) within the Western United States, leading to enhanced prediction errors (extrapolation index; Jung et al., 2020) and, in general, less skill at predicting NEE than other flux variables (Tramontana et al., 2016). This weaker NEE performance could reflect that FLUXCOM does not explicitly account for site disturbance history and forest inventories (Tramontana et al., 2016). CLM5-DART, on the other hand, does account for disturbance history through the assimilation of AGB and LAI observations, as well as through land-cover and land-use change during the transient phase of the spin-up (Section 2.8). This spin-up process is also important for initializing the soil carbon pools, an important component of net carbon flux. The relatively low carbon uptake estimate within CLM5-DART could reflect that the Western United States is a semiarid, water-limited system (Berner et al., 2017). This is consistent with our simulation, in which water limitation had an important impact on the pattern and magnitude of carbon uptake (Figure 7). However, it is concerning that the CLM5-DART estimate of carbon uptake does not capture increased uptake with elevation, and there are clear biases in simulated snowcover (Figure 7). This suggests that including observations that better inform water cycling could be important within CLM5-DART (Section 4.3). In summary, the large difference in carbon uptake solutions reflects the contrast between an empirical approach constrained by flux data (FLUXCOM) and a mechanistic approach constrained by vegetation state data (CLM5-DART). Furthermore, the changes within the assimilation setup in our analysis could not account for this large difference in carbon uptake (Section 4.2).

Our results differed from three other assimilation studies where the LAI observations generally increased the simulated LAI across the Western United States, leading to increases in GPP (Albergel et al., 2018; Kumar, Mocko, et al., 2019; Ling et al., 2019). Similar to this analysis, in two out of the three studies, the LAI adjustment across the Western United States also led to modest convergence with the FLUXNET GPP and NEE estimates, yet the majority of convergence for those studies (Albergel et al., 2018; Kumar, Mocko, et al., 2019) was limited to croplands across the Midwest United States. This likely illustrates the contrast between crop ecosystems where productivity is primarily controlled by the vegetation state (e.g., biomass) versus semiarid systems of the Western United States where productivity is also strongly limited by environmental conditions, such as water limitation.

4.2 Sensitivity of CLM5-DART to Assimilation Setup

The CLM5-DART solution of a weak carbon sink across the Western United States was generally robust across the diagnostic simulations tested here, including the number of adjusted state variables, observation type, observation acceptance rate, and observation uncertainty. Although varying the number of adjusted state variables led to carbon uptake solutions ranging from a weak sink to a weak source of carbon (Figure 8), the only solution that suggested a carbon source (state-19) was influenced by unrealistic litter pool behavior (Figure S6). There is evidence that the carbon uptake solutions for all CLM5-DART assimilations were biased toward less carbon uptake because of an overestimation of soil carbon and heterotrophic respiration. This is because the soil carbon pool (a downstream variable) has a slow turnover time and thus was still equilibrating (decreasing) in response to the assimilation adjustment at the end of loop 3 (Figures S1 and S2). Nevertheless, the CLM5-DART solution is substantially weaker than FLUXCOM, and the two do not agree within uncertainty bounds (Figure 6).
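Why a slow soil carbon pool can still be out of equilibrium after three loops can be illustrated with a single first-order pool, $dC/dt = I - C/\tau$, for which any offset from equilibrium decays as $e^{-t/\tau}$. This is a deliberately simplified stand-in for CLM's multi-pool decomposition cascade, and the 100-year turnover time is a hypothetical value chosen for illustration, not a CLM5 parameter.

```python
import math

def remaining_disequilibrium(years, turnover_time):
    """Fraction of an initial offset from equilibrium that remains after `years`
    for a single first-order pool, dC/dt = I - C / tau, with constant input I."""
    return math.exp(-years / turnover_time)

def integrate_pool(c0, inputs, turnover_time, years, dt=0.01):
    """Forward-Euler integration of the same one-pool model, as a numerical check."""
    c = c0
    for _ in range(int(round(years / dt))):
        c += dt * (inputs - c / turnover_time)
    return c

# Hypothetical numbers: a 100-year turnover time and three loops of the 14-year window.
years = 3 * 14
print(round(remaining_disequilibrium(years, 100.0), 2))  # 0.66
```

Under these assumptions roughly two thirds of the initial soil carbon offset would still remain after the 42 simulated years, consistent with a soil pool that is still decreasing at the end of loop 3.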

The assimilation system tended to reject observations within localized regions of the Western United States, yet this did not significantly influence the carbon uptake solution (Table 3 and Figure 8). This is because regions of observation rejection (Text S3, Figure S9) either had a low impact on the carbon cycle (e.g., Great Basin and Desert Southwest), or observation rejection was limited to 1–2 months each year (e.g., Rocky Mountains). Given the relatively high impact of the Pacific Northwest upon the carbon cycle, the high rejection of LAI observations there was of concern. However, this high LAI observation rejection was justified in that the seasonal variation in observed LAI (1.0–4.5 m2 m−2) was unrealistically high and inconsistent with the CLM5.0 prescribed land-cover type of predominantly evergreen species (i.e., areal coverage of 72% ENFT, and 16% natural grass, deciduous broadleaf, and crop combined). The rejection of observations, therefore, arose in part from systematic differences between independent observations (and algorithms) of LAI (Zhu et al., 2013) and land-cover types (Lawrence et al., 2018). Furthermore, the low observed winter LAI values are a known issue with satellite-derived LAI products (Fang et al., 2013; Garrigues et al., 2008; Jin et al., 2017). Finally, the observation type and observation uncertainty had a relatively weak impact upon the overall net carbon uptake (Figures 9 and S7). Although observation uncertainty influenced the rate of the assimilation adjustment (Figure S7), our approach of looping across the assimilation time window three times reduced the sensitivity of simulated carbon uptake to observation uncertainty.

4.3 Opportunities to Improve Assimilation Setup

We have demonstrated that assimilating LAI and AGB within CLM5-DART was an important first step toward representing land-atmosphere carbon exchange, yet the inability to capture the general pattern of increased carbon uptake at higher elevations suggests opportunities for improvement. First, the published uncertainty values for the AGB and LAI data products may be overconfident and may not fully capture the true uncertainty for our application. For example, the biomass satellite product (Y. Y. Liu et al., 2015) is based on an empirical relationship between vegetation optical depth (VOD) and AGB that was calibrated for tropical regions and then applied to the Western United States domain. Applying this empirical fit outside the tropics may have led to an underestimation of the product uncertainty; indeed, the Y. Y. Liu et al. (2015) product provides low biomass estimates within the interior mountain regions of the Western United States compared to other products (Xiao et al., 2019). Furthermore, bodies of water interfere with the VOD measurement; therefore, AGB estimates were not available for the coastal plain of the Pacific Northwest. The LAI product also tends to provide uncharacteristically high seasonal variation (1.5–4.5 m² m⁻²) in regions of the Pacific Northwest (Figure S10) that are dominated by temperate coniferous forests. This may have had an adverse effect upon the biomass state in these regions and contributed to an underestimation of carbon uptake. We recommend that data providers not only include an uncertainty estimate but also partition the uncertainty into random and systematic components to allow proper error propagation by end-users. This distinction matters when the data products are aggregated to a spatial resolution different from the native resolution, because random errors partially average out during aggregation whereas systematic errors do not.
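Why the random/systematic split matters for aggregation can be shown with a short calculation. The sketch below assumes the per-pixel error splits cleanly into a random part that is independent between pixels and a systematic part that is fully correlated across them; real products usually sit between these two limits, and the numbers are hypothetical.

```python
import math


def aggregated_error_sd(random_sd, systematic_sd, n_pixels):
    """Error standard deviation of the mean of n_pixels native-resolution pixels.

    random_sd: per-pixel random error, assumed independent between pixels
    systematic_sd: per-pixel systematic error, assumed identical in all pixels
    """
    random_part = random_sd / math.sqrt(n_pixels)  # shrinks as 1/sqrt(n)
    # The systematic part is shared by every pixel, so averaging does not reduce it.
    return math.sqrt(random_part ** 2 + systematic_sd ** 2)


# The same per-pixel total error (5 units) aggregated over 100 pixels:
print(aggregated_error_sd(5.0, 0.0, 100))  # 0.5 (all random)
print(aggregated_error_sd(0.0, 5.0, 100))  # 5.0 (all systematic)
print(aggregated_error_sd(3.0, 4.0, 100))  # about 4.01 (mixed)
```

Two products with the same stated per-pixel uncertainty can therefore differ tenfold in uncertainty after aggregating 100 pixels, which is why the total alone is not enough for correct propagation.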

Water cycling observations (e.g., snow water equivalent, soil moisture, evapotranspiration, surface temperature) should improve the simulation of carbon cycling, given the strong dependence of vegetation productivity on water limitation within the Western United States (Berner et al., 2017). This is supported by the finding that simulated water limitation was responsible for the spatial pattern of GPP in this analysis (Figure 7). Furthermore, CLM5 tended to underestimate snowpack, a key contributor to soil moisture, which likely limited water availability, especially during spring. Although some degree of water limitation is normal in the Western United States, the CAM4 ensemble likely included a dry/warm bias, given that CAM4 was derived from a fully coupled ESM (Raeder, Anderson, Collins, Hoar, et al., 2012) and, unlike GRIDMET, was not designed specifically for complex terrain. The utility of water cycling observations is consistent with other LDAS studies in which snow cover (Kumar, Mocko, et al., 2019), soil moisture (Albergel et al., 2018; Bonan et al., 2020), and land surface temperature (Smith et al., 2020) were used to improve model simulations. Although these studies focused primarily on water cycling, there is clear potential to improve carbon cycling as well.
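The chain from snowpack to soil moisture to GPP can be illustrated with a root-weighted soil water stress factor: less snowmelt leaves drier soil, a more negative soil water potential, a smaller stress factor, and lower GPP. This is a simplified sketch in the spirit of the soil-water-potential stress factor used in earlier CLM versions, not CLM5's more detailed plant hydraulic formulation; the function names, root distribution, and stomatal open/close potentials are hypothetical illustration values.

```python
def water_stress_factor(soil_psi, root_frac, psi_open=-0.7, psi_close=-2.5):
    """Root-weighted water stress factor from 0 (fully stressed) to 1 (unstressed).

    soil_psi: soil water potential in each layer (MPa, negative)
    root_frac: fraction of roots in each layer (should sum to 1)
    psi_open, psi_close: hypothetical potentials (MPa) at which stomata are
        fully open and fully closed, respectively
    """
    beta = 0.0
    for psi, frac in zip(soil_psi, root_frac):
        # Linear ramp from 1 at psi_open to 0 at psi_close, clipped to [0, 1].
        w = (psi_close - psi) / (psi_close - psi_open)
        beta += frac * min(1.0, max(0.0, w))
    return beta


def water_limited_gpp(gpp_potential, soil_psi, root_frac):
    """Scale potential (unstressed) GPP by the water stress factor."""
    return gpp_potential * water_stress_factor(soil_psi, root_frac)


roots = [0.6, 0.4]  # hypothetical shallow/deep root distribution
print(water_limited_gpp(10.0, [-0.3, -0.5], roots))  # 10.0: wet soil after ample snowmelt
print(water_limited_gpp(10.0, [-3.0, -1.6], roots))  # about 2.0: dry soil, reduced snowpack
```

GPP here is in arbitrary units; the point is only that a drier root zone, such as one following an underestimated snowpack, suppresses GPP multiplicatively.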

The adjusted state variables within CLM5-DART should be expanded to include soil carbon and soil water. Soil carbon was treated as a downstream variable in all assimilations in this analysis; as a result, its behavior generally lagged behind that of the adjusted variables (Figures S1 and S2). Looping across the assimilation window did not allow soil carbon to fully reach internal equilibrium, leading to the transient behavior in soil carbon and heterotrophic respiration (Figure S2). Including soil carbon as an adjusted variable would not only eliminate this transient behavior but could also have an important impact upon the magnitude of heterotrophic respiration and thus net carbon uptake. On the other hand, within the current assimilation setup, we recommend that litter carbon not be an adjusted state variable, given that the state-19 assimilation (which included carbon and nitrogen litter state variables) yielded unrealistic behavior (Figure S6) indicative of weak correlations between the litter variables and simulated LAI and AGB.
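Why a downstream soil carbon pool lags behind the adjusted variables can be seen from a one-pool, first-order model, dC/dt = I - C/tau, where I is the carbon input from litterfall and tau the turnover time. This is a hypothetical single-pool simplification (CLM5's soil carbon is spread across multiple pools and layers), and all values are illustrative. When the assimilation reduces biomass and therefore inputs, the pool decays toward its new equilibrium I*tau only on a timescale of tau, so heterotrophic respiration (C/tau) stays above the new input rate for years.

```python
import math


def soil_carbon(c0, inputs, turnover, years):
    """Analytic solution of dC/dt = inputs - C / turnover after `years`.

    c0: initial pool size (kg C m-2)
    inputs: constant carbon input rate (kg C m-2 yr-1)
    turnover: pool turnover time (yr)
    """
    c_eq = inputs * turnover  # equilibrium pool size
    return c_eq + (c0 - c_eq) * math.exp(-years / turnover)


def heterotrophic_respiration(c, turnover):
    """First-order decomposition flux (kg C m-2 yr-1)."""
    return c / turnover


# Hypothetical pool in equilibrium with inputs of 0.5 (so C = 10) whose inputs
# drop to 0.3 after the assimilation reduces biomass.
tau = 20.0
for t in (0, 10, 30, 100):
    c = soil_carbon(10.0, 0.3, tau, t)
    rh = heterotrophic_respiration(c, tau)
    print(f"year {t:3d}: C = {c:5.2f}, Rh = {rh:.3f}, Rh - inputs = {rh - 0.3:+.3f}")
```

Respiration exceeds the reduced inputs for decades, so the soil behaves as a transient carbon source that weakens simulated net uptake; adjusting the pool directly in the analysis would remove this lag rather than waiting for it to decay.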

There is an opportunity for the assimilation system to apply a more accurate and direct correction of the simulated vegetation behavior within CLM5. The current assimilation system adjusted individual PFTs from the LAI and AGB observations indirectly, through the covariance between the simulated PFT and grid-cell-average behavior. This indirect correction was necessary because the current observations do not provide PFT-specific information, and CLM5 does not provide the location of each PFT within a grid cell. In the future, however, emerging high-resolution and PFT-specific databases could allow for a direct correction of the dominant PFT within each CLM grid cell. One region where PFT-specific corrections might offer improvement is the Eastern Plains, which include portions of Montana, Wyoming, and Colorado. The CLM5-DART assimilation adjustment tended to increase the strength of the carbon sink within this region (Figure 4). This increase occurred in this predominantly grassland area despite the reduction in LAI and AGB (Figures S3 and S11–S13) because the region maintained its carbon uptake (NEP) during the growing season while heterotrophic respiration was reduced, especially during late winter and early spring (Figure S13). This increase in carbon uptake should be viewed with caution, because the AGB and LAI adjustment was applied only to the natural grasslands within this region and not to the crops (Figure S12). It is unclear how the adjusted carbon uptake would shift if the AGB and LAI adjustments were applied to both the natural grass and crop PFTs.

Finally, enhancing CLM5-DART to adjust parameter values is another promising approach to reduce model errors and forecast uncertainty (e.g., Aksoy et al., 2006; Shi et al., 2014). Although adjusting state variables can help correct for many sources of error that negatively impact the model state (i.e., structural, parametric, and boundary errors), forecast skill generally degrades as the simulation moves further from the previous adjustment step. In this analysis, such degradation was observed in the simulation of LAI, for which the RMSE was much greater for the prior than for the posterior state (Figure 5). Parameter estimation could reduce this degradation in LAI RMSE by adjusting, for example, the leaf tissue allocation parameter in CLM5. In general, parameter estimation requires a strong understanding of the sources of model error, so that the correction is applied directly to the offending parameterization and not convolved with other sources of error (e.g., meteorology). Alternatively, forecast error could potentially be addressed by adjusting the frequency of the assimilation or the time resolution at which the increment is applied (Girotto et al., 2016). Ultimately, decisions about assimilation frequency must be balanced against practical concerns such as observation availability, model stability, and computational expense.
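The contrast between correcting a state and correcting a parameter can be illustrated with a toy LAI model that relaxes toward an equilibrium set by a leaf allocation parameter. This is a hypothetical illustration, not CLM5's allocation scheme: the model, the parameter name `alloc`, and all values are assumptions, and the parameter bias is taken to be the only error source, which, as noted above, is rarely the case in practice. Resetting LAI to the observation (a state-only correction) removes the error at the analysis time, but the forecast drifts back toward the biased equilibrium, whereas a corrected parameter keeps the forecast on track.

```python
import math


def forecast_lai(lai0, alloc, days, k=5.0, tau=20.0):
    """Daily LAI from a toy model relaxing toward the equilibrium alloc * k.

    alloc: hypothetical leaf allocation fraction (the parameter to estimate)
    k: hypothetical conversion from allocation to equilibrium LAI (m2 m-2)
    tau: hypothetical relaxation timescale (days)
    """
    lai, out = lai0, []
    for _ in range(days):
        lai += (alloc * k - lai) / tau
        out.append(lai)
    return out


def rmse(sim, ref):
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sim, ref)) / len(sim))


true_alloc, biased_alloc = 0.5, 0.8  # truth equilibrates at LAI 2.5; model drifts to 4.0
obs_lai = 2.5                        # observation at the analysis time
truth = forecast_lai(obs_lai, true_alloc, 60)

state_only = forecast_lai(obs_lai, biased_alloc, 60)  # LAI reset, parameter still biased
param_fixed = forecast_lai(obs_lai, true_alloc, 60)   # parameter corrected

print(f"RMSE, state-only correction: {rmse(state_only, truth):.2f}")
print(f"RMSE, parameter correction:  {rmse(param_fixed, truth):.2f}")
```

The state-only forecast error grows with time since the analysis, mirroring the prior-versus-posterior LAI RMSE gap; in a real system the parameter would itself be estimated (e.g., by including it in the ensemble state vector) rather than known.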

5 Conclusions

This study used a land surface data assimilation system (CLM5-DART), assimilating observations of vegetation state to help improve simulations of carbon exchange across the complex terrain of the Western United States. Observations of AGB and LAI reduced the simulated biomass, especially in higher elevation terrain, leading to substantial reductions in both simulated GPP and ER. The reductions in GPP and ER mostly offset each other, yielding a revised estimate of net carbon uptake nearly identical to that of the free simulation. This revised estimate was approximately six times weaker than FLUXCOM, and CLM5-DART persistently simulated weak carbon uptake even when multiple sources of uncertainty in the assimilation methodology were considered. We hypothesize that a key difference between the CLM5-DART and FLUXCOM carbon uptake estimates relates to their different methods of accounting for site history, including information about land-cover change and disturbance.

We conclude that biomass observations such as AGB and LAI are an important first step toward improving the simulated vegetation state, yet the inability of the assimilation system to capture the overall spatial pattern of increased carbon uptake with elevation suggests that simulated water limitation incorrectly constrained carbon uptake. This is supported by CLM5 model diagnostics that strongly link water limitation with the spatial pattern of GPP. Furthermore, simulated snowcover within CLM5 is persistently underestimated, especially in southern and western portions of the domain. We hypothesize that this underestimation adversely influenced the simulation of soil moisture and consequently carbon dynamics. We recommend assimilating additional observations related to snowpack and soil moisture to improve the representation of water cycling and better complement carbon-based biomass observations. Furthermore, the assimilation system should expand the adjusted state variables to include both soil carbon and soil water, both to promote internal equilibrium within the model and to better account for observations related to water cycling.

Acknowledgments

This research was supported by the NASA CMS Program (awards NNX16AP33G and 80NSSC20K0010). CESM is sponsored by the National Science Foundation and the U.S. Department of Energy. The authors would like to thank the Center for High-Performance Computing at the University of Utah. The authors would also like to acknowledge high-performance computing support from Cheyenne (https://doi.org/10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation, through allocation awards UUSL0005 and UUSL0007. A special thank you to the editor Eleanor Blyth, the associate editor, and three reviewers who provided helpful insight that improved the manuscript. The National Center for Atmospheric Research is sponsored by the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Data Availability Statement

The CLM5.0 regional maps of biomass and carbon exchange generated in this analysis are archived at the ORNL DAAC in the data set titled “CLM5-DART estimates of regional carbon fluxes and stocks over the Western United States” (data set ID 77065fbc29), available at https://doi.org/10.3334/ORNLDAAC/1856 as part of the NASA Carbon Monitoring Systems Data set List.