Toward more realistic projections of soil carbon dynamics by Earth system models
Abstract
Soil carbon (C) is a critical component of Earth system models (ESMs), and its diverse representations are a major source of the large spread across models in the terrestrial C sink from the third to fifth assessment reports of the Intergovernmental Panel on Climate Change (IPCC). Improving soil C projections is a high priority for Earth system modeling in future IPCC and other assessments. To achieve this goal, we suggest that (1) model structures should reflect real-world processes, (2) parameters should be calibrated to match model outputs with observations, and (3) external forcing variables should accurately prescribe the environmental conditions that soils experience. First, most soil C cycle models simulate C input from litter production and C release through decomposition. The latter process has traditionally been represented by first-order decay functions, regulated primarily by temperature, moisture, litter quality, and soil texture. While this formulation captures macroscopic soil organic C (SOC) dynamics well, its underlying mechanisms need to be better understood in relation to microbial processes, depth-dependent environmental controls, and other processes that strongly affect soil C dynamics. Second, incomplete use of observations in model parameterization is a major cause of bias in soil C projections from ESMs. Optimal parameter calibration with both pool- and flux-based data sets through data assimilation is among the highest priorities for near-term research to reduce biases among ESMs. Third, external variables are represented inconsistently among ESMs, leading to differences in modeled soil C dynamics. We recommend implementing traceability analyses to identify how external variables and model parameterizations influence SOC dynamics in different ESMs.
Overall, projections of the terrestrial C sink can be substantially improved when reliable data sets are available to select the most representative model structure, constrain parameters, and prescribe forcing fields.
Key Points
- First-order decay functions capture macroscopic SOC dynamics, but their underpinnings need more study
- Optimal parameter calibration through data assimilation is a high priority to reduce model biases
- Traceability analyses are needed to understand the consequences of variation in driving variables
1 Introduction
Soils contain the largest terrestrial pool of organic carbon (C) [Jobbágy and Jackson, 2000; Tarnocai et al., 2009], yet their representation in Earth system models (ESMs) currently contributes substantial uncertainty to projections of the C cycle and climate [Jones et al., 2003, 2006; Arora et al., 2013; Todd-Brown et al., 2013]. When C cycle feedbacks are expressed in a common framework with other climate feedbacks, such as those from clouds or ice albedo, the magnitude of their uncertainty is comparable to that of cloud feedbacks, which have long been regarded as the most significant uncertainty in climate modeling [Gregory et al., 2009]. Particularly relevant to the subject of this paper, Jones and Falloon [2009] found a strong relationship between changes in soil organic C (SOC) and the strength of simulated C-climate feedbacks within ESMs. Thus, it is important to improve the representation of soil processes and feedbacks in ESMs. This paper aims to identify possible ways of improving model projections.
Representing soil C dynamics in ESMs requires knowledge of both soil science and model development. In soil science, climate has long been recognized as one of the primary factors determining the distribution of soil C and nitrogen (N) stocks [Jenny, 1961]. Soil microorganisms have likewise long been recognized as the main agents of soil organic matter decomposition, and the effects of substrate quality on decomposition rates, including C-N interactions, have been studied for nearly a century [Tenney and Waksman, 1929; Waksman, 1952]. However, we continue to struggle to apply our knowledge of these basic principles as we develop sophisticated numerical models to represent the feedbacks between climate and soil C stocks. For example, soil C stabilization, although long recognized as a key emergent property for understanding C stocks and flows, is still a subject of much research and has not yet been explicitly integrated into numerical soil biogeochemistry models [Kleber et al., 2007; Schmidt et al., 2011; Davidson et al., 2014].
The development of soil organic C models, such as RothC and CENTURY [Jenkinson et al., 1987; Parton et al., 1987], began in the 1980s to simulate the effects of different agricultural practices on crop yields, soil C and N dynamics, and nutrient cycling at long-term agricultural experimental sites. During the 1990s, when the effects of environmental change (e.g., elevated CO2 and climatic change) on ecosystem dynamics became major research questions, soil C models were integrated into many ecosystem models to simulate plant production, soil C dynamics, trace gas fluxes, and nutrient cycling in response to climate change for all of the world's major ecosystems [Schimel et al., 1996]. Global models of the terrestrial C cycle were developed about two decades ago, with the primary goal of projecting changes in terrestrial C storage under increasing atmospheric CO2 concentration and climate change [Melillo et al., 1993; Haxeltine and Prentice, 1996; McGuire et al., 1997]. In the Coupled Model Intercomparison Project Phase 5 (CMIP5), simulation of SOC dynamics has been based mainly on existing models such as CENTURY [Parton et al., 1993], combined with dynamic global vegetation models [Cramer et al., 2001], land use scenarios [Hurtt et al., 2011], and N cycling [Edburg et al., 2011]. These models have been used to determine the strength of climate-C feedbacks [Arora et al., 2013] and the anthropogenic CO2 emissions compatible with a target CO2 pathway [Jones et al., 2013].
Various evaluation studies of CMIP5 results show that ESMs give widely different projections of soil C dynamics and fit observations poorly [Todd-Brown et al., 2013, 2014; Carvalhais et al., 2014; Yan et al., 2014]. For example, the simulated contemporary soil C stock varies from 510 to 3040 Gt C among 11 ESMs [Todd-Brown et al., 2013]. Uncertainties of similar magnitude have been found for land C models as a whole in other studies [Friedlingstein et al., 2006; Arora et al., 2013]. This variability is not well constrained by modern benchmarks and carries through to future scenarios [Todd-Brown et al., 2014]. On the other hand, theoretical analysis suggests that many soil C processes are intrinsically predictable given knowledge of the initial conditions, C input rates, soil C residence times, and their environmental sensitivities [Luo et al., 2015]. Thus, projections of soil C dynamics by current ESMs may still be improved substantially by reducing biases among models and improving their fit to observations.
Fundamentally, model projections of SOC dynamics rely on three components: model structure, parameterization (including initial values of C pools), and external variables (Figure 1) [Raupach and Lu, 2004; Luo et al., 2011]. The poor performance of current ESMs can result from biases in any of the three components across time, space, and soil depth. Model structure is the set of equations used to describe dynamic patterns of SOC processes. For example, a first-order decay function with fixed coefficients describes the monotonic decrease over time of SOC that has entered a soil pool. More complex behaviors can arise when the decay coefficients vary with time- and site-dependent microbial activities or alternative functional forms, such as Michaelis-Menten kinetics, are incorporated [Li et al., 2013; Wang et al., 2014a]. The basic patterns described by a model must be consistent with empirical evidence, as ESMs are ultimately used to predict real-world phenomena. Once the model structure is defined, coefficients in the equations are assigned through parameterization, a process necessary to complete model specification. While model structure tends to define the range of possible trajectories, the choice of parameter values for a given model defines the quantitative accuracy in specific simulations of SOC dynamics. Thus, even with the same model structure, differences in parameter values can generate divergent modeling results. Realistic projections of soil C dynamics also require more reliable external variables, representing the physical and biological environment the soil experiences. As C cycle processes are highly sensitive to environmental change, biases in soil forcings can lead to biases in model projections. Note that while some variables, such as temperature and precipitation, are treated as forcings when land models alone are used for simulation, they can be prognostic variables in coupled simulations of ESMs. 
To focus the discussion of this paper on modeling SOC dynamics, we refer to all variables that are exogenous to the soil C cycle as external variables no matter how they are represented in a model.

This paper examines each of the three modeling components in terms of their representation in the current generation of ESMs, recent advances, and our recommendations for future research. We first show that extensive exploration of alternative model structures over the past three decades has mostly converged on a similar formulation, in which C input from litter production and soil C transformations are processed through a series of first-order decay processes. The large body of experimental and observational results needs to be synthesized to evaluate whether this formulation should be altered or whether model parameters should vary with soil attributes and external variables. Second, we argue that advances in parameter estimation methods have made them ready for standard use in model calibration to reduce systematic biases among models. Third, we demonstrate that external variables are represented in ESMs in different ways and potentially cause major variations among model projections. To disentangle the complex impacts of external variables, it is necessary to trace which external variables are represented and how they are manifested in the models.
2 Structure of Soil C Models
2.1 Current Model Structures


Overall, equation 2 can conceptually express all of the soil C transformation processes and summarize structures of classic SOC models, such as the CENTURY [Parton et al., 1987, 1988, 1993] and RothC models [Jenkinson et al., 1987], as well as those embedded in ESMs [Ciais et al., 2013]. Also, this model structure is generally consistent with five fundamental properties of the terrestrial C cycle: compartmentalization, C input through photosynthesis and subsequent plant tissue senescence and mortality, partitioning among pools, donor pool-dominant transfers, and first-order decay [Luo and Weng, 2011; Luo et al., 2015]. Thousands of data sets published in the literature from litter decomposition and soil incubation studies have been used to obtain first-order decay parameters that can be used in ESMs [Zhang et al., 2008; Schädel et al., 2013, 2014; Liang et al., 2015]. The scalar function, ξ(t), in equation 2 represents the environmental modifier for decomposition and transfer rates with respect to changes in temperature, moisture, litter quality, and soil texture. Empirical studies have also indicated that temperature, moisture, litter quality, and soil texture are primary factors that control soil C decomposition and stabilization [Burke et al., 1989; Adair et al., 2008; Zhang et al., 2008; Xu et al., 2012; Wang et al., 2013a].
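As a minimal sketch of the first-order, multi-pool structure described above, the Python below assumes a generic matrix form in the spirit of equation 2, dX/dt = B·I(t) + ξ(t)·A·K·X(t), where B partitions litter input among pools, K holds baseline decay rates, A holds transfer fractions (with −1 on its diagonal), and a single scalar ξ modifies all rates. This sign convention, the three-pool layout, and all numbers are assumptions for illustration; spin-up by forward-Euler stepping stands in for whatever solver a real ESM uses.

```python
# Hypothetical three-pool cascade: litter (0) -> fast SOC (1) -> slow SOC (2).
B = [1.0, 0.0, 0.0]      # fraction of litter input entering each pool (assumed)
K = [1.0, 0.1, 0.02]     # baseline first-order decay rates, 1/yr (illustrative)
A = [                    # A[i][j]: fraction of C leaving pool j that enters pool i
    [-1.0, 0.0, 0.0],    # diagonal -1: all decayed C leaves the donor pool
    [0.4, -1.0, 0.0],    # 40% of decayed litter C enters the fast pool; the rest is respired
    [0.0, 0.3, -1.0],    # 30% of decayed fast-pool C enters the slow pool
]


def dxdt(x, inp, xi):
    """dX/dt = B*I + xi * A K X; xi scales every decay and transfer rate."""
    n = len(x)
    return [
        B[i] * inp + xi * sum(A[i][j] * K[j] * x[j] for j in range(n))
        for i in range(n)
    ]


def spin_up(inp, xi, dt=0.1, tol=1e-10, max_steps=1_000_000):
    """Step forward from empty pools until every pool tendency falls below tol."""
    x = [0.0] * len(K)
    for _ in range(max_steps):
        d = dxdt(x, inp, xi)
        if max(abs(v) for v in d) < tol:
            return x
        x = [xv + dt * dv for xv, dv in zip(x, d)]
    raise RuntimeError("spin-up did not converge")


if __name__ == "__main__":
    print("baseline pools:", [round(v, 6) for v in spin_up(1.0, 1.0)])
    print("doubled input: ", [round(v, 6) for v in spin_up(2.0, 1.0)])
    print("halved xi:     ", [round(v, 6) for v in spin_up(1.0, 0.5)])
```

With these numbers the baseline steady state is approximately [1, 4, 6]. Doubling input or halving ξ doubles every pool, which is the linear behavior that makes steady-state SOC in such models the product of C input and residence time.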
2.2 Recent Advances
If the fundamental processes of SOC dynamics are straightforward (i.e., C inputs and transformations before release as CO2) and can be described by a relatively simple set of equations, why do ESMs disagree so much in projecting global soil C dynamics? To answer this question, we need to understand what we have learned from empirical studies on SOC transformation and C input.
SOC transformations are regulated by environmental variables (e.g., temperature, moisture, oxygen, N, phosphorus, and acidity, which vary with soil profile, space, and time), litter quality (e.g., lignin, cellulose, N, or their relative content), organomineral properties of SOC (e.g., complex chemical compounds, aggregation, physicochemical binding and protection, reactions with inorganic reactive surfaces, and sorption), and microbial attributes (e.g., community structure, functionality, priming, acclimation, and other physiological adjustments) (Figure 2). Meanwhile, C inputs are regulated by plant allocation strategies (e.g., root/shoot ratio) and root biology (e.g., rooting depth, rhizodeposition, and symbiotic relationships with mycorrhizae). In addition, both C inputs and transformations are influenced by soil erosion, mineralogy [Egli et al., 2008; Dümig et al., 2011; Sistla and Schimel, 2013; Doetterl et al., 2015], topography, land management, land use change, and other disturbances. Because no model can incorporate all of these factors and processes, the art of soil modeling lies in determining what should be represented explicitly and what can be ignored. Because no consensus has emerged on what should be represented explicitly, the modeling community has explored several types of alternatives to the model structure of equation 2.

First, nonlinear microbe-explicit models, which are fundamentally different from the first-order, linear C transfer model, have been developed to account for microbial roles in the decomposition and stabilization of SOC [Wieder et al., 2015]. These models have the potential to explain priming effects [Kuzyakov, 2010], acclimation [Luo et al., 2001; Bradford et al., 2010], and pulse responses of soil respiration to wet-dry cycles of precipitation [Liu et al., 2002; Lawrence et al., 2009]. Many microbial models are based on forward or reverse Michaelis-Menten equations [Schimel and Weintraub, 2003; Allison et al., 2010; Wang et al., 2013b; Wieder et al., 2013, 2014] that mathematically couple microbial biomass to C substrate pools. Such formulations generate patterns of soil C dynamics markedly different from those of classic models, such as oscillatory responses to perturbation and insensitivity of soil C storage to C input [Li et al., 2014; Wang et al., 2014b; Hararuk et al., 2015]. Such patterns may exist at microbial reaction sites but have not been observed in litter decomposition and soil incubation studies. Other ways to represent microbial processes include multiplying decomposition coefficients by microbial biomass [Fujita et al., 2014], making decomposition a function of substrate chemistry and enzyme-based microbial guilds [Moorhead and Sinsabaugh, 2006], and embedding soil enzyme dynamics in an ecosystem model [Sistla and Schimel, 2013]. These approaches may avoid the undesirable oscillations of nonlinear microbial models while still accounting for microbial roles in SOC decomposition. Nevertheless, it has also been argued that SOC decomposition is more likely to be limited by substrate availability than by microbial activity [Schimel and Schaeffer, 2012].
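To illustrate why coupling microbial biomass to substrate changes model behavior, the sketch below uses a generic two-pool forward Michaelis-Menten formulation. The parameter names (vmax, km, eps for carbon use efficiency, m for microbial turnover) and values are hypothetical, and dead biomass is assumed to leave the system, which is a simplification. Setting both tendencies to zero gives an analytic steady state whose substrate C does not depend on input, in contrast with a first-order pool whose steady state scales with input.

```python
def microbial_rhs(c, b, inp, vmax, km, eps, m):
    """Tendencies of substrate C (c) and microbial biomass C (b).

    Forward Michaelis-Menten uptake: vmax * b * c / (km + c). A fraction eps of
    uptake is assimilated and the rest respired; biomass turns over at rate m
    (dead biomass is assumed to leave the system in this simplified sketch).
    """
    uptake = vmax * b * c / (km + c)
    return inp - uptake, eps * uptake - m * b


def microbial_steady_state(inp, vmax, km, eps, m):
    """Analytic steady state obtained by setting both tendencies to zero."""
    if eps * vmax <= m:
        raise ValueError("no positive steady state: eps * vmax must exceed m")
    c_star = km * m / (eps * vmax - m)   # independent of inp
    b_star = eps * inp / m               # extra input accumulates as biomass
    return c_star, b_star


def first_order_steady_state(inp, k):
    """Classic one-pool model dC/dt = inp - k C has steady state inp / k."""
    return inp / k


if __name__ == "__main__":
    params = dict(vmax=0.5, km=250.0, eps=0.4, m=0.05)   # illustrative values only
    for inp in (1.0, 2.0):
        c, b = microbial_steady_state(inp, **params)
        print(f"input={inp}: microbial-model SOC={c:.2f}, biomass={b:.2f}, "
              f"first-order SOC={first_order_steady_state(inp, k=0.01):.2f}")
```

Doubling input leaves the microbial model's steady-state substrate C unchanged and doubles its biomass, whereas the first-order pool doubles.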
The second category of alternative model structures concerns differences in how pools are defined. In some models, the conceptual pools of the classic SOC models, which are distinguished by their decomposability [Jenkinson, 1990; Parton et al., 1993], have been replaced by measurable SOC fractions [Smith et al., 2002; Stewart et al., 2008; Luo et al., 2014] or by continuous functions [Ågren and Bosatta, 1996]. A key motivation for this change in definitions was the need to reconcile models with observations of 13C and 14C data. Many long-lived compounds are in principle highly decomposable, implying that the idea of a fixed distribution of turnover times, as in classic soil models, is too simplistic [Schmidt et al., 2011]. A vertically resolved soil biogeochemical scheme has been introduced with mixing of soil C and N among soil layers due to bioturbation, cryoturbation, and diffusion [Koven et al., 2013]. The soil C dynamics in each layer can still be represented with the classic model structure, so inclusion of the vertical dimension does not alter the fundamental behavior of the models. Soil C storage in vertically resolved models is still determined jointly by C influx and residence times.
Third, while the model formulation to simulate C transfer may be similar to the classic models, different response functions (i.e., different ξ(t) in equation 2) are used to simulate C cycle responses to external variables. For example, temperature modifies almost all processes in the C cycle. A variety of formulations, including exponential, Arrhenius, and optimal response functions, have been used to describe C cycle responses to temperature changes in different models [Lloyd and Taylor, 1994; Jones et al., 2005; Sierra et al., 2015a]. Similarly, influences of soil water content on the C cycle are represented by empirically derived coefficients to modify temperature response functions [e.g., Falloon et al., 2011; Moyano et al., 2012, 2013] or limitations of substrate supply for enzymatic processes due to diffusion through water films of varying thickness as soil moisture varies [Davidson et al., 2012]. Different response functions are used to link C cycle processes with nutrient availability, soil clay content, litter quality, and many other environmental conditions. Differences in the use of individual response functions may not change basic dynamics but can substantially contribute to the uncertainty in model projections [Exbrayat et al., 2013].
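The temperature response families named above can be compared directly. The sketch below normalizes three of them to 1 at a 10 °C reference: an exponential (Q10) form, an Arrhenius form, and the Lloyd and Taylor [1994] modified-Arrhenius form. The Q10 of 2 and the activation energy of 50 kJ mol⁻¹ are illustrative assumptions; the Lloyd and Taylor constants (E0 = 308.56 K, T0 = 227.13 K) are those commonly quoted for that function.

```python
import math

R_GAS = 8.314      # universal gas constant, J mol-1 K-1
T_REF_C = 10.0     # reference temperature, degC (assumed)


def q10_scalar(t_c, q10=2.0, t_ref_c=T_REF_C):
    """Exponential (Q10) response: rate multiplies by q10 per 10 degC of warming."""
    return q10 ** ((t_c - t_ref_c) / 10.0)


def arrhenius_scalar(t_c, ea=50_000.0, t_ref_c=T_REF_C):
    """Arrhenius response with activation energy ea (J mol-1), normalized at t_ref_c."""
    t_k, t_ref_k = t_c + 273.15, t_ref_c + 273.15
    return math.exp(-ea / R_GAS * (1.0 / t_k - 1.0 / t_ref_k))


def lloyd_taylor_scalar(t_c, e0=308.56, t0=227.13, t_ref_c=T_REF_C):
    """Lloyd and Taylor (1994) form; defined only above t0 (about -46 degC)."""
    t_k, t_ref_k = t_c + 273.15, t_ref_c + 273.15
    if t_k <= t0:
        raise ValueError("temperature must exceed t0")
    return math.exp(e0 * (1.0 / (t_ref_k - t0) - 1.0 / (t_k - t0)))


if __name__ == "__main__":
    for t in (0.0, 10.0, 20.0, 30.0):
        print(f"{t:5.1f} degC  Q10={q10_scalar(t):.3f}  "
              f"Arrhenius={arrhenius_scalar(t):.3f}  Lloyd-Taylor={lloyd_taylor_scalar(t):.3f}")
```

All three agree at the reference temperature but diverge away from it. At 20 °C these settings give roughly 2.0, 2.1, and 2.3, respectively, so the choice of function alone shifts the modeled decomposition rate even when the rest of the model is identical.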
Fourth, disturbance events may be represented in models in different ways [Grosse et al., 2011; West et al., 2011; Goetz et al., 2012; Hicke et al., 2012]. Soil erosion, for example, can be modeled with horizontal movement of C, adding a third dimension to classic two-dimensional models [Rosenbloom et al., 2006]. Other disturbances, such as fire, extreme drought, insect outbreaks, land management, and land cover and land use change can be represented in models to influence soil C dynamics via (1) modifying soil and microclimatic environments; (2) transferring C from one pool to another (e.g., from live to dead pools during storms or release to the atmosphere with fire); and (3) altering rates of C processes, for example, gross primary productivity (GPP), growth, tree mortality, or heterotrophic respiration [Kloster et al., 2010; Thonicke et al., 2010; Luo and Weng, 2011; Prentice et al., 2011; Weng et al., 2012]. Although many disturbance events can be incorporated into classic models without changing the basic formulation (i.e., equation 2) [see, e.g., Weng et al., 2012], the structure of a model with disturbances represented is different from that without, leading to different simulated SOC dynamics.
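Mechanism (2) above, transferring C among pools, can be added to a first-order model as an instantaneous event applied between time steps, leaving the decay equations untouched. The following is a minimal sketch of a hypothetical fire event; the pool names and the combustion and mortality fractions are assumptions for illustration.

```python
def apply_fire(pools, combust_frac, mortality_frac):
    """Instantaneous fire: burn part of live and litter C; move killed live C to litter.

    pools: dict with hypothetical keys 'live', 'litter', 'soil' (C stocks, any unit).
    Returns (new_pools, c_emitted_to_atmosphere). Soil C is assumed not to burn.
    """
    for f in (combust_frac, mortality_frac):
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    burned_live = combust_frac * pools["live"]
    killed = mortality_frac * (pools["live"] - burned_live)   # dead but not combusted
    burned_litter = combust_frac * pools["litter"]
    new_pools = {
        "live": pools["live"] - burned_live - killed,
        "litter": pools["litter"] - burned_litter + killed,
        "soil": pools["soil"],
    }
    return new_pools, burned_live + burned_litter


if __name__ == "__main__":
    before = {"live": 100.0, "litter": 20.0, "soil": 150.0}
    after, emitted = apply_fire(before, combust_frac=0.2, mortality_frac=0.5)
    print(after, "emitted:", emitted)
```

After the event, the pools relax under the unchanged first-order equations. The enlarged litter pool then decays and transfers C to the soil pools, which is how a disturbance can alter simulated SOC dynamics without a new formulation.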
Lastly, a model's structure is commonly considered altered, even if its formulation is unchanged, when different processes are represented to simulate the environments that soils experience. Most of the ESMs involved in CMIP5 underestimate C storage in wetland and peatland regions [Limpens et al., 2008] because C dynamics in wetlands and peatlands are not simulated. The different rates of decomposition under anaerobic conditions in wetlands and peatlands are ultimately expressed through changes in the parameters of equation 2 rather than in its formulation. Similarly, the dramatically increased decomposition rates of thawed soil C in permafrost regions can be represented by different parameter values in models that explicitly simulate permafrost processes [Hobbie et al., 2000; Koven et al., 2013; Schädel et al., 2014]. More generally, model structural uncertainty can arise from differences in which external variables are represented, even when all the models use a formulation similar to equation 2.
In sum, past studies have extensively explored alternative model structures. With the exception of the nonlinear microbial models and the three-dimensional erosion models, all the alternative structures conform to the basic formulation: C inputs from litter production and soil C transformations are processed through a series of first-order kinetic transfer functions between pools (equation 2). The processes and factors depicted in Figure 2, and even different ways of defining pools, all influence soil C dynamics mainly through their effects on the rate parameters and state variables of equation 2. However, how these rate parameters and state variables are affected by those factors is not well specified.
2.3 Future Research Directions
It is urgent to synthesize experimental and observational results to better represent soil carbon cycling processes in ESMs. Thousands of papers have been published from observational and experimental studies on many processes and factors that influence soil carbon dynamics. Although empirical results from litter decomposition and soil incubation studies have been partly synthesized to verify the first-order kinetic transfer [Zhang et al., 2008; Xu et al., 2016], the majority of data published in the literature has not been integrated to help modelers decide which processes should be explicitly represented.
Data synthesis is particularly crucial to evaluate how microbial processes should be incorporated into ESMs (Figure 3). Microorganisms have long been known to catalyze almost all the SOC transformation processes (e.g., decomposition, stabilization, and mineralization). Developing microbial models to project SOC dynamics will require firm empirical evidence to address two questions: (1) how do microbial functions vary with environmental factors? and (2) does that variation significantly affect decomposition and other key soil processes [Schimel, 2001]? Although many microbial models have been proposed to explore possible microbial roles in SOC dynamics [Wieder et al., 2015], these models need rigorous evaluation with observations before they can be incorporated into ESMs.

In addition, a growing number of studies indicate that multilayer soil C models are needed to account for depth-dependent variations in C inputs and decomposition. For instance, radiocarbon ages of soils in most climate zones increase rapidly with depth [Mathieu et al., 2015], implying that some properties of soil C dynamics may differ between shallow and deep layers. Moreover, decomposition rates in permafrost-affected soils drop off abruptly with depth as soils transition from a seasonally thawed active layer at the surface to permanently frozen layers.
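A minimal sketch of depth-dependent turnover follows. It assumes independent layers (no bioturbation, cryoturbation, or diffusion, unlike the vertically resolved scheme of Koven et al. [2013]), inputs and decay rates that decline exponentially with depth, and a hypothetical active-layer depth below which decay rates are scaled down sharply. All parameter values are illustrative. Each layer then reaches a steady state equal to its input divided by its decay rate k, with a turnover time of 1/k.

```python
import math


def layer_profile(n_layers=10, dz=0.1, k_surface=0.05, k_efold=0.5,
                  input_total=1.0, input_efold=0.2,
                  active_layer=0.5, frozen_factor=0.01):
    """Steady-state SOC and turnover time in independent, uniformly thick layers.

    Depths in m, rates in 1/yr, inputs in C per yr (all values hypothetical).
    Below `active_layer`, decay rates are multiplied by `frozen_factor`.
    """
    mids = [(i + 0.5) * dz for i in range(n_layers)]         # layer midpoint depths
    weights = [math.exp(-z / input_efold) for z in mids]     # root-like input profile
    total_w = sum(weights)
    layers = []
    for z, w in zip(mids, weights):
        k = k_surface * math.exp(-z / k_efold)
        if z > active_layer:
            k *= frozen_factor                                # permanently frozen layer
        inp = input_total * w / total_w
        layers.append({"depth_m": z, "input": inp, "k": k,
                       "turnover_yr": 1.0 / k, "soc": inp / k})
    return layers


if __name__ == "__main__":
    for layer in layer_profile():
        print(f"{layer['depth_m']:.2f} m  turnover {layer['turnover_yr']:9.1f} yr  "
              f"SOC {layer['soc']:8.2f}")
```

Turnover time rises gradually within the active layer and then jumps by roughly two orders of magnitude at the frozen boundary, a qualitative pattern that a single well-mixed pool cannot represent.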
Moreover, many processes associated with disturbances, land management, dynamic vegetation, and nutrients all potentially have strong effects on soil C dynamics. Soil C processes in wetlands and peatlands are also strongly regulated by anaerobic conditions, leading to SOC dynamics distinct from those of uplands. Ideally, all important processes would be incorporated into soil C models; in practice, no model can do so. The challenge, therefore, is to decide which processes to include and which to omit.
3 Parameterization of Soil C Models
3.1 Current Practices
Parameterization is a major cause of the mismatch between modeled and observed soil C stocks. For example, global soil C stocks varied from 510 Gt C to 3040 Gt C among 11 models involved in CMIP5 [Todd-Brown et al., 2013] and ranged from 425 to 2111 Gt C among 10 terrestrial biosphere models in the Multi-scale Synthesis and Terrestrial Model Intercomparison Project [Tian et al., 2015]. The large differences in modeled soil C stocks were attributed to threefold differences in C inputs and fourfold differences in residence times [Todd-Brown et al., 2013; Carvalhais et al., 2014; Yan et al., 2014]. Moreover, ESMs simulated different responses of land C storage and emissions to land use, partly due to different parameterizations of land surface processes [Brovkin et al., 2013].
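The attribution to inputs and residence times rests on a steady-state relation: if a model's soil C is near equilibrium, its stock is the product of C input and mean residence time, so ratios between models multiply. A worked check with the figures quoted above, under that steady-state assumption:

```latex
\begin{aligned}
X_{\mathrm{ss}} &= I\,\tau,
\qquad
\frac{X_{\mathrm{ss}}^{(a)}}{X_{\mathrm{ss}}^{(b)}}
  = \frac{I^{(a)}}{I^{(b)}} \cdot \frac{\tau^{(a)}}{\tau^{(b)}} \\
\text{largest possible ratio} &= 3 \times 4 = 12,
\qquad
\text{reported CMIP5 ratio} = \frac{3040}{510} \approx 6.0
\end{aligned}
```

If both ranges describe the same set of models, the roughly sixfold spread in stocks, about half the 12-fold maximum, implies that the extremes in input and in residence time do not occur in the same models.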
Parameterization is among the least studied components of modeling and has been analyzed relatively crudely [Luo et al., 2001]. It is well known among modelers that a model with well-calibrated parameters at one site may not reliably estimate SOC dynamics at other sites unless the parameters are adjusted again [Xiao et al., 2014]. Parameter calibration has traditionally been used to fit model output to observations in simulation models. Parameter values are sometimes based directly on observations and experiments, so that they fall within observed ranges, when data can be converted directly to parameter values, as with specific rates of litter decomposition [Zhang et al., 2008]. Alternatively, parameter values can be set by educated guess when they cannot be measured because of technical limitations, as with root exudation. Parameter values can also be set to generate reasonable pool sizes, such as fine root biomass resulting from root growth and mortality. Finally, parameter values are sometimes chosen simply to ensure model stability.
Although parameter calibration has long been common practice in the ecological modeling community, it is often impractical for ESMs, because it is very difficult to identify which parameters of a complicated model can be effectively calibrated to fit data well across diverse landscapes. Without rigorous parameter calibration, and with some processes omitted, all ESMs and global land models show systematic biases in modeled soil C stocks [Luo et al., 2015]. Spatially, SOC estimates from individual models are poorly correlated with the Harmonized World Soil Database (HWSD) [Food and Agriculture Organization et al., 2012], showing either systematic underestimation or overestimation [Todd-Brown et al., 2013; Tian et al., 2015], although there is considerable uncertainty in the HWSD itself.
3.2 Recent Advances
Model parameters can be calibrated with data using statistically rigorous methods such as data assimilation [Luo et al., 2011]. Data assimilation integrates multiple sources of information from field observations to constrain parameters of C cycle models at ecosystem, regional, and global scales [Xu et al., 2006; Zhou and Luo, 2008; Weng and Luo, 2011; Zhou et al., 2012; Niu et al., 2014]. At the global scale, for example, Hararuk et al. [2014, 2015] applied data assimilation methods to estimate the optimal parameters of the Community Land Model (CLM3.5) and two nonlinear microbial models with global databases of soil C content and microbial biomass. The optimized model explained 41% of the global variability in observed SOC, compared with 27% with the original parameters of CLM3.5 (Figure 4). The estimated SOC at regional and global scales still shows large mismatches, at least partly because of the assumption that the training data represent soils at steady state [Carvalhais et al., 2008, 2010; Weng et al., 2011] and partly because of model structural errors (e.g., no representation of peatlands or wetlands). When parameters that quantify nonsteady states of the C cycle are estimated, the mismatches decrease significantly for both vegetation and soil C pools [Zhou et al., 2013].
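The core of such data assimilation can be sketched with a random-walk Metropolis sampler. The example below is a deliberately small, hypothetical case, not the CLM3.5 calibration described above: it estimates a single decay rate k of a one-pool incubation model from synthetic observations, assuming Gaussian errors with known σ and a uniform prior on k.

```python
import math
import random


def simulate(k, x0, times):
    """Remaining C in a one-pool first-order incubation: x0 * exp(-k t)."""
    return [x0 * math.exp(-k * t) for t in times]


def log_likelihood(k, x0, times, obs, sigma):
    """Gaussian log-likelihood up to an additive constant."""
    return -0.5 * sum(((p - o) / sigma) ** 2
                      for p, o in zip(simulate(k, x0, times), obs))


def metropolis(times, obs, x0, sigma, k_bounds=(1e-4, 1.0), k_init=0.2,
               n_iter=20_000, step=0.002, seed=0):
    """Random-walk Metropolis sampling of k under a uniform prior on k_bounds."""
    rng = random.Random(seed)
    lo, hi = k_bounds
    k = k_init
    ll = log_likelihood(k, x0, times, obs, sigma)
    chain = []
    for _ in range(n_iter):
        k_new = k + rng.gauss(0.0, step)          # symmetric proposal
        if lo < k_new < hi:                       # proposals outside the prior are rejected
            ll_new = log_likelihood(k_new, x0, times, obs, sigma)
            if ll_new >= ll or rng.random() < math.exp(ll_new - ll):
                k, ll = k_new, ll_new
        chain.append(k)
    return chain[n_iter // 2:]                    # drop the first half as burn-in


if __name__ == "__main__":
    # Synthetic "observations": true k = 0.05 per day plus Gaussian noise (hypothetical).
    times = [float(t) for t in range(0, 101, 10)]
    noise = random.Random(42)
    obs = [x + noise.gauss(0.0, 1.0) for x in simulate(0.05, 100.0, times)]
    post = metropolis(times, obs, x0=100.0, sigma=1.0)
    print(f"posterior mean k = {sum(post) / len(post):.4f}")
```

The retained chain approximates the posterior distribution of k rather than a single best value. Real applications constrain many parameters at once against several data streams, which is where the identifiability issues discussed below arise.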

The research community at large has recently made substantial progress in several aspects of understanding parameters and estimating them with data assimilation. First, soil C dynamics are known to be quantitatively determined by relative changes in C inputs and residence times, even though a typical soil C cycle model has tens or hundreds of parameters [Todd-Brown et al., 2014; Luo et al., 2015]. Thus, the behavior of modeled SOC dynamics at any point in time, space, and soil depth can be analyzed in terms of variations in C influx and residence time as they relate to environmental variables, litter quality, SOC properties, microbial attributes, and disturbances [Xia et al., 2013]. When parameters related to C residence times were calibrated against a global soil C database with data assimilation, model agreement with observations improved substantially (Figure 4) [Hararuk et al., 2014].
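Because steady-state stock is the product of input and residence time, a difference between two models can be split additively in log space. The following is a minimal sketch in the spirit of the traceability approach of Xia et al. [2013], assuming near-steady-state stocks. The model values are hypothetical, and a fuller traceability analysis would decompose residence time further, for example into baseline rates and environmental scalars.

```python
import math


def trace_stock_difference(input_a, tau_a, input_b, tau_b):
    """Attribute the log-ratio of steady-state stocks (X = I * tau) of models a and b.

    Returns the total log-ratio and the shares due to input and to residence time.
    """
    log_input = math.log(input_a / input_b)
    log_tau = math.log(tau_a / tau_b)
    total = log_input + log_tau                  # equals log((I_a*tau_a)/(I_b*tau_b))
    if total == 0.0:
        return {"total": 0.0, "input_share": None, "tau_share": None}
    return {"total": total,
            "input_share": log_input / total,
            "tau_share": log_tau / total}


if __name__ == "__main__":
    # Hypothetical global values: input in Gt C/yr, residence time in yr.
    print(trace_stock_difference(input_a=60.0, tau_a=40.0, input_b=50.0, tau_b=15.0))
```

Here model a stores 2400 Gt C and model b 750 Gt C, and about 84% of the log-difference traces to residence time. Shares can fall outside [0, 1] when the input and residence-time differences have opposite signs.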
Second, it is important to recognize that individual data sets contain information that may be useful in constraining only a subset of parameters [Weng and Luo, 2011]. For example, flux data can constrain flux-related parameters but contain little information with which to constrain pool-based parameters, and vice versa [Keenan et al., 2013; Du et al., 2015]. Thus, soil respiration and net ecosystem exchange data from flux measurements may not be useful for constraining soil C dynamics unless they are combined with pool-related data sets [Sierra, 2012; Keenan et al., 2013]. A few C cycle data assimilation systems have been developed to assimilate data mostly from eddy flux, atmospheric, and satellite measurements [Rayner et al., 2005; Kaminski et al., 2013], whereas several studies use only pool-based data sets to constrain C transfer coefficients among pools [Hararuk et al., 2014]. Global land models can be improved more effectively when flux- and pool-based global data sets are combined to constrain both C inputs and residence times [e.g., Smith et al., 2013].
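Why flux data alone leave pool parameters unconstrained can be seen in a one-pool model at steady state. This sketch ignores transients and root respiration. At steady state, heterotrophic respiration equals input regardless of residence time, so only a pool measurement separates the two. The parameter sets below are hypothetical.

```python
def steady_state_observables(inp, tau):
    """One-pool steady state: respiration flux equals input; stock equals input * tau."""
    return {"respiration": inp, "stock": inp * tau}


def invert_observables(respiration, stock=None):
    """Recover (input, tau) from observations; tau requires the pool-based datum."""
    inp = respiration
    tau = None if stock is None else stock / respiration
    return inp, tau


if __name__ == "__main__":
    fast = steady_state_observables(inp=10.0, tau=20.0)
    slow = steady_state_observables(inp=10.0, tau=40.0)
    # Identical fluxes: respiration data cannot distinguish the two residence times.
    print("same flux:", fast["respiration"] == slow["respiration"])
    print("flux only:", invert_observables(fast["respiration"]))
    print("flux + pool:", invert_observables(fast["respiration"], fast["stock"]))
```

With the flux alone, residence time stays undetermined (None); adding the stock recovers both parameters.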
Third, most parameter estimation studies have shown that the number of parameters constrained by observational data sets is limited, typically to only a few parameters per data set [Wang et al., 2001; Xu et al., 2006]. When a few data sets are used to calibrate parameters of complex models such as ESMs, calibrated models with very different structures can fit existing observations equally well yet project widely different responses to future scenarios [He et al., 2014]. This is an issue of equifinality with respect to the available information [Luo et al., 2009; Sierra et al., 2015b]. To reduce equifinality, multiple sources of diverse, high-quality observations are necessary (Table 1). Because soil C content is a complex product of litter production and SOC decomposition [Luo et al., 2003; Zhou and Luo, 2008; Xia et al., 2013], all parameters related to C inputs, allocation, and decomposition need to be constrained by multiple observations [Smith et al., 2013]. In addition, observations that constrain parameters controlling soil physics (for example, thermal conductivity and water holding capacity) will indirectly help constrain SOC decomposition. The data products most effective for constraining parameters of soil C models include soil C pools, litter C pools, and root biomass [Jackson et al., 2000] (Table 1). NPP and litterfall can be used to constrain C inputs to the soil system. Radiocarbon data are very informative as joint constraints on parameters of the slow processes in soil C models [Gaudinski et al., 2000; Trumbore, 2009; Baisden and Canessa, 2013; Koven et al., 2013].
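Equifinality can be reproduced with two hypothetical structures calibrated to the same steady-state stock and flux: a single pool, and a fast-plus-slow pair whose slow rate is tuned to match. They agree on present-day observations but diverge after a step increase in input. The sketch uses the exact relaxation of a first-order pool, and all numbers are illustrative.

```python
import math

I0 = 10.0              # baseline input, C per yr (hypothetical)
TARGET_STOCK = 200.0   # "observed" steady-state stock that both structures must match


def relax(x0, inp, k, t):
    """Exact solution of dX/dt = inp - k X at time t, starting from x0."""
    x_ss = inp / k
    return x_ss + (x0 - x_ss) * math.exp(-k * t)


# Structure 1: one pool, with its rate chosen to match the target stock.
K_ONE = I0 / TARGET_STOCK
# Structure 2: fraction F_FAST of input to a fast pool; slow-pool rate tuned to the target.
F_FAST, K_FAST = 0.5, 0.5
K_SLOW = (1 - F_FAST) * I0 / (TARGET_STOCK - F_FAST * I0 / K_FAST)


def total_stock(new_input, years):
    """Total stock of each structure `years` after input steps from I0 to new_input."""
    one = relax(I0 / K_ONE, new_input, K_ONE, years)
    fast = relax(F_FAST * I0 / K_FAST, F_FAST * new_input, K_FAST, years)
    slow = relax((1 - F_FAST) * I0 / K_SLOW, (1 - F_FAST) * new_input, K_SLOW, years)
    return one, fast + slow


if __name__ == "__main__":
    print("before change:", total_stock(I0, 0.0))
    print("10 yr after doubling input:", total_stock(2 * I0, 10.0))
```

Both structures start at 200 units with respiration equal to input, yet ten years after input doubles the one-pool structure holds about 279 units and the two-pool structure about 254. Only observations that resolve the pool partitioning, such as radiocarbon, would discriminate between them.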
| Category | Data Type | Temporal Frequency | Spatial Coverage | Model Constraints |
|---|---|---|---|---|
| Observation | Litter production | Mostly annual | DPMᵃ | C input to soil pools |
| | Litter mass | Mostly monthly | DPM | Litter pools and transfer coefficients |
| | Litter decomposition | Mostly monthly | DPM | Rates of litter decomposition |
| | Root growth | Irregular | DPM | Rates of root growth |
| | Root biomass | Irregular | DPM | Root pools and transfer coefficients |
| | Soil carbon | Once every few years | DPM | Soil carbon pools and transfer coefficients |
| | Soil microbial biomass | Irregular | DPM | Microbial pools and transfer coefficients |
| | Soil respiration | Mostly monthly | DPM | Rates of soil carbon decomposition |
| Isotopes | ¹³C in soil and efflux | Irregular | DPM | Carbon transfer |
| | ¹⁴C in soil and efflux | Irregular | DPM | Residence times of soil carbon |
| Vertical profiles | Root, soil C, and ¹⁴C | Irregular | DPM | Vertical transfer and properties of soil carbon |
| Global change experiment (elevated CO2, warming, precipitation, and nitrogen) | Litter, soil, and root dynamics | Weekly to yearly | Mostly in temperate and boreal regions and a few from tropical regions | Response functions of soil carbon processes to global change factors |

ᵃDPM = distributed point measurement.
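The equifinality problem described above can be illustrated with a minimal sketch, assuming a hypothetical one-pool soil model, dX/dt = I − X/τ, whose steady-state stock is X = I·τ and whose steady-state efflux equals the influx I. All observation values, error terms, the parameter grid, and the function names (`misfit`, `best_fits`) are made up for illustration. With a stock observation alone, every (I, τ) pair whose product matches the stock fits equally well; adding a flux observation singles out one pair.

```python
"""Equifinality in a hypothetical one-pool soil C model, dX/dt = I - X / tau.

At steady state the stock is X = I * tau and the heterotrophic efflux equals I.
All observations and error values below are made-up illustrative numbers.
"""

OBS_STOCK, SIGMA_STOCK = 10.0, 0.5   # soil C stock (kg C m^-2) and its error
OBS_FLUX, SIGMA_FLUX = 0.5, 0.05     # heterotrophic respiration (kg C m^-2 yr^-1)


def misfit(influx, tau, use_flux):
    """Sum of squared standardized residuals for a steady-state parameter pair."""
    stock = influx * tau
    efflux = stock / tau  # equals the influx at steady state
    cost = ((stock - OBS_STOCK) / SIGMA_STOCK) ** 2
    if use_flux:
        cost += ((efflux - OBS_FLUX) / SIGMA_FLUX) ** 2
    return cost


def best_fits(use_flux, tol=1e-9):
    """All (influx, tau) grid points whose misfit ties with the minimum."""
    grid = [(i / 10, t) for i in range(1, 21) for t in range(5, 101, 5)]
    costs = {pair: misfit(*pair, use_flux) for pair in grid}
    best = min(costs.values())
    return sorted(pair for pair, cost in costs.items() if cost - best < tol)


print(best_fits(use_flux=False))  # six pairs, all with I * tau = 10
print(best_fits(use_flux=True))   # [(0.5, 20)]
```

The pool-based datum constrains only the product I·τ, while the flux-based datum constrains I directly, which is why combining the two data streams removes the tie in this toy case.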
Fourth, the quality of the data sets affects the calibrated parameters: higher-quality data sets used in data assimilation yield more representative parameter estimates. One of the most popular soil databases used to benchmark ESMs is the HWSD. The HWSD presents estimates of SOC content for 30 × 30 arc sec grid cells (~1 × 1 km), using class transfer functions that take into account regional differences in soil types [Omuto et al., 2013]. It is important that such data sets be accompanied by soil reference data that encompass the factors important in soil formation. Recently, digital soil mapping techniques, including machine learning algorithms, have been developed that draw on large soil profile databases and analyses of environmental covariates representing soil-forming factors [Arrouays et al., 2014; Hengl et al., 2014], leading to much improved accuracy (e.g., r² = 0.61 for SOC content in Africa at 250 m resolution) [Hengl et al., 2015]. Alternatively, data assimilation can directly use spatially distributed data points [Zhou et al., 2013], avoiding the uncertainty introduced by the harmonization required for HWSD-type mapping approaches.
Fifth, parameters are not necessarily constant, as is often assumed in traditional simulation models. A growing number of syntheses have shown that model parameters vary among measurement sites, often change with time, and may be better represented as probability distributions [Medlyn et al., 1999; Lebauer et al., 2013]. For example, fine root allocation tends to follow a log-normal distribution [Saugier et al., 2001]. Moreover, parameters estimated from data sets derived from global change experiments vary with the global change factors applied [Luo et al., 2003; Xu et al., 2006; Shi et al., 2015]. For example, a comparison of posterior probability density functions from data assimilation studies showed that estimated C turnover in foliage and fine root pools was much higher under elevated than under ambient CO2 at the Duke Forest free-air CO2 enrichment site [Luo et al., 2003; Xu et al., 2006]. These differences in estimated parameters propagate through models, leading to differences in simulated soil C pools.
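Treating a parameter as a probability distribution rather than a constant changes what a model predicts. As a minimal sketch, assuming a one-pool steady state X = I·τ and a hypothetical log-normal distribution for the residence time τ (the median, geometric standard deviation, and influx below are invented values, not estimates from the cited studies), Monte Carlo sampling propagates the parameter distribution to a distribution of soil C stocks:

```python
import math
import random
import statistics

INFLUX = 0.5        # hypothetical C input (kg C m^-2 yr^-1)
TAU_MEDIAN = 20.0   # hypothetical median residence time (yr)
TAU_GSD = 1.5       # hypothetical geometric standard deviation of tau


def sample_stocks(n=20000, seed=1):
    """Monte Carlo propagation of a log-normal residence time through X = I * tau."""
    rng = random.Random(seed)
    mu, sigma = math.log(TAU_MEDIAN), math.log(TAU_GSD)
    return [INFLUX * rng.lognormvariate(mu, sigma) for _ in range(n)]


stocks = sample_stocks()
point_estimate = INFLUX * TAU_MEDIAN        # stock from a single "typical" tau
cuts = statistics.quantiles(stocks, n=20)   # 19 cut points in 5% steps
print(f"point estimate: {point_estimate:.1f}")
print(f"median {statistics.median(stocks):.1f}, mean {statistics.mean(stocks):.1f}")
print(f"90% interval: {cuts[0]:.1f} to {cuts[-1]:.1f} kg C m^-2")
```

Because the log-normal distribution is right-skewed, the mean simulated stock exceeds the stock computed from the median parameter value, so a single "typical" constant can bias aggregate estimates even when it matches the most common site.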
In sum, technical developments over the past decade or so have made it possible to use data assimilation to rigorously calibrate parameters of soil C models as standard practice. Systematic biases of soil C models can be effectively reduced when many high-quality flux- and pool-based data sets, spanning long periods and large areas, are used to calibrate two synthetic parameters: C influx and residence time.
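A minimal sketch of such a calibration follows, assuming a hypothetical one-pool model at steady state, Gaussian observation errors, uniform priors, and a random-walk Metropolis sampler (one common data assimilation technique, not necessarily the one used in the studies cited here). One pool-based datum (the stock, X = I·τ) and one flux-based datum (the efflux, equal to I at steady state) jointly constrain the two synthetic parameters, C influx I and residence time τ. All numbers and names are illustrative.

```python
import math
import random

# Hypothetical steady-state observations for a one-pool model:
# pool-based stock X = I * tau and flux-based efflux R = I.
OBS_STOCK, SIGMA_STOCK = 10.0, 0.5
OBS_FLUX, SIGMA_FLUX = 0.5, 0.05
BOUNDS = ((0.01, 5.0), (1.0, 200.0))  # uniform prior ranges for I and tau


def log_likelihood(influx, tau):
    """Gaussian log-likelihood of both data streams (constant terms dropped)."""
    stock_resid = (influx * tau - OBS_STOCK) / SIGMA_STOCK
    flux_resid = (influx - OBS_FLUX) / SIGMA_FLUX
    return -0.5 * (stock_resid ** 2 + flux_resid ** 2)


def metropolis(n_iter=50000, step=(0.02, 1.0), start=(1.0, 50.0), seed=42):
    """Random-walk Metropolis sampler; returns the second half of the chain."""
    rng = random.Random(seed)
    current, current_ll = start, log_likelihood(*start)
    chain = []
    for _ in range(n_iter):
        proposal = tuple(c + rng.gauss(0.0, s) for c, s in zip(current, step))
        in_prior = all(lo <= p <= hi for p, (lo, hi) in zip(proposal, BOUNDS))
        if in_prior:  # proposals outside the uniform prior are rejected
            proposal_ll = log_likelihood(*proposal)
            # Accept with probability min(1, exp(proposal_ll - current_ll))
            if proposal_ll >= current_ll or rng.random() < math.exp(proposal_ll - current_ll):
                current, current_ll = proposal, proposal_ll
        chain.append(current)
    return chain[n_iter // 2:]  # discard the first half as burn-in


posterior = metropolis()
mean_influx = sum(p[0] for p in posterior) / len(posterior)
mean_tau = sum(p[1] for p in posterior) / len(posterior)
print(f"posterior means: I = {mean_influx:.2f}, tau = {mean_tau:.1f}")
```

The posterior means should land close to I ≈ 0.5 and τ ≈ 20; dropping the flux term from `log_likelihood` would instead leave a long ridge of equally likely (I, τ) pairs.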
3.3 Future Research Directions
Data assimilation is among the highest priorities for near-term research to reduce the systematic biases (Figures 3 and 4) that pervade almost all soil C models. As a cornerstone of data assimilation, research efforts are needed to develop diverse, high-quality data sets capable of effectively constraining parameters in soil C models (Table 1). Parameter calibration through data assimilation with common high-quality databases is expected to be especially effective when inputs, external forcings, and parameters can be constrained simultaneously.
Parameter calibration for ESMs with multiple data streams at the global scale must tackle several challenges: the compatibility of multiple heterogeneous data sets that constrain different model aspects across a wide range of temporal and spatial scales, the intractable structural complexity of large models, equifinality in model structure selection and parameter estimation, and the computational demand of global optimization with complex models. Tackling these challenges effectively will require more innovative approaches developed through multidisciplinary collaboration with mathematicians, statisticians, and computer scientists. One example is the semianalytic spin-up method developed by Xia et al. [2012], which greatly reduces the computing time for global parameter estimation. The traceability framework developed by Xia et al. [2013] can help isolate various components and develop high-fidelity emulators of complex C models for data assimilation [Hararuk et al., 2014].
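The idea behind a semianalytic spin-up can be sketched for a linear pool model written in matrix form, dX/dt = B·u − A·K·X: rather than integrating the model for thousands of simulated years until slow pools equilibrate, the steady state X = (A·K)⁻¹·B·u is obtained from a single linear solve. The three-pool structure, rates, and transfer fractions below are hypothetical, and this is a simplified illustration of the principle, not a reimplementation of the Xia et al. [2012] method for full ESMs.

```python
# Hypothetical three-pool model (litter, fast SOC, slow SOC) in matrix form:
#   dX/dt = B*u - A*K*X
K = [1.0, 0.1, 0.005]            # turnover rates (yr^-1)
A = [[1.0, 0.0, 0.0],            # litter
     [-0.4, 1.0, 0.0],           # 40% of decomposed litter C enters the fast pool
     [0.0, -0.2, 1.0]]           # 20% of decomposed fast-pool C enters the slow pool
BU = [0.5, 0.0, 0.0]             # all input (kg C m^-2 yr^-1) enters litter


def ak_matrix():
    """A @ diag(K): scale column j of A by K[j]."""
    return [[A[i][j] * K[j] for j in range(3)] for i in range(3)]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def spin_up(years):
    """Brute-force spin-up: explicit yearly Euler steps from empty pools."""
    m = ak_matrix()
    x = [0.0, 0.0, 0.0]
    for _ in range(years):
        x = [x[i] + BU[i] - sum(m[i][j] * x[j] for j in range(3)) for i in range(3)]
    return x


steady = solve(ak_matrix(), BU)  # one linear solve: X_ss = (A K)^-1 B u
print([round(v, 6) for v in steady])         # [0.5, 2.0, 8.0]
print([round(v, 6) for v in spin_up(5000)])  # converges to the same stocks
print([round(v, 2) for v in spin_up(100)])   # slow pool still far from 8.0
```

The slow pool, with a 200-year turnover time, dominates the cost of brute-force spin-up; in this toy case a century of iteration leaves it at less than half of its equilibrium value, whereas the linear solve is immediate.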
While data assimilation is expected to reduce model biases, parameterization also needs to represent the uncertainty arising from subgrid heterogeneity. Many of the dominant processes incorporated in ESMs vary strongly in space and time at scales much finer than any ESM can resolve. These highly variable processes must be parameterized using the variables represented in ESMs, together with ancillary data on boundary conditions such as soil and vegetation properties, sometimes through stochastic upscaling models. This is an important issue that the soil modeling community needs to address.
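One reason subgrid variability matters is that many process responses are nonlinear, so a rate evaluated at the grid-cell mean of a driver differs from the mean of the rates across the cell (Jensen's inequality). The sketch below assumes a hypothetical Q10-type temperature response and a normally distributed subgrid temperature field; the parameter values and function names are illustrative only.

```python
import random

Q10 = 2.0      # hypothetical temperature sensitivity of decomposition
K_REF = 0.02   # hypothetical decomposition rate at T_REF (yr^-1)
T_REF = 15.0   # reference temperature (deg C)


def decomposition_rate(temp):
    """Q10-type temperature response of a first-order decay rate."""
    return K_REF * Q10 ** ((temp - T_REF) / 10.0)


def aggregation_bias(mean_temp=15.0, sd_temp=5.0, n=100_000, seed=0):
    """Ratio of the subgrid-averaged rate to the rate at the grid-mean temperature."""
    rng = random.Random(seed)
    temps = [rng.gauss(mean_temp, sd_temp) for _ in range(n)]
    rate_at_mean = decomposition_rate(sum(temps) / n)  # what a coarse cell "sees"
    mean_of_rates = sum(decomposition_rate(t) for t in temps) / n
    return mean_of_rates / rate_at_mean


print(f"{aggregation_bias():.3f}")  # > 1 because the response is convex
```

With a 5 °C subgrid standard deviation, the cell-mean rate underestimates the averaged rate by roughly 6% in this setup; the bias vanishes when the subgrid spread is zero and grows with it.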
4 External Variables to Soil C Cycling
4.1 Representation of External Variables in ESMs
Soil C dynamics are subject to changes in external variables. The external variables that influence soil C dynamics include climate conditions (e.g., temperature and precipitation), edaphic conditions (e.g., soil texture, mineralogy, and soil depth), soil thermal conditions, hydrological conditions (soil moisture, water table in wetlands, frozen versus liquid versus vapor state in seasonally and perennially frozen soils), oxygen and nutrient levels (e.g., redox state and N and phosphorus availability), and vegetation characteristics (e.g., rooting depth and litter types) (Figure 2). Those external variables regulate various aspects of soil C dynamics, cause spatial or temporal variability, and need to be appropriately represented in models.
External variables are represented in ESMs in at least three different ways: (1) parameters that do not evolve over time but directly control the system dynamics, (2) boundary conditions that evolve over time as forcing but are not part of the system being modeled, and (3) prognostic variables that evolve over time as part of the modeled system. Prognostic soil environmental variables are traditionally considered part of the modeled system, but in this paper they are treated as external to SOC processes to facilitate analysis of SOC modeling results. Indeed, isolating endogenous processes from exogenous variables helps not only to diagnose the causes of model uncertainty but also to understand fundamental properties of the terrestrial C cycle [Luo et al., 2015].
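The three categories can be made concrete with a minimal sketch in which the same external variable, temperature, drives a hypothetical one-pool soil model, dX/dt = I − k(T)·X, in each of the three ways. The Q10 response, the warming rate, and the 5-year relaxation of soil temperature toward air temperature are all invented for illustration; in a fully coupled ESM, air temperature would itself be prognostic and could respond to the simulated C fluxes, which this sketch omits.

```python
Q10, K_REF, T_REF = 2.0, 0.05, 10.0   # hypothetical decomposition parameters
INFLUX = 0.5                          # hypothetical C input (kg C m^-2 yr^-1)
WARMING = 0.03                        # hypothetical air warming rate (deg C yr^-1)
SOIL_LAG = 5.0                        # hypothetical soil-temperature lag (yr)


def decay_rate(temp):
    """Q10-type first-order decay rate (yr^-1)."""
    return K_REF * Q10 ** ((temp - T_REF) / 10.0)


def run(mode, years=100):
    """Yearly steps of dX/dt = I - k(T) X, starting from steady state at T_REF.

    mode: "parameter"  - T is a fixed constant
          "boundary"   - T is read from a prescribed forcing series (air T)
          "prognostic" - soil T is a state variable updated with the model
    """
    stock, soil_temp = INFLUX / K_REF, T_REF
    for year in range(years):
        air_temp = T_REF + WARMING * year  # prescribed forcing series
        if mode == "parameter":
            temp = T_REF
        elif mode == "boundary":
            temp = air_temp
        elif mode == "prognostic":
            soil_temp += (air_temp - soil_temp) / SOIL_LAG  # relaxes toward air T
            temp = soil_temp
        else:
            raise ValueError(f"unknown mode: {mode}")
        stock += INFLUX - decay_rate(temp) * stock
    return stock


for mode in ("parameter", "boundary", "prognostic"):
    print(mode, round(run(mode), 2))
```

Even in this toy model the three representations yield different end-of-run stocks (constant, strongest loss, and intermediate loss, respectively), which is the kind of difference that tracing the representation of each external variable is meant to expose.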
Clay content and mineralogy, which influence soil C stabilization and decomposition, are parameters that are usually assigned from observations. Regional and global data sets can provide information on these physical properties (soil texture and mineralogy) [Gulde et al., 2008; Feng et al., 2009; Journet et al., 2014]. Soil depth, which is required for multilayer soil models, is a forcing parameter derived directly from measurements [Journet et al., 2014]. Vegetation and land use in stand-alone, site-specific models are usually set as parameters that control the system dynamics.
Many forcing variables are represented as boundary conditions to the land components of ESMs. For example, the prescribed atmospheric CO2 concentration, temperature, and precipitation used to drive global land models are external forcing variables. A global N deposition model product from 1860 to 2100 offers dynamic forcing variables for coupled C-N models [Lamarque et al., 2010]. Simulating the horizontal and vertical movements caused by soil erosion at regional and global scales will require a global net soil redistribution map [Chappell et al., 2014] to provide boundary conditions for global land models.
Many of those climate, edaphic, hydrological, and vegetation variables are exogenous to soil C processes but simulated in coupled models. For example, a coupled C-N model simulates not only N influences on C processes such as photosynthesis, plant C allocation, and litter decomposition but also N dynamics as influenced by C cycle processes, such as plant N uptake, N fixation, microbial N immobilization, and denitrification. To realistically represent N influences on the C cycle, the models have to accurately simulate both N processes and responses of C cycling to N. Temperature, precipitation, and atmospheric CO2 concentration are boundary conditions in global land models but evolve over time as prognostic variables in the coupled climate-C models. As the domain of the modeling problem expands from just the soil to entire ecosystems, to the land surface, and to the Earth system with coupled land-atmosphere-ocean dynamics, more and more exogenous variables to the SOC dynamics are included in models as prognostic variables to reflect real interactions among system components.
Overall, external variables are represented in very different ways in different ESMs. Tracing how each exogenous variable is represented in each ESM, whether as a parameter, a boundary condition, or a prognostic variable, is critical to understanding the nature of model uncertainty and to disentangling its sources.
4.2 Recent Advances
The relative contributions of external variables to uncertainty in land C modeling have recently been quantified. Ahlström et al. [2013] investigated the potential sensitivity of the global terrestrial ecosystem C balance to the climate forcing generated by four general circulation models (GCMs) under three different CO2 concentration scenarios. Variations in climate variables (e.g., temperature, precipitation, and shortwave radiation) among GCMs explained the majority of the uncertainty in the future evolution of global terrestrial ecosystem C. Studies with Dynamic Global Vegetation Models suggest that the uncertainty in total terrestrial C storage caused by differences in climate variables among GCMs is comparable to the uncertainty caused by the responses of C cycle components [Berthelot et al., 2005; Schaphoff et al., 2006; Scholze et al., 2006; Ahlström et al., 2012, 2013].
To disentangle how the representations of external variables influence simulated C dynamics in ESMs, Xia et al. [2013] developed a traceability framework that decomposes the complex terrestrial C cycle into a few traceable components (Figure 5). The traceability analysis helps identify sources of uncertainty in modeled steady state ecosystem C storage due to (1) C input as affected by phenology, physiology, and C use efficiency [Xia et al., 2015], (2) edaphic and vegetation characteristics as related to baseline C residence time, (3) climate scalars, and (4) environmental variables among models. The traceability framework has been applied to assess the influences of external variables represented as parameters, boundary conditions, and prognostic variables in models.
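A minimal sketch of the traceability logic follows, assuming a simplified scalar version of the decomposition: steady-state storage capacity equals C input (NPP) times ecosystem residence time, and residence time equals a baseline residence time divided by temperature and moisture scalars. The full framework works with pool-specific matrices, and the class name, function name, and both example "models" here are hypothetical. Because the components multiply, the log ratio of two models' storage splits exactly into additive contributions.

```python
import math
from dataclasses import dataclass


@dataclass
class TraceableModel:
    """Simplified traceability components (one scalar per environmental factor)."""
    npp: float            # C input (kg C m^-2 yr^-1)
    baseline_tau: float   # residence time under reference conditions (yr)
    xi_temp: float        # temperature scalar (dimensionless, 1 = reference)
    xi_moist: float       # moisture scalar (dimensionless, 1 = reference)

    @property
    def tau(self) -> float:
        # Scalars > 1 accelerate decomposition and shorten residence time
        return self.baseline_tau / (self.xi_temp * self.xi_moist)

    @property
    def storage(self) -> float:
        # Steady-state storage capacity = C input x residence time
        return self.npp * self.tau


def attribute(a: TraceableModel, b: TraceableModel) -> dict:
    """Additive split of ln(storage_a / storage_b) into traceable components."""
    return {
        "npp": math.log(a.npp / b.npp),
        "baseline_tau": math.log(a.baseline_tau / b.baseline_tau),
        "xi_temp": -math.log(a.xi_temp / b.xi_temp),
        "xi_moist": -math.log(a.xi_moist / b.xi_moist),
    }


model_a = TraceableModel(npp=0.6, baseline_tau=30.0, xi_temp=1.2, xi_moist=0.8)
model_b = TraceableModel(npp=0.5, baseline_tau=25.0, xi_temp=1.0, xi_moist=0.9)
parts = attribute(model_a, model_b)
for name, value in parts.items():
    print(f"{name:>12}: {value:+.3f}")
print(f"{'total':>12}: {sum(parts.values()):+.3f} = ln({model_a.storage / model_b.storage:.3f})")
```

In this invented pair, the higher NPP and longer baseline residence time of model A are partly offset by its warmer temperature scalar, and the drier moisture scalar adds storage; each term can then be traced back to the parameter, forcing, or response function that produced it.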

Differences in the values of external variables set as parameters can be a major source of uncertainty in modeling the C cycle. Driven with similar climate data, the CLM3.5 model simulated a ~31% larger C storage capacity than the Australian Community Atmosphere Biosphere Land Exchange (CABLE) model [Rafique et al., 2014]. According to the traceability analysis, the projected difference in C storage between the two models results from differences in NPP, residence time, or both. CLM3.5 simulated 37% higher NPP than CABLE because of higher carboxylation rates in CLM3.5. On the other hand, the residence time of ecosystem C was 11 years longer in CABLE than in CLM3.5. The difference in residence time is mainly caused by a longer baseline residence time of woody biomass (23 years in CABLE versus 14 years in CLM3.5) and a higher proportion of NPP allocated to woody biomass (23% in CABLE versus 16% in CLM3.5).
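Because storage capacity is the product of NPP and residence time, the two reported percentages can be combined with simple arithmetic. Taking the rounded figures at face value (an assumption, since the published values are approximate), the NPP difference alone would make CLM3.5 store 37% more C, and the shorter residence time in CLM3.5 must offset part of that to leave a ~31% difference:

```python
import math

storage_ratio = 1.31  # CLM3.5 / CABLE C storage capacity ("~31% larger")
npp_ratio = 1.37      # CLM3.5 / CABLE NPP ("37% higher")

# Storage = NPP x residence time, so the ratios multiply and the logs add:
#   ln(storage_ratio) = ln(npp_ratio) + ln(tau_ratio)
tau_ratio = storage_ratio / npp_ratio
npp_term, tau_term = math.log(npp_ratio), math.log(tau_ratio)

print(f"implied residence-time ratio CLM3.5/CABLE: {tau_ratio:.3f}")  # 0.956
print(f"log contributions: NPP {npp_term:+.3f}, residence time {tau_term:+.3f}")
```

On these rounded numbers, the implied ecosystem residence time in CLM3.5 is roughly 4% shorter than in CABLE, consistent in direction with the longer residence time reported for CABLE.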
A recent application of the traceability framework partitioned climate-induced uncertainties in soil C modeling into contributions from soil decomposition rates, NPP, and vegetation turnover [Ahlström et al., 2015]. The study used LPJ-GUESS, a global dynamic vegetation-ecosystem model with a detailed individual- and patch-based representation of vegetation structure, demography, and resource competition [Smith et al., 2001]. Changes in climate variables and CO2 concentrations from 13 CMIP5 climate or Earth system model simulations under RCP8.5 radiative forcing were used as boundary conditions for the LPJ-GUESS simulations. The 13 climate change scenarios caused uncertainties in modeled global C storage through their influences on NPP, vegetation dynamics and turnover, and soil decomposition rates. To quantify the relative contributions of climate-induced changes in those processes to the uncertainty in modeled C storage, an emulator was developed according to the traceability framework that reproduces the C flows and pools exactly as simulated by LPJ-GUESS. The traceability analysis indicated that NPP, vegetation turnover, and soil decomposition rates explained 49%, 17%, and 33%, respectively, of the uncertainty in modeled C storage.
When variables external to the soil C cycle evolve in the model as prognostic variables, their impacts on the C cycle are much more difficult to disentangle. Xia et al. [2013] applied the traceability framework (Figure 5) to analyze the impacts of N feedback on the C cycle. Incorporating N processes into the CABLE model decreased C storage in all biomes through decreased NPP, shorter residence times, or both. The shorter residence times resulted from N-induced changes in C allocation among plant pools and in transfers from plant to litter and soil pools.
4.3 Future Research Directions
To understand model-model differences, it is essential to trace which external variables are represented in models and how. The impacts of external variables on the modeled C cycle are complex for at least three reasons: (1) different sets of external variables are incorporated into individual ESMs, (2) the same set of external variables is represented in different ways, as parameters, boundary conditions, or prognostic variables, in different ESMs, and (3) different response functions link external variables to C cycle processes. The diverse ways of representing external variables contribute substantially to differences in modeled soil C dynamics in any model intercomparison project.
It is also essential to extend the traceability framework to analyze the transient dynamics of the C cycle under climate change. The traceability framework developed by Xia et al. [2013] can only be applied to steady state C cycle analysis, whereas C cycle modeling is primarily intended to study the responses of ecosystems to climate change. Thus it is critical to trace how different sets of external variables and their diverse representations affect the transient dynamics of the C cycle under climate change.
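Why a steady-state decomposition falls short in transient runs can be seen by rearranging the one-pool balance dX/dt = I − X/τ into X = I·τ − τ·dX/dt: the stock equals the steady-state product I·τ only when dX/dt = 0. The sketch below assumes a hypothetical one-pool model with an exponentially rising input (standing in for, e.g., increasing NPP) and records both terms; the growth rate, residence time, and function name are illustrative.

```python
import math


def simulate(years=200, tau=25.0, influx0=0.5, growth=0.005, dt=0.1):
    """One-pool model dX/dt = I(t) - X / tau driven by exponentially rising input.

    Records (t, X, I*tau, tau*dX/dt) at each step; starts from steady state.
    """
    stock = influx0 * tau
    records = []
    for step in range(round(years / dt)):
        t = step * dt
        influx = influx0 * math.exp(growth * t)
        dxdt = influx - stock / tau
        records.append((t, stock, influx * tau, tau * dxdt))
        stock += dt * dxdt  # explicit Euler step
    return records


t, stock, capacity, disequilibrium = simulate()[-1]
print(f"t = {t:.0f} yr: X = {stock:.2f}, I*tau = {capacity:.2f}, tau*dX/dt = {disequilibrium:.2f}")
```

With steadily rising input, the stock lags behind I·τ; in this configuration the steady-state product overestimates the simulated stock by about g·τ = 12.5%, a disequilibrium term that a transient traceability analysis would need to track explicitly.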
5 Concluding Remarks
Recent analyses of CMIP5 results have revealed enormous differences in SOC projections among ESMs. This paper attempts to identify the causes of these model-model differences in three components of modeling: structure, parameterization, and external forcing. The current generation of ESMs shares a similar formulation for soil C processes (i.e., donor pool-dominated, first-order C transfers among multiple pools). This formulation is consistent with fundamental properties of the terrestrial C cycle and captures the macroscopic patterns observed in litter decomposition and soil incubation studies. The vast body of available data needs to be synthesized to examine whether the model formulation, or merely its parameters, should vary with microbial processes, soil depth, nutrient availability, and disturbances, among many other processes and factors.
Incomplete use of observations in model parameterization is a major cause of model-model differences. Of the two synthetic parameters that determine soil C storage, contemporary C influx differed by threefold and residence times differed by fourfold among CMIP5 models. It is conceivable that optimized calibration of model parameters with common databases through data assimilation could substantially reduce systematic biases among models, especially if inputs and external forcings are also simultaneously constrained by common protocols. To achieve this, we need to improve the availability and use of global databases, develop C cycle data assimilation systems that can effectively assimilate both flux- and pool-based data sets into global C cycle models, and understand subgrid variability of model parameters.
Individual ESMs not only include different sets of external variables that are exogenous to soil C cycles but also represent them in at least three different ways (i.e., as parameters, boundary conditions, and prognostic variables) and with different response functions. The diverse representations of external variables contribute markedly to the differences in modeled soil C stocks and dynamics. In the next few years, we should expand the list of output variables from ESMs to permit more comprehensive model evaluations, such as traceability analysis, that attribute model differences to various causes. The long-term goal is to develop an evaluation-improvement system that allows fast feedback between performance evaluation and model development toward realistic representations of ecosystem C cycle responses to climate change.
Acknowledgments
The paper stemmed from a workshop, “Representing soil carbon dynamics in global land models to improve future IPCC assessments,” held at Breckenridge, CO, USA on 12–14 June 2014. The workshop was financially supported by the United States National Science Foundation Research Coordination Network (RCN) grant DEB 0840964 and Department of Energy grant DE-SC0008270. Y.L. was financially supported by U.S. Department of Energy grants DE-SC0006982, DE-SC0008270, DE-SC0014062, DE-SC0004601, and DE-SC0010715 and U.S. National Science Foundation (NSF) grants DBI 0850290, EPS 0919466, DEB 0840964, and EF 1137293; N.C. by Project UID/AMB/04085/2013; CDJ by the Joint DECC/Defra Met Office Hadley Centre Climate Program (GA01101). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. Please contact the corresponding author for details of the data used in this work.