Representing Grasslands Using Dynamic Prognostic Phenology Based on Biological Growth Stages: Part 2. Carbon Cycling

Grasslands are one of the most widely distributed and abundant vegetation types globally, and land surface models struggle to accurately simulate grassland carbon dioxide, energy, and water fluxes. Here we hypothesize that this is due to land surface models having difficulties in reproducing grassland phenology, in particular in response to the seasonal and interannual variability of precipitation. Using leaf area index (LAI), net primary productivity, and flux data at 55 sites spanning climate zones, the aim of this study is to evaluate a novel prognostic phenology model (Simple Biosphere Model, SiB4) while simultaneously illustrating grassland relationships across precipitation gradients. Evaluating from 2000 to 2014, SiB4 predicts daily LAI, carbon, and energy fluxes with root‐mean‐square errors < 15% and individual biases <10%; however, not including management likely reduces its performance. Grassland mean annual LAI increases linearly with mean annual precipitation, with both SiB4 and the Moderate Resolution Imaging Spectroradiometer (MODIS) showing a 0.13 increase in LAI per 100‐mm increase in precipitation. Both gross primary production and ecosystem respiration increase with growing season length by ∼8.5 g C m−2 per day, with SiB4 and Fluxnet estimates within 18%. Despite differences in mean annual precipitation and growing season length, all grassland sites shift to seasonal carbon sinks one month prior to peak uptake. During a U.S. drought, MODIS and SiB4 had nearly identical LAI responses, and the LAI change due to drought was less than the LAI change across the precipitation gradient, indicating that grassland drought response is not as strong as the overlying climate response.


Introduction
Grasslands are one of the most widely distributed and abundant types of vegetation globally. These ecosystems cover 40% of the non-ice land area while existing in a wide variety of ecoregions on all continents except Antarctica (Suttie et al., 2005;White et al., 2000), and they account for slightly over one third of the terrestrial gross primary productivity (GPP) (Beer et al., 2010;Suttie et al., 2005;White et al., 2000). Grassland species grow across a broad range of climatic conditions, from desert regions that receive less than 250 mm of annual precipitation (e.g., Gremer et al., 2015) to regions that receive more than 2,000 mm (e.g., Wolf et al., 2011). Grassland species are very hardy, living in locations where the mean annual temperature (MAT) is  Beringer, Hutley, et al. (2016), Beringer, Mchugh, et al. (2017), and Isaac et al. (2017) AU-Stp −17.20 133.35 26.0 640 M 08-13 Beringer (2013b), Beringer, Hutley, et al. (2016), Beringer, Mchugh, et al. (2017), and Isaac et al. (2017) Lauenroth and Sala (1992), Milchunas and Lauenroth (2001), and Uresk et al. (2015, DOI:10.3334 Baldocchi (2000), Ma et al. (2007), and Grant et al. (2012, DOI:10.17190 Scholes et al. (2001) and Archibald et al. (2009) Note. The first three columns show the site name, latitude, and longitude. Site name is colored by grassland type: blue for C3 arctic, green for C3 nonarctic, orange for mixed C3/C4, and red for C4. The fourth and fifth columns show the mean annual temperature ( • C) and the mean annual precipitation (mm). The sixth column shows an M if MODIS LAI data are available (* for adjusted sites). The seventh and eighth columns show FLUXNET (black) and LTER (purple) data information: the last two digits of the starting/ending years and references, respectively. state without using a GSI is an approach by Caldararu et al. (2014), where vegetation states are determined using a mechanistic model with the premise that plant phenology is based on an optimal strategy for carbon gain. SiB4 combines these approaches to continually predict the vegetation state using growth stages (phenophases) determined from climate, weather, and environmental relationships to dictate carbon allocation, leaf photosynthetic capacity, and litterfall.
The focus of this study is twofold. First is to evaluate the performance of the grassland phenology in SiB4, which was described in detail in Haynes et al. (2019). SiB4 introduces an innovative approach to use commonalities that exist between grasslands growing in vastly different climates to model this vegetation type in an effective manner for site, regional and global studies. Work has been ongoing to generalize grassland behavior, in particular in response to rainfall, light, and fire (e.g., Gilmanov et al., 2010;Knapp et al., 2004;Koerner & Collins, 2014), and the SiB4 modeling approach expanded on these findings to develop a modeling strategy that takes advantage of consistencies in grassland behavior. This study assesses the performance of SiB4 in predicting grassland behavior across various climates and temporal scales, and the ability of SiB4 to reproduce the grassland relationships seen in this data set provides validation for this new modeling approach.
The second goal of this study is to investigate grassland behavior across precipitation gradients and to examine the impact of drought on LAI using a rich data set. Despite a substantial body of literature studying grasslands, there have been few worldwide grassland studies combining information from both flux and productivity data. Gilmanov et al. (2010) presented worldwide grassland productivity relationships with precipitation and radiation; however, they only used flux data. An additional study by Gilmanov et al. (2005) used both flux and satellite vegetation data to extrapolate grassland behavior; however, this was done only for the Great Plains region in the United States. Similarly, studies have used North American and South African grassland sites to learn about grassland consistencies between vastly different locations (Knapp et al., , 2006Koerner & Collins, 2014); however, these studies only used two sites. This study uses flux towers, in situ measurements, and satellite data, compiling three different data sets containing grassland carbon and energy metrics at 55 global sites spanning climate zones.

Sites
The grassland sites investigated in this study are listed in Table 1 and shown in Figure 1. We have been able to include sites from all continents where grasslands exist. The sites span climate zones, with site conditions ranging from MATs of −13 to 29.7 • C and mean annual precipitation (MAP) of 85 to 2,272 mm.
For fluxes, this study uses eddy covariance measurements acquired by the Fluxnet community at 34 sites (Table 2). We evaluated NEE, GPP, RE, latent heat flux (LE) and sensible heat flux (SH). When available, we compare SiB4 to Fluxnet-derived estimates of GPP and RE, even though this means comparing two modeled quantities. For sites providing multiple estimates of NEE, GPP, and RE, we selected the method closest to capturing the annual mean GPP across all calculations and years at each site (for details, see supporting information section S1). FLUXNET2015 provided filled data, and for these sites we used all the data with the selected method. Sites from the other networks were not filled, and the specific handling of the missing data is described in section 2.3.
For aboveground biomass, this study uses 8-day leaf area index (LAI) available from 2000-2014 at 48 grassland sites from the Moderate Resolution Imaging Spectroradiometer (MODIS) Land Product Subsets Project (ORNL DAAC, 2008a, 2008b. These MODIS LAI data are 8-day composites that have been specifically subset for predefined areas of ∼8 × 8 km centered on the towers. In doing this, the algorithm selects the optimal pixels from all acquisitions during the 8-day period (Knyazikhin et al., 1999). To capture phenology, seasonality, and interannual variability consistently across our broad range of sites, we use MODIS 8-day samples of LAI, rather than the more sparse site-level data.
Since the preprocessed MODIS subsets are designed to represent specific sites, we use LAI data for all of their available grassland sites and do not attempt to process global data at the tower sites that are not available. Even with using the customized site product, we found that a few of the sites include a significant fraction of nongrass vegetation. To prevent a potential bias from the influence of evergreen vegetation, we examined the satellite images of the MODIS subsets. If a substantial fraction of the preprocessed MODIS subset area is covered by forested vegetation and if there appears to be a constant offset in LAI likely due to this coverage, we adjust the data by subtracting a constant, site-specific offset. After close inspection, we adjusted the MODIS LAI at 5 of the 48 sites. This offset is set to the minimum nongrowing season LAI, shifting the time series to minimal values when the grasslands are dormant, thus removing the influence of any interfering vegetation. Although evergreen vegetation grows during the growing season, since they retain their leaves all year, these changes occur slowly and are minimal compared to the large seasonality in the surrounding grasslands, justifying the use of a constant offset.
For annual productivity, this study uses annual aboveground and belowground net primary productivity (ANPP and BNPP) collected from five Long-Term Ecological Research (LTER) sites. All five sites are located in the United States, but they cover a diverse set of grasslands, from alpine (US-NRM) to desert (US-Seg and US-Jrn) to short-grass steppe (US-CPE) to tallgrass prairie (US-Kon). Information regarding the years available at each site along with site reference and description information is provided in purple in Table 1.

The Simple Biosphere Model (SiB4)
SiB4 is a mechanistic, self-consistent, prognostic land surface model that integrates heterogeneous land cover, environmentally responsive prognostic phenology, dynamic carbon allocation, and carbon pools that cascade from live biomass to surface litter to soil organic matter (see Haynes et al., 2019 for details). In SiB4, 10-min carbon, energy, water, and vegetation predictions arise from functional relationships that hold across climate zones and vegetation type. Phenology stage, determined daily, links carbon fluxes to carbon pools by specifying the allocation fractions and maximum photosynthetic rate. SiB4 uses five stages to represent grasslands as they develop and mature through the growing season: (1) leaf-out, (2) growth, (3) maturity, (4) stress/senescence, and (5) dormant. The timing and length of each stage is not prescribed, but rather is dynamic and diagnosed using three elements: (1) growth index, (2) environmental potential, and (3) daylength potential. Large values indicate high potential, with decreasing potential as the values become smaller. A phenological index (PI) is created by multiplying these together, and PI is high when it is beneficial to grow new leaves and when the environmental conditions support photosynthesis (leaf-out and growth stages). In contrast, PI is low when there is no benefit in growing new leaves or when conditions are not suitable for photosynthesis (stress/senescence and dormant stages). To distinguish between the phenology stages, SiB4 applies prescribed threshold values to PI.
Once the phenology stage is determined, carbon is allocated to the pools and they are updated. Using the newly calculated vegetation state, SiB4 calculates radiation, hydrology, and land-atmosphere exchanges as in previous versions: Farquhar carbon assimilation rates (Farquhar et al., 1980); Ball-Berry stomatal conductance (Collatz et al., 1991; prognostic canopy air space temperature, moisture, and CO 2 (Baker et al., 2003); canopy, soil, and snow hydrology ; two-stream approximation radiative transfer (Sellers et al., 1986(Sellers et al., , 1996; and interactive carbon fluxes and pools (Haynes et al., 2019).
A novel element to this phenology approach is the growth index, which determines the benefit of growing new leaves versus allocating carbon belowground. When vegetation coverage is low, the growth potential is high as each additional leaf has a significant impact on the photosynthesis rate; however, as the canopy closes adding more leaves becomes less beneficial and the growth potential decreases. The LAI at which the growth potential begins decreasing depends on the 10-year running average precipitation rate, which dictates how much vegetation the climate can support. Both the daylength and environmental potentials are similar to those used in the GSI approach; however, the environmental potential incorporates humidity, temperature, and soil moisture.
This paper focuses on the three grassland plant functional types (PFTs): C3 tundra (C3A), C3 non-tundra (C3G), and C4 grassland (C4G). Mixed temperate C3 and C4 grasslands use the category CXG, where both C3 and C4 PFTs are simulated and combined using site-specific area-weighted contributions. At each site, the C3 and C4 contributions match the distribution growing at that location. All sites use PFT-specific parameters that are not tuned to any individual site. The parameter values were initially set to literature values and optimized in the course of this work to reach one set of values per PFT. By using common parameters, we hypothesize that this methodology is sufficiently applicable to all climates within each PFT to be useful for modeling land surface properties, ecological processes, and land-atmosphere exchanges around the globe.
All simulations in this study begin in January 1997 and run through December 2014. To provide consistency and compatibility across sites while testing the process interconnections controlling SiB4, the meteorological driver data and soil characteristics are from global data sets. For weather, this study uses hourly data (surface incident shortwave and longwave radiation; surface pressure, mixed-layer temperature, water vapor mixing ratio, and wind speed; and convective and large-scale precipitation) from the Modern-Era Retrospective Analysis for Research and Applications (MERRA), gridded at 0.5 • by 0.625 • (Rienecker et al., 2011). Following the procedure in Baker et al. (2010), to ensure realistic rainfall, each hourly convective and large-scale precipitation value is scaled equally such that the monthly total rainfall matches the monthly precipitation in the Global Precipitation Climatology Project (GPCP) Version 1.2 product (Huffman et al., 2001). The soil characteristics, including soil texture and reflectance, are specified from the International Geosphere-Biosphere Programme (IGBP) (Global Soil Data Task, 2000). The scale of the driver data is larger than the footprint of a flux tower; however, Sellers, Heiser, and Hall (1992) and Sellers, Berry, et al. (1992) showed theoretically and numerically that differences between sites are directly related to vegetation type and that model performance is effectively scale invariant with regard to the forcing meteorology, making comparisons of SiB4 predictions to tower observations robust.
Without a doubt, the grassland sites featured in this study have different land use histories, ecosystem states, and species-specific characteristics. If the goal of this study were to perform the best simulations at these sites, we would use site-specific driver data to capture local states and responses, improving model-observation agreement (and SiB4 is set up to incorporate such site-specific data easily). However, our larger goal is to test the extent to which SiB4 can simulate large-scale patterns from functional relationships. Using global data sets not only illustrates how well the processes included in SiB4 capture overarching grassland behaviors, but also provides estimates of likely errors for global simulations. Locations that perform poorly can be used to learn about important processes that are not included or whose modeling needs improvement.

10.1029/2018MS001541
To initialize the carbon pools, SiB4 calculates steady-state carbon pools using an accelerated equilibrium approach. Initializing with steady-state conditions implies mature ecosystems with no disturbances where growth balances decay (NEE ∼ 0) and is a typical assumption for biogeochemical models (Richardson et al., 2012;Schaefer et al., 2008;Xia et al., 2012). The model repeatedly simulates 1997-2014, and after each iteration the steady-state pool sizes are determined analytically by setting the time derivatives in the pool equations to 0 (Schaefer et al., 2008). Each subsequent 18-year cycle begins from the previous steady-state carbon pool sizes. Spin-up continues until the beginning, ending, and steady-state pools are within 1% of each other. By satisfying this requirement, the model will no longer reflect the arbitrary pool sizes from which it started and instead will represent the effects of climate and other time-variant forcing.
During the initialization process, SiB4 continuously updates the soil moisture and running-mean variables.
Since 10 years is typical for models to initialize soil moisture (Corbin et al., 2010;Wang et al., 2007), beginning each iteration with the previous soil moisture reservoirs ensures ample time for soil moisture to come to a state consistent with the driver data and model assumptions.

Evaluation and Analysis Procedures
For all results, SiB4 is sampled in the same manner as the observations. For LAI, we sampled SiB4 on the same days the MODIS data are reported (45 comparisons per year). Monthly and annual means are the means of the corresponding 8-day samples. At high-latitude sites with low solar zenith angles during the winter dormant months, MODIS has considerable missing data. To avoid biases in the annual means, we used a constant value to fill these missing values. Since non-missing values typically started and ended each growing season with an LAI of ∼0.15, we used this to fill the winter missing values. For Fluxnet, we compared same-day hourly values (8,760 comparisons per site year adjusted for leap years less any missing data that are not evaluated) and daily mean values (365 comparisons per site year). Monthly and annual means are the average of all nonmissing daily data for each time period. To provide an overview of the data, scatter plots of SiB4 against LAI and flux observations are shown in supporting information section S2.
For site-specific evaluations, we calculate four comparative statistics: root-mean-square errors (RMSEs), mean bias errors (MBEs), correlation coefficients (COR), and normalized standard deviations for the model (NSD, normalized by the observed SD). Using daily data, we computed one value per site per statistic for both the entire time period and the growing season only. To determine the growing season, we applied thresholds to the model and observations and evaluated all times when either was above the threshold in order to include mismatches in the timing of green-up and senescence. For NEE we used days below zero; for SH we used days above 0; for LE we used days above 10% of the site maximum; and for LAI and carbon fluxes we used days above 20% of the maximum. The results are compiled into box plots showing the distribution of values across sites. Correlations are significant at the 95% level if they are above 0.3 as calculated from the two-tailed t test. To obtain this for LAI, we used 45 degrees of freedom calculated from 3 degrees of freedom per year (green-up, growing season, and browning) over 15 years. For fluxes, since these measurements are reported hourly, we used 48 degrees of freedom calculated from 6 degrees of freedom per year (dormancy, leaf-out, early growing season, peak growing season, late growing season, and senescence) and assumed the data exist for an average of 8 years.
For diurnal cycle evaluations, we calculate the monthly mean NEE diurnal cycle across all years at each site.
To compare Northern Hemisphere and Southern Hemisphere sites, the months are shifted such that the month with the maximum NEE drawdown represents the peak growing season. The diurnal cycle analysis groups all sites together by grassland type, and the monthly NEE analysis shows the seasonal cycle for all sites ordered by increasing precipitation from top to bottom using the site MAP calculated from years with valid observations. Six sites have missing months, which are shown as 0 since they do not occur during the growing season.
To evaluate phenology, we used LAI to investigate four seasonal metrics: start of the growing season (SOS), day of maximum (DOM) to indicate the peak of the growing season, end of the growing season (EOS), and growing season length (GSL). We chose LAI for this analysis since it has the longest time record for the most sites, and we again applied a threshold to mean seasonal cycles for both MODIS and SiB4 LAI calculated using a spline fit to both data sets. SOS occurs on the day when LAI exceeds 20% of the maximum value, the DOM is the day of year when LAI is at its maximum value, EOS occurs once the LAI drops below 20% of the maximum site-specific value, and GSL is the number of days occurring between the SOS and the EOS. We chose the 20% threshold after performing sensitivity tests with thresholds ranging from 10-30%, choosing the value that best captured the predominant features while avoiding spurious events. Since this study includes worldwide locations with the SOS day-of-year (DOY) occurring throughout the entire year, sites that have the EOS DOY before the SOS DOY have been shifted by 180 days to align with the Northern Hemisphere growing season.
To evaluate annual productivity, we compared mean annual ANPP at five LTER sites and BNPP at two LTER sites. For each site, the NPP measurements for every year available between 2000 and 2014 were averaged to obtain one mean NPP value per site. If necessary, site productivity was converted from dry weight to carbon using the standard 0.5 conversion factor (Verchot et al., 2006). SiB4 productivity was calculated in the same way: annual productivity above and below ground was calculated per calendar year, and the same years measured at each LTER site were averaged to create the SiB4 mean annual NPP per site. We show the results using a bar graph, where whiskers show the measured and modeled range in annual productivity.
Large-scale climatological relationships are investigated using monthly and annual mean LAI and NEE calculated over the entire 15-year time period. We use monthly mean LAI and NEE to show the mean seasonal cycle for all sites. For every site, the multiyear monthly means are calculated and the timing is shifted such that month 0 has the highest LAI or maximum uptake for that site. To compile them into a single figure, the monthly values at each site represent a single row in a contour diagram where the sites are compiled together and ordered by increasing precipitation. We also analyze multiyear annual means by calculating each annual mean from matching 8-day and daily values and then averaging these annual means across all years. We use annual means rather than growing season means to be able to compare sites across climates without introducing biases due to the length of the growing season. To generalize these relationships, we fit both linear and third-order polynomials to the data. We report the coefficient of determination (r 2 ) value and slope of the line and show the fit to the third-order polynomial.
To analyze interannual variability, we investigated two severe droughts in the central North American grasslands from 2010-2012 using a subset of nine sites across the U.S. Great Plains (Table S2). The region is characterized by a west-east gradient in precipitation (∼200 to 1,000 mm year −1 ) and a north-south temperature gradient (∼6 to 21 • C). The large-scale environmental gradients across the grassland result in large-scale changes in vegetation, from deserts in the west to short grasslands in the central region, to tallgrass in the east. The north-south temperature gradient is correlated with a change in C3 and C4 coverage: C3 species dominate in the cool/dry portions and C4 species dominate in the warm/wet areas (Lauenroth et al., 1999).
The first drought occurred in the southern plains and began in late 2010 (Seager et al., 2014). The dry conditions, centered over Texas and extending into seven southern states, continued through the summer of 2011, breaking heat records and causing one of the most catastrophic short-term droughts in U.S. history (Nielsen-Gammon, 2011). The second drought occurred across the entire Great Plains region and was one of the most widespread and severe droughts in over one hundred years (Hoerlin et al., 2014). The precipitation deficits started to evolve in May across the Great Plains, eventually affecting more than half of the U.S., and by July 62% of the country experienced moderate to exceptional drought (National Climatic Data Center, 2013). During 2012, the annual mean temperature was 1.8 o C above average, with the warmest spring and second warmest summer in the period of 1895-2012 (Wolf et al., 2016). After peaking in July and August, the drought faded away gradually from September to October .
To evaluate grassland drought response, we investigated the 2010, 2011, and 2012 annual mean LAI and precipitation. To compare the response of LAI due to drought to the change in LAI across the climate gradient, for each site we regressed the mean annual total and drought LAI against the corresponding precipitation. For the sites affected by both the 2011 and 2012 droughts, the drought LAI and MAP was the mean between the two years. For the sites affected in 2012, the drought LAI and MAP were the 2012 values only.
To single out the response in individual years, we regressed the standardized anomalies in LAI against the standardized anomalies in precipitation.

Site Evaluation
To evaluate the performance of daily SiB4 predictions across all sites, Figure 2 shows box plots of RMSE, MBE, correlation, and NSD for LAI, carbon, and energy fluxes. These metrics are calculated using all non-missing data and growing season only. For LAI, the results are similar regardless of including dormancy. The median RMSE error between SiB4 and MODIS is 0.26 and all RMSE errors are < 0.72 (corresponding to normalized errors less than 30%). SiB4 does not have a prevalent LAI bias across grasslands, with a median MBE of 0.01. Both the upper and lower MBE quartiles lie within 0.1; however, two sites have negative biases > 0.2 and three sites repeatedly overestimate the LAI by more than 0.3, resulting in positive biases. All sites have statistically significant correlations ( Figure 2C), with a median correlation coefficient of 0.78 between SiB4 and MODIS LAI, and 90% of the sites have correlation coefficients > 0.5. The correlations drop when only the growing season is evaluated; however, all sites remain significantly correlated.
Comparing the observed and simulated standard deviations, SiB4 slightly underestimates the LAI variability such that the mean difference in spread is 0.04, and this underestimation increases to 0.08 for the growing season. All sites have a NSD between 0.45 and 1.6 ( Figure 2D). Low RMSE errors, negligible MBE over all years, high correlations, and NSD's near one indicates SiB4 is able to capture the predominant seasonal and interannual variability in LAI across sites, because large differences for any year would increase the RMSE to error values closer to observed values and consistent early or late phenology would lead to a larger MBE.
For GPP (second column in Figure 2), the mean RMSE between SiB4 and observed is 2.1 mol C m −2 s −1 , and this increases to 2.8 when evaluating the growing season only. The mean and median MBE are 0.14 and 0.03 mol C m −2 s −1 , respectively, indicating SiB4 does not have an overall bias in simulating GPP. All sites have significant correlations of GPP, with half of the sites having correlations > 0.77; however, the correlations drop by ∼0.18 for the growing season. The variability is similar between measured and simulated for both the entire time-series and for growing season only, as the mean NSD is within ∼0.05 of 1.
For RE, RMSE errors are ∼0.5 mol C m −2 s −1 smaller than GPP errors; however, the bias is larger, indicating that SiB4 overestimates the respiration at 75% of the sites, with mean and median biases of ∼0.4 mol C m −2 s −1 that are slightly larger for the growing season. This offset in RE is likely due to the initial pool values in SiB4 representing steady-state conditions while many of the sites are carbon sinks. All sites have correlation coefficients > 0.5, with half of the sites having RE correlations ≥ 0.74. SiB4 simulates the overall variability in RE well, with the NSD mean being 1.06 and the median being 0.98. It should be kept in mind that these exact differences between SiB4 and Fluxnet observations are subject to the choice of partitioning method at many of the sites; however, since we have selected methods that are near the average across all partitioning choices, it is expected that these results are not biased by using any particular methodology.
Looking at NEE, SiB4 underestimates the variability in NEE with a median NSD of 0.72 mol C m −2 s −1 . The median daily RMSE between SiB4 and observed across sites is 1.8 mol C m −2 s −1 , the median MBE is 0.24 mol C m −2 s −1 , and the median NSD is 0.72 mol C m −2 s −1 , which indicates that at nearly all sites SiB4 underestimates the summer drawdown of carbon dioxide from grasslands. The correlation coefficients for NEE are low, with 75% of the sites have non-significant correlations during the growing season. Because NEE is the difference between GPP and RE, the model error is at best difficult to interpret and at worst may actually be hidden in NEE evaluations due to compensating errors.
Energy flux evaluations for SiB4 compared to Fluxnet observations are shown in the last two columns of Figure 2. Compared to carbon fluxes, SiB4 does not capture the variability in energy fluxes as well, with both LE and SH having mean and median NSDs < 1. Evaluating throughout the year, daily RMSEs are all < 40 W m −2 for LE and < 32 W m −2 for SH (with one exception), and 75% of the sites have energy flux RMSEs <30 W m −2 . The errors for LE increase at all sites when the growing season is evaluated; however, the errors for SH do not systematically increase, but instead the mean and median remain constant while the errors increase at a few sites but also decrease at a few sites. SiB4 does not have a bias in LE but has a mean negative bias of ∼-8 W m −2 for SH. SiB4 has significant LE and SH correlations at all but three sites, with higher LE correlations.
To show how these errors arise, we assembled time series of modeled and observed 8-day LAI, daily energy and carbon component fluxes, and monthly NEE at five sites: an arctic grassland (RU-Ha1), a Mediterranean seasonal C3 grassland (US-Var), a temperate productive C3 grassland (AT-Neu), a C4 desert grassland (US-Wkg); and a C4 tallgrass prairie (US-Kon). We chose these sites because they sample the three different grassland PFTs; highlight different climates and grassland behaviors; and all have LAI data, multiyear Fluxnet records, and Fluxnet data that include the separation of NEE into the carbon components. Figures S8-S12 show these results, and section S3 provides a full discussion of these sites.

Annual Productivity
Mean annual NPP at five sites is shown in Figure 3. Four of the five sites have similar ANPP, despite different climates. Both US-Seg and US-Jrn are desert sites, while US-CPE is a temperate short-grass steppe and US-NRM is an alpine meadow. Compared to the low ANPP at these sites, the ANPP at US-Kon, a tallgrass prairie, is nearly 3 times more. SiB4 is able to simulate this difference, capturing both the mean ANPP and the range of ANPP within 0.3 Mg C ha −1 year −1 . BNPP is measured at two sites, and despite similar ANPP at these two sites, US-CPE has nearly twice as much BNPP as US-Seg. SiB4 again is able to predict this difference, capturing the mean and maximum annual NPP within 0.2 Mg C ha −1 year −1 .

Phenology
Seasonal metrics of LAI are shown in Figure 4. Comparing SiB4 against MODIS observations, correlations of all four phenology metrics (SOS, DOM, EOS, and GSL) are significant. SOS values are shown in Figure 4A. SiB4's correlation with MODIS is very tight, with an r 2 value of 0.95, and both MODIS and SiB4 show a general pattern of earlier SOS with increasing MAP. Most of the Northern Hemisphere temperate sites begin growing during their spring in April, and the Southern Hemisphere temperate sites follow this same pattern and begin growing in October (180 days shifted). At arctic sites, the SOS is shifted later and typically occurs in late May once the temperatures have warmed enough to melt the snow and support photosynthesis. The sub-Saharan sites begin growing when the rainy season begins, typically in late July.
The days of year when the LAI peaks are shown in Figure 4B. The correlation between MODIS and SiB4 DOM is less than that for SOS, with a value of 0.77. DOM ranges from January to September, with most sites peaking in June or July. SiB4 tends to model slightly early DOMs for this cluster of sites, which is what reduces the correlation and causes an offset in the linear fit. For the sub-Saharan, Southern Hemisphere, and Mediterranean sites, the DOM in SiB4 and in MODIS are within a few days of each other. At three sites (CH-Cha, RU-Tuv, and US-Arc), there is more than a 50-day difference in DOM between MODIS and SiB4, and all these sites have at least two peaks of LAI during the growing season. While SiB4 captures the multiple peaks, their magnitude and timing does not exactly match MODIS; and averaged over multiple years, these relatively minor discrepancies aggregate to create large differences in the mean DOM. Some of these discrepancies may be due to the 8-day temporal average rather than daily values, while some may be due to management practices and other factors that are not included in SiB4. Figure 4C. SiB4 does not predict browning-down as well as greening-up, with large discrepancies in EOS at numerous sites. The r 2 between SiB4 and MODIS is 0.77, and SiB4 tends to end the season later than MODIS, particularly for sites with MAP > 1,000 mm. Considered by PFT, the EOS for the C3A sites occurs in late September, with SiB4 and MODIS agreeing on the EOS within 8 days. The majority of the non-arctic sites have their EOS clustered at the end of the year, and timing differences between SiB4 and MODIS are greatest for C3G.

Site EOS comparisons and values are shown in
Due to the large span in climatic conditions, grasslands have growing seasons ranging from 70 days to over 300 days ( Figure 4D). Since GSL results from a combination of SOS and EOS timing, it is no surprise that the correlation of this phenology metric is lower. Additionally, due to less skill at predicting EOS timing, SiB4 has higher GSL variability. Arctic and sub-Saharan sites have the shortest growing seasons, being water and temperature limited, respectively. SiB4 and MODIS match GSL well at sites with minimal precipitation and short growing seasons; however, compared to MODIS, SiB4 overestimates GSL at sites with growing season lengths from 150 to 250 days. Despite this, both MODIS and SiB4 show that longer GSL coincides with higher MAP, and an analysis of GSL and temperature shows that GSL has a relationship with temperature using the site MAT difference from an optimal value (supporting information section S4).

Diurnal Variability
Carbon and energy fluxes have strong diurnal cycles over land, driven by incoming solar radiation (Betts, 2002). Plants take up carbon dioxide during the daytime yet respire both day and night. During the growing season, this causes net CO 2 uptake during the day and release overnight. As the growing season progresses, the amplitude of the diurnal cycle increases until the peak growing season, after which the diurnal cycle amplitude decreases to minimal values during dormancy. Figure 5 shows monthly mean diurnal cycles of NEE for the four grassland PFTs. Three months prior to peak uptake (top left panel), the growing season is just beginning for temperate sites; thus, the amplitude of the seasonal cycle is only ∼5 mol m −2 s −1 for C4 grasslands (red) and ∼2 mol m −2 s −1 for C3 grasslands (green). The cycle is negligible for arctic grasslands (blue) because their short growing season has not yet begun. Early in the growing season, SiB4 matches the amplitude of the seasonal cycle with the Fluxnet observations for C3 and C4 grassland sites, yet SiB4 overestimates the annual mean amplitude of the diurnal cycle at mixed grassland sites by ∼3 mol m −2 s −1 (yellow). The timing of the diurnal cycle for SiB4 closely matches the C3 grassland observations; however, SiB4 begins photosynthesis too late in the day at the C4 sites, as the uptake lags the observations by ∼2 hr throughout the morning.
As the growing season progresses, diurnal cycle amplitudes increase and differences in timing begin to emerge in both SiB4 and the observations. Two months prior to peak uptake, SiB4 overestimates the amplitudes for the mixed and C4 grasslands by ∼50% (∼4 and ∼5 mol m −2 s −1 , respectively). One month prior to peak uptake, SiB4 comes within 1.5 mol m −2 s −1 of the observed amplitudes for the mixed and C3 grasslands, while continuing to overestimate the diurnal cycle amplitude for C4 grasslands by ∼3 mol m −2 s −1 . With the amplitude increase throughout the growing season, these differences now represent errors of 12%, 15%, and 22% for C3, mixed, and C4 grasslands, respectively. Arctic C3 grasslands have a much longer duration of daytime uptake due to longer days, while C4 grasslands have less; and this pattern is seen in the SiB4 results and in the data. At the peak of the growing season (top right panel), the diurnal cycle amplitudes are the largest. Among the PFTs, C4 grasslands take up the most carbon during the day, with maximum net uptake rates of ∼14 mol m −2 s −1 at noon. Mixed and C3 grasslands take up less carbon during the day than C4 grasslands, and SiB4 underestimates the peak uptake rates for these PFTs by ∼2 mol m −2 s −1 (∼20%). As expected, tundra grasslands take up the least carbon during the day, with maximum uptake rates of only ∼3 mol m −2 s −1 for SiB4 and ∼4 mol m −2 s −1 in the observations. At night SiB4 overestimates the respiration for all PFTs compared to the observations by up to 15% (2 mol m −2 s −1 ). This could be because SiB4 is representing a mature ecosystem in long-term balance or possibly because nighttime flux data may be biased low despite u* filtering.
SiB4 captures the basic pattern in having the fewest hours of uptake for C4 grasslands and the most hours of uptake in the arctic. SiB4 over-accentuates this pattern, underestimating the length of daytime uptake for C4 grasslands by ∼3 hr and overestimating the length of daytime uptake for C3 grasslands by ∼1 hr. For C3 grasslands, the underestimation of maximum daytime uptake results in a U-shape diurnal cycle in SiB4, compared to the observed V-shape. This midday uptake underestimation is a classic model problem arising from the timing of radiation data and the treatment of within-canopy light extinction (Baker et al., 2003). Since SiB4 adjusts the maximum rubisco velocity (V max ) with phenology, underestimating V max during peak growing season could be contributing to the midday differences seen between the model and the observations.
Following the peak seasonal uptake (bottom row), SiB4 simulates the late summer decrease in uptake for C4 and C3 grasslands with errors throughout the day of < 20%. For mixed grasslands, SiB4 decreases the uptake throughout the afternoon more than seen in the observations, with differences up to 50% (∼5 mol m −2 s −1 ) indicating that the model becomes stressed too early in the day at these sites. Figure 5 illustrates that the diurnal cycle varies between grassland PFTs and evolves throughout the year. Although SiB4 has specific differences of up to 50% during some parts of the day compared to Fluxnet data, these differences occur in individual PFTs and months. Comparing across all PFTs, months, and times of day, SiB4 captures the general amplitude and timing patterns of the grassland diurnal cycles as they progress through the growing season.

Seasonal Patterns
The length of the growing season for grasslands ranges from a few months to year-round, and LSMs have a difficult time accurately simulating monthly carbon fluxes for these complex ecosystems; thus, we analyzed the mean seasonal cycle across all grassland sites to see if any patterns emerge across climate zones ( Figure 6). Looking at observed monthly mean LAI ( Figure 6A), desert sites have small seasonal amplitude in LAI and short growing seasons (e.g., the row labeled Seg near the top of the figure). As the precipitation increases, both the amplitude and length of the seasonal cycle increase (e.g., the row labeled Pol near the bottom). Despite these increases, the shape of the seasonal cycle primarily remains symmetrical around the month of maximum LAI, indicating that the length of time for green-up and brown-down are similar. While numerous sites experience strong seasonality, several of the temperate C3 and tropical C4 grassland sites have non-negligible LAI year-round. SiB4 appears to capture these dominant features in LAI ( Figure 6B), as the pattern appears similar to that seen in the observations. Seasonal cycles of NEE for all sites are shown in Figures 6C and 6D. A striking pattern emerges: all sites take up carbon (negative NEE) for approximately 3 months, and this drawdown begins 1 month before the maximum drawdown. From 2 to 6 months prior to maximum uptake, all sites except one are sources of carbon. This pattern remains despite changes in productivity and increases in annual precipitation. In general, drawdown continues for one month after maximum uptake; however, the wetter, more productive sites remain sinks of carbon for longer periods of time. SiB4 captures this overall behavior as demonstrated in Figure 6D; however, the magnitude of the growing season uptake is underestimated and at some sites the model shifts to a weak carbon sink one month earlier than observed.

Climatological Gradients
Grassland productivity is related to precipitation, and Figure 7A shows a strong linear relationship between annual mean LAI and MAP. Both observed and modeled correlations are tight and significant, with an observed r 2 of 0.75 and a simulated r 2 of 0.81. The observed and modeled slopes are nearly identical, indicating that for grasslands, a 100-mm increase in precipitation results in a 0.125 increase in LAI. Annual carbon fluxes are also related to annual precipitation; however, these relationships are weaker ( Figures 7C  and 7E). For carbon uptake, the observed r 2 is 0.71 and the slope is 1.06, indicating that for every 100 mm of precipitation, annual GPP will increase by 106 g C m −2 . Precipitation explains slightly more of the modeled GPP variability than observed, with an r 2 of 0.77. SiB4 also slightly overestimates the response of GPP to precipitation with a higher slope by 0.07. The relationship between RE and MAP is slightly weaker in the observations, with an r 2 of 0.67. The observed slope of the relationship between RE and precipitation is also slightly less than the GPP slope, at 0.92. As seen in GPP, the modeled relationship between RE and precipitation is stronger than observed, with an r 2 of 0.77 and a slope of 1.12. It was found that a third-order polynomial closely matches both the observed and simulated data, and using this relationship highlights regions of model-data mismatches. For GPP, the observed and modeled relationship is similar at sites with annual precipitation < 800 mm; however, for wetter sites, SiB4 overestimates the annual GPP, particularly at temperate C3 grasslands. For RE, SiB4 overestimates RE compared against the flux observations at all sites, with bigger differences as the site precipitation increases.
Over the past decades, studies have indicated that the Northern Hemisphere growing season duration has significantly increased due to both an earlier start and delayed ending (Jeong et al., 2011;Menzel et al., 2006;Myneni et al., 1997;Park et al., 2016). To investigate potential impacts this may have on grasslands and the carbon cycle, we investigated the relationships of LAI, GPP, and RE with GSL ( Figures 7B, 7D, and 7F). Opposite of precipitation, the relationship between LAI and GSL is weak, with an observed and simulated r 2 values of 0.42. Both slopes were similar, indicating that annual mean LAI increases 0.05 per day of growing season added; however, considerable site differences are seen in both the observations and the model.
GPP and RE both have stronger relationships with GSL. Fluxnet r 2 values for GPP and RE against GSL are identical at 0.72, and SiB4 r 2 values are slightly higher at ∼0.8. The slopes of the GPP and RE relationships against GSL is also similar, with values of ∼7.4 and ∼8.8 for Fluxnet and SiB4, respectively. The higher slopes for SiB4 results from the tendency of SiB4 to underestimate the carbon flux at sites with short growing seasons and overestimate the flux at sites with longer growing seasons. It is interesting to note that GSLs calculated with GPP differ substantially from GSLs calculated using RE. Using carbon fluxes to estimate growing season length dramatically shortens the growing season length at the desert site (US-Seg). Additionally, a shift toward longer GSLs in numerous C3G sites when using RE to determine the length can clearly be seen in both Fluxnet and SiB4.
To investigate the relationship between aboveground carbon biomass and carbon flux, Figure 8 shows GPP and RE plotted against LAI. Uptake is strongly related to LAI, both in the observations and in SiB4 (r 2 values of 0.73 and 0.78, respectively). Although the slope of the relationship is higher in the observations than in SiB4, both suggest that every 0.1 increase in annual mean LAI is associated with a ∼92 g C m −2 increase in GPP per year. SiB4 underestimates the carbon uptake at temperate grassland sites with annual mean LAI ≥ 1.3. For RE, the observed r 2 is slightly higher than for GPP, bringing it equal to the modeled r 2 . Both observations and SiB4 show that the slope of RE against LAI is very slightly lower than the slope of GPP, with SiB4 again underestimating the respiration for temperate C3 grassland sites.

Interannual Variability: Drought Response
Across the Great Plains of North America, there is a strong linear relationship between annual mean LAI and MAP ( Figure 9A, dashed lines), with r 2 values of 0.86 and 0.90 for MODIS and SiB4, respectively. Both MODIS and SiB4 have identical slopes showing that for a 100-mm increase in precipitation the annual mean LAI will increase by 0.13. This coincides with the well-accepted linear relationship between MAP and ANPP across the Great Plains (Huxman et al., 2004;Knapp & Smith, 2001;Sala et al., 1988), and the slope is nearly identical to the slope obtained using all 48 sites ( Figure 7A), indicating that this response is robust.
To compare the response of LAI due to drought to the change in LAI across the climate gradient, for each site we regressed the mean drought LAI against the corresponding drought MAP ( Figure 9A, sites with "-D," solid lines). The drought lowered the MAP and LAI at all sites. SiB4 captures the mean drought and nondrought LAI with differences from MODIS of < 0.2 across all sites. Fitting a line to the values, the relationship if strongly linear: MODIS has a slope of 0.0011 and an r 2 of 0.76, while SiB4 has a slope of 0.0012 and an r 2 value of 0.84. The difference in slope is less than 10%.
The slope for the drought LAI is lower than the slope calculated using the long-term mean values. This indicates that across all sites, the response to drought is less than the change in LAI due to climatological differences in precipitation. Written another way, for a given precipitation value, if that precipitation is the MAP then the LAI will be lower than if that precipitation is only a single-year anomaly from a climatologically higher MAP. This is consistent with other research suggesting grasslands have developed a tolerance to drought and may have mechanisms to provide a buffer against precipitation deficits (Knapp & Smith, 2001;Lei et al., 2016).
To calculate the net change in LAI due to the droughts, we plotted all the anomalies for 2010-2012 together and calculated separate liner fits to the positive and negative anomalies to isolate the LAI changes due to decreased precipitation ( Figure 9B). When MAP is more than average, SiB4 overestimates the LAI; however, during the drought years when the MAP is lower than average, MODIS and SiB4 have nearly identical best-fit lines with a slope of 0.84. The change in LAI due to drought is only weakly linear however, as the r 2 values are only 0.22 and 0.27, respectively. However, with correlation coefficients of 0.47 and 0.52 and assuming 27 degrees of freedom (one per site per year), this relationship is still significant at the 95% level using the standard two-tailed t test.
Walking through the drought progression, in 2010 all sites have MAP and LAI near or above the long-term mean ( Figure 9C). When the sites are hit by drought, the LAI is significantly lowered (Figures 9D and 9E). In 2011, the drought affected the southwestern and south-central sites (FR1, Arc, Wlr, and Seg), causing lower MAP and corresponding lower LAI. One exception to this is BKG, which had a lower MAP for the year but a higher mean LAI. This site is further north than the region impacted by the 2011 drought and has a strong seasonal cycle; thus, the reduced precipitation is not due to the drought and does not occur during the peak growing season to have an impact on LAI. The 2012 drought was widespread, and all of the sites had lower MAP than average. SiB4 predicted that all sites had lower LAI, while MODIS showed lower LAI for all sites except Arc and FR1. The 2012 drought was less intense at both these sites, and it could be the vegetation was actually recovering from the record moisture deficit and heat in 2011.

Discussion
Large-scale grassland behaviors emerge from combining LAI, flux, and NPP measurements. While each observation method alone is associated with shortcomings, combining this information leads to a broader and more robust understanding of grassland ecology. Additionally, predicting these different metrics all within reasonable error margins illustrates model skill, rather than tuning to optimize performance against one specific data set.
Annual mean grassland LAI has a significant linear relationship with MAP, supporting a large body of work showing that productivity in these ecosystems increases linearly with rainfall amount on the global scale (Jia et al., 2015;Shi et al., 2014;Zhu & Southworth, 2013). Both MODIS LAI data and SiB4 simulations show that every 100-mm increase in rainfall results in a 0.125 increase in mean annual LAI. Droughts lower both MAP and LAI, and the ability of SiB4 to differentiate mean annual LAI between drought and non-drought conditions with responses nearly identical to those seen by MODIS shows that the grassland phenology model is capable of responding to environmental conditions and predicting interannual variability.
GPP and RE also have significant relationships with MAP. Although the strength of these flux relationships is stronger in SiB4 than in the observations, these sites suggest that across large scales, grassland GPP increases by ∼110 g C m −2 annually for every 100 mm increase in precipitation. Since both carbon uptake and aboveground biomass have strong linear relationships with precipitation, not surprisingly they are significantly related to each other and these sites indicate that for every 0.1 increase in LAI the annual GPP will increase by ∼110 g C m −2 .
Grassland phenology also has relationships with MAP. The start of the growing season shifts earlier as MAP increases, and SiB4 and MODIS SOS are tightly correlated, predicting SOS values within 6 days of each other. For EOS, SiB4 has larger errors and tends to delay senescence. Two important reasons EOS is not well predicted emerge. First is the lack of management practices in SiB4. Many of the sites where SiB4 has a delayed EOS compared to the observations are intensively managed. The lower performance of SiB4 at these sites is likely because SiB4 does not include grassland cutting and harvesting. Second is a missing driving process for senescence. At tropical sites in particular, senescense begins prior to temperature or moisture stress, suggesting that some grasslands have evolved to have a nearly fixed growing season length based on the mean rainy season. Despite EOS differences, both SiB4 and MODIS show that EOS shifts later as MAP increases.
GSL also shows responses to MAP and MAT. Both MODIS and SiB4 show that the SOS and EOS relationships with MAP combine to lead to a lengthening of the growing season by ∼1 day per 8-mm MAP increase. For temperature, regressing GSL against MAT differences from an optimal value (283 K for these sites) yielded a statistically significant linear response, with a 6-day shortening of the growing season for a 1 K MAT temperature difference from the optimal. These relationships are based on an analysis of LAI GSL; however, statistically significant differences in specific GSL arose from the choice of metric used to determine the growing season. GSL is shortest when determined using GPP (150 days on average), ∼15 days longer using RE (175 days on average), and the longest using LAI (193 days on average). We hypothesize

Journal of Advances in Modeling Earth Systems
10.1029/2018MS001541 that these changes in GSL arise from the ecological behavior of the metrics being investigated. GSL is shortest using GPP since this only represents the time period plants are photosynthetically active. Next comes GSL using RE, which has contributions from both autotrophic and heterotrophic respiration. Since there are non-photosynthetic inputs to respiration, this metric results in longer values. Finally, GSL determined using LAI has the longest values because LAI results from complex interactions between carbon fluxes and represents a composite of ecological processes. While fluxes can start and stop quickly with changing environmental conditions, LAI emerges from fluxes, allocation, and growth processes, thus resulting in longer GSL estimates. This finding highlights the importance of thoroughly understanding the GSL calculation to properly understand the results and attribute any conclusions.
Shifting phenology, particularly earlier spring onset in high northern latitudes, illustrates an important ecological response to climate change (Armano et al., 2010;Menzel et al., 2006;Post et al., 2018;Richardson et al., 2013). Changes in phenology and GSL will impose changes in carbon fluxes that will cascade into resultant carbon storage pools (Forkel et al., 2016;Keenan et al., 2014;Park et al., 2016). As vegetation responds dynamically to environmental changes, complex patterns in greening and browning have been observed (Goetz et al., 2011;Lara et al., 2018;Miles & Esau, 2016). Using this selection of global sites, we investigated how GSL is related to LAI, GPP, and RE. Both carbon flux components are significantly correlated with GSL, and site observations suggest that for every day added to the growing season, annual GPP and RE increase by ∼8 g C m −2 . The LAI relationship with GSL was considerably weaker due to longer and more variable GSLs calculated when using LAI as the determinant. Despite site-specific differences in EOS and GSL, SiB4 captures the large-scale relationships of LAI, GPP and RE with GSL reasonably well. Although SiB4 overestimates the slope of both the GPP and RE responses by ∼1.5 g C m −2 , compared against the large background flux from the existing growing season, this difference is relatively small (< 1%).
Looking at monthly patterns in grassland carbon cycling, aboveground vegetation is relatively symmetrical about the month of peak LAI, and monthly patterns in LAI are closely matched between MODIS and SiB4. In contrast, grassland NEE is less symmetrical and exhibits model-data mismatches. From Fluxnet data, grasslands begin to be seasonal carbon sinks only one month prior to peak drawdown across all sites. This is a striking pattern that has both ecosystem and carbon cycling implications. Although the length of the seasonal sink increases with longer growing seasons, the temporal shape of the sink appears to be consistent across climatic conditions: grasslands rapidly switch from sources to sinks of carbon, reach their maximum seasonal sink magnitude after one month, and continue as decreasing seasonal carbon sinks for one to several months (depending on rainfall distribution) until the ecosystem returns to being a source of carbon through dormancy. Fluxnet data show that while both flux components increase through the growing season, RE lags GPP and has maximum efflux 1 to 2 months after peak drawdown ( Figure S14).
SiB4 underestimated the magnitude of the peak drawdown and had a tendency to predict drawdown too early. Since NEE results from two counteracting components, causes for this behavior can be difficult to unravel; however, we have several ideas for the cause of this behavior. First off, site-specific land history and management practices that increase site productivity are not included in SiB4. Techniques such as burning, cutting, and fertilization have been shown to increase productivity (e.g., Buis et al., 2009;Imer et al., 2013;Koerner & Collins, 2014;Wohlfahrt et al., 2008). Not including these practices could lead to an underestimation of NEE compared to observed. Second, the rate of photosynthesis relative to aboveground biomass may not be correctly simulated. Although SiB4 seasonally changes the maximum rate of photosynthesis, it could be that this seasonality needs adjusting. Perhaps early and late in the season leaves are too productive, yet during peak drawdown the maximum rate of photosynthesis is underpredicted. Third, timing of respiration may be altering the NEE pattern. An investigation of monthly RE shows that SiB4 respiration one month prior to seasonal peak drawdown is higher than observed ( Figure S14). Additionally, higher respiration extends longer following the peak drawdown in the observations that is not simulated by SiB4. These two patterns provide guidance for future improvements to SiB4 respiration; however, further study may be required to isolate the processes causing these changes since ecosystem respiration is comprised of both autotrophic and heterotrophic respiration. One hypothesis is that the autotrophic respiration is too high in SiB4 during green-up at the beginning of the season and too low during maturity and senescence. Another hypothesis is that the seasonality in heterotrophic respiration is shifted too early in the season, perhaps due to inaccurate temperature and/or moisture responses. It seems likely that all of these processes are contributing to the incorrect NEE seasonality.

10.1029/2018MS001541
SiB4 improves simulated grassland carbon and energy fluxes worldwide. Compared to model results presented in Raczka et al. (2013), SiB4 reduces simulated grassland annual GPP and RE biases from 2.4 and 3.0 Mg C ha −1 year −1 to 1.0 and 0.9 Mg C ha −1 year −1 . From another model-data comparison of GPP by Schaefer et al. (2012), SiB4 shows improvement in simulating carbon uptake. Comparing 26 models against estimated GPP at 4 grassland eddy covariance towers, Schaefer et al. (2012) show model-mean daily RMSE of 2.8 mol m −2 s −1 , while SiB4 has a daily mean RMSE of 2.1 mol m −2 s −1 across the 23 sites in this study with available GPP Fluxnet estimates. Looking at the two sites that are in common between the two studies, Schaefer et al. (2012) has RMSE values of 3.5 and 3.2 mol m −2 s −1 for US-IB2 and US-Var, respectively. At these same sites, SiB4 has RMSE values of 2.3 and 1.7 mol m −2 s −1 , showing a 50% reduction in predicted GPP compared to previous model results.
This study identified three key areas that will form the basis for upcoming incremental improvements to SiB4. First is to better represent management practices. Land history, management, and disturbance such as burning, cutting, fertilizing, and harvesting play an important role in grassland ecosystems. Not including these processes limits the skill of SiB4, particularly with respect to predicting site-specific productivity and growing season senescence. Further, expanding grazing to include spatial variations of grazing intensity and duration should also yield improved productivity predictions at heavily grazed locations. Second is to improve both autotrophic and heterotrophic respiration estimates to decrease efflux at the beginning of the growing season and increase efflux at the end of the growing season. Third is to investigate processes controlling senescence, including introducing a new process representing hypothesized biological adaptation to established growing season lengths. It appears that too-long growing seasons in SiB4 are compensating for underestimated maximum GPP, and improving the EOS predictions will help highlight errors in carbon assimilation rates that can be adjusted to yield further model improvements.

Conclusions
This study combined remotely sensed LAI data, eddy covariance flux measurements, and LTER NPP observations to investigate worldwide grasslands spanning climate variability. Grasslands have a large range of annual mean LAI that responds linearly with MAP (∼0.125 LAI increase per 100-mm MAP). The LAI response due to drought was less than the mean LAI change across the precipitation gradient in both MODIS and SiB4, indicating that the drought response across sites is not as strong as the overlying climate response. As MAP increases, grassland growing season begins earlier in the year and extends later in the year, leading to a longer GSL; however, actual GSL values depend on the metric being used to identify the growing season, with the biggest differences at desert sites. Using GPP to predict GSL results in the shorter values (∼150 days on average) while using LAI results in the longest GSL (∼195 days on average). Annual GPP and RE have statistically significant linear relationships with MAP (∼100 g C m −2 increase per 100-mm MAP) and GSL (∼8.5 g C m −2 increase per growing season day). Looking at net carbon flux, monthly mean NEE patterns revealed that regardless of productivity or growing season length, all grasslands transition to seasonal sinks of carbon 1 month prior to peak uptake. Despite grasslands being one of the most diverse and difficult ecosystems to simulate, SiB4 is able to predict these observed site-level and large-scale grassland behaviors, illustrating that it is a useful predictive tool available for studies ranging from carbon cycle predictions to ecosystem process investigations to climate change evaluations. This work used numerous sites of MODIS LAI subset data, and we thank the MODIS Land Product Subsets project and the ORNL DAAC for processing the data and making it publicly available online (https:// modis.ornl.gov/sites/). This work used Long Term Ecological Research (LTER) data at four sites, and we thank the LTER Network for making these data available online (http://lternet.edu). Site-specific acknowledgments and funding for these data are as follows: for US-Jrn, the Jornada Basin LTER project and the U.S. National Science Foundation (Grant DEB-1832194)