Improving Estimates of Gross Primary Productivity by Assimilating Solar‐Induced Fluorescence Satellite Retrievals in a Terrestrial Biosphere Model Using a Process‐Based SIF Model

Over the last few years, solar‐induced chlorophyll fluorescence (SIF) observations from space have emerged as a promising resource for evaluating the spatio‐temporal distribution of gross primary productivity (GPP) simulated by global terrestrial biosphere models. SIF can be used to improve GPP simulations by optimizing critical model parameters through statistical Bayesian data assimilation techniques. A prerequisite is the availability of a functional link between GPP and SIF in terrestrial biosphere models. Here we present the development of a mechanistic SIF observation operator in the ORCHIDEE (Organizing Carbon and Hydrology In Dynamic Ecosystems) terrestrial biosphere model. It simulates the regulation of photosystem II fluorescence quantum yield at the leaf level thanks to a novel parameterization of non‐photochemical quenching as a function of temperature, photosynthetically active radiation, and normalized quantum yield of photochemistry. It emulates the radiative transfer of chlorophyll fluorescence to the top of the canopy using a parametric simplification of the SCOPE (Soil Canopy Observation Photosynthesis Energy) model. We assimilate two years of monthly OCO‐2 (Orbiting Carbon Observatory‐2) SIF product at 0.5° (2015–2016) to optimize ORCHIDEE photosynthesis and phenological parameters over an ensemble of grid points for all plant functional types. The impact on the simulated GPP is considerable with a large decrease of the global scale budget by 28 GtC/year over the period 1990–2009. The optimized GPP budget (134/136 GtC/year over 1990–2009/2001–2009) remarkably agrees with independent GPP estimates, FLUXSAT (137 GtC/year over 2001–2009) in particular and FLUXCOM (121 GtC/year over 1990–2009). Our results also suggest a biome dependency of the SIF‐GPP relationship that needs to be improved for some plant functional types.


Introduction
The terrestrial biosphere currently plays a pivotal role for climate by offsetting about one fourth of carbon dioxide (CO 2 ) emissions released by anthropogenic activities into the atmosphere (Le Quéré et al., 2018). Terrestrial ecosystems assimilate CO 2 in leaf chloroplasts by photosynthesis, but most of the assimilated carbon is released back into the atmosphere (mainly as CO 2 ) through ecosystem respiration. Although the net carbon budget of terrestrial ecosystems is the main quantity of interest for climate studies, quantifying the gross fluxes (photosynthesis and respiration) is crucial in order to better understand the drivers of the net fluxes. Current knowledge of the spatial and temporal distribution of net and gross carbon fluxes over the globe is mainly driven by atmospheric and ecosystem measurements, but our ability to anticipate their evolution under a changing climate largely relies on global terrestrial biosphere models (TBMs) (Sitch et al., 2015). TBMs numerically represent the different processes controlling the exchanges of trace gases (including CO 2 ), water, and energy, between the atmosphere and the biosphere in a simplified manner. The large uncertainties that remain in our understanding of the carbon sequestration in land ecosystems (Anav ©2019. The Authors This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. Friedlingstein et al., 2014) lead to a large spread in TBM simulations, especially in terms of spatiotemporal patterns of the terrestrial gross primary productivity (GPP). In practice, the spread results from structural differences in the representation of key processes, that is, their mathematical formulation (Jung et al., 2007;Kuppel et al., 2013), and from the different fixed values assigned to the associated parameters. For instance, the impact on simulated GPP resulting from a plausible change in parameter values ranges from about 10 to 15 GtC/year in the global GPP budget for the Organizing Carbon and Hydrology In Dynamic Ecosystems (ORCHIDEE) model  to 3-4 times these values for the JSBACH model . The question of optimizing parameter values in TBMs from the existing limited number of observations, obtained at heterogeneous spatial scales (from the individual plant organs to the plant community) and under specific environmental conditions, has therefore grown in importance over the recent past and now involves sophisticated data assimilation approaches as well as simple heuristic ones.
Data assimilation (DA) techniques enable the numerical minimization of the mismatch between model predictions and corresponding independent observations, within prescribed confidence intervals, by adjusting the model parameters (Kaminski et al., 2002;Raupach et al., 2005). In addition to atmospheric CO 2 mole fraction measurements (Koffi et al., 2012;Peylin et al., 2016;Rayner et al., 2005;Schürmann et al., 2016), local scale eddy-covariance flux measurements (Fox et al., 2009;Groenendijk et al., 2011;Knorr & Kattge, 2005;Sacks et al., 2007;Wang et al., 2001;Williams et al., 2005) or remote sensing proxies of vegetation greenness like vegetation indices Migliavacca et al., 2009) and the fraction of absorbed photosynthetically active radiation (FAPAR) Kaminski et al., 2012;Knorr et al., 2010;Stöckli et al., 2008;Zobitz et al., 2014), are traditionally assimilated to constrain the value of the main model parameters controlling the carbon cycle. Eddy covariance flux and atmospheric CO 2 measurements provide ecosystem-to large-scale information on net ecosystem exchange. Vegetation indices (VI) or FAPAR spaceborne products, on the other hand, provide information on the photosynthetic potential of the ecosystemsi.e. the "greenness" of the vegetation; however, these measures do not always give accurate information about the actual photosynthetic activity (Grace et al., 2007;Zhang et al., 2016) or the short-term variations of GPP , given that their estimates also depend on nongreen canopy elements (i.e., not only by chlorophyll) and are not directly related to the photosynthetic activity itself (Smith et al., 2018;Walther et al., 2016). Satellite retrievals of solar-induced fluorescence (SIF) over land surfaces have been available for nearly a decade and their use for model parameter optimization is starting. These observations exhibit a strong correlation with GPP at broad spatial and temporal scales (Frankenberg et al., 2011;Guanter et al., 2012;Zhang et al., 2016) and at site scale for different vegetation types (Joiner et al., 2013;Madani et al., 2017;Sanders et al., 2016;Yang et al., 2015). SIF tracks GPP seasonality better than traditional vegetation indices Smith et al., 2018;Walther et al., 2016) and provide complementary information on the drivers of plant phenology and physiology (Jeong et al., 2017). There is theoretical evidence to support the covariation of SIF and GPP. Chlorophyll fluorescence originates in the chloroplasts inside the leaf. It is strongly linked to CO 2 fixation Porcar-Castell et al., 2014) and is affected by regulation mechanisms modulating photosynthesis due to environmental conditions, for example, water availability (Flexas et al., 2002;Sun et al., 2015;Wang et al., 2016). The top of canopy SIF emission can be expressed as the product of the absorbed photosynthetically active radiation (APAR) by the fluorescence yield of the canopy Φ F (also referred as the SIF light-use efficiency  that accounts for these mechanisms occurring inside the leaves) modulated by the probability Ω c that a SIF photon emitted within the canopy escapes to space (Frankenberg & Berry, 2018): SIF = APAR × Φ F × Ω c . This theoretical expression is analogous to the classical expression of GPP as a function of photosynthetic light use efficiency (LUE): GPP = APAR × LUE Joiner et al., 2014).
Given the recent availability of SIF observations, modeling groups are thus adapting TBMs to take advantage of this GPP proxy in order to evaluate simulations of gross productivity Thum et al., 2017). The next logical step is to assimilate the SIF data to optimize their parameters (Koffi et al., 2015;MacBean et al., 2018;Norton et al., 2018). Both model evaluation and optimization (via DA techniques) require the development of an observation operator that functionally relates the model variables (e.g., photosynthesis and fluorescence) and related parameters to the observed SIF.
Two different classes of method can be distinguished. The first simple approach is based on a statistical linear relationship between GPP and SIF (MacBean et al., 2018), usually defined at the level of plant functional type (PFT), the typical unit of vegetation description in TBMs. However, such simple relationship limits the potential of SIF data to constrain photosynthetic parameters, as part of the SIF information is hidden by the statistical relationship. The second more process-based approach relies on the incorporation of (or some part of) the Soil Canopy Observation Photosynthesis Energy (SCOPE) model (van der Tol et al., 2009 in TBMs. SCOPE combines a fully prognostic leaf biochemical model and a one-dimensional (1-D) canopy radiative transfer model. SCOPE therefore enables simulations of chlorophyll fluorescence emitted by the canopy depending on the leaf biochemistry, canopy structure, meteorological conditions (including incoming radiation), and observation geometry. However, the use of SCOPE in TBMs is computationally expensive. Its implementation within the Biosphere Energy Transfer Hydrology (BETHY) model (Norton et al., 2018) has increased CPU computation expenses by more than 3 orders of magnitude in data assimilation runs (Norton, personal communication)-the major computational cost being attributed to the radiative transfer scheme. As a consequence, alternatives or simplified versions of such 1-D radiative transfer modeling scheme need to be incorporated into complex TBMs for efficient and computationally affordable assimilation purposes.
In this context, the objectives of our paper are the following: • To describe a new simplified process-based model of canopy fluorescence implemented in the ORCHIDEE (Krinner et al., 2005) TBM. The approach relies on a new parametric model to estimate the total non-photochemical quenching (NPQ) and a simplification of the 1-D radiative transfer model in SCOPE to reduce the computational burden; • To use the SIF observation operator to assimilate SIF retrievals from the Orbiting Carbon Observatory-2 (OCO-2) instrument (Frankenberg et al., 2014, Sun et al., 2018 in ORCHIDEE in order to constraint its main photosynthetic and phenology parameters; • To assess the overall potential of the SIF data to optimize the photosynthesis of ORCHIDEE, comparing the prior and posterior (after the assimilation of SIF data) spatial and temporal variation of GPP against independent information.
The modeling approach of SIF in ORCHIDEE is described in section 2. The simulation setup, the data used for assimilation and evaluation of the optimized model, and the assimilation scheme are described in section 3. The results relative to assimilation are presented in section 4. In section 5, we then discuss the constraints brought by SIF on photosynthesis and phenology in TBMs, the limits of the approach, and propose strategies aiming at making the best use of SIF observations for reducing uncertainties in TBM parameterization.

Overview of the ORCHIDEE Terrestrial Biosphere Model
ORCHIDEE is a mechanistic TBM that simulates the exchanges of carbon, water, and energy between biosphere and atmosphere (Krinner et al., 2005). It is the land surface component of the Earth System Model of Institut Pierre-Simon Laplace IPSL-CM (Dufresne et al., 2013) and therefore has been contributing to the recent exercises of the Coupled Model Intercomparison Project (CMPI) established by the World Climate Research Programme (https://www.wcrp-climate.org/wgcm-cmip). Photosynthesis and all components of the surface energy and water budgets are calculated at a half-hourly resolution while the dynamics of the carbon storage (including carbon allocation in plant reservoirs, soil carbon dynamics, and litter decomposition) are resolved on a daily basis. Photosynthesis depends on light availability and CO 2 concentration, soil moisture, and temperature and is parameterized based on Farquhar et al. (1980) and Collatz et al. (1992) for C3 and C4 plants, respectively. The formulations of Yin and Struik (2009) are used to describe the main photosynthesis processes, including the stomatal conductance and the rates of CO 2 assimilation under Rubisco and electron transport limitations and the intercellular CO 2 partial pressure (Vuichard et al., 2018). As in most TBMs, the spatial distribution of vegetation is represented using fractions of PFTs for each grid point (Cramer, 1997;Prentice et al., 1992;Wullschleger et al., 2014). Here we use a new land cover map for the year 2015 with 14 PFTs (among which are two representative C3 and C4 crop PFTs) in addition to bare soil. It is derived from the ESA CCI Land Cover (LC) products (see https://orchidas.lsce.ipsl.fr/dev/lccci/ for more details) and a LC-to-PFT cross-walking approach described in Poulter et al. (2015) (see Table 3 for the description of the PFTs).
We use the soil map from Zobler (1986) to prescribe the texture class in each grid cell. The impacts of land use change, forest management, harvesting, and fires are not included in this study. Except for the phenology (detailed in Botta et al. (2000) and MacBean et al. (2015)), processes are described with the same governing equations for all PFTs, but usually with different parameter values.

General SIF Modeling Approach
The modeling of SIF in ORCHIDEE relies on the formalism developed in SCOPE (v1.61) at the leaf level and transposed at the canopy scale, in order to calculate (1) the relative contribution of photosystems I and II (denoted PSI and PSII, respectively) to leaf level fluorescence and (2) the total fluorescence flux emitted at the top of the canopy (TOC). At TOC, the total fluorescence spectrum SIF (in W.m −2 .μm −1 .sr −1 ) at wavelength λ is expressed as a linear combination of the fluorescence spectra of PSI and PSII, with the PSII fluorescence flux modulated by a regulation factor ε (Verrelst et al., 2015;Zhang et al., 2016): Parameter ε is a regulatory scaling factor that represents the changes in the fluorescence emission of PSII in response to the physiological regulation of PSII related to meteorological forcing and plant stress (see equation (2)). As a first approximation, it is calculated for the TOC leaves. Also, given that the spatio-temporal dynamics of PSI fluorescence remain unknown, we assumed that the fluorescence emitted by PSI does not vary with environmental conditions or across species and that its fluorescence yield remains constant (Pedrós et al., 2010;Porcar-Castell et al., 2014). TOC PSI and PSII SIF fluxes are calculated with a parametric representation of the 1-D radiative transfer model in SCOPE, which accounts for the effect of the canopy structure on re-distribution (absorption/diffusion) of fluorescence fluxes through the canopy.

Leaf Scale
The competing processes of photosynthesis, fluorescence and thermal energy dissipation, occur in the chloroplast; however, we modeled them at the leaf level, which is the smallest physiological unit considered in ORCHIDEE. In a first step, we define the regulatory scaling factor ε that accounts for the changes in fluorescence yield in PSII at the steady state (ΦF ′ PSII ;see equation (3)), relative to a minimal fluorescence level during summer (ΦF 0 PSII ) obtained in the absence of physiological stress (NPQ approaching zero) and with a maximum fraction of open and functional PSII reaction centers (Baker, 2008).
To compute the quantum yields of the processes related to PSII, we assumed that there is a large connectivity between photosynthetic units, correctly represented by the "lake" model described for instance in Lavergne and Trissl (1995), Kramer et al. (2004), Porcar-Castell (2011), or Amarnath et al. (2016. In this model, each yield equals the ratio of the rate constant of each process divided by the sum of the rate constants over all the competing processes. In the lake model formalism, ΦF ′ PSII is expressed as a function of the photochemistry yield for PSII (ΦP ′ PSII ) (Porcar-Castell et al., 2014;Lazár, 2015;Koffi et al., 2015;Lee et al., 2015): and depends on the rate constants of fluorescence (k F ) and of thermal energy dissipation. Thermal energy dissipation is divided between an invariant part (k D , basal or constitutive thermal energy dissipation) and a variable part, k NPQ , related to the total NPQ process (i.e., reversible and sustained NPQ). The fluorescence rate constant k F does not vary (Porcar-Castell, 2011), while the rate constant related to NPQ is regulated by environmental conditions. For simplicity, all rate constants are expressed in units relative to the sum of k D and k F , with k D set to 0.95 and k F to 0.05, and the rate constant of photochemistry k P was set to 4 as in SCOPE . This corresponds to a value of the reference fluorescence yield ΦF 0 PSII (without NPQ) of 0.01, and a corresponding maximum photochemical yield ΦP PSII MAX of 0.8. In equation (3), only k NPQ and ΦP ′ PSII are unknown. We calculate these variables using the approaches presented in sections 2.3.1 and 2.3.2.

Modeling of the PSII Photochemical Yield
The yield of photochemistry for PSII (ΦP ′ PSII ) is a function of the most limiting factors between electron transport rate (Maxwell & Johnson, 2000;Porcar-Castell et al., 2014;Ruuska et al., 2000) and the carboxylation rate (see equation (4)). These two factors are already calculated by the photosynthesis module of ORCHIDEE, based on the Farquhar et al. (1980) and Collatz et al. (1992) models for C3 and C4 plants, respectively: where ETR c PSII , the electron transport rate (in μmol e − .m −2 .s −1 ) limited by the carboxylation rate, derives from equation (2) (C3 plants) and equation 23 (C4 species) in Yin and Struik (2009), while ETR is calculated according to equation 4 of the same reference. The factor c (=0.5 mol e − .(mol photon absorbed) −1 ) represents the partition of energy between the two photosystems, which is assumed identical (as in Maxwell & Johnson, 2000;Hendrickson et al., 2004;Galmés et al., 2007). The absorbed photosynthetically active radiation (APAR) is expressed as the incoming photon flux density (μmol photon .m −2 .s −1 ) multiplied by the leaf absorptance (α leaf ).
In the standard ORCHIDEE version used for this study, the conversion efficiency of absorbed light into ETR at limiting light relies on a foliar absorptance of 0.8 (Medlyn et al., 2002). Baker (2008) and further Porcar-Castell et al. (2014) report a value of 0.84 for α leaf . Instead of using a constant value, we have introduced a parameterization of α leaf as a function of the leaf chlorophyll a and b content (C ab ). This approach also links the effects of the photosynthetic pigments on the leaf photochemistry and on the radiative transfer within the canopy: C ab is indeed one of the input parameters of SCOPE that has the largest impact in the PAR domain (400-700 nm; cf. section 2.4), because it also controls the amount of fluoresced light re-absorbed by leaves in the red domain. The parameterization relies on leaf absorptance simulations performed with the PROSPECT model of leaf optical properties (Feret et al., 2008;Jacquemoud & Baret, 1990), where the parameters of leaf structure, carotenoid content and chlorophyll content, were varied at the same time, within respective prescribed ranges of variation. The average leaf absorptance over the PAR domain is expressed as The parameter values were optimized against PROSPECT simulations (Table 1); with those values, leaf absorptance for a leaf chlorophyll content of 40 μg.cm −2 is 0.88. The C ab parameter was introduced in ORCHIDEE as PFT dependent, and a prior value of 40 μg.cm −2 was set for all vegetation types (Table 3).
In addition, we also account for temporal variations of C ab with leaf age in the same way the control of leaf age on the maximum carboxylation capacity is already implemented in ORCHIDEE (see Krinner et al., 2005;Santaren et al., 2007), a covariation of these traits having been documented by Niinemets et al. (2004), Muraoka and Koizumi (2005), and Šesták and Čatský (1962).

Modeling of the Non-photochemical Quenching Effect, k NPQ
Because there is no simple model in the literature that calculates k NPQ (for total NPQ) as a function of the ORCHIDEE state variables related to the photosystem functioning, we followed an approach similar to the one used in SCOPE (van der Tol et al., 2014, see also Lee et al., 2015;Koffi et al., 2015) that consists in determining a parametric model of k NPQ based on leaf fluorescence measurements. Compared with the above-mentioned studies however, we did not use the same measurement data sets and we explicitly accounted for the effect of incoming radiation and temperature on total NPQ. The model for k NPQ was calibrated using the measurements of (i) Porcar-Castell (2011) for Pinus sylvestris that are adapted to boreal climate and (ii) Flexas et al. (2002) for several broadleaf (Celtis australis L., Pistacia terebinthus L., Quercus ilex L.) and crop (Solanum melongena L.) species that are more adapted to a Mediterranean climate. Since we did not have any reasonable proxies to estimate the contribution of PSI to the PAM fluorescence data, we decided not to attempt any correction for PSI fluorescence in the calibration PAM data set and avoid any additional source of uncertainty: The values of the parameters are provided in Table 1. T is the air temperature, bounded to −2.5°C. As in van der Tol et al. (2014), the parameter x is defined as Despite the differences in terms of plant species and climate conditions between the two data sets used for tuning the k NPQ model (equation (6)), the model is able to reproduce both observation data sets reasonably well, as illustrated in Figure 1. The better performance seen for the Flexas data set is explained by the smaller number of measurements (78), also performed under more controlled conditions (acquisitions performed at midday in summer), than the 24,500 Pinus sylvestris measurements that correspond to about one year of continuous monitoring. We evaluated the models of van der Tol et al. (2014) and Lee  on the tuning data set, and they showed lesser performance: the root-mean-square error (RMSE) were, respectively, of 2.05 and 2.25 s −1 over the Porcar-Castell (2011) data, and 0.75 and 1.45 s −1 over the Flexas et al. (2002) data.

Canopy Level
For the calculation of the TOC fluorescence fluxes of PSI and PSII (equation (1)), we set up a parametric model that we tuned from an ensemble of simulations of the SCOPE (v1.61) model (equation (8)). The ensemble was designed by simultaneously varying the most influential model parameters on canopy fluorescence found in the sensitivity study conducted by Verrelst et al. (2015). In the selection of key parameters, we also considered the possibility to relate them to ORCHIDEE parameters or state variables. In the PAR domain, the main contributors to the total canopy fluorescence are the leaf area index (LAI; which is prognostically calculated by ORCHIDEE), the chlorophyll a + b content, the leaf angle distribution, the dry matter content, and the incoming radiation. The leaf angle distribution implemented in SCOPE depends on two parameters. In order to reduce the number of degrees of freedom of the parametric model, we replaced it by the ellipsoidal distribution (Campbell, 1990) that depends on the average leaf angle (ALA) parameter only. Dry matter content is inversely proportional to specific leaf area (SLA) which is an input parameter of ORCHIDEE. Given we aim at assimilating OCO-2 observations acquired at nadir, we fixed the view zenith angle at 0°while variations of the sun zenith angle are accounted for.
We conducted a preliminary sensitivity analysis of the SCOPE model to identify its main drivers (within the confines of parameters and variables used or simulated by ORCHIDEE) and mimic their mean effects through mathematical functions. The simulations were determined thanks to a statistical approach used in Design Of Experiments for Simulation (DOES) (Bacour et al., 2002). A Hyper Graeco Latin Geometric sampling scheme made of 2,401 simulations was used (as in Jacquemoud et al., 2009). Table 2 lists SCOPE input parameters and indicates those varying with each simulation, including their variation interval, and those kept fixed, with their associated value. Along the simulations, the varying parameters took seven discrete values evenly spread over their variation range.
The determination of the mean effect of each parameter from the DOES approach helped to identify the mathematical functions that best reproduce them, and therefore to further build the parametric model. Several formulations were tested and the one that best fits the original SCOPE fluorescence is expressed as for a given wavelength λ and with parameters a 2 to g 2 estimated by a nonlinear least squares optimization procedure, using SCOPE radiative transfer simulations as pseudo-observations (after deactivating the biochemical module). For an application to OCO-2 space-borne observations, the model was adjusted to a wavelength λ of 760 nm and a constant view geometry set at nadir was considered. Note that at this wavelength, the contribution of Cab to the total canopy fluorescence is small (Verrelst et al., 2015). The values of the parameters are listed in Table 1. The ability of the parametric model to reproduce the original SCOPE simulations was assessed over an ensemble of 300 simulations where the dependent parameters were changed randomly within their variation range. The high values of the correlation coefficients (0.996 for both PSI and PSII) together with low RMSE values (0.015 W.m −2 .μm −1 .sr −1 for PSI and 0.025 W.m −2 .μm −1 .sr −1 for PSII) highlights the ability of the parametric model to accurately emulate the radiative transfer module of SCOPE. The parameters a 2 to g 2 will not be further optimized against OCO-2 SIF observations, which is equivalent to assume that the SCOPE model (and its emulated version) is perfect.
We recall that, while the parametric model somewhat represents the fluorescence regime (emission and re-absorption) within the canopy, we consider that the regulation of PSII fluorescence is essentially determined by the top (illuminated) layer of the canopy. This important hypothesis is discussed in section 5.

ORCHIDEE Configuration
ORCHIDEE is forced by 6-hourly meteorological fields from CRU-NCEP v7 (a product combining the climate reanalysis from the National Centers for Environmental Prediction and observations from the Climatic Research Unit) at 0.5°resolution (Viovy, 2018), which are interpolated to a half-hour time step within ORCHIDEE (linearly for all fields except for the shortwave downwelling radiation which is interpolated using the cosine of the solar zenith angle). A prior spin-up simulation was performed at the global scale by recycling forcing data over the 1901-1910 period in order to bring plant carbon reservoirs to equilibrium. This was followed by a transient simulation up to the observation period accounting for changing climatic conditions and the yearly increase of the global atmospheric CO 2 concentrations. Grid-point-scale simulations (section 3.4.1) rely on the initial conditions extracted from this global run.
In order to explicitly account for the impact of the time of OCO-2 overpass in the estimation of SIF, the simulated TOC SIF corresponds to the value at 1:30 pm local time. ORCHIDEE TOC SIF simulations were performed every day (hence accounting for the temporal evolution of the illumination angle) and were then averaged at a monthly resolution (cf. section 3.2.1).
The overarching hypothesis of our approach is that instantaneous SIF estimates (here derived from OCO-2 observations at~1:30 local time) are able to constrain daily GPP (here simulated by the ORCHIDEE model). It is supported by the recent work of Zhang, Xiao, et al., 2018).

Data Sets 3.2.1. Assimilated Data: OCO-2 SIF Products
The OCO-2 instrument was launched in July 2014 to monitor atmospheric CO 2 . Thanks to its high spectral resolution in the O 2 A-band, the land surface SIF is estimated and distributed as a separate product (Frankenberg et al., 2014). We used the OCO-2 B7300 SIF data available at ftp://fluo.gps.caltech.edu/data/ OCO2/sif_lite_B7300/ which has shown very good consistency with airborne SIF measurements (Sun et al., 2017). In this study, we used the monthly synthesis products at a 0.5°× 0.5°spatial resolution for years 2015-2016. The high spatial resolution of OCO-2 observations (footprint of 1.3 × 2.25 km 2 ) combined with their low revisit frequency (16 days) make such spatio-temporal aggregates best suited for comparison with a TBM at the global scale. The aggregated products also have reduced random errors. We only considered nadir observations to remove directional effects not accounted for in this study. OCO-2 has a sunsynchronous orbit with an ascending node at 1:30 pm local time; as we explicitly accounted for the time of the satellite overpass in ORCHIDEE simulations, we assimilated the raw OCO-2 SIF product and not the daily normalized product.

GPP Evaluation 3.2.2.1. Procedure
We compared ORCHIDEE GPP simulations with respect to two independent GPP estimates derived from data-driven approaches. We used different years for the evaluation of the simulated GPP than for the assimilation of SIF data (2015)(2016); this can be seen as an opportunity to assess the temporal validity of the optimized model. To do so, we used the parameter values optimized over the assimilation period to perform global-scale hindcast simulations of GPP.

Data Sets
In the first place, we choose the recent FLUXCOM GPP products described in Tramontana et al. (2016), based on an upscaling of in situ estimates of GPP at FLUXNET sites (http://fluxnet.ornl.gov) with several global MODIS (Moderate-Resolution Imaging Spectroradiometer) satellite products and meteorological forcing. In this study, we used the median between the three machine learning estimates as a reference GPP and based on the flux partitioning approach of Reichstein et al. (2005). The spatial and temporal resolutions of the FLUXCOM GPP are 0.5°and monthly, the same as the ORCHIDEE simulations. FLUXCOM should not be considered as a validation product (as in Norton et al., 2018) given the sparsity of the training flux sites and other shortcoming issues (for instance, it does not represent the fertilization effect of increasing CO 2 atmospheric concentrations), but it is a commonly used product usually taken as a benchmark. In Figure 3 of Tramontana et al. (2016), a scatterplot of their eightday product against independent observations shows a RMSE of 1.5 gC.m −2 .d −1 . We also use the FLUXSAT GPP estimates described in Joiner et al. (2018). Similarly to FLUXCOM, FLUXSAT also derives from a data-driven approach relying on FLUXNET measurements and MODIS reflectances in seven spectral bands and calibrated against FLUXNET measurements. One noteworthy difference is that FLUXSAT does not use any meteorological forcing. A dual-calibration procedure is applied by discriminating low versus high productive FLUXNET sites, the identification of which is based on space-borne SIF products derived from Global Ozone Monitoring Experiment 2 (GOME-2) observations. FLUXSAT GPP calibration remains independent from SIF data and the GPP estimates are not impacted by GOME-2 SIF artifacts (instrument degradation; . The FLUXCOM product is used as the main benchmark for the analyses presented in the following, given that it has been used in numerous evaluation studies regarding land surface model simulations (Norton et al., 2018; 2019, among others) and space-borne SIF estimates (Walther et al., 2016;Sun et al., 2018;Zhang, Xiao, et al., 2018). We used FLUXCOM data available over the 1990-2013 period, with specific time windows depending on the evaluation purpose: focus on year 2008 for assessing the modeled GPP spatial patterns at the global scale relative to SIF data, 2008-2013 for the temporal evolution of the modeled GPP for an ensemble of OCO-2 pixels, and 1990-2009 for characterizing the trend of the yearly global GPP budget. FLUXSAT is available from 2000 onward. We used the data over 2001-2009 to assess the yearly variations of the GPP budget at the global scale.

Overview of the Data Assimilation Framework
The optimization of ORCHIDEE with regard to OCO-2 SIF products was performed with the ORCHIDEE Data Assimilation System (ORCHIDAS) (https://orchidas.lsce.ipsl.fr/). ORCHIDAS has been described in detail in previous studies focusing on the assimilation of in situ net surface CO 2 and energy flux observations, satellite-derived vegetation indices or FAPAR products and atmospheric CO 2 concentration data Kuppel et al., 2012;MacBean et al., 2015;Peylin et al., 2016), and on the assimilation of GOME-2 SIF products using a simple linear SIF-GPP relationship (MacBean et al., 2018).

Journal of Geophysical Research: Biogeosciences
ORCHIDAS relies on a Bayesian framework to optimize a set of selected ORCHIDEE parameters (called control variables and gathered in a control vector x) from a background (a priori knowledge) estimate (x b ), using OCO-2 SIF observations (y), while explicitly accounting for the uncertainties in parameters and observations (assuming Gaussian probability distributions as in most data assimilation studies). The optimal set of parameters is found by minimizing a function J(x) that gauges the misfit between model outputs (H(x)) and observations, together with the departure of x from x b , relative to the error covariance matrices of the observations (R, including model and observation errors) and of the parameters (B; Tarantola, 2005): How the B and R matrices are determined is described, respectively, in sections 3.4.1 and 3.4.2.
After optimization, several metrics can be calculated to assess the knowledge improvement brought by the assimilation. The posterior error covariance matrix of the optimized parameters (A) is computed under the hypothesis that the model is linear in the vicinity of the solution. It is a function of H ∞ , the Jacobian matrix of the model at the solution: The uncertainty reduction on model parameters is quantified by 1 − σ post /σ prior , where σ post and σ prior are the error standard deviations derived from the posterior (A) and prior (B) covariance matrices, respectively.
Finally, to quantify the model improvement, we calculated the reduction of the root-mean-square difference (RMSD) between model and data, prior and posterior to optimization, expressed in %, as 100 × (1 − RMSD post /RMSD prior ).

Data Assimilation Setup 3.4.1. Optimized Parameters and Associated Uncertainties
We optimized several ORCHIDEE parameters involved in carbon assimilation and hence directly impacting GPP dynamics through photosynthesis and phenology processes. The parameter values vary with PFT. Most of these parameters have already been identified in previous studies (Kuppel et al., 2014;MacBean et al., 2018;Peylin et al., 2016) that focused on an older version of the model. The process equations they are involved in are described in these papers as well as in Krinner et al. (2005), Santaren et al. (2007), and MacBean et al. (2016). New ones were introduced in the control vector x for this version of ORCHIDEE. The list of optimized PFT-dependent parameters is provided in Table 3, with their prior value and defined interval of variation (defined from a literature survey and from expert knowledge). The "photosynthesis" parameters in Table 3 have a direct impact on the ε regulatory scaling factor of PSII fluorescence emission (equation (1)). Note that ETR is controlled by maximum rate of e − transport rate under saturated light (J max25 ) which is inferred from the maximum rate of Rubisco activity-limited carboxylation at 25°C (V cmax25 parameter). For C3 plants, the ratio J max25 /V cmax25 is a linear function of the plant growth temperature, following Kattge and Knorr (2007); the offset is 2.59 and the slope is −0.035 (see their Table 3). For C4 plants, the model does not consider temperature acclimation, with a fixed ratio of 1.715. Two parameters specific to the SIF model were also adjusted, C ab and ALA. These two parameters are currently only impacting the computation of the SIF fluxes and not the photosynthesis calculation. Like many ORCHIDEE parameters, C ab and ALA are PFT dependent. However, without any prior information on their distribution with vegetation type, the same a priori values and variation ranges were used across PFTs. The a priori value for ALA corresponds to a spherical distribution of leaves. Only the diagonal elements of the uncertainty matrix of the parameters B were accounted for, with the standard deviation of the error associated to each parameter being set to 40% of its variation range (Kuppel et al., 2012).

Selection of OCO-2 Grid Points and Observation Uncertainties
The spatial resolution of the OCO-2 product used here (0.5°) somewhat precludes the assimilation of global observation maps because of excessive computational times. Instead, we assimilated OCO-2 time series for an ensemble of selected grid points, an approach already employed in MacBean et al. (2015MacBean et al. ( , 2018. For each PFT, 15 grid points were randomly chosen among those with the highest fractions of the targeted PFT for the optimization of the model, and 10 other ones for evaluation purposes. Grid points with a 10.1029/2019JG005040

Journal of Geophysical Research: Biogeosciences
fraction of the target PFT lower than the 85 th percentile were discarded in order to select highly homogeneous grid points. The higher biome homogeneities were found for TrEBF, TeEBF, TeC3GRA, BoC3GRA, and C4GRA, vegetation types with 85 th percentile values of the vegetation fraction above The RMSD of fit and correlation coefficient over all grid points before and after assimilation is provided for each vegetation type. The reduction of the RMSD is provided (%) for the 15 optimization grid points (black italic) and for 10 validation grid points (grey italic). The group (letter G next to PFT name) refers to model changes relative to SIF and GPP, and is described in section 4.2.1.

Journal of Geophysical Research: Biogeosciences
70% for the selected grid points. For the other PFTs, the 85 th percentile threshold corresponds to vegetation fractions usually higher than 50%, except for C4CRO (between 40% and 56%) and TrC3GRA (46-73%). In addition, grid cells exhibiting a low temporal correlation between the ORCHIDEE prior SIF simulation and OCO-2 SIF observations were discarded from the selection, given that they likely reflect model-data inconsistencies that the assimilation would not reconcile. To do this, we removed grid points with a correlation below a prescribed threshold (55 th percentile of the correlation distribution estimated PFT per PFT). Finally, only grid cells with more than six valid OCO-2 monthly means per year were retained.
As in previous ORCHIDAS studies ( MacBean et al., 2015;Santaren et al., 2014), the observation uncertainty (R matrix) was set diagonal and the prior error was defined for each PFT as the RMSD between the prior model simulations and OCO-2 SIF data over all 15 grid points selected for assimilation. The values of the prior RMSD for each PFT, averaged over the 15 grid points, are provided in Figure 2a. The observation uncertainty accounts for uncertainties in the SIF retrieval processing chain (from the radiance measurements to the monthly gridded products) and uncertainties originating from the ORCHIDEE-SIF model (including parameterization uncertainty as well as uncertainties in the meteorological forcing and in the prescribed land cover).

Data Assimilation Procedure
The assimilation was performed in two steps in order to reduce the computational burden. At the first step, it was performed PFT per PFT (over the 15 selected grid points) for the 14 vegetated PFTs, and only the ORCHIDEE parameters of the main PFT at the 15 grid points were adjusted by the optimization (similar to MacBean et al., 2018). In a second step, all pixels/PFTs were combined, in order to better account for the vegetation mixing within grid cells, the optimization process starting from the parameter values pre-optimized at the first step. The iterative minimization of J(x) (equation (9)) was achieved by the gradient-based L-BFGS-B algorithm (Byrd et al., 1995) which accounts for parameter bounds. It requires the estimation of the gradient of the misfit function at each iteration, here calculated by a finite difference approach. The minimum of J(x) detected by the assimilation algorithm may be sensitive to the starting point of x due to possible multiple local minima (arising from model nonlinearity) (Bastrikov et al., 2018). For this reason, we carried out 20 optimizations at step 1 starting from 20 different parameter first guesses, chosen randomly within their range of variation. We selected the solution of step 1 as the one out of the 20 that resulted in the lowest posterior value of J(x); for parameters that do not depend on PFT, we retained the median value over all PFTs as the estimated value. The values estimated at step 1 were then used as first guesses for the second step of the assimilation procedure. Note that the R and B matrices remained unchanged from the first to the second step. Figure 2a illustrates the mean improvement of the model with respect to SIF observations for the grid points used for optimization and validation. The prior model already shows relatively good agreements with the mean OCO-2 SIF time series, but disagreements in magnitude and seasonality are nonetheless visible, the importance of which varies with the PFT. An overestimation of the prior SIF magnitude is seen for most PFTs, except for TeDBF, BoDBF, TrEBF, BoC3GRA, C3CRO, and C4CRO, where it is underestimated. For deciduous PFTs, the prior model also generally simulates a longer growing season. The assimilation generally corrects these general biased features in the model. For the optimization grid points, the median value of the reduction in RMSD between observed and simulated SIF is 18% across all PFTs, ranging from only 0.3% for TrC3GRA to about 45% for C4GRA.

Improvement of the Modeled SIF After Optimization
The RMSD reduction for the 10 evaluation grid points is usually lower, with a median value of 14% over all PFTs. Noticeably, it is however higher than the one computed over the optimization grid points for six PFTs (TrDBF, TeEBF, TeDBF, BoDNF, TeC3GRA, BoC3GRA) while higher posterior RMSD are obtained for TeENF and TrC3GRA which indicates a degradation of the model performance.

Journal of Geophysical Research: Biogeosciences
peak of SIF during the growing season; for C3CRO the only improvement is related to phenology. For the several deciduous PFTs, the temporal improvement of SIF is achieved by a shortening of the growing season length mostly via an earlier senescence.

Impact on GPP Spatio-temporal Patterns 4.2.1. GPP Temporal Dynamics at the Optimization Grid Points
The impact on the GPP simulation resulting from the assimilation of OCO-2 SIF retrievals is presented in Figure 2b for the mean cycle over the 2008-2013 period. For 11 PFTs out of 14, we note an increase of the consistency between the model and FLUXCOM-GPP after optimization with a median improvement of about 6% over all PFTs. The highest improvements are seen for C4GRA (by 53%), TrEBF (29%), TrC3GRA (18%), BoENF (16%), C3CRO (16%), and TeC3GRA (15%). At the opposite, an increase in RMSD is obtained for four PFTs: for temperate (by 38% for TeDBF, up to 56% for TeEBF) and boreal (by 4% for BoDBF) forests, as well as C4CRO but only for the validation pixels (about 7%).
For all PFTs but three, the variation in SIF magnitude through the optimization (increase or decrease) is consistent with that of GPP. For TrEBF, BoC3GRA, and C4CRO, however, opposite variations in modeled SIF and GPP are obtained after assimilation. These opposite variations are partly attributed to two parameters, C ab and ALA, that depend only on the SIF model. The relative differences between ORCHIDEE SIF, OCO-2 SIF, and FLUXCOM-GPP magnitude suggest some biome dependence in terms of PSII fluorescence yield and/or of radiative transfer regimes.
A first group of PFTs (G1 in Figure 2) encompasses those showing similar relative changes (posterior-prior) in SIF and GPP magnitudes simulated by ORCHIDEE and for which the assimilation results in a joint improvement with respect to both SIF and GPP benchmark. It encompasses nine PFTs: TrDBF, TeENF, boreal forests (BoENF, BoDBF, BoDNF), TeC3GRA and TrC3GRA, C4GRA, and C3CRO. For TrDBF and TrC3GRA however, the posterior amplitude of the SIF and GPP temporal profiles remains lower than that recorded by OCO2-SIF or estimated by FLUXCOM-GPP.
The second group of PFTs (G2: TeEBF and TeDBF) corresponds to those exhibiting a coherent change in simulated magnitude for SIF and GPP after assimilation, but for which the optimized GPP further departs from FLUXCOM. For these PFTs, the prior model was already in good agreement with FLUXCOM estimates. The change in modeled SIF magnitude required to fit OCO-2 data results in a strong underestimation/overestimation with respect to FLUXCOM-GPP for TeEBF/TeDBF. Note that for TeDBF, the increase of SIF amplitude during the growing season in the optimized model still underestimates OCO-2 SIF amplitude. Importantly however, the posterior GPP overestimates FLUXCOM and is degraded compared to the prior model-data fit. This apparent inconsistency between the model, OCO-2 SIF and FLUXCOM, still need to be resolved, particularly given that these PFTs are well represented by eddy covariance tower sites in the FLUXCOM product.
The third and fourth groups (G3: TrEBF and BoC3GRA; G4: C4CRO) correspond to PFTs for which the model optimization results in opposite variation of SIF and GPP. While G3 shows an improved agreement between the optimized ORCHIDEE model and FLUXCOM, G4 shows an increased mismatch but only for the validation pixels.
With regard to the seasonality, we mostly observe an enhanced agreement between the optimized model and both OCO-2 SIF observations and FLUXCOM-GPP estimates (higher correlation coefficients for 11 PFTs out of 14). For several PFTs (TeENF, TeDBF, BoENF, BoDBF, BoDNF), the optimized model however simulates a shifted carbon uptake period as compared to FLUXCOM (later start and later senescence), although it captures reasonably well what is actually monitored by OCO-2 SIF.
The fact that we improve the consistency between ORCHIDEE-GPP with FLUXCOM for tropical tree and most herbaceous PFTS and degrade it for some temperate and boreal tree PFTs, while always improving the modeled SIF with respect to OCO-2 data, may indicate that we estimated sub-optimal parameter values. This may also point to a deficiency in our model to represent a biome dependency with respect to the PSII fluorescence yield at the leaf scale or to the radiative transfer.

GPP Spatial Dynamic at Global Scale
The maps in Figure 3 show the changes in ORCHIDEE annual mean SIF and GPP for all continental grid points after assimilation (over 2015-2016 for SIF, and 2008 for GPP), compared to OCO-2 SIF observations 10.1029/2019JG005040

Journal of Geophysical Research: Biogeosciences
and FLUXCOM-GPP estimates. The a priori ORCHIDEE model already reproduces OCO-2 SIF observations reasonably well in terms of spatial patterns and annual magnitude, with however a tendency to underestimate SIF annual-mean values over the tropical rain forests (in particular in the Amazon and Congo basins) and to overestimate SIF in the surrounding regions (Figures 3a and 3b). ORCHIDEE SIF also underestimates SIF product over the temperate Northern Hemisphere (eastern part of the United States and western Europe). Of particular note, OCO-2 exhibits about as high SIF values over the temperate Northern Hemisphere as over the tropics (values above 1.2 W.m −2 .μm −1 .sr −1 ), while

Journal of Geophysical Research: Biogeosciences
FLUXCOM shows a ratio of GPP annual magnitude of about two between the tropics and temperate Northern Hemisphere. Otherwise, the spatial gradients of FLUXCOM-GPP ( Figure 3f) and OCO-2-SIF appear consistent (despite the different periods considered; see also Sun et al., 2018), with a spatial correlation coefficient between the annual means of 0.8. The ORCHIDEE simulations show a higher spatial correlation between prior SIF and prior GPP magnitudes (0.95). The spatial correlation between prior ORCHIDEE-SIF and OCO-2 (0.78) is also slightly lower than the spatial correlation between prior ORCHIDEE-GPP and FLUXCOM-GPP (0.84).
The assimilation of OCO-2 fluorescence estimates decreased the modeled yearly SIF for most regions (blue pixels in Figure 3d), which generally goes along with an increased consistency with respect to OCO-2 SIF. The model improvement occurs at most grid points (red pixels in Figure 3e), but a degradation can be seen for some, in particular in Europe and Asia. In parallel, we observe a redistribution of the modeled GPP at large scale (Figure 3i), with typically an increase in the U.S. corn belt mostly as well as over boreal regions but to a lower extent, and a decrease elsewhere (even more pronounced in South Africa and South America). Where simulated SIF and GPP are both decreased, their joint reduction seems proportional over most regions (varying between 10% and 40%), but a higher decrease in GPP (above 65%) than in SIF (about 50%) can be seen over the tropics (South American and Africa). The large regions in Africa and South America surrounding the tropical rain forest (composed mainly of TrDBF, TrC3GRA, C4GRA, and C3 and C4 crops) are those where the decrease both in SIF and GPP is the most important. Over the tropical rain forests (Amazonia and Africa), noticeably, the model optimization mostly leads to an increased annual SIF and an opposite reduction of yearly GPP (which goes along with an enhanced agreement of ORCHIDEE with FLUXCOM-GPP, as already noticed above with Figure 2).
The reduction in RMSD with respect to FLUXCOM-GPP shows more contrasted results than for SIF with a larger number of pixels with increased RMSD (blue pixels in Figure 3j That positive trend in ORCHIDEE is due to the fertilization effect of increasing CO 2 atmospheric concentrations to which the model is sensitive (Sun et al., 2019), although moderated by a downregulation effect that mimics the role of nitrogen limitation on photosynthesis in the version used, and to the increasing surface temperatures (Anav et al., 2015). In parallel, the interannual variation (IAV) was increased from 1.67 to 1.93 GtC.yr −1 over the 1990-2009 period, which remains higher than FLUXCOM IAV (0.31); over 2001-2009 however, ORCHIDEE-GPP IAV was decreased from 1.43 to 1.19 GtC/year which is closer to that estimated in FLUXSAT (1.25 GtC/year) but departs from the mean IAV of an ensemble of land surface models during the 2000-2010 period in Chen et al. (2017) (1.48).

GPP at the Regional and Ecozone Scales
The latitudinal and ecozonal changes in GPP monthly averages following the assimilation of OCO-2-SIF (2015-2016) are detailed over seven land regions in Figure 4 and illustrated for year 2008. We focused on three latitudinal bands (temperate north (30°-60°), tropics north (0°-30°), and tropics south (0°-30°)) as well as four bioclimatic zones inferred from the grouping of several Koppen Geiger classes derived from Peel et al. (2007). The GPP annual budget simulated by ORCHIDEE was reduced after assimilation for all regions except boreal biomes where a slight increase is observed.

SIF Constraint on Parameter Values and Uncertainties
The large changes in the optimized parameter values relative to the prior values for photosynthesis (mainly V cmax25 , A1, SLA-see Table 3 for their definition) and phenology (LAI MAX , T sen , K LAI-happy , T leaf-init , L age-crit for all PFTs but C3 and C4 grasses and crops) parameters highlight how SIF observations constrain different processes ( Figure 5). , and compared to ORCHIDEE simulations before (blue) and after (orange) grid-point-scale assimilation of OCO-2 SIF estimates. The annual budget associated to each GPP estimate is provided for each region as well as the correlation coefficient (R) between ORCHIDEE simulations and FLUXCOM/FLUXSAT, respectively. The map inserts illustrate each analyzed region. Three latitudinal bands are considered: temperate north (30°-60°), tropics north (0°-30°), and tropics south (0°-30°). The bioclimatic zones are grouped in the following way: "tropical biomes" correspond to class A in the Koppen Geiger scheme (Peel et al., 2007); "arid biomes" correspond to class B, "temperate biomes" correspond to class C plus the a and b subgroups of the D (cold) class (i.e., with hot and warm summer, respectively), and "boreal biomes" correspond to the c and d subcategories of the D class (i.e., cold and very cold winters).

Journal of Geophysical Research: Biogeosciences
The increase of the modeled SIF magnitude required to match the OCO-2 retrievals for TrEBF, TeDBF, and BoDBF PFTs (Figure 2a) is tied to a noticeable increase of V cmax25 , which boosts the photosynthetic capacity. While a parallel increase in leaf area is obtained for TeDBF and BoDBF (through the increase of both SLA and LAI max ), a decrease is observed for TrEBF (SLA and LAI max are decreased). For that matter, SLA and maximum LAI decreased for all PFTs except TeDBF and BoDBF (plus TrDBF, TeEBF, and C4 for maximum LAI but to a lesser extent). Noticeably, the largest joint increase of V cmax25 and SLA are obtained for two aforementioned PFTs, TeDBF and BoDBF, while ALA reaches its lower bound. The highest number of "edge-hitting" parameters-that is, parameters whose posterior values reach one of the limits of their predefined bounds-is also obtained for these PFTs. The impacted parameters are V cmax25 , C ab , and ALA. It consequently limits the potential model improvement in terms of increase of carbon uptake and explains the residual underestimation of the modeled SIF amplitude regarding to OCO-2 time series for these PFTs. The general increment in T sen (for all PFTs but TeC3GRA) and the general decrease of both L age-crit and L fall contribute to advance the start of senescence and increase the leaf turnover. The changes in the parameters controlling the start of the growing season and the increase in leaf biomass (K LAI-happy and T leaf- Journal of Geophysical Research: Biogeosciences init ) exhibit a general increase over the PFTs, except noticeably for temperate and tropical C3 grasses. Overall, the result is a shortening of the growing season for deciduous (temperate and boreal) and grass and crop PFTs.
The SIF-related parameters C ab and ALA show strong and often opposite changes, with a large variability between PFTs. We recall that these two parameters only contribute to the SIF observation operator without any feedback on the photosynthesis and phenology processes. There are several cases where opposite changes in parameters result in opposing changes in the modeled SIF magnitude: for instance, the increase/decrease of V cmax25 /SLA which results in a SIF decrease for BoDNF, TrC3GRA, BoC3GRA, and C3 and C4 crops, while it corresponds to a SIF increase for TrEBF. These opposite changes highlight nonlinear interactions between optimized parameters and confounding effects (e.g., MacBean et al., 2018). This makes it difficult to interpret which parameters contribute the most to the model improvement relative to SIF data, and hence the change in GPP. However, the relative comparison of the uncertainty reductions between parameters and PFTs somewhat inform on those who are the most constrained by SIF data. The highest uncertainty reductions (above 50%) are achieved for the V cmax25 , A1, SLA, T sen , L age-crit , and ALA parameters, but not always for the same PFT. The lowest uncertainty reductions (below 10%) are obtained also for B1, as well as for the phenology-related parameters K LAI-happy , Tl eaf-init , GDD crit , Hum sen , Hum nosen , and L ageMin-sen , and for the C ab parameter of the SIF model. The reasons for these low uncertainty reductions may however be different. They may express a low sensitivity of the model to some of the parameters (GDD crit , Hum nosen , LAI MAX , for some PFTs for instance, or C ab that has little influence on the TOC fluorescence in the near infrared).

SIF Constraints on GPP-Related Parameters and Processes
Our assimilation framework of SIF data has achieved a strong reduction in the global mean annual GPP simulated with the ORCHIDEE model. At global scale, the reduction of GPP amounts to 28 GtC.yr −1 on average over the 1990-2009 period (from 162 to 134 GtC/year). For the sole year 2008, the reduction is of 28.8 GtC, which is comparable to the correction achieved by MacBean et al. (2018) by assimilating GOME-2 SIF data over 2007-2011 with ORCHIDAS also, based however on a simple statistical observation operator that does not use any mechanistic SIF model, and a different setup (with an older version of ORCHIDEE).
Although the analysis of the estimated parameter values after assimilation does not enable to really identify the main parameters causing the change in GPP for all PFTs, the reduction in GPP budget are attributed to a decrease of GPP magnitude (due to the lowering of photosynthetic capacities and leaf area) and/or of the seasonal cycle length depending on PFT. The decrease in GPP trend is attributed to the parameters controlling the GPP magnitude, and which optimization resulted in a reduced sensitivity of GPP to the increase of atmospheric CO 2 concentration. The difficulty to identify the driving parameters is revealed by error correlations between optimized parameters and relatively low and equal sensitivity of SIF to some parameters. These aspects partly explain the rather low uncertainty reduction on parameters, which is generally below 50%. Our results differ from those of MacBean et al. (2018) who found higher constraint levels on ORCHIDEE parameters brought by SIF observations, which was also partly due to their larger prior parameter bounds.

Leaf-Scale Modeling of SIF
The development and optimization of our parametric model of NPQ relies on leaf-scale fluorescence measurements over a limited range of vegetation species. In addition, we have made the intrinsic hypothesis that the processes driving non-photochemical quenching will operate similarly across all vegetation types and climates. Extensive measurement campaigns would be required to validate this assumption and, if need be, to improve our modeling framework. The parametric model also does not enable us to represent various factors that could impact NPQ and the link between SIF and GPP: (1) stresses other than light-limiting photosynthesis (e.g., water stress; van der Tol et al., 2014; which is however for the moment directly applied on V cmax and on the stomatal conductance), (2) the possible sustained decrease of photochemical quenching due to photoinhibition, (3) the partitioning of NPQ between sustainable and reversible mechanisms that operates at different time scales and with different kinetics (Porcar-Castell et al., 2014), or (4) the possible variation in the partitioning of energy between PSI and PSII (parameter c, in equation (4)) and its potential effect 10.1029/2019JG005040

Journal of Geophysical Research: Biogeosciences
on PSI contribution, NPQ, and coupling between SIF and GPP. Obviously, the scientific community would strongly benefit from a full characterization of these factors across ecosystems and in response to different sources of stress, and the subsequent development of a physical model of NPQ that would incorporate these processes.
The values of the rate constants (k D and k F ) and so of the maximum yield of photochemistry (ΦP PSII MAX ), here derived from SCOPE, are empirical and are also subject to uncertainty: a value of 0.8 for ΦP PSII MAX was here consistently used in our model, whereas this parameter is known to vary slightly between species with mean values around 0.83 (Baker, 2008;Björkman & Demmig, 1987). The parameters could be even biome specific (Frankenberg & Berry, 2018). The sensitivity of the ORCHIDEE SIF model to those parameterizations will be further investigated as they impact the upscaling from SIF to GPP. These parameters were not optimized in this study, to reduce the complexity of the optimization, but they should be considered in the future.

Canopy-Scale Modeling of SIF
The radiative transfer to simulate top of canopy SIF, based on SCOPE, was simplified in order to speed up the computation time. Although it is simpler than a recent machine learning approach applied to the SCOPE model (Verrelst et al., 2017), the fairly good performance of the optimized parametric SIF model to reproduce SCOPE SIF simulations validates the use of this approach.
The SIF of plant canopies simulated by ORCHIDEE assumes that it is essentially determined by the illuminated leaves of the top layer. This simplification is questionable given that the distribution between illuminated and shaded leaves may influence the total fluorescence flux. It is however justified by the fact that the major part of the canopy fluorescence originates from the top layers. For example, Thum et al. (2017) showed that more than 85% of the SIF is produced in the uppermost layer of their model. Yang and van der Tol (2018) also showed that SIF observed at the top of canopy at 760 nm largely originates from first-order scattering and thus from sunlit leaves, mainly situated at the top layers.
SIF simulations with ORCHIDEE were performed at the time of the satellite overpass. Possible disagreements between meteorological forcing and actual meteorological conditions at the synoptic scale (e.g., instantaneous cloud coverage impacting the PAR) were smoothed in the monthly averaged output.
The major limitation of the ORCHIDEE SIF model, as highlighted earlier, is its lack of biome dependency (a feature that is also not intrinsically represented in SCOPE). Actually, differences in leaf optical properties and canopy structure between vegetation species influence the radiation regime within the canopy and hence the escape probability of emitted SIF photons Guanter et al., 2014;Sun et al., 2018;Zhang et al., 2016). We have initially chosen to have a representation of the SIF radiative transfer that is generic across all PFTs by not constraining the prior distribution and uncertainty of the two SIF-related parameters C ab and ALA, hence relying on the data assimilation system to handle possible PFT dependencies. Note that this approach is somewhat similar to MacBean et al. (2018) who did not consider any a priori PFT dependency of the two empirical parameters of the SIF-GPP relationship, or to Norton et al. (2018) who assumed the structural parameters of SCOPE to be global. The optimized values of C ab and ALA show large differences between PFTs. However, correlations between parameters during assimilation do not allow us to constrain sufficiently the distribution of the parameter values given that opposing changes in SIF and GPP are observed for some PFTs. This implementation can be improved given new evidence of different biome dependencies in the SIF-GPP relationship (Guanter et al., 2012;Sun et al., 2018;Zhang et al., 2016). That biome dependency may be attributable to both different physiological regulation mechanisms and differences in the radiative transfer within the canopy due to specific biome characteristics in terms of leaf optical properties and canopy structure that influence the amount of re-absorption and scattering of the chlorophyll fluorescence. The influence of structural effects is for instance supported by the recent work of , who observed directional variations in SIF monitored by OCO-2 in target mode, the behavior of which depended on the vegetation type. Therefore, with the current framework, it seems necessary to prescribe biome-specific prior parameter values and bounds given that they indirectly impact GPP due to parameter interactions within the assimilation procedure. Additionally, introducing a leaf-clumping parameter that characterizes the grouping of leaves within the canopy (Nilson, 1971) could bolster the representation of PFT-specific structural effects on top of canopy SIF, given that ecosystems exhibit distinct clumping index values (Chen et al.,

10.1029/2019JG005040
Journal of Geophysical Research: Biogeosciences 2005). Note that, in SCOPE, the clumping index equals 1, which corresponds to a random grouping of leaves. The implementation of a leaf clumping parameter is feasible with the current version of ORCHIDEE relying on a big-leaf approach, but would necessitate cautious constraining of the covariations of that parameter and those of C ab and ALA.

SIF Model Coupling With ORCHIDEE
In the current implementation, C ab and ALA do not impact the photosynthesis module in ORCHIDEE through the calculation of the absorbed PAR or the extinction coefficient of incoming radiation within the canopy. They therefore help to absorb possible bias in the observation data (Bacour et al., 2019;Köhler et al., 2015; and act similarly to the offset parameter of the SIF-GPP linear model in MacBean et al. (2018). The next logical step is to have a consistent accounting of these parameters between the fluorescence and photosynthesis modules.
In the future, the radiative modeling of SIF in ORCHIDEE will benefit from and evolve with the model development. In particular, the ORCHIDEE-CANopy version (Naudts et al., 2015) that represents the vertical distribution of leaves use a more physical description of the radiative transfer within the canopy (two stream model; Pinty et al., 2006). Similarly, we should also include the consideration of the quality of the incoming photosynthetically active radiation (direct versus diffuse light) that largely influences the SIF emission. Also, the upcoming version of ORCHIDEE will include the nitrogen cycle (Vuichard et al., 2018) and therefore will better represent nutrient limitations on CO 2 fertilization (He et al., 2017) . This should enable to better represent the covariations of V cmax and leaf chlorophyll content, and hence the impact of C ab on the SIF signal. Indeed, leaf chlorophyll content is related to leaf nitrogen content (Gholizadeh et al., 2017;Yoder & Pettigrew-Crosby, 1995) with a dependence that may however vary depending on the season, species, and allocation of nitrogen for different plant processes Ghimire et al., 2017;Kalacska et al., 2015). In parallel, a more direct relationship between C ab and V cmax shall be implemented to represent the dynamic of leaf chlorophyll content Houborg et al., 2013;Migliavacca et al., 2017).

Summary and Conclusion
We have presented the development of a process-based model for solar-induced fluorescence implemented in the ORCHIDEE TBM. It relies (1) on the simulation of the regulation of the PSII fluorescence quantum yield at the leaf level, following the lake model formalism and using a novel parameterization of NPQ as a function of temperature, PAR, and the normalized quantum yield of photochemistry, and (2) a parametric simplification of the SCOPE model to emulate the radiative transfer of chlorophyll fluorescence to the top of the canopy. The prior ORCHIDEE-SIF model enabled to simulate levels and temporal variations of SIF already in agreement with those recorded by OCO-2 for most PFTs.
We applied a Bayesian assimilation framework to assimilate OCO-2-SIF products and hence optimize ORCHIDEE over an ensemble of grid points for all vegetation PFTs. It resulted in a substantial improvement of the simulated amplitudes and temporal seasonality for all PFTs, and of the simulated spatio-temporal patterns of yearly SIF with respect to OCO-2 observations.
Through the optimization of photosynthesis-and phenology-related ORCHIDEE parameters the impact on the simulation of GPP was considerable with a decrease of the global budget by 28/29 GtC.yr −1 in average over the 1990-2009/2001-2009 periods (from 162/165 to 134/136 GtC/year). The global GPP budget of the optimized model was then in closer agreement with that of FLUXCOM estimates (121 GtC/year over 1990-2009) and even more with that of FLUXSAT (137 GtC/year over 2001. The reasons for this large change vary between PFTs. For deciduous trees and C3 and C4 grasses and crops, it is attributed to a decrease of the growing season length; for tropical forests, temperate and boreal evergreen trees, and C3 and C4 grasses and crops, it can also be related to a reduction of the leaf area. At global scale, this translated into a decrease of the gross primary productivity over the tropics principally, Europe, and over a large area covering the northwestern part of North America; at the opposite, an increase in GPP was obtained in the U.S. corn belt and over boreal Asia.
Despite the strong parallel improvements of the optimized model both regarding OCO-2 SIF data and independent FLUXCOM-GPP estimates, the assimilation however failed to reconcile these two data sets and the ORCHIDEE model for temperate and boreal deciduous forests. Indeed, the increase of the simulated 10.1029/2019JG005040

Journal of Geophysical Research: Biogeosciences
SIF amplitude required to match OCO-2 time series for those PFTs resulted in a parallel increasing of the modeled GPP which in turn degraded the model agreement with FLUXCOM-GPP estimates.
Uncertainties in the FLUXCOM-GPP and OCO-2 SIF estimates put aside, this may point out some model deficiencies that can be explained by an additional biome dependency of the SIF-GPP relationship (Guanter et al., 2012;Sun et al., 2018;Wood et al., 2017;Zhang et al., 2016) not represented in ORCHIDEE (nor in the SCOPE model).
The validation of SIF models and of their representation/parameterization of the processes regulating plant photosynthesis and fluorescence emission is critical for using SIF estimates in data assimilation systems. The biome dependency in particular needs to be resolved to enhance the observational constraint brought by space-borne SIF products on vegetation productivity and reinforce the predictive capabilities of TBMs relative to the fate of the carbon cycle.