The ECCO‐Darwin Data‐Assimilative Global Ocean Biogeochemistry Model: Estimates of Seasonal to Multidecadal Surface Ocean pCO2 and Air‐Sea CO2 Flux

Quantifying variability in the ocean carbon sink remains problematic due to sparse observations and spatiotemporal variability in surface ocean pCO2. To address this challenge, we have updated and improved ECCO‐Darwin, a global ocean biogeochemistry model that assimilates both physical and biogeochemical observations. The model consists of an adjoint‐based ocean circulation estimate from the Estimating the Circulation and Climate of the Ocean (ECCO) consortium and an ecosystem model developed by the Massachusetts Institute of Technology Darwin Project. In addition to the data‐constrained ECCO physics, a Green's function approach is used to optimize the biogeochemistry by adjusting initial conditions and six biogeochemical parameters. Over seasonal to multidecadal timescales (1995–2017), ECCO‐Darwin exhibits broad‐scale consistency with observed surface ocean pCO2 and air‐sea CO2 flux reconstructions in most biomes, particularly in the subtropical and equatorial regions. The largest differences between CO2 uptake occur in subpolar seasonally stratified biomes, where ECCO‐Darwin results in stronger winter uptake. Compared to the Global Carbon Project OBMs, ECCO‐Darwin has a time‐mean global ocean CO2 sink (2.47 ± 0.50 Pg C year−1) and interannual variability that are more consistent with interpolation‐based products. Compared to interpolation‐based methods, ECCO‐Darwin is less sensitive to sparse and irregularly sampled observations. Thus, ECCO‐Darwin provides a basis for identifying and predicting the consequences of natural and anthropogenic perturbations to the ocean carbon cycle, as well as the climate‐related sensitivity of marine ecosystems. Our study further highlights the importance of physically consistent, property‐conserving reconstructions, as are provided by ECCO, for ocean biogeochemistry studies.

Plain Language Summary Data-driven estimates of how much carbon dioxide the ocean is absorbing (the so-called "ocean carbon sink") have improved substantially in recent years. However, computational ocean models that include biogeochemistry continue to play a critical role as they allow us to isolate and understand the individual processes that control ocean carbon sequestration. The ideal scenario is a combination of the above two methods, where data are ingested and then used to improve a model's fit to the observed ocean, also known as, data assimilation. While the physical oceanographic community has made great progress in developing data assimilation systems, for example, the Estimating the Circulation and Climate of the Ocean (ECCO) consortium, the biogeochemical community has generally lagged behind. The ECCO-Darwin model presented in this paper represents an important technological step forward as it is the first global ocean biogeochemistry model that (1) ingests both physical and biogeochemical observations into the model in a realistic manner and (2) considers how the nature of the ocean carbon sink has changed over multiple decades. As the ECCO ocean circulation

Introduction
The ocean plays a vital role in regulating the global climate system and mitigating climate change. This has motivated numerous studies to quantify and monitor trends and patterns in the oceanic sink of atmospheric carbon dioxide (CO 2 ), which has absorbed roughly 48% of anthropogenic CO 2 emissions during 1800-1994 (Sabine et al., 2004). For example, based on a survey of inorganic carbon distribution in the 1990s, the global oceanic carbon uptake has been estimated at 118 ± 19 Pg C year −1 . More recent estimates have been inferred from model-based methods, yielding mean uptake rates of 2.3 ± 0.6 Pg C year −1 (Khatiwala et al., 2009) and 2.6 ± 0.3 Pg C year −1 (Gruber et al., 2019). Although results from these studies suggest an increase in oceanic CO 2 uptake over the last several decades (DeVries, 2014;DeVries et al., 2017DeVries et al., , 2019Khatiwala et al., 2013;Sarmiento & Gruber, 2002), storage rates may depend on complex biologically driven feedbacks (Riebesell et al., 2007) and show considerable spatial variability (Gruber et al., 2019), rendering the long-term efficiency of the ocean CO 2 sink uncertain (Landschützer et al., 2015;Le Quéré et al., 2010;Munro et al., 2015). Quantifying the magnitude and time-space variability of the oceanic CO 2 sink has been recognized as an important goal in the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC, 2013). Addressing this goal will improve predictions of the future climate trajectory, assist in the formulation of climate-related policies, and help implement mitigation and adaptation strategies.
As the global carbon budget represents the time-space integration of small-scale processes, a mechanistic understanding of the regional drivers of air-sea CO 2 exchange  at seasonal to multidecadal timescales is required for assessing the response and potential vulnerability of the oceanic CO 2 sink to climate change (Ciais et al., 2013). Although CO 2 exchange across the air-sea interface is difficult to measure directly, observations of oceanic and atmospheric partial pressure of carbon dioxide (pCO 2 ) have increased rapidly over the last decade Pfeil et al., 2013;Takahashi et al., 2009). However, these measurements remain sparse and exhibit weak temporal and spatial coherence, preventing their use in estimating oceanic CO 2 sink variability over the time-space scales required for detecting anthropogenic trends . This is especially problematic given that air-sea CO 2 fluxes also exhibit substantial spatiotemporal variability Gray et al., 2018;Gruber et al., 2002;Lovenduski et al., 2015;Wanninkhof et al., 2013). Furthermore, multidecadal time series of surface ocean pCO 2 only exist at a few locations in the global ocean (e.g., the Tropical Atmosphere Ocean Array, Hawaii Ocean Time Series, and Bermuda Atlantic Timeseries Study) (Bates et al., 2014;Dore et al., 2009;Feely et al., 2006).
To address these deficiencies, extensive efforts have been made to combine internally consistent databases of surface ocean pCO 2 measurements with gas transfer velocity parameterizations (Wanninkhof, 1992) and various interpolation methods, resulting in global maps of surface ocean pCO 2 and air-sea CO 2 flux (Landschützer et al., 2014(Landschützer et al., , 2016Rödenbeck et al., 2013;Takahashi et al., 2002Takahashi et al., , 2009. However, despite the broad-scale consistency between these interpolation-based products, particularly in the equatorial Pacific Ocean , estimates from these methods suffer from a number of methodological uncertainties and uneven observational sampling , which can lead to diverging results in data-sparse regions (e.g., the Arctic and Southern Ocean) (Ritter et al., 2017;Rödenbeck et al., 2015;Schuster et al., 2013).
In contrast to interpolation-based methods, ocean biogeochemical models (OBMs) (Aumont et al., 2015;Galbraith et al., 2010;Stock et al., 2014) have the ability to resolve the spatiotemporal scales necessary for attributing air-sea CO 2 fluxes to their respective mechanisms (Ito & Follows, 2013;Lauderdale et al., 2016;Takahashi et al., 2002). Numerous hindcast OBMs, forced with time-varying atmospheric boundary conditions, have been previously used to assess oceanic CO 2 sink variability (Le Quéré et al., 2003Quéré et al., , 2018Sarmiento et al., 2010), but these models do not assimilate physical and biogeochemical observations. Additionally, data-assimilative OBMs either are limited to regional studies (e.g., the Southern Ocean, Verdy & Mazloff, 2017) or have global coverage but are integrated over short time periods (e.g., 2009periods (e.g., -2011periods (e.g., , Brix et al., 2015. With the advent of global carbon cycle data assimilation systems, such as the National Aeronautics and Space Administration Carbon Monitoring System Flux project Liu et al., 2014Liu et al., , 2017, there is a need for data-assimilative OBMs that resolve fine-scale regional patterns of air-sea CO 2 flux over seasonal to multidecadal timescales. In this paper, we describe our efforts to update and improve ECCO-Darwin, a global ocean biogeochemistry model that assimilates both physical and biogeochemical observations. ECCO-Darwin is based on (1) an adjoint-based ocean circulation estimate provided by the Estimating the Circulation and Climate of the Ocean (ECCO) consortium and (2) a realistic ecosystem model provided by the Massachusetts Institute of Technology (MIT) Darwin Project. Here we significantly improve ECCO-Darwin from the pilot study of Brix et al. (2015) by extending the time coverage to multiple decades  and addressing important issues such as an arbitrary constraint on the global-mean carbon uptake and exaggerated seasonal and synoptic variability in Southern Ocean air-sea CO 2 fluxes. We use a low-dimensional Green's function approach to adjust initial conditions and biogeochemical parameters (six). This optimizes the model's fit to observations in a property-conserving manner (i.e., without nudging or creating artificial sources/sinks), resulting in a quantitative description of the time-varying global ocean biogeochemical state that is ideal for ocean carbon budget studies. Finally, we use this updated version of ECCO-Darwin, along with complementary interpolation-based products, to provide global and biome-scale estimates of seasonal to multidecadal surface ocean pCO 2 and air-sea CO 2 flux variability.

Overview
The ECCO-Darwin OBM (henceforth referred to as ED) described in this paper is based on a global ocean and sea ice configuration of the MIT general circulation model (MITgcm) (Marshall et al., 1997) and on the pilot study described in Brix et al. (2015). The physical ocean circulation is from the ECCO consortium (Wunsch et al., 2009), which synthesizes the MITgcm with nearly all available ocean observations since the era of satellite altimetry (~1992), and provides an adjoint-based reconstruction of the three-dimensional, time-varying global ocean and sea ice state. The ECCO circulation estimates are coupled online with the MIT Darwin ecosystem model (Dutkiewicz et al., , 2014Follows et al., 2007;Follows & Dutkiewicz, 2011), which in turn drives and interacts with marine chemistry variables. Instructions for building and running ED are given in supporting information Text S1.

ED Physical Model
For the ED physical fields, we use the ECCO LLC270 global ocean and sea ice data synthesis . ECCO LLC270 is built upon two previous ECCO efforts, ECCO v4 , and ECCO2 (Fenty et al., 2017;Menemenlis et al., 2008). Compared to the lower-resolution ECCO v4 synthesis (nominal 1°grid spacing), ECCO LLC270 has finer horizontal grid spacing (~1/3°at the equator and~18 km at high latitudes). The vertical discretization comprises 50 z levels; model integration spans January 1992 to December 2017. Terrestrial runoff along coastal boundaries is forced using the monthly climatology of Fekete et al. (2002). Since horizontal resolution is insufficient to resolve mesoscale eddies, their impact on the large-scale ocean circulation is parameterized using the Redi (1982) and Gent and McWilliams (1990) schemes.
Remotely sensed data constraints include daily along-track sea level anomalies from satellite altimetry  relative to a mean dynamic topography (Andersen et al., 2016), monthly ocean bottom pressure anomalies from the Gravity Recovery and Climate Experiment mission (Watkins et al., 2015), daily sea surface temperature (SST) (Reynolds et al., 2002), and sea ice concentration fields from passive microwave radiometry (Meier et al., 2017). The primary in situ data constraints include the global array of Argo floats Roemmich & Gilson, 2009), ship-based hydrography incorporated as monthly climatological temperature and salinity profiles from the World Ocean Atlas 2009 Locarnini et al., 2010), tagged marine mammals (Roquet et al., 2013;Treasure et al., 2017), and ice-tethered profilers in the Arctic (Krishfield et al., 2008).
To fit the above observations, the ECCO LLC270 solution minimizes a weighted quadratic sum of model data differences. This is done using the adjoint method, also known as 4-D-Var (see Wunsch et al., 2009). The control variables include initial temperature and salinity fields; time-invariant diapycnal, Redi (1982)-isopycnal, andGent andMcWilliams (1990)-thickness diffusivities; and 14-day adjustments to the 6-hourly ERA Interim (Dee et al., 2011) estimates of downward shortwave and longwave radiation, precipitation, 2-m air temperature and specific humidity, and 10-m zonal and meridional wind. In this way, ECCO LLC270 provides a physically consistent, property-conserving reconstruction of the three-dimensional, time-evolving ocean and sea ice state. The specific state estimate used in this manuscript is not fully optimized: it is the result of 42 forward-adjoint iterations. Nevertheless, the cost function has been reduced by 85% relative to the unconstrained (iteration 0) simulation. A detailed evaluation of the ECCO LLC270 iteration-42 state estimate, the so-called "ECCO standard analysis" of , is made available as Data Set S1 in the supporting information. Because the initial conditions are estimated as part of the optimization procedure, the ECCO LLC270 state estimate has negligible drift and therefore does not require spin up.
Although not yet fully optimized, the ECCO LLC270 state estimate has begun to be employed in a small number of early studies. In particular, it is being used as lateral boundary conditions for several published (e.g., Khazendar et al., 2019;Nakayama et al., 2017Nakayama et al., , 2018Nakayama et al., , 2019Wood et al., 2018) and underway regional studies. The present study is the first utilization of ECCO LLC270 in a global context. The importance of a data-constrained, property-conserving ocean state estimate for OBMs cannot be overstated. For example, Table 1 in Mikaloff Fletcher et al. (2006) shows that even a very early, coarse-resolution ECCO state estimate (Stammer et al., 2004) achieved skill comparable to, or greater than, that of more established and specialized OBMs. The ECCO LLC270 state estimate is substantially improved compared to the state estimate of Stammer et al. (2004), not only because of increased horizontal/vertical grid resolution and the inclusion of the Arctic Ocean and sea ice but also because the earlier solution did not include adjustments for model subgrid-scale parameterizations. Compared to the state estimate used in Brix et al. (2015), the LLC270 state estimate does not admit mesoscale eddies; however, it is of substantially longer duration (26 vs. 2 years) and, again because of subgrid-scale adjustments, is closer to observations. Additionally, the ECCO project is actively working on higher-quality, higher-resolution ocean state estimates so that the quality and realism of the ECCO-Darwin OBM will continue to improve as new physical state estimates become available.

ED Biogeochemistry Model
To couple the ED physical model with ocean ecology and carbon chemistry, we use the MIT Darwin Project ecosystem model previously described in Brix et al. (2015). The ED biogeochemistry includes 39 prognostic variables (supporting information Table S1) that are advected and mixed by the ECCO LL270 physical fields; currently, there are no feedbacks between biogeochemistry and circulation, such as light absorption and biomixing effects (Kunze, 2019). The coupling between ocean physics and biogeochemistry is done online at every model time step (1,200 s). The simplified Darwin ecology includes five large-to-small phytoplankton function types (diatoms, other large eukaryotes, Synechococcus, and low-and high-light adapted Prochlorococcus) and two zooplankton types that graze preferentially on either the large eukaryotes or small picoplankton. The carbon cycle is explicitly represented, along with nitrogen, phosphorus, iron, silica, oxygen, and alkalinity; carbonate chemistry is based on the efficient solver of Follows et al. (2006). Air-sea CO 2 flux is computed using the parameterization of Wanninkhof (1992) and atmospheric pCO 2 from the zonally averaged monthly National Oceanic and Atmospheric Administration Marine Boundary Layer Reference product (Andrews et al., 2014). Atmospheric iron dust deposition at the ocean surface is forced using the monthly climatology of Mahowald et al. (2009). To prevent unrealistic accumulation of particulates in bathymetric "wells" (i.e., wet grid cells surrounded horizontally by dry cells) on the model grid, we added a porous bottom boundary condition (i.e., all particulates are instantly removed once they reach the seafloor). We stress that all tracers, both physical and biogeochemical, are conserved within model numerical precision from the initial conditions to the end of the simulation.

ED Biogeochemical Improvements and Optimization
The biogeochemical observations used to evaluate and adjust ED include (1) surface ocean fugacity ( fCO 2 ) from the monthly gridded Surface Ocean CO 2 Atlas (SOCATv5, Bakker et al., 2016), (2) GLODAPv2 ship-based profiles of NO 3 , PO 4 , SiO 2 , O 2 , dissolved inorganic carbon (DIC) derived from pH and alkalinity, and alkalinity , and (3) BGC-Argo float profiles of NO 3 and O 2 (Drucker & Riser, 2016;Riser et al., 2018). Since BGC-Argo DIC and alkalinity are derived from an empirical relation between pH and salinity and therefore have larger uncertainty compared to GLODAPv2 observations, we verified that inclusion or exclusion of these observations had negligible impact (<1% change to globally integrated airsea CO 2 fluxes) before including them as data constraints.
In addition to a better fit to the above observations, we sought to fix three serious biogeochemical issues with the pilot study of Brix et al. (2015): (I1) Southern Ocean air-sea CO 2 flux synoptic variability that was strongly exaggerated compared to flask observations (see bottom panels of Figure 5 in Ott et al., 2015).
(I2) Southern Ocean air-sea CO 2 flux seasonal cycle that was much larger (Figure 1 To fix issues I1-I3 and improve the model's fit to the global biogeochemical observations, we used a Green's function approach to adjust biogeochemical initial conditions and six model parameters. Green's functions consist of forward model sensitivity experiments that express the linearized response of a forward model integration to perturbations of initial/boundary conditions and model parameters. We note that the Green's function approach (used to adjust the biogeochemistry) and the adjoint method (used to adjust the ocean physics) are both linearized least squares minimization approaches. The main difference is that Green's function approach can, in practice, only be applied to a small number of control variables (Menemenlis et al., 2005). Given the computational cost of each model sensitivity experiment, we are not able to conduct an exhaustive exploration of model parameter space. Instead, we selected a small number of sensitivity experiments that, in the authors' collective expertise, seemed the most likely to reduce model-data differences.
The first guess unadjusted experiment (Table 1, Experiment #1) used the optimized biogeochemical initial conditions and model parameters of Brix et al. (2015) but applied to the ECCO LLC270 ocean circulation estimate. To reduce the large initial drift in the model's biogeochemistry, Experiment #2 repeated the 1992-2017 model integration but starting from the 1996 biogeochemical conditions of Experiment #1. Because the Brix et al. (2015) DIC and alkalinity initial conditions were biased low compared to GLODAPv2 observations , Experiments #3-5 attempted various combinations of initial conditions with increased DIC and alkalinity and GLODAPv2 climatological initial conditions. Experiment #6 is an interim optimization (supporting information Table S1). Experiments #7-11 are additional initial condition sensitivity experiments whose primary objective was to further reduce model drift in the equatorial Pacific. Since iron is an important limiting micronutrient in several key areas of the ocean (Martin & Fitzwater, 1988), Experiments #12-13 perturbed iron dust solubility, which is poorly known, and iron scavenging rate, which can vary by orders of magnitude (e.g., Parekh et al., 2005). We also perturbed phytoplankton growth rates parameters (Experiments #14-15), which are critical for the amount of carbon taken up by these organisms. Since diatoms are one of the key components of the Southern Ocean marine food web, Experiment #16 altered their palatability to grazers in order to explore how some top-down control on the system impacts air-sea CO 2 fluxes. Finally, we also altered the amount of particulate inorganic carbon produced per particulate organic carbon (Experiment #17); this value likely varies significantly over the global ocean . Because particulate inorganic carbon production and dissolution have a large impact on alkalinity, which strongly influences pCO 2 concentrations, this parameter is a good candidate for optimization.
Using the above sensitivity experiments and the Green's function approach described in Menemenlis et al. (2005), we minimized a quadratic "cost" function of weighted model-data differences to obtain optimized biogeochemical initial conditions and model parameters. Table 1 summarizes some results from this optimization. Experiments #2-11 pertain to initial condition sensitivity experiments, while Experiments #12-17 pertain to biogeochemical model parameters. The optimized biogeochemical initial conditions are constructed as a linear combination of Experiments #2-11, according to the coefficients listed under column D (which sum to one). For Experiments #12-17, columns C-D show initial and optimized values for each adjusted parameter. The ED simulation described in this paper (Experiment #18 in Table 1) uses the optimized biogeochemical initial conditions and the optimized biogeochemical parameters of Table 1. Therefore, ED represents a well-tuned biogeochemical model simulation driven by adjoint methodconstrained ocean physics provided by the ECCO project. Figure 2 shows a scatterplot of observations versus simulations before and after the biogeochemical adjustments. The cost function reduction of ED (Experiment #18) relative to the first guess simulation based on  Brix et al. (2015) (Experiment #1) is 6%, 36%, 32%, 26%, 19%, 95%, and 96%, respectively, for surface ocean fCO 2 , NO 3 , PO 4 , SiO 2 , O 2 , DIC, and alkalinity. Being at the surface, fCO 2 has elevated synoptic variability, and therefore, the cost function reduction is smaller than the other full-depth variables; the largest cost function decrease is for DIC and alkalinity. The overall biogeochemical cost function reduction of ED (Experiment #18) relative to the first guess simulation based on Brix et al. (2015) (Experiment #1) is~68% (Column F in Table 1).
As is discussed in Menemenlis et al. (2005), the Green's function approach permits offline parameter sensitivity exploration as well as the exploration of different cost functions. For example, columns E and F of Table 1 indicate the importance of each sensitivity experiment in fitting the biogeochemical observations. Specifically, each row of columns F and G display the cost of the various sensitivity experiments relative to the Brix et al. (2015) initial conditions and parameter values (Experiment #1). For the initial condition experiments and optimized simulation (#2-11 and #18), we directly report the relative cost of each individual simulation. For the parameter sensitivity experiments (#12-17), we report the expected cost of a simulation that would start with the Brix et al. (2015) initial conditions but use the optimized value of each parameter, one at a time, while the remaining parameters retain the Brix et al. (2015) values. The expected cost for the optimized parameters is estimated based on the model sensitivity experiments under the assumption that the model response to parameter perturbations is linear. We have verified that the assumption of a linear response is approximately satisfied by comparing the expected cost of the optimized linear combination, 30.85% of Experiment #1 cost, to the actual relative cost of the optimized simulation, 31.90% of Experiment #1 (Column E, Experiment #18 in Table 1).
For the full-depth set of observations (column E), most of the cost function reduction comes from the GLODAPv2 initial conditions (Experiment #5). This is because over the 26-year integration period, there is little change to biogeochemical properties in the deep ocean. If we focus on upper ocean observations only, for example, if we only include surface ocean pCO 2 observations in the cost function (column G), the most significant cost reduction results from sensitivity experiments that reduce solution drift (Experiments #7-11) and from the adjustment of iron scavenging rate (Experiment #13). A different method to establish the contribution of each parameter to the optimized solution is to remove each parameter from the optimization, one at a time, as is done in Table 3 of Menemenlis et al. (2005). Using this method, the most important parameter for improving surface ocean pCO 2 relative to Brix et al. (2015) is the large phytoplankton growth rate, which contributes 39% cost reduction, followed by iron scavenging rate, iron dust solubility, and small phytoplankton growth rate, which contribute, respectively, 22%, 15%, and 11% cost reduction. Note that contrary to Brix et al. (2015), the present ED solution uses pure observations as constraints and does not rely on an additional arbitrary constraint for the global ocean CO 2 sink. The initial condition biases in DIC and alkalinity turned out to be the major causes for all three (I1-I3) issues in the older ED simulation. As shown in Brix et al. (2015), the Green's function approach reduces both bias and drift for biogeochemistry, similar to the adjoint-based method for the physics. We further remove any residual "large" drift by discarding the first 2 years (1992)(1993)(1994) of the ED simulation in our analysis, where large is defined relative to the interpolation-based estimates of the global ocean CO 2 sink. This residual model drift occurs primarily in the Southern Ocean.

Comparison With Interpolation-Based Products
To illustrate the performance of ED and its ability to reproduce air-sea CO 2 flux spatiotemporal variability, we compared model output with three interpolation-based surface ocean pCO 2 and air-sea CO 2 flux products: 1. The 4°× 5°monthly gridded climatological product of Takahashi et al. (2009), which uses statistical interpolation combined with an advection-diffusion equation and is referenced to the year 2000. 2. The 4°× 5°daily gridded product of Rödenbeck et al. (2013), which spatiotemporally interpolates surface ocean pCO 2 observations while remaining compatible with mixed-layer DIC dynamics. Surface ocean pCO 2 and air-sea CO 2 fluxes are first linked to each other, and to the spatiotemporal field of internal ocean carbon sources/sinks, through parameterizations of air-sea gas exchange, solubility, and carbonate chemistry, as well as a budget equation for the mixed layer DIC content. The internal ocean carbon sources/sinks are then adjusted to optimally fit the surface ocean pCO 2 field to the observations. Spatiotemporal interpolation is achieved by enforcing source/sink adjustments to be smooth; temporal interpolation is overlaid by the inherent relaxation timescales of the mixed layer DIC budget. Process parameterizations are driven by SST, wind speed, mixed layer depth climatology, alkalinity climatology, and some auxiliary variables. However, this external variability only determines features not constrained by surface ocean pCO 2 observations (e.g., day-to-day variations or variability in data-sparse regions/time periods), while the estimated surface ocean pCO 2 field in well-constrained regions/time periods is determined by the observed signal (i.e., there is no regression against environmental drivers). We use version oc_v1.6 of this product, extended in time to December 2017 based on SOCATv6 . 3. The 1°× 1°monthly gridded, smoothed product of Landschützer et al. (2013), which uses an observationbased, self-organizing map feed forward neural network (SOM-FFN). For the purpose of this work, we extend this product in time to December 2017 and adjusted the Arctic Ocean mask to be more consistent with the products of Takahashi et al. (2009) and Rödenbeck et al. (2013). The two-step SOM-FFN method first clusters the ocean into biogeochemical regions and then establishes relationships between SOCATv6 surface ocean pCO 2 observations and environmental drivers (i.e., SST and salinity, mixed layer depth, satellite-based chlorophyll a, and atmospheric pCO 2 ). More details regarding the SOM-FFN method and its evaluation can be found in Landschützer et al. (2014Landschützer et al. ( , 2016Landschützer et al. ( , 2018.
These interpolation-based products, hereby referred to in this paper as Tak09, Röd13, and Land13, respectively, are interpolated and land masked onto the ED model grid using nearest-neighbor interpolation; air-sea CO 2 fluxes are conserved during the interpolation.
Additionally, we account for the land-ocean river runoff loop of CO 2 in the interpolation-based products by subtracting an area-weighted runoff contribution from air-sea CO 2 flux in each wet grid cell, as done in Le Quéré et al. (2018) and Friedlingstein et al. (2019). Globally integrated, the runoff contribution to the global ocean CO 2 sink is 0.78 Pg C year −1 (Resplandy et al., 2018).
Statistically based interpolation methods, such as Tak09 and Röd13, are typically dominated by surface ocean pCO 2 observations and remain fairly independent of environmental driver data sets (e.g., SST, mixed layer depth, and chlorophyll a) and choice of process parameterizations. However, time-space sampling biases, particularly in data-sparse regions, can lead to the seasonal exclusion of air-sea CO 2 flux features and the creation of spurious artifacts. Regression-based interpolation methods such as Land13 allow for more complete space-time coverage by constructing nonlinear models between surface ocean pCO 2 and a suite of environmental drivers observed with more complete coverage. However, this approach can only reproduce modes of variability that are contained in the chosen environmental drivers. By comparison, ED represents dynamical interpolation, where the ocean physics and many of the biogeochemical processes that drive variability in surface ocean pCO 2 are explicitly resolved. While ED contains inherent model errors, our solution is constrained by both the observed ocean circulation (adjoint-based ECCO LLC270) and biogeochemistry (low-dimensional Green's function optimization).

Biomes
We compute area-weighted mean surface ocean pCO 2 and spatially integrated air-sea CO 2 flux in the context of the time-independent 1°× 1°open-ocean biomes described in   (Figure A1 in Appendix A). The 17 biomes are determined from coherence between SST, spring and summer chlorophyll a, sea ice fraction, and maximum mixed layer depth and therefore provide a more robust estimate of biogeochemical ocean regions compared to rectangular boundaries . All biomes are interpolated onto the ED model grid using nearest-neighbor interpolation.

Trend Analysis
In order to identify statistically significant trends in global ocean and biome-scale air-sea CO 2 flux, we perform a two-tailed Mann-Kendall test (Kendall, 1975;Mann, 1945), which is commonly used to detect monotonic trends in environmental data (Yue et al., 2002). Additionally, we modify this method to account for autocorrelation in the time series (Hamed & Rao, 1998). Trends are computed using a Theil-Sen linear regression (Gilbert, 1987) on annual mean air-sea CO 2 fluxes from 1995 to 2017; trend significance is evaluated at the 95% confidence level.

Climatological Evaluation (1995-2017)
ED (which refers to the optimized simulation for the remainder of the paper) generally reproduces the timemean large-scale spatial patterns of elevated surface ocean pCO 2 and outgassing in the equatorial regions and high-latitude uptake shown in all interpolation-based products (Figures 3 and 4). Additionally, the ED solution covers the complete Arctic domain and includes coastal regions (Figures 3a and 4a). On the regional scale, the higher-resolution ED solution exhibits more complex spatial patterns of outgassing in the Subtropical Pacific and Atlantic Oceans (Figure 4a) compared to the interpolation-based products (Figures 4b-4d). Here ED produces outgassing patches that extend 100 s of km offshore of coastal upwelling zones (e.g., Western North America, Central and South America, and northwest Africa), with reduced outgassing further offshore and near-equilibrium conditions in the convergence zones of the subtropical gyres. These outgassing features are weak or absent in the interpolation-based products, particularly in the South Pacific Ocean where surface ocean pCO 2 measurements are limited. We also note that ED and Land13 partially resolve the fine-scale coastal outgassing features located offshore of western Central America (Figures 4a and 4d). In the tropical Indian Ocean, particularly along the Arabian and Somalian coasts, ED exhibits weaker outgassing (maximum of~1.7 mol C m −2 year −1 ) compared to the interpolation-based products (Land13 maximum of~3.3 mol C m −2 year −1 ).
ED does not produce the strong outgassing patch southeast of the Bering Sea that is shown in the interpolation-based products (Figure 4), despite this region being data rich for surface ocean pCO 2 and biogeochemical profiles. Here ED results in a smaller patch of increased surface ocean pCO 2 (maximum of~384 μatm) that is shifted westward (Figure 3a), with air-sea CO 2 fluxes characterized by weak uptake or near-equilibrium conditions (Figure 4a). Additionally, Southern Ocean outgassing in ED is shifted to higher latitudes (primarily south of 60°S) compared to interpolation-based products and is generally constrained along the periphery of Antarctica, where elevated surface ocean pCO 2 is present (Figure 3a). This is primarily due to the suppression of air-sea gas exchange by winter sea ice cover, followed by outgassing during periods of sea ice melt and open water. In the Arctic Ocean, ED has the highest surface ocean pCO 2 concentrations in the Canadian Basin (Figure 3a), with weak outgassing throughout the region (Figure 4a).
Time-mean uptake is generally stronger in ED compared to the interpolation-based products, particularly in the Southern Ocean and high-latitude Pacific and Atlantic Oceans (Figure 4). Additionally, ED produces stronger uptake along many of the western boundary currents and their respective extensions (e.g., the Gulf Stream, Kuroshio, and East/West Greenland Currents). Zonally averaged air-sea CO 2 fluxes highlight the pole-to-pole patterns of ocean outgassing and uptake ( Figure 5). Compared to the interpolation-based products, ED air-sea CO 2 fluxes show enhanced zonal variability ( Figure 5, blue line in top panel), particularly between 15-45°S and 30-45°N. Additionally, ED results in stronger uptake between 60-45°S and 35-65°N ( Figure 5, bottom panel). Near the equator (7.5°S-7.5°N), the zonally averaged ED air-sea

10.1029/2019MS001888
Journal of Advances in Modeling Earth Systems CO 2 flux magnitude and variability agrees well with the interpolation-based products ( Figure 5); here ED and all interpolation-based products show peak outgassing at~4°S.

Biome-Scale Seasonality
We now zoom into the biome scale and examine the seasonal cycle of air-sea CO 2 flux ( Figure 6) and surface ocean pCO 2 ( Figure A2). For a map of the biomes used in this section see Figure A1. Additionally, monthly climatological fields of surface ocean pCO 2 and air-sea CO 2 flux for ED and the interpolation-based products are shown in supporting information Figures S6 and S7, respectively.
In terms of the amplitude and phase of the seasonal cycle, we find the closest agreement between ED and the interpolation-based products in the Pacific, Atlantic, and Indian subtropical permanently stratified biomes (STPS, biomes 4, 7, 11, 13, and 14) and seasonally stratified biomes (STSS, biomes 3 and 10). For all Pacific and Atlantic STPS and STSS biomes, ED produces air-sea CO 2 flux and surface ocean pCO 2 seasonal amplitudes that are larger than the interpolation-based products ( Figure A2), while Röd13 has the strongest seasonality of all interpolation-based products. In the equatorial biomes (EQU, biomes 5, 6, and 12), which are strongly influenced by interannual variability driven by El Niño-Southern Oscillation (ENSO) events, we find limited seasonality and poor matchup between the individual interpolation-based products and ED. In particular, ED shows a strong decrease in AEQU (biome 12) CO 2 outgassing during spring-summer, with a transition to uptake during August; this feature is coincident with increased phytoplankton blooms offshore of West Africa (not shown).
At higher latitudes in the Pacific and Atlantic subpolar seasonally stratified biomes (SPSS, biomes 2 and 9), ED produces weak outgassing and near-equilibrium conditions in, respectively, the NP SPSS and NA SPSS biomes during summer. The NA SPSS summer decrease in uptake ( Figure 6, biome 2) is primarily driven by strong outgassing on the Scotian Shelf, which is less pronounced in the interpolation-based products (supporting information Figure S2). During fall-winter, ED results in stronger uptake compared to the interpolation-based products. For these biomes, ED surface ocean pCO 2 is generally out of phase with the interpolation-based products ( Figure A2, biomes 2 and 9). In the Southern Ocean, ED results in a larger air-sea CO 2 flux seasonal amplitude of 1.25 Pg C year −1 in SO STSS (biome 15), compared to 0.19, 0.13, and 0.15 Pg C year −1 in Tak09, Röd13, and Land13, respectively. This difference is highlighted by the surface ocean pCO 2 seasonal amplitude; here the ED amplitude (~40 μatm) is a factor of~3 larger than Tak09 and Land13 and a factor~8 larger than Röd13 ( Figure A2, biome 15). Further south in the Southern Ocean SPSS biome (biome 16), the ED seasonal amplitude is in closer agreement with the interpolation-based products; however, ED results in stronger uptake during February-September. For biomes that are influenced by sea ice cover year-round (ICE biomes 1, 8, and 17), ED and the interpolation-based products diverge in both amplitude and phase.

Biome-Scale Interannual to Multidecadal Variability (1995-2017)
Next, we evaluate time series of air-sea CO 2 flux and surface ocean pCO 2 for ED, Röd13, and Land13 in each biome (Figures 7 and A3). Additionally, time series of surface ocean fCO 2 for ED and SOCATv5 and the Here ED and Röd13 show increased interannual variability in air-sea CO 2 fluxes compared to the spatially less heterogeneous Land13 product. We find that ED generally has higher mean surface

10.1029/2019MS001888
Journal of Advances in Modeling Earth Systems ocean pCO 2 and a stronger seasonal cycle compared to Röd13 and Land13 ( Figure A3, biomes 1-17). For all biomes, ED, Röd13, and Land13 all exhibit statistically significant air-sea CO 2 flux trends over the period examined (supporting information Table S2).
Similar to the seasonal results, we find good agreement in interannual air-sea CO 2 flux and surface ocean pCO 2 magnitude and trend between ED and the interpolation-based products in the Pacific, Atlantic, and Indian STSS and STPS biomes (Figures 7 and A3, biomes 3 , 4, 7, 10, 11, 13, and 14). For these biomes, the largest differences between ED and the interpolation-based products occur in NA STSS (biome 11), where the larger ED surface ocean pCO 2 seasonal amplitude biases air-sea CO 2 fluxes toward CO 2 outgassing. ED and Land13 show statistically significant trends in air-sea CO 2 fluxes for all Pacific, Atlantic, and Indian STSS and STPS biomes, with Röd13 only exhibiting a statistically significant trend in NP STSS (biome 3).
Equatorial biomes have the largest interannual variability in air-sea CO 2 flux and surface ocean pCO 2 (biomes 5, 6, and 12), with temporal variability in the equatorial Pacific biomes (PEQU) strongly modulated by ENSO events (biomes 5 and 6). ED shows considerable skill in reproducing the interannual variability shown in Röd13 and Land13, particularly in PEQU-E (biome 6). Additionally, we find that ED produces a stronger reduction in PEQU outgassing compared to the interpolation-based products during the 2015-2016 ENSO, which may be due to lower surface ocean pCO 2 during this time period ( Figure A3). Interestingly, only ED has statistically significant air-sea CO 2 flux trends in PEQU-W and PEQU-E, while both ED and Land13 obtain statistical significance in AEQU (biome 12).
For NP and NA SPSS (biomes 2 and 9), ED results in stronger uptake (factor of~3 and~1.3 larger in NP SPSS and NA SPSS, respectively) compared to Röd13 and Land13. In the Southern Ocean STSS (biome 15), ED and Röd13 have comparable statistically significant trends (−0.0091 and −0.0088 Pg C year −2 , respectively), while Land13 has a larger significant trend (−0.0128 Pg C year −2 ) (supporting information Table S2). We note that Land13 shows decreasing CO 2 uptake in SO STSS during 1996-2002, followed by an increase until 2011; this weakening and reinvigoration event is not captured by ED and Röd13 in this biome. ED interannual variability in SO SPSS (biome 16) is more muted, with Röd13 having variable but increasing uptake during 2002-2014 and Land13 showing a similar temporal response to SO STSS. All air-sea CO 2 flux trends in SO STSS fail to reach statistical significance, with Röd13 having the smallest p value (supporting information  Table S2). Similar to the seasonal results, there is limited temporal coherence between ED and the interpolation-based products in NP, NA, and SO ICE (biomes 1, 8, and 17).

Discussion
While data-based reconstructions of the global ocean carbon sink have made major advances in recent years, OMBs continue to play an important role, especially since they allow for a process-based understanding of the ocean carbon sink. The ideal scenario, however, is a combination of these two methods through some form of data assimilation. While the physical oceanographic community has made great strides in the development of data assimilation systems (e.g., ECCO), the biogeochemical community has generally lagged behind. Therefore, the work presented in this paper represents an important technological step forward in ocean biogeochemical data assimilation since ED is the first global OBM model that (1) assimilates both physical and biogeochemical observations in a property-conserving manner (i.e., by adjusting model parameters and initial conditions, as opposed to nonconserving optimal interpolation and Kalman filter approaches) and (2) considers the time-varying nature of the ocean carbon sink over multidecadal timescales.
Using only a small number of model adjustments (biogeochemical initial conditions and six model parameters), ED produces air-sea CO 2 flux estimates that show broad-scale consistency with Tak09, Röd13, and Land13, particularly in the subtropical and equatorial biomes. The low dimensionality of this biogeochemical adjustment highlights the strength of the adjoint-based ECCO LLC270 ocean physics in reproducing observed trends in the ocean carbon sink (DeVries et al., 2017(DeVries et al., , 2019McKinley et al., 2017). Compared to interpolation-based products, the higher-resolution ED simulation also allows for a better representation of fine-scale ocean features (e.g., boundary currents, fronts, and coastal regions), along with covering the complete Arctic Ocean.
One problematic aspect of model-based biogeochemical estimates is bias and drift. It is possible to reduce drift at the expense of bias by spinning up a model toward equilibrium. Alternatively, it is possible to reduce bias at the expense of drift by forcing the model solution to be very close to the observations. Various ad hoc corrections have been devised to minimize the impact of bias and drift on model solutions (e.g., Sausen et al., 1988). The Global Carbon Project (GCP) corrects for model bias and drift by subtracting a time-dependent model bias calculated as a linear fit to the annual CO 2 flux from a control simulation carried out with constant, preindustrial atmospheric CO 2 concentration (Friedlingstein et al., 2019). Instead of ad hoc corrections, bias and drift reduction can be addressed as an optimization problem whereby model parameters and initial conditions are adjusted to simultaneously reduce model bias and drift relative to observations (e.g., Tziperman & Thacker, 1989).
As demonstrated in Brix et al. (2015), a Green's function optimization can be used to reduce bias and drift in a global ocean biogeochemistry model. Unlike the proof-of-concept ED model described in Brix et al. (2015), however, the simulation presented in this paper does not require an arbitrary constraint on global mean air-sea CO 2 flux in the optimization process. Supporting information Figure S8 compares air-sea CO 2 flux time series in the  biomes, similar to Figure 7, except that we include the first 2 years (1992)(1993)(1994) of the simulations during which there is larger drifts, as well as showing time series for the simulation based on the Brix et al. (2015) initial biogeochemical conditions and parameters (Experiment #1 in Table 1) and the interim optimization (Experiment #6). Relative to Experiments #1 and #6, the optimized Experiment #18 removes most of the initialization transients in the equatorial Pacific (Biomes 5 and 6), reduces the unreasonably large seasonal cycle in the SO SPSS (Biome 16), but does not remove the Southern Ocean initialization transients (Biomes 15-17).
For the 1995-2017 period, the ED time-mean global ocean CO 2 sink is −2.47 ± 0.50 Pg C year −1 , which lies within the uncertainty of the GCP estimate of −2.24 ± 0.76 Pg C year −1 over the same time period (Friedlingstein et al., 2019) (Table 2). Additionally, we find reasonable agreement with the ocean interior carbon inversion estimate of −1.7 ± 0.4 Pg C year −1 (nominal 1995) from Gruber et al. (2009); for year 1995, the ED estimate is −1.6 Pg C year −1 . ED, Röd13, and Land13 all demonstrate statistically significant trends in the global ocean CO 2 sink for 1995-2017, with trends of −0.065, −0.045, and −0.058 Pg C year −2 , respectively (supporting information Table S2). On the biome scale, Land13 has the largest number of biomes with statistically significant trends in air-sea CO 2 flux (14 out of 17 biomes), followed by ED (11 out of 17 biomes). Long-term air-sea CO 2 flux trends in Röd13 are less frequent (4 out of 17 biomes), likely due to the direct interpolation scheme of the mixed-layer method, which tends to extrapolate high-frequency noise in data-sparse regions and can result in larger interannual variability (Landschützer et al., 2015).
We find that the ED global ocean CO 2 sink (Figure 8, black line) generally lies within the envelope of the Röd13 and Land13 estimates (Figure 8, cyan and magenta lines; Table 2). Additionally, compared to many of the OBMs used to estimate the GCP ocean sink (Figure 8, orange thin lines), ED exhibits enhanced interannual and decadal-scale variability that is comparable to the interpolation-based products; these features are not typically captured in other OBM studies (Lovenduski et al., 2008;Sarmiento et al., 2010). These results suggest that ED, combined with the estimates from Röd13 and Land13, which have been previously shown to have the best fit to observations in their representation of global and tropical variability , could be used to form an improved uncertainty band for ocean carbon sink studies. Note. All values represent uptake; units are in petagram of carbon per year. Values represent temporal mean ± one standard deviation due to interannual variability. Error estimates for the GCP ocean sink include additional uncertainty (0.5 Pg C year −1 ) from an ensemble of OBMs.
We observe the closest agreement between ED and the interpolation-based products in subtropical biomes (Figures 6 and 7,biomes 3,4,7,10,11,13,and 14), albeit with ED often having a larger seasonal amplitude. This consistency between the model, complementary interpolation-based methods, and surface ocean fCO 2 observations ( Figure A4) lends support to the estimates described here. Because subtropical surface ocean pCO 2 is strongly controlled by ocean temperature (Takahashi et al., 2002), we attribute this agreement to the ECCO LLC270 physics, which yield very realistic SST. In the equatorial Pacific Ocean, ED and the interpolation-based products show similar interannual variability, with a significant reduction in outgassing and surface ocean pCO 2 during ENSO events (Figure 7 and A2, biomes 5 and 6). These results suggest that ED is well suited for quantifying the drivers of air-sea CO 2 flux in the equatorial Pacific Ocean (Gierach et al., 2012(Gierach et al., , 2013, along with characterizing ENSO diversity (Capotondi et al., 2015). When integrated across the PEQU-W biome, ED outgassing is substantially reduced (e.g., 2010-2011) and/or completely arrested (e.g., 2015-2016) during onset/mature El Niño conditions (Figure 7, biome 5). In the PEQU-E biome, outgassing is prevalent during the entire period examined; here ED produces a weaker response to the 1997-1998 El Niño compared to Röd13 and Land13, along with showing a stronger decrease in outgassing for 2012-2017. While we find general agreement in equatorial Pacific Ocean air-sea CO 2 flux trends over interannual and longer timescales, there is substantial model-product mismatch in the seasonal cycle ( Figures 6 and A2, biomes 5 and 6). We attribute this discrepancy to spatial gradients in ED surface ocean pCO 2 and air-sea CO 2 flux in these biomes (supporting information Figures S1 and S2), which are driven by model high-frequency ocean dynamics (i.e., tropical instability waves) and phytoplankton blooms that propagate westward from the Equatorial Islands (e.g., the Galapagos) (Gierach et al., 2013;Gove et al., 2016) and the coastal periphery of Central/South America.
The largest differences in long-term air-sea CO 2 fluxes between ED and the interpolation-based products occur in the Southern Ocean and North Pacific/Atlantic Oceans (Figures 2 and 7). In the SO SPSS, the time-mean ED ocean CO 2 sink is a factor of~2 and~3 larger than Röd13 and Land13, respectively (Figure 7, biome 16). This results from strong winter uptake during March-August ( Figure 6, biome 16), which biases the long-term CO 2 sink magnitude. In contrast, recent float-based observations from Gray et al. (2018) suggest significant winter outgassing between the polar front and marginal ice zone, which they attribute to the upwelling of nutrient-and carbon-rich waters. Our model results do not support this hypothesis, as Southern Ocean winter outgassing in ED is generally weak (supporting information Figure S7) and dominated by uptake when spatially integrated (Figures 6 and 7, biomes 15 and 16). We acknowledge that air-sea CO 2 fluxes in ED and the interpolation-based products are based on a simple parameterization of ocean-atmosphere exchange through leads; air-sea CO 2 flux is scaled by the open-water fraction of the ice cover. This parameterization may lead to the misrepresentation of biological productivity and air-sea gas exchange in regions affected by sea ice cover (e.g., Semiletov et al., 2004). Additionally, errors in the ECCO LLC270 representation of winter mixed layer dynamics in the Southern Ocean (see supporting information Figures S1-S5) could also be responsible for this mismatch.
While ED has a substantial air-sea CO 2 flux trend in SO STSS (~0.009 Pg C year −1 ; supporting information Table S2), the simulation shows muted decadal-scale variability further south in SO SPSS (Figure 7, biome 16), similar to other OBM results (Lenton et al., 2013). In particular, ED does not show a weakening and subsequent reinvigoration of SO SPSS uptake for 2002-2011 (Gruber et al., 2019;Landschützer et al., 2015;Ritter et al., 2017); these trends are most pronounced in Land13 (Figure 7, biomes 16 and 17). We stress that while interpolation-based products suggest substantial decadal-scale variability in the Southern Ocean CO 2 sink, these estimates are accompanied by large uncertainties and time-space sampling biases. Further model development and evaluation, along with sustained year-round observations from Argo-BGC floats , are needed to further assess the fidelity of OBMs in the Southern Ocean.
In the NP SPSS biome, ED exhibits a smaller winter outgassing patch that lags the interpolation-based products by several months (supporting information Figures S1 and S2). The weaker outgassing patch, along with strong winter uptake ( Figure 6, biome 2), biases ED toward larger long-term uptake compared to Röd13 and Land13 (Figure 7, biome 2). While the North Pacific Ocean is typically data rich for surface ocean pCO 2 , substantial data gaps do occur in the observational record ( Figures A4 and A5, biome 2), which could alias the surface ocean pCO 2 seasonal cycle and thus lead to weaker uptake in the interpolation-based products. Additionally, interactions between the Western Subarctic Gyre, Oyashio Current, and Bering and Okhotsk Sea outflows are typically not well resolved by OBMs (McKinley et al., 2006), which could also explain model-product differences in this region. We also observe similar patterns of mismatch between ED and the interpolation-based products in the North Atlantic Ocean, with ED having stronger winter ocean uptake in the NA SPSS (Figures 6 and 7, biome 9). The systematic bias of ED toward stronger winter uptake in subpolar seasonally stratified biomes, which results primarily from the competing influence of surface cooling and divergence in mixed layer DIC (Lauderdale et al., 2016), may result from the model's physical misrepresentation of seasonal mixed layer dynamics. We compared ED mixed layer depth with the observation-based product of Hosoda et al. (2010). For the 2004-2017 period, the simulated mixed layer at high latitudes is generally deeper during winter and shallower during summer than the Hosoda et al. (2010) product (supporting information Figures S1-S5). Future efforts focusing on understanding the underlying causes of model-product mismatch in these regions are critical for improving estimates of the global ocean CO 2 sink.

Summary and Concluding Remarks
We have developed ECCO-Darwin, a data-assimilative global ocean biogeochemistry model that allows for a quantitative multidecadal (1995-2017) description of the ocean's physical, chemical, and ecological state. Model physics are provided by the adjoint-based ECCO LLC270 circulation estimate, which assimilates nearly all available ocean observations since the era of satellite altimetry and has~1/3°nominal horizontal grid spacing (~18 km at high latitudes). A low-dimensional Green's functions approach (adjusting biogeochemical initial conditions and six model parameters) is used to improve model fit to global biogeochemical observations in a property-conserving manner (i.e., without nudging or generating spurious sources/sinks). We then use ECCO-Darwin, along with a suite of interpolation-based products, to estimate seasonal to multidecadal surface ocean pCO 2 and air-sea CO 2 fluxes across the global ocean and in 17 open-ocean biomes.
Modeled air-sea CO 2 fluxes show broad-scale consistency with surface ocean pCO 2 observations and interpolation-based products in many biomes. The closest agreement occurs in the subtropical and equatorial regions. The largest disagreement occurs in subpolar regions that are characterized by large seasonal variations in mixed layer depth and where ECCO-Darwin has, in general, stronger winter uptake than the interpolation-based products. Compared to other ocean biogeochemical models, ECCO-Darwin has a global ocean CO 2 sink that agrees well with interpolation-based products in terms of both magnitude (time mean of −2.47 ± 0.50 Pg C year −1 ) and temporal variability.
While ED already provides a promising model-based framework for investigating air-sea CO2 fluxes and ocean carbon cycling, there are several areas where model and data assimilation can be substantially improved.
(1) Land-ocean aquatic continuum: Terrestrial runoff for rivers, ice sheets, and groundwater are coarsely represented in ED, do not contain nutrients and organic carbon, and do not interact with coastal ecosystems. Ongoing ED model development aims to add time-varying global biogeochemical runoff and to better parameterize coastal ecosystems in order to improve the representation of coastal (Chen et al., 2013;Cotrim da Cunha et al., 2007) and ice sheet processes (Hopwood et al., 2018).
(2) Dissolution rate and bottom sediments: Our treatment of calcium carbonate does not currently include nonlinear dissolution rate laws (Naviaux et al., 2019;Subhas et al., 2015) or account for sediment burial (Dunne et al., 2012). Improved parameterizations for these processes are required in order to better represent ocean acidification.
(3) Mesoscale and submesoscale processes: The current nominal 1/3°horizontal grid spacing of ED does not permit explicit representation of mesoscale eddies, let alone submesoscale ocean processes. As it becomes computationally practical, we expect that increased ED model resolution will permit an increasingly more realistic representation of mesoscale and submesoscale physical-biogeochemical interactions. (4) Satellite and airborne observations of ocean color: Although we have not yet included ocean color data constraints in ED, the Darwin ocean ecology package already includes a radiative transfer module , which will permit the direct utilization of satellite and airborne ocean color observations for evaluation and adjustment of ED. Future data assimilation development will include these important data constraints, with potential to improve phytoplankton phenology and thus improve the timing and magnitude of blooms. (5) Adjoint method optimization: In this study, we rely on a low-dimensional Green's functions approach to minimize biogeochemical model-data misfit; the incorporation of an adjoint-based optimization method, as is done by Verdy and Mazloff (2017) for the Southern Ocean, would allow for a larger number of control variables and for joint physical-biogeochemical optimization. Adjoint-method optimization will permit a much larger number of control variables than is possible with the Green's functions approach and hence a closer fit of ED to biogeochemical observations. Furthermore, adjoint-method optimization of the coupled physical-biogeochemical model will allow the use of biogeochemical observations for adjusting ocean physics. (6) Formal uncertainty estimates: In this manuscript we have presented a preliminary comparison with observations and observation-based products. These results suggest that ECCO-Darwin, combined with complementary interpolation-based products, can be used to form improved uncertainty estimates for ocean carbon sink studies. In particular, synthetic observations, formed by sampling ECCO-Darwin at the locations and times of available observations and adding geophysical noise, can be used to evaluate sampling biases in the interpolation-based approaches. We acknowledge that the ED simulation presented in this paper is at best eddy permitting, so this evaluation may overestimate the representation of observations required. Conversely, the interpolation-based approaches will allow us to identify and reduce ED model deficiencies.
Despite the above limitations, many of which are already being actively addressed, our modeling efforts provide a significant step forward in the development of a global ocean physical and biogeochemical estimation framework. The solution presented herein has been made available on a data server and is already beginning to be used for science and practical applications, for example, by the Carbon Monitoring System Flux project. Of equally high significance is that the model and analysis software have also been carefully documented and made available in public code repositories and therefore can be used or further developed by other researchers. As the ECCO ocean state estimates, the Darwin Project ocean ecology and biogeochemistry, and interactions between these two projects and other climate components continue to evolve, we expect ECCO-Darwin to become an ever more accurate and useful tool in support of ocean carbon, marine ecosystem, and climate-related studies. Figure A1 illustrates the geographical extent of the 17 open-ocean biomes used in this study. Figures A2-A5 provide some additional model-data comparisons in these 17 biomes.

10.1029/2019MS001888
Journal of Advances in Modeling Earth Systems Figure A2. Seasonal cycle of surface ocean pCO 2 for each biome. Surface ocean pCO 2 is time averaged for each month (for January 1995 to December 2017) and represents the area-weighted mean for each biome. Anomalies with respect to the long-term mean are shown. Similar y axis scales are used for each individual ocean basin. Figure A3. Monthly surface ocean pCO 2 time series for ED (black), Röd13 (cyan), and Land13 (magenta). Surface ocean pCO 2 represents the area-weighted mean for each biome. Thin lines show monthly values and thick lines show interannual variability (12-month running mean). Similar y axis scales are used for each individual ocean basin.