Reconciling Observation and Model Trends in North Atlantic Surface CO2

The North Atlantic Ocean is a region of intense uptake of atmospheric CO2. To assess how this CO2 sink has evolved over recent decades, various approaches have been used to estimate basin‐wide uptake from the irregularly sampled in situ CO2 observations. Until now, the lack of robust uncertainties associated with observation‐based gap‐filling methods required to produce these estimates has limited the capacity to validate climate model simulated surface ocean CO2 concentrations. After robustly quantifying basin‐wide and annually varying interpolation uncertainties using both observational and model data, we show that the North Atlantic surface ocean fugacity of CO2 (fCO2−ocean) increased at a significantly slower rate than that simulated by the latest generation of Earth System Models during the period 1992–2014. We further show, with initialized model simulations, that the inability of these models to capture the observed trend in surface fCO2−ocean is primarily due to biases in the models' ocean biogeochemistry. Our results imply that current projections may underestimate the contribution of the North Atlantic to mitigating increasing future atmospheric CO2 concentrations.


Introduction
About 50% of the carbon dioxide (CO 2 ) emitted from fossil fuel and land-use change activities is absorbed each year by the natural terrestrial and marine sinks, in similar proportions (Le Quéré et al., 2018). While the land and ocean carbon sinks play a fundamental role in controlling the levels of atmospheric CO 2 , and hence mitigating climate change, the limited number and spatial coverage of the CO 2 observations makes the quantification of their intensity and variability at both the global and regional scale challenging. Due to the land spatial heterogeneity, which is difficult to capture by point measurements (especially compared to the relatively well-mixed ocean), the land sink is difficult to quantify and is commonly estimated as the residual from total emissions, atmospheric CO 2 growth, and ocean sink (Le Quéré et al., 2018). As such, robust estimates of the marine carbon sink's response to natural variability and climate change must be produced, so that (1) change in the CO 2 airborne fraction can be quantified and (2) the relationship between anthropogenic CO 2 emissions and atmospheric CO 2 concentrations represented in models can be confidently evaluated.
The North Atlantic Ocean is a region of strong natural and anthropogenic CO 2 uptake (Khatiwala et al., 2013;Mikaloff Fletcher et al., 2007;Sabine et al., 2004). While the large-scale processes controlling atmospheric CO 2 uptake by the North Atlantic are well understood, estimates of the time-varying sink over recent decades range from a decline (Schuster & Watson, 2007), through variability (Bates et al., 2014;Gruber et al., 2009), to an increase (Landschützer et al., 2016;Schuster et al., 2013;Ullman et al., 2009). These differences are likely to arise from the choice of study interval McKinley et al., 2011), internal climate variability (McKinley et al., 2011(McKinley et al., , 2016, and limited quantification of the uncertainty arising from the interpolation of CO 2 observational coverage . Indeed, despite international efforts to provide a well-distributed CO 2 observational coverage (both spatially and temporally), substantial gaps remain, including in the North Atlantic (Figures 1 and S1 in the supporting information). While various techniques (e.g., Landschützer et al., 2013;Rödenbeck et al., 2015;Schuster et al., 2013) have been developed to estimate the surface ocean fugacity of CO 2 (fCO 2−ocean ) away from measurement locations and have been widely used to determine air-sea CO 2 fluxes across the oceans, basin-wide and time-varying interpolation uncertainties either have not been calculated (e.g. Rödenbeck et al., 2015) or are too broad to allow the detection of significant temporal trends in surface fCO 2−ocean (Jones et al., , 2019. This lack of welldelimited interpolation uncertainty on observational-based products limits our ability to understand the marine carbon response to increasing atmospheric CO 2 and climate change, and also inhibits our evaluation of surface CO 2 concentrations simulated by climate models. By making use of the strengths of both observational and model data, here we robustly quantify basin-wide interpolation uncertainties of North Atlantic surface fCO 2−ocean from 1992 to 2014, which allows us to (1) determine whether the change in the surface ocean CO 2 concentrations is significant, and (2) robustly compare the observationbased results with those simulated by the current generation of Earth System Models (ESMs) and identify potential shortcomings in those models.
We first present the observation-based interpolation technique used to provide basin-wide fCO 2−ocean estimates and the novel interpolation uncertainty assessment (section 2). Once an appropriate method for robustly quantifying the uncertainties of the annually varying fCO 2−ocean and its trend is identified, we then determine the recent change in the North Atlantic surface fCO 2−ocean and evaluate the corresponding simulated change in state-of-the-art ESMs from CMIP5, the Coupled Model Intercomparison Project Phase 5 (section 3). Finally, we investigate the reasons behind the discrepancy between the models and the observation-based results by (1) using the models' preindustrial control runs to discuss the role of internal variability and (2) generating a set of ocean model simulations initialized with observations to study the role of biogeochemical initial conditions in driving the North Atlantic surface fCO 2−ocean trend (section 4).

The MLR Interpolation Technique
Multiple linear regression (MLR) approaches rely on the mechanisms that link the predicted variable, which is described here by the spatially and temporally discontinuous observations of surface fCO 2−ocean , to a set of explanatory variables, which could be any available basin-wide variables that are involved in the fCO 2−ocean response to anthropogenic changes, as well as physical and biogeochemical oceanic properties. To account for specific relationships between the predicted and explanatory variables across different biogeochemical regimes (e.g., a surface fCO 2−ocean signal mostly temperature driven in the subtropical North Atlantic and principally temperature and biologically driven in the subpolar region; Schuster & Watson, 2007), MLR-based studies commonly build a linear fit within each geographical regime, whose separated fCO 2−ocean results are finally merged to reconstruct the basin of interest (e.g., Iida et al., 2015;Watson et al., 2009).   in the North Atlantic, for the period 1992-2014. Waters shallower than 1,000-m depth have been removed using the ETOPO1 bedrock product (Amante & Eakins, 2015). The term "fCO 2−ocean values" refers to the monthly gridded values in SOCAT Version 4, calculated from the fCO 2−ocean observations that were submitted to the SOCAT database. See Figure S1 for a representation of the annual spatial coverage of SOCATv4 for 1992-2014.

Global Biogeochemical Cycles
Most studies assess the interpolation uncertainty through the comparison of the predicted results against the observations used to train the MLR and/or against observations that were not included in the MLR (independent products), therefore limiting the uncertainty assessment to the irregular observational coverage (e.g., Landschützer et al., 2014;Schuster et al., 2013). Substantial assumptions are therefore made when applying the "localized" uncertainty to the basin-wide and continuous predicted field; one assumption being that the interpolation method does not add any bias on the surface fCO 2−ocean trend. By contrast, Watson et al. (2009) (hereinafter W09) used, for the first time, a biogeochemical model to evaluate the basin-wide uncertainty and test for potential biases introduced by the interpolation. Specifically, the model's surface fCO 2−ocean field was subsampled at the same locations and days of the year of study as the observational coverage and was then treated as "real observations" in a separate MLR. By comparing the MLR-predicted fCO 2−ocean to the model-truth values, W09 were able to generate a basin-wide uncertainty. As such, the MLR approach and uncertainty assessment developed by W09 is here extended and improved through a set of observation-based (section 2.1.1) and model-based MLRs (section 2.1.2). 2.1.1. The Observation-Based MLR W09 built, for each 2-month interval of the year 2005 and for subdivisions of the North Atlantic of 30°, 20°, and 10°latitude, a linear relationship between the surface fCO 2−ocean and three explanatory variables: the sea surface temperature (SST), mixed layer depth (MLD), and longitude. By being directly linked to changes of CO 2 concentration in seawater, the SST and MLD are characterized as mechanistically driven explanatory variables. Indeed, the solubility of CO 2 increases as the temperature of the surface waters decreases, and the deepening of the MLD through density-influenced and/or wind-mixing events can (1) enhance the dilution within the mixed layer of additional CO 2 taken up by the surface ocean and hence stimulate further uptake; (2) bring nutrient-enriched waters to the euphotic layer, potentially enhancing photosynthesis and the CO 2 uptake; and (3) bring dissolved inorganic carbon (DIC)-enriched waters to the surface, leading to local CO 2 outgassing (Sarmiento & Gruber, 2006). The longitude explanatory variable used here, and in W09's method, is included to account for east-west differences in water properties, for example, resulting from the contrasting temperature histories of water in eastern and western boundary currents, not accounted for by the other variables.
Here, we extend the W09 MLR method by (1) optimizing the approach by testing the use of latitude band widths of 60°, 30°, 20°, 10°, 5°, 2°, and 1°in order to account for the approximately latitudinally separated biogeochemical regimes resulting from the basin's circulation but also to investigate the impact of the spatial division on the predicted fCO 2−ocean ; (2) widening the temporal extent of study from a single year to the period 1992-2014 (dates chosen to correspond to the starting year of the MLD product; Table S1); and (3) incorporating a further explanatory variable, the atmospheric CO 2 mixing ratio (xCO 2 ), in order to account for the time-varying impact of anthropogenic CO 2 emissions on the marine carbon system. The MLR analysis performed on observational products is hereinafter referred to as the "observation-based MLR." The observation-based MLR was performed using surface fCO 2−ocean from the Surface Ocean CO 2 ATlas (SOCAT) monthly gridded product Version 4 (hereinafter SOCATv4; Bakker et al., 2016), and using for the monthly SST, MLD, and xCO 2 , the Optimum Interpolation Sea Surface Temperature (OISST) Version 2 product (Reynolds et al., 2007), the Estimating Circulation and Climate of the Ocean (ECCO2) version 2 product (Menemenlis et al., 2008), and the GLOBALVIEW-CO 2 reference matrix (GLOBALVIEW-CO2, 2013), respectively, from which (1) the period 1992-2014 and the North Atlantic (defined here as 10°N to 70°N and from 75°W to 5°E) were extracted, and (2) the regions shallower than 1,000-m water depth (as in W09) were removed using the ETOPO1 Bedrock product (Amante & Eakins, 2015) to minimize the freshwater inputs from rivers and coastal effects (cf. Table S1 for further description on the observational data processing). The observation-based MLR followed two steps: 1. The explanatory variables were subsampled at the locations and times at which fCO 2−ocean values were available within SOCATv4 (in the North Atlantic over 1992-2014), referred as the "subsampled" data.
The subsampled data were used to feed the MLR as follows: where β 0 is the intercept and β 1,2,3,4 are the regression coefficients returned by the statistical model.

10.1029/2019GB006186
Global Biogeochemical Cycles 2. The β coefficients were applied to the monthly basin-wide explanatory variables, to predict the monthly basin-wide fCO 2−ocean from 1992 to 2014: The MLR computed over the 60°latitude bandwidth (corresponding to the entire North Atlantic basin width) therefore generated one set of β coefficients. To potentially improve the model's predictive skill (section 2.2), separate MLRs were trained over each latitude band of 30°, 20°, 10°, 5°, 2°, and 1°width, returning 2, 3, 6, 12, 30, and 60 sets of β coefficients (one set per subregion), respectively. The observation-based MLR analyses therefore included a total of 113 MLRs, leading to seven different monthly fCO 2−ocean estimates for the North Atlantic from 1992 to 2014 (one for each spatial division method). The uncertainty assessment, achieved through the CMIP5-based MLR analyses (section 2.1.2), will allow us to identify the optimum spatial division method that provides bias-free (i.e., nonsignificant error) North Atlantic surface fCO 2−ocean annual means and trend (section 2.2).
Assuming the CMIP5 models to be perfectly known plausible alternative worlds, we use their outputs as investigation tools. This investigation approach offers a way to test how well the interpolation method performs, crucially, at places where no observational data are available. By providing a large model diversity, the CMIP5 framework specifically allows us to statistically investigate the effectiveness of the MLR interpolation technique used in the real world. Unlike W09, which assessed the interpolation uncertainty by training a MLR on a biogeochemical-modeled fCO 2−ocean field with observational-based explanatory variables (a first attempt that unrealistically relied on the assumption that the model captures the real-world variability), the present analysis performs Observation Sampling Experiments (OSEs). By subsampling each CMIP5 model outputs (uniformly regridded) at the months and grid cells at which CO 2 observations were gathered within SOCATv4 over the period 1992-2014, OSEs allow to generate separate model-consistent MLRs, hereinafter referred as the "CMIP5-based MLRs." For each CMIP5-based MLR, the steps described in the observation-based MLR (section 2.1.1) are followed using the subsampled fCO 2−ocean model field from one of the 19 CMIP5 models and the SST, MLD, atmospheric xCO 2 fields from that CMIP5 model (and longitude, which is not a model-specific variable). Note that since the CMIP5 model simulations are conditioned with global annual atmospheric xCO 2 values, and not with seasonally and spatially varying values as in the real world, the CMIP5-based MLRs are built on model-relevant variables for consistency purposes (i.e., choosing annual atmospheric xCO 2 for CMIP5-based MLRs, while consistently selecting the seasonally and spatially varying xCO 2 for the observation-based MLR). As for the observation-based MLR, the CMIP5-based MLRs were also generated across the latitudinally divided North Atlantic. As such, the CMIP5-based MLRs generated over latitude band of 60°, 30°, 20°, 10°, 5°, 2°, and 1°width, returned a total of 19, 38, 57, 114, 228, 570, and 1,140 sets of β coefficients. For each spatial division method, the 19 reconstructed basin-wide surface fCO 2−ocean products generated by the CMIP5-based MLRs were subtracted by their corresponding model-truth fCO 2−ocean values (calculated interactively within each of the CMIP5 models), defining the fCO 2-residuals (i.e., fCO 2-residuals = f CO 2-ocean; MLR-predicted -fCO 2-ocean, model-truth ). The study of the fCO 2-residuals enables us to investigate which of the seven spatial divisions used in the MLR method provides the best predictive skills for the North Atlantic surface fCO 2−ocean annual means and the corresponding 1992-2014 trend (section 2.2).

Identifying the Appropriate MLR Setup 2.2.1. Annually Varying Uncertainty
To study whether the MLR method introduces a significant bias on the predicted annual fCO 2−ocean in the North Atlantic over the period 1992-2014, the annual time series of the fCO 2-residuals produced by each 10.1029/2019GB006186

Global Biogeochemical Cycles
CMIP5-based MLR analysis is first calculated from the basin area-weighted monthly means. Then, the annually varying model mean (thick black line in Figure 2) and standard deviation σ (dark shading in Figure 2) errors are calculated, respectively, by where the horizontal line represents the North Atlantic area-weighted monthly means, which are averaged to the year y (from 1992 to 2014) and m to the CMIP5-based MLR analysis with M the total number of models (19).
Analysis of the annual bias in the North Atlantic averaged fCO 2−ocean field derived by the MLR method ( Figure 2) shows that (1) the smaller the width of the latitude band over which the MLR is trained, the smaller the overall width of the uncertainty (the gray shadings in Figure 2 are much wider for the 60°study than for the 5°or 1°studies, with indiscernible visual improvements between the 5°, 2°, and 1°studies); (2) the uncertainty generally gets smaller over time, in line with the idea that the increase in number of fCO 2−ocean values ( Figure 1b) improves the MLR's predictive skill; and (3) the MLR analyses built on latitude bands of 5°, 2°, and 1°width present similar results, suggesting that those three statistical methods provide a similar predictive skill on annual means. A t test is used to identify whether the uncertainty over the period 1992-2014 is significantly different from a distribution of mean zero at the 5% significance level. The MLR analyses computed successively over latitude bands of 10°, 5°, 2°, and 1°width reproduce the model-averaged fCO 2−ocean time series without the addition of a significant bias (the black thick line is statistically indistinguishable from the zero red line), at the 5% significance level (Figures 2d-2g). However, for the analyses using the larger latitude bands, the MLR overestimates the model-true annual fCO 2−>ocean values at the 5% significance level (Figures 2a-2c). We conclude from this first assessment that the 10°, 5°, 2°, and 1°MLR analyses are potentially appropriate for calculating the annually varying surface fCO 2−ocean .

Temporal Trend Uncertainty
The error introduced by the MLR interpolation technique on the North Atlantic surface fCO 2−ocean trend is studied using the CMIP5-based MLR analyses. For each CMIP5-based MLR analysis, linear trends in North  (3)). The dark, medium, and light gray shadings correspond, respectively, to the 1σ, 2σ, and 3σ across the 19 annual averages of fCO 2-residuals (equation (4)). The dashed red line indicates the zero level. The t test result is indicated in each panel, where 0 means that there is no statistically significant mean bias in the residuals at the 5% significance level.

Global Biogeochemical Cycles
Atlantic surface fCO 2−ocean were calculated using the MLR-predicted (Γ MLR-predicted ) and CMIP5 modeltruth (Γ model-truth ) fields. Specifically, each basin-wide surface fCO 2−ocean product was first averaged into North Atlantic area-weighted monthly means over 1992-2014 and then averaged into annual means, from which the linear surface fCO 2−ocean trend and its associated standard error (returned by the linear fit) were obtained. By studying the differences between Γ MLR-predicted and Γ model-truth across the 19 CMIP5based MLR analyses, we can therefore quantify the error on the North Atlantic surface fCO 2−ocean trend over 1992-2014 introduced by the MLR (Figure 3).
For each of the seven different spatially divided MLR analyses, the R 2 and the root mean square error (RMSE) between Γ MLR-predicted and Γ model-truth were calculated ( Figure 3). Across the methods, the 5°latitude bands analysis provides the highest R 2 value (0.734) but most importantly the smallest RMSE (0.060 μatm/ year; Figure 3). The fact that the MLR methods on smaller latitude bands (over 2°and 1°width) provide slightly higher RMSE than the method on 5°latitude bands (Figures 3f and 3g) suggests that overfitting might be occurring. Indeed, the smaller the region over which the MLR is trained within a given period, the fewer observations are available, which could lead the MLR to become too specific to the trained data set and provide slightly poorer predictive skill for the overall population than when training the MLR over a wider region (Hastie et al., 2016).
In summary, the MLR method computed over 5°latitude bands across the North Atlantic provides unbiased (i.e., nonsignificant mean error) and robust results for both the annually varying surface fCO 2−ocean and its corresponding trend over 1922-2014. As such, the observation-based MLR results generated from the 5°l atitude bands method are adopted for the remaining analysis. Observation-based results are hereinafter presented with the associated interpolation uncertainty determined from the CMIP5-based MLR analyses (the ones generated from the 5°latitude band width setup). Specifically, the North Atlantic annually varying surface fCO 2−ocean is constrained with the annually varying 1σ, 2σ, and 3σ (equation (4)), displayed in Figure  2e. The uncertainty on the observation-based trend in surface fCO 2−ocean over the period 1992-2014 is given by the standard deviation of the difference between Γ MLR-predicted and Γ model-truth calculated across the 19 CMIP5 models (Figure 3e), which equals 0.060 μatm/year. Note that the annually varying uncertainties are in agreement with the uncertainty assessment carried out in W09 for 2005. Indeed, W09 quantified a 1σ error on the fCO 2−ocean annual mean between 0.8 and 1.8 μatm, similar to the 1σ errors determined by our analysis (which ranges between 1.0 and 1.8 μatm across 1992-2014).

Global Biogeochemical Cycles
Further justification for using the CMIP5-based MLR uncertainty results to constrain the observation-based interpolated estimates is provided in a complementary analysis (Text S2; Figure S2). We show there that the two MLR studies (the one using the observational-based products and the one using the CMIP5 outputs) react similarly to their corresponding proxy variables. At the locations and times at which observations were made, the CMIP5-based MLR uncertainties are in agreement with the observation-based MLR uncertainties (on annual means; Text S2 and Figure S2). Since the annually varying uncertainty assessment is valid for the current CO 2 observational coverage, one can assume that the basin-wide annually varying uncertainty assessment is a robust estimate to delimit the annually varying observation-based fCO 2−ocean estimates.

Recent Change in the North Atlantic Surface fCO 2−ocean
In this section we explore the implications of the uncertainties in annual variation and trend on the North Atlantic fCO 2−ocean using the MLR technique applied to 5°latitude bands as determined above (Figures 2e and 3e). While the observation-based fCO 2−ocean predicted by the MLR is available at a monthly frequency and over 5°latitude bands, a study of the long-term change (23 years) at those resolutions would require estimates that are bias-free (i.e., the mean interpolation error not statistically different from zero) and robust within each subregion in each season, requirements that are not met for the month of August ( Figure S3) and for some subregions of the North Atlantic ( Figures S4 and S5). For instance, the MLR over 5°latitude band widths shows that while the method is robust in the subtropical North Atlantic, the range of uncertainties increases as we move northward ( Figure S4), suggesting that the proxy variables used within the MLR might miss some information regarding the long-term change in the subpolar region and/or that not enough observations exist at high latitudes to detect a well-delimited signal. As such, we deliberately focus on basin-wide annual means for which we can demonstrate that the MLR results are unbiased and robust-a requirement for the comparison with the CMIP5 models (section 3.3).
The North Atlantic surface fCO 2−ocean obtained from the observation-based MLR is first area-weighted into monthly means and then averaged into annual means (thick blue line in Figure 4). Over the period 1992-2014, the North Atlantic surface fCO 2−ocean increased approximately linearly at a rate of 1.47 ± 0.06 μatm/year ( Figure 4). The standard error on the trend due to the linear fit is 0.04 μatm/year, which is encompassed by the 1σ interpolation uncertainty. The relatively small year-to-year variability in surface fCO 2−ocean (as characterized by the relatively small standard error on the linear fit) is expected because (1) interpolation methods systematically tend to smooth high-frequency variability and hence ultimately smooth the interannual signal and (2) the interpolation method used here is a linear regression model, which by construction we expect to lead to a robust estimate of the first-order trend. The increase of surface fCO 2−ocean is considerably less than that of fCO 2-atmosphere over the same interval (1.88 ± 0.02 μatm/year; Text S3; dashed line in Figure 4; GLOBALVIEW-CO2, 2013; Kalnay et al., 1996;Reynolds et al., 2007), resulting in an increased atmosphereocean fCO 2 gradient ( Figure S6). With all else being equal, this would result in increased oceanic CO 2 uptake.
The idea that atmospheric and surface ocean CO 2 concentrations can be seen to diverge in response to anthropogenic CO 2 emissions, while ultimately an inevitable consequence of the Revelle factor (Revelle & Suess, 1957), could be considered counterintuitive. Indeed, McKinley et al. (2011) found ocean and atmosphere CO 2 trends to converge in the North Atlantic on multidecadal time scales but also highlighted that in the latter period of study (1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005), the largest region examined in their study, the permanently stratified Subtropical Atlantic, had a lower ocean than atmosphere CO 2 trend. While McKinley et al. (2011) suggest that the permanently stratified Subtropical Atlantic trend may be the result of multidecadal variability in the Atlantic, studies other than that presented here, which further extend the ocean CO 2 time series (Iida ). The other lines present results from previous independent observation-based studies, using different techniques to reconstruct the trends, but similar data: The pink is from the neural network method of Landschützer et al. (2016Landschützer et al. ( , 2017, the brown is from a MLR method from Iida et al. (2015), and the green is from the Jones et al. (2019) statistical gap-filling method, which also provided an estimate for basin-wide uncertainties shown by the dashed lines. Note that Landschützer et al. (2016Landschützer et al. ( , 2017 and Iida et al. (2015) specifically provided pCO 2-ocean , but in terms of the illustrative analysis presented here, the difference between pCO 2-ocean and fCO 2−ocean is negligible. The bar plot shows the contribution of the xCO 2 , mixed layer depth (MLD) and sea surface semperature (SST) variables to the predicted fCO 2−ocean trend (section 3.2). Jones et al., 2019;Landschützer et al., 2017) and from which can be calculated the area-weighted North Atlantic mean (in opposition to McKinley et al., 2011, which provide trends from nonbasininterpolated values), support our results that at a basin scale the North Atlantic is maintaining a lower CO 2 trend than that observed in the atmosphere (Figures 4 and S7b). Indeed, our multidecadal fCO 2−ocean surface results are consistent with three methodologically independent and complementary techniques (Figure 4) based on similar data: a Neural Network approach (Landschützer et al., 2016(Landschützer et al., , 2017, an alternative MLR approach Iida et al. (2015), and the Jones et al. (2019) statistical gap-filling method, which also provided a corresponding basin-wide uncertainty (Figure 4; Table S1). For each of the three independent techniques, the interpolated monthly surface fCO 2−ocean (or pCO 2-ocean ) was extracted for the North Atlantic open ocean and for the period 1992-2014 (Table S1), spatially averaged using area-weighted means and averaged to annual means, from which the trend was finally calculated. The methods from Landschützer et al. (2016Landschützer et al. ( , 2017, Iida et al. (2015), and Jones et al. (2019) obtained a fCO 2−ocean (or pCO 2-ocean ) trend of 1.49 ± 0.05, 1.71 ± 0.03, and 1.69 ± 1.03 μatm/year over 1992-2014, respectively (where the uncertainty corresponds to the standard error returned by the linear fit for the first two methods and corresponds for Jones et al. (2019) to the range of possible trends given their annually varying uncertainty), which is comparable to our trend estimate of 1.47 ± 0.06 μatm/year. While the results from Iida et al. (2015) are reaching the limit of agreement with the present work (given the 3σ range), the lack of time-varying uncertainties, which would likely be greater than the uncertainty on the linear fit (i.e., 0.03 μatm/year), might alter the interstudy comparisons. The uncertainty provided by Jones et al. (2019) covers a wide range of possible fCO 2−ocean trends within that interval, limiting what can be concluded about North Atlantic CO 2 trends from this approach. Finally, while our method was specifically designed to provide robust basin-wide fCO 2−ocean annual estimates, we show in an additional analysis that the localized surface fCO 2−ocean is in good agreement with the monthly CO 2 measurements obtained within Bermuda Atlantic Time-series Study (BATS) vicinity over our period of study ( Figure S8).

Identifying the Drivers of the Recent Change in the North Atlantic Surface fCO 2−ocean
To examine the drivers of the North Atlantic surface fCO 2−ocean trend, the role of each explanatory variable in the interpolation technique is studied. The set of β coefficients from the observation-based MLRs over 5°latitude band width (equation (1)) was applied, within each band, to the separately varying subregional xCO 2 , MLD, and SST following equations (5), (6), and (7), respectively (note that this approach was not used for longitude as it is a nontemporally varying variable). For example, the contribution of xCO 2 to the rate of change of the North Atlantic surface fCO 2−ocean was studied by calculating a new (unrealistic) fCO 2−ocean field within each 5°latitude band, using the corresponding β coefficients to the mean value of each explanatory variable, except for xCO 2 , which was allowed to vary in time and space (equation (5)). The 12 f CO 2 -ocean<−xCO2 monthly varying fields (one per each 5°latitude band) were then merged to reconstruct the basin-wide North Atlantic, monthly averaged with an area-weighted mean and annually averaged, from which the mechanistic trend was finally calculated. As such, by studying the sign and amplitude of the mechanistic trends in f CO 2 -ocean < -xCO2 , fCO 2-ocean<-MLD , and fCO 2-ocean<-SST , the dominant driver (among the given explanatory variables) and the mechanisms involved in the rate of change in surface fCO 2−ocean can be identified (bar plot in Figure 4).
where β 0 is the intercept and β 1,2,3,4 are the regression coefficients (for a given 5°latitude band) returned by the linear model (equation (1)) and horizontal lines indicate that the data are averaged to a constant value.

10.1029/2019GB006186
Global Biogeochemical Cycles respectively (bar plot in Figure 4). By having the largest trend amplitude, the atmospheric xCO 2 appears as the predominant driver explaining the increase in surface fCO 2−ocean over the period 1992-2014, while expected provides evidence that the anthropogenic signal dominates the annual surface fCO 2−ocean in the North Atlantic over this period. This result anticipates the results of McKinley et al. (2011), which found thata 25-year-long interval was required for the long-term signal in surface fCO 2−ocean to emerge from the North Atlantic decadal variability (note however that their study was based on a different interval, 1981-2009). Over the period 1992-2014, the MLD and SST play a less important role than the atmospheric xCO 2 in controlling the surface fCO 2−ocean , and their signs are consistent with our understanding of the mechanisms between those two variables and the surface fCO 2−ocean . The negative sign in the fCO 2-ocean<-MLD trend indicates that an overall increase in the MLD would lead to a decrease in the surface fCO 2−ocean , suggesting that the enhancement of the dilution of CO 2 in the mixed layer and/or the stimulation of the biological activity from the input of nutrient-enriched deep waters to the surface are the main MLD-related mechanisms involved in the surface fCO 2−ocean for the basin-wide North Atlantic. As such, the impact of carbon-enriched deep waters to the surface as MLD deepens appears to be minimal over the period 1992-2014 and is unlikely to explain the rate of change of surface fCO 2−ocean . The positive sign in the fCO 2-ocean<-SST trend indicates that, over the period 1992-2014, an increase in the SST leads to an increase in the North Atlantic surface fCO 2−ocean , which is consistent with the decrease in the solubility of CO 2 into seawater as surface waters warm. Nonetheless, we acknowledge that the drivers in the recent change in the North Atlantic surface fCO 2−ocean would vary across the basin (with a remaining dominance from atmospheric xCO 2 ), for instance in the subpolar region, which experienced localized deep water formation events at the beginning of the period of study (Yashayaev et al., 2007) and which could therefore result in a MLD-driven mechanism ( Figure S9).

Evaluating the Recent Change in the North Atlantic Surface CO 2 Concentrations in the CMIP5 Models
Recent change in the surface fCO 2−ocean simulated by 19 CMIP5 models (Table 1) is evaluated through comparison with our new error-bounded observational-based time series and trend ( Figure 5). We find that (1)

10.1029/2019GB006186
Global Biogeochemical Cycles the CMIP5 models overall behave similarly to each other in response to the increase in atmospheric CO 2 concentrations and (2) the CMIP5 models differ from the observation-based estimates in terms of the amplitude of the surface fCO 2−ocean , and importantly its trend (Figures 5a and 5b). Over the period 1992-2014, the North Atlantic fCO 2−ocean trend in the CMIP5 models is on average 1.90 ± 0.09 μatm/year (where the ±1σ value corresponds to the intermodel variability; Table 1). The models' surface ocean concentration closely follows the rise of atmospheric CO 2 , which they experienced (the model-mean fCO 2-atmosphere trend, i.e., the calculation of fCO 2-atmosphere from the prescribed historical and RCP8.5 xCO 2 using modeled temperature and pressure, over the period 1992-2014 is 1.92 ± 0.01 μatm/year; Figure S7b), consequently limiting the air-sea CO 2 gradient and therefore the removal of CO 2 from the atmosphere. The fCO 2−ocean trends in the CMIP5 models (1.90 ± 0.09 μatm/year) are significantly larger than the observation-based fCO 2−ocean trend (i.e., 1.47 μatm/year), at the 5% significance level (right-tailed t test statistics).
The fact that the CMIP5 models have a larger fCO 2−ocean trend than the observations means that the difference between fCO 2-atmosphere and fCO 2−ocean (i.e., ΔfCO 2 ) increases at a slower rate in the models than in the real world (for a negligible difference in the fCO 2-atmosphere between the real-world and the models, as shown in Figure S7b). Therefore, the air-sea CO 2 flux, which is proportional to ΔfCO 2 , would increase at a slower rate in the models than in the real world (with potential CO 2 outgassing), in the absence of significant trend in the gas transfer velocity and solubility.

Discussion
The statistically significant discrepancy in the surface fCO 2−ocean trends between the CMIP5 models and the observation-based estimate is likely to result from one or more of four factors: (1) the CMIP5 models and the real world could be forced with slightly different atmospheric CO 2 concentrations, which could have impacted the surface fCO 2−ocean and led to slightly different trends between the two systems; (2) the specific number and/or combination of the chosen CMIP5-based MLR analyses used to calculate the trend uncertainty may result in an anomalous fCO 2−ocean trend uncertainty, impacting the outcome of the modelobservation comparison; (3) the real world could be experiencing, over the period of study, a phase of natural variability not captured by any of the CMIP5 models; and/or (4) the CMIP5 models could poorly represent or miss some key characteristics of the marine CO 2 system necessary to capture the observed surface fCO 2−ocean trend. These four possibilities are explored here.  Kalnay et al., 1996;Reynolds et al., 2007; Text S3; which is closely followed by the CMIP5 models' atmospheric fCO 2 , as shown in Figure S7b) and (b) resulting linear trends over the 1992-2014 interval. Blue corresponds to observation-based multiple linear regression (MLR) results and orange to each of the available CMIP5 models (Table 1). The dark, medium, and light gray shadings, respectively, represent the ±1σ, 2σ, and 3σ annual and trend interpolation uncertainties. In (b), the thick orange line corresponds to the CMIP5 model-mean trend value, and the ±1σ, 2σ, and 3σ across the models by the associated error bars.

A Model Bias due to Different Atmospheric xCO 2 ?
To verify that the discrepancy in the surface fCO 2−ocean does not arise from differences in the respective atmospheric xCO 2 products, the annual atmospheric xCO 2 values in the real world and in the model world are compared ( Figure S7a). Overall, the xCO 2 annual time series from GLOABALVIEW-CO2 (extracted for the North Atlantic and area-weighted into annual means) and from the "historical+RCP8.5" scenarios are similar, with indistinguishable resulting trends over 1992-2014 (i.e., 1.95 ± 0.02 ppm/year and 1.98 ± 0.02 ppm/year, respectively; Figure S7a). For the period up until 2005 (year from which the CMIP5 models are forced with RCP8.5 scenario xCO 2 values), the atmospheric xCO 2 trends from the observationbased product and the RCP8.5 projection values are also indistinguishable (1.83 ± 0.04 ppm/year and 1.80 ± 0.03 ppm/year, respectively; Figure S7a). As such, the trend discrepancy in surface fCO 2−ocean between the real world and the models cannot be explained by the minimal differences in atmospheric xCO 2 between the two systems.

Model-Observation Discrepancy Explained by a Sensitive Interpolation Uncertainty?
The trend uncertainty assessment, which was used to constrain the observation-based trend estimate, relied on the CMIP5-based MLR analyses and was determined from the distribution across the 19 CMIP5 models of the difference between Γ MLR-predicted and Γ model-true (see section 2.2). As such, we performed a sensitivity analysis on the trend uncertainty to test whether the difference between the observation-based and the CMIP5 fCO 2−ocean trends remains significant if we were to determine the uncertainty using fewer and different combinations of CMIP5-based MLR analyses. We calculated the trend uncertainty using a number of models k (a subset of the available models) that varies from 2 to 18 and considers the different possible combinations C M k of those models: where M is the total number of models available (19) and k the number of selected models (from 2 to 18) in a subset used to calculate the uncertainty. The standard deviation (i.e., trend uncertainty) calculations are repeated for each model subset size and combination.
The observation-based fCO 2−ocean trend with the uncertainties resulting from the sensitivity analysis (i.e., 1.47 ± (1, 2, 3) σ C M k μatm/year are compared to the CMIP5 model ensemble of fCO 2−ocean trends with an unpaired two-sample left-tailed t test, for each of the possible combinations ( Figure S10). A t test statistics show that in all scenarios, the fCO 2−ocean observation-based trend is always significantly smaller than the CMIP5 trends, at the 5% significance level ( Figure S10). The sensitivity analysis therefore indicates that the CMIP5 models robustly overestimate the observation-based fCO 2−ocean trend.

Internal Variability
One of the major challenges when interpreting the time-varying behavior in a model ensemble against observations on decadal to multidecadal time scales is that the model ensemble and the real world could be experiencing different phases of internal variability (e.g., the North Atlantic Oscillation or Atlantic Multidecadal Variability, both of which have been implicated in North Atlantic CO 2 uptake variability; McKinley et al., 2011;Schuster et al., 2009;Thomas et al., 2008). The model ensemble may therefore be significantly different from the observations because it does not capture the component of natural variability sampled by the real system. To assess this possibility, the internal variability in the model ensemble and in the real world should be quantified. While studies (e.g., DeVries et al., 2019;Landschützer et al., 2016;Rödenbeck et al., 2015) suggest substantial decadal variability within observation-based estimates linked to climate variability (e.g., Landschützer et al., 2019), the relatively short length of the fCO 2−ocean observational record limits our confidence in the representation of the true internal variability and its interaction with CO 2 uptake at decadal and longer time scales (e.g., McKinley et al., 2011). A further complication is that this variability is superimposed on the anthropogenically forced climate change and the two can only be disentangled by assuming the validity of internal variability generated by models (e.g., Hegerl & Zwiers, 2011).
To investigate the role of internal variability within the North Atlantic surface fCO 2−ocean trend, an initial analysis was conducted using the forced model fields. We specifically calculated the surface fCO 2−ocean 10.1029/2019GB006186 Global Biogeochemical Cycles trends simulated by the CMIP5 models within relatively shorter intervals than the period of study 1992-2014 (precisely over the periods 1992-2013, 1993-2014, 1992-2012, and 1994-2014; Figure S11). We show that the surface fCO 2−ocean trends within 21-and 22-year-long intervals, as simulated by the CMIP5 models, are within the same range as the trends over the period 1992-2014 ( Figure S11), suggesting that the signal captured by the CMIP5 models is consistent with overall forced system, as opposed to internal variability.
Nevertheless, models offer idealized platforms to fully quantify the internal climate variability using their unforced control simulations. To attempt to quantify internal variability in surface ocean CO 2 concentrations in a more robust manner (relative to the analysis conducted in Figure S11), we use the CMIP5 model preindustrial control simulations. The North Atlantic internal variability (i.e., the unforced variability) in the models' surface pCO 2-ocean contained in 23-year-long intervals is quantified by using the models' preindustrial experimental runs, which describe the climate system without anthropogenic forcing. Out of the 19 CMIP5 models used in the model trend evaluation study, 14 provided pCO 2-ocean data from their preindustrial control simulation (Table S2). In the context of this study, the difference between pCO 2-ocean and fCO 2−ocean is assumed to be negligible. Since the 14 CMIP5 models' control simulations were run over different lengths of time (from 240 to 1,000 years), the first 240 years in each of the control simulations was used for consistency. Over 240 years, 217 possible 23-year-long continuous intervals are defined, leading to the generation of 217 pCO 2-ocean linear trends for each of the 14 CMIP5 control simulations ( Figure 6). The standard deviation σ unforced of the pCO 2-ocean trends calculated across all the possible 23-year-long intervals and across the available control simulations (a total of 217 × 14 = 3,038 trends) equals 0.036 μatm/year (corresponding to the dark green shading in Figure 6) should represent the amplitude of the unforced variability in the models. The internal variability in the surface pCO 2-ocean trend over a 23-year-long  (Table S2). The dark to light green bands correspond to 1σ unforced , 2σ unforced , and 3σ unforced of those pCO 2-ocean trends, respectively. interval (up to 3σ unforced = 0.108 μatm/year), as simulated by the CMIP5 models, is about four times smaller than the difference between the CMIP5 model mean and the observation-based fCO 2−ocean trend (i.e., 1.90 − 1.47 = 0.43 μatm/year), indicating that the systematic trend overestimation in the CMIP5 models cannot be explained by the models' internal variability. To further support this statement, additional analysis (Text S3 and Figure S12) suggests that an ensemble of 19 forced model runs (with the model diversity as provided by the CMIP5 framework) is large enough to sample most of the models' unforced variability captured within an interval of 23 years. Due to the increase in atmospheric CO 2 concentrations since the beginning of the industrial era, the surface fCO 2−ocean in the present day is higher than during preindustrial times, leading to the surface ocean being less buffered and potentially more variable than prior to the existence of anthropogenic forcing. As such, the dispersion of the trends simulated by the preindustrial runs may not necessarily be directly comparable with the dispersion of the trends simulated in an anthropogenically forced climate. However, the fact that none of the models used to investigate the trends over 1992-2014 capture the observation-based trend, while they all experienced the anthropogenic atmospheric CO 2 rise and consequent reduction in buffering, further suggests that the ensemble size is large enough to account for model internal variability (Text S3 and Figure S12).
While the fCO 2−ocean trends in the 19 ensemble members seems to predominantly capture the forced change rather than the models' unforced variability, it cannot be ruled out that the observation-based fCO 2−ocean trend estimate (calculated over a 23-year-long interval) may describe a phase of multidecadal internal variability, which is not generated by the models (McKinley et al., 2017;Schuster et al., 2009;Thomas et al., 2008). Indeed, the observational-based study of McKinley et al. (2011) identified that in the North Atlantic, the long-term trend took 25 years to emerge from the variability occurring on decadal time scales. Multiple lines of evidence-observational analysis indicating that xCO 2 was the dominant driver of our identified trend, model unforced control run analysis, and use of a 19 member model ensemble-have pointed to the trend difference being anthropogenically forced and further evidence that the discrepancy between the CMIP5 model and the observed trends is anthropogenic in origin comes from a mechanistic exploration of the difference in trends below.

Investigating the Mechanisms Explaining the CMIP5 Model Bias
While the CMIP5 models are the most advanced tools available at this time to explore the Earth's climate response to anthropogenic forcing, and the coordination of their simulations to provide multimodel ensembles helps us avoid drawing model-specific conclusions, such multimodel analyses lead to challenges. Different modeling groups will have made different assumptions in building and setting up the model, for example, with different initial conditions and spinup procedures (Séférian et al., 2016). As such, to understand and identify the potential mechanisms responsible for the systematic bias in the fCO 2−ocean trends in the CMIP5 models, we performed five ocean-only ensembles using a single model. The five ensembles are identically initialized with observation-based products but are forced with surface conditions from five different sources spanning the interval of interest, which allow us to investigate the impacts of the initialization and of the surface forcing fields on the simulated fCO 2−ocean trends.
Simulations were performed using a 1°, 75 vertical level, global ocean-only physical-biogeochemical model, the GO5.0 (Megann et al., 2014) configuration of the Nucleus for European Modelling of the Ocean (NEMO) hydrodynamic model (Madec, 2008) coupled with v4.1 of the CICE model (Hunke et al., 2010), and the Hadley Centre Ocean Carbon Cycle (HadOCC; Palmer & Totterdell, 2001) biogeochemical model. All NEMO fields were initialized to zero, except for temperature and salinity, which were taken from the EN4 objective analysis v4.1.1 (Good et al., 2013;Gouretski & Reseghetti, 2010). For HadOCC, initial conditions for nutrients were taken from the World Ocean Atlas climatology (Garcia et al., 2010), for DIC and Total Alkalinity (TA) from the Global Ocean Data Analysis Project (GLODAP) climatology ; whose version is more representative of the 1980s, period over which the simulation initialization is made, than the second updated version), and for phytoplankton, zooplankton, and detritus from the end of a previous simulation (Ford et al., 2012). The atmospheric pCO 2 values were prescribed using globally and monthly averaged surface data based on observations (Dlugokencky & Tans, 2016).
Considering initially the first ensemble member, five simulations were run from 3 January 1979 to 31 December 2014 and were forced by prescribing the surface conditions from five different sources (one source for each experiment). The prescribed surface forcing includes air temperature, snowfall, specific humidity, vector winds, precipitation, downwelling shortwave, and longwave radiation. The motivation behind running five simulations forced with five different surface conditions was to test whether the fCO 2 −ocean simulated trend was highly impacted by the atmospheric forcing (heat, moisture, and momentum fluxes) or by the initialization of the ocean variables, which was kept the same across the simulations. The first simulation used daily surface conditions from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis (Dee et al., 2011) and is hereinafter referred as the "ERA-Interim forced simulation." The remaining four simulations used daily surface conditions from four CMIP5 model outputs, specifically from the GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR, and CanESM2 (Text S1) and are hereinafter referred as the "CMIP5-forced simulations." The four CMIP5 models were chosen to approximately span the fCO 2−ocean CMIP5 model behaviors (Table 1).
For each of the five simulations of the first ensemble member, the surface fCO 2−ocean outputs were (1) regridded into a 1°× 1°regular grid using the bilinear interpolation within the CDO package (http:// www.mpimet.mpg.de/cdo), (2) extracted for the open waters of the North Atlantic (the shelf waters above 1,000-m depth were removed using ETOPO1; Amante & Eakins, 2015), (3) monthly averaged (as the outputs were saved at a daily frequency), and (4) averaged into annual area-weighted means; leading for each simulation to a fCO 2−ocean annual time series from 1979 to 2014. For each of the five annual fCO 2−ocean annual time series, the linear trend was calculated over the period of interest 1992-2014. As such, the first 13 years of the simulations (i.e., from 1979 to 1991) correspond to the spinup period. While a 13-year spinup is inevitably of insufficient length for the model to reach equilibrium, an analysis of the model fields suggested that the large initial drifts had settled down during this period (the initial perturbation settles after ∼3-4 years, example in Figure S13), certainly for the surface processes which we are most interested in over the time scales of this study, and which are most driven by the atmospheric forcing. As such, we are satisfied that the hindcast is sufficiently able to reproduce the mean state and observed variability of the ocean (Ford & Barciela, 2017). Furthermore, we require a short spinup to avoid the DIC and TA fields drifting too far from their initialized observed state and therefore to test the carbon cycle behavior when the model's DIC and TA are close to the observed (initialized) state on the simulated surface fCO 2−ocean trends. Three additional  Table 1). All simulated trends are here model-drift corrected (see Text S4; Figures S13 and S14). The blue and red lines correspond to the observation-based fCO 2−ocean trend and the CMIP5 model-mean, respectively. The gray and orange shadows correspond to the 3σ returned by the CMIP5-based multiple linear regression analysis and by the intermodel spread. members were run with constant atmospheric CO 2 mixing ratio for each of the five different atmospheric forcing experiments to (1) quantify the model drift in each forcing experiment and (2) confirm that the drift in the trend is not overly sensitive to interannual variability in the forcing (Text S4; Figure S13).
After quantifying and removing the model drift on the simulated fCO 2−ocean trends using the runs from the constant atmospheric CO 2 mixing ratio members (Text S4; Figure S14), we find that the simulated North Atlantic fCO 2−ocean trends show good agreement with the observation-based trend and among themselves (potentially less agreement when considering the CanESM2-forced simulation; Figure 7). By being forced with an observation-based product, the ERA-Interim run provides an improvement in simulating the North Atlantic fCO 2−ocean trend compared to the averaged CMIP5 model (Figure 7a). Nevertheless, a realistic surface forcing field is not a necessary condition for simulating an appropriate North Atlantic fCO 2−ocean trend. Crucially, however, the initialization with observation-based biogeochemical (DIC and TA) and physical (T and S) fields is necessary for the trend to match the observations. Indeed, the CMIP5-forced simulations are performed with nonrealistic atmospheric conditions, and yet they provide realistic North Atlantic fCO 2−ocean trends (Figures 7b-7e). Over the period 1992-2014, the resulting surface fCO 2−ocean trends calculated from the four CMIP5-forced simulations (i.e., using the surface fluxes from GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR, and CanESM2) are, respectively, 1.52 ± 0.09 μatm/year, 1.54 ± 0.09 μatm/year, 1.43 ± 0.04 μatm/year, and 1.68 ± 0.11 μatm/year, corresponding to a change that is respectively 83%, 86%, 92%, and 62% closer to the observation-based fCO 2−ocean trend than their corresponding CMIP5 model-truth trends were (Figures 7b-7e). Besides being ocean-only simulations, the major difference between our new simulations and the CMIP5 simulations is that ours are initialized from observed ocean physics and biogeochemistry shortly before the results are produced, whereas the CMIP5 models have been spun up for periods up to 10,000 years to allow them to approach their unique preindustrial equilibrium (Séférian et al., 2016), from which the CMIP5 transient simulations are initiated. Over a long spinup period important carbon cycle fields, such as alkalinity, will drift from the observed state toward the models own equilibrium state (Figures S15-S18). Our simulations are therefore attempting to quantify the surface ocean CO 2 response to the real-world carbon cycle state, whereas the CMIP5 models describe a surface ocean CO 2 response with their model state.

Conclusions
While the latest generation of ESMs are widely used to underpin policy making, and of particular relevance here, to determine the allowable CO 2 emissions to remain below agreed levels of global warming, their evaluation typically focuses on how well they represent the climatological state. Since questions asked by policy makers relate to how the system is changing, which are answered by using ESMs, it is important to assess how these models simulate change. Evaluation of the models' ability to simulate change requires observational information about how the real world has changed; a challenge in a field such as ocean biogeochemistry, where observational records are short and often sparse. By carefully assessing and refining a method to interpolate sparse surface fCO 2−ocean observations across the North Atlantic over the period 1992-2014 and quantifying robust annually varying uncertainties associated with the resultant time series and trend, and by investigating the impact of internal variability on the simulated trends, we have been able to identify that the CMIP5 model ensemble significantly overestimated the trend in the North Atlantic surface CO 2 concentration over the period 1992-2014. The possible mechanisms impacting the simulated North Atlantic surface fCO 2−ocean were explored with an ocean-only model, which suggested that the discrepancy between the surface fCO 2−ocean trend in the real world and in the CMIP5 models arises from inadequacies in their simulation of the background biogeochemical state. Inadequacies in CMIP5 physics and/or biology lead to substantial biases in TA that may be the cause for the fCO 2−ocean trend overestimation in the models through its impacts on the buffer capacity ( Figure S18). The present analysis therefore challenges our ability to predict the decadal-to-centennial evolution of the North Atlantic sink for CO 2 in the future using the current state-ofthe-art ESMs. The model biases identified here and the extension of continuous ocean CO 2 measurements must be addressed if the climate-science community is to provide the best possible guidance on what anthropogenic CO 2 emissions are consistent with agreed atmospheric CO 2 targets. Nevertheless, our study further shows that models provide an essential platform to investigate the uncertainty and error associated with an observation-based interpolation technique. We therefore encourage the community to use OSEs to provide further critical understanding of the robustness of interpolated products and their corresponding basin-wide and time-varying uncertainties.