Key Uncertainties in the Recent Air‐Sea Flux of CO2

The contemporary air‐sea flux of CO2 is investigated by the use of an air‐sea flux equation, with particular attention to the uncertainties in global values and their origin with respect to that equation. In particular, uncertainties deriving from the transfer velocity and from sparse upper ocean sampling are investigated. Eight formulations of air‐sea gas transfer velocity are used to evaluate the combined standard uncertainty resulting from several sources of error. Depending on expert opinion, a standard uncertainty in transfer velocity of either ~5% or ~10% can be argued and that will contribute a proportional error in air‐sea flux. The limited sampling of upper ocean fCO2 is readily apparent in the Surface Ocean CO2 Atlas databases. The effect of sparse sampling on the calculated fluxes was investigated by a bootstrap method, that is, treating each ship cruise to an oceanic region as a random episode and creating 10 synthetic data sets by randomly selecting episodes with replacement. Convincing values of global net air‐sea flux can only be achieved using upper ocean data collected over several decades but referenced to a standard year. The global annual referenced values are robust to sparse sampling, but seasonal and regional values exhibit more sampling uncertainty. Additional uncertainties are related to thermal and haline effects and to aspects of air‐sea gas exchange not captured by standard models. An estimate of global net CO2 exchange referenced to 2010 of −3.0 ± 0.6 Pg C/year is proposed, where the uncertainty derives primarily from uncertainty in the transfer velocity.


Introduction
The air-sea flux of carbon dioxide is a critical part of the climate system (Ciais et al., 2013;Rhein et al., 2013) and a major factor in the development of the ocean (e.g., ocean acidification). An ocean sink is an important component of the anthropogenic perturbation of the global carbon cycle. Several methods are recognized to give fairly consistent estimates of the global time-averaged ocean sink or uptake rate (Rhein et al., 2013). In particular, an inventory of anthropogenic carbon in the ocean implies an uptake rate of −2.3 Pg C/year (in the range of −1.7 to −2.9) from 2000 to 2010, while an estimate based on atmospheric O 2 /N 2 measurements implies an uptake rate for the same decade of −2.5 ± 0.6 Pg C/year. Also, Takahashi et al. (2009) estimate a sink of −2.0 ± 1.0 Pg C/year "normalized to the year 2000" based on a calculated contemporary air-sea flux of −1.6 ± 0.9 Pg C/year corrected by an estimated preindustrial flux of +0.4 ± 0.2 Pg C/year. From these three estimates, there is a high level of confidence in a global uptake between −1.0 and −3.2 Pg C/year (Rhein et al., 2013). Estimates for the preceding decades are generally lower and a trend of an increasing sink with time is implied. Note that we will always represent a net downward flux as a negative value, for more consistency with the convention for uptake.
The estimates by Takahashi et al. (2009) are based on a method of estimating the contemporary air-sea flux of carbon dioxide through an air-sea flux equation that calculates the local air-sea gas flux as the product of an exchange coefficient and a partial pressure difference. (This is broadly equivalent to calculating the flux as a product of transfer velocity and an effective concentration difference-see equation (1) below-but cannot deal satisfactorily with vertical temperature gradients . The method depends on both estimates of exchange coefficient (the product of transfer velocity and solubility) and sufficient measurements of CO 2 in and above the ocean. It follows that improved knowledge of key factors could reduce the uncertainty of the air-sea flux, both globally and more locally. It is generally understood (e.g., Takahashi et al., 2009) that both the parameterization of the transfer velocity and the relative sparsity of measurements of upper ocean dissolved CO 2 contribute substantially to the uncertainty of the global flux. The air-sea transfer velocity remains an active research topic (Garbe et al., 2014), while the collection, quality control, and release of CO 2 data have grown and improved with the formation and activity of the Surface Ocean CO 2 Atlas (SOCAT) community (Bakker et al., 2014). In this paper, we investigate if the growing knowledge of both CO 2 concentrations in the upper ocean and the air-sea gas transfer velocity are yet sufficient to reduce the uncertainty in the global flux.
The results were produced using FluxEngine, which is described in detail by Shutler et al. (2016). Raw data on atmospheric and oceanic CO 2 concentrations and other environmental data (particularly wind speed, temperature, and salinity) are ingested into FluxEngine, which then calculates fluxes based on parameterizations. Details of the data and parameterizations used in this study are explained in the methods section. One importance of the "air-sea flux method" is that it can be applied to estimate far more local fluxes than other methods. The calculations made with FluxEngine are at a monthly and 1°(latitude × longitude) resolution, but in this paper we will restrict the presentation of results to integrations over four large oceanic regions and their total ("global values").
We use a standard equation for the net air-sea flux, F, of a gas: Here H is Henry's law constant and K w is the transfer velocity. We specify that the calculations are for poorly soluble gases. In this case, we can satisfactorily substitute k w for K w , where k w is a water-side transfer velocity (Garbe et al., 2014). In the next section, we will briefly summarize the available options for k w and the sources of data required for the calculations. In addition, the concentration difference is to a good approximation only across a marine microlayer, the mass boundary layer (MBL) at the sea surface, with a concentration at the top of the MBL, C I = C a /H, and a concentration at the base, C M = C w . The concentration anywhere within the ocean is the product of a solubility and a fugacity, fCO 2 , but as solubility varies with temperature and salinity, all calculations of flux should ultimately be in terms of concentrations.

Parameterization of Transfer Velocity
A traditional wind-speed-dependent transfer velocity is parameterized as where U is the instantaneous 10-m elevation wind speed and Sc is the Schmidt number of the dissolved gas.

Global Biogeochemical Cycles
(Since the effect of the wind at the surface is sensitive to atmospheric stability, wind speeds, U, are corrected to an equivalent value in a neutrally stable atmosphere). These transfer velocities can be calculated in a computationally efficient method as described in the supporting information.
The polynomial expressions considered here for k w are summarized in Table 1. The expressions are shown graphically in the supporting information ( Figure S1). These expressions are only a few of those found experimentally but provide a reasonable test of the uncertainties in flux resulting from uncertainties in transfer velocity. We consider mainly parameterizations based on "dual tracer experiments" (DTEs). Estimates based on DTEs depend on measurements of the relatively rapid decline in dissolved 3 He relative to SF 6 as a result of the much higher transfer velocity of 3 He. Calculation of the transfer velocity for any single gas, including CO 2 , relies on the Schmidt number dependence implicit in equation (2). A large number of DTEs have been conducted (e.g., Nightingale et al. (2000) and Ho et al. (2006Ho et al. ( , 2007Ho et al. ( , 2011).
Parameterizations of transfer velocity based on DTEs are widely used but are not the only ones available. A wider spectrum of expert opinion on transfer velocity follows from including parameterizations based on different methods. Here, we include a single parameterization (McGillis et al., 2001) based on direct micrometeorological measurements of CO 2 flux.

Data Sources and Processing
As described by Shutler et al. (2016), we have developed a flexible processing system, FluxEngine, for the calculation of air-sea gas fluxes. The system can ingest a large number of data sets, and the results will depend, for example, on the choice of sea surface temperature product. FluxEngine also includes a number of switches that modify the calculations performed. The first global flux products with a 1°× 1°spatial resolution produced by FluxEngine used the same data inputs (based on approximately 3 million surface ocean values of partial pressure) as used by Takahashi et al. (2009) but interpolated to a new grid. That option allowed a verification against the fields published by Takahashi et al. (2009), which showed that the net global flux differed by <1% from the original publication . In parallel, the development of the SOCAT community has provided uniform, quality-controlled data sets for surface ocean fCO 2  containing progressively more data values. Recent papers by Woolf et al. (2016) and Goddijn-Murphy et al. (2015) describe the principles and practical steps involved in reanalyzing upper ocean CO 2 data from SOCAT synthesis data files (Bakker et al., 2014;Bakker et al., 2016;Pfeil et al., 2013) to prepare them for synoptic-scale flux calculations. The version of FluxEngine (Version 2) and data used in this study are included in Holding et al. (2018), which also provides links to the software. In this paper, we present calculations referenced to 2010, using an enlarged data set on upper ocean CO 2 (Version 4 of SOCAT;18.5 million values, 1957 and various other data sets specifically for each calendar month of 2010. A major difference in principle between the calculations presented by Takahashi et al. (2009) and those presented here is that the difference in concentration is carefully calculated with due regard to gradients of temperature and salinity . Since many years of upper ocean CO 2 data are used and secular changes in temperature are significant to the calculations at the base of the MBL, temperatures are required for all months and years corresponding to CO 2 data, in addition to the 12 months of the Optimum Interpolation Sea Surface Temperatures are a "Reynolds SST" corrected to buoy measurements and thus a "subskin temperature." These temperatures were used directly for calculations at the base of the MBL, while 0.17 K was subtracted to give an appropriate skin temperature based on the estimates of Donlon et al. (2002).
In principle, salinity gradients introduce similar complications to temperature gradients. However, the sensitivity of CO 2 concentrations to salinity is much weaker than to temperature and the effects are correspondingly weaker . Additionally, while trends in oceanic temperature are substantial, those in salinity are relatively weak. These facts justify the relatively simple approach to salinity in this study. Unlike temperature, we do not recompute in situ SOCAT fCO 2 using gridded values of salinity (Goddijn-Murphy et al., 2015). Climatological values of surface salinity from Takahashi et al. (2009) are used for calculations of solubility and transfer velocity in the reference year. Uncertainty in the haline skin effect is large (Yu, 2010;Zhang & Zhang, 2012), but the effect is of order of 0.1 PSU. Climatological surface salinities are assumed to be "subskin" values, while 0.1 PSU is added for calculations at the top of the MBL.
As described by Goddijn-Murphy et al. (2015), we reprocess SOCAT values in two steps to give individual values of fugacity in the reference year. First, the equivalent fugacity is calculated at an accepted value of subskin SST in the month and location (on a 1°× 1°grid of the surface ocean). Second, a secular trend is applied to fugacities to give a value in the reference year in the same calendar month and location. In many cases, values in a particular calendar month and location of the reference year may originate from several different years and various cruises. In the original version (Goddijn-Murphy et al., 2015), these values were averaged in each grid cell without weighting, but in this study contributions from each year were weighted by the number of cruises contributing to that cell, which is consistent with the weighting method used in Sabine et al. (2013). A key uncertainty relates to the application of a secular trend. A study of SOCAT Version 2 supports a steady trend of 1.5 μatm/year (Zeng et al., 2014), but the trend will introduce uncertainty and a part of our study (section 2.3.4) addressed this issue.
In spite of the use of multiyear data, there are many permutations of calendar month, latitude, and longitude (1°× 1°) cells where no data on upper ocean fugacity are available. Data coverage is adequate for some oceanic areas (e.g., Wróbel & Piskozub, 2016), but a lack of data in some cells is common in regional and global studies of CO 2 flux. Some method of mapping (interpolation and extrapolation of fragmented data) is necessary. The SOCOM study (Rödenbeck et al., 2015) compared many approaches to this mapping, including the use of various numerical and statistical models. That study showed that while there are some differences, the integrated regional CO 2 fluxes are not highly sensitive to the mapping method. This context is important in relation to the absolute values of global and regional fluxes reported here since our method is based on that described by Goddijn-Murphy et al. (2015), which was included in the SOCOM study. Our mapping is based on simple geospatial methods; that is, interpolation is based on proximity only, not on any inferred relationship with other data types, since the robustness of such relationships is unknown. The standard method is "ordinary block kriging," but another method must be used where this method fails because a variogram cannot be computed (section 4 of Goddijn-Murphy et al., 2015). Kriging methods have the merit that they are statistically unbiased and do not require a "baseline" or "prior" value that might skew the results but instead are entirely based on data. In this study, the greater number of data in SOCAT Version 4 made the kriging method more reliable, but it could not be used for the month of November and inverse distance interpolation had to be used instead for that month.
Flux calculations also require values of wind speed and atmospheric CO 2 values. This study takes wind speeds from the European Space Agency multisensor merged and calibrated GlobWave data set and values of the molar fraction of CO 2 in the dry atmosphere from the National Oceanic and Atmospheric Administration Marine Boundary Layer reference data set (https://www.esrl.noaa.gov/gmd/ccgg/mbl/ index.html). These were converted to pCO 2 (air) using the formulation of Weiss (1974), as implemented in Shutler et al. (2016), and sea level atmospheric pressure from the National Centers for Environmental Prediction/National Center for Atmospheric Research Reanalysis data set (Kalnay et al., 1996). Atmospheric "C a /H" (from equation (1)) is then calculated as described in section 2 of Woolf et al. (2016), taking care to use "skin" values of temperature and salinity.

Global Biogeochemical Cycles
The focus of this study is the uncertainty of fluxes integrated over regions and globally, but for illustration we show a typical global map of the 1°× 1°resolution fluxes in Figure Table 1 -and defaults for all other processing described below. The general pattern of fluxes is quite similar to many other studies (see, e.g., Roobaert et al., 2018) but with some local differences and a subtle general shift toward more predominantly downward fluxes. That shift can be traced to the treatment of temperature in the processing method of Goddijn-Murphy et al. (2015). Calculations at each grid cell require the reprocessed SOCAT values described above and other data. The figure shows the point flux calculations in each grid cell where there is sufficient coincident data available (i.e., atmospheric and aqueous CO 2 , air pressure, salinity, SST, and wind speed data all available for the same grid cell). The details of the standard calculation method are described in the appendix of Shutler et al. (2016), including corrections where direct calculation is not possible. White areas near the coast result from missing values of one or more necessary variable and the integrated fluxes reported below include the first-order correction explained by Shutler et al. (2016).
Integrated fluxes are calculated as an integration of equation (1) over the surface of a spheroidal Earth. Where a grid cell includes land or ice, the integrated flux is adjusted by the default scheme described in the appendix of Shutler et al. (2016). Ice fractions were drawn from the OSI-SAF/SSMI sea ice data record. As noted already, analysis in this study is restricted to "global" air-sea fluxes and to four oceanic areas. The four regions are illustrated in the supporting information ( Figure S2). A relatively small region south of 60°S is defined as "Southern," while the main oceanic expanse and marginal seas are assigned as "Pacific," "Atlantic," or "Indian" according to International Hydrographic Office (IHO) definitions. The sum of these four regions provides a global value. This global value excludes inland seas such as the Mediterranean Sea but includes the deep ocean and their marginal seas. Generally, through widespread international effort and collaboration the sampling of upper ocean CO 2 measurements has increased over the last few years, as reflected in Figure 2. However, the increase has not been steady and the geographical distribution is not even. The Northern Hemisphere is sampled better than the Southern Hemisphere. Political instability and piracy has deterred measurements in large parts of the Indian Ocean for several years, while some locations, for example, within the seas of Southeast Asia, lack any data.

Experiments
Experiments are designed to explore sensitivities of global and regional fluxes to several factors. The first two sets of experiments are anticipated in the preceding subsections, but in this subsection we will summarize all the experiments.

Temperature and Salinity Gradients
As described in section 2.2, skin effects of −0.17 K and +0.1 PSU are assumed as standard, but one set of calculations was conducted with no skin effect (i.e., using subskin temperatures and salinities throughout the calculation).

Transfer Velocities
As described in section 2.1, several parameterizations of transfer velocity are available. We made calculations using the eight polynomial relationships to wind speed described in Table 1. Standard temperature and salinity gradients were applied.

Sparse Sampling
As described in section 5.4 of Goddijn-Murphy et al. (2015), the influence of the fortuitous timing and track of specific measurement cruises can be evaluated by a bootstrap experiment. We constructed an ensemble of ten synthetic data sets by selecting randomly with replacement from actual cruises (using the cruise ID within SOCAT). Each synthetic data set was processed in exactly the same way as the standard data set. Atmospheric data sets, wind speeds, sea surface temperatures, and salinities are not afflicted by sparse sampling to the same degree as surface ocean fugacities of CO 2 , and we used a standard data set for these other variables throughout all the calculations.

Year of Sampling
Atmospheric partial pressures of CO 2 rose at slightly more than 1.5 μatm/year on average through the period 1980-2000 but accelerated more recently. A fairly steady increase of~2.0 μatm/year in the period 2000-2010 was followed by a further acceleration in recent years (Ed Dlugokencky and Pieter Tans, National Oceanic and Atmospheric Administration/Earth System Research Laboratory; www.esrl.noaa.gov/gmd/ccgg/ trends/). Over a similar time period, analyses of surface ocean have not shown clear evidence of an acceleration in the rise of fugacity (Zeng et al., 2014), which underpins our decision to apply a steady 1.5-μatm/year increase throughout the central calculations. Part of the problem is the need to extrapolate old measurements forward to 2010. For example, if we extrapolate forward fugacities from 1990 to 2010, then an error of 0.5 μatm/year in the trend leads to an error of 10 μatm in the 2010 value. If extrapolated globally, that amounts to an error of >2 Pg C/year in the net flux, which is clearly unacceptable. Since only~24% of the data precede 2005, the situation is less serious, but there are regions and months where older data and the assumed trend may substantially affect the calculated result. A few key issues are apparent from Figure 2; for example, more than 50% of data in the Indian sector precede 2004, including almost 20% in 1995 alone.
There is an issue with applying a simple and uncertain trend and the procedure and experiments described

10.1029/2018GB006041
Global Biogeochemical Cycles in the following two paragraphs are designed respectively to minimize the effect and to assess the uncertainty.
A bias in the value of a steady trend should be less critical if the reference year is centered within the period of data, since the error for data extrapolated forward should be balanced by the error for data extrapolated backward (assuming approximately the same number of data before and after the reference year). A poor value of trend will result in local errors and increased uncertainty, but the central estimate of the net global flux should be affected only slightly, unless the trend accelerates or decelerates. Local and regional patterns of sampling (e.g., if a particular oceanic region is mainly sampled before the reference year) will introduce regional biases.
Our study included simulations where only SOCAT data from 2005-2014 (roughly centered on 2010) was used. We present data for only a single transfer velocity parameterization (Nightingale et al., 2000), but calculations are presented for two applied trends, 1.5 and 2.0 μatm/year.

Temperature and Salinity Gradients
Calculations were conducted with and without skin effects using the transfer velocity parameterization of Nightingale et al. (2000); Model 7 in Table 1. This pair of calculations show that the combined effect of the thermal and haline skins is an increase in global net flux of −0.35 Pg C/year. Note that this amount is proportional to the atmospheric concentration and to the transfer velocity but is insensitive to upper ocean concentrations and should be reliable given reasonable estimates of the global mean skin differences (−0.17 K and +0.1 salinity) and global mean transfer velocity.

Transfer Velocities
Fluxes were calculated globally and regionally using each of the eight models described in Table 1. The results for the annual and global totals are summarized in Figure 3. (Some seasonal and regional results are included in sections 3.3 and 3.4 below.) The presentation in Figure 3 illustrates the variation in both gross and net downward fluxes among the eight models described in Table 1. An increase in a transfer velocity will always increase the gross downward flux, but the situation is more complicated for net fluxes. There is a net upward flux in CO 2 in some locations and months. An increase in transfer velocity at those locations will decrease the total net downward flux. It is useful to compare the effect of a transfer velocity model on both net and gross fluxes, since both comparisons can provide insight. The net fluxes vary between~3.6% and~4.2% of the gross flux among the eight models.
Among all eight models, the annual net global flux ranges between −2.4 and −4.3 Pg C/year, but most values cluster around a net flux of around −3 Pg C/year on a gross downward flux of~−80 Pg C/year. Analyses of subsets of this data are described in section 4.2.

Sparse Sampling
The influence of sparse sampling is illustrated by presenting values from the standard calculations along with those of the 10 members of the ensemble from the bootstrap experiment. Results are calculated using Model 1 of the transfer velocity (Ho et al., 2006), but effects of sparse sampling will be similar for other models. The annual net global CO 2 flux varies with a standard deviation of 2% among the ensemble members about a mean of −2.99 Pg C/year, while no member varies by more than 4% from the standard value (−3.06 Pg C/year).
Annual net values are shown in Figure 4. The results suggest that the values for each oceanic region are reasonably robust. The greatest variation is seen among the values for the Indian region, but the Pacific and Atlantic also contribute significantly to the global uncertainty.  Table 1. Net fluxes are the gray bars; gross fluxes are the points. Note the different scales on the two y axes (coincidence implies a net flux that is 4% of the gross flux).
Sparse sampling is a more significant issue for seasonal variations. Standard and ensemble values are presented by calendar month and region in Figure 5. These variations among ensemble members are larger than those of annual values; however, values for most regions and months seem to be fairly robust. There are several combinations of month and region that show high variation, and these can be identified as significant gaps in sampling. The Southern Ocean seasonal cycle appears robust, with a substantial downward flux in the austral summer (peaking in February) and a near balance in winter and spring. However, examination of the spatial distribution of the data indicates that the great majority of the data in winter comes from a restricted region around Drake Passage, so the bootstrap technique may not adequately reflect the true uncertainty of the fluxes in the Southern Ocean as a whole (see below, section 4.2.2.2). In the Pacific Ocean, variation is substantial in all months but peaks in September to November. Among the Atlantic sector results, June and August appear to be the most problematic. The net downward flux in the Indian sector is small from December to March and appears to be well constrained by sampling in those months, while the net downward flux is larger in the rest of the year, but less adequately constrained by data.

Year of Sampling
In addition to the standard results based on upper ocean data since 1990 (as illustrated in Figure 2), we have two sets of calculations restricting data to the period 2005 to 2014. As expected, since the data for the latter two sets are distributed around the reference year of 2010, the sensitivity to the assumed trend was almost removed. Interestingly, for both trends, the annual net downward CO 2 flux was decreased by more than 0.2 Pg C/year (−2.86 Pg C/year for all data and 1.5-μatm/year trend; −2.65 Pg C/year for data since 2005 and 1.5-μatm/year trend; −2.63 Pg C/year for data since 2005 and 2.0-μatm/year trend). However, on closer examination, we determined that the results when restricting the period of data were less credible than the standard results, as we will explain below.
Examining the results of calculations by region and calendar month is revealing (Figure 6). It is clear that in most regions and months there is very little difference in results among the three sets, but the standard values diverge from the other two sets in several key cases. In the global values of net flux, the restriction by year greatly reduces the net downward flux in June, November, and December, but this is compensated partly by higher values in August. Breaking down the analysis by region reveals more: The result in June is related to a data set contributing to values in the Atlantic and Indian sectors. The difference in August relates to Indian Ocean data. The differences in November and December are related to early data sets for the Pacific Ocean. Crucially, in every case that we have examined, the large difference is a result of excluding data in regions and seasons where no data since 2005 are available. That exclusion places an even greater burden on interpolation, and the results of geospatial interpolation are manifestly poor when early data are

10.1029/2018GB006041
Global Biogeochemical Cycles excluded. The global and regional seasonality is affected by excluding data prior to 2005 and in our judgment; the result appears far less credible. That finding is unfortunate, since it is clear (from the almost complete coincidence of these two sets of results in Figure 6) that excluding earlier data and centering the data can greatly reduce sensitivity to secular trends. Nonetheless, by using all data from SOCAT Version 4, and centering on 2010, we judge that the confidence intervals can be narrow; specifically, the global interval could exclude the "2005 to 2014 only" value. A substantial challenge remains to sample CO 2 in the upper ocean sufficiently, but we are encouraged that only a limited number of region and month combinations exhibit major issues. In most cases, the seasonal dependence can be inferred with reasonable confidence by informing assessment of poorly sampled months with neighboring better sampled months, but the last few calendar months in the Pacific are a major concern. Sampling efforts could be prioritized further using maps of uncertainty. Such maps are not included here but are a direct product of the interpolation method (Goddijn-Murphy et al., 2015).
Using multiyear data remains a necessary compromise for global calculations and calculations of many regions, including using data from the twentieth century for evaluations of the recent oceanic sink.

Expression of Uncertainty
In this study, uncertainties are expressed following the guidelines set out by the Joint Committee for Guides in Metrology (BIPM, 2008) as far as is practical. Those guidelines seek a "realistic" rather than a "safe" estimate of uncertainty (see Annex E in BIPM, 2008). That objective leads to combining uncertainties irrespective of whether individual components are random or systematic in nature. The logic and mathematics of the method are summarized in this subsection and then their application to the global net flux of CO 2 is described in the following subsection.
The simplest statistical analyses of uncertainty are based on "n" repeated, independent measurements of a quantity (a "Type A" evaluation, in the terminology of BIPM, 2008). Since this method is standard, we summarize it very briefly here but provide a more thorough treatment in the supporting information for reference.
A simple analysis of a set of repeated measurements yields a central estimate of the true value, q; and its standard uncertainty, u. The value of interest might be inferred from the measurement of one or more related quantities ("inputs"). If not all substantial uncertainties are included in a single measurement process, then a combined uncertainty, u c , may need to be calculated from uncertainties arising from individual inputs. In the case of independent errors, the combined uncertainty can be calculated as the square root of the sum of uncertainties squared. Confidence intervals are placed on the quantity, Q = q ± e u c , where e = 2 or e = 3 are commonly used, the choice depending on how conservative an estimate is required.
It is not always practical to evaluate an uncertainty by a standard Type A method. A key philosophy of the BIPM methodology is that all substantial uncertainties should be considered irrespective of whether a Type A estimate is practical. Type B methods are diverse and include any method of evaluation that is not Type A (BIPM, 2008). Type B evaluation requires more expert judgment but can be as reliable as Type A given adequate information. A combined uncertainty is calculated ignoring the method of evaluation.

Global Biogeochemical Cycles
This study requires Type B evaluations to be included. For key uncertainties, we will use the same equations for sample mean and standard error as a Type A evaluation, but typically using a only small number of estimates. In this case, the experimental average, standard deviation, and standard uncertainty will be poor estimates of the true values. Perhaps more importantly, since the effective number of degrees of freedom will be small, the level of confidence associated with a defined extended uncertainty will be weaker than for Type A uncertainties and poorly defined. Therefore, while an interval can be reported, its definition is inexact. We will calculate an interval using e = 2, thus quoting a value in the following form: We can reasonably state that the value is "likely" to fall within the defined interval, but we cannot be more specific.

Global Net Flux 4.2.1. Central Value
The main purpose of this study was to investigate key uncertainties, but first, we comment on the central estimate of the global net air-sea flux of CO 2 , about −3.0 Pg C/year in 2010. This value is much larger than estimated by Takahashi et al. (2009) Figure 7). For the carbon budget to be balanced, a larger oceanic sink requires a smaller land sink, which requires further investigation.

Uncertainties
Quantitative estimates of uncertainty will be limited to the annual global net flux of CO 2 (referenced to 2010) and will be calculated by the methods outlined in the previous subsection. We begin with considering uncertainty related to the transfer velocity and then consider upper ocean sampling and thermal and haline effects. Our conclusions are contingent on having included all major sources of uncertainty in the global value.

Transfer Velocity
Models of transfer velocity were chosen to elucidate various sources of uncertainty in the net fluxes (sections 2.1, 2.3.2, and 3.2). Models 1 to 7 are based on estimates of gas transfer velocity by a single experimental method, that is, DTEs, while Model 8 is a highly cited model based on micrometeorological fluxes of CO 2 (McGillis et al., 2001). By choosing different subsets of these models, we will reach four estimates of uncertainty. Since all fluxes are simply proportional to the transfer velocity, a particular percentage uncertainty in transfer velocity translates to an identical percentage uncertainty in the flux.
Two estimates (A and B) of uncertainty will be based on Models 1 to 4, which are different statistical fits to a single data set. Model 1 (Ho et al., 2006) defines a simple square-law dependence on wind speed with reported confidence limits of the single fitting coefficient, "c 2 ," of 7%. Models 2-4 are taken from a response by Ho et al. (2007) to a comment on the previous paper. Ho et al. argue that while Models 2-4 also fit the data, they can be excluded. However, in our opinion there is no theoretical reason to favor a simple square law and only Model 2 is poorer empirically. The uncertainty identified here is that associated with the statistical model and scatter within the data set; both a structural uncertainty (the choice of model) and a parameter uncertainty associated with the inability of any simple model to describe all the variance in the data. The average flux and the structural uncertainty can be estimated by applying equations (S7)-(S10) to a subset of the models, but note that the number of models is small and they are not independent; therefore, these are not Type A uncertainties. The parameter uncertainty can be estimated approximately from the reported confidence limits of ±7% in Model 1. For the purpose of completing calculations similar to equations (S13) and (3), those limits are interpreted as a standard uncertainty of 3.5% and assumed to be independent of the structural uncertainty. Two estimates of structural uncertainty alternately exclude (Estimate A) or include (Estimate B) the least credible model, Model 2. The results are included in Table 2. Note that in each case, the combined standard uncertainty is calculated from the parameter uncertainty and the structural uncertainty using equation (S13). The application of that equation is questionable, since the uncertainties are not independent, and has a substantial effect as the structural and parameter uncertainty are comparable. For Estimate A, final limits of ±9% are estimated, while ±13% is calculated for Estimate B.
A different path to estimating uncertainties related to transfer velocities is to take a number of "expert opinions" and analyze the transfer velocities and fluxes resulting from those. This method is similar in principle to sending samples to a number of laboratories and analyzing the variation among the laboratory reports. (An opinion may be accompanied by expressions of uncertainty, but if we include those uncertainties in a calculation of combined uncertainty then that is likely to be "double counting" [see section 4.3.10 of BIPM, 2008] since a large part of both the "difference in opinion" and "limits of confidence" will share common sources.) The method is hazardous, first, since there are subjective judgments about what opinions should be included and, second, since the more credible opinions are not really independent. In the case of transfer velocities, most of the models considered are based on a single type of experiment-DTEs (see explanation at the end of section 2.1)-but Model 8 has been deliberately chosen to include a contrary opinion. The remaining models, Models 1 to 7, can be used to investigate uncertainty internal to the DTE method, but they cannot shed light on biases common to all applications of the DTE method (e.g., Goddijn-Murphy et al., 2016;Jacobs et al., 2002). Thus, any estimate based on any subset of Models 1 to 7 is likely to underestimate the true uncertainty. On the other hand, by including Model 8, we are introducing a more controversial opinion and might exaggerate the true uncertainty. Generally, including methods other than DTEs will increase the calculated uncertainty, since the results of other methods often differ substantially from those of DTEs. Specifically, Model 8, based on direct flux measurements of CO 2 (McGillis et al., 2001), implies relatively high transfer velocities in strong winds. More recent direct flux measurements of CO 2 report a diversity of results, ranging from further evidence of very high transfer velocities in strong winds (e.g., Blomquist et al., 2017) to values fairly consistent with the DTE-based models (e.g., Bell et al., 2017).
Models 1 to 4 have already been analyzed (Estimates A and B above and Table 2) and share the same data and authorship. Model 5 also shares the same transfer velocity data with Models 1 to 4, but Smith et al. (2011) combined these with different wind speed data and Model 5 predicts the highest fluxes among Models 1 to 7. Models 6 and 7 share experiments with the other models but provide a different-though not independent-interpretation. Therefore, we use Models 1 and 5-7 to analyze variation in expert opinion, limited to interpretation of DTEs (Estimate C). We add Model 8 to form a larger subset and a broader variation in expert opinion (Estimate D). In each case, the variation among the model outputs is interpreted through equations (9)-(11) to a complete value of the standard uncertainty.
This study assesses transfer velocity models based on local process studies and their extrapolation globally. An inventory of Bomb 14C in the ocean can provide a global constraint (Naegler et al., 2006; Table 2 Estimates of Combined Standard Uncertainty in Transfer Velocity (Equation (14)) and the Inferred Limits of Confidence (Equation (17)
Applying the constraint to a square law dependence on wind speed yields a coefficient c 2 = 0.27 (Takahashi et al., 2009) or 0.251 (Wanninkhof, 2014), values that are satisfyingly close to Model 1. Other polynomial dependencies on wind speed could meet the same single constraint and it is not possible to find a unique solution. Since the Bomb 14 C flux is largely downward, it is more directly a constraint on the gross downward flux (see Figure 3). An approximate understanding of the effect of this constraint can be derived through normalizing the net fluxes, by shifting all gross downward fluxes to a single value. It is apparent that the Bomb 14 C constraint could recalibrate Models 2, 5, and 8 to give net fluxes in a similar range to the other models (Model 1 and Model 5 would then be identical). However, while the values of Models 1, 3, 4, 6, and 7 would be rearranged, their range would not be greatly affected. In conclusion, while global inventories provide a useful check on models of transfer based on tracer studies, they are insufficiently accurate to narrow confidence intervals. Table 2 summarizes the four Estimates. The mean of the four flux values together imply a net air-sea flux of greater than −3.0 Pg C/year. On the other hand, Models 4, 6, and 7 are among the most credible models and all yield a net air-sea flux close to −2.85 Pg C/year. On balance, we think that −3.0 Pg C/year represents a reasonable "consensus" value of the net flux. Among the estimates of uncertainty, two suggest a standard uncertainty of approximately 5% (limits of confidence, ±10%), while the other two suggest a relatively high standard uncertainty of up to 10% (limits of confidence, ±20%). A rational case can be made for either conclusion and we will consider both possibilities in the following. We continue now to consider other sources of error in the estimated flux.

Ocean Sampling
The results from the experiments on sparse sampling (section 3.3) and year of sampling (section 3.4) are revealing. It is important here to distinguish between "sparse sampling" (requiring interpolation between distant cruise tracks, but within coherent water bodies) and "absence of data" (requiring extrapolation into oceanographically distinct regions). The experiments reveal the uncertainty related to interpolation but can only hint at issues related to unsampled oceanographic regions.
Applying equations (S8)-(S10) to the members of the ensemble described in section 3.3, we find an average of −2.99 Pg C/year with a standard uncertainty of only 0.6%. The standard uncertainty is quite low compared to a~2% difference between the standard value (−3.06 Pg C/year) and the average of the ensemble, but the standard value would not be an outlier among the ensemble.
Uncertainties are larger when we resolve particular oceanic regions ( Figure 4) and specific calendar months ( Figure 5), but most results remain fairly robust. The accumulation and quality controlling of data in the SOCAT databases contributes greatly to reducing uncertainties to a useful level even in many remote regions. One noteworthy result is an apparently robust estimate of the net annual downward flux south of 60°S (the IHO definition of the Southern Ocean) of~−0.13 Pg C/year (see Figures 4 and 5). That result contrasts with an upward flux calculated by Takahashi et al. (2009) south of 58°S but is far more consistent with other analyses of the global carbon cycle (e.g., Gruber et al., 2009).
Excluding older data would be desirable, but we find that excluding data prior to 2005 makes the results less credible (section 3.4 and Figure 6). By choosing a fairly central reference year, 2010, we have avoided a strong sensitivity of the global flux to the assumed trend in upper ocean CO 2 . However, we should consider the three values of net global flux from these experiments (see section 3.4) in reaching an estimate of the combined standard uncertainty resulting from upper ocean sampling (including the necessary choice of a trend). In conclusion, a standard uncertainty related to sampling of 3% seems reasonable for the global net flux.
The bootstrap method used in this study randomly resamples individual cruises irrespective of their age.
Since the method does not resample periods, it cannot test for effects of decadal or multi-decadal variability. It is possible that our global and regional estimates for 2010 are distorted by samples from earlier decades, but that cannot be evaluated.

Thermal Effects
The standard runs in this study assume a cool skin effect of −0.17 K and a salty skin effect of +0.

Global Biogeochemical Cycles
is uncertainty in the magnitude of both skin anomalies and in the effect of a thermal skin effect on CO 2 flux, while the effect of warm layers is not explicitly included. We will make a "Type B" estimate of the standard uncertainty associated with thermal and haline effects separately.
Several factors need to be considered in estimating the flux uncertainty related to thermal effects. There is a distribution in the values of the measured thermal skin difference (Donlon et al., 2002) and both systematic and varying differences between SST products from different instrumentation (Merchant et al., 2014). Our value of −0.17 K is taken from the analysis of Donlon et al. (2002), who report an average cool skin effect of −0.17 K for a wind speed greater than 6 m/s with a root-mean-square variation of 0.07 K. Random variations are relatively unimportant to a calculation since the resulting standard uncertainty is very small given the large number of measurements. Differences of the average values among data sets (e.g., Table 1 of Donlon et al., 2002, shows averages ranging from −0.14 to −0.20 K) are more significant, since they may imply either small methodological biases or regional and seasonal differences. Nevertheless, the challenge of measuring the thermal skin effect is no longer a major impediment. Two competing systematic errors may be more significant . First, the calculation of the effect on CO 2 flux depends on choosing a model of how the temperature profile within the thermal skin affects the CO 2 profile (sections 4 and 5 of Woolf et al., 2016). The calculations in this study assume the "rapid model," while the alternative "equilibrium model" would predict a greater effect. Second, the presence of warm layers (Section 6 of Woolf et al., 2016) implies an opposing effect on CO 2 fluxes. Since there are two opposing effects of similar magnitude, we do not apply a systematic correction but apply a combined standard uncertainty of 0.04 K to the thermal skin effect, which implies a standard uncertainty of 3% in the net global flux.

Haline Effects
The size of the "salty skin anomaly" is highly uncertain (Yu, 2010;Zhang & Zhang, 2012), and the action of the anomaly on CO 2 fluxes is complex (section 7 of Woolf et al., 2016). We suggest a standard uncertainty of 0.1 PSU on the best estimate of 0.1 PSU is reasonable. Even with that large relative uncertainty, the effect on the net global flux is relatively small, 1.7%.

Combined Uncertainty
We can now combine our four sources of uncertainty (transfer velocity, sampling, thermal, and haline) into a combined standard uncertainty of the flux. As explained earlier, we retain two representative values of the standard uncertainty related to transfer velocity (5% and 10%). Both of these values are large enough to dominate over the other three values and we calculate combined standard uncertainties of (5 2 + 3 2 + 3 2 + 1.7 2 ) 1/2 = 7% and (10 2 + 3 2 + 3 2 + 1.7 2 ) 1/2 = 11%. Using equation (3), we can then calculate two alternative limits of confidence in the net global flux, −3.0±0.4 or −3.0±0.7 Pg C/year, which probably span the reasonable range.

Other Sources of Uncertainty
This study is based on key experiments rather than an exhaustive model of all uncertainties. We have argued a priori that uncertainty in our understanding of the transfer velocity and sparse sampling of upper ocean CO 2 are critical, with temperature and salinity sampling also important. Some uncertainties are not considered explicitly, for example, those relating to different data sets on wind speed, temperature, and salinity, since we have adopted a preferred single data set for each of these variables. We assume that basic understanding rather than faults in the wind speed data are primarily responsible for uncertainties in transfer velocity. We acknowledge that measurement uncertainty in the wind speed is sufficient to imply substantial uncertainties in air-sea heat flux calculations (Kent et al., 2013) and can affect gas flux calculations (Fangohr et al., 2008;Roobaert et al., 2018) but stand by the statement that model uncertainty is more important to gas transfer velocities. Other uncertainties of interest-but neglected in our calculations-include those related to the covariance of variables (these are ignored when we use averages for each cell) and the interaction of wind, waves and currents (Villas Bôas et al., 2019).

Limitations of the Standard Model
The models of gas transfer used in this study are simple in principle and easy to apply to climatological calculations and within Earth system models. However, these models are probably too simple, since they ignore mechanistic processes . Simplifying air-sea gas transfer to a simple case of wind-driven stirring excludes processes including convection, modification by surface-active materials (Pereira et al., 2018), and the full complexity of wind-wave-current interaction, wave breaking, and bubbles. The effect of rain (Ashton et al., 2016) is also neglected. All of these complications could introduce regional and seasonal biases that probably have little effect on the global gross flux but may introduce substantial biases in all seasonal and regional fluxes and in the calculated global net flux.
Modeling of carbon isotopes by Krakauer et al. (2006) suggests a regional dependence of transfer velocities contradictory to the simple models tested here. Altering the latitudinal distribution of transfer velocities to lower transfer velocities at high latitudes, while raising them at low latitudes, may be compatible with tracer data and would lower the net downward CO 2 flux. On the other hand, the asymmetry of bubble-mediated transfer may also introduce an additional downward flux of~0.1 Pg C/year (Zhang, 2012) or more (Leighton et al., 2018). In summary, the central estimate of −3.0 Pg C/year may be reasonable, but we should be cautious in our estimates of confidence, veering toward the more conservative of the two limits of confidence already discussed. Limits of confidence of ±0.6 Pg C/year are proposed.

Conclusions
The uncertainties in air-sea CO 2 flux estimates based on applying the air-sea flux equation are reducing as the most pernicious sources of error are eroded. Fairly consistent models of air-sea gas transfer velocity by different authors and a far greater accumulation and consistent quality controlling of upper ocean data all contribute to greater confidence in estimates. However, the process is ongoing and both gaps in ocean data and weaknesses in gas transfer models require further attention. As knowledge of transfer velocity and sampling of the upper oceans improves, some other sources of uncertainty will gain priority.
A contemporary global annual net air-sea CO 2 flux of −3.0±0.6 Pg C/year (referenced to 2010) is likely, though a value outside of those bounds cannot be ruled out. The primary cause of uncertainty in this global annual value is uncertainty in transfer velocities, but if the standard uncertainty of transfer velocity is reduced to <5%, then uncertainties in thermal effects and sampling become important. For the "referenced, annual, and global" value of net air-sea CO 2 flux, the sampling of upper ocean CO 2 appears to be adequate, but sparsity of sampling is a much greater issue for regional and seasonal fluxes and is insufficient for annually resolved fluxes. Confidence limits will necessarily be broader for reference years distant from 2010, since uncertainty in the trend of upper ocean CO 2 is a major issue, especially if the reference year is not centered within the available data.