A statistical gap‐filling method to interpolate global monthly surface ocean carbon dioxide data

We have developed a statistical gap‐filling method adapted to the specific coverage and properties of observed fugacity of surface ocean CO2 (fCO2). We have used this method to interpolate the Surface Ocean CO2 Atlas (SOCAT) v2 database on a 2.5°×2.5° global grid (south of 70°N) for 1985–2011 at monthly resolution. The method combines a spatial interpolation based on a “radius of influence” to determine nearby similar fCO2 values with temporal harmonic and cubic spline curve‐fitting, and also fits long‐term trends and seasonal cycles. Interannual variability is established using deviations of observations from the fitted trends and seasonal cycles. An uncertainty is computed for all interpolated values based on the spatial and temporal range of the interpolation. Tests of the method using model data show that it performs as well as or better than previous regional interpolation methods, but in addition it provides a near‐global and interannual coverage.


Introduction
The world's oceans absorb approximately 25% of the total anthropogenic emissions of carbon dioxide (CO 2 ) released into the atmosphere every year [MikalofFletcher et al., 2006;Le Qu er e et al., 2009]. Understanding oceanic fluxes of CO 2 is critical to explain present and project future perturbations of the global carbon cycle caused by human activities. The air-sea fluxes are driven primarily by the difference in the concentration of CO 2 between the atmosphere and the ocean surface. The concentration of CO 2 in surface water is commonly expressed as either the partial pressure (pCO 2 ) or fugacity (fCO 2 ) of carbon dioxide. Over 10 million surface ocean fCO 2 measurements have been collected since 1968 [Takahashi and Sutherland, 2013;Pfeil et al., 2013;Bakker et al., 2014]. The majority of these measurements have been obtained in the northern hemisphere ( Figure 1a) during the past 20 years (Figure 1b), which restricts in-depth analysis of global patterns and long-term trends.
The relatively limited distribution of surface ocean CO 2 measurements has restricted most mapping efforts to calculating climatological products of the seasonal cycle [Takahashi et al., 2002] or examining long-term trends [e.g., Takahashi et al., 2003Takahashi et al., , 2009Fay and McKinley, 2013], with little or no emphasis on variability at other temporal scales. Until recently, only regional studies have focused on CO 2 variability on subannual time scales [e.g., Bates et al., 1998;Sarma, 2003;Shim et al., 2007;Olsen et al., 2008;Litt et al., 2010] and on interannual variability [Bates et al., 1996;Gruber et al., 2002;Cosca et al., 2003;Wong et al., 2010]. Some spatial and temporal interpolation efforts have been published using a variety of techniques based on harmonic curve fitting [e.g., Schuster et al., 2009] or on empirical relationships between CO 2 and proxy variables such as sea surface temperature, salinity, chlorophyll and mixed layer depth [Boutin et al., 1999;Lefèvre and Taylor, 2002;Cosca et al., 2003;Ono et al., 2004;Olsen et al., 2004;Lohrenz and Cai, 2006;Park et al., 2006;Jamet et al., 2007;Watson et al., 2009;Park et al., 2010;Telszewski et al., 2009]. The geographic and temporal scope of most of these studies has been limited to the relatively observation-rich regions. Furthermore, the use of proxy variables creates additional uncertainties due to the assumption that the relationships between CO 2 and these proxy variables are constant in time [Boutin et al., 1999;Lefèvre and Taylor, 2002;Cosca et al., 2003;Jamet et al., 2007;Park et al., 2010]. project (fCO 2 ) [Pfeil et al., 2013;Bakker et al., 2014] has provided opportunities for a more detailed global analysis of surface ocean CO 2 over multiple time scales. Interpolated products of surface ocean CO 2 observations covering multiple years are valuable to characterize trends and variability. They can provide insights into the response of oceanic CO 2 to climate change and variability and the driving processes , provide the prior estimates necessary for atmospheric inverse methods [e.g., Gurney et al., 2002], and help validate ocean biogeochemical models [e.g., Le Qu er e et al., 2009].
A number of methods are currently being developed to globally interpolate surface CO 2 observations using a range of techniques, including neural networks [Sasse et al., 2013;Landsch€ utzer et al., 2014;Zeng et al., 2014], diagnostic inverse models [R€ odenbeck et al., 2013], biogeochemical models [Valsala and Maksyutov, 2010], and multilinear regressions [Park et al., 2010]. Many of these methods are extensions of previous regional scale interpolations [e.g., Schuster et al., 2009;Telszewski et al., 2009]. This paper presents a statistical gap-filling method to interpolate surface ocean fCO 2 in space and time for the entire global ocean south of 708N. The method interpolates fCO 2 observations using a combination of spatial autocorrelations of fCO 2 observations within a ''radius of influence'' as used in the World Ocean Atlas [Jones et al., 2012;Cressman, 1959;Barnes, 1964;Levitus, 1982], harmonic curve fitting as used in GLOBALVIEW [Keeling et al., 1989;Masarie and Tans, 1995], and cubic spline fitting as used, for example, in Bacastow et al. [1985]. Our approach does not rely on proxy data, but uses only fCO 2 observations. The method includes an assessment of the uncertainty for every interpolated value, both by considering the amount of interpolation performed, and by carrying out a verification using model output, which provides information on the limitations of the interpolation.

Method
Our method uses the SOCAT v2 database [Bakker et al., 2014], which contains 10.1 million individual surface fCO 2 observations obtained between 1968 and 2011. We focused on the 1985-2011 time period, which encompasses 99.7% of the observations.

Data Preparation 2.1.1. Gridding
The SOCAT v2 observations were binned into 2.5832.58 grid cells with daily temporal resolution. Leap years were converted to 365 day duration by dividing the year into even lengths of 1 1 365 calendar days. The complete data set was analyzed to remove any outliers that would adversely affect the subsequent curve and cubic spline fitting routines: in each grid cell, observations falling outside three standard deviations of the mean daily fCO 2 value were discarded in an iterative process, repeated until no further outliers were detected. 604 days' outliers (0.5% of the gridded observations) were eliminated in this manner. Setting the threshold to two standard deviations would have removed 12.2% of observations, severely impacting the performance of the method. 2.1.2. Spatial Variability of fCO 2 The radius of influence over which observations were interpolated was dependent on the spatial variability of the fCO 2 observations, which was measured using two metrics: a spatial autocorrelation analysis to quantify the spatial extent over which fCO 2 observations are related, and the mean difference in fCO 2 between all neighboring grid cells to assign a magnitude of difference between observations. For the spatial autocorrelation analysis, autocorrelation functions (ACFs) were calculated for each grid cell following the technique used in Jones et al. [2012]. A spatial ACF was calculated for each cruise in the SOCAT database using the Moran's I method [Moran, 1950], and the resulting e-folding length (i.e., the distance at which the autocorrelation coefficient drops below 1/e) assigned to each cell through which the cruise passed. This was used as a first guess of the decorrelation length for that cell, i.e. the distance beyond which fCO 2 observations were deemed unrelated. The cell's decorrelation length was refined by calculating the ACF for only those observations within a radius of five times the first guess ACF. The e-folding length of this refined ACF was used as the final spatial decorrelation length for the cell. The mean e-folding length was used where multiple cruises passed through a given cell. The decorrelation length for a given cell varied with the compass direction of the ship's heading as it passed through the cell, particularly in strong ocean currents. To provide the greatest accuracy, four directional spatial decorrelation lengths were calculated for each cell, one for each direction (i.e., north-south, east-west, northeast-southwest, and northwest-southeast). The mean decorrelation length for each direction was calculated using only those cruises traveling in the relevant direction. A ''directionless'' decorrelation length was calculated where there were insufficient ACF data to construct directional decorrelation lengths, using all cruises regardless of their direction of travel.
The difference in fCO 2 between grid cells was calculated for each pair of grid cells in turn. Whenever observations were made in both grid cells within 7 days, the absolute difference between them was recorded. The mean of these differences over the entire time period was recorded as the difference in fCO 2 between the two grid cells. This process was repeated for every pair of grid cells in the ocean.
The combination of these two metrics provided a dual assessment of the spatial variability of fCO 2 . For a given grid cell, it was possible to estimate both the spatial autocorrelation with other grid cells (Figure 2, red shading) and the magnitude of the difference in fCO 2 with other grid cells (Figure 2, numbers). Both of these metrics of spatial variability were used when spatially interpolating the observations (see section 2.2.3).

Gap Filling
The method developed here combined temporal [Masarie and Tans, 1995] and spatial [Cressman, 1959;Barnes, 1964] interpolation techniques. No interpolation was attempted poleward of 708N as there were too few observations available.
The interpolation technique developed here comprised a series of distinct stages applied iteratively on each 2.5832.58 grid cell (hereafter referred to as the target cell). Here we provide an overview of the interpolation technique and detail the individual steps afterward. Figure 3 provides an overview of the complete process used to gap-fill the fCO 2 data, applied in parallel to each cell. For each target cell, its time series of observations was retrieved ( Figure 3, box 1) and a temporal curve fit was attempted (Figure 3, box 2; section 2.2.2). If the curve fit was not successful, the spatial interpolations were performed (Figure 3, box 4; section 2.2.3) and the temporal curve fit attempted again. This was repeated until either a successful curve fit was achieved or no new observations could be interpolated into the target cell's time series (Figure 3, box 6).

Overview of the Gap-Filling Method
Even if a curve fit successfully passed the criteria for a valid fit (see section 2.2.2), it was possible that additional spatial interpolation could further constrain the fCO 2 curve fit for the target cell. This was because a time series with only a few data points may not have captured the full characteristics of the temporal variation of fCO 2 . The gap-filling method accounted for this by performing one more iteration of the spatial interpolation to produce an extra curve fit ( Figure 3, boxes 8-10). The two curves were then correlated. If the correlation coefficient r 2 0.99 (an empirically determined limit), the curves were deemed to be identical for the purposes of the interpolation, and the original curve fit was used as the final output of the interpolation for the target cell ( Figure 3, box 13). A correlation coefficient of r 2 < 0.99 indicated that the curve fit benefitted from the extra interpolated observations. In this case the interpolation was repeated (Figure 3 13) or no more observations could be added via spatial interpolation ( Figure 3, box 9). The corresponding steps are further detailed in the following sections.

Step I: Curve Fitting
Step I of the gap-filling method fits a curve to the time series of each target cell ( Figure 3, boxes 2 and 10) of the form: where t is the time in days since 1 January 1985, a 0 the y-axis intercept, a 1 the linear trend, and n the maximum number of harmonics used to represent the seasonal cycle. n is initially set to four to allow the fitted curve to encompass deviations from a purely sinusoidal progression of the seasonal cycle that may be caused by biological activity and temperature changes [e.g., L€ uger et al., 2004; K€ ortzinger et al., 2008]. Equation (1) is a simplified version of that used by Masarie and Tans [1995] for atmospheric CO 2 mole fraction, which includes a polynomial term to account for changes in the long-term trend. We omitted the polynomial term here because there were insufficient observations to fit varying long-term trends over the time period examined.
With no constraints beyond the fCO 2 observations, the curve fitting algorithm frequently produced unrealistic fits due to the relative lack of observations in any given cell. Each curve fit was therefore assessed against a number of criteria to ensure that it produced a realistic result, as listed in Table 1. The criteria ensured that the curve was based on data covering an extended time period with observations representing a large proportion of the calendar year; that the fitted curve was representative of the range of fCO 2 observations and exhibited a plausible seasonal cycle; and that the trend of the fitted curve was within known reasonable limits. If the fit failed to meet all criteria, the number of harmonics, n, was reduced by 1 and the curve fit repeated until a good fit was achieved. If no good fit was achieved after n was reduced to 1, the curve fitting was deemed to have failed. A flowchart showing the progression of the curve fit is presented in Figure 4.

Step II: Spatial Interpolation
Step II of the gap-filling method was applied to target cells where a curve fit could not be found using the target cell's own observations alone. In this case, observations from nearby cells were added to the target cell's time series (Figure 3, boxes 4 and 8) and the curve fit was attempted again. Candidate cells for this spatial interpolation were chosen based on the spatial autocorrelation of the SOCAT database (see section 2.1.2). Only cells within the decorrelation length (i.e., whose spatial autocorrelation coefficient was >5 1/e) were included. These candidate cells were then sorted in order of those whose observations had the smallest difference to the target cell's fCO 2 (see section 2.1.2 and Figure 2). Any cell that had no concurrent observations with the target cell was excluded from the interpolation. In the example shown in Figure 2, only candidate cells with both red shading (within the decorrelation length) and a number (concurrent observations) were used in the spatial interpolation.
The observations from the first candidate cell were merged with the time series of the target cell. For days where the target cell's time series contained an observation, no interpolated value was used. For the remaining time steps, the observations from the interpolated cells were added and given a weighting according to the spatial autocorrelation value (i.e., between 1 e and 1). Original observations from the target cell were given a weighting of 1. The merged time series was then used in the next iteration of the temporal curve fitting algorithm.
If the new curve fit was still not successful according to the criteria in Table 1, the second candidate cell was added to the time series. If this cell had any observations on the same days as the previously interpolated cell, they were combined as a weighted mean according to the autocorrelation coefficient between the two grid cells. Again, original observations from the target cell remained unchanged. The curve fit was then attempted a third time. This was repeated until either a successful curve fit was achieved or no more candidate cells were available.
After one iteration 3,807 grid cells (54%) had successful curve fits. After nine iterations 4,736 cells (67%) had successful curve fits and no further curve fits could be achieved. Those cells that could not be interpolated Figure 3. Flow diagram of the steps used to gap-fill fCO 2 data in a single grid cell. A curve is fitted to the cell's time series (2). If the fit cannot be made, or the fitted curve does not meet the criteria in Table 1, spatial interpolation is performed (4) and the curve fitting is repeated (2). If there are no new data points (5), then the interpolation fails (6) and the grid cell is processed in the next iteration of the gapfilling. If the curve fit succeeds, the fitted curve is stored (7) while more checks are performed. Another spatial interpolation is made (8). If this adds more data points (9), the subsequent curve fit is successful (10) and the new curve is significantly different from the stored fit (12), then the new fit is stored (7) and the process repeated. If the extra interpolation does not change the curve, the process reverts to the original stored fit (13) and the interpolation is marked as successful (14).
Journal of Advances in Modeling Earth Systems 10.1002/2014MS000416 either lacked sufficient observations for a curve to be fitted, or the local spatial variability was too high resulting in poor curve fits that were rejected based on the criteria in Table 1.

Uncertainty of Interpolated Observations
Original fCO 2 observations in the target cell's time series were given an uncertainty of 62.5 matm as the default uncertainty for direct surface ocean CO 2 observations [Takahashi and Sutherland, 2013]. Estimated uncertainties in observations copied from candidate cells gap-filled fCO 2 values were calculated as the root mean squared sum of 62.5 matm plus the variation in fCO 2 from the target cell's observations (i.e., the numbers on the map in Figure 2) to account for the spatial interpolation.

Step III: Conversion to Monthly Resolution and Calculation of Uncertainties
Once all possible curve fits had been completed, each cell was converted to monthly resolution and uncertainties were calculated for the complete time series. Each monthly time series was constructed using the curve parameters established from the iterations of Steps I and II ( Figure 5a). In months where original or interpolated observations were present, the weighted mean of those observations (weighted by the autocorrelation coefficient between the target cell and the cell from which the observations were interpolated) was inserted into the monthly time series, replacing the fitted curve value. Original observations from the cell were given a weighting of 1, while interpolated observations were weighted according to the spatial autocorrelation coefficient between the target and interpolated grid cell (red shading in Figure 2). The uncertainty for these observations was calculated as the root mean squared (RMS) uncertainty of the individual observations as in section 2.2.3.1. Uncertainties for the fitted curve where no observations were available were calculated for each month in the seasonal cycle in turn as follows. Any observations taken in January of any Total time range The timespan covered by the earliest and latest measurements in the time series must be at least 5 years.
Short timespans of measurements are unlikely to reflect the long-term characteristics of pCO 2 .

Standard deviation
The standard deviation of the available measurements must not exceed 75 matm.
Curve fits applied to time series with only extreme low and high measurements are frequently unrealistic.

Populated months
Measurements must be available in at least eight of the twelve calendar months at some point in the time series.
Unless at least three of the four annual seasons are represented in the time series, the fitted curve is unlikely to represent a realistic seasonal cycle. Curve ratio The amplitude of the fitted curve must be between 50% and 150% of the range of values represented by the measurements. The upper and lower limits of the curve must not exceed the limits of the measurements by more than 75 matm.
A fitted curve whose amplitude is too small or too large does not represent an accurate fit to the measurements.
Seasonal peaks Plankton blooms can produce a secondary peak in an otherwise sinusoidal seasonal cycle. Only one such peak should exist in the fitted curve. The size of the secondary peak must not exceed 33% of the total magnitude of the seasonal cycle.
Fits of multiple harmonics can produce an over-fitted curve with multiple peaks in the seasonal cycle. This is unrepresentative of the known annual cycles of pCO 2 concentrations.

Linear trend
The fitted linear trend (a 1 in equation (1)) must be in the range 22.5 a 1 4.75 matm yr 21 .
Linear trends outside these limits are unlikely to be realistic.  Figure 3). Initially, a curve is fitted of the form in equation (1), with four harmonics. If a curve is fitted and it meets the criteria for a good fit (Table 1) then the fit is considered successful. If not, the number of harmonics is reduced by one and the fit tried again. If a successful fit cannot be made with three, two or a single harmonic, the fit is deemed to have failed.
Journal of Advances in Modeling Earth Systems 10.1002/2014MS000416 year in the time series were collected together. For each of these observations, the distance between the observation and the fitted curve was added to the uncertainty of the observation itself using a root mean squared sum ( Figure 5b). This represents the uncertainty of the curve fit in relation to that observation. The root mean square of the uncertainties for all January measurements was used as the uncertainty for the curve fit. This was repeated for all other calendar months ( Figure 5b, shaded area). If there were any months with no observations, a linear interpolation was performed between neighboring months to fill in the missing uncertainty. The observations (with their uncertainties) from the target cell and the spatially interpolated values were overlaid on the fitted curve to provide a complete time series (Figure 5c).

Step IV: Spline Fitting
The time series generated from the combination of fitted curve and interpolated observations occasionally resulted in sharp and unrealistic discontinuities (Figure 5c). To eliminate these, each time series was smoothed by fitting a cubic spline function (''smooth spline'') [Chambers and Hastie, 1991], with a smoothing parameter (0.3) chosen to compromise between smoothing out the discontinuities and maintaining the variability from the mean seasonal cycle that the observations represented ( Figure 5d). Uncertainties for the spline fit were calculated as the uncertainty of the original time series plus the difference between the series and the fitted spline. The deviations of the spline fit ( Figure 5d) from the fitted long-term trend and seasonal cycle (Figure 5a) was used to determine the interannual variability in each grid cell where observations were present or had been spatially interpolated. 2.2.6.
Step V: Completing the Gap Filling The grid cells for which no valid curve fits could be found were filled by spatial interpolation of the complete time series (original observations, interpolated observations and the fitted curve) from neighboring cells where curve fits were successfully generated. The time series were only interpolated from directly neighboring cells to reduce the uncertainty and likely errors in the interpolated values. If there was a large area of grid cells to be filled, the area was filled using several iterations with the outer edges (i.e., those

Reconstructing Model Output
We assessed the performance of the gap-filling method by subsampling model output and applying our gapfilling method to these pseudo-data to recreate a complete pCO 2 field. We used pCO 2 output from a simulation of the PlankTOM5 model (updated from Buitenhuis et al. [2010]). pCO 2 is very similar to fCO 2 (typical differences are on the order of 1 matm), so it is an effective measure to use for the method evaluation. PlankTOM5 is an ocean biogeochemical model forced with NCEP daily reanalysis meteorological data [Kalnay et al., 1996]. We use the ORCA2-LIM version which has a spatial model grid of 28 zonally and 0.58 to 28 meridionally [Madec and Imbard, 1996] and 15 time steps per day.
The PlankTOM5 output was regridded at 18318 spatial and daily temporal resolution. From this we reconstructed the individual cruise tracks in the SOCAT database that took place in between 1985 and 2011. The date and location of each observation in each SOCAT cruise was matched with the corresponding value in the regridded PlankTOM5 output to produce an analogous model ''cruise.'' Where more than one observation was taken within a PlankTOM5 grid cell on the same day, only one value was recorded to prevent unrealistically strong spatial autocorrelations over short distances. The model ''cruises'' were then used to calculate the spatial autocorrelation characteristics of the sampled PlankTOM5 output as they were for the original SOCAT database (section 2.1.2). The PlankTOM5 decorrelation lengths were typically longer than those found in the SOCAT observations by a mean of 250 6 500 km due to PlankTOM5's 18 resolution ( Figure 6). The observation locations were typically reported at much higher resolution (up to 0.0018), which means that much finer scales of variability could be detected in the actual observations. The pattern of decorrelation lengths was broadly similar for both the observations and PlankTOM5. The greatest differences were found in regions of high spatial variability, particularly the Indian Ocean and Eastern Equatorial Pacific and coastal regions, all of which were influenced by the model resolution. The coastal Southern Ocean was the only region with seemingly systematic differences in decorrelation lengths.
The input to the gap-filling method was constructed by sampling the 18318 PlankTOM5 output at the same spatial and temporal density as the SOCAT database (Figure 1), and then regridding this to the target resolution of 2.5832.58 in the same manner as the observations (section 2.1.1). The resulting sampled Plank-TOM5 data were interpolated by applying the gap-filling method.
We assessed the performance of our interpolation method by calculating the difference (error) between the original PlankTOM5 output and the result of the gap-filling performed on the sampled PlankTOM5 output at every month in each grid cell. Figure 7 shows the root mean square errors for each grid cell (Figure 7a) and each time step (Figure 7b). The largest errors were typically concentrated around those areas with few or no observations, namely the South Pacific, South Atlantic and Southern Ocean. These were caused by the lack of observations rather than differences in the decorrelation lengths that influence the method's operation. Areas of high fCO 2 variability such as the Eastern Equatorial Pacific and coastal regions also had Figure 6. Decorrelation lengths (in km) calculated for each 2.5832.58 grid cell from the PlankTOM5 model output compared to the equivalent decorrelation lengths calculated from the SOCAT measurements. Darker shading indicates that multiple grid cells had the same comparative decorrelation lengths. The gray square represents the maximum decorrelation length between two neighboring grid cells (196.5 km). Cells with decorrelation lengths within this square cannot be spatially interpolated as they have no relationship to their neighbors.
Journal of Advances in Modeling Earth Systems 10.1002/2014MS000416 relatively large errors (Figure 7a). The magnitude of the differences was stable through time and not significantly influenced by the density of observations in any given year (Figure 7b). 25% of the gap filled values had an error of 3.31 matm. 50% had errors 7.62 matm, and 75% were within 12.99 matm (Figure 7c). In relative terms, 25% of gap-filled values were within 0.93% of the correct value; 50% were within 2.1%, and 75% within 4%.
This model evaluation also allowed us to assess the efficacy of the uncertainties calculated alongside the gap-filled pCO 2 ; they should be similar to the errors if they were truly representative of the limitations of the method. The mean error for the gap-filled model output was 11.3 6 12.4 matm, while the mean calculated uncertainty was 10.5 6 9.5 matm. 50% of the calculated uncertainties were within 65.15 matm of the interpolation error, and 75% were within 610.41 matm. Figure 7d shows a scatter plot comparing the errors in the reconstruction to the assigned uncertainties. These results indicate that the majority of uncertainty estimates calculated by our method are representative of the likely real errors in the estimate fCO 2 values.
Trends in the original PlankTOM5 output and the gap-filled interpolation were calculated as the difference between the 1985-1989 mean and the 2007-2011 mean divided by the 27 year period of the interpolation. For the interpolation, trends were also calculated using the extreme upper and lower limits of the gap-filled fCO 2 values bounded by their uncertainty, to give the maximum and minimum trend across the possible range of values. The range of these trends was used as the uncertainty range of the calculated trend. The trend in each grid cell was subtracted from the atmospheric CO 2 trend (calculated in the same manner, to account for the nonlinearity of the atmospheric trend [Dlugokencky and Tans, 2014]). We examined the difference between the two sets of trends (Figure 8).
The global mean gap-filled trends during 1985-2011 were marginally higher than the original PlankTOM5 output (means of 20.07 matm yr 21 relative to the atmospheric trend versus 20.03 matm yr 21 respectively). The gap-filled trends showed greater spatial variability than the trends of the PlankTOM5 output (0.55 matm Journal of Advances in Modeling Earth Systems 10.1002/2014MS000416 yr 21 and 0.25 matm yr 21 respectively), but the broad spatial patterns of positive and negative trends were similar. As with the overall errors, the largest differences in trends occurred in those regions with the fewest observations, namely the southern hemisphere oceans. Trends were also difficult to reconstruct in the Eastern Equatorial Pacific because of high interannual variability from the El Niño Southern Oscillation (ENSO), and the sparse sampling of the observations in a region of complex currents and water masses. The seasonal cycle in the original PlankTOM5 output and the gap-filled data were also in good agreement in nearly all regions (Figure 9). The global mean amplitude of the seasonal cycle is 36.8 matm in PlankTOM5 and 33.0 matm in the gap-filled reconstruction, with an RMS error of 17.9 matm. The correlation of amplitudes averaged at each latitude band (the zonal correlation) gave r 2 5 0.73.
The accuracy of the PlankTOM5 reconstruction was further tested by examining the relationship between pCO 2 and sea surface temperature (SST), and between pCO 2 and chlorophyll a (Chla). The detrended and deseasonalized time series of pCO 2 and SST (Chla) from the original model data were correlated in each 2.5832.58 grid cell (Figures 10a and 11a). The same correlation was performed using the reconstructed pCO 2 field (Figures 10b and 11b), and the differences examined (Figures 10c and 11c). 75% of the reconstructed correlations had an r value that is within 0.29 (0.23 for Chla) of the original correlations, and 95% were within 0.59 (0.49 for Chla) (Figures 10c and 11c). The largest differences were in the Southern Ocean, particularly with SST where the pattern of correlations along the Antarctic Circumpolar Current was not captured. The South Atlantic and Eastern South Pacific showed similarly poor performance. In these regions the sampling of the original model was such that the correlation between pCO 2 and SST (Chla) was not representative of the complete model, so it is unsurprising that the relationship could not be reconstructed successfully. The other main region of difference was the Equatorial Pacific, which can be fully explained by a lack of observations. Here, the interannual signal was larger than the seasonal cycle in both pCO 2 and SST. While the curve fitting process can reconstruct a seasonal cycle, it cannot accurately reproduce interannual variability where there are missing observations. The corresponding difference in the correlation with Chla was neither as strong nor as widespread as that seen in SST, but the effect was still visible. We assessed the efficacy of the gap-filling method in relation to previous regional interpolation techniques by comparing the errors found in reconstructing the PlankTOM5 output with the errors reported for published studies ( Table 2). The errors reported in those studies ranged from 0.77 to 32 matm, while those found in this study were in the range 0.99-20 matm for the corresponding times and regions, and were 4.2 matm smaller on average. This shows that the global gap-filling method is comparable in performance to the regional studies. The errors in our interpolation (Table 2, column 5) were smaller than the uncertainties assigned to the interpolated values (Table 2, column 6) in the majority of cases.

Comparison to Other CO 2 Data 3.2.1. Fixed Moorings
There are a number of oceanographic CO 2 data sets from time series stations that are not included in the SOCAT database. We compared our interpolated data calculated from the SOCAT observations to the pCO 2 values from some of these stations. The SOCAT interpolated observations (Figure 12, red dots) in the corresponding time series boxes show significant differences to the time series observations themselves (in blue). These differences in the data were directly translated to the gap-filling method. Nevertheless the gap filling method achieved a fit close to the uncertainty at the European Station for Time Series in the Ocean (ESTOC) [Gonz alez-D avila and Santana-Casiano, 2009], at the TAO S2 170W in the Equatorial Pacific [Chavez, 2004], and to a lesser extent at the Hawaii Ocean Time-Series (HOT) [Dore et al., 2009]. At PIRATA 10 Lefèvre et al. [2007], however, the amplitude of the SOCAT interpolated observations was much smaller than that of the station observations, a bias which was translated in the gap-filling method to both the estimated values and an underestimated uncertainty. Finally, both the Bermuda Atlantic Time Series (BATS) [Bates, 2007] and Irminger station [Olafsson, 2007] have very large seasonal amplitudes of approximately 90 matm (Figures  12a and 12d). This leads to large differences between the gap-filled time series and the station observations,  Journal of Advances in Modeling Earth Systems 10.1002/2014MS000416 because even small temporal offsets from the seasonal cycle fitted to the SOCAT observations can result in differences of tens of micro-atmospheres.

Comparison With LDEO Data
The LDEOv2012 database [Takahashi and Sutherland, 2013] is another collection of global surface ocean CO 2 observations, constructed independently of SOCAT. This contains some observations that are not in SOCAT, so it was possible to compare these to our interpolated values.
We took the observations between 1985 and 2011 from the LDEOv2012 database and binned them on to a 2.5832.58 monthly grid, to match the resolution and coverage of our interpolated data set. We did the same for the complete SOCAT data set. We then removed from the gridded LDEO data any points where measurements were present in SOCAT. This left the gridded LDEO measurements that were not present in SOCAT. These were then compared to the corresponding points in the interpolated data set based on SOCAT, and the differences between them were assumed to be errors in the interpolated data (Figure 13a). 25% of the gap-filled values had an error of 4.7 matm. 50% had errors 11.0 matm, and 75% were within 22.6 matm (Figure 13b). In relative terms, 25% of gap-filled values were within 1.3% of the true value, 50% were within 3.1%, and 75% were within 6.3%. As with the model evaluation above, we compared the differences between the interpolated values with the uncertainties calculated during the interpolation ( Figure  13c). 25% of the uncertainties were within 7.1 matm of the error, 50% were within 15.8 matm, and 75% were within 30.2 matm.
There were four main areas where the differences between the interpolated SOCAT data and the LDEO data were largest: the Equatorial Pacific, the Atlantic sector and coastal areas of the Southern Ocean, the Bering Sea, and the North Pacific between 308N and 508N. In the case of the Equatorial Pacific, the method struggles to capture the very high interannual variability in CO 2 , as also seen in the model-based evaluation. The other regions also have very high variability [Naveira Garabato et al., 2004;Resplandy et al., 2014;Bates et al., 2011], and also poor sampling (Figure 1b). In the Southern Ocean there are very few observations to be used as input to the interpolation. In the Bering Sea there are a relatively high number of observations, but they are focused in a very small region. These values are interpolated over the entire Bering Sea, and the large spatial variability in this region means that the interpolated values do not reflect the true distribution  Journal of Advances in Modeling Earth Systems 10.1002/2014MS000416 of fCO 2 . The range of errors between 20 and 25 matm in the North Pacific was unexpected, because there is very good observation coverage here and the fCO 2 should be well constrained. Our investigations showed that the additional LDEO data in this region have much higher variability than the SOCAT data. The reason for this difference is unknown.

Limitations of the Evaluation Techniques
Both of the evaluation techniques used above have limitations in their accuracy. Numerical models are of relatively coarse resolution, and are therefore unable to reproduce fine details on spatial and temporal scales. However, since the gap-filling method also produces coarse resolution results (monthly means on a 2.5832.58 grid), the problem has less impact that it would with a higher resolution interpolation.
In the interpretation of the comparison of interpolation results to observations from fixed stations and the LDEO database we need to consider that the comparison is between data from single geographical points and times, with values that are averages over much larger spatial and temporal scales. The latter eliminates the small scale variability captured by the former, and therefore cannot reproduce the observations accurately.

Results
The gap-filling method allows us to produce monthly fCO 2 values for the years 1985-2011 on a 2.5832.58 grid south of 708N based on the SOCAT v2 database. Uncertainties assigned to the gridded data ( Figure 14) are predominantly in the range of 0 to 12 matm. The uncertainties are typically a function of the number of available observations in each grid cell, with higher observation densities requiring less spatial interpolation (on which the uncertainty estimates are based). The smallest uncertainties are therefore in the North Pacific and North Atlantic. The uncertainties are largest in the Eastern Equatorial and South Pacific, South Atlantic and Southern Ocean. One exception to this is the Southern Ocean in the Antarctic Circumpolar Current, where strong zonal currents result in very similar fCO 2 in neighboring grid cells. The spatial interpolations   Figure 15 shows the annual mean fCO 2 for the year 2000, comparing our gap-filled results with the commonly used pCO 2 climatology for the year 2000 constructed from the LDEO database of pCO 2 observations using trend-based adjustments and lateral transport equations [Takahashi et al., 2002;Takahashi et al., 2009]. Again, we ignore differences between pCO 2 and fCO 2 as they are negligible (approximately 1 matm). The gap-filled annual mean (Figure 15a) is very similar to the Takahashi climatology (Figure 15c): the two products match within the bounds of uncertainty over the great majority of the globe (Figure 15b). The RMS difference between the two maps is 9.7 matm, with an overall pattern correlation of r 2 5 0.78. Examining the zonal mean concentration (Figure 15d) removes many of the effects of variability between individual cells, and provides a better picture of the coherence between the climatology and the gap-filled data. This shows that the overall structure of the two is very similar, with a zonal mean correlation of r 2 5 0.93.

Seasonality
We also analyzed the amplitude of the seasonal cycle in both data products (Figure 16). A traditional uncertainty cannot be given for the amplitude in the gap-filled data, since the possible range of amplitudes is defined by the uncertainties of the individual values. This is not evenly distributed, and so cannot be represented by a traditional 6 value. Instead, we show the range between the largest and smallest possible amplitudes that could be calculated using the individual monthly values (Figure 16b). This is typically twice as large as one would expect a traditional 6 uncertainty to be. The overall pattern is again similar between the two data sets, although there are more areas where the Takahashi amplitude is outside the uncertainty range of the gap-filled data (Figure 16a and 16b, black dots). The RMS error between the climatology and the gap-filled data are 23.9 matm, with a zonal correlation of r 2 5 0.56 (Figure 16d). While not necessarily Journal of Advances in Modeling Earth Systems 10.1002/2014MS000416 statistically significant, the regions of largest difference between the two data sets are the Equatorial Pacific, the Eastern South Pacific and regions of the Southern Ocean. The differences in seasonal amplitude in the Equatorial Pacific are large and account for some of the low correlation between the two methods. The Takahashi climatology excludes observations obtained during El Niño periods, but they are included in our method. Furthermore, the Equatorial Pacific has large uncertainty in our method because of the fine structure of ocean currents and relatively large interannual variability in this region [Cosca et al., 2003;Doney et al., 2009]. The differences in South Pacific and Southern Ocean are mostly due to low observation density (Figure 1), exacerbated in the Southern Ocean by its highly variable oceanic conditions [Naveira Garabato et al., 2004;Resplandy et al., 2014] that will lead to errors in both our method and the Takahashi et al. climatology. Amplitudes are more comparable elsewhere, with many consistent spatial structures in the North Pacific and North Atlantic.

Long-Term Trends
It is difficult to assess the accuracy of long-term trends in any observation-based fCO 2 database because they are known to be sensitive to the temporal and spatial scales over which they are calculated [e.g., McKinley et al., 2011, Fay andMcKinley, 2013]. For this study, we assessed the trends over the 27 year period of the interpolation. Trends and their uncertainties (Figures 17a and 17b) were calculated for each grid cell using the same method as for the model validation (section 3.1). Trends were deemed insignificant when their uncertainty range crossed zero (Figure 17a, black dots). The fCO 2 trends were compared against the corresponding atmospheric trends (Figure 17c).
Much of the ocean fCO 2 trend is slower than that of the trend in atmospheric CO 2 , although there is large regional variability and uncertainty in the trends indicates that the difference is not significant in the majority of cases (Figure 17c). The global mean relative trend (ocean trend minus atmospheric trend) is 20.18 6 0.76 matm Figure 16. Comparison of the gap-filled seasonal amplitude of fCO 2 with the seasonal amplitude pCO 2 from the Takahashi climatology [Takahashi, 2009]. (a) The seasonal amplitude from the gap-filled SOCAT data. (b) The range of possible amplitudes in each cell of Figure  16a, based on the limits of the uncertainties of the fCO 2 values. This is typically twice as large as a normal 6 uncertainty, which is not representative of the true uncertainty in the amplitude. (c) The seasonal amplitude from the Takahashi climatology. (d) The zonal mean amplitude from Takahashi (black) and the gap-filled data (red). The red envelope indicates the range limit of gap-filled amplitude. In Figures 16a  and 16b, black dots indicate cells where the gap-filled data are significantly different from Takahashi, i.e., the Takahashi value is not within the possible range of amplitudes in the gap-filled data.
Journal of Advances in Modeling Earth Systems 10.1002/2014MS000416 yr 21 . In the Atlantic between 358N and 608N there is some evidence of an east-west difference in the fCO 2 trend, with the eastern region (408W to 08; 0.04 6 0.56 matm yr 21 ) increasing faster than the western (808W to 408W; 20.29 6 0.85 matm yr 21 ). Similar but smaller effects have been observed in previous studies [Schuster et al., 2009;Takahashi et al., 2009;Watson et al., 2009]. This pattern, along with the widespread low fCO 2 trend in the South Atlantic, is also evident in Landsch€ utzer et al. [2014]. Trends in the North Pacific are almost exclusively slower than the atmospheric trend. In the midlatitudes (108N to 358N) the trends are only slightly lower (20.14 6 0.41 matm yr 21 ), similar to those found by Midorikawa et al. [2006]. The difference is much greater in the higher latitudes (358N-608N; 20.34 6 1.06 matm yr 21 ), similar to Lenton et al. [2012] despite a region of very rapid increase in the Sea of Okhotsk to the north of Japan. The Equatorial Pacific (108S to 108N) is the region with least agreement with prior studies, reflecting its high uncertainty. We see significantly higher trends in the Eastern Equatorial Pacific (1708E to 1208W) than anywhere else on the globe (0.32 6 0.75 matm yr 21 ), in disagreement with Feely et al. [2006]. Meanwhile, the Western Equatorial Pacific (1008E to 1708E) has slightly smaller trends (20.37 6 0.29 matm yr 21 ) than those seen in previous estimates Ishii et al., 2009].

Interannual Variability
The interannual variability (IAV) in fCO 2 is only captured in our method where original measurements are used or interpolated, since the fitted curves will provide a climatological seasonal cycle and long-term trend only. Consequently, IAV cannot be fully resolved and is almost certainly smaller in magnitude than the true IAV. The spline fit applied in Figure 5d deviates from the harmonic curve fit where measurements are present. The difference between the harmonic fit and the spline represents the IAV that the method is able to capture. Averaging the IAV across ocean regions shows some key features of pCO 2 variation ( Figure 18). In the Equatorial Pacific (Figure 18f), the progression of ENSO events is clearly visible. IAV of similar magnitude can be seen in the Southern Ocean (Figure 18h). In the North Atlantic, IAV is consistently discernible only  Figure 18a); There were no regular measurement programs prior to this, thus IAV could not be resolved consistently.
The cubic spline fitting performed in Step IV of the method (section 2.2.5) eliminates discontinuities in the data set, but also removes some of the IAV. The effect of this is negligible, however; the regional interannual variability shown in Figure 18 is between 0.5% and 2% smaller than it would be without the spline fitting. Journal of Advances in Modeling Earth Systems 10.1002/2014MS000416 4.4. Coastal fCO 2 fCO 2 in coastal regions is much more variable (and therefore difficult to predict) in coastal regions than in the open ocean. However, at the 2.5832.58 used in this study individual grid cells are quite large (278 km across at the equator). Eliminating these grid cells would remove a relatively large portion of the ocean. While it would be possible to remove coastal measurements from the initial input to the interpolation by filtering them from the original SOCAT data set, tests have shown that these values still have considerable value in providing input to the seasonal curve fitting algorithms. The high variability of coastal fCO 2 is automatically incorporated into the uncertainty estimates for the method, since it is based partially on the difference between the observations and the fitted curve (section 2.2.4; Figure 5b). High variability therefore leads to large uncertainties in many coastal regions (Figure 14).

Conclusion
We have presented a gap-filling method adapted to the available global observations coverage of surface ocean fCO 2 values over the 1985-2011 period south of 708N. The estimated accuracy of our results is comparable to or an improvement over estimates reported from other regional approaches ( Table 2). The gapfilled fCO 2 data include gridded uncertainties based on the spatial and distance over which values have been interpolated and the closeness of temporal curve fits to the observations. These uncertainties can help guide data selection and interpretation in future studies using fCO 2 observations. The output of this method can be used to assess fCO 2 variability over multiple temporal and spatial scales, to help establish the optimal location and frequency for CO 2 observation programs, and to reduce the uncertainties in our knowledge of this key ocean variable. Our gap-filled data set can also provide prior estimates required by atmospheric inversion models, and data to evaluate process model simulations.
The technique developed here provides an alternative approach to those currently available in the literature as it relies neither on knowledge of other oceanic variables nor on the physical characteristics of the ocean. The independent statistical nature of the technique means that it can be readily applied to other environmental global data sets. It also means that the output directly depends on the quantity and quality of the data input, and that the absence of data in many regions and months can lead to an underestimation of the interannual variability and to excess spatial variability in the assessed trends. These biases would be reduced with the increased collection and inclusion of observations. projects CARBOCHANGE ''Changes in carbon uptake and emissions by oceans in a changing climate'' (grant agreement 264879) and GEOCARBON (agreement 283080). The code and input data used to calculate the gapfilled fCO 2 , with instructions for use, are published at Pangaea. Alongside the code is the calculated gap-filled fCO 2 data set from SOCAT v2 that is presented in this study. Both the code and output are available from Pangaea, doi 10.1594/ PANGAEA.849262 [Jones et al., 2015]. We thank the reviewers for their thoughtful and helpful comments, which have proved invaluable.
Journal of Advances in Modeling Earth Systems 10.1002/2014MS000416