Exploring Sources of Uncertainty in Steric Sea-Level Change Estimates

Recent studies disagree about the contribution of variations in ocean temperature and salinity (steric change) to the observed sea-level change. This article explores two sources of uncertainty in both global mean and regional steric sea-level trends. First, we analyze the influence of different temperature and salinity data sets on the estimated steric sea-level change. Next, we investigate the impact of different stochastic noise models on the estimation of trends and their uncertainties. By varying both the data sets and the noise models, the global mean steric sea-level trend and its uncertainty can vary from 0.69 to 2.40 mm/year and from 0.02 to 1.56 mm/year, respectively, for 1993-2017. This range is even larger on regional scales, reaching up to 30 mm/year. Our results show that a first-order autoregressive model is the most appropriate choice to describe the residual behavior of the ensemble mean of all data sets for the global mean steric sea-level change over the last 25 years, and it consequently leads to the most representative uncertainty. Using the ensemble mean and the first-order autoregressive noise model, we find a global mean steric sea-level change of 1.36 ± 0.10 mm/year for 1993-2017 and 1.08 ± 0.07 mm/year for 2005-2015. Regionally, a combination of different noise models is the best descriptor of the steric sea-level change and its uncertainty. The spatial coherence in the noise model preference indicates clusters that may be best suited to investigate the regional sea-level budget.
[…] and reflect how the oceans have been responding to global warming. For this reason, several recent studies have quantified the contribution of steric variations to global and regional sea-level change. However, the reported rates differ widely between studies. In this paper, we look at how the use of different temperature and salinity data sets can be one of the causes of the different estimates of steric sea-level change published so far.
We also investigate how different methods (noise models) used to obtain the rate of change can be another source of different results. We find that the rate of change can vary by up to 2 mm/year for the global mean as a result of the different data sets and methods used. Regionally, differences can reach several tens of millimeters per year. We show that the noise models should always be chosen carefully for each region, so that the rate of change is accurately estimated.


Introduction
The oceans are a major reservoir of heat and have stored about 90% of the human-induced heat in the climate system over the last 50 years (MacIntosh et al., 2017; von Schuckmann et al., 2016). The resulting ocean warming leads to sea-level change (SLC), which is an important reflection of how the oceans respond to global warming (Kopp et al., 2016). This process, known as thermosteric SLC, is the dominant component of the steric contribution to global mean sea-level (GMSL) change.
Steric SLC is also partially driven by salinity variations (i.e., halosteric change). The contribution of salinity to GMSL change is negligible, as the ocean's salt content is considered to be constant over multidecadal time scales (Cazenave et al., 2018). Regionally, however, halosteric SLC can be just as important as thermosteric changes, or even the dominant process of steric sea-level variations, for instance, in polar regions (Stammer et al., 2013). Hence, studies of steric contributions to regional SLC should consider both salinity and temperature variations (MacIntosh et al., 2017).
In addition to density-driven variations, present-day SLC is also driven by ocean mass fluctuations. The main processes that change the amount of mass in the oceans are the melting of glaciers and ice sheets and variations in terrestrial water storage. Together, density and mass variations make up the total SLC budget (e.g., Cazenave et al., 2018; Gregory et al., 2013). Thus, accurately quantifying steric sea-level variations and their uncertainties constrains the other contributions to the observed SLC.
Comparing published rates of steric GMSL change is complicated, as the period considered differs between studies. For example, while the Fourth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR4, Bindoff et al., 2007) estimated a thermosteric contribution of 1.6 ± 0.5 mm/year to the GMSL change for 1993-2003, the IPCC AR5 reported a thermosteric contribution of 1.1 ± 0.3 mm/year for 1993-2010. In addition to the different study periods, part of the difference between steric rates can also be explained by measurement biases, such as depth biases in expendable and mechanical bathythermograph (XBTs and MBTs) observations (Ishii and Kimoto, 2009). However, even if one considers the same time period and the same corrections, estimates of steric SLC can still differ for the following three reasons.
First, the deep ocean (>2,000 m depth) contribution to steric SLC is uncertain. Given the lack of observational data below 2,000 m depth, different estimates of how much the deep ocean has contributed to SLC have been made in the recent past (e.g., Desbruyères et al., 2016; Dieng et al., 2015; Llovel et al., 2014; Purkey and Johnson, 2010). For instance, based on the few high-quality full-depth hydrographic measurements available, Purkey and Johnson (2010) estimated that deep ocean warming contributed 0.1 mm/year to the GMSL change from 1980 to 2010. In contrast, using a budget analysis, Llovel et al. (2014) found a negative (−0.13 ± 0.34 mm/year) deep ocean contribution to GMSL change from 2005 to 2013. According to an ocean model simulation, repeated hydrographic measurements tend to underestimate deep ocean warming because of biases induced by temporal and spatial sampling (Garry et al., 2019).
A second source of uncertainty in steric SLC is the use of different temperature and salinity data and how these data are processed. Dieng et al. (2015) found that the steric sea-level trend for 2003-2012 varied from 0.15 to 0.92 mm/year, depending on the data set analyzed. These differences can be explained mainly by the corrections made to individual measurements and by different data processing methodologies, such as gap-filling methods (Dieng et al., 2015). Furthermore, the type of data used (i.e., in situ observations, reanalysis (REA), or ocean model output) can lead to diverging trends.
Finally, the method used to determine the trend from a time series is another source of uncertainty. Trends are usually obtained by fitting a linear or quadratic model to a time series. The associated error can be described by different stochastic noise models (Bos et al., 2013). Bos, Williams, et al. (2014) showed that uncertainties in SLC can be underestimated when an improper noise model is used. While the effect of using different noise models on estimates of the total SLC from satellite altimetry and tide gauges has already been discussed (e.g., Bos, Williams, et al., 2014; Hughes and Williams, 2010; Royston et al., 2018), its effect on estimates of steric SLC is still unclear. Given that the choice of the noise model is highly dependent on the nature of the process it describes (Royston et al., 2018), here we investigate the impact of different noise models on steric sea-level trends.
This study explores how steric sea-level trends can vary, based on the data sets and the noise models used. We limit our analysis to the upper ocean (0-2,000 m), where most data are available, and leave studying deep ocean uncertainties for future work. We compute and compare steric sea-level trends from 1993 to 2017 (satellite altimetry era) and from 2005 to 2015 (Argo period), considering 15 different gridded temperature and salinity data sets and 8 different noise models. We will show that the uncertainties of both data sets and noise models play an important role in estimating the steric contribution to SLC.
We describe how we calculated steric SLC and the ocean temperature and salinity data sets used in section 2, followed by an explanation of the noise models used to compute the trends and uncertainties. We show the results of the different data sets and noise models for the global mean and regional steric SLC in sections 3 and 4, respectively. We display mainly the results for 2005-2015, when it was inconvenient to display both periods in the same figures. The complementary figures for 1993-2017 can be found in the supplementary information.

Calculating Steric Sea-Level Anomalies
Following Gill and Niller (1973) and Tomczak and Godfrey (2003), the steric sea-level anomaly (SLA) $\eta_s$ can be described as follows:
$$\eta_s = -\frac{1}{\rho_0}\int_{z_2}^{z_1} \rho' \, dz \qquad (1)$$
where $\rho_0$ is a reference density and $\rho'$ is the local density anomaly. The expression is vertically integrated from the maximum local depth $z_2$ to the water surface $z_1$. We use the Thermodynamic Equation of Seawater (TEOS-10, McDougall and Barker, 2011) as the equation of state to calculate the steric SLA and consider only the upper ocean (0-2,000 m) in our computations.
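The steric SLA integral, η_s = −(1/ρ_0) ∫ ρ′ dz over the water column, can be evaluated numerically once ρ′ is known on discrete depth levels. The sketch below is a minimal illustration only, assuming ρ′ has already been derived from temperature and salinity (for example with a TEOS-10 implementation, not shown), that depths are positive downward, and that the trapezoidal rule is adequate. The function name and the reference density of 1025 kg/m³ are illustrative assumptions, not values taken from this study.

```python
def steric_sla(depths_m, rho_anom, rho0=1025.0):
    """Steric sea-level anomaly (m) from a density-anomaly profile.

    depths_m: depth levels in m (positive downward), strictly increasing,
        from the surface to the deepest level used (e.g., 0-2,000 m).
    rho_anom: density anomaly rho' (kg/m^3) at each depth level.
    rho0: reference density in kg/m^3 (1025 is an assumed typical value).
    """
    if len(depths_m) != len(rho_anom) or len(depths_m) < 2:
        raise ValueError("need matching profiles with at least two levels")
    integral = 0.0
    # Trapezoidal rule over each layer between consecutive depth levels.
    for k in range(len(depths_m) - 1):
        dz = depths_m[k + 1] - depths_m[k]
        integral += 0.5 * (rho_anom[k] + rho_anom[k + 1]) * dz
    # Lighter water (negative rho') expands the column, hence the minus sign.
    return -integral / rho0
```

For example, a uniform density anomaly of −0.01 kg/m³ over 0-2,000 m gives 20/1025 m, or roughly 19.5 mm of steric rise.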
The density anomaly ρ′ is a function of variations in the ocean temperature, salinity, and pressure fields and is computed using ocean temperature and salinity data from in situ and REA data sets (section 2.2).

Ocean Temperature and Salinity Data Sets
We collected and analyzed 15 publicly available gridded data sets of continuous monthly ocean temperature and salinity published by different research groups worldwide (Table 1). All data sets were downloaded between January and March 2020 and are currently accessible (last checked June 2020). The data sets were selected based on their frequent use in the literature. Given that we use gridded data sets, we do not consider the uncertainty propagation from a single profile of temperature and salinity to steric height. The concept of uncertainty propagation was reviewed and discussed in detail in MacIntosh et al. (2017).
The data sets were categorized according to their data type: in situ measurements versus ocean REA. The first category includes ship-based hydrographic data (such as conductivity-temperature-depth profiles, XBTs, and MBTs) and observations collected by Argo profiling floats. The in situ observations are generally merged onto a grid using statistical interpolation (Storto et al., 2017). This category is further divided into two subcategories: Argo, for data sets that contain only data from Argo floats, and multiple in situ (MiS), for products that combine several sources of in situ observations in addition to Argo data. The second category, ocean REA, comprises products that assimilate the observational data into ocean circulation models, extrapolating the information spatially and, sometimes, temporally (MacIntosh et al., 2017).
Table 1 note. The abbreviations are short forms for either the reference paper (e.g., C17 for Cheng et al., 2017), the responsible institution (e.g., APDRC), or the official name of the data set (e.g., CORA). Time periods in bold indicate the data sets available for the entire satellite altimetry era. (a) Two versions of EN4v4.2 are available, depending on the type of correction applied to the XBT and MBT data. Following the discussion in Cheng et al. (2016) of the benefits of each correction, we use the data set with the Gouretski and Reseghetti (2010) correction. (b) These products were obtained via the Global Ocean Ensemble Reanalysis Version 2 of Mercator Ocean (GLOBAL_REANALYSIS_PHY_001_031).

Data Processing
First, the steric SLA was computed on the native grid of each data set following Equation 1. We then standardized the varying resolutions by remapping all data sets onto a 1° by 1° grid using bilinear interpolation. Next, we selected the grid points between 66°S and 66°N and applied a land mask based on ETOPO1 (Amante and Eakins, 2009). The land mask was applied without extrapolating any values, meaning that areas not covered by all data sets are still considered, using the data sets available at each location. Figure S2 shows the number of data sets available at each grid cell after applying the land mask. Using a more conservative ocean mask, in which only grid points where all data sets have data are considered, changes the global mean trend by only 0.08 mm/year. Hence, we decided to use the ETOPO1 mask, which has the advantage of a wider spatial coverage. We then computed a category mean for each of the three categories (Argo, MiS, and REA) and a total ensemble mean of all individual data sets. Using an area-weighted mean, we computed the global mean steric SLC for each data set.
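The averaging steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: each field is a nested list indexed [latitude][longitude] on a regular grid, masked or missing cells are None, an ensemble or category mean at a cell uses whichever data sets have a value there, and the area weights are cos(latitude), which is proportional to cell area on a regular latitude-longitude grid. The study does not specify its exact weighting, and the function names are hypothetical.

```python
import math


def ensemble_mean(fields):
    """Cell-wise mean over data sets, skipping cells a data set lacks (None)."""
    nlat, nlon = len(fields[0]), len(fields[0][0])
    out = [[None] * nlon for _ in range(nlat)]
    for i in range(nlat):
        for j in range(nlon):
            vals = [f[i][j] for f in fields if f[i][j] is not None]
            if vals:  # at least one data set covers this cell
                out[i][j] = sum(vals) / len(vals)
    return out


def area_weighted_mean(lats_deg, field, lat_limit=66.0):
    """Global mean of a lat-lon field using cos(latitude) area weights.

    Rows poleward of +/- lat_limit and cells set to None (land or missing)
    are skipped, mimicking a 66S-66N selection plus a land mask.
    """
    num = den = 0.0
    for lat, row in zip(lats_deg, field):
        if abs(lat) > lat_limit:
            continue
        w = math.cos(math.radians(lat))
        for v in row:
            if v is not None:
                num += w * v
                den += w
    if den == 0.0:
        raise ValueError("no valid ocean cells")
    return num / den
```

Applying `area_weighted_mean` to each monthly field yields one global mean time series per data set. Applying it to the output of `ensemble_mean` yields the ensemble or category mean series.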

Estimating Trends and Uncertainties
The trends and respective uncertainties were estimated using the Hector software (Bos et al., 2013). Hector uses a weighted least squares model to estimate the linear trend, while accounting for periodic signals (i.e., annual and semiannual components). The variations not captured by the regression model are defined as noise (Bos, Williams, et al., 2014). The software allows the user to choose between a number of stochastic models to describe the noise properties, such as temporal correlation and spectral density behavior, thus reducing the risk of underestimating the trend uncertainty (Bos et al., 2013). The uncertainties provided by Hector and shown in this paper represent one standard deviation.
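The sketch below illustrates the regression part only: an offset, a linear trend, and annual plus semiannual harmonics, fitted by ordinary least squares. That is equivalent to assuming white-noise residuals. Hector instead weights the fit with the covariance of the chosen noise model, so this is a simplified stand-in rather than Hector's method. Function names are hypothetical, and time is assumed to be in years relative to the start of the record, which keeps the normal equations well conditioned.

```python
import math


def design_row(t_years):
    """Offset, linear trend, and annual plus semiannual cosine/sine terms."""
    w = 2.0 * math.pi  # angular frequency of one cycle per year
    return [1.0, t_years,
            math.cos(w * t_years), math.sin(w * t_years),
            math.cos(2 * w * t_years), math.sin(2 * w * t_years)]


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trend(t_years, y):
    """Ordinary least-squares fit; returns (coefficients, residuals).

    coefficients[1] is the linear trend in units of y per year; the
    residuals are what a noise model would then be fitted to.
    """
    rows = [design_row(t) for t in t_years]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(ata, aty)
    resid = [yi - sum(c * x for c, x in zip(coef, r)) for r, yi in zip(rows, y)]
    return coef, resid
```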
Based on the sea-level literature (e.g., Bos, Williams, et al., 2014; Royston et al., 2018), we tested eight models to find the best descriptor of the uncertainties in our data: a white noise (WN) model; a generalized Gauss-Markov (GGM) model; a pure power law (PL) model; a PL model combined with WN (PLWN); autoregressive models of orders 1, 5, and 9 (AR(1), AR(5), and AR(9), respectively); and an autoregressive fractionally integrated moving average model of order 1 (ARFIMA, henceforth ARF). The WN model is the simplest stochastic model: its power spectral density is flat (a spectral index of zero), and no temporal correlation between the residuals is considered. In the PL model, all observations influence one another, although their correlation decreases with increasing temporal distance. In the AR(p) models, each observation at time $t_i$ is affected by white noise and by the preceding observations, starting with the one at $t_{i-1}$; the number of preceding observations that influence each observation is given by the order p of the autoregressive model. To represent power-law behavior at low frequencies, the ARF model combines an AR(1) model with fractional integration and a moving average of the noise. The GGM model is a generalized form of the ARF model. More details about each model can be found in the Hector software manual and in Bos et al. (2013), Bos, Fernandes, et al. (2014), and Bos, Williams, et al. (2014).
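To illustrate why the noise model matters for the uncertainty, the sketch below compares a white-noise trend uncertainty with an AR(1)-adjusted one for a straight-line fit. It uses a common large-sample approximation: the white-noise standard error is inflated by sqrt((1 + φ)/(1 − φ)), where φ is the lag-1 autocorrelation of the residuals. This is not how Hector works, since Hector estimates the noise parameters by maximum likelihood, and the function names are hypothetical. The sketch only shows that ignoring positive temporal correlation shrinks the reported uncertainty.

```python
import math


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a (non-constant) series."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    den = sum(v * v for v in d)
    return sum(d[k] * d[k + 1] for k in range(n - 1)) / den


def trend_sigma(t, resid, n_params=2):
    """One-sigma uncertainty of a straight-line trend, white-noise residuals."""
    n = len(t)
    tbar = sum(t) / n
    stt = sum((ti - tbar) ** 2 for ti in t)
    s2 = sum(r * r for r in resid) / (n - n_params)
    return math.sqrt(s2 / stt)


def ar1_inflated_sigma(t, resid, n_params=2):
    """Approximate AR(1) trend uncertainty.

    sigma_AR1 ~ sigma_WN * sqrt((1 + phi) / (1 - phi)), where phi is the lag-1
    autocorrelation of the residuals; negative phi is clipped to zero so the
    white-noise value is never reduced.
    """
    phi = max(0.0, lag1_autocorr(resid))
    return trend_sigma(t, resid, n_params) * math.sqrt((1 + phi) / (1 - phi))
```

With strongly persistent residuals (φ close to 1), the adjusted uncertainty can be several times the white-noise value. This mirrors, in simplified form, the underestimation attributed to the WN model in the regional analysis.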
To determine the goodness of fit of the noise models, we use the Akaike information criterion (AIC; Akaike, 1974) and the Bayesian information criterion (BIC; Schwarz, 1978), together with visual inspection of the power spectra of the noise model and the fit residuals. Both AIC and BIC use the maximum log-likelihood and the number of parameters to judge the quality of the model; however, BIC penalizes the number of parameters more strongly than AIC does (Liddle, 2004). AIC and BIC are the most common criteria used to describe the goodness of fit, yet there is no consensus in the literature about which criterion should be used. Therefore, we used both criteria to select the "best" noise model for each data set and then computed an overall ranking of each noise model. Following Royston et al. (2018), we consider the mean scoring of AIC and BIC to decide which noise model is the most suitable. More details about the selection procedure are given in supporting information Text S1.
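The two criteria can be written as AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where L is the maximum likelihood, k the number of estimated parameters, and n the number of observations. Lower values are better. The sketch below ranks models by each criterion. The log-likelihoods in the usage example are hypothetical numbers chosen only to show how BIC's heavier penalty (k ln n instead of 2k, once n ≥ 8) can favor a simpler model. The study's threshold-based scoring (Text S1) is not reproduced here.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * k - 2.0 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2.0 * log_likelihood


def rank_models(fits, n):
    """Return model names sorted by AIC and by BIC.

    fits: mapping of model name -> (maximum log-likelihood, number of parameters).
    n: number of observations in the fitted series.
    """
    by_aic = sorted(fits, key=lambda m: aic(*fits[m]))
    by_bic = sorted(fits, key=lambda m: bic(fits[m][0], fits[m][1], n))
    return by_aic, by_bic
```

For instance, with hypothetical values `{"AR(1)": (100.0, 2), "AR(5)": (105.0, 6)}` and n = 132 monthly values, AIC ranks AR(5) first, while BIC ranks AR(1) first.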
We complemented the global mean noise model analysis with the software ARMASA (Broersen, 2020). Given a residual time series, ARMASA identifies the ideal noise model type and order, choosing between different autoregressive, moving average, and autoregressive moving average models (AR(p), MA(q), and ARMA(p′, p′ − 1), respectively). More details about ARMASA and the parameters used are given in supporting information Text S2 and Klees and Broersen (2002).
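ARMASA searches over AR, MA, and ARMA candidates with its own estimators and order-selection criteria. As a much simpler stand-in, and explicitly not ARMASA's algorithm, the sketch below fits Yule-Walker AR(p) models for p = 0 to P with the Levinson-Durbin recursion and selects the order that minimizes the AIC-type score n ln σ²_p + 2p, where σ²_p is the innovation variance of order p. All function names are hypothetical.

```python
import math


def autocov(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] of the mean-removed series."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[k] * d[k + lag] for k in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, max_order):
    """Innovation variances of Yule-Walker AR(p) fits for p = 0..max_order."""
    a = []          # AR coefficients of the current order
    err = r[0]      # innovation variance for p = 0
    variances = [err]
    for p in range(1, max_order + 1):
        acc = r[p] - sum(a[j] * r[p - 1 - j] for j in range(len(a)))
        k = acc / err  # reflection (partial autocorrelation) coefficient
        a = [a[j] - k * a[p - 2 - j] for j in range(len(a))] + [k]
        err *= 1.0 - k * k
        variances.append(err)
    return variances


def select_ar_order(x, max_order):
    """AR order minimising n ln(sigma_p^2) + 2p (an AIC-type rule)."""
    n = len(x)
    v = levinson_durbin(autocov(x, max_order), max_order)
    scores = [n * math.log(v[p]) + 2 * p for p in range(max_order + 1)]
    return min(range(max_order + 1), key=scores.__getitem__)
```

For an exact AR(1) autocovariance r[k] = φ^k, the recursion returns the same innovation variance 1 − φ² for every order p ≥ 1. Higher orders therefore add nothing, and the penalty term leads the rule to prefer the low order.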

Steric Sea-Level Time Series
From the temperature and salinity data sets (Table 1), we computed 15 global mean steric sea-level time series, three category means (red, blue, and green lines for Argo, MiS, and REA, respectively), and one ensemble mean (black line) (Figure 1). Despite the differences among the time series, all the category means agree with the ensemble mean within one standard deviation. Due to the addition of Argo data, starting in 2002 and reaching near-global coverage in 2005, the standard deviation of the global mean steric sea level decreases strongly: from about 8 mm/year during the first 10 years to 4.2 mm/year from 2002 to 2005 and to 3.6 mm/year after 2005. This reduction is seen not only for the ensemble mean and the Argo data sets but also for the MiS and REA categories, suggesting that all data sets strongly depend on the Argo float data.
The ensemble means based on the 10 and 15 data sets (for 1993-2017 and 2005-2015, in Figure 1, solid and dashed black lines, respectively) behave very similarly for the overlapping period. This indicates the robustness of the ensemble mean to the number of data sets used. Furthermore, the category means agree well with the ensemble mean for 2005-2015, all within one standard deviation from the ensemble.

Influence of Data Sets and Noise Models on Steric Estimates
For each of the time series, category means, and the ensemble mean, we computed linear trends using eight different noise models, leading to 104 […]. The situation is similar when a single noise model is chosen and the data sets are varied. For example, in the case of AR(1), the spread of the trend and uncertainty, although reduced, continues to be large […]. […], the spread of the trend becomes significantly smaller. This suggests that varying the data set has a larger influence on the estimated trend than varying the noise model does. In addition, we see that the spread is highly dependent on the chosen data set (Figures 2e and 2f). While IK09 (gray) has a minimal spread as a result of varying the noise models, GLORYS2 (white) shows a variation of almost 1 mm/year depending on the noise model being used. Interestingly, the ensemble mean trend and uncertainty (Figures 2g and 2h) show almost no sensitivity to the choice of noise model. This could be because the ensemble mean is smoother than the individual time series, and it indicates that using the ensemble mean reduces the dependency on the noise model choice. The results are similar for the period 1993-2017 (Figure S3).

Noise Model Selection (Global Mean)
To determine which noise model best describes the global mean steric trends and uncertainties, we use the AIC and BIC scores (section 2.4). Our selection criterion is based on a threshold value (see Text S1 for more information); thus, multiple noise models can be selected for each data set if their scores pass the threshold, and the percentages for each criterion can add up to more than 100%.
[Figure 2 caption, partial: … AR(1) noise model when varying all the data sets, (e, f) for the IK09 (white) and GLORYS2 (gray) data sets when varying all the noise models, and (g, h) for the ensemble mean when varying all the noise models. The count on the y-axis is with respect to the number of data sets (n) considered in each case.]
According to AIC, AR(5) […] AR(1) is never chosen as the preferred noise model for 1993-2017. This illustrates how the averaging process in ensemble means reduces the temporal variability of the time series, especially when a longer period is considered. Thus, a simple noise model, such as AR(1), becomes the preferred model for the category and ensemble means. For the mean of AIC and BIC, AR(5) and ARF are equally preferred for 1993-2017, and AR(1) is ranked first for 2005-2015.
We complement our noise model selection with ARMASA (section 2.4). Among AR(p), MA(q), and ARMA(p′, p′ − 1) models, a moving average model of order 6 is selected as the preferred noise model for the ensemble mean of 2005-2015. This shows that some short-term periodicity remains in the residuals, even after the annual and semiannual signals have been removed. When only AR(p) and ARMA(p′, p′ − 1) models are considered, AR(1) is selected as the best noise model for both periods. However, for the individual time series, many different AR(p), MA(q), and ARMA(p′, p′ − 1) models were chosen (see Table S2). For the individual data sets, the most frequently selected noise model for both periods was ARMA(2, 1). The preferred order of the noise models tends to be higher for the 1993-2017 time series than for 2005-2015. Interestingly, while AIC and BIC found AR(5) to be one of the best noise models, this order was never preferred by ARMASA. This may be related to the much wider range of orders and model types tested by the ARMASA algorithm to select the one model that is statistically representative of the data (Broersen, 2002). We use the ARMASA results as a qualitative indicator of the ideal noise model and find that, although they differ from the models chosen in Hector, these differences are small.
Considering the AIC and BIC ranking and the results of ARMASA, we conclude that an AR(1) model is a good descriptor of the trend and uncertainties of the global mean time series, especially for the category and ensemble means. Nevertheless, some periodicity remains in the residuals. For the Argo period, a signal of several months (2 to 6) is still present in the noise, while for the satellite altimetry era, an interannual signal on the order of 3 to 20 months is seen in the residuals.

Global Mean Steric Sea-Level Trends
Applying an AR(1) […]. The Argo category (Figure 4, red) shows good intracategory agreement, with all individual values agreeing within the category uncertainties. The Argo category mean trend of 0.90 ± 0.09 mm/year lies in the lower part of the uncertainty range of the APDRC and SIORG data sets. The MiS category (Figure 4, blue) shows large differences between the two time periods, with a category mean of 1.00 ± 0.07 and 1.24 ± 0.08 mm/year for the Argo period and the satellite altimetry era, respectively. For 2005-2015, the category mean is within 1-sigma of all individual data sets, except for Armor3D at the lower limit and EN4 at the upper limit of the category distribution. For 1993-2017, the category mean is only within the uncertainty ranges of the CORA and EN4 data sets, with IK09 at the lower limit and Armor3D at the upper limit of the category distribution.
Journal of Geophysical Research: Oceans
CAMARGO ET AL.

The REA category (Figure 4, green) shows the largest ranges for both periods, with almost 1 mm/year difference between the trends of individual data sets. The category mean for 2005-2015 is 1.40 ± 0.21 mm/year, falling within the uncertainty range of C-GLORS, FOAM, and GLORYS2. The category mean for 1993-2017 is 1.47 ± 0.23 mm/year, within the uncertainty range of all individual trends. The uncertainties from the REA data sets are much larger than for the other groups, probably related to the higher spatial resolution of such products and the modeled internal variability.
[Caption fragment: … Table S1.]
To test whether the variance in the ensemble and category means explains the variance of the individual data sets, we computed the R² (Table S1).
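The text does not define how R² was computed. Assuming it is the squared Pearson correlation between a mean series (ensemble or category) and an individual data set's series over the same months, a minimal sketch would be the following. The assumption matters because 1 − SS_res/SS_tot is a common alternative definition, and the function name is hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Under this definition, R² = 1 means the individual series is a perfect linear function of the mean series, and R² = 0 means no linear relationship.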

Regional Steric SLC
Several processes that are important at a regional scale are averaged out in the GMSL analysis, especially when discussing residuals and uncertainties. Thus, in this section we complement the global mean discussion with results about the regional steric sea-level trends. To show how the steric SLC is regionally influenced by the chosen data sets (section 4.1) and the noise models (section 4.2), we selected one latitudinal transect in the center of each ocean basin (Figure 5a). We then use the ensemble mean to present the regional selection of noise models (section 4.3) used to obtain the regional steric sea-level trends (section 4.4).

Trend Dependence on Chosen Steric Data Sets
We first compare the steric trends of all different data sets for three latitudinal transects, using the AR(1) […]. The Indian Ocean (Figures 5c and 5d) is dominated by a positive steric trend, which can be related to the increasing atmospheric temperature in the region (Carvalho & Wang, 2019). Around 10°S, the trends in all data sets reverse to −2 mm/year and are not statistically significant, accompanied by an increased uncertainty (~3 mm/year). Going south, the trends become positive again, reaching the highest values of trend and uncertainty around 40°S (up to 30 ± 10 mm/year for FOAM). Further south (around 55°S), the trend becomes negative again, possibly an effect of negative trends in the Antarctic Circumpolar Current (Frankcombe et al., 2013).
In the Pacific transect (Figures 5e and 5f), the band between 40°S and 60°S shows the most variation between data sets, with a striped pattern that varies in intensity and width from data set to data set. A similar behavior was reported in a climate model study by Slangen et al. (2015), who related these changes to aerosol and greenhouse gas forcing. In the southern South Pacific, all data sets have a negative trend. As in the Indian Ocean, the uncertainties in the Pacific follow the pattern of the trends and increase where the trends reverse. There is a band of high uncertainty around the equator, coinciding with the negative trend values.
Most of the Atlantic Ocean transect trends and uncertainties (Figures 5g and 5h) agree between the data sets, except for the APDRC results. Most of the data sets have a negative trend in the north, matching the mid-2000s' signal of the North Atlantic subpolar gyre (Chafik et al., 2019). However, the length and intensity of this negative band vary significantly between the data sets. Only the CORA data set shows a positive trend in the north. For most of the data sets, around 20°S the trend becomes positive, with a stripe pattern of narrow bands until the far south of the transect. Only for the SIORG data set do we see an overall positive trend from 10°S to 60°S. The uncertainty values are generally lower in the Atlantic compared to those in the other transects. Most of the data sets, especially EN4, show an increase in the uncertainty around 40°S. The atypical negative trend between 40°N and 20°N of the APDRC data set is accompanied by a very high uncertainty value.

Trend Dependence on Chosen Noise Models
In contrast with varying the data set, the latitudinal trend profiles of the ensemble mean hardly change when only the noise model is varied (Figure 6 for 2005-2015 and Figure S5 for 1993-2017). However, the effect of the different noise models becomes much clearer in the uncertainty profiles. We see that the WN model likely underestimates the real uncertainty because of the low-frequency variability in the time series. Due to the small impact of WN, PL and PLWN show the same behavior. The noise models of the autoregressive family differ slightly from each other, with the uncertainties for AR(1) generally larger than those for AR(5), which in turn are larger than the AR(9) uncertainties. The ARF model has a more pronounced spatial pattern than the simple autoregressive models. The GGM model stands out due to very large uncertainties. For 1993-2017 (Figure S5), the latitudinal patterns of the trends and uncertainties are, in general, the same as for 2005-2015. Figure 6 also illustrates the importance of choosing an appropriate noise model to describe the uncertainties present in the data. The mean uncertainty for the WN model, in all transects, is of the order of ±0.7 mm/year, while the mean uncertainty for the GGM model is of the order of ±7 mm/year. However, a very low or very high uncertainty does not mean that a specific noise model cannot be the best descriptor of the data. In the next section, we look at the AIC and BIC to find the preferred noise model for each grid point.

Noise Model Selection (Regional)
To find which noise model best describes the regional variations of the ensemble mean, we investigate the noise model performance at each ocean grid point. These results depend on the resolution and ocean mask used. The regional selection of the preferred noise model (Figure 7 for the ensemble mean and Figure S6 based on individual data sets) shows that AR(1) and AR(5) are the preferred noise models, in agreement with the global analysis (section 3.3). According to AIC (Figures 7a and 7b), AR(5) is the preferred noise model for 41% of the ocean area for 2005-2015 and for 53% for 1993-2017. In contrast, BIC (Figures 7c and 7d) clearly prefers AR(1), selecting it as the best noise model for 73% and 54% of […]. The noise model preference shows some distinctive regional patterns. The AR(5) pattern resembles the ocean gyres, suggesting a link between these dynamic regions and a preference for more complex noise models. The subtropical regions (both north and south) also tend to prefer the AR(5) model, for instance, in the AIC for 2005-2015 and the BIC for 1993-2017. All scores show a clear "boomerang"-shaped pattern in the western equatorial Pacific (from ~130°E to 180°E and from 20°S to 20°N), associated with the El Niño-Southern Oscillation (ENSO; Wang & Picaut, 2004). However, in this region, different models of the autoregressive family score highest for the different criteria: AR(5) for AIC and ARF for BIC. While the GGM is selected as the preferred noise model for the global mean time series of some data sets, this is almost never the case regionally. In contrast, while the WN model is never preferred for the global mean, it is selected at a few grid points regionally. Comparing the two time periods, we see an increasing preference for higher-order autoregressive noise models as the study period lengthens, as already noted for the global mean.
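Percentages such as "41% of the ocean area" can be obtained by picking the best-scoring model at each grid point and summing the corresponding cell areas. The sketch below makes two assumptions: it keeps only the single lowest criterion value per cell, whereas the study's threshold rule (Text S1) can retain several models per cell, and it weights cells by cos(latitude) as a proxy for the area of a regular 1° cell. The function name is hypothetical.

```python
import math
from collections import defaultdict


def preferred_model_shares(cells):
    """Area-weighted share of ocean cells for which each model scores best.

    cells: list of (latitude_deg, {model_name: criterion_value}), where lower
        criterion values (e.g., AIC or BIC) are better.
    Returns {model_name: percentage of the total ocean area considered}.
    """
    area = defaultdict(float)
    total = 0.0
    for lat, scores in cells:
        w = math.cos(math.radians(lat))     # relative cell area
        best = min(scores, key=scores.get)  # lowest criterion wins
        area[best] += w
        total += w
    return {m: 100.0 * a / total for m, a in area.items()}
```

With one equatorial cell preferring AR(1) and one cell at 60°N preferring AR(5), AR(1) covers two thirds of the area because the high-latitude cell is half as large.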

Regional Steric Sea-Level Trends
In the previous section, the regional noise model selection (Figure 7) showed that a combination of different noise models is necessary to best describe the regional ensemble mean trend and uncertainties. While mixing noise models is not standard practice, it allows local processes to be well represented in the trend and uncertainty estimates. Thus, in this section, we show the regional steric sea-level trend and uncertainty (Figure 8) using the preferred noise model for each grid point. As AR(1) was the preferred noise model for more than 50% of the ocean grid points, the differences between the trends estimated with AR(1) and with the preferred noise model (Figures 8b and 8e) are significant only in the most dynamic regions. For both time periods, the differences are larger for the estimated uncertainties (Figures 8d and 8g).
Over the shorter time period (2005-2015), the regional trends are dominated by short-time-scale dynamics (Figures 8a and 8b). For example, the equatorial Pacific shows a very strong ENSO signal. The western boundary currents also display strong trend signals. The Gulf Stream (western North Atlantic) stands out clearly, with a high trend of about 6 to 7 mm/year. The same is seen for the Kuroshio Extension (western North Pacific) and for the Malvinas Current (western South Atlantic). The Agulhas Current (off South Africa) shows well-marked eddies. All these regions are dominated by internal variability rather than a forced trend, with higher uncertainties (Figures 8c and 8d), and consequently, their trends are sometimes not statistically significant.

10.1029/2020JC016551
Journal of Geophysical Research: Oceans
For the longer period (1993-2017), the trends (Figures 8e and 8f) are weaker and less dominated by interannual oscillations and ocean currents, revealing a predominantly positive trend over most of the ocean. For instance, the strong positive trend in the eastern equatorial Pacific associated with ENSO fades into a weak negative trend, while the negative trend in the west reverses to a positive, but weaker, trend. The western boundary currents are also only weakly reflected in the trend: the positive signal of the Gulf Stream is replaced by a negative trend. The negative trend of the North Atlantic Subpolar Gyre in 2005-2015 changes to a positive trend in 1993-2017, as also reported by Chafik et al. (2019). Although the Southern Ocean is outside our domain, the signal of the Antarctic Circumpolar Current is still visible along the southern boundary of our maps, marked by a negative trend for both periods. As the time series get longer, internal variability has a smaller influence on the long-term trend, resulting in smaller uncertainties in 1993-2017 (Figures 8g and 8h) than in 2005-2015.
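The reduction of uncertainty with record length can be made concrete with a common approximation for a linear trend with AR(1) residuals. In this approximation, the white-noise standard error is inflated by sqrt((1 + r)/(1 - r)), which is equivalent to using an effective sample size n(1 - r)/(1 + r). The sketch assumes monthly sampling and keeps the residual standard deviation and lag-1 autocorrelation fixed, so it isolates the effect of record length only. `trend_sigma_ar1` is a hypothetical helper, and the values 5 mm and 0.6 are arbitrary.

```python
import numpy as np


def trend_sigma_ar1(n, sigma, r, dt=1.0 / 12.0):
    """Approximate 1-sigma uncertainty of a linear trend fitted to n samples.

    sigma: standard deviation of the residuals
    r:     lag-1 autocorrelation of the residuals (AR(1) assumption)
    dt:    sampling interval in years (monthly by default, an assumption)
    """
    t = np.arange(n) * dt
    sxx = np.sum((t - t.mean()) ** 2)
    sigma_wn = sigma / np.sqrt(sxx)                    # white-noise standard error
    # AR(1) inflation sqrt((1 + r) / (1 - r)) = sqrt(n / n_eff),
    # with effective sample size n_eff = n (1 - r) / (1 + r)
    return sigma_wn * np.sqrt((1.0 + r) / (1.0 - r))


# Same noise level and autocorrelation, 11-year vs 25-year monthly records
short = trend_sigma_ar1(132, sigma=5.0, r=0.6)
long_ = trend_sigma_ar1(300, sigma=5.0, r=0.6)
print(f"11 years: {short:.3f} mm/year, 25 years: {long_:.3f} mm/year, ratio {short / long_:.2f}")
```

For these inputs, the 11-year uncertainty is about 3.4 times the 25-year one, regardless of the chosen noise level. In real data, the noise amplitude and the preferred noise model also differ between periods, so observed uncertainty ratios need not follow this simple scaling.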

Discussion
There is no clear consensus in the literature on which criterion should be used to select the best noise model, AIC or BIC (Burnham & Anderson, 2002), or even another criterion (Klees & Broersen, 2002). While Sonnewald et al. (2018) use only the AIC, Hughes and Williams (2010) advise using BIC, which shows less spatial variability and would therefore be more reliable than AIC. Figure 7 confirms this, as it displays more spatial differences for the AIC selection than for the BIC selection. However, if the aim is to investigate spatial variability, a stricter criterion might not be preferable. Still, the preference of AIC for higher-order models and of BIC for simpler models is evident in Figures 3 and 7, as also reported by Sonnewald et al. (2018).
To account for these differences, we compare and show both selection criteria, complemented by an ARMASA noise model analysis.
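The different behavior of the two criteria follows from their penalty terms. The textbook definitions are shown below, with $\hat{L}$ the maximized likelihood, $k$ the number of estimated parameters, and $n$ the number of observations. For illustration, the calculation assumes complete monthly records (n = 132 for 2005-2015 and n = 300 for 1993-2017); the exact sample sizes and implementation used in this study may differ. $\Delta$ denotes the minimum decrease in $-2\ln\hat{L}$ needed to justify AR(5), which has four more parameters, over AR(1).

```latex
\begin{aligned}
\mathrm{AIC} &= -2\ln\hat{L} + 2k, &\quad \mathrm{BIC} &= -2\ln\hat{L} + k\ln n,\\
\ln 132 &\approx 4.88, &\quad \ln 300 &\approx 5.70,\\
\Delta_{\mathrm{AIC}} &= 4 \times 2 = 8, &\quad
\Delta_{\mathrm{BIC}} &\approx 19.5\ (n = 132),\quad 22.8\ (n = 300).
\end{aligned}
```

BIC penalizes each additional parameter more heavily than AIC whenever $\ln n > 2$, that is, for $n > e^2 \approx 7.4$. This condition holds for any realistic record, which explains the tendency of BIC toward simpler models.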
Our global mean analysis indicates that an AR(1) model is the best descriptor of the residuals of the ensemble mean steric SLC. This contrasts with the findings of Bos, Williams, et al. (2014), who investigated the effect of noise models on GMSL, focusing on altimetry and tide gauge sea-level data. They found that for altimetry data, ARF and GGM models are equally preferred, followed by an AR(5) model. Although preferable to a white noise model, the GGM model was one of the least preferred models in our global analysis. When considering all data sets in the noise model analysis, our results indicate that AR(5) and ARF are among the best candidates for the global mean time series, partially in line with the results of Bos, Williams, et al. (2014).
Regionally, our noise model analysis indicates that most of the ocean is best described by an AR(5) or AR(1) noise model. This agrees with the findings of Hughes and Williams (2010) and Royston et al. (2018), who both looked for the best noise model to describe regional sea level observed with tide gauges and satellite altimetry. While Hughes and Williams (2010) found that AR(5) is the best descriptor of the sea-level uncertainties, Royston et al. (2018) found that most of the regional monthly data (from both tide gauges and altimetry) can be described by a WN or AR(1) noise model. However, the WN model was almost never selected in our analysis. Both studies also highlight that noise model preference is organized in large-scale spatial structures. The regional preference of the noise models (Figure 7) likewise displays spatially coherent patterns, reflecting the temporal and spatial variability of the ocean processes. Our regional results agree to some extent with the spatial patterns described by previous studies, particularly the preference for more complex models in the tropics and in highly dynamic regions. The latter might be related to multiple baroclinic modes present in these regions (Hughes & Williams, 2010).
The physical basis of the spatial patterns reported by Hughes and Williams (2010) was also confirmed by Sonnewald et al. (2018), who investigated the linear predictability of sea-surface height. Sonnewald et al. (2018) found that up to 50% of the sea-surface height variability over 20 years is explained by a seasonal signal. This might explain why the uncertainties for 1993-2017 are smaller than those for 2005-2015, as the linear trend becomes more important relative to the seasonal signal over longer periods. The differences between the noise model choices of previous studies (Hughes & Williams, 2010; Royston et al., 2018), which focused on tide gauge and altimetry data, and the present study, which focuses on steric SLC, illustrate that the best noise model for steric SLC is not necessarily the best for total SLC. Hence, it is important to always analyze which noise model is most appropriate for each type of data and to consider the effect of different spatial (global vs. regional) and temporal (daily, monthly, or annual) resolutions.
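The share of variance attributed to a seasonal signal is commonly quantified as the fraction of variance removed by a harmonic fit. The sketch below assumes annual and semiannual harmonics fitted by ordinary least squares to a detrended series with time in years. It is not necessarily the procedure of Sonnewald et al. (2018), and `seasonal_variance_fraction` and the synthetic series are hypothetical.

```python
import numpy as np


def seasonal_variance_fraction(t, y):
    """Fraction of the variance of y explained by annual + semiannual harmonics.

    t is time in years; y should already be detrended. The two-harmonic
    seasonal model is an illustrative assumption.
    """
    w = 2.0 * np.pi  # angular frequency of the annual cycle (rad/year)
    A = np.column_stack([
        np.ones_like(t),
        np.cos(w * t), np.sin(w * t),          # annual harmonic
        np.cos(2 * w * t), np.sin(2 * w * t),  # semiannual harmonic
    ])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    anomaly = y - y.mean()
    return 1.0 - (resid @ resid) / (anomaly @ anomaly)


# Hypothetical 20-year monthly series: 3 mm annual cycle plus 3 mm random noise
rng = np.random.default_rng(2)
t = np.arange(240) / 12.0
y = 3.0 * np.cos(2.0 * np.pi * t) + rng.normal(0.0, 3.0, t.size)
print(f"seasonal fraction: {seasonal_variance_fraction(t, y):.2f}")
```

With a 3 mm annual amplitude (variance 4.5 mm²) and 3 mm noise (variance 9 mm²), the fraction should come out near one third.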
Several sea-level budget studies use a suite of data sets, where the spread around the ensemble mean is used to describe the uncertainty of the sea-level trends (e.g., Cazenave et al., 2018; Gregory et al., 2013). This can misrepresent the uncertainty. For example, the 1-sigma spread of the estimated trends gives uncertainties of 0.71 and 0.36 mm/year for 1993-2017 and 2005-2015, respectively, whereas the AR(1) noise model gives 0.10 and 0.07 mm/year. In other cases, the authors do not mention how the uncertainties were obtained or what they represent in detail (e.g., Leuliette, 2015), making it difficult to interpret and compare steric estimates in the literature. As our results show, the noise model affects not only the uncertainty but also the estimated trend itself. This is expected, because the stochastic model is incorporated in the least squares regression (Bos, Williams, et al., 2014). We therefore recommend that future studies carefully describe how uncertainty estimates were obtained, as this aids physical interpretation.
While we find that the noise model has the strongest effect on the uncertainty, our results show that differences in the trend are mainly a result of the data set choice, both at a regional scale (Figure S9) and for the global mean (Figure 2). Most of the differences were found within the REA products, mainly the FOAM data set. A possible explanation is the incorporation of temperature and salinity data from instrumented marine mammals (Carse et al., 2015), which may lead to a high bias in the salinity values. Furthermore, the energy of deep currents (1,000 m) is highly overestimated in FOAM (Desportes et al., 2019). However, in comparison with other ocean REA products, FOAM estimates are the closest to tropical mooring observations (Desportes et al., 2019), so FOAM should not be discarded.
The World Climate Research Programme (WCRP) sea-level budget report (Cazenave et al., 2018) studied the same Argo period, from January 2005 to December 2015, and found an ensemble mean thermosteric sea-level trend of 1.31 ± 0.40 mm/year based on 11 in situ data sets, which is considerably higher than our ensemble mean of 1.08 ± 0.07 mm/year. However, Cazenave et al. (2018) [...]
Our results show that the variation of the steric sea-level trends caused by the use of different data sets can be reduced by using an ensemble of data sets. This is in line with the assumption that the systematic biases of the individual data sets are reduced by the averaging process (Storto et al., 2017). However, Rougier (2016) argues that the offsetting of biases alone is not a good reason to use the ensemble mean, as one should not expect individual data sets to have such fundamental differences. Rougier (2016) posits that the ensemble mean does not contain all of the variability of the true process, which can result in underestimating the true sea-level trend and uncertainty. Instead of using the ensemble mean, one could use the data set with the lowest root-mean-square error relative to the ensemble mean, which retains the variability of an individual data set around the ensemble mean (Rougier, 2016; Royston et al., 2020). In our analysis, the ISAS+ and C-GLORS data sets are closest to the ensemble means for 2005-2015 and 1993-2017, respectively. For the global mean, the uncertainty of both data sets is best described by the AR(5) model. The regional pattern of preferred noise models (Figure S7) is dominated by AR(9) for 2005-2015 and by AR(5) for 1993-2017. Compared to the ensemble mean, the noise model selection for ISAS+ and C-GLORS displays more spatial variability, especially for the shorter time period.
By construction, the final trend and uncertainty patterns for ISAS+ and C-GLORS (Figure S8) are similar to those of the ensemble mean (Figure 8), though they display more high-spatial-frequency features.
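The alternative to the ensemble mean discussed above, which selects the data set with the lowest root-mean-square error relative to the ensemble mean, can be sketched as follows. The example assumes equally weighted members on a common time axis. The function and data-set names are hypothetical placeholders, not the data sets analyzed here.

```python
import numpy as np


def closest_to_ensemble_mean(members):
    """Pick the member with the lowest RMSE relative to the equal-weight ensemble mean.

    members: dict mapping a data-set name to a 1-D array on a common time axis
    (a gridded field could be flattened first; NaNs are not handled here).
    Returns (name of closest member, ensemble mean, dict of RMSE per member).
    """
    names = list(members)
    stack = np.vstack([np.asarray(members[k], dtype=float) for k in names])
    ens_mean = stack.mean(axis=0)
    rmse = np.sqrt(np.mean((stack - ens_mean) ** 2, axis=1))
    best = names[int(np.argmin(rmse))]
    return best, ens_mean, dict(zip(names, rmse))


# Toy example with three hypothetical data sets (values in mm)
toy = {
    "set_a": [1.0, 2.0, 3.0],
    "set_b": [2.0, 3.0, 4.0],
    "set_c": [3.0, 4.0, 5.0],
}
best, ens_mean, rmse = closest_to_ensemble_mean(toy)
print(best, ens_mean, rmse)
```

For gridded fields, land points (NaN) would need to be masked before the RMSE is computed.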

Conclusion
Motivated by the large discrepancies in how uncertainties in sea-level data sets are treated, and by the wide variety of steric sea-level trends published in the literature, this paper explored the variation in present-day steric SLC estimates. Two sources of variation were investigated: the use of different data sets and the use of different noise models to describe the residuals. We analyzed 15 data sets and eight noise models and showed the different rates and uncertainties of steric SLC for two time periods (2005-2015, the Argo period, and 1993-2017, the satellite altimetry era). By simultaneously varying the data sets and noise models, the 2005-2015 global mean steric sea-level trend varied from 0.56 to 2.33 mm/year and its uncertainty from 0.02 to 1.65 mm/year.
Although noise models are mainly used to describe the uncertainties of time series, we found that they also affect, to a smaller degree, the trend itself. By switching among the eight noise models for the ensemble mean, the 2005-2015 global mean steric sea-level trend varied from 1.07 ± 0.03 to 1.11 ± 0.19 mm/year. This illustrates the significant impact of the noise model on the uncertainty, which was even stronger on the regional scale. Our noise model analysis suggests that an AR(1) model is the most appropriate model to describe the residual behavior of the global mean steric SLC of the ensemble mean and consequently leads to the most representative uncertainty. Regionally, a combination of different noise models is required to best describe steric SLC and its uncertainty. While AR(1) was the preferred noise model for most of the regions, ultimately the most appropriate noise model depends on the study location. The spatial coherence in the noise model preference shows clusters that have similar dynamics, which can be used to investigate the regional sea-level budget.
Compared to the noise model, the choice of data set has a stronger influence on the estimated trend. By varying the data set while keeping the AR(1) noise model fixed, the 2005-2015 steric SLC varied from 0.66 ± 0.12 to 1.86 ± 0.27 mm/year. Using an ensemble mean of several data sets reduces the effect of uncertainties in any single data set and gives a more robust estimate of the observed steric change.
There are distinct differences in the trends between the three categories of data sets: REA, MiS, and Argo-only. While the REA products have clear advantages, such as covering longer periods and greater depths than in situ-based data sets, there is a large spread of results within the REA category. Using only one REA product to estimate the steric SLC might therefore lead to a considerable overestimation or underestimation of the trend.
This study showed that the choice of data set and noise model results in large differences in the steric sea-level trend and the associated uncertainty. We therefore recommend that studies on sea-level trends and sea-level budgets perform a noise model analysis and report which noise model was used to determine the trend. An AR(1) noise model and an ensemble of data sets provide the best estimate of the steric contribution to global mean SLC up to at least 25 years. For the purpose of describing regional variations, more complex noise models are needed, such as higher-order autoregressive ones.