Volume 123, Issue 1 p. 399-410
Research Article

Retrieving Temperature Anomaly in the Global Subsurface and Deeper Ocean From Satellite Observations

Hua Su

Corresponding Author

Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, National Engineering Research Centre of Geo-spatial Information Technology, Fuzhou University, Fuzhou, China

Correspondence to: X.-H. Yan; H. Su
Wene Li

Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, National Engineering Research Centre of Geo-spatial Information Technology, Fuzhou University, Fuzhou, China

Xiao-Hai Yan

Corresponding Author

Laboratory for Regional Oceanography and Numerical Modeling, National Laboratory for Marine Science and Technology, Qingdao, China

State Key Laboratory of Marine Environmental Science, Xiamen University, Xiamen, China

Center for Remote Sensing, College of Earth, Ocean and Environment, University of Delaware, Newark, DE, USA

First published: 08 January 2018

Abstract

Retrieving the subsurface and deeper ocean (SDO) dynamic parameters from satellite observations is crucial for effectively understanding ocean interior anomalies and dynamic processes, but it is challenging to accurately estimate the subsurface thermal structure at the global scale from sea surface parameters. This study proposes a new approach based on Random Forest (RF) machine learning to retrieve subsurface temperature anomaly (STA) in the global ocean from multisource satellite observations including sea surface height anomaly (SSHA), sea surface temperature anomaly (SSTA), sea surface salinity anomaly (SSSA), and sea surface wind anomaly (SSWA), using in situ Argo data for RF training and testing. The RF machine-learning approach can accurately retrieve the STA in the global ocean from satellite observations of sea surface parameters (SSHA, SSTA, SSSA, SSWA). The Argo STA data were used to validate the accuracy and reliability of the results from the RF model. The results indicated that SSHA, SSTA, SSSA, and SSWA together are useful parameters for detecting SDO thermal information and obtaining accurate STA estimations. The proposed method also outperformed support vector regression (SVR) in global STA estimation. It will be a useful technique for studying SDO thermal variability and its role in the global climate system from global-scale satellite observations.

Key Points

  • A new approach to retrieve global subsurface temperature anomaly from satellite observations
  • Random Forest can accurately estimate subsurface temperature anomaly in the global ocean from sea surface parameters
  • Random Forest outperforms Support Vector Regression in global subsurface temperature anomaly estimation

1 Introduction

There are vast, complex, and highly dynamic processes at work within the ocean. Accurately estimating the thermohaline structure in the global subsurface and deeper ocean is essential for fully understanding these processes. Deeper ocean remote sensing has become an important research subject, as this part of the ocean is warming significantly in response to climate change, particularly during the recent global warming hiatus from 1998 to 2013 (Balmaseda et al., 2013; Song & Colberg, 2011). The recent hiatus (Kosaka & Xie, 2013) has been shown to be a heat redistribution within the oceans (Yan et al., 2016). The global subsurface and deeper ocean has played an essential role in the hiatus through heat uptake and storage (Chen & Tung, 2014; Drijfhout et al., 2014), but there is large uncertainty and discrepancy in the evaluation of subsurface and deeper ocean warming (Su et al., 2017). Further understanding of the ocean heat redistribution could help better track the energy budget in the Earth's system (Yan et al., 2016), but requires long-term, global-scale ocean interior observation data. Deriving dynamic parameters in the ocean's interior from satellite observations may help us to better depict the processes and features in the subsurface and deeper layers, as well as their implications for the climate system (Klemas & Yan, 2014).

It is important, but very challenging, to retrieve the thermal structure of the global subsurface and deeper ocean from satellite observations. Estimating the thermohaline structure at the global scale is necessary for accurately estimating subsurface flow fields, advection, and other dynamic processes in the ocean's interior over large scales (Wilson & Coles, 2005). Satellite remote sensors have collected multiple sea surface observations at various spatiotemporal scales for several decades, but these observations are confined to the ocean surface layers (Ali et al., 2004). Satellite remote sensors cannot directly detect information beneath the ocean surface layer, yet many significant dynamic processes and features are located at much greater depths below the surface (Klemas & Yan, 2014). It is vital to determine which surface remote sensing observations can be used, and how, to derive important dynamic parameters of the ocean's interior.

Remote sensing observations and numerical models enable us to interpret and depict subsurface phenomena from sea surface features (Su et al., 2015). Satellite observations do not directly represent subsurface information, but some aspects of subsurface structure with surface manifestations could be detectable by satellite sensors (Fiedler, 1988). Most of the studies on this subject based on dynamic models could not provide sufficiently wide coverage. Complicated and time-consuming assimilation models can more accurately reflect the large-scale subsurface thermal structure. However, a combination of satellite observations and in situ measurements can be used to directly and effectively retrieve the subsurface information if appropriate algorithms are applied (Klemas & Yan, 2014).

The subsurface temperature anomaly (STA) and dynamic height anomaly (DHA) have high spatial similarities at different depths (Dennis et al., 2001; Willis et al., 2003). Sea surface topography from satellite altimetry could be used to infer subsurface temperature structure (Khedouri et al., 1983). Sea surface temperature from satellite observation can be used to determine vertical thermal structure (Chu et al., 2000). A multivariate projection method was applied to estimate the 3-D temperature field using both SSH and SST anomalies (Fischer, 2000). Meijers et al. (2011) used satellite altimetry to estimate the 4-D structure of the Southern Ocean. Guinehut et al. (2004, 2012) derived high-resolution 3-D temperature and salinity fields from satellite observations and in situ data. In addition to traditional regression methods, a neural network was used to determine ocean subsurface temperature structure from sea surface parameters (Ali et al., 2004). A self-organizing map (SOM) neural network was employed to estimate subsurface temperature profiles from multiple surface observations (Wu et al., 2012). The STA in the Indian Ocean and the global ocean was estimated from multisource satellite measurements based on a support vector machine (SVM) approach (Li et al., 2017; Su et al., 2015). An iterative self-organizing map method was presented for the reconstruction of subsurface velocities from satellite observations (Chapman & Charantonis, 2017). A new “interior + surface quasigeostrophic” (isQG) method was proposed for reconstructing subsurface velocity and density fields from sea surface density and sea surface height information (Wang et al., 2013). The density and horizontal velocity fields of the ocean's interior can also be retrieved from surface data via the isQG method (Liu et al., 2014, 2017). There have been other valuable studies of the ocean's interior that employ satellite observations (Fox et al., 2002; Swart et al., 2010; Willis et al., 2003).

Existing approaches for retrieving subsurface thermal structure from sea surface parameters are generally based on either dynamical models or statistical models. The existing statistical approaches give little attention to global-scale application and advanced machine-learning models, and employ only a few surface parameters to derive the subsurface dynamic fields. Thus, the estimation methods themselves and their global-scale accuracy still leave much room for improvement. Machine-learning algorithms are being developed to estimate regional and basin-scale ocean interior thermohaline structure using multiple satellite observations combined with in situ measurements. There is also room, however, for further improvements in the accuracy of large-scale subsurface thermal structure estimates to facilitate heat content estimation of the subsurface and deeper layers (Yan et al., 2016).

This paper proposes a new method based on a Random Forest (RF) machine-learning model to estimate STA in the global ocean from multiple satellite observations including sea surface height (SSH), sea surface temperature (SST), sea surface salinity (SSS), and sea surface wind (SSW) using in situ Argo data for RF training and testing. All four global-scale sea surface parameters are derived from satellite remote sensing. The performance was evaluated by comparing the horizontal maps of RF-predicted STA with Argo STA at different depth levels. The proposed method was proven feasible and effective by comparison against another popular machine-learning model, support vector regression (SVR) (Vapnik, 1995; Vapnik et al., 1997).

2 Study Area and Data

The global ocean covers almost 71% of the Earth's surface and contains about 97% of the Earth's water. The ocean, the Earth's most prominent feature, plays a significant role in modulating the global climate system—this has become especially clear in the context of global warming and the recent warming hiatus (Yan et al., 2016). It plays an important role as a sink for heat building up in the Earth's systems (Chen & Tung, 2014), and also acts as an important sink for the increasing CO2 resulting from human activities. The study area examined here is the global ocean which is located between 180°W–180°E and 78.375°S–77.625°N and includes the Pacific Ocean, Atlantic Ocean, and Indian Ocean.

Multisource satellite observations can be used to obtain diverse sea surface parameters at multiple spatiotemporal scales such as SSH, SST, SSS, and SSW. In this study, the SSH data were obtained from AVISO altimetry with 0.25° × 0.25° spatial resolution (AVISO, http://www.aviso.altimetry.fr). The SST was provided by an AMSR-E sensor on the AQUA satellite with 0.25° × 0.25° spatial resolution (http://www.remss.com/missions/amsre). The accuracy for the SST products is about 0.5°C. The SSS was obtained from a MIRAS sensor on the SMOS satellite with 1° × 1° spatial resolution (http://eopi.esa.int). The SMOS Level 3 SSS products present a global mean and standard deviation error of 0.006 pss and 0.287 pss, respectively. The SSW was provided by the CCMP (Cross-Calibrated Multi-Platform) wind velocity product with spatial resolution of 0.25° × 0.25° (https://rda.ucar.edu/datasets/ds745.1/). The CCMP data set combines multisource surface wind data using a variational analysis method (VAM) to produce high-resolution gridded analyses (Atlas et al., 2011). The Argo data sets obtained from the International Pacific Research Center (IPRC, http://iprc.soest.hawaii.edu) are monthly global gridded data based on an optimal VAM interpolation method. The Argo temperature data with 1° × 1° spatial resolution have global coverage at 27 standard depth levels in the upper 2,000 m.

The data inputs to the proposed RF machine-learning process are monthly SSHA, SSTA, SSSA, and SSWA of the global ocean in 2010, and the output of the RF model is the estimated STA at different depth levels. The SSHA, SSTA, SSSA, and SSWA are anomalies obtained by subtracting the monthly mean climatology from SSH, SST, SSS, and SSW, which removes the climatological seasonal signal. The Argo STA data were used as RF training and testing labels, and also as a reference for accuracy evaluation. All the data used by the RF (each sea surface feature and the Argo STA data) were unified to the same 1° × 1° spatial resolution with the same temporal and spatial coverage of the global ocean.
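The anomaly step described above can be sketched in a few lines. This is only an illustration: the values are hypothetical, the function names are not from the paper, and the climatology is computed from the sample itself because the text does not state which reference period was used.

```python
from collections import defaultdict

def monthly_climatology(series):
    """Mean value per calendar month from (year, month, value) records."""
    by_month = defaultdict(list)
    for _year, month, value in series:
        by_month[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in by_month.items()}

def anomalies(series, climatology):
    """Subtract the calendar-month climatology from each monthly value."""
    return [(year, month, value - climatology[month])
            for year, month, value in series]

# Hypothetical SST values (deg C) at a single grid cell; not real data.
sst = [(2009, 10, 27.0), (2009, 11, 26.0), (2010, 10, 25.5), (2010, 11, 25.0)]
clim = monthly_climatology(sst)   # assumption: climatology from this sample
ssta = anomalies(sst, clim)
for year, month, value in ssta:
    print(year, month, f"{value:+.2f}")
```

Because each calendar month is referenced to its own mean, the seasonal cycle cancels and only the departure from the typical state for that month remains.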

3 Methods

3.1 Random Forests (RFs)

RFs are a popular and widely used ensemble learning method for data classification and regression. Breiman (2001) proposed the general strategy of RFs, which fit numerous decision trees on various data subsets obtained by randomly resampling the training data. RFs average the predictions of the individual trees to improve prediction accuracy and to correct for a single decision tree's tendency to overfit. RFs have been effectively applied in various remote sensing studies (Gislason et al., 2006; Ham et al., 2005; Zhang et al., 2014) and typically perform very well. Several advantages make RFs well-suited to remote sensing studies (Guo et al., 2011; Yu et al., 2011).

The basic strategy of an RF is to grow a number of decision trees on random subsets of the training data (Stumpf & Kerle, 2011), determining the decision rules and choosing the best split at each node (Liaw & Wiener, 2002). This strategy performs well compared to many other classifiers (such as SVMs and neural networks) and is robust against overfitting (Breiman, 2001). RF is also user-friendly since it needs only two input parameters for training, the number of trees in the forest (ntree) and the number of variables/features in the random subset at each node (mtry), and the results are usually not very sensitive to their values (Liaw & Wiener, 2002). These advantages make RF well suited to STA prediction.
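To make the roles of ntree and mtry concrete, the following is a minimal pure-Python sketch of an RF regressor (bootstrap resampling, a random feature subset of size mtry at each node, and averaging over trees). It is not the software used in the study. The stopping rules (`min_leaf`, `max_depth`), the function names, and the synthetic data are assumptions made for illustration.

```python
import random

def avg(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, mtry, rng, min_leaf=5, depth=0, max_depth=8):
    """Grow one regression tree; each node tries only `mtry` random features."""
    if len(y) < 2 * min_leaf or depth >= max_depth:
        return avg(y)                              # leaf: mean of the labels
    best = None
    for j in rng.sample(range(len(X[0])), mtry):   # random feature subset
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return avg(y)
    _, j, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          mtry, rng, min_leaf, depth + 1, max_depth)
    return (j, t, grow(left), grow(right))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def fit_forest(X, y, ntree, mtry, seed=0):
    """Fit `ntree` trees, each on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(len(y)) for _ in y]   # sample with replacement
        forest.append(build_tree([X[i] for i in idx],
                                 [y[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return avg([predict_tree(tree, row) for tree in forest])

# Synthetic stand-in for four scaled surface features and one target.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(120)]
y = [2 * r[0] - r[1] + data_rng.gauss(0, 0.1) for r in X]
forest = fit_forest(X, y, ntree=15, mtry=2)
print(predict_forest(forest, [0.9, -0.9, 0.0, 0.0]))
```

Averaging many trees that each see a different resample and a different feature subset is what reduces the variance of a single deep tree. The small ntree here only keeps the sketch fast.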

3.2 Experimental Setup

The flowchart for STA estimation at different depth levels (e.g., 300 m depth) of the global ocean via the RF model is shown in Figure 1. There are three steps: training data set preparation and input, RF training for the prediction model, and RF estimation by the model. An RF regression has only two parameters, ntree and mtry. To ensure an optimal RF learning model, we used a quantitative grid-search analysis to find the optimal input parameter values (ntree = 500 and mtry = 2 for four-parameter RF, ntree = 500 and mtry = 1 for three-parameter RF, and ntree = 500 and mtry = 1 for two-parameter RF) (Figure 2). Figure 2 shows that the four-parameter RF performs best and most stably (with the largest and most stable r2) once ntree reaches about 500, and that r2 is highest when mtry = 2, suggesting that ntree = 500 and mtry = 2 are the best combination for the four-parameter RF. We used the same strategy to determine the optimal values for the three-parameter and two-parameter RFs.
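A grid search of this kind can be sketched generically as follows. The scoring function here is a synthetic placeholder with an arbitrary peak, not the r2 surface of Figure 2. In practice it would train an RF with the given (ntree, mtry) and return the r2 on held-out data. The grid values and function names are likewise assumptions.

```python
from itertools import product

def grid_search(score_fn, ntree_values, mtry_values):
    """Return (best_score, ntree, mtry) over the full Cartesian grid."""
    best = None
    for ntree, mtry in product(ntree_values, mtry_values):
        score = score_fn(ntree, mtry)
        if best is None or score > best[0]:
            best = (score, ntree, mtry)
    return best

def placeholder_score(ntree, mtry):
    # Synthetic stand-in only: peaks at an arbitrary point (300, 3).
    # A real score_fn would fit the model and return held-out r2.
    return -abs(ntree - 300) / 100 - abs(mtry - 3)

best_score, best_ntree, best_mtry = grid_search(
    placeholder_score,
    ntree_values=range(100, 1001, 100),   # hypothetical grid
    mtry_values=range(1, 5),              # at most the number of features
)
print(best_ntree, best_mtry)
```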

Figure 1. The flowchart for STA estimation at different depth levels (e.g., 300 m depth) of the global ocean via RF-based approach. SSHA, SSTA, SSSA, and SSWA are from satellite observations of the sea surface.

Figure 2. The determination of optimal input parameter values (ntree and mtry) for RF model (e.g., four-parameter RF) based on grid-search analysis.

First, as preprocessing steps for RF, the input data were converted to the RF package format and a simple linear scaling was applied. We employed a linear normalization method to rescale all the data to the range [−1, 1], so as to avoid fields with larger numeric ranges dominating those with smaller ones, and also to prevent numerical difficulties during the model calculation. We applied the same normalization to both training and testing data. Then, Argo STA data at each depth level were randomly sampled and split into independent RF training and testing labels: 60% for training and 40% for testing. Finally, we established the RF model by using the optimal parameters (ntree, mtry) and sea surface parameters (SSHA, SSTA, SSSA, SSWA) as inputs for the RF training and prediction. We used MSE (mean squared error) and r2 (squared correlation coefficient) to evaluate RF performance and to quantitatively evaluate the RF-predicted results (Figure 1).
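The preprocessing and evaluation steps above can be sketched as follows. The function names and sample values are illustrative assumptions. The exact scaling bounds and random sampling scheme used in the study are not given in the text, so this sketch uses min-max bounds and a seeded shuffle.

```python
import random

def scale_to_unit_range(values, lo=None, hi=None):
    """Linearly map values to [-1, 1]. Pass the training lo/hi when scaling
    test data so that both sets share the same normalization."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

def split_indices(n, train_fraction=0.6, seed=0):
    """Random, non-overlapping train/test index lists (60/40 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Squared Pearson correlation coefficient between truth and prediction."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    vt = sum((a - mt) ** 2 for a in y_true)
    vp = sum((b - mp) ** 2 for b in y_pred)
    return cov * cov / (vt * vp)

scaled = scale_to_unit_range([0.0, 5.0, 10.0])
train_idx, test_idx = split_indices(10)
print(scaled, len(train_idx), len(test_idx))
print(mse([1, 2, 3], [1, 2, 4]), r_squared([1, 2, 3], [2, 4, 6]))
```

Note that r2 defined as a squared correlation is insensitive to bias and scale in the predictions, which is why it is reported alongside MSE rather than instead of it.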

The RF model was set up by using four parameters (SSHA, SSTA, SSSA, SSWA), three parameters (SSHA, SSTA, SSSA), and two parameters (SSHA, SSTA) as input data sets for performance comparison. STA was estimated by RF model at 16 depth levels by using three different combinations of sea surface parameters (four parameters, three parameters, and two parameters) as RF inputs for the sake of comparison. In addition to the RF method, we also set up an SVR model (a popular supervised learning model based on support vector machine (SVM)) for global STA estimation as a comparative experiment on which method is better-suited to detecting the global-scale STA. To ensure an optimal SVR model for comparison, we carried out a cross-validation (grid-search) quantitative analysis for obtaining the best penalty (C) and Gamma (G) parameters (C = 1, G = 8) for four-parameter SVR setup.

4 Results and Discussions

The sea surface parameters (SSHA, SSTA, SSSA, and SSWA) of the global ocean in October 2010 from satellite observations are shown in Figure 3. They are the input data sets for the RF model, with the same temporal and global coverage. The SSHA, SSTA, SSSA, and SSWA range from −2 to 2 m, −4 to 3°C, −1.5 to 1.0 PSS and −4 to 4 m/s, respectively (Figure 3). The four sea surface parameters show distinctive spatial features with obvious spatial heterogeneity over the global ocean. The SSHA shows a significant zonal block-pattern (positive or negative) distribution over different regions of the global ocean, with an especially pronounced negative pattern in the central and eastern equatorial Pacific Ocean and a positive pattern in the western North and western South Pacific Ocean (Figure 3a). The SSTA presents a significant zonal distribution pattern. The negative SST anomaly in the central and eastern equatorial Pacific Ocean presents a La-Niña-like cooling pattern with a Niño 3.4 index of −1.5°C. The SST in the tropical Indian Ocean shows a negative Indian Ocean Dipole (IOD) pattern (Figure 3b). The SSSA presents a zonal and even distribution pattern in different ocean basins (Figure 3c), while the SSWA shows a zonal block-pattern and longitudinal alternation distribution over the global ocean (Figure 3d).

Figure 3. Monthly SSHA (a), SSTA (b), SSSA (c), and SSWA (d) for the global ocean from satellite observations (sea surface parameters as RF input data sets) with identical 1° × 1° spatial resolution in October 2010.

The RF experiments were set up to retrieve STA at 16 depth levels in the upper 1,000 m (from 30 m to 1,000 m). The RF inputs included three combinations of sea surface parameters (four parameters (SSHA, SSTA, SSSA, SSWA), three parameters (SSHA, SSTA, SSSA), and two parameters (SSHA, SSTA)), which allowed us to determine which parameter combination is best-suited to the RF model and to explore the effects of SSS and SSW on the RF estimation. The RF-estimated STA from the four-parameter RF model at depths of 100, 300, 500, 700, and 1,000 m in the global ocean is shown in Figure 4. The Argo STA at the same depth levels was employed to verify the RF-estimated STA from the horizontal maps and to validate the prediction results per the performance measures discussed above (Figure 4 and Table 1).

Figure 4. RF-estimated STA (left) based on four sea surface parameters (SSHA, SSTA, SSSA, SSWA) compared to Argo gridded STA (right) at different depth levels of the global ocean in October 2010.

Table 1. Performance Measures of Two-Parameter, Three-Parameter, and Four-Parameter RFs for STA Estimation at 16 Depth Levels per Calculated MSE and r2 Values
2var = (SSHA, SSTA); 3var = (SSHA, SSTA, SSSA); 4var = (SSHA, SSTA, SSSA, SSWA)
STA estimation at different depths (16 levels) MSE (2var/3var/4var) r2 (2var/3var/4var)
30 m 0.0153/0.0133/0.0114 0.554/0.618/0.661
50 m 0.0109/0.0098/0.0085 0.434/0.504/0.557
75 m 0.0125/0.0110/0.0095 0.501/0.564/0.627
100 m 0.0116/0.0102/0.0090 0.483/0.539/0.603
125 m 0.0108/0.0095/0.0083 0.509/0.560/0.630
150 m 0.0158/0.0137/0.0113 0.472/0.521/0.609
200 m 0.0089/0.0077/0.0066 0.418/0.469/0.531
250 m 0.0109/0.0092/0.0080 0.400/0.459/0.516
300 m 0.0084/0.0072/0.0063 0.427/0.484/0.531
400 m 0.0074/0.0062/0.0054 0.428/0.485/0.538
500 m 0.0057/0.0048/0.0043 0.447/0.498/0.547
600 m 0.0057/0.0050/0.0045 0.439/0.481/0.528
700 m 0.0069/0.0062/0.0054 0.378/0.416/0.485
800 m 0.0079/0.0071/0.0061 0.339/0.382/0.458
900 m 0.0089/0.0081/0.0070 0.300/0.345/0.422
1,000 m 0.0099/0.0091/0.0080 0.241/0.294/0.370

The distribution features of the RF-estimated STA generally coincided with the Argo STA according to visual comparison of the horizontal RF-based and Argo STA maps at different depth levels. Most of the distinctive anomaly patterns (positive or negative) of the subsurface temperature over the global ocean were effectively detectable from sea surface remote sensing observations via the RF machine-learning approach. We quantitatively evaluated the results using MSE and r2 as performance measures (Figure 4 and Table 1). The RF models showed different performances at the various depth levels, e.g., MSE = 0.0090 and r2 = 0.603 at 100 m depth, MSE = 0.0063 and r2 = 0.531 at 300 m depth, MSE = 0.0043 and r2 = 0.547 at 500 m depth, MSE = 0.0054 and r2 = 0.485 at 700 m depth, and MSE = 0.0080 and r2 = 0.370 at 1,000 m depth. A lower MSE and higher r2 indicate better RF performance with higher accuracy. Taken together, these indicators suggest that the RF method can accurately estimate STA in the upper 1,000 m of the global ocean from surface remote sensing observations.

The STA range decreased with depth, while the anomaly signal for the subsurface temperature grew weaker and less distinct in spatial heterogeneity (Figure 4). This phenomenon is related to the significant differences in dynamic processes between the intermediate and upper ocean. At the depth of 100 m (upper ocean), the distinctive warm pool (positive STA) and La-Niña-like patterns (negative STA) in the tropical ocean dominate the subsurface temperature variation over the global upper ocean. From 300 to 1,000 m depth (subsurface and deeper ocean), conversely, the dominant STA patterns show similar distribution features but differ considerably from those at 100 m depth in the upper ocean. We observed significant STA signals in the Antarctic Circumpolar Current (ACC), Gulf Stream, and Kuroshio Current (strong current dynamic processes with intense mesoscale eddy processes) regions; the STA patterns close to the boundary are more intense and significant than those in the central ocean basin (likely due to the strong boundary current). In general, in the upper ocean, the dominant STA patterns are distributed in the tropical ocean (including Pacific, Indian, and Atlantic areas) and are dominated by warm pool and La-Niña-like patterns; in the subsurface and deeper ocean, the dominant anomaly patterns occur in the Southern Ocean (ACC region), North Atlantic (Gulf Stream region), and western North Pacific (Kuroshio Current region) accompanied with strong current processes and mesoscale processes (Figure 4).

Our quantitative comparison of performance measures for two-parameter, three-parameter, and four-parameter RFs for global STA estimation at 16 different depth levels per MSE and r2 results is shown in Table 1 and Figure 5. The average MSE and r2 of the 16 depth levels are 0.0091/0.0080/0.0069 and 0.406/0.456/0.521 for two parameters/three parameters/four parameters (MSE ranges from 0.0057/0.0048/0.0043 to 0.0158/0.0137/0.0114, and r2 ranges from 0.241/0.294/0.370 to 0.554/0.618/0.661 for two parameters/three parameters/four parameters), respectively. The four-parameter RF outperforms the other two in global STA estimation, with the lowest MSE and highest r2 not only on average but also at each depth level; three-parameter RF outperforms two-parameter one. In effect, the RF with SSS and SSW achieved much better performance in STA estimation compared to those RFs without SSS and SSW (Table 1 and Figure 5). SSS and SSW are useful parameters in addition to SSH and SST for detecting the subsurface and deeper ocean thermal information, and can improve STA estimation accuracy at all depth levels in the upper 1,000 m. These important ocean surface dynamic parameters may play important roles in ocean motion and processes (e.g., current, mixing, and convection dynamics), which connect the upper ocean and the subsurface layer to a certain extent.

Figure 5. Performance measures of four-parameter (SSHA, SSTA, SSSA, SSWA), three-parameter (SSHA, SSTA, SSSA), and two-parameter (SSHA, SSTA) RFs for STA estimation at 16 different depths per MSE and r2 values.

We assessed the accuracy and sensitivity of the proposed method at 16 different depth levels from 30 to 1,000 m. The four-parameter RF performed best at each depth level, followed by the three-parameter and two-parameter RFs, but they showed similar accuracy variation trends. The MSE stayed low (from about 0.004 to 0.015) in all experimental cases. The lowest MSE was calculated at 500 m and the highest at 150 m. The MSE and r2 trends differ: r2 generally decreases with depth, whereas MSE first decreases and then increases, with a trough at 500 m. The r2 trend has an outlier at 50 m and the MSE trend has one at 150 m. In the upper 500 m, MSE and r2 both decline slowly: MSE decreases from about 0.015 to 0.004 and r2 declines from about 0.66 to 0.40. Below 500 m, however, MSE gradually increases from about 0.004 to 0.009 while r2 gradually decreases from about 0.50 to 0.30. The tendency for underestimation in the deeper layers in the ACC region and Gulf Stream region, which are characterized by strong physical processes, is illustrated in Figure 4.

In general, the RF showed both relatively high MSE and relatively high r2 in the upper 500 m, per the MSE and r2 trends we calculated. The outliers in the MSE and r2 trends might be related to the complex dynamic processes of the upper ocean and the disturbance from the mixed layer and thermocline. The estimation accuracy gradually decreases at depths below 500 m, per the steep decline in r2 and increase in MSE. This might be due to the relatively stable seawater stratification in the deeper layer: deeper ocean phenomena have weaker surface manifestations that are harder to interpret from the satellite measurements (Figure 5).

We also employed a well-known SVR for comparison against the proposed RF method. The average MSE and r2 of the 16 levels are 0.0069/0.0091 and 0.521/0.488 for the four-parameter RF and SVR, respectively. RF performed better than SVR, with lower MSE and higher r2 not only on average but also at each depth level, suggesting that RF is better-suited than SVR to global-scale STA estimation in the upper 1,000 m (Table 2 and Figure 6).

Table 2. Performance Measures of Four-Parameter (SSHA, SSTA, SSSA, SSWA) RF and SVR for STA Estimation at 16 Depth Levels per Calculated MSE and r2 Values
STA estimation at different depths (16 levels) MSE (RF/SVR) r2 (RF/SVR)
30 m 0.0114/0.0139 0.661/0.619
50 m 0.0085/0.0115 0.557/0.467
75 m 0.0095/0.0115 0.627/0.576
100 m 0.0090/0.0102 0.603/0.572
125 m 0.0083/0.0108 0.630/0.585
150 m 0.0113/0.0149 0.609/0.535
200 m 0.0066/0.0090 0.531/0.478
250 m 0.0080/0.0107 0.516/0.462
300 m 0.0063/0.0081 0.531/0.496
400 m 0.0054/0.0061 0.538/0.520
500 m 0.0043/0.0050 0.547/0.517
600 m 0.0045/0.0051 0.528/0.486
700 m 0.0054/0.0061 0.485/0.431
800 m 0.0061/0.0070 0.458/0.398
900 m 0.0070/0.0078 0.422/0.360
1,000 m 0.0080/0.0086 0.370/0.300

Figure 6. Performance measures of four-parameter (SSHA, SSTA, SSSA, SSWA) RF and SVR for STA estimation at 16 different depths per MSE and r2 values.

Scatter plots between RF-estimated (four-parameter RF) and Argo STA at 150 m (MSE = 0.0113, r2 = 0.609), 300 m (MSE = 0.0063, r2 = 0.531), 600 m (MSE = 0.0045, r2 = 0.528), and 900 m (MSE = 0.0070, r2 = 0.422) depths are shown in Figure 7. We used the Argo STA to compare and validate the RF-estimated STA at different depth levels with these scatter plots. Most of the data points are concentrated along the equal-value line with a high correlation coefficient and a low MSE (Figure 7), especially at 300 and 600 m, again indicating that the RF method is reliable and robust for STA estimation at the global scale.

Figure 7. Scatter plots between four-parameter RF and Argo STA estimates at 150 m (MSE = 0.0113, r2 = 0.609), 300 m (MSE = 0.0063, r2 = 0.531), 600 m (MSE = 0.0045, r2 = 0.528), and 900 m (MSE = 0.0070, r2 = 0.422).

5 Conclusions

This paper proposed a novel method for retrieving STA in the global ocean based on an RF, a popular machine-learning method for data regression. The proposed model was found to accurately estimate the global STA in the upper 1,000 m through multiple surface remote sensing observations (SSHA, SSTA, SSSA, and SSWA) with in situ Argo data for labeling at different depth levels. The RF performance on global STA estimation was quantitatively evaluated by a combination of MSE and r2.

The RF-predicted results were tested for accuracy and reliability using the worldwide Argo STA data. The average MSE and r2 of 16 depth levels are 0.0069 and 0.521 for the four-parameter RF, respectively. The estimation accuracy was improved by including SSSA and SSWA as RF inputs (MSE decreased by 12.1%/24.2% and r2 increased by 12.3%/28.3% on average for three-parameter/four-parameter RFs compared to the two-parameter RF). The estimation accuracy gradually decreased as depth increased below 500 m. The STA signal became weaker and showed less distinct spatial heterogeneity in the global distribution at greater depths due to water stratification and stability.

Sea surface parameters are crucial inputs for retrieving the STA from RF. Our results showed that SSSA and SSWA, in addition to SSTA and SSHA, are useful for accurately estimating the subsurface thermal structure, and can help improve the model performance. We compared the proposed method against the SVR method and found that the RF approach is more accurate (MSE decreased by 24.2% and r2 increased by 6.8% on average), i.e., better suited to STA estimation than SVR throughout the global ocean. The RF-based method is also favorable in that there is no limitation on the input of sea surface parameters. Accordingly, more useful sea surface parameters can be mined from satellite remote sensing and used as input attributes so as to further improve the STA sensing accuracy from RF.
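The relative improvements quoted in these conclusions follow from the depth-averaged MSE and r2 values reported in the results. A short check of that arithmetic, using only the averages stated in the text (the variable and function names are illustrative):

```python
def percent_change(new, old):
    """Signed relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100

# Depth-averaged values as quoted in the text (two-, three-, and
# four-parameter RF, and four-parameter SVR).
mse_2var, mse_3var, mse_4var, mse_svr = 0.0091, 0.0080, 0.0069, 0.0091
r2_2var, r2_3var, r2_4var, r2_svr = 0.406, 0.456, 0.521, 0.488

changes = {
    "MSE, 3var vs 2var": percent_change(mse_3var, mse_2var),
    "MSE, 4var vs 2var": percent_change(mse_4var, mse_2var),
    "r2, 3var vs 2var": percent_change(r2_3var, r2_2var),
    "r2, 4var vs 2var": percent_change(r2_4var, r2_2var),
    "MSE, RF vs SVR": percent_change(mse_4var, mse_svr),
    "r2, RF vs SVR": percent_change(r2_4var, r2_svr),
}
for label, value in changes.items():
    print(f"{label}: {value:+.1f}%")
```

A negative change in MSE and a positive change in r2 both indicate an improvement.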

We hope that the results provided here represent a useful approach to better detecting and studying the thermal structure and its variability in the ocean's interior, which has played an important role in global warming and its recent hiatus, from global-scale satellite observations. For future work, we will try to improve the STA detectability in the deeper layer by developing deeper ocean remote sensing techniques based on the combination of multiple satellite measurements with in situ observations, and further improve the estimation accuracy by using more useful sea surface parameters and more advanced techniques.

Acknowledgments

National Natural Science Foundation of China (41601444), SOA Global Change and Air-Sea Interaction Project (GASI-IPOVAI-01–04), Natural Science Foundation of Fujian Province, China (2017J01657), China Postdoctoral Science Foundation (2016M600495, 2017T100466), National Natural Science Foundation of China (41630963, 41476007), Fujian Collaborative Innovation Center for Big Data Applications in Governments (2015750401), and Central Guide Local Science and Technology Development Projects (2017L3012) are acknowledged for financial support. We also thank the International Pacific Research Center (IPRC) for the Argo gridded data (http://iprc.soest.hawaii.edu), AVISO altimetry for the SSH data (http://www.aviso.altimetry.fr), Remote Sensing Systems (RSS) for the AMSR-E SST data (http://www.remss.com/missions/amsre), the European Space Agency (ESA) Earth Online for the SMOS SSS data (http://eopi.esa.int), and the Research Data Archive at NCAR for the CCMP SSW data (https://rda.ucar.edu/datasets/ds745.1/), all of which are freely accessible to the public.