# A Multifidelity Framework and Uncertainty Quantification for Sea Surface Temperature in the Massachusetts and Cape Cod Bays

## Abstract

We present a multifidelity framework to analyze and hindcast predictions of sea surface temperature (SST) in the Massachusetts and Cape Cod Bays, which is a critical area for its ecological significance, sustaining fisheries and the blue economy of the region. Currently, there is a lack of accurate and continuous SST prediction for this region due to the high cost of collecting the samples (e.g., cost of buoys, maintenance, severe weather). In this work, we use SST data from satellite images and in situ measurements collected by the Massachusetts Water Resources Authority to develop multifidelity forecasting models. This multifidelity framework is based on autoregressive Gaussian process schemes that systematically exploit all correlations between data from multiple heterogeneous spatiotemporal sources with various degrees of fidelity. This enables us to obtain implicitly their functional relationships and, at the same time, quantify the uncertainty of the data-driven predictions. Specifically, in the current work, we develop and validate progressively more complex models, including temporal, spatial, and spatiotemporal multifidelity hindcast predictions of SST in the Massachusetts and Cape Cod Bays. Together with these predictions, we present for the first time uncertainty maps for the region.

## Key Points

- A data-driven approach for estimating sea surface temperature in the Massachusetts and Cape Cod Bays is presented
- A new assimilation technique based on autoregressive Gaussian processes to combine satellite and in situ measurements is presented
- The multifidelity model produces continuous sea surface temperature and uncertainty maps for the Massachusetts and Cape Cod Bays

## 1 Introduction

In the Gulf of Maine, Massachusetts and Cape Cod Bays encompass an area of highly productive fisheries, aquaculture facilities, and ecological resources that provide services to coastal communities. In this area, the Stellwagen Bank National Marine Sanctuary is of particular relevance, supporting strong local economies through commercial and recreational fisheries and providing more than 80% of the whale watching tourism in New England (https://sanctuaries.noaa.gov/science/socioeconomic/factsheets/stellwagenbank.html). The future of these benefits is uncertain, as sea surface temperatures in the Gulf of Maine are increasing at one of the highest rates in the world's oceans, with clear consequences in reducing Atlantic cod recruitment that contributed to overfished stocks (Pershing et al., 2015). Similarly, in New England, the lobster industry has migrated almost entirely into Maine, being no longer supported in southern coastal states as in the past (Steneck et al., 2013), while about 40 other fish and invertebrate species are considered highly vulnerable to climate change and its variability (Hare et al., 2016). In the summer of 2018, sea surface temperature (SST) anomalies were 1 to 3 °C above the 30-year average in the Gulf of Maine. These anomalies and ocean heat waves are becoming common in the Gulf of Maine. On 8 August 2018, the SSTs measured were the second warmest ever observed there (https://www.drought.gov/drought/sites/drought.gov.drought/files/media/reports/regional_outlooks/GOM%20Summer%202018.pdf). In view of these changes and the likelihood that they will intensify in the near future, more accurate and timely predictions of sea temperature at regional and local scales could better support resource management and economic planning.

The predictability of environmental drivers is a key step in the process of linking climate and biological processes for better management of marine resources. Advances in climate prediction facilitate this management, with a particular focus on dynamical seasonal to decadal prediction systems derived from Global Climate Models (Tommasi et al., 2017). Considerable progress has been made with this modeling approach, but some limitations exist in coastal predictions at regional scales, particularly on the U.S. East Coast (Stock et al., 2015). In addition, in situ data from buoys are relatively scarce in this region, but the Massachusetts Water Resources Authority (MWRA) sustains a comprehensive monitoring program for a wide range of bay conditions (Werme et al., 2017). The MWRA program provides accurate but coarse space-time measurements of various quantities of interest in the Massachusetts and Cape Cod Bays, including seawater temperature. Finally, satellite measurements have better temporal resolution, but they are gappy due to cloud coverage and require calibration. Therefore, by using a less conventional approach such as the one presented in this study, we aim to advance the predictability of SST in the region by exploiting all available sources of data of variable fidelity.

Gaussian processes (GPs) are nonparametric Bayesian machine learning techniques for learning functions from data (Rasmussen & Williams, 2006). A GP provides a flexible prior distribution over functions, enjoys analytical tractability, and has a probabilistic workflow that yields robust posterior estimates and handles uncertainty in a principled manner. Before conditioning on the training data, a GP model is completely specified by its mean function and its covariance function; the latter is called the *kernel* and determines how the GP model extrapolates and generalizes to new data. There are many possible choices of kernel. For example, linear regression, splines, Kalman filters, and optimal interpolation (OI) are all examples of GPs with particular kernels (Reece & Roberts, 2010). The kernel of a GP is often parameterized, and its hyper-parameters are learned from the observed data, for example, by maximum likelihood. This allows the kernel to be further fine-tuned to the training data. The training also determines the hyper-parameters associated with the uncertainty model. Training the kernel and the uncertainty model replaces the *guessing* work needed in OI, in which various matrices and noise models must be determined a priori.
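As a minimal sketch of how kernel hyper-parameters can be learned from data, the snippet below scans candidate correlation lengths of a squared-exponential kernel and keeps the one with the largest log marginal likelihood. The synthetic data, the fixed variance and noise level, and the grid of candidate lengths are illustrative assumptions, not values from this study; in practice a gradient-based optimizer would typically replace the grid scan.

```python
import numpy as np

def sq_exp_kernel(xa, xb, variance, length):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

def log_marginal_likelihood(x, y, variance, length, noise_var):
    """Log evidence of the data under a zero-mean GP with the given hyper-parameters."""
    K = sq_exp_kernel(x, x, variance, length) + noise_var * np.eye(x.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * x.size * np.log(2.0 * np.pi))

# Hypothetical noisy samples of a smooth function.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 40)
y_train = np.sin(2.0 * np.pi * x_train) + 0.1 * rng.standard_normal(x_train.size)

# Crude maximum-likelihood fit: scan candidate correlation lengths.
candidate_lengths = np.array([0.01, 0.03, 0.1, 0.2, 0.3, 1.0, 3.0])
scores = np.array([
    log_marginal_likelihood(x_train, y_train, 1.0, length, 0.1 ** 2)
    for length in candidate_lengths
])
best_length = candidate_lengths[np.argmax(scores)]
```

Very short lengths treat the data as nearly uncorrelated noise and very long lengths cannot follow the oscillation, so the evidence favors an intermediate value.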

The novelty of our approach is based on the use of a *multifidelity* framework for obtaining SST, exploiting correlations in space-time for the sea surface, and estimating both mean values and their uncertainty as *surface maps* in the Massachusetts and Cape Cod Bays by combining in situ measurements with satellite data. This framework allows us to replace the usual line error bars by space-time uncertainty maps. Our ultimate goal is to develop a regional model that could even monitor acidification, similar to the three regional aragonite models for the U.S. West Coast (Alin et al., 2018; Davis et al., 2018; Juranek et al., 2009), but continuously updated and endowed with uncertainty quantification and forecasting capabilities that will drive future measurements.

## 2 Data Sources

For the estimation of SST in Massachusetts and Cape Cod Bays, we consider two sources of data: (1) High fidelity: the MWRA measurements. The MWRA routinely measures SST and various other quantities at fourteen stations in Massachusetts and Cape Cod Bays (Libby et al., 2017). These stations are shown in Figure 1 (right). The MWRA measurements are taken at various depths and on a monthly basis, during nine surveys per year, with the exception of the winter months. In this manuscript, we use MWRA measurements taken at 1 m below the surface as an estimate for seawater surface temperature. (2) Low fidelity: the MODerate-resolution Imaging Spectroradiometer (MODIS) Terra on board a NASA satellite, with a spatial resolution of 4 × 4 km (Werdell et al., 2013). The satellite images are processed at level 3 with daily and nightly temporal resolutions and are provided through the NASA Earth Science Data System (ESDS) project. They can be accessed via the Physical Oceanography Distributed Active Archive Center (PO.DAAC). Throughout this manuscript, we will use the MODIS Terra, thermal-IR SST level 3, 4 km, daily product. We consider the satellite data in the region that spans the longitude of W to W and the latitude of N to N. The satellite images are gappy due to cloud coverage, as shown in Figure 1 (left). The observation counts of the MODIS satellite for the years 2015 and 2016 are shown in Figure 2. It is clear that during summer, there are significantly more satellite data available than in late fall, the entire winter, and early spring. Although the MODIS satellite data used in this study are level 3 SST products, that is, they are blended with in situ measurements, when compared against MWRA measurements the MODIS satellite measurements show up to C discrepancy. We expect that some of this disagreement results from comparing surface with subsurface measurements (MWRA data from 1 m deep), and some of it may come from the coarse satellite resolution of 4 × 4 km. However, the correlation plot between the MWRA measurements and the MODIS satellite measurements in the years 2015 and 2016 indicates that this bias is not consistent along the temperature interval (Figure 3). Therefore, we use the MODIS satellite measurements as a low-fidelity data source. Overall, the satellite data provide regional-scale patterns, while the MWRA measurements are accurate but scarce in situ measurements.

## 3 Multifidelity Modeling

In this section, we present the details of the multifidelity modeling based on GP regression (GPR; Rasmussen & Williams, 2006), which is a nonparametric Bayesian machine learning technique that enjoys analytic tractability and provides a fully probabilistic framework for approximating functions and treating approximation uncertainties in a principled manner.

### 3.1 Multifidelity Scheme

#### 3.1.1 Training

The hyper-parameters are learned from the low-fidelity and high-fidelity observations. The low-fidelity observations consist of a set of input points and the corresponding SST satellite measurements. Similarly, the high-fidelity observations consist of a set of high-fidelity input points and the corresponding MWRA measurements.
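For orientation, a common way to write a two-level autoregressive GP scheme is the linear form below. This is a sketch of the standard construction and is not necessarily the exact formulation used in this work. All symbols here are notation assumed for illustration: $f_L$ and $f_H$ are the low- and high-fidelity latent functions, $\rho$ is a scaling factor, $\delta$ is a bias GP independent of $f_L$, and $\sigma_L^2$, $\sigma_H^2$ are noise variances.

```latex
\begin{aligned}
f_H(\mathbf{x}) &= \rho\, f_L(\mathbf{x}) + \delta(\mathbf{x}),
\qquad f_L \sim \mathcal{GP}(0, k_L), \quad \delta \sim \mathcal{GP}(0, k_\delta), \\
\mathbf{y}_L &= f_L(\mathbf{X}_L) + \boldsymbol{\epsilon}_L,
\qquad \boldsymbol{\epsilon}_L \sim \mathcal{N}(\mathbf{0}, \sigma_L^2 \mathbf{I}), \\
\mathbf{y}_H &= f_H(\mathbf{X}_H) + \boldsymbol{\epsilon}_H,
\qquad \boldsymbol{\epsilon}_H \sim \mathcal{N}(\mathbf{0}, \sigma_H^2 \mathbf{I}), \\
\operatorname{cov}\!\begin{bmatrix} \mathbf{y}_L \\ \mathbf{y}_H \end{bmatrix}
&=
\begin{bmatrix}
k_L(\mathbf{X}_L, \mathbf{X}_L) + \sigma_L^2 \mathbf{I} & \rho\, k_L(\mathbf{X}_L, \mathbf{X}_H) \\
\rho\, k_L(\mathbf{X}_H, \mathbf{X}_L) & \rho^2 k_L(\mathbf{X}_H, \mathbf{X}_H) + k_\delta(\mathbf{X}_H, \mathbf{X}_H) + \sigma_H^2 \mathbf{I}
\end{bmatrix}.
\end{aligned}
```

Under these assumptions, training amounts to maximizing the Gaussian likelihood of the stacked observations with respect to $\rho$, the kernel hyper-parameters, and the two noise variances.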

#### 3.1.2 Prediction

The predictions of the multifidelity model can be made at *any* arbitrary point, and they are not tied to the location of the training data, for example, to satellite grids. In that sense, the predictions of the multifidelity model are continuous. The resulting posterior distribution at the high-fidelity level is taken as the prediction of the multifidelity model. The probabilistic prediction at arbitrary points corresponds to conditioning the joint Gaussian prior distribution on the low-fidelity and high-fidelity observations. To this end, we observe that the joint probability density function (PDF) of the new prediction points and the training points is also Gaussian.
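The conditioning step can be sketched in a few lines, assuming a linear autoregressive structure (high-fidelity function equals `rho` times the low-fidelity function plus an independent bias GP) and squared-exponential kernels. The function name `multifidelity_predict`, the fixed hyper-parameters, and the synthetic data are all hypothetical; in the framework described here the hyper-parameters would be learned rather than fixed.

```python
import numpy as np

def k_se(xa, xb, variance, length):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

def multifidelity_predict(x_lo, y_lo, x_hi, y_hi, x_new,
                          rho, lo_hyp, delta_hyp, noise_lo, noise_hi):
    """Posterior mean and variance of the high-fidelity function at x_new."""
    v_l, l_l = lo_hyp        # variance and length of the low-fidelity kernel
    v_d, l_d = delta_hyp     # variance and length of the bias kernel
    # Joint covariance of the stacked low- and high-fidelity observations.
    K_ll = k_se(x_lo, x_lo, v_l, l_l) + noise_lo * np.eye(x_lo.size)
    K_lh = rho * k_se(x_lo, x_hi, v_l, l_l)
    K_hh = (rho ** 2 * k_se(x_hi, x_hi, v_l, l_l)
            + k_se(x_hi, x_hi, v_d, l_d)
            + noise_hi * np.eye(x_hi.size))
    K = np.block([[K_ll, K_lh], [K_lh.T, K_hh]])
    # Cross-covariance between the high-fidelity function at x_new and the data.
    q = np.hstack([
        rho * k_se(x_new, x_lo, v_l, l_l),
        rho ** 2 * k_se(x_new, x_hi, v_l, l_l) + k_se(x_new, x_hi, v_d, l_d),
    ])
    y = np.concatenate([y_lo, y_hi])
    mean = q @ np.linalg.solve(K, y)
    prior_var = rho ** 2 * v_l + v_d
    var = prior_var - np.sum(q * np.linalg.solve(K, q.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

# Hypothetical data: dense, biased low-fidelity samples and four accurate ones.
x_lo = np.linspace(0.0, 1.0, 21)
y_lo = np.sin(2.0 * np.pi * x_lo)
x_hi = x_lo[[0, 7, 14, 20]]
y_hi = 1.5 * np.sin(2.0 * np.pi * x_hi) + 0.3
x_new = np.linspace(0.0, 1.0, 101)   # predictions are not tied to the data grid

mean, var = multifidelity_predict(x_lo, y_lo, x_hi, y_hi, x_new, rho=1.5,
                                  lo_hyp=(1.0, 0.2), delta_hyp=(0.25, 0.3),
                                  noise_lo=1e-4, noise_hi=1e-4)
```

The returned variance is smallest at the high-fidelity locations and grows between them, which is the behavior the uncertainty maps in this work visualize.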

### 3.2 Comparison with Other Methods

In this section, we compare GPR and multifidelity modeling with other existing approaches. We focus on two essential aspects of the presented multifidelity formulation, namely, regression and data assimilation.

#### 3.2.1 Regression

First, we compare GPR with polynomial regression. GPR is a nonparametric regression technique, in which no explicit basis function is assumed, unlike polynomial regression, which is a parametric regression technique and requires the specification of the basis (monomials). To compare GPR and polynomial regression, we consider synthetic data generated by a known function corrupted by noise. The size of the training set is 500. This amounts to a single-fidelity regression, as there is only one data source. In Figure 4, the results of GPR and polynomial regression of different orders (including order 20) are shown. It is clear that the low-order polynomial lacks the required model complexity to accurately discover the underlying function, and the high-order polynomial is too complex. GPR, however, discovers the underlying function accurately (see Rasmussen & Williams, 2006 for a comprehensive discussion on GPR).
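A comparison of this kind can be sketched as follows. The target function, noise level, GP hyper-parameters, and the low polynomial order are assumptions chosen for illustration; only the training-set size of 500 and the order 20 are taken from the text.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def truth(x):
    # Hypothetical smooth target; the function used in the text is not reproduced here.
    return np.sin(2.0 * np.pi * x)

x_train = rng.uniform(0.0, 1.0, 500)
y_train = truth(x_train) + 0.1 * rng.standard_normal(500)
x_test = np.linspace(0.0, 1.0, 200)

def gp_mean(x_tr, y_tr, x_te, variance=1.0, length=0.2, noise_var=0.01):
    """Posterior mean of a zero-mean GP with a squared-exponential kernel."""
    def kernel(a, b):
        return variance * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)
    K = kernel(x_tr, x_tr) + noise_var * np.eye(x_tr.size)
    return kernel(x_te, x_tr) @ np.linalg.solve(K, y_tr)

def rmse(pred):
    """Root-mean-square error against the noise-free target on the test grid."""
    return float(np.sqrt(np.mean((pred - truth(x_test)) ** 2)))

errors = {"gp": rmse(gp_mean(x_train, y_train, x_test))}
for degree in (1, 20):
    # Polynomial.fit rescales the inputs internally, which keeps order 20 stable.
    fit = Polynomial.fit(x_train, y_train, degree)
    errors[f"poly{degree}"] = rmse(fit(x_test))
```

The order-1 fit cannot represent the oscillation at all, while the GP needs no choice of basis. With 500 training points the order-20 fit may also perform well on this particular target, so the sketch illustrates the underfitting side of the comparison more clearly than the overfitting side.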

#### 3.2.2 Data Fusion

Second, we compare the multifidelity model with OI. In OI terminology, low-fidelity and high-fidelity data are referred to as *background* and *observation*, respectively, and the mean of the posterior is referred to as *analysis*. We consider two synthetically generated sources of data: (1) high-fidelity data, consisting of 4 samples of one function, and (2) low-fidelity data, consisting of 21 samples of a second function. The locations of the high-fidelity samples are chosen to be a subset of the low-fidelity sample locations, which makes it unnecessary to specify the interpolator matrix in equation 1. In Figure 5a, the single-fidelity GPR with the four high-fidelity samples is shown. Four high-fidelity samples are not sufficient for an accurate regression, which is reflected in the large uncertainty bands. We assume a square-exponential kernel for the OI background and observation noises, parameterized by a variance intensity and a correlation length. In Figures 5b and 5c, we compare the results of the multifidelity model with OI for two different correlation lengths. The multifidelity model recovers the true function with very good accuracy, which is reflected in the small uncertainty. In contrast, the mean values of OI exhibit a large discrepancy with the truth. This is mainly because in OI, both the low-fidelity (background) and high-fidelity (observation) data are assumed to be *unbiased*. This can be observed by subtracting the truth from both sides of equation 1 and obtaining an equation for the error (for more details, see Daley, 1992). The correlation length affects the extent of the region corrected by the high-fidelity observation. This is clearly demonstrated in Figures 5b and 5c, where the larger correlation length widens the neighborhood corrected by the high-fidelity observation, while the smaller correlation length only improves the value of the assimilated data at the location of the high-fidelity observation. On the other hand, the multifidelity model accounts for biased low-fidelity data and learns the bias from the data, while the high-fidelity data must be unbiased. Moreover, the multifidelity model learns the correlation lengths of the covariance matrices as hyper-parameters.
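The role of the correlation length in OI can be illustrated with the standard analysis update, analysis = background + B Hᵀ (H B Hᵀ + R)⁻¹ (observation − H background). The sketch assumes this is the form the text refers to; the helper name `oi_analysis`, the synthetic background and truth, and all parameter values are hypothetical.

```python
import numpy as np

def oi_analysis(x_grid, background, obs_idx, obs, sigma_b, length, sigma_o):
    """OI update on a 1-D grid with a squared-exponential background covariance."""
    d = x_grid[:, None] - x_grid[None, :]
    B = sigma_b ** 2 * np.exp(-0.5 * (d / length) ** 2)
    # Observations sit on grid points, so H simply selects those rows.
    H = np.zeros((obs_idx.size, x_grid.size))
    H[np.arange(obs_idx.size), obs_idx] = 1.0
    R = sigma_o ** 2 * np.eye(obs_idx.size)
    S = H @ B @ H.T + R
    gain = np.linalg.solve(S, H @ B).T      # equals B H^T S^{-1}; S and B are symmetric
    return background + gain @ (obs - H @ background)

x_grid = np.linspace(0.0, 1.0, 21)
background = np.sin(2.0 * np.pi * x_grid)        # hypothetical biased low-fidelity field
truth = np.sin(2.0 * np.pi * x_grid) + 1.0       # hypothetical truth, offset by a bias
obs_idx = np.array([0, 7, 14, 20])
obs = truth[obs_idx]                             # four accurate observations

analysis_short = oi_analysis(x_grid, background, obs_idx, obs, 1.0, 0.02, 0.1)
analysis_long = oi_analysis(x_grid, background, obs_idx, obs, 1.0, 0.2, 0.1)
```

With the short correlation length, the analysis is corrected only at the observed grid points and reverts to the biased background elsewhere. With the long one, the correction spreads to neighboring points, but the bias itself is never modeled.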

## 4 Demonstration Examples

In this section, we demonstrate the application of the multifidelity framework in estimating SST in Massachusetts and Cape Cod Bays. In particular, we present results of three SST multifidelity models: (1) temporal, (2) spatial, and (3) spatiotemporal. For validation purposes, we split the MWRA measurements into two disjoint sets: a *train* set that is used in model training and a *test* set that is used to validate the multifidelity predictions. In all of these demonstrations, we perform hindcast predictions.

### 4.1 Temporal Multifidelity Model of SST

### 4.2 Spatial Multifidelity Model of SST

For our demonstration, we built spatial multifidelity models for the days of 20 March 2015 and 18 May 2016, when MWRA measurements were available. We note that the MWRA measurements of different stations are not taken simultaneously, but they are all taken within a 24-hr time interval. However, we assume that these measurements are taken at exactly the same time. This assumption introduces a small uncertainty that will be estimated via the noise model in the high-fidelity GP. In Figures 7b and 7c, the mean and the standard deviation of the multifidelity model for 20 March 2015 are shown. To build this multifidelity model, we used the satellite data shown in Figure 7a as the low-fidelity data. We observe that the uncertainty is small near the MWRA stations, and away from the stations, the uncertainty increases (Figure 7c). Comparing the satellite data and the mean of the multifidelity model reveals that the multifidelity model relies on low-fidelity data albeit with a bias correction in regions where no MWRA measurements are available.

To demonstrate the effect of MWRA measurements on the multifidelity model, we built another multifidelity model for 20 March 2015, in which the measurement of the MWRA station F29 was added to the training set. The results for this case are shown in the second row of Figure 7. It is clear that the addition of station F29 expands the low-uncertainty region just north of Cape Cod, and it results in a significant update in the multifidelity prediction, especially since the region near station F29 is neither densely sampled by MWRA nor covered by satellite data on this particular day.

In the last row of Figure 7, the predictions of the multifidelity model for 18 May 2016 are shown. For this day, the satellite measurements provide better coverage in both Massachusetts Bay and Cape Cod Bay. The results show that the satellite and MWRA measurements are correlated and that the multifidelity model corrects the bias uncertainty of the satellite with the inclusion of the MWRA measurements.

### 4.3 Spatiotemporal Multifidelity Model of SST

We considered 14 MWRA stations in our analysis, from which the SST measurements at 11 stations in the span of years 2015–2016 are selected as the training set for high-fidelity points. The remaining three stations are used as a test set for validation of the multifidelity model. The training and test MWRA stations are shown in Figure 8f with square and circle symbols, respectively. Overall, there are high-fidelity points available. The satellite images are used as monthly snapshots, that is, sampled every 30 days, resulting in low-fidelity points. Using the entire satellite set of measurements with daily temporal resolution would result in overfitting of the multifidelity model. Besides the issue of overfitting, for cases with large training data sets, the computational cost of training the multifidelity model becomes increasingly prohibitive, as the computational complexity of training the multifidelity model is , where . This will also be the case, for example, when high-resolution measurements are needed to build a multifidelity model with fine space-time resolution; see Gardner et al. (2005) for such measurements in the region. Recently, a parametric GP has been developed that enables training GPs with big data (Raissi et al., 2019), and it also alleviates the problem of overfitting. The parametric GP enables effective mini-batch training procedures with the computational complexity being reduced to , where is the number of mini-batches, and is the number of training points in each mini-batch. Given that the MWRA and satellite data are provided at monthly intervals, the corresponding multifidelity model can resolve up to monthly time scales.
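To give a rough sense of why the daily satellite record is thinned, the sketch below subsamples two years of daily snapshots every 30 days and compares training costs. It assumes exact GP training dominated by a dense Cholesky factorization, which scales cubically with the number of training points. The per-snapshot pixel count is a hypothetical placeholder, not a figure from this study.

```python
import numpy as np

# 2015 has 365 days and 2016 (a leap year) has 366, giving 731 daily snapshots.
daily_days = np.arange(365 + 366)
monthly_days = daily_days[::30]          # keep one snapshot every 30 days

n_pixels = 200                           # hypothetical cloud-free pixels per snapshot
n_daily = daily_days.size * n_pixels
n_monthly = monthly_days.size * n_pixels

# Cubic scaling of a dense Cholesky factorization in the number of points.
cost_ratio = (n_daily / n_monthly) ** 3
```

Under these assumptions the 30-day thinning keeps 25 snapshots and reduces the cubic training cost by a factor of roughly 25,000, independent of the assumed pixel count.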

In Figures 8a–8c, the multifidelity model is validated with the test data set of three MWRA stations (F13, F29, and N04) in the span of 2015–2016. In all three cases, the MWRA measurements are in good agreement with the mean of the multifidelity model. These measurements lie mostly within the uncertainty bands, with a few cases just outside them. Among these three stations, F29 shows the largest overall uncertainty, as it is not surrounded by MWRA stations that are used in the training of the model (see Figure 8f for the location of F29). On the other hand, N04 has the lowest overall uncertainty, and the multifidelity mean yields the best prediction among the three selected stations. In Figures 8d and 8e, the mean of the multifidelity model and its uncertainty are shown for stations F02 and N07. The measurements of both of these stations are used in the training of the model, and as a result, the uncertainty becomes very small near the time of the measurements. Note that in the absence of the noise model in the high-fidelity GP, the multifidelity mean would pass exactly through the MWRA measurements, that is, with zero uncertainty. The noise term estimates the noise in the MWRA measurements and the unresolved spatiotemporal scales in the multifidelity model.
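The effect of the noise term can be checked with a small single-fidelity sketch: with a negligible noise variance the posterior mean interpolates the training data and the variance collapses there, while a finite noise variance leaves residual uncertainty. The kernel settings and the data values are illustrative assumptions.

```python
import numpy as np

def gp_posterior(x_tr, y_tr, x_te, variance, length, noise_var):
    """Posterior mean and latent-function variance of a zero-mean GP."""
    def kernel(a, b):
        return variance * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)
    K = kernel(x_tr, x_tr) + noise_var * np.eye(x_tr.size)
    K_star = kernel(x_te, x_tr)
    mean = K_star @ np.linalg.solve(K, y_tr)
    var = variance - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
    return mean, var

x_tr = np.linspace(0.0, 1.0, 6)
y_tr = np.array([-1.0, 0.5, 1.2, 0.3, -0.7, -1.1])   # hypothetical SST anomalies

# Evaluate at the training locations, without and with a noise model.
mean_exact, var_exact = gp_posterior(x_tr, y_tr, x_tr, 1.0, 0.2, 1e-10)
mean_noisy, var_noisy = gp_posterior(x_tr, y_tr, x_tr, 1.0, 0.2, 0.25)
```

In the first case the mean reproduces `y_tr` to within numerical precision and the variance is essentially zero. In the second case the mean is pulled away from the data and the variance stays finite, absorbing measurement noise and unresolved variability.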

In Figure 9a, we perform an independent validation of the spatiotemporal multifidelity model by comparing the hindcast prediction of our model with the in situ measurements of buoy A01 for the years 2015 and 2016. Buoy A01 is part of the Northeast Coastal Ocean Observation System (NERACOOS), and it is located at N and W, as shown in Figure 9b with a square symbol. The data for this buoy can be downloaded from http://www.neracoos.org. The buoy A01 measurements are taken hourly, but the satellite and MWRA training data of the spatiotemporal multifidelity model have a monthly temporal resolution. As a result, the higher-frequency oscillations are not resolved in the multifidelity model, and they are accounted for by the uncertainty, as can be seen in Figure 9a. Overall, a good agreement between the mean of the multifidelity model and the buoy measurements is observed.

In Figure 10, we further illustrate the learning of the large-scale seasonal dynamics via the MWRA measurements in the spatiotemporal multifidelity model. In this figure, we show snapshots of the mean and the standard deviation taken every 10 days from 1 January to 21 May 2015 and the approximate days of MWRA measurements (red circles). We observe that the uncertainty is consistently lower near the MWRA measurements. However, during the winter, the uncertainties are larger than in late April and May. This is due to the scarcity of satellite data and the lack of MWRA measurements during the winter months. This is further confirmed by comparing the measurements of buoy A01 with the multifidelity predictions. In Figure 10, we show the prediction of the multifidelity model and its uncertainty at the location of buoy A01 for two days. The buoy measurements are also shown for comparison. On 1 January 2015, the uncertainty is large due to the lack of data, while on 1 April 2015, the uncertainty is lower. In both cases, the buoy measurements lie within the uncertainty band.
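A simple way to quantify statements such as "the measurements lie within the uncertainty band" is the fraction of independent observations that fall inside the predicted band. The sketch below assumes a band of two standard deviations around the mean, which is an assumption rather than the definition used in the figures. The function name and the numbers are hypothetical.

```python
import numpy as np

def coverage_fraction(obs, mean, std, n_std=2.0):
    """Fraction of observations within mean +/- n_std * std."""
    inside = np.abs(obs - mean) <= n_std * std
    return float(inside.mean())

# Hypothetical buoy temperatures and model predictions (degrees C).
buoy = np.array([4.1, 5.0, 9.5])
pred_mean = np.array([4.0, 5.5, 8.0])
pred_std = np.array([0.2, 0.3, 0.5])

fraction = coverage_fraction(buoy, pred_mean, pred_std)
```

Here two of the three observations fall inside the band, so `fraction` is 2/3. For a well-calibrated Gaussian prediction one would expect roughly 95% coverage at two standard deviations.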

## 5 Summary and Future Work

The presented framework goes beyond traditional predictions with line error bars to continuous space-time uncertainty maps by utilizing and developing modern machine learning techniques that enable fusion of data from all available sources with varying degrees of fidelity. In Figure 11, we compare the single-fidelity GP regression model with the multifidelity model for 20 March 2015 (top row) and 18 May 2015 (bottom row). In the left column, the satellite data on the corresponding days are shown, and the middle column shows the GP regression of the satellite measurements, that is, the single-fidelity model. The right column shows the mean values of the multifidelity model, where we have used MWRA measurements as the source of high-fidelity data. The comparison of the mean values of the single-fidelity and multifidelity models clearly shows that these two temperature maps are correlated. However, the multifidelity model corrects the single-fidelity model by up to 3 °C on 20 March 2015 and by roughly 1–2 °C on 18 May 2015.

In future work, we plan to predict the quantities directly related to ocean acidification, that is, salinity, temperature, dissolved inorganic carbon, pH, and total alkalinity, *and* their uncertainty, using a flexible Bayesian data fusion framework that *leaves no data behind*, including 3D physics-based models (see, e.g., Xue et al., 2014). In particular, we will study the effect of the kernel on hindcast and forecast predictions. Kernels can be chosen to encode various structures in the data, including smoothness, stationarity, and seasonality. In Figure 12, we show the performance of square exponential and periodic kernels.
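The two kernel families mentioned here can be written compactly. The sketch below uses the common squared-exponential form and the standard periodic form exp(−2 sin²(π τ / p) / ℓ²). The parameter values, including a 365-day period for seasonality and a 30-day correlation length, are illustrative assumptions.

```python
import numpy as np

def sq_exp_kernel(t1, t2, variance, length):
    """Squared-exponential kernel: correlation decays with separation."""
    tau = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (tau / length) ** 2)

def periodic_kernel(t1, t2, variance, length, period):
    """Periodic kernel: points one period apart are perfectly correlated."""
    tau = t1[:, None] - t2[None, :]
    return variance * np.exp(-2.0 * np.sin(np.pi * tau / period) ** 2 / length ** 2)

t_now = np.array([0.0])
t_next_year = np.array([365.0])          # time in days

k_se = sq_exp_kernel(t_now, t_next_year, 1.0, 30.0)[0, 0]
k_per = periodic_kernel(t_now, t_next_year, 1.0, 1.0, 365.0)[0, 0]
```

With these settings the squared-exponential kernel treats dates a year apart as essentially uncorrelated, whereas the periodic kernel ties them together, which is why the choice of kernel matters when forecasting a seasonal signal beyond the training window.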

Ultimately, we intend to build 3D volumetric maps of these quantities in the Massachusetts and Cape Cod Bays, including Boston Harbor and the Stellwagen Bank. This will enhance our understanding of coastal ocean acidification and our ability to focus our monitoring efforts on places of high uncertainty and to provide management strategies in regions with relatively low aragonite saturation means that could negatively affect valuable resources.

## Acknowledgments

The authors thank Dr. Raissi for useful comments and stimulating discussions. This research has been supported primarily by a grant from NOAA (NA18OAR4170105) and partial support from Office of Naval Research, grant number N00014-16-1-2956. The Moderate-resolution Imaging Spectroradiometer (MODIS) SST data were obtained from the NASA EOSDIS Physical Oceanography Distributed Active Archive Center (PO.DAAC) at the Jet Propulsion Laboratory, Pasadena, CA (https://doi.org/10.5067/MODST-1D4N4). The MWRA measurements may be downloaded from http://www.mwra.com/harbor/html/wq_data.htm.