Volume 7, Issue 2 e2019EA000954
Research Article
Open Access

A Multifidelity Framework and Uncertainty Quantification for Sea Surface Temperature in the Massachusetts and Cape Cod Bays

H. Babaee

Corresponding Author

H. Babaee

Department of Mechanical Engineering and Materials Science, University of Pittsburgh, Pittsburg, PA, USA

MIT Sea Grant, Massachusetts Institute of Technology, Cambridge, MA, USA

Correspondence to: H. Babaee,

[email protected]

Search for more papers by this author
C. Bastidas

C. Bastidas

MIT Sea Grant, Massachusetts Institute of Technology, Cambridge, MA, USA

Search for more papers by this author
M. DeFilippo

M. DeFilippo

MIT Sea Grant, Massachusetts Institute of Technology, Cambridge, MA, USA

Search for more papers by this author
C. Chryssostomidis

C. Chryssostomidis

MIT Sea Grant, Massachusetts Institute of Technology, Cambridge, MA, USA

Search for more papers by this author
G. E. Karniadakis

G. E. Karniadakis

Division of Applied Mathematics, Brown University, Providence, RI, USA

Search for more papers by this author
First published: 03 January 2020
Citations: 9


We present a multifidelity framework to analyze and hindcast predictions of sea surface temperature (SST) in the Massachusetts and Cape Cod Bays, which is a critical area for its ecological significance, sustaining fisheries and the blue economy of the region. Currently, there is a lack of accurate and continuous SST prediction for this region due to the high cost of collecting the samples (e.g., cost of buoys, maintenance, severe weather). In this work, we use SST data from satellite images and in situ measurements collected by the Massachusetts Water Resources Authority to develop multifidelity forecasting models. This multifidelity framework is based on autoregressive Gaussian process schemes that systematically exploit all correlations between data from multiple heterogeneous spatiotemporal sources with various degrees of fidelity. This enables us to obtain implicitly their functional relationships and, at the same time, quantify the uncertainty of the data-driven predictions. Specifically, in the current work, we develop and validate progressively more complex models, including temporal, spatial, and spatiotemporal multifidelity hindcast predictions of SST in the Massachusetts and Cape Cod Bays. Together with these predictions, we present for the first time uncertainty maps for the region.

Key Points

  • A data-driven approach for estimating sea surface temperature in the Massachusetts and Cape Cod Bays is presented
  • A new assimilation technique based on autoregressive Gaussian processes to combine satellite and in situ measurements is presented
  • The multifidelity model produces continuous sea surface temperature and uncertainty maps for the Massachusetts and Cape Cod Bays

1 Introduction

In the Gulf of Maine, Massachusetts and Cape Cod Bays encompass an area of highly productive fisheries, aquaculture facilities, and ecological resources that provide services to coastal communities. In this area, the Stellwagen Bank National Marine Sanctuary is of particular relevance by supporting strong local economies through commercial and recreational fisheries and providing more than 80% of the whale watching tourism in New England (https://sanctuaries.noaa.gov/science/socioeconomic/factsheets/stellwagenbank.html). The future of these benefits is uncertain as sea surface temperatures in the Gulf of Maine are increasing at one of the highest rates in the world's oceans, with clear consequences in reducing Atlantic cod recruitment that contributed to overfished stocks (Pershing et al., 2015). Similarly, in New England, the lobster industry has migrated almost entirely into Maine, being no longer supported in southern coastal states as in the past (Steneck et al., 2013), while about 40 other fish and invertebrate species are considered as highly vulnerable to climate change and its variability (Hare et al., 2016). In the summer of 2018, sea surface temperature (SST) anomalies were 1 to 3 ° C above the 30-year average in the Gulf of Maine. These anomalies and ocean heat waves are becoming common in the Gulf of Maine. On 8 August 2018, the SSTs measured were the second warmest ever observed there (https://www.drought.gov/drought/sites/drought.gov.drought/files/media/reports/regional_outlooks/GOM%20Summer%202018.pdf). In view of these changes and the likely scenario they would increase in the near future, more accurate and timely predictions of sea temperature at regional and local scales might better help resource management and economic planning.

The predictability of environmental drivers is a key step in the process of linking climate and biological processes for better management of marine resources. Advances in climate prediction facilitate this management, with a particular focus on dynamical seasonal to decadal prediction systems derived from Global Climate Models (Tommasi et al., 2017). Considerable progress has been made with this modeling approach, but some limitations exist in coastal predictions at regional scales, particularly in the U.S. east coast (Stock et al., 2015). In addition, in situ data are relatively scarce in this region from buoys, but the Massachusetts Water Resources Authority (MWRA) sustains a comprehensive monitoring program for a wide range of bay conditions (Werme et al., 2017). The MWRA measurements provide an accurate but coarse space-time measurements of various quantities of interest in the Massachusetts and Cape Cod Bays, including seawater temperature. Finally, satellite measurements have better temporal resolution but they are gappy due to cloud coverage and require calibration. Therefore, by using less conventional approaches as the one presented in this study, we aim to advance the predictability of SST in the region by using all available sources of data of variable fidelity.

Various data assimilation methods have been used to combine satellite data and in situ measurements. Among these methods, optimal interpolation (OI) is the most widely used data assimilation technique for ocean and meteorological application (see e.g., Daley, 1992). Most weather centers used OI for weather forecasts during the 1970s and 1980s. OI is also widely used for combining satellite data with in situ measurements (see e.g., ocean group for high resolution SST products; Dash et al., 2012). OI combines the background measurements urn:x-wiley:ess2:media:ess2458:ess2458-math-0001 and observations urn:x-wiley:ess2:media:ess2458:ess2458-math-0002 via the analysis equation given by:
where urn:x-wiley:ess2:media:ess2458:ess2458-math-0004 is the observation operator that interpolates from background field to observation space, and urn:x-wiley:ess2:media:ess2458:ess2458-math-0005 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0006 are background and observation covariances. The interpolator urn:x-wiley:ess2:media:ess2458:ess2458-math-0007 is often chosen a priori, and the effect of its misspecification must be accounted for (Liu & Rabier, 2006). For sparse background measurements and observations available at different scales, determining urn:x-wiley:ess2:media:ess2458:ess2458-math-0008, urn:x-wiley:ess2:media:ess2458:ess2458-math-0009, and urn:x-wiley:ess2:media:ess2458:ess2458-math-0010 is nontrivial, and this has been the subject of intense research (see e.g., Oke & Sakov, 2008 and references therein). In addition to OI, variational methods are also used for the purpose of data assimilation. In these methods, a cost function is defined that seeks to find a smooth field while penalizing the distance between the field and the observations. In Li et al. (2015), the 4D variational data assimilation algorithm was used to combine simulation predictions with in situ measurements in Gulf of Maine with very good results. Empirical orthogonal functions have also been used for field reconstructions from noisy measurements (Yang et al., 2010) and SST data assimilation (Smith et al., 1996). In Smith et al. (1996), the empirical orthogonal function modes were computed from OI-analyzed SST fields, and in situ measurements were used to determine the time dependence of each mode. More recently, other techniques have been proposed to blend satellite data with in situ measurements and to fill in the gaps in sattelite-derived SST products, including hierarchical Bayesian algorithms (Zhu et al., 2017, 2019), Bayesian maximum entropy (Li et al., 2013), and multiresolution wavelet-based algorithms (Chin et al., 2017).

Gaussian processes (GP) are nonparametric Bayesian machine learning techniques for learning functions from data (Rasmussen & Williams, 2006). GP provides a flexible prior distribution over functions, enjoys analytical tractability, and has a probabilistic workflow that yields robust posterior estimates and handles uncertainty in a principled manner. Before conditioning on the training data, a GP model is completely specified via its mean function and covariance function, which is called kernel and determines how the GP model extrapolates and generalizes to new data. There are many choices for kernel. For example, linear regression, splines, Kalman filters, and OI are all examples of GPs with particular kernels (Reece & Roberts, 2010). The kernel of a GP is often parameterized and the hyper-parameters are learned from the observed data, for example, using the maximum likelihood. This allows for further fine-tuning the kernel to the training data. The training also determines hyper-parameters associated with the uncertainty model. Training the kernel and the uncertainty model replaces the guessing work needed in OI, in which various matrices and noise models must be determined a priori.

The novelty of our approach is based on the use of a multifidelity framework for obtaining SST, exploiting correlations in space-time for sea surface, and estimating both mean values as well as their uncertainty as surface maps in the Massachusetts and Cape Code Bays by combining in situ measurements with satellite data. This framework allows us to replace the usual line error bars by space-time error uncertainty maps. Our ultimate goal is to develop a regional model that could even monitor acidification, similar to the three regional aragonite models for the U.S. West Coast (Alin et al., 2018; Davis et al., 2018; Juranek et al., 2009), but continuously updated and endowed with uncertainty quantification and forecasting capabilities that will drive future measurements.

2 Data Sources

For the estimation of SST in Massachusetts and Cape Cod Bays, we consider two sources of data: (1) High-Fidelity: The MWRA measurements. The MWRA routinely measures SST and various other quantities at fourteen stations in Massachusetts and Cape Cod Bays (Libby et al., 2017). These stations are shown in Figure 1 (right). The MWRA measurements are taken at various depths and on a monthly basis during nine surveys per year with the exception of winter months. In this manuscript, we use MWRA measurements taken at 1 m below the surface as an estimate for seawater surface temperature. (2) Low Fidelity: The MODerate-resolution Imaging Spectroradiometer (MODIS) Terra on board NASA satellite with the spatial resolution of 4 urn:x-wiley:ess2:media:ess2458:ess2458-math-0013 4 km (Werdell et al., 2013). The satellite images are processed at level 3 with daily and nightly temporal resolutions and are provided through the NASA Earth Science Data System (ESDS) project. It can be accessed via Physical Oceanography Distributed Active Archive Center (PO.DAAC). Throughout this manuscript, we will use the MODIS Terra, thermal-IR SST level 3, 4 km, daily. We consider the satellite data in the region that spans the longitude of urn:x-wiley:ess2:media:ess2458:ess2458-math-0014 W to urn:x-wiley:ess2:media:ess2458:ess2458-math-0015 W and the latitude of urn:x-wiley:ess2:media:ess2458:ess2458-math-0016 N to urn:x-wiley:ess2:media:ess2458:ess2458-math-0017 N. The satellite images are gappy due to cloud coverage as shown in Figure 1 (left). The observation counts of MODIS satellite for years 2015 and 2016 are shown in Figure 2. It is clear that during summer, there are significantly more satellite data available compared to late fall, entire winter, and early spring. Although the MODIS satellite data used in this study are level 3 SST products, that is, they are blended with in situ measurements when compared against MWRA measurements, the MODIS satellite measurements show up to urn:x-wiley:ess2:media:ess2458:ess2458-math-0018C discrepancy. We expected that some of this disagreement resulted from comparing surface with subsurface measurements (MWRA data from 1 m deep), and some of it may come from the coarse satellite resolution 4 urn:x-wiley:ess2:media:ess2458:ess2458-math-0019 4 km. However, the correlation plot between the MWRA measurements and MODIS satellite measurements in the year 2015 and 2016 indicated that this bias is not consistent along the temperature interval (Figure 3). Therefore, we use the MODIS satellite measurements as a low-fidelity data source. Overall, the satellite data provide regional-scale patterns while the MWRA measurements are accurate but scarce in situ measurements.

Details are in the caption following the image
Multifidelity data sources: Left: The MODerate-resolution Imaging Spectroradiometer (MODIS) Terra on board NASA satellite with the spatial resolution of 0.041 degrees (latitude) urn:x-wiley:ess2:media:ess2458:ess2458-math-0011 0.041 degrees (longitude), which is equivalent to the resolution of 4 urn:x-wiley:ess2:media:ess2458:ess2458-math-0012 4 km (Werdell et al., 2013). Right: Massachusetts Water Resource Authority (MWRA) in situ measurements shown in red. The NERACOOS buoy A01 is shown in blue, and it is used only for validation purposes.
Details are in the caption following the image
MODerate-resolution Imaging Spectroradiometer satellite observation counts for year of 2015 (top) and 2016 (bottom). Day 1 in each plot corresponds to 1 January of the respective year.
Details are in the caption following the image
Correlation between temperature derived from MODerate-resolution Imaging Spectroradiometer satellite and in situ measurements at 14 Massachusetts Water Resource Authority stations during the years of 2015 and 2016.

3 Multifidelity Modeling

In this section, we present the details of the multifidelity modeling based on GP regression (GPR; Rasmussen & Williams, 2006), which is a nonparametric Bayesian machine learning technique that enjoys analytic tractability and provides a fully probabilistic framework for approximating functions and treating approximation uncertainties in a principled manner.

3.1 Multifidelity Scheme

For the sake of brevity we present the linear formulation for two fidelity levels (Babaee et al., 2016; Kennedy & O'Hagan, 2000; Prempraneerach et al., 2017); however, the formulation can be extended to urn:x-wiley:ess2:media:ess2458:ess2458-math-0020 levels of fidelity in a straightforward manner (see e.g., Forrester et al., 2007) and to nonlinear autoregressive schemes using deep GP (see Perdikaris et al., 2017). To this end, we consider the low-fidelity (L) and high-fidelity (H) observations to be of the form
where urn:x-wiley:ess2:media:ess2458:ess2458-math-0022 represents the urn:x-wiley:ess2:media:ess2458:ess2458-math-0023-dimensional input space. For temporal SST multifidelity models urn:x-wiley:ess2:media:ess2458:ess2458-math-0024, where urn:x-wiley:ess2:media:ess2458:ess2458-math-0025 represents time, for spatial models urn:x-wiley:ess2:media:ess2458:ess2458-math-0026 with urn:x-wiley:ess2:media:ess2458:ess2458-math-0027 representing the pair of longitude and latitude, and finally, for spatiotemporal models urn:x-wiley:ess2:media:ess2458:ess2458-math-0028 with urn:x-wiley:ess2:media:ess2458:ess2458-math-0029 representing the triplet of longitude, latitude and time. In equation  2, urn:x-wiley:ess2:media:ess2458:ess2458-math-0030 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0031 are assumed to be two independent random fields, each represented by a GP in the form of
and urn:x-wiley:ess2:media:ess2458:ess2458-math-0033 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0034 represent the unbiased errors in the low-fidelity and high-fidelity observations, respectively. These errors may be caused by unresolved scales due to undersampling or measurement noise corruption, and they are modeled as a centered Gaussian noise urn:x-wiley:ess2:media:ess2458:ess2458-math-0035 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0036. The primary source of noise in our high-fidelity/low-fidelity data, however, is the effect of unresolved scales, because of coarse spatial and temporal resolution of the training data. These subgrid scales are commonly modeled via isotropic Gaussian noise in ocean/atmospheric problems (see e.g., Daley, 1992, Chapter 4). This does not mean, however, that these errors are completely homogeneous. As reasoned in Daley (1992), the inhomogeneities in the noise are often on a very large scale, while the noise correlation lengths are much smaller. Therefore, when the scale of the inhomogeneity is much larger than the scale of the correlation, the assumption of homogeneous Gaussian for the noise is not a major source of the modeling error.
The multifidelity scheme given by equation 2 tolerates and learns the systematic bias in the low-fidelity measurements. This bias is modeled by the scaling factor urn:x-wiley:ess2:media:ess2458:ess2458-math-0037 and the bias correction of urn:x-wiley:ess2:media:ess2458:ess2458-math-0038. In cases where the high-fidelity data have systematic bias, for example, due to sensor malfunction, the uncertainty has to be modeled explicitly. We use the square-exponential kernel in most cases presented in this paper (for hindcast predictions) as specified by
where urn:x-wiley:ess2:media:ess2458:ess2458-math-0040 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0041 are hyper-parameters. Therefore, the kernel at each fidelity level contains urn:x-wiley:ess2:media:ess2458:ess2458-math-0042 hyper parameters, which in addition to the hyper-parameters introduced by urn:x-wiley:ess2:media:ess2458:ess2458-math-0043, urn:x-wiley:ess2:media:ess2458:ess2458-math-0044, and urn:x-wiley:ess2:media:ess2458:ess2458-math-0045 results in urn:x-wiley:ess2:media:ess2458:ess2458-math-0046 hyper-parameters in total.

3.1.1 Training

The hyper-parameters are learned from the low-fidelity and high-fidelity observations. The low-fidelity observations are denoted by urn:x-wiley:ess2:media:ess2458:ess2458-math-0047, where urn:x-wiley:ess2:media:ess2458:ess2458-math-0048 is a urn:x-wiley:ess2:media:ess2458:ess2458-math-0049 input points and urn:x-wiley:ess2:media:ess2458:ess2458-math-0050 is a urn:x-wiley:ess2:media:ess2458:ess2458-math-0051, representing SST satellite measurements, and urn:x-wiley:ess2:media:ess2458:ess2458-math-0052 is the number of low-fidelity measurements. Similarly, the high-fidelity observations are denoted by urn:x-wiley:ess2:media:ess2458:ess2458-math-0053, where urn:x-wiley:ess2:media:ess2458:ess2458-math-0054 with the size of urn:x-wiley:ess2:media:ess2458:ess2458-math-0055 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0056 with the size of urn:x-wiley:ess2:media:ess2458:ess2458-math-0057 are high-fidelity input points and (MWRA) measurements, respectively, and urn:x-wiley:ess2:media:ess2458:ess2458-math-0058 is the number of high-fidelity measurements.

Since urn:x-wiley:ess2:media:ess2458:ess2458-math-0059 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0060 are independent GP, using the multifidelity scheme given by equation 2, it is straightforward to observe that the low-fidelity and high-fidelity observations have a joint Gaussian distribution given by urn:x-wiley:ess2:media:ess2458:ess2458-math-0061, where
where urn:x-wiley:ess2:media:ess2458:ess2458-math-0063 is an urn:x-wiley:ess2:media:ess2458:ess2458-math-0064 vector that contains the low-fidelity and high-fidelity observations at points urn:x-wiley:ess2:media:ess2458:ess2458-math-0065 with the size of urn:x-wiley:ess2:media:ess2458:ess2458-math-0066, urn:x-wiley:ess2:media:ess2458:ess2458-math-0067 is the number of high-fidelity and low-fidelity data points, that is, urn:x-wiley:ess2:media:ess2458:ess2458-math-0068, and
where urn:x-wiley:ess2:media:ess2458:ess2458-math-0070 is an urn:x-wiley:ess2:media:ess2458:ess2458-math-0071 matrix that represents correlations between the low-fidelity points, urn:x-wiley:ess2:media:ess2458:ess2458-math-0072 is an urn:x-wiley:ess2:media:ess2458:ess2458-math-0073 matrix, representing the correlations between the low-fidelity points and high-fidelity points, urn:x-wiley:ess2:media:ess2458:ess2458-math-0074 is an urn:x-wiley:ess2:media:ess2458:ess2458-math-0075 matrix that represents the correlations between high-fidelity points, and urn:x-wiley:ess2:media:ess2458:ess2458-math-0076. The negative log-likelihood urn:x-wiley:ess2:media:ess2458:ess2458-math-0077 is expressed by
The hyper-parameters urn:x-wiley:ess2:media:ess2458:ess2458-math-0079 are obtained by minimizing the above function (i.e., maximizing the likelihood). The minimization problem is solved using the trust-region method. The kernel matrix urn:x-wiley:ess2:media:ess2458:ess2458-math-0080 has a size of urn:x-wiley:ess2:media:ess2458:ess2458-math-0081 and is a full matrix; therefore, it has a memory requirement of urn:x-wiley:ess2:media:ess2458:ess2458-math-0082 and its inversion has a computational complexity of urn:x-wiley:ess2:media:ess2458:ess2458-math-0083. Although we used this approach, for large data sets, this is prohibitive. To address this issue, a recursive autoregressive scheme has been suggested (Le Gratiet & Le Gratiet, 2014), in which the data sources of the two levels of fidelity can be decoupled and equivalently reformulated in a recursive two independent Gaussian regression problems. This requires that the high-fidelity and low-fidelity data sets are nested.

3.1.2 Prediction

Once the hyper-parameters are found, the probabilistic model can be used for the prediction of the mean values at new observed locations and times along with the associated uncertainty. These predictions can be computed for any arbitrary point, and they are not tied to the location of the training data, for example, to satellite grids. In that sense, the predictions of the multifidelity model are continuous. The resulting posterior distribution at the high-fidelity level is taken as the prediction of the multifidelity model. The probabilistic prediction at arbitrary points urn:x-wiley:ess2:media:ess2458:ess2458-math-0084 correspond to conditioning the joint Gaussian prior distribution on the low-fidelity/high-fidelity observations ( urn:x-wiley:ess2:media:ess2458:ess2458-math-0085). To this end, we observe that the joint probability density function (PDF) of the new prediction points and the training points is also a Gaussian given by
where urn:x-wiley:ess2:media:ess2458:ess2458-math-0087. The above Gaussian relationship between the prediction points and high-fidelity/low-fidelity data can be used to find the conditional distribution to make predictions:
where urn:x-wiley:ess2:media:ess2458:ess2458-math-0089 is the posterior mean (i.e., predictions), the variance of the prediction is obtained by urn:x-wiley:ess2:media:ess2458:ess2458-math-0090, and urn:x-wiley:ess2:media:ess2458:ess2458-math-0091 represents the uncertainty of the multifidelity predictions.

3.2 Comparison with Other Methods

In this section, we compare GPR and multifidelity modeling with other existing approaches. We focus on two essential aspects of the presented multifidelity formulation, namely, regression and data assimilation.

3.2.1 Regression

First, we compare GPR with polynomial regression. GPR is a nonparametric regression technique, in which no explicit basis function is assumed, unlike the polynomial regression, which is a parametric regression technique and requires the specification of the basis (monomials). To compare GPR and polynomial regression, we consider synthetic data generated by function urn:x-wiley:ess2:media:ess2458:ess2458-math-0092 corrupted by noise. The size of the training set is 500. This amounts to a single-fidelity regression as there is only one data source. In Figure 4, the results of GPR and polynomial regression of different orders ( urn:x-wiley:ess2:media:ess2458:ess2458-math-0093 and 20) are shown. It is clear that urn:x-wiley:ess2:media:ess2458:ess2458-math-0094 lacks the required model complexity to accurately discover urn:x-wiley:ess2:media:ess2458:ess2458-math-0095 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0096 is too complex. GPR, however, discovers urn:x-wiley:ess2:media:ess2458:ess2458-math-0097 accurately (see Rasmussen & Williams, 2006 for a comprehensive discussion on GPR).

Details are in the caption following the image
Comparison of Gaussian processes regression versus polynomial regression for synthetic data. The observations are generated by: urn:x-wiley:ess2:media:ess2458:ess2458-math-0098. GP = Gaussian processes.

3.2.2 Data Fusion

Second, we compare the multifidelity model with OI. In OI terminology, low-fidelity and high-fidelity data are referred to as background and observation, respectively, and the mean of the posterior is referred to as analysis. We consider two sources of data that are synthetically generated: (1) High-fidelity by 4 samples of function urn:x-wiley:ess2:media:ess2458:ess2458-math-0099 and (2) low-fidelity by 21 samples of function urn:x-wiley:ess2:media:ess2458:ess2458-math-0100. The location of high-fidelity samples are chosen to be a subset of low-fidelity samples (same urn:x-wiley:ess2:media:ess2458:ess2458-math-0101 values), which makes it unnecessary to specify the interpolator matrix urn:x-wiley:ess2:media:ess2458:ess2458-math-0102 in equation 1. In Figure 5a, the single-fidelity GPR with the four high-fidelity samples are shown. Obviously, four samples of urn:x-wiley:ess2:media:ess2458:ess2458-math-0103 are not sufficient for an accurate regression, which is reflected in large uncertainty bands. We assume a square-exponential kernel for OI background and observation noises: urn:x-wiley:ess2:media:ess2458:ess2458-math-0104, where urn:x-wiley:ess2:media:ess2458:ess2458-math-0105 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0106 are the variance intensity and the correlation length, respectively. Therefore, urn:x-wiley:ess2:media:ess2458:ess2458-math-0107 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0108. We consider urn:x-wiley:ess2:media:ess2458:ess2458-math-0109. In Figures 5b and  5c, we compare the results of multifidelity model with OI with two different correlation lengths of urn:x-wiley:ess2:media:ess2458:ess2458-math-0110 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0111. The multifidelity model recovers the true function with very good accuracy, which is reflected in small uncertainty. It is clear that the mean values of OI exhibit large discrepancy with the truth. This is mainly because in OI, both low-fidelity (background) and high-fidelity (observation) are assumed to be unbiased. This can be observed by subtracting the truth from both sides of equation 1 and obtain an equation for the error (for more details, see Daley, 1992). The correlation length affects the length of the region corrected by the high-fidelity observation. This is clearly demonstrated in Figures 5b and  5c, where urn:x-wiley:ess2:media:ess2458:ess2458-math-0112 widens the neighborhood corrected by the high-fidelity observation, while urn:x-wiley:ess2:media:ess2458:ess2458-math-0113 only improves the value of the assimilated data at the location of high-fidelity observation. On the other hand, the multifidelity model accounts for biased low-fidelity data and learns the bias urn:x-wiley:ess2:media:ess2458:ess2458-math-0114 from the data, while the high-fidelity must be unbiased. Moreover, the multifidelity model learns the correlation length of the covariance matrices as hyper-parameters.

Details are in the caption following the image
Comparison of the multifidelity model and optimal interpolation for synthetic data. The high- and low-fidelity observations are generated by urn:x-wiley:ess2:media:ess2458:ess2458-math-0115 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0116: (a) Single-fidelity regression with Gaussian processes (GP); (b) Multifidelity GP and optimal interpolation (OI) with correlation length urn:x-wiley:ess2:media:ess2458:ess2458-math-0117; (c) Multifidelity GP and OI with correlation length urn:x-wiley:ess2:media:ess2458:ess2458-math-0118.

4 Demonstration Examples

In this section, we will demonstrate the application of the multifidelity framework in estimating SST in Massachusetts and Cape Cod Bays. In particular, we present results of three SST multifidelity models: (1) temporal, (2) spatial, and (3) spatiotemporal. For validation purposes, we split the MWRA measurements to two disjoint sets: train set that is used in model training and the test set that is used to validate the multifidelity predictions. In all of these demonstrations, we perform hindcast predictions.

4.1 Temporal Multifidelity Model of SST

With the proposed framework, we built a temporal multifidelity model for three MWRA stations: F02, F23, and N18 in the span of 2 years (2015–2016; see Figure 6d for the location of these stations in the Massachusetts Bay and Cape Cod Bay). Temporal multifidelity models are one-dimensional, and therefore, urn:x-wiley:ess2:media:ess2458:ess2458-math-0119 with urn:x-wiley:ess2:media:ess2458:ess2458-math-0120 representing time urn:x-wiley:ess2:media:ess2458:ess2458-math-0121, and the low-fidelity kernel takes the form of:
where urn:x-wiley:ess2:media:ess2458:ess2458-math-0123 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0124 are the hyper-parameters. The kernel for urn:x-wiley:ess2:media:ess2458:ess2458-math-0125 is defined similarly with the hyper-parameters urn:x-wiley:ess2:media:ess2458:ess2458-math-0126 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0127. With the addition of urn:x-wiley:ess2:media:ess2458:ess2458-math-0128, urn:x-wiley:ess2:media:ess2458:ess2458-math-0129 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0130, the temporal multifidelity model has seven hyper-parameters, which are learned from the satellite data and MWRA measurements. During these 2 years, 18 in situ measurements were taken at each station. In our models, we randomly chose urn:x-wiley:ess2:media:ess2458:ess2458-math-0131 of those MWRA measurements as our high-fidelity training data and used the 7 remaining for validation purposes (stars and circles, respectively, as shown in Figures 6a–6c). For the low-fidelity data, we linearly interpolated the satellite measurement onto the location of the stations. Overall, we had urn:x-wiley:ess2:media:ess2458:ess2458-math-0132, urn:x-wiley:ess2:media:ess2458:ess2458-math-0133, and urn:x-wiley:ess2:media:ess2458:ess2458-math-0134 low-fidelity points for stations F02, F23, and N18, respectively. Due to cloud coverage, we did not have satellite data for every day for all stations. In particular, station F02 has a low number of satellite measurements. In Figures 6a–6c, the mean values of the multifidelity model along with uncertainty band ( urn:x-wiley:ess2:media:ess2458:ess2458-math-0135 standard deviation) are shown for stations F02, F23, and N18. We note the smaller uncertainty near the MWRA measurements, that is, high-fidelity data points. Also, station F02 has a smaller uncertainty band compared with that of stations F23 and N18. The smaller uncertainty estimate of the multifidelity model for station F02 was further justified by accurate hindcast of the MWRA test points, which are not used in training of the multifidelity model. For stations F23 and N18, the MWRA test points also lie within or near the uncertainty band.
Details are in the caption following the image
Validation of temporal multifidelity model of SST. Day 1 corresponds to 1 January 2015. High-fidelity data source: MWRA measurements at stations F02, F23, and N18 (star symbols), and low-fidelity data source: The MODerate-resolution Imaging Spectroradiometer satellite measurements (square symbols). The circle symbols show the MWRA measurements that are not used in training the model but are used for validation. The location of the stations is shown in panel (d). The code and data for this example are available at: https://github.com/hbabaee/MFSST. GP = Gaussian processes; MWRA = Massachusetts Water Resources Authority; SST = sea surface temperature.

4.2 Spatial Multifidelity Model of SST

As the second demonstration, we built space multifidelity models in the Massachusetts and Cape Cod Bays for fixed days by combining MWRA measurements with the satellite data. For the spatial multifidelity model, urn:x-wiley:ess2:media:ess2458:ess2458-math-0136 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0137, where urn:x-wiley:ess2:media:ess2458:ess2458-math-0138 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0139 are longitude and latitude, respectively. We use the square-exponential kernel for low-fidelity model as expressed by
and similarly, we employ a square-exponential kernel for urn:x-wiley:ess2:media:ess2458:ess2458-math-0141. These two kernels contain six hyper-parameters: urn:x-wiley:ess2:media:ess2458:ess2458-math-0142. With the addition of urn:x-wiley:ess2:media:ess2458:ess2458-math-0143 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0144, the spatial multifidelity model has nine hyper-parameters.

For our demonstration, we built spatial multifidelity models for the days of 20 March 2015 and 18 May 2016, when MWRA measurements were available. We note that the MWRA measurements of different stations are not taken simultaneously, but they are all taken within a 24-hr time interval. However, we assume that these measurements are taken at exactly the same time. This assumption introduces a small uncertainty that will be estimated via the noise model urn:x-wiley:ess2:media:ess2458:ess2458-math-0145 in the high-fidelity GP. In Figures 7b and 7c, the mean and the standard deviation of the multifidelity model for 20 March 2015 are shown. To build this multifidelity model, we used the satellite data shown in Figure 7a as the low-fidelity data. We observe that the uncertainty is small near the MWRA stations, and away from the stations, the uncertainty increases (Figure 7c). Comparing the satellite data and the mean of the multifidelity model reveals that the multifidelity model relies on low-fidelity data albeit with a bias correction in regions where no MWRA measurements are available.

Details are in the caption following the image
Spatial multifidelity model: The first two rows show results of multifidelity model for SST for 20 March 2015. The difference between the top row and the middle row is the inclusion of station F29 in the multifidelity model in the middle row. The bottom row shows results for 18 May 2016. Columns from left to right show satellite measurements, mean of multifidelity model, and standard deviation of the multifidelity model, respectively. The code and data for this example are available at: https://github.com/hbabaee/MFSST.

To demonstrate the effect of MWRA measurements on the multifidelity model, we built another multifidelity model for 20 March 2015, where the measurement of the MWRA station F29 was added to the training set. The results for this case are shown in the second row of Figure 7. It is clear that the addition of station F29 expands the low-uncertainty region near north of Cape Cod, and it results in a significant update in the multifidelity prediction, specially since the region near the location of station F29 is not densely sampled by MWRA nor is it covered with satellite data for this particular day.

In the last row of Figure 7, the predictions of the multifidelity model for 18 May 2016 are shown. For this day, the satellite measurements provide better coverage in both Massachusetts Bay and Cape Cod Bay. The results show that the satellite and MWRA measurements are correlated and that the multifidelity model corrects the bias uncertainty of the satellite with the inclusion of the MWRA measurements.

4.3 Spatiotemporal Multifidelity Model of SST

In this section, we demonstrate the results of the spatiotemporal multifidelity model of SST in the Massachusetts and Cape Cod Bays. This is our most comprehensive model as it exploits both spatial and temporal correlations of the SST data in a multifidelity framework. The spatiotemporal model is three-dimensional ( urn:x-wiley:ess2:media:ess2458:ess2458-math-0146) and urn:x-wiley:ess2:media:ess2458:ess2458-math-0147, where urn:x-wiley:ess2:media:ess2458:ess2458-math-0148 represents longitude, urn:x-wiley:ess2:media:ess2458:ess2458-math-0149 latitude and urn:x-wiley:ess2:media:ess2458:ess2458-math-0150 time. There are 11 hyper-parameters for this model. The low-fidelity kernel is expressed by
and it contains four hyper-parameters urn:x-wiley:ess2:media:ess2458:ess2458-math-0152. Similarly, the square-exponential kernel for urn:x-wiley:ess2:media:ess2458:ess2458-math-0153 introduces four hyper-parameters urn:x-wiley:ess2:media:ess2458:ess2458-math-0154. The multifidelity model also contains three other hyper-parameters: the scaling factor urn:x-wiley:ess2:media:ess2458:ess2458-math-0155 and the low-fidelity and high-fidelity standard deviations expressed with urn:x-wiley:ess2:media:ess2458:ess2458-math-0156, respectively.

We considered 14 MWRA stations in our analysis, from which the SST measured at 11 stations in the span of years 2015–2016 are selected as the training set for high-fidelity points. The remaining three stations are used as a test set for validation of the multifidelity model. The training and test MWRA stations are shown in Figure 8f with square and circle symbols, respectively. Overall, there are urn:x-wiley:ess2:media:ess2458:ess2458-math-0157 high-fidelity points available. The satellite images are used as monthly snapshots, that is, skipping every 30 days resulting in urn:x-wiley:ess2:media:ess2458:ess2458-math-0158 low-fidelity points. Using the entire satellite set of measurements with daily temporal resolution would result in over fitting of the multifidelity model. Besides the issue of over fitting, for cases with large training data sets, the computational cost of training the multifidelity model becomes increasingly prohibitive, as the computational complexity of training the multifidelity model is urn:x-wiley:ess2:media:ess2458:ess2458-math-0159, where urn:x-wiley:ess2:media:ess2458:ess2458-math-0160. This will also be the case for example when high-resolution measurements are needed to build multifidelity model with fine space-time resolution; (see Gardner et al., 2005) for such measurements in the region. Recently, parametric GP has been developed that enables training GP with big data (Raissi et al., 2019) and it also alleviates the problem of over fitting. The parametric GP enables effective mini-batch training procedures with the computational complexity being reduced to urn:x-wiley:ess2:media:ess2458:ess2458-math-0161, where urn:x-wiley:ess2:media:ess2458:ess2458-math-0162 is the number of mini-batches, and urn:x-wiley:ess2:media:ess2458:ess2458-math-0163 is the number of training points is each mini-batch. Given that the MWRA and satellite data are provided on monthly intervals, the corresponding multifidelity model can resolve up to monthly time scales.

Details are in the caption following the image
Validation of spatiotemporal multifidelity model of SST in years 2015–2016. Day 1 corresponds to 1 January 2015. On the top row, the multifidelity model is evaluated at three MWRA stations (F13, F29, and N04), whose measurements were not used in model training. The location of these three stations along with the rest of the stations are shown in panel (f) with circle and square symbols, respectively. The first two panels on the second row show the evaluation of the multifidelity model at MWRA stations F02 and N07, whose measurements were used in the model training. The code and data for this example are available at: https://github.com/hbabaee/MFSST. GP = Gaussian processes; MWRA = Massachusetts Water Resources Authority; SST = sea surface temperature.

In Figures 8a– 8c, the multifidelity model is validated with the test data set of three MWRA stations (F13, F29, and N04) in the span of 2015–2016. In all three cases, the MWRA measurements are in good agreement with the mean of the multifidelity model. These measurements lie mostly within the uncertainty bands, with a few cases nearby. Among these three stations, F29 shows the largest overall uncertainty as it is not surrounded by MWRA stations that are used in the training of the model (see Figure 8f for the location of F29). On the other hand, N04 has the lowest overall uncertainty and the multifidelity mean yields the best prediction among the three selected stations. In Figures 8 d and  8e, the mean of the multifidelity model and its uncertainty are shown for stations F02 and N07. The measurements of both of these stations are used in the training of the model, and as a result, the uncertainty becomes very small near the time of the measurements. Note that in the absence of the noise model in the high-fidelity GP, represented by urn:x-wiley:ess2:media:ess2458:ess2458-math-0164, the multifidelity mean will pass through the MWRA measurements, that is, zero uncertainty. The term urn:x-wiley:ess2:media:ess2458:ess2458-math-0165 estimates the noise in the MWRA measurements and the unresolved spatiotemporal scales in the multifidelity model.

In Figure 9a, we perform an independent validation of the spatiotemporal multifidelity model by comparing the hindcast prediction of our model with the in situ measurements of buoy A01 for the years 2015 and 2016. The buoy A01 is part of the Northeast Coastal Ocean Observation System NERACOOS, and it is located at urn:x-wiley:ess2:media:ess2458:ess2458-math-0166 N and urn:x-wiley:ess2:media:ess2458:ess2458-math-0167W as shown in Figure 9b with a square symbol. The data for this buoy can be downloaded from: http://www.neracoos.org. The buoy A01 measurements are taken hourly, but the satellite and MWRA training data of the spatiotemporal multifidelity model have a monthly temporal resolution. As a result, the higher frequency oscillations are not resolved in the multifidelity model and they are accounted for with the uncertainty as can be seen in Figure 9a. Overall, a good agreement between the mean of the multifidelity model and the buoy measurements is observed.

Details are in the caption following the image
Independent validation of spatiotemporal multifidelity model with NERACOOS buoy A01. The code and data for this example are available at: https://github.com/hbabaee/MFSST. GP = Gaussian processes; SST = sea surface temperature.

In Figure 10 ,we further illustrate the learning of the large-scale seasonal dynamics via the MWRA measurements in the spatiotemporal multifidelity model. In this figure, we show snapshots of the mean and the standard deviation taken every 10 days from 1 January to 21 May 2015 and the approximate days of MWRA measurements (red circles). We observe that uncertainty is consistently lower near the MWRA measurements. However, during the winter, uncertainties are larger compared to late April and May. This is due to the scarcity of satellite data and the lack of MWRA measurements during winter months. This is further confirmed by comparing the measurements of buoy A01 with the multifidelity predictions. In Figure 10, we show the prediction of the multifidelity and its uncertainty (i.e., urn:x-wiley:ess2:media:ess2458:ess2458-math-0170) at the location of the buoy A01 for two days. The buoy measurements are also shown for comparison. On 1 January 2015, the uncertainty is large (i.e., urn:x-wiley:ess2:media:ess2458:ess2458-math-0171 C) due to lack of data, while on 1 April 2015, the uncertainty is lower (i.e., urn:x-wiley:ess2:media:ess2458:ess2458-math-0172 C). In both cases, the buoy measurements lie within the uncertainty band.

Details are in the caption following the image
Spatiotemporal multifidelity model: sea surface temprature (SST) mean and the standard deviation (first and second row, respectively) starting from 1 January 2015 with 10 days skipped between subsequent snapshots. The approximate times of MWRA measurements are shown with red circles. On 1 January and 1 April, the position of buoy A01 (white dot), its SST measurement ( urn:x-wiley:ess2:media:ess2458:ess2458-math-0168), and the prediction of the model ( urn:x-wiley:ess2:media:ess2458:ess2458-math-0169) at that location are shown. MWRA = Massachusetts Water Resources Authority.

5 Summary and Future Work

The presented framework goes beyond traditional predictions with line error bars to continuous space-time uncertainty maps by utilizing and developing modern machine learning techniques that enable fusion of data from all available sources with varying degrees of fidelity. In Figure 11, we compare the difference between single-fidelity Gaussian regression model with the multifidelity model for 20 March 2015 (top row) and 18 May 2015 (bottom row). In the left column, the satellite data on the corresponding days are shown, and the middle column shows the Gaussian regression of the satellite measurements, that is, the single-fidelity model. The right column shows the mean values of the multifidelity model, where we have used MWRA measurements as the source of high-fidelity data. The comparison of the mean values of the single-fidelity and multifidelity models clearly shows that these two temperature maps are correlated. However, the multifidelity model corrects the single-fidelity model up to 3° C on 20 March 2015 and roughly 1–2° C on 18 May 2015.

Details are in the caption following the image
Comparison of spatial low-fidelity and multifidelity models of sea surface temperature for 20 March 2015 (top row) and 18 May 2015 (bottom row). Left: Satellite measurements; Middle: Mean of low-fidelity model; Right: Mean of low-fidelity model.
Our long-term goal is to analyze and forecast various quantities of interest and their uncertainty directly related to ocean acidification, that is, salinity, temperature, dissolved inorganic carbon, pH, total alkalinity, using a flexible Bayesian data fusion framework that leaves no data behind —including 3D physics-based models (see e.g., Xue et al., 2014). In particular, we will study the effect of kernel on hindcast and forecast predictions. Kernels can be chosen to encode various structures in the data including smoothness, stationarity, and seasonality. In Figure 12, we show the performance of square exponential and periodic kernels
in forecasting the SST. In the above kernels urn:x-wiley:ess2:media:ess2458:ess2458-math-0174 and urn:x-wiley:ess2:media:ess2458:ess2458-math-0175 are hyper-parameters. The data used in Figure 12 are obtained from buoy measurements located in the Gulf of Maine at 43.02° N, 70.54° W representing SST measurements during more than 9 years — spanning from July 2006 to November 2015. It is evident that for this long time record, the square exponential kernel fails to learn the seasonality beyond the training data, and therefore, it cannot be used for reliable forecast. This is also reflected in the large uncertainty bands. On the other hand, the periodic kernel captures the seasonality and it yields accurate forecast over a long period of time. Choosing an optimal kernel can be automated via Bayesian optimization (Gardner et al., 2015; Martinez-Cantin et al., 2009), or via neural-net-induced GP (Pang et al., 2019; for a broader list of research in this area, see the Automatic Statistician Project, https://www.automaticstatistician.com).
Details are in the caption following the image
Effect of kernel on model performance: Forecast seasonality of sea surface temperature (SST) using Gaussian processes with (a) square exponential kernel and (b) periodic kernel. The data used are mooring measurements of a buoy in Coastal Western Gulf of Maine at 43.02° N, 70.54° W maintained by the University of New Hampshire Coastal Carbon Group. The data can be downloaded from: https://www.nodc.noaa.gov/ocads/oceans/Coastal/unh_ts.html. Day 1 corresponds to 13 July 2006. The color blue shows the data used in the training of the model (hindcast), and the data in red show the test data (forecast).

Ultimately, we intend to build 3D volumetric maps of these quantities in the Massachusetts and Cape Cod Bays, including Boston Harbor and the Stellwagen Bank. This will enhance our understanding of coastal ocean acidification and our ability to focus our monitoring efforts in places of high uncertainty and providing management strategies in regions with relative low aragonite saturation means that could negatively affect valuable resources.


The authors thank Dr. Raissi for useful comments and stimulating discussions. This research has been supported primarily by a grant from NOAA (NA18OAR4170105) and partial support from Office of Naval Research, grant number N00014-16-1-2956. The Moderate-resolution Imaging Spectroradiometer (MODIS) SST data were obtained from the NASA EOSDIS Physical Oceanography Distributed Active Archive Center (PO.DAAC) at the Jet Propulsion Laboratory, Pasadena, CA (https://doi.org/10.5067/MODST-1D4N4). The MWRA measurements may be downloaded from http://www.mwra.com/harbor/html/wq_data.htm.