Volume 34, Issue 8 p. 1292-1315
Research Article

Global Core Top Calibration of δ18O in Planktic Foraminifera to Sea Surface Temperature

Steven B. Malevich (Corresponding Author), Lael Vetter, Jessica E. Tierney

Department of Geosciences, The University of Arizona, Tucson, AZ, USA

Correspondence to: S. B. Malevich
First published: 29 May 2019

Abstract

The oxygen isotopic composition of planktic foraminiferal calcite (δ18Oc) is one of the most prevalent proxies used in the paleoceanographic community. The relationship between δ18Oc, temperature, and seawater oxygen isotopic composition (δ18Osw) is firmly rooted in thermodynamics, and experimental constraints are commonly used for sea surface temperature (SST) reconstructions. However, in marine sedimentary applications, additional sources of uncertainty emerge, and these uncertainty constraints have not yet been included in global calibration models. Here, we compile a global data set of over 2,600 marine sediment core top samples for five planktic species: Globigerinoides ruber, Trilobatus sacculifer, Globigerina bulloides, Neogloboquadrina incompta, and Neogloboquadrina pachyderma. We developed a suite of Bayesian regression models to calibrate the relationship between δ18Oc and SST. Spanning SSTs from 0.0 to 29.5 °C, our annual model with species pooled together has a mean standard error of approximately 0.54‰. Accounting for seasonality and species-specific differences improves model validation, reducing the mean standard error to 0.47‰. Example applications spanning the Late Quaternary show good agreement with independent alkenone-based estimates. Our pooled calibration model may also be used for reconstruction in the deeper geological past, using modern planktic foraminifera as an analog for non-extant species. Our core top-based models provide a robust assessment of uncertainty in the δ18Oc paleothermometer that can be used in statistical assessments of interproxy and model-proxy comparisons. The suite of models is publicly available as the Open Source software library bayfox, for Python, R, and MATLAB/Octave.

Key Points

  • We develop Bayesian calibration models for planktic δ18Oc
  • Accounting for seasonal abundance and species-specific sensitivities improves inference
  • Our models produce realistic SST reconstructions for both recent and “deep-time” δ18Oc data

1 Introduction

The oxygen isotopic composition of foraminiferal shell calcite (δ18Oc or δc) is the foundational proxy of paleoceanography (Emiliani, 1955). Foraminiferal δ18Oc is a function of ambient δ18Osw (δsw) and temperature at the time of calcification, the latter of which determines the magnitude of equilibrium fractionation in line with thermodynamic expectations (Urey, 1947). The relationship between temperature (T) and the isotopic difference between δ18Oc and δ18Osw is traditionally approximated as a second-order polynomial (e.g., Epstein et al., 1953; Erez & Luz, 1983; Shackleton, 1974)
$$T = a + b\,(\delta_c - \delta_{sw} + 0.27) + c\,(\delta_c - \delta_{sw} + 0.27)^2$$
At environmental temperatures, the nonlinearity of the temperature sensitivity is slight, leading many to use a simplified linear form (e.g., Bemis et al., 1998)
$$T = a + b\,(\delta_c - \delta_{sw} + 0.27)$$
Rearranging to the etiological form of the equation to express the dependence of δ18Oc on temperature yields
$$\delta_c = \frac{T - a}{b} + \delta_{sw} - 0.27$$
where a, b, and c are empirical calibration coefficients and −0.27 approximates the difference between the Vienna PeeDee Belemnite (VPDB; for δ18Oc) and Vienna Standard Mean Ocean Water (VSMOW; for δ18Osw) scales (Hut, 1987).
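
As an illustration of the linear form and its rearrangement, the following minimal Python sketch converts between temperature and δ18Oc. It assumes the linear relation T = a + b(δc − (δsw − 0.27)), with δsw on the VSMOW scale. The coefficient values and function names are placeholders chosen for illustration, not values from this study.

```python
# Illustrative coefficients only (assumed, not taken from this paper):
# intercept A in deg C, slope B in deg C per permil.
A = 16.5
B = -4.80
VSMOW_TO_VPDB = -0.27  # scale offset quoted in the text (permil)


def temperature_from_d18oc(d18oc, d18osw, a=A, b=B):
    """Linear form: T = a + b * (d18Oc - (d18Osw - 0.27))."""
    return a + b * (d18oc - (d18osw + VSMOW_TO_VPDB))


def d18oc_from_temperature(temp, d18osw, a=A, b=B):
    """Rearranged form: d18Oc = (T - a) / b + d18Osw - 0.27."""
    return (temp - a) / b + d18osw + VSMOW_TO_VPDB


# Round trip: temperature -> calcite d18O -> temperature.
d18oc = d18oc_from_temperature(25.0, d18osw=0.5)
print(round(d18oc, 3))
print(round(temperature_from_d18oc(d18oc, d18osw=0.5), 3))
print(round(1.0 / B, 3))  # implied sensitivity in permil per deg C
```

With these placeholder coefficients the implied sensitivity, 1/b, is about −0.21‰ per °C, which is why the rearranged form is convenient for regression: the slope is the temperature sensitivity itself.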

Synthetic calcite studies indicate that under equilibrium conditions, δ18Oc sensitivity to temperature is approximately −0.19‰ °C⁻¹ near 30 °C and approximately −0.25‰ °C⁻¹ near 0 °C (Kim & O'Neil, 1997). Calibration studies using foraminifera from lab cultures and marine tows yield comparable temperature sensitivities (−0.20 to −0.28‰ °C⁻¹; e.g., Bemis et al., 1998; Bouvier-Soumagnac & Duplessy, 1985; Mulitza et al., 2003a; Shackleton, 1974), although in some cases the foraminiferal calibrations are offset (0.2–0.8‰) from inorganic calibrations (e.g., Caron et al., 1990; Duplessy & Blanc, 1981; Waelbroeck et al., 2005). The error in calibrations from culture or plankton tows is greater than the error of inorganic calibrations; for example, lab culture calibrations have standard errors of 0.15‰ (e.g., Bemis et al., 1998) and plankton tow calibrations of 0.26‰ (e.g., Mulitza et al., 2003a).

While there is general agreement of δ18Oc sensitivity between synthetic calcite experiments and foraminiferal-based investigations, application of δ18Oc paleothermometry to marine sediment records poses additional challenges and unavoidable sources of uncertainty. For one, δ18Osw is not precisely known for past oceanographic conditions and must be estimated, potentially introducing a large source of uncertainty under both present and past oceanic conditions. To leading order, δ18Osw reflects the precipitation-evaporation balance over the open ocean, but it is also modified by local and regional processes such as ice formation, glacial meltwater, seasonal freshwater runoff, water mass advection, and mixing (Craig & Gordon, 1965). The climatological processes influencing δ18Osw, coupled with the scarcity of measurements in many regions of the modern ocean, can lead to large uncertainties in δ18Osw for certain locations such as the high latitudes (LeGrande & Schmidt, 2006). Moving back through time, δ18Osw distributions must either be estimated through a priori assumptions about oceanographic setting or predicted by isotope-enabled climate models. Alternatively, research questions about δ18Osw can be addressed by reconstructing temperature with another independent proxy and then isolating δ18Osw from δ18Oc.

In foraminiferal calcite, the uncertainty of shell δ18Oc temperature calibrations is influenced by biological processes, such as photosynthesis in algal symbionts (e.g., Duplessy et al., 1970; Ravelo & Fairbanks, 1992; Spero & Lea, 1993; Spero et al., 1997) and biases in the formation of gametogenic and ontogenetic calcite (e.g., Hamilton et al., 2008; Spero & Lea, 1996; Williams et al., 1979). In addition, each species exhibits a distinct seasonality and depth habitat in the water column (e.g., Fairbanks & Wiebe, 1980; Fairbanks et al., 1982; Kohfeld et al., 1996; Sautter & Thunell, 1989, 1991; Žarić et al., 2005), and even within the morphospecies commonly used for classification of fossil foraminifera, there may be additional differences in life cycle and habitat preferences due to genotypic diversity (Aurahs et al., 2011; Darling & Wade, 2008; Kucera & Darling, 2002). While such ecological relationships can be leveraged for season- and depth-specific climate reconstructions (e.g., Mulitza et al., 2003b; Spero et al., 2003; Williams et al., 1981), these relationships can change through time in response to environmental or biological variations, complicating paleoenvironmental interpretations (e.g., Mulitza et al., 1998). Changes in pH/carbonate ion concentration [CO₃²⁻] during calcification also influence δ18Oc (Bijma et al., 1999; Spero et al., 1997; Zeebe, 1999). Finally, the sedimentary environment influences the fidelity of δ18Oc as it is preserved downcore. Shells deposited in bottom waters undersaturated in [CO₃²⁻] may partly dissolve and recrystallize, a process that alters the original isotopic signature via exchange with pore water δ18O (e.g., Schrag et al., 1995). Bioturbation can be an especially strong source of core top variability in areas with low sedimentation rates, where glacial age sediments may become mixed with relatively modern sediments in a core top sample (Waelbroeck et al., 2005).

Many paleoceanographic applications use laboratory calibrations to transform δ18Oc data to sea surface temperatures (SST), but these calibrations do not capture the range of biological, chemical, and sedimentological uncertainties enumerated above. Capturing these uncertainties is important for realistically estimating paleo-SSTs from the marine sediment archive and critical for multiproxy or climate model-proxy comparisons. In this study, we develop a calibration for the δ18Oc-SST relationship in planktic foraminifera using core top data and Bayesian regression. We use a Bayesian approach that explicitly models the uncertainty in calibration model parameters and then propagates this uncertainty into inferred SSTs, facilitating probabilistic estimates of past climate (Tierney & Tingley, 2014, 2018). In developing Bayesian regression for δ18Oc, we pay particular attention to species-specific differences and seasonal abundance. Depth habitat is also an important factor, but in this work we focus specifically on near-surface (mixed layer) dwelling species that are frequently used to reconstruct SST, for which slight differences in depth habitat are less likely to appreciably affect regression models. Calibration of deep-dwelling species is left to future efforts.

In what follows, we develop mean annual and seasonal calibration models for five planktic species commonly used in paleoceanography: Globigerinoides ruber, Trilobatus sacculifer, Globigerina bulloides, Neogloboquadrina incompta, and Neogloboquadrina pachyderma. We also develop a model that pools these species together for application to extinct species of planktic foraminifera commonly used for Cenozoic paleoceanographic reconstructions (e.g., Zachos et al., 1994). These models are freely available to researchers as a software library called bayfox. We give examples of how our calibrations can be applied for paleotemperature reconstructions over the Late Quaternary as well as in deeper geologic time, comparing our calibration and its uncertainty with established Bayesian alkenone UK′37 and TEX86 reconstructions.

2 Methods and Data Selection

2.1 SST and Seawater δ18O

Our Bayesian models use modern SSTs and δ18Osw as predictors. For SSTs, we used both monthly and annual fields from the World Ocean Atlas 2013 version 2 (Boyer et al., 2013). For δ18Osw, we used the top layer of the estimated annual fields from LeGrande and Schmidt (2006). Both of these products have 1° × 1° spatial resolution. The LeGrande and Schmidt (2006) δ18Osw field is based on δ18Osw observations from the last half-century, and in areas with sparse isotope sampling coverage, it uses regional δ18Osw-salinity relationships to estimate isotope values. The δ18Osw field does not include uncertainty estimates for grid points, though LeGrande and Schmidt (2006) note that annual average values in regions near or under sea ice may be more uncertain due to a limited number of observations and large seasonal fluctuations in runoff and precipitation that induce high variance in δ18Osw.

2.2 Core Top Planktic Foraminiferal δ18Oc

We compiled planktic δ18Oc sediment core records for five foraminiferal species: G. ruber, T. sacculifer, G. bulloides, N. incompta, and N. pachyderma. G. ruber (white) and G. ruber (pink) have some regional differences in seasonality and relative abundance (e.g., Bé, 1960; Williams et al., 1981). However, we opted to evaluate G. ruber white and pink together because our preliminary G. ruber (pink) calibration was strongly influenced by sites off the northwest coast of Africa, leading to model parameters that we believe may reflect sampling and statistical artifacts more than differences in G. ruber (pink) calcification.

Core top and Late Holocene records were gathered from the Multiproxy Approach for the Reconstruction of the Glacial Ocean data set (Waelbroeck et al., 2005) which extends the collection of Schmidt and Mulitza (2002). We supplemented this collection with additional sources (Arbuszewski et al., 2010, 2013; Boussetta et al., 2012; Brown & Elderfield, 1996; Cléroux et al., 2008; Dahl & Oppo, 2006; Dekens et al., 2002; Dyez et al., 2014; Elderfield & Ganssen, 2000; Fallet et al., 2012; Farmer, 2005; Ganssen & Kroon, 2000; Garidel-Thoron et al., 2007; Gebregiorgis et al., 2016; Gibbons et al., 2014; Johnstone et al., 2011; Kozdon et al., 2009; Lea et al., 2006; Leduc et al., 2007; Linsley et al., 2010; Mashiotta et al., 1999; Mathien-Blard & Bassinot, 2009; Meland et al., 2006; Moffa-Sánchez et al., 2014; Mohtadi et al., 2010, 2011; Nürnberg et al., 2008; Oppo et al., 2009; Oppo & Sun, 2005; Pahnke et al., 2003; Palmer & Pearson, 2003; Parker et al., 2016; Regenberg et al., 2009; Richey et al., 2007; Riethdorf et al., 2013; Riveiros et al., 2016; Romahn et al., 2014; Rosenthal et al., 2003; Rustic et al., 2015; Sabbatini et al., 2011; Saraswat et al., 2013; Schmidt et al., 2012a, 2004, 2012b; Steinke et al., 2005, 2008; Steph et al., 2009; Stott et al., 2007; Sun et al., 2005; Thornalley et al., 2011; Tierney et al., 2016; Visser et al., 2003; Weldeab et al., 2005, 2006, 2007, 2014; Werner et al., 2013; Xu et al., 2010).

We excluded records from sites with annual SST ≤ 0 °C to reduce complications from local sea ice formation and poor δ18Osw estimates. Waelbroeck et al. (2005) show that age and sedimentation rate filtering can help reduce uncertainty stemming from the ambiguous “modern” age constraints. However, we consider these to be important sources of uncertainty to include in our calibration, so we did not filter core top sites by age or sedimentation rates. This also means that we are calibrating ambiguously modern core top samples against SST and δ18Osw fields influenced by anthropogenic climate change. This is an issue that affects all core top calibrations.

Our compilation consists of 2,636 observations (Figure 1a) with 1,002 for G. ruber, 635 for G. bulloides, 442 for T. sacculifer, 425 for N. pachyderma, and 132 for N. incompta. We then gridded the core top data to reduce the impact of spatial clustering by averaging samples for each species to the nearest 1° × 1° grid point of our SST and δ18Osw fields. After gridding, there were a total of 1,386 grid points, with 489 for G. ruber, 291 for G. bulloides, 273 for N. pachyderma, 243 for T. sacculifer, and 90 for N. incompta. References to core top data hereafter refer to the gridded core top data unless noted otherwise.
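
The gridding step can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analysis: it assumes grid cell centers sit at half degrees, ignores wrap-around at the date line, and uses made-up sample values.

```python
import math
from collections import defaultdict


def grid_center(value):
    """Center of the 1-degree cell containing value (assumes centers at x.5)."""
    return math.floor(value) + 0.5


def grid_coretops(records):
    """Average d18Oc for each (species, grid cell).

    records: iterable of (species, lat, lon, d18oc) tuples.
    Returns {(species, cell_lat, cell_lon): mean d18oc}.
    """
    cells = defaultdict(list)
    for species, lat, lon, d18oc in records:
        cells[(species, grid_center(lat), grid_center(lon))].append(d18oc)
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}


# Made-up observations for illustration.
samples = [
    ("G. ruber", 10.2, -45.7, -1.8),
    ("G. ruber", 10.9, -45.1, -2.0),      # same cell as the first record
    ("G. ruber", 11.3, -45.4, -1.5),      # neighboring cell
    ("G. bulloides", 10.2, -45.7, -0.4),  # same cell, different species
]
gridded = grid_coretops(samples)
print(len(gridded))
print(gridded[("G. ruber", 10.5, -45.5)])
```

Averaging is done per species, so two species recovered from the same cell remain separate calibration points.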

Figure 1. Core top (a) and marine sediment trap (b) sites for each planktic species used in the calibration. We used the sediment trap data sets compiled by Žarić et al. (2005) to estimate seasonal sea surface temperatures for the seasonal calibration models.

The core top data cover a wide range of modern SST values and reflect the general thermal preferences of each species (Table 1); for instance, G. ruber prefers relatively warmer waters, while G. bulloides is abundant across a wide range of temperatures, and N. incompta and N. pachyderma prefer cooler waters.

Table 1. Modern Annual SST Minimum, Maximum, Mean, and Standard Deviation (σ) for Each Core Top Species Group

Group                          n       Min (°C)   Max (°C)   Mean (°C)   σ (°C)
Globigerinoides ruber          489     10.9       29.6       24.9        3.8
Trilobatus sacculifer          243     10.6       29.6       24.5        4.0
Globigerina bulloides          291     1.8        29.6       13.6        6.9
Neogloboquadrina incompta      90      2.6        19.6       11.5        4.7
Neogloboquadrina pachyderma    273     0.1        21.4       6.1         4.3
Pooled                         1,386   0.1        29.6       17.8        9.0

Note. n values are sample size for gridded core top data.

2.3 Estimation of Foraminiferal Seasonal Abundance

The abundance of individual planktic foraminiferal species varies seasonally in response to changes in temperature and nutrients, which affect food availability (e.g., Williams et al., 1979, 1981). This motivates the development of a seasonally adjusted calibration model, wherein foraminiferal seasonal expression is modeled as a function of their preferred temperature range. To build such a model, we use sediment trap data compiled by Žarić et al. (2005) to identify temperature ranges that correspond to peak abundance for each foraminiferal species. The Žarić et al. (2005) data set pairs total foraminifera shell flux and local SSTs from 75 sites (Figure 1b). The data set contains a total of 5,548 observations with 1,807 for G. ruber, 1,034 for T. sacculifer, 1,255 for G. bulloides, 910 for N. incompta, and 542 for N. pachyderma.

To adjust for nonnormally distributed shell flux data, we applied a Box-Cox power transformation (Box & Cox, 1964) for comparison with SST (Figure 2). A Kernel Density Estimate was fit to the observations to estimate the SST interval that corresponds with the highest 10% (most abundant) flux observations, which is taken to represent their “ideal” thermal niche, similar to the approach in Žarić et al. (2005).

Figure 2. Peak abundance and SST ranges for each species. Points are observed foraminifera shell flux from the sediment trap network, plotted against SST measured at the sediment trap site (Žarić et al., 2005). Contours are the Kernel Density Estimate of observations. Horizontal lines are the highest 10% of observed shell flux. Vertical lines and annotations are the 90% range of SSTs within the highest 10% shell flux.
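
A simplified version of this niche estimate can be sketched as follows. The sketch is an assumption-laden stand-in for the procedure described above: it replaces the Kernel Density Estimate with empirical quantiles, uses a fixed, hypothetical Box-Cox exponent (`lam`), and runs on synthetic flux data.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transform for positive x."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def peak_sst_range(sst, flux, lam=0.25, top_fraction=0.10, coverage=0.90):
    """SST interval holding `coverage` of the highest-flux observations."""
    transformed = [box_cox(f, lam) for f in flux]
    threshold = quantile(sorted(transformed), 1.0 - top_fraction)
    peak_sst = sorted(t for t, f in zip(sst, transformed) if f >= threshold)
    tail = (1.0 - coverage) / 2.0
    return quantile(peak_sst, tail), quantile(peak_sst, 1.0 - tail)


# Synthetic flux that peaks near 25 deg C (illustrative only).
sst = [0.1 * i for i in range(301)]
flux = [1.0 + 100.0 * math.exp(-((t - 25.0) / 3.0) ** 2) for t in sst]
low, high = peak_sst_range(sst, flux)
print(round(low, 1), round(high, 1))
```

Because the Box-Cox transform is monotonic, it does not change which observations rank in the top 10%; in this sketch it matters only for plotting or for density-based estimates.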

We then used these SST ranges to estimate the most likely seasons of peak shell flux for each core top foraminiferal observation, by averaging SSTs from all months at the location of the observation that fell within the SST range. If, for a given observation, no monthly temperatures fell within the peak abundance range, we took the average of the three monthly SSTs closest to the range. If fewer than three months fell within the range, we included the next closest months so as to guarantee a seasonal average of at least three months. The resulting seasonality estimates show that at core sites in the tropics, foraminiferal species abundances are typically annual or near annual, without a strong seasonal signal (Figure 3). For the warm-water species G. ruber and T. sacculifer, a stronger seasonal signal (typically Summer-Fall) appears outside of the tropics. Cold-water species are generally predicted to be annual within their expected extratropical ranges, although seasonal expressions do occur for locations with particularly broad annual temperature ranges (e.g., Figure S4 in the supporting information). Monthly maps of predicted niche SST ranges for each species are available in the supporting information (Figures S1–S5).
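
The month-selection rule can be written compactly. The sketch below is one reading of the rule described above, with a hypothetical niche range and made-up monthly climatologies; it is not the code used for the analysis.

```python
def seasonal_sst(monthly_sst, sst_min, sst_max, min_months=3):
    """Mean SST over months inside [sst_min, sst_max], padded to min_months."""

    def distance(t):
        # Zero inside the niche range, otherwise distance to the nearest bound.
        if t < sst_min:
            return sst_min - t
        if t > sst_max:
            return t - sst_max
        return 0.0

    inside = [t for t in monthly_sst if distance(t) == 0.0]
    if len(inside) >= min_months:
        chosen = inside
    else:
        # Rank all months by distance to the range; in-range months sort first,
        # so the padding always adds the next-closest months.
        chosen = sorted(monthly_sst, key=distance)[:min_months]
    return sum(chosen) / len(chosen)


# Hypothetical warm-water niche of 22-30 deg C.
subtropical = [8, 9, 11, 14, 18, 22, 25, 26, 23, 18, 13, 9]
subpolar = [2, 2, 3, 5, 8, 11, 13, 14, 12, 9, 5, 3]
print(seasonal_sst(subtropical, 22.0, 30.0))  # four months fall in range
print(seasonal_sst(subpolar, 22.0, 30.0))     # none in range: three closest
```

At a site where every month falls within the range, the function returns the annual mean, which matches the near-annual signal described for tropical sites.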

Figure 3. Estimated seasonal preferences for core top foraminifera based on sediment flux and monthly sea surface temperature, for the tropics, northern extratropics (latitude > 23°), and southern extratropics (latitude < −23°). Numbers around the circle represent months. Line widths correspond to how frequently months are paired together for seasonal sea surface temperature estimates in each species. Stronger lines indicate common month pairings. n is the number of gridded core top samples available for the given region.

We recognize that this method of identifying seasonal signals in foraminiferal abundance is a simplification that does not consider factors such as light and nutrient availability. We chose this approach because it is easily replicated and used with a large global sample data set. This approach can be adapted more broadly for forward modeling experiments where seasonality or monthly SSTs change relative to modern conditions.

2.4 Bayesian Calibration Models

We designed and fit four linear Bayesian regression models with Markov chain Monte Carlo (MCMC) sampling (for review see Gelman, 2014; Kruschke, 2015; McElreath, 2016). With a Bayesian approach, we can explicitly estimate the uncertainty in calibration model parameters and produce a full posterior predictive distribution of the predictand (δ18Oc or SST), rather than a single-point estimate. The four models differ in how they treat seasonal and species-specific adjustments to calibration. We compared the performance of these four models with cross-validation statistics. These statistics were used to objectively assess whether considering species-specific variability and seasonality resulted in model improvement.

In the first model, we pooled all five species for a single calibration against annual SSTs. This model design is
$$\delta^{18}\mathrm{O}_{c} \sim \mathcal{N}\left(\alpha + \beta \cdot \mathrm{SST} + \delta^{18}\mathrm{O}_{sw} - 0.27,\ \tau^{2}\right) \tag{1}$$
where α, β, and τ² are the regression intercept, slope, and prediction variance, respectively. The unknown parameters α, β, and τ were assigned weakly informative prior distributions. While we have discussed uncertainty in δ18Osw fields, we did not explicitly model uncertainty in δ18Osw because this is not constrained in the LeGrande and Schmidt (2006) data. We used a linear calibration model because the nonlinearity of temperature sensitivity is relatively small within the range of temperatures at which foraminifera calcify (e.g., Bemis et al., 1998), and this nonlinearity is apparently represented piecewise in species-specific calibrations (e.g., Mulitza et al., 2003a; Waelbroeck et al., 2005). To test the validity of our linear assumption, we developed a simple quadratic Bayesian model comparable to the temperature relationships for inorganic calcite from O'Neil et al. (1969) and Kim & O'Neil (1997; not shown). Cross validation showed that the quadratic model yielded no significant improvement over our linear regressions, suggesting that a linear assumption adequately explains the data.
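
To make the forward direction concrete, the sketch below draws a predictive ensemble of δ18Oc from a pooled linear model. It assumes a normal likelihood with mean α + β·SST + (δ18Osw − 0.27) and standard deviation τ. The "posterior draws" are simulated stand-ins with hypothetical central values, not output from the fitted models.

```python
import random
import statistics

VSMOW_TO_VPDB = -0.27  # permil, scale offset quoted in the text


def fake_posterior_draws(n, rng):
    """Stand-in for MCMC output: hypothetical (alpha, beta, tau) draws."""
    return [
        (rng.gauss(3.3, 0.03), rng.gauss(-0.23, 0.0015), abs(rng.gauss(0.54, 0.01)))
        for _ in range(n)
    ]


def predict_d18oc(sst, d18osw, draws, rng):
    """One predicted d18Oc (permil VPDB) per posterior draw."""
    ensemble = []
    for alpha, beta, tau in draws:
        mu = alpha + beta * sst + d18osw + VSMOW_TO_VPDB
        ensemble.append(rng.gauss(mu, tau))
    return ensemble


rng = random.Random(0)
draws = fake_posterior_draws(5000, rng)
ensemble = predict_d18oc(sst=25.0, d18osw=0.5, draws=draws, rng=rng)
print(round(statistics.mean(ensemble), 2), round(statistics.stdev(ensemble), 2))
```

Each ensemble member uses a different parameter draw, so the spread of the ensemble reflects both parameter uncertainty and the residual variance τ².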
Our second model was also calibrated against annual SSTs but uses a hierarchical design to allow model parameters to vary for each of the five species. The hierarchical design imposes a “global” commonality for calcification in planktic foraminifera (supported by thermodynamics) but allows the calibration parameters for each species to vary from this global constraint—reflecting, for example, differences in depth of calcification, vital effects, or temperature sensitivity (e.g., Bemis et al., 1998; Mulitza et al., 2003a; Spero et al., 2003), as well as influence from spatial sampling bias. This model is
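$$\delta^{18}\mathrm{O}_{c} \sim \mathcal{N}\left(\alpha_{i} + \beta_{i} \cdot \mathrm{SST} + \delta^{18}\mathrm{O}_{sw} - 0.27,\ \tau_{i}^{2}\right) \tag{2}$$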
where αi, βi, and τi are calibration parameters as described above but for given species, i. The hierarchical relationship is in the priors for species groups
urn:x-wiley:palo:media:palo20756:palo20756-math-0057(3)
which have unknown parameters depending on hyperparameters shared across all species groups. The hyperparameters are described in the appendix.
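
The idea of species parameters varying around a shared "global" constraint can be illustrated generatively. The sketch below assumes normal distributions for the species-level intercepts and slopes; both that distributional form and all hyperparameter values are hypothetical choices for illustration, and the actual priors are those given in the appendix.

```python
import random

SPECIES = (
    "G. ruber",
    "T. sacculifer",
    "G. bulloides",
    "N. incompta",
    "N. pachyderma",
)


def draw_species_parameters(hyper, rng):
    """Draw (alpha_i, beta_i) for each species from shared hyperparameters."""
    params = {}
    for name in SPECIES:
        alpha_i = rng.gauss(hyper["mu_alpha"], hyper["sigma_alpha"])
        beta_i = rng.gauss(hyper["mu_beta"], hyper["sigma_beta"])
        params[name] = (alpha_i, beta_i)
    return params


# Hypothetical hyperparameters (assumed values, for illustration only).
hyper = {"mu_alpha": 3.3, "sigma_alpha": 0.3, "mu_beta": -0.21, "sigma_beta": 0.03}
rng = random.Random(1)
for name, (alpha_i, beta_i) in draw_species_parameters(hyper, rng).items():
    print(f"{name}: alpha={alpha_i:.2f}, beta={beta_i:.3f}")
```

Shrinking `sigma_alpha` and `sigma_beta` toward zero recovers a pooled model in which every species shares one intercept and slope; large values let each species behave almost independently.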

Our third and fourth models use seasonal SSTs in the place of annual SSTs, while retaining the same pooled and hierarchical model designs described above. The seasonal SSTs are based on seasonal peak abundance estimated from a network of marine sediment traps, as described in section 2.3. These seasonal models use annual δ18Osw estimates, as monthly fields are not available from LeGrande and Schmidt (2006).

All calibration models were cross validated using Pareto-Smoothed Importance Sampling leave-one-out cross validation (LOOCV; Vehtari et al., 2017). This statistic measures model predictive performance by approximating a LOOCV—estimating each core top sample as though it were left out of the calibration for validation purposes. A relatively low score is “better,” indicating improved performance.
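
For intuition about what the statistic approximates, the sketch below runs a brute-force leave-one-out cross validation on synthetic data. It is not Pareto-Smoothed Importance Sampling: it refits an ordinary least-squares line for every held-out point and scores it with a plug-in normal density, reporting −2 times the summed log density so that lower is better. The data, the noise levels, and this scoring scale are all assumptions for illustration.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares: returns intercept, slope, residual sd."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, math.sqrt(rss / (n - 2))


def normal_logpdf(y, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * ((y - mu) / sigma) ** 2


def loo_score(x, y):
    """-2 * sum of held-out log predictive densities (lower is better)."""
    total = 0.0
    for i in range(len(x)):
        a, b, s = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += normal_logpdf(y[i], a + b * x[i], s)
    return -2.0 * total


# Synthetic example: a well-matched predictor versus a degraded one.
rng = random.Random(1)
sst_true = [rng.uniform(0.0, 30.0) for _ in range(200)]
frac = [3.3 - 0.23 * t + rng.gauss(0.0, 0.3) for t in sst_true]
sst_noisy = [t + rng.gauss(0.0, 3.0) for t in sst_true]
print(round(loo_score(sst_true, frac)), round(loo_score(sst_noisy, frac)))
```

The predictor that better matches the conditions under which the signal formed earns the lower score, which is the logic behind comparing annual and seasonal SST models with this statistic.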

Our four models are “forward models” (that is, they predict δ18Oc given SST and δ18Osw), but they can also be inverted, via Bayesian inference, to predict SST. To predict SST for a given δ18Oc and δ18Osw, parameters are drawn from the full conditional posteriors of the calibration models for the likelihood and then combined with a prior distribution of SSTs to yield a posterior. The SSTs in Table 1 are a suggested starting point to develop a prior distribution, though users can specify values to fit different environmental settings.
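
The inversion can be illustrated with a simple grid approximation. This sketch is not the sampling scheme described in the appendix: it evaluates the likelihood on a fixed SST grid, averages it over simulated parameter draws with hypothetical central values, and multiplies by a normal prior taken from the pooled row of Table 1.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def sst_posterior(d18oc, d18osw, draws, prior_mean, prior_std, grid):
    """Normalized posterior weights for each SST value in grid."""
    weights = []
    for t in grid:
        # Likelihood of the observed d18Oc, averaged over parameter draws.
        like = sum(
            normal_pdf(d18oc, a + b * t + d18osw - 0.27, tau) for a, b, tau in draws
        ) / len(draws)
        weights.append(like * normal_pdf(t, prior_mean, prior_std))
    total = sum(weights)
    return [w / total for w in weights]


# Simulated stand-ins for posterior draws of (alpha, beta, tau); hypothetical values.
rng = random.Random(42)
draws = [
    (rng.gauss(3.3, 0.03), rng.gauss(-0.23, 0.0015), abs(rng.gauss(0.54, 0.01)))
    for _ in range(200)
]
grid = [-5.0 + 0.1 * k for k in range(501)]  # -5 to 45 deg C

post = sst_posterior(-2.22, 0.5, draws, prior_mean=17.8, prior_std=9.0, grid=grid)
post_mean = sum(t * w for t, w in zip(grid, post))
post_sd = math.sqrt(sum(w * (t - post_mean) ** 2 for t, w in zip(grid, post)))
print(round(post_mean, 1), round(post_sd, 1))
```

A broad prior such as the pooled one pulls the estimate only slightly away from the value implied by the likelihood alone; a narrower, site-specific prior would have a stronger influence.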

Additional description of priors, hyperparameters, model inversion, and MCMC sampling is given in the appendix. We implemented the Bayesian models for this analysis with the pymc3 library (Salvatier et al., 2016) on an Open Source Python software stack (Hunter, 2007; Hoyer & Hamman, 2017; McKinney, 2010; Met Office, 2010; Oliphant, 2015). Code implementing our analysis is available online (https://github.com/brews/d18oc_sst). The calibration models are available for broader use with the Open Source bayfox software library, described in section 4.7.

3 Results and Discussion

The four Bayesian calibration models differ in whether they account for species-specific differences (“pooled” vs. “hierarchical”) or the seasonality of foraminiferal abundance (“annual” vs. “seasonal”). Our “pooled annual” model combines all five species together and calibrates core top δ18Oc to annual mean SSTs. The “hierarchical annual” model also calibrates to annual mean SSTs but allows the calibration parameters to vary for each species. The “pooled seasonal” model and “hierarchical seasonal” model use the same pooled and hierarchical designs but are calibrated with our seasonal SST estimates, based on sediment trap fluxes (see section 2.3), instead of annual SSTs.

3.1 The Core Top Relationship With SST

Within the core top data used for calibration, the relationship between annual SST and core top isotopic fractionation (δ18Oc − δ18Osw) is strongly negative for both the pooled data set (r = −0.97; p ≪ 0.01) and individual species data sets. The relationship sits close to previous calibrations based on inorganic calcite precipitation, live cultures, and plankton tows (Bemis et al., 1998; Kim & O'Neil, 1997; Mulitza et al., 2003a; O'Neil et al., 1969; Figure 4). Despite this, the core top data have notable spread and deviations relative to previous calibrations, which can be understood as the expression of uncertainty related to sedimentological factors (glacial sediment mixing from bioturbation, loss of core top material when coring, and low sedimentation rates), biological factors (seasonal abundance and vital effects), and uncertainties in δ18Osw.

Figure 4. Scatter plots showing core top fractionation (δ18Oc − δ18Osw) and annual SST. (a) Pooled foraminifera species. (b) Individual species. Dotted line is the inorganic calcite temperature relationship of Kim and O'Neil (1997). Annotations are Pearson correlation (r) between core top fractionation and annual mean SST. All Pearson correlations are statistically significant (p ≪ 0.01).

All four Bayesian calibration models reasonably replicate the spread of the core top data when we predict core top fractionation (Figure 5). Calibration model spread is measured as the mean sample standard deviation of the posteriors (σ̄). The pooled models have larger spread than the hierarchical models (Figure 5; σ̄ = 0.54‰ for the pooled annual model and 0.51‰ for the pooled seasonal model, compared with 0.47‰ for the hierarchical annual model and 0.49‰ for the hierarchical seasonal model). These σ̄ values correspond to uncertainties in SST of approximately 2.5–2.8 °C (with β = −0.19) or 1.9–2.1 °C (with β = −0.25), depending on the regression slope (β) of the calibration model. For comparison, the Orbulina universa high-light and low-light culture calibrations of Bemis et al. (1998) have standard errors between 0.10‰ and 0.15‰, which represents 0.5 to 0.7 °C using their calibration β of −0.21. The wider standard errors of species-specific plankton tow-based calibration of Mulitza et al. (2003a) range from 0.21‰ to 0.32‰ (0.9 and 1.2 °C at their calibration β = −0.23 and β = −0.21, respectively), likely resulting from the broader range of depth habitats and thermocline structure captured during plankton tow collection. The prediction spread is larger in our calibration models because our compilation of global core top data contains a wider range of uncertainty (e.g., calcification depth, postdepositional bioturbation, and differences in depositional age).
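
The conversion from a spread in ‰ to an approximate SST uncertainty is a division by the magnitude of the slope. A minimal sketch follows; the function name is hypothetical and the inputs are the values quoted above.

```python
def permil_to_degc(sigma_permil, beta):
    """Approximate SST uncertainty (deg C) implied by a d18Oc spread (permil)."""
    return sigma_permil / abs(beta)


# Range of model spreads quoted in the text, at the two bounding slopes.
for beta in (-0.19, -0.25):
    for sigma in (0.47, 0.54):
        print(f"sigma={sigma} permil, beta={beta}: {permil_to_degc(sigma, beta):.2f} deg C")

# Culture calibration standard errors quoted in the text, at beta = -0.21.
for sigma in (0.10, 0.15):
    print(f"sigma={sigma} permil, beta=-0.21: {permil_to_degc(sigma, -0.21):.2f} deg C")
```

This first-order conversion ignores uncertainty in β itself and in δ18Osw, so it is best read as a rough guide rather than a substitute for the full posterior.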

Figure 5. Core top fractionation and SST compared with predictions from our (a) pooled annual model, (b) pooled seasonal model, (c) hierarchical annual model, and (d) hierarchical seasonal model. Our calibrations are compared with two inorganic calcite relationships (green lines), three culture calibrations (orange dots), and four species-specific calibrations from plankton tows (red dashes), over their respective calibration temperatures. The blue shading is the 95% credible interval (CI) of our calibration model prediction for the core top data (black circles). Annotated σ̄ is mean sample standard deviation, or mean standard error, of the core top prediction posteriors. The inorganic relationships are from O'Neil et al. (1969) and Kim and O'Neil (1997). Culture calibrations are from Bemis et al. (1998), for Orbulina universa (high light and low light) and Globigerina bulloides with 12-chambered shells. The plankton tow calibrations are from Mulitza et al. (2003a) for Globigerina bulloides, Globigerinoides ruber, Trilobatus sacculifer, and Neogloboquadrina pachyderma.

The posterior distributions of the slope parameter (β; the sensitivity of δ18Oc to seawater temperature) for the pooled annual and pooled seasonal models likewise fall near the bounds of thermodynamic expectation (Figure 6), with nearly identical mean values of −0.23. In the hierarchical models, β is allowed to deviate between species (βi), resulting in greater variability (Figure 6). In four of the five species, the median value of βi in the hierarchical annual model is lower than thermodynamic expectation (Figure 6). This tendency for lower sensitivity is also apparent in the posterior for the β hyperparameter (the slope parameter shared across all species), which is −0.18 ± 0.05 (95% CI) in the hierarchical annual model. In contrast, the hierarchical seasonal β hyperparameter more closely matches that of equilibrium fractionation (−0.21 ± 0.04, 95% CI). However, species-specific offsets in sensitivity do persist; in particular, βi is lower for N. incompta and T. sacculifer. These differences could simply be related to the narrower range of temperatures used for most of the species-specific calibrations (Bemis et al., 2002); for instance, βi values in the seasonal model are close to the inorganic sensitivity, and G. bulloides, which has a wider temperature range than the other species, has a posterior βi that is very similar to the pooled calibration values (Figure 6).

Figure 6. Posterior distributions of the calibration slope parameter (βi) for each species in the hierarchical annual model (blue) and hierarchical seasonal model (orange). The gray dotted vertical lines show the range of slopes from inorganic calcite temperature relationships (Kim & O'Neil, 1997). The green vertical line denotes β from the pooled annual and seasonal calibrations (both have a small standard deviation of ∼0.0015).

3.2 Cross-Validation Statistics and Model Comparison

Model cross-validation statistics (LOOCV; Figure 7) suggest that predictive performance of the model improves when we account for seasonality of foraminiferal abundance and species-specific differences. The pooled seasonal model (LOOCV = 2,063 ± 65) outperforms the pooled annual model (LOOCV = 2,248 ± 74), and predictive performance further improves when we incorporate species-specific differences in the hierarchical annual model (LOOCV = 1,783 ± 68). The hierarchical seasonal model shows a decrease in validation performance (LOOCV = 1,952 ± 66) compared to the hierarchical annual model, but this does not necessarily mean that seasonality is unimportant. The lack of improvement partly reflects the fact that under the hierarchical model design, seasonal abundance differences can be subsumed under species-specific differences. This said, patterns in the residuals suggest that data from the Mediterranean region may explain the increased LOOCV in the hierarchical seasonal model (see discussion below).

Figure 7. Comparison of model predictive performance based on cross validation (LOOCV). Lower LOOCV values indicate relatively better predictive performance, that is, lower expected deviance for predictions on new data. Circles are LOOCV mean; whiskers show ±1 standard error.

3.3 Model Residual Trends and Spatial Patterns

Taken as a whole, the residuals of the pooled annual calibration model are well behaved (Figure 8a). However, if we use the pooled annual model to predict δ18Oc for individual species, strong trends emerge. At high SSTs, the model predicts more negative δ18Oc than observed, and at low SSTs, the model predicts more positive δ18Oc than observed, for all species except G. bulloides (Figure 8b). Predictions improve when the pooled seasonal model is used; residuals still retain similar species-specific trends (Figure 9a), but they are less severe, particularly for G. ruber and N. pachyderma. When the hierarchical calibration model is used for species-specific predictions, the trends in residuals are eliminated (Figures 10a and 11a). This, along with the LOOCV statistics, emphasizes that different species in the core top data have distinct δ18Oc responses to temperature, likely due to different lifestyles, depth habitats, and vital effects. Prediction is clearly improved by accounting for these differences.

Figure 8. Pooled annual calibration model core top residuals (observed − predicted mean) as scatter plots for (a) species pooled together, (b) species separated, and (c) maps of mean residuals for each core top grid point. Whisker bars in (a) and (b) are ±1 standard deviation.

Figure 9. Pooled seasonal calibration model residuals (observed − predicted mean) as (a) scatter plots and (b) maps of mean residuals for each core top grid point. Whisker bars in (a) are ±1 standard deviation.

Figure 10. Hierarchical annual calibration model residuals (observed − predicted mean) as (a) scatter plots and (b) maps of mean residuals for each core top grid point. Whisker bars in (a) are ±1 standard deviation.

Figure 11. Hierarchical seasonal calibration model residuals (observed − predicted mean) as (a) scatter plots and (b) maps of mean residuals for each core top grid point. Whisker bars in (a) are ±1 standard deviation.

Model residuals also show spatially coherent patterns. For example, residuals are generally negative for G. ruber and T. sacculifer in the Western Pacific Warm Pool and positive for G. bulloides near the Southern Ocean (e.g., Figure 8b). These patterns are most prominent in the pooled calibration models (Figures 8b and 9b) and in some cases are alleviated by explicitly accounting for seasonality and species-specific offsets in the pooled seasonal and hierarchical seasonal models, respectively (Figure 11). However, some residual structures persist and may reflect biological responses to true geographic differences in secondary environmental parameters, such as gradients in urn:x-wiley:palo:media:palo20756:palo20756-math-0079, nutrients, and light penetration.

For example, G. bulloides has a demonstrated preference for nutrient-rich waters with high turbidity, and seasonal abundance of G. bulloides is strongly tied to regional upwelling patterns (Abrantes et al., 2002; Gibson et al., 2016). Thus, G. bulloides δ18Oc may be strongly skewed to record temperatures from cold, upwelled waters. The positive δ18Oc residuals for G. bulloides in the Benguela Current, Peru Current, and other upwelling sites along the Southern Ocean boundary may reflect this habitat preference; these positive residuals persist even when accounting for seasonality and species-specific sensitivity (Figures 8–11). In the southwestern Atlantic Ocean, positive residuals (model predicts warmer temperatures) may reflect a response of G. bulloides to increased turbidity from seasonal river discharge from the Rio de la Plata estuary and/or variations in nutrient delivery related to wind direction and interactions with the Brazil Current (Piola et al., 2005). These factors might skew G. bulloides abundance to cooler seasons and/or a deeper depth habitat.

We also observe persistent positive mean δ18Oc residual values for G. ruber in the Mediterranean region in both the pooled seasonal and hierarchical seasonal models (Figures 9b and 11b). This group of residuals from the Mediterranean substantially contributes to the reduced performance of the seasonal hierarchical model (LOOCV = 1,952 ± 66) relative to the annual hierarchical model (LOOCV = 1,783 ± 68). With Mediterranean core top data excluded, the two hierarchical models perform similarly (1,716 ± 66 and 1,691 ± 67, respectively).
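The effect of the Mediterranean core tops on the model comparison can be tabulated directly from the LOOCV values quoted above. The snippet below is only an arithmetic sketch; the dictionary layout and the helper name `seasonal_minus_annual` are illustrative choices, and judging the gap against the individual standard errors is informal because the standard error of the difference is not reported.

```python
# LOOCV scores (mean, standard error) as quoted in the text; lower is better.
loocv = {
    "all core tops": {
        "hier_seasonal": (1952, 66),
        "hier_annual": (1783, 68),
    },
    "Mediterranean excluded": {
        "hier_seasonal": (1716, 66),
        "hier_annual": (1691, 67),
    },
}


def seasonal_minus_annual(scores):
    """Return the LOOCV gap (seasonal minus annual) for one data subset."""
    return scores["hier_seasonal"][0] - scores["hier_annual"][0]


gaps = {subset: seasonal_minus_annual(scores) for subset, scores in loocv.items()}

for subset, gap in gaps.items():
    print(f"{subset}: seasonal - annual LOOCV = {gap}")
```

With all core tops the gap is well over twice either model's standard error, whereas with the Mediterranean data removed it falls below one standard error.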

The Mediterranean bias in the seasonal models suggests that either our estimation of seasonality for this region is problematic or that there are other factors influencing foraminiferal calcification that are not accounted for in our basic model setup. Our seasonality estimates predict that G. ruber abundance and shell flux should be skewed toward boreal summer and fall, which agrees well with plankton tow surveys in the Mediterranean (e.g., Pujol & Grazzini, 1995). We therefore favor the second explanation; that is, that there is another mechanism that might explain the Mediterranean model residuals. We can rule out issues related to bioturbation because the majority of the core top samples for these sites have age control with dates placing them in the Holocene or Late Holocene (Sabbatini et al., 2011). Postdepositional overgrowth from the supersaturation of bottom water calcite is known to complicate the use of the foraminiferal Mg/Ca proxy in the Mediterranean (e.g., Kontakiotis et al., 2011) and could feasibly also account for the positive δ18Oc residuals. However, if this were the case, we would expect to see strong positive residuals in other Mediterranean species and not just G. ruber. Another explanation for this apparent bias in the Mediterranean relates to habitat depth. During the summer-fall season, G. ruber expands its depth habitat within the water column and therefore may be recording cooler temperatures. This is supported by substantial abundance of G. ruber observed down to 100 m in the summer-fall season in the Mediterranean (Pujol & Grazzini, 1995). Yet another source of uncertainty is the seasonal change in δ18Osw and salinity (of 1 PSU) observed in the Mediterranean (e.g., MEDAR Group, 2002), which is not captured in the LeGrande and Schmidt (2006) δ18Osw data.

We also observe a recurring pattern of positive δ18Oc residuals (warmer predicted temperatures) near major oceanic frontal boundaries. This occurs for G. bulloides along the Southern Ocean front, for N. incompta near the confluence of the Agulhas and Benguela Currents, and for N. pachyderma near the boundary between the Labrador Current and Gulf Stream in the North Atlantic (Figure 11b). Such residual patterns could reflect advection of foraminifera shells that calcified along the colder side of these boundaries (Martínez-Méndez et al., 2010). However, sediment trap studies suggest that the rapid settling rate of foraminifera shells is unlikely to result in a strong advection bias (e.g., King & Howard, 2005), and differences in habitat related to water masses may be strongly dependent on species (e.g., Dyez et al., 2014). An alternate explanation is that sharp hydrological gradients in these frontal regions may bias the estimation of δ18Osw from limited measurements (LeGrande & Schmidt, 2006; Waelbroeck et al., 2005). With relatively coarse 1° × 1° fields, SST and especially δ18Osw in boundary regions between water masses are likely poorly characterized.

A related issue is that assumptions about local δ18Osw-salinity relationships may introduce a region-specific uncertainty in δ18Osw, which is translated into highly coherent spatial patterns in δ18Oc residuals. For example, G. bulloides, G. ruber, and T. sacculifer all have negative δ18Oc mean residuals (generally colder predicted temperatures or higher predicted δ18Oc than observed) in the eastern boundary upwelling regions of the Atlantic Ocean (Figure 11b). These trends are present in both the annual and seasonal versions of the hierarchical model. It is unlikely that these negative residuals represent a biological bias. Sediment trap studies off the coast of Saharan Africa demonstrate that G. ruber is more abundant during the winter months (Abrantes et al., 2002), yet the negative model residuals in δ18Oc suggest that G. ruber is calcifying in warmer SSTs than predicted by the model. Here, a seasonal bias in the abundance of warm-water species is clearly not the cause of the geographic bias in model residuals, and a more likely candidate can be found in the predictor variables.

3.4 Summary of Calibration Model Performance

All four of our calibration models replicate the center and spread in the core top record (with model σ̄ from 0.54‰ to 0.47‰) while reasonably reproducing the temperature-δ18Oc relationship given in established equilibrium calcite relationships (e.g., Kim & O'Neil, 1997; O'Neil et al., 1969), as well as relationships determined for foraminiferal calcite from live culture and plankton tows (e.g., Bemis et al., 1998; Mulitza et al., 2003b). There are species-specific trends in the residuals for both pooled models; these are eliminated in the hierarchical models as the latter allow the regression parameters to vary by species. All models show some residual bias along oceanic fronts and upwelling zones, where dynamic hydrography may introduce complex seasonal patterns in abundance and habitat depth. Overall, model performance improves when accounting for foraminiferal seasonality and species-specific variability, with the exception of the hierarchical seasonal model versus the hierarchical annual model. As discussed, the reduced performance in the hierarchical seasonal model is related to the unusual behavior of G. ruber data in the Mediterranean, which may reflect depth habitat migration. Even though the hierarchical seasonal model objectively performs worse by the LOOCV metric, it produces posterior distributions of temperature sensitivity (βi) in closer agreement with thermodynamic expectations (Figure 6). As we discuss below, the pooled annual model is a more appropriate choice for applications to extinct planktic species in the geologic past.

4 Examples and Applications

A key benefit of our Bayesian core top calibrations is that the models can propagate uncertainty from calibration into predictions about past climate conditions. We demonstrate this using several downcore examples from different oceanographic settings. For each example, we apply our calibration models to foraminiferal δ18Oc and compare our results with SSTs inferred from independent organic geochemistry records from the same core sites (either UK′37 or TEX86 data) or sediment trap measurements. Inference of SST from δ18Oc requires priors on SST means and standard deviations. We take these from Table 1, multiplying the standard deviation by 2. The δ18Oc data are ice volume corrected before calibration to remove the changes in global δ18Osw associated with ice sheets (see appendix). In all cases we use modern, annual δ18Osw estimates from LeGrande and Schmidt (2006), unless noted otherwise.
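The role of the SST prior in this inversion can be illustrated with a closed-form normal update. The sketch below is a deliberate simplification and not the procedure implemented in bayfox: it assumes a single fixed intercept, slope, and error standard deviation rather than full posterior draws, the intercept value is a hypothetical placeholder, and the −0.27‰ VSMOW-to-VPDB offset is a commonly used approximation that is assumed here. Only the slope of −0.23 is taken from the pooled results above.

```python
import math


def sst_posterior(d18oc, d18osw_vsmow, prior_mean, prior_std,
                  alpha, beta, sigma, vsmow_to_vpdb=-0.27):
    """Normal-normal update for SST given one foraminiferal d18O value.

    Assumed model (illustrative only):
        d18oc = alpha + beta * SST + (d18osw_vsmow + vsmow_to_vpdb) + noise
        noise ~ Normal(0, sigma), with alpha, beta, sigma held fixed.
    Returns the posterior mean and standard deviation of SST.
    """
    resid = d18oc - alpha - (d18osw_vsmow + vsmow_to_vpdb)
    prior_prec = 1.0 / prior_std ** 2        # precision of the SST prior
    like_prec = beta ** 2 / sigma ** 2       # precision contributed by the data
    post_var = 1.0 / (prior_prec + like_prec)
    post_mean = post_var * (prior_mean * prior_prec + beta * resid / sigma ** 2)
    return post_mean, math.sqrt(post_var)


# Hypothetical inputs: a tabulated prior SD of 5 deg C doubled to 10 deg C,
# mirroring the doubling of the prior standard deviation described above.
mean, sd = sst_posterior(
    d18oc=-1.0, d18osw_vsmow=0.27,
    prior_mean=20.0, prior_std=2 * 5.0,
    alpha=3.0,    # hypothetical intercept, not a fitted value
    beta=-0.23,   # pooled slope quoted in the text
    sigma=0.5,    # hypothetical error SD in permil
)
print(f"posterior SST: {mean:.2f} +/- {sd:.2f} deg C")
```

With a wide prior the posterior mean sits close to the value implied by the data alone, which is the motivation for inflating the prior standard deviation.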

4.1 G. bulloides in the Mediterranean: A Cosmopolitan Foraminifera Species

G. bulloides has a wide temperature tolerance (2.8–29.6 °C, Bé, 1977; Hemleben et al., 1989, Figure 2), so the seasonal and annual models perform effectively the same for this particular species; this is evident in the similar posterior distributions of β and βi (Figure 6). To investigate model performance on G. bulloides, we used our hierarchical annual calibration, though results are nearly identical for the seasonal calibration. As a test case, we applied the calibration to a G. bulloides δ18Oc time series from core MD95-2043, spanning from 52 ka to present in the eastern Alboran Sea in the Mediterranean (Cacho et al., 1999). We used a modern δ18Osw value of 1.05‰ (VSMOW; LeGrande & Schmidt, 2006). Resulting SSTs are compared to an alkenone UK′37 reconstruction from the same site, calibrated with BAYSPLINE (Tierney & Tingley, 2018).

Our model-based G. bulloides reconstruction matches the mean and variability of the alkenone-based reconstruction remarkably well (Figure 12a), though the alkenone posterior is noticeably tighter ( urn:x-wiley:palo:media:palo20756:palo20756-math-0107 °C) than for G. bulloides ( urn:x-wiley:palo:media:palo20756:palo20756-math-0108 °C). However, the BAYSPLINE calibration explicitly calibrates alkenone data in the Mediterranean to November–May SSTs (Tierney & Tingley, 2018), and indeed the late Holocene values approach the modern November–May value of 16.5 °C. The median predicted values from G. bulloides are cooler than the present-day annual range and generally follow the alkenones. This may reflect a slight cool-seasonal bias for G. bulloides, which has peak abundance in the winter and spring in the Western Mediterranean (Bárcena et al., 2004; Rigual-Hernández et al., 2012). However, the uncertainty bounds are large and the latest Holocene estimates do encompass modern mean annual SSTs (18.8 °C; Boyer et al., 2013). Within calibration uncertainty, the model reconstruction using G. bulloides reasonably estimates annual SST.

Figure 12. SST reconstructions from sediment core records using our annual (a–c) and seasonal (d) hierarchical SST calibrations, compared with Bayesian alkenone UK′37 reconstructions using the BAYSPLINE calibration (Tierney & Tingley, 2018) at three core sites. (a) Globigerina bulloides in the Alboran Sea in the Mediterranean (Cacho et al., 1999). (b) Neogloboquadrina pachyderma in the Gulf of Alaska (Davies et al., 2011; Davies-Walczak et al., 2014; Praetorius et al., 2015). (c, d) Globigerinoides ruber in the eastern equatorial Pacific (Koutavas & Sachs, 2008; Koutavas & Joanides, 2012). The Alboran Sea and Gulf of Alaska alkenone reconstructions reflect SSTs from November to May and June to August, respectively (Tierney & Tingley, 2018). Solid lines are posterior distribution means with ±1 standard deviation shading. Solid green bars are the site modern annual or seasonal SST means ±1 standard deviation, using monthly (1981–2017) National Oceanic and Atmospheric Administration Optimum Interpolation Sea Surface Temperature V2 (Reynolds et al., 2002).

4.2 High-Latitude Settings: Cool-Temperature Foraminifera

To investigate the performance of our model in a cold-water setting (Figure 12b), we apply our annual hierarchical calibration to a deglacial-to-Holocene N. pachyderma δ18Oc record from core EW0408-85JC in the Gulf of Alaska (Davies et al., 2011; Davies-Walczak et al., 2014; Praetorius et al., 2015). The reconstruction uses modern urn:x-wiley:palo:media:palo20756:palo20756-math-0110‰ (VSMOW; LeGrande & Schmidt, 2006). The results are similar if we use the seasonal hierarchical model (not shown). The N. pachyderma record shows similar changes through time as UK′37 but is offset from the latter by approximately 2–3 °C. At this location, the UK′37 calibration BAYSPLINE assumes a June–August bias and explicitly predicts summer temperatures; thus, the warm offset is expected and primarily reflects differences in the seasonal production. The N. pachyderma reconstruction is slightly cooler on average than the modern annual SST range. N. pachyderma occupies a broad depth habitat range in the North Pacific (Kuroyanagi et al., 2011), and so the slightly cooler SSTs predicted may reflect the offset between sea surface and actual N. pachyderma habitat depth, which we do not explicitly account for with our calibrations. Additionally, N. pachyderma tends to have peak abundance in the spring and late winter in the Gulf of Alaska (Sautter & Thunell, 1989), when modern SSTs are closer to the late-Holocene predictions. Regardless, the reconstructed Holocene SSTs fall within uncertainty of present-day mean annual values, suggesting that our model is reasonably accurate at estimating annual SST from N. pachyderma δ18Oc at this location.

4.3 Annual Versus Seasonal Calibration

To explore the impact of accounting for seasonality, we apply both the annual and seasonal hierarchical models to a G. ruber record from core VM21-30 in the eastern equatorial Pacific spanning the last 30 ka (Koutavas & Sachs, 2008; Koutavas & Joanides, 2012). At this site, the effect of model selection is notable, with the annual model resulting in overall cooler temperatures and a larger range of variability (Figure 12c) than the seasonal model (Figure 12d). In both cases we used a modern δ18Osw of 0.24‰ (VSMOW; LeGrande & Schmidt, 2006). The annual calibration and an alkenone-based reconstruction have similar uncertainties (both urn:x-wiley:palo:media:palo20756:palo20756-math-0115 °C; Figure 12c). The seasonal calibration results in temperatures that agree more closely with the alkenone-inferred SSTs, both in the mean and magnitude of reconstructed trends (Figure 12d), as well as producing a tighter reconstruction ( urn:x-wiley:palo:media:palo20756:palo20756-math-0116 °C). Observations of G. ruber in the tropical Pacific show peak abundance for warm SSTs (Thunell et al., 1983), and our flux-based estimates suggest that G. ruber at this site should be seasonally biased toward December through May, the time of year when SSTs are at their warmest (Fiedler & Talley, 2006). The fact that the alkenone reconstruction best matches our seasonal G. ruber predictions (Figure 12d) suggests that the UK′37 record could also be biased toward warmer months rather than recording annual SSTs. However, there is no indication of warm bias in the eastern equatorial Pacific core top alkenone data (Kienast et al., 2012; Tierney & Tingley, 2018), and the urn:x-wiley:palo:media:palo20756:palo20756-math-0118 predictions during the Holocene still overlap with the range of annual SSTs at this site.

4.4 Influence of Freshwater Input and Changes in δ18Osw

To see the impact of freshwater inputs on a δ18Oc-based temperature reconstruction, we apply our calibration to a G. ruber record from core GeoB 6518-1 (Schefuß et al., 2005) on the west coast of Africa in the Gulf of Guinea, near the mouth of the Congo River (Figure 13a). The core spans the last deglaciation (20 ka to present), which saw dramatic changes to the central African hydroclimate (Gasse, 2000; Schefuß et al., 2005). Alkenone UK′37 data, which are not affected by changing δ18Osw or freshwater input, are available from this site for comparison with δ18Oc (Schefuß et al., 2005). As in earlier examples, we apply our hierarchical seasonal calibration to the G. ruber δ18Oc record and use BAYSPLINE (Tierney & Tingley, 2018) to reconstruct temperature from the UK′37 record. The estimated modern δ18Osw for the site is 0.52‰ (VSMOW; LeGrande & Schmidt, 2006), and our estimated peak seasonal growth for G. ruber at this site is from September to June. The reconstructions from δ18Oc and UK′37 are relatively similar in the Late Holocene, the Younger Dryas (∼12 ka), and the Last Glacial Maximum. Outside of these periods, we see the foraminifera-based reconstruction diverge from the alkenone-based reconstruction, predicting warmer temperatures. These periods (the Early Holocene and Bølling-Allerød) correspond to times of larger freshwater inputs from increased precipitation across the extensive Congo River basin (Schefuß et al., 2005). This application demonstrates that, in coastal regions with large freshwater inputs, δ18Oc will be biased toward more negative values, even when our new calibration models are applied.

Figure 13. Other example applications. (a) The seasonal hierarchical calibration applied to G. ruber to reconstruct SST in the Gulf of Guinea (Schefuß et al., 2005). Separation between the foraminifera and alkenone-based reconstructions indicates periods with increased freshwater input. Solid lines are posterior distribution means with ±1 standard deviation shading. Solid green bars are the site modern annual or seasonal SST means ±1 standard deviation, using monthly (1981–2017) NOAA Optimum Interpolation Sea Surface Temperature V2 (Reynolds et al., 2002). (b) The annual hierarchical calibration used to predict δ18Oc for G. ruber in the northern Gulf of Mexico, compared with averaged sediment trap observations (2010–2013) from Richey et al. (2019). The predictions were run with δ18Osw of 1.66‰ and 0.86‰ (Vienna Standard Mean Ocean Water), based on the observed mean and range of δ18Osw for the site. These predictions used HadISST monthly average SSTs (Rayner et al., 2003), which reasonably replicated buoy-measured SST seasonality (Richey et al., 2019). All errors are ±1 standard deviation. (c) The annual pooled calibration applied to Morozovella spp. δ18Oc (John et al., 2008) analyzed in a Paleocene-Eocene Thermal Maximum section from Bass River, New Jersey. TEX86 (Sluijs et al., 2007) is calibrated using BAYSPAR (Tierney & Tingley, 2014). Samples are plotted against downcore depth. Solid and dotted lines show posterior means. Shading shows ±1 standard deviation. Dotted δ18Oc lines are the calibration run with 0.5‰ and −0.5‰ for δ18Osw, bracketing the value of around 0.0‰ modeled in Tindall et al. (2010).

4.5 Gulf of Mexico Sediment Traps: Predicting Monthly δ18Oc of G. ruber

In this example we test how well our calibration can replicate the seasonality of monthly δ18Oc measured from G. ruber (white and pink) at a sediment trap site, where SSTs and δ18Osw values are well constrained (Figure 13b). The site is in the northern Gulf of Mexico and has repeated δ18Oc measurements for each month from 2010 to 2013 (Richey et al., 2019). These years are pooled into monthly mean δ18Oc values. We use the hierarchical annual calibration and not the seasonal variant because the seasonal calibrations favor SSTs associated with high foraminifera abundance and so tend to be better at predicting peak abundance seasons rather than growth for each month of the year. We use HadISST (Rayner et al., 2003) monthly mean SSTs from the nearest grid point to the sediment trap site, as Richey et al. (2019) found that HadISST reasonably replicated the seasonality of SSTs when compared with measurements from a local buoy. We create two predictions, with δ18Osw values of 1.40‰ (VSMOW) and 0.86‰ (VSMOW), following the measured mean δ18Osw of 1.13‰ and annual range of 0.53‰ between 0 and 50 m depth (Richey et al., 2019). We lag our calibration predictions by 1 month to account for the time needed for foraminiferal tests to settle in the water column. Our calibration replicates the observed δ18Oc seasonal pattern remarkably well (Figure 13b). The closest predictions are from the model using urn:x-wiley:palo:media:palo20756:palo20756-math-0144‰. The absolute difference of its monthly prediction and observation means is between 0.02‰ and 0.24‰ for G. ruber (pink) and between 0.03‰ and 0.66‰ for G. ruber (white). This is an encouraging result given that the full prediction distribution has urn:x-wiley:palo:media:palo20756:palo20756-math-0145‰ and the additional spread in the sediment trap observations can be as high as σ = 0.47‰ (see Figure 13b).
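The forward prediction with a 1-month settling lag can be sketched as follows. This is a point-estimate illustration only, not the posterior-based prediction used for Figure 13b: the intercept is a hypothetical placeholder, the monthly SSTs are invented values standing in for HadISST, and the −0.27‰ VSMOW-to-VPDB offset is an assumed, commonly used approximation. The slope and the two seawater values are taken from the text.

```python
def predict_d18oc(sst, d18osw_vsmow, alpha, beta, vsmow_to_vpdb=-0.27):
    """Point prediction of foraminiferal d18O (permil) from SST (deg C)."""
    return alpha + beta * sst + (d18osw_vsmow + vsmow_to_vpdb)


def lag_one_month(monthly_values):
    """Shift a 12-month series so month m takes the value predicted for m-1.

    The series wraps around, so January receives the December prediction.
    """
    return monthly_values[-1:] + monthly_values[:-1]


# Hypothetical January-December SSTs (deg C); stand-ins for HadISST values.
monthly_sst = [20.5, 20.0, 20.8, 22.9, 26.0, 28.6,
               29.8, 30.1, 29.0, 26.6, 23.8, 21.6]

ALPHA = 3.0    # hypothetical intercept, not a fitted value
BETA = -0.23   # pooled slope quoted in the text

# One lagged prediction series per seawater value used in the text.
predictions = {
    d18osw: lag_one_month(
        [predict_d18oc(t, d18osw, ALPHA, BETA) for t in monthly_sst]
    )
    for d18osw in (1.40, 0.86)
}

for d18osw, series in predictions.items():
    print(d18osw, [round(v, 2) for v in series])
```

Because seawater composition enters additively, the two series differ by a constant 0.54‰ in every month; only the SST term shapes the seasonal cycle.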

4.6 Application to Deep-Time Paleoclimate Reconstructions

Data from Late Quaternary sediment cores benefit from application of our hierarchical models, but these models cannot be reliably applied to non-extant planktic species in deeper geological time. Here, our annual pooled model is a more appropriate choice, and this was one of our primary motivations for developing this calibration model. In the pooled model, all species are assumed to calcify similarly, which enables us to approximate a general “planktic dependency” of δ18Oc on annual SSTs. Arguably, this is the best first-order approach for applications to non-extant species, for which species-specific information such as seasonality, depth habitat, presence of symbionts, and crust formation is poorly constrained.

As a demonstration, we apply our pooled model to a δ18Oc record of the planktic Morozovella spp. from the Paleocene-Eocene Thermal Maximum marine section from Bass River, New Jersey (John et al., 2008) (Figure 13c). These specimens are “glassy” (well preserved) and thus are unlikely to show any of the isotopic overprinting common in “frosty” deep-sea specimens from this period, which can generate anomalously cold SST reconstructions from simplistic interpretation of δ18Oc measurements (Kozdon et al., 2011). We show reconstructions for δ18Osw = ±0.5‰, which are plausible values for this site (John et al., 2008), bounding the modeled value of 0.0‰ from Tindall et al. (2010). We compare the results to a TEX86-based SST reconstruction from the same section (Sluijs et al., 2007). The TEX86 data are calibrated to SSTs using the BAYSPAR analogue method as described in Tierney and Tingley (2014). We use a search tolerance of 0.15, resulting in 11 modern analog grid points. We use a wide, weakly informative prior for SST (Gaussian, 30.0 °C mean and 20.0 °C standard deviation) for both the TEX86 and foraminiferal data.

The inferred SSTs from the Morozovella spp. and TEX86 data generally agree with one another and overlap within uncertainties, though there is increased separation just below 355 m, near the onset of the Paleocene-Eocene Thermal Maximum (Figure 13c). It is tempting to interpret this as temporary differentiation of depth habitat (e.g., Morozovella spp. migrating deeper in the water column and/or TEX86 producers forced to the surface), but the large uncertainties on both SST reconstructions must also be considered. We note that the average standard deviation of the foraminiferal reconstruction ( urn:x-wiley:palo:media:palo20756:palo20756-math-0150 °C) is much smaller than the uncertainty on the TEX86 estimates ( urn:x-wiley:palo:media:palo20756:palo20756-math-0151 °C). This can be attributed to the fact that the TEX86 values for this sequence (which approach 0.9) are far outside the range of the calibration data set, which contains only a few values above 0.75, and therefore require heavy extrapolation. Another important factor is that overall, the temperature sensitivity of TEX86 is poorly constrained relative to β in our δ18Oc calibrations. Indeed, β is well estimated by our pooled model and remarkably similar to thermodynamic expectation (annual pooled β posterior 95% CI is between −0.234 and −0.228) for modern species. That said, the sensitivity of the δ18Oc proxy to δ18Osw must be considered, where a difference of 1.0‰ corresponds to approximately 4.1 °C (Figure 13c). In an unglaciated Eocene world at 55 Ma, global δ18Osw can be fairly well estimated, but poorly constrained local variations in δ18Osw are another potential source of uncertainty in interpreting δ18Oc. This deep-time example underlines the importance of constraining variability and quantifying uncertainty in δ18Osw when using δ18Oc to reconstruct SST.

Overall, these examples show that our calibration models produce reasonable results when applied to foraminiferal records of δ18Oc. Our SST reconstructions compare favorably with independent SST reconstructions using biomarker-based proxies in a variety of paleoceanographic settings and using different planktic species. We are also able to predict the subannual δ18Oc seasonality found in sediment trap measurements. Importantly, these examples demonstrate how our models incorporate multiple sources of uncertainty in reconstructing SST, uncertainty that is often not reported or considered in published downcore records and subsequent interpretations. This fundamental uncertainty in SST reconstructions, which is inherent in any paleo-proxy, needs to be considered when comparing records, integrating SST reconstructions across geographic regions, or comparing data to climate model output.

4.7 bayfox: Bayesian Foraminifera Calibration Software

Our calibration models are available to users as a software library called bayfox. This library is packaged and available in both Python (https://github.com/brews/bayfox) and R (https://github.com/brews/bayfoxr). Scripts are also available for MATLAB/Octave (https://github.com/brews/bayfoxm). These packages include both forward and reverse calibration models so that users can infer δ18Oc from SSTs and infer SSTs from δ18Oc. The software is available under an Open Source license.

5 Conclusions

Our Bayesian calibration models enhance the widespread use of planktic δ18Oc in paleoceanography by providing a realistic representation of proxy uncertainty based on core top variability. We find that, in spite of the many biological and environmental factors that can influence planktic δ18Oc, the inferred sensitivity to SST is remarkably similar to established inorganic calcite calibration curves, attesting to the fidelity of δ18Oc. Our annual, seasonal, and species-specific calibration exercises demonstrate that model performance is improved by accounting for foraminiferal seasonal abundance and species-specific variability in calibration parameters. However, some residual patterns remain and can be difficult to diagnose due to the complex set of environmental factors that control the abundance of each foraminiferal species, as well as uncertainty in observed variables, specifically δ18Osw. We demonstrate how the calibration can be used to reconstruct SST in the Late Quaternary, where, generally speaking, the most applicable model is the hierarchical seasonal model. We demonstrate how the calibration can replicate the seasonal signal of δ18Oc observed in Gulf of Mexico sediment traps. We also demonstrate how our pooled annual model can be used to infer SSTs from δ18Oc of non-extant species of planktic foraminifera in deeper geological time. We have made the calibration models available in Open Source software libraries (bayfox), so that users can apply these calibrations to both forward and inverse δ18Oc modeling problems.

Acknowledgments

This research was funded by the Heising-Simons Foundation (2016-015) and the National Science Foundation (AGS-1602156). We thank our Editors and anonymous reviewers for their time and thoughtful comments. Thanks to Kaustubh Thirumalai for help collecting sediment trap data. Core top data used for this analysis are available as supporting information. Open Source Software packages implementing these calibrations are available for Python (https://github.com/brews/bayfox) and the R statistical environment (https://github.com/brews/bayfoxr). Scripts are also available for MATLAB/Octave (https://github.com/brews/bayfoxm).

    Appendix A: Bayesian Regression Model Design and Priors

    As described in section 2.4, we trained four Bayesian calibration models using two model designs: one that pools all species together and one that uses a hierarchical structure to account for variations between species. The pooled model likelihood is described in equation 1, where α, β, and τ are the regression intercept, coefficient, and standard error, respectively. The parameters α, β, and τ were given weakly informative prior distributions:
    urn:x-wiley:palo:media:palo20756:palo20756-math-0171

    The normal priors around α and β loosely reflect existing inorganic precipitation and culture-derived calibrations from the literature (see Bemis et al., 1998). However, we found that less informed α and β prior distributions, for example, urn:x-wiley:palo:media:palo20756:palo20756-math-0172, produced comparable results.
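As a rough illustration of how a pooled forward model of this kind can be used, the Python sketch below draws posterior-predictive δ18Oc values for a given SST and δ18Osw. It assumes the likelihood has the form δc ~ Normal(α + β·SST + (δw − 0.27), τ). The 0.27‰ conversion term, the intercept value, and the spread of the parameter draws are assumptions made for illustration only, and the function names are hypothetical rather than part of bayfox. Only the slope (placed inside the reported 95% CI) and the approximately 0.54‰ standard error echo values reported in this paper.

```python
import random
import statistics

VSMOW_TO_VPDB = 0.27  # assumed seawater scale-conversion offset (per mil)


def simulate_posterior_draws(n, seed=0):
    """Return n hypothetical (alpha, beta, tau) draws.

    In practice these would be MCMC draws from the fitted model;
    here they are placeholders generated around illustrative values.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        alpha = rng.gauss(3.5, 0.02)       # intercept (per mil), placeholder
        beta = rng.gauss(-0.231, 0.0015)   # slope (per mil per deg C)
        tau = abs(rng.gauss(0.54, 0.01))   # residual standard error (per mil)
        draws.append((alpha, beta, tau))
    return draws


def predict_d18oc(sst, d18osw, draws, seed=1):
    """Posterior-predictive d18Oc (per mil) for one SST and d18Osw value."""
    rng = random.Random(seed)
    predictions = []
    for alpha, beta, tau in draws:
        mean = alpha + beta * sst + (d18osw - VSMOW_TO_VPDB)
        predictions.append(rng.gauss(mean, tau))
    return predictions


draws = simulate_posterior_draws(4000)
pred = predict_d18oc(sst=25.0, d18osw=0.5, draws=draws)
pred_sorted = sorted(pred)
print("median d18Oc:", round(statistics.median(pred), 2))
print("90% interval:", round(pred_sorted[200], 2), round(pred_sorted[3799], 2))
```

Propagating every parameter draw through the likelihood, rather than plugging in point estimates, is what lets calibration uncertainty and residual scatter both appear in the predicted interval.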

    The hierarchical model (equation 2) and its priors (equation 3) depend on shared hyperparameters
    urn:x-wiley:palo:media:palo20756:palo20756-math-0173

    With this design, updates to individual species parameters also inform shared hyperparameters. At the same time, hyperparameters influence the individual species parameters.

    We infer the posterior distribution of the models with a No-U-Turn Sampler (Hoffman & Gelman, 2014), an MCMC sampler variant using Hamiltonian mechanics. We initialized the No-U-Turn Sampler using an identity mass matrix with a diagonal adapted to the variance of the sampler tuning steps. Each sampling chain started at the prior mean with added uniform noise (between −1 and 1). The sampler was run in two chains, each with 1,000 tuning draws followed by 5,000 draws. We compared the chains for convergence and autocorrelation. Tuning steps were removed from the draws, and the chains were combined for analysis.
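The chain handling described above (noisy starts around the prior mean, discarded tuning draws, a convergence comparison, then pooling) can be sketched in a few lines of Python. This is not the sampler used in the paper: a simple random-walk Metropolis step on a toy standard-normal target stands in for the No-U-Turn Sampler so that the example stays self-contained. The Gelman-Rubin statistic is likewise an assumed choice of convergence check, since the text does not name the diagnostic that was used.

```python
import math
import random
import statistics


def log_posterior(x):
    """Toy target: standard normal log-density, up to a constant."""
    return -0.5 * x * x


def run_chain(seed, prior_mean=0.0, n_tune=1000, n_draws=5000, step=1.0):
    """Random-walk Metropolis chain (a stand-in for NUTS)."""
    rng = random.Random(seed)
    # Start at the prior mean plus uniform noise between -1 and 1.
    x = prior_mean + rng.uniform(-1.0, 1.0)
    samples = []
    for _ in range(n_tune + n_draws):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_posterior(proposal) - log_posterior(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples[n_tune:]  # drop the tuning draws


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled_var = (n - 1) / n * within + between / n
    return math.sqrt(pooled_var / within)


chains = [run_chain(seed=1), run_chain(seed=2)]
r_hat = gelman_rubin(chains)
combined = chains[0] + chains[1]  # pool the chains for analysis
print("R-hat:", round(r_hat, 3), "pooled draws:", len(combined))
```

Values of the statistic close to 1 indicate that the chains are sampling the same distribution, which is the usual condition for pooling them.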

    The forward models described above can be “reversed” to infer SSTs. Inference of SSTs, C, requires vectors of observed δc and δw, and Gaussian priors for SST. The full conditional posterior for C, given all other variables, is then a multivariate normal distribution
    urn:x-wiley:palo:media:palo20756:palo20756-math-0174
    where
    urn:x-wiley:palo:media:palo20756:palo20756-math-0175
    where urn:x-wiley:palo:media:palo20756:palo20756-math-0176 is the prior covariance matrix and μC is the prior mean vector. This inversion is very close to that described in Tierney and Tingley (2014) but includes the δ18Osw offset. Other parameters (τ, α, β) are drawn from the posterior of the forward calibration models. This applies to the pooled calibration model. For the hierarchical calibrations, τ, α, and β are replaced with their species-specific parameters τi, αi, and βi. Given these parameters, no additional steps are needed for the hierarchical calibrations.
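To make the inversion concrete, the following Python sketch works through the single-sample case, in which the prior covariance is taken to be diagonal so that each sample can be treated independently. It assumes the forward model δc = α + β·SST + (δw − 0.27) + ε with ε ~ Normal(0, τ); under that assumption the conditional posterior for SST is the standard conjugate normal update. The 0.27‰ term, the parameter values, and all function names are illustrative assumptions, not the bayfox implementation. The prior (30.0 °C mean, 20.0 °C standard deviation) matches the one used in the deep-time example.

```python
import random
import statistics

VSMOW_TO_VPDB = 0.27  # assumed seawater scale-conversion offset (per mil)


def sst_conditional(d18oc, d18osw, alpha, beta, tau, prior_mean, prior_std):
    """Mean and variance of SST given one d18Oc value and fixed parameters."""
    prior_precision = 1.0 / prior_std ** 2
    data_precision = beta ** 2 / tau ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # Part of d18Oc left to be explained by temperature.
    residual = d18oc - alpha - (d18osw - VSMOW_TO_VPDB)
    post_mean = post_var * (prior_precision * prior_mean
                            + beta * residual / tau ** 2)
    return post_mean, post_var


def sample_sst(d18oc, d18osw, param_draws, prior_mean, prior_std, seed=0):
    """One SST draw per (alpha, beta, tau) draw from the forward model."""
    rng = random.Random(seed)
    samples = []
    for alpha, beta, tau in param_draws:
        mean, var = sst_conditional(d18oc, d18osw, alpha, beta, tau,
                                    prior_mean, prior_std)
        samples.append(rng.gauss(mean, var ** 0.5))
    return samples


# Hypothetical parameter draws; real ones come from the calibration posterior.
param_draws = [(3.5, -0.231, 0.54)] * 2000
sst = sample_sst(d18oc=-2.045, d18osw=0.5, param_draws=param_draws,
                 prior_mean=30.0, prior_std=20.0)
print("posterior mean SST:", round(statistics.fmean(sst), 1))
print("posterior SD:", round(statistics.stdev(sst), 1))
```

Because the prior is wide, the posterior mean sits close to the value obtained by inverting the regression line directly, with the prior pulling it only slightly toward 30 °C.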

    Appendix B: Ice Volume Correction

    We applied a simple ice volume correction to the δ18Oc proxy records used in section 4. We assumed that the global δ18Osw change since the Last Glacial Maximum is 1.0‰, following the pore water estimates of Schrag et al. (1996). We applied this to scale the LR04 benthic stack (Lisiecki & Raymo, 2005), a proxy for global ice volume. This scaled time series is an estimate of global ice volume change in δ18O for seawater ( urn:x-wiley:palo:media:palo20756:palo20756-math-0180). We removed the ice volume change from a δ18Oc proxy sample with
    urn:x-wiley:palo:media:palo20756:palo20756-math-0182
    where δcor is the ice volume corrected urn:x-wiley:palo:media:palo20756:palo20756-math-0183 proxy sample and δice is urn:x-wiley:palo:media:palo20756:palo20756-math-0184 for the corresponding date of δc. If urn:x-wiley:palo:media:palo20756:palo20756-math-0185 had no exact corresponding date, δc was linearly interpolated from urn:x-wiley:palo:media:palo20756:palo20756-math-0186 values.

    Software implementing this ice volume correction is available for broader use in the Open Source Python package erebusfall (https://github.com/brews/erebusfall).
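The steps in this appendix (scale a benthic stack to a 1.0‰ change since the Last Glacial Maximum, interpolate it to each sample age, and remove it from the proxy value) can be sketched as follows. This is not the erebusfall code. The scaling rule, the Last Glacial Maximum age, and the use of simple subtraction for the removal step are assumptions made for illustration, and the short stack below contains placeholder numbers rather than LR04 values.

```python
from bisect import bisect_left


def interp(x, xs, ys):
    """Linear interpolation on ascending xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def scale_stack(ages, values, lgm_age=20.0, lgm_change=1.0):
    """Rescale a benthic stack so modern = 0 and the LGM = lgm_change.

    Assumed scaling rule; lgm_age (ka) is a hypothetical parameter.
    """
    modern = interp(0.0, ages, values)
    lgm = interp(lgm_age, ages, values)
    return [(v - modern) * lgm_change / (lgm - modern) for v in values]


def ice_volume_correct(sample_age, d18oc, ages, d_ice_series):
    """Remove the interpolated ice volume signal (assumed subtraction)."""
    d_ice = interp(sample_age, ages, d_ice_series)
    return d18oc - d_ice


# Placeholder stack (age in ka, benthic d18O in per mil); not LR04 data.
stack_ages = [0.0, 10.0, 20.0]
stack_values = [3.2, 4.0, 5.0]
d_ice_series = scale_stack(stack_ages, stack_values)

corrected = ice_volume_correct(15.0, -1.0, stack_ages, d_ice_series)
print("corrected d18Oc:", round(corrected, 3))
```

A sample whose age falls between two stack entries picks up a linearly interpolated ice volume term, which is the case the interpolation rule in this appendix is meant to handle.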