Volume 34, Issue 8 p. 1249-1270
Research Article

Calibration and Validation of Environmental Controls on Planktic Foraminifera Mg/Ca Using Global Core-Top Data

Casey P. Saenger (Corresponding Author)

Joint Institute for the Study of the Atmosphere and Ocean, University of Washington, Seattle, WA, USA

Correspondence to: C. P. Saenger, [email protected]

Michael N. Evans

Department of Geology and Earth System Science Interdisciplinary Center, University of Maryland, College Park, MD, USA

First published: 14 June 2019

Abstract

The Mg/Ca of planktic foraminifera is commonly assumed to be a univariate function of ocean temperature, but recent work suggests that nonthermal variables may have a secondary effect, thereby complicating an inverse approach for paleotemperature reconstructions. However, the significance of secondary variables has not been independently validated, and their inclusion may reflect statistical overfitting. Here we evaluate the significance of seven predictive variables on a global compilation (n = 1,124) of core-top planktic foraminifera Mg/Ca spanning five species. An additive approach was used to construct models that included only variables with significant validation skill. Optimal models support the use of Mg/Ca as a paleothermometer but also find evidence for nonthermal variables: (a) Mg/Ca (N. pachy+incompta) = 0.186 ± 0.001 · exp(0.095 ± 0.002 · T) + 2.95E-3 ± 5.4E-6 · size; (b) Mg/Ca (G. ruber) = 0.685 ± 0.005 · exp(0.058 ± 0.001 · T) + 0.928 ± 0.02 · Ωdeep; (c) Mg/Ca (G. inflata) = 1.737 ± 0.007 · exp(0.065 ± 0.001 · T) · Ωshallow^(−0.734 ± 0.003); and (d) Mg/Ca (G. bulloides) = 0.623 ± 0.007 · exp(0.079 ± 0.001 · T) + 0.548 ± 0.007 · Ωdeep. Results highlight the importance of independent validation and suggest that Ω and size dependences may complicate univariate inversions for paleotemperature in some species. This is particularly true on timescales over which secondary terms cannot be considered constant. Forward modeling is an attractive alternative, and improved Mg/Ca models may reduce bias in paleoceanographic data assimilation efforts. Salinity is absent from all relationships, suggesting that it may have a smaller influence than previously thought or may not vary sufficiently to be resolved at core-top sites. The stepwise approach illustrated in this study is widely applicable, and validation exercises should be applied to the calibration of other paleoceanographic proxies.

Key Points

  • Independent validation is required to evaluate the skill of any paleoclimate proxy calibration
  • Cross validation of global core-top Mg/Ca calibrations suggests nonthermal effects in all species of planktic foraminifera considered
  • The validity of inverse approaches for reconstructing temperature varies with species

Plain Language Summary

Variations in the amount of magnesium in the shells of foraminifera are a primary tool for estimating past ocean temperature. A common method for calibrating the relationship between magnesium and temperature is to compare shells in recent ocean sediments to instrumental temperatures. That relationship can then be applied to fossil shells in sediment cores. Laboratory culture experiments suggest that magnesium may also depend on additional variables other than temperature, which have been incorporated into calibrations. However, a calibration with many variables may fit observations too well, leading to errors when it is applied to independent data. Evaluating the performance of calibrations by validating them with independent data is critical to avoid this but is rarely applied in paleoceanography. We apply a validation method to evaluate which nontemperature variables are most important for five species and calculate calibrations that are most valid globally. We find evidence for at least one nontemperature variable for each species. The influence of variables other than temperature may complicate traditional approaches for reconstructing past ocean temperature, particularly over many millennia, but will help efforts to integrate paleoclimate data with climate simulations. The validation approach we apply is adaptable and should be applied to other paleoceanographic calibrations.

1 Introduction

The magnesium to calcium ratio (Mg/Ca) of planktic foraminifera is a widely applied technique for estimating past variations in the ocean's temperature (Barker et al., 2005; Elderfield & Ganssen, 2000; Lea et al., 2003; Oppo et al., 2009; Saenger et al., 2011). The varying depth habitats of different species create the potential to reconstruct ocean temperature from the surface to thermocline, but realizing this potential is contingent on accurately understanding both species-specific Mg/Ca-temperature relationships and the impact of additional controls on the Mg/Ca of foraminiferal calcite.

The Mg/Ca-temperature relationship is typically calibrated using foraminifera preserved in recent core-top sediments (Dekens et al., 2002; Elderfield & Ganssen, 2000), collected in sediment traps or plankton tows (Anand et al., 2003; Martínez-Botí et al., 2011; McConnell & Thunell, 2005), or reared in culture experiments (Allen et al., 2016; Kısakürek et al., 2008; Lea et al., 1999). Most calibrations consider Mg/Ca to be a univariate exponential function of temperature, but it has long been suggested that additional variables may also have an appreciable effect on Mg/Ca. For example, shell size (Elderfield et al., 2002; Friedrich et al., 2012), postdepositional diagenesis (Brown & Elderfield, 1996; Dekens et al., 2002; Nouet & Bassinot, 2007; Regenberg et al., 2014, 2006), pH and/or saturation state (Evans et al., 2016; Kısakürek et al., 2008; Russell et al., 2004), and salinity (Arbuszewski et al., 2010; Dueñas-Bohórquez et al., 2009; Hönisch et al., 2013; Kısakürek et al., 2008; Lea et al., 1999; Mathien-Blard & Bassinot, 2009) have all been suggested to impact Mg/Ca independent of temperature.

Multivariate calibrations have considered the influence of nonthermal variables on foraminiferal Mg/Ca, with salinity and variables associated with the carbonate system being the most common (Table 1). The latter may influence Mg/Ca both at shallow depths, as a primary control on Mg incorporation during calcification (Evans et al., 2016), and at the depth of sedimentary deposition, where diagenesis may preferentially remove high-Mg shell regions (Brown & Elderfield, 1996; Dekens et al., 2002). In shallow environments, both pH and carbonate ion concentration (CO3) have been suggested to affect Mg/Ca (Allen et al., 2016; Evans et al., 2016; Lea et al., 1999; Russell et al., 2004), while diagenetic effects at the sediment-water interface are usually expressed as the difference between seawater CO3 and that at saturation (∆CO3; Dekens et al., 2002; Khider et al., 2015; Regenberg et al., 2014, 2006).

Table 1. Published Multivariate Mg/Ca Calibrations
Species                                Variables               Archive                          Reference
G. ruber, G. sacculifer, N. dutertrei  T, [∆CO3]deep           Atlantic and Pacific core-tops   Dekens et al. (2002)
G. ruber                               T, S                    Culture                          Kısakürek et al. (2008)
G. ruber                               T, S, [∆CO3]deep        Global core-top                  Khider et al. (2015)
G. ruber                               T, S, [CO3]/pHshallow   Global sediment traps            Gray et al. (2018)
G. ruber                               T, S, [∆CO3]deep        Atlantic core-top                Arbuszewski et al. (2010; cf. Hertzberg & Schmidt, 2013)
G. ruber                               T, S                    Global core-top                  Mathien-Blard and Bassinot (2009)
G. bulloides, O. universa              T, S, pHshallow         Culture                          Lea et al. (1999)
G. bulloides, O. universa              T, [CO3]shallow         Culture                          Russell et al. (2004)

Although the potential influence of secondary variables has frequently been proposed, it is not typically tested against independent Mg/Ca data, that is, data not used to derive the calibration. Validating Mg/Ca calibrations with independent data is important for obtaining more accurate estimates of random and systematic uncertainty and can help avoid overfitting calibration models (Wilks, 2006). Overfitting describes the case in which additional independent variables (e.g., salinity, shell size, and ∆CO3) improve the fit to Mg/Ca in the calibration data set (e.g., core-top sediments) but do so at the expense of accurately estimating Mg/Ca in an independent data set with the same predictor variables (e.g., a downcore reconstruction). Thus, overfitting can reduce the generalizability of a calibration, introduce unrecognized systematic error, and silently increase the uncertainty in out-of-sample paleoceanographic applications of Mg/Ca.

Overfitting can be avoided by comparing model predictions with independent (i.e., out-of-sample) observations, the process of model validation. As an illustration, consider four data sets from our global compilation (section 2). Elderfield and Ganssen (2000) and Yu et al. (2008) evaluated the temperature dependence of Globigerina bulloides Mg/Ca preserved in North Atlantic core-top sediments. We fit their combined Mg/Ca data to an exponential function of temperature and, consistent with their results, find a strong correlation between Mg/Ca and temperature (Table 2). Although salinity was not considered in the original publications, adding it to the calibration improves both r² and RMSE (root-mean-square error), and further adding the calcite saturation state at the representative calcification depth (Ωshallow) improves both metrics again. These results suggest that a trivariate model of G. bulloides Mg/Ca is optimal; if this were true, however, one would expect the calibration to perform similarly well when applied to data not used to build the model structure or to determine its coefficients.

Table 2. Mg/Ca Calibration Overfitting Example
                                  T onlyᵃ        T, S           T, S, Ωshallow
Fit of combined EG00 & Y08 data to environmental variablesᵇ
  r²                              0.40 (0.008)   0.46 (0.003)   0.50 (0.002)
  RMSE (mmol/mol)                 0.63 (0.003)   0.61 (0.001)   0.60 (0.001)
Application of EG00 + Y08 calibration to C08 dataᵇ
  RMSE (mmol/mol)                 0.43 (0.006)   0.37 (0.004)   0.40 (0.007)
  CE                              0.27 (0.02)    0.45 (0.01)    0.32 (0.03)
Fit of combined EG00, Y08, & C08 data to T & Sᵇ
  r²                                             0.46 (0.003)
  RMSE (mmol/mol)                                0.58 (0.001)
Application of EG00 + Y08 + C08 calibration to QK17 dataᵇ
  RMSE (mmol/mol)                                1.17 (0.03)
  CE                                             −0.05 (0.07)

Note. CE = coefficient of efficiency; RMSE = root-mean-square error.
ᵃ Parenthetical values are the 99% confidence interval for each mean.
ᵇ EG00 is Elderfield and Ganssen (2000), Y08 is Yu et al. (2008), C08 is Cléroux et al. (2008), and QK17 is Quintana Krupinski et al. (2017).

The G. bulloides Mg/Ca data of Cléroux et al. (2008) span a similar North Atlantic spatial domain and provide an opportunity to evaluate these calibrations. Applying the temperature-only calibration derived from the Elderfield and Ganssen (2000) and Yu et al. (2008) data to the Cléroux et al. data yields a low validation RMSE, which is further reduced when salinity is added as a predictor but increases when Ωshallow is also added (Table 2). The coefficient of efficiency (CE) statistic (Cook et al., 1999; Nash & Sutcliffe, 1970) compares the variance resolved by validation predictions to the actual variance of the validation observations; positive values are typically interpreted as predictive skill greater than that of simply using the mean of the validation data. Adding salinity to the univariate temperature model increases CE, but further adding Ωshallow decreases it. Taken together, the RMSE and CE results suggest that a bivariate temperature-salinity model may be optimal and that including Ωshallow may produce overfitting. After the optimal variables are identified, best practice is to fit the final model using all data (Wilks, 2006). In this example, that is a new bivariate temperature-salinity calibration using the combined Mg/Ca data of Cléroux et al., Elderfield and Ganssen, and Yu et al., which has demonstrable skill at predicting Mg/Ca within the North Atlantic (Table 2). Whether this skill extends to other spatial domains requires another round of validation. Performing it with the more globally distributed data of Quintana Krupinski et al. (2017) yields a much higher RMSE and an unskillful (negative) CE (Table 2). Together, these results suggest that models for Mg/Ca may indeed be regionally specific, but independent validation of such models is necessary to demonstrate this.

The potential for model validation decreases as the amount of available data approaches the number of variables used in the calibration (Andersen & Bro, 2010), suggesting that validation has not been widely applied to Mg/Ca calibrations because any individual investigation produces relatively few data. However, the number of published Mg/Ca calibrations for several species is now sufficient to circumvent this challenge by aggregating data from multiple sources. Here we systematically evaluate the degree to which temperature, salinity, pH, and carbonate chemistry influence the Mg/Ca of five species of planktic foraminifera: Globigerinoides ruber, G. bulloides, Globorotalia inflata, Neogloboquadrina pachyderma, and Neogloboquadrina incompta. This study is motivated in part by recent efforts to understand climate variability during the last two millennia (Abram et al., 2016; Kaufman, 2014; McGregor et al., 2015; Tierney et al., 2015), and the five species considered are the most common in the Ocean2K metadatabase (McGregor et al., 2015). We are also motivated by recent advances in paleoclimate data assimilation (Hakim et al., 2016), whose efficacy relies in part on proxy system models (PSMs) that allow observations such as Mg/Ca to be mapped from climate simulations. Operationalizing such forward, or process-mimicking, models requires balancing simplicity and generalizability against validated precision and accuracy. To achieve these goals, we compile 1,124 previously published values to generate globally distributed data sets of core-top Mg/Ca for each species and use consistent gridded products to calibrate additive models of increasing complexity. Critically, our approach also includes rigorous validation of Mg/Ca relationships, which minimizes overfitting and the spurious inclusion of additional variables in calibrations.

The paper is organized as follows: The statistical models and methods for estimating regression coefficients are described in section 2. Section 3 presents results by species and regression model complexity. In sections 4 and 5 we discuss the justification of nonthermal predictors, based on the validation tests presented in section 3, and the paleoceanographic implications of the results, respectively. Our conclusions are in section 6.

2 Methods

2.1 Data Compilation

Published core-top Mg/Ca data (Figure 1 and supporting information) were compiled from original publications and public repositories, resulting in 1,124 data points from G. ruber (n = 440), G. bulloides (n = 183), G. inflata (n = 217), N. pachyderma (n = 127), and N. incompta (n = 157; Saenger & Evans, 2019). Compiled data approximate the global range of each species, spanning both hemispheres, multiple ocean basins, and depositional environments from <100- to >5,000-m water depth. Our compilation also considers a variety of sampled size fractions and accounts for variations in pretreatment, specifically the influence of reductive cleaning, using the empirical relationship of Rosenthal et al. (2004). We limit our compilation to core-top data because they include the potential postdepositional effects observed in downcore data sets, which are not captured by sediment trap, plankton tow, or culture data. While core-top foraminifera may not always be modern because of bioturbation, the bias introduced by calibrating their Mg/Ca against modern conditions is expected to be minimal, given that temporal variations in environmental variables are likely to be small compared to spatial variations. Comparison studies confirm this (Ortiz & Mix, 1997) and support the widespread use of core-top calibrations (Cléroux et al., 2008; Dekens et al., 2002; Schmidt & Mulitza, 2002; Tierney & Tingley, 2014).

Figure 1. Spatial distribution of core-top Mg/Ca values used in this study.

The complementary environmental data used for calibration and validation are derived from the 2013 World Ocean Atlas (WOA; Locarnini et al., 2013) for temperature and salinity and from version 2 of the Global Ocean Data Analysis Project for Carbon (GLODAPv2; Lauvset et al., 2016; Olsen et al., 2016) for pH and carbonate system parameters. Both products are gridded to 1° × 1° with 33 vertical levels. We use calcite saturation state (Ω) as a predictor in place of ∆CO3, which has the subtle advantage of accounting for the possibility of small changes in calcium ion concentration. Ω relates the in situ carbonate ion concentration to that at saturation as a ratio and is therefore unitless. Although subsurface GLODAPv2 data are available only as annual climatologies, seasonal WOA temperature and salinity averages were considered. These seasonal averages account for hemisphere, such that spring is the mean March–May value in the Northern Hemisphere and the mean September–November value in the Southern Hemisphere.
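The hemisphere-dependent season definition can be sketched as below. This is a minimal illustration, not the authors' code: it assumes a 12-month climatology for a single gridbox and depth, and the summer months (June–August in the Northern Hemisphere, December–February in the Southern Hemisphere) are our assumption by analogy with the stated spring definition. The names `SEASON_MONTHS` and `seasonal_mean` are hypothetical.

```python
import numpy as np

# Month indices (0 = January). Spring follows the text; summer is an ASSUMED analogue.
SEASON_MONTHS = {
    ("spring", "N"): [2, 3, 4],    # March-May
    ("spring", "S"): [8, 9, 10],   # September-November
    ("summer", "N"): [5, 6, 7],    # June-August (assumed)
    ("summer", "S"): [11, 0, 1],   # December-February (assumed)
}


def seasonal_mean(monthly, season, lat):
    """Average a 12-month climatology over a hemisphere-aware season.

    monthly: 12 values (January..December) for one gridbox and depth.
    season:  "annual", "spring", or "summer".
    lat:     latitude in degrees north; lat >= 0 is treated as Northern Hemisphere.
    """
    monthly = np.asarray(monthly, dtype=float)
    if monthly.shape != (12,):
        raise ValueError("expected 12 monthly values")
    if season == "annual":
        return float(monthly.mean())
    hemisphere = "N" if lat >= 0 else "S"
    return float(monthly[SEASON_MONTHS[(season, hemisphere)]].mean())
```

Swapping hemispheres by a six-month offset keeps "spring" meaning the same phase of the annual cycle at every site, which is the point of the hemisphere adjustment.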

Representative depth habitats for each species were used to extract temperature (T), salinity (S), pH, and Ω values from the 1° × 1° gridbox that best approximates the environmental conditions associated with each Mg/Ca datum. Specifically, G. ruber was assigned a depth habitat of 0–50 m (Anand, Elderfield, et al., 2003; Cléroux et al., 2008; Dekens et al., 2002; Elderfield & Ganssen, 2000), while N. pachyderma and G. bulloides were assigned depth habitats of 0–100 m (Cléroux et al., 2008; Elderfield & Ganssen, 2000; Marr et al., 2011; Nurnberg, 1995; Quintana Krupinski et al., 2017; Riveiros et al., 2016). Although the depth habitat of G. inflata is less well constrained, it has been suggested to correspond to the depth of the summer thermocline, near 100 m, north of 35°N but to be deeper, ~150–350 m, south of this latitude, including in the Southern Hemisphere (Anand, Elderfield, et al., 2003; Cléroux et al., 2013, 2008, 2007; Elderfield & Ganssen, 2000; Groeneveld & Chiessi, 2011). Sensitivity tests indicated that shifting the assigned depth habitats by ±50 m had a negligible effect (results not shown).
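The habitat assignments above can be encoded as depth ranges and intersected with the vertical levels of a gridded product. This sketch assumes the 33 standard depth levels of a WOA-style 1° climatology (seven of which, 0–100 m, lie in the upper 100 m); representing G. inflata's "near 100 m" habitat as the single 100 m level is our simplification, and the function names are hypothetical. N. incompta is omitted because its habitat is not stated in this paragraph.

```python
# ASSUMED standard depth levels (m) of a 33-level 1-degree climatology.
LEVELS_M = [0, 10, 20, 30, 50, 75, 100, 125, 150, 200, 250, 300, 400, 500,
            600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1750,
            2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500]


def habitat_range(species, lat):
    """Return the (top, bottom) habitat depth range in metres for a site."""
    if species == "G. ruber":
        return (0, 50)
    if species in ("N. pachyderma", "G. bulloides"):
        return (0, 100)
    if species == "G. inflata":
        # Summer thermocline (~100 m) north of 35N; deeper (150-350 m) elsewhere.
        return (100, 100) if lat > 35 else (150, 350)
    raise KeyError(f"no habitat assigned for {species!r}")


def habitat_levels(species, lat, levels=LEVELS_M):
    """Grid levels falling inside the species' habitat range at this latitude."""
    top, bottom = habitat_range(species, lat)
    return [z for z in levels if top <= z <= bottom]
```

For example, `habitat_levels("G. bulloides", 60)` returns the seven levels from 0 to 100 m.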

The seasonality of calcification was also assigned based on the results of a recent global sediment trap study. Jonkers and Kučera (2015) suggested that G. ruber calcifies year-round in warm waters but is biased toward the warm season where mean annual temperature is below 25 °C. We therefore used mean annual values for all variables where mean annual temperature exceeded this threshold and summer values at cooler sites. Similarly, Jonkers and Kučera (2015) showed that N. pachyderma and G. bulloides shell flux maxima occur earlier at lower latitudes and later at high latitudes, in correspondence with peak productivity. We approximated this pattern by using spring values at sites where mean annual temperature exceeded 10 °C and summer values at cooler sites. Finally, Jonkers and Kučera (2015) found that G. inflata calcification consistently occurs in spring, and we used spring values in our calibration of this species.
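The seasonality rules above amount to a small decision table keyed on species and mean annual temperature. The sketch below encodes only the thresholds stated in the text; `calcification_season` is a hypothetical name, and N. incompta is left out because no rule is given for it here.

```python
def calcification_season(species, mean_annual_temp_c):
    """Return which climatology ("annual", "spring", or "summer") to use for a site.

    Thresholds follow the text (after Jonkers & Kucera, 2015).
    """
    if species == "G. ruber":
        # Year-round calcification in warm water; warm-season bias at or below 25 degC.
        return "annual" if mean_annual_temp_c > 25 else "summer"
    if species in ("N. pachyderma", "G. bulloides"):
        # Flux maxima occur earlier (spring) at warmer, lower-latitude sites.
        return "spring" if mean_annual_temp_c > 10 else "summer"
    if species == "G. inflata":
        # Consistently spring calcification.
        return "spring"
    raise KeyError(f"no seasonality rule for {species!r}")
```

A site exactly at a threshold does not "exceed" it and so falls into the cooler category, matching the wording above.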

Assuming a fixed depth habitat differs from the approach commonly used in core-top calibrations focused primarily on temperature (e.g., Cléroux et al., 2008; Elderfield & Ganssen, 2000). These studies typically use paired oxygen isotope (δ18O) analyses to estimate calcification temperature, which is then regressed against Mg/Ca. The δ18O-based approach has the advantage of potentially accounting for between-site variations in the depth habitat and seasonality of foraminiferal calcification, assuming the δ18O of seawater is invariant. The approach is not without its own uncertainties, however, and comparisons of δ18O-based and in situ temperatures can show offsets of up to 3 °C (Anand, Elderfield, et al., 2003; Gray et al., 2018). δ18O-based temperatures are also poorly suited to multivariate calibrations because independent proxies for variables other than T either do not exist or have large uncertainties (Allen et al., 2016). It is encouraging, however, that the generally good agreement between our assigned habitat temperatures and previous δ18O-based estimates supports the validity of both approaches (Figure 2).

Figure 2. Comparison of the habitat temperature assigned in this study to originally published oxygen isotope-based calcification temperatures (when available) for each species. The black line indicates the 1:1 relationship.

Finally, we also considered carbonate system variables by extracting pH and Ω values from gridded products both over the assigned depth habitat and at the water depth of each core site. The subscripts “shallow” and “deep” are used to distinguish between values that reflect conditions during calcification (pHshallow and Ωshallow) and those of the depositional environment at the sediment-water interface (pHdeep and Ωdeep). If the gridbox covering the latitude and longitude of a core site did not include data, the mean of immediately adjacent gridboxes was used.
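The gap-filling rule (use the mean of immediately adjacent gridboxes when a site's own gridbox is empty) can be sketched as follows. This is an illustration under stated assumptions: "adjacent" is taken to mean the up-to-eight surrounding boxes, NaN marks missing data, and longitude is not wrapped at the dateline. `value_at_site` is a hypothetical name.

```python
import numpy as np


def value_at_site(field, i, j):
    """Return field[i, j], or the mean of adjacent gridboxes if it is missing.

    field: 2-D array (lat x lon) with NaN marking gridboxes without data.
    Neighbors are the up-to-eight surrounding boxes (no longitude wrap-around).
    """
    value = field[i, j]
    if not np.isnan(value):
        return float(value)
    # 3 x 3 window clipped at the array edges; the empty center is ignored by nanmean.
    window = field[max(i - 1, 0): i + 2, max(j - 1, 0): j + 2]
    if np.all(np.isnan(window)):
        return float("nan")  # no usable neighbors either
    return float(np.nanmean(window))
```

Returning NaN when every neighbor is also empty keeps the failure visible rather than silently propagating a fill value into the regression.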

2.2 Model Construction, Calibration, and Validation

A stepwise regression approach with bootstrapped sampling of validation statistics was used to identify the subset of seven candidate predictor variables (T, S, pHshallow, Ωshallow, pHdeep, Ωdeep, and size fraction) that produced reliably skillful estimates of Mg/Ca. Consistent with culture data (Hönisch et al., 2013; Kısakürek et al., 2008; Russell et al., 2004) and most empirical calibrations (Gray et al., 2018; Khider et al., 2015), we model Mg/Ca as an exponential function of T and S (Figures 3a and 3b). A power function was used to relate Ωshallow to Mg/Ca, as suggested by a recent compilation of culture data (Evans et al., 2016; Figure 3c). The relationship between Mg/Ca and pH at the time of calcification has been modeled using logistic and power functions (Evans et al., 2016) but has also been suggested to be linear (Allen et al., 2016; Evans et al., 2016; Russell et al., 2004); for simplicity, we model Mg/Ca as a linear function of pHshallow (Figure 3d). Mg/Ca was also modeled as a linear function of the size fraction from which shells were picked (Elderfield et al., 2002; Friedrich et al., 2012), which we characterized using the mean of the size range. Finally, we considered Mg/Ca to be linearly proportional to carbonate parameters at the sediment-water interface below a specific critical threshold (Dekens et al., 2002; Regenberg et al., 2014). Calibrations used a threshold of Ωdeep < 1.8, which approximates the ∆CO3 < 21 μmol/kg threshold suggested by Regenberg et al. (2014), but other values were considered when Ωdeep was selected as a significant variable. Sites at which Ωdeep exceeded the assigned threshold were retained, so that all data could contribute to multivariate regressions, but their Ωdeep values were set to the threshold value. The critical value for pHdeep was set to the mean pHdeep of sites that exceeded the assigned Ωdeep threshold. If either pH or Ω was selected as a significant variable, the other was excluded, such that models could not include both pHshallow and Ωshallow or both pHdeep and Ωdeep.
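The thresholding of deep carbonate predictors described above can be sketched as follows: Ωdeep is capped at the threshold (1.8 by default), the pHdeep critical value is the mean pHdeep of the above-threshold sites, and pHdeep is capped at that value. The function and constant names are hypothetical, and the sketch assumes at least one site exceeds the threshold.

```python
import numpy as np

OMEGA_DEEP_THRESHOLD = 1.8  # default threshold from the text; other values were also tried


def cap_deep_carbonate(omega_deep, ph_deep, threshold=OMEGA_DEEP_THRESHOLD):
    """Cap deep carbonate predictors at their critical values.

    Sites above the Omega_deep threshold are kept but set to the threshold.
    The pH critical value is the mean pH_deep of those above-threshold sites.
    Returns (omega_capped, ph_capped, ph_critical).
    """
    omega_deep = np.asarray(omega_deep, dtype=float)
    ph_deep = np.asarray(ph_deep, dtype=float)
    above = omega_deep > threshold
    if not above.any():
        raise ValueError("no sites exceed the threshold; pH critical value undefined")
    ph_critical = float(ph_deep[above].mean())
    omega_capped = np.minimum(omega_deep, threshold)
    ph_capped = np.minimum(ph_deep, ph_critical)
    return omega_capped, ph_capped, ph_critical
```

Capping rather than dropping sites keeps well-saturated locations in the regression while making the deep-carbonate term constant there, so it only acts where dissolution is plausible.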

Figure 3. Culture-based relationships between foraminiferal Mg/Ca and (a) temperature (T), (b) salinity (S), (c) calcite saturation state (Ω), and (d) pH. Data are compiled from Allen et al. (2011), Dueñas-Bohórquez et al. (2009), Evans et al. (2016), Hönisch et al. (2013), Kısakürek et al. (2008), Lea et al. (1999), and Russell et al. (2004). Multivariate regressions were used to determine the sensitivity of each species to T, S, Ω, and pH, and following Evans et al. (2016), these relationships were used to normalize Mg/Ca values to a temperature of 26 °C, salinity of 35 psu, and/or Ω of 5 as appropriate to isolate univariate relationships.
Our stepwise approach first considered univariate relationships for each independent variable at the individual species level. These models had the general forms below:
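$$
\begin{aligned}
\mathrm{Mg/Ca} &= a\,e^{bT} \\
\mathrm{Mg/Ca} &= a\,e^{bS} \\
\mathrm{Mg/Ca} &= a\,\Omega_{\mathrm{shallow}}^{\,b} \\
\mathrm{Mg/Ca} &= a + b\,\mathrm{pH}_{\mathrm{shallow}} \\
\mathrm{Mg/Ca} &= a + b\,\Omega_{\mathrm{deep}} \\
\mathrm{Mg/Ca} &= a + b\,\mathrm{pH}_{\mathrm{deep}} \\
\mathrm{Mg/Ca} &= a + b\cdot\mathrm{size}
\end{aligned}
$$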
for which pHdeep and Ωdeep higher than the assigned threshold were set to the threshold value as described above, and size is the mean of the size fraction considered.
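The univariate forms can be fit by ordinary least squares after linearizing: taking logarithms turns the exponential and power forms into straight lines. This is a sketch of one possible OLS approach, assumed rather than confirmed by the text. Log-transforming weights residuals differently from fitting the untransformed model with nonlinear least squares (as R's `nls` does), so coefficients can differ somewhat for noisy data. `fit_univariate` is a hypothetical name.

```python
import numpy as np


def fit_univariate(x, y, form):
    """Fit one univariate Mg/Ca model by ordinary least squares; returns (a, b).

    form = "exp":    y = a * exp(b * x)   fitted as ln y = ln a + b * x
    form = "power":  y = a * x**b         fitted as ln y = ln a + b * ln x
    form = "linear": y = a + b * x
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if form == "exp":
        b, ln_a = np.polyfit(x, np.log(y), 1)  # polyfit returns [slope, intercept]
        return float(np.exp(ln_a)), float(b)
    if form == "power":
        b, ln_a = np.polyfit(np.log(x), np.log(y), 1)
        return float(np.exp(ln_a)), float(b)
    if form == "linear":
        b, a = np.polyfit(x, y, 1)
        return float(a), float(b)
    raise ValueError(f"unknown form {form!r}")
```

With noise-free synthetic data, for example `y = 0.5 * exp(0.09 * T)`, the function recovers a = 0.5 and b = 0.09.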

Coefficients for each model were calculated using a calibration data set consisting of a random selection of 75% of the Mg/Ca data for each species. Other reasonable choices (e.g., 50% or 66%) yielded nearly identical results (results not shown). The random allocation of 75% of the data to the calibration set was repeated 10,000 times. To incorporate uncertainty in habitat depth, each iteration drew environmental variables from a random depth level within the assigned range. For example, the WOA has seven levels in the upper 100 m, and the temperature or salinity regressed against G. bulloides Mg/Ca was randomly selected from among these levels for each data point in each iteration. To account for analytical and sampling uncertainty in Mg/Ca observations, a similar random draw was taken from a normal distribution centered on each published Mg/Ca value with a relative standard deviation (RSD) of 1.5%, which is typical (Rosenthal et al., 2004). Coefficients were calculated using ordinary least squares when possible, while more complicated multivariate models were fit via nonlinear least squares using the nls function in R (Bates & Watts, 2007). Nonlinear least squares fits require starting estimates for regression coefficients, and in some cases not all 10,000 iterations converged. Instances in which more than 10% of iterations did not converge are clearly identified in our results and were considered to indicate that a particular subset of data identified for coefficient estimation was not suitable for regression.
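One Monte Carlo realization of this design (a random 75/25 split, a 1.5% RSD perturbation of each Mg/Ca value, and a random depth level per sample) can be sketched as below. It assumes, for simplicity, that every sample offers the same number of habitat levels; the function name and array layout are hypothetical, and the study repeats this 10,000 times.

```python
import numpy as np


def monte_carlo_draw(mgca, env_by_level, rng, calib_frac=0.75, rsd=0.015):
    """One realization of the calibration/validation design.

    mgca:         (n,) published Mg/Ca values (mmol/mol).
    env_by_level: (n, k) predictor values at the k depth levels of each
                  sample's habitat (same k for all samples in this sketch).
    Returns (calib_idx, valid_idx, mgca_draw, env_draw).
    """
    mgca = np.asarray(mgca, dtype=float)
    env_by_level = np.asarray(env_by_level, dtype=float)
    n, k = env_by_level.shape
    # Random allocation of samples to calibration (75%) and validation (25%) sets.
    order = rng.permutation(n)
    n_calib = int(round(calib_frac * n))
    calib_idx, valid_idx = order[:n_calib], order[n_calib:]
    # Analytical/sampling uncertainty: normal draw with 1.5% relative SD.
    mgca_draw = rng.normal(mgca, rsd * mgca)
    # Habitat-depth uncertainty: one randomly chosen level per sample.
    levels = rng.integers(0, k, size=n)
    env_draw = env_by_level[np.arange(n), levels]
    return calib_idx, valid_idx, mgca_draw, env_draw
```

Passing an explicit `np.random.default_rng(seed)` makes each realization reproducible.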

The performance of each best fit relationship was evaluated by comparing predicted to observed Mg/Ca at the withheld 25% of sites. Validation skill was quantified using the CE calculated as
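$$
\mathrm{CE} = 1 - \frac{\sum_i \left(\mathrm{Mg/Ca}_{\mathrm{obs},i} - \mathrm{Mg/Ca}_{\mathrm{pred},i}\right)^2}{\sum_i \left(\mathrm{Mg/Ca}_{\mathrm{obs},i} - \overline{\mathrm{Mg/Ca}}_{\mathrm{obs}}\right)^2}
$$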
where $\mathrm{Mg/Ca}_{\mathrm{pred}}$ and $\mathrm{Mg/Ca}_{\mathrm{obs}}$ are the estimated and actual Mg/Ca values, respectively, within the withheld validation set, and $\overline{\mathrm{Mg/Ca}}_{\mathrm{obs}}$ is the mean Mg/Ca of the validation data. CE can range from negative infinity to 1, with positive values indicating the fraction of actual variance that is resolved (Cook et al., 1999; Nash & Sutcliffe, 1970). We consider CE > 0 to be skillful. Our Monte Carlo approach produced 10,000 CE values for each combination of variables, which were normally distributed; results are reported as the mean of these 10,000 values along with 99% confidence intervals on the mean.
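CE itself, and the summary of its Monte Carlo distribution as a mean with a 99% confidence interval, can be sketched as follows. The normal-approximation interval (z ≈ 2.576) is our assumption about how the interval on the mean was formed; both function names are hypothetical.

```python
import numpy as np


def coefficient_of_efficiency(obs, pred):
    """CE = 1 - SSE / SST over the withheld validation data."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    sse = np.sum((obs - pred) ** 2)          # squared prediction errors
    sst = np.sum((obs - obs.mean()) ** 2)    # variance about the validation mean
    return float(1.0 - sse / sst)


def summarize(ce_values, z=2.576):
    """Mean CE and half-width of an (assumed) normal 99% CI on the mean."""
    ce_values = np.asarray(ce_values, dtype=float)
    half = z * ce_values.std(ddof=1) / np.sqrt(ce_values.size)
    return float(ce_values.mean()), float(half)
```

A prediction equal to the validation mean everywhere gives CE = 0, which is why CE > 0 is read as skill beyond that trivial baseline.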
The predictor variable that yielded the highest positive (most skillful) CE among all univariate models was retained, and bivariate models were then calculated by adding each of the remaining variables in turn. The additional variable yielding the highest bivariate CE was retained, provided its CE was higher than the highest univariate CE and the two 99% confidence intervals did not overlap. Trivariate regressions were then performed in the same way. When no additional variable produced a significant improvement in CE, the training process was terminated. The most complicated multivariate models possible using this approach had one of the general forms below:
urn:x-wiley:25724517:media:palo20769:palo20769-math-0004

We used this stepwise model construction exercise to identify optimal model variables; best estimates of model coefficients were ultimately calculated using all Mg/Ca data, without a withheld subset (Wilks, 2006).
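The forward selection rule can be sketched as a greedy loop around a scoring function that returns the Monte Carlo mean CE and its 99% confidence half-width. This is an illustration under our reading of the rule: the first variable needs a positive mean CE, and each later addition needs a confidence interval lying entirely above the current model's. The names `forward_select`, `EXCLUSIVE`, and the variable labels are hypothetical.

```python
# pH and Omega at the same depth may not both enter a model.
EXCLUSIVE = [{"pH_shallow", "Omega_shallow"}, {"pH_deep", "Omega_deep"}]


def forward_select(candidates, score, exclusive=EXCLUSIVE):
    """Greedy forward selection of predictors using validated CE.

    score(tuple_of_vars) -> (mean_ce, ci_half_width) from Monte Carlo validation.
    Returns (selected_variables, mean_ce_of_final_model).
    """
    selected = []
    best_mean, best_half = 0.0, 0.0
    while True:
        chosen = set(selected)
        allowed = [v for v in candidates
                   if v not in chosen
                   and not any(v in pair and pair & chosen for pair in exclusive)]
        if not allowed:
            break
        trials = {v: score(tuple(selected + [v])) for v in allowed}
        v_best = max(trials, key=lambda v: trials[v][0])
        mean, half = trials[v_best]
        if not selected:
            accept = mean > 0.0                           # first variable: skillful
        else:
            accept = mean - half > best_mean + best_half  # CIs must not overlap
        if not accept:
            break
        selected.append(v_best)
        best_mean, best_half = mean, half
    return selected, best_mean
```

Because the loop only ever adds variables that improve validated skill beyond the confidence interval, it stops at the point where further terms would mainly improve the in-sample fit.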

3 Results

3.1 N. pachyderma

Univariate N. pachyderma models generally exhibited little to no skill in predicting Mg/Ca (Figure 4a and Table S7). Ωdeep yielded the highest CE, albeit with a barely positive value of 0.011 ± 0.003. The addition of Ωshallow and size increased CE to a still barely skillful maximum of 0.047 ± 0.004, ultimately producing a rather complex and nonintuitive three-variable model:
urn:x-wiley:25724517:media:palo20769:palo20769-math-0005(1)
for which coefficient values and uncertainties represent the mean and 99% confidence interval of all Monte Carlo realizations. The Pearson correlation coefficient (r) for equation 1 is 0.38 with an RMSE on modeled Mg/Ca of 0.252 ± 0.001 mmol/mol, which is equivalent to 34% using the mean Mg/Ca of 0.74 mmol/mol for all data.
Figure 4. Mean coefficient of efficiency (CE) for univariate and multivariate relationships considered for each species. Error bars representing the 99% confidence interval, based on 10,000 random allocations of data to calibration or validation data sets, are typically smaller than the symbol size. Relationships solved via nonlinear least squares for which fewer than 90% of iterations converged have been omitted.

3.2 N. incompta

Salinity, temperature, and Ωshallow all yielded skillful univariate models for N. incompta, with salinity having the highest CE (0.339 ± 0.005; Figure 4b and Table S7). The addition of size increased CE to its maximum of 0.381 ± 0.004, making salinity and size the optimal variables for N. incompta. Incorporating all data yielded the relationship:
urn:x-wiley:25724517:media:palo20769:palo20769-math-0006(2)

For equation 2, r = 0.67 with an RMSE of 0.182 ± 0.001 mmol/mol, or 13% RSD using the mean Mg/Ca of 1.38 mmol/mol for all N. incompta data.

3.3 G. ruber

As for N. incompta, salinity, temperature, and Ωshallow produced skillful univariate models for G. ruber, but here temperature yielded the highest CE (0.399 ± 0.002; Figure 4c and Table S7). CE increased to 0.469 ± 0.002 in a bivariate model that added Ωdeep, so temperature and Ωdeep were selected as the optimal predictors of G. ruber Mg/Ca. The final model constructed from all data was
$$
\mathrm{Mg/Ca} = (0.685 \pm 0.005)\,e^{(0.058 \pm 0.001)\,T} + (0.928 \pm 0.02)\,\Omega_{\mathrm{deep}} \tag{3}
$$

For equation 3, r = 0.70 with an RMSE of 0.678 ± 0.001 mmol/mol or 15% RSD using the mean Mg/Ca of 4.41 mmol/mol for all data.

3.4 G. inflata

Salinity and temperature produced skillful univariate models for G. inflata, with temperature again yielding the highest CE (0.281 ± 0.003; Figure 4d and Table S7). CE increased to 0.416 ± 0.003 with the addition of Ωshallow, leading to a final model of
$$
\mathrm{Mg/Ca} = (1.737 \pm 0.007)\,e^{(0.065 \pm 0.001)\,T}\,\Omega_{\mathrm{shallow}}^{\,-0.734 \pm 0.003} \tag{4}
$$

For equation 4, r = 0.68 with an RMSE of 0.283 ± 0.001 mmol/mol or 18% RSD using the mean Mg/Ca of 1.54 mmol/mol for all G. inflata data.

3.5 G. bulloides

Temperature and Ωshallow yielded skillful univariate G. bulloides models, with temperature yielding the higher CE (0.728 ± 0.003; Figure 4e and Table S7). Adding Ωdeep significantly increased CE to 0.741 ± 0.003 and produced the final model
$$
\mathrm{Mg/Ca} = (0.623 \pm 0.007)\,e^{(0.079 \pm 0.001)\,T} + (0.548 \pm 0.007)\,\Omega_{\mathrm{deep}} \tag{5}
$$

For equation 5, r = 0.88 with an RMSE of 0.907 ± 0.004 mmol/mol or 29% RSD using the mean Mg/Ca of 3.11 mmol/mol for all G. bulloides data.
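The percentage RSD values quoted for equations 1–5 are simply each RMSE divided by the corresponding species' mean Mg/Ca. The arithmetic can be checked with the reported values (no new data are introduced):

```python
# Reported RMSE (mmol/mol) and mean Mg/Ca (mmol/mol) for equations 1-5.
reported = {
    "N. pachyderma": (0.252, 0.74),
    "N. incompta": (0.182, 1.38),
    "G. ruber": (0.678, 4.41),
    "G. inflata": (0.283, 1.54),
    "G. bulloides": (0.907, 3.11),
}

# Relative RMSE as a rounded percentage of the species mean.
relative_rmse_pct = {sp: round(100 * rmse / mean)
                     for sp, (rmse, mean) in reported.items()}

for sp, pct in relative_rmse_pct.items():
    print(f"{sp}: {pct}%")
```

Note that relative RMSE and CE rank the species differently: G. bulloides has the highest CE but a relatively large RMSE, because CE measures variance explained rather than absolute error.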

4 Discussion

For every species, a multivariate model shows an improved ability to predict observed Mg/Ca (Figures 4 and 5), and the stepwise selection of optimal variables via validation exercises protects against artificial skill. In some cases the increase in CE, while significant, is small relative to that of a univariate model, but all else being equal, the more positive the CE of a given species' calibration, the more reliably that relationship can be applied to independent data. In the paleoceanographic context, relationships may be applied to independent downcore Mg/Ca data with the goal of inverting for a particular environmental variable, or to independent simulation-derived environmental variables with the goal of forward modeling Mg/Ca. While all the relationships we present have predictive skill, that skill decreases in the order G. bulloides, G. ruber, G. inflata, N. incompta, and N. pachyderma. Ultimately, however, these relationships are empirical, and the variables selected and their assigned coefficients need not reflect the true ecological sensitivity of foraminifera to a given environmental variable. This is particularly true when candidate independent variables covary, as is commonly the case (Figures S1–S4 and Tables S1–S6). In such instances, the sensitivities of Mg/Ca to two or more variables may be conflated into an apparent sensitivity to a single predictor variable. On one hand, such conflation is unimportant as long as Mg/Ca is modeled with the greatest possible validated skill. On the other hand, the covariance between predictors need not be stable through time, and inverting Mg/Ca observations to reconstruct the predictor variables may not yield a unique solution. Ideally, the optimal models would also characterize the dominant variables and their coefficients realistically, and the selection of variables would parallel controlled culture studies, which are less susceptible to covariation between potential predictors.
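To illustrate why a secondary term complicates inversion, the sketch below uses the central G. ruber coefficients given in the abstract (uncertainties ignored) and inverts for temperature under an assumed Ωdeep. The cap of 1.8 on Ωdeep is our assumption (the final model may use a different threshold), and the function names are hypothetical.

```python
import math

# Central values of the G. ruber model given in the abstract.
A, B, C = 0.685, 0.058, 0.928
OMEGA_CAP = 1.8  # ASSUMED threshold; the final model may use a different value


def forward_mgca(temp_c, omega_deep):
    """Forward model: Mg/Ca = A * exp(B * T) + C * min(Omega_deep, cap)."""
    return A * math.exp(B * temp_c) + C * min(omega_deep, OMEGA_CAP)


def invert_temperature(mgca, omega_deep):
    """Solve the forward model for T, given Mg/Ca and an assumed Omega_deep."""
    thermal = mgca - C * min(omega_deep, OMEGA_CAP)
    if thermal <= 0:
        raise ValueError("Omega_deep term exceeds observed Mg/Ca; no real solution")
    return math.log(thermal / A) / B


# A 25 degC sample deposited in undersaturated-leaning water (Omega_deep = 1.2),
# inverted as if Omega_deep were at the cap:
obs = forward_mgca(25.0, omega_deep=1.2)
t_if_assumed_saturated = invert_temperature(obs, omega_deep=1.8)
```

With these numbers the inversion returns roughly 21 °C rather than 25 °C, showing how an unrecognized change in a secondary variable can masquerade as a temperature signal.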

Comparison of predicted to measured Mg/Ca for our derived T-only model (black) and optimal multivariate model (colored) for each species.

Conflation between variables can be easily identified by a change in coefficients as additional terms are added, but there is no perfect way to isolate the true sensitivities of Mg/Ca to environmental variables that covary significantly. One approach may be to transform environmental data into mutually uncorrelated (orthogonal) principal components (e.g., Gemperline et al., 1991), but this approach transforms physical and chemical observations into linear combinations of variables that may not be easily interpretable. Rotation of principal components (Cook et al., 1999; Evans et al., 2000; Richman, 1986) relaxes the orthogonality constraint and may improve interpretability in terms of physical or chemical patterns when covariance structures are identified (Bretherton et al., 1992). Alternatively, a subset of data may be chosen such that only one variable spans an appreciable range, while other covarying parameters are approximately constant (e.g., Gray et al., 2018). However, as the example in section 1 illustrates, model structure and determined coefficients in one regional domain cannot be assumed to hold globally, and applying a local relationship outside its domain of validity may compromise reconstruction skill. Furthermore, both approaches require a subjective choice of model structure, which differs from our objective approach of using validation statistics to select the most appropriate independent variables and their coefficients. To evaluate how realistic equations 1–5 are in terms of the variables selected and their sensitivities, we compare our results to independent studies below.
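As a minimal illustration of the principal-component approach mentioned above (not an analysis performed in this study), the sketch below standardizes two synthetic, strongly covarying predictors (hypothetical temperature and salinity values) and decomposes them with a singular value decomposition. The resulting PC scores are uncorrelated by construction, but each loading vector mixes the original variables, which is the loss of interpretability noted above.

```python
import numpy as np

def principal_components(X):
    """Standardize each predictor column, then decompose with an SVD.

    Returns PC scores (one column per PC), loadings (rows = original variables),
    and the fraction of variance explained by each PC."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s            # mutually uncorrelated by construction
    loadings = Vt.T           # column k gives the weights of PC k
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

rng = np.random.default_rng(0)
temperature = rng.normal(15.0, 5.0, 500)                         # hypothetical, °C
salinity = 34.0 + 0.1 * temperature + rng.normal(0.0, 0.3, 500)  # hypothetical, covaries with T
scores, loadings, explained = principal_components(np.column_stack([temperature, salinity]))
print("correlation between PC1 and PC2:", np.corrcoef(scores.T)[0, 1])  # close to zero
print("PC1 loadings (T, S):", loadings[:, 0])  # both variables load on PC1
print("variance explained:", explained)
```

Regressing Mg/Ca on such scores avoids conflation in the statistical sense, but a coefficient on "PC1" no longer maps onto a single physical or chemical sensitivity.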

4.1 N. pachyderma and N. incompta

N. pachyderma and N. incompta produced low validation CE scores, with N. pachyderma performing particularly poorly. N. pachyderma produced the most complex calibration, within which the inclusion of Ωshallow is only partially consistent with previous independent work (Davis et al., 2017; Hendry et al., 2009; Jonkers et al., 2013). Similarly, the exclusion of salinity is consistent with early studies that suggested salinity to be of secondary importance (Nurnberg, 1995), but its influence has not been specifically isolated in culture studies. However, the selection of Ωdeep and exclusion of temperature are not consistent with suggestions that N. pachyderma is among the most dissolution-resistant planktic foraminifera (Malmgren, 1983) or with culture-based evidence for a strong T dependence (Davis et al., 2017). Along with its marginal skill, this casts considerable doubt on the usefulness of equation 1, and we cannot recommend that it be applied in paleoceanographic contexts.

Difficulty identifying a robust relationship for N. pachyderma likely stems, in part, from additional confounding variables common in its high-latitude environment that were not included as potential predictors. Chief among these is sea ice, which may alter the local carbonate ion concentration (Hendry et al., 2009), leading to anomalously high Mg/Ca values (Kozdon et al., 2009; Meland et al., 2006; Nurnberg, 1995; Riveiros et al., 2016). Excluding sites with summer temperatures <3 °C near the sea ice edge has been suggested as a means to avoid this effect and produce calibrations more similar to other species (Kozdon et al., 2009; Riveiros et al., 2016). However, doing so with our compiled data produces no skillful reconstructions, in part because available data are reduced by over 50%.

Additional confounding variables that may inhibit robust N. pachyderma calibrations include volcanic ash (Meland et al., 2006; Morley et al., 2017), mixing of glacial and Holocene foraminifera at low-sedimentation-rate sites (Meland et al., 2006), variations in the proportion of low-Mg outer crust, which can comprise up to 90% of the shell calcite (Davis et al., 2017; Lohmann, 1995), and the existence of up to seven different N. pachyderma genotypes (Darling et al., 2007). While it is possible to measure and control for many of these influences (Lea et al., 2005; Rosenthal & Lohmann, 2002), the requisite data are not available to do so for our compilation. Where such data become available, incorporating these confounding variables as candidate predictors may lead to a simpler and more plausible model of N. pachyderma Mg/Ca.

Somewhat higher validation statistics for N. incompta suggest greater predictive skill, but the selection of salinity rather than temperature as an optimal variable is surprising given two independent culture studies suggesting a temperature dependence similar to that of other species (Davis et al., 2017; von Langen et al., 2005). While these studies did not explicitly evaluate salinity and therefore do not provide robust evidence disproving a salinity effect, both present convincing evidence for a temperature influence. The strong correlation between salinity and temperature (r = 0.88; Table S3) provides a plausible explanation for this apparent discrepancy and suggests that the two variables may be conflated. Consistent with this, the implied sensitivities in univariate salinity and temperature models of 30%/psu and 3%/°C, respectively, both decrease to 22%/psu and 1%/°C in a bivariate temperature-salinity model, although this bivariate model does not have the highest CE. Forcing temperature to be included yields alternative models, but none exceed the CE of the bivariate salinity-size relationship, and all suggest a temperature sensitivity of <4%/°C. Given previous independent evidence for a temperature dependence, it would be unreasonable to suggest our proposed N. incompta relationship is an entirely realistic model for Mg/Ca, but it does provide some evidence that salinity may be more important than previously thought. This prediction could be tested in controlled culture studies that vary salinity across a realistic oceanographic salinity range at fixed temperature. Regardless, the possibility that equation 2 is not entirely consistent with true ecological sensitivities does not negate its predictive skill, which exceeds that of all other calibrations considered for the species, although it might increase nonuniqueness of inversion for temperature and salinity.
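The conflation described for N. incompta can be reproduced with synthetic data. In the sketch below all numbers are hypothetical: salinity is generated to correlate with temperature at r ≈ 0.88 (the value reported in Table S3), and ln(Mg/Ca) is assumed to depend mainly on temperature. A salinity-only regression then assigns much of the temperature effect to salinity, and the salinity slope falls back toward its assumed value once temperature is added, qualitatively mirroring the drop in implied salinity sensitivity described above.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
T = rng.normal(0.0, 1.0, n)                                      # standardized temperature
S = 0.88 * T + np.sqrt(1 - 0.88 ** 2) * rng.normal(0.0, 1.0, n)  # salinity, r ~ 0.88 with T
# Hypothetical "true" model: ln(Mg/Ca) depends mostly on T, weakly on S
ln_mgca = 1.0 * T + 0.3 * S + rng.normal(0.0, 0.2, n)

def ols(y, *predictors):
    """Ordinary least squares; returns slope coefficients (intercept dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

s_only = ols(ln_mgca, S)[0]   # univariate salinity model absorbs most of the T effect
t_s = ols(ln_mgca, T, S)      # bivariate model separates the two contributions
print(f"S-only slope: {s_only:.2f}; bivariate T and S slopes: {t_s[0]:.2f}, {t_s[1]:.2f}")
```

The salinity-only slope lands near 0.3 + 0.88 × 1.0, i.e., the assumed salinity effect plus the temperature effect projected onto salinity.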

The relatively poor skill and nonintuitive results for N. pachyderma and N. incompta models likely stem in part from the smaller sample sizes of each population. While acknowledging that these species are genetically distinct (Darling et al., 2007), we also evaluated how combining the populations would influence our derived relationships. Repeating the exercise above with the combined data set selected a bivariate temperature-size model, with a CE of 0.569 ± 0.002:
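$$\mathrm{Mg/Ca}\ (\textit{N. pachyderma} + \textit{N. incompta}) = (0.186 \pm 0.001)\,\exp\left[(0.095 \pm 0.002)\,T\right] + \left(2.95 \times 10^{-3} \pm 5.4 \times 10^{-6}\right)\cdot \mathrm{size} \tag{6}$$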

Pearson r and RMSE for equation 6 are 0.77 and 0.261 mmol/mol (23% RSD), respectively. While many of the concerns discussed above still hold for this combined relationship, it is noteworthy that it has considerably greater skill than either of the optimal relationships for the individual species and would therefore be more appropriate to apply to independent Neogloboquadrina foraminifera regardless of species. Furthermore, equation 6 is generally more consistent with independent results (Davis et al., 2017; Jonkers et al., 2013; von Langen et al., 2005), suggesting that it is more realistic in addition to being more skillful. We therefore consider equation 6 to have greater utility for paleoceanographic studies and suggest that it be used in place of either equation 1 or 2.
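Because equation 6 is additive, the size term must be subtracted before the exponential temperature term can be inverted. The sketch below uses the central coefficient values of equation 6 and ignores their uncertainties; it assumes size is expressed in microns, which this section does not state, and the function names are hypothetical.

```python
import math

# Central values of equation 6; coefficient uncertainties are ignored here.
A, B, C = 0.186, 0.095, 2.95e-3

def mgca_neoglobo(T, size):
    """Forward model: Mg/Ca (mmol/mol) from temperature (°C) and size (assumed microns)."""
    return A * math.exp(B * T) + C * size

def temperature_neoglobo(mgca, size):
    """Inverse model: remove the additive size term, then invert the exponential."""
    thermal_part = mgca - C * size
    if thermal_part <= 0:
        raise ValueError("size term exceeds measured Mg/Ca; inversion undefined")
    return math.log(thermal_part / A) / B

mgca = mgca_neoglobo(10.0, 250.0)          # hypothetical 250-micron sample at 10 °C
print(round(mgca, 3))                      # 1.218
print(temperature_neoglobo(mgca, 250.0))   # recovers ~10.0
```

Since the analyzed size fraction is known for each sample, this inversion needs no assumption about unmeasured environmental variables.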

4.2 G. ruber

Of the species we considered, G. ruber is perhaps the most extensively studied and widely applied in paleoceanographic investigations. While this species includes a spectrum of morphologies, the degree to which their Mg/Ca sensitivities differ is equivocal (Richey et al., 2019; Sadekov et al., 2008; Steinke et al., 2005; Thirumalai et al., 2014), and early literature does not distinguish among them. Because of this, and because of the demonstrated increase in skill achieved by combining Neogloboquadrina species, we did not consider each morphology individually. Our results support a significant dependence of G. ruber Mg/Ca on T, with a sensitivity of 5.8 ± 0.1% per degree Celsius that is significantly lower than the 9–10% previously inferred from cultures, individual sediment traps, and core-tops (Anand, Elderfield, et al., 2003; Khider et al., 2015; Kısakürek et al., 2008) but very similar to the value of 6.0 ± 0.8% per degree Celsius found in global sediment trap/plankton tow samples (Gray et al., 2018). While most of the nonthermal variables we evaluated have been suggested to influence G. ruber Mg/Ca (Table 1), our analysis suggests that only deep carbonate chemistry leads to a significant increase in predictive skill over that of the T-only calibration (Figure 4c).

Our results support previous suggestions that G. ruber diagenesis at calcite saturation states close to and below 1 can appreciably reduce Mg/Ca (Dekens et al., 2002; Regenberg et al., 2006, 2014). An Ωdeep threshold of 1.8 appears to adequately model this effect and is in general agreement with a Δ[CO₃²⁻] threshold of ~25 μmol/kg, as suggested previously (Regenberg et al., 2006, 2014). The inclusion of Ωdeep in our G. ruber calibration, but not in the calibrations of all other species, suggests that this species may be particularly sensitive to postdepositional alteration. Consistent with this, Regenberg et al. (2006) suggested that Mg begins to be removed from G. ruber shells at shallower depths and higher saturation states than from most other coexisting species.

The exclusion of salinity from our G. ruber calibration may be surprising given the number of previous studies that have suggested an appreciable effect on Mg/Ca (Arbuszewski et al., 2010; Gray et al., 2018; Hönisch et al., 2013; Kısakürek et al., 2008; Khider et al., 2015; Mathien-Blard & Bassinot, 2009). While we cannot rule out the possibility that a salinity dependence is conflated with our temperature coefficient, the two variables share only ~25% variance (r = −0.53; Table S4), suggesting that the effect is weak. We note that no previous study advocating for the inclusion of salinity performed validation exercises, raising the possibility that its inclusion in multivariate models (e.g., Gray et al., 2018; Khider et al., 2015) may reflect overfitting.

Alternatively, it must be kept in mind that the exclusion of certain variables from calibration models does not necessarily mean that these variables are not important but rather that their importance cannot be resolved by existing data. A G. ruber salinity sensitivity of 3.3 ± 1.7%/psu has been derived from culture experiments in which salinity was evenly distributed between 33 and 45 psu (Hönisch et al., 2013; Kısakürek et al., 2008). This rather low sensitivity may not be resolved by the salinity range of our assigned G. ruber environments, which spans 32.0–37.3 psu for the depth habitat mean at each site. It is therefore entirely possible that salinity or additional variables could be incorporated into the Mg/Ca calibration of G. ruber as additional data sample a wider range of environmental conditions. Validation studies that test the culture-identified salinity influence in core-top data from real high-salinity (e.g., northern Red Sea) and low-salinity oceanographic environments would help resolve this question.

The same argument may explain the absence of both Ωshallow and pHshallow from our multivariate calibration, despite considerable evidence for their influence from culture and sediment trap studies (Allen et al., 2016; Evans et al., 2016; Gray et al., 2018; Kısakürek et al., 2008). All our Ωshallow values are highly supersaturated, with a mean value of 5.5 ± 0.5 (1σ) and only three sites with a value below 4, while complementary pH values almost entirely fall between 8.0 and 8.1. The sensitivity of G. ruber Mg/Ca to carbonate ion concentration should be reduced at these high saturation states (Allen et al., 2016; Evans et al., 2016) and may be difficult to resolve given the narrow range of Ωshallow values we considered. Similarly, the range of pH values we considered is far narrower than the 7.6–8.5 range used to infer a pH dependence in culture experiments (Allen et al., 2016; Evans et al., 2016; Kısakürek et al., 2008). Thus, we cannot rule out an effect of shallow carbonate parameters on G. ruber Mg/Ca but also cannot resolve one within the narrow range of values that characterize the modern surface ocean. Although further analyses of core-top data spanning a wider range of Ωshallow or pHshallow would help evaluate their influence, the range of values used in culture experiments would be difficult, if not impossible, to reproduce in real oceanographic environments. In sum, we find both the variables selected and their sensitivities in our G. ruber model (equation 3) to be broadly consistent with previous results and generally realistic.

4.3 G. inflata

G. inflata exhibited generally lower validation statistics and poorer fit (Figure 4e) in comparison to G. bulloides and G. ruber, possibly reflecting the greater uncertainty and variability in its deeper depth habitat. However, a skillful T-only calibration was derived and improved with the addition of Ωshallow. A temperature sensitivity of 6.5 ± 0.1% per degree Celsius in this multivariate model (equation 4) is lower than the canonical value of 9% per degree Celsius but is similar to previous estimates of 5.8%, 5.1%, and 7.6% by Anand, Elderfield, et al. (2003), Cléroux et al. (2008), and Groeneveld and Chiessi (2011), respectively. Our G. inflata T sensitivity is also very similar to the value for G. ruber here and elsewhere (Gray et al., 2018), providing some evidence for similar T sensitivities across species.

At −0.73, the Ωshallow exponent for G. inflata is also more negative than previous culture-based estimates for other species (Figure 3c), which range from a maximum of −0.01 for G. sacculifer to a minimum of −0.47 for G. bulloides. This suggests that G. inflata may be particularly sensitive to variations in Ωshallow relative to other planktic foraminifera. The mechanism responsible for the power-law decrease in Mg/Ca as Ωshallow increases is debated but could be caused by less efficient removal of Mg from the calcifying space (Evans et al., 2016), a thinner, low-Mg/Ca crust (van Raden et al., 2011), or slower growth kinetics (Burton & Walter, 1987; Gabitov et al., 2014) at lower saturation states. Regardless of mechanism, the effect of Ωshallow on Mg/Ca is expected to be reduced in the highly supersaturated surface habitats of many planktic foraminifera but could be more significant at the deeper habitats assigned to G. inflata. Consistent with this, the Ωshallow values we assign to G. inflata vary from 1.9 to 4.8, making any influence on Mg/Ca more easily resolved.
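To make the magnitude of the G. inflata carbonate effect concrete, the sketch below evaluates the G. inflata model (central values as given in the abstract, uncertainties ignored) at the two ends of the assigned Ωshallow range, 1.9 and 4.8, at a fixed, arbitrary temperature, and converts the resulting Mg/Ca ratio into the temperature change that would produce the same ratio. It is an illustration of the fitted relationship only, not an additional result.

```python
import math

# Central values of the G. inflata model (equation 4) as given in the abstract.
A, B, C = 1.737, 0.065, -0.734

def mgca_inflata(T, omega_shallow):
    """Mg/Ca (mmol/mol) = A * exp(B*T) * Omega_shallow**C."""
    return A * math.exp(B * T) * omega_shallow ** C

T_fixed = 12.0                       # arbitrary temperature; it cancels in the ratio
ratio = mgca_inflata(T_fixed, 4.8) / mgca_inflata(T_fixed, 1.9)
dT = -math.log(ratio) / B            # temperature change giving the same Mg/Ca ratio
print(f"Mg/Ca ratio across the Omega range: {ratio:.2f}")   # 0.51
print(f"equivalent temperature change: {dT:.1f} °C")        # 10.5
```

Across the assigned range, the Ωshallow term alone roughly halves Mg/Ca, a change comparable to about 10 °C of warming or cooling, which is why it is readily resolved for this species.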

Exclusion of any additional variables beyond T and Ωshallow is generally consistent with the few studies that have considered secondary impacts on G. inflata Mg/Ca. For example, Friedrich et al. (2012) found no change in G. inflata Mg/Ca with size fractions larger than 200 microns, which are similar to the size fractions in our compilation. Similarly, no relationship with salinity exists across a ~2-psu gradient in the western Mediterranean that approximates ocean salinities (van Raden et al., 2011). Finally, the omission of deep carbonate system variables is consistent with the work of Malmgren (1983), who found G. inflata to be the second most dissolution-resistant species among the eight considered. Thus, in addition to demonstrating predictive skill, we conclude that the variables included in (and excluded from) our G. inflata model (equation 4) are realistic and have plausible sensitivities that are consistent with existing data on the species.

4.4 G. bulloides

G. bulloides yielded the most robust validation statistics and suggested a temperature sensitivity of 7.9 ± 0.1% per degree Celsius that broadly agrees with previous calibrations, which show a relatively wide range of sensitivities. Culture (Mashiotta et al., 1999; Lea et al., 1999) and some sediment trap data (Jonkers et al., 2013; Pak et al., 2004) point to a higher T sensitivity of 9–10% per degree Celsius, while other sediment trap data (Gibson et al., 2016; McConnell & Thunell, 2005) and core-top analyses of individual chambers (Marr et al., 2011) point to a lower sensitivity of ~6% per degree Celsius. All else being equal, calibrations across a wider T range should better resolve the true T sensitivity, and one possible explanation for the observed range in T sensitivity may be the T range over which calibrations were performed. High T sensitivities typically derive from studies that consider relatively narrow T ranges of ~5–9 °C, while studies that suggest lower T sensitivities consider ranges of ~12–15 °C. Our compilation spans a temperature range of over 26 °C, suggesting that its intermediate value may be more accurate than previous results that span a narrower range with fewer data.
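The argument that a wider temperature range should better resolve T sensitivity can be illustrated with a small Monte Carlo experiment. Everything here is hypothetical: an exponential calibration with b = 0.08, uniform sampling of 50 sites over a narrow (~7 °C) or wide (~26 °C) range, identical Gaussian scatter in ln(Mg/Ca), and a log-linear least-squares fit rather than whatever fitting procedure the cited studies used. It demonstrates only that the estimated sensitivity is more precise over a wider range; it says nothing about biases that may differ among calibration data sets.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_A, TRUE_B = 0.6, 0.08   # hypothetical calibration, Mg/Ca = A * exp(B*T)
NOISE = 0.1                  # hypothetical scatter in ln(Mg/Ca)

def slope_spread(t_min, t_max, n_sites=50, trials=1000):
    """Standard deviation of the fitted sensitivity B across synthetic calibrations."""
    slopes = np.empty(trials)
    for k in range(trials):
        T = rng.uniform(t_min, t_max, n_sites)
        ln_mgca = np.log(TRUE_A) + TRUE_B * T + rng.normal(0.0, NOISE, n_sites)
        slopes[k] = np.polyfit(T, ln_mgca, 1)[0]   # slope of ln(Mg/Ca) against T
    return slopes.std()

narrow = slope_spread(10.0, 17.0)   # ~7 °C span
wide = slope_spread(2.0, 28.0)      # ~26 °C span
print(f"spread of fitted B: narrow range {narrow:.4f}, wide range {wide:.4f}")
```

With these settings the spread for the narrow range should be roughly three to four times larger, consistent with the standard error of a regression slope scaling inversely with the spread of the predictor.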

The inclusion of Ωdeep as an independent variable is supported by some previous data but is not unequivocal. Anand, Ganssen, et al. (2003) found Mg/Ca to decrease by ~0.5 mmol/mol below the lysocline, coincident with a decrease in shell weight. However, Marr et al. (2011) found an opposite relationship between Mg/Ca and shell weight, while Mekik et al. (2007) suggested postdepositional alteration had little influence on G. bulloides Mg/Ca based on a comparison to an independent dissolution metric. Similarly, the omission of size and shallow carbonate parameters from our optimal G. bulloides model is only partially consistent with previous work. With respect to size, Friedrich et al. (2012) suggested a significant decrease in Mg/Ca of ~0.2 mmol/mol per 100 microns with increasing shell size, while Elderfield et al. (2002) found a similar magnitude of change but of opposite sign, and others found no relationship with size (Martínez-Botí et al., 2011; Sadekov et al., 2016). With respect to shallow carbonate parameters, culture studies suggest that G. bulloides Mg/Ca should decrease at higher pH and carbonate ion concentrations (Lea et al., 1999; Russell et al., 2004), but our calibration cannot resolve this effect. Such discrepancies may reflect differences in calcification behavior in culture environments versus natural conditions, regional heterogeneities in the relationship between Mg/Ca and environmental variables, genetic differences (Sadekov et al., 2016), or a relatively narrow range of shallow carbonate parameters in our compiled data. For example, Lea et al. (1999) found a pH dependence in G. bulloides reared at pH values between 7.6 and 8.5. In contrast, our compiled G. bulloides data span pH values of only 8.07 ± 0.06 (1σ), a range that may be too narrow to capture this effect. We therefore conclude that our G. bulloides relationship (equation 5) is plausible within the relatively poorly constrained ranges of existing data, but that additional work is necessary to reach consensus on how G. bulloides Mg/Ca varies with most of the predictors we consider. However, we again note that these uncertainties do not diminish the demonstrable skill of equation 5, and it is still the most appropriate relationship to apply to independent G. bulloides Mg/Ca data.

5 Paleoceanographic Implications

This study was motivated, in part, by a desire to better understand climate variability during the last two millennia, and we evaluate the effect of our revised calibrations using Mg/Ca data compiled in the Ocean2K metadatabase (McGregor et al., 2015). The compilation presented by McGregor et al. (2015) includes temperature estimates generated from varying calibrations derived from regional core-top sediments, sediment trap data, and culturing studies. In many cases, calibrations constructed from data at a single site (Anand, Elderfield, et al., 2003) are assumed to be valid regionally within the same ocean basin (Lund et al., 2006; Richey et al., 2009, 2007; Saenger et al., 2011) and globally in different ocean basins (Linsley et al., 2010; Oppo et al., 2009; Rustic et al., 2015). As illustrated in section 1, the validity of such assumptions is difficult to evaluate without validation exercises. To evaluate how our global calibrations would influence the temperatures reconstructed in these studies, we first compared Mg/Ca-based temperature estimates derived from our species-specific, temperature-only relationships to the originally published values.

Results show both positive and negative shifts in the mean temperature of records, although G. bulloides records tend to be shifted cooler, while G. inflata records are shifted warmer (Figure 6a). Part of this pattern can be attributed to differences in how the calibration used in the original publication was constructed. In some cases, Mg/Ca was regressed against mean annual sea surface temperature (e.g., Dekens et al., 2002), which would tend to yield warmer temperatures because of the shallow assumed depth habitat, but could yield either warmer or cooler temperatures depending on how mean annual values compare to our assigned seasonality. In other studies, Mg/Ca was regressed against oxygen-isotope "calcification temperature" (e.g., Anand, Elderfield, et al., 2003; Elderfield & Ganssen, 2000), which could easily be warmer or cooler than our assigned temperatures.

(a) Comparison of originally published temperature estimates in the Ocean2K Mg/Ca data of McGregor et al. (2015) with those generated from temperature-only calibrations in this study. (b) As in (a) for this study's optimal multivariate calibrations (equations 3–6). Black line indicates 1:1 relationship.

Scaling also differs but is more uniform in sign, suggesting that Mg/Ca paleotemperature records would show greater variance when our relationships are applied, consistent with the lower temperature sensitivities derived for most species. The effect is somewhat more prominent in G. inflata, possibly due to greater uncertainty in habitat depth and seasonality. G. ruber from the Cariaco Basin (Wurtzel et al., 2013) also exhibit a strong change in scaling, likely because the high sedimentation rate at the site allowed the original calibration to be constructed from temporal, rather than spatial, variations in sea surface temperature with a variable weighting to account for upwelling (Wurtzel et al., 2013). Scaling differences are somewhat smaller for G. bulloides, perhaps suggesting greater agreement between different methods of estimating temperature and better-constrained, more consistent depths and seasons of calcification.
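The link between lower temperature sensitivity and greater reconstructed variance follows directly from the exponential form: for Mg/Ca = a·exp(b·T), a change from Mg/Ca₁ to Mg/Ca₂ implies ΔT = ln(Mg/Ca₂ / Mg/Ca₁)/b, independent of a. The sketch below compares a 10% Mg/Ca increase under a canonical-style exponent of 0.090 and under 0.058, the G. ruber exponent given in the abstract; it ignores nonthermal terms and is illustrative only.

```python
import math

def delta_T(mgca_start, mgca_end, b):
    """Temperature change implied by an exponential T-only calibration
    Mg/Ca = a * exp(b * T); the pre-exponential constant a cancels."""
    return math.log(mgca_end / mgca_start) / b

canonical = delta_T(3.0, 3.3, 0.090)   # ~9%/°C-style sensitivity
revised = delta_T(3.0, 3.3, 0.058)     # G. ruber exponent from the abstract
print(f"{canonical:.2f} °C vs {revised:.2f} °C")   # 1.06 °C vs 1.64 °C
```

The inferred temperature change scales with the ratio of the two exponents (here about 1.55), which is the expansion in variance seen when records are recalculated with lower sensitivities.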

It is also possible that foraminifera from different locations exhibit real differences in Mg/Ca in response to environmental predictors. The Mg concentration of foraminifera is known to be impacted by biologically mediated “vital effects” that can produce Mg/Ca variations in the absence of changes in temperature and other environmental variables (Bentov & Erez, 2006; Sadekov et al., 2008). Furthermore, relationships may vary with genotype (Naik, 2016; Steinke et al., 2005), which can vary temporally at a core site (Darling et al., 2003). Any such differences will inevitably produce deviations between local and global calibrations, as observed in Figure 6.

While inconsistencies in calibration method and true regional differences in Mg/Ca-T may contribute to observed biases and changes in scaling, we cannot rule out the possibility that previous calibrations capture the global mean Mg/Ca-T relationship for a species less accurately than often assumed because of (1) a relatively small number of data points, (2) a relatively small temperature range, and (3) the lack of out-of-sample validation of model coefficients and of the terms to be included. To evaluate the influence of secondary variables, we also compared originally published Mg/Ca temperatures to those derived from our optimal multivariate models (equations 3–6). Results show scaling trends that are largely similar to those generated from the temperature-only case (Figure 6) but biases that are more evenly distributed between positive and negative offsets, as would be expected if a relationship accurately captured the true mean. This suggests that much of the scaling correction in our revised calibrations can be attributed to having more, and more widely distributed, data, while secondary nonthermal variables may correct for bias. Our data-rich, well-validated calibrations are therefore arguably more accurate on global scales than the regional, ad hoc calibrations used previously and may be more universally applicable to downcore reconstructions.

It is noteworthy that McGregor et al. (2015) opted to interpret synthesized data as standardized anomalies, in part because differences in the mean and variance of records from adjacent sites were difficult to reconcile (their Figure 2a). When we compile our revised temperature estimates in the same way as McGregor et al. (2015), we reproduce their result of a general cooling during the Common Era (not shown), which is not surprising given that standardization erases any scaling and bias corrections. However, the temperature reconstructions compiled by McGregor et al. (2015) show the same Common Era cooling trend when presented without standardization (Figure 7). Application of the multivariate models presented here (equations 3–6), with nonthermal variables assumed constant, preserves this trend but shows greater variance within each bicentennial window and a larger magnitude of cooling over the entire two millennia (Figure 7). Both observations can be attributed to temperature sensitivities in most of our calibrations being lower than previously estimated. Therefore, the calibrations we present will not fundamentally change our understanding of Common Era climate, but we anticipate that they may facilitate more informative intercomparison and synthesis of Mg/Ca paleoreconstructions by providing calibrations that more closely approach universal relationships, reduce bias, and ease the estimation of mean values and nonstandardized anomalies.
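Why standardized anomalies hide calibration changes can be shown in a few lines: any T-only exponential calibration gives T as a linear function of ln(Mg/Ca), so two such calibrations differ only by a shift and a positive scaling, both of which z-scoring removes. The Mg/Ca values and both sets of coefficients below are hypothetical. With an additive nonthermal term, as in equations 3, 5, and 6, the mapping is no longer exactly linear in ln(Mg/Ca), so the equivalence would hold only approximately.

```python
import numpy as np

def standardize(x):
    """Convert a series to standardized anomalies (z-scores)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

mgca = np.array([3.1, 3.3, 3.0, 2.8, 2.9, 3.2])   # hypothetical downcore record (mmol/mol)
T_old = np.log(mgca / 0.38) / 0.090                # hypothetical "published" calibration
T_new = np.log(mgca / 0.50) / 0.060                # hypothetical revised calibration

print(f"means: {T_old.mean():.1f} vs {T_new.mean():.1f}")                  # differ (bias)
print(f"std devs: {T_old.std(ddof=1):.2f} vs {T_new.std(ddof=1):.2f}")     # differ (scaling)
print("identical anomalies:", np.allclose(standardize(T_old), standardize(T_new)))  # True
```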

Box plot showing the median, interquartile range, and outliers of Mg/Ca-based temperatures compiled by McGregor et al. (2015; gray) and recalculated from our optimal multivariate relationships (equations 3–6; yellow) in 200-year bins spanning the past two millennia.

Application of Mg/Ca calibrations typically uses an inverse approach to infer past temperatures by assuming Mg/Ca is a univariate function of T. This approach is only approximately valid when Mg/Ca is also influenced by secondary variables, as we find for all species. The degree to which an inverse approach can be used to infer T when additional independent variables are shown to be significant depends on the relative magnitude of the secondary terms and on the degree to which those dependencies can be skillfully validated and/or controlled. For example, a univariate inverse approach is likely still valid for our combined Neogloboquadrina relationship, given that size is easily constrained during analysis. The same approach may also still be reasonable for the G. ruber and G. bulloides relationships that include Ωdeep, as long as they are applied over relatively short intervals of less than a few thousand years, during which deep carbonate chemistry was likely relatively constant. They may be less valid over glacial-interglacial cycles, when larger, spatially heterogeneous variations in Ωdeep may have occurred (Yu et al., 2014; Zeebe & Marchitto, 2010). In contrast, assuming Ωshallow to be constant in order to apply an inverse approach to G. inflata Mg/Ca is tenuous. While Ωshallow likely varied by less than 10% during the Holocene, variations of 20–25% occurred on various timescales throughout the Cenozoic (Zeebe, 2012), and the assumption of constant Ωshallow could compromise the accuracy of G. inflata paleotemperature reconstructions on most timescales.
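The sensitivity of a G. inflata inversion to the constant-Ωshallow assumption can be estimated directly. Solving Mg/Ca = a·exp(b·T)·Ω^c for T gives T = [ln(Mg/Ca / a) − c·ln Ω]/b, so if the true Ωshallow exceeds the assumed value by a factor f, the inferred temperature is offset by c·ln(f)/b. The sketch below uses the central coefficients from the abstract, a hypothetical "true" state (12 °C, Ωshallow = 3.6), and Ωshallow changes of 10–25%, the range discussed above; `invert_inflata` is a hypothetical helper.

```python
import math

A, B, C = 1.737, 0.065, -0.734   # central values, G. inflata model (abstract)

def invert_inflata(mgca, omega_assumed):
    """Invert Mg/Ca = A * exp(B*T) * Omega**C for T, given an assumed Omega_shallow."""
    return (math.log(mgca / A) - C * math.log(omega_assumed)) / B

T_true, omega_true = 12.0, 3.6                      # hypothetical past conditions
mgca = A * math.exp(B * T_true) * omega_true ** C   # "measured" Mg/Ca
for change in (0.10, 0.20, 0.25):
    # Omega_shallow was higher than the modern value used in the inversion
    omega_modern = omega_true / (1 + change)
    bias = invert_inflata(mgca, omega_modern) - T_true
    print(f"{change:.0%} higher Omega_shallow -> {bias:+.1f} °C bias")
```

This prints offsets of about −1.1, −2.1, and −2.5 °C; the sign reverses if Ωshallow was lower than assumed. A 20% change thus corresponds to roughly a 2 °C error under these central coefficients.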

When secondary variables cannot be considered constant, an inverse approach may still be possible if nonthermal variables can be independently estimated. For example, carbonate parameters may be estimated by measuring U/Ca, B/Ca, and the boron isotopic composition of the same foraminifera used for Mg/Ca analyses (Hendry et al., 2009; Rae et al., 2011; Russell et al., 2004). These additional geochemical proxies come with their own uncertainties, however, and are likely also influenced by secondary variables that have not been rigorously explored through independent validation exercises. In such cases, validation exercises similar to the ones we employ here are warranted, as are Bayesian techniques that give probabilistic estimations of the likelihood that various combinations of predictors gave rise to the observations, as has been done for tree ring data (Tolwinski-Ward et al., 2013).

A forward modeling approach that combines PSMs and simulation estimates of all relevant environmental variables more naturally accommodates multivariate calibrations and presents an attractive alternative to the inverse approach (Dee et al., 2016; Evans et al., 2013). Along with archive models that consider age-model uncertainty, accurate PSMs are fundamental to the incorporation of Mg/Ca into paleoclimate data assimilation efforts (Hakim et al., 2016) and its use in pseudoproxy experiments that evaluate paleoclimate data networks and uncertainty in paleoclimate reconstructions (Christiansen & Ljungqvist, 2016; Evans et al., 2014; Wang et al., 2014). Our results suggest that generating accurate forward models of N. pachyderma, N. incompta, G. ruber, and G. bulloides Mg/Ca is relatively straightforward in recent millennia and can be done by combining standard general circulation model temperature fields at appropriate depth habitats and seasons with an assumed modern Ωdeep value. Conversely, accurate PSMs for the Mg/Ca of G. inflata across all timescales and other species on timescales for which Ωdeep may have varied may require general circulation models with a biogeochemical module (e.g., Cossarini et al., 2017) capable of realistically estimating the evolution of Ω.
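A minimal forward-model sketch is shown below. It encodes the central coefficient values of the optimal models as given in the abstract (uncertainties omitted) and drives the G. ruber model with a hypothetical temperature series and a fixed, hypothetical modern Ωdeep. The function name and interface are illustrative assumptions, not an existing PSM package, and the Ωdeep threshold discussed in section 4.2 is not represented because its implementation is not described in this section.

```python
import math

def psm_mgca(species, T, size=None, omega_deep=None, omega_shallow=None):
    """Hypothetical forward model for Mg/Ca (mmol/mol); central values from the abstract."""
    if species == "Neogloboquadrina":   # equation 6: T and size (units assumed microns)
        return 0.186 * math.exp(0.095 * T) + 2.95e-3 * size
    if species == "G. ruber":           # equation 3: T and deep saturation state
        return 0.685 * math.exp(0.058 * T) + 0.928 * omega_deep
    if species == "G. inflata":         # equation 4: T and shallow saturation state
        return 1.737 * math.exp(0.065 * T) * omega_shallow ** -0.734
    if species == "G. bulloides":       # equation 5: T and deep saturation state
        return 0.623 * math.exp(0.079 * T) + 0.548 * omega_deep
    raise ValueError(f"no model for {species!r}")

# Hypothetical model temperatures (°C) at an assigned G. ruber habitat depth and season,
# with a hypothetical modern Omega_deep held constant.
model_T = [26.1, 26.4, 25.9, 26.8]
pseudo_mgca = [psm_mgca("G. ruber", t, omega_deep=1.5) for t in model_T]
print([round(m, 2) for m in pseudo_mgca])
```

Holding Ωdeep fixed is the recent-millennia simplification described above; on longer timescales the same function could instead be fed a simulated Ω field.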

6 Conclusions

Through a series of calibration and independent validation exercises we identify the variables that predict the Mg/Ca of five species of planktic foraminifera (N. pachyderma, N. incompta, G. ruber, G. inflata, and G. bulloides) with the greatest predictive skill. Results from these analyses suggest that temperature is a primary control on the Mg/Ca of all species, supporting the basic premise of its interpretation as a paleothermometer. However, all species show evidence for secondary effects on Mg/Ca associated with the mean size fraction analyzed and/or carbonate chemistry at either the depth of calcification or deposition (equations 3–6).

Contrary to some previous results, we find that salinity does not have a significant control on the Mg/Ca of any species, possibly indicating that calibrations including salinity without independently validating its influence could reflect overfitting. While we suggest our proposed relationships are broadly realistic, we stress that excluding salinity or any other variable from optimal relationships should not be considered to be conclusive evidence that it does not influence Mg/Ca. Instead, it may indicate that a small sensitivity cannot be resolved with existing data and Mg/Ca observational uncertainty. Even though our analyses are conducted on globally distributed data, the range of salinity and some other variables covered by our core-top network is relatively narrow compared to many culture studies that suggest an effect. The optimal relationships we derive may therefore evolve as additional, as well as possibly higher-precision, Mg/Ca data become available.

In most cases, our proposed temperature sensitivities are similar to, but somewhat lower than, the canonical sensitivities of 9–10% per degree Celsius. However, because our estimates are based on far more data than previous investigations, we suggest that they more accurately represent the true Mg/Ca-T relationships on global scales. The paleoclimatic consequences of this reduced temperature sensitivity are relatively small, but, all else being equal, it will slightly increase the temperature change inferred from a given measured change in Mg/Ca.
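The arithmetic behind this statement can be sketched under a simplifying assumption: if Mg/Ca = A·exp(b·T) with everything else held constant, A cancels in a ratio, each degree changes Mg/Ca by 100·(exp(b) − 1) percent, and a measured ratio R between two samples implies ΔT = ln(R)/b. The 20% Mg/Ca change and the comparison between b = 0.09 and the G. bulloides value of 0.079 from the abstract are illustrative choices, and the helper names are hypothetical; species with additive secondary terms do not reduce to this simple ratio form.

```python
import math


def percent_per_degree(b: float) -> float:
    """Percent change in Mg/Ca per 1 degC implied by Mg/Ca = A * exp(b * T)."""
    return 100.0 * (math.exp(b) - 1.0)


def inferred_delta_t(mgca_ratio: float, b: float) -> float:
    """Temperature change implied by a measured ratio (Mg/Ca)_2 / (Mg/Ca)_1,
    assuming a purely exponential relation with all else equal (A cancels)."""
    if mgca_ratio <= 0:
        raise ValueError("Mg/Ca ratio must be positive")
    return math.log(mgca_ratio) / b


if __name__ == "__main__":
    ratio = 1.20  # hypothetical 20% increase in Mg/Ca between two samples
    for b in (0.09, 0.079):
        pct = percent_per_degree(b)
        dt = inferred_delta_t(ratio, b)
        print(f"b={b}: {pct:.1f} %/degC, dT = {dt:.2f} degC")
```

For a 20% increase, b = 0.09 (about 9.4% per °C) implies a warming of about 2.0 °C, whereas b = 0.079 (about 8.2% per °C) implies about 2.3 °C.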

The finding that secondary variables often have an appreciable influence on foraminifera Mg/Ca complicates the typical inverse approach for estimating temperature from measured Mg/Ca. Where secondary variables can be constrained with confidence (e.g., size fraction), an inverse approach is likely still valid, but where they cannot (e.g., carbonate system variables), it may be unreasonable to rely on an inverse approach. In these cases, forward modeling of multivariate relationships between Mg/Ca and environmental variables represents an attractive alternative that can still yield valuable paleoceanographic insight when combined with climate model simulations.
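The dependence of an inverse estimate on the secondary variable can be sketched by solving an additive relationship of the form Mg/Ca = a·exp(b·T) + c·x for T. The helper below is hypothetical; it uses the central G. ruber coefficients from the abstract with x = Ωdeep, and the same function would apply to N. pachyderma + N. incompta with x = size. The measured Mg/Ca and both Ωdeep values are illustrative assumptions.

```python
import math


def invert_additive(mgca: float, a: float, b: float, c: float, x: float) -> float:
    """Solve Mg/Ca = a * exp(b * T) + c * x for T, given the secondary variable x."""
    thermal_part = mgca - c * x
    if thermal_part <= 0:
        raise ValueError("secondary term exceeds measured Mg/Ca; no real solution")
    return math.log(thermal_part / a) / b


# Central G. ruber coefficients from the abstract (uncertainties ignored).
RUBER = dict(a=0.685, b=0.058, c=0.928)


if __name__ == "__main__":
    measured = 4.0  # hypothetical G. ruber Mg/Ca (mmol/mol)
    for omega_deep in (1.0, 1.2):  # hypothetical Omega_deep values
        t = invert_additive(measured, x=omega_deep, **RUBER)
        print(f"Omega_deep={omega_deep}: T = {t:.2f} degC")
```

With these placeholder inputs, raising the assumed Ωdeep from 1.0 to 1.2 lowers the reconstructed temperature by roughly 1 °C, which illustrates why an unconstrained carbonate system term undermines a univariate inversion.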

Perhaps most importantly, our general approach of using a subset of withheld data to validate the significance of environmental variables is by no means unique to Mg/Ca and should be applied to other paleoceanographic uses of geochemical data. Many geochemical proxies are interpreted as univariate systems out of convenience and to facilitate an inverse approach to paleoceanographic reconstruction, but given the complexity and uncertainty of foraminiferal biomineralization (Erez, 2003), it is reasonable to suspect that secondary variables also affect other trace element ratios and isotopic compositions. The same argument applies to nonforaminiferal paleoceanographic proxies whose mechanistic connection between the measured quantity and environmental variables is similarly uncertain. Future efforts should therefore focus on conducting independent validation exercises on any paleoclimate proxy with enough data to support robust calibrations even when a subset of the data is withheld (e.g., Neukom et al., 2018; Phipps et al., 2013), and on developing controlled studies that build physical, chemical, and biological process understanding against which empirical model-fitting exercises can be compared. Doing so will promote more accurate calibrations that are less prone to overfitting, ultimately resulting in an improved understanding of past ocean variability.
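A minimal sketch of withheld-data validation combined with greedy forward selection is shown below on synthetic data. It is a simplified stand-in, not the authors' procedure: it fits a log-linear model by ordinary least squares rather than the nonlinear forms in the abstract, uses a single random 70/30 split, and replaces a formal significance test with an arbitrary `min_gain` threshold on validation RMSE. All function names, the synthetic data, and the thresholds are hypothetical; by construction, ln(Mg/Ca) depends on T and ln(Ω), while the candidate called `null` has no effect.

```python
import math
import random


def solve_least_squares(rows, y):
    """OLS via the normal equations (X^T X) beta = X^T y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(rows[0])
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= f * aug[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        s = sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (aug[i][k] - s) / aug[i][i]
    return beta


def synthetic_core_tops(n=1000, seed=0):
    """Hypothetical data: ln(Mg/Ca) depends on T and ln(Omega) by construction;
    'null' is a candidate predictor with no true effect."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        temp = rng.uniform(0.0, 30.0)
        omega = rng.uniform(1.0, 4.0)
        null = rng.uniform(33.0, 37.0)
        y = math.log(0.5) + 0.09 * temp - 0.5 * math.log(omega) + rng.gauss(0.0, 0.05)
        data.append({"T": temp, "lnOmega": math.log(omega), "null": null, "y": y})
    return data


def holdout_rmse(data, names, train_frac=0.7, seed=1):
    """Fit y = b0 + sum(b_i * x_i) on a training subset and return the RMSE
    on the withheld subset (the same split is used for every call)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(data))
    train = [data[i] for i in idx[:cut]]
    test = [data[i] for i in idx[cut:]]

    def design(row):
        return [1.0] + [row[n] for n in names]

    beta = solve_least_squares([design(r) for r in train], [r["y"] for r in train])
    resid = [r["y"] - sum(b * x for b, x in zip(beta, design(r))) for r in test]
    return math.sqrt(sum(e * e for e in resid) / len(resid))


def forward_select(data, candidates, start=("T",), min_gain=1e-3):
    """Greedy additive selection: add the candidate that most lowers the
    holdout RMSE; stop when the best improvement is below min_gain."""
    chosen = list(start)
    best = holdout_rmse(data, chosen)
    remaining = [c for c in candidates if c not in chosen]
    while remaining:
        trial = {c: holdout_rmse(data, chosen + [c]) for c in remaining}
        pick = min(trial, key=trial.get)
        if best - trial[pick] < min_gain:
            break
        chosen.append(pick)
        remaining.remove(pick)
        best = trial[pick]
    return chosen, best


if __name__ == "__main__":
    print(forward_select(synthetic_core_tops(), ["lnOmega", "null"]))
```

In this toy setup the Ω term is retained because it lowers the error on withheld data, whereas the no-effect candidate is rejected because it does not, which is the overfitting guard the paragraph above argues for.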

Acknowledgments

Data supporting this manuscript are provided in supporting information and Saenger and Evans (2019). Compilations of Mg/Ca will be archived with paleoceanography data at NOAA's National Centers for Environmental Information. We acknowledge the help of Stephan Barker, David Thornalley, Paola Moffa-Sanchez, and Jimin Yu for clarifications and assistance in acquiring data from Elderfield and Ganssen (2000). Jess Tierney provided valuable discussion and data sharing. Yair Rosenthal and an anonymous reviewer provided valuable comments that improved this work. Support for this work was provided by NSF awards OCE1536418 and OCE1536249.