Volume 32, Issue 2 p. 104-122
Research Article

Defining uncertainty and error in planktic foraminiferal oxygen isotope measurements

A. J. Fraass

Now at Department of Paleobiology, National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia, USA

Department of Geosciences, University of Massachusetts Amherst, Amherst, Massachusetts, USA

Correspondence to: A. J. Fraass

C. M. Lowery

Department of Geosciences, University of Massachusetts Amherst, Amherst, Massachusetts, USA

Now at Institute for Geophysics, University of Texas at Austin, Austin, Texas, USA

First published: 19 January 2017

Abstract

Foraminifera are the backbone of paleoceanography. Planktic foraminifera are one of the leading tools for reconstructing water column structure. However, there are unconstrained variables when dealing with uncertainty in the reproducibility of oxygen isotope measurements. This study presents the first results from a simple model of foraminiferal calcification (Foraminiferal Isotope Reproducibility Model; FIRM), designed to estimate uncertainty in oxygen isotope measurements. FIRM uses parameters including location, depth habitat, season, number of individuals included in measurement, diagenesis, misidentification, size variation, and vital effects to produce synthetic isotope data in a manner reflecting natural processes. Reproducibility is then tested using Monte Carlo simulations. Importantly, this is not an attempt to fully model the entire complicated process of foraminiferal calcification; instead, we are trying to include only enough parameters to estimate the uncertainty in foraminiferal δ18O records. Two well-constrained empirical data sets are simulated successfully, demonstrating the validity of our model. The results from a series of experiments with the model show that reproducibility is not only largely controlled by the number of individuals in each measurement but also strongly a function of local oceanography if the number of individuals is held constant. Parameters like diagenesis or misidentification have an impact on both the precision and the accuracy of the data. FIRM is a tool to estimate isotopic uncertainty values and to explore the impact of myriad factors on the fidelity of paleoceanographic records, particularly for the Holocene.

Key Points

  • Open source model for planktic foraminiferal oxygen isotope reproducibility
  • Model performs experiments using several variables to identify changes in precision
  • Number of individuals is important, but local oceanography has the most control

1 Introduction

Planktic foraminifera have been studied for their stable isotopic signals since the pioneering work of Urey [1947, 1948] and Emiliani [1954, 1955] and have since evolved into the primary carriers of paleoclimate data in marine environments. Back in those heady days, the limitations of mass spectrometry required the use of hundreds of individual foraminiferal tests to return a usable value. This had the fortunate effect of averaging short-term seasonal to decadal variations inherent in planktic foraminifera, which usually have life-spans of about a month [Hemleben et al., 1989; Bijma et al., 1990], into single data points often representing a several thousand year average. With the advent of more sensitive mass spectrometers, smaller sample masses (and thus fewer individuals per data point) have become possible to analyze. This provides a tremendous advantage to paleoceanographers, allowing the generation of longer time series with the same amount of time spent at the microscope, but speed and ease have come with the cost of reduced precision.

The use of increasingly smaller sample sizes has decreased reproducibility because of seasonal to interannual variability within a single sample. Although some studies specifically target this variability, as when studying the annual cycle or El Niño–Southern Oscillation-scale (ENSO-scale) climate changes with single specimen analysis [Ganssen et al., 2011; Leduc et al., 2009], the majority of paleoceanographic studies aim to use a single data point to represent an interval of time representing the average of hundreds to tens of thousands of years. With fewer tests per sample, the differences between individual specimens will result in less precise values for each sample. Sometimes authors acknowledge this and record the number of specimens within each point or amalgamate a large number of crushed individual tests into each data point [e.g., Wade and Kroon, 2002; Ganssen et al., 2011]. Unfortunately, most authors do not. At best, many papers include a discussion of analytical error associated with the mass spectrometer used while ignoring the possibly larger uncertainty associated with the number of individuals employed.

This presents a problem. If foraminifera are the primary carriers of paleoclimatic signals in the oceans, then paleoclimate reconstructions are only as robust as the foraminiferal data on which they are built. In an attempt to correct this, we have created a simple model (FIRM; Foraminiferal Isotope Reproducibility Model) which includes steps analogous to the processes in which individual tests form and are preserved. The paleoceanographic community can use this model as a tool to estimate realistic environmental parameters for a particular locality to obtain an uncertainty value for their sample (with caveats; see section 5).

To model the planktic foraminiferal isotope system, we take modern water column temperature and salinity data from the World Ocean Atlas 2013 (WOA13) data set [Locarnini et al., 2013; Zweng et al., 2013] to estimate δ18Osw and use well-established equations [e.g., Spero et al., 2003] to calculate δ18O values for individual foraminiferal tests (δ18Oforam). We then repeat this process to build a synthetic data set of individual planktic foraminiferal δ18O values for that particular locality and within a given range of water depths. A number of tests can then be amalgamated to create a single data point in order to simulate the process of using a mixture of specimens in a single mass spectrometer measurement. Through the use of this simplistic model, we alter various parameters to simulate the effects of natural processes (e.g., seasonality), postdepositional processes (e.g., diagenesis), and laboratory practices (e.g., accidental inclusion of multiple species within a single analysis). This allows us to demonstrate the potential uncertainty inherent in planktic foraminiferal isotope records recovered from different regions and processed under different conditions, and to demonstrate the extent to which increasing the sample size can mitigate this uncertainty. Two data sets are simulated using FIRM to demonstrate how the model holds up against real data (section 4.7).

FIRM was created in the widely used and open-source statistical computing environment R [R Core Team, 2015] with the hope that the community will use and modify the tool. It is available for download in the supporting information (S2 and S3) or on GitHub (https://github.com/Fraass/FIRM). A complete walkthrough explaining the model is also available in the supporting information (S4).

2 Materials and Methods

To understand the ways error and uncertainty are introduced into stable isotope measurements of foraminiferal calcite, it is first necessary to understand the processes by which foraminifera record ambient conditions in the stable isotopes of their tests and the different types of biological, oceanographic, postdepositional, and laboratory variables that can affect that record. The following is a brief summary of the foraminiferal isotope systematics, followed by a discussion of the most significant natural and laboratory variables that may influence planktic foraminiferal isotope values in a manner apart from that which the researcher is attempting to investigate. This should not be considered an exhaustive list. An additional summary of some common effects on the oxygen isotope value of foraminiferal calcite can be found in Table 1.

Table 1. Common Factors Which Impact the Oxygen Isotope Ratio of Foraminiferal Calcite

| Factor | Effect on δ18O | Notes | Reference | Modeled in FIRM |
| --- | --- | --- | --- | --- |
| Increase in temperature | Decrease | 1°C = ~0.25‰ | McCrea [1950], Emiliani [1955], Shackleton and Opdyke [1973], and Kim and O'Neil [1997] | Yes |
| Increase in ice volume | Increase | 10 m = ~0.10‰ | Emiliani [1955], Shackleton [1974], and Zachos et al. [2001] | No |
| Increase in pH (i.e., CO₃²⁻) | Decrease | 1 unit pH = ~1.42‰ | Spero [1992], Spero and Lea [1996], and Zeebe [1999, 2001] | No |
| Light | Decrease | Only for species with symbionts | Spero [1992], Bemis et al. [1998], and Bijma et al. [1999] | Yes |
| Salinity | Increase | | Urey [1947], Craig [1965], and Rohling and Cooke [1999] | Yes |
| Diagenesis | Typically, convergence with depleted benthic values | | Pearson et al. [2001] | Yes |

Planktic foraminifera secrete calcium carbonate tests in rough isotopic equilibrium with surrounding seawater, incorporating the calcium and bicarbonate ions into their test in a reducing microenvironment over a period of hours (e.g., Bé et al. [1977]; see summaries in Hemleben et al. [1989], Kitazato and Bernhard [2014], and de Nooijer et al. [2014]) as shown by the equation
$$\mathrm{Ca^{2+}} + 2\,\mathrm{HCO_3^-} \rightleftharpoons \mathrm{CaCO_3} + \mathrm{CO_2} + \mathrm{H_2O} \tag{1}$$

The oxygen isotopic ratio of the foraminiferal calcite is a function of water temperature and the stable isotopic composition of seawater (which is strongly influenced by salinity and sometimes referred to simply as “salinity”), which in turn is a function of global ice volume and local evaporation-precipitation, riverine flux, etc. [Urey, 1947; McCrea, 1950; Emiliani, 1954, 1955; Shackleton and Opdyke, 1973; Spero and Williams, 1989; Spero et al., 1991; Schweitzer and Lohmann, 1991; Ravelo and Fairbanks, 1992; Wolff et al., 1998; Spero, 1998]. These seemingly simple theoretical relationships are complicated by the biologic filter which translates them from ambient seawater to foraminiferal calcite. That filter includes the disparate life strategies, depth habitats, and reproductive habits of foraminiferal species and introduces potentially significant uncertainty, usually addressed under the catch-all term “vital effects” [e.g., Shackleton and Opdyke, 1973]. Additionally, seasonal, annual, or decadal changes in the environment may create significant isotopic offsets between specimens found in the same sample. Finally, uncertainty may also be introduced by postdepositional processes and diagenesis and in the laboratory by mass spectrometer machine error and decisions about what species, size fraction, and number of individuals to analyze.

2.1 Biological Variability

2.1.1 Vital Effects

Foraminiferal calcite does not directly record ambient seawater temperature and salinity, as an abiogenic calcite precipitate might. Instead, these important proxy signals are overprinted by biological processes like life habit and growth rate, termed vital effects [Urey et al., 1951], unique to each individual species of foraminifera. Vital effects are the cause of the difference in the theoretical equilibrium between inorganic calcite and seawater at ambient temperature and salinity and the experimentally observed disequilibrium fractionation in foraminifera [e.g., Shackleton, 1974; Erez, 1978]. Foraminifera exert significant control on the geochemistry of their microenvironment, and the extent of this control varies from species to species. This control can be driven by a variety of mechanisms, including some discussed below (e.g., the presence of photosymbionts and different depth habitats), all of which exert some influence on the carbonate ion (CO₃²⁻) concentration of the microenvironment in which the foraminifer grows its test [Spero, 1992; Spero and Lea, 1993; Spero et al., 1997; Zeebe, 1999, 2001]. While this can be partially controlled by selecting a single species for isotopic analysis, certain taxa can still exhibit strong intraspecific variability [e.g., Ganssen et al., 2011].

2.1.2 Photosymbionts

Eleven extant species of planktic foraminifera living in the photic zone incorporate photosymbionts into their life strategy [e.g., Hemleben et al., 1989; Norris, 1998]. These photosymbionts exhibit a strong influence on the microenvironment of their host and can offset stable isotope values significantly. For example, Spero [1992] cultured Orbulina universa, a symbiont-bearing mixed layer species, under a variety of light conditions, from total darkness to high irradiance. O. universa living in total darkness and low light precipitated their tests in equilibrium with expected values. Specimens living under high light, however, produced tests that were up to 0.3‰ depleted in oxygen isotopes [Spero, 1992]. Light levels in the upper water column can be expected to vary seasonally due to the angle of the Sun, cloud cover, turbidity, etc.
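The size of the light effect can be illustrated with a short sketch. It inverts the low-light and high-light O. universa paleotemperature calibrations as they are commonly quoted from Bemis et al. [1998]; the coefficients are reproduced here as an assumption to be checked against that paper, the −0.27‰ scale conversion is the conventional one, and nothing in this section confirms that FIRM uses exactly this form. The temperature and seawater value in the example are hypothetical.

```python
# O. universa calibrations as commonly quoted from Bemis et al. [1998] (assumed here):
#   T = a - 4.80 * (d18O_calcite - d18O_water), a = 16.5 (low light) or 14.9 (high light)
SLOPE = 4.80
INTERCEPT = {"low_light": 16.5, "high_light": 14.9}
VSMOW_TO_VPDB = -0.27  # conventional offset applied to seawater values (assumption)


def d18o_calcite(temp_c, d18o_sw_vsmow, light):
    """Predicted calcite d18O (per mil, VPDB) for a temperature and seawater d18O."""
    d18o_sw_vpdb = d18o_sw_vsmow + VSMOW_TO_VPDB
    return d18o_sw_vpdb + (INTERCEPT[light] - temp_c) / SLOPE


# Hypothetical mixed-layer conditions: 20 degC water with d18Osw of 0.5 per mil.
low = d18o_calcite(20.0, 0.5, "low_light")
high = d18o_calcite(20.0, 0.5, "high_light")
print(round(high - low, 2))  # about -0.33 per mil, independent of temperature
```

The offset between the two calibrations is constant (the intercept difference divided by the slope), which is consistent with the depletion of up to 0.3‰ reported for high-light specimens.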

2.1.3 Depth Zonation

Planktic foraminifera occupy a variety of depth habitats in the upper water column, from the shallow mixed layer to 1000 m depth [e.g., Hemleben et al., 1989]. These life positions are summarized in Figure 1. Taxa from different depth levels have different average life-spans, often migrate in the water column throughout their life-span, and record unique temperatures and salinities in their tests [e.g., Kozdon et al., 2009]. While the selection of an individual species for isotopic analysis will restrict variations in calcification depth somewhat, depth habitat remains important when comparing species with different life habits or when attempting to reconstruct a “bulk” signal from multiple species (a practice occasionally employed in older records when indurated sediments or sample size limitations prohibit single species analysis). Modeling foraminiferal calcification as a single depth is an oversimplification, however, as foraminifera calcify at varying depths throughout their lifecycle, with some undergoing gametogenesis at depths far below their previous habitat. Detailed life history modeling, while important, is beyond the scope of this current effort.

Figure 1. Example thermocline, halocline, and depth habitats of planktic foraminifera for IODP Site U1406 and ODP Site 803. Seasonal temperature (blue [Locarnini et al., 2013]) and salinity (green [Zweng et al., 2013]) profiles together are used to calculate a δ18Oforam profile (black; see text for details). Forams (Globigerinoides ruber, Globigerina bulloides, Orbulina universa, Globigerinoides sacculifer, Neogloboquadrina dutertrei, Globorotalia menardii, Globorotalia tumida, and Globorotalia truncatulinoides) after Kennett and Srinivasan [1983] and Schiebel and Hemleben [2005]; depth habitats after Schiebel and Hemleben [2005]. Habitats are rough estimates of mean calcification depths, while the grey box is a rough estimate of a “typical” range for depth habitat for planktic foraminifers as a group.

2.1.4 Growth Timing

The number of foraminifera that reach sexual maturity depends on food availability [Schiebel and Hemleben, 2005]. For example, the highest abundance of mature tests in the North Atlantic occurs during the spring phytoplankton bloom [Schiebel et al., 1995]; seasonal plankton blooms also account for most of the foraminiferal abundance in the Arctic [Volkmann, 2000] and Antarctic [Spindler and Dieckmann, 1986]. In mesotrophic to oligotrophic environments, meanwhile, far fewer individuals survive to adulthood [Schiebel and Hemleben, 2005]. This means that regions with strong seasonal control on productivity will preferentially preserve isotope signals from the most productive time of year. This has the effect of skewing the observed δ18O values to one season (the peak growth season) and away from others.
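A minimal sketch of this seasonal skew, assuming that the sediment signal can be approximated as a flux-weighted mean of seasonal δ18O values. The seasonal values and relative fluxes below are hypothetical numbers chosen only for illustration, and the function name is not taken from FIRM.

```python
def flux_weighted_mean(d18o_by_season, flux_by_season):
    """Mean d18O of a sample when each season contributes in proportion to its test flux."""
    total_flux = sum(flux_by_season.values())
    if total_flux <= 0:
        raise ValueError("total flux must be positive")
    weighted = sum(d18o_by_season[s] * flux_by_season[s] for s in d18o_by_season)
    return weighted / total_flux


# Hypothetical seasonal d18O values (per mil) at one site.
d18o = {"winter": 1.0, "spring": 0.5, "summer": -0.5, "autumn": 0.0}

# Equal production in every season versus a strong spring bloom.
even_flux = {"winter": 1.0, "spring": 1.0, "summer": 1.0, "autumn": 1.0}
bloom_flux = {"winter": 1.0, "spring": 7.0, "summer": 1.0, "autumn": 1.0}

print(round(flux_weighted_mean(d18o, even_flux), 2))
print(round(flux_weighted_mean(d18o, bloom_flux), 2))
```

With a strong bloom the sample mean is pulled toward the spring value, even though the annual mean of the water column has not changed.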

2.1.5 Intraspecies Variability

The late 1990s saw a revolution in foraminiferal taxonomy, as researchers studying foraminiferal genetics realized that significant biological variation existed within commonly accepted morphospecies [e.g., Darling et al., 1996, 1997, 1999; Huber et al., 1997; de Vargas et al., 1997, 1999]. To date, 54 cryptic species have been identified in modern planktic foraminiferal populations, including many commonly used for paleoceanographic proxy work (Kucera and Darling [2002], Darling and Wade [2008], and Morard et al. [2013]; though see André et al. [2014]). These genetically unique species often display differing “ecological preferences” [Huber et al., 1997; Kuroyanagi and Kawahata, 2004; Morard et al., 2009, 2013; Aurahs et al., 2011]; Kucera and Darling [2002] emphasize the importance of distinguishing between genotypes when selecting specimens for geochemical analysis [see also Thirumalai et al., 2014]. The importance of this excellent advice is starkly illustrated by the δ18O variation between two common genotypes of O. universa, morphologically distinguished as a thick-shelled and a thin-shelled variety. These two genotypes are known to calcify at different depths and display an offset in δ18O values of up to 0.5‰ [Deuser et al., 1981; Deuser, 1987; Marshall et al., 2015]. Thus, a wide variation of values may exist in a coeval population of a single morphospecies.

Individual species also sometimes display a size-dependent fractionation. For example, Spero and Lea [1996] document a size effect on δ18O values in Globigerina bulloides (~0.8‰ from smallest to final chamber), which they attributed to size-dependent fractionation [see also D'Hondt and Zachos, 1993]. Ezard et al. [2015] further documented the size-dependent fractionation effect in a variety of species. The solution is the common practice of picking from a restricted size fraction.

2.2 Oceanographic Variability

Variations in sedimentation rate and sample size mean that an individual deep-sea sediment sample may represent tens of thousands of years or merely tens of years; this is an important factor in understanding the uncertainty of isotopic measurements. Temporal variability can lead to several issues, from important seasonal or El Niño–Southern Oscillation (ENSO) variation [e.g., Thirumalai et al., 2013] to Milankovitch-scale (10 kyr to >1 Myr) variation. These changes all manifest as changes in salinity (water mass changes, upwelling, and changes in evaporation-precipitation) and temperature (due to climate forcing or oceanographic changes in water mass or upwelling), which are subsequently recorded by foraminifera [e.g., Kroon and Ganssen, 1989; Peeters et al., 2002; Marshall et al., 2015]. Additionally, changes in local oceanography can change the composition of the assemblage and/or result in shifts in the dominant season in the record (which could manifest in a sample missing the target species, for example). FIRM can handle seasonal changes (see below) but is currently not equipped to handle longer-term sources of uncertainty. Temporal shifts in oceanographic properties, like large-scale water mass changes, cannot be represented within the current model.

2.2.1 Seasonality

The most common and probably most significant source of variability within a sample is seasonality. Local variations in oceanographic parameters, including water mass changes, upwelling, precipitation, and temperature, can be found in nearly every region of the world ocean, although they are most common in high latitudes and regions prone to monsoons. For example, see Figure 1 for a comparison of seasonal temperature and salinity profiles for an equatorial site and a temperate northern site [Locarnini et al., 2013; Zweng et al., 2013]. This demonstrates the range of possible seasonal changes at various types of study sites; obviously, sites with strong seasonal shifts will have higher errors associated with seasonality.

Additionally, many localities, especially in the middle to high latitudes, have higher planktic foraminiferal abundances during particular times of the year or a succession of dominant species from season to season [e.g., Deuser et al., 1981; Thunell et al., 1983; Deuser, 1987]. In a sediment trap study in the northeast Pacific at 50°N latitude, Sautter and Thunell [1989] found a seasonal succession of foraminifera that was dominated by normal high-latitude taxa (Neogloboquadrina pachyderma, Neogloboquadrina incompta, and Globigerina quinqueloba) but included temperate taxa (O. universa and G. bulloides) during warmer months. A stable isotope data set based on these seasonally common species would result in data skewed toward the warmer months. Obviously, these variables are highly site dependent and will vary based on latitude, water mass regime, upwelling or downwelling, sea ice, and so on.

2.2.2 ENSO-Scale Variability

The impact of multiyear variability, such as the El Niño–Southern Oscillation (ENSO), is difficult to determine. It is logical to assume that ENSO-driven temperature changes should result in a wide range of δ18Oforam values within a single sample from ENSO-sensitive regions, and indeed, analyses of individual foraminiferal tests have been used to reconstruct ENSO variability in the geologic past [e.g., Koutavas et al., 2006; Leduc et al., 2009; Khider et al., 2011]. Recent statistical interrogation of the uncertainty inherent in these individual foraminiferal analyses by Thirumalai et al. [2013], however, found that these analyses may actually be dominated by seasonal cycles in some localities. While we do not include an ENSO term in our model, variability at this scale can be an important factor to consider.

2.3 Postdepositional Variability

2.3.1 Bioturbation

Bioturbation can play a significant role in altering a time series from a sedimentary sequence [Bard, 2001]. By the stirring of sediment within that sequence, individual grains are moved up and down the column, attenuating the recovered data set [Bard, 2001]. The importance of that smoothing is a function of the sampling interval, the degree of bioturbation within those sediments, and the underlying record (e.g., if the values are invariant through the interval of interest, then bioturbation has no effect). The most rudimentary model of bioturbation, with a constant sedimentation rate and a constant rate of bioturbation, would distribute individual foraminifera in a roughly normal distribution about the depth of original deposition (though see more complicated techniques in Trauth [1998]). In fact, the common practice of taking a running mean through high-resolution data sets [e.g., Coxall and Wilson, 2011] roughly mimics this process, pulling data from stratigraphically contiguous samples together. Bioturbation is not modeled here; our efforts are to model an individual sample, without the inclusion of a time or depth dimension.
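The rudimentary bioturbation model described above can be sketched as a Gaussian smoothing of a depth series. This is not part of FIRM, which does not model bioturbation; the function names, the mixing width (in units of sample spacing), and the test series are all hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(sigma_samples, half_width):
    """Normalized Gaussian weights for offsets -half_width..+half_width."""
    weights = [math.exp(-0.5 * (k / sigma_samples) ** 2)
               for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def bioturbate(series, sigma_samples=1.5, half_width=4):
    """Mix each sample with its stratigraphic neighbors using Gaussian weights."""
    kernel = gaussian_kernel(sigma_samples, half_width)
    n = len(series)
    mixed = []
    for i in range(n):
        acc = 0.0
        weight_sum = 0.0
        for k, w in zip(range(-half_width, half_width + 1), kernel):
            j = i + k
            if 0 <= j < n:  # renormalize at the ends of the record
                acc += w * series[j]
                weight_sum += w
        mixed.append(acc / weight_sum)
    return mixed


flat = [1.0] * 21                    # invariant record: mixing changes nothing
spike = [0.0] * 10 + [1.0] + [0.0] * 10  # one-sample excursion: mixing attenuates it
print(max(bioturbate(flat)), max(bioturbate(spike)))
```

The invariant series is returned unchanged, while the one-sample excursion is spread over its neighbors and reduced in amplitude, which is the attenuation described by Bard [2001].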

2.3.2 Diagenesis

Diagenesis is a persistent problem within the study of foraminiferal stable isotopes, particularly within planktic foraminifera. Pearson et al. [2001] demonstrated the effect of diagenesis on moderately preserved planktic foraminifera when compared with pristine individuals from Tanzania [see also Edgar et al., 2015]. The effect of diagenesis is to modify the original δ18Oforam value, driving it toward the values for <100 m below seafloor burial depths (Edgar et al. [2013] and further discussion of Edgar et al. [2015]). While this has a negligible (though real) effect in benthic foraminiferal studies [Edgar et al., 2013], the effects are felt most strongly in planktic foraminifera. There is a larger difference between diagenetic calcite and planktic values than between diagenetic calcite and benthic values; thus, planktics have “more to lose.” Often fossil foraminifera are evaluated for diagenesis on a visual basis, with “glassy” tests assumed to be unaltered, translucent tests somewhat altered, and opaque or “sugary” tests significantly altered. A range of preservation states is possible within a single sample, and the inclusion of multiple preservation states can skew an isotope data set.
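The "more to lose" argument can be made concrete with a two-end-member mixing sketch. It assumes that an altered test is a simple mass-weighted mixture of original and diagenetic calcite; whether FIRM parameterizes diagenesis this way is not stated in this section, and every value below is hypothetical.

```python
def altered_d18o(original, diagenetic, fraction_recrystallized):
    """d18O of a test in which a given mass fraction has been replaced by diagenetic calcite."""
    if not 0.0 <= fraction_recrystallized <= 1.0:
        raise ValueError("fraction_recrystallized must lie between 0 and 1")
    return (1.0 - fraction_recrystallized) * original + fraction_recrystallized * diagenetic


# Hypothetical end members (per mil): diagenetic calcite sits close to the benthic value.
DIAGENETIC = 3.5
PLANKTIC = -1.5
BENTHIC = 3.0
FRACTION = 0.2  # 20% of the test mass recrystallized

planktic_shift = altered_d18o(PLANKTIC, DIAGENETIC, FRACTION) - PLANKTIC
benthic_shift = altered_d18o(BENTHIC, DIAGENETIC, FRACTION) - BENTHIC
print(round(planktic_shift, 2), round(benthic_shift, 2))
```

For the same degree of alteration, the planktic value moves ten times farther than the benthic value in this example, simply because it starts farther from the diagenetic end member.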

2.4 Analytical Variability

2.4.1 Machine Error

The analytical variability within stable isotope mass spectrometry is not insignificant. Typical “workhorse” mass spectrometers employed by the paleoceanographic community have error values of 1σ ≈ 0.07‰; we use this value for our machine error term in FIRM. This does not include the within-sample reproducibility. Machine error is something mentioned sometimes in the literature but rarely, if ever, included in figures. It is, however, an important consideration.

2.4.2 Statistical Uncertainty

Ideally, each data point within all paleoceanographic records would be built upon large numbers of specimens (n > 30) to provide a robust mean value within that population [Killingley et al., 1981; Schiffelbein and Hills, 1984]. In the context of modern planktic foraminiferal isotope work, this is rare, due either to the scarcity of the desired species or to the time required to pick the specimens. Belaboring this point is redundant, especially considering the previous discussion of the variability inherent in planktic foraminiferal isotope values. Given the heterogeneous nature of the data, it is plainly obvious that more individuals are better within each analysis. The more important question is: how few is reasonable?
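As a first-order guide to why more individuals help, the textbook standard error relations can be written out. They assume that the n tests are independent draws from a single population with standard deviation σ_pop and that the machine error σ_machine adds in quadrature; both symbols and the target half-width E are introduced here for illustration, and the seasonal and biological effects described above will generally violate the independence assumption.

```latex
\sigma_{\bar{x}} = \frac{\sigma_{\mathrm{pop}}}{\sqrt{n}}, \qquad
\sigma_{\mathrm{total}} = \sqrt{\frac{\sigma_{\mathrm{pop}}^{2}}{n} + \sigma_{\mathrm{machine}}^{2}}, \qquad
n \geq \left( \frac{1.96\,\sigma_{\mathrm{pop}}}{E} \right)^{2}
```

For a hypothetical population with σ_pop = 0.5‰ and n = 30, the 95% half-width from sampling alone is 1.96 × 0.5/√30 ≈ 0.18‰, which is already larger than a machine error of 0.07‰.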

2.5 Previous Studies on Uncertainty and Error in Planktic Foraminifera

The question of how few is enough dates back at least to when mass spectrometers first became capable of running smaller numbers of specimens. One early study by Killingley et al. [1981] found interspecimen variability of ~2‰ (results echoed later by Löwemark et al. [2005] and Ganssen et al. [2011]) and suggested that at least 50 specimens be run to obtain a robust value. This presented a problem beyond the significant time investment required: obtaining 50 specimens of some species from discrete samples in deep-sea cores is often impossible.

An early attempt at quantifying this problem was made by Schiffelbein and Hills [1984], who used a jackknife analysis to estimate the confidence level for three planktic species: O. universa, “Globigerina” sacculifer (now Globigerinoides sacculifer), and Pulleniatina obliquiloculata. This work was hampered at the time by the necessity to run extraordinarily large tests (~600–900 µm, which often show a size effect in stable isotope values; see section 2.1.5 above) in order to meet the sizable mass requirement of a 1980s era mass spectrometer. The resulting data present a fairly damning view of planktic stable isotope values. For example, in their analysis a reproducible result within then-standard machine precision (± ~0.1‰) requires 417 G. sacculifer. Ironically, process-based studies like this are the only ones that ever run close to that number of tests per sample anymore.
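For readers unfamiliar with the jackknife, the textbook delete-one version is sketched below. The exact procedure of Schiffelbein and Hills [1984] is not described in this section, so this should be read as a generic illustration, and the single-specimen values are hypothetical.

```python
import math
import statistics


def jackknife_se(values, statistic):
    """Delete-one jackknife standard error of a statistic computed on a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # Recompute the statistic n times, leaving out one specimen each time.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - center) ** 2 for x in leave_one_out)
    return math.sqrt(variance)


# Hypothetical single-specimen d18O values (per mil) from one sample.
specimens = [-1.9, -1.2, -0.4, -1.6, -0.9, -2.3, -1.1, -0.7]

se = jackknife_se(specimens, statistics.fmean)
print(round(se, 3))
```

For the mean, the jackknife standard error equals the familiar s/√n; the value of the method is that the same function works for statistics, such as the median, for which no simple formula exists.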

This probably leaves the reader, as it did the authors, with a simple question: Why do published planktic foraminiferal isotope records work at all? Given the difficulties with vital effects, preservation, life-habit changes, seasonality, etc., and the large uncertainty ranges suggested by the above studies for common isotope sample sizes, robustly reproducing foraminiferal isotope curves seems implausible. Yet planktic foraminiferal isotope records clearly are reproducible on long timescales across ocean basins, as can be seen empirically from countless studies. Several more recent analyses suggest that planktic isotope records work quite well [e.g., Martinson et al., 1987; Mashiotta et al., 1999; Rohling et al., 2014].

Perhaps most relevant to the current work, Thirumalai et al. [2013] developed a synthetic methodology to investigate the sensitivity of single-specimen foraminiferal isotopic analysis and its usefulness in reconstructing ENSO variability. Essentially, they use the same technique we employ, but they apply it to the single-specimen analyses used in studies of high-frequency change, whereas we apply it to the multispecimen analyses typically used in investigations of longer time series. In doing so, they demonstrate that planktic foraminifera in the eastern Pacific Ocean are more sensitive to changes in seasonal cycles than to changes in ENSO amplitude [Thirumalai et al., 2013]. Schmidt [1999] used the Goddard Institute for Space Studies global ocean model to forward model foraminiferal calcite values, investigating the covariation of the different factors controlling δ18Oforam. FIRM represents an improvement over these studies owing to its open-source nature and variety of parameters and, in the case of the Schmidt model, its reliance on real data to simulate the isotopic values.

Ezard et al. [2015] examined the relationship between a number of parameters and oxygen and carbon isotopes in planktic foraminifera. They conclude that the main parameters controlling oxygen isotope values are size and local oceanographic conditions, although they were unable to robustly resolve the slope of these relationships due to limited statistical power. Therefore, we cannot directly transfer their results into our model; the slopes of the size-isotope relationship would be required to determine the variability within FIRM. Our model is independent of their results and approaches the problem with an entirely different methodology, thus acting as a test for some of their conclusions.

3 Model Methods

FIRM (S3; https://github.com/Fraass/FIRM) is a tool developed to derive confidence values for individual isotopic analyses based on a variety of different oceanographic, geologic, and analytical parameters (Figure 2). Importantly, we are not attempting to fully model the entire process of foraminiferal calcification. Instead, we are trying to include only enough parameters to estimate the uncertainty in foraminiferal δ18O records. Like all models, FIRM is a compromise. The parameters do not fully capture the process they purport to simulate: planktic foraminifera calcify at different depths throughout their lives, for example, while FIRM only employs a single depth for calcification. The use of multiple depths within a single test requires either ~20 different calcification depths (one for each chamber) and the approximate mass addition to the test or some algorithmic estimation of that process. This would still be a compromise and would add what we consider unnecessary complications. FIRM contains all the parameters necessary to faithfully reproduce foraminiferal δ18O values from real data sets (see section 4.7 below).

Figure 2. Schematic design for Foraminiferal Isotope Reproducibility Model (FIRM). The schematic depicts the transformation of temperature (T) and salinity data through respective seasonal thermoclines and haloclines (T/Hcline) to synthetic δ18Oforam values, synthetic isotopic analyses based on n tests, and then assessment of 95% confidence limits to assign an uncertainty estimate.

Here FIRM was employed to explore the variation expected from different species of planktic foraminifera and tested against two well-resolved data sets composed of single-specimen analyses [Schiffelbein and Hills, 1984; Koutavas et al., 2006]. FIRM is composed of several sections run in sequence. At the most basic, it requires a location, number of individuals (n), a water depth range (top-bottom), and the number of iterations to run. More complexity can be added by modifying parameters (e.g., diagenesis, vital effects, misidentification, differing masses, seasonality, and machine error).
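The basic loop implied by these inputs (a pool of individual values, n individuals per measurement, and a number of iterations) can be sketched as follows. This is a simplified illustration and not the FIRM source code, which is written in R: the pool is generated from hypothetical seasonal means rather than from WOA13 profiles, all tests are given equal mass, and the function names are invented for this example. Only the 0.07‰ machine error is taken from the text.

```python
import random
import statistics

MACHINE_SD = 0.07  # per mil, the 1-sigma machine error quoted in section 2.4.1


def build_pool(seasonal_means, within_season_sd, per_season, rng):
    """Synthetic single-test d18O values; the seasonal means are hypothetical inputs."""
    return [rng.gauss(mean, within_season_sd)
            for mean in seasonal_means
            for _ in range(per_season)]


def simulate(pool, n_individuals, iterations, rng):
    """Each iteration mimics one mass spectrometer measurement of n pooled tests."""
    measurements = []
    for _ in range(iterations):
        picked = [rng.choice(pool) for _ in range(n_individuals)]
        # Equal test masses are assumed, so the measurement is a plain mean.
        measurements.append(statistics.fmean(picked) + rng.gauss(0.0, MACHINE_SD))
    return measurements


def ci95_half_width(measurements):
    """Half of the spread between the 2.5th and 97.5th percentiles."""
    cuts = statistics.quantiles(measurements, n=40)  # cut points every 2.5%
    return (cuts[-1] - cuts[0]) / 2.0


rng = random.Random(42)
pool = build_pool([-1.0, -0.5, 0.0, -0.5], within_season_sd=0.3, per_season=500, rng=rng)
for n in (1, 5, 30):
    width = ci95_half_width(simulate(pool, n, iterations=2000, rng=rng))
    print(n, round(width, 2))
```

The half-width shrinks as n grows but cannot fall much below about twice the machine error, because that term is added after the individuals are averaged.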

Additionally, there is a well-known difference between symbiotic foraminifera grown in light versus dark environments [Bemis et al., 1998]. In life this could lead to a difference (though likely a very subtle one) if a foraminifer was producing calcite on a cloudy day or lived in a cloudier month, year, decade, etc., than an otherwise identical specimen in the same sample; this is impossible to model from existing oceanographic data sets. We have not explicitly tested this here, but the equations for doing so are included in FIRM. Lastly, the underlying data set is a seasonal average rather than the daily data which would be needed to construct a closer approximation of the monthly foraminiferal life cycle.

The various parameters were chosen as a balance between accurately describing the history of a foraminiferal test and a tool simple enough to be employed by the entire community. FIRM is a relatively complete look at the major factors in oxygen isotopes, and the entire model is open source; further expansion to include other parameters or modification to increase complexity is welcome. The model is also entirely process oriented. Calculations proceed as closely as possible to the order in which the processes occur in practice, with calcification first, burial next, and analytical issues last. Thus, we hope the model system and results are as intuitive and transparent to paleoceanographers as possible.

First, the user chooses a location (latitude/longitude), and FIRM automatically accesses the predownloaded World Ocean Atlas 2013 (WOA13) global compilation data set to provide the temperature [Locarnini et al., 2013] and salinity [Zweng et al., 2013] relationship to depth at that location (WOA13 data provided by the NOAA/Department of Commerce (DoC)/National Environmental Satellite, Data, and Information Service/National Centers for Environmental Information (NCEI), Maryland, USA, are from https://www.nodc.noaa.gov/OC5/woa13/). Second, two equations relate salinity (s) to δ18Osw, equation 2 for depths 0–77.5 m below sea level (mbsl), and equation 3 for depths below 77.5 mbsl (“2012” equations [Conroy et al., 2014]). These equations are for the central tropical Pacific Ocean. While δ18Osw to salinity relationships can vary, using a single system of equations seems like an appropriate first step. This can easily be altered to include specific δ18Osw-salinity relationship across any oceanic basin by changing only two lines of code should that be desired. Including the second equation results in minimal changes to the overall ±95% confidence interval (CI) values, typically only increasing the values by ~0.01–0.03‰.
urn:x-wiley:08838305:media:palo20393:palo20393-math-0002(2)
urn:x-wiley:08838305:media:palo20393:palo20393-math-0003(3)
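
As a rough illustration of the depth-dependent conversion described above, the sketch below switches between two salinity-to-δ18Osw fits at 77.5 mbsl. It assumes both fits are linear in salinity. The function name and the coefficients are hypothetical placeholders, not the Conroy et al. [2014] values, which are not reproduced here.

```python
SHALLOW_LIMIT_MBSL = 77.5  # boundary between equations 2 and 3, taken from the text


def d18o_seawater(salinity, depth_mbsl, shallow_fit, deep_fit):
    """Convert salinity to d18Osw with one linear fit per depth regime.

    shallow_fit and deep_fit are (slope, intercept) tuples supplied by the
    caller; no published coefficients are hard-coded in this sketch.
    """
    if depth_mbsl <= SHALLOW_LIMIT_MBSL:
        slope, intercept = shallow_fit
    else:
        slope, intercept = deep_fit
    return slope * salinity + intercept


# Placeholder coefficients for illustration only (NOT the published values).
example_shallow = (0.30, -10.0)
example_deep = (0.40, -13.0)

surface = d18o_seawater(35.0, 10.0, example_shallow, example_deep)
boundary = d18o_seawater(35.0, 77.5, example_shallow, example_deep)
deep = d18o_seawater(35.0, 120.0, example_shallow, example_deep)
```

Passing the fits as arguments mirrors the point made in the text that a different δ18Osw-salinity relationship can be substituted by changing only the two equations.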

FIRM then generates a single depth from the depth habitat parameters (top and bottom possible depths) defined by the user. The depth habitat parameter can be used in one of two ways: an even chance of drawing anywhere within the range (uniform distribution) or a normal distribution. In the normal distribution, the midpoint of the range is the mean of the distribution, with the standard deviation equal to one sixth of the range, |top − bottom|/6, so that the range spans ±3σ. Normally distributed habitats are probably more ecologically sound, as foraminiferal habitats are likely optimized at specific levels within the pycnocline (tracking a specific food source, buoyancy, etc.) [Hemleben, 1989]. FIRM samples the depth range n times for each iteration.
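
The two depth habitat options can be sketched as follows. This is a minimal illustration rather than FIRM's own code: the function name is hypothetical, and redrawing the rare normal draws that fall outside the habitat range is an assumption, since the text does not say how such values are handled.

```python
import random
import statistics


def sample_depths(top, bottom, n, distribution="normal", rng=None):
    """Draw n calcification depths (m) between top and bottom."""
    rng = rng or random.Random()
    if distribution == "uniform":
        return [rng.uniform(top, bottom) for _ in range(n)]
    mean = (top + bottom) / 2.0      # midpoint of the habitat range
    sd = abs(bottom - top) / 6.0     # range spans +/- 3 standard deviations
    depths = []
    while len(depths) < n:
        depth = rng.gauss(mean, sd)
        if top <= depth <= bottom:   # assumption: redraw values outside the range
            depths.append(depth)
    return depths


rng = random.Random(1)
uniform_depths = sample_depths(0.0, 70.0, 2000, "uniform", rng)
normal_depths = sample_depths(0.0, 70.0, 2000, "normal", rng)
uniform_sd = statistics.pstdev(uniform_depths)
normal_sd = statistics.pstdev(normal_depths)
```

For a 0–70 m habitat the normal option concentrates draws near 35 m, so its spread in depth is narrower than that of the uniform option.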

Next, several equations can be used to convert temperature and δ18Osw to foraminiferal calcite δ18O (collated in Pearson [2012] from a variety of sources). Notably, there are equations for both nonsymbiont bearing and symbiont bearing species, which should be considered carefully when considering extinct species without well-established ecologies. Within section 4, the Erez and Luz [1983] cultured G. sacculifer equation is typically used except when otherwise noted, where (δ18Oforam) is foraminiferal calcite and (t) is temperature at depth. The differences in equations have a small effect on the uncertainty (up to a 0.03‰ difference) but a substantial one on the modeled mean (up to 0.76‰ difference). All of the equations found in Pearson [2012] can be employed within FIRM at the discretion of the user.
urn:x-wiley:08838305:media:palo20393:palo20393-math-0004(4)

It should be noted that uncertainty within the above coefficients is not included in this first calculation but is included later in the vital effect parameter. Calculation of δ18Oforam is repeated for the number of individuals included in the measurement (n). This is the end of the basic inputs and calculations involved in running the model at its simplest. The rest of the parameters discussed below can be used or not based on user preference.
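
The temperature-to-calcite step can be illustrated with the commonly quoted form of the Erez and Luz [1983] calibration, T = 17.0 − 4.52(δc − δw) + 0.03(δc − δw)², solved for δc. Whether equation 4 is written in exactly this form is an assumption, and this sketch ignores any VSMOW-to-VPDB scale correction. The function name is hypothetical.

```python
import math


def d18o_foram_erez_luz(temp_c, d18o_sw):
    """Invert T = 17.0 - 4.52*x + 0.03*x**2 for x = d18Oforam - d18Osw."""
    a, b, c = 0.03, -4.52, 17.0 - temp_c
    # Take the smaller root; the larger root (x near 150 permil) is unphysical.
    offset = (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return d18o_sw + offset


# Repeat the calculation for each of n individuals (temperatures are made up).
temperatures = [28.1, 27.6, 26.16, 25.0, 17.0]
individual_values = [d18o_foram_erez_luz(t, 0.0) for t in temperatures]
```

With δ18Osw = 0‰, a temperature of 17.0°C returns 0‰ and 26.16°C returns −2‰, which follows directly from substituting those values into the calibration.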

In the following sections we use “ideal” to mean a model run with only the number of individuals and depth habitats employed (see also Figure 2); the rest of the parameters are unused (seasonality is enabled or not, depending on the individual experiment, and is noted). Thus, the ideal case is one without vital effects, diagenesis, or so forth.

A simple error term is then used to mimic the previously discussed vital effects, which is a blunt catchall for a variety of species-specific effects and the “messiness” of biological systems. While the possibility of placing a user-specified value as the vital effect term is permitted in FIRM, by default (and in this study) a value of 0.146‰ is used. This value is derived from the goodness-of-fit terms via the temperature to δ18Oforam calibrations of Bemis et al. [1998] and is roughly equivalent to 0.7°C. This is somewhat unsatisfactory, as it flattens all biological effects into a single numerical value. However, as a first step, this is a reasonable approximation of a very complex system into an understandable term for the model. FIRM applies the vital effect modification by sampling a normal distribution with 1σ = 0.146‰ then adding that vital effect value to each synthetic δ18Oforam value.
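
A minimal sketch of the vital effect step described above: each synthetic test value receives an independent offset drawn from a normal distribution with 1σ = 0.146‰. The function name is hypothetical, and the starting value of −2.0‰ is arbitrary.

```python
import random
import statistics

VITAL_EFFECT_SD = 0.146  # permil, default value given in the text


def add_vital_effect(values, sd=VITAL_EFFECT_SD, rng=None):
    """Add an independent normally distributed offset to each test value."""
    rng = rng or random.Random()
    return [value + rng.gauss(0.0, sd) for value in values]


rng = random.Random(7)
clean = [-2.0] * 5000                      # arbitrary identical test values
noisy = add_vital_effect(clean, rng=rng)
noisy_mean = statistics.fmean(noisy)
noisy_sd = statistics.pstdev(noisy)
```

Because the offsets are centered on zero, the term widens the distribution of individual values without shifting its mean.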

FIRM also includes several modifications to test the effect of various nonideal conditions. Diagenesis can be modeled, expressed as the percentage of the original signal destroyed and the number of diagenetically altered individuals (0 to n). Within an individual sample, tests can be of differing preservation. Differential preservation is a combination of both diagenesis and bioturbation. Diagenesis happens to each test evenly within a single horizon, though there can be some variability between different tests [Branson et al., 2015]. Bioturbation then mixes tests of different horizons into the sampled interval, giving the appearance that a single test has a completely different diagenetic history than its neighbor. While this is an important distinction, our use of “diagenesis parameter” is only a small oversimplification of terminology. This is also the only inclusion of bioturbation, as the aim is to model individual time slices and samples rather than time series.
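
One way to picture the diagenesis parameter is as linear mixing between the original test value and a diagenetic end-member. The mixing form, the function name, and the end-member value of +3.0‰ below are all assumptions made for illustration; the text specifies only the percentage of signal destroyed and the number of altered tests.

```python
def apply_diagenesis(values, n_altered, fraction_lost, endmember):
    """Replace a fraction of the signal in the first n_altered tests.

    fraction_lost is the share of original signal destroyed (0 to 1);
    endmember is the d18O of the replacing calcite (illustrative value).
    """
    if not 0 <= n_altered <= len(values):
        raise ValueError("n_altered must lie between 0 and n")
    altered = list(values)
    for i in range(n_altered):
        altered[i] = (1.0 - fraction_lost) * values[i] + fraction_lost * endmember
    return altered


pristine = [-2.0] * 5
one_altered = apply_diagenesis(pristine, 1, 0.5, 3.0)
sample_mean = sum(one_altered) / len(one_altered)
```

In this toy case a single test altered by 50% moves the five-test mean from −2.0‰ to −1.5‰, showing how alteration of one test mainly affects accuracy.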

The effect of different sizes of individuals is also included. Though the use of specific size fractions for picking is rote, individual tests can still vary in size within a size fraction. When dissolved in acid, a heavier individual will contribute a larger amount of gas while a smaller individual contributes less, weighting the results toward the value of the heavier individual and away from the lighter. Thus, a simple parameter is used to vary the possible size of the tests within the sample. This, however, does not allow for modeling the growth rate of the individual, which can alter the oxygen isotope signal significantly [Bemis et al., 1998] or the size-dependent relationships of Ezard et al. [2015].
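
The weighting described above amounts to a mass-weighted mean. The sketch below is illustrative only: the function names are hypothetical, and drawing relative masses uniformly within ± the chosen variation is an assumption about how size variation might be parameterized.

```python
import random


def mass_weighted_mean(values, masses):
    """Average test values weighted by the mass each test contributes."""
    total = sum(masses)
    return sum(v * m for v, m in zip(values, masses)) / total


def random_masses(n, variation, rng=None):
    """Relative masses within +/- variation of 1.0 (assumed uniform)."""
    rng = rng or random.Random()
    return [rng.uniform(1.0 - variation, 1.0 + variation) for _ in range(n)]


# A test three times heavier pulls the measurement toward its own value.
weighted = mass_weighted_mean([-2.0, -1.0], [3.0, 1.0])
unweighted = mass_weighted_mean([-2.0, -1.0], [1.0, 1.0])
masses = random_masses(10, 0.20, random.Random(3))
```

Here the weighted result is −1.75‰ rather than the simple average of −1.5‰.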

Seasonality is modeled very simply using two parameters. One term describes which of the four seasons to include, while the second provides the percent chance of drawing from each season. For example, one run could be confined to just spring (as in the ideal case discussed in section 4.1), representing a taxon with a very specific seasonal bloom. A comparative run could be spread unevenly throughout the year (spring 20%, summer 50%, fall 20%, and winter 10%) representing a more broadly growing taxon with a preference for summer.
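
The two seasonal parameters can be sketched as a weighted draw, using the example weights from the paragraph above. The function and dictionary names are hypothetical.

```python
import random

# Percent chances from the example in the text, expressed as fractions.
BROAD_TAXON = {"spring": 0.20, "summer": 0.50, "fall": 0.20, "winter": 0.10}
SPRING_BLOOMER = {"spring": 1.0}


def draw_seasons(n, weights, rng=None):
    """Assign each of n tests to a season according to the given chances."""
    rng = rng or random.Random()
    seasons = list(weights)
    return rng.choices(seasons, weights=[weights[s] for s in seasons], k=n)


rng = random.Random(11)
broad = draw_seasons(10000, BROAD_TAXON, rng)
summer_fraction = broad.count("summer") / len(broad)
bloom = draw_seasons(10, SPRING_BLOOMER, rng)
```

Each test would then take its temperature and salinity from the profile of the season it was assigned.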

Lastly, the inclusion of misidentified individuals can also be modeled. The user defines the chance that an individual will be misidentified. An important difference is that instead of the parameter being “1 in 10” or “5 in 30,” here there is a percent chance that a test will be included that does not fit with the desired taxon. While diagenesis, for example, is a knowable quantity, as the worker should be grading the tests, a misidentification is unknown. (If it were known, the worker would not be including that test.) If an individual is randomly selected to be misidentified, that individual then has a randomized depth, from 0 to 500 m (again, user-defined depth ranges), and is included in the final measurement in place of the properly identified test. This is a worst case scenario, as misidentified specimens are most likely closely related to the desired species, likely within the same genus. The depth habitats for species within the same genus are typically similar (though there are many, many exceptions), and so the effect of including an individual from a sister taxon is not purely stochastic. Additionally, not every individual included in an analysis calcifies at the same depth or under the same conditions. With the FIRM “misidentification” term, one can model intraspecific variability due to the inclusion of a wider size range and thus calcification depth [e.g., Spero and Lea, 1996]; this term is also very useful in estimating the error from the undetected inclusion of cryptic taxa of a single morphospecies [e.g., Marshall et al., 2015]. For example, Lohmann [1995] found that a small percentage of G. sacculifer calcify a few hundred meters below most of the G. sacculifer population. The above term could be used to model the uncertainty of using G. sacculifer, taking into account the differences in depth habitat.
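
Because misidentification is a per-test chance rather than a fixed count, the number of wrong tests varies between iterations. The sketch below replaces the depth of each misidentified test with a random depth between 0 and 500 m. The function name is hypothetical and the uniform draw over that range is an assumption.

```python
import random


def apply_misidentification(depths, chance, wrong_top=0.0, wrong_bottom=500.0,
                            rng=None):
    """Swap each depth for a random one with the given per-test probability."""
    rng = rng or random.Random()
    return [
        rng.uniform(wrong_top, wrong_bottom) if rng.random() < chance else depth
        for depth in depths
    ]


rng = random.Random(5)
habitat = [35.0] * 10
none_wrong = apply_misidentification(habitat, 0.0, rng=rng)
all_wrong = apply_misidentification(habitat, 1.0, rng=rng)

# Chance that a 10-test sample holds at least one misidentified test at 10%.
p_any_wrong = 1.0 - (1.0 - 0.10) ** 10
```

At a 10% chance per test, roughly 65% of 10-test samples contain at least one misidentified individual, which helps explain why that setting degrades the results so strongly.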

After setting the various parameters above, the resulting δ18O values are averaged together (or scaled to their relative contribution if using the variable mass parameter). The model then proceeds iteratively, repeating the process of generating a single value from the δ18O values derived. Here we employ 10,000 iterations. After the aforementioned calculations, an additional machine error parameter modifies the values if employed, adding the results of a randomly sampled normal distribution centered at 0, with a standard deviation provided by the user (here 1σ = 0.07‰, unless otherwise noted).
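
The iteration and machine error steps can be sketched as a small Monte Carlo loop. This is a toy under stated assumptions: individual test values are drawn from an arbitrary normal distribution (mean −2.0‰, 1σ = 0.3‰) instead of from oceanographic profiles, and the uncertainty is summarized as half the width between the 2.5th and 97.5th percentiles, which may differ from how FIRM computes its 95% limits. All names are hypothetical.

```python
import random
import statistics


def simulate_measurements(draw_test, n, iterations=10_000, machine_sd=0.07,
                          rng=None):
    """Average n synthetic tests per iteration, then add machine error."""
    rng = rng or random.Random()
    results = []
    for _ in range(iterations):
        composite = statistics.fmean([draw_test(rng) for _ in range(n)])
        results.append(composite + rng.gauss(0.0, machine_sd))
    return results


def ci95_half_width(results):
    """Half the distance between the 2.5th and 97.5th percentiles."""
    ordered = sorted(results)
    low = ordered[int(0.025 * (len(ordered) - 1))]
    high = ordered[int(0.975 * (len(ordered) - 1))]
    return (high - low) / 2.0


def toy_test(rng):
    # Stand-in for one synthetic test value (arbitrary distribution).
    return rng.gauss(-2.0, 0.3)


results = simulate_measurements(toy_test, n=10, rng=random.Random(42))
half_width = ci95_half_width(results)
```

For this toy input the half-width is close to 1.96 × sqrt(0.3²/10 + 0.07²) ≈ 0.23‰, the value expected when averaging 10 independent draws and adding independent machine error.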

4 Results

Model outputs can be used to interrogate general relationships between variables which control foraminiferal isotope measurements and to create uncertainty estimates for particular research locations. Here we show example data from two sites, Ocean Drilling Program (ODP) Site 803 and Integrated Ocean Drilling Program (IODP) Site U1406 (Figures 3 and 4), to demonstrate both of these applications. Site 803 is an equatorial site in the western Pacific Ocean. Site U1406 is in the northern Atlantic Ocean; together, these sites offer two very different sets of temperature, seasonality, and water column structure (Figure 1). Model results are reproducible to ~0.01‰ with 10,000 iterations.

Figure 3. Model results from ODP Site 803, using a 0–70 m depth habitat, single season (Northern Hemisphere summer), and otherwise ideal conditions as described in the text. Y axes are unlabeled because the numerical frequency of the modeled values is unimportant; only the shapes of the distributions and the variation about the x axes are important. Red lines depict the 95% confidence limits. (a) Sampled temperature values. (b) Estimated δ18Osw from FIRM. (c) Generated individual test values for a uniform (even) depth habitat. (d to h) Synthetic isotopic analyses based on an increasing number of tests per analysis. For example, 7 includes 20 tests into each isotopic measurement. (e) Synthetic isotopic analyses using a normal rather than uniform depth habitat. (i and j) Individual tests and isotopic analyses with varying amounts of diagenesis included in a single test. Note the change in scale. (k) Individual tests and isotopic analyses with a 10% chance of misidentification.
Figure 4. Model results from seasonality experiments using western equatorial Pacific Ocean (Site 803) and Northern Atlantic Ocean (Site U1406) as examples. Seasonality is depicted by the circle in the middle of the figure. Black denotes a strongly weighted season while grey is a less weighted season. Figure follows the conventions of Figure 3.

In the following experiments we select two depth ranges for foraminiferal calcification (0–70 m, “upper water column dweller”; 70–150 m, “upper thermocline dweller”) for a single species during a single season (Northern Hemisphere summer) with no intraspecific variability, diagenesis, or misidentification. We refer to these as ideal conditions because it includes many assumptions that are implicitly made in most studies about planktic foraminiferal isotope data regarding lack of signal degradation due to seasonality, diagenesis, intraspecific variability, and so forth. We begin, then, only by examining the effect of increased sample sizes.

4.1 Sample Size

There is an obvious, intuitive correlation between a greater number of specimens and increasing precision, but it is still useful to quantify. Individual foraminiferal values generated by the model are depicted in Figure 3c (Site 803), with the Monte Carlo results summarized in Table 2 and Figures 3d–3h. Varying n can have a relatively weak control on the uncertainty compared with the difference between habitats (Table 2). Depth habitat's control here is due to where the largest variability in oxygen isotopes is possible; if the depth habitat is homogeneous, the 95% confidence interval (CI) will be small; if there are large variations in temperature or salinity, then there will be substantial uncertainty values even with large numbers of individuals. See later discussion about the validity of our chosen depth habitat.

Table 2. Model Results From the Realistic and Ideal Experiments Described in the Section 4.6 [a]

n      Realistic 95% CI (‰)    Ideal 95% CI (‰)

Site 803, 0–70 mbsl
5      0.06                    0.03
10     0.04                    0.02
20     0.03                    0.02
50     0.02                    0.01
400    0.01                    <0.01

Site 803, 70–150 mbsl
5      0.31                    0.18
10     0.22                    0.12
20     0.15                    0.09
50     0.10                    0.06
400    0.03                    0.02

Site U1406, 0–70 mbsl
5      0.33                    0.19
10     0.23                    0.14
20     0.16                    0.10
50     0.10                    0.06
400    0.04                    0.02

Site U1406, 70–150 mbsl
5      0.13                    0.08
10     0.09                    0.06
20     0.07                    0.04
50     0.04                    0.03
400    0.01                    0.01

[a] n denotes the number of individuals included in analysis. ODP Site 803 is a western equatorial Pacific Ocean site, while IODP Site U1406 is a Northern Atlantic Ocean site. Confidence intervals are generated with 10,000 iterations (resolving to ~0.01‰). The 0–70 mbsl range is an approximation of a mixed-layer depth habitat, while 70–150 mbsl is an approximation of a thermocline habitat.
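
As a quick check on the ideal column of Table 2, the sketch below compares the Site 803 thermocline values with the 1/√n scaling expected for the mean of n independent draws, using n = 5 as the reference. The comparison is an illustration added here, not an analysis from the model itself, and the function name is hypothetical.

```python
import math

# Ideal 95% CI values (permil) for Site 803, 70-150 mbsl, copied from Table 2.
TABLE_IDEAL = {5: 0.18, 10: 0.12, 20: 0.09, 50: 0.06, 400: 0.02}


def scale_ci(ci_ref, n_ref, n_new):
    """Scale a confidence interval assuming it shrinks as 1/sqrt(n)."""
    return ci_ref * math.sqrt(n_ref / n_new)


predicted = {n: scale_ci(TABLE_IDEAL[5], 5, n) for n in TABLE_IDEAL}
largest_gap = max(abs(predicted[n] - TABLE_IDEAL[n]) for n in TABLE_IDEAL)
```

The predicted values stay within 0.01‰ of the tabulated ones, which is the stated resolution of the model runs, so this habitat behaves approximately as independent sampling would suggest under ideal conditions.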

4.2 Depth Habitat Distribution

Foraminifera probably do not live evenly distributed within a range of depths. They have an optimum habitat at a specific depth (or density, etc.), where they are commonly found, and then a decreasing likelihood above and below that depth. Results of modeling this difference (uniform versus normal distributions) are presented in Figures 3d and 3e. Normal distributions result in distributions that are ~1/3 smaller.

4.3 Intrasample Variability

Next, we vary each parameter in FIRM to explore their effects (all experiments n = 10, normal distributions; Table S1). Adding diagenetically altered specimens tends to have little-to-no effect on the uncertainty (precision) but importantly shifts the mean values (accuracy). In extreme cases (e.g., 5 tests in 10 altered 25% or 50%), the precision improves. Varying size has a negligible effect on the CI values, only increasing them by a maximum of 0.02‰, and only if size is allowed to vary by >80% and in instances with higher uncertainty to begin with (Table S3). The vital effects parameter, derived from the goodness of fit from Bemis et al. [1998], has a 0.00–0.07‰ effect on the uncertainty.

To test misidentification, misidentified individuals are randomly drawn from 0 to 500 m below sea level. At Site 803, a 10% possibility of adding an individual from a random depth increases the uncertainty to ~0.52‰ (Figure 3k and Table S3). As the distributions become more heavily skewed, a ±95% confidence interval becomes a poor measure of confidence, but it is reported here for ease of comparison to the other presented experiments and for its intuitive value. Ten percent misidentification is rather abysmal micropaleontological skill. A more reasonable identification precision of 0.1% results in almost the same CI as the 10 individual ideal condition experiment runs, depending on the habitat and local oceanography. This clearly demonstrates the prime importance of local oceanography and consideration of species habitats on the robustness of any foraminiferal data set, poorly picked or not. It is important to note that while the CI values quickly reach ideal condition CIs with modest identification skill, the mean values change (0%: −2.56‰; 10%: −2.25‰; and 1%: −2.51‰). The precision of the measurement is “good” (though skewed), but the accuracy is not, resulting in the value measured not recording the desired oceanography.

4.4 Seasonality

Thus far, uncertainty estimates have assumed constant water conditions, with the variation limited to calcification depth, biological differences, and postdeposition. Seasonal changes in water temperature and salinity, however, are likely to be the biggest variable in any foraminiferal data set and are far larger than the terms investigated thus far. Figure 4 depicts histograms of 10 “mixed-layer” individuals for both ODP Site 803 and IODP Site U1406. In Figures 4a, 4b, and 4d each individual foraminifer has an equal chance of being from the “activated” seasons.

Confidence limits can be quite broad when including multiple seasons, as seen in the multiseason runs from Site U1406 (Figures 4a–4d). The presence of a single dominant season also changes the mean (Figure 4c). Indeed, the mean for each data set varies between each model run by slightly more than 0.5‰, although the range of values within the CI tends to be larger than in the Site 803 data. Both of these observations again demonstrate the importance of local oceanographic conditions on determining the uncertainty of δ18Oforam values.

4.5 Hypothetical Realistic Examples

The previous experiments have only explored single parameter effects on expected δ18Oforam variability. To better identify the “realistic” uncertainty associated with a typical δ18Oforam investigation, several parameters were employed at the same time. FIRM parameter settings were as follows: normal depth distribution; mass variation 20%; diagenesis affecting ~n/10 tests, rounded up (e.g., for n = 5, one test altered), each altered by 50%; vital effects 0.146‰; misidentification 0.01%; seasons (three seasons, with 20%, 50%, and 30% chances); and machine error 0.07‰. These are not perfect conditions but are reasonable for some records: diagenesis and misidentification are issues that all paleoceanographers attempt to minimize, though diagenesis is sometimes unavoidable in a section, and the effect of seasonality, as well, can be reduced by careful selection of a target species. Even if the depth habitats are wide, a better target species would hopefully have a more limited calcification range. These experiments are summarized in Table 2. Again, a simple result is that more specimens are better: uncertainty lowers to within ~0.07–0.14‰ of the ideal conditions CI value if the worker picks >15 individuals for isotopic work. That acknowledged, the uncertainty from these model runs can be quite large. For example, the uncertainty with our above realistic conditions for a thermocline dweller in the western equatorial Pacific Ocean is 0.16‰, even when including 20 tests in the analysis.

4.6 Testing With Empirical Data

To see if the model accurately represents the oxygen isotope system, two real data sets were compared to FIRM output. First, data reported by Koutavas et al. [2006] were used. Using their site (1.216°S, 89.683°W), we generated synthetic data for 33 G. ruber. We used the Spero et al. [2003] G. sacculifer high-light equation and assumed no diagenesis or misidentification. G. ruber (white) is a year-round mixed-layer dweller, so we used 0–50 m for our depth habitat and a 25% chance of drawing in each season. Also used are the following: 0.146‰ vital effect, machine error 0.1‰ (higher due to smaller mass), and mass variation 20%. We then compared the standard deviations of the individual values (33 individual synthetic tests per iteration * 10,000 iterations), instead of the typical FIRM output of synthetic amalgamations of multiple tests. The normal run results in a mean value of −1.4‰ and a standard deviation of 0.38‰, while the uniform distribution results in a mean of −1.39‰ and a standard deviation of 0.5‰. The real data, for comparison, are ~ −1.72‰ with a standard deviation of 0.42‰; only 0.02‰ different standard deviation than the FIRM normal distribution results. The Koutavas et al. [2006] data are not a perfect fit for our scenario, since they examine the El Niño-La Niña variability, while the WOA13 data are based on multidecadal mean values. This might explain the differences in mean oxygen isotope values.

Replication of the Schiffelbein and Hills [1984] experiment used the O. universa data (thus the Bemis et al. [1998] equation), at their Ontong Java plateau site (1.4°N, 157.3°E). Individuals were allowed to live between 0 and 200 m, extrapolating from Fairbanks et al. [1980], with a normal depth habitat, and machine precision was adjusted to 0.09‰ as reported. O. universa is a strongly seasonal species, and so a seasonality parameter was included (10% Northern Hemisphere winter, 20% spring, 50% summer, and 20% fall). Here the mean value is slightly more negative (−2.24‰) than the actual values (−2.07‰), but the standard deviations are a good match (~0.44‰ real data and ~0.5‰ FIRM output). These two model-data comparisons should speak toward a reasonably good fidelity in the model simulation, despite the simplifications.

5 Discussion

These results should be heartening for the paleoceanographic community. They do not contain any major surprises and generally suggest that reasonable numbers of foraminifera (i.e., ≥20 individuals) are sufficient to acquire data with reasonable confidence in most situations. Obviously, all workers are at least intuitively aware that “more is better” when it comes to foraminiferal sample size and isotopes. Our model serves to quantify this intuitive knowledge for specific oceanographic circumstances.

Simply altering the number of individuals reveals several significant facts. First, as mentioned, increasing sample size results in a higher precision. In some locations the number of individuals has a quite weak control, as it does at Site 803 “upper water column dweller,” where only 0.04‰ uncertainty (uniform distribution) is gained by increasing the number of individuals by an order of magnitude. In contrast, the increase from 5 to 20 tests in the thermocline drops the uncertainty by 0.21‰ (Table S2). Obviously, thermocline structure varies within our chosen sites (Figure 1), and so the number of individuals required to arrive at a reasonable uncertainty is different given the structure.

Several of the parameters had minimal responses based on the individual experiments. For example, varying the mass, which one might a priori expect to increase test δ18O variance, does not change the uncertainty values in any appreciable way. Similarly, when including reasonable misidentification parameters (<0.1%) they have little effect on the CI. They have a larger effect when considering the realistic experiments, when a larger test could be diagenetically altered, or be misidentified, or two genotypes of a single morphospecies may be inadvertently included.

The experiments in diagenesis illustrate a counterintuitive point. Diagenesis can sometimes slightly improve the precision of stable isotopic measurements (Table S3), though it obviously decreases the accuracy. Diagenesis changes individual δ18Oforam toward inorganic calcite precipitation at or below the water sediment interface (modeled in FIRM as the lowest depth at location). This has the effect of decreasing the variation, as all δ18Oforam values are converging on a single point, causing the uncertainty to decrease. Increasing the number of tests altered or severity of the alteration decreases the accuracy further. The study of diagenesis in foraminifera has not progressed to any sort of quantitative taphonomic method, as the current best methodology is qualitative grading systems [e.g., Sexton et al., 2006; Ando et al., 2010; Edgar et al., 2015]. It would, for example, be speculation to say that a “frosty” foraminifer has lost 50% of the original calcite (though we might be approaching a point when such statements are possible; see Edgar et al. [2015]), while a “chalky” test has lost 70%. Thus, this parameter in FIRM is difficult to employ robustly. It does, however, present a severe cautionary tale about time series with intervals of differing diagenetic overprints. If one were moving from an interval with glassy preservation into an interval with few frosty individuals (e.g., one test altered 50%), a shift of ~0.3‰ could be interpreted as an important change in hydrography rather than a simple diagenetic front.

Results from FIRM can be obviously nonnormal (e.g., Figure 3e). The misidentification parameter, for example, drives values toward a cooler temperature, resulting in a skewed distribution. In that sort of a case, a simple 95% CI is no longer a good measure of precision, as the CI could be +0.1‰ and −0.5‰ of the mean. Seasonal experiments also frequently contain multiple peaks (see below; Figure 4). In these cases, more complicated metrics for measuring data confidence should be employed, since the exact variability about the mean is important. However, as there is such a high number of unknowns and estimated values, refining these confidence values to something other than simple single number error estimates is probably unwarranted at this moment.
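
To illustrate why a single ± value can mislead for skewed output, the sketch below reports the lower and upper arms of a percentile interval separately. The synthetic data are a toy (a narrow normal distribution with 10% of values pushed toward heavier δ18O), not FIRM output, and the function name is hypothetical.

```python
import random
import statistics


def asymmetric_interval(results):
    """Return (mean, lower arm, upper arm) from the 2.5th/97.5th percentiles."""
    ordered = sorted(results)
    low = ordered[int(0.025 * (len(ordered) - 1))]
    high = ordered[int(0.975 * (len(ordered) - 1))]
    mean = statistics.fmean(ordered)
    return mean, mean - low, high - mean


rng = random.Random(9)
skewed = []
for _ in range(10000):
    value = rng.gauss(-2.5, 0.05)
    if rng.random() < 0.10:          # toy contamination toward cooler values
        value += rng.uniform(0.0, 1.5)
    skewed.append(value)

mean, lower_arm, upper_arm = asymmetric_interval(skewed)
```

In this toy case the upper arm is several times longer than the lower arm, so quoting the two arms separately conveys the skew that a symmetric ± value hides.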

Seasonality's effect can be clearly observed within the histograms in Figure 4; the variety of peaks is dictated by the number of seasons in the experiment. The resultant distribution is no longer normal but instead is bimodal (or trimodal or quadrimodal). The seasonal contrast within the individual site dictates the differences; a stronger seasonal contrast forces the δ18O values farther apart and makes the distribution more strongly bimodal. This abrupt structure to these distributions is, at least partially, an artifact of the data set used in FIRM having four defined time intervals (i.e., seasons). A more realistic data set would have a more gradational temperature and salinity profile in between the seasons, leading to a less abrupt series of peaks. However, it still demonstrates the importance of the seasonal parameter on the values recorded. An additional wrinkle is that the habitat for various planktic foraminifera is defined by an ecological niche, be it temperature, salinity, food source, or a specific layer of density. As that niche shifts throughout the season, the habitat of the species will change with it. Therefore, the uncertainty estimates discussed here should be looked at as a worst case scenario, as in actual oceanographic terms the foraminifera are varying their calcification depths to fit within the temperature and salinity space better than the abruptly defined depth space the model would suggest. This suggests that perhaps the model overestimates the uncertainty associated with seasonality, although it should still be considered the largest single source of variability in certain planktic foraminiferal isotopic data sets.

Obviously, more tests are better; but in the process of picking, is that a good rule of thumb? Are there exceptions? For example, is it more important to include more individuals, even if they are not as well preserved, or is it better to go with fewer but more pristine specimens? All else being ideal at Site 803, with five pristine individuals the CI is 0.03‰ with a mean value of −2.56‰. Including one additional poorly preserved test (50% original value) to, say, bolster the mass of the sample keeps the CI at 0.03‰ but shifts the mean to −2.08‰. At Site 803 then, a few well-preserved individuals are better than a larger number of tests with inconsistent preservation.

What kind of noise should one expect within a record generated from a weak species concept or cryptic, nondepth conforming, species? Here we use a 0–50 mbsl range for the tight species concept (10 individuals) and then 0–50 mbsl with a 10% chance of misidentification (0–100 mbsl) again at Site 803. The tight species concept (analogous to picking in a strict sense form) results in a 0.01‰ uncertainty (−2.58‰ mean), while the loose species concept (sensu lato) results in a 0.03‰ uncertainty (−2.57‰ mean), a surprisingly modest problem relative to the machine error. The same experiment run at Site U1406 results in 0.06‰ higher uncertainty and a 0.05‰ higher mean value. Adding in the other effects due to nonideal conditions, clearly it would be important to use a strict species concept when the time series fidelity is important beyond the broad trends. This would be particularly true in certain portions of the ocean, like Site U1406.

5.1 Caveats

While FIRM represents a step forward in developing more robust foraminiferal time series, there are important caveats concerning applications. The first is that FIRM relies on modern thermocline and halocline structure, as well as an understanding of the ecology of foraminifer species to arrive at the estimate of uncertainty. These are most well constrained in the modern. Choosing a best estimate modern analogue to the paleoceanographic circumstances is possible or even obvious for many studies, especially for more recent intervals like the Holocene. For instance, a worker studying the Medieval Climate Anomaly might make the reasonable assumption that the oceanographic conditions at her site are approximately similar to the modern; she can then use this model to estimate an uncertainty value for her entire core. Obviously, locations without modern analogues (e.g., Cretaceous epeiric seas) are a problem, as are times with independently unconstrained oceanographic circumstances. FIRM also explicitly only models a single point in time, thus ignoring a time series where, for example, the researchers may aim to study a change in the thermocline. Applying FIRM in these instances could lead to circular reasoning (e.g., reconstructing the thermocline structure and then using that reconstruction to drive the model uncertainty values). We would suggest that uncertainty estimates developed with circular reasoning, while not ideal, are better than no discussion of uncertainty at all. One simple solution is to develop reasonable best and worst case scenarios, given the available oceanographic data, estimate uncertainty through FIRM and present those alongside the data. Other caveats due to the simplifications in the calcification simulation have been discussed prior but nevertheless are important considerations when moving forward.

Further work will focus on developing the ability to incorporate time series, so that rather than a single uncertainty term for an entire study, multiple estimates of uncertainty can be made for a long paleoceanographic data set. Time series will also require further developments within FIRM to include temporal parameters, including a deeper methodology surrounding bioturbation, time averaging, and so on.

6 Conclusions

Our work supports the conclusions of previous investigations into foraminiferal isotope statistics: the more foraminifers per sample the better. The obvious question is “exactly how many specimens are needed for a statistically robust data point?” To which the at-first unsatisfying answer is “it depends.” It depends mainly on the local oceanography and climate of the place from which a particular sample comes, it depends on what species is selected for analysis, and it depends on the specific question a researcher is asking. Since a sample size rule of thumb is unwarranted, the better tactic is to decide a reasonable uncertainty value, estimate the number of tests required for a particular site and, most importantly, consider and report that uncertainty for the resulting isotope value in the analysis and subsequent publication. FIRM provides a quantitative way to do that across the world ocean.

The benefits of FIRM are twofold: (1) as a tool for estimating uncertainty and error for paleoceanographic researchers, and (2) as a means for exploring the impact of intraspecific variability, interspecific variability, sample size, seasonality, and geographic location on planktic foraminiferal oxygen isotopic data.

Acknowledgments

FIRM is available from https://github.com/Fraass/FIRM and within the Supporting Information. This work was unfunded. The authors do not have any conflicts of interest. We would like to thank Anna Joy Drury and Kaustubh Thirumalai for thoughtful conversation and insight. Susanna Fraass, Sarah White, Kaustubh Thirumalai, and two anonymous reviewers provided comments and editorial suggestions on earlier versions of this manuscript, which greatly improved the quality of this work. Lastly, we would like to thank R. Mark Leckie for years of encouragement, support, and foraminiferal training which made this possible.