Volume 49, Issue 13 e2022GL098076
Research Letter
Open Access

Plankton Imagery Data Inform Satellite-Based Estimates of Diatom Carbon

A. P. Chase (Corresponding Author)
Applied Physics Laboratory, University of Washington, Seattle, WA, USA
Correspondence to: A. P. Chase
Contribution: Conceptualization, Methodology, Formal analysis, Writing - original draft

E. S. Boss
School of Marine Sciences, University of Maine, Orono, ME, USA
Contribution: Methodology, Resources, Writing - review & editing, Funding acquisition

N. Haëntjens
School of Marine Sciences, University of Maine, Orono, ME, USA
Contribution: Software, Data curation, Writing - review & editing

E. Culhane
Woods Hole Oceanographic Institution, Woods Hole, MA, USA
Contribution: Software, Formal analysis, Data curation, Writing - review & editing

C. Roesler
Department of Earth and Oceanographic Science, Bowdoin College, Brunswick, ME, USA
Contribution: Methodology, Writing - review & editing

L. Karp-Boss
School of Marine Sciences, University of Maine, Orono, ME, USA
Contribution: Data curation, Writing - review & editing, Funding acquisition

First published: 18 June 2022

Abstract

Estimating the biomass of phytoplankton communities via remote sensing is a key requirement for understanding global ocean ecosystems. Of particular interest is the carbon associated with diatoms given their unequivocal ecological and biogeochemical roles. Satellite-based algorithms often rely on accessory pigment proxies to define diatom biomass, despite a lack of validation against independent diatom biomass measurements. We used imaging-in-flow cytometry to quantify diatom carbon in the western North Atlantic, and compared results to those obtained from accessory pigment-based approximations. Based on this analysis, we offer a new empirical formula to estimate diatom carbon concentrations from chlorophyll a. Additionally, we developed a neural network model in which we integrated chlorophyll a and environmental information to estimate diatom carbon distributions in the western North Atlantic. The potential for improving satellite-based diatom carbon estimates by integrating environmental information into a model, compared to models that are based solely on chlorophyll a, is discussed.

Key Points

  • Field observations show that diatom carbon estimates derived from pigment-based proxies are higher than those derived from plankton imaging

  • An updated equation to estimate diatom carbon from in situ or satellite chlorophyll a concentration measurements is provided

  • Environmental data and plankton imagery are applied in a neural network to map diatom carbon concentrations in the western North Atlantic

Plain Language Summary

Diatoms are microalgae that can form large blooms and play important roles in marine food webs and the carbon cycle. Direct measurement of their biomass in the ocean is challenging and time-consuming, so researchers often rely on bulk measurements of pigment concentrations as an approximation of diatom biomass. Here, we compare pigment-based estimates of diatom carbon to those derived from diatom cell counts and biovolumes measured with an automated microscope. We show that pigment-based estimates of carbon tend to be higher than those obtained from the imagery-based measurements. We propose a new empirical relationship between diatom carbon and chlorophyll biomass and apply it to earth-observing satellite data to obtain an ocean basin-scale view of diatom carbon in the western North Atlantic.

1 Introduction

Information on phytoplankton populations at large spatial and temporal scales is needed to assess broad-scale changes in phytoplankton communities, which are likely to occur in response to changing environmental conditions (Irwin & Oliver, 2009; Rousseaux & Gregg, 2015), and to develop and evaluate regional and global biogeochemical models. Of particular interest are the diatoms, a globally ubiquitous and diverse group of phytoplankton consisting of an estimated 100,000 species (Mann & Vanormelingen, 2013). Some species can form massive blooms and their aggregation and sinking characteristics have been linked to sequestration of carbon to the deep ocean (Honjo & Manganini, 1993; Jin et al., 2006). While diatoms are highly varied in their adaptations to different conditions and their role in the biological carbon pump (Kemp & Villareal, 2018; Tréguer et al., 2018), for the purposes of regional or global assessment they are often considered as one group. Satellite-based measurements have the potential to provide the large-scale information on the quantity and distribution of diatoms needed to answer ocean ecosystem and climate research-related questions.

Previous studies have successfully linked satellite ocean color and reflectance data with phytoplankton groups assessed using flow cytometry (Thyssen et al., 2015; Zubkov & Quartly, 2003). Approaches for retrieving information on diatom presence or abundance from remote sensing data include the analysis of multispectral water-leaving radiance anomalies (Alvain et al., 2005, 2008; Rêve-Lamarche et al., 2017), remote sensing reflectance (Rrs(λ)) band ratios (Kramer et al., 2018; Sathyendranath et al., 2004), a neural network approach incorporating environmental data (Raitsos et al., 2008), empirical orthogonal functions relating Rrs(λ) bands to phytoplankton groups (Xi et al., 2020), and differential optical absorption spectroscopy (DOAS), which requires water-leaving radiance measurements at high spectral resolution in the blue wavelengths (Bracher et al., 2009; Losa et al., 2017; Sadeghi et al., 2012). Phytoplankton pigment concentrations obtained from high performance liquid chromatography (HPLC) are used in the construction and evaluation of most of these algorithms. While pigment analysis is invaluable for many applications and has been effective in assessing different phytoplankton groups (e.g., Kramer et al., 2020; Kramer & Siegel, 2019; Swan et al., 2016), it often lacks validation against more direct observations of community composition. For example, fucoxanthin, a photosynthetic carotenoid found in diatoms (Jeffrey & Vesk, 1997), is commonly used as a marker for this group, although it is not unique to diatoms and is also present in other common phytoplankton groups, including prymnesiophytes and silicoflagellates (Jeffrey & Vesk, 1997; Roy et al., 2011). Moreover, pigment ratios within a population can vary in response to the availability of light and nutrients, and with diel changes in pigment synthesis (Becker et al., 2020; Goericke & Montoya, 1998; Organelli et al., 2017).

Historically available data on phytoplankton assemblages typically do not have the spatial and temporal coverage needed to obtain sufficient matches with remote sensing data, with the exception of continuous plankton recorder (CPR) data. CPR data have been used to develop and improve algorithms for detecting diatoms from remote sensing data (Raitsos et al., 2008; Rêve-Lamarche et al., 2017); however, these data are semi-quantitative. In addition, CPR data likely underestimate diatom biomass given the recorder's 270 μm mesh size (Richardson et al., 2006). Advances in plankton imaging now allow measurements of cell concentrations and cell biovolumes from which per-cell carbon can be estimated. We used imaging-in-flow cytometry and HPLC measurements from the western North Atlantic to compare cell imagery-based and accessory pigment-based estimates of diatom carbon. Based on these measurements, we propose a new empirical relationship between diatom carbon and chlorophyll a (Chl a). We also show how satellite-derived information on Chl a, temperature, and salinity can be integrated to estimate diatom carbon, and compare the results to those from accessory pigment-based and Chl a-based estimates.

2 Data and Methods

2.1 In Situ Temperature, Salinity, and Pigments

The North Atlantic Aerosol and Marine Ecosystems Study (NAAMES, 2015–2018) was conducted in the western North Atlantic onboard the R/V Atlantis and encompassed different seasons and stages of the phytoplankton annual cycle (Behrenfeld et al., 2019). In situ data used in this study were obtained both from instruments deployed for continuous (flow-through) measurement of surface temperature, salinity, and Chl a concentration, and from discrete water samples collected for HPLC pigment analysis (Figure 1a). The methods and processing protocols for these data types are provided in Text S1 in Supporting Information S1. Note that throughout the manuscript, Chl a from HPLC is referred to as “total Chl” and is defined as the sum of the concentrations of monovinyl Chl a, divinyl Chl a, chlorophyllide a, and Chl a allomers and epimers.


Figure 1. (a) The four NAAMES cruise tracks in the western North Atlantic Ocean. Colored dots show IFCB sample locations; blue = NAAMES01 in November 2015; orange = NAAMES02 in May-June 2016; yellow = NAAMES03 in August-September 2017; and purple = NAAMES04 in March-April 2018 (total n = 4,328). Black squares show locations of water samples taken for HPLC analysis (n = 205). Scenes 1 and 2 from NAAMES02, further analyzed with satellite data, are outlined in red boxes. (b) Example diatom images collected underway during NAAMES02. The black scale bar on each image represents 10 μm. Genera/categories of each image: A: Chaetoceros sp.; B: likely Guinardia sp.; C: Pseudo-nitzschia sp.; D: order Naviculales; E: Chaetoceros sp.; F: Corethron sp.; G: unidentified pennate; H: Thalassiosira sp.; I: unidentified centric; J: Guinardia sp.; K: Rhizosolenia sp.

2.2 Diatom Carbon Estimated From Plankton Imagery

Phytoplankton cells were imaged with an Imaging FlowCytobot (IFCB, McLean Research Laboratories, Inc.; Olson & Sosik, 2007) with a 150 μm Nitex mesh attached to the intake (Text S2 in Supporting Information S1). Images were classified into the highest taxonomic category possible based on morphology using the EcoTaxa platform (Picheral et al., 2017; https://ecotaxa.obs-vlfr.fr/), and then grouped more broadly into 18 categories, including a category for all combined diatoms. A deep learning classification network was trained and tested using the classified images and then applied to all images within the data set (Text S2 in Supporting Information S1). The network identified diatoms with 90% accuracy and also corrected some images that had been mislabeled through human error (Text S2 in Supporting Information S1). In total, 336,872 diatom cells or chains were identified in 4,328 IFCB samples across all four NAAMES cruises (Figure 1b; Figure S1 in Supporting Information S1). When cell counts were low, multiple samples were combined to increase the sampled volume and reduce statistical counting uncertainty, as in Chase et al. (2020); details are provided in Text S2 in Supporting Information S1. After samples were combined, 1,449 IFCB samples remained for the present analysis. Biovolume (μm³) was calculated for each diatom cell or chain (Moberg & Sosik, 2012; Sosik & Olson, 2007; https://github.com/hsosik/ifcb-analysis/tree/features_v3) and converted to diatom carbon (Cdiat) using the diatom-specific formula reported by Menden-Deuer and Lessard (2000):
$$C_{\mathrm{diat}} = 0.288 \times V^{0.811} \tag{1}$$
where C_diat is the carbon content of a cell or chain (pg C) and V is its biovolume (μm³).

On average, diatoms have a lower carbon content per unit volume than other phytoplankton (see figure 5 in Menden-Deuer & Lessard, 2000), which may result from the large vacuoles often present in diatoms (Strathmann, 1967). We calculated uncertainties in Equation 1 using the confidence intervals for the coefficients reported in table 4 of Menden-Deuer and Lessard (2000). A recent study by McNair et al. (2021) showed that the conversion from cell volume to carbon is not greatly affected by the method used to estimate cell volume. Total diatom carbon in each sample was normalized to the sampled water volume, yielding the diatom carbon concentration per sample (Cdiat_IFCB, units of mg m−3).
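The per-sample calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the NAAMES processing code: it assumes the commonly cited diatom coefficients of Menden-Deuer and Lessard (2000) (0.288 and 0.811, carbon in pg C, biovolume in μm³), treats each chain as one particle, pools low-count samples by concatenating their particles and summing their volumes, and uses a Poisson (1/√N) estimate of counting uncertainty. All function names are hypothetical.

```python
import math

# Diatom volume-to-carbon coefficients for Equation 1, assumed here to be the
# commonly cited Menden-Deuer & Lessard (2000) values: C [pg C] = 0.288 * V^0.811.
A_DIAT = 0.288
B_DIAT = 0.811


def carbon_per_particle_pg(biovolume_um3: float) -> float:
    """Carbon (pg C) of one diatom cell or chain from its biovolume (um^3)."""
    return A_DIAT * biovolume_um3 ** B_DIAT


def pool_samples(samples: list[tuple[list[float], float]]) -> tuple[list[float], float]:
    """Combine low-count samples: concatenate biovolumes and sum volumes (mL)."""
    biovolumes = [v for vols, _ in samples for v in vols]
    volume_ml = sum(ml for _, ml in samples)
    return biovolumes, volume_ml


def cdiat_ifcb(biovolumes_um3: list[float], volume_ml: float) -> tuple[float, float]:
    """Return (C_diat in mg C m^-3, relative Poisson counting uncertainty)."""
    total_pg = sum(carbon_per_particle_pg(v) for v in biovolumes_um3)
    # 1 pg mL^-1 = 1e-9 mg / 1e-6 m^3 = 1e-3 mg m^-3
    concentration = total_pg / volume_ml * 1e-3
    n = len(biovolumes_um3)
    rel_counting_error = 1.0 / math.sqrt(n) if n > 0 else math.inf
    return concentration, rel_counting_error


# Example: two hypothetical 5 mL samples pooled before conversion
vols, ml = pool_samples([([1000.0, 2500.0], 5.0), ([800.0], 5.0)])
cdiat, rel_err = cdiat_ifcb(vols, ml)
```

Pooling before conversion increases N, which is why combining low-count samples reduces the relative counting error.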

2.3 Diatom Carbon Estimated From Accessory Pigments

The fractional contribution of diatoms to total Chl a was estimated using a method often referred to as diagnostic pigment analysis (DPA), which assigns accessory pigments to phytoplankton groups. Diagnostic accessory pigments were originally used to define phytoplankton size classes (Uitz et al., 2006; Vidussi et al., 2001) and have subsequently been used to define phytoplankton taxonomic groups (Hirata et al., 2011; Losa et al., 2017; Soppa et al., 2014). Using a global data set, Hirata et al. (2011) defined the fraction of total Chl a attributed to diatoms as:
$$f_{\mathrm{Diat}}^{\mathrm{H11}} = \frac{1.41\,\mathrm{Fuco_{corr}}}{\Sigma DP_w} \tag{2}$$
where the summed weighted diagnostic pigments are defined as $\Sigma DP_w$ = 1.41Fuco + 1.41Peri + 1.27Hexa + 0.35Buta + 0.6Allo + 1.01Chlb + 0.86Zea (Uitz et al., 2006), and where Fuco_corr = Fuco − (Fuco/Hexa)_baseline × Hexa (Hirata et al., 2011). Pigment abbreviations are as follows: Fuco = fucoxanthin, Peri = peridinin, Hexa = 19′-hexanoyloxyfucoxanthin, Allo = alloxanthin, Buta = 19′-butanoyloxyfucoxanthin, Chlb = total chlorophyll b, and Zea = zeaxanthin. (Fuco/Hexa)_baseline is defined as the median of the Fuco/Hexa ratio across all 205 HPLC samples. We also applied the equations in Losa et al. (2017):
[Equation 3a]
[Equation 3b]
where the weights for accessory pigments are the values from their global study, $\Sigma DP_w$ = 1.27Fuco + 2.43Peri + 1.07Hexa + 0.0Buta + 2.06Allo + 1.30Chlb + 2.36Zea, and the contribution of fucoxanthin to nanoplankton (Fuco_nano) in Equation 3b is defined in Losa et al. (2017) as:
[Equation 4]
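The Hirata et al. (2011) diatom fraction with the median-based Fuco correction can be sketched as follows. The sketch assumes Equation 2 has the form fDiat = 1.41 Fuco_corr / ΣDP_w, with the Uitz et al. (2006) weights listed above. Clipping negative Fuco_corr values to zero and skipping samples with zero Hexa when computing the baseline are illustrative choices, not steps stated in the text; the function names are hypothetical.

```python
from statistics import median

# Weights for the summed diagnostic pigments (Uitz et al., 2006), as listed above.
UITZ_WEIGHTS = {
    "Fuco": 1.41, "Peri": 1.41, "Hexa": 1.27, "Buta": 0.35,
    "Allo": 0.60, "Chlb": 1.01, "Zea": 0.86,
}


def weighted_dp_sum(pigments: dict[str, float]) -> float:
    """Sigma DP_w: weighted sum of diagnostic pigment concentrations (mg m^-3)."""
    return sum(w * pigments[name] for name, w in UITZ_WEIGHTS.items())


def fuco_hexa_baseline(samples: list[dict[str, float]]) -> float:
    """Median Fuco/Hexa ratio across samples (samples with Hexa == 0 are skipped)."""
    return median(s["Fuco"] / s["Hexa"] for s in samples if s["Hexa"] > 0)


def f_diat_h11(pigments: dict[str, float], baseline: float) -> float:
    """Diatom fraction of total Chl a, assuming f = 1.41 * Fuco_corr / Sigma DP_w."""
    # Assumption: negative corrected fucoxanthin is clipped to zero.
    fuco_corr = max(pigments["Fuco"] - baseline * pigments["Hexa"], 0.0)
    return 1.41 * fuco_corr / weighted_dp_sum(pigments)
```

Because the correction only removes fucoxanthin and the same 1.41 weight appears in ΣDP_w, the resulting fraction stays between 0 and 1.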

Additionally, we tested the application of the CHEMTAX program (Mackey et al., 1996) to estimate the relative contribution of phytoplankton groups to total Chl a (Text S3 and Figures S2–S3 in Supporting Information S1).

To compare cell imagery- and pigment-based estimates of diatom biomass in units of mg C m−3, we calculated diatom Chl a concentrations by multiplying fDiat by Chl a and then converted them to carbon assuming a constant cellular carbon-to-chlorophyll (C:Chl) ratio (Equations 5a-5c). Because the C:Chl ratio varies (Behrenfeld et al., 2016; Jackson et al., 2017; Sathyendranath et al., 2009), we used mean, minimum, and maximum C:Chl values for diatoms to calculate a range of possible values for pigment-derived diatom carbon (Cdiat_Pigments) as follows:
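$$C_{\mathrm{diat\_Pigments}}^{\mathrm{mean}} = f_{\mathrm{Diat}} \times \mathrm{Chl}\,a \times (\mathrm{C{:}Chl})_{\mathrm{mean}} \tag{5a}$$
$$C_{\mathrm{diat\_Pigments}}^{\mathrm{min}} = f_{\mathrm{Diat}} \times \mathrm{Chl}\,a \times (\mathrm{C{:}Chl})_{\mathrm{min}} \tag{5b}$$
$$C_{\mathrm{diat\_Pigments}}^{\mathrm{max}} = f_{\mathrm{Diat}} \times \mathrm{Chl}\,a \times (\mathrm{C{:}Chl})_{\mathrm{max}} \tag{5c}$$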
where Chl a concentrations were obtained from HPLC analysis, and C:Chl values were taken from Sathyendranath et al. (2009). We applied Equations 5a-5c to fDiat defined by both Equation 2 and Equations 3a-3b.
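Equations 5a-5c amount to scaling the diatom share of Chl a by three C:Chl values, which yields a central estimate and a range. A minimal sketch follows; the C:Chl numbers in the example are hypothetical placeholders, not the values reported by Sathyendranath et al. (2009), and the function name is illustrative.

```python
def cdiat_pigments(f_diat: float, chl_a: float,
                   c_chl_mean: float, c_chl_min: float, c_chl_max: float) -> dict[str, float]:
    """Pigment-derived diatom carbon (mg C m^-3) for mean, min, and max C:Chl."""
    if not c_chl_min <= c_chl_mean <= c_chl_max:
        raise ValueError("expected c_chl_min <= c_chl_mean <= c_chl_max")
    chl_diat = f_diat * chl_a  # diatom Chl a, mg m^-3
    return {
        "mean": chl_diat * c_chl_mean,
        "min": chl_diat * c_chl_min,
        "max": chl_diat * c_chl_max,
    }


# Hypothetical example: half of 1.2 mg m^-3 Chl a attributed to diatoms
example = cdiat_pigments(0.5, 1.2, c_chl_mean=50.0, c_chl_min=20.0, c_chl_max=100.0)
```

The "min" and "max" entries correspond to the y-axis error bars described for the pigment-based estimates.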

To increase the number of matches between Cdiat_Pigments and Cdiat_IFCB, we also estimated diatom biomass from Chl a using the empirical relationships proposed by Hirata et al. (2011), Losa et al. (2017), and Soppa et al. (2014). All equations are provided in Table S1 in Supporting Information S1. For these estimates, Chl a concentrations were obtained from spectral particulate absorption measurements (Text S1 in Supporting Information S1), and the results were converted to units of carbon with the same C:Chl constants used in Equations 5a-5c.

2.4 Neural Network-Based Estimates of Diatom Carbon

Neural networks provide a tool to model complex relationships between a desired target quantity (here, diatom carbon) and multiple input parameters (here, temperature, salinity, and Chl a). We trained a shallow neural network model by using Cdiat_IFCB from the entire NAAMES02 cruise track (631 IFCB samples comprising 209,261 diatom images collected between 11 May and 4 June 2016) as the target parameter (Text S4, Figure S4 in Supporting Information S1). Chl a, surface water temperature and surface salinity were provided as input parameters. These input parameters were chosen as they are significantly correlated with Cdiat_IFCB (Figure S5 in Supporting Information S1; see also Brun et al., 2015), and can be derived from remote sensing data.
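For readers unfamiliar with the approach, a shallow network of this kind can be sketched in pure Python: one hidden tanh layer, a linear output, z-scored inputs (for example Chl a, temperature, and salinity), and stochastic gradient descent on squared error. The architecture, hidden-layer size, learning rate, and training scheme are illustrative assumptions, and the class and function names are hypothetical; the network actually used in this study is described in Text S4 in Supporting Information S1.

```python
import math
import random


def standardize(rows: list[list[float]]) -> tuple[list[list[float]], list[tuple[float, float]]]:
    """Z-score each input column (e.g., Chl a, temperature, salinity)."""
    stats = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        stats.append((m, sd))
    scaled = [[(x - m) / sd for x, (m, sd) in zip(row, stats)] for row in rows]
    return scaled, stats


class ShallowNet:
    """One hidden tanh layer and a linear output unit (illustrative only)."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x: list[float]) -> list[float]:
        return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.w1, self.b1)]

    def predict(self, x: list[float]) -> float:
        return sum(w * h for w, h in zip(self.w2, self._hidden(x))) + self.b2

    def mse(self, X: list[list[float]], y: list[float]) -> float:
        return sum((self.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)

    def train(self, X: list[list[float]], y: list[float],
              lr: float = 0.05, epochs: int = 200) -> None:
        """Stochastic gradient descent on 0.5 * (prediction - target)^2."""
        for _ in range(epochs):
            for x, t in zip(X, y):
                h = self._hidden(x)
                err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - t
                for j, hj in enumerate(h):
                    # Backpropagate through tanh using the pre-update output weight.
                    delta = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * delta * xi
                    self.b1[j] -= lr * delta
                self.b2 -= lr * err
```

The saved (mean, SD) pairs from `standardize` would be reused to scale satellite inputs before prediction, so that training and application share the same input scaling.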

To assess distributions of diatom carbon across the study area, two sections of the ship track that had relatively low cloud cover were chosen and denoted Scene 1 (39–45°N and 57–72°W) and Scene 2 (46–58°N and 36–56°W) (Figure 1a). The network model was applied to remote sensing data from the two scenes using satellite-based products for Chl a, temperature, and salinity (Text S5 in Supporting Information S1).
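When a network trained on in situ data is applied to satellite fields, pixels whose inputs fall outside the training range can be left blank rather than extrapolated, as in the Scene 1 and Scene 2 maps. One simple way to implement such a mask is sketched below; the per-variable min/max rule and the function names are assumptions for illustration. Pixels with missing (NaN) inputs, for example under cloud, are masked as well, because every comparison with NaN is false.

```python
import math
from typing import Callable, Sequence


def training_ranges(train_rows: Sequence[Sequence[float]]) -> list[tuple[float, float]]:
    """Per-input (min, max) over the in situ training data."""
    return [(min(col), max(col)) for col in zip(*train_rows)]


def predict_with_range_mask(model: Callable[[Sequence[float]], float],
                            pixels: Sequence[Sequence[float]],
                            ranges: list[tuple[float, float]]) -> list[float]:
    """Apply the model only where every input lies inside its training range."""
    out = []
    for px in pixels:
        # NaN inputs fail both comparisons, so cloud-masked pixels stay NaN too.
        inside = all(lo <= v <= hi for v, (lo, hi) in zip(px, ranges))
        out.append(model(px) if inside else math.nan)
    return out
```

A per-variable box is the simplest rule; it does not catch input combinations that are individually in range but jointly unlike the training data.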

3 Results and Discussion

3.1 Comparison of Pigment- and Cell Imagery-Derived Cdiat

Across the diverse environmental conditions sampled during the four NAAMES campaigns, values of Cdiat_Pigments were higher than Cdiat_IFCB regardless of the model used. For the model of Losa et al. (2017), the median bias was 142%, corresponding to a median difference of 2.4 mg C m−3; for the model of Hirata et al. (2011), the corresponding values were 138% and 0.9 mg C m−3 (Figure 2a). This is likely explained in part by the presence of other fucoxanthin-containing phytoplankton groups, including prymnesiophytes, silicoflagellates, and pelagophytes (Jeffrey & Vesk, 1997; Roy et al., 2011), as well as some dinoflagellate types (Yoon et al., 2002). Representatives of these groups were detected in the NAAMES IFCB samples across both the nano- and microphytoplankton size classes (Chase et al., 2020). The model of Losa et al. (2017) accounts for fucoxanthin found in nanoplankton (Equations 3a-4), and Hirata et al. (2011) similarly correct the representation of diatoms by removing a portion of fucoxanthin as a function of Hexa. However, the discrepancy between cell imagery- and accessory pigment-based estimates of diatoms remains and can likely be attributed to the approximations made when defining diatoms using simple accessory pigment ratios. The CHEMTAX model, a more complex application of accessory pigment ratios, also predicts higher diatom carbon concentrations than the imagery-based results (Text S3 and Figures S2–S3 in Supporting Information S1). The degree of overestimation varies depending on the initial pigment ratios and phytoplankton groups used as CHEMTAX inputs. The initial pigment ratios and phytoplankton groups suggested by van de Poll et al. (2013) yield the lowest deviation (Figure S3b in Supporting Information S1).
We note that because the choice of phytoplankton groups and initial pigment ratios produces noticeably different CHEMTAX results (Figures S2–S3 in Supporting Information S1), the CHEMTAX approach is best suited to scenarios where a priori knowledge of the phytoplankton community exists.
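Median bias and median difference statistics of the kind reported above can be computed from matched sample pairs. The exact bias definition is not restated in this section, so the sketch assumes a per-match relative bias of 100 × (Cdiat_Pigments − Cdiat_IFCB) / Cdiat_IFCB alongside the absolute difference, and drops pairs with non-positive IFCB values; the function name is hypothetical.

```python
from statistics import median


def median_bias_and_difference(c_pig: list[float], c_ifcb: list[float]) -> tuple[float, float]:
    """Median relative bias (%) and median difference (mg C m^-3) of matched pairs."""
    # Assumption: only pairs with a positive IFCB estimate are used.
    pairs = [(p, i) for p, i in zip(c_pig, c_ifcb) if i > 0]
    rel_bias = [100.0 * (p - i) / i for p, i in pairs]
    diff = [p - i for p, i in pairs]
    return median(rel_bias), median(diff)
```

Medians are less sensitive than means to the few very high pigment-based values that occur in bloom conditions.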


Figure 2. (a) Diatom carbon estimated from IFCB imagery (x-axis; Section 2.2) and from accessory pigment-based methods (y-axis; Equations 2-5c). Y-axis error bars show the range of possible values when converting accessory pigment-based Chl a values to diatom carbon (Equations 5a-5c, Section 2.3). (b) Chl a versus diatom carbon estimated from IFCB imagery across all four NAAMES cruises (gray dots, n = 1,449) and using the equations in Table S1 in Supporting Information S1 (colored lines). Shaded areas around the lines represent the range of diatom carbon values obtained with the minimum and maximum C:Chl values (Section 2.3). The black line shows the fit described by Equation 6 of this study. Cdiat_IFCB error bars in both (a) and (b) represent the combined uncertainty in particle biovolume estimates, the conversion from cell volume to carbon, and statistical counting errors.

Estimates of Chl a from IFCB imagery, in combination with conventional flow cytometry data, compare well with Chl a from HPLC for the NAAMES study (see figure 3a in Chase et al., 2020), but Cdiat_IFCB could potentially underestimate diatom carbon because the 150 μm mesh at the sampling intake prevents very large cells and chains of cells from entering the flow cell. However, some large chains do enter the instrument because they tend to orient with their major axis parallel to the flow; in the NAAMES data set, the major axis length of diatoms ranged from approximately 6 to 200 μm (Figure S9 in Supporting Information S1). Specifically, 4.3% of all images classified as diatoms have a major axis length >150 μm. Although small nano-diatoms are often identifiable in IFCB imagery due to their symmetrical shapes and rectangular cross-sections, the IFCB does not comprehensively image particles smaller than approximately 6–9 μm (depending on instrument settings), so these cells are excluded from our analysis. In some regions, small nanoplanktonic diatoms can contribute significantly to total particulate organic carbon, as has been shown in the Mediterranean Sea (up to 26%; Leblanc et al., 2018). However, both global and region-specific databases (e.g., for the North Atlantic) suggest that small nano-diatoms (<5 μm) generally account for a low percentage of total diatom abundance in the open ocean (Leblanc et al., 2012, 2018). The potential underestimation of diatoms due to the omission of the largest and smallest cells implies that, at times and places where these size classes are abundant, IFCB imagery-based estimates may represent a lower bound on diatom carbon concentration.

3.2 A Revised Empirical Model for Cdiat

Estimates of diatom carbon from published Chl a-based models (Table S1 in Supporting Information S1) were higher than those from IFCB imagery (Figure 2b). This is expected given the approximations made when using accessory pigments as a proxy for diatoms. Based on the NAAMES imagery and Chl a data, we propose a new empirical relationship between diatom carbon ($C_{\mathrm{diat}}$) and Chl a (both in units of mg m−3):
[Equation 6]
where coefficient uncertainties are shown in parentheses. Uncertainties for the coefficients were calculated by bootstrapping: the 1,449 data points used to derive the fit (Figure 2b) were resampled with replacement over 10,000 iterations, the fit was recomputed for each resample, and the standard deviation of each of the two coefficients across iterations was taken as its uncertainty. Equation 6 provides a relationship between Chl a and diatom carbon that is based on direct measurements of diatom concentrations and therefore may be more suitable than previously developed models based solely on relationships with accessory pigments. Equation 6, however, does not capture the variability observed in the imagery data as a function of Chl a concentration (Figure 2b). This scatter also indicates that while Chl a-based methods can provide a reasonable estimate of diatom carbon averaged over an ocean basin, they have high uncertainty when predicting diatom biomass at any given time and location. This should be considered when applying any Chl a-based equation to assess diatom distributions.
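The bootstrap described above can be sketched as follows. Because the functional form of Equation 6 is not reproduced in this text, the sketch assumes a two-coefficient power law, Cdiat = a × Chl^b, fitted by ordinary least squares in log10 space; both the form and the fitting method are assumptions, and the function names are hypothetical.

```python
import math
import random
from statistics import stdev


def fit_power_law(chl: list[float], cdiat: list[float]) -> tuple[float, float]:
    """OLS fit of log10(Cdiat) = log10(a) + b * log10(Chl); returns (a, b)."""
    xs = [math.log10(c) for c in chl]
    ys = [math.log10(c) for c in cdiat]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return 10 ** (my - b * mx), b


def bootstrap_sd(chl: list[float], cdiat: list[float],
                 n_iter: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Standard deviations of (a, b) over fits to resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(chl)
    a_vals, b_vals = [], []
    for _ in range(n_iter):
        idx = rng.choices(range(n), k=n)
        if len({chl[i] for i in idx}) < 2:
            continue  # degenerate resample: slope undefined
        a, b = fit_power_law([chl[i] for i in idx], [cdiat[i] for i in idx])
        a_vals.append(a)
        b_vals.append(b)
    return stdev(a_vals), stdev(b_vals)
```

With 1,449 points, degenerate resamples are practically impossible; the guard matters only for very small data sets.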

3.3 Diatom Carbon Models Applied to Remote Sensing Data

We applied three approaches to estimate diatom carbon from satellite data available for May 2016 (Equations 2 and 5a-5c; Equation 6; and the neural network model of Section 2.4) and compared the resulting distribution maps. Applying Equations 2 and 5a-5c results in higher overall diatom carbon concentrations than the other two approaches (Figures 3a and 4a). Diatom carbon concentrations derived from Equation 6 and the neural network model, both based on the IFCB cell imagery data, are of similar magnitude but differ in their spatial patterns; notably, the neural network results highlight more (sub)mesoscale features (Figures 3b, 3c, 4b, and 4c). Comparing in situ measurements to satellite-modeled diatom carbon in the highly dynamic North Atlantic is challenging because physical and biological features change on timescales of hours to days and composite multi-day satellite data must be used (Figures S8 and S10 in Supporting Information S1). As with any remote sensing algorithm, the ability of the satellite data to accurately represent in situ conditions is ultimately a limiting factor. More detailed analyses of satellite subpixel variability with regard to phytoplankton community composition are warranted.


Figure 3. Scene 1 diatom carbon estimated from satellite data, with in situ imagery-based data points from May 11 to 13, 2016 overlain. (a) Cdiat estimated using fDiatH11 from Hirata et al. (2011) (Equation 2) converted to units of carbon (mg m−3) (Equations 5a-5c). Cdiat mean, median, and SD are 19.9, 10.2, and 23.1 mg C m−3, respectively. (b) Cdiat estimated using Equation 6 of this study. Cdiat mean, median, and SD = 3.5, 1.5, and 6.7 mg C m−3, respectively. (c) Diatom carbon estimated with the three-parameter neural network model described in this study (Section 2.4). Cdiat mean, median, and SD = 5.6, 2.3, and 9.7 mg C m−3, respectively. Regions of missing data not seen in the other two panels result from satellite-derived neural network inputs falling outside the range of the in situ data used to train the neural network.


Figure 4. Scene 2 diatom carbon estimated from satellite data. Panels as in Figure 3, but for the Scene 2 region with imagery-based in situ data from May 16 to 23, 2016. (a) Cdiat mean, median, and SD are 16.6, 7.0, and 22.6 mg C m−3, respectively. (b) Cdiat mean, median, and SD = 4.1, 1.1, and 10.3 mg C m−3, respectively. (c) Cdiat mean, median, and SD = 6.5, 2.7, and 10.8 mg C m−3, respectively.

In the western North Atlantic, relationships between diatom carbon and Chl a, temperature, and salinity are highly variable (Figure S4 in Supporting Information S1), precluding a simple and robust model for predicting diatom carbon at a given time and location within the region from these three inputs. However, our results suggest that further work with neural networks, which have previously shown promise (Palacz et al., 2013; Raitsos et al., 2008), could help resolve finer-scale spatial distributions of phytoplankton communities. In addition to the seasonal variability of the North Atlantic, there is high spatial variability in Chl a concentration, temperature, and salinity (Figures S6, S7 in Supporting Information S1). This underscores the need to better understand the conditions and mechanisms determining diatom abundance in the North Atlantic, consistent with recent studies that highlight and work toward explaining the unexpected lack of large bloom-forming diatoms in the region (Behrenfeld et al., 2021; Bolaños et al., 2020).

Uncertainties must be accounted for, both in the data and in the modeling approach. The average uncertainty in Cdiat from the neural network (Section 2.4; Figures 3c and 4c) is 65% (Text S4 in Supporting Information S1). This value accounts for uncertainty in the data used to develop the neural network model and for the accuracy of the model itself (Figure S4 in Supporting Information S1). Uncertainties for the coefficients in Equation 6, and a description of how they were calculated, are provided in Section 3.2. Previously published algorithms for estimating diatoms from Chl a report best-fit statistics, but not uncertainties arising from the data used in algorithm development or uncertainties in the modeled fit parameters. We emphasize the need to define and propagate uncertainties when estimating diatom and other phytoplankton group quantities from pigments, both in situ and with remote sensing data, and when interpreting the observed patterns.
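Propagating coefficient uncertainties into a diatom carbon estimate can be illustrated with a Monte Carlo sketch. It assumes a two-coefficient power law, Cdiat = a × Chl^b, independent Gaussian errors on the two coefficients, and hypothetical coefficient values; it is one generic option for propagation, not the procedure used in this study, and the function name is illustrative.

```python
import random
from statistics import mean, stdev


def propagate_power_law(chl: float, a: float, sd_a: float, b: float, sd_b: float,
                        n_draws: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo mean and SD of Cdiat = a * Chl^b with Gaussian coefficient errors."""
    rng = random.Random(seed)
    # Each draw samples both coefficients independently (an assumption).
    draws = [rng.gauss(a, sd_a) * chl ** rng.gauss(b, sd_b) for _ in range(n_draws)]
    return mean(draws), stdev(draws)


# Hypothetical coefficients and uncertainties, for illustration only
m, s = propagate_power_law(chl=1.5, a=50.0, sd_a=5.0, b=2.0, sd_b=0.1)
```

A Monte Carlo approach avoids linearizing the exponent, which matters when the uncertainty in b is large relative to the spread of log Chl.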

4 Conclusions

We provide here an updated model for estimating diatom carbon from Chl a, and highlight the need for independent measurements of diatom biomass, which are critical for quantifying, and thus removing, biases associated with different measurement types. Our results also illustrate the potential for combining satellite data from multiple platforms in a neural network framework. The inclusion of ancillary environmental information such as water temperature and salinity can improve phytoplankton group modeling efforts (Figure S10 and Table S2 in Supporting Information S1; see also Brewin et al., 2019; Xi et al., 2021). Looking forward, the upcoming NASA Plankton, Aerosol, Cloud, and ocean Ecosystem (PACE) mission will include polarimetry and hyperspectral ocean color instruments whose data can potentially be combined with other remotely sensed or modeled data. Accessory phytoplankton pigments can be estimated from hyperspectral Rrs(λ) (Chase et al., 2017; Kramer et al., 2022), and this information could help constrain neural network results, as different major phytoplankton groups contain different accessory pigment assemblages. Estimating diatoms and other phytoplankton groups with remote sensing techniques requires in situ validation data collected with a variety of methods, as well as critical attention to uncertainties and potential biases. The study presented here aims to address these needs while working toward the goal of assessing phytoplankton community composition from space.

Acknowledgments

The authors are grateful to Michael Behrenfeld and Chris Hostetler for their leadership of the NAAMES expedition, as well as the crew and scientists who helped with data collection, with special thanks to Sasha Kramer and Nicholas Baetge. Thank you to Adriana Zingone for assistance with diatom taxonomic identification. The authors appreciate the helpful feedback from M. Thyssen and one anonymous reviewer, which helped improve the manuscript. The authors state no financial conflicts of interest. Funding for this work was provided by NASA grants #NNX15AE67G and #80NSSC20M0202. A. Chase is supported by a Washington Research Foundation Postdoctoral Fellowship.

Data Availability Statement

Data presented in this paper are available at the NASA SeaBASS repository (https://seabass.gsfc.nasa.gov/naames; DOI: https://doi.org/10.5067/SeaBASS/NAAMES/DATA001). IFCB images are viewable on the EcoTaxa platform (https://ecotaxa.obs-vlfr.fr/) and image feature data following classification with the neural network are available at DOI: https://doi.org/10.5281/zenodo.6595852.