PaCTS 1.0: A Crowdsourced Reporting Standard for Paleoclimate Data

The progress of science is tied to the standardization of measurements, instruments, and data. This is especially true in the Big Data age, where analyzing large data volumes critically hinges on the data being standardized. Accordingly, the lack of community‐sanctioned data standards in paleoclimatology has largely precluded the benefits of Big Data advances in the field. Building upon recent efforts to standardize the format and terminology of paleoclimate data, this article describes the Paleoclimate Community reporTing Standard (PaCTS), a crowdsourced reporting standard for such data. PaCTS captures which information should be included when reporting paleoclimate data, with the goal of maximizing the reuse value of paleoclimate data sets, particularly for synthesis work and comparison to climate model simulations. Initiated by the LinkedEarth project, the process to elicit a reporting standard involved an international workshop in 2016, various forms of digital community engagement over the next few years, and grassroots working groups. Participants in this process identified important properties across paleoclimate archives, in addition to the reporting of uncertainties and chronologies; they also identified archive‐specific properties and distinguished reporting standards for new versus legacy data sets. This work shows that at least 135 respondents overwhelmingly support a drastic increase in the amount of metadata accompanying paleoclimate data sets. Since such goals are at odds with present practices, we discuss a transparent path toward implementing or revising these recommendations in the near future, using both bottom‐up and top‐down approaches.

Plain Language Summary
Standardizing the way data are described and shared is key to accelerating the progress of science. Building on recent advances in paleoceanography and paleoclimatology, we present the first community-led reporting standard for such data sets. The Paleoclimate Community reporTing Standard (PaCTS) provides guidelines as to which information should be included when reporting data from various paleoclimate archives, as well as themes common to many fields, like uncertainty and other site-specific information. The ultimate goal of this effort is to (1) make these data sets more reusable over the long term and (2) provide a roadmap for implementing and revising the standard as the field of paleoclimatology and its practitioners both evolve. The requirements are driven by the differing needs of data producers and data consumers, who often have different goals in mind. Thus, agreeing on and writing up these requirements involves building consensus within the community on its present and future goals.

Introduction
Paleoclimatology is a highly integrative discipline, often requiring the comparison of multiple data sets and model simulations to reach fundamental insights about the climate system. Currently, such syntheses are hampered by the time and effort required to transform the data into a usable format for each application. This task, called data wrangling, is estimated to consume up to 80% of researcher time in some scientific fields (Dasu & Johnson, 2003), an estimate commensurate with the experience of many paleoclimatologists, particularly at the early-career stage. Wrangling involves not only identifying missing values or outliers in the time series but also searching multiple databases for the scattered records, contacting the original investigators for the missing data and metadata, and organizing the data into a machine-readable format. Further, this wrangling requires an understanding of each data set's originating field and its unspoken practices and so cannot be easily automated or outsourced to unskilled labor or software. There is therefore an acute need for standardizing paleoclimate data sets.
Indeed, standardization accelerates scientific progress, particularly in the era of Big Data, where data should be Findable, Accessible, Interoperable, and Reusable (FAIR; Wilkinson et al., 2016). Standardization is critical to many scientific endeavors: efficiently querying databases, analyzing the data, and visualizing the results; removing participation barriers for early-career scientists or people outside the field; reducing unintended errors in data management; and ensuring appropriate credit to the original authors. While the paleoclimate community has made great strides in this direction (e.g., Williams et al., 2018), much work remains. The recent adoption of the FAIR data principles (Wilkinson et al., 2016) by the American Geophysical Union (Stall et al., 2017) elevates the urgency of defining what data and metadata should be archived, and how. This article proposes a community-recommended set of preliminary reporting standards and an open platform to determine which metadata are important for public archival, with an eye toward maximizing the long-term value of hard-earned paleoclimate observations and ensuring optimal reuse.
The need for standardization in paleoclimate research goes beyond vocabulary agreement. Consider the editorial of Wolff (2007), which tackled the ambiguous definition of time in the paleoclimate community. The notation before present (BP) has become a de facto standard in the community, although "present" means different things to different people. It is often taken as Common Era (CE) 1950 (especially within the radiocarbon community), but it may also be left undefined, defined as some other date (e.g., CE 2000), or taken as the year the study was performed or published. For studies spanning several million years with age uncertainties in excess of 1,000 years, a 50-year difference is immaterial. However, for studies working at higher resolution (e.g., decadal to subannual), concentrating on recent millennia, this difference is consequential. Thus, an agreement over the precise meaning of the term present turns out to be critical to many uses of these data sets. The same can be said of many other metadata properties, underscoring the need for common practices in paleoclimate data reporting.
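The arithmetic behind this ambiguity can be sketched in a few lines. This is a minimal illustration, assuming ages are expressed in years BP and that the only unknown is the reference year; the function name `bp_to_ce` and the sample age are hypothetical, and no correction is made for the absence of a year zero.

```python
def bp_to_ce(age_bp, present_ce=1950):
    """Convert an age in years before present (BP) to a calendar year CE.

    `present_ce` is the year taken as "present"; 1950 is the radiocarbon
    convention mentioned in the text. This is an illustrative helper,
    not part of any published library.
    """
    return present_ce - age_bp


# The same reported age maps to calendar years 50 years apart,
# depending on which "present" the author had in mind.
age = 120  # years BP (hypothetical value)
year_if_1950 = bp_to_ce(age, present_ce=1950)
year_if_2000 = bp_to_ce(age, present_ce=2000)
offset = year_if_2000 - year_if_1950
```

For a record resolved at decadal scale, an offset of this size shifts every sample by several time steps, which is why the reference year needs to travel with the data.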
Given this acute need for standardization, the National Science Foundation EarthCube-funded LinkedEarth project nucleated a discussion on data reporting practices. EarthCube (2015) defines a standard as "a public specification documenting some practice or technology that is adopted and used by a community." The emphasis on community and practice underlines the cooperative nature of standard development. If only one person uses a technical specification, it is not a standard. If it is voted on but not applied in practice, it is of little practical use.
Standardization requires three distinct elements: (1) a standard format for the data, (2) a standard terminology for metadata, and (3) standard guidelines for reporting data (i.e., reporting standards). We note that some prior knowledge of standardization practices (e.g., which data to include) can be useful in the planning stages of data collection. As an analogy, consider the organization of library cards into an old-fashioned file cabinet. For this system to function, one needs (1) a set of compartments and drawers to house the information, (2) labels to identify and classify the contents of the drawers, and (3) a disciplined adherence to the classification system. This entails including essential information required for application and reuse of the cards and the information they contain. In other words, every user follows similar guidelines to generate, use, and file the cards; otherwise, the classification falls apart and the cards may as well be stored in a random pile.

Paleoceanography and Paleoclimatology
This article focuses on the last requirement, namely, the creation of standards for reporting paleodata and metadata. It builds upon recent efforts to address the first two points. On the first point, the Linked PaleoData format (LiPD; McKay & Emile-Geay, 2016) and derived vocabulary agreements to describe paleoclimate data (the LinkedEarth Ontology; Emile-Geay et al., 2019) provide a data container for paleoclimate data (section 2), which is currently used in a range of data analysis software (Bradley et al., 2018; McKay et al., 2018). On the second point, the National Oceanic and Atmospheric Administration (NOAA) World Data Service for Paleoclimatology (WDS-Paleo) has created a set of standard names to document paleoclimate variables, the Paleoenvironmental Standard Terms (PaST) Thesaurus (National Oceanographic and Atmospheric Administration, 2018). This article's aim is twofold: first, to provide a snapshot of the first version of the Paleoclimate Community reporTing Standard (PaCTS), as of 2019, with the understanding that this standard will eventually evolve, and second, to document the process of community elicitation of such guidelines, so as to provide maximum transparency on why and how these decisions were made. We start from the premise that sampling decisions predate these reporting decisions, so the standard aims to guide an investigator's decisions as to how they should report existing measurements, for example, at the time of publication.
The remaining sections are organized as follows: Section 2 summarizes the relevant prior standardization efforts, which serve as the foundation for PaCTS v1.0. Section 3 describes the standardization process, including eliciting community feedback. Section 4 presents recommendations from a group of 135 international researchers actively engaged in paleoclimate research. Section 5 illustrates the application of PaCTS v1.0 to an existing paleoclimate record. Finally, Section 6 concludes with a plan to disseminate the first version of PaCTS within the paleoclimate community and provides a roadmap for further standards development and their future applications.

The LinkedEarth Framework: An Online Approach to Standard Development
The LinkedEarth project established an online platform (Gil et al., 2017) that enables the curation of metadata for publicly accessible data sets by experts and fosters the development of terminology agreements and standards for paleoclimate metadata. Our approach builds on two synergistic elements: (1) the LinkedEarth Ontology (Emile-Geay et al., 2019), which provides an unambiguous structure and terminology to describe the metadata of a paleoclimate data set, and (2) the LinkedEarth Platform (Gil et al., 2017), which enables the collaborative authoring of highly structured metadata about paleoclimate data sets using the terms in the LinkedEarth Ontology.
The LinkedEarth Ontology represents vocabulary agreements to describe paleoclimate metadata. In a domain like paleoclimatology, we can usually distinguish the different kinds of objects that we want to describe (e.g., a sample, a measurement, and a data set) and the relationships used to describe those objects (e.g., a measurement is taken from a sample, so the two are related; the measurement is in a data set, so those two are related as well). An ontology is a formal way to represent objects and their properties; it captures consensual knowledge that helps a community describe major concepts in the domain using common terms. Specifically, an ontology formalism allows the representation of object types as classes and relationships as properties of those classes. Classes can have subclasses, and a given class can be a subclass of several classes. For example, the class proxy archive can have coral as a subclass, and the class repository item can have sample as a subclass. A feature of ontologies is that they allow the creation of machine-readable metadata, that is, data descriptions that can be queried programmatically by machines to retrieve data sets of interest. Thanks to the ontology, machines can navigate through metadata and discover data that otherwise would be hidden to them. LinkedEarth relies on semantic web technologies to represent ontologies, specifically the Web Ontology Language (OWL) standard of the World Wide Web Consortium (W3C; W3C OWL Working Group, 2012). More details are provided in Emile-Geay et al. (2019).
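The class and subclass mechanism can be illustrated with a toy example. The sketch below is not OWL and does not reproduce the LinkedEarth Ontology; it is a plain-Python stand-in, assuming a class hierarchy stored as a dictionary. Only the two subclass pairs (coral under proxy archive, sample under repository item) come from the text; the data set names and the helper functions `is_a` and `find_datasets` are hypothetical.

```python
# Hypothetical miniature of an ontology: each class lists its direct
# superclasses. A class may list several, mirroring multiple inheritance.
SUPERCLASSES = {
    "proxy archive": set(),
    "repository item": set(),
    "coral": {"proxy archive"},
    "sample": {"repository item"},
}


def is_a(cls, ancestor, superclasses=SUPERCLASSES):
    """Return True if `cls` is `ancestor` or one of its descendants."""
    if cls == ancestor:
        return True
    return any(is_a(parent, ancestor, superclasses)
               for parent in superclasses.get(cls, set()))


# Hypothetical metadata: data set name -> class of the thing it describes.
DATASETS = {"dataset_A": "coral", "dataset_B": "sample"}


def find_datasets(ancestor, datasets=DATASETS):
    """Return the names of data sets whose class falls under `ancestor`."""
    return sorted(name for name, cls in datasets.items() if is_a(cls, ancestor))
```

A query such as `find_datasets("proxy archive")` retrieves the coral data set even though it was never labeled "proxy archive" directly, which is the kind of discovery that machine-readable metadata is meant to enable.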
The LinkedEarth Platform allows users to (1) describe paleoclimate data sets using the terms available in the LinkedEarth Ontology and (2) propose new terms if they cannot find an appropriate one in the ontology. The LinkedEarth Platform is a sociotechnical system, and as such, it provides technology infrastructure coupled with social processes that support terminology and standards convergence. When users describe a paleoclimate data set, the terms in the existing LinkedEarth Ontology are offered to them as editable forms and completion commands, which promotes adoption. If a user does not find a term that is appropriate for their data set, they can create a new term on the fly. Such new terms can then be discussed on the platform, building community consensus on their definitions and the essential status of their inclusion to a data set. The social extensions of the LinkedEarth Platform allow working groups to organize activities by users with similar expertise to build a common vocabulary. Each working group was assigned a special page on the LinkedEarth Platform to nucleate their activities, including discussions and polls for rapid community feedback. The terms discussed within these working groups form the crowdsourced part of the LinkedEarth Ontology. The social editorial processes eventually will lead to a new version of the LinkedEarth Ontology. The LinkedEarth Platform and its associated social processes are described in detail in Gil et al. (2017).
The LinkedEarth Platform is implemented as an extension of the Semantic MediaWiki framework (Krötzsch & Vrandečić, 2011). Semantic wikis augment traditional wikis with the ability to structure information through (1) semantic annotations, which enable the assignment of a class (or category) to an object in a wiki page and properties (or qualifiers) that are useful to describe that object and (2) automated reasoning capabilities that exploit those annotations to organize the wiki's knowledge (Gil, 2013). For example, if the page for Los Angeles is annotated as being in the class city and having a property location = California, and the page for California has a property location = U.S., then the semantic wiki can infer that Los Angeles is in the U.S. even though that was not explicitly stated. Semantic wiki pages can also include queries that are executed when the page is visited, so dynamic content is created in a way that is up to date with the latest additions. Semantic wikis also have facilities to track edits together with the date and contributor, so that the provenance of edits can be examined and undesirable ones can be easily undone. The content of semantic wikis becomes part of the open Semantic Web, as it can be published as a set of linked Web objects in the Web of Data, following Linked Data Principles (Heath & Bizer, 2011). With this approach, the metadata for all paleoclimate data sets defined in the wiki becomes openly available on the Web, machine readable, and can be queried programmatically by any application. More details are provided in Gil et al. (2017).
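The Los Angeles example amounts to following a chain of location properties. The sketch below illustrates that inference in plain Python; it is not how Semantic MediaWiki implements reasoning, and the function name `located_in` and the dictionary layout are assumptions made for the illustration.

```python
def located_in(place, region, location):
    """Return True if `region` can be reached from `place` by following
    `location` links. `location` maps each page to the value of its
    location property. Visited pages are tracked to stop on cycles.
    """
    seen = set()
    current = place
    while current in location and current not in seen:
        seen.add(current)
        current = location[current]
        if current == region:
            return True
    return False


# Annotations as stated in the text: each page carries one location property.
location = {"Los Angeles": "California", "California": "U.S."}

inferred = located_in("Los Angeles", "U.S.", location)
```

Here `inferred` is true although no page states that Los Angeles is in the U.S.; the fact follows from the two annotations combined.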

Previous and Concurrent Efforts Toward a Data Standard
The discussion below is nonexhaustive and only focuses on the relevant efforts that have sparked the discussion about PaCTS.

Origins of a Standard Format for Paleoclimate Data
Climate modeling has greatly benefitted from the netCDF data format (Unidata, 2019), designed to support the creation, access, and sharing of array-oriented data, including climate model output. Despite the importance of paleoclimate data availability for model evaluation (Masson-Delmotte et al., 2013), until recently, there was no universal container to describe, store, and share these data sets. Emile-Geay and Eshleman (2013) first introduced the idea of a flexible container, where metadata would be stored semantically with the numeric data in tabular form. This concept was the basis for the LiPD format (McKay & Emile-Geay, 2016).
LiPD is a universally readable data container that organizes paleoclimate data and metadata in a uniform way. It is based on JSON-LD (JavaScript Object Notation for Linked Data), a JSON-based format compliant with the Linked Data paradigm. JSON is a lightweight data interchange format that is easy for humans and machines alike to read and write. LiPD has six distinct components: root metadata (e.g., data set name, investigator, and version); geographic metadata (e.g., coordinates and descriptive location such as a country or city); publication metadata (e.g., authors, title, journal, and digital object identifier [DOI]); funding metadata (e.g., funding agency and grant number); PaleoData, which includes all the measured (e.g., Mg/Ca) and inferred (e.g., sea surface temperature) paleoenvironmental data; and ChronData, which mirrors PaleoData for information pertaining to age. These components provide the rigidity necessary to write robust code around the format while remaining extensible enough to capture (meta)data as rich as the users want to provide for them. Utilities in Matlab, Python, and R allow users to interact with the files (specifically, to read, write, query, or filter data sets matching specified conditions).
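To give a feel for the six-component layout, the sketch below builds a small JSON document with one entry per component. It is an illustration only: the key names, nesting, and values are hypothetical and are not taken from the LiPD specification, and the snippet uses Python's standard `json` module rather than the LiPD utilities.

```python
import json

# One top-level entry per component named in the text.
# All keys and values below are placeholders for illustration.
COMPONENTS = ("root", "geographic", "publication", "funding",
              "paleoData", "chronData")

record = {
    "root": {"dataSetName": "Example.Site.2019", "investigator": "A. Researcher",
             "version": "1.0"},
    "geographic": {"latitude": 0.0, "longitude": 160.0, "country": "Exampleland"},
    "publication": {"authors": ["A. Researcher"], "title": "An example record",
                    "journal": "Example Journal", "doi": "10.0000/example"},
    "funding": {"agency": "Example Agency", "grant": "EX-0000"},
    "paleoData": {"measured": {"Mg/Ca": [4.1, 4.3]},
                  "inferred": {"SST": [27.9, 28.4]}},
    "chronData": {"depth": [10.0, 20.0], "age": [500, 1000]},
}


def missing_components(rec):
    """List the components from COMPONENTS that are absent from `rec`."""
    return [name for name in COMPONENTS if name not in rec]


# Serialize to JSON text and read it back, as any JSON-aware tool could.
text = json.dumps(record, indent=2)
restored = json.loads(text)
```

Because the top-level layout is fixed, a check like `missing_components` can be written once and applied to any file, while the content inside each component remains free to grow.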
In many ways, LiPD is intended to be the netCDF of paleoclimate observational data. However, although both LiPD and the LinkedEarth Ontology provide a standard way to describe a paleoclimate data set, they say little about what information should be stored to ensure reuse. The endorsement of netCDF by a broad community further benefited from the adoption of the Climate and Forecast (CF) conventions (Gregory, 2003). The CF conventions define metadata describing what the data in each variable represent, and the spatial and temporal properties of the data. In other words, they define both a set of common terms (a standard vocabulary) and a reporting standard. Efforts toward standardization of common terms have been undertaken by WDS-Paleo in the form of the PaST thesaurus (National Oceanographic and Atmospheric Administration, 2018), which provides the preferred option for a standardized name and definition. PaCTS details a crowdsourced approach for deciding what information should be included when reporting paleoclimate data, akin to a CF convention for paleoclimate data sets.

Archive-Focused Initiatives
Attempts at paleoclimate data standardization have a long history. For data sets derived from wood archives, LinkedEarth relied on the tree-ring data standard, TRiDaS (Jansma et al., 2010), which complies with established data standards such as Dublin Core (DCMI Usage Board, 2008). The TRiDaS project aimed at defining the properties that are used in the dendro community, giving them a consistent name (i.e., a controlled vocabulary), and identifying whether each quantity should be mandatory and repeatable (i.e., best practices). These efforts helped inform the PaCTS recommendations for wood archives, though it should be noted that tree-ring science is far broader than dendroclimatology, involving applications to paleofire, landscape evolution, paleoecology, art history, and archeology. Because PaCTS is focused on paleoclimate, we reused the relevant subset of the TRiDaS standard.
A discussion regarding paleoceanographic data standards was started during the Paleoclimate Model Intercomparison Project (PMIP) Ocean Workshop 2013-Understanding Changes Since the Last Glacial Maximum (hereafter, PMIP LGM) in Corvallis, Oregon, in December 2013. Given the expertise of the working group members, the discussion focused on marine sedimentary archives and was summarized into a document, which is available on the LinkedEarth Platform (Kucera et al., 2013). Their recommendations served as the foundation for a preliminary reporting standard for records based on marine sedimentary archives. Although the group identified recommended properties to be included with marine data sets, they did not propose a complete vocabulary nor a subset of required properties for acceptance in a database.
The Marine Annually Resolved Proxy Archives (MARPA) working group, nucleated under the EarthCube umbrella, is one of the first grassroots efforts within the paleoclimate community to enhance and facilitate the archiving and sharing of paleoclimate data as they pertain to annually resolved archives (e.g., corals, mollusks, coralline algae, and sclerosponges; Dassié et al., 2017). Their efforts included a registry of physical samples and their associated geochemical data and metadata, which are our primary focus here. The MARPA group summarized their recommendations in a document that was circulated among the community and constitutes the backbone of the recommendations presented here. Most of these recommendations were also applicable to other archives, rather than MARPA-specific, underscoring that despite their diversity, paleoclimate data sets retain common core properties that facilitate multiproxy syntheses and comparisons.
The Speleothem Isotopes Synthesis and Analysis (SISAL) group was formed under the international Past Global Changes (PAGES) project and aimed at bringing together speleothem scientists, process modelers, statisticians, and climate modelers to develop a global synthesis of speleothem isotopes that can be used to further our understanding of past climate variability and in model evaluation. As part of this initiative, a template was created, outlining the necessary metadata for speleothem-based records (Atsawawaranunt et al., 2018). This template (Comas-Bru & Harrison, 2019) forms the backbone of properties applicable to speleothem-based records presented here.

Workshop on Paleoclimate Data Standards
The workshop on paleoclimate data standards held in Boulder, USA in June 2016 (Emile-Geay & McKay, 2016, Figure 1) served as a focal point to initiate a broader process of community engagement and feedback solicitation, with the goal of generating a community-vetted standard for reporting paleoclimate data. Workshop participants identified the necessity to distinguish a set of essential, recommended, and desired properties for each data set. By default, any and all information was considered desired, though we shall see exceptions to this principle. A subset of the archived information should be recommended to ensure optimal reuse of the data set. Yet a smaller subset of this information is defined as essential, meaning that the data set cannot be reused reliably or at all without these critical pieces of information.
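The three tiers lend themselves to a simple completeness check. The sketch below assumes that each property has been assigned to one tier and reports which properties a data set lacks at each level. The property names and their tier assignments are hypothetical placeholders, not the PaCTS recommendations, and `missing_by_level` is an illustrative helper.

```python
LEVELS = ("essential", "recommended", "desired")

# Hypothetical tier assignments, for illustration only.
REQUIREMENTS = {
    "latitude": "essential",
    "longitude": "essential",
    "archiveType": "essential",
    "elevation": "recommended",
    "collectionDate": "desired",
}


def missing_by_level(metadata, requirements=REQUIREMENTS):
    """Group the properties absent from `metadata` by requirement level.

    A property counts as absent when its key is missing or its value is
    None or an empty string; a value of 0 is a valid entry.
    """
    missing = {level: [] for level in LEVELS}
    for prop, level in requirements.items():
        if metadata.get(prop) in (None, ""):
            missing[level].append(prop)
    return missing


def is_reusable(metadata, requirements=REQUIREMENTS):
    """A data set passes when no essential property is missing."""
    return not missing_by_level(metadata, requirements)["essential"]


example = {"latitude": 0.0, "longitude": 160.0, "archiveType": "coral"}
report = missing_by_level(example)
```

In this example the data set passes the essential check while still being flagged for one recommended and one desired property, which mirrors the nesting described above: essential is the smallest set, desired the largest.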

A consensus emerged that these distinctions are archive-specific; for instance, what is needed to meaningfully reuse a speleothem record could be quite different from what is needed to meaningfully reuse an ice core record. It was therefore decided that experts on particular paleoclimate archives organized into working groups (WGs) would be best positioned to elaborate and discuss the components of a data standard for their specific subfield of paleoclimatology. Consequently, seven WGs were created on the LinkedEarth Platform centered around the main archives used in paleoclimate studies: historical documents, ice cores, lake sediments, marine sediments, MARPA, speleothems, and tree rings. A call for additional WGs was made in the fall of 2016. Observations common to two or more archives (e.g., alkenones) were discussed in one WG with a link to the discussion in other WGs. It is also critical to ensure interoperability among standards to enable investigations using multiple observations on the same archive and across archives; to that end, three longitudinal WGs were created to deal with information common to all archives (such as publication, geographical coordinates, and funding information), to report uncertainties in the record, and to report how chronologies were established.
The workshop participants also identified the need to have a separate set of requirements for newly generated data sets and legacy data sets, for which less metadata would likely be available. In PaCTS v1.0, a legacy data set is defined as a data set that is not being archived by the author(s) of the original study.

Working Groups
Rules of engagement on the LinkedEarth Platform were published in the fall of 2016 along with the establishment of seven WGs (ice cores, lake sediments, marine sediments, MARPA, speleothems, trees, and uncertainties, Figure 1). Three WGs (chronologies, cross-archive, and historical documents) followed in the spring of 2017 as additional archives and information common to all archives were identified. Each WG leader was tasked with organizing their subcommunity directly on the platform, through videoconferences, meetings at conferences, and/or other working groups (e.g., MARPA group and the PAGES SISAL group). The WG leaders were also tasked with regularly updating the discussion directly on the LinkedEarth platform or providing a document for integration on the platform. One difficulty in defining desired, essential, and recommended properties was related to the expected use of the data: Depending on what one wants to do with the data, one needs different metadata. By far, the most important and metadata-hungry task is to perform queries to find data sets pertinent to a scientific question.
As an example of finding data sets pertinent to a scientific question, consider a study conducted by a paleoceanographer who wants to characterize millennial-scale sea surface temperature (SST) variability during the Holocene epoch (Khider et al., 2016). In the current research ecosystem, a typical workflow would consist of querying several databases to find suitable records, extracting the data, consulting the original publication(s) for additional metadata (e.g., author's definition of present), reformatting the data into a coherent format for analysis, applying spectral analysis to examine the frequency content of the records, performing some statistical analysis of the results, and visualizing them. In an ideal world, the query, preferably from a single database, should (1) find records that span the Holocene, (2) find the subset of those that primarily reflect SST, and (3) find the subset of that subset with a specified resolution (e.g., finer than 200 years) to have at least five data points per 1,000-year cycle (a permissive assumption for this sort of work). Simple though it may seem, this query requires the following (meta)data: (1) a measure of age (time) and minimum and maximum values of the time series; (2) an estimate of SST, as an inferred variable, and/or Mg/Ca, $U^{K'}_{37}$, $\mathrm{TEX}_{86}$, or microfossil assemblages as measured variables from which SST can be inferred; and (3) temporal resolution, calculated from the data.
Other types of basic queries include: searching for a particular publication, using either the DOI, title, journal, or authors; and searching by the type of archives. Defining the search parameters for these complex queries on the LinkedEarth platform (Khider & Garijo, 2018) sparked the discussion for the needed properties.
A standard helps not only with the menial task of searching for records in a database. Such a standard can also assist with doing the science per se, by ensuring that the required information is present in the data set. For instance, making a simple map of all the records in a database by archive types (Figure 1a of PAGES2k Consortium, 2017) requires each data set to report latitude, longitude, and the archive type. More complex data analysis requires more information: to investigate the effect of age uncertainties (e.g., with the Bchron [Haslett & Parnell, 2008] or BACON [Blaauw & Christen, 2011] packages) or to establish new depth-age models (Blois et al., 2011; Giesecke et al., 2014), one needs the raw radiocarbon measurements, their measurement uncertainties, and associated depth in the archive.

Community Surveys
To decide which of the properties identified within the various WGs should be considered essential, recommended, or desired, we first gathered input via the LinkedEarth platform (Figure 2a). As of 1 August 2018, it was home to 207 polls, with 796 votes given by 32 different users. On average, each question received three votes, with some questions receiving no votes and others as many as 27. Note that some questions were duplicated across different WGs and the final count presented here takes into account all votes received on the platform. The low number of votes can be partially attributed to the fact that voting was only possible after authentication onto the platform, creating a barrier to widespread participation. To broaden community involvement, the polls were then threaded on Twitter from the LinkedEarth account with voting allowed over a 7-day period (Figure 2b). The Twitter polls increased engagement (by a factor of 3 on average) and also led to discussions that were then moved to the LinkedEarth platform for traceability of decisions.
Finally, by request from the community, the questions were summarized in a survey distributed to the paleoclimate community through the ISOGEOCHEM, CLIMLIST, paleoclimate, and cryolist list-servs, as well as the PAGES e-news, website, and social media. The survey contained 603 questions across all working groups for which respondents were asked to determine whether each property is deemed essential, recommended, or desired for new and legacy data sets, in addition to open-ended questions and prompts for community feedback. The survey was more comprehensive than the polls on the LinkedEarth platform or Twitter since all questions were framed to allow for a response for legacy and new data sets. On the other hand, the LinkedEarth platform also contains duplicate questions across various WGs (e.g., should depth be reported as essential, recommended, or desired), polls aiming to define the scope of the data sets housed on LinkedEarth (e.g., should the LinkedEarth platform only contain data sets that appear in peer-reviewed publications?), and the operating definition of legacy versus new data sets that was then used in the survey. Ninety-five scientists participated in the survey. Each question on the survey received on average 54 answers.
Paleoclimatology is a multidisciplinary effort where researchers typically have expertise in one or more proxy systems (e.g., different observations on the same archive, similar observations on different archives, or a mix of different sensors, observations, and archives). Scientists are often led to compare their own data sets to others obtained from proxy systems with which they are less familiar. Consequently, the metadata they need tend to differ based on their level of expertise (it is easier to fill in the blanks in one's own area of expertise). For instance, an ice core expert interested in comparing their deuterium record with a nearby record of SST would most likely only require the age at each horizon and associated SST. On the other hand, an expert on foraminiferal Mg/Ca-based SST reconstruction may also need information about the cleaning methodology or the number of individual foraminifera in the sample. To ensure that both needs were represented, respondents were encouraged to complete the entire survey, rather than focus exclusively on their own areas of expertise.

10.1029/2019PA003632
Paleoceanography and Paleoclimatology survey and LinkedEarth platform. Since voting on Twitter is anonymous, it is impossible to identify these voters or establish whether they voted on other platforms. We are aware that some researchers may have answered the same question several times on the various platforms. Since the number of survey answers dwarfs the number of votes on Twitter and the LinkedEarth platform (Supplementary Information) and Twitter does not track the user names associated with the votes, we did not attempt to correct for multiple responses. Therefore, 135 contributors represent our best estimate for the number of total participants.
Most of the polls on Twitter and the LinkedEarth platform referenced legacy versus new data sets. However, in the cases where the data set status was not specified, we assumed that the question referred to a new data set only. Furthermore, if a question was repeated across various WGs (e.g., latitude and longitude), the votes were tallied and included in the total count for the cross-archive metadata reporting (see section 4.1). Responses on the survey, Twitter, and the LinkedEarth platform were given equal weight.
For each of the properties, we identified respondents' recommendation for both new and legacy data sets as the majority vote. We used mind maps to visually organize the hierarchical information, keeping the relationship intact (Figure 5), and mosaic plots to display the frequencies of the essential, recommended, and desired categories for each working group (Figure 6). Overall, the community identified 208 properties (69% of polled properties) as essential, 82 (27%) as recommended, and 12 (4%) as desired for new data sets. For legacy data sets, fewer properties were deemed essential: 131 (44% of polled properties), whereas 136 properties (45%) were considered recommended and 34 (11%) desired. This difference is not unexpected and highlights the fact that legacy data sets, although not as metadata-rich as new data sets, are still valuable to the community (Figure 6).
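The tallying rule described above (responses from all platforms pooled with equal weight, then a majority vote per property) can be sketched as follows. This is a minimal illustration rather than the script used for the study: the function name `majority_recommendation`, the vote records, and the tie-breaking rule (ties go to the stricter category) are assumptions made for the example.

```python
from collections import Counter

# Categories ordered from strictest to least strict. The order is used only to
# break ties, which is an assumption: the text does not say how ties were resolved.
CATEGORIES = ("essential", "recommended", "desired")


def majority_recommendation(votes):
    """Return the category with the most votes, pooling all platforms equally.

    `votes` is an iterable of (platform, category) pairs; the platform label
    is ignored because every response carries the same weight.
    """
    counts = Counter(category for _platform, category in votes)
    if not counts:
        return None  # the property received no votes
    # max() keeps the first maximal element, so ties go to the stricter category.
    return max(CATEGORIES, key=lambda c: counts[c])


# Hypothetical votes for one property, for illustration only.
example_votes = [
    ("survey", "essential"),
    ("survey", "recommended"),
    ("twitter", "essential"),
    ("linkedearth", "desired"),
]
result = majority_recommendation(example_votes)
print(result)
```

Running the same function once with the votes cast for new data sets and once with those cast for legacy data sets would give the two recommendations reported for each property.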

PaCTS v1.0: Paleoclimate Community reporTing Standard
This section is based on the recommendations made in the various WGs, which were then subject to polling through the LinkedEarth platform, Twitter, and the survey. We are aware that these recommendations may be incomplete for some archives, a point discussed in section 6. A list of these properties, their definitions, and the associated recommendations is available on the LinkedEarth platform. Despite their diversity, paleoclimate records (and compilations thereof) share common metadata properties such as contributors, geographical information (e.g., coordinates and site name), publication information (e.g., authors, title, journal, and DOI), funding information, and general information about the paleoenvironmental and chronology data (e.g., should the raw data be included?). In total, the community identified 54 properties applicable to all archives (Figures 5 and 7).
For new data sets, 36 of these properties were identified as essential, 9 as recommended and 9 as desired. It is not surprising that 67% of the properties were voted as essential since these properties are critical for the data reuse with no expert knowledge about the proxy systems or paleoclimate. Likewise, 24 of these properties (44%) were identified as essential for legacy data sets. For a data set to be reused, information regarding the location, publication, and interpreted chronology and paleoenvironmental variables is critical. Hence, several researchers commented that new data sets should contain both the raw and interpreted data. The bar for legacy data sets should be lower, recognizing that much of the desired data may no longer be available and that interpreted data are still useful for many applications.
In addition to the properties identified, a data set DOI and a data set license would also promote data reuse. LinkedEarth is not set up to mint DOIs directly, but they can be obtained through other platforms such as PANGAEA, Dryad, or FigShare. The registry of research data repositories, re3data, gives information on whether a repository provides persistent identifiers. The Creative Commons (CC-BY) license is recommended for paleoclimate data since under this license, other researchers are free to share and adapt materials while giving appropriate credit to the original contributor of the resource.

Ice Cores
The ice core WG identified 16 properties specific to glacier ice, including information pertaining to the archive, such as melt in transport, storage conditions, the observations available for the archive, and the chronology. For new data sets, eight properties were deemed essential and eight recommended. The number of essential properties dropped to four for legacy data sets with three properties deemed recommended (Figures 5, 6, and 8).
As with historical documents, most survey respondents were not experts on records generated on ice cores and therefore only responded for properties they were likely to use.

Lake Sediments
The lake sediments WG reported 54 properties specific to this archive, which were grouped by proxy sensor/observation types: particle size, mineralogy, imagery data, accumulation rate, and compound-specific isotopes. Whereas some properties were common across the various types of observations (i.e., units, interpretation, and pretreatment methods), many were observation-specific (e.g., source of compound for compound-specific isotopes), highlighting the necessity of detailed sets of guidelines down to the proxy observation level to meet researchers' needs.
[...] Figure 5. Parentheses indicate recommendations for legacy data sets when different from new data sets. Available online at https://coggle.it/diagram/W4W9podcxp86PPvf/t/cross-archive-metadata.

For new data sets, 39 properties were identified as essential and 15 as recommended. For legacy data sets, 25 were seen as essential, 28 as recommended, and 1 as desired (Figures 5, 6, and 9). In addition to these 54 properties, the WG started a discussion on how to best report the concept of depth in the archive. Although several WGs identified depth (i.e., position in the archive sample) as an essential property, especially for new data sets, none had defined how this depth should be reported. The majority of the respondents indicated a preference to report top and bottom depth for both new and legacy data sets although several respondents proposed to lower the bar for legacy data sets to whatever is available for these records.
Respondents also noted that pictures of the core after the sampling process would be useful. Whether these pictures should be available with the data or stored in the database of the physical sample repository is a decision best left to individual researchers, based on their constraints and mandates by funding entities.

Marine Sediments
The marine sediments WG identified 48 properties specific to this type of archive. These properties were divided into six groups, according to the type of observation: general sampling, bulk sediment geochemistry, foraminifera geochemistry, alkenones, the glycerol dialkyl glycerol tetraether (GDGT) proxies, and micropaleontology. The foraminifera geochemistry category was further subdivided into stable isotopes, boron isotopes, and trace elements. Although some of the requirements were common to all observations, this WG included several observation-specific properties such as the cleaning methodology for foraminiferal trace elements or raw peak areas for GDGTs.
For new data sets, 36 properties were identified as essential and 12 as recommended. The number of essential properties drops to 24 for legacy data sets, with the remainder considered recommended (Figures 5, 6, and 10).

Coral, Mollusks, and Other Annually Resolved Marine Records
The properties for these archives were taken from the spreadsheet the MARPA group had circulated online for feedback. Most of these properties were applicable to all archives reporting geochemical properties and were therefore incorporated into the cross-archive WG and questions. Two archive-specific properties were also identified: interpolated chronologies (i.e., distance from core top translated to time, usually a calendar day for each sample then interpolated to even monthly intervals) and X-ray pictures (and associated drilling path). For both new and legacy data sets, the raw (distance from core top), interpolated chronologies, and X-ray pictures were considered essential and recommended, respectively (Figures 5 and 6). The reporting of growth increments in mollusks and corals is still an ongoing discussion within MARPA.
Figure 8. Mind map of the various properties identified by the ice core archives WG and associated vote. Color is the same as in Figure 5. Parentheses indicate recommendations for legacy data sets when different from new data sets. Available online at https://coggle.it/diagram/W4XNNeGhIngfjHzB/t/historical-documents.

Speleothems
When constructing their database (Atsawawaranunt et al., 2018), the SISAL WG identified 23 properties specific to speleothem records. The SISAL database only focuses on stable isotopes in speleothems, and these properties only apply to this proxy system. These properties can be further subdivided into four categories describing the cave and modern cave conditions, the physical sample, and information about the sample data. For new data sets, 11 properties were considered essential and 12 recommended. For legacy data sets, only 2 properties were considered essential and 21 were marked as recommended (Figures 5, 6, and 11).
Although evidence for equilibrium (e.g., the Hendy test; Hendy, 1971, or monitoring data that supports equilibrium precipitation of calcite) was narrowly voted as essential for new data sets and recommended for legacy data sets, three respondents (two on Twitter and one on the survey) expressed concerns about the value of this property as it rarely shows up in monitoring data and the Hendy test has been abused by the paleoclimate community. This illustrates the need for an evolving standard, one that fits the needs of the community and changes as our scientific understanding about proxy systems increases.

Tree-Based Records
The tree ring community has a long history of developing and adopting data standards; however, the metadata capacity or requirements in earlier data formats (e.g., Tucson, Heidelberg, Sheffield, CATRAS, and Belfast among many others) were limited by the technology of the decade in which they were created (Brewer et al., 2011). The 35 properties in the survey were taken from TRiDaS (Jansma et al., 2010) and from the proposed tree-ring isotope databank (Csank, 2009). TRiDaS was chosen as a starting point as it was designed as a standard to represent dendrochronological data across its many subdisciplines, including dendroclimatology. TRiDaS therefore includes many (optional) properties as essential or recommended that are not applicable to data sets collected for paleoclimate reconstructions.
Figure 9. Mind map of the various properties identified by the lake sediments archives WG and associated vote. Color is the same as in Figure 5. Parentheses indicate recommendations for legacy data sets when different from new data sets. Available online at https://coggle.it/diagram/W4h9m-GhIjjbm3yX/t/lake-sediments.
Figure 10. Mind map of the various properties identified by the marine sediments archives WG and associated vote. Color is the same as in Figure 5. Parentheses indicate recommendations for legacy data sets when different from new data sets. Available online at https://coggle.it/diagram/W4iIkodcxlDKTK6v/t/marinesediments.
Figure 11. Mind map of the various properties identified by the speleothem archives WG and associated vote. Color is the same as in Figure 5. Parentheses indicate recommendations for legacy data sets when different from new data sets. Available online at https://coggle.it/diagram/W4gwj-GhIl4VmfYP/t/speleothem.
For new data sets, 26 properties were considered essential, 7 recommended, and 2 desired. For legacy data sets, 19 properties were voted on as essential, 9 as recommended, and 7 as desired (Figures 5, 6, and 12). Several researchers were confused about the terms used in TRiDaS, suggesting that the standard may be too broad for most paleoclimate applications and should be further refined if it is to be widely adopted. This confusion may arise because TRiDaS was initiated by the cultural dendrochronology community (e.g., dendroarcheology, art, and building history) in response to the more pressing need for standardized metadata in these disciplines. Despite attempts to engage all subdisciplines of dendrochronology in the development of TRiDaS, the cultural aspects of the standard were more fully implemented due to the greater participation of users from these areas of research.
Nevertheless, a subset of the fields defined in TRiDaS was used as a starting point for discussion for PaCTS v1.0. Many fields within TRiDaS are already addressed in the cross-archive metadata and were disregarded, leaving only dendro-specific fields. These were then supplemented by fields for tree-ring isotope data taken from the tree-ring isotope databank proposed by Csank (2009). Regretfully, discussion of the suitability of these fields among the dendroclimatology community has been limited and the list of initial fields was not subsequently refined. The public voting process has resulted in a number of fields being marked as essential that are not routinely (if ever) collected for dendroclimatological research. Furthermore, some of the quantities that are being proposed are difficult to measure or know, raising the issue of whether these properties are even desired. Some of the properties are a characteristic of the data themselves (ring count) and not metadata per se. These may be useful as convenience fields when querying large data collections (rather than having to extract and calculate).
The confusion in the voting process could reflect confusion over whether PaCTS v1.0 is to be a data standard applicable to all dendrochronological data sets or exclusively to those collected for use in climate reconstructions, for which a smaller number of essential fields would be required. It could also reflect sampling bias in the voting process related to the composition of the WG.
[...] Figure 5. Parentheses indicate recommendations for legacy data sets when different from new data sets. Available online at https://coggle.it/diagram/W4huaYdcxhdzTB9z/t/trees.
While the work described here is clearly an important step towards incorporating dendroclimatological data into a universally applicable paleoclimate data standard, there remains a great deal of work to be done. This work needs to begin with discussions that engage a much broader cross section of the dendroclimatological community and refined criteria in subsequent surveys.

Documentary Archives
Historical documents differ quite significantly from the other archive types presented in PaCTS v1.0. Documentary data are extracted from written sources (books, chronicles, newspaper, etc.), and each of these sources in the data set needs a reference to the publication metadata (in addition to the scientific publication of the data in a journal). The raw data most comparable to measurements on other archives are quotes; that is, text strings in any language cited from the source from which location, time, and event are extracted. Every single data point in the set can thereby have a different location and a variety of parameters describing the event (Glaser, 1996). The time step can be, but is not necessarily, periodic. The quote might contain information regarding the temperature in a city, precipitation conditions, and the resulting water level in a river, as well as statements concerning harvest amount and quality of a certain crop. The resulting data type can be boolean (for presence/absence), integer (for indices), real numbers with units for measurements, or enumerations (Riemann et al., 2016).
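To make the shape of a documentary data point concrete, the sketch below models one observation extracted from a quote, with its own location, time, and a value that may be boolean, integer, real, or an enumeration label. It is a hypothetical illustration only: the class name, the field names, and the example record are invented here and do not come from PaCTS or from any existing database schema.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Value types named in the text: boolean (presence/absence), integer (index),
# real number with units (measurement), or an enumeration label (here a str).
Value = Union[bool, int, float, str]


@dataclass
class DocumentaryObservation:
    """One data point extracted from a quote (all field names are hypothetical)."""
    quote: str                    # text string cited from the source
    language: str                 # language of the quote
    source_reference: str         # pointer to the source's publication metadata
    location: str                 # each data point may have its own location
    year: int                     # time of the event (not necessarily periodic)
    parameter: str                # what the quote describes
    value: Value
    units: Optional[str] = None   # only meaningful for real-valued measurements

    def value_kind(self) -> str:
        """Classify the value into the data types listed in the text."""
        # bool must be tested before int because bool is a subclass of int.
        if isinstance(self.value, bool):
            return "boolean"
        if isinstance(self.value, int):
            return "integer"
        if isinstance(self.value, float):
            return "real"
        return "enumeration"


# Invented example record, for illustration only.
obs = DocumentaryObservation(
    quote="(quote text in the original language)",
    language="de",
    source_reference="(reference to the source document)",
    location="(city name)",
    year=1540,
    parameter="temperature index",
    value=3,
)
print(obs.value_kind())
```

Because a single quote can describe several parameters (temperature, water level, harvest quality), one quote would typically map to several such records sharing the same quote and source reference.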
The documentary archives WG identified nine properties, which concerned the source material, including original scans of the documents, quote ID, language, and reference to the source material (e.g., DOI, license, and page). Among these nine archive-specific properties, four (the quote, reference to the quote, the quote ID, and the quote's DOI) were voted as essential and five as recommended for new data sets. For legacy data sets, only two (the quote and its reference) were identified as essential (Figures 5, 6, and 13). Four survey respondents indicated that they were least familiar with this type of archive, which may help explain why fewer properties compared to other archives were considered essential for optimal reuse of the resource by researchers not familiar with the intrinsic details of the archive.

Uncertainties
The uncertainties WG identified seven properties applicable to most records. These properties fell into two broad categories concerning the uncertainty in the measured variable (analytical uncertainty, number of repeat measurements, and reproducibility) and the uncertainty associated with models to infer variables, including chronologies (output statistics, output ensembles along with the parameters, and the publication in which the model is described). For new data sets, four properties (analytical uncertainty, number of repeat measurements, the publication, and parameters of the model) were deemed essential and the other three recommended. For legacy data sets, only one was deemed essential (number of repeat measurements), while the rest were recommended. This highlights the commitment of the community to better characterize uncertainties in paleoclimate records and the acknowledgement that uncertainty has often been ignored when reporting data sets in the past, making it difficult to include metadata for legacy data sets (Figures 5, 6, and 14).
Figure 13. Mind map of the various properties identified by the documentary archives WG and associated vote. Color is the same as in Figure 5. Parentheses indicate recommendations for legacy data sets when different from new data sets. Available online at https://coggle.it/diagram/W4XNNeGhIngfjHzB/t/historical-documents.
Respondents voted on reporting the analytical uncertainty and reproducibility as 2-sigma (estimated as the standard error of the mean), although a point was raised that the reporting should be community-specific, following their own accepted standards (e.g., radiocarbon; Stuiver & Polach, 1977; Millard, 2014), but clearly indicated in the metadata. A compromise is to keep community-specific standards while encouraging 2-sigma reporting if there is no preexisting standard.
For models, the method used should be documented both in the papers and with the data, with publication information about the software and parameters used being considered essential for new data sets. For legacy data sets, all information about the model is considered recommended.
The uncertainties WG has barely scratched the surface of uncertainty reporting in paleoclimate studies. Although several other WGs have reported that uncertainty should be an essential parameter, there is not yet a clear path forward as to how this uncertainty should be unambiguously reported. However, there is some consensus that the method of reporting does not matter as long as the method is clearly described. To do so, the LinkedEarth Ontology (Emile-Geay et al., 2019) offers several paths forward. The class Uncertainty can refer to a single value for all the data values, to a list of values of equal length as the uncertain variable, and to model output stored in ensemble, summary, and distribution tables.
Consider the example of radiocarbon dating. Each radiocarbon value is associated with an uncertainty that is often reported in a separate column of the measurement table. This radiocarbon-age uncertainty is then translated (via a calibration curve) into a calendar age uncertainty that is also stored in a separate column. In both of these cases, the uncertainty is a variable that can be described with the same richness as other columns in the data table. Furthermore, probabilistic age modeling software such as Bchron (Haslett & Parnell, 2008) and BACON (Blaauw & Christen, 2011) for radiocarbon, HMM-Match (Lin et al., 2014) for stratigraphic alignments, and the Banded Age Model (Comboul et al., 2014) return possible age distributions around the calendar age value and age model ensembles for each depth in the paleorecord. In this particular example, each measured value has at least one associated uncertainty value, possibly an entire probability distribution.
On the other hand, uncertainty associated with measurements of trace elements and stable isotopes is often reported as the uncertainty of the standard or a handful of replicates that are taken to represent the uncertainty for all values. The LinkedEarth Ontology (Emile-Geay et al., 2019) allows for the specification of not only the values and units of the uncertainty but also how this uncertainty is estimated and the level at which it is being reported (e.g., one standard error of the mean).
[...] Figure 5. Parentheses indicate recommendations for legacy data sets when different from new data sets. Available online at https://coggle.it/diagram/W4gttodcxjfvSst0/t/uncertainties.
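The reporting options discussed in this section (a single uncertainty for all values, one uncertainty per value, or model output ensembles, each accompanied by the estimation method and reporting level) can be sketched as a small data structure. This is an illustrative sketch only: the class and field names are hypothetical and do not reproduce the LinkedEarth Ontology's actual schema, and the numbers are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class Uncertainty:
    """Hypothetical container; not the LinkedEarth Ontology's actual schema."""
    values: Union[float, Sequence[float]]  # one value for all data, or one per datum
    units: str
    method: str                            # how the uncertainty was estimated
    level: str                             # e.g. "1 standard error of the mean"
    ensemble: Optional[List[List[float]]] = None  # optional model output, one row per member

    def per_datum(self, n: int) -> List[float]:
        """Expand the uncertainty to one value per data point."""
        if isinstance(self.values, (int, float)):
            return [float(self.values)] * n
        if len(self.values) != n:
            raise ValueError("uncertainty list must match the length of the variable")
        return [float(v) for v in self.values]


# Invented numbers, for illustration only.
d18o = [-2.1, -2.3, -1.9]
analytical = Uncertainty(
    values=0.1,
    units="permil",
    method="replicates of a standard",
    level="1 standard deviation",
)
print(analytical.per_datum(len(d18o)))
```

Keeping the method and level next to the numbers reflects the consensus noted above: the reporting convention matters less than describing it unambiguously.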

Chronologies
The chronologies WG identified 54 properties, 43 of which were deemed essential for new data sets, 10 recommended, and 1 desired. For legacy data sets, 30 were identified as essential, 22 as recommended, and 2 as desired (Figures 5, 6, and 15).
Chronologies are obtained using two methods: absolute and relative. Relative chronologies often involve the alignment of one paleoclimate time series with another of known age. For instance, benthic foraminifera stable oxygen isotope (δ¹⁸O) records have often been aligned to the dated LR04 benthic δ¹⁸O stack (Lisiecki & Raymo, 2005). For this type of chronology, the original measurements (e.g., benthic foraminifera δ¹⁸O), the alignment target (e.g., LR04 benthic δ¹⁸O stack), its associated reference chronology (e.g., LR04 age model), and alignment method (e.g., HMM-Match; Lin et al., 2014) should be clearly identified (essential) for both new and legacy data sets. We acknowledge that there is potentially more work to be done to devise a standard for relative chronologies, which should include an integration framework for biostratigraphy, paleomagnetism, stable isotopes chronologies, and orbitally tuned chronologies.
Absolute chronologies are based on radiometric measurements (commonly radiocarbon, lead, and uranium-decay series, or terrestrial cosmogenic nuclide), layer counting, counting of annual cycles in geochemical/isotopic proxies, dendrochronological or tephrochronological crossdating, or luminescence. In addition, some records are characterized by floating chronologies that are absolutely dated (within the uncertainty of the radiometrically derived age), but which have a precise internal chronology due to clear annual banding/cycles (e.g., U-series dated fossil corals and radiocarbon-dated tree chronologies).
[...] Figure 5. Parentheses indicate recommendations for legacy data sets when different from new data sets. Available online at https://coggle.it/diagram/W4hzXeGhIi5Fm0q7/t/chronologies.

The radiocarbon community has a long history of standardizing the reporting of their measurements. In 1977, Stuiver and Polach highlighted recommendations that have remained mostly unchanged (Stuiver & Polach, 1977). For chronological studies using the Libby half-life (Libby et al., 1949), Stuiver and Polach recommend reporting the δ¹³C ratio, the conventional radiocarbon age (relative to CE 1950), associated error (expressed as ± one standard deviation), the estimated reservoir correction, and (optionally) the per mil depletion or enrichment with respect to 0.95 NBS Oxalic acid standard (Olsson, 1970). For geochemical samples, dendrochronological samples, reservoir equilibria, and diffusion models, they recommend reporting the δ¹³C ratio, percent modern, and δ¹⁴C and Δ¹⁴C based on the Cambridge half-life of 5730 years (Godwin, 1962). These guidelines were further extended to include postbomb ¹⁴C data (Reimer et al., 2004) and the reporting of calibrated dates (Millard, 2014) and formed the basis of the properties that were put to a vote. Given the long history of standardization, it is not surprising that legacy radiocarbon data sets are also held at a stringent reporting level.
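To illustrate why the choice of half-life has to travel with the data, the sketch below converts a normalized fraction of modern carbon into an age using the standard decay relation, age = −(t½ / ln 2) · ln(F). It is a simplified illustration, not a reporting tool: the Libby half-life value of 5,568 years is supplied from general knowledge rather than from the text, the function name is hypothetical, and the sketch ignores normalization and calibration.

```python
import math

LIBBY_HALF_LIFE = 5568.0      # years; used for conventional ages (value assumed, not in the text)
CAMBRIDGE_HALF_LIFE = 5730.0  # years; quoted in the text (Godwin, 1962)


def radiocarbon_age(fraction_modern, half_life=LIBBY_HALF_LIFE):
    """Age in years from a normalized fraction modern: -(t_half / ln 2) * ln(F)."""
    if fraction_modern <= 0:
        raise ValueError("fraction modern must be positive")
    mean_life = half_life / math.log(2)
    return -mean_life * math.log(fraction_modern)


# A sample retaining half of the modern activity is one half-life old,
# so the two conventions give different ages for the same measurement.
age_libby = radiocarbon_age(0.5)
age_cambridge = radiocarbon_age(0.5, CAMBRIDGE_HALF_LIFE)
print(round(age_libby), round(age_cambridge))
```

The same measurement yields ages that differ by about 3% between the two conventions, which is one reason the reporting guidelines ask for the underlying quantities and not only the derived age.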
For U-Th dating, the WG recommended the use of the standard proposed by Dutton et al. (2017), with most properties recognized as essential when reporting U-series dates.
Survey respondents also defined what information should be included when reporting the use of age modeling software. The method's name is deemed essential for both legacy and new data sets with most of the other properties identified as recommended. In addition, there is interest in storing ensembles of posterior draws from Bayesian approaches to ensure that the study is fully reproducible. The LiPD structure is already set up to handle multiple model output instances, allowing updates of chronologies for legacy data sets when raw data are available. It thus provides a natural container to store this information.
Finally, respondents were asked to define some nomenclature, including the use of present in paleoclimate studies. Over 80% of respondents voted on keeping the concepts of age and year separated. Age is represented on a time axis starting from the present and counting positively back in time. On the other hand, year follows the Gregorian calendar and is particularly useful for studies concentrating on the past 2,000 years. Over 60% of respondents also voted on reporting years relative to CE (Common Era) rather than AD.
Asking for a definition of present yielded diverse results. Sixty-eight percent of respondents voted in favor of using 1950 as the present, following the radiocarbon convention, 7% voted in favor as defining present as the last year in a record (with no mention of uncertainty), 12% voted in favor of using 2000 as the present, while the last 13% answered other. This last category includes the use of 1950 for radiocarbon and either something else for the other chronologies or readjusting to 1950 to stay in tune with radiocarbon and the use of either 1950 or 2000 as long as it is clearly defined with the data. In summary, there is a consensus that present should be defined as an absolute date (and reported in the metadata), but it should be archive-dependent, with practitioners of U-series dating leaning towards CE 2000 and practitioners of radiocarbon dating leaning towards CE 1950.
One issue in reporting ages is, again, the lack of standards. The most common standard for time and date reporting (e.g., ISO 8601) does not accommodate geologic time. The more recent OWL time ontology draws on the work of Cox and Richards (2015) and includes these concepts. However, these authors offer no finer division of geologic time than eras. This means that the vast majority of archived paleoclimate data sets (particularly, the totality of data sets archived on the LinkedEarth platform) would represent a single time point (the Quaternary era). To remedy this gap between ISO 8601 and the OWL time representation, we hereby propose a precise mechanism to report the time axis in paleoclimate data sets:
$$\text{Time (age)} = \text{significand} \times 10^{\text{exponent}}\ \text{years}\ \ \text{direction}\ \ \text{datum}$$
where significand and exponent are components of standard floating-point representation; direction indicates whether time flows forward (since a datum, as in the case of AD dates), or backward (before a particular datum, as in the case of ages). Datum here refers to the origin point of the time (age) axis, which is arbitrary and (as recounted by Wolff, 2007) highly inconsistent among researchers. Table 1 shows how this representation would work in practice. Note that variability in the datum for rows 1 (21 ky BP, a common date for the Last Glacial Maximum) and 4 (127 ky BP, a common date for Marine Isotope Stage 5e) could arise because of the date being reported from a radiocarbon versus U-series chronology and is usually impossible to infer without clarification from the original publication, or from its authors.
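The proposed representation can be sketched as a small record type holding the four components (significand, exponent, direction, datum). This is a minimal sketch under stated assumptions, not an implementation from the LinkedEarth project: the class name `PaleoTime`, the direction labels "before" and "since", and the convention of expressing the datum as a calendar year CE are all choices made here for illustration, and the conversion ignores the absence of a year zero in the CE/BCE system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaleoTime:
    """A time value as significand * 10**exponent years, with direction and datum.

    Field names mirror the terms in the text; the class itself is hypothetical.
    """
    significand: float
    exponent: int
    direction: str   # "before" (ages) or "since" (calendar years)
    datum: int       # origin of the axis, expressed as a calendar year CE

    def years(self) -> float:
        """Magnitude of the time value in years."""
        return self.significand * 10 ** self.exponent

    def to_year_ce(self) -> float:
        """Calendar year CE implied by the value (no year-zero adjustment)."""
        if self.direction == "before":
            return self.datum - self.years()
        if self.direction == "since":
            return self.datum + self.years()
        raise ValueError("direction must be 'before' or 'since'")

    def rebased(self, new_datum: int) -> "PaleoTime":
        """Same instant expressed as an age before a different datum."""
        age = new_datum - self.to_year_ce()
        return PaleoTime(age, 0, "before", new_datum)


# 21 ky before 1950 versus the same instant counted before 2000.
lgm = PaleoTime(21.0, 3, "before", 1950)
print(lgm.to_year_ce(), lgm.rebased(2000).years())
```

With the datum stored explicitly, an age reported against 1950 and one reported against 2000 can be reconciled mechanically, which is the ambiguity the proposal is meant to remove.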

The current proposal removes such ambiguities and can accommodate both observed and simulated data sets, potentially easing the task of model-data comparison if both communities start adopting it.

An Example: MD98-2181
This section puts these recommendations into practice on a real-world data set: the MD98-2181 marine sedimentary record from Khider et al. (2014). The purpose is twofold: (1) illustrate how to implement these recommendations in practice and (2) draw attention to practical difficulties that may impede large-scale adoption of PaCTS v1.0.
MD98-2181 is the most metadata-rich data set currently available on the LinkedEarth platform since it was used as an example to further develop the LiPD framework and later the LinkedEarth Ontology. The data set consists of measurements of Mg/Ca and δ¹⁸O made on the planktic foraminifera Globigerinoides ruber (white, sensu stricto, and sensu lato) and δ¹⁸O made on the benthic foraminifera Cibicidoides mundulus to infer surface and deep ocean variability in the western tropical Pacific over the Holocene. The age model is based on radiocarbon measurements for the Holocene and deglacial portion of the core.
Using the standards proposed for cross-archive metadata, Mg/Ca and δ¹⁸O on foraminifera, radiocarbon-based chronology, and uncertainties, we calculated how many metadata properties in the essential and recommended categories were present in the MD98-2181 data sets (Figure 16). Since, by default, all metadata are desired, we ignored this category for the purpose of this example. In terms of its cross-archive metadata, the MD98-2181 record is nearly complete, with 95% of the essential metadata and 78% of the recommended metadata present in the record (Figure 16). The only missing component of essential metadata is the sample thickness. For the recommended category, the International Geo Sample Number for the sample and date at which the measurements were performed (i.e., analysis date) are missing. The core International Geo Sample Number should be assigned by the core repository directly (e.g., Bremen Core Repository and Oregon State University core repository). Both analysis dates and sample thickness are metadata readily available at the time of collection. Although both were collected in either a physical notebook or by the instrument during analysis, they were not archived with the data set on LinkedEarth since the information was not deemed by the metadata authors as essential for reproducibility.
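The completeness figures quoted above amount to comparing the set of properties a standard asks for with the set a record actually reports. The sketch below shows one way to compute such a score. It is illustrative only: the function name and the short property lists are hypothetical stand-ins, not the PaCTS property list or the actual MD98-2181 metadata.

```python
def completeness(required, reported):
    """Return (percentage of required properties present, sorted missing properties)."""
    required = set(required)
    if not required:
        return 100.0, []
    missing = sorted(required - set(reported))
    present = len(required) - len(missing)
    return 100.0 * present / len(required), missing


# Hypothetical property names; a real check would use the PaCTS property list.
essential = {"latitude", "longitude", "site name", "sample thickness"}
reported = {"latitude", "longitude", "site name", "analysis date"}

pct, missing = completeness(essential, reported)
print(f"{pct:.0f}% of essential properties present; missing: {missing}")
```

Running the same check separately for the essential and recommended lists of each relevant WG would yield one pair of scores per category, as summarized in Figure 16.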
The paleodata for the record consist of Mg/Ca and δ¹⁸O measurements on foraminifera tests from sediment core subsamples. For the essential reporting of δ¹⁸O on foraminifera, the MD98-2181 record lacks metadata regarding the taxonomy scheme being followed and equilibrium offsets. In the recommended category, only the volume of sediment analyzed is missing. For Mg/Ca reporting, the contamination indicator values (Mn/Ca and Fe/Ca; Khider et al., 2014) are missing from the archived record in addition to the taxonomy scheme being followed. Neither was deemed useful for reproducibility by the authors of the study at the time of reporting. In the recommended category, the volume of sediment analyzed and the habitat depth have not been reported. In both cases, the values are unknown, either because they were not measured during sample preparation (volume of sediment analyzed) or because they could not be accurately determined from previous studies in the region (habitat depth).
The MD98-2181 chronology was based on radiocarbon measurements. Ninety percent of the raw radiocarbon dates used in Khider et al. (2014) were reported in Stott et al. (2004, 2007). The raw data necessary for the repeatability and replicability of the age model in Khider et al. (2014) were re-reported in the later study. However, the archived record is missing information about the modern fraction (F14C), the sample ID, and the matrix, which are deemed essential. The archived record is also missing most of the recommended properties, only reporting the reservoir age correction (ΔR), the ensemble statistics, and the ensemble age models. The last two properties are essential in the context of the Khider et al. (2014) study to reproduce the age-uncertain spectral analysis. The Stott et al. (2004, 2007) studies are also missing the essential and recommended properties with respect to reporting of raw measurements.
For uncertainty quantification, the record metadata lack the number of repeated measurements and the model parameters in the essential category, though it should be noted that the values of repeated measurements are reported in the measurement table itself. The record is complete in the recommended category.
This example highlights the difficulty of reporting all essential metadata, especially after the study has been completed. We therefore present version 1.0 of PaCTS as an aspirational standard, one that would theoretically ensure optimal reuse of paleoclimate data sets but is difficult to observe in practice. Clearly, being aware of these requirements at the start of a study would help scientists keep track of the necessary metadata and ensure that they are reported when the data set is digitally published (e.g., on WDS-Paleo or PANGAEA). We therefore recommend that investigators plan ahead of time which properties they intend to report and structure their lab notebooks so this information is easier to track at the time of publication.

Discussion
This paper describes the first effort by the global paleoclimate community to define standards for digitally archiving paleoclimate data sets. Such standards aim to make publicly archived paleoclimate data more reusable by clearly describing them with comprehensive metadata. In combination with the LinkedEarth Ontology, these standards also help meet the interoperability principle by using a formal, accessible, shared, and broadly applicable language for knowledge representation. If the data sets are properly described using microdata (e.g., Schema.org), they are also findable. Together, these standards bring such data sets closer to compliance with FAIR principles.
The standards arose through collective discussions, both in person and online, and via an innovative social platform (Gil et al., 2017). The results of this collective decision-making reveal an evident desire for archiving a rich set of metadata properties, with respondents identifying roughly two thirds of properties (208 out of 302) as essential for new data sets. Respondents also recognized that legacy data sets may not be as complete, so they identified less stringent requirements in order not to overlook valuable data sets. Nonetheless, respondents identified 131 properties as essential for legacy data sets, highlighting the fact that a data set loses its usefulness if too many requirements are not met. Several respondents also indicated that while some properties should theoretically be essential (or recommended), they may be hard to obtain in practice and/or variable in time. These include seasonality and habitat depth of foraminifera and many of the properties from TRiDaS. Furthermore, although rich metadata are always valuable, these requirements should be balanced with the researcher's time. Scans of historical documents or uploads of X-radiographs of archive samples would be highly valuable to the community, but these activities are time-consuming and this use of time is rarely, if ever, incentivized by funding agencies.
PaCTS v1.0 is also missing several proxy systems, including loess and continental records and faunal and floral counts in lake sediments, and it does not incorporate recent standards such as the one developed by Courtney Mustaphi et al. (2019) for ²¹⁰Pb dating. Finally, although cross-pollination was encouraged, common properties were not adequately identified across WGs, resulting in duplicates. This is especially apparent in the lake and marine sediment WGs.
Another salient outcome is that this first version of PaCTS can only be described as aspirational. Indeed, section 5 illustrates that even in the best of circumstances (the author describing their own data set, generated less than a decade ago), the compliance rate was far from perfect. This points to the need for more realistic guidelines. It is indeed apparent that many participants misinterpreted what was meant by essential. Further, the participation rate is still far below what is needed for this standard to be representative of the worldwide paleoclimate community, which would gain much from harmonization. How can this standard be collectively refined and more broadly adopted? How should the standard, and its future versions, be implemented in practice?

Broadening Participation
The genesis of PaCTS v1.0 serves as a useful template for future efforts. As detailed in section 2, the spark for the discussion came from the 2016 workshop on Paleo Data Standards. Nothing replaces the immediacy of in-person communication for this sort of work. However, it would be costly, carbon-intensive, and unrealistic to expect large segments of the paleoclimate community to travel for such an event, should it happen again. We therefore advocate that further discussion take place within, or around, existing meetings. Examples include the annual meetings of the American Geophysical Union and the European Geosciences Union, the Goldschmidt conference, the Ocean Sciences meeting, the PAGES Open Science Meeting, the International Conference on Paleoceanography, meetings of the International Union for Quaternary Research, and more focused meetings like WorldDendro, Karst Record, or the ASLO Aquatic Sciences Meeting. We have also found PAGES-sponsored workshops to be excellent opportunities to discuss data stewardship considerations, of which reporting standards are an important aspect. At the very least, an annual session at an international meeting would be useful for the community to touch base and take stock of progress and challenges, but more frequent interactions will be desirable until adoption reaches a critical threshold (e.g., 80% of submissions to public repositories like WDS-Paleo or PANGAEA).
Assuming that such meetings will take place over the next few years in many corners of the community, there is still a need for more sustained forms of communication. The virtual working groups on the LinkedEarth platform are where many of our discussions took place, and they remain available to complement the in-person discussions. Membership is open, and we encourage interested readers to join LinkedEarth so they can participate in these forums or create their own forums on a platform of their choice (traceability and transparency being of paramount importance).

Roadmap to Standardization
In practical terms, we recommend that the next iteration of PaCTS use the following steps:
1. The procedure for ratification is developed in tandem with major stakeholders (scientific societies, data repositories, and chief editors).
2. The proposed procedure is widely distributed to the community (e.g., through the PAGES magazine, AGU and EGU communication channels, and social media).
3. The timeline for discussion and voting is clearly indicated, and voting occurs on the LinkedEarth platform.
4. The vote outcome is presented at a major international meeting, and any additional discussion is considered before the vote is certified at the meeting.
5. The standard is widely disseminated and encouraged by appropriate incentives (see below).

Implementing Emerging Standards
We envision two main ways to encourage the adoption of the standard. The first is to use technical innovation to lower the barrier to metadata archiving; the second is to change the incentive structure to make it worthwhile for researchers to adopt the standard, despite the inevitable opportunity cost that comes with providing more complete data records.
On the first point, the LinkedEarth project has recently implemented a web interface to convert paleoclimate data sets into the LiPD format: the lipd.net playground (http://lipd.net/playground). To promote standardization, the reporting recommendations described herein will be flagged as users create LiPD files interactively on the lipd.net website, pulling data and metadata from native archival formats (e.g., Excel spreadsheets). Ideally, all records, especially those accepted on the LinkedEarth platform, will show their compliance rate with PaCTS. This rate can be computed during creation of the LiPD file, with "unavailable" allowed as an answer for the essential fields. At present, the lipd.net playground displays the fraction of required fields that have been entered but is not set up to track archive- or proxy-specific completeness, although this is possible with further development. The "unavailable" category serves two purposes: (1) to encourage researchers to gather these metadata during their next study and (2) to investigate how many of these essential properties are reported in practice. Alternatively, LinkedEarth could appoint a Board of Data Editors to approve the data sets for upload onto the platform. Such a Board presents several advantages over an automatic process: it could (1) answer specific questions, thereby taking into consideration the intricacies of a data set; (2) identify needed changes to the reporting standards faster; and (3) assist the community with the online Web service when needed. The major drawback is the volunteer time required of the Board of Data Editors. In our experience, the time of researchers is already stretched thin, and they have little incentive to commit more of it to the relatively thankless task of standardization.
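To make the role of the unavailable answer concrete, the sketch below separates two quantities for a set of essential fields: the fraction that has been answered at all (including an explicit unavailable) and the fraction that actually carries a reported value. This is only an illustration under stated assumptions: the sentinel string, the function name, and the field names are hypothetical, and the code does not describe how the lipd.net playground is implemented.

```python
from typing import Dict, Mapping, Sequence

# Hypothetical sentinel meaning "the author confirms this value cannot be provided".
UNAVAILABLE = "unavailable"


def summarize_essential(fields: Sequence[str], record: Mapping[str, object]) -> Dict[str, float]:
    """Return the answered and reported fractions for a list of essential fields."""
    total = len(fields)
    if total == 0:
        return {"answered": 1.0, "reported": 1.0}
    answered = 0
    reported = 0
    for name in fields:
        value = record.get(name)
        if value is None or value == "":
            continue  # field left blank: neither answered nor reported
        answered += 1
        if value != UNAVAILABLE:
            reported += 1  # an actual value was supplied
    return {"answered": answered / total, "reported": reported / total}


# Hypothetical field names and a made-up entry, for illustration only.
essential_fields = ["species", "taxonomy_scheme", "equilibrium_offset", "sample_thickness"]
entry = {
    "species": "Globigerinoides ruber",
    "taxonomy_scheme": UNAVAILABLE,
    "equilibrium_offset": UNAVAILABLE,
    # "sample_thickness" was left blank
}

summary = summarize_essential(essential_fields, entry)
print(summary)
```

Keeping the two fractions apart serves both purposes named above: the answered fraction shows whether authors have considered every essential field, while the reported fraction tracks how many essential properties are provided in practice.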
How might the reward structure be changed? There are essentially two levers to activate. The first is funding agencies. In the United States, for instance, the National Science Foundation funds the vast majority of paleoclimate research. While the agency now requires a data management plan to be submitted for each proposal, its reporting guidelines are very broad. They could be made more specific and point paleoclimate researchers to the latest version of PaCTS. The European Research Council similarly supports Open Science, but with far less specific guidelines than PaCTS v1.0. To the best of our knowledge, the situation is similar for other countries (e.g., Canada and Australia). We therefore call on funding agencies to either endorse this standard or propose a meaningful alternative.
The second lever is publishers and editors: while each publishing house encourages digital data archiving to varying degrees, the decision of what (meta)data to include is ultimately up to the author and often fails to consider the long-term value proposition of the data set. Publishers could help ensure that the present standard is, at the very least, encouraged, if not mandatory. In particular, the American Geophysical Union and Copernicus publishers recently endorsed requirements to make data FAIR. Affiliated journals could use their leverage to promote more stringent reporting standards. As an example, the recent PAGES 2k special issue of the journal Climate of the Past piloted the implementation of open-data practices, which included some reporting standards, and reported the challenges faced when requiring such practices. Another avenue for promoting best practices, including adoption of reporting standards, is through professional paleoscience organizations such as PAGES and INQUA.
We expect the present reporting standard to evolve to meet the needs of the paleoclimate community. It is our hope that this publication will stimulate volunteers to join the effort and organize discussions at all community levels; there can be no community standard without community involvement. We are confident that improving paleoclimate data standards will promote collaboration on international data syntheses and encourage the development of software based on the new standards. In turn, such software will reduce the time to science, by compressing the time researchers spend on the menial task of data wrangling.