Analysis of an Updated Paleointensity Database (QPI‐PINT) for 65–200 Ma: Implications for the Long‐Term History of Dipole Moment Through the Mesozoic

The global paleointensity database for 65–200 Ma was analyzed using a modified suite of paleointensity quality criteria (QPI) such that the likely reliability of measurements is assessed objectively and as consistently as possible across the diverse data set. This interval was chosen because of dramatic extremes of geomagnetic polarity reversal frequency ranging from greater than 10 reversals per million years in the Jurassic hyperactivity period (155–171 Ma) to effectively zero during the Cretaceous Normal Superchron (CNS; 84–126 Ma). Various attempts to establish a relationship between the strength of Earth's magnetic field and the reversal frequency have been made by previous studies, but no consensus has yet been reached primarily because of large uncertainties in paleointensity estimates and sensitivity of these estimates to data selection approaches. It is critical to overcome this problem because the evolution of the dipole moment is a first order constraint on the behavior of the geodynamo. Here we show that conventional statistical tests and Bayesian changepoint modeling consistently indicate the strongest median/average virtual dipole moment during the CNS. In addition, the CNS and Jurassic hyperactivity period are characterized by the highest and lowest percentage of virtual dipole moments exceeding the overall median for the 65‐ to 200‐Ma interval, respectively. These observations suggest that the superchron dynamo was able to generate stronger fields than the dynamo operating in the frequently reversing regime. While the precise mechanism remains unclear, our results are compatible with the hypothesis that field strength and reversal rate variation are controlled by changes in core‐mantle boundary thermochemical conditions.


Introduction
Data on the long-term variations of Earth's magnetic field, including those of geomagnetic polarity reversal frequency, secular variation magnitude, and paleointensity, are crucial for understanding the geodynamo and planetary evolution (e.g., Aubert et al., 2010;Biggin et al., 2012). The continuous chronology of polarity reversals has been reliably established for the last~170 Ma from seafloor marine magnetic anomalies and magnetostratigraphic studies (e.g., Ogg, 2012;Opdyke & Channell, 1996), and numerous reversals have been identified in older rocks back to the Neoarchean (e.g., Pavlov & Gallet, 2005;Strik et al., 2003). These data indicate that the frequency of reversals has varied widely, from periods during which no reversal occurred for tens of millions of years (superchrons), to periods when the reversal frequency exceeded that observed for the last 10 Ma by factor of 2-3 (e.g., Ogg, 2012). Interpretations of the geomagnetic reversal record however remain controversial. While some studies postulated that the geodynamo could have produced the observed frequency variation in a purely stochastic manner (e.g., Hulot & Gallet, 2003;Jonkers, 2003Jonkers, , 2007Lowrie & Kent, 2004), a contrasting view posits that the variation has been primarily controlled by changes in the boundary conditions, most importantly, the heat flow across the core-mantle boundary (CMB) (e.g., Jones, 1977;Larson & Olson, 1991;Lhuillier et al., 2013;McFadden & Merril, 1984).
A full understanding of the geodynamo mechanisms requires a holistic approach in which the geomagnetic reversal record is considered together with the long-term data on the field strength (paleointensity) and secular variation (e.g., Hulot & Gallet, 2003). However, despite continuing effort, some first-order questions about the long-term history of geomagnetic field remain open. One such question is whether the long-term variations of geomagnetic reversal frequency, secular variation, and field strength are intrinsically related or whether they have varied independently throughout geological history. Some numerical geodynamo models exhibit an inverse relationship between geomagnetic reversal frequency and paleointensity (e.g., Aubert et al., 2010;Driscoll & Olson, 2011;Glatzmaier et al., 1999;Olson et al., 2010;Olson & Amit, 2015), whereas other models allow different scenarios (e.g., Driscoll & Olson, 2009). Model outcomes are also sensitive to the boundary conditions and model parameters, which are currently not well constrained (e.g., Driscoll & Olson, 2009;Olson & Amit, 2015).
In the absence of strict theoretical constraints, empirical paleomagnetic data become a principal source of information about the geodynamo. However, existing paleointensity data do not allow us to make unambiguous conclusions about the relationship between field strength and reversal frequency. Prior analyses of the results obtained from bulk volcanic rocks that dominate the global paleointensity database (earth.liv.ac.uk/ pint/) appear to not support any correlation between geomagnetic reversal frequency and paleointensity (e. g., Heller et al., 2002;Ingham et al., 2014;Prévot & Perrin, 1992;Selkin & Tauxe, 2000). While paleointensity values consistently lower than the average value for the last 180 Ma are common during periods of high reversal rate (Sprain et al., 2016;Tarduno & Cottrell, 2005;Tauxe et al., 2013), the values significantly exceeding the average are rare during the Cretaceous normal polarity superchron (CNS; e.g., Prévot et al., 1990;Garcia et al., 2006;Zhu et al., 2008). The abundance of low paleointensities in the Mesozoic led to a Mesozoic dipole low (MDL) hypothesis (Prévot et al., 1990) which validity has remained a controversial topic (e.g., Dodd et al., 2015;Goguitchaichvili et al., 2002;Tarduno & Cottrell, 2005). These observations led to the conclusions that the geodynamo processes governing the field intensity variations and geomagnetic reversals are decoupled (e.g., Prévot & Perrin, 1992;Selkin & Tauxe, 2000) or ambiguous (Biggin & Thomas, 2003;Ingham et al., 2014).
In contrast to whole rock data, paleointensity determinations from single silicate crystals that contain singledomain magnetic inclusions (Cottrell & Tarduno, 2000) suggest the inverse relationship with reversal frequency, consistently indicating high field strengths during the CNS (Cottrell et al., 2008;Tarduno et al., 2001Tarduno et al., , 2002Tarduno et al., , 2006 and low field strengths during periods of high reversal frequency (Bono & Tarduno, 2015;Tarduno & Cottrell, 2005). Albeit based on a limited number of data points, the long-term trend revealed by single crystal data suggests that the mechanisms controlling the field intensity and stability are intrinsically linked (e.g., Tarduno et al., 2006).
Early paleointensity analyses of submarine basaltic glass (SBG) concluded that the field strength during the CNS was of weak to average strength (e.g., Pick & Tauxe, 1993;Selkin & Tauxe, 2000). These results have later been augmented by studies that yielded higher field intensities for the CNS (Tauxe, 2006;Tauxe & Staudigel, 2004). However, the SBG-based paleointensity values for the superchron are more variable, and most of them are significantly lower than those derived from single silicate crystals (Tarduno et al., 2006;Tauxe & Yamazaki, 2007).
In a recent statistical analysis of the paleointensity database for the last 200 Ma, which combined the data from all paleointensity carrier types, Ingham et al. (2014) did not find any significant correlation between the reversal frequency and field intensity. However, the authors pointed out that the deficiencies of the existing paleointensity database did not allow constraining the true history of the geodynamo with confidence. Furthermore, their analysis confirmed that the single crystal approach produces paleointensities significantly different from those using more conventional whole-rock methods.
The discrepancies in the histories of geomagnetic field intensity derived from different remanence carriers warrant further development and refinement of the paleointensity database to advance our understanding of the relationship between the long-term behavior of paleointensity, reversal frequency, and secular variation of the field. However, determination of paleointensity represents a much more difficult task than determination of paleomagnetic directions. While the latter can be achieved even if only a small part of the primary magnetization vector is retained in the rock, the former requires a much more complete preservation of magnetic remanence. In addition, there is a plethora of factors, including natural and laboratory alteration, the presence of nonideal magnetic carriers and magnetizations, and others that are likely to result in paleointensity biases (e.g., Dunlop et al., 2005;Fabian, 2009;Levi, 1977;Paterson, 2013;Smirnov et al., 2017;Smirnov & Tarduno, 2003Yamamoto et al., 2003). These biases may be responsible for the disagreement on the long-term paleointensity trends inferred from the whole rock, SBG, and single silicate crystals data sets, thus leading to controversial conclusions about the mechanisms governing the long-term behavior of the geomagnetic field (e.g., Smirnov et al., 2017;Tarduno & Smirnov, 2004). To unravel longterm trends in paleointensity behavior, the effects of these biases need to be minimized by subjecting the paleointensity data to efficient reliability criteria (e.g., Biggin & Paterson, 2014).
Here we present the results of our effort to update and refine the global paleointensity database (PINT) for the 65-to 200-Ma period, using a revised set of paleointensity quality criteria (Q PI ; cf. Biggin & Paterson, 2014). The age interval selected for this endeavor encompasses the wide range of geomagnetic reversal frequencies, from the period of no reversals during the CNS to intermediate reversal rate, to the period of extremely frequent reversals in the late Jurassic (the Jurassic hyperactive period, JHAP). The updated Q PI -PINT database has been analyzed to investigate the relationship between the reversal frequency and strength of Earth's magnetic field. The results of these analyses and their implications for understanding the geodynamo are discussed.

The Paleointensity Database (Q PI -PINT) for 65-200 Ma
We started our analysis with an exhaustive literature search, which yielded 796 site mean paleointensity estimates from 75 individual studies, expressed as virtual or virtual axial dipole moments (collectively referred to as VDMs hereinafter; supporting information Table S1 and Figures 1 and 2). Compiled VDMs for the chosen time interval include 61 data points from six studies published since the release of 2012 Q PI -PINT database, used in the most recent analysis of paleointensity data for the Mesozoic (Ingham et al., 2014). The majority of the selected VDMs (n = 586) are obtained using variants of the double-heating Thellier method (Thellier & Thellier, 1959), whereas the remaining VDMs are obtained with the van Zijl method (n = 2; van Zijl et al., 1962avan Zijl et al., , 1962b, different modifications of the Wilson method (n = 77; Wilson, 1961), variants of the Shaw method (n = 179; Shaw, 1974;Tsunakawa et al., 1997, Yamamoto et al., 2003, the microwave method (n = 36; Hill & Shaw, 1999), or the multispecimen parallel differential method (n = 15; Dekkers & Bohnel, 2006). These numbers include site-mean VDMs measured using combinations of two or more paleointensity techniques. Most VDMs (n = 654) are obtained from whole rock samples, 123 from submarine basaltic glass (e.g., Juarez et al., 1998;Tauxe et al., 2013), and 19 from single silicate crystals (e.g., Tarduno et al., 2006). The age distribution of selected data is uneven, with 627 and 169 VDMs representing the Cretaceous and Jurassic periods, respectively ( Figure 2a). The VDM values range from 0.5 to 25.3 × 10 22 Am 2 with a median of 4.2 × 10 22 Am 2 (Figures 2a and 2b).
We have subjected each selected VDM to a set of paleointensity quality criteria, largely following the Q PI suite proposed by Biggin and Paterson (2014). Each criterion has been assigned a "pass" (1) or a "fail" (0) value (supporting information Table S1) based on the conditions described below. A pass has been given only if explicit positive evidence for the required condition was available. In the absence of such direct evidence, the criterion has been deemed as failed.

Age
This criterion requires that the age of the rock sequence is reliably determined using radiometric or stratigraphic methods with an error not exceeding 10% of the nominal age. In addition, to ensure the age of remanence, convincing evidence is required that the primary remanence component is used for paleointensity determination (e.g., directional data from the paleointensity experiment confirming isolation of the characteristic remanence). Overall, 403 VDMs have passed this criterion.

DIR
This criterion, which was not included in the original suite proposed by Biggin and Paterson (2014), requires that a VDM is associated with a well-defined paleomagnetic direction, calculated as mean of a minimum of five individual samples directions with attendant Fisher (1953) precision parameter k ≥ 50. These minimal threshold values for the number of samples and within-site precision are commonly used in paleomagnetic studies of secular variation for discriminating the site-mean directions that can be reasonably deemed to be reliable (e.g., Johnson et al., 2008;Cromwell et al., 2018;Doubrovine et al., 2019). If only a 95% cone of confidence (α 95 ) was provided, k was calculated using the equation: where N is the number of samples used to calculate the site-mean direction.
We note that well-defined paleomagnetic directions can be potentially used for assessing whether a paleointensity estimate represents a stable polarity or transitional field. In our study, we did not attempt to use the directional data for assessing the transitional nature of the field but instead used the interpretations provided in the original publications. This criterion was met by 243 VDMs.

STAT
This criterion requires that a site-mean VDM is based on at least five individual estimates per site and that the ratio of standard deviation to the mean value does not exceed 25%, following adjustment for the number of specimens used (Paterson et al., 2010). This requirement reflects the expectation that paleointensity determinations from the same site must not be significantly different. A large within-site scatter indicates that the estimates from some or all the samples may be inaccurate, hence reducing the reliability of the site-mean VDM. Overall, 123 site-mean VDMs have met this criterion.

TRM
This criterion requires that the remanence component used for paleointensity determination is a thermoremanent magnetization (TRM). As evidence for TRM, we have accepted microscopy observation indicating a primary igneous texture (e.g., homogeneous titanomagnetite grains) and the absence of indicators of the processes that could have resulted in a non-TRM remanence acquisition (e.g., hydrothermal alteration/precipitation, or low-temperature oxidation). If no microscopy data have been available, we assigned TRM = 0.
The TRM criterion has been applied at the formation level (i.e., microscopy data from representative units have been applied to all the results from the formation). Likewise, unambiguous evidence for a non-TRM remanence has been used to set the TRM = 0 for the entire formation. For example, we have assigned both TRM and total Q PI to zero for all VDMs from the Deccan Traps because their magnetic remanence has been shown to be a CRM due to a post emplacement hydrothermal activity (Kono, 1974).  Table S1).
The TRM criterion has been satisfied by 121 VDMs. The low number primarily reflects the absence of microscopic analyses. As a caveat, we note that certain non-TRM remanences (e.g., thermochemical remanent magnetization) may not be recognizable even with microscopy data. Bearing this in mind, we have assigned TRM = 0 to all the VDMs obtained from plutonic rocks due to a high potential for exsolution in magnetic grains to occur below blocking temperatures.

ALT
This criterion requires that a paleointensity estimate is not substantially biased by physicochemical alteration during the experiment. We have accepted an estimate if it passed the alteration screening based on rigorous alteration check procedures (e.g., partial TRM checks (Coe et al., 1978) or monitoring of magnetic  Table S1). (a) The site-mean VDMs based on bulk rocks (grey circles), submarine basaltic glass (blue triangles), and single silicate crystals (red diamonds). The light blue curve shows the average reversal frequency calculated with a 5-Ma sliding window. The geomagnetic polarity timescale (Ogg, 2012) shows the normal/reversed polarity as black/white bars. (b) A histogram and empirical CDF of the VDMs shown in (a). (c) A histogram of the total paleointensity quality (Q PI ) score for the entire database. CDF = cumulative density function; VDM = virtual dipole moment.

MD
The MD criterion addresses a fundamental requirement of the Thellier method that the paleointensity signal should ideally be carried by noninteracting single-domain magnetic grains. The presence of strong interactions and/or multidomain-like behavior of magnetic grains results in nonlinear Arai plots leading to an intrinsic paleointensity bias that becomes more significant with the increasing grain size (Levi, 1977;Paterson, 2011;Smirnov et al., 2017;Xu & Dunlop, 2004). We have assigned MD = 1 if the absence of a significant MD bias was explicitly demonstrated by the pTRM-tail check (Riisager & Riisager, 2001) and/or the Thellier-IZZI (Tauxe & Staudigel, 2004) procedures. Alternatively, as a measure of MD contribution, we have used the fraction (f) of NRM replaced with laboratory TRM for the linear segment of Arai plot used to calculate paleointensity (Coe et al., 1978). The MD = 1 value has been assigned to a site-mean VDM if f was equal or greater than 70% for at least half of the intensity estimates. The VDMs obtained using a domain-state independent method such as the low-temperature demagnetization-double heating Shaw or Wilson methods (e. g., Muxworthy, 2010;Yamamoto et al., 2003) have been assigned MD = 1 by default. Overall, 396 VDMs have met the MD criterion.

ACN
This criterion assesses the effects of three different rock magnetic phenomena-the anisotropy of TRM, cooling rate effect, and nonlinear TRM acquisition-any of which, if present, may result in a substantial paleointensity bias. A site-mean VDM has passed this criterion only if the study explicitly assessed all three effects and all the necessary corrections were applied. However, while the cooling rate correction is often addressed in paleointensity studies, the other two effects are usually considered to be negligible in lava flows and, hence, are hardly ever discussed. Consequently, only 37 VDMs have satisfied this criterion requirement.

TECH
The criterion requires that a site-mean paleointensity estimate is an average of results obtained by two or more paleointensity methods. Importantly, the methods must be different in their physical principles and/ or experimental protocol (e.g., the Thellier versus Wilson method) rather than be different variants of the same method (e.g., the Coe-modified versus IZZI (In-field -Zero field -Zero field -In-field) variants of the Thellier method). This criterion has been met by 100 VDMs.

LITH′
In the original formulation, this criterion required that a site-mean estimate is an average of results from more than one lithology or from samples of the same lithology showing different unblocking behavior (Biggin & Paterson, 2014). In the present analysis, we have used a stricter approach assigning LITH′ = 1 only when the paleointensity estimates are obtained from different lithologies (e.g., baked host rock versus a baking body). Only 13 VDMs have satisfied the LITH′ criterion.

MAG
This criterion requires that specimen-level paleointensity data are publically available so that they can be reanalyzed (Biggin et al., 2015). Currently, 138 VDMs from four studies meet this criterion. While we have assessed this criterion and retained it in our data sets, we have not used it to calculate the final reliability score for our analyses below because, while the availability of raw data can increase confidence in their results, this criterion does not assess the quality of paleointensity determination.
The final Q PI score for each site-mean VDM, calculated as a sum of the individual criteria scores, ranged from 0 to 7 (out of the maximum of 9; supporting information Table S1 and Figure 2c).

Data Selection
We analyzed the updated Q PI -PINT database to investigate the long-term behavior of paleointensity and its relationship with the geomagnetic reversal frequency between 200 and 65 Ma. Selected paleointensity data were divided into five time intervals: the early to middle Jurassic (EARLY, 200-171 Ma), the JHAP (171-155 Ma), the late Jurassic-early Cretaceous (MID, 155-126 Ma), the CNS (126-84 Ma), and the late Cretaceous (LATE, 84-65 Ma; Figure 3). The divisions have been selected to represent different reversal frequency regimes (i.e., low or effectively zero during the CNS, medium 1-3 reversals/Myr during the EARLY, LATE, and MID periods, and high frequency, exceeding~11 reversals/Myr during the JHAP), but they are not based on any implied physical mechanism or statistical characteristics of the reversal process.
For our analysis, we excluded the VDMs unequivocally identified as based on non-TRM remanence, affected by laboratory alteration, and/or reported to represent a transitional field. The refined database contains 623 VDMs (subset 1) including 481 from whole rock samples, 123 from SBG, and 19 from single silicate crystals ( Figure 3a), with the Q PI scores ranging from 0 to 7 (supporting information Table S2 and Figures 3c and 4). The selected VDMs range from 0.5 to 19.9 × 10 22 Am 2 , with a median M 65-200 Ma = 4.1 × 10 22 Am 2 (Figures 3a and 3b). The filtering did not result in preferential removal of low or high VDMs (Figure 3b), but most of the removed data were characterized by low Q PI scores (0-2; Figure 3c). Interestingly, the filtering affected only paleointensity data sets obtained from whole rock samples. The nonparametric two-sample Kolmogorov-Smirnov (KS) test (Sheskin, 2004) indicates that the distributions of the initial and refined data sets are indistinguishable (KS statistic = 0.0254, p = 0.98; cf. Figure 3b).
In order to select a subset of more reliable data with a higher likelihood of representing the true field behavior, we utilized two approaches. In the first approach (a Q PI cutoff), we accepted only data for which the Q PI score exceeds a certain cutoff value. Higher cutoff values should result in selection of more reliable data, reducing the effect of undesirable biases. However, the reduced size of the data set limits the ability of the statistical analyses to properly identify differences between the time periods. As a compromise, we used Q PI ≥ 3 as the minimum acceptable cutoff, which divides the total data set into two nearly equal data sets (N QPI ≤ 2 = 296 and subset 2 with N QPI ≥ 3 = 327). Subset 3 represents VDMs with minimum Q PI ≥ 4 cutoff.
The blanket Q PI cutoff approach, however, does not account for the type and relative importance of the individual Q PI criteria contributing to the total score. As a result, the accepted data may still be influenced by biases. For example, a VDM with the Q PI = 3 score based on the AGE, TECH, and LITH′ criteria must be considered less reliable than a VDM with the same score based on the TRM, ALT, and MD criteria, because the former paleointensity determination does not incorporate the alteration checks and may represent a paleointensity estimate biased due to the presence of MD effects. To ensure this is not the case in our analyses, we further define a prioritized set of Q PI criteria whereby VDMs that fail any one of AGE, ALT, STAT, MD, and TRM are excluded.
Arguably, the most reliable data set based on all five prioritized criteria contains only 20 data points (from 14 single crystal, five whole rock, and one SBG sites) with no data representing the LATE and EARLY intervals and only a single VDM within the JHAP. In order to compare the paleointensity behavior between the different intervals, the overall reliability needs to be traded for a larger data set by relaxing the chosen requirements. Due to our conservative approach to evaluate the TRM criterion, many VDMs with TRM = 0 may in fact be based on a thermal remanence. Thus, we feel that removing this criterion has the least effect on the selected data reliability. However, the size of the accepted data set still remains very small (45 VDMs) with one and two data points within the LATE and JHAP periods, respectively. Of the remaining four prioritized criteria, we consider AGE and ALT the most critical to retain because they ensure that the accepted data are reliably dated and not affected by the experimental alteration. Hence, in our analyses, we used the two remaining possible prioritized combinations, AGE + ALT + STAT and AGE + ALT + MD (subsets 4 and 5, respectively).
First, we considered the combined data set (subset 1) that includes data obtained from all paleointensity carriers. In addition, in view of the apparent discrepancies between the characteristics of data sets obtained from different paleointensity recorders, we analyzed the corresponding data subsets separately in order to evaluate their properties and relative reliability.

Combined Paleointensity Data Set
The CNS interval for SUBSET 1 is characterized by the highest median VDM value (4.8 × 10 22 Am 2 ) and by a relatively large scatter (the interquartile range, IQR-the difference between third and first quartiles) of VDMs (Table 1 and Figure 3d). In contrast, the JHAP has the lowest median (2.5 × 10 22 Am 2 ) and IQR values (Table 1 and Figure 3d). To quantify the variation of paleointensities between the selected intervals, we  Table S2). (a) The site-mean VDMs based on bulk rocks (grey circles), submarine basaltic glass (blue triangles), and single silicate crystals (red diamonds). The light blue curve shows the average reversal frequency calculated with a 5-Ma sliding window. The geomagnetic polarity timescale (Ogg, 2012) shows the normal/reversed polarity as black/white bars.  Table 1 and Figure 3d). To assess the field variability, we used the ratio of IQR to median VDM value: with a notion that the same VDM scatter (expressed as the IQR) may correspond to a low or high field variability depending on the characteristic (in our case, median) field strength.  Figure 3a). However, this paleointensity behavior is defined only by the VDMs from whole rock samples in the currently available paleointensity database (supporting information Figure S1). The MID interval contains no single crystal data and only two VDMs measured from SBG samples (Figure 3a and supporting information Table S2).
Application of the Q PI ≥ 3 cutoff (subset 2) removes a higher relative percentage of low (<3 × 10 22 Am 2 ) VDMs and results in a noticeable decrease in the number of data points for the JHAP (n = 10) and EARLY (n = 16) periods (Figure 5a) (Figure 5a). Both the subset 1 and Q PI -screened SUBSET 2 data show negative correlation between the median VDMs, IQR, V%, and P% values and the average reversal rate (Table 1 and Figure 5d).
A further increase of the Q PI cutoff to Q PI ≥ 4 (subset 3) results in reduction of the data set to N = 199, with the JHAP and EARLY intervals now containing 10 and 6 data points, respectively (Figure 6a and Table 1).   Figure 6d).
The two prioritized combinations, subsets 4 (AGE + ALT + STAT) and 5 (AGE + ALT + MD), result in much smaller data sets (n = 71 and n = 173, respectively) with as few as two data points within each time interval (Table 1 and Figures 7 and 8). Both data sets reveal the strongest field during the CNS and a negative correlation between the P% values and the average reversal rate. Although the median VDM value for the      JHAP appears comparable to those for the periods of the moderate reversal rates (for the subset 4), the P% values are consistently the highest and lowest for the CNS and JHAP, respectively (Table 1). In both cases, the greatest field variability (V%) is observed for the CNS whereas the JHAP is characterized by the lowest variability (Table 1).
In addition to our analysis of geomagnetic field strength variations using the combined data set, we considered the paleointensity data yielded by different paleomagnetic recorders. A detailed description of these analyses is given in supporting information. Overall, despite some Q PI -reliability differences between data sets derived from different paleointensity recorders, analyses of these data sets yielded a similar relationship between geomagnetic field strength and average reversal rate, supporting the inverse correlation between the two geomagnetic field properties.
A considerable reduction in data set size with strengthening the acceptance criteria, to the point when only two VDMs characterize a time interval, brings forth a reasonable question: whether the remaining data are sufficient to usefully represent the geomagnetic field strength within such poorly populated time bins? In order to check the robustness of our analyses, we also performed a Bayesian changepoint analysis on select data sets, following the procedure described in Ingham et al. (2014). The advantages of this method are that it can be applied to irregularly spaced data with no or poorly estimated errors, allowing us to quantify the number and positions of significant changes within a data series. Following the formulations of changepoint modelling problem of Ingham et al. (2014); see also Gallagher et al., 2011, and references therein), a regression model for paleointensity data (VDMs) is defined as a piecewise function, with the paleointensity model value changing at specific ages (changepoints) and being constant between the consecutive changepoints. The model parameters include the number of changepoints (n) within the time interval of a data series, locations of changepoints, paleointensities for each of the adjacent time intervals bounded by consecutive changepoints, and the error standard deviation (σ) quantifying the level of noise in the data series. The posterior distribution of model parameters is sampled using an iterative Bayesian method based on a transdimensional, reversible jump Markov chain Monte Carlo technique, which generates a large collection of models (typically 10 5 -10 6 ) approximating the posterior distribution of model parameters. This ensemble of models, which can be considered as a random sample drawn from the posterior distribution, is then used to define the average regression model for the time interval of interest, its 95% confidence limits, and ages corresponding to the most significant changes in paleointensity. A detailed description of this method has been published in the studies of Gallagher et al. (2011) and Ingham et al. (2014), and we refer the reader to these papers for more information.
Similarly, to Ingham et al. (2014), we used uniform prior distributions for the model parameters in our changepoint analysis. The prior ranges for model paleointensity values were set to the full range of VDM values for each analyzed data set. For the changepoint locations, we used a discrete approach (e.g., Gallagher et al., 2011), in which the possible locations were confined to an interval defined by the age limits of data in each particular data set at 1-Ma increments, and each unique configuration for a fixed number of changepoints n was assigned equal probability defined by the number of distinct configurations, that is, p = n! (K − n)!/K!, where K is the number of all possible changepoint locations. For the error standard deviation (σ), the prior range was set between 2.0 and 3.5 × 10 22 Am 2 , and for the number of changepoints, a uniform discrete prior distribution between n = 0 and 40 was used.
The input scale parameters for the proposal distributions associated with model perturbations involving a change of changepoint location, paleointensity value and data noise level were optimized to achieve the acceptance rates for proposed models in the range of 10% to 30%. Several test runs with 10 5 iterations were performed for each data set to fine tune the acceptance rates and to ensure that the Markov chain Monte Carlo sampler was converging to a stable state of sampling from the posterior distribution. For all analyzed sets of data, the log-likelihood function increased rapidly within the first few tens of thousand iterations and then remained stable with no systematic trend, indicating that the convergence was reached. The final run was then performed with 10 5 burn-in iterations (which were discarded) followed by 10 6 iterations used to define the average model.
The outcomes of the Bayesian changepoint analyses on data subsets 1 and 2 are presented in Figures 9a and  9b. The paleointensity model based on subset 1 indicates a noticeable decrease in average paleointensity from 4.8 to 2.8 × 10 22 Am 2 at approximately 171 Ma, coincident with the onset of the JHAP (Figure 9a).

Journal of Geophysical Research: Solid Earth
The changepoint was detected in roughly 20% of models. Another significant change of dipole moment was observed at~133 Ma, when average VDM increased from 2.8 to 5.0 × 10 22 Am 2 . These changes were detected by almost every sampled model (Figure 9a). Two spikes in the mean model VDM were observed at 117 and 95 Ma, suggesting substantial changes in paleointensity.
In order to assess the statistical significance of observed paleointensity changes, the Pearson correlation coefficient r was calculated between the average geomagnetic reversal frequency and model paleointensity data, calculated as mean values for 5, 10, and 15 Myr nonoverlapping windows. The resultant r values (r 5Myr = −0.510, r 10Myr = −0.591, and r 15Myr = −0.626) indicate a statistically significant at 0.05 level inverse correlation between the two field properties (critical r coefficient values are −0.374, −0.514, and −0.602 for data series with a time window width of 5, 10, and 15 Myr, respectively).
The changepoint model based on subset 2 yielded similar results (Figure 9b). Although no significant changepoint in paleointensity signal has been detected in the vicinity of the JHAP, the mean paleointensity model indicates a gradual decrease in average VDM between 200 and 134 Ma. A sharp spike in field intensity at~133 Ma is revealed by a changepoint detected in 70% of sampled models with associated increase in average VDM from 3.8 to 5.0 × 10 22 Am 2 . Two additional spikes in the mean model VDM at 117 and 95 Ma are similar to those observed for the model based on the subset 1 data and can be attributed to VDMs obtained from single silicate crystals. Finally, the mean paleointensity model indicates a change in paleointensity revealed by 25% of sampled models at~86 Ma (Figure 9b). The decrease in paleointensity from 5.0 to 4.3 × 10 22 Am 2 is associated with the end of the CNS. Similar to the outcome of the subset 1 analysis, the calculated correlation coefficients (r 5Myr = −0.505, r 10Myr = −0.575, and r 15Myr = −0.638) indicate that observed paleointensity trends are inversely correlated with the geomagnetic reversal frequency at a 0.05 significance level.
Finally, the Bayesian changepoint analysis was applied to the data set filtered using the Ingham et al. (2014) selection approach (Figure 9c). A significant drop in paleointensity from~6.2 to 4.4 × 10 22 Am 2 was detected by approximately 25% of models at~181 Ma. The model paleointensity signal displays a gradual decrease in average VDM between 181 and 134 Ma, followed by a sharp increase in intensity from 3.1 to 5.6 × 10 22 Am 2 at 133 Ma. The entire Cretaceous is characterized by rather stable field intensity with the exception of a peak in mean VDM at~105 Ma, followed by a stable intensity of~5.9 × 10 22 Am 2 . Although paleointensity trends are visually similar to those detected with analyses of subsets 1 and 2, the Pearson correlation coefficients are less conclusive. While the first of the data set representing 5-Myr increment is still characterized by significant anticorrelation between average paleointensity and reversal rate (r 5Myr = −0.497,r crit = −0.374), the other data sets do not show any statistically significant inverse correlation between the two field properties (r 10Myr = −0.529, r crit = −0.532 and r 15Myr = −0.583, r crit = −0.602).

Discussion and Conclusions
In comparison to the previous works that considered the long-term behavior of geomagnetic field strength (e.g., Biggin et al., 2012;Heller et al., 2002;Ingham et al., 2014;Tarduno & Smirnov, 2004;Tauxe & Yamazaki, 2007), the novel aspect of our investigation consists in application of a rigorous approach to estimate the reliability of paleointensity data using a revised suite of Q PI criteria (cf., Biggin & Paterson, 2014). Our analysis of the 65-to 200-Ma paleointensity database reveals systematic changes in the paleointensity behavior with respect to the average reversal rate (e.g., Figures 2 and 3), most importantly, an inverse correlation between the geomagnetic paleointensity and reversal frequency. This correlation is expressed by the consistently strongest median VDM values observed for the CNS, and the low median VDM values observed for the JHAP. In addition, the highest and lowest percentages (P%) of the data exceeding the overall (65-200 Ma) median VDM value appear to be characteristic properties of the CNS and JHAP, respectively, although in some instances the difference between the median VDMs for the JHAP and the periods of moderate reversal rate becomes subtle (e.g., Figure 5). Interestingly, the data indicate that while the intensity range and variability of the field are both inversely correlated with the reversal rate, the minimum intensity envelope has a much weaker correlation (Figures 2 and 3 and Table 1). We note, however, that Pearson correlation coefficients in most cases indicate relatively weak inverse correlations between the median VDM values and average reversal frequency that are not always statistically significant. This is largely due to a low number of time intervals (N = 5). A simple statistical comparison of paleointensity signals for the CNS and JHAP using a nonparametric Wilcoxon rank sum test for equal medians indicates that median VDMs during the CNS are consistently greater than those during the JHAP (at the 5% significance level). This observation is independent of data selection approaches (Table 2). We further note that the P% values are characterized by much stronger inverse correlation with average reversal rate. The Bayesian changepoint analyses conducted on selected data sets (subsets 1 and 2) indicate inverse correlation between the mean VDM values and reversal rate. These analyses also support our observation that a significant increase in paleointensity preceded the onset of the superchron by a few millions of years.
While the long-term trends in paleointensity behavior we identified are, in many ways, similar to those revealed by the prior studies (see section 1), an important new observation is that the observed paleointensity trends are independent of the rigor and combinations of the used reliability criteria and are supported by a variety of statistical tests. This indicates that these observations are robust and that the detected paleointensity trends likely represent characteristic features of the Mesozoic geomagnetic field.
Overall, our analyses suggest that the nonreversing CNS dynamo was capable of generating a field reaching stronger intensities whereas the frequently reversing JHAP dynamo could not produce strong field intensities. The consistently weak geomagnetic field intensity during the JHAP indicated by our analyses is in agreement with the decreased amplitude of the Late Jurassic marine magnetic anomalies that correspond to the age of the Site 801 basalts (e.g., McElhinny & Larson, 2003;Tarduno & Cottrell, 2005). Our results seem to support a view that the apparent low field strength during the MDL may reflect a low-field bias due to rock magnetic effects, and, instead, that the real "MDL" is reduced to an interval of high reversal frequency in the late Jurassic at~150 to 170 million years ago (Goguitchaichvili et al., 2002;Smirnov et al., 2017;Tarduno & Cottrell, 2005).
In addition to the robust field features discussed above, our analyses hint at two other potentially important features in the field behavior. First, a sharp change in the character of data at~133 Ma (e.g., Figure 3 and supporting information Figure S8a) suggests an interesting possibility that the geodynamo transitioned from a regime producing a less variable and weak field, to a regime able to generate significantly stronger and more variable intensities typical for the CNS, about 10 Myr before the onset of the superchron. However, this observation is entirely based on whole rock data, and while the data sets after the Q PI ≥ 3 and Q PI ≥ 4 cutoffs reveal the same basic features, both prioritized approaches reduce the scatter for the CNS and late MID intervals by removing both very low and very high VDMs (Figures 7 and 8 and supporting information Figure S4 and Table S3). Moreover, the AGE + ALT + MD approach reduces the VDM range for the entire MID interval to a level comparable with the JHAP values ( Figure 8 and supporting information Figure S4d and Table  S3). Thus, whether a dramatic change in geodynamo regime preceded the start of CNS by several millions of years remains inconclusive. More high-quality paleointensity data, obtained with experiments designed to maximize the overall technical quality and reliability of data, are required to assess whether this observation is robust.
Second, the data for the first 10-15 Myr of the LATE interval, mostly represented by the VDMs derived from SBG, hint that the "CNS" geodynamo regime producing a more variable field able to reach high intensities could have continued for~10-15 Myr after the cessation of CNS (Figure 3a and supporting information Figures S5a and S8b). Unfortunately, this observation also remains highly tentative because the application of the Q PI ≥ 3 cutoff removes all the SBG data from the early LATE interval (supporting information Figure S6). Interestingly, the SBG data with high Q PI scores contain substantially fewer low VDMs for the CNS and exhibit a much stronger inverse correlation between the intensity and variability of VDMs and the reversal rate (supporting information Figure S6).
However, it is noteworthy that while the data sets obtained from different recorders have the strongest and weakest VDMs for the CNS and JHAP, respectively, the field variability for the CNS suggested by the single crystal data (V% = 10.9%) is several times lower than the variability suggested by the whole rock and SBG data (e.g., Figure 3 and Table 1). Unfortunately, the limited number of single crystal data for the Jurassic does not allow estimating the field variability for that period. However, the plagioclase data from the~56-Ma Nintoku Seamount (The Ocean Drilling Program (ODP) Site 1205) representing a time of moderate reversal rate, indicate nearly three times higher field variability than that of the CNS data (Tarduno & Cottrell, 2005). Thus, in contrast to the whole rock and SBG data, the single crystal data may suggest a more stable field for the superchron in comparison with a time of moderate reversal frequency, consistent with the hypothesis about an inverse correlation between the field stability and reversal rate.
While it is conceivable that the field intensity and variability could have varied throughout the CNS, a less variable field during the superchron indicated by the single crystal data is qualitatively consistent with the analyses of PSV based on paleomagnetic directional data that indicate a more antisymmetric (axial dipole dominated) and stable geodynamo for the CNS (e.g., McFadden et al., 1991;Tarduno et al., 2001Tarduno et al., , 2002Biggin, van Hinsbergen, et al., 2008;Doubrovine et al., 2019). However, the question whether the inconsistency in the field strength variability inferred from different paleointensity recorders represents the true field behavior, or it reflects intrinsic differences in the carriers' ability to record and preserve the paleointensity signal, is outside of the scope of this paper. We refer the reader to the plentiful literature relevant to this topic (e.g., Bowles et al., 2011;Draeger et al., 2006;Fabian, 2009;Ferk et al., 2012;Heller et al., 2002;Smirnov et al., 2017;Smirnov & Tarduno, 2003Tarduno et al., 2006;Tarduno & Smirnov, 2004;Tauxe, 2006;Tauxe et al., 2013;Tauxe & Yamazaki, 2007;Xu & Dunlop, 2004;Yamamoto, 2006;Yamamoto et al., 2003).
The use of modified Q PI reliability criteria in our investigation helped us to demonstrate the robustness of the general trends in the behavior of geomagnetic field strength. However, it is important to note that different data selection approaches have been used in paleointensity research and that the choice of the selection criteria may dramatically affect the outcomes of the analyses of paleointensity database. For example, Ingham et al. (2014) who did not find any statistically significant correlation between the field strength and reversal rate, restricted their final data set to the VDMs obtained using the Thellier-Thellier and microwave families of paleointensity methods with pTRM checks as well as the low-temperature demagnetization-double heating Shaw method. Their selection approach excluded nearly 75% of the site-mean VDMs initially selected for our analysis. In addition, Ingham et al. (2014) accepted evidently biased data such as those based on non-TRM remanence (e.g., Kono, 1974;Sherwood et al., 1993). We note however that, for the 65-to 200-Ma period, most VDMs excluded from our analyses as intrinsically unreliable (i.e., based on non-TRM remanence and/or affected by laboratory alteration) were also excluded by the Ingham et al. (2014) data selection approach. It should also be borne in mind that a significant cause of the difference in the conclusions of that study and the present may be simply that several important studies (e.g., Tauxe et al., 2013;Dodd et al., 2015;Sprain et al., 2016) were published too recently for their results to be included in Ingham's version of the PINT database.
The Ingham et al. (2014) data selection applied to our updated paleointensity database results in a preferential removal of low Q PI VDMs, leaving a total of 202 accepted VDMs for 65-200 Ma. The data set, however, exhibits a cardinally different VDM versus time behavior than that seen from the data filtered with the Q PI suite ( Figure 10). While the JHAP is still characterized by the lowest median VDM value, the Cretaceous is characterized by a very weak positive correlation between the median VDM and the average reversal rate, with the strongest field reached during the MID interval. We feel that the observed discrepancy between the outcomes of Ingham et al.'s (2014) and our analyses stems from omitting a large class of VDMs on the basis of paleointensity method and from using some reportedly biased VDMs in the former. This observation is further supported by the outcomes of our Bayesian changepoint analyses, which indicate a number of significant changes in average paleointensity signal with stronger field during the CNS, compared to the other time intervals (Figures 9a and 9b). The Bayesian approach also suggests a weaker field during the JHAP as indicated by mean paleointensity signal when the subset 1 was used. However, this observation remains somewhat tentative as this change in average VDM for the JHAP disappears when subset 2 was used for the changepoint modeling. Albeit somewhat different, the outcomes of Bayesian analyses based on subsets 1 and 2, both reveal the statistically significant inverse correlation between the reversal frequency and paleointensity during 65-and 200-Ma time interval. In contrast, this analysis, applied to the data selected using the Ingham et al. (2014) approach, yielded discrepant results indicating rather no statistically significant correlation between the two geomagnetic field properties.
A natural question arises at this point-how should the relative effectiveness of the Q PI suite or any other set of selection criteria be judged? In the absence of theoretical constraints, such a judgment cannot be based on our expectations about the long-term field strength behavior (in fact, reliable paleointensity data are the critical prerequisite to constrain theoretical and numerical models). The reliability of paleointensity data should be assessed based on careful consideration of the mechanisms of acquisition and preservation of the primary magnetization as well as on the analyses of the potential for natural and experimental alteration, and statistical bias. We feel that the modified Q PI criteria suite assesses all these factors in an adequate and systematic manner to provide a consistent approach for quantifying our confidence in the reliability of the data. We note, however, that even the comprehensive Q PI suite cannot address all possible data reliability issues, for example, related to rock magnetic biases such as a potential thermochemical remanence (TCRM)-related bias. In addition, some Q PI criteria leave room for subjective interpretation (e.g., AGE, TRM, or LITH).
In our effort to refine the paleointensity database for the 65-to 200-Ma period, we attempted to maximize the objectiveness of data quality assessment and selection. This entailed very lengthy reading and discussion sessions that ensured that published data were treated consistently according to a consensus methodology. Overall, the results of our analyses lend first-order support to the inverse correlation of geomagnetic field strength with the frequency of geomagnetic reversals. This observation, in turn, suggests that variations of these quantities on a scale of tens to hundreds of millions of years during the Mesozoic were likely controlled by the same geodynamic processes. The similarity of this time scale with that of the overall mantle convection may indicate that perturbations of the thermochemical conditions across the core-mantle boundary plays an important role in controlling the overall geodynamo efficiency. However, deciphering shorter-term variations of the geomagnetic field intensity is hampered by the low resolution and uneven distribution of data in the current paleointensity database. Therefore, we stress that acquisition of new paleointensity data with experiments designed to maximize data reliability is crucial for better understanding of the geomagnetic field evolution and for robust geodynamo modeling. These new studies, however, must aim to be as rigorous as possible by reporting the broad range of supporting data and rock-magnetic investigations that strengthen our confidence in paleointensity estimates, as embodied by, for example, the Q PI criteria.