Some Experiments in Extreme‐Value Statistical Modeling of Magnetic Superstorm Intensities

In support of projects for forecasting and mitigating the deleterious effects of extreme space weather storms, an examination is made of the intensities of magnetic superstorms recorded in the Dst index time series (1957–2016). Modified peak-over-threshold and solar cycle, block-maximum samplings of the Dst time series are performed to obtain compilations of storm maximum −Dst_m intensity values. Lognormal, upper limit lognormal, generalized Pareto, and generalized extreme-value model distributions are fitted to the −Dst_m data using a maximum-likelihood algorithm. All four candidate models provide good representations of the data. Comparisons of the statistical significance and goodness of fits of the various models give no clear indication as to which model is best. The statistical models are used to extrapolate to extreme-value intensities, such as would be expected (on average) to occur once per century. An upper limit lognormal fit to peak-over-threshold −Dst_m data above a superstorm threshold of 283 nT gives a 100-year extrapolated intensity of 542 nT and a 68% confidence interval (obtained by bootstrap resampling) of [466, 583] nT. A generalized extreme-value distribution fit to solar cycle, block-maximum −Dst_m^BM data gives a nine-solar cycle (approximately 100-year) extrapolated intensity of 591 nT. The Dst data are found to be insufficient for providing usefully accurate estimates of a statistically theoretical upper limit for magnetic storm intensity. Secular change in storm intensities is noted, as is a need for improved estimates of pre-1957 magnetic storm intensities.


Plain Language Summary
Estimating the statistical likelihood of extreme-value geomagnetic storm intensities depends on the selection of data and models. The data used here are discrete samples of a standard time series that measures geomagnetic disturbance. Different methods are examined for identifying data values that can serve as nonredundant representations of storm maximum intensity. Four different statistical models (two formal models and two physically motivated models) are examined as candidate descriptions of the data. Comparisons and contrasts are made of the goodness of model descriptions of the data. Extrapolations are made to estimate once-per-century storm intensities; an upper limit on storm intensity cannot be confidently established; slow secular change is noted in the occurrences of intense storms. General suggestions are made for improving inferences about the future occurrence of intense storms. Results inform projects concerned with forecasting and mitigating the deleterious effects of extreme space weather storms.

The Dst Time Series
The Dst index time series is the longitudinal average of low-latitude, ground-level geomagnetic disturbance (Sugiura, 1964; World Data Center for Geomagnetism et al., 2015). It is useful for identifying the primary evolutionary phases of individual magnetic storms (e.g., Loewe & Prölss, 1997; McPherron) and for long-term statistical analysis of storm occurrence rates. During the main phase of a magnetic storm, intensification of the westward directed magnetospheric ring current causes a proportional decrease in low-latitude, horizontal-component geomagnetic intensity. The resulting storm time decrease of Dst to negative values (e.g., Daglis, 2001) is so distinctive that it is sometimes taken to be the defining characteristic of a magnetic storm (e.g., Gonzalez et al., 1994).
The Kyoto World Data Center in Japan calculates the standard version of Dst from hourly mean, geomagnetic time series collected at four long-running observatories (Sugiura & Kamei, 1991): Hermanus (South African National Space Agency) (e.g., Kotzé, 2018), Kakioka (Japan Meteorological Agency) (e.g., Minamoto, 2013), and Honolulu and San Juan (both operated by the U.S. Geological Survey) (e.g., Love & Finn, 2011). A nonstormy, quiescent baseline is subtracted from each observatory time series; the residual disturbances are averaged, with latitude weighting, to give the 1-hr resolution Dst time series. Each hourly Dst value is centered on the bottom of the Universal Time hour: 00:30, 01:30, etc. We use Dst for years 1957-2016, a 60-year extent of time that encompasses numerous magnetic storms of various intensities that occurred over the course of six solar cycles, from the rising phase of solar cycle 19 through to just past the maximum of cycle 24; see Figure 1. There are 525,960 separate hourly Dst data values in the time series considered here; there are no gaps or obvious erroneous values.

Sampling for Extreme Intensities
The Dst time series is autocorrelated; especially from hour to hour, its values are serially dependent. Individual magnetic storms typically evolve over the course of a couple of days (or so), and within the depth of a storm's main phase, the particular hour when −Dst attains its maximum value is typically preceded and followed by durations of time that also exhibit negative Dst values. Furthermore, the occurrence of one storm is often related in time to the occurrence of another storm. So, for example, a persistent sunspot group might be the source of several coronal mass ejections causing separate magnetic storms (e.g., Gonzalez et al., 2011); similarly, long-lasting coronal holes can give rise to a series of magnetic storms, often separated in time by the duration of one solar rotation (e.g., Richardson & Cane, 2012; Tsurutani et al., 2006). Furthermore, an unusually active solar cycle can give rise to more extremely intense magnetic storms than a relatively subdued solar cycle, and, insofar as the activity level of a given solar cycle represents the state of the quasi-oscillatory solar dynamo, magnetic storms within each cycle are not strictly statistically independent. Recognizing the autocorrelational properties of the Dst time series while, at the same time, seeking to characterize the occurrence rate of extreme-intensity magnetic storms in terms of statistical models, we sample the Dst time series in two different ways that are inspired by standard (conditional) sampling methods used in extreme-event analysis (e.g., Coles, 2001; Mudelsee, 2014, Chapter 6).
First, we use a modified "peak-over-threshold" algorithm similar to that used by others in related analyses (Lotz & Danskin, 2017; Love et al., 2015, 2018). The 1-hr Dst values are treated in a manner illustrated in Figure 2 for a series of storms that occurred in September 1957. We begin by ranking all of the Dst values by size; the greatest maximum −Dst_m value is identified from among all the storms, and this value is set aside for statistical analysis; Dst values adjacent in time to this maximum, out to where Dst recovers to a chosen quiet level, are removed from further consideration, and the process is repeated. The quiet level is chosen so that, itself, it is significantly less than the −Dst_c threshold delimiting the weakest storm maximum value we analyze. We note that three of the four most intense storms of 1957–2016 occurred in 1957, 1958, and 1959, each within solar cycle 19. To estimate the probability of such a clustering, we recognize that the number of ways k indistinguishable objects can be arranged into n different boxes is $\binom{n+k-1}{k}$. The probability, then, that three out of four storms would be realized in any one of six solar cycles is $6 \times \binom{5}{1} / \binom{9}{4} = 30/126 \approx 0.2381$. This is not an especially low probability, and so inter-solar cycle clustering might be regarded as relatively small. Second, we use "block-maximum" sampling, whereby peak data values are taken from representative windows of time covering a sequence of data. Block-maximum sampling is used, especially, in the analysis of phenomena exhibiting periodic or quasiperiodic variation. So, for example, hydrological analysis is sometimes made of annual-maximum rainfall or annual-maximum river flow (e.g., Jenkinson, 1955; Westra et al., 2013). For this study of magnetic storm occurrence statistics, the most natural modulation to consider is that due to the solar cycle. As shown in Figure 1, for the six solar cycles (19 through 24) covered by the extent of the Dst time series 1957–2016, we identify six block-maximum −Dst_m^BM values, with the block-maximum storms occurring in July 1959, May 1967, July 1982, March 1989, November 2003, and March 2015.
Note that block-maximum sampling of solar cycles omits some very intense magnetic storms; the great storm of September 1957, for example, is not in the block-maximum sampling because it is eclipsed in intensity by the storm of July 1959. For comparison, in Figure 1, we show the seven most intense storms as defined by peak-over-threshold sampling, occurring in September 1957, February 1958, July 1959, May 1967, March 1989, March 2001, and November 2003; the storm of May 1967 had the same maximum intensity, to within the 1-nT resolution of Dst, as the storm of March 2001.
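The clustering probability quoted above can be reproduced with a short stars-and-bars count. This is an illustrative sketch (the function and variable names are ours, not the paper's):

```python
# Stars-and-bars check of the clustering probability quoted in the text:
# 4 intense storms distributed over 6 solar cycles, with 3 landing in one cycle.
from math import comb

def arrangements(k, n):
    # ways to place k indistinguishable storms into n distinguishable cycles
    return comb(n + k - 1, k)

total = arrangements(4, 6)                 # all placements: C(9, 4) = 126
favorable_per_cycle = arrangements(1, 5)   # remaining storm in one of 5 other cycles
p = 6 * favorable_per_cycle / total        # any of the 6 cycles may hold the trio
print(total, favorable_per_cycle, p)       # p = 30/126, about 0.2381
```

The count treats storms as indistinguishable and cycles as distinguishable, which is the combinatorial convention implied by the formula in the text.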

Theoretical Statistical Models
We assume that the solar-terrestrial conditions that gave rise to extremely intense magnetic storms in the past will be seen again in the future. Under that assumption, we use statistical models, fitted to the −Dst_m data, to estimate storm occurrence rates as a function of storm intensity and to make extrapolated estimates of the occurrence rate of storms that are even more intense than those recorded in the −Dst_m data. For many statistical analyses, the normal distribution is of fundamental importance; for a random variable x, with realizations usually assumed to be independent, the probability density function is

$$f_N(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad (1)$$

where μ and σ² are the x population mean (or "location") and variance (or "scale"), and the corresponding complementary cumulative distribution is

$$\bar{F}_N(x; \mu, \sigma) = \frac{1}{2}\,\mathrm{erfc}\left[\frac{x-\mu}{\sigma\sqrt{2}}\right]. \qquad (2)$$

The relative ubiquity of the normal distribution is due to the central limit theorem (e.g., Bohm & Zech, 2010, Chapter 13.1.2), according to which the addition of many underlying statistical processes, described in terms of distributions having means and variances, will result in data that are normally distributed.
In contrast, a positive random variable x is lognormally distributed if ln x is normally distributed. With a change of variables that conserves differential probability, the random variable x has the density function

$$f_L(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right], \qquad (3)$$

where ln x denotes the natural logarithm of x, and μ and σ² are the ln x population mean and variance. The corresponding complementary cumulative distribution is

$$\bar{F}_L(x; \mu, \sigma) = \frac{1}{2}\,\mathrm{erfc}\left[\frac{\ln x-\mu}{\sigma\sqrt{2}}\right]. \qquad (4)$$

It is important to recognize that a central limit theorem also applies for the lognormal distribution, according to which the multiplication of many independent random variables, each of which is positive and drawn from a distribution having a mean and variance, will be lognormally distributed (e.g., Aitchison & Brown, 1957, Chapter 2.6; Bohm & Zech, 2010, Chapter 3.6.10; Crow & Shimizu, 1988, p. 5).
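The change of variables can be checked numerically: the lognormal complementary cumulative of x is the normal complementary cumulative of ln x. A minimal sketch with illustrative parameter values:

```python
# Numerical check of the change of variables: the lognormal complementary
# cumulative of x equals the normal complementary cumulative of ln x.
import math

def normal_ccdf(x, mu, sigma):
    # P(X >= x) for a normal random variable, via the complementary error function
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def lognormal_ccdf(x, mu, sigma):
    # P(X >= x) for a lognormal: ln X ~ Normal(mu, sigma^2)
    return normal_ccdf(math.log(x), mu, sigma)

mu, sigma = 5.0, 0.4              # illustrative values only
x = math.exp(mu)                  # the median of the lognormal
print(lognormal_ccdf(x, mu, sigma))   # 0.5 at the median
```

At the median x = e^μ the exceedance probability is exactly one half, which provides a quick sanity check of the implementation.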
Following on from previous work (Love et al., 2015(Love et al., , 2018, we hypothesize that storm maximum intensities, as represented by −Dst m , are realized from a giant lognormal stochastic process. In terms of underlying processes that might give rise to a lognormal distribution of magnetic storm intensities, we note, first, that the solar dynamo, with quasiperiodic autoregressive amplification and deamplification of magnetic energy (e.g., Charbonneau, 2010;Conway, 1998;Solanki et al., 2000), is manifest as an 11-year modulation of sunspot number and a corresponding modulation of the rate of coronal mass ejections and geomagnetic storms. Second, the geoeffectiveness of solar wind-magnetosphere coupling is proportional to a multiplicative combination of solar wind velocity, density, and interplanetary magnetic field (e.g., McPherron et al., 1995;Newell et al., 2007). Several combinations of these solar wind variables are lognormally distributed (e.g., Burlaga & Lazarus, 2000;Veselovsky et al., 2010;Yurchyshyn et al., 2005). And, third, within-storm evolution of geomagnetic disturbance can be viewed as an autoregressive filter of solar wind forcing and past states of the magnetospheric-ionospheric system (e.g., Chandorkar et al., 2017;Vassiliadis et al., 1999). Considering these three processes together, then, we can plausibly invoke a lognormal model as a description of the many solar-terrestrial processes that give storm maximum −Dst m data. And, furthermore, we suggest that this lognormal model is most appropriate for the maximum intensities of the most intense storms that are both far-separated in time from previous and subsequent intense storms (and so relatively uncorrelated and causally unrelated), and, also, far removed from nonstatistical quiet time space weather conditions (which is highly autocorrelated).
As a further refinement, we can also consider the likely possibility that magnetic storm intensity has an effective upper bound. Vasyliūnas (2011) argues that ring current intensity is limited by a balance between plasma pressure within the magnetosphere and the pressure of the magnetic dipole. At the point of such a balance, he calculates that the greatest possible value for −Dst_m is approximately 2,500 nT. In this context, we might hypothesize that magnetic storm intensities are realized from an upper limit lognormal process. Here, the variable ln[x/(λ − x)] is normally distributed, and x is positive and upper limit lognormally distributed, with density function

$$f_U(x; \lambda, \mu, \sigma) = \frac{\lambda}{x(\lambda-x)\sigma\sqrt{2\pi}} \exp\left[-\frac{\left(\ln\left[x/(\lambda-x)\right]-\mu\right)^2}{2\sigma^2}\right], \qquad (5)$$

where x ∈ (0, λ); the corresponding complementary cumulative distribution is

$$\bar{F}_U(x; \lambda, \mu, \sigma) = \frac{1}{2}\,\mathrm{erfc}\left[\frac{\ln\left[x/(\lambda-x)\right]-\mu}{\sigma\sqrt{2}}\right] \qquad (6)$$

(Bezdek & Solomon, 1983; Mugele & Evans, 1951). We note that Vasyliūnas's upper limit on −Dst_m is far higher than that which occurred for the March 1989 storm, far higher than that realized for the storm of May 1921, −Dst_m = 907 nT (Love et al., 2019b), and far higher than for the Carrington event of 1859, approximately 850 nT (Siscoe et al., 2006) to 1,050 nT.
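A minimal sketch of the upper limit lognormal complementary cumulative, assuming the Mugele-Evans form in which z = ln[x/(λ − x)] is normally distributed on x ∈ (0, λ); λ is set to the 2,500 nT bound of Vasyliūnas (2011), and μ and σ are illustrative:

```python
# Upper limit lognormal exceedance probability (a sketch): the transformed
# variable z = ln[x / (lam - x)] is treated as Normal(mu, sigma^2) on (0, lam).
import math

def ull_ccdf(x, lam, mu, sigma):
    z = math.log(x / (lam - x))
    return 0.5 * math.erfc((z - mu) / (sigma * math.sqrt(2.0)))

lam, mu, sigma = 2500.0, 0.0, 1.0
x_med = lam * math.exp(mu) / (1.0 + math.exp(mu))  # point where z = mu
print(ull_ccdf(x_med, lam, mu, sigma))             # 0.5 at that point
print(ull_ccdf(lam - 1e-9, lam, mu, sigma))        # ~0 approaching the endpoint
```

The exceedance probability collapses to zero as x approaches λ, which is the defining feature distinguishing this model from the unbounded lognormal.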
In describing the statistics of extreme-value data, formal asymptotic theories can be invoked. These theories apply for data that are envisioned as having been taken, as conditional samples, from the large-value tail of some hypothetical "parent" distribution, far away from the core part of the distribution where central limit concepts usually hold. In particular, consider sequences of n independent data samples, each drawn from a "parent" distribution F⊙(x), which can be any one from a wide class of distributions. Denote one such sequence as Y_n = {x_1, x_2, ..., x_n} and its (conditional) maximum value as X_n = max(Y_n). Under the Fisher-Tippett-Gnedenko theorem (e.g., Albeverio & Piterbarg, 2006; Gomes & Guillou, 2015), after normalizing for location and scale and taking the limit n → ∞, the realization in probability of these "block maxima" is given by the generalized extreme-value distribution, for which the complementary cumulative is

$$\bar{F}_G(x; \mu, \sigma, \xi) = 1 - \exp\left\{-\left[1 + \xi\,\frac{x-\mu}{\sigma}\right]^{-1/\xi}\right\} \qquad (7)$$

(e.g., Bali, 2003; Walshaw, 2013). Depending on the "shape parameter," ξ, the generalized extreme-value distribution reduces to one of three standard and widely used distributions: Fréchet (ξ > 0), Gumbel (ξ → 0), and Weibull (ξ < 0). A particular parent distribution, F⊙, with block maxima that are approximately distributed by one of these three extreme-value distributions (Fréchet, Gumbel, and Weibull), is said to be in the "domain of attraction" of that particular extreme-value distribution (e.g., Albeverio & Piterbarg, 2006; Gomes & Guillou, 2015). In this respect, the lognormal distribution is in the domain of attraction of the Gumbel distribution (e.g., Embrechts et al., 1997, Example 3.3.31; Walshaw, 2013).
Alternatively, consider the maxima of sequences of data samples Y_n drawn independently from a parent distribution F⊙(x), suitably normalized and conditional on the maximum of each sequence exceeding some prechosen threshold θ, denoted as X_n = max(Y_n) > θ. Under the Pickands-Balkema-de Haan theorem, after taking the limits n → ∞ and θ → ∞, the realization in probability of these "peak-over-threshold maxima" is given by a generalized Pareto distribution (e.g., Arnold, 2008), for which the complementary cumulative is

$$\bar{F}_P(x; \theta, \sigma, \xi) = \left[1 + \xi\,\frac{x-\theta}{\sigma}\right]^{-1/\xi}. \qquad (8)$$

Depending on the parameters, the generalized Pareto distribution reduces to one of three standard and widely used distributions: exponential (ξ → 0), Pareto (ξ > 0), and beta (ξ < 0), where the power law (e.g., Newman, 2005), often the preferred distribution in extreme-value theory, is a special case of F̄_P for ξ > 0 (e.g., Arnold, 2008, p. 122; Cirillo, 2013, p. 5948). In particular, physical systems that can be described in terms of self-organizing criticality often exhibit event occurrences that can be described in terms of power law or "fractal" statistics (e.g., Aschwanden et al., 2016; Turcotte, 1999). A power law distribution can also result from an autoregressive process perturbed by multiplicative noise (e.g., Sornette) and from a lognormal process perturbed by additive noise (e.g., Montebruno et al., 2019). In the context of space weather, the occurrence of individual magnetic substorms, such as caused by abrupt collapses of the magnetotail, has sometimes been described in terms of self-organizing criticality dynamics (e.g., Angelopoulos et al.).
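The continuity of the generalized Pareto family in the shape parameter can be verified numerically; a sketch, with illustrative location, scale, and evaluation values:

```python
# Numerical check that the generalized Pareto complementary cumulative passes
# continuously to the exponential as the shape parameter xi goes to 0.
import math

def gpd_ccdf(x, theta, sigma, xi):
    # P(X >= x) for the generalized Pareto distribution, x >= theta
    if abs(xi) < 1e-12:
        return math.exp(-(x - theta) / sigma)     # exponential limit
    return (1.0 + xi * (x - theta) / sigma) ** (-1.0 / xi)

theta, sigma, x = 250.0, 60.0, 400.0   # illustrative values only
a = gpd_ccdf(x, theta, sigma, 1e-8)    # tiny positive shape
b = gpd_ccdf(x, theta, sigma, 0.0)     # exponential case
print(a, b)
```

The near-equality of the two values reflects the fact that the exponential is the ξ → 0 limit of equation (8), so fitted shape parameters near zero shade smoothly into the exponential case.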
It is important to recognize that extreme-value theory rests on the assumption that the extreme-value data on hand are actual (conditional) statistical samples of a parent distribution F⊙. Although the sampling methods assumed under extreme-value theory certainly inspire the designs of our sampling methods described in section 3, our fundamental motivation for performing a sampling of the Dst time series is removal of serial correlation, especially the intrastorm Dst values recording the continuous-time evolution within each storm. And since the Dst time series is not statistical, the −Dst_m values we obtain from Dst do not, in concept, closely resemble a conditional statistical sample of an underlying parent distribution F⊙, such as assumed under the Pickands-Balkema-de Haan theorem. That we have hypothesized that storm maximum intensities, as measured by −Dst_m, might be viewed as the realization of a lognormal process does not amount to an equating of −Dst_m with F⊙. On the other hand, the block-maximum −Dst_m^BM values, being the statistical maxima of many −Dst_m data values realized across the solar cycles, do closely resemble independent conditional statistical samples from a parent distribution F⊙, as assumed under the Fisher-Tippett-Gnedenko theorem. In this case, we might reasonably view the −Dst_m^BM as samples from the lognormal process we hypothesize as giving rise to the −Dst_m. Having said all of this, we recognize that all statistical models are idealizations. Therefore, we suggest that it is useful to compare and contrast the capacities of the generalized extreme-value and lognormal models for describing the −Dst_m data.

Asymptotic Properties of the Models
In examining the basic theoretical properties of the candidate model distributions, we consider, first, those supported on the infinite domain with right endpoint x → ∞. Easily seen is the effect of a horizontal shift of the exponential distribution (ξ = 0) by an amount h, which results in an exponential scaling of the distribution,

$$\bar{F}_E(x+h) = e^{-h/\sigma}\,\bar{F}_E(x). \qquad (9)$$

In the limit of large x, the Gumbel distribution has a similar property,

$$\bar{F}_G(x+h) \simeq e^{-h/\sigma}\,\bar{F}_G(x), \quad x \to \infty. \qquad (10)$$

But for the lognormal distribution,

$$\lim_{x \to \infty} \frac{\bar{F}_L(x+h)}{\bar{F}_L(x)} = 1. \qquad (11)$$

This means that with increasing x, the tail of a lognormal progresses toward 0 more slowly than an exponential distribution; the lognormal is "long-tailed." Generally speaking, with only pathological (nonsmooth) exceptions, a long-tailed distribution is also "heavy" and "subexponential" (e.g., Foss et al., 2013, Chapters 1 and 2; Zwart, 2009, Chapter 2.3), and, indeed, the lognormal is subexponential (e.g., Foss et al., 2013, Chapter 3.4). Usually, these properties are associated with the Fréchet and power law distributions (ξ > 0); we note that

$$\lim_{x \to \infty} \frac{\bar{F}_P(x+h)}{\bar{F}_P(x)} = 1 \qquad (12)$$

(e.g., Coles, 2001, Example 4.2). The subexponential property of the lognormal distribution holds even though it is within (outside) the domains of attraction of the exponential (power law) and Gumbel (Fréchet) distributions (Goldie & Resnick, 1988, Section 3).
Consider, next, a rescaling of the power law and Fréchet distributions (ξ > 0) by a positive factor r,

$$\bar{F}_P(rx) = r^{-1/\xi}\,\bar{F}_P(x), \qquad \bar{F}_F(rx) \simeq r^{-1/\xi}\,\bar{F}_F(x), \quad x \to \infty. \qquad (13)$$

This means that the power law distribution and the extreme-value end of the Fréchet distribution are "regularly varying" (e.g., Beirlant et al., 2004, Chapter 2.9.2; Bingham et al., 1987; Zwart, 2009, Chapter 2.1), "scale-invariant," or "self-similar" (e.g., Corral & González, 2019; Ghil et al., 2011; Sornette, 2006); that is, a scaling of the independent variable x gives a corresponding power scaling of the distribution. In contrast, the lognormal distribution does not have this self-similar property,

$$\lim_{x \to \infty} \frac{\bar{F}_L(rx)}{\bar{F}_L(x)} = 0, \quad r > 1. \qquad (14)$$

This means that, with increasing x, the tail of the lognormal progresses toward 0 faster than a power law (e.g., Malevergne et al., 2011, equation 3). Regular variation at infinity is a necessary and sufficient condition for a parent distribution to be in the Fréchet domain of attraction (e.g., Albeverio & Piterbarg, 2006, Chapter 3.2.1). The lognormal (14) is not regularly varying at infinity, and, indeed, its tail behavior is like that for the exponential and Gumbel distributions,

$$\lim_{x \to \infty} \frac{\bar{F}_E(rx)}{\bar{F}_E(x)} = \lim_{x \to \infty} \frac{\bar{F}_G(rx)}{\bar{F}_G(x)} = 0, \quad r > 1. \qquad (15)$$

From the preceding simple examination of shifting and rescaling properties, we understand that the lognormal distribution is rather special: although it resides in the exponential-Gumbel domain of attraction, it has properties that straddle boundaries between the exponential-Gumbel and the power law-Fréchet families of distributions.
And, finally, consider the model distributions defined over finite domains (shape parameter ξ < 0). As x increases toward the right endpoint x_m = θ − σ/ξ of the beta and Weibull distributions,

$$\bar{F}_P(x_m - u) = \left(\frac{|\xi|\,u}{\sigma}\right)^{-1/\xi}, \qquad \bar{F}_W(x_m - u) \simeq C\,u^{-1/\xi}, \quad u \to 0 \qquad (16)$$

(e.g., Hashorva et al., 2010, equation 2.6). These are symmetric with the corresponding properties of the power law and Fréchet distributions (13); indeed, equation (16) defines regular variation near the right endpoint (e.g., Albeverio & Piterbarg, 2006, Chapter 3.2.1). In contrast, for the upper limit lognormal, as x increases toward its right endpoint λ,

$$\lim_{u \to 0} \frac{\bar{F}_U(\lambda - ru)}{\bar{F}_U(\lambda - u)} = 0, \quad r < 1, \qquad (17)$$

which is antisymmetric with the corresponding properties of the lognormal (14) and exponential and Gumbel (15) distributions. As the right-hand endpoint is approached, the upper limit lognormal decreases more rapidly than either of the beta or Weibull distributions. Interestingly, even though the upper limit lognormal has, like the beta and Weibull, a finite right endpoint, it is not in their domains of attraction because it is not regularly varying (e.g., Albeverio & Piterbarg, 2006, Chapter 3.2.1).
The preceding observations demonstrate both asymptotic similarities and differences between the lognormal distributions and the formal extreme-value distributions. Perhaps most importantly, the lognormal (upper limit lognormal) distribution is seen to be heavy, even though its far tail converges toward 0 more rapidly than do the far tails of the Fréchet and power law (beta and Weibull) distributions. Practically speaking, this might potentially affect extrapolated estimates of occurrence rates: if the assumed distributional model is heavier (lighter) than that for the actual underlying process, then the far-tail, extreme-event occurrence rate might be overestimated (underestimated). The degree to which occurrence rate estimates are erroneous, and whether or not they are over- or underestimates, depends on how far out in the tail one extrapolates and the degree to which the model distribution approximates the actual process. We suggest that it is useful to compare and contrast extrapolations obtained from generalized extreme-value distributional models with those obtained from the lognormal models.
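The tail-weight ordering described above (lognormal heavier than exponential, lighter than a power law) can be illustrated numerically; the parameter choices here are arbitrary:

```python
# Tail-weight comparison at a large argument: the lognormal tail is far heavier
# than the exponential tail but far lighter than a power law tail.
import math

def lognormal_ccdf(x, mu=0.0, sigma=1.0):
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2.0)))

x = 100.0
ln_tail = lognormal_ccdf(x)
r_exp = ln_tail / math.exp(-x)     # vs a unit-scale exponential tail
r_pow = ln_tail / x**-1.0          # vs a power law x^(-1), x >= 1
print(r_exp, r_pow)
```

The first ratio is enormously larger than one and the second enormously smaller, matching the statement that the lognormal straddles the exponential-Gumbel and power law-Fréchet families.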
LOVE MAGNETIC SUPERSTORM STATISTICS 8 of 22

Reconnaissance Inspection of the Data
Before proceeding with our detailed statistical analysis, it is worth making a reconnaissance inspection of the −Dst_m data to help identify a −Dst_c threshold value, above which storm maximum intensities might be realized from the tail of a hypothetical statistical distribution. Within the wider extreme-value analysis community, it is standard to inspect plots of the mean (expected value) of the excess as a function of threshold (e.g., Coles, 2001; Scarrott & MacDonald, 2012). For the generalized Pareto distribution, the mean excess is linear in the threshold u,

$$E[x - u \mid x > u] = \frac{\sigma + \xi(u - \theta)}{1 - \xi} \qquad (18)$$

(e.g., Davison & Smith, 1990, Section 2; Coles, 2001, equation 4.9).
Recognizing this, in Figure 3, we plot the empirical mean excess E of the −Dst_m data as a function of −Dst_c threshold. Note that for thresholds of −Dst_m ≳ −Dst_c = 250 nT (N ≲ 38 storms), the excess is approximately linear, possibly indicating that storms of such intensities are realized from the distributional tail where generalized extreme-value approximations might apply (Tsubouchi & Omura, 2007, Figure 3). Conversely, storms with −Dst_m ≲ −Dst_c = 250 nT might not be realized from an extreme-value distributional tail. Here, we note that a −Dst_c ≈ 250 nT threshold is approximately the level said to define "superstorms" or "superintense" storms that are, almost exclusively, driven by interplanetary coronal mass ejections; less intense storms, those for which −Dst_m ≲ 250 nT, can be driven by either interplanetary coronal mass ejections or corotating interaction regions at the front of high-speed streams of solar wind (e.g., Gonzalez et al., 2011; Kilpua et al., 2017; Richardson & Cane, 2012). Recognizing that superstorms are essentially monocausal, it is, perhaps, not surprising that the excess over the threshold −Dst_c ≳ 250 nT apparently exhibits some asymptotic linearity.
But we should be careful to avoid overinterpretation of Figure 3. For the lognormal distribution, we note that, for large u, the mean excess

$$E[x - u \mid x > u] \simeq \frac{\sigma^2 u}{\ln u - \mu}, \quad u \to \infty, \qquad (19)$$

is, over a finite domain, approximately linear in u (e.g., Mitzenmacher, 2004; Perline, 2005). From this, we understand that, although linearity in mean excess might be indicative of data realized from the far-tail, asymptotic regime of a hypothetical distribution, this is not necessarily a Pareto; the data might be realized from some other distribution, even, possibly, a lognormal.
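An empirical mean-excess function of the kind plotted in Figure 3 takes only a few lines to compute; the sample values below are invented for illustration:

```python
# Empirical mean excess: the average of (x - u) over the data exceeding the
# threshold u, as plotted against u in a mean-excess (mean residual life) plot.
def mean_excess(data, u):
    exceed = [x - u for x in data if x > u]
    return sum(exceed) / len(exceed) if exceed else float("nan")

sample = [150, 180, 220, 260, 300, 360, 430]   # invented -Dst_m-like values
for u in (150, 200, 250):
    print(u, mean_excess(sample, u))
```

In practice, one inspects this quantity over a grid of thresholds and looks for an approximately linear region, with the caveat from the text that such linearity does not uniquely identify a generalized Pareto tail.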

Numerical Methods
For detailed analysis, we fit the candidate statistical models to the −Dst_m data by maximizing the likelihood (e.g., Bohm & Zech, 2010, Chapter 6.5)

$$\mathcal{L} = \prod_i \tilde{f}(x_i), \qquad (20)$$

where the product is taken over all the −Dst_m data, where

$$\tilde{f}(x) = \frac{1}{A}\,f(x), \qquad (21)$$

and where A is a normalizing factor,

$$A = \int_{-Dst_c}^{x_m} f(x)\,dx. \qquad (22)$$

We use a simplex algorithm (e.g., Press et al., 1992, Chapter 10.4) to obtain maximization and to estimate the parameters {μ, σ, ...}. Once this is done, the probability that a magnetic storm maximum −Dst_m exceeds some level x ≥ −Dst_c is given by the normalized complementary cumulative,

$$P(x) = \frac{1}{A}\,\bar{F}(x). \qquad (23)$$

We use a bootstrap method (e.g., Efron & Tibshirani, 1993), resampling the data (with replacement) and refitting each sampled data set with models, to estimate the statistical errors associated with extrapolated storm occurrence rates (e.g., Love et al., 2015; Riley & Love, 2017).
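The likelihood maximization can be sketched for one candidate model, a lognormal fitted to synthetic peak-over-threshold data, with the normalization over the region above the threshold included in the likelihood. Parameter values, sample sizes, and the use of SciPy's Nelder-Mead simplex are our illustrative choices, not the paper's:

```python
# Sketch: maximum-likelihood fit of a lognormal to synthetic peak-over-threshold
# data, normalizing the density over the region above the threshold.
import numpy as np
from math import erfc, log, sqrt
from scipy.optimize import minimize

rng = np.random.default_rng(0)
mu_true, sigma_true, c = 5.3, 0.5, 150.0   # illustrative "true" model and threshold

# draw lognormal "storm intensities" and keep only those above the threshold c
draws = rng.lognormal(mean=mu_true, sigma=sigma_true, size=20000)
data = draws[draws >= c][:2000]

def nll(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)              # keeps sigma positive during the search
    z = (np.log(data) - mu) / sigma
    # lognormal log-density at each datum
    logf = -np.log(data) - np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * z**2
    # normalizing factor A = P(X >= c); the fit is to the truncated density f/A
    A = 0.5 * erfc((log(c) - mu) / (sigma * sqrt(2.0)))
    return -(logf.sum() - len(data) * np.log(A))

# Nelder-Mead is a simplex algorithm, as in the text
fit = minimize(nll, x0=[5.0, np.log(0.7)], method="Nelder-Mead")
mu_hat, sigma_hat = float(fit.x[0]), float(np.exp(fit.x[1]))
print(mu_hat, sigma_hat)
```

Because the synthetic data were generated from the same model being fitted, the recovered parameters should land close to the generating values; with real −Dst_m data, the analogous fit would be repeated for each candidate distribution.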

Peak-Over-Threshold Fits
We begin our summary of detailed results with a presentation of fits of the (modified) peak-over-threshold −Dst_m data provided by the lognormal model, equation (3), the upper limit lognormal model, equation (5), and the generalized Pareto model, equation (8), where for the latter two cases, we "bound" the maximum allowable −Dst_m according to the suggestion of Vasyliūnas (2011) by fixing, respectively, λ = 2,500 nT and θ − σ/ξ = 2,500 nT (ξ < 0). In Figure 4, we show the cumulatives of the −Dst_m ≥ −Dst_c data and the cumulatives of the candidate models, each for a variety of different −Dst_c thresholds; fits with low (high) −Dst_c thresholds are to more (fewer) data. Qualitatively, the fitted model functions provide good representations of the −Dst_m data, especially for (say) the rare occurrences of magnetic superstorms with −Dst_c ≳ 283 nT (N ≲ 28 storms). On the other hand, for −Dst_c ≲ 141 nT (N ≳ 168), many more storms covering a much greater range of −Dst_m are fitted, but in these cases, the results are mixed; the lognormal and bounded upper limit lognormal provide reasonably good fits of the data, but the bounded generalized Pareto does not fit so well. This might be an indication that lower intensity −Dst_m values are not realized from an extreme-value distribution (as already suggested in section 6).

Statistical Significance
A measure of goodness of fit of the statistical models is provided by the Kolmogorov-Smirnov test (e.g., Bohm & Zech, 2010, Chapter 10.3.5), from which we obtain an estimate of the p value, the probability of realizing a discrepancy between the cumulative of a hypothetical model and the data that is as large as or larger than that actually observed (e.g., Bohm & Zech, 2010, Chapter 10.2.4). Good (poor) fits have large (small) probabilities. For a lognormal model fitted to the data for a superstorm threshold of −Dst_c = 283 nT, Figure 4f, the p value is 0.97 (a high probability). But this result, like those we have previously reported for similar analyses (e.g., Love et al., 2015), is based on data that were used to estimate the model parameters; it is circular logic (we now recognize) to calculate a p value using the same data used to estimate a model's parameters (e.g., Chave, 2017). Even so, the available goodness-of-fit measures are consistent with the lognormal, upper limit lognormal, and generalized Pareto models as descriptions of the peak-over-threshold −Dst_m data for −Dst_c ≳ 283 nT. We might interpret this as indicating that high-intensity −Dst_m are realized from a distributional tail where generalized extreme-value approximations might apply (also as suggested in section 6).
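For reference, a Kolmogorov-Smirnov test against a fully prespecified (not fitted) distribution avoids the circularity described above; a sketch with synthetic data and illustrative parameters:

```python
# Kolmogorov-Smirnov test of synthetic data against the distribution that
# actually generated it, so no parameters are estimated from the tested data.
import numpy as np
from scipy.stats import kstest, lognorm

rng = np.random.default_rng(1)
sample = lognorm.rvs(s=0.5, scale=np.exp(5.3), size=200, random_state=rng)

stat, p = kstest(sample, lambda x: lognorm.cdf(x, s=0.5, scale=np.exp(5.3)))
print(stat, p)   # a good fit has a small statistic and a large p value
```

When the reference parameters are instead estimated from the same sample, the KS statistic is biased low and the nominal p value biased high, which is the circularity the text warns against.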

Extreme-Event Extrapolations
Of particular interest in projects for mitigating the deleterious effects of intense space weather events (e.g., Bernabeu, 2013; Cannon et al., 2013; National Science and Technology Council, 2015; North American Electric Reliability Corporation, 2016) is a 100-year magnetic storm benchmark (e.g., Love et al., 2015, 2016b, 2018; Pulkkinen et al., 2012; Riley & Love, 2017). Occasionally, interest is also expressed in a 1,000-year storm benchmark (e.g., Gopalswamy; Oughton et al., 2018). Therefore, in Figure 4, we show model extrapolations to both 100- and 1,000-year storm intensities. For a superstorm threshold of −Dst_c = 283 nT (N = 28), Figure 4f, we note that extrapolations to 100-year storm intensities range from 520 nT (generalized Pareto) to 545 nT (lognormal), a difference of 25 nT. On the other hand, for a much higher threshold of −Dst_c = 400 nT (N = 5), Figure 4h, extrapolations to 100-year storm intensities range from 524 nT (lognormal, generalized Pareto) to 527 nT (upper limit lognormal), a difference of just 3 nT. For the lognormal model, the difference between the 100-year extrapolation for −Dst_c = 283 nT and that for −Dst_c = 400 nT is 21 nT.
Interest in 1,000-year space weather events must be tempered by extreme caution. One might, for example, imagine that 1,000-year storms are qualitatively different from 100-year storms, or, at least, one might imagine that they are different from those that have occurred since the start of Dst in 1957. From Figure 4, for a threshold of −Dst_c = 283 nT, extrapolations to 1,000-year storm intensities range from 650 nT (generalized Pareto) to 744 nT (lognormal), a difference of 94 nT. For a much higher threshold of −Dst_c = 400 nT, extrapolations to 1,000-year storm intensities range from 651 nT (generalized Pareto) to 696 nT (lognormal), a difference of just 45 nT. For the lognormal model, the difference between the 1,000-year extrapolation for −Dst_c = 283 nT and that for −Dst_c = 400 nT is 48 nT. We conclude that, at least for high thresholds, extrapolations to 1,000-year storm intensities are relatively insensitive to model selection. It is noteworthy that all of the 1,000-year extrapolated intensities are far below the upper limit of 2,500 nT suggested by Vasyliūnas (2011).
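The once-per-T-year extrapolation can be cast as a root-finding problem: with n storms above the threshold c observed in a span of years, the T-year intensity x solves (n/span) · P(X ≥ x | X ≥ c) = 1/T. A minimal bisection sketch with an illustrative lognormal tail model (not the paper's fitted parameters):

```python
# Sketch: solving for a T-year storm intensity given a fitted tail model.
import math

def lognormal_ccdf(x, mu, sigma):
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2.0)))

def return_level(T, n, span, mu, sigma, c):
    target = span / (T * n)             # required conditional exceedance probability
    tail_c = lognormal_ccdf(c, mu, sigma)
    lo, hi = c, 10000.0                 # bracket for the bisection search
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lognormal_ccdf(mid, mu, sigma) / tail_c > target:
            lo = mid                    # exceedance still too likely: move up
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 28 superstorms in 60 years, threshold 283 nT; mu and sigma are illustrative
x100 = return_level(T=100, n=28, span=60, mu=5.0, sigma=0.6, c=283.0)
print(x100)
```

The same scaffolding yields the 1,000-year level by setting T=1000, with the caveat from the text that such deep extrapolations lean heavily on the assumed tail shape.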

Superstorm Threshold
In Figure 5, we show the 100- and 1,000-year extrapolated storm intensities for the candidate models, together with 68% (one standard deviation) confidence intervals obtained by bootstrap resampling, refitting, and reextrapolation, each as a function of −Dst_c threshold. Consistent with what might have been expected from section 6, the extrapolated intensities for the generalized Pareto model appear to converge for a superstorm threshold of −Dst_c ≳ 283 nT (N ≲ 28), and, notably, so do the extrapolated intensities for the lognormal and upper limit lognormal models. The threshold of −Dst_c = 283 nT appears to be the (approximate) critical point above which storm intensities are realized from the extreme-value distributional tail that can be well approximated by any of the candidate models. For the upper limit lognormal, and for −Dst_c = 283 nT, the 100-year intensity and 68% confidence interval are 542 [466, 583] nT; the corresponding 1,000-year intensity and confidence interval are 715 [552, 797] nT. The statistical uncertainty of these estimates is considerable; the 100- and 1,000-year confidence intervals overlap.
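The bootstrap confidence intervals described above follow a resample-refit-reextrapolate pattern; this sketch replaces the refit-and-extrapolate step with a placeholder quantile statistic, and all numbers are illustrative:

```python
# Sketch of the percentile-bootstrap confidence interval: resample with
# replacement, recompute a statistic, and take the 16th and 84th percentiles.
import numpy as np

rng = np.random.default_rng(2)
data = rng.lognormal(mean=5.3, sigma=0.5, size=28)   # stand-in for 28 -Dst_m values

def statistic(sample):
    # placeholder; in the paper this is a model refit plus a 100-year extrapolation
    return np.quantile(sample, 0.9)

boot = [statistic(rng.choice(data, size=len(data), replace=True))
        for _ in range(2000)]
lo, hi = np.percentile(boot, [16, 84])   # 68% (one standard deviation) interval
print(lo, hi)
```

The 16th-84th percentile band of the bootstrap replicates corresponds to the one-standard-deviation intervals shown in Figure 5.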
We note that the March 1989 storm, with −Dstm = 589 nT the most intense storm recorded in the Dst time series, was more intense than the upper limit lognormal 100-year extrapolation above the superstorm threshold. In this respect, the March 1989 storm might be considered to be more than a 100-year storm (e.g., Love et al., 2018), possibly one that just happened to have occurred within the 60-year span of the Dst time series. Indeed, for an upper limit lognormal model with −Dstc = 283 nT, the March 1989 storm had an intensity that is expected to be exceeded (on average) only once per 189 years. So, in that sense, the March 1989 storm was remarkable (see section 14).
Interestingly, the second most intense storm in the Dst time series, that of July 1959 with −Dstm = 429 nT, falls below the 100-year confidence interval. Furthermore, we recognize that the 100-year storm intensity estimated here is lower than the 645 nT value given by Tsubouchi and Omura (2007, Section 5.2) and much lower than the 880 nT given by Love et al. (2015); in contrast to the method used here, the latter study used a low threshold (−Dstc = 63 nT, far below superstorm level) so as to include as many storms as appeared to be consistent with a lognormal model. We note, however, that our 100-year extrapolated intensity of 542 nT falls within the (very wide) 95% confidence interval given by Love et al. (2015).

Figure 5.
Peak-over-threshold extrapolated (a) 100- and (b) 1,000-year storm intensities for the lognormal (blue), bounded generalized Pareto (green), and bounded upper limit lognormal (red) models, each for a range of −Dstc thresholds. Also shown are 68% (one standard deviation) confidence intervals (dotted lines) obtained by bootstrap resampling, refitting, and reextrapolation.

12. Statistically Elusive Upper Limit
Because it is physically motivated, we have, until now, fixed the upper-limit parameter in the upper limit lognormal model (5) and the corresponding parameter in the generalized Pareto model (8) by imposing the 2,500 nT upper bound on the intensity of a magnetic storm given by Vasyliūnas (2011). If, instead, these parameters are allowed to assume their optimal values as free parameters in maximum-likelihood fits of the −Dstm data, one might hope to infer a theoretical upper limit from the statistics of the −Dstm data themselves. To investigate this, we fitted generalized Pareto and upper limit lognormal models to the −Dstm data with both parameters free and for a range of −Dstc thresholds.
In Figure 6a, we see that, for −Dstc ≳ 218 nT, the estimated generalized Pareto shape parameter changes from positive to negative depending on the chosen threshold; this means that the data are insufficient, on purely statistical grounds, for determining whether the far tail of −Dstm is heavy (and unbounded) or light (and bounded). Relatedly, in Figure 6b, we see that, for −Dstc ≳ 218 nT, the −Dstm data are, again on purely statistical grounds, insufficient for inferring a usefully accurate theoretical upper limit on storm intensity. From this, we understand that fixing an upper limit on storm intensity provides a degree of stability in maximum-likelihood estimation of model parameters. We doubt the estimate drawn by Acero et al. (2018), based only on qualitative grounds, that magnetic storm intensity has an upper limit of −Dstm ≈ 850 nT; the data do not support such an estimate. If a most extreme storm intensity estimate is to be made, it needs to be quoted with confidence intervals, but from what we have seen here, those intervals are likely to be very wide.

Figure 6.
Shape parameter (a) for the generalized Pareto model and (b) the estimated upper limit for the upper limit lognormal (red) and for the generalized Pareto (green) models, each for peak-over-threshold fits of −Dstm data for a range of −Dstc thresholds. For an estimated shape parameter ≥ 0, the generalized Pareto does not have a finite upper limit, and its tail is said to be "heavy."
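The connection between the freely fitted shape parameter and the implied upper limit can be illustrated as follows. In scipy's parameterization, genpareto has shape c, and a finite endpoint at threshold − scale∕c exists only when c < 0. The synthetic sample and parameter values below are assumptions; a large sample is used so the fit is stable, whereas with only a few tens of superstorm values the estimated shape can change sign with threshold, as in Figure 6a:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
threshold = 283.0  # superstorm threshold (nT), as in the text

# Synthetic exceedances above the threshold, drawn from a bounded
# generalized Pareto (shape c < 0); parameters are illustrative.
true_c, true_scale = -0.3, 90.0
excess = stats.genpareto(c=true_c, scale=true_scale).rvs(
    2000, random_state=rng)

# Maximum-likelihood fit with the shape left free (location pinned at 0)
c_hat, loc_hat, scale_hat = stats.genpareto.fit(excess, floc=0.0)

if c_hat < 0:
    # Finite upper endpoint implied by the fit: threshold - scale/shape
    upper_limit = threshold - scale_hat / c_hat
else:
    upper_limit = np.inf  # heavy, unbounded tail
```

Shrinking the sample toward realistic sizes (N ≈ 28) makes c_hat, and hence the implied endpoint, highly unstable, which is the statistical elusiveness described above.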

13. Model Selection
Likelihood ratio is a standard model selection criterion (e.g., Bohm & Zech, 2010, Chapter 10.3.4). Among candidate models, the "best" one might be taken as that with the highest relative likelihood. For models fitted to the data for a superstorm threshold of −Dstc = 283 nT, Figure 4f, the upper limit lognormal has the highest likelihood, lnL = 5.1652. But for a slightly lower (higher) threshold, such as −Dstc = 238 nT, Figure 4e (−Dstc = 336 nT, Figure 4g), the generalized Pareto has the highest likelihood, lnL = 5.4070 (lnL = 5.3081). In terms of the p values, for −Dstc = 238, 283, and 336 nT, the upper limit lognormal has the highest probability, though the p values for the other models are at least comparable to those for the upper limit lognormal. Insofar as all of the candidate models provide good representations of the −Dstm data (above superstorm threshold), it is, perhaps, not surprising that comparisons of likelihoods and p values do not provide a clear indication as to which candidate model is best. We prefer the upper limit lognormal model because it makes the most physical sense to us. We find it acceptable because it provides a good representation of the −Dstm data, but other models cannot be easily rejected.

Figure 7.
… (brown, 1957–1986) and the second half (black, 1987–2016) of the Dst time series. Also given are the number N of −Dstm data used in each fit, the natural logarithm of the likelihood lnL, the Kolmogorov–Smirnov p value, and the 100- and 1,000-year extrapolated storm intensities.
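A likelihood and Kolmogorov–Smirnov comparison of two candidates can be sketched as below, with synthetic data; note that because the parameters are fitted from the same data being tested, KS p values should strictly be calibrated by resampling, as is done in the paper:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Synthetic stand-in for -Dst_m data above a superstorm threshold (nT)
data = stats.lognorm(s=0.2, scale=330.0).rvs(40, random_state=rng)

# Fit two candidate models by maximum likelihood
ln_params = stats.lognorm.fit(data, floc=0.0)
gp_params = stats.genpareto.fit(data, floc=0.0)

# Log-likelihoods: sum of log densities at the data (higher is better)
lnL_ln = np.sum(stats.lognorm.logpdf(data, *ln_params))
lnL_gp = np.sum(stats.genpareto.logpdf(data, *gp_params))

# Kolmogorov-Smirnov statistic and (naive) p value for the lognormal fit
ks = stats.kstest(data, 'lognorm', args=ln_params)
```

As in the text, when both candidates describe the data well, lnL differences are small and the KS p values are comparable, so neither criterion singles out a clear winner.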

14. Stability and Secular Change
Recalling from section 10 that extrapolations to 1,000-year storm intensities were stable under model selection, we now consider stability under data selection. Recognizing that the March 1989 storm was of unusually great intensity, in Figure 7a, we show fits for the upper limit lognormal model with −Dstc = 283 nT and extrapolations with and without the −Dstm = 589 nT value of the March 1989 storm. The leave-one-out extrapolations are lower than those in which the intensity of the March 1989 storm is included. For the upper limit lognormal, the leave-one-out 100-year extrapolation [and 68% confidence interval] are 473 [447, 491] nT; the corresponding 1,000-year intensity and confidence interval are 567 [521, 599] nT. We note that this leave-one-out 100-year confidence interval is not wide enough to accommodate the intensity of the March 1989 storm, but the 1,000-year interval is. Although this leave-one-out experiment is artificial, it shows that extrapolations to 100- and 1,000-year storm intensities depend very much on the most extreme storm intensities in the Dst time series.
Next, we show fits and extrapolations under a bisection test, separately using samplings from the first half (1957–1986) and the second half (1987–2016) of the historical Dst time series. We show results for the upper limit lognormal model with −Dstc = 283 nT in Figure 7b, where we also show results for sampling the entire Dst time series (… [557, 876] nT). That the second-half extrapolations are clearly higher than the first-half extrapolations might indicate a slow, secular increase in magnetic storm intensity (related to a secular increase in solar activity) over the past three solar cycles. Because the first-half 100- and 1,000-year extrapolations fall within the second-half confidence intervals, while the second-half 100- and 1,000-year extrapolations fall outside of the first-half confidence intervals, an increase in storm intensity might be judged statistically significant.
These observations are consistent with the assessment made above that the 1989 storm was of unusually high intensity. They are also consistent with other time series studies (e.g., Clilverd et al., 1998; Love, 2011) and physics-based studies of secular change in space climatology (e.g., Stamper et al., 1999). The recently increased storm intensity might also represent a reversion to a long-term mean, since, prior to the start of the Dst time series in 1957, a number of magnetic storms had very high −Dstm. In particular, the Carrington event of September 1859 had (1-hr average) −Dstm ≈ 850 nT (Siscoe et al., 2006) to approximately 1,050 nT, the storm of May 1921 had −Dstm = 907 nT (Love et al., 2019b), and the storm of September 1909 had −Dstm = 595 nT (Love et al., 2019a). Each of these, along with the storm of March 1989, had an intensity that exceeds our 100-year estimate of 542 [466, 583] nT for the upper limit lognormal model.

15. Effects of Conditional Sampling
Looking back to sections 3 and 4, we consider the effects of sampling the Dst time series. In Figure 8, we show cumulatives of the −Dstm data obtained by our modified peak-over-threshold sampling method for −Dstc = 119 nT and −Dstc = 283 nT, each with a quiet time level of 30 nT. We recall that our modified sampling method excludes most multiple exceedances of −Dstc that occur during some storms. In Figure 8, we also show cumulatives for a compilation of −Dstm data obtained with a more traditional peak-over-threshold sampling method, such as envisioned under the Pickands–Balkema–de Haan theorem; for these, the quiet time level is set equal to the exceedance threshold, −Dstc = 119 nT and −Dstc = 283 nT. The traditional sampling method admits multiple crossings of −Dstc for some storms. For comparison, in Figure 8, we show the cumulative of all the −Dst data (these are autocorrelated). We see, immediately, that both our peak-over-threshold sampling method and the traditional sampling method result in distributions for the −Dstm data that are very different from that for the autocorrelated −Dst data. Notably, for −Dstc = 119 nT, Figure 8a, our modified (the traditional) peak-over-threshold sampling method reduces the number of data from N = 3,193 to 263 (422), and for −Dstc = 283 nT, Figure 8b, the reduction is from 135 to 28 (34). More interesting, however, are comparisons of the distributions for the different peak-over-threshold sampling methods. Our modified method results in fewer small −Dstm values, those marginally above −Dstc, than does the traditional sampling method; this is due to redundant counting of some storms under the traditional sampling method. These differences in distribution affect the model fits. The data selected by our modified peak-over-threshold method are better fitted by the bounded upper limit lognormal model, as measured by either lnL or p value, than are the data selected by the traditional peak-over-threshold method.
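The modified sampling can be sketched as: treat each contiguous run above the quiet-time level as one storm, and record a single maximum per storm, keeping it only if it exceeds −Dstc. A toy illustration (the function name and toy series are hypothetical, not the paper's implementation):

```python
def storm_maxima(series, quiet=30.0, threshold=283.0):
    """One maximum per storm: a storm is a contiguous run above the
    quiet-time level; its maximum is kept only if it exceeds threshold."""
    maxima = []
    current = None  # running maximum of the storm in progress
    for v in series:
        if v > quiet:
            current = v if current is None else max(current, v)
        else:
            if current is not None and current > threshold:
                maxima.append(current)
            current = None  # storm ended at return to quiet conditions
    if current is not None and current > threshold:
        maxima.append(current)  # storm still in progress at series end
    return maxima

# Toy -Dst series (nT): two storms; the first fluctuates above the
# threshold more than once but contributes only its single maximum.
toy = [5, 40, 300, 290, 310, 50, 10, 20, 100, 350, 90, 0]
print(storm_maxima(toy))  # → [310, 350]
```

A traditional sampling that counts each exceedance of the threshold would record several values from the first storm (300, 290, 310), inflating the count of small exceedances relative to the one-maximum-per-storm compilation, which is the distributional difference noted above.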

16. Block-Maximum Fits
Next, in Figure 9a, we show fits of solar cycle, block-maximum −DstmBM data by the lognormal model, equation (3), the bounded upper limit lognormal model, equation (5), and the bounded generalized extreme-value model, equation (7), where for the latter two cases, and in parallel with the results presented in section 8, the maximum allowable −DstmBM is fixed at 2,500 nT. The cumulatives of the −DstmBM data and the cumulatives of the candidate statistical models show that all of the model functions provide good and almost identical representations of the solar cycle, block-maximum −DstmBM data. Regarding statistical significance, bootstrap resampling and refitting gives median Kolmogorov–Smirnov p values ranging from 0.43 for the lognormal model to 0.35 for the upper limit lognormal model. These are not small enough to motivate, on purely statistical grounds, rejection of any of the candidate models as descriptions of the solar cycle, block-maximum −DstmBM data. Regarding goodness of fit, the generalized extreme-value model has the highest likelihood, lnL = 36.7157, but this is only slightly higher than that of the model with the lowest likelihood, the upper limit lognormal, lnL = 36.4650. As was the case for modeling of the peak-over-threshold −Dstm data, section 13, we cannot confidently say which of the candidate models is the best representation of the solar cycle, block-maximum −DstmBM data. In Figure 9a, we also show model extrapolations to 9 and 90 solar cycles, corresponding, roughly, to 100- and 1,000-year storm intensities if we assume an 11-year solar cycle period. The nine-solar cycle extrapolations range from 553 nT (upper limit lognormal) to 591 nT (generalized extreme-value), a difference of just 38 nT. The 90-solar cycle extrapolations range from 759 nT (upper limit lognormal) to 899 nT (generalized extreme-value), a difference of 140 nT.
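Block-maximum fitting and n-block extrapolation can be sketched with scipy's genextreme, which uses shape c = −ξ (so c > 0 corresponds to a bounded tail). The synthetic block maxima and parameter values below are assumptions, and a large sample is used for fit stability, in contrast to the six real block maxima available here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Synthetic solar-cycle block maxima (nT), standing in for -Dst_m^BM;
# shape, location, and scale values are illustrative assumptions.
block_max = stats.genextreme(c=0.3, loc=330.0, scale=80.0).rvs(
    200, random_state=rng)

# Maximum-likelihood fit of the generalized extreme-value model
c_hat, loc_hat, scale_hat = stats.genextreme.fit(block_max)
gev = stats.genextreme(c_hat, loc=loc_hat, scale=scale_hat)

def n_block_level(n_blocks):
    """Level exceeded by a block maximum once per n_blocks on average."""
    return gev.isf(1.0 / n_blocks)

x9 = n_block_level(9)    # ~100 years, at one 11-year cycle per block
x90 = n_block_level(90)  # ~1,000 years
```

With only six observed block maxima, the same fit is formally possible but the parameter estimates, and hence the 9- and 90-cycle levels, carry very large uncertainty, consistent with the remark below that bootstrap confidence intervals are impractical here.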
As with the 1,000-year extrapolations of the peak-over-threshold −Dstm data, section 10, all of the extrapolations to 90-solar cycle intensities are far below the upper limit of 2,500 nT suggested by Vasyliūnas (2011). Because only six solar cycle, block-maximum −DstmBM data are available, it is not practical to use bootstrap resampling to estimate statistical confidence intervals.
Similar to the leave-one-out test performed in section 14, we perform a replace-one stability test of fits of the solar cycle, block-maximum −DstmBM data. Noting that the March 1989 storm, with −Dstm = 589 nT, was the most intense storm of solar cycle 22, we replace its value with that of the second most intense storm of that cycle, the November 1991 storm, −Dstm = 354 nT. We show fits in Figure 9b. All candidate models provide good representations of the data. For corresponding model fits, extrapolations to 9- and 90-solar cycle intensities are lower for fits in which the November 1991 −Dstm value is used than for those in which the March 1989 value is used. As we concluded in section 14, extrapolations to extreme storm intensities depend, critically, on the most extreme storm intensities that have actually been recorded.

17. Small Irreconcilable Differences
We note that the 9- and 90-solar cycle, block-maximum extrapolations (553 and 759 nT) are slightly higher than the 100- and 1,000-year peak-over-threshold extrapolations, each obtained for an upper limit lognormal, 542 [466, 583] and 715 [552, 797] nT for −Dstc = 283 nT. This is a systematic difference. With block-maximum sampling of solar cycles 19 through 24, we have the ranked −DstmBM values {589, 429, 422, 387, 325, 222} nT, but for peak-over-threshold sampling, the seven most intense storms have the six ranked −Dstm values {589, 429, 427, 426, 422, 387} nT (two storms have −Dstm = 387 nT). The distribution of the block-maximum intensities is flatter than the distribution of the peak-over-threshold intensities. Flatter distributions give more extreme extrapolations, but the differences in the extrapolations are, in this case, small. For the upper limit lognormal 100-year and nine-solar cycle extrapolations, the difference is just 11 nT. This is well within the 68% confidence interval obtained by bootstrap resampling of the peak-over-threshold samples.

18. Conclusions
The existing −Dstm data can be about as well described by lognormal models as by formal extreme-value models. The models permit extrapolation to 100-year storm intensities. Such extrapolations are useful, but, beyond that, we have found it challenging to infer other integrated qualities of extreme-event storm intensities: the data are about as well described by models that have infinitely long tails as by those having finite right endpoints, by models with tails that are either regularly varying or not, and by models with tails that are either heavy or light (though, perhaps, not especially heavy and not especially light). That we have not been able to draw more specific inferences reflects the flexibility of the distributional models and the limited information content of the −Dstm data.
This study is an advancement on previous work that we either performed (e.g., Love et al., 2015) or with which we have been associated (e.g., Riley & Love, 2017). Whereas those works focused exclusively on lognormal and power law distributional models of the −Dstm data, in this study we additionally consider the upper limit lognormal and formal extreme-value models. Whereas those works focused on peak-over-threshold data for single, specific thresholds intended to encompass as much data as possible, in this study we explore a range of thresholds −Dstc, selecting a smaller data set that shows possible asymptotic consistency with extreme-value distributions, and we also consider block-maximum data selection. This focus on the most extreme superstorm intensities has led to a lower 100-year extrapolated estimate of 542 nT as compared to the 880 nT given by Love et al. (2015). These very different estimates illustrate sensitivity to data and model selection.
Further progress in characterizing the statistical properties of the −Dstm data, and especially in reducing the uncertainty in extrapolated intensities, would benefit from theoretical refinements in the specification of model distributions and their parameters. Although we advocate use of the upper limit lognormal, tighter specification of the theoretical upper limit of storm intensity is needed if tight confidence intervals are required for extrapolations to extreme intensities. We appreciate that a view far across the past is necessary, though not sufficient, for making modest forecasts of the future (e.g., Saffo, 2007). We recognize that estimating pre-1957 storm intensities is a high priority. Noting that models of the post-1957 −Dstm data yield 100-year extrapolated storm intensities that are eclipsed by several pre-1957 storms, we might reasonably infer that our extrapolations are underestimates.