Reproducible Aspects of the Climate of Space Weather Over the Last Five Solar Cycles

Each solar maximum interval has a different duration and peak activity level, which is reflected in the behavior of key physical variables that characterize solar and solar wind driving and magnetospheric response. The variation in the statistical distributions of the F10.7 index of solar coronal radio emissions, the dynamic pressure PDyn and effective convection electric field Ey in the solar wind observed in situ upstream of Earth, the ring current index DST, and the high‐latitude auroral activity index AE are tracked across the last five solar maxima. For each physical variable we find that the distribution tail (the exceedences above a threshold) can be rescaled onto a single master distribution using the mean and variance specific to each solar maximum interval. We provide generalized Pareto distribution fits to the different master distributions for each of the variables. If the mean and variance of the large‐to‐extreme observations can be predicted for a given solar maximum, then their full distribution is known.


Introduction
The space plasma environment near the Earth is highly dynamic, with its own space weather. Space weather impacts include electrical power loss, aviation disruption, interrupted communications, and disturbance to satellite systems (Baker & Lanzerotti, 2016;Hathaway, 2015). The state of the solar wind and the Earth's geomagnetic response have now been almost continually monitored by ground-and space-based observations at reasonably high resolution at over the last five solar cycles. Each of the last five solar cycles has a different duration and peak activity level for the interval around the maximum (hereafter solar maximum interval), which will be reflected in the intensity of space weather activity at Earth. This paper seeks to quantify how these different activity levels during each solar maximum interval are manifested in key space weather relevant physical parameters. It is not a study of solar wind-magnetosphere coupling per se (see Newell et al., 2007, for an example of such a study); however, statistical studies of this kind (Tindale & Chapman, 2016) can shed light on how these coupling parameters vary with solar cycle variation in the overall level of activity.
The solar corona and solar wind drivers, and Earth's geomagnetic response, are routinely characterized by space weather variables that are both multicomponent in their physical origin and can exhibit long-tailed statistics such that the largest events are significantly more likely than would be expected from a Gaussian distribution. Space weather events correspond to the tails of these distributions (Baker & Lanzerotti, 2016;Hathaway, 2015), and there is a considerable interest in both reconstructing past space climatology (Lockwood et al., 2017(Lockwood et al., , 2018 and the possibility of quantitative prediction of the probability distribution of these more extreme values for the next solar maximum interval.

10.1029/2018SW001884
The solar corona shows solar cycle variation (Luhmann et al., 2002;Schwenn, 2006) leading in turn to variation in the rate of flares and coronal mass ejections (Cremades & St. Cyr, 2006;Wheatland, 2000) and their geoeffectiveness (Shen et al., 2014;Thomson et al., 2010). While it is well known that overall geomagnetic activity tracks that of the solar cycle (e.g., Baker & Lanzerotti, 2016;Hathaway, 2015;Richardson & Cane, 2012, and references therein), the behavior of the extremes do not necessarily track the time average, so that knowledge of the full distribution is needed. For example, the time average of the aa index always exceeds a baseline value, and this baseline increases linearly with averaged sunspot number; however, the average aa values above this baseline show significant scatter and can take relatively large values even when sunspot numbers are low (Feynman, 1982).
The detailed quantification of the time variation of the statistical distribution of key space weather relevant variables is thus a topical question. The distributions of geomagnetic activity (Tanskanen et al., 2017) and geomagnetic indices (Campbell, 1979;Lockwood et al., 2017Lockwood et al., , 2018 vary both within and between solar cycles (Tindale & Chapman, 2016. The distributions of solar wind variables can be non-Gaussian (Veselovsky et al., 2010, and references therein). The distributions of fluctuations in solar wind magnetic energy density (Hnat et al., , 2011Kiyani et al., 2007) are non-Gaussian and show solar cycle dependence. Fluctuations in geomagnetic indices (Hnat et al., 2002(Hnat et al., , 2003(Hnat et al., , 2005 track this behavior across the distribution. Solar wind-magnetosphere coupling parameters (Tindale & Chapman, 2016) also track the variation between and across solar cycles seen in distributions of solar wind magnetic field. There have been recent attempts to survey the statistics of extreme events from sun to solar wind to Earth assuming power law statistics and time stationarity (Riley, 2012). Recently, Hush et al. (2015) found that the underlying distribution functional form of extreme bursts in the AE index does not vary significantly with the solar cycle, with their relative intensity tracking the solar cycle, suggesting robust underlying processes that can be modeled on a physical basis. By assuming and fitting a lognormal form for the distribution, Lockwood et al. (2017Lockwood et al. ( , 2018 found evidence that the underlying distribution of geomagnetic indices and power input into the magnetosphere are approximately invariant in their underlying distribution functional form, which provided a powerful tool for past space climate reconstruction. However, as Lockwood et al. (2018) note, the lognormal does not necessarily provide the best fit across the full distribution for all variables of interest. Indeed, Tindale and Chapman (2017) found that the distribution of solar wind magnetic field can be far from lognormal. Here we will show how to test whether the functional form of the distribution of a given variable is invariant against solar cycle dependent changes, without assuming any specific functional form, lognormal or otherwise, for the underlying distribution. Only when this invariance is established do we then perform a fitting to specific distributions.
In this paper we focus on key physical variables that are used in the modeling and characterization of space weather impacts at Earth. The F10.7 index (Tapping, 2013) of solar coronal radio emissions is used by many operational space weather models as their prime solar input. It is correlated with the density of the upper atmosphere, which in turn has consequences for the design and operation of satellites in low Earth orbit (e.g., Vedder & Tabor, 1992). Solar wind drivers of geomagnetic space weather disturbances include solar wind dynamic pressure P Dyn and effective convection electric field E y ∼ vB z observed in situ upstream of Earth. Two key indices that characterize magnetospheric response at relatively high time resolution as observed at the Earth's surface and are widely used to parameterize the intensity of space weather events are the enhancement of the ring current D ST , and auroral activity at high latitudes, the AE index (Mayaud, 1980). Quantile-quantile (QQ) plots (Gilchrist, 2000) compare the statistical distribution of one data sample against that of another. Using a QQ plot analysis over intervals of the maxima and preceding minima of solar cycles 23 and 24, we recently found (Tindale & Chapman, 2016) that samples of the largest events change in a different manner to the bulk of the distribution as we compared one solar maximum interval to the other. The distribution is multicomponent: It has (at least) two components-a core or bulk of smaller amplitude, more frequently occurring values, and a long tail of larger, less frequently occurring values. The QQ plot can be used to identify the threshold, that is, the value or quantile that is at the crossover between the core of the distribution and the long tail of extreme values. Importantly, the QQ plot can be used to identify this threshold without requiring any fitting of the functional form of the distribution. Looking across the past five solar cycles, we will use this method to identify the threshold at which there is a transition in the data, from the bulk of the distribution to the long tail, for each of the key physical variables F10.7, P Dyn , E y , AE, and D ST . We then find that the observations in excess of this threshold, the distribution long tail, follow a distribution that only differs in its mean and variance from one solar cycle maximum to the next. The long tail (threshold exceedences) of distributions from a given solar maximum interval then rescale to any other when normalized

10.1029/2018SW001884
to the exceedence mean and standard deviation of that solar maximum interval. Each of the variables F10.7, P Dyn , E y , AE, and D ST has a distinct distribution long tail, and each of these for each solar cycle maximum can be described by its own single master distribution for that variable along with the exceedence mean and variance for that unique solar maximum interval.
Extreme value theory (EVT) is a well-developed approach to characterizing the statistics of large observations, which has found application in space weather (Thomson et al., 2011). It provides a statistical characterization in terms of generalized extreme value and generalized Pareto distributions (GPDs). Pioneering application to space weather includes the F10.7 index (Vedder & Tabor, 1992), the half-daily aa index (Silbergleit, 1999;Siscoe, 1976), and the D ST index (Silbergleit, 1996;Tsubouchi & Omura, 2007). Another application of EVT in the context of space weather has been to energetic electron fluxes (Koons, 2001;Meredith et al., 2015), which also established an unexpected upper bound to the flux of relativistic killer electrons (O'Brien et al., 2007). Important fine-scale details of geomagnetic ground effects have recently been studied using this framework (Beggan et al., 2013;Thomson et al., 2011). EVT studies have recently been performed on extreme solar flares (Elvidge & Angling, 2018) and in the solar wind Davidsen, 2010, 2011), but this work did not differentiate solar cycle dependence as we do here; we will obtain GPD fits for solar maximum intervals of solar cycles 20-24.
However, as well as the important problem of the most extreme events, the natural hazard communities in space weather (Boteler, 1991) and elsewhere, notably hydrology (Rodríguez-Iturbe et al., 1972), have always needed to consider the chance of exceeding more typical, and thus more frequently exceeded, thresholds (Vanmarcke, 2010). But it is still not widely realized that the GPD itself can be applied at thresholds other than just very high ones. Indeed, Hosking and Wallis (1987) noted that the GPD's "applications include use in the analysis of extreme events, in the modeling of large insurance claims, as a failure-time distribution in reliability studies, and in any situation in which the exponential distribution might be used but in which some robustness is required against heavier tailed or lighter tailed alternatives." Here we employ the GPD simply as a flexible fitting distribution. We find that the transition quantile identified by QQ plots identifies an appropriate threshold. Above that threshold we find a distinct GPD master distribution for observations of each of the F10.7 index, the solar wind dynamical pressure, and effective electric field E y and the D ST and AE indices. These distributions can be long tailed but are not in the heavy-tailed class (Embrechts et al., 1997), so that a relatively small data sample is required to quantify the mean and variance compared to that needed to resolve the distribution long tail. An estimate of the mean and variance could then be obtained over a relatively short time interval at the beginning of an interval of solar maximum activity, and this would then provide a prediction of the likelihood of occurrence, and hence the return times, that could be expected for the more extreme values that may occur during that solar maximum interval.

Data Sets and Sampling of the Solar Maximum Intervals
We analyze time intervals (defined below) spanning each of the last five solar maxima using observations on an hourly time base; all of the data analyzed here is from the extended OMNI2 data set extracted from NASA/GSFC's OMNI data set (King & Papitashvili, 2005) through OMNIWeb (the OMNI data were obtained from the GSFC/SPDF OMNIWeb interface at https://omniweb.gsfc.nasa.gov). Solar wind variables are observed upstream of Earth's magnetosphere in situ by satellites that at different times are at different locations and so has been time shifted to a common location in the OMNI2 data set.
We present studies of the F10.7 index, the solar wind dynamic pressure P Dyn , an estimate of the solar wind convection electric field E y , and geomagnetic indices AE and D ST . The F10.7 radio emissions originate high in the chromosphere and low in the corona of the solar atmosphere. The F10.7 index (Tapping, 2013) tracks solar ultraviolet forcing of the upper atmosphere. Many operational space weather models use the F10.7 index as their prime solar input (Codrescu et al., 2012;Dudok de Wit & Bruinsma, 2011). The dynamic pressure (the solar wind momentum flux density [Lopez et al., 1986;Schwenn, 2006]) P Dyn = (1 + 4n ∕n p )n p v 2 sw of the solar wind compresses the dayside magnetosphere (Wing & Sibeck, 1997), enhancing the ring current (where n p and n are the proton and alpha particle densities and v sw is the solar wind speed). We also consider the OMNI2 estimate of the solar wind convection electric field E y = −v sw B z (magnetic field B z in GSM), which is a key driver of geomagetic activity. Geomagnetic indices are derived from ground-based magnetometer observations (Mayaud, 1980) and are widely used to indicate the intensity of space weather events. Note. The bottom line gives the threshold quantile q E used to distinguish the long tail of the distribution so that, for example, for the AE index q E = 0.75, and 25% of these records are used in the statistical analysis of the long tail of the AE index.
The D ST index measures low-latitude global variations in the horizontal component of the geomagnetic field, thus representing the strength of the equatorial ring current. The AE index is comprised of the difference between the most positive and most negative deviations seen at high latitudes.
We will quantify how the statistical distribution of each of these quantities changes from one solar maximum interval to another. In order to form a distribution we need to select time intervals that correspond to each solar maximum interval that are both long enough to capture the more extreme events with statistical significance and short enough that within the sample, the time series can be treated as quasi time stationary. Both the duration and the time between one solar maximum interval and the next are variable. In order to test the robustness of our results we have selected samples of the solar maxima using two different criteria: (i) 3.5-year intervals starting 2.5 years following the previous minimum and (ii) 2-year intervals centered on the monthly sunspot number maximum. We use the following dates (first of each month) for the minima: August 1964, March 1976, September 1986, May 1996, and December 2008and the maxima: November 1968, December 1979, July 1989, March 2000, and April 2014. We obtained similar results using these two different criteria and we present in detail the results using samples selected with criterion (i). Table 1 gives the number of OMNI2 records in each of these samples. Looking across the past five solar cycles, we will use QQ plots to identify the threshold at which there is a transition in the data, from the bulk of the distribution to the long tail, for each of the key physical variables. These threshold quantiles q E are also given in Table 1; the fraction 1 − q E of these OMNI2 records exceed the threshold and form the distribution long tail.

Curve Collapse of the Distribution Tails
We use the samples of the last five solar maxima to form distributions, which we can plot as the probability density P(x), or in terms of the cumulative density function C(x), where x is the F10.7 index (Figure 1), the solar wind dynamic pressure P Dyn (Figure 2), the estimated solar wind convection electric field E y (Figure 3) or geomagnetic indices AE (Figure 4) and D ST ( Figure 5). It is convenient to plot the survival distribution S(x) = 1 − C(x) as this emphasizes the distribution tail of large-to-extreme values, and it gives the return time, where Δt is the OMNI2 record cadence of 1 hr as we discuss in more detail in Section 4. The figures plot both the survival functions and the probability densities for comparison. We plot the empirical survival function where for observations x k , k = 1 … N, the x axis plots the observations in ascending rank order, such that k = N is the largest value, and the y axis plots 1 − k∕N. Each filled marker on such a plot is a single hourly observation. Uncertainties in the survival functions are indicated by the shaded regions and are estimated using Greenwood's formula (Greenwood, 1926;Kalbfleisch & Prentice, 1980). On the probability density plots the bin-by-bin scatter provides an estimate of the sampling uncertainties; samples are shown down to the one count per bin level.
We focus on the long tails of these distributions, that is, observations that exceed a threshold. The threshold is chosen at a fixed q E quantile for each physical variable, using QQ plots as described in detail in the next section. The q E quantile identifies the threshold x = u(q E ) below which the fraction q E of observations are found, so that for q E = 0.9, 90% of observations take values x < u(q E ). The threshold u(q E ) for fixed q E thus varies between one solar maximum interval sample and the next; it is indicated for the maximum of solar cycle 23 on the figures. Other quantiles may be directly read off the survival distribution plots as 1 − C(x) is the fraction of the observations that are found at values greater than x.  This then is our essential result: That for each of the F10.7 index, solar wind P Dyn , estimated convection E y , and the geomagnetic indices AE and (−)D ST , the long tail of the distribution at any of the last five solar maxima Figure 3. Single functional form for the distribution long tail of estimated solar wind convection electric field E y over the last five solar cycles. The format is as in Figure 1. Curve collapse is tested on the positive tail, which corresponds to −B z IMF.
can be approximately described by a single functional form specific to that physical variable (each has a single master distribution) along with the mean E and standard deviation E of that variable's exceedences, specific to each unique solar maximum interval.

QQ Plot Determination of the Thresholds u(q E )
We have taken each of the variables under consideration to be distributed with two components: a core of smaller amplitude, more frequently occurring values, and a long tail of larger, less frequently occurring values that are in excess of a threshold q E . We have found that normalizing to the mean and variance of the long tail does not, in general, give a curve collapse for the entire distribution, but it does for the long tail. Therefore, each variable has, for physical reasons, a distinct response in its core and in its long tail to different levels of activity around solar maximum. The value for the threshold q E can be determined from the data by using QQ plots. Importantly, the QQ plot can be used to identify this threshold directly from the data without the need to fit specific functions to the observed distribution (Gilchrist, 2000;Tindale & Chapman, 2016. QQ plots of one data sample against another can be used to test whether the samples originate from the same underlying distribution functional form. To compare a sample of observations x q with a reference sample x R the QQ plot has as its coordinates the quantiles (x R (q), x q (q)) where the likelihood q (value of the cumulative density function) acts as a parametric coordinate. Since the two sets of observations are unlikely to coincide on precisely the same quantiles, it is necessary to resample the data, here using kernel density estimation (using a constant width Gaussian kernel; Coles, 1991;Gilchrist, 2000).
We show an example of one of these plots, for the AE index, in Figure 6. The maximum of solar cycle 23 is used as the reference distribution x R as it is recent, and thus less subject to data gaps, but explores a wider range of values than the unusually quiet solar cycle 24 maximum. In order to emphasize changes in the distribution, this plot is compensated, that is, we plot x q − x R versus x R . If the x q and x R are drawn from the same distribution, that is, if the distribution for a given solar maximum interval is the same as that for the solar cycle 23 maximum interval, then they have the same quantiles and such a plot gives a straight line x q − x R = 0. If a solar maximum interval has the same underlying distribution functional form f as that for solar cycle 23 maximum, but has different mean and standard deviation so that f (x q ) ∼ f ( x R + ), then on the compensated plot x q − x R = ( − 1)x R + , which will be a single straight line offset from, and at an angle to, the line x q − x R = 0. Straight line fits are overplotted on the x q < u(q E )data. If an entire sample is changing in distribution in the same manner from one solar cycle to another then these straight lines should fit the entire data set. Instead, we can see clear breaks in behavior at u(q E ). From the perspective of how the sampled distribution of the physical variable changes from one solar maximum interval to another, this method identifies two components in the distribution. We then use these plots to approximately identify the quantiles u(q E ) used as the thresholds for the distribution long tails in Figures 1-5. For simplicity we use a single value of q E for each physical variable, identified from QQ plots.
Our finding, that there is more than one physical component to the observed data, which can be identified from their different response to solar cycle changes, is not unexpected; the time series are comprised of relatively quiet intervals punctuated by large bursts, that is, coherent structures in the solar wind seen in P Dyn and substorms and storms seen in AE and D ST . The QQ plots provide a model-independent method to approximately identify the threshold between these two components; the former form the bulk of the distribution, and the latter, the long tail.

GPD Fits to the Distribution Tails
Within the EVT approach, the GPD can be motivated by the Pickands-Balkema-de Haan theorem (Coles, 1991;Embrechts et al., 1997) and fitted to the observed distribution tail that is then defined as the set of exceedences of observations above a high threshold. This however relies on finding the value of the threshold for which the GPD will hold.
In the absence of other information about the system from which the observations derive, this is usually done by finding the range of threshold values for which the mean excess, or the scale and shape parameters of the GPD, do not change (Embrechts et al., 1997;Ghosh & Resnick, 2010). That the GPD should provide a good description does not automatically follow because solar wind variables are not strictly independent, some, for example, have autocorrelation timescales of several tens of hours (Elliott et al., 2013).
The GPD can also be used as a convenient fitting function (Hosking & Wallis, 1987) to characterize the solar cycle independent master distribution for each physical variable. Results of this procedure should then be understood as a model for the observed values rather than a prediction for more extreme events that go beyond the observed range. This is the approach taken here. The Pickands-Balkema-de Haan theorem states that a sample of independent, identically distributed variables X that exceed a sufficiently high threshold will have exceedences z = (X − ) that follow the GPD. The GPD's survival distribution is We perform a GPD fit to the scaled exceedences of the distribution long tails for each physical variable. For each variable, the samples for the five solar maxima are first independently rescaled to (x k − E )∕ E using each mean E and standard deviation E of observations that exceed the threshold u(q E ) for that solar maximum interval. These five rescaled samples are then aggregated to form a single sample, from which we obtain the exceedences z = (x − u(q E )) using the quantile u(q E ) of the aggregated sample. We then perform a maximum likelihood GPD fit to the exceedences of this aggregate over the five solar maxima. An input to the fitting procedure is the threshold for which we again use the quantile u(q E ) for q E identified from the QQ plots above and for which the distributions collapse onto a single functional form. This is a novel method as it uses two physical insights about the physical variables to determine (i) that they are expected to be multicomponent and ii) that their multicomponent nature is revealed by how they change in distribution from one solar maximum interval to another. Without this additional physical input, the choice of threshold would rely solely upon standard tests for the GPD, such as a linear dependence of the mean excess on the threshold, or fitted GPD parameters if these can be seen not to vary as the threshold varies.
The GPD fits are overplotted on the right hand panels of Figures 1-5 and can be seen to lie within the data uncertainties. For the survival distributions we also overplot (inset, top right) the 95% confidence intervals of the GPD fit on the data. For D ST the single GPD fit does not capture the roll-off seen in the largest few Note. The exceedence thresholds are at the q E quantiles.
∼10 observations in the interval of solar maximum of cycle 24 and to a lesser extent, that of solar cycle 23. There is also a systematic deviation in P Dyn from the GPD although they do coincide within observational statistical uncertainties. These deviations from the GPD fit are for the largest 0.1% of the observations, that is, the largest few tens of hourly OMNI2 records in a 3.5-year-long sample where the uncertainties become large. This suggests a large uncertainty on any extrapolation of the GPD fits to estimate the probability of occurrence of larger, as yet unobserved events. However, we note that other estimates based on fitting specific distributions, and on EVT, also carry large uncertainties (Riley & Love, 2017).
A check on sensitivity to the selected time intervals used to sample each solar maximum is provided by Table 2, which gives the GPD parameters for the two different sampling criteria. These are in agreement within the 95% confidence levels, the exception being the F10.7 index. This can be seen from Figure 1 to have discretization (Tapping, 2013) that is significant in the extreme values, which will in turn affect GPD fitting. An appropriate resampling of the underlying F10.7 data may address this.
The GPD encompasses the three Fisher-Tippett subclasses of EVT (Embrechts et al., 1997). We find that F10.7 is in the Weibull class (k < 0), consistent with Vedder and Tabor (1992). AE (k = −0.071, = 1.169) is close to the Gumbel (k = 0) case and is almost exponential (k = 0, = 1). By contrast P Dyn , E y , and D ST are in the Fréchet class (k > 0). The F10.7 index distribution's tail falls off faster than an exponential, while the AE index decays only just faster than exponentially. This can also be seen in the concave shape of the survival functions when plotted on semilog axes as here. On the other hand the solar wind dynamic pressure and the D ST index have long tails that decay more slowly than exponential (subexponential [Embrechts et al., 1997]), with a correspondingly increased likelihood of extreme events. Importantly for all the exceedence distribution long tails, k < 1∕2 so that both the mean and variance are finite (Coles, 1991;Embrechts et al., 1997).
Since D ST is the ground magnetic field's response to all current systems it may, in addition to capturing changes in the ring current, also be directly influenced by magnetopause currents, which in turn are directly driven by solar wind dynamic pressure (e.g., Zhang et al., 2004, and references therein). Following Gonzalez et al. (1994), we corrected for this and obtained a GPD with fit parameters k = 0.199 and = 0.675 so that the corrected D ST long tail has essentially the same distribution as that before correction , consistent with (O'Brien & McPherron, 2000).
The mean excess dependence on threshold for D ST was previously used by Tsubouchi and Omura (2007) to identify the threshold to obtain a GPD fit for a preselected subset of the observations that identified the most intense events. These most intense events fall into the region of the distribution (see Figure 5 for −D ST ) where there is a solar cycle dependent roll-off at the largest values.

Survival Functions and Return Times
The survival function is directly related to the return time. Consider a return level x m that is exceeded once every m observations. Then the value of the survival function for x m is If in time interval we have N observations, then the average time we will wait to see one observation x > x m is the return time R(x m ) = m ∕N, which is where we make observations every Δt = 1 hr here. The return time is an averaged quantity; it indicates the average of the waiting times between the occurrence of events x > x m . Importantly, it does not imply that a single observation of x > x m will occur once every R(x m ). For each solar cycle maximum the return times can thus also be described by their own single master distribution for each variable along with that variable's exceedence mean and variance for that unique solar maximum interval; the master distribution is just that determined for the survival function here.
The GPD is fitted to the distribution exceedences and can be related to the probability of the full set of observations as follows (see Coles, 1991). The GPD is The probability that an observation exceeds the threshold u to become part of the population of exceedences, that is, X > u, is S(u) = Pr {X > u} so that the probability of exceeding a return level x m in the full set of observations is which gives the GPD return time from (3). Using (2), the return level estimated from the GPD is

Discussion and Conclusions
We have quantified the variation between the last five solar maxima of the statistical distributions, that is, survival or return time distributions, of large-to-extreme observations of key physical variables that characterize solar and solar wind driving of space weather and the magnetospheric response: the F10.7 index (Tapping, 2013) of solar coronal radio emissions, the dynamic pressure P Dyn , and estimated convection electric field E y in the solar wind observed in situ upstream of Earth, the enhancement of the ring current D ST , and auroral activity at high latitudes, the AE index.
These statistical distributions can be long tailed, and the tail of the distribution that corresponds to the most space weather-effective events changes in a different manner to the bulk of the distribution as we compare one solar maximum interval to another. These components of the distribution may in turn map onto distinct physical processes. QQ plots can be used (Tindale & Chapman, 2016 to track how an observed distribution has changed from one interval of time to another. Looking across the last five solar maxima, we used QQ plots to identify the threshold between the bulk of the distribution and the tail, which change in a different manner from one solar maximum interval to the next. Each of the physical variables F10.7, E y , P Dyn , AE, and D ST has a distinct distribution long tail that is comprised of observations above a threshold specific to that variable. We then find that for a given physical variable, the only change in the distribution tail of return times between one solar cycle maximum to the next is in its mean and variance. For each physical variable, the threshold exceedences, that is, the distribution long tail, from a given solar maximum interval rescales to any other when normalized to the exceedence mean and standard deviation of that solar maximum interval. The distinct distribution long tail for each physical variable (F10.7, P Dyn , E y , AE, and D ST ) at each solar cycle maximum interval can be approximately described by a single master distribution for each variable along with the variable's exceedence mean and variance for that unique solar maximum.
If this behavior holds for future solar maxima, then it offers a predictable aspect of space weather climatology. For the class of distributions (GPD with k < 1∕2) found here, a relatively small data sample is required to quantify the exceedence mean and variance compared to that needed to resolve the distribution long tail. An estimate of the exceedence mean and variance could then be obtained over a relatively short time interval at the beginning of an interval of solar maximum activity. A prediction for the (large-to-extreme observations) mean and standard deviation could, using the results from our study, be translated into a prediction for the full return time distribution of the large-to-extreme observations. In the case where a variable's distribution of large-to-extreme observations is close to exponential, as found here for AE, then to a reasonable approximation only a prediction of the mean is required as the variance is then simply related to the mean.
Space weather events seen in these variables are events or bursts, that is, intervals in the time series where a given physical variable's value is continually above a crossing threshold. Our results pertain to the occurrence probability of observed values; however, for a stationary time series this will also constrain burst properties from level crossing theory (Boteler, 1991;Cramér & Leadbetter, 2004;Vanmarcke, 2010). If we consider some value for the crossing threshold u, then the survival function 1−C(u) at that value of u constrains the averages < … > of the duration of the burst (the time between the upcrossing and the downcrossing) d (u) and the time from the start of one burst to the start of the next (the upcrossing time interval) c (u) (or alternatively, the burst occurrence frequency f c (u)) through Therefore, our plots of survival functions can be read across directly as plots of < d (u) > < f c (u) > provided that u is in the distribution tail. This could in turn be used to quantify the occurrence frequency of different classes of events (corotating interaction regions and coronal mass ejections) provided that another quantity could be used to distinguish the class to which a given burst corresponds.
For each solar cycle maximum interval the the product of average burst duration and average burst frequency can thus also be described by its own single master distribution for each variable along with that variable's exceedence mean and variance for that unique solar maximum; the master distribution is just that determined for the survival function here.
For each physical variable we obtained the corresponding single GPD that characterizes the functional form of the distribution long tail for all of the past five solar cycles. The observations, and their uncertainties, establish the goodness of the GPD fit over the range of values explored by the data. For the F10.7 and AE indices and E y our GPD fit is acceptable from a quite moderate threshold quantile up to the most extreme values that are observed. For D ST and P Dyn our GPD fit may not hold for the most extreme (largest 0.1% or few tens of hourly records in a 3.5-year sample) observations; D ST has an additional solar cycle dependent roll-off in the distribution at very high values for the most recent solar maxima, and P Dyn has a systematic departure from the GPD.
There is a rich phenomenology in how structures in the solar wind interact with the near-Earth field and plasma environment. The likelihood of a space weather event can in fact depend on a combination of different physical variables; the geoeffectiveness of a large-scale solar wind structure at Earth can be conditioned by the direction of the interplanetary magnetic field (Gonzalez et al., 2008), solar wind Mach number (Borovsky & Denton, 2006;Lavraud & Borovsky, 2008), or by topological details (Lugaz et al., 2016). A natural extension of this work would be to construct joint distribution functions of more than one variable, provided that there is sufficient data to establish statistical significance.