Robust statistical properties of the size of large burst events in AE

Geomagnetic indices provide a comprehensive data set with which to quantify space climate, that is, how the statistical likelihood of activity varies with the solar cycle. We characterize space climate by the AE index burst distribution. Burst sizes are constructed by thresholding the AE time series; a burst is the sum of the excess in the time series for each time interval over which the threshold is exceeded. The distribution of burst sizes is two component with a crossover in behavior at thresholds ≈1000 nT. Above this threshold, we find a range over which the mean burst size varies weakly with threshold for both solar maxima and minima. The burst size distribution of the largest events is exponential. The relative likelihood of these large events varies from one solar maximum and minimum to the next. Given the relative overall activity of a solar maximum/minimum, these results constrain the likelihood of extreme events of a given size.


Introduction
Quantifying the large-scale dynamic response of the magnetosphere to solar wind driving is central to our understanding of solar wind-magnetosphere coupling. This dynamic response is complex, with the "state" of the magnetosphere affecting its response to the driver. Auroral geomagnetic indices coupled with in situ solar wind monitors provide a comprehensive data set, spanning several solar cycles. We can characterize these observations in terms of "space climate" by quantifying how the statistical properties of ensembles of these observed variables vary between different phases of the solar cycle.
There has been a long-standing interest in characterizing space weather relevant time series in terms of the statistics of events or bursts, where a single burst is defined as the area under the curve of the segment of the time series that is continuously above a threshold.
Statistics are now robustly established for solar flares, where Datlowe et al. [1974], Lin et al. [1984], and Dennis [1985] were some of the first to construct frequency distributions from solar flare hard X-ray observations, and where their systematic characterization in terms of power laws has been well established [Crosby et al., 1993; Lee et al., 1993; Crosby et al., 1998; Georgoulis et al., 2001; Aschwanden and Parnell, 2002]. Solar surface magnetic fields seen in SOHO magnetograms also show power laws [Parnell et al., 2009]. Heavy tailed statistics of fluctuations in solar wind magnetic energy density are seen by monitors at L1. Their distributions show solar cycle dependence [Hnat et al., 2007, 2011].
While all of these studies characterize the statistics of bursts with power law regions, they recognize that there is a departure from this self-affine behavior for the largest events. One approach to modeling such distributions is that of Extreme Value Theory (EVT), which is formally applied to model the likelihood of the observables themselves [Coles, 2001] rather than bursts as such. EVT has so far been of limited application in space weather, but notable pioneering exceptions include application to the aa index [Siscoe, 1976; Silbergleit, 1999] and the Dst index [Silbergleit, 1996; Tsubouchi and Omura, 2007]. Another application of EVT, to energetic electron fluxes, was by Koons [2001]. The latter paper, further developed by O'Brien et al. [2007], used the Generalized Extreme Value (GEV) distribution to establish an unexpected upper bound to the flux of relativistic "killer" electrons. Ground effects [Thomson, 2007] have only recently begun to be studied using the GEV framework [Thomson et al., 2011; Beggan et al., 2013]; see also its application in the solar wind [Moloney and Davidsen, 2010].
The success of these studies suggests that even the largest events may have statistics that are robustly definable from the observations. In this letter we explore the properties of the burst distribution of a well-studied geomagnetic index, AE, with emphasis on the largest events, and its variation with the solar cycle. We find that there are clear, robust trends in how the burst size distribution of AE depends on the threshold used to define the bursts. At all phases of the solar cycle there is a clear transition to a single, approximately exponential burst size distribution of larger events. These large events follow the same distribution but a greater fraction fall into this category at solar maximum compared to solar minimum. These model-independent results provide constraints on the upper limits of activity above a threshold.
This letter identifies robust aspects of the distribution of bursts in AE. The size of a burst event is defined as the integrated area under the curve of an excursion of the time series above a threshold. In geomagnetic indices, the size of a burst characterizes the overall geomagnetic effect of an event. We find that plots of the mean burst size as a function of threshold readily identify the well-known [e.g., Freeman et al., 2000b; Uritsky et al., 2001] multicomponent nature of AE. Above a threshold of ≈1000 nT, the mean burst size becomes weakly dependent on threshold. We see the same behavior at both solar maxima and minima; at the maxima, and at more active minima, a larger fraction of burst events are found in this >1000 nT population. We find that these large burst events (hereafter LBEs) have a probability distribution of burst sizes that is independent of threshold and is exponential for LBE size >2000 nT min. This can inform statistical prediction of the largest, space weather relevant events.

Constructing Bursts From AE
We perform statistical analysis of the full AE data set as derived by the WDC [Davis and Sugiura, 1966]. The AE geomagnetic index is sampled every minute and the full data set is available from January 1978 to June 2014. From this data set we will also draw yearlong samples centered on solar maxima and solar minima; the date of each solar maximum/minimum is determined from the monthly smoothed sunspot number. The dates of the solar maxima are as follows: cycle 21, December 1979; cycle 22, July 1989; and cycle 23, March 2000. The dates for the solar minima are as follows: cycle 22, September 1986; cycle 23, May 1996; and cycle 24, January 2008.
In order to characterize the overall impact of a given event, we will focus on the statistics of burst sizes rather than of the raw data set. Events are defined in the time series as the contiguous group of data samples that exceed a given threshold. Aggregating these samples then suppresses observational uncertainties associated with individual data points while quantifying the overall impact of the event. In particular, AE is constructed from the data from the station with maximum response at any given time. Since the stations are spatially sparse and nonuniform, the peak of AE can significantly undersample a given event peak.
The method for constructing the burst time series is shown in Figure 1. The size of each burst is the integrated area between the segment of the time series above threshold and the threshold value. Thus, for the AE time series (nT) sampled every minute, the natural unit of a burst is nT min. For any given threshold, the original time series gives an ensemble of burst sizes S which we can quantify statistically. We have tested the robustness of all the results presented here against varying the (yearlong) time window within which an ensemble of bursts is constructed; we recalculated all quantities having offset the time window by 1 month. In general, the statistics of burst size will depend on the threshold. An isolated burst that has a single, well-defined peak will decrease in size as the threshold increases so that if all bursts were of this type and of roughly the same peak height, the mean size would decrease with threshold. However, there is a variation in peak heights of the bursts so that as the threshold increases successively, smaller bursts with lower peaks tend to zero size first and drop out of the average. Also, the geomagnetic index time series are multipeaked; a single storm event can encompass several large peaks. In this case a single large burst at low threshold can divide into several, smaller bursts as the threshold is increased. At very high thresholds, when there are only a few isolated bursts remaining, one can see a fall off, but here the statistics are not sufficient to meaningfully discuss the mean. We exclude bursts of duration shorter than 5 min from the analysis as these bursts are not sufficiently statistically well resolved.
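The burst construction described above can be sketched in a few lines. This is a minimal illustration rather than the code used for the analysis: the function name `find_bursts` and its parameters are hypothetical, the series is assumed to be evenly sampled with no data gaps, and a burst still in progress at the end of the series is assumed to be kept.

```python
def find_bursts(series, threshold, dt=1.0, min_duration=5.0):
    """Return a list of (size, duration) pairs, one per burst.

    A burst is a contiguous run of samples strictly above `threshold`.
    Its size is the summed excess over the threshold multiplied by the
    sampling interval `dt` (nT min for 1 min AE data); its duration is
    the run length multiplied by `dt`. Bursts shorter than
    `min_duration` are discarded.
    """
    bursts = []
    size = 0.0
    length = 0
    for value in series:
        if value > threshold:
            # Still inside a burst: accumulate the excess over threshold.
            size += (value - threshold) * dt
            length += 1
        elif length > 0:
            # The burst has just ended: keep it if it is long enough.
            if length * dt >= min_duration:
                bursts.append((size, length * dt))
            size, length = 0.0, 0
    # Assumption: a burst that reaches the end of the series is kept.
    if length > 0 and length * dt >= min_duration:
        bursts.append((size, length * dt))
    return bursts


# Toy series (arbitrary units), threshold 4, minimum duration 3 samples.
toy = [0, 5, 7, 9, 7, 6, 2, 8, 8, 1]
print(find_bursts(toy, threshold=4, min_duration=3))  # [(14.0, 5.0)]
```

In the toy series the second excursion lasts only two samples and is therefore excluded, in the same way that bursts shorter than 5 min are excluded from the AE analysis.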
We plot mean burst size $\bar{S}$ as a function of the threshold $T$ in Figures 2 and 3. Each point on these plots is an estimate of the mean $\bar{S}(T)$, based on a sample size $N(T)$; $N$ decreases as $T$ increases. The main panel of Figure 2 plots this for the entire AE data set (black line) 1978-2014 and for an ensemble formed from the three solar maximum intervals, that is, the three years of data centered on December 1979, July 1989, and March 2000. In the inset we show the same plot except that the three solar maxima are now plotted as separate, yearlong ensembles. The error on the mean is indicated by the error bars on these plots; this is given by $\bar{S} \pm F \times \sigma(S)/\sqrt{N}$, where $\sigma(S)$ is the standard deviation of the $N(T)$ samples associated with the threshold $T$, and $F$ is the inverse of the t distribution at the 2.5% and 97.5% quantiles. We immediately see a clear crossover at a threshold of ≈1000 nT between a roughly linear decrease and a flattening out of these curves with increasing threshold. Events which contribute to the mean burst size estimate for thresholds >1000 nT are events which at their peak amplitude are >1000 nT; these are the large burst events (LBEs). As we move to higher thresholds the number of bursts used to estimate the mean decreases, and the statistical uncertainty can be seen to increase. Within these uncertainties for the full AE data set, the LBEs have mean burst size that is essentially independent of the threshold. Figure 3 is plotted in the same format as Figure 2 but now for the years centered on solar minima (September 1986, May 1996, and January 2008). The same crossover in behavior and flattening out of the curves is seen. The main difference in these curves is the extent of the region >1000 nT, that is, the range of LBE peak amplitudes that occur. As we might expect, there are larger amplitude events occurring at solar maximum compared to solar minimum.
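The estimate of the mean burst size and its error bar at a single threshold can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `mean_burst_size_ci` is hypothetical, and the normal quantile from the Python standard library is substituted for the t distribution quantile used in the text. The two agree closely only when the number of bursts N is large; where SciPy is available, `scipy.stats.t.ppf(0.975, N - 1)` would follow the text more closely.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_burst_size_ci(sizes, level=0.95):
    """Return (mean, lower, upper) for a list of burst sizes.

    The interval is mean +/- F * sigma / sqrt(N), where sigma is the
    sample standard deviation and N the number of bursts.
    Assumption: F is taken from the standard normal distribution as a
    large-N stand-in for the t distribution quantile.
    """
    n = len(sizes)
    if n < 2:
        raise ValueError("need at least two bursts to estimate an error")
    s_bar = mean(sizes)
    sigma = stdev(sizes)
    f = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    half_width = f * sigma / sqrt(n)
    return s_bar, s_bar - half_width, s_bar + half_width
```

Repeating this calculation over a grid of thresholds, with the burst sizes recomputed at each threshold, gives one point and one error bar per threshold. The widening of the interval at high thresholds follows directly from the decrease of N.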
Comparing Figures 2 and 3, looking at the ensembles over three successive maxima (main panel), the largest 20 events at solar maximum are at >2420 nT, whereas at solar minimum they are at >1770 nT (for the entire distribution over the full data set, the largest 20 events occur above 3500 nT). However, there is considerable variation between one solar maximum/minimum and the next, so that the level of activity can overlap. The largest 20 events at the maximum of cycle 21 are at thresholds >1650 nT, whereas the largest 20 events of the minimum of cycle 22 are at thresholds >1740 nT.
Figure 4. The CDF of burst sizes is plotted for thresholds between 1000 nT and 5000 nT. We plot log(1 − CDF) to expand the tail of the distributions. For each threshold, the CDF falls on a similar curve. It is roughly exponential at burst sizes greater than $3 \times 10^4$ nT min. An exponential function $\exp(-S/S_0)$ for $S_0 = 2.73 \times 10^4$ nT min is plotted (offset, red line) for comparison.
In summary, above a 1000 nT threshold, the mean burst size of (i) the entire AE data set does not vary with threshold and is constant within the errors; (ii) solar maximum for cycles 21-23 taken together (main panel of Figure 2) is constant up to ≈2000-2500 nT, after which it falls off, with no data beyond 3000 nT; and (iii) solar minimum for cycles 22-24 taken together (main panel of Figure 3) is constant within the errors. Note that at very high thresholds, when there are only a few isolated bursts remaining, one can see a fall off with increasing threshold, but here the statistics are not sufficient to meaningfully discuss the mean. Smaller ensembles such as those for individual solar maxima and minima (shown inset in Figures 2 and 3) will more clearly show this fall off.

Statistical Properties of Bursts
We now consider whether the full probability distribution of burst sizes is independent of the threshold. Figure 4 plots the burst size complementary cumulative distribution function (ccdf); for a variable $x$, $\mathrm{ccdf}(X)$ is the probability that $x > X$, that is, $1 - \mathrm{cdf}(X)$. We have calculated the ccdf of the LBE burst size using thresholds between 1000 nT and 5000 nT; the resulting multiple curves are overplotted in Figure 4. We can see that within the uncertainties there is good collapse of these curves onto a single envelope, suggesting that there is a single function for the LBEs, independent of threshold, that captures the burst size probability. Varying the threshold varies the size of a given burst so that different thresholds use the same data to map out different regions of the same curve. The plot is on semilog axes and for the larger events is roughly linear. We overplot a straight line fit and estimate that for burst sizes >4000 nT min, $\mathrm{ccdf}(X) \sim \exp(-X/X_0)$ with $X_0 = 2.73 \times 10^4$ nT min, which is a characteristic burst size of an LBE. The burst duration $\tau$ varies monotonically with the burst size $S$, such that $S \sim \tau^{\alpha}$ where $\alpha \approx 1.5$ for thresholds lower than 3000 nT. Above this, the number of events is sufficiently small that we cannot accurately estimate the exponent $\alpha$. This relationship between burst size and duration is consistent with previous results (see, e.g., the larger, longer-duration events in Uritsky et al. [2001]). Burst size statistics are usually presented as a probability density function (pdf), and this is (minus) the derivative of the ccdf. Since the LBE ccdf tends to an exponential for large events, the pdf will fall off exponentially also. This is the characteristic roll off often seen at large values of burst size pdfs reported to have a power law range of dependence [Freeman et al., 2000a; Riley, 2012].
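The ccdf and the straight line fit on semilog axes can be sketched as below. This is a minimal illustration, not the fitting procedure used for Figure 4: the function names `empirical_ccdf` and `fit_exponential_scale` are hypothetical, the burst sizes are assumed to be distinct, and an unweighted least squares fit of log(ccdf) against burst size is assumed for the tail.

```python
from math import log


def empirical_ccdf(samples):
    """Return [(x, P(sample > x)), ...] at each sorted sample value.

    The largest sample has an empirical ccdf of zero, which cannot be
    shown on a log axis, so it is omitted. Assumes distinct values.
    """
    xs = sorted(samples)
    n = len(xs)
    return [(x, (n - i - 1) / n) for i, x in enumerate(xs[:-1])]


def fit_exponential_scale(points, x_min):
    """Estimate X0 in ccdf(X) ~ exp(-X / X0) for X > x_min.

    Fits a straight line to log(ccdf) against X by unweighted least
    squares and returns X0 = -1 / slope.
    """
    tail = [(x, log(p)) for x, p in points if x > x_min and p > 0]
    if len(tail) < 2:
        raise ValueError("need at least two tail points for a line fit")
    n = len(tail)
    mean_x = sum(x for x, _ in tail) / n
    mean_y = sum(y for _, y in tail) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in tail)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in tail)
    slope = s_xy / s_xx
    return -1.0 / slope
```

Applying `empirical_ccdf` to the burst sizes obtained at each threshold and overplotting the results is one way to check for the collapse onto a single envelope. The points of an empirical ccdf are correlated and the last few are noisy, so a fitted scale from this sketch should be read as indicative only.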

Conclusions
The long, well-curated time series of geomagnetic indices provide a comprehensive data set with which to quantify space climate, that is, how the statistical likelihood of activity of a given intensity varies with the solar cycle. We have characterized space weather events by their burst size as seen in the AE index. The statistical distribution of burst sizes is two component with a crossover in behavior at a threshold ≈1000 nT. For events above this threshold, the burst distribution varies only weakly as the threshold is increased. This behavior is robust for both solar maxima and minima. The likelihood of events occurring at these high thresholds (>1000 nT) is larger at solar maximum compared to solar minimum but does vary from one solar cycle to the next. The full distribution of burst sizes of the largest events is exponential.
Importantly, we have shown here that this exponential roll off region has an envelope which has a functional form that is independent of the threshold used to construct the bursts. This functional form could then be used as a basis to characterize the likelihood of these large bursts. As it is an exponential roll off, it suggests that estimates based on power law extrapolations to the largest events will tend to overestimate the burst size and hence the severity of these largest events.
Taking these results together could assist in the quantification of space climate, as it varies within a solar cycle, and across one solar cycle to the next. Above ≈1000 nT, the mean burst size becomes weakly dependent on threshold within uncertainties. The range of thresholds at which the largest events occur varies both between solar maximum and solar minimum, and from one solar maximum (or minimum) to the next. The full distribution of bursts also varies weakly with threshold. This could assist in the prediction of space climate for a given solar maximum or minimum. Once it is determined that solar activity is sufficient for events to be seen above a given threshold, the occurrence likelihood of the burst size of all events below that threshold (and >1000 nT) is known for that solar maximum or minimum.
Finally, the procedure used here is related to the method of mean residual life plots, where one plots the mean excess, that is, the mean of the values of the data points above a threshold. For data that follow generalized extreme value statistics, the mean excess varies linearly with threshold. We constructed mean residual life plots for yearlong samples of AE but found that the mean excess of the raw data is not a robust quantity; it fluctuates significantly, by as much as 125%, if one varies the time interval of the data by 1 month. However, the mean excess is just the mean burst size/mean burst duration [Lawrance and Kottegoda, 1977; Watkins et al., manuscript in preparation], and we have found that both mean burst size and duration statistics are robust against varying the time interval of data. This then offers a robust methodology for estimating the mean excess.
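The relation between the mean excess and the ratio of mean burst size to mean burst duration can be checked numerically with a short sketch. The function names are hypothetical, and the relation is exact here only under the assumption that every excursion above the threshold is counted as a burst; excluding short bursts, as is done for the AE analysis, would make it approximate.

```python
from itertools import groupby


def mean_excess(series, threshold):
    """Mean of (value - threshold) over all samples above the threshold."""
    excess = [v - threshold for v in series if v > threshold]
    return sum(excess) / len(excess)


def bursts(series, threshold):
    """Return (size, duration in samples) for every run above the threshold."""
    out = []
    for above, run in groupby(series, key=lambda v: v > threshold):
        if above:
            run = list(run)
            out.append((sum(v - threshold for v in run), len(run)))
    return out


def mean_size_over_mean_duration(series, threshold):
    """Ratio of mean burst size to mean burst duration."""
    found = bursts(series, threshold)
    mean_size = sum(size for size, _ in found) / len(found)
    mean_duration = sum(duration for _, duration in found) / len(found)
    return mean_size / mean_duration


toy = [0, 5, 7, 9, 7, 6, 2, 8, 8, 1]
# Both expressions reduce to (total excess) / (number of samples above
# the threshold), so they agree to rounding error.
print(mean_excess(toy, 4), mean_size_over_mean_duration(toy, 4))
```

The number of bursts cancels in the ratio, which is why the two quantities coincide: the sum of burst sizes is the total excess, and the sum of burst durations is the number of samples above the threshold.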