Long-term occurrence probabilities of intense geomagnetic storm events
Abstract
[1] We have quantitatively assessed the occurrence probability of intense geomagnetic storms (peak Dst < −100 nT) by analyzing the Dst index time series database from 1957 to 2001. The main purpose was to derive two parameters, the probable intensity S_{T} and the occurrence frequency λ_{t}, that can act as proxies for long-term space weather quantities. The intensity S_{T} represents the expected maximum storm level with an occurrence rate of 1/T (a^{−1}, where a is years) and has been derived from the probability density function (PDF) of extreme (∣Dst∣ > 280 nT) storms. The mathematical tool used to determine this type of PDF is extreme value modeling, which provides more accurate statistics for extreme behavior. Our results give S_{60} ≈ 589 nT, indicating that the March 1989 storm (the event with the largest ∣Dst∣ in the database) corresponds to an event expected to occur only once every 60 a. The other parameter, λ_{t}, gives the average occurrence rate of storm events. We have tested the null hypothesis that the storm occurrence pattern can be modeled as a Poisson process represented by λ_{t}, where different λ_{t} exist for the active and quiet periods of the solar cycle. Ordinary χ^{2} tests of goodness of fit cannot reject this hypothesis, except within the periods that include extremely frequent occurrences. The rate λ_{t} is approximately 2.3 (0.7) per 3 months in the active (quiet) period. A future practical application of this work is that the resultant Poisson probability will enable us to calculate the expected damage due to storms, which represent potential risks in space activities.
1. Introduction
[2] Many space weather effects, especially those hazardous to human activities, are associated with the consequences of a geomagnetic storm. Hence a probabilistic assessment of likely future storm occurrence is one of the top priorities for space weather forecasting. Extreme events, in spite of their infrequent occurrence, are particularly important, since such events produce fatal damage to space assets, even if they occur only once on a timescale of decades (e.g., the March 1989 storm [see Kappenman and Albertson, 1990]).
[3] The storm occurrence is normally indicated by a large decrease of the Dst index, which represents the hourly average disturbance of the geomagnetic field in the Earth's low-latitude region and is a measure of the ring current intensity. We have concentrated on intense events (typically characterized by a peak Dst < −100 nT [Gonzalez et al., 1994]), which have the potential to be most disastrous to the Earth-space environment. The major interplanetary drivers of intense magnetic storms are coronal mass ejections (CMEs) [Echer et al., 2006; Gonzalez et al., 2007]. Corotating interaction regions (CIRs) also lead to storm development. However, the CIR-driven storms exhibit a weaker intensity than the CME-driven ones and mainly develop during the solar declining phase [Tsurutani et al., 2006; Gonzalez et al., 2007].
[4] Recently, Borovsky and Denton [2006] classified the dominant storm aftermath according to the triggering cause; for instance, CME-driven storms can be more hazardous to Earth-based electrical systems because of effects such as geomagnetically induced currents (GIC), while CIR-driven ones are more effective in spacecraft surface charging. Once these interplanetary structures are specified by monitoring solar activity in real time, we can issue storm alerts with possible damage predictions to space weather-related organizations. Predictions of this type should be accomplished within the time lag between the event occurring on the Sun and its arrival at the Earth-space environment. An hourly to daily range is the typical timescale, and we will refer to this as a “short-term” forecast.
[5] The present study, on the other hand, focuses on a different aspect of space weather, a “long-term” forecast, which is related to monthly to yearly scales. While the short-term prediction can be utilized for a real-time warning, the long-term one is required for future risk management of space activities, such as satellite operations. In this situation we are more interested in when and how frequent the storm occurrences are, or how large they are, rather than how the storm occurs. Our purpose is to estimate the storm risk quantities through their probable intensity and occurrence frequency. The analysis is based on statistical modeling of the time series database. We use the Dst as an indicator of the storm occurrence, not of its individual properties. In section 2 we give the basic statistics of the Dst data (1957–2001) used in this study. From this long-term database, we evaluate the following parameters, which describe the statistical properties of a storm occurrence.
[6] The first parameter, S_{T}, is the probable storm intensity within a given T-year period. For instance, S_{10} is the value such that, on average, at least one storm with peak ∣Dst∣ > S_{10} occurs per 10 years (a). We can use S_{T} as an indicator of the maximum risk from space weather effects of storm origin. In order to quantify S_{T}, it is necessary to determine the distribution function of Dst accurately, especially in the extreme range. However, the number of data points with ∣Dst∣ > S_{T} is generally so small that the statistical error is large.
[7] We have used extreme value theory (EVT) [see, e.g., Coles, 2001; Reiss and Thomas, 2001] as an appropriate tool to reduce this inaccuracy. EVT is a statistical theory that focuses only on the behavior of the upper tail of the distribution function and has previously been used in other fields, such as regional precipitation analysis in hydrology [e.g., Katz et al., 2002]. To our knowledge, the first use of EVT in space science was by Koons [2001], who used it to estimate the statistics of extreme space weather events, such as high-energy radiation belt fluxes. Recently, O'Brien et al. [2007] further developed the scheme and found the existence of a finite upper limit in extremely high MeV electron fluxes in the outer zone. The evaluation of S_{T} in the present study is an extension using EVT. In section 3, statistical modeling of the extreme Dst by means of EVT is briefly introduced. This results in the straightforward derivation of S_{T} as a function of year.
[8] The other important parameter is the storm occurrence frequency, λ_{t} (the subscript t indicates time variance). The statistical properties of waiting times ΔT, the time interval between successive events, lead to an evaluation of λ_{t}. For solar flares the ΔT of observed successive flare bursts has been shown to display a power law distribution ∼(ΔT)^{α}, especially for large ΔT (>10 h) [e.g., Boffetta et al., 1999; Wheatland and Litvinenko, 2002]. Power law behavior is often regarded as evidence of self-similarity and self-organized criticality (SOC) [e.g., Bak et al., 1988]. In magnetospheric studies, Choe et al. [2002] performed a similar analysis for ΔT between the peaks in AL and showed that the AL time series has the dynamical property of SOC. On the other hand, Wheatland and Litvinenko [2002] explored a simple theory that the power law feature in the waiting-time distribution (WTD) can be accounted for in terms of a Poisson process with a time-varying rate λ_{t}, if the λ_{t} distribution has a power law form ∼ λ_{t}^{δ} (α ∼ −3 − δ). The index α was shown to vary with the solar cycle.
[9] Borovsky and Denton [2006] indicated that the occurrence pattern of CME-driven storms is irregular, while CIR-driven ones have a 27-d repeating feature. Since CME-driven storms dominate the intense events, following the work of Wheatland and Litvinenko [2002], we assume that the interval of storm occurrences is governed by a solar cycle-dependent Poisson process with a minor component of a 27-d periodicity. Storm occurrence strongly relates to solar activity, though it is not a strict one-to-one correspondence. Thus properties of the storm WTD can be used to infer the probabilistic manifestation of solar activity on the geoenvironment. In section 4 we give the WTD of intense storm events from a selected event set. The primary investigation is to validate a Poisson process and find the dependence of the rate λ_{t} on the solar cycle. Section 5 summarizes the results with some brief discussion.
2. Basic Statistics of Dst and Storms
[10] The Dst database we used is available from the World Data Center for Geomagnetism, Kyoto University, Japan (http://swdcwww.kugi.kyoto-u.ac.jp/index.html) and consists of hourly values from 1957 to 2001 (total of 394,464 points). Figure 1 shows the probability density function (PDF) of Dst on a log-log scale. The plot range is limited to Dst ≤ −10 nT (the unit “nT” will not explicitly be shown hereafter). We divide the PDF into the normal (d_{n}: Dst > −280) and extreme (d_{ex}: Dst ≤ −280) regimes, where the decay in d_{ex} follows a power law A(∣Dst∣)^{α} with index α ∼ −4.9. This critical value, Dst ∼ −280, will be referred to again in the next section. The whole PDF has a fat-tail property, which indicates more frequent occurrence of extreme events (large ∣Dst∣) than expected under the assumption of a Gaussian distribution. Although the PDF form seems to change at Dst ∼ −40, we do not investigate d_{n} in any further detail in this paper.
[11] In the present analysis, intense storm events are identified as follows. We extract the data with Dst < −100 (4632 points, 1.2% of the whole database). Any run of consecutive data points with Dst < −100 is considered to be one storm. We require each event to be as independent as possible. Therefore, if the interval between the end time of one storm and the start time of the next is less than 48 h, we count both storms as one event. Through these procedures, 322 events have been obtained.
[12] The event time refers to the time when Dst reaches its minimum during the event. We have constructed two data sets. One is the storm intensity I_{s}, defined by Dst at the event time, and the other is the storm waiting time T_{s}, defined by the interval between consecutive event times. Figure 2 shows a scatter plot of (I_{s}, T_{s}). Whereas most of the events fall in the range I_{s} > −280 nT and T_{s} < 5000 h, events that combine an extraordinarily large intensity with a long waiting time appear to be absent. In other words, giant events (I_{s} < −280 nT; 26 events in the present storm data set) do not happen suddenly when there has been no event for more than half a year (T_{s} > 5000 h). Almost all the giant events (25 among 26) were seen during the solar active phase, where the mean occurrence rate is high, about one event per month. Therefore the present feature is reasonable in that the occurrence probability of an event with both I_{s} < −280 nT and T_{s} > 5000 h is extremely low: ∼1% if a Poisson distribution is assumed for the occurrence frequency (details on the occurrence frequency are given in section 4).
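The event selection described in paragraphs [11] and [12] can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the hour-index representation of time, and the reading of the 48-h rule as "start hour of the next run minus end hour of the previous run" are assumptions, and the short Dst sequence is synthetic.

```python
def find_storm_events(dst, threshold=-100.0, min_gap_hours=48):
    """Return (event_hours, intensities) from an hourly Dst sequence.

    Assumption: time is the list index in hours, and the 48-h rule is
    applied to (start of next run) - (end of previous run).
    """
    # Step 1: collect runs of consecutive hours with Dst below the threshold.
    runs = []
    start = None
    for hour, value in enumerate(dst):
        if value < threshold:
            if start is None:
                start = hour
        elif start is not None:
            runs.append((start, hour - 1))
            start = None
    if start is not None:
        runs.append((start, len(dst) - 1))

    # Step 2: merge runs separated by less than min_gap_hours into one event.
    merged = []
    for run_start, run_end in runs:
        if merged and run_start - merged[-1][1] < min_gap_hours:
            merged[-1] = (merged[-1][0], run_end)
        else:
            merged.append((run_start, run_end))

    # Step 3: the event time is the hour of minimum Dst within each event,
    # and the storm intensity I_s is the Dst value at that hour.
    event_hours = [min(range(s, e + 1), key=lambda h: dst[h]) for s, e in merged]
    intensities = [dst[h] for h in event_hours]
    return event_hours, intensities


def waiting_times(event_hours):
    """Waiting time T_s (hours) between consecutive event times."""
    return [b - a for a, b in zip(event_hours, event_hours[1:])]


# Synthetic hourly Dst values (nT), for illustration only.
dst = ([-20] * 10 + [-120, -150, -110] + [-30] * 20 + [-105]
       + [-20] * 100 + [-200, -300] + [-10] * 5)
hours, i_s = find_storm_events(dst)
print(hours, i_s, waiting_times(hours))
```

In the synthetic series the first two runs are 21 h apart and are merged, while the third run starts 101 h later and is kept as a separate event.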
[13] Figure 3 (top) shows the yearly occurrence histogram of intense storm events between 1957 and 2001. In Figure 3 (bottom) the corresponding yearly profile of the sunspot number is plotted along with the maximum storm intensity. Here, the sunspot number (solid line) is the yearly averaged value, and the maximum storm intensity (dashed line) is given by the yearly minimum Dst. This figure highlights the strong correlations between intense storm occurrence (C_{1}), solar activity (C_{2}), and the most intense storm level (C_{3}) in a year. The best correlation is between C_{1} and C_{3} (correlation coefficient is 0.87), while the correlation coefficients of C_{1} − C_{2} and C_{2} − C_{3} are 0.77 and 0.74, respectively. The following two sections give more detailed analyses, which will be useful for establishing quantitative schemes for long-term space weather prediction.
3. Probable Intensity
[14] To produce a long-term forecast, we propose a scheme of future probable intensity (S_{T}) evaluation. S_{T} is an important index to estimate the maximum risk during the operation period of space-related missions.
3.1. Extreme Value Statistics
[15] Intense storms are regarded as extreme events that rarely happen. They correspond to the lower tail of the total Dst distribution. To describe the statistical behavior in such an extreme range precisely, we apply extreme value theory (EVT), which avoids the bias introduced by the bulk of the distribution. The main aim of EVT is to build a parametric model of the probability distribution function F_{E}(x) focused on the extreme data; in the present study this model is determined with the peak-over-threshold (POT) method. Coles [2001] and Reiss and Thomas [2001] are both excellent textbooks for a concise understanding of EVT.
[17] Determining the appropriate threshold μ is of critical importance for the validity of the POT method. While μ should be large enough to assure the extreme properties, a larger μ can leave too little data for a meaningful analysis. There is no general, systematic way of finding the best estimate of μ. Ideally, μ should be the lowest value above which the exceedance distribution obeys the same generalized Pareto distribution (GPD) (constant γ, σ). One of the methods commonly used is to examine the mean excess function M(u) = E[X − u∣X > u], which represents the mean value of the exceedance data subset over a threshold u. For data that follow a GPD, M(u) is a linear function of u (see the textbooks cited above for details). The parameter μ can be identified as the minimum u above which this linearity holds.
[18] Figure 4 shows the mean excess function of −Dst as a function of the threshold u. The graph appears to be linear beyond u ∼ 280, which coincides well with the distribution range of d_{ex} in Figure 1. Thus the data subset of −Dst > 280 can be expected to obey an identical distribution to that represented by the form of the GPD. Note here that previous studies show other cutoff values of extreme storms: −Dst > 200 ∼ 250 [e.g., Tsurutani et al., 1992; Gonzalez et al., 2007]. Storms above these cutoff values were caused by the CME-related processes, such as magnetic clouds or shock compression. Conversely, no physical processes are associated with the present threshold (−Dst ∼ 280). Our criterion is simply based on the requirement of EVT.
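The mean excess diagnostic behind Figure 4 can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical and the sample values are invented for illustration, not taken from the Dst database. For data that follow a GPD with shape γ < 1, the textbook result is M(u) = (σ + γ(u − μ))/(1 − γ) for u ≥ μ, which is the linearity one looks for in the plot.

```python
def mean_excess(values, u):
    """Empirical mean excess M(u) = mean of (x - u) over all x > u."""
    excesses = [x - u for x in values if x > u]
    if not excesses:
        return None  # no data above the threshold
    return sum(excesses) / len(excesses)


def mean_excess_curve(values, thresholds):
    """Pairs (u, M(u)) to be plotted against u and checked for linearity."""
    return [(u, mean_excess(values, u)) for u in thresholds]


# Hypothetical -Dst sample (nT), for illustration only.
sample = [285, 290, 300, 310, 330, 350, 400, 450, 589]
for u, m in mean_excess_curve(sample, range(280, 401, 40)):
    print(u, m)
```

In practice the curve becomes noisy at high thresholds because few points remain, which is one reason the choice of μ involves judgment.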
[20] Numerical techniques are required to solve this. Stephenson and Gilleland [2006] reviewed several tools for extreme value analysis that are commonly used. Among the software tools they reviewed, we use the R package “ismev,” which was originally developed in the S-PLUS language by Coles [2001] (“R” is a free software environment for statistical computing and graphics; see http://www.r-project.org/).
[21] In Table 1 we summarize the basic information on the present extreme value analysis (the bottom two rows will be referred to in sections 5.1 and 5.2). The number of exceedance data (−Dst > 280; 121 data points) confirms the rare occurrence (∼0.03%) of such events. The distribution function can be found by substituting the estimated parameters (γ = 0.177 ± 0.117, σ = 38.2 ± 5.6) into the GPD form (2), as displayed in Figure 5. Also plotted are the corresponding Dst values, indicated by crosses. The cumulative probability from the Dst data is evaluated as follows. We sort the exceedance −Dst data into ascending order, {x_{i}} = {x_{1} ≤ x_{2} ≤ ⋯ ≤ x_{k}; X > 280}, where x_{1} = 281 and x_{k} = x_{121} = 589. Under the assumption that each x_{i} is equally weighted, the cumulative probability for x_{i}, Pr[X ≤ x_{i}∣X > 280], can be estimated as i/(k + 1). The crosses in Figure 5 indicate the points (x_{i}, i/(k + 1)), where x_{i} is the −Dst value larger than 280 and k = 121. The derived GPD is a good fit to the real Dst distribution.
Method | Data | Dst (min/max) | μ | γ | σ |
---|---|---|---|---|---|
POT (1957–2001) | 121 | −589/−281 | 280 (fixed) | 0.177 ± 0.117 | 38.2 ± 5.6 |
POT (1957–2003) | 139 | −589/−281 | 280 (fixed) | 0.081 ± 0.101 | 45.8 ± 6.0 |
Block-maxima | 45 | −589/−91 | 192.2 ± 13.7 | 0.031 ± 0.126 | 80.2 ± 10.2 |
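The comparison between the empirical cumulative probability i/(k + 1) and the fitted GPD can be sketched as below. The GPD form (2) is not reproduced in this section, so the code assumes the standard parameterization W(x) = 1 − (1 + γ(x − μ)/σ)^{−1/γ} for x ≥ μ; the function names are hypothetical and the default parameters are the first row of Table 1.

```python
import math


def gpd_cdf(x, mu=280.0, gamma=0.177, sigma=38.2):
    """Standard GPD cumulative probability Pr[X <= x | X > mu] (assumed form)."""
    if x <= mu:
        return 0.0
    if gamma == 0.0:
        return 1.0 - math.exp(-(x - mu) / sigma)  # exponential limit
    z = 1.0 + gamma * (x - mu) / sigma
    if z <= 0.0:
        return 1.0  # beyond the finite upper limit (only possible for gamma < 0)
    return 1.0 - z ** (-1.0 / gamma)


def plotting_positions(exceedances):
    """Points (x_i, i/(k+1)) for the exceedances sorted in ascending order."""
    ordered = sorted(exceedances)
    k = len(ordered)
    return [(x, i / (k + 1)) for i, x in enumerate(ordered, start=1)]


# Largest observed exceedance: empirical position 121/122 versus fitted GPD.
print(121 / 122, gpd_cdf(589.0))
```

For the largest value, x_{121} = 589, the empirical estimate 121/122 ≈ 0.992 and the assumed GPD value are close, consistent with the good fit described in the text.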
3.2. Estimation of S_{T}
[22] One important characteristic of the GPD (2) is that the shape parameter γ determines the behavior in the extreme limit [e.g., O'Brien et al., 2007]. If γ < 0, (2) gives a finite upper limit, μ + σ/∣γ∣. On the other hand, if γ ≥ 0 the distribution is unbounded. The present result finds γ = 0.177 ± 0.117, which suggests an unbounded tail, or extremely large upper limit, in the Dst distribution. Thus we cannot reject the possibility that an extreme storm, exceeding any recorded in the existing Dst database, may take place in the future. For example, Tsurutani et al. [2003] investigated the 1–2 September 1859 magnetic storm (considered to be the most intense event in recorded history) and estimated the Dst of this event to be ∼−1760 nT.
[23] EVT is a useful tool for evaluating the occurrence probability of such never-observed events by extrapolating from the GPD. The proxy parameter is S_{T}, a function of the period T in years, defined as the level for which events with −Dst ≥ S_{T} occur on average at least once per T years.
T, a | S_{T} |
---|---|
10 | 450.8 ± 26.7 |
20 | 501.3 ± 42.7 |
30 | 533.8 ± 55.2 |
50 | 578.2 ± 74.8 |
100 | 645.3 ± 109.2 |
200 | 721.2 ± 154.4 |
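The central values in the table above can be approximated with the standard POT return-level formula from the EVT textbooks, S_T = μ + (σ/γ)[(rT)^γ − 1], where r is the mean number of threshold exceedances per year. The sketch below assumes r = 121/45 (121 hourly exceedances in 45 a) and the first-row parameters of Table 1; whether this is exactly the authors' expression for S_T is an assumption, and the error bars are not computed.

```python
import math

MU, GAMMA, SIGMA = 280.0, 0.177, 38.2   # first row of Table 1
EXCEEDANCES_PER_YEAR = 121 / 45.0       # assumed rate of -Dst > 280 data points


def return_level(t_years, mu=MU, gamma=GAMMA, sigma=SIGMA,
                 rate=EXCEEDANCES_PER_YEAR):
    """Level exceeded on average once per t_years (standard POT formula)."""
    m = rate * t_years  # expected number of exceedances in t_years
    if gamma == 0.0:
        return mu + sigma * math.log(m)
    return mu + sigma / gamma * (m ** gamma - 1.0)


for t in (10, 20, 30, 50, 100, 200):
    print(t, round(return_level(t), 1))
```

Under these assumptions the computed levels agree with the tabulated S_{T} to within about 1 nT, which suggests that the tabulated values follow this kind of formula.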
[26] We also evaluate the occurrence probability of the storm events identified in the previous section (restricted to 45-a observations): when there are n^{#} (among 322) events with an intensity I_{s} larger than S, S should be compared with S_{T} where T = 45/n^{#}. In Figure 6 the points (45/n^{#}, S) are represented by dots. Whereas S_{T} is overestimated for T < 10 a (S_{T} > S), the long-term (longer than the solar cycle) estimation agrees relatively well within the confidence intervals. The most intense storm during the period for which Dst is available is the March 1989 event, which has a −589 nT peak decrease in Dst. The S_{T} formulation indicates that such an event takes place at least once every ∼60 a. However, the corresponding error estimate is large (30 to 350 a; S_{60} ∼ S_{30} + SD(S_{30}) ∼ S_{350} − SD(S_{350})). A longer T yields larger errors in S_{T}. For the extreme long-term forecast (several decades to hundreds of years), the error magnitude SD(S_{T}) should also be treated as a critical parameter, since it indicates the future uncertainty level and complements the ambiguity of using S_{T} alone.
4. Occurrence Frequency
[27] In the previous section we evaluated how frequently extreme storms can be expected to happen within a given period, with scales of decades. This timescale is far beyond the solar cycle so that it is useful for estimating the probability of the most disastrous event for any space activity. This section focuses on a shorter scale, several months to a few years. Over such timescales, it is more interesting to investigate the probability of the occurrence frequency λ_{t} of storms rather than the intensity. The estimation of λ_{t} can further clarify the process of storm occurrence.
4.1. Relationships With the Solar Cycle
[28] It is easy to associate storm occurrence with solar activity. Recently, Tsurutani et al. [2006] briefly summarized the solar cycle dependence of storms and confirmed more frequent occurrences during the solar maximum. In section 2 we have isolated 322 storm events where the peak Dst is less than −100 nT. We cumulatively count these events by their occurrence order, which is shown by dots in Figure 7. For instance, the first count is plotted at 21 January 1957 and the final one (322) at 24 November 2001. The corresponding monthly averaged sunspot number (SN) is also shown (indicated by the solid line).
[29] The count profile in Figure 7 shows a zigzag increment, indicating the variable rate of storm occurrence. The rate is high (low) where the slope of the increment curve is steep (flat). Major kink points on the curve distinguish the solar maximum/minimum. Visual inspection shows that these points roughly coincide with SN = 40, where a horizontal broken line is drawn. Therefore we define the solar active and quiet periods to be divided at SN ∼ 40. The quiet period in our determination is shaded in Figure 7. The divided periods are indicated in the first column of Table 3. Note here that SN is not strictly more or less than 40 during each period, since 3 months are later taken as the unit interval t_{u} for estimating the average occurrence.
Period | Storms | N_{0} | N_{1} | N_{2} | N_{3} | N_{4} | N_{5} | N_{6} | N_{7} | N_{8} | λ_{t} | p-Value |
---|---|---|---|---|---|---|---|---|---|---|---|---|
'57/01–'61/09 | 71 | ** | 1 | 3 | 2 | 9 | 3 | ** | 1 | ** | 3.7 | <0.05 |
'61/10–'66/03 | 9 | 13 | 2 | 2 | 1 | ** | ** | ** | ** | ** | 0.5 | 0.05 |
'66/04–'74/12 | 41 | 10 | 16 | 4 | 3 | 2 | ** | ** | ** | ** | 1.2 | 0.35 |
'75/01–'77/06 | 6 | 6 | 2 | 2 | ** | ** | ** | ** | ** | ** | 0.6 | 0.30 |
'77/07–'84/06 | 58 | 4 | 7 | 8 | 4 | 2 | 3 | ** | ** | ** | 2.1 | 0.80 |
'84/07–'87/12 | 10 | 6 | 6 | 2 | ** | ** | ** | ** | ** | ** | 0.7 | 0.54 |
'88/01–'93/06 | 69 | 1 | 3 | 3 | 6 | 6 | 2 | ** | ** | 1 | 3.1 | <0.05 |
'93/07–'97/12 | 20 | 6 | 6 | 4 | 2 | ** | ** | ** | ** | ** | 1.1 | 0.95 |
'98/01–'01/12 | 38 | 1 | 3 | 6 | 4 | ** | 1 | 1 | ** | ** | 2.4 | 0.50 |
Active | 277 | 16 | 30 | 24 | 19 | 19 | 9 | 1 | 1 | 1 | 2.3 | 0.18 |
Quiet | 45 | 31 | 16 | 10 | 3 | ** | ** | ** | ** | ** | 0.7 | 0.33 |
Note: The occurrence distribution N_{k} (k = 0, ⋯, 8) is counted every 3 months. Asterisks represent no occurrences, and λ_{t} indicates the expectation value of a Poisson distribution P_{λ_{t}}(k), equivalent to the average occurrence frequency within a unit observing period, here 3 months.
[31] In Figure 9 the WTDs during the solar active and quiet periods are drawn separately with solid and dotted lines, respectively. The average waiting time is about 960 h for the active period and 2800 h for the quiet period. Note that narrow peaks are found at T_{s} ∼ 300, 650, and 1200 h in the WTD for the quiet period. In particular, the peak at T_{s} ∼ 650 h ∼ 27 d suggests the predominance of recurrent storms with this period. This confirms that storms during the declining phase of the solar cycle are dominated by CIR-driven ones [e.g., Borovsky and Denton, 2006]. On the other hand, we do not at present have a firm explanation for the other two peaks. The 2-month periodicity (the peak at T_{s} ∼ 1200 h) may imply recurrent storms driven by CIRs from the same source region that fail to develop into storms at the Earth during some of their passages. The peak at T_{s} ∼ 300 h, which suggests a periodicity of about half a month, may indicate sequential arrivals of different CIRs within one solar rotation period. Further in-depth analysis is necessary to interpret the significance of these peaks.
[32] Short-interval storms (T_{s} < hundreds of hours) predominantly take place during the active period: the occurrence rate is approximately one order of magnitude larger than that in the quiet period. This spread shrinks for events with longer intervals and dissipates, or is even reversed, for T_{s} > 3000 h ∼ 4 months. In both periods the tail of the WTD for T_{s} > 1000 h can again be fitted to a power law; α ∼ −2.2 ± 0.2 for the active period and −1.4 ± 0.2 for the quiet period.
[33] Wheatland and Litvinenko [2002], in their investigation of solar flare statistics, presented a theory to account for the power law behavior in the WTD in terms of a nonstationary Poisson process with a time-varying rate λ_{t}. The assumption is that the total WTD is represented by the sum of piecewise Poisson processes each of which involves a slow variation of λ_{t} with respect to the waiting time. If the λ_{t} distribution exhibits a power law λ_{t}^{δ}, the WTD accordingly has a power law tail T_{s}^{α} where α ∼ −3 − δ. Wheatland and Litvinenko [2002] applied this theory to solar flare occurrences and showed that the occurrence rate distribution of X-ray flares greater than C1-class during 1975–2001 has a power law form with δ ∼ −0.9 ± 0.1. This results in a power law index for the flare WTD α ∼ −3 − δ = −2.1 ± 0.1, which is consistent with the observed index for T_{s} > 10 h, α ∼ −2.2 ± 0.1. Wheatland and Litvinenko [2002] also gave the power law indices for the solar maximum and minimum phase. Those indices, together with our results, are summarized in Table 4.
Event (T_{s} range) | Total α | Active α | Quiet α |
---|---|---|---|
Storm (>1000 h) | −2.2 ± 0.1 | −2.2 ± 0.2 | −1.4 ± 0.2 |
Flare (>10 h) | −2.2 ± 0.1 | −3.2 ± 0.2 | −1.4 ± 0.1 |
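The relation α ∼ −3 − δ quoted from Wheatland and Litvinenko [2002] can be sketched as a short derivation. The notation below is assumed for illustration and does not appear in this paper: f(λ)dλ is the fraction of time during which the rate lies between λ and λ + dλ, and λ_{0} is the mean rate. Each piecewise Poisson interval contributes an exponential waiting-time density weighted by its share of events, which gives:

```latex
% Sketch under assumed notation: f(\lambda) is the time distribution of the rate,
% \lambda_0 the mean rate, T_s the waiting time.
\begin{align}
P(T_s) &= \frac{1}{\lambda_0}\int_0^{\infty} f(\lambda)\,\lambda^{2}\,
          e^{-\lambda T_s}\,d\lambda,
\qquad \lambda_0 = \int_0^{\infty} \lambda\, f(\lambda)\,d\lambda, \\
f(\lambda) \propto \lambda^{\delta}
&\;\Longrightarrow\;
P(T_s) \propto \int_0^{\infty} \lambda^{2+\delta}\,e^{-\lambda T_s}\,d\lambda
 = \Gamma(3+\delta)\,T_s^{-(3+\delta)}, \\
\alpha &= -3 - \delta \qquad (\delta > -3).
\end{align}
```

With δ ∼ −0.9 for flares this gives α ∼ −2.1. Applied in reverse to the storm indices in the table above, α ∼ −2.2 corresponds to δ ∼ −0.8 and α ∼ −1.4 to δ ∼ −1.6.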
[34] Although not every storm is caused by flares (CMEs and CIRs are better candidates), Table 4 shows that the power law indices in the WTD coincide between storms and flares for the total (α ∼ −2.2) and quiet (α ∼ −1.4) periods. In contrast, there is a distinct difference for the active period. Recently, Wanliss and Weygand [2007] examined the “burst” lifetime distributions of solar wind parameters and the SYM-H index, both of which yield power law exponents. Their results also indicated that the power laws of the solar wind parameters and SYM-H are consistent during solar minimum but inconsistent during solar maximum. It would be interesting to elucidate the implications underlying such probabilistic consistency/inconsistency, but this is beyond the scope of the present study.
[35] Here we focus on the similar power law behavior between the WTD of storms and flares. We hypothesize that the storm occurrence is also governed by a nonstationary Poisson process. If the theory of Wheatland and Litvinenko [2002] is valid for the storm case, the rate (λ_{t}) distribution may exhibit a power law, δ ∼ −3 − α = −0.8 (total), −0.8 (active), and −1.6 (quiet). Analysis of the detailed λ_{t} distribution in the storm waiting time is under investigation at present and so we do not further refer to it in this paper. Instead, we have undertaken a conventional test of the significance to verify whether the storm occurrence during both the solar active and quiet periods is likely to be described by a Poisson process.
4.2. Association With Poisson Processes
[37] The second column of Table 3 shows the total number of storms during each period. These values are further distributed to show the number of storms in each 3-month interval (t_{u} = 3 months). For instance, in the first period (January 1957 to September 1961, 57 months), 71 storm events are distributed into 57/3 = 19 intervals t_{u} (i_{1}, ⋯, i_{19}), where i_{1} covers January through March 1957 and i_{19} covers July through September 1961. The occurrence distribution N_{k} represents the number of intervals t_{u} within which exactly k events were identified. In the present instance, one event was identified in the interval (i_{18}), two in (i_{4}, i_{10}, i_{13}), three in (i_{5}, i_{19}), four in (i_{2}, i_{3}, i_{6}, i_{7}, i_{9}, i_{11}, i_{14}, i_{15}, i_{17}), five in (i_{1}, i_{8}, i_{12}), and seven in (i_{16}). Therefore N_{k} is summarized as {N_{k}; k = 0, ⋯, 7} = {0, 1, 3, 2, 9, 3, 0, 1}. Note that ΣN_{k} = 19 and ΣkN_{k} = 71.
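The bookkeeping in this example can be checked with a short Python sketch. The per-interval counts are transcribed from the list of intervals given in the text; the function names are hypothetical, and taking the Poisson rate as the sample mean ΣkN_{k}/ΣN_{k} is an assumption that is consistent with the λ_{t} values in Table 3.

```python
from collections import Counter


def occurrence_distribution(counts_per_interval):
    """N_k: number of unit intervals that contain exactly k events."""
    tally = Counter(counts_per_interval)
    k_max = max(counts_per_interval)
    return [tally.get(k, 0) for k in range(k_max + 1)]


def poisson_rate(n_k):
    """Mean events per unit interval, sum(k * N_k) / sum(N_k)."""
    total_intervals = sum(n_k)
    total_events = sum(k * n for k, n in enumerate(n_k))
    return total_events / total_intervals


# Events in i_1 ... i_19 (January 1957 to September 1961), from the text.
counts = [5, 4, 4, 2, 3, 4, 4, 5, 4, 2, 4, 5, 2, 4, 4, 7, 4, 1, 3]
n_k = occurrence_distribution(counts)
print(n_k, sum(n_k), sum(counts), round(poisson_rate(n_k), 1))
```

The same two functions applied to the combined rows of Table 3 give rates of about 2.3 (active) and 0.75 (quiet) per 3 months.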
[39] The last two columns in Table 3 show the estimated λ_{t} and the resultant p-value. In every period except the first (January 1957 to September 1961) and the seventh (January 1988 to June 1993), H_{0} cannot be rejected at the 5% significance level (α = 0.05) normally used in significance tests, since the p-value is larger than α. The two periods where the hypothesis can be rejected (p-value ≤ α) have a common feature: both are in the active period, and both contain intervals with extremely frequent storm occurrences (k = 7, 8; far from the average occurrence). In such cases the occurrence rate must vary in time such that the assumption of a constant λ_{t} within t_{u} becomes invalid. For instance, some successive events may originate from a single solar active region, where a Poisson process with a different λ_{t} predominates. Another prominent process for storm occurrences is the 27-d recurring pattern, most evident during the quiet period (Figure 9). However, the recurrent storm events fall at k ∼ 1–3, which is close to the average. Therefore the form of P_{λ_{t}}(k) is not greatly deformed, and the occurrence of quiet-period storms remains well approximated by a Poisson process.
[40] It is also the case that the deviation from a Poisson distribution caused by such extra occurrences becomes smaller as the number of statistical samples increases. Figure 10 shows the results of P(k) for the total active (closed circles) and quiet (open circles) periods, with the solid and dotted lines showing the fitted P_{λ_{t}}(k), respectively. The resultant p-values are larger than the 5% significance level (Table 3). Thus H_{0} cannot be rejected. On average, storm occurrence can be well modeled as a Poisson process dependent on the solar cycle.
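A χ^{2} goodness-of-fit test of this kind can be sketched with the Python standard library. The details of the paper's test are not given in this section, so the binning below is an assumption: one bin per k up to the largest observed k, with the last bin absorbing the Poisson upper tail, and degrees of freedom equal to the number of bins minus 2 (one for normalization, one for the estimated λ_{t}). All function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events for mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable with integer dof."""
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)                 # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))      # Q(1/2, y)
    while a < dof / 2.0:                         # Q(a+1, y) = Q(a, y) + y^a e^-y / Gamma(a+1)
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def poisson_gof(n_k):
    """Return (lambda, chi-square, p-value) for observed counts N_0..N_kmax.

    n_k must be trimmed of trailing zeros and have at least three bins.
    """
    n = sum(n_k)
    lam = sum(k * c for k, c in enumerate(n_k)) / n
    probs = [poisson_pmf(k, lam) for k in range(len(n_k) - 1)]
    probs.append(1.0 - sum(probs))               # last bin: k >= k_max (assumed)
    chi2 = sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(n_k, probs))
    return lam, chi2, chi2_sf(chi2, len(n_k) - 2)


# Combined quiet-period row of Table 3 (N_0 .. N_3).
print(poisson_gof([31, 16, 10, 3]))
```

With these choices the combined quiet-period row gives a p-value near the tabulated 0.33. Other rows may differ somewhat, since the result is sensitive to how sparsely populated bins are treated.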
5. Summary and Discussion
[41] The present study has formulated a practical scheme for evaluating a quantitative long-term space weather forecast. Intense geomagnetic storms are one of the most important space weather phenomena, and we have statistically analyzed their occurrence in the 45-a database containing the Dst index between 1957 and 2001. We have developed two parameters that describe the probability of storm occurrences over long timescales, i.e., monthly to yearly range; the probable intensity S_{T} and the occurrence frequency λ_{t}. The results are summarized as follows.
[42] 1. The distribution of −Dst is significantly skewed and exhibits a power law tail (index ∼ −4.9) for −Dst larger than 280 nT. On the basis of extreme value theory, the cumulative probability distribution focused on such an extreme data set can be approximated by the generalized Pareto distribution function W_{μ;γ,σ}(x), where μ = 280, γ = 0.177 ± 0.117, and σ = 38.2 ± 5.6.
[43] 2. The GPD determined from the data fitting gives the probable intensity S_{T} as function of year T. A storm event such that its peak −Dst > S_{T} is estimated to happen at least once within a T-year period. For example, the solution for S_{T} ∼ 589 is T ∼ 60 a, indicating that the occurrence probability of the most intense event during 1957–2001 (March 1989) can be evaluated as approximately 1/60 (a^{−1}).
[44] 3. There are 322 intense storm events that satisfy our definition, namely, a peak Dst of less than −100 nT and a separation of more than 48 h between one storm and the next. Storms with a larger peak intensity (less than −280 nT) have a shorter waiting time T_{s} from the previous event (less than 5000 h). Most of these extreme events (25/26) were seen during a solar active period. This feature is also supported by the strong correlation of solar activity with both the annual occurrence frequency and the largest intensity of storms.
[45] 4. The storm waiting time distribution (WTD) exhibits a power law tail in the range T_{s} > 1000 h with an index ∼−2.2 ± 0.1, which is consistent with the WTD of X-ray solar flares greater than C-class given by Wheatland and Litvinenko [2002]. The storm WTD has also been evaluated for solar active and quiet periods, divided at the monthly averaged sunspot number ∼40. For both cases, the WTDs show a power law behavior for T_{s} > 1000 h, with their indices given by −2.2 ± 0.2 (−1.4 ± 0.2) for the active (quiet) periods.
[46] 5. The conventional χ^{2}-test of goodness of fit has been applied to test whether the storm occurrence obeys a solar-cycle-dependent Poisson process. The results show that this hypothesis cannot be rejected on average, and the occurrence frequency λ_{t} is given by λ_{active} ∼ 2.3 and λ_{quiet} ∼ 0.7 per 3 months.
[47] Below we outline some additional points stemming from our analysis and perspectives relevant to further research.
5.1. Ambiguity in S_{T} Estimation
[48] Let us calculate from equation (6) the return period T that satisfies S_{T} ∼ 1760, the estimated level of the most intense storm in recorded history (for which Dst is unavailable) [Tsurutani et al., 2003]. The solution is extremely large (T > 40,000 a), which is too uncertain to be credible. When the practical use of S_{T} (such as designing protection for instruments against storm damage) is taken into account, users may be tempted to extrapolate S_{T} to extremely high levels that have never been observed. Such extrapolation, however, should be judged against physical knowledge and statistical accuracy.
[49] The underlying physics that accounts for the tail distribution of the storm intensity is currently unknown. Furthermore, Tsurutani et al. [2003] indicate that the statistics for extreme storms with Dst < −400 nT are unreliable and any statistical evaluation of extreme behavior is inaccurate. As the next best approach to describe the extreme statistics, we have introduced EVT in this study. The present results for S_{T} can at least be used as a first rough estimate of the largest storm level within a given period T, although the large uncertainty in the extreme long-term range should be kept in mind.
[50] Since there are few data in the tail part of the distribution, the EVT parameters are sensitive to the presence of even one event in the future. The variation in the shape parameter (γ) is particularly important in that γ characterizes whether the tail distribution is unbounded or not. We perform an additional test by using the Dst database extended to the end of 2003 (total 47 a) to show how the recent Halloween event (late October to early November 2003) affects the GPD parameters (μ, σ, γ).
[51] The same threshold of μ = 280 then gives 139 exceedance data points. The estimated scale and shape parameters are (σ, γ) = (45.8 ± 6.0, 0.081 ± 0.101) (the second row in Table 1). Compared with the results obtained in section 3.1 (the first row in Table 1), the updated shape parameter is closer to zero and is even negative within the confidence interval, which would imply an upper limit on S_{T} (if γ ∼ 0.081 − 0.101 = −0.02, the limit will be μ + σ/∣γ∣ ∼ 2500). This is because there are no events with −Dst > 589 (the level of the March 1989 event) in the 2002–2003 Dst data, so that the tail distribution is effectively made thinner. The return period for S_{T} = 589 is then T ∼ 75 a, which is longer than our previous estimate of 60 a but is not drastically altered. On the other hand, the occurrence of an extremely intense event, such as S_{T} ∼ 1760 of the September 1859 storm, would be expected to increase γ, that is, to produce a fat-tail distribution. Then T for a fixed S_{T} will be shortened. In any case, the ambiguity in γ will shrink as the event sample increases in the future. The tail behavior within a timescale of a few decades will correspondingly be described more accurately, while the reliability of the extreme (∼several hundreds of S_{T}) event forecast may remain doubtful.
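The return periods quoted in this subsection can be approximately reproduced with a short sketch. Equation (6) is not repeated here, so the code assumes the standard peaks-over-threshold relation T = 1 / (r [1 + γ(S − μ)/σ]^{−1/γ}), where r is the mean number of exceedances per year (taken as 121/45 for 1957–2001 and 139/47 for 1957–2003). The function names and the dictionary layout are illustrative only.

```python
import math


def return_period(level, mu, sigma, gamma, rate_per_year):
    """Return period T (years) for which S_T equals `level`.

    Assumes a GPD tail above the threshold mu and a mean exceedance
    rate `rate_per_year`; this is the textbook POT relation and is
    assumed, not confirmed, to match the paper's equation (6).
    """
    if abs(gamma) < 1e-12:
        survival = math.exp(-(level - mu) / sigma)  # exponential limit
    else:
        z = 1.0 + gamma * (level - mu) / sigma
        if z <= 0.0:
            return math.inf  # level lies beyond the upper bound (gamma < 0)
        survival = z ** (-1.0 / gamma)
    return 1.0 / (rate_per_year * survival)


def upper_bound(mu, sigma, gamma):
    """Upper limit mu + sigma/|gamma| of the GPD when gamma < 0."""
    return mu + sigma / abs(gamma) if gamma < 0 else math.inf


# Parameter values quoted in the text (Table 1, first and second rows).
fit_2001 = dict(mu=280.0, sigma=38.2, gamma=0.177, rate_per_year=121 / 45)
fit_2003 = dict(mu=280.0, sigma=45.8, gamma=0.081, rate_per_year=139 / 47)

t_1989_old = return_period(589.0, **fit_2001)
t_1989_new = return_period(589.0, **fit_2003)
t_1859_old = return_period(1760.0, **fit_2001)
bound = upper_bound(280.0, 45.8, 0.081 - 0.101)

print(f"T(S=589),  1957-2001 fit: {t_1989_old:.0f} a")
print(f"T(S=589),  1957-2003 fit: {t_1989_new:.0f} a")
print(f"T(S=1760), 1957-2001 fit: {t_1859_old:.0f} a")
print(f"upper limit for gamma = -0.02: {bound:.0f}")
```

Small differences from the values in the text are to be expected, because the sketch uses the rounded parameter values and an assumed form of the return-level relation.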
5.2. Extreme Value Model
[53] According to the POT results, the average exceedance rate per year is approximately 2.69 (121/45 a). However, a single event frequently contains multiple data points with −Dst > 280. If events with multiple −Dst > 280 data points are counted only once, the number of extreme storm events reduces to 26 (0.58 per a). This demonstrates why it is important to always check for overestimation when using the POT method. In addition, these events have been identified almost exclusively (25 of 26) in solar active periods. As can be seen in Figure 6, S_{T} is overestimated for return periods T of less than 10 a. Therefore it is expected that S_{T<10} strongly depends on the phase of solar activity. To further improve the scheme of S_{T} evaluation, it is important to investigate the dependence of the extreme distribution parameters (μ(t), γ(t), σ(t)) on the solar cycle.
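The difference between counting exceedance data points and counting events can be sketched as follows. The grouping rule used here (exceedances separated by more than a minimum gap belong to different events) and the 48 h gap, borrowed from the storm definition, are assumptions of the sketch; the exact rule behind the count of 26 events is not specified in this section. The function name `decluster` and the toy series are hypothetical.

```python
def decluster(times_h, values, threshold, min_gap_h):
    """Group threshold exceedances into events.

    Consecutive exceedances separated by at most `min_gap_h` hours are
    merged into one event; each event is reported as the time and value
    of its peak. Returns a list of (peak_time_h, peak_value).
    """
    events = []
    last_exceed_time = None
    for t, v in zip(times_h, values):
        if v <= threshold:
            continue
        if last_exceed_time is None or t - last_exceed_time > min_gap_h:
            events.append((t, v))    # start a new event
        elif v > events[-1][1]:
            events[-1] = (t, v)      # update the peak of the current event
        last_exceed_time = t
    return events


# Toy hourly -Dst series (nT): two storms, each with several hourly
# values above 280. These numbers are invented for illustration.
series = [100, 290, 350, 300, 150, 90] + [50] * 60 + [285, 310, 200]
hours = list(range(len(series)))

n_exceed = sum(1 for v in series if v > 280)
events = decluster(hours, series, threshold=280, min_gap_h=48)

print(f"exceedance data points: {n_exceed}")
print(f"declustered events:     {len(events)}")
```

In this toy series the raw exceedance count overstates the number of storms, which is the overestimation effect described above.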
5.3. λ_{t} Estimation
[54] We have determined the mean occurrence rate per unit interval (t_{u} = 3 months) separately for the solar active and quiet periods (Table 3). Here a constant Poisson rate λ_{t} within each period is the primary assumption. This assumption may be too crude, since solar activity exhibits fluctuations on much shorter timescales. In the equivalent analysis for solar flare occurrences, Wheatland and Litvinenko [2002] evaluated time-varying rates more precisely by means of the Bayesian block method [Scargle, 1998]. They found that a power law feature in the λ_{t} distribution, together with a power law tail in the event WTD, accounts for the occurrence property as a time-dependent Poisson process. Applying this method to the storm case is the next step. Nevertheless, our present assumption remains valid as a first approximation, and the storm occurrence can be associated with different Poisson processes according to the solar phase. In order to confirm a Poisson process from an event frequency distribution (Table 3), a suitable choice of t_{u} is important. Units that are too short bias the frequency to low rates (0 to 1 per t_{u}), while units that are too long reduce the total number of intervals. In both cases, the results become statistically unreliable. The 3-month period we use here appears to be an appropriate choice, as the goodness-of-fit test suggests the acceptance of our null hypothesis H_{0} (Table 3).
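A goodness-of-fit test of this kind can be sketched as follows. The event counts are synthetic (100 hypothetical 3-month intervals drawn from a Poisson distribution with rate 2.3), since the observed frequencies of Table 3 are not reproduced here. The binning (0 to 4 events, plus a pooled bin for 5 or more) and the 5% significance level are assumptions of the sketch, and the critical values are standard tabulated χ^{2} quantiles.

```python
import math
import random

# Tabulated 5% critical values of the chi-square distribution by
# degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def sample_poisson(lam, rng):
    """Draw one Poisson variate (Knuth's multiplication method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def chi2_poisson(counts, k_max):
    """Chi-square statistic for H0: counts follow a Poisson law.

    Bins are k = 0 .. k_max-1 plus a pooled bin 'k >= k_max'. The rate
    is estimated from the data, so dof = (number of bins) - 2.
    """
    n = len(counts)
    lam = sum(counts) / n
    observed = [sum(1 for c in counts if c == k) for k in range(k_max)]
    observed.append(sum(1 for c in counts if c >= k_max))
    probs = [poisson_pmf(k, lam) for k in range(k_max)]
    probs.append(1.0 - sum(probs))
    expected = [n * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return lam, chi2, len(observed) - 2


# Synthetic number of storms in each 3-month interval (not real data).
rng = random.Random(1)
counts = [sample_poisson(2.3, rng) for _ in range(100)]

lam_hat, chi2, dof = chi2_poisson(counts, k_max=5)
verdict = "cannot reject" if chi2 < CHI2_CRIT_05[dof] else "reject"
print(f"rate = {lam_hat:.2f} per interval, chi2 = {chi2:.2f}, dof = {dof}")
print(f"{verdict} H0 at the 5% level")
```

Repeating the calculation with counts rebinned to a shorter or longer t_{u} shows the trade-off described above: short units concentrate the counts in the 0 and 1 bins, while long units leave few intervals and small expected frequencies.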
k | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Active (λ_{t} = 2.3) | 90 | 67 | 40 | 20 | 8 |
Quiet (λ_{t} = 0.7) | 50 | 16 | 3 | 0.5 | 0.07 |
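The entries in the table above appear to be consistent with the Poisson probability, in percent, of at least k storms occurring within one 3-month interval, P(N ≥ k) = 1 − Σ_{j<k} e^{−λ} λ^{j}/j!. This reading is an inference from the numbers, since no caption accompanies the table here. A minimal sketch of the calculation, with the hypothetical function name `prob_at_least`:

```python
import math


def prob_at_least(k, lam):
    """P(N >= k) for N following a Poisson distribution with rate lam."""
    below = sum(
        math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k)
    )
    return 1.0 - below


# Rates per 3 months quoted in the text for the two solar phases.
for label, lam in (("Active", 2.3), ("Quiet", 0.7)):
    percents = [100.0 * prob_at_least(k, lam) for k in range(1, 6)]
    print(label, " ".join(f"{p:.2g}" for p in percents))
```

With the rounded rates, the active-period row and the first three quiet-period entries are reproduced. The last two quiet-period values come out slightly larger (about 0.6 and 0.08), which may reflect an unrounded λ_{t} in the original calculation.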
5.4. Future Work
[56] The present study proposes two parameters, S_{T} and λ_{t}, as possible indices for a long-term space weather forecast. Statistical approaches have been taken to estimate these parameters, and any physical processes connecting the Sun and the geospace environment have been ignored. We note that identification based on Dst alone is occasionally misleading for defining intense storm events [Kamide, 2006]. Storms manifest themselves in many hazardous forms (e.g., GIC and SEP), whose driving source is either a CME or a CIR [e.g., Borovsky and Denton, 2006]. Analyses tailored to such individual hazards will develop the present approach into a more rigorous risk assessment scheme for expanding space activities, where a thorough comprehension of physical causality will assist the correct identification of the event.
[57] The validity of the present analysis can be checked by using the latest available data, which have not been included in the construction of the statistical models. In statistics-based modeling, the continuous accumulation of data allows reevaluation and renewal of proxy parameters such as S_{T} and λ_{t}. Since 2002, several serious events (e.g., the Halloween event) have taken place. We will investigate how accurately our results give the occurrence probability of these events in terms of S_{T} and λ_{t} and will modify the parameters to account for the additional data (as already begun in section 5.1). Before the next solar cycle becomes active, we hope to be able to establish a long-term predictor for the occurrence of geohazardous magnetic storms.
Acknowledgments
[58] The authors acknowledge the World Data Center for Geomagnetism (Kyoto) for the use of the Dst index database. This work was partially supported by 17GS0208 for Creative Scientific Research “The Basic Study of Space Weather Prediction” of the Ministry of Education, Science, Sports and Culture of Japan.