A new technique for determining Substorm Onsets and Phases from Indices of the Electrojet (SOPHIE)

We present a new quantitative technique that determines the times and durations of substorm expansion and recovery phases and possible growth phases based on percentiles of the rate of change of auroral electrojet indices. By being able to prescribe different percentile values, we can determine the onset and duration of substorm phases for smaller or larger variations of the auroral index or indeed any auroral zone ground‐based magnetometer data. We apply this technique to the SuperMAG AL (SML) index and compare our expansion phase onset times with previous lists of substorm onsets. We find that more than 50% of events in previous lists occur within 20 min of our identified onsets. We also present a comparison of superposed epoch analyses of SML based on onsets identified by our technique and on existing onset lists and find that the general characteristics of the substorm bay are comparable. By prescribing user‐defined thresholds, this automated, quantitative technique represents an improvement over any visual identification of substorm onsets or indeed any fixed threshold method.


Introduction
Substorms are the elemental dissipative events in the coupled solar wind-magnetosphere-ionosphere system that process ~10^15 J of captured solar wind energy during their lifetime [Tanskanen et al., 2002]. In the magnetosphere, substorm expansion phases are accompanied by a dipolarization of the magnetotail magnetic field, an injection of energetic particles into the inner magnetosphere, a reduction of the magnetic flux within the magnetotail lobes, and a diversion of the cross-tail current into the ionosphere. In the ionosphere, substorm expansion phases are accompanied by a brightening and expansion of the nightside aurora [Akasofu, 1964], electromagnetic ULF waves (for a review, see Rae, I.J. and Watt, C.E.J., ULF waves above the nightside auroral oval during substorm onset, accepted in AGU Geophysical Monograph Series) and an enhancement in the auroral electrojet current due to the cross-tail current diversion, which results in a deflection of the magnetic field at ground level.
Substorms are typically broken down into three phases: growth, expansion, and recovery. During the growth phase, first identified by McPherron [1970] and which, on average, lasts 30-90 min [Li et al., 2013], magnetic flux is added to the magnetotail lobes through reconnection at the dayside magnetopause. This process enhances magnetospheric convection [Axford, 1969] and the ionospheric electrojets, resulting in a small deflection of the H component of the ground magnetic field at auroral latitudes [McPherron, 1970]. As lobe magnetic flux increases, the auroral oval moves equatorward [Coumans et al., 2007] and the temperature of the plasma sheet increases [Forsyth et al., 2014]. At the onset of the expansion phase (often referred to as the substorm onset) there is an exponential increase in the auroral intensity [Voronkov et al., 2003] and ULF wave activity [Voronkov et al., 2003; Rae et al., 2012] and the aurora expands poleward over ~15 min [Partamies et al., 2013; Chu et al., 2015]. In the magnetosphere, magnetotail currents are diverted into the ionosphere through field-aligned current systems [McPherron et al., 1973], enhancing the westward electrojet and resulting in the formation of sharp negative bays in the H component of the ground magnetic field [Akasofu and Chapman, 1961; Davis and Sugiura, 1966]. The end of the expansion phase and start of the recovery phase are indicated by a reduction of the auroral intensity, ULF wave activity, and strength of the westward electrojet currents. Over ~1 h, the magnetospheric current systems are reorganized and distinct auroral features, such as "omega bands" and pulsating auroral patches, are observed [Opgenoorth et al., 1994]. While isolated substorms usually follow this growth-expansion-recovery paradigm, events with multiple onsets or intensifications, seen as expansion phases occurring immediately following a recovery phase, are also reported [Pytte et al., 1976].
In order to determine these and other repeatable physical processes during substorms, lists of substorm onsets have been generated using data from space- and ground-based auroral imagers and ground-based magnetometers [Liou et al., 2001; Frey et al., 2004; Nishimura et al., 2010; Newell and Gjerloev, 2011]. These lists have then formed the basis of statistical studies, such as superposed epoch analyses [e.g., Kistler et al., 2006; Gjerloev et al., 2007; Boakes et al., 2009; Milan et al., 2009].
There is a clear link between auroral brightenings at substorm onset and enhancements in the westward electrojet that deflect the H component of the ground magnetic field [Heppner, 1954; Akasofu, 1964]. As such, auroral indices which combine data from magnetometers around the auroral zone are a useful alternative to direct auroral observations for determining substorm onset since these magnetometer data sets are almost continuously available over long periods of time. Specifically, the AL index [Davis and Sugiura, 1966] and SuperMAG AL (SML) index [Newell and Gjerloev, 2011; Gjerloev, 2012] act as virtual magnetometer stations that track the peak of the westward auroral electrojet irrespective of latitude or local time; the two indices differ in the number and latitudinal extent of contributing stations. Yearly means of the auroral indices AU, AL, and AE follow the solar cycle, maximizing in the declining phase [Ahn et al., 2000], and the AL and AE auroral indices have bimodal lognormal distributions [Vassiliadis et al., 1996], with the two distributions being described as quiet or laminar and disturbed or turbulent. In contrast, the distributions of fluctuations in AE are smoothly varying [Consolini and De Michelis, 1998]. As one would expect, SML has similar distributions (shown in the supporting information).
Given the well-defined magnetic field profile of the expansion and recovery phases in auroral zone magnetic indices, individual substorm phases can also be readily identified. Here we propose a quantitative technique that not only determines substorm expansion phase onsets but also identifies the onset and duration of each individual substorm phase using auroral indices. We demonstrate the advantages of using this technique by applying it to the SML index. We compare the substorm expansion phase onset times derived using our technique with previously published onset lists using other methods.

Visual Identification of Auroral Onset
The most commonly referred to techniques for identifying substorm onset hark back to the definition by Akasofu [1964]: "a sudden increase in the brightness of… a quiet arc and subsequent rapid motion of the arc towards the geomagnetic pole." Using this, substorm onsets have been visually identified from space-based auroral imagers [Liou et al., 2001; Frey et al., 2004; Frey and Mende, 2006] and ground-based all-sky cameras. However, these methods are subjective, depending on the observer's judgment as to when an exponentially growing auroral arc started to brighten and whether the aurora expanded, and indeed how far. As a consequence, the results of these studies are nonrepeatable and cannot be applied to different data sets. Quantitative and objective measures are more useful [Murphy et al., 2009b].

Automated Identification of Onset From Auroral Data
In order to address the need for a quantitative measure of auroral onset, Murphy et al. [2014] developed a novel technique for identifying the time interval encompassing substorm onset by maximizing a so-called "brightening factor" calculated from Fourier analysis of the product of auroral intensity and change in auroral intensity from ground-based all-sky cameras. By iteratively applying this technique to successively smaller areas of the auroral images, this technique also identifies the location of the auroral brightening. This technique thus provides an unbiased determination of the time and location of an auroral brightening but requires a predetermined list of possible onset intervals.

Identification of Substorm Onsets From Ground Magnetometer Data
Journal of Geophysical Research: Space Physics 10.1002/2015JA021343
Substorm onsets can be identified with the start of negative H component bays or the start of exponential growth of ULF wave power in ground-based magnetometer data collected from auroral latitudes or auroral indices. Hsu and McPherron [2012] visually inspected 7 h intervals of AL data to identify substorms. Despite a number of criteria for the identification of substorms being listed, such a visual inspection is ultimately subjective, similar to the identification of auroral onsets. Newell and Gjerloev [2011] used a rate of change in SML (−15 nT/min over at least 3 min) to indicate substorm onset. Chu et al. [2015] used intervals in which a midlatitude positive bay index peaked above 25 nT^2 to indicate substorm times. Both Newell and Gjerloev [2011] and Chu et al. [2015] found that their onset lists had a good agreement with auroral onsets determined from a space-based auroral imager by Liou et al. [2001]. Milling et al. [2008] and Murphy et al. [2009a] developed a technique for identifying substorm onset by determining when the magnetic ULF wave power increased above "the mean plus two standard deviations of the quiet time ULF wave power" for auroral zone magnetometers and determined that the onset of exponential ULF wave growth occurred several minutes before visually identified global auroral substorm onset as determined from global auroral imaging and at the start of auroral arc brightening seen in all-sky camera data [Rae et al., 2009].
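To make the contrast with percentile-based detection concrete, the following is a minimal sketch of a fixed-threshold rule of the kind summarized above (a sustained fall of at least 15 nT/min over at least 3 min). It is a simplified reading of that one-line summary, not the full published criteria of Newell and Gjerloev [2011]; the function name and arguments are hypothetical, and the input is assumed to be a gap-free list of SML values at 1 min cadence.

```python
def sustained_drop_onsets(sml, rate=-15.0, min_minutes=3):
    """Indices where SML begins falling by at least |rate| nT per 1 min step
    for min_minutes consecutive steps (simplified fixed-threshold rule)."""
    diffs = [b - a for a, b in zip(sml, sml[1:])]  # nT/min at 1 min cadence
    onsets = []
    run = 0  # length of the current run of qualifying steps
    for i, d in enumerate(diffs):
        if d <= rate:
            run += 1
            if run == min_minutes:
                onsets.append(i - min_minutes + 1)  # first sample of the run
        else:
            run = 0
    return onsets
```

Because the threshold is expressed in physical units, the number of events such a rule returns depends on the overall amplitude of the index, which is the dependence the percentile approach described later is designed to avoid.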
The above substorm onset identifications provide no further information about substorm phases other than the time of the expansion phase onset. Juusola et al. [2011] used the median positive and negative changes in AL as a basis for determining expansion and recovery phase intervals. The statistics of the substorms identified by this technique, such as the median phase lengths and total durations, were examined by Partamies et al. [2013]. Forsyth et al. [2014] applied a similar technique to the SML data. None of these studies specifically compared the calculated onset times with previously published lists of substorm onset. One notable difference between these techniques is that Juusola et al. [2011] only considered intervals in which the interplanetary magnetic field (IMF) had a southward component to be growth phase intervals, whereas Forsyth et al. [2014] took all intervals outside the expansion and recovery phases to be growth phase. This was based on the justification that solar wind coupling functions, such as the ε function [Perreault and Akasofu, 1978], are nonzero for all but purely northward IMF; thus, energy is being added to the system at all times.
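For reference, the ε function is commonly quoted in SI units in the form below. The symbols are not defined in the text above and are given here as assumptions: v is the solar wind speed, B the IMF magnitude, θ the IMF clock angle measured from geomagnetic north, and l_0 an empirical length scale (usually taken to be about 7 Earth radii).

```latex
\varepsilon = \frac{4\pi}{\mu_0}\, v\, B^{2} \sin^{4}\!\left(\frac{\theta}{2}\right) l_0^{2}
```

Since \sin^{4}(\theta/2) vanishes only at \theta = 0, this form makes explicit why the coupling is nonzero for every IMF orientation except purely northward.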

Substorm Onsets and Phases From Indices of the Electrojet (SOPHIE)
In the following, we describe a new "expert system" for identifying substorm onsets as well as the times of each substorm phase. This technique has been developed using the 1 min cadence SuperMAG AL (SML) data set, although in principle it could be applied to any auroral zone magnetic index or ground magnetometer time series.
Since there is no clear threshold value beyond which the data can be said to be indicative of a substorm (see Introduction), we identify substorms in a nonparametric manner on the basis of exceedance of a percentile in the rate of change of SML. We assume that negative changes in SML beyond a user-specified percentile level are indicative of substorm expansion phases and positive changes in SML are due to substorm recovery phases. We do not insist that a recovery phase must follow an expansion phase, since some expansion phases may lead into events such as steady magnetospheric convection (SMC) [Sergeev et al., 1996;Kissinger et al., 2012;Walach and Milan, 2015], although we modify the percentile threshold of positive changes in SML that identify the recovery phase to provide nearly equal numbers of expansion and recovery phases. We note that the average occurrence rate of SMCs is approximately 1/10 of that of substorms [Kissinger et al., 2012]. McPherron [1970] identified the substorm growth phase as "significant deviations" away from a "quiet trace" of the H component from auroral zone magnetometers, although they note that the start of the growth phase is dependent on the definitions of a significant deviation and the quiet trace. Superposed epoch analysis of AL around substorm onset shows that, on average, AL shows a shallow downward trend prior to onset [e.g., Weimer, 1994]. However, on a case-by-case basis this signature is not always apparent, indicating that either substorms are not necessarily preceded by a growth phase or that growth phases do not have a unique signature in these data. Subsequent studies have chosen to take periods of southward IMF prior to an expansion phase onset as the growth phase [e.g., Gjerloev et al., 2003;Juusola et al., 2011;Li et al., 2013]. However, Petrukovich [2000] showed that, particularly for small substorms, growth phase signatures were observed in the magnetotail even when the IMF was weakly northward. 
More recently, Forsyth et al. [2014] considered all nonexpansion and nonrecovery times to be growth phases, arguing that solar wind coupling functions such as the ε function and others [see Milan et al., 2012] are nonzero for all but purely northward IMF. However, solar wind energy input is not a sole indicator of a substorm growth phase, since without a measure for energy loss we cannot determine the net energy gained by the system. As such, we choose to define all times outside the expansion and recovery phases as possible growth phases.
Based on the above, substorm phases are identified in a three-stage process (a flow diagram of each of these procedures is presented in the supporting information). In the first stage, substorm phases are identified by
1. low-pass filtering the data with a 30 min cutoff to remove the effect of ULF waves, commonly seen around substorm onset (see Rae and Watt, submitted, and references therein) and in the recovery phase [Jorgensen et al., 1999];
2. calculating the time derivative of SML (dSML/dt) using a three-point Lagrangian interpolation;
3. calculating the percentiles of dSML/dt < 0 (expansion percentiles, EPs) and dSML/dt > 0 (recovery percentiles, RPs);
4. where dSML/dt is negative and |dSML/dt| is greater than a specified EP threshold (EPT), identifying the time as "expansion phase";
5. where dSML/dt is positive and greater than a specified RP threshold (RPT), identifying the time as "recovery phase";
6. identifying all other intervals as "possible growth phase."
As a result of using different thresholds for identifying the expansion and recovery phases, the above procedure results in short intervals of possible growth phases between expansion and recovery phases. Similarly, short intervals of possible growth phases occur between recovery and expansion phases during substorms with one or more intensifications. In the second stage, we remove these by
1. identifying times when an expansion phase changes into a possible growth phase;
2. for each of these intervals, determining whether there is a recovery phase up to 30 min after the end of the expansion phase;
3. if a recovery phase begins within this 30 min window, finding the minimum SML between the expansion and recovery phases;
4. identifying data prior to the minimum in SML as expansion phase and data following the minimum in SML as recovery phase.
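The derivative and classification steps of the first stage can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the input is assumed to be already low-pass filtered and evenly sampled, the percentile is computed by linear interpolation between sorted values (the text does not say which convention was used), and all function names are hypothetical.

```python
def three_point_derivative(y, h=1.0):
    """Derivative from three-point Lagrangian interpolation on an even grid:
    central differences inside, one-sided three-point formulas at the ends."""
    n = len(y)
    d = [0.0] * n
    d[0] = (-3 * y[0] + 4 * y[1] - y[2]) / (2 * h)
    for i in range(1, n - 1):
        d[i] = (y[i + 1] - y[i - 1]) / (2 * h)
    d[-1] = (3 * y[-1] - 4 * y[-2] + y[-3]) / (2 * h)
    return d


def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation; an assumed convention."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def classify_stage1(dsml, ept=75.0, rpt=75.0):
    """Label each sample as 'expansion', 'recovery', or 'growth' (possible growth)."""
    neg = [-d for d in dsml if d < 0]  # magnitudes of negative changes
    pos = [d for d in dsml if d > 0]   # positive changes
    ep_cut = percentile(neg, ept) if neg else float("inf")
    rp_cut = percentile(pos, rpt) if pos else float("inf")
    labels = []
    for d in dsml:
        if d < 0 and -d > ep_cut:
            labels.append("expansion")
        elif d > 0 and d > rp_cut:
            labels.append("recovery")
        else:
            labels.append("growth")
    return labels
```

Computing the expansion and recovery percentiles separately, from the negative and positive changes respectively, is what allows EPT and RPT to be set independently.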
A similar procedure is used to remove short possible growth phases (<30 min) between recovery and expansion phases using the local maximum in SML to separate the recovery and possible growth phases. Furthermore, we remove the following in order:
1. short (<10 min) expansion phases that occur between two possible growth phases;
2. short (<10 min) recovery phases that occur between two possible growth phases;
3. short (<30 min) possible growth phases that occur between two expansion phases;
4. short (<30 min) possible growth phases that occur between two recovery phases;
5. short (<30 min) recovery phases that occur between two possible growth phases;
6. short (<30 min) recovery phases that occur between possible growth phases and expansion phases.
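The second-stage clean-up can be sketched on a list of per-minute phase labels. The sketch below covers only two of the operations described: bridging a short possible growth interval between an expansion and a recovery phase at the SML minimum, and relabeling a short run that sits between two runs of the same phase. The single-letter labels ("G", "E", "R"), the function names, and the choice to assign the minimum sample itself to the recovery phase are assumptions made for illustration.

```python
from itertools import groupby


def runs(labels):
    """Run-length encode labels as (label, start_index, length) tuples."""
    out, start = [], 0
    for key, grp in groupby(labels):
        n = len(list(grp))
        out.append((key, start, n))
        start += n
    return out


def bridge_expansion_to_recovery(labels, sml, max_gap=30):
    """Split a short 'G' gap between 'E' and 'R' at the minimum of SML."""
    labels = list(labels)
    r = runs(labels)
    for k in range(1, len(r) - 1):
        key, start, n = r[k]
        if key == "G" and n <= max_gap and r[k - 1][0] == "E" and r[k + 1][0] == "R":
            gap = range(start, start + n)
            i_min = min(gap, key=lambda i: sml[i])
            for i in gap:
                labels[i] = "E" if i < i_min else "R"
    return labels


def absorb_short_runs(labels, target, flank, max_len):
    """Relabel runs of `target` shorter than max_len lying between two `flank` runs."""
    labels = list(labels)
    r = runs(labels)
    for k in range(1, len(r) - 1):
        key, start, n = r[k]
        if key == target and n < max_len and r[k - 1][0] == flank and r[k + 1][0] == flank:
            labels[start:start + n] = [flank] * n
    return labels
```

For example, `absorb_short_runs(labels, "E", "G", 10)` would correspond to the first item in the list above, and `absorb_short_runs(labels, "G", "E", 30)` to the third; applying such calls in sequence mirrors the "in order" wording of the text.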
Finally, since filtering the data smoothes out sharp decreases in SML, we adjust the expansion phase onset times to be at the first time at which two successive data points of the unfiltered dSML/dt are less than EPT up to 20 min after the previously determined onset time in order to account for the Gibbs phenomenon [Gibbs, 1898, 1899] that expands in time any sharp changes in the unfiltered data.
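A minimal sketch of this onset adjustment is given below, assuming 1 min cadence, an unfiltered derivative series `dsml_raw`, and an expansion threshold `ep_cut` expressed as a positive magnitude in nT/min. These names, and the choice to keep the original onset when no qualifying pair is found in the window, are assumptions rather than details given in the text.

```python
def refine_onset(onset, dsml_raw, ep_cut, window=20):
    """Move an onset to the first index, within `window` samples after it,
    at which two successive unfiltered derivatives are below -ep_cut."""
    last = min(onset + window, len(dsml_raw) - 2)
    for i in range(onset, last + 1):
        if dsml_raw[i] < -ep_cut and dsml_raw[i + 1] < -ep_cut:
            return i
    return onset  # no qualifying pair found: keep the filtered onset
```

Requiring two successive points guards against a single noisy sample in the unfiltered data being taken as the onset.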
While EPT and RPT can be set arbitrarily and independently, we assume that we have correctly identified the expansion phases and that it logically follows that there are an equal number of expansion and recovery phases. Thus, in the third stage, we iteratively modify RPT, with EPT remaining fixed, to minimize the difference between the number of expansion and recovery phase onsets identified.
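The third stage can be sketched as a search over candidate RPT values. The text says only that RPT is modified iteratively; the exhaustive scan in 1% steps used here is an assumed stand-in, and `count_onsets` is a hypothetical callable that runs stages 1 and 2 for a given (EPT, RPT) pair and returns the numbers of expansion and recovery phase onsets.

```python
def tune_rpt(count_onsets, ept, candidates=range(1, 100)):
    """Return the RPT (in percent) that minimizes the difference between the
    numbers of expansion and recovery phase onsets, with EPT held fixed."""
    def mismatch(rpt):
        n_exp, n_rec = count_onsets(ept, rpt)
        return abs(n_exp - n_rec)
    return min(candidates, key=mismatch)
```

If several RPT values give the same mismatch, `min` returns the lowest of them; a different tie-break could be chosen if a particular behaviour is preferred.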
During substorms, enhancements in the eastward and westward electrojets, and their associated enhancements in the AU and AL indices, are essentially independent [Rostoker, 1972]. At other times, AU and AL can vary in tandem, indicating enhancements in magnetospheric convection and thus in the global current system. We expect SML and SMU to act similarly; thus, we flag those expansion phases in which the mean or median value of |dSML|/|dSMU| over the expansion phase is less than two as potentially falsely identified substorms. We also flag the following recovery phase. For completeness, the phase identification (expansion or recovery) is retained.
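The flagging rule can be sketched as below. The text does not state whether the ratio is formed sample by sample before averaging; this sketch assumes it is, skips samples where dSMU is zero, and uses hypothetical names throughout.

```python
from statistics import mean, median


def is_enhanced_convection(dsml, dsmu, limit=2.0):
    """Flag an expansion phase when the mean or median of |dSML|/|dSMU|
    over the phase is below `limit`."""
    ratios = [abs(a) / abs(b) for a, b in zip(dsml, dsmu) if b != 0]
    if not ratios:
        return False  # no usable samples: leave the phase unflagged
    return mean(ratios) < limit or median(ratios) < limit
```

A phase in which SML and SMU change at comparable rates is flagged, whereas one dominated by changes in SML is not.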
The supporting information provides a list of the start times of possible growth phases (phase = 1), expansion phases (phase = 2), and recovery phases (phase = 3), along with flags indicating whether the event is an enhanced convection event (flag = 1) or not (flag = 0). Three lists for EPTs of 50%, 75%, and 90% are provided, based on SML data from 1 January 1996 to 31 December 2014.

In the following, we apply this technique to SML data from 1 January 1996 to 31 December 2014, when there were ~100 stations from which SML was derived. (Figure 1a shows the number of SML stations available over time.) We apply the technique to each year individually to minimize any solar cycle effects [Ahn et al., 2000; Tanskanen et al., 2002, 2011]. Figure 1 shows an example of the data processing and phase identification results from a representative day on 10 May 2005. Figure 1b shows the input data, and Figures 1c and 1d show the data processing in stage 1. Figure 1e shows the phase identifications at the end of stage 1 (possible growth phase in green, expansion in blue, and recovery in red), and Figure 1f shows the phase identifications at the end of stage 2. Figure 1 shows that Substorm Onsets and Phases from Indices of the Electrojet (SOPHIE) is able to identify the various substorm phases and that stage 2 corrects the classification of a number of data points initially labeled as possible growth phases. In the following, we test the validity of this technique through comparing the substorm onsets determined by SOPHIE with previously published lists of substorm onsets.

Comparison With Previous Techniques
As noted above, existing published lists provide the times of substorm expansion phase onsets only. Using SOPHIE, we identify any expansion phase onset as the time at which the phase changes into an expansion phase. In Figure 2, we compare these onsets with event lists from Liou et al. [2001] (L01), Frey and Mende [2006] (FM06), Nishimura et al. [2010] (N10), and Newell and Gjerloev [2011] (NG11). Figure 2 shows (left-hand column) the probability distribution of time differences (Δt) between onsets in these previously published lists and the closest onset from SOPHIE. Comparisons between the given list and the 50% EPT onsets are shown in black, with the 75% EPT onsets shown in blue, and with the 90% onsets shown in green. Comparisons between the given lists and NG11 are shown in red. Negative Δt indicates that onsets identified using SOPHIE occur before onsets in these lists. The right-hand column of Figure 2 shows the cumulative probability of |Δt|. We examine the closest SOPHIE events to the events in these lists (i.e., L01, FM06, N10, and NG11 are the fiducial lists), rather than vice versa, since none of the auroral lists provide full time coverage. As such, there are events in the SOPHIE lists that cannot have a counterpart in the auroral event lists because there were no data to determine whether there was an auroral signature or not; thus, for a consistent comparison with all lists there is only a one-way correlation.
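The one-way comparison described here can be sketched as follows, with onset times given as numbers in minutes. The sign convention follows the text (Δt is the SOPHIE time minus the reference-list time, so a negative value means the SOPHIE onset comes first); the function names are hypothetical, and the SOPHIE list is assumed to be nonempty.

```python
from bisect import bisect_left


def nearest_offsets(reference, sophie):
    """For each reference onset, return dt = (closest SOPHIE onset) - (reference onset)."""
    sophie = sorted(sophie)
    offsets = []
    for t in reference:
        j = bisect_left(sophie, t)
        candidates = sophie[max(j - 1, 0):j + 1]  # neighbours on either side of t
        nearest = min(candidates, key=lambda s: abs(s - t))
        offsets.append(nearest - t)
    return offsets


def fraction_within(offsets, limit):
    """Cumulative probability that |dt| is at most `limit`."""
    return sum(1 for d in offsets if abs(d) <= limit) / len(offsets)
```

Looping over the reference list rather than the SOPHIE list reflects the point made above: SOPHIE onsets that fall in gaps of the auroral data never enter the statistics.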
The left-hand column of Figure 2 shows that for −30 < Δt < 30 min the distributions of the time differences are sharply peaked between −2 and +3 min, with the distribution for the EPT of 75% lists giving the largest peak, and drop off rapidly away from this peak. The full width at half maximum (FWHM) of the distributions varied between 4 min for EPT of 50% up to 13 min for EPT of 90%. The lowest of these is comparable to the FWHM of the Gaussian fit to the Δt distribution from Chu et al. [2015]. The SOPHIE probabilities are somewhat higher for positive Δt for the comparisons with FM06, L01, and N10, showing that it is more likely that the onsets from SOPHIE follow auroral onsets. The Δt distributions from the comparison with NG11 differ significantly from the other results, with a higher probability of a SOPHIE onset preceding an onset from NG11 and much higher peak probabilities of 12%, 20%, and 17% for EPT of 50%, 75%, and 90%, respectively. This results from using the same data set as NG11 but with a different approach to identifying onset. We note that the NG11 onset threshold (−15 nT/min over at least 3 min) corresponds to the 82nd percentile of the unfiltered data set. The relative excess of events between −15 min and 0 min for EPTs of 50% and 75% can be explained by considering that dSML/dt may decrease over a few minutes prior to reaching a rate of −15 nT/min, depending on the sharpness of the change in SML. If the SML profile is such that onset is detected by both methods, the lower thresholds will be met earlier; thus, the SOPHIE onsets will precede those in NG11.
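Relating a fixed rate threshold to a percentile, as done above for the NG11 threshold, amounts to computing a percentile rank. The sketch below assumes the rank is taken over the magnitudes of the negative changes in the unfiltered data; the text does not spell out the exact convention, and the function name is hypothetical.

```python
def percentile_rank(magnitudes, threshold):
    """Percentage of values in `magnitudes` that are at or below `threshold`."""
    return 100.0 * sum(1 for m in magnitudes if m <= threshold) / len(magnitudes)
```

Under this reading, a fixed threshold whose rank lies between two EPT values would be expected to select fewer events than the lower EPT and more than the higher one.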
The probabilities in the left-hand columns are integrated with respect to |Δt| to give the cumulative probabilities shown in the right-hand column of Figure 2. These show the probability of a SOPHIE event being associated with an event in the comparison list (FM06, L01, N10, or NG11) within the time |Δt| or less. This probability is higher for EPTs of 50% and 75% than for an EPT of 90%. Similarly, the probability of an NG11 event being associated with an event in the FM06, L01, and N10 lists within a given time frame is lower than for events from SOPHIE. We find that for EPT of 50% or 75%, half of the onsets within the tested lists are associated with a SOPHIE onset within 20 min, whereas this time is >30 min when comparing NG11 with FM06, L01, and N10. As such, using an EPT of between 50% and 75%, our technique returns onset times more closely correlated to FM06, L01, and N10 than does NG11. Comparing the NG11 onsets with those from SOPHIE, we find that the probability of a SOPHIE onset being associated with an NG11 onset is higher at lower Δt than for the comparison of SOPHIE or NG11 with FM06, L01, and N10. Figure 2d shows that for EPT of 50%, 75%, and 90%, SOPHIE returns 20%, 33%, and 36% of its onsets within 1 min of onsets in the NG11 list. This is to be expected, given that the same data set and broadly similar techniques were used to determine the onset times. We are unable to directly compare FM06, L01, and N10 in a similar manner since these lists of onsets do not overlap in time.
The time for the 50% cumulative probability from a comparison of NG11 with L01 differs from that presented in NG11 because that study only compared onsets with those from L01 identified between 1997 and 1998, only considered events with a positive Δt, and discounted any events with a Δt greater than 60 min. In our analysis, we include all events.
Table 1 shows the proportion of time taken up by each phase, the number of phase onsets, and the number of transitions between different phases. Those expansion phases in which the difference in the variation of SML and SMU is low, along with any subsequent recovery phases, are listed as "enhanced convection." Also shown are the means and medians of the phase lengths and the waiting times between expansion phase onsets directly preceded by a possible growth phase.
For higher values of EPT (corresponding to faster negative changes in SML), the proportion of the data marked as growth phase increases and the proportion marked as expansion and recovery correspondingly decreases. The percentage of time in the recovery phase for the lists from EPTs of 50%, 75%, and 90% is almost double that in the expansion phase. The mean and median phase lengths of the recovery phases are approximately constant with changing EPT, but the expansion phase durations shrink with increasing EPT. Our average recovery phase lengths are consistent with the values from Partamies et al. [2013], although our mean expansion phase lengths are slightly shorter and our median expansion phase lengths are slightly longer. The mean and median lengths of the possible growth phase intervals from SOPHIE are greater than the growth phase lengths from Partamies et al. [2013] and McPherron [1994]; however, this is to be expected as they used a more restrictive criterion for identifying the growth phase.
For higher EPT values, there are fewer substorm intensifications (expansion phases following a recovery phase interval) than expansion phases following a possible growth phase, suggesting that intensifications are relatively small. One of the benefits of SOPHIE over the existing event lists examined here is that our technique enables the explicit determination of whether an expansion phase onset is a substorm intensification.
Table 1 (footnote a): Shown are the proportions of the data set identified as growth, expansion, and recovery phases; the number of growth, expansion, and recovery phase onsets; the number of transitions between the various phases; and measures of the distribution of waiting times between expansion phase onsets preceded by a possible growth phase. The mean number of expansion phase onsets preceded by a growth phase or preceded by a recovery phase (intensifications) are shown in the bottom rows. The first row shows the EPT value tested, and the second row shows the range of RPT values obtained.

For an EPT of 75%, SOPHIE identifies 26,295 substorms following a possible growth phase, within 13% of the 30,484 substorms identified by NG11 in the same period, although the total number of expansion phases is higher (45,917, 50% more than NG11). At lower EPT SOPHIE identifies more expansion phases than NG11, and at an EPT of 90% SOPHIE identifies fewer expansion phases than NG11. A comparison with the number of onsets in FM06, N10, and L01 is complicated due to the incomplete time coverage of the instruments used in these studies. Figure 3 shows the waiting time distributions between (a) substorms identified in the FM06, L01, N10, and NG11 lists; (b) substorms from SOPHIE preceded by a possible growth phase for EPT values of 50%, 75%, and 90%; and (c) all the expansion phase onsets from SOPHIE (including substorm intensifications) for the given EPT values. The waiting time distributions from each of the previously published lists are markedly different, despite all apparently determining substorm onsets. While this may be a result of the times covered by these lists and the variations in substorm activity with solar cycle, it is more likely that these differences result from the different techniques used in creating these lists. NG11, L01, and N10 all show peaks in the distributions at ~30 min, whereas FM06 peaks at ~60 min. The waiting time distributions from onsets identified by SOPHIE preceded by a possible growth phase (Figure 3b) are similar to that from FM06 (Figure 3a), whereas the distributions of all expansion phase onsets from SOPHIE (Figure 3c) are more similar to NG11, L01, and N10 (Figure 3a). This suggests that a large number of events in the NG11, L01, and N10 lists are in fact intensifications.
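A waiting time distribution of the kind compared here can be sketched from a list of onset times in minutes. The bin width, the number of bins, and the function names are illustrative assumptions and are not taken from Figure 3.

```python
def waiting_times(onsets):
    """Time differences between successive onsets (same units as the input)."""
    s = sorted(onsets)
    return [b - a for a, b in zip(s, s[1:])]


def histogram(values, bin_width, n_bins):
    """Counts per bin [k*bin_width, (k+1)*bin_width); values outside are dropped."""
    counts = [0] * n_bins
    for v in values:
        k = int(v // bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts
```

Passing only the onsets that follow a possible growth phase, or all expansion phase onsets, to `waiting_times` gives the two variants of the SOPHIE distributions distinguished in the text.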
While SOPHIE gives users the ability to determine onset times using any EPT, based on our analysis above, we recommend an EPT of 75% to give a list of onsets similar to previous studies.

Superposed Epoch Analysis of Substorm SML
In order to examine whether or not the general profile of the substorms identified by SOPHIE is consistent with those from previously published onset lists, we compare the results of a superposed epoch analysis of SML. Figure 4 shows the results of this analysis with the zero epoch defined as the expansion phase onset times from the auroral and magnetic lists and from SOPHIE, using an EPT of 75%. For each trace, we set the median SML value at zero epoch to 0 nT. Data from FM06 onsets are shown in black, L01 are shown in blue, N10 are shown in yellow, and NG11 in red. Data from SOPHIE (EPT of 75%) including all expansion phase onsets are shown as a dotted black line, and results from only onsets following a possible growth phase are shown as the dashed black line.
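A superposed epoch analysis of this kind can be sketched as the median of SML at each lag relative to onset, shifted so that the value at zero epoch is 0 nT. The sketch assumes a gap-free 1 min series indexed by integer minute, onsets given as indices into that series, and at least one onset inside the series; the function name is hypothetical.

```python
from statistics import median


def superposed_epoch(series, onsets, before, after):
    """Median of `series` at lags -before..+after around each onset index,
    offset so that the median at zero epoch is 0."""
    trace = []
    for lag in range(-before, after + 1):
        vals = [series[t + lag] for t in onsets if 0 <= t + lag < len(series)]
        trace.append(median(vals) if vals else None)
    zero = trace[before]  # median at zero epoch
    return [None if v is None else v - zero for v in trace]
```

Subtracting the zero-epoch median puts traces built from different onset lists on a common baseline, so that the depth and timing of the bay can be compared directly.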
The superposed epoch study results based on the auroral lists (blue, green, and solid black lines) generally show similar trends: SML is approximately constant prior to onset, then SML rapidly decreases over 25 min, before recovering to its preonset level over the next 150 min. Results from the N10 list are different, however, and give a minimum in SML at 57 min after onset. Figure 4 demonstrates that the tested substorm onset lists generally show the same large-scale characteristics apart from N10. While the N10 events show a small geomagnetic bay following their onsets, it is much weaker and shows no sharp decrease in SML at the onset time. This suggests that the auroral intensifications identified by N10 are most likely intensifications of a preexisting current system rather than a substorm onset, as first pointed out by Frey [2010].
The superposed epoch results from the events from the magnetically derived SOPHIE and NG11 lists show similar trends, although the magnitude of the events in NG11 is approximately double that of the events from SOPHIE. Unlike the auroral lists, the traces from SOPHIE including all onsets (dotted line) and from NG11 show a distinct minimum ~10 min prior to onset. This minimum is removed if only events with onsets following a growth phase are used (dashed line). This shows that the local minimum in the superposed epoch analysis of SML prior to onset is due to the identification of substorm intensifications as onsets and strongly suggests that some of the onsets in NG11 are in fact substorm intensifications.
In summary, onsets from SOPHIE are generally identified prior to auroral onsets with more than 50% of auroral onsets occurring within 20 min of an onset identified by SOPHIE. The distributions of the time differences between onsets from SOPHIE and the auroral onset lists are similar in shape to those found by comparing NG11 with the auroral onset lists. The average length of the expansion and recovery phases determined by SOPHIE is comparable with previous studies, but the possible growth phase is much longer due to a much less restrictive definition of this phase. Comparing the superposed epoch analysis of SML determined using each list shows similar trends.

Discussion
We have developed a novel technique, called SOPHIE, for identifying substorms from auroral electrojet indices. This new technique enables us to identify the start and end of each phase, including substorm intensifications, thus providing a complete picture of substorm activity, as opposed to solely identifying the expansion phase onset. Furthermore, by using magnetometer-based auroral indices, SOPHIE provides a near-continuous, long-term identification of substorm phases. We have shown that the expansion phase onset times from SOPHIE are similar to, although often earlier than, those from existing automated and visually determined lists. SOPHIE defines the substorm expansion or recovery phase based on exceedance of some threshold in the rate of change of SML. Given that the distribution of dSML/dt appears continuous, there is no obvious threshold value to choose. Consequently, SOPHIE uses a nonparametric approach (i.e., not based on physical units), defining substorm times based on a given percentile of the rate of change of SML. In this way, the method is insensitive to changes in SML, such as seasonal variations [Singh et al., 2013], solar cycle variations [Ahn et al., 2000; Tanskanen et al., 2002, 2011], and changes in the number of stations in the magnetometer network, in the sense that the proportion of expansion and recovery phases identified is essentially preserved for given EPT and RPT values. More precisely, an EPT of 90% will pick out the largest 10% of negative changes in SML as expansion phase events. Smaller EPT and RPT limits will enable smaller changes in SML to be identified as substorms and thus will pick out smaller events. It should be noted that this would also move the expansion phase onsets of large events to earlier times. Also, it could be argued that the optimal EPT and RPT values are those that yield equal numbers of expansion and recovery phase intervals. Our method iteratively calculates the RPT value (to within 1%) that minimizes the difference between the number of expansion and recovery phases.

Figure 3 (partial caption): In Figures 3b and 3c, black shows onsets determined with an EPT of 50%, blue shows onsets for an EPT of 75%, and red shows onsets for an EPT of 90%. The mean, median, and standard deviations of the displayed distributions are shown next to the list identifiers.
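
To illustrate why a percentile-based threshold on dSML/dt is insensitive to the overall scale of the index, the following sketch flags expansion-like intervals and shows that the same intervals are flagged when the input is rescaled. It assumes evenly sampled data and an integer EPT between 1 and 99; the function name `expansion_flags` and the sample values are hypothetical, and the percentile is computed with the Python standard library rather than with the SOPHIE implementation.

```python
from statistics import quantiles

def expansion_flags(sml, ept=75):
    """Flag intervals whose SML decrease exceeds the EPT-th percentile of all decreases.

    sml: evenly sampled index values (the cadence is an assumption of this sketch).
    ept: integer percentile from 1 to 99.
    """
    dsml = [b - a for a, b in zip(sml, sml[1:])]   # change per sample interval
    decreases = [-d for d in dsml if d < 0]        # magnitudes of negative changes
    if len(decreases) < 2:                         # quantiles() needs two or more points
        return [False] * len(dsml)
    threshold = quantiles(decreases, n=100, method="inclusive")[ept - 1]
    return [d < 0 and -d > threshold for d in dsml]

# Made-up SML values (nT), for illustration only.
sml = [-20, -25, -22, -60, -140, -150, -120, -90, -95, -70, -40, -45, -30]
flags = expansion_flags(sml, ept=75)
scaled = expansion_flags([3 * v for v in sml], ept=75)
print(flags == scaled)  # True: rescaling the index leaves the flagged intervals unchanged
```

A fixed threshold in nT/min would instead flag more intervals in the rescaled series, which is the behaviour the percentile approach is intended to avoid.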
With this technique, we present a means to provide an accurate, continuous, and, most importantly, user-independent determination of substorm onset, as well as substorm expansion, recovery, and potential growth phase durations. As with all automated procedures, it is necessary to calibrate against existing results. We have calibrated the output of SOPHIE against the substorm onset lists of Liou et al. [2001], Frey and Mende [2006], Nishimura et al. [2010], and Newell and Gjerloev [2011] as well as with the reported results of other studies. We provide the results of three different EPT values and find that an EPT of 75% renders expansion phase onsets within 20 min of those in other lists and that the superposed epoch analyses of SML using the 75% EPT were similar to superposed epoch analysis results of the above auroral onset lists. As such, we recommend that an EPT of 75% be used to identify substorms similar to those within the literature, although we also include both the 50% and 90% EPT values to enable researchers to study small, medium, and large substorm events.

Figure 4 (partial caption): ... Newell and Gjerloev [2011] (red) are shown along with the analysis of all expansion phase onsets from SOPHIE (dotted line) and isolated expansion phase onsets from SOPHIE (dashed line). The trends in the superposed data from each list are similar, although the magnitudes of the variations differ. We note that a slight upturn in SML prior to onset is seen from the Newell and Gjerloev [2011] list and from the SOPHIE onsets including compound substorms but is not seen in the data from the auroral lists or isolated onsets from SOPHIE.
We note that SOPHIE tends to identify expansion phase onset times after onsets determined from auroral observations. A similar trend was seen by Chu et al. [2015], who identified substorm onsets from midlatitude magnetometer data. This is most likely due to requiring a sufficiently high threshold to exclude small-scale variations that are not substorms. We note that recent studies have shown that ULF wave activity begins prior to auroral brightenings observed by global auroral imagers [Murphy et al., 2009a] but contemporaneous with ground-based all-sky imager observations of the brightening of the substorm arc [Rae et al., 2009]. The formation of the geomagnetic bay may occur prior to the time at which the aurorae are sufficiently bright over a large enough area for a global auroral imager to measure a discernible change in auroral brightness. We thus conclude that our onset times are as valid as other substorm onset times, although determining true onset times is impossible while the mechanisms behind onset remain unclear.
Unlike the most commonly used onset lists, SOPHIE is not limited to identifying expansion phase onset; we also identify the start of the possible growth phases and recovery phases and consequently the durations of all substorm phases. A comparison between average phase lengths and substorm waiting times shows that SOPHIE gives comparable results to earlier studies [Borovsky et al., 1993; Hsu and McPherron, 2012; Partamies et al., 2013], with the notable exception that our possible growth phases are somewhat longer due to our less restrictive criteria in identifying these phases.
The superposed epoch analysis (Figure 4) shows that while the overall trends in SML for the events identified in each of these lists are similar, there are some interesting differences: notably, the magnitude of the bays from Newell and Gjerloev [2011] was approximately twice that from the auroral lists, the minimum in SML from the Nishimura et al. [2010] list was 25 min later than in the other lists, and the SML traces from all the events identified by SOPHIE and from Newell and Gjerloev [2011] showed an increase in SML just prior to the expansion phase onset. The onsets from Liou et al. [2001] and Nishimura et al. [2010] were identified close to or during solar minima, whereas events from Frey and Mende [2006] were identified close to solar maximum. Larger substorms are typically seen following solar maximum; thus, this may account for the difference in the median SML for the events from these auroral lists. The larger magnitude of the substorm bays from the Newell and Gjerloev [2011] list is likely due to a threshold that picks out changes in SML above the 82nd percentile. Using a lower percentile will still identify these large events but also identify smaller events, as the auroral lists do. The upturn in SML just prior to onset seen in the superposed epoch analysis of all the SOPHIE events and the Newell and Gjerloev [2011] events is due to the identification of substorm intensifications (expansion phases immediately following a recovery phase). The studied auroral lists all discount further brightening of the aurora within a specified time from an identified onset; thus, they tend not to identify subsequent substorm intensifications. This is demonstrated by isolating those events found by SOPHIE that are preceded by a possible growth phase interval (dashed line in Figure 4), which shows only a slight upturn in SML prior to onset. Such a comparison has not previously been possible, as the existing lists do not identify each substorm phase.
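
The separation of isolated onsets from intensifications described above could be sketched as follows, assuming the phase list is available as time-ordered (start time, label) pairs. The label strings, the function name `split_onsets`, and the example times are hypothetical and do not reflect the format of any published SOPHIE output.

```python
def split_onsets(phases):
    """Split expansion phase onsets by the phase that immediately precedes them.

    phases: time-ordered list of (start_time, label), with label one of
    'growth', 'expansion', 'recovery' (hypothetical labels for this sketch).
    Returns (isolated_onsets, intensification_onsets).
    """
    isolated, intensifications = [], []
    for (_, prev_label), (start, label) in zip(phases, phases[1:]):
        if label != "expansion":
            continue
        if prev_label == "growth":        # onset following a possible growth phase
            isolated.append(start)
        elif prev_label == "recovery":    # expansion immediately after a recovery phase
            intensifications.append(start)
    return isolated, intensifications

# Made-up phase start times (minutes), for illustration only.
phases = [(0, "growth"), (40, "expansion"), (60, "recovery"),
          (90, "expansion"), (110, "recovery"), (150, "growth"),
          (200, "expansion")]
print(split_onsets(phases))  # ([40, 200], [90])
```

An expansion phase at the very start of the list has no preceding phase and is left unclassified in this sketch.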
Like previous techniques, SOPHIE is limited in that it distils substorm processes into a set of signatures in a single data set. Previous studies have shown that combining substorm signatures from the aurora with the occurrence of particle injections at geosynchronous orbit introduces uncertainty into the identification of substorms [Boakes et al., 2011]. From the waiting time distributions presented here and in previous studies [Borovsky et al., 1993; McPherron, 1994; Hsu and McPherron, 2012], we can see that different techniques designed to identify the same global phenomenon determine events with very different temporal distributions as a result of using different techniques and data sets. In practice, SOPHIE identifies times of increasing or decreasing deflection of the ground magnetic field at auroral latitudes, which can be linked to an increase or decrease in the strength of the westward electrojet. This is commonly reported as a signature of expansion and recovery phase activity as a so-called "negative bay" [Davis and Sugiura, 1966]. However, there is a zoo of events that are not substorms but that may exhibit some substorm-like features: pseudobreakups and convection bays may be identified as expansion and recovery phase times, and steady magnetospheric convection events may be identified as possible growth phases. Users of this technique should be aware of these limitations and that using SOPHIE in conjunction with observations of auroral activity, wave activity, and particle injections will give a more comprehensive list of substorm events. However, by using SOPHIE we can improve our knowledge of the processes within the substorm cycle by comparing data not only at times before or after expansion phase onset but at specific intervals within the possible growth, expansion, and recovery phases.

Journal of Geophysical Research: Space Physics 10.1002/2015JA021343
Existing lists of substorms concentrate solely on substorm onset, identifying either auroral brightenings or sharp decreases in auroral indices. These are often used as the basis of statistical superposed epoch analysis of a variety of different data sets. However, given that substorm phases can vary in length, the validity of comparing data at times increasingly far from the defined onset reduces as the analysis mixes growth, expansion, and recovery phase data. Yokoyama and Kamide [1997] and Hutchinson et al. [2011] discussed this shortcoming of superposed epoch analysis in terms of geomagnetic storms. Using SOPHIE, we identify the start and duration of each substorm phase, therefore enabling future studies to perform superposed epoch analyses of individual phases or to compare data points within given fractions of the total substorm or substorm phases.
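
One way such a phase-normalized superposed epoch analysis might be carried out is sketched below: each phase interval is linearly resampled onto a common grid of fractional phase time (0 to 1) before averaging. The function names, the use of the mean rather than the median, and the sample values are assumptions of this sketch and are not part of SOPHIE.

```python
def resample_phase(samples, n_bins=11):
    """Linearly interpolate one phase's samples onto n_bins points spanning its duration."""
    if n_bins < 2 or not samples:
        raise ValueError("need at least one sample and n_bins >= 2")
    last = len(samples) - 1
    out = []
    for k in range(n_bins):
        pos = k * last / (n_bins - 1)     # fractional index into samples
        lo = int(pos)
        hi = min(lo + 1, last)
        out.append(samples[lo] + (pos - lo) * (samples[hi] - samples[lo]))
    return out

def normalized_superposition(events, n_bins=11):
    """Mean over events at each fraction of the phase duration."""
    resampled = [resample_phase(e, n_bins) for e in events]
    return [sum(column) / len(column) for column in zip(*resampled)]

# Two made-up expansion phases of different lengths (SML in nT).
events = [[0, -100], [0, -50, -100, -150, -200]]
print(normalized_superposition(events, n_bins=3))  # [0.0, -75.0, -150.0]
```

Because every event contributes the same number of points per phase, data from different phases are not mixed at large epoch times, which is the shortcoming of onset-aligned analysis noted above.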
While we have applied this technique to the SML data set, in principle it could be applied to any auroral zone magnetometer or magnetometer chain. Given that the thresholds for identifying each phase are based on the variations in the data set, rather than absolute values, this may enable substorm phases to be more readily identified in stations that are typically further from the auroral electrojet. However, study of this application is beyond the scope of this technique paper.

Summary
We have developed a new technique, called SOPHIE, for identifying all substorm growth/energy input, expansion and recovery phase onsets, and durations from auroral electrojet indices and ground magnetometer data. In order to test this technique, we applied it to the SuperMAG AL (SML) index. In summary, the technique works as follows:
1. Intervals during which |dSML/dt| is greater than a specified percentile (EPT) of the dSML/dt < 0 data set are marked as expansion phase;
2. Intervals during which dSML/dt is greater than a specified percentile (RPT) of the dSML/dt > 0 data set are marked as recovery phase;
3. All other data are marked as possible growth phase.
The data are then processed to remove specifically identified artifacts. The RPT value is adjusted iteratively to minimize the difference between the number of expansion phases identified and the number of recovery phases identified.
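
The three classification steps and the iterative choice of RPT summarized above can be sketched as follows. This is a simplified illustration under stated assumptions, not the SOPHIE implementation: it omits the artifact-removal step, searches integer RPT values from 1 to 99 exhaustively, and uses hypothetical function names (`pct`, `classify`, `count_phases`, `tune_rpt`) and made-up input values.

```python
from itertools import groupby
from statistics import quantiles

def pct(values, p):
    """p-th percentile (integer 1-99); infinite if there are too few points."""
    if len(values) < 2:
        return float("inf")
    return quantiles(values, n=100, method="inclusive")[p - 1]

def classify(dsml, ept, rpt):
    """Label each interval 'E' (expansion), 'R' (recovery), or 'G' (possible growth)."""
    e_thr = pct([-d for d in dsml if d < 0], ept)   # threshold on decreases
    r_thr = pct([d for d in dsml if d > 0], rpt)    # threshold on increases
    labels = []
    for d in dsml:
        if d < 0 and -d > e_thr:
            labels.append("E")
        elif d > 0 and d > r_thr:
            labels.append("R")
        else:
            labels.append("G")
    return "".join(labels)

def count_phases(labels, kind):
    """Number of contiguous runs of the given label."""
    return sum(1 for key, _ in groupby(labels) if key == kind)

def tune_rpt(dsml, ept):
    """Integer RPT minimizing |number of expansion phases - number of recovery phases|."""
    def mismatch(rpt):
        labels = classify(dsml, ept, rpt)
        return abs(count_phases(labels, "E") - count_phases(labels, "R"))
    return min(range(1, 100), key=mismatch)

# Made-up rates of change of SML (nT per sample), for illustration only.
dsml = [-50, 10, -2, 20, -60, 30, -3, 40, -1]
rpt = tune_rpt(dsml, ept=50)
print(rpt, classify(dsml, 50, rpt))  # 34 EGGGERGRG
```

When several RPT values give the same mismatch, this sketch returns the smallest; the source does not state how such ties are resolved.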
Comparing the expansion phase onsets determined using this new technique with previously published lists, we find that at least 50% of the events in each comparison list have a SOPHIE onset within 20 min, although our method tends to identify the substorm onset time after those in auroral lists. We determined that an EPT of 75% gives a comparable distribution of substorm onsets to previous lists. Through a comparison with existing lists and recent results examining the timing of various onset signatures, we conclude that our identified onsets are at least as valid as existing lists of substorm onsets. Unlike many other techniques, SOPHIE identifies the time and duration of each individual substorm phase and therefore allows far more detailed study of individual phases, and of substorms as a whole, rather than only of the interval around the expansion phase onset. Thus, SOPHIE is an improvement over existing lists of substorm onsets.