Journal of Geophysical Research: Space Physics Network analysis of geomagnetic substorms using the SuperMAG database of ground-based magnetometer stations

The overall morphology and dynamics of magnetospheric substorms is well established in terms of the observed qualitative auroral features seen in ground-based magnetometers. This paper focuses on the quantitative characterization of substorm dynamics captured by ground-based magnetometer stations. We present the ﬁrst analysis of substorms using dynamical networks obtained from the full available set of ground-based magnetometer observations in the Northern Hemisphere. The stations are connected in the network when the correlation between the vector magnetometer time series from pairs of stations within a running time window exceeds a threshold. Dimensionless parameters can then be obtained that characterize the network and by extension, the spatiotemporal dynamics of the substorm under observation. We analyze four isolated substorm test cases as well as a steady magnetic convection (SMC) event and a day in which no substorms occur. These test case substorms are found to give a consistent characteristic network response at onset in terms of their spatial correlation. Such responses are di ﬀ erentiable from responses to the SMC event and nonsubstorm times. We present a method to optimize network parametrization with respect to the di ﬀ erent individual station responses, the spatial inhomogeneity of stations in the Northern Hemisphere, and the choice of correlation window sizes. Our results suggest that dynamical network analysis has potential to quantitatively categorize substorms.


Introduction
Substorms are an extensively studied phenomena in geophysics, and while there is an overall established substorm cycle [McPherron et al., 1973], there is considerable variation in the specific detailed sequence of events [Akasofu, 2004;Meng and Liou, 2004]. The ability to quantify substorm dynamics in an automated manner would be a valuable tool to determine what initial conditions, in terms of the internal state of the magnetosphere-ionosphere system and energy loading by the solar wind, produce a given detailed response. Attempts have been made at a classification of substorms based on images of the aurora [Syrjasuo et al., 2007], where training algorithms were used to identify a wide range of arc shapes that can be present during a substorm. AE indices have also been used to identify substorm behavior [Gjerloev et al., 2004], although such descriptions are limited by the scalar and spatially aggregating nature of the AE indices.
Ground-based magnetometer stations detect the variation in the local magnetic field resulting from time-dependent current systems in the ionosphere and serve as a proxy for dynamics occurring in the magnetosphere. There are typically ∼100 magnetometer stations available to observe any given substorm. The question is whether an algorithmic methodology can be developed to quantitatively characterize a substorm signature from these ∼100 time series in an automated manner. SuperMAG is a database that collates and processes all available ground-based magnetometer vector time series into a standardized baselined format at 1 min cadence [Gjerloev, 2012]. This provides an excellent starting point for studies attempting to characterize collective information from these stations. Here we investigate whether canonical correlation between the vector time series of pairs of magnetometer stations can be used to construct a network that can characterize substorms. Correlation between stations has been examined previously, although only using a few contraposed station pairs [Jackel et al., 2001]. If the correlation between all ∼100 stations can be quantified in a robust and readily accessible manner, then this would provide a tool for substorm identification and classification.

10.1002/2015JA021456
The use of complex networks [Boccaletti et al., 2006] has expanded from its birth in the social sciences [Milgram, 1967] through uses in biological [Nicol et al., 2012] and engineering systems [Sivrikaya and Yener, 2004] to geophysical systems [Radebach et al., 2013;Malik et al., 2012;Donges et al., 2009]. Recent work on spatially embedded networks [Heitzig et al., 2012] has facilitated the application of complex networks to physical systems where the observations are spatially distributed. A dynamical network is defined by instances of time-varying connections that exist between the available pairs of points (here the magnetometer stations). In a substorm context the connections are identified using similarities between the vector time series of pairs of stations. Importantly, pairs of stations are either connected or not connected so that at its core, network methodology consists of quantifying when the cross correlation between the time series of a given pair of stations exceeds a threshold. A central problem in applying this methodology to real-world systems is identifying the appropriate threshold. This depends on the individual station responses which are influenced by multiple factors such as ground conductivity, oceanic currents, and instrument response. We will present new methodology to identify the threshold needed to account for such effects. Once the thresholds are determined, a series of networks can be constructed for a given substorm and parameters formed to describe the network topology. Questions then arise: do these dimensionless network parameters robustly identify the substorms? If so, then could this provide a new framework to quantitatively identify and characterize geomagnetic substorm activity? Is there a statistical network that describes the statistical substorm?
In this paper we establish a new methodology for dynamical network analysis in the magnetospheric substorm context and identify several dimensionless network parameters that capture key aspects of the dynamics of the substorm. A brief overview of the paper is as follows: In section 2 we introduce the data set used here, how connections in our network of stations were determined, and how the dimensionless network parameters that are used to describe the substorms are formed. In section 3 we apply this methodology to the four test case substorms as well as a steady magnetic convection (SMC) event and a day in which no substorms occur. The methodology is outlined in the body of the paper and further details are provided in the supporting information (SI) section.

The Data Sets Used in This Study
We used vector magnetometer time series data at 1 min cadence from the SuperMAG database. The Super-MAG database is an amalgamation of stations from different magnetometer station groups. All magnetometer data have been preprocessed in an identical way to remove long-term (> 1 day) trends [Gjerloev, 2012]. The vector time series are in local magnetic coordinates [Gjerloev, 2012]. Care must be taken due to variations in resolution, dynamical range, and local ground conductivity [Tanskanen et al., 2001] between magnetometers from different groups, as we discuss briefly in section 2.1 (with more detail in SI Text S1). The nonlocal time-dependent contributions from ground conductivity cannot be removed from the magnetometer time series. However, the contributions are accounted for here in terms of their average effect on correlation between stations through the choice of station-specific thresholds. All active sites in the Northern Hemisphere between magnetic latitude (MLAT) 50 ∘ to 90 ∘ are used in the analysis. The distribution of available stations below 50 ∘ MLAT becomes much more inhomogeneous, such inhomogeneities can distort network parameters.
Four substorms are investigated and were selected according to the criteria outlined in Gjerloev and Hoffman [2014]: temporal isolation, the substorms do not occur during a magnetic storm (|DST| < 30 nT), and they are of the classic bulge type. They occur in the years 1997 and 1998 November-February. Winter months were chosen to limit sunlight on the dayside. The events are also selected such that there is a good distribution of stations in the nightside at the onset of the substorm. Onset times for the substorms were determined to 1 min precision using the Polar satellite's Visible Imaging System (VIS) and Earth Camera [Gjerloev and Hoffman, 2014]. Care was taken such that the onset brightening developed continuously into a substorm to eliminate pseudo onsets [Gjerloev and Hoffman, 2014]. The substorm peak is also identified using Polar VIS images. The peak is a qualitative estimate of the combined intensity of the event and the westward and poleward expansion of the poleward auroral boundary [Gjerloev and Hoffman, 2014]. A quiet day and a steady magnetic convection event were also investigated. The quiet day is defined as a day in which no substorms have occurred. The SMC event chosen occurs on 10 February 2008. This event was chosen due to a good distribution of magnetometer stations in the nightside at the start of the event.

Constructing the Network
For each of these events, we form dynamical networks of connected stations in magnetic local time (MLT)-MLAT space in the Northern Hemisphere. To establish whether a pair of stations is connected in the network, we need to identify the time intervals where the correlation between signals from two stations is above a threshold. Connected stations then reflect a shared response to some spatially extended magnetic activity. Whether or not a pair of stations is connected varies with time as the geomagnetic conditions change. The network is formed in the following steps which are illustrated in Figure 1: Step 1. A running 128 min window is applied to the raw vector time series, and the data are detrended in each window with a linear fit. The window is chosen to exceed the timescale in which changes are occurring within the substorm. Linear trends on the window timescale are removed so that the effect of the window on the resultant cross correlation is small. The same running window is used to obtain the time-varying cross correlation (step 2). We use a 128 min window for two reasons: first, a window of sufficient length is needed to have confidence in correlation (results using a shorter window length are included in SI Text S4). Second, 2 h represents the typical global evolution timescale for a substorm. Fluctuations on timescales shorter than the window then can contribute to the amplitude of the cross correlation. Changes in the value of cross correlation on timescales shorter than the window are not resolved. The spatiotemporal scales of the observations fundamentally limit what can be deduced from any measure of cross correlation. Observations are at 1 min resolution, so with a 128 point cross correlation, an equatorial station has moved by ∼3300 km. For a station near the auroral oval (∼70 ∘ latitude), the station will have moved ∼1100 km. This roughly determines the spatial resolution implied by the time window.

Journal of Geophysical Research: Space Physics
10.1002/2015JA021456 Figure 2. The monthly averaged normalized degree for representative stations is plotted as a function of the global threshold, C T . A given normalized degree for the network, n 0 , gives a corresponding threshold for each station, C Ti .
Step 2. We use canonical correlation [Brillinger, 1975] between the windowed segments of pairs of vector magnetometer time series to establish similarity between pairs of stations as a function of time. We focus on near instantaneous correlation, that is, correlation at zero lag (an example of a 4 min lag network is presented in SI Text S3). We calculate the canonical correlation between the ith and jth station for all possible station pairs to form a cross-correlation matrix (or weighted network), C ij (t). C ij (t) contains the correlation coefficient for the first canonical component for each station pair.
Step 3. Next we threshold C ij (t) to obtain the adjacency matrix A ij (t), which is zero (no connection) or one (connection) for a given pair of stations ij. The threshold C Tij in principle has a different value for each pair of stations in the network. We will construct a standardized adjacency matrix such that all stations, on a long timescale (here, 1 month), have the same average degree (or likelihood to be connected to the network). On average each station will, by construction, then be connected to the same fraction of the network as any other station. We define a normalized degree, n i (t), which is the number of connections a station i has divided by N(t) − 1, where N(t) is the number of active stations at time t. Figure 2 plots the month averaged degree n i (C T ) obtained by applying a single global threshold C T across all stations. We plot how n i (C T ) varies with C T . We standardize our adjacency matrix by finding the threshold C Ti for each station that gives the same fixed degree n 0 (when averaged over 1 month). The station-dependent thresholds, used to obtain the time-dependent adjacency matrix, are then C Tij = min[C Ti , C Tj ] (This procedure is described in detail in SI Text S1.).
Step 4. Once a network is constructed for each time window we can calculate time-dependent dimensionless parameters that describe the spatial distribution and extent of the correlated behavior. These can then be used to characterize the system.

Network Parameters
Once the threshold for each station pair, C Tij , is determined as above, we can obtain the adjacency matrix: so that A ij = 1 if station i is connected to station j and is zero otherwise and Φ is the Heaviside step function. All diagonal elements (self connections) of A ij are set to zero, and in an undirected network the matrix is symmetric, as in the case for correlation calculated at zero lag. Once the dynamical network has been formed, network parameters can be used to quantify its evolution. Given A ij (t), time-dependent global network parameters can be determined as follows: 1. The normalized total number of connections, where N(t) 2 − N(t) is the total number of possible connections in the network. N(t) is the number of active stations taking data, which varies with time. 2. The average geodesic connection distance (physical distance), , in the network. Note that the average geodesic connection distance here is not the shortest path (or graph geodesic) between two stations

Journal of Geophysical Research: Space Physics
10.1002/2015JA021456 [Newman, 2010]. A distance separation matrix, d ij , is formed that is the geodesic distance separations between all stations. The average connection distance is then which is normalized to the average connection distance if all stations were connected. Note, can be > 1. 3. Θ kp is the number of connections within, and between, two fixed latitudinal bands. The lower latitude band extends from a lower bound, L l = 50 ∘ MLAT (no station data below this latitude were used), to an upper bound U l . U l is defined as the upper edge of the auroral oval before onset of the substorm of interest at magnetic midnight. The position of the auroral oval is obtained via visual inspection of Polar VIS images, and U l is different for each event. The upper latitude band extends from the upper edge of the auroral oval, where the subscripts k and p take all values of the indices for the lower and upper bands u and l. Θ uu is then the normalized number of connections between stations in the upper band, Θ ul the normalized number of connections between the stations in the upper band and stations in the lower band, and Θ ll the normalized number of connections between stations in the lower band.

Establishing Statistical Significance
We now quantify the likelihood that a connection between a station pair could occur by chance (a "false positive"). For colored noise, canonical correlation is known to produce increasingly high correlation coefficients in the first component for increasing , where the power spectrum of the noise inputs varies as f − [Jackel et al., 2001]. To quantify the likelihood of false positives, we construct ten noise surrogate data sets as follows: For each noise surrogate data set, each time windowed segment of the signal for all stations is Fourier transformed. The phases are then randomized, leaving the power spectrum amplitude unchanged, and the resulting signal is then inverse transformed. The same process for forming the network that we apply to the observations is then applied to the surrogate data to obtain an estimate for the number of false positives: where f ij is the surrogate network and F the normalized total number of connections in the network (i.e., false connections). Ten of these surrogate networks, f ij , are formed, and the normalized total number of connections (summed over the surrogate network), F, for each surrogate network is found. The average of these 10 surrogate values of the (normalized) total number of false connections can then be averaged to give an estimate of the network false positive number.

Results
We present results for dynamical networks calculated for four substorms over a 10-12 h interval centered on the substorm onset. We also obtain the networks for a steady magnetic convection event and a "quiet" day (defined by a lack of substorms occurring). The dynamical networks and their parameters are obtained for a 128 min running window with a 126 min overlap; i.e., a new network is calculated every 2 min. The results here focus on canonical correlation networks at zero lag; therefore, the network parameters represent the near-simultaneous response to correlated magnetic activity. A normalized degree for the network n 0 = 0.05 is chosen in order to reduce the number of false positives (spurious connections) in favor of allowing more false negatives (unidentified real connections). As a consequence the networks are formed from only the strongest connections in the system. where v x is the solar wind velocity along the Earth-Sun line and b z is the north-south component of the IMF. (fifth row) Also plotted is AE (blue), AL (red), and AU (green). The first vertical dashed red line indicates the onset time, and the second indicates the peak of the substorm. The time of the peak of the substorm is a qualitative estimate based on POLAR satellite images (see section 2). The times (1)-(4) are highlighted in Figure 3 (top left; substorm 1). The networks at these times are plotted in Figure 4. Note, since the occurrence of false positives is independent of geodesic separation length and position, , on average, will be 1 and the false connections are evenly distributed in the latitudinal bands. Figure 3 plots the time evolution of the network during four substorms. The network parameters are plotted as functions of the time of the leading edge of the correlation window for each realization of the network. Therefore, when comparing with AE and −v x b z , a range of values equal to the window length must be considered.

Network Response to Substorms
Both v x and b z are propagated solar wind parameters in geocentric solar magnetospheric coordinates, and the data are obtained from the Wind satellite.
Substorms 1 and 2 are isolated events with interplanetary magnetic field (IMF) b z turning southward 1-2 h before onset. Both substorms have low connectivity in the network before the substorm onset, with any existing connections being short range (i.e., is small). There is a rapid increase in connectivity around onset, accompanied by an increase in above preonset levels for both substorms. Both substorms show an increase in high-latitude connections, low-and cross-latitudinal connections at onset. There is then a gradual decrease in the overall connectivity as the substorms enters the recovery phase. This phase is defined by the slow return of AL to presubstorm levels. For substorm 1, does not decrease during this phase to presubstorm levels and at the end of the substorm there is a resurgence of network activity dominated by low-latitude connections. also reaches its maximum here. We associate this resurgence of activity with the later stages of the recovery phase of the substorm as the correlation window still encompasses a large portion of the recovery phase (the leading edge window time is plotted). Substorm 2 only shows a minor resurgence of activity at the end of the recovery phase consisting almost entirely of low-latitude connections. does only returns to presubstorm conditions after several hours.
Substorm 3, unlike the previous two substorms, has a substorm occurring 5 h before the substorm of interest at 18:00 UT. Both substorms can be seen on the plot. IMF b z remains negative following the end of the previous substorm. There is a strong network response to onset of the substorm, with reaching 0.28 during the onset peak. At onset there is again a large number of high-latitude connections indicated by Θ uu . There is a second peak in connectivity at the end of the recovery phase where alpha reaches 0.55, i.e., half of all available connections are present. These low-latitude connections dominate here. also reaches maximum here after a slow ramp up during the substorm.
For substorm 4, b z is southward well before the onset and AE is also in a perturbed state. There is a gradual increase in connectivity before the substorm onset with reaching 0.22 here. The first peak around onset is dominated by high-latitude connections. continues to increase reaching a maximum of 0.4 during the recovery phase. This phase shows an increase in low-latitude connections. also reaches maximum during the recovery phase after a slow ramp up during the substorm. The likelihood of false connections during this substorms is much higher than the other three substorms (for consistency the same normalized degree, n 0 , for the network was chosen for all events). The onset peak, however, is largely free of false connections. Substorm 4 shows some activity in the network well before the substorm onset, although it is difficult to identify whether it is associated with this specific substorm or is indicative of unrelated activity.
The magnitude of the response in network parameters, for the test case substorms appears to be largely independent of the magnitude of peak AE during the substorms. This seems to indicate that the network is not simply tracking the magnitude of ongoing activity. Sustained AE > 300 nT is not necessarily associated with near-simultaneous magnetic activity. This can be seen in substorms 1 and 2 where there is a clear dropoff in connectivity in the network postpeak while AE remains high. This dropoff in cannot be seen as clearly in substorms 3 and 4 possibly due to the short recovery phase in comparison to substorms 1 and 2.
The networks can be visually represented; we show this for substorm 1 in Figure 4. Figure 4 shows snapshots of the connection maps (left column), maps of the spatial normalized degree distribution (middle column), and Polar VIS data (right column) for times that are indicated in Figure 3 (top left). The connection maps show that the 2 h before onset (1) there is little connectivity in the network, and any existing connections are local. At the onset phase (2) the connection structure is composed of highly concentrated connections at high latitude (the green connections) as well as significant cross-latitudinal connections (the red connections). The connections at this stage are situated around the onset brightening seen in the Polar VIS images. This is seen clearly in the degree maps, stations in the evening sector at high latitudes having large normalized degree. During the recovery phase (3) the correlated behavior shifts to cross connectivity between regions centered around 20 MLT and 8 MLT. At the end of the recovery phase (4) the network is at its most globally distributed, with significant connectivity between the dayside and the nightside. In general, stations with the highest normalized degree are found outside of the expanded auroral bulge region during the recovery phase substorm. However, one cannot directly compare a single snapshot of the auroral visible light emission to the network connection structure, which aggregates information from a 2 h time window.

Network Response to Other Phenomena
To test the robustness of this approach, we now apply the same methodology to both a "quiet day," defined here by a lack of substorms occurring and a steady magnetic convection event. The results are plotted in Figure 5 where the axes have the same scales and format as Figure 3. The quiet day was selected at random, and there were no constraints on the solar wind conditions. We can see that for the quiet day ( Figure 5, left) does not exceed 0.04, which is significantly lower than that occurring during substorms, and similarly, does not exceed 0.35. There is little discernible network response to the brief step-like southward turnings of the IMF and the subsequent responses in AE.
For the SMC event, Figure 5 (right), there is a gradual increase in at the onset of the event. The increase in connectivity coincides with the increase in AE from ambient levels. continues to increase and reaches a maximum of 0.11 at the end of the event, which is a factor of 3 less than seen during substorms. Throughout the event connections are dominated by cross-latitude and high-latitude connections. is raised from typical background levels from the onset of the perturbations in AE and reaches a maximum of 0.55. During this event the nightside sector was well represented by the station configuration, from Figure 6 we can see that there is a comparable station configuration, and thus spatial sampling, to substorms 1 and 2.
To summarize, the typical signatures of isolated substorm activity are then as follows: 1. Few connections in the network before onset of substorm. 2. The network exhibits a clear rapid response at onset indicated by an increase in connectivity, > 0.22.
High-latitude connections are a key feature of the onset peak; however, low-and cross-latitudinal connections are also present. 3. There is a switch from a high-latitude-dominated connection structure to a low-latitude-dominated connection structures during the later stages of the recovery phase. During the recovery phase usually reaches maximum. 4. The maximum and reached during the substorms is ≥ 0.32 and ≥ 0.7, respectively. In comparison, the maximum and during the SMC event was 0.11 and 0.55, respectively. Similarly, for the quiet day, the maximum = 0.04 and = 0.35.