Solar cycle variation of the statistical distribution of the solar wind ε parameter and its constituent variables

We use 20 years of Wind solar wind observations to investigate the solar cycle variation of the solar wind driving of the magnetosphere. For the first time, we use generalized quantile‐quantile plots to compare the statistical distribution of four commonly used solar wind coupling parameters, Poynting flux, B2, the ε parameter, and vB, between the maxima and minima of solar cycles 23 and 24. We find the distribution is multicomponent and has the same functional form at all solar cycle phases; the change in distribution is captured by a simple transformation of variables for each component. The ε parameter is less sensitive than its constituent variables to changes in the distribution of extreme values between successive solar maxima. The quiet minimum of cycle 23 manifests only in lower extreme values, while cycle 24 was less active across the full distribution range.


Introduction
The 11 year solar cycle is variable, with an extended minimum observed in cycle 23 and a notably quiet maximum in the most recent cycle 24 [Lockwood, 2013;Zerbo and Richardson, 2015]. Satellites in the solar wind upstream of Earth provide a comprehensive data set covering several solar cycles. Variables observed in situ are combined into solar wind parameters that aim to capture the driving of the magnetosphere by the solar wind [e.g., Gonzales, 1990]. This letter focuses on the systematic changes in the statistical distributions of these solar wind parameters between different phases of the solar cycle.
The Poynting flux S is the energy flux density carried by the electromagnetic fields of the solar wind plasma. Along with S, we will consider three of the most commonly used parameters: the parameter [Perreault and Akasofu, 1978], in which S is scaled to account for energy transport into the magnetosphere [Koskinen and Tanskanen, 2002]; the westward electric field, estimated as v x B z [Burton et al., 1975]; and finally, B 2 , found to be more closely related to solar activity [Kiyani et al., 2007].
There has been extensive work on the statistics of solar wind variables. Early work [Burlaga and King, 1979] identified an approximately lognormal probability density function (PDF) of the magnetic field strength. Feynman and Ruzmaikin [1994] showed that the PDF of the interplanetary magnetic field (IMF) field strength could not be exactly lognormal due to nonzero kurtosis, which was substantiated later by Burlaga and Ness [1996] who determined the PDF to be lognormal with an exponential tail but also discussed the possibility of a Pareto tail. Koons [2001] successfully fitted the Gumbel class of extreme value distribution to annual maxima of the 60 MeV proton flux. Moloney and Davidsen [2010] found that the block maxima of the parameter follow the Fréchet distribution; however, this fit overestimated the highest values. Since the Fréchet class is the limiting distribution for a variable that has a PDF with a power law tail [Schumann et al., 2012], a Fréchet fit to block maxima of the parameter would support Burlaga's earlier proposal of a lognormal PDF with a Pareto tail. Burlaga and Lazarus [2000] also found evidence for lognormal distributions of the solar wind speed, density, and temperature, and that these distributions vary with the phase of the cycle. They attributed this variation to the dominance of corotating streams in the solar wind at solar minimum [Tsurutani et al., 2006]. Common to all these findings is the multicomponent nature of PDFs of solar wind variables.
A complementary approach to the statistical analysis of solar wind variables is the use of bursts. These are defined as the integrated signal over periods where the variable continuously exceeds a given threshold; they have been used to compare the solar wind with both solar flares [Moloney and Davidsen, 2011] and geomagnetic indices [Freeman et al., 2000a] • Figure S2 • Figure S3 • Supporting Information S1 to be power law with an exponential roll off [Wanliss and Weygand, 2007;Freeman et al., 2000b], as has the distribution of waiting times of bursts in B z,S [D'Amicis et al., 2006]. Of these fits, Wanliss and Weygand [2007] found the power law exponents of the and (v x B z ) s distributions to be solar cycle dependent; however, when looking at higher thresholds, Moloney and Davidsen [2014] found no such dependency. In the related studies of geomagnetic indices, Freeman et al. [2000a] found the distribution of burst lifetimes of AU and AL to be multicomponent, including a power law and exponential cutoff similar to , and Hush et al. [2015] found a multicomponent distribution of AE burst size, including a solar cycle-dependent exponential component at extreme values.
While these studies have been successful in fitting parts of the distribution of solar wind parameters, there are still open questions, including how the PDF changes over the solar cycle and which of the constituent variables within a parameter such as drives these changes. In this paper, we address these questions, with the first application of quantile-quantile (QQ) plots [Gilchrist, 2000] to the comparison of observed distributions of each solar wind parameter measured at different phases of the solar cycle. We examine the changes in the cumulative distribution functions (CDFs) of the coupling parameters S, B 2 , , and (v x B z ) s in cycles 23 and 24, between each maximum and minimum, between successive maxima, and between successive minima. In each case the distribution is multicomponent, and we find that over a broad range each subcomponent has a functional form that remains unchanged across all phases of the solar cycle. The change in CDF, and hence PDF, is captured by a transformation of variables for each subcomponent; this transformation takes the form of P(x) → P( x + ), independent of the underlying PDF functional form. Our results quantify the effect of the quiet minimum of solar cycle 23 and the quiet maximum of cycle 24 on the likelihood of observed values of these parameters. We see systematic changes in the PDFs of the constituent variables of and identify which variable drives the variation in the statistics of . We find that the statistics of can be less sensitive to these changes than those of its constituent variables.
In section 2 we introduce the data set and solar wind parameters. In section 3 we describe the QQ plot method and apply it to the solar wind Poynting flux. In section 4 we repeat this analysis for parameters B 2 , solar wind , B z and v x B z , and for all parameters across the full solar cycles. We conclude in section 5.

Data
Data from the Magnetic Fields Investigation and Solar Wind Experiment on the Wind spacecraft were provided by National Aeronautics and Space Administration/Goddard Space Flight Center (NASA/GSFC's) OMNI data set. Data were only taken at those times when the spacecraft was situated in the upstream solar wind. During preprocessing, the 15 s cadence interplanetary magnetic field (IMF) and 92 s ion plasma velocity are interpolated to 1 min resolution. The plasma velocity measurements include both fast and slow streams, and all measurements are in geocentric solar magnetospheric (GSM) coordinates. One year of data was taken spanning each of the minima and maxima of cycle 23 and cycle 24, using inclusive dates December 1995 to November 1996 (cycle 23 minimum), October 1999 to September 2000 (cycle 23 maximum), August 2007 to July 2008 (cycle 24 minimum), and November 2013 to October 2014 (cycle 24 maximum). Data gaps are reasonably evenly distributed between successive maxima and minima; they typically cover ∼10% of the data, ranging from ∼5% for the minimum of solar cycle 24 to ∼26% for the maximum of cycle 23. The remaining sample sizes are of order 400,000 samples in each data set. The Wind satellite orbits L1 and is, for our yearlong samples at maxima and minima, within 100 R E of the Sun-Earth line. It is outside this range for ∼8% of the full data set.
The Poynting flux, S, is the characteristic energy flux carried by the solar wind. For ideal magnetohydrodynamics, the electric field is E = −v × B, so that the component of Poynting flux along the Sun-Earth line is approximated by The parameter is based on the Poynting flux but is scaled to account for the interaction between the solar wind and magnetosphere. In SI units it is where v is the ion plasma speed, B S is the magnitude of the IMF with southward B z , l 0 is a length scale taken to be 7 R E , and = tan −1 (B y ∕B z ) is the clock angle. The parameter depends on all three components of the IMF, while the Poynting flux depends on B y and B z only.
Energy transfer between solar wind and the magnetosphere is in part ordered by the flux of southward directed IMF (i.e., negative B Z in GSM), due to the increased rate of magnetospheric reconnection. We therefore calculate the solar wind parameters S and B 2 for both the full data set and the subset of data where B Z < 0 and the magnetosphere coupling parameters and v x B z at times of southward IMF only. After removing the northward IMF values, the southward IMF subsets of data contained approximately 200,000 data points. The subscript S will denote the southward IMF only subsets of each parameter.

QQ Plots for Poynting Flux
The quantile-quantile (QQ) plot is a tool for comparing two statistical distributions. The distribution of an independently and identically (iid) random variable X can be described both by its probability density function (PDF), P(x), and the corresponding cumulative density function (CDF), C(x), which are related via is the likelihood that the value of variable X will occur below a value x (X ≤ x). The CDF can be inverted, so that for any likelihood q, a value x(q) can be calculated such that the likelihood of a measurement of X occurring below x(q) is q. The value x(q) is referred to as the qth quantile of the distribution of variable X; for example, the q = 0.5 quantile of the distribution, x(0.5), is the median: the value below which 50% of the data lie.
Consider two iid random variables, X 1 and X 2 , distributed according to PDFs P 1 (x 1 ) and P 2 (x 2 ) with corresponding CDFs C 1 (x 1 ) and C 2 (x 2 ). The quantiles of the two distributions are as follows: where C −1 (q) is the inverse of CDF C(x(q)). The QQ plot then has as its coordinates the quantiles x 1 (q) and x 2 (q), where the likelihood q is a parametric coordinate. If X 1 and X 2 have same distribution, their quantiles will be the same, and so a straight "y = x" line of gradient 1 will be recovered. Any other straight line on the QQ plot would be described by in which case the CDFs are related by a transformation of variables: With equation (3) this yields so that the corresponding PDFs are related by the same transformation of variables. The scaling parameter preserves the normalization, so that the integrated PDF is unity. Hence, if the QQ plot is linear, the functional form of the underlying PDF of both variables X 1 and X 2 is the same, subject to a transformation of variables from x to (x − )∕ . The parameter denotes a change in scale, while is a shift in location. A positive value indicates an increase in mean, so that the likelihood of large values of X increases as the whole PDF is shifted.
If > 1, a given likelihood q corresponds to a larger value of X, resulting in the stretching of the PDF tail. This in turn modifies the raw higher moments of the distribution; however, the standard measures of skewness and kurtosis can be normalized to account for the change in scale [Gilchrist, 2000]. Both and can be found from the data with a linear regression to the QQ plot.   Table S1 in the supporting information.
the successive maxima, and in the right column we compare each cycle's maximum to its minimum. In each case we compare the 0.0001 to 0.9999 quantiles. A plot in the same format is given for additional quantities in the supporting information, along with a discussion of uncertainties. Figures 1ai-1aiii show the results for the solar wind Poynting flux. In each case the QQ plots are multicomponent, with three approximately linear regions. A linear region of the QQ plot indicates that the functional form of the PDF in that region does not change; thus, a transformation of variables captures how the distribution is changing over the solar cycle, as described above. These ranges are indicated on the plots. We perform a three-component linear-piecewise fit to these plots. The R 2 statistic associated with the each linear fit (given in Tables S1-S3 in the supporting information) exceeds 0.9 in almost all cases, indicating that the linear transformations provide a good representation of the change in distribution.
The QQ plot of Poynting flux with southward IMF, S S , comparing successive minima (Figure 1ai (5), showing a decrease in the scale of these values, but a large (∼10%) increase in their mean. The high and extreme components appear above the y = x reference line, so that the increased activity of the minimum of cycle 24 relative to cycle 23 is manifest mainly in these extreme values, rather than in the bulk of the distribution.
Conversely, we can see that in Figure 1aii the quantiles all lie below the y = x line, so that the maximum of cycle 24 was significantly less active than the maximum of cycle 23. Again, the QQ plot can be divided into linear regions which we denote as bulk, high, and extreme components. These linear relationships again indicate that within each of the components of the underlying PDF, the functional form is the same at both solar maxima. The crossover points between these regions of the CDF occur at q = 0.9800 and q = 0.9988. Unlike the change between successive minima, the bulk component of the distribution changes scale significantly between the successive maxima, with The high component of the PDF therefore decreases in scale but slightly increases in mean, while the opposite is true for the extreme component. We repeated this analysis comparing the full 11 year data sets of cycles 23 and 24 (see supporting information) and find that cycle 23 is overall more active.
Between the successive maxima, the translations of the high and extreme components are sensitive to whether the full data set or the southward IMF subset is included in the CDF. The bulk component undergoes approximately the same transformation of variables in both cases. The change of scale of the high component increases from = 0.209(2) for S S to = 0.366(5), while for the extreme component is roughly the same, changing from = 1.32(20) to = 1.42(10). The shift in location is ∼50% smaller for the high component and ∼20% larger for the extreme component, once the northward IMF values of S are included. The QQ plot thus has three well-defined linear components that translate according to equation (5) between phases of the solar cycle, irrespective of whether or not northward IMF values are included. However, the changes in scale and location ( and ) of the underlying PDF are more pronounced if we only consider the southward IMF subset. Figure 1aiii compares the distribution of Poynting flux at the maximum and minimum of each cycle. Again, the QQ plot is composed of several approximately linear regions, indicating the multicomponent and invariant functional form of the PDF discussed above. For both solar cycles, the bulk component occurs close to the origin, so that on this scale the QQ plots appear to show only the high and extreme components. These components again undergo a simple translation (as in equation (5)) between solar minimum and maximum; however, the transformations are different for the two cycles. In cycle 23, the high component has a large change in scale of = 9.18(17), while the extreme component shifts in location, = 0.291(31) but has a smaller change in scale ( = 1.57(32)). For cycle 24, the high component remains closer to the y = x line ( = 0.422(46), = 0.0587(43)) and so shows a less pronounced change between solar maximum and minimum. The extreme component does have a significant transformation, with both a large change of scale = 5.90 ± 1.98 and large shift in location = −0.618(266). Therefore, the quietness of cycle 24 is manifest in the bulk and high components, which are the same at maximum and minimum; only the extreme values above the q = 0.9989 quantile show increased likelihood at maximum relative to the cycle minimum.

QQ Plots for B 2 , S , B z,S , and (v x B z ) S
Figures 1bi-1biii through 1ei-1eiii plot the above solar cycle phase comparisons for the B 2 , B z,S , S , and (v x B z ) S parameters, with the latter three calculated using only values with southward IMF. The QQ plots can again be split into multiple approximately linear components, with R 2 values consistently above 0.9, so that the PDF functional form is approximately invariant for all parameters; it, again, simply translates as in equation (5).
Quantiles of both the full and southward IMF only B 2 data sets are shown in Figures 1bi-1biii. Qualitatively, these panels show the same behavior as in Figures 1ai-1aiii, indicating that S simply tracks B 2 . In all three plots, the transitions between components occur at similar quantiles for both B 2 S and S S . However, although the trends are the same, the and parameters differ between S S and B 2 , suggesting that the solar wind speed does not affect the form of the transformations but does alter the details. The comparison of the solar maxima ( Figure 1bii) also shows the same sensitivity as the Poynting flux to whether all values or the southward IMF only subset is used.
Between successive solar minima, the QQ plot for S (Figure 1ci) is qualitatively similar to those of S S and B 2 S (Figures 1ai and 1bi). However, when we consider how the distribution translates between successive solar maxima ( Figure 1cii) the distribution of the S parameter transforms in a remarkably different way to the other variables. The PDF retains its functional form and is translated by a single change in scale over the full range. The value of this transformation is 0.501(1), which is similar to the values of the bulk components of the S S and B 2 S . The high and extreme values of S thus do not retain the distinct changes between successive maxima seen in S S and B 2 S . Similar behavior of S is found when comparing the maximum and minimum of cycle 23 in Figure 1ciii; that is, the high and extreme components are indistinguishable and translate with a single change of scale = 4.81(9). Now, depends upon all three components of magnetic field, whereas the Poynting flux depends on the y and z components only; also includes a factor that depends on the clock angle. In Figure S1, we plot, in the same format as Figure 1, the changes in B 2 x and B 2 yz = B 2 y + B 2 z , along with the clock angle contribution to . We see that the distribution of the clock angle contribution does not change between the different phases of the solar cycles; therefore, this does not explain the relative insensitivity of . However, the distributions of B 2 x and B 2 yz translate in opposite directions on the plot; for example, the extreme components have < 1 and > 1, respectively (values are = 0.0602(408) and = 1.50 (20)). When these are combined within a single parameter, they may tend to suppress these changes. We have repeated this analysis (see supporting information) for the Milan and Newell coupling parameters [Milan et al., 2008;Newell et al., 2007] and find that they are intermediate in sensitivity between and Poynting flux. In common with the Poynting flux, they only depend on the y and z components of the IMF.
Within Figures 1di-1diii and 1ei-1eiii, the distributions of variables B z,S and (v x B z ) S show little change between the two minima except at the extreme values (Figures 1di and 1ei), with less sharp transitions between components. The comparison of the maxima for these variables (Figures 1dii and 1eii) shows three linear components, but again with less clear transitions; this is reflected in the lower R 2 values, which are 0.9722 and 0.9670 for the high component of B z,S and (v x B z ) S , respectively, compared to 0.9938 and 0.9829 for S S and B 2 S . The similarity of the QQ plots for B z,S and (v x B z ) S again suggests that the magnetic field drives the variation of the distribution of the constructed parameters. The solar wind speed also contributes to the change in distribution between the maximum and minimum of cycle 24: compare Figures 1diii and 1eiii with Figure S1aiii in the supporting information.
In Figures 1aiii, 1biii, and 1ciii we see similar qualitative behavior across the parameters. The bulk and high components of the cycle 24 QQ plot are close to the y = x line in most cases, indicating that the distribution of each variable for cycle 24 was the same at maximum and minimum up to the highest quantiles. The outstanding case is in Figure 1diii, where the bulk component of the B z,S distribution transforms in the same manner for both cycle 23 and 24, up to the q = 0.9800. The relatively low activity of cycle 24 is also evident; for all variables, the quantiles of cycle 24 lie below those of cycle 23. The extreme component occurs between q = 0.9989 and q = 0.9995 for all variables in cycle 24, and all except S and (v x B z ) S in cycle 23, suggesting that in all these variables (with the exception of S ) there is an extreme tail which transforms differently to the rest of the distribution.

10.1002/2016GL068920
Finally, we compared the distributions of the two full cycles; see the supporting information for QQ plots and a detailed discussion. Overall, cycle 23 shows higher activity, despite its quiet minimum. For all four parameters, ranges are again found over which the QQ plots are linear, suggesting that the transformation of variables in equation (5) captures the change in the PDF.

Summary
We investigated how the probability density functions of four solar wind-magnetosphere coupling parameters change over the two most recent solar cycles using quantile-quantile plots. The distributions for Poynting flux, B 2 , S and the westward electric field (v x B z ) S were compared between cycle 23 minimum and cycle 24 minimum, between cycle 23 maximum and cycle 24 maximum, and between the maximum and minimum of each of cycles 23 and 24.
We found that for all parameters the PDF has a multicomponent functional form which does not change over the solar cycle or between cycles. Instead, a linear transformation of variables for each component is required to map that region of the PDF from one solar cycle phase to another. These transformations can be found by fitting least squares regressions to linear regions of the QQ plot. We identified three regions of the distributions in the QQ plots; the bulk component, up to q ∼ 0.96, and beyond this the high and extreme components. The bulk component undergoes a change in scale between solar maxima but is roughly the same at both minima. The high and extreme components for S, B 2 and (v x B z ) S are each associated with a unique change in both scale and location. The S parameter behaves differently to S S , B 2 S and (v x B z ) S as the change in its PDF between successive maxima is captured by a single change in scale over the full range, so the high and extreme values are insensitive to the changes in their distribution seen in S S and B 2 S . The QQ plots for S resemble those of B 2 , and likewise the QQ plots for (v x B z ) S track those of B z,S , implying that changes in B drive changes in the PDFs of the solar wind coupling parameters. Cycle 23 exhibited a quieter minimum than cycle 24, which is manifest in the distribution as larger amplitude (quantile) high and extreme components at the minimum of cycle 24 than cycle 23, while the bulk component was roughly the same for both minima. The overall activity of cycle 23 was higher than that of cycle 24, seen across the distribution at the maximum of cycle 23 compared to the maximum of cycle 24. The changes in the distribution of S and B 2 between successive maxima were also found to be more pronounced in the distribution of values when IMF is southward.
As the functional form of the distribution of each solar wind variable does not change between extrema of cycles 23 and 24, it could be an intrinsic property, and as such would remain the same in future cycles. This qualitative property of the full distribution provides a benchmark for models and may support prediction of the likelihood of extreme space weather events. However, as we have shown, this depends on the variable chosen, as the parameter is less sensitive to changes in the likelihood of its large values than the S, B 2 and (v x B z ) S variables.