Journal of Geophysical Research : Space Physics Solar Wind Plasma Parameter Variability Across Solar Cycles 23 and 24 : From Turbulence to Extremes

Solar wind variability spans a wide range of amplitudes and timescales, from turbulent fluctuations to the 11 year solar cycle. We apply the data quantile-quantile (DQQ) method to NASA/Wind observations spanning solar cycles 23 and 24, to study how the uniqueness of each cycle maximum and minimum manifests in the changing statistical distribution of plasma parameters in fast and slow solar wind. The DQQ method allows us to discriminate between two distinct components of the distribution: the core region simply tracks the solar cycle in its moments but shows little sensitivity to solar wind state or the specific activity of each cycle. This would be consistent with an underlying in situ process such as turbulence driving the evolution of fluctuations up to an outer scale. In contrast, the tail component of the distribution is sensitive both to the differences between the maxima and minima of cycles 23 and 24, and the fast or slow state of the solar wind. The tail component varies over the solar cycle in such a way as to maintain a constant functional form, with only its moments varying with solar activity. Finally, after isolating the core region of the distribution, we test its lognormality over the solar cycle in each solar wind state and find the lognormal provides a more robust description of the statistics in slow wind than fast; however, in both states the goodness of fit is significantly reduced at solar maximum.


Introduction
The solar wind in the ecliptic exhibits two states: fast wind, which originates from the edges of long-lived, extended polar coronal holes (Hassler et al., 1999), and slow wind, thought to be generated in low-latitude, variable streamer belts and active regions (Schwenn, 2006). The fast wind is relatively smooth, whereas transient features including Viall puffs (Viall et al., 2010) and Sheeley blobs (Sheeley et al., 1997) are more characteristic of the slow wind (Rouillard et al., 2010). The 11 year solar cycle impacts upon the coronal sources of the fast and slow solar wind (Luhmann et al., 2002;Schwenn, 2010), so that on long timescales the solar wind reflects the variability of its solar origin. As each solar cycle is unique in its duration, its minimum, and its peak intensity, the impact on the solar wind is different for each cycle (Richardson et al., 2000;McComas et al., 2013). Turbulent fluctuations develop in both fast and slow wind (e.g., see textbooks, Tu & Marsch, 1995;Bruno & Carbone, 2013), resulting in further variability on scales from ∼1 h down to below the ion gyroperiod (Kiyani et al., 2015). Turbulence evolving in situ in the solar wind flow may be expected to exhibit universal properties such as a power law power spectrum with a characteristic exponent (Osman et al., 2014) and non-Gaussian fluctuations (Sorriso-Valvo et al., 2001) that show scaling (Horbury et al., 2005;Chapman & Hnat, 2007). As the solar wind plasma is subject to solar cycle driving, the varying solar activity can be used to test the universality of these characteristics; for example, Chapman and Nicol (2009) compared the power spectra and scaling properties of the turbulence seen by Ulysses in fast polar flows and found that while the overall power in the fluctuations varied between the different minima of cycles 23 and 24, the scaling properties were stable against these changes in the driving, consistent with a universal turbulent process.
Extensive observations of the solar wind spanning several solar cycles allow the use of statistical methods to quantify the evolution of solar wind variability with solar cycle phase and between different cycles (Dmitriev et al., 2009;. Studying the full distribution of observed values of the interplanetary magnetic field (IMF), Burlaga and King (1979) found an approximately lognormal distribution; this was later corroborated by Slavin and Smith (1983). As the lognormal is the limit distribution for the observed values of a multiplicative process (Mitzenmacher, 2004), this finding is consistent with the turbulent nature of the solar wind plasma (Bruno & Carbone, 2013, and references therein). However, as shown by Feynman and Ruzmaikin (1994), the nonzero kurtosis of the IMF distribution is inconsistent with an exact lognormal.
Despite their rarity, it is the most extreme space weather events that have the largest impact on Earth. However, they are sparsely observed so it is challenging to test potential models, and the dynamic nature of solar activity constantly changes what is considered an "extreme" event (Riley & Love, 2016). The tail of the distribution may therefore be sensitive to the solar cycle phase, as well as to longer-term trends in space climate (Lockwood et al., 1999). Wheatland (2000) modeled the waiting time between solar flares as a Poisson process whose rate parameter varied with solar activity. Tsubouchi and Omura (2007) applied the same approach to the Dst geomagnetic index but found the model was only valid provided the activity did not exceed a limit. Above this limit, extreme events occurred too close together in time to satisfy the independence condition inherent in the Poisson process, even after declustering had been applied. In other studies, the impact of the solar cycle on extreme events is investigated by performing the same extreme value analysis (Coles, 2001) at different solar cycle phases. Hush et al. (2015) compared the size distribution of bursts in the time series of the AE geomagnetic index, where a burst is defined as a continuous period where the signal exceeds a threshold, between three solar maxima and minima. The size distribution of the largest events was consistently exponential; however, the fit parameters were highly sensitive to each solar cycle. Moloney and Davidsen (2014) found that the size and duration of bursts in the solar wind parameter were insensitive to the solar cycle after adjusting for a cycle-dependent "characteristic timescale." Amicis et al. (2006) studied the waiting times between periods of southward IMF, otherwise thought of as bursts in −B z , and found a two-component distribution: a power law region exhibiting long-range memory and an exponential cutoff where Alfvénic fluctuations dominate. The relative weights of these components varied with solar activity. As the data quantile-quantile (DQQ) methodology allows us to explore the full distribution, our study will also address the solar cycle variation of the extreme values of solar wind plasma parameters, without requiring assumptions concerning the functional form of the distribution tail.
The data quantile-quantile (DQQ) plot can be used as a model-independent method for tracking changes in the statistical distribution of a variable across its full range. We applied it (Tindale & Chapman, 2016) to space weather relevant parameters in observations spanning solar cycles 23 and 24 seen by NASA's Wind spacecraft. This revealed that solar wind-magnetosphere coupling parameters have a multicomponent functional form that is invariant to the solar cycle, with moments tracking changing solar cycle variability. Here we use the DQQ methodology to perform the first detailed study of variability in fast and slow solar wind across the full distribution. As no assumptions concerning the functional form of the distribution are required for this technique, it can be used to track changes in the moments of each part of a multicomponent distribution, as well as changes to the breakpoints between each component's range of influence. The Wind data set spans two solar cycles, and we will use it to study the time dependence of the distribution of solar wind speed, IMF magnitude, and the components of the IMF observed in the fast and slow solar wind states. The more extreme values (the "tails") of the likelihood distributions of IMF magnitude and component are composed of two parts whose functional forms do not change, but whose moments are sensitive both to solar cycle phase and intensity at solar maximum and minimum; this multicomponent behavior is not fully resolvable in slow wind, though the tail of the distribution is still distinct from the turbulent core. In the core region, we find the mean of the distribution varies with solar cycle while the variance is roughly constant, independent of the fast or slow state of the solar wind. Finally, we test the lognormality of the distribution core over solar cycles 23 and 24 and find the model provides a reasonable description at solar minimum, but provides a poor description of the distribution at maximum due to the dominance of sharp, localized peaks in the distribution, particularly in fast wind.
We begin by describing the data set and detailing aspects of the DQQ methodology relevant to this analysis in section 2, including a demonstration of the method using synthetic data. In section 3 we apply the method to the distribution of solar wind plasma parameters: first, to the full data set to resolve statistically the behavior 10.1002/2017JA024412 of the more extreme values and subsequently splitting the data into the fast and slow solar wind states. We identify the turbulent core of the IMF magnitude distribution and discuss its dynamics in section 4, focussing first on the time dependence of its moments and second on the efficacy of the lognormal model in describing the distribution during different phases of the solar cycle. Finally, our conclusions are presented in section 5.

Data Used in the Study
We analyze data from the NASA Wind spacecraft at times when it was in the upstream solar wind, accessed through the NASA/GSFC OMNI data set. Fifteen-second IMF data from the MFI instrument and 92 s plasma velocity data from SWE were combined to give a 1 min resolution time series prior to our access. The data are in the geocentric solar magnetospheric (GSM) coordinate system. We took 1 year of data around each of the minima and maxima of solar cycles 23 and 24, spanning periods 1 December 1995 to 30 November 1996 (cycle 23 minimum), 1 October 1999 to 30 September 2000 (cycle 23 maximum), 1 August 2007 to 31 July 2008 (cycle 24 minimum), and 1 November 2013 to 31 October 2014 (cycle 24 maximum). These dates were chosen based on the sunspot number, as in Hush et al. (2015).
Each year-long sample contains roughly 500,000 data points. Around 10% of the data is missing from each time series, with data gaps roughly evenly spread through the four samples, leaving around 450,000 points per year. The data are split into the fast and slow streams with a boundary at 450 kms −1 , chosen as the midpoint between the typical fast and slow wind speeds of ∼300 kms −1 and ∼ 600 kms −1 , respectively, to reduce as far as possible the contamination of slow wind data points into the fast wind data set and vice versa. This leaves around 150,000 data points in each fast wind sample and 300,000 in each slow wind sample. While the sample size in fast wind is significantly smaller than slow wind, both are sufficiently large to allow robust statistics for the DQQ method.

Constructing the DQQ Plot
The data quantile-quantile (DQQ) plot is a model-independent method for comparing two empirically sampled statistical distributions (Braun & Murdoch, 2016). Here we take two samples of the same variable at different times and use the DQQ plot to investigate changes in the distribution over its full range. While not requiring any a priori assumptions of the functional form of the underlying distribution, the DQQ plot can be used to test how the functional form changes over time, or whether the samples are drawn from distributions with the same model but with time-dependent moments. Moreover, if the underlying distribution is multicomponent, the DQQ plot will reveal the (potentially time-dependent) transitions between regions of parameter space that contain the distinct components or populations of observations. One can in principle compare different samples of data by qualitative inspection of overplotted probability density functions (PDFs). As we demonstrate in Figure 2 this is a far less sensitive indicator of how two distributions differ than the DQQ method. A quantitative approach would be direct fitting of the functional form of each PDF with standard distribution functions. However, for a two-component distribution, this implies an optimization to estimate 4-6 fitting parameters (location and scale, and if needed, shape parameter for each distribution) trialing distribution functional forms. If the problem at hand is to compare samples of the observations at different times to see how the system is changing, then this can be achieved simply by plotting the data on a DQQ plot without any need to fit the PDFs.
The analysis requires us to identify two timescales. First, we build up each distribution from a finite sample of data; therefore, the sampling timescale must be long enough to capture the full range of the distribution. However, it must also be sufficiently short that we can assume within-sample weak stationarity: within the sampling time we cannot resolve any change in distribution. Over the longer observation timescale the distribution may change from one state to another; by drawing samples spaced over the observation timescale we can test for these changes in distribution using the DQQ plot. The sampling timescale needs to be sufficiently short to avoid resolving changes on the known physical timescale of the system, here the approximately 11 year solar cycle, so our maximum sampling timescale is 1 year around solar maximum or minimum. At other phases of the solar cycle, this would need to be shorter. A minimum sampling timescale is chosen to give sufficient samples to resolve aspects of interest in the distribution. This depends on whether we are focussing on rare, extreme events, where we look across the entire distribution with 1 year samples, or on whether we are focussing on the core of the distribution only, where 1 month samples are sufficient. The empirical CDFs of samples x 1 (taken during period t 1 ) and x 2 (period t 2 ). Proportion q 1 of the set of observations is bounded by quantile x 1 (q 1 ) in sample x 1 and quantile x 2 (q 1 ) in sample x 2 , similar for q 2 . (b) The DQQ plot is produced by plotting x 1 (q) against x 2 (q) for all values of q between 0 and 1.
The DQQ procedure is summarized in Figure 1. Consider two samples of variable X, denoted by time series x 1 sampled during a period centered on t 1 and series x 2 sampled during a period centered on a later time t 2 . The cumulative distribution function (CDF) gives the probability of observing a value of X less than or equal to x, as a function of x (Figure 1a). It can be determined empirically by ordering the sampled data in increasing size and then plotting the data value against its position in the series normalized by the total number of observations. Note also that since the empirical CDF is normalized by the number of data points, after separating data into fast and slow solar wind, the DQQ plot traces changes in the conditional probability of the variable exceeding a certain value given that it is observed in one or the other solar wind state.
The CDF is a monotonic function and so can be inverted to give the quantile function: the value of x(q) that is the upper bound of a given proportion q of the sample (Gilchrist, 2000). If the CDFs of time series x 1 and x 2 are given by C 1 (x 1 ) and C 2 (x 2 ), then the quantiles of the time series are (1) As shown in Figure 1a, if the underlying distribution has changed between period t 1 and period t 2 (i.e., C 1 ≠ C 2 ), the same proportion of data q will be bounded by different values of x in the two time series. If the quantile function of each sample is evaluated at the same set of q values from 0 to 1, we can compare each (x 1 (q), x 2 (q)) pair. The resulting DQQ plot ( Figure 1b) has as its axes coordinates the empirical quantiles of X in each time series, and likelihood q as a parametric coordinate. Due to the step-like nature of the empirical CDF, interpolation between data values is required when calculating quantiles. Kernel density estimation is a popular tool for obtaining a smooth estimate of the empirical distribution (Bowman & Azzalini, 1997); here we will use it to obtain a continuous functional fit to the CDFs.
The shape of the DQQ plot depends on the relationship between the CDFs of the two samples. If both samples were drawn from the same distribution, the quantile functions would be the same; thus, all points on the DQQ plot would lie on the y = x line. Indeed, this method is also used to compare data with empirical realizations of test distributions with the same moments; we refer to these fit-observation diagrams as QQ plots here. Provided the distribution can be uniquely specified in terms of scale and location parameters, any other straight line indicates so the CDFs are related by 10.1002/2017JA024412 Therefore, the functional form of the distribution is the same in both cases; however, between t 1 and t 2 the parameters of the distribution changed. The change in the scale parameter is captured by the gradient of the DQQ plot, , and the change in location parameter by the intercept, . This simple linear transformation of variables allows the mapping of the first distribution onto the second, without requiring any knowledge as to the functional form of either distribution. As the CDF is the derivative of the probability density function (PDF), P(x), equation (3) can be integrated to give an equivalent conversion of the PDF: so that the functional form of the PDF is also unchanged; however, an additional factor of is required to conserve the normalization property.
If the DQQ plot is not linear (or piecewise linear in the case of a multicomponent distribution; Tindale & Chapman, 2016), this indicates the distribution functional form (if expressible in terms of location and scale parameters) has also changed between the samples. In this case, the DQQ plot can be used to discern whether the second functional form of the distribution is heavier or lighter tailed than the first (Wilk & Gnanadesikan, 1968). However, transformations (2) and (3) rely on the underlying distribution being expressible in standard form as P(x) = P((x− ) ). This is not always the case and an important exception is the lognormal distribution, which we discuss next.

DQQ Plot With Model Pseudodata
We will first demonstrate aspects of this method relevant to the analysis to follow, using pseudodata sets composed of numbers randomly drawn from test distributions. In Figure 2, we show three possible outcomes of DQQ analysis, with the empirically sampled PDF plots of the distributions to be compared in the left-hand column and the corresponding DQQ plot in the right-hand column. In the top row, both samples are drawn from the same test distribution: a Gumbel distribution with a location parameter = 1.5 and scale parameter = 0.75. As discussed above, since the underlying distribution is the same in both cases, the DQQ trace lies along the y = x line across its full range. In the middle row, the second sample is now drawn from a Gumbel distribution with different parameters, = 2.0 and = 1.0. As the underlying functional form is unchanged, this results in a DQQ trace of a single straight line. The gradient and intercept of this line correspond to parameters and in equation (2); thus, they quantify the change in the location and scale parameters of the distribution between the two samples. Lastly, in the bottom row we compare the original Gumbel distribution to a Maxwellian with scale parameter a = 1.2. At the peak of the distribution these distributions track each other, and the tail is not sufficiently statistically resolved to distinguish between the two distributions within errors. However, the DQQ plot clearly shows that the tails differ; as the DQQ trace is not linear, the functional form of the two distributions must be different. The DQQ plot can therefore reveal whether two samples are drawn from two different distributions or from the same distribution with different moments, without requiring any a priori knowledge of the functional form of the distribution or fitting of its parameters.
The procedure was repeated using data drawn from the Gaussian, exponential and lognormal model distributions. In the first two cases, the PDFs can be expressed as a function of (x− )∕ ; thus, the linear transformation described above applies. However, for logarithmically distributed data, the natural logarithm of the variable X is distributed according to the Gaussian distribution; therefore, the probability of observing X in the range This is not of standard form P(x) = P((x − )∕ ), so even if two samples are both drawn from lognormal distributions they may no longer produce a straight line on the DQQ plot. However, as the natural logarithm of x is distributed according to the Gaussian, a transformation between two lognormal distributions will result in a straight line in natural log-log space. Therefore, the corresponding change of variables is ln(x 2 ) = ln(x 1 ) + .
We again demonstrate this using pseudodata in Figure 3. Now, sample x 1 is drawn from a standard lognormal distribution (location 0, scale 1), and samples x  0 and scale 1.6 for x 2 , and location 1 and scale 1 for x 3 . In the left column of Figure 3, we see the DQQ plot with linear axes. It is immediately apparent that in the case of a change of scale (top row); a straight line will not capture the shape of the trace. When only the location parameter has varied (bottom row), the trace appears closer to linear, but its gradient and intercept do not reflect the change in moments. However, when we plot the same quantiles on natural logarithmic axes (right-hand column), a single straight line is recovered in both cases; in Figure 3aii the gradient is 1.6, as expected from this change in scale, while in Figure 3bii the intercept is 1, denoting the shift in mean.

Data Set Combining Both Fast and Slow Wind: Resolving the More Extreme Values
We now apply the DQQ method to study the evolution of the distribution of (a) solar wind speed v, (b) IMF magnitude B, and (c) components of the magnetic field. In the discussion of the turbulent core of the distribution that follows, we will for conciseness plot the IMF x component B x only, as in the core the y and z component behavior was found to be similar. The DQQ plots in Figure 4 show how the distribution of the full range of each of these parameters changes between the minima of cycles 23 and 24 (left column), the consecutive maxima (middle column), and between the maximum and minimum of each cycle (right column), for cycle 23 in red and cycle 24 in blue. Here we use all data available in each year-long sample, comprising both fast and slow solar wind states.
All plots are composed of multiple linear regions. In most cases, more than 90% of the data fall into a single component, the "core" of the distribution. The remaining 10% of data form a distinct "tail" component, which itself splits into two parts in some cases. We label the quantile at which the tail splits from the core asq, and any further breakpoints within the tail as q. In the case of the IMF components (here B x ), the variable is signed, so the distribution has both a positive and negative tail. With around 450,000 observations in each year-long sample, the tail component of the distribution of IMF magnitude B represents 50,000 observations that occurred during the year, of which 500 are in the uppermost distribution component above the q ∼ 0.999 quantile. Each component undergoes a separate transformation between phases of the solar cycle, producing DQQ plots with a piecewise linear shape. The approximate linearity of each segment indicates that the functional form of the distribution is unchanged throughout these two cycles, while the displacement of each segment from the y = x line signifies a solar cycle dependence in the moments of each component of the distribution.
The unique activity of each solar cycle is reflected in how the moments of the distribution change with time. The minimum of cycle 24 was deeper and more extended than those of the previous cycles observed  In column i we compare the minimum of cycle 23 to that of cycle 24; in column ii we compare the two solar maxima; and in column iii we compare the maximum and minimum within each cycle, for cycle 23 in red and cycle 24 in blue. Each DQQ plot is split into multiple linear components, with the range of each component defined by the highlighted quantiles;q indicates the breakpoint between the core and tail of the distribution, while any subsequent breakpoints within the tail are denoted by q. As the IMF components are signed, their distribution has both a positive and negative tail; the breakpoints between the core and these two tail components are both denoted byq.
since the space age. Its subsequent maximum also exhibited reduced activity, as shown by a lower sunspot number among other solar activity tracers (e.g., Tsurutani et al., 2011;McComas et al., 2013). Focussing on Figures 4bi-4biii, we see these differences between the cycles are reflected in the distribution of the magnetic field magnitude; Figure 4bii shows that the variance of all three components was larger at the maximum of cycle 23 than that of cycle 24, while in Figure 4biii we see a more marked change between the maximum and minimum of cycle 23 than 24, particularly in the extremal component. However, examining Figures 4ci-4ciii we see changes in distribution are less clear for the x component of the field; notably in Figure 4ciii we see for both cycles 23 and 24 the core of the distribution of B x , located between theq quantiles, did not change significantly between solar maximum and minimum.
Finally, inspecting the DQQ plots for solar wind speed in Figures 4ai-4aiii, we notice a natural split in the distribution atq between 450 and 600 kms −1 . This is approximately the boundary between fast and slow solar wind, indicating that in distribution fast and slow wind transform differently over the solar cycles. The boundary between the fast and slow states also appears at different quantiles in each sample, revealing the variable probability of an observation occurring in fast or slow solar wind, depending on the solar cycle phase. This could suggest the impact of the solar cycle manifests differently in the two winds, so we will now use 10.1002/2017JA024412 the velocity measurements to create separate samples for the fast and slow wind states and repeat the DQQ procedure on each.

Separation of Fast and Slow Solar Wind
The DQQ plots for (a) v, (b) B and (c) B x are shown in Figure 5, for fast (left column) and slow (right column) wind separately. In each case we compare the q = 0.0001 to q = 0.9999 quantiles of the distribution of each parameter at solar maximum to those at solar minimum, for cycles 23 (red) and 24 (blue). The highlighted points now show the 0.9 and 0.99 quantiles in all cases, i.e., the values that exceed 90% and 99% of each sample. To split the data into the fast and slow states, we choose a constant threshold of 450 kms −1 and define fast wind as any single data point whose speed measurement exceeds this, giving 150,000 (300,000) points in each year-long fast (slow) wind sample. Of these, roughly 15,000 (30,000) exceed the 0.9 quantile, and 1,500 (3,000) exceed the 0.99 quantile.
Beginning with the fast wind ( Figure 5, left column), we see the same qualitative behavior in B and B x as seen for the full solar wind data set in Figure 4. For the IMF magnitude this consists of a three part distribution; the q 0.9 quantile again acts as the upper limit for the core of the distribution, while the Importantly, in fast wind the q > q 0.9 tail component of the IMF magnitude distribution (B > 8 nT values) can be approximated by a linear piecewise fit, suggesting that the functional form of the distribution tail does not change with solar cycle phase and simply tracks it with changing moments. The changing scale parameter of the tail component is shown by the steep gradient of each trace: at the maximum of cycle 23, the 0.9 and 0.99 quantiles of B were 12.2 nT and 23.9 nT, respectively, compared to the median value of 5.97 nT; whereas at the minimum of cycle 23, the corresponding values were 7.66 nT and 11.8 nT relative to a median of 5.02 nT. These values also demonstrate the unique activity of each cycle; while the highest plotted quantile at the maximum of cycle 23 was 6.7 times higher than the median; at the maximum of cycle 24 this ratio was 5.5. Importantly, this shows that theq < q < 0.99 component transforms in the same way within cycles 23 and 24, whereas the transformation of the extremal q > 0.99 component is sensitive to the unique activity of each cycle.
In contrast, we cannot clearly identify the three-component structure in the distribution of the slow solar wind observations ( Figure 5, right column), although there remains a distinct core component for both the magnitude and components of the IMF. The grey boxes again highlight the B < 8 nT and −8 nT< B x < 8 nT values, denoting core regions of the B and B x distributions (as evident from Figure 7 in the case of B). The core of the distribution of IMF magnitude in slow wind has moments (location and scale parameters) that change between solar maximum and minimum, but these changes are the same between the respective maxima and minima of cycles 23 and 24. In contrasts the core of the IMF component distribution (B x shown), which in slow wind, shows no variation either within or between solar cycles. As in fast wind, the extremal tails of both the IMF magnitude and component distributions in slow solar wind are sensitive to varying solar conditions. The distribution of the IMF magnitude has a smaller scale parameter in slow wind than fast; during the maximum of cycle 23 the 0.9 and 0.99 quantiles were 9.97 nT and 15.9 nT, 1.6 and 2.5 times the median value, respectively. The scale of the distribution tail also varies less over the solar cycle for slow compared to fast wind, as shown by the smaller offset of the extremal quantiles from the y = x line.
When plotted on linear axes, the DQQ plot naturally highlights the behavior of the extremal tail of the distribution. However, the majority of data samples are drawn from the core region of the distribution, and it v (kms -1 ), solar min. displays little variation between the distinct maxima and minima of solar cycles 23 and 24. Above this scale (∼ 8 nT) the IMF distribution is sensitive both to changes within the solar cycle and between cycles 23 and 24. These characteristics can be considered in the light of results (e.g., Horbury et al., 1995;Marsch & Tu, 1997) that suggest that active turbulence operates in both fast and slow solar wind. Alfvénic turbulence has been seen in the IMF components, while, due to the slightly compressive nature of the solar wind plasma, the signature of compressive turbulence has been noted in the IMF magnitude (Goldstein et al., 1995;Hnat et al., 2005Hnat et al., , 2011. In Figure 6 we show the same quantiles of B seen in Figures 5bi and 5bii, but now on natural logarithmic axes. In particular for slow wind, and also to some extent for fast, we see straight lines up to approximately theq ∼ 0.9 quantile, indicative of a lognormal-to-lognormal transformation. The lognormal is the limit distribution of a multiplicative process, which again has been associated with turbulence (Burlaga & Ness, 1998;Vörös et al., 2015). The deviation of these lines from y = x shows that the moments of this core turbulence component vary with solar cycle phase, though not with the unique activities of cycles 23 and 24.
In summary, we see a two-part distribution for the distribution of the IMF magnitude and its components. Values B < 8 nT and |B x | < 8 nT are drawn from a core component whose moments vary between solar maximum and minimum, but are insensitive to the unique activity of each cycle as well as the fast or slow solar wind state. As shown by the logarithmic DQQ plot (Figure 6), the form of this core region is approximately lognormal. In both solar cycles the core component is more distinct in slow wind distributions, consistent with the longer period of development of turbulent fluctuations in slow wind before their detection at 1 AU. The largest observed values of the IMF form a separate extremal component in both solar wind states; in fast wind this extremal component has two parts, both of which retain their functional form over the solar cycle but have moments which change within and between the different cycles. In slow wind the transformation of the extremal component is less structured but is still separate to the transformation of the core of the distribution.

Solar Cycle Variation of the Distribution Core
We now investigate in more detail how well the lognormal model describes the core of the B distribution over the solar cycle. As shown by the deviation of the logarithmic DQQ plots from straight lines at high quantiles ( Figure 6), we expect the data to be lognormally distributed only in a restricted core range. To determine this range, we first split each year-long sample into 12 month-long samples, maintaining separate data sets for fast and slow solar wind. We then discard any month-long samples with fewer than 5,000 observations; these will have insufficient data to support an accurate estimate of the empirical distribution. A test lognormal distribution is constructed for each of the remaining monthly samples, with parameters estimated by taking the natural logarithm of the data and determining its mean and variance. In Figure 7, the empirical quantiles of each month-long sample of B are compared to the quantiles of the test lognormal using QQ plots with natural log-log axes. The 12 month-long QQ traces from each year are superimposed on the same axes; row (a) shows the 12 months surrounding the minimum of cycle 23, row (b) shows the cycle 23 maximum, row (c) is the cycle 24 minimum and row (d) shows the cycle 24 maximum. As above, the left and right columns results for fast and slow wind, respectively.
In Figure 7 we can see a central region where the trace from each month collapses onto the y = x line for all solar cycle phases. This reveals the range within which the data are lognormally distributed, which may correspond to the range within which the process is statistically dominated by turbulent fluctuations. At higher values the plots diverge due to the solar activity-sensitive tail component. This divergence can be seen around 8 nT, which we showed earlier as the upper boundary of the turbulent-dominated range. We also note the plots begin to diverge at low quantiles, at values around 3 nT; this may reflect uncertainties in calibration (Farrell et al., 1995;Gogoberidze et al., 2012). Since it is not possible to accurately determine the lower boundary of the turbulent range, as a default we retain all data up to 8 nT when discussing the core component. Figure 8 shows the time evolution of the sample mean (top row) and variance (bottom row) of each month-long sample, with the moments of the full data set denoted by paler colors and those of data within the range B < 8 nT denoted by darker colors. The parameters are shown for fast and slow wind in red and blue, respectively. Inspecting first the top row of Figure 8, we see the expected solar cycle variation of the mean. When we consider the full distribution (i.e., paler colors), we see the variation in the calculated mean is greater in fast wind than slow, due to the sensitivity of the distribution to extreme events. When these are excluded by using only data B < 8 nT, the means of the turbulent core in fast and slow wind are similar and together track the solar cycle.
The variance of the full data set is also is dominated by extreme values; at solar maximum the variance of the full fast wind data set can be 10 times higher than that of the restricted fast wind set. This ratio is lower in the slow wind, with a factor of 5 between the full and B < 8 nT data set variances. Again, when we use only . The PDF is given by the normalized histogram with logarithmically spaced bins; thus, its error is the square root of the number of samples in each bin. Superimposed are the lognormal models whose moments are the same as those of the data, with the same color format as Figures 8 and 9. the B < 8 nT values, the variance of the fast and slow wind become more similar, though variance in slow wind appears to consistently exceed that in fast wind by roughly 0.5 (nT) 2 . However, we now see that the intrinsic spread in the variance of the turbulent core dominates over any underlying solar cycle variation.
Having studied the evolution of the moments of the data, we now test to what extent the data are described by the lognormal model across the solar cycles. In Figure 9, we plot the evolution of the normalized sum of squared residuals between the empirically sampled lognormal model (as in Figure 7) obtained by using the moments and the empirical PDF calculated from each monthly sample, in the same format as Figure 8. The empirical PDFs are estimated by a normalized histogram with logarithmically spaced bins, with an error estimated as the square root of the count number. The number of bins was allowed to vary so as to be high as possible while maintaining a relative error below 5% for all bins below the 8 nT boundary for the turbulence-dominated range. The sum of square residuals (SSR) for the full data set, shown by the paler colors in Figure 9, is normalized by the number of bins used to sample the PDF; the normalized SSR of the restricted fit is normalized by the number of bins centered below 8 nT so that these are directly comparable. In Figure 10  In Figure 9, the consistency of the lognormal model with the full distribution (pale lines) is reasonably constant over these two solar cycles for both solar wind states, fluctuating around 0.0015 (nT) 2 for fast and 0.0010 (nT) 2 for slow wind. The amount of variability around these values is increased at solar maximum. When we estimate the model parameters using only the < 8 nT core turbulence component, we see a much clearer solar cycle variation, particularly in fast wind. At solar minimum, the lognormal corresponds more closely to the B < 8 nT observations than the full data set, with a relatively smaller SSR of 0.0010 (nT) 2 in fast and 0.0008 (nT) 2 in slow wind. However, at solar maximum the lognormal function to B < 8 nT data performs less well, with a significantly larger normalized SSR around 0.004 (nT) 2 in both cases, suggesting at solar maximum the data are less well described by the lognormal model. The maximum normalized SSR values in slow wind appear later than those in fast; however, this may simply be due to restricted resolution on the plot; as there were few slow wind data points in these months, the number of bins used to calculate the empirical PDF had to be reduced to ensure statistical accuracy. This also accounts for the three anomalous peaks during 1999 and 2002 seen in the B < 8 nT model for slow wind.
Finally, in Figure 10 we overlay PDFs of these lognormal models with the observed PDFs. This may provide some insight into why the lognormal may not be a suitable description of the data at solar maximum. At solar minimum, the distribution tends to be dominated by a single peak; by excluding extreme events from the data set the <8 nT model is better able to capture the height and shape of this peak. However, at solar maximum the shape of the distribution is highly irregular, with multiple smaller features as opposed to a single peak. The uncertainties on the empirical PDF show that these features are intrinsic to the data and are not simply an artifact of the PDF estimation method. We provide further examples in the supporting information (SI) (Figures S1 to S4). These show that for some month intervals around solar maximum the PDF has a two-peak structure, which has been interpreted as "ejecta" plasma (Klein & Burlaga, 1982;Xu & Borovsky, 2004). However, it is not a persistent feature and, as discussed in the SI, the DQQ plot would effectively distinguish it should contribute significantly to the distribution of the year-long data samples.

Conclusions
We have used data quantile-quantile (DQQ) plots to investigate how the difference between and across solar cycles is manifested in the statistical distribution of plasma parameters in fast and slow solar wind. As the DQQ method is model independent, we can trace changes to the distribution across its full range with no assumptions regarding its functional form. Should the underlying distribution be multicomponent, the DQQ method also allows us to identify transition points between the regions where each distribution component is statistically dominant.
Applying the DQQ method to observations of the interplanetary magnetic field (IMF) taken in the upstream solar wind by the Wind spacecraft, we are able to identify a two-part functional form for the distribution of the IMF magnitude and its components. The majority of the data falls into the core component; in this region the mean tracks the solar cycle while the variance fluctuates around a constant value, with the same variation seen in both solar cycles and both solar wind states. The largest ∽ 10% of observations form a distinct tail component, which varies in a distinguishable way between the distinct activity levels of the minimum and maximum of each solar cycle and also behaves differently in fast and slow solar wind. We find that the tail region becomes statistically dominant over the core for values above B = 8 nT (|B x | = 8 nT).
The core region of the IMF distribution is not sensitive to the distinct activity levels of the maximum and minimum of each cycle. This is consistent with an underlying in situ process modifying the evolution of small fluctuations in this region. Using the DQQ plot with logarithmic axes, we find an approximately lognormal form for the core region; as the limit distribution of a multiplicative process, this is consistent with turbulence underlying the statistics of fluctuations in this region. However, when we test the lognormality of the core region over the solar cycle, we find that while the lognormal is a more robust description of the data in slow wind than fast across the solar cycle, in both solar wind states the lognormal most closely describes the data at solar minimum. At solar maximum, particularly in fast wind, the distribution exhibits multiple sharp peaks, and only in an interpolated sense follows a lognormal.
The extreme tail of the distribution is not captured by the lognormal model at any solar cycle phase, and excluding it from the data set used to create the lognormal model results in an improved correspondence to the core region at solar minimum. Unlike the turbulent core, the extremal component is sensitive to the differences in solar wind conditions in fast and slow solar wind seen at each distinct solar maximum and minimum. On DQQ plots with linear axes, the tails of the distributions of B and its components can be decomposed into piecewise linear fits. This means the distribution of extreme values tracks solar cycle changes in a manner consistent with an unchanging underlying functional form that simply shifts in its mean and variance. If this behavior is characteristic of all solar cycle changes, then it offers potential predictability of solar wind statistical "climate."