Prediction of Dst during solar minimum using in situ measurements at L5

Geomagnetic storms resulting from high-speed streams can have significant negative impacts on modern infrastructure due to complex interactions between the solar wind and geomagnetic field. One measure of the extent of this effect is the Kyoto $Dst$ index. We present a method to predict $Dst$ from data measured at the Lagrange 5 (L5) point, which allows for forecasts of solar wind development 4.5 days in advance of the stream reaching the Earth. Using the STEREO-B satellite as a proxy, we map data measured near L5 to the near-Earth environment and make a prediction of the $Dst$ from this point using the Temerin-Li $Dst$ model enhanced from the original using a machine learning approach. We evaluate the method accuracy with both traditional point-to-point error measures and an event-based validation approach. The results show that predictions using L5 data outperform a 27-day solar wind persistence model in all validation measures but do not achieve a level similar to an L1 monitor. Offsets in timing and the rapidly-changing development of $B_z$ in comparison to $B_x$ and $B_y$ reduce the accuracy. Predictions of $Dst$ from L5 have an RMSE of $9$ nT, which is double the error of $4$ nT using measurements conducted near the Earth. The most useful application of L5 measurements is shown to be in predicting the minimum $Dst$ for the next four days. This method is being implemented in a real-time forecast setting using STEREO-A as an L5 proxy, and has implications for the usefulness of future L5 missions.


Introduction
The solar wind has myriad effects on the Earth's magnetic field, among them the enhancement of the ring current around the Earth's equator (Gonzalez et al., 1994). Through streams of charged particles, the solar wind injects energy into the ring current and thereby reduces the global magnetic field strength (Daglis et al., 1999). This can have consequences for GPS and satellite communication as well as for flight operations, where crew may be exposed to greater levels of radiation (Schrijver et al., 2015). The extent of the enhancement and resultant reduction in field strength is often given by the disturbance storm time index, Dst. This is an hourly value derived from geomagnetic field variations measured at four observatories (Honolulu, Kakioka, San Juan and Hermanus) below 35° in geomagnetic latitude (Mayaud, 1980). Different levels of geomagnetic activity are often described by the minimum Dst reached, with Dst ≤ −200 nT denoting an extreme geomagnetic storm, Dst ≤ −100 nT an intense storm, and Dst ≤ −50 nT a moderate storm. A threshold of Dst ≤ −30 nT is sometimes used to denote weak geomagnetic storms. The largest negative values of Dst are seen almost exclusively in storms caused by interplanetary coronal mass ejections (ICMEs; Borovsky & Denton, 2006), while moderate storms occur throughout the solar cycle (Richardson et al., 2001; Tsurutani et al., 2006). These more common, milder storms are driven by high-speed streams and stream interaction regions (SIRs; Alves et al., 2006; Jian et al., 2006), and Richardson et al. (2000) showed that high-speed streams lead to 70% of geomagnetic activity during solar minimum. In rare cases SIR-driven storms can also become major geomagnetic events (Richardson et al., 2006), with an expected maximum possible storm strength of Dst ∼ −180 nT.
Although the official Kyoto Dst is derived solely from ground-based field measurements, a very good estimate of the upcoming Dst can be made based on in situ solar wind data from satellites at the Lagrange 1 (L1) point (e.g. most recently the DSCOVR satellite, see Burt & Smith, 2012). This allows for a prediction lead time of 10-50 minutes, with the amount of time determined by solar wind speed between L1 and Earth. The state-of-the-art approach in this respect is the model of Temerin and Li (2006), an empirical technique that achieves a Pearson correlation coefficient between the observed and predicted Dst values of 0.96 for the seven years of data evaluated. This model depends solely on solar wind input, and the variation in Dst at one point depends on both the solar wind at that point in time and past modelled Dst timesteps.
For forecasts beyond a half-hour window, we look to possible future missions to the L5 point, which sits 60° behind the Earth in its orbit and observes corotating solar wind structures roughly 4.5 days before they reach the Earth. A space weather mission at this point to perform in situ solar wind measurements has been discussed many times before (Gopalswamy et al., 2011; Lavraud et al., 2016; Hapgood, 2017), and presents a strong opportunity for accurate forecasts of space weather events with a much enhanced lead time ranging from hours to days. As shown in Thomas et al. (2018), a solar wind monitor at the L5 point can provide very good forecasts of the ambient solar wind variations, of which high-speed streams and SIRs are of primary interest.
In this work we show how predictions of the Dst index at Earth can be made using L5 data and discuss the applicability and accuracy of the method. This is carried out using data from the Solar Terrestrial Relations Observatory Behind (STEREO-B, NASA, see Kaiser, 2005) satellite, which crossed the L5 point in late 2009, as a proxy for a future L5 mission. Data from this satellite is mapped to L1 as if it had been measured there by correcting for both time passed in solar rotation speed and solar wind expansion (Thomas et al., 2018), and a method for predicting the Dst from L1 data is then applied. The accuracy of the Dst forecast is evaluated using a combination of traditional error metrics (e.g. correlation coefficient, mean error) as well as a method considering the prediction of events (e.g. Dst minimum) without comparing the Dst development point-to-point. Past studies have looked at predicting the general solar wind properties (Simunac et al., 2009;Turner & Li, 2011;Kohutova et al., 2016;Temmer et al., 2018;Owens et al., 2019) while this study aims specifically to determine how well the development of Dst and geomagnetic effects can be predicted using L5 data. The results also include a brief analysis of the sensitivity of Dst prediction to offsets in measurements of the magnetic field. This study serves as a verification for methods that are now being implemented in real-time using STEREO-A data as it crosses the L5 point and moves towards the Earth.

Mapping L5 data to L1
Studies using the STEREO satellites (launched in 2006) as proxies for satellites positioned at L5 have been undertaken in the past. Simunac et al. (2009) mapped data from L5 to L1 using a time shift determined by the synodic rotation period of the Sun and showed good agreement between the solar wind speed and density measured at the two locations. Turner and Li (2011) evaluated the correlation between time-shifted measurements ahead in the Parker spiral and L1 data. It was shown that while the correlation in solar wind speed remains high, there is rarely much correlation in magnetic field components. Thomas et al. (2018) continued in this thread but went a step further and carried out a comprehensive analysis of solar wind forecasting skill using data measured near L5.
To map data measured at or near L5 to L1, we use the same approach as described in Thomas et al. (2018) and apply a time shift ($\Delta t_{lon}$) to the data measured at STEREO-B, assuming that the solar wind structures corotate with the Sun at the equatorial rotation period of roughly 27 days. Here we use a synodic rotation period of $T_{syn}$ = 27.27 days as given in Owens et al. (2013). A second adjustment to the time ($\Delta t_{r}$), which corrects for differences in radial distance and solar wind expansion timing, is calculated as follows based on Eq. 1 from Simunac et al. (2009):

$$\Delta t = \Delta t_{lon} + \Delta t_{r} = \frac{\Delta lon}{\Omega_{Sun}} + \frac{r_{L1} - r_{STB}}{v_{sw}}$$

where $r_{L1}$ and $r_{STB}$ are the radial distances of L1 and STEREO-B from the Sun, while $v_{sw}$ is the mean solar wind speed at the time of measurement. ∆lon is the difference in longitude between the Earth/L1 and STEREO-B, and $\Omega_{Sun}$ = 360°/$T_{syn}$ is the solar rotation speed. The total time shift ∆t, which varies with longitudinal distance between the satellite and Earth, is then added to the time of measurements from STEREO-B. Since the difference between the new and original times grows as STEREO-B moves away, the shifted time series is no longer evenly spaced, and the measurements are therefore interpolated back to regular hourly time values.
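The time-shifting step can be illustrated with a minimal sketch. It assumes that the longitudinal term is ∆lon divided by the solar rotation rate and that the radial term is the distance difference divided by the solar wind speed; the function names, the units and the sample values are hypothetical and are not taken from the original analysis code.

```python
import math

T_SYN_DAYS = 27.27        # synodic rotation period quoted in the text
AU_KM = 1.495978707e8     # one astronomical unit in km


def time_shift_hours(dlon_deg, r_l1_au, r_stb_au, v_sw_kms):
    """Total shift (hours) to add to a STEREO-B timestamp.

    dlon_deg is the longitude of Earth/L1 minus that of STEREO-B (degrees),
    so it is positive while the spacecraft trails the Earth.
    """
    omega_deg_per_hour = 360.0 / (T_SYN_DAYS * 24.0)
    dt_lon = dlon_deg / omega_deg_per_hour                    # corotation delay
    dt_r = (r_l1_au - r_stb_au) * AU_KM / v_sw_kms / 3600.0   # radial travel time
    return dt_lon + dt_r


def resample_hourly(times_h, values):
    """Linearly interpolate irregular samples onto whole hours.

    times_h must be strictly increasing and contain at least two samples.
    """
    out_t, out_v = [], []
    t = math.ceil(times_h[0])
    i = 0
    while t <= times_h[-1]:
        # Advance to the interval [times_h[i], times_h[i + 1]] containing t.
        while times_h[i + 1] < t:
            i += 1
        t0, t1 = times_h[i], times_h[i + 1]
        w = (t - t0) / (t1 - t0)
        out_t.append(t)
        out_v.append(values[i] + w * (values[i + 1] - values[i]))
        t += 1
    return out_t, out_v


if __name__ == "__main__":
    # Illustrative values only: spacecraft 60 degrees behind Earth at 1.05 AU.
    print(time_shift_hours(60.0, 0.99, 1.05, 400.0))
```

With these illustrative inputs the shift is about 103 hours: the corotation term contributes roughly 109 hours, and the radial term removes about 6 hours because the spacecraft sits further from the Sun than L1.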
A further adjustment is applied to the solar wind data to account for areal expansion of the solar wind at different radii and in the Parker spiral (Kivelson & Russell, 1995). All variables are multiplied by a correction factor determined by the ratio between the distances of STEREO-B and L1 from the Sun ($r_{STB}/r_{L1}$). The rate of expansion for the density is assumed to behave according to the inverse-square law (Kumar & Rust, 1996), while the magnetic field components scale according to the factors given in Hanneson et al. (2020), which vary between −2 and −1. In the case of STEREO-B, which was at a distance greater than 1 AU, this means that the solar wind was corrected backwards and effectively compressed to L1.
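A minimal sketch of this radial scaling follows, assuming each quantity varies as a power law in heliocentric distance. The exponent of −2 for density follows the inverse-square law stated above; the exponent used for the magnetic field component is an illustrative placeholder, not the value from Hanneson et al. (2020), and the function name is hypothetical.

```python
def scale_to_l1(value, r_stb_au, r_l1_au, exponent):
    """Scale a quantity measured at r_stb_au to r_l1_au.

    Assumes value is proportional to r**exponent, so that
    value_L1 = value_STB * (r_L1 / r_STB)**exponent.
    """
    return value * (r_l1_au / r_stb_au) ** exponent


if __name__ == "__main__":
    # Density: inverse-square law (exponent -2).
    print(scale_to_l1(5.0, 1.05, 1.0, -2))
    # Magnetic field component: placeholder exponent of -1.
    print(scale_to_l1(4.0, 1.05, 1.0, -1))
```

Because STEREO-B was beyond 1 AU, both scaled values come out larger than the measured ones, which corresponds to the compression back to L1 described above.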

Prediction of Dst from L1
There are many different models for predicting the Dst index from solar wind measurements. Earlier models from Burton et al. (1975) and O'Brien and McPherron (2000) achieve a reasonable level of accuracy. When using the OMNI2 data set as input and comparing the results to the true Kyoto Dst values, they have correlation coefficients of 0.76 and 0.84 respectively. One of the most exhaustive L1-to-Dst algorithms is undoubtedly the semi-empirical model developed first in Temerin and Li (2002) for the years 1995-1999, which was later extended to the year 2002 in Temerin and Li (2006). This model has a linear correlation of around 0.95 for Dst in the periods the model was intended for (in good agreement with the original work), although when using it for predictions beyond this period (2002-2019), there is a linear drift away from the real Dst of about −5.3 nT/year due to the inclusion of a time-dependent variable in the model (Temerin & Li, 2015). After applying a simple linear correction for the drift, the correlation for this model in future times is still very good at 0.90 on average. Due to the dependence of the model on local time, the input for data mapped to L1 was the time it was expected to arrive there and not the time it was measured.
The Temerin and Li (2006) method (henceforth called TL2006) makes very good predictions of Dst at the Earth using either data measured at L1 or data mapped to L1 using the aforementioned time-shifting method. A small improvement can be made to the model through application of a machine learning algorithm from the Python package scikit-learn to provide a correction factor for the base drift-corrected TL2006 output. The algorithm is a gradient boosting regressor (GBR), which develops an ensemble of basic regressors to calculate output (correction to Dst) based on the provided input. The input was the same set of variables provided to the TL2006 method (solar wind speed, density, and magnetic field components along with time). Some feature engineering was applied to the input variables to provide more information to the GBR, such as including a solar wind pressure term and the time derivative of $B_z$. Each added variable was evaluated for its effect on the Dst prediction, and those that did not lead to an improvement were removed. The most important addition was the introduction of a "ring-current term" (with both current and past values from the prior 24 hours) based on the method in O'Brien and McPherron (2000). The ring-current term described most of the Dst variation not accounted for by the TL2006 method, and the prediction of Dst from solar wind measurements using the TL2006 method plus this GBR correction value (enhanced method or ETL2006) leads to an improved average linear correlation of 0.95 and reduced RMSE between real and predicted Dst over the time range 2000-2018. In this study we apply the model using the enhancement throughout.

Data
The NASA OMNI2 data set was used to obtain the solar wind measurements in the near-Earth environment and the values of the Kyoto Dst (the real values to which all predictive models are compared). The machine learning algorithm providing the correction to the TL2006 Dst prediction method was trained on the Kyoto Dst data set for 2000-2018.
STEREO-B one-minute resolution PLASTIC (Galvin et al., 2008) and IMPACT (Luhmann et al., 2008) instrument data was used as a proxy for data measured at L5. STEREO-B differs from a true L5 mission in two ways: firstly, it was constantly in motion and moving further away from the Earth in its orbit; and secondly, it was also at a greater distance from the Sun than the Earth ($r_{STB}$ ≈ 1.05 AU). These differences were both accounted for using the approach defined in the methods section. Beacon data, which is low-resolution data sent soon (minutes) after measurement, was used rather than the higher-quality science data that arrives later, in order to simulate the forecasting application of this model in a real-time operation scenario. This will have an effect on the final results, although we would not expect the quality to be greatly degraded by using beacon rather than science data. The data was also downsampled via interpolation to one-hour resolution.
STEREO-B data is given in the reference frame STBHGRTN , a spacecraft-centric reference frame with x pointing from the Sun to the spacecraft with the Sun as the origin, y as the cross-product of the rotational axis with the x-component, and z as the normal to these two pointing out of the ecliptic. While the satellite is still in the geospace environment, the STBHGRTN coordinate system can be transformed to GSE by flipping the x and y directions. For Dst calculation purposes, this is converted to GSM according to the algorithms given in Hapgood (1992). The STBHGRTN frame rotates with the spacecraft as it moves away from the Earth but we assume a rotation of the solar wind with the mapping of data measured at STEREO-B to L1, and therefore can perform the same coordinate transformation to quasi-GSE (as if the measurements had been made in the geospace environment) and then to GSM. All spacecraft positions in this work are given in heliocentric Earth equatorial (HEEQ) longitudes and latitudes, in which the z-axis is parallel to the Sun's rotation axis and the x-axis is the intersection of the solar equator and solar central meridian as seen from Earth.
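The first step of this transformation, flipping the x and y directions, can be written as a short sketch. It covers only the approximate RTN to quasi-GSE step; the subsequent GSE to GSM rotation from Hapgood (1992) depends on time and is not reproduced here. The function name is hypothetical.

```python
def rtn_to_quasi_gse(b_r, b_t, b_n):
    """Approximate RTN -> quasi-GSE conversion by flipping the x and y axes.

    RTN x points from the Sun to the spacecraft, while GSE x points from
    the Earth to the Sun, so the first two components change sign and the
    out-of-ecliptic component is kept as it is.
    """
    return (-b_r, -b_t, b_n)


if __name__ == "__main__":
    # Illustrative field vector in nT.
    print(rtn_to_quasi_gse(3.0, -2.0, 1.5))
```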
Because STEREO-B is always moving and only spent a short amount of time around the actual L5 point at −60°, for the purposes of this study we consider the location of L5 to be −60±10° in longitude in addition to evaluating overall statistics for the range 0 to −110°. Comparisons of results in the next sections will refer to the two data sets as the full data set and the reduced data set. The time range in the full data set covers five years from Feb 2007 until Jan 2012. The time range for the reduced range of angles from STEREO-B near L5 is almost six months from August 2009 until February 2010, which encapsulates almost the entire solar minimum, meaning this is the optimal time range to evaluate L5-based ambient solar wind prediction.
To allow comparison to a simple reference baseline model, a third data set was created as a persistence model, as this has been shown to achieve reasonable accuracy with a solar wind recurrence rate of 27.27 days (Owens et al., 2013). For this, the same OMNI2 data was taken after being shifted into the future by 27.27 days. These models are referred to as OMNI, STB and PERS in short form and in plots throughout the text. In order to calculate Dst reliably, any gaps in the data were linearly interpolated over. ICMEs that occurred at one location or in one data set and not the other were removed from the data using start and end times given in the HELCATS ICMECAT catalogue (Möstl et al., 2017), which covered the period of evaluation for both STEREO-B and measurements near the Earth (in this case with the WIND satellite). This was carried out for all comparison data sets after STEREO-B had been time-shifted, with ICME start and end times being corrected to the new shifted times. In total, three sets of ICMEs were removed from all data sets: those at STEREO-B, those at L1, and those at L1 shifted by 27.27 days for the persistence model. This reduces the size of the total data set by 12%.
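The persistence baseline can be sketched as follows, assuming an hourly input series. Since 27.27 days is not a whole number of hours, the shifted values are linearly interpolated; this interpolation choice and the function name are assumptions for illustration.

```python
def persistence_forecast(hourly_values, shift_days=27.27):
    """Forecast for hour t is the observation at t - shift_days.

    Returns None where no earlier observation exists. Values between
    hourly samples are linearly interpolated.
    """
    shift_h = shift_days * 24.0
    last = len(hourly_values) - 1
    forecast = []
    for t in range(len(hourly_values)):
        src = t - shift_h
        if src < 0:
            forecast.append(None)
            continue
        i = int(src)              # floor, since src >= 0
        frac = src - i
        nxt = hourly_values[min(i + 1, last)]
        forecast.append(hourly_values[i] * (1.0 - frac) + nxt * frac)
    return forecast


if __name__ == "__main__":
    # Short illustrative series with a 1.5-hour shift instead of 27.27 days.
    print(persistence_forecast([0.0, 10.0, 20.0, 30.0], shift_days=0.0625))
```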
An example of solar wind speed, vertical magnetic field and Dst values from the three data sets is plotted in Fig. 1, in which the time range covered by STEREO-B when it was near the L5 point (−60±10°) is shown. The STEREO-B data has already been time-shifted according to the solar rotation speed to Earth, and is plotted against the OMNI data for comparison. The high-speed streams observed at STEREO-B are easy to identify and generally overlap with those in the OMNI data, although in some cases they arrive earlier or later than their counterparts at Earth. Gaps in the data are ICMEs that have been removed. As can be seen in the comparison of Dst, the red line with Dst predicted from the OMNI data very closely matches the actual Kyoto Dst, while in the mapped STEREO-B Dst prediction there are still large variations. The persistence model (grey) performs worse than the STEREO-B approach.
The distribution of values in Dst (both observed and predicted) according to the different models is plotted in Fig. 2. The OMNI, Kyoto and PERS data sets have very similar distributions, but the STB Dst values have a slightly higher peak close to zero and fewer values in the positive region. See Sec. 6 for a list of all data sources.

Results
In this section we evaluate the quality of the Dst predictions made from data measured near the L5 point. First, the predictions are assessed using standard error metrics, and then the models are compared using an event-based approach that additionally considers the forecasting of events within 24-hour windows.

Accuracy of prediction
We first evaluate the accuracy of a prediction using data from the L5 point using standard metrics such as the Pearson correlation coefficient (PCC), mean absolute error (MAE), mean error (ME) and root-mean-square error (RMSE) of the predicted values subtracted from the observed. The results for each data set are listed in Table 1.
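For reference, the four metrics can be computed as in the following sketch, which takes the error as observed minus predicted as stated above. The function name and the sample values are illustrative only.

```python
import math


def error_metrics(observed, predicted):
    """Return PCC, MAE, ME and RMSE for two equal-length series."""
    n = len(observed)
    diffs = [o - p for o, p in zip(observed, predicted)]
    me = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    pcc = cov / math.sqrt(var_o * var_p)   # undefined for a constant series

    return {"PCC": pcc, "MAE": mae, "ME": me, "RMSE": rmse}


if __name__ == "__main__":
    kyoto = [-10.0, -20.0, -30.0, -40.0]    # illustrative Dst values in nT
    model = [-12.0, -18.0, -33.0, -37.0]
    print(error_metrics(kyoto, model))
```

A mean error near zero alongside a non-zero MAE, as in this toy example, indicates errors that cancel on average rather than a systematic offset.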
Predictions from the OMNI data set consistently achieve a very good level of accuracy compared to the real Kyoto Dst, which is to be expected given the good behaviour of the ETL2006 model in general, whereas predictions using STEREO-B data are considerably worse. When comparing the STEREO-B/L5 predictions to the persistence model, we see that the prediction using STEREO-B data is better in all measures. Errors in both these models are usually twice as large as those from the OMNI model. In Fig. 3, the PCC and RMSE are both plotted as a function of longitudinal difference ∆lon between Earth and STEREO-B. Each point was calculated for the data measured within ±3.5° of the given longitude, which led to time ranges of two to seven months being evaluated because STEREO-B was not moving away from Earth at a constant rate in longitude. As can be seen in the figure, predictions of Dst using OMNI data fairly consistently achieve a PCC of 0.90 and an RMSE of roughly 5 nT, while the accuracy of the STEREO-B prediction degrades with increasing distance from the Earth, as would be expected. Note that some features of the plot seem peculiar at first glance but can be easily explained. The sudden rise in PCC beyond the ∆lon of −100° is

In Fig. 3a, the curves show fits to the PCC values with a decaying function of the form $ae^{-bx}$, and we see that at a ∆lon of −110°, the predictions from STEREO-B data seem to have dropped to roughly the same accuracy as the persistence model. The PCC here starts at 0.80 close to the Earth and drops to 0 at around ∆lon = −110°. Fig. 4 explains this quick drop-off, showing the correlations between solar wind variables in OMNI and STEREO-B with increasing ∆lon, both as hourly values and as means over a window of 4 hours. The solar wind speed remains highly correlated, which is to be expected due to the rotation of coronal holes and the relatively slow temporal development of high-speed streams.
The correlation for total magnetic field and the x- and y-components also remains good throughout, but the z-component drops off quickly and the correlation has already nearly reached 0 at ∆lon = −10°. Due to the strong dependence of Dst on $B_z$ (see e.g. Gonzalez & Tsurutani, 1987), it is not surprising that we also lose accuracy in Dst prediction fairly quickly. What is useful to note from Fig. 4, however, is that the correlation in mean($B_z$) and the other components remains good beyond the point at which the correlation of hourly values has dropped significantly.
In Fig. 3, the results from the persistence model are also included in grey, and we see that there is a similar downward slope and decreasing correlation over time even though we would expect this to stay constant. This suggests that, for this time period at least, there may have been other effects in the solar wind leading to a reduction in accuracy from the prediction due to more rapid changes in the solar wind structures and high-speed streams. This is likely explained by the increase in solar activity as the Sun entered the rising phase of the solar cycle in 2011/2012. We would observe greater numbers of short-duration coronal holes closer to solar maximum in addition to the more regular coronal holes and high-speed streams that dominate the variations otherwise. Although time does not increase linearly from point to point in Fig. 3, on the whole it spans five years and we can look at the general development of PCC and RMSE over time. Both show a gradual decrease in accuracy as solar activity increases, which is most important to note for the persistence model that serves as a benchmark for the others. When taking this into account, a prediction using STEREO-B data at greater values of ∆lon may also be a reasonable approach that is simply not represented well here due to the more active period evaluated.
At the L5 point, the PCC for the STEREO-B model is around 0.30. Unfortunately, looking at the points alone we see that the approach performed particularly badly at precisely this angle, although from the overall statistics we can deduce that this was only a short-duration dissonance that had little to do with the spacecraft position itself. This may instead be a result of latitudinal difference between the two satellites, which we look at in the following.
The dynamics of the outflowing solar wind in the heliospheric current sheet have a strong latitudinal dependence, with differences of a few degrees resulting in large variations in the shape and timing of the arriving solar wind structures. In Simunac et al. (2009) and Thomas et al. (2018), for example, differences in latitude between two spacecraft were shown to have a notable effect on forecasting skill. To quantify this effect in this study, we look at the accuracy metrics with the longitudinal dependence removed, plotted against the absolute difference in latitude between the two measuring points, STEREO-B and Earth. See Fig. 5 for a depiction of this approach. This was achieved by subtracting the slope of the longitudinal dependence for STEREO-B (centre plot, light blue line) along with the slope in the OMNI values to account for unrelated changes over time. Interestingly, few of the metrics show any correlation with a difference in latitude, the exceptions being the MAE and RMSE, which correlate mildly with ∆lat with a PCC of −0.35 (increasing accuracy with increasing ∆lat, a confusing result). All other metrics have correlations < 0.15, such as the PCC shown as an example in the figure (centre and right). If the range of ∆lon is reduced to 0 to −40° (in which the most periodic rotational behaviour was observed), a more predictable pattern of behaviour emerges. The correlation values for PCC, RMSE and MAE with ∆lat are −0.21, 0.24 and 0.15 respectively, suggesting small dependencies on ∆lat leading to a decrease in accuracy. Since an analysis of the whole range did not show the same pattern, we can deduce that discrepancies caused by the increase in solar activity later in the cycle far outweigh the effects of latitudinal difference with regard to accuracy.
As a sanity check, the same metrics with and without ICMEs were also compared, and the results were as expected. Predictions of Dst using OMNI data were equally good with and without ICMEs, but there was a small improvement in the metrics for data predicted using the STEREO-B data and the persistence model because the transient events unrelated to their measurements had been removed. An analysis of errors with the input solar wind data split into high-speed streams and slow solar wind was also carried out, with the results showing that the errors in Dst calculated for fast solar wind (above a threshold 1.25 times the median speed or ∼ 480 km/s) are slightly (∼ 20-30 %) larger than those for slow solar wind. In a further test, we also repeated the evaluation without scaling the magnetic field components to correct for different orbital distances to see if this step was necessary. Not scaling the solar wind input led to a very small increase in the errors (e.g. an RMSE of 12.25 nT increasing to 12.83 nT), showing that the scaling of magnetic field to correct for a distance of 0.1 AU (at maximum) does have an effect on the accuracy of the forecast, although it is minimal.

Event-based analysis
An interesting alternative to standard point-to-point measures that quantitatively assess the magnitude of the forecast error at each time step is to consider each time step as an event/non-event. The primary advantages of an event-based validation analysis are as follows. Firstly, periods of weak, moderate and enhanced geomagnetic activity are weighted equally by simple point-to-point comparison measures. However, end-users of operational real-time space weather models are usually more interested in accurately predicting times of enhanced Dst values, while the exact evolution in time of the Dst is in most cases of secondary importance. Secondly, outliers in the predicted times can have a significant influence on the error measures and correlation coefficients determined. In context, an efficient approach is to label each time step in the predicted and observed Dst timeline as an event/non-event (Owens, 2018). An example of this approach applied to high-speed streams can be found in Reiss et al. (2016), in which the OSEA software used for this analysis was also used.
In this study, we define an "event" as any time step the Dst exceeded (or went below) a certain threshold. In this case we defined −35 nT as the threshold for a weak geomagnetic storm. The value of −50 nT, usually the threshold for a moderate storm, was also seen in the period evaluated, but did not occur often enough to allow for a reasonable statistical analysis. Similarly, Dst values beyond −100 nT arose so infrequently that it was not worth considering them here, and we restrict ourselves to looking at all events below a level of −35 nT as a proxy for "mild geomagnetic activity". Using the corresponding threshold values, we label each time step as an event or non-event in the measured and predicted time series, and for the case of Dst everything below the threshold was an event. By cross-checking the events/non-events between true and predicted Dst, we count the number of hits (true positives; TPs), false alarms (false positives; FPs), misses (false negatives, FNs) and correct rejections (true negatives; TNs) and summarise them in the so-called contingency table. From the entries of the contingency table, we can compute different skill measures, including the True Positive Rate TPR = TP/(TP + FN) and the False Positive Rate FPR = FP/(FP + TN). While the TPR is the proportion of correctly predicted events among all the events, the FPR is the proportion of non-events wrongly predicted as events.
Moreover, we compute the Threat Score TS = TP/(TP+FP+FN) as a measure of the model performance, the Bias B = (TP+FP)/(TP+FN) indicating whether the number of observations is underforecast (B < 1) or overforecast (B > 1), and the True Skill Statistic TSS = TPR − FPR as a measure of the overall model performance. The TSS is defined in the range [−1, 1], where a perfect prediction would be equal to 1 (or −1 for a perfect inverse prediction), and a TSS equal to 0 indicates no predictive ability of the forecast model. It is important to note that the TSS is unbiased by the proportion of events to non-events (Hanssen & Kuipers, 1965; Bloomfield et al., 2012). Since the number of non-events exceeds the number of events by 10 to 1, the TSS is very well suited for the validation analysis conducted in this study. For a more thorough discussion of the skill measures applied here, we refer the interested reader to Jolliffe and Stephenson (2003). Table 2 shows the contingency table entries and the skill measures computed over the full and reduced time ranges. We find that the prediction using OMNI/L1 data has a very high level of accuracy in all measures, and in comparison the prediction from STEREO-B/L5 is only somewhat better than a persistence model. Both STB and PERS models show a tendency to underforecast when considering the full data set.
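The contingency table and the skill measures defined above can be computed as in the following sketch. It labels every time step below the −35 nT threshold as an event; the function names are hypothetical, and returning NaN for undefined ratios is a choice made here rather than part of the original analysis (which used the OSEA software).

```python
def contingency(observed, predicted, threshold=-35.0):
    """Count hits, false alarms, misses and correct rejections."""
    tp = fp = fn = tn = 0
    for o, p in zip(observed, predicted):
        obs_event = o < threshold
        pred_event = p < threshold
        if obs_event and pred_event:
            tp += 1
        elif pred_event:
            fp += 1
        elif obs_event:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(num, den):
    # Undefined ratios (e.g. no observed events) are reported as NaN.
    return num / den if den else float("nan")


def skill_measures(tp, fp, fn, tn):
    """Return TPR, FPR, TS, Bias and TSS from contingency table entries."""
    tpr = _ratio(tp, tp + fn)
    fpr = _ratio(fp, fp + tn)
    return {
        "TPR": tpr,
        "FPR": fpr,
        "TS": _ratio(tp, tp + fp + fn),
        "Bias": _ratio(tp + fp, tp + fn),
        "TSS": tpr - fpr,
    }


if __name__ == "__main__":
    kyoto = [-40.0, -40.0, -10.0, -10.0, -50.0, -5.0]   # illustrative values
    model = [-36.0, -20.0, -38.0, -10.0, -60.0, -1.0]
    print(skill_measures(*contingency(kyoto, model)))
```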
Interestingly, we see that for the time STEREO-B was around L5 (in the reduced data set), it managed to forecast 0 of the events below the threshold, while the persistence model also forecast 0 but achieved an impressive number of FPs. It is hard to carry out statistics for this period because it was an extremely quiet time, with only 6 events below the −35 nT threshold over six months. To look at a less time-sensitive approach, we consider an additional forecasting method.
The bottom rows of Table 2 show the event-based analysis for the prediction of the minimum Dst in the next 24 hours (with a 12-hour resolution), and in this way we consider the predictions while ignoring possible errors in timing of ±12 hours. (For the OMNI model, with a maximum forecast time of 30-60 minutes, this is obviously a pointless measure, but it is left in for comparison.) Here it becomes clear that the L5 monitor outperforms the persistence model while also achieving a higher TSS score than when simply applied to all data. In this case, the ratio of FPs to TPs is also greatly reduced from around 6 to 2.
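One way to build such a series is sketched below: the minimum over each 24-hour window, with the window start advanced in 12-hour steps. This reading of "12-hour resolution" and the function name are assumptions. The resulting minima from the observed and predicted series can then be passed through the same event labelling as the hourly values.

```python
def window_minima(hourly_dst, window=24, step=12):
    """Minimum Dst in each `window`-hour span, starting every `step` hours.

    Only complete windows are returned.
    """
    last_start = len(hourly_dst) - window
    return [min(hourly_dst[s:s + window]) for s in range(0, last_start + 1, step)]


if __name__ == "__main__":
    # Two quiet days with a single dip to -50 nT at hour 30 (illustrative).
    dst = [0.0] * 48
    dst[30] = -50.0
    print(window_minima(dst))
```

In this toy series the dip at hour 30 appears in both windows that contain it, so a prediction that places the dip up to 12 hours early or late can still register as a hit.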
A straightforward approach to illustrate the trade-off between the proportion of correctly predicted events (TPR) and the proportion of erroneously predicted events (FPR) for different event thresholds is the so-called receiver operating characteristic (ROC) curve. ROC curves are a helpful diagnostic to compare and quantitatively assess the predictive skill of forecast models. They illustrate how the number of correctly classified events varies with the number of incorrectly classified non-events for each model investigated here. In other words, they show the trade-off between the completeness of events and the contamination with non-events. To summarise the predictive abilities of the different forecast models in a single variable, we also compute the area under the curve (AUC), defined between 0 and 1, where the best results are equal to 1. Fig. 6 shows the ROC curves for all models using the full data sets. At a glance it is clear that the OMNI model far outperforms the models using STEREO-B and PERS data, and the STEREO-B model is almost always somewhat better than persistence.
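A minimal sketch of an ROC curve and its AUC follows. It uses the standard construction in which the observed events are fixed by one threshold and the decision threshold applied to the predicted Dst is swept over all predicted values; whether the curves in Fig. 6 were built in exactly this way is an assumption, and the function names are hypothetical.

```python
def roc_points(observed, predicted, obs_threshold=-35.0):
    """Return (FPR, TPR) pairs for decision thresholds swept over predictions.

    Requires at least one observed event and one observed non-event.
    """
    labels = [o < obs_threshold for o in observed]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Start at the most negative threshold so both rates grow monotonically.
    for thr in sorted(set(predicted)):
        tp = sum(1 for lab, p in zip(labels, predicted) if lab and p <= thr)
        fp = sum(1 for lab, p in zip(labels, predicted) if not lab and p <= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    kyoto = [-40.0, -50.0, -10.0, -5.0]     # illustrative values in nT
    model = [-38.0, -45.0, -12.0, -36.0]
    print(auc(roc_points(kyoto, model)))
```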

Sensitivity of Dst model to field measurements
Here we also include a brief evaluation of the sensitivity of the ETL2006 prediction model to magnetic field measurements and the effect of possible offsets in the measurements. To achieve this, the ETL2006 model was used to predict Dst from two years of OMNI data, so that model sensitivity could be assessed independently of any other factors. Fig. 7 shows the dependence of predicted Dst on offsets in magnetic field measurements. An error of ±1 nT in the $B_z$ component leads to an error of +5/−6 nT in the predicted Dst, whereas offsets in all other components have almost no effect. This is useful to know for future L1 and L5 missions, which rely on in-flight calibration methods such as that described in Plaschke (2019), as it provides an estimate of the error in the Dst prediction if the error in the magnetic field measurements is known.
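The procedure behind such a sensitivity test can be sketched generically: add a constant offset to one field component, rerun the model, and average the change in the output. The model below is a deliberately simple, hypothetical stand-in and is not ETL2006. Its coefficient and functional form are assumptions chosen only to make the example run, so its numbers carry no physical meaning.

```python
def toy_dst_model(bx, by, bz, speed):
    """HYPOTHETICAL stand-in (not ETL2006): only southward Bz (bz < 0) lowers Dst."""
    return -5.0 * max(-bz, 0.0) * (speed / 400.0)


def offset_sensitivity(model, samples, component, offset):
    """Mean change in model output when `offset` (nT) is added to one field component."""
    deltas = []
    for sample in samples:
        shifted = dict(sample)
        shifted[component] += offset
        deltas.append(model(**shifted) - model(**sample))
    return sum(deltas) / len(deltas)


# Assumed inputs: fixed Bx, By and speed, with Bz spanning southward to northward.
samples = [{"bx": 2.0, "by": -1.0, "bz": bz, "speed": 400.0}
           for bz in (-4.0, -2.0, 0.0, 2.0, 4.0)]

for component in ("bx", "by", "bz"):
    for offset in (1.0, -1.0):
        change = offset_sensitivity(toy_dst_model, samples, component, offset)
        print(f"{component} offset {offset:+.0f} nT -> mean Dst change {change:+.1f} nT")
```

Passing the real prediction model and real solar wind samples in place of the toy ones would give the kind of offset dependence shown in Fig. 7. Even in the toy case the response to positive and negative $B_z$ offsets is asymmetric, because only southward field contributes.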

Discussion and conclusions
In order to evaluate the possibility of forecasting the Dst index from the L5 point with a lead time of a few days, we have looked at in situ solar wind data measured by STEREO-B as it crossed the L5 point. This was mapped to L1 in time and space according to the solar rotation speed and the expansion of the solar wind and high-speed streams. The mapped data was used to make a prediction of Dst, and the results were then compared to the Kyoto Dst as well as to Dst predicted from an L1 monitor and a 27-day persistence model. This method is useful for predicting geomagnetic effects from high-speed streams and SIRs, but cannot provide forecasts for transient events such as ICMEs. The results can be summarised as follows:
• A prediction of the Dst index from data measured at L5 does not achieve a level of accuracy similar to predictions made using L1 data, but it performs better than a 27-day solar wind persistence model in all standard measures. Geomagnetic effects in particular are hard to predict due to the rapidly-changing development of $B_z$.
• The error in the prediction can be quantified using the MAE, which at the L5 point in the STB model has an average value of 8 nT, double the error of 4 nT from L1. Offsets in $B_z$ measurements of ±1 nT can cause an offset error in the predicted Dst of +5/−6 nT.
• Because magnetic field variations correlate much more strongly when mean values are compared, predicting Dst from means taken over multiple hours would likely be a strong forecasting method, and this should be considered for operational purposes.
• L5 data is reasonably useful for predicting Dst minima in the next 24 hours and performs better than a persistence model.
• A strong dependence of the accuracy of the predicted Dst on latitudinal difference between the L5 proxy and L1 measurements could not be determined.
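As a rough illustration of the time mapping from L5 to L1 summarised above, the corotation lead time can be estimated from the longitudinal separation and the solar rotation period. The sketch below assumes a synodic rotation period of about 27.27 days and ignores the corrections for radial distance and solar wind expansion that the full mapping includes. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Approximate synodic (Earth-viewed) solar rotation period in days (assumed value).
SYNODIC_ROTATION_DAYS = 27.27


def corotation_lead_time(degrees_behind_earth, rotation_days=SYNODIC_ROTATION_DAYS):
    """Time for a corotating stream to sweep from the spacecraft longitude to Earth."""
    if degrees_behind_earth < 0:
        raise ValueError("pass the separation as a non-negative angle in degrees")
    return timedelta(days=degrees_behind_earth / 360.0 * rotation_days)


def shift_to_earth(timestamps, degrees_behind_earth):
    """Shift measurement times forward by the corotation lead time."""
    lead = corotation_lead_time(degrees_behind_earth)
    return [t + lead for t in timestamps]


# L5 trails the Earth by 60 degrees in longitude.
lead_l5 = corotation_lead_time(60.0)
print("lead time from L5 in days:", lead_l5.total_seconds() / 86400.0)

measured = [datetime(2009, 1, 1, 0), datetime(2009, 1, 1, 1)]  # arbitrary example times
print(shift_to_earth(measured, 60.0))
```

For a separation of 60° this gives a lead time of roughly 4.5 days, and the lead time grows linearly with the longitudinal separation of the spacecraft from the Earth.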
Unsurprisingly, it is difficult to quantify the geo-effectiveness of a high-speed stream in advance using data from somewhere as far-removed as L5. Geomagnetic effects depend strongly on $B_z$ (Gonzalez & Tsurutani, 1987), and $B_z$ is poorly predicted from L5 due to the rapid changes in the magnetic field (as discussed in detail in Thomas et al., 2018). This was compounded by the fact that the coherence of high-speed streams when STEREO-B was closest to the L5 point was uncharacteristically low, and so most of the results have been based on general trends in forecasting capabilities across a much wider range of longitudes from the Earth, from 0° to −110°, at which point the error appeared to match that seen in the persistence model. As described in Verbanac et al. (2011), Dst also has a strong correlation with the solar wind speed, which shows that $B_z$ is not the only input to the model providing valuable information on how the Dst develops. A further effect that may lead to reduced accuracy is stream-stream or stream-CME interaction. In some cases, a CME bracketed in an SIR can give rise to enhanced geo-effectiveness in the stream (Chen et al., 2019). This would have a particularly strong effect if such a case were to occur at L5, drastically changing the nature of the predicted high-speed stream that would later arrive at Earth. Regardless, we have shown that predictions of Dst from data at L5 outperform a persistence model, especially when looking at predictions of Dst minimum in the next 24 hours or days.
This work was carried out to validate application of the methods in an operational setting, and as such STEREO beacon data, which arrived in near real-time, was used. Beacon data is, however, of considerably lower quality than the science data, which arrives much later. Due to developments in on-board processing since the STEREO launch in 2006, real-time data to be expected from upcoming space weather satellites should be of better quality than the original beacon data and may be closer in quality to the science data. For comparison, the same parameters as discussed above were evaluated for STEREO Level 2 scientific data, and an increase in accuracy was observed. This was on a small scale (a correlation of 0.43 instead of 0.41, for example), but shows that the data from newer satellites with more advanced real-time data transfer should achieve a slightly higher accuracy for Dst predictions than presented here.
At the moment, STEREO-A is slowly approaching the Earth in its orbit, meaning that it can function as a proxy for real-time L5 forecasts for both the solar wind variables and Dst. This study has served as a verification for a real-time model currently running using STEREO-A data, which at the time of writing is at a longitude of −80°. For this forecast, we can assume an error in the predicted Dst of ±8 nT, taken from the average MAE at the L5 point reached by STEREO-B. The knowledge gathered throughout this real-time forecasting using STEREO-A will be invaluable in preparation for setting up effective predictive methods relying on a real future L5 space weather mission.