Ensemble CME Modeling Constrained by Heliospheric Imager Observations

Predicting the arrival of coronal mass ejections (CMEs) is one key objective of space weather forecasting. In operational space weather forecasting, solar wind numerical models are used for this task and ensemble techniques are being increasingly explored as a means to improve these forecasts. Currently, these forecasts are not constrained by the available in situ and remote sensing observations, such as those from the heliospheric imagers (HIs) on the National Aeronautics and Space Administration's (NASA's) STEREO spacecraft, which record white‐light images of solar wind and CMEs. We report case studies of four CMEs and show how HI observations can be used to improve the skill and reduce the uncertainty of ensemble hindcasts of these events. Using a computationally efficient solar wind model, we produce 200‐member ensemble hindcasts, perturbing the modeled CME parameters within uniform distributions about the best estimates. By comparing the trajectory of the modeled CME flanks with HI observations, we compute a weight for each ensemble member. Weighting the ensemble distribution of CME arrival times improves the skill and reduces the hindcast uncertainty of each event. For these four events, the weighted ensembles show a mean reduction in arrival time error of 20.1 ± 4.1%, and a mean reduction in arrival time uncertainty of 15.0 ± 7.2%, relative to the unweighted ensembles. This technique could be applied in operational space weather forecasting, if real‐time HI observations were available. Therefore, as NASA and the European Space Agency are currently planning the next space weather monitoring missions, our proof‐of‐concept study provides some evidence of the potential value of including HIs on these missions.


Introduction
Coronal mass ejections (CMEs) are vast eruptions of magnetized plasma from the Sun's corona. At Earth, CMEs are the main driver of space weather, which energizes Earth's space environment and disrupts critical services provided by spacecraft, power grids, and aircraft (Cannon et al., 2013;Hapgood, 2011). Consequently, understanding the propagation of CMEs through the solar wind, and being able to estimate their expected arrival at Earth, are key research questions and objectives for space weather forecasting centers. Despite recent progress, the evolution of CMEs through the solar wind and heliosphere is still not well understood, due to historically sparse heliospheric observations and open questions regarding CME structure (Luhmann et al., 2020). Owens, Lockwood, and Barnard (2020) showed that CME arrival time forecasts are valuable for a range of hypothetical operational settings. They also demonstrated that additional value can be added to a CME forecast by including information about a CME's speed and magnetic field strength.
This highlights that although CME arrival time estimates are a necessary element of CME forecasting, they are not, and should not be, the sole focus.
Since the launch of the National Aeronautics and Space Administration's (NASA's) STEREO mission in 2007 (Kaiser et al., 2008), we have been able to routinely observe the propagation of CMEs from the solar corona, out through the inner heliosphere to Earth orbit, using the white-light coronagraph and heliospheric imager (HI) instruments in the Sun-Earth-Connection-Heliospheric-Investigation instrument package (Howard et al., 2008). Before this, CMEs could only be routinely observed close to the Sun (typically within 30 solar radii) in the white-light Large Angle and Spectrometric Coronagraph (LASCO) on NASA's SOHO mission (Brueckner et al., 1995), and measured with in situ solar wind plasma monitors, typically in near-Earth space.
Observations reveal that CMEs undergo many different dynamical processes during their propagation through the corona and heliosphere. Coronagraphs show that CMEs are accelerated and can be deflected and deformed by structures in the solar corona (Jones et al., 2017). This is expected to be due largely to gradients in magnetic pressure between features like coronal holes, active regions, and the heliospheric current sheet (Kay et al., 2015). Furthermore, interactions with the solar wind are important in determining a CME's evolution. Fast and slow solar wind streams can accelerate or decelerate a CME's speed (Tucker-Hood et al., 2015), and can also lead to distortions of a CME's structure if they flow through the boundary between fast and slow solar wind streams Savani et al., 2010). Consequently, forecasting the arrival of CMEs at Earth requires a good knowledge of the ambient solar wind conditions.
The current state-of-the-art for forecasting CME arrivals at Earth uses 3-D magnetohydrodynamic (3-D MHD) models of the heliosphere (Merkin et al., 2016;Odstrcil, 2003;Riley et al., 2001;Tóth et al., 2005). However, the Riley et al. (2018) review of CME forecasting techniques (many based on 3-D MHD models) concluded that the mean absolute arrival time error is ±10 hr (20% to 50% of the expected Sun-Earth transit time) and, more interestingly, that the accuracy of arrival time forecasts had not measurably improved since 2012. However, this significant uncertainty and stalled progress is maybe not surprising.
These models, such as ENLIL (Odstrcil, 2003) (used by the UK Met Office and NOAA's Space Weather Prediction Center (SWPC)), require an inner boundary condition only for determining the state of the solar wind. However, these inner boundary conditions cannot be observed directly, and must be estimated from observations of the Sun's photosphere and extrapolated using models of the Sun's corona; heliospheric models are currently driven by the output of coronal models that are empirically tuned to match in situ observations near Earth. Furthermore, currently for forecasting, CMEs are typically introduced to these 3-D MHD models as time-dependent perturbations of the solar wind speed and plasma parameters at the model's inner boundary (so-called "cone model" CMEs), with coronagraphs used to estimate the CME's initial state (Odstrcil, 2003). Both of these procedures have significant uncertainties that are difficult to estimate. Therefore, initial condition ensembles are being increasingly investigated as a way to estimate the uncertainty in 3D MHD simulations of CME propagation (Mays et al., 2015;Murray, 2018;Pizzo et al., 2015). Furthermore, ensemble modeling allows a probabilistic interpretation of the forecast, which can be more valuable than a single deterministic forecast (Owens & Riley, 2017a). However, 3D MHD simulations are sufficiently computationally expensive that it is challenging to appropriately sample the relevant parameter space of CME and solar wind properties, particularly for a large number of events or for multiple interacting CMEs.
Data assimilation (DA), widely used in meteorology, is another promising tool for improving space weather forecasts from solar wind numerical models (Lang & Owens, 2019). DA provides a framework to optimally combine models with observations to provide a best estimate of the state of a dynamical system. However, despite the current and past availability of HI observations throughout the domain of the heliospheric models, difficulties in interpreting HI data in terms of the model state have meant that it has not been possible to use HI and DA techniques to provide a better representation of solar wind CME interactions (Lang et al., 2017).
Alternatively, in recent publications, there is growing interest in using HI observations to weight ensemble members as a means of providing improved estimates of solar wind-CME interactions and CME arrival times at Earth (Harrison et al., 2017;Murray, 2018;Wharton et al., 2019). Here we report case studies of four CMEs that highlight the potential utility of such an approach, enabled through computationally efficient solar wind modeling. Using the HUXt solar wind model (Owens, Lang, et al., 2020;Riley & Lionello, 2011), we consider the evolution of the CMEs launched on 31 August, 28 September, 10 October, and 20 November in 2012. With HUXt, we produce an ensemble hindcast of these events and demonstrate that HI observations of these CMEs can be used to appropriately weight the ensemble members. This leads to improved CME arrival time estimates and reduced ensemble spread, and thus improved hindcast performance. The method presented here could be readily integrated into an existing forecasting system, assuming real-time availability of HI data, and so might be a valuable tool for operational space weather forecasting centers.

CME Events
Here we study four CMEs that were previously analyzed by Barnard et al. (2017). For each event, NOAA's SWPC produced a forecast of the CME using the standard WSA-ENLIL Cone modeling system. Using the available coronagraph data, SWPC calculated estimates of the required cone CME parameters, specifically the CME source longitude and latitude, width, speed, and the time at a height of 21.5r ⊙ ; these are given in Table 1. Verification of these forecasts using in situ solar wind plasma observations from the ACE spacecraft at the L1 point subsequently established their arrival times. Over this period, the STEREO-A and STEREO-B spacecraft were at HEEQ longitudes of ∼125 • and 240 • , and so were separated from Earth by ∼125 • and 120 • , respectively. In the following case studies, we use the same cone CME parameters as used by SWPC, except for adjusting the initiation time, to account for the fact the inner boundary of HUXt is at 30r ⊙ , rather than 21.5r ⊙ . For the rest of the article, these events will be referred to as CME-1, CME-2, CME-3, and CME-4.

Solar Stormwatch CME Tracking
As part of the study of Barnard et al. (2017), these CMEs were tracked by the Solar Stormwatch (SSW) project. Many citizen scientists tracked the events through the HI1 field of view (FOV) on both the STEREO-A and STEREO-B spacecraft (referred to as HI1A and HI1B, respectively). The HI1 instruments are white-light imagers that observe sunlight Thomson scattered from solar wind electrons (Howard et al., 2008). Figure 1 shows examples of running differenced images for HI1A and HI1B, which highlight transient features such as CMEs. The HI1 FOV spans 20 • and is centered at 14 • in the ecliptic plane. The nominal cadence of HI1 science images is 40 min with a binned pixel size of 70 arcsec. The location of features in the HI FOV will be discussed in terms of Helioprojective-Radial-Coordinates: position angle (PA), the anticlockwise angle from solar north, and elongation ( ), the angular distance from Sun center. Barnard et al. (2017) developed a procedure to produce a statistical consensus estimate of the CME front location for each HI1A and HI1B image, which allows the time-elongation profile of the CME front to be computed at any position angle spanned by the CME. Examples of these consensus profiles are presented in Figure 1. They also demonstrated that the resulting time-elongation profiles had better uncertainty estimates and more stable feature tracking than those obtained through the more commonly used J-map technique ). Here we use the SSW profiling of these events to extract the time-elongation profile of the CME flanks along the latitudinal plane corresponding to the mean HEEQ latitude of Earth during each event. Here we define the CME flank as the maximum elongation of a CME front viewed from a particular vantage point, along a specified position angle direction. , "best estimate") run of CME-1 at 2012-09-01T08:46. The orange line marks the boundary of the CME region. The shaded red and pink regions mark the HI1A and HI1B fields-of-view. The red-cross and pink-diamond mark the flanks of the modeled CME from the HI1A and HI1B perspective. (middle) The meridional plane of the same CME differenced imaged by HI1A, at the closest time to the model snapshot. The solid-red line marks the SSW consensus profile of the CME front, while the dashed lines mark the uncertainty in this profile. The dashed-green lines mark a 4 • position angle band around the solar equatorial plane. The red cross marks the elongation of the CME front in the equatorial plane. (right) HI1B differenced image of this CME, marked in the same way as the HI1A image.

HUXt
HUXt (Owens, Lang, et al., 2020) is a numerical model of the solar wind that uses a reduced physics approach, treating the solar wind as a 1-D incompressible hydrodynamic flow. This allows very efficient computational solutions, being ∼1,000 times faster than comparable 3-D MHD solar wind models. Despite this reduced physics approach, HUXt has been shown to closely emulate full 3-D MHD models; a 40-year validation test of ambient solar wind from HelioMas (Riley et al., 2001) was reproduced to within 7% of the HelioMAS solar wind speeds throughout the entire model domain (Owens, Lang, et al., 2020;Riley & Lionello, 2011). Therefore, HUXt can serve as an effective surrogate in situations where full 3-D MHD simulations are too computationally expensive. In particular, HUXt ensembles of model solutions that more completely explore the full range of uncertainties in the initial conditions. HUXt only requires the solar wind speed at the model inner boundary to be specified and so can work with the output of any model that can provide this. Here we use output from the MAS coronal model (Riley et al., 2001), but it can also operate with output from, for example, the Wang-Sheely-Arge model (Arge & Pizzo, 2000). In this work, HUXt is run in the latitudinal plane corresponding to Earth's HEEQ latitude at the initialization time of each cone CME.  Within HUXt, CMEs are parameterized as cone CME perturbations to the solar wind speed. Spatially, the cone CME consists of two hemispheres connected by a cylinder; at one extreme a CME is spherical, while more generally it is sausage shaped. The axis of the cone CME is directed radially and located by the CME's source longitude and latitude. The CME's width is used to parameterize the angular extent of the hemispheres, while the "thickness" sets the length of the cylindrical portion. This structure is advected through the model inner boundary at the CME's speed, and anywhere on the boundary within the cone CME domain is assigned the CME speed. This perturbation then propagates hydrodynamically through the model solution. CMEs are traced through the HUXt solution by computing the difference between the ambient and CME solutions, identifying and tracking areas where the speed differences are >20 km s −1 . In this work we must also calculate the flank of the CME from the HI1A and HI1B perspectives. This is achieved by computing the elongation of each point on the CME boundary and finding the point with maximum elongation, for both the HI1A and HI1B field of view.
For each event, we first use HUXt to produce a deterministic hindcast, using the same cone model parameters as was used in the SWPC WSA-ENLIL forecast. The HUXt inner boundary solar wind speeds at 30r ⊙ are set by the MAS coronal model, driven by Carrington maps of the photospheric magnetic field provided by the HMI instrument on NASA's Solar Dynamics Observatory. The MAS solutions are available at this site (https://www.predsci.com/mhdweb/home.php). Thus, we must estimate the time at which the cone CME enters the model domain. We linearly extrapolate the WSA-ENLIL CME initiation time at 21.5r ⊙ , assuming the the CME speed remains constant. An example of the output of HUXt for these hindcasts are shown in Figures 1 to 4. HUXt returns the boundary of the CME region, and from this we compute the maximum elongation of the CME flank in the HI1A and HI1B fields of view. Further to this, we also produce 200-member ensemble hindcasts of each event, where the CME cone model parameters are randomly perturbed. Ideally, we would sample from the empirically determined uncertainty distribution of each parameter. However, these distributions are not accurately known. Therefore, for this proof-of-concept study, we simply assume that the parameters uncertainty distributions are uniform. Each parameter is perturbed by sampling from the random uniform distribution, with the following limits: speed is ±200 km/s; longitude is ±20 • ; latitude is ±20 • ; width is ±20 • ; radial thickness is ±2 r ⊙ ; and initial height is ±3 r ⊙ . We consider these uncertainty limits as at the larger end of plausibility, but they are comparable to those explored in Pizzo et al. (2015). Additionally, the time that the cone CME enters the model domain is also updated to account for the perturbed CME speed and initial height.

Results
From both the deterministic and ensemble HUXt runs, we compute the expected arrival of a CME at Earth, and the time-elongation profile of a CMEs flank that would be observed by HI1A and HI1B, as shown schematically in Figures 1-4. Figure 5 shows the observed time elongation profiles for HI1A and HI1B, as well as the modeled time elongation profiles for both the deterministic and ensemble HUXt runs, for each of CME-1 to CME-4. It should be noted that the comparison between the observed and modeled time elongation profiles is limited by idealized nature of the cone CME paramterization and that we do not forward model the HI observations from the HUXt solutions. However, for the purposes of this proof-of-concept study, we consider the comparison of these observed and modeled time elongation profiles sufficient.
For these four events, there is some agreement between the observed and modeled time elongation profiles. We note that the agreement is typically better for the HI1B SSW observations, rather than HI1A, especially for CME-2, CME-3, and CME-4; these SSW profiles are within the ensemble spread and track closely to the deterministic run and other ensemble members.
This is less the case for the HI1A SSW profiles, which are typically closer to the edge of the ensemble distribution and in some cases outside it, particularly for CME-2 and CME-4. For CME-2 the HI1A SSW profile follows an obviously different trajectory from the deterministic HUXt run and is outside of the spread of all ensemble members. The cause of the disagreement in HI1A is not immediately clear. Both the SSW tracking of the CME in HI1A images, and the representation of the CMEs evolution within HUXt could be factors. It is also possible that the best estimates of the CME parameters were far from the truth, or that the ensemble spread was not appropriate for this event. We investigated this by repeating the ensemble analysis with twice the range in the cone CME parameter. Even with double the range, the SSW HI1A profile for CME-2 remained outside the ensemble spread, while for CME-4 it was at the very edge of the ensemble. However, the CME front is clearly defined in both the HI1A and HI1B differenced images, and the uncertainty in the SSW profiling of this event is modest. Therefore, on balance, we expect that the representation of the CME with the cone model in HUXt is the main factor. In fact, it is plausible that CME front could have been deformed at heliospheric distances below the inner boundary of HUXt at 30r ⊙ (Kay et al., 2015), which might explain these differences. We are experimenting with moving the inner boundary of HUXt to lower altitudes, and the impacts this has will form the basis of a future study.
Furthermore, we note that in all instances the SSW profiles lead HUXt, both in the deterministic form and the majority of the ensemble members. We interpret this as evidence there might be a bias in HUXt, such that CMEs slightly lag the observations. However, it is not obvious what the source of such a bias would be, and could involve many factors in the model's implementation. This too requires further investigation in a future study. But we are nevertheless able to compare weighted and unweighted HUXt ensembles, which are subject to the same biases.
To be able to use the SSW profiles to constrain the ensemble hindcast, it is first necessary to show that it is possible to weight each ensemble member according to its agreement with the SSW time-elongation profile. Here we test using a weight based on the root-mean-square difference between the SSW and ensemble time-elongation profiles. To do this, we linearly interpolate the HUXt time-elongation profile onto the SSW profile, and then compute the root-mean-square difference between the HUXt and SSW profiles. The inverse of the root-mean-square difference is then used to weight each ensemble member. We also compute a HI1A and HI1B combined weight using the mean of the HI1A and HI1B weights. Figure 6 shows scatter plots of the absolute arrival time error for each ensemble member as a function of the weight computed from the HI1A, HI1B, and combined HI1A + HI1B data, respectively.   In most cases, it is clear that members with larger weights typically have smaller arrival time errors. However, it is also apparent that the relationship between ensemble member weight and arrival time error is not linear and there is significant scatter. A detailed understanding of this requires further investigation, but we suggest that this is possibly due to the fact the observed time-elongation profiles of the CME flanks are projections onto a plane of the true CME evolution. Consequently, there is no unique mapping of a CMEs parameters to the observed time elongation profiles, and many profiles are degenerate for different combinations of CME parameters, both modeled and observed. Nonetheless, Figure 6 does show that the computed weights do contain useful information about the expected arrival of the studied CMEs. On this basis, it is reasonable to test if these weights can be used to improve the expected CME arrival times from the ensemble hindcasts.
In the following, we analyze the CME arrival time distributions for the ensemble hindcasts and compare them with weighted arrival time distributions, where each ensemble member is weighted according to their agreement with the SSW time elongation profile, as presented in Figure 6. We note that weighting ensembles with one source of relevant information is expected to lead to a reduction in ensemble spread. Therefore, although we expect that the HUXt ensemble spread will decrease after weighting with the SSW profiles, it is necessary to quantify by how much, and whether the reduced spread remains consistent with the observations. Figure 1 shows that for the deterministic run this CME hits Earth with a glancing blow, with Earth intersecting the western flank. Consequently, when perturbing the cone CME initial conditions, many of the ensemble members subsequently miss Earth. Of the 200 ensemble members, there are 113 hits, and 87 misses. Figure 7 presents histograms of the distribution of the CME arrival time for the HUXt ensemble members that hit Earth. In each panel, the gray histogram shows the distribution of the full ensemble of arrival times. The solid black vertical line shows the observed arrival time, the dashed red line marks the SWPC forecast arrival time, and the dotted green line marks the arrival time of the deterministic HUXt run. The black square shows the ensemble median, with the error bar marking the range between the 0.16 and 0.84 quantiles of the arrival time distribution (an approximate 1 − error). The full range of the ensemble arrival time distribution spans 31.2 hr, which is significantly larger than the average arrival time uncertainty over many different CME forecasts of ±10 hr (Riley et al., 2018). This highlights the need for event-specific uncertainty estimates. The weighted ensemble distributions are also shown in Figure 7. The blue, pink, and orange histograms show the arrival time distributions weighted by the HI1A, HI1B, and combined HI1A and HI1B observations, respectively. Similarly, the blue, pink, and orange circles mark the medians and approximate 1 − spread of the weighted distributions. Details of the modeled arrival times are provided in Table 2.

CME-1 Arrival Time Distribution
For this event, all the ensemble members arrive significantly later than observed CME at L1. Consequently, the observed arrival is outside the uncertainty region of the unweighted and weighted ensemble medians, and the corresponding errors of the ensemble medians span 15.8 to 19.3 hr. The medians of the weighted distributions are closer to the observed arrival time than the unweighted ensemble median, although these differences are smaller than the ensemble spread in each case. Additionally, the spread of the arrival time distributions is slightly less for each of the weighted ensembles, indicating smaller uncertainty (∼1 hr) in the expected arrival time. Focusing on the HI1A/HI1B weighted ensemble, there is a 15% reduction in the arrival error, and a 7% reduction in the arrival uncertainty, relative the the unweighted ensemble.

CME-2 Arrival Time Distribution
For CME-2, all 200 of the ensemble members hit Earth, and the distribution of arrival times is shown in Figure 8. The full range of the ensemble arrival time distribution spans 19.5 hr; this is a much smaller ensemble spread than any of the other studied CMEs. In this instance, the observed arrival time is within the uncertainty band for each of the weighted and unweighted ensembles, and the arrival errors are all small, between 1.6 and 2.2 hr. Again, the medians of the weighted distribution are closer to the observed arrival time than the unweighted ensemble median, but these differences are smaller than the ensemble spread in each case. The spread of the arrival time distributions is less for each of the weighted ensembles, indicating slightly smaller uncertainty (∼1 hr) in the expected arrival time. For the HI1A/HI1B weighted ensemble, there is a 27% reduction in the arrival error, and a 9% reduction in the arrival uncertainty, relative the the unweighted ensemble.

CME-3 Arrival Time Distribution
For CME-3, 158 of the ensemble members hit Earth, and the distribution of arrival times is shown in Figure 9. The full range of the ensemble arrival time distribution spans 50 hr. In this instance, the observed arrival time Figure 8. Histograms of the distribution of modeled CME arrival time for CME-2, in the same format as Figure 7.
leads all the ensemble members, being on the edge of the ensemble distribution, but outside the uncertainty band for each of the weighted and unweighted ensembles. The arrival errors of the ensemble medians are between 10 and 12.3 hr. Again, the medians of the weighted distribution are closer to the observed arrival time than the unweighted ensemble median, but these differences are smaller than the ensemble spread in each case. The spread of the arrival time distributions is reduced for each of the weighted ensembles, indicating smaller uncertainty (∼3 hr) in the expected arrival time. For the HI1A/B weighted ensemble, there is a 19% reduction in the arrival error, and a 21% reduction in the arrival uncertainty, relative the the unweighted ensemble.

CME-4 Arrival Time Distribution
For CME-4, 198 of the ensemble members hit Earth, and the distribution of arrival times is shown in Figure  10. The full range of the ensemble arrival time distribution spans 51 hr. In this instance, the observed arrival time leads many of the ensemble members, but is close to the center of the ensemble distribution, and inside the uncertainty band for each of the weighted and unweighted ensembles. The arrival errors of the ensemble medians are between 4.5 and 6.6 hr. Again, the medians of the weighted distribution are closer to the observed arrival time than the unweighted ensemble median, but these differences are smaller than the ensemble spread in each case. The spread of the arrival time distributions is less for each of the weighted ensembles, indicating a smaller uncertainty (∼4 hr) in the expected arrival time. For the HI1A/B weighted ensemble, there is a 20% reduction in the arrival error, and a 23% reduction in the arrival uncertainty, relative the the unweighted ensemble. Figure 9. Histograms of the distribution of modeled CME arrival time for CME-3, in the same format as Figure 7. For each of these four events, the weighted ensemble medians are closer to the observed arrival time than the unweighted ensemble and also have less uncertainty (with the exception of CME-2, for HI1A only). The results are similar when weighting by either HI1A, HI1B, or HI1A/HI1B combined. For the HI1A/HI1B weighting, there is a mean reduction in the arrival error of 20.1 ± 4.1%, and a reduction in the arrival uncertainty of 15.0 ± 7.2%, relative to the unweighted ensemble. These statistics are similar for the HI1A and HI1B weightings, but we use the HI1A/HI1B combined weighting to illustrate this as we think the combined weighting is probably more robust than weighting by a single perspective. We interpret this as evidence that the inclusion of the HI1 tracking data has improved the performance of the ensemble hindcast, relative to the unweighted ensemble.
Note that availability of photospheric magnetic field data, and hence the ambient solar wind speed input, were significantly different between the HUXt hindcasts and the SWPC genuine forecast. Thus, it isn't useful to directly compare their performance. The key conclusions we can draw are that HUXt can be used to perform a skilful hindcast of CME arrival time and that weighting the ensemble by comparison with HI observations can improve the skill and reduce the uncertainty.

Conclusions
We have used the HUXt solar wind model to produce an ensemble hindcast of four CMEs from 2012. By comparing the modeled and observed locations of the CMEs flanks, we have shown how HI observations can be used to weight ensemble members to provide an improved estimate of CME arrival time.
We have shown that HI observations can be used to constrain the ensemble hindcasts of these four CMEs and that this leads to reduced uncertainty in the CME arrival time hindcasts. However, it is important to note that uncertainty estimates are only useful if they are, on average, consistent with the observed arrival times.
Of the four CMEs we study, the hindcasts of CME-2 and CME-4 are consistent with the observed L1 arrival, and so it is fair to say that the weighted ensemble has improved these hindcasts. For CME-1 and CME-3, the observed arrival leads all the ensemble members, and so the hindcast uncertainty is not consistent with the observed arrival, and the weighting cannot improve this situation. It is not clear why the observed arrival is outside the ensemble spread for these two events, especially CME-1 which arrives significantly earlier.
We repeated these hindcasts with double the spread in the perturbed cone CME parameters and it did not significantly improve the situation. We are confident that the Solar Stormwatch identifications are robust and consistent with the CME identified by SWPC. Therefore, this suggests to us that HUXt isn't correctly capturing the evolution of this CME due to either the background solar wind environment or a breakdown of the assumptions of the cone CME parameterization, or perhaps both factors. Additionally, it is possible that the incorrect signature has been identified as this event in the L1 in situ data. Clearly, these issues need further investigation so that the technique proposed here can be developed into a robust tool for hindcasting, and potentially forecasting.
Therefore, following this work, several lines of research should be taken. First, we must test this method on more events to build a robust assessment of its performance. Solar Stormwatch is currently tracking over 1,000 CMEs in the same manner as Barnard et al. (2017), and these will be ready for analysis in 2020. Additionally, the time elongation profiles provided by the HELCATS project could be used (https:// doi.org/10.6084/m9.figshare.5803152.v1); however, these do not track features as consistently as the Solar Stormwatch profiles.
Furthermore, a similar experiment could be performed with the WSA-ENLIL ensemble forecast runs. The approach advocated in Harrison et al. (2017) was an "Ensemble pruning" technique, where HI data are used to exclude ensemble members considered unrealistic compared to the HI observations. Given the low computational expense of HUXt, it might be beneficial to run a large ensemble using HUXt to compare against the HI observations, from which a targeted ensemble of the most plausible CME parameters could be run in WSA-ENLIL. Similarly, this approach could also be useful within the ADAPT-WSA-ENLIL modeling framework (Adamson et al., 2019).
The initial condition ensemble used here only examined uncertainty in the cone model CME parameters. However, there is also uncertainty in the ambient solar wind speed solution. Perturbation of the ambient solar wind solution, using a method similar to Owens and Riley (2017b), could help more fully quantify all sources of uncertainty; this is a practical option with a computationally efficient model such as HUXt.
It is also possible more advanced comparisons between the HI observations and modeled CMEs could be beneficial. Here we have limited comparison to the forward flank of the CME from the HI1A and HI1B perspectives. This consequently ignores much of the extra information regarding both the modeled and observed CME. The advantage of our approach is that there are several well established tools and processes for identifying the forward flank of a CME in HI data Davies et al., 2009). However, it is possible that further improvements could be gained by also tracking the rear flank of the CME, as per Savani et al. (2009), or other features such as the "ghost fronts" reported in Scott et al. (2019). Physics-based forward modeling of the HI images from the modeled solar wind and CME is also worth investigation. A Thomson scattering simulation of the modeled inner heliosphere could allow a more direct comparison with HI data, that doesn't require manually identifying CME properties. However, due to the nature of the cone model CME parameterization, this approach could be particularly challenging. Cone model CMEs are a hydrodynamic perturbation that ignores the magnetic and density structure of observed CMEs. This parameterization works well for estimating the front of CMEs in numerical solar wind models, but could be misleading when comparing real and forward modeled HI images. Therefore, it is quite plausible this approach would require more complex CME parameterizations to be effective.
The next operational space weather missions are currently being developed by NASA and ESA. Of particular note is the proposed Lagrange mission to the L5 point. Our results provide support for including a HI on such a mission, as we have demonstrated a simple way that these data can be used to improve the prediction of CME arrival time at Earth.

Conflict of Interest
Luke Barnard received a grant from NERC (Grant NE/P016928/1/) and personal fees from University of Reading for his work as a postdoctoral research assistant during the conduct of this study. Mathew Owens received a grant from NERC (NE/P016928/1/) during the conduct of this study..

Data Availability Statement
The STEREO HI data used in this study were obtained from the UK Solar System Data Centre (UKSSDC, https://www.ukssdc.ac.uk/solar/stereo/data.html). The HUXt model, including the boundary conditons used here, was obtained from this site (https://github.com/University-of-Reading-Space-Science/HUXt). The Solar Stormwatch classifications of the CME flank and the analysis code used in this study are made available at this site (https://github.com/University-of-Reading-Space-Science/HIEnsembleHindcast).