Impact of Inner Heliospheric Boundary Conditions on Solar Wind Predictions at Earth

Predictions of the physical parameters of the solar wind at Earth are at the core of operational space weather forecasts. Such predictions typically use line-of-sight observations of the photospheric magnetic field to drive a heliospheric model. The Wang-Sheeley-Arge (WSA) model, for the coronal field, and ENLIL, for the transport in the heliosphere, are commonly used for these respective tasks. Here we analyze the impact of replacing the potential field coronal boundary conditions from WSA with two alternative approaches. The first approach uses a more realistic nonpotential, rather than potential, coronal field based on the Durham Magneto Frictional Code (DUMFRIC) model. In the second approach the ENLIL inner boundary conditions are based on Interplanetary Scintillation (IPS) observations. We compare predicted solar wind speed, plasma density, and magnetic field magnitude with observations from the WIND spacecraft for two 6-month intervals in 2014 and 2016. Results show that all models tested produce fairly similar output when compared to the observed time series. This is reflected not only in fairly low correlation coefficients (<0.3) but also in large biases. For example, for solar wind speed some models have average biases of more than 150 km/s. On a positive note, the choice of coronal magnetic field model has a clear influence on the model results when compared to the other models in this study. Simulations driven by IPS data have a high success rate with regard to detection of the high speed solar wind. Our results also indicate that model forecasts do not degrade for longer forecast times.


Introduction
Operational solar wind forecasts are commonly produced by a heliospheric model whose inner boundary conditions are produced by a separate coronal model (see Figure 1). The overarching objective of this paper is the evaluation of different realizations of these boundary conditions for our heliospheric Sun-to-Earth model ENLIL (Odstrcil, 1994, 2003; Odstrcil et al., 1996). ENLIL is a model that transports solar wind parameters from the vicinity of the Sun to Earth. ENLIL belongs to a group of reduced physics-based models of which Euhforia (Pomoell & Poedts, 2018), SUSANOO (Shiota & Kataoka, 2016), the Coronal-Heliosphere (CORHEL) model (Lionello et al., 2009), and SWMF-AWSoM (Space Weather Modeling Framework - Alfvén Wave Solar Atmosphere Model) (Sachdeva et al., 2019; Sokolov et al., 2013; van der Holst et al., 2014) are best known. Strictly speaking, a reduced physics-based or even full MHD (magnetohydrodynamic) model is not necessary for simulating solar wind parameters. In some cases (e.g., Owens et al., 2017; Reiss et al., 2016; Riley et al., 2017), the models do not simulate the corona (and heliosphere) at all but rely on empirical methods to forecast the solar wind speed at Earth. Other approaches use a mix of full MHD simulations and semiempirical methods (Pinto & Rouillard, 2017). Here our boundary conditions that drive ENLIL are based on a simulation of the magnetic field in the corona, with the exception of IPS where boundary conditions are derived by iteratively fitting a kinematic solar wind model to observations.
Coordinated Modeling Center (Kuznetsova, 2020a) and find that there is not a single candidate that performs best, with each model having its strengths and weaknesses.
• Comparisons of solar wind data from potential (electric currents in the corona are neglected) and MHD simulations in Riley et al. (2006) show that, if time-dependent effects can be neglected, the potential method provides a reasonable approximation to the MHD method, although there still are notable differences (e.g., the potential model appears to underestimate the amount of flux opened up to the heliosphere).
• Edwards et al. (2010) have compared the magnetic structure and the resulting solar wind speed distribution from potential and nonpotential simulations (currents are accounted for) at 21.5R⊙ for two solar maximum dates. They have identified considerable differences between the two types of coronal simulations: The nonpotential model has more complex magnetic structures and more open flux and, by using an empirical wind speed formula, leads to higher predicted wind speeds for the two study dates.
Finding suitable boundary conditions at 21.5R⊙ for ENLIL simulations of the heliosphere is a crucial point in solar wind prediction and prone to large errors. Ideally, such boundary conditions would be inferred directly from magnetic field observations at 21.5R⊙. But measuring the coronal magnetic field is extremely difficult, chiefly because of the extremely low intensity of the coronal magnetic field, and therefore, such measurements are very rare (e.g., Cargill, 2009). Routine observations on operational time scales (at least daily) of the photospheric magnetic field are, however, available (e.g., the Global Oscillation Network Group [GONG] network of magnetograms; Harvey et al., 1996). The representation of the magnetic field at 21.5R⊙ is thus based on these photospheric observations. There are also compromises to be made with the formulation of the coronal models. While an MHD simulation in the coronal domain, for example, the MHD-Around-a-Sphere (MAS) model (Lionello et al., 2011; Mikić et al., 1999; Riley et al., 2001, 2003), would provide reasonable boundary data within current understanding of physical processes, it is computationally time consuming even on today's supercomputers, especially in an operational context where forecast models are run every 1-2 hr. A number of simulation methods for reconstructing the coronal magnetic field (that can also be used in ENLIL) from observations have been developed in recent years. The methods use some form of simplification and parameterization, most notably the Potential Field Source Surface (PFSS) method (Altschuler & Newkirk, 1969; Nikolic, 2017; Schatten et al., 1969), the magnetofrictional (MF) method (van Ballegooijen et al., 2000; Yang et al., 1986; Yeates et al., 2008), and the Current Sheet Source Surface (CSSS) method (Poduval & Zhao, 2014; Poduval, 2016; Zhao & Hoeksema, 1995).
For the outer corona (beyond a source surface of, for example, r > 2.5R ⊙ where the magnetic field is assumed to be fully radial; see also Nikolic, 2017), the Schatten Current Sheet (SCS) method (Schatten, 1972) is commonly employed. In the Met Office (MO) version of the WSA code (see below), we also use a source surface of 2.5R ⊙ (see also Figure 1), though there is evidence emerging that this value should be revised to lower values (e.g., Nikolic, 2019).
Once the 3-D magnetic field has been reconstructed from measurements, an empirical wind speed formula is used to compute the inner boundary conditions for initializing the heliospheric simulation (ENLIL) from the coronal simulation data at the interface between the two domains. The ENLIL inner model boundary sits at 21.5R⊙, and the boundary conditions are the radial magnetic field and radial solar wind speed (see Figure 1). Common empirical parameterization forms (which are discussed in the following sections) are the Wang-Sheeley (WS) model (Arge & Pizzo, 2000; Wang & Sheeley, 1990, 1992), the Wang-Sheeley-Arge (WSA) model (Arge et al., 2003), and the Distance from the Coronal Hole Boundary (DCHB) model (Riley et al., 2001).
In this paper we compare the sensitivity of ENLIL (operationally used at the MO) to coronal boundary conditions produced by WSA, which uses the PFSS approach, with the Durham Magneto Frictional Code (DUMFRIC) model, which uses the MF approach. In addition, a PFSS version of DUMFRIC (DUPFSS) enables us not only to compare two mathematical realizations of the PFSS solutions (DUPFSS vs. WSA) but also to assess the benefit of the MF solution DUMFRIC over DUPFSS in the same computational framework. As part of this study we also compare the impact of initialising the WSA model with GONG (observed) synoptic maps that are based on a data assimilation approach. In the following, we call these GONG-derived maps ADAPT (Air Force Data Assimilative Photospheric Flux Transport) and discuss them later in the text. This provides alternative boundary conditions in ENLIL to those obtained from GONG maps. As an alternative to empirically derived solar wind velocities, it is also possible to derive solar wind velocities directly from observations processed with a Computer Assisted Tomography (CAT) algorithm developed by the University of California San Diego (UCSD) based on IPS observations at Earth (IPS-TOMO). We intercompare these results with IPS-driven ENLIL model runs where boundary conditions in ENLIL at 21.5R⊙ have been derived from IPS observations and magnetic field estimates from the CSSS model. ENLIL and WSA are installed at the MO. The DUMFRIC model is installed at the University of Durham. The IPS and CSSS models are installed at UCSD. In order to quantify the impact of grid resolution in ENLIL, we also compare the models against the output of the operational MO ENLIL forecasts, which are carried out at a higher resolution. Table 1 details the models that we discuss in the following sections.

WIND Observations
To compare the model output against observations, we use WIND spacecraft observations at Lagrange Point 1 (L1) (e.g., King & Papitashvili, 2005). The WIND spacecraft was launched in 1994 and is still operating today. A magnetometer on WIND measures the magnetic field magnitude (e.g., Lepping et al., 1995), and two Faraday cup sensors measure, among other parameters, the solar wind speed and density (e.g., Ogilvie et al., 1995). The observations of solar wind speed, plasma density, and magnetic field magnitude are already hourly averaged in the OMNI database (https://spdf.gsfc.nasa.gov/pub/data/omni/low_res_omni/).

Space Weather 10.1029/2020SW002499

Magnetograms
ENLIL is driven by photospheric field measurements, that is, magnetograms, except for IPS-ENLIL where boundary conditions are derived by different means. Magnetograms are used to calculate the boundary conditions for ENLIL. The GONG full-disc photospheric magnetograms are based on six observing sites that are used to derive a map of the magnetic field over the entire surface of the Sun. This full-surface map is called a synoptic map because it provides a general view of the field condensed from many minute-by-minute images. Our synoptic GONG maps are updated daily (e.g., Arge & Pizzo, 2000) with the observed line-of-sight magnetic field around the central meridian (±60°), since measurements closer to the limb than this have large errors. But to mathematically simulate the magnetic field from observations, one has to use line-of-sight observations of the magnetic field that represent the full solar surface. The GONG synoptic maps account for this, and a synoptic map consists of observations around the central meridian over a full solar rotation. This also means that parts of these daily updated synoptic maps are at any one time more than 2 weeks old (e.g., Petrie et al., 2018). Figure 1 shows an example of a GONG synoptic map and the resulting boundary conditions in ENLIL. In this study we use versions of the GONG synoptic maps that are uncorrected with respect to the poles. Figure 1 also shows a typical ADAPT synoptic map (Arge et al., 2010; Henney et al., 2012; Hickemann et al., 2015) and the resulting boundary conditions for ENLIL. The ADAPT synoptic maps are constructed from GONG magnetograms by evolving them using a photospheric flux transport model, which is based on the Worden-Harvey model (Worden & Harvey, 2000). New data are assimilated into the model once per day, and maps are output with a 2-hr cadence. GONG daily updated synoptic maps can only see new active regions as they appear or disappear within ±60° of the central meridian.
In contrast ADAPT-based synoptic maps will account for transient features over the full solar disc because of the flux transport model. In theory this should also improve the solar wind forecast at Earth because of the improved input to ENLIL. In the current version and at the time when we carried out our calculations, the ADAPT data set consists of an ensemble of 12 realizations, which account for model parameter uncertainties in the supergranular flow (e.g., Hickemann et al., 2015). We picked realization number 1 for our experiments, rather than using all ensemble members, since using one member rather than all only has a minimal influence on the results (see Weinzierl et al., 2016). Furthermore, the posterior ensemble spread (after running ENLIL, not shown) of ADAPT-based wind speeds, plasma densities, and magnetic field magnitude at Earth is not very sensitive to the ensemble realization number, with a narrow spread, and choosing only one ensemble member is a fair assumption at least for the low-resolution ENLIL configuration. It has to be mentioned here that in a future version of ENLIL, the posterior (at Earth) ensemble spread will also increase. ENLIL has not been developed with ensembles in mind, but the whole point of an ensemble system is the representation of model uncertainties. The processing chain of using ADAPT is similar to using GONG. Instead of running ENLIL every 2 hr based on one single GONG input file of synoptic magnetogram observations, forecasting centers like the MO will run ENLIL 12 times in parallel every 2 hr, representing 12 ADAPT realizations. We briefly mention here that the boundary conditions for the magnetic field at 21.5R ⊙ (see following sections) for the IPS-ENLIL runs are also based on GONG magnetograms. For the tomographic model (IPS-TOMO) that is run by UCSD the magnetograms are provided by NSO NISP/SOLIS (National Solar Observatory Integrated Synoptic Program/Synoptic Optical Long-term Investigations of the Sun).

Reconstructing the Magnetic Field at 21.5R ⊙
The aim is to reconstruct the magnetic field in the inner parts of the corona from the solar photosphere (1R⊙) to 21.5R⊙. This information is essential for calculating the boundary conditions for ENLIL, which is a two-part process:
• Calculate the magnetic field from GONG (ADAPT) synoptic map observations first from 1R⊙ to a source surface height of 2.5R⊙, and then calculate the (radial) magnetic field at 21.5R⊙.
• Calculate the boundary conditions for ENLIL at 21.5R⊙.
We calculate the magnetic field from 1R⊙ to 2.5R⊙ by two different methods. For the PFSS simulation the magnetic field between 1R⊙ and 2.5R⊙ is computed by extrapolation from the observed photospheric radial magnetic field, assuming that the field is current free ($\nabla \times \mathbf{B} = 0$) and radial at 2.5R⊙ (e.g., Nikolic, 2017). The second method, the so-called nonpotential coronal model, simulates the evolution of the large-scale magnetic field between 1R⊙ and 2.5R⊙ using the MF method. Here, the velocity $\mathbf{v}$ is approximated by the MF form $\mathbf{v} = \frac{1}{\nu} \frac{\mathbf{J} \times \mathbf{B}}{B^2}$, where $\mathbf{J}$ is the electric current and $\nu$ is a friction coefficient. This enforces the relaxation of the magnetic field toward a nonlinear force-free state where $\mathbf{J} \times \mathbf{B} = 0$. The MF model allows for a gradual build-up and conservation of magnetic energy and electric currents in the corona. In theory this should give a more realistic description of the reconstructed 3-D magnetic field compared to the PFSS method. However, this has never been tested in an operational environment. The temporal evolution of $\mathbf{B} = \nabla \times \mathbf{A}$ is driven by photospheric B_r maps, from which the electric field that updates $\mathbf{A}$ at the photosphere is reconstructed. The method used for the electric field reconstruction is described in Weinzierl et al. (2016) and based on work by Amari et al. (2003), Fisher et al. (2010), and Kazachenko et al. (2014). The MF method uses a grid that is equally spaced in (ρ, s, φ), where ρ = ln r and s = cos θ in terms of the spherical coordinates (r, θ, φ). The resolution is 60 × 180 × 360 grid boxes.
All the models that we discuss in the following (see also Table 1), with the exception of IPS-ENLIL (which uses a CSSS approach, e.g., Jackson et al., 2016), use as a second step some form of the SCS method to extrapolate the magnetic field in the outer corona (2.5R⊙ to 21.5R⊙) (Nikolic, 2017, 2019; Schatten, 1972). Here, we solve for a potential field using the absolute values of the radial magnetic field component at the source surface (derived from the PFSS or MF method) and the assumption that $\mathbf{B} \to 0$ as $r \to \infty$. Then, the field line direction is reversed where B_r < 0 at 2.5R⊙, producing infinitesimally thin current sheets. Once a solution beyond the source surface has been computed, the disoriented field is again reversed so as not to violate the Maxwell equations.
After the coronal magnetic field at 21.5R⊙ has been reconstructed from photospheric magnetogram observations (e.g., GONG and ADAPT), the solar wind speed at 21.5R⊙ for input into ENLIL (e.g., Odstrcil, 2003; Odstrcil et al., 1996, 2004, 2005) has to be calculated. For this we use an empirical model, which is part of our overarching models WSA, DUPFSS, DUMFRIC, and IPS. For example, WSA includes the PFSS model, SCS model, and empirical velocity equation model (see next sections). The radial magnetic field at 21.5R⊙ follows from the SCS model and is another input boundary for ENLIL (Figure 1). ENLIL transports these initial boundary conditions in the heliosphere from the Sun to Earth and beyond to Mars at 1.7 AU. However, we only output time series of plasma parameters at the position of Earth and compare them against time series observations from WIND.
In the following we detail these empirical velocity models used in this study to determine the boundary conditions for our heliospheric Sun to Earth model ENLIL.

The Solar Wind Parameterization in WSA v2.2
The empirical relationship to derive the solar wind speed v r at 21.5R ⊙ is defined as where Θ b (in degrees) is the minimum distance of a fieldline footpoint from a coronal hole boundary in the photosphere. There are five dimensionless free parameters ( , , , , l), in this equation ( has dimension of degrees), plus the fast and slow wind speed v fast and v slow .
The dimensionless flux tube expansion factor f_s in Equation 1 is defined as $f_s = \left(\frac{R_\odot}{R_s}\right)^2 \frac{B_r(R_\odot)}{B_r(R_s)}$, with B_r(R_s) the radial magnetic field magnitude at R_s > R⊙ above the solar radius R⊙. The parameter f_s is calculated with the magnetic field at 2.5R⊙.
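As a minimal sketch of this calculation, assuming the standard definition f_s = (R⊙/R_s)² B_r(R⊙)/B_r(R_s) with the source surface at R_s = 2.5R⊙, the expansion factor for a single field line could be evaluated as follows. The function name and the sample field values are hypothetical and are not taken from the WSA code.

```python
def expansion_factor(br_photosphere, br_source, r_source=2.5, r_sun=1.0):
    """Flux tube expansion factor between the photosphere and the source surface.

    br_photosphere: radial field at the field line footpoint (at r_sun)
    br_source: radial field where the same field line crosses r_source
    Radii are in solar radii; field units cancel in the ratio.
    """
    if br_source == 0:
        raise ValueError("radial field at the source surface must be nonzero")
    return (r_sun / r_source) ** 2 * abs(br_photosphere) / abs(br_source)


# Illustrative values only: a field falling off exactly as r^-2 gives f_s = 1,
# and a field that falls off faster than r^-2 gives f_s > 1.
f_radial = expansion_factor(br_photosphere=6.25, br_source=1.0)
f_super_radial = expansion_factor(br_photosphere=62.5, br_source=1.0)
```

In this convention a large f_s marks a strongly diverging flux tube, which the WS-type relations associate with slower wind.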

We use the following parameters in Equation 1: The solar wind resulting from Equation 3 is mapped onto the WSA grid of 144 (longitude) × 72 (latitude) grid cells (e.g., Figure 1). This information is then mapped onto the ENLIL inner boundary grid. GONG-WSA-ENLIL, like most of the experiments shown here, runs ENLIL with a low resolution, which means 256 grid cells in the radial direction r, 30 cells in Θ (from 30° to 150°), and 90 cells in φ, based on a spherical coordinate system. GONG-WSA-ENLIL-Oper is simply the operational version of ENLIL run at the MO (see Table 1). This is run with a higher resolution (512 × 360 × 30).
We briefly mention here that the WSA formula was developed from the WS model (Arge & Pizzo, 2000; Wang & Sheeley, 1990, 1992), which only uses f_s as an input parameter, and the DCHB formula (Riley et al., 2001), which only uses Θ_b, the minimum distance from an open-closed boundary, measured over the photosphere. The DCHB formula has two further parameters: a measure of how thick the slow flow band is (≈0.1 radians) and w, the width over which the flow is raised to coronal hole values (≈0.05 radians). Recent research suggests that Θ_b is the dominating parameter for determining the solar wind speed, and in some cases the presence of f_s actually weakens the predictive power of the WSA formula (Riley et al., 2015). However, there is also evidence that the expansion factor might influence the distribution of fast solar wind deep inside coronal holes and that both factors (Θ_b and f_s) might be important to accurately model the solar wind.
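A commonly quoted form of the DCHB relation, following Riley et al. (2001), is v_r = v_slow + ½(v_fast − v_slow)[1 + tanh((Θ_b − ε)/w)]. The sketch below evaluates this form with the thickness (0.1 radians) and width (0.05 radians) quoted above. The symbol ε, the function name, and the slow and fast wind speeds are assumptions chosen for illustration and are not the values used in this study.

```python
import math


def dchb_speed(theta_b, v_slow=350.0, v_fast=750.0, eps=0.1, width=0.05):
    """Wind speed (km/s) as a function of distance from the coronal hole boundary.

    theta_b: minimum angular distance from the open-closed boundary (radians)
    eps: thickness of the slow flow band (radians)
    width: width over which the flow rises to coronal hole values (radians)
    v_slow, v_fast: hypothetical slow and fast wind speeds (km/s)
    """
    ramp = 1.0 + math.tanh((theta_b - eps) / width)
    return v_slow + 0.5 * (v_fast - v_slow) * ramp


# Speeds at the boundary, at the edge of the slow band, and deep in the hole.
profile = [dchb_speed(theta) for theta in (0.0, 0.1, 0.3)]
```

The profile stays close to v_slow near the boundary, passes through the midpoint of the two speeds at Θ_b = ε, and saturates at v_fast deep inside a coronal hole.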

The Solar Wind Parameterization in the DUMFRIC and DUPFSS Model
For the models ADAPT-DUPFSS-ENLIL and ADAPT-DUMFRIC-ENLIL the coronal models DUPFSS and DUMFRIC are based on ADAPT input synoptic maps. DUPFSS uses a form of the potential field source surface method and the SCS method to compute the magnetic field within the corona. The PFSS model in DUPFSS is similar to the PFSS model in WSA except for the way the solutions are obtained. DUMFRIC uses an MF method to calculate the magnetic field at the source surface of 2.5R⊙ but also employs the SCS model for the domain 2.5R⊙ to 21.5R⊙.
Equations 3-5 were developed for input data using a relatively coarse grid, which would not resolve much of the fine-scale structures of the underlying magnetic field and small (or thin) coronal holes. The derived solar wind speed and density maps in Edwards et al. (2010) are based on Equation 2 with the default parameters for that model. We developed a new formula for v_r (Equation 6) by reducing the number of free parameters and leaving out the expansion factor term in our modified DCHB model, with Θ_b in radians, two further free parameters (one greater than 1 and one less than 1), v_slow ∈ [100 … 400] km s⁻¹, and v_fast ∈ [500 … 1,000] km s⁻¹. The resolution of v_r in Equation 6 is 360 (longitude) × 180 (latitude) grid boxes. The solutions (DUPFSS and DUMFRIC) of the coronal magnetic field are then used in Equation 6 to compute the ENLIL boundary conditions.
By comparing the predicted solar wind speed histograms from ADAPT-DUMFRIC-ENLIL model runs at 1 AU with observations, we determined a parameter set for our modified DCHB model:
• v_slow = 200 km s⁻¹.

IPS Tomography and IPS Boundary Conditions
In IPS-ENLIL and IPS-TOMO (see Table 1) we use interplanetary scintillation (IPS) data. The amplitude and timing of small-scale scintillation patterns in the signal from compact radio sources provide information on the integrated density and perpendicular velocity perturbations along the line of sight associated with solar wind plasma structures. These patterns are routinely observed by the ISEE (Institute for Space-Earth Environmental Research) radio telescope array (Tokumaru, 2013) with a daily cadence. For the period of this study, data for velocity determinations are not available from December to April because of technical issues with operating two of the three IPS telescopes in snow. After 2010 a new ISEE array began operation (Kojima et al., 2002; Tokumaru et al., 2011) that allows year-round observations of the scintillation level.
This data set from radio observations is used in the University of California at San Diego IPS-tomography model (e.g., Jackson et al., 2008). The IPS-TOMO data (see Table 1) provide the density of the solar wind integrated along the line of sight to the radio source, by correlating the signal between three radio telescopes and measuring the lag of the IPS signal between them. This information can be input into a CAT code (Kojima et al., 1998) to reconstruct the 3-D solar wind in the heliosphere. The UCSD 3-D tomography model iteratively reconstructs a global time-dependent 3-D model from a source surface at 15R⊙ out to 3 AU for the plasma parameters velocity and density by fitting IPS lines of sight to a kinematic model (Jackson et al., 2003, 2010). These values from the 3-D model also provide the output of solar wind speed and density at Earth. As the IPS observations do not provide information on the magnetic field for the reconstruction method of the solar wind, the magnetic field in the model is obtained from an extrapolation of the photospheric magnetic field using the CSSS model (Poduval, 2016; Poduval & Zhao, 2014; Zhao & Hoeksema, 1995), which is then advected with the stream in the model (Dunn et al., 2005). In this study, we use output from the UCSD CAT model, which was run by UCSD to provide solar wind forecasts up to 5 days ahead.
Alternatively, solar wind parameters can be interpolated from the 3-D IPS tomography algorithm onto the inner boundary of ENLIL at 21.5R ⊙ , providing time-dependent boundary conditions for the solar wind speed and density. This frees us from using Equation 3 to construct the boundary conditions for ENLIL, and instead, we can use IPS for the initial solar wind speeds at the boundary of 21.5R ⊙ . However, the magnetic field at 21.5R ⊙ has to be inferred from observations. The magnetic field in the model is obtained from an extrapolation of the photospheric magnetic field (Dunn et al., 2005;Poduval, 2016;Poduval & Zhao, 2014;Zhao & Hoeksema, 1995). GONG magnetograms are used in the Stanford CSSS model (run at UCSD) to derive magnetic field as boundary condition for IPS-ENLIL. The GONG merged magnetograms used in the UCSD analysis and results first presented in Jackson et al. (2016) show good agreement with the observations for 10 years of data using both NSO SOLIS (Keller et al., 2003) and GONG magnetograms from 2005 to 2015.

Results
The chosen periods for our simulations are 1 May to 31 October in 2014 (around solar maximum) and 2016 (in the descending phase of solar cycle 24). These months are chosen since the IPS data are only available during summer months. Notwithstanding, the selected periods are likely to be sufficient to demonstrate systematic model errors. We run our models once per day from 0 UTC with a forecast length of 5 days. Exceptions are the operational model GONG-WSA-ENLIL-Oper, which we do not run ourselves (output data are obtained from the MO archive instead), and IPS-TOMO (data are provided by UCSD). Although we ran daily ENLIL simulations for the ADAPT-DUMFRIC-ENLIL case during the May to October test period, these were taken from a continuously evolving MF simulation that was initialized at the beginning of January in each case. This allowed the background energy to "spin up" to a steady level. We hourly average the model output time series values of solar wind speed (km s⁻¹), plasma density (N_p cm⁻³, where N_p is the number of protons), and magnetic field magnitude (nT) at the position of Earth. Hourly averages ensure a meaningful comparison to the hourly averaged observations from the WIND spacecraft.
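The hourly averaging step can be sketched as follows. This is a minimal illustration that assumes the model output is available as a list of time stamps with matching values; the function name, the treatment of missing samples, and the sample numbers are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(times, values):
    """Average samples into hourly bins keyed by the start of each hour."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stamp, value in zip(times, values):
        if value is None:  # skip missing samples
            continue
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        sums[hour] += value
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


# Illustrative solar wind speeds (km/s) at irregular output times.
sample_times = [
    datetime(2016, 5, 1, 0, 10),
    datetime(2016, 5, 1, 0, 40),
    datetime(2016, 5, 1, 1, 5),
]
sample_speeds = [400.0, 420.0, 450.0]
hourly_speed = hourly_means(sample_times, sample_speeds)
```

Binning both model output and observations on the same hourly grid is what allows a pointwise comparison of the two time series.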
For all models, except IPS-ENLIL, IPS-TOMO, and the operational GONG-WSA-ENLIL-Oper, no coronal mass ejection (CME) information was input to the ENLIL runs. In the case of the IPS models, the CME signal is intrinsically present in the tomographic reconstruction, and GONG-WSA-ENLIL-Oper will routinely account for Earth-directed CMEs based on space-borne coronagraph observations from SOHO (SOlar and Heliospheric Observatory) (Brueckner et al., 1995; Domingo et al., 1995) and STEREO-A (Solar TErrestrial RElations Observatory) (Harrison et al., 2005) and the creation of CME input files (cone files) by MO forecasters. Cone files for 2014 are not available from the MO archive, and adding cone files to ADAPT-DUPFSS-ENLIL and ADAPT-DUMFRIC-ENLIL is not straightforward, so to ensure a clean comparison, we removed the CME signal from the IPS-based and MO operational model runs. Details of how this is done appear in the appendix (see also Figure A1).

Verification Metrics
When we evaluate a model that is used for space weather forecasts, we are concerned with three main strands:
• The background solar wind speed, plasma density, and magnetic field.
• Arrival times of CMEs at Earth.
• High speed solar wind stream from coronal holes (HSS).
Ideally, a model should be able to capture all three aspects. In the following we put special emphasis on the background and high speed solar wind (e.g., Grandin et al., 2019). CMEs and HSSs can lead to geomagnetic storms on Earth and can perturb the Earth's magnetic field (e.g., Kataoka & Pulkkinen, 2008). However, solar wind prediction models require a very good representation of the background solar wind speed because of the importance of filtering out the CME effect from HSSs.
We use statistical metrics to compare predictions of solar wind speed, plasma density, and magnetic field magnitude against observations from the WIND spacecraft at L1. We assume that observations at L1 represent the true state, although we acknowledge that observations as such can suffer from instrument calibration and observation errors and are of concern in data assimilation techniques (e.g., Lang & Owens, 2017). But also, the colocation of model output and observations has to be taken into account. We output model calculations at Earth, but we use observations from L1, which we treat as a good proxy for near-Earth observations. We also need to mention that we compare 3-D model output results against single-point observations at L1. This comparison is useful from a practical point of view because we are mostly concerned with operational forecasts of space weather-related impacts on Earth.

Initial Visual and Numerical Model Verification
We start by plotting model fields f_i versus observations y_i to indicate the correspondence between the model and observed values. In addition, for a quantitative comparison, we evaluate the following parameters (see, e.g., Jolliffe & Stephenson, 2012; Mentaschi et al., 2013, for further details):
• Root mean square error: $\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (f_i - y_i)^2}$
• Mean bias (MBias): $\mathrm{MBias} = \frac{1}{N} \sum_{i=1}^{N} (f_i - y_i)$
where $\bar{y}$ denotes the mean of the observations and N is the sample size.
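These metrics can be sketched in a few lines. The example assumes the standard definitions: errors are taken as model minus observation, the scatter index is the RMSE divided by the mean of the observations, and the correlation is the Pearson coefficient. The function name and the sample values are hypothetical.

```python
import math


def verification_metrics(forecast, observed):
    """Return RMSE, mean bias, scatter index, and Pearson correlation."""
    n = len(observed)
    if n == 0 or n != len(forecast):
        raise ValueError("forecast and observed must have the same nonzero length")
    errors = [f - y for f, y in zip(forecast, observed)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mbias = sum(errors) / n
    mean_obs = sum(observed) / n
    mean_fc = sum(forecast) / n
    cov = sum((f - mean_fc) * (y - mean_obs) for f, y in zip(forecast, observed))
    var_fc = sum((f - mean_fc) ** 2 for f in forecast)
    var_obs = sum((y - mean_obs) ** 2 for y in observed)
    # Undefined (division by zero) if either series is constant.
    correlation = cov / math.sqrt(var_fc * var_obs)
    return {
        "rmse": rmse,
        "mbias": mbias,
        "scatter_index": rmse / mean_obs,
        "correlation": correlation,
    }


# Illustrative wind speeds (km/s): the model tracks the observations but is too slow.
metrics = verification_metrics(
    forecast=[400.0, 450.0, 500.0, 550.0],
    observed=[420.0, 480.0, 540.0, 600.0],
)
```

In this example the correlation is essentially 1 while the mean bias is −35 km s⁻¹, which shows why the correlation should be read together with the bias.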
The root mean square error shows the average magnitude of the errors. A low root mean square error is a sign that the model output parameter (here hourly averaged values) agrees well with the corresponding observations. However, because the mismatch between model and observations is squared, this measure does not reveal whether the model is overpredicting or underpredicting the observed parameter. The bias, in units of the measured variable, can on the other hand be negative or positive and indicates whether the predicted model values are too low or too high. It is helpful to normalize the root mean square error, especially when comparing parameters with widely different scales or units. The RMSE value normalized by the mean measured value is referred to as the scatter index (S.I.). A lower S.I. corresponds to a better forecast (compared to observations), which is also a reflection of a low RMSE value. An S.I. lower than 0.5 means that the root mean square error is less than 50% of the mean observed value. The correlation coefficient shows the linear relationship between forecast and observation. But one can get high correlation even with a high bias, so it is useful to use the correlation in association with the bias.
Table 2 summarizes the individual model performance metrics for the three parameters (solar wind speed, plasma density, and magnetic field magnitude). Visual inspection of Figure 2 shows that for wind speed, the models in general performed better in 2016 (descending phase of solar cycle) than in 2014 (solar maximum), with IPS-ENLIL, IPS-TOMO, and GONG-WSA-ENLIL following the 1:1 correlation line in 2016 quite well. However, ADAPT-DUMFRIC-ENLIL and ADAPT-DUPFSS-ENLIL often underestimate the wind speed compared to observations in 2016. This is also consistent with the negative bias, larger in magnitude than 60 km s⁻¹, indicated in Table 2 for these two simulations in 2016.
The root mean square errors for GONG-WSA-ENLIL and GONG-WSA-ENLIL-Oper (see Figure 5) are the lowest in 2016 with values of around 90 km s⁻¹ (see Table 2). Also, the scatter index is lower than 0.5 for all the models, which is a sign there is some relation between model and observations. The results seem to show a good level of consistency for the solar wind speed in 2016 with the exception of ADAPT-DUMFRIC-ENLIL and ADAPT-DUPFSS-ENLIL. The latter two simulations have a high negative bias in 2016 but the smallest bias in 2014, as well as the lowest RMS errors in 2014 (around 90 km s⁻¹). In 2016, the correlation coefficients are highest for GONG-WSA-ENLIL and the corresponding operational model GONG-WSA-ENLIL-Oper, but they are only around 0.5. A high correlation is not always associated with a low RMSE. The correlation coefficient for IPS-ENLIL and IPS-TOMO is about 0.35 in 2016, but the RMSE for these runs is higher than the RMSE of ADAPT-DUMFRIC-ENLIL and ADAPT-DUPFSS-ENLIL despite the latter having similar correlation coefficients. Overall, the correlation coefficients in 2014 are lower than in 2016, possibly indicating the difficulty in forecasting wind speed at times of higher solar activity.
For the density plots (see Figure 3 and, for the operational model, Figure 5), RMSE and the correlation are also generally better in 2016 than in 2014. However, in both years the correlation is very low (usually less than 0.2) for all models. In addition, all the models exhibit a negative bias, which indicates that all the models underpredict the magnitude of the plasma density at L1 (see Table 2). We do not know if there is a systematic bias in either the models or observations. Differences between the various model versions are less apparent than for wind speed.
The main difference is that biases for ADAPT-DUMFRIC-ENLIL and ADAPT-DUPFSS-ENLIL are generally smaller by a small margin than for the other models, and in 2016 these two models have a smaller RMSE and a generally larger correlation than the other models. However, the scatter index for all the models is greater than 0.75, which is a clear indication that models struggle to meet the observations.
For the magnetic field magnitude the situation is more complex, as Figure 4 and Table 2 demonstrate. First, we note there are no plots for IPS-TOMO for the years 2014 and 2016. This is because interest in deriving magnetic field information from IPS measurements has arisen only recently (e.g., Jackson et al., 2019). Because ADAPT-DUMFRIC-ENLIL describes the magnetic field via a nonpotential theory, we would expect it to perform best. This is clearly the case in 2016, where ADAPT-DUMFRIC-ENLIL shows the smallest mean bias of −0.24 nT and a scatter index of 0.42. This run also shows the smallest RMSE for 2016, but surprisingly, this is not the case for 2014. What is notable is that the other run that uses ADAPT, namely, ADAPT-DUPFSS-ENLIL, demonstrates small bias and RMSE in both years. This suggests that it may be the use of ADAPT in the ENLIL initialization, rather than the DUMFRIC model, that is contributing to the good performance. This is examined further in section 4. We also see that the RMSE and mean bias for the IPS-ENLIL runs are largest in 2014 and second largest in 2016, which indicates that the method of initializing the magnetic fields using the CSSS model in these runs may be inappropriate. We note that for 2016 IPS-ENLIL and GONG-WSA-ENLIL have similar values for RMSE, correlation coefficient, scatter index, normalized mean bias, and mean bias.
A comparison of GONG-WSA-ENLIL and GONG-WSA-ENLIL-Oper results for 2016 in Table 2 and Figures 2-5 shows them to be very similar. These runs use the same initial conditions to drive ENLIL, but GONG-WSA-ENLIL-Oper uses a higher ENLIL resolution. It seems from this analysis, and comparison with the other model results, that the inner boundary condition, rather than the resolution, is the dominant factor in determining ENLIL forecast skill for background solar wind, at least at the L1 point. Table 2 and Figures 2-5 show results accumulated over all forecast times. There is little evidence of a change in the performance of the forecasts with changing lead time (not shown). Lead time here is defined as the forecast time in days into the future. All simulations presented here use a lead time of up to 5 days.
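To check whether forecast skill changes with lead time, the model and observation pairs can be grouped by forecast day before computing a metric. The sketch below is only an illustration under assumptions that are not taken from the study: the function name `rmse_by_lead_day`, the record layout, and the convention that forecast hours 0 to 23 count as lead day 1 are all hypothetical.

```python
import math
from collections import defaultdict

def rmse_by_lead_day(records):
    """Compute the RMSE separately for each lead day.

    records: iterable of (lead_time_hours, model_value, observed_value).
    Assumed convention: hours 0-23 -> lead day 1, hours 24-47 -> lead day 2, ...
    """
    squared_errors = defaultdict(list)
    for lead_hr, model, obs in records:
        lead_day = int(lead_hr // 24) + 1
        squared_errors[lead_day].append((model - obs) ** 2)
    return {
        day: math.sqrt(sum(errs) / len(errs))
        for day, errs in sorted(squared_errors.items())
    }

# Hypothetical wind speed pairs (km/s) at four forecast hours.
records = [
    (6, 410.0, 400.0),
    (18, 430.0, 440.0),
    (30, 520.0, 500.0),
    (42, 480.0, 500.0),
]
result = rmse_by_lead_day(records)
print(result)
```

A flat RMSE across lead days in such a grouping would correspond to the finding that performance shows little change with lead time.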

Gaussianness of Model and Observed Distributions
Histograms showing the distributions for all variables appear in Figure 6. The histograms show results accumulated over all forecast times. There is little evidence of a change in the performance of the forecasts with changing lead time (not shown). We note that neither the observations nor the models follow a perfect Gaussian distribution. The observations tend to be highly skewed to the right, though the skew is least apparent for the magnetic field magnitude. For the wind speed, ADAPT-DUMFRIC-ENLIL and ADAPT-DUPFSS-ENLIL are closest to the observations in 2014, but the speeds from these runs are too low in 2016. This is consistent with the changes in bias from positive to largely negative values for these runs shown in Table 2 and Figure 2. It appears that the speeds from the ADAPT-DUMFRIC-ENLIL and ADAPT-DUPFSS-ENLIL simulations show a similar distribution in both years and do not respond much to the change in the observed speeds from WIND. It is also apparent that the IPS-ENLIL and IPS-TOMO forecasts overrepresent the higher wind speeds in both years. IPS-ENLIL often has wind speeds in excess of 700 km s −1 with no clear indication that observations at L1 support it. This is also true for GONG-WSA-ENLIL in 2014, but this run, and GONG-WSA-ENLIL-Oper, performs best in 2016. For plasma density, all the model simulations do quite a good job of representing the skewed peak and long tail to high values seen in the distribution for the observations. However, the skew in the model distributions is a little bit too far to the right, consistent with the low bias reported in the discussion of Table 2 and Figure 3, which is mostly negative. In 2016 IPS-ENLIL follows the distribution of the density observations very well, but less so in 2014, where IPS-ENLIL forecasts have lower density values than were being observed. In 2014 ADAPT-DUMFRIC-ENLIL is doing a fairly good job representing the observed L1 density values.
In 2016 ADAPT-DUMFRIC-ENLIL overpredicts the occurrence of the density values between 4 and 8 (N p cm −3 ). For both years 2014 and 2016 the mean bias for ADAPT-DUMFRIC-ENLIL is similar (see Table 2), but the distributions compared to the observations look distinctively different. We can also see that the distribution of ADAPT-DUMFRIC-ENLIL in 2016 is consistent with the 2-D histogram plot of magnetic field magnitude in Figure 4.
Note that the classical histograms do not necessarily show the whole picture. For example, in Figure 6 ADAPT-DUMFRIC-ENLIL is following the distribution of observations very well. This is in contrast to Figure 2, where ADAPT-DUMFRIC-ENLIL is poor at representing larger observed solar wind speed values (i.e., at higher observed wind speeds, the model values remain lower than the observations). In Figure 2 IPS-ENLIL or IPS-TOMO seems to be performing better for the year 2014, which is not obvious from the histogram in Figure 6. Each 2-D grid box in Figure 2 considers the colocated (in time) observations and model values, whereas in Figure 6 the histogram of the observations and the histogram of the model values are independent. This means that on average the model values could follow the distribution of the observations in a reasonable manner, but this way of plotting does not take into account that the model values and observations in any one bin are not necessarily colocated in time.
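The difference between independent histograms and colocated comparisons can be made concrete with a small constructed example. In the sketch below, which uses hypothetical values and helper names (`pearson`, `binned_counts`), the model series contains exactly the observed values in reverse time order: its histogram is identical to that of the observations, yet the time-colocated pairs are anticorrelated.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def binned_counts(values, bin_width):
    """Histogram as a mapping from bin index to count."""
    return Counter(int(v // bin_width) for v in values)

# Hypothetical observed wind speeds (km/s) in time order.
obs = [350.0, 380.0, 420.0, 470.0, 520.0, 610.0, 680.0]
# Constructed "model": same values, but occurring at the wrong times.
model_reversed = list(reversed(obs))

same_histogram = binned_counts(obs, 50.0) == binned_counts(model_reversed, 50.0)
rho = pearson(model_reversed, obs)
print(same_histogram, rho)
```

The histograms agree perfectly while the correlation is negative, which is the situation the 2-D histograms are designed to expose.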

Taylor Diagrams
Taylor plots, or Taylor Diagrams, were first developed in Earth science to compare the performance of different climate models in one graphical representation. We can also use Taylor plots to compare the performance of our models to quantify the degree of correspondence between the modeled and observed parameters (here solar wind speed, plasma density, and magnetic field magnitude). Taylor plots are derived from three statistical parameters: the Pearson correlation coefficient, the root-mean-square error, and the standard deviation.
In our analysis Taylor plots should be seen as a method to intercompare and relate individual model results from Table 2 and Figures 2-5 to other model results in this study. Taylor plots are based on a relationship of the following form (Taylor, 2001):
$$E^2 = \sigma_m^2 + \sigma_o^2 - 2\,\sigma_m\,\sigma_o\,\rho \qquad (7)$$
where $\sigma_m$ and $\sigma_o$ are the standard deviations of the model and observations, respectively, and $\rho$ denotes the correlation coefficient between model values and observations.
The centered pattern difference E is based on the law of cosines, $c^2 = a^2 + b^2 - 2ab\cos(\phi)$ (see also the Taylor diagram figure).
Figure (Taylor diagram). The radial distance from the origin to the colored markings for each model is the ratio of the model standard deviation to the observed standard deviation, $\sigma_m/\sigma_o$ (exemplarily shown for model GONG-WSA-ENLIL:Ws and denoted as a). The distance from the reference point (black blob) to each model marking would be $E_N$ (see Equation 9) (exemplarily shown for model GONG-WSA-ENLIL:Ws and denoted as c; $\phi = \arccos(\rho)$). A model performs better the closer it is to the black reference blob (denoting observations) at the circle with radius 1. Note: in this plot Ratio Standard Deviation is unitless.
We plot in this Taylor diagram three different parameters with different physical units (km s −1, N p cm −3, nT). For this reason the standard deviations have to be normalized with respect to the standard deviation of the observations (unitless). Equation 7 can then be written as
$$E_N^2 = \left(\frac{\sigma_m}{\sigma_o}\right)^2 + 1 - 2\,\frac{\sigma_m}{\sigma_o}\,\rho \qquad (9)$$
where $\sigma_m$ and $\sigma_o$ are the standard deviations of the model and observations, $\rho$ is their correlation coefficient, $E_N = E/\sigma_o$ is the normalized centered pattern difference, and the normalized standard deviation of the observations, $\sigma_o/\sigma_o$, equals 1. Our reference point (in our Taylor plots this is the circle with radius 1) for solar wind speed, plasma density, and magnetic field magnitude therefore lies on the x axis at a value of 1 (black blob). If the models were perfect, they would all coincide with that marked location ($E_N = 0$) because they would have a correlation of 1 and the same standard deviation as the observations. In reality though, the models will be located away from that reference point and $E_N > 0$.
… (gray cross) for plasma density, and ADAPT-DUPFSS-ENLIL (blue star) for magnetic field magnitude, while in 2016 the points closest to the observations are GONG-WSA-ENLIL-Oper (purple blob) for the solar wind speed, IPS-ENLIL (gray cross) for the plasma density, and ADAPT-DUPFSS-ENLIL (blue star) for the magnetic field magnitude. According to Table 2 for the solar wind speed in 2014, we would expect that either ADAPT-DUMFRIC-ENLIL or ADAPT-DUPFSS-ENLIL comes out on top because they have a higher correlation coefficient and lower bias and root mean square error than the other models. However, the Taylor plots do not directly take root mean square error into account but account for the ratio of standard deviation of model to observations. Figure 6 shows that a model can have an RMSE that is lower than the RMSE for a competing model but a standard deviation that is higher. For example, ADAPT-DUPFSS-ENLIL in 2014 has a standard deviation of 84 km s −1 and an RMSE of 92 km s −1 (see Table 2). GONG-WSA-ENLIL for the same year has a standard deviation of 66 km s −1 but a much higher RMSE of 116 km s −1.
We also need to mention that the distributions of model values and observations are skewed, whereas a standard deviation is a fully adequate measure of spread only for a Gaussian distribution. However, the observations and models are all skewed to the right, so loosely speaking we are still comparing like with like. The same can be said for the magnetic field magnitude in 2014, where ADAPT-DUPFSS-ENLIL would have a better correspondence to the standard deviation of observations (it lies closer to the circle with radius 1) than GONG-WSA-ENLIL, but the correlation coefficient is close to 0.
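The quantities that place a model on a normalized Taylor diagram can be computed directly from two time series. The following sketch is illustrative only: the function name `taylor_statistics` and the sample values are hypothetical, and population standard deviations (division by n) are assumed. It evaluates the normalized distance to the reference point both from the law-of-cosines form, E_N^2 = r^2 + 1 - 2 r rho with r the ratio of standard deviations, and directly from the centered differences, so that the two can be compared.

```python
import math

def taylor_statistics(model, obs):
    """Return the ratio of standard deviations, the correlation, and E_N.

    Assumption: population standard deviations (division by n) are used.
    """
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    dev_m = [m - mean_m for m in model]
    dev_o = [o - mean_o for o in obs]
    sigma_m = math.sqrt(sum(d * d for d in dev_m) / n)
    sigma_o = math.sqrt(sum(d * d for d in dev_o) / n)
    rho = sum(a * b for a, b in zip(dev_m, dev_o)) / (n * sigma_m * sigma_o)
    ratio = sigma_m / sigma_o
    # Law-of-cosines form; max() guards against tiny negative rounding errors.
    e_n = math.sqrt(max(0.0, ratio ** 2 + 1.0 - 2.0 * ratio * rho))
    # Direct form: centered RMS difference divided by sigma_o.
    e_n_direct = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(dev_m, dev_o)) / n
    ) / sigma_o
    return {"ratio": ratio, "rho": rho, "e_n": e_n, "e_n_direct": e_n_direct}

# Hypothetical wind speeds (km/s).
obs = [400.0, 450.0, 500.0, 550.0, 600.0]
model = [420.0, 430.0, 520.0, 500.0, 560.0]
stats = taylor_statistics(model, obs)
perfect = taylor_statistics(obs, obs)  # a perfect model sits on the reference point
print(stats)
```

On the diagram, `ratio` is the radial coordinate and `acos(rho)` the angle, so a perfect model lands at radius 1 on the x axis with a distance of zero to the reference point.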

High-Speed Solar Wind Streams (HSSs)
In addition to the metrics defined in the previous section, event-based validation is crucial in assessing the various models. For the purpose of this study, the focus is on the arrival time of HSSs. It has been known for 45 years that fast solar wind streams are outflow from coronal holes (e.g., Krieger et al., 1973, or Richardson, 2018, for a review article). The interaction between HSSs and the surrounding plasma and slow solar wind can build up stream interaction regions. But detecting stream interaction regions from solar wind speed time series alone is prone to large errors, so we only discuss HSS events in the following. HSS regions in the solar wind speed time series are events in which the solar wind speed increases from a low background level of <450 km s −1 to values larger than 500 km s −1 in a very short time span, often within hours. Forecasting the arrival time of HSSs at Earth is crucial in order to distinguish them from CMEs, as both can lead to increased geomagnetic activity. HSSs can trigger a geomagnetic storm (Gerontidou et al., 2018; Richardson & Hilary, 2012), and the fast solar wind is associated with high electron fluences in the radiation belt. The high-speed solar wind stream is also important for modeling and forecasting the Earth's radiation belt (e.g., Horne et al., 2013). Iles et al. (2002), for example, found observational evidence that fast solar wind speeds above 500 km s −1 are linked to relativistic electron enhancements in the outer radiation belt. One of the drivers for developing ADAPT was to get a better characterization of coronal holes, associated with the fast solar wind, and subsequently to derive better input conditions to ENLIL.
We use a HSS detection algorithm that was originally developed in Owens et al. (2005) and further refined in Bu et al. (2019), Jian et al. (2015, 2016), and MacNeice (2009). We use here the definition from Jian et al. (2015) and adapted our own computer code for automatically detecting HSS regions within solar wind speed time series (see the appendix). We assess in the following the performance of the models in detecting HSS events.
There exists a great number of metrics, or skill scores, for assessing the "goodness" of a binary forecast (e.g., Barnes et al., 2016). They are often based on the number of hits of a sample of events that were correctly predicted, the number of misses of that event, and the number of false alarms (e.g., the model reported something outside of a set criterion). One such skill score, which we use here, is the critical success index or Threat Score (TS) (e.g., Schaefer, 1990, and references therein):
$$\mathrm{TS} = \frac{n_{\mathrm{hit}}}{n_{\mathrm{hit}} + n_{\mathrm{miss}} + n_{\mathrm{false}}}$$
The TS measures the fraction of events that are correctly predicted. A hit ($n_{\mathrm{hit}}$) is counted here if any part of a HSS of the model (blue vertical bars in Figure 8) overlaps with a HSS of the observations (red vertical bars in Figure 8). Two nonoverlapping distinct bars are a miss ($n_{\mathrm{miss}}$) (with respect to the HSS from the observed time series of solar wind speed). A false alarm ($n_{\mathrm{false}}$) results from a predicted HSS of the model when no HSS whatsoever has been detected in the observed time series. Ideally, $n_{\mathrm{miss}}$ and $n_{\mathrm{false}}$ should be zero so that the TS equals 1. Table 3 shows further statistics of model versus observations. See also the appendix.
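The overlap-based counting behind the Threat Score can be sketched as follows. This is one possible reading of the definitions above and not the code used in the study: the function names and the example intervals are hypothetical, HSS events are represented as (start, end) pairs in day of year, hits and misses are counted per observed event, and false alarms per model event.

```python
def overlaps(a, b):
    """True if the closed intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

def threat_score(model_hss, observed_hss):
    """Count hits, misses, and false alarms, and return the Threat Score.

    Assumptions: a hit is an observed HSS overlapped by any model HSS,
    a miss is an observed HSS with no overlapping model HSS, and
    a false alarm is a model HSS with no overlapping observed HSS.
    """
    n_hit = sum(
        1 for o in observed_hss if any(overlaps(o, m) for m in model_hss)
    )
    n_miss = len(observed_hss) - n_hit
    n_false = sum(
        1 for m in model_hss if not any(overlaps(m, o) for o in observed_hss)
    )
    total = n_hit + n_miss + n_false
    ts = n_hit / total if total else float("nan")
    return n_hit, n_miss, n_false, ts

# Hypothetical HSS intervals in day of year.
observed = [(10.0, 13.0), (37.0, 40.0), (64.0, 66.5)]
model = [(11.0, 14.0), (50.0, 52.0), (65.0, 68.0)]
n_hit, n_miss, n_false, ts = threat_score(model, observed)
print(n_hit, n_miss, n_false, ts)
```

Because false alarms enter the denominator, a model that predicts many HSSs can collect many hits and still obtain a moderate TS.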
In Figure 8 we show example plots for 2016 of time series of solar wind observations versus two models, ADAPT-DUMFRIC-ENLIL and IPS-ENLIL, and the corresponding HSS regimes based on our detection algorithm. If the model values followed the observations, we would find many agreements between the observed HSS regions (red bars) and the model-derived ones (blue bars). Our algorithm is not perfect, as can be seen in Figure 8 for ADAPT-DUMFRIC-ENLIL. The algorithm found a HSS for the model (blue line and bar) around day of year (doy) 127 but none for the observations (red line), although there is a sharp increase in wind speed from doy = 127 to doy = 130. But upon closer inspection, one can see that for the observations the wind speed drops off at doy = 129 (<600 km s −1) while staying at the same level (>400 km s −1) for the model compared to the prior doy 128. The algorithm would very likely have given a HSS for the observations had we not removed the observations during the CME event, because in that case the wind speed increases over 4 days (doy 125 to 129). Notwithstanding, the observations are our reference because if the model had followed the observations, the algorithm would not have found a HSS region for the model.
Table 3 summarizes the HSS prediction results as a function of lead time. Note that some of the operational model forecasts were not stored, which explains why, upon closer inspection, the number of observed HSS regions N Obs is not always exactly the same. This is because for the HSS analysis, we only use the observational data at times when the model forecasts are also available. A similar situation exists for IPS-ENLIL, where we lack forecasts for certain days. Table 3 shows that agreement between the numbers of HSSs detected in IPS-ENLIL and the observations is generally good, but fewer HSSs are detected in ADAPT-DUMFRIC-ENLIL than in the observations.
For example, in 2014 IPS-ENLIL had nine hits for the forecast with a lead time of 1 day compared to the six and three hits of ADAPT-DUMFRIC-ENLIL and ADAPT-DUPFSS-ENLIL (see Table 3). Table 3 also confirms the picture shown in Figure 8 of more HSS hits for IPS-ENLIL than ADAPT-DUMFRIC-ENLIL in 2016. For both years the number of hits for IPS-ENLIL is generally higher than for ADAPT-DUPFSS-ENLIL and ADAPT-DUMFRIC-ENLIL, with the hits for GONG-WSA-ENLIL and GONG-WSA-ENLIL-Oper typically somewhere in between. For 2016 IPS-ENLIL has a very high hit rate (>10) compared to the other models except for GONG-WSA-ENLIL-Oper. But IPS-ENLIL has the highest number of false alarms for all five forecast days, while GONG-WSA-ENLIL-Oper has very few false alarms. However, as Table 3 demonstrates, there is no model that really outperforms any other model. Overall, based on the TS for a lead time of 1 day, IPS-ENLIL, GONG-WSA-ENLIL, and GONG-WSA-ENLIL-Oper perform best in predicting the HSSs. We note that the TS indices generally remain below 0.5 for all the models and only IPS-ENLIL has a larger value (a TS index of 0.56 for 2014 and a lead time of 1 day). We also note that with increasing lead time the TS index does not necessarily become worse.

The Impact of ADAPT and DUMFRIC on the Model Results
Our results indicate that the two ADAPT-based model runs (ADAPT-DUMFRIC-ENLIL and ADAPT-DUPFSS-ENLIL) compare well with the other models discussed in this paper. However, some results indicate that further improvement is needed. For example, for the magnetic field magnitude it is surprising that the run that uses a nonpotential representation of the magnetic field, ADAPT-DUMFRIC-ENLIL, performs relatively poorly in 2014, where the model values are largely biased to very low values relative to the observations (see Table 2). In addition, the quality of the solar wind speed forecasts for these two experiments clearly differs between 2014 and 2016. Figure 6 shows that, compared to the observations, the change in their solar wind speed distributions from 2014 to 2016 is smaller than for the other model experiments. Associated with this, there is a strongly negative mean bias (−70%) prevalent in the ADAPT-driven runs for 2016 but not for 2014, where the mean bias is largely positive (+25%). It is unclear whether the performance of ADAPT-DUMFRIC-ENLIL and ADAPT-DUPFSS-ENLIL is related to the ADAPT initial conditions, or to the use of the DUMFRIC model, or both. Therefore, in this section we focus on this issue. For technical reasons we cannot run DUMFRIC driven by GONG data (for either DUMFRIC or DUPFSS versions). However, in 2014 we can drive the WSA model with ADAPT (run ADAPT-WSA-ENLIL) to test the impact of using ADAPT and also compare it against the GONG-WSA-ENLIL 2014 run that we used in this study. For consistency with ADAPT-DUPFSS-ENLIL and ADAPT-DUMFRIC-ENLIL, in the following we only plot output based on ADAPT tracer number 1. In low ENLIL resolution mode the spread of the 12 ADAPT ensemble members at L1 is fairly narrow, and the plots would not change by a large margin if we used a different tracer number. For technical reasons we could not run ADAPT-WSA-ENLIL for 2016 in this study due to removal of ADAPT fields from its ftp server.
Note. RMSE = root mean square error (km s −1); ρ = Pearson correlation coefficient (−); S.I. = scatter index (−); NRMSE = normalized root mean square error (−); MBias = mean bias (km s −1), (N p cm −3), (nT). This table should be compared to Table 2.
… Table 5 we use WSA v4.5, which uses a slightly different parameterization for the solar wind speed (see Equation 3).

Space Weather
The results would look similar (not shown) if we used the values for v r from WSA v2.2 from Table 1. This may be different if we ran ENLIL in medium resolution.
For solar wind speed, ADAPT-WSA-ENLIL and GONG-WSA-ENLIL performance is similar (except for correlation), and the two DUMFRIC-based model runs also perform similarly (see Table 2), which suggests that solar wind speed forecasts are strongly dominated by the coronal model (here WSA) that is used to create the boundary conditions for ENLIL, rather than the means by which observations are used. Table 4 suggests some evidence that use of ADAPT rather than GONG leads to improved correlation with observations, and this is also true for solar wind density. Otherwise, the chief difference in the density results is the higher bias of <1 (N p cm −3) for WSA-based models compared with the mean bias of ≈ −0.5 (N p cm −3) for the DUMFRIC-based models. So again, the density forecast appears to be dominated by which coronal model is used. For magnetic field magnitude, the situation is a little different. In Figure 9 all runs show a bias to low values (model values often lower than observations), apart from run ADAPT-DUPFSS-ENLIL, which has a smaller mean bias of −0.8 (nT). This can also be seen in Figure 6, where in 2014 the ADAPT-DUPFSS-ENLIL histogram follows the observed magnetic field magnitude values at L1 reasonably well and better than the other models. The two WSA-driven forecasts seem quite similar to each other, again suggesting the importance of the coronal model used in influencing the forecasts at L1. However, the two DUMFRIC-driven results are quite different from each other, and it is highly curious that ADAPT-DUPFSS-ENLIL, which has a simpler representation of the evolution of the coronal magnetic field, in general appears to produce a better forecast. Histogram plots (not shown, but similar to Figure 6) show that the distribution of wind speed, density, and magnetic field magnitude for run ADAPT-WSA-ENLIL is very similar to that for run GONG-WSA-ENLIL.
Note. See the text and Table 3 for definition of parameters.
We add a caveat here. It might seem that it is not important whether GONG or ADAPT magnetograms are used. But this conclusion is misleading because Equation 3 has been developed for average conditions over a full solar cycle. As mentioned in the prior sections, ADAPT accounts for transient features, whereas the GONG synoptic maps often would miss an active region. Equation 3 does not account for different phases of the solar cycle (e.g., maximum vs. minimum) and cannot fully exploit the information available in different synoptic maps (e.g., ADAPT vs. GONG). In a similar vein, Equation 6 is predicated on observations from 2014 and only represents a small part of the solar cycle. This is also the reason why the ADAPT-based model runs that use Equation 6 (see Figures 2 and 6) have better agreement with the observed solar wind speeds in 2014 than in 2016. If Equations 3 and 6 used parameterizations as a function of the different phases in the solar cycle, we would expect to see better consistency between models and different years.
The Taylor plot (Figure 10) for 2014 that also includes ADAPT-WSA-ENLIL is slightly different. Here for the solar wind speed, ADAPT-WSA-ENLIL (purple blob) is closer to ADAPT-DUPFSS-ENLIL (blue blob) and ADAPT-DUMFRIC-ENLIL (brown blob) than to GONG-WSA-ENLIL (green blob). The same can be said for the plasma density. This difference is a result of the marginally higher correlation in the ADAPT-WSA-ENLIL (ρ = 0.26) results in 2014 compared to GONG-WSA-ENLIL (ρ = 0.14), as mentioned above. For the magnetic field magnitude, ADAPT-WSA-ENLIL (purple star) is closer to ADAPT-DUMFRIC-ENLIL (brown star) than to GONG-WSA-ENLIL (green star), too, and this re-emphasizes the point above about the differences between the two DUMFRIC models.
Regarding the modeling of HSSs, a comparison of Table 5 with Table 3 shows that the pattern of HSS events for ADAPT-WSA-ENLIL is not obviously similar to those from the other runs driven by ADAPT or by WSA. However, we note that the hit rates are higher, suggesting some benefit of ADAPT over GONG. Also, there are fewer misses with ADAPT-WSA-ENLIL compared to GONG-WSA-ENLIL, with the penalty that there are more false alarms for ADAPT-WSA-ENLIL. The TS drops from day 1 to day 2 and then rises again through to day 4, a behavior seen in GONG-WSA-ENLIL but not in the two DUMFRIC runs. This suggests it is related to the initial wind speeds and magnetic fields at the inner boundary of ENLIL, once more supporting the conclusion that the method used to construct the coronal magnetic field from observations is as important as the quality of the synoptic maps of observations.

Conclusions
In this paper we have compared the performance of five different model configurations for prediction of the solar wind near Earth. Most of the models use magnetogram observations and a coronal magnetic field model to create boundary conditions for the forward model ENLIL. The exception is the IPS-ENLIL runs, which produce initial conditions based on IPS wind speed and density observations and then use a different coronal model to estimate the magnetic field initial conditions for ENLIL.
In this study we tried to answer three questions:
• Can a more realistic nonpotential (DUMFRIC) coronal magnetic field reconstruction in the forecast model deliver better results than the classical approach of a potential force free assumption without accounting for electric currents in the corona?
• What impact does using the IPS approach have on the results?
• How dependent are the results on the means by which the magnetograms are created (GONG or ADAPT)?
All the models were compared with observations from the OMNI database at L1. The observations of solar wind speed, plasma density, and magnetic field magnitude are based on the WIND spacecraft (which is part of OMNI). We did both a statistical analysis and an evaluation of their performance in predicting fast solar wind streams from coronal holes (HSSs). For this we considered two 6-month periods in 2014 and 2016, one at solar maximum and one in the descending phase of solar cycle 24. We observed a difference in performance of the models due to the phase in the solar cycle. In 2016 the operational, medium resolution, GONG-WSA-ENLIL-Oper was also included. When considering improvements to the MO operational system, this is the benchmark against which new models should be compared. Our results showed that GONG-WSA-ENLIL-Oper and GONG-WSA-ENLIL results were very similar. This suggests that angular resolution has little impact on results at L1, though of course, we may expect results to be different if we had compared the two models' results at the ENLIL inner boundary of 21.5 R ⊙.
… based on a simple ranking of the RMSE, correlation coefficient, scatter index, and mean bias scores shown in Table 2. This assessment ignores the fact that some of these metrics for different models are similar and may not be significantly different and, in the same manner, a simple ranking also ignores cases where metrics are hugely different between models. The Taylor diagrams address these issues differently, and they indicate that GONG-WSA-ENLIL performs best for wind speed, density, and magnetic field magnitude in 2014, while in 2016 GONG-WSA-ENLIL performs best for wind speed, ADAPT-DUMFRIC-ENLIL is best for density, and ADAPT-DUPFSS-ENLIL is best for magnetic field magnitude.
The above assessment is still unable to highlight subtleties in the results that have been brought out in the text. For example, IPS-ENLIL solar wind speed in 2014 has a moderately high correlation (≈0.3) with observations but also shows a high RMSE (140 km s −1) and a large mean bias (64 km s −1), while Table 2 and Figure 6 indicate that the two ADAPT-driven runs between 2014 and 2016 have similar correlation coefficients (ρ = 0.23 and ρ = 0.31) but a comparatively smaller mean bias (4 and 23 km s −1) and root mean square error (92 km s −1). However, this bias in 2014 for the ADAPT runs becomes a large negative bias (−64 and −69 km s −1) in 2016. Furthermore, while there is correlation between observed and modeled solar wind speed in most of the model simulations, this is not the case for density and magnetic field magnitude. The scatter plots for density suggest zero correlation between observed and modeled values (at least for more extreme events), while for magnetic fields the models represent a too-narrow range of field magnitude values.
It is also important to assess how well the models can simulate High Speed Solar Wind Stream events. Models have historically had difficulty capturing these events, which can increase the solar wind speed at L1 within hours. While, overall, the IPS-ENLIL and IPS-TOMO runs seem to be performing slightly worse than the other models in most of the assessments, these models in general perform best in the representation of HSSs if we can accept a higher false alarm count. This suggests that specifying ENLIL boundary conditions based on observed wind speed and density, rather than magnetic field as in the other runs, can lead to better results. While the results in Table 3 are quite hard to interpret, it appears that the ADAPT-DUMFRIC-ENLIL and ADAPT-DUPFSS-ENLIL models are slightly worse at capturing HSS events. Having said that, the hit rate and TS for the ADAPT-driven runs tend to increase with forecast lead time, which does not happen with the IPS or GONG driven forecasts.
We also ran a simulation for 2014 with ADAPT-WSA-ENLIL in order to understand whether the ADAPT-DUPFSS-ENLIL and ADAPT-DUMFRIC-ENLIL results were chiefly due to ADAPT or the two magnetic field solutions (PFSS and DUMFRIC). For solar wind speed, it is very clear that the forecasts are strongly dominated by the coronal model that is used to create the boundary conditions for ENLIL, rather than by ADAPT or GONG, though there is some evidence that use of ADAPT improves the correlation between model and observations. ADAPT may show its true potential for ensemble modeling with the 12 ADAPT ensembles and CME forecasting, which requires a good understanding of the errors and associated sensitivities to the background solar wind. This is something we will look into in a future study. In addition, the density forecast also appears to be dominated by the choice of coronal model. For magnetic field magnitude, the situation is a little different. The two WSA-driven forecasts seem quite similar to each other, again suggesting the importance of the coronal model used in influencing the forecasts at L1. However, the two DUMFRIC-driven results are quite different from each other, and it is highly curious that ADAPT-DUPFSS-ENLIL, which has a simpler representation of the evolution of the coronal magnetic field, in general appears to produce a better forecast. This appears a little surprising, but as was mentioned in the main text, the nonpotential model has not really been calibrated for operational space weather forecasting, and further research and testing could lead to improved results.
Space Weather, 10.1029/2020SW002499
In summary,
• All models tested produce fairly similar results for the background solar wind. We do not know how this would change if we used the models to forecast CME arrival times. But it is expected that the model with the best background solar wind speed will also predict the best CME arrival time. This assumes though that a particular model will always have the best background solar wind speed.
• The choice of coronal magnetic field model (DUMFRIC or WSA) has a clear influence on the model results. While use of the nonpotential approach (ADAPT-DUMFRIC-ENLIL) often produces very good results, this is not always the case when compared to the MO operational model, which suggests that further tuning of the model for operational use is needed.
• Direct use of wind speed and density observations in the IPS-ENLIL and IPS-TOMO simulations is beneficial with regard to very good HSS detection. For other metrics, however, the IPS-driven models perform slightly worse than the other models, but this could also be addressed by further adjustment of the IPS-ENLIL system.
• Use of an ensemble tracer from ADAPT instead of GONG overall does not appear to lead to notable forecast improvements. However, there is some evidence of improved model-observation correlation when ADAPT is used.

Appendix A: Removal of CMEs from Model and Observation Time Series
Figure A1. Time series of Q Fe from the Solar Wind Ion Composition Spectrometer (SWICS) instrument on the ACE satellite (Gloeckler et al., 1998) for 2014 and 2016. Also shown are the threshold levels of Q Fe = 11 and Q Fe = 12. The gray vertical bars denote the reported CME arrival times for 2014 and 2016 from the NASA score board (Kutzensova, 2020b).
Previous research has shown that in the aftermath of a CME hitting Earth, the measured ionization state of iron at L1 jumps to higher values, often exceeding a typical background value of 10 (Q Fe > 10) (Kohutova et al., 2016). First, we tried to remove the CME signal by identifying dates with elevated iron charge states in the observed SWICS data product (Solar Wind Ion Composition Spectrometer) at L1 (Gloeckler et al., 1998). By setting a threshold, we would remove all those dates ±12 hr from the time series output of model and observations (solar wind speed, magnetic field, and plasma density) alike. The 24-hr buffer is designed to accommodate the typical error on CME arrivals in the models. However, as Figure A1 shows, this method is not without problems. If the threshold is, say, Q Fe > 11, we would end up with many false alarms when compared to the reported CME arrival times from the NASA score board (Kutzensova, 2020b). If, however, we increase the threshold to, say, Q Fe > 12, we would greatly reduce the false alarm rate but would miss many reported CME arrivals. Given this, we took the practical approach to remove the CME signal from the model output time series by choosing the listed CME arrival dates from NASA plus an error window of ±12 hr