Volume 18, Issue 9 e2020SW002452
Research Article
Open Access

Evaluation of Total Electron Content Prediction Using Three Ionosphere-Thermosphere Models

O. Verkhoglyadova (corresponding author)
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA

X. Meng
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA

A. J. Mannucci
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA

J.-S. Shim
CCMC, Catholic University of America, Washington, DC, USA

R. McGranaghan
ASTRA, LLC, Louisville, CO, USA

First published: 04 August 2020
Citations: 5


Prediction of ionospheric state is a critical space weather problem. We expand on our previous research on medium-range ionospheric forecasts and present new results on evaluating prediction capabilities of three physics-based ionosphere-thermosphere models (Thermosphere Ionosphere Electrodynamics General Circulation Model, TIE-GCM; Coupled Thermosphere Ionosphere Plasmasphere Electrodynamics Model, CTIPe; and Global Ionosphere Thermosphere Model, GITM). The focus of our study is understanding how current modeling approaches may predict the global ionosphere for geomagnetic storms (as studied through 35 storms during 2000–2016). The prediction approach uses physics-based modeling without any manual model adjustment, quality control, or selection of the results. Our goal is to understand to what extent current physics-based modeling can be used in total electron content (TEC) prediction and explore uncertainties of these prediction efforts with multiday lead times. The ionosphere-thermosphere model runs are driven by actual interplanetary conditions, whether those data come from real-time measurements or predicted values themselves. These model runs were performed by the Community Coordinated Modeling Center (CCMC). Jet Propulsion Laboratory (JPL)-produced global ionospheric maps (GIMs) were used to validate model TEC estimates. We utilize the True Skill Statistic (TSS) metric for the TEC prediction evaluation, noting that this is but one metric to assess predictive skill and that complete evaluations require combinations of such metrics. The meanings of contingency table elements for the prediction performance are analyzed in the context of ionosphere modeling. Prediction success is between about 0.2 and 0.5 for weak ionospheric disturbances and decreases for strong disturbances. We evaluate the prediction of TEC decreases and increases. Our results indicate that physics-based modeling during storms shows promise in TEC prediction with multiday lead time.

Key Points

  • Physics-based predictions of the total ionospheric content are evaluated for 35 storms
  • Prediction success is estimated below 0.5 and is in line with other geospace model predictions
  • Physics-based modeling shows promise in TEC prediction with multiday lead time
  • It is important to evaluate all elements of the contingency matrix and to understand the role of each element in evaluation metrics

1 Introduction

Total electron content (TEC) is an integral characteristic of ionospheric state and readily reflects ionospheric dynamics during geomagnetic activity periods or geomagnetic storms. Prediction of TEC is an important research direction within the scope of space weather forecast efforts (Chartier et al., 2016; Mannucci et al., 2015; Meng et al., 2016, 2020; Schunk et al., 2005, 2012, 2014; Shim et al., 2017). There are several physics-based or first principles-based models that have been successfully applied to understand space weather events in the Earth's ionosphere-thermosphere (IT). The Community Coordinated Modeling Center (CCMC) hosts a number of widely used models and tools (see, for instance, Rastätter et al., 2019). Shim et al. (2011, 2012, 2017) performed several systematic studies to evaluate prediction capabilities of IT models for selected storms. However, using these models as prediction tools has several challenges (Mannucci et al., 2016). There are current shortcomings in understanding IT coupling to regions above and below (e.g., Heelis & Maute, 2020). The interplanetary magnetic field (IMF) is a critical driver of space weather. However, prediction of IMF magnitude and orientation at the location of Earth's orbit faces difficulties (Kilpua et al., 2019). Uncertainties in lower atmosphere forcing, energy deposition in the high-latitude region, quantification of particle precipitation, and potential limitations of physics-based approaches in modeling can affect prediction performance. Further analysis of the prediction capabilities of first principles IT models for a larger set of storms is called for to gain insight into model performance.

There is a “critical mass” of available observations (e.g., https://www.haystack.mit.edu/atm/open/madrigal/index.html) that provide information on ionospheric structure, especially over land and in regions of dense Global Navigation Satellite System (GNSS) coverage. However, a gridded data set is more convenient to use for global model validation than line-of-sight measurements. JPL provides a data set of Global Ionospheric Maps (GIMs) that contains global maps of TEC on a regular grid (1° by 1°) generated every 15 min. The TEC data are produced by processing GPS data posted on public sites (for instance, by the Scripps Orbit and Permanent Array Center (SOPAC) archive hosted by the University of California, San Diego) with JPL-developed software to estimate slant TEC, converting data to vertical TEC by assuming the standard ionospheric shell height of 450 km, and applying optimal mapping techniques to produce a gridded data set (Komjathy et al., 2005; Mannucci et al., 1998, 1999). The data set is available for scientific research and can be accessed at https://sideshow.jpl.nasa.gov/pub/iono_daily/gim_for_research/jpli/website.

In this paper we expand on our previous analysis of medium-range ionospheric forecasts (Meng et al., 2016) and present new results on evaluating prediction capabilities of physics-based ionosphere-thermosphere models. The motivation for this new study is to analyze predictive capability of physics-based models for a larger number of storms of different types over a solar cycle. We also refined our success metric (see details in section 2.2).

We distinguish between the terms “forecast” and “prediction” following the definitions proposed by the scientific community at a Chapman Conference titled “Scientific Challenges Pertaining to Space Weather Forecasting Including Extremes,” held 11–15 February 2019 in Pasadena, California. “Prediction” refers to the quantitative output of a model referring to a future value of ionospheric TEC. “Forecast” refers to future ionospheric state that generally relies on predictions and human participation. Thus, we limit our study of TEC forecasting to evaluation of TEC prediction by physics-based models and without manual adjustment of model runs or human-based quality control of the result.

Specifically, we focus on three global circulation models (GCMs): TIE-GCM (Thermosphere-Ionosphere-Electrodynamics General Circulation Model) (Roble et al., 1977), CTIPe (Coupled Thermosphere Ionosphere Plasmasphere Electrodynamics Model) (T. J. Fuller-Rowell et al., 1996), and GITM (Global Ionosphere Thermosphere Model) (Ridley et al., 2006). All GCMs solve the three-dimensional momentum, energy, and continuity equations for selected neutral and ion species at each time step. All model runs were conducted by CCMC. Our selection of these models is motivated by their wide use by the science community and their availability for “run on request” at CCMC. They all have similar driving setups that make them good candidates for intercomparison. We understand that there are other models that can be used in such a study but limited our analysis to these three representative models.

The goal of our study is to understand to what extent current GCMs can be used in TEC prediction in the multiday lead time scenario, explore uncertainties, and gain insight into improvement of these prediction efforts. We focus on predicting TEC of the global ionosphere for 35 geomagnetic storms during 2000–2016, a period that covers more than one solar cycle. These storms are selected to represent storms driven by coronal mass ejections (CMEs) and by Corotating Interaction Regions/High Speed Streams (CIR/HSS) during the epoch when numerous ground GPS stations were incorporated into estimation of global TEC maps.

In this paper we describe our prediction evaluation methodology, including GCM forecast-compatible run setups and definitions for TEC metrics. We apply the True Skill Statistic (TSS) metric for evaluation of the TEC prediction (Bloomfield et al., 2012) with event-based binary classification. The physical meanings of prediction hits, misses, and “false alarms” are analyzed in the context of ionosphere modeling. Then we discuss the reference TEC data set and its limitations. We evaluate prediction success for different strengths of ionospheric TEC increases and decreases during storms, different storm types, and modeling approaches by using data-derived benchmarks. We conclude with a summary of the results and suggestions for future research directions for ionospheric TEC prediction.

2 Methodology

2.1 Modeling Setup

To evaluate current prediction capability for the IT system, we performed modeling of geomagnetic storm intervals with three representative GCMs: TIE-GCM, GITM, and CTIPe. The runs utilized specific versions of the models and relied on CCMC implemented modeling setups that we will describe below. TIE-GCM was developed by R. G. Roble at High Altitude Observatory/National Center for Atmospheric Research (HAO/NCAR) (Roble et al., 1977). Diurnal and semidiurnal tides are inputs for the lower model boundary. A team led by A. Ridley developed GITM (Ridley et al., 2006). CTIPe was developed by a team led by T. Fuller-Rowell (T. J. Fuller-Rowell et al., 1996). Some details on the CCMC-based modeling setup are summarized in Table 1. The Weimer05 model (Weimer, 2005) was used to specify electric field potential in the high-latitude ionosphere for all three models. Particle precipitation is specified either by the Fuller-Rowell and Evans model (T. Fuller-Rowell & Evans, 1987) or OVATION (Oval Variation, Assessment, Tracking, Intensity, and Online Nowcasting) Prime model (Newell et al., 2009). The grids (latitude by longitude) are in geographic coordinates.

Table 1. Characteristics of Three GCM Modeling Setups
Model | Altitude range | Horizontal grid | Auroral precipitation | Notes
TIE-GCM | 97–500 km | 5° by 5° | Fuller-Rowell and Evans model | Assumes hydrostatic equilibrium
GITM | 90–600 km | 2.5° by 5° | OVATION Prime model | Allows for nonhydrostatic solutions
CTIPe | >80 km | 2° by 18° | Fuller-Rowell and Evans model | Includes the plasmasphere

We selected 35 geomagnetic storm intervals from 2000 through 2016. Originally, we selected 10 CME-type and 10 CIR/HSS-type storms based on the following storm characteristics and solar and interplanetary conditions. For the 10 CME-type storms, the minimum Dst is lower than or approximately −100 nT. Among the selection, there is one CME event, 29 September 2014, that did not cause any significant geomagnetic activity. We expanded the data set with the additional selection of 15 strong CME-type storms based on the minimum Dst index and the average AE index over the storm interval. Storms with −400 nT < minDst < −100 nT and average AE > 500 nT were selected. Careful consideration was made for selection of storms induced by single CMEs. We utilized event lists from the LASCO halo CME Catalogue (https://cdaw.gsfc.nasa.gov/CME_list/) and the Richardson-Cane Catalogue of near-Earth ICMEs (https://www.srl.caltech.edu/ACE/ASC/DATA/level3/icmetable2.htm). The 10 CIR/HSS-type storms were selected from all CIR/HSS-type storms between 2002 and 2016. For each selected storm, −80 nT < minDst < −40 nT. The interplanetary condition for each storm shows a clear CIR/HSS solar wind structure, with the peak solar wind speed distributed between 600 and 800 km/s. Out of these 10 CIR/HSS events, 5 had the HSS originating from transequatorial coronal holes, 3 from northern coronal holes, and 2 from southern coronal holes. The complete list of the modeled storms is presented in Table 2. The quiet day listed in the table for each storm was selected as the nearest day before the storm with the daily Ap index less than 5. The ionospheric state during a quiet day was used as the baseline to compare with the states during storm days for each event.

Table 2. List of the Modeled Storm Intervals
Storm type | Storm start day to end day | Quiet day
CME | 6 Apr 2000 to 8 Apr 2000 | 3 Apr 2000
CME | 15 Jul 2000 to 17 Jul 2000 | 12 Jul 2000
CME | 10 Aug 2000 to 13 Aug 2000 | 9 Aug 2000
CME | 28 Oct 2000 to 31 Oct 2000 | 27 Oct 2000
CME | 4 Nov 2000 to 8 Nov 2000 | 3 Nov 2000
CME | 31 Mar 2001 to 2 Apr 2001 | 30 Mar 2001
CME | 11 Apr 2001 to 14 Apr 2001 | 10 Apr 2001
CME | 21 Oct 2001 to 23 Oct 2001 | 18 Oct 2001
CME | 6 Nov 2001 to 8 Nov 2001 | 4 Nov 2001
CME | 24 Nov 2001 to 26 Nov 2001 | 21 Nov 2001
HSS | 11 Apr 2005 to 16 Apr 2005 | 10 Apr 2005
CME | 15 May 2005 to 14 May 2005 | 14 May 2005
CME | 24 Aug 2005 to 26 Aug 2005 | 20 Aug 2005
HSS | 8 Apr 2006 to 11 Apr 2006 | 7 Apr 2006
HSS | 27 Jul 2006 to 29 Jul 2006 | 26 Jul 2006
HSS | 9 Nov 2006 to 12 Nov 2006 | 8 Nov 2006
CME | 14 Dec 2006 to 16 Dec 2006 | 13 Dec 2006
HSS | 25 Oct 2007 to 28 Oct 2007 | 24 Oct 2007
HSS | 8 Mar 2008 to 16 Mar 2008 | 7 Mar 2008
HSS | 21 Apr 2008 to 27 Apr 2008 | 20 Apr 2008
HSS | 3 Sep 2008 to 9 Sep 2008 | 2 Sep 2008
HSS | 22 Oct 2010 to 27 Oct 2010 | 21 Oct 2010
CME | 4 Aug 2011 to 7 Aug 2011 | 3 Aug 2011
CME | 25 Sep 2011 to 29 Sep 2011 | 24 Sep 2011
CME | 24 Oct 2011 to 26 Oct 2011 | 23 Oct 2011
CME | 22 Apr 2012 to 26 Apr 2012 | 20 Apr 2012
HSS | 10 Apr 2012 to 14 Apr 2012 | 9 Apr 2012
CME | 14 Jul 2012 to 17 Jul 2012 | 13 Jul 2012
CME | 30 Sep 2012 to 2 Oct 2012 | 29 Sep 2012
CME | 12 Nov 2012 to 15 Nov 2012 | 11 Nov 2012
CME | 27 Feb 2014 to 1 Mar 2014 | 26 Feb 2014
CME | 11 Apr 2014 to 14 Apr 2014 | 10 Apr 2014
CME | 29 Sep 2014 to 1 Oct 2014 | 28 Sep 2014
CME | 21 Jun 2015 to 25 Jun 2015 | 20 Jun 2015
CME | 19 Dec 2015 to 23 Dec 2015 | 18 Dec 2015
CME | 18 Jan 2016 to 24 Jan 2016 | 17 Jan 2016
CME | 12 Oct 2016 to 15 Oct 2016 | 11 Oct 2016

For all three GCMs, we set up the runs in a forecast-compatible mode such that the drivers or inputs to the models (except solar EUV) depend on the actual solar wind parameters. All three GCMs require the solar wind conditions at the Earth, which are obtained from in-situ measurements. These solar wind conditions can in principle be forecasted using first principles models of the solar wind. The critical importance of predicting the IMF, and especially its Bz component, is acknowledged by the scientific community. Significant progress has been made in benchmarking CME arrival time, but predicting the structure and orientation of a CME remains challenging (Kilpua et al., 2019; Verbeke et al., 2019). Existing solar wind measurements at 1 AU and in front of the bow shock may not be the best representation of the solar wind driver for geospace modeling (Walsh et al., 2019). Thus, there are currently critical uncertainties in predicting solar wind parameters. Our modeling setup allows us to focus on evaluating the predictive capabilities of the coupled IT models themselves, independently of prediction skill for the solar wind parameters, which is beyond the scope of the current paper. Different models require different inputs or take inputs in different ways for their forecast-compatible-mode runs. However, all three GCMs require the daily F10.7 and 81-day-averaged F10.7 indices. There are currently several linear predictive approaches for solar EUV irradiance and the F10.7 index (Henney et al., 2012; Tobiska & Bouwer, 2006; Warren et al., 2017), and some of them are used operationally (e.g., Rutledge et al., 2013). There is evidence of forecast improvement relative to persistence and climatology, especially in combination with solar magnetic flux modeling or the use of solar images. Thus, we think that there is a clear path to improved prediction of F10.7 and decided to include predicted F10.7 as a part of our assessment. However, there is not yet a standard method. We utilized a simple autoregression model using 136 days before the prediction day (K. Tobiska, private communication, December, 2017). In addition, CTIPe also requires the hemispheric power index, which describes particle precipitation. This can be calculated based on an empirical relationship between the solar wind and the Kp index (Newell et al., 2008; Zhang & Paxton, 2008).

For all three GCMs, the solar wind conditions are used to specify the high-latitude electric field. In addition, GITM uses the solar wind conditions to specify the auroral precipitation pattern via the OVATION Prime model (Newell et al., 2009). Both TIE-GCM and CTIPe use the Fuller-Rowell and Evans model (T. Fuller-Rowell & Evans, 1987) to specify the auroral precipitation, which relies on the hemispheric power index. For TIE-GCM, the hemispheric power index is computed by the model itself based on the IMF Bz, the solar wind velocity, and the cross polar cap potential from the Weimer 2005 model (Weimer, 2005). GITM runs for four CME-type storms were excluded from the analysis since corresponding modeling runs were not successfully completed.

TEC evaluation is performed on native 2-D grids for each model (see Table 1). Since the custom hourly model outputs are defined on different grids, we adopt a common grid (30° in latitude and 90° in longitude) to calculate TEC metrics on an hourly basis. This common grid selection is made for two reasons. First, we choose a larger grid cell than those of the models to reflect a realistic requirement of predicting relatively large-scale TEC dynamics rather than aiming for accurate smaller-scale predictions. Second, the grid size is chosen to include an integer number of either model or GIM cells to avoid interpolation that could possibly incur additional errors. The model and GIM TEC values are averaged onto the common grid.
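The averaging onto the common grid can be illustrated with a minimal sketch. It assumes a cell-centered field stored as a (latitude, longitude) array whose cell counts divide evenly into the 30° by 90° cells, and it uses an unweighted mean; the function name `block_average`, the array layout, and the absence of area weighting are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def block_average(field, lat_factor, lon_factor):
    """Average a 2-D (lat, lon) field over non-overlapping blocks.

    lat_factor and lon_factor are the numbers of native cells per coarse
    cell in each direction (hypothetical parameters for this sketch).
    """
    n_lat, n_lon = field.shape
    if n_lat % lat_factor or n_lon % lon_factor:
        raise ValueError("native grid does not divide evenly into coarse cells")
    # Split each axis into (coarse index, index within block), then average
    # over the two within-block axes.
    blocks = field.reshape(n_lat // lat_factor, lat_factor,
                           n_lon // lon_factor, lon_factor)
    return blocks.mean(axis=(1, 3))

# Hypothetical 5-degree by 5-degree field (36 x 72 cells) averaged onto
# 30-degree by 90-degree cells (6 x 4 cells): 6 x 18 native cells per block.
fine = np.arange(36 * 72, dtype=float).reshape(36, 72)
coarse = block_average(fine, lat_factor=6, lon_factor=18)
print(coarse.shape)
```

Because each coarse cell holds an integer number of native cells, no interpolation is needed; an area-weighted mean (for example, weighting by the cosine of latitude) would be a natural refinement of this sketch.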

2.2 Metrics for the TEC Prediction Evaluation

Our approach is to use JPL GIMs (Ho et al., 1997; Iijima et al., 1999; Mannucci et al., 1998) as the baseline for global TEC estimates (TECGIM). We realize that these data cannot be considered as the “ground truth” in an absolute sense despite high accuracy of the GNSS-based technique of TEC measurements. First, there are intrinsic errors in converting slant (line-of-sight) TEC measurements to vertical measurements. Second, there are interpolation errors. Third, TEC estimations over regions with poorer coverage (over oceans) incur greater errors than measurements over regions with better coverage or over dense local networks. These errors could result in typical root-mean-square (RMS) accuracy better than 2–3 TECU in the middle and high latitudes (Hernandez-Pajares et al., 2017). Nevertheless, we use GIMs as a widely accepted and convenient measure of ionospheric state. Following improvements in GIM derivation methods, the accuracy of GIMs is being constantly evaluated. JPL GIM accuracy is comparable with other similar data sets (e.g., Roma-Dollase et al., 2018).

We define the TEC disturbance in a model, or dTECModel, as the difference between the TEC model estimate (TECModel) and a prestorm value (TECModel,prestorm) normalized to the variability of the prestorm time series of TEC (var(TECGIM,quiet)). A similar definition is used for estimating dTEC from GIM data over a 2-D grid cell at a longitude (lon), a latitude (lat), and a time (UT):
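The display equations are not reproduced in this text. A form consistent with the verbal definitions above is sketched below; the typesetting, the subscript names, and the omission of any additional model-specific factors are assumptions made for this reconstruction.

```latex
\mathrm{dTEC}_{\mathrm{GIM}}(UT,lon,lat) =
  \frac{\mathrm{TEC}_{\mathrm{GIM}}(UT,lon,lat) - \mathrm{TEC}_{\mathrm{GIM,prestorm}}(UT,lon,lat)}
       {\mathrm{var}\left(\mathrm{TEC}_{\mathrm{GIM,quiet}}(UT,lon,lat)\right)}

\mathrm{dTEC}_{\mathrm{Model}}(UT,lon,lat) =
  \frac{\mathrm{TEC}_{\mathrm{Model}}(UT,lon,lat) - \mathrm{TEC}_{\mathrm{Model,prestorm}}(UT,lon,lat)}
       {\mathrm{var}\left(\mathrm{TEC}_{\mathrm{GIM,quiet}}(UT,lon,lat)\right)}
```

Both quantities are dimensionless: a value of 2 means a TEC change twice as large as the quiet time variability in that grid cell at that UT.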

Variability of TEC (var(TECGIM,quiet(UT,lon,lat))) is introduced for each model to account for both natural variability of TEC that is not related to storm time driving and for day-to-day uncertainties in GIM. It is calculated for each storm separately to reflect seasonal and solar cycle influences. TEC variability is estimated at hourly intervals (in UT) on the common grid. It is defined as the standard deviation of absolute differences between GIM TEC and the median GIM TEC in this grid cell at a certain UT over 27 days. The median value is subtracted to reduce TEC trends and to capture TEC changes around the trend. This 27-day running median for each storm is centered on the fourteenth day before the first storm day. Meng et al. (2016) instead used a baseline of a mean quiet time TEC over several days. We think that a median TEC value over the synodic solar rotation period is a better representation of the TEC baseline in each grid cell. The solar rotation accounts for a rotation period of a fixed feature on the solar disk (active region) as viewed from the Earth. Thus, it captures an averaged solar EUV flux. Ideally, TEC variability could be better approximated if centered on a prestorm day. However, in the prediction framework, all parameters need to be defined based on observations prior to the storm. Thus, the variability is precalculated. This approach provides a simple quantitative measure of the baseline and variability of TEC over 27-day intervals in each cell and its dependence on UT. Alternatively, the median absolute deviation of TEC or interquartile range can be utilized instead of the standard deviation.
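As a minimal sketch of the variability estimate described above, the following code computes, for each UT hour and grid cell, the 27-day median and the standard deviation of the absolute differences from that median, and then normalizes a TEC difference by it. The array layout (day, UT hour, latitude, longitude), the function names, and the use of the population standard deviation are assumptions made for illustration.

```python
import numpy as np

def tec_variability(gim_tec_27d):
    """Median baseline and variability from a 27-day window of GIM TEC.

    gim_tec_27d: array of shape (27, 24, n_lat, n_lon), i.e., day, UT hour,
    and common-grid cell (hypothetical layout for this sketch).
    Returns (median, variability), each of shape (24, n_lat, n_lon).
    """
    median = np.median(gim_tec_27d, axis=0)
    abs_dev = np.abs(gim_tec_27d - median)
    # Standard deviation of the absolute differences from the median
    # (population form, ddof=0; the choice of ddof is an assumption).
    variability = np.std(abs_dev, axis=0)
    return median, variability

def dtec(tec, tec_prestorm, variability):
    """TEC disturbance normalized by the quiet time variability."""
    return (tec - tec_prestorm) / variability

# Synthetic example: random TEC values on the 6 x 4 common grid.
rng = np.random.default_rng(0)
window = 20.0 + 5.0 * rng.random((27, 24, 6, 4))
baseline, var_quiet = tec_variability(window)
storm_day = window[-1] + 10.0
print(dtec(storm_day, window[-1], var_quiet).shape)
```

A robust alternative mentioned in the text, such as the median absolute deviation, could be obtained by replacing `np.std` with `np.median` in `tec_variability`.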

Since two of the models are limited in altitude range and do not include the TEC portion of the upper ionosphere (above 600 km) and plasmasphere, we introduce a scaling factor (Scale(UT)) to account for discrepancies between the total TEC and truncated model estimates. The scaling factor is calculated from median values for prestorm conditions and depends on UT and the model. Typically, it ranges from about 1.4 to 2.4, with the largest value being around 3. It should be noted that the plasmasphere contribution to the TEC depends on local time (longitude) and latitude (e.g., González-Casado et al., 2015). Our approach is simplified by introducing a scaling factor that only depends on UT of a day or a geomagnetic disturbance timeline. Changes in the Scale(UT) values could be attributed in part to variability of the plasmasphere contribution to the TEC in GIM. Since we are focusing on modeling validation during storm periods, the strongest storm response is expected during daytime, when the plasmasphere contribution can reach up to 20% while storm time VTEC variations are expected to be larger than the missing plasmasphere contribution. Moreover, the CTIPe model includes the plasmasphere and presumably captures plasma dynamics at all altitudes. We found that scaling factors for CTIPe and TIE-GCM vary in about the same range. Thus, we expect that plasmasphere variability contributes to potential error in our approach but that other uncertainties are larger.

Figure 1 shows an example of dTECGIM and dTECModel for 13–16 December 2006 in the longitude range from 90°W to 0°W. Each panel corresponds to a latitude bin. The timeline starts with a prestorm day and corresponding dTEC is 0. TEC disturbances identified in the model trace dTECGIM well in the latitude range of 60°S to 90°S (bottom panel). TEC disturbances in GIM and the model are mismatched in the latitude range of 60°N to 30°S on the second storm day.

Figure 1. Example of dTECGIM (black) and dTECTIEGCM (red) as functions of time (in hours) for each grid box in the longitude range from 90°W to 0°W.

Prediction evaluation takes into account TEC increases and decreases within the coarse grid cells we have defined. A TEC increase is defined as an increase of TEC in a grid cell and a certain UT range during the storm day relative to the TEC value in the same grid cell and UT range on a quiet day, that is, dTECGIM(UT,lon,lat) > 0. TEC decreases are defined by the condition dTECGIM(UT,lon,lat) < 0. In ionospheric physics terms, these increases and decreases correspond to positive and negative storms, respectively. We analyze these cases separately. Our motivation is that positive and negative ionospheric storms are caused by different physical mechanisms and can potentially be modeled with different accuracy. Hereafter, we use the magnitude of dTEC for decreases and increases. We introduce disturbance levels as integer bins for dTEC threshold magnitude from 1 to 10. We compute the dTEC metric every hour (at 1, 2, 3, 4 UT, etc.) and consider a dTEC match between model and GIM when the dTEC values simultaneously exceed a threshold or TEC disturbance level. To impose realistic requirements on ionospheric storm prediction, we allow for a time mismatch of 3 hr between modeled and observed dTEC in a grid cell.
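The thresholding step can be sketched as follows: for each integer disturbance level, an hour counts as an event when the dTEC magnitude exceeds that level in the chosen direction. The function name `event_mask`, the strict inequality, and the sample values are assumptions made for this illustration.

```python
import numpy as np

LEVELS = range(1, 11)  # integer disturbance levels 1..10

def event_mask(dtec, level, kind):
    """Boolean mask of hours where dTEC exceeds `level` in one direction.

    kind: "increase" (dTEC > level) or "decrease" (dTEC < -level).
    """
    dtec = np.asarray(dtec, dtype=float)
    if kind == "increase":
        return dtec > level
    if kind == "decrease":
        return dtec < -level
    raise ValueError("kind must be 'increase' or 'decrease'")

# Hypothetical hourly dTEC values for one grid cell.
dtec_gim = np.array([0.2, 1.5, 3.2, -0.4, -2.1, -5.0])
increases = {lvl: int(event_mask(dtec_gim, lvl, "increase").sum()) for lvl in LEVELS}
decreases = {lvl: int(event_mask(dtec_gim, lvl, "decrease").sum()) for lvl in LEVELS}
print(increases)
print(decreases)
```

Applying the same mask to the model dTEC gives the second binary series needed for the event-based comparison with GIM.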

We utilize the TSS metric to evaluate storm predictions (Bloomfield et al., 2012) on an hourly basis. This is one of the widely used metrics that assess predictive skill only (Liemohn et al., 2018). A contingency table is defined based on the success of the model dTEC prediction for each grid cell and hourly UT. We thus can categorize predictions as true positive (TP), false negative (FN), false positive (FP), and true negative (TN) occurrences. For each model and storm event, TP is the number of actual TEC disturbances (identified in GIMs, or dTECGIM(UT,lon,lat)) reproduced by simulations within the time margin of 3 hr. FN is the number of actual TEC disturbances in GIMs that are not captured by a model. FP is the number of TEC disturbances identified in model runs, that is, dTECModel(UT,lon,lat), but not found in GIMs. TN is identified as an instance of either of the remaining occurrences not being found within the 3-hr window. The physical meaning of the contingency table elements is clarified in terms of hits (TP), misses (FN), false alarms (FP) or model overprediction, and no events (TN). A simple schematic of contingency table elements is shown in Figure 2:

Figure 2. Diagram for TEC prediction evaluation contingency table elements in a storm event.
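One possible way to count the contingency table elements with a 3-hr tolerance is sketched below for binary hourly event series in a single grid cell. The handling of edge cases is an assumption of this sketch: an hour with a model event but no observed event is counted as FP only if no observed event lies within the window, and every remaining hour is counted as TN. The function names are hypothetical.

```python
import numpy as np

def dilate_in_time(events, tol):
    """True at hour t if any event lies within +/- tol hours (time is axis 0)."""
    events = np.asarray(events, dtype=bool)
    n = events.shape[0]
    out = np.zeros_like(events, dtype=bool)
    for s in range(-tol, tol + 1):
        lo, hi = max(0, s), min(n, n + s)
        if hi <= lo:
            continue  # shift larger than the series length
        out[lo - s:hi - s] |= events[lo:hi]
    return out

def contingency(obs, mod, tol=3):
    """Return (TP, FN, FP, TN) for observed and modeled event series."""
    obs = np.asarray(obs, dtype=bool)
    mod = np.asarray(mod, dtype=bool)
    mod_near = dilate_in_time(mod, tol)
    obs_near = dilate_in_time(obs, tol)
    tp = int(np.sum(obs & mod_near))    # observed event matched by the model
    fn = int(np.sum(obs & ~mod_near))   # observed event missed by the model
    fp = int(np.sum(mod & ~obs_near))   # model event with no observed event nearby
    tn = int(obs.size - tp - fn - fp)   # all remaining hours (assumption)
    return tp, fn, fp, tn

# Hypothetical 16-hr series: observed events at hours 2 and 12,
# modeled events at hours 4 and 8.
obs = np.zeros(16, dtype=bool)
mod = np.zeros(16, dtype=bool)
obs[[2, 12]] = True
mod[[4, 8]] = True
print(contingency(obs, mod))
```

In this example the observed event at hour 2 is a hit (model event 2 hr later), the one at hour 12 is a miss (nearest model event 4 hr away), and the model event at hour 8 is a false alarm.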
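The TSS metric definition is

$$\mathrm{TSS} = \frac{TP}{TP + FN} - \frac{FP}{FP + TN}$$

Model evaluation approaches often do not account for false alarms but focus on the true positive rate (TPR) as the modeling success metric (Camporeale, 2019):

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
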
The TPR metric evaluates the prediction “success,” that is, the fractional number of correct predictions over the total number of actual TEC disturbances. The latter is composed of the correct predictions and missed events. The true positive rate is an intuitive measure of prediction success, and widely used metrics utilize similar approaches (Meng et al., 2016; Shim et al., 2012). We introduce this metric for easier comparison with prediction efficiency evaluations in prior studies. However, the TSS metric provides a stricter evaluation of predictive capabilities. It accounts not only for the successes but also for false positives. We believe it can provide additional insight into model performance. In this paper we evaluate and interpret results for both metrics. We distinguish between TEC increases and TEC decreases and provide statistics for the prediction of TEC increases and decreases separately. There are notable differences between the methodology of this paper and our previous analysis (Meng et al., 2016). First, we analyze storms of two types over one solar cycle. Second, we allow for a 3-hr mismatch between observed and modeled TEC disturbances. Third, we focus on large-scale TEC variations and introduce a larger grid size. Fourth, we use two metrics to evaluate prediction success. Moreover, we expand our evaluation study to use three representative models. Our analysis intends to contribute to a discussion of challenges in space weather prediction and to shed light on this ongoing discussion by outlining a well-defined use case.
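For reference, both metrics can be computed from the contingency table counts with the standard definitions TPR = TP/(TP + FN) and TSS = TP/(TP + FN) − FP/(FP + TN). The short sketch below uses hypothetical counts; the function names and the choice to return NaN for an empty denominator are assumptions of this illustration.

```python
import math

def tpr(tp, fn):
    """True positive rate: hits over all actual disturbances."""
    total = tp + fn
    return tp / total if total else math.nan

def tss(tp, fn, fp, tn):
    """True Skill Statistic: true positive rate minus false positive rate."""
    negatives = fp + tn
    fpr = fp / negatives if negatives else math.nan
    return tpr(tp, fn) - fpr

# Hypothetical counts for one model at one disturbance level.
tp, fn, fp, tn = 30, 70, 20, 880
print(round(tpr(tp, fn), 3), round(tss(tp, fn, fp, tn), 3))
```

TSS ranges from −1 to 1, and it equals TPR only when there are no false alarms. When TN is much larger than FP, as for high disturbance levels, the false positive rate is small and TSS approaches TPR.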

3 Results

3.1 Prediction Evaluation

We start with evaluating the distribution of dTEC magnitudes for the whole set of modeled storms. These numbers differ depending on storm strength. We introduce disturbance levels as integer bins for dTEC magnitude from 1 to 10 to distinguish between the number of disturbances at different magnitudes. Values of dTEC in model and GIM are considered “matched” if they are both in the same direction (decrease or increase) and are bounded by the same thresholds corresponding to a dTEC magnitude bin. The numbers are normalized to the total number of identified disturbances or dTEC. Figure 3 shows statistics for these matched disturbances. Different model statistics are shown by colors, and decreases and increases are shown by solid and dashed lines, respectively. All three models have about the same fractional numbers of disturbances matching GIM disturbances at all disturbance levels. In absolute values, Table 3 shows that the number of correctly identified decreases is higher than that of increases across all models, with TIE-GCM and CTIPe having higher numbers than GITM.
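The bin-matching rule can be illustrated with a small sketch. It assumes that level n covers n ≤ |dTEC| < n + 1, that the top level is open ended, and that magnitudes below 1 belong to no level; these bin edges and the function names are assumptions made for illustration.

```python
import numpy as np

def disturbance_level(dtec, max_level=10):
    """Integer disturbance level of |dTEC|; 0 means below the lowest level."""
    magnitude = np.abs(np.asarray(dtec, dtype=float))
    return np.minimum(np.floor(magnitude), max_level).astype(int)

def same_bin_match(dtec_gim, dtec_model):
    """True where GIM and model dTEC share direction and disturbance level."""
    dtec_gim = np.asarray(dtec_gim, dtype=float)
    dtec_model = np.asarray(dtec_model, dtype=float)
    level_gim = disturbance_level(dtec_gim)
    level_model = disturbance_level(dtec_model)
    same_direction = np.sign(dtec_gim) == np.sign(dtec_model)
    return same_direction & (level_gim == level_model) & (level_gim >= 1)

# Hypothetical dTEC values for five grid cell and UT combinations.
gim = np.array([1.2, -2.7, 3.1, 0.4, 5.5])
model = np.array([1.9, -2.1, -3.4, 0.6, 4.2])
print(same_bin_match(gim, model))
```

In this example the first two pairs match, the third differs in direction, the fourth is below level 1, and the fifth falls in different bins.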

Figure 3. The number of total TEC disturbances per dTEC magnitude estimated from three model runs. The values for each model and disturbance type are normalized to the total number of disturbances at all levels over all storms.

Table 3. The Number of Total Identified Disturbances
Disturbance type | GITM | TIE-GCM | CTIPe
Decrease | 24,718 | 28,159 | 28,578
Increase | 20,271 | 24,332 | 24,669

The largest number of dTEC occurrences is identified for the disturbance levels between 1 and 4. The latter corresponds to a TEC change four times the TEC variability level estimated for this day, UT, and 2-D grid cell. The normalized number of disturbances decreases below 10% for larger TEC changes. This reflects the lower number of very large TEC disturbances compared to moderate disturbances, even in strong storms. The numbers of occurrences of TEC decreases and increases at low disturbance levels are close, with TEC decrease occurrences being somewhat more numerous. There are slightly more occurrences of TEC increases than TEC decreases for larger disturbances. We will evaluate both TSS and TPR at each disturbance level separately. It is reasonable to assume that statistical results are more reliable at lower disturbance levels.

Figure 4 presents results of evaluation of TEC decreases and increases in different disturbance ranges across three models and aggregated over 35 selected storms.

Figure 4. Metrics for TEC prediction success evaluated for different disturbance levels: (a) TSS and (b) TPR. TEC decreases and increases are shown by solid and dashed lines, respectively. Different models are color coded. See text for details.

First, we discuss prediction performance in TSS. TEC decreases are shown by solid lines in Figure 4a. TSS for all models is about 0.1 or lower for low to moderate disturbance levels. TIE-GCM (red) demonstrates the best performance with an almost flat skill score. CTIPe (blue) shows an intermediate performance until the disturbance level of about 5, where it is overtaken by GITM. GITM (green) shows lower TSS scores at low disturbance levels and intermediate performance at moderate to strong disturbance levels. TEC increases, shown by dashed lines, are best captured by TIE-GCM, while CTIPe shows a lower TSS metric. GITM underperforms in this metric. Note the increase of prediction skill at disturbance levels >7. We will discuss it below.

Second, we focus on prediction success and do not include the false alarm rate in the metric. TPR is shown in Figure 4b. The best prediction performance of >50% is demonstrated by CTIPe for TEC decreases at low disturbance levels. The TPR decreases at larger disturbance levels. TIE-GCM successfully models about 35% of TEC decreases at low disturbance levels. The success rate decreases to about 10% at disturbance levels >5. GITM prediction performance decreases from about 30% at low levels to about 5% at large disturbance levels. Prediction skills for TEC increases are lower than for corresponding TEC decreases. TIE-GCM demonstrates better overall performance than GITM. CTIPe is the most successful in modeling TEC decreases, while it underperforms in modeling TEC increases. We will analyze elements of the contingency table over all modeled storms to understand differences between the TSS and TPR metrics and to evaluate model performance in detail.

Figure 5 visualizes elements of the contingency table for TIE-GCM (a), GITM (b), and CTIPe (c) model runs separately. The numbers of TP (red), FP (blue), and FN (gray) occurrences are shown at each disturbance level. TN occurrences are much larger than other contingency table elements, especially for larger disturbance levels, and are not shown. Filled and empty bars correspond to TEC decreases and increases, respectively. Analysis of TIE-GCM-based prediction performance shows that both TEC decreases and increases are underpredicted (identified as FN or misses) at low levels of disturbances. Correct predictions (TP or hits) and overpredictions by the model (FP or false alarms) are nearly equal at the lowest disturbance level. TP numbers decrease faster than FP numbers. The model tends to overpredict at disturbance levels above 5. The actual number of dTEC occurrences decreases dramatically with disturbance strength, and the statistics are likely poor for levels >6. Both TEC increases and decreases show similar statistics.
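As an illustration of how the contingency table elements discussed above can be tallied, the following minimal Python sketch counts TP, FP, FN, and TN from two boolean arrays. It assumes that every grid point and time step has already been flagged as containing (or not containing) a disturbance of a given level in the GIM reference and in the model output. The function name `contingency_counts` and the flag arrays are hypothetical and are not taken from the analysis code used in this study.

```python
import numpy as np

def contingency_counts(observed, predicted):
    """Count contingency table elements from boolean event flags.

    observed  -- True where the reference data (here assumed to be GIM)
                 contains a disturbance of the chosen level
    predicted -- True where the model output contains such a disturbance
    Both inputs are hypothetical flag arrays of identical shape.
    """
    obs = np.asarray(observed, dtype=bool)
    pred = np.asarray(predicted, dtype=bool)
    if obs.shape != pred.shape:
        raise ValueError("observed and predicted must have the same shape")
    return {
        "TP": int(np.sum(obs & pred)),    # hits
        "FP": int(np.sum(~obs & pred)),   # false alarms (overprediction)
        "FN": int(np.sum(obs & ~pred)),   # misses (underprediction)
        "TN": int(np.sum(~obs & ~pred)),  # correct rejections
    }

# Example with six made-up samples
example = contingency_counts([1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0])
```

Counting in this way makes explicit why TN dominates when disturbances are rare: most samples are flagged False in both arrays.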

Distributions of contingency table elements TP, FP, and FN by disturbance level for TIE-GCM (a), GITM (b), and CTIPe (c) models. See text for details.

GITM-based prediction evaluation (Figure 5b) shows similar dynamics to TIE-GCM-based prediction with predominance of FN elements. TP occurrences are lower than FP occurrences at all disturbance levels. TEC decreases have higher TP numbers than TEC increases. FP elements show an opposite trend. The model tends to overpredict at large disturbance levels especially for TEC increases.

CTIPe-based results (Figure 5c) show better prediction performance for TEC decreases. FP (false alarm) occurrences of TEC decreases dominate at all disturbance levels, whereas FN (misses) dominate for TEC increases. The success of CTIPe-based prediction of TEC decreases is the highest among all models at the lowest disturbance level. This agrees with the TPR metric in Figure 4b. Still, increased FP (model overprediction) occurrences result in a slight decrease of the TSS metric relative to TPR. Focusing on TP (red) and FN (gray) occurrences explains the decreasing trends for TPR in Figure 4b. On the other hand, large values of FN at small disturbance levels can explain the nearly flat trends of TSS at <6 in Figure 4a. Overall, the number of TP for TEC decreases is larger than that for TEC increases.

TIE-GCM and GITM results show increases of the prediction success rates for TEC decreases at large disturbance levels (shown by the solid lines in Figure 4). TPR is defined by the numbers of TP and FN (misses) occurrences. Figures 5a and 5b show sharp decreases of TP and FN occurrences for TEC decreases at large disturbance levels for the two models. The FN occurrences decrease more steadily in the case of TEC increases, and their numbers are larger than those for TP occurrences at large disturbance levels. This explains the increase of TPR values seen for TEC decreases and implies that TIE-GCM and GITM tend to underpredict large TEC increases to a more significant degree than large TEC decreases. All elements of the contingency matrix, especially TP and FP occurrences, decrease at large disturbance levels, where the number of analyzed TEC disturbances decreases dramatically (Figure 3). This result is in agreement with statistical properties of rare events (e.g., Ferro & Stephenson, 2011).
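The relation between the two metrics can be illustrated with a short Python sketch. It assumes the standard definitions TPR = TP/(TP + FN), FPR = FP/(FP + TN), and TSS = TPR - FPR. The class and function names are hypothetical, and the counts are invented for illustration; they are not read from Figure 5.

```python
from dataclasses import dataclass
import math

@dataclass
class Contingency:
    tp: int  # hits
    fp: int  # false alarms
    fn: int  # misses
    tn: int  # correct rejections

def true_positive_rate(c: Contingency) -> float:
    """TPR = TP / (TP + FN); NaN when no events were observed."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else math.nan

def false_positive_rate(c: Contingency) -> float:
    """FPR = FP / (FP + TN); NaN when there are no observed non-events."""
    denom = c.fp + c.tn
    return c.fp / denom if denom else math.nan

def true_skill_statistic(c: Contingency) -> float:
    """TSS = TPR - FPR, ranging from -1 to 1."""
    return true_positive_rate(c) - false_positive_rate(c)

# Invented counts: many correct rejections, more misses than hits
sample = Contingency(tp=30, fp=20, fn=70, tn=880)
tpr = true_positive_rate(sample)
tss = true_skill_statistic(sample)
```

With a large TN count the false positive rate stays small, so TSS falls only slightly below TPR. When both TP and FN become small, as for rare strong disturbances, a change of a few counts moves TPR substantially, which is why the metrics at the largest disturbance levels should be read with caution.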

We intercompare successful prediction performances across different models by showing distributions of TP (hits) occurrences with a “sunburst diagram.” Figure 6 (left-hand panels) shows distributions of correct prediction occurrences of TEC decreases (a) and TEC increases (b). The inner circle shows the distribution of dTEC by magnitude, with disturbance levels indicated by the numbers. The diagram restates that the largest number of disturbances (TEC increases or decreases) occurs at lower disturbance levels (from 1 to 4), and the number of dTEC with magnitudes above 4 decreases dramatically. The outer circle shows how TP occurrences are partitioned among TIE-GCM, GITM, and CTIPe results, that is, the relative contribution of each model to the prediction “hits” in each disturbance magnitude range. The analysis of model successes indicates the best performance of CTIPe in predicting TEC decreases at low disturbance levels (seen at dTEC from 1 to 4 indicated in the adjacent inner circle), closely followed by TIE-GCM and GITM. This behavior persists through level 4. TIE-GCM leads in successful prediction of TEC increases at low disturbance levels, followed by GITM and CTIPe. It is clear from Figure 6 that there are fewer TPs and that the statistics are much poorer for dTEC > 4 occurrences. The two right-hand panels show results for dTEC from 5 to 10 only. CTIPe shows the best results for modeling TEC decreases up to Disturbance Level 6, while TIE-GCM shows higher TP occurrences for dTEC from 7 through 10. TIE-GCM shows the largest number of TP occurrences in TEC increases as compared with other models for all disturbance levels. It is followed by GITM (for dTEC from 1 through 4) and CTIPe for higher disturbance levels.

Distribution of TP occurrences of TEC decreases (a) and TEC increases (b) across three models and different disturbance levels. Disturbance level is indicated by color. The numerical value of disturbance level is indicated in the second ring from the outside. The left panels show all disturbance levels, whereas the right panels show more detail for Disturbance Levels 5–10 separately for TEC decreases (c) and TEC increases (d).

3.2 Control Cases

To evaluate the importance of physics-based modeling of ionospheric storm dynamics for TEC prediction, we analyze prediction success in Control Case 1. It is designed to evaluate to what extent predictive capability depends on physics-based modeling of storm dynamics. To investigate this in a simple test case, we assume that modeled TEC does not change throughout a storm and remains the same as on the first storm day. We utilize the same model runs but use results for quiet days and storm start days only. The GIM data should reflect storm dynamics. Will the prediction success be about the same for this persistency test as for the dynamically modeled TEC? Model-based disturbances for all storm dates are defined as
Prediction metrics for the Control Case 1 are shown in Figure 7.
Metrics for TEC prediction success for the Control Case 1 evaluated for different disturbance levels: (a) TSS and (b) TPR. TEC decreases and increases are shown by solid and dashed lines, respectively. Different models are color coded. See text for details.

The Control Case shows that predictive capability is significantly degraded as compared with actual forecast-compatible runs (Figure 4). Specifically, TEC decreases and increases show a TSS lower than 5% (see Figure 7a). Prediction success (i.e., not including FP occurrences) is reduced by about 2.5 times for TEC decreases in CTIPe and by about 1.7 times in TIE-GCM at the lowest disturbance level (see Figure 7b). The prediction success is reduced by about 1.4 times for TEC increases in TIE-GCM. The success rate for TEC increases in CTIPe is about the same as previously. The success rate is reduced for all modeling results at the moderate disturbance level of 4 except for prediction of TEC increases in CTIPe, which remains at about the same low value. Figure 8 shows TIE-GCM results for the number of total TEC disturbances that are matched with corresponding disturbances in GIM, separately according to disturbance level in the Control Case 1. These values are normalized to the total number of TEC disturbances identified in the model results for the test case. Next we compare these numbers with the corresponding distribution for the prediction runs in Figure 3. In the Control Case the distribution is heavily weighted toward low disturbance levels at about 0.63 for TEC decreases as compared to about 0.4 in the forecast-compatible runs. The number of TEC decreases at larger disturbance levels is smaller than that for forecast-compatible runs, and the number of potentially large TEC disturbances is reduced. The number of TEC increases starts dominating over the number of TEC decreases at a disturbance level of 2 and thereafter maintains about the same numbers of occurrences in the Control Case 1 and forecast-compatible TIE-GCM runs. Thus, the lack of accounting for storm dynamics decreases predictive capability of TEC in physics-based modeling and reduces the number of large TEC disturbances. At the same time, this test indicates that physics-based models reproduce TEC changes better than simple persistence of TEC values throughout a storm.
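The persistence assumption behind Control Case 1 can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that a model disturbance is the simple difference between storm-time TEC and a quiet-time reference map; the disturbance definition actually used in this study may differ. The function name `persistence_dtec` and the array layout (day, latitude, longitude) are hypothetical.

```python
import numpy as np

def persistence_dtec(tec_storm, tec_quiet):
    """Disturbance maps under a persistence assumption.

    tec_storm -- model TEC, shape (n_days, n_lat, n_lon); only the first
                 storm day is actually used
    tec_quiet -- quiet-time reference TEC, shape (n_lat, n_lon)

    ASSUMPTION: a disturbance is taken here as a plain difference from the
    quiet-time map. This is an illustrative choice, not the paper's formula.
    """
    tec_storm = np.asarray(tec_storm, dtype=float)
    tec_quiet = np.asarray(tec_quiet, dtype=float)
    d_first_day = tec_storm[0] - tec_quiet
    # Repeat the first-day disturbance for every storm day: the modeled
    # TEC is held fixed, so storm dynamics are deliberately ignored.
    return np.repeat(d_first_day[np.newaxis, ...], tec_storm.shape[0], axis=0)

# Tiny synthetic example: 3 storm days on a 2 x 2 grid
storm = np.arange(12, dtype=float).reshape(3, 2, 2)
quiet = np.ones((2, 2))
d_persist = persistence_dtec(storm, quiet)
```

Comparing metrics computed from such persisted maps with those from the full runs isolates the contribution of modeled storm dynamics, since the reference data still evolve from day to day.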

The number of total TEC disturbances per dTEC magnitude for the Control Case 1 estimated from TIE-GCM runs. It is normalized to the total number of disturbances over all storms.

Control Case 2 was developed to address the influence of uncertainties in GIM, albeit small, on prediction success evaluation. The largest intrinsic uncertainties of the GIM data product arise from the nonuniform distribution of GPS ground sites and from interpolation errors. Ho et al. (1997) showed that errors grow with distance to the nearest site. Thus, we select a grid cell over the European sector with densely distributed GPS sites (from 0° to 90° in East longitude and from 30° to 60° in latitude) and calculate the prediction metrics (1–3) for TIE-GCM runs in this particular cell. Figure 9a shows a TSS between 0.1 and 0.2 for the lower disturbance levels, which is consistent with TSS for global forecast-compatible runs (see Figure 4a). Panel (b) shows that the TPR reaches 0.3 for TEC increases and is about 0.45 for TEC decreases at low disturbance levels. The success rate for prediction of TEC increases in this cell is comparable with the success rate for global TIE-GCM runs. However, TEC decreases are predicted somewhat better in this cell than in global runs, by about 1.3 times. Statistics for larger disturbances with dTEC > 4 fall below 100 occurrences (see panel c). There are only 15 occurrences of dTEC decreases and 27 occurrences of dTEC increases at the disturbance level of 7. Even though prediction success in this grid cell at larger disturbances appears higher as compared to the global evaluation, we cannot draw conclusions based on the poor statistics.
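Restricting the evaluation to such a cell amounts to masking the latitude-longitude grid before the contingency elements are counted. The Python sketch below shows one way this could be done. It assumes regular one-dimensional latitude and longitude axes in degrees, with latitude taken as northern; the function name `region_mask` and the coarse example grid are hypothetical and do not correspond to the grid of any of the models or of GIM.

```python
import numpy as np

def region_mask(lats, lons, lat_range=(30.0, 60.0), lon_range=(0.0, 90.0)):
    """Boolean mask, shape (n_lat, n_lon), selecting a lat-lon box.

    Defaults follow the cell described in the text (0-90 deg East,
    30-60 deg latitude, ASSUMED here to be northern latitude).
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    lat_ok = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lon_ok = (lons >= lon_range[0]) & (lons <= lon_range[1])
    # The outer product of the two 1-D selections gives the 2-D box
    return np.outer(lat_ok, lon_ok)

# Hypothetical coarse grid, used only to demonstrate the masking
lats = np.arange(-90, 91, 30)    # 7 latitudes
lons = np.arange(-180, 180, 45)  # 8 longitudes
mask = region_mask(lats, lons)
n_selected = int(mask.sum())
```

The mask can then be applied to the event flags of both the model and the reference maps, so that the metrics are computed from samples inside the box only. The smaller sample size is the price of the reduced reference uncertainty.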

Results for the Control Case 2 estimated from TIE-GCM runs: (a) TSS, (b) TPR, and (c) the number of total TEC disturbances per dTEC magnitude.

4 Summary of the Results

Forecast-compatible modeling runs of three GCMs with multiday lead time were performed for 35 storms, both CME and CIR/HSS type. We applied statistical evaluation of TEC prediction success based on an event-based approach and by identifying hits, misses, and false alarms in model outputs (Figure 2). The JPL GIM data set was used as the reference data set. The main results are as follows:
  1. The number of dTEC disturbances decreases with the disturbance level, with TEC decreases slightly dominating at lower disturbance levels and TEC increases dominating at larger disturbance levels (see Figure 3). These statistics could reflect predominance of TEC increases for stronger storms. The relative number of events decreases dramatically for dTEC > 5.
  2. At lower disturbance levels, CTIPe is the most successful in prediction of TEC decreases (with TPR at about 0.5) and TIE-GCM shows the best performance in prediction of TEC increases (TPR of about 0.3); see Figure 6. Accounting for false positives (false alarms) decreases the prediction success metrics (compare TPR and TSS in Figure 4). CTIPe is somewhat less accurate than the other two models in prediction of TEC increases at low disturbance levels, with a large number of false negatives (misses). TIE-GCM and GITM tend to underpredict large TEC increases to a larger degree than large TEC decreases. We should note that different models use different drivers and boundary conditions, for instance, the Fuller-Rowell and Evans model versus Ovation Prime for particle precipitation. These drivers can be estimated with varying degrees of accuracy in our forecast-compatible approach. The impact of prediction uncertainty in drivers on prediction success is not well understood. Moreover, further study is needed to evaluate the impact of the grid size on predictive capability for each model. This paper is one of the initial efforts to evaluate a TEC prediction framework utilizing GCMs.
  3. Detailed analysis of the contingency matrix elements shows the importance not only of model false negatives (misses) but also of model false positives (overpredictions) for the overall picture of model prediction capability, as was previously indicated for TEC analysis by Meng et al. (2016). We suggest using a more comprehensive skill score (such as TSS) to fully account for all contingency matrix elements (see also Lopez et al., 2007; Pulkkinen et al., 2013). Figure 5 shows that false positive instances in either TEC increases or TEC decreases start dominating at larger disturbance levels. This warrants further analysis of why all three models predict larger TEC changes than observed in the reference data set.
  4. Control Case 1 shows that forecast-compatible modeling during disturbed periods performs better than a simple persistence of TEC values (see Figures 4 and 7). This simple test shows promise for use of physics-based modeling in prediction with multiday lead time.
  5. We explored the influence of potential GIM errors on the prediction evaluation. Results of the Control Case 2 show that prediction metrics calculated globally and over densely distributed GPS ground sites are consistent overall with each other at the low to moderate disturbance levels (compare Figures 4 and 9). Results for large disturbance levels are not conclusive because of poor statistics.

5 Discussion of Sources for TEC Prediction Uncertainties

Our results show that the maximum TEC prediction success (or the true positive rate) is about 0.5. This is in general agreement with other efforts of evaluation of geospace predictions based on modeling, for example, for a similar Heidke Skill Score by Pulkkinen et al. (2013) and another estimation of prediction error (Andreeva & Tsyganenko, 2019). We should note that our prediction evaluation results and model ranking could look different if a different metric, for example, RMS or prediction efficiency (Shim et al., 2012), is used. The evaluation of prediction capability using a single metric should not affect the perception of the model's usefulness. We suggest that no single metric is capable of robustly quantifying a model's capability. A robust standardized assessment of predictive capabilities of IT models with a combination of standard metrics should be developed following the approaches of Liemohn et al. (2018) and Shim et al. (2017).

Another challenge of space weather prediction is to identify a system variable to predict. We selected TEC as a widely used and readily available parameter that characterizes the ionospheric state. However, prediction of TEC maxima occurrences in a certain time window and spatial range might have better practical applications than overall prediction of global TEC on a grid. There could be several reasons that potentially limit the success rate of TEC predictions with multiday lead time using the physics-based models.

First, any space weather prediction will be limited by the uncertainties of the input variables used to predict, often the solar wind characteristics. We used a predicted F10.7 index, which is a proxy for the true spectrum and approximates the EUV driver in the models (see analysis of EUV importance for predictions in Meng et al., 2020). However, we used the actual solar wind data in our forecast-compatible modeling setup. We believe it is important to establish prediction limitations of the IT models even in this idealized case.

Figure 10 shows several examples of forecasted F10.7 values for modeled events. Results for two HSS storms are shown in the top panel, and results for two CME storms are shown in the bottom panel. The F10.7 forecast begins on the day when the blue and black symbols start to separate from each other. Both daily F10.7 and 81-day averages are used as inputs into the GCMs. Forecasted values of 81-day averages and observation-based estimates track each other relatively well for all modeled storms, showing maximum differences of up to 20 units for the CME-type storm of September 2011 and the HSS-type storm of April 2012 (not shown). Daily forecasted and actual values for the modeled storms can show larger discrepancies. The previous Solar Cycle demonstrated irregular behavior of the F10.7 index, especially throughout the rising and the main phases (see NOAA's Solar Cycle progression on https://www.swpc.noaa.gov/products/solar-cycle-progression). An autoregression model used in this study may not be sufficient to capture the complex variability of F10.7 during that Solar Cycle. This uncertainty in forecasted F10.7 input is a part of our predictive capability assessment.
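To make the idea of an autoregressive F10.7 forecast concrete, the following Python sketch fits a generic AR(p) model with an intercept by ordinary least squares and iterates it forward. This is a textbook formulation offered as an assumption; the order, fitting procedure, and training interval of the autoregression model used in this study are not specified here, and the function names `fit_ar` and `forecast_ar` and the synthetic series are hypothetical.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares fit of x[t] = c + a1*x[t-1] + ... + ap*x[t-p].

    Returns the coefficient vector [c, a1, ..., ap].
    """
    x = np.asarray(series, dtype=float)
    if len(x) <= order:
        raise ValueError("series must be longer than the AR order")
    # Each row holds a constant term and the lags, most recent lag first
    rows = [np.concatenate(([1.0], x[t - order:t][::-1]))
            for t in range(order, len(x))]
    design = np.array(rows)
    target = x[order:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def forecast_ar(series, coef, steps):
    """Iterate the fitted model forward, feeding forecasts back as inputs."""
    order = len(coef) - 1
    history = list(np.asarray(series, dtype=float)[-order:])
    forecast = []
    for _ in range(steps):
        lags = history[-order:][::-1]  # most recent value first
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        forecast.append(nxt)
        history.append(nxt)
    return np.array(forecast)

# Synthetic, noise-free series obeying x[t] = 10 + 0.9*x[t-1]
synthetic = [150.0]
for _ in range(29):
    synthetic.append(10.0 + 0.9 * synthetic[-1])
coef = fit_ar(synthetic, order=1)
three_day = forecast_ar(synthetic, coef, steps=3)
```

Because each forecast step is fed back as input, errors accumulate with lead time, and a model of this kind relaxes smoothly toward its long-term mean. It therefore cannot reproduce abrupt day-to-day changes in solar activity, which is consistent with the larger discrepancies noted above for daily values.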

Comparison of daily F10.7 (dots) and 81-day averages (triangles) used in forecast-compatible runs (blue) and actual values derived from observations (black). Top panels show results for HSS-type storms (April 2005 and 2006), and bottom panels show results for two CME-type storms (August and September 2011).

To get a glimpse of our prediction evaluation sensitivity to F10.7, we selected two CME-type storms from those shown in Figure 10. The August and September 2011 storm intervals demonstrate the smallest and largest differences between the forecasted and actual F10.7 values, respectively. We performed additional model runs with actual F10.7 values for these two intervals. Figures 11a and 11b show TSS for the September 2011 storm interval. Results obtained from the model runs with forecasted F10.7 (as above in the paper) are shown by solid lines for TEC decreases (panel a) and by dashed lines for TEC increases (panel b). Results obtained from model runs with actual F10.7 are indicated by filled dots. There were no disturbances identified at large disturbance levels (>7). Overall, the prediction success metric for TEC decreases is higher when actual F10.7 values are used. This is mostly achieved by either increasing TP occurrences (for TIE-GCM and CTIPe) or decreasing FN occurrences (for GITM). The TSS values for model runs with actual F10.7 exceed the typical success rates shown in Figure 4a. The comparison between TSS estimates for TEC increases in model results with actual and forecasted F10.7 is not very clear. Forecast success for CTIPe and GITM results is low for all model runs. TIE-GCM appears to have better success with forecasted F10.7 values than with actual F10.7 values. The numbers of TEC decreases and increases in observations were 1029 and 602, respectively. Thus, the statistics of TEC increases are about half those of TEC decreases, which may introduce uncertainty in the results. TSS estimates for the August 2011 storm interval are shown in panels (c) and (d). Values of F10.7 during this interval are lower than for the previous interval, and there is less overall difference between forecasted and actual F10.7 values. Unfortunately, CTIPe runs with actual F10.7 values failed for this event. The forecast success rate for TEC decreases in TIE-GCM and GITM model runs is low, and there is not a large difference in TSS between runs with forecasted and actual F10.7. TSS for TEC increases in TIE-GCM runs is higher than in GITM runs. However, there is not much difference in success rate between runs with forecasted and actual F10.7 values. This comparison indicates the importance of an accurate F10.7 forecast for overall TEC forecast success.

Comparison of TSS calculated from model runs with actual F10.7 (lines) and forecasted F10.7 (dots). TSS for the September 2011 storm event is shown in panel (a) for TEC decreases (solid lines) and in panel (b) for TEC increases (dashed lines). TSS results for the August 2011 storm event are shown in panels (c) and (d). Different colors correspond to different models.

Second, our approach for prediction evaluation has certain limitations. The GCMs provide TEC within different altitude ranges (see Table 1), while GIM evaluates TEC up to a GPS satellite orbit. Only CTIPe includes the plasmasphere, which is beneficial for modeling ionospheric uplift during large storms. A scaling factor was introduced for each model and each storm to account for potential underestimation of TEC. In forecast-compatible runs this scaling was estimated based on prestorm or quiet time values and was assumed to be independent of latitude and longitude. The scaling factor can be different for storm times, in particular during strong geomagnetic storms. This introduces additional uncertainty in our analysis. However, we do not expect a major change in our statistical results because of the large number of moderate storm time TEC disturbances and TEC decreases (see Figure 3 and Table 3).
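A scaling of this kind can be illustrated with the following Python sketch. It assumes, for illustration only, that the factor is the ratio of globally averaged quiet-time reference TEC to globally averaged quiet-time model TEC. The exact estimator used in this study is not reproduced here, and the function names and synthetic maps are hypothetical.

```python
import numpy as np

def quiet_time_scale(gim_quiet, model_quiet):
    """Single scaling factor per model and storm.

    ASSUMPTION: the ratio of unweighted global means of quiet-time maps.
    One number is returned, so the factor is independent of latitude
    and longitude, as stated in the text.
    """
    gim_mean = float(np.mean(gim_quiet))
    model_mean = float(np.mean(model_quiet))
    if model_mean == 0.0:
        raise ValueError("model quiet-time TEC has zero mean")
    return gim_mean / model_mean

def apply_scale(model_storm, scale):
    """Scale storm-time model TEC with the quiet-time factor."""
    return scale * np.asarray(model_storm, dtype=float)

# Synthetic quiet-time maps (TEC units) on a 2 x 3 grid
gim_quiet = np.full((2, 3), 20.0)
model_quiet = np.full((2, 3), 16.0)
scale = quiet_time_scale(gim_quiet, model_quiet)
scaled_storm = apply_scale(np.full((2, 3), 24.0), scale)
```

Holding the factor fixed throughout the storm is the simplification discussed above: if the fraction of TEC outside the model altitude range changes during the storm, a quiet-time factor will no longer compensate for it exactly.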

A fundamental problem with IT prediction is the uncertainty in specification of the GCM drivers (e.g., Chartier et al., 2013). As noted in Siscoe and Solomon (2006), the IT is strongly driven and specifying the drivers accurately is critical. Empirical models used for the forecast-compatible runs capture only general variability at the magnetosphere-IT interface. It is not well understood how large the impact of driving at smaller scales on TEC is. There are limitations on the scales of temporal and spatial variability of the IT system that can currently be predicted (Verkhoglyadova et al., 2016). Future efforts on resolving coupling at smaller temporal and spatial scales could be important for understanding and predicting the IT system (Deng et al., 2009; Lu et al., 1995; Verkhoglyadova et al., 2017). However, even if the appropriate scales are modeled, there is the problem of how to determine the drivers at these scales with multiday lead time in the absence of contemporaneous measurement. Even in the best case, a climatological mean driver pattern could be input, but certainly not the details of the spatial driver variations at these small scales.

Another possible explanation for the lack of good correlations is that the existing codes have not put in proper subroutines for the inclusion of prompt penetration electric fields (PPEFs) for storm time responses of the ionosphere. It is well known that PPEFs occur at the dayside and nightside near-equatorial ionospheres and may be the major storm time ionospheric effect causing TEC variations (Mannucci, 2005; Tsurutani, 2004). The above hypothesis can be tested by inclusion of PPEF subroutines in the codes, with a retesting of codes after implementation.

GCMs are constantly evolving and improving. Substantial efforts are under way to couple IT models and magnetosphere models for a more comprehensive description of the geospace dynamics. Chartier et al. (2016) demonstrated improved IT state prediction 1 hr ahead by assimilating ionospheric measurements into TIE-GCM. A promising direction for predicting the IT system is a combined approach of data-based machine learning guided by physics-based modeling, the so-called “gray box” approach (Camporeale, 2019). Consideration must also be given to the possibility of intrinsic stochasticity of the coupled geospace system. How reliable is our “deterministic” approach to IT modeling, and does it capture actual IT processes? All these caveats and potential research directions indicate multiple challenges in the fast growing field of space weather forecasting.


Portions of work were performed at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with NASA. Sponsorship of the Heliophysics Division of the NASA Science Mission Directorate is gratefully acknowledged. The work performed by the CCMC is supported by grants from NSF Space Weather Program. R. McGranaghan was supported by the NASA Living With a Star Jack Eddy Postdoctoral Fellowship Program, administered by the University Corporation for Atmospheric Research and coordinated through the Cooperative Programs for the Advancement of Earth System Science (CPAESS). The authors would like to thank T. Matsuo for insightful discussion. We thank K. Tobiska (Space Environment Technologies, https://www.spacewx.com) for helpful discussions on F10.7 forecasting. Solar wind parameters and activity indices were taken from the OMNI database. The modeling runs were performed at the CCMC through their public Request a Model Run interface (https://ccmc.gsfc.nasa.gov/requests/requests.php).

Data Availability Statement

Model outputs can be accessed at the CCMC data depository and visualized with available online tools. The list of runs and corresponding links are provided in the online dataset (https://doi.org/10.6084/m9.figshare.11496663.v1) associated with the paper. The paper utilized the GIM data set provided at https://sideshow.jpl.nasa.gov/pub/iono_daily/gim_for_research/jpli/.