On Imaging South African Regional Ionosphere Using 4D‐var Technique

One of the major research areas in the space weather community is the ability to understand, characterize, and model a time‐space variant ionosphere through which transionospheric signals propagate. In this paper a strong constraint four‐dimensional variational data assimilation (4D‐var) technique was used to more accurately estimate the South African regional ionosphere (bound latitude 20–35°S, longitude 20–40°E, and altitude 100–1,336 km). The altitude was capped to the JASON‐1 satellite orbital altitude for the purpose of eliminating the plasmasphere contribution hence reducing the computation expense. Background densities were obtained from an empirical internationally recognized ionosphere model (IRI‐2016) and propagated in time using a Gauss‐Markov filter. Ingested data were STECs (slant total electron content) obtained from the South African Global Navigation Satellite System receiver network (TrigNet). The vertically integrated electron content was validated using Global ionosphere Maps and JASON‐3 data over the continent and ocean areas, respectively. Further, vertical profiles after assimilation were compared with data from a network of ground‐based regional ionosondes Hermanus (34.25°S, 19.13°E), Grahamstown (33.3°S, 26.5°E), Louisvale (21.2°S, 28.5°E), and Madimbo (22.4°S, 30.9°E). Results show that assimilation of STEC data has a profound improvement on the estimation of both the horizontal and vertical structures during quiet and storm periods. Accuracy of the horizontal structure decreases from the continent toward the ocean area where GPS receivers are less abundant. Superiority of assimilating STEC is best pronounced during daytime especially when estimating maximum electron density of the F2 layer (NmF2), with a 60% root‐mean‐square error improvement over the background values.


Introduction
Over recent decades, particularly in the communication sector, technological systems have become more interdependent with a demand for high precision both on a global and regional scales. Consequently, understanding, characterizing, and modeling a time-space variant ionosphere through which transionospheric signals propagate have taken a rather high profile within the scientific and engineering communities. Over the years, instruments such as ionosondes, incoherent scatter radars, rocket sounders have been used in probing and monitoring the Earth's ionosphere (e.g., Davies, 1990;Hunsucker, 1991;Seddon, 1953). However, due to the shortcomings of these instruments, for example, the high cost of maintenance and limitation in temporal and spatial coverage, physics-based and empirical models have been developed and published with the aim of describing the general nature of terrestrial ionosphere (e.g., Fuller-Rowell & Rees, 1980;Huba et al., 2000Huba et al., , 2008Bilitza, 2001;Ridley et al., 2006, Strangeways et al., 2009: To date, the inclusion of physics-based models into operational services is still limited due to the high computation cost and inadequacy of the drivers involved in running such models. In addition, the complexity of physical and chemical processes that govern terrestrial ionosphere dynamics is far from being fully understood and represented in physics-based models. Empirical models on the other hand have a low computational expense compared to physics-based models and do not require a full understanding of the underlying physics but underperform over regions with data scarcity during the early stages of model development. To make global physics-based and empirical models more tractable, they are developed or run on a coarse grid, which limits their ability to describe detailed localized features of the ionosphere. For this reason, most often, global models cannot be used in nowcasting and forecasting of the regional ionosphere with high precision in demand. Over the South African region, different researchers/groups have attempted to develop regional models that can more accurately define the local ionosphere (e.g., Habarulema, 2010;McKinnell, 2002;Okoh et al., 2010;Opperman, 2007;Ssessanga et al., 2014;Ssessanga et al., 2015). However, most of these models have until now focused on the simplistic 2-D horizontal structure of the ionosphere, in which important information about ionospheric vertical dynamical changes is lost.
Recently, tomography techniques have widely been applied in regional ionosphere imaging in order to obtain a more coherent 3-D picture (e.g., Seemala et al., 2014;Saito et al., 2017;Ssessanga et al., 2015Ssessanga et al., , 2017. However, tomography techniques fail to adequately spread observation information within the grid, particularly to remote areas where measurements are lacking. Subsequently, a more accurate methodology of assimilating observations into models has taken the center stage, wherein observation information is spread within the region of interest while considering the underlying physics (e.g., Buresova et al., 2009;Bust et al., 2004;Chen et al., 2016;Durazo et al., 2017;Hajj et al., 2004;Pi et al., 2004).
In this paper, we present a preliminary study that assess the capability of an in-house developed fourdimensional (in space and time) data assimilation scheme to more accurately estimate the 3-D picture of the ionosphere over the South African region. More specifically, we attempted to objectively appraise and point out the limitations of our 4D-var algorithm, which will be addressed as the algorithm matures. The current scheme ingests TEC observations and the results are compared/validated with the Global Ionosphere Maps (GIMs), satellite altimetry data (JASON-3 vertical total electron content), and ionosonde measurements. The next section briefly reviews the formulation of the 4D-var algorithm used in this study. Section 3 describes the data used in assimilation. Section 4 describes the experiment setup, followed by a discussion of the results in section 5. Conclusions and future recommendations are given in section 6.

Data Assimilation
A process of ingesting observations into a background model (physics-based model or empirical model) is data assimilation. Data assimilation techniques have attained a degree of maturity and are extensively applied in diversified fields such as meteorology, oceanography, and seismology. Recently, these techniques were introduced in the ionospheric community and there are a growing number of studies (e.g., Buresova et al., 2009;Hajj et al., 2004;Lee et al., 2012Lee et al., , 2013Pi et al., 2004;Scherliess et al., 2004Scherliess et al., , 2006Schunk et al., 2004;Thompson et al., 2006;. The most common data assimilation techniques are known as 3D-var, 4D-var, and ensemble Kalman filters (see, e.g., Daley, 1993). Here we utilize the 4D-var technique.

4D-var
This is a variational technique that minimizes a cost function (J) that quantifies the mismatch between a model state and observations at their appropriate time (t k ) within a time window t 0 ≤ t k ≤ t N . Because the optimization is both in 3-D space and time, the technique is referred to as 4D-var. Due to the extra time dimension, 4D-var has high computational cost of minimizing J, when compared to 3D-var that misses time-dependent dynamics. Even so, we opted for 4D-var with an idea that future improvements will require an ingestion of different types of observations collected at different time samples.
Consider a set of observations Y ! k collected at time sample,t k , such that where H k is a design matrix that maps the model predicted state (X ! k ) into the observational space at time t k and ε ! k are observational errors at time t k , with covariance R k . Deducing what form R k takes in data assimilation is nontrivial, and there is a growing body of work on this topic especially in assimilation cases where variations of uncertainty in different ingested data are significant. Here ε ! k are considered to be independent and identically distributed such that R k is a diagonal matrix. The model predictions are obtained by applying a forecast operator M to previous predictions, that is to say (2) ω ! k represents the model errors, and these are assumed to be uncorrelated with observation errors for simplicity. In this study we adopt a simple tractable Gauss-Markov filter as the forecast operator (e.g., Bust et al., 2004;Gelb, 1974;Howe et al., 1998) where dT is the length of the time sample (t k+1 − t k ), τ is correlation time along the time trajectory, which determines how much predictions at time sample k influence the predictions at k+1, and X ! b is the background electron density at k + 1 time sample.
Assuming that all errors are unbiased and Gaussian (Lorenc, 1981(Lorenc, , 1986) and our forecast model to be "perfect" (ω ! k: ¼ 0) such that the system equations are treated as strong constraints, then the cost function to be minimized is where X ! o represents X ! at t 0 and B is the background model error covariance matrix to be defined later.
Equation (4) can be solved using a classical Lagrange multiplier technique which requires a constrain to be satisfied, here the model strong constraint g X !
The Lagrangian to be solved is then defined as follows: where multipliers, λ k , are to be determined and are fundamental in expressing optimality conditions and development of the algorithm to solve as detailed below (Ito & Kunisch, 2008).
Using variational calculus and a boundary condition λ N+1 = 0, it can be shown that at extreme ∇ α L = 0, where α being X ! o ; X ! k ; λ k (e.g., Olaguer, 2011;Ssessanga et al., 2018;Zou et al., 1997) The initial step in solving the above optimization problem requires running the forward model (prediction) for the entire assimilation window up to X ! N : In equation (6) (which is the initial step of the algorithm) λ N can only be computed after X ! N has been determined. Once λ N is computed, λ k can also be determined using the adjoint model, equation (7), which involves an evaluation of the first-order derivative of the forward model (Jacobian matrix ∂M X k ! ∂ X k ! ) and measures the sensitivity of the cost function (J) to changes made on state vector X ! k (e.g., Ssessanga et al., 2018). Furthermore, it should be noted that since this whole process is run backward in time, that is, k = N−1, N−2 … 0, final disagreements between estimates and observations will be propagated backward in time to correct the background state as shown in equation (8). The above process is repeated until convergence is attained. Here the algorithm was terminated when number of iterations was 50 (i.e., that process is presumed to have failed to reach convergence) or the L-2-norm of the update difference; The background error covariance (B) determines the spread of information during analysis; hence it has a large influence on the accuracy of any data assimilation algorithm. Moreover, in our algorithm B enters the final results without any alterations (see equation (8)) and yet it is the most difficult error covariance to estimate and in most cases too large to store, calculate, or even use. For simplicity and reduction in computation expense, we have adopted a B matrix described by a similarity kernel (Bust et al., 2004) where i and j are indices of the grid points with coordinates changed from geographic to geomagnetic using IGRF-12 model (Thébault et al., 2015); S b is defined as ϒ*X ! b , and the ϒ (= 0.4) factor was obtained from ; z is altitude; d ij is the horizontal great circle distance; L z and L H are the vertical and horizontal correlation lengths, respectively.
The horizontal correlation length between the grids i and j is defined as where α is azimuth angle and L θ and L ∅ are latitude and longitude correlation lengths. In Bust et al. (2004), L θ (L ∅ ) for low and middle latitudes are 3(5)°and 5(10)°, respectively, with L ∅ being measured in great circle degrees. Vertically, L z is set to vary from 20 to 25 km in the E and F regions to 500 km in the plasmasphere. The same above values are adopted in this study.
The R k diagonal elements (assimilated observation variances) are computed as a where n represents the nth observation in column vector Y ! k , η (= 0.1) is a learned parameter from experience, and Y arc is the mean value computed from Y n k subset of values collected within a continuous receiversatellite arc in a given assimilation window.

Data
Compared to other techniques such as 3D-var, one of the major advantages of 4D-var technique is that different types of measurements collected within the assimilation window can be used to improve the analysis while also considering the time information. However, during the evaluation stage of an assimilation scheme, it is reasonable that different data types should be input sequentially in order to ascertain the effect of each data source on the fidelity of the assimilation scheme. Since the current study is preliminary work, to limit the complexity and permutations that comes with assimilating different data types, only TEC data derived from a South African network of GNSS ground receivers (TrigNet, 55 stations marked as solid green triangles in Figure 1) are assimilated. To be more specific, although GNSS comprises of different satellite constellations (e.g., BDS, Galileo, GLONASS, and GPS), only data obtained from GPS were considered in this analysis. In addition, even though other privately owned GNSS receiver stations might be available within the grid, we have intentionally only used TrigNet GPS receiver stations to mimic a realistic data distribution that will be readily available for operational use.
To derive TEC, delays and advances (recorded at the receiver) in code and phase, respectively, of the GPS transmitted signals are utilized. The amount of phase change from this delay is dependent on TEC, which is defined as the total number of electrons (N e ) integrated along the path of the signal from the transmitter (T X ) on the GPS satellite to the receiver (R X ) on ground where ds is the element of geometric range. TEC is measured in unit of 10 16 electrons per square meter, where 10 16 electrons/m 2 = 1 TEC unit (TECU; Davies & Hartmann, 1997).
The GPS observation data to derive TEC are downloadable in RINEX (Receiver Independent Exchange Format) with a resolution of 30 s (ftp://ftp.trignet.co.za/). The data set used in this analysis covered a period 7-8 September 2017, during which a geomagnetic storm was strong and eminent. To limit multipath effects, the data set was filtered to only include observations with elevation angles greater than 30° (Habarulema et al., 2013;Ssessanga et al., 2017).

Background Ionosphere Model
Because of the chaotic nature of a nonlinear systems such as the ionosphere and the fact that our 4D-var analysis is a strong constraint scheme, a correct representation of the background ionosphere (X ! b ) or the initial guess should highly affect the accuracy of the final results. Moreover, the background electron densities (X ! b ) are used in computing the B matrix (see equation (8)) that is highly influential in spreading observation information.
Physics-based models should be used as background in most data assimilation schemes, since they provide a more resolute time-varying dynamics of the ionosphere. However, the computational challenge (specifically under high resolution grids) limits their usability. On the other hand, empirical models are less expensive computationally, hence utilized in most assimilation schemes (see, e.g., Fuller-Rowell et al., 2006;Araujo-Pradere et al., 2007;Yue et al., 2012). In this study, X ! b densities were obtained from an empirical model, IRI (international reference ionosphere), which is internationally recognized and recommended standard for the specification of plasma parameters in Earth's ionosphere (Bilitza et al., 2011). Since IRI inception, the IRI research group has continually improved the model as the ionospheric database expands to include more observed data. The most recent release is IRI-2016 , the version used in this study. The IRI model source code (in FORTRAN) and literature about the model can be accessed on the homepage at http://irimodel.org and references therein (e.g., Bilitza, 1990Bilitza, , 2011Bilitza, , and 2014.

Setup
The assimilation was carried in a region bound in latitudes of 20-35°S, longitudes of 20-40°E with 2°horizontal resolution and altitudes of 100-1,336 km with vertical resolution of 10, 50, and 167.2 km between 100-450, 450-500, and 500-1,336 km, respectively. Note that the ionospheric E and F regions where we expect large density variations have the highest resolution. The maximum altitude used in assimilation is 1,336 km, which is much lower than the GPS satellites orbit at altitudes of~20,200 km, in order to reduce the computation expense. However, neglecting densities above 1,336 km would introduce errors in our reconstruction results specifically during nighttime, and especially during geomagnetic storms, when the plasmaspheric contribution to the TEC value can be substantial (Lee et al., 2013;Trichtchenko et al., 2007;Yizengaw et al., 2008). Using TEC measurements from JASON-1 satellite which did orbit at an attitude of~1,336 km (equal to the maximum grid altitude in our study), Yizengaw et al. (2008) estimated plasmasphere contributions between~1,336 and~20,200 km for geomagnetic low, middle, and high latitudes. At middle latitudes, which is the region of interest in this study, Yizengaw et al (2008) estimated percentage plasmasphere contributions to GPS-VTEC (Vertical TEC) as 15% and 30% during daytime and nighttime, respectively, which are then used here to deduct the extra plasmasphere contributions from the ingested data as follows where TEC i is derived total electron content corresponding to the ith ray with elevation angle Elv i . To obtain VTEC i , an algorithm developed at Boston College was employed. Details about the algorithm can be found in Seemala and Valladares (2011) and Uwamahoro et al. (2018).
Further, TEC i derived from differential phase along the same satellite-to-receiver paths is subject to an unknown cycle offset in addition to receiver and satellite biases. To negate this effect in the final results, adjacent rows in equation (1) corresponding to the same satellite-receiver continuous observation arc were differenced. Therefore, the solved equation is To obviate redundancy in our computation, the minimum distance between neighboring stations whose data were available for assimilation was set to 100 km. Moreover, in cases where neighboring GPS receivers are identical (with same specifications, make and model) and ionospheric variation is below the noise level, data points that are too close together may have errors that are linearly dependent, which could increase the condition number (that determines numerical instability) of R matrix that is inverted in equations (6) and (7).
As a data quality control process, we utilized a simple "buddy check" procedure to remove outliers that may have resulted from inherent measurement processes. That is to say, an observation or datum is considered an outlier if the magnitude of the innovation,|H n k X ! k −Y n k |, exceeds a threshold C* σ 2 ok þ σ 2 fk 1 = 2 , where C is a predefined multiple (here set to 2) and σ 2 ok and σ 2 fk are observation and model (in observation space) error variances, respectively, at time t k (see, e.g., Dee et al., 2011).
In section 2.1 the forecast model was assumed to be an ideal perfect model, while neglecting errors in the forecast model (ω ! k: ¼ 0). For this assumption to hold and also to inhibit the growth of neglected errors over time, we set a relatively short assimilation window within which the Gauss-Markov propagation filter can be considered reliable: Each assimilation window was set to a maximum of 15 min (binned into segments of length 2.5 min (t k+1 − t k ) within which the total number of assimilated STECs is on average 713) and τ= 1 hr, a value assumed from the works of Klobuchar (1980), wherein the author used ground-based measurements to determine the time delay beyond which the temporal correlation coefficient falls to 70% (also see Jonathan, 1997, and references therein).
To include information about the time evolution of background model errors, a simple procedure was adopted; once an optimal solution is obtained within the assimilation window of 15 min, estimates ( X ! N ) at time sample,t N , are used as an initial guess or background electron density in the next window of assimilation and a new B matrix is computed using X ! N . This process is repeated for a period equal to τ, and there after the whole process is reinitialized: B and X ! b are recomputed from the background model, IRI. In this study our focus was placed on the period indicated by the two vertical dashed lines in Figure 2. DOYs 250 and 251 were selected to represent quiet and disturbed conditions, respectively.

The 3-D Reconstruction
Figures 3a and 3b show the 3-D background ionosphere and assimilation results, respectively, at sunrise (04:00 UT), midday (10:00 UT), sunset (16:00 UT), and midnight (22:00 UT) on DOY 250. For clarity, the altitude has been limited to the region of uttermost interest, 100-500 km. Local time is equivalent to UT +2. Green solid triangles at the 100-km height are projections of geographic locations of ground-based GPS receiver stations used in the assimilation process. Obvious is a clear difference between IRI and assimilation results, especially at midday when the solar irradiance is expected to be at its highest during daytime. As expected, the F region depicts the largest densities in each 3-D picture. In Figure 3b, on a geomagnetic quiet day and particularly at midday in the F region, high densities due to strong solar irradiance are expected to be nearly uniformly distributed throughout the grid as illustrated in the background model results in Figure 3a at 10:00 UT. However, in Figure 3b, noticeable is that the largest densities (improvements) are located over continent areas where GPS receivers are located and gradually reduce toward the remote areas, particularly over the oceans. This is obviously due to the poor distribution of the ground GPS receiver stations (used in the assimilation) within the region. Therefore, to improve overall accuracy of the ionosphere representation, other data sources that cover remote areas (oceans), for example, occultation data (Yue et al., 2011), should be used to complement the current GPS dataset in future assimilation processes. Furthermore, an inclusion of a GPS receiver in North West region (22-28°S, 10-15°E) over the continent area would also highly improve the results.

Validation of the Integrated Electron Density (Horizontal Structure)
In Figure 4a we have analyzed the ability of our 4D-var scheme to estimate the integrated ionosphere along the GPS ray path. Ingested TEC data are compared to 4D-var and background model (IRI) TEC values on both DOY 250 and DOY 251. Plots in Figure 4a show a distribution of residues (Δ Y ! ¼ Y ! −H X ! ) between the input TEC and the models (4D-var and background). Clearly, on both days, especially during the daytime, 4D-var outperforms the IRI model. The 4D-var residues have a narrower normal distribution (with small root-mean-square error, RMSE) almost centered at zero TECU. Conversely, IRI residues have a bimodal distribution with most values falling within 10-20 TECU range. During daytime, 4D-var reduces the background RMSE by~61% and~50% on quiet (250) and storm (251) days, respectively. At night when the background model performs better (with nearly a normal distribution of Δ Y ! as illustrated in Figure 4a lower panels), the 4D-var assimilation reduces the RMSE by~22% and 12% during quiet and storm days, respectively.
To further analyze the performance of our 4D-var scheme, integrated vertical densities (VTEC) were compared to those from GIMs. However, results from this comparison should be considered mindful of a fact that GIMs also utilize data from some TrigNet stations. GIMs are generated by different International

10.1029/2019SW002321
Space Weather GNSS Service ionosphere working groups (IIWG), CODE (Center for Orbit Determination in Europe), European Space Operations Center from European Space Agency, JPL (Jet Propulsion Laboratory), Chinese Academy of Sciences, and Universitat Politècnica de Catalunya/IonSAT that utilize data from a global distribution of ground-based GNSS receivers. Due to paucity and uneven distribution of GNSS receiver stations or measurements, IIWGs apply different interpolation techniques to fill both temporal and spatial data gaps; hence, the accuracy and reliability of these maps differ each other (see, e.g., Mannucci et al., 1998;Hernández-Pajares et al., 2009). Here we use GIMs from CODE, a popular product in space geodesy (data and information about CODE GIMs are accessible at ftp://ftp.aiub.unibe.ch/CODE/). CODE GIMs are generated using data from~300 GNSS receiver stations. Modeling is performed in a solar geomagnetic reference frame using a spherical harmonics expansion up to degree and order 15. To convert line-of-sight TEC into vertical TEC, a modified single-layer model mapping function approximating the JPL extended slab model mapping function is adopted (see Coster et al., 1992). For the computation of the ionospheric pierce points, a spherical layer with a radius of 6,821 km is assumed. Final modes of CODE GIMs that are more stable, reliable, and with better accuracy are systematically generated usually at a spatial and temporal resolution of 5°× 2.5°and 1 hr, respectively. Extensive information about CODE GIMs and their validation can be found in some of the following works and references therein (Hernández-Pajares et al., 2009;Jee et al., 2010;Schaer, 1999). ) VTEC and GIM VTEC values corresponding to grid points where the regional and global mesh points are collocated. Since in our scheme the altitude is limited to a maximum of 1,336 km, yet GIMs are generated using data that represent an integration of electron densities from ground to GNSS transmitters (~20,000 km), we follow the procedure in section 4 to remove the plasmasphere contribution from GIM VTEC values for this comparison. Specifically, a value of 0.3*VTEC (nighttime) or 0.15*VTEC (daytime) was deducted from each GIM VTEC.
In both panels of Figure 4b IRI VTEC distribution has a horizontal stratification at 15 TECU, and most values are generally consistently smaller than GIM values. Chuang et al. (2019) also compared VTEC values from GIM to those from IRI-2016 and found the later to underestimate VTEC at most latitudes. These results are not surprising since IRI is a climatological model that represents monthly averages of terrestrial plasma densities . Therefore, the IRI model lacks the ability to reproduce detailed ionospheric variations at high spatial and temporal resolutions. Moreover, because IRI is empirical in nature, it does not accurately describe ionospheric dynamics in areas (e.g., Southern Hemisphere/within the African sector) or during times that suffered data paucity during early stages of model development (e.g. Habarulema, 2010;Habarulema & Ssessanga, 2017;McKinnell & Poole, 2004;Okoh et al., 2012;Ssessanga et al., 2014). Figure 4b are distributed around the 1:1 correlation line (optimal line) especially in the left panel (quiet period), and the horizontal stratification observed in IRI VTEC values is no more. In fact, 4D-var values are less biased and RMSE are reduced by 1.64 (~37.2%) and 3.38 TECU (~45.3%), from those of IRI during quiet and disturbed period, respectively. Comparing left and right panels (quiet day and disturbed day, respectively), eminent is that both 4D-var and IRI perform better during the quiet period, where the correlation values are seen to reach 90% and deteriorate during the storm period, exhibiting an increase in RMSE of 69.1% (3.05 TECU) and 47.3% (1.31 TECU) for IRI and 4D-var, respectively. Figure 4b, the scatter distributions in the right panel are more irregular about the optimal line compared to those in the left panel. This underperformance could be a result of the following: During the storm period (DOY 251) the EIA (Equatorial Ionization Anomaly) could increase in density and expand to cover a range of ±30°in latitude (e.g., Balan et al., 2018;Mannucci et al., 2005). In addition, during storm conditions strong ionospheric gradients generated in the polar regions have been observed to appear in the middlelatitude and low-latitude ionosphere propagating equatorward (e.g., Jakowski et al., 2005;Stankov et al., 2006;Trichtchenko et al., 2007). Such phenomena cause steep gradients in middle-latitude ionosphere which are complex to design and not catered for in our background covariance matrix (B), hence the observed inconsistences in the fidelity of the algorithm. Nonetheless, on average our 4D-var scheme can adequately match the integrated ionosphere produced in CODE GIMs; hence, the scheme can yield analysis output that is worthy utilizing in different technological systems.

Still in
As mentioned above, GIMs cover data gaps through interpolation techniques, especially in ocean areas with few GNSS tracking stations. Therefore, the accuracy of GIMs maps is usually biased toward continents where GNSS ground stations are relatively dense. In addition, GIMs have uncertainties that result from the conversion procedure of STEC to VTEC (see, e.g., Mannucci et al., 1998;Mushini et al., 2009). To more accurately assess our assimilation scheme over the ocean areas, data from a more reliable source (JASON-3 mission) were used. JASON-3 is a satellite altimeter mission that measures the surface height of global oceans using a dual-frequency satellite radar. JASON-3 adopted characteristics of the previous missions (TOPEX, JASON-1/JASON-2), operating at 13.575 GHz (Ku band) and 5.3 GHz (C band), flying at an orbit altitude~1,336 km with an inclination angle of 66°and having a period of 112 min (Fu et al., 1994;Imel, 1994;Jee et al., 2010;Liu et al., 2018). Ionospheric VTEC from JASON-3 is a by-product which is estimated as where dR is Ku band ionospheric range correction and f Ku is Ku band frequency in Hz (Brunini et al., 2005;Dumont et al., 2016). To reduce altimeter inherent noise, dR were smoothed with a running window of size 23 frames for local times between 06 and 24 hr and 30 frames for local times between 00 and 06 hr (see recommendations in Dumont et al., 2016). Numbers (decimal hours in UT) in red and green indicate the start and end of visibility of that particular trace. Each trace is seen to last for a period less than 4 min, which is much less than the 4D-var assimilation window (15 min). Therefore, for the entire lapse of each JASON-3 trace the ionosphere was assumed to be stationary. To extract model (4D-var and IRI) VTEC corresponding to JASON-3 traces, model VTEC maps were reinterpolated to a much finer grid of horizontal resolution 1 × 1°, and JASON data resampled to a 5-s interval. Along the JASON-3 trace, only data points within a distance of 1°to the model grid points were selected to be included in analysis. Figure 4c right panels show a scatter of JVTEC and model (4D-var and IRI) VTEC. JVTEC values greater (less) than 5 TECU correspond to a day (night) time in the JASON-3 traces. Both models (4D-var and IRI) underestimate (overestimate) JVTEC during nighttime (daytime). The night underestimation seems more persistent. This may be attributed to the fact that the quality of the GPS TEC data used in the assimilation procedure can be degraded within a residue accuracy of 2-3 TECU (e.g., see Thompson et al., 2006) and yet during nighttime when solar irradiance is nonexistent by deducting a large plasmasphere contribution (~30%) the ingested TEC values could be pretty low and have large uncertainties. Thus, the 4D-var assimilation might suffer from limitations on the overall accuracy as shown in right panels of Figure 4c.
On DOY 250 (upper right panel), with the longest JASON-3 trace positioned on the western side (15.93-15.994 UT, daytime), 4D-var performs worse than IRI, increasing the RMSE by 14.2% (~0.45 TECU). On the contrary, on DOY 251 (lower right panel), the longest trace is on the eastern side (also daytime) and 4D-var performance is much better than IRI, reducing the RMSE by 42.3% (~2 TECU). This spatial discrepancy in 4D-var performance could be a result of the eastern (western) side having larger (smaller) proportions of GPS receiver stations used in analysis (see Figure 1). Thus, the ocean area on the eastern side was extensively influenced by the measurements, courtesy of ionospheric correlations set in the similarity kernel matrix B. This result once again underscores the need to include in assimilation a data source that covers remote areas such as the oceans.

Validation of Bottomside Vertical Structure
Ground-based ionosondes are among the most accurate tools utilized in probing the bottomside ionospheric densities up to the F2 layer peak; thus, ionosonde data are typically used for calibrating other more complicated observational methods. A network of ionosondes (Digital portable sounders-4D, DPS-4D) exists over the South African region. Figure 1 illustrates the ionosonde locations within the region (indicated by solid red circles on the map), while Table 1 lists the same stations with their five character codes and geomagnetic locations obtained using IGRF-12 model. Hereafter, all ionosonde stations are referred to by the location name. Observation data from the four ionosondes were used in validating the 4D-var assimilation results. Albeit manually scaled ionosonde data are more reliable, in this study readily available automatically scaled data were utilized. The autoscaling was performed using ARTIST 5 (Galkin et al., 2008), a program developed at the University of Massachusetts Lowell. The ionosonde data are available at resolution of 15 min and are archived at the South African National Space Agency (SANSA) and publicly accessible at https://lgdc. uml.edu/common/DIDBFastStationList.
Despite the fact that DPS-4D is a more advanced ionosonde and uses the latest ARTIST software (ARTIST 5), it is imperative to at least put into consideration the error bounds associated with using autoscaled data to characterize the ionosphere. Bamford et al. (2008) and Stankov et al. (2012) have reported on these error bounds with a 95% probability, although the authors were using an older version of ionosondes and ARTIST software. Jeong et al. (2018) have recently compared autoscaled foF2s and hmF2s by ARTIST-5 with manually scaled ones and found the standard deviation of 0.12 MHz and 9.6 km in their differences, respectively, for 1-year ionogram data set from Jeju (33.4°N, 126.3°E). Here, as reference for results presented further in text in which autoscaled data are considered as the "truth," we quote bottomside ionosphere peak parameters error bounds presented in Stankov et al. (2012); foF2 (-0.75, +0.85 MHz), foF1(-0.25, +0.35 MHz), foE (-0.35, +0.40 MHz), h′F2 (-68, +67 km), h′F (-38, +32 km), and h′E (-26, +2 km).
Figures 5 and 6 present time series images of the vertical density profiles recorded at the four ionosonde stations with the sampling rate of 15 min, on DOY 250 (quiet day) and 251 (storm day), respectively. In both Figures 5 and 6, subfigures a, b, c, d, e, and f represent, ionosonde, background model (IRI), 4D-var, and differences ionosonde-IRI, and ionosonde-4D-var, respectively. Because ground-based ionosondes (used as a validation tool in this study) can only measure electron densities to the F2 peak density altitude (hmF2) and the topside profiles were estimated using the Reinisch-Huang technique (see Reinisch & Huang, 2001), the validation analysis hereafter is limited to the bottomside of the electron densities below the F2 peak. For clarity, all images have been limited to a maximum altitude range of 700 km. White spaces in the images indicate regions without ionosonde measurements. Although there are changes in hmF2 after assimilation, in this study the accuracy of hmF2 is not discussed because assimilation of GNSS ground receiver STEC has already been found in other studies to have little or no effect on hmF2 variation (see, e.g., McNamara et al., 2011). In addition, ionosonde hmF2 is not a measured quantity but rather obtained from inversion schemes such as POLAN (Titheridge, 1985) and NHPC (Huang & Reinisch, 1996;Reinish et al., 2005) that are subject to systematic errors (e.g., Sauli et al., 2007). Therefore, hmF2 lines were only indicated as solid gray (ionosonde) and dashed (Red: 4D-var and black: IRI) lines in Figures 5d, 5e, 6d, and 6e in order to vividly track the location of the bottomside ionosphere.
During both quiet and storm days at all stations, evidently, the background model (IRI) only reconstructs the general structure of the bottomside ionosphere without detailed features (see Figures 5b and 6b). This is well illustrated in the differences between IRI and ionosonde results which show that IRI mainly underestimates the densities (quiet time~5 × 10 11 elec/m 3 and storm time~7 × 10 11 elec/m 3 ) in the F region and overestimates densities in E region (at Grahamstown and Hermanus) particularly between 5:00 and 15:00 UT (see Figures 5d and 6d). Table 2 presents the RMSE and correlation coefficient (Corr) between ionosonde measured electron densities and model (IRI, 4D-var) densities below the F2 peak, for the four ionosonde stations. The IRI model exhibits larger RMSE values than the 4D-var model during daytime on both quiet and storm days. This corroborates the 2-D results in section 5.2 where IRI was found to perform worst during daytime. Mostafa et al. (2018) and Yang et al. (2017) also compared the IRI E region estimations to ionosonde measurements. The authors found that IRI had deficiencies in estimating the E layer and associated this underperformance to the IRI model using solar indices such as sunspot number and F10.7 as the proxy for EUV (that is essential in generation of the E layer), despite the indices having been found to poorly represent the solar EUV variability (e.g., Chen et al., 2011;Lean et al., 2001;Liu et al., 2006). At night when the E region nearly vanishes, 4D-var has worse performance as illustrated by the RMSE values in Table 2 and difference images in

Space Weather
Figures 5e and 6e (in the altitude range below 200 km). Because nighttime bottomside densities are generally low, an attempt at resolving such densities while using a much less accurate data set such as TEC is bound to introduce errors in the assimilation results, hence the poor night performance of 4D-var compared to IRI.
Compared to IRI, during daytime a superior performance of 4D-var is eminent at all stations during both quiet and storm periods as shown in Table 2. Average of absolute residues between ionosonde and 4D-var data assimilation results were found to be below 2 × 10 11 elec/m 3 . On the quiet day, during daytime at Grahamstown station where IRI performs worst, 4D-var is able to reduce the IRI RMSE by 44.2%. Still at Grahamstown station, albeit underestimating the density values and the width of the trough (see Figures 5a and 5c) 4D-var is able to reconstruct the structure of F region density perturbations observed in the altitude range 200-300 km, between 8:00 and 12:00 UT. In addition, during storm period at Madimbo and Louisvale stations where IRI performs worse, 4D-var mitigates RMSE by 36% and 46.4%, respectively.
A comparison between quiet and disturbed period RMSE values in Table 2 shows that the 4D-var scheme performs best under quiet conditions. In addition, although 4D-var dramatically improves the daytime vertical structure, nighttime conditions on average exhibit lower RMSE values than daytime conditions (by 40% and 59% on DOY 250 and 251, respectively). Indeed, a time series of difference in Figures 5e and 6e show 4D-var worst performance recorded mainly during daytime between 8:00 and 16:00 UT. This time range covers nearly the same period when IRI had the worst performance. Therefore, the high daytime (compared to nighttime) 4D-var RMSE values could be due to inaccuracies in our background model during this period. In fact, during daytime the overestimation of the E region densities by the background model (especially at Grahamstown and Hermanus stations) is seen to be reproduced in the 4D-var results. Thus, to further improve on the daytime vertical structure, careful steps must be taken in choosing the right background model. It should also be pointed out once again that heights used in constructing the ionosonde time series images (here considered as the truth), particularly the height of the E layer and F2 layer, are not direct measurements, rather derivations from inversion schemes that are subject to systematic errors (e.g., Sauli et al., 2007).
Still during daytime at Madimbo station in Figures 5e and 6e, 4D-var exhibits a poor reconstruction of the densities in the range 200-300 km. The algorithm also introduces some "ghost" images (density like perturbations, between 9 and 15 UT) not observed in the ionosonde data. Such images in ionospheric regional tomography and data assimilation are inevitable and can only be minimized. That is to say, due to limitations of the minimum angle of elevation, GNSS ground receiver ray paths are biased in a vertical sense with horizontal rays through the ionosphere missing (e.g., Yeh & Raymund, 1991;Raymund et al., 1994). In addition, depending on the transmitter (GPS)-receiver geometry, regional boundary conditions limit any rays that cross the vertical minimum boundary, hence creating a deficiency of measurement at those locations (such as Madimbo in this case). Due to such limitations, regional ionospheric data assimilation and tomography techniques are, in general, computationally difficult. Thus, the use of simplifications and the limitations imposed on the computational domain lead to compromises in relation to theoretically optimal approaches, hence the reduced fidelity.

Space Weather
At night, with the exception of Madimbo station, there is a clear general trend at all stations that 4D-var performs worse (has higher RMSE values) than the IRI model, specifically during the quiet period. This degradation in performance may be explained as follows: At night, when a large percentage plasmasphere contribution was deducted from the originally measured TEC, geomagnetic middle-latitude regions (specifically during quiet conditions) are generally known to have TEC values substantially lower than 10 TECU (e.g., see Figure 4e). Therefore, assimilating such low TEC values is bound to introduce errors in the final solution, given that GPS data assimilation schemes can only get to within a residue accuracy of 2-3 TECU (e.g., Bust & Datta-Barua, 2014;Thompson et al., 2006). Moreover, during nighttime with no solar irradiance electrodynamic processes dominate the ionosphere so that the assumed simple Gaussian density correlations in designing the B matrix might not be reflective of the actual ionosphere conditions. Hence, larger RMSE errors can be introduced than those from the background model (IRI). This deficiency points out the need to include another more accurate data source (e.g., ionosonde data) in the assimilation, particularly during nighttime.

10.1029/2019SW002321
Space Weather with lower RMSE and high correlation values at all stations, except at Louisvale where IRI has a slightly higher correlation value.
During the quiet period (DOY 250), 4D-var recorded the highest (62%) and lowest (37%) RMSE improvement from the background model at Grahamstown and Madimbo stations, respectively. The low RMSE improvement at Madimbo is understandable, given that Madimbo is affected by boundary conditions as mentioned earlier. Madimbo station is located furthest away from the GNSS receiver stations (see Figure 1), such that GPS TEC will not have a good effect on estimating the correct foF2 (see also, Decker & McNamara, 2007 ;McNamara et al., 2011).
On the storm day (DOY 251), 4D-var still performs better than IRI at all stations, however, overestimates (underestimates) the foF2 values at Grahamstown (Madimbo) between 5:00 and 11:00 UT. Furthermore, at Grahamstown and Hermanus stations during evening hours (15:00-20:00 UT), 4D-var follows the background model rather than the observed foF2. Since this irregular performance is not observed in the quiet time results (Figure 7), then the potential candidates leading to this underperformance could be the design of B and R. That is to say, the current parameters (e.g., scale values ϒ, η, and horizontal and vertical correlation lengths) used in defining B and R could be appropriate for quiet condition but not storm conditions, in that a misrepresentation of the weighting on B and R during this period biases the 4D-var results toward the background model rather than the measurements. Moreover, during the storm conditions, specifically under 10.1029/2019SW002321 Space Weather regional high-resolution grids, the ionosphere might be more irregular such that the Gaussian model adopted in defining B matrix is not the appropriate functional form.
Still during the storm day, Louisvale has the best improvement of 63% and Grahamstown the least with 29.2%. The fact that the data used in this study has autoscaling errors could have compromised the process of determining the accuracy of our algorithm. For example, at Grahamstown, the data between 16:00 and 17:15 UT show erratic variations which could be caused by an autoscaling "blunder" by ARTIST, hence increasing the RMSE values in the comparison.
Indeed, ionosonde hmF2 values in Figure 6e, during the same period, indicate unrealistic values with the peak densities located above 450 km. While it is advisable to utilize manually scaled ionosonde data, for operational purposes, this may not always be feasible. In order that the assimilation procedure should in general terms provide a realistic picture of the ionospheric variability, more complemental GNSS measurements near the ionosonde stations should be used during the assimilation process. Nevertheless, during both quiet and storm days, our 4D-var results are a better representation of the 3-D picture of the ionosphere over the South African region than the current commonly used and recommended ionosphere model (IRI-2016).

Conclusion
We have successfully developed a four-dimensional variation assimilation (4D-var) scheme and utilized it to estimate the 3-D picture of the ionosphere over the South African region. Being a preliminary study, only GNSS-TEC data obtained from the South African TrigNet network were ingested, and its impact was assessed during both geomagnetic quiet and storm periods.
Results show that the 4D-var scheme performs best during quiet conditions. The horizontal integrated 2-D structure is estimated to be most accurate over the continent and less so over the remote areas (Oceans), specifically over the western side where GPS receiver stations are less abundant. By comparing assimilation results with ionosonde measurements, we found that assimilation of TEC significantly improves the estimation of the bottomside ionosphere during both quiet and storm periods. This improvement is best pronounced during daytime. At night when ionospheric densities are low, the 4D-var exhibits inconsistences in estimating the bottomside. The superiority of 4D-var over empirical models such as IRI is distinct when estimating the foF2 parameter. The assimilation of TEC dramatically reduces the IRI foF2 RMSE values at all ionosondes during quiet and storm periods.
Although our assimilation results have shown significant improvements over the background model, the following issues need to be addressed in the future.
1. The current 4D-var scheme assumes a perfect model (strong constraint). This limits the algorithm to relatively short assimilation windows in order to inhibit the growth of initial condition errors over time.
However, studies have shown that albeit the weak constrained 4D-var is cumbersome to develop, it provides more accurate results since the dynamical evolution of the model errors along the time trajectory is considered. 2. The assimilation performance deteriorates from the mainland toward the remote areas (oceans) where GPS receivers are less abundant. Therefore, to improve overall accuracy measurements such as radio occultation TEC and satellite altimeter data (JASON-3) must be included in the assimilation. 3. At some ionosonde stations (Grahamstown and Hermanus), the background IRI model overestimated the E region during daytime. This error was seen to be projected in the 4D-var results. To mitigate such errors, we suggest that the background densities should be constructed from an ensemble of ionosonde measurements, rather than an empirical model. 4. We did not analyze the scheme's ability to correctly estimate the peak height (hmF2) because the currently assimilated data (TEC) have a minimal effect on hmF2 (e.g., see McNamara et al., 2011). Hence, we suggest that bottomside profile data from ionosondes should be included in the assimilation to improve the entire bottomside profile of the ionosphere specifically the E region. 5. To have a more accurate and true representation of the ionosphere, spatial correlation lengths used in defining the background error covariances matrix need to be improved. That is to say, local/regional data sets should be collected and analyzed to determine the appropriated spatial and time correlation lengths that are local time, season, and geomagnetic condition dependent.