Validation of SAGE III/ISS Solar Occultation Ozone Products With Correlative Satellite and Ground‐Based Measurements

The Stratospheric Aerosol and Gas Experiment III on the International Space Station (SAGE III/ISS) was launched on 19 February 2017 and began routine operation in June 2017. The first 2 years of SAGE III/ISS (v5.1) solar occultation ozone data were evaluated by using correlative satellite and ground‐based measurements. Among the three (MES, AO3, and MLR) SAGE III/ISS retrieved solar ozone products, AO3 ozone shows the smallest bias and best precision, with mean biases less than 5% for altitudes ~15–55 km in the midlatitudes and ~20–55 km in the tropics. In the lower stratosphere and upper troposphere, AO3 ozone shows high biases that increase with decreasing altitudes and reach ~10% near the tropopause. Preliminary studies indicate that those high biases primarily result from the contributions of the oxygen dimer (O4) not being appropriately removed within the ozone channel. The precision of AO3 ozone is estimated to be ~3% for altitudes between 20 and 40 km. It degrades to ~10–15% in the lower mesosphere (~55 km) and ~20–30% near the tropopause. There could be an altitude registration error of ~100 m in the SAGE III/ISS auxiliary temperature and pressure profiles. This, however, does not affect retrieved ozone profiles in native number density on geometric altitude coordinates. In the upper stratosphere and lower mesosphere (~40–55 km), the SAGE III/ISS (and SAGE II) retrieved ozone values show sunrise/sunset differences of ~5–8%, which are almost twice as large as what was observed by other satellites or model predictions. This feature needs further study.


Introduction
The Stratospheric Aerosol and Gas Experiment III on the International Space Station (SAGE III/ISS) is the second instrument from the SAGE III project. It was launched on a SpaceX Falcon 9/Dragon spacecraft on 19 February 2017 and began routine operation in June 2017. Similar to its predecessors, SAGE I (1979-1981), SAGE II (1984-2005), and SAGE III/M3M (2001-2006), SAGE III/ISS uses the solar occultation technique to retrieve vertical profiles of ozone (O3), water vapor (H2O), nitrogen dioxide (NO2), and aerosol extinctions at multiple wavelengths (e.g., Mauldin et al., 1985; McCormick et al., 1993; Thomason et al., 2010; Wang et al., 2006). In addition, SAGE III can utilize the multispectral measurement of the oxygen A-band (758-771 nm) to derive vertical profiles of temperature and pressure (Pitts & Thomason, 2003). The SAGE series of observations has provided valuable data for understanding global ozone trends (SPARC/IO3C/GAW, 2019; WMO [World Meteorological Organization], Scientific Assessment of Ozone Depletion, 2018) and the impact of volcanoes and human activities on stratospheric aerosol (SPARC, 2006). SAGE III/ISS can also observe the atmosphere at night by using the lunar occultation technique. Lunar occultation is achieved by rotating the solar attenuator out of the optical path and using a fully programmable charge coupled device (CCD) that enables selection of different spectral channels and integration times. The lunar observations can provide vertical profiles of ozone (O3), nitrogen dioxide (NO2), nitrogen trioxide (NO3), and chlorine dioxide (OClO). A separate algorithm (e.g., Rault, 2005; Rault & Loughman, 2013) is being developed to retrieve trace gases from limb scattering measurements, which are still research products and not yet available to the public.
Unlike the first SAGE III instrument on the Meteor 3M spacecraft (SAGE III/M3M), which was in a sun-synchronous orbit providing observations in the northern hemisphere at mid to high latitudes (~45°-80°N), and in the southern hemisphere at midlatitudes (~35°-60°S), SAGE III/ISS is in a mid-inclination orbit (51.6°). The solar observations can provide near global (~70°S-70°N) measurements on a monthly basis with coverage similar to that of the SAGE II measurements. There is, however, some loss of measurements due to the obscuration of the Sun by the ISS and limitations to operations due to spacecraft visits to ISS. The sampling coverage of SAGE III/ISS solar observations can be augmented by lunar measurements, which occur at locations and times not covered by solar observations. McCormick et al. (2020) evaluated the first year of SAGE III/ISS ozone profiles with ozonesondes and lidar measurements at Hohenpeissenberg and Lauder, as well as with observations from the Atmospheric Chemistry Experiment Fourier Transform Spectrometer (ACE-FTS). They found that the percentage differences between SAGE III/ISS solar ozone and ground-based ozonesonde/lidar measurements are generally less than 10% in the stratosphere, with the best agreement of~5% between SAGE III and Hohenpeissenberg lidar for altitudes of 20-40 km. The seasonal average differences between SAGE III and ACE-FTS were also found to be less than 5% between 20 and 45 km.
In this paper, we evaluate the quality of SAGE III/ISS version 5.1 solar ozone data by using a longer period of SAGE III data (i.e., the first 2 years) and including more correlative satellite and ground-based measurements. Section 2 describes the SAGE III retrieval algorithm, solar ozone products, and some known anomalies in the current algorithm. The correlative satellite and ground-based datasets are described in section 3. Section 4 describes the coincidence criteria and validation methodology. The comparison results are shown in section 5, followed by the conclusions in section 6.

Instrument and Retrieval Overview
The SAGE III instrument makes solar occultation measurements by scanning a relatively small field-of-view (0.5 arcmin in the vertical and 5.0 arcmin in the horizontal) vertically across the face of the Sun and focusing the light into a simple grating spectrometer. The spectrometer uses a CCD array with 809 spectral columns with resolutions of~1-2 nm that provide nearly continuous spectral coverage between~280 and~1,035 nm as well as a single photodiode covering 1,542 nm ± 15 nm. These 809 CCD pixels are then subsampled (i.e., read out individually or co-added or averaged with other pixels) into a number of "pixel groups" that change for different modes of operation. For solar occultation, there are 86 of these pixel groups (87 including the photodiode) that fall into 12 different channels illustrated in Figure 1. For comparison, the central wavelengths of the seven channels used by SAGE II are also shown in Figure 1.
The current retrieval algorithm for SAGE III/ISS is version 5.1, which is essentially the same as that used for SAGE III/M3M. A complete description of the SAGE III retrieval algorithm is available in the SAGE III Algorithm Theoretical Basis Document: Solar and Lunar Algorithm (SAGE III ATBD, 2002). The algorithm consists of two main parts, the transmission algorithm and the species inversion algorithm. The transmission algorithm involves taking the raw uncalibrated radiance counts from the CCD (and photodiode) and converting them into line-of-sight (LOS) transmissions at each wavelength and tangent altitude. The species inversion algorithm uses these multiwavelength LOS transmission profiles to derive vertical profiles of trace gas concentrations and aerosol extinction coefficients. This is done by first removing modeled contributions from Rayleigh scattering and O 4 absorption, then separating the remaining LOS transmission profiles into the contributions from each species of interest, and lastly inverting these LOS contributions into vertical profiles of concentration or extinction using a global fit inversion method (or a nonlinear Levenberg-Marquardt onion peeling method for water vapor or temperature/pressure retrievals).
The solar occultation retrieval for SAGE III produces three separate ozone products. The "MES" (mesospheric) algorithm uses absorption features in the ultraviolet (<300 nm) to retrieve vertical profiles between ~45 and ~100 km. The other two use ozone absorption in the Chappuis band (near 600 nm) to retrieve vertical profiles from the surface or cloud top up to 70 km. Both use the same pixel groups in the spectral channel surrounding 600 nm (Channel 5) but differ in how they treat aerosol and NO2 within the retrieval. The "MLR" (multiple linear regression) algorithm uses Channels 5 and 3 (~450 nm) to solve for both O3 and NO2 simultaneously while making an assumption about the spectral shape of aerosol extinction through each channel. The "AO3" (aerosol ozone) algorithm removes the contributions from NO2 that were solved in the MLR retrieval and then uses all of the data between Channels 4 and 11 (see Figure 1), excluding the O2 A-band and the H2O channels, to better constrain the influence of aerosol. The AO3 algorithm is similar to the retrieval used for the SAGE II instrument (e.g., Chu et al., 1989; Damadeo et al., 2013). It is worth noting that while the AO3 algorithm explicitly solves for aerosol extinction in each channel, this solution is not reported. Instead, the reported aerosol is computed as a residual based on the MLR algorithm after solving for ozone and NO2.

Known Anomalies in Version 5.1
The SAGE III/ISS instrument is by far the best characterized SAGE instrument. The detailed knowledge of the intricacies of the instrument's behavior and performance allows the SAGE III team to incorporate several new algorithms to improve the data quality. One such characterization is that of the spectral stray light within the spectrometer (re-entrant spectra). While the instrument was still on the ground, a thorough characterization of the spectral stray light was performed, and one particular problem area was identified. A portion of the light incident on the UV range of the CCD actually comes from near the peak of Chappuis ozone absorption. This has a negative impact on the mesospheric ozone retrieval and needs to be corrected. While a rudimentary correction is currently implemented, it stems from an ad hoc correction derived for SAGE III/M3M data and does not use the most up-to-date information. As such, we do not recommend the MES ozone product for validation or research studies as it is still preliminary.
The SAGE III/ISS algorithm uses auxiliary temperature and pressure data from MERRA-2 (Modern-Era Retrospective analysis for Research and Applications, version 2) (Global Modeling and Assimilation Office [GMAO], 2015; Gelaro et al., 2017), which are necessary for modeling refraction and molecular (Rayleigh) scattering. These data are provided with geopotential heights, which the SAGE III algorithm converts to geometric altitudes at the location of the measurements. It has been discovered that this conversion between geopotential height and geometric altitude, which was copied from the SAGE II algorithm, was never thoroughly vetted and is more of an approximation (i.e., it assumes that the surface gravity is not latitude dependent). As such, the current altitude registration of the meteorological products that pass through the algorithm, but not that of the retrieved profiles of species concentrations or aerosol extinctions, is biased on the order of 100 m (altitude and latitude dependent) as shown in Figure 2 (see Appendix A for a recommended correction). The impact of this mis-registration would be most noticeable when converting SAGE III retrieved ozone from native number density on geometric altitude to mixing ratio on pressure (VMR/P) coordinates using the reported temperatures and pressures in the SAGE data files, especially at higher altitudes (see Appendix A). It is also worth noting that, since the same code was present in the SAGE II v7.0 algorithm, that data product has a similar bias when making the same conversion to VMR/P coordinates.
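To illustrate why neglecting the latitude dependence of surface gravity matters, the sketch below converts geopotential height to geometric altitude with a commonly used formulation based on WGS84 normal gravity and a latitude-dependent effective Earth radius. This is not the correction recommended in Appendix A, and it is not the SAGE code; the function names and the choice of formulation are assumptions made here for illustration only.

```python
import math

G0 = 9.80665  # standard gravity used to define geopotential height, m s^-2


def surface_gravity(lat_deg):
    """Normal gravity on the WGS84 ellipsoid (Somigliana formula), m s^-2."""
    s2 = math.sin(math.radians(lat_deg)) ** 2
    return (9.7803253359 * (1.0 + 0.00193185265241 * s2)
            / math.sqrt(1.0 - 0.00669437999013 * s2))


def effective_radius(lat_deg):
    """Latitude-dependent effective Earth radius for this conversion, m."""
    a = 6378137.0                 # WGS84 semi-major axis, m
    f = 1.0 / 298.257223563       # WGS84 flattening
    m = 0.00344978650684          # WGS84 gravity ratio omega^2 a^2 b / GM
    s2 = math.sin(math.radians(lat_deg)) ** 2
    return a / (1.0 + f + m - 2.0 * f * s2)


def geopotential_to_geometric(h_gp_m, lat_deg):
    """Geometric altitude (m) from geopotential height (m) at a latitude."""
    g_ratio = surface_gravity(lat_deg) / G0
    r = effective_radius(lat_deg)
    return r * h_gp_m / (g_ratio * r - h_gp_m)


def geopotential_to_geometric_fixed_gravity(h_gp_m, radius_m=6371000.0):
    """Simplified conversion that ignores the latitude dependence of gravity."""
    return radius_m * h_gp_m / (radius_m - h_gp_m)
```

With these assumptions, the same geopotential height maps to a geometric altitude that differs between the equator and the poles by an amount that grows with height, which is the kind of altitude- and latitude-dependent offset described above.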
Since aerosol measurements are intertwined with ozone measurements (i.e., through partitioning of the slant-path transmissions into the contributions from ozone, aerosol, and other interfering gases), assessing the quality of the aerosol product can also yield information about the quality of the ozone product. While aerosol extinctions at different wavelengths will vary with atmospheric conditions (e.g., total amount and type of aerosol from volcanoes and/or fires), it is expected that the "aerosol spectrum" (i.e., extinction as a function of wavelength) should be slowly varying and monotonic in almost all stratospheric conditions (Thomason et al., 2010). Instead, the aerosol spectrum derived from SAGE III/ISS measurements exhibits a "dip" near 600 nm that has different characteristics in different altitude regimes (latitude dependent) as shown in Figure 3. At altitudes in the troposphere and lowermost stratosphere (below~20 km in the tropics), this dip follows the shape of the ozone cross sections and is systematically larger at lower altitudes. The primary contribution appears to be an error in the creation of the spectroscopic database for O 4 used by the retrieval algorithm (i.e., a preprocessing error, not an error in the source cross sections themselves), which has two strong absorption features in the Chappuis spectral range (at~577 nm and~630 nm) covered by the instrument. This yields the incorrect spectroscopic line shape of O 4 , which aliases into the retrieval and results in a solution for ozone that should be too large (discussed later) when the contribution to extinction from O 4 is significant (i.e., scales with density squared). Since aerosol is solved as a residual using MLR ozone, any systematically large ozone would cause systematically small aerosol showing a wavelength dependence that scales with the ozone cross sections. 
At altitudes above the lowermost stratosphere (⪆20 km in the tropics), this dip still follows the shape of the ozone cross sections but scales with the ozone mixing ratio. A possible explanation for this is that the overall magnitude of the source ozone cross-section database is too large by 1-2% in the Chappuis band relative to the other channels, but this requires further study. It is noteworthy that the magnitude of the dips is smaller in the aerosol data produced by the AO3 algorithm (not shown). This suggests that the use of additional aerosol channels in the retrieval better constrains the allowable shape of the aerosol spectrum, resulting in a potentially more robust aerosol data product. The SAGE team is investigating whether the aerosol solution from the AO3 algorithm should be the released data product in future versions.
Figure 3. (a) [...] of the nine aerosol wavelengths (blue dots). The median extinction values at each wavelength are shown by red squares, with the red line being a simple linear interpolation between points to aid the eye in seeing the spectrum. The log of extinction is fit with the log of wavelength using a simple quadratic function for each event at this altitude, and the median of fit values is shown in black. A dip between the data and the fit is clearly visible in the channels surrounding the Chappuis band. (b) The median value of the relative residuals between the data and the fit to the aerosol spectrum (i.e., (data-fit)/fit) at each wavelength and altitude for the same latitude range and time period as (a). Results at midlatitudes are similar, simply shifted down in altitude. Gray stippling denotes areas where aerosol extinction data do not exist. The median residuals in the channels used for the quadratic fit are <1% between ~20 and 30 km.
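The diagnostic described in the Figure 3 caption (a quadratic fit of log extinction against log wavelength, followed by the relative residual (data-fit)/fit) can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' code. The `use_in_fit` mask, the function names, and the use of wavelength in micrometres inside the logarithm (for numerical conditioning) are assumptions introduced here.

```python
import math


def fit_quadratic(x, y):
    """Least-squares coefficients (c0, c1, c2) of y = c0 + c1*x + c2*x**2."""
    # Normal equations A c = b, with A[j][k] = sum(x^(j+k)), b[j] = sum(y*x^j).
    A = [[sum(xi ** (j + k) for xi in x) for k in range(3)] for j in range(3)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, 3):
            factor = A[row][col] / A[col][col]
            for k in range(col, 3):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    c = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        tail = sum(A[row][k] * c[k] for k in range(row + 1, 3))
        c[row] = (b[row] - tail) / A[row][row]
    return c


def spectrum_residuals(wavelengths_nm, extinctions, use_in_fit):
    """Relative residuals (data - fit) / fit of a log-log quadratic fit.

    use_in_fit marks which channels constrain the fit (hypothetical mask);
    residuals are returned for every channel.
    """
    lx = [math.log(w / 1000.0) for w in wavelengths_nm]  # log wavelength (um)
    ly = [math.log(e) for e in extinctions]
    fx = [v for v, use in zip(lx, use_in_fit) if use]
    fy = [v for v, use in zip(ly, use_in_fit) if use]
    c0, c1, c2 = fit_quadratic(fx, fy)
    fit = [math.exp(c0 + c1 * v + c2 * v * v) for v in lx]
    return [(e - f) / f for e, f in zip(extinctions, fit)]
```

A negative residual confined to the channels near 600 nm would correspond to the "dip" discussed in the text.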

Aura MLS
The Earth Observing System (EOS) Microwave Limb Sounder (MLS) aboard the Aura satellite has provided daily global measurements of ozone (O 3 ) profiles and other trace gases from the upper troposphere to the upper mesosphere from August 2004 to present. Aura MLS measures thermal radiance emissions in five broad regions between 118 GHz and 2.5 THz by scanning the Earth's atmospheric limb vertically from the ground to~90 km (Waters et al., 2006). Aura is in a sun-synchronous near-polar orbit with ascending equatorial crossing time of ∼13:45 LT. Unlike the UARS MLS instrument, which observed limb emission in a direction perpendicular to the spacecraft flight direction, Aura MLS observes emission from the atmosphere directly ahead of the satellite. This results in near global-coverage from both daytime and nighttime measurements with~3,500 profiles each day.
Aura MLS ozone retrieved from the 240 GHz spectral region by using an optimal estimation approach (Livesey et al., 2006; Rodgers, 2000) is the standard reported ozone product. It has a vertical resolution of 2.5-3 km from the upper troposphere to the lower mesosphere and ~5 km in the upper mesosphere. As indicated by comparisons with correlative measurements, the estimated accuracy of MLS v2.2 ozone is within about 5% for much of the stratosphere. The biases increase with decreasing altitudes, with some systematic positive biases of 10-20% in the lowest portion of the stratosphere (Livesey et al., 2008) and ~20-30% in the upper troposphere (Jiang et al., 2007).
The latest Aura MLS v4.23 ozone data were used in this study. MLS v4.2x ozone profiles are very similar to v2.2 in the stratosphere and above, so the validation results for v2.2 product generally hold for the v4.2x product (Livesey et al., 2018). MLS v4.2x ozone profiles are retrieved on 12 surfaces per decade between 316 hPa and 1 hPa, twice as fine a resolution as that used in v2.2. There are several improvements in MLS v4.2x ozone retrievals. The high bias of MLS v2.2 ozone at 215 hPa is reduced in v4.2x. Compared to v3.3 ozone, v4.2x reduces the vertical oscillation behavior in the tropical upper troposphere and lower stratosphere (UT/LS) regions (although some oscillations still exist). The sensitivity of retrieved ozone to thick clouds is also improved in the v4.2x product. In this study, MLS v4.2x ozone data were screened based on the recommendations of Livesey et al. (2018).

OSIRIS
The Optical Spectrograph and InfraRed Imaging System (OSIRIS) on board the Odin satellite has been taking limb scattering measurements of the atmosphere from November 2001 to present. It operates at wavelengths of 280-810 nm, with a spectral resolution of~1 nm (Llewellyn et al., 2004;McLinden et al., 2012). The Odin satellite has a polar orbit with equatorial crossing local times at~6:00 p.m. (ascending node) and at 6:00 a.m. (descending node). OSIRIS provides near global coverages up to 82°latitude for time periods near the equinoxes and coverages of the sunlit summer hemisphere for the rest of year. There is no coverage of the midlatitude to high-latitude winter hemisphere.
The OSIRIS SaskMART v5.0x ozone data are retrieved using the multiplicative algebraic reconstruction technique (MART) (Degenstein et al., 2009;Roth et al., 2007) and the SASKTRAN spherical radiative transfer model (Bourassa et al., 2008;Zawada et al., 2015). The retrieval algorithm simultaneously merges information from UV and VIS radiances. Ozone number density is retrieved on an altitude grid from 60 km down to cloud tops (or 10 km during absence of clouds). Additionally, NO 2 number density and stratospheric aerosol extinctions are retrieved from 40 km down to cloud tops. The vertical resolution of retrieved ozone is~2 km at low altitudes. The resolution decreases toward higher altitudes and reaches 3 km at 50 km.
Through intercomparisons with other satellite and in situ measurements, the OSIRIS ozone data show good agreement (within 5%) with correlative measurements for altitudes above 20 km. Between 20 km and the tropopause OSIRIS shows negative biases of~5-20% for latitudes between 40°S and 40°N (Adams et al., 2014). It was also found that OSIRIS ozone biases depend on the OSIRIS optics temperature, retrieved aerosols, and albedo. The latest OSIRIS v5.10 ozone data, with a drift correction of sensor pointing bias, are used in this study. The drift in previous OSIRIS v5.07 ozone data (Hubert et al., 2016) is attributed to a changing bias in the procedure to determine the tangent altitudes of limb radiance profiles (Bourassa et al., 2018). There is no further filtering applied to OSIRIS data in this study since the OSIRIS v5.10 ozone profiles have been screened for outliers, based on the techniques described by Adams et al. (2013), prior to its distribution to the public.

ACE-FTS
The Atmospheric Chemistry Experiment-Fourier Transform Spectrometer (ACE-FTS) is a solar occultation instrument that records spectra between 2.2 and 13.3 μm (750-4,400 cm⁻¹) at a high spectral resolution of 0.02 cm⁻¹ (Bernath, 2017). ACE-FTS was launched on the SCISAT satellite in August 2003. Measurements are made during each sunrise and sunset per orbit, up to 30 times per day. The volume mixing ratios of ozone and other trace gases as well as temperature and pressure are retrieved from cloud tops to ~100 km by a modified global fit approach based on the Levenberg-Marquardt nonlinear least-squares method (Boone et al., 2013). The final results are provided on the measurement (tangent height) grid, with vertical resolution of 3-4 km, and interpolated to a 1 km interval using a piecewise quadratic method.
When compared with the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) and the Aura MLS, the ACE-FTS v3.5 ozone generally agrees within 5% in the middle stratosphere (~20-45 km) and exhibits a positive bias of ~10-20% in the upper stratosphere and lower mesosphere (Sheese et al., 2017). ACE-FTS also tends to show a negative bias with respect to MIPAS and MLS below 20 km. The negative bias increases with decreasing altitudes and reaches ~20-30% near 10 km.
The ACE-FTS version 3.5 data extend from February 2004 to March 2013. A new version number (version 3.6) is used for data from that point onward, when the version 3.5 processor was ported from a Unix-based to a Linux-based system. Although the ACE-FTS team just released version 4.0 data, we used version 3.5/3.6 data because they are still the recommended data set for scientific and validation studies at the time of writing. Data quality flags based on Sheese et al. (2015) are provided in version 3.5/3.6 netCDF files. All ACE-FTS data with a non-zero flag value were excluded from this study (ACE-FTS data usage guide and file description, 2017).

OMPS LP
The Ozone Mapping and Profiler Suite (OMPS) was launched in October 2011 on board the Suomi National Polar-orbiting Partnership (NPP) satellite. OMPS consists of three ozone-acquiring sensors (Flynn et al., 2006) designed to provide profile and total ozone measurements. All three sensors measure scattered solar radiances in overlapping spectral ranges and scan the same air masses within 10 min. The nadir module combines two sensors, the Total Column Nadir Mapper (TC-NM) for measuring total column ozone and the Nadir Profiler (NP) for ozone vertical profiles. The Limb Profiler (LP) module is designed to measure vertical profiles of ozone with higher vertical resolution (~2-3 km) from the upper troposphere to the mesosphere. In this study, we use OMPS ozone profile products from the Limb Profiler (OMPS LP).
The OMPS LP sensor is based on principles tested in the 1990s by flying the Shuttle Ozone Limb Sounding Experiment on two space shuttle missions, STS-87 and STS-107 (Flittner et al., 2000; McPeters et al., 2000). OMPS LP measures solar radiances scattered from the atmospheric limb in UV and VIS spectral ranges to retrieve ozone profiles with a high vertical resolution. The OMPS LP algorithm retrieves ozone profiles independently from UV and VIS measurements using wavelength pairs in the UV range and triplets in the VIS range (Rault & Loughman, 2013). Measured radiances are first normalized with radiances measured at 55.5 km and 40.5 km for UV and VIS retrievals, respectively. In this study, we use the most recent version 2.5 that was described and validated in Kramarova et al. (2018). Comparisons of ozone profiles derived from OMPS LP with MLS, OSIRIS, and ACE-FTS demonstrated that between 18 and 42 km, the mean biases are within ±10%, with the exception of the northern high latitudes where larger negative biases are observed between 20 and 32 km due to a thermal sensitivity issue (Kramarova et al., 2018). In the upper stratosphere and lower mesosphere (>43 km) OMPS LP tends to have a negative bias against Aura MLS, ACE-FTS, and OSIRIS instruments. In the UTLS below 15-18 km, especially in the tropics, negative biases increase up to ~30%. A positive drift of 0.5% yr⁻¹ against MLS and OSIRIS was found that was more pronounced at altitudes above 35 km. Such a pattern is consistent with a possible 100 m drift in the LP sensor pointing detected in the analysis of LP radiances (Kramarova et al., 2018).

Ozonesondes
Ozonesondes are balloon-borne in situ instruments that can provide ozone profiles from the surface to the middle atmosphere (~30-35 km) with a high vertical resolution (~100-150 m). When standard operating procedures are followed, the three most commonly used sonde types produce consistent results. For altitudes between the tropopause and ~28 km, the systematic biases are less than 5% with precision better than 3% (Smit and ASOPOS panel, 2014). At higher and lower altitudes, the ozonesonde data quality degrades and the differences between different sonde types become larger. In the troposphere, the Electrochemical Concentration Cell (ECC) type sondes have the best quality with estimated accuracy of 5-7% and a precision of 3-5% (Smit and ASOPOS panel, 2014). Ozonesonde data from the Southern Hemisphere Additional Ozonesondes (SHADOZ) network (Witte et al., 2017), World Ozone and Ultraviolet Radiation Data Center (WOUDC, https://woudc.org), and National Oceanic & Atmospheric Administration (NOAA) (https://www.esrl.noaa.gov/gmd/ozwv/ozsondes/) are used to evaluate the SAGE III/ISS data. Ozonesonde stations used in this study are listed in Table 1.

Stratospheric Ozone Lidar
The Differential Absorption Lidar (DIAL) is a powerful technique to measure the vertical distribution of ozone in the stratosphere and troposphere with a vertical resolution of several hundred meters near the tropopause to 3-5 km in the upper stratosphere (Godin et al., 1999;Leblanc et al., 2016a). This technique uses two (or more) laser wavelengths which are chosen such that one has strong ozone absorption and the other has much lower absorption. The concentration of ozone is retrieved by measuring the different intensities of the backscatter light at two wavelengths. The choice of selected laser wavelengths depends on whether the measurement is intended for the troposphere or stratosphere (Mégie et al., 1985).
We used stratospheric ozone lidars in the Network for the Detection of Atmospheric Composition Change (NDACC, http://www.ndacc.org), which provide ozone number density versus geometric altitude profiles between the tropopause and 45-50 km. The precision of NDACC ozone lidar data is ~1% up to 30 km, 2-5% at 40 km, and 5-25% at 50 km (Keckhut et al., 2004). Intercomparisons of different processing algorithms within the NDACC network indicate that the biases in retrieved ozone are ~2% for altitudes between 20 and 35 km and increase to ~5-10% at other altitudes (Keckhut et al., 2004; Leblanc et al., 2016b). Those larger biases are due to lower signal-to-noise ratio or saturation of the detectors. By comparing lidars with ozonesondes and satellites, Nair et al. (2012) also showed biases less than ±5% in the lidar for altitudes between 20 and 40 km. In this study, we used data from five stratospheric ozone lidars in the NDACC network (Table 2) that provide data overlapping with SAGE III/ISS.

Methodology
To evaluate the quality of SAGE III/ISS ozone data with correlative measurements, we need to consider impacts from (1) spatial/temporal differences (mismatch), (2) different horizontal and vertical resolutions (smoothing), and (3) converting ozone profiles to different coordinates (auxiliary) (Hubert et al., 2016; von Clarmann, 2006). Common coincidence criteria are used to minimize the effect of spatial and temporal differences (i.e., mismatch error) between SAGE III/ISS and correlative measurements. For satellite comparisons, coincident profiles need to be on the same date with a latitude difference of less than ±2° and a distance between them of less than 1,000 km. When more than one correlative ozone profile coincides with a SAGE III/ISS ozone profile, the closest one in space is used. For comparisons with ground-based measurements, more relaxed coincidence criteria are used, with temporal differences of ±24 hr, spatial differences of ±5° in latitude, and distance less than 1,000 km. The more relaxed criteria for ground-based measurements ensure that there are enough correlative data to characterize the bias and precision of SAGE III ozone while minimizing the effects of temporal and spatial variability.
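The satellite coincidence criteria above (same date, latitude difference below 2°, distance below 1,000 km, closest profile in space) can be sketched as follows. The dictionary keys, the function names, and the use of a spherical Earth with a haversine distance are assumptions made for this illustration; the paper does not specify how distances were computed.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def closest_coincidence(target, candidates, max_dlat=2.0, max_dist_km=1000.0):
    """Return the spatially closest candidate meeting all criteria, or None.

    target and candidates are dicts with 'date', 'lat', and 'lon' keys
    (a hypothetical record layout used only in this sketch).
    """
    best, best_dist = None, None
    for cand in candidates:
        if cand["date"] != target["date"]:
            continue  # must fall on the same date
        if abs(cand["lat"] - target["lat"]) >= max_dlat:
            continue  # latitude difference too large
        dist = great_circle_km(target["lat"], target["lon"],
                               cand["lat"], cand["lon"])
        if dist >= max_dist_km:
            continue  # too far away
        if best is None or dist < best_dist:
            best, best_dist = cand, dist
    return best
```

For the ground-based comparisons, the same function could be reused with `max_dlat=5.0`, with the same-date test replaced by a ±24 hr window on full timestamps.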
There is no good way to minimize the effect of different horizontal resolutions between instruments (e.g., satellite measurement vs. ozonesondes); the ozone profiles from instruments with finer vertical resolution, however, can be smoothed before comparison to minimize the biases due to different vertical resolutions. For comparisons between SAGE III and MLS, the SAGE III ozone profiles were interpolated to MLS levels using a least-squares linear fit method recommended by the MLS science team (Livesey et al., 2018). The MLS averaging kernels and a priori profiles were not applied to interpolated SAGE III ozone profiles (e.g., Rodgers & Connor, 2003), because the effect of further smoothing by applying MLS averaging kernels has been shown to be very small (e.g., Adams et al., 2014). This is because the MLS averaging kernels are close to delta functions (sharply peaked and with vertical resolution comparable to the MLS retrieved profile level spacing). Finally, the MLS and SAGE III ozone number density profiles at varying geometric altitudes were linearly interpolated to every 1 km interval.
ACE-FTS ozone has a vertical resolution of~3-4 km. Ozone data are retrieved at tangent altitudes, with vertical spacing of~1.5 km at lower altitudes increasing to~6 km in the mesosphere. Retrieved ozone profiles are then interpolated to a 1 km interval by using a piecewise quadratic method. To minimize the effect of different vertical resolutions, the SAGE III/ISS ozone profiles were first smoothed at ACE-FTS retrieved tangent altitudes using a weighted Gaussian distribution function with a full width at half maximum (FWHM) that approximates the vertical resolution of ACE-FTS (Kar et al., 2007;Sheese et al., 2017). The smoothed SAGE III ozone profiles were subsequently interpolated to a 1 km grid before comparing with ACE-FTS data. Alternatively, the SAGE III ozone profiles can be smoothed by a triangular function with full width at the bases equal to the vertical resolution of ACE-FTS (Dupuy et al., 2009). It has been found that the choice of smoothing function (e.g., triangular or Gaussian function) does not introduce systematic bias when comparing ozone profiles with different vertical resolutions although it may introduce a slight difference in random errors (Hubert et al., 2016). The OSIRIS and OMPS LP have similar vertical resolutions of~2 km in most of the stratosphere and~3 km in the upper stratosphere and lower mesosphere. Similarly, the SAGE III ozone profiles were smoothed by the Gaussian distribution with FWHM corresponding to the vertical resolution of OSIRIS and OMPS LP. The ground-based ozonesondes and lidar (in the UT/LS regions) have higher vertical resolution than SAGE III. Correlative ozone profiles from ozonesondes and lidar, therefore, were smoothed (by Gaussian function) according to the SAGE III resolution (~1 km) before further intercomparisons.
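As a rough illustration of the Gaussian smoothing step, the sketch below degrades a finer-resolution profile to a coarser vertical resolution by taking a normalized Gaussian-weighted average centred on each output altitude. The function name and arguments are hypothetical, and details such as edge handling and the altitude dependence of the FWHM are simplified relative to what a real comparison would require.

```python
import math


def gaussian_smooth(z_fine_km, x_fine, z_out_km, fwhm_km):
    """Smooth a profile to a coarser vertical resolution.

    z_fine_km, x_fine : altitudes and values of the high-resolution profile
    z_out_km          : altitudes at which smoothed values are wanted
    fwhm_km           : full width at half maximum of the Gaussian weights
    """
    # Convert FWHM to the Gaussian standard deviation.
    sigma = fwhm_km / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    smoothed = []
    for zc in z_out_km:
        weights = [math.exp(-0.5 * ((z - zc) / sigma) ** 2) for z in z_fine_km]
        total = sum(weights)
        # Normalized weighted mean, so a constant profile is left unchanged.
        smoothed.append(sum(w * x for w, x in zip(weights, x_fine)) / total)
    return smoothed
```

Normalizing by the sum of the weights keeps the smoothing from changing the mean level of the profile, including near the top and bottom of the altitude range where the Gaussian is truncated.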
In order to compare collocated ozone profiles between SAGE III/ISS and correlative measurements, those profiles need to be on the same coordinate. Due to an altitude registration error in current SAGE III/ISS v5.1 temperature and pressure data (see discussion in section 2), we used ozone in the SAGE III native retrieval coordinate, number density on geometric altitude. Ozone profiles in different coordinates (e.g., mixing ratio on pressure or mixing ratio on geometric altitude) from Aura MLS, ACE-FTS, and ozonesondes were converted to SAGE III native coordinates using their own observed temperature data, except for Aura MLS. Although Aura MLS also measures temperatures and retrieves geopotential heights (GPH) along with each ozone profile, there are seasonally and latitudinally repeating systematic errors in GPH (Livesey et al., 2018). The assimilated meteorology fields from the MERRA-2 (Global Modeling and Assimilation Office [GMAO], 2015), therefore, were used. The MERRA-2 temperatures (with resolution of 0.625°in longitude, 0.5°in latitude, 72 model layers from surface to 0.01 hPa, and every 3 hr) were first interpolated to MLS locations and pressure levels. The geopotential heights at MLS pressure levels were then derived by using the hypsometric equation and reference altitude from MERRA-2. With interpolated MERRA-2 temperatures and geopotential heights corresponding to the MLS grid, the original MLS ozone profiles can be converted to number densities on geometric altitudes.
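The two conversions involved in this step, integrating the hypsometric equation upward from a reference level and converting volume mixing ratio to number density with the ideal gas law, can be sketched as below. The function names are hypothetical, and the sketch assumes dry air and a simple layer-mean temperature; the further step from geopotential height to geometric altitude is not shown.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J K^-1
R_DRY = 287.05       # specific gas constant of dry air, J kg^-1 K^-1
G0 = 9.80665         # standard gravity, m s^-2


def vmr_to_number_density(vmr, p_hpa, t_k):
    """Ozone number density (molecules cm^-3) from volume mixing ratio."""
    air_density_m3 = (p_hpa * 100.0) / (K_B * t_k)  # ideal gas law, m^-3
    return vmr * air_density_m3 * 1.0e-6            # m^-3 -> cm^-3


def geopotential_heights(p_hpa, t_k, h_ref_m):
    """Geopotential height (m) at each pressure level.

    p_hpa must be ordered from the reference level upward (decreasing
    pressure); h_ref_m is the geopotential height of the first level.
    """
    heights = [h_ref_m]
    for i in range(1, len(p_hpa)):
        t_mean = 0.5 * (t_k[i - 1] + t_k[i])  # layer-mean temperature
        thickness = R_DRY * t_mean / G0 * math.log(p_hpa[i - 1] / p_hpa[i])
        heights.append(heights[-1] + thickness)
    return heights
```

In practice the reference height and temperatures would come from the reanalysis fields interpolated to each profile location, as described in the text.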
To assess the overall quality of SAGE III/ISS ozone data with correlative measurements, we use the following two metrics: the mean relative differences and the standard deviations of relative differences. The mean bias (relative difference), $\bar{D}(z)$, in percentage is defined as

$$\bar{D}(z) = \frac{100}{n(z)} \sum_{i=1}^{n(z)} \frac{x_{s,i}(z) - x_{c,i}(z)}{x_{c,i}(z)},$$

where n(z) is the number of coincident profiles and $x_s(z)$ and $x_c(z)$ are ozone number density at a particular altitude (z) from SAGE III and correlative measurement, respectively, with the index i denoting the ith coincident pair. The SAGE III reported uncertainty along with retrieved ozone contains random errors from three primary sources: (1) line-of-sight optical depth measurement error, (2) estimated Rayleigh scattering, and (3) uncertainty associated with removal of contributions from interfering gases and aerosol (SAGE III ATBD, 2002). In order to verify SAGE III reported random errors and provide additional information regarding the significance of the bias and the upper limit of the precision of SAGE III/ISS ozone data, we calculate the standard deviation of bias-corrected differences. The de-biased standard deviation is a measure of the combined precision of instruments that are being compared (von Clarmann, 2006) and is represented as

$$\sigma(z) = \sqrt{\frac{1}{n(z) - 1} \sum_{i=1}^{n(z)} \left[ D_i(z) - \bar{D}(z) \right]^2},$$

where n(z) is the number of coincidences, $D_i(z)$ is the relative difference for the ith coincident pair, and $\bar{D}(z)$ is the mean relative difference at a particular altitude (z).
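The two metrics can be sketched at a single altitude as follows. This is a minimal illustration that assumes the correlative measurement is the reference in the denominator and that the standard deviation uses n − 1 in its normalization; the function names and sample values are hypothetical.

```python
import math

def relative_differences(x_sage, x_corr):
    """Percent differences of SAGE III relative to the correlative data."""
    return [100.0 * (s - c) / c for s, c in zip(x_sage, x_corr)]

def mean_bias(diffs):
    """Mean relative difference over all coincident pairs."""
    return sum(diffs) / len(diffs)

def debiased_std(diffs):
    """Standard deviation of the differences about their mean (n - 1)."""
    d_bar = mean_bias(diffs)
    return math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (len(diffs) - 1))

# Hypothetical coincident number densities at one altitude (arbitrary units).
x_sage = [4.2, 4.0, 4.4]
x_corr = [4.0, 4.0, 4.0]
diffs = relative_differences(x_sage, x_corr)
bias = mean_bias(diffs)
spread = debiased_std(diffs)
```

In practice the same calculation is repeated at every altitude level, using only the pairs that have valid data at that level.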

Comparisons of the SAGE III/ISS Solar Ozone Between AO3 and MLR Algorithms and Between Sunrise and Sunset Measurements
As mentioned earlier in section 2, SAGE III/ISS produces two solar ozone products based on the ozone absorption in the Chappuis band by two different retrieval algorithms. The mean differences and reported uncertainties from these two ozone products are shown in Figure 4. The mean differences between AO3 and MLR ozone are negligible between 20 and 50 km (i.e., less than 0.5%) but become larger toward higher or lower altitudes. For altitudes above 50 km, the MLR ozone shows increasing high biases, reaching ~20-30% at 60 km. In the lower stratosphere below 20 km, the MLR ozone also shows increasing high biases (with decreasing altitudes), as large as ~20% at 10 km. As expected, both MLR and AO3 ozone show the smallest uncertainties around the ozone peak area. The uncertainties become larger toward higher and lower altitudes where there is less ozone or larger contributions from other interfering trace gases and aerosol in the retrieval algorithms. The reported uncertainties in MLR ozone are a few percent between 20 and 30 km. They become larger than 100% for altitudes above ~55 km and below 10 km. The mean uncertainties in AO3 ozone are approximately 2-3 times smaller than those of MLR ozone.
Using the residual analysis detailed in Damadeo et al. (2014), we can get an assessment of random errors in AO3 and MLR ozone. The time series of observed ozone (averaged within a specific temporal/spatial window) contains information about the natural variability and instrument uncertainties. The natural variability of ozone can be approximated by a regression model with predictors for seasonal cycle, long-term trend, quasi-biennial oscillation (QBO), solar cycle, and so forth. The spread of the residuals from the regression of observed ozone data can be used to ascertain the quality of the regression model and relative random errors in AO3 and MLR ozone retrievals. The total residuals consist of the correlated and uncorrelated residuals. The correlated residuals come from autocorrelation within the data and typically represent the natural variability that is not well represented by the regression model. Uncorrelated residuals represent a combination of measurement uncertainty and geophysical variability that is not well-sampled (e.g., zonal variability within the daily zonal means used for this analysis). For the purpose of this validation study, we only care to look at the uncorrelated residuals as an indication of data quality or precision. Since the choice of regression model has little bearing on the uncorrelated residuals, a rather simplistic model consisting only of a seasonal cycle (a 12-month Fourier series with four harmonics) was used for this analysis, applied to all SAGE III/ISS data between June 2017 and May 2019.
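The seasonal-cycle regression can be sketched as an ordinary least-squares fit of a constant plus four annual harmonics, with the residuals taken as data minus fit. This is a simplified illustration: the separation of correlated and uncorrelated residuals described in Damadeo et al. (2014) is not reproduced, and the function names and synthetic time series are hypothetical.

```python
import math

def design_row(t_years, n_harmonics=4):
    """Regressors at time t: constant plus cos/sin pairs of annual harmonics."""
    row = [1.0]
    for k in range(1, n_harmonics + 1):
        phase = 2.0 * math.pi * k * t_years
        row += [math.cos(phase), math.sin(phase)]
    return row

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_seasonal(t_years, y, n_harmonics=4):
    """Least-squares fit via the normal equations; returns (coeffs, residuals)."""
    rows = [design_row(t, n_harmonics) for t in t_years]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    coeffs = solve_linear(ata, aty)
    fitted = [sum(c * x for c, x in zip(coeffs, r)) for r in rows]
    return coeffs, [v - f for v, f in zip(y, fitted)]

# Hypothetical 2-year series sampled every 5 days (73 samples per year).
t = [i / 73.0 for i in range(146)]
y = [5.0 + 2.0 * math.cos(2 * math.pi * ti) + 0.5 * math.sin(4 * math.pi * ti)
     for ti in t]
coeffs, residuals = fit_seasonal(t, y)
```

With real daily zonal means, the spread of `residuals` would combine measurement noise with unsampled geophysical variability, which is why the text treats it as an upper limit on the uncertainty.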
The spreads of the uncorrelated residuals from the regression of AO3 and MLR ozone are shown in Figure 5, which can provide an estimate of the upper limit of uncertainties in both datasets. This is an upper limit because zonal variability within each daily zonal mean used for this analysis will also increase the uncorrelated residuals. However, since the sampling is identical between the two data products, a direct comparison of the uncorrelated residuals yields information about the intrinsic data quality of each data product independent of any correlative source instrument. We can see that the uncorrelated residuals are similar throughout most of the stratosphere between the two products (~1-3%). The MLR ozone, however, is significantly noisier than the AO3 product both in the upper-most stratosphere and mesosphere and in the lowermost stratosphere and troposphere. These results are similar to those from a study (Wang et al., 2006) of SAGE III/M3M data using comparisons with other correlative data sets. While useful as an independent comparison of the relative data quality of the two data products, evaluating the statistics of the uncertainties (or precisions) for individual profiles via comparisons of correlative measurements can help mitigate the impact of the dynamical variability in the regression sample size (i.e., a daily zonal mean) and will be evaluated in later sections.
It has been reported that there is a difference in observed ozone values between sunrise and sunset from solar occultation instruments (Brühl et al., 1996; Kyrölä et al., 2013; Sakazaki et al., 2015; Wang et al., 1996). Measurements from the Halogen Occultation Experiment (HALOE), ACE-FTS, and Superconducting Submillimeter-Wave Limb-Emission Sounder (SMILES) show that the sunset values are higher than sunrise by 3-5% between 40 and 50 km (Sakazaki et al., 2015). SAGE II shows similar features as HALOE, ACE-FTS, and SMILES, but the magnitude of sunrise/sunset differences is approximately twice as large as those from other satellites, especially in the tropics during January (Wang et al., 1996). Based on observations from SMILES and the Specified Dynamic version of the Whole Atmosphere Community Climate Model (SD-WACCM), Sakazaki et al. (2013, 2015) attribute the observed sunrise/sunset differences in the upper stratosphere to vertical transport by atmospheric tidal winds, which reach a maximum in the tropics and during the winter season (December to February). The reason for the larger sunrise/sunset differences in SAGE II is not clear, but it is worth investigating whether a similar situation occurs in the SAGE III/ISS ozone data.
To investigate the sunrise/sunset differences in SAGE III/ISS retrieved ozone, we used two different methods. The first one is to apply the regression model described in Damadeo et al. (2018) to both SAGE II and SAGE III/ISS data simultaneously to derive the mean difference between sunrise and sunset data. In this regression model, special techniques are used to account for the nonuniform temporal, spatial, and diurnal sampling of the different instruments, as well as biases between them. There is currently insufficient sampling orthogonality within the SAGE III/ISS data set to differentiate seasonal variability from diurnal variability, so including SAGE II data (given its own diurnal cycle) helps constrain this. The lack of overlap between the two data sets is accounted for by considering SAGE III/ISS as an extension of the SAGE II product (i.e., not including any proxy terms that represent overall bias or drift between the instruments in the regression), which is acceptable since we are not interested in trend results in this work. The results are shown in Figure 6. Both AO3 and MLR ozone show similar results, with sunset values higher than sunrise by ~5-10% in the upper stratosphere, though the pattern of differences is more coherent for the AO3 product than the MLR product. The sunrise values, however, become slightly larger than sunset in the lower stratosphere below 25 km. The sunrise/sunset differences are also larger in the tropics than midlatitudes. The vertical and latitudinal distributions of sunrise/sunset differences are consistent with the dynamical variations from atmospheric tidal winds (Sakazaki et al., 2013, 2015). As a second method, we used Aura MLS as a transfer standard to evaluate the differences between SAGE III/ISS sunrise and sunset measurements. Figure 7 shows comparison results between SAGE III/ISS AO3 ozone, separated by sunrise or sunset, and coincident Aura MLS nighttime measurements.
As shown in Figure 7, SAGE III/ISS sunset values are systematically higher than sunrise values by ~5-8% for altitudes between 40 and 55 km. In the lower stratosphere between the tropopause and ~25 km, the sunrise values become slightly larger (less than 5%) than sunset values. Similar results were also found by using MLR ozone compared against collocated Aura MLS data or comparing sunrise and sunset measurements directly (e.g., Wang et al., 1996) when they were observed on the same dates and approximately at the same locations (e.g., ±1° latitude, ±5° longitude, figures not shown). The reason for the large sunrise/sunset difference in SAGE retrieved ozone in the upper stratosphere is not clear, but since it occurs in both SAGE II and SAGE III/ISS, it could relate to the retrieval algorithm and needs further investigation.

Comparisons Between SAGE III/ISS and Other Satellites
Among the correlative satellite instruments, the Aura MLS provides the most comprehensive global coverage (from 82°S to 82°N) each day with equatorial crossing times of ~1:45 a.m. and 1:45 p.m. The comparisons between SAGE III/ISS retrieved stratospheric ozone products and Aura MLS nighttime measurements are shown in Figure 8. We used MLS nighttime measurements to minimize the effect of the ozone diurnal cycle on the differences between SAGE III and MLS, since the SAGE III measurements occur during sunrise and sunset, which in general yield ozone values that are closer to nighttime than daytime ozone (Parrish et al., 2014; Sakazaki et al., 2013).
SAGE III/ISS AO3 ozone shows very good agreement with Aura MLS for altitudes between ~20 and 55 km, with differences less than 5%. The differences become larger toward the lower stratosphere and upper troposphere and reach ~10% near the tropopause, with SAGE III ozone values higher than MLS. Above 55 km the SAGE III ozone values are systematically lower than those from Aura MLS with negative biases of ~10% at 60 km and 40-60% at 65 km. The larger biases (e.g., >40%) between SAGE III and Aura MLS in the mesosphere cannot be completely explained by the ozone diurnal cycle (e.g., sunrise/sunset vs. nighttime) (Parrish et al., 2014). These biases could result from errors in the MERRA-2 temperature data in the mesosphere and/or deficiencies in the SAGE III AO3 retrieval algorithm. We used MERRA-2 data to convert MLS ozone from mixing ratio and pressure coordinates to SAGE's native number density and geometric altitude coordinates. Any systematic error in auxiliary temperature and pressure data can lead to errors in converted MLS ozone profiles, but the evaluation of MERRA-2 temperature data in the mesosphere is outside the scope of this paper. Since the SAGE III AO3 ozone product is retrieved using the Chappuis band, the weakly attenuated signals in the mesosphere could yield degraded results in that region. Instead, the SAGE III/ISS MES algorithm may provide more information for mesospheric ozone after correcting for the stray light problem.
The SAGE III/ISS MLR ozone shows similar features as AO3 when compared against Aura MLS. The relative differences with MLS are less than 5% between 20 and 55 km for all latitudes. The differences, however, become larger at higher and lower altitudes. In the lower mesosphere above 60 km, SAGE III MLR ozone shows positive biases of 20% or more for some latitudes. This is contrary to what is expected from the ozone diurnal cycle. SAGE III MLR ozone also shows positive biases in the lower stratosphere, with mean differences of approximately 10-30% in the middle to high latitudes and greater than 60% near the tropical tropopause.
The mean relative differences and standard deviations between SAGE III/ISS AO3 and MLR ozone against Aura MLS are summarized in Figure 9. Between the two SAGE III retrieved solar ozone products, the AO3 shows overall better accuracy and precision than MLR ozone. The systematic biases in AO3 ozone are less than 3% from ~15 km to 55 km in the midlatitudes and ~20 km to 55 km in the tropics. The biases increase with decreasing altitudes and reach ~10% near the tropopause. The differences between SAGE III AO3 and MLS also become larger for altitudes above 55 km due to an increase of the ozone diurnal cycle. The SAGE III/MLS differences oscillate with altitude in the lower stratosphere and upper troposphere (UT/LS), especially in the tropics. This mainly results from Aura MLS, which reports ozone on a slightly finer vertical grid than its actual vertical resolution in that region (Livesey et al., 2018). SAGE III MLR ozone shows similar biases as AO3 for altitudes between 20 and 50 km, but the biases become larger outside those altitudes. This is consistent with the earlier results of direct comparisons between SAGE III AO3 and MLR ozone data (Figure 4). The MLR retrieved ozone also shows larger uncertainties than AO3 in the upper stratosphere and lower mesosphere (above 40 km) and in the UT/LS regions (below 20 km), as indicated by the larger standard deviations in Figure 9, which is consistent with results from the independent regression analysis shown in Figure 5. Similar features are also found in comparisons between SAGE III MLR ozone and other satellites (figures not shown). Because of the larger uncertainties and biases in MLR ozone for altitudes above 50 and below 20 km, we recommend using SAGE III AO3 ozone for scientific studies. In the following sections, we will just focus on validation results for SAGE III AO3 ozone.
The comparisons between SAGE III/ISS AO3 ozone and ACE-FTS, OSIRIS, and OMPS LP are shown in Figure 10. Both SAGE III and ACE-FTS use solar occultation techniques to measure ozone. Due to limitations of the orbit geometry, there are no collocated SAGE III/ACE-FTS ozone profiles in the regions between the equator and 20°S and poleward of 60°S. The differences between SAGE III and ACE-FTS are in general within 5% between 15 and 45 km. Above 45 km, SAGE III shows a negative bias of ~10%. Below 15-20 km, SAGE III values become larger than ACE-FTS by 10-20% in midlatitudes (Figure 10a). This is consistent with an earlier study, which shows ACE-FTS v3.5 ozone has a positive bias of 10-20% in the upper stratosphere and mesosphere (>45 km) and negative bias of 20-30% in the UT/LS (Sheese et al., 2017). SAGE III and OSIRIS show the best agreement between 20 and 50 km. The differences are generally within 5%, except in the northern hemisphere around 30 km, where the differences are slightly larger than 5% (Figure 10b). The reason for this hemispheric difference is not known, but it does not occur in the comparisons of SAGE III against Aura MLS and ACE-FTS.
We can see that all satellite measurements show good agreement, with differences less than or slightly larger than 5%, in the middle stratosphere, except for OMPS LP in the northern midlatitudes near 28-31 km (Figures 10c and 11). This is due to the thermal sensitivity problem in the OMPS LP instrument, which causes negative biases of 10-15% in retrieved ozone from the visible spectral ranges (Kramarova et al., 2018). In the upper stratosphere and lower mesosphere (e.g., above ~45 km) the differences between SAGE III and other correlative measurements become larger. This is due to the ozone diurnal cycle and/or known biases in those datasets. For example, SAGE III shows negative biases of 5-10% relative to ACE-FTS in the upper stratosphere and lower mesosphere. This is due to known positive biases in ACE-FTS ozone in those regions (Sheese et al., 2017). SAGE III also shows altitude-dependent high biases versus OMPS LP, with mean differences of ~5% at 45 km and ~15-20% at 52 km (Figure 11). This is an artifact resulting from the known low biases (~10%) in OMPS LP ozone in the upper stratosphere and lower mesosphere (Kramarova et al., 2018) and the ozone diurnal cycle. In the upper stratosphere and mesosphere, the ozone levels show a strong depletion during the daytime and recover at night. The OMPS LP measurements mainly occur during daytime (e.g., at local solar time ~1:30 p.m.), while SAGE III takes measurements during sunrise and sunset when ozone values are closer to nighttime measurements. The day-night ozone differences are ~10% at 50 km and increase to ~60% at 65 km (Parrish et al., 2014). The low biases in OMPS LP ozone for altitudes above 45 km, therefore, would be further enhanced by the ozone diurnal cycle when compared with SAGE III and result in the altitude-dependent structure shown in Figure 11.
The comparisons between SAGE III and OSIRIS ozone for altitudes above ~50 km show similar features (e.g., altitude-dependent biases) as those in SAGE III/OMPS LP comparisons. OSIRIS is on a sun-synchronous satellite, which observes ozone mainly at local solar time between 6:30 and 7:30 a.m. (closer to daytime ozone values). The observed differences between SAGE III and OSIRIS for altitudes above 50 km are consistent with what we expect from day-night ozone differences. The effects of the ozone diurnal cycle on the comparisons between SAGE III and Aura MLS or ACE-FTS in the upper stratosphere and lower mesosphere are smaller. This is because MLS nighttime measurements (~1:45 a.m.) were used in this study, and ACE-FTS also makes measurements during local sunrise or sunset.
In the lower stratosphere and upper troposphere, SAGE III ozone in general shows high biases against other correlative satellite measurements, with mean relative differences of ~5-10% against Aura MLS and ACE-FTS from 20 km down to the tropopause. Most, if not all, of this bias is likely the result of the O4 spectroscopy problem discussed in section 2.2. The differences between SAGE III and OSIRIS and OMPS LP are larger (~10-20%) in the southern hemisphere midlatitudes and in the tropics. This is most likely related to low biases in OSIRIS and OMPS LP ozone measurements in the UT/LS regions (Adams et al., 2014; Kramarova et al., 2018).
The standard deviations of relative differences between SAGE III and other satellite measurements, except ACE-FTS, show similar magnitudes and vertical structures. The smallest standard deviations of ~5% are found in the middle stratosphere (e.g., between 20 and 40 km). The standard deviations increase to ~10% at 50 km and ~20% at 60 km. The smaller standard deviations of differences between SAGE III and ACE-FTS in the upper stratosphere and lower mesosphere are due to both instruments making observations during sunrise and sunset with smaller noise. Below 20 km the standard deviations also become larger. These increases result from both measurement uncertainties and mismatch (inexact coincidence) between SAGE III and other satellites. The lower stratosphere and upper troposphere is a challenging area for satellite ozone observations. SAGE III ozone in the UT/LS will be further evaluated by ground-based measurements in the following section.

Comparisons Between SAGE III/ISS and Ground-Based Measurements
The ozonesondes and stratospheric ozone lidars were used to further evaluate the SAGE III/ISS ozone in the UT/LS region. The geolocations and data sources of ozonesondes and lidar and the number of coincident profiles found for each with SAGE III are listed in Tables 1 and 2, respectively. For ozonesondes the tropical stations are mainly from the Southern Hemisphere ADditional OZonesondes (SHADOZ) network (Witte et al., 2017). Although there are few coincident profiles (e.g., from 1 to 8) between SAGE III and individual ozonesonde stations in SHADOZ, the ozonesonde data have been processed with the same processing technique to minimize the inhomogeneities in ozonesonde data records. This enables us to group SHADOZ data in the tropics to provide better statistics for estimating SAGE III ozone biases in that region. Outside the tropical latitudes, ozonesondes from the WOUDC and NOAA Earth System Research Laboratories (ESRL) (Johnson et al., 2018) were used. There are five NDACC stratospheric ozone lidar stations that provide correlative measurements during the first 2 years of SAGE III operation (e.g., June 2017 to May 2019). Those stations are listed in Table 2.
Due to limited coincident profiles between SAGE III and ground-based measurements, the medians and spreads (defined as one half of the differences between the 84th and 16th percentiles) of relative differences are better diagnostics to represent the biases and random errors in SAGE III retrieved ozone. The median and spread are the same as the mean and standard deviation when the statistical sample has a Gaussian distribution (e.g., Wang et al., 2002). The occurrence of outliers in the distribution, however, can lead to larger standard deviations and introduce a discrepancy between the mean and median for a non-Gaussian (asymmetric) distribution. For comparisons between SAGE III (or other satellites) and ground-based measurements, there could be outliers in the statistical sample due to anomalous data not being filtered out and/or large dynamic variability in the UT/LS (i.e., mismatch between SAGE III and ground-based measurements). The median and spread are more robust statistics to minimize the effect of outliers, especially for a distribution with small sample size.

Figure caption (fragment): The relative percentage differences between SAGE III/ISS and lidar are shown in the middle panel. The mean and median relative differences are indicated by the black and red colors, respectively. The blue lines indicate differences estimated from averaged ozone profiles (see text). In the right panel, the standard deviations of mean and 1σ spreads of median differences are indicated by green and black lines, respectively. The standard deviations of coincident SAGE III/ISS (red) and lidar (blue) profiles are also shown.
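The median and spread diagnostics can be sketched as below, with the spread taken as one half of the difference between the 84th and 16th percentiles. The percentile routine uses linear interpolation between order statistics, which is an assumption about the convention; the function names and sample values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def median_and_spread(diffs):
    """Median and half the 84th-16th percentile range of the differences."""
    median = percentile(diffs, 50.0)
    spread = 0.5 * (percentile(diffs, 84.0) - percentile(diffs, 16.0))
    return median, spread

# Hypothetical relative differences (%) with a single large outlier.
diffs = [-2.0, -1.0, 0.0, 1.0, 2.0, 200.0]
median, spread = median_and_spread(diffs)
mean = sum(diffs) / len(diffs)
```

In this example the single outlier pulls the mean far from the bulk of the sample while the median stays near zero, which is the behavior that motivates using the median for small coincidence samples.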

Journal of Geophysical Research: Atmospheres
The comparison results between SAGE III and lidar are shown in Figure 12. The analysis is performed using all collocated profiles in three broad latitude bands: southern midlatitudes (60°S to 20°S), tropics (20°S to 20°N), and northern midlatitudes (20°N to 60°N). There is only one lidar station in each of the southern midlatitude and tropical bands, Lauder and Mauna Loa, respectively. For northern midlatitudes, measurements from Hohenpeissenberg, Observatoire de Haute-Provence (OHP), and Table Mountain Facility are used. Both SAGE III and lidar show maximum ozone concentrations near 22-23 km in the midlatitudes and 26-27 km in the tropics (Figure 12 left panel). The ozone variabilities indicated by the standard deviations generally increase from the upper stratosphere down to the lower stratosphere and upper troposphere. SAGE III and lidar observations show similar results with standard deviations between 10% and 20% for altitudes between 20 and 40 km. The standard deviations increase to ~50-60% in the UT/LS regions due to larger dynamical variability and smaller ozone amounts (Figure 12 right panel). The best agreement between SAGE III and lidar is found between 20 and 40 km. SAGE III shows a small positive bias of ~5% against all lidar observations except at Mauna Loa, where SAGE III ozone shows slightly larger high biases of ~5-10% between 30 and 40 km (Figure 12 middle panel). The reason for this is not clear, but SAGE III ozone is in good agreement (within 5%) with other satellites at the same altitude ranges in the tropics (Figure 11).
In the southern midlatitudes above ~42 km, SAGE III and Lauder ozone lidar show mean differences of 40% or larger and standard deviations greater than 60%. The median differences, however, are only ±10%. The larger mean differences and standard deviations, compared to medians and spreads, between SAGE III and Lauder in the upper stratosphere are driven by outliers in the lidar measurements. Those outliers also contribute to larger standard deviations in lidar observed ozone values (approximately a factor of 2 larger than those of SAGE III; Figure 12 right panel).
In the lower stratosphere below 20 km, the systematic (median) differences between SAGE III and lidar measurements are within 10% except for Lauder. The systematic biases between SAGE III and lidar can be approximated (to first order) by the relative difference between averaged SAGE III and lidar ozone values (e.g., $(\bar{S} - \bar{L})/\bar{L}$, where $\bar{S}$ and $\bar{L}$ indicate averaged ozone values from all collocated SAGE III and lidar profiles, respectively). This method can also minimize the impact of outliers. It yields similar results as those from the median of relative differences, except in the lower stratosphere at Lauder (Figure 12 middle panel). This is probably related to the fact that the number of coincident SAGE III and lidar ozone profiles at Lauder is too small (i.e., 13 profiles).
Similar analyses were performed between SAGE III and ozonesondes, and the results are shown in Figure 13. In the midlatitudes, SAGE III ozone values are generally biased high against ozonesondes with differences of ~5% for altitudes above 15 km. The biases increase toward the lower stratosphere and upper troposphere and reach ~10% at 12-13 km. The standard deviations (approximated by the spreads) of mean relative differences are ~5% near the ozone peaks and become larger above and below the peaks. The standard deviations increase to ~30-40% at 15 km and ~50% near the tropopause. The comparisons between SAGE III and ozonesondes in the tropics show similar vertical structure as those in the midlatitudes. SAGE III ozone values are systematically higher than sonde ozone values by ~5% for altitudes above 20 km. The biases increase rapidly toward the UT/LS and reach ~10% at 17-18 km and ~40% (or greater) at 15-16 km. It should be noted that comparison results for altitudes below 17 km in the tropics are not robust because both the standard deviations and spreads of relative differences are larger than those of SAGE III and ozonesonde measurements and their combined uncertainties (Figure 13). Similar situations also occur for altitudes below 12 km in the midlatitudes.

Estimated Biases and Precisions of SAGE III/ISS AO3 Ozone
The comparisons between SAGE III/ISS solar ozone data and correlative satellite and ground-based measurements are summarized in Figure 14. Since there is a known thermal sensitivity issue in the OMPS LP ozone data (Kramarova et al., 2018), the OMPS LP data between 28 and 32 km (e.g., Figure 11) were filtered out before calculating the means and standard deviations of relative differences between SAGE III and other satellites. There is no additional filtering for Aura MLS, ACE-FTS, OSIRIS, lidar, and ozonesonde data. The median and spread are used for comparisons between SAGE III and ground-based measurements for reasons discussed earlier. Based on these correlative measurements, the systematic error (bias) of SAGE III/ISS AO3 ozone in the stratosphere is less than 5% for altitudes down to 15 km in the midlatitudes and 20 km in the tropics. The biases increase toward lower altitudes and reach ~10% at the tropopause. In the southern hemisphere midlatitudes the SAGE III/ISS ozone shows a positive bias larger than 10% near 15 km compared to correlative satellite data (Figure 14). This is due to larger biases between SAGE III/ISS and OMPS LP and OSIRIS in that region (e.g., Figure 11). SAGE III/ISS, however, shows much better agreement (<10%) with Aura MLS and ozonesondes in the same region. The larger biases (>5%) between SAGE III/ISS and other satellites for altitudes above ~50 km are due to the diurnal cycle effects not being removed from the comparisons as discussed earlier in section 5.2.
The standard deviation of relative differences between SAGE III/ISS and correlative measurements can be used as an approximation of measurement uncertainty in the SAGE III instrument. However, this becomes invalid when a large fraction of the standard deviations is due to temporal/spatial mismatches and/or large uncertainties in the correlative measurements. The variance of the differences between SAGE III and collocated measurements contains uncertainties from not only SAGE but also correlative measurements and from uncertainties associated with natural variability (e.g., Sofieva et al., 2014):

$$\sigma^2(x_s - x_c) = \sigma^2(x_s) + \sigma^2(x_c) + \sigma^2(\mathrm{nat}),$$

where $x_s$ and $x_c$ are SAGE III and correlative measurements, respectively. The $\sigma^2(\mathrm{nat})$ is the variance contributed by the natural variability, which can be minimized by using stricter coincidence criteria. The uncertainties of satellite measurements generally become larger with decreasing altitude toward the UT/LS regions. This can be seen in Figure 14, where the standard deviations of relative differences between SAGE III and correlative satellite measurements increase from ~5% at 20 km to ~50-60% near 10 km. Although the ground-based measurements (e.g., ozonesondes) have better precision in the UT/LS region, the mismatch errors between SAGE III and ground-based measurements are larger (e.g., due to more relaxed coincidence criteria). Furthermore, the satellite measurements cover a larger spatial domain while ground-based observations represent a much smaller area. The different horizontal resolution (e.g., smoothing error) could further enhance the mismatch error. For these reasons, the standard deviations between SAGE III and ground-based measurements are similar to or even larger than those in SAGE III and satellite comparisons (Figure 14).
To better assess the precision of SAGE III ozone measurements, especially in the UT/LS, we used the method in Fioletov et al. (2006). In Fioletov et al. (2006), it is assumed that paired measurements are perfectly collocated (i.e., no mismatch error). In reality, it is almost impossible to have SAGE III and correlative measurements at the same location and time. The effect of spatial and temporal differences, however, could be minimized by using tighter coincidence criteria. We used more stringent coincidence criteria of ±1° latitude, ±5° longitude, and the closest in time (within the same day) for this purpose. The estimated precisions of SAGE III AO3 ozone based on comparisons with correlative Aura MLS and OMPS LP data are shown in Figure 15. We did not use other correlative satellite or ground-based measurements because there were fewer coincident profiles with SAGE III compared to those with Aura MLS and OMPS LP.
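The stricter coincidence search can be sketched as follows. The sketch assumes that "within the same day" means the same UTC calendar date and that longitude differences wrap around the date line; the record layout (dictionaries with `lat`, `lon`, `time`, and `name` keys) and the function names are hypothetical.

```python
from datetime import datetime

def lon_diff(a, b):
    """Absolute longitude difference in degrees, wrapped to 0-180."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def find_coincidence(sage, candidates, max_dlat=1.0, max_dlon=5.0):
    """Closest-in-time candidate on the same date within the lat/lon box."""
    best = None
    for c in candidates:
        if c["time"].date() != sage["time"].date():
            continue  # assumption: same UTC calendar day
        if abs(c["lat"] - sage["lat"]) > max_dlat:
            continue
        if lon_diff(c["lon"], sage["lon"]) > max_dlon:
            continue
        dt = abs((c["time"] - sage["time"]).total_seconds())
        if best is None or dt < best[0]:
            best = (dt, c)
    return None if best is None else best[1]

# Hypothetical SAGE III event near the date line and four candidates.
sage = {"lat": 10.0, "lon": 179.0, "time": datetime(2018, 1, 1, 6, 0)}
candidates = [
    {"name": "A", "lat": 10.5, "lon": -178.0, "time": datetime(2018, 1, 1, 3, 0)},
    {"name": "B", "lat": 10.2, "lon": 178.0, "time": datetime(2018, 1, 1, 5, 0)},
    {"name": "C", "lat": 12.0, "lon": 179.0, "time": datetime(2018, 1, 1, 6, 0)},
    {"name": "D", "lat": 10.0, "lon": 179.0, "time": datetime(2018, 1, 2, 6, 0)},
]
match = find_coincidence(sage, candidates)
```

Tightening `max_dlat` and `max_dlon` reduces the mismatch contribution to the difference variance at the cost of fewer coincident pairs, which is the trade-off noted in the text.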
By comparing SAGE III/ISS against collocated Aura MLS measurements, the estimated precision of SAGE III ozone is approximately 3% (e.g., 2-4%) between 20 and 40 km and ~10-15% at 55 km (Figure 15). Below 20 km, the precisions of SAGE III ozone degrade toward lower altitudes and reach ~20-30% near the tropopause. Similar results can be seen in the comparisons between SAGE III and OMPS LP except in the tropical UT/LS region. Both analyses (SAGE III versus Aura MLS and SAGE III versus OMPS LP) show consistent results, which indicates that the derived precisions of SAGE III ozone data are robust. The estimated precisions of SAGE III ozone shown in Figure 15 are in general slightly larger than the random errors reported by the SAGE retrieval algorithm (e.g., Figure 4). This is probably due to the small residual effect of spatial and temporal differences between SAGE III and correlative measurements (mismatch error cannot be completely removed from the analyses by the coincidence criteria).
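One common variance-based estimator for paired measurements, in the spirit of the approach attributed to Fioletov et al. (2006), is sketched below. It assumes perfectly collocated pairs with independent, zero-mean errors, so that var(s) = var(truth) + σ_s², var(c) = var(truth) + σ_c², and var(s − c) = σ_s² + σ_c². Whether this matches the exact formulation used for Figure 15 is an assumption; the function names and synthetic data are hypothetical.

```python
import math

def variance(values):
    """Sample variance with n - 1 normalization."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def estimated_precision(x_s, x_c):
    """Estimate the random error of x_s from paired measurements.

    Solving the three variance relations for the error variance of x_s
    gives 0.5 * (var(x_s) - var(x_c) + var(x_s - x_c)).
    """
    diffs = [s - c for s, c in zip(x_s, x_c)]
    var_s = 0.5 * (variance(x_s) - variance(x_c) + variance(diffs))
    return math.sqrt(max(var_s, 0.0))  # clip small negative sampling noise

# Synthetic example: a common "truth" plus mutually orthogonal error patterns.
truth = [0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0]
err_s = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
err_c = [2.0, 2.0, -2.0, -2.0, 2.0, 2.0, -2.0, -2.0]
x_s = [t + e for t, e in zip(truth, err_s)]
x_c = [t + e for t, e in zip(truth, err_c)]
sigma_s = estimated_precision(x_s, x_c)
```

Any residual mismatch between the paired profiles adds to the variance of the differences and therefore inflates the estimate, consistent with the estimated precisions being slightly larger than the reported random errors.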

Conclusions
The Stratospheric Aerosol and Gas Experiment III on the International Space Station (SAGE III/ISS) was launched in February 2017 and started routine operation in June 2017. It is the second SAGE III instrument but with better latitudinal coverage. Similar to SAGE II, it provides near-global observations on a monthly basis. The first 2 years of SAGE III/ISS version 5.1 solar ozone data were evaluated by using correlative measurements from satellites (Aura MLS, ACE-FTS, OSIRIS, and OMPS LP) and ground-based instruments (lidar and ozonesondes). There are three retrieved ozone products, denoted as AO3, MLR, and MES, from SAGE III solar occultation measurements. The first two algorithms (AO3 and MLR) both use ozone absorption in the Chappuis band but apply different methods to separate ozone from other interfering gases in the observed slant path radiances (SAGE III ATBD, 2002). The third algorithm (MES) uses ozone absorption in the ultraviolet band, which can provide better ozone signals at higher altitudes (e.g., above 45 km). The MES retrieval algorithm, however, is affected by a spectral stray light problem that has not been properly corrected. The MES ozone product, therefore, is currently not recommended for scientific studies.
To evaluate the quality of SAGE III/ISS solar ozone data, appropriate procedures were applied to SAGE III and correlative measurements to minimize the biases and uncertainties associated with mismatch (spatial/temporal differences) and different smoothing (e.g., resolution) in the respective observations. The coincidence criteria are a trade-off between mismatch uncertainties and sample size (number of coincident profiles), especially for comparisons between SAGE III and ground-based measurements. There is no good way to remove the horizontal component of smoothing differences, which, however, would become part of the random errors (in addition to measurement uncertainty) with a sufficiently large sample size (e.g., Cortesi et al., 2007). To remove or minimize the vertical component of smoothing differences between measurements, we used either the method recommended by the instrument science team or a Gaussian averaging kernel (e.g., Kar et al., 2007; Sheese et al., 2017) applied to the profiles with finer vertical resolution. Since there are altitude registration errors of approximately 100 m in the auxiliary temperature and pressure profiles in SAGE III/ISS version 5.1 data, we used ozone number density on geometric altitude as the common coordinate for comparisons. The altitude registration errors in SAGE III temperature and pressure profiles are due to a simplistic approximation in the geopotential height to geometric altitude conversion. It should be noted that this error does not affect SAGE III ozone on its native retrieved grid (number density on geometric altitude); it matters only when the profiles are converted to mixing ratio on pressure coordinates by using the auxiliary temperature and pressure profile accompanying each ozone profile.
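The vertical smoothing step can be illustrated with a minimal Python sketch of a Gaussian averaging kernel applied to a finer-resolution profile. The function name, the choice of the full width at half maximum (FWHM) as the width parameter, and the assumption of a uniformly spaced fine grid are illustrative only; the kernel widths actually used for each instrument are not specified here.

```python
import math


def gaussian_smooth(z_fine, x_fine, z_coarse, fwhm_km):
    """Smooth a fine-resolution profile onto coarser target altitudes.

    Each output value is a weighted mean of the fine profile, with Gaussian
    weights centred on the target altitude. The weights are normalised so
    that a constant profile is left unchanged. Assumes a uniformly spaced
    z_fine grid (otherwise layer thickness should enter the weights).
    """
    sigma = fwhm_km / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    smoothed = []
    for zc in z_coarse:
        weights = [math.exp(-0.5 * ((z - zc) / sigma) ** 2) for z in z_fine]
        total = sum(weights)
        smoothed.append(sum(w * x for w, x in zip(weights, x_fine)) / total)
    return smoothed
```

For example, `gaussian_smooth(z_sage, o3_sage, z_mls, fwhm_km=3.0)` would degrade a hypothetical SAGE profile to an assumed 3 km resolution on the altitudes `z_mls` before differences are taken.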
For ozone retrieved with the AO3 and MLR algorithms, MLR ozone was found to have larger biases (e.g., by 10% or more) and uncertainties (by a factor of 2 to 3) in the UT/LS and above the upper stratosphere, based on comparisons with correlative measurements or on residual analyses (Damadeo et al., 2014). These results are similar to those of a previous study (Wang et al., 2006) for the SAGE III/M3M instrument. SAGE III/ISS AO3 ozone shows very good agreement with correlative measurements, with mean biases less than 5% for altitudes down to ~15 km in the midlatitudes and ~20 km in the tropics. The differences become larger in the lower mesosphere (e.g., 10–15% near 60 km), which mainly results from the ozone diurnal cycle not being removed from the comparisons. In the lower stratosphere and upper troposphere, SAGE III/ISS AO3 ozone shows systematic high biases that increase with decreasing altitude and reach ~10% near the tropopause. The precision of SAGE III/ISS AO3 ozone is estimated to be ~3% between 20 and 40 km. The precision degrades toward higher and lower altitudes due to smaller signal-to-noise ratios in the Chappuis band and the large natural variability in the UT/LS region. The estimated precision of AO3 ozone is ~10–15% in the lower mesosphere (55 km) and ~20–30% near the tropopause.
In this study, the agreement between SAGE III solar ozone data and correlative sonde/lidar data is found to be better than in the analysis of McCormick et al. (2020). This could result from the tighter coincidence criteria and from having more coincident profiles in this study. Furthermore, more sophisticated techniques were used to account for different vertical resolutions between SAGE III and correlative ozone profiles; linear interpolation of profiles was used in the work by McCormick et al. (2020). For comparisons between SAGE III and ACE-FTS, similar results were found in both studies, with mean differences less than 5% for altitudes above 20 km. For altitudes below 20 km, SAGE III ozone generally showed a negative bias (>10%) relative to ACE-FTS, especially in the Southern Hemisphere, based on McCormick et al. (2020). Positive biases of ~5–10% in SAGE III ozone, however, were found in the same altitude range in our study. These differences could likewise result from the different coincidence criteria used in the two studies, with coincidences defined by latitude differences less than ±5° (or ±10° in the Southern Hemisphere) in McCormick et al. (2020), while a criterion of ±2° was used in our study. Dissimilar techniques to minimize the effect of differing vertical resolutions could also lead to some discrepancies, especially in regions (e.g., the UT/LS) where the vertical gradient of ozone is large.
The sunrise/sunset differences in SAGE III/ISS retrieved ozone were examined by regression analyses and comparisons with correlative Aura MLS data. It was found that SAGE III sunset ozone values are systematically larger than sunrise values by ~5–8% at 40–55 km, with mean differences larger in the tropics than at midlatitudes. In the lower stratosphere below ~25 km, the sunrise values become slightly larger than sunset values by a few percent. The vertical and latitudinal distribution of sunrise/sunset differences in observed ozone is consistent with the vertical transport of atmospheric tidal winds (Sakazaki et al., 2013). The magnitudes of sunrise/sunset differences in SAGE III/ISS retrieved ozone in the upper stratosphere, however, are almost twice as large as those observed from other satellites and model predictions (Sakazaki et al., 2015). The reason for this is not clear and needs further investigation.
The SAGE III retrieval algorithm team is investigating the high biases in retrieved ozone in the UT/LS region. Preliminary studies indicate that the oxygen dimer O2-O2 (or O4) spectroscopy used in the current v5.1 retrieval algorithm could be the primary contributor to the observed high biases in ozone. It was also found that an underestimation of the aerosol contribution in the ozone absorption band could produce a small high bias in stratospheric ozone in both the AO3 and MLR algorithms. The effects are more pronounced in the MLR than in the AO3 algorithm. This is consistent with our validation results, which show altitude-dependent high biases in both MLR and AO3 retrieved ozone for altitudes below 15–20 km. The biases in MLR ozone are also larger than those in AO3. Further analyses will be made in the future by applying updated O4 spectroscopy and aerosol clearing procedures in the retrieval algorithm to quantify these effects on retrieved ozone in the upper troposphere and lower stratosphere.

Appendix A: The Effect of Altitude Registration Bias on Ozone and Recommended Conversion
Section 2.2 describes a known anomaly in v5.1: an altitude registration bias in the reported pressure and temperature profiles that are passed through the algorithm. This appendix details the recommended conversion from which Figure 2 derives. The process involves three simple steps: (1) convert the geometric altitude array upon which the pressures and temperatures are reported (Z_OLD) back to the original geopotential heights (Z_Φ) using the approximation used in the v5.1 algorithm, (2) convert the geopotential heights to geometric altitude (Z_NEW) using a better model, and (3) remap the reported pressures and temperatures on the new geometric altitudes to the desired grid (such as the original grid) using an interpolation scheme of choice.
Step 1 is very straightforward and comes from the overly simplistic assumption that the surface gravity is the same everywhere and is equal to the mean surface gravity (g_0), defined as 9.80665 m/s². Step 2 is also straightforward and uses g, the surface gravity at a particular geodetic latitude (θ, or "map" latitude). While the model of surface gravity is always being updated, the SAGE algorithm makes use of the World Geodetic System 1984 model (WGS84, updated in 2004) (NIMA Technical Report TR8350.2, 1997), and thus, this provides the recommendation for g:
$$g(\theta) = 9.7803253359\,\frac{1 + 0.00193185265241\,\sin^{2}\theta}{\sqrt{1 - 0.00669437999014\,\sin^{2}\theta}}$$
It is important to note that the latitude dependence of R_EARTH should be taken into account for all of these calculations.
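Steps 1 and 2 can be sketched in Python under stated assumptions. The WGS84 gravity formula and g_0 are taken from the text. The relation between geopotential height and geometric altitude used below is the standard inverse-square form, Z_Φ = (g/g_0) R Z / (R + Z), which is assumed here and may differ in detail from the expressions used in the v5.1 algorithm. The use of the WGS84 geocentric radius for R_EARTH and all function names are also assumptions made for illustration.

```python
import math

G0 = 9.80665                 # mean surface gravity from the text (m/s^2)
WGS84_A = 6378137.0          # WGS84 semi-major axis (m)
WGS84_B = 6356752.314245     # WGS84 semi-minor axis (m)


def wgs84_surface_gravity(lat_deg):
    """Surface gravity g(theta) from the WGS84 formula quoted in the text."""
    s2 = math.sin(math.radians(lat_deg)) ** 2
    return (9.7803253359 * (1.0 + 0.00193185265241 * s2)
            / math.sqrt(1.0 - 0.00669437999014 * s2))


def earth_radius(lat_deg):
    """Latitude-dependent R_EARTH (m). ASSUMPTION: geocentric radius of the
    WGS84 ellipsoid; the radius definition used by SAGE is not given here."""
    phi = math.radians(lat_deg)
    c, s = math.cos(phi), math.sin(phi)
    num = (WGS84_A ** 2 * c) ** 2 + (WGS84_B ** 2 * s) ** 2
    den = (WGS84_A * c) ** 2 + (WGS84_B * s) ** 2
    return math.sqrt(num / den)


def step1_geometric_to_geopotential(z_old_m, lat_deg):
    """Step 1: undo the simplistic conversion, which takes g = G0 everywhere
    (assumed form: Z_phi = R * Z_old / (R + Z_old))."""
    r = earth_radius(lat_deg)
    return r * z_old_m / (r + z_old_m)


def step2_geopotential_to_geometric(z_phi_m, lat_deg):
    """Step 2: redo the conversion with latitude-dependent surface gravity
    (assumed form: Z_new = G0 * R * Z_phi / (g * R - G0 * Z_phi))."""
    r = earth_radius(lat_deg)
    g = wgs84_surface_gravity(lat_deg)
    return G0 * r * z_phi_m / (g * r - G0 * z_phi_m)


def corrected_altitude(z_old_m, lat_deg):
    """Chain steps 1 and 2 to obtain Z_new from Z_old (both in metres)."""
    z_phi = step1_geometric_to_geopotential(z_old_m, lat_deg)
    return step2_geopotential_to_geometric(z_phi, lat_deg)
```

Under these assumptions, `corrected_altitude(30000.0, 0.0)` exceeds 30 km by several tens of meters, whereas the shift near 45° latitude is only a few meters, in line with the statement that the registration error is more pronounced in the tropics. Step 3 (remapping pressure and temperature from Z_NEW to the desired grid) is left to whichever interpolation scheme is preferred.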
To evaluate the effect of altitude registration bias in the reported temperature and pressure profiles on ozone, SAGE III/ISS AO3 ozone data were compared against collocated Aura MLS nighttime measurements on volume mixing ratio and pressure coordinates (VMR/P). The coincidence criteria are the same as those described in section 4. SAGE III/ISS AO3 ozone profiles were converted to VMR/P by using the accompanying temperature and pressure profiles. The mean biases between SAGE and MLS are generally within 5% between ~83 and 0.3 hPa except in the tropics, where larger biases (>5%) are found below ~46 hPa and above 1 hPa (Figure A1). It should be noted that the differences between SAGE and MLS in the tropics show an altitude-dependent structure. SAGE ozone shows increasing positive biases for altitudes above the ozone peak and increasing negative biases below the ozone peak. This is due to the altitude registration errors in the reported temperature and pressure profiles, which are more pronounced in the tropics than in the midlatitudes (Figure 2). After correcting the altitude registration errors in the reported temperature and pressure profiles, SAGE ozone shows better agreement with MLS data, without the altitude-dependent feature. The mean differences in general are less than 3% for altitudes between 1 and ~83 hPa in the midlatitudes and between 1 and ~56 hPa in the tropics (Figure A1).
Figure A1. Mean differences between SAGE III/ISS AO3 ozone and collocated Aura MLS data at three latitude bands: 60°S to 20°S (left column), 20°S to 20°N (middle column), and 20°N to 60°N (right column). SAGE ozone profiles are converted to MLS coordinates by using reported (red) and bias corrected (blue) temperature and pressure profiles. The percentage difference is calculated as (SAGE-MLS)/MLS * 100%.
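The conversion from number density to volume mixing ratio and the percentage difference quoted in the Figure A1 caption can be sketched in Python as follows. The conversion uses the ideal gas law, VMR = n k_B T / p. The function names and the input units (cm⁻³, K, hPa) are assumptions made for illustration; the sketch does not reproduce the interpolation of SAGE profiles onto MLS pressure levels.

```python
K_B = 1.380649e-23  # Boltzmann constant (J/K)


def number_density_to_vmr(n_o3_cm3, temperature_k, pressure_hpa):
    """Convert ozone number density (molecules/cm^3) to volume mixing ratio
    (mol/mol) with the ideal gas law, using the auxiliary temperature (K)
    and pressure (hPa) at the same level."""
    n_o3_m3 = n_o3_cm3 * 1.0e6          # cm^-3 -> m^-3
    pressure_pa = pressure_hpa * 100.0  # hPa -> Pa
    return n_o3_m3 * K_B * temperature_k / pressure_pa


def percent_difference(sage, mls):
    """Relative difference as defined for Figure A1: (SAGE-MLS)/MLS * 100%."""
    return (sage - mls) / mls * 100.0
```

Because temperature and pressure enter the conversion directly, an altitude registration error in those auxiliary profiles propagates into the mixing ratio even though the native number density profile is unaffected.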