Study of SO 2 Pollution in the Middle East Using MERRA‐2, CAMS Data Assimilation Products, and High‐Resolution WRF‐Chem Simulations

Oil recovery, power generation, water desalination, gas flaring, and traffic are the main contributors to SO 2 emissions in the Middle East (ME). Satellite observations suggest that the traditional emission inventories do not account for multiple SO 2 emission sources in the ME. This study aims to evaluate the most frequently used SO 2 emission data sets over the ME by comparing high‐resolution regional model simulations and meteorology/chemistry assimilation products, MERRA‐2 and CAMS, with satellite and available ground‐based air‐quality observations. Here, we employ the WRF‐Chem‐3.7.1 regional meteorology‐chemistry model and conduct simulations for the period 2015–2016 with 10 km grid spacing using HTAP‐2.2 emission data sets and the new OMI‐HTAP data, which is based on the combination of the near‐surface SO 2 emissions taken from the HTAP‐2.2 inventory with strong (>30 kt/year) SO 2 point sources obtained from the satellite Ozone Monitoring Instrument (OMI) observations. We find that conventional emission inventories (EDGAR‐4.2, MACCity, and HTAP‐2.2) have uncertainties in the location and magnitude of SO 2 sources in the ME and significantly underestimate SO 2 emissions in the Arabian Gulf. The WRF‐Chem, run in conjunction with the new OMI‐HTAP emissions, improves comparisons between the satellite and ground‐based SO 2 observations. Our simulations show that SO 2 surface concentrations in Jeddah and Riyadh frequently exceed European air‐quality limits. The ME generates about 10% of global anthropogenic SO 2 emissions, on par with India. Therefore, the development of effective emission controls and improvement of air‐quality monitoring in the ME are urgently needed.


Introduction
Atmospheric pollutants, both of natural and anthropogenic origin, have a considerable negative impact on human health and can cause premature mortality (Lelieveld, Evans, et al., 2015;Lelieveld et al., 2019). According to the World Health Organization (WHO, 2018), outdoor air pollution caused 4.2 million premature deaths worldwide in 2016.
Rapid economic and population growth, urbanization, industrialization, excessive traffic emissions, and burning of fossil fuels are the main causes of dramatically deteriorating air quality in developing countries (Janssens-Maenhout et al., 2015). Ongoing climate change will only exacerbate this problem (Leung & Gustafson Jr, 2005). Along with particulate matter (PM), sulfur dioxide (SO 2 ) is one of the most common pollutants. SO 2 is a toxic gas produced by burning sulfur-containing fossil fuels, and it is known to cause inflammation of human airways. In order to protect human health, the WHO (2006) have issued air-quality guidelines recommending that daily average SO 2 surface concentrations should not exceed 20 μg/m 3 . This limit is quite restrictive relative to most national air-quality requirements. The European Union Air Quality Directive (EUEA, 2008) limits SO 2 daily averaged surface concentration at 125 μg/m 3 , and a number of exceedances of this limit during the year should not be more than three. According to the air-quality standards issued by the Presidency of Meteorology and Environment (PME) of the Kingdom of Saudi Arabia To calculate the spatial-temporal distribution of atmospheric pollutants, emissions of SO 2 and other species must be provided to the free-running or data-assimilating models. To quantitatively evaluate air quality accurately, emissions must be correctly characterized and represented. However, conventional emission inventories, such as the Emissions Database for Global Atmospheric Research version 4.2 (EDGAR-4.2) (Janssens- Maenhout et al., 2013), Hemispheric Transport of Air Pollution version 2.2 (HTAP-2.2) (Janssens- Maenhout et al., 2015), and Monitoring Atmospheric Composition and Climate-CityZen (MAC-City) (Granier et al., 2011), are based on countries emissions reports. These reports are often incomplete or outdated, particularly in developing countries, creating major uncertainty in the data. This is especially true in the ME . Therefore, a comparison of model outputs with reanalysis data and their thorough test against available observation is essential.  also pointed out that substantial uncertainties exist in the estimated emissions, especially for regions experiencing rapid changes in the economy such as China and India. For instance, a comparison of different anthropogenic emission inventories using WRF-Chem over Southeast and South Asia has been performed in Amnuaylojaroen et al. (2014) and Sharma et al. (2017). Amnuaylojaroen et al. (2014) showed that the uncertainties of carbon monoxide surface mixing ratios in simulations with different anthropogenic emission inventories reach 30%. Sharma et al. (2017) showed that the modeled ozone mixing ratios during noontime are sensitive to the choice of an emission inventory.
Advances in satellite measurements have yielded new data and techniques that help to evaluate and improve conventional inventories ("bottom-up" approach). In particular, McLinden et al. (2016) and Fioletov et al. (2016) developed a novel "top-down" method for mapping the major SO 2 emission sources using SO 2 column loadings for the period 2005-2016 from the Ozone Monitoring Instrument (OMI) (Levelt et al., 2006;Li et al., 2013). They released a catalog of SO 2 point emissions, where they identified more than 500 point sources (some of which are not present in the conventional EDGAR-4.2 and HTAP-2.2 emission data sets), with annual SO 2 emission rates ranging from about 30 kt/year to more than 4,000 kt/year. For example, 14 unaccounted SO 2 sources located in the ME were detected, 12 of which are related to oil and gas exploration and refining. Liu et al. (2018) combined the point sources identified in Fioletov et al. (2016) with SO 2 emissions from the HTAP-2.2 inventory to develop the OMI-HTAP data set. They deployed this data set in the Goddard Earth Observing System, version-5 (GEOS-5) global atmospheric model (Molod et al., 2015) and tested the GEOS-5 output with the dense SO 2 air-quality observations measured over Europe and the United States.
In this study we focus on the ME, which is one of the most polluted areas in the world. Because of extremely high levels of natural particulate pollution driven by dust aerosols (Jish Prakash et al., 2016;Kalenderski & Stenchikov, 2016;Parajuli et al., 2019;Tsiouri et al., 2015), the region experiences extreme pollution episodes, compounded by a strong contribution of anthropogenic aerosols and gases (Ahmed, 1990;Barkley et al., 2017;Karagulian et al., 2015;Lelieveld, Beirle, et al., 2015;Lelieveld et al., 2009;Ukhov et al., 2020). ME emits 10.1029/2019JD031993 about 10% of the total global anthropogenic SO 2 (Klimont et al., 2013). These emissions could be involved in the monsoon circulation and have a global effect on atmospheric composition (Lelieveld et al., 2018). SO 2 conventional emission inventories have a high level of uncertainty over the ME, and air-quality observations are sparse. To our best knowledge, there is no thorough intercomparison of the emission inventories over the ME so far.
Thus, this study aims to address the following scientific questions: 1. What is the impact of SO 2 emissions on air quality over the ME region and how well SO 2 pollution is depicted by assimilation products or free-running models? 2. How do conventional anthropogenic emission data sets and MACCity) compare with the newly developed SO 2 emission data set based on the "top-down" approach for the ME region? 3. How do SO 2 column loadings and surface concentrations from MERRA-2, CAMS-OA, GEOS-5, and the WRF-Chem model compare with satellite observations and in situ measurements over the ME?
To answer these questions, we calculated SO 2 distribution for the period 2015-2016 using the WRF-Chem-3.7.1 model with 10 km resolution. We evaluated the output from the WRF-Chem runs with HTAP-2.2 and OMI-HTAP emissions data sets, GEOS-5 model output, and MERRA-2, CAMS-OA products against the satellite retrievals of SO 2 column loadings obtained from the Ozone Mapping Profiler Suite (OMPS) Zhang et al., 2017) and against in situ observations of SO 2 surface concentrations conducted by the Saudi Authorities for Industrial Cities and Technical Zones (MODON) in major cities of Saudi Arabia. Our comparisons show that using WRF-Chem with the OMI-HTAP emissions data set allows to overcome some deficiencies inherent in the currently available assimilation products, and our approach can capture air pollution patterns at finer spatial resolutions.
The rest of the paper is organized as follows: We describe the satellite retrievals of SO 2 column loadings and the assimilation products used in this study, and we describe the WRF-Chem model setup in section 2. In section 3, we evaluate the SO 2 emission data sets used in this study, and we compare the SO 2 column loadings and surface concentrations obtained from the models with the satellite observations and in situ measurements. We present our conclusions in section 4.

Data, Model Setup, and Data Assimilation Products
Below, we discuss satellite and ground-based observations, assimilation products, and WRF-Chem model setup.

SO 2 Satellite
Observations SO 2 column loadings are the most widely used observation products. Long-term data sets are currently available from measurements conducted by NASA Earth Observing System (EOS) Aura OMI (2004-current) (Levelt et al., 2006;Li et al., 2013) and NASA-NOAA Suomi National Polar-orbiting Partnership (Suomi-NPP) Ozone Mapping and Profiling Suite Nadir Mapper (OMPS-NM: 2011-current) Zhang et al., 2017).
We use NASA OMPS total column SO 2 (OMPS-NPP-NMSO2-PCA) data set Zhang et al., 2017). The data set is processed using principal component analysis (PCA) SO 2 retrieval algorithm, which is consistent with the OMI operational product (OMSO2) (Li et al., 2013). The OMPS pixel size is 20 × 20 km 2 , but we use here the gridded (0.5 • × 0.5 • ) monthly average SO 2 columns obtained assuming fixed planetary boundary Layer (PBL) SO 2 profile shape. Although due to its lower than OMI spatial resolution OMPS can detect fewer small point SO 2 emitters, for large point sources the OMPS-derived "top-down" SO 2 emissions are consistent with those from the OMI SO 2 catalog Zhang et al., 2017).

CAMS
The Copernicus Atmosphere Monitoring Service has been conducting near real-time analysis of SO 2 since July 2012 until present within the Monitoring Atmospheric Composition and Climate (MACC) project (https://atmosphere.copernicus.eu/). CAMS-OA product has a resolution of 0.8 • × 0.8 • before 21 June 2016 and 0.4 • × 0.4 • after with 60 vertical levels and a top level at 0.1 hPa. CAMS-OA calculates the atmospheric composition using an extended version of the Carbon Bond chemical mechanism 5 (CB05) (Yarwood et al., 2005) that has been implemented in the ECMWF Integrated Forecast System (IFS) (Flemming et al., 2015). CB05 comprises 54 chemical species and 126 chemical reactions complementing the already integrated modules for greenhouse gases and aerosols, including dust, sea salt, sulfate, black carbon, and organic matter.  Figure 1a). MACCity emissions are based on the Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP) inventory but has improved seasonal variability (Lamarque et al., 2013).
The generation of sulfate within CAMS-OA is simulated independently of the atmospheric chemistry calculated by CB05. For this purpose, CAMS-OA assumes an independent SO 2 tracer from CB05, driven by the same SO 2 and dimethyl sulfide (DMS) emissions, and converts them into sulfate aerosols. SO 2 oxidation is parameterized using a prescribed latitude-dependent e-folding timescale ranging from 3 days at the 10.1029/2019JD031993 equator to 8 days at the poles (Yarwood et al., 2005). CAMS-OA assimilates MODIS optical depth to constrain aerosols and also only significant stratospheric SO 2 loadings generated by volcanic eruptions. Thus, tropospheric SO 2 remains largely unconstrained.

MERRA-2 and GEOS-5
The Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2) Randles et al., 2017) assimilates both meteorological and aerosols/chemistry observations. To simulate atmospheric processes, MERRA-2 utilizes the GEOS-5 model (Molod et al., 2015). Both GEOS-5 and MERRA-2 use the Goddard Global Ozone Chemistry Aerosol Radiation and Transport (GOCART) model (Chin et al., 2002) to simulate dust, sea salt, black and organic carbon, SO 2 , and sulfate. Hydroxide (OH) for SO 2 oxidation is calculated interactively. MERRA-2 assimilates AOD at 550 nm obtained via a neural network retrieval trained on AERONET data. The MERRA-2 aerosol analysis corrects aerosol loadings but does not assimilate any SO 2 observations. Thus, SO 2 concentrations are completely unconstrained . The output fields are available on a 0.625 • × 0.5 • latitude-longitude grid and 72 terrain-following hybrid − p model layers with the top layer at 0.01 hPa. and particulate (PM 10 , PM 2.5 , black and organic carbon) air pollutants. This inventory is based on annual national-level industrial fuel consumption data, agricultural statistical records, empirical information on technological standards influencing the amount of emissions, and emission reduction measures applied. Spatial source mapping uses a set of proxy location data of industrial, agricultural, transport, and urban objects and population density. This approach potentially could cause uncertainty in location and magnitude of emission sources. For example, we detected in EDGAR-4.2 one "hot spot" of unknown origin located in the middle of the desert west of Riyadh. It does not appear in the latest emission inventories. An updated version EDGAR-4.3.1 for 2010 (Crippa et al., 2016) was recently released.
In this study we employ WRF-Chem-3.7.1 and configure it over the ME. Simulations are carried out for the whole period of 2015-2016, for each month separately, starting from the last week of the preceding month. This week is considered to be a spin-up and is excluded from further analysis. The simulation domain (see Figure 1) with 450 × 450 grid points is centered at 28 • N, 42 • E. We use a 10 × 10 km 2 Mercator projection horizontal grid and a 50-level vertical grid with enhanced resolution closer to the ground comprising 11 model levels within the near-surface 1-km layer. The model top boundary is set at 50 hPa.
To improve the representation of the meteorological fields, we apply spectral nudging (Miguez-Macho et al., 2004) above the PBL (>5.0 km) to horizontal wind components (u and v) toward MERRA-2 wind fields. The nudging coefficient for u and v is set at 0.0001 s −1 . We nudged waves with wavelengths larger than 450 km. This allows to keep the large-scale motions close to "observations," and let the model freely develop small-scale features.
The aerosol/chemistry initial conditions and boundary conditions (IC&BC) are calculated using MERRA-2 output using the newly developed Merra2BC interpolation utility (Ukhov, 2020); see Appendix A.1. To be consistent with aerosol/chemistry IC&BC, we also define the meteorological IC&BC using MERRA-2 output (see Appendix A.1).
The Regional Atmospheric Chemistry Mechanism (RACM) (Stockwell et al., 1997) containing 77 chemical species and 237 chemical reactions, including 23 photochemical reactions, is used for the modeling of the atmospheric chemistry. It is embedded into WRF-Chem using the Kinetic PreProcessor (KPP) (Damian et al., 2002). The role of KPP is to integrate the system of stiff nonlinear ordinary differential equations, which represents the specified set of chemical reactions. An online computation of the photolysis rates for the 23 photochemical reactions of the RACM gas-phase chemical mechanism is implemented according to Madronich (1987). The set of physical parameterizations used in WRF-Chem runs is presented in Table 1. More details on the physical parameterizations can be found online (http://www2.mmm.ucar.edu/ wrf/users/phys_references.html).
Similar to MERRA-2, the GOCART aerosol and chemistry module is used to calculate SO 2 to SO 4 oxidation (Chin et al., 2002) by the hydroxide radical OH whose abundance is simulated by RACM.
To evaluate the dependence of SO 2 concentrations on emissions, we ran WRF-Chem with two of the latest emission data sets: HTAP-2. HTAP-2.2 presents emissions as sector-specific grid maps. The sectors for all substances are defined based on the emission category: international and domestic air traffic, international shipping, industry, transportation, and residential. HTAP-2.2 emissions are based on the latest official regional information consistent with air pollutant grid maps from the US Environmental Protection Agency and Environment Canada for North America, European Monitoring and Evaluation Programme (EMEP) (Visschedijk et al., 2009) for Europe, and Model Inter-Comparison Study (MIX)  for Asia. For most countries of the ME and North Africa, HTAP-2.2 emissions are identical to those defined in the newly released EDGAR-4.3.1 data set for 2010 (Crippa et al., 2016). The "hot spot" of unknown origin (see section 2.3) is not present in EDGAR-4.3.1 and HTAP-2.2 emissions data sets (see Figure 1).

OMI-HTAP Emissions
Traffic + residential + ships: lowest model layer Industrial emissions: 100-500 m GEOS-5 OMI-HTAP Surface emissions: lowest model layer Elevated emissions: 100-500 m WRF-Chem HTAP-2.2 Traffic + residential + ships: 0-1,000 m Industrial emissions: 120-1,000 m WRF-Chem OMI-HTAP Surface emissions: 0-1,000 m Elevated emissions: 120-1,000 m Here we apply the OMI-HTAP emissions data set for 2015-2016 within the WRF-Chem regional model configured for the ME region, where Fioletov et al. (2016) and McLinden et al. (2016) found significant uncertainties in the conventional emission inventories. We have to modify OMI-HTAP slightly as we found that the location of the point source associated with the Jeddah power plant was shifted in the catalog Fioletov et al. (2016). Thus, we corrected this error as we know the precise location of this plant.
Original OMI-HTAP (Liu et al., 2018) is provided without shipping and aviation emissions. Since shipping emissions are important over the Red Sea and Arabian Gulf, we supplemented OMI-HTAP with shipping emissions from HTAP-2.2. The aviation emissions are not accounted for as they are relatively small. Natural sulfur emissions associated with volcanoes are negligible in the ME relative to anthropogenic inputs, and thus, we do not account for them. Dimethyl sulfide (DMS) emissions are parameterized within the GOCART (Chin et al., 2002). Emissions of all other constituents (PM, black and organic carbon, etc.) are taken without any modification from the HTAP-2.2 inventory. SO 2 emissions reported in Fioletov et al. (2016) are annual mean since the catalog does not provide information on their monthly variability. Liu et al. (2018) assigned some plausible seasonal variability to OMI-HTAP, but here we use the OMI-HTAP data set in the annual mean format.

Results
Below we provide the analysis of the SO 2 emissions, SO 2 column loadings, and surface concentrations from MERRA-2, CAMS-OA, and GEOS-5 simulations, as well as WRF-Chem fine-resolution simulations focusing specifically on the ME region. To test the SO 2 fields, we compare them with the OMPS observations  and the obtained near-surface air-quality measurements. The inventory of all products/runs is presented in Table 2. Figure 1 compares the conventional SO 2 emission data sets: EDGAR-4.2 for 2008, MACCity for 2010, and HTAP-2.2 for 2010 with the OMI-HTAP data set for 2016. All emission inventories were conservatively interpolated on the WRF-Chem 10 × 10 km 2 grid. In Figure 1d the point sources of the OMI-HTAP data set are shown by red and yellow circles of different magnitudes depending on their emission rate. The yellow circles correspond to the surface point sources, and red circles to the elevated point sources.

Comparison of SO 2 Emission Data Sets
All four emission maps share the common spatial pattern but differ in magnitude and specific positions of the point and distributed sources.  (Klimont et al., 2013). The MACCity and EDGAR-4.2 emissions are close in total amount but not in their spatial distribution. The MACCity emission inventory has coarser spatial resolution than the other data sets.  To better characterize the spatial distribution of emissions in the different data sets, we define 10 spatial subregions, as shown in Figure 1, and calculate annual SO 2 emissions in each subregion for all emission data sets (see Figure 2). For HTAP-2.2, we split the total emissions into industrial and traffic + residential + shipping emissions. Following Liu et al. (2018) total OMI-HTAP emissions are split into elevated that are energy emissions and surface that are non-energy emissions, which comprise industry, residential, and transportation sources.
HTAP-2.2 emissions are larger compared with MACCity and EDGAR-4.2 in the subregions of southeastern Europe, Syria, Iraq, Iran, and Turkey. It is not the case in the central Arabian Peninsula, which may be due to the improved characterization of the emission sources in this region. We have to mention that the latest period accounted for in all emission data sets (except OMI-HTAP) is 2010, but emissions over the eastern United States and Europe have been reducing during the last decade, and the recent war in Iraq and Syria also suppressed industrial activities . At the same time, the economic development and population growth in Saudi Arabia and Iran drive the emissions up in the Red Sea region and over Iran (Lelieveld, Beirle, et al., 2015). Thus, actual emissions in 2015-2016 may be slightly different than those assumed in the standard emission inventories.
Over the Arabian Gulf, OMI-HTAP emissions differ significantly from other data sets because the "top-down" approach helped to identify unaccounted sources associated with gas flaring on oil platforms. Consistently, MACCity, EDGAR-4.2, and HTAP-2.2 report unrealistically low emissions over the Arabian Gulf. For almost all other subregions, the OMI-HTAP data set shows lower emissions than in HTAP-2.2. On a national level, the contribution of residential and traffic sources to the total emissions in HTAP-2.2 is relatively small. However, within the cities, traffic and residential emissions could contribute substantially to total air pollution.

Vertical Distribution of SO 2 Emissions
Emissions can be distributed vertically, either using a fixed profile (Benedictow et al., 2009;Schaap et al., 2005;Visschedijk & van der Gon, 2005;Visschedijk et al., 2007) or using a plume-rise model (Byun & Schere, 2006;Guevara et al., 2014;Houyoux et al., 2000). Plume-rise models calculate the rise of a plume using buoyancy, exit momentum, and stack height. The Briggs plume-rise algorithm is one of the most commonly used (Briggs, 1969(Briggs, , 1975(Briggs, , 1984. It can predict plume heights up to 2,000 m (Houyoux et al., 2000). However, this estimate contains significant uncertainties. According to Gordon et al. (2018), the Briggs algorithm may underestimate plume heights by up to 50%. In other cases the plume height is consistently overestimated by the Briggs algorithm (VDI, 1985). Liu et al. (2018) in their global GEOS-5 simulations with the OMI-HTAP emission inventory SO 2 from energy-related sources (elevated emissions) emitted in the elevated 100-500 m layer with a constant mixing ratio. The non-energy (industrial, residential, and transportation) sources (surface emissions) were placed in the lowest model layer (see Table 2). MERRA-2 that uses EDGAR-4.2 emission inventory releases all SO 2 industrial emissions in the elevated 100-500 m layer and traffic, residential, and shipping emissions in the lowest model layer (Buchard et al., 2014). CAMS-OA places all emissions in the lowest model layer to be mixed up within the PBL. The global models use this convention uniformly all over the globe. However, in desert regions, like the ME, PBL and mixing processes are quite different than in the midlatitudes (Bieser et al., 2011). The height of the unstable desert PBL can reach 6 km, and strong turbulence mixes pollutants vertically more effectively than in the midlatitudes. For example, Liu et al. (2008) showed that during storm events in the Saharan and Arabian deserts, dust particles have been carried aloft to a maximum altitude of 6.6 km.
The catalog presented by Fioletov et al. (2016) does not provide information on SO 2 point source emission heights, but we know that in Saudi Arabia the height of the stacks in power and desalination plants, or petrochemical facilities, is typically between 35 and 150 m (Nayebare et al., 2016). Therefore, in our simulations in the ME we place elevated OMI-HTAP SO 2 emissions in the 120-1,000 m layer and surface OMI-HTAP SO 2 emissions in the 0-1,000 m layer with a constant mass mixing ratio. A similar assumption we made in the WRF-Chem simulation with HTAP-2.2 emissions, where industrial emissions were released in the elevated 120-1,000 m layer, and traffic, residential, and shipping emissions were placed in the lower 1,000 m layer (see Table 2). We apply this assumption uniformly over the entire simulation model domain, as currently the WRF-Chem model does not allow to do it selectively. Buchard et al. (2014) stated that changing the vertical distribution of the emissions did not affect their SO 2 fields, but we found that this assumption is only true for total SO 2 loadings and fails for surface concentrations. We will discuss these issues in more detail below.

Evaluation of SO 2 Column Loadings
To evaluate the emission data sets, here we compare simulated annual averages for 2015-2016 of SO 2 column loadings with the corresponding OMPS observations (Figure 3). We choose to analyze the annual means, as this eliminates high-frequency processes that might not be adequately described in the models and is consistent with the annual mean emissions data set we use in this study. We do not use OMI observations for comparison, as they have already been employed to construct the catalog sources and cannot be treated as independent. In addition, OMPS instrument retrievals are better suited for the model evaluation as the OMPS has lower SO 2 retrieval noise than OMI and contiguous spatial coverage.
In the comparison we include outputs from GEOS-5, MERRA-2, and CAMS-OA and from two WRF-Chem runs conducted with HTAP-2.2 and OMI-HTAP emission data sets. For comparison purposes SO 2 loading fields from OMPS, GEOS-5, MERRA-2, and CAMS-OA are conservatively interpolated on the WRF-Chem 10 × 10 km 2 grid to retain fine spatial features developed in the WRF-Chem runs. Loadings are presented in   Table 3.
The loadings in Figure 3 reflect the total column amounts of SO 2 in the atmosphere, which are defined by emissions, dry and wet deposition, and SO 2 conversion to sulfate through photochemical and wet-phase in-cloud oxidation. We tested the sensitivity of loadings to the vertical distribution of the emissions and found it to be quite weak consistently with Buchard et al. (2014). Thus, considering the OMPS observations as ground truth, the loading is an ideal characteristic for testing SO 2 emissions.  Both assimilation products and the free model runs in Figure 3 show similar main spatial features with high (∼0.5-1 DU) SO 2 column loadings along the west and east coast of the Arabian Peninsula and eastern Iraq (∼0.1-0.3 DU). The unknown "hot spot" in EDGAR-4.2 mentioned in section 2.3 produces in MERRA-2 SO 2 column loadings about 0.7 DU, which is higher than in Riyadh (see Figure 3c). All simulated fields, except CAMS-OA, overestimate loadings in southern Europe and the eastern Mediterranean. CAMS-OA consistently overestimates SO 2 over the southern Red Sea coast because it uses overestimated in this region MACCity emissions (Figure 1a). All products underestimate SO 2 over the Arabian Gulf except the WRF-Chem and GEOS-5 runs with OMI-HTAP emissions, which show higher than observed loadings of 0.5-1.0 DU. This is consistent with the findings of Fioletov et al. (2016) and McLinden et al. (2016).
Loading is a vertically integrated characteristic, which, as we found, is not sensitive to the detailed vertical profile of emissions. However, transport, chemical transformations and especially near-surface concentrations of SO 2 depend on SO 2 vertical distribution. Figure 4 shows the annual mean vertical profiles of SO 2 mixing ratios obtained from WRF-Chem simulations, CAMS-OA, MERRA-2, and GEOS-5 outputs averaged over the spatial subregions (see Figure 1). CAMS-OA and GEOS-5 exhibit monotonous profiles of SO 2 mixing ratios with a maximum at the surface layer, but MERRA-2 and WRF-Chem runs in most cases show an elevated maximum. Therefore, for example, in the WRF-Chem run with OMI-HTAP emissions, the SO 2 content in the PBL over the Arabian Gulf is the highest, but GEOS-5 shows larger surface concentrations.
To further the evaluation of the simulated loadings and associated emissions data sets, we show in Figure 5 the differences between CAMS-OA, MERRA-2, GEOS-5, and WRF-Chem loadings with OMPS observations. We calculated the differences on the OMPS grid to avoid the side effects depicted in Figure 5f, where the difference of the WRF-Chem and OMPS loadings is presented on the original WRF-Chem grid. The RMS error (RMSE) in Figure 5f is bigger than in Figure 5d because the coarse-resolution OMPS observations do not resolve the fine-scale spatial features developed in the WRF-Chem run. The mean biases and RMS errors for all panels in Figure 5 are calculated for the entire domain and presented in Table 3. The WRF-Chem run with OMI-HTAP has the smallest RMSE and mean bias.
The SO 2 column loadings obtained from the WRF-Chem run with the OMI-HTAP emissions are in good agreement with the observations by OMPS, in both spatial distribution and magnitude. The WRF-Chem run with HTAP-2.2 emissions overestimates the SO 2 loadings for Jeddah vicinity, eastern part of the Saudi Arabia, Iraq, and Syria compared to the OMPS.  Figure 6 shows the RMSE and mean bias of the simulated SO 2 loadings with respect to the OMPS for each subregion (see Figure 1). Overall, the WRF-Chem run with OMI-HTAP emissions shows improvements in RMSE and mean bias almost in all subregions compared with all other products. Based on these results, we conclude that the SO 2 loadings from the WRF-Chem run with OMI-HTAP emissions compare better with the OMPS observations than MERRA-2, CAMS-OA, and the WRF-Chem run with HTAP-2.2. As expected, MACCity and EDGAR-4.2 tend to underestimate emissions as loadings are negatively biased, but both HTAP-2.2 and OMI-HTAP are positively biased relative to OMPS and, presumably, slightly overestimate emissions.

Evaluation of SO 2 Surface Concentrations
SO 2 column loadings characterize the total content of SO 2 in the atmosphere, but regional air quality is determined by the surface concentration of SO 2 where it most affects humans and environmental systems. This characteristic cannot be easily retrieved from satellite observations but is available from model simulations, assimilation products, and field observations at specific sites. To find out which of the products produces more accurate surface SO 2 concentrations, we use in situ observations conducted by air-quality stations that continuously measured SO 2 surface concentrations. These measurements were performed during 2015-2016 by MODON using stationary Air Quality Monitoring Systems (AQMS) installed in the major cities of Saudi Arabia: Riyadh, Jeddah, and Dammam. The measurements were conducted following a strict protocol, and instruments were recalibrated on a quarterly basis. Detailed information on the deployed instruments is presented in Appendix A.2. Figure 7 shows daily mean SO 2 surface concentrations from the WRF-Chem runs with HTAP-2.2 and OMI-HTAP emissions, MERRA-2, CAMS-OA for period 2015-2016 sampled at three AQMS locations, and compared with the daily averaged in situ measurements. The dash-dotted line corresponds to the WHO air-quality guideline for the daily averaged SO 2 surface concentration, that is, 20 μg/m 3 . The dashed line corresponds to the EU air-quality limit for the daily averaged SO 2 surface concentration, that is, 125 μg/m 3 . The plots on the right of each panel in Figure 7 show the corresponding annual averaged values. The KSA  PME air-quality limit for annual averaged SO 2 surface concentration, that is, 80 μg/m 3 , is shown by the solid blue line. In all cities the WHO guideline limit is exceeded. In Jeddah, the European SO 2 air-quality limit (125 μg/m 3 ) is exceeded throughout the whole summer. WRF-Chem simulations with OMI-HTAP emissions fit the averaged observed SO 2 concentration quite well. However, because we only use the annual mean emissions and do not account for the seasonal power consumption variability due to air conditioning, the simulations underestimate the SO 2 concentration in summer. WRF-Chem run with HTAP-2.2 emissions severely overestimates concentrations in Jeddah; therefore, we removed this run from the Jeddah's daily plots. MERRA-2 underestimates pollution in all cities. CAMS-OA and GEOS-5 concentrations compare quite reasonably with AQMS observations, but due to the lower spatial resolution the peak concentrations may be underestimated in comparison to the fine-resolution WRF-Chem runs. Annual KSA PME limit 80 μg/m 3 is not exceeded anywhere except for Jeddah in the WRF-Chem run with HTAP-2.2 emissions. This is probably because a big emission source in HTAP-2.2 is shifted to be too close to the location of AQMS station in Jeddah. Thus, we conclude that the WRF-Chem run with OMI-HTAP provides SO 2 surface concentrations that compare well with the AQMS measurements, and we will use WRF-Chem with OMI-HTAP for our further air-quality estimates. Figure 8 shows the geographic distributions of SO 2 surface concentration fields obtained from CAMS-OA, MERRA-2, and WRF-Chem averaged over 2015-2016 and surface concentrations from GEOS-5 averaged over 2014. CAMS-OA, MERRA-2, and GEOS-5 fields were conservatively interpolated on the 10 × 10 km 2 WRF-Chem grid. The atmospheric lifetime of SO 2 in the troposphere is typically about 1 week, but in dry environments it could be 2-3 times longer; thus, in the ME SO 2 could be transported over large distances. The peaks of SO 2 surface concentrations occur in the vicinity of strong SO 2 sources; see Figure 1. Annual average SO 2 surface concentrations within these areas exceed 100 μg/m 3 . Despite their low spatial resolution, this effect is more pronounced in the CAMS-OA and GEOS-5 fields, because in these models, significant emissions are released in the surface layer (see Figure 4). The highest SO 2 surface concentrations are found over the western and eastern coasts of Saudi Arabia, where the major SO 2 emission sources are located. Eastern Iraq, the Arabian Gulf, and Mediterranean countries also suffer from high SO 2 surface concentrations. Down the stream, SO 2 is oxidized to produce sulfate aerosols that are the subject of a separate study. The geographic distributions of sulfate concentration obtained from WRF-Chem simulations with OMI-HTAP data set are discussed in Ukhov et al. (2020). Figure 9 shows the 2015-2016 annual mean SO 2 surface concentrations for the subregions; see Figure 1. We see more differences between the products in surface concentrations than in the column loadings. This could be related to the different model configurations, as we treat the concentrations in the lower model layer as a surface concentration, but the models have different vertical grids. The differences in the SO 2 vertical distribution (see Figure 4) also translate to the differences in the surface concentrations. However, the averaging over the subdomains reduce the differences associated with the varying spatial resolution. Consistent with previous findings, CAMS-OA shows highest concentrations for the eastern Mediterranean coast and the western Arabian coast subregions. MERRA-2 concentrations are the lowest. The two WRF-Chem

Conclusions
In this study we test how well the assimilation products (MERRA-2 and CAMS-OA) and the free-running models (GEOS-5 and WRF-Chem) in combination with emission databases (MACCity, EDGAR-4.2, HTAP-2.2, and OMI-HTAP) reproduce SO 2 column loadings and surface concentrations in the ME on a regional basis during the period 2015-2016. We use satellite observations and in situ ground-based measurements of SO 2 surface concentrations to test both the assimilation products and the models. The WRF-Chem-3.7.1 is run with 10 km grid spacing, whereas the CAMS-OA, MERRA-2, and GEOS-5 have an effective range grid spacings of 40-80 km. We modified the WRF-Chem preprocessing to use MERRA-2 reanalysis for constructing meteorological and chemical initial conditions and boundary conditions to ensure consistency between chemistry and transport processes.
The SO 2 column loadings show a strong dependence on the emission data sets used, in both the assimilation products and free model runs, and therefore, they help to evaluate emissions, when used in combination with the OMPS observations.
We specifically test and improve the newly developed SO 2 emission data set (OMI-HTAP) based on the combination of the HTAP-2.2 SO 2 emissions from residential and traffic sources with the catalog of the significant (>30 kt/year) SO 2 point sources observed by the OMI instrument ("top-down" approach) and use it along with the conventional HTAP-2.2 emission data set in the WRF-Chem simulations. Along with SO 2 column loadings, we evaluated the SO 2 surface concentrations and their effect on air quality in the major ME cities. The air-quality observations were made available from in situ measurements conducted by the AQMS installed in Jeddah, Riyadh, and Dammam. The simulated surface concentrations appear to be sensitive not only to the magnitude of emissions but also to their vertical distribution and model spatial resolution, thus making it more difficult to simulate air quality than column loadings. For example, the information on the exact location of the significant SO 2 source and its emission height in the OMI-HTAP data set becomes of particular importance when we compare with AQMS observations taken within the vicinity of large emission sources. Due to limitations of the method that was used to construct the point source catalog , the coordinates of the point sources cannot be exactly determined. Since we know the exact location of the Jeddah power plant, we corrected the coordinates in the developed data set, which led to better agreement of our simulations with the observations.
Thus, the main findings of this study are as follows: 1. ME countries are prolific SO 2 emitters, and the ME is responsible for more than 10% of global anthropogenic SO 2 emissions. 2. The older data sets, MACCity and EDGAR-4.2, underestimate emissions across the ME. The new HTAP-2.2 and OMI-HTAP overestimate emissions over the ME, but the OMI-HTAP allows to reduce biases and RMSE in SO 2 loadings with respect to the OMPS observations. 3. All products reproduce SO 2 loadings relatively well, but WRF-Chem with OMI-HTAP emissions exhibits the smallest RMSE and bias. 4. SO 2 surface concentrations are less consistent between products than loadings, as they depend not only on the emissions, but they are also sensitive to the model resolution and the vertical distribution of SO 2 emissions. 5. The vertical distribution of SO 2 emissions should not be spatially uniformly prescribed in the models over the globe but has to be skillfully parameterized to account for specific meteorological conditions. 6. Higher spatial resolution in WRF-Chem allows to compare the simulated concentrations with air-quality measurements more reliably in comparison with the coarse-resolution global models that smooth concentration extremes and therefore improves model calibration and emissions evaluation. 7. CAMS-OA overestimates SO 2 surface concentrations, while MERRA-2 tends to underestimate them.
WRF-Chem run with OMI-HTAP emissions compares better with in situ air-quality measurements than with other products. 8. Populated coastal areas exhibit the largest SO 2 pollution. In Jeddah and Riyadh, the European SO 2 air-quality limit is exceeded around 75 days/year.
Developing a denser air-quality monitoring network and measurements of vertical distribution of air pollutants is urgently needed to improve air-quality modeling in the ME. The SO 2 retrievals used in this work do not allow to extract information on the monthly variability of sources, but adding the seasonality to the emissions data set is a work in progress. The developed framework presented here can be used to evaluate the effect of other air pollutants like NO x and O 3 . The results of this study could serve as the basis for a regional air-quality forecast system that interactively calculates high-resolution atmospheric chemistry and aerosol processes driven by anthropogenic emissions. This system could be especially valuable for the prediction of extreme pollution events and could also improve our understanding of the impact of anthropogenic pollution on air quality and human health in the ME.

A.1.3. Meteorological Boundary and Initial Conditions
To be consistent with BC&IC for chemical species and aerosols, we utilized the same procedure to build meteorological BC&IC from MERRA-2 reanalysis for all required meteorological parameters. In particular, the following 3-D parameters were processed: pressure (Pa), geopotential height (m), temperature (K), meridional and zonal wind components (m/s), and relative humidity (%); 2-D parameters: surface pressure Figure A2. The AQMS installed in Jeddah.

A.2. MODON Measurements
The "AF22M" by Environnement-S.A (see Figure A1) is a continuous ambient air-quality monitoring analyzer, based on the ultraviolet fluorescence principle, which is the standard method for the measurement of SO 2 concentrations. The measurements are conducted at regular intervals, and the collected data are transmitted in real time to servers at MODON for processing and storage. To provide confidence in the operational status of each AQMS, a comprehensive physical audit is conducted by Ricardo-AEA Ltd. (UK) on a quarterly basis. An external view of the AQMS installed in Jeddah is presented in Figure A2.

Author Contributions
A. Ukhov wrote the manuscript and constructed IC&BC for aerosol and chemistry species based on MERRA-2 reanalysis and took part in planning the calculations. S. Mostamandi performed the calculations, constructed meteorological BC&IC based on MERRA-2 reanalysis, and took part in the discussions. G. Stenchikov planned the calculations, led the discussion, and reviewed and improved the manuscript. Y. Alshehri collected, filtered, and validated SO 2 observational data and wrote the section on the SO 2 measurement procedures. N. Krotkov, J. Flemming, C. Li, V. Fioletov, C. McLimden, A. da Silva, and A. Anisimov participated in the discussion, advised on the products and emission data sets, and helped to formulate the research program, and reviewed the manuscript.