Mapping and Understanding Patterns of Air Quality Using Satellite Data and Machine Learning

The quantification of factors leading to harmfully high levels of particulate matter (PM) remains challenging. This study presents a novel approach using a statistical model that is trained to predict hourly concentrations of particles smaller than 10 μ m (PM10) by combining satellite‐borne aerosol optical depth (AOD) with meteorological and land‐use parameters. The model is shown to accurately predict PM10 (overall R 2  = 0.77, RMSE = 7.44  μ g/m 3 ) for measurement sites in Germany. The capability of satellite observations to map and monitor surface air pollution is assessed by investigating the relationship between AOD and PM10 in the same modeling setup. Sensitivity analyses show that important drivers of modeled PM10 include multiday mean wind flow, boundary layer height (BLH), day of year (DOY), and temperature. Different mechanisms associated with elevated PM10 concentrations are identified in winter and summer. In winter, mean predictions of PM10 concentrations >35  μ g/m 3 occur when BLH is below ∼ 500 m. Paired with multiday easterly wind flow, mean model predictions surpass 40  μ g/m 3 of PM10. In summer, PM10 concentrations seemingly are less driven by meteorology, but by emission or chemical particle formation processes, which are not included in the model. The relationship between AOD and predicted PM10 concentrations depends to a large extent on ambient meteorological conditions. Results suggest that AOD can be used to assess air quality at ground level in a machine learning approach linking it with meteorological conditions.


Motivation and Research Questions
Extensive research has been conducted in recent years on the adverse health effects of particulate matter (PM) on the human cardiovascular system and the lungs. Cohort studies show that negative effects include emphysema, lung cancer, diabetes, and hypertension (Lelieveld et al., 2019;Lim et al., 2012;Pope et al., 2002;Wichmann et al., 2000), which cause a large number of premature deaths (Lelieveld et al., 2019(Lelieveld et al., , 2015. Although these risks are largely known and confirmed, discussions on effective measures to reduce exposure to air pollution are ongoing. Suggested measures range from traffic bans for certain vehicle types (Ellison et al., 2013;Qadir et al., 2013) over a reduced or more efficient use of solid fuel-based residential heating (Chafe et al., 2015) to the expansion of urban vegetation (Bonn et al., 2016). However, the actual effects of those measures are not always evident as not only local emissions but also ambient conditions such as meteorology, vegetation, and other land cover can play a substantial role in determining local PM concentrations. Therefore, understanding and quantifying drivers of PM concentrations is necessary to improve the efficiency of measures toward better air quality. The influence of meteorological conditions on particulate matter concentrations are diverse (see Figure 1) and have been described, for example, by Rost et al. (2009), Reizer and Juda-Rezler (2016), Dupont et al. (2016), Li et al. (2017), and Fuzzi et al. (2015). Boundary layer height (BLH) determines the height up to which particles are distributed in the atmosphere (Gupta & Christopher, 2009a;Schäfer et al., 2012). Precipitation leads to a substantial reduction in PM concentrations by wet scavenging (Li et al., 2015;Rost et al., 2009). Wind regulates particle transport and turbulent mixing away from the surface (Li et al., 2017). Depending on the location of the measuring point, air masses from certain wind directions transport particles and lead to elevated concentrations (Lenschow et al., 2001;Li et al., 2015). Temperature can regulate the particle number in the atmosphere by stimulating photochemical reactions, which transform precursor gases to secondary aerosols (Gupta & Christopher, 2009a;Wang & Martin, 2007) or by causing partitioning of condensed precursor gases (Petit et al., 2014). Various land cover types can act as sinks or sources for particles (Beloconi et al., 2018;Bonn et al., 2016;Churkina et al., 2017). Topography can be of importance for PM concentrations, for example, as particles accumulate in valleys .
Satellite AOD provides information on atmospheric particle concentrations and can expand information to areas where station measurements are sparse , revealing hotspots and spatiotemporal variations of pollution (Cermak & Knutti, 2009;Gupta et al., 2006). However, relying on satellite AOD as a proxy for near-ground air pollution can be misleading, as AOD reflects the extinction of radiation in an atmospheric column, while particulate matter concentrations reflect a highly localized dry mass concentration of particles of a certain size distribution typically measured near ground (Wang & Christopher, 2003). Several studies have trained statistical models on the relationship between AOD and PM, accounting for a range of additional parameters, and mostly with a focus on applications (see review by Rybarczyk, 2018). Methods include linear regression models (Arvani et al., 2016), multiple-additive regression models (Gupta & Christopher, 2009a;Zhang et al., 2018), land-use models (Kloog et al., 2011;Nordio et al., 2013), or a combination of the latter two (Chudnovsky et al., 2014;Hu et al., 2014;Kloog et al., 2012). With increasing data availability and computational power, machine learning methods, for example, artificial neural networks (Gupta & Christopher, 2009b;Di et al., 2016) and random forests (RF) (Brokamp et al., 2017;Grange et al., 2018) have been applied frequently in recent years. These machine learning models are beneficial as they efficiently reproduce nonlinear relationships and interactions of input features (Brokamp et al., 2017;Elith et al., 2008). In contrast to physical models, machine learning approaches do not require extensive prior process knowledge and thus have the potential to reveal or quantify processes that are as yet undetermined (Knüsel et al., 2019). Multivariate processes can be investigated by isolating certain variables and studying inputs and responses for dominant patterns Cermak & Knutti, 2009;Fuchs et al., 2018).
While numerous air pollution studies apply statistical models mainly to accurately predict PM concentrations Kloog et al., 2015;Stafoggia et al., 2017;van Donkelaar et al., 2010;Zhang et al., 2018), recent studies additionally analyzed feature importances and the information content of explanatory variables in the statistical models to infer processes. For example, Park et al. (2019) used a RF model to predict PM10 in South Korea and found large influence of AOD, day of the year (DOY), wind speed, and solar radiation on the modeled PM concentrations. Similarly, Grange et al. (2018) obtained high feature importances for wind speed, back trajectory cluster, DOY, and air temperature, using the RF approach for PM10 measurements in Switzerland. The present study builds upon the approaches applied in these studies but provides a more in-depth analysis of model-inherent relationships. To this end, gradient boosted regression trees (GBRT) are used to understand and quantify the conditions driving air quality, as well as determinants of the relationship between AOD and PM10. GBRT have been successfully applied to study sensitivities of aerosol processes before (cf. Fuchs et al., 2018). Thus, a basis is set for targeted satellite-based analyses of spatial patterns of air quality.

Data and Methods
The data basis of this study is comprised of 8 years (2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015) including satellite observations from the moderate resolution imaging spectroradiometer (MODIS) and others, model output from the European Centre for Medium-Range Weather Forecasts (ECMWF), and station data from the German Meteorological Service (Deutscher Wetterdienst, DWD) and the German Federal Environmental Agency (Umweltbundesamt, UBA). AOD observations are based on the multi-angle implementation of atmospheric correction (MAIAC) algorithm. A statistical model is set up to predict hourly PM10 concentrations based on a variety of input features, which are summarized in Table 1. Data with high temporal resolution are selected to 2.1. Satellite Data 2.1.1. MAIAC AOD Satellite-borne, high-resolution (1*1 km) MAIAC Collection 6 AOD is used (Lyapustin et al., 2011a;Lyapustin, Martonchik, et al. 2011;Lyapustin et al. 2011b;Lyapustin et al., 2018). The product is based on data from the MODIS sensor aboard the Terra and Aqua satellites. The MAIAC algorithm makes use of look-up tables, explicitly taking into account surface bidirectional reflectance factors (BRF). The calculation of AOD relies on the assumption that surface BRFs remain largely constant over time, considering a time series of 16 consecutive days (Lyapustin et al., 2011a;Lyapustin, Martonchik, et al. 2011). Quality flags are applied for filtering. Pixels were only incorporated when clear conditions are reported, that is, no contamination of the data due to clouds is to be expected ). An additional filter was set up to exclude AOD close to clouds to avoid increased AOD near cloud fringes due to aerosol swelling effects (Schwarz et al., 2017;Várnai et al., 2013). Therefore, the distance to the nearest cloud as classified by the MAIAC algorithm  was calculated and a threshold of 0.1 • was set, which corresponds to ∼7 km. This is in the range of what previous studies used as threshold Koren et al., 2007). Terra or Aqua satellite overpass times were used as temporal reference for data collocation, that is, other input feature values closest to the overpass times were used. Valid daytime AOD acquisition times as used in this study range from ∼9 to ∼1 UTC. AOD is an important input to the statistical model as it provides implicit information on the total aerosol loading in the atmosphere, reflecting natural as well as anthropogenic sources.

MODIS NDVI
Sixteen-day NDVI averages obtained from the MODIS MOD13Q1 Version 6 product are used (DOI: 10.5067/MODIS/MOD13Q1.006) to approximate photosynthetically active vegetation (Tucker C. J., 1979). The NDVI was found to influence PM concentrations in several previous works (Beloconi et al., 2018;Chudnovsky et al., 2014;Stafoggia et al., 2017). Vegetation acts as a sink by increasing the aerodynamic roughness and available surface for mechanical deposition (Fuzzi et al., 2015). On the other hand, vegetation can increase background particle concentrations by emitting pollen in spring (Fuzzi et al., 2015) and by enhancing the creation of secondary organic aerosols (SOA, Churkina et al., 2017). Around each PM station coordinate, a window with an edge size of 20 km is established to reflect the local contribution of vegetation to the total particle concentration. An edge size of 20 km is comparable to the mean representativeness of the PM stations (EU, 2008). The mean NDVI of each window was used as a predictor.

NASA Night Lights
The NASA night lights data set is included as a surrogate for population density. Based on the data from the Visible Infrared Imaging Radiometer Suite (VIIRS) onboard the Suomi National Polar-Orbiting Platform (SNPP), the night-time lights product is available at 500 m resolution (Román et al., 2018). Population density is an important factor contributing to PM10 concentrations as it reflects human activity (Beloconi et al., 2018;Park et al., 2019). The mean night light intensity of a 20 km window around each PM station was used as a predictor.

EEA DEM
Topography can have a marked influence on the accumulation of particles . Here, the v1.1 EU-DEM is used, which is a hybrid product produced by the European Environment Agency (EEA) based on SRTM and ASTER GDEM fused by a weighted averaging approach. It has a spatial resolution of 25*25 m (more information and download at https://land.copernicus.eu/imagery-in-situ/eu-dem/ eu-dem-v1.1?tab=metadata). Station altitude and the dominant topography around each PM station are incorporated as predictors. A topographic position index (TPI) is computed to provide information on the topography of a pixel relative to its surrounding pixels. It employs the station altitude and subtracts the mean altitude of surrounding pixels in its vicinity. Positive values indicate a summit position, whereas negative values indicate a valley position (Egli et al., 2018). Again, a window size of 20 km was chosen.

EEA Corine Land Cover
Data from the CORINE land cover (CLC) inventory for the years 2006 and 2012 are used (Bossard et al., 2000), accessed via https://land.copernicus.eu/pan-european/corine-land-cover/view. In its finest thematic accuracy, the CLC data set consists of 44 classes. For this study, a more simplistic classification in six land cover types was chosen, which represent the most relevant land cover types influencing air quality (see  Table 2). The data have a spatial resolution of 250 m. Because the CLC data are categorical, a window of 20 km edge size was set up, and the number of occurrences of each type in that window was calculated. The number of occurrences of each feature is used as predictor for the model.

Reanalysis Data
The Era-Interim reanalysis data by the ECMWF provides full spatial coverage for the whole study period with an interpolated spatial resolution of 0.125 • (Dee et al., 2011). The data set has been successfully used in previous air quality studies (cf. Chen et al., 2018;Stafoggia et al., 2017;Zheng et al., 2017). To capture regional transport of particles, ERA-Interim reanalysis wind components (m/s) in east-west and north-south direction (10 m height) are used. Wind direction and speed can influence both particle concentrations (Beloconi et al., 2018;Chudnovsky et al., 2013;Li et al., 2015) and the relationship between AOD and PM10 (Stirnberg et al., 2018;Zheng et al., 2017). Wind direction and speed are included as instantaneous values and as the mean of precedent days (72 hr). BLH data are employed to approximate the vertical distribution of aerosols in the lower troposphere, implying that particles are well mixed within the boundary layer (Ansmann et al., 2000;Gupta & Christopher, 2009a). Dispersion of particles within a high and well-mixed boundary layer leads to reduced particle numbers near ground. If the BLH is low, particles accumulate and increase PM concentrations near ground as they are constrained to a smaller volume (Gupta & Christopher, 2009a;Wagner & Schäfer, 2017). Era-Interim temperature in 2 m height is included as instantaneous values and as temperature anomaly. Temperature anomalies are determined as the deviation of the expected value for each day of the year. Expected values are calculated by averaging daily values over 30 years, then calculating a running mean over 30 days to achieve a smooth sequence of the expected temperature. In addition to temperature, downward surface solar radiation (SSRD) and convective potential energy (CAPE) are included to capture potential secondary aerosol formation based on photochemical transformation processes (Wang & Martin, 2007) and to capture convective mixing in the atmosphere .

Ground-Based Data 2.3.1. UBA PM10
The focus here is on PM10 rather than PM2.5 because the latter excludes the fraction of larger particles, which are nevertheless accountable for light extinction and thus contribute to AOD measured by the satellite . In addition, PM10 measurements are more widely available. The PM stations are maintained by the UBA and measure the hourly mean PM10 concentration. PM concentrations are determined by measuring the attenuation of -radiation by a dust-coated filter (Umweltbundesamt, 2004) or by the principle of oscillating micro weighing (TÜVRheinland, 2012). To avoid condensation, the particle inlet is heated permanently. Thus, measurements are largely uninfluenced by temperature and humidity (Umweltbundesamt, 2004). The uncertainty of the continuous measurements is prescribed to be below 25% (Bundesministerium der Justiz und für Verbraucherschutz, 2010;VDI, 2002). Instructions by the European Union prescribe site locations that avoid microenvironmental effects. Stations are classified as background, industrial, or traffic and are designed to be representative for urban, rural, or suburban areas. Traffic sites are generally situated relatively close to main roads or intersections (EU, 2008). Urban background sites are assumed to capture the contribution of all sources near the site without one particularly dominating source. Suburban background stations need to be placed downwind (referring to the main wind direction) of emission sources. Rural background stations should not be influenced by agglomerations or industrial sites closer than 5 km (EU, 2008). The spatial distribution of station types and their altitudes are shown in Figures A1 and A2. UBA PM station coordinates are used as spatial reference for data collocation, that is, pixels from continuous data grids are collocated with the position of these stations if below a distance threshold of 0.01 • (∼0.7 km). Urban industrial stations are not considered in this study, as they are primarily influenced by local emissions from point sources that cannot be adequately represented with the available data. PM10 concentrations were checked for sudden peaks, which could be due to localized events. These are filtered out by eliminating situations, where the PM10 concentration was more than double the mean of the previous and following hours.

DWD Meteorological Data
Air pressure and relative humidity (RH) data were obtained from the German Meteorological Service (DWD) (DWD Climate Data Center [CDC], 2017). Previous studies found a positive correlation of PM10 concentrations and air pressure. Higher air pressure indicates stable synoptic conditions, which favors the accumulation of particles (Li et al., 2015). RH potentially enhances particle numbers by stimulating the formation of aqueous SOA, forming in cloud or aerosol water (Ervens et al., 2011). In addition, RH can influence the relationship between AOD and PM10, as higher levels of humidity lead to hygroscopic particle growth. Moisture on particles increases their diameter, eventually causing a rise in AOD without affecting PM10 measurements (

Continentality Factor
The relationship between AOD and PM10 as well as driving factors of PM10 concentrations may vary in different climatic regions (Di et al., 2016). This is accounted for by the inclusion of a dimensionless continentality factor k, which is calculated following the formula by Conrad (1946) where A corresponds to the difference between the hottest and coldest mean monthly temperature and to the latitude in decimal degree. Here, k was calculated using the mean temperature of July  and the mean temperature of January . Temperature data are provided by the DWD (DWD Climate Data Center [CDC], 2018).

DWD Radolan Precipitation
The influence of precipitation on PM10 concentrations includes wash-out effects and the limitation of movement of particles after precipitation events (Fuzzi et al., 2015;Li et al., 2015;Rost et al., 2009). In this study, data from the Radar Online Adjustment project (RADOLAN) are used. The data set is produced by the DWD and merges radar measurements with rain gauge data, also including orographic correction (Bartels et al., 2004;Weigl, 2017). The data are available with high spatial coverage and are expected to be better suited for the present analysis than the Era-Interim precipitation product, which has some known biases (de Leeuw et al., 2015). In the statistical model, the time since the last precipitation event (hr) and its magnitude (mm/hr) as well as the accumulated precipitation of the last 24 hr (mm) are included. Hourly means of precipitation around each PM station are averaged within a window of edge size 5 km. The chosen window size is smaller than the previous ones, as precipitation effects typically vary on small scales.

Other Input Data 2.4.1. EEA Emission Database
Annual emissions of NO x , PM10, SO 2 , and NH 3 (in tonnes, based on the year 2008) are included to approximate background pollution levels. The data are gathered by the European Environment Agency (EEA) and described in Theloke et al. (2009). Diffuse air releases from traffic, agricultural, industrial, and residential sources are covered. Strong emitters that fall under the European Pollutant Release and Transfer Register (E-PRTR) are not included in this data set.

Spatiotemporal Information
Seasonality was shown to be an important factor in previous studies (Grange et al., 2018). Here, DOY is used as seasonal proxy. To mirror the seasonal cycle, DOY was converted to a sine curve with +1 representing summer solstice and −1 representing winter solstice (Park et al., 2019;Stolwijk et al., 1999). To further approximate variability in emission strengths based on human activity, day of the week is included.
1. Decision trees use decision nodes to split the predictor space in subsets, which provide the most homogeneous distribution, that is, the subsets' variance is minimized. For each subset, regression trees fit the mean response of the observations that go into the model (Elith et al., 2008). 2. Similar to the RF method, GBRT consist of an ensemble of decision trees. In GBRT models, however, the construction of the ensemble is different, as decision tree regressors are sequentially added to the ensemble (Elith et al., 2008;Hastie et al., 2009;Rybarczyk, 2018). Each new tree that is added to the ensemble boosts its predecessor with the goal to minimize a loss function, and existing trees are not changed when new trees are added. The trees are fitted on a subset of the complete data set, which induces a random component to the model to reduce overfitting (Elith et al., 2008;Hastie et al., 2009). 3. Each new predictor is fitted to the predecessor's previous residual error using gradient descent (Hastie et al., 2009;Elith et al., 2008).
GBRT capture complex interactions and interactive effects between individual predictors (Brokamp et al., 2017;Elith et al., 2008), which nevertheless need to be considered when interpreting the model outcome.
An advantage of tree-based methods such as RF or GBRT is that, compared to deep learning methods, model decisions can be retraced and dependencies of the model outcome to input features can be quantified, allowing for conclusions regarding physical processes. This makes GBRT an interpretable machine learning method. GBRT theoretically produce results more effectively than the RF method, as trees are built systematically and less iterations are required (Elith et al., 2008). GBRT have shown to have good predictive power in previous studies (Elith et al., 2008;Fuchs et al., 2018;Just et al., 2018). The general framework of setting up the model is shown in Figure 2 and includes input feature selection, hyperparameter tuning, model training, and model validation.

Feature Selection: Recursive Feature Elimination
Redundant input features lack information and potentially degrade model performance by inducing misleading information, thus weakening the target orientation of the model (Meyer et al., 2018). Therefore, a feature selection is conducted to eliminate redundant predictors. A base model is initialized with a fixed number of trees (500), a fixed learning rate (0.1), and all other hyperparameters on default settings. One feature is then excluded, and the data set is randomly split into a training data set (2/3) and a test data set (1/3) 50-fold. After 50 repetitions, the decrease in model performance on the test data set due to the exclusion of the feature was determined using R 2 as indicator. In the final model, the magnitude of the last precipitation event, the accumulated rainfall of the last 24 hr, and the annual mean emission of SO 2 are excluded, as their exclusion did not lead to a decrease in model performance.

Hyperparameter Tuning, Model Training, and Model Evaluation
Hyperparameters refer to the model architecture, for example, the number of trees or the number of decision nodes. The determination of adequate model hyperparameters is essential to avoid overfitting of the model but at the same time ensures that the model is able to generalize. A grid search is executed, where several parameter combinations are tested. A list of tested parameters is provided in Table 3.
There are trade-offs between max_depth, n_estimators, and learning_rate. A lower learning_rate requires a higher number of trees, as the contribution of each tree is decreased. The contribution of each tree is effectively the step size of the gradient descent. Thus, if learning_rate is too large, the model cannot adapt to the training data. The number of decision nodes in a tree (max_depth) also affects n_estimators: Increasing max_depth reduces the number of necessary trees but increases the risk of overfitting.
The model penalizes erroneous predictions to improve accuracy. The calculation of the penalty value is determined by the loss function. In this study, a least squares loss function is chosen. The least squares loss function is sensitive to very high (low) values as it strongly penalizes large deviations between predictions and observations and will adjust the model accordingly. This is desirable, as the model should be able to reproduce high concentrations of PM10. Model performance is validated using two kinds of validation strategies. The integrated scikit-learn split function creates a random training (2/3) and test data set (1/3), ensuring a comparable distribution of both data sets. To test the spatial generalizability of the model, a leave-location-out (LLO) approach is conducted. Therefore, one third of randomly chosen stations are restrained for validation. If the model performance is lower compared to the random approach, the spatial generalizability of the model is limited (Meyer et al., 2018).
For further analysis, four seasonal models are trained on seasonal subsets of the data (see also Figure 2) using the hyperparameters determined in the full-year grid search. One input feature set representative for all seasonal models and the full-year model is used to ensure their comparability. The only exception is the EEA emission data set, which is only included in the yearly model, since it represents yearly emissions.
2.6. Model Interpretation: Isolation of Feature Contributions 2.6.1. Feature Importance The relative feature importance reflects the explanatory power a feature provides to the model. Assuming that the model is able to capture physical processes well, the feature importance represents a valuable qualitative measure to determine the relative magnitude of the influence of input features to predicted PM10 concentrations. The feature importance is calculated by repeated permutation of one feature (Strobl et al., 2007).

Partial Dependence
To quantify the influence of input features on the model, the partial dependence (PD) of modeled PM10 concentrations on input features is calculated. PDs express the average effect of one input feature on the modeled PM10 outcome while accounting for average effects of complement input features (Elith et al., 2008;Hastie et al., 2009;Goldstein et al., 2015). The investigated input feature is gridded and the corresponding average PM10 prediction is calculated with respect to complement features, which are varied over their marginalized distributions (e.g., 1st-99th percentile). Thus, PD plots reflect the mean change of average predicted PM10 concentrations based on one input feature. The isolated effects of input features on the model response can be evaluated and put into context regarding their significance to physical and chemical processes determining PM10 concentrations (Grange et al., 2018).

Individual Conditional Expectation
PD plots reflect the mean model response and neglect model heterogeneity. Model responses to single data instances (i.e., one set of input features related to one PM10 observation) can be unwrapped and bundled in one plot using individual conditional expectation (ICE). ICE plots reflect individual predicted responses as a function of one data instance depending on correspondent feature observations. Model responses are computed by keeping the complement features constant while the investigated feature varies, thus creating new

Model Performance
The overall model performance is shown in Figure 3, depicting observed PM10 versus predicted PM10 for the validation data. The full-year model explains 77% of the variance. The slope (0.76) shows a slight overestimation of low PM10 concentrations as well as an underestimation of high PM10 concentrations. Presumably, the underestimation is due to processes not captured by the input features, that is, street-scale processes not covered by AOD observations but still influencing PM10 observations, such as increased PM10 emissions due to traffic jams or localized dust resuspension. In addition, the model possibly tends to underestimate higher PM10 observations, because most valid data points are available for medium to low PM10 concentrations. Thus, the model is optimized to best reproduce these observations. This tendency was reduced by the choice of the least squares loss function as describes in chapter 2.5.3 but likely still continues to affect the model accuracy. The model performance is comparable to similar studies, also in its underestimation of PM Grange et al., 2018;Stafoggia et al., 2017;Zhang et al., 2018). Tenfold random train/test splits were conducted, resulting in 10 models. Validation of these models revealed very similar performances, which is why only one model is shown and used for subsequent model analysis. Applying the LLO split  Table 1. approach slightly deteriorated model performance. Depending on the restrained stations, R 2 ranged from 0.5 to 0.7. The model is considered as adequate to be used for further investigations. Nevertheless, note that high PM10 values tend to be underestimated by the model (see Figure 3). The spatial distribution of the model skill is shown in Figures B1 and B2.
Model performances vary seasonally (see Figure 4). The model performs best in winter and spring with high R 2 values (0.77) and low NRMSEs (0.32 and 0.3), while R 2 is lowest in summer with an R 2 of 0.63 and a slightly increased NRMSE of 0.34. PM10 concentrations generally show less variance in summer, which reduces the RMSE but possibly provides the model less variance to learn from and deteriorates its skill (i.e., R 2 ). Obviously, processes governing PM10 concentrations in summer are not as well captured by the model. This will be further addressed in the following chapters.

Information Content of Input Features
Temperature (anomaly and absolute), AOD, 3-day mean east-west wind component, DOY, and BLH are of high importance to the model (see Figure 5). The importance of the DOY suggests that the model captures the seasonality of PM10 concentrations, which are higher in winter and lower in summer. The relatively high importance of AOD and the good model performance emphasize the suitability of AOD to infer on PM10 concentrations when additional parameters are taken into account. A comparison to similar studies, for example, by Grange et al. (2018) and Park et al. (2019), reveals comparable relative feature importances of AOD, solar radiation (Park et al., 2019), DOY, and BLH (Grange et al., 2018;Park et al., 2019). The high importance of the 3-day mean of the east-west wind component found in this study aligns with the high importance of the back trajectory clusters in the study by Grange et al. (2018). Both parameters reflect regional particle transport. However, there is a discrepancy regarding the importance of wind speed and temperature. Wind speed is considered as instantaneous wind components in this study, which have limited importance. While the importance of absolute air temperature is comparable, the high importance of temperature anomalies found here shows that the approach of splitting temperature information into absolute values and anomalies as pursued in this study provides additional information to the model. Since comparing feature importance values can provide only limited insights into processes behind air pollution patterns, further quantitative analyses are presented in the following chapters using the ICE approach.

Model Sensitivity 3.3.1. Mesoscale Wind Information
The PD of the 3-day mean east-west wind component shows a consistent pattern throughout all seasons. Positive values (i.e., prevailing western direction of inflow) are associated with reduced concentrations of PM10, whereas a negative east-west wind component (i.e., winds from the east) is associated with increased model PM10 concentrations (Figure 6). For the full-year model the maximum difference in mean PM10 predictions is ∼10 g/m 3 , while in winter, the maximum difference is ∼20 g/m 3 . These numbers agree well with results from van Pinxteren et al. (2019), who quantified the influence of eastern air masses on eastern Germany. They found the contribution of trans-boundary transport from eastern European countries to be 13 g/m 3 on average, depending on meteorological conditions. Air masses from continental eastern Europe tend to transport higher amounts of particles, whereas western, more maritime air tends to be cleaner due to precipitation along the trajectories of air masses. Source regions of PM10 include industrial and residential areas in Poland and the Czech Republic with heavy industries or extensive usage of solid fuels for residential heating (Beloconi et al., 2018;Kiesewetter et al., 2015;Reizer & Juda-Rezler, 2016;van Pinxteren et al., 2019). Results by Grange et al. (2018) also show increased values of PM10 for northern and northeastern wind directions, although to a lesser extent. In winter, the effect of particle transportation is strongest, presumably due to increased emissions from domestic heating in eastern Europe (Reizer & Juda-Rezler, 2016;  Figure 7). This could indicate insufficient mixing of the atmosphere, which would lead to an accumulation of particles near ground . Overall, instantaneous wind information has little influence on the model, causing the PD to remain relatively constant. The relatively constant PD of instantaneous wind information implies little influence on PM10 predictions. This suggests that wind information needs to be extended to a longer time scale to influence PM predictions.As shown in Grange et al. (2018), wind speed aggregated for a daily period can substantially influence PM10 concentrations, with lower speeds causing higher concentrations. Park et al. (2019) also found the maximum wind speed of previous 3 hr to be of importance for their statistical predictions of PM10. The north-south wind component PDs (instantaneous and 3 days) do not provide clear trends.

BLH and CAPE
The PD of BLH shows that the model is able to reproduce the pattern of decreasing particle concentrations with increasing BLH (Gupta & Christopher, 2009a;Wagner & Schäfer, 2017).The shape of the full-year PD of BLH shown in Figure 8a is similar to that provided by Grange et al. (2018) for observations in Switzerland. They found a reduction of ∼8 g/m 3 for daily PM10 predictions. A reduction in mean PM10 predictions of ∼10 g/m 3 for situations with higher BLH can be seen in Figure 8a. This similarity is encouraging and proves the robustness of the modeling approach, since both studies use ERA-Interim BLH in a comparable geographic setting. For BLH values above ∼800 m, the PD remains constant, that is, the influence of the boundary layer on PM concentrations stagnates (Liu et al., 2018). Mean modeled PM10 concentrations increase slightly in conditions with very high BLH (>2,000 m). This pattern could be related to the formation of a deep, convective boundary layer coinciding with high temperatures, enhancing the formation of secondary aerosols (Grange et al., 2018). The abundance of radiation, high temperatures, and precursor gases at excess concentrations would be needed therefore (Fuzzi et al., 2015). Indeed, the pattern is most prominent in summer, when these prerequisites are most likely to be met. The PD of BLH in summer is almost constant, that is, little information is provided to the model. Figure 8d reflects a shift of the frequency of occurrence of data points toward higher BLH: During summer months, medium to high BLHs are more likely to occur due to enhanced convection. As mentioned before, a BLH above 800 m provides only little information to the model, thus reducing its predictive ability. The accumulation effect of a lower BLH is most pronounced in winter with increased mean PM10 predictions of almost 20 g/m 3 . In addition, PM10 emissions are expected to be higher in wintertime due to combustion of solid fuels for domestic heating. This has been shown for Eastern European countries (Reizer & Juda-Rezler, 2016;van Pinxteren et al., 2019) and likely influences PM10 concentrations in Germany. Thus, with higher locally produced or advected PM10 emissions, the accumulation effect of a low BLH is more distinct in winter than in summer, where reduced emissions are expected and an accumulation of particles is not expected (Wagner & Schäfer, 2017). A dependence of model PM10 on CAPE could not be identified using the PD approach. Similar to the previously shown PD, the two-way PD shows the mean modeled PM10 (color coded from light orange to black) to the isolated effects of umean and BLH. On top of the PD isolines, a probability density function (PDF) using Gaussian kernel density estimates is added to indicate the qualitative frequency of occurrence of values from high (yellow) to low (dark blue).

Two-Way PD: umean and BLH
Depicting the PD of BLH and the 3-day mean east-west wind component simultaneously allows for the quantification of particularly high-polluted situations, considering combined effects of both features. A probability density function (PDF) using Gaussian kernel density estimates is added to provide a qualitative estimate of the frequency of occurrence of PD values. Values outside the PDF estimation as shown in Figure 9 are extrapolated based on the trained model and do not represent observed data.
The two-way PD suggests mean modeled PM10 concentrations double due to changes in BLH and wind flow, referring to the full-year model. Highest mean predictions (∼35 g/m 3 ) are modeled when eastern winds coincide with shallow boundary layers, whereas lowest mean predictions occur during medium BLH and western winds (∼17 g/m 3 ). Note that due to the tendency of the model to underestimate high PM10 levels, concentrations could be higher in reality. Patterns differ seasonally. In winter, highest mean PM10 predictions surpass 45 g/m 3 during shallow BLH conditions and wind flow from the east. In summer, this is not the case. As mentioned in chapter 3.3.2, there is indication of elevated PM10 concentrations during very high BLH (∼>2,000 m) conditions, coinciding with eastern wind flow. Highest mean predictions in summer do not surpass ∼22 g/m 3 (within the limits of the PDF estimation).

Thermal Influence on PM10
In spring, summer, and autumn, positive temperature anomalies cause a marked increase of mean model PM10 predictions (see Figure 10). Presumably, higher temperatures in spring and summer (Figures 10c  and 10d) reflect enhanced biogenic activity, as vegetation is generally more active at higher temperatures. Consequently, emissions of primary particles such as debris or pollen and the emission of biogenic volatile organic compounds (BVOCs) are stimulated (Laothawornkitkul et al., 2009). With higher BVOC emission, an enhancement of secondary SOA formations is leveraged (Churkina et al., 2017;Megaritis et al., 2013). In addition to increased biogenic activities, higher temperatures cause the soil to dry up more quickly, thus increasing dust emissions (Hoffmann & Funk, 2015). In winter, positive anomalies have very little effect on predicted PM10 concentrations (Figure 10b). This supports these hypotheses, since neither increased biogenic activity nor dried-up soils are to be expected in winter. Higher temperatures could reflect increased photochemical oxidation processes, which trigger photochemical reactions leading to new particle formation processes (Birmili & Wiedensohler, 2000;Größ et al., 2018;Wiedensohler, 2000). However, there was no trend in the partial dependence of SSRD, which shows a weak influence of SSRD on PM10 predictions (see Figure C1). Note however that the lacking influence of SSRD could also be due to the fact that this study is confined to cloud-free situations due to the availability of AOD. The variation of SSRD would be higher when including cloudy days, which would likely improve the information content provided to the model by including SSRD. Another possible explanation for increased PM10 concentrations at higher temperatures could be that these situations are associated with stable synoptic conditions (at least in spring, summer, and autumn), causing particles to accumulate in the atmosphere. PD trends for instantaneous temperature ( Figure 11) are inverse to those described for temperature anomalies. The full-year model PD shows a decrease in predicted PM10 concentrations for higher temperatures. Presumably, instantaneous temperature reflects the annual cycle of PM10 (similar to DOY), which is why the seasonal PDs show no trends. PD of air temperature as presented in Grange et al. (2018) reveal a different pattern. In their study, temperatures below freezing are associated with high PM10 concentrations, medium temperature in the range of 0-15 • C with low PM10 concentrations, and temperature above 15 • C with high PM10 concentrations. However, when combining the effects of temperature anomalies and temperature presented in Figures 10a  and 11a, the emerging pattern would be similar. This suggests that the model presented in this study is able to discriminate between the seasonal component of temperature and the immediate effect of temperature on particle emissions, for example, due to new particle formation (cf. Birmili & Wiedensohler, 2000;Bressi et al., 2013;Größ et al., 2018;Petetin et al., 2014;Wiedensohler, 2000)

RH and Precipitation
Increased RH is associated with higher PM10 predictions (see Figure 12). This is likely related to an increase in AOD due to aerosol swelling in humid conditions (Crumeyrolle et al., 2014;Wang & Christopher, 2003).
Other than increasing AOD, the importance of RH could also be related to a correlation between BLH and RH (see PDF estimate in Figure 13). Higher RH reduces the magnitude of turbulent vertical flux and subsequently reduces the BLH (Adamopoulos et al., 2007;Petäjä et al., 2016), which in turn could increase PM10 predictions. On the other hand, vertical transport of water vapor is impeded by a low BLH, which increases RH and stimulates formation of aqueous secondary aerosols (Liu et al., 2018). To analyze the influences of BLH and RH on predicted PM10, a two-way PD of RH and BLH is calculated (see Figure 13). It shows a changing pattern with increasing RH, visible in a change of line structure orientation from vertical to more horizontal in the two-way PD plot. While BLH dominates during shallow boundary layer conditions (vertical lines), the influence of RH is more prominent at higher BLH values (horizontal lines). This pattern points to an influence of RH on PM10 predictions, which is decoupled from the BLH, as the influence of BLH above 800 m is marginal. A similar pattern is found for all seasons except for summer. In summer, the model outcome does not show any response to changes in BLH, as there are rarely any BLH values below 800 m (see Figure D1). A study by Belle et al. (2017) conducted in the United States found RH to have positive impact on PM2.5 concentrations during cloud-free conditions due to an increase in sulfate and nitrate masses. However, they found PM2.5 to decrease with increasing RH during cloudy conditions.
The more time passed since the last precipitation event, the higher the PM10 prediction tends to be, reflecting the accumulation of local emissions in the atmosphere. The influence of this effect on the model is not pronounced and stagnates from about 100 hr (see Figure C2). The magnitude of the last precipitation event and accumulated precipitation of last 24 hr were not included in the model due to lacking importance as determined in the feature selection (see chapter 2.5.2). Note however that the low importance of precipitation could be related to the consideration of only cloud-free days in this study. Thus, the immediate effect of rainfall on particles in the atmosphere cannot be investigated. In addition, possible effects of precipitation along the trajectories of advected air masses are not covered.

NDVI, Corine Land Cover, and Spatiotemporal Factors
The NDVI was of minor importance for the prediction of PM10 concentrations. No trends or seasonal differences were found by application of the PD approach. However, with increasing number of pixels in the vicinity of a PM station classified as agricultural areas (2CLC), PM10 concentrations tend to be higher. Likely, this is related to primary emission of dust from arable lands and the application of fertilizers (NH3), which constitute important precursors for secondary particle formation (Hoffmann & Funk, 2015;Wagner et al., 2015). For the other land cover classes, no trends were found using the PD approach. Lower PM10 concentrations are predicted on Saturdays and Sundays, indicating that reduced anthropogenic activity (less traffic, reduced industrial production) has an immediate effect on PM10 concentrations. Increasing altitude slightly reduces the mean PM10 prediction, possibly due to lower population density and more effective pollution dispersion processes (Beloconi et al., 2018;Hu et al., 2014). The PDs of NH3, continentality, PM10 mean annual emissions, and of the TPI do not show a distinct trend. SO 2 was excluded from the model during the feature selection process.

Determinants of the Relationship Between AOD and PM10
The full-year model and the seasonal models associate increasing AOD with increasing PM10 (see Figure 14). This pattern is less distinct in summer (except for very high AOD) when particles are generally more dispersed within a well-mixed boundary layer, and the AOD is largely determined by particles higher up in the atmosphere, thus weakening the relation between AOD and PM10.
The relationship between AOD and PM10 is not bivariate and can be modified by ambient meteorology (Gupta & Christopher, 2009a;Guo et al., 2009;Sorek-Hamer et al., 2017;Stirnberg et al., 2018). A quantification of this effect is approached here by using the two-way PD method.
The two-way PD of AOD and BLH reveals a dependence of the model on both AOD and BLH (see Figure 15). The importance of interactive effects of these features can be illustrated by the following example: assume an AOD of 0.2 and BLH of 2,000 m versus an AOD of 0.2 and BLH of 200 m. In the latter case, the mean predicted PM10 concentration is ∼10 g/m 3 higher as the aerosol content determining AOD is closer to the ground and thus more relevant for the PM10 prediction. In other words, a prediction based on AOD (assuming that AOD is largely determined by attenuation in the boundary layer Schäfer et al., 2008) alone would lead to erroneous PM10 predictions, as AOD does not fully capture the particle accumulation effect of a shallow boundary layer (cf. Stirnberg et al., 2018). Similar effects can be observed for the two-way PD of AOD and the 3-day mean east-west wind component (see Figure 15) and for the two-way PD of AOD and 10.1029/2019JD031380 Figure 15. Two-way PD of AOD and BLH, full-year model. Description as in Figure 9. temperature anomalies (plot not shown). The two-way PD of AOD and BLH shows only a minor seasonal pattern, which is mostly driven by BLH (see Figure D2) Westerly wind flow (positive umean) leads to substantially lower PM10 predictions when compared to similar AOD values in situations dominated by easterly wind flow (negative umean). The two-way PD suggests this effect to be as large as ∼8 g/m 3 (see Figure 16). Air masses from the east possibly carry a higher amount of near-ground particles (Beloconi et al., 2018;Bonn et al., 2016;Reizer & Juda-Rezler, 2016), affecting PM10 observations more strongly than AOD. Another reason for the observed effect could be that western air masses carry a relatively large amount of sea salts with a high hygroscopic growth factor. By effectively taking up water, these constituents enhance light scattering, thus increasing AOD without increasing PM10 measurements (Stirnberg et al., 2018;Zieger et al., 2014;Zieger et al., 2013). The seasonality for the two-way PD of AOD and umean is weak (see Figure D3).

Summary, Conclusions, and Outlook
A machine learning model is used to advance the understanding of drivers of near-ground PM10 and the capability to use satellite AOD to infer on PM10. Parameters pertaining to meteorology, land cover, and satellite-based AOD are considered and related to hourly PM10 concentrations. The model performs well (overall R 2 of 0.77, RMSE = 7.44 g/m 3 ) and provides a basis to assess sensitivities. These allow for the isolation and quantification of effects of ambient conditions on PM10. Overall, the model is more sensitive to meteorological conditions than to land cover parameters. BLH, east-west winds, DOY, temperature, and RH are identified as the important driving factors of PM10 variations. Representing regional particle transport, the 3-day mean of the east-west wind component substantially modifies PM10 concentrations, Figure A1. Spatial distribution, type, and representativeness of UBA PM measurement stations in Germany. depending on the direction of inflow. Eastern inflow generally increases PM10 concentrations. Modeled PM10 concentrations were also increased during higher than average temperatures. Possibly, this is due to stimulated vegetation activity, increasing primary particle and precursor gas emissions. The influence of BLH is most prominent at very low (∼<500 m) values. However, there is indication that very high BLH values (∼<2,500 m) influence PM10 concentrations as well. While the former threshold marks the effects of Figure A3. Mean PM10 concentrations for measurement stations used in this study. Time period is the time frame analyzed in this study (2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015). particle accumulation within a shallow boundary layer, the latter threshold could indicate the formation of a deep boundary layer with stimulated formation of secondary aerosols as suggested by Grange et al. (2018). If BLH is between these thresholds, its explanatory power is limited. In these situations, other processes determine PM10 concentrations. Overall, the model outcome suggests that there are different meteorological boundary conditions that potentially cause elevated PM10 concentrations in winter and summer ( Figure 17). In winter, very shallow boundary layers, coinciding with multiday easterly wind flow cause highest mean PM10 predictions (>30 g/m 3 ). Mean PM10 predictions for these conditions are as high as 40 g/m 3 (see Figure 9). This is probably related to higher anthropogenic emissions in winter and frequently low BLH.

Journal of Geophysical
In summer, higher temperatures associated with increased formation of secondary aerosols and coinciding with multiday easterly wind flow lead to mean predicted PM10 concentrations >27 g/m 3 (figure not shown). High PM10 concentrations in summer appear to be largely uncoupled from changes in BLH (see Figure 8d). The model R 2 decreases in summer, suggesting that the statistical model does not as well resolve these processes. In addition, the relationship between AOD and PM10 is weaker in summer.
Results presented in this study suggest that meteorology plays a substantial role in the development of high pollution situations. This has potential implications for plans toward better air quality in high-polluted areas, as meteorological conditions need to be taken into account, for example, for temporary traffic bans. In addition, there is a need to introduce measures to reduce air pollution on a regional scale. Measures limited to city scales can only decrease pollution levels associated with local emission sources, which can be superimposed by transported particles.
The importance of AOD for the statistical model highlights the suitability of AOD for air quality studies. However, potential implications and limitations for the use of satellite AOD for air quality studies are described. This study has shown that satellite-derived AOD can be used to infer street-level PM10 concentrations, if ambient meteorological conditions are taken into account explicitly. In particular, temperature anomalies, the east-west regional wind component, and BLH modify the relationship between PM10 and AOD. A drawback of including AOD is the restriction to cloud-free situations, which potentially introduces a bias due to nonrandom data gaps . Depending on the situation and location, both an overestimation or underestimation of PM10 could be the consequence . In addition, the influence of certain meteorological variables could be underestimated due to important processes under cloudy conditions, which are not covered Brokamp et al., 2018). The use of GBRT proved fruitful to understand interconnected processes and the approach presented here can be potentially expanded to other research questions focusing on the understanding multivariate processes. Future efforts will further address the determination of mechanisms leading to high pollution events using machine learning not only for total PM10 concentrations but for individual aerosol species.  Rhine-Ruhr region (northwest), performance seems to be generally worse. These stations generally have high mean PM10 concentrations (see Figure A3). However, it appears that other urban areas, which also have high mean PM10 concentrations can be modeled quite well (e.g., Berlin in the northeast or Hamburg in the north). Acronyms AOD aerosol optical depth BLH boundary layer height BRF bidirectional reflectance factor CAPE convective available potential energy CLC Corine land cover DEM digital elevation model  MODIS moderate resolution imaging spectroradiometer NDVI normalized difference vegetation index PDF probability density function PM particulate matter RADOLAN Radar-Online-Aneichung RF random forest