Volume 124, Issue 5 p. 2753-2773
Research Article

Toward Improving Short-Term Predictions of Fine Particulate Matter Over the United States Via Assimilation of Satellite Aerosol Optical Depth Retrievals

Rajesh Kumar (Corresponding Author)

National Center for Atmospheric Research, Boulder, CO, USA

Correspondence to: R. Kumar

Luca Delle Monache

National Center for Atmospheric Research, Boulder, CO, USA

Contribution: Conceptualization, Methodology, Resources, Writing - review & editing, Supervision, Project administration, Funding acquisition

Jamie Bresch

National Center for Atmospheric Research, Boulder, CO, USA

Contribution: Methodology, Software, Writing - review & editing

Pablo E. Saide

Department of Atmospheric and Oceanic Sciences, Institute of the Environment and Sustainability, University of California, Los Angeles, CA, USA

Contribution: Methodology, Software, Writing - review & editing

Youhua Tang

National Oceanic and Atmospheric Administration, College Park, MD, USA

Contribution: Methodology, Software

Zhiquan Liu

National Center for Atmospheric Research, Boulder, CO, USA

Contribution: Methodology, Software

Arlindo M. da Silva

National Aeronautics and Space Administration Goddard Space Flight Center, Greenbelt, MD, USA

Contribution: Software

Stefano Alessandrini

National Center for Atmospheric Research, Boulder, CO, USA

Contribution: Conceptualization, Writing - review & editing

Gabriele Pfister

National Center for Atmospheric Research, Boulder, CO, USA

Contribution: Conceptualization, Writing - review & editing

David Edwards

National Center for Atmospheric Research, Boulder, CO, USA

Contribution: Conceptualization, Writing - review & editing

Pius Lee

National Oceanic and Atmospheric Administration, College Park, MD, USA

Contribution: Conceptualization, Software, Writing - review & editing

Irina Djalalova

Physical Sciences Division, Cooperative Institute for Research Sciences/NOAA Earth System Laboratory, Boulder, CO, USA

Contribution: Conceptualization, Software

First published: 17 February 2019

Abstract

This study develops a new approach to improve simulations of particulate matter with aerodynamic diameter smaller than 2.5 μm (PM2.5) in the Community Multiscale Air Quality (CMAQ) model via assimilation of Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol optical depth (AOD) retrievals using the Gridpoint Statistical Interpolation (GSI) system. In contrast to previous studies that only consider errors due to transport, our computation of the background error covariance matrix incorporates uncertainties in anthropogenic emissions. To understand the impact of this approach, three experiments (one background and two assimilations) are performed over the contiguous United States (CONUS) from 15 July to 14 August 2014. The background CMAQ experiment significantly underestimates both the MODIS AOD and surface PM2.5 levels. MODIS AOD assimilation pushes both the CMAQ AOD and surface PM2.5 distributions toward the observed distributions, but CMAQ still underestimates the observations. Averaged over CONUS, the two assimilation experiments with and without the anthropogenic emission uncertainties improve the correlation coefficient between the model and independent observations of PM2.5 by ~67% and ~48%, respectively, and reduce the mean bias by ~38% and ~10%, respectively. The assimilation improves the model performance everywhere over CONUS except in New York and Wisconsin, where CMAQ overestimates the observed PM2.5 during nighttime after assimilation, likely because of overcorrection of aerosol mass concentrations by the AOD assimilation. Future work should incorporate uncertainties in other processes (biomass burning and biogenic emissions, deposition, chemistry, transport, and boundary conditions) to further enhance the value of assimilating spaceborne AOD retrievals.

Key Points

  • Assimilation of MODIS AOD retrievals in CMAQ via GSI to improve surface PM2.5 forecasts
  • Accounting for uncertainties in anthropogenic emissions significantly improves the model performance after data assimilation
  • The assimilation-induced improvements in CMAQ aerosol initial conditions last for more than 48 hr

1 Introduction

The U.S. Environmental Protection Agency (EPA) has defined fine particulate matter (PM), that is, PM of less than 2.5 μm in aerodynamic diameter (PM2.5), as one of the six criteria air pollutants under the Clean Air Act. Elevated PM2.5 levels adversely affect human health, can cause premature mortality via acute respiratory and cardiovascular diseases (Burnett et al., 2014; Fann et al., 2012), and may result in economic losses due to health care expenditure, missed school and work, and lost potential income from premature deaths. While the adverse health impacts of elevated PM2.5 levels have long been known, a recent study showed that long-term exposure to PM2.5, especially among the elderly population, can cause premature deaths even at levels below the National Ambient Air Quality Standard (NAAQS) of 12 μg/m³ (annual average; Di et al., 2017).

The aforementioned impacts of PM2.5 can be mitigated if vulnerable groups and individuals receive timely information about anticipated PM2.5 pollution episodes so that they can take actions (e.g., reduce outdoor activities) to limit their exposure. The positive effects of such timely information have been demonstrated in previous studies. For instance, 57% of people with lifetime asthma and 51% of the people without asthma are reported to avoid exposure to air pollution in six U.S. states (Colorado, Florida, Indiana, Kansas, Massachusetts, and Wisconsin) by reducing their outdoor activities following air quality alerts from their healthcare professionals (Wen et al., 2009). Air quality alert announcements in Canada are also reported to reduce asthma-related emergency department visits by about 25% (Chen et al., 2018).

The responsibility of providing the timely information lies with the air quality managers across the United States, who provide this information by analyzing air quality and weather observations along with Numerical Weather Predictions (NWPs) and PM2.5 guidance from the National Air Quality Forecasting Capability (NAQFC). The NAQFC uses a state-of-the-science chemistry transport model (CTM) called the Community Multiscale Air Quality (CMAQ) model to predict PM2.5 (Lee et al., 2017). CMAQ employs advanced numerical procedures and sophisticated algorithms to process emission inventories and parameterizes a variety of atmospheric physical and chemical processes to predict concentrations of air pollutants including PM2.5. However, CTM simulations suffer from both systematic (i.e., biases) and random errors due to a number of factors including numerical approximations, inadequate understanding of some of the processes that control the spatial and temporal distribution of air pollutants, inaccuracies in initialization of the physical and chemical atmospheric state, and uncertainties in the emission inventories (Russell & Dennis, 2000). While continuous efforts are being made to improve the representation of processes controlling PM2.5 in CMAQ (e.g., Appel et al., 2013, 2017; Fahey et al., 2017; Nolte et al., 2015) and emission inventories are updated by the EPA every 3 years, recent developments have shown that improving initialization of aerosol mass concentrations in CTMs including CMAQ via assimilation of ground-based observations of PM2.5 and satellite retrievals of aerosol optical depth (AOD) can significantly improve PM2.5 predictions (e.g., Chai et al., 2017; Liu et al., 2011; McHenry et al., 2015; Pagowski et al., 2014; Saide et al., 2013; Schwartz et al., 2012; Tang et al., 2017).

This study develops a new approach to improve CMAQ aerosol initialization and short-term (48 hr) predictions of PM2.5 via the assimilation of Moderate Resolution Imaging Spectroradiometer (MODIS) AOD retrievals in the three-dimensional variational (3DVAR) framework of the community Gridpoint Statistical Interpolation (GSI) system (Developmental Testbed Center [DTC], 2016). Specifically, the incorporation of anthropogenic emission uncertainties in the background error covariance (BEC) matrix represents a novel aspect of this work. The BEC matrix plays a vital role in the variational analysis because, along with the observation errors, it determines how much of the innovation (the difference between the observed and modeled values) actually becomes the analysis increment and how the analysis increment is spread to neighboring grid points both horizontally and vertically. In principle, the BEC matrix could be calculated from the difference between the modeled and true values of the analysis variables (aerosol chemical composition in this case). However, the true values of the analysis variables are unknown, and the enormous size of the BEC matrix (~10⁸ × 10⁸ = ~10¹⁶ elements in our case) prohibits its explicit calculation. Thus, its calculation needs to be simplified, and several methods have been developed for this purpose.

The GSI system uses statistical parameters (variances, and horizontal and vertical correlation length scales) to approximate the convolution of the BEC matrix. These are generated using two model simulations initialized at different times but valid at the same time (e.g., Parrish & Derber, 1992; Wu et al., 2002). In previous studies of MODIS AOD assimilation (e.g., Liu et al., 2011; Saide et al., 2013; Tang et al., 2017), the two CTM simulations have differed only in terms of meteorological initialization, and the resulting background errors do not account for many important processes that can introduce large errors in regional air quality simulations (e.g., anthropogenic, biomass burning and biogenic emissions, chemical mechanisms, dry and wet deposition, and boundary conditions). Among these sources, anthropogenic emissions are one of the largest contributors to air quality. Therefore, we examine how incorporating anthropogenic emission uncertainties in the BEC matrix will affect CMAQ initialization and short-term PM2.5 predictions following assimilation of MODIS AOD retrievals. The manuscript is organized as follows. Section 2 provides details of the CMAQ model configuration, the GSI data assimilation system, and the MODIS AOD retrievals assimilated in this study. The observation data sets used for model evaluation are described in section 3. The experimental design, GSI adjustment of CMAQ AOD and aerosol chemical composition, and impact of AOD assimilation on CMAQ initialization, aerosol analyses, and 48-hr PM2.5 predictions are discussed in section 4. Results are summarized in section 5.

2 Materials and Methods

2.1 The CMAQ Configuration

This study uses the off-line version 5.1 of the CMAQ model (Byun & Schere, 2006) to simulate aerosol chemical composition, mass concentrations, and optical properties. We replicate the NAQFC domain settings by using the same model domain, map projection, and horizontal and vertical grid spacing to make our developments relevant to operations. The CMAQ domain is defined on a Lambert Conformal map projection centered at (40°N, 97°W) with Arakawa C-grid staggering and a horizontal grid spacing of 12 km in both the longitudinal and latitudinal directions (Figure 1). The domain has 442 grid points in the longitudinal direction, 265 grid points in the latitudinal direction, and 42 vertical levels extending from the surface to about 20 km.

Figure 1. The Weather Research and Forecasting (WRF) and Community Multiscale Air Quality (CMAQ) modeling domains with terrain elevation. The white solid lines mark the location of CMAQ domain boundaries.

The meteorological fields required to drive CMAQ are also simulated at 12-km grid spacing using the Weather Research and Forecasting (WRF) model (Skamarock et al., 2008), which uses a larger domain than CMAQ with 481 and 369 grid points in the longitudinal and latitudinal directions, respectively, and 43 vertical levels stretching from the surface to 50 hPa. The static geographical fields such as the terrain height, soil properties, vegetation fraction, land use/vegetation, albedo, and erodible land fraction are interpolated from the U.S. Geological Survey data to the WRF domain using the WRF Preprocessing System (WPS). The physical parameterizations used for the WRF model are listed in Table 1. The initial and boundary conditions for the meteorological fields are obtained from the 6-hourly North American Mesoscale 12-km analysis produced at the National Centers for Environmental Prediction. The WRF simulations use a time step of 15 s and save all the relevant meteorological parameters every hour; these hourly fields are then interpolated to the CMAQ domain using version 4.3 of the Meteorology-Chemistry Interface Processor (MCIP) to drive CMAQ. The vertical diffusion and mixing in CMAQ are represented using the Asymmetric Convective Method 2 (ACM2), and the advection and diffusion scheme follows Byun (1999).

Table 1. Selected WRF Atmospheric Physical Parameterizations Used in This Study
Atmospheric process | Parameterization
Cloud microphysics | WRF Single-Moment 6-class (Hong & Lim, 2006)
Long-wave radiation | Rapid Radiative Transfer Model (RRTM; Mlawer et al., 1997)
Short-wave radiation | Dudhia short-wave scheme (Dudhia, 1989)
Surface layer | MM5 similarity (Zhang & Anthes, 1982)
Land surface model | Unified Noah land surface model (Tewari et al., 2004)
Planetary boundary layer | Yonsei University (YSU; Hong et al., 2006)
Cumulus | Kain-Fritsch (Kain, 2004)
Note. WRF, Weather Research and Forecasting.

The gas-phase chemistry is represented by the Carbon Bond mechanism-2005 (CB-05) with an updated toluene chemistry (Whitten et al., 2010), whereas aerosol chemistry is represented using the AERO6 module. Aerosol processes in CMAQ are represented using three lognormal modes, namely, the Aitken, accumulation, and coarse modes (Binkowski & Roselle, 2003). AERO6 includes speciation of trace metals (Appel et al., 2013; Reff et al., 2009) and source-specific ratios of organic mass to organic carbon (Simon & Bhave, 2012). Inorganic aerosols in the Aitken and accumulation modes are assumed to be in thermodynamic equilibrium, calculated using version II of the ISORROPIA thermodynamic equilibrium module (Fountoukis & Nenes, 2007). The partitioning between the gas phase and coarse mode particles is treated dynamically following Kelly et al. (2010). The formation of secondary organic aerosol from various gas-phase precursors is calculated following Carlton et al. (2010).

Anthropogenic emissions of trace gases and aerosols are based on the EPA National Emission Inventory (NEI) for the year 2011. Biogenic emissions are represented using the Biogenic Emissions Inventory System (BEIS) version 3.13. Biomass burning emissions of aerosols and trace gases within the model domain are estimated using the U.S. Forest Service Bluesky Framework utilizing the National Oceanic and Atmospheric Administration (NOAA) Hazard Mapping System to geographically locate and estimate the strength of wildfires.

We generated 24-hr CMAQ forecasts of aerosols for the period of 10 July to 14 August 2014. The CMAQ forecast initialized on 10 July 2014 used idealized initial conditions for all the chemical species, and the initial conditions for all subsequent forecasts were taken from the previous day's CMAQ run. Similar to the NAQFC, lateral boundary conditions for CMAQ chemical fields are based on monthly median concentrations simulated by the Goddard Earth Observing System (GEOS)-Chem model (Lee et al., 2017; Tang et al., 2009). The CMAQ simulations use a chemistry time step of 6 min, and the output is saved every hour for further analysis.

2.2 Chemical Data Assimilation System

This study uses the 3DVAR scheme of the community GSI system (version 3.5) for assimilating the MODIS AOD retrievals in CMAQ. The 3DVAR scheme blends information from the observations and a model background to find an optimal analysis state by minimizing a near-quadratic cost function as defined in equation 1 following DTC (2016).
$$J(x) = \frac{1}{2}(x - x_b)^{T} B^{-1} (x - x_b) + \frac{1}{2}\left[H(x) - y\right]^{T} R^{-1} \left[H(x) - y\right] \tag{1}$$
where x represents the state vector that consists of aerosol chemical composition and meteorological variables required in the AOD calculation, x_b represents the a priori information about x and is commonly referred to as the background, B is the BEC matrix, H is the forward operator that transforms CMAQ aerosol chemical composition to AOD, y represents the MODIS AOD retrievals, and R is the observation error covariance matrix. The two terms on the right-hand side of equation 1 represent the deviation of the analysis state from the model background and from the observations, weighted by the background and observation errors, respectively. At the analysis point (x = x_a), the gradient ∇x J(x) becomes zero. The background and analysis fields are generated using the CMAQ configuration described in section 2.1, and the rest of the key components of the GSI system are discussed in the following sections.
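As a minimal illustration of how B and R shape the analysis (not the GSI implementation), the sketch below solves a two-variable, one-observation problem in closed form, assuming the standard quadratic cost function J(x) = ½(x − x_b)ᵀB⁻¹(x − x_b) + ½(Hx − y)ᵀR⁻¹(Hx − y) with a linear H. All numbers and function names are hypothetical.

```python
# Toy 3DVAR: two state variables, one observation, linear forward operator.
# All values are illustrative assumptions, not taken from the study.

def analysis(xb, B, H, y, r):
    """Return x_a = x_b + B H^T (H B H^T + r)^-1 (y - H x_b) for one observation."""
    # B H^T: projection of the background errors onto the observed quantity
    BHt = [B[0][0] * H[0] + B[0][1] * H[1],
           B[1][0] * H[0] + B[1][1] * H[1]]
    HBHt = H[0] * BHt[0] + H[1] * BHt[1]
    innovation = y - (H[0] * xb[0] + H[1] * xb[1])
    gain = [BHt[0] / (HBHt + r), BHt[1] / (HBHt + r)]
    return [xb[0] + gain[0] * innovation, xb[1] + gain[1] * innovation]

def gradient(x, xb, B, H, y, r):
    """Gradient of J: B^-1 (x - x_b) + H^T (H x - y) / r."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    Binv = [[B[1][1] / det, -B[0][1] / det],
            [-B[1][0] / det, B[0][0] / det]]
    dx = [x[0] - xb[0], x[1] - xb[1]]
    resid = (H[0] * x[0] + H[1] * x[1] - y) / r
    return [Binv[0][0] * dx[0] + Binv[0][1] * dx[1] + H[0] * resid,
            Binv[1][0] * dx[0] + Binv[1][1] * dx[1] + H[1] * resid]

xb = [4.0, 2.0]                 # background values at two grid points
B = [[4.0, 2.0], [2.0, 4.0]]    # background error covariance (correlated points)
H = [1.0, 0.0]                  # only the first point is observed
y, r = 6.0, 1.0                 # observation and its error variance

xa = analysis(xb, B, H, y, r)
grad = gradient(xa, xb, B, H, y, r)
```

With these numbers the innovation is 2, and the analysis is approximately (5.6, 2.8): the observed point moves most of the way toward the observation, and the unobserved point also receives an increment through the off-diagonal element of B. The gradient evaluated at the analysis is zero to rounding error.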

2.2.1 Control and State Variables

Two different approaches have been used to define control variables (variables that are adjusted by AOD assimilation) in previous MODIS AOD assimilation studies. The first approach defines individual aerosol species as control variables (e.g., Liu et al., 2011; Schwartz et al., 2012; Tang et al., 2017), and the second approach uses the total mass per aerosol size bin as control variables (e.g., Benedetti et al., 2009; Saide et al., 2013). We follow the second approach here because it reduces the number of control variables from 62 to 3, which in turn reduces the cost of both the BEC statistics calculation and the iterative optimization, and also inhibits the accumulation of changes in the aerosol species with the largest contribution to total aerosol mass. The three control variables are named AMASSI, AMASSJ, and AMASSK and represent the total aerosol mass for the Aitken, accumulation, and coarse modes, respectively. AOD assimilation generates analysis increments for these three control variables, which are then distributed to the individual aerosol chemical components in GSI according to the fractional contribution of each species to the total aerosol mass per mode in the model background state. All of the CMAQ aerosol species, the three control variables, and other variables (temperature, pressure, relative humidity, and grid thickness) that are required for the AOD calculation from CMAQ aerosol chemical composition constitute the state vector in the GSI.
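The distribution of a mode-total increment to individual species can be sketched as follows. This is an illustrative reading of the proportional split described above, not the GSI code; the species names, concentrations, and the handling of a zero-mass mode are assumptions.

```python
def distribute_increment(background_species, mode_increment):
    """Split a mode-total mass increment among species by their background share."""
    total = sum(background_species.values())
    if total <= 0.0:
        # No background mass in this mode: leave species unchanged (assumption)
        return dict(background_species)
    return {name: mass + mode_increment * mass / total
            for name, mass in background_species.items()}

# Hypothetical accumulation-mode composition (ug/m3); names are illustrative
background = {"sulfate": 3.0, "nitrate": 1.0, "organic_carbon": 4.0}
analysis = distribute_increment(background, mode_increment=2.0)
```

Because each species is scaled by the same factor, the relative composition of the mode is unchanged by the assimilation; only the total mass is adjusted.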

2.2.2 The BEC Matrix

We employed the National Meteorological Center (NMC; Parrish & Derber, 1992) method, as implemented in the community Generalized Background Error (GEN_BE) software, to calculate the BEC statistical parameters, similar to previous chemical data assimilation studies involving MODIS AOD retrievals (Liu et al., 2011; Saide et al., 2013; Schwartz et al., 2012). GEN_BE uses the difference between two forecasts valid at the same time but initialized at different times (00 Z and 06 Z forecasts in our case) to represent a sample of model background errors. The GEN_BE calculation involves the three stages listed below.
  • Stage 1: Calculate and store the differences between 30 pairs of CMAQ forecasts valid at the assimilation times (15 Z, 18 Z, and 21 Z). The 30 pairs correspond to daily forecasts generated for the period of 15 July to 14 August 2014.
  • Stage 2: Remove the temporal mean from the differences generated in Stage 1.
  • Stage 3: Calculate the statistical parameters, that is, the variances and the horizontal and vertical length scales, used to model the BECs. The horizontal length scale (HLS) is based on the ratio of the variance of a field (C) to the variance of its Laplacian using the following equation.
$$\mathrm{HLS} = \left[\frac{8\,\mathrm{Var}(C)}{\mathrm{Var}(\nabla^{2} C)}\right]^{1/4} \tag{2}$$
where C represents the aerosol mass concentrations here. The vertical length scale for each sigma level (l) is calculated using the following equation:
[Equation 3: vertical length scale at level l as a function of vcor[l] and vcor[l + 1]]   (3)
where vcor[l] and vcor[l + 1] represent the vertical error covariances for the levels l and l + 1, respectively. Further details regarding the derivation of these formulas to calculate horizontal and vertical length scale can be found in Wu et al. (2002).
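The three stages above can be sketched in a few lines. This is a simplified illustration rather than the GEN_BE code: it assumes the length-scale form HLS = [8 Var(C) / Var(∇²C)]^(1/4) commonly associated with Wu et al. (2002), a five-point Laplacian with periodic boundaries, and a population (not sample) variance. All function names and the synthetic field are hypothetical.

```python
import math

def nmc_variance(forecasts_a, forecasts_b):
    """Stages 1-3 for one grid point: difference paired forecasts, remove the
    temporal mean, and return the variance of the remaining perturbations."""
    diffs = [a - b for a, b in zip(forecasts_a, forecasts_b)]
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)

def laplacian(field, dx):
    """Five-point Laplacian with periodic boundaries (a simplifying assumption)."""
    ny, nx = len(field), len(field[0])
    return [[(field[j][(i + 1) % nx] + field[j][i - 1]
              + field[(j + 1) % ny][i] + field[j - 1][i]
              - 4.0 * field[j][i]) / dx ** 2
             for i in range(nx)] for j in range(ny)]

def variance(field):
    """Population variance over all points of a 2-D field."""
    values = [v for row in field for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def horizontal_length_scale(field, dx):
    """Assumed form: HLS = (8 Var(C) / Var(laplacian(C))) ** (1/4)."""
    return (8.0 * variance(field) / variance(laplacian(field, dx))) ** 0.25

# Synthetic perturbation field: a single sinusoidal pattern with wavenumber k
n, dx = 64, 1.0
k = 2.0 * math.pi * 2 / n
field = [[math.sin(k * i) * math.sin(k * j) for i in range(n)] for j in range(n)]
hls = horizontal_length_scale(field, dx)
```

For this sinusoidal field the continuous Laplacian is −2k² times the field, so under the assumed formula the length scale is 2^(1/4)/k (about 6 grid points here); the discrete estimate agrees to within about 1%.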

In this study, two 24-hr CMAQ forecasts initialized at 00 Z and 06 Z with meteorology input from two different WRF forecasts are generated every day from 15 July to 15 August of 2014. We select a 06 Z initialization time for the second simulation in order to reduce the effect of initial conditions on the CMAQ simulations at 15 Z, which corresponds to the first (Terra satellite) MODIS overpass of the day over the United States. These WRF forecasts are initialized 6 hr before the CMAQ forecasts, that is, at 18 Z of the previous day for the 00 Z CMAQ forecast and at 00 Z for the 06 Z CMAQ forecast. The first 6 hr of both WRF forecasts are discarded as model spin-up. The CMAQ forecasts valid at the MODIS (both Terra and Aqua) overpass times, that is, 15 Z, 18 Z, and 21 Z, are then fed to GEN_BE to calculate the BEC statistical parameters.

We generated two sets of the BEC statistical parameters. The first set is called MET_BE because the two CMAQ forecasts differed only in WRF meteorological input. The second set, called MET + EMIS_BE, is the same as the first except that we added a spatially varying perturbation factor to the NEI anthropogenic emissions for the 06 Z CMAQ forecast. This perturbation factor is estimated by comparing the NEI 2011 anthropogenic emission estimates with those available from four global emission inventories, namely, the Emissions Database for Global Atmospheric Research for Hemispheric Transport of Air Pollution (EDGAR-HTAP) version 2, Representative Concentration Pathway 8.5 (RCP8.5), PEGASOS (Pan-European Gas-Aerosols-Climate Interaction Study - Atmospheric Chemistry and Climate Change Interactions emission inventory), and ECLIPSE (Evaluating the Climate and Air Quality Impacts of Short-lived Pollutants) emission inventories. EDGAR-HTAP provides global emissions of air pollutants at 0.1° × 0.1° resolution, while RCP8.5, PEGASOS, and ECLIPSE provide the emissions at 0.5° × 0.5° resolution. All these emission inventories are mapped to the CMAQ domain using a mass-conserving anthropogenic emission preprocessor called anthro_emis (https://www2.acom.ucar.edu/wrf-chem/wrf-chem-tools-community). The comparison of NEI with these emission inventories is demonstrated with an example of anthropogenic primary organic carbon emissions over the CMAQ domain (Figure 2). In general, NEI emissions are on the low end compared to the global emission inventories. The emission perturbation factor is calculated by first subtracting the NEI emissions from each of the global emission inventories and then averaging the four differences. The spatial distribution of the perturbation factor closely resembles those of the RCP8.5, PEGASOS, and ECLIPSE emission inventories, which show similar spatial distributions to, but higher values than, EDGAR-HTAP. Similar differences are seen for other trace gases and aerosols. Maps of perturbation factors are therefore generated for all the species and added to the NEI anthropogenic emissions for the MET + EMIS_BE case. The inconsistencies in activity data sets (e.g., fuel consumption) and differences in emission factors used in emission estimation algorithms are mainly responsible for the large discrepancies among the current emission inventories (Granier et al., 2011).
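The perturbation factor calculation (subtract NEI from each global inventory, then average the differences) can be sketched per grid cell as below. The emission values are made up for illustration, and the inventories are assumed to have already been regridded to a common grid.

```python
def perturbation_factor(nei, global_inventories):
    """Mean of (inventory - NEI) over the global inventories, per grid cell."""
    n_inv = len(global_inventories)
    return [sum(inv[c] - nei[c] for inv in global_inventories) / n_inv
            for c in range(len(nei))]

# Hypothetical emissions for three grid cells (arbitrary units)
nei = [1.0, 2.0, 0.5]
others = {
    "EDGAR-HTAP": [1.5, 2.0, 0.5],
    "RCP8.5":     [2.0, 3.0, 1.0],
    "PEGASOS":    [2.0, 3.0, 1.0],
    "ECLIPSE":    [2.5, 4.0, 1.5],
}
factor = perturbation_factor(nei, list(others.values()))

# Perturbed emissions as used for the second forecast of each pair
perturbed = [e + f for e, f in zip(nei, factor)]
```

Note that the factor is additive: it has the units of the emissions and is added to, rather than multiplied by, the NEI values.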

Figure 2. Spatial distribution of anthropogenic primary organic carbon emissions over the Community Multiscale Air Quality (CMAQ) domain from five different inventories. The perturbation factor derived by comparing National Emission Inventory (NEI) with these emission inventories is shown in the bottom rightmost panel.

We examine the influence of including anthropogenic emission uncertainties in the calculation of BEC statistical parameters by comparing the accumulation mode aerosol (AMASSJ) variances for the MET_BE and MET + EMIS_BE cases (Figure 3). The variances for both cases show a similar vertical distribution, with the highest values near the surface decreasing with altitude. However, we notice a large increase in the variance in the lowest 20 model levels (i.e., up to about 3 km) with the inclusion of anthropogenic emission uncertainties. For instance, the variance increases from 0.5–1.5 μg/m³ to 3–10 μg/m³ below 1.5-km altitude at latitudes of 25–50°N. This is not surprising considering that emission perturbations are applied only in the lowest model layer, and we expect these perturbations to be well mixed within the planetary boundary layer (PBL). The variance above the 20th model level is similar in the MET_BE and MET + EMIS_BE cases. This is also expected because variability in the distribution of air pollutants in the free troposphere is often strongly driven by inflow from the domain boundaries (e.g., Kumar et al., 2015; Pfister et al., 2011), and our setup does not account for uncertainties in boundary conditions. The AMASSI standard deviation shows changes similar to the AMASSJ variance for the MET + EMIS_BE case. However, we see negligible changes in AMASSK variance between MET_BE and MET + EMIS_BE (Figure S1 in the supporting information), mainly because coarse mode aerosols are mostly emitted from natural sources (desert dust and sea salt). Therefore, accounting for uncertainties in anthropogenic emissions does not affect the AMASSK variances significantly. The variance values are the highest for AMASSJ in the MET + EMIS_BE experiment and for AMASSK in the MET_BE experiment.

Figure 3. Comparison of the AMASSJ Background Error (BE) variances for the (a) MET_BE and (b) MET + EMIS_BE cases.

GSI makes assumptions that smooth out the spatial variability of emissions: it assumes that the HLSs vary only with altitude and latitude (in 1° bins) and that the vertical length scales vary only with altitude. For calculation of the HLSs, aerosol concentrations from all the longitudes within 1° latitude bins are used in equation 2. Similarly, data from all longitudes and latitudes at every level are used to calculate the vertical length scales. The horizontal and vertical length scales are very similar between the MET_BE and MET + EMIS_BE cases, with values in the range of 1–3 grid points, likely because changes in the variances and gradients in MET + EMIS_BE relative to MET_BE cancel each other out.

2.2.3 MODIS AOD Retrievals and Observation Errors

This study uses the MODIS AOD from the National Aeronautics and Space Administration (NASA) Neural Network Retrieval (NNR; Randles et al., 2017), which provides observationally constrained AOD retrievals designed to better fit the Aerosol Robotic Network (AERONET) observations and thus to serve as an unbiased, assimilation-ready product. The assimilation of NNR is shown to reduce errors in PM2.5 and AOD simulations at more stations compared to the operational collection 5.1 MODIS dark-target retrievals (Saide et al., 2013). The operational GEOS-5 (Rienecker et al., 2008) aerosol assimilation system (Global Modeling and Assimilation Office, 2017) and the MERRA-2 reanalysis (Randles et al., 2017) also assimilate the NNR retrievals. The NNR uses slightly different predictors over the land and the ocean. Top of the atmosphere reflectance, cloud fraction (<85%), solar and sensor angles, glint, and GEOS-5 surface wind speeds are used as predictors for the ocean retrievals. For the land retrievals, the predictors include top of the atmosphere reflectance, cloud fraction (<85%), climatological albedo (only when it is lower than 0.25), and solar and sensor angles. The MODIS AOD retrieval is not used as a predictor in the NNR retrievals. Like the MODIS Level 2 operational retrievals, the NNR provides 550-nm AOD at 10-km resolution (Global Modeling and Assimilation Office, 2017). The observation errors are specified following Remer et al. (2005) as (0.03 + 0.05 * AOD) over the ocean and (0.05 + 0.15 * AOD) over the land. These observation errors have also been used in previous MODIS AOD assimilation experiments (e.g., Liu et al., 2011; Schwartz et al., 2012). The uncertainties in the forward operator design may also affect the observation error covariance matrix, as discussed later in section 4.1. MODIS overpasses corresponding to 15 Z, 18 Z, and 21 Z retrieve AOD over parts of the United States, but the spatial coverage is highest at 18 Z (Figure S2). A total of 14,882 MODIS AOD retrievals are assimilated into CMAQ during 15 July to 14 August 2014.
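The observation error specification quoted above is simple enough to state as a small function. The function name and the boolean surface flag are illustrative choices, not part of GSI.

```python
def aod_observation_error(aod, over_land):
    """Observation error: 0.05 + 0.15*AOD over land, 0.03 + 0.05*AOD over ocean."""
    if over_land:
        return 0.05 + 0.15 * aod
    return 0.03 + 0.05 * aod

# For the same retrieved AOD, land retrievals are assigned a larger error
land_error = aod_observation_error(0.2, over_land=True)
ocean_error = aod_observation_error(0.2, over_land=False)
```

For an AOD of 0.2, this gives an error of about 0.08 over land and 0.04 over the ocean, so land retrievals carry less weight in the analysis than ocean retrievals of the same magnitude.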

2.2.4 The Forward and Adjoint Operators

A simple forward operator based on the parameterization of Malm and Hand (2007) is developed to convert CMAQ aerosol chemical composition into AOD for a direct comparison with MODIS AOD retrievals. This parameterization calculates the AOD following equations 4 and 5 and is also used as one of the two visibility calculation methods in CMAQ.
$$\mathrm{AOD} = \sum_{i=1}^{N} \beta_{\mathrm{ext},i}\,\Delta z_i \tag{4}$$
[Equation 5: β_ext,i expressed as a sum of aerosol component mass concentrations weighted by their mass extinction efficiencies, with f(RH) and f(RH)SS multiplying the sulfate-nitrate-ammonium and sea-salt terms, respectively]   (5)
where i = 1, 2 … N represents the vertical layers in CMAQ, β_ext,i represents the extinction coefficient for layer i, Δz_i is the thickness of layer i, values in square brackets represent mass concentrations of different aerosol chemical compounds, and f(RH) and f(RH)SS represent relative humidity correction factors that account for hygroscopic growth of the sulfate-nitrate-ammonium and sea-salt aerosol components, respectively. f(RH) and f(RH)SS are determined from look-up tables, and their variations with RH are shown in Figure S3. Extinction due to other aerosol components is assumed to be invariant with RH.
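The structure of such a forward operator can be sketched as a layer extinction summed over the column. The mass extinction efficiencies, species names, and humidity factors below are placeholders chosen for illustration; they are not the Malm and Hand (2007) coefficients or the look-up tables used in the study.

```python
# Hypothetical mass extinction efficiencies (m2/g); placeholder values only.
EFFICIENCY = {"sulfate": 3.0, "organic": 4.0, "sea_salt": 1.5}

def layer_extinction(conc, f_rh, f_rh_ss):
    """Extinction coefficient (1/Mm) for one layer from concentrations in ug/m3."""
    ext = 0.0
    for species, mass in conc.items():
        if species == "sulfate":
            growth = f_rh        # hygroscopic growth of sulfate-type aerosol
        elif species == "sea_salt":
            growth = f_rh_ss     # separate growth factor for sea salt
        else:
            growth = 1.0         # other components assumed invariant with RH
        ext += EFFICIENCY[species] * growth * mass
    return ext

def column_aod(layers):
    """AOD = sum over layers of extinction (1/Mm -> 1/m) times thickness (m)."""
    return sum(layer_extinction(l["conc"], l["f_rh"], l["f_rh_ss"]) * 1e-6 * l["dz"]
               for l in layers)

# Two hypothetical layers: a humid boundary layer and a drier layer above
layers = [
    {"conc": {"sulfate": 2.0, "organic": 3.0, "sea_salt": 1.0},
     "f_rh": 2.0, "f_rh_ss": 2.0, "dz": 1000.0},
    {"conc": {"sulfate": 1.0, "organic": 1.0, "sea_salt": 0.0},
     "f_rh": 1.0, "f_rh_ss": 1.0, "dz": 2000.0},
]
aod = column_aod(layers)
```

For fixed humidity factors and layer thicknesses, this operator is linear in the mass concentrations, which is what makes its tangent linear and adjoint straightforward to derive and test.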
The tangent linear (TL) and adjoint (AD) codes of the forward operator are generated using the automatic differentiation tool TAPENADE (http://www-sop.inria.fr/tropics/tapenade.html). Because the forward operator is linear in the aerosol mass concentrations, it is its own TL. The resulting TL code is validated using the Taylor-Lagrange formula:
$$\lim_{h \to 0} \frac{\left[Q(C + h\,\delta C) - Q(C)\right]/h}{P(C)\,\delta C} = 1 \tag{6}$$
where P is the TL code to be tested against the forward operator code Q, C represents the CMAQ aerosol chemical composition, δC is a perturbation of C, and h is the perturbation factor, which is varied from 10⁻¹ to 10⁻⁹. The ratio of the finite difference derivative calculated using the forward operator code Q (numerator in equation 6) to the derivative calculated by the TL code (denominator in equation 6) is found to be 1 for all values of h.
The AD code also needs to be tested because it transforms the changes in J with respect to AOD back to changes with respect to aerosol mass concentrations. The AD code is tested using the following equation:
$$\langle P\,\delta C,\; P\,\delta C \rangle = \langle P^{*}(P\,\delta C),\; \delta C \rangle \tag{7}$$
where P* denotes the AD code and δC is a perturbation of the aerosol mass concentrations.

Equation 7 states that the inner product of the derivatives generated using the TL code (left-hand side) must be equal (within the limits of machine precision) to the inner product of the adjoint derivative and the original perturbation (right-hand side). This test was also performed successfully for h values ranging from 10⁻¹ to 10⁻⁹ and for all aerosol species.
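Both checks can be illustrated on a toy linear operator. The sketch below is not the TAPENADE-generated code; the weights, concentrations, and perturbation are made-up values, and the operator simply maps a concentration vector to a scalar AOD.

```python
# Hypothetical linear operator: AOD = sum_i W[i] * c[i], where W bundles the
# extinction efficiency, humidity factor, and layer thickness (made-up values).
W = [0.003, 0.004, 0.0015, 0.002]

def forward(c):
    """Q: concentrations -> AOD."""
    return sum(w * ci for w, ci in zip(W, c))

def tangent_linear(c, dc):
    """P: perturbation in concentrations -> perturbation in AOD."""
    return sum(w * dci for w, dci in zip(W, dc))

def adjoint(c, aod_bar):
    """Adjoint of P: sensitivity to AOD -> sensitivity to each concentration."""
    return [w * aod_bar for w in W]

c = [2.0, 3.0, 1.0, 0.5]        # base state
dc = [0.2, -0.1, 0.05, 0.3]     # perturbation direction

# Taylor test: finite-difference derivative divided by the TL derivative
ratios = []
for exponent in range(1, 10):
    h = 10.0 ** -exponent
    perturbed = [ci + h * dci for ci, dci in zip(c, dc)]
    finite_diff = (forward(perturbed) - forward(c)) / h
    ratios.append(finite_diff / tangent_linear(c, dc))

# Dot-product test: <P dc, P dc> should equal <P*(P dc), dc>
d_aod = tangent_linear(c, dc)
lhs = d_aod * d_aod
rhs = sum(a * dci for a, dci in zip(adjoint(c, d_aod), dc))
```

For a linear operator the Taylor ratio stays close to 1 at every h, with small deviations at the smallest h caused by floating-point cancellation in the finite difference; the two inner products agree to rounding error.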

2.2.5 Cost Function Minimization

The gradient of the cost function becomes zero at the minimum of the cost function. However, an analytical solution of ∇x J = 0 is not feasible because of the large size of B, and thus numerical approaches are required to minimize the cost function. GSI preconditions its cost function by defining a new variable z = B^−1 x. This eliminates the requirement to invert B in minimizing the cost function. ∇z J (= B ∇x J) and ∇x J are minimized simultaneously using an iterative conjugate gradient method. The minimization process is described in detail in DTC (2016). The convergence threshold for the GSI solution is set to 10^−9, and the maximum number of iterations allowed to reach this threshold is set to 50. We find that GSI reached convergence in 11–47 iterations for all the assimilation cases.
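The idea of preconditioning with z = B^−1 x can be illustrated with a small sketch. It assumes a quadratic cost function J(x) = ½ xᵀB⁻¹x + ½ (Hx − d)ᵀR⁻¹(Hx − d) for the increment x and solves ∇z J = B ∇x J = 0 with a textbook conjugate gradient loop. This is not the GSI minimization algorithm itself. The matrices, the innovations d, and the interpretation of the 10^−9 threshold as a relative residual norm are assumptions made for illustration.

```python
import numpy as np

def conjugate_gradient(apply_A, b, tol=1e-9, max_iter=50):
    """Solve A z = b for a symmetric positive definite A; returns (z, iterations used)."""
    z = np.zeros_like(b)
    r = b - apply_A(z)
    p = r.copy()
    rs_old = float(r @ r)
    b_norm = float(np.sqrt(b @ b))
    for it in range(1, max_iter + 1):
        Ap = apply_A(p)
        alpha = rs_old / float(p @ Ap)
        z = z + alpha * p
        r = r - alpha * Ap
        rs_new = float(r @ r)
        if np.sqrt(rs_new) <= tol * b_norm:
            return z, it
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return z, max_iter

# Hypothetical toy problem: 6 state variables, 3 observations.
n, m = 6, 3
idx = np.arange(n)
B = 0.5 * np.exp(-np.abs(idx[:, None] - idx[None, :]) / 2.0)  # background error covariance
rng = np.random.default_rng(1)
H = rng.random((m, n))              # linearized observation operator
R_inv = np.eye(m) / 0.04            # inverse observation error covariance
d = np.array([0.15, 0.10, 0.12])    # innovations (observation minus background)

def apply_A(z):
    """Apply (B + B H^T R^-1 H B) to z using only products with B, never B^-1."""
    x = B @ z
    return x + B @ (H.T @ (R_inv @ (H @ x)))

rhs = B @ (H.T @ (R_inv @ d))
z, n_iter = conjugate_gradient(apply_A, rhs)
x_analysis = B @ z                  # analysis increment in model space

# Direct solution of the same problem, for comparison only.
x_direct = B @ H.T @ np.linalg.solve(H @ B @ H.T + np.linalg.inv(R_inv), d)
print(n_iter, np.max(np.abs(x_analysis - x_direct)))
```

Working in z means every iteration needs only products with B, which is why the change of variable removes the need to invert B.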

2.3 Experimental Design

We conducted three CMAQ experiments to assess the impact of including anthropogenic emission uncertainties on CMAQ initialization and short-term PM2.5 predictions due to assimilation of MODIS AOD. The first background CMAQ experiment does not assimilate MODIS AOD retrievals and is thus named BKG. The other two experiments are named MET_BE and MET + EMIS_BE as they assimilate MODIS AOD using the corresponding two sets of BEC statistics described above. Figure 4 depicts the setup used every day to assimilate MODIS AOD in CMAQ for the period of 15 July to 14 August 2014. Each day, we use automated scripts to conduct four CMAQ runs and three GSI runs. The first CMAQ run starts at 00 Z and ends at 15 Z and is followed by the assimilation of MODIS AOD at 15 Z. The analysis state produced by the 15 Z GSI run provides initial conditions for the 15–18 Z CMAQ forecast. The procedure is repeated for the 18–21 Z and 21–24 Z CMAQ runs and the 18 Z and 21 Z GSI runs. The CMAQ output at 24 Z serves as the initial conditions for the next day's 00–15 Z CMAQ run.

Figure 4. Schematic of the Gridpoint Statistical Interpolation-Community Multiscale Air Quality (GSI-CMAQ) setup for assimilation of Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol optical depth (AOD) retrievals in CMAQ for a typical day.
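The daily cycle of four CMAQ runs and three GSI runs can be written down as a simple schedule. The sketch below only enumerates the steps. The function name and the tuple layout are hypothetical and do not reflect the automated scripts used in the study.

```python
def daily_schedule(assim_hours=(15, 18, 21), day_end=24):
    """Return the ordered list of (component, start_hour, end_hour) steps for one day."""
    steps = []
    start = 0
    for hour in assim_hours:
        steps.append(("CMAQ", start, hour))   # forecast up to the assimilation time
        steps.append(("GSI", hour, hour))     # AOD assimilation at that time
        start = hour                          # analysis initializes the next forecast
    steps.append(("CMAQ", start, day_end))    # final forecast; output seeds the next day
    return steps

schedule = daily_schedule()
for component, start, end in schedule:
    if component == "CMAQ":
        print(f"CMAQ forecast {start:02d} Z to {end:02d} Z")
    else:
        print(f"GSI analysis at {start:02d} Z")
```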

3 Evaluation Data Sets

CMAQ-simulated PM2.5 mass concentrations are evaluated against surface PM2.5 measurements obtained from the Air Quality System (AQS) data of the EPA. The AQS data contains all the PM2.5 measurements that EPA collects under the national ambient air monitoring program. Different tribal, state, and local agencies collect these data sets and perform several quality control tests before archival on the AQS data website (https://www3.epa.gov/ttn/amtic/quality.html). CMAQ aerosol chemical composition is first converted into PM2.5 concentrations that can be compared directly with the EPA measurements using the CMAQ combine utility, and then paired with the observed values in space and time using the CMAQ sitecmp utility. CMAQ PM2.5 concentrations are estimated using the sharp-cut PM2.5 inlet method (Jiang et al., 2006) that calculates the volume fraction of each mode (Aitken, Accumulation, and Coarse) below 2.5-μm diameter. While EPA measured PM2.5 at 1981 sites across the United States during July–August 2014, we considered only 659 sites in our analysis to ensure that all the measurement sites used in the evaluation had at least 50% data availability, that is, 384 hourly measurements during 15 July to 14 August 2014. Daily aerosol chemical composition measurements at 145 sites from the EPA Chemical Speciation Network are also used to evaluate the CMAQ aerosol chemical composition. Additionally, we have obtained the planetary boundary layer height (PBLH) derived for 75 sites within the model domain from the Integrated Global Radiosonde Archive (IGRA; Durre et al., 2006; Durre & Yin, 2008). These estimates are based on the radiosonde observations and are reported to have an uncertainty of a few 100 m (Seidel et al., 2010).

4 Results and Discussion

4.1 GSI Adjustment of CMAQ AOD and Aerosol Chemical Composition

The assimilated MODIS AOD retrievals collocated with CMAQ AOD for all three CMAQ experiments are saved in the GSI runs at all the assimilation times. The frequency distributions of collocated MODIS and CMAQ AOD for BKG, MET_BE, and MET + EMIS_BE cases are compared in Figure 5 (top panel). CMAQ AOD with and without assimilation significantly underestimates the MODIS AOD, but as expected, the assimilation brings the CMAQ AOD distribution closer to the MODIS AOD distribution. The average values of CMAQ and MODIS AOD along with one standard deviation in the average values are shown in Table 2, and the correlation coefficients and the mean biases (MBs) are shown in Figure 5 (bottom panel). For the MET_BE and MET + EMIS_BE experiments, the assimilation increases the correlation coefficient between the MODIS and CMAQ AOD by 0.20–0.22, and 0.39–0.54, respectively, and reduces the MB by 0.01–0.02 and 0.04–0.06, respectively. The bootstrap confidence intervals (5–95% limits of the distribution obtained by computing a specific statistical metric over a data set) show that the decreases in MB and the increase in correlation coefficient in both the assimilation experiments are statistically significant. The improvements in the MET + EMIS_BE are also statistically significant compared to the MET_BE experiment. To examine the robustness of improvement in AOD, we compared CMAQ simulated AOD for the BKG and MET + EMIS_BE experiment with the Multi-angle Imaging Spectroradiometer (MISR) Level 2 AOD retrievals over the model domain (Figure S4). MISR retrievals are not available for the period of 15 July to 1 August 2014, and thus, the comparison shown here is performed for 2 August to 14 August 2014. A total of 76,003 MISR retrievals are included in the comparison. The assimilation of MODIS AOD pushes the CMAQ AOD distribution closer to the MISR AOD distribution, but CMAQ still underestimates the MISR AOD similar to CMAQ-MODIS comparison (see Figure 5). 
Average MISR AOD is estimated to be 0.17 ± 0.13, and the corresponding CMAQ AOD values in the BKG and MET + EMIS_BE experiments are estimated to be 0.06 ± 0.06 and 0.09 ± 0.08, respectively. The correlation coefficient (r) between CMAQ and MISR AOD also improved from 0.28 in the BKG experiment to 0.49 in the MET + EMIS_BE experiment.
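The bootstrap confidence intervals (5–95% limits) used for the mean bias and correlation coefficient can be sketched as below. The sketch assumes paired resampling with replacement and 1,000 resamples. Neither choice is stated in the text, and the AOD values are synthetic.

```python
import numpy as np

def mean_bias(model, obs):
    return float(np.mean(model - obs))

def correlation(model, obs):
    return float(np.corrcoef(model, obs)[0, 1])

def bootstrap_interval(model, obs, metric, n_boot=1000, seed=0):
    """5th and 95th percentiles of a metric over paired bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(obs)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample model/observation pairs together
        stats[b] = metric(model[idx], obs[idx])
    lo, hi = np.percentile(stats, [5, 95])
    return float(lo), float(hi)

# Synthetic collocated AOD pairs (hypothetical, for illustration only).
rng = np.random.default_rng(42)
obs = rng.gamma(shape=2.0, scale=0.1, size=500)
model = 0.4 * obs + rng.normal(0.0, 0.02, size=500)

mb_point = mean_bias(model, obs)
mb_lo, mb_hi = bootstrap_interval(model, obs, mean_bias)
r_lo, r_hi = bootstrap_interval(model, obs, correlation)
print(f"MB = {mb_point:.3f} [{mb_lo:.3f}, {mb_hi:.3f}]")
print(f"r interval = [{r_lo:.3f}, {r_hi:.3f}]")
```

Two experiments can then be judged to differ significantly in a metric when their intervals do not overlap, which is one common reading of such intervals.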

Figure 5. Frequency distributions of collocated Moderate Resolution Imaging Spectroradiometer (MODIS) and Community Multiscale Air Quality (CMAQ) aerosol optical depth (AOD) at 550 nm over the model domain for all the CMAQ experiments and at all assimilation times (top panel). Histograms of the correlation coefficient and mean bias for the three CMAQ experiments (bottom panel). The vertical bars on the histograms show the bootstrap confidence intervals.
Table 2. Domain-Wide Statistical Comparison of Collocated CMAQ and MODIS AOD at 15 Z, 18 Z, and 21 Z for All the CMAQ Experiments
Time | MODIS           | CMAQ (BKG)  | CMAQ (MET_BE) | CMAQ (MET + EMIS_BE)
15 Z | 0.22 ± 0.17 (a) | 0.07 ± 0.05 | 0.09 ± 0.05   | 0.13 ± 0.07
18 Z | 0.19 ± 0.22     | 0.05 ± 0.05 | 0.07 ± 0.06   | 0.10 ± 0.08
21 Z | 0.16 ± 0.18     | 0.04 ± 0.03 | 0.05 ± 0.04   | 0.08 ± 0.05
(a) Mean ± standard deviation. AOD, aerosol optical depth; CMAQ, Community Multiscale Air Quality; MODIS, Moderate Resolution Imaging Spectroradiometer.

The increase in correlation coefficient in both the MET_BE and MET + EMIS_BE experiments is higher than that (0.03–0.06) reported in previous studies (e.g., McHenry et al., 2015; Schwartz et al., 2012) employing a variational data assimilation scheme over a domain similar to the one used here. Previous studies also reported a reduction in the MB of 0.03–0.07 (McHenry et al., 2015; Schwartz et al., 2012; Tang et al., 2017) that is higher than the MET_BE experiment but is comparable for the MET + EMIS_BE experiment. The assimilation of MODIS AOD using a different method called Optimal Interpolation (OI) produces larger increments in CMAQ AOD (e.g., Tang et al., 2017; Chai et al., 2017). Tang et al. (2017) attributed larger increments in the OI than the GSI to the use of stronger background errors in the OI. A similar behavior is found here, where larger departures from the background are found for MET + EMIS_BE compared to MET_BE as larger standard deviation values are used in MET + EMIS_BE (Figure 3) generating a closer fit to the observations being assimilated (Figure 5).

The GSI translates changes in CMAQ AOD to AMASSI, AMASSJ, and AMASSK because we are using total aerosol mass per mode as control variables. The analysis increments in the lowest model layer AMASSI, AMASSJ, and AMASSK due to the assimilation of MODIS AOD at 18 Z averaged over the whole study period for the MET_BE and MET + EMIS_BE experiments are shown in Figure 6. MODIS AOD assimilation mostly leads to positive increments in aerosol mass for all the modes with the highest increments in AMASSJ. Analysis increments in both AMASSI and AMASSJ are much larger for the MET + EMIS_BE experiment compared to MET_BE. For instance, AMASSI increments increase from 0.001–0.002 μg/m3 in the MET_BE experiment to more than 0.005 μg/m3 in MET + EMIS_BE over many parts of the domain. Similarly, the AMASSJ increment for MET + EMIS_BE is much larger (1–6 μg/m3) compared to the MET_BE experiment (<1 μg/m3), especially in Oregon, Washington, Idaho, Montana, North Dakota, South Dakota, Nebraska, Minnesota, Wisconsin, and the Great Lakes. The larger increments in the northwest contiguous United States in the MET + EMIS_BE experiment are also in contrast with Schwartz et al. (2012), who found little improvement in aerosol mass concentrations after assimilation of MODIS AOD. The differences in AMASSI and AMASSJ analysis increments between the MET_BE and MET + EMIS_BE experiments are higher for northern CONUS because of the larger difference between the background error variances for the two experiments at northern latitudes (see Figure 3). The AMASSK analysis increment for the MET + EMIS_BE experiment is smaller compared to the MET_BE experiment. This is because the background error standard deviation values are the highest for AMASSK in the MET_BE experiment, and thus, GSI produces larger analysis increments in the coarse mode. Similar spatial distributions of average analysis increments in AMASSI and AMASSJ are seen at 15 Z and 21 Z (Figures S5 and S6).

Figure 6. Analysis increments in surface layer AMASSI, AMASSJ, and AMASSK averaged over the whole study period for both the MET_BE and MET + EMIS_BE experiments at 18 Z. Note the different color scale used for AMASSI.

Errors in the forward operator can affect the magnitude of analysis increments. To gain some insight into the role of forward operator error, we perform a sensitivity experiment by increasing the observation error by 100% in the MET + EMIS_BE experiment. Our assumption of a 100% difference in AOD estimated using different forward operators is based on Tang et al. (2017), who compared AOD calculated using three different aerosol optical property calculations for 1 July 2011 18 UTC. They did not see a 100% difference everywhere over the NAQFC domain, and thus, our assumption might be viewed as an upper bound on the contribution of forward operator uncertainties to the observation error covariances. The new observation error is specified as 30% of AOD over land and 10% over the ocean. Figure S7 shows the effect of increasing the observation error on average analysis increments in aerosol mass concentrations for the Aitken (AMASSI), Accumulation (AMASSJ), and Coarse (AMASSK) modes at 18 Z. Increasing the observation error reduces the analysis increments in the MET + EMIS_BE experiment but does not affect their spatial distribution. Similar reductions are observed in the analysis increments at 15 Z and 21 Z. Average reductions in analysis increments due to the 100% increase in observation error are estimated to be 36–40% in both AMASSI and AMASSJ and 28–33% in AMASSK. However, the analysis increments in the MET + EMIS_BE experiment with the 100% increase in observation error are still much larger than those in the MET_BE experiment with the original observation errors.
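The way background and observation errors control the size of an analysis increment can be seen in a scalar illustration. For a single variable observed directly, the increment equals the innovation multiplied by the gain σb² / (σb² + σo²). The sketch assumes the errors enter as standard deviations. The numerical values are hypothetical and are not meant to reproduce the reductions reported above.

```python
def scalar_gain(sigma_b, sigma_o):
    """Weight given to the innovation in a one-variable analysis."""
    return sigma_b ** 2 / (sigma_b ** 2 + sigma_o ** 2)

innovation = 0.10  # hypothetical observation-minus-background AOD difference

# Hypothetical error settings: (background error, observation error).
cases = {
    "baseline": (0.10, 0.05),
    "observation error doubled": (0.10, 0.10),
    "background error doubled": (0.20, 0.05),
}

for name, (sigma_b, sigma_o) in cases.items():
    gain = scalar_gain(sigma_b, sigma_o)
    print(f"{name}: gain = {gain:.3f}, increment = {gain * innovation:.4f}")
```

Doubling the observation error lowers the gain, and a larger background error raises it. This is the same qualitative behavior as the smaller increments in the sensitivity experiment and the larger increments in MET + EMIS_BE relative to MET_BE.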

To understand how AOD changes are translated vertically, we compare vertical profiles of analysis increments averaged over the whole domain for the entire study period at 18 Z for the MET_BE and MET + EMIS_BE experiments with the corresponding absolute values of AMASSI, AMASSJ, and AMASSK in the BKG experiment (Figure 7). The analysis increments in AMASSI and AMASSJ peak between model levels 10 and 13 (i.e., model layers between 1 and 1.5 km) for both the MET_BE and MET + EMIS_BE experiments, while those in AMASSK peak at the surface. The analysis increments approach zero above model level 25 (i.e., ~7 km), which is likely because boundary condition uncertainties are not accounted for in our BEC matrix. The MET + EMIS_BE experiment produces larger analysis increments throughout the model atmosphere in AMASSI and AMASSJ, while MET_BE produces a larger increment in AMASSK at all vertical levels.

Figure 7. Vertical distribution of domain averaged analysis increments in AMASSI, AMASSJ, and AMASSK averaged over the whole study period for the MET_BE and MET + EMIS_BE experiments at 18 Z (top panel). The corresponding absolute values of AMASSI, AMASSJ, and AMASSK in the BKG experiment are also shown (bottom panel).

The final step in the GSI is to distribute the changes from AMASSI, AMASSJ, and AMASSK to the individual aerosol chemical components. To examine whether the AOD assimilation changes the aerosol chemical composition in the right direction, we compare CMAQ-simulated surface layer sulfate (SO4), nitrate (NO3), ammonium (NH4), organic carbon (OC), elemental carbon (EC), chloride (Cl), and total PM2.5 for the BKG, MET_BE, and MET + EMIS_BE experiments with the corresponding Chemical Speciation Network observations (Figure 8). Among all the components, the observations show that OC has the highest mass concentrations, followed by SO4, NH4, EC, NO3, and Cl. CMAQ reproduces this order except that it swaps the places of SO4 and OC. The BKG experiment underestimates the observed concentrations of all aerosol components. The AOD assimilation increases the concentrations of all aerosol components but pushes the SO4 concentrations slightly above the observations and still underestimates the mass concentrations of the other components. An analysis of the fractional contribution of different aerosol chemical components to the total PM2.5 mass concentrations shows that AOD assimilation has a very small impact on the fractional contribution of different species to PM2.5 (Figure 8, bottom panel). We notice a large overestimation of the SO4 fraction and an underestimation of the other PM2.5 fractions in the BKG experiment. This causes the data assimilation to apply the largest increments to SO4 because it is the dominant species in the BKG experiment, which in turn leads to overprediction of SO4 with AOD assimilation. The overestimation of SO4 could be related to the use of older anthropogenic emissions from NEI 2011. The degradation of some of the aerosol components after the AOD assimilation has also been reported for California (Saide et al., 2013) as well as for CONUS (McHenry et al., 2015).

Figure 8. (top left panel) Spatial distribution of the Chemical Speciation Network (CSN) sites used for evaluating the Community Multiscale Air Quality (CMAQ) aerosol chemical composition. (top right panel) Comparison of observed and model simulated aerosol chemical components averaged over the whole study period for all the three CMAQ experiments. Percentage contribution of different aerosol chemical components to PM2.5 mass concentrations in the observations and three CMAQ experiments are shown in the bottom panel. For both the model and observations, the contribution of other components is derived by subtracting the sum of SO4, NO3, NH4, OC, EC, and Cl from the total PM2.5 mass concentrations.

The underestimation of most of the aerosol components can be attributed to the underestimation of emissions as well as errors in model simulations of the PBLH. To gain some insights into the model's ability to simulate the PBLH, we compare the WRF-simulated PBLH (that is used in CMAQ) with the IGRA-derived PBLH estimates at 00 Z averaged over the period of 15 July to 14 August 2014 (Figure 9). Both the model and IGRA estimates show similar spatial distributions with lower PBLH over the eastern United States and higher over the western US. This is because 00 Z corresponds to evening (1900–2000 local time) in the eastern and central United States, and late afternoon (1700–1800 local time) in the western United States. Thus, this comparison allows us to evaluate two regimes of the PBL, viz., the fully developed PBL regime in the western United States and relatively shallower evening PBL regime in the eastern United States.

Figure 9. Comparison of the Weather Research and Forecasting (WRF) simulated PBLH used in CMAQ with the Integrated Global Radiosonde Archive (IGRA)-derived PBLH estimates at 00 Z averaged over the period of 15 July to 14 August 2014. The difference between mean CMAQ and IGRA-derived PBLH estimates is also shown.

The model significantly underestimates the IGRA PBLH in the eastern United States with biases as high as 1,000–1,500 m, indicating that the PBL collapses too early in the model. We see a mixed model performance in the central and western United States with the model overestimating the IGRA PBLH at some sites and underestimating it at others. However, the model biases are not as strong as they are in the eastern United States, indicating that the model performs better in capturing the fully developed PBL. If the emission estimates are correct, a shallower PBL in the model would lead to overestimation of modeled air pollution concentrations at the surface by mixing them into a smaller volume and vice versa. The shallower PBL might also affect the formation of secondary aerosol components by affecting the kinetics of their chemical production and dry deposition at night. However, as shown in the next section, CMAQ underestimates the PM2.5 mass concentrations throughout the day, suggesting that uncertainties in emission inventories are large enough to mask the variability in aerosol mass concentrations due to uncertainties in PBL mixing. Another potential source contributing to the underestimation of AOD as well as aerosol mass concentrations in CMAQ could be the lack of time-varying chemical boundary conditions. Furthermore, the representativeness errors resulting from the comparison of point observations located near a strong local source with grid-box averaged model values might also contribute to the model-observation discrepancy. While the model underestimates both the AOD and aerosol mass concentrations, the above discussion shows that data assimilation pushes the model in the right direction with performance similar to (or in some cases better than) that of previous AOD assimilation studies.

4.2 Effect of AOD Assimilation on Surface PM2.5

The collocated observed and CMAQ PM2.5 mass concentrations averaged over all the 659 sites at diurnal and daily time scales for all the three CMAQ experiments are compared in Figure 10. It is important to distinguish between the daily and diurnal scales here because we are assimilating only 1–2 MODIS AOD retrievals every day, and thus, we expect the data assimilation to improve model performance in capturing the day-to-day variability rather than the diurnal variability. At both the diurnal and daily scales, CMAQ simulations with and without assimilation significantly underestimate the observed PM2.5 mass concentrations, similar to the AOD. This behavior is in line with the previous studies where models continued to underestimate (e.g., McHenry et al., 2015; Schwartz et al., 2012) or overestimate (e.g., Saide et al., 2013) the PM2.5 mass concentrations even after assimilating MODIS AOD. However, the assimilation for both the MET_BE and MET + EMIS_BE experiments reduces the model bias with the MET + EMIS_BE experiment yielding larger improvements.

Figure 10. Geographic locations of the Environmental Protection Agency (EPA) PM2.5 monitoring sites used for evaluation of Community Multiscale Air Quality (CMAQ) simulated PM2.5 mass concentrations are shown in the top left panel. The comparisons of the observed and CMAQ simulated diurnal and daily variability of PM2.5 averaged over all the sites during 15 July to 14 August 2014 for all the three CMAQ experiments are shown in the top right and bottom panels, respectively. Standard deviations in the average observed values range from 4.8 to 11.9 μg/m3, and those in the CMAQ average values range from 2.7 to 7.5 μg/m3. Standard deviations are not plotted in the figure to maintain clarity.

In comparison to the observed diurnal variability, CMAQ misses the evening peak observed around 2000–2100 hr and the monotonic decrease from 2200 to 0300 hr. The correlation coefficients between the observed and CMAQ simulated diurnal PM2.5 cycles for the BKG, MET_BE, and MET + EMIS_BE cases are 0.33, 0.34, and 0.34, respectively. This indicates that assimilating temporally sparse data, that is, 1–2 MODIS AOD retrievals per day, has little impact on the model's ability to capture diurnal variability despite significantly reducing the model bias. In contrast, data assimilation significantly improves the correlation coefficient for the day-to-day variability as reflected by an increase in correlation coefficient from 0.48 in the BKG to 0.71 (~48% improvement) in MET_BE and to 0.80 (~67% improvement) in MET + EMIS_BE. The observed PM2.5 averaged over all the sites and at all times during 15 July to 14 August 2014 is estimated to be 9.7 ± 7.3 μg/m3, and the corresponding CMAQ averaged values for the BKG, MET_BE, and MET + EMIS_BE are estimated to be 4.5 ± 3.9 μg/m3, 5.0 ± 4.2 μg/m3, and 6.5 ± 5.4 μg/m3, respectively. Thus, assimilation of MODIS AOD into CMAQ reduces the MB in CMAQ surface PM2.5 from −5.2 μg/m3 in the BKG experiment to −4.7 μg/m3 (~10% reduction) in the MET_BE experiment and to −3.2 μg/m3 (~38% reduction) in the MET + EMIS_BE experiment. In comparison, the MB reduced by ~14% over CONUS in McHenry et al. (2015).
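The percentage figures quoted above follow directly from the reported means and correlation coefficients. The short calculation below reproduces them from the values in the text. Only the variable and function names are introduced here.

```python
obs_mean = 9.7  # observed PM2.5, μg/m3
model_means = {"BKG": 4.5, "MET_BE": 5.0, "MET + EMIS_BE": 6.5}       # μg/m3
daily_corr = {"BKG": 0.48, "MET_BE": 0.71, "MET + EMIS_BE": 0.80}

# Mean bias is the model mean minus the observed mean.
mean_bias = {name: round(value - obs_mean, 1) for name, value in model_means.items()}

def percent_improvement(reference, value):
    """Relative increase of a correlation coefficient with respect to BKG, in percent."""
    return 100.0 * (value - reference) / reference

def percent_bias_reduction(reference_mb, mb):
    """Relative decrease in the magnitude of the mean bias with respect to BKG, in percent."""
    return 100.0 * (abs(reference_mb) - abs(mb)) / abs(reference_mb)

for name in ("MET_BE", "MET + EMIS_BE"):
    corr_gain = percent_improvement(daily_corr["BKG"], daily_corr[name])
    mb_cut = percent_bias_reduction(mean_bias["BKG"], mean_bias[name])
    print(f"{name}: MB = {mean_bias[name]} μg/m3, "
          f"correlation +{corr_gain:.0f}%, MB reduction {mb_cut:.0f}%")
```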

In addition to the all-site comparison discussed above, we also evaluated how the assimilation of MODIS retrievals affects the model performance at each site in terms of the correlation coefficient, MB, and root-mean-square error (RMSE). The absolute values of these statistical parameters for the BKG, MET_BE, and MET + EMIS_BE cases are shown in Figure 11, and percentage improvements of the MET_BE and MET + EMIS_BE experiments relative to the BKG experiment are shown in Figure 12. Large variability in correlation coefficient values across the United States points toward heterogeneity in the model's ability to reproduce day-to-day variability in the observed PM2.5. The data assimilation improves the correlation coefficients at more than 80% of the sites across the United States with larger improvements in the MET + EMIS_BE than the MET_BE. A few sites in California, Oregon, Washington, Montana, Colorado, South Dakota, Texas, and Florida show a negative correlation coefficient in the BKG experiment. AOD assimilation turns the negative correlation coefficient values to positive at most of these sites except at a few locations in Texas, especially in the MET + EMIS_BE experiment. The correlation coefficient at most of the remaining sites shows values of 0.2–0.8 in the BKG experiment, which increase by more than 50% in many cases in both the MET_BE and MET + EMIS_BE experiments. The MB is largest in magnitude (more negative than −10 μg/m3) at sites in California, Oregon, Washington, Idaho, Montana, and Texas. The MB ranges from −2 to −8 μg/m3 at the majority of the remaining sites. The MET_BE experiment reduces the MB by less than 20% at a large portion of the sites, while MET + EMIS_BE reduces the MB by more than 30% at many sites with reductions as high as 50% in several parts of the eastern United States, California, Oregon, and Washington (Figure 12). The spatial distribution of the reduction in RMSE is similar to that of the MB with the MET + EMIS_BE leading to a larger reduction.
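The three per-site statistics used in this evaluation can be computed as in the sketch below. The function name and the sample values are hypothetical, and missing values are assumed to be encoded as NaN.

```python
import numpy as np

def site_statistics(model, obs):
    """Correlation coefficient, mean bias, and RMSE over pairs where both values exist."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = ~np.isnan(model) & ~np.isnan(obs)
    m, o = model[valid], obs[valid]
    diff = m - o
    return {
        "r": float(np.corrcoef(m, o)[0, 1]),
        "MB": float(np.mean(diff)),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
    }

# Hypothetical daily mean PM2.5 at one site (μg/m3); NaN marks a missing observation.
obs = [10.0, 12.0, np.nan, 8.0, 14.0]
model = [6.0, 7.0, 5.0, 5.0, 9.0]

stats = site_statistics(model, obs)
print(stats)
```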

Figure 11. Spatial distribution of correlation coefficient (CC), mean bias (MB), and root-mean-square error (RMSE) for all the three Community Multiscale Air Quality (CMAQ) experiments compared to the observed values at all the AirNOW sites used in this study.
Figure 12. Percentage improvement in correlation coefficient (CC) and reduction in mean bias (MB) and root-mean-square error (RMSE) in the MET_BE and MET + EMIS_BE experiments relative to the BKG experiment.

To summarize the statistical evaluation results, we calculated the statistical parameters for the BKG, MET_BE, and MET + EMIS_BE experiments for every state of the United States (Figure 13). The largest improvements in the correlation coefficient due to the assimilation of MODIS AOD retrievals are seen in California, Colorado, Florida, North Dakota, South Dakota, Utah, and Wyoming. There are some states (Connecticut, Idaho, Illinois, Indiana, Maine, Michigan, New Hampshire, New York, and Vermont) where the MET_BE experiment leads to slightly higher correlation coefficients than the MET + EMIS_BE experiment. In Delaware, Montana, New Jersey, and Rhode Island, we also notice a small reduction in the correlation coefficient in both the MET_BE and MET + EMIS_BE experiments relative to the BKG experiment.

Figure 13. State-wide variation in correlation coefficient (CC), mean bias (MB), and root-mean-square error (RMSE) for the BKG, MET_BE, and MET + EMIS_BE experiments. The numbers at the bottom of the MB plot represent the number of observation sites available in each state.

The MB decreases with the assimilation of MODIS AOD retrievals in all the states except New York and Wisconsin, where assimilation changes the negative MB to positive, particularly in the MET + EMIS_BE experiment. The negative-to-positive change of MB is also seen in Minnesota and Rhode Island, but the absolute magnitude of the MB is smaller with the assimilation of MODIS AOD. To understand the reason for this negative-to-positive transition in the MB, we analyzed the diurnal variations in observed and CMAQ simulated PM2.5 concentrations for New York and Wisconsin. We find that CMAQ agrees well with the observed PM2.5 values during the daytime but overestimates the nighttime observed PM2.5. The daytime increase in the CMAQ assimilation experiments is attributed to the GSI analysis increments applied to the CMAQ aerosol chemical composition to minimize the difference between CMAQ and MODIS AOD. The increments last in the model even 48 hr after the assimilation (see section 4.3 for details), which along with trapping of aerosols emitted in the shallow nighttime boundary layer leads to nighttime overestimation of observed PM2.5 values in CMAQ. The magnitude of the MB is reduced to less than 5 μg/m3 in most of the remaining states, but an MB magnitude exceeding 5 μg/m3 despite AOD assimilation is seen in California, Colorado, Idaho, Montana, Texas, and West Virginia. The RMSE values follow a pattern similar to the MB with reductions in all the states except New York, Wisconsin, Rhode Island, and Minnesota for the reason discussed above.

4.3 Impact of AOD Assimilation on 48-hr PM2.5 Forecasts

The CMAQ simulations presented in the previous section assimilated MODIS AOD every day at 15 Z, 18 Z, and 21 Z. Higher PM2.5 levels in the MET_BE and MET + EMIS_BE experiments compared to the BKG experiment from 21 Z through 00 Z to 15 Z (top right panel of Figure 10) demonstrate that the effect of improving initial conditions via AOD assimilation lasts for at least 18 hr. To examine the impact of AOD assimilation beyond that, we perform 48-hr forecasts starting from the analysis state at 21 Z every day from 15 July to 14 August 2014 for the MET + EMIS_BE and BKG experiments. These forecasts are averaged by lead time at all the sites for all the days and compared against the observations (Figure 14). Higher PM2.5 levels in the MET + EMIS_BE experiment compared to the BKG experiment throughout the 48 hr show that the effect of improving aerosol initialization via assimilation of MODIS AOD retrievals can last for 48 hr. However, the improvement decreases with time, with the first 24 hr showing the larger improvement. Similar effects of improving the initial conditions via assimilation of MODIS AOD retrievals have been reported previously (e.g., Saide et al., 2013; Schwartz et al., 2012). Note that we see higher improvements in simulated PM2.5 mass concentrations due to assimilation of MODIS AOD after the lead time of zero. This is likely because our 48-hr forecasts start at 21 Z, and in our simulations the model enters nighttime soon after the assimilation. Since we start with higher initial concentrations due to AOD assimilation, trapping of emissions within the shallower nighttime boundary layer further enhances the concentrations. This feature of the model is also noticed in the BKG experiment. Another potential contribution could come from enhanced advection to the observation sites because AOD assimilation increases the aerosol concentrations everywhere in the domain.
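Averaging the forecasts by lead time over all sites and days amounts to a mean over two axes of a three-dimensional array. The sketch below assumes an array layout of (forecast day, lead hour, site) with NaN for missing values. The layout, the function name, and the tiny sample array are hypothetical.

```python
import numpy as np

def mean_by_lead_time(forecasts):
    """Average over forecast days (axis 0) and sites (axis 2), ignoring NaN values."""
    return np.nanmean(forecasts, axis=(0, 2))

# Hypothetical sample: 2 forecast days, 3 lead hours, 2 sites (μg/m3).
forecasts = np.array([
    [[1.0, 3.0], [2.0, 4.0], [5.0, np.nan]],
    [[3.0, 5.0], [4.0, 6.0], [7.0, 9.0]],
])

lead_means = mean_by_lead_time(forecasts)
print(lead_means)
```

Applying the same function to the collocated observations gives the curve against which the forecast averages can be compared at each lead time.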

Figure 14. Evaluation of 48-hr Community Multiscale Air Quality (CMAQ) forecasts for the BKG and MET + EMIS_BE experiments against the AirNOW observations.

5 Conclusions

This study developed a new approach to assimilate MODIS AOD retrievals to improve initial conditions of CMAQ, which is used by the NAQFC at NOAA to produce operational air quality predictions. The uncertainties in anthropogenic emissions are accounted for in the BEC matrix for the first time in a 3DVAR framework. Anthropogenic emission uncertainties substantially increase the background error standard deviation for the Aitken and Accumulation mode aerosols. To assess the value of incorporating anthropogenic emission uncertainties in the data assimilation, two sets of the BEC matrix are designed, with the first including uncertainties only due to differences in meteorological initialization (MET_BE), and the second including uncertainties in both the meteorological initialization and the anthropogenic emissions (MET + EMIS_BE).

Three CMAQ experiments, viz., one background experiment without assimilation and two assimilation experiments ingesting two different BEC matrices, are conducted to understand the impact of including anthropogenic emission uncertainties on the assimilation of MODIS AOD retrievals and PM2.5 simulations. All the CMAQ experiments are conducted from 15 July to 14 August 2014 and are evaluated against EPA measurements of PM2.5 mass concentrations and aerosol chemical composition. The PBL height data set derived from radiosonde observations is also used to assess the model's ability to simulate the PBL. The CMAQ model without assimilation significantly underestimates the MODIS AOD retrievals as well as surface PM2.5 mass concentrations over the United States. Data assimilation pushes both the modeled AOD and surface PM2.5 distributions toward the observed distributions, but CMAQ still underestimates both the MODIS AOD and observed surface PM2.5. This behavior is in line with the previous studies assimilating MODIS AOD with the objective of improving surface PM2.5 mass concentrations (e.g., McHenry et al., 2015; Saide et al., 2013; Schwartz et al., 2012; Tang et al., 2017).

Model results show that accounting for uncertainties in anthropogenic emissions had a large impact on the quality of aerosol analyses. Averaged over CONUS, the assimilation of MODIS AOD improved the model's ability to simulate day-to-day variability in PM2.5 mass concentrations by ~48% in the MET_BE experiment and by ~67% in the MET + EMIS_BE experiment. The corresponding reductions in the MB are estimated to be ~10% and ~38% for the MET_BE and MET + EMIS_BE experiments, respectively. MODIS AOD assimilation improved the model performance at more than 80% of the AirNOW sites in terms of correlation coefficient, MB, and RMSE with MET + EMIS_BE yielding larger improvements. We also analyzed the model performance by state and found that assimilation improves model performance in all the U.S. states except New York and Wisconsin, where the background model was already closer to the observations. Finally, we show that improving aerosol initial conditions via assimilation of MODIS AOD retrievals can reduce biases in PM2.5 forecasts for at least 48 hr.

The improvements in PM2.5 predictions via assimilation of satellite AOD retrievals shown here suggest that data assimilation is ready to play the same fundamental role in operational air quality predictions as it plays in NWP. The geostationary satellites that will provide AOD information with much higher spatial and temporal resolution compared to the current polar-orbiting satellites are expected to improve the forecasting skill (Saide et al., 2014). This makes data assimilation an even more exciting prospect, especially for air quality management in data-void regions of the world. However, further research is also required to enhance the capabilities of data assimilation systems, particularly on improving the representation of the background errors by incorporating major sources of errors in air quality simulations, developing unified forward operators that can handle aerosol chemical composition produced by widely used aerosol models, developing flow-dependent background errors via hybrid data assimilation, incorporating forward model errors in error covariances, cross correlation of error covariances, and improving the accuracy of satellite AOD retrievals. Future studies should also explore feedback of chemical data assimilation on weather parameters and vice versa (e.g., Saide et al., 2012; Semane et al., 2009), assimilation of vertical distribution of aerosols retrieved by sensors such as CALIPSO, assimilation of multisatellite retrievals, and assimilation of aerosol chemical composition. We refer the reader to Bocquet et al. (2015) for a detailed discussion on current and future prospects of chemical data assimilation. Furthermore, data assimilation can be used not only to improve the initial conditions of air quality models but also to analyze model error characteristics with the goal of improving the representation of key atmospheric processes in atmospheric composition models, similar to the recent efforts in NWP (Lee, McQueen, et al., 2017).

Acknowledgments

We acknowledge the use of the anthro_emis tool provided by the Atmospheric Chemistry Observations and Modeling (ACOM) laboratory of NCAR. We gratefully acknowledge the funding from the NASA Applied Science Program (grant NNX15AH03G) for this study. We would like to acknowledge high-performance computing support from Cheyenne (doi:10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation. Data supporting the conclusions of this paper can be obtained here: https://doi.org/10.5281/zenodo.2563377. The National Center for Atmospheric Research is sponsored by the National Science Foundation. We thank the three anonymous reviewers for their constructive comments on the manuscript.