Volume 129, Issue 16 e2024JD040906
Research Article
Open Access

A Comparison of Regression Methods for Inferring Near-Surface NO2 With Satellite Data

Eliot J. Kim

Corresponding Author


Nelson Institute Center for Sustainability and the Global Environment (SAGE), University of Wisconsin-Madison, Madison, WI, USA

Now at Global Modeling and Assimilation Office, NASA Goddard Space Flight Center, Greenbelt, MD, USA

Correspondence to:

E. J. Kim,

[email protected]

Tracey Holloway

Nelson Institute Center for Sustainability and the Global Environment (SAGE), University of Wisconsin-Madison, Madison, WI, USA

Department of Atmospheric and Oceanic Science, University of Wisconsin-Madison, Madison, WI, USA

Ajinkya Kokandakar

Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA

Monica Harkey

Nelson Institute Center for Sustainability and the Global Environment (SAGE), University of Wisconsin-Madison, Madison, WI, USA

Stephanie Elkins

Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA

Daniel L. Goldberg

Department of Environmental and Occupational Health, George Washington University, Washington, DC, USA

Colleen Heck

Nelson Institute Center for Sustainability and the Global Environment (SAGE), University of Wisconsin-Madison, Madison, WI, USA

First published: 12 August 2024

Abstract

Nitrogen dioxide (NO2) is an atmospheric pollutant emitted from anthropogenic and natural sources. Human exposure to high NO2 concentrations causes cardiovascular and respiratory illnesses. The Environmental Protection Agency operates ground monitors across the U.S. that take hourly measurements of NO2 concentrations, providing precise measurements for assessing human pollution exposure but with sparse spatial distribution. Satellite-based instruments capture NO2 amounts through the atmospheric column with global coverage at regular spatial resolution, but do not directly measure surface NO2. This study compares regression methods using satellite NO2 data from the TROPOspheric Monitoring Instrument (TROPOMI) to estimate annual surface NO2 concentrations in varying geographic and land use settings across the continental U.S. We then apply the best-performing regression models to estimate surface NO2 at 0.01° by 0.01° resolution, and we term this estimate quasi-NO2 (qNO2). qNO2 agrees best with measurements at suburban sites (cross-validation (CV) R2 = 0.72) and away from major roads (CV R2 = 0.75). Among U.S. regions, qNO2 agrees best with measurements in the Midwest (CV R2 = 0.89) and agrees least in the Southwest (CV R2 = 0.65). To account for the non-Gaussian distribution of TROPOMI NO2, we apply data transforms, with the Anscombe transform yielding the highest agreement across the continental U.S. (CV R2 = 0.77). The interpretability, minimal computational cost, and health relevance of qNO2 facilitate use of satellite data in a wide range of air quality applications.

Key Points

  • We compare regression methods to estimate surface nitrogen dioxide concentrations at 0.01° resolution using satellite and land use data

  • Multivariate linear regression with Anscombe-transformed inputs has strongest agreement with surface nitrogen dioxide measurements

  • Regression methods provide accurate, low-bias concentration estimates with minimal computational and data requirements

Plain Language Summary

Nitrogen dioxide (NO2) is an air pollutant that causes cardiovascular and respiratory illnesses and reacts in the atmosphere to form other harmful pollutants. This necessitates accurate and reliable quantification of NO2 concentrations in the air. Ground monitors directly observe NO2 concentrations near the Earth's surface. However, monitors do not have sufficient spatial coverage to quantify NO2 at large scales. Satellite-based instruments capture NO2 amounts across the Earth at increasingly high spatial resolution. However, satellite instruments cannot directly observe surface NO2 concentrations. In this study, we compare regression methods for estimating surface NO2 over the continental U.S. using satellite data and auxiliary land-use variables. We find that, among the regression methods we investigated, multivariate regression models with transforms applied to inputs yield the highest agreement with surface NO2. We then use the regression models to quantify surface NO2 concentration across the U.S. at 0.01° by 0.01° spatial resolution. Our work leverages the precision of ground observations and the high resolution of satellite data to accurately quantify surface NO2. The interpretable, generalizable, and easily applicable methods used in our study will facilitate the use of satellite data for air quality and human health assessments.

1 Introduction

1.1 Background

Nitrogen dioxide (NO2) is an air pollutant with harmful impacts on human health. Exposure to high concentrations of NO2 is closely associated with hospital admissions and mortality for a range of respiratory and cardiovascular diseases (Mills et al., 2015). NO2 pollution accounts for a significant portion of asthma cases among children worldwide (Anenberg et al., 2022; Chowdhury et al., 2021). Given these health effects, NO2 is regulated by the United States Environmental Protection Agency (EPA) under the National Ambient Air Quality Standards (NAAQS), which requires the annual mean concentration of NO2 to remain below 53 parts per billion (ppb) in inhabited areas.

In addition to directly harming human health, NO2 acts as a reactant in the troposphere to form other harmful air pollutants. In the presence of volatile organic compounds and sunlight, NO2 reacts to form tropospheric ozone (O3) which in turn damages human health, increases mortality, and harms ecosystems (Ashmore, 2005; Jerrett et al., 2009; Sillman, 1999). NO2 also contributes to the formation of particulate nitrate (NO3), a component of fine particulate matter (PM2.5) which causes cardiovascular, respiratory, and birth-related illnesses and impairments (Behera & Sharma, 2012; Feng et al., 2016).

NO2 is emitted from both anthropogenic and natural sources, mainly through high temperature combustion from biomass burning and fossil fuels (M. Lee et al., 1997). Thus, NO2 serves as a tracer for air pollution from traffic, industrial sites, and other point sources. NO2 is therefore important for estimating emissions of greenhouse gases that are co-emitted during combustion, such as CO2 (Goldberg et al., 2019; Konovalov et al., 2016; Levy et al., 2014). Anthropogenic activity is the dominant source of NO2 in industrialized North America, Europe, and Asia (van der A et al., 2008). Natural sources of NO2 include soils and lightning (Olivier et al., 1998). Because NO2 has a relatively short lifetime of several hours, it remains concentrated near its source, resulting in distinct spatial gradients in concentration that are strongly correlated to emissions (L. N. Lamsal et al., 2011; Pommier, 2023). Thus, reliable quantification of NO2 concentration is critical for characterizing emissions from human activity and for measuring human air pollutant exposure in urban, roadside, and industrial areas with high NO2 concentrations.

The EPA maintains a national network of ground-based monitors that provide ambient air pollution data known as the Air Quality System (AQS). Although AQS monitors provide hourly measurements of NO2 concentrations, their sparse and irregular spatial distribution renders them insufficient for capturing the spatiotemporal variability of NO2 at regional and national scales. Ground monitors have limited usefulness for comprehensive assessments of human exposure to air pollution (Guay et al., 2011). Satellite data products provide global coverage of column NO2 on a high-resolution spatial grid, but generally have daily frequency as opposed to hourly ground measurements, and are limited to daylight hours. Satellite data offer the potential to bridge the spatial gaps in ground-based monitor data for capturing surface NO2 concentrations (Holloway et al., 2021). However, satellites do not directly measure NO2 at the surface and instead detect NO2 amounts through the atmospheric column with greater sensitivity to mid-tropospheric background NO2 (Dang et al., 2023). Since NO2 sources are concentrated at the surface, NO2 vertical column densities (VCD) measured by satellites have varying strengths of correlation with surface NO2 depending on spatiotemporal scale, season, region, and the characteristics of the surface and satellite data (Bechle et al., 2013; Goldberg et al., 2021; Griffin et al., 2019, 2021; Ialongo et al., 2020; Judd et al., 2020; Lamsal et al., 2008, 2015; H. J. Lee et al., 2023; Penn & Holloway, 2020; Pommier, 2023; Yu & Li, 2022; van der A et al., 2008).

The highest resolution global satellite NO2 data currently available come from the TROPOspheric Monitoring Instrument (TROPOMI) onboard the Sentinel-5 Precursor satellite, launched by the European Space Agency (ESA) in October 2017 (van Geffen et al., 2020; Veefkind et al., 2012). TROPOMI follows a lineage of remote sensing spectrometers including the Global Ozone Monitoring Experiment, the SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY), and the Ozone Monitoring Instrument (OMI). TROPOMI provides column NO2 data at a peak resolution of 3.5 km by 5.5 km at nadir, a significant improvement over the 13.0 km by 24.0 km peak resolution of the OMI NO2 data product (Veefkind et al., 2012). The smaller pixel size of TROPOMI enables an unprecedented scale of observation, such as distinguishing signals from individual sources within individual cities (Ialongo et al., 2020). By capturing the spatial heterogeneities in NO2 at a finer scale, TROPOMI provides opportunities for significant improvements in satellite-based quantification of surface NO2.

1.2 Literature Review

To best leverage the global coverage and high spatial resolution of satellite NO2 data, it is critical to investigate the agreement between column NO2 amounts and surface NO2 concentrations across varying spatiotemporal scales. Prior studies evaluating satellite-surface NO2 agreement have used four main methods: (a) chemical transport models (CTM), (b) machine learning, (c) spatial interpolation models, and (d) regression. The advantages and disadvantages of each method are detailed in Table 1.

Table 1. Comparison of Common Methods for Investigating Agreement Between Satellite-Derived and Surface Measurements of NO2
Model type | Advantages | Disadvantages | Examples in literature
Chemical transport | Provides vertical profiles of chemical concentrations; enables direct computation of ground-level concentration estimates | Requires extensive data and computational resources to run chemical transport models; CTM outputs have lower resolution | Bechle et al. (2013); Cooper et al. (2020); Gu et al. (2017); Lamsal et al. (2015)
Machine learning | Learns complex non-linear patterns from high dimensional data sets with minimal human supervision | Difficult to interpret; requires significant computational resources and expertise | Chan et al. (2021); Chi et al. (2022); Ghahremanloo et al. (2021); Grzybowski et al. (2023); Kim et al. (2021); Li et al. (2022); Qin et al. (2020); Yeganeh et al. (2018)
Spatial interpolation models | Direct encoding of spatial and/or temporal information | May struggle to generalize to spatial and/or temporal contexts beyond the training domain | Qin et al. (2017); Young et al. (2016)
Regression | Minimal data and computational requirements; easy to interpret | Cannot capture complex nonlinear relationships | Goldberg et al. (2021); Griffin et al. (2019); Henderson et al. (2007); Hoek et al. (2008); H. J. Lee et al. (2023); Novotny et al. (2011); Yu and Li (2022)

Vertical profiles of mixing ratios from CTMs have been used to derive surface NO2 concentrations from satellite data. Commonly used CTMs include the global three-dimensional Goddard Earth Observing System-Chemistry (GEOS-Chem) model and the regional-scale Community Multi-Scale Air Quality (CMAQ) model and Comprehensive Air Quality Model with extensions (CAMx) (Bechle et al., 2013; Lamsal et al., 2015). Cooper et al. (2020) applied GEOS-Chem vertical profiles to both OMI and TROPOMI column NO2 to correct for inaccuracies in vertical mixing assumptions in satellite products. Their work showed that TROPOMI-derived surface NO2 had lower variance and greater ability to capture emissions sources at high resolution than OMI-derived surface NO2. Gu et al. (2017) compared ground monitor NO2 in China with both unadjusted OMI NO2 and OMI surface NO2 derived using CMAQ NO2 profiles. Using the CMAQ-adjusted OMI NO2, they found correlation coefficients (R) that were greater by 0.03 for January 2014 and by 0.05 for July 2014. However, the use of CTMs in near-real-time requires meteorological reanalysis data and emissions inventories, significant computational resources, and additional re-gridding steps to accommodate the lower spatial resolution of models. In this study, we use TROPOMI column NO2 without CTM-based adjustments, to provide surface NO2 estimates with minimal computational burden.

Machine learning methods are well-suited for estimation and prediction problems with complex input data sets, as is the case for air quality estimation and forecasting. Several recent studies implement machine learning methods using TROPOMI column NO2 as well as meteorological and land use data inputs to estimate surface NO2 concentrations (Chi et al., 2021; Grzybowski et al., 2023; Li et al., 2022; Qin et al., 2020; Yeganeh et al., 2018). Ghahremanloo et al. (2021) trained convolutional neural networks to predict surface NO2 concentrations over Texas using TROPOMI column NO2, vegetation, land-use, and meteorological data as inputs. Their machine learning method (R = 0.91) had stronger agreement with surface NO2 than multiple linear regression (R = 0.77). Kim et al. (2021) used tree-based ensemble machine learning methods with TROPOMI NO2, land-use, meteorological, and topographic variables to predict hourly surface NO2 over Switzerland and northern Italy. Their model achieved R2 of 0.54 for monitors held-out from model training and R2 of 0.84 for all monitors. Chan et al. (2021) estimated surface NO2 concentrations over Germany for 2018 through 2020 at weekly to seasonal time-scales using artificial neural networks and TROPOMI NO2 reprojected to 0.5 by 0.5 km resolution, resulting in R2 of 0.64. These studies demonstrate that machine learning models can accurately estimate surface NO2 from large, multi-dimensional input data sets. However, the usability of machine learning models is limited by their significant computational demands and their inherent lack of interpretability. Here, we investigate the ability of regression models with column NO2 input to estimate surface NO2. Regression models have minimal computational demands and are straightforward to interpret, enabling a broad range of applications.

Spatial interpolation models including kriging and geographically and temporally weighted regression (GTWR) have been used to estimate ground-level NO2 concentrations. Kriging applied to NO2 ground monitors provides adequate performance in areas with clustered monitors, and incorporating satellite NO2 data improves prediction at locations far from monitors (Young et al., 2016). GTWR improved on ordinary least squares (OLS) regression for predicting ground-level NO2, with a cross-validation (CV) R2 of 0.60 for GTWR compared to 0.44 for OLS at a daily scale over central and eastern China (Qin et al., 2017). These spatial interpolation models provide accurate predictions of surface NO2 but require the inclusion of chemical transport model profiles, meteorological data, and other spatial information that are not readily available in a real-time prediction context.

Previous regression-based studies have shown strong agreement between surface measurements and TROPOMI column NO2. Griffin et al. (2019) compared TROPOMI NO2 and surface measurements in the Canadian Oil Sands and found an R2 of 0.67, demonstrating the improved capability of TROPOMI in capturing fine-scale surface NO2 variations compared to OMI. Yu and Li (2022) explored the agreement of TROPOMI NO2 with surface monitors in China's Xinjiang Province, finding a province-wide R2 of 0.78. Their work also explored meteorological and economic factors, using annual GDP of industry as a proxy for industrial activity. Goldberg et al. (2021) investigated the weekly and diurnal variability of TROPOMI NO2 as well as the impact of temperature. Their study found an R2 of 0.66 between annual-average EPA surface NO2 and TROPOMI column NO2 across the continental U.S.

Land use regression (LUR) studies incorporate land use and road data to estimate surface NO2. Early literature conducted seasonal to annual-scale measurement campaigns of surface NO2 to generate data for LUR. These studies improved on spatial interpolation methods while achieving particularly strong performance in urban areas with fine-scale gradients in NO2 concentrations (Beelen et al., 2013; Henderson et al., 2007; Hoek et al., 2008). Novotny et al. included OMI-derived surface NO2 as input to LUR models, resulting in a median R2 of 0.76 on the hold-out set of monitors over the continental U.S. (Novotny et al., 2011). Lee et al. (2023) used multivariate regression, land use data, and TROPOMI column NO2 to estimate NO2 across 89 monitor sites in California, attaining an R2 of 0.76. Further, Lee et al. estimated surface NO2 on a 500 m-resolution grid across California.

Here, we investigate TROPOMI NO2 to capture spatial heterogeneities in the distribution of ambient NO2 at the surface across the continental U.S. We compare regression methods for estimating surface NO2 concentrations in varied land use settings (urban/suburban/rural and highway proximity) and geographies (seven distinct U.S. regions). We then apply the regression models to provide a reliable metric of surface NO2 across CONUS. This metric provides an easily interpretable, high-resolution estimate of surface NO2 with minimal data and computational requirements. Recognizing the limitations of an annual average metric, we term this quantity “quasi-NO2” (qNO2 for short). We assess the performance of this metric on regional and national scales, investigate spatial patterns and potential causes of biases, and evaluate the applicability of qNO2 across different use cases. We anticipate qNO2, with its high spatial resolution and ease-of-use, will facilitate air quality and health impact assessments.

2 Methods

2.1 Surface Monitor Data

Hourly NO2 measurements over the U.S. were obtained from the EPA Air Quality System (AQS) for 2019 (US EPA, 2013). AQS monitors use a chemiluminescence method which measures the amount of NO that is converted from NO2 by a molybdenum oxide converter (Fontijn et al., 1970). Other oxidized nitrogen compounds such as nitric acid (HNO3) and peroxyacetyl nitrate (PAN) are also converted to NO by these converters, causing an overestimation of NO2 when there are high concentrations of HNO3 or PAN (Steinbacher et al., 2007). Interference is observed to be highest during afternoon hours for urban areas and in the summer season for rural areas (Dunlea et al., 2007; Steinbacher et al., 2007). This positive monitor bias is often corrected when monitor data are compared with satellite data (Cooper et al., 2020; Lamsal et al., 2015). Following the reasoning of Penn and Holloway (2020), and given the annual scale of our analysis, we do not apply a bias correction factor to the monitor data. EPA NO2 is used without bias corrections for many health impact studies and regulatory purposes, such as determining attainment of the NAAQS across the U.S. Several prior studies use EPA AQS NO2 data without bias correction (Goldberg et al., 2021; Jiang et al., 2018; Lu et al., 2015; Novotny et al., 2011; Qu et al., 2021). Further, obtaining HNO3 and PAN concentrations for each monitor location at daily or hourly time scales requires output from a chemical transport model, as done by Cooper et al. (2020), which is a computationally expensive task that would limit the accessibility of our methods. To remain consistent with the US air quality management community and to minimize the computational burden of our methods, we use the monitor data without bias correction.

EPA NO2 data were filtered to only include monitors for which at least 75% of 2019 hourly measurements were considered “valid” by EPA quality control checks. Then, for each of the remaining 402 monitors, all valid 2019 hourly measurements were averaged to obtain the final “ground-truth” data set. Our filtering method aligns with the criterion implemented in prior annual average NO2 studies (Novotny et al., 2011; Penn & Holloway, 2020).
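The completeness filter and annual averaging described above can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: the function name, the synthetic hourly values, and the choice to measure the 75% threshold against the number of hours in the array (8,760 for 2019) are all assumptions made for the example.

```python
import numpy as np

HOURS_IN_2019 = 8760  # 365 days * 24 h (2019 is not a leap year)

def annual_mean_if_complete(hourly_ppb, valid_mask, min_fraction=0.75):
    """Return the mean of the valid hourly values, or None when fewer than
    min_fraction of the hours in the record are flagged valid.
    (Hypothetical helper; the denominator choice is an assumption.)"""
    hourly_ppb = np.asarray(hourly_ppb, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    if valid_mask.sum() < min_fraction * hourly_ppb.size:
        return None
    return float(hourly_ppb[valid_mask].mean())

# Synthetic monitor record: one value per hour of 2019 (illustrative only).
rng = np.random.default_rng(0)
values = rng.uniform(2.0, 30.0, HOURS_IN_2019)

# Monitor A: 4 of every 5 hours valid (80%), so it passes the filter.
valid_a = np.arange(HOURS_IN_2019) % 5 != 0
kept = annual_mean_if_complete(values, valid_a)

# Monitor B: every other hour valid (50%), so it is excluded.
valid_b = np.arange(HOURS_IN_2019) % 2 == 0
dropped = annual_mean_if_complete(values, valid_b)
```

Only monitors for which the helper returns a value would contribute an annual mean to the regression data set.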

We use two monitor classifications provided by the EPA as input variables for regression modeling: “location setting,” which consists of urban (n = 152), suburban (n = 146), and rural (n = 104) classes, and “road proximity,” which has non-near-road (n = 333) and near-road (n = 69) classes. We use the term “location setting” rather than “land use” because our classification scheme is more general than traditional land use data sets. Near-road monitors are located near highways in metropolitan areas. 57% of these monitors are within 20 m of a highway and 89% are within 30 m (Watkins, 2016). Our data set includes 49 near-road monitors in urban areas, 20 near-road monitors in suburban areas, and no near-road monitors in rural areas.

2.2 Satellite Data

We use 2019 annual average TROPOMI (version 2.3.1) column NO2 as an input variable for regression models. Our primary motivation for conducting analysis at the annual-average scale is to align with the metrics used by the air quality management community. Further, the annual-average scale mitigates the impact of missing pixels in satellite data and enables re-gridding to a higher spatial resolution. However, the annual-average scale smooths over significant diurnal and seasonal NO2 variability. We further discuss the applicability of annual-average results in Section 4. TROPOMI measures the slant column density (SCD) using a differential optical absorption spectroscopy technique, and the column is then separated into stratospheric and tropospheric components. Air mass factors (AMFs) are then used to convert the SCDs into vertical column densities (VCDs) (van Geffen et al., 2020). Current AMFs are subject to uncertainty and may be a partial cause of low bias in column NO2 observations in urban areas (Judd et al., 2020). The highest resolution of TROPOMI is 3.5 km by 5.5 km at nadir (resolution increased from 3.5 km by 7.0 km on 6 August 2019).
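As a schematic of the slant-to-vertical conversion, the relation commonly used in DOAS-type retrievals can be written as below. The symbols are illustrative and are not taken from this paper: $N_s$ is the total slant column, $N_s^{\text{strat}}$ its stratospheric part, and $M^{\text{trop}}$ the tropospheric air mass factor.

```latex
N_v^{\text{trop}} = \frac{N_s - N_s^{\text{strat}}}{M^{\text{trop}}}
```

Under this relation, an AMF that is too large for a polluted urban scene yields a tropospheric vertical column that is too small, which is one way AMF uncertainty can contribute to the low bias noted above.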

TROPOMI has an approximate overpass time of 1:30 p.m. local time (Veefkind et al., 2012). We compared satellite-surface agreement using surface measurements averaged across the full 24 hr and averaged only for 1–2 p.m. to align with TROPOMI overpass time. Prior studies investigating annual-average satellite-surface NO2 agreement have used both 24-hr surface averages (Lee et al., 2023; Young et al., 2016) and overpass time averages (Chi et al., 2022). We found surface measurements averaged for 1–2 p.m. and for the full 24 hr are strongly correlated (R = 0.93). We also found that simple linear regression fit on TROPOMI NO2 (methods detailed in Section 2.4) better estimates daily average surface NO2 (R2 = 0.55) than 1–2 p.m. average surface NO2 (R2 = 0.45). Further, as NO2 concentrations tend to be lower during the mid-day satellite overpass times and greater during nighttime (Penn & Holloway, 2020), 24-hr average concentrations are a more relevant metric for NO2 exposure. Thus, we use 24-hr average surface measurements in our work, with the aim of providing the most relevant metric for exposure assessments and regulatory work. We use the method used in Goldberg et al. (2021) to re-grid TROPOMI NO2 to a 0.01° by 0.01° grid (approximately 1 km by 1 km). Figure 1 shows the re-gridded TROPOMI NO2 data used in this study.

Figure 1. 2019 annual average TROPOMI NO2 gridded at 0.01° by 0.01° resolution across the continental United States.

2.3 Road and Location Setting Data

To characterize surface NO2 concentration across the full domain, we applied our regression models to each TROPOMI NO2 CONUS grid cell. To apply the regression models, we classified each grid cell by road proximity and location setting, as defined in Section 2.1.

We use road data from the U.S. Census Bureau TIGER/Line Primary Roads data set. All EPA NO2 monitors classified as “near-road” are within the same TROPOMI grid cell as a Census Bureau “primary road.” Thus, to create the near-road data set for the full 0.01° by 0.01° CONUS grid, TROPOMI NO2 grid cells overlapping with any segment of a TIGER/Line “primary road” were classified as “near-road.” All remaining grid cells were classified as “non-near-road.” We used ArcGIS Pro 3.0 to re-grid the TIGER/Line data onto the TROPOMI NO2 grid. Figure S1 in Supporting Information S1 shows primary roads on the 0.01° by 0.01° CONUS grid along with near-road EPA monitors.

We determined the location setting classification of each TROPOMI grid cell based on the National Center for Education Statistics (NCES) Education Demographic and Geographic Estimates locale classification. The NCES data set provides boundaries for four categories of locales across the U.S.: City, Suburban, Town, and Rural. We used ArcGIS Pro 3.0 to determine the locale class with the most area covered in each TROPOMI grid cell. To align NCES and EPA classifications when applying our regression models across the full CONUS grid, we classified NCES “City” grid cells as EPA “Urban,” NCES “Suburban” as EPA “Suburban,” and NCES “Rural” and “Town” as EPA “Rural.” Figure S2 in Supporting Information S1 shows the location setting classifications for each grid cell. Text S1 in Supporting Information S1 details the agreement between NCES and EPA location setting classifications, which are displayed in Figure S3 in Supporting Information S1.
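The locale-to-setting assignment can be illustrated with a short sketch. The study performed this step in ArcGIS Pro; the Python below only mirrors the logic described (dominant locale by covered area, then the NCES-to-EPA mapping). The function name and the example area fractions are hypothetical.

```python
# Mapping from NCES locale classes to EPA location settings, as described above.
NCES_TO_EPA = {
    "City": "Urban",
    "Suburban": "Suburban",
    "Town": "Rural",
    "Rural": "Rural",
}

def classify_cell(area_by_locale):
    """Return the EPA location setting for one grid cell.

    area_by_locale maps each NCES locale present in the cell to the area
    (or area fraction) it covers; the locale with the largest area wins.
    (Hypothetical helper for illustration.)"""
    dominant = max(area_by_locale, key=area_by_locale.get)
    return NCES_TO_EPA[dominant]

# Example cell: half covered by "Town", so it is treated as EPA "Rural".
example_setting = classify_cell({"City": 0.2, "Suburban": 0.3, "Town": 0.5})
```

How ties between locale classes were resolved is not stated in the text; this sketch simply returns whichever tied class appears first in the dictionary.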

We use the three-class land use classification instead of a more complex land use data set to maximize reproducibility of our methods to other regions and time periods. Using a more detailed land use data set would result in a small number of monitors for each land use category in the data set, which would limit the generalizability of the regression models. Further, we opt to use land use categories in lieu of comprehensive emission inventory data for two main reasons. First, emissions data would have a strong correlation with the TROPOMI data and thus prevent direct evaluation of satellite column data for quantifying surface emissions. Second, as with complex land use data sets, obtaining and processing comprehensive emissions inventory data necessitates significant computational resources, which may inhibit straightforward application of satellite data. Thus, we choose the NCES classifications as a sufficiently simple yet informative input variable to distinguish the impact of land use on column-surface agreement.

2.4 Regression Methods

We fit simple (SLR) and multivariate linear regression (MLR) models to evaluate the relationship between surface monitor NO2 and TROPOMI column NO2. SLR has one input variable (TROPOMI NO2), whereas MLR in our work has three (TROPOMI NO2, location setting, and road proximity). The two differ only in the number of input variables; both are fitted by minimizing the sum of squared residuals (linear least squares). The output of SLR is termed qNO2 SLR and the output of MLR is termed qNO2 MLR. Through this analysis, we aim to (a) understand under which conditions NO2 satellite data best represents surface NO2 concentrations and (b) compare the performance of different satellite NO2-based regression methods for estimating surface NO2. The TROPOMI NO2 measurements have a Poisson-like distribution (Figure S4 in Supporting Information S1). Thus, to better satisfy the normality and constant variance assumptions of linear regression, we fit additional regression models with the log transform and Anscombe transform applied to TROPOMI NO2 inputs. Equation 1 gives the Anscombe transform for a positive real number x.
$a(x)=2\sqrt{x+\frac{3}{8}}$ (1)

The distribution for the log and Anscombe transformed-TROPOMI NO2 has greater symmetry than the non-transformed distribution (Figure S4 in Supporting Information S1). The Anscombe transform ensures transformed values remain positive, whereas log-transformed values may be negative and thus result in negative regression outputs (Anscombe, 1948). The outputs of MLR with log transform of TROPOMI NO2 are termed qNO2 logMLR, and the outputs of MLR with Anscombe transform of TROPOMI NO2 are termed qNO2 anscMLR.
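The two transforms can be compared directly with a few lines of NumPy. This is a minimal sketch: the function names and the sample column values are made up for illustration, and their units are arbitrary.

```python
import numpy as np

def anscombe(x):
    """Anscombe transform from Equation 1: a(x) = 2 * sqrt(x + 3/8)."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

def log_transform(x):
    """Natural-log transform; negative for inputs below 1."""
    return np.log(np.asarray(x, dtype=float))

# Hypothetical column NO2 values (arbitrary units), including one below 1.
columns = np.array([0.5, 1.0, 4.0, 16.0])

ansc_values = anscombe(columns)      # all strictly positive
log_values = log_transform(columns)  # first entry is negative
```

For any non-negative input the Anscombe transform is bounded below by 2√(3/8) ≈ 1.22, whereas the log transform changes sign at x = 1.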

While the resolution of TROPOMI NO2 is much finer than previous NO2 satellite data products, we still expect the kilometer-scale data to be insufficient to capture emissions near individual major roads, which can have sharp decay gradients over hundreds of meters (Kimbrough et al., 2017). Thus, we separately conduct simple linear regression on near-road and non-near road monitors. To further compare TROPOMI NO2 performance over different location settings, we also conduct simple linear regressions on each location setting class: urban, suburban, and rural.
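Fitting a separate simple linear regression for each monitor class can be sketched as follows. The helper name and the toy data are hypothetical, and `numpy.polyfit` is used here only as one convenient least-squares line fit, not necessarily the tool used in the study.

```python
import numpy as np

def fit_slr_by_class(x, y, labels):
    """Fit one least-squares line per class.

    Returns {label: (slope, intercept)}. (Hypothetical helper.)"""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    fits = {}
    for lab in np.unique(labels):
        mask = labels == lab
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
        fits[str(lab)] = (float(slope), float(intercept))
    return fits

# Toy data: each class follows its own exact line (values are illustrative).
column_no2 = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
road_class = np.array(["near-road"] * 4 + ["non-near-road"] * 4)
surface_no2 = np.where(road_class == "near-road",
                       3.0 * column_no2 + 10.0,   # near-road line
                       2.0 * column_no2 + 1.0)    # non-near-road line

fits = fit_slr_by_class(column_no2, surface_no2, road_class)
```

The same helper applies unchanged to the urban, suburban, and rural location setting classes.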

We fit multivariate regression models with three input variables: TROPOMI column NO2 concentration (no transform, log transform, Anscombe transform), road proximity, and location setting. Road proximity is a binary variable representing EPA monitor “near-road” and “non-near-road” classification. Location setting is a categorical variable with three levels corresponding to EPA monitor classification: urban, suburban, and rural. Multivariate regression provides a single interpretable model for calculating qNO2 across the U.S., facilitating interpretation and application by stakeholders.
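A minimal sketch of such a model is shown below, using ordinary least squares on a design matrix with dummy-coded categorical inputs. Several choices are assumptions made for illustration and are not stated in the text: rural is used as the reference level, the Anscombe-transformed column is used as the continuous input, and all data and coefficient values are synthetic.

```python
import numpy as np

def design_matrix(tropomi, near_road, setting):
    """Columns: intercept, Anscombe-transformed TROPOMI NO2, near-road flag,
    suburban flag, urban flag (rural is the assumed reference level)."""
    tropomi = np.asarray(tropomi, dtype=float)
    near_road = np.asarray(near_road, dtype=float)
    setting = np.asarray(setting)
    return np.column_stack([
        np.ones_like(tropomi),
        2.0 * np.sqrt(tropomi + 3.0 / 8.0),
        near_road,
        (setting == "suburban").astype(float),
        (setting == "urban").astype(float),
    ])

# Synthetic monitors (illustrative values only).
rng = np.random.default_rng(42)
n = 200
tropomi = rng.gamma(shape=2.0, scale=1.5, size=n)
near_road = rng.integers(0, 2, size=n)
setting = rng.choice(["rural", "suburban", "urban"], size=n)

# Synthetic "true" coefficients used only to generate the example response.
true_beta = np.array([1.0, 2.0, 4.0, 1.5, 3.0])
X = design_matrix(tropomi, near_road, setting)
surface = X @ true_beta + rng.normal(0.0, 0.5, size=n)

# Linear least squares fit and the resulting fitted values.
beta_hat, *_ = np.linalg.lstsq(X, surface, rcond=None)
qno2 = X @ beta_hat
```

With this coding, the near-road coefficient is read as the shift in estimated surface NO2 for a near-road site relative to a non-near-road site with the same column NO2 and location setting.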

2.5 Evaluation Methods

We evaluate regression model performance using four metrics: coefficient of determination (R2), root mean squared error (RMSE), mean fractional error (MFE), and mean fractional bias (MFB).
$\text{RMSE}=\sqrt{\frac{\sum_{i=1}^{N}\left(\text{qNO}_2[i]-\text{EPA NO}_2[i]\right)^{2}}{N}}$ (2)
$\text{MFB}=\frac{1}{N}\sum_{i=1}^{N}\frac{\text{qNO}_2[i]-\text{EPA NO}_2[i]}{\left(\text{qNO}_2[i]+\text{EPA NO}_2[i]\right)/2}$ (3)
$\text{MFE}=\frac{1}{N}\sum_{i=1}^{N}\frac{\left|\text{qNO}_2[i]-\text{EPA NO}_2[i]\right|}{\left(\text{qNO}_2[i]+\text{EPA NO}_2[i]\right)/2}$ (4)

R2 is the proportion of variance in the output that is captured by the model. RMSE is a metric of absolute error with the same units as the model output (ppb), calculated using Equation 2. MFB (Equation 3) is a unitless metric of relative bias. For instance, MFB = 0.67 indicates that the model output is an overestimate of observed values by a factor of 2. MFB = 0.4 indicates that the model output is an overestimate by a factor of 1.5. MFE (Equation 4) is a unitless metric of relative error. MFB and MFE are used to measure the relative performance of qNO2, enabling comparison between models fit on different classes and regions.
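The four metrics can be sketched in NumPy as follows. Two assumptions are made here: MFB and MFE are computed as means of per-site fractional differences, which is the form that reproduces the worked values quoted above, and R2 is computed as one minus the ratio of residual to total sum of squares. The function names and example values are illustrative.

```python
import numpy as np

def r2(pred, obs):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(pred, obs):
    """Root mean squared error, in the units of the inputs (ppb)."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mfb(pred, obs):
    """Mean fractional bias: mean of (pred - obs) / ((pred + obs) / 2)."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(np.mean((pred - obs) / ((pred + obs) / 2.0)))

def mfe(pred, obs):
    """Mean fractional error: mean of |pred - obs| / ((pred + obs) / 2)."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(np.mean(np.abs(pred - obs) / ((pred + obs) / 2.0)))

# Hypothetical monitor values in ppb.
epa = np.array([5.0, 10.0, 20.0])

mfb_double = mfb(2.0 * epa, epa)   # overestimate by a factor of 2   -> 2/3
mfb_1p5 = mfb(1.5 * epa, epa)      # overestimate by a factor of 1.5 -> 0.4
```

Because each site's difference is normalized by that site's own mean of estimate and observation, a uniform factor-of-2 overestimate gives 1/1.5 ≈ 0.67 regardless of the concentration level.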

The multivariate regression models were evaluated for seven distinct regions of the continental U.S.: Northeast, Southeast, Midwest, Rockies, Southwest, Northwest, and Southern California. These regions were selected due to their distinct topographical and meteorological conditions, and to ensure a similar number of EPA monitors (n = 51–63) in each region. Figure S5 in Supporting Information S1 shows region divisions and monitor locations.

We implement random and spatial CV methods to assess the generalization ability of the multivariate regression models. Generalizability is an important factor for the utilization of qNO2 as a metric for surface NO2 in spatial and temporal domains beyond those evaluated in this work. We conduct random CV using k-fold and Monte Carlo. We also use each of the seven regions as CV “folds” to conduct spatial CV. Cross-validation experiments are described in greater detail in Text S2 in Supporting Information S1.
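The three CV schemes differ only in how monitors are assigned to training and holdout sets. The following Python sketch illustrates that idea with NumPy. It is not the implementation described in Text S2. The function names, the region labels, the number of Monte Carlo repeats, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def k_fold_splits(n, k, rng):
    """Yield (train, test) index arrays for k shuffled, near-equal folds."""
    idx = rng.permutation(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold

def monte_carlo_splits(n, holdout_frac, n_repeats, rng):
    """Yield (train, test) index arrays from repeated random holdout draws."""
    n_test = int(round(holdout_frac * n))
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        yield idx[n_test:], idx[:n_test]

def leave_one_region_out(regions):
    """Yield (train, test) index arrays, holding out one region at a time."""
    regions = np.asarray(regions)
    for name in np.unique(regions):
        yield np.flatnonzero(regions != name), np.flatnonzero(regions == name)

def mean_holdout_r2(X, y, splits):
    """Fit OLS on each training set and average R2 over the holdout sets."""
    scores = []
    for train, test in splits:
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ beta
        total = y[test] - y[test].mean()
        scores.append(1.0 - (resid @ resid) / (total @ total))
    return float(np.mean(scores))

# Synthetic, made-up data (not from the study); region labels are hypothetical.
rng = np.random.default_rng(1)
n = 140
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=n)
regions = np.repeat(["NE", "SE", "MW", "RK", "SW", "NW", "SC"], n // 7)

r2_kfold = mean_holdout_r2(X, y, k_fold_splits(n, 5, rng))
r2_mc = mean_holdout_r2(X, y, monte_carlo_splits(n, 0.25, 100, rng))
r2_spatial = mean_holdout_r2(X, y, leave_one_region_out(regions))
```

In the random schemes every monitor can appear in a holdout set regardless of location, whereas the region-based scheme tests whether a model fit without any monitors from a region still predicts that region.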

After model training and evaluation, we compute qNO2 MLR, logMLR, and anscMLR for the full CONUS TROPOMI NO2 grid and discuss the spatial variation of qNO2 values and qNO2 performance metrics across road proximity and location setting classes as well as U.S. regions. Additional analyses are presented for three metropolitan areas with some of the highest TROPOMI NO2 levels in the United States: Los Angeles, Dallas-Fort Worth, and New York City.

3 Results and Discussion

3.1 Regression Results

We present SLR and MLR results for surface NO2 estimation, their relative performance across U.S. regions, and the impact of transforms applied to TROPOMI NO2. Figures 2a–2c show the relationship between TROPOMI NO2 and EPA NO2, separated by road proximity and location setting classes. Table S1 in Supporting Information S1 displays performance metrics for SLR and MLR models. SLR with TROPOMI NO2 as the sole input resulted in an R2 of 0.55 when evaluated over all monitors. We fit separate SLR models for near-road and non-near-road monitors (Figure 2a). SLR with TROPOMI NO2 captures the majority of variance in surface NO2 concentrations at non-near-road monitors (R2 = 0.66) but does not fully capture near-road variation (R2 = 0.41). Surface monitors detect NO2 near major roads better than TROPOMI because the kilometer-scale resolution of TROPOMI cannot fully capture fine-scale NO2 concentration gradients. However, while SLR at near-road sites has higher absolute error than at non-near-road sites, fractional error and bias are lower at near-road sites than at non-near-road sites. Thus, SLR with TROPOMI NO2 can be useful as a nearly unbiased estimate in data-sparse settings near major roads. To account for the difference in performance between near-road and non-near-road sites, we include road proximity as a binary variable in the MLR models, aligning with several prior studies which include road proximity information in satellite NO2-based statistical models to estimate surface NO2 (Grzybowski et al., 2023; Henderson et al., 2007; Kim et al., 2021; H. J. Lee et al., 2023; Novotny et al., 2011; Yeganeh et al., 2018; Young et al., 2016).

Figure 2. Simple and multiple linear regression results for TROPOMI NO2 with no transform, log transform, and Anscombe transform. (a) Simple regression models fit separately on near-road (black line-of-best-fit) and non-near-road (gray line-of-best-fit) Environmental Protection Agency (EPA) NO2 monitors. Dashed lines indicate mean near-road and non-near-road TROPOMI and EPA NO2 values. (b) Simple regression models fit separately on urban (pink line-of-best-fit), suburban (blue line-of-best-fit), and rural (green line-of-best-fit) EPA NO2 monitors. Dashed lines indicate mean urban, suburban, and rural TROPOMI and EPA NO2 values. (c) qNO2 MLR fit on all monitors. y = x line in red. (d) Simple regression models with log transform of TROPOMI NO2 input fit separately on near-road and non-near-road EPA NO2 monitors. (e) Simple regression models with log transform of TROPOMI NO2 input fit separately on urban, suburban, and rural EPA NO2 monitors. (f) qNO2 logMLR fit on all monitors. y = x line in red. (g) Simple regression models with Anscombe transform of TROPOMI NO2 input fit separately on near-road and non-near-road EPA NO2 monitors. (h) Simple regression models with Anscombe transform of TROPOMI NO2 input fit separately on urban, suburban, and rural EPA NO2 monitors. (i) qNO2 anscMLR fit on all monitors. y = x line in red.

In addition to classification by proximity to major roads, we separated monitor sites by their location setting (urban, suburban, rural) and fit SLR models to each class (Figure 2b). We found TROPOMI NO2 best captures surface concentrations at suburban sites (R2 = 0.60, RMSE = 2.77 ppb), captures around half of concentration variance at rural sites (R2 = 0.53, RMSE = 1.80 ppb), and has the poorest performance at urban sites (R2 = 0.33, RMSE = 3.95 ppb). Column NO2 does not fully capture surface NO2 concentration peaks in urban areas but has stronger performance in suburban and rural areas, which have lower and more uniform NO2 concentrations. However, urban and suburban sites have lower relative error and bias than rural sites (Table S1 in Supporting Information S1). SLR with TROPOMI NO2 is useful as a low bias estimate of urban and suburban surface NO2. In rural areas, SLR-based estimates have moderate positive bias. To account for the differing performance of column NO2 in capturing surface concentrations across location settings, our MLR models include location setting as a categorical variable.

We then fit a multiple linear regression (MLR) model with TROPOMI NO2, road proximity, and location setting variables as inputs and surface NO2 concentration estimates as the output (Figure 2c). MLR evaluated across all monitor sites results in an R2 of 0.76, greater than the full-domain SLR R2 of 0.55. Thus, incorporating road proximity and location setting information aids in surface NO2 estimation. MLR also has lower absolute (RMSE = 2.58 ppb) and fractional error (MFE = 0.29) than SLR (RMSE = 3.59 ppb, MFE = 0.40) and results in a lower positive bias (MFB = 0.09) than SLR (0.15). Thus, in locations with readily available data on major roads and basic land use classifications, we recommend the use of the MLR model for surface NO2 estimation. Table S2 in Supporting Information S1 shows coefficients for the SLR models of each site classification and for the MLR model. Text S3 in Supporting Information S1 includes additional analysis of regression coefficients. As noted in Section 2.4, we term the surface NO2 estimates of the MLR model as qNO2 MLR.

2019 annual average TROPOMI NO2 amounts over CONUS have a log-normal distribution. Figure S4 in Supporting Information S1 shows distributions of TROPOMI NO2 values without transforms and with log-transform and Anscombe transform. To better satisfy the assumption of normality in regression and to improve regression performance, we applied a log-transform and Anscombe transform to TROPOMI NO2 and compared performance with the corresponding no-transform models. Figures 2d–2f show the relationship between TROPOMI NO2 with log-transform and EPA NO2. Figures 2g–2i show the relationship between TROPOMI NO2 with Anscombe transform and EPA NO2. For SLR, both log (R2 = 0.63) and Anscombe (R2 = 0.61) transformed-TROPOMI NO2 have greater R2 than no-transform SLR (R2 = 0.55) when fit on all sites. For MLR, both log (R2 = 0.77) and Anscombe (R2 = 0.78) transforms resulted in marginal improvements in performance over the no-transform MLR (R2 = 0.76). Performance metrics and regression coefficients for log-transform models are presented in Tables S3 and S4 in Supporting Information S1, respectively. Performance metrics and regression coefficients for Anscombe-transform models are presented in Tables S5 and S6 in Supporting Information S1, respectively. Following the naming convention defined in Section 2.4, we term the output of the MLR with log transform as qNO2 logMLR and the output of the MLR with Anscombe transform as qNO2 anscMLR.

We specify the model configuration with the best surface NO2 estimation performance for each site classification. For rural sites, SLR with no transform has the highest R2 and lowest RMSE, but SLR with log-transform of TROPOMI NO2 has the lowest fractional bias. For near-road sites, SLR with log transform has the highest R2, lowest RMSE, and lowest fractional bias. For urban sites, MLR with log transform has the highest R2. The difference in performance between MLR and SLR is greatest at urban sites, which indicates the value of road proximity information for estimating urban surface NO2. For non-near-road and suburban sites, MLR with Anscombe transform has the best performance. MLR with Anscombe transform has the best performance across all monitors, with an overall R2 of 0.78.

3.2 qNO2 Computation

To analyze spatial patterns of surface NO2 estimates, we computed qNO2 MLR for all TROPOMI 0.01° by 0.01° grid cells over CONUS (Figure 3a). qNO2 MLR is highest in major cities and along major highways across the U.S. The Great Lakes and much of the eastern half of the U.S. have high overall qNO2 MLR concentrations, while the Mountain West and Northern New England have lower overall concentrations. Western North Dakota and the Permian Basin in western Texas have elevated qNO2 MLR levels compared to the surrounding rural areas, coinciding with the high oil industry activity in both regions.

Figure 3. (a) 2019 qNO2 gridded across the continental United States, computed using multiple linear regression (qNO2 MLR). (b) 2019 qNO2 gridded across the continental United States, computed using multiple linear regression with log transform of the TROPOMI NO2 input (qNO2 logMLR). (c) The difference between qNO2 logMLR and qNO2 MLR. Red indicates areas where qNO2 logMLR is greater than qNO2 MLR, and blue indicates areas where qNO2 logMLR is less than qNO2 MLR. (d) 2019 qNO2 gridded across the continental United States, computed using multiple linear regression with Anscombe transform of the TROPOMI NO2 input (qNO2 anscMLR). (e) The difference between qNO2 anscMLR and qNO2 MLR. Red indicates areas where qNO2 anscMLR is greater than qNO2 MLR, and blue indicates areas where qNO2 anscMLR is less than qNO2 MLR.

We also computed qNO2 logMLR (Figure 3b) and anscMLR (Figure 3d) at 0.01° by 0.01° resolution across CONUS. Figure 3c displays the difference between qNO2 logMLR and qNO2 MLR for each grid cell. qNO2 logMLR is greater than qNO2 MLR across the eastern half of the United States, particularly around the Great Lakes, Texas, and the Mid-Atlantic. qNO2 logMLR is also greater than qNO2 MLR in the California Central Valley and in areas around Seattle, Portland, Salt Lake City, Phoenix, and Denver, as well as the Bakken oil fields in North Dakota and the Permian Basin in Texas. qNO2 logMLR and qNO2 MLR are close in value in most urban areas and throughout most of the rural western U.S. qNO2 logMLR and MLR have the greatest difference in the Los Angeles and New York City areas, where qNO2 logMLR concentrations are more than 4 ppb lower than qNO2 MLR. Figure 3e shows the difference between qNO2 anscMLR and qNO2 MLR for each grid cell. The differences between qNO2 anscMLR and qNO2 MLR follow a spatial pattern similar to that of the differences between qNO2 logMLR and qNO2 MLR, but with lower magnitude. Overall, qNO2 logMLR and anscMLR have greater spatial spread of NO2 from urban areas and greater background concentrations in the eastern U.S. as well as lower maximum concentrations compared to qNO2 MLR.
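Computing gridded qNO2 and the difference maps amounts to applying fitted regression coefficients cell by cell. A minimal Python sketch is given below. The coefficient values, the integer encoding of location setting, and the tiny 2 by 2 grid are hypothetical and chosen only so the arithmetic is easy to follow. They are not the fitted coefficients reported in the Supporting Information.

```python
import numpy as np

def predict_grid(column_no2, near_road, setting_code, beta, transform=None):
    """Apply regression coefficients to gridded inputs of identical shape.

    setting_code uses a hypothetical encoding: 0 = urban (reference level),
    1 = suburban, 2 = rural. beta is ordered as
    (intercept, NO2 slope, near-road, suburban, rural).
    """
    x = np.asarray(column_no2, dtype=float)
    if transform == "log":
        x = np.log(x)
    elif transform == "anscombe":
        x = 2.0 * np.sqrt(x + 3.0 / 8.0)  # standard Anscombe form (assumed)
    near_road = np.asarray(near_road, dtype=float)
    setting_code = np.asarray(setting_code)
    b0, b_no2, b_road, b_sub, b_rural = beta
    return (b0 + b_no2 * x + b_road * near_road
            + b_sub * (setting_code == 1) + b_rural * (setting_code == 2))

# Made-up 2 x 2 grid of inputs.
column_no2 = np.array([[1.0, 4.0], [2.0, 0.5]])
near_road = np.array([[0, 1], [0, 0]])
setting_code = np.array([[0, 0], [1, 2]])

beta_mlr = (1.0, 2.0, 3.0, -0.5, -1.0)   # hypothetical coefficients
beta_log = (3.0, 4.0, 3.0, -0.5, -1.0)   # hypothetical coefficients

q_mlr = predict_grid(column_no2, near_road, setting_code, beta_mlr)
q_log = predict_grid(column_no2, near_road, setting_code, beta_log, transform="log")
difference = q_log - q_mlr  # positive where the log-transform estimate is higher
```

Because the log and Anscombe transforms compress large column values, a difference map of this kind tends to be negative at the highest column amounts and positive at intermediate ones, consistent with the lower peaks and broader spread described above.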

3.3 Regional Evaluation

We evaluated qNO2 in seven U.S. regions to investigate the variability of satellite-surface agreement between large spatial domains with similar topographic and meteorological conditions. qNO2 MLR best aligns with surface NO2 in the Midwest states (R2 = 0.88). The Northeast, Southeast, Rockies, and Southern California regions have comparable qNO2 MLR performance, with R2 values ranging from 0.72 to 0.76. The Southwest (R2 = 0.65) and Northwest (R2 = 0.66) regions have the lowest qNO2 MLR performance (Table S7 in Supporting Information S1). The strong performance in the Midwest and relatively weak performance in the Western U.S. suggest that elevation gradient may be an additional variable that could be included to further improve MLR performance.

All regions have positive MFB except the Northwest, which has an MFB of −0.06, indicating that qNO2 is a slight underestimate of surface NO2. The Rockies region has the greatest MFE (0.46) and MFB (0.15). This may be due to the larger proportion of rural sites in the Rockies region with very low NO2 concentrations, which inflates relative error metrics. For rural and remote areas with low background NO2 concentrations, absolute error metrics are more relevant for assessing model performance.

qNO2 logMLR exhibits regional variability similar to that of qNO2 MLR. R2 in the Northeast, Midwest, Northwest, and Southern California is slightly higher compared to qNO2 MLR, while R2 in the Southeast is slightly lower (Table S8 in Supporting Information S1). qNO2 anscMLR has slightly higher R2 than qNO2 MLR and logMLR in all regions (Table S9 in Supporting Information S1).

qNO2 performance varies within regions as well as between regions. Figure 4 displays the fractional bias of qNO2 at each EPA monitor. qNO2 MLR (Figure 4a) overestimates surface NO2 relative to the measured value along the California coast and in Wyoming, Montana, the Dakotas, and Texas. qNO2 MLR underestimates surface NO2 in the California Central Valley and the Southwest. Sites in the Midwest and Southeast have low overall bias. qNO2 anscMLR (Figure 4c) and qNO2 MLR have similar spatial variation in fractional bias across EPA monitor sites, but qNO2 anscMLR has lower fractional bias in Wyoming and Montana. qNO2 logMLR (Figure 4b) has spatial variation in fractional bias similar to that of MLR and anscMLR but has a much greater degree of bias in Wyoming and Montana.

Figure 4. (a) Fractional bias between Environmental Protection Agency (EPA) NO2 and qNO2 MLR at EPA monitor locations (n = 402) across the continental United States. Red indicates monitor locations where qNO2 is relatively high compared to the measured NO2 concentration. Blue indicates monitor locations where qNO2 is relatively low compared to the measured NO2 concentration. (b) Fractional bias between EPA NO2 and qNO2 logMLR at EPA monitor locations. (c) Fractional bias between EPA NO2 and qNO2 anscMLR at EPA monitor locations.

3.4 Urban Case Studies

Figure 5 shows qNO2 MLR, logMLR, and anscMLR over three large U.S. metropolitan areas: Los Angeles, CA; Dallas-Fort Worth, TX; and New York City, NY-NJ-CT-PA. qNO2 MLR in Los Angeles (Figures 5a, 5d, and 5g) is greater than 20 ppb in the city center. qNO2 MLR decreases sharply between the metropolitan area and the surrounding rural areas. qNO2 logMLR has a lower maximum level in the city center and a more gradual decrease toward the surrounding rural areas than qNO2 MLR. The urban-rural concentration gradient for qNO2 anscMLR is steeper than qNO2 logMLR but less steep than qNO2 MLR. All qNO2 models indicate concentrations greater than 18 ppb along the major highways extending south and east from central LA. Among the qNO2 models, qNO2 MLR (RMSE = 3.65 ppb) and anscMLR (3.52 ppb) have the lowest error in Los Angeles, while logMLR (RMSE = 7.78 ppb) has the highest error.

Figure 5. qNO2 MLR, qNO2 logMLR, and qNO2 anscMLR over three selected large U.S. metropolitan areas. (a) qNO2 MLR over Los Angeles, CA. (b) qNO2 MLR over Dallas-Fort Worth, TX. (c) qNO2 MLR over New York-Newark-Jersey City, NY-NJ-CT-PA. (d) qNO2 logMLR over LA. (e) qNO2 logMLR over Dallas. (f) qNO2 logMLR over NYC. (g) qNO2 anscMLR over LA. (h) qNO2 anscMLR over Dallas. (i) qNO2 anscMLR over NYC.

Dallas-Fort Worth (Figures 5b, 5e, and 5h) has lower overall qNO2 than Los Angeles, with maximum qNO2 of 16–18 ppb along major highways. The qNO2 models estimate similar concentration levels in the metropolitan area, but logMLR and anscMLR have a broader radius of high concentrations than qNO2 MLR. In Dallas, qNO2 has high accuracy, with logMLR having the lowest error (RMSE = 1.56 ppb).

New York City (Figures 5c, 5f, and 5i) has comparable peak qNO2 levels to Los Angeles, as the urban core and adjacent highways have qNO2 concentrations greater than 20 ppb. As in Los Angeles and Dallas-Fort Worth, qNO2 logMLR and anscMLR over New York City have smoother gradients toward the edges of the metropolitan area than qNO2 MLR. The spatial patterns of qNO2 anscMLR are a combination of the sharp gradients and high peak concentrations of qNO2 MLR and the smoother gradients of qNO2 logMLR. Among the qNO2 models, anscMLR results in the lowest error (RMSE = 3.49 ppb) while logMLR has the highest error (RMSE = 7.04 ppb).

3.5 Cross-Validation

We implemented k-fold and Monte Carlo CV to investigate the generalizability of qNO2 on data sets held out from model fitting. Table S10 in Supporting Information S1 displays k-fold CV results and Table S11 in Supporting Information S1 displays Monte Carlo CV results.

Both CV methods indicate that qNO2 anscMLR performs well on unseen data. k-fold CV resulted in similar mean holdout set performance for k = 5 and k = 10 with R2 of 0.74. However, using k = 20 resulted in a mean holdout set performance of R2 = 0.71. Smaller holdout sets are more likely to be unrepresentative of the population distribution, thus resulting in poor evaluation performance. Monte Carlo CV using holdout set sizes of 25% and 50% indicated strong evaluation performance on the holdout data, with anscMLR R2 of 0.77. When evaluated over a sufficiently large set of unseen data points, qNO2 anscMLR exhibits strong generalization ability. Further, the difference between holdout set and training set performance is small, indicating that the anscMLR model is not overfit to the training data. This finding supports the use of qNO2 anscMLR as a reliable metric for future surface NO2 estimation beyond the domain of our analysis.

We also conduct CV using the seven CONUS regions by leaving one region out for evaluation and fitting anscMLR models on the remaining six regions. As with non-cross-validated regional evaluation detailed in Section 3.3, qNO2 anscMLR generalizes well to Midwest monitors with an R2 of 0.89 and has the lowest generalization performance for Southwest (R2 = 0.65), Pacific Northwest (R2 = 0.66), and Northeast sites (R2 = 0.69) (Table S12 in Supporting Information S1). The similar results between cross-validated and non-cross-validated region-wise evaluation indicate that qNO2 is generalizable to new geographic contexts.

4 Conclusions

We fit regression models with TROPOMI NO2, location setting, and road proximity inputs to estimate 2019 annual average surface NO2 concentrations at 0.01° by 0.01° resolution across the continental U.S. Among the regression models studied, qNO2 anscMLR has the strongest overall performance. qNO2 anscMLR is the best estimate for surface NO2 at non-near-road sites (anscMLR R2 = 0.76) and suburban sites (anscMLR R2 = 0.74). We also investigate qNO2 spatial patterns over large U.S. urban areas, compare qNO2 performance across U.S. regions, and assess the generalizability of qNO2. We find that qNO2 performs best in the Midwest, with CV anscMLR R2 of 0.89.

Using easily accessible data and interpretable methods, we demonstrate comparable or improved performance over prior regression-based studies which use satellite NO2 to estimate surface NO2. Table 2 presents R2 values of satellite-surface NO2 agreement for prior studies which cover four main methods, both OMI and TROPOMI NO2 data, and a wide range of spatiotemporal scales. Novotny et al. (2011) used GEOS-Chem to derive surface NO2 concentrations from OMI, which was then used as regression input along with land use to estimate annual-average surface NO2 at 30-m resolution. Their work resulted in a CV R2 of 0.77 and an MAE of 2.40 ppb evaluated at EPA NO2 monitors across CONUS. The slightly stronger performance of qNO2 anscMLR using a three-variable regression model without GEOS-Chem-based column NO2 adjustments highlights the improved ability of the higher resolution TROPOMI to capture surface NO2 compared to prior satellite products. Goldberg et al. (2021) found an R2 of 0.66 between annual-average TROPOMI NO2 and EPA NO2 at non-near-road sites. Using the same 0.01° by 0.01° TROPOMI data set, we apply the Anscombe transform to TROPOMI NO2 which results in 0.06 greater R2 at non-near-road sites. Lee et al. (2023) used multivariate regression to analyze TROPOMI NO2 agreement with 2018–2019 annual-average surface NO2 over California at 0.5 by 0.5 km resolution. Their final regression models included land use and road proximity inputs. Meteorological inputs were initially considered but were removed because they did not contribute to model performance. Their work achieved a CV R2 of 0.76 and RMSE of 2.51 ppb. These metrics are comparable to qNO2 anscMLR metrics computed using the same CV method as Lee et al., for all CONUS monitor sites (CV R2 = 0.75 and RMSE = 2.64 ppb). We further find that qNO2 anscMLR has CV R2 of 0.76 and RMSE of 2.63 ppb in California, in close agreement with the Lee et al. results while using simpler input variables.

Table 2. Comparison of R2 Values With Other Studies Investigating Agreement Between Satellite-Derived and Surface Measurements of NO2
| Model type | Study | Satellite product | Spatial domain | Temporal domain | R2 |
| --- | --- | --- | --- | --- | --- |
| Chemical transport | Bechle et al. (2013) | OMI (Ozone Monitoring Instrument) | S. Cali. | Annual | 0.86 |
| | Gu et al. (2017) | OMI | China | Monthly | 0.61–0.64 |
| | Cooper et al. (2020) | TROPOMI (TROPOspheric Monitoring Instrument) | U.S., Canada | Annual | 0.53 |
| Machine learning | Qin et al. (2020) | POMINO | E. China | Monthly | 0.72^a |
| | Chan et al. (2021) | TROPOMI | Germany | Daily | 0.64^a |
| | Ghahremanloo et al. (2021) | TROPOMI | Texas | Daily | 0.71^b |
| | Chi et al. (2022) | TROPOMI | China | Daily | 0.73^a |
| | Li et al. (2022) | TROPOMI | China | Daily | 0.78^a |
| | Grzybowski et al. (2023) | TROPOMI | Poland | Weekly | 0.60^a |
| Spatial interpolation | Young et al. (2016) | OMI | U.S. | Annual | 0.65–0.80^b |
| Regression | Henderson et al. (2007) | | Vancouver, Canada | Annual | 0.69 |
| | Novotny et al. (2011) | OMI | U.S. | Annual | 0.77^a |
| | Ghahremanloo et al. (2021) | TROPOMI | Texas | Daily | 0.59^b |
| | Goldberg et al. (2021) | TROPOMI | U.S. | Annual | 0.66 |
| | Yu and Li (2022) | TROPOMI | Xinjiang Province | Monthly | 0.78 |
| | Grzybowski et al. (2023) | TROPOMI | Poland | Weekly | 0.49^a |
| | Lee et al. (2023) | TROPOMI | California | Annual | 0.76^a |
| | Present Study | TROPOMI | CONUS | Annual | 0.78^b |
  • a Holdout set or k-fold CV R2 values.
  • b Spatial CV R2 values.

Our results compare favorably to prior studies using CTM, machine learning, and spatial interpolation models to investigate satellite-surface NO2 agreement (Table 2). Cooper et al. (2020), who used GEOS-Chem simulated vertical profiles to infer surface NO2 from TROPOMI columns over CONUS and Canada, attained an annual-average R2 of 0.53 against surface observations, comparable to the R2 of 0.55 we obtain with SLR for all monitors. Chi et al. (2022) used XGBoost, a popular machine learning method, to estimate 2018–2020 annual-average surface NO2 over China using TROPOMI, meteorological, and land use data. They obtained a CV R2 of 0.73. Ghahremanloo et al. (2021) used convolutional neural networks, a deep learning method specialized for image-based inputs, to estimate surface NO2 over Texas. They attained a spatial CV R2 of 0.71 averaged over 2019. We found a spatial CV R2 of 0.65 for monitors in the Southwest region, of which a majority are in Texas. Lastly, Young et al. (2016) use kriging to estimate annual-average surface NO2 over CONUS from 1990 through 2012 using OMI NO2, obtaining spatial CV R2 values between 0.65 and 0.80, depending on the year. Although our results are not directly comparable to the above studies due to differing spatiotemporal domains and data sets, our anscMLR R2 of 0.78 indicates favorable performance compared to studies using machine learning and spatial interpolation models. Despite the relative simplicity of regression methods, they provide comparable performance for estimating surface NO2 using satellite and auxiliary data.

Although the analyses in this work are conducted over the continental United States, the strong region-wise CV performance of the MLR models indicates that qNO2 can be applied to other regions with relatively similar climates, geographies, and chemical regimes to CONUS, such as Northern Europe and East Asia. Given the multitude of studies which have investigated satellite-surface NO2 agreement over China (e.g., Chi et al., 2022; Li et al., 2022) and Europe (e.g., Chan et al., 2021; Grzybowski et al., 2023), we anticipate the data sources and methods used in our work can be reproduced over these regions. Additionally, since both road density and location setting are relatively static over time, surface NO2 concentrations of additional years in the TROPOMI record can be estimated using qNO2, thus enabling NO2 exposure assessments in areas or time periods with sparse or no monitor coverage. However, qNO2 may not generalize well to regions with characteristics not represented in CONUS, such as tropical and polar regions. In addition to characterizing surface NO2, our analysis of satellite-surface agreement across spatial scales, contexts, and regression model configurations informs the application of satellite NO2 products in different domains.

The annual-average scale of our analysis is suitable for characterizing long-term spatial characteristics of surface NO2. However, as NO2 exhibits significant daily and seasonal variability, our results are less applicable for studies of short-term pollution events and trends. For example, our methodology is less applicable for inferring NO2 from biomass burning events because of their short time scale (less than 1 week) and tendency to be obscured from satellite measurements since they produce high-density smoke which is often indistinguishable from clouds (Griffin et al., 2021). We expect our results to support long-term, high-spatial resolution NO2 exposure assessments, such as those used to determine compliance with the EPA annual-average NO2 NAAQS.

The spatial distribution of EPA monitors presents a potential source of bias for qNO2. In the eastern half of the U.S., clusters of monitors are evenly distributed, mainly near urban areas. In the western U.S., monitors are concentrated in rural Wyoming, western North Dakota, and throughout California but are sparse in Washington and Oregon. Thus, monitor measurements may not be fully representative of NO2 spatial patterns over the U.S., impacting the generalizability of qNO2 to less-represented location settings and regions. Spatial interpolation models which account for error correlations between monitors in proximity may help to account for the inconsistent distribution of surface monitors.

We anticipate that the results presented here will inform analysis of data from TEMPO (Tropospheric Emissions: Monitoring of Pollution), a new NASA geostationary satellite instrument launched in April 2023 which captures hourly column NO2 during all daylight hours at 2.1 km by 4.4 km resolution over the entire continental United States (Zoogman et al., 2017). The greater spatial and temporal resolution from TEMPO will expand the scope of air quality analyses. For example, the methods in this work can be extended to compare hourly TEMPO observations with hourly ground monitor measurements of NO2. Further, greater spatial resolution will enable investigation of satellite-surface agreement over finer-scale emissions sources such as industrial sites in addition to major roads.

Acknowledgments

EK, TH, MH, CH, and SE were supported by NASA Grant 80NSSC21K0427 for the NASA Health and Air Quality Applied Sciences Team (HAQAST). DG was supported by NASA Grant 80NSSC21K0511 for the NASA Health and Air Quality Applied Sciences Team (HAQAST). EK was also supported by the Hilldale Undergraduate Research Fellowship at the University of Wisconsin-Madison.

Data Availability Statement

TROPOMI NO2 data can be obtained here: https://doi.org/10.5270/S5P-9bnp8q8 (Copernicus Sentinel-5P (processed by ESA), 2021). TROPOMI NO2 data re-gridded by DG to 0.01° resolution over CONUS for 2019 through 2023 can be found here: https://disc.gsfc.nasa.gov/datasets/HAQ_TROPOMI_NO2_CONUS_A_L3_2.4/summary. The road data used in our work was derived from shape files accessible at https://www2.census.gov/geo/tiger/TIGER2021/PRIMARYROADS/. Location setting data was obtained from https://nces.ed.gov/programs/edge/Geographic/LocaleBoundaries. The above data re-gridded to the custom 0.01° by 0.01° grid used in this work is available at https://doi.org/10.5281/zenodo.10601063. EPA AQS data is accessible at https://aqs.epa.gov/aqsweb/airdata/download_files.html. All code for the analysis and visualizations presented in this study is available at https://doi.org/10.5281/zenodo.10582277.