Volume 5, Issue 8 e2021GH000423
Research Article
Open Access

Spatiotemporal Associations Between Social Vulnerability, Environmental Measurements, and COVID-19 in the Conterminous United States

Daniel P. Johnson

Corresponding Author

Department of Geography, Indiana University—Purdue University at Indianapolis, Indianapolis, IN, USA

Correspondence to:

D. P. Johnson,

[email protected]

Contribution: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization, Supervision, Project administration, Funding acquisition

Niranjan Ravi

Department of Electrical and Computer Engineering, Indiana University—Purdue University at Indianapolis, Indianapolis, IN, USA

Contribution: Software, Validation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization

Christian V. Braneon

NASA Goddard Institute for Space Studies, New York, NY, USA

SciSpace, LLC, New York, NY, USA

Contribution: Conceptualization, Methodology, Investigation, Writing - original draft, Writing - review & editing

First published: 21 July 2021


This study summarizes the results from fitting a Bayesian hierarchical spatiotemporal model to coronavirus disease 2019 (COVID-19) cases and deaths at the county level in the United States for the year 2020. Two models were created, one for cases and one for deaths, utilizing a scaled Besag, York, Mollié model with Type I spatial-temporal interaction. Each model accounts for 16 social vulnerability and 7 environmental variables as fixed effects. The spatial patterns of COVID-19 cases and deaths differ significantly in many ways. The spatiotemporal trend of the pandemic in the United States illustrates a shift out of many of the major metropolitan areas into the United States Southeast and Southwest during the summer months and into the upper Midwest beginning in autumn. Analysis of the major social vulnerability predictors of COVID-19 infection and death found the percentage of residents without a high school diploma, the percentage with non-White status, and the percentage aged 65 and over to be significant. Among the environmental variables, above ground level temperature had the strongest effect on relative risk for both cases and deaths. Hot and cold spots, areas of statistically significant high and low COVID-19 cases and deaths respectively, derived from the convolutional spatial effect show that areas with a high probability of above average relative risk have significantly higher Social Vulnerability Index composite scores. The same analysis utilizing the spatiotemporal interaction term reveals a more complex relationship between social vulnerability, environmental measurements, COVID-19 cases, and COVID-19 deaths.

Key Points

  • Patterns of coronavirus disease 2019 (COVID-19) cases and deaths vary considerably through time and space

  • COVID-19 cases and deaths concentrated in areas of increased social vulnerability at different times of the year

  • Differences exist between the examined variables' contributions to risk of infection and their respective contributions to risk of mortality

Plain Language Summary

Coronavirus disease 2019 (COVID-19) affects different locations at different points in time, and understanding its impact on communities is an imperative research effort. Communities that are considered socially vulnerable (less resilient to hazards) are disproportionately impacted by pandemics and other environmental stresses. In this study, we utilize a modeling approach that accounts for COVID-19 cases and deaths, social vulnerability, environmental measurements, and both space and time domains at the US county level from March 1 to December 31, 2020. Throughout much of the time period, cases and deaths clustered in different areas. Measurements of social vulnerability were higher in these long-term clusters. When short-term clusters were examined on a monthly basis, COVID-19 cases and deaths were concentrated heavily in socially vulnerable areas during the summer and autumn months, respectively. The individual social vulnerability variables of not having a high school diploma and non-White status were the most significant contributors to relative risk for both cases and deaths. Age 65 and over contributed significantly to deaths. Temperature, with an inverse relationship, had the strongest effect on risk among the environmental measurements. Social vulnerability measures were higher in areas where there was an increased risk of COVID-19 infection and death during the summer and autumn, respectively.

1 Introduction

The coronavirus disease 2019 (COVID-19, ICD-10-CM, U07.1, and 2019-nCoV acute respiratory disease) pandemic is currently affecting much of the world. As of January 30, 2021, 11 months (325 days) into the pandemic and 1 year since the World Health Organization declared COVID-19 a Public Health Emergency of International Concern (PHEIC), there are over 100 million confirmed cases of the disease and over 2 million deaths within 223 countries, areas, or territories (WHO, 2021). In the United States, as of the same time, there are over 25 million confirmed cases and close to 500,000 deaths, 25.25% of cases and 19.62% of deaths worldwide (CDC, 2021). The United States only accounts for 4.23% of the global population, so it is disproportionately affected (U.S. Census Bureau, 2021).

Pandemics, as well as other natural and man-made hazards, disproportionately impact socially vulnerable individuals and communities (Freitas & Cidade, 2020; Gaynor & Wilson, 2020; Seddighi, 2020; Usher et al., 2020). The past decade has witnessed an increasing trend in research activity focusing on social and environmental vulnerability as it relates to geophysical and man-made hazards. More recently, there has been vigorous interest in social vulnerability as it relates to the ongoing COVID-19 pandemic (Bilal et al., 2020; Coelho et al., 2020; Dasgupta et al., 2020; Gaynor & Wilson, 2020; Khazanchi et al., 2020; Kim & Bostwick, 2020; Lancet, 2020; Mishra et al., 2020; Mohanty, 2020; Neelon et al., 2020; Snyder & Parks, 2020). Additionally, researchers have attempted to construct COVID-19-specific vulnerability indices, examine spatial relationships, or integrate both social and environmental determinants into a complete model, illustrating areas more prone to adverse impacts (Karaye & Horney, 2020; Khazanchi et al., 2020; Snyder & Parks, 2020).

However, there is a paucity of studies focusing on the spatiotemporal character of the pandemic and the relationships between social and environmental determinants of COVID-19 vulnerability. This study focuses on a spatiotemporal analysis of the COVID-19 pandemic in the conterminous United States during the year 2020. This investigation not only adds to the growing literature on vulnerability and COVID-19, it also illuminates some of the spatial and temporal underpinnings of the pandemic in the United States. This study seeks to examine these spatial-temporal patterns and vulnerability contributions with three specific aims:
  1. Identify spatiotemporal associations between social vulnerability, environmental measurements, and both cases of and deaths from COVID-19 aggregated by US counties.

  2. Model the spatial-temporal dimensions of the pandemic and determine if socially vulnerable counties are more or less impacted at certain times of the year.

  3. Create two complementary parsimonious spatiotemporal models, one for COVID-19 cases and one for COVID-19 deaths, that take into account social vulnerability, environmental measurements, and spatial and temporal random effects.

2 Background

2.1 COVID-19 as a US Health Disparity

The disproportionate impact of COVID-19 on Black, Indigenous, and People of Color (BIPOC) in the United States is ongoing as of early 2021 (LaVeist, 2005; Shi & Stevens, 2021). A number of health disparities associated with “non-White” (i.e., minority) status have been examined in the literature (Guess, 2006; Song, 2020). This list includes among others cardiovascular disease, chronic respiratory conditions, hepatitis, and cancer (LaVeist, 2005). The fact that COVID-19 is disproportionately represented in US communities of color is not surprising (Singu et al., 2020). The reason(s) for the disparity in representation of COVID-19 cases and deaths within the US population is likely multifaceted encompassing a variety of cultural, social, environmental, and economic contributors. While Persad et al. (2020) have noted that “racial identity is not an inherent risk factor,” “COVID-19 disparities reflect the health, environmental, and occupational effects of structural racism.” Numerous researchers have highlighted the underfunding of preventative public health infrastructure, an inefficient healthcare system, inadequate governmental response, and systemically racist policies that have exacerbated the pandemic's effects (Egede & Walker, 2020; Hanage et al., 2020; Hathaway, 2020; Yearby & Mohapatra, 2020; Zalla et al., 2021). However, one highly probable contributor is the number and extent of socially vulnerable communities within the United States that demonstrate reduced resiliency in the face of a hazard (Karaye & Horney, 2020; Khazanchi et al., 2020; Shi & Stevens, 2021).

2.2 Social Vulnerability and COVID-19

Social vulnerability (SV) as a concept refers to a society's or community's impaired ability to respond to an external stressor. These external pressures can be a single incident or the compounding consequences of multiple events leading to deleterious effects on the society or community. The number of studies highlighting the negative impact COVID-19 has on those considered socially vulnerable has grown rapidly since the beginning of the pandemic. Here, we highlight research that focuses on social vulnerability as a covariate in geographic ecological regression studies. Many of these efforts come to similar conclusions, albeit with different variables being more or less related to COVID-19's effects. The studies highlighted utilize the U.S. Centers for Disease Control and Prevention's (U.S. CDC) Social Vulnerability Index (SVI), which we apply in this study (CDC's Social Vulnerability Index (SVI), 2021; Flanagan et al., 2011). The SVI ranks counties based on 15 social factors, which we utilize independently in this study. The factors are also grouped into four different themes: socioeconomic status, household composition and disability, non-White status and language, and housing type and transportation. These themes are also grouped into a single composite score of overall vulnerability. We utilize the composite score for comparison of counties in our hot and cold spot analysis.

Khazanchi et al. (2020), using a quasi-Poisson regression approach, discovered that counties considered vulnerable had a 1.63-fold greater risk for COVID-19 diagnosis and a 1.73-fold greater risk for COVID-19-related death. When considering only the language and non-White status domain of the SVI, they found a 4.94-fold and 4.74-fold increase in diagnosis and death, respectively. Counties classified as the most vulnerable by socioeconomic status, housing type, and transportation deficiencies also had a higher relative risk (i.e., were at a greater risk of COVID-19 infection and death). Nayak (2020), examining 433 US counties (counties with ≥50 COVID-19 cases as of April 4, 2020) with a generalized linear mixed-effect model, found that higher composite SVI values were associated with an increased COVID-19 case fatality rate (CFR). The relative risk further increased after adjusting for age 65 and over. However, the relationship between the overall SVI score and COVID-19 incidence was not statistically significant. In a study by Neelon et al. (2020) utilizing COVID-19 cases and deaths within a Bayesian hierarchical negative binomial model between March 1 and August 31, 2020, counties were classified based on SVI composite percentiles. Cases and deaths were examined daily for all US counties after adjusting for percentage rural, percentage in poor or fair health, percentage of adult smokers, county average daily PM2.5, and primary care physicians per 100,000. By March 30, 2020, relative risk became significantly greater than 1.00 in the most vulnerable counties. Upper SVI quartile counties had higher death rates on average beginning on March 30, 2020. By late August, the lower quartiles for SVI began to exhibit increasing levels of cases and deaths. Dasgupta et al. (2020) examined COVID-19 cases from June 1 to July 25, 2020, relating them to the CDC's SVI. Areas with a higher proportion of individuals with non-White status, housing density, and crowded housing units (3 of the 15 individual social factors that comprise the SVI) were more likely to become COVID-19 hot spots, defined as areas where there is a >60% change in the most recent 3–7 days COVID-19 incidence rate. Further analysis demonstrated that, among the hot spot counties, those with greater SVI composite scores had higher COVID-19 incidence rates. Karaye and Horney (2020) examined local relationships between COVID-19 case counts and SVI utilizing geographically weighted regression (GWR). The study examined data from January 21 to May 12, 2020 and found (after adjusting for population size, population density, number of persons tested, average daily sunlight, precipitation, air temperature, heat index, and PM2.5) that non-White status, limited English, household composition, transportation, housing, and disability effectively predicted case counts in the United States. Snyder and Parks (2020), in another spatial analysis (utilizing GWR), which did not utilize the CDC SVI, found that socio-ecological vulnerability to COVID-19 varied across the contiguous United States, with higher levels of vulnerability in the Southeast and low vulnerability in the Upper Midwest, Great Plains, and Mountain West.

2.3 Environmental Determinants of COVID-19

Even though we are only 11 months into the pandemic, there is growing evidence regarding environmental determinants of COVID-19 infection and mortality. Initially, it was hypothesized that COVID-19 may behave like many other respiratory infections and cases would subside in the summer months in the Northern Hemisphere due to increases in temperature (Hassan et al., 2020; Jamil et al., 2020; Prata et al., 2020). Therefore, many researchers have concentrated on temperature and less on other meteorological measurements as a factor in the spread of SARS-CoV-2. A study conducted in Bangladesh found that high temperature and high humidity significantly reduced the transmission of COVID-19 when analyzed using ordinary least squares regression (Haque & Rahman, 2020). Another analysis utilizing log-linear generalized additive models across 166 countries revealed that temperature and relative humidity were also associated with a decrease in COVID-19 cases (Wu et al., 2020). A 1°C increase in temperature was associated with a 3.08% reduction in daily new cases and a 1.19% reduction in daily deaths across the studied countries. Relative humidity had a similar effect on cases and deaths (Wu et al., 2020). However, this is not surprising given that the calculation of relative humidity employs a function of temperature. Rouen et al. (2020), utilizing micro-correlation analysis with a 10-day moving window, found a negative correlation between temperature and outbreak progression. Their research was conducted across four continents in both hemispheres. Sarkodie and Owusu (2020), using panel estimation techniques, focused their research on the top 20 countries with COVID-19 cases between January 22 and April 27, 2020 and found that a percent increase in temperature lowered cases by ∼0.13% and deaths by 0.11%. Percentage increases in minimum temperature increased cases and deaths by 10%. Additionally, low temperature, wind speed, dew/frost point, precipitation, and surface pressure increased the infectivity and survival of the virus.

At a much finer scale, research in China at 31 different provincial levels revealed a “biphasic” relationship with temperature, using distributed lag nonlinear models (Shi et al., 2020). Epidemic intensity was slightly reduced on days following higher temperatures and was associated with a decrease in relative risk. An investigation into temperature and precipitation's relationship with COVID-19 in Oslo, Norway found maximum temperature and average temperature to be positively associated with COVID-19 transmission and precipitation to have a negative relationship, using non-parametric correlation estimation (Menebo, 2020). Research in the sub-tropical cities of Brazil uncovered a negative relationship between temperature and COVID-19, using generalized additive models and polynomial linear regression (Prata et al., 2020). Bashir et al. (2020) in New York City, New York, USA, between March and April of 2020, found a significant positive correlation of average temperature and minimum temperature with total cases, using Kendall and Spearman rank correlation tests. Average temperature was significantly positively related to COVID-19 mortality and the minimum temperature was associated with new cases. Another study, utilizing multi-variate regression focusing on all US counties from the beginning of the pandemic to April 14, 2020, found that higher temperatures were associated with a decrease in cases but not deaths (Li et al., 2020). This sample of studies demonstrates that, particularly at finer spatial and temporal scales and depending on how temperature is sampled, the relationships between environmental variables and COVID-19 are complex and variable.

3 Methods

3.1 Study Area and Timeframe

This study focuses on the counties (substate administrative districts) located in the conterminous United States. We selected this subset due to Alaska and Hawai'i being noncontiguous and the error and complexity these “islands” would introduce into the spatial weighting matrices necessary for the spatiotemporal analysis. Furthermore, we focus on the timeframe from March 1 to December 31, 2020 (10 full months or 307 days), the first calendar year of the pandemic in the United States. We focus on data compiled monthly not only because it is a natural timeframe in which to aggregate the data but also because the time from initial exposure to infection and/or mortality is typically within 30 days (McAloon et al., 2020).

3.2 Data Collection

The data described below in Sections 3.2.1–3.2.4 were used to create the data set for the analysis (Johnson & Ravi, 2021).

3.2.1 COVID-19 Cases and Deaths

To create two models, one for COVID-19 cases and one for deaths, infection and mortality counts were collected from USA FACTS (US Coronavirus Daily Cases by County, 2021; US Coronavirus Daily Deaths by County, 2021). These data were retrieved in comma-separated values (csv) format and were grouped into monthly cases and deaths for all counties of the contiguous United States (n = 3,106); two counties were missing data. The expected number of cases and deaths, $E$, per county area $i$ was determined by calculating the number of cases and deaths per month and computing the standardized infection rate (SIR) and standardized mortality rate (SMR) for each area for each month; the denominator of each rate is the expected number of cases or deaths (depending on the model), $E_{it}$, that is, the expected number of cases/deaths in area $i$ during time $t$. These expected values are later used as the offset in each model.
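As a rough illustration of the standardization described above, the Python sketch below computes expected counts and the resulting SIR for a few areas in a single month. It assumes the expected count is the area's population multiplied by the pooled rate across all areas (indirect standardization without age strata); the source does not state the exact reference rate used, and the area labels, counts, and populations are invented for the example.

```python
def expected_counts(observed, population):
    """Expected counts per area: area population times the pooled rate.

    Assumption: the reference rate is total observed / total population
    for the month; the study's exact reference rate is not given.
    """
    pooled_rate = sum(observed.values()) / sum(population.values())
    return {area: population[area] * pooled_rate for area in observed}


def standardized_ratio(observed, expected):
    """SIR (or SMR): observed count divided by expected count."""
    return {area: observed[area] / expected[area] for area in observed}


# Hypothetical monthly case counts and populations for three areas.
observed = {"A": 30, "B": 10, "C": 60}
population = {"A": 10_000, "B": 10_000, "C": 30_000}

expected = expected_counts(observed, population)
sir = standardized_ratio(observed, expected)

for area in observed:
    print(area, round(expected[area], 2), round(sir[area], 2))
```

A ratio above 1 marks an area with more cases than expected under the pooled rate. In small populations this raw ratio can swing widely from month to month, which is the instability the hierarchical model is meant to smooth.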

3.2.2 Social Vulnerability

In this analysis, we utilize the U.S. CDC's SVI (CDC's Social Vulnerability Index (SVI), 2021; Flanagan et al., 2011). The SVI is composed of 15 variables that are related to social vulnerability and the local socio-ecology at the county or census tract-level for the entire United States. We chose the SVI because it is highly cited in the literature and while there are inherent limitations in all vulnerability indices it demonstrates greater accuracy and relevancy in many studies (Bakkensen et al., 2017; Rufat et al., 2019; Spielman et al., 2020; Tate, 2012). The 15 SVI variables are listed below in Table 1 along with the additionally included value of percent uninsured (health).

Table 1. Descriptive Statistics of Monthly (Year 2020) Averages for SVI and Environmental Variables Utilized in the Models and Aggregated by U.S. County

| Variable | Mean | Standard deviation | Minimum | Maximum |
|---|---|---|---|---|
| Unemployed | 5.74% | 2.78% | 0% | 26.40% |
| Per capita income | $27,036.00 | $6,457.00 | $0 | $72,832.00 |
| No high school diploma | 13.45% | 6.34% | 1.20% | 66.30% |
| Age 65+ | 18.43% | 4.54% | 3.80% | 55.60% |
| Age 17− | 22.35% | 3.43% | 7.30% | 40.30% |
| Disabled | 15.96% | 4.40% | 3.80% | 33.70% |
| Single parent | 8.30% | 2.71% | 0% | 25.60% |
| Minority | 23.15% | 19.84% | 0% | 99.30% |
| Mobile home | 13.03% | 9.62% | 0% | 59.30% |
| Crowded living (>10) | 2.33% | 1.92% | 0% | 33.80% |
| No vehicle | 6.18% | 3.57% | 0% | 77.00% |
| Group quarters | 3.48% | 4.74% | 0% | 55.70% |
| Poverty | 15.63% | 6.46% | 0% | 55.10% |
| Multi-unit dwelling | 4.63% | 5.64% | 0% | 89.40% |
| Limited English | 1.70% | 2.79% | 0% | 30.40% |
| Uninsured | 10.00% | 4.98% | 1.70% | 42.40% |
| Daytime LST^a | 294.65 K | 16.62 K | 239.00 K | 334.00 K |
| Nighttime LST^a | 292.73 K | 14.61 K | 240.21 K | 326.94 K |
| 2 m above ground level (AGL) temperature^b | 289.11 K | 8.58 K | 262.81 K | 309.40 K |
| Specific humidity^b | 0.008 kg/kg | 0.004 kg/kg | 0.0015 kg/kg | 0.020 kg/kg |
| Atmospheric pressure^b | 96,628.92 Pa | 5,465.36 Pa | 6,669.97 Pa | 102,268.70 Pa |
| Longwave radiation^b | 327.85 W/m² | 51.84 W/m² | 170.07 W/m² | 438.38 W/m² |
| Shortwave radiation^b | 208.13 W/m² | 72.30 W/m² | 38.76 W/m² | 359.35 W/m² |
| Potential evaporation^b | 0.22 kg/m² | 0.10 kg/m² | −0.0051 kg/m² | 0.55 kg/m² |
| Precipitation^b | 0.11 kg/m² | 0.008 kg/m² | 0 kg/m² | 0.66 kg/m² |
| Wind speed 10 m AGL^b | 3.45 m/s | 0.69 m/s | 1.08 m/s | 6.93 m/s |
| Wind direction 10 m AGL^b | 190.52° | 35.00° | 77.38° | 335.61° |

  • Abbreviations: AGL, above ground level; LST, land surface temperature; MODIS, Moderate Resolution Imaging Spectroradiometer; NLDAS, North American Land Data Assimilation System.
  • ^a MODIS.
  • ^b NLDAS.

We utilize the percentage ((variable/total population) × 100) of each variable (except for per capita income where we used U.S. Dollars $) for all the selected counties standardized by their respective z-scores. These variables were chosen so that we could determine which individual social factors within the SVI were most related to COVID-19 cases and deaths. We also utilize the SVI composite score, which is an aggregate of the four individual themes of socioeconomic status, household composition and disability, non-White status and language, and housing type and transportation, for our hot-spot analysis.

3.2.3 Land Surface Temperature

Daily land surface temperature (LST) measurements were collected from the Moderate Resolution Imaging Spectroradiometer (MODIS) TERRA satellite system; MOD11A1.006 (Thome, 2020; Wan et al., 2015). MODIS data has a low spatial resolution (1 km) but a high (daily) temporal resolution. This remotely sensed data set, an emissivity-corrected land surface temperature image for both daytime and nighttime, was collected from Google Earth Engine™ using geemap (Gorelick et al., 2017; Wu, 2020). After collection, the daily values were averaged per month for each county in the conterminous United States, resulting in a monthly average for daytime and nighttime LST. Areas where cloud cover interfered with image acquisition were assigned an NA value. These monthly averaged data were standardized by z-score.
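The aggregation and standardization steps described above can be sketched in Python as follows. This is a minimal illustration, not the study's Google Earth Engine workflow: missing (cloud-obscured) days are represented as `None`, the population standard deviation is assumed for the z-score (the source does not say which form was used), and the county labels and temperatures are invented.

```python
from statistics import mean, pstdev


def monthly_mean(daily_values):
    """Average the valid daily values; return None if every day is missing."""
    valid = [v for v in daily_values if v is not None]
    return mean(valid) if valid else None


def z_scores(values):
    """Standardize values to z-scores, leaving missing entries as None.

    Assumption: population standard deviation (pstdev) is used.
    """
    valid = [v for v in values if v is not None]
    mu, sigma = mean(valid), pstdev(valid)
    return [None if v is None else (v - mu) / sigma for v in values]


# Hypothetical daily daytime LST (kelvin) for three counties in one month.
daily_lst = {
    "county_1": [290.0, None, 294.0],
    "county_2": [300.0, 302.0, None],
    "county_3": [None, None, None],  # fully cloud-covered: stays missing
}

monthly = [monthly_mean(days) for days in daily_lst.values()]
standardized = z_scores(monthly)
print(monthly)
print(standardized)
```

Keeping missing values explicit, rather than imputing them, mirrors the NA assignment described in the text.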

3.2.4 Meteorological Measurements

For additional environmental variables, we utilized the North American Land Data Assimilation System (NLDAS). NLDAS contains land surface model datasets available hourly at 1 km spatial resolution and is also accessible in Google Earth Engine™ (Cosgrove et al., 2003; Gorelick et al., 2017). All but one of the environmental variables listed in Table 2 were averaged by day and then by month for each county and standardized by their respective z-scores. However, for precipitation, we calculated a daily sum and then a monthly average before standardizing. After adjusting for multi-collinearity, the measurements for specific humidity, longwave radiation, shortwave radiation, and potential evaporation were removed. Specific humidity is the ratio of the mass of H2O(v) per total mass of the air parcel (kg/kg). This measurement is not a function of temperature and water content like relative humidity, but we still found a greater than 80% correlation across all counties throughout the time period with 2 m AGL temperature. The other extraneous environmental variables had correlation coefficients above 0.80 with 2 m AGL temperature.
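The collinearity screen described above can be illustrated with a short Python sketch that drops any candidate variable whose correlation with 2 m AGL temperature exceeds 0.80. The use of the Pearson coefficient and of its absolute value are assumptions, since the source does not specify either, and the variable names and values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def screen_collinear(reference, candidates, threshold=0.80):
    """Split candidates into kept and removed by |r| with the reference."""
    kept, removed = [], []
    for name, values in candidates.items():
        r = pearson(reference, values)
        (removed if abs(r) > threshold else kept).append(name)
    return kept, removed


# Hypothetical standardized monthly values for one county.
agl_temperature = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "specific_humidity": [1.1, 2.0, 2.9, 4.2, 5.0],  # tracks temperature
    "wind_speed": [3.0, 1.0, 4.0, 1.0, 5.0],         # weakly related
}

kept, removed = screen_collinear(agl_temperature, candidates)
print("kept:", kept, "removed:", removed)
```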

Table 2. Posterior Coefficients and 95% Credibility Intervals of Variables Utilized in the Model for COVID-19 Cases of Infection

| Variable | Mean | Standard deviation | 2.5% credibility interval | 97.5% credibility interval | Effect on risk (%) |
|---|---|---|---|---|---|
| Intercept | −0.65 | 0.01 | −0.66 | −0.63 | |
| Unemployed | −0.09 | 0.04 | −0.17 | −0.01 | −8.5 |
| Per capita income | 0.04 | 0.01 | 0.01 | 0.07 | 4.26 |
| No high school diploma | 0.19 | 0.02 | 0.16 | 0.23 | 21.44 |
| Age 65+ | −0.02 | 0.01 | −0.05 | 0 | −2.26 |
| Age 17− | 0.11 | 0.01 | 0.08 | 0.14 | 11.78 |
| Disabled | −0.07 | 0.01 | −0.1 | −0.04 | −6.73 |
| Single parent | 0.03 | 0.01 | 0.01 | 0.05 | 2.94 |
| Minority | 0.19 | 0.01 | 0.16 | 0.21 | 20.6 |
| Mobile home | −0.01 | 0.01 | −0.04 | 0.01 | −1.3 |
| Crowded living (>10) | −0.05 | 0.01 | −0.07 | −0.03 | −5.13 |
| No vehicle | −0.07 | 0.01 | −0.1 | −0.05 | −7.19 |
| Group quarters | 0.03 | 0.01 | 0.01 | 0.05 | 3.41 |
| Poverty | 0.09 | 0.04 | 0.01 | 0.18 | 9.6 |
| Multi-unit dwelling | 0.11 | 0.01 | 0.08 | 0.13 | 11.18 |
| Limited English | −0.05 | 0.01 | −0.08 | −0.03 | −5.1 |
| Uninsured | −0.01 | 0.01 | −0.03 | 0.01 | −1.03 |
| LST day | 0 | 0.01 | −0.02 | 0.02 | 0.45 |
| LST night | −0.13 | 0.01 | −0.14 | −0.11 | −11.82 |
| AGL temperature | −0.23 | 0.02 | −0.27 | −0.2 | −20.7 |
| Pressure | 0.12 | 0.01 | 0.09 | 0.14 | 12.34 |
| Precipitation | −0.04 | 0.01 | −0.06 | −0.03 | −4.07 |
| Wind direction | 0.11 | 0.01 | 0.09 | 0.13 | 11.79 |
| Wind speed | 0.03 | 0.01 | 0.01 | 0.04 | 2.84 |

  • Note. Values in bold are not statistically significant.
  • Abbreviations: AGL, above ground level; COVID-19, coronavirus disease 2019; LST, land surface temperature.

3.3 Modeling and Specification

3.3.1 Bayesian Spatial-Temporal Framework

This study utilizes a Bayesian hierarchical spatiotemporal modeling approach initialized within the freely available R Statistical platform and the R-INLA package (R: The R Project for Statistical Computing, 2021; Rue et al., 2009). Furthermore, all models developed in this study were executed within Indiana University's High-Performance Computing (HPC) environment (Research and high performance computing, 2020). Bayesian hierarchical modeling provides a flexible and robust framework where space-time components can be modeled in a straightforward manner. There are numerous introductions to Bayesian disease mapping to which we direct the novice (Best et al., 2005; Blangiardo et al., 2013; Lawson, 2013; Lawson & Lee, 2017; Moraga, 2020).

The Bayesian hierarchical methodology offers many benefits. For example, when creating disease models and relating counts to covariates, it is unreasonable to assume that one can collect all the necessary variables that account for a given response. The approach utilized here allows for the inclusion of these “unknown” covariates as random effects within the model (Bernardinelli et al., 1995; Best et al., 2005; Congdon, 2019). These effects, in the spatial-temporal domain, account not only for spatial structure (spatial autocorrelation) and noise (overdispersion), but for temporal correlation and interaction between space and time (Besag et al., 1991; Samat & Mey, 2017; Samat & Pei Zhen, 2017; Ugarte et al., 2014). Also, it is appropriate to utilize the standardized incidence rate (SIR), number of cases/expected number of cases, and SMR, number of deaths/expected number of deaths, for country-level disease modeling. However, when examining disease measurements at a finer level (i.e., county-level or smaller), the SIR/SMR, a surrogate for relative risk, can be unstable and suffer large fluctuations due to some areas possessing a small population relative to the incidence of disease. The Bayesian modeling approach utilized “smooths” values of relative risk through space and time, by “borrowing” information both locally and globally, thereby reducing the impact of these instabilities (Bernardinelli et al., 1995; Besag et al., 1991).

To model relative risk, the observed counts $Y_i$ (in this study, COVID-19 cases of infection and deaths) are modeled using a Poisson distribution with mean $E_i\theta_i$, where $E_i$ is the expected count and $\theta_i$ is the relative risk (RR) of area $i$. The logarithm of $\theta_i$ is the sum of an intercept $\alpha$ and random effects accounting for extra-Poisson variability:

$$Y_i \sim \mathrm{Poisson}(E_i\theta_i), \qquad \log(\theta_i) = \alpha + S_i + U_i$$

Here, $\alpha$ is the overall risk in the study area, and $S_i$ and $U_i$ are spatial random effects for area $i$ modeling the spatial dependency structure ($S$) and the unstructured uncorrelated noise ($U$). With the inclusion of covariates that determine risk (i.e., social vulnerability, environmental measurements) and/or other random effects, the overall spatial model can be represented as

$$\log(\theta_i) = d_i\beta + S_i + U_i$$

where $d_i$ is a vector consisting of the intercept term and the covariate values for area $i$, and $\beta$ is the corresponding coefficient vector ($\alpha$ and the fixed effects of the model). The parameters of the 27 fixed covariates included in this study are each assigned normal prior distributions.
A widely cited specification for the random spatial effects $S$ and $U$, the Besag, York, Mollié (BYM) model, is regularly utilized in disease mapping studies (Besag et al., 1991). In the BYM model, the spatial random effect $S$ is assigned a conditional autoregressive (CAR) distribution, smoothing the associated data based on a specified neighborhood structure, where neighbors are defined as areas sharing a common border:

$$S_i \mid S_{-i} \sim N\left(\frac{1}{n_i}\sum_{j \sim i} S_j,\ \frac{\sigma_S^2}{n_i}\right)$$

where $j \sim i$ denotes the areas neighboring area $i$, $n_i$ is the number of those neighbors, and $\sigma_S^2$ is the variance of the structured component.

The unstructured component $U$ is modeled as independent and identically distributed (IID) with mean of zero and variance $\sigma_U^2$, that is, $U_i \sim N(0, \sigma_U^2)$. Therefore, data is shared both locally through the $S$ component and globally through the $U$ component.
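To make the neighborhood-based smoothing concrete, the Python sketch below computes the conditional mean and variance of the structured effect for one area under an intrinsic CAR specification, in which the conditional mean is the average of the neighboring values and the conditional variance shrinks with the number of neighbors. The adjacency list, effect values, and variance are hypothetical.

```python
def car_conditional(area, neighbors, effects, sigma2):
    """Conditional mean and variance of one area's structured effect.

    Intrinsic CAR: mean is the average over bordering areas,
    variance is sigma2 divided by the number of bordering areas.
    """
    bordering = neighbors[area]
    cond_mean = sum(effects[j] for j in bordering) / len(bordering)
    cond_var = sigma2 / len(bordering)
    return cond_mean, cond_var


# Hypothetical adjacency (areas sharing a border); it must be symmetric.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}
effects = {"A": 0.2, "B": -0.1, "C": 0.4, "D": 0.0}

cond_mean, cond_var = car_conditional("B", neighbors, effects, sigma2=0.9)
print(cond_mean, cond_var)
```

Area B's own value (−0.1) is pulled toward the average of its three neighbors, which is how the structured component borrows information locally.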

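In this study, we follow the parameterization of the BYM model proposed by Simpson et al. (2015) that enables assigning penalized complexity (PC) priors. This so-called “BYM2” model incorporates a scaled spatially structured and unstructured component ($S^*$ and $U^*$) and is defined as:

$$\log(\theta_i) = d_i\beta + b_i, \qquad b_i = \frac{1}{\sqrt{\tau_b}}\left(\sqrt{1-\varphi}\,U_i^* + \sqrt{\varphi}\,S_i^*\right)$$

where $\tau_b$ is the precision of the combined spatial effect $b_i$.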

The mixing parameter $\varphi$ ($0 \le \varphi \le 1$), which balances $S^*$ and $U^*$, measures the proportion of variance explained by $S^*$. This scales the BYM2 model, making it equal to the purely spatial model when $\varphi = 1$ and equal to only unstructured spatial noise when $\varphi = 0$ (Riebler et al., 2016). We set priors for these parameters following suggestions by Simpson et al. (2015). PC priors, as their name suggests, penalize model complexity. In this case, they penalize based on the degree to which a given model deviates from a foundational assumption of no spatial dependency ($\varphi = 0$). Conjoining the random spatial effects for each area ($S^* + U^*$) is termed the convolutional spatial component. The exponential of the convolutional spatial effect, $\exp(S^* + U^*)$, provides the combined RR contribution of the random spatial effects. Repeating this procedure for either $S^*$ or $U^*$ provides the relative contribution of each and allows the determination of the comparative contribution to variance (spatial fraction). This scaling parameterization makes the BYM2 representation more interpretable between models than the unscaled BYM model.
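The role of the mixing parameter can be illustrated with a small Python sketch of the BYM2 combination as it is commonly written following Riebler et al. (2016): the structured and unstructured components are weighted by √φ and √(1 − φ) and scaled by a single precision. The precision symbol `tau` and all numeric values are assumptions introduced for the example and do not come from the study.

```python
from math import exp, sqrt


def bym2_effect(s_star, u_star, phi, tau):
    """Combined spatial effect for one area under the BYM2 weighting.

    phi = 1 keeps only the structured part; phi = 0 keeps only noise.
    tau is the precision of the combined effect (hypothetical symbol).
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    return (sqrt(phi) * s_star + sqrt(1.0 - phi) * u_star) / sqrt(tau)


# Hypothetical scaled components for one county.
s_star, u_star = 0.5, -0.3

purely_spatial = bym2_effect(s_star, u_star, phi=1.0, tau=1.0)
purely_noise = bym2_effect(s_star, u_star, phi=0.0, tau=1.0)
mixed = bym2_effect(s_star, u_star, phi=0.5, tau=4.0)

# Exponentiating gives the multiplicative contribution to relative risk.
print(purely_spatial, purely_noise, mixed, exp(mixed))
```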

The above specification for $\log(\theta_i)$ can be extended into the spatial-temporal domain by the addition of further random effects:

$$\log(\theta_{it}) = d_{it}\beta + b_i + \gamma_t + \omega_t + \delta_{it}$$

Here, $b_i$ is the convolutional spatial effect of area $i$, and $\gamma_t$ and $\omega_t$ represent the temporally structured and temporally unstructured random effects, respectively. Typically, $\gamma_t$ is modeled as a CAR random walk of either order one or two (RW1, RW2), but there can be additional specifications (e.g., seasonal). In the present study, we model the temporally structured effect as RW1 and specify $\omega_t$ as a Gaussian exchangeable IID effect, $\omega_t \sim N(0, \sigma_\omega^2)$. The space-time interaction component $\delta_{it}$ represents a parameter vector that varies jointly through space and time. This vector allows for deviations from the space and time structure that express both dynamic spatial changes from one time frame to another and active temporal patterns from one area to another (Knorr-Held, 2000). Therefore, mapping $\delta_{it}$ characterizes short-term clusters of disease activity that deviate from the space-time average over the study area at time $t$.
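A first-order random walk penalizes differences between consecutive time points. As a minimal Python sketch of that idea, the function below builds the RW1 structure matrix for the 10 monthly time points of the study period; the precision hyperparameter and any scaling are omitted, and the function name is a hypothetical helper rather than part of any package.

```python
def rw1_structure(n_times):
    """Structure matrix of a first-order random walk over n_times points.

    Each consecutive pair (t, t + 1) adds 1 to both diagonal entries
    and subtracts 1 from the two off-diagonal entries linking them.
    """
    q = [[0] * n_times for _ in range(n_times)]
    for t in range(n_times - 1):
        q[t][t] += 1
        q[t + 1][t + 1] += 1
        q[t][t + 1] -= 1
        q[t + 1][t] -= 1
    return q


# March through December 2020: 10 monthly time points.
q_rw1 = rw1_structure(10)
for row in q_rw1:
    print(row)
```

Every row sums to zero, which reflects that the random walk constrains only month-to-month differences and leaves the overall level to the intercept.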

3.3.2 Ecological Regression

The covariates representing SVI and environmental measurements (after correcting for multi-collinearity) were included in the models as fixed effects to examine which measurements are intricately linked to the spatiotemporal processes of the pandemic in the United States. This study opted to include all variables that logically fit into the framework of vulnerability regardless of their statistical significance, provided there are limited issues with multi-collinearity. In addition to accounting for the variables, much of this decision is based on the potentially poor inference generated by utilizing a stepwise framework and on the Deviance Information Criterion (DIC) not significantly decreasing when the variables were removed (Greenland et al., 2016; Huberty, 1989). Even though a particular variable might not be statistically significant, it is nonetheless important to see its effect in the model and to compare between COVID-19 cases and deaths, whereas an alternative approach aimed at reducing variables based on their significance might result in less comparable models. Furthermore, since the response is logarithmic, we calculate the exponential of the mean of each β coefficient and subtract 1.00 to determine each variable's effect on relative risk.
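
The conversion from a coefficient to a percent effect on relative risk can be written as a one-line function. The sketch below applies it to three posterior means as printed in Table 3. The function name is hypothetical, and because the printed means are rounded to two decimals, the results are expected to differ slightly from the tabulated effects.

```python
import math

def percent_effect(beta_mean):
    """Percent change in relative risk for a 1 SD increase in a covariate."""
    return (math.exp(beta_mean) - 1.0) * 100.0

# Rounded posterior means as printed in Table 3 (deaths model).
for name, beta in [("No high school diploma", 0.30),
                   ("Minority", 0.34),
                   ("AGL temperature", -0.31)]:
    print(f"{name}: {percent_effect(beta):+.2f}%")
```

A coefficient of zero corresponds to no change in risk, positive coefficients to an increase, and negative coefficients to a decrease.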

3.3.3 Model Selection Criteria

DIC was employed to select the most parsimonious model (Spiegelhalter et al., 2014). During exploratory data analysis, we examined two different prior specifications on the covariates: a normal prior and a uniform prior.

These resulted in minimal changes to the models, and we selected the specification for the covariates that produced the lowest DIC score: in this case, the normal prior, which produced a DIC score ∼11 less than the uniform specification (DIC Cases: Normal 243,958.04 vs. Uniform 243,969.28; DIC Deaths: Normal 90,824.00 vs. Uniform 90,835.76), a somewhat significant reduction (Spiegelhalter et al., 2014). Furthermore, we utilized all four types of spatial-temporal interactions suggested by Knorr-Held (2000) and found that Type I best fit our data, temporal stratification, and modeled process. Therefore, in our modeling approach, we imposed no restrictions on when or where a space-time anomaly could occur (Type I spatial-temporal interaction).
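
The size of the DIC reduction follows directly from the reported scores. The short sketch below computes the difference for each outcome. The dictionary layout and function name are illustrative only.

```python
# DIC values reported in the text for the two covariate prior specifications.
dic = {
    "cases": {"normal": 243958.04, "uniform": 243969.28},
    "deaths": {"normal": 90824.00, "uniform": 90835.76},
}

def dic_reduction(scores):
    """How much lower the normal-prior DIC is than the uniform-prior DIC."""
    return round(scores["uniform"] - scores["normal"], 2)

reductions = {outcome: dic_reduction(scores) for outcome, scores in dic.items()}
```

The reduction is 11.24 for cases and 11.76 for deaths, so the preference for the normal prior is consistent across both models.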

3.4 Disease Mapping

Another key benefit of Bayesian inference is the creation of the posterior distribution, from which one can generate the probability of exceeding a certain threshold, the so-called exceedance probability. In this study, we plot the probability that a county exceeds the national average in all our maps. Furthermore, these maps may look different than many of those that are readily available for COVID-19 because we have normalized cases and deaths by the expected number of each in the models. For this purpose, we mapped the convolutional spatial effect, S* + U*, and the spatially structured effect, S*, at the county level. These two effects are active throughout the study period. Also, we plotted the modeled space-time interaction, which represents short-term clusters of activity (month long in our case) relative to the study-area average at time t. In these maps, we followed the classification rule proposed by Richardson et al. (2004): areas where the exceedance probability is greater than or equal to 0.80 are considered hot spots, areas with intermediate probabilities are considered statistically similar to the national average, and areas with low exceedance probabilities signify cold spots, that is, areas with infection/mortality rates below the national average. The SVI composite scores of counties considered hot spots based on the convolutional spatial effect, the spatially structured effect, and the space-time interaction component are compared with those of the combined average and cold spot areas, that is, areas where the exceedance probability is less than 0.80 (non-hot spot areas). In other words, we compared the SVI composite score of hot and non-hot spot counties.
This analysis was done on model outputs that were singular for the entire study period (convolutional spatial effect and spatially structured effect) and on monthly outputs (space-time interaction component). This assessment, utilizing notched box-plots, is employed to examine differences in the SVI composite score between the affected areas and, in the case of the space-time interaction component, different times. Although the models do include individual variables that ultimately make up the SVI composite score, we felt it was very important to illustrate the differences in social vulnerability between the hot and non-hot spot areas.
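
The hot spot rule can be sketched as follows, assuming posterior draws of relative risk are available for each county. The study may instead have derived these probabilities from posterior marginals. The 0.80 cutoff and the 1.00 threshold come from the text. The function names and the example draws are hypothetical.

```python
def exceedance_probability(rr_samples, threshold=1.0):
    """Share of posterior relative-risk draws that exceed the threshold."""
    return sum(1 for rr in rr_samples if rr > threshold) / len(rr_samples)

def is_hot_spot(rr_samples, cutoff=0.80):
    """Hot spot if Pr(RR > 1) >= cutoff; everything else is a non-hot spot."""
    return exceedance_probability(rr_samples) >= cutoff

# Hypothetical posterior draws of relative risk for two counties.
county_a = [1.30, 1.12, 0.97, 1.25, 1.40, 1.08, 1.19, 1.33, 1.02, 1.21]
county_b = [0.85, 1.05, 0.92, 0.78, 1.10, 0.95, 0.88, 1.01, 0.90, 0.83]

prob_a = exceedance_probability(county_a)  # 9 of 10 draws exceed 1.00
prob_b = exceedance_probability(county_b)  # 3 of 10 draws exceed 1.00
```

Under this rule the first county would be classed as a hot spot and the second as a non-hot spot.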

3.5 Limitations and Caveats

A potential limitation of this study, though likely not a significant constraint, is the lack of greater temporal resolution with regard to the SVI. In the case of the CDC's SVI, the index is calculated either yearly or every other year. This study focuses on COVID-19 on a monthly basis and there is no available capability of measuring changes in social vulnerability at that temporal resolution. That considered, it is likely that many of the individual variables that are used to define social vulnerability do not change dramatically from one month to the next (Flanagan et al., 2011; Neelon et al., 2020; Shi & Stevens, 2021). Also, the internal limitations present in the SVI extend to our analysis (Bakkensen et al., 2017; Rufat et al., 2019; Tate, 2012).

Another potential limitation is the number of zeros the data set for COVID-19 fatalities contains, at least in the initial months. It is worth noting that the cases data set contains 2,097 zero values (6.75% of total values) with steadily decreasing numbers from 997 zeros in March 2020 to 2022 zeros in December 2020. In the mortality data set, the zeros are potentially more of an issue with 13,108 zero values (42.20%), but with steadily decreasing numbers from 2,607 in March 2020 to 328 in December 2020. We could have opted for a Zero-Inflated Poisson model, especially for deaths, but decided to keep the hierarchical specifications and structure as consistent as possible between COVID-19 cases and deaths. By doing so, we eliminate a level of complexity within the model and maintain their comparability. However, future research, that focuses specifically on the spatiotemporal nature of COVID-19 mortalities should certainly examine a Zero-Inflated Poisson approach.
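
For readers unfamiliar with the alternative mentioned above, a zero-inflated Poisson distribution mixes a point mass at zero with an ordinary Poisson count. The sketch below is a generic illustration of that probability mass function and is not part of the models fitted in this study. The parameter values are hypothetical.

```python
import math

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: a structural zero with prob. pi_zero, else Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson

# Hypothetical values: expected count 1.5, 30% structural zeros.
p_zero_plain = zip_pmf(0, lam=1.5, pi_zero=0.0)  # ordinary Poisson
p_zero_zip = zip_pmf(0, lam=1.5, pi_zero=0.3)    # inflated zero mass
total = sum(zip_pmf(k, 1.5, 0.3) for k in range(50))
```

The extra zero mass is what would let such a model absorb the large share of county-months with no recorded deaths.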

A further limitation is that the rates for cases and deaths are not age-adjusted. Given the data set collected, there was no information available to age-adjust the rates (i.e., the data were not stratified by age). However, in the models, we do include as an independent variable, the percentage of population age 65 and over. This inclusion should help account for variations in the aged population that might contribute to higher numbers of deaths.

The use of county-level data could be considered a limitation. Additionally, the environmental variables are averaged over the extent of each county. There are likely to be important sub-county-level variations in infections and deaths that could produce different spatiotemporal associations if a study were conducted at a finer spatial scale (i.e., zip code, census tract, and block group).

Finally, we decided not to place temporal lags on the environmental variables in relation to COVID-19 cases and deaths. Estimates for a latency in exposure to COVID-19 and onset of symptoms range from 2 to 24 days (CDC, 2020; Grant et al., 2020). The WHO estimates that there is a temporal lag of 2–8 weeks from the onset of symptoms to death in the most severe cases (Baud et al., 2020; Woolf et al., 2021). Given these large ranges of temporal associations and the aggregation of the data by month, we opted to compare SVI and environmental measurements on the date where a case or death was reported. Therefore, the coefficients should be interpreted in the proper context and with this consideration in mind.

4 Results

4.1 Temporal Trends in COVID-19 Cases and Deaths

COVID-19 cases by month per 100,000 people are presented in Figure 1a. Cases steadily increased until October, with an exponential increase through October until the end of 2020. Deaths from COVID-19, presented in Figure 1b, increased exponentially between March and April, then decreased and remained fairly stable through October, with another exponential increase in November and December.

Figure 1. (a) Cases of coronavirus disease 2019 (COVID-19) per 100,000 population in the United States for the year 2020 (crimson line = locally weighted scatterplot smoothing [LOWESS] line of cases). (b) Deaths from COVID-19 per 100,000 population in the United States for the year 2020 (crimson line = LOWESS line of deaths).

4.2 Spatiotemporal Ecological Regression

The coefficients resulting from the spatiotemporal ecological regression model for COVID-19 infections are presented in Table 2. The variables grayed out are considered not statistically significant since 0 falls within the 95% credibility interval (Wang et al., 2018). The variables unemployed, age 65 and up, disabled, crowded living, no vehicle, limited English, LST nighttime, AGL temperature, and precipitation have a negative relationship with COVID-19 infection risk. The strongest effect on cases is from the variable “no high school diploma” with a 21.40% increase in risk from a 1 standard deviation (6.34%) increase. AGL temperature is the second strongest with a 20.70% decrease in risk resulting from an 8.58 K (1 standard deviation) increase. The remaining effects of each variable are presented in Table 2.

Table 3 contains the coefficient results from the posterior distributions of the ecological regression model for COVID-19 fatalities. The variables with the strongest impact (percent change) on risk of COVID-19 death are non-White status (40.19%), no high school diploma (35.21%), AGL temperature (−26.68%), age 65 and over (23.93%), and age 17 and under (19.42%) after a 1 standard deviation increase in each variable, respectively. Unemployed, single parent, mobile home, no vehicle, group quarters, uninsured, LST daytime, and wind speed were not statistically significant.

Table 3. Posterior Coefficients and 95% Credibility Intervals of Variables Utilized in the Model for COVID-19 Deaths
| Variable | Mean | Standard deviation | 2.5% credibility interval | 97.5% credibility interval | Effect on risk (%) |
| --- | --- | --- | --- | --- | --- |
| Intercept | −1.88 | 0.05 | −1.99 | −1.79 | |
| Unemployed | −0.06 | 0.07 | −0.2 | 0.08 | −5.57 |
| Per capita income | 0.06 | 0.02 | 0.02 | 0.11 | 6.65 |
| No high school diploma | 0.3 | 0.03 | 0.25 | 0.36 | 35.21 |
| Age 65+ | 0.21 | 0.02 | 0.17 | 0.26 | 23.93 |
| Age 17− | 0.18 | 0.03 | 0.13 | 0.23 | 19.42 |
| Disabled | −0.09 | 0.02 | −0.13 | −0.04 | −8.37 |
| Single parent | 0.07 | 0.02 | 0.03 | 0.11 | 7.05 |
| Minority | 0.34 | 0.02 | 0.3 | 0.38 | 40.19 |
| Mobile home | 0.01 | 0.02 | −0.03 | 0.05 | 1.09 |
| Crowded living (>10 per hh) | −0.09 | 0.02 | −0.13 | −0.05 | −8.54 |
| No vehicle | −0.02 | 0.02 | −0.05 | 0.01 | −1.78 |
| Group quarters | 0.03 | 0.02 | −0.01 | 0.06 | 3.02 |
| Poverty | 0.07 | 0.07 | 0.02 | 0.21 | 7.06 |
| Multi-unit dwelling | 0.11 | 0.02 | 0.07 | 0.15 | 11.29 |
| Limited English | −0.14 | 0.02 | −0.19 | −0.1 | −13.35 |
| Uninsured | −0.01 | 0.02 | −0.05 | 0.03 | −0.73 |
| LST day | 0 | 0.02 | −0.03 | 0.04 | 0.48 |
| LST night | −0.12 | 0.01 | −0.15 | −0.09 | −11.14 |
| AGL temperature | −0.31 | 0.03 | −0.37 | −0.25 | −26.68 |
| Pressure | 0.14 | 0.02 | 0.1 | 0.17 | 14.66 |
| Precipitation | −0.06 | 0.01 | −0.08 | −0.03 | −5.62 |
| Wind direction | −0.07 | 0.01 | −0.1 | −0.04 | −6.99 |
| Wind speed | −0.02 | 0.01 | −0.04 | 0.01 | −1.73 |
  • Note. Values in bold are not statistically significant.
  • Abbreviations: COVID-19, coronavirus disease 2019; LST, land surface temperature.
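
The significance criterion used for the tables, whether 0 falls within the 95% credibility interval, can be checked mechanically. The sketch below applies it to a handful of rows copied from Table 3. The function and variable names are illustrative.

```python
# (mean, 2.5% bound, 97.5% bound) for a few rows copied from Table 3.
rows = {
    "Unemployed": (-0.06, -0.20, 0.08),
    "No high school diploma": (0.30, 0.25, 0.36),
    "No vehicle": (-0.02, -0.05, 0.01),
    "AGL temperature": (-0.31, -0.37, -0.25),
    "Wind speed": (-0.02, -0.04, 0.01),
}

def interval_excludes_zero(lower, upper):
    """True when the 95% credibility interval lies entirely on one side of 0."""
    return lower > 0 or upper < 0

significant = {name: interval_excludes_zero(lo, hi)
               for name, (_, lo, hi) in rows.items()}
```

Of these five rows, only no high school diploma and AGL temperature have intervals that exclude 0.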

4.3 Modeled Temporal Trend

Figure 2 exhibits the temporally structured (γt) and unstructured (ωt) effects for both cases and deaths. The left panel shows the modeled temporal effects for cases with all covariates, following the random walk order-1 specification for the structured effect and the IID specification for the unstructured effect. The structured component shows an increase in relative risk until August with a decrease for the remainder of the year. The unstructured effect fluctuates between slightly above and slightly below 1.00, with its 95% credibility envelope easily encompassing 1.00. In the right panel, for deaths, γt increases until June, drops in July, increases until September, and drops until the end of the study period. Similar to the cases panel, ωt fluctuates between slightly above and slightly below 1.00.

Figure 2. Posterior temporally structured (γt, light blue) and unstructured (ωt, light red) effects, with 95% credibility envelopes, for coronavirus disease 2019 cases (panel a) and deaths (panel b) in the United States by month for the year 2020.

4.4 Spatial Effects

The convolutional spatial effect and the spatially structured effect for COVID-19 cases are mapped in Figure 3a. These figures show the probability that the relative risk exceeds 1.00, the national average throughout the study period. There is a strong clustering of high probability for the convolutional spatial effect in Florida, Alabama, Mississippi, Louisiana, Arkansas, Tennessee, Iowa, and Arizona. There is sporadic clustering of high probability, most prominent of which is in Indiana, Kansas, and Colorado. There is significant clustering of low probability areas in the Northeast, Pacific Northwest, Upper Atlantic Coast, Upper Midwest, Michigan, and West Virginia. The spatially structured effect for cases follows a similar pattern to the convolutional effect as it explains 82.7% of the variance in the overall spatial effects. Key differences include some higher probabilities in Connecticut and Iowa. Probability is lower in Southern California, Michigan, and New Mexico.

Figure 3. Exceedance probabilities of the convolutional spatial effect (S* + U*) and spatially structured effect (S*) relative risk associated with coronavirus disease 2019 (COVID-19) infection (panel a) and COVID-19 mortality (panel b).

Figure 3b shows exceedance probabilities for the convolutional spatial effect and the spatially structured effect for deaths. There is a strong clustering of high probabilities in New Mexico, Indiana, Louisiana, Eastern Pennsylvania, and the Northeast megalopolis. Higher probabilities are scattered throughout the Southeast and portions of the Midwest into Montana, Idaho, and Eastern Oregon. The spatially structured effect accounts for 60.9% of the variance for (S*+U*), so similarities are expected, although not as high of a degree as in the spatial effect for cases. The most notable difference is the increase in clustering in the Southeast, the increase in Indiana, and less sporadic dispersion of counties in the Midwest into the Mountain West.

4.5 Spatiotemporal Interaction

4.5.1 Cases

Exceedance probabilities using 1.00 (the national average) as a threshold for the spatiotemporal interaction term are presented in Figure 4, for cases, and Figure 5, for deaths. Initial clustering of high probabilities in March is noted in the Northeast, especially New York, Florida, Louisiana, and counties containing some of the major metropolitan areas around the country (e.g., Atlanta, Denver, Detroit, Chicago, Indianapolis, San Diego, Los Angeles, San Francisco, Portland, and Seattle). Lower probabilities are less stable than in April and this is likely due to the average risk being so low at this point in time. In April, the areas noted previously in March have expanded in what appears to be a diffusion pattern, in many cases doubling in extent. Lower probability areas are more prominent and are focused in the Upper Midwest south through the Great Plains into Texas. There is another notable area of low probabilities in Ohio south through West Virginia, west into Kentucky, and further south into Tennessee. In May, areas previously noted at high probability have remained fairly stable, with a notable decrease in probability in upper New York, the Upper Northeast, and Michigan. There is a crescent of lower probability extending from Western Pennsylvania, through West Virginia and west to Kansas and Oklahoma. There is a notable increase in cases in Minnesota and upper Iowa.

Figure 4. Probabilities of the space-time interaction term exceeding 1.00 during the year 2020 for United States coronavirus disease 2019 infections, stratified monthly.

By June, there are some significant changes to areas of high probability. The pandemic seems to have settled much more into the southern United States, focusing again in Florida, Alabama, Mississippi, Louisiana, eastern Texas, and South and North Carolina. Probabilities have decreased in the Northeast and throughout much of the Midwest apart from much of Iowa and southern Minnesota. High probabilities remain in Arizona and have expanded into Utah, Nevada, and much of California. July shows a further solidification of the pandemic in the southern United States extending from the Atlantic to the Pacific coast. Arizona northward into Utah and Idaho has joined this high probability area. Metropolitan areas in Minnesota, Wisconsin, Ohio, and Pennsylvania are showing renewed higher probabilities. The Northeast and the middle Midwest are the lowest probability areas, with Illinois and Indiana continuing a trend of decreasing activity. By August, the pandemic is shifting out of the southern United States and into the Midwest with increases from Tennessee into Minnesota, North Dakota, and South Dakota. The Mountain West is exhibiting an increase in activity as is much of California. Arizona and New Mexico are showing a decrease in probability and the Northeast remains firmly in the lower category.

September witnesses the pandemic lessening in the southern United States, but the increases previously noted in the Upper Midwest have become even more pronounced, with Missouri, Illinois, Iowa, Wisconsin, Minnesota, Kansas, Nebraska, and North and South Dakota heavily burdened. The pandemic continues to lessen in Arizona and California. By October, the pandemic continues to rage in the Upper Midwest, affecting many of the counties in the states from Wisconsin to Idaho. There is a notable lessening in southern Minnesota and Iowa. The pandemic continues to subside in the southern United States and remains stable in the Northeast. November expresses a further strengthening in the Upper Midwest with areas previously showing a lessening pattern overrun by cases. Much of the United States is affected apart from the southern United States, California, Arizona, Washington, and the Northeast. Through December, the pandemic has diminished in the Upper Midwest and shifted with higher probabilities into the Northeast and Texas and has further reasserted itself on the Pacific Coast. The Upper Midwest and extreme South continue a waning effect apart from Florida, which displays a resurgence.

4.5.2 Deaths

The spatiotemporal interaction term probability exceedances for deaths are shown in Figure 5. As expected, there is not much activity in March apart from a few deaths in some major cities. By April, deaths begin to show in many of the major metropolitan areas of the United States with a clustering of high probabilities in the New York City area, Chicago, Detroit, Indianapolis, Atlanta, San Diego, Los Angeles, San Francisco, Portland, and Seattle. Through May, many of these areas of high probability have expanded in a similar apparent diffusion pattern to cases a month or so earlier. Much of the Northeast megalopolis is affected along with Detroit, Cleveland, Pittsburgh, Chicago, Indianapolis, Nashville, Birmingham, New Orleans, and counties in New Mexico, Arizona, and Southern California. In June, many of the counties around these same cities have become even more heavily burdened, with notable clusters in the Northeast, Ohio, eastern Michigan, northern Illinois, central Indiana, central Mississippi, Arizona, and southern California. Through July, many of these areas are showing a decrease in deaths apart from the Northeast megalopolis, Chicago, central Indiana, Arizona, and southern California.

Figure 5. Probabilities of the space-time interaction term exceeding 1.00 during the year 2020 for United States coronavirus disease 2019 mortalities, stratified monthly.

Through August, probabilities of high deaths have shifted to the southern United States, while lessening in the Northeast and Midwest. The probabilities have strengthened in Arizona and much of California. September witnesses a further strengthening of probabilities in the southern United States affecting much of South Carolina, southern Georgia, and much of Florida. The pandemic continues to strengthen in Arizona and California. The waning trend has continued throughout much of the upper Northeast and the Ohio Valley. By October, these shifts continue with sporadic higher probabilities throughout the southern United States. The lessening continues in the Ohio Valley and upper Northeast as Arizona begins a trend of diminishing deaths. November witnessed a shift in the probability of deaths into the upper Midwest, as suspected, considering cases witnessed a similar trend a few months prior. The pandemic begins to lessen in the southern United States, but contains some sporadic high probabilities. However, the shift to the north is apparent. Lessening continues in Arizona and California with the same effect stable in the upper Northeast. Through December, the shift into the upper Midwest is even more evident with the Mountain West now included. December further witnesses a resurgent trend in the upper Northeast, especially northern New York, Vermont, New Hampshire, and Maine. The continued lessening in the southern United States, Arizona, and California is noteworthy.

4.6 Comparisons of Composite Vulnerability in Hot and Cold Spots

Figure 6a displays the boxplots comparing hot and cold spots for the convolutional spatial effect (S* + U*) for COVID-19 infections. The light gray distribution is for areas with a probability less than 0.80 of exceeding 1.00 (cold spots) and the red distribution is for areas where the probability is greater than or equal to 0.80 of exceeding 1.00 (hot spots). Hot spot areas have a significantly higher SVI composite score compared to the low probability areas. Figure 6b illustrates the boxplot comparing areas delineated in the same way to the composite SVI score for COVID-19 fatalities. Likewise, areas with a higher probability of COVID-19 deaths have a statistically significant higher SVI composite score. The composite SVI score for the spatially structured effect is similarly higher in areas of higher probability, which is expected based on the percent of variance explained in the convolutional effect by that component.

Figure 6. Boxplot of Social Vulnerability Index composite score for hot and cold spots for the convolutional spatial effect for coronavirus disease 2019 (COVID-19) infections (plot a) and COVID-19 fatalities (plot b).

Taken in the spatiotemporal context, the relationship between higher SVI composite scores and the hot and cold spots is not as straightforward. Figures 7a and 7b display boxplots comparing the SVI composite score for the areas by month that have a high probability of exceeding 1.00 within the spatiotemporal interaction component. The individual plots are delineated in the same way as above. Comparing the distributions for SVI composite scores to cases (7a) from April through August, the score is higher and statistically significant in hot spot counties. Similarly, for deaths (7b), the SVI score is higher for the counties involved for the months July–October. The mean SVI score for the hot spots during these months nears or exceeds the third quartile value in the cold spot counties. November also witnesses a higher but not statistically significant SVI score, as interpreted via the notches in the boxplots.
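
Reading significance from boxplot notches can be made concrete with the conventional notch formula, median ± 1.57 × IQR / √n. Non-overlapping notches are usually taken as evidence that two medians differ. The sketch below assumes that convention, and the SVI scores in it are hypothetical values, not data from the study.

```python
import math
import statistics

def notch_interval(values):
    """Approximate 95% interval around the median: median +/- 1.57 * IQR / sqrt(n)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    med = statistics.median(values)
    return med - half_width, med + half_width

def notches_overlap(a, b):
    """Overlapping notches suggest the medians are not clearly different."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

# Hypothetical SVI composite scores, not values from the study.
hot = [9.1, 9.8, 10.2, 10.5, 10.9, 11.3, 11.6, 12.0, 12.4, 12.9]
non_hot = [6.0, 6.5, 7.1, 7.4, 7.9, 8.2, 8.6, 9.0, 9.3, 9.9]

differs = not notches_overlap(hot, non_hot)
```

In this made-up example the notches do not overlap, which is the pattern described for the months in which hot spot counties have significantly higher SVI scores.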

Figure 7. Boxplots organized by month, comparing cold spots and hot spots of coronavirus disease 2019 cases (a) and deaths (b) to composite Social Vulnerability Index score.

5 Discussion

5.1 Relationships Between Social Vulnerability Variables and COVID-19 Infections and Deaths

The top five contributors to county-level risk of infection are no high school diploma (21.40%), non-White status (20.60%), age 17 and under (11.78%), multi-unit dwelling (11.18%), and poverty (9.60%). Non-White status (40.19%), no high school diploma (35.21%), age 65 and over (23.93%), age 17 and under (19.42%), and multi-unit dwelling (11.29%) are the top five contributors to county-level risk of death. Apart from the clear difference in contribution from age 65 and over, there are other differences, primarily in magnitude, in the relative contribution of other variables.

Age 65 and over is the variable that is most significantly different in its impact between cases and deaths. For cases, age 65 and over seems to have a negative effect on risk (−2.26%), but that effect is not statistically significant. In relationship to deaths, this variable increases risk by 23.93%. Given that we know age 65 and over is a significant individual risk factor for COVID-19 mortality, this is not completely surprising (Woolf et al., 2021). Also, given that there has been significant outreach to the older community to prepare and educate on the risks from COVID-19, the community-level relationship of the variable to cases is understandable.

Non-White status has a much stronger impact on the risk of death (+40.19%) than on infection (+20.60%). This further demonstrates that not only is the health burden from COVID-19 significant in the non-White community but the burden of mortality is significantly greater than it is for infection. These findings support prior studies demonstrating COVID-19 having a disparate effect in non-White communities (Karaye & Horney, 2020; Khazanchi et al., 2020; LaVeist, 2005; Shi & Stevens, 2021; Singu et al., 2020). Poverty is a significant contributor to relative risk for cases (9.60%) and deaths (7.06%). There are many factors that could potentially contribute to risk in poorer communities. Often, these areas coincide spatially with communities of color (non-White status) and the effect is potentially, at least partially, being noticed in this analysis.

On the opposite end of the risk spectrum, unemployment is the community-level variable that has the strongest negative effect on risk from cases (−8.50%); its effect on deaths (−5.57%) is also negative but not statistically significant. Assuming that an unemployed individual has less potential to be exposed to an infected individual, this relationship makes sense intuitively. Not possessing a vehicle has a greater effect on lowering risk for infection (−7.19%) than it does for death, where the risk is lowered (−1.78%) but the effect is not statistically significant. Similar to unemployment, not having a vehicle could potentially mean less social contact and a lower risk of exposure. Disabled status lowered risk for infection (−6.73%) and death (−8.37%). Potentially, those who are disabled have a designated caretaker who helps lower the risk of contact with an infected individual. Crowded living is another variable that has a lessening effect on risk of cases (−5.13%) and deaths (−8.54%). Crowded living is defined as greater than or equal to 10 persons per household. The relationship with risk from this variable is difficult to explain, as one would assume crowded living conditions could contribute to higher degrees of risk. Perhaps there are sub-family level interactions that lower risk (e.g., one person designated as the consumer for the group).

Our findings are especially supportive of the conclusions of Karaye and Horney (2020) related to non-White status, but not as supportive of their findings related to limited English or the variables that make up the themes of household composition, transportation, housing, and disability. This lack of support is likely due to their study focusing on cases through May 12, 2020, so the amount of data and the timeframe of investigation differ. Dasgupta et al. (2020) found that counties with high percentages of non-White status and crowded housing were more likely to become COVID-19 hot spots during June and July 2020. Our study supports these findings for non-White status, multi-unit dwelling, and group quarters throughout the year 2020.

5.2 Relationships Between Environmental Variables and COVID-19 Infections and Deaths

There are differences in the relative contribution to risk from the environmental variables but they are primarily differences in magnitude. Of the seven environmental variables examined, six have a statistically significant relationship with COVID-19 cases and five with COVID-19 deaths. AGL temperature has the strongest impact on risk to cases (−20.70%) and deaths (−26.68%). LST nighttime temperature increases lower risk for cases (−11.82%) and deaths (−11.14%). These findings support research that points to increases in temperature lowering the risk of COVID-19 infections (Haque & Rahman, 2020; Prata et al., 2020; Rouen et al., 2020; Sarkodie & Owusu, 2020; Shi et al., 2020). Our precipitation finding (−4.07% for cases and −5.62% for deaths) contradicts Sarkodie and Owusu (2020), where they found a positive association between precipitation and COVID-19 cases. However, it does support Menebo (2020), where both variables for temperature and precipitation had a negative association.

In relationship to winds, prior research has found a negative association between wind speed and COVID-19 incidence (Islam et al., 2020; Şahin, 2020). However, in our study, utilizing wind data at 10 m AGL, average monthly wind direction, when the azimuth is increased by 35°, increased risk of infection by 11.79%. Average monthly wind speed, when increased by 0.69 m/s, raises risk by 2.84%. Wind direction lowers risk in the model for deaths (−6.99%), as does wind speed (−1.73%), but the latter effect is not statistically significant. While this relationship is difficult to explain, it is worth noting, and we opted to keep wind data in the analysis to account for its potential effects. More research is needed on wind's relationship, especially at a finer temporal stratification (e.g., daily) where it should be easier to infer the relationship (Islam et al., 2020; Şahin, 2020).

5.3 Temporal Structure of the Pandemic in the United States

Cases and deaths have clearly increased throughout the year 2020 as evidenced by Figure 1. However, the modeled relative risks have fluctuated throughout the time period (Figure 2). These relative risks, as modeled through a random walk of order 1, show rapid increases in cases and deaths from March through much of the summer of 2020. Decreases are then evident for the remainder of the year. Temporal relative risk from death shows more fluctuation than cases but also presents a steady decline in average relative risk for the last few months of 2020. On the surface, these results (Figure 1 vs. Figure 2) may seem contradictory. Apart from one chart showing per capita COVID-19 cases/deaths and the other modeled relative risk, closer examination of the maps of the spatiotemporal interactions (Figures 4 and 5) shows there are more counties affected in the latter months of 2020. This observation suggests the pandemic has broadened in spatial extent, especially in regard to cases during October and November, but has become less intense overall as measured by relative risk. This finding also implies the pandemic may be starting to decline in average intensity in the United States as we head into 2021. The average non-modeled SIR and SMR across all counties support this and demonstrate a similar relative risk trend and corresponding decrease for the latter months of 2020.

During the initial stages of the pandemic in the United States and during the significant increase in cases in the summer months, some suggested that increased numbers of tests were the primary driving force behind COVID-19 infection numbers. However, when viewed in the context of the number of COVID-19 cases per test, this suggestion has been largely discredited. In the summer months, COVID-19 cases per test were significantly higher throughout much of the country. If increased testing were the principal driver of the increasing numbers of COVID-19 infections, cases per test would have remained largely the same. However, in many states, there were significant differences between COVID-19 cases per test in spring and those in summer 2020 (Gu, 2021; Pitzer et al., 2020).
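The reasoning about cases per test can be made concrete with a small calculation. This is only an illustrative sketch: the counts are hypothetical and the function name is not from the study.

```python
def positivity(cases: int, tests: int) -> float:
    """Cases per test (share of tests that are positive)."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    return cases / tests


# Hypothetical state-level counts (illustrative only, not study data).
spring = positivity(cases=5_000, tests=100_000)
summer = positivity(cases=30_000, tests=300_000)

# Testing tripled while cases rose sixfold, so cases per test doubled.
# If testing alone drove case counts, the two values would be about equal.
testing_alone_explains = abs(summer - spring) < 1e-12
print(spring, summer, testing_alone_explains)
```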

5.4 Spatial Structure of the Pandemic in the United States

After adjusting for the fixed effects of the covariates and the temporally structured and unstructured random effects, the convolutional spatial effect risk map and the spatially structured effect risk map identified counties at increased risk of COVID-19 infection and death throughout the study period. The most prominent spatial features of relative risk of COVID-19 infection were clusters focused most heavily in the southern United States and in the states of Indiana, Iowa, and New Mexico. This result was not completely unanticipated, as it is somewhat supported by the GWR approach of Snyder and Parks (2020). There were additional sporadic areas of increased risk scattered throughout the Midwest, US South, and Great Plains. Strong spatial autocorrelation, which supports the clustering, was present in many of those same areas, as modeled with the spatially structured effect.

Convolutional risks from COVID-19 deaths were less focused than those from cases but were still prevalent in the US South, especially Louisiana and Tennessee. Indiana and the megalopolis region of the Northeast were also part of this cluster. Interestingly, the latter region did not present as an increased risk area relative to cases throughout the study period. This is a potentially alarming finding, suggesting this region has a higher rate of death relative to the number of cases throughout the year. Further west, nearly every county in New Mexico was at an elevated risk. A large degree of spatial autocorrelation (as the spatially structured effect accounts for nearly 61% of the variance between the two components) is also present in many of these same areas based on the spatially structured risk from death.
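The convolutional effect combines the spatially structured and unstructured components. The following is a minimal sketch, assuming the scaled BYM model follows the commonly used BYM2 parameterization, in which a mixing parameter `phi` gives the share of the marginal variance attributed to the structured component. Under that assumption the 61% above would correspond to `phi` of about 0.61; this mapping is an interpretation, not something stated in the text, and the inputs are made-up values.

```python
import math


def bym2_convolution(v, u_star, tau, phi):
    """Combine unstructured (v) and scaled structured (u_star) effects:
    b_i = (sqrt(1 - phi) * v_i + sqrt(phi) * u_star_i) / sqrt(tau).

    tau is the marginal precision; phi in [0, 1] is the mixing parameter.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    if tau <= 0:
        raise ValueError("tau must be positive")
    scale = 1.0 / math.sqrt(tau)
    return [
        scale * (math.sqrt(1.0 - phi) * vi + math.sqrt(phi) * ui)
        for vi, ui in zip(v, u_star)
    ]


# Hypothetical component values for three counties (illustrative only).
v = [0.2, -0.1, 0.4]
u_star = [0.5, 0.3, -0.2]
b = bym2_convolution(v, u_star, tau=1.0, phi=0.61)
print([round(x, 3) for x in b])
```

With `phi = 0` the effect is pure unstructured noise, and with `phi = 1` it is purely spatially structured, which is why `phi` can be read as a variance share.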

5.5 Spatiotemporal Structure of the Pandemic in the United States

The spatiotemporal interaction term is a random effect and can be interpreted as the modeled residual risk after accounting for the fixed effects, the spatially structured and unstructured effects, and the temporally structured and unstructured effects. It represents short-term (month-long in our study) sporadic clusters of COVID-19 cases and deaths. The maps in Figures 4 and 5 show the probability of relative risk exceeding the national average for each county. During the time period of the study, it is notable that the areas impacted by the pandemic shift drastically throughout the country. Cases shift from major metropolitan areas and the Northeast into the Southern and Southwest United States in the summer months. This trend supports evidence found by Snyder and Parks (2020) that the pandemic was focusing in the highly vulnerable US South by mid-summer. By late summer and early autumn, elevated cases have moved into the Upper Midwest, with a second shift into the Southwest and a re-emergence in the Northeast by the end of 2020.
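The mapped quantity, the probability that relative risk exceeds the national average, is an exceedance probability. The following is a minimal sketch of how it could be estimated, assuming posterior draws of relative risk are available for a county-month and that the national average corresponds to a relative risk of 1. The draws below are hypothetical.

```python
def exceedance_probability(rr_samples, threshold=1.0):
    """Fraction of posterior relative-risk draws above the threshold."""
    if not rr_samples:
        raise ValueError("need at least one posterior draw")
    above = sum(1 for rr in rr_samples if rr > threshold)
    return above / len(rr_samples)


# Hypothetical posterior draws for one county-month (illustrative only).
draws = [0.9, 1.1, 1.3, 1.2, 0.95, 1.4, 1.05, 1.25]
p_exceed = exceedance_probability(draws)
print(p_exceed)  # 6 of 8 draws exceed 1.0 -> 0.75
```

Mapping this probability rather than the point estimate of relative risk shows how certain the model is that a county is above average, not just how large the estimate is.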

The short-term patterns in deaths follow a similar trend but are delayed by approximately a month to a month and a half and are not as large in extent or as contiguous. This implies a temporal delay from cases to deaths that falls within the WHO suggestion of 2–8 weeks, tending toward the higher end (Baud et al., 2020). Elevated deaths shift into the Southwest by early summer and into the Southern United States by late summer. Increased probabilities of above average deaths have moved from the Northeast megalopolis by August as they refocus in the South. By November, deaths have moved into the Upper Midwest and are beginning to shift out of the US South (apart from Florida) into less socially vulnerable areas of the country. December shows the US South to be below average risk, along with the Southwest. A further broadening is evident in the Upper Midwest and Mountain West.
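One way to quantify such a delay is a lagged correlation between the two series. This is a swapped-in illustration, not the approach used in the study, where the lag is read from the monthly maps. The series below are hypothetical, and a monthly time step can only resolve lags in whole months.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lag(cases, deaths, max_lag):
    """Return the lag (in time steps) at which deaths correlate most
    strongly with earlier cases, plus the correlation at every lag."""
    scores = {}
    for lag in range(max_lag + 1):
        x = cases[: len(cases) - lag]  # cases at time t
        y = deaths[lag:]               # deaths at time t + lag
        scores[lag] = pearson(x, y)
    return max(scores, key=scores.get), scores


# Hypothetical monthly series: deaths mirror cases one step later.
cases = [1, 3, 7, 4, 2, 5, 9, 6]
deaths = [0, 1, 3, 7, 4, 2, 5, 9]
lag, scores = best_lag(cases, deaths, max_lag=3)
print(lag)
```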

5.6 Composite Social Vulnerability in Hot and Cold Spots

The convolutional spatial effect for COVID-19 cases was classified into hot and cold spots based on the criteria of Richardson et al. (2004). For COVID-19 cases of infection, areas that were considered hot spots throughout the course of the study period had higher SVI composite scores (0.57) than those that were considered cold spots (0.48). In relation to deaths, there was a similar trend, with higher SVI composite scores for hot spot areas (0.55) versus cold spot areas (0.49). These differences were statistically significant at the 0.05 significance level. These relationships lend further support to studies finding that locations with higher vulnerability scores are more at risk of COVID-19 infection and death (Dasgupta et al., 2020; Karaye & Horney, 2020; Khazanchi et al., 2020).
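The comparison above can be sketched in two steps: label counties as hot or cold spots from an exceedance probability, then test whether mean SVI differs between the groups. The 0.8 and 0.2 cut-offs and the permutation test are assumptions chosen for illustration. The text does not state the thresholds or the test actually used, and the SVI values below are hypothetical.

```python
import random
from statistics import mean


def classify(prob, hot=0.8, cold=0.2):
    """Label a county from its exceedance probability.
    The 0.8 / 0.2 cut-offs are assumptions for illustration."""
    if prob >= hot:
        return "hot"
    if prob <= cold:
        return "cold"
    return "neither"


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical SVI composite scores (illustrative only, not study data).
svi_hot = [0.60, 0.62, 0.58, 0.65, 0.61, 0.59]
svi_cold = [0.40, 0.42, 0.45, 0.39, 0.41, 0.44]
p_value = permutation_p_value(svi_hot, svi_cold)
print(classify(0.9), classify(0.1), p_value < 0.05)
```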

When comparing SVI composite scores to hot and cold spots for cases and deaths based on the spatiotemporal interaction term, the relationship is not as straightforward. With respect to cases, our study supports the findings of Nayak (2020), in which the relationship between composite SVI score and COVID-19 cases was not statistically significant in the first month of the pandemic in the United States. The SVI composite score is higher in hot spot counties from April to August and again in December. This observation supports Neelon et al., who found that counties with higher SVI scores had more COVID-19 cases through August. Furthermore, this result is consistent with Dasgupta et al., where higher SVI scores in June and July 2020 were associated with a greater probability of becoming a COVID-19 hotspot. Neelon et al. also discovered that in August, counties with lower composite SVI scores were beginning to be affected. Our study supports this as well, although here the composite index remains lower in hot spot areas into the months after August. For deaths, the index is higher in hot spots from July through October, with the remaining months either lower or not statistically significant. In August, September, and October, the SVI composite score is much higher in hot spots than in cold spots. The mean SVI score in hot spot areas is approaching the third quartile value in cold spot areas.

Much of this observed shift in the relationship between COVID-19 and vulnerability is due to the spatial-temporal nature of the pandemic in the United States. Cases shift into the Southern and Southwest United States in the summer months. These areas are known to have significantly larger numbers and percentages of vulnerable residents than most other areas of the country (Shi & Stevens, 2021). Likewise, deaths in August, September, and October are similarly focused in some of the more vulnerable locations of the American Southeast and Southwest. These results support a more complex or nuanced relationship between the SVI, the environmental measurements, and COVID-19 cases and deaths, especially when examined in a smaller spatial and temporal context (Bashir et al., 2020; Shi et al., 2020). The relationships estimated over the whole study period are not stable across all counties for all time periods. With the spatiotemporal modeling approach used in this study, we are able to uncover this more elusive space-time relationship. Conversely, where the SVI composite score in hot spots is lower than in cold spots, the pandemic has shifted into less socially vulnerable areas of the country: the Upper Midwest and Northeast. However, a key finding is that during the summer months (for cases) and autumn (for deaths), the pandemic seemed to shift into warmer and more socially vulnerable areas of the country. Future spatiotemporal analyses, once the behavior of the pandemic in 2021 is known, are needed to determine whether this is a trend the virus is likely to follow or simply how the pandemic initially diffused in the United States.

6 Conclusion

Bayesian hierarchical modeling provides a flexible and robust framework in which to model complex spatiotemporal systems. This study presents the findings of fitting a Bayesian hierarchical spatiotemporal model to COVID-19 cases and deaths in the United States for the year 2020. The data collection and modeling framework are explained in detail for straightforward replication. This hopefully will foster continued effort in modeling the spatiotemporal nature of the pandemic in the United States and abroad.

A key finding of this study concerns the spatiotemporal character of the pandemic in the United States after accounting for specific contributors to social vulnerability, environmental measurements, and spatial and temporal random effects. Patterns of COVID-19 cases and deaths vary considerably through time and space. In terms of cases, during the hotter months of the year the pandemic shifted into the US South and Southwest, which are the most vulnerable regions of the US. Deaths followed the same pattern, only with a 1–2-month temporal lag. This demonstrates a potentially alarming trend if this pattern or a similar one is repeated in other years. These highly vulnerable areas are already under significant stress at this time of the year, and the introduction of another stressor on a regular basis could be catastrophic in some areas. Also alarming is the cluster of deaths in the Northeast Megalopolis region.

Another major finding is the difference in relative contribution to risk of COVID-19 cases and death from the individual variables examined. The most striking difference is the contribution of age 65 and over to relative risk of death compared with its impact on risk of infection: a ∼24% increase in risk of death but a negative, statistically insignificant effect on cases. Non-White status and not having a high school diploma were respectively the strongest social contributors to risk of infection and death. Age 17 and under was the third strongest contributor for cases and the fourth for deaths. Living in a multi-unit dwelling was the fourth strongest factor for cases and the fifth strongest for deaths. Poverty was the fifth strongest contributor to risk of infection and the sixth strongest for death. Of the environmental variables examined, AGL temperature had the strongest effect on cases and deaths, with a strong negative influence. Increases in LST nighttime temperature also decreased risk for cases and deaths. Precipitation in our study had a negative influence on risk of cases and deaths.

Studies such as the one presented here provide insight into the complicated mix of social and environmental factors relating to vulnerability. We demonstrate that relationships between COVID-19 cases/deaths, social vulnerability, and environmental measurements are spatially and temporally variable. Even though many of the findings presented here are supportive of other studies, more work is needed in the spatial-temporal domain of the pandemic. One primary effort should be modeling the spatiotemporal structure at a finer temporal scale (i.e., weekly). This could elucidate other relationships with the examined variables and allow for more elaborate spatial-temporal interaction specifications (Type IV spatial-temporal interaction). Such examinations, perhaps at finer spatial scales (i.e., census tract), could also allow us to focus on some of this study's more alarming results.


Funding was provided through the IUPUI Office of the Vice Chancellor for Research and the Indiana Space Grant Consortium.

Conflict of Interest

The authors declare no conflicts of interest relevant to this study.

Data Availability Statement

Data utilized for the conclusions in this study are available on the Indiana University-Purdue University at Indianapolis Data Repository at https://doi.org/10.7912/D2/23 (Johnson & Ravi, 2021). These data are in CSV format and readily importable into the R statistical package or other platforms.