Causal Effect of Impervious Cover on Annual Flood Magnitude for the United States

Despite consensus that impervious surfaces increase flooding, the magnitude of the increase remains uncertain. This uncertainty largely stems from the challenge of isolating the effect of changes in impervious cover separate from other factors that also affect flooding. To control for these factors, prior study designs rely on either temporal or spatial variation in impervious cover. We leverage both temporal and spatial variation in a panel data regression design to isolate the effect of impervious cover on floods. With 39 years of data from 280 U.S. streamgages, we estimate that a one percentage point increase in impervious basin cover causes a 3.3% increase in annual flood magnitude (95%CI: 1.9%, 4.7%) on average. Using 2,109 streamgages, some of which have upstream regulation and/or overlapping basins, we estimate a larger effect: 4.6% (CI: 3.5%, 5.6%). The approach introduced here can be extended to estimate the causal effects of other drivers of hydrologic change.


Introduction
Humans have modified nearly all of the world's rivers (Vörösmarty et al., 2010), leading to changes in water availability, floods, and droughts. To predict and manage risks associated with these changes, we need to quantify the causal links between human activities and hydrological patterns. The effect on floods of impervious surfaces, such as roads, roofs, and parking lots, is of particular interest, as demonstrated by a large body of research spanning the last half-century (Hollis, 1975;Leopold, 1968;Oudin et al., 2018;Sheng & Wilson, 2009;Shuster et al., 2005).
Despite a general consensus that growth in impervious cover reduces infiltration and increases runoff and flood magnitudes (Villarini & Slater, 2017), estimated effect sizes range from zero (Dudley et al., 2001;Wibben, 1976;Wright et al., 2013) to increases of up to a factor of 10 (Hollis, 1975;Leopold, 1968). Most studies estimate a range of effects falling between these extremes (Debbage & Shepherd, 2018;Hawley & Bledsoe, 2011;Hejazi & Markus, 2009;Jones & Grant, 1996;Ogden et al., 2011;Salavati et al., 2016;White & Greer, 2006). (See Supporting Information Table S1.) Quantifying the effect of changes in impervious cover on floods is difficult because many rivers are responding to concurrent changes in climate, irrigation practices, land use, water withdrawals and transfers, and regulation of flows at dams (Allaire et al., 2015;Hejazi & Markus, 2009;Viglione et al., 2016). As a result, many confounding variables exist that may mask or mimic a causal relationship between impervious cover and flooding. Furthermore, flood series often have large natural variability, due to cycles of wet or dry years, and a low signal-to-noise ratio (Cohn & Lins, 2005;Hirsch, 2011;Merz et al., 2012), making statistical inference from observable data a challenge.
The case study design relies on variation in impervious cover across time: Flooding in an early period, with minimal impervious cover, is compared to flooding in a later period, with increased impervious cover. This design thus relies on the assumption that the early period's flood pattern represents the counterfactual pattern of the later period, had impervious cover not increased. If, for example, precipitation increased at the same time that impervious cover increased, and the effects of precipitation are not accounted for, the estimator of the effect of impervious cover on flooding is confounded (in this case, biased upward).
In contrast, the paired catchment design relies on variation in impervious cover across space. In comparing flooding in an urbanized river basin to flooding in another basin with minimal impervious cover but otherwise similar characteristics, this design makes the assumption that the minimally impacted basin's flood pattern represents the counterfactual pattern of the urbanized basin, had impervious cover not increased. Identifying a comparison catchment can be a challenge because large unimpaired catchments are uncommon, and impaired and unimpaired catchments often differ across other features that also affect flooding. Limited samples sizes mean that effects estimated from paired catchment designs can be sensitive to small changes in the data (Prosdocimi et al., 2015).
A third common approach is to run multiple simulations with differing levels of impervious cover and compare predicted flood magnitudes (Du et al., 2012;Harrigan et al., 2014). This approach, however, cannot estimate a causal effect of impervious cover on floods as findings reflect assumptions made about the model structure and the effect of impervious cover. Such simulation approaches are also limited by small samples, uncertainties associated with model formulation, parameterization, selection of a baseline period, and data (Gyawali et al., 2015;B. Merz et al., 2012;Zhang et al., 2018).
Here we introduce a panel regression approach that improves upon previously used designs by leveraging variation in observed data across both time and space. Temporal variation comes from using a time series at a given stream gage. Spatial variation comes from using data across different basins in the same year. Borrowed from econometrics, the design focuses on isolating and quantifying the causal effect of a single driver-impervious cover-by controlling for other drivers of change (confounders). Recent reviews detail a variety of ways econometric approaches can improve our understanding of hydrologic change (Müller & Levy, 2019) and of coupled human-natural systems more broadly (Ferraro et al., 2019).
The panel regression design has several advantages. First, by exploiting repeated observations of annual floods and impervious cover across many basins, the approach reduces bias due to time-varying factors (e.g., regional weather events) and time invariant, but spatially varying factors (e.g., topography) that might confound identification of the effect of impervious cover on floods. In addition, the design eliminates the need to set subjective "pre" and "post" periods. Finally, it is unnecessary to identify a single reference catchment to serve as a counterfactual because the panel regression compares all basins and time periods, capturing the relevant variation in time and space.
While a few hydrology studies have used panel regression designs, none, to our knowledge, have applied the design to estimate a causal effect of impervious cover on streamflow or floods. Previously, panel approaches have been used to estimate the effect of forest cover on flood frequency (Ferreira & Ghimire, 2012), deforestation on streamflow (Levy et al., 2018), rainfall on low streamflow (Bassiouni et al., 2016), and urbanization on annual runoff coefficients (Steinschneider et al., 2013). The only study that has applied panel models in the context of urbanization and flooding is Over et al. (2016). However, rather than estimating a causal effect of urbanization on floods, the study applies fitted parameters to adjust the historical flood records for northeastern Illinois.
We focus on the effect of impervious cover on annual flood magnitude, defined as the largest streamflow in a given year. Annual flood magnitudes are one of the most common metrics of flooding and are used in flood frequency analysis (Hodgkins et al., 2019;R. Merz & Blöschl, 2008;Vogel et al., 2011). Impervious cover has been reported to have a larger effect on frequent floods, such as the annual flood, compared to infrequent extreme flood events, such as the 100-year flood. This difference has been attributed to soil behaving similarly to an impervious surface after prolonged rainfall and saturation (Hawley & Bledsoe, 2011;Hollis, 1975;Sauer et al., 1983). In a study of national flood trends, Hodgkins et al. (2019) found that changes in the magnitude of annual floods were consistent with change in the number of peaks-over-threshold, another commonly used metric of flooding.
In addition to our panel regression design, this study has two other novel attributes. First, we estimate an average causal effect of changes in impervious cover on annual flood magnitude for a large sample from the whole United States, rather than a single basin or small set of basins. Second, we show how a causal graph can be useful for identifying causal relationships, assessing sources of bias, and developing an appropriate regression model structure.

Causal Graph
Causal graphs (or causal diagrams) are a tool to help identify the sources of variation in a causal variable and its outcome, thereby highlighting potential sources of bias (Morgan & Winship, 2015). Used across a range of fields, these graphs summarize our knowledge and communicate our assumptions about a hypothesized causal relationship (Pearl & Mackenzie, 2018). By laying out these assumptions, causal graphs help us avoid mistaking correlation for causation, as well as inform design of the most appropriate model structure for causal inferences. We interpret estimated effects as causal when we have controlled for all plausible confounding variables. Although uncommon to hydrology, a causal graph was previously used to illustrate the relationship between changes in vegetation and streamflow in Brazil (Levy et al., 2018).
The causal graph in Figure 1a illustrates a hypothesized causal effect of changes in impervious cover on changes in annual flood magnitude. Confounders are variables that affect both the quantity of impervious cover and the magnitude of annual floods and thus can mask or mimic a causal relationship between the two. For example, upstream regulation of streamflow has been reported to reduce flood magnitude (FitzHugh & Vogel, 2010). By protecting downstream areas from floods, upstream regulation may also increase development and thus impervious cover. Regulation of flows is thus a potential confounding variable because it can affect both floods and impervious cover. The causal graph also illustrates variables that are not confounders. For example, changes in peak precipitation or precipitation intensity are likely to affect annual floods. However, unlike regulation, they are not confounders because they do not also affect changes in impervious cover.
In addition, the causal graph includes two other factors: moderators, which are off the causal path and moderate the magnitude of the causal effect; and mechanisms, which lie on the causal path and represent the intermediate outcomes through which impervious cover affects flooding. Interacting the causal variable with moderators can shed light on the heterogeneity of the causal effect. Controlling for mechanisms would be inappropriate because it would mask part or all of the effect of impervious cover on flooding. For example, some studies have found that urbanization increases flooding not only by reducing infiltration rates, but also by increasing total storm rainfall (Zhang et al., 2018). This suggests that precipitation would be considered a mechanism; thus including precipitation in the model could mask part of the effect of impervious cover on flood change and bias our estimated effect.

Modeling Approach
Based on insights from the causal graph (Figure 1a), we developed a panel regression model to eliminate the effects of possible confounding variables that could bias our estimated effect of impervious cover on annual flood magnitude (Figure 1b). In contrast to prior studies that control for only specific confounding variables, our design controls for all possible confounders that are time invariant at the basin level or time varying at the national or regional level.
First, intercept terms (α i ) specific to each streamgage i control for any streamgage-specific characteristics that are constant over time and may be associated with both impervious cover and flood magnitude (e.g., elevation). Representing average annual flood magnitude across 1974-2012 for a given streamgage, these intercepts would be referred to as "fixed effects" in econometric literature. Here, the term "streamgagespecific" is equivalent to "basin-specific" because each streamgage measures streamflow from one basin.
Our panel design also controls for time-varying confounders at both national and regional scales, accounting for temporal trends in floods due to changes other than impervious cover. For example, if water diversions increased across the entire country during the study, this increase would be considered a national timevarying confounder. To control for time varying, nationwide confounders, we use annual dummy variables (D t ) that take the value of 1 to indicate a given year, and 0 otherwise. For time-varying regional confounders, we also include regional annual dummy variables (D t D r ). Regions were defined as the physiographic regions of the United States (Figure 2), which have similar topography, rock structure, and geologic and geomorphic history (Fenneman & Johnson, 1946). For example, if peak precipitation increased in region r in year t, the coefficient γ t,r would reflect and control for that increase.
Although our study design controls for a wide range of confounding variables, an unobserved time-varying confounder at the subregional level could bias our inferences. This is because this "fixed effects" model relies on the assumption that there are no time-varying variables in ε i,t that are also correlated with impervious cover (and we have already controlled for national and regional-level confounders). Fixed effects panel regression models are similar to random effects models (Verbeek, 2003), but differ in one key way: Fixed effects models permit unbiased estimation of the causal relationship (β) even when the streamgage-intercept terms (α i ) are correlated with the error term ε i,t (i.e., even when time-invariant attributes of the basin are correlated with both flooding and impervious cover). In other words, streamgage-specific intercepts are a parameter to be estimated in fixed effect models, whereas, these intercepts are part of the error term in random effects models. For more information, see Gelman (2005, p. 20), Verbeek (2003, and the Supporting Information Text S1. We fit models with the plm package (Croissant & Millo, 2008) in R (R Core Team, 2019). Because model errors at a given streamgage over time (ε i,t ) are likely to be serially correlated, we cluster the standard errors at the streamgage-level (Bertrand et al., 2004). Our clustered estimation of the variance allows for arbitrary serial correlation within each streamgage, as well as heteroskedasticity across streamgages (Wooldridge, 2002). See Supporting Information Text S1 for code. Finally, because we are focused on identifying a causal effect, rather than predicting future flood flows, the causal diagram is used to determine the correct model structure. Model validation methods typically used for predictive models, such as goodness of fit, are not appropriate for models aimed at causal inference because the goal is not prediction (Ferraro et al., 2019). As in previous studies, we use a logarithmically linear model because annual flood values are strongly skewed (Vogel et al., 2011). By taking the natural logarithm of our response variable, we can interpret β as an estimate of the average causal effect of a one percentage point change in basin impervious cover on annual flood magnitude. Changes in impervious cover are reported in percentage points because they

10.1029/2019GL086480
Geophysical Research Letters reflect a difference in percentages. For example, an increase from 5% impervious cover to 10% impervious cover is a five percentage point increase, not a 5% increase. The units of annual flood magnitude are m 3 /s (average for the day). Thus, our estimated effect (β) is reported as a percent change in flood magnitude from a one percentage point change in impervious cover.
We expect the effect of impervious cover on flooding to vary based on certain basin characteristics ("Moderators" in Figure 1a). High soil permeability and low basin slope have been reported to serve as a buffer against hydrologic changes like increased impervious cover (Hopkins et al., 2015). To test whether those attributes moderate the effect of impervious cover on flooding in our sample, we reestimate the model with interaction terms between impervious cover and soil permeability and between impervious cover and slope. We then test the null hypotheses that the estimated coefficient on each interaction term is equal to zero. Note that it is possible that these moderators could also be confounders. However, because they are both timeinvariant characteristics of basins, their confounding effects are controlled for through the streamgagespecific intercept terms.
We also check for a nonlinear relationship (quadratic and cubic) between impervious cover and floods. In the region of Chicago, Illinois, Allen and Bejcek (1979) reported a decreasing effect of impervious cover on floods while Over et al. (2016) reported an increasing effect. Thus, we do not have a clear hypothesis about the sign of a nonlinear relationship. See Supporting Information for equations.

Development of Panel Data Set
We created a spatial and temporal data set that includes annual flood magnitude and annual percent impervious cover in the upstream basin. The time series of percent impervious cover in each basin came from the enhanced Geospatial Attributes of Gages for Evaluating Streamflow (GAGES) II data set based on the National Water-Quality Assessment Wall-to-Wall Anthropogenic Land use Trends data set (Falcone, 2017). These data were available for the years 1974, 1982, 1992, 2002, and 2012. During this period, impervious cover at a given basin either stayed constant or increased, but never decreased. This pattern allowed us to use linear interpolation to estimate impervious cover between the 5 years of observations, resulting in 39 years of data. As a robustness check, we also refit our model with no interpolated data using only the 5 years with observed impervious cover and under the more conservative assumption of a stepwise increase in impervious cover.
From Dudley et al. (2018), we obtained annual maximum streamflows at 2,683 US Geological Survey (USGS) streamgages in the conterminous United States. Given that streamgage siting and funding may be prioritized for locations with more human alteration, a design that includes incomplete streamgage records could result in a biased estimator. We thus selected the 2,109 streamgages with annual maximum streamflow available for every year from 1974 through 2012 (Figures 2a and 2b). To eliminate possible confounding, we also selected the 280 streamgages (Figure 2c) that were unaffected by upstream storage or overlapping basins for our main analysis. Upstream storage was calculated by dividing total upstream dam storage by annual average inflow (Falcone, 2011;Hodgkins et al., 2019) and streamgages with more than 1 day of upstream storage eliminated. To eliminate overlapping basins, we included only the most upstream basins. We use this subset of 280 streamgages for our main analysis and the larger data set as a robustness check. All basin characteristics, including storage and annual average inflow, were obtained from the GAGES II data set (Falcone, 2011).
For the main analysis of 280 study streamgages during 1974-2012, flood magnitude ranged from 0.01 to 6,513 m 3 /s and basin sizes from 5-1,910 km 2 (Figure 2d). Percent impervious cover ranged from 0 to nearly 50%, but most basins had small amounts of impervious cover, with a median of less than 0.44%. A summary of study streamgage and basin characteristics, including data used to estimate the regression model, are provided in the Supporting Information Table S2.
Consistent with previous studies Hirsch & Ryberg, 2012;Slater & Villarini, 2016), we do not see a cohesive spatial pattern in trends in annual floods for the United States. Results from the nonparametric Mann-Kendall test (p < 0.1) indicate a range of positive and negative trends in annual floods scattered across the country. Among the streamgages in Figure 2c, increasing annual floods (positive trends) were detected at 12 basins overall, 3 of which experienced greater than 1 percentage point increase in 10.1029/2019GL086480

Geophysical Research Letters
impervious cover during 1974-2012. However, at most streamgages, no trend was detected. Although we expect increases in impervious cover to increase flood magnitudes, it is not surprising that we find relatively few gages with statistically significant positive trends. Many factors cause changes in floods and these factors may serve to confound the relationship between impervious cover and flood magnitudes. Even at streamgages with negative trends in floods, floods may still be larger than they would have been in the absence of increases in impervious cover.

National Average Estimated Effect
We estimate that a one percentage point increase in impervious basin cover causes a 3.3% increase in annual flood magnitude, on average. The 95% confidence interval (CI) of this estimate is (CI: 1.9-4.7%). This estimated effect size is based on our data set of 280 streamgages without upstream regulation or basin overlap. We also reestimate the model with the expanded data set of 2,109 streamgages to find a larger effect 4.6%, with a CI that overlaps with the CI of the main analysis (CI: 3.7%, 5.4%).
In Figure 3, we contrast our estimated effect from the 280 streamgages with estimates from two nationalscale studies that both estimated the effect of impervious cover on floods as well as reported their data. In a review of studies from around the United States, Hollis (1975) reports large, but widely varying, estimated effects. Applying both paired catchment and model residual designs, Salavati et al. (2016) estimated smaller and less variable effects, with some being negative (albeit the confidence intervals cross zero). While these studies were the most appropriate comparison we could find, they do not provide an exact comparison; see Supporting Information for more details. To facilitate comparison, we added a fitted regression line to each prior study's data in Figure 3. The plot extends up to a 25 percentage point change in impervious cover because this is the largest difference in impervious cover found at a single streamgage within our study.

No Evidence of Nonlinear Effects or Moderation by Slope or Soil Permeability
Although we expected the effect of impervious cover on flood magnitude to vary by soil permeability and/or basin slope, we do not find evidence to support these moderating effects (Supporting Information Table S3). Neither the coefficient on the interaction term between impervious cover and soil permeability (p value = 0.69) nor the coefficient on the interaction between impervious cover and slope (p value = 0.85) is statistically significant. In the nonlinear version of our main model, we also do not find evidence of nonlinear effects of impervious cover based on the statistical significance of estimated coefficients on the quadratic (p value = 0.93) and cubic (p value = 0.75) impervious cover variables (Supporting Information Text S3  and Table S3).

Robustness Checks
In our main analysis, we linearly interpolate impervious cover in between five decadal observations. As a robustness check, we reestimate our model using only the 5 years of observed impervious data with no interpolation and find a larger estimated effect: 6.1% (CI: 3.8%, 8.4%). We also reestimate the model assuming impervious cover changes as a step function (i.e., remains constant until the next decadal observation).
With that assumption, we estimate a slightly smaller effect size of 2.6% (CI: 1.5%, 3.7%), which is not surprising as we expect this assumption to bias our estimate downward (Supporting Information Table S4).

Conclusions and Limitations
To predict and prepare for future floods, we need to understand the causes of observed changes in flooding. Scholars and policymakers have long been interested in the effect of impervious cover on flood magnitude, but estimated effects vary widely. Quantifying this link has been challenging due to small sample sizes and modeling approaches that fail to control for a range of confounding factors. With a large data set of streamgages across the United States, our panel regression design controls for confounding variables by leveraging both temporal and spatial variation in flooding and impervious cover to estimate a causal effect. Greater use of such temporal, spatial, and causal information to inform flood frequency has been called "essential" (Merz & Blöschl, 2008).
There are several limitations to this study, which can be explored in future work. First, although our panel design requires fewer assumptions for causal inference than previous approaches, it is not assumption-free. A causal interpretation of our estimated effect requires one to assume that there are no unobserved, timevarying, subregional variables that systematically affect changes in both impervious cover and flooding.
That assumption might not be satisfied if, for example, previous floods affected both future flood magnitudes and future impervious cover. Second, we include a comparison of our main analysis (280 streamgages) to an expanded data set of 2,109 streamgages that are impacted by upstream regulation and/or basin nesting. Accounting for upstream regulation and nesting structure was beyond the scope of the study, but may have biased this estimate. Third, the effects of impervious cover on flooding may vary depending on the placement of impervious cover within the basin (Debbage & Shepherd, 2018;Oudin et al., 2018), but we did not account for the spatial distribution of impervious cover. This was because our goal was to estimate a national-level average effect that could inform policy and management decision-making.
Our finding that annual floods increase by 3.3%, on average, for each percentage point increase in impervious cover, falls within the wide range of previously estimated effects (Figure 3). Although we were not able to detect moderating effects of soil permeability or basin slope, further exploration of basin characteristics which moderate the relationship between impervious cover and floods is warranted. Estimation of the causal effect of impervious cover on the 100-year flood, or on changes in variability of floods (Hecht & Vogel, 2020), would provide a more nuanced understanding of the ways in which impervious cover affects flooding and extremes. Engineering and policy-making communities increasingly seek details on the magnitude of the effects of environmental change on water management and infrastructure. Our study design provides an approach to estimating the causal effect of imperviousness on floods that can be extended to other drivers of environmental change. (ii) Hollis (1975); and (iii) Salavati et al. (2016) which applied two approaches, Paired Catchments and Model Residuals. The right axis (Ratio relative to pre-urbanized flood) is the metric used in Hollis (1975) and is defined as Peak Discharge after Urbanization Peak Discharge before Urbanization ; so a doubling of peak discharge, or annual flood, is equivalent to a 100% increase in flood magnitude (Supporting Information Text S2).