Urban Scaling as Validation for Predictions of Imperviousness From Population
This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy (DOE). The U.S. government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (https://energy.gov/downloads/doe-public-access-plan).
Abstract
Strategies for creating quantitative projections for human systems, especially impervious surfaces, are necessary to consider the human drivers of climate and ecosystem change. There are models that generate predictions of how impervious surfaces may change in response to different potential futures, but few tools exist for validating those predictions. We seek to fill that gap. We demonstrate a statistically robust sublinear scaling relationship between population and urban imperviousness across a 15 year history. We show that Integrated Climate and Land-Use Scenarios (ICLUS) urbanization projections are also consistent with theory. These results demonstrate a theory that can be used to validate other models' predictions of urban growth and land cover change, analogous to the ways in which allometric scaling laws in biology have been used to validate process-based models of ecosystem composition under different climate scenarios.
Key Points
- Urban Scaling theory can validate process-based models of urban growth, measured by impervious surface area
- ICLUS projections are consistent with urban scaling theory
Plain Language Summary
The decisions human organizations like cities or countries make can have a big influence on the environment, and the environment can influence our decisions. Scientists have some tools for modeling long-term interactions between cities and the environment. It is hard to learn if these tools are working, because even if we know how a particular decision–if made–would influence the environment, we are not good at predicting what decisions will be made. We show that there is a specific mathematical shape to the relationship between a city's population and the total built-up area: things like roads, parking lots, or buildings. This mathematical relationship shows us that in cities with larger populations, there is less space available per person and so built-up areas are more intensely used. We then test whether other researchers' predictions about interactions between urban population and built-up areas predict that these places will be more intensely used, like we showed. We show that the other researchers' predictions do correctly represent some important things we know about cities. This helps us know more about how reliable our predictions of interactions between human organizations and the environment might be.
1 Introduction
Over the last few decades, Earth System Models (ESMs) have fundamentally changed how we understand, observe, and predict the future of our planet. Unfortunately, only recently have these models begun to include the effects of and on humans beyond total carbon emissions. Human activities have a dramatic influence on land cover, the water cycle, and many biogeochemical processes which influence the Earth system across many spatial scales. Furthermore, climate change continues to have a profound impact on human security and economic concerns. Understanding these interactions provides a clear mandate for models to simulate and project the interaction of humans and climate.
Obtaining accurate environmental projections based on human input to the system is difficult, however, because human decisions can pivot rapidly, causing rapidly diverging outcomes. Shared socioeconomic pathway scenarios (SSPs) (O'Neill et al., 2017) were introduced to characterize the breadth of possibilities introduced by human decisions. Much like the Representative Concentration Pathway (RCP) scenarios for carbon emissions (van Vuuren et al., 2011), the SSPs provide plausible future scenarios that are focused on societal development, including demographics, economics, and technology. These scenarios provide a context within which to develop predictive understanding of how societal development and the climate will coevolve. Impact assessment tools like the Integrated Climate and Land-Use Scenarios (ICLUS) project (US EPA, 2014) leverage these scenarios to predict key human factors that affect global environmental change. These data sets are then the necessary inputs to Integrated Assessment Models (IAMs) (Calvin & Bond-Lamberty, 2018; Collins et al., 2015; Thornton et al., 2017), which couple with ESMs to allow human decisions to drive models of the natural system.
Evaluating model predictions based on these scenarios is challenging, however, because social processes are sufficiently complex that uncertainty in decision making substantially outweighs uncertainty in process representation. It is not always clear which processes influence the outcome of interest, and rarely can all processes of interest be modeled. We often lack sufficient data for historically driven empirical models, and factors like environmental change, technology change, and institutional change can cause unexpected and rapid changes to the underlying processes (Liu et al., 2007; Ruth et al., 2011). These factors can make historically based model predictions obsolete. In order to evaluate models which include anthropogenic influences on the Earth system, robust model validation strategies must be developed concurrently with the models.
One approach for model evaluation is the use of theoretical predictions of correlations and scaling laws as a constraint on predictions. Scaling law analyses, instead of looking at the state variables as a function of time and space, look at the correlations of variables expected to be related through a modeled process and ask whether the predicted correlations match the correlations expected from theory and observation. This approach can ensure that process-based predictions, which are necessarily temporally and spatially specific, are consistent with system-level observations that can be more general.
Using theoretic or observation-driven expectations on empirical relationships between outcomes of interest to validate process-based models has been successfully applied in a variety of contexts. Notably, allometric scaling theories have been used in multiple impactful predictions of what plants will look like under changing climate conditions (McDowell & Allen, 2015; Oren et al., 2001). Energy spectra scaling, and theoretically derived transitions in that scaling, provides an important constraint on meso- and high-resolution global-scale models of atmospheric dynamical cores (Skamarock, 2004). In earthquake modeling, scaling laws of event magnitudes are such an important observation that a whole class of models has been derived and studied extensively because they were shown to be the simplest model that reproduces the observed scaling law (Carlson et al., 1994). Each of these, and others, provide examples where model predictions can be evaluated and constrained by theoretical predictions based on scaling laws to identify whether biases are present in the model. Here we will use this concept to leverage urban scaling theory to evaluate predictions from the ICLUS model to build confidence for their use in IAMs.
Theories on the scaling of urban properties with population offer a novel approach for the evaluation of models which depend on human activities (Bettencourt et al., 2007; Gomez-Lievano et al., 2016). The form of Urban Scaling Theory described by Bettencourt (2013) proposes a scaling relationship between aggregate urban population and a broad suite of aggregate socioeconomic and infrastructural outcomes. According to this theory, many infrastructure characteristics are expected to scale sublinearly. This sublinear scaling occurs because larger cities are more densely populated and this increased population density enables more intense use of infrastructure and more efficiently connected networks. Both observationally and theoretically, this exponent has been shown to be 5/6 across a wide range of urban infrastructures (Bettencourt, 2013; Marshall, 2007; Samaniego & Moses, 2008).
In this paper, we demonstrate a novel approach for evaluating human-driven data products for use in IAMs and ESMs: checking whether, at an integrated scale, the detailed data products are consistent with Urban Scaling Theory. To do this, we show there is a statistically robust, sublinear scaling relationship between urban imperviousness and urban population for the recent past in U.S. cities that is consistent with an exponent of 5/6. We then explore how well output from an independent model of urbanization conforms with this relationship. ICLUS's spatially and temporally explicit predictions of population and land-cover change are aggregated into city units across a range of scenarios. We identify some conditions under which the projections are consistent with the urban scaling theory predictions. Furthermore, we explore biases in the comparison and find that the sign and relative magnitudes of the biases are consistent with expectation.
2 Data
2.1 Historic Data Set
We use the U.S. Census 2010 definition of Metropolitan Statistical Areas (MSAs) (US Census, 2020) to define the spatial extent of MSAs in the continental United States. These urban boundaries are used to estimate aggregate population and aggregate impervious area for all cities, in all years, in both the historical and projected data.
The NLCD 2016 Imperviousness data set (Yang et al., 2018) represents urban impervious surfaces as a percentage of developed surface over every 30 m pixel in the United States. We then use those percentages to estimate the aggregate impervious area in each U.S. MSA, including each pixel which intersects with the 2010 U.S. census MSA boundary.
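The aggregation step can be sketched as follows. This is a minimal illustration that assumes the pixel values for one MSA have already been extracted as a flat list of percentages; the function name and sample values are hypothetical and not part of any NLCD tooling.

```python
PIXEL_SIDE_M = 30.0  # NLCD pixel edge length, in meters


def impervious_area_km2(percent_impervious):
    """Sum per-pixel impervious percentages into a total area in km^2."""
    pixel_area_m2 = PIXEL_SIDE_M ** 2
    total_m2 = sum(p / 100.0 * pixel_area_m2 for p in percent_impervious)
    return total_m2 / 1e6


# Hypothetical pixel values (percent impervious) for a tiny region
sample_pixels = [0, 25, 50, 100]
area = impervious_area_km2(sample_pixels)
```

Each pixel contributes its impervious fraction times the 900 m² pixel footprint, so partially developed pixels count proportionally rather than as fully built up.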
MSA boundaries do not cross county borders; therefore, we use the 2001, 2006, 2011, and 2016 county-level population estimates generated from the 2000 and 2010 decennial censuses and the American Community Survey to define total MSA population in the NLCD years. County-level population estimates are summed to the MSA level. This becomes our historical data set. Figure 1 compares the NLCD data in 2011 for the Atlanta, GA metropolitan area to our projected data set for the same region.
2.2 Projected Data Set
The projected data set draws on five ICLUS scenarios:
- A1 assumes rapid economic development, improved education, and reduced income disparities across regions and that population rises rapidly until midcentury and then falls below replacement level.
- A2 is the highest ICLUS population projection. It assumes slower economic growth, limited population migration, and limited flow of ideas across regions. It also assumes an increase in fertility and average household size. This scenario represents a “worst-case” pattern of development for the United States.
- B1 assumes the same population projections as A1, but for reasons of rapid social development and an emphasis on education rather than economics. Because social development of women is key to this scenario, a rapid decline in fertility in developing regions is incorporated, leading to low population projections. This scenario results in the least-altered landscape for most areas of the United States.
- B2 assumes moderate economic growth in developing regions, local solutions to environmental and economic stress, and low migration with focus on local sustainability, and increasing employment opportunities in urban centers.
- Base case (BC) assumes the same population and growth projections as B2, but with medium levels of internal migration instead of low.
The ICLUS tool produces county-level population projections based on the SERGoM v3 (EPA, U., 2010) spatial allocation model. Population growth forecasts are then converted to housing units, which are distributed within counties based on the spatial pattern of previous growth and requisite transportation infrastructure. The distribution of new residential housing units and new housing unit density is used to infer increases in impervious surface area. These projections do not include other land uses such as industrial, commercial, or transportation. Additionally, only increases in population and housing units are considered for projections, so loss of housing units due to demolition or neighborhood conversion to commercial use or greenspace is not modeled. The projections do, however, allocate a “full continuum” of housing density, from urban to rural, so that growth patterns are examined reasonably comprehensively, given that exurban/low-density development generally encompasses a larger footprint than urban development does and grows faster than urban areas do.
The ICLUS tool produces spatially explicit projections of impervious surface area at 1 km spatial resolution for each decade from 2010 to 2100. We sum these data to the 2010 MSA boundaries. This becomes our projected data set. True MSA boundaries are likely to change over this 90-year period. Section S1, “Bias in MSA Boundaries,” in the supporting information discusses the direction and relative magnitude of the bias introduced by using a fixed definition of MSA boundaries. Figure 1 compares the ICLUS projections for 2010 and 2100 to the historical data set for the Atlanta, GA metropolitan area.
Importantly, the projections of impervious surface area are derived from pixel-level residential population density and infrastructure, not urban level population. While it has long been known that impervious surface area and population are strongly related (Bhaduri et al., 2007; Lu et al., 2006; Wu & Murray, 2007), this has primarily been explored at the pixel scale. By integrating these projections across MSAs and comparing the result across all cities, we gain a constraint on the functional form of population-imperviousness relationships at the city scale. There is no urban-scale correction factor added for imperviousness estimations which would define in advance a scaling relationship between population and imperviousness at the city level. Thus, the population/imperviousness scaling relationships that we do observe from urban scaling theory and ICLUS are independent measures; urban scaling theory is an independent constraint on the viability of the ICLUS projections.
3 Methods
The scaling exponent β of different classes of aggregate urban outcomes is an active research area. A broad range of socioeconomic characteristics such as GDP, wealth, innovation, serious crime, and infectious disease (Bettencourt et al., 2007; O'Clery et al., 2016; Patterson-Lomba et al., 2015; Youn et al., 2016) demonstrate superlinear scaling, with β > 1. Bettencourt (2013) proposes β = 7/6 as the theoretical expectation for urban socioeconomic characteristics which scale superlinearly. Linear scaling relationships are expected and observed for characteristics relating to individual human needs, such as firms, employment, housing, and household water and electricity consumption (Bettencourt et al., 2007; Gomez-Lievano et al., 2016).
Many physical characteristics and infrastructure-based characteristics of cities scale sublinearly, including density, urban area, road lane miles, and other infrastructure characteristics (Marshall, 2007; Samaniego & Moses, 2008). This sublinear scaling occurs because as populations grow and their density increases, these infrastructures are used more intensely, and networks can be connected more efficiently. Bettencourt (2013) proposes β = 5/6 for aggregate urban characteristics related to infrastructure volume.
In the current literature, urban population is often used as a proxy for imperviousness when imperviousness data are lacking; for this reason, having a sound theoretical and empirical baseline for the form of the relationship between the two is crucial. The precise scaling relationship for impervious surface area has not yet been explored from Bettencourt's theoretical perspective, but impervious surface intensity is clearly bounded: No land area can be more than 100% impervious, and more densely populated places are likely to have higher impervious coverage than less densely populated areas. Thus, we expect impervious surface area to scale similarly to other forms of infrastructure, with β near 5/6.
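To make the sublinear expectation concrete, a short worked calculation follows. It assumes a pure power law between aggregate impervious area Y and population N with β = 5/6; the normalization constant Y_0 is introduced here only for illustration.

```latex
\begin{aligned}
Y &= Y_0 N^{\beta}, \qquad \frac{Y}{N} = Y_0 N^{\beta - 1},\\
\frac{Y(2N)}{Y(N)} &= 2^{5/6} \approx 1.78, \qquad
\frac{Y(2N)/(2N)}{Y(N)/N} = 2^{-1/6} \approx 0.89 .
\end{aligned}
```

Under these assumptions, doubling a city's population raises total impervious area by roughly 78%, while impervious area per person falls by roughly 11%.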
3.1 Tests of Model Form
Henceforth, Y is defined as aggregate impervious area, and N is defined as total urban population. The indices i and t index cities and time, respectively. Lowercase letters denote the log of the capital letter: y = ln Y and n = ln N. If the model represented in Equation 1 correctly represents the processes generating Y, then ordinary least squares (OLS) estimation of Equation 2 will provide a linear and unbiased estimator of β. If the errors are homoskedastic (the variance of the error does not depend on n), then we additionally know that OLS is the best linear unbiased estimator, meaning that it gives the most precise estimate of β among all linear unbiased estimators.
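A minimal sketch of this estimation on synthetic data follows. It assumes that Equation 1 has the power law form Y = Y_0 N^β with multiplicative noise and that Equation 2 is its log transform, y = α + β n + ε. The `ols` helper, the synthetic city sample, and the parameter values are illustrative assumptions, not the analysis code.

```python
import math
import random


def ols(x, y):
    """Simple least squares fit y = a + b*x; returns (a, b, residuals)."""
    count = len(x)
    mx, my = sum(x) / count, sum(y) / count
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


# Synthetic cities: populations spanning about 2.5 orders of magnitude
random.seed(0)
true_alpha, true_beta = -2.7, 5 / 6
N = [10 ** random.uniform(4.7, 7.2) for _ in range(300)]
Y = [math.exp(true_alpha + true_beta * math.log(Ni) + random.gauss(0, 0.2))
     for Ni in N]

n = [math.log(Ni) for Ni in N]  # n = ln N
y = [math.log(Yi) for Yi in Y]  # y = ln Y
alpha_hat, beta_hat, _ = ols(n, y)
```

Because the noise is multiplicative in Y, it becomes additive after the log transform, which is what makes a linear regression on (n, y) appropriate for recovering β.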
In estimating this equation, we identify the best fitting relative weight of the power law and exponential terms, with the λ term used to create a weighted sum of the terms in Equations 2 and 4. If the comprehensive form estimates that λ = 0, then we can reject the exponential form. If λ = 1, then we can reject the power law form. If 0 < λ < 1, then further testing is needed, as the J test cannot distinguish between the two models.
In practice, λ, β_{1}, and β_{2} cannot be simultaneously estimated using linear regressions. Davidson and MacKinnon's strategy for estimating λ centers around estimating whether the residuals from Equation 1 are correlated with the difference between the fitted values of Equations 1 and 3. If they are uncorrelated, we can infer that λ is not distinguishable from 0. We can swap the order of the hypotheses to test whether λ is distinguishable from 1.
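The procedure can be sketched in a few lines. This is a hedged illustration of a standard Davidson and MacKinnon J test, not the analysis code: it assumes the two competing log-scale models are y = α + β n (power law) and y = α + β N (exponential), computes the coefficient λ on the rival model's fitted values by partialling out the null model's regressors, and uses synthetic data. All function and variable names are hypothetical.

```python
import math
import random


def ols(x, y):
    """Simple least squares fit; returns (fitted values, residuals)."""
    count = len(x)
    mx, my = sum(x) / count, sum(y) / count
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid


def j_test_t(y, x_null, x_alt):
    """t statistic for lambda when the model using x_null is the null."""
    fitted_alt, _ = ols(x_alt, y)     # fit of the rival model
    _, e_y = ols(x_null, y)           # residuals of the null model
    _, e_f = ols(x_null, fitted_alt)  # rival fit, net of null regressors
    sff = sum(v * v for v in e_f)
    lam = sum(u * v for u, v in zip(e_y, e_f)) / sff
    rss = sum((u - lam * v) ** 2 for u, v in zip(e_y, e_f))
    dof = len(y) - 3                  # intercept, slope, lambda
    return lam / math.sqrt(rss / dof / sff)


# Synthetic data generated from a power law (population in millions)
random.seed(1)
pop_millions = [10 ** random.uniform(-1.3, 1.2) for _ in range(300)]
n = [math.log(p) for p in pop_millions]
y = [-2.7 + (5 / 6) * ni + random.gauss(0, 0.2) for ni in n]

t_power_null = j_test_t(y, x_null=n, x_alt=pop_millions)
t_exp_null = j_test_t(y, x_null=pop_millions, x_alt=n)
```

With data generated from a power law, the statistic under the power law null should be small (λ not distinguishable from 0), while swapping the hypotheses should give a large statistic, so the exponential null is rejected.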
3.2 Panel Data Estimation Strategy
Most of the urban scaling literature uses cross-sectional data, which are data that only have spatial variation to explore urban scaling effects. More recently, explorations of urban scaling theory have begun to use panel data sets which include both spatial and temporal variation (Bettencourt et al., 2020; Depersin & Barthelemy, 2018). In this field, there has been little engagement thus far with well-established techniques developed by economists and statisticians for the assessment of panel data. In this paper, we use standard econometric tests and techniques to identify appropriate estimations of β based on the panel structure of the data.
After identifying a power law model form as a better fit to the data than a linear or exponential form, we then need to identify the correct panel estimator.
A flat OLS model of the form shown in Equation 2 identifies the average relationship between log population and log imperviousness, but it assumes that each observation is fully independent. This model does not acknowledge that there are multiple observations for each city, with later observations dependent on earlier observations. Each city's imperviousness is clearly highly dependent on the imperviousness 4 years ago. In any given place, most parking lots, roads, and buildings do not change much over a 4 year period, and reductions in imperviousness are particularly rare: A parking lot may be built up into a building, but it is quite rare for a parking lot to be torn up and returned to grass or soil. In fact, the NLCD data set relies on the scarcity of that transition and is defined so that for any pixel, imperviousness cannot decrease between successive products (Y_{t + 1} ≥ Y_{t} at the pixel level), creating a strong temporal dependency by definition. Similarly, urban populations do not fully turn over in 4 years. Although growth and change do occur, current population is a good predictor of future population.
This approach identifies the empirical relationship between n and y that is based only on cross-city variation, while still taking advantage of the temporal data that we have. It is consistent with the strategy described by Bettencourt et al. (2020). Within-city variation, that is, change across time in a city's n, y, or difference from expectation, informs only our confidence intervals around the estimates of α, β, and γ_{j}. This strategy absorbs time-varying factors, such as systematic error in Y or differences in the variance of Y arising from different underlying satellite imagery, as well as genuine temporal dependence in the overall level of imperviousness, into the γ variables. The defined temporal dependency in the NLCD data, combined with the fact that population is also increasing for most cities, means that we expect γ_{t + 1} ≥ γ_{t} and γ > 0.
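One way to estimate a model of this kind is sketched below. The sketch assumes that Equation 6 adds a year-specific intercept shift γ_j to the log-log model, y_{i,t} = α + β n_{i,t} + γ_t + ε_{i,t}; this form is inferred from the description of the γ variables and may differ from the specification actually used. Under that assumption, the slope β equals the slope obtained after subtracting year means from n and y. The function name and the toy panel are hypothetical.

```python
from collections import defaultdict


def beta_with_year_effects(rows):
    """rows: iterable of (year, n, y) with n = ln N and y = ln Y.

    Returns the slope on n after removing year-specific means, which is
    equivalent to including one intercept shift per year.
    """
    by_year = defaultdict(list)
    for year, n, y in rows:
        by_year[year].append((n, y))
    sxx = sxy = 0.0
    for obs in by_year.values():
        mn = sum(n for n, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        sxx += sum((n - mn) ** 2 for n, _ in obs)
        sxy += sum((n - mn) * (y - my) for n, y in obs)
    return sxy / sxx


# Hypothetical panel: three cities observed in two years, with a uniform
# upward shift in log imperviousness in the later year
rows = [
    (2001, 11.0, 6.0), (2001, 13.0, 7.6), (2001, 15.0, 9.2),
    (2006, 11.1, 6.18), (2006, 13.1, 7.78), (2006, 15.1, 9.38),
]
beta = beta_with_year_effects(rows)
```

In this toy panel each year lies on a line with slope 0.8, and the later year is shifted upward; the year shift is absorbed by the year means and does not affect the estimated slope.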
3.3 Test Scaling Relationship on Projected Data
Given the identified model specification, we then test whether the empirically identified n, y relationship in the ICLUS data set for each scenario is consistent with the relationship identified in the historical data set: Is β for the historical data distinguishable from each of the estimated ICLUS β's?
We estimate β for the projected data set using Equation 6 and then use a standard t test to compare the estimated β for the historical and projected data sets.
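A comparison of this kind can be sketched as follows. The sketch treats the two estimates as independent and uses a normal approximation to the t distribution; the exact test used in the analysis may differ, and the function name is hypothetical. The inputs are the coefficient and standard error values reported in Table 1.

```python
import math


def compare_betas(b1, se1, b2, se2):
    """z statistic and two-sided p value for two independent estimates."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return z, p


# Table 1 values: NLCD "between" estimate versus ICLUS scenario A1
z, p = compare_betas(0.867, 0.0130, 0.817, 0.0145)
```

The same function can be applied to any pair of columns in Table 1, for example to compare two ICLUS scenarios with each other.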
4 Results
Figure 2 shows the data and key results for the historical data set. Light gray traces show the temporal trajectory for each city's population and imperviousness outcomes. Each city's trace is composed of a vector of four points, one each for 2001, 2006, 2011, and 2016. A 10% random sample of cities has been highlighted in green to allow more visibility of typical patterns for individual cities. The dashed line shows our null hypothesis of a linear relationship between population and imperviousness. The dot-dash line shows the expectation from theory, β = 5/6. The solid black line shows the fitted result based on using the historical data set to estimate Equation 6, which is substantially closer to the theoretical expectation than the null hypothesis.
4.1 Model Form
We find that the J test allows us to reject the exponential form in favor of the power law form, with 95% confidence. When we use the power law form as the null hypothesis, we find that the p value of 0.74 is much greater than the 0.05 value that would enable rejection of the null hypothesis that λ = 0 with 95% confidence. When we swap the order of the hypotheses, we find that the p value of 0.000 does enable us to reject the hypothesis that λ = 1.
4.2 Panel Data Estimation Results
Table 1 shows the results for estimates of the “flat” model described in Equation 2 and the “between” model described in Equation 6. Results for these two models are statistically indistinguishable, and the empirical differences are so small that they also are of no practical importance for our conclusions. This is because in our data set, almost all of the range in n is contained in the differences between cities, instead of the difference for any one city through time. The populations of U.S. cities span 2.5 orders of magnitude, while population change within any one city during the study period spans no more than 0.1 orders of magnitude.
| | NLCD: flat | NLCD: between | ICLUS: A1 | ICLUS: A2 | ICLUS: B1 | ICLUS: B2 | ICLUS: BC |
|---|---|---|---|---|---|---|---|
| β | 0.867*** | 0.867*** | 0.817*** | 0.798*** | 0.809*** | 0.805*** | 0.814*** |
| | (0.00614) | (0.0130) | (0.0145) | (0.0119) | (0.0141) | (0.0123) | (0.0125) |
| α | −2.711*** | −2.709*** | −2.482*** | −2.408*** | −2.454*** | −2.440*** | −2.486*** |
| | (0.0347) | (0.0713) | (0.0846) | (0.0686) | (0.0815) | (0.0696) | (0.0716) |
| R^{2} | 0.924 | 0.925 | 0.947 | 0.937 | 0.945 | 0.932 | 0.941 |
| Observations | 1,410 | 1,410 | 1,810 | 3,030 | 1,920 | 3,150 | 2,680 |
| Cities | | 366 | 181 | 303 | 192 | 315 | 268 |
- Note. Columns 1 and 2 show results of the main panel data specification, while Columns 3–7 show results of estimating a scaling relationship on all ICLUS scenarios, when cities with declining populations are excluded. Standard errors are in parentheses.
- * p < 0.05.
- ** p < 0.01.
- *** p < 0.001.
We use these results to answer two different questions: (1) “Do we observe any agglomeration effect at all?” and (2) “Is the magnitude of the agglomeration effect consistent with theory?” To test Question 1, whether agglomeration effects are observed, we test if β < 1. We use a one-sided t test with H_{0} : β = 1 and H_{A} : β < 1. The null hypothesis is easily rejected in favor of the alternative hypothesis, showing that we do observe meaningful agglomeration effects in the relationship between population and imperviousness. To address our second question, whether β is consistent with the theoretical expectation of 5/6, we select a conservative testing strategy and ask whether we can reject the hypothesis that β = 5/6. We define a null hypothesis H_{0} : β = 5/6 and an alternative hypothesis H_{A} : β ≠ 5/6. We find that, with 95% confidence, we reject the null hypothesis in favor of the alternative hypothesis: Our best estimate of β is slightly higher than the theoretical expectation. Nonetheless, in practical terms these results are close to that expected from theory.
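Both tests can be illustrated with the coefficient and standard error reported in the NLCD “between” column of Table 1. The sketch below uses a normal approximation to the t distribution and assumes the null hypotheses β = 1 and β = 5/6; it is meant as an arithmetic illustration rather than a reproduction of the exact test statistics.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


beta_hat, se = 0.867, 0.0130  # Table 1, NLCD "between" column

# Question 1: H0: beta = 1 against HA: beta < 1 (one-sided)
t_one = (beta_hat - 1.0) / se
p_one = normal_cdf(t_one)

# Question 2: H0: beta = 5/6 against HA: beta != 5/6 (two-sided)
t_two = (beta_hat - 5 / 6) / se
p_two = 2 * (1 - normal_cdf(abs(t_two)))
```

Under these assumptions the first statistic lies far in the lower tail, while the second is modest in size but still beyond the usual 5% two-sided threshold, in line with the conclusions stated above.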
These results give us substantial confidence that a power law relationship with a sublinear exponent close to 5/6 is a sound, time independent way to describe the relationship between urban population and aggregate urban impervious area. We next test whether existing projections of population and imperviousness remain consistent with these theoretical expectations and historic data.
4.3 ICLUS Validation Results
We test the agglomeration effects for ICLUS projections for each scenario on the projected data set using the model form shown in Equation 6, after replacing the j values from the historical data set with the years included in the projected data set, j ∈ {2010, 2020, 2030, … , 2100}. Figure 3 shows these results for scenarios A1, A2, B1, and B2. The BC scenario is shown in Figure S1. The horizontal traces in this figure are cities with constant imperviousness and declining populations. Within each of the ICLUS scenarios, some cities show substantial population declines, while the aggregate impervious area for these shrinking cities is projected to remain constant. It is clear that these shrinking cities drive the empirical relationship between population and imperviousness and contribute to our result that the estimated relationship between projected population and imperviousness is not consistent across scenarios, nor is it generally consistent with the theoretical expectation that β = 5/6.
We then test the population/imperviousness relationship only for the subset of cities where ICLUS projects population increases between 2010 and 2100. Table 1 shows the results from estimating Equation 6 for this subset of cities in each ICLUS scenario. We note that on this restricted data set, β estimates are still lower than that expected by theory but are more consistent across ICLUS scenarios. They are collectively indistinguishable from each other.
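The subsetting step can be sketched as below. The sketch assumes that “population increases between 2010 and 2100” means the 2100 projection exceeds the 2010 value; the data layout, function name, and city labels are hypothetical.

```python
def growing_cities(pop_by_city, start=2010, end=2100):
    """Keep cities whose projected population is higher at `end` than at `start`."""
    return {
        city: series
        for city, series in pop_by_city.items()
        if series[end] > series[start]
    }


# Hypothetical projections: {city: {year: population}}
projections = {
    "city_a": {2010: 500_000, 2100: 800_000},
    "city_b": {2010: 300_000, 2100: 250_000},
}
kept = growing_cities(projections)
```

Only the retained cities would then enter the scaling regression for a given scenario, which is why the number of cities in Table 1 differs across ICLUS scenarios.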
In sum, we find that ICLUS projections of the relationship between population and aggregate impervious area are broadly consistent with scaling results, with a somewhat lower β coefficient than expected by theory and observed empirically. This suggests that the method ICLUS uses to project changes in the population/imperviousness relationship is consistent with processes that result in agglomeration outcomes. The major exception is shrinking cities, but the assumption of constant imperviousness is reasonable, given the lifespan of concrete and asphalt and the challenge of removing it. Planned urban shrinkage, or regreening, is a complex policy topic because urban population declines are not geographically contiguous. Withdrawal of services and infrastructure in any place has disproportionately harmful effects on the people who live there, often people who are already marginalized. Nonetheless, these results give a quantitative estimate of the extent to which shrinking cities are maintaining economically and environmentally costly infrastructure no longer needed to serve their current population.
5 Conclusions
This study is the first application of urban scaling theory to relate urban population to impervious surface area. We demonstrate that there is a statistically robust sublinear scaling relationship between population and urban impervious surface across all four available National Land Cover Database data products over the 15 years from 2001–2016. This suggests that the underlying process described by urban scaling theory does apply to the relationship between population and imperviousness. We next show that the Integrated Climate and Land Use Scenarios (ICLUS) urbanization projections are also broadly consistent with urban scaling theory for cities in which population is projected to increase.
Perhaps most importantly, these results demonstrate that urban scaling theory can be used to validate other models' predictions of urban growth and land cover change, analogous to the ways in which allometric scaling laws in biology have been used to validate process-based models of ecosystem composition under different climate scenarios. Because human-driven systems have high uncertainty due to a wide range of possible decision-making outcomes, predictions based on these decision scenarios are difficult to evaluate. We believe this work demonstrates an innovative and impactful approach to provide some confidence in scenarios driven by human decisions and that this general approach offers broad potential for use across the field of climate and global change.
Acknowledgments
This research was sponsored by the DOE Office of Science as a part of the research in Multi-Sector Dynamics within the Earth and Environmental System Modeling Program.
Open Research
Data Availability Statement
In accordance with FAIR data availability practices, the three data sets for this research are included in (1) Yang et al. (2018) (available at https://www.mrlc.gov/data/nlcd-imperviousness-conus-all-years); (2) EPA, U. (2017) (available at https://www.epa.gov/gcx/iclus-downloads); and (3) the U.S. census (available at https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-total.html).