Breaking Down the Computational Barriers to Real-Time Urban Flood Forecasting
Abstract
Flooding impacts are on the rise globally, and concentrated in urban areas. Currently, there are no operational systems to forecast flooding at spatial resolutions that can facilitate emergency preparedness and response actions mitigating flood impacts. We present a framework for real-time flood modeling and uncertainty quantification that combines the physics of fluid motion with advances in probabilistic methods. The framework overcomes the prohibitive computational demands of high-fidelity modeling in real-time by using a probabilistic learning method relying on surrogate models that are trained prior to a flood event. This shifts the overwhelming burden of computation to the trivial problem of data storage, and enables forecasting of both flood hazard and its uncertainty at scales that are vital for time-critical decision-making before and during extreme events. The framework has the potential to improve flood prediction and analysis and can be extended to other hazard assessments requiring intense high-fidelity computations in real-time.
Key Points
- There is presently no means to forecast urban flooding at high resolution due to prohibitive computational demands and data uncertainties
- Proposed framework combines high-fidelity modeling and probabilistic learning to forecast flood attributes with uncertainty in real-time
- The framework can be extended to other real-time hazard forecasting requiring high-fidelity simulations of extreme computational demand
Plain Language Summary
Currently, we cannot forecast flooding depths and extent in real-time at a high level of detail in urban areas. This is the result of two key issues: detailed and accurate flood modeling requires a lot of computing power for large areas such as a city, and uncertainty in precipitation forecasts is high. We present an innovative flood forecasting method that resolves flood characteristics with enough detail to inform emergency response efforts such as timely road closures and evacuation. This is achieved by performing complex analysis of information on flooding impacts well before a future storm event, which subsequently allows much faster predictions when flooding actually happens. This approach completely changes the demand for required resources, replacing the nearly impossible burden of computation in real-time with the easy problem of data storage, feasible even with a low-end computer. Example results for Hurricane Harvey flooding in Houston, TX, show that predictions of both flood hazard and uncertainty work well over different areas of the city. This approach has the potential to provide timely and detailed information for emergency response efforts to help save lives and reduce other negative impacts during major flood events and other natural hazards.
1 Introduction
Between 1995 and 2015, flooding affected ∼2.3 billion people, its impacts totaled US$662 billion (CRED-UNISDR, 2015), and flooding in densely populated areas has remained one of the deadliest of all weather-related hazards (Doocy et al., 2013; Figures S1a and S1b). Analyses from the National Oceanic and Atmospheric Administration (NOAA) reveal that three quarters of fatalities from 1995 to 2017 are classified as "driving" and "in water" (Table S1). These deaths typically occur on flooded roads and could potentially be avoided if real-time information on conditions in the impacted area were available (Sanders et al., 2020).
The number of global extreme floods is on the rise (Figure S1c, Text S1). While the co-occurrence of various factors makes it difficult to determine their relative importance in this trend (Mallakpour & Villarini, 2015), there is clear evidence that the global hydroclimate is undergoing intensification (Fischer & Knutti, 2016; Pfahl et al., 2017), and changes in flooding are consistent with the distribution of changes in precipitation extremes (Peterson et al., 2013; Sanderson et al., 2019).
Established and rapidly expanding urban areas have been hotspots of flood impacts, and estimates indicate that the number of people residing in areas of high flood risk will reach 2 billion within two generations (Groeve et al., 2015). Flood impacts are thus poised to continue escalating in the future (Bevacqua et al., 2019; Moftakhari et al., 2017). Emerging needs include: (a) understanding how urban environments affect the propagation of extreme floods in order to inform mitigation measures, and (b) engineering comprehensive modeling capabilities to support real-time decision making in the times immediately before, during, and after flood events. Responding to both needs requires spatially explicit information on flood dynamics within urban areas at a level of detail that can inform both individual human decisions and the efforts of emergency management personnel charged with public safety as they prepare for or react to flooding. We refer to the spatial scales of flood detail useful for and consistent with these activities as "human action" scales, within the range characteristic of a human dwelling, an evacuation route, or a vehicle, that is, 𝒪(10⁰–10¹) m. The short duration of most floods (hours to several days) offers only a narrow window for planning and response measures. Quantitative flood information thus needs to be available to emergency management personnel in short order to enable timely responses and to appropriately communicate with the public to shape awareness and risk perceptions (Sanders et al., 2020). Nearly all operational frameworks focus on streamflow as the flood variable of key interest (Maidment, 2017; Salas et al., 2018), and thus there are presently few, if any, systems poised to make forecasts of flooding at "human action" scales in real-time, despite significant advances in numerical weather prediction (Benjamin et al., 2018) and meteorologic hazard "now-casting" (Kirschbaum & Stanley, 2018).
Numerical modeling of runoff generation and overland flow has a rich history that dates back to the 1970s. Recent developments have led to the integration of watershed hydrology and flow hydrodynamics in comprehensive, first-principles-based "flood models" (e.g., Brunner, 2016; Glenis et al., 2018; Kim et al., 2012; Ogden et al., 2015; Sanders & Schubert, 2019, to name a few). They however require advanced modeling techniques, and their solution is computationally intensive (see Rosenzweig et al., 2021). Flood modeling in urban areas remains particularly difficult, as the built environment creates a complex mosaic of hydrologic and hydraulic conditions. Surface flow confluences and obstacles due to buildings, bridge piers, flood control structures, and highly heterogeneous "patches" of developed and natural land cover typify urban environments. Adequate representation of these features and of their impact on physical dynamics in computational models necessitates mesh resolutions at the 𝒪(10⁰–10¹) m scale, with 𝒪(10⁴–10⁶) computational cells per square km (Glenis et al., 2018; Schubert & Sanders, 2012). Considering that a small time step is required for accuracy and stability, flood models pose an enormous computational burden even for moderate-size efforts to represent a whole city or region (𝒪(10³–10⁴) km²), and representation of larger areas remains infeasible (Text S2). New methods of upscaling fine-resolution land surface data in flood models show potential, but they may still not be able to capture small-scale structures such as flood walls and levees (Sanders & Schubert, 2019).
Importantly, substantial uncertainty can exist in individual rainfall forecasts (Cloke & Pappenberger, 2009), necessitating ensemble simulations to ensure accuracy and uncertainty assessment in flooding estimates. The computational means typically available to a flood modeler, a workstation, a computational cluster, or cloud computing infrastructure, all continue to be modest for this task. Thus, despite computational advancements (Glenis et al., 2013; Neal et al., 2010; Vivoni et al., 2011; Wittmann et al., 2017) and increasing evidence that flood simulations at “human action” scale are useful for shaping awareness and perceptions of flooding and informing decision-making (Sanders et al., 2020), real-time flood forecasting with appropriate uncertainty quantification has been impossible so far (Echeverribar et al., 2019; Hosseiny et al., 2020; Wing et al., 2019). A fundamental change in the approach to modeling urban flooding is needed in order to address these challenges.
2 Overcoming the Computational Barriers
2.1 A Framework for Real-Time Urban Flood-Forecasting
We propose a framework for real-time urban flood modeling and uncertainty quantification that combines the physics of fluid motion with recent advances in probabilistic learning. This approach overcomes the computational limitations that preclude flood forecasting at "human action" scales.
The framework contains six steps outlined in Figure 1. In Step 1, the relevant uncertain inputs of a flood-resolving model of high fidelity are defined and their probability distributions are used to run the model a large number of times (e.g., 𝒪(10²–10³)) to simulate runoff generation and surface flow hydrodynamics (for example, historical storm data can be used in these simulations). Step 2 creates a set of surrogate models of reduced complexity (e.g., polynomial functions, Text S3) from the response of the high-fidelity model to inputs in Step 1. Each surrogate model aims to mimic a single flooding variable, a specific quantity of interest (QoI) such as water depth at a given time and location: a road intersection, freeway underpass, or floodplain. Optionally, one can perform inference (Step 3a, Text S4) if observed data are available and can be mapped onto the state-space of the flooding model. This can be done, for instance, to infer hydraulic or hydrologic properties of the area that are input to the model, and account for their uncertainty (Step 3b). While this was not done in the current study to keep the narrative streamlined and focused on the proposed novel framework, previous research indicates the high efficacy of surrogate-aided Bayesian inference (e.g., Dwelle et al., 2019; Sargsyan et al., 2015; Sargsyan et al., 2019). Once surrogate models are created, inputs of the high-fidelity model can be supplied to the surrogates (Step 4) to perform flood forecasting for any actual event at a significantly lower computational cost. Trained surrogates enable real-time estimation of flooding variables with full uncertainty quantification (Step 5) and assessment of sensitivity (Step 6, optional).
By decomposing the response of the high-fidelity model into a set of targeted quantities of interest, the framework achieves progressive "learning" of the underlying physics with computationally inexpensive surrogate models, thereby enabling probabilistic space-time assessment of flooding variables.
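The offline/online split of Steps 1, 2, 4, and 5 can be sketched in a few lines of Python. The one-input "flood model," its two QoIs, and all numbers here are hypothetical stand-ins for illustration, not the models used in this study:

```python
import numpy as np
from numpy.polynomial import hermite_e as H

rng = np.random.default_rng(0)

def flood_model(xi):
    """Hypothetical stand-in for the high-fidelity model: one uncertain
    input (standard-normal germ) -> two QoIs (depths at two cells, m)."""
    return np.stack([0.4 + 0.2 * xi, 0.9 + 0.5 * xi + 0.05 * xi**2], axis=1)

# Step 1 (offline): sample the uncertain input, run the "expensive" model.
xi_train = rng.standard_normal(300)
y_train = flood_model(xi_train)

# Step 2 (offline): one polynomial surrogate per QoI, fit by least squares.
Phi = H.hermevander(xi_train, 3)                       # degree-3 Hermite basis
coefs = np.linalg.lstsq(Phi, y_train, rcond=None)[0]   # shape (4, n_qoi)

# Steps 4-5 (real-time): propagate fresh input samples through the cheap
# surrogates to obtain probabilistic flood estimates for each QoI.
xi_event = rng.standard_normal(10_000)
pred = H.hermevander(xi_event, 3) @ coefs
p05, p50, p95 = np.percentile(pred, [5, 50, 95], axis=0)
```

The expensive part (300 model runs) happens once, offline; the real-time part is only polynomial evaluation, so 10,000 samples per QoI are effectively free.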

Workflow of a framework for real-time flood forecasting with uncertainty quantification. A set of uncertain inputs X is defined for a high-fidelity flood model to simulate runoff generation and surface flow hydrodynamics. A surrogate is a model of reduced complexity that is trained to represent any output quantity of interest (QoI) based on the model response, given uncertain inputs X. Observed data linkable to the outputs of the surrogate can be used to learn the distributions of uncertain inputs of the model, resulting in inference (Text S4). This computational effort occurs during an inter-flood training period (blue box). Trained surrogates permit computational feasibility in real-time (pink box), including propagation of uncertainties in X to get probabilistic estimates of QoIs. Pink box, bold face type: elements used in this study.
A key element of the proposed approach is the reallocation of high-performance computations from immediately prior to or during a flood event to the time interval(s) between major events (Figure 1, blue box). The temporal shift of arduous calculations to times other than the occurrence of floods critically changes the requirements of computational resources: instead of needing large numbers of processing cycles during the flooding event when resources are likely to be limited, one can perform these computations prior to the event and store simulation results. Effectively, the challenge of real-time computational burden is replaced with the problem of data storage to save the constructed surrogates and ensure straightforward access to their parameters.
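A back-of-envelope sketch illustrates why the storage side of this swap is trivial. The surrogate count below matches this study's QoI total (127,026), while the 35-coefficient polynomial size is an assumed, illustrative value:

```python
import os
import tempfile
import numpy as np

# Back-of-envelope for the computation-to-storage swap. The surrogate count
# matches this study's QoI total; 35 coefficients per polynomial surrogate
# is an assumed illustrative size, not the study's actual value.
n_surrogates, n_coeff = 127_026, 35
coeff_table = np.zeros((n_surrogates, n_coeff))    # placeholder coefficients

path = os.path.join(tempfile.mkdtemp(), "surrogates.npy")
np.save(path, coeff_table)

size_mb = os.path.getsize(path) / 1e6  # ~36 MB: trivial on any modern machine
```

Even the full surrogate library for all QoIs fits in tens of megabytes of float64 coefficients, which any low-end computer can store and load in well under a second.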
We note that there has been a gradual percolation of surrogate modeling approaches into the domain of water resource applications (Razavi et al., 2012), but the few relevant studies (Berkhahn et al., 2019; Bermúdez et al., 2019; Kalinina et al., 2020; Zahura et al., 2020) have so far been case-specific and limited in their ability to provide uncertainty quantification of modeled outputs or parametric inference. The framework independently developed in this research provides a unifying, general platform for real-time flood-forecasting modeling with interpretable quantification of the uncertainty. Our prior work (Dwelle et al., 2019; Sargsyan et al., 2014; Tran et al., 2020) offers proof of the viability of individual elements in Figure 1, as well as specific details relevant to their implementation (omitted in this paper).
To illustrate, the framework is applied to the extraordinary urban flooding that occurred in August 2017, during Hurricane Harvey, in Houston, Texas. To mimic an operational setting of the real-world situation, we use quantitative precipitation forecasts (QPFs) provided by NOAA at the event onset. By treating this forecast as uncertain input, alternative realizations are generated and serve as input into a set of pre-trained surrogate models to provide probabilistic flooding estimates (see Section 3). The efficacy of the framework is demonstrated by illustrating simulated and observed streamflow, inundation patterns, and by quantifying both the uncertainties and the computational effectiveness of the simulations.
2.2 Pretraining of the Model in the Calm Before the Storm
Many different types of surrogate models can be trained to represent outcomes of complex dynamics (Ghanem et al., 2017). Any scalar output can be used as a QoI, requiring development of a surrogate model for each QoI. In the flood forecasting context, surrogates can be trained to mimic relevant QoIs that are direct or derived outputs of the high-fidelity flood model such as discharge, level, pressure, shear stress, etc. Once a training set of results from the high-fidelity model is generated (performed only once, unless the model structure changes), surrogate models can be constructed and evaluated very quickly, even with a low-end computer.
One surrogate form that offers flexibility in construction and simplicity in computation is the polynomial chaos expansion (PCE) of a quantity of interest (Ghanem & Spanos, 2003). PCE-based surrogate models offer several advantages for making probabilistic assessments. First, they can represent the complex, nonlinear behavior of high-fidelity models, as long as the input-output relationship is reasonably smooth. They also provide global sensitivity analysis (Sudret, 2008) and uncertainty decomposition without need for additional simulations of either the surrogate or high-fidelity models beyond those that are required for QoI analysis. This allows identification of the major uncertainty contributors as the primary drivers causing modeled variations in flooding response. Finally, the polynomial form greatly reduces computational effort: the time it takes to train the surrogate model and use it to predict a QoI can be many orders of magnitude lower as compared to a high-fidelity model simulation. Importantly, these computational savings allow for uncertainty quantification to take place in real-time, as the polynomial surrogates can be rapidly sampled thousands of times to construct empirical probability distributions of flood response.
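A minimal PCE sketch for a single QoI and one uncertain input illustrates these properties: coefficients are fit by least squares on a Hermite basis, and the mean and variance then follow analytically from the coefficients, with no extra sampling. The toy "high-fidelity" model and all numbers are invented for illustration:

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as H  # probabilists' Hermite basis

rng = np.random.default_rng(0)

# Hypothetical stand-in for the high-fidelity model: one standard-normal
# germ xi mapped to a QoI (e.g., peak water depth in m); invented numbers.
def high_fidelity_qoi(xi):
    return 0.5 + 0.3 * xi + 0.1 * np.tanh(2.0 * xi)

# Offline training: sample the germ and run the "expensive" model.
xi_train = rng.standard_normal(200)
y_train = high_fidelity_qoi(xi_train)

# Fit degree-4 PCE coefficients by least squares.
order = 4
coef = np.linalg.lstsq(H.hermevander(xi_train, order), y_train, rcond=None)[0]

# Real-time use: the surrogate is a cheap polynomial evaluation, sampled
# thousands of times to build an empirical distribution of the QoI.
xi_new = rng.standard_normal(10_000)
y_surr = H.hermeval(xi_new, coef)

# Mean and variance follow analytically from the coefficients, because the
# Hermite basis is orthogonal w.r.t. the standard normal: E[He_k^2] = k!.
norms = np.array([factorial(k) for k in range(order + 1)], dtype=float)
pce_mean = coef[0]
pce_var = np.sum(coef[1:] ** 2 * norms[1:])
```

The analytic moments are what make PCE attractive for UQ: the empirical distribution from `y_surr` and the coefficient-based moments agree without any further high-fidelity runs.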
The space-time resolution of the high-fidelity flood model may result in a large number of QoIs to be considered (e.g., spatial variations in inundation depth). One can decrease this number using dimensionality reduction methodologies by treating spatio-temporal outputs with strong correlations as stochastic fields. A dimensionality reduction approach (e.g., Karhunen-Loève decomposition, Text S5) uses the mean of the field and decomposes the variation around it using only a few eigenvalues and eigenfunctions of the field's covariance function. This approach can greatly reduce the number of QoIs, and therefore the number of surrogate models that need to be constructed and run during the forecast, saving hours of computational time.
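The Karhunen-Loève idea can be illustrated with an empirical (PCA-style) decomposition of a synthetic depth-field ensemble. The field below is deliberately built from three smooth spatial modes, so three KL terms capture it essentially exactly; all sizes and shapes are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ensemble of inundation-depth fields (n_sim runs x n_cell cells),
# built from 3 smooth spatial modes so 3 KL terms suffice by construction.
n_sim, n_cell = 200, 500
x = np.linspace(0.0, 1.0, n_cell)
modes = np.stack([np.sin((k + 1) * np.pi * x) for k in range(3)])
weights = rng.standard_normal((n_sim, 3)) * np.array([1.0, 0.5, 0.2])
depth = 1.0 + weights @ modes          # 1 m mean field plus fluctuations

# Empirical Karhunen-Loeve: eigendecomposition of the sample covariance.
mean_field = depth.mean(axis=0)
anom = depth - mean_field
eigval, eigvec = np.linalg.eigh(anom.T @ anom / (n_sim - 1))
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # sort descending

# Reduced QoIs: 3 KL coefficients per run instead of 500 per-cell depths.
k = 3
kl_coeff = anom @ eigvec[:, :k]
depth_rec = mean_field + kl_coeff @ eigvec[:, :k].T
frac_var = eigval[:k].sum() / eigval.sum()
```

In the forecasting context, surrogates would then be trained on the few KL coefficients rather than on every cell, and full fields recovered by the linear reconstruction above.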
Quantification of uncertainty sources and their impacts in QoI forecasts is a vital element of flood forecasting. Uncertainty quantification (UQ) is particularly relevant for urban areas to inform decision-making processes such as planning evacuation routes, manipulating flood control structures, or carrying out rescue operations. To illustrate UQ within the novel flood forecasting framework, we address the uncertainty associated with the precipitation forecast. Arguably, this is one of the most important sources of input uncertainty (Pappenberger et al., 2005): even postevent, observation-informed analyses emphasize the issue of high variability among estimates (Text S6). The UQ approach is not limited to precipitation uncertainty only; similar applications in other contexts have demonstrated the approach's efficacy under multiple sources of uncertainty (Dwelle et al., 2019; Sargsyan et al., 2014; Tran et al., 2020).
3 A Blueprint for Real-Time Urban Flood Forecasting: Hurricane Harvey Case Study
Hurricane Harvey produced historic rainfall of more than 150 cm over southeastern Texas, causing extensive flooding and dozens of fatalities (Blake & Zelinsky, 2018). We demonstrate the utility of the proposed flood forecasting framework in an urbanized watershed (Figure 2a) in the greater Houston area that experienced heavy flooding (Figures 2b and 2c). We concentrate on three locations of interest within the watershed (Figure 2d, Text S7). Subarea 1 focuses on transportation infrastructure along a potential escape route to Interstate 610; Subarea 2 captures an area of high inundation around the White Oak Bayou in downtown Houston; and Subarea 3 is in a high-density development impacted by inundation due to poor drainage. During pretraining, 1,000 simulations of the high-fidelity hydrologic and hydrodynamic flood model (see Text S8 and Kim et al., 2012) were carried out to represent flow conditions at the 𝒪(10⁰–10¹) m scale (Figure 2d). The number of necessary simulations can be assessed based on the number of uncertain inputs (e.g., see Dwelle et al., 2019; Sargsyan et al., 2014). Simulation outputs were used to construct surrogate models mimicking inundation in the three subareas and streamflow at the watershed outlet. Importantly, time series of input rainfall in pretraining simulations did not contain any information about the event (Text S9): by using a series of uncorrelated pulses as input, the goal is to illustrate the efficacy of pretraining (Figure 1) that remains "ignorant" of the circumstances of a flooding event for which a forecast would have to be issued.

Case study watershed and event simulation with the high-fidelity model. (a) Land use in the greater Houston area. The white line delineates the study watershed. (b) Distribution of rainfall from gage-adjusted radar rainfall at 11 UTC on August 27, 2017. (c) Event inundation depth for Subarea 2 shown in subplot 2d (FEMA data; Text S1). (d) Modeled inundation depth in the three watershed subareas at 23 UTC on August 27, 2017, using the high-fidelity model with gage-adjusted radar rainfall as input (white areas indicate buildings/structures). (e) Outlet streamflow series based on USGS measurements ("USGS streamflow") and obtained with the high-fidelity model ("tRIBS-OFM") with radar rainfall as input; the Nash-Sutcliffe model efficiency coefficient (N-Smec, Nash & Sutcliffe, 1970) is 0.8. The area shaded in blue is the 5%–95% region of uncertainty associated with the stage-discharge relationship (Kiang et al., 2018, Text S1). The top of panel (e) shows basin-averaged precipitation: gage-adjusted radar, IMERG satellite-based observations, and the High-Resolution Rapid Refresh (HRRR) numerical weather model forecast (Text S6).
Water depth is considered to be a primary QoI. Surrogate models were trained for each computational cell of the three subareas to represent spatial variations (Figure 2d) for each of 18 consecutive simulated hours. This was done for 7,057 cells, resulting in a total of 127,026 QoIs. Streamflow at the watershed outlet (Figure 2e) for each half-hour represents another set of QoIs. The forecast uncertainty due to uncertain rainfall is estimated for each QoI.
We emulate a real-time operational application of the framework in what is expected to be a typical forecast setting. The QPF from the High Resolution Rapid Refresh (HRRR) system (Benjamin et al., 2016) provides an 18-h rainfall forecast in the study area. Issued at the onset (0:00 UTC on August 27) of the storm's heaviest precipitation, this rainfall scenario is used to construct a stochastic description of the event precipitation process, providing its uncertainty (Figure 3a, Text S9). By sampling from the stochastic process, a set of rainfall realizations is generated for the forecast period and used as input into the surrogate models (Figures 3a and 3b) to forecast QoIs at different times within the 18-h window.
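One simple way to realize such a stochastic description of the forecast, sketched here under assumed kernel choices (not the study's calibrated settings, which are detailed in Text S9), is a Gaussian process centered on the QPF with a squared-exponential temporal correlation, with realizations clipped at zero:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 18-h forecast at 30-min steps (mm/h) standing in for the
# HRRR QPF; the bell shape and all numbers below are invented.
t = np.arange(36) * 0.5
forecast = 20.0 * np.exp(-0.5 * ((t - 9.0) / 3.0) ** 2)

# Stochastic rainfall: Gaussian process centered on the forecast, with a
# squared-exponential correlation (assumed 2-h length scale) and an
# assumed 30% relative standard deviation.
ell = 2.0
sigma = 0.3 * forecast
corr = np.exp(-0.5 * ((t[:, None] - t[None, :]) / ell) ** 2)
cov = sigma[:, None] * corr * sigma[None, :]
L = np.linalg.cholesky(cov + 1e-6 * np.eye(t.size))  # jitter for stability

# Draw rainfall realizations and clip: rainfall cannot be negative.
n_real = 10_000
realizations = forecast + rng.standard_normal((n_real, t.size)) @ L.T
realizations = np.clip(realizations, 0.0, None)
```

Each row of `realizations` is one candidate rainfall scenario to be pushed through the surrogates; the ensemble of surrogate outputs then yields the probabilistic QoI forecasts.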

Validation and forecasting with surrogate models. (a) Left: bars illustrate High-Resolution Rapid Refresh (HRRR) forecast rainfall rate, whiskers show two standard deviations of the Gaussian process (GP) fitted to the forecast rainfall (Text S9). The red dashed line shows the mean streamflow estimated using trained polynomial chaos expansion (PCE) surrogate models for 36 30-min intervals, using the HRRR precipitation forecast as input (N-Smec is 0.68). The area shaded in pink is the 5%–95% posterior probability region obtained from the surrogates that used 10,000 realizations of rainfall input; the area shaded in blue is the 5%–95% region of uncertainty associated with the stage-discharge relationship (Kiang et al., 2018, Text S1). Right: validation of trained surrogate models for 36 half-hourly outlet discharges using 150 validation simulations of the high-fidelity model with uncorrelated rainfall series as input (Text S9). (b) Median depth and its uncertainty. Top panel: inundation depth at hour 12 UTC on August 27, 2017 for the three regions estimated using surrogate models, with the HRRR precipitation forecast used as input. Bottom panel: the uncertainty of the depth estimates expressed as the difference between the 95% and 5% quantiles of the posterior distribution for each location obtained from the surrogates that used 10,000 realizations from the GP. (c) Validation of the inundation depths simulated with the trained surrogate models using 150 simulations for the three regions of interest shown in (b) at hour 12 UTC on August 27, 2017. Blue-colored scatterplots: a comparison of each cell quantity of interest (QoI) (7,057 in total) from the high-fidelity model (X-axis) and the surrogate models (Y-axis). Green-colored scatterplots: depths from the surrogate models (Y-axis) constructed using the QoI dimensionality reduction method (described in Text S5). Depths are in m. In all regression plots, the coefficient of determination is higher than 0.999.
To convey the full utility of this approach, we discuss the emulated forecasts and their probabilistic nature, also addressing how they can be used by different stakeholders.
Stemming from the uncertain rainfall of the 18-h lead forecast, the streamflow estimates reflect the uncertainty bounds of the QPF (Figure 3a). They can be vital in decision making that, for example, aims to optimally control water volume in the domain. The estimated streamflow bounds may not always span the observed streamflow: the latter is itself highly uncertain during extreme floods (Text S1).
The inundation "forecast" for Subarea 1 (Figure 3b, top left) focuses on the vicinity of the entrance ramp to Interstate 610. A timely, spatially detailed forecast for this area can be assessed by a flood response team to identify potential evacuation routes and the safest course for vehicle passage. Forecasted water depth in the White Oak Bayou Subarea 2 unsurprisingly exhibits higher flow depths in the river channel. It also shows larger associated uncertainty in the inundation extent in the surrounding areas and in downstream levels (Figure 3b, bottom). However, the uncertainty of the forecasted inundation does not scale with depth: there are areas where the prediction has relatively low uncertainty for high flow conditions and vice versa. Inundation extent in the commercial district (Figure 3b, top right) illustrates that even without proximity to a channel, poor drainage characteristics can promote local inundation, commonly referred to as "pluvial flooding," an important but poorly understood phenomenon because of the localized nature of its occurrence.
The flood-resolving information shown in Figure 3b would have limited value if it could not be produced in real-time. Despite the extreme computational demand of the hydrodynamic model, the framework demonstrated here reduces the cost of the proposed real-time solution for a variable of interest (i.e., a QoI surrogate model) by 2–4 orders of magnitude; for example, it takes only a few seconds to run all 127,026 surrogate models (Text S10).
While each of the PCE-based surrogates developed here to mimic the high-fidelity model is computationally inexpensive and quite accurate (Figure 3c, left panels), running many thousands of them with full UQ (i.e., 127,026 surrogates, each evaluated for many thousands of scenarios) can be burdensome. To further assure real-time feasibility, one can significantly reduce the dimensionality of the QoIs, for example, the spatial distribution of water depth. Specifically, the number of QoIs can be reduced to only three for each subarea and each hour of the 18-h period (Text S5). With dimensionality reduction of the QoI set, the computational effort can be reduced by a further three orders of magnitude, making the execution feasible even with low-performance computational systems available to any practitioner. Validation of this truncated set of 54 surrogate models confirms that such a reduction does not lead to an appreciable loss of accuracy (Figure 3c, right panels).
4 Discussion
The results presented here demonstrate the practical utility of complex, high-fidelity hydrodynamic models for real-time flood forecasting with full uncertainty quantification at the level of detail relevant for human decision-making immediately before, during, or after a flood occurs.
The central premise of the proposed framework is that the most intense calculations can be performed offline during a period that does not have the urgency of an impending or ongoing extreme flood event. A crucial methodological point is that training of reduced-order models can occur without knowledge of the details of a future flooding event, an important feature in a non-stationary climate. Specifically, the skill of the trained surrogate models is remarkable even though the rainfall input used to generate the high-fidelity training outputs is represented as a series of uncorrelated pulses. Incorporating realistic storm structures with embedded correlations (Fatichi et al., 2011; Peleg et al., 2017) would be a logical framework extension.
A "bridge" from the phase of model training to the demands of real-time forecasting takes the form of storage and manipulation of high-fidelity model outputs and polynomial information for the trained surrogate models. We show that this problem is trivial (Text S10) owing to the affordability of modern storage systems.
We note that while surrogates are becoming widespread in computationally intensive studies of physical models, their application as the carrier of the uncertainty information for global sensitivity analysis and model parameter inference has a particularly promising potential for urban flood modeling. The richness of high-fidelity model solutions and the computational efficiency permit explicit uncertainty quantification for variables beyond those demonstrated here (streamflow and inundation depth). They may contain information critical for real-time assessments of flooding impacts, such as flow dynamic pressure, velocity, water volume within districts, etc. Additionally, the sensitivity assessment module embedded within the UQ framework (Figure 1) allows for a formal analysis of QoI dependence on the uncertain inputs. For instance, the case study assumed rainfall to be the dominant source of input uncertainty and sensitivity analysis could associate fractions of the QoI forecasted error bars with contributions from uncertain rainfall at different input intervals. Highlighting the periods during which input rainfall has the strongest impact on a QoI would call for focusing efforts on the quality of rainfall forecasts within these periods in real-time.
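Within a PCE, such variance-based (Sobol) sensitivity indices follow directly from the surrogate coefficients, with no extra model runs. A toy sketch with two uncertain "rainfall interval" inputs, one deliberately dominant (the QoI function and all numbers are invented):

```python
import numpy as np
from itertools import product
from math import factorial
from numpy.polynomial import hermite_e as H

rng = np.random.default_rng(3)

# Invented QoI depending on rainfall in two forecast intervals (xi1, xi2),
# with xi1 made deliberately dominant.
def qoi(xi):
    return 1.0 + 0.8 * xi[:, 0] + 0.2 * xi[:, 1] + 0.1 * xi[:, 0] * xi[:, 1]

xi = rng.standard_normal((500, 2))
y = qoi(xi)

# Tensor Hermite basis, degree <= 1 per input (exact for this toy QoI).
multi_idx = list(product(range(2), range(2)))      # (0,0),(0,1),(1,0),(1,1)
Phi = np.stack([H.hermeval(xi[:, 0], np.eye(2)[i])
                * H.hermeval(xi[:, 1], np.eye(2)[j])
                for i, j in multi_idx], axis=1)
coef = np.linalg.lstsq(Phi, y, rcond=None)[0]

# Each term's variance share is coef^2 times the basis-norm product i!*j!;
# first-order Sobol indices sum the shares exclusive to each input.
norms = np.array([factorial(i) * factorial(j) for i, j in multi_idx], float)
var_terms = coef ** 2 * norms
total_var = var_terms[1:].sum()                    # exclude the mean term
S1 = sum(v for (i, j), v in zip(multi_idx, var_terms) if i > 0 and j == 0) / total_var
S2 = sum(v for (i, j), v in zip(multi_idx, var_terms) if j > 0 and i == 0) / total_var
```

In the forecasting setting, the inputs would be rainfall perturbations in different intervals, and the resulting indices attribute fractions of the forecast error bars to each interval, exactly the decomposition described above.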
Importantly, given observations, the framework facilitates model inference (Figure 1, Text S4), even when data are diverse and disparate (e.g., Dwelle et al., 2019). Although not explicitly demonstrated in this study to keep the framework application straightforward, streamflow, stage, and flood areal extent data, for example, can help improve the representation of channel and land hydraulic properties using Bayesian inference. The addition of new data can lead to continuous adaptation of the high-fidelity model and surrogate machinery (by re-running the less computationally costly inference problem) and, likely, a reduction of predictive uncertainty (Tran et al., 2020). Further, model-aided optimization of drainage and flood control characteristics can be incorporated into "flood-smart" urban design and disaster management planning.
The discussed framework is not without its challenges. Although there is a reduction in the computational resources required during the real-time forecast, access to high-performance computing facilities is needed to run thousands of simulations of the high-fidelity model, train surrogates on the desired QoIs, and carry out inference.
Additionally, high-fidelity models never perfectly represent all real-world physical processes, and therefore model error (or structural uncertainty) will always be present in simulations. Ignoring model error can lead to a significant bias in estimated parameter values and thus a lack of physical accuracy in the surrogates that reproduce the high-fidelity model. Nonetheless, one does not need to wait for physical flood models of choice to become perfect, as both the conventional model correction approaches (Kennedy & O'Hagan, 2001) and embedded model error methods (Sargsyan et al., 2019) can remove this model-error induced bias, and both types of methods can operate within our framework using the preconstructed surrogates.
Another challenge is the applicability of the approach to uncertainty spaces of much larger dimension, that is, when numerous model inputs and parameters need to be treated as uncertain. Surrogate models have been applied to such problems (Ricciuto et al., 2018) with the conclusion that they remain tractable if only a few parameters have an appreciable impact on the QoI variability. For extreme floods, experience shows that precipitation uncertainty dominates (Pappenberger et al., 2005), and methodological advancements in treating information from several sources (e.g., multimodel ensembles) continue to be warranted. We note that surrogate performance is not guaranteed to be flawless a priori. While theoretical considerations relate the number of uncertain inputs and the characteristics of surrogate polynomials (Xiu & Karniadakis, 2002; Text S3), the degree of smoothness of the high-fidelity model solution determines the effort required in surrogate training. In general, surrogate model construction may be challenged by overfitting when there is a large number of uncertain inputs and not enough training simulations due to the computational burden of the high-fidelity model. However, the simple parametric form of polynomials makes them less prone to overfitting than other surrogate model methodologies. Furthermore, sparse learning approaches, such as Bayesian compressive sensing (Dwelle et al., 2019; Sargsyan et al., 2014), facilitate adaptive selection of only the relevant polynomial terms in the surrogate, effectively enforcing Occam's razor and further reducing the likelihood of overfitting.
Lastly, significant alterations of the urban landscape that potentially affect its runoff and drainage characteristics (e.g., changes to water management infrastructure) can be accommodated within this framework. Specifically, the sensitivities (Dwelle et al., 2019) of flood metrics with respect to key variables associated with potential alterations can be estimated. In the limiting case, however, significant changes can render the stored surrogates (and data) obsolete, requiring new high-fidelity simulations to update the surrogates.
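A minimal sketch of such sensitivity estimation, assuming a hypothetical three-input surrogate for flood depth and a binning-based first-order variance decomposition (a Monte Carlo stand-in for the surrogate-based Sobol indices of Dwelle et al., 2019):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical surrogate: flood depth as a function of (rainfall, channel
# roughness, detention storage); the last variable stands in for a managed
# infrastructure change. Coefficients are illustrative only.
def surrogate(p, n, s):
    return 0.8 * p + 0.2 * p * n - 0.3 * s

N = 100_000
p, n, s = (rng.uniform(0.0, 1.0, N) for _ in range(3))
y = surrogate(p, n, s)

# First-order sensitivity of y to input x: variance of the conditional
# mean E[y|x], estimated by binning x, normalized by the total variance.
def first_order(x, y, bins=50):
    idx = np.digitize(x, np.linspace(0.0, 1.0, bins + 1)[1:-1])
    cond_means = np.array([y[idx == b].mean() for b in range(bins)])
    return cond_means.var() / y.var()

print([round(first_order(v, y), 2) for v in (p, n, s)])
```

Because the surrogate is cheap, such indices can be recomputed on demand to judge whether a proposed landscape alteration materially changes the flood metrics or leaves the stored surrogates valid.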
Floods represent nearly half of all global weather-related disasters (CRED-UNISDR, 2015), and the combination of urban growth and increasing rainfall extremes makes a compelling case to rethink the current flood forecasting paradigm. We propose a framework that builds on classical fluid dynamics and recent advances in probabilistic learning, incorporates the most up-to-date knowledge of the urban landscape, and is highly adaptive to the inclusion of additional data. It has the potential to drastically improve our ability to compute and understand, in real-time, the hazards posed by channel and pluvial floods and to assess their uncertainties. Furthermore, the approach can be readily extended to a broad class of geophysical hazard assessments whose prediction accuracy and uncertainty quantification in real-time continue to be constrained by extreme computational demand.
Acknowledgments
The authors acknowledge the support from Sandia National Laboratories and the Uncertainty Quantification group for their help with navigating UQ and UQTk. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. The authors acknowledge the constructive criticism of two anonymous reviewers that has led to substantial improvements of this manuscript. The study was seed-funded by the “Catalyst Program” of the Michigan Institute for Computational Discovery and Engineering at the University of Michigan, and partially supported by NSF grant 1725654 and a grant awarded by the Michigan Department of Natural Resources, Office of the Great Lakes, to V. Y. Ivanov. J. Kim was supported by grant 127554 from the Water Management Research Program funded by the Ministry of Environment of the Korean government. K. Sargsyan was supported in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program through the FASTMath Institute. G. Bisht was supported by the U.S. Department of Energy, Office of Science, Subsurface Biogeochemical Research (SBR) program through the SBR Scientific Focus Area project at Pacific Northwest National Laboratory.
Conflict of Interest
The authors declare no conflicts of interest relevant to this study.
Open Research
Data Availability Statement
All data used are cited in the manuscript and supplementary materials. Specifically, data on U.S. flood-related fatalities from 1995 to 2017 were downloaded from https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/. Global flood occurrence and severity data are from Brakenridge (2016). Maximum inundation depth data (FEMA, 2018) for the August 2017 flooding in Houston, Texas, are available at the HydroShare repository: https://www.hydroshare.org/resource/165e2c3e335d40949dbf501c97827837. River stage data were collected by the U.S. Geological Survey at gage 08074540: https://waterdata.usgs.gov/nwis/uv?site_no=08074540. Quantitative precipitation forecasts (QPF) from the High Resolution Rapid Refresh (HRRR) system are from Benjamin et al. (2016). Integrated Multi-satellitE Retrievals for GPM from NASA are from Huffman et al. (2015). NEXRAD (Next-Generation Radar) rainfall data are available at https://www.ncdc.noaa.gov/data-access/radar-data/nexrad-products. Rain gage time series are from Weather Underground (Station ID: KTXHOUST1941): https://www.wunderground.com/weather/us/tx/houston/KTXHOUST1941. The river channel network and Digital Elevation Model (DEM) at 3 m resolution are available at https://viewer.nationalmap.gov/. Houston building footprint data: https://koordinates.com/layer/12890-houston-texas-building-footprints/. The land use information is available from the National Land Cover Database 2016: https://www.mrlc.gov/data?f%5B0%5D=category%3Aland%20cover.