Implications of Simulating Global Digital Elevation Models for Flood Inundation Studies

The Shuttle Radar Topography Mission has long been used as a source topographic information for flood hazard models, especially in data‐sparse areas. Error corrected versions have been produced, culminating in the latest global error reduced digital elevation model (DEM)—the Multi‐Error‐Removed‐Improved‐Terrain (MERIT) DEM. This study investigates the spatial error structure of MERIT and Shuttle Radar Topography Mission, before simulating plausible versions of the DEMs using fitted semivariograms. By simulating multiple DEMs, we allow modelers to explore the impact of topographic uncertainty on hazard assessment even in data‐sparse locations where typically only one DEM is currently used. We demonstrate this for a flood model in the Mekong Delta and a catchment in Fiji using deterministic DEMs and DEM ensembles simulated using our approach. By running an ensemble of simulated DEMs we avoid the spurious precision of using a single DEM in a deterministic simulation. We conclude that using an ensemble of the MERIT DEM simulated using semivariograms by land cover class gives inundation estimates closer to a light detection and ranging‐based benchmark. This study is the first to analyze the spatial error structure of the MERIT DEM and the first to simulate DEMs and apply these to flood models at this scale. The research workflow is available via an R package called DEMsimulation.


Introduction
Digital elevation models (DEMs) are numerical representations of the bare-earth surface, but like all models are a simplification of reality. The Shuttle Radar Topography Mission (SRTM; Farr et al., 2007) is a freely available DEM covering the Earth's surface within latitudes of +60 ∘ N and −56 ∘ S at a resolution of 3 arc sec (≈90 m). It was collected over an 11-day period in February 2000 in a mission led by National Aeronautics and Space Administration (NASA). In late 2015, a 1 arc sec (≈30 m) product was released for areas outside the United States. Various versions exist including the original nonvoid filled SRTM V1, void filled products SRTM V2, SRTM V3, and the CGIAR Consortium for Spatial Information (CGIAR-CSI) developed version (Jarvis et al., 2008). In the near future the NASADEM (Crippen et al., 2016), which is a reprocessed version of the original SRTM data set, is due to be released. Other free global DEMs exist, such as Advanced Land Observing Satellite (Tadono et al., 2014) and Advanced Spaceborne Thermal Emission and Reflection Radiometer (Abrams, 2000), with the SRTM data set generally favored by geoscientists (particularly the CGIAR-CSI Version 4; Jarvis et al., 2008) due

10.1029/2018WR023279
Key Points: • Assessed vertical error and estimated semivariograms for MERIT and SRTM DEMs for 20 lowland locations • Calculated spatial error structure can be used to simulate floodplain in the MERIT and SRTM DEMs • Using simulated DEMs in flood models for two locations gives more realistic flood estimates compared to using a single DEM Supporting Information: • Supporting Information S1 • Figure S1   Shortridge (2006) Note. Vertical errors from each study are reported, either as root-mean square error (RMSE) or MAE (mean absolute error). Land cover refers to the land cover class of the location in each study. Inclusion of error assessment from vegetation or terrain or an analysis of spatial dependence is assessed on a yes/no basis. The stated figures only give a headline, and interested readers are referred to the referenced studies for more details on the methods used. SRTM = Shuttle Radar Topography Mission.
to greater feature resolution, reduced number of artifacts, lower noise, and better vertical accuracy (Jarihani et al., 2015;Rexer & Hirt, 2014;Schumann et al., 2014). Therefore, SRTM forms a crucial resource in providing elevation data to many hazard and risk assessment models, particularly in remote and data poor locations where high-resolution data such as LIDAR (light detection and ranging) either do not exist or are not freely accessible. Despite calls for a concerted effort to produce a more accurate free global DEM , there is little sign that such a data set will be produced soon. Thus, the SRTM data set remains the best option for elevation data for much of the Earth's surface, both now and for the foreseeable future.
Characterization of the MERIT/SRTM spatial error structure is interesting in its own right but crucial if MERIT/SRTM is to be used to simulate DEMs for the purposes of flood risk assessment. With a perfect DEM, and assuming, for simplicity, a perfect flood model, one run of the model would be required, after which each pixel would be classified either dry (probability = 0) or wet (probability = 1). An imperfect DEM introduces a source of uncertainty about whether a pixel is inundated and hence a range of probabilities between 0 and 1. In floodplains and river deltas, where topography is near flat, this uncertainty could be large, because small variations in the DEM could lead to large changes in inundation. DEM uncertainty can be assessed in a deterministic or stochastic way, by generating candidate DEMs that are consistent with the MERIT/SRTM DEM and its spatial error structure, running the flood model over each candidate and aggregating the results. Simulating candidates of the DEM does not create a single true DEM, where the truth refers to the true observed measurement, but instead provides a bound within which the true elevation value is likely to be found. Stochastic simulation of DEMs using the spatial error structure is a relatively well known idea in geostatistics (e.g., Fisher, 1998;Fisher & Tate, 2006;Hunter & Goodchild, 1997;Kyriakidis et al., 1999), with realizations of the DEM found to greatly affect surface derivatives (e.g., slope; Darnell et al., 2008;Davis & Keller, 1997;Hengl et al., 2008;Holmes et al., 2000;Oksanen & Sarjakoski, 2005;Veregin, 1997). Yet these studies have not analyzed the wealth of global DEM data now available (i.e., SRTM). In addition, DEM simulation has only been used sparingly in flood studies, with only Wilson and Atkinson (2005) using the approach with a hydrodynamic model. Several others (Fereshtehpour & Karamouz, 2018;Leon et al., 2014) have used DEM simulation to estimate coastal flood inundation using the bathtub approach, whereby pixels are inundated when their elevation is lower than a storm surge level and the pixels are hydrologically connected.
The quality and resolution of topographic data has long been recognized as a key control on flood inundation (Horritt & Bates, 2002) and the wetting/drying of a domain (Bates, 2012;Neal et al., 2011). Despite the recognition of the importance of topography, flood model evaluation has tended to focus on other hydraulic parameters (Wechsler, 2007) owing to the lack of DEMs available. Studies that do evaluate the influence of topography on model results tend to use a high-resolution DEM and resample to various coarser resolutions, thus testing the influence of model resolution over the quality of the initial DEM Horritt & Bates, 2001;Komi et al., 2017;Neal et al., 2009;Sanders, 2007;Saksena & Merwade, 2015;Savage, Pianosi, et al., 2016), concluding that higher resolutions generally give more accurate predictions. For most of the world, high-resolution DEMs are unavailable; thus, a global DEM must be used. SRTM remains the most widely used topography input to flood models in areas where a high-resolution DEM does not exist owing to better accuracy and ease of accessibility over other global DEMs (Yan et al., 2015). Rarely do studies in such data-sparse areas compare flood extent estimates given by different global DEMs. Examples of studies that do compare flood estimates using different global DEMs usually compare Advanced Spaceborne Thermal Emission and Reflection Radiometer with SRTM (Bhuyian & Kalyanapu, 2018;Jarihani et al., 2015) or more recently SRTM with MERIT . In areas where a global DEM is the best source of topographic information, we are limited to the small number of global DEMs available and thus are restricted in the number of DEMs we can use. DEM simulation allows for multiple realizations of a DEM to be produced, thus allowing modelers to produce many flood maps given the topographic error. To refine our goal to simulate DEMs for flood models, we focus our study on floodplains, specifically large river deltas, as these areas are of most interest to flood modelers. Deltaic regions form one of the most flood prone areas in the world, with this expected to increase in the future (Hallegatte et al., 2013;Syvitski et al., 2009).
Here we quantify the spatial error structure in the SRTM and MERIT DEMs for 20 lowland locations. Using the fitted error covariance function, we simulate plausible versions of the MERIT and SRTM DEMs, creating a catalog of possible DEMs. We then demonstrate the impact of using an ensemble of simulated DEMs in a flood inundation context for two locations in Fiji and Vietnam. While the spatial error structure was calculated for 20 locations, the effects were assessed on flood models of two locations due to data availability and the complexity of building each flood model. In a world of increasing computer power, but a lack of detailed data sets, this powerful approach can be used throughout natural hazard modeling to understand how errors in the DEM can impact hazard assessment.

Data Preparation and Visualization
To calculate the errors for the SRTM DEM and MERIT DEM for floodplains, ground truth data are needed. We use LIDAR data as ground truth data for 20 different sites distributed across the globe. In the absence of GPS ground truth data, we use LIDAR data owing to its low vertical error (<20 cm at all sites), availability, and spatial coverage.
Ideally, we would use LIDAR data collected at the same time as the SRTM product, but the available LIDAR was typically collected 6-14 years after SRTM (see the supporting information for details). Using Google Earth™ and the yearly satellite images, we checked whether the land use had changed between the SRTM and LIDAR collection period and found no significant differences over land pixels for any of the study sites. Subsidence is a major challenge for many of the world's deltas (Higgins, 2016;Schmidt, 2015), with subsidence rates for some of the study sites documented in the supporting information. Subsidence rates change the land elevation between SRTM and LIDAR collection dates but were ignored in this study as they fall well within the vertical error of SRTM.
The next stage is to calculate the arithmetic mean of the LIDAR values that fall within each MERIT/SRTM pixel to determine the vertical error. We make the assumption that each MERIT/SRTM pixel is the integration of its interior topography so we use the arithmetic mean of LIDAR elevation values. This overcomes problems associated with using the elevation of grid cell centers to represent elevation as this often does not accurately represent the hydrography of floodplains (Moretti & Orlandini, 2018). We use the least manipulated SRTM (3-arc sec SRTM v1 nonvoid) product (Farr et al., 2007) and the MERIT DEM (Yamazaki et al., 2017) as both are the same resolution and use the same grid. Analysis was performed using the raster package of Hijmans et al. (2016) in the statistical computing environment R (R Core Team, 2017). In total, we analyzed over 5,100 km 2 of floodplains or an area approximately the size of the U.S. state of Delaware.

10.1029/2018WR023279
We produced surface error maps of MERIT-LIDAR and SRTM-LIDAR with an example from the Mekong Delta shown in Figure 1. Such maps are useful in allowing us to visualize the spatial location of the errors and compare these to satellite imagery and land cover maps to assess the possible causes. Additional surface error maps for the other study sites are available in the supporting information. Gray pixels indicate areas where either MERIT/SRTM or LIDAR pixels are missing or are water bodies. The water body mask was delineated using the Global Water Surface water occurrence map (Pekel et al., 2016). A water occurrence threshold of 90% was used based on trial and error, with such pixels not contributing to the estimation of the MERIT/SRTM-LIDAR spatial error structure.

Estimating the Variogram
For each study site, we fit a semivariogram to the difference map (i.e., SRTM/MERIT-LIDAR), using those pixels not excluded by missing values or by the water body mask (see section 2.1). We assume stationarity and isotropy, with these assumptions based on directional semivariograms and semivariogram maps (example in the supporting information). Geostatistical analysis are implemented using gstat in R (Pebesma, 2004).
If s and s ′ are the vectors of spatial coordinates and X is value of SRTM/MERIT(s)-LIDAR(s) (in other words the vertical error), then the semivariogram ( (h)) is defined as where h (or the lag is measured in decimal degrees. As this study intends to use the semivariograms to simulate other places, we need to fit the semivariograms. There is no best semivariogram model to fit semivariograms, so one must be careful to choose a model that captures the main spatial features avoiding over fitting (Goovaerts, 1997). Inspection of the empirical semivariograms suggested that a double-exponential shape would capture the main spatial features. Therefore, the chosen model to fit the semivariograms has the parametric form where (a 1 , a 2 ) represent the range, 2 1 the near component fitted with an exponential model, and 2 2 the far component again fitted with an exponential model. To fit equation (2), we proceeded in two stages. First, we fitted an exponential semivariogram using only pixels within 0.005 decimal degrees (≈500 m) of each other (i.e., the near component). This gives an estimate of the near range parameter, a 1 . Then we fitted the sum of two exponential semivariograms with specified ranges a 1 and a 2 = 10 a 1 to pixels within 0.01 ∘ (≈1,000 m) of each other (i.e., the far component). From this we can calculate the sill and range values. The sill refers to the semivariance value at which the semivariogram levels off and is the marginal standard deviation. The range is the distance at which the semivariogram effectively reaches the sill value. The cutoff values were chosen based upon visual inspection of the resultant semivariograms and fall within the values previously calculated by Rodríguez et al. (2006), Shortridge (2006), LaLonde et al. (2010, and Shortridge and Messina (2011) in their empirical semivariograms.

DEM Simulation
Here we outline the theory that allows us to use the MERIT or SRTM data sets and a specified semivariogram of MERIT/SRTM-LIDAR to simulate candidate true DEMs for the purposes of risk assessment.
Ideally, we would construct a statistical model covering both the true DEM and observations on the true DEM. Typically, this model might be a multivariate Gaussian distribution, in which the observations might be a subset of the true DEM plus noise, where the noise is probabilistically independent of the true DEM. Then we would condition the true DEM on the observations and use samples from the conditional (or posterior) distribution of the true DEM as candidates in a simulation-based approach to computing the inundation probabilities for each pixel of the hazard map.
Specifying the full covariance structure over both the true DEM and the observations is demanding. Under some conditions, it also turns out to be unnecessary. These conditions are described in Rougier and Zammit-Mangion (2016;Theorem 3). In essence, the prior variance matrix of the true DEM has to be far larger than the error variance matrix of the observations. In this case, if there is an observation for every pixel, then the posterior expectation of the true DEM is approximately equal to the observations, and the posterior variance matrix of the true DEM is approximately equal to the measurement error variance (i.e., it inherits the spatial structure of the error). This plug-in approach is very intuitive and quite widely used, so it is reassuring to know that it is approximately correct under an acceptable assumption about a large prior uncertainty. Of course, the reverse of this assumption is that one can reduce posterior uncertainty much further with a smaller prior uncertainty for the true DEM, but only at the cost of quantifying the prior uncertainty in terms of a prior variance matrix.
An alternative method to the geostatistical approach that we use to estimate the spatial dependence structure is to use copula functions. Copula functions allow for a relaxation of the assumption of Gaussian dependence and can overcome deficiencies in geostatistical procedures where quality is a function of observation density and the semivariogram model. Studies have outlined copula-based methods to predict groundwater quality parameters (Bárdossy & Li, 2008), hydraulic conductivity (Haslauer et al., 2012), and soil properties (Marchant et al., 2011), which then can be stochastically simulated. As far as the authors are aware a copula-based method has not been applied for DEM simulation.
We can implement this plug-in approach by using either the MERIT DEM or SRTM DEM as the observations and simulating candidate DEMs by adding simulations from the semivariogram of MERIT-LIDAR, where we are treating the LIDAR observations as true.

Flood Inundation
Now we can simulate candidates of the true DEM; we can run flood models using multiple DEMs, thus allowing us to explore the impact topographic uncertainty has on inundation extent. To do this, we built two flood inundation models using LISFLOOD-FP version 6 (Neal et al., 2012). One model covered a section of An Giang Province in the Vietnamese Mekong Delta and the other a 15-km reach of the Ba catchment in Fiji. The An Giang model uses hydrographs from Chau Doc and Vam Nao gauging stations as the upstream boundary condition, while the downstream boundary is set as the water level height from the Long Xuyen gauge, with all these records available from the Mekong River Commission (MRC). We chose to simulate the year 2001. This particular year was selected for several reasons. First, the flood was severe with estimated damages at over USD 200 million and approximately 300,000 homes damaged in the Vietnamese Mekong Delta (Chinh et al., 2016). While the return period of the 2001 flood is unknown, Le et al. (2007) estimated that the moderately larger flood in 2000 had a return period of 20 years. Second, after the floods of 2000 and 2001, and with the shift from low dikes (0-2 m) to high dikes (>3.5 m) to facilitate triple rice cropping (Kontgis et al., 2015), extensive flood prevention structures have been built in An Giang. The expansion of paddies protected from high dikes in An Giang has risen from <10,000 ha in 2000 to >140,000 in 2011 (Duc Tran et al., 2018), with these structures being recognized as being important in protecting against damaging floods (Chapman et al., 2016). Considering that SRTM was acquired in 2000, the flood prevention structures have changed the topography represented in SRTM, with flood studies analyzing later periods needing to update dike information (Duc Tran et al., 2018;Dung et al., 2011;Triet et al., 2017). Even though the 2011 flood was hydrologically similar to that of the 2000 flood, 71% of An Giang was flooded in 2000 compared to 30% in 2011 (Dang et al., 2016;Mekong River Commission, 2011), with flood prevention structures found to be the main cause of hydrological alterations (Dang et al., 2016). Third, we were restricted by the availability of gauge data. Geometry data for the channels were gathered from the Global Width Database for Large Rivers (GWD-LR) river width database (Yamazaki et al., 2014) and bathymetry from a 2008 survey conducted by the MRC with cross sections approximately every 250 m. The channel was assumed to have a rectangular shape, with bathymetry values assigned by interpolating the cross sections. Manning's friction parameters (Chow, 1959) were set as 0.03 for the channel and 0.05 for the floodplain, which are both realistic and performed well in a larger Mekong flood model built with LISFLOOD-FP.
For the Ba model,we estimated a 50-year hydrograph using the regional flood frequency analysis approach of Smith et al. (2015), utilizing meteorological data from the Fiji Meteorological Office. The downstream water level boundary condition at the coast was set at 0 m, even though this value is highly uncertain as heavy rainfall is likely to occur at the same time as a storm surge to compound flooding (Wahl et al., 2015;Zscheischler et al., 2018). As the domain size of the Ba reach model is comparatively small, the river width was estimated from Google Earth™ imagery. The river depth was estimated such that the river conveyed the 1-in 2-year return period before going out of bank. While this bankfull discharge value varies considerably around the world, a return period of 2 years is a generally accepted average value (Pickup & Warner, 1976;Williams, 1978), with a return period of 2 years being found in similar rivers in Fiji (Terry, 2007;Terry et al., 2002). Finally, Manning's friction parameters were set as 0.035 for the channel and 0.04 for the floodplain based on typical values for agricultural floodplains. For each location, we built four models-one at 90-m resolution using the SRTM DEM, another at 90-m resolution using the MERIT DEM, a further 90-m version using resampled LIDAR, and a final 30-m resolution model built using LIDAR data to act as a benchmark model. These four models are the deterministic models used in our analysis. We make the assumption that the LIDAR based model will make the best flood simulation as it is based on the most accurate topographic data, so is deemed a benchmark. In the absence of validation data, we further assume that this benchmark model is the closest to the true situation. A 30-m resolution was chosen for the LIDAR model based on available computational resources. Next we used the SRTM and MERIT models and replaced the DEMs with simulated DEMs, consequently forming our DEM ensembles. Three sets of DEM ensemble models were built-one by simulating the MERIT DEM using an average MERIT semivariogram, another by simulating the MERIT DEM by MERIT land cover semivariograms, and a final by simulating the SRTM DEM by SRTM land cover semivariograms. In this case an average semivariogram refers to the average semivariogram parameters across all 20 locations or in other words a representative floodplain semivariogram. For the An Giang model, each DEM ensemble contained 200 DEMs, and for the Ba model each ensemble contained 500 DEMs. All other flood model parameters were kept the same, thus allowing us to determine how topography alone impacts the simulated inundation extent. The maximum inundation extent for the model runs using the simulated DEMs was converted to a binary wet/dry map. Subsequently, these maps were merged together to create an inundation probability map, whereby (based on the simulated DEMs), we can delineate pixels we are most confident would flood. In this case, inundation probability refers to the percentage of occasions when a particular pixel is inundated, so in a deterministic model this is either 0% or 100%, while for the DEM ensembles the probability can take any value between this range. We checked that the number of simulations were adequate for the probability to converge by taking a subset of simulated DEMs and producing inundation probability maps.

Results and Discussion
In this section we present and discuss results of the three components of our analysis: semivariograms, DEM simulation, and flood inundation. As each component builds on the previous we decide to present our findings in this particular way.

Semivariograms
First, we plot the empirical and fitted semivariograms for each study site (Figure 2). Our fitting procedure is appropriate as the fitted semivariograms align well with the empirical results. The excellent fit of the semivariograms further justifies our choice to not use copula functions as the spatial error structure is effectively estimated by our choice of semivariogram model. Broadly speaking, all locations have similar semivariograms for the MERIT DEM, with these often having considerably different semivariogram parameter values than the SRTM equivalents (Figure 2). For example, the sill values for the MERIT DEM are markedly lower at the Mekong, Roanoke, and Savannah sites. Across all study sites, the MERIT DEM has lower sill values (0.7-2.2 m) compared to SRTM (1.0-4.8 m) and larger range values as well (308-4,364 m compared to 298-1,931 m). A detailed table of fitted semivariogram parameter values can be found in the supporting information. Lower sill values mean that the DEM is more accurate, and a larger range means that the error is more spatially dependent.
As all sites are floodplains with a similar topography, differences in semivariogram parameters are likely to come from another source. As noted earlier, vegetation has a large influence on DEM error; thus, we produce semivariograms by land cover class. We take land cover class data from the Climate Change Institute (CCI) land cover data set (http://maps.elie.ucl.ac.be/CCI/viewer/). The CCI land cover data set has annual records from 1992 to 2015, with our analysis using records from 2000 as this was the year of SRTM acquisition. In total, there are 38 land cover class and subclasses. To calculate the semivariograms, we first resampled the CCI land cover data set from its 300-m resolution to the resolution of the MERIT DEM. Therefore, each MERIT pixel had an associated land cover class. We then selected land cover classes with over 600 pixels to produce semivariograms that fitted well, with this threshold chosen by trial and error. Here we plot results for the Burdekin site as for this location there are several land cover types with a sufficient number of pixels for analysis ( Figure 3). Semivariogram parameters values for all 20 land cover types found for sites in this study can be found in the supporting information.
Comparing the surface error map with satellite imagery suggests that the areas of largest error for the Burdekin are vegetated with mangroves. The semivariograms in Figure 3 corroborate this theory with mangroves having the largest semivariance. Evergreen tree cover also has a large error, followed by the categories of shrubland. Irrigated cropland has the lowest error, suggesting that land cover with lower vegetation heights has less error. Furthermore, embankments tend to have larger errors as evidenced in our surface error maps. Therefore, we can deduce that the similarity between semivariograms is influenced by the land cover class, with land cover classes with higher vegetation heights having a greater influence. Despite some vegetation removal in the MERIT DEM, vegetation artifacts remain a key source of error but are a considerable improvement on the SRTM product.

DEM Simulation
Based on our calculated semivariograms, we can simulate plausible ensembles of the MERIT and SRTM DEMs, thus allowing modelers to move beyond using a single DEM to using a catalog of DEMs. In our initial analysis, we have calculated semivariograms for 20 locations to estimate an average floodplain semivariogram for both the MERIT and SRTM DEMs. With the number of locations, we also estimated semivariograms by land cover type, resulting in 20 out of the 38 land cover classes and subclasses of the CCI data set having estimated semivariograms. The exact land cover classes covered are outlined in the supporting information. These land cover semivariogram estimates allows us to simulate DEMs by land cover class with the workflow outlined in Figure 4. For one to simulate by land cover class they first need to extract DEM pixels by land cover class, then apply the associated land cover semivariograms to those pixels and repeat for the number of land cover classes. When a land cover class does not have a semivariogram, we revert to the average semivariogram. This approach makes our method more relatable to other locations, with these semivariograms available for both MERIT and SRTM. We simulate reasonable versions of the DEM as highlighted by the cross-sectional profiles in Figure 5, but one should always inspect the simulated DEMs to gauge whether these estimations are reasonable. Finally, one should only use these semivariograms on floodplain locations as we have not tested on terrain with steep relief so cannot be confident with the relationships.

Flood Inundation
The simulated DEMs are subsequently used in flood models for two locations-An Giang and Ba. (Figures 5  and 6). We use these two sites to demonstrate the impact of topographic uncertainty on flood predictions for two reasons. First, we believe they represent end members of floodplains, as the Ba floodplain is small and constrained within a valley, whereas the An Giang floodplain is large and is not constrained by valley sides. Furthermore, we were restricted by the data availability (e.g., flow data) to build the flood models. DEM ensembles are simulated for the MERIT and SRTM using the average floodplain semivariograms and semivariograms disaggregated by land cover as discussed in section 3.2. By using an ensemble of simulated DEMs we can produce flood inundation probability maps, whereby the inundation probability refers to the number of ensemble members in which the pixel in question is flooded. For example, if a pixel was flooded in 300 DEMs in an ensemble of 500 DEMs then the inundation probability would be 60%.
First, we evaluate the flood models by calculating four commonly used skill scores: critical success index (CSI), hit rate, miss rate, and false alarm rate (Horritt & Bates, 2001;Sampson et al., 2015;Stephens et al., 2014; Table 2). The CSI measures the fraction of correctly predicted events, penalizing for both misses and false alarms. This is an adjustment of the proportion correct score for the quantity being forecast (Wilks, 2011). CSI scores range from 0 indicating no skill to 1, which is a perfect score. The hit rate is the rate of correctly predicted inundated pixels. Conversely, the miss rate measures pixels that are not predicted in the model but are flooded in the observations (i.e., model underprediction). The false alarm rate refers to incidences where the model predicts flooding, but the observed floodplain state is dry (i.e., model overprediction). In this analysis, the LIDAR model at 30 m was assumed to be the observation. To allow for direct comparison to the 90-m resolution that the other models were run at, the 30-m data were resampled using bilinear interpolation. The LIDAR model at 90 m had the best performance for both sites. However, the LIDAR model at 90 m for An Giang had only a marginally better CSI score than the MERIT and SRTM models, with this primarily due to a relatively high false alarm rate. Out of the global DEMs, MERIT performs better than SRTM. It is noticeable that there is a marked difference in performance of the two flood models, with An Giang model performing poorly 10.1029/2018WR023279 (maximum CSI of 0.36) and the Ba model performing very well (maximum CSI of 0.90). This goes to highlight the difficulty in modeling small-magnitude events in areas where the floodplain is not constrained (An Giang).
In Table 2 we give the range of skill scores for each member of the DEM ensembles. We can conclude that using DEM ensembles of simulated DEMs improves model performance as it allows for the bounds of error in the DEM to be explored. Skill scores can vary considerably. For example, the CSI in Ba is 0.77 for the MERIT model, 0.58 for SRTM, and ranges from 0.60 to 0.87 for the DEM ensemble of MERIT simulated using land cover semivariograms. The hit rate and miss rate have particularly large ranges for DEM ensembles using MERIT compared to the ensemble using SRTM simulated by land cover semivariograms. We deduce that this is likely to be a feature of the DEM simulation process that is making the generated DEMs more noisy. From the semivariograms presented earlier (Figure 2), we know that SRTM is noisier than MERIT as indicated by the shorter-range values. A noisier DEM indicates that the neighboring pixels are more different, thus making flow, or connectivity, more unlikely. We suggest that the noisiness (lack of connectivity) results in the SRTM underpredicting flooding at both test sites (Figures 5 and 6). So by simulating the MERIT DEM we can unintentionally make the DEM noisier, thus reducing connectivity and thus flooding. Conversely, the DEM simulation process can correct some of the key pixels that control connectivity and thus the inundation extent.
To further understand the flood inundation results, we visualize the flood inundation maps in Figures 5 and 6. The simulations using an ensemble of DEMs bracket the predicted extent of the benchmark LIDAR model and deterministic MERIT and SRTM models, with the MERIT simulation being closer to the benchmark for both case studies. Areas of higher inundation probability are generally closer to that of the benchmark LIDAR model. For the An Giang case study, the ensemble of simulated DEMs activates floodplain flow pathways that are present in the higher-resolution LIDAR data but are not in MERIT/SRTM. These flow pathways are switched on/off by using an ensemble of DEMs, with inundation extent varying significantly. The large variation in inundation extent is also a result of the unconfined floodplain environment of the delta meaning it is difficult to limit the flood. In the MERIT land cover DEM ensemble, the higher inundation probabilities (light blue) more closely align with the LIDAR benchmark model, even though there is still some overprediction in the middle of the domain similar to the deterministic MERIT model ( Figure 5). Nevertheless, the DEM ensemble models do capture the flooding in the top right of the domain, which is not present in the deterministic MERIT model. For the Ba case study, the variation in inundation extent is more constrained with a larger area with higher inundation probability. This is due to the confined river valley setting meaning the large flood fills the valley floor.
In Figure 6 we explore what impact the differing estimated inundation extents can have on exposed assets by adding in roads and residential areas from the OpenStreetMap™ (https://planet.openstreetmap.org) database. We choose to include asset data so when we compare model results we can determine what the actual impacts might be. This adds a qualitative component to the more traditional skill score metrics.
Research into the presentation of flood hazard maps is extensive and outside the scope of this study, so we encourage interested readers to consult the literature (Alfonso et al., 2016;Di Baldassarre et al., 2010;Hagemeier-Klose & Wagner, 2009;Meyer et al., 2012). We can see that some assets are inundated (highlighted by white dashed boxes in Figure 6) in the LIDAR models but are not in the MERIT and SRTM models. In this situation, we have the luxury of a high-resolution benchmark model, but in most data-sparse areas a decision maker would be presented with either the deterministic MERIT or SRTM simulations, thereby missing some at-risk assets in this case. By using a DEM ensemble, these assets that have been missed have a relatively high inundation probability (≈50-70%). Thus, if you presented these maps to a decision maker, they would be at least aware that these assets may in fact be at risk and could allocate resources as they see fit.
In other words, by using an ensemble of DEMs we get closer to the true situation (with the assumption that the benchmark model is the true flood) and avoid the spurious precision in flood estimates from using a single DEM and allowing risk assessors to identify pivotal locations where (often) limited resources can be used most effectively.
To determine which DEM simulation method is most effective at estimating inundation extent, we produce density plots for both An Giang and Ba (Figure 7). This plot type was chosen over a more conventional histogram as it normalizes the difference in inundated area, which is particularly apparent in the An Giang example. In this analysis we make the assumption that the LIDAR 30-m benchmark model is the true flood situation in the absence of any flood observation data. Pixels in the LIDAR model are compared to their counterparts in the DEM ensemble for each DEM simulation approach. Pixels are binned into two categories: (1) correctly predicted (blue) when both pixels are inundated and (2) incorrectly predicted (red) when pixel in either the LIDAR or DEM ensemble model is not inundated. The corresponding inundation probability for the pixel in question is then plotted against the density of observations. This allows us to visualize the distribution of inundation probability for correctly and incorrectly predicted pixels. The dashed lines show the mean of this distribution. We can see that the DEM ensemble simulated using the MERIT DEM and using the fitted land cover semivariograms give the inundation extent closest to that of the LIDAR (indicated by blue dashed line). This is less apparent for Ba as the mean inundation probability value is just 0.3% more than the DEM ensemble of MERIT using the average floodplain semivariogram. The difference in the distributions in Figure 7 not only highlights the challenge of flood inundation estimation in unconstrained floodplains such as for the An Giang case but also indicates that we cannot take a probability threshold value to delineate pixels that we can be confident will flood. In other words, this distribution is location dependent. For a floodplain environment like the Ba case study, our DEM simulation approach is more effective as the variation in topography simulated in the DEM ensemble has less impact than an environment like An Giang. Lastly, the MERIT DEM simulation consistently predicts inundation closer to the LIDAR model, so we would recommend using the MERIT DEM simulated using land cover semivariograms.
In this analysis we have deliberately chosen to change just the topography information so we can assess the impact of topographic error on flood estimates. While it is beneficial to incorporate uncertainty analysis from flow uncertainty to more sophisticated approaches such as compound flooding (Moftakhari et al., 2017;Wahl et al., 2015;Zscheischler et al., 2018) or event generation Neal et al., 2013), it would create an unworkable parameter space, thus making it challenging to draw robust conclusions. Yet we would encourage others to utilize such approaches and for others to investigate the contribution of topography against other parameters in flood models.
As effective computing power continues to grow, it is increasingly possible to run multiple flood models to test sensitivity to parameters. These have almost exclusively focused on hydraulic parameters, with topography largely ignored due to the lack of alternative data sets. We have demonstrated that topographic uncertainty has a large impact on inundation extent and should therefore be included in any flood hazard estimation. These results suggest that simulating DEMs by land cover semivariograms is most appropriate.
In theory, one could take the semivariograms produced in this study to simulate floodplain MERIT or SRTM DEMs for any location where the MERIT and SRTM data sets exist. While possible, and made available through the R Package accompanying this work, we are reluctant to state that the relationships found in this work can be applied globally. Yet in the absence of being able to quantify the spatial error structure for every floodplain location (which remember would need an accurate high-resolution DEM, LIDAR) and given the similarity of the semivariograms produced here, we cautiously suggest that the semivariograms here can be used to simulate DEM ensembles, especially when land cover is considered.

Limitations and Future Work
The method presented here is intended to give flood modelers primarily working in data-sparse areas a quick and efficient method to simulate plausible DEMs from the MERIT and SRTM DEMs. Ultimately, these simulated DEMs are not necessarily a better version of MERIT/SRTM but are a realization of these products. One should consider that the MERIT and SRTM were acquired in 2000 and have numerous errors so when choosing to use MERIT or SRTM a modeler should ask themselves whether these DEMs are good enough for the need of their study. If the answer is a no, but higher-quality data are unavailable or if the model resolution is prohibitive for available computing power, our approach can at least get closer to the truth and avoid the spurious precision of just using a single DEM. One should also consider that modeling at a higher-resolution costs substantially more computing power, with  finding that halving hydraulic model resolution leads to a 10 times increase in compute costs. Thus, even if a higher-resolution DEM is available, it may be worth modeling at a coarser resolution to explore the sensitivity of not only the DEM but to other model parameters, similar to the approach advocated by Savage, Pianosi, et al. (2016). While currently our work is focused on MERIT and SRTM, we intend to expand this analysis to the Advanced Land Observing Satellite AW3D30 DEM and when released the NASADEM. As outlined in section 2.3 our approach could be made more complex, but we intend our work to be as accessible as possible. When using the R Package DEMsimulation created from this work, one should remember that several of the land cover semivariograms only contain a single semivariogram (detailed in the supporting information) so our estimated spatial error structures for these are highly uncertain.

Conclusions
This study presents a method to simulate plausible DEMs from the MERIT and SRTM DEMs. We calculated the spatial error structure and fit semivariograms for both MERIT and SRTM by assessing against a reference data set (LIDAR) for 20 lowland locations. We found that the MERIT is consistently more accurate than SRTM (semivariogram sill values of 0.7-2.2 m compared to 1.0-4.8m), with the errors in MERIT being more spatially dependent as indicated by larger range values (308-4,364 m) compared to SRTM (298-1,931 m). Further semivariograms were produced by land cover type, showing that the spatial error structure differs by land cover, with tree-covered areas having the largest errors. These fitted semivariograms are then used to simulate candidates of the MERIT and SRTM DEM, which are in turn run through a flood model to assess the impact of topographic uncertainty on the hazard. We show that using multiple plausible DEMs avoids the spurious precision in prediction given by deterministic models, with higher probabilities of inundation closer to that of the truth model. By using ensembles of simulated DEMs we improved the skill of the flood predictions, with an increase in CSI of 0.44 from 0.26 for An Giang and 0.87 from 0.77 for Ba when we consider global DEMs. This is due to DEM simulation being able to explore the bounds of error in the DEM that enables the true connectivity of the floodplain to be approximated. Simulating the MERIT DEM by land cover class consistently gives inundation estimates closest to that of the true situation (with the assumption that the LIDAR 30-m model is the benchmark); thus, we recommend using a DEM ensemble simulated using this technique. We further find that the distribution of inundation probability can vary considerably between floodplain locations, so one should not take a probability threshold to determine what pixels will flood. This work makes it possible to use more than a single DEM for any floodplain location as we can now simulate plausible versions of MERIT or SRTM using either a representative floodplain spatial error structure or by a global land cover map. This represents a significant shift in modeling efforts where the lack of data has restricted our attempt to understand the impact of topographic uncertainty. Future work will include adding more semivariograms and assessing the sensitivity of topography compared to other parameters in flood models. The flexibility of this approach means this method can be used once new DEMs are released, as this technique needs only a DEM and a reference truth data set. The code to simulate DEMs is freely available for research and education purposes (https://github.com/laurencehawker/DEMsimulation).