Global Estimates of Reach-Level Bankfull River Width Leveraging Big Data Geospatial Analysis
Abstract
Recent progress in remote sensing has captured river planform geometry at an unprecedented number of locations, providing an opportunity to revisit the oversimplified channel shape parameterizations in global hydrologic models. This study leveraged two recent Landsat-derived global river width databases and created a reach-level width dataset to assess the validity of model parameterizations along ~1.6 million kilometers of rivers. After showing that state-of-the-art parameterization schemes capture only 30–40% of the width variance globally, we developed a machine learning (ML) approach surveying 16 environmental covariates, which considerably improved the predictive power (R2 = 0.81 and 0.77 for two testing cases). Beyond the commonly discussed upstream basin conditions, ML revealed that local physiographic factors and human interference are also important covariates for width variability. Finally, we applied the ML model to estimate bankfull river width, creating a new reach-level dataset for use in global hydrodynamic modeling.
Key Points
- A machine learning approach to predicting global bankfull river width was developed
- Local physiographic factors and human interferences are also important covariates for width variability
- Estimates of global reach-level bankfull river width were provided for use in models
Plain Language Summary
Large-scale river models typically parameterize channel shapes (e.g., bankfull width) based on discharge or drainage area with a power law model or stream order with look-up tables. These highly simplified representations reflect our limited understanding of spatial variability of channel shapes, leading to great uncertainty in global river modeling. Using the most up-to-date global snapshots of channel planform geometry derived from satellite images, this study revisited state-of-the-art channel shape parameterizations used in large-scale river models. We also used a machine learning approach to improve the predictive power for the width spatial variability. Results from this study can complement the understanding of downstream hydraulic geometry from a data mining viewpoint, which also informs the improvement of channel shape parameterization in models.
1 Introduction
River width is a fundamental channel geometry parameter that determines the exchange of fluxes between river corridors and the atmosphere (Allen & Pavelsky, 2018; Downing, 2012; Raymond et al., 2013). River width at channel-forming flow, that is, bankfull width, is also a key input to hydrodynamic models for accurate flood inundation mapping because it is used, together with bankfull depth, to distinguish model representations of in-channel and floodplain flow dynamics (Andreadis et al., 2013; Yamazaki et al., 2011). Despite the wide variety of river geomorphologies that exist in nature, the representation of channel shapes in large-scale river models is often oversimplified, which reflects our limited understanding of the spatial variability of global river geomorphology.
The most widely accepted form for channel geometry estimation relates river width (W) to discharge (Q) or drainage area (A) in a power law model ($W = a \cdot Q^{b}$ or $W = a' \cdot A^{b'}$; see Gleason, 2015, for a review), which essentially dictates the channel widening mechanism as discharge increases downstream. Since its initial proposal by Leopold and Maddock (1953), this geomorphic equation describing the downstream hydraulic geometry (DHG) relationship has been verified by numerous field studies, and geomorphic understanding has been facilitated by examining variations in coefficients (a, a′) and exponents (b, b′) (Harman et al., 2008; Merritt & Wohl, 2003; Moody & Troutman, 2002; Williams & Rosgen, 1989). Owing to its simple and efficient hydrologic scaling, the DHG relationship has often been used to parameterize channel shapes in large-scale river models, where a locally fitted empirical equation is often adopted uniformly across regional, continental, or even global domains. For example, Andreadis et al. (2013) used the DHG equation by Moody and Troutman (2002) (hereafter MT02; a = 7.2, b = 0.5) and generated a quasi-global scale bankfull river width dataset. Although only assessed along nine rivers, this dataset has been adopted in flood inundation modeling and travel time estimation globally (Allen et al., 2018; Andreadis & Schumann, 2014; Schumann et al., 2013). Yamazaki et al. (2011) estimated a and b by trial and error in constructing a global physically based model CaMa-Flood (Yamazaki, 2014), and it has been used for flood inundation mapping in many regions (Pappenberger et al., 2012). Schulze et al. (2005) derived their coefficient and exponent against bankfull flow velocity observations, and the equation has been adopted by a global water resource model WaterGAP. There are also some simpler alternatives, such as the U.S. National Water Model (NWM, https://water.noaa.gov/about/nwm) predicting streamflow at 2.7 million river reaches (Gochis et al., 2018; Lin, Hopper, et al., 2018; Lin, Rajib, et al., 2018; Maidment et al., 2016), where channel shape parameters are assigned for each stream order using look-up tables.
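As a minimal illustration of the power law form, the sketch below evaluates W = a · Q^b with the MT02 values quoted above (a = 7.2, b = 0.5). The function name `dhg_width` is hypothetical, and the units (Q in cubic meters per second, W in meters) are an assumption of this sketch rather than something stated in the text.

```python
def dhg_width(q, a=7.2, b=0.5):
    """Power-law downstream hydraulic geometry: W = a * Q**b.

    Defaults are the MT02 coefficient and exponent quoted in the text.
    Units (Q in m3/s, W in m) are an assumption of this sketch.
    """
    if q < 0:
        raise ValueError("discharge must be non-negative")
    return a * q ** b


if __name__ == "__main__":
    # Width grows with the square root of discharge when b = 0.5.
    for q in (1.0, 100.0, 10000.0):
        print(f"Q = {q:>8.1f} -> W = {dhg_width(q):7.1f}")
```

With b = 0.5, a hundredfold increase in discharge yields only a tenfold increase in width, which is the "one-relationship-for-all" scaling that the rest of the paper tests against observations.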
Despite the widely known variations in power law exponents/coefficients and their geomorphological interpretations (Miller et al., 2014; Rhoads, 1991; Singh et al., 2003), large-scale river models still use the above simplified parameterizations for several reasons. First, this type of “one-relationship-for-all” strategy is convenient for scaling models up. Although optimizing the DHG equations basin-by-basin (Beighley et al., 2015; Beighley & Gummadi, 2011; Hoch et al., 2017; Luo et al., 2017; Miller et al., 2014; Paiva et al., 2011) may better capture the spatial variability, it is computationally expensive and still ignores within-basin variability and possible breakdowns of DHG. Second, modelers tend to rely on parameter calibration (e.g., van Beek et al., 2011) for seemingly “unobservable” parameters, under the assumption that they can be adjusted to physically meaningful values. Unfortunately, calibration suffers from the “equifinality” problem (Beven, 2006): when too many unknown parameters are subject to calibration, errors from different parameters may compensate for each other, producing correct simulations under physically unrealistic parameters. These difficulties facing the global modeling community call for using all of the best available observations (e.g., from spaceborne/airborne platforms and ground networks) to constrain model parameters as much as possible in a spatially explicit way.
Global surveys of channel planform geometry have become available only recently owing to progress in remote sensing, and they can potentially be used to address these issues. Advancements in image processing algorithms (Pavelsky & Smith, 2008; Yamazaki et al., 2014; Yang et al., 2019) have allowed automatic extraction of river channel width from satellite imagery at continental to global scales (Allen & Pavelsky, 2015, 2018; Isikdogan et al., 2017; Yamazaki et al., 2019). In line with the goals of the forthcoming Surface Water and Ocean Topography (SWOT) mission (Biancamaria et al., 2016), these datasets provide, for the first time, an unprecedented number of samples tailored to global rivers, which can significantly improve our understanding of various global channel forms. However, their usage in advancing large-scale river models has not been fully assessed. Despite their demonstrated accuracy, these widths were often captured at mean flow conditions, while bankfull width, a more meaningful parameter for hydrodynamic models, was not estimated globally.
Using the global river width dataset by Allen and Pavelsky (2018), Frasson et al. (2019) recently concluded that a traditional geomorphic equation (MT02) is valid in describing the overall discharge-width relationship for global rivers below 60°N. However, this reassured general relationship is limited in offering useful insights to global river modeling, partly because the global modeling community is now working towards its hyper-resolution goal (Bierkens et al., 2015; Wood et al., 2011) that emphasizes fine-scale variability instead of general relationships. Given that high-resolution geospatial datasets tailored to global rivers are becoming increasingly available (e.g., Grill et al., 2019; Lin et al., 2019; Shen et al., 2017), it is imperative to assess the state-of-the-art modeling strategies in a comprehensive data survey to inform future modeling developments.
It is therefore the goal of this study to leverage the use of the most recent satellite-derived global river width datasets, to study global channel form variability, their influencing factors, as well as to develop a global solution to predict bankfull river width using a data-driven approach. First, we describe the methodology in creating a new reach-level river width dataset, which is used as the reference data for machine learning training and the big data geospatial analysis (section 2). Next, we use the reference data to demonstrate issues with state-of-the-art channel width parameterization strategies in global hydrologic models, focusing on their implications to hyper-resolution modeling (sections 3.1 & 3.2). We will then present the machine learning model validation and its application in estimating bankfull channel width in sections 3.3 & 3.4, followed by discussions, concluding remarks, and future work of this study.
2 Methods
2.1 Global River Width Datasets and Reach-Averaging
We obtained two observed river width datasets from the Global River Width from Landsat (GRWL) database (Allen & Pavelsky, 2018) and the MERIT Hydro product (Yamazaki et al., 2019). Both datasets represent state-of-the-art river width estimates at the global scale, each containing tens of millions of width readings derived from Landsat images using different algorithms; for detailed descriptions of the two datasets, readers are referred to Allen and Pavelsky (2018) and Yamazaki et al. (2019). To study the global scale variability, we first calculated reach-averaged river widths (see supporting information, Text S1 and Figure S1 for details). We did so because studying cross-sectional widths, as required by traditional hydraulic models (Gichamo et al., 2012; Ridolfi et al., 2014), is beyond the scope of current global river modeling, partly due to computational constraints. In addition, reach-averaging reduces possible noise and uncertainty, in line with Harman et al. (2008). The underlying river network for reach-averaging is the MERIT Hydro flowlines (Yamazaki et al., 2019), which were vectorized by Lin et al. (2019) with a channelization threshold of 25 km2. Under this threshold, ~2.94 million reaches were derived globally (median length of 6.8 km). Although the GRWL and MERIT Hydro width estimates were both sourced from Landsat imagery at 30-m resolution, we found differences between the two estimates, especially for the narrow river spectrum (Figure S2), likely a result of their algorithm differences. Therefore, the two estimates were averaged to create a new reference dataset in whose accuracy we have greater confidence (Text S1). The above preprocessing reduced the raw tens of millions of width readings to ~140 thousand reach-averaged widths (see coverage in Figure 2e; total length: ~1.6 million kilometers), which we consider as the reference for the machine learning model training and the ensuing analyses.
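The reach-averaging and ensemble step described above can be sketched roughly as follows. This is an illustrative outline only: the actual procedure is in Text S1, which is not reproduced here. The function names are hypothetical, the ">30 valid widths" rule is taken from section 2.3, and two details are assumptions of this sketch: a reading counts as valid when it is present and positive, and a reach enters the reference set only when both datasets provide an average for it.

```python
from statistics import mean

MIN_VALID = 30  # the text mentions reach-averaging only for >30 valid widths


def reach_average(widths_by_reach, min_valid=MIN_VALID):
    """Return {reach_id: mean width} for reaches with more than min_valid readings.

    Assumption: a reading is 'valid' if it is not None and is positive.
    """
    averaged = {}
    for reach_id, widths in widths_by_reach.items():
        valid = [w for w in widths if w is not None and w > 0]
        if len(valid) > min_valid:
            averaged[reach_id] = mean(valid)
    return averaged


def ensemble_reference(avg_a, avg_b):
    """Average two reach-level estimates where both exist (assumed rule)."""
    shared = avg_a.keys() & avg_b.keys()
    return {reach_id: 0.5 * (avg_a[reach_id] + avg_b[reach_id]) for reach_id in shared}


if __name__ == "__main__":
    # Toy readings: reach r2 has too few readings in the first dataset.
    dataset_a = {"r1": [100.0] * 40, "r2": [50.0] * 10}
    dataset_b = {"r1": [120.0] * 35, "r2": [60.0] * 50}
    reference = ensemble_reference(reach_average(dataset_a), reach_average(dataset_b))
    print(reference)
```

In the toy input, only reach `r1` passes the reading-count filter in both datasets, so it is the only reach that receives a reference width.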
2.2 Global Environmental Covariates for River Width Variability
Several environmental covariates were collected to study the spatial variability of global river widths based on our physical knowledge (Table 1; Text S2). Specifically, river form is a complex expression of its upstream conditions, with Q and A being the dominant factors dictating how channel shape adjusts to these conditions. Thus, they were included to represent these aggregated controls. Additionally, local climatic, soil, topographic, biologic, and geologic conditions may also reflect channel shape variations (Gardner et al., 2019), so we incorporated datasets of these factors, subject to the availability of global geospatial datasets (Table 1). For example, soil composition may indicate channel boundary and sediment load in rivers (Knighton, 1974), and riparian vegetation may alter the surface roughness (Ashmore & Sauks, 2006), thereby changing the width-discharge relationship. Some other factors (e.g., aridity, topography) were often discussed previously in the context of the best-fitted coefficients/exponents of the power law (Miller et al., 2014; Rhoads, 1991), but here we explicitly incorporated them using a data-driven approach to assess how they can improve the explained variance. Finally, since we also hypothesized that channel shapes may carry signatures of human interventions (e.g., expansions of urban impervious areas, human water use, and dam regulations, all of which modify stream power), we additionally gathered datasets on these possible human controls in a comprehensive global river survey, which is another novelty of this study over previous studies that only focused on the Q-W scaling in naturalized environments (Dubon & Lanzoni, 2019; Ettmer & Alvarado-Ancieta, 2010).
Group | Description | Symbols | Resolution/processing | Data source |
---|---|---|---|---|
Aggregated controls | Mean annual flow | Q | Polyline based | Lin et al. (2019) |
Aggregated controls | Drainage area | A | Polyline based | Lin et al. (2019) |
Local physiographic covariates (climate, geography, topography, vegetation, soil, groundwater, lithology) | Aridity index | AI | Gridded (~1 km) | Trabucco and Zomer (2019) |
Local physiographic covariates | Elevation | Elev | Gridded (~90 m) | Yamazaki et al. (2017) |
Local physiographic covariates | Channel slope | Slp | Polyline based | Lin et al. (2019) |
Local physiographic covariates | Sinuosity | Sin | Polyline based | Lin et al. (2019) |
Local physiographic covariates | Leaf area index | LAI | Gridded (~1 km, 2010–2015 mean) | Zhu et al. (2013) |
Local physiographic covariates | Soil clay, silt, sand content as mass percentage | Cly, Slt, Snd | Gridded (~250 m, first 4 layers mean, up to 30 cm deep) | Hengl et al. (2017) |
Local physiographic covariates | Water table depth | WTD | Gridded (~1 km) | Fan et al. (2013) |
Local physiographic covariates | Bedrock permeability and porosity | K, P | Polygon based | Huscroft et al. (2018) |
Human intervention | Human water use (irrigational, industrial, domestic demand) | HW = Irr + Ind + Dom | Gridded (~10 km, 5-arcmin, 2010–2014 mean) | Wada et al. (2016) |
Human intervention | Urban fraction | Urb | Gridded (~30 m, value of Year 2015) | Liu et al. (2018) |
Human intervention | Degree of dam regulation | DOR | Polyline based | Grill et al. (2019) |
Most of the global geospatial datasets are gridded, and they were obtained at their original resolution and zonally averaged to the 2.94 million MERIT unit catchments, each representing an irregular local drainage polygon surrounding the river reach (the global median size of these catchments is 36.8 km2). Because some datasets are temporally variable (e.g., leaf area index [LAI], human water use, and urban fraction), data from the most recent year(s) were either directly used or averaged to reflect the most up-to-date conditions (Table 1). Our analysis had the implicit assumption that channel forms do not change over time and are in equilibrium with external drivers, which is appropriate here given our focus on time-invariant spatial variability at the global scale. Table 1 lists the details of the gathered global geospatial datasets.
2.3 Machine Learning Algorithm and Parameter Tuning
For the machine learning (ML) tool, we chose nonparametric decision tree regressors because they can address nonlinear relationships, are less sensitive to feature scaling and do not require feature normalization, and suffer less from the “black box” complaint compared to artificial neural networks (e.g., Zhang et al., 2018). A recently developed gradient boosting tree (GBT) method, XGBoost (Chen & Guestrin, 2016), was used to train the transfer functions, as we found it to perform better at significantly lower computational cost than the widely used random forest tool. It also avoids overfitting better than other GBT methods.
The ML training and validation used a 90%/10% split of the reference data. Independent testing was performed using widths obtained from gauge observations at ~1,200 North American gauges by Allen and Pavelsky (2018). Although widths for narrower rivers contain more uncertainties (Allen & Pavelsky, 2018), all reference data were used in ML training because the preprocessing (Text S1: reach-averaging only for >30 valid widths and ensemble average) was expected to reduce the data uncertainties to some degree. More importantly, river widths in nature follow a Pareto distribution, that is, narrower rivers are much more abundant than wider rivers (Allen & Pavelsky, 2015). Thus, the training data need to follow such a distribution for the ML algorithm to generalize well in an actual prediction setting. The target widths were natural log transformed for the training, and the predicted values were transformed back for validation. The ML optimization was performed by searching a predefined model hyper-parameter space: maximum tree depth (max_depth) in {3,4,5,6}, number of boosting trees (n_estimator) in {600,800,1000}, and learning rate (alpha) in {0.05,0.07,0.1}, selecting the combination that minimizes the root-mean-square error (RMSE) under fivefold cross-validation. Tree-based ML algorithms by construction can rank the importance of features/covariates by counting the occurrence of covariates used to branch the tree nodes, and we used this ranking to understand the relative importance of each covariate.
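To make the size of this search concrete, the sketch below enumerates the hyper-parameter grid listed above, together with the log transform and the RMSE score, using only the Python standard library. It does not train a model. The dictionary keys `n_estimators` and `learning_rate` are an assumption that maps the paper's labels (n_estimator, alpha) onto the names used by XGBoost's scikit-learn style interface; the helper names are hypothetical.

```python
import math
from itertools import product

# Grid values are taken from the text; key names are an assumed mapping
# onto XGBoost's scikit-learn style parameter names.
PARAM_GRID = {
    "max_depth": [3, 4, 5, 6],
    "n_estimators": [600, 800, 1000],
    "learning_rate": [0.05, 0.07, 0.1],
}
N_FOLDS = 5  # fivefold cross-validation, as stated in the text


def grid_candidates(grid):
    """Expand a {name: [values]} grid into a list of parameter dictionaries."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def to_log(widths):
    """Natural-log transform of target widths before training."""
    return [math.log(w) for w in widths]


def from_log(values):
    """Back-transform predictions to width units for validation."""
    return [math.exp(v) for v in values]


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    candidates = grid_candidates(PARAM_GRID)
    print(len(candidates), "parameter combinations")
    print(len(candidates) * N_FOLDS, "model fits under fivefold cross-validation")
```

The grid has 4 × 3 × 3 = 36 combinations, so an exhaustive search with five folds requires 180 model fits, which is one reason training cost matters when choosing the algorithm.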
3 Results and Discussions
3.1 How Much River Width Variance Is Explained by State-of-the-Art River Models?
Figures 1a–1c exemplify three widely used channel width parameterization schemes in global river models, that is, applying a universal relationship to define W based on Q, A, and stream order, respectively. Ordinary least squares (OLS) regression suggests that Q has slightly better predictive power (R2 = 0.39) than A (R2 = 0.35), followed by stream order (R2 = 0.32). This is reasonable because A is only a proxy for Q without accounting for storage terms in the basin, while stream order further lumps the variability for efficient hydrologic scaling (Downing et al., 2012). For stream order, a seemingly erroneous inverse relationship is seen for lower order streams (Figure 1c), because these streams in global river models are often extracted using a uniform area threshold (e.g., 25 km2 in this study), and they can in fact correspond to higher order streams in natural systems, especially in wet regions with high drainage density. This points to problems with models that use stream orders to parameterize channel shapes, particularly for lower order streams. In all schemes, a significant portion of unexplained variance (60–70%) is noted, which is much higher than in previous geomorphological reports (e.g., Moody & Troutman, 2002). This is partly because the reference data comprise many more global rivers (~1.6 million kilometers in total length) than previous studies and partly because the DHG equations (e.g., MT02) were developed only to characterize within-basin downstream geomorphic changes rather than to reflect basin-to-basin variability globally. To account for possible unexplained variance due to modeled Q uncertainty, we also used observed Q at >1,500 global gauges (Lin et al., 2019), which shows that Q explains only ~30% of the width variability (Figure S3), confirming the difficulty of using traditional methods to represent the wide variety of channel forms globally.
Although MT02 seems to summarize the global Q-W relations well at the median values for each bin defined by Frasson et al. (2019) (Figure 1d), here we focus on the substantial departures of these estimates from reference widths (e.g., interquartile range and outliers), which could imply significant errors for global hydrologic models at many geographic locations.
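For readers who wish to reproduce this kind of check on their own data, the following sketch fits a power law W = a · Q^b by ordinary least squares and reports R2. Fitting in log-log space is an assumption of this sketch (the text does not state the space in which the OLS regression was run), and the function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(discharge, width):
    """Fit W = a * Q**b by OLS on ln(W) versus ln(Q).

    Returns (a, b, r2), where r2 is computed in log space (an assumption
    of this sketch, not necessarily the paper's convention).
    """
    x = [math.log(q) for q in discharge]
    y = [math.log(w) for w in width]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope = power law exponent
    a = math.exp(mean_y - b * mean_x)  # intercept back-transformed to a
    r2 = sxy ** 2 / (sxx * syy)        # squared correlation for simple OLS
    return a, b, r2


if __name__ == "__main__":
    # Synthetic, noise-free data generated from a = 7.2, b = 0.5.
    q = [1.0, 10.0, 100.0, 1000.0]
    w = [7.2 * qi ** 0.5 for qi in q]
    a, b, r2 = fit_power_law(q, w)
    print(f"a = {a:.3f}, b = {b:.3f}, R2 = {r2:.3f}")
```

On noise-free synthetic data the fit recovers the generating coefficient and exponent; with real reach-level widths, the scatter around the fitted line is what lowers R2 to the values reported above.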

3.2 What Is the Caveat of Using “One-Equation-for-all” in Hyper-Resolution Modeling?
We further show how generalizing a well-fitted equation globally can bring significant errors to hyper-resolution modeling by applying the “one-equation-for-all” strategy (using MT02 as an example) to mean conditions at different basin classifications (Figures 2a–2e; the basins provide a seamless global coverage of consistently sized and hierarchically nested sub-basins, which range from tens to millions of square kilometers; Lehner, 2014). The percentage bias (PBIAS) of MT02-estimated width (compared against the reference width) was used to show where the two estimates are close to each other and where they significantly diverge. Geographically, we found that PBIAS is generally small (within ±20%, yellow) for the Mississippi River Basin (where the MT02 in situ samples were collected) and for river basins that humans can easily access. By contrast, poorly gauged basins such as some Arctic river basins, the western Tibetan Plateau, or parts of Africa show larger discrepancies between MT02-estimated and reference river widths. The results indicate that geographic sampling biases exist in deriving MT02, or any geomorphic equation in general, making them more suitable for places with more data samples and basins of human proximity and/or during near-average conditions (see more discussions in Text S3).
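A small sketch of the PBIAS comparison follows. The formula used here, 100 × (estimated − reference) / reference, and its sign convention (positive means overestimation) are assumptions of this sketch, since the exact definition is not given in the main text; the function names are hypothetical.

```python
def pbias(estimated, reference):
    """Percentage bias of an estimate relative to a reference value.

    Assumed definition: 100 * (estimated - reference) / reference,
    so positive values indicate overestimation.
    """
    return 100.0 * (estimated - reference) / reference


def share_within(pbias_values, limit):
    """Percentage of locations whose |PBIAS| is at most `limit` percent."""
    inside = sum(1 for value in pbias_values if abs(value) <= limit)
    return 100.0 * inside / len(pbias_values)


if __name__ == "__main__":
    # Toy widths in meters, not values from the paper.
    reference = [100.0, 200.0, 50.0, 400.0]
    estimated = [110.0, 150.0, 52.0, 700.0]
    values = [pbias(e, r) for e, r in zip(estimated, reference)]
    print(values)
    print(share_within(values, 20.0), "% of locations within +/-20%")
```

Summaries of this kind (the share of locations inside a tolerance band) are what allow the basin-level and reach-level maps to be compared with a single number.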

Another obvious spatial pattern is that as the basin classification gets finer (Figure 2f), the MT02-estimated width shows increasingly large discrepancies from the reference width, that is, the share of locations with PBIAS within ±35% decreases from 65% to 44% (from level 02 to 05) and to 37% in the reach-level estimates (Figure 2e). This has significant implications for global modeling: one might expect fewer errors when using MT02 to parameterize channel shapes for coarser-resolution models. By contrast, large errors may be expected if it is applied to construct fine-scale global river models with realistically represented channel networks (Lin et al., 2019; Sampson et al., 2015). In fact, geomorphic studies of stream channels were not designed for fine-scale variabilities (Moody & Troutman, 2002), which may explain the increasing errors with resolution that we observed. This further highlights the necessity to incorporate spatially explicit remote sensing information to advance existing high-resolution global river models.
3.3 Validation of the ML Model
Figure 3 shows the performance of the optimized ML model in predicting the 10% validation split of the reference data (Figure 3a: R2 = 0.81, RMSE = 112.05 m). The width variance explained by ML is close to or better than that obtained by fitting different power equations separately to individual river basins (Miller et al., 2014) and that in Figure 1. This suggests that in addition to dominant controls from upstream basin conditions, the local climate, soil, vegetation, topography, and geology conditions that we introduced in Table 1 are also important in improving the predictive power for river width. The ML model also demonstrates generalizability, as shown by the independent testing case (Figure 3b: R2 = 0.77, RMSE = 53.24 m), which mainly consists of rivers narrower than 1,200 m. With the demonstrated confidence in the ML model, we then applied it to the original ~140 thousand reaches and found that the PBIAS of the ML-estimated width (Figure 3c) is much lower than that using the MT02 equation (Figure 2e), with the percentage of reaches with smaller PBIAS (±20%, yellow dots) increasing significantly to 77.2%. The split violin plot (Figure 3d, All) further reveals that ML tends to center the PBIAS distribution around zero, suggesting the elimination of the systematic negative bias (Text S3) seen earlier. Separation of width bins further shows that ML slightly overestimates width for narrow streams (<100 m) and slightly underestimates width for wide rivers (>2,000 m), while widths between 100 and 2,000 m were better estimated. This dependence of the ML performance may be related to the properties of the training data in each width bin, and a future study that establishes the ML training for different width bins may help avoid such dependence.

The bar plot of feature importance (Figure 3e) suggests that Elev, AI, WTD, soil composition, and Slp are the top five most important covariates, beyond Q and A. Catchment soil composition stands out on top of the commonly discussed climatic controls, possibly because it reflects the local depositional environment, which dictates grain size and presence/absence of cohesion and thus relates well to river geomorphology. Lithology covariates K and P are expected to be related to bedrock erodibility, but their rank is low, which may be due to their relatively poor estimates at the global scale (Beck et al., 2015)—subsurface properties are still among the most difficult-to-estimate geophysical parameters. Interestingly, HW and DOR, which are highly related to population density and human activities, rank similarly to physiographic covariates such as LAI and Sin, the latter of which was found to correlate well with global channel shapes (Frasson et al., 2019). This suggests that human interference also has some deterministic power in shaping the channel geometry. Urban fraction is surprisingly the least important covariate, but it is worth noting that 87.4% of our training river locations have zero Urb fraction due to the highly concentrated nature of global urban areas (Liu et al., 2018). This can artificially lower its rank because Urb was much less used in node splitting when constructing tree-based ML models. Although urbanization tends to be centered around (large) rivers and associated human activities may have influenced the channel geometry, such human signals may be dampened in this analysis lumping all geographic regions together, and a future analysis focusing on rivers with nonzero urban fractions may better reveal its importance to channel shapes. 
Overall, from a data-driven perspective to analyzing global rivers, our results still highlight the need to better understand river geomorphology in the Anthropocene, which supports recent findings that global free-flowing rivers are becoming increasingly rare now (Best, 2019; Grill et al., 2019).
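The split-count ranking behind the feature importance discussed above (counting how often each covariate is used to branch tree nodes, section 2.3) can be illustrated with a toy ensemble. The nested-dictionary tree format and the function names below are hypothetical and chosen only for this sketch; they are not the internal representation of any ML library.

```python
from collections import Counter


def count_splits(node, counts):
    """Recursively count split features in one tree.

    A node with a 'feature' key is a split; any other node is a leaf.
    """
    if "feature" not in node:
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)


def split_count_importance(trees):
    """Relative importance = share of all splits that use each covariate."""
    counts = Counter()
    for tree in trees:
        count_splits(tree, counts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.most_common()}


if __name__ == "__main__":
    leaf = {"value": 0.0}
    trees = [
        {"feature": "Q", "left": {"feature": "Elev", "left": leaf, "right": leaf}, "right": leaf},
        {"feature": "Q", "left": leaf, "right": {"feature": "AI", "left": leaf, "right": leaf}},
    ]
    print(split_count_importance(trees))
```

Under this counting scheme, a covariate that is zero at most training locations offers few useful split points and is rarely selected, which is consistent with the explanation given above for the low rank of Urb.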
3.4 ML-Derived Global Bankfull River Width
We finally applied the ML model to bankfull discharge to estimate bankfull width, assuming the relationship holds for different flow quantiles (Leopold & Maddock, 1953). Bankfull Q was approximated by a 2-year return period flood discharge (Andreadis et al., 2013), which was estimated using 35-year daily Q simulation at each of the 2.94 million river reaches (Lin et al., 2019). More specifically, at each river reach, we fitted Normal, Log-normal, Gumbel, and Log-Pearson Type III distributions to 35 annual maximum flow data, and the best fit was selected to construct flood frequency curves and compute Q at the 50% exceedance probability (see Figure S4 & Text S4 for the calculation and a discussion on its uncertainties). This allowed bankfull river widths to be estimated for global rivers wider than 30 m (Figure 4a), with a total length of 7.3 million kilometers and covering more locations than GRWL and MERIT Hydro alone (see Text S5 for details). These estimates are in general agreement with the DHG estimates (Andreadis et al., 2013), while they also well reflect local variability picked up by the reference data, as validated in Figure 3. Further visual comparisons with the ESRI Satellite Basemaps at several random locations also show that the reach-level widths (values displayed in Figure 4b) are close to, but slightly higher than (due to bankfull conditions), those estimated for main stem rivers (Figure 4c) and tributaries (Figure 4d) by differentiating the wetted surface from the vegetation canopies on the satellite images (note the scale bars). Therefore, constrained by the best available data, these estimates should be more appropriate for future use in improving reach-based global river models, such as that presented by Lin et al. (2019).
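The bankfull discharge step can be sketched as follows for a single reach. The paper fits four candidate distributions and keeps the best fit (Text S4); this sketch deliberately simplifies that to a Gumbel distribution fitted by the method of moments, which is an assumption made here for brevity. The function names and the toy series are hypothetical.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_fit_moments(annual_maxima):
    """Method-of-moments Gumbel parameters (location mu, scale beta)."""
    beta = stdev(annual_maxima) * math.sqrt(6.0) / math.pi
    mu = mean(annual_maxima) - EULER_GAMMA * beta
    return mu, beta


def gumbel_quantile(mu, beta, return_period):
    """Discharge with the given return period (years) under a Gumbel law."""
    non_exceedance = 1.0 - 1.0 / return_period  # 0.5 for the 2-year flood
    return mu - beta * math.log(-math.log(non_exceedance))


if __name__ == "__main__":
    # Toy annual maximum discharges, not values from the paper.
    annual_maxima = [820.0, 640.0, 910.0, 700.0, 1150.0, 560.0, 780.0, 990.0]
    mu, beta = gumbel_fit_moments(annual_maxima)
    q_bankfull = gumbel_quantile(mu, beta, return_period=2.0)
    print(f"approximate 2-year discharge: {q_bankfull:.1f}")
```

A 2-year return period corresponds to a 50% annual exceedance probability, so the result is the median of the fitted annual-maximum distribution; for a Gumbel law this lies slightly below the sample mean.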

4 Summary and Concluding Remarks
Tailored to global rivers, this study conducted a big data geospatial analysis to understand the spatial variability of global river widths, where a machine learning approach was developed to estimate global river width at bankfull discharge. By leveraging two recent Landsat-derived global river width databases containing tens of millions of width readings, we created a new reference global river width dataset covering 1.6 million kilometers of rivers in length to study global rivers. By using the reference data, we revisited three channel width parameterization schemes (using mean flow, drainage area, and stream order to scale channel width, Figure 1) widely adopted by large-scale river models, where a large portion of unexplained variance was found (60–70%). We further showed the issues with generalizing a well-fitted local DHG equation globally, including geographic sampling biases, as well as increasing errors as the equation was applied to finer and finer basin classifications, highlighting the necessity to incorporate spatially explicit remote sensing information for accurate high-resolution global river modeling. By comprehensively surveying 16 environmental covariates, we showed that an optimized ML model can increase the explained variance (R2 = 0.81 and 0.77 for two testing cases). Beyond Q and A, elevation, aridity, water table depth, soil composition, and slope stood out as important covariates, while smaller but detectable influences from human activities were noted. Finally, we created a new global reach-level bankfull river width dataset based on the demonstrated effectiveness of the ML, offering a simple and robust way for river width estimation. The effort is in line with the upcoming SWOT satellite mission goal that aims at better understanding global river channel geometry and improving the modeling capability to predict global river flows and flood inundation area.
Several caveats are worth mentioning to inform future research. First, this study only focused on static controlling factors, yet channels are dynamically changing in response to glacial, tectonic, fluvial, pluvial, climatic, volcanic, and coastal processes (Dubon & Lanzoni, 2019), which can leave signatures on channel form variability over regions of strong influences (e.g., Allen et al., 2013). Other effective discharges above erosion threshold (Knighton, 1974) may also play a role in shaping channels. However, these more complex factors were not explicitly considered here. Second, given a lack of high-quality observed width data for global narrow streams (30–100 m wide), uncertainties may be propagated to our estimates. For streams under 30 m wide, currently there is no good reference data, limiting our estimation to rivers wider than 30 m only (Text S3). Thus, we argue that there is a need for future remote sensing studies to develop systematic and effective width extraction approaches tailored to the global narrow river spectrum (e.g., Feng et al., 2019), to fully resolve the issue. Last but not least, in this study we only focused on channel width, instead of depth, due to limitations of observation techniques and unavailability of high-quality datasets. Our ultimate goal for understanding global channel geometry and improving global river modeling is to estimate three-dimensional surfaces of river corridors using a data-driven approach. Future studies should also develop methods to derive effective channel depth at reach scales by leveraging tens of thousands of Acoustic Doppler Current Profiler (ADCP) data available across the continental US—the hydroSWOT database (Canova et al., 2016). Despite anticipated uncertainties, such types of measurements contain critical information to constrain channel geometry estimates, which should also be utilized to improve spatially explicit models.
Acknowledgments
We acknowledge funding support from NASA on Algorithm Development for SWOT River Discharge Retrievals #NNX16AH84G. Datasets created in this study are shared at http://doi.org/10.5281/zenodo.3552776 for research purposes. Python scripts for river width processing, extraction of ML covariates, and calculation of bankfull discharge are also openly available at GitHub: https://github.com/peironglinlin/GlobalRiverWidth. We thank Kai Zhang and Xiaogang He for the helpful discussions on ML techniques. R.P.d.M Frasson was partially supported by the NASA SWOT Science Team Project NNX16AH82G. G. Allen was partially supported by NASA THP #NNH17ZDA001N.