Volume 15, Issue 2 e2022MS003089
Research Article
Open Access

Topological Relationship-Based Flow Direction Modeling: Mesh-Independent River Networks Representation

Chang Liao

Corresponding Author


Atmospheric Sciences and Global Change, Pacific Northwest National Laboratory, Richland, WA, USA

Correspondence to:

C. Liao

Contribution: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization, Supervision

Tian Zhou

Atmospheric Sciences and Global Change, Pacific Northwest National Laboratory, Richland, WA, USA

Contribution: Investigation, Resources, Writing - review & editing, Project administration

Donghui Xu

Atmospheric Sciences and Global Change, Pacific Northwest National Laboratory, Richland, WA, USA

Contribution: Formal analysis, Investigation, Resources, Writing - review & editing

Matthew G. Cooper

Atmospheric Sciences and Global Change, Pacific Northwest National Laboratory, Richland, WA, USA

Contribution: Formal analysis, Investigation, Resources, Writing - review & editing

Darren Engwirda

T-3 Fluid Dynamics and Solid Mechanics Group, Los Alamos National Laboratory, Los Alamos, NM, USA

Contribution: Software, Investigation, Resources, Data curation, Writing - review & editing

Hong-Yi Li

University of Houston, Houston, TX, USA

Contribution: Formal analysis, Investigation, Writing - review & editing

L. Ruby Leung

Atmospheric Sciences and Global Change, Pacific Northwest National Laboratory, Richland, WA, USA

Contribution: Formal analysis, Investigation

First published: 24 January 2023
Citations: 1


River networks are important features in surface hydrology. However, accurately representing river networks in spatially distributed hydrologic and Earth system models is often sensitive to the model's spatial resolution. Specifically, river networks are often misrepresented because of the mismatch between the model's spatial resolution and river network details, resulting in significant uncertainty in the projected flow direction. In this study, we developed a topological relationship-based river network representation method for spatially distributed hydrologic models. This novel method uses (a) graph theory algorithms to simplify real-world vector-based river networks and assist in mesh generation; and (b) a topological relationship-based method to reconstruct conceptual river networks. The main advantages of our method are that (a) it combines the strengths of vector-based and DEM raster-based river network extraction methods; and (b) it is mesh-independent and can be applied to both structured and unstructured meshes. This method paves a path for advanced terrain analysis and hydrologic modeling across different scales.

Key Points

  • We use graph theory algorithms to simplify real-world river networks

  • Topological relationships are reconstructed from simplified river networks and mesh intersections

  • Topological relationships can be used to model flow direction field and flow routing parameters

Plain Language Summary

Representing rivers in hydrologic models is difficult because river networks are often very complex. Existing methods generally rely on elevation differences between land and rivers or image processing to define river networks in computer models. In this study, we combine the strengths of two existing methods and develop a topology-based method. This follows river channels and defines river networks in a way that works for any grid system. The products of this method can be used to improve hydrologic models.

1 Introduction

River networks are important features in hydrologic and Earth system modeling (Jolley & Wheater, 1997; Liao et al., 2019; Wu et al., 2012). Real-world river networks are complex, depending on landscape features such as elevation, aspect, and lithology. Moreover, the fractal nature of river networks means that they are approximately scale-free (Tarboton et al., 1988) without a well-defined spatial resolution at which they should be represented (Davies & Bell, 2009; Yamazaki et al., 2009). As a result, hydrologic models often use a conceptual representation of river networks. To date, there are mainly three methods for representing river networks at different scales, each with its own advantages and disadvantages. However, limitations remain when representing river networks due to the resolution mismatch and their interactions with other hydrologic features (e.g., ocean).

Two methods are useful at the watershed/regional scale. The first is the vector-based river networks analysis method, which uses vector datasets to represent the river networks and their topological relationships (Lin et al., 2018; Mizukami et al., 2016). Vector datasets are often provided by public agencies, for example, the United States Geological Survey (USGS), or are digitized from satellite image processing, for example, vectorization-based river channel extraction. Various graph theory algorithms are then used to perform quality control and network analysis based on the vector river networks (Lindsay et al., 2019). Lindsay et al. (2019) reviewed several potential issues in existing vector river networks datasets: (a) the vertex coordinates of river flowlines may not exactly overlap with the actual locations due to digitization and floating-point rounding errors; (b) a river flowline's starting and ending vertices may be reversed during spatial analysis, resulting in an opposite flow direction; and (c) vector datasets obtained from different sources generally use different spatial references and cannot be used or combined directly. Even without these issues, vector datasets still require other pre-processing before use. For example, braided rivers are not universally supported in hydrologic models because multiple flow directions are not always supported. Although the vector-based river networks representation method is computationally efficient and scale-independent, it has limitations. First, the resulting vector networks are not explicitly linked with the rectangle mesh system commonly used in hydrologic models. As a result, the vector-based river networks method is limited to flow routing and cannot be easily coupled with other hydrologic processes. Second, this method can only be applied to areas with vector datasets unless it is combined with advanced terrain analysis.

The second method is the high spatial resolution raster digital elevation model (DEM)-based river networks extraction method (Esri Water Resources Team, 2011; Tarboton, 2003; Yamazaki et al., 2019). This method generally involves several steps: (a) calculation of cell-to-cell flow direction for each raster cell based on elevation differences (e.g., D4/D8 algorithm); (b) calculation of flow accumulation based on flow direction; and (c) definition of river cells using an accumulation threshold. Because of local depressions in the DEM, a depression-filling operation is often required before step (a) to guarantee that water can flow out of each cell. The raster DEM-based river networks extraction method is widely used but has a few limitations. First, this method is very sensitive to the spatial resolution and accuracy of the DEM. In general, it only performs well for high-resolution (<1 km) DEMs (Goulden et al., 2014; Sood & Smakhtin, 2015). Second, because the derived flow direction in step (a) relies on elevation differences, it is less accurate in flat areas with fewer topographic variations. As a result, obtaining unrealistic river networks is common. To address these limitations, the “stream burning” technique (also called “DEM reconditioning”) is often used to lower elevations within and near river channels so that water always flows into the river cells (Hellweger & Maidment, 1997; Lindsay, 2016b). This technique requires an additional river network data set, typically in vector format. The user-provided river networks data set is often converted into a binary mask with the same spatial resolution as the DEM (Lindsay, 2016b). Because the binary mask does not describe the upstream-downstream topological relationships between mesh cells, it may not accurately capture meanders, confluences, and parallel rivers (defined in this study as rivers running side by side in adjacent mesh cells) if the resolution is not high enough. As a result, it often produces incorrect flow directions in these locations (Figures S1–S6 in Supporting Information S1). The extensive modifications to the elevations of river networks and the riparian zone can result in large biases in slope calculations (river channel slope and riparian zone slope), which will significantly impact the flow routing and flooding processes in hydrologic models (Shelef & Hilley, 2013). To remediate this, slopes are sometimes calculated from the original DEM (Lindsay, 2016b), but this requires additional adjustments to consider local depressions. Similar issues also arise in the depression-filling process of terrain analysis. Several studies used additional topological information in the hybrid breaching-filling algorithm to minimize modifications to elevation (Lindsay, 2016a). However, this hybrid approach is not readily available and is not used in most Geographic Information System (GIS) or hydrologic models. Due to the computational cost, the raster DEM-based method is often not directly applied to large spatial domains. Instead, the study domain is often separated into small tiles that are merged after applying the method.
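Steps (a) to (c) of the raster DEM-based method can be illustrated with a minimal sketch. It assumes a tiny, depression-free DEM stored as a nested list, uses the D8 scheme, and picks an arbitrary accumulation threshold; the function names are hypothetical and this is not the implementation of any tool cited above.

```python
# Illustrative D8 sketch on a tiny DEM; not the implementation of any cited tool.
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]

# Eight neighbor offsets (row, col) used by the D8 scheme.
D8_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_direction(dem: List[List[float]]) -> Dict[Cell, Optional[Cell]]:
    """Step (a): map each cell to its steepest-descent neighbor (None if none is lower)."""
    nrow, ncol = len(dem), len(dem[0])
    downstream: Dict[Cell, Optional[Cell]] = {}
    for r in range(nrow):
        for c in range(ncol):
            best, best_slope = None, 0.0
            for dr, dc in D8_OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    distance = (dr * dr + dc * dc) ** 0.5  # 1 or sqrt(2) cell widths
                    slope = (dem[r][c] - dem[rr][cc]) / distance
                    if slope > best_slope:
                        best, best_slope = (rr, cc), slope
            downstream[(r, c)] = best
    return downstream


def flow_accumulation(dem: List[List[float]],
                      downstream: Dict[Cell, Optional[Cell]]) -> Dict[Cell, int]:
    """Step (b): count the cells draining through each cell, including itself."""
    acc = {cell: 1 for cell in downstream}
    # Visit cells from highest to lowest so upstream totals are final before being passed on.
    for cell in sorted(downstream, key=lambda rc: dem[rc[0]][rc[1]], reverse=True):
        target = downstream[cell]
        if target is not None:
            acc[target] += acc[cell]
    return acc


def river_cells(acc: Dict[Cell, int], threshold: int) -> List[Cell]:
    """Step (c): cells whose accumulation meets the threshold."""
    return sorted(cell for cell, count in acc.items() if count >= threshold)


# Hypothetical 3 x 3 DEM (m) with a valley running down the middle column.
dem = [
    [9.0, 8.0, 9.0],
    [8.0, 6.0, 8.0],
    [7.0, 4.0, 7.0],
]
direction = d8_flow_direction(dem)
accumulation = flow_accumulation(dem, direction)
rivers = river_cells(accumulation, threshold=3)
```

In this toy example the river cells follow the middle column. On a real DEM, depression filling would be needed first, and flat areas would leave some cells without a downstream neighbor, which is the weakness discussed above.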

The third method is used to derive river networks for continental and global applications at coarse spatial resolution. Due to the resolution mismatch between fine-scale river network datasets and the model application mesh cell size (around 10–200 km), there are often multiple rivers within the same cell, and their topological relationships are often very complex (Shaw et al., 2005). To maintain the large-scale flow patterns, upscaling is often needed. State-of-the-art upscaling models often use high-resolution datasets (e.g., results from the high-resolution raster DEM-based method) as guidance to define the coarse resolution cell-to-cell flow direction (Davies & Bell, 2009; Eilander et al., 2021; Fekete et al., 2001; Wu et al., 2011). For example, the Cell Outlet Tracing with an Area Threshold (COTAT) (Reed, 2003) and Network Tracing Method (Wu et al., 2012) use either high-resolution raster or vector-based river networks to guide their coarse resolution flow directions. The Iterative Hydrography Upscaling method considers both local and global information during iterations to remove erroneous flow directions (Eilander et al., 2021). Because of the resolution mismatch, some upscaling models need to modify the major river locations to produce reasonable flow direction fields (Munier & Decharme, 2022; Wu et al., 2012). The shift of locations can result in an unrealistic spatial distribution of hydrologic model outputs such as floodplain inundation (Decharme et al., 2012; Luo et al., 2017; Mao et al., 2019; Yamazaki et al., 2011; Zhou et al., 2020). Currently, most existing upscaling models only support structured rectangle meshes. Taken together, the three existing methods have different advantages and disadvantages, and they are often used at different scales and in different applications and hydrologic communities (Table S1 in Supporting Information S1).

To date, all the spatially-distributed flow routing models (except vector-based) are limited to the structured rectangle meshes, and cannot be seamlessly coupled with other unstructured mesh-based numerical models. Moreover, existing methods mainly focus on projecting existing river networks onto prescribed structured rectangle meshes. Less attention has been paid to unstructured meshes, which allow certain hydrologic features, such as river networks, to be burnt into the meshes. Many studies have attempted to “burn” river networks into meshes, with most using the Triangular Irregular Network (TIN) approach (Coon et al., 2019; Kreveld & Silveira, 2011). However, existing TIN-based methods do not generally incorporate stream burning or depression filling methods (Coon et al., 2019; Huang & Lee, 2015; Ivanov et al., 2004). As a result, there is still uncertainty in flow direction and slope calculations.

In recent years, model development based on unstructured meshes has become an emerging area of interest in hydrologic and Earth system models because it provides several advantages (Engwirda & Liao, 2021): (a) Unstructured mesh refinement can be used to define specific regions of interest (ROIs). Because hydrologic features such as river networks, dams, and coastal lines do not align well with the rectangle meshes used in numerical models, spatial interpolation and approximation are often needed. In contrast, unstructured meshes provide the flexibility to represent these features reasonably well through variable size and rotation, resulting in reduced model uncertainty. (b) Spatial interpolation will be significantly reduced or removed in a unified unstructured mesh framework that includes all the model components, for example, ocean, land, and river. These components can exchange fluxes seamlessly at their interfaces (Liao et al., 2022). (c) Unstructured meshes can be used to balance spatial resolution and computational cost through variable resolution, which is critical for large-scale hydrologic and Earth system models.

To the authors' knowledge, there are few river networks representation methods designed for unstructured mesh systems (de Azeredo Freitas et al., 2016; Hsu, 2020; Huang & Lee, 2015; Hyvaluoma, 2017; Paz & Collischonn, 2007; Sood & Smakhtin, 2015). Hydrologic models and Earth system models therefore require a new river network representation and flow direction method that supports unstructured meshes. In this two-part study, we introduce a novel method that combines the strengths of existing methods to produce river networks and flow direction for any mesh system. In part 1, we mainly focus on the topological relationship-based river network representation model (PyFlowline) (Liao & Cooper, 2022). In part 2, we will demonstrate how to use the topological relationship in depression filling and flow direction modeling in the HexWatershed model (Liao et al., 2020, 2022). Part 1 of the study is organized as follows. We first introduce the model algorithms. Then we apply this model to a coastal watershed, the Susquehanna river basin (SRB), using different model configurations, and evaluate the model performance against real-world river networks using several metrics, including river length and area of differences. Last, we discuss the limitations and future applications in hydrologic and Earth system models.

2 Methods

2.1 Overview

Conceptually, any river channel can be represented using three basic graph elements: vertex, edge, and flowline (Figure S7 in Supporting Information S1). River networks can be viewed as collections of these three elements. After converting existing river network datasets into these basic elements, we can then use graph algorithms to extract topological relationships including connectivity and direction. For notation, a single letter/number will be used for the vertex (e.g., vertex A) and a sequence of letters/numbers will be used for the directed edge and flowline (e.g., edge A->B). The essence of our method is that in the simplest scenario, in any type of mesh system, a river channel always intersects (enters and exits) a mesh cell on two different edges (or, less likely, its vertices) (Figure 1), unless it is either a headwater or river mouth in which either the starting or ending vertex lies within the mesh cell.
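The three graph elements can be sketched as simple data classes. This is an illustrative layout under our own naming assumptions (`Vertex`, `Edge`, `Flowline`, `is_connected`, and `flipped` are hypothetical and are not claimed to match the PyFlowline classes).

```python
# Illustrative data structures for vertex, edge, and flowline; names are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Vertex:
    lon: float  # longitude in degrees
    lat: float  # latitude in degrees


@dataclass(frozen=True)
class Edge:
    start: Vertex  # upstream vertex
    end: Vertex    # downstream vertex

    def flipped(self) -> "Edge":
        """Return the same edge with the opposite direction."""
        return Edge(self.end, self.start)


@dataclass
class Flowline:
    edges: List[Edge]  # ordered from upstream to downstream

    @property
    def start(self) -> Vertex:
        return self.edges[0].start

    @property
    def end(self) -> Vertex:
        return self.edges[-1].end

    def is_connected(self) -> bool:
        """Check that each edge starts where the previous edge ends."""
        return all(a.end == b.start for a, b in zip(self.edges, self.edges[1:]))

    def flipped(self) -> "Flowline":
        """Reverse the flow direction, e.g., to correct a flowline digitized backward."""
        return Flowline([edge.flipped() for edge in reversed(self.edges)])


# Flowline A->B->C built from two directed edges (coordinates are made up).
a, b, c = Vertex(-76.0, 40.0), Vertex(-76.1, 39.9), Vertex(-76.2, 39.8)
flowline = Flowline([Edge(a, b), Edge(b, c)])
```

Storing direction on the edge makes connectivity and direction checks simple comparisons of shared vertices, which is what graph algorithms on the network rely on.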

Figure 1. Illustration of a river channel entering and exiting a mesh cell from its edges. Blue curves are river channels. Black polygons are mesh cells (triangle, square, and hexagon). Regardless of the cell type, the intersection between a river channel and a mesh cell always results in two vertices on mesh cell edges, such as A/D, B/E, and C/F pairs, and a directed flowline within, such as A->D, B->E, and C->F.

Based on the intersections between the river channels and mesh system, real-world river networks can be represented digitally as collections of the individual “reach” within each mesh cell (e.g., flowline A->D and B->E in Figure 1). Based on the intersections, topological relationships (e.g., which cells are upstream of the current cell) between mesh cells can be built. As a result, river networks can be consistently preserved in any mesh system. To achieve this, our method consists of several major steps, illustrated in Figure 2 and described in detail in the following sections.

Figure 2. The workflow of the topological relationship-based river networks representation method. The workflow includes three major components: flowline simplification, mesh generation, and intersection-based topology reconstruction. Each component contains one or more steps with indices. Several steps (highlighted by the yellow arrow) may be run recursively in flowline simplification. The output from flowline simplification can be optionally used for mesh generation (purple arrow). Both structured (e.g., rectangle lat-lon and hexagon) and unstructured meshes (e.g., Model for Prediction Across Scales mesh) are supported. Step 13 is a combination of several steps from Steps 3 to 9.

In addition, because our method is built entirely on the geodesic coordinate system (Text S1 in Supporting Information S1), it can be applied at both regional and global scales.

2.2 Flowline Simplification

To address potential issues in existing vector datasets, we developed a list of algorithms (Steps 1 to 9 in Figure 2) to pre-process the vector flowlines. In practice, these algorithms may be run in different combinations and orders depending on the datasets. From now on, flowline pre-processing, whether as individual steps or a collection of several steps, is referred to as flowline simplification, and its outputs are referred to as simplified flowlines. Details of the algorithms in Steps 1 to 9 are provided in Supporting Information S1 (Text S2). Additionally, our method provides the option to burn dams and associated flowlines in the workflow. For flowline simplification, this means that besides the prescribed high-order flowlines, the model also includes all the downstream flowlines of the user-provided dams.

2.3 Mesh Generation

Mesh generation is the process that defines the spatial domain discretization (Liao et al., 2020). Structured meshes such as the geographic coordinate system (GCS) rectangle mesh (latitude-longitude), projected coordinate system (PCS) rectangle mesh (square), and hexagonal mesh can be generated using various GIS computer programs (e.g., QGIS MMQGIS) (Minn, 2021) or with Python scripting (Liao et al., 2022). This usually involves the following steps: (a) obtaining the spatial extent of the study domain; (b) setting the lower-left or upper-left corner as the origin; (c) calculating the number of rows and columns based on the desired resolution and spatial extent; (d) calculating the vertex coordinates of each mesh cell; and (e) exporting all the mesh cells to a GIS file format.
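Steps (a) to (e) can be sketched for a GCS rectangle (lat-lon) mesh as follows. The bounding box, the resolution in degrees, and the function names are assumptions made for illustration; GeoJSON is used here only as one example of a GIS file format.

```python
# Illustrative lat-lon rectangle mesh generation; extent and names are hypothetical.
import json
import math
from typing import Dict, List, Tuple

Polygon = List[Tuple[float, float]]


def build_latlon_mesh(lon_min: float, lat_min: float, lon_max: float, lat_max: float,
                      resolution_deg: float) -> List[Polygon]:
    """Steps (a)-(d): discretize a bounding box into rectangle cells."""
    # (b) The lower-left corner (lon_min, lat_min) is used as the origin.
    # (c) Number of columns and rows needed to cover the extent.
    ncol = math.ceil((lon_max - lon_min) / resolution_deg)
    nrow = math.ceil((lat_max - lat_min) / resolution_deg)
    cells: List[Polygon] = []
    for i in range(nrow):
        for j in range(ncol):
            # (d) Vertex coordinates of the cell, counterclockwise from the lower left.
            x0 = lon_min + j * resolution_deg
            y0 = lat_min + i * resolution_deg
            x1, y1 = x0 + resolution_deg, y0 + resolution_deg
            cells.append([(x0, y0), (x1, y0), (x1, y1), (x0, y1)])
    return cells


def to_geojson(cells: List[Polygon]) -> Dict:
    """Step (e): wrap the cells as a GeoJSON FeatureCollection of closed polygons."""
    features = []
    for cell_id, ring in enumerate(cells):
        closed = [list(p) for p in ring] + [list(ring[0])]  # first vertex repeated
        features.append({
            "type": "Feature",
            "properties": {"cell_id": cell_id},
            "geometry": {"type": "Polygon", "coordinates": [closed]},
        })
    return {"type": "FeatureCollection", "features": features}


# (a) Hypothetical spatial extent in degrees, discretized at 0.5 degrees.
cells = build_latlon_mesh(-79.0, 39.0, -75.0, 43.0, 0.5)
geojson_text = json.dumps(to_geojson(cells))
```

A PCS square mesh follows the same steps with projected coordinates in meters, while a hexagon mesh additionally offsets alternate rows or columns.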

For unstructured meshes such as the TIN or the Model for Prediction Across Scales (MPAS) mesh (Text S3 in Supporting Information S1) (Ringler et al., 2013), advanced mesh generators such as JIGSAW are available (Engwirda, 2014; Sahr, 2015). These mesh generators can generally apply mesh refinement near some ROIs, such as river networks and/or coastal areas. Some generators also allow mesh grid centers to align with predefined polylines (e.g., river flowlines or coastal lines) and points (e.g., dams) to satisfy particular modeling needs. Our study mainly used the JIGSAW mesh generator to produce an MPAS mesh. Specifically, we used simplified flowlines, dam locations, and coastal lines during the mesh refinement process so that the MPAS mesh cells align with these hydrologic features (Engwirda, 2017).

To maintain consistency, we use an “equivalent” resolution, which is the square root of the mesh cell spherical sector area, to define mesh resolution (Equation 1, and Text S1 in Supporting Information S1) (Liao et al., 2022). For a structured GCS rectangle mesh, the equivalent resolution changes with latitude. For unstructured variable-resolution meshes such as TIN and MPAS meshes, the equivalent resolution applies only to an individual mesh cell.

$$Res_c = \sqrt{Area_c} \tag{1}$$

where $Res_c$ is the “equivalent” resolution (m), and $Area_c$ is the mesh cell spherical sector area (m²).
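The latitude dependence of the equivalent resolution for a GCS rectangle mesh can be illustrated with a short sketch. It assumes a spherical Earth with a mean radius of 6,371 km and uses the closed-form area of a lat-lon cell on a sphere, which may differ from the area calculation described in Text S1; the function names are hypothetical.

```python
# Illustrative equivalent-resolution calculation on a sphere; names are hypothetical.
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (m)


def latlon_cell_area(lon_w: float, lat_s: float, lon_e: float, lat_n: float,
                     radius: float = EARTH_RADIUS_M) -> float:
    """Area (m2) of a lat-lon cell on a sphere: R^2 * dlon * (sin(lat_n) - sin(lat_s))."""
    dlon = math.radians(lon_e - lon_w)
    return radius ** 2 * dlon * (math.sin(math.radians(lat_n)) - math.sin(math.radians(lat_s)))


def equivalent_resolution(area_m2: float) -> float:
    """Equivalent resolution (m): the square root of the cell area."""
    return math.sqrt(area_m2)


# The same 0.5-degree cell at the equator and at 41 degrees north.
res_equator = equivalent_resolution(latlon_cell_area(0.0, 0.0, 0.5, 0.5))
res_midlat = equivalent_resolution(latlon_cell_area(0.0, 41.0, 0.5, 41.5))
```

Under these assumptions the 0.5° cell is roughly 56 km at the equator and noticeably smaller at 41°N, which is why a single nominal resolution cannot describe a lat-lon mesh.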

Mesh generation is not the focus of this study and has been extensively explored in relevant communities (Engwirda, 2017; Minn, 2015; Sahr, 2015). This study uses relatively simple meshes and existing mesh generators as test cases to demonstrate our new method.

2.4 Topological Relationships Reconstruction

Intersecting the simplified flowlines and meshes breaks stream segments into stream reaches. For simplicity, we redefine the simplified flowlines using stream segments and stream reaches (Figure S12 in Supporting Information S1). Because both the stream segment index and stream order information are defined during the flowline simplification step (Step 9 in Figure 2), they are directly assigned to each stream reach. A mesh cell may contain one or more internal reaches if intersecting with the simplified flowlines.

Reconstruction of the mesh cell topological relationships starts from the user-provided approximate outlet location (latitude and longitude coordinates) and searches from the outlet to upstream headwater in reverse. Figure 3 illustrates this process in the simplest scenario with only three stream segments in a rectangle mesh. The model also allows a segment to take a “shortcut” (red dashed edge F->I in Figure 3) when it passes a mesh cell in a short distance (Reach 3->2 within cell E), which produces the classical D8 diagonal travel path (only in the rectangle mesh). This short distance is defined as a “shortcut” threshold parameter (Table S2 in Supporting Information S1). The resulting topological relationships are stored and expressed as conceptual flowlines, which connect one cell center to another (blue and red dashed arrows/edges).

Figure 3. Illustration of the topological relationships reconstruction in a 3-row by 4-column rectangle mesh. The letters in the top-left corner of each cell, from A to L, represent mesh cell IDs. The numbers from 1 to 7 represent the start, end, and intersected vertices. Colored solid arrows are intersected real stream reaches within each mesh cell. Starting from outlet vertex 1, the algorithm searches reversely and reconstructs the cell topological relationships as D->C->B->F, H->L->K->G->F, and F->E->I in blue dashed arrows. Optionally, if the D8 diagonal path is turned on because the reach (3->2) length within grid E is less than the user-provided threshold, the algorithm omits the grid E and takes a shortcut (red dashed arrow). Because the algorithm strictly follows topology relationships, it precisely captures confluence (Grid F), parallel river (C->B and G->F), and meander (H->L->K->G).
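The outlet-to-headwater search and the optional shortcut can be sketched with the cell sequences from Figure 3. The reach lengths, segment indices, data layout, and function names below are hypothetical, and the sketch does not check that the two cells joined by a shortcut are diagonal neighbors, which a real implementation would need to do.

```python
# Illustrative topology reconstruction; data layout and names are hypothetical.
from typing import Dict, List, Tuple

Reach = Tuple[str, float]  # (mesh cell ID, reach length within the cell in m)


def conceptual_path(reaches: List[Reach], shortcut_threshold: float) -> List[str]:
    """Cell sequence of one segment (upstream to downstream), skipping short interior reaches."""
    last = len(reaches) - 1
    cells = []
    for i, (cell, length) in enumerate(reaches):
        interior = 0 < i < last
        if interior and length < shortcut_threshold:
            continue  # take the shortcut past this cell
        cells.append(cell)
    return cells


def reconstruct(segments: Dict[int, List[Reach]], upstream_of: Dict[int, List[int]],
                outlet_segment: int, shortcut_threshold: float = 0.0) -> List[Tuple[str, str]]:
    """Search from the outlet segment to the headwaters; return (from_cell, to_cell) links."""
    links: List[Tuple[str, str]] = []
    stack = [outlet_segment]
    while stack:
        segment = stack.pop()
        path = conceptual_path(segments[segment], shortcut_threshold)
        # Walk the segment in reverse, from its downstream end to its upstream end.
        for to_cell, from_cell in zip(reversed(path), reversed(path[:-1])):
            links.append((from_cell, to_cell))
        stack.extend(upstream_of.get(segment, []))
    return links


# Cell sequences follow Figure 3; reach lengths (m) are made up.
segments = {
    1: [("F", 20000.0), ("E", 2000.0), ("I", 15000.0)],
    2: [("D", 30000.0), ("C", 40000.0), ("B", 40000.0), ("F", 25000.0)],
    3: [("H", 30000.0), ("L", 35000.0), ("K", 40000.0), ("G", 40000.0), ("F", 10000.0)],
}
upstream_of = {1: [2, 3]}  # segments 2 and 3 join at the confluence in cell F

links_plain = reconstruct(segments, upstream_of, outlet_segment=1)
links_shortcut = reconstruct(segments, upstream_of, outlet_segment=1, shortcut_threshold=5000.0)
```

With the threshold set to 5 km, the short reach in cell E is skipped and the link F->I replaces F->E and E->I; the confluence in cell F and the meander H->L->K->G are preserved in both cases.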

With these capabilities, the model can represent the river meander, confluence, and parallel river in any mesh system (Figures S4–S6 in Supporting Information S1). In some cases, a stream segment may enter and exit a mesh cell multiple times on the same cell edge, which results in a “cycling” effect (e.g., F->E->B->E->D in Figure S10 in Supporting Information S1). To address this issue, the cycling removal algorithm (Step 6 in Figure 2) is applied again to remove the loop (e.g., the final topological relationships are F->E->D). It is also possible that multiple conceptual flowlines enter and exit the same cell. Therefore, some flowline simplification algorithms are reused (Step 13 in Figure 2).
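The effect of removing a cycle from a cell sequence can be shown with a generic loop-erasure sketch. The actual Step 6 algorithm is described in the Supporting Information and may differ; the function name here is hypothetical.

```python
# Generic loop erasure on a cell sequence; not necessarily the Step 6 algorithm.
from typing import List


def remove_cycles(path: List[str]) -> List[str]:
    """Drop any loop that returns to an already visited cell."""
    cleaned: List[str] = []
    for cell in path:
        if cell in cleaned:
            # Revisit: discard the cells visited since the first visit to this cell.
            del cleaned[cleaned.index(cell) + 1:]
        else:
            cleaned.append(cell)
    return cleaned


cleaned_path = remove_cycles(["F", "E", "B", "E", "D"])
```

For the example in the text, the detour through cell B is erased and the sequence F, E, D remains.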

3 Model Application

3.1 Study Area

The SRB is a major river basin located in the Mid-Atlantic region of the United States. The total drainage area of the SRB is about 7.1 × 10⁴ km². Its surface elevation ranges from 0 m to more than 900 m. It contains both relatively mild and steep surface slopes, ranging up to 30° in some areas (Figure 4). Spatial datasets and maps were produced using Python packages including Matplotlib and GDAL (GDAL/OGR contributors, 2019; Gillies et al., 2007; Hunter, 2007; Liao, 2022).

Figure 4. The spatial location, surface elevation, and surface slope distribution (based on DEM) of the Susquehanna river basin. The upper left red polygon is the Watershed Boundary Data set watershed boundary on Google Maps; the upper right is the histogram of surface slope (degree), and the bottom is the topographic map (m). In the topographic map, the black lines are major river channels. The red crosses are major dams. The outlet is in the lower right corner.

Because the SRB drains into the Chesapeake Bay, the study area is prone to flooding induced by sea-level rise, storm surge, and other extreme events. Due to the resolution mismatch and mesh differences between the land, river, and ocean in Earth system models, the study area is generally poorly represented, especially near the coastal lines (Feng et al., 2022).

3.2 Data

We collected river networks, watershed boundaries, and dam locations from the United States National Hydrography Data set Plus High Resolution (NHDPlus HR) and Watershed Boundary Data set (USGS, 2013). Because NHDPlus HR contains more than 10,000 flowlines in the SRB (Figure 5), we used a stream order threshold (greater than or equal to 6 or 7, depending on the case) to reduce the number of flowlines; with a threshold of 7, 120 flowlines are retained (Figure 5) (Tarboton et al., 1991).
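The stream order filter amounts to a simple threshold test. A minimal sketch, assuming each flowline is a record with a hypothetical `stream_order` field (the actual NHDPlus HR attribute names are not given here):

```python
# Illustrative stream order filter; records and field names are hypothetical.
from typing import Dict, List

flowlines: List[Dict] = [
    {"id": 1, "stream_order": 7},
    {"id": 2, "stream_order": 6},
    {"id": 3, "stream_order": 3},
    {"id": 4, "stream_order": 8},
]


def filter_by_stream_order(records: List[Dict], min_order: int) -> List[Dict]:
    """Keep flowlines whose stream order is greater than or equal to min_order."""
    return [record for record in records if record["stream_order"] >= min_order]


kept_order_7 = filter_by_stream_order(flowlines, 7)
kept_order_6 = filter_by_stream_order(flowlines, 6)
```

Lowering the threshold from 7 to 6 retains more tributaries, which is how the level of flowline detail is varied between cases.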

Figure 5. The spatial distribution of river flowlines from NHDPlus HR for the Susquehanna river basin. (a) shows the raw data with all flowlines (more than 10,000 flowlines), and (b) shows the filtered flowlines with stream orders greater than or equal to 7. A total of 120 river flowlines are retained.

3.3 Model Setup

Major model configuration parameters are listed in Table S2 in Supporting Information S1. To evaluate the sensitivity of model performance to spatial resolution and meshes, we ran the model with different configurations with case indices used for illustrations (Table 1).

Table 1. Simulation Configurations With Case Indices
Mesh/Resolution    50 km    10 km    5 km
Lat-lon            1        2        3
Square             4        5        6
Hexagon            7        8        9
MPAS               10/11/12
Note. The illustrations and analyses all use the same indices.

For the structured meshes, that is, the GCS rectangle mesh (lat-lon), PCS rectangle mesh (square), and hexagon mesh, we ran three spatial resolutions (50 km, 10 km, and 5 km) with the same stream order threshold of 7. The square mesh Case 6 results are used for comparison with the traditional raster DEM-based method (Text S4, Figure S13 in Supporting Information S1).

For the unstructured variable resolution (3–10 km) MPAS mesh, we ran three cases: (a) Case 10 with stream order threshold 7 (the same as the structured-mesh cases); (b) Case 11 with stream order threshold 6, which is used to evaluate the performance at different levels of flowline detail; and (c) Case 12 with stream order threshold 7 and additional dams burnt into the flowlines and mesh, which is used to illustrate the dam-burning capability.

3.4 Results and Analysis

3.4.1 Flowline Simplification

After model simulation, the river flowlines were substantially simplified. First, the total number of flowlines was reduced from 120 to 7, with 4 headwater vertices, 3 confluence vertices, and 1 outlet vertex (Figure 6). Second, all the small rivers with lengths less than the threshold were removed (Figure 7). Third, all braided rivers were removed (Figure 7). Flowline simplification results are the same from Cases 1 to 10.
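The vertex counts are consistent with a network in which every confluence joins two upstream flowlines. The sketch below classifies vertices by in-degree and out-degree; the connectivity is invented for illustration and only the counts are taken from the text.

```python
# Illustrative vertex classification; the connectivity below is hypothetical.
from collections import Counter
from typing import Dict, List, Tuple

# Seven directed flowlines as (start vertex, end vertex) pairs.
flowlines: List[Tuple[str, str]] = [
    ("h1", "c1"), ("h2", "c1"), ("c1", "c2"), ("h3", "c2"),
    ("c2", "c3"), ("h4", "c3"), ("c3", "outlet"),
]


def classify_vertices(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group flowline end vertices into headwaters, confluences, and outlets."""
    indegree = Counter(end for _, end in edges)
    outdegree = Counter(start for start, _ in edges)
    vertices = set(indegree) | set(outdegree)
    return {
        "headwater": sorted(v for v in vertices if indegree[v] == 0),   # nothing flows in
        "confluence": sorted(v for v in vertices if indegree[v] >= 2),  # two or more flow in
        "outlet": sorted(v for v in vertices if outdegree[v] == 0),     # nothing flows out
    }


groups = classify_vertices(flowlines)
```

With two flowlines joining at each confluence, n headwaters imply n - 1 confluences and 2n - 1 flowlines, which gives 4, 3, and 7 here.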

Figure 6. The spatial distribution of flowlines after flowline simplification. (a) Colored line features are 7 flowlines after simplification. Segments 1 to 7 are segment indices. Confluences 1 to 3 are confluence indices.

Figure 7. Before and after flowline simplification comparisons. (a and b) are zoomed-in views where small rivers are removed near the river meanders; (c and d) are zoomed-in views where braided and small rivers are removed.

After removing small and braided rivers, the total length of the flowlines decreased by 10.0% (dashed red and blue lines in Figure 8).

Figure 8. Comparison of the total flowline length from different model configurations. The X-axis is the spatial resolution, and the Y-axis is the ratio between modeled and simplified river length. The dashed red and blue lines are the NHDPlus HR and simplified total flowline length. Colored (green, purple, orange, and yellow) bars with textures represent different model configurations.

Depending on user needs, a combination of stream order and dam information can be used in PyFlowline to preserve different levels of details (e.g., tributaries). For example, the simplified flowlines from Cases 11 and 12 are illustrated in Figure S11 in Supporting Information S1.

3.4.2 Mesh Generation

The generated structured GCS rectangle lat-lon, PCS square, and PCS hexagon meshes are illustrated in Figure 9.

Figure 9. The spatial distributions of modeled conceptual flowlines on the lat-lon, square, and hexagon meshes at multiple spatial resolutions. The layout of the figures and Case indices from 1 to 9 is the same as Table 1. Black line features represent the simplified flowlines. Colored line features represent the conceptual flowlines.

The generated MPAS mesh is illustrated in Figure 10. Because both the small and braided rivers were removed, JIGSAW successfully aligned cell centers following the simplified flowlines at the desired resolution. JIGSAW also varies cell resolution by considering the distance of the cell center to the simplified flowlines (as well as the coastal lines) through a density function. In this case, a coastal-resolving mesh spacing function is adopted to cluster high model resolution near coastlines. Mesh resolution varies between 3 km near the river outlet/coastal line and 10 km near the domain boundary.

Figure 10. Modeled flowline-guided Model for Prediction Across Scales mesh from Case 10 (clipped to the study domain) (Table 1). Because the JIGSAW mesh generator considers the simplified flowlines in its density function, the mesh cell center locations align with the flowlines, and cell resolutions are higher (∼3 km) near the flowline than away from it (∼10 km).

3.4.3 Topological Relationship Reconstruction

In general, on the structured meshes, the spatial patterns of modeled conceptual flowlines are similar to the simplified flowlines at different spatial resolutions (Figure 9).

The model can separate parallel flowlines such as Segments 1 and 6 (Figure 6). It also captures all the river confluences. As the spatial resolution increases, the model can capture more spatial details. At 5 km resolution, the modeled conceptual flowlines can accurately describe the river confluences. The shortcut algorithm also works well, producing the classical D8 diagonal paths in many scenarios (e.g., Cases 2 and 5).

On the unstructured MPAS mesh, the modeled conceptual flowlines closely follow the provided simplified flowlines (Figures 10 and 11). The model's performance varies with the complexity of the simplified flowline and the mesh cell resolution. For example, at moderate resolution (∼6 km), the modeled conceptual flowline follows the overall pattern and captures the sharp U-turn near the river meander (Figure 11). Near Confluence 1 and the outlet (∼3 km), the modeled conceptual flowlines almost overlap with the provided simplified flowlines (Figure 11). The MPAS mesh-based conceptual flowlines are closer to the simplified flowlines than the structured mesh-based conceptual flowlines (Section 3.5.2). This improved fidelity results from the flexibility of a fully unstructured mesh, which allows mesh cells to be aligned closely with stream features during mesh generation and thereby improves the accuracy of the reconstructed river flowlines.


Zoomed-in views of the spatial distributions of modeled conceptual flowlines on the Model for Prediction Across Scales mesh from Case 10 (Table 1). Panels (a), (b), and (c) show the areas near the river meander, river confluence, and river outlet, respectively. Black line features are the simplified flowlines. Colored line features are the conceptual flowlines.

With increased flowline details, that is, stream order 6 or dam burnt-in, the model can still capture the river networks (Figure S11 in Supporting Information S1).

3.5 Metrics Analysis

For metrics analysis, we only focus on Cases 1 to 10 as they have the same level of flowline details.

3.5.1 Flowline Length

The modeled cell center-based conceptual flowline length, although not always used in hydrologic or Earth system models (Paz & Collischonn, 2007), reflects the closeness between the modeled and real river networks. For structured meshes, the total flowline length increases as the spatial resolution increases (Figure 8). At the same spatial resolution, the differences between different meshes are around 10.0%. The total length from the MPAS mesh-based flowlines is shorter than the results from equivalent structured meshes (Figure 8). Differences between different segments are mainly influenced by river meander features (Section 3.5.4).

Because the MPAS mesh is designed to follow flowlines whenever possible, its length and area of difference (Section 3.5.2) are the smallest. In contrast, the structured mesh-based flowlines usually generate a “zig-zag” effect, which produces a larger length and area of difference (Figure 12).
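The "zig-zag" effect can be illustrated with a minimal Python sketch. It assumes planar coordinates in kilometers and a hypothetical straight diagonal flowline crossing five 1 km cells. The point lists and the `polyline_length` helper are illustrative only and are not part of PyFlowline.

```python
import math


def polyline_length(points):
    """Sum of straight-line distances between consecutive (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Hypothetical "real" flowline: a straight diagonal, 5 km east and 5 km north.
real = [(0.0, 0.0), (5.0, 5.0)]

# Center-to-center path restricted to edge-sharing neighbors of a square mesh:
# one step east, then one step north, repeated (the zig-zag pattern).
zigzag = [(0.0, 0.0)]
for i in range(5):
    zigzag.append((i + 1.0, float(i)))
    zigzag.append((i + 1.0, i + 1.0))

# Path on a mesh whose cell centers were placed along the flowline.
aligned = [(float(i), float(i)) for i in range(6)]

real_length = polyline_length(real)
zigzag_length = polyline_length(zigzag)
aligned_length = polyline_length(aligned)
```

In this idealized setting, the zig-zag path is longer than the real flowline, whereas the flowline-aligned path has the same length as the real flowline.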


Illustration of the conceptual length and area of difference. The dashed black lines are the cell edges. The blue and purple dots are cell centers and vertices, respectively. The black/green/yellow lines are the real, structured mesh-based, and Model for Prediction Across Scales (MPAS) mesh-based flowlines. Because the structured mesh-based flowlines (green lines) do not closely follow the real flowlines (black lines), their total length is larger than the MPAS mesh-based flowline (yellow line). The structured mesh-based area of difference (light blue polygons) is larger than the MPAS mesh-based area of difference (red polygons).

3.5.2 Area of Difference

Similar to our earlier study (Liao et al., 2022), we compared the area of difference formed by flowline intersections. In this method, we used area to represent line feature discrepancies. If two or more line features intersect, the intersected segments can be used to create enclosed polygons. In general, the smaller the total area, the closer the line features. The area of difference is illustrated in Figure 12. The area of difference can be calculated with Python scripts in the following steps:
  1. Convert modeled conceptual flowlines into edge-based flowlines A;

  2. Intersect the edge-based flowlines A with the simplified flowlines B to obtain all the vertices list C;

  3. Classify the vertices C into different types of vertices (Text S2 in Supporting Information S1);

  4. Split both A and B using C to obtain a list of flowlines D;

  5. Build all the polygons enclosed by connected flowlines in D using a cycling algorithm and calculate their areas;

  6. Sum up the areas to obtain the total area of difference.
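Steps 5 and 6 can be sketched in Python under simplifying assumptions. The sketch assumes planar coordinates and assumes that steps 1 to 4 have already been completed, so that the split flowlines are available as pairs of pieces that share the same start and end vertices and do not otherwise touch. This pairing replaces the general cycle search and is an assumption of the sketch, not a description of the scripts used in this study. The names `ring_area` and `area_of_difference` are hypothetical.

```python
def ring_area(ring):
    """Unsigned shoelace area of a closed ring given as (x, y) vertices."""
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def area_of_difference(paired_pieces):
    """Sum the areas enclosed by each (piece_a, piece_b) pair.

    Assumption: both pieces of a pair share their first and last vertices
    and do not intersect anywhere else.
    """
    total = 0.0
    for piece_a, piece_b in paired_pieces:
        # Follow piece_a forward, then return along piece_b,
        # skipping the two shared end vertices.
        ring = list(piece_a) + list(reversed(piece_b))[1:-1]
        total += ring_area(ring)
    return total


# Hypothetical pieces: modeled flowline (first) and simplified flowline (second).
pairs = [
    ([(0, 0), (1, 1), (2, 0)], [(0, 0), (2, 0)]),
    ([(2, 0), (2, -1), (4, -1), (4, 0)], [(2, 0), (4, 0)]),
]
total_area = area_of_difference(pairs)
```

For geographic coordinates, the planar shoelace formula would need to be replaced with an area calculation on the sphere or ellipsoid.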

For structured meshes, the area of difference decreases as the spatial resolution increases. At the same spatial resolution, the hexagon mesh-based area of difference is the smallest except at the 50 km spatial resolution. For the unstructured mesh, the MPAS mesh-based area of difference is smaller than its equivalent structured mesh results (Figures 13 and 14).


The enclosed area of difference between modeled river networks and simplified NHDPlus flowlines from the Model for Prediction Across Scales mesh.


Comparison of the area of difference for modeled conceptual flowlines from different model configurations. The X-axis is the spatial resolution and the Y-axis is the area of difference (km²).

3.5.3 Branching Angle

The ramification angle, or branching angle (Text S1 in Supporting Information S1), is another important characteristic of river networks (Devauchelle et al., 2012). This metric is calculated using the last incoming flowline edges of a confluence. The results show that conceptual flowlines generally cannot capture the branching angles well, especially at the 50 km spatial resolution. For example, the branching angles at Confluence 2 are all 180° from the structured meshes (Figures 9 and 15). This is because only a limited number of branching angles are supported by the structured meshes. In contrast, the branching angles from the MPAS mesh are more flexible because the mesh provides flexible rotations.
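As a rough illustration, the angle between the last incoming edges of two tributaries can be computed as follows. The sketch assumes planar coordinates and takes the branching angle to be the angle at the confluence vertex between the two incoming edges. The exact definition is given in Text S1, and the function name `branching_angle` is hypothetical.

```python
import math


def branching_angle(confluence, upstream_a, upstream_b):
    """Angle in degrees (0 to 180) at the confluence between two incoming edges.

    Each incoming edge runs from an upstream vertex to the confluence vertex.
    """
    ax, ay = upstream_a[0] - confluence[0], upstream_a[1] - confluence[1]
    bx, by = upstream_b[0] - confluence[0], upstream_b[1] - confluence[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    # atan2 with the absolute cross product is numerically stable
    # for both small and near-180-degree angles.
    return math.degrees(math.atan2(abs(cross), dot))


# Two tributaries entering a cell from opposite neighbors of a structured mesh.
opposite = branching_angle((0.0, 0.0), (-1.0, 0.0), (1.0, 0.0))

# Two tributaries entering from the two upper diagonal neighbors.
diagonal = branching_angle((0.0, 0.0), (-1.0, 1.0), (1.0, 1.0))
```

The first case corresponds to the 180° angles reported for the structured meshes when both tributaries enter a cell from opposite sides. Because a structured cell has only a few neighbors, only a few such angles are possible.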


Comparison of the river branching angle from different model configurations for Confluences 1 to 3. The X-axis is the spatial resolution and the Y-axis is the branching angle (degree). The dashed blue line represents results from the simplified flowlines, used as the reference.

3.5.4 River Sinuosity

River sinuosity is the ratio between flowline length and valley length (the distance between the starting and ending vertices of a flowline). How well the model reproduces sinuosity depends on the number of meander features in a segment. For example, all the models underestimate river sinuosity for Segment 4, which exhibits many meander features (Figure 16). In contrast, the modeled river sinuosity is much closer to the reference for Segment 7, especially at the 5 km spatial resolution, in part because this segment has fewer meander features (Figure 16).
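The sinuosity definition above can be written as a short Python sketch. It assumes planar coordinates, whereas flowlines in geographic coordinates would require geodesic distances. The function name `sinuosity` and the sample vertices are hypothetical.

```python
import math


def sinuosity(points):
    """Ratio of flowline length to valley length for a list of (x, y) vertices.

    Valley length is the straight-line distance between the first and
    last vertices of the flowline.
    """
    flowline_length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    valley_length = math.dist(points[0], points[-1])
    if valley_length == 0.0:
        raise ValueError("start and end vertices coincide; sinuosity is undefined")
    return flowline_length / valley_length


# A straight flowline has a sinuosity of exactly 1.
straight = sinuosity([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])

# A flowline with a single bend is longer than its valley length.
bent = sinuosity([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)])
```

A coarse mesh that cuts across meanders shortens the flowline length but leaves the valley length nearly unchanged, which lowers the computed sinuosity.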


Comparison of river sinuosity for Segments 1 to 7 from different model configurations. The X-axis is the spatial resolution and the Y-axis is the sinuosity (ratio). The dashed blue line is the simplified flowline, used as the reference. Colored (green, purple, orange, and yellow) bars with textures represent different model configurations.

4 Discussion

4.1 Thresholds

4.1.1 Stream Order

The stream order used to filter out small rivers from the original NHDPlus HR datasets determines the level of detail the model can preserve (Figure 5). This threshold should be tested based on the mesh spatial resolution and hydrologic model application. For example, if low-order rivers are included during the data preparation step, they may still be removed by the small river removal algorithm.

Because stream order is based on the topological relationship between stream segments, it may not reflect the actual drainage area of a river channel. For this reason, the drainage area should also be considered to filter out small rivers.
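A combined filter could look like the following minimal sketch. The attribute names `stream_order` and `drainage_area_km2`, the threshold values, and the choice to require both criteria are assumptions made for illustration. They do not describe the current PyFlowline implementation or the NHDPlus HR field names.

```python
def filter_flowlines(flowlines, min_stream_order, min_drainage_area_km2):
    """Keep flowlines that meet both the stream order and drainage area thresholds."""
    return [
        flowline
        for flowline in flowlines
        if flowline["stream_order"] >= min_stream_order
        and flowline["drainage_area_km2"] >= min_drainage_area_km2
    ]


# Hypothetical flowline records.
flowlines = [
    {"id": 1, "stream_order": 5, "drainage_area_km2": 1200.0},
    {"id": 2, "stream_order": 4, "drainage_area_km2": 15.0},
    {"id": 3, "stream_order": 2, "drainage_area_km2": 300.0},
    {"id": 4, "stream_order": 4, "drainage_area_km2": 450.0},
]

kept = filter_flowlines(flowlines, min_stream_order=4, min_drainage_area_km2=100.0)
kept_ids = [flowline["id"] for flowline in kept]
```

In this example, flowline 2 passes the stream order test but is removed because its drainage area is small, which is the situation that a stream order threshold alone cannot detect.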

4.1.2 Small River Removal

The “small river” threshold (Table S2 in Supporting Information S1) is an important parameter for removing river channels with relatively short lengths. It should be configured considering the mesh spatial resolution. User-provided flowlines may be too fragmented, causing the model to excessively remove flowlines. For this reason, a visual inspection of the raw flowlines is important.

4.1.3 Shortcut Path Length

Although the “shortcut path” algorithm allows the model to reproduce the classical D8 diagonal travel path in the rectangle meshes, its impact on the topological relationship reconstruction process is not trivial. The structured rectangle mesh particularly exemplifies the effects of the “shortcut path” algorithm because it produces a diagonal travel path that goes through the mesh cell vertex. In contrast, the shortcut remains on the mesh cell edge instead of the vertex for the structured hexagon and unstructured MPAS meshes. Thus, the structured hexagon and unstructured MPAS meshes are preferred for coupled surface and subsurface hydrologic simulations because they are more compatible with hydrologic models that assume the flux exchange occurs through the cell faces (Liao et al., 2022).

Similar to the small river removal algorithm, this parameter should consider the mesh spatial resolution. Therefore, a fraction between 0.0 and 1.0 of the mesh cell resolution is preferred.

In our model, the flowline length is used to determine whether the conceptual flowline should take a shortcut. Other studies suggest that the distance from the flowline to the mesh cell center should also be considered (Lindsay, 2016a). A combined approach considering both length and distance may produce even more robust results.
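The resolution-scaled, length-based test can be sketched as follows. The sketch assumes that a cell is bypassed when the flowline length inside it falls below a fraction of the mesh cell resolution. The function names, the default fraction of 0.5, and the strict comparison are hypothetical and are not taken from the PyFlowline source code.

```python
def shortcut_threshold(cell_resolution_m, fraction):
    """Length threshold expressed as a fraction (0.0 to 1.0) of the cell resolution."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    return fraction * cell_resolution_m


def takes_shortcut(length_in_cell_m, cell_resolution_m, fraction=0.5):
    """Return True if the flowline length inside a cell is below the threshold."""
    return length_in_cell_m < shortcut_threshold(cell_resolution_m, fraction)


# On a 5 km mesh with a fraction of 0.5, the threshold is 2.5 km.
short_crossing = takes_shortcut(1000.0, 5000.0)   # clips the cell corner
long_crossing = takes_shortcut(4000.0, 5000.0)    # crosses most of the cell
```

Because the threshold is tied to the cell resolution, the same fraction can be reused across meshes of different resolutions.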

4.2 Quality of Modeled Conceptual Flowlines

In general, the quality of modeled conceptual flowlines increases as the spatial resolution increases. For structured meshes, the model performances are very close and the hexagon mesh is potentially better (Figure 14), which is consistent with our earlier study (Liao et al., 2020). The MPAS mesh-based conceptual flowlines are the closest to the real flowlines.

Although the center-to-center flowline length often leads to reduced river length, the length decrease is offset by the “zig-zag” effect (Figure 12). This explains why the total length can be higher than the simplified flowlines from the structured meshes-based results (Figure 8). Because the MPAS mesh cells align with the simplified flowlines, they have a minimized offset effect and the shortest total length (Figure 16).

The area of difference comparison shows that the MPAS mesh produces the conceptual flowlines closest to the simplified flowlines, followed by the hexagon mesh. Its performance is even better than that of the structured meshes at the higher 5 km resolution (Figure 14).

Branching angle analysis shows that the structured meshes generally cannot capture the confluence ramification feature well because they only support a limited number of angles. The MPAS mesh has the potential to resolve this issue but requires a high-quality mesh (Figure 15).

4.3 Importance of Topological Relationships

Compared with the vector-based or raster DEM-based river network extraction methods, our topological relationship-based method overcomes several existing limitations. It closely follows river channels near river meanders and confluences, producing high-quality conceptual river networks.

Given the explicit topological relationships, advanced stream-burning algorithms can be implemented to remove the depressions within river channels. Traditionally, stream-burning algorithms blindly lower river channel elevations significantly to force the flow direction. More recently, adaptive stream-burning algorithms can adjust river channel elevations using a hybrid breaching and filling approach (Lindsay, 2016a). However, the breaching algorithm requires topological relationships, especially near river confluences, to avoid incorrect breaching directions (Liao et al., 2022). Therefore, the topological relationships produced from our method provide an opportunity to use the fully hybrid breaching and filling method to minimize modifications to both land and river elevations.

4.4 Comparisons With Existing Methods

Although our method uses a different approach to represent river networks, it shares some similarities with the existing methods. First, existing upscaling methods generally use high-resolution datasets as guidance to track the dominant rivers. In our method, the vector river networks are filtered using the stream order and they essentially represent the dominant rivers. Second, the graph-based flowline simplification is similar to some vector-based and upscaling method algorithms. These algorithms turn the real-world river networks into a directed graph and simplify the graph based on several criteria such as distance and connectivity. Third, existing upscaling methods often use indices to describe the entrance and exit of river channels on the coarse-resolution cell edges because a parent-child structure exists between the coarse and high-resolution structured mesh cells (Eilander et al., 2021). In our method, we use the intersection to obtain the topology to support both structured and unstructured meshes.

4.5 Implication for Hydrologic and Earth System Models

Representing river networks is key to hydrologic models. Our study achieves this through two major considerations. First, the flowline simplification guarantees that the most important river information is preserved while keeping river flowlines in their simplest formats. Second, the topological relationship-based conceptual flowlines further preserve the river network structures while linking them with the mesh system. In this way, the model is able to consistently capture the spatial pattern of river networks.

The support for non-rectangle meshes further improves model performance and demonstrates the advantage of using unstructured meshes in hydrologic and Earth system models. Instead of projecting existing hydrologic features onto the rectangle meshes, we can now directly generate meshes that explicitly define these features. For example, in this study, we burnt flowlines in the MPAS mesh through the JIGSAW density function and in the conceptual flowlines through the topological relationship reconstruction. Our method also burnt dams through the workflow (Figure S11 in Supporting Information S1). In the future, we can include other hydrologic features such as lakes and wetlands. This enables us to define the geometry of hydrologic features, resulting in improved hydrologic and Earth system models (Liao et al., 2022).

In addition to the spatial pattern, our method produces high-quality river routing parameters. For example, it keeps track of both actual and conceptual flowline length throughout the processes. However, when the “shortcut” occurs, the corresponding flowline length can be merged into the upstream/downstream flowline or ignored. More importantly, the topological relationships can be used to minimize the modifications to both land and river elevations in the fully adaptive hybrid breaching and filling method, generating more realistic slopes in both rivers and riparian zones. Because both river length and slope are important river routing parameters, the results from our method can be used to improve flow routing models.

4.6 Limitations

There are several limitations to the current model. First, unlike the small river removal and shortcut path length parameters that can be configured to consider the mesh resolution, the stream order threshold used to filter out small rivers is not directly associated with any physical attribute. Drainage area would be a more physically meaningful criterion for filtering out small rivers.

Second, the model does not use additional information in the cycling detection algorithm to decide which channel should be preserved if they have the same stream order. River channel width or length may be included to capture the dominant river channel.

Third, our current method only considers rivers in the workflow. Other hydrologic features (e.g., lake and reservoir) are not fully considered. In a complex landscape, these features should also be included so the final flow routing map is consistent for hydrologic and Earth system models.

Last, our model currently only relies on the topological relationship to reconstruct conceptual river networks. It is possible to combine our method with other upscaling methods to consider more complex scenarios. In this case, a more dedicated mesh cell topology algorithm is needed because unstructured meshes often do not have the parent-child hierarchical structure.

5 Conclusions

In this study, we developed a mesh-independent topological relationship-based river networks representation model (PyFlowline). We applied the model to the SRB with different configurations. The model evaluation shows that the model performs well and the modeled conceptual river networks are consistent with the real-world river networks. The outputs of our model, especially the topological relationships, should be used for advanced terrain analysis and hydrologic models.


This work was supported by the Earth System Model Development program area of the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research as part of the multi-program, collaborative Integrated Coastal Modeling (ICoM) project. A portion of this research was performed using PNNL Research Computing at Pacific Northwest National Laboratory. PNNL is operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RL01830.

    Conflict of Interest

    The authors certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

    Data Availability Statement

    The data used for model simulations can be downloaded through the USGS website (https://www.usgs.gov/national-hydrography). The model input and output files can also be accessed from the GitHub repository: https://github.com/DOE-ICoM/liao-etal_2022_pyflowline_james. The PyFlowline model (Liao & Cooper, 2022) can be installed as a Python package through either the Python Package Index (PyPI): https://pypi.org/project/pyflowline/, or Conda: https://anaconda.org/conda-forge/pyflowline.