# A Network-Based Flow Accumulation Algorithm for Point Clouds: Facet-Flow Networks (FFNs)

## Abstract

Flow accumulation algorithms estimate the steady state of flow on real or modeled topographic surfaces and are crucial for hydrological and geomorphological assessments, including delineation of river networks, drainage basins, and sediment transport processes. Existing flow accumulation algorithms are typically designed to compute flows on regular grids and are not directly applicable to arbitrarily sampled topographic data such as lidar point clouds. In this study we present a random sampling scheme that generates homogeneous point densities, in combination with a novel flow path tracing approach, the Facet-Flow Network (FFN), that estimates flow accumulation in terms of specific catchment area (*SCA*) on triangulated surfaces. The random sampling minimizes biases due to spatial sampling and the FFN allows for direct flow estimation from point clouds. We validate our approach on a Gaussian hill surface and study the convergence of its SCA compared to the analytical solution. Here, our algorithm outperforms the multiple flow direction algorithm, which is optimized for divergent surfaces. We also compute the SCA of a steep, vegetated 6-km^{2} catchment on Santa Cruz Island, California, based on airborne lidar point-cloud data. Point-cloud-based SCA values estimated by our method compare well with those estimated by the *D*_{∞} or multiple flow direction algorithm on gridded data. The advantage of computing SCA from point clouds becomes relevant especially for divergent topography and for small drainage areas: these are depicted with much more detail due to the higher sampling density of point clouds.

## Key Points

- Specific catchment area estimation from irregular point-cloud data by triangulated irregular networks (TINs)
- Efficient and innovative sink treatment to generate hydrologically correct flow estimates from point-cloud data
- Comparative study on accuracy gains in specific catchment area estimations with an increase in resolution of the digital topography

## 1 Introduction

Recent advances in generating high-resolution topographic data have increased the demand on computing resources, new algorithms, and techniques (Larsen et al., 2016; Roering et al., 2013). Methods such as lidar and Structure-from-Motion generate densely sampled “point clouds” that are then used for a wide range of hydrologic and tectonogeomorphologic applications (see, e.g., Arrowsmith & Zielke, 2009; Hurst et al., 2012; Hilley & Arrowsmith, 2008; Meigs, 2013; Neely et al., 2017; Perroy et al., 2012; Passalacqua et al., 2010, 2015; Tarolli et al., 2009). A typical airborne lidar data set can contain on the order of 10 points per square meter, leading to a billion points for a study site of 100 km^{2} (Passalacqua et al., 2015; Roering et al., 2013). Before being used in geomorphic and hydrological analyses, however, high-resolution point-cloud data sets are often converted to digital elevation models (DEMs) by aggregating them to a regular, coarser grid (e.g., to a resolution of ≈1 grid point per square meter). This is typically done in order to allow for more realistic computation times, potentially reduce the impact of random errors in the measurements, and to gain analytical tractability because of the regularity of the gridded estimates. Although there exist several applications that are not severely impacted by the ‘gridding’ step, as they do not require the fine-scaled, sub-meter spatial resolution, we contend that the inherent data set resolution—that of the point cloud itself—would allow more detailed insights into the hydrological features of the landscape. Geomorphic studies of ephemeral channels, channel heads in dry areas with weak lithologies, landslide scarps and arroyo formation would all benefit from submeter spatial resolution data. Existing approaches that are commonly used to construct river-flow networks from elevation data sets, however, are not equipped to handle point clouds directly, and require a DEM as a starting point of the analysis. 
Additionally, DEMs usually have fewer pixels than the underlying point cloud in order to mitigate elevation uncertainties and to avoid interpolation artifacts (Florinsky, 1998; Oksanen & Sarjakoski, 2006; Wechsler & Kroll, 2006; Wechsler, 2007).

In this study, we describe a network approach to perform flow accumulation on irregularly spaced point clouds and provide a framework that links this method to existing, grid-based flow accumulation techniques. In our approach, we generate a flow network from the irregularly sampled point-cloud data set by representing the topographic surface with a triangulated irregular network (TIN). Such a representation of the topographic surface assumes a linear model for elevations between measurements, which might not be a valid approximation depending on the roughness of the terrain and the number of measurements per square meter. However, since a TIN can be constructed from any arbitrary set of sampled points, our method is directly applicable to point-cloud data, which yields the highest possible measurement densities. Despite the existence of flow accumulation approaches on TINs (see, e.g., Jones et al., 1990; Ivanov et al., 2004; Zhou et al., 2011), we put forward an efficient and realistic approximation of flow on the surface of the TIN facets themselves and hence refer to this network as Facet-Flow Network (FFN). Our approximation is realistic because estimates converge to the analytical solution even for terrain with diverging flow where flow partitioning is most important, and it is efficient because our flow partitioning depends only on the local gradient and not on the full evolution of the corresponding *stream tube*. We define a stream tube to be the flow path bounded by the two flow lines originating at the boundary of that flow path's first facet. The evolution of flow lines on TINs is determined by the gradient of its facets.

The underlying assumption in our flow path estimation is that rain falling onto the triangulated surface (facet) of the TIN is transported from facet to facet as determined by the local gradient and thereby aggregates into channels and rivers that eventually drain into one or more outlets. However, we do not consider variations in the flow density at the subfacet level. Water exiting a facet is considered to be homogeneously distributed along the corresponding contour of the facet.

Although we are interested in a geometric terrain analysis in terms of drainage areas, the FFN approach could be extended to hydrological modeling studies. Such a hydrological model may analyze the flow propagation and its temporal evolution due to variations of rainfall in time and space. However, assuming a constant rain rate (m/s), a steady flow pattern will emerge, which can be expressed by the amount of water per time (m^{3}/s) at each point on the topographic surface. Because we assume rainfall to be spatially homogeneous (i.e., the rain rate is the same at all locations), the flow can be studied in terms of the total drainage area (*TDA*). For each point in space, the TDA is defined as the size of the 2-D surface area that drains into it. Note that the TDA at a chosen location is not proportional to the area of the 3-D surface upstream of that location, but rather to the area of the *xy* projection of that 3-D surface. TDA is thus proportional to the amount of water that flows through the surface due to spatially homogeneous rainfall. However, although not implemented that way, our approach could as well be used to accumulate the 3-D surface area of facets if a slope-dependent TDA is desired.
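The distinction between the projected area and the 3-D surface area of a facet can be illustrated with a minimal sketch. The function names `projected_area` and `surface_area` are hypothetical and not taken from the implementation of this study; the example facet is made up.

```python
import math


def projected_area(p0, p1, p2):
    """Area of the xy projection of a 3-D triangle (shoelace formula)."""
    return 0.5 * abs(
        (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])
    )


def surface_area(p0, p1, p2):
    """Area of the triangle in 3-D (half the norm of the edge cross product)."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


# A made-up inclined facet: the projected area is what enters the TDA.
facet = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0))
a_xy = projected_area(*facet)
a_3d = surface_area(*facet)
```

For any inclined facet the 3-D area exceeds the projected area, which is why accumulating the former would yield a slope-dependent TDA.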

TDA has been used by many hydrologic and geomorphologic applications to model the flow of water and sediment (Montgomery & Foufoula-Georgiou, 1993; Rengers et al., 2016; Tucker & Bras, 1998). However, TDA is inherently resolution dependent, making it difficult to compare between data sets of different resolutions (Pelletier, 2010). The resolution dependence leads to biases in estimates for irregularly sampled data. The higher the resolution, the smaller the grid cells or facets and the lower the TDA, because TDA is an aggregated measure along the contour width of cells (Erskine et al., 2006; Schoorl et al., 2000; Zhou & Liu, 2002). Although our network-based approach also accumulates flow to TDA, we use the specific catchment area (*SCA*), which is defined as TDA per unit contour width. Hence, even though both TDA and SCA are directly estimated using FFNs, SCA is unbiased even if the sampling density changes across the geographic region of interest and is therefore used throughout this study. An advantage of using FFNs is that each facet has a well-defined corresponding contour width, and thus, a more accurate estimation of SCA can be directly calculated for high spatial resolution data. For other approaches it is often challenging to estimate SCA accurately because corresponding contour widths are unknown or heuristically estimated in terms of effective contour widths (Chirico et al., 2005; Pelletier, 2010; Qin et al., 2007). Given a grid-based flow accumulation that allows multiple flow directions (MFDs), the correct contour width inside channels is the grid cell width. However, the more diverging the landscape, the higher the corresponding contour width.

An additional advantage of the FFN approach is that it enables us to apply a novel way to resolve sinks in the landscape. A sink in digital topography is a single cell, or a group of grid cells that have no lower neighbor (Jenson & Domingue, 1988; O'Callaghan & Mark, 1984). Instead of carving or filling sinks in digital topography (e.g., O'Callaghan & Mark, 1984; Rieger, 1993; Planchon & Darboux, 2002; Soille et al., 2003; Zhang et al., 2017), we propose a fundamentally different approach that does not alter the digital topography. We introduce new additional links into the flow network that “tunnel” the flow out of sinks. This approach is more similar to sink carving than to sink filling, in the sense that water is routed away from the bottom of sinks instead of overflowing the sink. High-resolution topography creates new challenges because the effect of anthropogenic artifacts such as road embankments and bridges increases with increased resolution. Similarly, at fine spatial scales, sinks exist in natural landscapes and high-resolution topographic data will result in hydrologically disconnected landscapes. Part of the difficulty arises from the problem of reliable point-cloud classification and the removal of nonground objects (i.e., vegetation and buildings) from a point cloud used for generating digital topography. Specifically, classification inaccuracies in densely vegetated terrain will introduce false point elevations, which ultimately will result in surface water flow obstruction. The FFN sink-tunneling approach mitigates some of these constraints.

In order to apply the FFN algorithm and demonstrate its usefulness, we validate FFN SCA estimations of a Gaussian hill against the analytical solution of SCA and compare the efficiency of our algorithm to that of the MFD algorithm (Freeman, 1991; Pelletier, 2008). We then analyze classified point-cloud data from the steep and vegetated Pozo catchment of the Santa Cruz Island in Southern California. For the Pozo point cloud, we compare our FFN SCA to the one derived by MFD from a corresponding sink-filled 60-cm DEM (see the supporting information for comparisons to the *D*_{∞} algorithm). Our findings suggest that FFN SCA estimations are more efficient and accurate than conventional SCA estimations from gridded data.

## 2 Methods

### 2.1 FFN Construction

We first construct a *TIN* from all elevation measurements using a 2-D Delaunay triangulation (Barber et al., 1996). Delaunay triangulation is applied to the projected, spatial coordinates (e.g., UTM eastings and northings) of the elevation measurements such that the resulting TIN is a unique nearest neighbor network embedded in geographical space. Note that depending on coordinate systems and space, the distance metric might change. For our applications, we consider Cartesian coordinates with Euclidean distances. We define *facets* as the triangular surfaces formed in the TIN that guide the flow of water. This is in contrast to existing approaches where flows are modeled along the edges of the TIN (e.g., Ivanov et al., 2004). The flow direction on a facet follows the gradient determined by the elevation values of its three corners. The flow has to exit at one or two sides of the facet and enters from corresponding neighboring facets. This determines a directed connection between facets, which we define as the directed links of the *FFN* in which the facets are the network nodes. Since we obtain the TIN by Delaunay triangulation, the FFN is spatially structured as a Voronoi graph. This FFN can be seen as a drainage network, and since we assume the flow to be homogeneous along contours within facets, FFN links connect facets that drain into each other via flow lines. Accordingly, the FFN is a relational network and flow lines are not parallel to FFN links. We conceive FFN flow lines as flow paths following the gradient of the faceted surface of the TIN. This is in contrast to flow lines estimated at subpixel resolution by adjusting the flow path in the current grid cell according to the exit point in the previous grid cell (Zhou et al., 2011).
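As an illustration of the first construction step, the following minimal sketch triangulates the *xy* coordinates with SciPy's `scipy.spatial.Delaunay` and derives the gradient of each facet from the plane through its three corners. The function name `facet_gradients` and the synthetic inclined plane are assumptions made for this example and are not part of the implementation of this study.

```python
import numpy as np
from scipy.spatial import Delaunay


def facet_gradients(xyz):
    """Triangulate the xy coordinates and return (simplices, gradients).

    xyz: (n, 3) array of x, y, z measurements.
    gradients[k] = (dz/dx, dz/dy) of the plane through facet k.
    """
    xyz = np.asarray(xyz, dtype=float)
    tri = Delaunay(xyz[:, :2])  # 2-D Delaunay triangulation of the xy coordinates
    p0, p1, p2 = (xyz[tri.simplices[:, i]] for i in range(3))
    # Normal vector of each facet plane from two of its edge vectors.
    normal = np.cross(p1 - p0, p2 - p0)
    # A plane z = a*x + b*y + c has a normal parallel to (a, b, -1),
    # hence a = -nx / nz and b = -ny / nz.
    grad = -normal[:, :2] / normal[:, 2:3]
    return tri.simplices, grad


# Usage: 12 random measurements on the inclined plane z = 2x + 3y.
rng = np.random.default_rng(0)
xy = rng.random((12, 2))
z = 2.0 * xy[:, 0] + 3.0 * xy[:, 1]
simplices, grad = facet_gradients(np.column_stack([xy, z]))
```

Water flows in the direction of the negative gradient, so on this inclined plane every facet would share the same downhill direction.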

We illustrate the concept of FFN using 12 elevation measurements on an inclined plane obtained by two different sampling schemes: (i) regular grid (Figure 1a) and (ii) irregular random sampling (Figure 1b). The gradient-determined flow pattern within a facet can be only one of three types, all of which manifest in the two conceptual samples shown in Figure 1. The patterns are (i) sequential (Figure 1c), the flow enters from one side and exits through another; (ii) merging (Figure 1d), the flow enters from two sides and exits through one; and (iii) branching (Figure 1e), the flow enters from one side and exits through the remaining two sides.

These flow patterns arise because at each nonflat facet, the flow either enters the facet through one of its sides and exits at one or both of the remaining sides, or enters on two sides and exits on the remaining one. We note that perfectly flat facets occur only because of the finite elevation measurement precision. In that case, one can lift one corner of such a facet, while remaining within the bounds of elevation uncertainty. We detect flat facets while computing the gradient for each facet. In the FFN, each exit side is seen as an outgoing link, and in order to construct the entire FFN, it suffices to focus only on these outgoing links. From the perspective of outgoing links, the sequential and merging flow patterns are equivalent as each of them has only one outgoing link. The branching flow pattern can in turn be seen as two parallel sequential flow patterns by dissecting the facet along the FFN flow line that passes through the lowest corner of the facet (Figure 2). The dissection is unique, and it conserves both area and flow; that is, the area of the original facet is equal to the sum of both subfacet areas, and the flow routed through the original facet is equal to the sum of both flows that get routed through the two subfacets. The conservation of flow holds because we assume homogeneous flow and the ridge length *l* is conserved (*l*_{orig}=*l*_{left}+*l*_{right}; see Figure 2). Although we do not dissect branching facets in practice, since each facet is seen as one node in the FFN, this perspective highlights how we accumulate the flow along branching facets.
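The distinction between facets with one and with two outgoing links can be sketched as a sign test between the downhill direction and the outward normals of the three facet sides. This is a minimal illustration under our own conventions: the side indexing (side *k* connects corner *k* and corner *k*+1) and the function names are hypothetical.

```python
def exit_sides(tri_xy, downhill):
    """Return the indices of the sides through which flow leaves a facet.

    tri_xy: three (x, y) corners; side k connects corner k and corner (k+1) % 3.
    downhill: (dx, dy) flow direction, i.e., the negative facet gradient.
    """
    cx = sum(p[0] for p in tri_xy) / 3.0
    cy = sum(p[1] for p in tri_xy) / 3.0
    exits = []
    for k in range(3):
        (x0, y0), (x1, y1) = tri_xy[k], tri_xy[(k + 1) % 3]
        # A normal of the side; flip it if it points toward the centroid.
        nx, ny = y1 - y0, -(x1 - x0)
        if nx * (cx - x0) + ny * (cy - y0) > 0:
            nx, ny = -nx, -ny
        # Flow leaves through a side if it has a component along the outward normal.
        if nx * downhill[0] + ny * downhill[1] > 0:
            exits.append(k)
    return exits


def flow_pattern(tri_xy, downhill):
    """Classify a facet by its number of exit sides."""
    n = len(exit_sides(tri_xy, downhill))
    if n == 0:
        return "flat"
    return "branching" if n == 2 else "sequential or merging"


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
one_exit = flow_pattern(triangle, (0.0, -1.0))
two_exits = flow_pattern(triangle, (-1.0, -1.0))
```

Sequential and merging facets are deliberately not separated here because, as noted above, they are equivalent from the perspective of outgoing links.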

### 2.2 Sinks and Connecting Facets via Tunnels

We define a *sink* as a depression in the landscape where water accumulates and then either internally drains via subsurface flow, evaporates, or overflows. Sinks are a common feature of elevation data sets at various spatial scales (O'Callaghan & Mark, 1984; Roering et al., 2013). Many sinks that are found in elevation data sets, such as tiny depressions that may cause puddles, are in fact an integral part of the geomorphic and hydrologic signature of the landscape. That being said, one of the primary goals of a flow model is to route water through all parts of the landscape, including sinks, to one or more outlets.

Conventionally, in gridded data sets, the issue of sinks is resolved either by filling up the depression or by carving a channel through the surrounding landscape, resulting in a modified, “hydrologically corrected” landscape (O'Callaghan & Mark, 1984; Planchon & Darboux, 2002; Rieger, 1993; Zhang et al., 2017). Other approaches avoid this modification by directing flow uphill out of a sink toward the outlet of that sink (Du et al., 2017; Wang et al., 2009). Here, we propose an approach that modifies the FFN without a modification of the underlying TIN. Instead of changing the elevation of the landscape, we introduce *tunnels*—new links that tunnel through sink-forming barriers (Figure 3). In FFNs, sinks cause *link cycles* and to introduce tunnels, we detect facets that are part of a cycle, remove the links that cause the cycle (cyan links in Figure 3), and place new links that connect the sink with lower facets nearby (dashed magenta links in Figure 3). In other words, for each cycle-causing link, starting at the facet where the link originates, we perform a breadth-first search for the next closest facet, which has its highest point below the level of the bottom of the sink. This results in two tunnels for each facet forming the sink, which may or may not connect to the same facet. This is clear from Figure 3b, where the facet on the far side has tunnels that end up in different locations, whereas for all the other facets forming the sink, both tunnels from each of them link to the same facet. Similar to filling or carving algorithms, a threshold of maximum distance or elevation drop can be included, in order to exclude nonphysical tunneling.
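The search for a tunnel target can be sketched as a breadth-first search on the facet adjacency graph. In this minimal illustration, closeness is measured in numbers of facet hops, which is a simplifying assumption; the names `find_tunnel_target`, `neighbors`, `z_max`, and `max_steps` are hypothetical and not taken from our implementation.

```python
from collections import deque


def find_tunnel_target(start, neighbors, z_max, sink_level, max_steps=None):
    """Breadth-first search for the closest facet lying entirely below sink_level.

    neighbors: dict facet -> iterable of adjacent facets (shared TIN sides).
    z_max: dict facet -> elevation of the highest corner of the facet.
    max_steps: optional cap on the search depth (hypothetical threshold).
    Returns the target facet, or None if no facet qualifies.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        facet, steps = queue.popleft()
        if facet != start and z_max[facet] < sink_level:
            return facet
        if max_steps is not None and steps >= max_steps:
            continue  # do not search beyond the allowed tunnel length
        for nb in neighbors[facet]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, steps + 1))
    return None


# Made-up chain of four facets; facet 0 sits at the bottom of a sink at level 5.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
z_max = {0: 5.0, 1: 6.0, 2: 7.0, 3: 4.0}
target = find_tunnel_target(0, neighbors, z_max, sink_level=5.0)
```

The optional depth cap plays the role of the maximum-distance threshold mentioned above; a cap on the elevation drop could be added in the same way.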

We acknowledge that a small and shallow depression in the landscape might rather overflow, than drain via subsurface flow, but in such cases tunnels will be short and shallow with differences between the two cases being negligible in terms of drainage area. We note that subsurface flow is typically orders of magnitude slower than overland flow, but since we focus on the steady state of flow, flow speeds are irrelevant. Fundamentally, the question whether a sink is rather filled or drained via subsurface flow is answered by a full hydrological model, which is beyond the scope of this study. For our application to airborne lidar data most sinks are very shallow and sampling dependent; that is, we conclude that all areas drain and that sinks are spurious (see supporting information Figures S7 and S8). Despite the use of tunnels in our approach, we aim at creating a hydrologically corrected flow on a TIN that can be used to study upslope areas, river profiles, and similar geomorphologic metrics.

Additionally, we note that in the FFN, cycles of length two occur that are not due to physical sinks but simply because two neighboring facets point toward each other like the pages of a half-open book (cf. only two facets connected by a cyan link as in Figure 3). These cycles are dealt with in the same way as sinks, except that we now search for the nearest facet lower than the lowest point of their common side. This has to be a facet whose highest corner is the lowest point of that common side. These new links represent the flow along the common side, which drains into the next lower facet.

### 2.3 Flow Accumulation and FFN-Derived SCA

In landscapes and their static consideration, flow is often expressed as flow accumulation or drainage area and not in units of cubic meters per second (Chirico et al., 2005; Gallant & Hutchinson, 2011; Jenson & Domingue, 1988; O'Callaghan & Mark, 1984). In that sense, the flow that exits a facet possibly comes from two areas: the upstream drainage area, which is the flow that is routed through the facet, and the area of the facet itself. The flow that exits a facet is therefore always larger than the flow that entered it through its sides. The flow accumulates along its path. We call these paths stream tubes, which are bounded by two flow lines. In general a stream tube changes its shape from facet to facet, which creates variations in the flow density along the stream tube and also within facets (Figure 4). This is relevant for SCA estimation since our flow partitioning assumes a homogeneous flow density along the contours of facets. The error introduced by this assumption occurs only at branching flow facets (Figure 2) but can propagate downstream (Figure 5). At branching facets the error is proportional to the TDA value, but further downstream the error reduces its relative importance because of additional contributions to the local TDA. Furthermore, since TDA is preserved and flow partitioning errors are both positive and negative, they will average out in converging landscapes and on larger scales. However, we validate our FFN-derived SCA on diverging terrain where the cancelation of errors is suppressed and still observe a better convergence to the analytical solution than the grid-based approach (see section 2.4 and Figure 9).

In practice, the flow accumulation on the FFN is calculated similarly to a breadth-first search along the outgoing links. Starting at facets *i* that have no incoming flow (e.g., hilltops and ridges), their outgoing links, along with the flow they transport, *f*_{ij} (m^{2}), are added to a First-In-First-Out queue. This queue is successively emptied, while the sum $F_j = \sum_{i \in I} f_{ij}$ of all incoming flows for each facet is maintained (*I* is the set of all facets *i* that have a link to the current facet *j*). Once the sum *F*_{j} (m^{2}) is complete, the TDA draining into the facet *j* is determined and can, together with the 2-D projected area of the facet *j* itself (*A*_{j}; m^{2}), be distributed among its outgoing links. Hence, the outgoing links of the *j* facets, along with their outgoing flows *f*_{jk}, are now added to the queue. In this way, the algorithm works its way downhill in layers (*i*, *j*, *k*, …) and collects all contributions from all facets (cf. Figure 5). See section 6 for a link to our implementation of this algorithm, with a pseudo-code description in Appendix A.
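The queue-based accumulation can be sketched as follows. This is a simplified variant and not our implementation: it queues facets whose incoming sum is complete rather than individual links, it assumes that cycles have already been removed, and it takes the partitioning of outflow among outgoing links as a given input (`links` with fractions summing to 1, a hypothetical data layout).

```python
from collections import deque


def accumulate_tda(area, links):
    """Accumulate total drainage area (TDA) on an acyclic facet network.

    area: dict facet -> 2-D projected facet area A_j.
    links: dict facet -> list of (downstream facet, fraction of outflow).
    Returns dict facet -> TDA_j = F_j + A_j.
    """
    inflow = {j: 0.0 for j in area}  # running sums F_j
    pending = {j: 0 for j in area}   # incoming links not yet processed
    for i in links:
        for j, _ in links[i]:
            pending[j] += 1
    # Start at facets without incoming flow (e.g., hilltops and ridges).
    queue = deque(j for j in area if pending[j] == 0)
    tda = {}
    while queue:
        j = queue.popleft()
        tda[j] = inflow[j] + area[j]
        for k, frac in links.get(j, []):
            inflow[k] += frac * tda[j]  # flow f_jk along the link j -> k
            pending[k] -= 1
            if pending[k] == 0:         # F_k is complete
                queue.append(k)
    return tda


# Made-up network: facet "a" branches into "b" and "c", which merge in "d".
area = {"a": 4.0, "b": 1.0, "c": 1.0, "d": 2.0}
links = {"a": [("b", 0.25), ("c", 0.75)], "b": [("d", 1.0)], "c": [("d", 1.0)]}
tda = accumulate_tda(area, links)
```

In this toy network the TDA at the outlet facet equals the summed area of all facets, which reflects the conservation of flow described above.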

We define the total drainage area *TDA*_{j} (m^{2}) of facet *j* as the sum of all incoming flows *F*_{j} (m^{2}) and its own contribution, the area of the facet *A*_{j} (m^{2}). Otherwise facets *i* with no incoming links would have drainage zero. Since facets are planar, contour lines on these are straight and perpendicular to the gradient. Hence, the contour length *d*_{j} (m) corresponding to the TDA of a facet is the width of that facet projected onto the axis perpendicular to the gradient of the facet. Thus, the FFN-derived *SCA* (m) is defined as

$$SCA_j = \frac{TDA_j}{d_j} = \frac{F_j + A_j}{d_j}. \tag{1}$$
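The contour width of a facet and the resulting SCA, taken as the facet's TDA (*F*_{j} + *A*_{j}) divided by *d*_{j}, can be sketched for a single facet as follows. The function names `contour_width` and `facet_sca` are hypothetical, and the example values are made up.

```python
import math


def contour_width(tri_xy, gradient):
    """Width of a facet projected onto the axis perpendicular to its gradient."""
    gx, gy = gradient
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        raise ValueError("flat facet: gradient direction undefined")
    tx, ty = -gy / norm, gx / norm  # unit vector along the contour lines
    s = [x * tx + y * ty for x, y in tri_xy]
    return max(s) - min(s)


def facet_sca(inflow, facet_area, tri_xy, gradient):
    """SCA_j = (F_j + A_j) / d_j for a single facet."""
    return (inflow + facet_area) / contour_width(tri_xy, gradient)


# Made-up facet with area 1 whose gradient points in the y direction.
corners = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
d = contour_width(corners, (0.0, 1.0))
sca = facet_sca(3.0, 1.0, corners, (0.0, 1.0))
```

Because the contour width follows directly from the facet geometry, no heuristic effective contour width is required.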

### 2.4 Validation of FFN-Derived SCA

In order to quantitatively evaluate our flow accumulation scheme, and to compare it to existing measurements from regularly spaced data, we consider a Gaussian hill (Figure 6) as a synthetic landscape. The Gaussian hill serves as a first approximation of the low SCA, hilltop-like part of real-world landscapes. We expect point-cloud approaches such as our FFN approach to be especially useful in divergent parts of the landscape because the increased point density of point clouds extends studies to smaller SCAs. Note that by validating SCA, we validate TDA and the corresponding contour widths, which are both independently estimated by our FFN approach. Nevertheless, we are aware of the usefulness of TDA despite its resolution dependence.

The analytical solution *SCA*^{0} of a Gaussian hill is one-dimensional in the radius *r* if polar coordinates (*r*, *ϕ*) are used. As for the cone in Shelef and Hilley (2013), or any other hill-like surface with polar symmetry, the TDA that drains through the circle of radius *r* centered at the hilltop is given by *TDA*(*r*) = *πr*^{2}. Together with the circumference of that circle, 2*πr*, which is the corresponding contour width, the *SCA*^{0}(*r*) at a distance *r* from the center of such hills is given by

$$SCA^{0}(r) = \frac{\pi r^{2}}{2 \pi r} = \frac{r}{2}. \tag{2}$$

Note that equation 1 is for facets, whereas equation 2 is for points on the surface. For a comparison of the two, we interpret the *SCA*_{j} of a facet *j* as an estimate of SCA at the centroid of the facet. In this way, we can numerically compare the *SCA*_{j} obtained from a facet *j* to the corresponding *SCA*^{0}(*r*_{j}), where *r*_{j} is the radial distance of the centroid of facet *j*.

The FFN-derived SCAs are defined for the centroids of the corresponding facets and are therefore irregularly sampled in space (Figure 7, dark blue box, top layer). This is in contrast to established SCA estimates such as *D*_{∞} *SCA* or MFD SCA, which are only defined for gridded DEMs. To compare our SCA estimates to the existing approaches, we aggregate the FFN-derived SCAs to a grid (Figure 7, light blue box, bottom layer) or we apply the proposed FFN accumulation scheme to the TIN obtained from a gridded DEM (Figure 7, green box, bottom layer). These are then compared to the SCAs obtained by MFD (Figure 7, red box, bottom layer) and to the SCAs obtained by *D*_{∞} (Figure 7, orange box, bottom layer). The *D*_{∞} SCA estimates are shown exclusively in the supporting information, while here we focus on the MFD results because it performs better than *D*_{∞} on diverging surfaces (Erskine et al., 2006; Zhou & Liu, 2002).

We quantify deviations from the analytical solution as relative rather than absolute deviations, $\delta SCA_j = (SCA_j - SCA^{0}(r_j)) / SCA^{0}(r_j)$. This has the advantage that small deviations in regions of low SCA are more visible than similar deviations in regions of high SCA. Additionally, since SCA varies over many orders of magnitude, and because deviations generally increase with an increase in SCA, the root-mean-square error would be dominated by high errors for high SCA estimates (see supporting information Figure S2). Moreover, we do not aggregate deviations into a single value, but rather study the distribution of *δSCA* estimates.
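A minimal sketch of this comparison is given below. It assumes that the relative deviation is the difference between the facet estimate and the analytical value *r*/2, normalized by the analytical value, and that the hilltop lies at the origin. The facet values are made up for illustration and are not results of this study.

```python
import math
import statistics


def analytical_sca(r):
    """SCA^0(r) = TDA(r) / contour width = (pi r^2) / (2 pi r) = r / 2."""
    return r / 2.0


def relative_deviation(sca_facet, centroid_xy):
    """(SCA_j - SCA^0(r_j)) / SCA^0(r_j), with the hilltop assumed at the origin."""
    r = math.hypot(*centroid_xy)
    ref = analytical_sca(r)
    return (sca_facet - ref) / ref


# Made-up pairs of (facet SCA estimate, facet centroid).
facets = [(0.55, (1.0, 0.0)), (0.9, (0.0, 2.0)), (0.26, (0.3, 0.4)), (0.5, (0.6, 0.8))]
devs = [relative_deviation(s, c) for s, c in facets]
# Quartiles of the distribution instead of a single aggregated error value.
lower_q, median, upper_q = statistics.quantiles(devs, n=4)
```

Positive values correspond to overestimations and negative values to underestimations of the analytical SCA.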

In particular, we investigate how *δSCA* varies over different point densities *ρ* (number of points per unit area; see section 3.1) and how these are distributed spatially (see section 3.2). For a given *ρ*, we sample the Gaussian hill on a square interval of side length $\sqrt{10}$ such that its *xy* area is 10 square units. In this sampling scheme, a sample with *ρ*=10 would contain 100 uniformly random distributed elevation measurements from the Gaussian surface (see Figure 6). Next, for each point density, we sample the hill 1,000 times and therefore obtain a distribution of 1,000×*ρ*×10 relative deviations *δSCA*. We then compare the distribution of *δSCA* values obtained from the point clouds to three different gridded SCA estimation approaches: two obtained from applying the MFD routine and the FFN routine on the same gridded DEM, and the third obtained by gridding point-cloud FFN-derived SCAs to the same grid as the DEM. For the gridded approaches, we compute the DEMs by linear interpolation of the uniform randomly sampled elevation measurements to a grid with the same number of pixels as elevation measurements. Hence, given *ρ*, we determine the grid cell width *w* by $w = 1/\sqrt{\rho}$. This procedure of interpolating point measurements to a grid cell elevation is in contrast to directly measuring the surface elevation at the grid cell centers. However, we consider this to be a more realistic representation of how real-world DEMs are constructed because measurements are not performed at grid cell centers. We perform linear interpolation because it is common and most comparable to TINs.

Complementing the above analysis of *δSCA* distributions and their dependence on *ρ*, we also analyze the spatial pattern of *δSCA*, which shows how deviations from the analytical solution are spatially structured. For this, we choose a specific *ρ*=168.1 points per unit area, which results in a point cloud of 1,681 uniform-randomly distributed surface measurements from the Gaussian hill. Correspondingly, we compute a 41×41 pixel DEM by linear interpolation with a grid cell width *w* of *w*≈0.077. We compare the following four scenarios: (i) *δSCA* from the FFN of the gridded DEM, (ii) *δSCA* from the FFN of the point-cloud data, (iii) *δSCA* by MFD of the gridded DEM as in (i), and (iv) *δSCA* as in (ii), but aggregated to the same grid as the DEM.
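The numbers of this setup follow from the point density and the sampled area, as the following minimal sketch shows. The grid cell width is chosen such that the number of pixels equals the number of measurements. The sampling domain is assumed to be centered on the hilltop, and the hill shape in `sample_hill` is a placeholder (a Gaussian of unit height and unit width), since neither is specified here; both function names are hypothetical.

```python
import math
import random


def sampling_setup(rho, area=10.0):
    """Number of points, grid size per side, and cell width for density rho."""
    n_points = round(rho * area)
    side = math.sqrt(area)
    w = 1.0 / math.sqrt(rho)  # cell width such that pixels == measurements
    n_side = round(side / w)
    return n_points, n_side, w


def sample_hill(rho, area=10.0, seed=0):
    """Uniform random (x, y, z) samples on a square centered at the hilltop."""
    rng = random.Random(seed)
    half = math.sqrt(area) / 2.0
    n_points, _, _ = sampling_setup(rho, area)
    points = []
    for _ in range(n_points):
        x, y = rng.uniform(-half, half), rng.uniform(-half, half)
        z = math.exp(-(x * x + y * y) / 2.0)  # placeholder hill shape (assumption)
        points.append((x, y, z))
    return points


n_points, n_side, w = sampling_setup(168.1)
cloud = sample_hill(10.0)
```

For *ρ*=168.1 this reproduces the 1,681 measurements, the 41×41 pixel grid, and the cell width of about 0.077 stated above.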

### 2.5 Application to Lidar Point-Cloud Data

Apart from synthetic surfaces, we also apply the FFN approach to airborne lidar data from the Santa Cruz Island in southern California. This tectonically active, very steep and densely vegetated terrain provides an ideal location to test the algorithm for an area where lidar point clouds represent an imperfect characterization of the bare-earth surface (Baguskas et al., 2014; Neely et al., 2017; Perroy et al., 2010, 2012). In particular, we choose the Pozo catchment in the southwestern part of the island (see Figure 8) covering different lithologies and vegetation covers. In this area, the deeply incised gullies and chaparral-like vegetation cover do not always allow for the lidar pulses to reach the surface (Perroy et al., 2010). We specifically selected this terrain for the difficulties in lidar point-cloud ground classification, because FFNs can also be constructed from unclassified or imperfectly classified point clouds.

Data were collected using a Riegl LMS-Q560 laser scanner flown on a helicopter, retrieving on average 9 points per square meter and leading to a point-cloud data set containing ≈7·10^{7} points for this catchment (NSF OpenTopography Facility, 2012). To ensure accurate triangulation of the data, we apply an initial thinning step that removes the higher point out of a pair of points if they occur within an *xy* distance of less than 5 cm of each other. The *xy* distances are computed using the cKDTree implementation of a quick nearest neighbor lookup distributed in the spatial module of the scientific computing package SciPy. This thinning results in a point cloud of ≈6·10^{7} points with unique *xy* coordinates. Additionally, we use the ground-only classified points in order to have more comparable results with existing grid-based techniques. The number of ground points for the Pozo catchment is ≈24·10^{6}.
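The thinning step can be sketched with `scipy.spatial.cKDTree.query_pairs`. This is a simplified single-pass illustration and not our implementation: `query_pairs` returns pairs at a distance of at most the given radius, and the plain Python loop would be slow for point clouds of the size used here. The function name `thin_points` and the test points are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def thin_points(xyz, min_dist=0.05):
    """Drop the higher point of every pair closer than min_dist in xy."""
    xyz = np.asarray(xyz, dtype=float)
    tree = cKDTree(xyz[:, :2])  # nearest neighbor lookup on xy only
    keep = np.ones(len(xyz), dtype=bool)
    # Visit close pairs; skip pairs where one member was already removed.
    for i, j in sorted(tree.query_pairs(min_dist)):
        if keep[i] and keep[j]:
            higher = i if xyz[i, 2] > xyz[j, 2] else j
            keep[higher] = False
    return xyz[keep]


# Made-up points (in meters): the first two are 1 cm apart in xy.
points = [(0.0, 0.0, 1.0), (0.01, 0.0, 2.0), (1.0, 1.0, 5.0)]
thinned = thin_points(points)
```

Keeping the lower point of each pair favors returns that are closer to the ground surface.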

We draw bootstrapping samples of *k* measurements out of *N*; that is, we cross validate SCA in terms of elevation measurements. Theoretically, the number of possible bootstrapping samples is finite. However, given an initial number of *N* measurements and a sample size of *k*, the number of combinations is given by the binomial coefficient $\binom{N}{k}$ and is therefore in practical terms limitless. For instance, in our application to lidar point-cloud data of a single catchment on the Santa Cruz Island in California, we bootstrap our samples from an initial number of *N*≈6·10^{7} (full point cloud) or *N*≈24·10^{6} (ground-classified points) elevation measurements. In a bootstrapping sample, each measurement *j* is chosen with a selection probability $p_j \propto 1/\rho(x_j, y_j)$, where *ρ*(*x*_{j}, *y*_{j}) is the spatial point density at (*x*_{j}, *y*_{j}). Thus, measurements in regions of low density *ρ* are very likely to be chosen, whereas points in densely sampled regions (e.g., vegetation and overlapping flight lines) are not. This leads to a homogeneous sampling if not too many measurements per bootstrapping sample are selected. For a homogeneous sampling, the sample size *k* is limited by an upper bound that is determined by the lowest spatial densities in the region of interest. However, for our application to lidar data, we choose *k*=*N*/2 as a trade-off between homogeneity and data density. Additionally, we take the Voronoi cell area for each lidar data point as a good approximation for the inverse of the point density. Hence, given the 2-D Voronoi tessellation of the region of interest into Voronoi cell areas *a*_{j} corresponding to each point measurement at (*x*_{j}, *y*_{j}), we approximate the selection probability as

$$p_j \approx \frac{a_j}{\sum_{i=1}^{N} a_i}.$$

According to this probability, we select 100 bootstrapping samples with *N*/2 lidar measurements each, from which we obtain an ensemble of 100 FFNs. As a result, almost all elevation measurements are incorporated at least once.
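The density-weighted selection can be sketched as follows, taking the Voronoi cell areas as given. This is an illustration under stated assumptions: the source does not say whether points are drawn with or without replacement, so the sketch draws without replacement (which keeps *xy* coordinates unique for triangulation) using Efraimidis-Spirakis random keys; the function names are hypothetical, and unbounded Voronoi cells at the domain boundary would need a finite area assigned beforehand.

```python
import random

def density_weighted_sample(areas, k, seed=None):
    """Draw k distinct indices with selection weights proportional to areas.

    areas: positive Voronoi cell areas a_j, used as a proxy for 1/density.
    Uses Efraimidis-Spirakis keys u**(1/w); drawing without replacement
    is an assumption of this sketch, not a statement of the source.
    """
    if not 0 <= k <= len(areas):
        raise ValueError("k must be between 0 and the number of points")
    rng = random.Random(seed)
    # One random key per point; a larger area pushes the key toward 1.
    keys = [(rng.random() ** (1.0 / a), j) for j, a in enumerate(areas)]
    # The k largest keys form the weighted sample.
    keys.sort(reverse=True)
    return sorted(j for _, j in keys[:k])

def bootstrap_ensemble(areas, n_samples=100, seed=0):
    """Index sets for n_samples samples of size N/2 each."""
    k = len(areas) // 2
    return [density_weighted_sample(areas, k, seed + s)
            for s in range(n_samples)]
```

Each index set would then be triangulated separately to build one FFN of the ensemble. With *k*=*N*/2, points in sparse regions (large cells) are kept far more often than points in dense regions, which is the homogenizing effect described above.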

For comparisons to gridded approaches, we generate gridded DEMs from the lidar point-cloud samples by linear interpolation, as with the synthetic examples; that is, the number of valid elevation pixels is given by *N*/2. In section 3.3, we compare our FFN approach for lidar data to grid-based approaches in three ways: (i) we compare the distributions of SCA for ground-classified points; (ii) we analyze spatial flow patterns and the influence of vegetation and microtopography on such patterns, and compare our FFN approach to MFD in that respect for ground-classified points and the full point cloud; and (iii) since the channel is in parts highly vegetated, we highlight the efficiency of tunnels in a real-world example by tunneling the flow through vegetation and along the channel, and compare longitudinal longest flow path profiles using the unclassified full point cloud.

## 3 Results

### 3.1 Dependence of Relative Deviations on Point Densities

For the synthetic landscape of the Gaussian hill (Figure 6) and for our FFN approach on point clouds, relative deviations *δSCA* decline with increasing point density *ρ* similar to a power law ∝*ρ*^{−1/2}. We quantify this by the decline of the upper quartile of the *δSCA* distributions and a corresponding increase of the lower quartile, both toward zero (Figure 9, dark blue solid lines). Naturally, the convergence of the same SCA only aggregated to a grid is very similar (Figure 9, light blue dashed lines). However, different results are retrieved by flow accumulation on the gridded DEMs of the Gaussian hill. For FFN SCA estimations using linear interpolation DEMs, relative deviations *δSCA* are larger for low point densities *ρ* and converge toward zero. However, the lower quartile converges similarly to the point-cloud FFN *δSCA* but is offset to larger negative deviations, whereas the upper quartile converges faster (see Figure 9, green dash-dotted line). For MFD SCA estimations using linear interpolation DEMs, relative deviations *δSCA* are larger for low point densities *ρ* and do not converge toward zero. For overestimations, here represented by both quartiles of the MFD *δSCA* distributions, the convergence slows down with an increase in *ρ* until it stagnates at *δSCA*≈5% (see Figure 9, red dotted line).

### 3.2 Spatial Patterns of Relative Deviations

Relative deviations of the point-cloud FFN SCA from the Gaussian hill do not exhibit strong spatial patterns except for a radially symmetric increase of SCA overestimation close to the center, that is, with increasing elevation (Figure 10b). This also holds for the gridded version of the same SCA, except that some of the smaller deviations from the analytical solution average out with the aggregation to the grid, leading to a larger fraction of pixels being within the (−5%, 5%) interval (Figure 10d). This is in contrast to FFN and MFD flow accumulation on the gridded DEM (Figures 10a and 10c). As expected from the distributions of *δSCA*, both show larger deviations from the analytical solution than estimations directly from point clouds, and the largest deviations occur along the cardinal and diagonal directions, especially for FFN on the gridded DEM. Furthermore, MFD leads more often to overestimations in SCA (Figure 10c), whereas the FFN approach on the gridded DEM results in more underestimations (Figure 10a).

### 3.3 Pozo Catchment Lidar Point Cloud

The SCA probability density functions (PDFs) estimated by the four different approaches are fairly similar within their ranges of SCA values (Figure 11a). All PDFs have a similarly modulated power law-like tail, which is slightly offset according to the different normalizations due to different SCA value ranges, that is, different supports. The differences between the PDFs are highlighted by quantile-quantile plots, which map the quantiles of our proposed point-cloud FFN SCA measure to the corresponding quantiles of the three compared approaches (Figure 11b). Since the point-cloud FFN SCA has the largest value range, ranges of very low (high) quantiles of the point-cloud FFN SCA correspond to the minimum (maximum) of the other SCAs. Within the possible range of SCA values, quantile-quantile plots have a finite positive slope. If two PDFs are identical up to a normalization constant, their quantile-quantile function will have a slope of one. None of the compared PDFs are identical in that sense and consequently their quantile-quantile function slopes vary around one. This variation is most pronounced for MFD SCA, less for FFN on grid SCA and gridded FFN SCA. Note that the variation of slopes is very similar between these three quantile-quantile functions for intermediate SCA values (from 10^{1} to 10^{5}), especially between the three FFN approaches.

The spatial SCA patterns obtained from the MFD routine on the 60-cm DEM ensemble (Figure 12b) contain no SCA values below 0.36 due to the limit in spatial resolution. Our point-cloud estimates of TDA and SCA, however, although gridded to the same resolution, are computed directly from irregular data and resolve smaller drainage areas (Figures 12a and 12c). The overall flow pattern, in terms of the shapes of the rivers and the magnitudes of the SCA values, is similar for both *SCA*_{pcl} and *SCA*_{mfd}. However, relative deviations between the two reveal pronounced differences in channels and at hilltops (Figure 12d). Despite many positive deviations (red), most deviations are negative (blue); that is, *SCA*_{pcl} is often lower than *SCA*_{mfd}. This is especially visible at broad hilltops and at channel boundaries (cf. Figure 13). Possible influences of vegetation are best seen in combination with the point-cloud intensity, which is low for vegetated points (Figure 14). Focusing on two arbitrary starting points of flow lines, we compute for each FFN of the bootstrap ensemble one deterministic FFN flow line for each starting point. This provides us with an ensemble of FFN flow lines for each starting point that captures the flow line uncertainty given the point-cloud elevation data. With individual flow lines depicted, a tunnel is exemplified by a long straight part in the flow line. These occur mainly within vegetated regions, especially if the full unclassified point cloud is studied (Figure 15). However, FFN flow lines rather meander around some vegetated parts or get dispersed (Figure 14b).

The longitudinal flow path profiles for the FFN SCA of the unclassified full point cloud and the MFD SCA of the longest flow path in the catchment are estimated in terms of the path distance from the hilltop (Figure 16). The two profiles are quite similar for low path distances but are visibly offset for the rather steep descent around 750 to 1,000 m. In the detailed views of the profiles from the hilltop (Figure 16a) and the vegetated channel (Figure 16b), the interquartile range (gray shaded regions) of the surrounding elevations, that is, <1 m from the channel, is better resolved. For vegetated regions, this distribution is shifted toward higher elevations and the flow is routed via tunnels (a few representative examples of which are shown using magenta lines). Particularly for MFD, we observe flat sections that correspond to regions that have undergone sink filling. The point-cloud FFN SCA increases on average monotonically as elevation decreases, but shows fluctuations on smaller length scales due to branching and merging of the channels.

## 4 Discussion

### 4.1 Dependence of Relative Deviations on Point Densities

Our results suggest that relative deviations *δSCA* for flow accumulation on DEMs from linearly interpolated point clouds are less centered around optimal deviations (*δSCA*=0) than for flow accumulation on the point clouds assuming the same linear model of a TIN. Although FFN on grid *δSCA* distributions converge similarly to point-cloud FFN *δSCA* distributions, they are biased by underestimations due to grid effects. MFD *δSCA* distributions, on the other hand, have higher quartiles due to SCA overestimations. Regarding MFD convergence, we note that for point densities between 10^{1} and 10^{2} points or pixels per unit area, the convergence is comparable to that of the FFN; for *ρ* between 10^{2} and 10^{4} the convergence slows down; and from 10^{4} points per unit area on, an increase in spatial resolution has no effect on MFD SCA accuracy; that is, it stays at a fixed overestimation of ≈5%. However, for the FFN flow accumulation on the point clouds, relative deviations steadily decline by one order of magnitude for an increase in point density by roughly 2 orders of magnitude. For an analysis regarding the scaling in computational costs we refer to the supporting information (see Figures S5 and S6).
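As a quick check of the scaling quoted for the point-cloud FFN, and assuming an exact power law with exponent −1/2 (the observed behavior is only approximately of this form), a hundredfold increase in point density reduces the relative deviation tenfold:

$$\delta SCA \propto \rho^{-1/2} \quad\Longrightarrow\quad \frac{\delta SCA(100\,\rho)}{\delta SCA(\rho)} = 100^{-1/2} = \frac{1}{10}.$$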

### 4.2 Spatial Patterns of Relative Deviations

The analysis of spatial patterns of *δSCA* illustrates the advantage of deriving SCA from irregular point clouds instead of first gridding the data and then deriving flow accumulations. The gain in SCA precision is not due to more elevation measurements in the point cloud than pixels in the DEM, since both are kept equal, but due to a lack of grid effects. For the example of a 41×41 grid and *ρ*=168.1 points per unit area, we have on average one surface measurement per grid cell (Figure 10). Apart from the variability in SCA accuracy due to random sampling, SCA tends to be overestimated close to the hilltop for all four scenarios. However, since we study relative deviations, errors close to the hilltop are emphasized (i.e., *δSCA*→*∞* as *r*→0).

Another spatial pattern is aligned with the orientation of the grid and best visible in the first scenario (i) of FFN SCA estimation from the gridded DEM (Figure 10a). For this case, SCA is highly overestimated along the cardinal directions of the grid and rather underestimated otherwise. Such spatial patterns in relative deviations are also caused by grid effects. For MFD SCA, grid effects are more difficult to see at this coarse resolution, but these become more pronounced at higher resolutions and grids with perfect elevation data or other flow accumulation algorithms such as *D*_{∞} (see supporting information Figures S3 and S4).

### 4.3 Implications of Grid Effects and SCA

DEM metrics such as TDA, SCA, slopes, and curvatures are typically calculated using the gridded product, rather than using the TIN of measured data directly. Here we put forward the idea of taking advantage of the usually much higher resolution of the raw measurements instead of aggregating measurements to a grid. For our examples we choose grids with on average one measurement per grid cell; however, that is rarely the case in practice. The motivation to take point-cloud data directly is twofold: First, the above analysis illustrated the benefits in accuracy of higher measurement densities (cf. Figure 9). Second, the regularity of grids can cause spatial effects that propagate on scales beyond the spatial scale of grids (cf. Figure 10). Any neighborhood-based flow accumulation will suffer from the fact that locally only a finite set of flow directions occur. Extreme cases are the *D*4 and *D*8 flow directions, but even *D*_{∞} and MFD allow only the four cardinal directions plus the four *π*/4-shifted directions (Freeman, 1991; Tarboton, 1997). Because these routines are designed for square grids only, the eight local flow directions are also globally the only ones occurring. Biases due to imperfect flow partitioning can therefore aggregate along these directions together with the flow aggregation. On cones, or other highly symmetric surfaces, these biases create especially obvious and well-known artifacts in the flow, as previously described by Shelef and Hilley (2013), Qin et al. (2007), Zhou and Liu (2002), and Freeman (1991). In order to minimize such grid effects, the MFD algorithm includes an additional parameter that is adjusted to the analytical solution of the SCA of a cone (Freeman, 1991). However, although this leads to reduced grid effects when compared to *D*_{∞}, they are still apparent (cf. Figures S4 and S5). In contrast, on irregularly sampled data a finite-neighborhood flow accumulation scheme does not lead to a finite set of total flow directions. This is an advantage of our FFN approach, as it is explicitly designed for irregularly sampled data such as point clouds. Sometimes an aggregation of measurements to a grid might seem inevitable because of noisy data or the necessity of a gridded flow product for further analysis, but in order to avoid grid effects it is advisable to aggregate SCA results instead of elevation measurements.

### 4.4 Pozo Catchment Lidar Point Cloud

Due to the high spatial resolution of the airborne lidar point cloud, and especially due to the random homogeneous density sampling, we have a larger ensemble of SCA estimates for the point-cloud data as well as for the grid-based approaches.

The ensembles of SCA values allow a smooth and reliable histogram estimation, even for very small drainage areas (Figure 11a, solid dark blue line). If this ensemble of point-cloud SCA estimates is aggregated to the 60-cm grid prior to the density estimation, the correspondingly estimated PDF is flattened out; that is, grid cells with many low SCA estimates (e.g., from vegetation or boulders) and relatively fewer, but much higher SCA estimates (e.g., from channels or river basins) will aggregate to grid cells of high SCA. Although this is intended, because grid cells should represent the channel SCA if they include a channel, it leads to relatively fewer low SCA values and more high SCA values (Figure 11a, dashed light blue line). Essentially, this is because the grid resolution of 60 cm is too coarse. The same holds for MFD and FFN on the 60-cm DEM ensembles, but for a different reason. For flow accumulation on grids, channels have a discretized width of multiples of the grid resolution. This leads to an overrepresentation of channels in the PDF of SCA estimates (see also Figure S6 for results corresponding to *D*_{∞}), less so for the FFN on the gridded DEM, because the grid is triangulated and the flow is discretized by triangles with an area of half a grid cell. This is also the reason why FFN flow accumulation on the 60-cm DEM resolves *SCAs* below 0.36 cm (cf. Figure 11a, dash-dotted green line). The smallest TDA in this case is 0.18 m^{2}, that is, half the area of a 60-cm grid cell, and the smallest SCA is correspondingly smaller.

The quantile-quantile maps for the SCA PDFs highlight the differences between our point-cloud SCA estimate and the other grid-based approaches in terms of quantile-position thresholds (Figure 11b). Since all PDF estimates have a different support, the images of the quantile-quantile maps vary between densities. In that sense, absolute differences between probabilities are meaningless because each PDF is normalized on its own support. The more parallel a curve is to the reference line of the point-cloud SCA PDF (Figure 11b, solid dark blue line), the more similarly their quantiles vary with SCA. For instance, on the interval 10^{−2} ≤ *SCA*<10^{6} the PDF for the gridded SCA evolves more parallel to the PDF for the point-cloud SCA than the PDFs for the flow accumulation on the gridded DEMs. Overall, the FFN-based PDFs are more parallel to the reference PDF than the MFD PDF.

In an analysis of spatial SCA patterns, we confirm the aforementioned differences in SCA for the different approaches and the dependence of these differences on SCA magnitude (Figure 12). In contrast to low SCA values from the point-cloud FFN, low MFD SCA values are bounded by the resolution of the DEM and cannot resolve areas smaller than 0.36. This effect is further emphasized by the tendency of MFD to overestimate SCA values (Figure 9). For the comparison by relative deviations (Figures 12d and 13b), this results in negative deviations (blue) for regions where *SCA*_{mfd} is limited by the resolution of the DEM. Interestingly, our *SCA*_{pcl} reveals channelization closer to the drainage divide than *SCA*_{mfd}, which leads to positive deviations (red) in the form of channels (Figure 13).

Further, it is possible to see feedbacks between vegetation and channelization using these high spatial resolution lidar data. Due to the stabilization of soil by vegetation (Ludwig et al., 2005), soil is rather eroded around it, leading to more surface runoff there than in vegetated patches. This enhances erosion around vegetation and, combined with weak lithologies, leads to early channelization close to the drainage divide (Perroy et al., 2012). Additionally, vegetation traps fluvial and aeolian sediment, which elevates vegetated patches. However, since we use only ground-classified points, vegetated patches are gaps in the point cloud that are incorporated into the FFN by gap-spanning facets. In order to see the influence of these gaps, we compare our flow line ensemble using only ground points (Figure 14) with an ensemble using all points (Figure 15). Both ensembles show a very similar SCA as well as flow line pattern, with the ground-only ensemble being less influenced by vegetation. In the flow line ensemble using all points, vegetation acts as stable patches and forces flow around vegetation rather than through it. Most flow lines go around bushes and trees, with only a few tunnels going through bigger vegetation patches. The biggest difference between the ensemble using only ground-classified points and the ensemble using all points is seen for high vegetation (dark colors in Figures 14b and 15b). For the unclassified point-cloud ensemble, flow lines rather meander around high vegetation such as trees, whereas for the ground points ensemble flow lines are less obstructed. This suggests that the erosional imprint of trees is overestimated in the unclassified point cloud. Nevertheless, both ensembles show meandering around chaparral vegetation with early channelization close to the drainage divide.

The efficiency of tunnels in our FFN approach is also demonstrated with a comparison of two longitudinal flow path profiles: one for our point-cloud FFN approach and the other from the *SCA*_{mfd} field of the approximately vegetation-free gridded DEM ensemble (Figure 16). Especially within the first kilometer from the hilltop, there are very steep and highly vegetated parts of the channel where lidar pulses do not reach the surface. As a result, some of the grid cell elevations are overestimated, and combined with the discretization of the channel into grid cells, this leads to sinks in the channel. Resolving these by a sink-filling approach leads to plateaus (i.e., flat sections) in the MFD profile, while in the FFN profile sinks are circumvented by tunneling (cf. Figures 16a and 16b). Although the tunnels are often much longer than the grid cell spacing for MFD, the resulting FFN profile appears to be smoother. In map view, this also leads to a straighter and less meandering path for the FFN profile. The MFD profile in turn is shifted to the right of the FFN as a result. Since the spatial resolution of the point-cloud samples is the same as that of the gridded DEMs, this suggests that discretization of the channel into a series of grid cells makes the river longer than it actually is (cf. Fisher et al., 2013). This highlights again the advantage of irregular point-cloud sampling. Even though the FFN longitudinal flow path profile includes many tunnels due to vegetation, it records the channel bottom at those geographic locations where the corresponding elevations were measured and not at predefined grid cell centers.

### 4.5 Outlook: Additional FFN-Based Metrics

Drainage area is only one measure of interest in flow terrain analyses, and it is often accompanied by measures such as direction of descent, slope, and curvature. Although an analysis of these is beyond the scope of this study, some of them are directly retrievable by our FFN approach. For instance, the direction of descent and slope at each facet would be given by the gradient of that facet. An alternative to the slope could be the derivative of elevation along FFN flow lines and similar to that the second derivative could serve as a curvature estimate.
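The facet-based slope and descent direction mentioned above follow from the plane through the three facet vertices. The sketch below is an illustration only; the function names are hypothetical and nothing here is taken from the FacetFlowNetwork module.

```python
from math import hypot

def facet_gradient(p0, p1, p2):
    """Gradient (dz/dx, dz/dy) of the plane through three (x, y, z) vertices."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = p0, p1, p2
    # Twice the signed xy area of the triangle; zero means a degenerate facet.
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("facet is degenerate in the xy plane")
    # Cramer's rule for the two edge equations of the plane.
    gx = ((z1 - z0) * (y2 - y0) - (z2 - z0) * (y1 - y0)) / det
    gy = ((x1 - x0) * (z2 - z0) - (x2 - x0) * (z1 - z0)) / det
    return gx, gy

def slope_and_descent(p0, p1, p2):
    """Slope magnitude and unit direction of steepest descent of a facet."""
    gx, gy = facet_gradient(p0, p1, p2)
    slope = hypot(gx, gy)
    if slope == 0:
        return 0.0, (0.0, 0.0)  # horizontal facet: no descent direction
    return slope, (-gx / slope, -gy / slope)
```

Because a TIN facet is planar, the slope is constant within each facet; a curvature estimate would need information from neighboring facets or from derivatives along flow lines, as suggested in the text.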

## 5 Conclusions

We presented a novel approach for the representation of flow on topographic surfaces and the estimation of SCA on irregularly sampled elevation measurements. The approach is based on the calculation of a TIN of measured data by Delaunay triangulation and routes the flow from each facet of the TIN according to the so-called FFN. The validity of the linear TIN model depends on the data density and surface roughness. Considering the analyses done in this study, we conclude that a TIN is a sufficient surface model for high-resolution data such as airborne lidar point clouds. Based on the FFN, we also present a fundamentally different treatment of sinks without any modification of the digital representation of the landscape, that is, without changing elevation measurements by sink filling or carving. This also allows a quantification of uncertainties in SCA and resulting flow patterns by generating a bootstrapping ensemble of elevation measurements and therefore an ensemble of FFNs. Both the treatment of sinks and the bootstrap sampling are optional steps, but they are useful in real-world applications. Results from our FFN point-cloud approach were compared to grid-based procedures for the estimation of SCA using real-world data as well as synthetic data from a Gaussian hill where the analytical solution for SCA is known. For all comparisons we use DEMs that have the same number of pixels as elevation measurements in the corresponding point clouds. Based on these comparisons, we conclude that our network-based approach generates flow patterns closer to the analytical solution on a synthetic surface than, for example, the MFD flow accumulation and offers the following advantages: (i) the association of a well-defined contour width to each link of the FFN, which grants an accurate estimation of SCA from the corresponding TDA values; (ii) the applicability to irregularly spaced data, which makes high-resolution point-cloud data directly accessible to flow path tracing without requiring the creation of DEMs; and (iii) the possibility of a dynamic data density in terms of a varying point density depending on, for instance, surface roughness or measuring technique.

Although in this study we create high-resolution DEMs with the same number of pixels as points in the point cloud, this is only for comparison and not a recommended DEM creation scheme. Typical DEMs have fewer pixels than the underlying point cloud in order to mitigate elevation uncertainties and to avoid interpolation artifacts. Compared to that, our FFN approach increases SCA accuracy through a denser spatial sampling, and it also makes it possible to study features in the landscape that are smaller than the grid cell width of such typical DEMs.

## 6 Software Availability

A Python module written in C called *FacetFlowNetwork*, which implements the FFN construction and SCA estimation, is available at https://github.com/UP-RS-ESP/FacetFlowNetwork. Additionally, an implementation of the MFD algorithm by Freeman (1991), based on the implementation of Pelletier (2012), is available at https://github.com/UP-RS-ESP/mfdrouting.

## Acknowledgments

Bedartha Goswami was funded by a scholarship from the Ministry of Science and Education of the state of Brandenburg (MWFK). We thank six anonymous reviewers for the careful reading of the manuscript and their detailed and constructive reviews.

## Appendix A: FFN Flow Accumulation Algorithm

Our flow accumulation algorithm requires the in-degree *K*_{j} of each facet *j* in the FFN in order to confirm the arrival of all inflows. We compute the in-degree by simple counting with two nested for loops (cf. Algorithm 1). This is reasonably fast because facets have either one or two outgoing links; that is, the outer loop has *N* (number of facets) iterations and the inner loop only one or two.
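The counting step can be sketched in Python as follows. The list-of-lists layout, the use of an empty list to mark an outlet, and the name `in_degree` are assumptions of this sketch and are not taken from the FacetFlowNetwork module.

```python
def in_degree(downhill):
    """Count incoming links per facet.

    downhill[i] is the list of facets that facet i drains to; in an FFN it
    holds one or two entries (an empty list marks an outlet in this sketch).
    """
    K = [0] * len(downhill)
    for targets in downhill:      # outer loop: all N facets
        for j in targets:         # inner loop: one or two outgoing links
            K[j] += 1
    return K
```

Since the inner loop runs at most twice per facet, the total work grows linearly with the number of facets.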

Similar to the array of in-degree *K*, the flow accumulation algorithm maintains an array of counters *Seen* and an array for the sums of inflows *F*. These arrays are initialized as zero vectors (Algorithm 2, L1). The sum of inflows *F*_{j} for a given facet *j* together with the area *A*_{j} of the facet is defined as its *TDA*_{j} (cf. equation 1).

The breadth-first walk starts at facets *i* that have no in-degree (*K*_{i}==0, Algorithm 2, L2) and initializes a queue with their downhill neighbors *j* along with the flow *f*_{ij} from *i* to *j*. This queue is emptied in a while loop, while the counters *Seen* and the sums of inflows *F* are updated (Algorithm 2, L5 and L6). If the counter *Seen*_{j} for a given facet *j* is equal to its in-degree (*Seen*_{j}==*K*_{j}, Algorithm 2, L7), all inflows are accumulated and are passed on to the next layer of *k* facets.

If the facet *j* is a sequential or merging facet, *a*_{jk} and *w*_{jk} are given by *a*_{jk}=*A*_{j} and *w*_{jk}=1, because there is only one downhill neighbor *k*. However, if the facet *j* is a branching facet the areas *a*_{jk} are given by the areas of two corresponding sequential facets and the weights *w*_{jk} are given by *l*_{left}/*l*_{orig} and *l*_{right}/*l*_{orig}, respectively (cf. Figure 2).
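The breadth-first accumulation described in this appendix can be sketched as follows. This is an illustration under stated assumptions, not the published Algorithm 2: the flow passed from facet *j* to *k* is taken to be *f*_{jk} = *a*_{jk} + *w*_{jk}*F*_{j}, which is our reading of the description rather than a formula quoted from the source; the network is assumed to be acyclic with sinks already resolved; and the data layout and function name are hypothetical.

```python
from collections import deque

def accumulate(links, area):
    """Breadth-first flow accumulation on a facet-flow network.

    links[j]: list of (k, a_jk, w_jk) for the one or two downhill
    neighbours k of facet j; area[j]: facet area A_j.
    Returns TDA_j = F_j + A_j for every facet.
    Assumes the flow passed on is f_jk = a_jk + w_jk * F_j and that the
    network is acyclic (sinks already resolved).
    """
    n = len(links)
    K = [0] * n                      # in-degree of each facet
    for out in links:
        for k, _, _ in out:
            K[k] += 1
    seen = [0] * n                   # inflows received so far
    F = [0.0] * n                    # sum of inflows
    # Seed the queue with the outflows of facets without any inflow.
    queue = deque((k, a) for j in range(n) if K[j] == 0
                  for k, a, _ in links[j])
    while queue:
        j, f = queue.popleft()
        seen[j] += 1
        F[j] += f
        if seen[j] == K[j]:          # all inflows have arrived
            for k, a, w in links[j]:
                queue.append((k, a + w * F[j]))
    return [F[j] + area[j] for j in range(n)]
```

Under these assumptions the weights of a branching facet sum to one and its partial areas sum to the facet area, so the TDA at an outlet equals the total area of all facets draining to it.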