# Minimum Hydraulic Resistance Uncertainty and the Development of a Connectivity-Based Iterative Sampling Strategy

## Abstract

The presence of well-connected paths is commonly observed in spatially heterogeneous porous formations. Channels consisting of high hydraulic conductivity (*K*) values strongly affect fate and transport of dissolved species in the subsurface environment. Several studies have established a correlation between connectivity properties of the spatially variable *K*-field and solute first arrival times. However, due to limited knowledge of the spatial structure of the *K*-field, connectivity metrics are subject to uncertainty. In this work, we utilize the concept of the minimum hydraulic resistance and least resistance path to evaluate the connectivity of a *K*-field in a stochastic framework. We employ a fast graph theory-based algorithm to alleviate the computational burden associated with stochastic computations in order to investigate both the impact of the hydrogeological structural conceptualization and domain dimensionality (2-D vs. 3-D) on the uncertainty of the minimum hydraulic resistance. Finally, we propose an iterative data acquisition strategy that can be utilized to identify the least resistance path (which is linked to preferential flow channels) in real sites. A synthetic benchmark test is presented, showing the advantages of the proposed sampling strategy when compared to a regular sampling strategy. By using the iterative data sampling strategy, we were able to reduce first arrival time uncertainty by 47% (when compared to the regular sampling strategy), while maintaining site characterization efforts constant.

## Key Points

- Efficient computation of hydraulic connectivity using graph theory
- Impact of both the RSF model and dimensionality (2-D/3-D) on connectivity uncertainty
- Proposed iterative sampling strategy reduces first arrival time uncertainty

## 1 Introduction

The spatiotemporal dynamics of a solute body are strongly affected by the variability of the hydraulic properties of the subsurface environment. Fluctuations in the hydraulic properties of the porous medium, such as the hydraulic conductivity (*K*), lead to well-connected paths which are key in controlling first arrival times at an environmentally sensitive target (Gómez-Hernández & Wen, 1998). Identifying well-connected paths in the subsurface environment is critical for risk analysis in many hydrological, environmental, and energy applications (e.g., Andričević & Cvetković, 1996; de Barros, 2018; Fuks et al., 2019). Since a complete characterization of the subsurface is not feasible, the identification of well-connected paths is a challenging task. Instead, only a few field measurements are collected, and therefore the *K*-field is subject to uncertainty. For such reasons, stochastic methods are employed to construct a *random space function* (RSF) of the *K*-field (Dagan, 1989; Rubin, 2003). As a consequence of the uncertainty in the *K*-field, the connectivity structure is also a random entity. Consequently, first arrival times need to be characterized statistically through their moments and probability density function.

The effect of well-connected paths on solute transport has been extensively studied in the hydrogeological community (e.g., Fiori et al., 2010; 2014; Fogg et al., 2000; Knudby & Carrera, 2005; Rizzo & de Barros, 2017; Sánchez-Vila et al., 1996; Trinchero et al., 2008; Tyukhova et al., 2015; 2016; Western et al., 2001). Implications of aquifer connectivity in the probabilistic assessment of increased lifetime cancer risk due to the exposure to chlorinated solvents are also reported in the literature (Henri et al., 2015). Poeter and Townsend (1994) discussed how well-connected *K* regions could improve aquifer remediation (see also LaBolle & Fogg, 2001). In particular, the connectivity of the *K*-field is one of the main factors leading to the non-Fickian behavior commonly observed at the field scale (Bianchi & Pedretti, 2017; Zinn & Harvey, 2003). In the literature, different metrics have been proposed to quantify connectivity (e.g., Fiori, 2014; Knudby & Carrera, 2005; Le Goc et al., 2010; Renard & Allard, 2013; Sánchez-Vila et al., 1996). These metrics can be classified as static (i.e., metrics solely based on the *K*-field) or dynamic (i.e., metrics based on quantities describing key features of the flow field and transport). For additional details on different connectivity metrics, we refer the reader to Knudby and Carrera (2005) and Renard and Allard (2013). In this paper, we focus on a static quantity denoted as the *minimum hydraulic resistance* (*MHR*) and the corresponding *least resistance path* (*LRP*; Rizzo & de Barros, 2017; Tyukhova & Willmann, 2016). The MHR and the LRP are solely based on the *K*-field. The MHR is related to the minimum resistance encountered by a solute particle traveling between two control volumes; that is, higher values of the MHR imply larger solute residence times (Tyukhova et al., 2015; Tyukhova & Willmann, 2016). The LRP corresponds to the curve connecting the two control volumes that minimizes the resistance.
One of the main advantages of using static quantities such as the MHR is that important features of the hydrogeological system can be extracted without resorting to flow and transport simulations. They are particularly suited for applications where time resources are limited and a fast preliminary screening analysis is required. Moreover, they can potentially indicate how the physical heterogeneity of the aquifer affects solute transport. Another example of a static metric is the *geological entropy* introduced by Bianchi and Pedretti (2017, 2018). We point out that not all dynamic quantities can be correlated to static quantities, since the latter are missing crucial information about flow (e.g., velocity gradients) and transport.

With the goal of estimating the hydraulic resistance in heterogeneous porous media, Tyukhova et al. (2015) introduced an algorithm based on erosion and dilation of a moving polygon. The authors used this algorithm to estimate high *K* channels in two-dimensional (2-D) porous media (Tyukhova et al., 2015; Tyukhova & Willmann, 2016). The extension of this method to three-dimensional (3-D) fields can be cumbersome and computationally expensive. To overcome the computational burden associated with the estimation of the MHR, Rizzo and de Barros (2017) transformed the continuous *K*-field into a graph, proving that finding the MHR is equivalent to the well-known *shortest path problem*. This allows the use of a variation of the computationally efficient Dijkstra's algorithm to compute the MHR and LRP in both 2-D and 3-D conductivity fields. Rizzo and de Barros (2017) also showed how the MHR, computed through Dijkstra's algorithm, is related to first arrival time and how the LRP can be used to estimate the trajectories of the fastest particles (i.e., the leading front of the plume). Furthermore, Rizzo and de Barros (2017) reported results for Gaussian and non-Gaussian fields and developed semianalytic expressions relating the expected value of the MHR to key geostatistical properties characterizing the log *K*-field (such as its RSF parameters). Knudby and Carrera (2006) provided a similar application of Dijkstra's algorithm for 2-D fields. The authors showed how both the solute plume and the drawdown signal follow the LRP. Moreover, the works of Viswanathan et al. (2018) and Hyman et al. (2018) show the advantages of using graph theory to estimate flow and transport quantities in discrete fracture networks, where connectivity and flow channeling are of key importance (e.g., Maillot et al., 2016).

In this paper, we expand the ideas of Rizzo and de Barros (2017) to provide a framework based on the MHR and LRP to estimate the connectivity uncertainty of a given random *K*-field and to illustrate how the concept of the LRP can be employed to improve site characterization efforts. These quantities can be efficiently computed using graph theory, and the method can be easily extended to a large variety of grids (e.g., 3-D, nonstructured grid). In fact, in many cases it is possible to transform a grid or lattice into a graph. We recall that a graph is defined as a set of vertices that are connected by weighted edges (in this case, the hydraulic resistance between two vertices). Given the flexibility of the method, we explore its usage in practical applications where connectivity plays an important role. The objectives of this work are twofold. The first is to quantify the uncertainty of the MHR and its controlling factors. Mainly, we illustrate how the dimensionality (2-D vs. 3-D) and geological conceptualization (e.g., multi-Gaussian and non-multi-Gaussian log conductivity fields) impact the uncertainty of the MHR. Rizzo and de Barros (2017) only explored the impact of aquifer structure on the expected value of the MHR. The present contribution further develops the work of Rizzo and de Barros (2017) to quantify the empirical distribution of the MHR as a function of the above mentioned factors (i.e., dimensionality and geological conceptualization). Second, we demonstrate how the concepts introduced in Rizzo and de Barros (2017) can be employed to reduce the uncertainty associated with the identification of the LRP of a given *K*-field. To achieve this, we develop an iterative framework to best allocate data assimilation efforts with the goal of reducing the LRP uncertainty. Given the strong relation between MHR and first arrival times, our proposed framework results in a reduction of the uncertainty of the latter. 
This strategy is closely related to inverse modeling and optimal design (e.g., Alcolea et al., 2006; Kitanidis, 1996; Nowak et al., 2010; Rubin et al., 2010), where sampling locations are chosen by minimizing the expected prediction variance (e.g., first arrival times variance). In the case of optimal design and inverse modeling, many stochastic flow and transport simulations are needed to estimate such variance (see Herrera & Pinder, 2005, for a practical application). On the contrary, the iterative sampling strategy uses the MHR and LRP uncertainty as a proxy for the first arrival time uncertainty; therefore, no stochastic flow and transport simulation is needed to estimate the new sampling locations.

In section 2, we present the concepts of MHR and LRP and briefly review the algorithm utilized to compute these quantities presented in Rizzo and de Barros (2017). In section 3, we investigate, for the first time, how different geological conceptualizations, such as different RSF models and dimensionality (i.e., 2-D vs. 3-D), impact the uncertainty of the MHR. This type of analysis can help engineers and scientists in choosing the appropriate conceptual RSF model based on the expected geological connectivity. Then, in section 4, we propose a novel iterative method for site characterization which is based on the concepts of the MHR and the LRP. In each step of the method, the sampling locations (where the hydraulic conductivity is sampled) are chosen to maximize the probability of finding the LRP. This leads to a drastic reduction of the first arrival time uncertainty when compared to regular sampling strategies. Finally, a summary is presented in section 5.

## 2 Computing the MHR and LRP

Consider an *n*-dimensional aquifer characterized by a spatially variable log conductivity field $Y(\mathbf{x}) = \ln K(\mathbf{x})$. The Cartesian coordinate system is given by **x** = (*x*_{1},…,*x*_{n}), where *n* is the dimensionality of the porous formation. In order to measure the connectivity of *K*-fields, we utilize the concept of the MHR. Let *S* and *T* be two sets of points (e.g., a point, surface, or volume) representing the starting points and target points, respectively. We define the MHR between *S* and *T* as follows:

$$
R_{\min}(S,T) = \min_{\Gamma} \int_{\Gamma} \frac{\mathrm{d}\ell}{K(\mathbf{x})}, \tag{1}
$$

where $\Gamma$ is any path connecting one point of *S* to one point of *T*. For example, a possible setup is to choose *S* and *T* as two opposite boundaries. The path $\Gamma^{*}$ that minimizes the hydraulic resistance between *S* and *T* (i.e., the hydraulic resistance along $\Gamma^{*}$ is equal to the MHR $R_{\min}(S,T)$) is called the LRP. In other words, any path connecting *S* and *T* will have a hydraulic resistance greater than or equal to the hydraulic resistance of the LRP.

The MHR has been found to have a close relation with the first arrival time of a solute plume (e.g., Figure 7 of Rizzo and de Barros, 2017, and Figure 3 of Tyukhova and Willmann, 2016), and at the same time, the LRP is correlated to the trajectory of the fastest particle (e.g., Figure 5 in Rizzo and de Barros, 2017). This relation indicates that the connectivity structure of the conductivity field is often the driving factor governing the front of the plume. One of the main advantages of using the MHR and LRP is that these quantities can be efficiently computed. Equation 1 can be reformulated in a *graph theory* framework, where it is equivalent to the shortest path problem, which can be solved using a variation of Dijkstra's algorithm. A schematic representation of how to transform the *K*-field into a graph is shown in Figure 1. More details about the algorithm are presented in Rizzo and de Barros (2017) and briefly summarized in Appendix A. Using this algorithm, the MHR and LRP can be used to quickly estimate critical connectivity properties in real applications while limiting the computational burden.
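To make the graph reformulation concrete, the following is a minimal sketch of a multi-source Dijkstra search on a 2-D grid. It is not the implementation of Rizzo and de Barros (2017): the function name, the four-neighbor connectivity, and the edge weight (half a cell spacing traversed in each of the two neighboring conductivities) are assumptions made here for illustration only.

```python
import heapq

def minimum_hydraulic_resistance(K, sources, targets, spacing=1.0):
    """Return (MHR, LRP) between two sets of cells of a 2-D conductivity grid.

    K is a list of rows of positive conductivities; cells are (row, col) tuples.
    """
    n_rows, n_cols = len(K), len(K[0])
    targets = set(targets)
    best = {cell: 0.0 for cell in sources}  # lowest resistance found so far
    parent = {}                             # predecessor along the best path
    heap = [(0.0, cell) for cell in sources]
    heapq.heapify(heap)
    while heap:
        resistance, cell = heapq.heappop(heap)
        if resistance > best.get(cell, float("inf")):
            continue  # stale heap entry, a better path was already found
        if cell in targets:
            # First target popped from the heap has the minimum resistance.
            path = [cell]
            while path[-1] in parent:
                path.append(parent[path[-1]])
            return resistance, path[::-1]
        i, j = cell
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= ni < n_rows and 0 <= nj < n_cols:
                # Assumed edge weight: half a cell in each conductivity.
                weight = 0.5 * spacing * (1.0 / K[i][j] + 1.0 / K[ni][nj])
                candidate = resistance + weight
                if candidate < best.get((ni, nj), float("inf")):
                    best[(ni, nj)] = candidate
                    parent[(ni, nj)] = cell
                    heapq.heappush(heap, (candidate, (ni, nj)))
    return float("inf"), []

# Toy field: a high-K channel (K = 100) embedded in a K = 1 background.
K = [[1.0] * 5, [100.0] * 5, [1.0] * 5]
left = [(i, 0) for i in range(3)]
right = [(i, 4) for i in range(3)]
mhr, lrp = minimum_hydraulic_resistance(K, left, right)
```

In this toy field the LRP follows the middle row, and the MHR is the sum of four edges of resistance 0.01 each. The same search extends to 3-D by adding the two vertical neighbors.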

## 3 Hydraulic Conductivity RSF Model and the MHR Uncertainty

In this section, we investigate the impact of the RSF model of the *K*-field on the MHR uncertainty. The analysis is carried out for different dimensionalities of the physical domain (i.e., *n*=2,3). Our goal is to verify whether or not the statistical properties of the MHR change significantly according to the structure of the *K*-field. To achieve this task, we adopt four different RSF models to describe the log conductivity field ($Y = \ln K$). With the goal of providing a fair comparison, each field has the same log mean *μ*_{Y}=0, the same log variance $\sigma_Y^2$, and the same Gaussian semivariogram model

$$
\gamma_Y(\mathbf{r}) = \sigma_Y^2 \left[ 1 - \exp\left(-3h^2\right) \right], \tag{2}
$$

where **r** is the *n*-dimensional lag distance and *h* is defined as follows:

$$
h = \sqrt{\sum_{i=1}^{n} \left( \frac{r_i}{a_i} \right)^2}, \tag{3}
$$

with *a*_{i} denoting the practical ranges in each direction. For all the upcoming analysis within this section, we consider the *K*-field to be isotropic, that is, all conductivity fields have the same effective range *a*_{i}=*a* for every *i*.
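As an illustration, a Gaussian semivariogram with direction-dependent practical ranges can be evaluated as sketched below. The factor 3 in the exponent is an assumption: it is the common convention under which the model reaches about 95% of the sill at the practical range, and it is exposed as the hypothetical parameter `scale` so that other conventions can be substituted. The function names are ours.

```python
import math

def normalized_lag(r, ranges):
    """Lag vector r scaled component-wise by the practical ranges a_i."""
    return math.sqrt(sum((ri / ai) ** 2 for ri, ai in zip(r, ranges)))

def gaussian_semivariogram(r, ranges, variance=1.0, scale=3.0):
    """Gaussian semivariogram; `scale` = 3 is an assumed practical-range convention."""
    h = normalized_lag(r, ranges)
    return variance * (1.0 - math.exp(-scale * h * h))

# Isotropic 2-D case with a practical range of 20 cells.
ranges = (20.0, 20.0)
gamma_at_range = gaussian_semivariogram((20.0, 0.0), ranges)
gamma_diagonal = gaussian_semivariogram((12.0, 16.0), ranges)  # same lag length
```

Because the field is isotropic, the two lags above have the same length and therefore the same semivariogram value.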

The four RSF models considered are as follows:

- Multi-Gaussian fields (*MG*)
- Non-multi-Gaussian field where high conductivity values are well connected (*non-MG con*)
- Non-multi-Gaussian field where low conductivity values are well connected (*non-MG dis*)
- Binary field characterized by two homogeneous facies (*binary*)

For each field, both the 2-D and 3-D versions are taken into account. Therefore, we will analyze the MHR for a total of eight types of unconditional fields. Details pertaining to the *K*-fields and the corresponding methods employed to generate multiple random realizations are described in the upcoming subsection.

### 3.1 Spatially Random Hydraulic Conductivity Fields

*Multi-Gaussian Fields*: One of the most common choices to model a random log *K*-field is the MG model (Dagan, 1989; Kitanidis, 1997; Rubin, 2003). For any set of points, the joint probability density function (PDF) is assumed to be multi-Gaussian. This assumption makes it possible to efficiently generate random realizations using, for example, the Sequential Gaussian Simulation technique (see Chapter 3 of Rubin, 2003) as implemented in SGEMS (Remy et al., 2009). More information about MG fields can be found in Rubin (2003).

*Non-Multi-Gaussian Fields*: In addition to the MG field, we also investigate the impact of augmenting connected and disconnected features in the *K*-field using the approach described in Zinn and Harvey (2003). Other approaches are also available (e.g., Rubin & Journel, 1991). The fields reported in Zinn and Harvey (2003) are “artificially” generated using a nonlinear transformation of the MG field. The connected fields (denoted in the plots as *non-MG con*) are characterized by the presence of well-connected *high*-*K* areas. On the contrary, disconnected fields (denoted in the plots as *non-MG dis*) are characterized by the presence of well-connected *low*-*K* areas. For each point of this field, the PDF of *Y* is Gaussian. However, the joint PDF of a set of points is not multi-Gaussian. For this reason, we refer to these types of fields as non-MG fields. Note that the block size is increased after the fields are transformed to ensure that all the fields (i.e., MG and non-MG) have the same practical range (Zinn & Harvey, 2003). These fields have been employed in multiple studies to investigate the role of connectivity in transport (e.g., Jankovic et al., 2017; Srzic et al., 2013; Tyukhova & Willmann, 2016).
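The nonlinear transformation can be sketched point by point as follows. This is our reading of the approach of Zinn and Harvey (2003), not code from that work: the absolute value of a standard normal value is mapped back to a standard normal through its own CDF, so that values originally close to the mean (which are the well-connected ones in an MG field) become the low extremes, and a sign flip turns them into the high extremes. The function name, the `connect_high` flag, and the clamping constant `eps` are assumptions, and the subsequent rescaling of the block size is not reproduced here.

```python
import random
from statistics import NormalDist, mean, pstdev

_STD_NORMAL = NormalDist()

def zinn_harvey_transform(y, connect_high=True, eps=1e-12):
    """Transform one value of a standardized MG field (zero mean, unit variance)."""
    # |y| follows a half-normal law with CDF 2*Phi(|y|) - 1; feeding that
    # probability to the inverse normal CDF restores a standard normal value.
    p = 2.0 * _STD_NORMAL.cdf(abs(y)) - 1.0
    p = min(max(p, eps), 1.0 - eps)  # keep inv_cdf inside its open domain
    y_new = _STD_NORMAL.inv_cdf(p)
    # Values near the original mean are now the lowest ones (connected low K);
    # flipping the sign turns them into connected high K.
    return -y_new if connect_high else y_new

rng = random.Random(42)
mg_values = [rng.gauss(0.0, 1.0) for _ in range(20000)]
dis_values = [zinn_harvey_transform(y, connect_high=False) for y in mg_values]
con_values = [zinn_harvey_transform(y, connect_high=True) for y in mg_values]
```

The transformed samples keep an approximately standard normal univariate distribution, which is consistent with the statement that the PDF of *Y* at each point remains Gaussian while the joint PDF does not.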

*Binary Fields*: Finally, we consider a discrete binary field where the log *K* can assume only two values *y*_{1} and *y*_{2}. The field realizations are generated using sequential indicator simulation in SGEMS (Remy et al., 2009). Note that in this case, the univariate PDF is not Gaussian. To satisfy the constraints on the mean and variance, the two log *K* values are *y*_{1}=−*σ*_{Y} and *y*_{2}=*σ*_{Y}, and the probability of encountering each of these values is exactly 0.5. Therefore, this is a two-facies type of field, where the conductivity is homogeneous in each facies. More complex multifacies fields can be obtained using multiple-point geostatistical simulations (Mariethoz et al., 2010) or transition probability-based indicator simulations (Carle & Fogg, 1996).
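As a quick check of the binary parameterization, the two equally probable values *y*_{1}=−*σ*_{Y} and *y*_{2}=*σ*_{Y} reproduce the prescribed mean and variance:

$$
E[Y] = \tfrac{1}{2}(-\sigma_Y) + \tfrac{1}{2}\sigma_Y = 0, \qquad
\mathrm{Var}[Y] = \tfrac{1}{2}(-\sigma_Y)^2 + \tfrac{1}{2}\sigma_Y^2 - E[Y]^2 = \sigma_Y^2 .
$$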

### 3.2 Statistical Analysis of MHR

#### 3.2.1 Box Plot Analysis

To understand the impact of the RSF model on the MHR, we consider the MHR between two points separated by a distance 4*a* and compare the results for different log-conductivity RSF models. Using equation 1, we define the MHR between **s** and **t**, where **s** and **t** are two points at a distance 4*a* from each other. The simulation domain is partitioned into 201×201 cells in 2-D and 201×201×201 cells in 3-D. The practical range is *a*=20 cells. We consider the four types of RSF (MG, non-MG connected, non-MG disconnected, and binary) previously described. For each field, we analyze both the 2-D and 3-D cases. Since the *K*-field is spatially random, the MHR is treated as a random quantity. To quantify the uncertainty in the MHR, a Monte Carlo simulation is performed for each of the fields (eight cases in total). For each case, an ensemble consisting of 100 realizations of the *K*-field is generated. For each realization, the MHR between two points at distance 4*a* is computed through the algorithm described in section 2 and Appendix A (see Rizzo & de Barros, 2017, for additional details). Finally, the key statistical properties of this MHR can be estimated.

Figure 2 shows the box plots of the MHR for each of the RSF models considered. Results are displayed for both 2-D and 3-D fields. Statistical properties of the MHR are reported in Table 1 for each type of field. Comparison of the box plots provides two important pieces of information. First, it is possible to see how the MHR varies, on *average*, among different cases. Second, it is possible to see the impact of the RSF model on the *uncertainty* (i.e., the height of the box plot) of the MHR. The latter indicates the possibility of having realizations with extremely different connectivity configurations (e.g., two points lying in the same high conductivity channel or two points separated by a low conductivity barrier).

Field | Mean | Standard deviation | Skewness | Kurtosis |
---|---|---|---|---|
MG 2-D | 0.433 | 0.367 | −0.029 | 0.092 |
Non-MG con 2-D | −0.019 | 0.556 | 0.717 | 0.192 |
Non-MG dis 2-D | 0.765 | 0.345 | −0.502 | 1.139 |
Binary 2-D | 0.343 | 0.371 | 0.108 | −0.993 |
MG 3-D | 0.022 | 0.543 | 0.954 | 0.731 |
Non-MG con 3-D | −0.145 | 0.755 | 0.299 | −0.466 |
Non-MG dis 3-D | 0.359 | 0.397 | 0.143 | 1.134 |
Binary 3-D | −0.101 | 0.190 | 0.518 | −0.469 |

*Note.* MG = multi-Gaussian.
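The four statistics listed in the table can be estimated from a Monte Carlo ensemble as sketched below. The estimators are assumptions: we use the sample standard deviation, moment-based skewness, and excess kurtosis (the negative kurtosis entries in the table suggest an excess-kurtosis convention, but the exact estimators used for the table are not stated).

```python
import math

def summary_statistics(values):
    """Mean, sample standard deviation, skewness, and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "std": math.sqrt(m2 * n / (n - 1)),  # unbiased-variance convention
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,      # zero for a Gaussian sample
    }

# Hypothetical ensemble standing in for the 100 MHR values of one case.
stats = summary_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
```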

Close inspection of Figure 2 reveals that the MHR is consistently lower, in the average sense, for the 3-D fields when compared to the corresponding 2-D fields. This is in agreement with previous studies showing a higher connectivity when considering the third dimension, such as a consistently lower percolation threshold for 3-D fields when compared to 2-D fields (Renard & Allard, 2013) or a significantly higher probability of finding preferential channels in 3-D fields (Fiori & Jankovic, 2012). In the 3-D cases, there are more possible paths connecting the start and end points; therefore, minimization of the hydraulic resistance (among all the possible paths) leads to a lower MHR. Note that the difference of the average MHR between 2-D and 3-D can be more than 1 order of magnitude. Also, the uncertainty of the MHR (i.e., the height of the box plot) is affected by the dimensionality in a nontrivial way. For example, our results show that for non-MG connected fields, the uncertainty in 3-D is higher than in the 2-D case, while for binary fields the uncertainty in 3-D is lower than in the 2-D case.

Next, we compare the results displayed in Figure 2 obtained for MG fields and the non-MG fields (connected and disconnected). As expected, the MHR of the connected field is (on average) lower than that of the MG field. At the same time, we observe that the mean MHR of the disconnected non-MG field is higher than the mean MHR of the MG field. Since these differences can be relatively high (by orders of magnitude), it is clear that the MHR is strongly affected by higher-order moments characterizing the RSF, since both MG and non-MG fields share the same mean, variance, semivariogram, and univariate PDF. Moreover, we note that the transformation used to build the non-MG fields (i.e., Zinn & Harvey, 2003) strongly affects the uncertainty of the MHR. The connected field shows a remarkably higher uncertainty in the MHR when compared to the MG case, while the disconnected field MHR uncertainty is lower than in the MG case.

In the following, we compare the MG field with the binary field (see Figure 2). For the binary RSF model, the average MHR is similar to that of the MG field in both the 2-D and 3-D cases. However, our results show that the uncertainty of the MHR in the binary field is smaller than the uncertainty in the MG field, especially in the 3-D case. This means that the binary field is not able to reproduce the extreme connectivity configurations that can be found in the MG case.

We remark that connectivity, as expressed by the MHR, depends on the position of the start and end points (or volumes) and the type of RSF used. However, comparisons of the MHR as shown in the box plot analysis (Figure 2) can be used to quickly compare connectivity properties of different conceptual models. Comparison can be made for both the average MHR (is one geological model more connected than the other on average?) and the uncertainty of the MHR (does one model include more extremes than the other?). This is particularly important since the MHR displays a strong correlation with first arrival time (Rizzo & de Barros, 2017; Tyukhova et al., 2015). Moreover, the MHR can be used as a lower bound for first arrival time (Rizzo & de Barros, 2017). Therefore, the statistical analysis presented here can inform risk managers about the importance of the choice of the RSF model when estimating early arrival times of a contaminant plume.

#### 3.2.2 PDF Analysis

In the previous section, we have seen the impact of the RSF model on the average resistance and its uncertainty. Now, we compare the empirical PDF of the MHR (obtained directly from the ensemble) to well-known parameterized PDF models. For this analysis, we have tested numerous PDF models. For each known PDF model, we first estimate its parameters using a maximum likelihood estimator, fixing the minimum of the MHR (zero for MG and non-MG fields and a strictly positive lower bound for the binary field). Then, we perform the Kolmogorov-Smirnov goodness of fit test to compare the known PDF and the empirical PDF of the MHR for each of the eight fields previously described. For each test, we report the corresponding *p* value, which we use as an indicator of how plausible it is that the MHR is distributed according to the given PDF. Here, we report the tests performed utilizing the following distributions: power lognormal, lognormal, exponentiated Weibull, and beta prime. Results for each type of RSF are shown in Figure 3.
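The fitting and testing procedure can be sketched for one of the candidate distributions (the lognormal with its minimum fixed at zero). This is only an illustration using the Python standard library: the helper names are ours, the ensemble is synthetic, and the *p* value uses the asymptotic Kolmogorov series with a standard finite-sample correction, which may differ from the procedure used to produce Figure 3. Note also that *p* values computed with parameters estimated from the same sample tend to be optimistic.

```python
import math
import random
from statistics import NormalDist

def fit_lognormal(samples):
    """Maximum likelihood estimates for a lognormal with its minimum at zero."""
    logs = [math.log(s) for s in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma

def lognormal_cdf(x, mu, sigma):
    return NormalDist(mu, sigma).cdf(math.log(x))

def ks_statistic(samples, cdf):
    """Largest distance between the empirical CDF and the model CDF."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

def ks_pvalue(d, n, terms=100):
    """Asymptotic p value of the one-sample Kolmogorov-Smirnov test."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        return 1.0  # the alternating series converges poorly; p is ~1 here
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                 for k in range(1, terms + 1))
    return min(max(2.0 * series, 0.0), 1.0)

# Synthetic stand-in for an ensemble of 100 MHR values.
rng = random.Random(1)
ensemble = [rng.lognormvariate(0.0, 0.5) for _ in range(100)]
mu, sigma = fit_lognormal(ensemble)
d_fit = ks_statistic(ensemble, lambda x: lognormal_cdf(x, mu, sigma))
p_fit = ks_pvalue(d_fit, len(ensemble))
# A deliberately wrong model (shifted log mean) should be rejected.
d_bad = ks_statistic(ensemble, lambda x: lognormal_cdf(x, mu + 2.0, sigma))
p_bad = ks_pvalue(d_bad, len(ensemble))
```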

We start by noting that none of these distribution models is able to fully mimic the statistical behavior in the binary 3-D case (see Figure 3h). The reason is that the empirical PDF of the MHR is bounded, as a result of the bimodal *K* distribution, while the known PDFs taken into consideration have a semi-infinite domain (e.g., random values range from 0 to infinity). Next, we see that all distributions taken into consideration have a similar performance. It is interesting to note that many of these PDFs are used to statistically model extreme events. This is consistent with the class of problem we are investigating in this paper, since the MHR can be seen as an extreme. For example, given a set of independent and identically distributed random variables with a finite minimum, the distribution of the minimum will converge to a Weibull distribution (e.g., Chapter 4 of Ang & Tang, 2007). However, the MHR is the result of the minimization problem in equation 1, where the hydraulic resistances of all the paths are not independent. Still, results indicate that the PDF may be related to the exponentiated Weibull distribution, which is an extension of the Weibull distribution.

## 4 Identifying the LRP of a 3-D Random Hydraulic Conductivity Field and Implications on First Arrival Time Uncertainty

In this section, we demonstrate how the proposed MHR concept can be employed to improve data acquisition campaigns and reduce uncertainties associated with the LRP and, consequently, with first arrival times of a solute body.

We start by considering a 3-D unconditional *Y*-field characterized by mean *μ*_{Y}, variance $\sigma_Y^2$, and the Gaussian semivariogram model described in equation 2. The considered field contains 181×91×31 cells, each cell being a regular 1×1×1 cube. The practical ranges in the horizontal directions are *a*:=*a*_{1}=*a*_{2}=30 cells, while the vertical range is *a*_{3}=*a*/2=15 cells. In this study, we consider a dimensionless coordinate system $\mathbf{x} = \tilde{\mathbf{x}}/a$, where $\tilde{\mathbf{x}}$ are the coordinates in dimensional form.

For our analysis, a random unconditional realization (with *μ*_{Y}=0 and variance $\sigma_Y^2$) is generated as shown in Figure 4 using the sequential Gaussian simulation method and SGEMS (Remy et al., 2009). This field represents our *synthetic true Y*-field, from which we will sample conductivity values at different locations. In the following subsection, we explain the details of the LRP-based sampling strategy.

### 4.1 Iterative Sampling Strategy Based on the LRP

Our goal is to identify the LRP of the field shown in Figure 4 between the left boundary (*x*_{1}=0) and the right boundary (*x*_{1}=6). As previously mentioned, this field represents the synthetic true field from which we extract log conductivity data at specific locations. We recall that the hydraulic resistance along the LRP is, by definition, less than or equal to the hydraulic resistance of any other path connecting the left and right boundaries (see section 2). The goal is to estimate the LRP by sampling the conductivity field in only a few locations, emulating a real site characterization process. Instead of using a regular sampling grid, we select the sampling locations through an iterative approach, using current information on the LRP and its uncertainty to select the locations for the new samples. The iterative sampling method is based on the following steps:

- Step I: Select the initial sampling locations **X**^{(j)}, where *j*=1,…,*N*_{0} and *N*_{0} is the initial number of measurements.
- Step II: In each new location **X**^{(j)}, sample the hydraulic log conductivity and create a new measurement data vector *θ*^{*}.
- Step III: Using the new measurement data *θ*^{*} together with previously sampled data, generate *N*_{f} conditional realizations of the *K*-field.
- Step IV: For each realization, employ the graph theory algorithm to compute the LRP. From all these realizations, compute the probability map of the LRP.
- Step V: Select *N*_{m} new **X**^{(j)} locations using the information on the uncertainty of the LRP and restart from Step II.
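The control flow of the five steps can be sketched as a short loop. Everything passed in as a callable below (`sample_truth`, `simulate_conditional`, `compute_lrp`, `select_locations`) is a hypothetical placeholder for the corresponding operation (field sampling, conditional geostatistical simulation, the graph theory algorithm, and the selection rule); the toy callables at the bottom only exercise the loop and carry no physical meaning.

```python
def iterative_sampling(initial_locations, sample_truth, simulate_conditional,
                       compute_lrp, select_locations, n_iterations,
                       n_realizations=100):
    """Skeleton of the iterative strategy; all callables are user supplied."""
    data = {}                                   # location -> sampled log K
    new_locations = list(initial_locations)     # Step I
    history = []                                # one probability map per pass
    for _ in range(n_iterations):
        for location in new_locations:          # Step II
            data[location] = sample_truth(location)
        fields = [simulate_conditional(data)    # Step III
                  for _ in range(n_realizations)]
        paths = [compute_lrp(field) for field in fields]  # Step IV
        counts = {}
        for path in paths:
            for cell in set(path):
                counts[cell] = counts.get(cell, 0) + 1
        probability = {cell: k / n_realizations for cell, k in counts.items()}
        history.append(probability)
        new_locations = select_locations(probability, data)  # Step V
    return data, history

# Toy stand-ins: a four-cell "site" where each pass adds one unsampled cell.
truth = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

def next_unsampled(probability, data):
    remaining = [loc for loc in truth if loc not in data]
    return remaining[:1]

data, history = iterative_sampling(
    initial_locations=[0, 3],
    sample_truth=lambda loc: truth[loc],
    simulate_conditional=lambda current: dict(current),
    compute_lrp=lambda field: sorted(field),
    select_locations=next_unsampled,
    n_iterations=3,
    n_realizations=10,
)
```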

The results originating from Steps I–V are illustrated in Figure 5, which shows the probability of finding the LRP for each iteration of the data sampling campaign. For this type of analysis and for the purpose of illustration, it is convenient to use an *x*_{3}-integrated probability map in order to better identify the sampling locations for the next iterations. Moreover, the probability is computed in cells containing 5×5 blocks of the original numerical grid. The probability that the LRP crosses a given cell can be calculated by counting the number of realizations where the LRP crosses the cell. The initial sampling locations are located on a horizontal 4×3 regular grid (see black circles in Figure 5.I1), for a total of 12 horizontal sampling locations. For each horizontal sampling location, conductivity values are extracted in layers 0, 5, 10, 15, 20, 25, and 30, so that the initial number of measurements is *N*_{0}=4×3×7=84. Using this information, we generate *N*_{f}=100 conditional *K*-fields and compute the LRP for each realization following the approach described in section 2. After the probability map of finding the LRP is computed, we select six new horizontal sampling locations (see red squares in Figure 5.I1). Therefore, the number of measurements added for each iteration is *N*_{m}=6×7=42. Since the objective is to find the LRP, the new sampling locations should be chosen based on the current LRP probability map. Therefore, locations with a high probability of finding the LRP should be preferred. The sampling strategy consists of allocating most of the sampling locations in areas where the probability of finding the LRP is high and the remaining sampling locations where the probability is low. Placing a few sampling locations in low probability areas reduces the risk of focusing on suboptimal paths (or, from an optimization point of view, converging to a local minimum), therefore improving the quality of the iterative procedure.
For this benchmark case, we place four sampling locations in areas where there is a high probability of finding the LRP and two elsewhere. It is clear from Figure 5 that after only five iterations, we are able to detect the LRP with a relatively small level of uncertainty (refer to the red line in Figure 5.I6 for the real LRP). Moreover, as a natural outcome of the iterative procedure, most of the sampling locations are placed within the vicinity of the LRP. The resulting sampling layout is also consistent with the results originating from the Bayesian geostatistical design framework of Nowak et al. (2010). Nowak et al. (2010) used inverse modeling to minimize the uncertainty of solute arrival times in a 2-D spatially variable flow field and showed how most of the *K* sampling locations were placed between the source and the endpoint.
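The probability map and the selection rule described above can be sketched as follows. The function names and the data layout (each LRP as a list of cell indices) are assumptions. In particular, the text only states that the remaining locations go where the probability is low; drawing them at random from the lower-ranked half of the unsampled cells is our illustrative choice.

```python
import random

def lrp_probability_map(paths, block=5):
    """x3-integrated probability that the LRP crosses each coarse cell.

    `paths` is a list of LRPs, each a list of (i1, i2, i3) cell indices.
    """
    counts = {}
    for path in paths:
        # Drop the vertical index and group block x block horizontal cells;
        # a set ensures each realization is counted once per coarse cell.
        crossed = {(i1 // block, i2 // block) for i1, i2, _ in path}
        for cell in crossed:
            counts[cell] = counts.get(cell, 0) + 1
    return {cell: k / len(paths) for cell, k in counts.items()}

def choose_new_cells(probability, all_cells, sampled, n_high=4, n_low=2, rng=random):
    """Pick n_high cells with the highest probability and n_low low-probability cells."""
    candidates = [c for c in all_cells if c not in sampled]
    ranked = sorted(candidates, key=lambda c: probability.get(c, 0.0), reverse=True)
    high = ranked[:n_high]
    remaining = ranked[n_high:]
    # Assumed rule for the low-probability picks: random draw from the lower half.
    low = rng.sample(remaining[len(remaining) // 2:], n_low)
    return high + low

# Two toy LRPs on a tiny grid.
paths = [
    [(0, 0, 0), (5, 0, 1), (10, 0, 0)],
    [(0, 0, 0), (5, 5, 0), (10, 0, 0)],
]
prob = lrp_probability_map(paths)
cells = [(i, j) for i in range(3) for j in range(2)]
new_cells = choose_new_cells(prob, cells, sampled=set(), n_high=2, n_low=1,
                             rng=random.Random(0))
```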

In real applications, the geostatistical model and statistical properties of the hydraulic conductivity field are, in general, unknown. However, if the site has already been analyzed and the statistical properties of the hydraulic conductivity have already been estimated, the initial data of the iterative method can be the *K* values sampled during the previous site characterization process. If the site is yet to be analyzed (unknown geostatistical parameters), the initial number of sampling locations has to be larger, since the statistical properties of the field must be estimated. It is important to note that the geostatistical model (and corresponding statistical properties) can be updated at each iteration using the newly available data and could be further cast in a Bayesian framework.

### 4.2 Comparison With a Regular Sampling Strategy

Next, we compare the proposed sampling strategy based on the LRP probability (see measurement network depicted in Figure 5) to a regular sampling strategy. For this regular strategy, we select the horizontal sampling locations along a regular 7×6 grid, for a total of 42 horizontal sampling locations. This is the same number of locations used at the end of the iterative strategy (i.e., number of black circles in Figure 5.I6). Therefore, both strategies would require an identical sampling effort, with the iterative strategy having a negligible time overhead due to the fact that we compute the LRP probability map for each iteration. Figure 6 shows the ensemble of 100 LRPs for the fields conditioned to *K* extracted with the *regular* strategy (Figure 6, top) and the *iterative* strategy (Figure 6, bottom). It is clear that the uncertainty on the LRP is higher for the regular strategy, with the LRPs covering a larger area and having different shapes from the expected one. On the contrary, the higher sampling frequency along the LRP obtained by using the iterative strategy reduces its uncertainty, as shown by the more homogeneous behavior of the LRPs across different conditional realizations (Figure 6, bottom).
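The equality of the sampling effort follows from the numbers reported for the benchmark (12 initial horizontal locations, six new locations in each of the five iterations, and seven sampled layers per location):

$$
\underbrace{4 \times 3}_{\text{initial}} + 5 \times 6 = 42 = 7 \times 6, \qquad
N_0 + 5 N_m = 84 + 5 \times 42 = 294 = 42 \times 7 .
$$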

The comparison with the regular sampling strategy shows the advantages of using the iterative method to choose the sampling locations. In fact, it is possible to reduce the uncertainty on the LRP by rearranging the sampling locations, maintaining the characterization costs and effort constant. Since the graph theory-based algorithm used to compute the LRP is relatively efficient (Rizzo & de Barros, 2017), the computation overhead added can be justified by the gain in uncertainty reduction.
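The spread of an LRP ensemble, as compared in Figure 6, can be summarized numerically. The following is a minimal sketch, assuming that each LRP is stored as a list of grid-cell indices and that a per-cell visit frequency is an acceptable proxy for the LRP probability map (the exact definition used for the map is given in section 4.1). The function names `lrp_visit_frequency` and `visited_area` are hypothetical and are not part of the original code.

```python
from collections import Counter


def lrp_visit_frequency(paths):
    """Fraction of realizations whose path passes through each cell.

    `paths` is a list of paths; each path is a list of hashable cell
    indices, e.g. (row, col) tuples. This storage layout is an assumption.
    """
    counts = Counter()
    for path in paths:
        counts.update(set(path))  # count each cell once per realization
    n_realizations = len(paths)
    return {cell: c / n_realizations for cell, c in counts.items()}


def visited_area(paths):
    """Number of distinct cells touched by at least one path (a crude spread measure)."""
    return len(set().union(*map(set, paths)))


# Toy ensemble of four paths on a 2 x 2 grid
ensemble = [
    [(0, 0), (0, 1)],
    [(0, 0), (1, 1)],
    [(0, 0), (0, 1)],
    [(1, 0), (1, 1)],
]
frequency = lrp_visit_frequency(ensemble)
area = visited_area(ensemble)
```

Under this reading, an ensemble with lower LRP uncertainty concentrates high frequencies on few cells and yields a smaller visited area.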

### 4.3 Uncertainty Reduction of First Arrival Times

We have seen how choosing the *K* sampling locations along the LRP decreases the uncertainty associated with its estimation. Here, we show that the uncertainty on the first arrival time also decreases when using the iterative method to select *K* sampling locations. To show this, we simulate flow and transport for both conditional *K* ensembles obtained through the iterative sampling and the regular sampling strategies.

The governing equations of groundwater flow are solved using MODFLOW (Harbaugh, 2005) and the Python interface FloPy (Bakker et al., 2016), while solute transport is simulated using the open source GPU-accelerated random walk particle tracking code PAR^{2} (Rizzo et al., 2019). Details about the physical-mathematical model adopted in this work can be found in Appendix B. Note that all the quantities are presented in dimensionless form, and the dimensionless groups are listed in Appendix B. The solute plume, represented by a set of particles, is initially placed in an *x*_{2}-*x*_{3} plane along the left boundary (*x*_{1}=0). For this simulation, we use 10 million particles, a dimensionless time step d*t*=0.05, molecular diffusion *D*_{m}=10^{−9}, and longitudinal and transverse dispersivities *α*_{L}=0.01 and *α*_{T}=0.001, respectively. Figure 7 shows the shape of the plume after 500 time steps (i.e., dimensionless time *t*=25) using the synthetic true field depicted in Figure 4. Since the heterogeneity level of the field is relatively high, the solute plume is strongly affected by high conductivity channels, and the front of the plume moves along the LRP.

We run two Monte Carlo simulations using conductivity fields conditioned to the sampling locations determined in the iterative strategy (Figure 6, bottom) and sampling locations chosen using a regular strategy (Figure 6, top). For both ensembles, 100 conditional realizations are generated using sequential Gaussian simulation. For each *K* realization, flow and transport are simulated and the statistics of the cumulative breakthrough curve (BTC) at a control plane located on the right boundary (*x*_{1}=6) are computed.

Figure 8 compares the cumulative BTC at the right boundary for the two sampling strategies. Since the MHR and LRP are linked to first arrival time (see Figures 5 and 7 of Rizzo and de Barros, 2017), we focus on the early solute breakthrough. Figure 8 shows that the cumulative BTC obtained from the synthetic true field (red line) is within the uncertainty envelope for both the iterative sampling method (blue) and the regular sampling method (green). However, the confidence interval (i.e., uncertainty) is smaller when using the iterative sampling method. For example, let us consider *t*_{1%} (i.e., the time when 1% of the particles have crossed the right boundary) as the quantity of interest. If we measure the uncertainty through the standard deviation of *t*_{1%}, we find that the proposed iterative method reduces the uncertainty by 47% when compared to the regular sampling method. Figure 9 shows the estimation of *t*_{1%} for both the iterative and regular sampling methods. We point out again that both regular and iterative sampling methods require the same number of *K*-samples (in this case, 42 horizontal sampling locations). As with the LRP, it is possible to decrease the uncertainty of the first arrival time by rearranging the sampling locations. The direct computation of first arrival times would require many flow and transport simulations for each iteration. Instead, our work proposes to use the iterative strategy to minimize the LRP uncertainty, which is more efficient and requires minimal computational effort. Nonetheless, we can achieve a remarkable reduction in the first arrival time uncertainty due to the link between MHR/LRP and first arrival times. The 47% uncertainty reduction relative to a regular sampling strategy is similar to what was obtained by Nowak et al. (2010) for *t*_{50%} using a task-driven inverse modeling technique. However, the iterative strategy does not need any flow and transport simulation, which makes it more efficient when the aquifer domain is large. Moreover, the choice of new sampling locations is based on all the samples collected during previous iterations; thus, the final sampling locations are optimized for the site under examination.
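The uncertainty metric used above can be written down compactly. The sketch below assumes that, for each realization, the particle arrival times at the control plane are available as a plain list; the function names `early_arrival_time` and `std_reduction` are hypothetical, and the toy numbers are illustrative only and are not results from this study.

```python
import math
import statistics


def early_arrival_time(arrival_times, fraction=0.01):
    """Time at which `fraction` of the particles has crossed the control plane."""
    ordered = sorted(arrival_times)
    # 1-based rank of the particle that completes the requested mass fraction
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def std_reduction(t_regular, t_iterative):
    """Relative reduction of the sample standard deviation (iterative vs. regular)."""
    return 1.0 - statistics.stdev(t_iterative) / statistics.stdev(t_regular)


# Toy check: 8 particles, 25% of them have arrived by the 2nd smallest time
t25 = early_arrival_time([9.0, 3.0, 5.0, 4.0, 8.0, 7.0, 6.0, 10.0], fraction=0.25)

# Toy ensembles of an early arrival time (illustrative values only)
reduction = std_reduction([1.0, 3.0, 5.0, 7.0], [3.0, 4.0, 5.0, 6.0])
```

In this notation, *t*_{1%} corresponds to `fraction=0.01` evaluated once per realization, and the reported reduction is `std_reduction` applied to the two resulting ensembles.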

Finally, we compare the computed values of *t*_{1%} with the corresponding MHR for each realization of the hydraulic conductivity ensemble. This analysis is carried out for an unconditional ensemble and the conditional ensemble obtained through the iterative sampling strategy described in section 4.1. Figure 10 shows a scatter plot of the MHR and *t*_{1%} for 100 fields generated using the conditional field and 100 fields generated using the unconditional field. As expected, there is a clear correlation between the MHR and first arrival times. Note that the quality of this regression is of key importance for the success of the iterative sampling strategy. The regression variability is representative of all the factors that are not incorporated in the MHR, such as boundary condition effects and mass conservation. Recall that the MHR is a static quantity and that flow and transport simulations are not used for its computation. The unconditional field realizations (gray circles) span a large area, as a result of the high variability due to the lack of measurements. In contrast, the variability of the conditional field realizations (blue rhombuses) is much lower, and these realizations lie close to the point, depicted as a red cross, indicating the MHR and *t*_{1%} of the synthetic true field. It is clear that reducing the uncertainty of the MHR (e.g., by sampling along the LRP) results in a decrease of the uncertainty of the first arrival time, because the two quantities are correlated.
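The strength of the relation between the MHR and *t*_{1%} can be quantified with a correlation coefficient and a least-squares line. The following is a minimal sketch, assuming one (MHR, *t*_{1%}) pair per realization and a linear relation in the plotted variables; whether the regression in Figure 10 is linear or performed on transformed variables is not assumed here, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data (illustrative only): arrival time grows linearly with resistance
mhr_values = [1.0, 2.0, 3.0, 4.0]
t_values = [3.0, 5.0, 7.0, 9.0]
r = pearson_r(mhr_values, t_values)
slope, intercept = linear_fit(mhr_values, t_values)
```

A value of `r` close to 1 indicates that reducing the MHR spread also narrows the spread of the first arrival time, while the scatter around the fitted line reflects the factors not captured by the MHR.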

### 4.4 Start and Target Control Volumes Selection

The benchmark scenario considered in the previous sections assumes that the solute plume is transported along the average flow direction in the *x*_{1} direction. The iterative strategy proposed can be extended to different scenarios. One key factor for a proper application of this strategy is the selection of the start and target control volumes. In the benchmark case (see section 4.3), the start and target control volumes were chosen to be the left and right boundaries of the domain. This was chosen since the solute is initially injected through the left boundary, and we are interested in the first arrival times at the right boundary. However, other configurations might be of interest.

A practical scenario of interest to hydrogeologists is the case where the solute is injected in one well and extracted from another well (similar to a tracer test). Following the procedure described in the previous sections, Figure 11 reports the sampling locations selected after six iterations as well as the corresponding LRP probability map. In this case, we are seeking the LRP from the injection point located in the lower left area, that is, (*x*_{1},*x*_{2},*x*_{3})=(1,0.6,0.5), to the production point located in the upper right area, that is, (*x*_{1},*x*_{2},*x*_{3})=(5,2.4,0.5). The synthetic true field is the same as before (i.e., the field shown in Figure 4). Again, the results displayed in Figure 11 show that the methodology is capable of identifying the trajectory of the LRP after six iterations and 42 horizontal sampling locations.

As done in the benchmark scenario (see section 4.3), we perform a Monte Carlo analysis of the flow and transport simulations using an RSF conditioned to the conductivity extracted at the sampling locations given by the iterative strategy and at the sampling locations placed along a regular grid. Both cases have the same number of conditioning points (i.e., 42 horizontal sampling locations). For this simulation, we enforce closed boundary conditions, and the injecting and producing wells operate at the same constant flux rate. The particles are instantaneously released at the injection point. All the other parameters are the same as in the benchmark case. As shown in Figure 12, the mass BTC uncertainty estimated through the iterative sampling is smaller when compared to the simulations conditioned on the regular sampling grid (compare left and right graphs presented in Figure 12). In this case, we observed a 42% reduction of the *t*_{1%} standard deviation when compared to the regular sampling strategy.

## 5 Summary

In this work, we employ a graph theory framework to compute the MHR and LRP in the subsurface environment. These quantities can be used to provide information regarding the hydraulic connectivity of a heterogeneous porous formation which is related to solute first arrival times.

Due to the computational efficiency of the framework, we perform a systematic study to investigate the impact of the random heterogeneous structure of the porous medium on the statistical properties of the MHR. We show how the conceptualization of the RSF model impacts the statistical distribution and spread of the MHR. Among several models used in the literature, we focus on multi-Gaussian as well as non-multi-Gaussian log conductivity fields. Within the non-multi-Gaussian field category, we consider two-facies (binary) fields and fields displaying well-connected and disconnected features. Results show that the RSF model has a remarkable impact on the average MHR and the MHR uncertainty. Furthermore, we illustrate how the dimension of the physical domain (i.e., 2-D vs. 3-D) plays a key role in controlling the statistics of the MHR and the connectivity structure of the porous formation. We show how the choice of both the RSF model and dimensionality impacts the extremes of the MHR statistics. As expected, our analysis shows that 3-D fields display consistently lower MHR (on average) when compared to their 2-D counterparts. The graph theory-based framework presented in this analysis can be used by practitioners to efficiently evaluate the impact of assumptions associated with a specific RSF model and dimensionality on connectivity.

Lastly, we propose an iterative strategy for data acquisition to improve the delineation of preferential channels in aquifers. Instead of evenly distributing the sampling locations, our method allows for the sampling campaign to be divided into multiple iterations. In each iteration, new sampling locations are chosen with the goal of minimizing the uncertainty of the LRP. Since the LRP can be computed efficiently for both 2-D and 3-D fields, the computational overhead introduced by the intermediate steps is negligible. We evaluate this strategy using a synthetic field and perform a comparison with a simpler (i.e., regular) strategy, where sampling locations are chosen on a regularly spaced grid. The iterative strategy leads to a reduction of the LRP uncertainty when compared to the regular strategy, since most of the sampling locations are placed along the true LRP. Moreover, since the LRP is related to the trajectory of the front of the solute plume, the uncertainty on first arrival time is also decreased. Our computational analysis shows a 47% decrease in first arrival time uncertainty as a result of the new sampling strategy. In our benchmark model, we used a multi-Gaussian random field to represent the hydraulic conductivity uncertainty. However, due to the flexibility of the algorithm used to compute the MHR and LRP, the iterative sampling strategy can be used with any RSF that supports hard conditioning (i.e., setting the *K* values at the sampling locations). Casting the iterative sampling methodology within an optimal design framework for the identification of the LRP is a topic of future research.

## Acknowledgments

The authors gratefully acknowledge the financial support by the National Science Foundation under Grant 1654009. There are no data-sharing issues given that this work is computational, and all numerical results are reported in the tables and figures. The first author acknowledges the financial support from the USC Provost's PhD Fellowship.

## Appendix A: Graph Theory Algorithm to Compute MHR and LRP

To compute the MHR and LRP, we convert the *K*-field into a graph. We place the nodes at the center of each cell, and neighbor nodes are connected by edges. Each edge *e* has a weight equal to the hydraulic resistance along the line connecting the nodes *i* and *j*:

$$w_{e} = \frac{|\mathbf{r}_{ij}|}{K_{i}} + \frac{|\mathbf{r}_{ji}|}{K_{j}}, \tag{A1}$$

where *K*_{i} and *K*_{j} are the hydraulic conductivities of cells *i* and *j*, and |**r**_{ij}| is the length of the segment connecting the coordinates of node *i* with either the center of the face between the cells *i* and *j* or the corner coordinates between *i* and *j*. Therefore, it is possible to approximate Equation 1 between two points **s** and **t** as follows:

$$\mathrm{MHR}(\mathbf{s},\mathbf{t}) \approx \min_{P} \sum_{e \in P} w_{e}, \tag{A2}$$

where the minimum is taken over all paths *P* in the graph connecting **s** and **t**. Equation A2 is equivalent to the well-known shortest path problem, which can be efficiently solved by using Dijkstra's algorithm. To extend the computation to multiple starting and target vertices, two virtual vertices need to be added to the graph. The first is connected to the starting vertices using edges with zero resistance. The second is connected to the target vertices in a similar way. These two points will be the new start and target points for Dijkstra's algorithm. For additional details about the methodology, we refer the reader to Rizzo and de Barros (2017).
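The procedure described in this appendix can be illustrated with a short sketch. It is not the implementation of Rizzo and de Barros (2017); it assumes a 2-D grid of uniform cell size `dx`, face neighbors only (no corner connections), start vertices along the left column, and target vertices along the right column. The function name `mhr_and_lrp` is hypothetical. Each edge weight is the resistance of the two half-segments joining neighboring cell centers, and the two virtual vertices are attached with zero-resistance edges.

```python
import heapq


def mhr_and_lrp(K, dx=1.0):
    """Minimum resistance and least resistance path, left column to right column.

    K is a list of rows with K[i][j] > 0. Assumptions of this sketch:
    uniform cell size dx and 4-connected (face) neighbors only.
    """
    n_rows, n_cols = len(K), len(K[0])
    n_cells = n_rows * n_cols
    source, target = n_cells, n_cells + 1  # virtual start and target vertices

    def edges(node):
        if node == source:  # zero-resistance edges to every start cell
            return [(i * n_cols, 0.0) for i in range(n_rows)]
        if node == target:
            return []
        i, j = divmod(node, n_cols)
        out = [(target, 0.0)] if j == n_cols - 1 else []
        for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= a < n_rows and 0 <= b < n_cols:
                # resistance of the two half-segments between cell centers
                weight = 0.5 * dx / K[i][j] + 0.5 * dx / K[a][b]
                out.append((a * n_cols + b, weight))
        return out

    # Dijkstra's algorithm with a binary heap
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if node == target:
            break
        for nxt, w in edges(node):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))

    # Walk back from the virtual target to recover the path as (row, col) cells
    path, node = [], prev[target]
    while node != source:
        path.append(divmod(node, n_cols))
        node = prev[node]
    path.reverse()
    return dist[target], path


# Toy field: a high-conductivity channel along the middle row
K_field = [[1.0, 1.0, 1.0], [10.0, 10.0, 10.0], [1.0, 1.0, 1.0]]
mhr, lrp = mhr_and_lrp(K_field)
```

For the toy field, the path follows the middle row, as expected for a channel that is ten times more conductive than its surroundings. Extending the sketch to 3-D or to corner connections only changes the neighbor list and the segment lengths.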

## Appendix B: Flow and Transport Dimensionless Equations

Here, *a* is the practical range, and the remaining quantities entering the scaling are the original (dimensional) coordinates, the effective porosity, the hydraulic conductivity and its geometric mean, the hydraulic head, and Darcy's velocity. Using Equation B1, it is possible to rewrite the flow equation in dimensionless form.

The domain has dimensions of *L*_{1}×*L*_{2}×*L*_{3} ranges and permeameter-like boundary conditions (i.e., zero flux, Neumann-type boundary conditions in the *x*_{2} and *x*_{3} directions and Dirichlet boundary conditions along the *x*_{1} direction). The hydraulic head difference in the *x*_{1} direction is set such that Δ*h*/*L*_{1}=1. The spatially fluctuating velocity field **u** is obtained through the use of the dimensionless Darcy's law.

Flow is numerically solved using MODFLOW (Harbaugh, 2005) and FloPy (Bakker et al., 2016).

*x*_{1}=0. We introduce the following dimensionless quantities:

The advection-dispersion equation above is solved using the GPU-accelerated random walk particle tracking code PAR^{2} (Rizzo et al., 2019).