# Hydrogeological Modeling and Water Resources Management: Improving the Link Between Data, Prediction, and Decision Making

## Abstract

A risk-based decision-making mechanism capable of accounting for uncertainty regarding local conditions is crucial to water resources management, regulation, and policy making. Despite the great potential of hydrogeological models in supporting water resources decisions, challenges remain due to the many sources of uncertainty, as well as the difficulty of making and communicating decisions mindful of this uncertainty. This paper presents a framework that utilizes statistical hypothesis testing and an integrated approach to the planning of site characterization, modeling prediction, and decision making. Benefits of this framework include aggregated uncertainty quantification and risk evaluation, simplified communication of risk between stakeholders, and improved defensibility of decisions. The framework acknowledges that obtaining absolute certainty in decision making is impossible; rather, the framework provides a systematic way to make decisions in light of uncertainty and determine the amount of information required. In this manner, quantitative evaluation of a field campaign design is possible *before* data are collected, beginning from any knowledge state, which can be updated as more information becomes available. We discuss the limitations of this approach in terms of the types of uncertainty it can recognize and make suggestions for addressing the remainder. This paper presents the framework in general and then demonstrates its application in a synthetic case study. Results indicate that the effectiveness of field campaigns depends not only on the environmental performance metric being predicted but also on the threshold value in decision-making processes. The findings also demonstrate that improved parameter estimation does not necessarily lead to better decision making, thus reemphasizing the need for goal-oriented characterization.

## Key Points

- We present the risk-based data acquisition design evaluation (RDADE) framework to integrate stochastic analyses with defensible decisions
- We find that improved parameter estimation does not always guarantee lower decision risk
- When dealing with extreme events with high consequences, it is advantageous to improve system resiliency in addition to modeling accuracy

## 1 Introduction

Providing water in sufficient quantity and quality to meet growing demands for agricultural, industrial, and municipal uses is an increasingly complex challenge, requiring water resources managers to make difficult decisions regarding the selection, maintenance, and treatment of water sources in light of increasing scarcity and prevalence of contamination, in both surface water and groundwater. In the United States, groundwater provides drinking water for nearly 150 million people, yet over 20% of the nation's groundwater samples have had at least one contaminant present at levels potentially harmful to human health (DeSimone et al., 2014).

### 1.1 Environmental Performance Metrics and Water Resources Management

Sustainable groundwater management requires managers to make decisions based on answers to crucial questions regarding the quantity and quality of groundwater resources. For example, a water district manager needs to make decisions on when and where to divert water to storage facilities, so that the district has water that is suitably clean (i.e., that all contaminant concentrations are below the limit of the treatment system) and sufficiently ample (i.e., that there is an acceptable low risk of failure to supply the fluctuating domestic water demand). In these cases, the hydrological/hydrogeological variable(s) involved in these types of questions may be the arrival time of a contaminant plume at a water intake, the groundwater and contaminant flux passing through a specific area over a given period, or the contaminant concentration at a specific location or time. Such variables have been referred to as environmental performance metrics (EPMs; De Barros et al., 2012), the prediction of which helps water managers answer the aforementioned questions.

### 1.2 Field Data, Uncertainty, and Risk

Answering the above types of questions with complete certainty is impossible in practice due to the many challenges in site characterization, modeling, and decision making. Farber and Findley (2010) acknowledged the inevitability of this uncertainty, writing, “In many situations, safety cannot be absolute but must entail an acceptable level of risk, however and by whomever that level may be defined” (pp. 172–173). USEPA (2014) also described the importance of acknowledging uncertainty in management decisions, stating that “if uncertainty and variability have not been well characterized or acknowledged, potential complications arise in the process of decision-making” (p. 6).

Uncertainty could arise from several sources. Among all types of uncertainty, perhaps the one most commonly addressed and studied concerns the uncertainty of the parameters in the model(s) applied to estimate the EPMs, and how uncertainty in these parameters creates uncertainty in EPM predictions. Typically, these uncertainties are reduced by characterizing the site by acquiring field data and then using the data in inverse modeling to estimate parameters and the uncertainties in the estimates. Substantial research has developed methodologies for these processes and thus is not the focus of this paper, but more information can be found in, for example, Hubbard and Rubin (2000), Kowalsky et al. (2004), Chen et al. (2004), Hou and Rubin (2005), Rubin et al. (2010), and Savoy, Heße, et al. (2017).

Due to the inherent assumptions and the conceptual setup of each model, it is also important to consider uncertainty in conceptual models and assumptions, as mentioned by, for example, Refsgaard et al. (2007). A more detailed discussion of the sources and types of uncertainty, along with how they can be addressed, is provided in section 5.

Given these uncertainties, two key questions arise:

- How much uncertainty is acceptable?
- How does one obtain the necessary information for such an acceptable level of uncertainty?

The answer to the first question is nontrivial and ideally would be determined by considering costs (i.e., how costly are data and analyses) and consequences (i.e., what is at stake) with input from multiple stakeholders including regulators, site owners, site users, and the general public. Thus, the answer to question 1 is beyond the scope of this paper. The goal of this paper is to provide a framework by which the second question can be answered.

The answer to the second question is closely related to the spatiotemporal scale of interest (e.g., whether we are interested in the instantaneous peak contaminant concentration or the total contaminant mass over an area or period); the physical response being modeled (e.g., whether we are interested in the spread of the contaminant plume or the residual concentration after the main plume has been carried away by advection); and the efficacy of the sampling campaign where the data are obtained.

### 1.3 Previous Work and Remaining Challenges

The efficacy of a sampling campaign can be hard to define due to the complex process by which the sampled data are used to condition the conceptual, statistical, and/or spatiotemporal variability models, all of which is a precursor to making EPM predictions and, ultimately, decision making. Abellan and Noetinger (2010) demonstrated a method for optimizing subsurface field campaign designs. However, the objective by which “optimal” was defined was inferring the most accurate geostatistical model (Figure 1, arrow A) and not necessarily the most accurate EPM predictions or most successful decision making. It is this added complexity that highlights the importance of a goal-oriented site characterization (e.g., De Barros & Rubin, 2008; De Barros et al., 2012; Nowak et al., 2010; Savoy, Kalbacher, et al., 2017).

There is often no direct interplay between hydrogeological considerations and other considerations related to management, regulation, and the general public. However, in some applications, such as when assessing a health risk to a potentially exposed population, hydrogeological characterization and modeling play only one part in the overall risk assessment (De Barros & Rubin, 2008; Maxwell et al., 1999; Rubin et al., 2018). It is thus important to adopt a goal-oriented perspective (Figure 1, arrows B, C, and D), where considerations regarding all aspects revolve around the key management variable—the risk of making a wrong decision—which, in turn, shape the sampling campaign design. The conceptual difference between goal orientation and parameter orientation can be exemplified by the different definitions of what a “good” campaign design entails. For instance, Abellan and Noetinger (2010) presented a method of optimizing field campaigns based on the associated information gain, defined as the Kullback-Leibler divergence of the posterior distribution with respect to the prior distribution of the geostatistical parameters. The concept here is to maximize the information gain in order to have a better understanding of the distribution of the parameters of interest. On the other hand, Nowak et al. (2012) integrated optimization with hypothesis testing, where optimality was defined by the lowest decision risk at a fixed cost of a sampling campaign or, inversely, the lowest cost to achieve an acceptably low risk.

However, a challenge exists when applying the approach presented by Nowak et al. (2012). As illustrated in Figure 1, the main feature of goal-oriented design is the inclusion of considerations from management, regulation, and the general public. Under the framework proposed by Nowak et al. (2012), any constraints or considerations must be quantified and codified into an optimization algorithm, which can be challenging or impossible, due to the elaborate relationship that exists between industry, society, government, and the general public (Brulle & Pellow, 2006; Rubin et al., 2018). After arriving at the optimal design, it must then be checked for conformity to all the applicable noncodifiable constraints and considerations on an ex post facto basis. To that end, it is likely that some quantitatively suboptimal designs are in fact qualitatively “better,” due to greater compliance with noncodifiable rules. This motivates the use of a proposal-evaluation approach, discussed in the following subsection, as opposed to an automated optimization algorithm. In addition to codifiability, challenges remain related to the subjectivity of some considerations as described above. For example, it is difficult to decide whether to save $10,000 on sampling for a 1% risk increase—there is no “correct” answer, but rather a complex interplay between regulations, socioeconomics, politics, and hydrogeology as mentioned before. This is why we consider the first question above to be outside the scope of this paper.

### 1.4 Present Contribution

To address the aforementioned challenges, this paper presents the risk-based data acquisition design evaluation (RDADE) framework. RDADE facilitates communication concerning uncertainty and risk between multiple stakeholders, including managers, regulators, and the general public in addition to hydrogeologists (see Figure 1). The idea is that when hydrogeologists communicate forecasts, uncertainty, and decisions to other stakeholders, the focus should be on the operational decisions along with the consequence and probability of an erroneous decision. Thus, RDADE provides a mechanism to summarize the uncertainty from all steps, including site characterization, inverse modeling, and forward modeling. In this way, the focus of the communications can revolve around environmental risk and making operable decisions based on a balance between consequence of erroneous decisions and the probability of such errors.

Rather than relying on an automated optimization algorithm, RDADE adopts a proposal-evaluation approach, in which a set of candidate designs is proposed and then evaluated. This approach has several benefits:

- It allows for adherence to any number of constraints and considerations, regardless of codifiability and quantifiability.
- It reduces computational effort, since the objective function is evaluated only on a few feasible proposals, rather than numerous times throughout the execution of an optimization algorithm.
- It provides decision makers a list of agreed-upon and feasible alternative designs, which thus allows for the transparent treatment of subjective considerations and weighing costs of increased certainty versus acceptable probability of erroneous decisions.
- It allows more flexible treatment of epistemic uncertainty, especially in nontechnical components of decision making (see section 5 for a detailed discussion).

There are many scenarios where the RDADE framework could be applied. Examples include determining whether or not a migrating contaminant plume will reach nearby water supplies, determining whether contaminant concentration will exceed the Maximum Contaminant Level (MCL, the highest concentration of chemicals permitted in drinking water systems), determining how large a well protection zone should be, or designing flood control structures (e.g., Bolster et al., 2009, 2013; Enzenhoefer et al., 2012; Frind et al., 2006). This paper presents the RDADE framework and demonstrates its use in a synthetic case study predicting contaminant arrival time at a water intake. Since the focus of the paper is demonstration of RDADE, the case study is relatively simple in order to allow conciseness and clarity.

The remainder of this paper is structured as follows. Section 2 presents the theoretical framework for RDADE. Section 3 provides an overview of a synthetic case study. Section 4 presents our findings and offers a discussion of the case study. Section 5 provides a detailed discussion of the various sources of uncertainty that can be accounted for by RDADE and offers suggestions on how to account for the others. Lastly, section 6 provides the summary and conclusions of this work.

## 2 Theoretical Framework: RDADE

This section presents the details of the RDADE framework, which involves two nested levels of hypothesis testing. In the first level, hypothesis testing is used to make a decision about the relevant environmental system, using all available data. In the second level, we randomize the environmental system in order to probabilistically evaluate the effectiveness of a proposed data acquisition design.

### 2.1 First Level: Conditional Decision Making

Consider an environmental performance metric, *EPM*, and its critical value (*EPM*_{critical}). The “critical range” is the range of EPM values that would pose a problematic or dangerous condition (e.g., a concentration above a regulatory threshold or arrival before degradation). With this, we define an environmental risk indicator variable as follows:

$$I = \begin{cases} 1, & EPM \in \text{critical range} \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

The two competing hypotheses are

$$H_0: I = 1 \tag{2}$$

$$H_1: I = 0 \tag{3}$$

indicating that the risk indicator *I* is the subject of these hypotheses. The binary nature of the hypotheses and the decision making leads to the typical confusion matrix, which consists of four possibilities based on both what is actually true and what we determine to be true, that is, the decision made. Note that with this setup, the burden of proof is on *H*_{1}; namely, the desirable, risk-free scenario must be supported by convincing evidence before it is accepted. This aligns with safe water resources management: If the safety of our water supply is in question, we remain suspicious until we can reliably demonstrate that it is in fact safe.

A decision is then made at a prescribed level of significance (*α*): *H*_{0} is rejected in favor of *H*_{1} only if the probability of the risky scenario, conditional on all available information, falls below *α*.

It is worth noting that *α* is not determined by any engineering calculation or modeling prediction. Rather, *α* is determined by regulation or policy to strike the proper balance between acceptable levels of uncertainty and characterization costs. In other words, if the consequences of a certain type of error are catastrophic, a very low value of *α* would be used. In this way, even if such a catastrophic event were to happen, the decision makers can demonstrate adherence to well-defined probabilistic criteria throughout the process. Referring again to Figure 1, this can be represented by arrows B, C, and D. While not all stakeholders are equipped for detailed discussions of model parameterizations, all stakeholders can participate in discussions weighing costs and benefits related to characterization and risk and determining what levels of certainty are acceptable and at what cost. In an ideal scenario, these probabilistic criteria would be defined by regulation with input by all stakeholders (Rubin et al., 2018).

Suppose we have a set of field observations (*g*), which may be a combination of field-scale tests, laboratory tests, or other remote sensing information. The conditional probability of *I* given the data is denoted as *Pr*[*I*=1|*g*]=⟨*I*|*g*⟩. Depending on the *EPM* under consideration, the amount of information available about the site, and the type of measurements contained in *g*, the computation of ⟨*I*|*g*⟩ can involve several steps. These steps can include consideration of alternative conceptual models, inference of the geostatistical parameters of the relevant variables (e.g., hydraulic conductivity), and simulation of physical processes. An informed decision can then be made, represented by the decision indicator variable, *D*^{g}, shown as follows:

$$D^{g} = \begin{cases} 0, & \langle I \mid g\rangle < \alpha \\ 1, & \text{otherwise} \end{cases} \tag{4}$$
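The first-level decision rule can be sketched in a few lines of Python. This is an illustrative sketch only: `epm_samples` stands in for a hypothetical Monte Carlo ensemble of conditional EPM predictions given *g*, and the critical range is assumed to be values above a threshold; none of these names come from the paper.

```python
import numpy as np

def decide(epm_samples, epm_critical, alpha=0.05):
    """First-level hypothesis test (illustrative sketch).

    epm_samples : conditional Monte Carlo ensemble of the EPM given data g
    epm_critical: assumed threshold; values above it are "critical"
    Returns (p_risky, D_g): p_risky approximates <I|g> = Pr[I=1|g];
    D_g = 0 (reject H0, declare safe) only if p_risky < alpha.
    """
    I = (np.asarray(epm_samples) > epm_critical).astype(int)  # eq. (1) per sample
    p_risky = I.mean()                                        # <I|g>
    D_g = 0 if p_risky < alpha else 1                         # decision indicator
    return p_risky, D_g

# Usage: a toy ensemble in which very few predictions exceed the threshold
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.5, size=10_000)
p, D = decide(samples, epm_critical=2.5, alpha=0.05)
```

Here the estimated probability of the risky scenario is far below *α*, so *H*_{0} is rejected and the site is declared safe (*D*^{g} = 0).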

### 2.2 Second Level: Probabilistic Sampling Campaign

To evaluate a sampling campaign *before* the observations are taken, we ascend to the second level of hypothesis testing by treating sampling campaigns probabilistically. Suppose that there are *N*_{G} proposed sampling campaign designs that are scientifically sound and practically feasible, each of which is unique, with the corresponding observations modeled as a random variable, *G*_{j}; *j*=1,…,*N*_{G}. If the *j*th design is adopted, the collected data are denoted by *g*_{j}, which also represents a realization of *G*_{j} in accordance with conventional random variable notation. To test the efficacy of any design, we define the following two error indicator variables:

$$E_{\alpha}^{g_j} = \begin{cases} 1, & D^{g_j} = 0 \text{ and } I = 1 \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

$$E_{\beta}^{g_j} = \begin{cases} 1, & D^{g_j} = 1 \text{ and } I = 0 \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

where *E*_{α}^{g_j} indicates the occurrence of a Type I error (i.e., erroneously rejecting *H*_{0}) given significance level *α*; the same holds for *E*_{β}^{g_j} and a Type II error (i.e., erroneously failing to reject *H*_{0}). A schematic diagram is provided in Figure 2, which summarizes the hypotheses, the conditional probabilities, and all the indicator variables in a conventional confusion matrix format.

The decision risk associated with design *G*_{j} is defined as the weighted probability of decision error,

$$R(G_j) = w_{\alpha}\Pr[E_{\alpha}^{G_j} = 1] + w_{\beta}\Pr[E_{\beta}^{G_j} = 1] \tag{7}$$

where *w*_{α} and *w*_{β} are weight coefficients selected to quantify the relative significance of Type I and Type II errors, respectively. The second-level hypotheses are then

$$H_0^{G_j}: R(G_j) > R_{crit}; \qquad H_1^{G_j}: R(G_j) \le R_{crit} \tag{8}$$

where *R*_{crit} is the maximum allowable decision risk (which, like *α*, can be determined by regulation or policy) and the superscript of the hypotheses indicates that these are the hypotheses regarding the field campaign design, *G*_{j}. In other words, *H*_{0}^{G_j} is the second-level null hypothesis indicating that *G*_{j} is insufficient to ensure an appropriate test of *H*_{0}, on which our water resources management decision depends. The alternative hypothesis, *H*_{1}^{G_j}, indicates that the design *G*_{j} is indeed adequate to enable defensible decision making. Note that in equations 5 and 6, lowercase *g*_{j} is used because they consider a set of data from a single field, while in equations 7 and 8 uppercase *G*_{j} is used because they consider the observations probabilistically.

In addition to the subjectivity in *α* and *R*_{crit}, here we further emphasize the subjectivity in determining the weights *w*_{α} and *w*_{β}. In other words, we now ask not only what probability of error is acceptable but also what probability of *each type* of error is acceptable. While the obvious interest is keeping water supplies safe, which would motivate a high weight to *w*_{α}, this comes at a cost of being overly conservative, which may have other negative effects. For instance, in some cases a site user (e.g., a farm or factory) may be required to unnecessarily decrease production if *w*_{α} is too high and *w*_{β} is too low, which could cause detrimental economic impacts.
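As a concrete sketch of how the weighted decision risk could be evaluated from simulation output (array names and shapes are our own assumptions, not the paper's notation):

```python
import numpy as np

def decision_risk(E_alpha, E_beta, w_alpha=1.0, w_beta=0.0):
    """Weighted decision risk for one campaign design (illustrative sketch).

    E_alpha, E_beta : arrays of 0/1 error indicators, one entry per
                      simulated baseline field (Type I and Type II errors).
    w_alpha, w_beta : weights expressing the relative significance of the
                      two error types.
    """
    p_alpha = np.mean(E_alpha)  # estimated probability of a Type I error
    p_beta = np.mean(E_beta)    # estimated probability of a Type II error
    return w_alpha * p_alpha + w_beta * p_beta

# Usage: 3 Type I errors and 10 Type II errors out of 100 simulated fields;
# with w_beta = 0, only the Type I errors contribute to the risk
E_a = np.array([1] * 3 + [0] * 97)
E_b = np.array([1] * 10 + [0] * 90)
risk = decision_risk(E_a, E_b, w_alpha=1.0, w_beta=0.0)
```

Raising *w*_{β} above zero would penalize overly conservative designs as well, reflecting the trade-off discussed above.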

### 2.3 Implementation of RDADE: Simulation-Based Hypothesis Testing

In this subsection, we present a simulation-driven method to practically implement RDADE. The necessity of simulation lies in the fact that, in reality, the EPM(s) are unknown and, hence, we cannot determine the environmental risk indicator variable in the first place (equation 1). The simulation-based hypothesis testing method presented here allows us to implement RDADE while accounting for the uncertainties rooted in the fact that the EPM(s), as well as the models and parameters needed to estimate them, are unknown.

Given that RDADE allows for some subjectivity based on any regulatory or managerial considerations, for demonstration purposes, here we assume a case where falsely assuming the safety of water supplies is more consequential than being overly conservative. Hence, *w*_{α} would take a value of 1, while *w*_{β} would take a value of 0. In addition, we also assume that *R*_{crit}=*α* for simplicity.

The implementation of RDADE starts with generating an ensemble of *N*_{Y} baseline fields, denoted by *Y*_{i}^{b}; *i*=1,…,*N*_{Y}, where *N*_{Y} is selected to be sufficiently large to represent the entire range of physically plausible possibilities of the site of interest. Each *Y*_{i}^{b} is a field of all parameters necessary to compute the EPM(s). For example, for a groundwater flow/transport modeling problem the ensemble could include spatially variable hydraulic conductivity, porosity, and geochemical parameters. The random generation of baseline fields can be done using any method deemed appropriate for the parameters of interest and their spatial structures. The baseline fields can be generated conditional to any knowledge state prior to the sampling campaign; in a Bayesian context, this knowledge state is represented by the prior information on which the prior distribution is based. RDADE, in general, allows for the possibility of competing conceptual models, which may necessitate the generation of multiple ensembles of baseline fields.

Each baseline field *Y*_{i}^{b} is used for two purposes. The first is the computation of the baseline EPM (*EPM*_{i}). Depending on the application, computing the EPM may involve any number of models or transfer functions (e.g., hydrological, geochemical, and biological). In other words, *EPM*_{i} represents the value of the EPM that would occur if *Y*_{i}^{b} were a true representation of the site under consideration. By comparing *EPM*_{i} with *EPM*_{critical} and following equation 1, we can define the environmental risk indicator variable for each baseline field, *I*_{i}^{b}. The addition of the subscript and the superscript for *I*_{i}^{b} indicates that it is the environmental risk indicator variable if *Y*_{i}^{b} were a true representation of the site in question.

The second use of *Y*_{i}^{b} is to simulate the field campaign and the resulting decision making. This involves simulating the collection of data, *g*_{ij}, where the subscripts indicate the adoption of sampling campaign design *G*_{j} and the assumption of *Y*_{i}^{b} being true. In simple cases where the quantity being measured by *G*_{j} is a component of the field *Y*_{i}^{b} (e.g., hydraulic conductivity), the information in *g*_{ij} is merely that quantity from the locations in the field specified by the measurements. If the quantity being measured is not a direct component of *Y*_{i}^{b}, then some numerical simulations may be necessary to compute *g*_{ij}.

After the simulation of *g*_{ij}, we move on to simulate the decision making that would result. In most cases, this involves many steps, including inferring parameters, distinguishing between conceptual models, and forward modeling. RDADE can be applied to any combination of conceptual models, as well as any form of inverse modeling for the model or parameter inference. If there are multiple competing site conceptual models and/or forward models, Bayesian model averaging can be used to simulate a model-averaged EPM, which provides a model-averaged estimate of decision risk. The details of the inverse and forward modeling processes, however, are specific to each application and thus are not the focus of the present study.
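One crude way to realize such model averaging is to pool the conditional EPM ensembles of competing models in proportion to their posterior model probabilities. The sketch below assumes exactly that setup; the function and variable names are hypothetical, and the pooling-by-resampling scheme is one of several valid choices.

```python
import numpy as np

def model_averaged_epm(epm_ensembles, model_probs, rng=None):
    """Pool conditional EPM ensembles from competing conceptual models
    (illustrative Bayesian-model-averaging sketch).

    epm_ensembles : list of 1-D arrays, one conditional EPM ensemble per model
    model_probs   : posterior model probabilities (assumed to sum to 1)
    Draws from each model's ensemble in proportion to its posterior weight.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = min(len(e) for e in epm_ensembles)
    counts = np.round(np.asarray(model_probs) * n).astype(int)
    pooled = np.concatenate([
        rng.choice(np.asarray(ens), size=c, replace=True)
        for ens, c in zip(epm_ensembles, counts)
    ])
    return pooled

# Usage: two competing models with posterior weights 0.7 / 0.3
rng = np.random.default_rng(1)
m1 = rng.normal(10.0, 1.0, 1000)
m2 = rng.normal(12.0, 1.0, 1000)
pooled = model_averaged_epm([m1, m2], [0.7, 0.3], rng)
```

The pooled ensemble can then be thresholded against *EPM*_{critical} exactly like a single-model ensemble, yielding a model-averaged estimate of ⟨*I*|*g*⟩.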

The simulated data, *g*_{ij}, serve a dual role in RDADE: (1) as data for the inference of the geostatistical parameters and (2) as conditioning points in the forward modeling. In most cases, this involves simulating an ensemble of conditional fields, *Y*_{ij}^{c}, and executing the relevant forward model(s) to obtain the EPM predictions. Note that the subscripts *i* and *j* denote that the field is conditioned on *g*_{ij}, and the superscript *c* indicates that this is a conditional field, not a baseline field. With the ensemble of conditional EPM predictions (derived from *Y*_{ij}^{c}), we then obtain an ensemble of conditional environmental risk indicator variables, *I*_{ij}^{c}, determined by whether the EPM is within the critical range. The probability that the null hypothesis is true, conditional to the simulated field data, is found by the following equation:

$$\langle I \mid g_{ij}\rangle = \Pr[H_0 \mid g_{ij}] = \frac{1}{N_c}\sum_{k=1}^{N_c} I_{ijk}^{c} \tag{9}$$

where *N*_{c} is the number of conditional realizations. Afterward, by following equation 4, we can determine the decision indicator variable, *D*^{g_{ij}}, which represents the decision we would have made, assuming that *Y*_{i}^{b} was the real baseline field and that the only information we had was *g*_{ij}.

With *D*^{g_{ij}} and *I*_{i}^{b} in hand, we can test whether *G*_{j} is a successful campaign design. This is done by first determining the probabilities of the errors by averaging over the baseline fields:

$$\Pr[E_{\alpha}^{G_j} = 1] = \frac{\sum_{i=1}^{N_Y} E_{\alpha}^{g_{ij}}}{\sum_{i=1}^{N_Y} I_i^{b}} \tag{11}$$

with the analogous expression holding for the Type II error. The resulting decision risk *R*(*G*_{j}) (equation 7) is then compared with *R*_{crit} to test the second-level hypotheses (equations 12 and 13). Rejecting *H*_{0}^{G_j} indicates that *G*_{j} is sufficient for testing the hypothesis (note that there is no index of *i* here because we have averaged over the baseline fields), while the failure to reject indicates the opposite. The entire implementation process is graphically summarized in a flowchart (Figure 3).
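The loop just described can be sketched end to end. This is a schematic under strong simplifying assumptions: `conditional_epm_fn` is a hypothetical placeholder for the whole inverse-plus-forward modeling chain, the critical range is values above a threshold, and we take *w*_{α}=1, *w*_{β}=0 as in the demonstration below, so only Type I errors are tallied.

```python
import numpy as np

def evaluate_design(baseline_epms, conditional_epm_fn, epm_critical,
                    alpha=0.05, r_crit=0.05):
    """Second-level test of one campaign design (illustrative sketch).

    baseline_epms      : EPM_i for each baseline field (1-D array)
    conditional_epm_fn : callable i -> conditional EPM ensemble given the
                         data simulated on baseline field i (placeholder
                         for inverse + forward modeling)
    Returns (p_type1, adequate): estimated conditional Type I error
    probability and whether the design is deemed adequate, assuming
    w_alpha = 1 and w_beta = 0.
    """
    I_base = (np.asarray(baseline_epms) > epm_critical).astype(int)
    errors = []
    for i, risky in enumerate(I_base):
        if not risky:
            continue                      # Type I errors require I_i = 1
        cond = conditional_epm_fn(i)      # conditional EPM ensemble
        p_risky = np.mean(cond > epm_critical)
        D = 0 if p_risky < alpha else 1   # first-level decision, eq. (4)
        errors.append(1 if D == 0 else 0) # declared safe, but actually risky
    p_type1 = float(np.mean(errors)) if errors else 0.0
    return p_type1, p_type1 <= r_crit

# Usage with a toy stand-in for the modeling chain: the "posterior" is the
# baseline value plus noise, so errors on truly risky fields are rare
rng = np.random.default_rng(2)
base = rng.normal(1.0, 1.0, 200)
cond_fn = lambda i: base[i] + rng.normal(0.0, 0.3, 500)
p1, ok = evaluate_design(base, cond_fn, epm_critical=2.0)
```

In a real application, `conditional_epm_fn` would encapsulate simulating *g*_{ij}, parameter inference, conditional field generation, and forward modeling.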

In addition to the conditional error probabilities, the error *occurrence* probabilities, *P*_{α} and *P*_{β}, are considered. These are defined as follows:

$$P_{\alpha} = \Pr[E_{\alpha}^{G_j} = 1]\,\langle I\rangle \tag{14}$$

$$P_{\beta} = \Pr[E_{\beta}^{G_j} = 1]\,(1 - \langle I\rangle) \tag{15}$$

where ⟨*I*⟩ is the expected value of *I* determined from the baseline fields:

$$\langle I\rangle = \frac{1}{N_Y}\sum_{i=1}^{N_Y} I_i^{b}$$

While the conditional error probability (Pr[*E*_{α}^{G_j}=1]) is usually the focus in classical hypothesis testing, in water resources management it makes sense to focus on the error occurrence probability (*P*_{α}). This is because in some cases, it may be practically impossible to predict an event of extremely low probability (e.g., a very early arrival time). In this scenario, the probability of occurrence (⟨*I*⟩) would be very low, but the conditional error probability would be very high due to its conditional nature. In other words, if the risk-posing event has an exceptionally low probability of occurrence, no amount of field data could enable managers to predict this event, which would prevent any course of action from being acceptable, as indicated by the conditional probabilities. This effect is demonstrated in section 4.
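A numerical illustration of this distinction, with hypothetical values chosen by us rather than taken from the case study:

```python
# Hypothetical numbers: an extreme event with very low occurrence probability
mean_I = 0.001             # <I>: probability the risky event occurs at all
p_cond = 0.90              # Pr[Type I error | event occurs]: near-impossible to predict
p_alpha = p_cond * mean_I  # occurrence probability of a Type I error

# The conditional error probability looks alarming (90%), yet a Type I
# error actually occurs in only 0.09% of cases, which may be acceptable.
```

Judging the design by the conditional probability alone would condemn every feasible campaign, whereas the occurrence probability keeps the decision risk in proportion to how often the event can happen at all.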

## 3 Framework Application: Synthetic Case Study

The synthetic case study considers the arrival time, *τ*, of a contaminant plume at a control plane defining a location of interest. This control plane could represent an environmentally sensitive area or a water supply well. The risky scenario would arise if the contaminant plume arrived before a critical amount of time, *τ*_{crit}, passed. This scenario could arise in many real-world applications, such as when evaluating locations for waste disposal sites or when assessing the risk posed by a plume to a nearby supply well. Since early arrivals are of concern, the indicator variable is

$$I = \begin{cases} 1, & \tau < \tau_{crit} \\ 0, & \text{otherwise} \end{cases} \tag{17}$$

The level of significance *α* is 0.05.

### 3.1 Statistical and Physical Setup

Aquifer flow was simulated in a 2-D planar (*x*,*y*) rectangular domain, with constant head boundary conditions along the two boundaries parallel to the *y* axis, and no-flow conditions along the boundaries parallel to the *x* axis. The flow was uniform in the average in the positive *x* direction. The porosity, *n*=0.10, was assumed to be known and homogeneous. The total flow domain was 50 numerical grid cells in both dimensions, with the grid cell size set as a fraction of the integral scale (see below). We assumed the contaminant to have originated from an instantaneous point release, where the time and location of the release were also assumed to be known. The point release was located 40 grid cells upstream of the control plane.

Steady-state groundwater flow was simulated, with the velocity field given by Darcy's law,

$$v(\mathbf{x}) = -\frac{K(\mathbf{x})}{n}\nabla H(\mathbf{x})$$

where *K*(**x**) is hydraulic conductivity, *H*(**x**) is the hydraulic head, *v*(**x**) is water velocity, and *n* is porosity. The unconditional contaminant arrival time was computed with the methods of Schlather et al. (2017) and Pollock (1988). The conditional contaminant arrival times were computed with the method of Rubin (1991).

A set of field measurements was taken to characterize the natural logarithm of hydraulic conductivity, *Y*=ln(*K*), which was modeled as a space random function (SRF) with a stationary multivariate Gaussian distribution and exponential covariance. The measurements were used to estimate the parameters $\theta = (\mu_Y, \sigma_Y^2, I_Y)$, where *μ*_{Y}, $\sigma_Y^2$, and *I*_{Y} represent the mean, variance, and integral scale, respectively. The measurements were also used as conditioning points in the forward modeling.

To investigate the effect of prior information, the case study was executed under three alternative scenarios regarding prior distributions. In Scenario 1, the SRF parameters *θ* were assumed to be deterministically known. In this scenario, no inference of *θ* was necessary, and the measurements served only as conditioning points. In Scenarios 2 and 3, all three parameters were assumed to be distributed uniformly and independently of each other. In Scenario 2, *μ*_{Y} was distributed uniformly in the range [−6,−5]. In Scenario 3, *μ*_{Y} was distributed uniformly in the range [−7,−4]. In both Scenarios 2 and 3, $\sigma_Y^2$ and *I*_{Y} were distributed uniformly in the ranges [0.1,1] and [3,6], respectively. The three prior information scenarios were chosen to provide a range of knowledge states, from deterministic knowledge of SRF parameters (Scenario 1) to probabilistic descriptions of SRF parameters (Scenarios 2 and 3). Scenarios 2 and 3 were selected to represent relatively informative and relatively uninformative knowledge states about the SRF for *Y*, specifically different levels of variability in the mean value. This allowed a closer examination of the relationship between parametric uncertainty, travel time prediction, and decision-making accuracy.

### 3.2 Field Campaign Setup

The field campaigns to be tested were designed with 4, 8, 16, and 32 measurements. For each number of measurements, two alternative designs were tested. One configuration had measurements spread throughout the domain, covering various lag distances with the idea of improving the SRF parameter estimates. The other configuration had all measurements located in the likely area of the travel path. The likely area of the travel path was determined by simulating an ensemble of particle paths conditioned only on the prior information, and lateral displacement was plotted against the distance from the point source. These results are shown in Figure 4. Additionally, the locations of the measurements for all field campaign designs *G*_{j}, *j*=1,…,8 are shown in Figure 5.

### 3.3 Monte Carlo Methodology

In accounting for uncertainty in *θ*, Latin hypercube integration was used in order to reduce the computational burden (Rubin, 2003). The ensemble sizes were selected such that resulting distributions demonstrated stability with respect to ensemble size. For Scenarios 2 and 3, 27 and 81 hypercubes were used, respectively. For each hypercube, traditional Monte Carlo sampling was used to simulate 250 baseline fields from the distribution *f*(*Y*|*θ*), providing a total of *N*_{Y}= 250, 6,750, and 20,250 for Scenarios 1, 2, and 3, respectively. The methods described in the following subsections were implemented for each of the three scenarios.
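A minimal Latin hypercube sketch for the three SRF parameters is shown below, using the Scenario 2 ranges; the implementation details (one point per stratum, independently shuffled across dimensions) are a standard construction rather than the specific one used in the study.

```python
import numpy as np

def latin_hypercube(n, bounds, rng):
    """Draw n Latin hypercube samples over the given parameter bounds.

    bounds : list of (low, high) tuples, one per parameter dimension.
    Each range is split into n equal strata; one point is drawn per
    stratum, and strata are shuffled independently per dimension.
    """
    d = len(bounds)
    # stratified uniforms in [0, 1): row i falls in stratum i
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n
    for k in range(d):
        u[:, k] = rng.permutation(u[:, k])  # decouple strata across dimensions
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    return lo + u * (hi - lo)

# Usage: 27 stratified samples of (mu_Y, sigma_Y^2, I_Y) as in Scenario 2
rng = np.random.default_rng(4)
theta_samples = latin_hypercube(27, [(-6.0, -5.0), (0.1, 1.0), (3.0, 6.0)], rng)
```

Each of the 27 hypercubes then receives its own Monte Carlo ensemble of baseline fields, as described above.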

#### 3.3.1 Baseline Fields and Simulated Field Campaigns

In each scenario, the travel time was computed deterministically for every baseline field, yielding the baseline risk indicator *I*_{i}^{b} via equation 17. After recording the deterministically known travel time, the field campaign was simulated by recording the values of hydraulic conductivity at the locations specified by the field campaign design. After the simulated field data *g*_{ij} were collected, the measurements were used to compute the *maximum a posteriori* estimate (*θ*_{MAP}) of the geostatistical parameters, similar to the maximum likelihood method presented by Kitanidis and Lane (1985), but with bounds provided by the prior distributions. After *θ*_{MAP} was computed, the conditional distribution of travel time *f*(*τ*^{c}|*θ*_{MAP},*g*_{ij}) was computed using semianalytical particle tracking (Rubin, 1991).
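A bounded MAP estimate of this kind can be sketched with a brute-force grid search over the prior support (with uniform priors, MAP coincides with maximum likelihood restricted to the bounds). The Gaussian likelihood below assumes the exponential covariance model of section 3.1; grid search replaces the gradient-based optimization a production code would use, and all names are illustrative.

```python
import numpy as np

def neg_log_lik(theta, coords, y):
    """Negative Gaussian log-likelihood of measurements y at coords,
    under a stationary exponential-covariance model (constants dropped)."""
    mu, var, iy = theta
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = var * np.exp(-h / iy) + 1e-8 * np.eye(len(y))  # covariance + jitter
    r = y - mu
    sign, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + r @ np.linalg.solve(C, r))

def theta_map(coords, y, bounds, n_grid=8):
    """Bounded MAP estimate by brute-force grid search over the prior
    support; adequate for a 3-parameter demonstration."""
    grids = [np.linspace(lo, hi, n_grid) for lo, hi in bounds]
    best, best_nll = None, np.inf
    for mu in grids[0]:
        for var in grids[1]:
            for iy in grids[2]:
                nll = neg_log_lik((mu, var, iy), coords, y)
                if nll < best_nll:
                    best, best_nll = np.array([mu, var, iy]), nll
    return best

# Usage: synthetic measurements at 12 random locations
rng = np.random.default_rng(5)
coords = rng.uniform(0.0, 10.0, size=(12, 2))
y = rng.normal(-5.5, 0.5, size=12)
est = theta_map(coords, y, bounds=[(-6.0, -5.0), (0.1, 1.0), (3.0, 6.0)])
```

The bounds play the role of the uniform priors: the estimate can never leave the prior support, mirroring the bounded maximum likelihood approach described above.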

#### 3.3.2 Simulated Decision Making

For each field campaign *G*_{j}, *N*_{c}=250×*N*_{Y} realizations of *τ*^{c} were computed, which led to *N*_{c} realizations of *I*^{c} via equation 17, and, in turn, ⟨*I*^{c}⟩ was computed using equation 9. From this, *H*_{0} was either accepted or rejected via equation 4. Finally, *H*_{0,G} could be tested based on equations 12 and 13, as a final judgment on the adequacy of the campaign design *G*_{j}. This entire process was then repeated for all eight field campaigns, beginning with the *N*_{c}=250×*N*_{Y} realizations of *τ*^{c}.
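The accept/reject bookkeeping can be mimicked with the short sketch below. The arrival time distributions and the decision threshold `p_dec` are stand-ins (the actual rule is given by equation 4, not reproduced here); the point is the flow from conditional realizations to the occurrence and conditional error probabilities.

```python
import numpy as np

rng = np.random.default_rng(2)
n_fields, n_cond = 1000, 250
tau_crit, alpha, p_dec = 1.0, 0.05, 0.5

# synthetic stand-ins: one "true" arrival time per baseline field and an
# ensemble of conditional realizations scattered around it
tau_true = rng.lognormal(mean=0.0, sigma=0.4, size=n_fields)
tau_cond = tau_true[:, None] * rng.lognormal(0.0, 0.2, (n_fields, n_cond))

I_true = tau_true <= tau_crit                  # binary outcome (cf. equation 17)
p_early = (tau_cond <= tau_crit).mean(axis=1)  # conditional probability per field
reject_H0 = p_early < p_dec                    # assumed stand-in for equation 4

type1 = I_true & reject_H0                     # H0 true but rejected
P_alpha = type1.mean()                         # occurrence probability of Type I error
P_alpha_cond = P_alpha / I_true.mean()         # conditional on H0 being true
adequate = P_alpha_cond <= alpha               # adequacy judgment (cf. equations 12-13)
print(P_alpha, P_alpha_cond, adequate)
```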

## 4 Case Study Results and Discussion

This section presents the results of the case study described in section 3. The results focus on the effects of the critical value of travel time, measurement configurations, and parametric uncertainty, both *a priori* and *a posteriori*. The results from the case study are shown in Figures 6-9. For the reasons described above, we chose to focus on the behavior of *P*_{α} (equations 5 and 11) and *P*_{α|H0} (equation 14), though analogous conclusions could have been made regarding *P*_{β} and *P*_{β|H1} (equations 6 and 15, respectively) as well.

Figures 6a, 6c, and 6e show the resulting values of ⟨*I*⟩ (equation 9), *P*_{α}, and *P*_{α|H0} plotted against *τ*_{crit} for a single measurement configuration (*G*_{1A}) and all three prior information scenarios. The vertical axis here is on a log scale in order to focus attention on the low-probability regions. Recalling the definition of *I* given by equation 17, we see that ⟨*I*⟩ as a function of *τ*_{crit} coincides with the cumulative distribution function of *τ*. Recalling the definition of *P*_{α|H0} (equation 14), we see that it is the quotient of the other two quantities. Figures 6b, 6d, and 6f show the standard deviation of these variables plotted against *τ*_{crit}. Figure 7 shows all eight measurement configurations for comparison but focuses only on *P*_{α|H0} for clarity. The left-hand column shows results from the four campaigns with measurements focused along the travel path, while the right-hand column shows the campaigns with the spread-out measurements, aligning with the two columns in Figure 5. The horizontal line denoted *α* indicates the significance level of 0.05. Values of *P*_{α|H0} exceeding this level indicate that the measurement campaign would be rejected via equation 12, deeming *G* inadequate. In both Figures 6 and 7, *τ*_{crit} is nondimensionalized by the travel path length *L* and expected value of velocity ⟨*v*⟩. Figures 8 and 9 show the root mean square error (RMSE) of the estimates for SRF parameters *μ*_{Y}, *σ*_{Y}, and *I*_{Y} for all eight measurement campaigns for Scenarios 2 and 3.

### 4.1 Effect of *τ*_{crit}

The first thing we notice is that, regardless of the quantity or spatial configuration of measurements, *P*_{α} and *P*_{α|H0} were highly sensitive to *τ*_{crit}. In turn, whether or not *H*_{0,G} could be rejected, and *G* thereby deemed adequate, depended on *τ*_{crit}. For both large and small values of *τ*_{crit}, we found *P*_{α} approaching zero for all measurement configurations, indicating a very low occurrence of a Type I error in these regions. To understand why this was the case, we examined the value of ⟨*I*⟩ for these regions, shown in Figure 6. Recalling that ⟨*I*⟩ as a function of *τ*_{crit} corresponds directly to the cumulative distribution function of the arrival time, a Type I error was very unlikely for relatively small values of *τ*_{crit} due to the low probability of *H*_{0} being true in this region. On the other hand, for relatively large values of *τ*_{crit}, a Type I error was unlikely because it was easier to predict the relatively likely event of the arrival time being smaller than the large *τ*_{crit}. Where a Type I error was more likely, then, was where *H*_{0} was somewhat likely to be true but more difficult to predict correctly: the intermediate portion of the cumulative distribution function of the arrival time.

Recalling the definition in equation 14, the behavior of *P*_{α} and ⟨*I*⟩ with varying *τ*_{crit} explains the behavior of *P*_{α|H0}. Figure 6 shows the relationship between ⟨*I*⟩, *P*_{α}, and *P*_{α|H0} graphically and also highlights the difference between using the occurrence probability and the conditional probability to describe the effectiveness of the field data.
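Assuming, as the discussion above implies, that *H*_{0} corresponds to the event *τ* ≤ *τ*_{crit}, the quotient relation can be written out explicitly:

$$
P_{\alpha|H_0} \;=\; \Pr(\text{Type I error} \mid H_0\ \text{true}) \;=\; \frac{\Pr(\text{Type I error} \,\cap\, \tau \le \tau_{crit})}{\Pr(\tau \le \tau_{crit})} \;=\; \frac{P_{\alpha}}{\langle I \rangle}.
$$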

In the case when *τ*_{crit} was relatively small, *H*_{0} was very unlikely to be true, and if it was true, it became increasingly difficult to predict correctly, regardless of the amount of data. Thus, *P*_{α|H0} could be considered a more informative metric to assess the benefit of the data than *P*_{α}. The relationship between these two variables and the probability of *H*_{0} being true in this case is shown near the vertical axes in Figure 6.

As demonstrated, the effectiveness of a field campaign and the success of decision making are highly dependent on the value of *τ*_{crit}. The practical implication of this result is that for effective, goal-oriented characterization, the field campaign should be tailored not only to the EPM of concern but also to the critical value of this EPM on which decision making depends.

### 4.2 Effect of Measurement Configurations

To compare the overall effectiveness of measurement campaigns, we examine their performance both in inferring SRF parameters (Figure 1, arrow A) and in enabling successful decision making, that is, their resulting *P*_{α|H0} (Figure 1, arrow B). Figures 8 and 9 show the RMSE of the SRF parameter estimates for all baseline fields and all measurement configurations. We find, unsurprisingly, that the parameter estimates improve with the increasing quantity of measurements. For any given number of measurements, the error in the parameter estimates was smaller for the measurements spread throughout the domain than for the measurements focused along the travel path.

As mentioned before, the focus of goal-oriented design is the probability of an incorrect decision. To this end, we use *P*_{α|H0} for comparison, shown in Figure 7. Focusing for the moment on the prior information of Scenario 2, we notice two clear patterns: (1) larger quantities of measurements were adequate (*H*_{0,G} is rejected) for a greater range of *τ*_{crit}, and (2) given a specific quantity of measurements, the configuration with measurements focused along the travel path (A) outperformed the configuration with measurements spread throughout the domain, despite the poorer performance in estimating the SRF parameters.

While summarizing a field campaign design in terms of a possible rejection of *H*_{0,G} is a useful tool for managers and practitioners, a more thorough description of the performance of a campaign can be provided by analyzing *P*_{α|H0}, which indicates the probability that a Type I error will occur and not just its relation to a threshold probability. For Scenario 1, there was little difference between the behavior of the different configurations, stemming from the unrealistic assumption that the SRF parameters were known deterministically. For the relatively uninformative prior information (Scenario 3), we found the same patterns as for Scenario 2: an increasing quantity of measurements improved performance, and the measurements focused along the travel path were better for predicting earlier arrivals, despite the poorer performance in estimating the SRF parameters.

Again, the practical implications of these results emphasize the need for goal-oriented characterization design. Designing field campaign strategies to optimize performance when estimating SRF parameters is clearly not the best approach, as it was shown that improved parameter estimates do not necessarily indicate improved decision-making performance. In addition to designing field campaigns tailored to predicting a specified EPM (e.g., arrival time), it is also necessary to take into consideration the critical value of this EPM, that is, the threshold on which decision making depends. Relating again to Figure 1, these results demonstrate that focusing only on arrow A can hinder arrow B, which is important when considering the goal of successful environmental and water resources management.

### 4.3 Effect of Parametric Uncertainty

To explore the effect of parametric uncertainty, we compare across scenarios the values presented in Figures 6 and 7. Initially, it seems counterintuitive that the more informative priors resulted in higher probabilities of Type I errors, an effect explored in detail in this subsection.

Consider a baseline field where a relatively early arrival occurred, although not too early to predict (i.e., roughly in the range 0.5≤*τ*⟨*v*⟩/*L*≤0.9). In this case, field measurements with above-average conductivity are more likely to be observed. With greater prior parametric uncertainty, this early arrival becomes more likely to be predicted, due to the increased influence of the measurements. In Scenario 1, a group of high-conductivity measurements will cause the conditional cumulative distribution function (CDF) of the arrival time to diverge only slightly from the unconditional CDF. This, in turn, decreases the chance of correctly predicting the early arrival, even when the data suggest it. On the contrary, in Scenarios 2 and 3, the high-conductivity measurements exert a greater impact on the conditional CDFs, enabling us to predict the early arrival when the data suggest so.

On the other hand, we can consider a baseline field where the arrival time is comparable to or greater than *L*/⟨*v*⟩. In this case, the arrival time is captured well by the mean behavior of the arrival time distribution, which can be estimated better with an informative prior. This is why, with increasing *τ*_{crit}⟨*v*⟩/*L*, we observed decreasing *P*_{α|H0} in all the scenarios, with the severity of the decrease inversely correlated with the informativeness of the prior.
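The mechanism can be mimicked with a one-parameter conjugate update: the same above-average data move the estimate much further under a weak prior than under an informative one. The numbers are purely illustrative.

```python
import numpy as np

def posterior_mean(prior_mu, prior_var, data, noise_var):
    """Conjugate normal-normal update for an unknown mean with known noise."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    return post_var * (prior_mu / prior_var + data.sum() / noise_var)

high_data = np.array([1.2, 1.5, 1.1, 1.4])          # above-average measurements
tight = posterior_mean(0.0, 0.05, high_data, 0.5)   # informative prior (cf. Scenario 1)
vague = posterior_mean(0.0, 5.0, high_data, 0.5)    # weak prior (cf. Scenario 3)
print(tight, vague)  # the weak prior lets the data pull the estimate further
```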

The effects described in this subsection should be considered specific to our case study, in which relatively simple geostatistical and physical models were used. Future research could investigate how this effect may change when more sophisticated spatial models, inversion methods, and forward models are used.

### 4.4 Uncertainty in the Results

To get a sense of the uncertainty in the results, we can look at the standard deviations of *P*_{α}, ⟨*I*⟩, and *P*_{α|H0}, as shown on the right-hand side of Figure 6. What we see is that the standard deviations of both *P*_{α} and ⟨*I*⟩ were maximal near the mean arrival time (i.e., when *τ*_{crit}⟨*v*⟩/*L* was near one). The reason for this is that the majority of the actual arrival times were near this value, making it more difficult to predict the binary outcome given by equation 17. On the other hand, the standard deviation of *P*_{α|H0} was at its peak when *τ*_{crit}⟨*v*⟩/*L*≈0.5, meaning that the conditional error probability was most uncertain when *τ*_{crit} was about half of the arrival time's expected value. The practical implication of this result is that even before the simulation is executed, *τ*_{crit}, ⟨*v*⟩, and *L* can provide a rough idea of how difficult the predictions might be. If *τ*_{crit}⟨*v*⟩/*L* is close to zero, then it can be reasoned that *H*_{0} is unlikely to be true; however, if it is true, it will be difficult to detect. Conversely, if *τ*_{crit}⟨*v*⟩/*L* is much greater than one, it can be reasoned that *H*_{0} is likely to be true and that it will be easy to detect.

## 5 On the Limits of Modeling Uncertainty

This paper provides a methodology for making decisions under conditions of uncertainty. The inputs required consist primarily of statistical models, in the form of probability density functions (pdfs), used to represent uncertainty. As “uncertainty” is a somewhat ambiguous term, it is important that we define the various types of uncertainty that can, and should, be addressed using RDADE, as well as the types that require a different approach. In this way, we delineate RDADE's usability.

The two major categories of uncertainty are aleatory uncertainty and epistemic uncertainty. Extensive discussions on these two categories can be found in Beven (2016) and Blöschl et al. (2013). In addition, somewhat more suggestive definitions of uncertainty are provided by Di Baldassarre et al. (2016) and Rubin et al. (2018), who discuss differences between “known unknowns” and “unknown unknowns,” standing for aleatory uncertainty and epistemic uncertainty, respectively. Further, in Di Baldassarre et al. (2016), the authors included a third category—“wrong assumptions”—meaning “things we think we know but we actually do not know.” This third category obviously overlaps, to some degree, with the other two, but it is important to mention it as we map the universe of uncertainty.

What is covered by RDADE? Simply stated, any element of the system under investigation that can be formulated as a statistical model can be analyzed with RDADE. Uusitalo et al. (2015) provide a detailed list of such elements, including inherent randomness, measurement error, and natural variations, which all fall nicely under the “known unknown” category. As also stated in Uusitalo et al. (2015), “Uncertainty about the cause-and-effect relationship, is often very difficult to characterize.” Thus, RDADE covers situations of known unknowns. However, RDADE can do more, as it also works with unknown unknowns. Presumably, there would be no pdf for unknown unknowns. However, this situation can partly be addressed by developing alternative conceptual models for any of the system elements, which can then be integrated into RDADE using Bayesian model averaging (BMA; cf. Hoeting et al., 1999). GLUE (Beven & Binley, 1992) could also be considered in this context. Thus, instead of a single model used to compute the statistics (e.g., as needed in equations 4-9 or 12), we can use multiple conceptual models.
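A minimal sketch of model averaging in this spirit is shown below, assuming two hypothetical fixed-parameter models for arrival time; in practice, each model's parameters would be integrated out when computing its marginal likelihood.

```python
import numpy as np
from scipy.stats import norm

# two hypothetical conceptual models for arrival time, expressed as pdfs
models = [norm(loc=1.0, scale=0.3), norm(loc=0.7, scale=0.5)]
data = np.array([0.9, 1.1, 0.8])   # observations used to weight the models

# posterior model probabilities: prior weights times (marginal) likelihoods
prior_w = np.array([0.5, 0.5])
likelihoods = np.array([m.pdf(data).prod() for m in models])
post_w = prior_w * likelihoods
post_w /= post_w.sum()

def bma_cdf(x):
    """Model-averaged predictive CDF, usable wherever a single-model
    distribution would otherwise enter the analysis."""
    return sum(w * m.cdf(x) for w, m in zip(post_w, models))

print(post_w, bma_cdf(1.0))
```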

There is a substantial body of evidence suggesting that BMA provides better-than-average predictive capabilities (Hoeting et al., 1999; Madigan & Raftery, 1994; Madigan et al., 1996). What is particularly appealing in this context is the consideration of “wild cards” (cf. Wardekker et al., 2010) by formulating them as conceptual models. Bringing multiple, alternative conceptual models into the RDADE analysis, including so-called “wild ones,” should not be interpreted to mean that the “true” model is necessarily one of a given class of alternatives (Leamer, 1983). Nonetheless, as shown by Leamer (pp. 315-316), such an approach “… asymptotically produce[s] an estimated density closest to the true density,” which is what is required for RDADE's application.

As mentioned above, the third category that Di Baldassarre et al. (2016) discussed is wrong assumptions. To qualify for RDADE, we must ask whether we can place a pdf on (i.e., associate a probabilistic model with) “wrong assumptions.” To some degree, the answer is yes; we are able to accomplish this by using BMA, in combination with the multiple or wild card scenarios described above. Ideally, this approach would reduce the impact of dominant, “wrong assumptions”-based conceptual models, for example, by exploring ex situ information such as geologically similar sites (cf. Chang & Rubin, 2019; Cucchi et al., 2019; Li et al., 2018). However, as stated in Uusitalo et al. (2015), “… caution is advised … the [alternative conceptual] model parameters have been fitted to predict the observed data well, so depending on which kinds of scenarios are tested, it may only be natural that the model predictions are relatively similar.”

Hence, the “unknown unknowns” situation (right or wrong assumptions) cannot be completely and assuredly addressed, because it covers situations that never happened or managed to escape the radar screens, so to speak; there is always a chance of a low-probability, hard-to-predict event, wilder than any wild card. Such low-probability events could produce either minor consequences or huge ones, the latter often referred to as “Black Swans,” following Taleb (2007): unexpected, large-scale disasters. Black Swan situations cannot be covered by RDADE or, for that matter, by any other probability-based risk analysis methodology. The reason is that any scenario that could be conceptualized is, by definition, no longer a Black Swan. Here we wish to depart from the somewhat optimistic notion of being able to cover all aspects of uncertainty by bringing Black Swans into the picture.

Black Swans are low-probability events by virtue of their rarity (possibly even without precedent), and, because they are so rare, they are not included in uncertainty models. Even if one were to attempt to recognize them, they would exist somewhere at the extremes: the hard-if-not-impossible-to-characterize, low-probability tails of the distribution. Because these tails are so hard to characterize, any risk assessment or management decision relying on such probabilities is called into question. Taleb (2007) discussed this situation, stating that “it is much easier to deal with Black Swan problems if we focus on robustness to errors rather than improving predictions.” Referring to Black Swan situations, Wardekker et al. (2010) demonstrated that “a resilience approach makes the system less prone to disturbances, enables quick and flexible responses, and is better capable of dealing with surprises than traditional predictive approaches.” Blöschl et al. (2013) extended the discussion in Taleb (2007) to flood risk situations by suggesting that the vulnerability of a system can be reduced by structural changes and emergency planning. Moreover, Merz et al. (2015) suggested that “decentralization, diversification, and redundancy can further enhance system robustness and adaptivity.” Underground, in the case of nuclear waste repositories or other potential high-risk polluters, it may be more beneficial to invest in early warning monitoring systems and other precautionary measures than to try to improve the predictive accuracy of what might happen tens of thousands of years into the future. To summarize, RDADE should be used primarily to analyze situations that can reasonably be modeled with probabilities; for the rest, planners should focus on reducing system vulnerability.

## 6 Summary and Conclusions

In this paper, we introduced RDADE, a framework for rational, risk-based decision making in water resources management, policy, regulation, and field campaign design. The framework enables the straightforward management of uncertainty and risk. Using RDADE, regulators and policymakers are able to define—in whatever way deemed appropriate—an acceptable level of risk in management decisions, while managers and practitioners are able to simply demonstrate that a decision is *defensible*, even if it is not ultimately *correct*.

The RDADE framework is general: It can accommodate any number of hydrogeological, biological, geochemical, or other conceptual models and can be used with any type of field data acquisition and inverse modeling method. RDADE itself does not design a field campaign; rather, that task is left to practitioners with experience in field methods and local hydrogeological conditions. What the framework does accomplish is to allow practitioners to take a proposed field campaign design and determine, probabilistically, whether or not the design will provide enough information for a defensible decision.

RDADE allows for the simple communication of uncertainty and risk. Instead of focusing on geostatistical model uncertainty, parametric uncertainty, or any other concept that may be unfamiliar to stakeholders who are not hydrologists, the framework allows for simple communication of uncertainty in the quantities directly relevant to decision making. While this simple communication of uncertainty and risk is no substitute for the transparent communication of the data collected, models used, or assumptions made in the analysis, the benefit is the ability to summarize such communication with a simple description of the levels of uncertainty and risk. In other words, it is easier to relate to the chances (or probabilities) of making an error in the decision than to explain what estimation variance means in relation to environmental or societal impact. What RDADE enables is communication of the type, “With budget *a*, we have a *b*% chance of error, but with a larger budget *c*, we can reduce the chance of error to *d*%.” This is not a trivial concept either, but it can be more easily understood and used as a means to shape discourse among stakeholders. Still, the ability to clearly state probabilities of success and failure does not resolve all ambiguities. For a complete approach, there is a need to define legally binding probability standards (Rubin et al., 2018). Once probabilistic laws are set, an approach such as RDADE can be used to demonstrate compliance with such requirements.

In this work, we demonstrated the RDADE framework using a case study predicting contaminant arrival time in an aquifer. The emphasis of this paper was the demonstration of the framework, so for conciseness and simplicity, several simplifying assumptions were made, including 2-D flow, a Gaussian field with low variance, an exponential variogram, and uniform prior distributions. These assumptions, of course, limit the applicability of the conclusions drawn regarding the spatial configuration of measurements to this scenario. However, as discussed, the assumptions made in the case study are not necessary for the general use of the framework. Going forward, research could utilize the RDADE framework to more closely examine the relationship between prior information, prediction uncertainty, and decision making in more realistic scenarios.

The case study showed that improved estimates of geostatistical parameters are not necessarily associated with improved water resources decision making, thus demonstrating the importance of designing field campaigns with the goal of making defensible management decisions, as opposed to optimal parameter estimates. It was also shown that the amount of field data necessary to make a decision must be determined on a case-by-case basis. The critical value of the EPM on which the decision depends, as well as the amount of prior information available about the site, can significantly affect the amount of field data that is necessary. This further highlights the importance of goal-oriented characterization design, which is important in light of the costs associated with site characterization approaches. The methods presented here can be utilized by managers to prevent overspending on unnecessary amounts of field data and can also ensure that measurements are strategically placed in order to ensure the maximum benefit of the data.

## Acknowledgements

All data used in this study were synthetically generated using the publicly available R package RandomFields, as cited. The baseline fields used in this study are available online (at https://doi.org/10.6078/D15Q4K). This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The first author is grateful for the support provided by Helmholtz-UFZ during and after the time he spent there as a guest scientist. Additionally, the first author is grateful to Heather Savoy for her advice on various components of the computational and writing processes conducted for this project.