A Fresh Look at Variography: Measuring Dependence and Possible Sensitivities Across Geophysical Systems From Any Given Data
Abstract
Sensitivity analysis in Earth and environmental systems modeling typically demands onerous computational costs. This issue coexists with these algorithms' reliance on ad hoc designs of experiments, which hampers making the most of existing data sets. We tackle this problem by introducing a method for sensitivity analysis, based on the theory of variogram analysis of response surfaces (VARS), that works on any sample of input-output data or pre-computed model evaluations. Called data-driven VARS (D-VARS), this method characterizes the strength of the relationship between inputs and outputs by investigating their covariograms. We also propose a method to assess the “robustness” of the results against sampling variability and the imperfection of numerical methods. Using two hydrologic modeling case studies, we show that D-VARS is highly efficient and statistically robust, even when the sample size is small. Therefore, D-VARS can provide unique opportunities to investigate geophysical systems whose models are computationally expensive or whose available data are scarce.
Key Points
- We develop an efficient “data-driven” method for global sensitivity analysis (GSA) based on principles of variography
- Our method enables assessing the relationship strength, either causal or correlational, in geophysical systems based on any given data
- Our method provides theoretical links with previous GSA methods and demonstrates robust performance even when the data size is very small
Plain Language Summary
Sensitivity analysis (SA) is about assessing how the properties of a system are influenced by different factors. It can also help us better understand the behavior of a mathematical model and the underlying real-world system that it mimics. Classic SA methods almost always estimate sensitivities by sampling the entire problem space in a specific manner and are incapable of using a pre-existing set of input-output data or pre-computed model evaluations. Hence, classic SA becomes useless in cases where a sample of input-output data, obtained from physical experiments or computationally expensive simulations, is already available. We propose a purely data-driven method that can be used effectively in such situations. Based on the principles of variography, our method measures dependence and possible sensitivities across a system from any given data. Here, we illustrate our method in the context of hydrologic modeling, but it can potentially be applied to study models of other geophysical systems.
1 Introduction
Understanding how various uncertain factors influence the behavior of Earth and environmental systems (EES) models has greatly raised the need for continued development of efficient methodologies for global sensitivity analysis (GSA). GSA methods identify dominant factors that exert a considerable impact on the model responses and accordingly provide invaluable information for model simplification, risk assessment, and uncertainty analysis (Bhalachandran et al., 2019; Castelletti et al., 2012; Janetti et al., 2019; Li et al., 2019; Markstrom et al., 2016; Puy et al., 2020; Safta et al., 2015). However, mainstream GSA methods suffer from two major limitations:
- They are bound to their own sampling strategies that follow specific spatial arrangement (the “ad hoc designs” as termed by Ratto et al., 2007).
- They are not applicable when the underlying input-output functional relationship is unavailable (e.g., we may only have a sample of input-output data obtained from field observations or previous modeling experiments, and nothing more).
The former issue is widespread among GSA methods (e.g., the GSA methods of Saltelli et al., 2010, and Razavi & Gupta, 2016a, 2016b); therefore, a sample taken by one method cannot be utilized by another, which also complicates comparing the different methods. Due to the latter, GSA has remained limited to cases where a computational model is available and its run time is tractable enough to generate a sufficiently large input-output sample. Hence, the mainstream methods are not useful when the distributions, correlations, and interactions between different factors need to be characterized from an existing data set, without model (re)evaluations. These challenges necessitate a “given-data approach” to GSA (alternatively termed a “data-driven approach” or “post-processing GSA”; see Borgonovo et al., 2016; Li & Mahadevan, 2016; Plischke, 2010) that can extract sensitivity-related information from preexisting data sets. In a burgeoning era of big data, this becomes more crucial than ever because collecting data can be much easier and faster than building models.
A limited number of GSA methods can potentially be used under the given-data approach. These are mainly emulator-based techniques, limited to estimating variance-based sensitivity indices using either Monte Carlo-based methods (see, e.g., Iooss et al., 2006; Marseguerra et al., 2003; Storlie et al., 2009) or analytical approaches (see, e.g., Jia & Taflanidis, 2016; Jourdan, 2012; Sargsyan & Safta, 2019; Sudret, 2008). Additionally, moment-independent methods can also be used in the given-data setting (Borgonovo et al., 2012, 2017; Plischke et al., 2013; Yun et al., 2018). Each of these methods characterizes only a specific feature of the response surface and ignores the rest (Razavi & Gupta, 2015). Recognizing this fact, some researchers have recently advocated the combined use of multiple GSA measures, which may be based on different theories, at the cost of an augmented computational burden (Borgonovo et al., 2017; Pianosi et al., 2016; Wang & Solomatine, 2019). We should note that, in theory, the given-data paradigm provides a unique platform to estimate various GSA measures simultaneously from the same data set, without much additional computational effort.
Here, we contribute to the given-data paradigm by developing a highly efficient and robust data-driven GSA method, based on the theory of variogram analysis of response surfaces (VARS) (Razavi & Gupta, 2016a, 2016b). Despite the increasing popularity of VARS (e.g., Akomeah et al., 2019; Becker, 2020; Korgaonkar et al., 2020; Krogh et al., 2017; Leroux & Pomeroy, 2019; Lilhare et al., 2020; Schürz et al., 2019; Sheikholeslami et al., 2017; Yassin et al., 2017), its current version cannot be applied in the given-data setting. In contrast, our new version of VARS works on any given data by approximating the anisotropic variogram structure of the underlying (but unknown) response surface when only a (small) sample of the input-output space is available. We also address a crucial, but often neglected, component of any GSA practice: assessing the “robustness” of its results. Note that GSA results are typically prone to statistical uncertainty due to sampling variability and to the imperfect nature of numerical methods. However, a comprehensive robustness assessment can be computationally costly or infeasible in the given-data context. Therefore, we develop a new robustness index that works well within the given-data setting.
2 Method
2.1 Background


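The two equations referenced in this subsection are missing from this extract. As a hedged reconstruction based on the VARS literature (Razavi & Gupta, 2016a, 2016b), the variogram of a response surface y(x) and its sample estimator are commonly written as follows; the exact forms of the paper's Equations 1 and 2 may differ:

```latex
% Variogram of the response surface y(x) for a perturbation vector h
% (presumably the paper's Equation 1):
\gamma(\mathbf{h}) = \tfrac{1}{2}\,\mathbb{V}\big[\, y(\mathbf{x}+\mathbf{h}) - y(\mathbf{x}) \,\big]

% Method-of-moments (sample) estimator over the N(h) pairs of sample
% points P(h) separated by the lag h (presumably the paper's Equation 2):
\hat{\gamma}(\mathbf{h}) = \frac{1}{2\,N(\mathbf{h})}
    \sum_{(i,k)\,\in\,P(\mathbf{h})} \big( y(\mathbf{x}_i) - y(\mathbf{x}_k) \big)^{2}
```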
Using Equation 2, directional variograms along the jth input factor (j = 1,2,…,d) can be estimated by calculating γ(hj) for any given set of hj, for example, {0, Δh, 2Δh, …} with a perturbation resolution of Δh (Razavi & Gupta, 2016b). The directional variograms, therefore, show how the variability of the response variable changes with respect to the direction and the perturbation scale. Accordingly, VARS-based sensitivity analysis links the rate of variability to both direction and perturbation scale (for details, see Razavi & Gupta, 2016a).
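As an illustration, estimating a directional variogram from regularly spaced samples can be sketched in a few lines of Python (a minimal method-of-moments sketch for a response sampled along a regular transect of one input factor; the function name and example are ours, not from the paper):

```python
import numpy as np

def directional_variogram(y, max_lag):
    """Method-of-moments estimate of gamma(h) for a 1-D transect of
    responses y sampled at a regular spacing (h in units of the spacing)."""
    gammas = []
    for h in range(1, max_lag + 1):
        diffs = y[h:] - y[:-h]                 # all pairs separated by lag h
        gammas.append(0.5 * np.mean(diffs ** 2))
    return np.array(gammas)

# Example: response y = x^2 sampled on [0, 1] with spacing 0.1
x = np.linspace(0.0, 1.0, 11)
gamma = directional_variogram(x ** 2, max_lag=3)
```

For a multi-dimensional input space, the same computation is repeated along each factor's direction while the other factors are held fixed, which is what yields the anisotropic, per-factor variograms used by VARS.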

2.2 New Extension of VARS: A Given-Data Estimator






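The body of this section is not included in this extract, so the actual D-VARS estimator cannot be reproduced here. As a rough, hypothetical sketch of the general idea stated in section 5 (deriving directional variograms from the covariance structure of a Gaussian process fitted to the given data), the following assumes an anisotropic squared-exponential kernel whose hyperparameters have already been fitted; the names and values are illustrative only:

```python
import numpy as np

# Hypothetical, already-fitted hyperparameters of an anisotropic
# squared-exponential GP kernel (illustrative values, not from the paper):
sigma2 = 1.0                      # process variance
ell = np.array([0.1, 0.4, 1.0])   # per-input length scales on [0, 1]

def gamma_j(h, j):
    """Directional variogram implied by the kernel under stationarity:
    gamma_j(h) = C_j(0) - C_j(h) = sigma2 * (1 - exp(-h^2 / (2 ell_j^2)))."""
    return sigma2 * (1.0 - np.exp(-(h ** 2) / (2.0 * ell[j] ** 2)))

def ivars(j, H, n=501):
    """Integrated variogram over [0, H] via the trapezoidal rule."""
    h = np.linspace(0.0, H, n)
    g = gamma_j(h, j)
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * (h[1] - h[0])))

# IVARS-50: H equal to 50% of the (unit-normalized) parameter range.
scores = np.array([ivars(j, 0.5) for j in range(len(ell))])
ranking = list(np.argsort(-scores))   # shorter length scale -> more sensitive
```

Under second-order stationarity, gamma_j(h) = C(0) - C_j(h), so a shorter fitted length scale for factor j implies a faster-rising directional variogram and hence a larger integrated (IVARS) sensitivity for that factor.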
3 Numerical Experiments
3.1 Computer Experiments Versus Physical Experiments
We argue that D-VARS is applicable to both computer experiments and physical experiments. For the former, D-VARS contributes to more efficient and sampling-free GSA, while for the latter, it provides new opportunities to assess the strength of relationships, either causal or correlational, between different variables measured in an experiment. There are two major differences between these two types of experiments: computer experiments are usually deterministic, while data collected from physical experiments are prone to noise or errors, often with unknown properties; and in computer experiments one may have full control over the experimental design for collecting samples and their distributional properties, while this is typically not the case in physical experiments. In this paper, we chose to test D-VARS on computer experiments. While the outcome of our tests here can, in part, be generalized to physical experiments, a rigorous study is required to test D-VARS on cases wherein data sets are polluted with noise and variables follow a variety of distributional forms. This will be the topic of our follow-up paper.
3.2 Hydrologic Models Used
We designed our case studies with two well-established hydrologic models of increasing complexity to illustrate the utility of D-VARS. The HYMOD model with five parameters was configured for the 1,950-km2 Leaf River catchment located in Mississippi, USA. The HBV-SASK model with 12 parameters was configured for the 1,435-km2 Oldman River basin located in Alberta, Canada. The model parameters were chosen as inputs to D-VARS. As the output for the HYMOD case study, we chose a goodness-of-fit metric to observations, the Nash-Sutcliffe efficiency (NS) (Nash & Sutcliffe, 1970), as a common setting that informs calibration (Gupta & Razavi, 2018). For the HBV-SASK case study, however, we chose the model's estimate of the flood peak with a 10-year return period, as a more modern application of GSA focused on learning about the system's behavior under uncertainty and nonstationarity (Razavi et al., 2019, 2020). Full details are available in the supporting information (see Text S3).
3.3 D-VARS Runs
We first ran the original VARS for both case studies using the sampling-based STAR-VARS algorithm (Razavi & Gupta, 2016b). A large sample size was chosen, resulting in ~70,000 model evaluations for each case study, to ensure convergence to robust and accurate results. These results were deemed to be the “true” sensitivities and used as the comparison benchmark for the performance assessment of D-VARS.
Second, we generated synthetic data sets by randomly sampling from the input space, with progressively increasing sample sizes (i.e., 20, 50, 100, 200, and 400 sample points), via progressive Latin hypercube sampling (Sheikholeslami & Razavi, 2017). We then ran all the sample sets through the models to obtain the respective outputs. Each input-output sample set was considered to be a set of “given data.” We replicated this data generation process 100 times with different initial random seeds to assess both the average and the variability of the D-VARS behavior. Furthermore, IVARS-50, called the “total-variogram effect” (Razavi et al., 2019), was used to assess the factor sensitivities; that is, Equation 8 was computed with Hj equal to 50% of the parameter range.
4 Results and Discussion
4.1 Factor Sensitivities and Actual Robustness of D-VARS
Figures 1 and 2 present the histograms of the parameter rankings for different input-output sample sizes. Rank 1 stands for the most influential parameter on the output, i.e., the dependency of the output on the Rank 1 parameter is the highest. The true rankings of the HYMOD and HBV-SASK parameters are as follows: {Rq, Cmax, alpha, bexp, and Rs} (see Table S1) and {PM, C0, FRAC, TT, FC, a, UBAS, K1, EFT, LP, K2, and beta} (see Table S2), from the most influential to the least influential one.


As shown in Figures 1 and 2, the rankings of the most influential parameters for HYMOD, {Rq, Cmax, and alpha}, and HBV-SASK, {PM, C0, FRAC, TT, and FC}, were well established even when the size of the given data was 50. This confirms that D-VARS is extremely efficient at identifying the most influential parameters when the sample size is very small. Furthermore, Figure 2 shows that the ranking of the parameters with moderate importance in HBV-SASK, {a, UBAS, and K1}, stabilized when the sample size was larger than 200. A close examination of the results reveals that in more than 50 replicates, the ranking of these parameters converged to the true ranking when the sample size was only 200. It is noteworthy that the parameters with the least influence on HBV-SASK, {EFT, LP, K2, and beta}, have the slowest convergence rate in terms of ranking.
Now let us look directly at the actual sensitivity indices, IVARS-50, derived by D-VARS. Figures 3 and 4 show the IVARS-50 values (scaled between 0 and 1) for the HYMOD and HBV-SASK parameters, obtained from the 100 replicates of each experiment. For an extremely small sample size (i.e., 20), D-VARS showed highly variable performance. However, as the sample size increased, all the replicates quickly converged to a single set of IVARS-50 values (i.e., the true values), particularly for the most influential parameters. This confirms the robustness of D-VARS against sampling variability. For the least influential parameters, the IVARS-50 values may not be distinguishable across all the replicates, even for the larger sample sizes. This is mainly because these parameters have almost no influence on the output of interest, and their associated IVARS-50 values are close to 0.


4.2 Physical Justifiability of Sensitivity Assessments
A critical question that one might ask after running GSA is whether the results are physically meaningful. Based on our results, we can (rather subjectively) categorize the parameters of HYMOD into two groups: (i) influential: {Rq, Cmax, and alpha} and (ii) non-influential: {bexp and Rs}; and those of HBV-SASK into three groups: (i) strongly influential: {PM, C0, FRAC, TT, and FC}, (ii) moderately influential: {a, UBAS, and K1}, and (iii) non-influential: {EFT, LP, K2, and beta}. Recall that these assessments are based on choosing the NS values and the 10-year flood estimates as the outputs for the HYMOD and HBV-SASK case studies, respectively. We know from hydrology domain knowledge that both outputs should, in principle, be dominantly controlled by high flows in the hydrograph.
In the case of HYMOD, D-VARS ranked Rq, a parameter controlling quick flow generation, as the most influential parameter. This assessment is physically justifiable, as the Leaf River basin is a rainfall-dominated basin with a history of torrential storms. The most influential parameters of HBV-SASK, however, are those mainly responsible for the snowmelt (C0 and TT) and soil (FRAC and FC) processes. The high influence of C0 and TT can be justified because major floods in the Oldman River basin are governed by snow accumulation and melt in early spring. From the structure of both models, it is evident that alpha (in HYMOD) and FRAC (in HBV-SASK) determine the fraction of soil release entering the fast reservoir and accordingly play a significant role in the high flow values. Moreover, Cmax (in HYMOD) and FC (in HBV-SASK) account for the partitioning of precipitation into runoff and soil moisture and thus can significantly impact the simulated high flows. Additionally, our method recognized a and UBAS among the influential parameters for peak flow generation in HBV-SASK, since they control the timing and attenuation of the release from the fast reservoir. Finally, D-VARS identified Rs (in HYMOD) and K2 (in HBV-SASK) as non-influential parameters. These parameters represent the release pace of the slow reservoir in the structure of these models and, as such, are only responsible for base flows (contributing minimally to peak flows).
4.3 The Proposed Robustness Index
The assessment of actual robustness, as presented in section 4.1, is typically infeasible in practice because often only a single data set is available or the model under investigation is too computationally expensive to generate multiple independent sets of input-output data. This study, therefore, presented a novel robustness index that can estimate the robustness of D-VARS for any given data (see Text S2). The performance of this new index for the synthetic samples taken from the HYMOD and HBV-SASK models across 100 replicates is depicted in Figure 5.

As shown in Figure 5, as the sample size increased, the variability of the robustness index obtained over all replicates of the algorithm decreased, and consequently the robustness of D-VARS increased. Also, for both models, the robustness indices quickly converged toward 100% (i.e., perfect robustness). When the sample size was larger than 100, almost all the robustness indices were higher than 50%. For clarity, see the medians of the estimated robustness indices (the horizontal black lines in Figure 5). Notice that the medians of the robustness indices were already quite high when the sample size was 100 (55% for HYMOD and 79% for HBV-SASK) and increased rapidly thereafter, reaching 90% for HYMOD and 93.7% for HBV-SASK at 400 samples.
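The precise definition of the proposed robustness index is given in Text S2 and is not reproduced in this extract. Purely as a hypothetical illustration of how robustness can be estimated from a single data set, the sketch below bootstrap-resamples the given data and reports how often the full-data ranking is reproduced; the correlation-based ranking is a deliberately crude stand-in for the actual D-VARS estimator:

```python
import numpy as np

rng = np.random.default_rng(42)

def rank_inputs(X, y):
    """Stand-in sensitivity ranking for illustration only: rank inputs by
    the absolute correlation of each column with the output (the actual
    D-VARS estimator would be used in practice)."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return tuple(np.argsort(scores)[::-1])

def robustness(X, y, n_boot=200):
    """Fraction of bootstrap resamples that reproduce the full-data ranking."""
    base = rank_inputs(X, y)
    hits = 0
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))  # resample with replacement
        hits += rank_inputs(X[idx], y[idx]) == base
    return hits / n_boot

# Toy data: the output depends strongly on x0, moderately on x1, not on x2.
X = rng.random((200, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.01 * rng.standard_normal(200)
r = robustness(X, y)
```

Higher values of such an index indicate that the inferred ranking is stable under sampling variability, mirroring the convergence behavior reported in Figure 5.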
5 Conclusions
GSA has often been tied to ad hoc experimental designs and defined in the context of mathematical models. This study tried to take GSA to the next level by proposing a new method to conduct GSA on any given data, independent of experimental designs and mathematical models. Our proposed method, called D-VARS, is based on variography via Gaussian process (GP) modeling to characterize the spatial correlation properties of the underlying response surface for estimating sensitivity/dependence indices. D-VARS is not only a computationally cheap method that works with any given data but also makes, in principle, fewer confining assumptions than most of the existing sampling-based methods. For example, many GSA methods assume that the input data themselves are uncorrelated (see Do & Razavi, 2020), while D-VARS handles correlated inputs as well (not shown in this paper). We examined the performance of the method using two benchmark hydrologic case studies. Overall, our results demonstrated that D-VARS accurately estimates the true sensitivity measures while requiring very small sample sizes. In particular:
- D-VARS can open up a new area of research, where GSA is applied to any data set, even when the underlying relationship and mechanisms are not known.
- D-VARS can enable GSA for computationally expensive models, wherein conventional GSA is handicapped, as D-VARS requires minimal computational effort to produce robust results.
Further, we argued that any GSA practice must be accompanied by an assessment of robustness, which is typically neglected in the literature. To this end, we developed a new robustness index and showed that it can consistently provide an accurate evaluation of robustness. Our proposed index can easily be used in practice since it works when only a single (albeit small) data set is available. Future work may include testing our proposed method on real-world data obtained from field observations or remote sensing to better support model development and understanding.
Open Research
Data Availability Statement
Data sets for this research are available through Razavi et al. (2019).