Volume 49, Issue 10 p. 6871-6892
Regular Article

An adaptive sparse-grid high-order stochastic collocation method for Bayesian inference in groundwater reactive transport modeling

Guannan Zhang
Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA

Dan Lu
Department of Scientific Computing, Florida State University, Tallahassee, Florida, USA

Ming Ye (Corresponding Author)
Department of Scientific Computing, Florida State University, Tallahassee, Florida, USA
Corresponding author: M. Ye, Department of Scientific Computing, Florida State University, 489 Dirac Science Library, Tallahassee, FL 32306-4120, USA. ([email protected])

Max Gunzburger
Department of Scientific Computing, Florida State University, Tallahassee, Florida, USA

Clayton Webster
Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA

First published: 13 August 2013
Citations: 68


[1] Bayesian analysis has become vital to uncertainty quantification in groundwater modeling, but its application has been hindered by the computational cost of the numerous model executions required to explore the posterior probability density function (PPDF) of model parameters. This is particularly the case when the PPDF is estimated using Markov Chain Monte Carlo (MCMC) sampling. In this study, a new approach is developed to improve the computational efficiency of Bayesian inference by constructing a surrogate of the PPDF, using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method. Unlike previous works using a first-order hierarchical basis, this paper utilizes a compactly supported higher-order hierarchical basis to construct the surrogate system, resulting in a significant reduction in the number of required model executions. In addition, using the hierarchical surplus as an error indicator allows locally adaptive refinement of sparse grids in the parameter space, which further improves computational efficiency. To efficiently build the surrogate system for a PPDF with multiple significant modes, optimization techniques are used to identify the modes, for which high-probability regions are defined and components of the aSG-hSC approximation are constructed. After the surrogate is determined, the PPDF can be evaluated by sampling the surrogate system directly without model execution, resulting in improved efficiency of the surrogate-based MCMC compared with conventional MCMC. The developed method is evaluated using two synthetic groundwater reactive transport models. The first example involves coupled linear reactions and demonstrates the accuracy of our high-order hierarchical basis approach in approximating a high-dimensional posterior distribution. The second example is highly nonlinear because of the reactions of uranium surface complexation, and demonstrates how the iterative aSG-hSC method is able to capture multimodal and non-Gaussian features of the PPDF caused by model nonlinearity. Both experiments show that aSG-hSC is an effective and efficient tool for Bayesian inference.

Key Points

  • High-order stochastic collocation method is used for Bayesian inference
  • Adaptive sparse grids are employed to reduce computational cost
  • An iterative algorithm is proposed for simulating PPDF with multiple modes

1. Introduction

[2] Groundwater models are vital tools for predicting the effects of future anthropogenic and/or natural occurrences in the subsurface environment. Model predictions are inherently uncertain due to epistemic and aleatory uncertainties in data and model parameters and structures; uncertainty quantification in groundwater modeling is indispensable, and many methods of uncertainty quantification have been developed to facilitate science-informed decision making in water resource management (see recent review articles of Matott et al. [2009] and Tartakovsky [2013], and references therein). While this study is only for quantification of parametric uncertainty, the results can be used directly for quantification of model uncertainty, because quantifying parametric uncertainty is the basis of quantifying model structure uncertainty in the popular multimodel analysis methods [Neuman, 2003; Ye et al., 2004; Poeter and Hill, 2007; Refsgaard et al., 2012; Neuman et al., 2012; Lu et al., 2012a]. The Bayesian method is one of the most widely utilized approaches for quantifying parametric uncertainty [Kitanidis, 1986; Box and Tiao, 1992; Ezzedine et al., 1999; Beck and Au, 2002; Marshall et al., 2005; Marzouk et al., 2007; Ma and Zabaras, 2009; Marzouk and Xiu, 2009; Allaire and Willcox, 2010; Renard, 2011; Zeng et al., 2012; Kitanidis, 2012; Lu et al., 2012b; Shi et al., 2012], wherein model parameters and predictions are modeled as random variables. The Bayesian methods are well connected with and complementary to other methods of uncertainty quantification [e.g., Woodbury, 2011; Nott et al., 2012]. They are flexible and can be applied to different models to incorporate multiple types of data and prior information [e.g., Woodbury, 2007; Rubin et al., 2010; Chen et al., 2012]. The outputs of Bayesian methods are probability density functions of quantities of interest that can be directly used for uncertainty quantification, risk assessment, and decision making.

[3] Within the Bayesian inference framework, this paper presents a computationally efficient approach, based on an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method, to reduce the computational cost of Bayesian computation, which is a persistent burden for practical Bayesian applications, especially for computationally demanding models with a large number of parameters. When estimating the posterior probability density function (PPDF) in Bayesian inference, except in special cases in which analytical expressions of the PPDF can be derived [Woodbury and Ulrych, 2000; Hou and Rubin, 2005], the PPDF is usually estimated numerically using sampling techniques. One of the most popular and robust sampling techniques is the Markov Chain Monte Carlo (MCMC) method [Marshall et al., 2005; Gamerman and Lopes, 2006; Vrugt et al., 2008, 2009; Keating et al., 2010; Liu et al., 2010]. However, MCMC methods are in general computationally expensive, because a large number of model executions are needed to estimate the PPDF and sample from it. Many MCMC algorithms have been developed to improve computational efficiency by reducing the number of model executions needed, such as delayed rejection and adaptive Metropolis sampling [Haario et al., 2006] and differential evolution adaptive Metropolis (DREAM) sampling [Vrugt et al., 2008, 2009]. The number of model executions is of primary interest, because the computational cost of solving the models dominates that of the other MCMC calculations, which are simple algebraic operations. However, even with these advanced methods, the number of model executions is still often on the order of tens of thousands or even hundreds of thousands. As a result, MCMC approaches are computationally prohibitive for demanding models such as those of groundwater reactive transport, a single solution of which may take tens of minutes or even hours [Zhang et al., 2012].

[4] In this study, the problem of high computational cost of MCMC simulations is addressed by incorporating sparse-grid methods into MCMC to develop sparse-grid-based MCMC algorithms. Sparse-grid methods belong, in a broader sense, to the family of surrogate methods that have been used to improve computational efficiency in water resources research [Razavi et al., 2012]. The key idea of sparse-grid methods is to place a grid in the parameter space with sparse parameter samples (as opposed to a full tensor-product grid). The forward model is then solved only for the sparse parameter samples to save computational cost. More specifically, the method used in this study is a stochastic collocation method on sparse grids, also known as the sparse-grid stochastic collocation method [Nobile et al., 2008a, 2008b]. Another popular collocation method is the probabilistic collocation method that uses the finite-dimensional polynomial chaos expansion [Marzouk et al., 2007; Li and Zhang, 2007; Shi et al., 2009]. A comprehensive comparison of the accuracy and efficiency of the two stochastic collocation methods can be found in the study of Chang and Zhang [2009]. While such a comparison is of high significance to the selection of an appropriate method for different applications, it is beyond the scope of this study. Sparse-grid methods have been demonstrated to be efficient and effective for high-dimensional interpolation and integration, and they have been used recently in groundwater uncertainty quantification. In the studies of, e.g., Shi and Yang [2009], Lin and Tartakovsky [2009, 2010], and Lin et al. [2010], sparse-grid methods were used to estimate the mean and covariance of groundwater state variables such as hydraulic head and solute concentrations. In these studies, parameter distributions were assumed known, and Bayesian inference was not conducted. Bayesian inference using the sparse-grid method was conducted in the studies of Ma and Zabaras [2009] and Zeng et al. [2012], in which surrogates of geophysical models were built and then used to evaluate parameter distributions using observations of state variables.
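
The saving offered by a sparse grid over a full tensor-product grid can be illustrated by counting grid points. The sketch below is only an illustration under stated assumptions: it counts the points of a classical isotropic sparse grid built from nested one-dimensional grids without boundary nodes, which is simpler than the locally adaptive grids developed in this paper. The function names `full_grid_points` and `sparse_grid_points` are hypothetical.

```python
from itertools import product


def full_grid_points(level, dim):
    """Number of points in a full tensor grid with 2**level - 1 interior nodes per axis."""
    return (2 ** level - 1) ** dim


def sparse_grid_points(level, dim):
    """Number of points in a classical isotropic sparse grid without boundary nodes.

    Assumption: the one-dimensional hierarchical increment at level l
    contributes 2**(l - 1) new nodes, and a multi-index (l_1, ..., l_dim)
    is kept when l_1 + ... + l_dim <= level + dim - 1.
    """
    total = 0
    for levels in product(range(1, level + 1), repeat=dim):
        if sum(levels) <= level + dim - 1:
            count = 1
            for l in levels:
                count *= 2 ** (l - 1)
            total += count
    return total


if __name__ == "__main__":
    for dim in (2, 4, 6):
        print(dim, full_grid_points(5, dim), sparse_grid_points(5, dim))
```

Because each grid point corresponds to one forward model execution, the gap between the two counts translates directly into saved model runs, and the gap widens quickly as the number of parameters grows.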

[5] While the aSG-hSC method presented in this paper is similar in spirit to that of Ma and Zabaras [2009] and Zeng et al. [2012] in terms of using the sparse-grid method to improve computational efficiency of Bayesian inference, our method tackles a more challenging problem of uncertainty quantification and offers more computationally efficient structures of sparse grids. Different from the previous studies of sparse-grid methods that only quantify uncertainty in flow and advection-dispersion problems, this study conducts uncertainty quantification for groundwater reactive transport models, which are significantly more nonlinear due to nonlinear reactions and coupling between flow, transport, and biogeochemical processes. The nonlinearity poses two challenges to applications of sparse-grid methods. First, if the surrogate systems of the nonlinear models are constructed using linear hierarchical basis functions as in previous groundwater applications, more sparse-grid interpolation points, i.e., more model executions, are needed to obtain the prescribed interpolation accuracy, which defeats the purpose of using sparse-grid methods. The other challenge is that the nonlinearity typically leads to an extremely complex surface of the likelihood function (or its least squares equivalent) with a large number of local minima such as those reported in Matott and Rabideau [2008] and Shi et al. (Assessment of parametric uncertainty for surface complexation modeling of uranium reactive transport, submitted to Water Resources Research, 2013) for nitrogen and uranium reactive transport, respectively. The multiple local minima correspond to multiple modes (significant or insignificant) on the surface of the PPDF. Existing algorithms may fail to capture all the significant modes or may succeed only with significantly increased computational effort. The two problems caused by nonlinearity are not limited to groundwater reactive transport models but are prevalent in nonlinear models generally.

[6] The aSG-hSC method is developed to resolve the two challenges above. To resolve the first challenge of efficiently approximating the PPDF involving nonlinear groundwater reactive transport models, the surrogate system with a sparse-grid interpolation is constructed with a high-order stochastic collocation (hSC) approach, i.e., utilizing a high-order hierarchical polynomial basis with quadratic or cubic polynomials as in Griebel [1998] and Bungartz and Griebel [2004]. Because of the increased accuracy of this basis compared to the linear hierarchical basis, the number of model executions needed for constructing the surrogate system can be greatly reduced. The high-order approach is not a trivial extension of the linear technique [Zhang et al., 2010], and this is the first time that the high-order stochastic collocation method is used not only in groundwater modeling but also in surrogate modeling for Bayesian inference. Furthermore, instead of building the approximate PPDF using isotropic sparse-grid interpolation [Nobile et al., 2008a; Barthelmann et al., 2000] or dimension-adaptive sparse-grid interpolation [Nobile et al., 2008b], a locally adaptive sparse-grid (aSG) interpolation [Griebel, 1998] is used. This technique utilizes the hierarchical surplus (discussed in section 3.2) as an error indicator to detect the nonsmooth and/or important regions in the parameter space and adaptively place more points in those regions. This results in further computational gains and guarantees that a user-defined accuracy of the surrogate system is realized.

[7] To resolve the second challenge of reducing the computational cost of constructing the surrogate system for a PPDF with multiple modes, an iterative procedure is developed for the aSG-hSC method to incorporate optimization results into the surrogate construction. Using aSG-hSC together with optimization is considered a strength, since it can leverage extensive research in the area of optimization. The design of the iterative procedure is based on the following observations. In MCMC-based Bayesian inference, large parameter ranges are often specified in the prior distribution due to lack of information. If multiple modes exist on the PPDF, there are high-probability regions around each significant mode (the definition of the high-probability regions is given in section 3 below). Markov chains move toward the high-probability regions and generate random samples by following the Metropolis rule [Gamerman and Lopes, 2006]. During this process, a large number of samples are discarded in the burn-in period or rejected by the Metropolis rule, and the model executions corresponding to these samples are wasted. This sampling procedure can be made more computationally efficient using the adaptive sparse-grid techniques, if the approximate locations of the modes are known from optimization. This motivates the iterative aSG-hSC method. In each iteration, global or local optimization is utilized to detect each significant mode of the PPDF, and the corresponding high-probability region is determined based on optimization results such as the Hessian matrix at the found optimum. Subsequently, the high-probability region is incorporated into the prior distribution, and the aSG-hSC method is used to construct the surrogate within the high-probability region. This is the key to saving computational cost, because the surrogate is not constructed over a large parameter space where a significant number of sparse grid points are blindly placed in the low-probability regions. However, there is a trade-off between the saved computational cost and that spent on optimization, which is discussed in the numerical examples in section 4. The iteration continues until all significant modes are identified according to a user-specified significance tolerance. It is demonstrated in section 4 that our method can find all the modes whose significance is larger than this tolerance. Note that the aSG-hSC method is independent of MCMC methods, so that it can be used together with any MCMC method. In addition, because both the aSG-hSC and MCMC methods are model independent, the resulting sparse-grid-based MCMC algorithms can be applied to a wide range of problems.

[8] The rest of the paper is organized as follows. In section 2, the Bayesian framework and the conventional MCMC method used in this study are briefly introduced, followed by the iterative aSG-hSC method of constructing the surrogate system presented in section 3. In section 4, the new approach is applied to reactive transport problems, and its effectiveness and efficiency in comparison with the conventional MCMC method are demonstrated.

2. Bayesian Inference and MCMC Simulation

[9] Consider the Bayesian inference problem for a nonlinear model

$$\mathbf{d} = f(\boldsymbol{\theta}) + \boldsymbol{\varepsilon}, \qquad (1)$$

where $\mathbf{d}$ is a vector of $N_d$ measurement data, $\boldsymbol{\theta}$ is a vector of $N_\theta$ model parameters, $f(\boldsymbol{\theta})$ is the forward model with $N_\theta$ inputs and $N_d$ outputs, and $\boldsymbol{\varepsilon}$ is a vector of residuals, including measurement, model parametric, and structural errors.

[10] The posterior distribution $P(\boldsymbol{\theta}|\mathbf{d})$ of the model parameters $\boldsymbol{\theta}$, given the data $\mathbf{d}$, can be estimated using Bayes' theorem [Box and Tiao, 1992] via

$$P(\boldsymbol{\theta}|\mathbf{d}) = \frac{L(\boldsymbol{\theta}|\mathbf{d})\,P(\boldsymbol{\theta})}{\int L(\boldsymbol{\theta}|\mathbf{d})\,P(\boldsymbol{\theta})\,d\boldsymbol{\theta}}, \qquad (2)$$

where $P(\boldsymbol{\theta})$ is the prior distribution and $L(\boldsymbol{\theta}|\mathbf{d})$ is the likelihood function that measures goodness-of-fit between model simulations and observations. The prior distribution can be specified using data from previous studies at similar sites or expert judgment. When prior information is lacking, a common practice is to assume uniform distributions with relatively large parameter ranges such that the prior distribution does not affect the estimation of the posterior distribution. Selection of a likelihood function appropriate to a specific problem is still an open question. Generally speaking, there are two types of likelihood functions in the literature, formal and informal. A commonly used formal likelihood function is based on the assumption that the residual term $\boldsymbol{\varepsilon}$ in equation 1 follows a multivariate Gaussian distribution, which leads to the formal Gaussian likelihood function listed in Table 1. However, the validity of the explicit Gaussian assumption in practice is often criticized, even though the Gaussian likelihood function has been used with success for decades. The informal likelihood function is designed carefully to implicitly account for errors in measurements, model inputs, and model structure and to avoid overfitting to measurement data [Beven and Binley, 1992; Smith et al., 2008; Schoups and Vrugt, 2010; Smith et al., 2010]. Several widely used informal likelihood functions in hydrology are also listed in Table 1. The definition of an informal likelihood function is problem specific in nature, and there has been no consensus on which informal likelihood function outperforms the others.

Table 1. Examples of Likelihood Functions^a
Formal Likelihood Functions
Multivariate normal (MVN) urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0015
Informal Likelihood Functions
Exponential (EXP)
Nash-Sutcliffe (NS)
Mean cumulative error (MCE)
Normalized sum of squared errors (NSSE)
  • a urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0020, mean of observations; urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0021, mean of outputs of forward model; ζ, scaling constant for the exponential likelihood function; Σ, covariance matrix of residual for the Gaussian likelihood function.

[11] The aim of this study is not to investigate how to define likelihood functions but how to efficiently build surrogate models for a chosen likelihood function using the aSG-hSC approach. As a function approximation method, aSG-hSC only requires that urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0022 in equation 2 is a continuous function, which is satisfied by all likelihood functions in the literature, including those listed in Table 1. For the sake of illustration, in the numerical examples of section 4, the Gaussian likelihood function is used for the first example and the exponential-type likelihood function for the second.
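
As an illustration of the quantity that the surrogate is meant to approximate, the following minimal sketch evaluates the logarithm of the unnormalized PPDF, i.e., the product of likelihood and prior, for a uniform prior and a Gaussian likelihood. It rests on simplifying assumptions that are not taken from the paper: the residuals are treated as independent with a common standard deviation `sigma`, and `toy_forward_model` is a hypothetical two-parameter stand-in for an expensive reactive transport model.

```python
import math


def log_gaussian_likelihood(residuals, sigma):
    """Log of an independent Gaussian likelihood with a common standard deviation."""
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * sse / sigma ** 2


def log_uniform_prior(theta, bounds):
    """Log of a uniform prior on a box; minus infinity outside the box."""
    log_volume = 0.0
    for value, (low, high) in zip(theta, bounds):
        if not (low <= value <= high):
            return -math.inf
        log_volume += math.log(high - low)
    return -log_volume


def log_unnormalized_posterior(theta, data, forward_model, sigma, bounds):
    """Log of likelihood times prior; the normalizing constant is omitted."""
    log_prior = log_uniform_prior(theta, bounds)
    if log_prior == -math.inf:
        return log_prior  # no model run is needed outside the prior support
    simulated = forward_model(theta)
    residuals = [d - s for d, s in zip(data, simulated)]
    return log_prior + log_gaussian_likelihood(residuals, sigma)


def toy_forward_model(theta):
    """Hypothetical cheap model: exponential decay sampled at three times."""
    amplitude, rate = theta
    return [amplitude * math.exp(-rate * t) for t in (0.0, 1.0, 2.0)]


toy_bounds = [(0.0, 5.0), (0.0, 2.0)]
toy_data = toy_forward_model((2.0, 0.5))  # synthetic, noise-free data
lp_true = log_unnormalized_posterior((2.0, 0.5), toy_data, toy_forward_model, 0.1, toy_bounds)
lp_off = log_unnormalized_posterior((3.0, 0.5), toy_data, toy_forward_model, 0.1, toy_bounds)
```

Every call to `log_unnormalized_posterior` inside the prior support costs one forward model run, which is the cost that a surrogate is intended to avoid.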

[12] While this study is focused on quantification of parametric uncertainty, its results can be used directly for quantification of model uncertainty. In parametric uncertainty quantification, the denominator of the Bayes' formula in equation 2 is a normalization constant that does not affect the shape of the PPDF. As such, in the hereafter discussion concerning building surrogate systems, the notation urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0023 or the terminology PPDF will only refer to the product urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0024. When extending this research to quantification of model uncertainty, the denominator, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0025, becomes critical. In the Bayesian model averaging method that considers model uncertainty due to alternative models [e.g., Ye et al., 2004, 2008, 2010], for an individual model Mk, this term becomes the model likelihood function, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0026, where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0027 is the joint likelihood function of the model and its parameters. The model likelihood function is the most critical variable for evaluating model probability used to quantify model uncertainty. Although this term can be evaluated using the aSG-hSC method, it is beyond the scope of this study.

[13] Due to the nonlinearity of the model $f(\boldsymbol{\theta})$ with respect to the parameters $\boldsymbol{\theta}$, it is often difficult to draw samples from the PPDF directly, so that MCMC methods, such as the Metropolis-Hastings (M-H) algorithm [Gamerman and Lopes, 2006] and its variants, are often used for sampling. The essence of the MCMC methods is that candidate parameter samples are drawn from a proposal distribution instead of the PPDF, and the construction of the Markov chain guarantees that the distribution of the accepted samples converges to the posterior distribution. However, in practice, the convergence is often slow when the proposal distribution deviates from the posterior distribution. Many advanced MCMC methods have been developed, and one of them is the Differential Evolution Adaptive Metropolis (DREAM) approach developed by Vrugt et al. [2008, 2009]. The DREAM algorithm runs multiple Markov chains simultaneously; all chains are viewed as members of the same population, and the sampling procedure is treated as the evolution of the population. As such, the classic proposal distribution used in the M-H algorithm is not necessary, and the jump of each Markov chain at each step is determined by differential evolution, as in a genetic algorithm. It was shown by Vrugt et al. [2008, 2009] that DREAM is generally more efficient than traditional MCMC algorithms in the absence of additional information about the PPDF. Moreover, DREAM is more advantageous in dealing with multimodal posterior distributions, which matches our goal of building a surrogate system for a posterior distribution with multiple significant modes. For these reasons, DREAM is chosen in this study as the framework of Bayesian inference. Using aSG-hSC together with DREAM is considered a strength, because it leverages a recently developed MCMC algorithm. However, it should be noted that the aSG-hSC method of building the surrogate system can be used with other MCMC algorithms.
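
For readers unfamiliar with the Metropolis rule, the following is a minimal sketch of a single-chain random-walk Metropolis sampler. It is not the DREAM algorithm used in this study; it only illustrates how candidates are proposed and then accepted or rejected, and why every candidate costs one evaluation of the (log) density. The function name `metropolis`, the Gaussian random-walk proposal, and the standard normal target are assumptions made for this example.

```python
import math
import random


def metropolis(log_density, start, step, n_samples, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal.

    log_density returns the log of the (possibly unnormalized) target
    density. Returns the chain and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_density(current)
    samples = []
    n_accepted = 0
    for _ in range(n_samples):
        proposal = [x + rng.gauss(0.0, step) for x in current]
        proposal_lp = log_density(proposal)  # one model run per candidate
        # Accept with probability min(1, p(proposal) / p(current));
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        log_u = math.log(1.0 - rng.random())
        if log_u < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            n_accepted += 1
        samples.append(list(current))
    return samples, n_accepted / n_samples


def standard_normal_log_density(theta):
    """Toy target: one-dimensional standard normal, up to a constant."""
    return -0.5 * theta[0] ** 2


chain, acceptance_rate = metropolis(standard_normal_log_density, [3.0], 1.0, 20000)
kept = [s[0] for s in chain[2000:]]  # discard a burn-in period
sample_mean = sum(kept) / len(kept)
sample_var = sum((x - sample_mean) ** 2 for x in kept) / len(kept)
```

In a surrogate-based scheme, `log_density` would evaluate the surrogate instead of the forward model, so rejected and burn-in samples would no longer consume model executions.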

3. Iterative aSG-hSC Methodology

[14] This section describes the iterative aSG-hSC method to construct the surrogate system for estimating the posterior distribution. To provide context for the method described in section 3.3, determination of the high-probability region is first introduced in section 3.1, followed by describing the high-order hierarchical polynomial basis and the adaptive sparse-grid interpolation in section 3.2.

3.1. Determining High-Probability Region as the Prior

[15] In the context of Bayesian inference, the searching region for MCMC sampling can be represented by
which is usually large due to lack of prior knowledge about the posterior distribution urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0031. In the parameter space, it is always the case that urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0032 is very small (close to zero) within a large part of Γ but significant in one or several subregions. These subregions are referred to as high-probability regions in this paper, denoted by urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0033 and rigorously defined as
where δ is a user-specified threshold, and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0035 is the maximum value of the PPDF. Equation 4 indicates that the high-probability region, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0036, is within the contour urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0037. For MCMC sampling, after convergence, all Markov chains will only move around high-probability region urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0038. Although some trial samples of MCMC can jump out of the high-probability regions, most of them will be rejected such that almost all accepted MCMC samples after burn-in period fall into urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0039. Thus, it is a waste of computational effort to build an accurate surrogate system for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0040 over the whole searching region Γ. Instead, it is computationally more efficient to approximate urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0041 in the high-probability region urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0042. Thus, we seek to define the high-probability region urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0043 for each significant mode. When the PPDF has multiple significant modes, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0044 consists of several disjoint subregions in Γ, which can be defined iteratively as discussed in section 3.3.

[16] Defining an individual high-probability region starts with searching for the global maximum of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0045 (i.e., urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0046) using global optimization. The objective function used for optimization may be different from the posterior distribution. For example, if the selected likelihood function is of an exponential type, such as the formal multivariate normal likelihood and the informal exponential likelihood in Table 1, using the logarithm of the likelihood function in optimization gives more stable results than using the likelihood function itself [Pflüger, 2005]. This strategy is also applicable to the prior distribution. Thus, for the convenience of notation, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0047 is used to represent the objective function, which can be viewed as the preprocessing of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0048 for optimization. For example, when using the uniform prior and the informal exponential likelihood in Table 1, the objective function is defined by urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0049 where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0050 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0051 are the means of the observations and of the outputs of the forward model, respectively. When using the Nash-Sutcliffe likelihood, the objective function can be defined as urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0052 to guarantee enough gradient information over the searching region. Note that the preprocessing function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0053 must be strictly monotonic so that it is invertible. While any global optimization algorithm can be used, the DIRECT algorithm is used in this study. DIRECT, first proposed by Jones et al. [1993], is a derivative-free global optimization algorithm and an improvement of the standard Lipschitzian approach that eliminates the need to specify a Lipschitz constant.
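
The benefit of optimizing the logarithm of an exponential-type likelihood can be seen from floating-point underflow. The sketch below is a hypothetical illustration, not the paper's objective function: it assumes a likelihood of the generic form exp(-zeta * misfit), where `misfit` stands for any nonnegative measure of model-data mismatch. Far from the optimum the likelihood itself underflows to zero, so an optimizer sees a flat surface, whereas the logarithm still ranks the candidates correctly. Because the logarithm is strictly increasing, both forms share the same maximizer.

```python
import math


def exp_type_likelihood(misfit, zeta):
    """Hypothetical exponential-type likelihood exp(-zeta * misfit)."""
    return math.exp(-zeta * misfit)


def log_exp_type_likelihood(misfit, zeta):
    """Logarithm of the same likelihood, evaluated without exponentiation."""
    return -zeta * misfit


# Two poor parameter candidates, one clearly worse than the other.
misfit_bad, misfit_worse = 800.0, 900.0

# In double precision both likelihood values underflow to exactly zero,
# so they cannot be told apart ...
flat = exp_type_likelihood(misfit_bad, 1.0) == exp_type_likelihood(misfit_worse, 1.0)

# ... while the log-likelihood still prefers the better candidate.
ranked = log_exp_type_likelihood(misfit_bad, 1.0) > log_exp_type_likelihood(misfit_worse, 1.0)
```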

[17] Once the global parameter optimum urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0054 is obtained, the next step is to define the high-probability region urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0055 around urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0056. This can be done by investigating how fast the posterior distribution decays to zero away from urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0057 through estimating the curvature of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0058, as more curved function decays faster. Since the second-order derivative is a measure of a function's curvature, the Hessian matrix of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0059 at urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0060, denoted by urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0061, is used to determine the high-probability region. In the case of multivariate Gaussian distribution, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0062 is the inverse of the covariance matrix and sufficient to define the Gaussian shape; in non-Gaussian cases, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0063 still provides sufficient information about the curvature of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0064 around urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0065.

[18] In this work, defining urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0066, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0067 is estimated via [Nocedal and Wright, 2006]
for the diagonal entries and
for the off-diagonal entries. Here urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0070 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0071 are vectors with zero elements except for the lth and mth entries, which are equal to properly selected steps urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0072 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0073. urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0074 can be estimated by using singular value decomposition (SVD), i.e.,
where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0076 contains the singular values urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0077, each of which characterizes the variance of the posterior distribution along the individual orthogonal singular vectors in V. Note that when urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0078 is an uncorrelated random variable, both V and U are identity matrices and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0079 characterize the variances along the axes of the parameters. urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0080 also represents a linear transform in urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0081 dimensional space where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0082 and V determine the rotation and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0083 determine the stretching along orthogonal directions. Based on these results, the high-probability region, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0084, corresponding to urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0085 is defined by transforming a unit cube urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0086 via
where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0088 is a scaled matrix of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0089 by scaling vector urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0090, i.e., the diagonal terms of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0091 are urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0092. The scaling vector urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0093 can be a user-defined constant vector that determines the volume of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0094. The urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0095 value can be easily determined in a Gaussian case based on probability tables. For example, 99.7% of the samples are within three standard deviations of the mean value. In a non-Gaussian case, although it is not straightforward to automatically find an optimal value of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0096, it can be set somewhat larger to guarantee that urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0097 covers urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0098 for most common cases.
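
The steps of this subsection (finite-difference Hessian at the optimum, singular value decomposition, and scaling to a region around the optimum) can be sketched as follows. This is one plausible reading under stated assumptions rather than a reproduction of the paper's equations, which are not legible in this text: the Hessian is taken of the negative log-density, central differences are used for both diagonal and off-diagonal entries, the half-widths along the singular directions are assumed to be `scale / sqrt(singular value)`, and the rotated region is enclosed in an axis-aligned box. The names `finite_difference_hessian` and `bounding_box` and the covariance values are hypothetical.

```python
import numpy as np


def finite_difference_hessian(g, theta_star, steps):
    """Central finite-difference approximation of the Hessian of g at theta_star."""
    theta_star = np.asarray(theta_star, dtype=float)
    n = theta_star.size
    g0 = g(theta_star)
    hessian = np.zeros((n, n))
    for l in range(n):
        e_l = np.zeros(n)
        e_l[l] = steps[l]  # step along the l-th parameter only
        hessian[l, l] = (g(theta_star + e_l) - 2.0 * g0 + g(theta_star - e_l)) / steps[l] ** 2
        for m in range(l + 1, n):
            e_m = np.zeros(n)
            e_m[m] = steps[m]
            mixed = (g(theta_star + e_l + e_m) - g(theta_star + e_l - e_m)
                     - g(theta_star - e_l + e_m) + g(theta_star - e_l - e_m))
            hessian[l, m] = hessian[m, l] = mixed / (4.0 * steps[l] * steps[m])
    return hessian


def bounding_box(theta_star, hessian, scale):
    """Axis-aligned box enclosing a rotated box centered at theta_star.

    Assumption: the half-width along the i-th singular direction is
    scale / sqrt(s_i), i.e., `scale` standard deviations when the
    density is Gaussian and the Hessian is the inverse covariance.
    """
    theta_star = np.asarray(theta_star, dtype=float)
    _, singular_values, vt = np.linalg.svd(hessian)
    half_widths = scale / np.sqrt(singular_values)
    extent = np.abs(vt.T) @ half_widths  # projection of the rotated box on each axis
    return theta_star - extent, theta_star + extent


# Hypothetical correlated Gaussian density centered at (0.5, 0.5).
mean = np.array([0.5, 0.5])
cov = np.array([[0.004, 0.002], [0.002, 0.004]])
precision = np.linalg.inv(cov)


def neg_log_density(theta):
    """Negative log of the Gaussian density, up to an additive constant."""
    diff = theta - mean
    return 0.5 * diff @ precision @ diff


hessian = finite_difference_hessian(neg_log_density, mean, steps=[1e-3, 1e-3])
lower, upper = bounding_box(mean, hessian, scale=3.0)
```

For this Gaussian example the estimated Hessian recovers the inverse covariance matrix, and the resulting box is much smaller than a unit-square search region, which is the effect that the high-probability region is designed to achieve.
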
[19] The numerical example shown in Figure 1 is used to illustrate how to define the high-probability region for both Gaussian and non-Gaussian densities. Consider two two-dimensional density functions, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0099 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0100, defined by
where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0102 is a Gaussian density with mean urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0103 and covariance matrix urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0104. While urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0105 is pG itself, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0106 is non-Gaussian. The contours of the two density functions are shown in Figures 1a and 1b, respectively. The searching region is set to be large with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0107. The maxima of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0108 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0109 are 44.5 found at (0.5,0.5) and (0.5, 0.4), respectively. The desired urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0110 in equation 4 is set as 0.01 ( urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0111), meaning that the defined high-probability region should cover the area within the contour of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0112. Figure 1 plots the high-probability regions for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0113 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0114. In Figure 1a, for the Gaussian density, the prior region with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0115 can cover urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0116 very well; in Figure 1b, for the non-Gaussian density, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0117 is needed to fully cover the contour of 0.01. In either case, the prior regions are dramatically smaller than the initial search region, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0118.
(a) Contours of Gaussian density urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0119 and its high-probability regions defined with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0120 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0121. (b) Contours of non-Gaussian density urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0122 and its high-probability regions defined with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0123 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0124.

[20] Defining such high-probability regions for MCMC simulation has two advantages. First, since the volume of the high-probability region is often significantly smaller than the searching region, the computational cost of building the surrogate system of desired accuracy can be considerably reduced. In addition, because the high-probability region well covers the significant mode of the urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0125, the initial samples of Markov chains can be generated within such regions, which will significantly accelerate the convergence of MCMC sampling. While both the global optimization and the calculation of the Hessian matrix require forward model executions, such computational expense is worthwhile as long as more computational cost can be saved by working on the prior regions due to the two advantages. Although global optimization is used here, it may not be necessary in practice, because the aSG-hSC method is expected to perform well as long as the prior regions include the mode. In other words, a rough estimation of the location of each significant mode is sufficient for our method. This can be achieved by using local optimization techniques if information about the shape of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0126 is available, which will further reduce computational cost.

3.2. Adaptive Sparse-Grid High-Order Stochastic Collocation Method

[21] After obtaining the high-probability region urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0127 using equation 8 for the significant mode of the PPDF around urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0128 in the parameter space, the next task is to build the surrogate model for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0129 on urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0130 using the aSG-hSC method. The method is only briefly described here; for more details, refer to Griebel [1998], Barthelmann et al. [2000], Bungartz and Griebel [2004], and Klimke and Wohlmuth [2005]. Since the methods of building surrogate systems are generally applicable to any function governed by partial differential equations, not only to urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0131, a general function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0132 is used for the method description.

3.2.1. One-Dimensional Hierarchical Interpolation

[22] The basis of constructing the desired sparse-grid approximation in the multidimensional setting is the one-dimensional (1-D) hierarchical interpolation. Consider a function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0133 where the standard domain [0,1] can be rescaled to any bounded domain by translation and dilation. The 1-D hierarchical Lagrange interpolation formula is defined by
where the incremental interpolation operator urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0135 is given as
The nonnegative integer L in equation 10 is called the resolution level of the hierarchical interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0137 and the summation over the resolution level in equation 10 exhibits the hierarchical structure of the interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0138. For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0139, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0140 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0141 in equation 11 are the basis functions and the interpolation coefficients for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0142, respectively. For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0143, the integer mi in equation 11 is the number of interpolation points involved in urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0144, which is defined by
A uniform grid, denoted by urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0146, can be utilized for the incremental interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0147. The abscissas of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0148 are defined by
Then, the hierarchical grid for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0150 is defined by
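A small sketch of the nested one-dimensional grid may help fix ideas. The abscissas below are an assumption: they follow the uniform nested construction commonly used with hierarchical bases (one midpoint on level 0, the two boundary points on level 1, and the odd multiples of 2^(-i) on level i ≥ 2), which is consistent with the point counts quoted later in this section (9 points per dimension at L = 3 and 65 points for a six-level grid). The helper names `level_points` and `hierarchical_grid` are hypothetical.

```python
import numpy as np

def level_points(i):
    """Abscissas newly introduced on level i (assumed nested uniform grid on [0, 1])."""
    if i == 0:
        return np.array([0.5])
    if i == 1:
        return np.array([0.0, 1.0])
    j = np.arange(1, 2 ** (i - 1) + 1)
    return (2 * j - 1) / 2.0 ** i

def hierarchical_grid(L):
    """Union of the incremental grids on levels 0, ..., L, sorted."""
    return np.sort(np.concatenate([level_points(i) for i in range(L + 1)]))

for L in range(4):
    print(L, hierarchical_grid(L))
```

Under this assumption each level only adds points that are not already present, so model executions from coarser levels are reused when the resolution level is increased.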

[23] Since the representation of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0152 depends on the properties of the selected basis function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0153, the basis functions are discussed first. Unlike previous studies that use linear hierarchical basis functions to build surrogate systems, this study uses high-order hierarchical polynomial basis functions, including the quadratic and cubic hierarchical bases defined by Bungartz and Griebel [2004], in order to improve the accuracy and efficiency of constructing the surrogate system. Expressions of the linear, quadratic, and cubic hierarchical polynomial bases are provided below.

[24] In the case of linear hierarchical basis, for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0154, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0155
For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0157, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0158,
where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0160.
[25] In the case of quadratic hierarchical basis, for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0161, the basis is the same as the linear hierarchical basis defined in (15). For i = 1, j = 1 and 2, define urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0162 and set
For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0165, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0166,
where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0168 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0169.
[26] In the case of cubic hierarchical basis, for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0170, the basis is the same as the linear hierarchical basis defined by equation 15. For and i = 1, the basis is the same as the quadratic hierarchical basis defined by equations 17 and 18. For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0171, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0172 and j is odd,
where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0174 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0175 For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0176, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0177 and j is even,
where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0179 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0180
One-dimensional three-level hierarchical bases: linear basis (left), quadratic basis (middle), and cubic basis (right).
[27] Figure 2 depicts the hierarchical basis functions from level 0 to level 3 for the linear, quadratic, and cubic bases. The definitions of basis and Figure 2 show that on each level urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0181
Based on equations 10, 11, 22, and the interpolatory property of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0183, i.e., urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0184 for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0185 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0186, representations of the coefficient urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0187 are derived as follows. For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0188,
and for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0190, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0191,
The coefficient urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0193 is defined as the hierarchical surplus of the basis function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0194, which is the difference between the value of the interpolated function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0195 and the value of the interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0196 at urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0197. As discussed in Klimke and Wohlmuth [2005] and Ma and Zabaras [2009], when the function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0198 is smooth with respect to θ, the magnitude of the surplus urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0199 approaches zero as the resolution level i increases. Therefore, the surplus can be used as an error indicator for the interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0200 in order to guide the sparse grid refinement.
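The level-by-level computation of the surpluses can be illustrated with the linear hierarchical basis only. The sketch below is not the authors' implementation: the grid abscissas and the hat-function form of the linear basis are assumptions based on the standard construction, the test function sin(πθ) is chosen only because it is smooth, and all names (`level_points`, `hat`, `build_interpolant`) are hypothetical.

```python
import numpy as np

def level_points(i):
    """Abscissas newly introduced on level i (assumed nested uniform grid)."""
    if i == 0:
        return np.array([0.5])
    if i == 1:
        return np.array([0.0, 1.0])
    j = np.arange(1, 2 ** (i - 1) + 1)
    return (2 * j - 1) / 2.0 ** i

def hat(i, center, x):
    """Assumed linear basis: constant on level 0, hat of half-width 2^-i otherwise."""
    if i == 0:
        return np.ones_like(x)
    return np.maximum(0.0, 1.0 - 2.0 ** i * np.abs(x - center))

def build_interpolant(f, L):
    terms = []  # list of (level, node, surplus)

    def evaluate(x):
        x = np.asarray(x, dtype=float)
        total = np.zeros_like(x)
        for i, center, surplus in terms:
            total = total + surplus * hat(i, center, x)
        return total

    for i in range(L + 1):
        nodes = level_points(i)
        # Surplus = function value minus the interpolant built from levels < i.
        surpluses = f(nodes) - evaluate(nodes)
        terms.extend((i, xc, c) for xc, c in zip(nodes, surpluses))
    return terms, evaluate

f = lambda x: np.sin(np.pi * x)
terms, interpolant = build_interpolant(f, L=5)
max_surplus = [max(abs(c) for i, _, c in terms if i == level) for level in range(6)]
print(np.round(max_surplus, 4))
```

For this smooth function the largest surplus on each level shrinks steadily once the boundary points have been added, which is the behavior that makes the surplus usable as a refinement indicator.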

3.2.2. Multidimensional Isotropic Sparse-Grid Interpolation

[28] Based on the one-dimensional hierarchical interpolation discussed in section 3.2.1, one can construct an approximation for a multivariate function, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0201, where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0202. Starting from the isotropic sparse-grid interpolation, analogous to the definitions of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0203 in equation 10 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0204 in equation 11, define the multidimensional hierarchical interpolation formula as
and the multidimensional incremental interpolation operator urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0206 is defined by
where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0208 is a multi-index of the resolution level of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0209, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0210 is a strictly increasing function, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0211 belonging to the multi-index set
and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0213 is the hierarchical surplus. urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0214 is the multidimensional hierarchical basis function defined by
where for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0216, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0217 is the one-dimensional hierarchical basis function. The multidimensional grid points urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0218 are defined corresponding to the basis urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0219. Equation 26 shows that the multidimensional incremental interpolation operator on level urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0220 is the tensor product of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0221 one-dimensional incremental interpolation operators. This is the reason that the notation urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0222 can be used to illustrate the tensor-product operation. In the following discussion, the equivalent notation urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0223 is used to denote the incremental interpolation operator. The grids for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0224 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0225, denoted by urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0226 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0227, respectively, are represented as
Note that urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0229 involves a total of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0230 grid points. In addition, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0231 is composed of several incremental interpolants. Thus, the definition of the function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0232 in equation 25 determines the number of grid points involved in urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0233 and also the structure of the resulting grid. The following two definitions are given corresponding to the full tensor-product grids and isotropic sparse grids:
An L-level full tensor-product interpolant needs urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0235 grid points, where mi is defined in equation 12. This is also the number of model executions needed when building the surrogate system. Using the full tensor-product formulation, the number of grid points grows exponentially with the number of random parameters urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0236; this is the curse of dimensionality that arises as the dimension urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0237 increases. By virtue of the second definition of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0238 in equation 30, corresponding to the isotropic sparse-grid interpolation, the curse of dimensionality can be resolved.

[29] Figure 3 illustrates how the curse of dimensionality is resolved, using the construction of a two-dimensional (urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0239) level L = 3 isotropic sparse grid as an example. The definitions of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0240 in equation 30 show that an L-level isotropic sparse grid is a subgrid of an L-level full tensor-product grid. The resolution level in one dimension can be urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0241, and 3 as shown in the top horizontal lines in Figure 3a. The same is true for the other dimension as shown in the left vertical lines. There are a total of 16 subgrids in Figure 3a, each of which corresponds to an incremental interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0242 in equation 25, where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0243 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0244. Different combinations of i1 and i2 with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0245 lead to all 16 subgrids in Figure 3a, the union of which constitutes the level L = 3 full tensor-product grid with the 81 grid points shown in Figure 3c. In comparison, different combinations of i1 and i2 with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0246 lead to only the 10 subgrids above the dashed line in Figure 3a, the union of which constitutes the level L = 3 isotropic sparse grid with only the 29 grid points shown in Figure 3b. This reduction is significant even though the maximum number of interpolation points in each dimension is the same for both grids. Generally speaking, an isotropic sparse grid contains approximately urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0249 points where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0250, whereas a full tensor-product grid contains urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0251 points [Nobile et al., 2008a]. 
Although significantly fewer points are used, the accuracy of the sparse-grid interpolation does not appreciably deteriorate compared to that of the full tensor-product interpolation [Barthelmann et al., 2000; Bungartz and Griebel, 2004]. Thus, in the sequel, the definition of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0252 in equation 25 is fixed to be urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0253 and referred to in equation 25 as an isotropic sparse-grid interpolant.
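The point counts in this example can be checked with a short enumeration. The sketch assumes the nested uniform one-dimensional abscissas (1, 2, 2, and 4 new points on levels 0 to 3) and takes the sparse-grid index set to be i1 + i2 ≤ L and the full tensor-product index set to be max(i1, i2) ≤ L; the function names are hypothetical.

```python
import itertools
import numpy as np

def level_points(i):
    """Abscissas newly introduced on level i (assumed nested uniform grid)."""
    if i == 0:
        return np.array([0.5])
    if i == 1:
        return np.array([0.0, 1.0])
    j = np.arange(1, 2 ** (i - 1) + 1)
    return (2 * j - 1) / 2.0 ** i

def build_grid(L, dim, sparse):
    """Return (grid points, number of subgrids) for a level-L grid."""
    points, n_subgrids = set(), 0
    for idx in itertools.product(range(L + 1), repeat=dim):
        if sparse and sum(idx) > L:
            continue  # subgrid below the dashed line: not part of the sparse grid
        n_subgrids += 1
        # Each subgrid is the tensor product of 1-D incremental grids.
        for p in itertools.product(*(level_points(i) for i in idx)):
            points.add(tuple(float(v) for v in p))
    return points, n_subgrids

sparse_pts, sparse_sub = build_grid(3, 2, sparse=True)
full_pts, full_sub = build_grid(3, 2, sparse=False)
print(sparse_sub, len(sparse_pts))  # 10 29
print(full_sub, len(full_pts))      # 16 81
```

Raising `dim` in the same enumeration shows how quickly the gap between the two counts widens as parameters are added.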

Comparison of a three-level isotropic sparse grid in (b) and the corresponding full tensor-product grid in (c) based on Newton-Cotes points. The sparse grid in Figure 3b consists of the 10 coarse subgrids (black boxes above the dashed line in (a)), each of which is a coarse tensor-product grid with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0247. The sparse grid in (b) has only 29 points; the full tensor-product grid in Figure 3c, constructed from all 16 subgrids in (a) with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0248, has 81 points.
[30] The coefficients urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0254 can be computed analogous to the one-dimensional case and following the discussion in Klimke and Wohlmuth [2005]. For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0255, i.e., urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0256, the coefficients are calculated as
For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0258, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0259 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0260, the coefficients are evaluated via
Next, we explain how to construct an adaptive sparse grid (as opposed to an isotropic sparse grid) using the higher-order hierarchical polynomials defined in equations 15-21.

3.2.3. Adaptive Sparse-Grid Interpolation

[31] As discussed above, if the function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0262 is smooth with respect to urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0263, the magnitude of the hierarchical surplus will decay to zero as the resolution level L of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0264 increases. The smoother the function, the faster the surplus decays. This feature is the basis of constructing adaptive sparse grids using the surplus as an error indicator. We start from the construction of one-dimensional adaptive grids and then extend it to multidimensional sparse grids. As shown in Figure 4, the one-dimensional isotropic hierarchical grid has a tree-like structure: a grid point urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0265 on level i has two children, namely urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0266 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0267, on level i + 1. Special treatment is required when moving from level 1 to level 2, because only one child point is added on level 2 for nodes urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0268 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0269. On each successive interpolation level, the basic idea of adaptivity is to use the hierarchical surplus as an error indicator to detect the smoothness of the target function and to refine the grid by adding two new points on the next level for each point at which the magnitude of the surplus exceeds the prescribed error tolerance.

A six-level adaptive sparse grid for interpolating a one-dimensional function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0270 on [0,1] with the error tolerance of 0.01. The resulting adaptive sparse grid has only 21 points (black points), whereas the full grid has 65 points (black and gray points).

[32] The adaptivity concept is illustrated in Figure 4, where the six-level adaptive grid is used to interpolate the Gaussian kernel function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0271 on [0,1] with the error tolerance being 0.01. From level 0 to level 2, because the magnitude of every surplus is larger than 0.01, two points are added for each grid point, except that only one point is added for each grid point on level 1. On level 3, since the surplus is larger than 0.01 at only one point, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0272, two new points are added after this point on level 4. When this procedure continues through levels 5 and 6, it leads to the six-level adaptive grid with only 21 points (points in black in Figure 4), whereas the six-level nonadaptive (isotropic) grid has a total of 65 points (points in black and gray in Figure 4).
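The refinement rule described above can be sketched in one dimension as follows. This is an illustration under stated assumptions, not a reproduction of Figure 4: the linear hat basis and the parent-child relations are the standard ones, and the Gaussian kernel used here (centre 0.4, width 0.1) is a hypothetical stand-in for the function in the figure, so the resulting number of points need not equal 21.

```python
import numpy as np

def hat(i, center, x):
    """Assumed linear basis: constant on level 0, hat of half-width 2^-i otherwise."""
    if i == 0:
        return np.ones_like(x)
    return np.maximum(0.0, 1.0 - 2.0 ** i * np.abs(x - center))

def children(i, center):
    """Children of a node on level i; level-1 nodes have a single child."""
    if i == 0:
        return [0.0, 1.0]
    if i == 1:
        return [0.25] if center == 0.0 else [0.75]
    h = 2.0 ** -(i + 1)
    return [center - h, center + h]

def adaptive_grid(f, L, tol):
    terms = []  # list of (level, node, surplus)

    def evaluate(x):
        x = np.asarray(x, dtype=float)
        total = np.zeros_like(x)
        for i, center, surplus in terms:
            total = total + surplus * hat(i, center, x)
        return total

    active = [0.5]
    for i in range(L + 1):
        if not active:
            break  # every surplus on the previous level was below tol
        nodes = np.array(active)
        surpluses = f(nodes) - evaluate(nodes)
        active = []
        for xc, c in zip(nodes, surpluses):
            terms.append((i, float(xc), float(c)))
            if abs(c) > tol:  # refine only where the indicator is large
                active.extend(children(i, float(xc)))
    return terms, evaluate

f = lambda x: np.exp(-((x - 0.4) / 0.1) ** 2)  # hypothetical Gaussian kernel
terms, interpolant = adaptive_grid(f, L=6, tol=0.01)
print(len(terms), "adaptive points versus", 2 ** 6 + 1, "on the full grid")
```

Points are added only below nodes whose surplus exceeds the tolerance, so the grid stays coarse where the kernel is flat and becomes fine near its peak.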

[33] It is straightforward to extend the adaptivity from one-dimensional grids to multidimensional sparse grids. The isotropic level L sparse grid urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0273 in equation 29 can be rewritten as
where the grid points have the tree-like structure in each dimension. For example, a point urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0275 has two children points in each direction, so that it has a total of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0276 children. For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0277, the two children of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0278, denoted by urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0279 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0280, are represented by
where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0282 with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0283 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0284. Note that the children of each sparse-grid point on level urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0285 belong to the sparse-grid point set of level urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0286. Adding children points is to perform the sparse-grid interpolation from level urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0287 to level urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0288. In this way, the sparse grid is refined locally without breaking the structure of sparse grids.
[34] For a prescribed error tolerance α, the adaptive sparse-grid interpolant is defined as
where the multi-index set urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0290 is defined by modifying the multi-index set urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0291 in equation 27, i.e.,
Thus, the level L adaptive sparse-grid interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0293 in equation 35 only retains the terms of the isotropic sparse-grid interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0294 in equation 25 for which the magnitudes of the corresponding surpluses are larger than α. The corresponding adaptive sparse grid can be represented by
which is a subgrid of the level L isotropic sparse grid urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0296 in equation 29. If the tolerance urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0297, the adaptive sparse-grid interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0298 is equivalent to the isotropic sparse-grid interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0299 in equation 25; if urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0300, it will adaptively select which points are added to the sparse grid. Consequently, the sparse-grid points become concentrated in the nonsmooth region, e.g., where oscillations or sharp transitions occur, to guarantee the prescribed accuracy of the interpolation. On the other hand, in the region where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0301 is very smooth, e.g., insensitive to certain parameters, this approach saves a significant number of grid points while still achieving the prescribed accuracy.

[35] In practice, for a specific urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0302-dimensional target function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0303, the total number of sparse grid points and the accuracy of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0304 can be controlled by two user-defined constants L and α, where L defines the maximum allowable resolution of the sparse grid and the error tolerance α is used to guide mesh refinement. L is usually set to a large value according to the maximum affordable computational cost, and α is set to the desired accuracy of the interpolation, which makes the best use of the available computational resources. The mesh refinement stops in one of two ways: when the magnitudes of all surpluses on the current level are smaller than α, or when the maximum level L is reached.

3.3. Algorithm for Iterative Construction of the Surrogate PPDF

[36] Using the procedure of defining high-probability regions and the aSG-hSC method discussed in sections 3.1 and 3.2, respectively, one can iteratively construct the surrogate system for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0305 with multiple modes due to the nonlinearity of the reactive transport models. Figure 5 shows the flowchart of the iterative algorithm that sequentially captures all the significant modes. As shown in Figure 5, the algorithm starts from defining the searching region Γ of the Bayesian inference (also used for global optimization) and the objective function of global optimization urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0306. As discussed in Section 3.1, since the preprocessing function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0307 must be invertible, i.e., urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0308 exists, we also use urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0309 as the target function of the sparse-grid approximation due to numerical stability issues studied in Pflüger [2005]. For example, when using the formal Gaussian likelihood function or the informal exponential likelihood function in Table 1, the logarithm of the likelihood function is used as the target function for both optimization and the surrogate system.

Flowchart of the algorithm of iteratively constructing surrogate system of a posterior probability density function with multiple modes.
[37] The initial surrogate system urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0310 has no component, i.e., urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0311. In the first iteration (k = 1) in Figure 5, global optimization is used to search for the global optimum urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0312 of the function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0313 through the global optimization operator urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0314. urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0315 is the highest peak of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0316, so that urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0317 is the most significant mode on urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0318. Subsequently, the inverse of the Hessian matrix urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0319 of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0320 is calculated to determine the prior region urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0321 around urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0322 based on equation 8 and the user-defined vector urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0323. On urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0324, the adaptive sparse-grid interpolant
is constructed by setting urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0326 in equation 35. Note that to generate sparse grids on an irregularly shaped region, e.g., the prior regions in Figure 1, one needs to generate the sparse grid abscissas in the unit cube urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0327 and then map them onto urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0328 by the transformation urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0329 in equation 8. urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0330 is the first component of the surrogate system urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0331. After that, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0332 is updated to urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0333, where urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0334 is the characteristic function of the prior region urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0335.
[38] The following iterations, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0336, are to find other modes of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0337 with the previous modes excluded and to build the surrogate system around the modes. Specifically speaking, in the kth iteration, the k – 1 components of the surrogate system are excluded from the optimization, and the optimization operator urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0338 is applied to only
where for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0340, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0341 is the mth component of the surrogate system defined on the domain urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0342, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0343 is the characteristic function of the region urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0344 that avoids overlap of different prior regions. The maximum urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0345 represents the kth highest peak (mode) of the urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0346. Since the mode becomes less significant when k increases, the significance ratio
is used to terminate the iteration when urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0348 is too small to be significant. If the ratio is smaller than the user-defined significance threshold δ, for example, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0349, then the height of the peak of the posterior distribution at urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0350 is negligible in comparison with the highest peak at urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0351. As a result, there is no need to construct a surrogate component for such a negligible mode. Whenever a new mode urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0352 is found, the corresponding sparse-grid approximation urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0353 is constructed and added to the surrogate system urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0354. The final surrogate system for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0355 is
with M components for M significant modes. The total number of model executions for constructing urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0357 in equation 41 consists of those for global optimization, estimation of the Hessian matrix, and adaptive sparse-grid interpolation.
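The control flow of the iterative mode search can be sketched with heavily simplified ingredients. In the sketch below, an exhaustive search over a candidate grid stands in for global optimization, a fixed axis-aligned box stands in for the Hessian-based high-probability region, and no sparse-grid component is built; the significance ratio is assumed to be the posterior value at the kth peak divided by that at the highest peak. The bimodal target and all names are hypothetical.

```python
import numpy as np

def find_significant_modes(log_post, candidates, half_width, delta):
    """Sequentially locate modes, excluding a box around each one found.

    Stand-ins (assumptions): grid search replaces global optimization and a
    fixed box of the given half-width replaces the Hessian-based region.
    """
    values = np.array([log_post(theta) for theta in candidates])
    available = np.ones(len(candidates), dtype=bool)
    modes = []  # list of (location, log-posterior value)
    while available.any():
        k = int(np.argmax(np.where(available, values, -np.inf)))
        # Assumed significance ratio: p(theta_k) / p(theta_1), computed in log space.
        if modes and np.exp(values[k] - modes[0][1]) < delta:
            break  # remaining peaks are negligible
        modes.append((candidates[k].copy(), values[k]))
        # Exclude the region of this mode from later searches.
        inside = np.all(np.abs(candidates - candidates[k]) <= half_width, axis=1)
        available &= ~inside
    return modes

def log_post(theta):
    """Hypothetical bimodal target: peaks of relative height 1 and 0.5."""
    a, b, s = np.array([0.3, 0.3]), np.array([0.7, 0.7]), 0.05
    bump = lambda c: np.exp(-np.sum((theta - c) ** 2) / (2.0 * s ** 2))
    return np.log(bump(a) + 0.5 * bump(b))

axis = np.linspace(0.0, 1.0, 41)
candidates = np.array([[x, y] for x in axis for y in axis])
modes = find_significant_modes(log_post, candidates, half_width=0.2, delta=0.01)
for location, value in modes:
    print(location, np.exp(value))
```

The excluded box must be large enough that the tail of a captured mode outside it falls below the significance threshold; otherwise the search would report a point on the box boundary as a new mode. This mirrors the role of the scaling vector in defining the high-probability region.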

4. Numerical Examples of Groundwater Reactive Transport Modeling

[39] To illustrate the effectiveness and efficiency of the iterative aSG-hSC method in building the surrogate system for the PPDF, the method is applied to two synthetic examples of groundwater reactive transport modeling. The first example, adapted from Sun et al. [1999], considers multispecies reactive transport with six random parameters. Since the five reactions involved in this example are linear, it can be used to evaluate the computational efficiency and accuracy of the aSG-hSC approach in approximating high-dimensional posterior distributions with a high-order hierarchical basis. The second example, revised from Kohler et al. [1996], concerns reactive transport of uranium (VI) in a column experiment with four random parameters. This example is more complicated, since it includes nonlinear surface complexation reactions. For the same reactive transport model used in this study, Shi et al. [submitted manuscript, 2013] found that the PPDF of the model parameters is non-Gaussian and has multiple modes. This example can therefore be used to evaluate the computational efficiency and effectiveness of the iterative aSG-hSC method for a PPDF with multiple modes. To demonstrate that the aSG-hSC method is not limited to the Gaussian likelihood function, the informal likelihood function of the exponential type (Table 1) is used in the second numerical example. The aSG-hSC method is evaluated by comparing the results of aSG-hSC-based MCMC with those of the DREAM-based MCMC in approximating the PPDFs of model parameters and the PDFs of model predictions. Computational efficiency of aSG-hSC is evaluated from two perspectives: (1) the number of model executions required to obtain an estimate of the PPDF within a prescribed accuracy, and (2) the accuracy of the approximate PPDF for a given number of model executions. These two criteria are complementary: the first applies when a large number of model executions is affordable, and the second when only a limited number of model executions is affordable. For the second numerical example, nonlinear regression is also conducted using UCODE_2005 [Poeter et al., 2008] to estimate a local parameter optimum and quantify parameter uncertainty. Because of the high model nonlinearity, the nonlinear regression cannot identify the multiple modes on the PPDF of one parameter and cannot accurately quantify parameter uncertainty.

4.1. Case 1: Multispecies Reactive Transport

[40] This numerical example considers the transport of multiple reactive species coupled by a serial-parallel reaction network in a uniform flow field discussed in Sun et al. [1999]. As shown in Figure 6, species A has one child species B, and B has three child species C1, C2, and C3. The governing equations of the simultaneous transport and degradation of the five species involved in the serial-parallel reaction network are as follows:
where CA, CB, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0359, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0360, and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0361 are the concentrations of the five species A, B, C1, C2, and C3, respectively; t is the time; x is the spatial location in the domain [0,40]; v is the constant flow velocity; D is the dispersion coefficient; kA, kB, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0362, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0363, and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0364 are the reaction rates of the species; yB is the stoichiometric yield factor that describes the production of B from its parent species A; and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0365, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0366, and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0367 are the corresponding yield factors for C1, C2, and C3.
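For orientation, a system consistent with the symbols defined above can be written as follows. This is a sketch assuming one-dimensional advection-dispersion with first-order sequential decay, the usual form for a serial-parallel network of this kind; it is an assumption based on those definitions and on Figure 6, not a transcription of equation 42.

```latex
\begin{aligned}
\frac{\partial C_A}{\partial t} + v\,\frac{\partial C_A}{\partial x} - D\,\frac{\partial^2 C_A}{\partial x^2} &= -k_A C_A, \\
\frac{\partial C_B}{\partial t} + v\,\frac{\partial C_B}{\partial x} - D\,\frac{\partial^2 C_B}{\partial x^2} &= y_B k_A C_A - k_B C_B, \\
\frac{\partial C_{C_i}}{\partial t} + v\,\frac{\partial C_{C_i}}{\partial x} - D\,\frac{\partial^2 C_{C_i}}{\partial x^2} &= y_{C_i} k_B C_B - k_{C_i} C_{C_i}, \qquad i = 1, 2, 3.
\end{aligned}
```

Each right-hand side is linear in the concentrations, which is the sense in which the reactions of this example are linear.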
Figure 6. Serial-parallel reaction network in Case 1 of multispecies reactive transport.

[41] Using the parameter values given by Sun et al. [1999], synthetic data are generated by solving equation 42 at time t = 40 and 10 points of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0368 using numerical code PHT3D [Prommer and Post, 2010]. A total of 50 concentrations are generated for the five species, and they are corrupted with 3% Gaussian random noise; the corrupted data are treated as measurements. In the DREAM-based MCMC simulation, the six parameters listed in Table 2 are considered as unknown parameters, which are the dispersion D and the logarithm of the five reaction rates, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0369, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0370, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0371, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0372, and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0373. Their true values are listed in Table 2; the other parameters are fixed at their true values of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0374.
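The corruption step can be illustrated as follows. This is a minimal sketch that assumes "3% Gaussian random noise" means zero-mean Gaussian noise whose standard deviation is 3% of each true value; the function name, the seed, and the concentration values are hypothetical and are not PHT3D output.

```python
import random

def add_relative_gaussian_noise(values, relative_std=0.03, seed=0):
    """Return values perturbed by zero-mean Gaussian noise proportional to each value."""
    rng = random.Random(seed)  # fixed seed so the synthetic data are reproducible
    return [v * (1.0 + relative_std * rng.gauss(0.0, 1.0)) for v in values]

# Hypothetical noise-free concentrations standing in for the simulated values.
true_concentrations = [1.0, 0.8, 0.5, 0.2, 0.05]
measurements = add_relative_gaussian_noise(true_concentrations)
for c_true, c_obs in zip(true_concentrations, measurements):
    print(f"true = {c_true:.4f}, measured = {c_obs:.4f}")
```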

Table 2. True Parameter Values and Searching Region, Γ, of the Parameters in Case 1
Parameter     D        log kA         log kB         log kC1        log kC2        log kC3
True value    10       −1.6094        −2.3026        −3.9120        −3.9120        −3.9120
Γ             [1,20]   [−10, −0.1]    [−10, −0.1]    [−10, −0.1]    [−10, −0.1]    [−10, −0.1]
[42] The surrogate system is constructed using the aSG-hSC algorithm discussed in section 3.3. The large searching region Γ of the six parameters is listed in Table 2. The global optimization takes 1034 model executions to find the first maximum
of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0381 ( urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0382 to guarantee that g is positive for numerical convenience). The corresponding urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0383 is −7.9128. Computing the Hessian matrix urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0384 using equations 5 and 6 requires 73 model executions. The inverse of the Hessian matrix is
Using equation 8 with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0386 leads to the prior region, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0387, for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0388. The diagonal entries of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0389 indicate that urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0390 is significantly smaller than Γ. Therefore, building the surrogate system on the high-probability region greatly reduces the computational cost compared with building it on the whole searching domain Γ. Subsequently, the adaptive sparse-grid interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0391 in equation 35 is constructed on urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0392 by setting urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0393, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0394, L = 20, and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0395. This is the first component urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0396 of the surrogate system urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0397. The sparse-grid interpolant is constructed using the linear, quadratic, and cubic basis functions shown in Figure 2. The numbers of model executions needed for the three interpolants are 6760, 1909, and 1299, respectively, which are also the numbers of points in the three corresponding adaptive sparse grids.
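The role of the hierarchical surplus as a refinement indicator can be shown with a deliberately simplified example. The sketch below is one-dimensional, uses only the linear hat basis on [0, 1], and assumes a function that vanishes at both ends of the interval; it is not the multidimensional high-order interpolant of equation 35. The names `tol`, `min_level`, and `max_level`, and the test function `peaked`, are hypothetical.

```python
import math

def hat(level, index, x):
    """Linear hierarchical hat function centered at index / 2**level on [0, 1]."""
    return max(0.0, 1.0 - abs(x * 2 ** level - index))

def evaluate(nodes, x):
    """Evaluate the interpolant: sum of surplus times basis over all stored nodes."""
    return sum(w * hat(l, i, x) for l, i, w in nodes)

def build_adaptive_interpolant(f, tol, min_level, max_level):
    nodes = []         # accepted nodes as (level, index, surplus)
    active = [(1, 1)]  # points to be added at the current level
    level = 1
    while active and level <= max_level:
        # Surplus = function value minus the prediction of the coarser levels.
        new_nodes = [(l, i, f(i / 2 ** l) - evaluate(nodes, i / 2 ** l))
                     for l, i in active]
        nodes.extend(new_nodes)
        # Always refine below min_level; afterwards only where the surplus is large.
        active = [(l + 1, child)
                  for l, i, w in new_nodes
                  if l < min_level or abs(w) > tol
                  for child in (2 * i - 1, 2 * i + 1)]
        level += 1
    return nodes

def peaked(x):
    """Hypothetical sharply peaked target, practically zero at x = 0 and x = 1."""
    return math.exp(-((x - 0.3) / 0.05) ** 2)

nodes = build_adaptive_interpolant(peaked, tol=1e-3, min_level=3, max_level=12)
test_points = [j / 1000 for j in range(1001)]
max_error = max(abs(peaked(x) - evaluate(nodes, x)) for x in test_points)
print(f"{len(nodes)} points, maximum error {max_error:.2e}")
```

Each stored node costs one function evaluation, so the number of nodes plays the role of the number of model executions. The grid points concentrate around the peak, while the flat part of the domain is left at the coarse initial levels.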
[43] After constructing the first component urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0398, the second maximum of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0399 is obtained by conducting the global optimization on the remainder urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0400. The second round of optimization takes 1359 model executions to find
with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0402. If one sets the significance tolerance in Figure 5 to urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0403, the significance ratio defined by equation 40
is negligible. Having only one mode is not surprising, given the linear reactions. Therefore, there is no need to construct the surrogate component for the second mode and the iteration is terminated.

[44] With the surrogate systems, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0405, constructed using the linear, quadratic, and cubic hierarchical bases, the PPDFs of the model parameters are estimated by conducting MCMC simulations. A DREAM-based MCMC simulation without the surrogate systems is also conducted; it is referred to as the conventional MCMC, and its results are used as the reference to evaluate the accuracy and efficiency of the three basis functions. All the MCMC simulations are conducted using the same searching domain Γ listed in Table 2. The prior distribution of each parameter is assumed to be a uniform distribution with bounds equal to those of the searching domain. Each MCMC simulation draws 60,000 parameter samples using three Markov chains, each of which evolves for 20,000 generations. Convergence of the Markov chains is examined using the Gelman-Rubin R statistic [Gelman et al., 1995], which indicates that the chains converge after 720, 840, 970, and 760 generations for the linear, quadratic, and cubic surrogates and the conventional MCMC, respectively. For simplicity, the first 1000 generations of each chain are discarded in all four simulations, and the remaining urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0406 samples are used to estimate the PPDF.
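The convergence check can be sketched as follows for a single parameter. This is the classic form of the Gelman-Rubin statistic, built from the within-chain and between-chain variances; the exact variant computed by DREAM is not specified here, and the synthetic chains below are hypothetical stand-ins for MCMC output.

```python
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter from equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.mean(c) for c in chains]
    within = statistics.mean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)                      # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(1)
# Three chains sampling the same distribution, and three stuck at different levels.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
stuck = [[rng.gauss(3.0 * c, 1.0) for _ in range(2000)] for c in range(3)]

burn_in = 100  # hypothetical burn-in for the toy chains
r_mixed = gelman_rubin([c[burn_in:] for c in mixed])
r_stuck = gelman_rubin([c[burn_in:] for c in stuck])
print(f"R (mixed chains) = {r_mixed:.3f}, R (stuck chains) = {r_stuck:.3f}")

# Sample count in Case 1: three chains of 20,000 generations, 1000 discarded from each.
retained = 3 * (20000 - 1000)
print(retained)
```

Values of R close to 1 indicate that the chains have mixed. The retained count computed in the last lines equals the 57,000 parameter samples quoted for the conventional MCMC predictions in Case 1.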

[45] Figure 7 plots the marginal PPDFs for the six parameters. The black vertical lines represent the true values of the six parameters listed in Table 2. The red-solid lines are the marginal PPDFs estimated by the conventional MCMC, and the dashed lines represent those estimated by the MCMC simulations based on the surrogate systems. The figure indicates that the MCMC results based on the surrogate systems constructed by our aSG-hSC method are close to those of the conventional MCMC. However, the surrogate-based MCMC needs significantly fewer model executions. In comparison with the 60,000 model executions for the conventional MCMC, the numbers of model executions for the surrogate-based MCMC are 9226, 4375, and 3765 for the linear, quadratic, and cubic surrogate systems, respectively; these totals comprise the executions for global optimization, calculation of the Hessian matrix, and construction of the surrogate systems. For the surrogate-based MCMC simulations, drawing the 60,000 parameter samples requires no further model executions, only the negligible computational time of evaluating the surrogate polynomials. The efficiency gain from the surrogate systems becomes more pronounced as more parameter samples are drawn in the MCMC simulation.
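The totals quoted above can be checked against the individual counts reported for Case 1. The short calculation below uses only numbers stated in the text; the variable names are illustrative. The reported totals are reproduced when both rounds of global optimization are counted, including the second round whose mode was rejected.

```python
# Model executions reported for Case 1.
first_optimization = 1034    # global optimization for the first maximum
hessian = 73                 # Hessian matrix at the first maximum
second_optimization = 1359   # global optimization on the remainder (mode rejected)
grid_points = {"linear": 6760, "quadratic": 1909, "cubic": 1299}

shared = first_optimization + hessian + second_optimization
totals = {basis: shared + n for basis, n in grid_points.items()}

conventional = 60000  # model executions of the conventional MCMC
for basis, total in totals.items():
    print(f"{basis}: {total} executions, "
          f"{conventional / total:.1f} times fewer than the conventional MCMC")
```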

Figure 7. Marginal posterior probability density functions of the six parameters in Case 1 estimated using the conventional MCMC (C-MCMC) with 60,000 model executions (red-solid lines) and the linear, quadratic, and cubic surrogate systems with 9226, 4375, and 3765 model executions (dashed lines), respectively. The black vertical lines represent the true parameter values. Taking the conventional MCMC results as the reference, the estimates from the surrogate systems are sufficiently accurate while the computational cost is greatly reduced.

[46] The accuracy of the surrogate-based MCMC and the conventional MCMC is also compared by running the conventional MCMC with the same computational effort as the surrogate-based MCMC, i.e., using the number of model executions needed to construct the surrogate systems. The marginal PPDFs for each parameter based on the conventional MCMC with 9226, 4375, and 3765 samples are plotted in Figure 8 as dashed lines. Comparing Figures 7 and 8 indicates that, with the same number of model executions, the approximations in Figure 7 using our surrogate systems are more accurate than those in Figure 8 using the conventional MCMC, which demonstrates the efficiency of our surrogate-based MCMC method.

Figure 8. Marginal posterior probability density functions of the six parameters in Case 1 estimated using the conventional MCMC (C-MCMC) with 60,000 model executions (red-solid lines) and with 9226, 4375, and 3765 model executions (dashed lines), the same numbers of model executions used to construct the linear, quadratic, and cubic surrogate systems, respectively, in Figure 7. The black vertical lines represent the true parameter values. Without the sparse-grid method, the conventional MCMC cannot yield satisfactory results with the same number of model executions.

[47] To compare the computational efficiency of the linear, quadratic, and cubic interpolants, Figure 9 plots the decay of their errors with the number of interpolation points. To attain the same error, the cubic interpolant needs significantly fewer interpolation points than the linear and quadratic interpolants. This indicates that the surrogate system based on a high-order hierarchical basis (i.e., the cubic basis) is more efficient than that based on the linear hierarchical basis, and it suggests that a higher-order hierarchical basis is the better choice when computational resources are limited.

Figure 9. Errors of the aSG-hSC surrogate systems based on linear, quadratic, and cubic hierarchical basis functions in Case 1. For the same computational cost, i.e., the same number of model executions, the high-order hierarchical bases (quadratic and cubic) achieve higher accuracy than the linear hierarchical basis; for the same accuracy, i.e., the same error, the high-order hierarchical bases also achieve better efficiency (i.e., a smaller number of model executions) than the linear hierarchical basis.

[48] Predictive performance of the four types of MCMC simulations is evaluated by using the parameter samples obtained above to predict the spatial distribution of the concentration of species C3 with a different velocity of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0407 at time urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0408 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0409. For each parameter sample drawn in the conventional MCMC, PHT3D is run for predictions; for the samples drawn in the surrogate-based MCMC, the predictions are estimated based on the surrogate systems. The surrogate systems are built via urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0410 in equation 35 by setting urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0411 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0412. Figure 10a shows that the upper and lower bounds of the 95% credible intervals based on the conventional and surrogate-based MCMC are visually identical. Figure 10b plots the probability density functions of C3 at a fixed location and time urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0413 obtained using the four types of MCMC simulations; the four density functions are almost identical except at the peak. However, the computational cost of prediction differs dramatically between the conventional MCMC and the surrogate-based MCMC. While the conventional MCMC needs to run the model 57,000 times (the number of parameter samples), the surrogate-based MCMC needs to run the model only 1853, 1032, and 793 times to build the linear, quadratic, and cubic surrogate systems, respectively.
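Credible bounds of this kind can be obtained directly from the predicted values of the retained samples. The sketch below assumes an equal-tailed interval read from the order statistics; how the intervals in Figure 10a were computed is not stated, and the predicted concentrations used here are hypothetical random draws, not model output.

```python
import math
import random

def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval taken from the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower_index = int(math.floor(tail * (n - 1)))
    upper_index = int(math.ceil((1.0 - tail) * (n - 1)))
    return ordered[lower_index], ordered[upper_index]

rng = random.Random(2)
# Hypothetical predicted concentrations at one location, one per parameter sample.
predictions = [rng.gauss(1.0, 0.1) for _ in range(5000)]
lower, upper = credible_interval(predictions)
print(f"95% credible interval: [{lower:.3f}, {upper:.3f}]")
```

Repeating the calculation at every prediction location gives the lower and upper curves of a credible band.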

Figure 10. (a) Spatial distribution of the true concentration (black dots) of species urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0414 at t = 40, and the 95% credible intervals estimated using the conventional MCMC (red-solid lines) and the surrogate systems with linear, quadratic, and cubic hierarchical bases (dashed lines) in Case 1. (b) Probability density function of the predicted concentration of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0415 estimated using the conventional MCMC (red-solid lines) and the linear, quadratic, and cubic surrogate systems (dashed lines) in Case 1. The true solution is plotted as a black-solid line, and the red-solid lines are taken as the reference. While the conventional MCMC requires 57,000 model executions, the surrogate-based MCMC simulations require 1853, 1032, and 793 model executions for the linear, quadratic, and cubic surrogate systems, respectively.

4.2. Case 2: Reactive Transport of Uranium (VI) in Column Experiment

[49] The second synthetic study is designed based on the uranium reactive transport modeling of Kohler et al. [1996], who conducted seven column experiments in a well-characterized U(VI)-quartz-fluoride column system and simulated the experiments using seven alternative surface complexation models (C1–C7) with different numbers of functional groups and reactions. The models were calibrated against three column experiments (Experiments 1, 2, and 8) conducted under different experimental conditions, and the calibrated models were used to predict the remaining four experiments (Experiments 3, 4, 5, and 7). Model C4 of Kohler et al. [1996] is used in this study. As shown in Table 3, the model has two functional groups, called the weak site ( urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0416) and the strong site ( urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0417), respectively. The weak site is associated with one reaction, and the strong site with two reactions. The model has a total of four parameters, three of which are the formation rates of the three reactions, denoted as K1, K2, and K3. The fourth parameter is the fraction of the strong site, denoted as Site (the fraction of the weak site is calculated as 1 minus the fraction of the strong site). The base-10 logarithms of the parameters are listed in Table 3. In this study, following Kohler et al. [1996], the concentration data are generated using the computer code RATEQ (developed by Curtis [2005]) for the chemical conditions of Experiments 1, 2, and 8. The numbers of concentrations for Experiments 1, 2, and 8 are 39, 32, and 49, respectively. The synthetic data are corrupted by adding 3% random noise to the true concentration values.

Table 3. Surface Complexation Reactions, True Parameter Values, and Searching Region Γ of the Parameters in Case 2a
U(VI) Surface Reaction True Values Γ
[−10.0, −2.0]
[−6.0, −1.0]
[0.1, 5.0]
urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0424(Site) = –1.7104 [−4, −0.1]
  • a Total site density used in this model is 1.3 M/L.
[50] The PPDF of the four parameters, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0425, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0426, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0427, and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0428, is estimated using the 49 data points of Experiment 8; the data of Experiments 1 and 2 are treated as prior data and are not used directly in the PPDF estimation. To demonstrate that our method is not limited to the Gaussian likelihood function, the informal likelihood function of the exponential type in Table 1 is used with the coefficient urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0429. The searching region Γ of the parameters is listed in Table 3. The prior distribution is assumed to be uniform for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0430, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0431, and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0432. For urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0433, a bivariate Gaussian distribution is assumed, based on the results of calibrating the model against the data of Experiments 1 and 2. The optimum parameters obtained using UCODE_2005 [Poeter et al., 2008] with the searching regions listed in Table 3 are
for Experiments 1 and 2, respectively. While the optima of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0435, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0436, and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0437 are very close, the optima of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0438 are significantly different for Experiments 1 and 2, which is also found in Shi et al. (submitted manuscript, 2013) for the same model. It suggests that the PPDF of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0439 may have at least two modes. This prior information is used for estimating the PPDF of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0440 (using the data of Experiment 8) by defining the bivariate Gaussian prior, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0441, as
where the standard deviations, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0443 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0444, are obtained from the results of the local calibrations using UCODE_2005.
[51] The high-probability region is defined as follows. First, the global optimum for the objective function urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0445 ( urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0446 to guarantee that g is positive for numerical convenience) is estimated in the searching domain Γ. After 2309 model executions, the first maximum is found as
corresponding to urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0448. Calculation of the Hessian matrix urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0449 using the formula in equations 5 and 6 takes 33 model executions. The inverse of the Hessian matrix is
With these results, the first high-probability region urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0451 is defined using equation 8 with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0452.
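The cost of the Hessian evaluation can be illustrated with a standard finite-difference stencil. The sketch below assumes central differences around the optimum; it is not a transcription of equations 5 and 6, and the function names, the step size `h`, and the quadratic test function are hypothetical. A stencil of this kind needs 2d² + 1 function evaluations for d parameters, which gives 33 for d = 4 and 73 for d = 6 and therefore coincides with the Hessian costs reported for Case 2 and Case 1.

```python
def finite_difference_hessian(f, x, h=1e-3):
    """Central-difference Hessian of f at x; also returns the number of evaluations."""
    d = len(x)
    evaluations = 0

    def shifted(steps):
        # Evaluate f at x displaced by sign * h along each listed coordinate.
        nonlocal evaluations
        evaluations += 1
        point = list(x)
        for idx, sign in steps:
            point[idx] += sign * h
        return f(point)

    f0 = shifted([])
    plus = [shifted([(i, +1)]) for i in range(d)]
    minus = [shifted([(i, -1)]) for i in range(d)]
    hess = [[0.0] * d for _ in range(d)]
    for i in range(d):
        hess[i][i] = (plus[i] - 2.0 * f0 + minus[i]) / h ** 2
        for j in range(i + 1, d):
            mixed = (shifted([(i, +1), (j, +1)]) - shifted([(i, +1), (j, -1)])
                     - shifted([(i, -1), (j, +1)]) + shifted([(i, -1), (j, -1)]))
            hess[i][j] = hess[j][i] = mixed / (4.0 * h ** 2)
    return hess, evaluations

def quadratic(p):
    """Hypothetical test function with a known Hessian."""
    return sum((i + 1) * v ** 2 for i, v in enumerate(p)) + p[0] * p[1]

hess4, count4 = finite_difference_hessian(quadratic, [0.5] * 4)
hess6, count6 = finite_difference_hessian(quadratic, [0.5] * 6)
print(count4, count6)
```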

[52] Figure 11 illustrates the relation between the high-probability region defined above and the searching region. To make the visualization possible, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0453 is fixed at its optimum of −3.4077, and the high-probability region of the other three parameters is plotted as the gray region of Figure 11, which is transformed from the unit cube urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0454 by rotation and dilation. The figure shows that the volume of the high-probability region is dramatically smaller than that of the searching region Γ given in Table 3. Nevertheless, the high-probability region is sufficiently large to cover all the MCMC samples obtained using DREAM around urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0455, which are plotted as the blue dots in Figure 11a. Based on the three-dimensional high-probability region, the adaptive sparse-grid interpolant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0456 is constructed using equation 35 by setting urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0457, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0458, L = 20, and the tolerance urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0459. The interpolant is built using the linear, quadratic, and cubic basis functions, and the numbers of model executions needed for the three interpolants are 1577, 633, and 393, respectively. The sparse grid of the cubic basis function for parameters log urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0460, log urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0461, and log urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0462 is shown in Figure 11b. Since building an isotropic sparse grid would take 6017 model executions, the adaptive sparse-grid surrogate is computationally more efficient. Among the three basis functions, the cubic hierarchical basis is the most efficient and is thus used for the calculations below.
Using the cubic basis function, the number of model executions needed to build the four-dimensional sparse grid, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0463, for all the model parameters increases to 548.

Figure 11. High-probability region (in gray) calculated with urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0464 for three parameters of Case 2. The region is significantly smaller than the searching region (the outer box) and covers (a) all the MCMC samples after convergence and (b) the sparse grid with 393 grid points.
[53] The second maximum of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0465 is obtained by applying global optimization to urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0466. It takes 3131 model executions to find the second maximum of urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0467, i.e., urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0468, whose corresponding parameters are
It is similar to urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0470 except for the value of log urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0471, which is close to the optimum value obtained using the data of Experiment 2. This is not surprising, because log urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0472 has little influence on the data of Experiment 8. If one sets the significance tolerance urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0473 (Figure 5), the significance ratio δ in equation 40 is
indicating that the optimum parameter set is a significant mode on the PPDF. Computing the Hessian matrix urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0475 with 33 model executions and taking its inverse leads to
The high-probability region and the sparse grid for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0477 are developed in the same manner as for urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0478, and the number of model executions needed is 609.
[54] The iteration continues, and the third set of optimum parameters
is obtained after 2984 model executions, and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0480. The significance ratio urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0481 in equation 40
is dramatically smaller than the user-specified urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0483, indicating that this mode on PPDF is negligible in comparison with the other two modes, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0484 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0485. The iteration terminates, and the surrogate system using the cubic basis is used for the surrogate-based MCMC.

[55] Figure 12 plots the marginal posterior distributions of the four parameters and the 2-D contours of their combinations obtained using DREAM- and surrogate-based MCMC. For each MCMC simulation, as in the first numerical example, a total of 60,000 parameter samples are drawn using three Markov chains. The Gelman-Rubin R statistic indicates that the Markov chains converge after 600 and 420 samples for the DREAM- and surrogate-based MCMC, respectively. For simplicity, the first 600 samples are discarded in both simulations, and the remaining urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0486 samples are used to estimate the PPDF. Figure 12 indicates that the MCMC results based on the surrogate systems constructed by our aSG-hSC method are almost identical to those obtained using DREAM. Because the DREAM- and surrogate-based MCMC require 60,000 and 9647 model executions, respectively, for comparable accuracy, the surrogate-based MCMC is significantly more efficient.
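The figure of 9647 model executions can be traced back to the individual steps reported for Case 2. The breakdown below uses only counts stated in the text, with the four-dimensional cubic sparse grids counted for the two accepted modes; the step labels are illustrative.

```python
# Model executions reported for each step of the iterative algorithm in Case 2.
steps = [
    ("global optimization, mode 1", 2309),
    ("Hessian matrix, mode 1", 33),
    ("cubic sparse grid, mode 1", 548),
    ("global optimization, mode 2", 3131),
    ("Hessian matrix, mode 2", 33),
    ("cubic sparse grid, mode 2", 609),
    ("global optimization, mode 3 (rejected)", 2984),
]

total = sum(count for _, count in steps)
optimization = sum(count for name, count in steps
                   if name.startswith("global optimization"))
print(f"total = {total}")
print(f"share spent on global optimization = {optimization / total:.1%}")
```

In this tally the three rounds of global optimization account for most of the cost, and the sparse grids themselves account for a small part of it.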

Figure 12. One-dimensional (diagonal) and two-dimensional (off-diagonal) marginal probability density functions of the four parameters of Case 2. The MCMC samples are obtained using the conventional MCMC with 60,000 model executions (red-solid lines and dots) and the cubic surrogate system with 9647 model executions (blue-dashed lines and dots). The true parameter values are plotted as black-solid lines or green dots. Taking the conventional MCMC results as the reference, the estimates from the cubic surrogate system are accurate while the computational cost is greatly reduced.

[56] Predictive performance of the DREAM- and surrogate-based MCMC is evaluated by using the parameter samples obtained above to predict the breakthrough curve of Experiment 4 of Kohler et al. [1996] with 118 measurements. As in Case 1, a cubic surrogate system is built at each predicted point, which costs 411 model executions. Figure 13a plots the upper and lower bounds of the 95% credible intervals for the predictive breakthrough curve obtained from the DREAM- and surrogate-based MCMC. The two sets of credible intervals are visually identical. Figure 13b plots the density functions of concentrations at a pore volume of 3.76 in the predicted breakthrough curve obtained from the two kinds of MCMC simulations. The two distributions are very close. While only 411 model executions are needed to build the surrogate system using the cubic basis function and to obtain the results in Figure 13, the DREAM-based MCMC requires 58,200 model executions.

Figure 13. (a) True breakthrough curve of Experiment 4 in Case 2 (black dots), and the 95% credible intervals estimated using the conventional MCMC (red-solid lines) and the cubic surrogate system (blue-dashed lines). (b) Probability density functions of a specific predicted quantity at a pore volume of 3.76 estimated using the conventional MCMC (red-solid line) with 58,200 model executions and the cubic surrogate system (blue-dashed line) with 411 model executions. The true solution is plotted as a black-solid line.
[57] However, the number of model executions needed for the surrogate-based MCMC is still relatively large. One may ask whether the same results can be obtained using computationally frugal methods, such as nonlinear regression, considering that Lu et al. [2012b] and Shi et al. [2012] showed that nonlinear regression and Bayesian methods may give similar results for quantifying parametric uncertainty, while nonlinear regression methods require only hundreds of model executions. To answer this question, nonlinear regression is conducted using UCODE_2005, which minimizes the sum of squared weighted residuals (SSWR) to estimate a local optimum of the parameters and calculate the parameter estimation covariance matrix. Since UCODE_2005 can incorporate prior information into the nonlinear regression, the prior density given in equation 48 is used in the UCODE_2005 optimization. The initial parameter values used for the local optimization are selected randomly in the searching region, Γ, listed in Table 3. The calibrated parameter values are urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0487, which are very close to the second mode, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0488, on the PPDF. If one adjusts the initial parameter values, the first mode, urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0489, on the PPDF may be obtained. However, it is impossible to obtain the two modes simultaneously, even though the two modes have similar density (Figure 12). Therefore, the discussion here focuses on the current local optimum that is close to urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0490. Its corresponding covariance matrix is
The variance terms indicate that, except for log urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0492, the parametric uncertainty is negligible for the other three parameters, which is incorrect according to Figure 12. In addition, the covariance terms do not accurately reflect the parameter correlation. For example, the covariance matrix shows that the correlation between urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0493 and urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0494 is negative, whereas it is actually positive as shown in Figure 12. The inconsistency is attributed to the nonlinearity of the model. Beale's measure of nonlinearity of this model, calculated using UCODE_2005, is 197.52, which is far larger than the threshold value of 0.39 [Hill and Tiedeman, 2007]. Therefore, the covariance matrix estimated above under linearity assumptions cannot accurately quantify parametric uncertainty. While there may be other methods that are comparable with the surrogate-based MCMC in terms of accuracy and efficiency for parametric uncertainty quantification, identifying such methods and conducting a comprehensive comparison is beyond the scope of this study.

5. Conclusion

[58] This paper presents a new adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method to improve the computational efficiency of Bayesian inference for quantification of parametric uncertainty. The method is model independent and can be used together with any MCMC algorithm and any likelihood function (formal or informal). This study tackles the challenging problem of groundwater reactive transport modeling. The high nonlinearity of groundwater reactive transport models makes it difficult to develop an accurate and efficient surrogate of the models and to capture significant modes of the parameter distributions. These problems are resolved by combining a high-order hierarchical polynomial basis with the locally adaptive sparse-grid technique, which can greatly reduce the computational cost of building the desired surrogate system in comparison with existing sparse-grid methods. To further reduce the computational cost of constructing a surrogate system for a parameter distribution with multiple modes, an iterative aSG-hSC algorithm is developed that uses optimization methods to find the modes sequentially. For each mode, a high-probability region is built, on which a component of sparse grids is constructed. The high-probability regions are significantly smaller than the searching region of the MCMC simulation, which is the source of the computational savings. The iterative aSG-hSC method is demonstrated using two numerical examples of groundwater reactive transport models. In both cases, the aSG-hSC method provides results almost identical to those of DREAM-based MCMC but requires a dramatically smaller number of model executions for estimating parameter distributions and quantifying predictive uncertainty. The first example involves only linear reactions and is suitable for demonstrating that higher-order hierarchical basis functions are more efficient than the first-order basis. The second example involves nonlinear reactions and is thus highly nonlinear. Its parameter distributions are multimodal and non-Gaussian, and these features are well captured in the results obtained using the iterative aSG-hSC method. These features, however, cannot be captured by the nonlinear regression method investigated in this study. The computationally efficient aSG-hSC method is critical to the practical application of Bayesian inference to time-consuming groundwater reactive transport modeling. Owing to the nonintrusive nature of the new method, it can be used together with many models and sampling methods used in hydrology and other fields.
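
The workflow summarized above (build a surrogate of the density with surplus-driven refinement, then sample the surrogate instead of the model) can be illustrated with a deliberately simplified sketch. It is one-dimensional, uses a first-order piecewise-linear hierarchical basis rather than the high-order basis of this paper, and uses plain random-walk Metropolis rather than DREAM. The bimodal density, the tolerance, the `min_level` guard, and all function names are assumptions made for illustration.

```python
import numpy as np

def evaluate_hats(x, centers, half_widths, surpluses):
    """Sum of surplus-weighted hat functions evaluated at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dist = np.abs(x[:, None] - centers[None, :]) / half_widths[None, :]
    return np.maximum(0.0, 1.0 - dist) @ surpluses

def build_surrogate(model, tol=1e-3, min_level=4, max_level=12):
    """Adaptive hierarchical piecewise-linear interpolant of model on [0, 1]."""
    centers = np.array([0.0, 1.0])            # level 0: the two boundary nodes
    half_widths = np.array([1.0, 1.0])
    surpluses = np.array([model(0.0), model(1.0)])
    active, h = np.array([0.5]), 0.5
    for level in range(1, max_level + 1):
        values = np.array([model(c) for c in active])   # one model run per new node
        # Hierarchical surplus: model value minus the current interpolant.
        new_surplus = values - evaluate_hats(active, centers, half_widths, surpluses)
        centers = np.concatenate([centers, active])
        half_widths = np.concatenate([half_widths, np.full(active.size, h)])
        surpluses = np.concatenate([surpluses, new_surplus])
        # Refine where the surplus is large; refine everywhere below min_level
        # so that a narrow mode is not skipped by a coarse grid.
        refine = (np.abs(new_surplus) > tol) | (level < min_level)
        parents = active[refine]
        if parents.size == 0:
            break
        h /= 2.0
        active = np.concatenate([parents - h, parents + h])
    return centers, half_widths, surpluses

def metropolis(density, x0, n_steps, step, seed=0):
    """Random-walk Metropolis sampler for an unnormalized 1-D density."""
    rng = np.random.default_rng(seed)
    x, p = x0, density(x0)
    samples = np.empty(n_steps)
    for k in range(n_steps):
        x_new = x + step * rng.standard_normal()
        p_new = density(x_new)
        if rng.random() * p < p_new:          # accept with probability min(1, p_new / p)
            x, p = x_new, p_new
        samples[k] = x
    return samples

calls = {"n": 0}

def true_density(x):
    # Hypothetical bimodal unnormalized density standing in for a PPDF.
    return (np.exp(-0.5 * ((x - 0.30) / 0.03) ** 2)
            + 0.8 * np.exp(-0.5 * ((x - 0.75) / 0.05) ** 2))

def model(x):
    calls["n"] += 1                           # count "expensive" evaluations
    return float(true_density(x))

centers, half_widths, surpluses = build_surrogate(model)

def surrogate_density(x):
    if x < 0.0 or x > 1.0:                    # zero outside the search interval
        return 0.0
    return max(float(evaluate_hats(x, centers, half_widths, surpluses)[0]), 0.0)

samples = metropolis(surrogate_density, x0=0.3, n_steps=20000, step=0.2)
calls_after_sampling = calls["n"]             # unchanged by the sampling stage
grid = np.linspace(0.0, 1.0, 2001)
max_err = np.max(np.abs(evaluate_hats(grid, centers, half_widths, surpluses)
                        - true_density(grid)))
print(centers.size, calls_after_sampling, max_err)
```

In this sketch the number of model evaluations equals the number of grid nodes, and the sampling stage adds none. The `min_level` guard reflects the same concern that motivates the mode search in the iterative algorithm: with purely surplus-driven refinement, a coarse grid can show a negligible surplus and stop before a narrow mode is detected.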

[59] As a surrogate method, the iterative aSG-hSC method has some limitations. First, its computational performance relies on the ability to find the modes using optimization methods. If executing the global optimization solver is computationally expensive, the efficiency of aSG-hSC deteriorates. If the optimization fails to find an optimum parameter set at a given iteration, a significant mode may be missed. In this case, one has to sacrifice computational efficiency and use more sparse-grid points. However, it is worth mentioning that this kind of challenge is not specific to our method but is shared by all numerical algorithms for uncertainty quantification and optimization. It is expected that this problem can be resolved with advances in optimization techniques. In addition, since the Hessian matrix is used to determine the high-probability domain for each significant mode, finding the optimal value of the user-defined constant urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0495 in equation 8 remains empirical at this moment. When the shape of the detected significant mode is extremely complicated, the commonly used value, e.g., urn:x-wiley:00431397:media:wrcr20467:wrcr20467-math-0496, may not be appropriate to cover the high-probability region. In other words, the reduction of the bounds for building sparse grids may not be as significant as that shown in the numerical examples of this study. The major challenge resides in the nonsmoothness of the surface of parameter distributions due to the nonlinearity of groundwater reactive transport models. Reducing nonlinearity may be a solution to the problems mentioned above.
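
The dependence of the high-probability domain on the Hessian and on a user-defined constant can be sketched as follows. This is an illustrative Gaussian (Laplace-type) approximation around a mode and is not a reproduction of equation 8; the `scale` argument is a hypothetical stand-in for the user-defined constant, and the two-parameter Gaussian test density and function names are assumptions.

```python
import numpy as np

def finite_difference_hessian(f, x, h=1e-4):
    """Central-difference Hessian of a scalar function f at the point x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            H[i, j] = H[j, i] = (f(x + ei + ej) - f(x + ei - ej)
                                 - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

def high_probability_box(neg_log_density, mode, scale=3.0):
    """Axis-aligned box: mode +/- scale * sqrt(diag(H^{-1})).

    H is the Hessian of the negative log density at the mode, so H^{-1}
    is the covariance of the local Gaussian approximation.
    """
    mode = np.asarray(mode, dtype=float)
    H = finite_difference_hessian(neg_log_density, mode)
    cov = np.linalg.inv(H)
    half_width = scale * np.sqrt(np.diag(cov))
    return mode - half_width, mode + half_width

# Hypothetical two-parameter Gaussian mode used only to check the sketch.
true_cov = np.array([[0.04, 0.01], [0.01, 0.09]])
precision = np.linalg.inv(true_cov)
mode = np.array([1.0, -2.0])

def neg_log_density(theta):
    d = np.asarray(theta, dtype=float) - mode
    return 0.5 * d @ precision @ d

lower, upper = high_probability_box(neg_log_density, mode, scale=3.0)
print(lower, upper)
```

For an exactly Gaussian mode the box half-widths are simply `scale` times the marginal standard deviations. For a strongly curved or skewed mode, the local Hessian describes only the immediate neighborhood of the optimum, which is why a fixed constant may fail to cover the high-probability region, as noted above.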


Acknowledgments

[60] G. Zhang was supported by the Advanced Scientific Computing Research (ASCR) program, Department of Energy, through the Householder Fellowship at ORNL. M. Ye was supported by the DOE Early Career Award, DE-SC0008272. M. Gunzburger was supported by the US Air Force Office of Scientific Research under grant FA9550-11-1-0149. C. Webster was supported by the US Air Force Office of Scientific Research under grant 1854-V521-12. C. Webster was also sponsored by the Director's Strategic Hire Funds through the Laboratory Directed Research and Development (LDRD) Program of Oak Ridge National Laboratory (ORNL). ORNL is operated by UT-Battelle, LLC, for the United States Department of Energy under Contract DE-AC05-00OR22725.