# An adaptive sparse-grid high-order stochastic collocation method for Bayesian inference in groundwater reactive transport modeling

## Abstract

[1] Bayesian analysis has become vital to uncertainty quantification in groundwater modeling, but its application has been hindered by the computational cost associated with the numerous model executions required to explore the posterior probability density function (PPDF) of model parameters. This is particularly the case when the PPDF is estimated using Markov Chain Monte Carlo (MCMC) sampling. In this study, a new approach is developed to improve the computational efficiency of Bayesian inference by constructing a surrogate of the PPDF, using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method. Unlike previous works using a first-order hierarchical basis, this paper utilizes a compactly supported higher-order hierarchical basis to construct the surrogate system, resulting in a significant reduction in the number of required model executions. In addition, using the hierarchical surplus as an error indicator allows locally adaptive refinement of sparse grids in the parameter space, which further improves computational efficiency. To efficiently build the surrogate system for a PPDF with multiple significant modes, optimization techniques are used to identify the modes, for which high-probability regions are defined and components of the aSG-hSC approximation are constructed. After the surrogate is determined, the PPDF can be evaluated by sampling the surrogate system directly without model execution, resulting in improved efficiency of the surrogate-based MCMC compared with conventional MCMC. The developed method is evaluated using two synthetic groundwater reactive transport models. The first example involves coupled linear reactions and demonstrates the accuracy of our high-order hierarchical basis approach in approximating high-dimensional posterior distributions. The second example is highly nonlinear because of the reactions of uranium surface complexation, and demonstrates how the iterative aSG-hSC method is able to capture multimodal and non-Gaussian features of the PPDF caused by model nonlinearity. Both experiments show that aSG-hSC is an effective and efficient tool for Bayesian inference.

## Key Points

- High-order stochastic collocation method is used for Bayesian inference
- Adaptive sparse grids are employed to reduce computational cost
- An iterative algorithm is proposed for simulating PPDF with multiple modes

## 1. Introduction

[2] Groundwater models are vital tools for predicting the effects of future anthropogenic and/or natural occurrences in the subsurface environment. Model predictions are inherently uncertain due to epistemic and aleatory uncertainties in data and model parameters and structures; uncertainty quantification in groundwater modeling is indispensable, and many methods of uncertainty quantification have been developed to facilitate science-informed decision making in water resource management (see recent review articles of *Matott et al*. [2009] and *Tartakovsky* [2013], and references therein). While this study is only for quantification of parametric uncertainty, the results can be used directly for quantification of model uncertainty, because quantifying parametric uncertainty is the basis of quantifying model structure uncertainty in the popular multimodel analysis methods [*Neuman*, 2003; *Ye et al*., 2004; *Poeter and Hill*, 2007; *Refsgaard et al*., 2012; *Neuman et al*., 2012; *Lu et al*., 2012a]. The Bayesian method is one of the most widely utilized approaches for quantifying parametric uncertainty [*Kitanidis*, 1986; *Box and Tiao*, 1992; *Ezzedine et al*., 1999; *Beck and Au*, 2002; *Marshall et al*., 2005; *Marzouk et al*., 2007; *Ma and Zabaras*, 2009; *Marzouk and Xiu*, 2009; *Allaire and Willcox*, 2010; *Renard*, 2011; *Zeng et al*., 2012; *Kitanidis*, 2012; *Lu et al*., 2012b; *Shi et al*., 2012], wherein model parameters and predictions are modeled as random variables. The Bayesian methods are well connected with and complementary to other methods of uncertainty quantification [e.g., *Woodbury*, 2011; *Nott et al*., 2012]. They are flexible and can be applied to different models to incorporate multiple types of data and prior information [e.g., *Woodbury*, 2007; *Rubin et al*., 2010; *Chen et al*., 2012]. 
The outputs of Bayesian methods are probability density functions of quantities of interest that can be directly used for uncertainty quantification, risk assessment, and decision making.

[3] Within the Bayesian inference framework, this paper presents a computationally efficient method, developed using an adaptive sparse-grid high-order stochastic collocation (aSG-hSC) approach, to reduce the cost of Bayesian computation, which is always a burden for practical Bayesian applications, especially for computationally demanding models with a large number of parameters. When estimating the posterior probability density function (PPDF) in Bayesian inference, except in special cases in which analytical expressions of the PPDF can be derived [*Woodbury and Ulrych*, 2000; *Hou and Rubin*, 2005], the PPDF is usually estimated numerically using sampling techniques. One of the most popular and robust sampling techniques is the Markov Chain Monte Carlo (MCMC) method [*Marshall et al*., 2005; *Gamerman and Lopes*, 2006; *Vrugt et al*., 2008, 2009; *Keating et al*., 2010; *Liu et al*., 2010]. However, MCMC methods are in general computationally expensive, because a large number of model executions are needed to estimate the PPDF and sample from it. Many MCMC algorithms have been developed to improve computational efficiency by reducing the needed number of model executions, such as delayed rejection and adaptive Metropolis sampling [*Haario et al*., 2006] and differential evolution adaptive Metropolis (DREAM) sampling [*Vrugt et al*., 2008, 2009]. The number of model executions is of primary interest, because the computational cost of solving the models dominates over that of the other MCMC calculations, which are simple algebraic operations. However, even with these advanced methods, the number of model executions is still often on the order of tens of thousands or even hundreds of thousands. As a result, applications of MCMC approaches are prohibitive for computationally demanding models such as those of groundwater reactive transport, a single run of which may take tens of minutes or even hours [*Zhang et al*., 2012].

[4] In this study, the problem of the high computational cost of MCMC simulations is resolved by incorporating sparse-grid methods into MCMC operation to develop sparse-grid-based MCMC algorithms. Sparse-grid methods are, in a broad sense, one class of surrogate methods that have been used to improve computational efficiency in water resources research [*Razavi et al*., 2012]. The key idea of sparse-grid methods is to place a grid in the parameter space with sparse parameter samples (as opposed to a full tensor-product grid). Then the forward model is solved only for the sparse parameter samples to save computational cost. More specifically, the method used in this study is a stochastic collocation method on sparse grids, also known as the sparse-grid stochastic collocation method [*Nobile et al*., 2008a, 2008b]. Another popular collocation method is the probabilistic collocation method that uses the finite-dimensional polynomial chaos expansion [*Marzouk et al*., 2007; *Li and Zhang*, 2007; *Shi et al*., 2009]. A comprehensive comparison of the accuracy and efficiency of the two stochastic collocation methods can be found in *Chang and Zhang* [2009]. While such a comparison is of high significance to the selection of an appropriate method for different applications, it is beyond the scope of this study. The sparse-grid methods have been demonstrated to be efficient and effective for dealing with high-dimensional interpolation and integration, and they have been used recently in groundwater uncertainty quantification. In the studies of, e.g., *Shi and Yang* [2009], *Lin and Tartakovsky* [2009, 2010], and *Lin et al*. [2010], the sparse-grid methods were used to estimate the mean and covariance of groundwater state variables such as hydraulic head and solute concentrations. In these studies, parameter distributions were assumed known, and Bayesian inference was not conducted. Bayesian inference using the sparse-grid method was conducted in the studies of *Ma and Zabaras* [2009] and *Zeng et al*. [2012], in which surrogates of geophysical models were built and then used to evaluate parameter distributions using observations of state variables.

[5] While the aSG-hSC method presented in this paper is in spirit similar to that of *Ma and Zabaras* [2009] and *Zeng et al*. [2012] in terms of using the sparse-grid method to improve the computational efficiency of Bayesian inference, our method tackles a more challenging problem of uncertainty quantification and offers more computationally efficient structures of sparse grids. Different from the previous studies of sparse-grid methods that only quantify uncertainty in flow and advection-dispersion problems, this study conducts uncertainty quantification for groundwater reactive transport models, which are significantly more nonlinear due to nonlinear reactions and coupling between flow, transport, and biogeochemical processes. The nonlinearity poses two challenges to applications of sparse-grid methods. First, if the surrogate systems of the nonlinear models are constructed using linear hierarchical basis functions, as in previous groundwater applications, more sparse-grid interpolation points, i.e., more model executions, are needed to attain the prescribed interpolation accuracy, which defeats the purpose of using sparse-grid methods. The other challenge is that the nonlinearity often leads to an extremely complex surface of the likelihood function (or its least squares equivalent) with a large number of local minima, such as those reported in *Matott and Rabideau* [2008] and Shi et al. (Assessment of parametric uncertainty for surface complexation modeling of uranium reactive transport, submitted to *Water Resources Research*, 2013) for nitrogen and uranium reactive transport, respectively. The multiple local minima correspond to multiple modes (significant or insignificant) on the surface of the PPDF. Existing algorithms may fail to capture all the significant modes, or may succeed only with significantly increased computational effort. The two problems caused by nonlinearity are not limited to groundwater reactive transport models but are prevalent in all nonlinear models.

[6] The aSG-hSC method is developed to resolve the two challenges above. To resolve the first challenge of efficiently approximating the PPDF involving nonlinear groundwater reactive transport models, the surrogate system with a sparse-grid interpolation is constructed with a high-order stochastic collocation (hSC) approach, i.e., utilizing a high-order hierarchical polynomial basis with quadratic or cubic polynomials as in *Griebel* [1998] and *Bungartz and Griebel* [2004]. Due to its increased accuracy compared to the linear hierarchical basis, the number of model executions needed for constructing the surrogate system can be greatly reduced. The high-order approach is not a trivial extension of the linear technique [*Zhang et al*., 2010], and this is the first time that the high-order stochastic collocation method has been used not only in groundwater modeling but also in surrogate modeling for Bayesian inference. Furthermore, instead of building the approximate PPDF using isotropic sparse-grid interpolation [*Nobile et al*., 2008a; *Barthelmann et al*., 2000] or dimension-adaptive sparse-grid interpolation [*Nobile et al*., 2008b], a locally adaptive sparse-grid (aSG) interpolation [*Griebel*, 1998] is used. This technique utilizes the hierarchical surplus (discussed in section 3.2) as an error indicator to detect the nonsmooth and/or important regions in the parameter space and adaptively place more points in those regions. This results in further computational gains and guarantees that a user-defined accuracy of the surrogate system is realized.

[7] To resolve the second challenge of reducing the computational cost of constructing the surrogate system for a PPDF with multiple modes, an iterative procedure is developed for the aSG-hSC method to incorporate optimization results into the surrogate construction. Using aSG-hSC together with optimization is considered a strength, since it can leverage extensive research in the area of optimization. The design of the iterative procedure is based on the following observations. In MCMC-based Bayesian inference, large parameter ranges are always specified in the prior distribution due to lack of information. If multiple modes exist on the PPDF, there are high-probability regions around each significant mode (the definition of the high-probability regions is given in section 3 below). Markov chains move toward the high-probability regions and generate random samples by following the Metropolis rule [*Gamerman and Lopes*, 2006]. During this process, a large number of samples are discarded in the burn-in period or rejected due to the Metropolis rule, and the model executions corresponding to these samples are wasted. This sampling procedure can be made more computationally efficient using the adaptive sparse-grid techniques if the approximate locations of the modes are known from optimization, which motivates the iterative aSG-hSC method. In each iteration, global or local optimization is utilized to detect a significant mode of the PPDF, and the corresponding high-probability region is determined based on optimization results such as the Hessian matrix at the found optimum. Subsequently, the high-probability region is incorporated into the prior distribution, and the aSG-hSC method is used to construct the surrogate within the high-probability region. This is the key to saving computational cost, because the surrogate is not constructed over a large parameter space where a significant number of sparse-grid points would be blindly placed in the low-probability regions. However, there is a trade-off between the saved computational cost and that spent on optimization, which is discussed in the numerical examples in section 4. The iteration continues until all significant modes are identified; it is demonstrated in section 4 that our method can find all modes whose significance is larger than a user-specified significance tolerance. Note that the aSG-hSC method is independent of MCMC methods, so it can be used together with any MCMC method. In addition, because both the aSG-hSC and MCMC methods are model independent, the resulting sparse-grid-based MCMC algorithms can be applied to a wide range of problems.
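The iterative procedure above can be sketched on a toy one-dimensional bimodal density. All helper names below (`find_mode`, `define_region`, `iterative_mode_capture`) are illustrative stand-ins, not the paper's implementation: the paper uses DIRECT-style global optimization and Hessian-based regions, and the aSG-hSC surrogate construction on each region is omitted here; a coarse grid search and a density-threshold interval stand in for them.

```python
# Toy sketch of the iterative mode-capturing loop: find a mode outside the
# regions already covered, stop when its significance drops below tolerance,
# otherwise define a high-probability region around it and iterate.
import math

def ppdf(theta):
    """Unnormalized bimodal 'posterior' standing in for the real PPDF."""
    return (math.exp(-0.5 * ((theta - 0.2) / 0.05) ** 2)
            + 0.5 * math.exp(-0.5 * ((theta - 0.8) / 0.05) ** 2))

def find_mode(f, regions_to_skip, n=2001):
    """Coarse grid search standing in for DIRECT global optimization."""
    best = None
    for k in range(n):
        t = k / (n - 1)
        if any(lo <= t <= hi for lo, hi in regions_to_skip):
            continue
        v = f(t)
        if best is None or v > best[1]:
            best = (t, v)
    return best

def define_region(f, mode, fmode, tol, step=1e-3):
    """Expand an interval around the mode until f drops below tol * fmode."""
    lo = hi = mode
    while lo > 0.0 and f(lo) > tol * fmode:
        lo -= step
    while hi < 1.0 and f(hi) > tol * fmode:
        hi += step
    return (max(lo, 0.0), min(hi, 1.0))

def iterative_mode_capture(f, significance_tol=0.05, density_tol=0.01):
    modes, regions = [], []
    f_global = None
    while True:
        mode, fmode = find_mode(f, regions)
        if f_global is None:
            f_global = fmode                    # first mode found is the global one
        if fmode < significance_tol * f_global: # user-specified significance tolerance
            break
        regions.append(define_region(f, mode, fmode, density_tol))
        modes.append(mode)                      # a surrogate would be built on each region
    return modes, regions

modes, regions = iterative_mode_capture(ppdf)   # finds the modes near 0.2 and 0.8
```

The second, weaker mode is found only after the first mode's region is excluded from the search, mirroring the sequential capture of significant modes in Figure 5.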

[8] The rest of the paper is organized as follows. In section 2, the Bayesian framework and the conventional MCMC method used in this study are briefly introduced, followed by the iterative aSG-hSC method of constructing the surrogate system presented in section 3. In section 4, the new approach is applied to reactive transport problems, and its effectiveness and efficiency in comparison with the conventional MCMC method are demonstrated.

## 2. Bayesian Inference and MCMC Simulation

[9] Consider the relationship between measurements and model outputs written as

$$\mathbf{d} = f(\boldsymbol{\theta}) + \boldsymbol{\varepsilon}, \qquad (1)$$

where $\mathbf{d}$ is a vector of $N_d$ measurement data, $\boldsymbol{\theta}$ is a vector of model parameters, $f$ is the forward model with $N_\theta$ inputs and $N_d$ outputs, and $\boldsymbol{\varepsilon}$ is a vector of residuals, including measurement, model parametric, and structural errors.

[10] The posterior probability density function (PPDF), $p(\boldsymbol{\theta} \mid \mathbf{d})$, of the parameters is given by Bayes' theorem [*Box and Tiao*, 1992] via

$$p(\boldsymbol{\theta} \mid \mathbf{d}) = \frac{L(\boldsymbol{\theta} \mid \mathbf{d})\,p(\boldsymbol{\theta})}{p(\mathbf{d})} = \frac{L(\boldsymbol{\theta} \mid \mathbf{d})\,p(\boldsymbol{\theta})}{\int L(\boldsymbol{\theta} \mid \mathbf{d})\,p(\boldsymbol{\theta})\,d\boldsymbol{\theta}}, \qquad (2)$$

where $p(\boldsymbol{\theta})$ is the prior distribution and $L(\boldsymbol{\theta} \mid \mathbf{d})$ is the likelihood function, which may be formal or informal [*Beven and Binley*, 1992; *Smith et al*., 2008; *Schoups and Vrugt*, 2010; *Smith et al*., 2010]. Several widely used informal likelihood functions in hydrology are also listed in Table 1. The definition of an informal likelihood function is problem specific in nature, and there has been no consensus on which informal likelihood function outperforms the others.

**Table 1.** Formal and Informal Likelihood Functions^a

| Likelihood Function | Expression |
|---|---|
| **Formal likelihood functions** | |
| Multivariate normal (MVN) | $L(\boldsymbol{\theta} \mid \mathbf{d}) = (2\pi)^{-N_d/2}\,\lvert\boldsymbol{\Sigma}\rvert^{-1/2}\exp\!\left[-\tfrac{1}{2}\,(\mathbf{d}-f(\boldsymbol{\theta}))^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{d}-f(\boldsymbol{\theta}))\right]$ |
| **Informal likelihood functions** | |
| Exponential (EXP) | |
| Nash-Sutcliffe (NS) | $L(\boldsymbol{\theta} \mid \mathbf{d}) = 1 - \sum_{i=1}^{N_d}\bigl(d_i - f_i(\boldsymbol{\theta})\bigr)^2 \big/ \sum_{i=1}^{N_d}\bigl(d_i - \bar d\bigr)^2$ |
| Mean cumulative error (MCE) | |
| Normalized sum of squared errors (NSSE) | |

^a $\bar d$, mean of observations; $\bar f$, mean of outputs of the forward model; $\zeta$, scaling constant for the exponential likelihood function; $\boldsymbol{\Sigma}$, covariance matrix of the residuals for the Gaussian likelihood function.

[11] The purpose of this study is not to investigate how to define likelihood functions, but to show how to efficiently build surrogate models for a chosen likelihood function using the aSG-hSC approach. As a function approximation method, aSG-hSC only requires that the likelihood function $L(\boldsymbol{\theta} \mid \mathbf{d})$ in equation 2 be a continuous function, which is satisfied by all likelihood functions in the literature, including those listed in Table 1. In the numerical examples of section 4, for the sake of illustration, the Gaussian likelihood function is used for the first numerical example and a likelihood function of exponential type for the second numerical example.
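For concreteness, the sketch below evaluates standard textbook forms of one formal likelihood (multivariate normal with a diagonal covariance) and one informal likelihood (Nash-Sutcliffe) for a toy data set; the exact expressions and normalizations in Table 1 may differ in detail from these common forms.

```python
# Standard textbook forms of a formal and an informal likelihood; the toy
# observation and model-output vectors below are made up for illustration.
import math

def gaussian_loglik(d, f, sigma):
    """Formal MVN log-likelihood with diagonal covariance sigma**2 * I."""
    n = len(d)
    sse = sum((di - fi) ** 2 for di, fi in zip(d, f))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - 0.5 * sse / sigma ** 2

def nash_sutcliffe(d, f):
    """Informal Nash-Sutcliffe efficiency: 1 - SSE / spread of observations."""
    dbar = sum(d) / len(d)
    sse = sum((di - fi) ** 2 for di, fi in zip(d, f))
    sst = sum((di - dbar) ** 2 for di in d)
    return 1.0 - sse / sst

d = [1.0, 2.0, 3.0, 4.0]            # observations
f = [1.1, 1.9, 3.2, 3.8]            # forward-model outputs
ll = gaussian_loglik(d, f, sigma=0.2)
ns = nash_sutcliffe(d, f)           # close to 1 for a good fit
```

Both functions are continuous in the model outputs, which is the only property the aSG-hSC surrogate construction requires of the chosen likelihood.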

[12] While this study is focused on quantification of parametric uncertainty, its results can be used directly for quantification of model uncertainty. In parametric uncertainty quantification, the denominator of Bayes' formula in equation 2 is a normalization constant that does not affect the shape of the PPDF. As such, in the discussion hereafter concerning building surrogate systems, the notation $p(\boldsymbol{\theta} \mid \mathbf{d})$ and the terminology PPDF will refer only to the product $L(\boldsymbol{\theta} \mid \mathbf{d})\,p(\boldsymbol{\theta})$. When extending this research to quantification of model uncertainty, the denominator, $p(\mathbf{d})$, becomes critical. In the Bayesian model averaging method that considers model uncertainty due to alternative models [e.g., *Ye et al*., 2004, 2008, 2010], for an individual model $M_k$ this term becomes the model likelihood function, obtained by integrating the joint likelihood function of the model and its parameters over the parameter space. The model likelihood function is the most critical variable for evaluating the model probability used to quantify model uncertainty. Although this term can be evaluated using the aSG-hSC method, it is beyond the scope of this study.

[13] Due to the nonlinearity of the model $f(\boldsymbol{\theta})$ with respect to the parameters $\boldsymbol{\theta}$, it is often difficult to draw samples from the PPDF directly, so MCMC methods, such as the Metropolis-Hastings (M-H) algorithm [*Gamerman and Lopes*, 2006] and its variants, are often used for sampling. The essence of the MCMC methods is that parameter samples are drawn from a proposal distribution instead of the PPDF, and the Markov property guarantees the convergence of the sample distribution to the posterior distribution. However, in practice, the convergence is often slow when the proposal distribution deviates substantially from the posterior distribution. Many advanced MCMC methods have been developed; one of them is the Differential Evolution Adaptive Metropolis (DREAM) algorithm developed by *Vrugt et al*. [2008, 2009]. The DREAM algorithm runs multiple Markov chains simultaneously; all chains are viewed as members of the same population, and the sampling procedure is treated as the evolution of that population. As such, the classic proposal distribution used in the M-H algorithm is not necessary, and the jump of each Markov chain at each step is determined by the differential evolution of a genetic algorithm. It was shown by *Vrugt et al*. [2008, 2009] that DREAM is generally more efficient than traditional MCMC algorithms in the absence of additional information about the PPDF. Moreover, DREAM is more capable of dealing with multimodal posterior distributions, which matches our goal of building a surrogate system for a posterior distribution with multiple significant modes. For these reasons, DREAM is chosen in this study as the framework of Bayesian inference. Using aSG-hSC together with DREAM is considered a strength, because it leverages this recently developed MCMC algorithm. It should be noted, however, that the aSG-hSC method of building the surrogate system can be used with any other MCMC algorithm.
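To make the role of the surrogate concrete, the sketch below runs a plain random-walk Metropolis sampler (not DREAM; DREAM's population-based differential-evolution jumps are more involved) against a stand-in surrogate density: once a surrogate of the unnormalized PPDF is available, every accept/reject decision costs only a cheap surrogate evaluation, with no forward-model execution.

```python
# Minimal random-walk Metropolis sampler applied to a surrogate of an
# unnormalized PPDF; `surrogate_ppdf` is an assumed stand-in density.
import math
import random

def surrogate_ppdf(theta):
    """Stand-in for a constructed surrogate: unnormalized Gaussian bump."""
    return math.exp(-0.5 * ((theta - 0.5) / 0.1) ** 2)

def metropolis(target, theta0, n_samples, step=0.05, seed=0):
    """Random-walk Metropolis: each step needs one target evaluation."""
    rng = random.Random(seed)
    theta = theta0
    p_cur = target(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        p_prop = target(proposal)
        # Metropolis rule: accept with probability min(1, p_prop / p_cur)
        if rng.random() < min(1.0, p_prop / p_cur):
            theta, p_cur = proposal, p_prop
        samples.append(theta)
    return samples

samples = metropolis(surrogate_ppdf, theta0=0.5, n_samples=20000)
mean = sum(samples) / len(samples)   # close to the surrogate's mode at 0.5
```

In the surrogate-based workflow, the forward-model cost is paid once while building the surrogate on the sparse grid; the tens of thousands of sampler steps above then cost essentially nothing.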

## 3. Iterative aSG-hSC Methodology

[14] This section describes the iterative aSG-hSC method to construct the surrogate system for estimating the posterior distribution. To provide context for the method described in section 3.3, determination of the high-probability region is first introduced in section 3.1, followed by describing the high-order hierarchical polynomial basis and the adaptive sparse-grid interpolation in section 3.2.

### 3.1. Determining High-Probability Region as the Prior

[16] Defining an individual high-probability region starts with searching for the global maximum of the PPDF (i.e., the most probable parameter set $\boldsymbol{\theta}^*$) using global optimization. The objective function used for optimization may be different from the posterior distribution. For example, if the selected likelihood function is of an exponential type, such as the formal multivariate normal likelihood and the informal exponential likelihood in Table 1, using the logarithm of the likelihood function in optimization gives more stable results than using the likelihood function itself [*Pflüger*, 2005]. This strategy is also applicable to the prior distribution. Thus, for convenience of notation, $J(\boldsymbol{\theta})$ is used to represent the objective function, obtained by applying a preprocessing function $g$ to the PPDF before optimization. For example, when using a uniform prior and the informal exponential likelihood in Table 1, the objective function reduces to the logarithm of the likelihood, which depends on the means $\bar d$ and $\bar f$ of the observations and of the forward-model outputs, respectively. When using the Nash-Sutcliffe likelihood, the objective function can be defined as a transformation of the Nash-Sutcliffe efficiency chosen to guarantee enough gradient information over the searching region. Note that the preprocessing function $g$ must be monotonic so that it is invertible. While any global optimization algorithm can be used, the DIRECT algorithm is used in this study. DIRECT, first proposed by *Jones et al*. [1993], is a derivative-free global optimization algorithm and an improvement of the standard Lipschitzian approach that eliminates the need to specify a Lipschitz constant.

[17] Once the global parameter optimum $\boldsymbol{\theta}^*$ is obtained, the next step is to define the high-probability region around $\boldsymbol{\theta}^*$. This can be done by investigating how fast the posterior distribution decays to zero away from $\boldsymbol{\theta}^*$ through estimating the curvature of $J(\boldsymbol{\theta})$, as a more curved function decays faster. Since the second-order derivative is a measure of a function's curvature, the Hessian matrix of $J$ at $\boldsymbol{\theta}^*$, denoted by $\mathbf{H}$, is used to determine the high-probability region. In the case of a multivariate Gaussian distribution, $\mathbf{H}$ is the inverse of the covariance matrix and is sufficient to define the Gaussian shape; in non-Gaussian cases, $\mathbf{H}$ still provides sufficient information about the curvature of $J$ around $\boldsymbol{\theta}^*$.

[18] The Hessian matrix can be approximated using finite differences [*Nocedal and Wright*, 2006],

$$\mathbf{H}_{lm} \approx \frac{J(\boldsymbol{\theta}^* + h_l\mathbf{e}_l + h_m\mathbf{e}_m) - J(\boldsymbol{\theta}^* + h_l\mathbf{e}_l) - J(\boldsymbol{\theta}^* + h_m\mathbf{e}_m) + J(\boldsymbol{\theta}^*)}{h_l h_m},$$

where $\mathbf{e}_l$ and $\mathbf{e}_m$ are unit vectors whose $l$th and $m$th entries are equal to one, and $h_l$ and $h_m$ are properly selected steps. The structure of $\mathbf{H}$ can be estimated by using singular value decomposition (SVD), i.e., $\mathbf{H} = \mathbf{U}\mathbf{S}\mathbf{V}^{\mathrm{T}}$, where $\mathbf{S}$ is the diagonal matrix of singular values and $\mathbf{U}$ and $\mathbf{V}$ are orthogonal matrices. Note that when $\boldsymbol{\theta}$ is a vector of uncorrelated random variables, both $\mathbf{V}$ and $\mathbf{U}$ are identity matrices, and the singular values characterize the variances along the axes of the parameters. The SVD also represents a linear transform in the $N_\theta$-dimensional space, where $\mathbf{U}$ and $\mathbf{V}$ determine the rotation and the singular values determine the stretching along orthogonal directions. Based on these results, the high-probability region, $\Gamma^*$, corresponding to $\boldsymbol{\theta}^*$ is defined by transforming a unit cube according to this rotation and stretching, scaled by a factor $\gamma$ (equation 8).

[19] The definition of the high-probability region is illustrated using a two-dimensional Gaussian density and a non-Gaussian density; the contours of the two density functions are shown in Figures 1a and 1b, respectively. The searching region is set to be large. The maxima of the two densities, both equal to 44.5, are found at (0.5, 0.5) and (0.5, 0.4), respectively. The desired tolerance in equation 4 is set to 0.01, meaning that the defined high-probability region should cover the area within the contour of 0.01. Figure 1 plots the high-probability regions for two values of the scaling factor $\gamma$. In Figure 1a, for the Gaussian density, the prior region with the smaller $\gamma$ covers the 0.01 contour very well; in Figure 1b, for the non-Gaussian density, a larger $\gamma$ is needed to fully cover the contour of 0.01. In either case, the prior regions are dramatically smaller than the initial search region.
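A minimal sketch of the curvature-based region construction follows, using a central-difference Hessian approximation (a symmetric variant of the one-sided formula cited from *Nocedal and Wright* [2006]) and NumPy's SVD. The helper names and the scaling factor `gamma` are illustrative, not the paper's notation.

```python
# Curvature-based high-probability region: finite-difference Hessian at the
# optimum, SVD factorization, and per-axis half-lengths ~ gamma / sqrt(s).
import numpy as np

def hessian_fd(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    n = len(x)
    H = np.empty((n, n))
    for l in range(n):
        for m in range(n):
            e_l = np.eye(n)[l] * h
            e_m = np.eye(n)[m] * h
            H[l, m] = (f(x + e_l + e_m) - f(x + e_l - e_m)
                       - f(x - e_l + e_m) + f(x - e_l - e_m)) / (4.0 * h * h)
    return H

def high_prob_axes(J, theta_star, gamma=3.0):
    """Rotation and per-axis half-lengths of the transformed unit cube."""
    H = -hessian_fd(J, theta_star)   # J has a maximum at theta_star
    U, s, Vt = np.linalg.svd(H)      # H = U S V^T
    return U, gamma / np.sqrt(s)     # stretching ~ 1 / sqrt(singular value)

# For an uncorrelated Gaussian log-density, the singular values recover the
# reciprocal variances, so the half-lengths are gamma times the std devs.
sd = np.array([0.1, 0.3])
J = lambda t: -0.5 * np.sum((t / sd) ** 2)
U, half_axes = high_prob_axes(J, np.zeros(2), gamma=3.0)   # ~ [0.3, 0.9]
```

For this uncorrelated example the rotation is the identity, consistent with the observation in the text that $\mathbf{U}$ and $\mathbf{V}$ reduce to identity matrices when the parameters are uncorrelated.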

[20] Defining such high-probability regions for MCMC simulation has two advantages. First, since the volume of the high-probability region is often significantly smaller than that of the searching region, the computational cost of building a surrogate system of desired accuracy can be considerably reduced. In addition, because the high-probability region covers the significant mode of the PPDF well, the initial samples of the Markov chains can be generated within such regions, which significantly accelerates the convergence of MCMC sampling. While both the global optimization and the calculation of the Hessian matrix require forward model executions, this computational expense is worthwhile as long as more computational cost is saved by working on the prior regions due to the two advantages. Although global optimization is used here, it may not be necessary in practice, because the aSG-hSC method is expected to perform well as long as the prior regions include the mode. In other words, a rough estimate of the location of each significant mode is sufficient for our method. This can be achieved using local optimization techniques if information about the shape of the PPDF is available, which will further reduce the computational cost.

### 3.2. Adaptive Sparse-Grid High-Order Stochastic Collocation Method

[21] After obtaining the high-probability region $\Gamma^*$ using equation 8 for a significant mode of the PPDF around $\boldsymbol{\theta}^*$ in the parameter space, the next task is to build the surrogate model for the PPDF on $\Gamma^*$ using the aSG-hSC method. The method is only briefly described here; for more details, refer to *Griebel* [1998], *Barthelmann et al*. [2000], *Bungartz and Griebel* [2004], and *Klimke and Wohlmuth* [2005]. Since the methods of building surrogate systems are generally applicable to any function governed by partial differential equations, not limited to the PPDF, a general function $u(\boldsymbol{\theta})$ is used in the method description.

#### 3.2.1. One-Dimensional Hierarchical Interpolation

[22] In one dimension, the hierarchical interpolant of a function $u(\theta)$ on $[0, 1]$ is written as a sum of incremental interpolants over resolution levels,

$$U^L(u)(\theta) = \sum_{i=0}^{L} \Delta U^i(u)(\theta), \qquad (10)$$

$$\Delta U^i(u)(\theta) = \sum_{j=1}^{m_i} w_{i,j}\,\phi_{i,j}(\theta). \qquad (11)$$

The integer $L$ in equation 10 is called the resolution level of the hierarchical interpolant, and the summation over the resolution levels in equation 10 exhibits the hierarchical structure of the interpolant $U^L(u)$. For $i = 0, \dots, L$, $\phi_{i,j}$ and $w_{i,j}$ in equation 11 are the basis functions and the interpolation coefficients (hierarchical surpluses) of $\Delta U^i$, respectively, and the integer $m_i$ in equation 11 is the number of interpolation points involved in $\Delta U^i$, which is defined by

$$m_i = \begin{cases} 1, & i = 0, \\ 2, & i = 1, \\ 2^{\,i-1}, & i \ge 2. \end{cases} \qquad (12)$$

The corresponding interpolation points are the center $\theta_{0,1} = 0.5$ on level 0, the boundary points $\theta_{1,1} = 0$ and $\theta_{1,2} = 1$ on level 1, and the odd dyadic points $\theta_{i,j} = (2j-1)/2^{\,i}$, $j = 1, \dots, 2^{\,i-1}$, on levels $i \ge 2$.

[23] Since the representation of $\Delta U^i$ depends on the properties of the selected basis function $\phi_{i,j}$, the basis functions are discussed first. Different from previous studies that utilize linear hierarchical basis functions to build surrogate systems, this study uses high-order hierarchical polynomial basis functions, including the quadratic and cubic hierarchical bases defined by *Bungartz and Griebel* [2004], in order to improve the accuracy and efficiency of constructing the surrogate system. Expressions of the linear, quadratic, and cubic hierarchical polynomial bases are provided below.

The linear hierarchical basis is the standard hat function, compactly supported on an interval of width $2^{\,1-i}$ centered at $\theta_{i,j}$ on levels $i \ge 2$, with special definitions at the center point ($i = 0$) and at the two boundary points ($i = 1$, $j = 1$ and 2). The quadratic hierarchical basis (equations 17 and 18) replaces the hat function with a compactly supported quadratic polynomial that vanishes at the neighboring coarser-grid points. The cubic hierarchical basis is defined analogously, with different polynomial expressions depending on whether the index $j$ is odd or even; for $i = 1$, the cubic basis is the same as the quadratic hierarchical basis defined by equations 17 and 18. Explicit expressions of these bases are given by *Bungartz and Griebel* [2004].
As discussed by *Klimke and Wohlmuth* [2005] and *Ma and Zabaras* [2009], when the function $u$ is smooth with respect to $\theta$, the magnitude of the surplus approaches zero as the resolution level $i$ increases. Therefore, the surplus can be used as an error indicator for the interpolant in order to guide the sparse-grid refinement.
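The one-dimensional hierarchical construction can be sketched with the linear hat basis; the quadratic and cubic bases of *Bungartz and Griebel* [2004] would reuse the same grid and surplus logic with higher-order basis functions. The level/point layout below (center on level 0, boundaries on level 1, odd dyadics afterward) follows the description above; the surplus of each new point is the difference between the function value and the current interpolant there.

```python
# 1-D hierarchical interpolation with the linear hat basis, showing that the
# hierarchical surpluses decay with the level for a smooth target function.
import math

def hierarchy_points(level):
    """Points added on each 1-D level: 0.5; then 0 and 1; then odd dyadics."""
    if level == 0:
        return [0.5]
    if level == 1:
        return [0.0, 1.0]
    return [(2 * j - 1) / 2 ** level for j in range(1, 2 ** (level - 1) + 1)]

def basis(x, center, level):
    """Constant basis on level 0; hat of support width 2**(1-level) otherwise."""
    if level == 0:
        return 1.0
    return max(0.0, 1.0 - abs(x - center) * 2 ** level)

def evaluate(nodes, x):
    """Evaluate the interpolant from (center, level, surplus) triples."""
    return sum(w * basis(x, c, lvl) for c, lvl, w in nodes)

def build(u, max_level):
    """Compute surpluses level by level: surplus = u - current interpolant."""
    nodes, level_surplus = [], []
    for lvl in range(max_level + 1):
        new = [(c, lvl, u(c) - evaluate(nodes, c)) for c in hierarchy_points(lvl)]
        nodes.extend(new)
        level_surplus.append(max(abs(w) for _, _, w in new))
    return nodes, level_surplus

nodes, level_surplus = build(lambda x: math.sin(math.pi * x), 6)
# level_surplus shrinks with the level because sin(pi * x) is smooth
```

Since each finer hat vanishes at all coarser points, adding a level never disturbs the values already interpolated, which is what makes the surplus a purely local error indicator.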

#### 3.2.2. Multidimensional Isotropic Sparse-Grid Interpolation

In $N_\theta$ dimensions, an $L$-level full tensor-product interpolant needs $(2^L + 1)^{N_\theta}$ grid points, where the number of points per level is defined in equation 12. This is also the number of model executions needed when building the surrogate system. Using the full tensor-product formulation, the number of grid points grows exponentially with the number of random parameters $N_\theta$, which is the curse of dimensionality. By virtue of the second definition of the index set in equation 30, corresponding to the isotropic sparse-grid interpolation, the curse of dimensionality can be alleviated.

[29] Figure 3 illustrates how the curse of dimensionality is alleviated, using the construction of a two-dimensional ($N_\theta = 2$) level $L = 3$ isotropic sparse grid as an example. The definitions in equation 30 show that an $L$-level isotropic sparse grid is a subgrid of an $L$-level full tensor-product grid. The resolution level in one dimension can be $i_1 = 0, 1, 2$, and 3, as shown by the top horizontal lines in Figure 3a; the same is true for the other dimension, as shown by the left vertical lines. There are a total of 16 subgrids in Figure 3a, each of which corresponds to an incremental interpolant in equation 25, with $i_1, i_2 \in \{0, 1, 2, 3\}$. Different combinations of $i_1$ and $i_2$ with $\max(i_1, i_2) \le 3$ lead to all 16 subgrids in Figure 3a, the union of which constitutes the level $L = 3$ full tensor-product grid with the 81 grid points shown in Figure 3c. In comparison, different combinations of $i_1$ and $i_2$ with $i_1 + i_2 \le 3$ lead to only the 10 subgrids above the dashed line in Figure 3a, the union of which constitutes the level $L = 3$ isotropic sparse grid with only the 29 grid points shown in Figure 3b. This reduction is significant even though the maximum number of interpolation points in each dimension is the same for both grids. Generally speaking, an isotropic sparse grid contains approximately $2^L L^{N_\theta - 1}$ points for large $L$, whereas a full tensor-product grid contains $(2^L + 1)^{N_\theta}$ points [*Nobile et al*., 2008a]. Although significantly fewer points are used, the accuracy of the sparse-grid interpolation does not appreciably deteriorate compared to that of the full tensor-product interpolation [*Barthelmann et al*., 2000; *Bungartz and Griebel*, 2004]. Thus, in the sequel, the index set in equation 25 is fixed to $i_1 + \dots + i_{N_\theta} \le L$, and equation 25 is referred to as an isotropic sparse-grid interpolant.
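The subgrid-selection rule can be checked numerically. The sketch below enumerates the tensor products of the per-level point sets (using the 1-D layout described in section 3.2.1, with the center point on level 0 and the boundary points on level 1) and reproduces the 29-versus-81 point counts of Figures 3b and 3c.

```python
# Count distinct grid points of the isotropic sparse grid (|i|_1 <= L) and
# the full tensor-product grid (max index <= L) in two dimensions, L = 3.
from itertools import product

def level_points(level):
    """New 1-D points introduced on a given resolution level."""
    if level == 0:
        return [0.5]
    if level == 1:
        return [0.0, 1.0]
    return [(2 * j - 1) / 2 ** level for j in range(1, 2 ** (level - 1) + 1)]

def grid(dim, L, sparse=True):
    """Union of tensor products of the per-level point sets over subgrids."""
    pts = set()
    for idx in product(range(L + 1), repeat=dim):
        if sparse and sum(idx) > L:
            continue  # keep only subgrids with i_1 + ... + i_dim <= L
        pts.update(product(*(level_points(i) for i in idx)))
    return pts

n_sparse = len(grid(2, 3, sparse=True))    # 29 points, as in Figure 3b
n_full = len(grid(2, 3, sparse=False))     # 81 points, as in Figure 3c
```

Each grid point corresponds to one forward-model execution, so the sparse-grid construction already saves roughly two thirds of the model runs at this modest level and dimension, and the savings grow rapidly with $N_\theta$.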

[30] The interpolation coefficients (hierarchical surpluses) of the multidimensional interpolant are computed recursively, as in *Klimke and Wohlmuth* [2005]: for each new grid point, the coefficient is the difference between the function value and the value of the current interpolant at that point.

#### 3.2.3. Adaptive Sparse-Grid Interpolation

[31] As discussed above, if the function $u$ is smooth with respect to $\boldsymbol{\theta}$, the magnitude of the hierarchical surplus decays to zero as the resolution level $L$ of $U^L(u)$ increases, and a smoother function has a faster decay rate of the surplus. This feature is the basis for constructing adaptive sparse grids using the surplus as an error indicator. We start from the construction of one-dimensional adaptive grids and then extend it to the multidimensional sparse grids. As shown in Figure 4, the one-dimensional isotropic hierarchical grid has a tree-like structure: a grid point on level $i$ has two children on level $i + 1$. Special treatment is required when moving from level 1 to level 2, because only one child point is added on level 2 for each of the boundary nodes $\theta = 0$ and $\theta = 1$. On each successive interpolation level, the basic idea of adaptivity is to use the hierarchical surplus as an error indicator to detect the smoothness of the target function and to refine the grid by adding the two new points on the next level only for each point whose magnitude of surplus is larger than the prescribed error tolerance.

[32] The adaptivity concept is illustrated in Figure 4, where a six-level adaptive grid is used to interpolate a Gaussian kernel function on [0, 1] with an error tolerance of 0.01. From level 0 to level 2, because the magnitude of every surplus is larger than 0.01, two points are added for each grid point, except that only one point is added for each grid point on level 1. On level 3, since the surplus is larger than 0.01 at only one point, two new points are added for this point on level 4. When this procedure continues through levels 5 and 6, it leads to the six-level adaptive grid with only 21 points (points in black in Figure 4), whereas the six-level nonadaptive (isotropic) grid has a total of 65 points (points in black and gray in Figure 4).

*L*sparse grid in equation 29 can be rewritten as

The level-*L* adaptive sparse-grid interpolant in equation 35 only retains the terms of the isotropic sparse-grid interpolant in equation 25 whose corresponding surpluses have magnitudes larger than α. The corresponding adaptive sparse grid can be represented by

which is a subset of the level-*L* isotropic sparse grid in equation 29. If the tolerance , the adaptive sparse-grid interpolant is equivalent to the isotropic sparse-grid interpolant in equation 25; if , it adaptively selects which points are added to the sparse grid. Subsequently, the sparse-grid points become concentrated in the nonsmooth regions, e.g., where oscillations or sharp transitions occur, to guarantee the prescribed accuracy of the interpolation. On the other hand, in regions where is very smooth, e.g., insensitive to certain parameters, this approach saves a significant number of grid points while still achieving the prescribed accuracy.

[35] In practice, for a specific
-dimensional target function
, the total number of sparse grid points and the accuracy of
can be controlled by two user-defined constants *L* and α, where *L* defines the maximum allowable resolution of the sparse grid and the error tolerance α guides mesh refinement. *L* is usually set to a large value according to the maximum affordable computational cost, and α is set to the desired accuracy of the interpolation, which maximizes the use of the available computational resources. The mesh refinement stops in one of two ways: when the magnitudes of all surpluses on the current level are smaller than α, or when the maximum level *L* is reached.

### 3.3. Algorithm for Iterative Construction of the Surrogate PPDF

[36] Using the procedure of defining high-probability regions and the aSG-hSC method discussed in sections 3.1 and 3.2, respectively, one can iteratively construct the surrogate system for
with multiple modes due to the nonlinearity of the reactive transport models. Figure 5 shows the flowchart of the iterative algorithm that sequentially captures all the significant modes. As shown in Figure 5, the algorithm starts from defining the searching region Γ of the Bayesian inference (also used for global optimization) and the objective function of global optimization
. As discussed in Section 3.1, since the preprocessing function
must be invertible, i.e.,
exists, we also use
as the target function of the sparse-grid approximation due to numerical stability issues studied in *Pflüger* [2005]. For example, when using the formal Gaussian likelihood function or the informal exponential likelihood function in Table 1, the logarithm of the likelihood function is used as the target function for both optimization and the surrogate system.

In the first iteration (*k* = 1) in Figure 5, global optimization is used to search for the global optimum of the function through the global optimization operator . The resulting optimum is the highest peak of , and thus the most significant mode on . Subsequently, the inverse of the Hessian matrix of is calculated to determine the prior region around based on equation 8 and the user-defined vector . On , the adaptive sparse-grid interpolant

In the *k*th iteration, the *k* − 1 components of the surrogate system built in the previous iterations are excluded from the optimization, and the optimization operator is applied to only the remainder, in which the *m*th component of the surrogate system is defined on the domain , and  is the characteristic function of the region that avoids overlap of different prior regions. The maximum represents the *k*th highest peak (mode) of the target function. Since the modes become less significant as *k* increases, the significance ratio  is used to decide when the iteration terminates. The final surrogate system consists of *M* components for the *M* significant modes. The total number of model executions for constructing  in equation 41 consists of those for global optimization, estimation of the Hessian matrix, and adaptive sparse-grid interpolation.
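As a purely illustrative sketch of this iterative mode search, the snippet below operates on a one-dimensional bimodal log-density. The target function, the brute-force grid "global optimizer," the finite-difference curvature, and the constants `gamma` and `ratio_tol` are all stand-ins for the paper's global optimization operator, Hessian estimate, user-defined vector, and significance ratio.

```python
import numpy as np

# hypothetical bimodal log-density standing in for the log-likelihood target
def log_g(x):
    return np.log(0.6 * np.exp(-0.5 * ((x - 1.0) / 0.1) ** 2)
                  + 0.4 * np.exp(-0.5 * ((x + 2.0) / 0.2) ** 2) + 1e-300)

def find_modes(log_g, bounds=(-5.0, 5.0), gamma=4.0, ratio_tol=0.01):
    """Sequentially locate significant modes; each mode's high-probability
    region is masked out (characteristic function) before the next search."""
    xs = np.linspace(bounds[0], bounds[1], 20001)   # crude global search
    vals = log_g(xs)
    mask = np.ones_like(xs, dtype=bool)             # points still searchable
    modes, first_peak = [], None
    while mask.any():
        i = np.argmax(np.where(mask, vals, -np.inf))
        x_star, v_star = xs[i], vals[i]
        if first_peak is None:
            first_peak = v_star
        elif np.exp(v_star - first_peak) < ratio_tol:
            break                                   # remaining peaks insignificant
        h = xs[1] - xs[0]                           # curvature by finite differences
        curv = (log_g(x_star - h) - 2 * v_star + log_g(x_star + h)) / h**2
        sigma = 1.0 / np.sqrt(max(-curv, 1e-12))
        lo, hi = x_star - gamma * sigma, x_star + gamma * sigma
        modes.append((x_star, lo, hi))
        mask &= (xs < lo) | (xs > hi)               # exclude this prior region
    return modes

modes = find_modes(log_g)
print([round(m[0], 2) for m in modes])
```

The two significant modes near x = 1 and x = −2 are found in order of decreasing height, and the search stops once the next candidate peak falls below the significance ratio.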

## 4. Numerical Examples of Groundwater Reactive Transport Modeling

[39] To illustrate the effectiveness and efficiency of the iterative aSG-hSC method in building the surrogate system for the PPDF, it is applied to two synthetic examples of groundwater reactive transport modeling. The first example, adapted from *Sun et al*. [1999], considers multispecies reactive transport with six random parameters. Since the five reactions involved in this example are linear, it can be used to evaluate the computational efficiency and accuracy of the aSG-hSC approach in approximating high-dimensional posterior distributions with the high-order hierarchical basis. The second example, revised from *Kohler et al*. [1996], concerns reactive transport of uranium (VI) in a column experiment with four random parameters. This example is more complicated, since it includes nonlinear reactions of surface complexation. For the same reactive transport model used in this study, Shi et al. [submitted manuscript, 2013] found that the PPDFs of the model parameters are non-Gaussian and have multiple modes. This example can therefore be used to evaluate the computational efficiency and effectiveness of the iterative aSG-hSC method for a PPDF with multiple modes. To demonstrate that the aSG-hSC method is not limited to the Gaussian likelihood function, the informal likelihood function of the exponential type (Table 1) is used in the second numerical example. The aSG-hSC method is evaluated by comparing the results of aSG-hSC-based MCMC with those of DREAM-based MCMC in approximating the PPDFs of model parameters and the PDFs of model predictions. Computational efficiency of aSG-hSC is evaluated from two perspectives: (1) the number of model executions required to obtain an estimate of the PPDF within a prescribed accuracy, and (2) the accuracy of the approximate PPDF for a given number of model executions. 
These two criteria are complementary: the first applies when a large number of model executions is affordable, the second when only a limited number is affordable. For the second numerical example, nonlinear regression is conducted using UCODE_2005 [*Poeter et al*., 2008] to estimate a local parameter optimum and quantify parameter uncertainty. Due to high model nonlinearity, the nonlinear regression cannot identify multiple modes of the PPDF of one parameter and cannot accurately quantify parameter uncertainty.

### 4.1. Case 1: Multispecies Reactive Transport

[40] The first example is based on the serial-parallel reaction network of *Sun et al*. [1999]. As shown in Figure 6, species *A* has one child species *B*, and *B* has three child species *C*_{1}, *C*_{2}, and *C*_{3}. The governing equations of the simultaneous transport and degradation of the five species involved in the serial-parallel reaction network are as follows:

where *C*_{A}, *C*_{B}, , , and  are the concentrations of the five species *A*, *B*, *C*_{1}, *C*_{2}, and *C*_{3}, respectively; *t* is the time; *x* is the spatial location in the domain [0,40]; *v* is the constant flow velocity; *D* is the dispersion coefficient; *k*_{A}, *k*_{B}, , , and  are the reaction rates of the five species; *y*_{B} is the stoichiometric yield factor that describes the production of species *B* from its parent species *A*, and likewise for , , .

[41] Using the parameter values given by *Sun et al*. [1999], synthetic data are generated by solving equation 42 at time *t* = 40 and 10 points of
using the numerical code PHT3D [*Prommer and Post*, 2010]. A total of 50 concentrations are generated for the five species and corrupted with 3% Gaussian random noise; the corrupted data are treated as measurements. In the DREAM-based MCMC simulation, the six parameters listed in Table 2 are treated as unknown: the dispersion coefficient *D* and the logarithms of the five reaction rates,
,
,
,
, and
. Their true values are listed in Table 2; the other parameters are fixed at their true values of
.

*g* is positive for numerical convenience). The corresponding value is −7.9128. Computing the Hessian matrix using equations 5 and 6 requires 73 model executions. The inverse of the Hessian matrix is

with *L* = 20 and . This is the first component of the surrogate system . The sparse-grid interpolant is constructed using the linear, quadratic, and cubic basis functions shown in Figure 2. The numbers of model executions needed for the three interpolants are 6760, 1909, and 1299, respectively, which are also the numbers of points of the three corresponding adaptive sparse grids.
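The count of 73 model executions is consistent with a standard central finite-difference Hessian for six parameters: 1 base run, 2d runs for the diagonal terms, and 2d(d − 1) runs for the mixed partials, i.e., 1 + 12 + 60 = 73 for d = 6. A generic sketch (ours, not a verbatim transcription of equations 5 and 6):

```python
import numpy as np

def fd_hessian(f, x0, h=1e-4):
    """Central finite-difference Hessian of f at x0.
    Uses 1 + 2d + 2d(d-1) function evaluations for d parameters."""
    d = len(x0)
    calls = [0]
    def fe(x):                      # wrap f to count "model executions"
        calls[0] += 1
        return f(x)
    H = np.zeros((d, d))
    f0 = fe(x0)
    for i in range(d):              # diagonal: (f+ - 2*f0 + f-) / h^2
        e = np.zeros(d); e[i] = h
        H[i, i] = (fe(x0 + e) - 2 * f0 + fe(x0 - e)) / h**2
    for i in range(d):              # mixed partials: 4 runs per pair
        for j in range(i + 1, d):
            ei = np.zeros(d); ei[i] = h
            ej = np.zeros(d); ej[j] = h
            H[i, j] = H[j, i] = (fe(x0 + ei + ej) - fe(x0 + ei - ej)
                                 - fe(x0 - ei + ej) + fe(x0 - ei - ej)) / (4 * h**2)
    return H, calls[0]

# quadratic test: the Hessian of x'Ax is A + A', recovered exactly
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) + 0.1
H, n_runs = fd_hessian(lambda x: x @ A @ x, np.zeros(6))
print(n_runs)   # 73 for d = 6
```

Because each evaluation here is a full forward-model run, this Hessian step is cheap relative to the thousands of runs needed for the sparse-grid construction.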

[44] With the surrogate systems,
, constructed using the linear, quadratic, and cubic hierarchical bases, the PPDFs of model parameters are estimated by conducting MCMC simulations. The DREAM-based MCMC simulation without the surrogate systems is also conducted; it is referred to as the conventional MCMC, and its results are used as the reference for evaluating the accuracy and efficiency of the three basis functions. All the MCMC simulations use the same searching domain Γ listed in Table 2. The prior distribution of each parameter is assumed to be uniform with the same bounds as the searching domain. Each MCMC simulation draws 60,000 parameter samples using three Markov chains, each of which evolves for 20,000 generations. Convergence of the Markov chains is examined using the Gelman-Rubin R statistic [*Gelman et al*., 1995], which indicates that the chains converge after 720, 840, 970, and 760 generations for the linear, quadratic, and cubic surrogates and the conventional MCMC, respectively. For simplicity, the first 1000 generations of each chain are discarded in all four simulations, and the remaining
samples are used to estimate the PPDF.
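The Gelman-Rubin statistic used for this convergence check can be computed per parameter from the parallel chains. A minimal sketch (with synthetic chains, not the actual DREAM output):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R for one parameter.
    `chains` has shape (m, n): m parallel chains of n samples each."""
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)               # -> 1 as the chains converge

rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(3, 2000))             # chains on one target
stuck = np.stack([rng.normal(mu, 1.0, 2000) for mu in (0.0, 3.0, 6.0)])
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Well-mixed chains give R near 1, while chains stuck in different regions give R well above the usual convergence threshold of about 1.1.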

[45] Figure 7 plots the marginal PPDFs of the six parameters. The black vertical lines represent the true values listed in Table 2. The red solid lines are the marginal PPDFs estimated by the conventional MCMC, and the dashed lines represent those estimated by the MCMC simulations based on the surrogate systems. The figure indicates that the MCMC results based on the surrogate systems constructed by our aSG-hSC method are close to those of the conventional MCMC. However, the surrogate-based MCMC needs significantly fewer model executions. In comparison with the 60,000 model executions of the conventional MCMC, the numbers of model executions for the surrogate-based MCMC are 9226, 4375, and 3765 for the linear, quadratic, and cubic surrogate systems, respectively, which consist of those for global optimization, calculation of the inverse of the Hessian matrix, and construction of the surrogate systems. For the surrogate-based MCMC simulations, drawing the 60,000 parameter samples requires no model executions, only negligible computational time for polynomial evaluation of the surrogate systems. The improvement in computational efficiency from using our surrogate systems becomes even more pronounced when more parameter samples are drawn in the MCMC simulation.
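The mechanics of surrogate-based sampling can be illustrated with a bare-bones random-walk Metropolis sampler (DREAM itself is considerably more sophisticated). Here an analytic Gaussian log-density stands in for the polynomial surrogate of the log-PPDF; no forward-model runs occur inside the sampling loop.

```python
import numpy as np

def metropolis(log_post, x0, steps=20000, scale=0.3, seed=1):
    """Random-walk Metropolis driven by a cheap surrogate log-PPDF,
    so no forward-model executions are needed during sampling."""
    rng = np.random.default_rng(seed)
    x, lp = np.array(x0, float), log_post(x0)
    out = np.empty((steps, len(x0)))
    for t in range(steps):
        prop = x + scale * rng.standard_normal(len(x0))
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject step
            x, lp = prop, lp_prop
        out[t] = x
    return out

# stand-in surrogate: a sparse-grid interpolant would be evaluated here;
# we use an analytic 2-D Gaussian log-density as a placeholder
surrogate_log_ppdf = lambda x: -0.5 * np.sum((np.asarray(x) / 0.5) ** 2)
samples = metropolis(surrogate_log_ppdf, [1.0, -1.0])
post = samples[5000:]                             # discard burn-in
print(post.mean(axis=0), post.std(axis=0))
```

Each iteration costs only a polynomial (here, analytic) evaluation, which is why drawing tens of thousands of samples from the surrogate is essentially free compared with running the reactive transport model.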

[46] The accuracy of the surrogate-based MCMC and the conventional MCMC is also compared by running the conventional MCMC with the same computational effort as the surrogate-based MCMC, i.e., with the number of model executions needed to construct the surrogate systems. The marginal PPDFs estimated by the conventional MCMC with 9226, 4375, and 3765 samples are plotted in Figure 8 as dashed lines. Comparing Figures 7 and 8 indicates that, for the same number of model executions, the approximations in Figure 7 based on our surrogate systems are more accurate than those in Figure 8 based on the conventional MCMC, demonstrating the efficiency of our surrogate-based MCMC method.

[47] To compare the computational efficiency of the linear, quadratic, and cubic interpolants, Figure 9 plots their error decay against the number of interpolation points. To attain the same error, the cubic interpolant needs significantly fewer interpolation points than the linear and quadratic interpolants. This indicates that a surrogate system based on a high-order hierarchical basis (here, the cubic basis) is more efficient than one based on the linear hierarchical basis, and suggests that, when computational resources are limited, a higher-order hierarchical basis is the better choice.

[48] Predictive performance of the four types of MCMC simulations is evaluated by using the parameter samples obtained above to predict spatial distribution of the concentration of species *C*_{3} with a different velocity of
at time
and
. For each parameter sample drawn in the conventional MCMC, PHT3D is run for predictions; for the samples drawn in the surrogate-based MCMC, the predictions are estimated based on the surrogate systems. The surrogate systems are built via
in equation 35 by setting
and
. Figure 10a shows that the upper and lower bounds of the 95% credible intervals based on the conventional and surrogate-based MCMC are identical. Figure 10b plots the probability density functions of *C*_{3} at a fixed location and time
obtained using the four types of MCMC simulations. Figure 10b indicates that the four density functions are almost identical except at the peak. However, the computational cost of prediction differs dramatically between the conventional and surrogate-based MCMC. While the conventional MCMC needs to run the model 57,000 times (the number of parameter samples), the surrogate-based MCMC only needs 1853, 1032, and 793 model runs to build the linear, quadratic, and cubic surrogate systems, respectively.
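The 95% credible intervals in Figure 10 are obtained from the posterior predictive samples by simple quantile estimation. A generic sketch with synthetic (lognormal) predictive samples, not the actual *C*_{3} predictions:

```python
import numpy as np

rng = np.random.default_rng(42)
# synthetic posterior predictive samples of a concentration (placeholder)
pred = rng.lognormal(mean=-1.0, sigma=0.3, size=57000)

# equal-tailed 95% credible interval from the MCMC sample
lo, hi = np.percentile(pred, [2.5, 97.5])
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")
```

The same sample can also be fed to a kernel density estimator to produce the predictive density curves shown in Figure 10b.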

### 4.2. Case 2: Reactive Transport of Uranium (VI) in Column Experiment

[49] The second synthetic study is based on the uranium reactive transport modeling of *Kohler et al*. [1996], who conducted seven column experiments in a well-characterized U(VI)-quartz-fluoride column system and simulated the experiments using seven alternative surface complexation models (C1–C7) with different numbers of functional groups and reactions. The models were calibrated against three column experiments (Experiments 1, 2, and 8) conducted under different experimental conditions, and the calibrated models were used to predict the remaining four experiments (Experiments 3, 4, 5, and 7). Model C4 of *Kohler et al*. [1996] is used in this study. As shown in Table 3, the model has two functional groups, called the weak site (
) and the strong site (
), respectively. The weak site is associated with one reaction, and the strong site with two reactions. The model has a total of four parameters, three of which are the formation rates of the three reactions, denoted as *K*_{1}, *K*_{2}, and *K*_{3}. The fourth parameter is the fraction of the strong site, denoted as *Site* (the fraction of the weak site is one minus the fraction of the strong site). The base-10 logarithms of the parameters are listed in Table 3. In this study, following *Kohler et al*. [1996], the concentration data are generated using the computer code RATEQ (developed by *Curtis* [2005]) for the chemical conditions of Experiments 1, 2, and 8. The numbers of concentrations for Experiments 1, 2, and 8 are 39, 32, and 49, respectively. The synthetic data are corrupted by adding 3% random noise to the true concentration values.

U(VI) Surface Reaction | True Values | Γ
---|---|---
 |  | [−10.0, −2.0]
 |  | [−6.0, −1.0]
 |  | [0.1, 5.0]
 | (Site) = −1.7104 | [−4, −0.1]

^{a}Total site density used in this model is 1.3 M/L.

*Poeter et al*., 2008] with the searching regions listed in Table 3 are

*g* is positive for numerical convenience) is estimated in the searching domain Γ. After 2309 model executions, the first maximum is found as

[52] Figure 11 illustrates the relation between the above-defined high-probability region and the searching region. To make the visualization possible,
is fixed at its optimum of −3.4077, and the high-probability region of the other three parameters is plotted in the gray region of Figure 11, which is transformed from the unit cube
after rotation and dilation. The figure shows that the volume of the high-probability region is dramatically smaller than that of the searching region Γ given in Table 3. Nevertheless, the high-probability region is sufficiently large to cover all the MCMC samples obtained using DREAM around
, which are plotted as the blue dots in Figure 11a. Based on the three-dimensional high-probability region, the adaptive sparse-grid interpolant
is constructed using equation 35 and by setting
,
, *L* = 20, and the tolerance
. The interpolant is built using the linear, quadratic, and cubic basis functions, and the numbers of model executions needed for the three interpolants are 1577, 633, and 393, respectively. The sparse grid of the cubic basis function for the parameters log
, log
, and log
 is shown in Figure 11b. Since the number of model executions needed to build an isotropic sparse grid is 6017, the adaptive sparse-grid surrogate is considerably more computationally efficient. Among the three basis functions, the cubic hierarchical basis is the most efficient and is thus used in the calculations below. Using the cubic basis function, the number of model executions needed to build the four-dimensional sparse grid,
, for all the model parameters increases to 548.

[55] Figure 12 plots the marginal posterior distributions of the four parameters and the 2-D contours of their combinations obtained using the DREAM- and surrogate-based MCMC. As in the first numerical experiment, each MCMC simulation draws a total of 60,000 parameter samples using three Markov chains. The Gelman-Rubin R statistic indicates that the Markov chains converge after 600 and 420 generations for the DREAM- and surrogate-based MCMC, respectively. For simplicity, the first 600 samples are discarded in both simulations, and the remaining samples are used to estimate the PPDF. Figure 12 indicates that the MCMC results based on the surrogate systems constructed by our aSG-hSC method are almost identical to those obtained using DREAM. However, given that the numbers of model executions for the DREAM- and surrogate-based MCMC are 60,000 and 9647, respectively, the surrogate-based MCMC achieves comparable accuracy with significantly less computation and is thus far more efficient.

[56] Predictive performance of the DREAM- and surrogate-based MCMC is evaluated by using the parameter samples obtained above to predict the breakthrough curve of Experiment 4 of *Kohler et al*. [1996], which has 118 measurements. As in Case 1, a cubic surrogate system is built at each predicted point, which costs 411 model executions. Figure 13a plots the upper and lower bounds of the 95% credible intervals of the predicted breakthrough curve obtained from the DREAM- and surrogate-based MCMC; the two sets of credible intervals are visually identical. Figure 13b plots the density functions of the concentration at a pore volume of 3.76 in the predicted breakthrough curve obtained from the two kinds of MCMC simulations; the two distributions are very close. While only 411 model executions are needed to build the surrogate system with the cubic basis function and obtain the results in Figure 13, the DREAM-based MCMC requires 58,200 model executions.

[57] *Lu et al*. [2012b] and *Shi et al*. [2012] showed that nonlinear regression and Bayesian methods may give similar results for quantifying parametric uncertainty, while nonlinear regression methods require only hundreds of model executions. To investigate this, nonlinear regression is conducted using UCODE_2005, which minimizes the sum of squared weighted residuals (SSWR) to estimate a local optimum of the parameters and calculate the parameter estimation covariance matrix. Since UCODE_2005 can incorporate prior information into the nonlinear regression, the prior density given in equation 48 is used in the UCODE_2005 optimization. The initial parameter values for the local optimization are selected randomly in the searching region Γ listed in Table 3. The calibrated parameter values are , which is very close to the second mode, , of the PPDF. If one adjusts the initial parameter values, the first mode, , of the PPDF may be obtained. However, it is impossible to obtain the two modes simultaneously, even though they have similar densities (Figure 12). Therefore, the discussion here focuses on the current local optimum, which is close to . Its corresponding covariance matrix is

*Hill and Tiedeman*, 2007]. Therefore, the covariance matrix estimated above based on linearity assumptions cannot accurately quantify parametric uncertainty. While there may be other methods that are comparable with the surrogate-based MCMC in terms of accuracy and efficiency for parametric uncertainty quantification, identifying such methods and having a comprehensive comparison is beyond the scope of this study.

## 5. Conclusion

[58] This paper presents a new adaptive sparse-grid high-order stochastic collocation (aSG-hSC) method to improve the computational efficiency of Bayesian inference for quantification of parametric uncertainty. The method is model independent and flexible enough to be used with any MCMC algorithm and likelihood function (formal or informal). This study tackles a challenging problem of groundwater reactive transport modeling. The high nonlinearity of groundwater reactive transport models causes difficulties in developing an accurate and efficient surrogate of the models and in capturing significant modes of parameter distributions. These problems are resolved by combining a high-order hierarchical polynomial basis with the local adaptive sparse-grid technique, which greatly reduces the computational cost of building the desired surrogate system in comparison with existing sparse-grid methods. To further reduce the computational cost of constructing a surrogate system for a parameter distribution with multiple modes, an iterative aSG-hSC algorithm is developed that uses optimization methods to find the modes sequentially. For each mode, a high-probability region is built, on which a component of the sparse grid is constructed. The high-probability regions are significantly smaller than the searching region of the MCMC simulation, which is the reason for the savings in computational cost. The iterative aSG-hSC method is demonstrated using two numerical examples of groundwater reactive transport modeling. In both cases, the aSG-hSC method provides results almost identical to those of DREAM-based MCMC but requires a dramatically smaller number of model executions for estimating parameter distributions and quantifying predictive uncertainty. The first example involves only linear reactions and is suitable for demonstrating that higher-order hierarchical basis functions are more efficient. The second example involves nonlinear reactions and is thus highly nonlinear. 
Its parameter distributions are multimodal and non-Gaussian, and these features are well captured in the results obtained using the iterative aSG-hSC method. These features, however, cannot be captured by the nonlinear regression method investigated in this study. The computational efficiency of the aSG-hSC method is critical to the practical application of Bayesian inference to time-consuming groundwater reactive transport modeling. Due to the nonintrusive nature of the new method, it can be used with many models and sampling methods in hydrology and other fields.

[59] As a surrogate method, the iterative aSG-hSC method has some limitations. First, its computational performance relies on the ability to find the modes using optimization methods. If the execution of the global optimization solver is computationally expensive, it will degrade the efficiency of aSG-hSC. If the optimization fails to find an optimum parameter set in a given iteration, a significant mode may be missed. In this case, one has to sacrifice computational efficiency and use more sparse-grid points. However, it is worth mentioning that this kind of challenge is not specific to our method but common to all numerical algorithms of uncertainty quantification and optimization. It is expected that this problem can be resolved with advances in optimization techniques. In addition, since the Hessian matrix is used to determine the high-probability domain for each significant mode, finding the optimal value of the user-defined constant in equation 8 remains empirical at this moment. When the shape of a detected significant mode is extremely complicated, the commonly used value, e.g., , may not be appropriate to cover the high-probability region. In other words, the reduction of the bounds for building sparse grids may not be as significant as shown in the numerical examples of this study. The major challenge resides in the nonsmoothness of the surface of parameter distributions due to the nonlinearity of groundwater reactive transport models. Reducing nonlinearity may be a solution to the problems mentioned above.

## Acknowledgments

[60] G. Zhang was supported by the Advanced Simulation Computing Research (ASCR), Department of Energy, through the Householder Fellowship at ORNL. M. Ye was supported by the DOE Early Career Award, DE-SC0008272. M. Gunzburger was supported by the US Air Force Office of Scientific Research under grant FA9550-11-1-0149. C. Webster was supported by the US Air Force Office of Scientific Research under grant 1854-V521-12. C. Webster was also sponsored by the Director's Strategic Hire Funds through the Laboratory Directed Research and Development (LDRD) Program of Oak Ridge National Laboratory (ORNL). The ORNL is operated by UT-Battelle, LLC, for the United States Department of Energy under Contract DE-AC05-00OR22725.