Volume 53, Issue 8, p. 6784-6801
Research Article

Large-scale inverse model analyses employing fast randomized data reduction

Youzuo Lin (Earth and Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico, USA)
Ellen B. Le (Institute for Computational Sciences and Engineering, University of Texas at Austin, Austin, Texas, USA)
Daniel O'Malley (Earth and Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico, USA)
Velimir V. Vesselinov (Earth and Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico, USA)
Tan Bui-Thanh (Institute for Computational Sciences and Engineering, University of Texas at Austin, Austin, Texas, USA)

Correspondence to: Y. Lin, [email protected]

First published: 06 July 2017

Abstract

When the number of observations is large, it is computationally challenging to apply classical inverse modeling techniques. We have developed a new computationally efficient technique for solving inverse problems with a large number of observations (e.g., on the order of $10^7$ or greater). Our method, which we call the randomized geostatistical approach (RGA), is built upon the principal component geostatistical approach (PCGA). We employ a data reduction technique combined with the PCGA to improve the computational efficiency and reduce the memory usage. Specifically, we employ a randomized numerical linear algebra technique based on a so-called "sketching" matrix to effectively reduce the dimension of the observations without losing the information content needed for the inverse analysis. In this way, the computational and memory costs for RGA scale with the information content rather than the size of the calibration data. Our algorithm is coded in Julia and implemented in the MADS open-source high-performance computational framework (http://mads.lanl.gov). We apply our new inverse modeling method to invert for a synthetic transmissivity field. Compared to a standard geostatistical approach (GA), our method is more efficient when the number of observations is large. Most importantly, our method is capable of solving larger inverse problems than the standard GA and PCGA approaches. Therefore, our new model inversion method is a powerful tool for solving large-scale inverse problems. The method can be applied in any field and is not limited to hydrogeological applications such as the characterization of aquifer heterogeneity.

Key Points

  • We have developed a computationally efficient, scalable, and implementation-friendly randomized geostatistical inversion method
  • Our method is especially suitable for inverse modeling with a large number of observations
  • Our method yields a comparable accuracy to other geostatistical inverse methods

1 Introduction

The permeability of a porous medium is of great importance for predicting flow and transport of fluids and contaminants in the subsurface [Carrera and Neuman, 1986; Sun, 1994; Carrera et al., 2005]. A well-understood distribution of permeability can be crucial for many different subsurface applications, such as (1) forecasting production performance of geothermal reservoirs, (2) extracting oil and gas, (3) estimating pathways of subsurface contaminant transport, and many others.

Various hydraulic inversion methods have been proposed to obtain subsurface permeability [Neuman and Yakowitz, 1979; Neuman et al., 1980; Carrera and Neuman, 1986; Sun, 1994; Kitanidis, 1997a; Zhang and Yeh, 1997; Carrera et al., 2005], of which geostatistical inversion approaches are the most widely used [Kitanidis, 1995; Zhang and Yeh, 1997; Kitanidis, 1997a, 1997b; Vesselinov et al., 2001a]. Geostatistical inversion can be more advantageous than many other subsurface inverse modeling methods in that not only can it provide uncertainty estimates, but it is also suitable for sequential data assimilation [Vesselinov et al., 2001a, 2001b; Illman et al., 2015; Yeh and Simunek, 2002]. However, as pointed out in Vesselinov et al. [2001b] and Illman et al. [2015], one drawback of geostatistical inversion methods is their high computational cost when the number of observations is large and the model is highly parameterized. In recent years, with the help of regularization techniques [Tarantola, 2005; Engl et al., 1996], there is a trend to increase the number of model parameters [Hunt et al., 2007]. It has been suggested that these highly parameterized models have great potential for characterizing subsurface heterogeneity [Tonkin and Doherty, 2005; Hunt et al., 2007]. Meanwhile, as the theory and computational tools for subsurface characterization quickly move into the new era of "big data," many existing methodologies are facing the challenge of handling a large number of unknown model parameters and observations. Therefore, it is important to address the theoretical and computational issues of the geostatistical inversion methods.

The costs related to geostatistical inversion methods can be broken into two parts: the computational cost and the memory cost. A number of computational techniques have been proposed to alleviate the costs of computation [Saibaba and Kitanidis, 2012; Liu et al., 2013; Ambikasaran et al., 2013; Liu et al., 2014; Lee and Kitanidis, 2014; Lin et al., 2016] or memory [Nowak et al., 2003; Schoniger et al., 2012; Saibaba and Kitanidis, 2012; Kitanidis and Lee, 2014; Lee and Kitanidis, 2014]. Some studies targeted both computation and memory costs [Saibaba and Kitanidis, 2012; Kitanidis and Lee, 2014; Lee and Kitanidis, 2014].

One major direction to reduce the computational cost is based on subspace approximations, i.e., solving a small-size approximated problem residing in a lower-dimensional subspace. Several types of subspaces have been utilized, including principal component subspaces [Kitanidis and Lee, 2014; Lee and Kitanidis, 2014; Tonkin and Doherty, 2005], Krylov subspaces [Lin et al., 2016; Liu et al., 2014; Saibaba and Kitanidis, 2012], subspaces spanned by reduced-order models [Liu et al., 2014], hierarchical matrix decompositions [Ambikasaran et al., 2013; Saibaba and Kitanidis, 2012], and active subspaces [Constantine et al., 2014].

In geostatistical inversion methods, a majority of the memory is used to store matrices, such as the Jacobian matrix and the covariance matrix [Kitanidis and Lee, 2014; Lee and Kitanidis, 2014]. In situations with a large number of measurements and model parameters, it is prohibitively expensive to store these matrices. To overcome the memory issues, matrix-free or low-rank approximation methods have been developed. Specifically, Kitanidis and Lee [2014] and Lee and Kitanidis [2014] developed a matrix-free Jacobian to approximate the multiplication of the Jacobian matrix with a vector by finite-difference operations. To further reduce the memory cost associated with storing the covariance matrices, various computational methods have been developed. Nowak et al. [2003] developed an FFT-based geostatistical inversion method, which is restricted to regular grids but only needs to store the first row of the covariance matrix. Ensemble Kalman filters (EnKFs) and related methods have also been proposed for geostatistical inversion to avoid the storage and handling of large covariance matrices [Schoniger et al., 2012]. Low-rank matrix approximation-based techniques have also been employed, such as hierarchical decomposition [Ambikasaran et al., 2013; Saibaba and Kitanidis, 2012] and principal component decomposition [Kitanidis and Lee, 2014; Lee and Kitanidis, 2014]. Recent work [Lee et al., 2016] reported a computationally efficient method to generate a preconditioner by using Generalized Eigenvalue Decomposition and the Sherman-Morrison-Woodbury formula. Another popular computational method to reduce the data size and computational cost is based on the extraction of temporal moments from large data sets [Yin and Illman, 2009; Zhu and Yeh, 2006; Nowak and Cirpka, 2006; Cirpka and Kitanidis, 2000].

Randomized algorithms have received a great deal of attention in recent years [Drineas and Mahoney, 2016]. They can be seen as either sampling or projection procedures [Mahoney, 2011]. Their main idea is to construct a sketch matrix of the input matrix. The sketch matrix is usually a smaller matrix that yields a good approximation and represents the essential information of the original input. In essence, a sketching matrix is applied to the data to obtain a sketch that can be employed as a surrogate for the original data to compute quantities of interest [Drineas and Mahoney, 2016]. Randomized algorithms have been successfully applied to various scientific and engineering domains, such as scientific computation and numerical linear algebra [Le et al., 2017; Meng et al., 2014; Drineas et al., 2011; Lin et al., 2010; Rokhlin and Tygert, 2008], seismic full-waveform inversion and tomography [Moghaddam et al., 2013; Krebs et al., 2009], and medical imaging [Huang et al., 2016; Wang et al., 2015; Zhang et al., 2012].

Here we present a new geostatistical inversion method using a randomization-based data reduction technique to reduce both the computational and memory costs. We use a Gaussian projection to produce the sketching matrix [Johnson and Lindenstrauss, 1984] and a direct linear solver to obtain the solution of the surrogate problem. A numerical cost analysis will show that our new randomized geostatistical inversion method improves the computational efficiency and reduces the memory cost significantly. To evaluate the performance of our new randomized geostatistical inversion method, a test case is presented where a transmissivity field is estimated from observations of hydraulic head. By comparing the results with those obtained using the conventional principal component geostatistical approach, we show that our method significantly reduces the computational and memory costs while maintaining the accuracy of the inversion results.

In the following sections, we first briefly describe the fundamentals of inverse modeling and geostatistical inversion methods (section 2). We then develop and discuss a randomized geostatistical inversion method (section 3). We further elaborate on the computational and memory costs of our method (section 4). We then apply our method to test problems and discuss the results (section 5). Finally, concluding remarks are presented in section 6.

2 Theory

2.1 Inverse Modeling

We consider a transient groundwater flow equation. The forward modeling problem can be written as:
$h = f(T) + \epsilon$ (1)
where h is the hydraulic head, T is the transmissivity, $f$ is the nonlinear forward operator mapping from the transmissivity to the hydraulic head, and $\epsilon$ is a term representing additive noise that follows a normal distribution:
$\epsilon \sim \mathcal{N}(0, R)$ (2)
where R is the error covariance matrix.
The problem of hydrogeologic inverse modeling is to estimate the transmissivity from available measurements. Usually, such a problem is posed as a minimization problem:
$E(m) = \| d - f(m) \|_2^2$ (3)
where d represents a hydraulic head data set, m is the vector of model parameters, and $\|\cdot\|_2$, the $\ell_2$ norm, measures the data misfit. Minimizing equation 3 yields a model $\tilde{m}$ that minimizes the mean-squared difference between observed data and model predictions. However, inverse problems are often severely ill-posed. Moreover, because of the nonlinearity of the forward modeling operator f, the solution of the inverse problem may be nonunique, and multiple sets of parameters might provide acceptable inverse solutions. Regularization techniques can be used to address the nonuniqueness of the solution and reduce the ill-posedness of the inverse problem. A general regularization term can be incorporated into equation 3 as [Vogel, 2002; Hansen, 1998]:
$\tilde{m} = \arg\min_{m} E(m),$ (4)
$E(m) = \| d - f(m) \|_2^2 + \lambda \, \mathcal{R}(m),$ (5)
where $\mathcal{R}(m)$ is a general regularization term and the parameter λ is the regularization parameter, which controls the amount of regularization in the inversion.
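To make the structure of equation 5 concrete, the following Julia sketch evaluates the regularized objective for a generic forward operator. The quadratic (Tikhonov) regularizer and all names here are illustrative assumptions, not the covariance-weighted formulation the paper actually uses (see section 2.2).

```julia
using LinearAlgebra

# Minimal sketch of the regularized objective in equation (5).
# `f` is a user-supplied forward operator, `lambda` the regularization weight;
# the Tikhonov penalty R(m) = ||m||^2 is just one illustrative choice.
function regularized_misfit(f, d, m, lambda)
    r = d - f(m)                          # data residual
    return dot(r, r) + lambda * dot(m, m)
end
```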

2.2 Geostatistical Inverse Modeling

To account for the errors in the observations and the model, we follow the work of Kitanidis and Lee [2014] and Lee and Kitanidis [2014] and employ the generalized least squares approach, which applies covariance-based weights to the data misfit and regularization terms in equation 5:
$E(m) = E_d(m) + E_m(m)$ (6)
The weighted data misfit and regularization terms are defined as:
$E_d(m) = \frac{1}{2} \, (d - f(m))^T R^{-1} (d - f(m))$ (7)
and
$E_m(m) = \frac{1}{2} \, (m - X\beta)^T Q^{-1} (m - X\beta)$ (8)
where X is a drift (trend) matrix, β is the vector of drift coefficients, Q is the covariance matrix of the model parameters, and R is defined in equation 2.
Using the Jacobian matrix H of the forward modeling operator f defined as:
$H = \left. \frac{\partial f(m)}{\partial m} \right|_{m = \bar{m}}$ (9)
the linearized approximation of the forward modeling operator f can be written as:
$f(m_{k+1}) \approx f(m_k) + H \, (m_{k+1} - m_k)$ (10)
where $m_{k+1}$ is the current solution and $m_k$ is the previous solution. Following Kitanidis [1997b] and Nowak and Cirpka [2004], the current solution $m_{k+1}$ in equation 10 is given by:
$m_{k+1} = X \beta_{k+1} + Q H^T \xi_{k+1}$ (11)
where the vectors $\xi_{k+1}$ and $\beta_{k+1}$ are solutions to the linear system of equations:
$\begin{bmatrix} H Q H^T + R & H X \\ (H X)^T & 0 \end{bmatrix} \begin{bmatrix} \xi_{k+1} \\ \beta_{k+1} \end{bmatrix} = \begin{bmatrix} d - f(m_k) + H m_k \\ 0 \end{bmatrix}$ (12)
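For small, dense problems, one linearized update (equations 11 and 12) can be written down directly. The Julia sketch below assembles and solves the system in the standard cokriging form reconstructed above; it assumes H, Q, R, and X fit in memory, which is exactly the assumption that breaks down at scale.

```julia
using LinearAlgebra

# One linearized step of the geostatistical iteration (equations 11-12),
# written densely. H: n×m Jacobian, Q: m×m prior covariance, R: n×n noise
# covariance, X: m×p drift matrix, d: data, f_mk = f(m_k), mk = m_k.
function cokriging_step(H, Q, R, X, d, f_mk, mk)
    n, p = size(H, 1), size(X, 2)
    A = [H * Q * H' + R   H * X;
         (H * X)'         zeros(p, p)]     # system matrix of equation (12)
    b = [d - f_mk + H * mk; zeros(p)]      # right-hand side
    sol = A \ b                            # dense direct solve: only feasible for modest n
    xi, beta = sol[1:n], sol[n+1:end]
    return X * beta + Q * H' * xi          # update of equation (11)
end
```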

2.3 Computational Approaches for Solving Geostatistical Inverse Modeling

The most computationally and memory-intensive parts of solving the geostatistical inverse model in equation 12 are the construction of the Jacobian matrix, H, and the matrix products involving the Jacobian, particularly $H Q H^T$ in equation 12. Various techniques have been employed to address these issues. In Kitanidis and Lee [2014] and Lee and Kitanidis [2014], the principal component geostatistical approach (PCGA), a seminal computational method for solving the geostatistical inverse model, is proposed and developed. To bypass the expensive explicit construction of the Jacobian matrix, a finite difference scheme is used to approximate a generic Jacobian-vector multiplication $Hx$, i.e.:
$Hx \approx \frac{f(\bar{m} + \delta x) - f(\bar{m})}{\delta}$ (13)
where x is a vector with the dimension of the model parameters and δ is the finite difference interval. Furthermore, a low-rank approximation of the covariance matrix Q is used:
$Q \approx Q_k = Z Z^T = \sum_{i=1}^{k} \zeta_i \zeta_i^T$ (14)
where $Q_k$ is the rank-k approximation of the covariance matrix Q, Z is the square root of $Q_k$ obtained using eigendecomposition, and $\zeta_i$ is the ith column vector of Z. Based on equations 13 and 14, the expensive matrix-matrix operations HQ and $H Q H^T$ can be reformulated as matrix-vector operations:
$HQ \approx \sum_{i=1}^{k} (H \zeta_i) \, \zeta_i^T$ (15)
$H Q H^T \approx \sum_{i=1}^{k} (H \zeta_i) (H \zeta_i)^T$ (16)
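The following Julia sketch expresses the matrix-free building blocks of equations 13-16, assuming a forward model `f`, a current solution `m0`, and a precomputed square-root factor `Z` of the low-rank prior covariance; all names are illustrative.

```julia
# Equation (13): approximate the Jacobian-vector product H*x by a finite
# difference of the forward model around the current solution m0.
Hmul(f, m0, x, delta) = (f(m0 + delta * x) - f(m0)) / delta

# Equations (15)-(16): assemble H*Q and H*Q*H' from the columns of Z,
# where Q ≈ Z*Z' (equation 14), using only Jacobian-vector products.
function lowrank_products(f, m0, Z, delta)
    HZ = reduce(hcat, (Hmul(f, m0, Z[:, i], delta) for i in 1:size(Z, 2)))
    return HZ * Z', HZ * HZ'    # H*Q and H*Q*H', respectively
end
```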

Another computational technique to reduce the cost of matrix products with the Jacobian matrix is to use a hierarchical representation of the covariance matrix [Saibaba and Kitanidis, 2012]. The hierarchical representation of a matrix is accomplished by splitting the given matrix into a hierarchy of rectangular blocks and approximating each of the blocks by a low-rank matrix [Saibaba and Kitanidis, 2012; Bebendorf, 2008; Borm et al., 2003].

With the Jacobian matrix obtained approximately, two main categories of numerical methods have been developed to solve the linear system of equations in equation 12. One is based on direct solvers [Lee and Kitanidis, 2014; Kitanidis and Lee, 2014] and the other is based on iterative solvers [Liu et al., 2014; Saibaba and Kitanidis, 2012; Nowak and Cirpka, 2004]. Direct solvers are mostly used when the size of the problem ranges from small to medium scale and the system matrix in equation 12 can therefore be explicitly constructed [Lee and Kitanidis, 2014; Kitanidis and Lee, 2014]. As pointed out in Lee and Kitanidis [2014], direct solvers are practical for dense linear systems up to moderate dimension. For large-scale problems, matrix-free representations can be used, and Krylov-subspace-based iterative solvers such as GMRES [Saad and Schultz, 1986] or MINRES [Paige and Saunders, 1975] are favored over direct methods to solve equation 12 [Liu et al., 2014; Saibaba and Kitanidis, 2012].

The use of direct or iterative solvers to solve equation 12 can be memory bound [Lee and Kitanidis, 2014; Kitanidis and Lee, 2014]. Such a limitation can significantly reduce the computational efficiency when a large number of measurements is available. In particular, it can be observed from equation 12 that the number of equations is of the same order as the number of measurements. In many subsurface applications, it is increasingly common to calibrate models using a very large number of observations (e.g., on the order of $10^7$ or more). Solving linear systems of equations at such a scale exceeds the computational and storage capacity of the techniques discussed above, regardless of the choice of direct or iterative solvers. As pointed out in Kitanidis and Lee [2014], the computational methodologies discussed so far work best for problems with a modest number of observations. Therefore, there is a need to develop computational methods that allow an efficient solution of equation 12 with a large number of measurements.

There have been some studies addressing this important need for data reduction to reduce computational and storage costs. In Lee et al. [2016], PCGA was extended to handle data-intensive inverse problems by constructing a fast preconditioner of the cokriging matrix, leading to accelerated iterative matrix inversion. Specifically, using notation similar to Lee et al. [2016], with $\Psi = H Q H^T + R$ and $\Phi = H X$, the exact inverse of the system matrix in equation 12 can be written as:
$\begin{bmatrix} \Psi & \Phi \\ \Phi^T & 0 \end{bmatrix}^{-1} = \begin{bmatrix} \Psi^{-1} - \Psi^{-1} \Phi \, \Sigma^{-1} \Phi^T \Psi^{-1} & \Psi^{-1} \Phi \, \Sigma^{-1} \\ \Sigma^{-1} \Phi^T \Psi^{-1} & -\Sigma^{-1} \end{bmatrix}, \qquad \Sigma = \Phi^T \Psi^{-1} \Phi$ (17)

By further employing the Sherman-Morrison-Woodbury formula and the Generalized Eigenvalue Decomposition (GED) [Golub and Van Loan, 1996], the dominant cost of applying $\Psi^{-1}$ can be significantly reduced by low-rank approximation, while the overall accuracy is well maintained. It has been pointed out in Lee et al. [2016] that GED can be efficiently implemented using either a sequential Lanczos-based method or a parallelized randomized SVD method. Lee et al. [2016] concluded that this computational technique can be used either as a direct solver or as a preconditioner for the iterative solution of equation 12. In the numerical examples therein, the authors estimated the hydraulic conductivity field of a laboratory-scale sandbox directly from 6 million MRI-scanned tracer concentration observations within a reasonable time.
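The core identity behind this acceleration is the Sherman-Morrison-Woodbury formula. The Julia sketch below illustrates the identity generically for a covariance of the form R + U D Uᵀ with R cheap to invert; it is an illustration of the identity only, not a reproduction of the implementation in Lee et al. [2016], and all names are assumptions.

```julia
using LinearAlgebra

# Sherman-Morrison-Woodbury: (R + U*D*U')⁻¹ b
#   = R⁻¹b − R⁻¹U (D⁻¹ + U'R⁻¹U)⁻¹ U'R⁻¹b.
# R is assumed diagonal (cheap to invert); U*D*U' plays the role of a
# rank-k approximation of H*Q*H'.
function smw_solve(R::Diagonal, U::Matrix, D::Matrix, b::Vector)
    Rinv_b = R \ b
    Rinv_U = R \ U
    C = inv(D) + U' * Rinv_U               # small k×k capacitance matrix
    return Rinv_b - Rinv_U * (C \ (U' * Rinv_b))
end
```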

Another popular computational method to reduce the data size and computational cost is based on extraction of temporal moments from large data sets. Researchers have applied such a technique to various data sets such as transient pressure [Yin and Illman, 2009; Zhu and Yeh, 2006] and concentration breakthrough curves [Nowak and Cirpka, 2006; Cirpka and Kitanidis, 2000]. Temporal moment based data reduction methods have been shown to be very efficient in reducing the data. Their major drawback, however, is that the system response must be integrable (except when using truncated temporal moments), so this approach cannot be applied to dynamic systems with fluctuating drivers.

In the next section, we describe our approach to reduce the dimensionality of the data while maintaining the accuracy of the inverse results based on randomization theory. We will demonstrate that our method has no restrictions with respect to the mathematical properties of the data.

3 Randomized Geostatistical Inverse Modeling

3.1 Randomized Geostatistical Approach

We develop a new randomized geostatistical inversion method to reduce the data dimensionality while maintaining the accuracy of the inversion result. The basic idea of this approach is to construct a sketching matrix S; replace the data d with $Sd$, the forward model $f(\cdot)$ with $S f(\cdot)$, and the additive noise $\epsilon$ with $S \epsilon$; and then use the PCGA method for inversion. By multiplying all of these vectors by S, we reduce the dimensionality (S has many columns, but not that many rows). At a high level, multiplying by the sketching matrix solves the problems associated with a high-dimensional observation space, and the use of the PCGA method solves the problems associated with a high-dimensional parameter space. By combining these methods, we solve both problems. Additionally, if a PCGA implementation is available, the randomized geostatistical approach is extremely easy to implement in high-level languages such as Julia, MATLAB, and Python (our Julia implementation consists of three lines of code).
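Because RGA only transforms the inputs handed to PCGA, an implementation on top of an existing PCGA routine can be very short. The Julia sketch below shows the shape of such a wrapper; the `pcga` function and its signature are hypothetical stand-ins for whatever PCGA implementation is available (the actual API in GeostatInversion.jl may differ).

```julia
# Hedged sketch of RGA as a thin wrapper around an existing PCGA routine.
# `f`, `d`, `X`, `Q`, `kred`, and the `pcga` signature are all assumptions.
S = randn(kred, length(d)) / sqrt(kred)   # Gaussian sketching matrix (equation 25)
sf(m) = S * f(m)                          # reduced forward model S*f(m)
m_est = pcga(sf, S * d, X, Q)             # run PCGA on the sketched problem
```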

The misfit function of the randomized geostatistical inversion is given by
$E(m) = \frac{1}{2} \, (S d - S f(m))^T \tilde{R}^{-1} (S d - S f(m)) + \frac{1}{2} \, (m - X\beta)^T Q^{-1} (m - X\beta)$ (18)
where $S \in \mathbb{R}^{k_{red} \times n}$ is the sketching matrix and $k_{red} \ll n$ is a tunable reduced dimension. The sketching matrix is also referred to as a Johnson-Lindenstrauss Transform [Kane and Nelson, 2014; Woodruff, 2014; Mahoney, 2011; Dasgupta et al., 2010; Clarkson and Woodruff, 2009; Sarlos, 2006]. With the new misfit function defined in equation 18 and following a similar derivation as in the previous section, the following randomized linear system of equations is obtained:
$\begin{bmatrix} S H Q (S H)^T + \tilde{R} & S H X \\ (S H X)^T & 0 \end{bmatrix} \begin{bmatrix} \xi_{k+1} \\ \beta_{k+1} \end{bmatrix} = \begin{bmatrix} S \, (d - f(m_k) + H m_k) \\ 0 \end{bmatrix}$ (19)
At this point, we need to specify $\tilde{R}$. As discussed above, the sketched forward model can be formulated as:
$S d = S f(m) + S \epsilon$ (20)
we can therefore derive the data error covariance matrix $\tilde{R}$ in equation 19 as:
$\tilde{R} = S R S^T$ (21)
With the randomized linear system of equations given in equation 19 and the covariance matrix in equation 21, the corresponding solution iterate, analogous to equation 11, is
$m_{k+1} = X \beta_{k+1} + Q \, (S H)^T \xi_{k+1}$ (22)
Correspondingly, the posterior covariance matrix can be derived similarly to Kitanidis and Lee [2014]:
$V = Q - \begin{bmatrix} Q (S H)^T & X \end{bmatrix} \Lambda^{-1} \begin{bmatrix} (S H) Q \\ X^T \end{bmatrix}$ (23)
where
$\Lambda = \begin{bmatrix} S H Q (S H)^T + \tilde{R} & S H X \\ (S H X)^T & 0 \end{bmatrix}$ (24)

3.2 Selection of the Sketching Matrix

Random projection is one class of methods for low-rank matrix approximation [Yang et al., 2016; Mahoney, 2011]. The idea of random projection is based on the Johnson-Lindenstrauss Lemma [Johnson and Lindenstrauss, 1984]. In particular, Johnson and Lindenstrauss [1984] pointed out that random projection yields the property of subspace embedding. Johnson and Lindenstrauss [1984] further provided a strategy to generate the random projection matrix. With the significant increase of data volumes, recent years have witnessed an explosion of research on so-called randomized numerical linear algebra algorithms, which use the power of randomization in order to perform standard matrix computations [Yang et al., 2016; Drineas and Mahoney, 2016; Iyer et al., 2016; Mahoney, 2011; Avron et al., 2010].

3.2.1 Subspace Embedding and Johnson-Lindenstrauss Lemma

Subspace embedding is at the core of all randomization-based methods. It is a property of a random projection, built upon the definition of column space provided in Appendix A. The randomized projection matrix (or sketching matrix) S is critical to reducing the data dimensionality while preserving the solution accuracy. The role of the sketching matrix can be seen as preconditioning the input data to spread out, or uniformize, the information contained in the data [Drineas and Mahoney, 2016]. With an appropriately selected sketching matrix, the solution to equation 18 yields a highly accurate approximation to the original problem in equation 3.

In Johnson and Lindenstrauss [1984], theoretical work is provided (through the proof of the Johnson-Lindenstrauss Lemma) to demonstrate the existence of a projection matrix (sketching matrix) that allows subspace embedding. Johnson and Lindenstrauss [1984] described the subspace embedding and further proved that a specially constructed sketching matrix S exists that, with high (asymptotic) probability, projects N points in a high-dimensional space to a much lower dimension without losing essential information. We provide a visualization of the Johnson-Lindenstrauss bounds to better illustrate the relation between the number of observations and the value of $k_{red}$ with respect to the distortion rate ε, which is defined in definition A.2 of Appendix A. Specifically, Figure 1a shows a plot of the minimum $k_{red}$ versus the number of observations, n, for different distortion rates ε. Figure 1b shows a plot of the minimum $k_{red}$ versus the distortion rate ε for different numbers of observations. From Figure 1a, we see that the larger the number of observations, the larger the value of $k_{red}$ required to preserve a given distortion rate. Similarly, in Figure 1b, with the number of observations fixed, the larger the value of $k_{red}$, the smaller the distortion rate becomes.

Figure 1. Johnson-Lindenstrauss bounds: (a) minimum $k_{red}$ versus the number of observations, n, for different distortion rates ε; (b) minimum $k_{red}$ versus the distortion rate ε for different numbers of observations, n. Clearly, an increase in n requires an increase in $k_{red}$ to preserve a given distortion rate ε.

3.2.2 Construction and Selection of Sketching Matrix

Practically, various methods have been proposed to construct the sketching matrix, S [Drineas and Mahoney, 2016; Mahoney, 2011]. The most important criteria for construction and selection of the sketching matrix are based on computational complexity, the ability to apply it to arbitrary data, and the quality of data reduction.

Sketching matrices can be categorized into two types: those based on projection and those based on sampling [Drineas and Mahoney, 2016; Mahoney, 2011]. Sampling-based sketching matrices are easy to implement. However, these methods are data dependent and therefore may not provide robust reduction performance. Projection-based sketching matrices can be applied to arbitrary data. Two of the most widely used sketching matrices are obtained using Gaussian projection or a randomized Hadamard transform. The Gaussian projection sketching matrix can be represented by independent identically distributed (i.i.d.) Gaussian random variables, i.e., matrix values drawn from the standard Gaussian distribution [Drineas and Mahoney, 2016]. The randomized Hadamard transform sketching matrix is represented by a product of two matrices: a random diagonal matrix with +1 or −1 on each diagonal entry, each with probability 1/2, and the Hadamard-Walsh matrix [Ailon and Chazelle, 2010]. There are other construction methods for sketching matrices designed for specific cases. In Avron et al. [2010], the sketching matrix is constructed as the product of a random diagonal matrix and the discrete cosine transform. In Iyer et al. [2016], a specific sketching matrix is designed for solving large-scale and sparse systems. In this work, we employ a randomization matrix scheme similar to the one used in Le et al. [2017], considering its simplicity of implementation, its independence from the data, and its stronger conditioning properties relative to other sketching matrices [Drineas and Mahoney, 2016]. Hence, the Gaussian random projection matrix $S \in \mathbb{R}^{k_{red} \times n}$ can be represented by
$S = \frac{1}{\sqrt{k_{red}}} \, G$ (25)
where G is sampled i.i.d. from $\mathcal{N}(0, 1)$.
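In Julia, equation 25 amounts to a single call to `randn` with the appropriate scaling; the dimensions below match the convergence test in section 5.1 and are illustrative.

```julia
kred, n = 256, 16_000            # reduced dimension and number of observations (example values)
S = randn(kred, n) / sqrt(kred)  # equation (25): i.i.d. N(0,1) entries scaled by 1/sqrt(kred)
```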

3.3 Randomized Geostatistical Inversion Algorithm

To summarize our new randomized geostatistical inversion algorithm, we provide a detailed description of the algorithm below.

Algorithm 1. Randomized Geostatistical Approach (RGA)

Require: observations d, forward model f, covariance matrices Q and R, drift matrix X, and reduced dimension $k_{red}$;

Ensure: inverted model estimate $\tilde{m}$

 1: Initialize the iterate $m_0$;

 2: Initialize the iteration counter Iter = 0 and the stopping parameters;

 3: Generate the sketching matrix S according to section 3.2;

 4: Obtain the data-reduced problem according to equation 20;

 5: Update the data covariance matrix $\tilde{R}$ according to equation 21;

 6: while {Iter ≤ IterMAX} do

 7:  Solve the reduced linear system of equations in equation 19;

 8:  Update the iterate according to equation 22;

 9:  if {Stopping criteria are satisfied} then

10:    $\tilde{m} = m_{k+1}$;

11:   Return with current iterate $\tilde{m}$;

12:  end if

13: end while

Both direct linear solvers and iterative solvers can be utilized to solve the reduced linear system of equations in equation 19. Because, in most cases, the reduced linear system yields relatively small system matrices, we use a direct solver.
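Step 7 of Algorithm 1 with a direct solver reduces, in Julia, to a dense factorization of the small reduced system; `A_red` and `b_red` stand for the system matrix and right-hand side of equation 19, assembled elsewhere, and are assumed names.

```julia
using LinearAlgebra

# Direct solution of the reduced system (equation 19). Because the system is
# only (kred + p) × (kred + p), a dense QR factorization is cheap.
F = qr(A_red)      # O((kred + p)^3) factorization
sol = F \ b_red    # O((kred + p)^2) triangular solve
```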

4 Computational and Memory Cost Analysis

To better understand the cost of our new randomized geostatistical inversion algorithm, we provide both the computational and memory cost analyses of our method. We assume that the number of model parameters is m and the number of observations is n; hence the Jacobian matrix is $H \in \mathbb{R}^{n \times m}$ and the covariance matrix is $Q \in \mathbb{R}^{m \times m}$. We denote the rank of the sketching matrix by $k_{red}$, and the drift matrix is $X \in \mathbb{R}^{m \times p}$. As a reference method, we select PCGA, which was developed in Kitanidis and Lee [2014] and Lee and Kitanidis [2014].

4.1 Computational Cost

Considering that most of the numerical operations in Algorithm 1 involve only matrix and vector operations, we use floating point operations (FLOPs) and big-$\mathcal{O}$ notation to quantify the computational cost [Golub and Van Loan, 1996]. In numerical linear algebra, basic linear algebra subprograms (BLAS) are categorized into three levels. Level-1 operations involve an amount of data and arithmetic that is linear in the dimension of the operation; those involving a quadratic amount of data and a quadratic amount of work are Level-2 operations [Golub and Van Loan, 1996]. Following this notation, and given a vector of length n and a matrix of size n × n, vector dot-products, additions, and subtractions are examples of BLAS Level-1 operations (BLAS 1): they involve $\mathcal{O}(n)$ data and $\mathcal{O}(n)$ arithmetic operations. Matrix-vector multiplication is a BLAS Level-2 operation; it involves $\mathcal{O}(n^2)$ data and $\mathcal{O}(n^2)$ arithmetic operations. Matrix-matrix multiplication is a BLAS Level-3 operation; it involves $\mathcal{O}(n^2)$ data and $\mathcal{O}(n^3)$ arithmetic operations.

First, we provide the computational cost of the PCGA method, followed by that of the RGA method. PCGA employs a matrix-free iterative approach (equations 13-16) for solving the cokriging system in equation 12. The total computational cost is [Lee and Kitanidis, 2014]:
$C_{\mathrm{PCGA}} = \mathcal{O}\big( \tau \, n \, (k + p) \big)$ (26)
where τ is the iteration number and k is the rank of the approximated covariance matrix $Q_k$ in equation 14. Because of the randomization technique used in the RGA method, the size of the system matrix in equation 19 is significantly reduced. Therefore, a direct linear solver such as QR factorization is feasible for solving the linear system of equations in equation 19 [Golub and Van Loan, 1996]. Using QR factorization to solve equation 19, the computational cost of the RGA method is given by:
$C_{\mathrm{RGA}} = \mathcal{O}\big( (k_{red} + p)^3 \big) + \mathcal{O}\big( (k_{red} + p)^2 \big)$ (27)
where the first term corresponds to the cost of the QR factorization, and the second term is the cost to form the right-hand side and to perform the back substitution. As an alternative, we can also employ the matrix-free iterative approach to solve equation 19, as done in the PCGA method. The resulting computational cost is then:
$C_{\mathrm{RGA}} = \mathcal{O}\big( \tau \, k_{red} \, (k + p) \big)$ (28)

Comparing equations 27 and 28 with equation 26, and noting that $k_{red} \ll n$, we observe that the RGA method is more efficient. However, it should be noted that this analysis only explores the computational cost of the linear algebra associated with performing an iteration of the inverse analysis. The overall computational cost should also include the cost of solving the forward model repeatedly. When PCGA is used and n is sufficiently large, the computational cost associated with the linear algebra operations dominates. By reducing the cost of the linear algebra operations, RGA results in a situation where the computational cost of repeatedly solving the forward model is the dominant cost in the inverse analysis.

4.2 Memory Cost

Both the RGA and PCGA methods rely on dense matrix storage. Hence, the major memory cost is that used to store the matrices. Of these, the largest matrices that must be stored are Z and HZ in equations 14 and 16 for the PCGA method, or the matrices in equation 19 for our method. The dimensions of Z and HZ are $m \times k$ and $n \times k$, respectively. Hence, the total memory cost of the PCGA method is:
$M_{\mathrm{PCGA}} = \mathcal{O}\big( (m + n) \, k \big)$ (29)
Similarly, we can calculate the dimension of the corresponding linear system of equations in equation 19 for our method. Provided with a rank-$k_{red}$ sketching matrix, the dimension of the resulting linear system is $(k_{red} + p) \times (k_{red} + p)$. Hence, the total memory cost, including both the sketching matrix and the reduced linear system, is:
$M_{\mathrm{RGA}} = \mathcal{O}( n \, k_{red} ) + \mathcal{O}\big( (m + k_{red}) \, k \big)$ (30)
Comparing equation 30 to equation 29, we see that when the sketching matrix is stored explicitly, the memory cost of our method is comparable to that of the PCGA method. Practically, however, the sketching matrix can be generated "on-the-fly": instead of constructing the sketching matrix explicitly, we can generate its elements implicitly, so the storage of the sketching matrix can largely be avoided, which results in a cost of
$M_{\mathrm{RGA}} = \mathcal{O}\big( (m + k_{red}) \, k \big)$ (31)

Despite the considerably lower memory cost, implementing our method on top of PCGA is straightforward.
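One way to realize the "on-the-fly" variant in Julia is to regenerate the columns of S from a fixed random seed whenever a product with S is needed, so that S itself is never stored. The sketch below is a minimal illustration of this idea under those assumptions.

```julia
using Random

# Apply S (equation 25) to a vector without storing S: the columns of S are
# regenerated from a fixed seed, so repeated calls see the same matrix while
# the memory footprint stays O(kred).
function apply_sketch(v::AbstractVector, kred::Integer; seed::Integer = 2017)
    rng = MersenneTwister(seed)
    out = zeros(kred)
    for vj in v                          # stream over the columns of S
        out .+= randn(rng, kred) .* vj   # accumulate vj * (j-th column of S)
    end
    return out ./ sqrt(kred)
end
```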

5 Numerical Results

In this section, we provide numerical examples to demonstrate the efficiency of our new randomized geostatistical inversion algorithm. A synthetic model study using transient groundwater flow is developed, where the "observed" hydraulic heads are taken from a solution of the groundwater equation using a reference transmissivity field with the addition of noise. To make the comparison comprehensive, we provide four sets of tests. In section 5.1, we provide a convergence test of our method. In section 5.2, we report the performance of our method as a function of the number of rows, $k_{red}$, in the sketching matrix. In section 5.3, we test the robustness of our method with respect to the randomness of the sketching matrix. In section 5.4, we test our method on inverse problems with an increasing number of measurements, up to $10^7$. An important parameter in the PCGA method is the number of principal components (rank), k; we use the same fixed rank in all tests with the PCGA method.

We select Julia as our programming tool because of its efficiency and simplicity. Julia is a high-level programming language designed for scientific computing [Bezanson et al., 2014]. The Julia code for our RGA algorithm is available as part of the open-source release of the Julia version of MADS (Model Analysis and Decision Support) at http://mads.lanl.gov [Vesselinov et al., 2015]. The QR factorization and fundamental BLAS operations are implemented using the system routines provided by the Julia packages. As for the computing environment, we run the first three sets of tests on a computer with 40 Intel Xeon E5-2650 cores running at 2.3 GHz and 64 GB of memory, and the last set of tests on a higher-memory machine with 64 AMD Opteron 6376 cores running at 2.3 GHz and 256 GB of memory.

The stopping criterion is an important issue for any iterative method, including ours. We use the following two stopping criteria:
$\frac{\| m_{k+1} - m_k \|_2}{\| m_k \|_2} \leq \varepsilon_{tol}$ (32)
and
$\mathrm{Iter} \geq \mathrm{IterMAX}$ (33)
where $\varepsilon_{tol}$ is a small convergence tolerance and Iter is the iteration count. IterMAX = 50 is the maximum number of iterations. If either equation 32 or equation 33 is satisfied, the iteration procedure is stopped and convergence is declared.
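A compact Julia form of the stopping test, assuming the relative-change criterion reconstructed in equation 32 (the tolerance value below is an illustrative assumption, not the value used in the paper):

```julia
using LinearAlgebra

# Stopping test combining equations (32) and (33); eps_tol is illustrative.
converged(m_new, m_old, iter; eps_tol = 1e-3, itermax = 50) =
    norm(m_new - m_old) / norm(m_old) <= eps_tol || iter >= itermax
```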

5.1 Test of the Convergence

In our first numerical example, we test the convergence of our new method. The reference model is solved on a 2-D grid containing 100 × 100 head nodes and a total of 20,200 model parameters (100 × 101 log-transmissivities along the x axis and 101 × 100 log-transmissivities along the y axis). Table 1 describes the model setup in more detail. We generate a ground truth, which is shown in Figure 3a. We utilize the variance ($\sigma^2$) and an exponent (β, related to the fractal dimension of the field and the power law of the field's spectrum) to characterize the heterogeneity of the considered fields [Peitgen and Saupe, 1988]; in this example, both are fixed (Table 1). The total number of measurements generated in this test is 16,000, which come from running the transient simulation to simulate pumping tests at each well (a total of four tests) and acquiring data at all four locations (four sets of data for each test). In each test, 1000 hydraulic head observations are recorded at each well.

Table 1. Settings Used for the RGA Calibration
Parameter                                Value(s)          Notes
Observation noise                        N(0, R)           Additive Gaussian (equation 2)
Observations per well per pumping test   1000
Number of observation/pumping wells      4-10              See Figures 3 and 9 for locations
Prior covariance for T                   Q                 Exponential model
True heterogeneity of log T              sigma^2, beta     Fractal model

We illustrate one of the randomization matrices in Figure 2. The dimension of the randomization matrix is $k_{red} \times n$. The elements of the randomization matrix follow equation 25, a scaled normal distribution with mean 0 and standard deviation 1. Because of the width limitation of the page, we show only the first 1000 columns of the randomization matrix. We set the color scale of Figure 2 to a narrow range around zero to enhance the visualization of the randomized matrix.

Figure 2. Illustration of the randomization matrix used in the presented analyses, with dimension $k_{red} \times n$. The elements of the randomization matrix follow equation (25), a scaled normal distribution with mean 0 and standard deviation 1. Because of the width limitation of the page, only the first 1000 columns of the randomization matrix are shown.

Figure 3b illustrates the result obtained using the PCGA method, and Figure 3c shows the result for our method. Compared to the true model in Figure 3a, our method obtains a good result, capturing both the high and low log-transmissivity regions. Visually, our method yields a result comparable to the one obtained using the PCGA method in Figure 3b.

Figure 3. (a) Synthetic log-transmissivity field with variance $\sigma^2$ and power exponent β. Hydraulic conductivity and hydraulic head observation locations are indicated with circles. The results of the inverse modeling solved by (b) PCGA and (c) our RGA algorithm are visually similar to each other. The RME values of the results in Figures 3b and 3c are 0.28 and 0.33, respectively. Hence, our RGA method yields a result comparable to that obtained using the PCGA method.

To further quantify the error of the two inverse modeling methods, we calculate both the relative model error (RME) and the relative data error (RDE) of the inversion results:
$\mathrm{RME} = \frac{\| m - m_{ref} \|_2}{\sqrt{N_m} \, \sigma_m}$ (34)
$\mathrm{RDE} = \frac{\| d - d_{obs} \|_2}{\sqrt{N_d} \, \sigma_d}$ (35)
where m is the inverted transmissivity field, $m_{ref}$ is the reference transmissivity field, $N_m$ is the size of the model, $\sigma_m$ is the prior standard deviation of the field, $N_d$ is the size of the data, $\sigma_d$ is the standard deviation of the additive noise, d are the simulated data, and $d_{obs}$ are the observations used for inversion.
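Under the normalized forms reconstructed in equations 34 and 35, the two error measures are one-liners in Julia:

```julia
using LinearAlgebra

# Equations (34)-(35): sigma_m is the prior standard deviation of the field,
# sigma_d the standard deviation of the additive observation noise.
rme(m, m_ref, sigma_m) = norm(m - m_ref) / (sqrt(length(m)) * sigma_m)
rde(d, d_obs, sigma_d) = norm(d - d_obs) / (sqrt(length(d)) * sigma_d)
```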

We provide a plot of the rate of convergence of the PCGA method and our RGA method in Figure 4. We observe that both our method and PCGA yield a very similar rate of convergence as a function of the number of iteration steps. At each iteration, the methods yield similar relative data error and relative model error values. After convergence, the RME values of our RGA method and the PCGA method are 0.33 and 0.28, respectively. Therefore, together with the inversion result in Figure 3, this demonstrates that our RGA method yields an accuracy comparable to the PCGA method in a situation where both methods can be applied. We note, however, that one of the main benefits of the RGA method is that it can be applied in situations with a very large number of observations while still yielding accurate results and efficient performance. In this example, it took RGA only about 1300 s to converge, spending 1210 s on forward modeling and only 0.03 s on inversion.

Figure 4. Convergence of the PCGA (black) and our RGA (blue) algorithms in terms of iteration steps. The rates of convergence of the two methods are very close to each other. However, the computational times of the two methods to reach convergence are very different: in this case, PCGA took about 32,000 s to converge, while RGA convergence took only 1020 s. The RGA speed-up is about 31 times.

5.2 Test on the Rank of the Sketching Matrix

The rank of the random sketching matrix, $k_{red}$, is critical to the accuracy and efficiency of our RGA method. In this section, we test our algorithm using sketching matrices with different rank values. The values of $k_{red}$ used in the problem are 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, and 8192.

In Figure 5, we provide the RME and RDE values as functions of $k_{red}$. We notice that the larger $k_{red}$ becomes, the smaller the error of the inversion. In addition, there is a significant decrease in the RME values with increasing $k_{red}$ at low values of $k_{red}$, which means that the inversion results are improving. In particular, the inversion results are completely off when $k_{red}$ is 4. When $k_{red} \geq 256$, the RME starts to level off while the RDE continues to decrease: even though the data misfit becomes smaller with increasing $k_{red}$, hardly any useful information is introduced into the results.

Figure 5. (a) RME and (b) RDE curves as defined in equations (34) and (35), respectively. For $k_{red}$ increasing from 4 to 256, there is a significant decrease in RME. For $k_{red} \geq 256$, the RME curve starts to level off while the RDE curve still decreases.

Figure 6 shows the corresponding wall-time cost for different values of $k_{red}$. It can be observed that the time is quite stable, around 500 s, until $k_{red} = 4096$, where the CPU time increases to about 550 s; when $k_{red} = 8192$, the CPU time cost is 2902 s. This can be explained by the fact that when $k_{red}$ is relatively small, the CPU time is mostly dominated by the forward modeling operations, while as $k_{red}$ increases, the linear solver for the system in equation 19 starts to dominate.

Figure 6. CPU time cost as a function of $k_{red}$. The CPU time is quite stable, around 500 s, for small values of $k_{red}$. The dependency of the time cost on $k_{red}$ can be explained by the fact that when $k_{red}$ is relatively small, the CPU time is mostly dominated by the forward modeling operation, while as $k_{red}$ increases, the linear solver for the solution of the system in equation (19) starts to dominate.

From this test, we conclude that the optimal selection of $k_{red}$ ranges from 256 to 1024, considering factors including model error, data misfit, and the corresponding time cost. In general, when choosing the value of $k_{red}$, one wants a value that is large enough to produce accurate results (i.e., large enough to be in the flat portion of Figure 5a) and small enough that the method remains computationally efficient (i.e., small enough to be in the flat portion of Figure 6).

5.3 Test on the Randomness of the Sketching Matrix

Because of the random nature of our method, the resulting inversion can fluctuate among different realizations of the sketching matrix. In this test, we provide the inversion results and the corresponding analysis using various sketching matrices. We use the same model setup as in section 5.1. We generate 20 different realizations of the sketching matrix with $k_{red} = 256$. Each is drawn from the Gaussian distribution with mean 0.0 and variance 1.0 according to equation 25. After convergence of each inversion, we calculate the relative data errors and relative model errors according to equations 34 and 35. The results in Figure 7 show that 19 out of 20 inversion runs converged. This shows that the random nature of the sketching matrix may lead to convergence failure in certain cases. Therefore, a safeguard is necessary to prevent unconverged inversion results. One option is to use cluster analysis on the scatter plot in Figure 7 in order to detect inversion results that did not properly converge. However, this option can be computationally expensive in that we need to postpone the decision until all the computation is completed. An alternative is to assess convergence directly during the inversion using the convergence of the iteration. Specifically, we provide a plot of RDE as a function of iteration number for the nonconverged inversion run in Figure 8. We observe that the RDE fluctuates between two values and does not decrease as expected. Hence, this type of convergence plot can be utilized as a safeguard during the computation to avoid sketching matrices that lead to nonconverged inversion results. From this test, we conclude that even though a small number of realizations of the sketching matrix may lead to inappropriate inversion results, in most cases our method yields accurate results.

Figure 7. Plot of relative data errors versus relative model errors according to equations (34) and (35) using 20 different realizations of sketching matrices. The sketching matrices leading to converged inversion results are shown in blue; the one leading to a diverged result is shown in red.

Figure 8. Convergence plot corresponding to the failure scenario indicated by the red scatter point in Figure 7. The convergence plot can be utilized as a safeguard during the computation to discard a realization of the sketching matrix that would lead to a diverged result.

5.4 Test on the Number of Observations

To better understand the scalability of our method, we test RGA on a set of inverse problems with an increasing number of observations, ranging from on the order of $10^5$ up to $10^7$. As before, the observations come from simulating a series of pumping tests and recording "observations" at a number of monitoring wells. For each observation well, there are 1000 observations for each pumping test. The increasing number of observations comes from increasing the number of pumping tests and the number of observation wells. For example, the case with 16 pumping tests and 16 observation wells produces 256,000 observations, while the case with 100 pumping tests and 100 observation wells produces $10^7$ observations. The reference transmissivity field is the same as the one in Figure 3a. The value of $k_{red}$ is again set to 256.

Through our analysis of memory cost in section 4.2, we observe that our RGA method and the PCGA method can be comparable if we construct the sketching matrix explicitly. However, our RGA method can be more memory efficient than the PCGA method when we generate the sketching matrix "on-the-fly." Using RGA, we are able to perform the inverse analysis with 10 million observations. We tested our RGA method on all the problem sizes mentioned above and provide the corresponding results in Figure 9. We notice that our RGA method yields reasonable inversion results even when the size of the data sets becomes massive. The RME values of the inversion results in Figures 9b-9d are 0.26, 0.23, and 0.25, respectively. The availability of more measurements may, in general, lead to a better inversion. However, in our tests, the data can be significantly redundant; adding more data without increasing $k_{red}$ may not result in an improved inverse model. This explains what we observe in Figure 9, where the quality of the inverse models is similar. Finally, as a comparison, the PCGA method fails in all three cases of Figures 9b-9d because of insufficient memory. This is due in part to our use of off-the-shelf matrix data structures and matrix-vector multiplication operators for the PCGA implementation and could be alleviated with custom matrix data structures and matrix-vector multiplication operators.

Figure 9. (a) The "true" field and (b-d) inversion results of our RGA method with increasing numbers of observations, up to $10^7$ in Figure 9d. Our RGA method yields reasonable inversion results even when the size of the data sets becomes massive. As a comparison, the PCGA method fails in all three cases of Figures 9b-9d because of insufficient memory.

We also provide the wall-time costs of our method for different numbers of observations in Figure 10, which shows both the wall time to perform the model calibration with RGA and the wall time to perform a single model run. Independent of problem size, the full model calibration takes about 28 times as long as a single model run, and this could be reduced further with more CPU cores. We also notice that the computational cost of RGA scales well with the number of observations. Through this test, we conclude that our method can more readily calibrate models with a large number of observations than the PCGA method when off-the-shelf matrices and matrix-vector operations are used.

Figure 10. Wall-clock times to perform the model calibration with our RGA method and to perform a single model run, shown for inverse analyses with numbers of observations ranging up to $10^7$. For all these analyses, which vary over roughly two orders of magnitude in data size, the full model calibration takes about 28 times as long as a single model run, and this could be reduced further with more CPU cores. The cost of an individual run increases because the amount of time that is simulated increases to account for a larger number of pumping tests.

6 Conclusion

We have developed a computationally efficient, scalable, and implementation-friendly randomized geostatistical inversion method, which is especially suitable for inverse modeling with a large number of observations. Our method, which we call the randomized geostatistical approach (RGA), is built upon the principal component geostatistical approach (PCGA). To overcome the issues of excessive memory and computational cost that arise when dealing with a large number of observations, we incorporated a randomized sketching matrix technique into PCGA. The randomization method can be seen as a data-reduction technique, because it generates a surrogate system that has a much lower dimension than the original problem.

Through our computational cost analysis, we show that this matrix-sketching technique reduces both the memory and computational costs significantly. Compared to the PCGA method, our RGA method yields a much smaller problem to solve when computing the next step in the iterative optimization process, thereby reducing both the memory and computational costs. We demonstrate through our numerical examples that the computational and memory costs of RGA scale with the information content of the observational data used in the inverse process, rather than directly with the size of the observational data. It is reasonable to conclude that the efficiency improvement becomes more significant as the size of the data set increases.

In summary, with an ever-increasing amount of data being assimilated into hydrogeologic models, there is a need for an inverse method that can handle a large number of observations. Our RGA method addresses this need. The contribution of our work is to incorporate a randomized numerical linear algebra technique into the PCGA method. Through both a computational cost analysis and numerical tests, we show theoretically and numerically that our RGA method is computationally efficient and capable of solving inverse problems with $10^7$ observations using modest computational resources (approximately 10 US dollars if state-of-the-art cloud services are employed). Therefore, it shows great potential for characterizing subsurface heterogeneity in problems with a large number of observations.

The RGA method is coded in Julia and implemented in the MADS open-source high-performance computational framework (http://mads.lanl.gov). However, the implementation of RGA is relatively simple, and can be easily added to any existing code. Finally, the randomization method is not limited to hydrogeologic problems and applications. Being a successful data/dimensionality reduction technique, randomization can be applied to a broad set of applications in many science and engineering domains.

Acknowledgments

Youzuo Lin, Daniel O'Malley, Ellen B. Le, and Velimir V. Vesselinov were supported by Los Alamos National Laboratory Environmental Programs Projects. In addition, Daniel O'Malley was supported by a Los Alamos National Laboratory (LANL) Director's Postdoctoral Fellowship, and Velimir V. Vesselinov was supported by the DiaMonD project (An Integrated Multifaceted Approach to Mathematics at the Interfaces of Data, Models, and Decisions, U.S. Department of Energy Office of Science, grant 11145687). We thank Jonghyun Lee, Wolfgang Nowak, and the Associate Editor (Sander Huisman) for their valuable comments that helped improve our manuscript. The code that generates all of our synthetic data is available at https://github.com/madsjulia/GeostatInversion.jl.

Appendix A: Definitions of Column Space and Subspace Embedding

A.1. Definition: Column Space

Consider a matrix $A \in \mathbb{R}^{n \times d}$. Notice that as x ranges over all vectors in $\mathbb{R}^d$, the product $Ax$ ranges over all linear combinations of the columns of A and therefore defines a d-dimensional subspace of $\mathbb{R}^n$ (when A has full column rank), which we refer to as the column space of A and denote by $\mathrm{col}(A)$.

With the column space defined in definition A.1, the definition of subspace embedding can be provided as:

A.2. Definition: Subspace Embedding

A matrix $S \in \mathbb{R}^{k_{red} \times n}$ provides a subspace embedding for $\mathrm{col}(A)$ if
$(1 - \varepsilon) \, \| A x \|_2^2 \leq \| S A x \|_2^2 \leq (1 + \varepsilon) \, \| A x \|_2^2 \quad \text{for all } x \in \mathbb{R}^d.$
Such an S provides a low-distortion embedding with distortion rate ε and is called a subspace embedding.