Ground penetrating radar (GPR) is used to image the shallow subsurface in both earth and planetary exploration. Electromagnetic (EM) velocity (permittivity) models are inverted from GPR data to enable accurate migration. While conventional velocity analysis methods are designed for multioffset GPR data, velocity analysis for zero-offset GPR has, to our knowledge, been underexplored. Inspired by recent deep learning seismic impedance inversion, we propose a deep learning guided technique, GPRNet, based on convolutional neural networks, that directly learns the intrinsic relationship between GPR data and EM velocity. GPRNet takes GPR data as input and outputs the corresponding EM velocity. We simulate numerous GPR data sets from a range of pseudo-random velocity models and feed them into GPRNet for training. Each training sample comprises a pair of one-dimensional GPR data and EM velocity. During the training phase, the neural network's weights are updated iteratively until convergence. This process is analogous to full-waveform inversion, in which the best model is found by iterative optimization until the simulated data match the observed data. We test GPRNet on synthetic testing data sets and find the predicted velocity models to be accurate. A case study is presented in which the method is applied to GPR data collected at the former Wurtsmith Air Force Base in Michigan. The inversion results agree with velocity models established by previous GPR inversion studies of a similar area. We expect the GPRNet open-source software to be useful in imaging the subsurface for earth and planetary exploration.
We propose a deep learning-based electromagnetic velocity inversion for GPR zero-offset data
Tests on synthetic examples show accurate velocity inversion results
Applications to field data yield predictions that agree with the velocity models derived from previous physics-based inversion studies
Plain Language Summary
Ground penetrating radar (GPR) is often used to reveal structures beneath the surface. To image the subsurface, velocity models are created from GPR data through a process called inversion. However, inversion is a challenging task for many common GPR systems because they use only a single antenna; conventionally, velocity inversion is best achieved using data from multiple antennas. We propose and develop a data-driven GPR velocity inversion technique based on deep learning. Deep learning can find hidden relationships between inputs and outputs by performing sophisticated approximations over large, interconnected webs of mathematical functions. In the same spirit, we let the deep learning process "learn" the relationship between GPR data (input) and velocity (output). Using the learned knowledge, we can predict velocity from GPR data. We tested our method on synthetic examples, which yielded accurate predictions. We also performed a field case study, and the predicted velocities agree well with results established in previous studies.
Ground penetrating radar (GPR) is a robust, noninvasive geophysical method for imaging electrical discontinuities in the shallow subsurface. GPR works by sending an electromagnetic (EM) pulse into the subsurface; reflected waves are recorded by antennas at the surface. Researchers often use GPR to study shallow subsurface strata and to identify anomalous underground structures in many near-surface applications (Neal, 2004), for example, characterization of permafrost soil layers, detection of shallow engineering objects, and mapping of hydrologic flow paths. Recent studies imaging the Martian regolith (Soldovieri et al., 2009) and the lunar shallow structure (Xiao et al., 2015; Zhang et al., 2015) show promise for future planetary exploration applications. Geoscientists employ GPR in the field not only to reveal subsurface structures but also to process the collected data to recover the subsurface EM velocity v = 1/√(με) = c/√(εr), where μ is the magnetic permeability (henries per meter, H/m), ε is the dielectric permittivity (farads per meter, F/m), εr = ε/ε0 is the relative dielectric permittivity (dimensionless), and c is the speed of light (299,792,458 m/s); the second equality holds for nonmagnetic materials (μ = μ0). EM velocity is important for several uses, including depth conversion, GPR data processing, migration, and understanding the characteristics, composition, and complexity of the subsurface lithology. To obtain GPR data in the depth domain, accurate EM velocities are needed to convert the two-way travel time to depth. An accurately migrated image reveals important information such as point scatterers, faults, and geological depressions that are otherwise difficult to spot in the raw data. In practice, many GPR migration workflows rely on a single EM velocity value instead of a realistic heterogeneous velocity model (Xiao et al., 2015).
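The velocity-permittivity relation above can be written as a small helper function (a sketch; the constant and function names are ours, and μr defaults to 1 for nonmagnetic materials):

```python
import math

# Speed of light in vacuum, converted to m/ns (299,792,458 m/s).
C_M_PER_NS = 299_792_458 * 1e-9

def em_velocity(eps_r, mu_r=1.0):
    """EM velocity v = c / sqrt(mu_r * eps_r), in m/ns.

    For nonmagnetic materials (mu_r = 1), this reduces to v = c / sqrt(eps_r).
    """
    return C_M_PER_NS / math.sqrt(mu_r * eps_r)
```

For example, dry sand with εr ≈ 4 gives v ≈ 0.15 m/ns, a commonly quoted value.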
To derive accurate EM velocity from GPR data, conventional velocity analysis often requires multichannel GPR data. Because a multichannel GPR system records multiple source-receiver offsets, seismic velocity analysis (Yilmaz, 2001) can be adapted for GPR velocity analysis. However, multichannel GPR systems are bulky due to their large dimensions, which limits their deployment in remote locations and challenging terrains; the bulkiness of the multichannel antennas also makes indoor surveys impractical. Nowadays, most commercial and industry-grade GPR systems are equipped with a single-channel receiver. Single-channel GPR systems, which utilize only one antenna, take up very little space and are highly portable. As a result, many users prefer single-channel systems, especially when GPR surveys must be performed in secluded areas and places with rough terrain. Single-channel GPR systems also take less time to set up than multichannel systems. Since a single-channel GPR system has only one antenna, it collects a single data trace at each location. Similarly, in common-offset GPR acquisition, each source corresponds to a single antenna. The collected data are often referred to as zero-offset because the distance between source and antenna is very small compared to the wave penetration depth.
In general, there is no well-established method to recover velocity information from zero-offset data. Forte et al. (2014) explored an analytical methodology that extracts velocity values from common-offset GPR data based on reflection and transmission coefficients. However, their method assumes that each subsurface layer is homogeneous, isotropic, nonmagnetic, nonconductive, and nondispersive, and it disregards intrinsic attenuation and scattering effects. These assumptions make the methodology unrepresentative of the average Earth's shallow subsurface. The study also indicated that dipping reflectors and the thin layers that exist in the real subsurface pose a challenge to accurate velocity inversion. In another study (Babcock & Bradford, 2015), zero-offset GPR inversion was shown to work on targeted reflection events of interest, such as reflections from thin layers, for example, oil-saltwater layers. Their inversion takes into account prior information (e.g., velocity analysis, conductivity probes) to initialize the inversion for quicker convergence; with inaccurate or faulty prior information, the waveform inversion will struggle to converge and hence yield inaccurate results. These factors call for a common-offset GPR velocity analysis method that works universally for GPR inversion while still allowing user-defined parameters to be included in the inversion process.
In recent years, there has been a surge in data-driven methods such as deep learning across many scientific fields, with notable success especially in computer vision and natural language processing. Deep learning has proven to be especially powerful and accurate in extracting intrinsic relationships between data variables that are otherwise highly challenging to formulate analytically. The convolutional neural network (CNN), a type of deep learning model, is commonly used to capture the nonlinear intrinsic relationship between input data and output labels. Fundamentally, the input data are passed through multiple hidden layers, which are essentially mathematical functions (e.g., the sigmoid function) that compress and extract high-level information, commonly known as features. These features are then mapped onto the output, that is, the corresponding labels for the data. The network is trained on the input data and output labels using gradient-based methods such as backpropagation, which minimize the loss function with respect to the weights of the network until convergence. The network trained in this supervised mode can then be applied to new data sets to predict the output. CNNs have seen explosive success due to their ability to perform image classification with incredible accuracy; for example, in the ImageNet Large Scale Visual Recognition Challenge (Russakovsky et al., 2015), CNN-based networks reduced the error rate from 25.8% in 2011 to 3.0% in 2016.
In the geoscience community, deep learning has seen notable applications such as mapping mineral prospectivity (Xiong et al., 2018), automated seismic interpretation (Waldeland et al., 2018), seismic facies classification (Wrona et al., 2018), automated fault detection (Araya-Polo et al., 2017), and seismic traveltime picking (Guo et al., 2020; Zhu & Beroza, 2018). In geophysical inversion, deep learning has also spurred great interest in seismic velocity inversion. For instance, Araya-Polo et al. (2018) used semblance extracted from raw seismic data to predict velocity models. Wu and Lin (2020), Yang and Ma (2019), and Li et al. (2020) used CNN-based architectures to predict two-dimensional (2D) velocity models directly from raw seismic waveforms, favored by the multiple-offset, multiple-fold seismic acquisition of oil and gas exploration. Most recently, Das et al. (2019) proposed a robust method that uses a shallow layered CNN to recover the impedance profile from each trace in a poststack seismic section. Inspired by this, we explore the possibility of using deep learning guided techniques to recover EM velocity directly from raw zero-offset GPR data.
The study is organized as follows: First, we introduce the GPRNet neural network model; second, we present our method of generating the training data set and prediction on synthetic samples; third, we perform a field case study (at the former Wurtsmith Air Force Base, Michigan) to map the groundwater layer and contamination zone; fourth, we test the boundaries of GPRNet's performance. Lastly, we end the study with a discussion and conclusion of the features, limitations, and potential of GPRNet.
In this section, we first describe the neural network model (GPRNet) used in the inversion. Next, we present a workflow of the inversion that includes data set generation, training, validation, and prediction on synthetic examples. All our neural network training is performed on an NVIDIA GeForce RTX 2080 Ti graphics card.
We design our neural network model based on DeepLabV3, owing to its ability to outperform state-of-the-art image segmentation models on the PASCAL VOC 2012 benchmark (Chen et al., 2017). DeepLabV3 is an image segmentation model in which the input and output are spatially aligned, meaning that at every grid point the input and output are related by some intrinsic relationship. Image segmentation models excel at learning the hidden relationship between the input and output when both are in the same domain. Hence, it is fitting to adopt DeepLabV3 in our neural network because our input (GPR data) and output (EM velocity) share the same time depth domain.
Our neural network model is shown in Figure 1. The neural network works essentially as an encoder-decoder model: the encoder extracts intrinsic features from the input (GPR data), and the decoder deciphers the encoded input to produce an output (EM velocity). At every epoch, the GPR data input is encoded into basic building blocks of features, and the decoder rebuilds the features to match the corresponding EM velocity output. Over many epochs, the relationship between GPR data and EM velocity is trained by backpropagation, which minimizes the loss function until convergence. In our deep learning model, Conv layers are convolution layers, which extract features by performing one-dimensional convolution over a sliding window on the input data. Pool layers are 1D pooling layers, which reduce the dimension of the input data while retaining important information. The Py layers are based on Atrous Spatial Pyramid Pooling (Chen et al., 2017), in which different Py layers have different convolutional dilation rates. Merge layers combine (concatenate) previous layers into a single layer. The MgConv layer is a convolution layer that takes the Merge1 layer as input. Deconv layers are also 1D convolutional layers, and each successive Deconv layer has a decreasing number of filters. Up layers perform 1D upsampling on their input layers. UpMgConv is an upsampling layer that takes MgConv as input. Table 1 lists all the layers' output shapes and other configurations. Note that we use ReLU (Nair & Hinton, 2010) as the activation and set padding to "same" at every layer. For optimization, we use the Adam algorithm (Kingma & Ba, 2014).
| Layer | Output shape | Layer details |
| --- | --- | --- |
| Conv1 + Pool1 + Dropout | | kernel size, filters, 1 stride, 2 pool size |
| Conv2 + Pool2 + Dropout | | kernel size, filters, 1 stride, 2 pool size |
| Conv3 + Pool3 + Dropout | | kernel size, 4 filters, 1 stride, 2 pool size |
| Conv4 + Pool4 + Dropout | | kernel size, 8 filters, 1 stride, 2 pool size |
| Py1 + Dropout | | kernel size, 16 filters, 1 stride, 1 dilation rate |
| Py2 + Dropout | | kernel size, 16 filters, 1 stride, 6 dilation rate |
| Py3 + Dropout | | kernel size, 16 filters, 1 stride, 12 dilation rate |
| Py4 + Dropout | | kernel size, 16 filters, 1 stride, 18 dilation rate |
| Merge1 + Dropout | | Concatenates Pool4, Py1, Py2, Py3, and Py4 |
| MgConv + Dropout | | kernel size, 16 filters, 1 stride |
| Deconv1 + Up1 + Dropout | | kernel size, 16 filters, 1 stride, 2 up size |
| Deconv2 + Up2 + Dropout | | kernel size, 8 filters, 1 stride, 2 up size |
| UpMgConv + Merge2 + Dropout | | Concatenates UpMgConv and Up2 |
| Deconv3 + Up3 + Dropout | | kernel size, 4 filters, 1 stride, 2 up size |
| Deconv4 + Up4 + Dropout | | kernel size, 2 filters, 1 stride, 2 up size |
| Deconv5 + Gaussian dropout | | kernel size, 1 filter, 1 stride, 2 up size |
- Note. For our synthetic scenario, L = 1,280, n = 16, and k = 20. For field scenario, L = 400, n = 16, and k = 10. During training, dropout layers are set to dropout rate of 0. This ensures all trainable features are being used during training.
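The dilation rates of the Py layers (1, 6, 12, 18) control how far apart a kernel samples the input trace, widening the receptive field without adding parameters. A minimal pure-Python sketch of a "same"-padded dilated 1D convolution (our own illustration, not GPRNet's implementation) is:

```python
def dilated_conv1d(x, kernel, dilation=1):
    """'Same'-padded 1D convolution with a dilation rate, as used by the
    Py (Atrous Spatial Pyramid Pooling) layers. Pure-Python sketch."""
    k = len(kernel)
    # Effective span of a dilated kernel: dilation * (k - 1) extra samples.
    span = dilation * (k - 1)
    pad = span // 2
    padded = [0.0] * pad + list(x) + [0.0] * (span - pad)
    out = []
    for i in range(len(x)):
        # Sample the padded trace every `dilation` steps under the kernel.
        s = sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        out.append(s)
    return out
```

With dilation = 1 this is an ordinary convolution; larger rates let the deeper Py layers aggregate context over wider portions of the GPR trace at the same cost.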
2.2 EM Velocity Inversion Workflow
1. Generate the training data set by creating random EM velocity profiles and simulating their corresponding GPR data.
2. Train the neural network; the input is the zero-offset GPR waveform data and the output is the corresponding 1D EM velocity profile.
3. Apply the trained neural network to GPR data to obtain EM velocity predictions.
2.2.1 Data Set Generation, Training, and Validation
The resulting EM velocity is in time depth and is used as the output in the neural network training. From the EM velocities, we compute the corresponding relative dielectric permittivities (εr = (c/v)²) for GPR data simulation (Figure 2). We use a 2D finite difference time domain (FDTD) solver (Irving & Knight, 2006) to perform the forward simulation; using a zero-offset source-receiver setup, we generate a 1D GPR trace from each 1D EM velocity profile. This 1D data generation could also be done via 1D convolutional modeling, but we use 2D FDTD modeling to maintain the same forward solver that is used later in the 2D model testing case. The computational cost of the forward simulation is rather low with the help of parallel computing; for instance, it takes less than 1 h to simulate 10,000 GPR traces using parallel processing with 12 CPU cores (Xeon E5-2687W). After forward simulation, we remove the direct waves and apply max normalization to each GPR trace before training. We use a standard Blackman-Harris wavelet as our source wavelet. Max normalization scales each 1D GPR trace by its maximum value. In summary, the generated GPR traces (GPRNet's input) and their corresponding EM velocities (GPRNet's output) are fed into GPRNet for training.
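The two preprocessing steps can be sketched as follows (a sketch; the function names are ours, the mute length n_mute is a user-chosen sample count not specified above, and we normalize by the maximum absolute amplitude):

```python
def mute_direct_wave(trace, n_mute):
    """Zero the first n_mute samples, where the direct (air/ground)
    wave dominates, before training. n_mute is an assumed parameter."""
    return [0.0] * n_mute + list(trace[n_mute:])

def max_normalize(trace):
    """Scale a 1D GPR trace by its maximum absolute amplitude."""
    peak = max(abs(v) for v in trace)
    return [v / peak for v in trace] if peak > 0 else list(trace)
```

Normalizing each trace independently keeps the network's input amplitudes comparable across traces with different reflectivity strengths.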
We generate 10,000 training data sets, of which 1% (100) is reserved for validation and another 1% (100) for testing. Each data set consists of a pair of GPR trace and EM velocity. We use mean squared error for the training and validation losses and the R2 value as the accuracy benchmark. The R2 value provides a statistical measure of how well a predicted model fits the ground truth: an R2 value of 0 means the predicted model does not match the ground truth at all, while an R2 value of 1.0 means the predicted model completely matches the ground truth. In our training, the array length L of our data set is 1,280. We set the kernel size k to 10 and the number of filters n to 8. We train in mini-batches with a batch size of 40. The neural network is set to save its weights only when the validation loss decreases (beyond a very small threshold) after an epoch. The training converges at epoch 101 (Figure 3). The validation accuracy during training has an R2 value of 0.96. The training process takes ∼2 h.
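The R2 benchmark above is the standard coefficient of determination; a minimal implementation (function name ours) is:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: R2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares between prediction and ground truth.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the ground-truth mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot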
2.2.2 Prediction: Synthetic Examples
We apply the trained model to predict EM velocity from the GPR testing data pool. The results are generally accurate (Figure 4); the R2 value over the entire testing data set averages 0.95, meaning the predicted EM velocities are statistically similar to the ground truth. The neural network is able to accurately generalize the relationship between the input and output data (Text S2 shows more information about thin-layer predictions). Since the model presents compelling evidence that zero-offset EM velocity prediction works well, we proceed to apply the trained model to 2D common-offset GPR data.
We design a 2D velocity model (Figure 5a) such that the EM velocity values correspond to the real EM velocities of geologic materials such as dry sand, wet sand, sandstone, shale, and wet shale. We first generate full common shot gathers via 2D FDTD on the 2D velocity model (Figure 5a). Next, the 2D zero-offset GPR data (Figure 5b) are generated by horizontally stacking the traces that are closest to the corresponding sources. The sources and receivers are placed every 5 mm on the surface, and the total horizontal distance of the model is 12.55 m. We apply the neural network trained on our previous synthetic 1D data set to the 2D common-offset GPR data; the prediction is shown in Figure 5c. We notice that some artifacts are introduced, but the general geological structure, such as the shape of each individual layer, is similar to that of the original model. The predicted velocity artifacts (vertical strips in Figure 5c) introduce scattering effects and diffractions, as is evident in the forward simulated data (Figure 5d). In the forward data, the primary reflections from the major layer interfaces (e.g., the boundary between l2 and l3) remain. We use the Structural Similarity Index Metric (SSIM) (Wang et al., 2004) to evaluate our 2D prediction results: an SSIM of 0.0 between two images means they are completely dissimilar, and an SSIM of 1.0 means they are identical. The SSIM of the predicted 2D EM velocity is 0.92.
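SSIM compares the local means, variances, and covariance of two images. Wang et al. (2004) compute it over sliding windows; the simplified single-window (global) sketch below, with the standard constants K1 = 0.01 and K2 = 0.03, illustrates the index (function name ours):

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equal-length arrays (flattened
    images). A simplified global sketch of the Wang et al. (2004) index."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Variances and covariance of the two images.
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Stabilizing constants from the standard K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical images score exactly 1.0; the windowed version used in practice averages this quantity over local patches.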
From the 2D EM velocity prediction, we select a few 1D profiles and their corresponding ground truths to compare their accuracy (Figure 6). The R2 scores for the velocity profiles at 2, 6, and 12 m are 0.97, 0.93, and 0.98, respectively. The predictions are considered highly accurate even though the network was trained only on 1D zero-offset data.
3 Field Case Study
The promising 2D synthetic testing results encourage us to examine GPRNet's applicability to field GPR inversion. We use field GPR data collected at the now-decommissioned Wurtsmith Air Force Base by John Bradford and his team in 2002 (Bradford, 2003). The site is a former military training facility where large quantities of fuel were burned on open ground during weekly training exercises over a period of a few decades (Bradford, 2003). Over the years of military usage, large quantities of hydrocarbons seeped into the underlying aquifer, causing light nonaqueous phase liquid (LNAPL) contamination around the water table. We want to investigate whether our method is able to detect the contamination, reveal any new subsurface features (if possible), and provide a realistic velocity model that describes the shallow subsurface of the area.
The GPR data were collected through multifold acquisition: 208 shots with a 0.61 m (2 ft) source interval and a 0.3 m (1 ft) receiver interval. The common-offset data are extracted by combining the GPR traces from the receivers that are closest to each source. The source-receiver distance (0.91 m [3 ft]) is relatively small compared to the depth (15–18 m) that we are interested in imaging; hence, we treat the common-offset data as zero-offset data.
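The nearest-receiver extraction described above can be sketched as follows (a sketch; function and variable names are ours, and one gather of traces per shot is assumed):

```python
def extract_common_offset(shot_gathers, receiver_positions, source_positions):
    """Build a pseudo-zero-offset section by keeping, for each shot, the
    trace from the receiver nearest to that shot's source position.

    shot_gathers: one list of traces per shot, ordered like receiver_positions.
    """
    section = []
    for gather, src in zip(shot_gathers, source_positions):
        # Index of the receiver closest to this shot's source.
        nearest = min(range(len(receiver_positions)),
                      key=lambda i: abs(receiver_positions[i] - src))
        section.append(gather[nearest])
    return section
```

Stacking the returned traces side by side yields the 2D common-offset section used for prediction.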
3.1 Training Data Set for Field Prediction
To encapsulate a wider range of EM velocity variation, we generate our training data set with EM velocities ranging from 0.042 to 0.211 m/ns, corresponding to relative dielectric permittivities from 2 to 50. We also generate random EM velocity profiles with between 4 and 21 layers. In total, we generate 50,000 unique EM velocity profiles and use the FDTD method to simulate the corresponding GPR data. Because field data often contain high levels of noise, and because we use data from only one receiver (due to the common-offset GPR acquisition), we augment our data set with added noise. We also account for horizontal velocity variation in the real subsurface by adding a random time gain. Our data augmentation is as follows: first, the original 50,000 unique data sets; second, random Gaussian noise (0.15–0.85σ, where σ is the standard deviation of the individual GPR trace) is added to the unique traces; third, a random time gain (a function of the sampling rate dt) is added to the unique traces; fourth, the random time gain is added to the traces with random noise. The source wavelet used is a Ricker wavelet because it is close to that of the GPR acquisition system used for the study (Bradford, 2003). In total, we generate 1,250,000 data sets, of which 1% (12,500) is set apart for validation. The training took ∼10 h; the training progress is smooth and reaches a final validation R2 accuracy of 0.97.
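The noise augmentation can be sketched as follows. The noise scaling (0.15–0.85 of the trace's own σ) follows the description above; the time-gain function is only illustrative, since the exact gain formula is not reproduced here:

```python
import random

def add_scaled_noise(trace, rng, lo=0.15, hi=0.85):
    """Add Gaussian noise whose standard deviation is a random fraction
    (0.15-0.85) of the trace's own standard deviation."""
    n = len(trace)
    mean = sum(trace) / n
    sigma = (sum((v - mean) ** 2 for v in trace) / n) ** 0.5
    scale = rng.uniform(lo, hi) * sigma
    return [v + rng.gauss(0.0, scale) for v in trace]

def apply_time_gain(trace, dt, alpha):
    """Illustrative linear time gain g(t) = 1 + alpha * t. The study's
    exact gain formula (a function of the sampling rate dt) differs;
    this is an assumed stand-in for demonstration only."""
    return [v * (1.0 + alpha * i * dt) for i, v in enumerate(trace)]
```

Applying noise only, gain only, and both to each unique trace yields the four augmentation groups described above.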
3.2 Field Data Inversion
We process the field data before applying the trained neural network. First, we remove the direct arrival, apply a bandpass filter (30–200 MHz), add a time gain, and finally apply max normalization to the 2D field GPR data. We choose the 30–200 MHz band because it best preserves the important signal without introducing low- and/or high-frequency noise; more details about the selection of the bandpass range can be found in Text S1. The processed data are shown in Figure 7b.
After prediction, we stack the 1D predicted velocity profiles together to produce a 2D field EM velocity model (Figure 8a), which we smooth (Figure 8b) for better display. From the stacked velocity model, we perform a zero-offset forward simulation to produce forward data (Figure 8c). At the water table (marked in Figure 8b), the forward data from the prediction show reflectors that match the processed field data (Figure 8d). This reflector is the underground water table of the area (Bradford & Wu, 2007). The known LNAPL contaminant plume (marked in Figure 8b) is found between distances of 35–80 m and time depths of 160–320 ns (Bradford & Wu, 2007). The contaminant plume is known to greatly attenuate the GPR signal (Bradford, 2003; Bradford & Wu, 2007), so we infer that the large attenuation is associated with the contaminant's low velocities. Our prediction model shows that the contaminant has low velocities, averaging ∼0.054 m/ns. In addition, the velocity of the vadose zone (the region of dry and low residual water saturation) above the water table was known to be around 0.14 m/ns (Bradford, 2003); this agrees with our prediction model, in which the region above the water table has velocities of around 0.10–0.15 m/ns.
We qualitatively compare our velocity model in spatial depth to the velocity model produced by Bradford and Wu (2007), who performed a multioffset GPR survey at a site 33 m (100 ft) north of the GPR line used for our prediction. Since the two survey sites are near each other, we assume that the water table depth does not vary by much. To further validate our prediction model, we convert it back to spatial depth (Figure 9). Our prediction model in spatial depth shows that the water table is at around 3.5–6 m, largely consistent with the flat water table depth (5 m) calculated by Bradford and Wu (2007). We note that our water table shows a high-velocity layer, whereas Bradford and Wu (2007) showed a thinner transition layer of intermediate velocities. We cannot verify the exact EM velocity of the water table, but we can show that the predicted forward data (Figure 8c) have positive and negative peaks that match the field data (Figure 8d). Furthermore, we select three individual forward data traces to compare with their field data counterparts (Figure 10); the water table waveforms (around 80 ns) for the prediction and the field data closely overlap. The velocity features highlighted in Figure 8a produce the corresponding forward data features (e.g., scatterers) in Figure 8c. Comparing Figures 8c and 8d, we see that features such as diffraction patterns caused by scatterers are present in the field data and similarly represented in the forward data. In addition, the shallow surface reflectors at around 40 ns (highlighted in Figures 8c and 8b) have matching data structures in both the forward and field data. These matching data features give us confidence in our predicted model's velocity features.
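The time-to-depth conversion follows directly from the two-way travel time: each time sample of length dt traversed at interval velocity v adds v·dt/2 of depth. A minimal sketch (function name ours, assuming a 1D interval-velocity profile in two-way time):

```python
def time_to_depth(velocity_t, dt):
    """Convert a 1D interval-velocity profile in two-way time (m/ns,
    sampled every dt ns) to cumulative depth in meters.

    Each two-way time sample dt corresponds to v * dt / 2 of depth."""
    depths = []
    z = 0.0
    for v in velocity_t:
        z += v * dt / 2.0
        depths.append(z)
    return depths
```

For example, a constant 0.1 m/ns profile places a reflector at 100 ns two-way time at 5 m depth.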
As for the deeper regions (e.g., beyond 200 ns in Figure 10), the data matching does not show many similarities, so we are less confident in those predicted velocities. Nevertheless, we demonstrate excellent data matching in the shallow subsurface layers, especially at the water table, which allows us to determine the depth of the water table with high accuracy and confidence.
3.3 Testing the Boundaries of GPRNet
A common tenet in deep learning is that the more data are available for training, the better the neural network can generalize a nonlinear problem. Here we provide empirical tests to investigate how GPRNet performs under limited data conditions. We perform two tests: (a) we completely remove the augmentation, so that the data set contains only the 50,000 unique data sets; and (b) we use only 5,000 data sets from the previous step. In test Case 1, the complete removal of augmented data sets stems from our rationale that augmentation should either be done at scale or not at all; this also highlights the effects of augmentation on the prediction. In test Case 2, we select 5,000 data sets, or 10% of test Case 1. Here, we assume that 5,000 data sets are less than ideal for the field inversion, so we expect the validation score to be lower. For both test cases, we set aside 1% of the total data set for validation and 1% for testing, and we train using the same GPRNet settings as before. In test Case 1, the validation score settles at an R2 of 0.96, with a similar score for the testing data set. In test Case 2, the validation score converges at an R2 of 0.93, again with a similar testing score. We apply both trained models to the field GPR data; the inverted EM velocities are slightly smoothed for better visualization and are displayed in Figure 11.
As expected, although the validation R2 scores indicate considerably good accuracies, the inverted EM velocity models for both test cases show inconsistencies. We notice a glaring irregularity at the later arrivals (360–400 ns), where both test cases show a horizontal band of significantly high velocities. The field data do not support this: there are no reflectors in the data sufficient to suggest the presence of such a band. This is very likely caused by the absence of data augmentation. In our augmentation, we added random noise and random time gain to the training GPR data; a time gain was also added to the field GPR data before inversion to compensate for signal attenuation. Without this augmentation, the neural network is less able to generalize when there is noise and signal attenuation in the data. As a result, in these test cases the neural networks "mistook" the later-arrival artifacts (e.g., noise and attenuation) for major reflectors, producing irregular velocities. Furthermore, the water table (80 ns) is not as apparent as in the prediction using all data sets (Figure 8b), which is also likely attributable to the lack of data augmentation. We learn that our data augmentation techniques allow the neural network to better recover the underlying EM velocity that is masked by noise and signal attenuation.
We note that the validation R2 score in test Case 2 decreased to 0.93, compared to 0.97 when the full data set is used. We expect the generalization ability of the neural network to continue to suffer if we train on fewer than 5,000 data sets. Although this is by no means a rigorous test to determine the smallest number of data sets for optimal performance, we think it provides a valuable reference for future research.
This study investigates the problem of estimating velocities from zero-offset GPR data using deep learning. Our 1D synthetic predictions are highly accurate. Although our training covers only a relatively small range of EM velocities due to limited computational resources, we demonstrate that 1D GPR EM velocity inversion is achievable using state-of-the-art deep learning methods. Furthermore, we show that the network trained on 1D data is able to accurately predict EM velocities from 2D GPR data (Figure 5b). For the field case study, the forward data (Figure 8c) simulated from our predictions match the original data (Figure 8d) very well. This is akin to full-waveform inversion, in which the data from the inverted model match the original data. Our results show that deep learning guided inversion is able to perform waveform inversion not only accurately but also quickly and efficiently.
Our approach focuses on generating a significantly large number of realizations of 1D velocity profiles. During prediction, our method inverts data trace by trace; when the independently inverted 1D velocity profiles are put together, a complex 2D velocity model (e.g., dipping layers, folding blocks, and overlapping stratigraphy) is formed. This alleviates some of the strong assumptions made by previous physics-based inversion studies (Forte et al., 2014), in which each subsurface layer is assumed to be homogeneous and isotropic. Furthermore, we are able to apply data augmentation such as adding random noise to the training data set as a proxy for noise picked up in a field acquisition environment. Our method also allows us to add a random time gain to both the training data and the target data, which compensates for the intrinsic attenuation in field EM surveys.
In real applications of geophysical subsurface inversion, prior knowledge, such as an initial velocity model provided by tomography or geology (e.g., knowledge of material composition), is pivotal to achieving accurate inversion. We can incorporate such prior information into our inversion process as well. For instance, we can control the range of random velocities. Constraining the randomness of the velocity generation is akin to including prior information (e.g., geological knowledge that provides rough velocity estimates of the subsurface layers) in the training data set. In addition, we can alter the range of the random number of layers when generating the 1D velocity profiles, which is again similar to having prior knowledge (e.g., a shallow well log) about the target area. Together, these options narrow the model space over which the inversion operates.
We acknowledge that our synthetic 2D EM velocity model (Figure 5a) is idealized and simplified and, more importantly, does not contain scatterers, which would otherwise produce diffractions in the GPR data. In reality, lateral velocity variations associated with faults, synclines, and anticlines produce many diffractions in GPR data. We assume that our 1D zero-offset prediction works predominantly on data that are free from these scattering effects, and we expect GPRNet to perform best on GPR data that contain only reflections. Reflections can be separated from the original data by a diffraction separation method (Merzlikin et al., 2019).
Our tests with data augmentation removed show that adding random noise and random time gain is essential when dealing with common-offset field GPR data. Although we do not know the minimum number of data sets needed to generate sufficiently accurate results, we have provided an empirical example that we hope will guide future research.
We believe future researchers can generate a comprehensive training library that includes a very wide range of EM velocities, different source wavelets, and varying electrical conductivities. This, in turn, could be developed into an all-inclusive, universal library of neural network weights for zero-offset GPR data inversion that performs with substantial accuracy. GPRNet primarily works by learning the intrinsic relationship between GPR data and EM velocity, both expressed as functions of time. For future research, it will be interesting to see whether deep learning can directly predict EM velocity in spatial depth from GPR data recorded in time.
EM velocity inversion from GPR data is achievable using a well-designed encoder-decoder CNN-based architecture. In our case, GPRNet works very well in predicting EM velocity from minimally processed zero-offset GPR data. The predictions on synthetic examples achieve greater than 90% accuracy for each individual testing data set. Using our trained 1D zero-offset library of weights, GPRNet accurately predicts (SSIM of 0.92) the 2D velocity model from synthetic common-offset GPR data. In our field case study, we recover a realistic (heterogeneous) velocity model of the shallow subsurface at the former Wurtsmith Air Force Base. Furthermore, we discover new features (e.g., scatterers) in the predicted velocity model that are confirmed by the presence of diffractions in the field data. The field data inversion works arguably well when we qualitatively compare our results to previous studies of the field area. Important features such as the vadose zone velocities, the water table location, and the contamination plume location are recovered in our inversion. We use data augmentation with random noise and random time gain to improve our prediction results. In short, our method shows the promising potential of deep learning-based 1D zero-offset inversion to predict velocity models from 2D GPR data. We anticipate that the open-source GPRNet will directly benefit near-surface exploration applications on Earth and other planets.
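For reference, the structural similarity (SSIM) index quoted above (Wang et al., 2004) can be illustrated with its simplified single-window form. Practical implementations apply a sliding Gaussian window over the image, so this sketch is for intuition only:

```python
# Simplified global (single-window) SSIM between two 2D velocity models,
# following Wang et al. (2004); windowed SSIM is used in practice.
import numpy as np

def global_ssim(x, y, data_range=None):
    """Single-window SSIM between two images."""
    x = x.astype(float); y = y.astype(float)
    if data_range is None:
        data_range = max(x.max() - x.min(), y.max() - y.min())
    c1 = (0.01 * data_range) ** 2      # stabilizing constants from the paper
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

a = np.tile(np.linspace(0.06, 0.15, 64), (64, 1))  # toy layered "true" model
print(round(global_ssim(a, a), 3))                 # identical images -> 1.0
```

An SSIM of 1.0 means the two models are identical; our reported value of 0.92 indicates close structural agreement between predicted and true 2D velocity models.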
Z. X. Leong was supported by the Petroleum Geosystem Initiative scholarship and Shell Energy Research Facilitation Award scholarship in the Department of Geosciences at The Pennsylvania State University. The field GPR data are provided by the Center for Geophysical Investigation of the Shallow Subsurface (CGISS) at Boise State University. The authors would like to thank Prof. J. Bradford for allowing us to use the data.
Conflict of Interest
The authors declare no conflicts of interest relevant to this study.
Supporting Information S1: 2020JB021047-sup-0001-Supporting Infomation SI-S01.pdf (540.3 KB)
- (2017). Automated fault detection without seismic processing. The Leading Edge, 36(3), 208–214. https://doi.org/10.1190/tle36030208.1
- (2018). Deep-learning tomography. The Leading Edge, 37(1), 58–66. https://doi.org/10.1190/tle37010058.1
- (2015). Reflection waveform inversion of ground-penetrating radar data for characterizing thin and ultrathin layers of nonaqueous phase liquid contaminants in stratified media. Geophysics, 80(2), H1–H11. https://doi.org/10.1190/geo2014-0037.1
- (2007). An introduction to ground penetrating radar (GPR). In G. S. Baker, & H. M. Jol (Eds.), Special Paper 432: Stratigraphic analyses using GPR (pp. 1–18). Geological Society of America. https://doi.org/10.1130/2007.2432(01
- (2003). GPR offset dependent reflectivity analysis for characterization of a high-conductivity LNAPL plume. In Symposium on the application of geophysics to engineering and environmental problems. https://doi.org/10.4133/1.2923166
- (2007). Instantaneous spectral analysis: Time-frequency mapping via wavelet matching with application to contaminated-site characterization by 3D GPR. The Leading Edge, 26(8), 1018–1023. https://doi.org/10.1190/1.2769559
- (2017). Rethinking atrous convolution for semantic image segmentation. Retrieved from http://arxiv.org/abs/1706.05587
- (2019). Convolutional neural network for seismic impedance inversion. Geophysics, 84(6), R869–R880. https://doi.org/10.1190/geo2018-0838.1
- (2014). Velocity analysis from common offset GPR data inversion: Theory and application to synthetic and real data. Geophysical Journal International, 197(3), 1471–1483. https://doi.org/10.1093/gji/ggu103
- (2020). AEnet: Automatic picking of P-wave first arrivals using deep learning. IEEE Transactions on Geoscience and Remote Sensing, 56, 1–11.
- (2006). Numerical modeling of ground-penetrating radar in 2-D using MATLAB. Computers & Geosciences, 32(9), 1247–1258. https://doi.org/10.1016/j.cageo.2005.11.006
- (2014). Adam: A method for stochastic optimization. Proceedings of International Conference on Learning Representations.
- (2020). GPRNet (v0.0) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.4049031
- (2020). Deep-learning inversion of seismic data. IEEE Transactions on Geoscience and Remote Sensing, 58(3), 2135–2149. https://doi.org/10.1109/tgrs.2019.2953473
- (2019). Least-squares path-summation diffraction imaging using sparsity constraints. Geophysics, 84(3), S187–S200. https://doi.org/10.1190/geo2018-0609.1
- (2010). Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (pp. 807–814).
- (2004). Ground-penetrating radar and its use in sedimentology: Principles, problems and progress. Earth-Science Reviews, 66(3–4), 261–330. https://doi.org/10.1016/j.earscirev.2004.01.004
- (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252. https://doi.org/10.1007/s11263-015-0816-y
- (2009). A preparatory study on subsurface exploration on Mars using GPR and microwave tomography. Planetary and Space Science, 57(8–9), 1076–1084. https://doi.org/10.1016/j.pss.2008.11.014
- (2009). An overview of full-waveform inversion in exploration geophysics. Geophysics, 74(6), WCC1–WCC26. https://doi.org/10.1190/1.3238367
- (2018). Convolutional neural networks for automated seismic interpretation. The Leading Edge, 37(7), 529–537. https://doi.org/10.1190/tle37070529.1
- (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612. https://doi.org/10.1109/tip.2003.819861
- (2018). Seismic facies analysis using machine learning. Geophysics, 83(5), O83–O95. https://doi.org/10.1190/geo2017-0595.1
- (2020). InversionNet: An efficient and accurate data-driven full waveform inversion. IEEE Transactions on Computational Imaging, 6, 419–433. https://doi.org/10.1109/tci.2019.2956866
- (2015). A young multilayered terrane of the northern Mare Imbrium revealed by Chang'E-3 mission. Science, 347(6227), 1226–1229. https://doi.org/10.1126/science.1259866
- (2018). Mapping mineral prospectivity through big data analytics and a deep learning algorithm. Ore Geology Reviews, 102, 811–817. https://doi.org/10.1016/j.oregeorev.2018.10.006
- (2019). Deep-learning inversion: A next-generation seismic velocity model building method. Geophysics, 84(4), R583–R599. https://doi.org/10.1190/geo2018-0249.1
- (2001). Seismic data analysis: Processing, inversion and interpretation of seismic data (2nd ed., p. 998). SEG.
- (2015). Volcanic history of the Imbrium basin: A close-up view from the lunar rover Yutu. Proceedings of the National Academy of Sciences of the United States of America, 112(17), 5342–5347. https://doi.org/10.1073/pnas.1503082112
- (2018). PhaseNet: A deep-neural-network-based seismic arrival time picking method. Geophysical Journal International, 216, 261–273.