Mapping the global depth to bedrock for land surface modeling
Abstract
Depth to bedrock serves as the lower boundary of land surface models, which controls hydrologic and biogeochemical processes. This paper presents a framework for global estimation of depth to bedrock (DTB). Observations were extracted from a global compilation of soil profile data (ca. 130,000 locations) and borehole data (ca. 1.6 million locations). Additional pseudo-observations generated by expert knowledge were added to fill in large sampling gaps. The model training points were then overlaid on a stack of 155 covariates including DEM-based hydrological and morphological derivatives, lithologic units, MODIS surface reflectance bands and vegetation indices derived from the MODIS land products. Global spatial prediction models were developed using random forest and Gradient Boosting Tree algorithms. The final predictions were generated at the spatial resolution of 250 m as an ensemble prediction of the two independently fitted models. The 10-fold cross-validation shows that the models explain 59% of the variation in absolute DTB and 34% of the variation in censored DTB (depths deeper than 200 cm are predicted as 200 cm). The model for occurrence of R horizon (bedrock) within 200 cm performs well, with an AUC of 0.87. Visual comparisons of predictions in the study areas where more detailed maps of depth to bedrock exist show that there is a general match with spatial patterns from similar local studies. Limitations of the data set and extrapolation in data-sparse areas should not be ignored in applications. To improve the accuracy of spatial prediction, more borehole drilling logs will need to be added to supplement the existing training points in under-represented areas.
Key Points
- Observations from soil and geological surveys are combined for developing global spatial prediction models of depth to bedrock
- Machine learning explains 59% of variation in spatial distribution of depth to bedrock for interpolation but much less for extrapolation
- The framework proposed can be used to gradually improve accuracy by adding more ground observations
1 Introduction
Bedrock is either exposed at the earth surface or buried under soil and regolith, sometimes over a thousand meters deep. Understanding the global pattern of underground boundaries such as groundwater and bedrock occurrence is of continuous interest to Earth and geosciences [Schenk and Jackson, 2005; Fan et al., 2013]. In land surface modeling, depth to bedrock (DTB) serves as the lower boundary, which affects the energy, water and carbon cycles. A constant DTB was assumed in most models due to the lack of data, but this can limit the performance of land surface models [Gochis et al., 2010]. Lawrence et al. [2008] found that deepening the soil column improves the simulated near-surface soil temperature for the permafrost area in the Community Land Model (CLM). Peterman et al. [2014] showed that a variable DTB affects simulated carbon and water in a dynamic vegetation model. Brunke et al. [2016] implemented a global DTB data set [Pelletier et al., 2016] in CLM4.5 and found that there were significant influences on water and energy simulations. Information on DTB is also important to other fields such as hydrology, ecology, soil science, geology, agriculture and civil engineering [Tromp-van Meerveld et al., 2007; Fu et al., 2011]. Bedrock restricts roots, animals and/or other biological activities and is a key soil property of interest for global soil mapping [Arrouays et al., 2014]. In geology, DTB helps geologists describe the natural history of a region and can be used as an input for modelling earthquake and landslide risks [McPherson, 2011]. Information on DTB is useful for mineral exploration [Wilford et al., 2016] and crop yield modelling [Calviño et al., 2003]. Civil engineers need information on DTB to build safe, stable buildings, roads, railways and bridges, and to locate water wells [Price, 2009].
Ground observations of DTB can be used as training data to produce spatial predictions of DTB for the whole area of interest. Various mapping methods, including physically based models, interpolation from point samples and empirical-statistical models [Kuriakose et al., 2009], have been used for this purpose. Pelletier and Rasmussen [2009] developed a numerical model to predict soil thickness in uplands from the balance between soil production and erosion, modeled using digital elevation model (DEM) data. Karlsson et al. [2014] developed a simplified regolith model to estimate regolith thickness in areas with a high fraction of outcrops, based on outcrops, slopes and the distance to outcrops in eight directions. Boer et al. [1996] showed that, for predicting soil depth in dry Mediterranean areas, the maximum likelihood classifier performed better in the shale and limestone areas than in the phyllite area. Shafique et al. [2009] predicted regolith thickness from landform, elevation and distance to stream. Tesfa et al. [2009] applied generalized additive models and random forest to predict soil depth from topographic and land cover attributes. Dahlke et al. [2009] used class means of merged spatial explanatory variables to extrapolate the soil depth measured at point locations. Wilford [2012] used airborne gamma-ray spectrometry and digital terrain analysis to derive a weathering intensity index, which can be used to estimate the appearance of outcrops.
Globally, there are several existing maps of depth to bedrock. One of the first estimates of global distribution of DTB (limited to the upper 2 m) was produced by FAO [1996]. Here global soil depth was mapped using expert rules, and primarily based on the soil unit's classification name, the soil phase and the slope class. Miller and White [1998] derived the DTB for the United States based on STATSGO (State Soil Geographic data). DTB in STATSGO2 is expressed as a shallowest depth of soil components that occupies less than 15% area of the map unit [USDA-NCSS, 2006]. Shangguan et al. [2013] estimated soil profile depth and seven basic horizon thicknesses based on soil classes of China. Hengl et al. [2014] further tried using zero-inflated models to estimate global DTB based on global compilation of soil profiles. All previous examples provide only information about DTB within 2 m. Wilford et al. [2016] produced a regolith depth map for the whole Australia at 3 arc-seconds resolution by using water well records and the R-Cubist package for model fitting and prediction [Kuhn et al., 2014]. Recently, Pelletier et al. [2016] developed a global data set of the average thicknesses of soil, intact regolith, and sedimentary deposits by representing upland areas by soil data and lowland by water well data, using topography, climate, and geology data as input.
The above-mentioned global estimates of DTB are available at coarse resolutions only (1 km or coarser) and/or are often of limited accuracy. In addition, soil, hydrological and geological exploration is often done in isolated domains: predictions based only on soil data, i.e., soil maps [e.g., FAO, 1996; Miller and White, 1998; Hengl et al., 2014], are often limited to the soil surface, with values limited to several meters. Likewise, maps based on boreholes from geological explorations are only available for some states in the USA and small regions, with values up to several hundred meters [see e.g., Richard et al., 2007; Illinois State Geological Survey, 2004; Witzke et al., 2010]. Combining soil profiles and boreholes in producing DTB maps is necessary to fill this gap and provide consistent estimates.
In this paper we describe a framework to estimate depth to bedrock at the spatial resolution of 250 m using state-of-the-art machine learning methods. As training points we use a compilation of publicly available soil profiles and borehole logs. As covariates, we use an extensive list of remote sensing based covariates including the most up-to-date lithologic map of the world, DEM-based hydrological and morphological derivatives and MODIS land products. Our main objective is to use a statistical framework to provide the best possible unbiased predictions of DTB. We develop this framework within the domain of automated soil mapping as part of the SoilGrids system [Hengl et al., 2014], in which spatial predictions can be gradually improved by adding new training data.
2 Materials and Methods
2.1 Basic Definitions
Although soil depth is commonly recorded during fieldwork, it can often mean different things to different groups. For example, “soil depth” from many soil databases cannot be considered equivalent to DTB. “Soil depth” is probably more comparable with common synonyms such as ploughing depth, rooting depth, etc. [Miller and White, 1998; Scholes and Colstoun, 2011; Tesfa et al., 2009]. In the Encyclopedia of Soil Science, bedrock is defined as: “a rock body underlying a soil and its parent material” [Chesworth, 2008], so that all rocks (whether soft or hard, consolidated or not) below the soil surface may be considered as bedrock. Weathered rocks or weakly consolidated rocks are sometimes also classified as R horizon or bedrock in WRB and Soil Taxonomy [FAO, 2014; Soil Survey Staff, 2014], although Cr horizon is most commonly used for such cases. Several terms, including “regolith thickness” [Karlsson et al., 2014], “overburden thickness” [Missouri Geological Survey, 2013] and “drift thickness” [Illinois State Geological Survey, 2004], are also used in the literature to describe the depth to bedrock. In contrast, the term “bedrock” is used relatively consistently in geological literature [Illinois State Geological Survey, 2004; Missouri Geological Survey, 2013; Karlsson et al., 2014; Jain, 2014], though differences also exist.
In this paper, bedrock is defined as “the consolidated solid rock underlying unconsolidated surface materials, such as soil or other regolith,” which is considered to be equivalent to the definition of the R horizon (hard rock) in soil science [Schoeneberger et al., 2011]. Depth to bedrock is accordingly defined as the “depth (in cm) from the ground surface to the contact with coherent (continuous) bedrock.”
As such, DTB is a skewed variable with many values grouped around 0 depth, while maximum values can range up to a few thousand meters. Exposed bedrock or bedrock visible at the surface is referred to as “rock outcrop,” i.e., DTB = 0 [Jain, 2014].

Schematic explanation of the depth to bedrock.
2.2 Observation and Measurement of DTB
DTB is observed and measured in two fields:
- In soil science, where bedrock is considered in soil profile description and soil classification [Soil Survey Staff, 2014; Juilleret et al., 2014] and is commonly labeled as the R layer or horizon. Although many countries have their own classification systems, bedrock, i.e., the R horizon, is often the least problematic variable for harmonization and translation from national to international systems.
- In geology, where bedrock is identified via excavation, borehole drilling and geophysical sensing. Borehole drilling includes water, oil or gas wells and holes drilled for other purposes such as geotechnical investigations, mineral exploration or temperature measurement. Geophysical methods of measuring DTB include refraction microtremor, electrical resistivity, ground penetrating radar, seismic refraction, seismic reflection and similar [Lowrie, 2007]. Although measurements from excavations and geophysical methods such as electrical resistivity are quite reliable [Yamakawa et al., 2012], they are still not appropriate for large areas due to the cost and time of sampling [Karlsson et al., 2014].
Of the two DTB data sources above, soil survey data contain measurements of depth to bedrock, i.e., depth to the R horizon in most cases [Schoeneberger et al., 2012]. Records from borehole drillings (primarily water wells) frequently contain more accurate DTB measurements and/or lithology observations than soil survey data; they will likely remain the major data source for DTB and are also the main focus of this paper. Soil profiles are usually shallower than 200 cm, so they often provide only censored observations. Borehole drilling logs, on the other hand, are usually very deep, sometimes reaching thousands of meters. The problems of the definition of DTB and its measurement are discussed in the Discussion section.
2.3 Training Data
We use three major data sources for the purpose of training global spatial predictions models for DTB (Figure 2):

Global distribution of depth to bedrock observations. (a) Red colors indicate soil profiles, (b) blue colors boreholes, and (c) the yellow colors pseudo observations, i.e., points inserted using expert knowledge.
- A global compilation of soil profiles data.
- A global compilation of borehole drilling logs.
- Pseudo-observations of DTB, i.e., simulated points containing values of target variables based on the remote sensing data (shifting sands and rock outcrops) and on published literature sources (observations without coordinates). For example, it is known from the literature that depth to bedrock in the Sahara is on average about 150 m [Dregne, 2011]. Rock outcrops (DTB = 0) are also highly correlated with the slope of local terrain: after a certain slope is reached, the chance of surface rock outcrops becomes high. Hence we use a global map of terrain slope to generate pseudo-points for very steep terrain (>40° slope).
From these data sources, we derived three target variables:
- the absolute DTB in cm,
- the censored DTB in cm within 0–200 cm (here values equal to 200 cm indicate “deep as or deeper than”), and
- the occurrence of R horizon (bedrock) within 0–200 cm expressed as 0–1 probability values.
The absolute DTB is only available for borehole drilling data and for soil profiles where the absolute DTB is within the observed depth. Censored DTB (within 0–200 cm) is, on the other hand, a heavily skewed variable (essentially a zero-inflated variable), with the majority of values (>90%) deeper than 200 cm. Censored DTB within 0–200 cm and occurrence of R horizon within 0–200 cm are available at all locations, hence these are the most complete variables.
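The relationship between the three target variables can be illustrated with a minimal Python sketch. It is not the authors' import code (which is in R); the function name, the treatment of a bedrock contact at exactly 200 cm, and the handling of profiles that end above 200 cm without reaching bedrock are assumptions made for illustration.

```python
from typing import Optional, Tuple

CENSOR_CM = 200  # depth of the 0-200 cm window used for the censored variables

def derive_targets(bedrock_depth_cm: Optional[float],
                   observed_depth_cm: float
                   ) -> Tuple[Optional[float], Optional[float], Optional[int]]:
    """Return (absolute DTB, censored DTB, R-horizon occurrence) for one site.

    bedrock_depth_cm: depth at which bedrock was reached, or None if the
        profile or borehole ended before reaching bedrock.
    observed_depth_cm: total depth described at the site.
    """
    if bedrock_depth_cm is not None:
        absolute = bedrock_depth_cm
        # Values of 200 cm mean "as deep as or deeper than 200 cm".
        censored = min(bedrock_depth_cm, CENSOR_CM)
        # Assumption: a contact at exactly 200 cm counts as "not within".
        occurrence = 1 if bedrock_depth_cm < CENSOR_CM else 0
        return absolute, censored, occurrence
    if observed_depth_cm >= CENSOR_CM:
        # Bedrock not reached within the window: censored observation only.
        return None, CENSOR_CM, 0
    # Assumption: a shallow profile without bedrock is uninformative.
    return None, None, None

# A soil profile hitting rock at 35 cm, and a 60 m borehole hitting rock at 45 m.
shallow = derive_targets(35, 120)
deep = derive_targets(4500, 6000)
```

In this sketch a deep borehole contributes a value to all three variables, whereas a 250 cm soil pit that never reaches rock contributes only to the censored DTB and the occurrence variable.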
2.3.1 Soil Profiles
We used the global compilation of soil profiles generated and maintained at ISRIC, which includes various national and regional soil profile databases [Hengl et al., 2014; Ribeiro et al., 2015]. In almost all cases, there were no direct records of DTB, and DTB was derived by identifying the R horizon (or based on coarse fragments) and then matching the observed depth for the given horizon. The systematic import of soil profiles resulted in a total of 132,193 points with observed or censored DTB (Figure 2). Note that the soil profiles have good spatial coverage, but in >80% of cases they are censored, i.e., for many points we only know that DTB is deeper than 200 cm, but we do not know the actual absolute DTB. All import steps have been documented via Github (R code, https://github.com/ISRICWorldSoil/SoilGrids250m).
2.3.2 Boreholes
We use 1,574,776 points with borehole logs from: the United States (661,441), Canada (580,063), Australia (5,943), Sweden (320,451), Ireland (4,250), Brazil (2,004), China (598) and Russia (26). The spatial distribution of boreholes is shown in Figure 2. Many states in the US have established digital water well databases over the last several decades. The databases include data from the Northern High Plains aquifer, South-Central Kansas, and 14 state databases, i.e., Alaska, Indiana, Iowa, Kentucky, Maine, Minnesota, Missouri, Nevada, New Hampshire, New York, Ohio, Pennsylvania, Tennessee and Vermont. The coordinates of the points from Alaska were derived from the Public Land Survey System, with a geo-location error ranging from ±50 m to ±800 m (still compatible with our target resolution of 250 m). For Canada, four provinces, i.e., British Columbia, Nova Scotia, Prince Edward Island and Quebec, have a water well database. The list of water wells from the United States and Canada is given in the supporting information. Boreholes of Russia are from Melnikov [1998].
For Australia, we derive DTB from the Australia National Groundwater Information System (ANGIS) (http://www.bom.gov.au/water/groundwater/). Each well contains multiple layers of construction, hydro-stratigraphy and lithology logs, which can be used to determine the location of the bedrock [Wilford et al., 2016]. Although the number of recorded points is >200,000, only 5,992 points from the total can be classified as DTB measurements with high enough certainty to be further used for building global spatial prediction models. The lookup tables used to convert original records in the ANGIS to values used for building global spatial prediction models are available in the supporting information. For Brazil and China, DTB was extracted from the lithology layer description by manual interpretation. The Brazil Groundwater Information System (SIAGAS, http://siagasweb.cprm.gov.br) contains 273,972 water wells and the Chinese National Database of Geological Drilling (http://zkinfo.cgsi.cn) contains 410,123 boreholes. Only a small fraction of these contain lithological data that could be used as training points; these points are distributed quite evenly across Brazil and China.
2.3.3 Pseudo or Expert-Based Observations
Pseudo-observations were generated using two approaches:
- based on the global mask maps of sand dune areas and steep bare surface areas (e.g., the Himalayas) generated using remote sensing and a slope map of the world, and
- based on detailed geological maps reporting rock outcrops.
We generated the global mask maps of sand dunes areas and steep bare surface areas using the global MODIS surface reflectance product (MCD43A4) and global DEM and slope maps based on the SRTM DEM [Rabus et al., 2003], both derived at 500 m. After some visual inspection, we discovered that the medium infrared band 7 from the MCD43A4 land product [Moody et al., 2005] can be used to detect areas of high surface reflectance (sand dunes and bare rock). For the shifting sand areas we randomly inserted 300 points (DTB=150 m; average depth of the sand in Sahara) and for the steep bare surface areas 200 points (DTB=0 m). Again, these points were carefully inserted only for the purpose of filling the possible gaps in the data. The resulting global mask maps used to generate pseudo-observations are shown in Figure 3.
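The first approach can be sketched in Python as follows. This is only an illustration: the authors' procedure is implemented in R, and the reflectance threshold `mir_min`, the flat-terrain threshold `flat_deg`, the cell dictionary layout and the function names are hypothetical. Only the >40° slope rule, the point counts (300 and 200) and the inserted DTB values (150 m and 0 m) come from the text.

```python
import random

SAND_DTB_CM = 15000  # 150 m, the average sand depth quoted for the Sahara
OUTCROP_DTB_CM = 0   # exposed bedrock

def build_masks(cells, mir_min=0.4, steep_deg=40.0, flat_deg=5.0):
    """Split highly reflective cells into a sand mask and a steep-rock mask.

    mir_min and flat_deg are hypothetical thresholds chosen for this sketch;
    steep_deg follows the >40 degree slope rule stated in the text.
    """
    sand, steep = [], []
    for cell in cells:
        if cell["mir_band7"] < mir_min:
            continue  # not a high-reflectance surface
        if cell["slope_deg"] > steep_deg:
            steep.append(cell)
        elif cell["slope_deg"] <= flat_deg:
            sand.append(cell)
    return sand, steep

def sample_pseudo_points(mask_cells, n, dtb_cm, seed=42):
    """Randomly draw up to n cells from a mask and attach a fixed DTB value."""
    rng = random.Random(seed)
    chosen = rng.sample(mask_cells, min(n, len(mask_cells)))
    return [{"lon": c["lon"], "lat": c["lat"], "dtb_cm": dtb_cm,
             "source": "pseudo"} for c in chosen]

# Toy grid with two dune cells, one steep bare cell and one vegetated cell.
cells = [
    {"lon": 10.0, "lat": 25.0, "mir_band7": 0.55, "slope_deg": 1.0},
    {"lon": 11.0, "lat": 25.0, "mir_band7": 0.60, "slope_deg": 2.5},
    {"lon": 86.9, "lat": 28.0, "mir_band7": 0.45, "slope_deg": 48.0},
    {"lon": 5.0, "lat": 50.0, "mir_band7": 0.10, "slope_deg": 3.0},
]
sand_mask, steep_mask = build_masks(cells)
pseudo = (sample_pseudo_points(sand_mask, 300, SAND_DTB_CM)
          + sample_pseudo_points(steep_mask, 200, OUTCROP_DTB_CM))
```

Fixing the random seed keeps the inserted points reproducible, which matters because the pseudo-observations become part of the training data.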

Global mask maps of shifting sand areas (above) and steep bare surface areas. This map was derived using the medium infrared band 7 from the MCD43A4 MODIS land product, and global DEM and slope images (based on the SRTM DEM). Projected in the original MODIS sinusoidal projection system.
In the second approach, we also generated a few hundred points by using a number of detailed regional geological maps. Regions with exposed bedrock maps include New York State, Vermont, Alaska, Alberta, Manitoba and Newfoundland, and areas covered by the NRCan Groundwater Program (http://gin.gw-info.net/). All steps used to generate pseudo-points have been documented via Github (R code).
2.4 Covariate Layers
2.4.1 Land Mask and Covariates
We generate predictions using the official land mask (defines the prediction area) used within the SoilGrids project for the purpose of global soil mapping. The global soil mask excludes water bodies, and all areas covered with permanent ice, i.e., areas to the south of 60°S.
The land mask is visible from the final prediction maps shown in Figure 8. As covariates, we use 155 global environmental layers (most of them available from http://worldgrids.org/), which include:
- Global lithological map [Hartmann and Moosdorf, 2012],
- Global landform map [Sayre et al., 2014],
- Global land cover GLC30 product [Chen et al., 2015],
- Climatic surface based on WorldClim [Hijmans et al., 2005],
- MODIS land products, including EVI images and surface reflectance bands,
- Global Water Table Depth in meters based on Fan et al. [2013],
- Global 1 km Gridded Thickness of Soil, Regolith, and Sedimentary Deposit Layers based on Pelletier et al. [2016].
The complete list of covariates is given in the supporting information. Note that the map by Pelletier et al. [2016] is generated by combining process-based models and empirical models, and is as such ideal for statistical calibration using actual point data. For this purpose we use the layer of average soil and sedimentary-deposit thickness which shows only depths up to 50 m.
2.5 Model Fitting and Validation
The framework of generating spatial predictions consists of four main steps (Figure 4): overlay observations of DTB and covariates and prepare regression matrix, fit prediction models, apply spatial prediction models using covariates, and assess accuracy using cross-validation and compare the prediction with regional maps.

The spatial prediction framework used to fit models and predict DTB variables globally at 250 m resolution.
Spatial predictions were generated using an ensemble model based on two data-driven algorithms implemented in the R environment, i.e., random forest (ranger package) and Gradient Boosting Tree (xgboost package). Both models are tree ensemble methods. The random forest model uses fully grown decision trees (low bias, high variance) and reduces error by reducing variance [Breiman, 2001]. The Gradient Boosting Tree uses shallow trees (high bias, low variance) and reduces error mainly by reducing bias, and also to some extent by reducing variance by aggregating the output from many models [Chen and Guestrin, 2016].
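As a rough illustration of how two independently fitted models can be combined and scored, the Python sketch below averages the two predictions and computes common accuracy measures. The weighting scheme is an assumption (equal weights by default; the weights actually used are not stated here), the amount of variation explained is taken as 1 − SSE/SST, and the mean error is taken as predicted minus observed so that negative values indicate underestimation. All function names are hypothetical.

```python
import math

def ensemble_predict(pred_rf, pred_gb, w_rf=0.5, w_gb=0.5):
    """Weighted average of random forest and gradient boosting predictions.

    Equal weights are an assumption made for this sketch.
    """
    total = w_rf + w_gb
    return [(w_rf * a + w_gb * b) / total for a, b in zip(pred_rf, pred_gb)]

def variation_explained(observed, predicted):
    """Amount of variation explained, computed as 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def mean_error(observed, predicted):
    """Mean of (predicted - observed); negative means underestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Toy example with DTB values in cm.
observed = [120.0, 800.0, 1500.0, 40.0]
combined = ensemble_predict([150.0, 700.0, 1300.0, 90.0],
                            [110.0, 900.0, 1400.0, 60.0])
scores = (variation_explained(observed, combined),
          mean_error(observed, combined),
          rmse(observed, combined))
```

In a cross-validation setting the same measures would be computed on held-out folds only, so that they reflect prediction rather than fitting accuracy.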


To evaluate the extrapolation risk, we used the following procedure (referred to as “cross-validation by region”). First, all samples were partitioned into subsets by region. Then, the spatial prediction model was calibrated using the subset of one region (or several regions). Finally, this model was validated using the other subset(s). At the continental scale, the spatial prediction model is calibrated using data from one continent and then applied to the other two. The three continents are North America (United States and Canada), Europe (Sweden and Ireland) and Australia. A similar procedure is applied to the provinces of Canada and the states of the US. For convenience, we call these spatial prediction models continental models and state (province) models. The extrapolation risk is also evaluated by leaving one state out of the calibration for the United States. For convenience, we refer to such a spatial prediction model as, for example, the “without Ohio” model. All code used to generate predictions is available from the Github channels (https://github.com/ISRICWorldSoil/SoilGrids250m).
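The two variants of cross-validation by region described above can be sketched in Python as follows. This is a schematic illustration, not the authors' R code: the sample layout, the function names and the trivial "mean" model used in the example are assumptions, and any model exposing `fit` and `predict` callables could be substituted.

```python
from collections import defaultdict

def split_by_region(samples):
    """Group samples (dicts with 'region' and 'dtb' keys) by region."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["region"]].append(s)
    return dict(groups)

def regional_models(samples, fit, predict, score):
    """Calibrate on one region, validate on every other region."""
    groups = split_by_region(samples)
    results = {}
    for cal_region, cal in groups.items():
        model = fit(cal)
        for val_region, val in groups.items():
            if val_region == cal_region:
                continue
            obs = [s["dtb"] for s in val]
            pred = [predict(model, s) for s in val]
            results[(cal_region, val_region)] = score(obs, pred)
    return results

def leave_one_region_out(samples, fit, predict, score):
    """Calibrate without one region, then extrapolate to that region."""
    groups = split_by_region(samples)
    results = {}
    for left_out, val in groups.items():
        cal = [s for s in samples if s["region"] != left_out]
        model = fit(cal)
        obs = [s["dtb"] for s in val]
        pred = [predict(model, s) for s in val]
        results["without " + left_out] = score(obs, pred)
    return results

# Placeholder model: predicts the calibration mean everywhere.
def fit_mean(cal):
    return sum(s["dtb"] for s in cal) / len(cal)

def predict_mean(model, sample):
    return model

def mean_error(obs, pred):
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)

samples = [
    {"region": "A", "dtb": 10.0}, {"region": "A", "dtb": 20.0},
    {"region": "B", "dtb": 30.0}, {"region": "B", "dtb": 50.0},
]
by_region = regional_models(samples, fit_mean, predict_mean, mean_error)
loo = leave_one_region_out(samples, fit_mean, predict_mean, mean_error)
```

The first function corresponds to the continental and state (province) models, the second to the "without ..." models; in both cases the validation region contributes no data to calibration, which is what makes the score a measure of extrapolation.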
3 Results
3.1 Summary Statistics
The statistics of the absolute DTB and the censored DTB are given in Table 1. Figure 5 shows the histograms of the absolute DTB and the censored DTB. After logarithmic transformation, the absolute DTB had a distribution similar to a normal distribution but with many zero values (i.e., outcrops). The frequency of values larger than 1 m from soil profiles decreased as the DTB increased. Many borehole values were around 0.5 m, 1 m, 1.5 m, 2 m, etc., as well as at integer multiples of one foot (i.e., 30.48 cm). This is due to the fact that the DTB is usually recorded in feet or (half-) meters in borehole logs.

Histograms of (a, b) absolute depth to bedrock (DTB) and (c, d) censored DTB. For absolute DTB, values equal to or larger than 8800 cm are not shown. For censored DTB, values equal to or larger than 200 cm are not shown. The numbers of observations are 1,590,464, 13,416 and 293,095 for (a, b) absolute DTB, (c) censored DTB from soils, and (d) censored DTB from wells, respectively.
Variable | Continent | Minimum (cm) | Mean (cm) | Median (cm) | Maximum (cm) | Number |
---|---|---|---|---|---|---|
Absolute DTB | Africa | 2 | 1,337.3 | 125 | 15,000 | 3,281 |
Absolute DTB | Asia | 0 | 1,057.9 | 15 | 65,379 | 2,070 |
Absolute DTB | Oceania | 0 | 3,335.9 | 2,250 | 66,900 | 6,251 |
Absolute DTB | Europe | 0 | 690.5 | 400 | 22,000 | 281,563 |
Absolute DTB | North America | 0 | 1,487.4 | 850 | 312,541 | 1,227,393 |
Absolute DTB | South America | 0 | 1,595.1 | 500 | 37,000 | 2,433 |
Absolute DTB | World | 0 | 1,309.3 | 670 | 312,541 | 1,590,464 |
Censored DTB (a) | Africa | 2 | 110.01 | 110 | 195 | 2,636 |
Censored DTB (a) | Asia | 0 | 25.4 | 10 | 197 | 1,543 |
Censored DTB (a) | Oceania | 0 | 61.63 | 55 | 198 | 805 |
Censored DTB (a) | Europe | 0 | 87.73 | 100 | 198 | 78,491 |
Censored DTB (a) | North America | 0 | 105.88 | 120 | 199 | 192,214 |
Censored DTB (a) | South America | 0 | 29.25 | 10 | 190 | 892 |
Censored DTB (a) | World | 0 | 97.51 | 100 | 199 | 307,936 |
- a There are 1,379,502 observations with a value equal to or larger than 200 cm; these are excluded when calculating the statistics of censored DTB.
3.2 Model Fitting Results
In most cases, model fitting using the random forest and Gradient Boosting Tree algorithms does not show major differences. However, Gradient Boosting Tree yields a somewhat lower R2, but a similar RMSE, as derived using Out-Of-Bag training samples. Table 2 shows the complete summary results for model fitting and cross-validation.
Variable | Type | Units | Range | Model Fit RF (R2) | Model Fit GB (R2) | Amount of Variation Explained | ME | RMSE |
---|---|---|---|---|---|---|---|---|
Absolute DTB | log-normal | cm | 0–312,500 | 0.61 | 0.38 | 0.59 | −24.6 | 1,172 |
Censored DTB | zero-inflated | cm | 0–200 | 0.35 | 0.25 | 0.34 | 1.25 | 51 |
Occurrence (of R horizon) | binomial | prob. | 0–1 | 0.35 | 0.23 | 0.34 | −0.006 | 0.34 |
- a RF indicates random forest, GB indicates Gradient Boosting Tree. Amount of variation explained, mean error (ME) and root mean square error (RMSE) were determined using 10–fold cross validation.
Figure 6 shows the scaled importance of covariates measured by residual sum of squares of the random forest models. The most important covariates for the absolute DTB were precipitation, surface reflectance of MODIS MIR band 7, valley depth and DEM. It should be noted that the DTB determined by Pelletier et al. [2016] was also important in prediction. Topography and geological units are also clearly visible in the local patterns, while climatic conditions are most visible in the continental patterns. The most important covariates were similar for the censored DTB and the occurrence of R horizon, which include the latitude, surface reflectance of MODIS NIR band 4, daytime land surface temperature, MODIS precipitable water vapor, surface roughness and Multi-resolution Index of Valley Bottom Flatness and DEM.

Scaled importance of covariates with the resolution of 250 m for target variables by random forest model. NIR is near infrared radiation. MIR is middle infrared radiation. MRVBF is Multiresolution Index of Valley Bottom Flatness. LST is land surface temperature. PWV is precipitable water vapor.
3.3 Mapping Accuracy
Table 2 also shows cross-validation summary statistics of interpolation for models based on random forest and Gradient Boosting Tree. Random forest yielded more accurate predictions than Gradient Boosting Tree. The percentages of explained variance by random forest were relatively high for the absolute DTB compared to the other two target variables. Figure 7 shows the cross-validation plot of the absolute DTB. This shows that for the absolute DTB, lower values (especially values near zero) are significantly overestimated. Overestimation of lowest values is a common problem in regression, especially when the model is not able to explain >50% of variability in the target variable. Figure 7 also shows that prediction limits are relatively wide. For the censored DTB, most of lower values were overestimated, while higher values around 2.5 m are underestimated. The prediction of the occurrence of R horizon has an AUC value of 0.87, which indicates the prediction is quite good. The error rates were 23.6% and 23.9% for the random forest and Gradient Boosting Tree, respectively.
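For readers less familiar with the two measures reported for the occurrence of the R horizon, the Python sketch below shows one common way to compute them from cross-validation output. The AUC is computed as the probability that a randomly chosen site with bedrock within 200 cm receives a higher predicted probability than a randomly chosen site without it. The 0.5 classification threshold used for the error rate is an assumption; the threshold actually used is not stated here.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def error_rate(labels, scores, threshold=0.5):
    """Share of sites misclassified at the given probability threshold."""
    wrong = sum(1 for l, s in zip(labels, scores)
                if (s >= threshold) != (l == 1))
    return wrong / len(labels)

# 1 = R horizon observed within 200 cm, 0 = not observed.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.5, 0.1]
example_auc = auc(labels, probs)
example_err = error_rate(labels, probs)
```

The pairwise form is quadratic in the number of sites and is meant for illustration only; with millions of points a rank-based implementation would be preferable.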

Plot showing cross-validation results for absolute depth to bedrock on the logarithmic scale. R-square is calculated using formula in equation (1).
Table 3 shows the goodness of fit of the continental spatial prediction models and their validation. The calibration R2 ranged from 0.51 to 0.68. R2 was very low (below 0.04) for extrapolation, while it ranged from 0.44 to 0.63 for interpolation. The interpolation ME had values similar to those of calibration. Extrapolations had a higher absolute ME in most cases compared to interpolations and calibrations. An exception is the extrapolation to North America by the Australia model, which had a similar absolute ME. The North America model had the best performance in calibration, but its extrapolation accuracy was similar to that of the other models. Table 4 shows calibration and validation metrics of the province models of Canada. The Nova Scotia model had the lowest R2. All extrapolations had a low R2. The highest R2 of the extrapolations was produced by the Québec model in predicting British Columbia. This implies that the predictability of extrapolation increased at the province scale. Similar results were observed for the state models of the United States (not shown). Table 5 shows the goodness of fit of the leave-one-state-out models of the United States and their validation. There was a significant decrease in the calibration R2 when the Northern High Plains aquifer was left out of the calibration. The R2 of the “without Northern High Plains” model was 0.677, while that of the other models was around 0.72. Except for the “without Iowa” model and the “without Ohio” model, all models had an R2 below 0.1 in extrapolation. The results show that a general model (i.e., a leave-one-state-out model) gave better results in extrapolation than a local model (i.e., a state model) in most cases.
Calibration Area | Calibration R2 | Calibration ME | Validation R2, North America | Validation R2, Europe | Validation R2, Australia | Validation ME, North America | Validation ME, Europe | Validation ME, Australia |
---|---|---|---|---|---|---|---|---|
North America | 0.684 | −0.02 | 0.63 | 0.005 | 0.0001 | 0.05 | 1.11 | −0.64 |
Europe | 0.513 | −0.04 | 0.006 | 0.442 | 0.004 | −0.75 | −0.03 | −2.26 |
Australia | 0.598 | −0.05 | 0.029 | 0.0003 | 0.546 | 0.11 | 0.95 | −0.11 |
- a ME is mean error, which is calculated after logarithm transform.
Calibration Area | Calibration R2 | Calibration ME | Validation R2, British Columbia | Validation R2, Nova Scotia | Validation R2, Ontario | Validation R2, Québec | Validation ME, British Columbia | Validation ME, Nova Scotia | Validation ME, Ontario | Validation ME, Québec |
---|---|---|---|---|---|---|---|---|---|---|
British Columbia | 0.467 | −0.03 | 0.407 | 0.017 | 0.004 | 0.044 | −0.03 | 0.48 | 0.85 | 1.05 |
Nova Scotia | 0.376 | −0.006 | 0.0001 | 0.302 | 0.036 | 0.005 | −0.4 | 0.01 | −0.32 | 0.15 |
Ontario | 0.755 | −0.01 | 0.048 | 0.013 | 0.604 | 0.06 | −0.22 | 0.02 | 0.03 | 0.4 |
Québec | 0.444 | −0.03 | 0.065 | 0.025 | 0.008 | 0.386 | −1.92 | −1.01 | −0.64 | 0.006 |
- a ME is mean error, which is calculated after logarithm transform.
Calibration Area (Without) | Calibration R2 | Calibration ME | Validation R2, Interpolation (b) | Validation R2, Extrapolation | Validation ME, Interpolation (b) | Validation ME, Extrapolation |
---|---|---|---|---|---|---|
Indiana | 0.735 | −0.019 | 0.644 | 0.092 | −0.026 | −0.261 |
Iowa | 0.717 | −0.018 | 0.63 | 0.201 | 0.024 | 0.041 |
South central Kansas | 0.727 | −0.019 | 0.637 | 0.045 | −0.022 | 0.285 |
Kentucky | 0.717 | −0.02 | 0.677 | 0.003 | 0.11 | 0.68 |
Maine | 0.73 | −0.021 | 0.613 | 0.012 | 0.029 | 0.221 |
Minnesota | 0.714 | −0.019 | 0.616 | 0.099 | 0.026 | −0.567 |
Missouri | 0.741 | −0.023 | 0.683 | 0.054 | −0.024 | −0.306 |
New Hampshire | 0.726 | −0.019 | 0.648 | 0.051 | 0.039 | 0.003 |
New York | 0.746 | −0.02 | 0.684 | 0.008 | −0.034 | −0.151 |
Ohio | 0.722 | −0.021 | 0.663 | 0.17 | −0.041 | 0.011 |
Northern High Plains | 0.677 | −0.018 | 0.621 | 0.033 | 0.008 | −0.739 |
Pennsylvania | 0.736 | −0.02 | 0.674 | 0.007 | −0.016 | −0.042 |
Tennessee | 0.732 | −0.02 | 0.653 | 0.002 | −0.042 | −0.296 |
Vermont | 0.729 | −0.02 | 0.652 | 0.012 | 0.034 | 0.119 |
- a ME is mean error, which is calculated after logarithm transform.
- b The average of all interpolations of leave one state out models.
3.4 Final Predictions
Figure 8 shows the output predictions of the absolute DTB, the censored DTB and the occurrence probability of the R horizon by the ensemble model based on random forest and Gradient Boosting Tree at 250 m resolution. We chose to map DTB at the resolution of 250 m based on the available data sources and our available computing power. The mean predicted absolute DTB was 33.6 m. High values of DTB are visible in the desert areas and around 70°N, 45°N and 40°S. Somewhat lower values of DTB are visible in the tropics. For the censored DTB, low values were found in the mountainous areas and especially in Mexico. For about 85% of the land surface, we predict DTB to be larger than 2 m. The occurrence of the R horizon, i.e., shallow soils, seems to be most correlated with topography and is visible along the major mountain chains. These patterns fit well with expert knowledge [Brown et al., 2001; Dregne, 2011; Howell, 1960; Swinford, 2004].

Figure 8. Final prediction of (a) the absolute depth to bedrock (cm), (b) the censored depth to bedrock (cm, here values equal to 200 cm indicate “deep as or deeper than”), and (c) occurrence of R horizon within 200 cm (%). The maximum value of the absolute depth to bedrock is set to 250 m for convenience of visualization, but the actual maximum predicted value is about 540 m.
3.5 Comparison With Regional Maps and Observations
We used regional maps of DTB from Iowa and Ohio to validate the global predictions both visually and statistically. The Iowa map was drawn by geologists based on various data sources, including bedrock outcrop maps, water wells, boreholes, and soil descriptions (filtered for those soils encountering bedrock) [Witzke et al., 2010]. The Ohio map was produced using over 162,000 data points as control for the bedrock-topography lines [Swinford, 2004]. In Ohio, ground-moraine dominated areas have a shallow DTB, the ice-deposited Wisconsinan-age ridge moraines generally have a medium DTB, and the limited areas of deep DTB are largely the result of deep bedrock valleys filled with drift.
The correlation coefficients between our prediction and the regional maps are 0.82 and 0.6 for Iowa and Ohio, respectively. Although the regional maps of DTB cannot be considered ground truth, they can nevertheless be considered several times more accurate than our global predictions. For both areas, the mean error indicates an underestimation (−422 for Iowa and −528 for Ohio). Although the differences in Figures 9 and 10 indicate some underestimation of higher values, especially in the case of Ohio, this comparison also shows that the general patterns of the regional maps and our predictions match in most cases. In Iowa, the bedrock surface is buried by unconsolidated surficial sediments (mostly Quaternary) over most of its extent. In the southwest and northwest of Iowa, shallow DTB was found. Most areas of Ohio are covered by sediments left by continental glaciers. In southwest Ohio, the bedrock surface is very close to the land surface because this area was free from glaciation.
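The two statistics used in this comparison can be illustrated with a minimal sketch. It assumes that the mean error is computed as prediction minus reference, so that negative values indicate underestimation as in the text above; the function names and the sample values are hypothetical and are not taken from the Iowa or Ohio maps.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_error(predicted, reference):
    """Mean of (predicted - reference); negative means underestimation (assumed sign)."""
    return sum(p - r for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical co-located values from a regional map and a global prediction.
regional = [500.0, 1500.0, 3000.0, 6000.0]
predicted = [700.0, 1400.0, 2500.0, 4000.0]

print("r  =", round(pearson_r(predicted, regional), 3))
print("ME =", mean_error(predicted, regional))
```

In this toy case the prediction tracks the spatial pattern closely (high correlation) while still underestimating the deep values (negative mean error), which is the same combination reported for the two states.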

Figure 9. Comparison of (a) the regional map of Iowa, (b) our prediction, and (d) the map of Pelletier et al. [2016]. (c, e) Scatter plots with the correlation coefficient indicate how well our prediction and that of Pelletier et al. [2016] match the regional map. Values have been stretched using a log scale to emphasize spatial patterns. Note that the maximum value of Pelletier et al. [2016] is 50 m, so values of 50 m or more were excluded from the corresponding scatter plots.

Figure 10. Comparison of (a) the regional map of Ohio, (b) our prediction, and (d) the map of Pelletier et al. [2016]. (c, e) Scatter plots with the correlation coefficient indicate how well our prediction and that of Pelletier et al. [2016] match the regional map. Values have been stretched using a log scale to emphasize spatial patterns. Note that the maximum value of Pelletier et al. [2016] is 50 m, so values of 50 m or more were excluded from the corresponding scatter plots.
The correlation coefficients between the map of Pelletier et al. [2016] and the regional maps are 0.27 and 0.24 for Iowa and Ohio, respectively. For both areas, the spatial patterns of the two maps were quite different (Figures 9 and 10). For Iowa, the deep DTB in the eastern part did not appear in the map of Pelletier et al. [2016]. For Ohio, the frequency of medium values in the map of Pelletier et al. [2016] was very low compared to the regional map, and most values were either near zero or 50 m.
Figure 11 shows the comparison between the observations, our prediction, and the regional DTB maps along a transect in Iowa and in Ohio. In general, our predictions coincide well with the observations and the regional maps. Compared to the regional map of Iowa, our prediction underestimated DTB for the bedrock valley along the transect from Alvord to Wever. Compared to the regional map of Ohio, our prediction overestimated DTB for the hill slope around 100 km along the transect from Montpelier to Pomeroy and underestimated it for the valley around 250 km.

Figure 11. Comparison of measured and predicted absolute depth to bedrock for (a) Iowa and (b) Ohio. The points are the observations. The black line is the land surface elevation. The red line is the predicted DTB. The blue line is the DTB of the regional map.
Because we used the map of Pelletier et al. [2016] as a covariate in the prediction, the comparisons above may be biased. However, the results from cross-validation show that the amount of variation explained decreased only from 58.7% to 58.6% when the map of Pelletier et al. [2016] was removed from the covariate list. The reason may be that many of the patterns in the map of Pelletier et al. [2016] are already represented in the existing list of covariates (especially the DEM-derived parameters, which were also used as covariates in producing the map of Pelletier et al. [2016]). Thus, the resulting map will not change much if the map of Pelletier et al. [2016] is removed as a covariate, and the comparisons above are not problematic.
Figures 12 and 13 show the comparison between the observations, our prediction, and the map of Pelletier et al. [2016] for Kentucky and Pennsylvania. The correlation coefficient between our prediction and the observations was relatively high, and the machine learning models reflected the major spatial patterns of DTB. However, the underestimation of high values and the overestimation of low values were significant. In contrast, the map of Pelletier et al. [2016] gave extreme estimates, i.e., very high or very low, for almost all areas, and the major spatial patterns in the observations are not reflected. For example, almost the whole state of Kentucky has a shallow DTB in that map, and the high values in the southeast corner of the state are almost missing. This may be caused by the misclassification of landform.

Figure 12. Comparison of (a) observations of Kentucky, (b) our prediction, and (d) the map of Pelletier et al. [2016]. (c, e) Scatter plots with the correlation coefficient indicate how well our prediction and that of Pelletier et al. [2016] match the observations. Values have been stretched using a log scale to emphasize spatial patterns. Note that the maximum value of Pelletier et al. [2016] is 50 m, so values of 50 m or more were excluded from the corresponding scatter plots.

Figure 13. Comparison of (a) regional map of Pennsylvania, (b) our prediction, and (d) the map of Pelletier et al. [2016]. (c, e) Scatter plots with the correlation coefficient indicate how well our prediction and that of Pelletier et al. [2016] match the regional predictions. Values have been stretched using a log scale to emphasize spatial patterns. Note that the maximum value of Pelletier et al. [2016] is 50 m, so values of 50 m or more were excluded from the corresponding scatter plots.
We validated the map of Pelletier et al. [2016] against our DTB observations after excluding values of 50 m or more, because the maximum value of Pelletier et al. [2016] is 50 m. For the interpolation area, which includes Indiana, Kentucky, New York, and Pennsylvania, where they used DTB data for calibration, the amount of variation explained is 5%. For the extrapolation area, the amount of variation explained is 2%.
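A minimal sketch of this kind of validation is given below. It assumes that the exclusion is applied to the observed values and that the amount of variation explained is computed as 1 − SSE/SST on untransformed values; both are assumptions made for illustration, and the function names and sample values are hypothetical.

```python
def exclude_at_or_above(observed, predicted, cap_m=50.0):
    """Drop pairs whose observed DTB is at or above the cap (assumed to apply to observations)."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o < cap_m]
    return [o for o, _ in pairs], [p for _, p in pairs]


def variation_explained(observed, predicted):
    """Amount of variation explained, taken here as 1 - SSE/SST (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical observed DTB (m) and map values capped at 50 m.
obs_m = [2.0, 10.0, 30.0, 80.0, 120.0]
pred_m = [5.0, 12.0, 20.0, 50.0, 50.0]

obs_kept, pred_kept = exclude_at_or_above(obs_m, pred_m)
r2 = variation_explained(obs_kept, pred_kept)
print(len(obs_kept), "pairs kept, variation explained =", round(r2, 3))
```

Excluding the deep observations avoids penalizing a map for values it cannot represent by construction, at the cost of restricting the comparison to shallower sites.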
4 Discussion
We used the most extensive available set of depth to bedrock observations from soil surveys and geologic boreholes (primarily water wells) to estimate the global spatial distribution of DTB using data-driven models. This work presents the most up-to-date global DTB maps, with a higher resolution (250 m) and higher accuracy than previous studies such as Pelletier et al. [2016]. The cross-validation statistics show that the absolute DTB map and the occurrence of R horizon have moderate accuracy, and the censored DTB map has low accuracy. There is an overestimation of the absolute DTB, with a mean error of −0.25 m, and an underestimation of the censored DTB, with a mean error of 1.25 cm. The large RMSE (11.7 m) in relation to the mean predicted values highlights the need for careful use of the depth predictions. Our predicted patterns of DTB also match the regional maps from Iowa and Ohio, although the average differences in values are about ±10 m.
4.1 Problems of the Definition of Depth to Bedrock and Its Measurements
In this study, we used DTB observations from soil profiles and borehole drillings and treated them as following the same definition. However, there are problems with the definition of depth to bedrock and with its measurement. In soil surveys, the R horizon, or hard rock, is not strictly equivalent to bedrock, because intact regolith (weathered bedrock) may be included in the R horizon. Although the definition of bedrock in geological surveys is more consistent, DTB measurements from borehole drillings are based almost entirely on the judgements of nonscientists (i.e., drillers), which lowers the accuracy of DTB. As shown in Figure 1, the majority of soil profiles (usually less than 2 m deep) do not encounter bedrock, so the DTB values from soil profiles are censored data. In contrast, borehole drillings go much deeper and fail to encounter bedrock in only a few cases. For pseudo-observations, we also assumed that DTB is zero where local slopes exceed 40°, even though such surface bedrock is often highly fractured and porous. This assumption was made for simplicity and may not be a serious problem, because the resulting maps did not change significantly when we replaced these values with random numbers between 0 and 20 cm.
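The two variants of the slope-based pseudo-observations described above can be sketched as follows. The function `pseudo_dtb_cm`, its parameters, and the slope values are hypothetical, and a uniform distribution is assumed for the random variant since the distribution is not specified in the text.

```python
import random


def pseudo_dtb_cm(slopes_deg, threshold_deg=40.0, jitter_max_cm=0.0, seed=0):
    """Return (cell_index, dtb_cm) pseudo-observations for cells steeper than the threshold.

    With jitter_max_cm == 0 every steep cell gets DTB = 0 cm; otherwise each
    steep cell gets a uniform random DTB in [0, jitter_max_cm] (assumed distribution).
    """
    rng = random.Random(seed)
    pseudo = []
    for i, slope in enumerate(slopes_deg):
        if slope > threshold_deg:
            dtb = rng.uniform(0.0, jitter_max_cm) if jitter_max_cm > 0 else 0.0
            pseudo.append((i, dtb))
    return pseudo


# Hypothetical local slopes in degrees.
slopes = [12.0, 41.5, 39.9, 55.0]

zero_version = pseudo_dtb_cm(slopes)                        # DTB fixed at 0 cm
jitter_version = pseudo_dtb_cm(slopes, jitter_max_cm=20.0)  # DTB in 0 to 20 cm

print(zero_version)
print(jitter_version)
```

Fitting the models once with each variant and comparing the resulting maps is one way to carry out the sensitivity check mentioned in the text.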
4.2 Success and Limitation of the Data Set
As mentioned above, soil profiles are censored data, and censored data produce a censored map. As a result, maps produced solely from soil profiles cannot be interpreted as the true DTB, but only as “deeper than” the predicted values. In this study, we also used deep observations from boreholes, which compensate for the shallow observations from soil profiles, so the predicted maps are more realistic. For the occurrence of R horizon, the models provided relatively reliable estimates. However, for the censored DTB, the models were not very successful in finding the relationship between the target variable and the covariates, and the resulting map remains experimental.
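The relationship between the three target variables can be illustrated with a short sketch. It assumes that an observation exactly at 200 cm counts as “deep as or deeper than” and therefore not as an R horizon occurrence within 200 cm; the function names and sample depths are hypothetical.

```python
def censor_dtb(dtb_cm, limit_cm=200.0):
    """Censored DTB: depths at or beyond the limit are recorded as the limit."""
    return min(dtb_cm, limit_cm)


def r_horizon_within(dtb_cm, limit_cm=200.0):
    """1 if bedrock is reached above the limit, else 0 (boundary handling is assumed)."""
    return 1 if dtb_cm < limit_cm else 0


# Hypothetical absolute DTB observations in cm.
absolute_cm = [35.0, 180.0, 200.0, 950.0, 12000.0]

censored_cm = [censor_dtb(d) for d in absolute_cm]
occurrence = [r_horizon_within(d) for d in absolute_cm]

print(censored_cm)
print(occurrence)
```

A soil profile that stops at 2 m without reaching bedrock can contribute to the censored DTB and to the occurrence variable, but it says nothing about the absolute DTB, which is why the borehole data are needed for the latter.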
The amount of variation explained by the models for the absolute DTB is about 59%, which means that about 41% remains unexplained. Mapping depth to bedrock is certainly complex, because the subsurface is hidden from view and results from both gradual and abrupt past processes. Most likely, more detailed geomorphological and lithological maps could be the key to improving the predictions. At the moment we use the GLiM data set, which is of very general scale and low quality. As soon as more detailed global lithological data become available in the public domain, they will be useful for improving the predictions.
It is common in regression, including the machine learning models we used, for low and high values to be smoothed toward the mean when R2 is low. The deepest observation in the source data is about 3000 m, but the maximum predicted value is only about 540 m. The machine learning models also overestimated zero DTB values, i.e., many outcrops were predicted with values around 300 cm (Figure 7). As a result, the signature of the Andes, the Himalayas, and many other mountain ranges, where DTB is near zero, is not very clear in the map of absolute DTB. Another reason for the poor performance in mountain ranges is that we have few observations there, only some pseudo-observations.
We could not predict very deep values, such as > 1 km in the Andean foreland basin, because the borehole data are also censored to some extent, i.e., we do not have many deep observations in such areas. There is no universal requirement on how deep a drilling should go, so we do not know by how much the borehole data are censored (likely dozens of meters). Fortunately, most applications, including Earth System Models, are more interested in shallow DTB. Even though we estimate the absolute DTB, it should be treated as a censored DTB when the interest is in deep DTB.
4.3 Process-Based Model and Empirical Model
Pelletier et al. [2016] divided the global land surface into three landform components, i.e., upland hillslope, upland valley bottom, and lowland, and used a different model for each component to estimate the DTB. For upland hillslopes, a model based on the balance of soil production and erosion, driven by topographic curvature and mean annual rainfall, was calibrated against soil thickness data to estimate soil thickness, and regolith thickness was estimated based on water table depth, which has a high uncertainty. For upland valleys, the DTB was estimated by assuming that the side slopes project down to a V-shaped valley. For lowlands, an empirical model was established between the topographic roughness index and the DTB using water well logs. Process-based models are strong at capturing one or two major factors and applying a general rule globally, but their simplification of soil and regolith formation processes may ignore other factors. The advantage of statistical models, such as the random forest and Gradient Boosting Tree used in this study, is that they can utilize as many covariates as possible, including DTB maps, and reflect the spatial variation of the relationship between the target variables and the covariates. Compared to the process-based model and other empirical models, however, machine learning models driven with big data require considerable computational power. According to the comparison in the results section, our data-driven method provided more accurate predictions in interpolation. The process-based model had its strength in extrapolation areas, where it had an R2 of around 0.02, which was slightly higher than those of the empirical models (Table 3), though both produced maps with high uncertainty. The two approaches are complementary, and the best way forward may be to use data fusion to compensate for the weaknesses of each.
It is still challenging to estimate the global DTB accurately because of the lack of observations and the poor understanding of the processes affecting DTB, and improvements are needed for both approaches. In our study, we used the DTB map of Pelletier et al. [2016] as a covariate, and it ranked as the seventh most important covariate for the absolute DTB, as shown in Figure 6. This indicates that our prediction has quite different spatial patterns, although broad similarities are visible. The algorithm picked precipitation as the most important covariate, which is consistent with its control on the rate of soil production. Topographic parameters, including the DEM, valley depth, and the Multiresolution Index of Valley Bottom Flatness, were also important because they affect soil erosion and the transport and deposition of sediments. Surface reflectance of MODIS MIR band 7 also plays an important role in predicting DTB. One deficiency of our study is that, with the exception of the geology units, the covariates reflect current surface or near-surface conditions. Therefore, little covariate information relating to deeper conditions or to long-term changes in DTB could be included in the model fitting.
4.4 Extrapolation Risks
The “homosoil” approach was proposed to extrapolate from reference areas with soil data to areas of interest without soil data when these areas have similar soil forming factors [Boettinger et al., 2010]. However, the accuracy in the extrapolation area is usually much lower than in the interpolation area (Tables 3–5). It should be noted that the accuracy assessment by cross-validation in this study is valid only for the interpolation areas. Extrapolation in feature space and extrapolation in geographic space both lead to poor performance of spatial prediction models. In this study, although the spatial coverage of soil profiles was quite good, boreholes are spatially clustered and their spatial coverage was not ideal. The systematic omission of deep DTB observations where there are no water wells or other boreholes led to an underestimation of the DTB. For example, tropical rainforests usually have a very deep regolith, but this feature is not predicted in the resulting DTB map because of the lack of deep observations in those areas. We used the ellipsoid defined by Montgomery et al. [2001] to determine the feature space similarity. The results show that the feature space is covered well by the point observations (above 99.9%), indicating that there is no extrapolation in feature space. However, the relationship between the dependent and independent variables may not carry over from one region to another. To reduce the extrapolation risk, the spatial coverage of deep DTB observations is more important than their coverage in feature space.
4.5 Effect of Observation Density on Model Results
We developed a data thinning algorithm, implemented as a function named sample.grid in the R package GSIF, to subset spatially clustered data points in such a way that the output points are distributed evenly in space. The points are overlaid on a spatial grid with a specified cell size, and at most a specified number of points is taken from each grid cell. If a cell has no more points than the specified number, all of its points are taken. If a cell has more points than the specified number, only that number of points is taken by random sampling. To test the effect of observation density on model results, we used this algorithm to obtain subsets of the observations, taking Kentucky (Figure 12) as an example. Eight cell sizes (50 m, 100 m, 200 m, 500 m, 1000 m, 2000 m, 4000 m, and 8000 m) were tested. The maximum number of points per cell was set to 1. The density and the number of observations increased as the cell size decreased. These subsets were used to fit random forest models, and the rest of the observations were used for validation. Figure 14 shows that the validation performance of the models increased rapidly as the cell size decreased. However, the amount of variation explained did not increase much once the cell size was 100 m or less, at which point it was slightly smaller than the amount of variation explained by 10-fold cross-validation (54%). This indicates that there should be at least one observation in each 100 m by 100 m grid cell to represent the spatial variation of DTB. It should be noted that fewer than 1% of the grid cells within the interpolation area contained an observation when the cell size was 100 m by 100 m, because the observations are spatially clustered. As a result, adding more observations will improve the prediction even in areas with a high density of observations. We also tested the above procedure on the global observations. The results show that the amount of variation explained by validation was 19% when the cell size was 100 km by 100 km and only 2,308 observations were used for model calibration. This indicates that some predictability remained when observations were very sparse but evenly distributed in space.
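The thinning logic described above can be sketched in a few lines. This is not a port of the sample.grid function in GSIF; it is a minimal Python illustration of the same idea, with a hypothetical function name `thin_by_grid`, hypothetical coordinates, and the assumption that points are given in projected units matching the cell size.

```python
import math
import random
from collections import defaultdict


def thin_by_grid(points, cell_size, max_per_cell=1, seed=0):
    """Return sorted indices of a spatially thinned subset of `points`.

    Points are binned into square cells of side `cell_size`. Cells holding at
    most `max_per_cell` points keep all of them; fuller cells keep a random
    sample of exactly `max_per_cell` points.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(idx)
    selected = []
    for key in sorted(cells):
        members = cells[key]
        if len(members) <= max_per_cell:
            selected.extend(members)
        else:
            selected.extend(rng.sample(members, max_per_cell))
    return sorted(selected)


# Hypothetical clustered observations (x, y) in meters.
points = [(10, 10), (20, 30), (90, 95), (150, 20), (160, 40), (170, 60), (420, 410)]

kept = thin_by_grid(points, cell_size=100.0, max_per_cell=1)
kept_set = set(kept)
held_out = [i for i in range(len(points)) if i not in kept_set]

print("calibration subset:", kept)
print("validation subset: ", held_out)
```

In the experiment described above, the kept points would be used to fit the model and the held-out points to validate it; repeating the call with different cell sizes gives the density gradient shown in Figure 14.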

Figure 14. Effects of observation density on the model performance for Kentucky. The black line is the amount of variation explained. The red line is the percentage of the observations used for model calibration. There are 82,905 observations in total.
4.6 Suggestion on the Usage of the Data Set
Users should be aware of the limitations of the predicted maps and of the low accuracy in the extrapolation areas, as mentioned above. If a user is interested in deep DTB (larger than 100 m) or requires accurate values of shallow DTB (on the order of several centimeters), the data set may not be accurate enough to satisfy these demands. For global-scale applications such as Earth system modelling, it is necessary to aggregate the data set to a lower resolution by averaging. Although the resolution is 250 m, the data set should not be the first choice for regional applications where local maps exist. Users will need to make their own decisions, recognizing the advantages and limitations described in this paper.
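Aggregation by averaging can be sketched as a block mean over the fine grid. The example below assumes an arithmetic mean of untransformed values and ignores missing cells; whether a different average is more appropriate for a given model is left to the user. The function name `aggregate_mean` and the toy grid are hypothetical.

```python
def aggregate_mean(grid, factor, nodata=None):
    """Aggregate a 2-D grid by averaging non-missing values in factor x factor blocks.

    Blocks at the edge may be smaller if the grid size is not a multiple of
    `factor`. A block with no valid cells is returned as `nodata`.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            values = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
                if grid[r][c] != nodata
            ]
            coarse_row.append(sum(values) / len(values) if values else nodata)
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 4 tile of DTB values (cm); None marks missing cells.
fine = [
    [100, 200, 50, None],
    [300, 400, None, None],
    [10, 20, 30, 40],
    [10, 20, 30, 40],
]

coarse = aggregate_mean(fine, factor=2)
print(coarse)
```

For real rasters, the same operation is normally done with a raster library rather than nested lists; the sketch only shows what averaging to a coarser grid means cell by cell.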
4.7 Further Improvements
Predicting what lies beneath the soil surface is not trivial. We believe that the key to improving the predictions of DTB is adding more training points, especially in areas such as Latin America, Asia, and Africa, where the model extrapolates heavily (see Figure 2). In that context, there are other, less readily available data, including point observations and regional maps, that could be used to improve the data sources behind a global map. In some borehole observations, there is no direct record of depth to bedrock, though it is possible to extract DTB values using lookup tables, as we did with the Australian, Brazilian, and Chinese data. On the other hand, matching DTB values with lookup tables introduces uncertainty. Records from seismic sources and engineering borehole data could also be used to help improve the accuracy of DTB predictions. However, these data usually cover small areas and vary in quality, which presents challenges for harmonization into a global data set. Additional regional DTB maps could also be used to improve global predictions. For example, Brown et al. [2001] provided two classes of overburden thickness (> 5–10 m and < 5–10 m) in the Circum-Arctic Map of Permafrost. Likewise, surficial geology maps could be used to infer the DTB of adjacent areas [Karlsson et al., 2014]. In any case, increasing the representation and quality of the training data will likely remain our main strategy for improving these maps. Earth System Models can handle multiple layers and subgrid structure when utilizing depth estimation products such as Pelletier et al. [2016]. Because of the greater uncertainty of the depth of intact regolith in uplands in the product of Pelletier et al. [2016], it is still not practical to include it in Earth System Models. As a result, the application of that data set in the Community Land Model used only the DTB.
In the future, it will be necessary to include both DTB and soil depth (or depth to regolith) to represent the water and energy balance in land surface processes more realistically, because regolith and soil have quite different thermal and hydraulic properties. This depends on the availability of depth data and the development of corresponding methods.
5 Conclusions
We produced maps of the depth to bedrock, including the absolute DTB, the censored DTB, and the occurrence of R horizon within 200 cm, for the whole world using state-of-the-art ground observations of depth to bedrock and machine learning algorithms. This data set provides Earth System Models with a more accurate estimate of the lower boundary condition. The cross-validation suggests moderate performance for the absolute DTB and the occurrence of R horizon. However, the censored DTB contains a significant number of over-predicted low values. The predictability of DTB was limited by the inherent variability, inaccuracies, and censored nature of the observations and by the biased spatial coverage of the input data. In addition, almost all the covariates used in this study reflect surface or near-surface characteristics and processes in modern time, which restricts the ability to predict higher values of DTB (i.e., deeper DTB). Incorporating more observations, especially borehole drilling logs in the tropics, wetlands, mountain ranges, shifting sand areas, and similar environments, would help improve the resulting maps and increase accuracy, especially for higher values of DTB. Because all steps, from point-to-raster overlay to model fitting, are fully automated, we hope to produce increasingly accurate maps of the lower boundary of the world's soil and regolith by gradually adding new training data. The resulting global maps are available for download at http://globalchange.bnu.edu.cn/ and http://soilgrids.org/.
Acknowledgments
This work was supported by the Natural Science Foundation of China (under grants 41575072, 41405096) and R&D Special Fund for Nonprofit Industry (Meteorology, GYHY201206013, GYHY201306066). ISRIC is a nonprofit-making organization, core-funded by the Dutch government, with a mandate to serve the international community as custodian of global soil information and to increase awareness and understanding of the role of soils in major global issues.