Development of a New Pedotransfer Function Addressing Limitations in Soil Hydraulic Models and Observations
Abstract
The most widely applied pedotransfer functions (PTFs) often suffer from two main limitations: (a) the soil hydraulic models (SHMs) only account for capillary forces and/or the models show an unrealistic decrease near saturation for fine-textured soils; (b) the observations of soil hydraulic properties (SHPs) used to generate the PTF generally do not cover very dry conditions. In this paper, we first present a simple method for predicting SHPs in the dry range from soil texture information. Together with measurements that cover only a relatively high matric potential range, the method yielded a good prediction of the complete SHPs from saturation to oven dryness. With this method, we extended a public dataset to cover dry conditions, and then applied it to develop a new PTF for a SHM that accounts for both capillary and adsorption forces and overcomes the unrealistic decrease near saturation for fine-textured soils. A comparison with another PTF, developed for the capillary-based soil hydraulic model, showed that the new PTF provided the most accurate predictions of SHPs. It reduced the root-mean-square error (RMSE) from 0.055 to 0.045 cm3 cm−3 in predicting water content and from 0.84 to 0.66 log10 (cm d−1) in predicting hydraulic conductivity. We further applied this method to extend an existing capillary-based PTF to dry conditions. The results showed an improved performance, with reported RMSE reduced from 0.058 (original) to 0.056 (extended) cm3 cm−3 and from 1.43 (original) to 1.20 (extended) log10 (cm d−1) for prediction of water content and hydraulic conductivity, respectively.
Key Points
- A simple method is presented for predicting a complete SHP with limited measurements covering only high matric potential range
- A new PTF was developed by considering limitations of model structure and observation, which significantly improved the prediction of SHPs
- A method is proposed to extend an existing capillary-based PTF to dry conditions
1 Introduction
Soil hydraulic properties (SHPs), including the soil water retention curve (SWRC) and the hydraulic conductivity curve (HCC), are crucial inputs in water and solute transport simulations. However, the experimental determination of these parameters is expensive, time-consuming, and can be difficult, especially under very dry conditions (Vereecken et al., 2010). For large-scale applications that require the input of a large number of SHPs, the experimental measurements can even be impractical. As a result, pedotransfer functions (PTFs), which relate SHPs to easily measured soil properties such as soil texture, serve as a standard method for predicting SHPs, especially in large-scale field-based applications (e.g., Dai et al., 2019; Zhang et al., 2018).
The general way of developing a PTF is fitting the measured SHPs with soil hydraulic models to obtain the parameters, which are then related with the easily measured soil properties such as soil texture information through the use of regression methods. Over the past few decades, many PTFs have been developed, and great efforts have been made to improve their performance, mainly through including large measured datasets (e.g., Nemes et al., 2001; Weynants et al., 2013; Wösten et al., 1999), adding additional soil properties such as organic content or soil chemistry information besides the soil texture information as the input (e.g., Børgesen et al., 2008; Børgesen & Schaap, 2005; Pachepsky et al., 2006; Pachepsky & Rawls, 2004; Rawls & Pachepsky, 2002; Szabó et al., 2021; Tóth et al., 2015), and applying more powerful machine learning methods such as nearest neighbor, support vector machine, or random forest (RF) methods (e.g., Araya & Ghezzehei, 2019; Lamorski et al., 2008; Nemes et al., 2006; Szabó et al., 2021). Useful reviews of PTF development can be found in Wösten et al. (2001), Pachepsky and Rawls (2004), Vereecken et al. (2010), and Van Looy et al. (2017), among others.
However, almost all the soil hydraulic models applied for developing PTFs are capillary-based. For example, the most widely applied model is the well-known van Genuchten (1980)-Mualem (1976) model (hereafter referred to as the VGM model). While these models have a good ability to describe SHPs in the high to medium moisture range, the failure of these capillary-based models has long been recognized under low water content conditions (e.g., Nimmo, 1991; Rossi & Nimmo, 1994; Tuller & Or, 2001; Wang et al., 2013). The reason for this is that these models do not consider the effect of adsorption forces, which are dominant in a low water content situation (e.g., Tokunaga, 2009; Tuller & Or, 2001). As drylands cover nearly 41.3% of the land surface (Robinson, 2015), understanding soil water dynamics, especially in the low moisture range, as well as their impact on evapotranspiration and other related processes, is crucial for the understanding of the global water and energy cycles and their response to climate change. For example, in a recent study by Wang et al. (2019), the authors argued that the commonly neglected adsorption forces play a crucial role and should be included in soil evaporation estimation. The simulation and prediction of the evapotranspiration and other related processes requires the accurate determination of SHPs that cover dry conditions, where the adsorption forces dominate.
Early efforts were mainly focused on developing a complete SWRC over the entire moisture range (e.g., Fayer & Simmons, 1995; Fredlund & Xing, 1994; Lu et al., 2008; Nimmo, 1991; Rossi & Nimmo, 1994; Webb, 2000). Tuller and Or (2001) were the first to include non-capillary forces when developing a complete HCC. Since then, a series of complete models have been proposed that consider the coupled effects of capillary and adsorption forces, which have performed well in describing SHPs from saturation to oven dryness (e.g., Lebeau & Konrad, 2010; Zhang, 2011; Peters, 2013; Wang et al., 2016; Liao et al., 2018; Weber et al., 2019; Stanić et al., 2020 among others). However, to describe a complete HCC, an additional parameter that represents the saturated film conductivity is required, which is nevertheless difficult to determine. Unlike the complete models that introduce an additional parameter, Wang et al. (2018) proposed a new model to describe the HCC over the entire moisture range using a single equation. Compared to the commonly used capillary-based models (such as the VGM model), this model requires no extra parameters and shows a very good ability in predicting the HCC. Together with the SWRC described by the Fredlund and Xing (1994) model, this represents an easy way to develop new PTFs that can predict SHPs over the entire moisture range. For example, by applying this Fredlund and Xing (1994)-Wang et al. (2018) model (referred to as the FXW model hereafter), Rudiyanto et al. (2021) recently developed a new PTF and achieved a notable reduction in root-mean-square error (RMSE) for both the SWRC and HCC, in comparison with the VGM model based Rosetta3 PTF developed by Zhang and Schaap (2017). The original FXW model, however, does have one limitation, that is, the HCC drops dramatically near saturation for soils with a small value of n, a parameter that shapes the SWRC (de Rooij et al., 2021; Wang et al., 2018). Wang et al. (2022) recently improved the FXW model further to overcome this shortcoming by forcing the water content to be saturated above a non-zero matric potential.
Besides the limitations of the model structure, another crucial limitation in developing PTFs that aim to predict SHPs from saturation to oven dryness comes from the limited measurements. That is, the applied soil hydraulic data for PTF development often does not have measurements from very dry conditions. For example, most of the SWRC measurements in the UNsaturated SOil hydraulic DAtabase (UNSODA) adopted by Rudiyanto et al. (2021) are for matric potential above −1.0 × 103 cm. Lu et al. (2014) showed that the observed data should cover a more negative potential of about −3.0 × 103 cm to achieve an accurate description of the SWRC over the entire moisture range. When it comes to the HCC, the measurements under dry conditions are even fewer. Specifically, only 29 samples of 215 soil samples used by Rudiyanto et al. (2021) have conductivity measurements at a matric potential lower than −1.0 × 104 cm. Since the number and the quality of the measured data are essential for developing PTFs, the lack of measurements under low moisture conditions greatly limits the reasonable development of PTFs. This limitation comes from the SHP observations, which, to the best of our knowledge, is an aspect that has rarely been considered or dealt with in the literature.
Meanwhile, many existing PTFs that only consider capillary forces were developed on different, usually non-public datasets (e.g., Szabó et al., 2021; Tóth et al., 2015). Weber et al. (2020) recently developed a transfer function through weighted linear regression to predict the parameters of the so-called Brunswick model provided by Weber et al. (2019) from the VGM parameters. This enables the application of existing PTFs developed for the VGM model to other models. Rather than applying linear regression, in this study we develop a new, physically based method to predict SHPs from saturation to oven dryness using existing PTFs.
Accordingly, the purpose of this study was: (a) to develop a method for predicting SHPs over the entire moisture range with limited observations that cover only a relatively high potential range; (b) to develop a new PTF with the application of this extended method for a new soil hydraulic model developed in Wang et al. (2022), which accounts for both capillary and non-capillary effects and overcomes the unrealistic decrease of HCC near saturation for fine-textured soils; and (c) to provide an easy and physically-based way to extend existing PTFs to dry conditions.
2 Methods
In Section 2.1, we introduce the three applied soil hydraulic models. Following this, we present the method for predicting the complete SHPs with limited observations in Section 2.2. In Section 2.3, we introduce the PTF development, while in Section 2.4, we show how to improve the prediction of SHPs with existing PTFs. The statistics used for evaluating model performance are introduced in Section 2.5.
2.1 The Applied Soil Hydraulic Models
In this study, three soil hydraulic models were applied to develop the PTFs: the capillary-based VGM model, the FXW model that considers both capillary and adsorption forces, and the so-called FXW-M1 model that further improves the performance of the FXW model for fine-textured soils (Wang et al., 2022).
Since we intend to develop PTFs that aim to predict SHPs from saturation to oven dryness, the isothermal vapor diffusion is also considered because it contributes to the total water flux under very dry conditions. The conductivity K (L T−1) is therefore treated as a sum of the liquid conductivity Kl (L T−1) and the isothermal conductivity of vapor flow Kv (L T−1).
2.1.1 The VGM Model
2.1.2 The FXW Model
2.1.3 The FXW-M1 Model
For a detailed description of the FXW-M1 model, we refer the reader to Wang et al. (2022). In describing the SHP with the FXW-M1 model, l is set as 3.5 and hs is set as −1.0 cm as suggested by Wang et al. (2022), while the remaining five parameters θs, α, n, m, and Ks need to be determined.
2.1.4 The Isothermal Conductivity of Vapor Diffusion
2.2 Predicting the Complete SHPs With Limited Observations
2.2.1 Predicting the Complete SWRC With Limited Observations
The soil water content and then SL are determined by the specific soil surface area, which is, in turn, controlled mainly by the clay fraction. Several relationships between the clay fraction and SL can be found in the literature (e.g., Arthur et al., 2013; Resurreccion et al., 2011; Schneider & Goss, 2012). Jensen et al. (2015) also used the silt fraction and organic matter content besides clay fraction to derive SL.
However, the referred relationships were usually built through simple regression with the application of limited datasets. For example, Resurreccion et al. (2011) developed an exponential relationship between 1/SL and the clay fraction based on 41 soil samples, and Schneider and Goss (2012) provided a linear relationship between these two parameters for 18 soil samples.
In this study, by collecting a total of 275 soil samples from the literature (as described in Section 3.2), we built a new relationship between SL and soil texture information, including the sand, silt, and clay percentages, and the bulk density, through an RF model, which is a powerful machine learning method.
2.2.2 Predicting the Complete HCC With Limited Observations
Under dry conditions, the hydraulic conductivity, which accounts for the adsorption forces, is determined by the specific surface area SA (L2 L−3) and the film thickness f (Bird et al., 1960). A detailed description of the method for predicting conductivity data from the SWRC was presented in a recent study by Wang et al. (2022). This is briefly introduced as follows.
Together with the measured data points at a relatively high matric potential range, they were fitted with the HCC, as described in Equation 5 or 7, to obtain the parameters.
2.3 PTF Development
2.3.1 Parameter Optimization
Model | Parameter | Lower boundary | Upper boundary |
---|---|---|---|
VGM | αVG (cm−1) | 0.001 | 0.25 |
VGM | nVG | 1.01 | 10.00 |
VGM | θr (cm3 cm−3) | 0.01 | 0.25 |
VGM | θs (cm3 cm−3) | 0.24 | 0.65 |
VGM | Ks (cm d−1) | 1.0 × 10−4 | 1.0 × 104 |
FXW | α (cm−1) | 0.001 | 0.1 |
FXW | n | 1.1 | 10.00 |
FXW | m | 0.01 | 1.5 |
FXW | θs (cm3 cm−3) | 0.24 | 0.65 |
FXW | Ks (cm d−1) | 1.0 × 10−4 | 1.0 × 104 |
FXW-M1 | α (cm−1) | 0.001 | 0.1 |
FXW-M1 | n | 1.01 | 10.00 |
FXW-M1 | m | 0.01 | 1.5 |
FXW-M1 | θs (cm3 cm−3) | 0.24 | 0.65 |
FXW-M1 | Ks (cm d−1) | 1.0 × 10−4 | 1.0 × 104 |
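As an illustration of how such bounds might be used when setting up the optimization, the following minimal sketch stores the FXW-M1 bounds from the table and clips an initial guess into the admissible range. The dictionary layout and the helper `clamp_to_bounds` are hypothetical and are not part of the original workflow; the optimizer itself is not shown.

```python
# Parameter bounds for the FXW-M1 model, copied from the table above.
# The dictionary layout and the helper name are illustrative assumptions.
FXW_M1_BOUNDS = {
    "alpha": (0.001, 0.1),     # cm^-1
    "n": (1.01, 10.0),
    "m": (0.01, 1.5),
    "theta_s": (0.24, 0.65),   # cm^3 cm^-3
    "Ks": (1.0e-4, 1.0e4),     # cm d^-1
}


def clamp_to_bounds(params, bounds):
    """Return a copy of params with each value clipped into [lower, upper]."""
    clamped = {}
    for name, value in params.items():
        lower, upper = bounds[name]
        clamped[name] = min(max(value, lower), upper)
    return clamped


# Hypothetical starting values; alpha and n lie outside the bounds on purpose.
initial_guess = {"alpha": 0.5, "n": 1.0, "m": 0.5, "theta_s": 0.40, "Ks": 10.0}
start = clamp_to_bounds(initial_guess, FXW_M1_BOUNDS)
```

Here `start["alpha"]` and `start["n"]` are moved to the nearest boundary, while the other values are left unchanged.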
2.3.2 Developing PTF by the Random Forest Model
For each soil hydraulic model, we developed three PTFs with different input information. Model H1 only considers the soil texture class, model H2 considers the sand, silt, and clay percentages, while model H3 also considers bulk density. When developing the PTFs, the parameters α, n, m, and Ks were transformed to the log10 scale.
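The input sets of models H2 and H3 and the log10 transformation of the targets can be sketched as follows. This is a minimal illustration under stated assumptions: the sample dictionary keys and the function names are hypothetical, and the numerical values are invented for the example.

```python
import math

# Parameters that are transformed to the log10 scale; theta_s is left unchanged.
LOG_PARAMS = ("alpha", "n", "m", "Ks")


def predictors(sample, model):
    """Build the predictor vector for model H2 or H3 (hypothetical key names)."""
    features = [sample["sand"], sample["silt"], sample["clay"]]
    if model == "H3":
        features.append(sample["bulk_density"])   # H3 adds bulk density
    return features


def to_targets(params):
    """Transform fitted parameters into the targets used for training."""
    return {
        ("log10_" + name if name in LOG_PARAMS else name):
        (math.log10(value) if name in LOG_PARAMS else value)
        for name, value in params.items()
    }


def from_targets(targets):
    """Back-transform predicted targets to the original parameter scale."""
    restored = {}
    for name, value in targets.items():
        if name.startswith("log10_"):
            restored[name[len("log10_"):]] = 10.0 ** value
        else:
            restored[name] = value
    return restored


# Invented example values, for illustration only.
sample = {"sand": 60.0, "silt": 25.0, "clay": 15.0, "bulk_density": 1.45}
fitted = {"alpha": 0.04, "n": 2.0, "m": 0.5, "theta_s": 0.42, "Ks": 100.0}
x_h2 = predictors(sample, "H2")
x_h3 = predictors(sample, "H3")
targets = to_targets(fitted)
restored = from_targets(targets)
```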
The RF model was adopted for developing PTFs. The RF model (Breiman, 2001) is regarded as one of the best machine learning techniques (e.g., Boulesteix et al., 2012; Cutler et al., 2007). Predictions in RF are generated as an ensemble estimate by constructing many decision trees from bootstrap samples (Hengl et al., 2018). RF is easy to adopt by using a package such as the ranger package (Wright & Ziegler, 2017) implemented in R software. The RF model has been successfully applied in predicting soil properties such as soil texture information (e.g., Hengl et al., 2017), soil saturated hydraulic conductivity (Araya & Ghezzehei, 2019) and also soil water retention and hydraulic conductivity properties (Szabó et al., 2021).
When applying the RF model, the performance on the training set was evaluated by calculating the RMSE of the out-of-bag samples. In the tuning process, the number of trees was set to 500, which gave a stable RMSE for the predictions. The two remaining parameters, the minimum leaf size and the number of randomly selected predictor variables at each node, were optimized by minimizing the RMSE of the out-of-bag samples.
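To make the out-of-bag evaluation concrete, the sketch below computes an out-of-bag RMSE for a bagged ensemble built from bootstrap samples. It is not the ranger implementation used in this study: for brevity, the base learner is a one-predictor nearest-neighbour rule rather than a decision tree, and the data are synthetic. All function names are hypothetical.

```python
import math
import random


def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in base learner: value of the closest training point (not a tree)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def oob_rmse(xs, ys, n_learners=500, seed=0):
    """Out-of-bag RMSE of a bagged ensemble built from bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    oob_sum = [0.0] * n     # sum of out-of-bag predictions per sample
    oob_count = [0] * n     # number of learners that did not see the sample
    for _ in range(n_learners):
        in_bag = [rng.randrange(n) for _ in range(n)]   # draw n with replacement
        bag_x = [xs[i] for i in in_bag]
        bag_y = [ys[i] for i in in_bag]
        in_bag_set = set(in_bag)
        for i in range(n):
            if i not in in_bag_set:                     # sample i is out of bag
                oob_sum[i] += nearest_neighbour_predict(bag_x, bag_y, xs[i])
                oob_count[i] += 1
    squared_errors = [
        (oob_sum[i] / oob_count[i] - ys[i]) ** 2
        for i in range(n)
        if oob_count[i] > 0
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic data for illustration only: clay fraction versus an invented target.
clay = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
target = [0.5 + 2.0 * c for c in clay]
score = oob_rmse(clay, target, n_learners=200)
```

In a tuning loop, a score of this kind would be computed for each candidate combination of minimum leaf size and number of predictors per node, and the combination with the smallest value would be retained.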
2.4 Extending Existing Capillary-Based PTF to Predict the Complete SHPs
With the extended method described in Section 2.2, existing PTFs that account only for capillary forces can be easily extended to the dry range, where adsorption forces dominate. In this study, we took the Rosetta3 PTF proposed by Zhang and Schaap (2017), which was developed for the VGM model that accounts for only capillary forces, as an example.
Second, the predicted SHPs from 0 to hc, together with three additional water retention and conductivity data pairs, as calculated by Equations 12 and 17, respectively, were fitted with the FXW model to derive the parameters. Figure S4 in Supporting Information S1 shows the method can generally predict the SHPs well from saturation to oven dryness.
2.5 Model Performance Statistics
The RMSE and coefficient of determination (R2) were applied for evaluating the model performance in predicting the 1/SL and then the complete SHPs (Section 2.2) and in extending existing PTFs (Section 2.4). For PTF performance evaluation (Section 2.3), the mean error (ME) was also introduced. For PTF development, the model performance was evaluated for two randomly divided groups, that is, a training set with 70% of the data and a test set with the remaining 30% of the data.
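A minimal sketch of these statistics and of the random 70/30 split is given below. Two points are assumptions, because the definitions are not spelled out here: R2 is computed as 1 − SS_res/SS_tot (rather than as a squared correlation), and ME is taken as the mean of predicted minus observed values. The function names and the example values are hypothetical.

```python
import math
import random


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mean_error(obs, pred):
    """Mean of (predicted - observed); the sign convention is an assumption."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """1 - SS_res / SS_tot; the source may use a different definition."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def split_train_test(sample_ids, train_fraction=0.7, seed=0):
    """Randomly divide sample identifiers into training and test groups."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Invented water contents (cm3 cm-3), for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
stats = (rmse(observed, predicted),
         r_squared(observed, predicted),
         mean_error(observed, predicted))
train_ids, test_ids = split_train_test(range(422))
```

With 422 sample identifiers, this split gives 295 training and 127 test samples.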
3 Data
3.1 Data Applied for Developing PTFs
The UNSODA database (Nemes et al., 2001) was applied to develop the PTFs. Only the measurements from drying experiments in the laboratory were chosen, to avoid the impact of hysteresis. For the data selection, a lower boundary of 1.0 g cm−3 was set for the bulk density and a minimum number of five was set for both SWRC and HCC measurements. In addition, we excluded the data with only measurements higher than the potential of −300 cm, which is a value close to the so-called field capacity. The reason, as shown in Section 4.1, is that when applying the extended method presented in Section 2.2, the measurements need to reach the potential of −300 cm to achieve a good description of the SHPs over the entire moisture range. Considering all the limitations, a total of 422 soil samples were selected, with a total of 4,887 retention points. Among the samples, 215 soil samples also included HCC measurements, with a total of 3,966 points. The soil texture distribution of the selected soil samples, including 70% training data and 30% test data, is shown in Figure 1.
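The selection rules can be expressed as a simple filter, sketched below. The sample structure (a dictionary with `bulk_density`, `swrc_h`, and an optional `hcc_h` list of matric potentials in cm) is hypothetical and does not reflect the actual UNSODA schema. Applying the −300 cm requirement to the HCC data as well as to the SWRC data is an assumption.

```python
def reaches(potentials, limit=-300.0):
    """True if at least one measurement is at or below `limit` (cm)."""
    return any(h <= limit for h in potentials)


def select(sample, min_bulk_density=1.0, min_points=5):
    """Return (use_for_swrc, use_for_hcc) for one sample.

    `sample` is a hypothetical dict with 'bulk_density' in g cm^-3 and lists
    of matric potentials in cm under 'swrc_h' and, optionally, 'hcc_h'.
    Applying the -300 cm rule to the HCC data is an assumption.
    """
    swrc_h = sample["swrc_h"]
    use_swrc = (
        sample["bulk_density"] >= min_bulk_density
        and len(swrc_h) >= min_points
        and reaches(swrc_h)
    )
    hcc_h = sample.get("hcc_h", [])
    use_hcc = use_swrc and len(hcc_h) >= min_points and reaches(hcc_h)
    return use_swrc, use_hcc


# Invented samples, for illustration only.
samples = [
    {"bulk_density": 1.45, "swrc_h": [-1, -10, -50, -100, -330, -1000],
     "hcc_h": [-5, -20, -80, -200, -500]},
    {"bulk_density": 1.30, "swrc_h": [-1, -10, -50, -100, -200]},     # never reaches -300 cm
    {"bulk_density": 0.85, "swrc_h": [-1, -10, -100, -330, -15000]},  # bulk density too low
]
decisions = [select(s) for s in samples]
```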
Besides the selected original UNSODA data, we also built an extended dataset for developing the PTFs. That is, for each soil sample, three additional data pairs estimated by Equations 12 and 17 were included besides the SWRC and HCC measurements, respectively. Because the observations were generally sufficient for describing the SHPs over the entire moisture range when the measurements reached the potential of about −1.0 × 104 cm (Lu et al., 2014), only the datasets with measurements above this potential were extended.
3.2 Data Applied for Predicting SL
To build the relationship between SL in Equation 12 and the soil texture information, a total of 275 soil samples were applied. Among them, 62 samples come from Resurreccion et al. (2011) and Jensen et al. (2015), where the water retention data pairs were measured directly in the dry range and SL was derived by fitting Equation 9 to the measurements. The remaining 213 soil samples, with measured retention data reaching the potential of −1.0 × 104 cm, were selected from the UNSODA database. Because no direct measurements of retention data are available in the very dry range (i.e., at potentials below −1.0 × 105 cm), the SL of the selected 213 soil samples was derived through inverse modeling. This was achieved by fitting the measured retention data with the FXW model (Equation 3) to obtain the parameters, and then predicting the water content values at matric potentials of −1.0 × 105, −5.0 × 105, and −1.0 × 106 cm with the determined SWRC. The SL was then derived by fitting Equation 12 with these predicted water retention points. A flowchart for deriving the SL is provided in Figure S1 in Supporting Information S1. It should be noted that, in the dry range, a semi-log relationship always holds between the matric potential and the water content in the case of the FXW model. The reported R2 is higher than 0.99 for all 213 soil samples. The reason is that for the FXW model, the water content in the dry range is determined mainly by the correction factor in front of Γ(h) in Equation 3 and is thus linear in the logarithm of the matric potential.
In developing the PTF for predicting SL, we did not divide the dataset into training and test groups. The main reason is that only a small dataset is available. In particular, it includes only 62 samples with directly measured, high-quality data.
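The semi-log behaviour in the dry range can be illustrated with an ordinary least-squares line through three retention points in (log10|h|, θ) space. This is only a sketch: the water contents are invented, the function name is hypothetical, and because Equation 12 is not reproduced here, the mapping from the fitted slope to SL is left open rather than assumed.

```python
import math


def fit_semilog_line(abs_h, theta):
    """Least-squares fit of theta = intercept + slope * log10(|h|).

    Returns (slope, intercept, r2). How the slope relates to SL depends on
    the form of Equation 12, which is not assumed here.
    """
    x = [math.log10(h) for h in abs_h]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(theta) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, theta))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, theta))
    ss_tot = sum((yi - mean_y) ** 2 for yi in theta)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Absolute matric potentials (cm) used in the text; water contents are invented.
abs_h = [1.0e5, 5.0e5, 1.0e6]
theta = [0.060, 0.039, 0.030]
slope, intercept, r2 = fit_semilog_line(abs_h, theta)
```

For these illustrative points the fitted line is nearly exact, which mirrors the high R2 values reported for the FXW-derived points.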
4 Results
4.1 Prediction of the Complete SHPs With Limited Observations
4.1.1 Determination of SL From the Soil Texture Information
Figure 2 presents the predicted SL obtained with the RF method, with the input of sand, silt, and clay percentages, as well as the bulk density. A total of 275 soil samples, with a variety of soil textures, were used for training the RF model. The SL in Equation 12 can be well predicted with the soil texture information by applying the RF method, particularly for 1/SL values of less than 0.04. A higher 1/SL generally means a higher clay fraction in the soil sample. For more fine-textured soils (with much higher 1/SL values), the method underestimates the 1/SL value for some of the UNSODA samples. It should be noted that, for the UNSODA samples, the 1/SL values were derived from inverse modeling rather than direct measurement. As a result, they carry higher uncertainty than the data from Resurreccion et al. (2011) and Jensen et al. (2015). The overall performance of the 1/SL estimation is very good, with an RMSE of 0.0086 and an R2 of 0.93.
4.1.2 Prediction of a Complete SWRC With Limited Observations
With the SL predicted from the soil texture information, three water retention points at matric potentials of −1.0 × 105, −5.0 × 105, and −1.0 × 106 cm could be estimated with Equation 12. These three points, together with the measurements at a relatively high matric potential range, that is, measurements with potentials higher than −100, −300, and −1,000 cm, respectively, were then fitted with the FXW model. For the evaluation, 213 soil samples with SWRC measurements at potentials lower than −1.0 × 104 cm were selected. As shown in Figure 3a, the FXW model yields close agreement with observations when fitted to all the measured data points, with a reported R2 of 0.99 and RMSE of 0.013 cm3 cm−3. When the measurements only reach the potential range of about −100 cm, the FXW model, with the inclusion of three additional data points in the dry range, generally shows good agreement with the observations in the non-fitted range, whereas there is underestimation for some soil samples for water contents higher than 0.2 cm3 cm−3, where capillary forces dominate (Figure 3b). The same underestimation occurs for the results fitted with measurements reaching the potential range of −300 cm (Figure 3c), a value close to the potential of the so-called field capacity, although the RMSE is reduced from 0.026 cm3 cm−3 (with the lower boundary of −100 cm) to 0.021 cm3 cm−3. The underestimation happens mainly for fine-textured soils (Figure 4). The reason is that, for fine-textured soils, the capillary forces dominate over a much broader potential range than the lower limit of −100 or −300 cm. When the measurements reach the potential range of about −1.0 × 103 cm, which is close to the limit of the commonly applied tensiometers, the observations in the non-fitted range are captured well by the proposed model (Figure 3d). The reported RMSE is 0.017 cm3 cm−3, which is very close to the 0.013 cm3 cm−3 of the results fitted with all the measurements (Figure 3a).
In contrast, the fitted results obtained without the extended retention points show notable overestimation of the water content in the non-fitted range, with roughly double the RMSE compared to those results obtained with the extended method (Figure S2 in Supporting Information S1).
The difference between the predictions with and without the additional data points can be seen more directly in Figure 4. The model predictions without the additional data in the dry range yield obvious overestimation of water content in the non-fitted range even when the observation reaches the lower limit of −1.0 × 103 cm (Figure 4c). When including the three additional data points, the model improves the prediction of water content. However, the model slightly underestimates the water content in the middle saturation range when the measurements reach the lower limit of −100 and −300 cm (Figures 4a and 4b).
In summary, the results suggest that (a) the SWRC in the dry range can be well predicted with the method developed in Section 2.2.1, and (b) to predict the SWRC from saturation to oven dryness with limited measurements for all types of soil, the measurements should cover the lower limit of about −1.0 × 103 cm.
4.1.3 Prediction of a Complete HCC With Limited Observations
Figure 5 indicates that the extended method, as presented in Section 2.2.2, performs well in predicting the hydraulic conductivity under dry conditions. For observations with a lower potential limit of −100 and −300 cm, the RMSElog10(K) is 0.67 and 0.50 log10 (cm d−1), respectively. When the observations reach a potential range of −1.0 × 103 cm, the extended method yields a much better performance (Figure 5d), with the RMSElog10(K) being 0.39 log10 (cm d−1), which is much closer to the 0.31 log10 (cm d−1) of the fitted curve obtained with all the observations. In contrast, the predictions obtained without the extended data yield a much poorer performance under dry conditions, especially for the curves with observations higher than −100 cm, with a lower R2 of 0.85 and a much higher RMSElog10(K) of 0.80 log10 (cm d−1) (Figure S3 in Supporting Information S1).
Figure 4 suggests that, in the dry range, the conductivity estimated with the three additional data points is much smaller than that estimated from only the measurements at a relatively high potential range.
4.2 Development of PTFs With the Original and Extended Data
4.2.1 Model H1
Following the standard of the United States Department of Agriculture, 12 soil texture classes were categorized based on the percentages of the sand, silt, and clay content. For each soil texture class, we calculated the mean value and the standard deviation (SD) of the five parameters derived for each soil sample. Here, we only provide the results of the FXW-M1 model fitted with the extended data (Table 2).
Soil texture | Numᵃ | α (1/cm), Mean | α (1/cm), SD | n, Mean | n, SD | m, Mean | m, SD | θs (cm3 cm−3), Mean | θs (cm3 cm−3), SD | log10Ks (cm d−1)ᵇ, Mean | log10Ks (cm d−1)ᵇ, SD |
---|---|---|---|---|---|---|---|---|---|---|---|
Clay | 16 (12) | 0.10 | 0.09 | 2.05 | 1.49 | 0.18 | 0.09 | 0.53 | 0.07 | 1.07 | 0.62 |
Clay loam | 15 (4) | 0.08 | 0.10 | 1.68 | 1.01 | 0.26 | 0.13 | 0.45 | 0.06 | 0.93 | 0.35 |
Loam | 50 (33) | 0.08 | 0.09 | 1.73 | 1.63 | 0.37 | 0.15 | 0.47 | 0.09 | 1.25 | 0.79 |
Loamy sand | 48 (16) | 0.04 | 0.02 | 3.32 | 2.12 | 0.72 | 0.32 | 0.39 | 0.07 | 1.74 | 0.57 |
Sand | 99 (48) | 0.04 | 0.03 | 4.62 | 2.79 | 0.84 | 0.30 | 0.36 | 0.05 | 2.13 | 0.76 |
Sandy clay | 3 (0) | 0.00 | 0.00 | 2.20 | 1.99 | 0.26 | 0.16 | 0.41 | 0.04 | NaN | NaN |
Sandy clay loam | 29 (5) | 0.05 | 0.07 | 2.08 | 2.21 | 0.34 | 0.16 | 0.39 | 0.05 | 0.58 | 0.28 |
Sandy loam | 63 (24) | 0.03 | 0.04 | 2.20 | 1.97 | 0.58 | 0.22 | 0.37 | 0.07 | 0.92 | 0.56 |
Silt | 3 (3) | 0.01 | 0.00 | 1.86 | 1.19 | 0.74 | 0.27 | 0.42 | 0.03 | 0.97 | 0.87 |
Silty loam | 86 (58) | 0.04 | 0.07 | 1.41 | 1.02 | 0.56 | 0.26 | 0.44 | 0.05 | 0.73 | 0.59 |
Silty clay | 8 (8) | 0.08 | 0.07 | 2.50 | 1.37 | 0.12 | 0.06 | 0.48 | 0.10 | 0.60 | 0.45 |
Silty clay loam | 9 (4) | 0.08 | 0.09 | 2.74 | 3.02 | 0.29 | 0.16 | 0.49 | 0.06 | 1.03 | 1.06 |
- ᵃ The number in brackets is for conductivity.
- ᵇ NaN means no dataset.
It should be noted that, because the particle size distribution of some samples lies on the boundary between two classes, these samples were counted in both classes, so the total number of samples exceeds 422. As shown, some soil classes include only a few soil samples, so the mean values of these classes should be used with caution.
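The class-wise statistics, including the double counting of boundary samples, can be sketched as follows. The sample structure and function name are hypothetical, the values are invented, and the use of the sample (n − 1) standard deviation is an assumption, since the estimator is not stated.

```python
import statistics
from collections import defaultdict


def class_statistics(samples, parameter):
    """Return {texture class: (count, mean, sd)} for one fitted parameter.

    A sample lying on a class boundary lists two classes and contributes to
    both. The sample (n - 1) standard deviation is an assumption.
    """
    grouped = defaultdict(list)
    for sample in samples:
        for texture_class in sample["classes"]:
            grouped[texture_class].append(sample[parameter])
    summary = {}
    for texture_class, values in grouped.items():
        sd = statistics.stdev(values) if len(values) > 1 else float("nan")
        summary[texture_class] = (len(values), statistics.mean(values), sd)
    return summary


# Invented samples; the second one lies on the sand / loamy sand boundary.
samples = [
    {"classes": ["sand"], "theta_s": 0.36},
    {"classes": ["sand", "loamy sand"], "theta_s": 0.38},
    {"classes": ["loamy sand"], "theta_s": 0.40},
]
summary = class_statistics(samples, "theta_s")
total_count = sum(count for count, _, _ in summary.values())
```

Here three samples yield a total count of four across the two classes, which is the effect described above.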
4.2.2 Models H2 and H3
For the original FXW model and the modified FXW-M1 model, the training and test performance of the parameters, represented by the R2, RMSE, and ME, are presented in Tables 3 and 4 for models H2 and H3, respectively. Because the FXW model and the FXW-M1 model have almost the same performance in predicting the four parameters describing the SWRC, we only show the results of the FXW-M1 model for these parameters. For the parameter log10(Ks), the results of both models are presented. Notably, for the training case, we apply the out-of-bag samples to calculate the statistical values.
Model H2 parameter | Case | R2, EXTᵃ | R2, Original | RMSE, EXT | RMSE, Original | ME, EXT | ME, Original |
---|---|---|---|---|---|---|---|
log10(α) | Train | 0.30 | 0.23 | 0.34 | 0.34 | 0.0010 | 0.0006 |
log10(α) | Test | 0.31 | 0.25 | 0.32 | 0.34 | 0.0389 | 0.0355 |
log10(n) | Train | 0.34 | 0.37 | 0.24 | 0.24 | −0.0004 | −0.0001 |
log10(n) | Test | 0.24 | 0.27 | 0.24 | 0.23 | −0.0286 | −0.0275 |
log10(m) | Train | 0.59 | 0.62 | 0.20 | 0.19 | −0.0005 | −0.0006 |
log10(m) | Test | 0.64 | 0.59 | 0.17 | 0.19 | 0.0034 | −0.0001 |
θs | Train | 0.27 | 0.30 | 0.07 | 0.06 | 0.0004 | 0.0001 |
θs | Test | 0.29 | 0.31 | 0.07 | 0.07 | −0.0046 | −0.0031 |
log10(Ks) (FXW model) | Train | 0.30 | 0.15 | 0.71 | 0.83 | −0.0022 | −0.0045 |
log10(Ks) (FXW model) | Test | 0.35 | 0.12 | 0.55 | 0.72 | −0.0104 | 0.0143 |
log10(Ks) (FXW-M1 model) | Train | 0.44 | 0.33 | 0.70 | 0.79 | 0.0038 | −0.0102 |
log10(Ks) (FXW-M1 model) | Test | 0.62 | 0.47 | 0.48 | 0.56 | −0.0001 | −0.0363 |
- Note. For the parameters log10(α), log10(n), log10(m), and θs, the training and the test case have 295 and 127 samples, respectively. When it comes to log10(Ks), the training and the test case have 151 and 64 samples, respectively.
- ᵃ EXT means that for each sample, three additional soil water retention and soil hydraulic conductivity data pairs besides the measurements were included in deriving the five parameters.
Model H3 parameter | Case | R2, EXT | R2, Original | RMSE, EXT | RMSE, Original | ME, EXT | ME, Original |
---|---|---|---|---|---|---|---|
log10(α) | Train | 0.35 | 0.28 | 0.33 | 0.33 | −0.0025 | 0.0005 |
log10(α) | Test | 0.38 | 0.29 | 0.30 | 0.33 | 0.0218 | 0.0202 |
log10(n) | Train | 0.35 | 0.38 | 0.24 | 0.24 | −0.0010 | 0.0008 |
log10(n) | Test | 0.30 | 0.31 | 0.23 | 0.22 | −0.0302 | −0.0284 |
log10(m) | Train | 0.61 | 0.63 | 0.19 | 0.19 | −0.0021 | 0.0007 |
log10(m) | Test | 0.63 | 0.61 | 0.17 | 0.19 | −0.0023 | −0.0015 |
θs | Train | 0.51 | 0.58 | 0.05 | 0.05 | 0.0004 | 0.0003 |
θs | Test | 0.69 | 0.70 | 0.04 | 0.04 | −0.0056 | −0.0046 |
log10(Ks) (FXW model) | Train | 0.30 | 0.15 | 0.71 | 0.83 | 0.0032 | −0.0110 |
log10(Ks) (FXW model) | Test | 0.42 | 0.18 | 0.52 | 0.69 | −0.018 | 0.0123 |
log10(Ks) (FXW-M1 model) | Train | 0.46 | 0.35 | 0.69 | 0.78 | 0.0064 | −0.0049 |
log10(Ks) (FXW-M1 model) | Test | 0.67 | 0.54 | 0.45 | 0.52 | −0.0085 | −0.0483 |
For model H2, Table 3 shows that the PTFs developed with the original data perform very similarly in predicting log10(α), log10(n), log10(m), and θs for the training and the test cases. For the parameter log10(Ks), however, the test case of the FXW model yields a reduced RMSE of 0.72 log10 (cm d−1) compared to the 0.83 log10 (cm d−1) of the training case. While the FXW-M1 model also shows an obvious difference between the training and the test cases, it considerably improves the prediction of log10(Ks) compared to the FXW model. Taking the test case as an example, the R2 value increases from 0.12 (the FXW model) to 0.47 (the FXW-M1 model) and the RMSE reduces from 0.72 (the FXW model) to 0.56 log10 (cm d−1) (the FXW-M1 model).
In contrast, the PTFs developed with the extended data show a similar performance in predicting the four parameters of the SWRC for both the training and test cases. However, for parameter log10(Ks), the PTFs developed with the extended data yield an obvious improvement compared to the PTFs developed with the original data, for both the FXW and FXW-M1 models. Taking the test case of the FXW model as an example, the R2 increases from 0.12 (original) to 0.35 (extended) and the RMSE reduces from 0.72 to 0.55 log10 (cm d−1).
The comparison between models H2 and H3 shows that model H3 with the additional input of the bulk density achieves an obvious improvement in predicting the parameter θs, with R2 increasing from 0.29 (H2) to 0.69 (H3) and the RMSE reducing from 0.07 (H2) to 0.04 (H3) cm3 cm−3 for the test case developed with the extended data. Model H3 also slightly improves the prediction of log10(Ks) for the test case. When it comes to the parameters log10(α), log10(n), and log10(m), both models H2 and H3 show similar performance.
The notable difference between the training and the test cases in predicting log10(Ks) might be attributed to the limited size of the training data. In this study, only 215 soil samples have measurements of HCC. Among them, we apply 151 fitted Ks data for the training case while keeping the remaining 64 data as the test case. This limited data size might not be sufficient for training an unbiased model. However, the obvious improvement in predicting log10(Ks) for both the training and test cases confirms that the PTFs developed with the FXW-M1 model and with the extended data have a superior performance compared to the PTFs that are developed with the original FXW model and with the original data.
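As a minimal sketch of how a 151/64 partition of the 215 samples with HCC measurements could be drawn reproducibly, the following uses a seeded shuffle. The function name `split_indices` and the seed are hypothetical, and the selection procedure actually used in this study may differ.

```python
import random


def split_indices(n_samples, n_train, seed=0):
    """Return (train, test) index lists drawn from one shuffled permutation."""
    if not 0 <= n_train <= n_samples:
        raise ValueError("n_train must lie between 0 and n_samples")
    indices = list(range(n_samples))
    # A local generator keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 215 samples with HCC measurements: 151 for training, 64 for testing.
train_idx, test_idx = split_indices(215, 151)
print(len(train_idx), len(test_idx))
```

With so few test samples, repeating the split with several seeds is one way to check how much the reported log10(Ks) statistics depend on the particular partition.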
4.3 Performance of the PTFs
4.3.1 Model H1
The prediction of SHPs with the model H1 PTF, developed for the FXW-M1 model with the extended data, is shown in Figure 6. The reported RMSE values are 0.073 cm3 cm−3 and 0.87 log10 (cm d−1), and the R2 values are 0.76 and 0.81 for the predictions of SWRC and HCC, respectively. The RMSE values are smaller than the 0.078 cm3 cm−3 and 1.06 log10 (cm d−1) reported by Schaap et al. (2001), where the VGM model and a much larger dataset were applied in developing the PTFs with the input of soil texture classes. By applying 215 soil samples from the same UNSODA data, Rudiyanto et al. (2021) reported similar RMSE values of 0.072 cm3 cm−3 and 0.85 log10 (cm d−1), and much higher values of 0.103 cm3 cm−3 and 1.38 log10 (cm d−1), with the PTFs developed for the FXW and the VGM model, respectively.
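For readers who wish to reproduce statistics of this kind, the sketch below computes the RMSE for water content and the RMSE of log10-transformed conductivity. It assumes that R2 is the coefficient of determination (1 − SSE/SST); the definition used in this paper may differ. All function names and sample values are illustrative and are not taken from the dataset.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SSE/SST (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse_log10(observed_k, predicted_k):
    """RMSE of log10(K); conductivities must be positive (cm d-1)."""
    return rmse([math.log10(k) for k in observed_k],
                [math.log10(k) for k in predicted_k])


# Made-up values for illustration only.
theta_obs = [0.40, 0.30, 0.20, 0.10]
theta_pred = [0.38, 0.33, 0.18, 0.12]
k_obs = [10.0, 1.0, 0.01, 0.0001]
k_pred = [5.0, 2.0, 0.02, 0.00005]

print(rmse(theta_obs, theta_pred), r_squared(theta_obs, theta_pred))
print(rmse_log10(k_obs, k_pred))
```

Evaluating conductivity in log10 space weights an error of a factor of two equally at 10 cm d−1 and at 1.0 × 10−4 cm d−1, which is why the HCC statistics are reported in log10 (cm d−1).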
4.3.2 Prediction of SHPs With the PTFs Developed From the Original UNSODA Data
Figures 7 and 8 present the prediction of SHPs with the different PTFs developed with the original UNSODA data. In the main text, only model H3 with the input of the sand, silt, and clay percentages, as well as the bulk density, is described, while the performance of model H2 is provided in the supplementary material (Figure S5 in Supporting Information S1).
For the SWRC prediction, the VGM model overestimates the water content in the dry moisture range (<0.1 cm3 cm−3), while showing underestimation in the wet moisture range (>0.4 cm3 cm−3). The overall R2 is 0.88 and the RMSE is 0.047 cm3 cm−3 for the training case. The FXW model, in contrast, considerably improves the prediction of water content in both the dry and wet moisture range. The reported R2 is 0.89 and the RMSE is 0.044 cm3 cm−3 when evaluating with the data in the training case. For the test case, the improvement is even more notable. The R2 increases from 0.83 to 0.87 and the RMSE reduces from 0.055 to 0.046 cm3 cm−3. Compared to the FXW model, the FXW-M1 model shows a very similar performance for both the training and test cases.
When it comes to the prediction of the HCC, the PTF developed for the VGM model shows the worst performance for the training case, with the R2 being 0.75 and the RMSElog10(K) being 1.00 log10 (cm d−1). Considerable underestimations are found for conductivity observations less than 0.01 cm d−1 (Figure 8). When applying the FXW model that accounts for capillary and adsorption forces as well as vapor diffusion, the developed PTF improves the prediction in dry conditions, and yields a higher R2 of 0.78 and a lower RMSElog10(K) of 0.79 log10 (cm d−1). For the test case, however, the PTF developed with the FXW model overestimates the conductivity for most data pairs, with the RMSElog10(K) being 0.87 log10 (cm d−1), which is even higher than that of the VGM model (0.84 log10 (cm d−1)).
Compared to the FXW model, the FXW-M1 model provided in Wang et al. (2022) overcomes the shortcoming of the abrupt drop near saturation for soils with n values close to 1. With this modified FXW-M1 model, the developed PTFs considerably improve the predictions for both the training and test cases. Taking the test case as an example, the FXW-M1 model reports the highest R2 of 0.81, and the lowest RMSElog10(K) of 0.69 log10 (cm d−1).
Figure S5 in Supporting Information S1 also provides the predictions of conductivity obtained with the PTFs developed with the input of the sand, silt, and clay percentages (model H2). Compared to model H3, the prediction of the water content and hydraulic conductivity with model H2 shows a slightly worse performance for all three soil hydraulic models. Taking the PTFs developed with the FXW-M1 model as an example, the R2 decreases from 0.87 (model H3) to 0.83 (model H2) and the RMSE increases from 0.046 (model H3) to 0.052 cm3 cm−3 (model H2) when predicting the water content for the test case. A close examination of the SWRC prediction shows that the difference occurs mainly within the range of water content from about 0.4 to 0.6 cm3 cm−3, that is, close to the saturated condition, where model H2 shows obvious underestimation. For the HCC, model H2 also obtains a lower R2 of 0.80 and a higher RMSElog10(K) of 0.70 log10 (cm d−1), compared to the 0.81 and 0.69 log10 (cm d−1) of model H3 for the test case.
4.3.3 Prediction of SHPs With the PTFs Developed With the Extended UNSODA Data
The extended UNSODA data include three additional θ-h and K-h data pairs in the dry range in addition to the original UNSODA data. For prediction of the SWRC, the PTFs developed with the extended data yield almost the same performance as those developed with the original data for both the FXW and the FXW-M1 models (Figures 7 and 9).
When it comes to the prediction of the HCC, the developed PTFs with the extended UNSODA data show a notable improvement compared to the PTFs developed with the original data (Figures 8 and 9).
For the FXW model, the PTF developed with the extended data for model H3 reduces the RMSElog10(K) from 0.79 for the original data to 0.72 log10 (cm d−1) when evaluating with the training data. When it comes to the test case, the reduction is even more notable, from 0.87 to 0.75 log10 (cm d−1).
Although the PTFs developed with the original and the extended data show a similar performance for the FXW-M1 model in the training case, the PTF developed with the extended data yields an improved prediction of the HCC in the test case. Compared to the PTF developed with the original data, the PTF developed with the extended data reduces the RMSElog10(K) from 0.69 to 0.66 log10 (cm d−1).
The PTFs developed with model H2 yield an improvement similar to that of the PTFs developed with model H3, as shown in Figure S6 in Supporting Information S1.
4.4 Extending the Capillary-Based PTFs to Dry Conditions
With the extended SWRC and HCC in dry conditions, the existing PTFs developed with the capillary-based soil hydraulic models can be easily extended to the entire moisture range. We apply the Rosetta3 PTF proposed by Zhang and Schaap (2017) as an example.
The Rosetta3 PTF was developed for the capillary-based VGM model. The VGM model fails to describe the SHPs under dry conditions because it does not consider the effect of adsorption forces (Wang et al., 2018). Figure 10 shows that the original Rosetta3 PTF overestimates the water content and underestimates the hydraulic conductivity in dry conditions. Meanwhile, the Rosetta3 PTF overestimates the conductivity for most of the data when the observed conductivity is higher than 1.0 × 10−3 cm d−1. The overall R2 is 0.81 and 0.60, and the RMSE is 0.058 cm3 cm−3 and 1.428 log10 (cm d−1) for θ and log10(K) predictions, respectively.
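The overestimation of water content in dry conditions follows directly from the form of a capillary-based retention function. The sketch below evaluates the standard van Genuchten retention equation with m = 1 − 1/n. The parameter values are illustrative loam-like values and are not Rosetta3 output, and the suction of 6.3 × 10^6 cm taken to represent oven dryness is an assumption.

```python
def vgm_water_content(suction_cm, theta_r, theta_s, alpha, n):
    """van Genuchten retention curve with the Mualem constraint m = 1 - 1/n.

    suction_cm is the absolute value of the matric potential head (cm).
    """
    m = 1.0 - 1.0 / n
    effective_saturation = (1.0 + (alpha * abs(suction_cm)) ** n) ** (-m)
    return theta_r + (theta_s - theta_r) * effective_saturation


# Illustrative parameters only (hypothetical, not fitted to UNSODA data).
params = dict(theta_r=0.078, theta_s=0.43, alpha=0.036, n=1.56)

# Assumed suction at oven dryness (cm).
OVEN_DRY_SUCTION_CM = 6.3e6

for suction in (0.0, 1.0e2, 1.0e4, OVEN_DRY_SUCTION_CM):
    print(suction, round(vgm_water_content(suction, **params), 4))
```

The curve levels off at the residual water content θr rather than reaching zero at oven dryness, which is the structural limitation that the extended method is intended to compensate for.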
By applying the extended method, as described in Section 2.5, the extended Rosetta3 PTF improves the prediction of soil moisture for water content less than about 0.10 cm3 cm−3, with the RMSE reducing from 0.058 to 0.056 cm3 cm−3. The improvement is even more notable for conductivity, with the R2 increasing from 0.60 to 0.66 and the RMSElog10(K) decreasing from 1.43 to 1.20 log10 (cm d−1).
5 Discussion
5.1 Prediction of Complete SHPs With Limited Observations
Measurements of SHPs in dry conditions are difficult and time-consuming. In this paper, by applying a powerful machine learning method (the RF method), we built a relationship between SL and soil texture information based on 275 soil samples. With the estimated SL, we can then predict the SWRC in dry conditions. Furthermore, we developed a physically based method for predicting the HCC under dry conditions from the estimated SWRC. Combining the SHPs predicted for the dry range with the SHPs measured in the capillary-dominant zone yields an excellent prediction of the complete SHPs.
In practice, this method provides a simple and accurate way of deriving complete SHPs from measurements in the wet moisture range only, for instance, for matric potentials above −1.0 × 103 cm. This matric potential range is within the measurement limit of the widely applied tensiometers.
In the literature on SHP measurements, several different devices have often been required to derive a complete SWRC (e.g., Schelle et al., 2013): for example, tensiometers for the high matric potential range (0 to −100 kPa), pressure plate apparatus for dry conditions (0 to several MPa), and chilled-mirror dew point devices (e.g., WP4-T) for very dry conditions (several to hundreds of MPa). Meanwhile, a long period is required to reach equilibrium under very dry conditions (Schelle et al., 2013; Wang et al., 2013). Therefore, it is labor-intensive and time-consuming to measure a complete SWRC directly. For the HCC, measurements in very dry conditions are even more difficult because of the extremely slow water movement rate, and have rarely been presented in the literature.
Accordingly, the method described in this paper provides a simple and accurate way of deriving SHPs from saturation to oven dryness with measurements only from a relatively wet moisture range.
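Because the device ranges are quoted in pressure units while the matric potential in this paper is expressed as a head in cm, a small conversion helps relate the two. The sketch below assumes a water density of 1,000 kg m−3 and standard gravity; the function name is hypothetical.

```python
WATER_DENSITY = 1000.0  # kg m-3, assumed constant
GRAVITY = 9.80665       # m s-2, standard gravity


def kpa_to_cm_head(pressure_kpa):
    """Convert a matric potential in kPa to an equivalent water head in cm."""
    pressure_pa = pressure_kpa * 1000.0
    head_m = pressure_pa / (WATER_DENSITY * GRAVITY)
    return head_m * 100.0


# Lower limit of the tensiometer range quoted in the text.
print(kpa_to_cm_head(-100.0))
# One MPa, as an example within the pressure plate range.
print(kpa_to_cm_head(-1000.0))
```

Under these assumptions, −100 kPa corresponds to a head of about −1.0 × 10^3 cm, which matches the tensiometer limit referred to in this section.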
5.2 Limitations of the Model Structure and SHP Observations in PTF Development
Most PTFs provided in the literature show a relatively poor performance under dry conditions. Taking the Rosetta3 PTF developed with the most commonly applied VGM model as an example, it generally overestimates the moisture and clearly underestimates the conductivity under dry conditions (Figure 10, and also reported by Zhang & Schaap, 2017 and Rudiyanto et al., 2021), due to the limitation of the model structure. That is, this kind of model only accounts for capillary forces while neglecting the impact of adsorption forces that are dominant under dry conditions (e.g., Tuller & Or, 2001; Tokunaga, 2009). When evaluating with the selected 212 soil samples, the VGM-based Rosetta3 PTF also yields significant overestimation of conductivity for most data in the medium to high potential range (Figure 10). In Rosetta3, the measured Ks, rather than the fitted value applied in this study, was used in developing the PTF (Zhang & Schaap, 2017). As argued by Schaap and Leij (2000) and Schaap et al. (2001), among many others, applying the observed Ks as the matching point can lead to overprediction of conductivity at most matric potentials. The reason for this is that Ks is sensitive to macropore flow, while unsaturated flow occurs in the soil matrix (van Genuchten & Nielsen, 1985).
When applying the FXW model, which considers both capillary and adsorption forces, and treating Ks as a free-fitted parameter, the developed PTFs improve the prediction of soil moisture and conductivity under dry conditions. However, the PTFs developed with the original data overestimate the conductivity in the test case (Figure 8). This overestimation can be attributed to the abrupt drop near saturation of the HCC of the FXW model for soils with small n values (Wang et al., 2018, 2022; de Rooij et al., 2021). This shortcoming is a result of the non-zero dθ/dh at the matric potential of zero (de Rooij et al., 2021; Schaap & van Genuchten, 2006; van Genuchten & Nielsen, 1985). For soils with a small n value, a much higher Ks is therefore expected when fitting the observations. Accordingly, if the applied data include a relatively high proportion of soils with small n values, the developed PTF tends to overestimate Ks and then overestimate the unsaturated conductivity for soils with high n values.
In contrast, the PTFs developed with the FXW-M1 model, which removes the unrealistic drop by forcing the water content to be saturated above a non-zero matric potential (Wang et al., 2022), considerably improve the prediction of conductivity (Figures 8 and 9). We also included the impact of vapor diffusion, which only slightly improves the model performance, mainly due to the limited observations of conductivity in extremely dry conditions. The notably improved performance obtained when considering the impact of adsorption forces and dealing with the unrealistic decrease near saturation indicates that the limitations coming from the model structure have to be considered in PTF development, in addition to applying broader datasets and more model inputs, as well as more powerful machine learning or deep learning methods.
Furthermore, most measurements of SHPs only cover a relatively high matric potential range, which is especially true for the HCC. This represents a great limitation but is often not considered or dealt with when developing PTFs that aim to predict SHPs over the entire moisture range (e.g., Rudiyanto et al., 2021). Here, we showed that, by applying additional data in the dry range estimated from soil texture information, the derived PTFs show a notable improvement in HCC prediction compared to the PTFs developed with the original data. This suggests that PTFs developed with limited observations can result in obvious bias in the SHP prediction. For the SWRC, the improvement seen with the extended data is less obvious. This might be due to the difference in training datasets. That is, for the SWRC, about half of the selected data have observations that cover the potential range of about −1.0 × 104 cm and might be sufficient for PTF training. When it comes to the HCC, only 29 of the 215 soil samples have measurements for potentials lower than −1.0 × 104 cm. Accordingly, the trained PTF is biased when predicting the HCC in dry conditions. Moreover, one should keep in mind that the lack of measurements in dry conditions also hinders the complete evaluation of the proposed PTFs. With more observations covering dry conditions, the more physically based PTFs can be expected to show better performance.
For PTF development with different inputs, the PTF developed with the input of bulk density outperforms that with only the input of soil texture percentages, which is consistent with the findings in other studies of PTF development (e.g., Wösten et al., 1999; Schaap et al., 2001). The main improvements are for prediction in the high water content range, where model H2 shows obvious underestimation. As confirmed in Tables 3 and 4, model H3 shows an obvious improvement in predicting parameter θs when compared to model H2. This can be attributed to the impact of organic matter and/or the model structure, which can be reflected (partially) by the bulk density and/or porosity (e.g., Minasny & McBratney, 2018; Rawls et al., 2003).
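The link between bulk density and θs can be illustrated with the standard relation between total porosity and bulk density. The sketch below assumes a particle density of 2.65 g cm−3, a typical value for mineral soils; it is not part of the PTFs developed in this paper, and the function name is hypothetical.

```python
def porosity_from_bulk_density(bulk_density, particle_density=2.65):
    """Total porosity (cm3 cm-3) from bulk and particle density (g cm-3).

    The default particle density is an assumed value for mineral soils.
    """
    if not 0.0 < bulk_density < particle_density:
        raise ValueError("bulk density must lie between 0 and the particle density")
    return 1.0 - bulk_density / particle_density


# Two soils with identical texture but different compaction.
for bulk_density in (1.3, 1.6):
    print(bulk_density, round(porosity_from_bulk_density(bulk_density), 3))
```

Two samples with the same sand, silt, and clay percentages can therefore differ by about 0.1 cm3 cm−3 in porosity, information that is available to model H3 but not to model H2.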
Furthermore, by applying the extended method, we also showed that existing PTFs developed with the capillary-based soil hydraulic models can be easily extended to the dry moisture range.
In summary, the findings of this study suggest that the impact of the model structure (e.g., Weber et al., 2020; Rudiyanto et al., 2021) and of limited observations has to be considered in PTF development. However, to further improve the PTFs that predict SHPs over the entire moisture range, a much broader dataset and the impact of the soil structure, such as the presence of macropores, need to be considered. For example, when the free-fitted Ks is applied in developing the PTFs, the predicted conductivity is underestimated near saturation.
6 Concluding Remarks
In this paper, we have presented a simple and accurate method for predicting complete SHPs with measurements taken only in a relatively high matric potential range. Testing with a broad dataset showed that the method performs very well in describing the SWRC (with 213 soil samples) and reasonably well in matching the HCC (with 65 soil samples). This method will be of great importance in practice, considering the difficult and time-consuming nature of measuring SHPs under dry conditions.
Based on this method, the SHPs of 422 soil samples (including 215 samples with HCC measurements) selected from the UNSODA database (Nemes et al., 2001) were extended to the complete moisture range. That is, for each soil sample, we added three additional data pairs, estimated with the method described in Section 2.2, to the measurements of both water retention and hydraulic conductivity. These data were then applied in developing and testing the PTFs with the use of different models. The results indicated that the FXW-M1 model, which accounts for both capillary and adsorption forces and overcomes the unrealistic decrease of the HCC near saturation for fine-textured soils, together with the extended data, obtained the best prediction of SHPs from saturation to oven dryness. This suggests that the model structure and limited observations play an important role in PTF development.
The extended method was further applied together with the Rosetta3 PTF presented by Zhang and Schaap (2017), which accounts for only capillary forces, to predict the SHPs from saturation to oven dryness. The test results showed an obvious improvement in describing SHPs under dry conditions.
The method presented in this paper and the developed PTFs will be beneficial for the study of soil water flow and the associated processes covering relatively dry conditions.
Acknowledgments
This research was supported in part by the National Natural Science Foundation of China (Nos. 42071045, 41722208) and in part by the Natural Sciences Foundation of Hubei Province of China (2019CFA013). Finally, the authors thank the editor Xavier Sanchez-Vila and all anonymous reviewers for their very insightful and constructive comments on this manuscript, and special thanks to the associate editor J.A. (Sander) Huisman for the reviewing and editing work.
Open Research
Data Availability Statement
The applied data were obtained from a public dataset, which is available at the website of the United States Department of Agriculture (https://data.nal.usda.gov/search/type/dataset). The applied data and the code for the developed PTFs as well as the extended method are also available at http://www.hydroshare.org/resource/2e1d064c765744db9d4b7855671e641d.