Volume 55, Issue 2 p. 990-1010
Research Article
A Comprehensive Distributed Hydrological Modeling Intercomparison to Support Process Representation and Data Collection Strategies

Gabriele Baroni

Corresponding Author

Institute for Environmental Sciences and Geography, University of Potsdam, Potsdam, Germany

UFZ-Helmholtz Centre for Environmental Research, Leipzig, Germany

Correspondence to: G. Baroni,

[email protected]

Bernd Schalge

Meteorological Institute, University of Bonn, Bonn, Germany

Oldrich Rakovec

UFZ-Helmholtz Centre for Environmental Research, Leipzig, Germany

Faculty of Environmental Sciences, Czech University of Life Sciences, Prague, Czech Republic

Rohini Kumar

UFZ-Helmholtz Centre for Environmental Research, Leipzig, Germany

Lennart Schüler

UFZ-Helmholtz Centre for Environmental Research, Leipzig, Germany

Luis Samaniego

UFZ-Helmholtz Centre for Environmental Research, Leipzig, Germany

Clemens Simmer

Meteorological Institute, University of Bonn, Bonn, Germany

Centre for High-Performance Scientific Computing in Terrestrial Systems, Geoverbund ABC/J, Jülich, Germany

Sabine Attinger

Institute for Environmental Sciences and Geography, University of Potsdam, Potsdam, Germany

UFZ-Helmholtz Centre for Environmental Research, Leipzig, Germany

First published: 17 January 2019
Citations: 26

Abstract

The improvement of process representations in hydrological models is often driven only by the modelers' knowledge and data availability. We present a comprehensive comparison between two hydrological models of different complexity, developed to support (1) the understanding of the differences between model structures and (2) the identification of the observations needed for model assessment and improvement. The comparison is conducted in both space and time and by aggregating the outputs at different spatiotemporal scales. In the present study, mHM, a process-based hydrological model, and ParFlow-CLM, an integrated subsurface-surface hydrological model, are used. The models are applied in a mesoscale catchment in Germany. Both models agree in the simulated river discharge at the outlet and in the surface soil moisture dynamics, which lends support to some model applications (e.g., drought monitoring). Different model sensitivities are, however, found when comparing evapotranspiration and soil moisture at different soil depths. The analysis supports the need for observations within the catchment for model assessment, but it indicates that different strategies should be considered for the different variables. Evapotranspiration measurements are needed at daily resolution across several locations, while highly resolved spatially distributed observations with lower temporal frequency are required for soil moisture. Finally, the results show the impact of the shallow groundwater system simulated by ParFlow-CLM and the need to account for the related soil moisture redistribution. Our comparison strategy can be applied to other model types and environmental conditions to strengthen the dialog between modelers and experimentalists for improving process representations in Earth system models.

Key Points

  • A comprehensive approach is developed to compare states and fluxes simulated by different models at different spatial and temporal scales
  • Different strategies depending on the model output are identified to discriminate between model structures and for modeling improvements
  • The approach strengthens the dialog between modelers and experimentalists for improving process representation in hydrological models

1 Introduction

Hydrological modeling is an inherent part of scientific research, with applications ranging from hypothesis testing to prediction in decision support systems (Semenova & Beven, 2015). Since the early days of computing, hydrological models of different complexity have been developed that resolve the physical processes with different degrees of completeness and at different spatiotemporal resolutions (Hrachowitz & Clark, 2017).

In the last decades, advances in data availability and computational power have supported the development and application of distributed models at increasingly finer resolutions (Wood et al., 2011). Within this effort, several modeling studies have been conducted that provide a unique opportunity to characterize the different hydrological processes and the interactions between the different hydrological compartments (Bierkens et al., 2015). However, it has also been underlined that improving the representation of hydrological processes in Earth system models remains one of the main scientific challenges (Clark et al., 2015; Semenova & Beven, 2015). Nowadays, the realism of these processes becomes even more relevant because models originally developed for different purposes are also used for similar applications (Bierkens et al., 2015), fostering the debate regarding the “correct” modeling approach and the future of hydrological models (Beven, 2001; Beven & Cloke, 2012; Clark et al., 2017, 2015; Savenije & Hrachowitz, 2017; Weiler & Beven, 2015). But how to proceed?

Among different debates, there is a consensus in the hydrologic community for the need of complementary data sets besides observed streamflow and related signature measures to improve the representation of the dominant physical processes (Clark et al., 2016; Gharari et al., 2014; Melsen et al., 2016). Several studies show in fact that well-calibrated models based on streamflow observations have only a limited control over the spatial distribution of hydrologic fluxes and states such as evapotranspiration (ET) and soil moisture (Rakovec et al., 2016b; Stisen et al., 2011). But since many applications of these models like drought monitoring or the analysis of land-atmosphere feedbacks rely on spatially representative simulations of these distributed variables (Keune et al., 2018; Samaniego et al., 2018; Shrestha et al., 2018; Simmer et al., 2015; Wada et al., 2012), improving the reliability of these model predictions becomes a fundamental need.

For these reasons, several studies explored the use of additional variables to constrain the model parameters and to achieve better performance in the prediction of internal variables, for example, using soil moisture (Parajka et al., 2006; Sutanudjaja et al., 2014; Wanders et al., 2014), groundwater measurements (Seibert, 2000), surface soil temperature (Stisen et al., 2011; Zink et al., 2018), total water storage anomalies (Lo et al., 2010; Rakovec et al., 2016a), or combinations of aforementioned observations (Livneh & Lettenmaier, 2012; Schoups et al., 2005; Shi et al., 2015, 2013; Silvestro et al., 2015; Stisen et al., 2018; Sun et al., 2013).

Although most of the studies mentioned above show improvements in the model prediction when considering several variables, two critical aspects should be further considered. First, current model intercomparisons show that even an evaluation based on several variables does not discriminate between competing modeling approaches, as many models show similar modeling skills and limitations (Beck et al., 2017; Breuer et al., 2009; Koch et al., 2016; Kollet et al., 2017; Maxwell et al., 2014; Sulis et al., 2017; Vigiak et al., 2018). The reasons are identified in the interactions and compensations between model structures and parameters (Clark et al., 2016; Fenicia et al., 2011) and the difficulty of disentangling their effects (Baroni & Tarantola, 2014; Dai et al., 2017). Second, the studies reported above show that the observations used for the assessment and/or parameter constraints are selected based on modeling purposes, data availability, and expert knowledge. However, there are no general structured approaches to support decisions on what to implement or evaluate in specific applications.

In this study, we hypothesize that a truly comprehensive and scale-distinctive intercomparison between simulations of models of different complexity can (1) better identify the sensitivity of each model structure for specific applications and (2) support experimental designs (which observations and how) suitable for further assessments and improvements of the models. We acknowledge that synthetic tests only provide a first check of model implementations (Clark et al., 2015), but we believe that such tests can become a fundamental step to improve the dialog between experimentalists and modelers (Francke et al., 2018; Kavetski & Fenicia, 2011; Seibert & McDonnell, 2002) and contribute to a continuous learning process where model settings and observation collection strategies are better integrated (Baroni & Tarantola, 2014; Krueger et al., 2010).

We test this approach for two distributed hydrological models of different complexity: the mesoscale hydrological model (mHM; Kumar et al., 2013; Samaniego et al., 2010), a spatially distributed process-based model, and ParFlow-Community Land Model (CLM; Kollet & Maxwell, 2006; Maxwell & Miller, 2005), an integrated subsurface-surface hydrological model that explicitly resolves the 3-D variably saturated system based on the Richards equation and the land surface energy balance. Both models are currently used in several studies for drought analysis and for quantifying feedbacks between compartments (Keune et al., 2016, 2018; Samaniego et al., 2018). For this reason, understanding the physical realism of their internal variability is deemed very important. The comparison is conducted based on 3 years of simulations performed at a spatial resolution of 100 m for a medium-sized catchment in Germany (2,324 km²). This resolution has been chosen as a limit of applicability for both models.

The remainder of the paper is structured as follows. The models are described in section 2. The mesoscale catchment and the specific model inputs are described in section 3. The comprehensive analysis of states and fluxes conducted at different spatial scales is described in section 4. The results are presented in section 5, and the discussion is in section 6. Overall conclusions are summarized in section 7. Additional material is reported in the supporting information.

2 Hydrological Models

2.1 mHM

mHM (Kumar et al., 2013; Samaniego et al., 2010) is a process-based spatially distributed hydrologic model (www.ufz.de/mhm). It considers interception, snow accumulation and melting, infiltration and soil water retention, ET, percolation, and runoff generation as the main hydrologic processes. The readers are referred to Samaniego et al. (2010) for a full model description; here we summarize the main model features that are relevant to this study.

Within the present study, potential ET is estimated through the Hargreaves and Samani equation (Hargreaves & Samani, 1985), which requires as inputs daily maximum and minimum temperature and global radiation. The soil within mHM is discretized vertically into a first reservoir down to 5 cm, a second one down to 30 cm, and a third one with a variable depth set to the total soil depth provided by the soil map. Based on the soil textural properties, mHM estimates effective parameters for porosity, hydraulic conductivity, field capacity, and permanent wilting point using a set of pedotransfer functions (Livneh et al., 2015). Actual ET is limited by soil moisture availability based on prescribed soil moisture thresholds. The dominant fraction of roots affecting ET can be prescribed for each soil layer separately. Four lateral water fluxes are defined (at the surface and for each reservoir), which contribute to the total runoff. These fluxes are routed based on the Muskingum-Cunge approximations (Todini, 2007) to represent streamflow.
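As an illustration of the potential ET step described above, the following minimal sketch uses the widely cited temperature-based form of the Hargreaves and Samani equation, ET0 = 0.0023 · Ra · (Tmean + 17.8) · (Tmax − Tmin)^0.5, with the radiation term expressed as equivalent evaporation (mm/day). How mHM supplies the radiation term and the exact coefficients it uses are not stated here, so the function names, the coefficient values, and the linear soil moisture stress factor are assumptions for illustration only.

```python
import math


def hargreaves_samani_pet(t_max, t_min, radiation_mm_day):
    """Potential ET (mm/day) from the common Hargreaves-Samani form.

    t_max, t_min: daily maximum and minimum air temperature (deg C).
    radiation_mm_day: radiation term as equivalent evaporation (mm/day).
    The coefficients 0.0023 and 17.8 are the commonly published values,
    not necessarily those used in mHM (assumption).
    """
    t_mean = 0.5 * (t_max + t_min)
    t_range = max(t_max - t_min, 0.0)  # guard against inverted inputs
    pet = 0.0023 * radiation_mm_day * (t_mean + 17.8) * math.sqrt(t_range)
    return max(pet, 0.0)


def actual_et(pet, theta, theta_wilt, theta_crit):
    """Hypothetical linear stress factor between two soil moisture thresholds.

    Below theta_wilt no ET occurs; above theta_crit ET equals PET.
    """
    if theta_crit <= theta_wilt:
        raise ValueError("theta_crit must exceed theta_wilt")
    stress = (theta - theta_wilt) / (theta_crit - theta_wilt)
    return pet * min(max(stress, 0.0), 1.0)
```

Because this formulation depends only on temperature and radiation, it does not respond to the vapor pressure deficit, which matters when ET simulated by the two models is compared during precipitation events.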

2.2 ParFlow-CLM

ParFlow-CLM (Kollet & Maxwell, 2006; Maxwell & Miller, 2005) is an integrated subsurface-surface hydrological model that includes feedbacks between groundwater and land surface processes at different spatial and temporal scales. ParFlow explicitly resolves the 3-D variably saturated system based on the Richards equation, and the kinematic wave equation is used to simulate the two-dimensional overland flow. The CLM v. 3.5 (Oleson et al., 2008) is a single-column biogeophysical land surface model that considers snow, soil, and vegetation processes and the land surface energy balance.

In the present study, ParFlow-CLM (hereafter named PfC) implemented in the Terrestrial System Modeling Platform (TerrSysMP) is used (Shrestha et al., 2014; Simmer et al., 2015). The two models are coupled via the external OASIS3 coupler (Valcke, 2013); the 1-D Richards equation included in CLM is replaced by ParFlow.

3 Study Domain and Data

3.1 Description of the Study Area

The modeling experiment is carried out for the upper Neckar catchment located in southwest Germany (Figure 1). The catchment has an area of 2,324 km² with varying topography including mountains up to 1,020 m a.s.l. Land use and cover in the lower elevations are dominated by agriculture. Forests, mainly of needle-leaf trees, are located in mountainous areas (see supporting information Figure S2). The soil is predominantly clay loam with a relatively high spatial variability within the catchment (see Figure S1). Depths to the groundwater are only a few meters in large parts of the area, which ensures a coupling between groundwater depth and ET (Kollet & Maxwell, 2008). Annual mean precipitation over the catchment ranges between 600 and 2,000 mm with the highest values over the southwestern part of the catchment. While summer precipitation is dominated by convection, winter precipitation is predominantly related to frontal systems and is enhanced over the mountains by orographic lift. Daily average temperature values vary with altitude between −5 and 0 °C in January and 13 and 18 °C in July.

Figure 1. Topography of the upper Neckar catchment with river network and discharge gauge location. DEM = digital elevation model.

3.2 Meteorological Data

Three years of high-resolution meteorological forcing (100 m, hourly values) for both hydrological models is obtained from runs of TerrSysMP (Shrestha et al., 2014; Simmer et al., 2015) in which the regional atmospheric model Consortium for Small-scale Modeling (COSMO) is coupled with CLM v. 3.5. COSMO (Baldauf et al., 2011) is a weather prediction model operated by the German national meteorological service (DWD) and other national European weather services. CLM v. 3.5 provides the lower boundary condition for COSMO. The models are run at 1.1-km spatial resolution over an area of ~57,850 km² containing not only the full Neckar catchment but also a large area around it, including the entire Black Forest and upper Rhine valley. These simulations are laterally forced by operational COSMO-DE analyses provided by DWD, and results are spline interpolated to the 100-m grid. The COSMO-CLM simulations have been evaluated against real observations in a previous study, which showed good agreement within the specific domain (Schalge et al., 2016). The high resolution used for the atmospheric model reproduces quite satisfactorily, and in a physically consistent way, the fine-scale atmospheric events that are relevant for the present study (advection and storm evolution). For this reason, these simulated forcings are used instead of real observations to avoid patchiness in model simulations due to the usually much coarser grids provided by observational meteorological data (Zink et al., 2018).

3.3 Land Surface and Subsurface Properties

The same land surface and subsurface properties are used to set up the two hydrological models. In particular, the digital elevation model (DEM) provided by the European Environment Agency is used to specify land surface characteristics in the catchment (slope, flow accumulation, and catchment drainage). The DEM is projected to the latitude/longitude grid and bilinearly interpolated from the original 30-m spatial resolution. Land cover is taken from the 2006 Corine Land Cover Data Set also provided by the European Environment Agency. Land use types are grouped into five classes: broad-leaf forest, needle-leaf forest, grassland, cropland, and bare soil. Urban areas are not explicitly simulated by the models, and specific parametrizations should be adopted to compensate for this unresolved process (Bhaskar et al., 2015). To avoid the effect of specifically adapted solutions, urban areas are not considered in this comparison and replaced by bare soil. The leaf area index (LAI) is obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) (Myneni et al., 2002) as monthly averages for the year 2008 for each of the four vegetated land use classes. This LAI is further modified to account for known biases in the MODIS data (Tian et al., 2004). Similarly, the stem area index used in PfC is estimated from the LAI by a slightly modified formulation (no dead leaf for crops, constant base stem area index of 10% of maximum LAI) of Lawrence and Chase (2007) to better represent European tree types.

For the representation of the subsurface, the soil map (BUEK1000) and the geological map (GUEK1000) provided by the Federal Institute for Geosciences and Natural Resources are used. For the geological map, some features that are characteristic for the domain, such as Middle Triassic and Jurassic karst aquifers, are not included to avoid the manifold hydrological challenges related to its modeling (Hartmann et al., 2015) and to provide a more general and consistent modeling intercomparison. The soil map offers sand and clay percentages as well as carbon contents for two to seven soil horizons down to a maximum depth of 3 m for each soil type. In the present study, the first three layers are used to represent the first three soil horizons above the geological structure. The resolution of the soil map is, however, much coarser than the spatial variability at the model grid resolution (100 m). For this reason, the conditional point method presented in Baroni et al. (2017) is applied. This method preserves the main spatial soil patterns in the original soil map while introducing realistic subscale variability. The steps of the conditional point method are briefly described in the supporting information (Figure S1). Readers interested in further details regarding the soil map generation and comparison with other methods are referred to Baroni et al. (2017).

3.4 Specific Models Settings

For this study, we establish mHM for the upper Neckar catchment at 100-m horizontal spatial resolution, resulting in (232,360 · 3) ≈ 7 · 10⁵ grid cells. The model runs at an hourly time step. The model has been calibrated and evaluated in previous studies conducted in the same area, showing very good capability to match streamflow observations of catchments of different sizes (Kumar et al., 2010, 2013; Samaniego et al., 2010; Wöhling et al., 2013). This parameterization is also used for the present study (i.e., the same global calibrated parameters), but no additional tuning of the parameters has been performed. For this reason and considering that new forcing and input (e.g., soil properties) have been used in the present study, we consider the model as uncalibrated. The simulation is conducted with a 5-year model spin-up time (by repeating the year 2007) to generate appropriate initial conditions. Then, 3 years of forward runs are simulated for the period 2007–2009.

PfC is set up for a larger domain than the actual catchment to avoid effects of the prescribed lateral boundary conditions. This setup is part of the numerical tests that are performed over the Neckar catchment to provide high-resolution reference simulations for data assimilation tests (http://www.for2131.de/home-en). The specific simulation used in the present intercomparison is based on a total of 700 · 800 grid cells discretized at the same spatial resolution as mHM (i.e., 100 m). In the vertical, the model consists of 50 layers of variable thickness down to 100 m, leading to a total of 2.8 · 10⁷ computational nodes. The model runs at a 15-min time step. Carbon content available from the soil map is used to infer soil color, which is used in CLM for the surface energy balance (albedo). ParFlow describes retention and hydraulic conductivity curves based on van Genuchten-Mualem parameters, and pedotransfer functions are applied to estimate them. The pedotransfer functions of Cosby et al. (1984), Rawls (1983), and Tóth et al. (2015) are selected because of data availability and previous comparisons conducted in the area (Tietje & Hennings, 1996). As simulated states and fluxes at the land surface are sensitive to these parameterizations (Baroni et al., 2010; Loosvelt et al., 2011), we acknowledge that also for this model additional refining of the parameters based on direct comparisons with observations would have improved the model setup. Because the aim is to test the framework developed in the present study to identify which observations should be used for further improvement, however, we decided not to perform any additional refining. Finally, the same spin-up strategy used for mHM (5 years) is applied for PfC to generate the initial conditions, and 3 years of forward runs are simulated. We acknowledge here that natural groundwater systems usually require longer spin-up times (Ajami et al., 2014), but 5 years did minimize the effect of the initial conditions on the land surface states and fluxes evaluated in the present study. In addition, it has to be noted that the spatial resolution (and thus the number of grid cells) pushes the computational requirements to the limit, and longer simulations would have been infeasible.
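For readers less familiar with the van Genuchten-Mualem parameterization mentioned above, the following minimal sketch evaluates the standard closed-form retention and conductivity curves. It is not ParFlow code; the function names and the parameter values in the usage note are hypothetical, and the pore-connectivity exponent of 0.5 is the conventional Mualem default (assumption).

```python
def vg_effective_saturation(psi, alpha, n):
    """Effective saturation Se for pressure head psi (negative when unsaturated).

    alpha has units of 1/length (matching psi); n > 1 is the shape parameter.
    """
    if psi >= 0.0:
        return 1.0  # saturated conditions
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * abs(psi)) ** n) ** (-m)


def vg_water_content(psi, theta_r, theta_s, alpha, n):
    """Volumetric water content between residual and saturated values."""
    se = vg_effective_saturation(psi, alpha, n)
    return theta_r + (theta_s - theta_r) * se


def mualem_conductivity(psi, k_sat, alpha, n, l=0.5):
    """Unsaturated hydraulic conductivity from the Mualem model."""
    m = 1.0 - 1.0 / n
    se = vg_effective_saturation(psi, alpha, n)
    return k_sat * se ** l * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2
```

For example, with the hypothetical values alpha = 1 m⁻¹, n = 2, theta_r = 0.1, and theta_s = 0.4, a pressure head of −1 m gives a water content of about 0.31. Pedotransfer functions such as those cited above supply these parameters from texture and carbon content, which is why the simulated states are sensitive to their choice.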

4 Analysis

4.1 Indices to Quantify the Agreement Between the Models

First, the models are evaluated by comparison of river discharge at the outlet (Q); here we also consider real observations to check the plausibility of the model results. This is followed by a comprehensive comparison between the two models concerning the simulated actual ET and the volumetric soil moisture (θd) at different soil layers d. In particular, ET is computed by aggregating plant transpiration, canopy evaporation, and ground evaporation. Soil moisture of the three reservoirs of mHM is compared with PfC soil moisture averaged over the respective PfC layers, that is, θ1 = 0–5 cm, θ2 = 5–30 cm, and θ3 = 30 cm down to the variable depth according to the total soil depth provided with the soil map (with a maximum soil depth of 150 cm).
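The averaging of finely layered soil moisture onto the three mHM reservoirs can be sketched as a thickness-weighted mean over each depth interval. This is a minimal illustration only: the weighting scheme, the function name, and the layer edges in the usage note are assumptions, since the actual PfC layer thicknesses are not listed here.

```python
def layer_mean(theta, edges_cm, top_cm, bottom_cm):
    """Thickness-weighted mean of theta over the interval [top_cm, bottom_cm].

    theta[i] applies between edges_cm[i] and edges_cm[i + 1], with depth
    measured positive downward from the land surface.
    """
    total = 0.0
    weight = 0.0
    for i, value in enumerate(theta):
        # Thickness of layer i that falls inside the target interval.
        overlap = min(edges_cm[i + 1], bottom_cm) - max(edges_cm[i], top_cm)
        if overlap > 0.0:
            total += value * overlap
            weight += overlap
    if weight == 0.0:
        raise ValueError("no layer overlaps the requested interval")
    return total / weight
```

With hypothetical layer edges `[0, 2, 5, 10, 20, 30, 50, 100, 150]` cm, calling `layer_mean(theta, edges, 0, 5)`, `layer_mean(theta, edges, 5, 30)`, and `layer_mean(theta, edges, 30, 150)` would give values comparable to θ1, θ2, and θ3, with the last bound replaced by the local soil depth where it is shallower than 150 cm.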

The modified Kling and Gupta efficiency (KGE) metric (Kling et al., 2012) and its three components (β, γ, and ρ) are used for the assessment:
$$\mathrm{KGE} = 1 - \sqrt{(\rho - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2} \tag{1}$$
with
$$\beta = \frac{\mu_A}{\mu_B} \tag{2}$$
$$\gamma = \frac{\sigma_A / \mu_A}{\sigma_B / \mu_B} \tag{3}$$
$$\rho = \frac{\mathrm{cov}(A, B)}{\sigma_A \, \sigma_B} \tag{4}$$
with A and B the values of two variables to be compared. μ and σ are the mean and standard deviation, respectively, ρ is the linear correlation coefficient, β represents the bias ratio, and γ the relative variability. Values close to 1 indicate perfect matches. In the following, when only simulations are compared, the variables A and B refer to PfC and mHM, respectively.
In addition, the histogram matching index h (Koch et al., 2018) is also calculated to compare the intersection percentage of the two distributions as follows:
$$h = \frac{\sum_{j=1}^{n} \min(K_j, L_j)}{\sum_{j=1}^{n} K_j} \tag{5}$$
with K the histogram of the variable A and L the histogram of the variable B. As originally proposed, the values of A and B are standardized to a mean of 0 and a standard deviation equal to 1 (z score) to avoid the effect of different units. The number of bins n is defined based on the Freedman-Diaconis approach (Freedman & Diaconis, 1981). The main utility of the histogram comparison is that it is sensitive to clusters in the data and it complements the other metrics used for the comparison (Demirel et al., 2018).
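The indices of this section can be sketched in a few lines of standard-library Python. The sketch assumes the usual definitions of the modified KGE (β as the ratio of means, γ as the ratio of coefficients of variation, ρ as the Pearson correlation) and the histogram intersection normalized by the counts of A. Applying the Freedman-Diaconis rule to the pooled standardized sample is an assumption made here, as are all function names.

```python
import math
import statistics


def kge_components(a, b):
    """Return (beta, gamma, rho) for two equally long series A and B."""
    mu_a, mu_b = statistics.fmean(a), statistics.fmean(b)
    sd_a, sd_b = statistics.pstdev(a), statistics.pstdev(b)
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / len(a)
    beta = mu_a / mu_b                      # bias ratio
    gamma = (sd_a / mu_a) / (sd_b / mu_b)   # ratio of coefficients of variation
    rho = cov / (sd_a * sd_b)               # linear correlation
    return beta, gamma, rho


def kge(beta, gamma, rho):
    """Modified Kling-Gupta efficiency from its three components."""
    return 1.0 - math.sqrt((rho - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)


def zscore(values):
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def histogram(values, lo, width, n_bins):
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def histogram_match(a, b):
    """Histogram intersection of the z-scored series, normalized by A."""
    za, zb = zscore(a), zscore(b)
    pooled = za + zb
    q1, _, q3 = statistics.quantiles(pooled, n=4)
    fd_width = 2.0 * (q3 - q1) / len(pooled) ** (1.0 / 3.0)  # Freedman-Diaconis
    lo, hi = min(pooled), max(pooled)
    n_bins = max(1, math.ceil((hi - lo) / fd_width)) if fd_width > 0.0 else 1
    width = (hi - lo) / n_bins or 1.0
    k = histogram(za, lo, width, n_bins)
    l = histogram(zb, lo, width, n_bins)
    return sum(min(kj, lj) for kj, lj in zip(k, l)) / sum(k)
```

As a quick plausibility check, `kge(0.94, 1.0, 0.83)` evaluates to about 0.82, which matches the value reported in Table 1 for the discharge comparison of PfC versus mHM.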

4.2 Temporal and Spatial Agreement at Different Spatiotemporal Scales

The results obtained by the two models are compared based on an extension of a framework previously proposed (Baroni et al., 2017; Refsgaard et al., 2016). The approach is briefly summarized in the following, and a scheme of the framework is provided in Figure 2. In the first step (Figure 2, step S1), the spatial distributions of the two variables to be compared, A and B, are depicted for four time steps (t1–t4). Based on these distributions, the agreement between the two variables is calculated in both time and space using a user-defined index (e.g., the correlation coefficient ρ). Temporal agreement is quantified at each grid cell, which yields a map of the agreement at the grid resolution (e.g., 1 km; Figure 2, step S2). Because the index here quantifies the temporal agreement between the two variables, it is identified with the subscript T (e.g., ρT). Using the same variables, the index is also calculated at each time step over the spatial domain to quantify the spatial agreement of the two variables (Figure 2, step S3). In this case, the index is identified with the subscript S (e.g., ρS).
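The distinction between temporal and spatial agreement can be made concrete with a small sketch. Here each model output is stored as a list of time steps, each holding the values of all grid cells in a fixed order; this data layout and the function names are assumptions made for illustration. The correlation coefficient is used as the index, but any index computed from two series could be substituted.

```python
import math


def pearson(x, y):
    """Linear correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def temporal_agreement(a, b):
    """One value per grid cell: index computed along the time axis (rho_T)."""
    n_cells = len(a[0])
    return [
        pearson([step[c] for step in a], [step[c] for step in b])
        for c in range(n_cells)
    ]


def spatial_agreement(a, b):
    """One value per time step: index computed across all cells (rho_S)."""
    return [pearson(step_a, step_b) for step_a, step_b in zip(a, b)]
```

`temporal_agreement` returns as many values as there are grid cells (a map), whereas `spatial_agreement` returns as many values as there are time steps (a time series). Averaging each result gives the summary values written as <ρT> and <ρS>.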

Figure 2. Analysis of the spatial and temporal agreement between distributed simulated values of two models A and B (step S1). The models are first compared at the finest resolution (steps S2 and S3). The same analyses are performed by aggregating the model outputs at different spatial and temporal resolutions (steps S4, S7, and S10). Averaged indices are visualized in steps S14 and S15 (red points) and interpolated for better visualization (green raster). Additional explanation is provided in the text.

These analyses (steps S2 and S3) are repeated with the variables increasingly aggregated in space and time. In the example of Figure 2, the original variables are first spatially aggregated at, for example, 2 km (step S4) followed by the calculations of the temporal and spatial agreements (steps S5 and S6). Then, the analysis is conducted by aggregating the variables over the temporal domain. In the specific example of Figure 2 (step S7), the four time steps are averaged over two periods producing two maps for each variable (t12 and t34). Also for this aggregation, temporal and spatial agreements are calculated (steps S8 and S9). The analysis is then repeated for all combinations of spatial (nS) and temporal aggregation (nT), which leads to a total number of (nS · nT) cases. In the specific example of Figure 2, (2 · 2) = 4 combinations are shown.

These indices can be displayed in different ways. For visualization and to summarize the overall agreement, the mean temporal (<ρT>) and mean spatial (<ρS>) agreement is calculated over all grid cells and time steps, respectively. Thus, a matrix is considered where coordinates indicate spatial and temporal aggregations and the actual point values indicate the indices (Figure 2, steps S14 and S15, red points). The values are then spline-interpolated for better visualization (Figure 2, green raster).

For the specific case study, the spatially distributed model outputs (ET and soil moisture θd at different soil layers d) simulated by the two models are analyzed based on this approach. The values are aggregated from the finest spatial model resolution of 100 m to resolutions of 200 and 500 m and 1, 5, 10, 15, and 20 km (i.e., nS = 8, including the original 100-m resolution). The temporal aggregation is conducted by averaging the daily values over intervals of 5, 15, 30, 60, and 90 days (i.e., nT = 6, including the original daily resolution). This results in a total number of (8 · 6) = 48 combinations.
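The aggregation over all spatial and temporal scales can be sketched as follows, with each model output stored as a list of daily 2-D grids. Block averaging in space, non-overlapping windows in time, and discarding an incomplete final window are assumptions made for this illustration; cells outside the catchment would also need masking, which is omitted here.

```python
from itertools import product

SPATIAL_RESOLUTIONS_M = [100, 200, 500, 1000, 5000, 10000, 15000, 20000]
TEMPORAL_WINDOWS_DAYS = [1, 5, 15, 30, 60, 90]
COMBINATIONS = list(product(SPATIAL_RESOLUTIONS_M, TEMPORAL_WINDOWS_DAYS))


def block_mean_2d(grid, factor):
    """Average a 2-D grid over factor x factor blocks (edge blocks may be smaller)."""
    n_rows, n_cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            block = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def window_mean(maps, window):
    """Average consecutive, non-overlapping groups of `window` maps."""
    averaged = []
    for t0 in range(0, len(maps) - window + 1, window):
        chunk = maps[t0:t0 + window]
        averaged.append([
            [sum(m[r][c] for m in chunk) / window for c in range(len(chunk[0][0]))]
            for r in range(len(chunk[0]))
        ])
    return averaged


def aggregate(maps, spatial_factor, temporal_window):
    """Aggregate first in space, then in time."""
    coarse = [block_mean_2d(m, spatial_factor) for m in maps]
    return window_mean(coarse, temporal_window)
```

For a 100-m grid, the spatial factor for a target resolution is simply `resolution // 100`. Looping over `COMBINATIONS`, aggregating both model outputs, and computing the agreement indices on each pair would fill the 8 × 6 matrix of mean indices described above.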

5 Results

5.1 River Discharge at the Outlet

The river discharge at the outlet of the catchment simulated by the two models is in good agreement with observations (Figure 3 and Table 1). mHM performs slightly better than PfC due to the better correlation between the time series (ρT = 0.75 for mHM, while ρT = 0.57 for PfC). In both modeling cases, however, the simulations tend to overestimate observations (βT > 1.2) and to show slightly less variability (γT < 1). The agreement between simulations and observations is remarkable, considering that the two models are not calibrated within the present study; only the same forcing and inputs are provided. In addition, it is interesting to note that the differences between the two model simulations are smaller than the differences between each simulation and the observations (see Table 1). Overall, the simulations provide a reliable and interesting case for questioning how much and where the distributed states and fluxes simulated within the catchment differ between the two models.

Figure 3. Comparison of river discharge Q (m³/s) simulated by the two models mHM and PfC at the outlet of the catchment (2,324 km²) with observations (Obs). mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM); Obs = observation.
Table 1. Indices Calculated to Quantify the Agreement Between the Two Models for the Discharge at the Outlet Q, Evapotranspiration ET, and Soil Moisture θ at Different Soil Layers (1, 2, and 3)

| Variable | Domain | Comparison | β | γ | ρ | KGE | h |
|---|---|---|---|---|---|---|---|
| Q | Temporal agreement at the outlet (Figure 3) | mHM versus observation | 1.35 | 0.94 | 0.76 | 0.57 | 0.90 |
| Q | Temporal agreement at the outlet (Figure 3) | PfC versus observation | 1.26 | 0.95 | 0.57 | 0.49 | 0.89 |
| Q | Temporal agreement at the outlet (Figure 3) | PfC versus mHM | 0.94 | 1.0 | 0.83 | 0.82 | 0.92 |
| ET | Temporal agreement in one selected grid cell (Figure 5) | PfC versus mHM | 0.76 | 1.11 | 0.81 | 0.67 | 0.89 |
| ET | Spatial agreement on 28 May 2008 (Figure 6) | PfC versus mHM | 0.69 | 0.76 | 0.13 | 0.05 | 0.78 |
| θ1 | Temporal agreement in one selected grid cell (Figure 8, top) | PfC versus mHM | 0.92 | 1.12 | 0.82 | 0.77 | 0.69 |
| θ1 | Temporal agreement in one selected grid cell (Figure 8, bottom) | PfC versus mHM | 1.25 | 1.05 | 0.79 | 0.67 | 0.89 |
| θ1 | Spatial agreement on 28 May 2008 (Figure 9) | PfC versus mHM | 1.06 | 1.39 | 0.17 | 0.09 | 0.54 |
| θ2 | Temporal agreement in one selected grid cell (Figure 11, top) | PfC versus mHM | 1.45 | 0.62 | 0.83 | 0.39 | 0.53 |
| θ3 | Temporal agreement in one selected grid cell (Figure 11, bottom) | PfC versus mHM | 0.81 | 0.11 | 0.52 | 0.0 | 0.28 |
| θ2 | Spatial agreement on 28 May 2008 (Figure 12) | PfC versus mHM | 1.38 | 1.47 | 0.14 | −0.05 | 0.49 |

Note. The indices are calculated over the temporal domain and over the spatial domain. mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM); β = bias ratio; γ = relative variability; ρ = linear correlation coefficient; KGE = Kling and Gupta efficiency; h = histogram matching index.

5.2 Evapotranspiration

Figure 4 compares the ET simulated by the two models. The distributions of the indices (equations 2–5) are directly shown here to better compare the agreements obtained in space and time. On average, ET simulated by PfC is lower than that simulated by mHM (β < 1) but has a higher variability (γ > 1) both in time and in space. The results are highly correlated in time (<ρT> = 0.86) but almost uncorrelated in space (<ρS> ~ 0). Similarly, the histogram matching index h shows better agreement in time (<hT> = 0.9) than in space (<hS> = 0.7). The indices vary little when the temporal domain is considered. Since the temporal indices are calculated at all grid cells, the results indicate that differences between ET simulated by the two models are consistent in all the locations within the catchment. In contrast, the indices vary considerably when the spatial domain is considered. Since the spatial indices are calculated for all days, this result indicates that the difference between ET simulated by the two models over the catchment varies with time.

Temporal (T) and spatial (S) agreement between evapotranspiration (mm/day) simulated by PfC and mHM. From left: the bias β, the variability ratio γ, the correlation coefficient ρ, and the histogram matching index h. mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

For a better understanding of the differences between the two simulated ET, some results are shown and discussed in detail. Simulated ET over time is depicted for one location within the catchment (Figure 5 and Table 1), which demonstrates the general temporal differences between both ET time series. Both models reproduce the expected seasonal dynamics well, but ET simulated by PfC drops during precipitation events, while ET simulated by mHM increases (Figures 5c and 5d). During those events ET simulated by PfC reflects the strong reduction of the atmospheric demand (water vapor deficit decreases). Accordingly, soil evaporation and plant transpiration decrease, while most of the energy is used for the evaporation of canopy interception. In contrast, mHM does not simulate this behavior as ET is driven by global radiation and air temperature based on the Hargreaves and Samani equation (Hargreaves & Samani, 1985) and only limited by soil moisture. Thus, during precipitation events, ET simulated by mHM increases due to increased soil moisture availability.
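
To make the contrast concrete, the sketch below evaluates the standard form of the Hargreaves and Samani (1985) equation, which depends only on radiation and air temperature and contains no humidity or vapor deficit term. It is an illustration under stated assumptions: the mHM implementation, its radiation input, and its coefficients may differ, and the input values are hypothetical.

```python
import math

def hargreaves_samani_pet(ra_mm_day, t_mean, t_max, t_min):
    """Potential ET (mm/day) from the standard Hargreaves-Samani form.

    ra_mm_day: extraterrestrial radiation as equivalent evaporation (mm/day)
    t_mean, t_max, t_min: daily air temperatures (degrees C)
    """
    # Guard against a negative temperature range in faulty input data
    temp_range = max(t_max - t_min, 0.0)
    return 0.0023 * ra_mm_day * (t_mean + 17.8) * math.sqrt(temp_range)

# Hypothetical late-spring day; values are illustrative, not taken from the study
pet = hargreaves_samani_pet(ra_mm_day=15.0, t_mean=15.0, t_max=20.0, t_min=10.0)
print(f"potential ET: {pet:.2f} mm/day")
```

Since no term responds to atmospheric humidity, a scheme of this kind cannot reproduce the drop in atmospheric demand during precipitation events that the energy balance in PfC resolves.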

(a) ET (mm/day) simulated by mHM and PfC in one location over time. (b) Differences (mHM − PfC) between the two time series and (c, d) ET during two short periods are also shown for better visualization of the effect of the precipitation. The two short periods are also indicated by vertical dashed black lines in panels (a) and (b). ET = evapotranspiration; mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

Finally, the spatially distributed ET simulated by both models on 28 May 2008 is shown in Figure 6 as an example to visualize the overall pattern differences. The mHM simulation clearly exhibits several larger-scale spatial structures not seen in the PfC simulation. On this day a precipitation event occurred, which led to increased evaporation and plant transpiration in the mHM simulation caused by the increased water availability. In contrast, the ET simulated by PfC is much more weakly correlated with the precipitation distribution and demonstrates a longer response time of the plant system to such events.

Evapotranspiration (mm/day) simulated by PfC and mHM on 28 May 2008. mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

5.3 Soil Moisture in the Surface Soil Layer (0–5 cm)

The distributions of the indices calculated between the soil moisture θ1 in the surface soil layer (0–5 cm) simulated by the two models are shown in Figure 7. Since both βT and βS have on average values close to 1, the analysis shows that the temporal and spatial mean θ1 simulated by the two models generally agree well. The distributions of the indices are, however, very different. As indicated by the large spread of βT, the temporal mean θ1 simulated by the two models can differ strongly at certain locations within the catchment. In contrast, the spatial average θ1 is consistent over time, as demonstrated by the small spread of βS. The relative variability γ between the simulated θ1 differs between the time domain and the space domain. The models agree well on average when comparing temporal variation (γT), while PfC shows a higher spatial variability (γS > 1). The distributions of both indices are also quite wide, indicating that differences between the models depend on the location within the catchment and on the time. The correlation coefficients ρ show that θ1 simulated by the two models is highly correlated in time (<ρT> = 0.78) due to the rather direct impact of the identical forcing (precipitation and atmospheric demand). In contrast, the θ1 spatial distributions are only weakly related (<ρS> ~ 0.12) due to the different land surface processes represented in each model. These relations are consistent in both space and time, as demonstrated by the rather narrow distributions of the indices. Similar results are obtained by looking at the histogram matching index h.

Temporal (T) and spatial (S) agreement between soil moisture θ1 (m3/m3) of the first soil layer (0–5 cm) simulated by PfC and mHM. From left: the bias β, the variability ratio γ, the correlation coefficient ρ, and the histogram matching index h. mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

Again, for a better understanding of the differences in the surface soil moisture θ1 simulated by the two models, some results are shown and discussed in detail. The simulated soil moisture θ1 over time is depicted for two representative locations in Figure 8. Related indices are reported in Table 1. In the first location (Figure 8a), θ1 simulated by PfC is lower than that by mHM but shows a higher variability. In the second location (Figure 8b), θ1 simulated by PfC is higher than that by mHM with comparable variability. Despite these differences, the time series are highly correlated (ρT ~ 0.8); thus, the soil moisture dynamic is similar, independent from locations (i.e., soil and land cover) as shown by comparing the soil moisture anomalies by dividing soil moisture values by their mean (Figure 8c). The main differences are detected during dry periods (Figure 8d). Thus, since soil moisture is overall quite high over the simulated period (θ > 0.3 m3/m3), stronger differences between the models may occur when applied under drier environmental conditions, for example, for semiarid regions.
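
The anomaly calculation described above (dividing each series by its own mean) can be sketched as follows. The series are hypothetical and constructed so that the two models differ only by a multiplicative offset; the function name `relative_anomaly` is illustrative.

```python
import numpy as np

def relative_anomaly(theta):
    """Scale a soil moisture series by its temporal mean (values vary around 1)."""
    theta = np.asarray(theta, dtype=float)
    return theta / theta.mean()

# Hypothetical series (m3/m3): identical dynamics, different mean level
theta_mhm = np.array([0.40, 0.42, 0.38, 0.36, 0.44])
theta_pfc = 0.8 * theta_mhm  # purely multiplicative offset

# Difference of anomalies, PfC minus mHM
anom_diff = relative_anomaly(theta_pfc) - relative_anomaly(theta_mhm)
print(np.abs(anom_diff).max())  # zero up to floating-point error
```

A purely multiplicative bias cancels in this normalization, so any remaining anomaly difference, such as the one found during dry periods, reflects a difference in dynamics rather than in mean level.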

(a, b) Comparison of soil moisture θ1 (m3/m3) in the surface soil layer (0–5 cm) simulated by the two models at two different locations. (c) The soil moisture anomalies (i.e., by dividing their values by their mean) and (d) their differences (PfC − mHM) are shown. mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

Finally, we compare the simulated soil moisture θ1 distributions on the same day discussed for ET (28 May 2008; see Figure 9 and Table 1). While the mean soil moisture over the catchment simulated by the models is similar (βS ~ 1), their spatial structures differ remarkably. PfC simulates a higher spatial variability (γS > 1), which is practically uncorrelated with the distribution simulated by mHM (ρS ~ 0). The differences are also detected by the low histogram matching index (hS ~ 0.5). As for ET, the spatial structures simulated by mHM are much larger than those in the PfC simulation, although they are not correlated with the precipitation distribution as for ET but rather with soil properties and land use (cf. Figure 6). In contrast, the smaller-scale spatial structures simulated by PfC are strongly related to the river network and topography.

Spatial variability of the soil moisture simulated by PfC and mHM at the surface soil layer (0–5 cm) on 28 May 2008. mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

5.4 Soil Moisture in Deeper Soil Layers

The differences in simulated soil moisture at deeper soil layers (θ2 and θ3) show some similarities; thus, we present and discuss both variables in parallel and focus on results, which help us to better understand the behaviors of the two hydrological models.

The results for soil moisture at the middle layer θ2 (Figure 10, top row) show that θ2 simulated by PfC is, on average, higher than θ2 simulated by mHM (β > 1). Similar to those for the surface soil layer, however, the results agree much more for the spatial (βS) than for the temporal domain (βT), as indicated by the spread of the distributions. Also, the relative variability γ differs between the temporal and spatial analyses. The temporal θ2 variability simulated by PfC is, on average, lower than the one simulated by mHM (γT < 1), while its spatial variability is on average higher (γS > 1). In both cases, the distributions of the indices are quite wide, indicating both location and time dependencies in the agreements. The soil moisture values largely correlate in time (ρT ~ 0.9), indicating, also for these soil layers, that the models reproduce the same hydrological response to the forcing (precipitation and atmospheric demand). In contrast, the spatial correlation of the simulated soil moisture fields is very low (ρS ~ 0.1). Finally, the results are also supported by the histogram matching index h, for which relatively low agreement and high variability are detected in time and in space.

Temporal (T) and spatial (S) agreement between soil moisture θ (m3/m3) simulated by PfC and mHM. The indices calculated for (top row) the second soil layer θ2 (5–30 cm) and (bottom row) the third soil layer θ3 (from 30 cm down to variable soil depth). From left: the bias β, the variability ratio γ, the correlation coefficient ρ, and the histogram matching index h. mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

Similar results are obtained for the simulated soil moisture θ3 in the third layer (Figure 10, bottom row). The mean θ3 simulated by PfC is, however, lower than the θ3 simulated by mHM (β < 1). Moreover, the temporal relative variability (γT) decreases almost to 0 because PfC simulates almost no temporal dynamics, while mHM exhibits some seasonal trends. Accordingly, the histogram matching index in time (hT) is very low.

For a better understanding of the differences between the model simulations for the middle and lower soil layers, Figure 11 shows the simulated θ2 and θ3 over time for one representative location within the catchment (see also Table 1). The soil moisture simulated by the two models for the second layer (θ2) differs mainly by a systematic bias between the models, while the temporal dynamics are very similar. Thus, both models respond similarly to precipitation and atmospheric demand, as for the surface soil layer. For the third layer (θ3, from 30 cm down to maximum soil depth), soil moisture simulated by PfC is almost constant in time and lower than θ3 simulated by mHM. The bias is explained by the different soil parameterizations (different pedotransfer functions) used by the models in these mostly saturated soil conditions. The lower dynamics simulated by PfC are explained by the shallow groundwater system simulated only by PfC, which contributes to the saturation of the deeper soil layers and decouples θ3 from the atmospheric dynamics in the PfC simulations.

Comparison of soil moisture θ (m3/m3) simulated by the two models in one selected grid cell in (top) the second layer and in (bottom) the third layer. mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

Finally, we compare, as an example, the spatial variability of the simulated soil moisture in the middle soil layer θ2 on 28 May 2008 (Figure 12 and Table 1). mHM again shows larger spatial structures than does PfC, similar to the results for θ1, which we attribute to the stronger effect of river, topography, and short-scale soil spatial variability on PfC. Also, large saturated patches occur in PfC, which can be attributed to the contribution of the shallow groundwater level.

Spatial variability of soil moisture simulated by PfC and mHM in the second soil layer (5–30 cm) on 28 May 2008. mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

5.5 Agreement by Aggregating in Space and Time

The agreement/disagreement between both models is now progressively assessed by aggregating the model output in space and time as described in section 4.2. The bias term β is not affected by the aggregation; thus, the discussion will focus on the variability ratio γ, the correlation coefficient ρ, and the histogram matching index h.
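
The statement that aggregation leaves the bias untouched while changing the variability can be illustrated with simple block averaging. This is a minimal sketch, not the procedure of section 4.2: it assumes equal-sized, non-overlapping blocks, uses synthetic fields, compares domain-wide means only, and the helper name `block_mean` is hypothetical.

```python
import numpy as np

def block_mean(x, t_block, s_block):
    """Average a (time, y, x) array over non-overlapping blocks in time and space."""
    nt, ny, nx = x.shape
    assert nt % t_block == 0 and ny % s_block == 0 and nx % s_block == 0
    x = x.reshape(nt // t_block, t_block, ny // s_block, s_block, nx // s_block, s_block)
    return x.mean(axis=(1, 3, 5))

rng = np.random.default_rng(1)
a = 0.30 + 0.05 * rng.random((28, 16, 16))  # synthetic stand-in for one model
b = 0.35 + 0.05 * rng.random((28, 16, 16))  # synthetic stand-in for the other

# Aggregate 7 time steps and 4 x 4 grid cells into one value
a_agg = block_mean(a, t_block=7, s_block=4)
b_agg = block_mean(b, t_block=7, s_block=4)

beta_fine = a.mean() / b.mean()
beta_coarse = a_agg.mean() / b_agg.mean()
print(beta_fine, beta_coarse)    # equal up to floating-point error
print(a.std(), a_agg.std())      # the standard deviation shrinks after averaging
```

With equal-sized blocks the mean of the block means equals the overall mean, so the ratio of means is preserved, whereas standard deviations and correlations depend on the scale of aggregation.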

The agreement between the temporal variability is summarized in Figure 13. The variability ratio (γT) for ET simulated by the two models is only slightly affected by aggregating in time and space; thus, differences between the models for this variable persist independently of the spatial and temporal resolutions. There is only a slightly better match by aggregating in time (γT decreases toward 1), while the agreement decreases by aggregating in space (γT increases). Similar results are obtained for the soil moisture in the upper and middle layers, also indicating a strong coupling between these processes. However, for the soil moisture simulated in the bottom soil layer θ3, the agreement decreases by aggregating in time, while it increases by aggregating in space. This behavior results from the saturated soil conditions simulated by PfC in several parts of the catchment as affected by the shallow groundwater system. In these locations, the soil moisture dynamics of PfC are negligible, which reduces its variability and hence decreases γT (cf. equation 4). Aggregation in time enhances this behavior because of a further reduction of the PfC dynamics.

(top row) Variability ratio γ, (middle row) correlation coefficient ρ, and (bottom row) histogram matching index h calculated over the time domain (T) by aggregating the model output (ET and soil moisture θ at different soil layers 1, 2, and 3) at different spatial and temporal scales. Darker colors indicate better agreement. ET = evapotranspiration; mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

In contrast, the correlation between both simulations always tends to increase when aggregating in space and time for all the variables, which suggests that the differing descriptions of the integrated model processes become less relevant for larger scales. Since the correlation coefficient ρT for ET increases mainly by temporal aggregation, the differences between the models occur mainly on short time scales (days) but stay consistent in space. In contrast, the correlation between the soil moisture simulated by the two models at different depths mainly increases by aggregating in space and only to a lesser extent by aggregating in time. Thus, contrasting results are found more at specific locations, while differences stay consistent in time.

Finally, the histogram matching index hT shows a general improvement of the agreement between the models by aggregating in time and space. A different behavior is detected only for the soil moisture simulated in the third layer for which the agreement decreases by aggregating in time as it has been discussed for the variability ratio (γT).

For the spatial variability (Figure 14), tendencies due to aggregation are different; since the results are only slightly affected by aggregating in time, model differences are temporally persistent. When aggregating in space, however, results depend on the considered variable. For ET, model similarities decrease when aggregating in space; thus, especially the large-scale spatial structures produced by the models are different, which is consistent with the different spatial structures generated by precipitation as previously discussed (cf. Figure 6). In contrast, the agreement between the soil moisture in the upper two layers (θ1 and θ2) increases when aggregating up to ~10 km and decreases when aggregating further for all the three metrics. This behavior originates from the generally higher soil moisture variability simulated by PfC (see Figures 7 and 10). Since most of the spatial variability in PfC is found, however, on short spatial scales (<10 km), aggregating this scale first makes the two models more similar, and thus, the indices increase. When even coarser resolutions are considered (>10 km), soil moisture variability simulated by mHM is also reduced and the indices diverge again. Only the soil moisture of the lowest layer shows a different behavior as γS decreases monotonically; by aggregating in space soil-saturated patches become more dominant, while the spatial variability simulated by PfC decreases even further than that by mHM.

(top row) Variability ratio γ, (middle row) correlation coefficient ρ, and (bottom row) histogram matching index h calculated over the space domain (S) by aggregating the distributed model output (ET and soil moisture θ at different soil layers 1, 2, and 3) at different spatial and temporal scales. Darker colors indicate better agreement. ET = evapotranspiration; mHM = mesoscale hydrological model; PfC = ParFlow-Community Land Model (CLM).

6 Discussion

The results obtained based on the intercomparison conducted at different spatial and temporal scales are further discussed (1) to identify the sensitivity of each model structure when considering specific applications and (2) to support experimental designs (which observations and how) suitable for further assessment and model improvement.

First of all, the analysis highlights that despite similar performances of both models in matching river discharge at the outlet, simulated states and fluxes within the catchment may substantially differ at different spatial and temporal scales. Thus, this result confirms the limited information content of the river discharge to constrain the spatial distribution outputs and the need of additional observations as discussed in several studies (Baroni et al., 2017; Demirel et al., 2018; Koch et al., 2016; Rakovec et al., 2016b; Stisen et al., 2018, 2011).

Specifically, the ET comparison reveals large differences in the processes simulated by the two models on short time scales (daily) and during rainy periods. These differences are quite consistent within the catchment, but they are reduced when results are aggregated in time (see Figure 13). Even if observations are always needed to assess the model performance (i.e., in case both models are wrong), this result supports the use of the spatially distributed output also at fine spatial resolution but only at relatively coarse temporal resolutions (e.g., weekly or monthly), because the results become independent of the model used. In addition, the differences identified during the short time periods indicate that ET observations at least at daily resolution (and for different weather conditions) are needed to evaluate the sensitivity of the models to the fast-changing atmospheric conditions (e.g., water vapor deficit), to identify model deficiencies, and thus to suggest improvements (Peng et al., 2018). Since these differences are consistent at different locations within the catchment, comparisons with observations collected by only a few but targeted eddy covariance stations within the catchment can be sufficient to discriminate intermodel differences and improvements. Remote sensing products, which typically have a low temporal sampling frequency (weekly and coarser) and are available only in clear-sky conditions, are inappropriate for assessing this particular behavior (Mu et al., 2007).

Soil moisture simulated by the two models at the upper soil layers shows a consistent temporal behavior, while differences depend largely on the location within the catchment. Accordingly, the correlation in time ρT is high, while the other metrics (βT, γT, and hT) can be low. This result supports the conclusion that surface soil moisture anomalies are mainly driven by the atmospheric forcing (ET and precipitation) and are less affected by model structure and parameters. For this reason, these simulated model outputs can be used, for example, for drought analysis independently of the model selected (Koster et al., 2009). However, the comparison of soil moisture anomalies alone is not appropriate for discriminating intermodel differences or for supporting model improvements. Specifically, as the models used different pedotransfer functions (Tóth et al., 2015; Zacharias & Wessolek, 2007) and land surface states and fluxes are sensitive to these parametrizations (Baroni et al., 2010; Loosvelt et al., 2011), better agreement (e.g., reduced bias) between the simulated outputs could be achieved by further refining the soil parameters.

Relevant differences in simulated soil moisture emerge when the spatial domain is considered. PfC produces small spatial-scale patterns attributed to the lateral water fluxes between grid cells reproduced in the surface and subsurface as driven by river network and topography. In contrast, lateral water fluxes are directly routed in mHM and only larger spatial structures emerge, which are attributed to atmospheric conditions and land cover. While the agreement slightly increases by aggregating in space up to a resolution of ~10 km, the differences further increase at larger scales. Even if these differences could be slightly compensated by refining specific model parameters (e.g., Manning coefficient and soil parameters), the results highlight the challenge of defining model structures that are independent of the scale of application (Beven, 2006) and the need for spatially distributed observations for improving our process understanding and assessing hydrological models (Koch et al., 2018). Specifically, since soil moisture differences tend to be consistent over time, low temporal resolutions of observations (weekly) can be sufficient. Therefore, our results favor spatially highly resolved observations at a few snapshots in time over spatially sparse soil moisture observations at high temporal resolution (hours). Thus, high-spatial-resolution soil moisture observations as provided by remote sensing products are valuable information to integrate in the models even if only the surface soil is detected (Barrett et al., 2009; Paloscia et al., 2013). Roving cosmic ray neutron sensing could be another promising tool for inferring soil moisture status and integrating it into hydrological models, also because the signal integrates over most of the root zone (Chrisman & Zreda, 2013; Dong & Ochsner, 2018; Schrön et al., 2018).

Finally, differences between the simulated soil moisture increase for the deeper soil layers. A shallow groundwater system is simulated by PfC in agreement with the catchment conditions (Schalge et al., 2016), which affects soil moisture in several parts of the catchment and influences its dynamics by decoupling it from the atmospheric conditions (precipitation and ET). In contrast, mHM does not explicitly simulate a groundwater system and the third soil layer is allowed to drain. For this reason, soil moisture simulated by mHM for deeper soil layers is more dynamic and is still strictly coupled to the atmospheric forcing. As this dynamic could also be relevant for capturing land surface fluxes (Kollet & Maxwell, 2008; Kroes et al., 2018), comparisons with groundwater observations are highly recommended, even when groundwater prediction is not intended, to identify limitations in other simulated states and land surface fluxes. Simple parametrizations to account for groundwater contributions to ET should, for instance, also be included in mHM to partially compensate for this unresolved hydrological process (Liu et al., 2006).

7 Conclusions

In this study a comprehensive approach to compare distributed model outputs at different spatial and temporal scales is presented. The approach is applied to two hydrological models of different complexity: mHM, a spatially distributed process-based model that simulates the main hydrological processes of the land surface, and ParFlow-CLM (PfC), an integrated subsurface-surface hydrological model that explicitly resolves the 3-D variably saturated subsurface-surface system based on the Richards equation and the land surface energy balance. Three years of simulations at a very high resolution (100 m) for a medium-sized catchment in Germany (2,362 km2) are used as the basis for the comparison.

The analysis shows a good agreement in simulating river discharge at the outlet of the catchment. This result is remarkable, considering that the models are not calibrated but only share the same input and forcing. The comparison conducted for the distributed states and fluxes simulated by the two models within the catchment shows, however, several differences depending on the variable. Specifically, ET simulated by the two models is in good agreement when, for example, monthly time periods are considered, but differences increase at short time scales and during precipitation events. Soil moisture dynamics for the surface soil layers are consistent between the models but large differences are found in the horizontal spatial distributions. The differences even increase at the deeper soil layers due to the shallow groundwater tables simulated by PfC.

The agreement between the models detected in the temporal domain supports our understanding of these processes and some specific model applications (e.g., drought analysis based on soil moisture anomalies). But the discrepancies also identify the need for model improvements and the observations required to assess the model performance. In particular, for ET, observations by eddy covariance stations at a few locations within the catchment should be used to assess and discriminate between models and to identify model improvements. Spatially highly resolved patterns provided, for example, by remote sensing products or cosmic ray neutron sensing rover observations should be used for understanding the description of the lateral fluxes. Finally, comparison with groundwater levels should be considered, even when groundwater prediction is not intended, to better understand possible interactions with other compartments. In contrast, the analysis shows that the more common observations used to develop and assess land surface and hydrological models are less informative for disentangling the differences identified in the present study, for example, remote sensing products available only in clear-sky conditions or spatially sparse, temporally highly resolved soil moisture measurements.

Overall, the method developed for the comparison between hydrological models of different complexity provides a practical approach for a first assessment of the models. An in-depth and multiscale comparison of models, as presented in this contribution, is not limited by available observations but, on the contrary, identifies new observations required to resolve model differences. We believe that this type of intercomparison may play a crucial role in strengthening the dialog between modelers and experimentalists by identifying conflicts in our understanding and description of the processes and supporting useful and new data-collecting strategies. For these reasons, we encourage the collaboration of different modeling communities to apply this type of intercomparison to other models and under different environmental conditions (e.g., semiarid regions) to identify strengths and weaknesses in our capability to simulate states and fluxes at different spatial and temporal scales and pave the way for improving the representation of hydrological processes in Earth system models.

Acknowledgments

The study was supported by Deutsche Forschungsgemeinschaft (DFG) under grants AT 102/9-2 and SI 606/29-2 in the framework of the research unit FOR 2131 “Data Assimilation for Improved Characterization of Fluxes across Compartmental Interfaces.” Computing time for mHM (~6 core hours) has been provided by the EVE supercomputing facility available at UFZ. Computing time for ParFlow-CLM (~6 million core hours) has been provided by the Gauss Centre for Supercomputing (http://www.gauss-centre.eu) and the facilities operated by the Jülich Supercomputing Centre (http://www.fz-juelich.de; Stephan & Docter, 2015). We kindly acknowledge our data providers: the European Environmental Agency, the Federal Institute for Geosciences and Natural Resources (BGR), and the Federal Agency for Cartography and Geodesy (BKG). Finally, we thank Lieke Melsen and an anonymous reviewer for the valuable comments that helped improve the manuscript. The comments provided by the Editor and Associate Editor are also highly appreciated.