Tradeoffs Between Temporal and Spatial Pattern Calibration and Their Impacts on Robustness and Transferability of Hydrologic Model Parameters to Ungauged Basins
Abstract
Optimization of spatially consistent parameter fields is believed to increase the robustness of parameter estimation and its transferability to ungauged basins. The current paper extends previous multi-objective and transferability studies by exploring the value of both multi-basin and spatial pattern calibration of distributed hydrologic models, as compared to single-basin and single-objective model calibrations, with respect to tradeoffs, performance, and transferability. The mesoscale Hydrological Model (mHM) is used across six large central European basins. Model simulations are evaluated against streamflow observations at the basin outlets and remotely sensed evapotranspiration patterns. Several model validation experiments are performed through combinations of single-objective (temporal evaluation through discharge) and multi-objective (temporal and spatial evaluation through discharge and spatial evapotranspiration patterns) calibrations, with holdout experiments saving alternating basins for model evaluation. The study shows that there are only minimal tradeoffs between spatial and temporal performance objectives and that a joint calibration of multiple basins using multiple objective functions provides the most robust estimations of parameter fields, which perform better when transferred to ungauged basins. The study indicates that the multi-basin calibration approach in particular is key for robust parametrizations, and that the addition of an objective function tailored for matching spatial patterns of ET fields alters the spatial parameter fields while significantly improving the spatial pattern performance without any tradeoff with discharge performance. In light of model equifinality, the minimal tradeoff between spatial and temporal performance shows that adding spatial pattern evaluation to the traditional temporal evaluation of hydrological models can assist in identifying optimal parameter sets.
Key Points
- Clear improvements in simulated spatial patterns can be achieved with very limited tradeoff with discharge performance
- Multi-basin and spatial pattern-based calibration improve parameter realism and transferability to ungauged basins
Plain Language Summary
Hydrological models typically require local observations of river flow to calibrate the models and test their predictive capability. This limits the possibility for predictions in ungauged basins. This study used holdout tests to investigate the robustness of hydrological predictions for ungauged basins. In particular, we investigate how adding more basins and observed spatial patterns of evapotranspiration to the calibration of these models affects the robustness and transferability of model parameters to basins not used for calibration. Results show that transferability and spatial consistency of parameters increase when adding more basins and spatial pattern observations.
1 Introduction
High-resolution distributed hydrological models are increasingly being employed to address and solve a broad range of water-related issues (Bierkens et al., 2015). Output from such models is used to analyze hydrological responses and climate impacts at a scale far below the spatial scale at which the models are calibrated and validated (Mizukami et al., 2017; Samaniego et al., 2017). Similarly, models are generalized based on parameterizations obtained from neighboring basins, which is particularly pertinent for efficiently managing ungauged basins (Hrachowitz et al., 2013). This poses two fundamental challenges: how do we improve the reliability of model simulations at a higher spatial resolution, and how do we develop robust parametrizations that can be transferred meaningfully to neighboring catchments (Fenicia et al., 2014; Kirchner, 2006; Kumar, Samaniego, & Attinger, 2013; Samaniego et al., 2010)?
Integration of satellite remote sensing data with distributed hydrological models has been a common path toward improving the reliability of hydrological model simulations (Dembélé et al., 2020). This development has followed the progress and availability of remotely sensed data sets, which have evolved significantly over the past decades, although the accuracy of satellite-based data sets remains variable (Ko et al., 2019; Stisen et al., 2021).
Several studies have addressed the impacts of adding remotely sensed observations to streamflow calibration (Odusanya et al., 2022; Rientjes et al., 2013; Sirisena et al., 2020). Nijzink et al. (2018) presented a large modeling effort illustrating the impact of adding different remotely sensed products across five different conceptual models and 27 European catchments. They analyzed 1,023 possible model combinations regarding model constraints and showed an added value of remotely sensed data in the absence of streamflow data. In a recent model intercomparison paper, Mei et al. (2023) analyzed different model calibration strategies combining streamflow and global gridded soil moisture and evapotranspiration data sets. They found that adding soil moisture to the streamflow calibration improved evapotranspiration performance. Mei et al. (2023) also included a review of 16 previous papers on the subject of constraining models using a combination of streamflow and remotely sensed data. Both Nijzink et al. (2018) and Mei et al. (2023), as well as 14 out of the 16 papers in the Mei et al. (2023) review, applied spatially averaged time series of the remotely sensed data. By this approach, the spatial information in the satellite data is ignored, and the hydrological model evaluation remains limited to the temporal component of the models. This traditional focus on temporal performance, defined as the comparison of simulated and observed time series, is often selected due to factors related to the spatial resolution of the models and the remote sensing data, or to the lack of spatial performance metrics and optimization frameworks. However, temporal model evaluation, whether in the form of metrics based on discharge records or on basin-average remote sensing observations, falls short of evaluating the spatially distributed states and fluxes simulated by distributed hydrological models. Inadequate evaluation of simulated spatial patterns becomes particularly problematic when distributed models are utilized for detailed spatial predictions, such as impacts of land cover change or changes in water management. If a model performs well on discharge at the outlet but misrepresents the relative contributions to streamflow from different regions or land covers, its relevance for predicting impacts of land cover changes will be low.
Therefore, a particular avenue for addressing the challenges of distributed model fidelity is the use of spatial pattern information from remote sensing data to constrain hydrological model parametrization (Dembélé et al., 2020; Demirel, Koch, Mendiguren, & Stisen, 2018; Demirel, Koch, & Stisen, 2018; Demirel, Mai, et al., 2018; Henriksen et al., 2022; Soltani, Bjerre, et al., 2021; Zink et al., 2018).
The fundamental idea behind this approach is to employ a multi-objective calibration framework that adds, to the traditional discharge-based calibration, an independent set of objective functions that mainly reflect the observed spatial patterns of key hydrological states or fluxes. This approach differs from multi-objective calibrations based on multiple metrics calculated from the same observation (e.g., streamflow time series) or on the application of basin-average time series of remotely sensed data (Demirel, Koch, Mendiguren, & Stisen, 2018; Demirel, Koch, & Stisen, 2018; Demirel, Mai, et al., 2018). In addition, independence in the optimization approach can be obtained by adding the new information source in combination with a Pareto-archiving optimizer, which circumvents the need to join multiple objective functions into a single score (Mei et al., 2023).
A previous study by Zink et al. (2018) incorporated land surface temperature patterns in model calibration and showed that this helped to better constrain the model parameters connected to evapotranspiration when compared to calibrations based on streamflow only. Moreover, in their study the model performance regarding evapotranspiration increased at the seven eddy flux measurement sites used for evaluation. Adding new constraints to the calibration decreased streamflow performance, yet the authors illustrated how land surface temperature data could secure better results for ungauged basins. For a single Danish basin, Demirel, Mai, et al. (2018) developed a spatial pattern-oriented calibration framework and a new spatial performance metric, and illustrated a small tradeoff between streamflow and spatial pattern performance. Dembélé et al. (2020) applied a similar calibration framework to a model study of the poorly gauged Volta River basin in West Africa. They showed that while streamflow and terrestrial water storage performance decreased by 7% and 6%, respectively, soil moisture and evapotranspiration performances increased by 105% and 26%, respectively, when including the spatial calibration framework with multiple objectives. Soltani, Bjerre, et al. (2021) illustrated how adding spatial pattern optimization to a national-scale groundwater model improved evapotranspiration patterns and altered groundwater recharge patterns without significantly deteriorating groundwater head and discharge performance. Other recent studies, such as Xiao et al. (2022) and Ko et al. (2019), have utilized spatial patterns of land surface temperature for hydrological model evaluation. However, in the context of our current study, Xiao et al. (2022) and Ko et al. (2019) did not address the tradeoffs between different optimization strategies and streamflow performance.
As a result of the increased availability of remotely sensed data sets, combined with machine learning approaches and computational power, many gridded spatial products are now available (Belgiu & Drăguţ, 2016; Feigl et al., 2022). Despite the varying accuracy of both satellite data and machine learning approaches, these gridded data sets facilitate the spatial characterization of hydrologic variables and fluxes and enable spatial model evaluations. However, to optimize the simulated spatial patterns of a hydrological model, the model parametrization scheme needs to be fully distributed and spatially flexible. In this context, the multi-scale parameter regionalization (MPR) method (Samaniego et al., 2010) represented a significant advancement; it was initially included in the mesoscale Hydrological Model (mHM) (Kumar, Samaniego, & Attinger, 2013; Samaniego et al., 2010). It has since been incorporated into several other modeling frameworks (Lane et al., 2021; Mizukami et al., 2017; Tangdamrongsub et al., 2017) and is available as a stand-alone parametrization tool that can be coupled to hydrological models (Schweppe et al., 2022). Other studies have developed similar flexible parametrization schemes based on pedo-transfer functions using gridded data (Feigl et al., 2020; Ko et al., 2019; Soltani, Bjerre, et al., 2021).
Based on these developments, the current study addresses the following research questions:

- What are the tradeoffs between temporal and spatial model performance when investigated in a Pareto-archiving optimization framework?
- How does multi-basin and spatial pattern-oriented calibration impact model performance and transferability to ungauged basins?
In this study, we demonstrate the impact of multi-site and multi-objective calibration compared to single-site and single-objective parameter estimation, the most common practice in hydrologic modeling, specifically in the context of parameter transferability to ungauged basins. The single- versus multi-objective comparison specifically addresses temporal versus spatial model evaluation. The impact on parameter transferability of adding spatial patterns to the model calibration is a novel aspect that has not received much attention in the literature so far. The study is conducted for six mesoscale central European basins.
The distributed modeling study is carried out in the framework of a flexible spatial model parameterization scheme in combination with observed spatial patterns of actual evapotranspiration (AET) derived from satellite data. We apply the mHM model code since it suits the calibration framework well, due to its flexible parametrization schemes based on pedo-transfer functions to distribute soil parameters and its built-in multi-scale parameter regionalization.
We design a set of model calibration experiments including both single- and multi-basin as well as single- and multi-objective calibrations, plus two jack-knife experiments, that is, sequentially keeping one or five of the six basins out of the joint calibration. Model simulations are evaluated based on temporal discharge performance and spatial AET performance using long-term average monthly pattern maps and appropriate objective functions, and a global multi-objective Pareto-archiving search algorithm is applied to illustrate the exact tradeoff between the two objectives.
2 Methodology
Catchments, observed data (both Section 2.1), and the hydrologic model (Section 2.2) are presented in this section. The objective functions to evaluate the model performance of simulated discharge and AET, respectively, are described in Section 2.3. A sensitivity analysis performed to determine the most important parameters for model calibration is described in Section 2.4, while Section 2.5 describes the calibration and validation setup including a brief description of the multi-objective calibration algorithm applied.
2.1 Catchments and Hydro-Meteorological Data
This study is conducted using six European catchments, that is, Elbe, Main, Meuse, Mosel, Neckar, and Vienne, with drainage areas varying from 12,775 to 95,042 km2. The catchments are spread over Central Europe and represent a diversity of soil texture, land use, and land cover. The mean annual rainfall varies from 637 to 874 mm, while the mean annual runoff varies from 184 to 398 mm (Table 1). The six catchments are selected based on two criteria: good model performance obtained in previous studies (Rakovec, Kumar, Mai, et al., 2016) and spatial patterns of AET that are likely dominated by land-surface heterogeneity, that is, land cover and soil properties, rather than a strong climate gradient. The latter facilitates a meaningful model calibration driven by spatial patterns, since simulated patterns can then be adjusted through the surface parametrization within the hydrological model and are not purely driven by climate (Koch et al., 2022). In basins with a large climate gradient, spatial patterns are typically easier to reproduce even with a suboptimal spatial parametrization, since the patterns are to a lesser degree controlled by the model parameters and will display correct overall patterns enforced by the climate forcing data.
Table 1. Catchment Area and Mean Annual Precipitation (P), Reference ET (ETref), and Runoff (Q)

| Basin | Area (km2) | P (mm) | ETref (mm) | Q (mm) |
|---|---|---|---|---|
| Elbe | 95,042 | 637 | 755 | 184 |
| Main | 14,117 | 736 | 773 | 243 |
| Meuse | 20,143 | 874 | 741 | 398 |
| Mosel | 27,127 | 872 | 777 | 365 |
| Neckar | 12,775 | 858 | 782 | 335 |
| Vienne | 19,892 | 815 | 864 | 308 |
Average temperature, precipitation, and reference ET (ETref) data are available at daily time steps and over 0.25° grids for the period 1980–2018, whereas the length of the observed daily discharge data varies between catchments. Daily averaged meteorological data (P and ETref) were obtained from the E-OBS and ERA-5 reanalysis data sets (Cornes et al., 2018; Hersbach et al., 2020). Reference ET was estimated with the Hargreaves–Samani model using ERA-5 air temperature data (daily minimum, maximum, and mean) as input (Hargreaves & Samani, 1985). The Hargreaves–Samani model is a parsimonious option with low data demand and reasonable accuracy (Pohl et al., 2023) and was chosen based on previous experiments showing that different formulations of PET/ETref could be applied with no significant impact on the simulations regarding capturing the inter-annual variability across different European regions (Rakovec, Kumar, Mai, et al., 2016).
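For reference, the Hargreaves–Samani formulation computes reference ET as ETref = 0.0023 Ra (Tmean + 17.8) (Tmax − Tmin)^0.5, with the extraterrestrial radiation Ra expressed as equivalent evaporation (mm/day). The following is a minimal Python sketch with hypothetical inputs; it is not part of our model chain, which derives ETref internally from the ERA-5 fields:

```python
import numpy as np

def hargreaves_samani(tmin, tmax, tmean, ra):
    """Reference ET (mm/day) after Hargreaves & Samani (1985).

    tmin/tmax/tmean: daily air temperatures (deg C); ra: extraterrestrial
    radiation as equivalent evaporation (mm/day), i.e., 0.408 * Ra[MJ/m2/day].
    """
    return 0.0023 * ra * (tmean + 17.8) * np.sqrt(np.maximum(tmax - tmin, 0.0))

# Hypothetical mid-latitude summer day
print(hargreaves_samani(tmin=12.0, tmax=24.0, tmean=18.0, ra=16.0))  # ~4.6 mm/day
```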
In addition to the six outlet discharge gauges used in model calibration, we obtained daily data from 46 gauging stations from the Global Runoff Data Center (GRDC, 2023) for internal validation of the six catchment models. Remotely sensed AET estimates for the period 2002–2014 were obtained using MODIS data and the two-source energy balance method (Norman et al., 1995), as described in Stisen et al. (2021). Digital elevation model (DEM) data were retrieved from the Shuttle Radar Topographic Mission (SRTM; Farr et al., 2007). Soil texture variables (clay content, sand content, and bulk density) were derived from the SoilGrids database (Rakovec, Kumar, Attinger, et al., 2016; Rakovec et al., 2019). The soil texture data for six layers with varying depths (5, 15, 30, 50, 100, and 200 cm) and a tillage depth of 30 cm are introduced as input to the model. All input data were resampled to a common spatial resolution of 0.001953125° (∼200 m) (Rakovec, Kumar, Mai, et al., 2016). A MODIS-based land use map was reclassified into three classes, namely forest, pervious, and impervious. Long-term monthly leaf area index (LAI) maps, used to calculate the spatiotemporally varying crop coefficient, are based on the MODIS MOD16A2.v061 product (Running et al., 2021). The original 8-day composite LAI maps were aggregated to long-term monthly means sampled at a matching spatial resolution of ∼200 m.
2.2 Hydrologic Model
The first transfer function links the crop coefficient (Kc), used to scale reference ET, to the long-term LAI maps described in Section 2.1, such that PET varies with vegetation (Demirel, Mai, et al., 2018). A second transfer function utilizes spatially distributed soil texture maps to incorporate soil physical properties in the spatial parametrization of root fraction coefficients (Demirel, Koch, Mendiguren, & Stisen, 2018; Demirel, Koch, & Stisen, 2018; Demirel, Mai, et al., 2018). Hereby, the root fraction coefficient can vary with both vegetation and soil type and is used in mHM to calculate root water uptake as part of the AET reduction factor (Samaniego et al., 2021a, 2021b). During calibration, both transfer functions increase the flexibility of the model to adjust the simulated spatial AET patterns toward those retrieved from satellite data (Demirel, Koch, Mendiguren, & Stisen, 2018; Demirel, Koch, & Stisen, 2018; Demirel, Mai, et al., 2018; Koch et al., 2022). Finally, the total runoff generated at every grid cell is routed to its neighboring downstream cell using the adaptive-timestep, spatially varying celerity method for the river runoff routing scheme (Thober et al., 2019).
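To illustrate the first transfer function, the sketch below assumes the exponential LAI-based Kc scaling suggested by the Table 2 parameters (Kc,min, Kc,max, and the exponent coefficient a), that is, Kc(LAI) = Kc,min + Kc,max(1 − e^(a·LAI)); the exact form implemented in mHM may differ in detail, and the parameter values are illustrative mid-range picks, not calibrated values:

```python
import numpy as np

def dynamic_crop_coefficient(lai, kc_min=0.8, kc_max=0.6, a=-0.5):
    """Assumed LAI-based dynamic Kc scaling (cf. Demirel, Mai, et al., 2018):
    Kc(LAI) = kc_min + kc_max * (1 - exp(a * LAI)), with a <= 0 so that
    Kc increases with leaf area and saturates for dense canopies."""
    return kc_min + kc_max * (1.0 - np.exp(a * lai))

lai = np.array([0.5, 2.0, 5.0])  # sparse vegetation, cropland, forest
et_ref = 3.0                     # mm/day reference ET (hypothetical)
print(dynamic_crop_coefficient(lai) * et_ref)  # PET rises with LAI
```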
mHM has previously been parameterized and successfully calibrated against multiple satellite-based data sets including AET and terrestrial water storage anomalies (GRACE), land surface temperature, and soil moisture at multiple spatial scales over numerous river basins (Busari et al., 2021; Ekmekcioğlu et al., 2022; Koch et al., 2022; Kumar, Livneh, & Samaniego, 2013; Rakovec, Kumar, Attinger, et al., 2016; Rakovec, Kumar, Mai, et al., 2016; Zink et al., 2018).
In this study, the following four different spatial resolutions are defined in the mHM model: 0.001953125° for the morphological characteristics (L0 scale), 0.015625° for the hydrologic modeling resolution (L1), 0.0625° for runoff routing (L11), and 0.25° for the meteorological forcing (L2). Note that around 50°N these resolutions correspond approximately to 140/220 m, 1.1/1.7 km, 4.5/7 km, and 18/28 km lon/lat for L0, L1, L11, and L2, respectively. Finally, a 13-year period (2002–2014) with a 4-year warm-up period (1998–2001) was simulated at a daily timestep for calibration and evaluation of the discharge performance and the spatial pattern match between remote-sensing-based and simulated AET (2002–2014). The remote-sensing-based AET is estimated with the two-source energy balance method (TSEB) (Norman et al., 1995), using MODIS data including land surface temperature, albedo, and NDVI. For a full description of the AET data set and a comparison to other estimates for Europe, the reader is referred to Stisen et al. (2021).
2.3 Evaluation Metrics and Objective Functions
For both the KGE and SPAEF metrics, we combine individual stations and seasons into single SSR OFs to limit the number of independent OFs (one for KGE and one for SPAEF). It is only feasible to work with a few OFs (2–3) because the number of model evaluations required to define the Pareto front, and to interpret tradeoffs, increases drastically with the number of OFs.
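For reference, the two metrics follow their standard definitions; the final aggregation into SSR OFs is our shorthand reading of the combination across stations and seasons described above:

$$\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}, \qquad \alpha = \frac{\sigma_{\mathrm{sim}}}{\sigma_{\mathrm{obs}}}, \quad \beta = \frac{\mu_{\mathrm{sim}}}{\mu_{\mathrm{obs}}},$$

where $r$ is the Pearson correlation between simulated and observed discharge (this is the decomposition referred to as Equation 2 in Sections 3.2 and 3.3), and

$$\mathrm{SPAEF} = 1 - \sqrt{(\rho-1)^2 + (\gamma-1)^2 + (\delta-1)^2},$$

where $\rho$ is the Pearson correlation between the observed and simulated AET patterns, $\gamma = (\sigma_{\mathrm{sim}}/\mu_{\mathrm{sim}})/(\sigma_{\mathrm{obs}}/\mu_{\mathrm{obs}})$ is the ratio of coefficients of variation, and $\delta$ is the overlap of the z-score histograms of the two patterns. The per-station and per-season scores are then assumed to be combined as

$$\mathrm{SSR}_{Q} = \sum_{i=1}^{n_{\mathrm{stations}}} \left(1-\mathrm{KGE}_i\right)^2, \qquad \mathrm{SSR}_{AET} = \sum_{s=1}^{n_{\mathrm{seasons}}} \left(1-\mathrm{SPAEF}_s\right)^2.$$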
2.4 Sensitivity Analysis
Identification of the optimal parameter set through a calibration framework can be cumbersome if the dimension of the search space is not first limited by a sensitivity analysis. mHM has 69 parameters (Samaniego et al., 2021a, 2021b), each increasing the dimension of the search space. Focusing a calibration on parameters that are sensitive with regard to the selected OFs is computationally more efficient than calibrating all parameters (Demirel, Koch, Mendiguren, & Stisen, 2018). To reduce the computational burden by narrowing the number of calibration parameters, a one-at-a-time (OAT) sensitivity analysis was conducted to identify the most important parameters for calibration, using the PEST Toolbox (Doherty, 2010). Although parameter interactions are not accounted for in this local OAT method, it provides an indication of sensitive parameters, especially when combined with expert opinion, which can complement the assessment of parameter interactions.
The sensitivity analysis was based on an initial parameter set obtained from a previous calibration against KGE for the same model setup across Europe (Rakovec & Kumar, 2022), although with a different parameterization scheme for Kc and the root fraction distribution. For the new parametrization, we used parameters from a few test runs combined with experience from previous studies (Demirel, Koch, Mendiguren, & Stisen, 2018; Demirel, Koch, & Stisen, 2018; Demirel, Mai, et al., 2018). We used KGE as the OF for discharge performance and SPAEF as the OF for spatial pattern performance, and the initial parameter set gave reasonable performance on both OFs. The purpose of the sensitivity analysis was to reduce the number of free parameters to make the optimization more efficient and to ensure that parameters important for either KGE or SPAEF were selected. In addition, the new parameters introduced for the Kc and root fraction distribution were included in the analysis. Some parameters, such as routing parameters, have no impact on the spatial pattern performance, whereas other parameters related to the spatial parameter distribution have limited impact on KGE. Each parameter was perturbed two times (5% increase and 5% decrease from the initial point) to calculate the average sensitivity index of the OFs for the change in the parameter value. This index value is then multiplied by the absolute parameter value to account for the parameter magnitude. Finally, the sensitivities are normalized by the maximum of the group. The analysis is solely used to select parameters for optimization; the sensitivity does not carry over to the optimization, since the DDS algorithm performs its parameter search independently of the initial sensitivity analysis. For all subsequent calibration tests, the same parameters were selected, and as such each calibration experiment optimizes the same parameters using the same parameter intervals.
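The index computation described above can be sketched as follows; `run_model` is a hypothetical stand-in for a full mHM evaluation of one OF, and the exact scaling used by the PEST Toolbox may differ in detail:

```python
import numpy as np

def oat_sensitivity(run_model, p0, eps=0.05):
    """One-at-a-time sensitivity sketch of the procedure described above.

    run_model: hypothetical interface mapping a parameter vector to an
    objective-function value (e.g., KGE or SPAEF); p0: initial parameter
    vector from the previous Pan-European calibration.
    """
    p0 = np.asarray(p0, dtype=float)
    base = run_model(p0)
    s = np.zeros(p0.size)
    for i in range(p0.size):
        up, dn = p0.copy(), p0.copy()
        up[i] *= 1.0 + eps   # +5% perturbation
        dn[i] *= 1.0 - eps   # -5% perturbation
        # average OF response per unit parameter change ...
        dphi = 0.5 * (abs(run_model(up) - base) + abs(run_model(dn) - base))
        index = dphi / (eps * abs(p0[i]) + 1e-12)
        # ... scaled by |parameter| to account for parameter magnitude
        s[i] = index * abs(p0[i])
    return 100.0 * s / s.max()  # normalized to the group maximum (%)
```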
2.5 Experimental Design of Calibration and Validation
In total, 26 calibration experiments were designed to investigate the potential benefits of incorporating AET to augment a multi-objective and multi-basin calibration framework (Figure 1). Note that SSRQ is incorporated as an objective function in all calibration experiments, while SSRAET is used only in the KSP1, KSP5, and KSP6 calibration experiments. KSP stands for the KGE-and-SPAEF multi-OF calibration, whereas KGE stands for the KGE-only single-OF calibration. The indices 1, 5, and 6 indicate the number of basins included in the calibration experiments, as conceptualized in Figure 1. In this study, we did not include an AET-only scenario, as it failed to reproduce reasonable water balances both in our preliminary tests and in previous studies (Demirel, Koch, Mendiguren, & Stisen, 2018; Demirel, Koch, & Stisen, 2018; Demirel, Mai, et al., 2018).
All 26 calibration experiments (cases) were performed with the open-source, model-agnostic Ostrich optimization toolbox written in C++ (Matott, 2017). For all 26 calibration experiments, the parallel implementation of the Pareto-Archived Dynamically Dimensioned Search (ParaPADDS) algorithm was used (Asadzadeh & Tolson, 2013). This algorithm is the multi-objective version of the Dynamically Dimensioned Search algorithm (Tolson & Shoemaker, 2007) that identifies a Pareto front of non-dominated optimal solutions, which is most appropriate for our multi-objective calibrations (Beume & Rudolph, 2006; Razavi & Tolson, 2013). Moreover, the ParaPADDS algorithm reached reasonable solutions for both single and multiple OFs; therefore, we used the same search algorithm in all scenarios for consistency. The ParaPADDS algorithm was configured with a user-defined maximum of 750 iterations, 3 parallel nodes (logical processors), a perturbation value of 0.2, and the exact hypervolume contribution as the selection criterion. Initial tests for one basin with 200, 500, and 1,000 iterations indicated stable results already at 500 iterations, but a somewhat incomplete Pareto front; based on this, and in the interest of saving computation time, we decided on 750 iterations. Like all multi-objective calibration methods, the algorithm does not provide a single best solution for the multiple-OF problem. Still, it offers the modeler a set of possible solutions on the Pareto front (Asadzadeh & Tolson, 2013).
KGE1 and KGE6 calibrations resulted in the single best parameter set, which was used to create our final results in the following figures. KSP1 and KSP6 calibrations provided multiple possible solutions on the Pareto front, with KGE as one axis and SPAEF as the other. To systematically select a best-balanced parameter set, we picked the solution closest to the origin by normalizing both axes (SSRQ and SSRAET) using min-max normalization and choosing the minimum of the sums, similar to the approach of Martinsen et al. (2022). The normalization is applied to avoid metric-magnitude effects on the selection. KSP1 and KSP6 results presented hereafter are generated using this selected single parameter set. Calibrations were done with the six discharge gauges and three seasonal AET maps (March–November). We used 46 discharge stations from GRDC for internal validation of the six catchment models, and we show the results of the KGE5 and KSP5 cases as maps (see Figure 6).
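The selection of the best-balanced solution is straightforward to express; the sketch below assumes the Pareto front is given as an array of (SSRQ, SSRAET) pairs:

```python
import numpy as np

def best_balanced(pareto):
    """Pick the best-balanced solution from a Pareto front, as described
    above: min-max normalize both objective axes (SSR_Q, SSR_AET) and take
    the point with the smallest sum, i.e., closest to the origin
    (cf. Martinsen et al., 2022). pareto: (n, 2) array of OF values."""
    p = np.asarray(pareto, dtype=float)
    norm = (p - p.min(axis=0)) / (p.max(axis=0) - p.min(axis=0))
    return int(np.argmin(norm.sum(axis=1)))

front = np.array([[0.10, 0.90], [0.30, 0.40], [0.80, 0.10]])
print(best_balanced(front))  # -> 1, the compromise solution
```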
3 Results
3.1 Sensitivity Analysis
Table 2 shows the 20 most influential parameters out of the 69 mHM parameters, selected based on the combined sensitivity of the two metrics. We used the normalized sensitivities, varying from 0% to 100%, and applied a threshold of 1% for at least one of the OFs to select the 20 most sensitive parameters for calibration. Based on KGE, the five most sensitive parameters controlling discharge are RotFrCofClay, RotFrCofFore, PTFLowConst, Kc,min_pervi, and PTFKsConst, which are parameters mainly controlling AET and thereby the water balance. KGE is also sensitive to some routing parameters, but generally less than to the parameters controlling AET levels. The SPAEF OF is most sensitive to the parameters RotFrCofClay, RotFrCofFore, Kc,min_pervi, and Kc,min_forest, which are almost identical to the most sensitive parameters for KGE. Additionally, parameters associated with simulated patterns, for example, related to pedo-transfer functions for soil properties, are important for SPAEF. Conversely, SPAEF has zero sensitivity to routing parameters. Overall, the most sensitive parameters contribute to the spatial heterogeneity of root fraction coefficients, crop coefficients, infiltration factor, and field capacities of the grid cells.
Table 2. The 20 Most Sensitive mHM Parameters, Their Ranges, and Normalized Sensitivities (%) With Respect to KGE and SPAEF

| Parameter abbreviation (in the mHM namelist) | Description | Range | KGE | SPAEF |
|---|---|---|---|---|
| ExpSlwIntFlW | Exponent slow interflow | 0.05–0.3 | 3.6 | 0.0 |
| InfShapeF | Infiltration shape factor | 1–4 | 2.6 | 1.2 |
| IntRecesSlp | Interflow recession slope | 0–10 | 1.2 | 0.0 |
| Kc,min_forest | Kc,min for forest in dynamic scaling function (Kc) for PET | 0.3–1.3 | 10.1 | 19.1 |
| Kc,min_impervi | Kc,min for impervious in dynamic scaling function (Kc) for PET | 0.3–1.3 | 0.5 | 2.7 |
| Kc,min_pervi | Kc,min for pervious in dynamic scaling function (Kc) for PET | 0.3–1.3 | 15.3 | 15.2 |
| Kc,max | Vegetation-scaled Kc component | 0–1.5 | 3.7 | 2.2 |
| a | Exponent coefficient for Kc | −2 to 0 | 1.6 | 0.9 |
| PTFHigConst | Constant in pedo-transfer function for soils with sand content higher than 66.5% | 0.5358–1.1232 | 0.3 | 1.1 |
| PTFKsConst | Constant in pedo-transfer function for hydraulic conductivity of soils with sand content higher than 66.5% | −1.2 to −0.285 | 11.7 | 0.5 |
| PTFKssand | Coefficient for sand content in pedo-transfer function for hydraulic conductivity | 0.006–0.026 | 3.5 | 0.3 |
| PTFLowclay | Constant in pedo-transfer function for soils with clay content lower than 66.5% | 0.0001–0.0029 | 1.5 | 1.5 |
| PTFLowConst | Constant in pedo-transfer function for soils with sand content lower than 66.5% | 0.6462–0.9506 | 21.1 | 13.9 |
| PTFLowDb | Coefficient for bulk density in pedo-transfer function for soils with sand content lower than 66.5% | −0.3727 to −0.1871 | 10.9 | 8.7 |
| RechargCoef | Recharge coefficient | 0–50 | 1.9 | 0.0 |
| RotFrCofClay | Root fraction for clay | 0.9–0.999 | 100.0 | 100.0 |
| RotFrCofFore | Root fraction for forest areas | 0.9–0.999 | 57.2 | 75.9 |
| RotFrCofImp | Root fraction for impervious areas | 0.9–0.999 | 2.1 | 9.3 |
| RotFrCofSand | Root fraction for sand | 0.001–0.09 | 1.9 | 2.3 |
| SlwIntReceKs | Slow interflow recession constant (Ks) | 1–30 | 1.4 | 0.0 |

Note. The KGE and SPAEF columns give the normalized sensitivity (%). The parameter abbreviations correspond to the parameter names in the mHM namelist.
3.2 Calibration Results
Figure 2 shows the model calibration results for single-basin calibrations using single (KGE1) and multi-objective functions (KSP1). Each calibration is performed using 750 model runs distributed over three parallel processors, where non-dominated runs forming a Pareto front are identified by the ParaPADDS algorithm (Asadzadeh & Tolson, 2013). A solution is called non-dominated if there is no other solution that is better in all objectives analyzed. Although the calibrations do not depict a clear Pareto front, due to the combined plotting of KSP1 and KGE1, the tradeoff between discharge-only and spatial performance is clearly distinguishable from the plots. KGE-only (KGE1) calibrations lead to slightly better KGE performance but much poorer spatial AET pattern performance than the multi-objective calibrations (KSP1). However, the KSP1 calibrations make it possible to identify a more balanced solution, with higher SPAEF performance and only slightly poorer KGE performance than KGE1's single solution for each basin (shown as a triangle). While it is well known that good KGE performance does not guarantee good spatial pattern performance, it is a novel finding that there is a very limited tradeoff between the temporal and spatial performance of the models.
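For clarity, the non-domination test behind the Pareto fronts can be sketched as below (assuming minimization of both SSR OFs; the function is illustrative and not part of the Ostrich toolbox):

```python
import numpy as np

def pareto_front(objs):
    """Return indices of non-dominated solutions (minimization in all
    objectives), matching the definition above: a solution is dominated
    if another solution is at least as good in every objective and
    strictly better in one. objs: (n_runs, n_ofs) array of OF values."""
    objs = np.asarray(objs, dtype=float)
    keep = []
    for i, x in enumerate(objs):
        dominated = np.any(np.all(objs <= x, axis=1) & np.any(objs < x, axis=1))
        if not dominated:
            keep.append(i)
    return keep

runs = np.array([[0.2, 0.9], [0.4, 0.3], [0.5, 0.5], [0.9, 0.1]])
print(pareto_front(runs))  # -> [0, 1, 3]; run 2 is dominated by run 1
```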
Table 3 shows the KGE performance for the best-balanced solutions from KGE1 and KSP1 along with the SPAEF calculated across all six basins when combining the six single basin calibrations. SPAEF is calculated across all basins, since we are not interested in individual basin SPAEF values for the joint evaluation, but the spatial pattern similarity across all basins. Generally, all calibrations can lead to KGE performances in the range of 0.84–0.96 as shown in Table 3.
Table 3. Calibration Performance: KGE per Basin and SPAEF Across Basins for the Four Calibration Experiments

| Basin | KGE1 | KSP1 | KGE6 | KSP6 |
|---|---|---|---|---|
| Elbe KGE | 0.89 | 0.84 | 0.87 | 0.84 |
| Main KGE | 0.94 | 0.91 | 0.84 | 0.84 |
| Meuse KGE | 0.96 | 0.93 | 0.91 | 0.93 |
| Mosel KGE | 0.96 | 0.91 | 0.90 | 0.90 |
| Neckar KGE | 0.94 | 0.92 | 0.90 | 0.89 |
| Vienne KGE | 0.91 | 0.90 | 0.85 | 0.86 |
| Average KGE | 0.93 (0.02) | 0.90 (0.03) | 0.88 (0.03) | 0.88 (0.04) |
| Across-basins SPAEF | −0.45 (0.18) | 0.61 (0.06) | 0.02 (0.40) | 0.61 (0.10) |

Note. Values for the KSP calibrations represent the best-balanced solutions from the Pareto fronts. Values in parentheses are standard deviations: across stations for average KGE and across seasons for SPAEF.
Subsequently, a multi-basin calibration was conducted, again with both single (KGE6) and multiple (KSP6) objectives (see Section 2.5 for details). The results are shown in Figure 3 and Table 3. The model performance results mimic those of the single-basin test, with similar KGE performances, but with a significant performance increase for SPAEF, from 0.02 with KGE6 to 0.61 with KSP6, as would be expected when adding the spatial pattern objective function. Table 3 highlights the limited tradeoff for KGE, both for individual stations and for averages.
Figure 4 illustrates the spatial AET maps from TSEB (observed) and the various calibration tests. For the multi-objective calibrations (KSP1 and KSP6), the best-balanced solution (the point closest to the origin) is chosen for visualization. The maps clearly show the issues related to KGE1 regarding spatial pattern performance. For three out of six basins, that is, Elbe, Mosel, and Vienne, the KGE1 calibration resulted in strikingly poor spatial AET patterns (compared to KSP1), where distinct low and high AET areas were inverted relative to the TSEB pattern. In contrast, including the SPAEF metric in the optimization (KSP1) prevented such errors without any substantial loss in KGE performance (average KGE of 0.93 for KGE1 and 0.90 for KSP1, Table 3).
Interestingly, the KGE6 calibration, that is, without any spatial pattern constraint, was able to represent the overall pattern to some extent across the six basins, although with a significantly underestimated variance and some substantial differences. This emphasizes the value of joint multi-basin calibration for robustness in spatial parametrization within the MPR parametrization scheme. Adding the SPAEF metric to the multi-basin calibration (KSP6) generated the best spatial similarity to TSEB, although not better than combining the spatial AET maps from the six individual KSP1 calibrations into one map (Figure 4 and Table 3). Comparing the KGE1 and KGE6 calibrations illustrates the reduction in KGE performance, from an average of 0.93 to 0.88, when seeking one common parametrization in KGE6. While the sampling uncertainty (0.01–0.03) in KGE scores (see Appendix A) is typically lower than this change in KGE performance (0.05), the two are of a similar magnitude. Analysis of the sampling uncertainty suggests that when moving from the KGE1 to the KGE6 calibration approach, the uncertainties remain the same but are centered around lower KGE values.
The higher KGE performance obtained from single-basin optimization does, however, come with a very poor SPAEF performance of −0.45 for KGE1, compared to 0.02 for KGE6. Although the SPAEF for KGE6 is also low, this is mainly attributed to the variance component of SPAEF (Figure 4).
Even though the model performance of simulated spatial patterns across the six basins shares some similarities for KSP6 and KSP1, there is a marked difference between the parameter distributions that generate the spatial AET patterns. This is shown in Figure 5, displaying the resulting parameter fields of field capacity and crop coefficient, which are calculated in mHM and represent the key controls of the AET simulations. The field capacity and crop coefficients are not parameters assigned directly in mHM but are the results of several transfer function parameters. Therefore, field capacity and crop coefficient are not included in Table 2, which lists the transfer parameters that generate them (Kc* and PTF*). Although the KSP1 calibrations generate parameter distributions with meaningful patterns of field capacity within each basin, they fail to form one consistent, seamless parametrization across the basins (Figure 5). Similarly, for the KGE1 and KGE6 calibrations, the spatial inconsistency resulting from single-basin calibration becomes apparent. For field capacity (Figure 5), the parametrizations obtained from KGE6 and KSP6 are relatively similar, although KSP6 results in a slightly larger variance. A different picture emerges for the crop coefficient, where KSP1 generates patterns similar to KSP6. At the same time, KGE6 produces very different patterns, with unreasonably high values for urban areas and little impact of vegetation patterns on crop coefficients. This difference is due to the crop coefficient parameter pattern mainly being constrained by the SPAEF OF, while the KGE OF on discharge also constrains the field capacity parameter.
3.3 Cross-Validation Results
To investigate the potential impact of the calibration strategy on the transferability of parameters to ungauged basins, two jack-knife tests were applied. The two tests hold out either five (KGE1-KSP1) or one (KGE5-KSP5) of the six basins at a time and evaluate only the uncalibrated basins, using parameters obtained by calibrating the remaining one or five basins. These tests are performed for both single- and multi-objective calibrations, resulting in four parameter transfer tests.
Results for the single-basin calibrations and the subsequent evaluation of parameter transfer to five ungauged basins, based on the KGE1 and KSP1 calibrations, are shown in Table 4. For each discharge evaluation, KGE is calculated as the average across all basins, each represented in five holdout evaluations (a total of 30 ungauged evaluations). The SPAEF is calculated based on three seasons for six holdouts (a total of 18 pattern evaluations). Table 4 shows average discharge performances (KGE) of 0.79 and 0.83 across ungauged basins for KGE1 and KSP1, respectively; the two are similar, although the latter performs better. Compared to the KGE6 and KSP6 calibrations (both with an average KGE of 0.88 in Table 3), relatively little loss in performance for discharge is noticed, even for ungauged cases.
Table 4. Holdout (Ungauged) Performance: KGE per Basin and SPAEF Across Basins for the Four Parameter Transfer Tests

| Basin | KGE1 holdout | KSP1 holdout | KGE5 holdout | KSP5 holdout |
|---|---|---|---|---|
| Elbe KGE | 0.72 (0.05) | 0.76 (0.06) | 0.83 | 0.84 |
| Main KGE | 0.76 (0.09) | 0.77 (0.04) | 0.81 | 0.80 |
| Meuse KGE | 0.84 (0.08) | 0.91 (0.03) | 0.89 | 0.94 |
| Mosel KGE | 0.85 (0.03) | 0.88 (0.04) | 0.88 | 0.87 |
| Neckar KGE | 0.82 (0.08) | 0.87 (0.03) | 0.89 | 0.90 |
| Vienne KGE | 0.74 (0.05) | 0.79 (0.11) | 0.79 | 0.83 |
| Average KGE | 0.79 (0.08) | 0.83 (0.08) | 0.85 (0.04) | 0.86 (0.05) |
| Across-basins SPAEF | −0.10 (0.45) | 0.41 (0.19) | 0.25 (0.23) | 0.49 (0.15) |

Note. Values for the KSP calibrations represent the best-balanced solutions from the Pareto fronts. For KGE1 and KSP1, KGE values are averages across five holdout experiments. Values in parentheses are standard deviations: across holdout experiments for single stations; across stations and holdout solutions for average KGE; and across seasons and holdout solutions for SPAEF.
For the spatial pattern evaluation, the KGE1 parameter transfer yields a low average SPAEF across all basins, with large standard deviations across seasons. For KSP1, the SPAEF results are much better, with an average of 0.41. This indicates that single-basin calibration with multiple objectives can make more robust predictions for ungauged basins when both discharge and AET patterns are considered in calibration at gauged locations.
The single-basin holdout evaluation based on the KGE5 and KSP5 calibrations (Table 4) shows that discharge performances (averages of 0.85 and 0.86) are better than those of the five-basin holdout (KGE1 and KSP1) and very similar to the KGE6 and KSP6 calibrations. Again, the multi-objective calibrations seem more robust for parameter transfer when evaluated against discharge only. For the SPAEF performance evaluation, KGE5 performs better than KGE1, indicating better parameter transfer when calibrating against a larger and more diverse set of basins. However, spatial pattern performances are still considerably better for the ungauged assessment based on multiple objectives in KSP5. Also, KSP5 (SPAEF around 0.5) performs better than KSP1 (SPAEF around 0.4).
In summary, the four ungauged basin tests indicate that discharge can be predicted with average KGEs around 0.79–0.83 across the six selected basins based on parameter transfer from calibration of neighboring basins, even when only a single basin is used to estimate parameters for five neighboring basins. Performances on discharge improve further when including an additional objective function in the form of AET patterns and when calibrating across five basins and evaluating on a single holdout basin. Similarly, spatial patterns can be simulated with average SPAEF values of 0.41 and 0.49, that is, somewhat lower than KSP6 at 0.61, when only accounting for AET patterns from neighboring basins in the parameter estimation. In contrast, spatial patterns are very poorly represented when parameters are based on single-basin and single-objective calibrations (KGE1).
In addition to the jack-knifing validation for ungauged basins, a validation test for internal discharge stations was performed for the KGE5 and KSP5 holdout (ungauged) simulations. This test was intended to analyze the possible added value of spatial pattern calibration for internal discharge station performance compared to a pure discharge calibration. The spatial pattern calibration, based on the bias-insensitive SPAEF metric, does not add a temporal constraint to the model optimization and will not directly influence the temporal performance of the simulated downstream discharge. However, the SPAEF-based calibration will alter the spatial pattern of AET and thereby the internal water balances within the larger basins. Therefore, the internal validation focuses on the discharge bias (Equation 2; β term) alone and not on the full KGE, in an attempt to quantify a possible reduction of simulated streamflow biases at internal validation points.
Since spatial patterns of AET are only included for the period March–November, they are likely to mainly influence the summer water balance, where AET has the most impact. Hence, annual and summer statistics are estimated separately. Figure 6 illustrates the location of the 46 internal discharge stations and the difference in absolute bias (%) between the ungauged simulations from the KGE5 and KSP5 holdout experiments. For the annual statistics (Figure 6, top panel), results are very similar (same average bias), and most stations have differences between plus and minus 10%. For the Meuse basin, significant improvements in the bias can be detected for KSP5, while KGE5 tends to be better for the Elbe basin. For the summer statistics (Figure 6, bottom panel), KSP5 has a slightly lower average bias, with considerable improvements for the Meuse and Vienne. At the same time, differences for the Elbe basin are more polarized, with some stations better for KGE5 and others for KSP5. Overall, the analysis did not show a clear improvement in biases when constraining the models with spatial patterns in the holdout test. When analyzing KGE and the α and r terms of KGE (Equation 2), the KGE-only calibrations performed best for internal station validation in the holdout test. This is illustrated by Figure B1 in the supplementary information section, which shows results for both KGE and its three components.
The model performances presented in this study should be evaluated in light of the uncertainties associated with them. One aspect of this uncertainty is the sampling uncertainty associated with the KGE metric (Clark et al., 2021). The sampling uncertainty represents the uncertainty related to the time window used for the KGE calculation, since the KGE metric is sensitive to the variance of the evaluation period. This uncertainty can be significant and is important especially when evaluating the applicability of a given model for a particular purpose. Even though it is less important for the comparison of different calibration experiments based on the same evaluation periods, the uncertainties associated with each of the evaluation stations used in the study are given in Tables A1 and A2 in Appendix A. The uncertainties are estimated based on the method described in Clark et al. (2021) and vary between stations but are largely correlated between calibration experiments.
4 Discussion
The single- (temporal) versus multi-objective (temporal and spatial) calibration experiment presented here illustrated a minimal tradeoff in discharge performance when adding the spatial pattern-oriented metric to the traditional KGE objective function (Figure 2). This result is very similar to previous studies (Demirel, Koch, Mendiguren, & Stisen, 2018; Demirel, Koch, & Stisen, 2018; Demirel, Mai, et al., 2018; Kumar, Samaniego, & Attinger, 2013; Rakovec, Kumar, Mai, et al., 2016; Soltani, Bjerre, et al., 2021; Zink et al., 2018) and can be attributed to two main factors. First, the metric design, with a long-term average, bias-insensitive spatial pattern metric, introduces limited conflict with matching the discharge biases and no conflict with the temporal dynamics of the discharge simulations. Second, single-objective calibrations based on downstream discharge only are known to constrain the spatial distribution of internal fluxes to a minimal extent (Stisen et al., 2011), causing a high degree of equifinality. Consequently, the addition of a spatial pattern metric can be viewed as a means of selecting the best spatial pattern match among an extensive set of plausible parameter sets (all producing satisfying KGEs). These results on objective function selection are consistent for both the single-basin and multi-basin tests (with six basins, Figure 3). Not surprisingly, it also becomes evident that a good discharge performance (KGE) does not guarantee a good spatial pattern performance.
In light of the low tradeoff for discharge, the single-basin versus multi-basin calibration results are best analyzed by comparing the spatial patterns of AET and the resulting parameter fields. Here, it becomes clear that single-basin, single-objective (temporal) calibration can select parameter sets that are entirely inconsistent between the basins (Figure 5) and display internal spatial AET patterns that are the reverse of the observed patterns (Figure 4). Interestingly, the multi-basin KGE calibration (KGE6) shows that simply adding multiple basins in this case enables the model to obtain a somewhat realistic spatial pattern without being constrained specifically to AET. However, the spatial metric must be included to improve this pattern and its spatial variability (KSP6). Logically, one joint calibration (KGE6 and KSP6) also ensures a spatially consistent parameter field (Figure 5) and thereby spatially consistent AET patterns (Figure 4). This point has previously been highlighted by Samaniego et al. (2017), who illustrated the shortcomings in producing seamless parameter fields based on multiple single-basin calibrations without parameter regionalization across Europe. Eventually, the goal of regional- to continental-scale distributed hydrologic modeling is to produce scalable spatial patterns of all states and fluxes across the entire model domain.
Moving on to the spatial holdout experiments, first with single-basin calibrations (five holdouts) and later with multi-basin calibrations (single holdouts), the parameter transfer to "ungauged" basins results in average KGE values between 0.79 and 0.86, even when transferring parameters from a single basin to five neighboring basins.
For these holdout experiments, the mean KGE for ungauged basins lies around 0.8 (Table 4), compared to 0.88 for the multi-basin calibrations (KGE6 and KSP6 in Table 3). This is probably a result of the considerable similarity between the basins and their relatively large size, all of them encompassing a range of land use, soil texture, and climate conditions. Also, the six basins were chosen because they all fulfilled the criteria of a similar climate and topography and good performance in a previous Pan-European modeling context (Rakovec, Kumar, Mai, et al., 2016). In this context, the robustness of parameter transferability might be overestimated compared to basins with less similarity.
Other studies have analyzed parameter transferability and KGE performance drop by spatial validation in ungauged basins. A recent and very relevant example is the model intercomparison paper by Mai et al. (2022). They explicitly performed a spatial validation test against basins not included in the calibration for a range of different model codes over the Great Lakes region in North America. They reported average loss in KGE of around 0.26 for locally calibrated models using a simple parameter transfer scheme and a loss of 0.10 KGE for regionally calibrated models. In comparison, our study reports a loss of KGE of 0.14 for the KGE1 holdout, 0.07 for KSP1 and 0.03 and 0.02 for the KGE6 and KSP6 holdouts (evaluated through the KGE5 and KSP5 performances). It is assumed that a simpler parameter transfer scheme will result in a greater performance loss when testing against basins not included during calibration. In addition, the basins used in our study are quite similar regarding climate and topography, which might not have been the case in other studies. In order to truly compare different holdout experiments regarding performance loss, some accounting for basin similarity and possibly parameter transfer schemes would be recommended.
For the parameter transfer, the experiments including AET during calibration (KSP1 and KSP5) produce better spatial patterns (SPAEF 0.41 and 0.49) when combining ungauged basins, as compared to the KGE-only calibrations (SPAEF −0.10 and 0.25); however, KGE5 produced better patterns than KGE1. This is in line with the results of Poméon et al. (2018), who calibrated sparsely gauged basins using remote sensing products. Their study showed that including AET in the model calibration significantly improved the performance of the evapotranspiration simulation, whereas soil moisture and total water storage predictions were within a good predictive range.
In this study, SPAEF is used as the evaluation metric for spatial pattern performance; however, other metrics could have been utilized, and further investigations covering other regions, model codes, and spatial observation data should be conducted to gain experience in interpreting the SPAEF metric and to benchmark spatial pattern performances of distributed models.
The internal validation against 46 discharge stations was intended to evaluate whether adding spatial patterns to the calibration would improve the discharge bias performance within each basin. Somewhat surprisingly and discouragingly, such a systematic bias improvement could not be verified. A previous study by Conradt et al. (2013) on the Elbe basin revealed large discrepancies between water balance AET (precipitation minus discharge) and remote sensing-based AET at the sub-basin level. This could indicate that sub-basin water balances are in some cases largely controlled by factors other than AET, such as water diversion, abstraction, or inter-basin groundwater flow (Le Mesnil et al., 2020; Soltani, Koch, et al., 2021). Wan et al. (2015) showed that inter-basin transfer of water can cause significant errors in water balance-based AET calculations. Alternatively, the accuracy of the satellite-based AET might not be sufficient to describe differences at the sub-basin level. Recent analyses using the AET data set applied in this study have demonstrated that remote sensing-based AET can reproduce large-scale AET patterns across major European basins (>25,000 km2) (Stisen et al., 2021), while studies like Conradt et al. (2013) and Stisen et al. (2021) indicate substantial deviations for smaller sub-basins (below 200–500 km2).
5 Conclusions
The need for systematically transferring parameters to ungauged basins while respecting their landscape heterogeneity and water balance motivated us to expand our previous single-basin experiments (Demirel, Koch, Mendiguren, & Stisen, 2018; Demirel, Koch, & Stisen, 2018; Demirel, Mai, et al., 2018) to a regional-scale study. In this study, we elaborated on the value of multi-basin, multi-objective model calibration for distributed hydrologic modelers, incorporating readily available global remote sensing data in flexible open-source models with cutting-edge parameter regionalization schemes like the multi-scale parameter regionalization in mHM. Through this approach, our single- versus multi-objective calibration schemes represented purely temporal evaluations (KGE) versus combined temporal and spatial pattern evaluations (KGE and SPAEF).
We first selected the most relevant parameters for spatial calibration using a sensitivity analysis. Then, remotely sensed AET based on the two-source energy balance approach was used together with outlet discharge time series to constrain the mHM simulations. Through a series of calibration and cross-validation experiments, we identified tradeoffs between objective functions representing temporal and spatial model performance and examined the robustness of parameter transferability to ungauged basins.
- Multi-objective calibrations including both temporal and spatial evaluation metrics, for both individual and multiple basins, resulted in balanced solutions with better spatio-temporal performance than single-objective calibrations focusing on temporal performance alone. Adding constraints on spatial patterns led to only a very limited deterioration in discharge performance while improving the model predictions of actual evapotranspiration, illustrating a small tradeoff between temporal and spatial model performance.
- Combining multi-basin and multi-objective calibration has positive impacts on the simulated fluxes and improves the spatial consistency of parameter fields and their transferability to ungauged basins. Multi-basin calibration is found to be the most crucial element of robust parametrizations if only focusing on discharge. However, adding spatial pattern objectives further ensures spatial consistency, performance, and transferability.
Improved model parametrizations in distributed hydrologic models via different transfer functions, in combination with appropriate spatial calibration frameworks, could facilitate the application of global hyper-resolution models "everywhere" (Bierkens et al., 2015) and "without an illogical (unseamless) patchwork of states and fluxes" (Mizukami et al., 2017) in the future. Future work should incorporate more than six basins and spatial patterns of other variables readily available from reliable satellite products.
Acknowledgments
We acknowledge the financial support for the SPACE project by the Villum Foundation (http://villumfonden.dk/) through their Young Investigator Program (Grant VKR023443). The first author is supported by the Scientific Research Projects Department of Istanbul Technical University (ITU-BAP) under Grant MDA-2022-43762, by the National Center for High Performance Computing of Turkey (UHeM) under Grant 1007292019, and by the NASA program NNH22ZDA001N-RRNES: A.24 Rapid Response and Novel Research in Earth Science under Grant 22-RRNES22-0010.
Conflict of Interest
The authors declare no conflicts of interest relevant to this study.
Appendix A: Results of the Jackknife and Bootstrap Based Sampling Uncertainty Analysis
Clark et al. (2021) showed that popular temporal metrics in hydrology, that is, NSE and KGE, are often subject to inevitable sampling uncertainty. This is because differences between observed and simulated streamflow values at individual time steps can have significant effects on the overall metric value (Knoben & Spieler, 2022). Therefore, we assessed the sampling uncertainty in the KGE results of the KGE1, KGE6, KSP1, and KSP6 cases presented in Table 3, using the gumboot R package (Clark et al., 2021), which utilizes the jackknife-after-bootstrap method of Efron (1992) to estimate standard errors (SEJaB). Note that this has been done for all 46 + 6 (validation + calibration) stations listed in Table A1. Uncertainty is represented as a confidence interval, that is, the 5th to 95th percentile of the bootstrap samples. A correlation analysis of the SEJaB scores across all 52 stations gives an R2 of 0.68 for KGE1 versus KSP1 and 0.76 for KGE6 versus KSP6. This indicates that the uncertainties are largely related to the specific stations and to the variance and error structure of the hydrograph (Table A2).
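A conceptual sketch of the percentile-bootstrap part of this analysis is shown below; the actual analysis used the gumboot R package (Clark et al., 2021), which resamples whole years and adds the jackknife-after-bootstrap standard error, whereas this simplified sketch resamples individual paired values:

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency (the Equation 2 decomposition)."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

def bootstrap_kge_ci(sim, obs, n_boot=1000, seed=0):
    """5th-95th percentile of the bootstrap KGE distribution, obtained by
    resampling paired sim/obs values with replacement (conceptual only)."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    rng = np.random.default_rng(seed)
    n = obs.size
    scores = [kge(sim[idx], obs[idx])
              for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    return np.percentile(scores, [5, 95])
```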
Table A1. Bootstrap percentiles (p05, p50, p95), KGE score, and jackknife-after-bootstrap standard error (SEJaB) of the KGE1 and KSP1 cases for all 52 stations
GRDC station | Basin | KGE1 p05 | KGE1 p50 | KGE1 p95 | KGE1 score | KGE1 SEJaB | KSP1 p05 | KSP1 p50 | KSP1 p95 | KSP1 score | KSP1 SEJaB |
---|---|---|---|---|---|---|---|---|---|---|---|
6340180 | Elbe | 0.843 | 0.883 | 0.901 | 0.894 | 0.012 | 0.807 | 0.854 | 0.876 | 0.864 | 0.022 |
6340130 | Elbe | 0.711 | 0.775 | 0.860 | 0.780 | 0.036 | 0.628 | 0.697 | 0.796 | 0.701 | 0.037 |
6340170 | Elbe | 0.766 | 0.841 | 0.914 | 0.844 | 0.042 | 0.700 | 0.775 | 0.856 | 0.777 | 0.023 |
6340300 | Elbe | 0.251 | 0.389 | 0.559 | 0.392 | 0.135 | 0.155 | 0.305 | 0.487 | 0.311 | 0.173 |
6340190 | Elbe | 0.699 | 0.764 | 0.853 | 0.768 | 0.042 | 0.596 | 0.669 | 0.776 | 0.673 | 0.039 |
6340600 | Elbe | 0.654 | 0.709 | 0.799 | 0.714 | 0.049 | 0.619 | 0.694 | 0.813 | 0.699 | 0.057 |
6340700 | Elbe | 0.057 | 0.357 | 0.622 | 0.374 | 0.219 | 0.026 | 0.329 | 0.557 | 0.356 | 0.139 |
6340200 | Elbe | 0.027 | 0.166 | 0.329 | 0.169 | 0.097 | −0.122 | 0.026 | 0.187 | 0.032 | 0.121 |
6340320 | Elbe | 0.485 | 0.630 | 0.754 | 0.629 | 0.092 | 0.565 | 0.705 | 0.819 | 0.709 | 0.085 |
6340365 | Elbe | 0.245 | 0.402 | 0.505 | 0.405 | 0.083 | −0.144 | 0.099 | 0.233 | 0.101 | 0.120 |
6340620 | Elbe | 0.627 | 0.698 | 0.819 | 0.700 | 0.047 | 0.604 | 0.700 | 0.863 | 0.704 | 0.069 |
6340120 | Elbe | 0.694 | 0.760 | 0.856 | 0.766 | 0.054 | 0.588 | 0.659 | 0.770 | 0.663 | 0.053 |
6340630 | Elbe | 0.490 | 0.531 | 0.596 | 0.534 | 0.030 | 0.372 | 0.422 | 0.497 | 0.423 | 0.047 |
6140400 | Elbe | 0.664 | 0.732 | 0.834 | 0.738 | 0.067 | 0.558 | 0.634 | 0.750 | 0.638 | 0.054 |
6340621 | Elbe | 0.618 | 0.679 | 0.767 | 0.679 | 0.034 | 0.539 | 0.641 | 0.799 | 0.645 | 0.053 |
6140500 | Elbe | 0.615 | 0.653 | 0.693 | 0.655 | 0.021 | 0.505 | 0.559 | 0.616 | 0.560 | 0.016 |
6140481 | Elbe | 0.772 | 0.823 | 0.857 | 0.825 | 0.020 | 0.703 | 0.750 | 0.792 | 0.752 | 0.017 |
6140600 | Elbe | 0.343 | 0.459 | 0.605 | 0.458 | 0.051 | 0.532 | 0.638 | 0.755 | 0.637 | 0.058 |
6140250 | Elbe | 0.381 | 0.504 | 0.663 | 0.504 | 0.084 | 0.266 | 0.378 | 0.527 | 0.375 | 0.060 |
6140450 | Elbe | 0.225 | 0.351 | 0.456 | 0.354 | 0.056 | 0.237 | 0.326 | 0.407 | 0.325 | 0.015 |
6140300 | Elbe | 0.595 | 0.643 | 0.743 | 0.646 | 0.068 | 0.366 | 0.436 | 0.579 | 0.437 | 0.087 |
6340302 | Elbe | 0.808 | 0.846 | 0.875 | 0.855 | 0.024 | 0.719 | 0.780 | 0.837 | 0.784 | 0.027 |
6335500 | Main | 0.904 | 0.929 | 0.944 | 0.939 | 0.010 | 0.889 | 0.913 | 0.930 | 0.921 | 0.007 |
6335301 | Main | 0.905 | 0.932 | 0.945 | 0.941 | 0.016 | 0.900 | 0.924 | 0.938 | 0.932 | 0.015 |
6335303 | Main | 0.902 | 0.926 | 0.941 | 0.931 | 0.010 | 0.893 | 0.922 | 0.937 | 0.925 | 0.025 |
6335530 | Main | 0.678 | 0.743 | 0.802 | 0.739 | 0.035 | 0.666 | 0.736 | 0.779 | 0.732 | 0.055 |
6335800 | Main | 0.719 | 0.756 | 0.794 | 0.762 | 0.018 | 0.704 | 0.746 | 0.789 | 0.751 | 0.019 |
6421101 | Meuse | 0.923 | 0.946 | 0.958 | 0.956 | 0.008 | 0.899 | 0.924 | 0.936 | 0.932 | 0.008 |
6221500 | Meuse | 0.829 | 0.861 | 0.896 | 0.872 | 0.020 | 0.785 | 0.826 | 0.856 | 0.835 | 0.022 |
6221680 | Meuse | 0.835 | 0.890 | 0.929 | 0.895 | 0.029 | 0.653 | 0.715 | 0.780 | 0.720 | 0.040 |
6221102 | Meuse | 0.689 | 0.748 | 0.798 | 0.754 | 0.034 | 0.648 | 0.720 | 0.772 | 0.724 | 0.049 |
6121240 | Meuse | 0.056 | 0.284 | 0.488 | 0.273 | 0.052 | −0.120 | 0.150 | 0.412 | 0.134 | 0.099 |
6221550 | Meuse | 0.804 | 0.827 | 0.841 | 0.838 | 0.016 | 0.791 | 0.846 | 0.881 | 0.858 | 0.029 |
6221120 | Meuse | 0.782 | 0.862 | 0.897 | 0.874 | 0.046 | 0.716 | 0.804 | 0.845 | 0.819 | 0.039 |
6221620 | Meuse | 0.341 | 0.421 | 0.488 | 0.420 | 0.050 | 0.340 | 0.446 | 0.536 | 0.445 | 0.064 |
6221200 | Meuse | 0.744 | 0.828 | 0.896 | 0.832 | 0.037 | 0.630 | 0.719 | 0.799 | 0.721 | 0.042 |
6336050 | Mosel | 0.921 | 0.951 | 0.964 | 0.960 | 0.008 | 0.892 | 0.932 | 0.949 | 0.943 | 0.025 |
6336500 | Mosel | 0.885 | 0.930 | 0.954 | 0.935 | 0.027 | 0.872 | 0.911 | 0.933 | 0.920 | 0.026 |
6336800 | Mosel | 0.690 | 0.769 | 0.865 | 0.774 | 0.055 | 0.721 | 0.807 | 0.882 | 0.816 | 0.037 |
6336900 | Mosel | 0.782 | 0.838 | 0.882 | 0.832 | 0.036 | 0.780 | 0.851 | 0.909 | 0.845 | 0.035 |
6336920 | Mosel | 0.042 | 0.207 | 0.370 | 0.209 | 0.051 | 0.252 | 0.407 | 0.541 | 0.403 | 0.096 |
6336910 | Mosel | 0.743 | 0.791 | 0.840 | 0.793 | 0.020 | 0.719 | 0.794 | 0.848 | 0.786 | 0.048 |
6136200 | Mosel | 0.405 | 0.548 | 0.703 | 0.557 | 0.062 | 0.214 | 0.385 | 0.570 | 0.395 | 0.063 |
6335600 | Neckar | 0.891 | 0.931 | 0.948 | 0.942 | 0.016 | 0.872 | 0.911 | 0.926 | 0.921 | 0.014 |
6335601 | Neckar | 0.863 | 0.911 | 0.927 | 0.919 | 0.022 | 0.841 | 0.887 | 0.903 | 0.896 | 0.017 |
6335602 | Neckar | 0.689 | 0.752 | 0.830 | 0.756 | 0.034 | 0.640 | 0.710 | 0.796 | 0.714 | 0.042 |
6335660 | Neckar | 0.733 | 0.819 | 0.860 | 0.825 | 0.042 | 0.745 | 0.802 | 0.830 | 0.807 | 0.045 |
6335291 | Neckar | 0.699 | 0.754 | 0.792 | 0.756 | 0.027 | 0.774 | 0.810 | 0.834 | 0.814 | 0.020 |
6335690 | Neckar | 0.454 | 0.512 | 0.580 | 0.517 | 0.042 | 0.462 | 0.528 | 0.606 | 0.533 | 0.050 |
6123400 | Vienne | 0.832 | 0.892 | 0.916 | 0.913 | 0.018 | 0.814 | 0.882 | 0.906 | 0.899 | 0.025 |
6123450 | Vienne | 0.072 | 0.279 | 0.457 | 0.286 | 0.080 | 0.328 | 0.528 | 0.663 | 0.531 | 0.100 |
6123820 | Vienne | 0.617 | 0.803 | 0.863 | 0.802 | 0.151 | 0.684 | 0.806 | 0.845 | 0.825 | 0.091 |
Note. Rows in bold indicate the six downstream stations used for calibration.
Table A2. Bootstrap percentiles (p05, p50, p95), KGE score, and jackknife-after-bootstrap standard error (SEJaB) of the KGE6 and KSP6 cases for all 52 stations
GRDC station | Basin | KGE6 p05 | KGE6 p50 | KGE6 p95 | KGE6 score | KGE6 SEJaB | KSP6 p05 | KSP6 p50 | KSP6 p95 | KSP6 score | KSP6 SEJaB |
---|---|---|---|---|---|---|---|---|---|---|---|
6340180 | Elbe | 0.824 | 0.857 | 0.877 | 0.865 | 0.011 | 0.782 | 0.820 | 0.844 | 0.828 | 0.021 |
6340130 | Elbe | 0.688 | 0.769 | 0.861 | 0.770 | 0.045 | 0.694 | 0.758 | 0.848 | 0.761 | 0.043 |
6340170 | Elbe | 0.774 | 0.834 | 0.903 | 0.838 | 0.021 | 0.749 | 0.818 | 0.891 | 0.821 | 0.029 |
6340300 | Elbe | 0.197 | 0.332 | 0.473 | 0.332 | 0.118 | 0.021 | 0.185 | 0.351 | 0.188 | 0.149 |
6340190 | Elbe | 0.676 | 0.762 | 0.868 | 0.763 | 0.049 | 0.678 | 0.748 | 0.854 | 0.751 | 0.049 |
6340600 | Elbe | 0.698 | 0.754 | 0.841 | 0.758 | 0.043 | 0.667 | 0.723 | 0.813 | 0.728 | 0.047 |
6340700 | Elbe | 0.118 | 0.428 | 0.682 | 0.435 | 0.262 | −0.065 | 0.245 | 0.531 | 0.255 | 0.187 |
6340200 | Elbe | −0.020 | 0.110 | 0.236 | 0.113 | 0.103 | −0.266 | −0.092 | 0.072 | −0.089 | 0.123 |
6340320 | Elbe | 0.434 | 0.578 | 0.700 | 0.577 | 0.076 | 0.429 | 0.574 | 0.700 | 0.578 | 0.091 |
6340365 | Elbe | 0.101 | 0.263 | 0.356 | 0.268 | 0.101 | −0.361 | −0.030 | 0.138 | −0.031 | 0.146 |
6340620 | Elbe | 0.658 | 0.734 | 0.833 | 0.735 | 0.035 | 0.637 | 0.715 | 0.839 | 0.717 | 0.063 |
6340120 | Elbe | 0.675 | 0.759 | 0.870 | 0.761 | 0.059 | 0.670 | 0.738 | 0.849 | 0.743 | 0.066 |
6340630 | Elbe | 0.517 | 0.558 | 0.621 | 0.560 | 0.039 | 0.413 | 0.455 | 0.519 | 0.457 | 0.052 |
6140400 | Elbe | 0.642 | 0.732 | 0.851 | 0.733 | 0.059 | 0.642 | 0.716 | 0.833 | 0.720 | 0.071 |
6340621 | Elbe | 0.662 | 0.731 | 0.822 | 0.732 | 0.030 | 0.571 | 0.660 | 0.813 | 0.664 | 0.063 |
6140500 | Elbe | 0.649 | 0.688 | 0.729 | 0.691 | 0.021 | 0.540 | 0.588 | 0.642 | 0.590 | 0.022 |
6140481 | Elbe | 0.822 | 0.862 | 0.883 | 0.864 | 0.020 | 0.761 | 0.805 | 0.843 | 0.808 | 0.014 |
6140600 | Elbe | 0.514 | 0.610 | 0.732 | 0.608 | 0.048 | 0.607 | 0.702 | 0.785 | 0.706 | 0.054 |
6140250 | Elbe | 0.309 | 0.459 | 0.640 | 0.458 | 0.080 | 0.357 | 0.466 | 0.621 | 0.466 | 0.068 |
6140450 | Elbe | 0.054 | 0.157 | 0.246 | 0.154 | 0.025 | −0.001 | 0.097 | 0.189 | 0.095 | 0.024 |
6140300 | Elbe | 0.540 | 0.616 | 0.767 | 0.617 | 0.078 | 0.446 | 0.510 | 0.630 | 0.512 | 0.068 |
6340302 | Elbe | 0.783 | 0.833 | 0.866 | 0.841 | 0.024 | 0.769 | 0.813 | 0.852 | 0.823 | 0.026 |
6335500 | Main | 0.807 | 0.842 | 0.877 | 0.843 | 0.019 | 0.824 | 0.860 | 0.893 | 0.862 | 0.021 |
6335301 | Main | 0.797 | 0.838 | 0.882 | 0.839 | 0.018 | 0.820 | 0.860 | 0.896 | 0.861 | 0.020 |
6335303 | Main | 0.777 | 0.818 | 0.858 | 0.816 | 0.023 | 0.806 | 0.845 | 0.884 | 0.843 | 0.022 |
6335530 | Main | 0.525 | 0.591 | 0.657 | 0.589 | 0.033 | 0.561 | 0.633 | 0.697 | 0.629 | 0.033 |
6335800 | Main | 0.832 | 0.867 | 0.903 | 0.873 | 0.019 | 0.781 | 0.835 | 0.883 | 0.838 | 0.019 |
6421101 | Meuse | 0.871 | 0.908 | 0.939 | 0.911 | 0.017 | 0.874 | 0.914 | 0.943 | 0.918 | 0.021 |
6221500 | Meuse | 0.777 | 0.817 | 0.850 | 0.823 | 0.021 | 0.824 | 0.853 | 0.875 | 0.862 | 0.012 |
6221680 | Meuse | 0.764 | 0.826 | 0.884 | 0.831 | 0.033 | 0.891 | 0.923 | 0.936 | 0.933 | 0.010 |
6221102 | Meuse | 0.748 | 0.809 | 0.868 | 0.815 | 0.028 | 0.749 | 0.805 | 0.864 | 0.814 | 0.028 |
6121240 | Meuse | 0.174 | 0.386 | 0.581 | 0.372 | 0.049 | 0.123 | 0.335 | 0.534 | 0.323 | 0.054 |
6221550 | Meuse | 0.770 | 0.795 | 0.812 | 0.803 | 0.010 | 0.786 | 0.817 | 0.839 | 0.829 | 0.016 |
6221120 | Meuse | 0.737 | 0.812 | 0.857 | 0.824 | 0.042 | 0.762 | 0.808 | 0.840 | 0.818 | 0.029 |
6221620 | Meuse | 0.397 | 0.473 | 0.535 | 0.471 | 0.045 | 0.341 | 0.414 | 0.474 | 0.412 | 0.043 |
6221200 | Meuse | 0.690 | 0.774 | 0.852 | 0.778 | 0.036 | 0.784 | 0.850 | 0.895 | 0.855 | 0.037 |
6336050 | Mosel | 0.845 | 0.894 | 0.921 | 0.897 | 0.029 | 0.837 | 0.891 | 0.923 | 0.895 | 0.027 |
6336500 | Mosel | 0.833 | 0.893 | 0.934 | 0.896 | 0.026 | 0.828 | 0.893 | 0.931 | 0.896 | 0.023 |
6336800 | Mosel | 0.672 | 0.764 | 0.870 | 0.771 | 0.059 | 0.674 | 0.769 | 0.872 | 0.777 | 0.057 |
6336900 | Mosel | 0.820 | 0.860 | 0.891 | 0.859 | 0.027 | 0.796 | 0.845 | 0.879 | 0.843 | 0.036 |
6336920 | Mosel | 0.422 | 0.527 | 0.623 | 0.528 | 0.032 | 0.453 | 0.565 | 0.662 | 0.566 | 0.046 |
6336910 | Mosel | 0.702 | 0.741 | 0.778 | 0.741 | 0.012 | 0.627 | 0.690 | 0.741 | 0.684 | 0.028 |
6136200 | Mosel | 0.268 | 0.418 | 0.585 | 0.427 | 0.071 | 0.156 | 0.314 | 0.493 | 0.322 | 0.068 |
6335600 | Neckar | 0.863 | 0.893 | 0.907 | 0.902 | 0.011 | 0.851 | 0.885 | 0.904 | 0.895 | 0.012 |
6335601 | Neckar | 0.831 | 0.868 | 0.883 | 0.877 | 0.016 | 0.816 | 0.859 | 0.880 | 0.868 | 0.017 |
6335602 | Neckar | 0.652 | 0.727 | 0.817 | 0.731 | 0.042 | 0.616 | 0.694 | 0.794 | 0.699 | 0.043 |
6335660 | Neckar | 0.767 | 0.852 | 0.898 | 0.848 | 0.082 | 0.795 | 0.834 | 0.867 | 0.844 | 0.013 |
6335291 | Neckar | 0.702 | 0.735 | 0.759 | 0.734 | 0.016 | 0.747 | 0.788 | 0.812 | 0.790 | 0.017 |
6335690 | Neckar | 0.445 | 0.503 | 0.581 | 0.510 | 0.057 | 0.433 | 0.495 | 0.576 | 0.501 | 0.054 |
6123400 | Vienne | 0.750 | 0.848 | 0.922 | 0.852 | 0.036 | 0.747 | 0.850 | 0.904 | 0.853 | 0.049 |
6123450 | Vienne | 0.475 | 0.623 | 0.703 | 0.631 | 0.092 | 0.547 | 0.660 | 0.708 | 0.679 | 0.087 |
6123820 | Vienne | 0.676 | 0.801 | 0.844 | 0.810 | 0.061 | 0.616 | 0.764 | 0.838 | 0.766 | 0.103 |
Note. Stations in bold are the six downstream stations used for calibration.
Appendix B: Validation Against 46 Internal Discharge Stations
Results of the validation against the 46 internal discharge stations, for KGE and its three components r, α, and β, are displayed in Figure B1 as ranked scores for the KGE5 and KSP5 holdout experiments.
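As a hedged illustration of how such ranked-score curves can be assembled, the sketch below computes KGE and its three components at each internal station and sorts each metric independently across stations; the pairing of observed and simulated series per station is hypothetical.

```python
import numpy as np

def ranked_scores(station_pairs):
    """station_pairs: list of (obs, sim) discharge arrays, one per internal station."""
    comps = {"KGE": [], "r": [], "alpha": [], "beta": []}
    for obs, sim in station_pairs:
        r = np.corrcoef(obs, sim)[0, 1]
        alpha = np.std(sim) / np.std(obs)
        beta = np.mean(sim) / np.mean(obs)
        kge = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
        for key, val in zip(("KGE", "r", "alpha", "beta"), (kge, r, alpha, beta)):
            comps[key].append(val)
    # Sort each metric independently to obtain one ranked curve per panel
    return {key: np.sort(vals) for key, vals in comps.items()}
```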
Open Research
Data Availability Statement
Discharge gauge data were retrieved from the GRDC data portal using the Download by Station option (GRDC, 2023). The gauge ID list is also provided (Tables A1 and A2), and the stations can be selected from the map canvas of the same GRDC data portal. The MODIS MOD16A2 v061 product was retrieved from MODIS (2023). SRTM DEM data were retrieved from NASA JPL (2023). The source code of mHM is publicly available from Samaniego et al. (2021a, 2021b). The source code of the SPAEF metric is publicly available from Demirel, Koch, and Stisen (2018). The source code to quantify sampling uncertainty in performance metrics (the gumboot package) is available from Clark and Shook (2020). The model calibration software Ostrich is available from USBR (2022). The data for the mHM model simulations are publicly available at Demirel and Stisen (2022).