Seeking genericity in the selection of parameter sets: Impact on hydrological model efficiency
Abstract
This paper evaluates the use of a small number of generalist parameter sets as an alternative to classical calibration. Here parameter sets are considered generalist when they yield acceptable performance on a large number of catchments. We tested the genericity of an initial collection of 10^6 parameter sets sampled in the parameter space for the four‐parameter GR4J rainfall‐runoff model. A short list of 27 generalist parameter sets was obtained as a good compromise between model efficiency and length of the short list. A different data set was used for an independent evaluation of a calibration procedure, in which the search for an optimum parameter set is only allowed within this short list. In validation mode, the performance obtained is inferior to that of a classical calibration, but when the amount of data available for calibration is reduced, the generalist parameter sets become progressively more competitive, with better results for calibration series shorter than 1 year.
1 Introduction
1.1 The Ideal of a Calibration‐Free World
Most hydrological models can only be used after their free parameters have been calibrated against observed data. Although most hydrologists have accepted this fact, it would be much more satisfying—from a scientific point of view—to have model parameters univocally derived from physical catchment characteristics. We would then live in a calibration‐free world.
Instead of this ideal world, the real‐world modeling practice remains dependent on calibration, a domain where we have been trained to hunt for optimal parameter sets in an n‐dimension hyperspace (n being the number of free parameters in the hydrological model). This of course causes problems in the case of ungaged catchments, for which calibration data are not available [Bárdossy, 2007; Hrachowitz et al., 2013; Oudin et al., 2008; Parajka et al., 2013]. However, calibration problems are not restricted to ungaged catchments: on gaged catchments, calibration is subject to numerous pitfalls that may result in miscalibration, i.e., when the parameter search fails to identify the mathematical optimum, and/or overcalibration, i.e., when, although the mathematically optimum parameter set has been identified over the calibration period, it does not remain mathematically optimum over different periods (see Andréassian et al. [2012] for a more detailed discussion on this topic).
In this context, we would like to reduce the dependence on classical model calibration. Here we wish to test the challenge of trading (as little as possible of) model efficiency for (as much as possible of) parameter genericity (i.e., the capacity to be not specific to a unique catchment).
1.2 Why Should We be Willing to Trade Efficiency for Genericity?
- Ungaged catchments: dealing with ungaged locations would be much easier if a small number of generalist parameter sets were available for any model. The recent IAHS‐led decade on prediction in ungaged basins has intensified the efforts on this topic [Blöschl et al., 2013; Hrachowitz et al., 2013] but has also shown that the issue is far from having found a unique satisfying solution.
- Robustness: naturally, we would all agree on trading efficiency in calibration for robustness, because calibration is just an intermediary step before model application on an independent data set, as is the case in validation tests. A robust parameterization is one which does not lose much efficiency (on average) between calibration and validation. This issue of robustness is often discussed along with the extrapolation capacity of hydrological models (over a range of validation periods): see the old idea of the differential split sample test of Klemeš [1986] and several more recent attempts [Coron et al., 2012; Gharari et al., 2013; Perrin et al., 2008].
- Multicriteria assessment: we would also accept trading efficiency measured by a given criterion for improved efficiency over a set of complementary criteria: this is precisely what multiobjective calibration techniques attempt to do (see, e.g., Efstratiadis and Koutsoyiannis' [2010] review).
There is an abundant literature on robust calibration. For example, Koren et al. [2003] and Leavesley et al. [2003] argued that a good means to avoid overcalibration with distributed models was to force some a priori level of spatial and physical consistency into parameter estimates. The concept of regional calibration was also proposed by a few authors as a way to improve parameter transposability [Fernandez et al., 2000; Parajka et al., 2007]. One can also cite Kuzmin et al. [2008] who advocated local calibration approaches starting from physically relevant a priori parameter guesses.
However, to our knowledge, the issue of generalist parameter sets raised in this paper has only received very little interest in the hydrologic literature, which has rather focused on debating the issue of generalist and specialized model structures (see, e.g., the discussion on the FLEX modeling framework of Fenicia et al. [2008] and Fenicia et al. [2011]). We discussed this genericity question under the name “discrete parameterization” [Perrin et al., 2008]: by “discrete,” we meant a parameter search that was restricted in its dimensionality from an n‐dimension search for parameter values to a one‐dimension search for parameter sets, i.e., the dimension of a list. However, we did not address the question of how short a list of parameters could be made: we simply showed that a random reduction of the parameter list (from 299 parameter sets to 75 sets) would only cause a modest efficiency loss.
1.3 Scope of the Paper
In this paper, we address a few basic questions, which we believe can help reduce some of the problems observed in the calibration of hydrological models: would it be possible to reduce calibration to a choice among a few parameter sets? What would the implications be in terms of model efficiency? How would this impact the robustness of the calibration process? Here our aim is to assess the potentiality of using only a few selected parameter sets to run a hydrological model, which implies searching for generalist parameter sets, i.e., parameter sets able to yield acceptable results on various catchments.
In section 2, we will first present the hydrological model used here (section 2.1), the three numerical criteria selected to decide on the acceptability of a parameter set (section 2.2), and the catchment set on which this study is conducted (section 2.3). Then, we will describe the approach followed to identify the most generalist parameter sets (given acceptability thresholds on the three criteria). Section 3 will discuss parameter set acceptability and section 4 the identification of a parameter short list, allowing us to reach the best average performance with the fewest generalist parameter sets. Last, we will investigate the advantages and limitations of calibrating our model based solely on this short list.
2 Material and Methods
2.1 Hydrological Model
We used the daily lumped continuous GR4J rainfall‐runoff model presented by Perrin et al. [2003], whose structure is shown in Figure 1. The model has four parameters: the capacity of the production store (X1, mm), the groundwater exchange coefficient (X2, mm), the capacity of the routing store (X3, mm), and the time base of the unit hydrograph (X4, days). The only slight adaptation made for this study consisted in rewriting the unit hydrograph parameter X4 as a function of the catchment area S: X4 = X4* · S^0.3 (tests, not presented here, showed that this increases the genericity of the parameter). During our parameter search, we used X4*, which was then adjusted to each catchment using the previous equation.
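The area scaling of the unit hydrograph parameter can be illustrated with a minimal Python sketch. The function name `scale_x4` is hypothetical, and the area is assumed to be expressed in km2 (the unit used in Table 2); the text itself does not state the unit used in the equation.

```python
def scale_x4(x4_star: float, area: float, exponent: float = 0.3) -> float:
    """Return the catchment-specific time base X4 (days) from the generic X4*.

    Assumption: `area` is the catchment area S in km2 (unit not stated in the text).
    """
    if area <= 0:
        raise ValueError("catchment area must be positive")
    return x4_star * area ** exponent


# Example: the same generic X4* gives a longer time base on a larger catchment.
x4_small = scale_x4(1.0, 32.0)
x4_large = scale_x4(1.0, 1024.0)
```

With this form, a single value of X4* can be shared between catchments of different sizes, which is the stated purpose of the adaptation.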

Schematic diagram of the structure of the four‐parameter GR4J model (X1–X4 are model parameters; PE: potential evapotranspiration, P: rainfall, Q: streamflow, other letters are internal state variables).
2.2 Criteria Used to Evaluate Model Simulations
Model simulations were evaluated with the Kling‐Gupta efficiency (KGE), which combines three components:
- α: the ratio of the standard deviation of the observed flow to the standard deviation of the simulated flow. The optimal value is 1.
- β: the ratio of the average observed flow to the average simulated flow (i.e., volume bias). The optimal value is 1.
- ρ: the Pearson correlation coefficient (simulated versus observed flow). The optimal value is 1.

$$\mathrm{KGE} = 1 - \sqrt{(\alpha - 1)^2 + (\beta - 1)^2 + (\rho - 1)^2}$$

KGE is optimal for a value of 1.
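As an illustration, the three components and their aggregation can be computed as in the following Python sketch. It assumes that KGE is the Kling‐Gupta efficiency, i.e., 1 minus the Euclidean distance of (α, β, ρ) to the ideal point (1, 1, 1). The ratios follow the wording of this section (observed over simulated); other presentations of KGE use the inverse ratios. Function names are hypothetical.

```python
import math


def kge_components(obs, sim):
    """Return (alpha, beta, rho) for paired observed and simulated flows."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("obs and sim must have the same length (at least 2)")
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    alpha = sd_obs / sd_sim      # variability ratio, observed over simulated
    beta = mean_obs / mean_sim   # volume bias, observed over simulated
    rho = cov / (sd_obs * sd_sim)  # Pearson correlation
    return alpha, beta, rho


def kge(alpha, beta, rho):
    """Aggregate the three components; 1 is the optimal value."""
    return 1.0 - math.sqrt((alpha - 1.0) ** 2 + (beta - 1.0) ** 2 + (rho - 1.0) ** 2)


observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]  # toy series: twice the observed flow
alpha, beta, rho = kge_components(observed, simulated)
score = kge(alpha, beta, rho)
```

In this toy case the timing is perfect (ρ = 1), but the simulated volume and variability are doubled, so the score is penalized through α and β only.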
In section 3, we will use triple thresholds (α0, β0, ρ0) to define the acceptability of parameter sets. A parameter set will be deemed acceptable if its simulation simultaneously satisfies the three thresholds (α and β within a tolerance interval around 1, ρ above a lower bound; see Table 1). For the sake of simplicity, we will use in upcoming discussions the four categories defined in Table 1 to qualify the model simulation efficiency obtained with the parameter set in question.
| Model Efficiency Requirement | α | β | ρ |
|---|---|---|---|
| Very high | 0.95 < α < 1.05 | 0.95 < β < 1.05 | > 0.9 |
| High | 0.90 < α < 1.10 | 0.90 < β < 1.10 | > 0.8 |
| Moderate | 0.85 < α < 1.15 | 0.85 < β < 1.15 | > 0.7 |
| Low | 0.80 < α < 1.20 | 0.80 < β < 1.20 | > 0.6 |
- a A parameter set belongs to the qualitative efficiency class if it fulfills simultaneously the three conditions on α, β, and ρ.
In this paper, model performance was assessed solely on the basis of the three criteria α, β, and ρ. However, the method presented here can be implemented with any performance criterion for which an efficiency threshold similar to those presented in Table 1 is defined.
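The classification of Table 1 can be sketched in Python as follows. The function name `efficiency_class` and the constant `CLASSES` are hypothetical; the bounds are those of Table 1, read as strict inequalities.

```python
# (class name, lower bound for alpha and beta, upper bound for alpha and beta, lower bound for rho),
# ordered from the most to the least demanding requirement (values from Table 1).
CLASSES = [
    ("Very high", 0.95, 1.05, 0.9),
    ("High", 0.90, 1.10, 0.8),
    ("Moderate", 0.85, 1.15, 0.7),
    ("Low", 0.80, 1.20, 0.6),
]


def efficiency_class(alpha, beta, rho):
    """Return the most demanding class whose three conditions are all met, or None."""
    for name, low, high, rho_min in CLASSES:
        if low < alpha < high and low < beta < high and rho > rho_min:
            return name
    return None


example = efficiency_class(0.92, 1.03, 0.85)  # meets "High" but not "Very high"
```

Because the classes are nested, a simulation that meets a given requirement also meets all the less demanding ones; the function returns only the strictest one.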
2.3 Catchment Sets
A set of 202 unregulated catchments was selected in this study to develop the parameter short list (Figure 2). These 202 catchments represent various hydrological conditions, given the variability in climate, topography, and geology in France. This set includes Mediterranean catchments with intense precipitation as well as larger, groundwater‐dominated basins and highland catchments. We did not include catchments where snow plays a major hydrological role, since no snowmelt module was used here.

Location of the French catchments used for this study: in blue, the development set (202 catchments) and in red, the validation set (515 catchments).
A second (independent) data set of 515 French catchments was used to test independently the parameter lists obtained on the former catchment set (the test procedure is explained in section 4.1). This second set differed from the first only by the period of data availability (1980–2010 for the first, 1959–2010 for the second).
Daily rainfall, runoff, and potential evapotranspiration (PE) data series were available for both catchment sets. Meteorological inputs originate from the SAFRAN database [Vidal et al., 2010]. PE was estimated using the temperature‐based formula proposed by Oudin et al. [2005]. Hydrological data were extracted from the HYDRO national archive (www.hydro.eaufrance.fr). Available time series lengths range from 20 to 50 years (average length was 34 years). The characteristics of both catchment sets are given in Table 2.
| | Catchment Area (km2) | Mean Annual Rainfall P (mm) | Mean Annual Potential Evaporation E0 (mm) | Mean Annual Discharge Q (mm) | P/E0 Ratio | Q/P Ratio |
|---|---|---|---|---|---|---|
| Development Set (202 Catchments) | | | | | | |
| First decile | 46 | 768 | 653 | 168 | 1.05 | 0.21 |
| Median | 245 | 977 | 701 | 344 | 1.40 | 0.35 |
| Ninth decile | 2258 | 1391 | 795 | 830 | 2.04 | 0.60 |
| Test Set (515 Catchments) | | | | | | |
| First decile | 56 | 723 | 634 | 160 | 1.03 | 0.22 |
| Median | 204 | 908 | 671 | 324 | 1.34 | 0.35 |
| Ninth decile | 1268 | 1194 | 752 | 616 | 1.77 | 0.53 |
2.4 Method Retained for Identifying and Assessing Generalist Parameter Sets
- First, we describe the acceptability of the parameter sets and analyze its dependency on the model's efficiency requirements.
- Then, we investigate the parameter sets which show the best genericity.
- Last, building on these most generalist parameter sets, we will identify a parameter short list to be used for model parameterization, as an alternative to full calibration.
3 On Parameter Set Acceptability
3.1 Definitions
- Parameter acceptability: for a given catchment, we can identify the parameter sets among the 10^6 parameter sets reaching a certain efficiency threshold. These parameter sets will be qualified as acceptable for the given catchment. Naturally, acceptability will vary with efficiency requirements: the higher the requirements, the lower the acceptability.
- Catchment attainability: similarly, for a given parameter set, we can identify the catchments where the parameter set obtains a given efficiency threshold. We say that these catchments are attainable by the given parameter set. Similar to acceptability, attainability will vary with the efficiency requirements: the higher the requirements, the lower the attainability.
- Parameter genericity: widely acceptable parameter sets (i.e., parameter sets acceptable for more than x% of our 202 catchments) will be considered as generalist (at the x% level).
Note that looking for generalist parameter sets implies a pragmatic vision of hydrologic modeling: given the inherently imperfect nature of hydrological models, we seek neither a perfect fit nor the identification of the unique true parameter set representing a given catchment. Rather, we see in generalist parameter sets the opportunity to identify clusters of catchments, which naturally has implications for similarity and classification issues.
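The three definitions above can be made concrete with a small Python sketch. It assumes that the test results are stored in a Boolean matrix `acceptable[i][j]` (hypothetical name), which is True when parameter set i meets the efficiency requirement on catchment j. The exact rounding rule applied to the x% limit is not specified in the text; here a set is generalist when it attains at least x% of the catchments.

```python
def attainability(acceptable):
    """Number of catchments attained by each parameter set (row sums)."""
    return [sum(row) for row in acceptable]


def acceptability(acceptable):
    """Number of acceptable parameter sets for each catchment (column sums)."""
    return [sum(column) for column in zip(*acceptable)]


def generalist_sets(acceptable, x_percent):
    """Indices of the parameter sets attaining at least x_percent of the catchments."""
    n_catchments = len(acceptable[0])
    return [
        i for i, row in enumerate(acceptable)
        if 100 * sum(row) >= x_percent * n_catchments
    ]


# Toy example: 3 parameter sets (rows) tested on 4 catchments (columns).
acceptable = [
    [True, True, False, False],
    [False, False, False, True],
    [True, True, True, True],
]
per_set = attainability(acceptable)
per_catchment = acceptability(acceptable)
generalists = generalist_sets(acceptable, 50)
```

In this toy case, the second parameter set is a specialist (it attains a single catchment), while the first and third are generalist at the 50% level.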
3.2 Parameter Set Acceptability: Sensitivity to the Model's Efficiency Requirements
Table 3 presents the total number of parameter sets acceptable for at least one catchment, along with the corresponding number of attainable catchments by at least one parameter set for four typical performance categories. For the “Low” category, it is possible to find at least one acceptable parameter set for each of the 202 catchments, and on average, there are 570 acceptable parameter sets per catchment. This number decreases when requirements rise: already with the “Moderate” category there are six catchments, which cannot be attained by any parameter set. This number rises to 13 for the “High” category (still only 6% of the catchment set), and to 77 for the “Very high” category (38% of the catchment set).
| Model Efficiency Requirement (Model Performance Category) | Number of Parameter Sets Acceptable for at Least One Catchment (Out of 10^6) | Number of Catchments Attainable by at Least One Parameter Set (Out of 202) |
|---|---|---|
| Very high | 2,170 | 125 |
| High | 23,258 | 189 |
| Moderate | 57,952 | 196 |
| Low | 115,043 | 202 |
3.3 On Generalist Parameter Sets
Here we focus on those parameter sets which—for a given triplet of efficiency requirements (α0, β0, ρ0)—are found acceptable on at least x% of the data set (this attainability limit will be discussed later: we start with x = 10%). The existence and the number of generalist sets is a function of the efficiency requirements: the higher the requirements, the lower their number (Table 4).
| Efficiency Requirement | Number of Generalist Parameter Sets (Attainability Limit of 10%) | Number of Catchments Attained by the Generalist Parameter Sets: Max | Number of Catchments Attained: Median | Number of Catchments Attained: Min (a) |
|---|---|---|---|---|
| Very high | 0 | | | |
| High | 22 | 24 | 20 | 20 |
| Moderate | 1831 | 52 | 25 | 20 |
| Low | 7708 | 76 | 30 | 20 |
- a The minimum is 20 by definition, because the focus here is generalist parameter sets, which attain at least 10% of the total number of catchments.
If we look for parameter sets that can attain a large number of catchments, we will find that some of them can be particularly efficient: for “Low” efficiency requirements, the most generalist parameter set is acceptable to 76 catchments. This number decreases to 52 for “Moderate,” to 24 for “High,” and to 9 for the “Very high” efficiency requirements.
Table 5 shows how the attainability limit impacts the number of generalist parameter sets when the limit is lowered from 10 to 5%. Clearly, the number of generalist parameter sets increases rapidly when the coverage requirement is lowered.
Number of generalist parameter sets for an attainability limit of 10% to 5%:

| Efficiency Requirement | 10% | 9% | 8% | 7% | 6% | 5% |
|---|---|---|---|---|---|---|
| Very high | 0 | 0 | 0 | 0 | 0 | 0 |
| High | 22 | 53 | 129 | 245 | 419 | 735 |
| Moderate | 1831 | 2439 | 3142 | 4076 | 5336 | 6891 |
| Low | 7708 | 8758 | 10,089 | 11,505 | 13,383 | 15,734 |
- The 6891 generalist sets (corresponding to a minimum attainability of 5% and a “Moderate” efficiency requirement) can collectively attain 177 catchments (88% of the catchment set).
- The 735 generalist sets (corresponding to a minimum attainability of 5% and a “High” efficiency requirement) can collectively attain 132 catchments (65% of the catchment set).
- The 22 generalist sets (corresponding to a minimum attainability of 10% and a “High” efficiency requirement) can collectively attain 72 catchments (36% of the catchment set).
Number of catchments attainable by the generalist sets collectively (total number of catchments: 202), for an attainability limit of 10% to 5% (a):

| Efficiency Requirement | 10% | 9% | 8% | 7% | 6% | 5% |
|---|---|---|---|---|---|---|
| Very high | 0 | 0 | 0 | 0 | 0 | 0 |
| High | 72 | 78 | 100 | 108 | 122 | 132 |
| Moderate | 157 | 158 | 161 | 166 | 170 | 177 |
| Low | 175 | 178 | 182 | 183 | 186 | 187 |
- a For each of the generalist parameter sets identified in Table 5, we give here the total number of catchments which are attainable by at least one of the parameter sets.
The high coverage reached by a very limited number of parameter sets opens the way to imagining parameterization schemes based on the use of generalist parameter sets, even if they cannot cover absolutely all catchments.
Comparing Table 6 to Table 5 shows that although the number of generalist sets rises very rapidly when we decrease the attainability limit, the proportion of the catchment set attained by at least one generalist set rises much more slowly: the new parameter sets mostly attain the same catchments.
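The notion of collective coverage used in Table 6 can be sketched as follows, again assuming a hypothetical Boolean matrix `acceptable[i][j]` (parameter set i, catchment j). The toy example reproduces the observation above: adding sets that attain the same catchments does not increase the coverage.

```python
def collective_coverage(acceptable, selected_sets):
    """Number of catchments attained by at least one of the selected parameter sets."""
    n_catchments = len(acceptable[0])
    return sum(
        1 for j in range(n_catchments)
        if any(acceptable[i][j] for i in selected_sets)
    )


# Toy example: 4 parameter sets (rows) tested on 5 catchments (columns).
acceptable = [
    [True, True, True, False, False],
    [True, True, False, False, False],
    [False, True, True, False, False],
    [False, False, False, True, False],
]
coverage_one = collective_coverage(acceptable, [0])
coverage_three = collective_coverage(acceptable, [0, 1, 2])  # same catchments as set 0
coverage_all = collective_coverage(acceptable, [0, 1, 2, 3])

# Shares of the 202 catchments quoted in the text for 177, 132, and 72 catchments.
shares = [round(100 * n / 202) for n in (177, 132, 72)]
```

Sets 1 and 2 are redundant with set 0 in terms of coverage, whereas set 3 attains a new catchment although it is the least generalist of the four.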
4 Toward a Short List of Parameter Sets: Aiming at Maximum Model Efficiency Using Only a Minimum Number of Parameter Sets
4.1 Parameter Short List and Short‐List Calibration (SLC)
We now wish to look for a way to constrain our choice of parameters in calibration while losing as little as possible in terms of simulation efficiency. As our objective is precisely to recruit parameter sets, we chose the term “short list.” Parameterizing our hydrological model based on a short list can not only considerably speed up the calibration process but could perhaps also make it more robust, as was shown in a previous attempt [Perrin et al., 2008].
Short‐list calibration (SLC) consists in replacing an unconstrained free‐range calibration with a calibration restricted to the short list: this means that the search for an optimal parameter set is reduced to the few parameter sets that are part of the short list.
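In its simplest form, SLC amounts to one model run per member of the short list, followed by the selection of the best‐scoring member. The Python sketch below illustrates this under stated assumptions: `short_list_calibration` and `toy_score` are hypothetical names, and `toy_score` is only a placeholder for the KGE of a GR4J simulation, which is not implemented here. The three parameter sets are the first three entries of Table 8.

```python
def short_list_calibration(short_list, score):
    """Run the scoring function once per parameter set and return (best set, best score)."""
    scored = [(score(params), params) for params in short_list]
    best_score, best_params = max(scored, key=lambda item: item[0])
    return best_params, best_score


# First three entries of the compromise short list: (X1, X2, X3, X4*).
SHORT_LIST = [
    (174.9, -0.6, 19.0, 2.82),
    (253.0, -0.6, 45.8, 1.95),
    (351.2, -2.6, 102.1, 1.60),
]


def toy_score(params):
    """Placeholder objective (NOT a GR4J run): closeness to an arbitrary target set."""
    target = (250.0, -0.5, 50.0, 2.0)
    return -sum(abs(p - t) / abs(t) for p, t in zip(params, target))


best_params, best_score = short_list_calibration(SHORT_LIST, toy_score)
```

The cost of the search is therefore exactly the length of the short list, whatever the number of model parameters.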
4.2 Stepwise Selection of Efficient Short Lists
An efficient short list is one that yields the best model efficiency for a given number of parameter sets. To identify these lists, we proceeded in a stepwise manner, alternating forward selection and backward elimination (the pseudo‐code presented in the Supporting Information provides a formal description). The specificity of this approach, however, lies in the fact that the search was started with the generalist parameter sets identified in the previous section.
- Initialization: the short list search is initialized with the 22 generalist parameter sets corresponding to the “High” efficiency requirement and the 10% attainability limit (see Table 5).
- Elimination: we first test whether one of the parameter sets is redundant. The objective function used is the average KGE value for the whole catchment set (noted KGEav). Here a parameter set is redundant when it can be removed without significantly affecting KGEav (significant meaning a decrease of more than 0.001 in KGEav). We renew the test as long as a parameter set can be removed.
- Selection: after all redundant parameter sets have been removed, we add an additional parameter set selected among suitable candidates (see below). The candidate selected is chosen by evaluating its added value to the SLC process.
During the selection process, parameter sets were explored by order of decreasing genericity (as defined in Table 6). This means (figures refer to Table 5) that after exploring the group of 22 high‐efficiency parameter sets at an attainability level of 10%, we move to the 53 “High efficiency–9% genericity,” etc. until we have visited all the categories of Table 5. Each time we changed a category (from “High” to “Moderate” efficiency, from “Moderate” to “Low”), we included a single search through the entire (10^6) collection of parameter sets to make possible the selection of more specific (i.e., less generic) parameter sets, provided they yield a larger improvement of KGEav.
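The alternation of elimination and selection can be sketched in Python as follows. This is a simplified illustration, not the pseudo‐code of the Supporting Information: it assumes a precomputed matrix `kge[i][j]` (KGE of parameter set i on catchment j), takes KGEav as the average over catchments of the best KGE available in the list, uses a single candidate pool instead of the tiered exploration by genericity group, and removes the least useful set first. All function names are hypothetical.

```python
def kge_av(kge, short_list):
    """Average over catchments of the best KGE reached within the short list."""
    n_catchments = len(kge[0])
    total = 0.0
    for j in range(n_catchments):
        total += max(kge[i][j] for i in short_list)
    return total / n_catchments


def eliminate(kge, short_list, tol=0.001):
    """Remove sets one by one as long as a removal costs no more than tol in KGEav."""
    current = list(short_list)
    while len(current) > 1:
        base = kge_av(kge, current)
        losses = [(base - kge_av(kge, [k for k in current if k != i]), i) for i in current]
        loss, weakest = min(losses)
        if loss > tol:
            break
        current.remove(weakest)
    return current


def select(kge, short_list, candidates, tol=0.001):
    """Return the candidate with the largest KGEav gain (above tol), or None."""
    base = kge_av(kge, short_list)
    best, best_gain = None, tol
    for c in candidates:
        if c in short_list:
            continue
        gain = kge_av(kge, short_list + [c]) - base
        if gain > best_gain:
            best, best_gain = c, gain
    return best


def stepwise_short_list(kge, initial, candidates, target_length, tol=0.001):
    """Alternate elimination and selection until the target length is reached."""
    current = eliminate(kge, initial, tol)
    for _ in range(len(candidates) + 1):  # bounded number of steps guarantees termination
        if len(current) >= target_length:
            break
        best = select(kge, current, candidates, tol)
        if best is None:
            break
        current = eliminate(kge, current + [best], tol)
    return current


# Toy matrix: 4 parameter sets (rows) on 3 catchments (columns); sets 0 and 1 are identical.
kge = [
    [0.9, 0.2, 0.1],
    [0.9, 0.2, 0.1],
    [0.1, 0.8, 0.2],
    [0.0, 0.1, 0.7],
]
result = stepwise_short_list(kge, initial=[0, 1], candidates=[0, 1, 2, 3], target_length=3)
```

In the toy case, one of the two identical sets is eliminated as redundant, and the two specialized sets are then added because each brings a gain on a catchment that the list did not yet cover.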
This stepwise selection approach yielded lists of progressively increasing length, which can be used for SLC. For the sake of good scientific practice, SLC will be tested on an independent data set.
4.3 Validation of the Short List on an Independent Data Set
In order to identify a parameter list offering the best compromise between efficiency and length, we tested the suitability of the SLC process on a validation data set of 515 catchments (see section 2.3).
With each of the short lists successively, GR4J was calibrated on half of the period available and validated on the second half. Then we inverted the role of each time period. Figure 3 shows the results in calibration and validation for average KGE efficiency, as well as for three quantiles (10, 50, and 90%), in order to illustrate the variability among the 515 validation catchments.
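The split‐sample procedure described above can be sketched as follows. The function `split_sample_test` and its two callables are hypothetical; the toy "model" (a constant flow equal to the calibration‐period mean) only serves to make the example runnable and is not GR4J.

```python
def split_sample_test(n_steps, calibrate, evaluate):
    """Calibrate on each half of the record in turn and validate on the other half."""
    half = n_steps // 2
    first, second = range(0, half), range(half, n_steps)
    results = []
    for cal_period, val_period in ((first, second), (second, first)):
        params = calibrate(cal_period)
        results.append({
            "calibration": evaluate(params, cal_period),
            "validation": evaluate(params, val_period),
        })
    return results


# Toy record and toy model: the simulated flow is a constant.
observed = [1.0, 1.0, 3.0, 3.0]


def calibrate(period):
    """Toy calibration: the mean observed flow over the period."""
    return sum(observed[t] for t in period) / len(period)


def evaluate(constant_flow, period):
    """Toy score: negative mean absolute error (higher is better)."""
    return -sum(abs(observed[t] - constant_flow) for t in period) / len(period)


results = split_sample_test(len(observed), calibrate, evaluate)
```

Each catchment thus provides two calibration scores and two validation scores, and the loss between the two modes gives an indication of robustness.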

Efficiency of the short list calibration (SLC) on the independent catchment set, as a function of the number of parameter sets retained. Different colors represent different quantiles. Solid lines represent the efficiency of the short lists and dashed lines show the reference: full calibration.
Figure 3 shows that SLC provides model efficiencies that progressively approach that of full calibration. With long streamflow series, SLC remains inferior to full calibration, but if a short list of around 27 members (vertical dashed line in Figure 3) is selected, we see that we need to trade little efficiency in order to use generalist parameter sets (note that in validation mode, the distance between full calibration and SLC is almost halved in comparison to the calibration mode).
Table 7 further analyzes these results by looking at the four efficiency criteria: again, it demonstrates the good performance and robustness of SLC. Although full calibration remains slightly superior, we see that the difference with SLC in validation is limited (0.04 for KGE, 0.03 for correlation, very little difference for α and β): among the three KGE components, SLC is noticeably inferior to full calibration only for the correlation criterion.
| | β | α | ρ | KGE |
|---|---|---|---|---|
| Full calibration | 1.01 (0.86–1.18) | 1.00 (0.84–1.16) | 0.90 (0.82–0.95) | 0.80 (0.66–0.92) |
| Short‐list calibration | 0.99 (0.82–1.18) | 0.99 (0.80–1.19) | 0.87 (0.78–0.94) | 0.76 (0.59–0.90) |
- a The objective function in calibration was KGE.
These results confirm those of Perrin et al. [2008] but with a much shorter list (they investigated lists varying between 75 and 299 parameter sets).
4.4 Testing Short‐List Calibration With Short Calibration Periods
Here the analysis is pushed further by looking at the impact of the amount of calibration data on the relative performance of full and short‐list calibrations (Figures 4 and 5).

Impact of the availability of calibration data on the relative efficiency of full calibration (solid lines) and short‐list calibration (dashed lines). The efficiency is computed in Kling‐Gupta efficiency in validation mode (the same validation period is used for all lengths). N stands for the entire period.

Full calibration versus short‐list calibration, as a function of the calibration period length. (top row) Calibration results; (bottom row) validation results.
Figures 4 and 5 show that under 1 year of calibration data, SLC becomes more efficient on almost all quantiles, first on the 10% efficiency quantile, then on the mean and median results. The additional constraint of calibrating only on the short list, although it reduces the efficiency when long calibration periods are available, has clear advantages when only short periods are available. Figure 5 illustrates this point: in calibration mode, full calibration cannot be beaten, but in validation mode, the scatterplot shows a progressive shift toward the 1:1 line. This confirms the results of Perrin et al. [2008].
4.5 Description of the Compromise Short List (27 Parameter Sets)
Table 8 presents the compromise short list: out of the corresponding 27 parameter sets, 22 are generalist sets belonging to the “High” efficiency requirement, two are generalist sets belonging to the “Low” efficiency requirement, and three sets do not belong to any genericity group. Note that from the initial 22 parameter sets (“High‐efficiency”‐10% attainability), only two remain.
| Parameter Set ID | X1 (mm) | X2 (mm) | X3 (mm) | X4* (days) | Selection Procedure on 202 Catchments: Efficiency Requirement and Attainability Level | Validation Test on 515 Catchments: Attainability Level (%) |
|---|---|---|---|---|---|---|
| 1 | 174.9 | −0.6 | 19.0 | 2.82 | High, 6% | 8.9 |
| 2 | 253.0 | −0.6 | 45.8 | 1.95 | High, 8% | 6.3 |
| 3 | 351.2 | −2.6 | 102.1 | 1.60 | High, 6% | 5.9 |
| 4 | 200.1 | −1.5 | 76.2 | 2.36 | High, 9% | 5.5 |
| 5 | 180.9 | −1.5 | 35.3 | 2.43 | High, 7% | 5.5 |
| 6 | 530.8 | −0.4 | 64.6 | 1.43 | High, 9% | 5.0 |
| 7 | 198.2 | −0.7 | 12.8 | 2.66 | High, 7% | 4.6 |
| 8 | 291.0 | 0.3 | 109.1 | 2.06 | High, 5% | 4.5 |
| 9 | 354.2 | −0.5 | 103.0 | 2.49 | High, 8% | 4.4 |
| 10 | 836.9 | −3.4 | 68.2 | 1.75 | Low, 6% | 4.4 |
| 11 | 826.5 | −1.1 | 90.2 | 1.40 | High, 5% | 4.2 |
| 12 | 4005.6 | −4.6 | 318.3 | 2.03 | | 4.0 |
| 13 | 144.3 | 0.8 | 41.0 | 2.59 | | 4.0 |
| 14 | 126.0 | −0.2 | 169.2 | 1.25 | High, 6% | 3.8 |
| 15 | 275.7 | −1.3 | 78.5 | 2.25 | High, 10% | 3.8 |
| 16 | 196.8 | −0.7 | 28.6 | 1.84 | High, 5% | 3.7 |
| 17 | 327.1 | −1.4 | 34.0 | 1.67 | High, 6% | 3.3 |
| 18 | 238.5 | −2.9 | 267.7 | 2.03 | High, 6% | 3.1 |
| 19 | 215.9 | −0.4 | 102.7 | 2.15 | High, 7% | 2.9 |
| 20 | 281.0 | −0.7 | 8.2 | 2.69 | High, 5% | 2.5 |
| 21 | 346.5 | −5.5 | 61.5 | 2.10 | Low, 7% | 2.4 |
| 22 | 328.4 | −1.0 | 15.9 | 2.53 | High, 6% | 1.8 |
| 23 | 586.6 | −1.5 | 154.4 | 2.67 | High, 5% | 1.7 |
| 24 | 364.4 | −3.9 | 185.7 | 2.16 | High, 5% | 1.6 |
| 25 | 281.1 | −1.2 | 15.1 | 2.28 | High, 5% | 1.2 |
| 26 | 342.5 | −0.9 | 77.1 | 3.56 | High, 10% | 0.8 |
| 27 | 452.5 | −54.5 | 149.9 | 1.24 | | 0.3 |
Table 8 also provides the attainability levels for each set over the independent set of 515 catchments. Interestingly, two of the nongeneralist sets are each selected by around 4% of the data set, while one of the highly generalist sets (“High,” 10%) is in fact chosen by less than 1% of the data set; overall, however, the hierarchy of genericity is preserved.
Figure 6 shows the cumulative distribution functions of the parameter sets used in full calibration and with the short‐list calibration. One can notice that the ranges of variation are well captured by the 27 parameter sets of the short list, although some of the extreme values obtained by optimization are not represented.

Cumulative frequency of the parameters used over the 515 catchments when considering the full calibration (continuous lines) and the short‐list calibration (discrete points). Note that a logarithm transformation is used for X1 and X3. Quantiles ranging from 0.01 to 0.99 are used here.
In order to give an overview of the genericity of each parameter set retained in the compromise short list, we present in Figure 7 the distribution of each parameter set's efficiency (expressed in KGE) over the 202 catchments of the calibration data set. It shows that while most of the parameter sets have a rather high median efficiency, sets 12, 13, and 27 behave differently: they seem to represent an extremely specific behavior and give rather low performance on most of the other catchments. This difference was expected, as these three parameter sets are those which do not belong to any genericity group.

Range of efficiency of each of the parameter sets part of the compromise short list presented in Table 8: the values characterize the distribution of efficiencies computed on the 202 catchments of the calibration data set.
5 Conclusion and Perspectives
This paper aimed to identify generalist parameter sets for the GR4J hydrological model. We assembled a list of 27 parameter sets and validated it by relying exclusively on it for the calibration of GR4J. We call this approach short‐list calibration (SLC).
We obtained mixed results: on the one hand, SLC efficiency remains inferior to that of full calibration; on the other hand, as the calibration period becomes shorter than 1 year, SLC progressively gains the advantage over full calibration. In terms of computing efficiency, this means that a calibration can be obtained with only 27 model runs, where a classical calibration would require at least several hundred. Even in the context of a full calibration, the use of the short list allows for a quick initial prescreening to initialize a local search procedure.
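The prescreening idea mentioned above can be sketched as follows. The local search used here (a simple coordinate search with step halving) is an assumption chosen for brevity and is not the procedure used in this paper; `prescreen_then_refine` and `toy_score` are hypothetical names, and the toy score stands in for the KGE of a model run.

```python
def prescreen_then_refine(short_list, score, step=0.1, n_iter=50):
    """Start from the best short-list member, then refine it with a coordinate search."""
    best = list(max(short_list, key=score))  # prescreening: one run per list member
    best_score = score(best)
    for _ in range(n_iter):
        improved = False
        for k in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[k] += delta
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score, improved = trial, trial_score, True
        if not improved:
            step /= 2.0  # no better neighbor found: refine the search step
    return best, best_score


def toy_score(params):
    """Placeholder objective with a single optimum at (1.3, -0.4); higher is better."""
    return -((params[0] - 1.3) ** 2 + (params[1] + 0.4) ** 2)


toy_short_list = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
refined_params, refined_score = prescreen_then_refine(toy_short_list, toy_score)
```

The short list provides a starting point that is already close to a plausible optimum, so the local search only has to refine it.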
Several options could be explored: it would be instructive to test SLC with less parsimonious models (in their comparison, Perrin et al. [2008] showed that the more heavily parameterized model benefitted more from discrete calibration and this could be the same for SLC). And since in this paper, we based our short list selection on a single aggregated efficiency criterion (KGE), we could also test other criteria in an explicit multiobjective setting.
But more importantly, we see in this exploration of generalist parameter sets new perspectives for ungaged catchments (with ensemble simulations in an attempt to address the difficulty of parameterizing hydrological models) and for poorly gaged catchments (because we demonstrated that the short list was a powerful constraint when streamflow data were scarce).
Last, parameter short lists of different length offer a potential for catchment classification, based on behavioral similarity (i.e., under the assumption that the similarity of the model's parameter sets obtained on two different catchments reflects the similarity of their behavior with respect to rainfall‐runoff transformation, see the discussion by Oudin et al. [2010]). Further tests are needed in order to study the geographic coherence of generalist parameter sets, possible relationships to physical catchment characteristics, and the geographic transferability of the proposed short list.
Acknowledgments
The authors would like to acknowledge three anonymous reviewers who helped improve this paper through their constructive criticism. Hydrometric data used for this paper can be accessed through the French National Hydrometric Archive (http://www.hydro.eaufrance.fr/). Climatic data can be accessed through the Météo France climatological archive (Publithèque: http://publitheque.meteo.fr/okapi/accueil/okapiWebPubli/index.jsp).
References
- Adam P. Piotrowski, Jaroslaw J. Napiorkowski, Performance of the air2stream model that relates air and stream water temperatures depends on the calibration method, Journal of Hydrology, 10.1016/j.jhydrol.2018.04.016, 561, (395-412), (2018).
- Ye Tian, Yue-Ping Xu, Chong Ma, Guoqing Wang, Modeling the Impact of Climate Change on Low Flows in Xiangjiang River Basin with Bayesian Averaging Method, Journal of Hydrologic Engineering, 10.1061/(ASCE)HE.1943-5584.0001557, 22, 9, (04017035), (2017).
- Luis Samaniego, Rohini Kumar, Stephan Thober, Oldrich Rakovec, Matthias Zink, Niko Wanders, Stephanie Eisner, Hannes Müller Schmied, Edwin H. Sutanudjaja, Kirsten Warrach-Sagi, Sabine Attinger, Toward seamless hydrologic predictions across spatial scales, Hydrology and Earth System Sciences, 10.5194/hess-21-4323-2017, 21, 9, (4323-4346), (2017).
- Claudia Rojas‐Serna, Laure Lebecherel, Charles Perrin, Vazken Andréassian, Ludovic Oudin, How should a rainfall‐runoff model be parameterized in an almost ungauged catchment? A methodology tested on 609 catchments, Water Resources Research, 10.1002/2015WR018549, 52, 6, (4765-4784), (2016).
- Stephen Oni, Martyn Futter, Jose Ledesma, Claudia Teutschbein, Jim Buttle, Hjalmar Laudon, Using dry and wet year hydroclimatic extremes to guide future hydrologic projections, Hydrology and Earth System Sciences, 10.5194/hess-20-2811-2016, 20, 7, (2811-2825), (2016).
- Mahyar Shafii, Bryan A. Tolson, Optimizing hydrological consistency by incorporating hydrological signatures into model calibration objectives, Water Resources Research, 10.1002/2014WR016520, 51, 5, (3796-3814), (2015).
- Magali Troin, Richard Arsenault, François Brissette, Performance and Uncertainty Evaluation of Snow Models on Snowmelt Flow Simulations over a Nordic Catchment (Mistassibi, Canada), Hydrology, 10.3390/hydrology2040289, 2, 4, (289-317), (2015).





