Using the Budyko Framework for Calibrating a Global Hydrological Model

Global hydrological models (GHMs) have become an established tool to simulate water resources worldwide. Most GHMs are, however, uncalibrated and typically use a set of basic hydrological parameters, which could lead to unrealistic projections of the terrestrial water cycle. The calibration of hydrological models is usually performed by comparing modeled to observed discharge. Accurate station data and reliable time series of discharge are, however, not available for many parts of the world, and classic calibration approaches are therefore not feasible there. In this paper, we aim to develop a new calibration approach that requires no additional data, is easy to implement, and substantially improves model performance, especially in regions where uncalibrated model performance is rather poor. This is achieved by using the Budyko framework, which provides a conceptual representation of the long‐term water and energy balance. We use a state‐of‐the‐art GHM and calibrate the model within nine river catchments of different sizes and characteristics. Since observed river discharge is available for these catchments, we are able to compare the Budyko‐based calibration approach to a classic discharge‐based calibration scheme and the uncalibrated model version. In all catchments, the Budyko‐based calibration approach decreases biases and increases model performance compared to the uncalibrated model version, although performance improvements obtained through a classic calibration approach are greater. Nonetheless, a Budyko‐based calibration is a valuable, intermediate approach between the use of a basic set of a priori hydrological parameters and classical calibration against discharge data.


Motivation
Global hydrological models (GHMs) have become a common tool to assess the large-scale dynamics of surface and subsurface hydrological processes, at scales ranging from the small catchments of more classical hydrologic modeling approaches up to continental basins. At smaller scales, models are usually calibrated using a large set of parameters. The majority of GHMs are, however, uncalibrated and typically use reduced sets of a priori parameters. For individual catchments or small sets of catchments, calibration is usually performed using observed discharge. Global modeling approaches suffer from multiple issues hindering a reasonable and feasible calibration procedure, most prominently the varying availability of accurate discharge observations in different world regions (Hanasaki et al., 2018; Müller Schmied et al., 2014). While dense gauging networks in North America, Europe, and parts of Asia enable a comprehensive calibration of model parameters, this is largely impossible in Africa, South America, Australia, and other parts of Asia. In addition, GHMs usually show large uncertainties and biases and do not permit quantitative assessments of past, present, and future hydroclimatological changes, especially in these data-scarce regions. Calibrating GHMs only against observed records that are concentrated in Europe and North America does not necessarily help to provide better estimates in such data-scarce regions. The availability of station observations has also been decreasing constantly since the early 1990s (Hunger & Döll, 2008; Vörösmarty et al., 2001).
The issue of heterogeneous, inaccurate observations is addressed by using recent, more comprehensive collections of discharge data (Do et al., 2018). However, these still suffer from large data gaps in many tropical to subtropical regions. In addition, observation-driven data sets of global streamflow (Barbarossa et al., 2018) have recently become available. These are spatially explicit, but still based on observed data records that are subject to data gaps. Another approach is the regionalization of model parameters. By using sets of catchments for calibration that cover a wide range of climatic and landscape characteristics, the obtained parameters can subsequently be transferred to grid cells with similar characteristics (Beck et al., 2016). Additionally, remotely sensed data sources have been shown to be useful in specific cases (Lopez et al., 2017).
The motivation to use the Budyko framework in this context is that it provides a benchmark for the long-term coupled water and energy balance that can be used, for example, to quantify intercatchment groundwater flows (Bouaziz et al., 2018) or to evaluate hydroclimatological data set combinations (Greve et al., 2014). In the spirit of those conceptual studies exploring possibilities to make use of the Budyko framework, we aim to fully avoid the use of additional data by presenting an innovative calibration approach for large-scale and global hydrological models that is solely based on this fundamental hydrological concept. Using the Budyko framework enables us to ensure a good representation of the basic characteristics of the long-term coupled water and energy balance, and we argue that, given its simplicity and low data requirements, an implementation into existing calibration tools is easy and efficient.
We hypothesize that this approach will reduce runoff biases but will not improve the subdaily to seasonal modeling of high and low flows, and that it will provide a major and simple step toward more reliable large-scale hydrological modeling in data-scarce and ungauged regions.
The Budyko framework conceptually represents the coupled water and energy balance. The Budyko hypothesis (Budyko, 1974) states that the evaporative index, E∕P, defined as the relative fraction of mean annual (or longer) average precipitation, P, that is used for evapotranspiration, E, is a function of the aridity index $\phi$, defined as the ratio of potential evaporation to precipitation (PET∕P). Here we use an extended version of the Budyko hypothesis, including an additional single, free parameter (here referred to as $\omega$):

$$\frac{E}{P} = f(\phi, \omega) \tag{1}$$

The Budyko hypothesis assumes no changes in water storages, therefore ideally requiring multiyear averages and catchment scales. Based on the Budyko hypothesis, it is often assumed that the first-order control of the climatological water and energy balance is represented through $\phi$, while the single, free parameter $\omega$ represents the wealth of all other (second-order) controls. A widely used function fulfilling the requirements of the Budyko hypothesis is the Tixeront-Fu equation (Fu, 1981; Tixeront, 1964; Zhang et al., 2004):

$$\frac{E}{P} = 1 + \phi - \left(1 + \phi^{\omega}\right)^{1/\omega} \tag{2}$$

The parameter $\omega$ technically represents the residual influence on E∕P besides $\phi$. If $\omega$ = 2.6, it matches the original, empirical Budyko curve as introduced by Budyko (1974). In recent years factors such as vegetation and topography (Carmona et al., 2016; Destouni et al., 2013; Donohue et al., 2010, 2012; Jaramillo & Destouni, 2015; Jiang et al., 2015; Li et al., 2013; Ning et al., 2019; Oudin et al., 2008; Porporato et al., 2004; Shao et al., 2012; Wang & Hejazi, 2011; Williams et al., 2012; Xu et al., 2013; Yang et al., 2009; Yokoo et al., 2008; Zhou et al., 2015; Zhang et al., 2001, 2016, 2018) have been proposed to control $\omega$. Other studies also suggest climatic and weather influences, like phase shifts in the seasonal cycles of P and PET or average storm depth (Berghuijs et al., 2014; Carmona et al., 2016; Dean et al., 2016; Donohue et al., 2012; Ning et al., 2019; Milly, 1994; Shao et al., 2012; Yokoo et al., 2008).
However, the results obtained across these studies are generally inconclusive and sometimes contradictory (Padrón et al., 2017). Hence, instead of using deterministic values within specific contexts dependent on the different factors mentioned above, a probabilistic representation of the parameter $\omega$ potentially provides a better, general representation of the full range of values (Greve et al., 2015; Gudmundsson et al., 2016). Instead of determining a parameter set best representing observed discharge, the overarching goal of this study is to show the potential of calibrating a parameter set that best represents the Budyko curve, including variability in $\omega$, thereby ensuring a model setup that provides a reasonable estimate of the long-term water balance as a function of climatic aridity. Furthermore, we do not expect the Budyko-based calibration to achieve performance improvements as great as a classical discharge-based calibration.
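To make the role of the free parameter concrete, the following minimal Python sketch evaluates the Tixeront-Fu equation for a few aridity values. The function names `fu_evaporative_index` and `within_physical_limits` are illustrative choices and not part of the study's code; the limits check simply encodes the supply limit (E ≤ P) and the demand limit (E ≤ PET).

```python
def fu_evaporative_index(phi: float, omega: float) -> float:
    """Tixeront-Fu curve: E/P = 1 + phi - (1 + phi**omega)**(1/omega)."""
    if phi < 0.0 or omega <= 1.0:
        raise ValueError("expects phi >= 0 and omega > 1")
    return 1.0 + phi - (1.0 + phi ** omega) ** (1.0 / omega)


def within_physical_limits(phi: float, evap_index: float) -> bool:
    """Supply limit E/P <= 1 and demand limit E/P <= phi (i.e., E <= PET)."""
    return 0.0 <= evap_index <= min(1.0, phi)


if __name__ == "__main__":
    # omega = 2.6 corresponds to the original Budyko curve according to the text.
    for phi in (0.5, 1.0, 2.0, 5.0):
        ep = fu_evaporative_index(phi, omega=2.6)
        print(f"phi={phi:4.1f}  E/P={ep:.3f}  inside limits: {within_physical_limits(phi, ep)}")
```

Larger values of the parameter move the curve toward the supply and demand limits (more evaporation for a given aridity), while values close to 1 favor runoff.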
In the first part of this study, we introduce the Community Water Model (CWatM) and specify the set of river basins that we use to evaluate the feasibility of a Budyko-based calibration approach (section 2). We then introduce the Budyko-based calibration method and outline how it is implemented (section 3). After presenting the results (section 4), we discuss the advantages and possible limitations of the Budyko-based calibration approach (section 5) before outlining the potential for future developments and further improvements (section 6).

Modeling Framework and Discharge Data
We use a state-of-the-art GHM, CWatM, to assess differences between observed and modeled discharge in nine river basins of different sizes and characteristics. The simulated discharge is obtained from (i) the uncalibrated version of the model using a set of basic hydrological parameters, (ii) a model version calibrated against observed discharge within the nine river basins, and (iii) a model version calibrated using the newly developed Budyko-based calibration technique. It is then compared with observed discharge data.

CWatM
CWatM (Burek et al., 2020) is a hydrological rainfall-runoff and channel routing model developed at the International Institute for Applied Systems Analysis (IIASA). It is used to quantify water availability, human water use, and the effects of water infrastructure, including reservoirs, groundwater pumping, and irrigation. It is grid-based, with recent versions for 0.5° and 5-arcmin resolutions (with subgrid resolution taking topography and land cover into account). It operates at daily time steps (with subdaily time stepping for soil and river routing). A schematic view of all featured processes is provided in Figure 1. Please note that our main focus here will be on the simulation of runoff and discharge.
CWatM is comparable to other state-of-the-art GHMs like H08 (Hanasaki et al., 2008), WaterGAP (Alcamo et al., 2003), PCR-GLOBWB (van Beek et al., 2011; Wada et al., 2014), and LPJmL (Bondeau et al., 2007). The model is available from Burek et al. (2019) and on GitHub. In this study, CWatM is driven by the meteorological forcing data set WFDEI (WATCH Forcing Data methodology applied to ERA-Interim data; Weedon et al., 2014) at a 0.5° spatial resolution and a daily temporal resolution. From WFDEI, the model requires daily estimates of P, as well as surface air temperature, wind speed, relative humidity, incoming longwave and shortwave radiation, and surface air pressure as inputs to calculate potential evaporation using the Penman-Monteith method based on a reference crop surface (Allen et al., 1998). A crop factor is used to account for differences in vegetation characteristics in comparison to the reference crop. Elevation data used in this study are extracted from SRTM (Jarvis et al., 2008) and the HYDRO1k Elevation Derivative Database (USGS, 2002). Soil properties are obtained from the Harmonized World Soil Database (FAO et al., 2012) and transferred to Van Genuchten model parameters (van Genuchten, 1980) using the pedotransfer model Rosetta3 by Zhang and Schaap (2017). The calculation of the fraction of direct runoff is based on the ARNO scheme (Todini, 1996). Data used for reservoir operation (water supply, flood control, hydropower generation, and others) are obtained from the HydroLakes database (Lehner et al., 2011; Messager et al., 2016). Crop-specific calendars and growing season lengths are derived from the MIRCA2000 data set (Portmann et al., 2010). CWatM uses the kinematic wave approximation of the Saint-Venant equations (Chow et al., 1988) for river routing, together with a local drainage direction map that defines the dominant flow direction toward one of the eight neighboring grid cells (D8 flow model).
For the 0.5° version used here, the drainage direction map (DDM30) of Döll and Lehner (2002) is used. To calculate the kinematic wave in CWatM, static maps of channel width, channel bankfull depth, channel gradient, Manning's coefficient, and channel length are needed. These are calculated from the river network, the flow accumulation, elevation data, and average river discharge data, following the approach of Pistocchi and Pennington (2006). Please see Burek et al. (2020) for a more detailed description of CWatM hydrological processes.

River Basins
We selected a set of nine river basins worldwide, encompassing different climate and landscape characteristics (from arid to tropical wet, from lowland to mountainous) and sizes (ranging from 33,450 km² to 4.7 million km² and from 90 to 170,000 m³/s average annual discharge). This enables us to assess the potential of the Budyko-based calibration approach under varying conditions and to evaluate and analyze its potential caveats and limitations. For the basins, we used observed discharge from the Global Runoff Data Centre (GRDC), the United States Geological Survey (USGS), the Oesterreichische Wasserstrassen-Gesellschaft (viaDonau), and the Murray-Darling Basin Authority (MDBA). Table 1 provides an overview of the nine basins and their particular characteristics, sorted by their basin-wide aridity index from humid to arid. For calibration we used a time period of 10 to 20 years and either daily or monthly data depending on data availability. Validation was performed based on independent periods. Please note that the time series for the Upper Nile and the Zambezi rivers were too short to enable an assessment of independent calibration and validation periods.

Calibration
Calibration is performed using two approaches:
1. SimBud. The newly developed Budyko-based calibration, which determines a parameter set that best represents the characteristics of the Budyko framework.
2. SimDis. Calibration of CWatM against observed discharge for each catchment, representing the classic, ideal approach where measured discharge is available, assignment to a gauge is clear, and values are accurate.
The parameter sets obtained are common for all grid cells within each catchment and are used to set up individual catchment-specific model runs. The model results are further evaluated against discharge estimates obtained from model runs using an a priori, uncalibrated parameter set (Sim0). The uncalibrated parameter set is provided in Table 2 and is based on default parameter values either set to unity or to values that are commonly associated with an average hydrological response (such as for the preferential bypass flow or infiltration capacity).
Calibration is performed using an evolutionary computation framework in Python called DEAP (Fortin et al., 2012). DEAP implements the evolutionary algorithm NSGA-II (Deb et al., 2002), which is used here to perform a single-objective optimization. The calibration generally uses a population size (μ) of 256 and a recombination pool size (λ) of 32. We use 30 generations. For more information on the convergence, please refer to Figures S1 and S2 in the supporting information.
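The calibration settings above can be illustrated with a small, self-contained sketch. It is not DEAP and does not implement NSGA-II; it is a plain elitist (μ + λ) evolutionary loop written with the standard library only, under the assumption that μ individuals are kept and λ children are created per generation. The function name `evolve`, the crossover and mutation operators, and the toy objective are all hypothetical stand-ins (in practice the objective would wrap a full model run).

```python
import random


def evolve(objective, bounds, mu=256, lam=32, generations=30, seed=0):
    """Minimize `objective` over box `bounds` with an elitist (mu + lambda) loop."""
    rng = random.Random(seed)

    def make_child(parent_a, parent_b):
        child = []
        for (lo, hi), xa, xb in zip(bounds, parent_a, parent_b):
            w = rng.random()
            x = w * xa + (1.0 - w) * xb              # blend crossover
            x += rng.gauss(0.0, 0.05 * (hi - lo))    # Gaussian mutation
            child.append(min(hi, max(lo, x)))        # keep inside the parameter range
        return child

    # Initial population: mu random parameter sets, each paired with its score.
    population = []
    for _ in range(mu):
        individual = [rng.uniform(lo, hi) for lo, hi in bounds]
        population.append((objective(individual), individual))
    population.sort(key=lambda pair: pair[0])
    history = [population[0][0]]

    for _ in range(generations):
        children = []
        for _ in range(lam):
            pa, pb = rng.sample(population, 2)
            child = make_child(pa[1], pb[1])
            children.append((objective(child), child))
        # Elitist selection: the best mu of parents plus children survive.
        population = sorted(population + children, key=lambda pair: pair[0])[:mu]
        history.append(population[0][0])

    best_score, best = population[0]
    return best, best_score, history


if __name__ == "__main__":
    # Toy objective with five parameters standing in for a model-based objective.
    target = [0.3, 1.2, 0.8, 2.0, 0.5]
    toy = lambda x: sum((a - b) ** 2 for a, b in zip(x, target))
    best, score, history = evolve(toy, [(0.0, 3.0)] * 5)
    print("best parameter set:", [round(v, 3) for v in best], "objective:", round(score, 5))
```

Because selection is elitist, the best objective value recorded in `history` can never get worse from one generation to the next, which makes convergence easy to inspect.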

Budyko Calibration:
The Budyko-based calibration approach SimBud requires no additional data and employs the Budyko framework to improve the long-term water balance characteristics of the hydrological model with respect to climatic aridity. Within the calibration procedure, we compute the multiyear sums of daily P, E, and PET at each grid cell within a specific catchment (see Figure 2). The resulting ratios $\phi$ and E∕P constitute a point cloud within the Budyko space. A straightforward approach to assess the offset from the expected median Budyko curve would be to use the root-mean-square error (RMSE) to the original Budyko curve ($\omega$ = 2.6) as the objective function. However, this approach would not account for the inherent, nondeterministic nature of $\omega$, which led us to employ a probabilistic representation of the Budyko framework (Greve et al., 2015). This approach assumes that $\omega$ is a (quasi-)random variable, which was found to follow a Gamma distribution based on a set of several hundred catchments following the climate gradient of the contiguous United States. The objective function is thus based on a goodness-of-fit metric (here the Kolmogorov-Smirnov distance $D_i$) between the empirical distribution of the modeled $\omega$ values ($F(\omega_i)$) and the reference Gamma distribution $F_0(x) = \Gamma(4.54, 0.37) + 1$ derived from those observations (Greve et al., 2015):

$$D_i = \sup_x \left| F(\omega_i) - F_0(x) \right| \tag{3}$$

10.1029/2019WR026280
where sup denotes the supremum. This means that the calibration algorithm aims to find the parameter set that minimizes $D_i$ between the simulated and reference distribution. In summary, the calibration approach could be implemented with any appropriate goodness-of-fit metric and consists of the following general procedure: At first, the model parameters to be calibrated are selected; ideally these are related to the long-term water balance within a particular region or basin. An appropriate single-objective calibration tool (e.g., DEAP) is chosen, and the respective calibration settings (number of generations, population size, etc.) that provide sufficient convergence are specified. The model is subsequently calibrated based on the following steps:
1. For each grid cell within the chosen catchment or region, compute at least annual, or better, multiyear sums of P, E, and PET to obtain estimates of $\phi$ and E∕P.
2. Estimate $\omega$ (numerically) for each grid cell based on the values of $\phi$ and E∕P using equation (2). All $\omega$ values for each grid cell within the catchment are used to construct an empirical distribution. Please note here that $\omega$ is both very sensitive to small changes close to the physical limits and not defined outside the limits. We decided to ignore data points plotting outside the physical boundaries. However, it is important to make sure that the majority of data points plot inside the limits.
3. Define a reference distribution for $\omega$. If no direct reference is available, we suggest using a Gamma distribution $F_0(x) = \Gamma(4.54, 0.37) + 1$.
4. Select a goodness-of-fit metric $D_i$ (e.g., the Kolmogorov-Smirnov distance).
5. Implement the goodness-of-fit metric $D_i$ (here the Kolmogorov-Smirnov distance) as the objective function to be minimized: $F_{obj} = \min(D_i)$.
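The steps above can be sketched in a few dozen lines of standard-library Python. This is an illustrative sketch rather than the code used in the study, and it rests on several assumptions: the two Gamma parameters are read as shape 4.54 and scale 0.37 with a shift of 1, the parameter is found by bisection on the Tixeront-Fu equation, points that lie inside the limits but beyond the search range are capped at `omega_max`, and all function names are hypothetical.

```python
import math

SHAPE, SCALE, SHIFT = 4.54, 0.37, 1.0  # assumed reading: omega ~ Gamma(shape, scale) + 1


def fu(phi, omega):
    """Tixeront-Fu evaporative index E/P, arranged to avoid overflow for phi > 1."""
    if phi > 1.0:
        root = phi * (1.0 + phi ** (-omega)) ** (1.0 / omega)
    else:
        root = (1.0 + phi ** omega) ** (1.0 / omega)
    return 1.0 + phi - root


def solve_omega(phi, ep, omega_max=50.0):
    """Step 2: invert the Fu equation by bisection; None outside the physical limits."""
    if not (0.0 < ep < min(1.0, phi)):
        return None
    lo, hi = 1.0 + 1e-9, omega_max
    if fu(phi, hi) < ep:  # point hugs a limit: cap omega (assumption of this sketch)
        return hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if fu(phi, mid) < ep:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    ap = a
    for _ in range(10000):
        ap += 1.0
        term *= x / ap
        total += term
        if term < total * 1e-16:
            break
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))


def reference_cdf(omega):
    """Step 3: CDF of the shifted reference Gamma distribution."""
    return reg_lower_gamma(SHAPE, (omega - SHIFT) / SCALE)


def ks_distance(sample, cdf):
    """Step 4: one-sample Kolmogorov-Smirnov distance to a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def budyko_objective(p_sums, e_sums, pet_sums):
    """Steps 1 to 5: multiyear sums per grid cell -> omega sample -> KS distance."""
    omegas = []
    for p, e, pet in zip(p_sums, e_sums, pet_sums):
        if p <= 0.0:
            continue
        omega = solve_omega(pet / p, e / p)
        if omega is not None:  # ignore points outside the physical boundaries
            omegas.append(omega)
    return ks_distance(omegas, reference_cdf) if omegas else 1.0


if __name__ == "__main__":
    # Made-up multiyear sums (mm) for four grid cells, for illustration only.
    p = [800.0, 600.0, 1200.0, 400.0]
    e = [450.0, 420.0, 560.0, 330.0]
    pet = [700.0, 900.0, 800.0, 1400.0]
    print("objective (KS distance):", round(budyko_objective(p, e, pet), 3))
```

In a calibration run, `budyko_objective` would be evaluated on the model output of each candidate parameter set and passed to the optimizer as the value to minimize. Returning 1.0 when no grid cell plots inside the limits is a design choice of this sketch that penalizes such parameter sets.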
As outlined above, this approach can be implemented using any suitable goodness-of-fit test, and the choice of reference distribution is flexible. The characteristics of such distributions might, for instance, depend on climate type (Padrón et al., 2017), which can simply be accounted for by adjusting the reference distribution. The Budyko-based calibration can be used with any other hydrological model, as it is independent of the internal model structure and only requires standard model output (e.g., discharge, precipitation, and potential and actual evaporation).
As the Budyko framework is valid at multiannual time scales, a Budyko-based calibration approach ideally needs a long (at least several years) calibration time period. Here we use calibration periods of at least 10 years (see Table 1). We further apply the Budyko calibration at grid scale. The rather coarse spatial resolution of 0.5° ensures approximate water balance closure over long time scales, thus enabling the use of the Budyko framework at grid scale. Our approach further ensures that the resulting point cloud plots within the physical boundaries of the Budyko space and according to the probabilistic Budyko framework, thereby retaining the inherent variability within the Budyko space. The grid-based approach can also be used at other than basin scales.
Since the Budyko-based calibration approach focuses on the long-term water balance, being primarily related to the generation of runoff and evaporation at each grid cell, we decided to exclude parameters representing groundwater, routing, water demand, and lakes/reservoirs from the calibration procedure (see Table 2). The majority of these parameters (excluding groundwater) primarily represent short-term dynamics and calibrating them based on the long-term aridity index and precipitation partitioning is potentially spurious. Therefore, only five parameters (snowmelt coefficient, crop factor, soil depth factor, preferential bypass flow, and infiltration capacity) are finally calibrated using SimBud, whereas the other parameters are set to default values such as in Sim0. The parameter ranges are defined to cover a wide, though realistic range of the hydrological response associated with the parameters (see Table 2 for more information, units, and parameter ranges). The obtained best-fit parameter sets (based on a minimized Kolmogorov-Smirnov distance) are provided in Table 3.

Discharge Calibration:
We applied a modified version of the Kling-Gupta Efficiency (KGE') (Kling et al., 2012) as objective function:

$$\mathrm{KGE}' = 1 - \sqrt{(r - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2} \tag{4}$$

with r denoting the correlation coefficient between simulated and observed discharge, $\beta = \mu_s / \mu_o$ representing the bias ratio between mean simulated ($\mu_s$) and observed streamflow ($\mu_o$), and $\gamma = CV_s / CV_o$ as the variability ratio between the simulated ($CV_s$) and observed ($CV_o$) coefficient of variation (the unmodified KGE is computed based on the standard deviation).
Since r, $\beta$, and $\gamma$ have their optimum at unity, KGE' is based on the Euclidean distance from the optimal point (i.e., unity in all three components) of the Pareto front and therefore provides an estimate that simultaneously represents correlation, mean bias, and variability. Discharge calibration is here performed by maximizing KGE' against multiyear daily time series of observed discharge from the individual catchments. The calibration periods correspond to those of the Budyko-based calibration approach (see Table 1). While validation against observed discharge can be performed within the calibration period for the Budyko calibration (as the observed discharge is not part of the calibration routine and the respective objective function), a discharge-based calibration approach ideally requires independent validation periods (see Table 1).
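As a worked illustration of the objective function, the sketch below computes the modified efficiency and its three components from two equally long discharge series. It uses only the Python standard library; the function name `modified_kge` is an illustrative choice, and population (rather than sample) statistics are an assumption that cancels out in the two ratios.

```python
import math
from statistics import mean, pstdev


def modified_kge(sim, obs):
    """Return (KGE', r, beta, gamma) for simulated and observed discharge."""
    if len(sim) != len(obs) or len(sim) < 2:
        raise ValueError("series must have equal length of at least 2")
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(sim)
    r = cov / (sd_s * sd_o)                 # correlation coefficient
    beta = mu_s / mu_o                      # bias ratio
    gamma = (sd_s / mu_s) / (sd_o / mu_o)   # ratio of coefficients of variation
    kge = 1.0 - math.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)
    return kge, r, beta, gamma


if __name__ == "__main__":
    # Made-up monthly discharge values (m3/s), for illustration only.
    observed = [120.0, 150.0, 310.0, 420.0, 260.0, 140.0]
    simulated = [150.0, 170.0, 380.0, 500.0, 330.0, 190.0]
    kge, r, beta, gamma = modified_kge(simulated, observed)
    print(f"KGE'={kge:.3f}  r={r:.3f}  beta={beta:.3f}  gamma={gamma:.3f}")
```

Doubling every simulated value leaves r and the variability ratio at 1 but gives a bias ratio of 2, so the efficiency drops to 0. This illustrates why a pure volume bias is penalized without affecting the other two components.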
For discharge calibration, we use a set of 12 parameters that are optimized to maximize the KGE'. These parameters represent snow, soil, groundwater, and routing characteristics, as well as evaporation processes and the treatment of lakes within the model, and include the five parameters used in the Budyko-based calibration approach. Similar to the Budyko calibration approach, the parameter ranges are defined to cover a realistic range of the hydrological response associated with the respective parameters (see Table 2). The obtained best-fit parameter sets (based on a maximized KGE') are provided in Table 3.

Results and Discussion
The nine basins chosen here represent a wide range of climate types, catchment characteristics, and catchment area. By selecting these basins, we are able to focus both on those catchments where a Budyko-based calibration leads to significant increases in model performance and on those catchments illustrating the limitations and caveats of the simple Budyko-based calibration approach. We first assess the implications of the Budyko-based calibration approach (SimBud) within the Budyko space and compare it to the uncalibrated (Sim0) and discharge calibrated model runs (SimDis). In a following step, we discuss potential improvements regarding daily and seasonal discharge based on the obtained results. The obtained model results from the three experiments are validated against observed discharge. The model performance estimates for Sim0, SimBud, and SimDis presented in the following are for the validation periods indicated in Table 1. However, please note that SimBud does not require independent validation and calibration periods since no discharge observations are needed for calibration. In the following we present results for the respective best-fit parameter sets. Considering a larger selection of those simulations obtained during the calibration process that best minimize the objective function enables an assessment of the range of possible model results and parameter ranges and is provided in the supporting information (Figures S3 and S4 and Table S1).

Figure 3. Sample data clouds of Sim0 (red), SimBud (green), and SimDis (blue) for the nine study catchments. Each dot represents a grid cell within the catchment. The reference Budyko distribution used in this study is indicated by solid (median), dashed (25th, 75th quantile), and dotted lines (10th, 90th quantile). Subplots illustrate the associated sample distributions of $\omega$ alongside the reference distribution of $\omega$ as used in this study. Please note that the rightmost bar sums up all values larger than 4, including also those grid cells exceeding the theoretical limits of the Budyko framework.

Implications of Calibration Within the Budyko Space
By focusing on the Budyko space and the sample distributions of $\omega$ obtained from the three considered experiments (Sim0, SimBud, and SimDis), we are able to assess the associated implications of choosing a particular calibration approach (see Figure 3). For the majority of catchments (Rhine, Zambezi, Sacramento, Danube, Murray-Darling, and Indus), the uncalibrated run Sim0 generally plots below the median Budyko curve (according to the reference Budyko distribution), a sign that the uncalibrated model favors runoff. This is evident both within the Budyko space and from the associated sample distributions. The distributions generally peak at values below both the median and mean of the reference Gamma distribution. SimBud generates distributions of $\omega$ that better resemble the reference distribution. For these catchments, the resulting data cloud fits the distribution of Budyko curves within the Budyko space reasonably well.
The sample distribution of $\omega$ obtained from SimBud thus provides the best representation of the reference distribution in most catchments. Using discharge calibration SimDis, the data cloud often plots above the reference Budyko distribution (especially for the Rhine, Upper Nile, Danube, and the Amazon). In some catchments (Rhine, Danube, and the Amazon) many data points further plot above the theoretical limits of the Budyko space, therefore actually distorting the sample distributions of $\omega$ (technically $\omega$ is not defined for values exceeding the supply and/or demand limit). Even in catchments with a rather good representation of $\omega$ under Sim0 (e.g., Upper Nile and the Danube), SimDis usually shifts the data cloud to values above the median Budyko curve.
For the Amazon and Indus river basins, neither SimBud nor SimDis resembles the reference distribution, peaking either above (Amazon) or below (Indus) the Budyko distribution. The largest offsets are obtained for the Indus, where especially SimDis plots far below the Budyko distribution. The best representations of the reference distributions are obtained in both the Upper Danube and the greater Danube, the Upper Nile, and the Murray-Darling.

Implications of Calibration Regarding Catchment Discharge
Here we analyze the implications of the calibration experiments on (i) monthly river discharge and (ii) the representation of the mean annual cycle of river discharge. We further provide an overview of the relevant statistics in comparison to the observed discharge time series (see Table 3). Figure 4 provides an overview of (i) the observed time series of monthly discharge and the modeled time series obtained from (ii) Sim0, (iii) SimBud, and (iv) SimDis. Focusing on these time series, the most striking feature is that in those catchments where Sim0 generally plots below the reference Budyko distribution, an indication that the uncalibrated model favors Q over E (see Figure 3; Zambezi, Upper Nile, Sacramento, Murray-Darling), modeled discharge is generally overestimated in all months or specifically within the wet season (as for the Zambezi). This is largely resolved through calibration. However, while SimBud substantially increases the overall model performance, it is generally outperformed by SimDis.

Monthly Time Series
By further examining the associated model statistics (Table 4), we aim to assess whether the Budyko-based calibration approach SimBud provides a substantial improvement in model performance against the uncalibrated model Sim0 and how this compares to improvements achieved through the discharge calibration method SimDis.

Water Resources Research
Figure 5. Observed (gray) and modeled multiyear mean annual cycles of river discharge for the (a-i) nine study catchments. Please note differences in the total amount of monthly river discharge. Shaded areas denote the multiannual standard deviation.
For the Upper Danube and the Rhine, the uncalibrated run Sim0 already provides a rather good representation of the observed discharge time series, showing a reasonable KGE, NSE, and correlation coefficient and small biases. The statistics are generally improved under SimDis, while SimBud slightly improves KGE and most of the other statistics. Among the humid catchments featured in this study, the Amazon shows the lowest KGE in Sim0, while NSE, correlation, and bias are comparable to those of the Rhine and the Upper Danube. However, in the Amazon SimBud does not improve NSE and biases, while SimDis substantially improves KGE and NSE. For the greater Danube, KGE and correlation coefficient are already reasonable in Sim0 and are slightly improved under SimBud, compared to an overall stronger improvement under SimDis. However, NSE and bias are rather poor compared to the smaller European catchments (Rhine, Upper Danube) and the Amazon, and are again slightly improved under SimBud (while the bias still remains above 20%). SimDis significantly reduces the bias and further improves NSE.
Moving toward more arid climates, the overall performance of Sim0 is generally rather poor for the Upper Nile basin (including Lake Victoria), the Zambezi, the Indus, and the Sacramento rivers. Biases are especially large in Sim0 and are largely reduced in SimBud, even though substantial biases remain compared to the improvements obtained in SimDis for the Upper Nile and the Zambezi. While KGE was already quite high in Sim0 and is further improved by SimBud in the Upper Nile, large improvements from a small KGE in Sim0 toward a reasonably larger KGE in SimBud are obtained for the Zambezi and the Sacramento rivers. This also applies to NSE. Nonetheless, improvements obtained through discharge calibration SimDis are in any case more substantial than those obtained by SimBud, especially for the Upper Nile basin. Still, the Budyko-based calibration approach SimBud reduces the large offset and largely improves the model performance for the selected arid catchments, especially for the Zambezi and Sacramento rivers. In the Zambezi basin, the uncalibrated run Sim0 shows poor model performance, with almost no skill in simulating observed discharge, especially due to large positive biases that substantially overestimate discharge in the wet season. Further improvements are obtained using SimDis, almost completely resolving the bias.
Among the arid catchments, the Indus and especially the Murray-Darling show poor performance in Sim0. For the Indus basin, SimBud provides no improvement and largely underestimates observed discharge. However, even though SimDis slightly improves KGE, NSE, and the correlation coefficient, the overall model performance for SimDis is also still rather poor, implying that CWatM may not be capable of capturing the hydrology of the Indus. Starting from a very poor overall performance of the model in Sim0 for the Murray-Darling basin, large improvements are achieved by using SimBud, although overall model performance remains at a poor level. SimDis substantially improves biases and model performance in the Murray-Darling basin, providing an overall reasonable model performance. For an illustration of the percentage bias, please refer to Figure S5.
Figure 5 provides representations of the observed and modeled mean annual cycles of river discharge within the study catchments. We are again able to distinguish between catchments in which Sim0 already provides a good representation of the observed annual cycle (Rhine, Upper Danube, and also partly the Danube), and catchments in which Sim0 generally overestimates discharge throughout all seasons (Upper Nile, Sacramento, and Murray-Darling) or specifically within the wet season (Zambezi). In the Amazon, the observed annual cycle is more pronounced compared to Sim0, showing more (less) discharge in the wet (dry) season, resulting in an overall rather small relative bias (see Table 3). In the Indus, modeled discharge is largely underestimated in the wet season. SimDis generally resembles the observed annual cycle, except for the Amazon basin, where it shows a slight shift in peak discharge, and for the Indus, where it largely underestimates wet season discharge. SimBud also resembles observed discharge in many catchments (Rhine, Zambezi, Sacramento, and the Upper Danube), while it overestimates winter discharge in the greater Danube.
In some catchments (Upper Nile, Murray-Darling, and the Danube), SimBud still shows substantial biases, but outperforms the uncalibrated run Sim0 and shifts the modeled run toward the observed annual cycle. In general, for all catchments, SimBud leads to a more realistic representation of the observed annual cycle.
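For readers who wish to reproduce this kind of comparison, the skill scores referred to above (NSE, KGE, correlation, and percentage bias) can be sketched in a few lines. This is a minimal illustration rather than the evaluation code used in this study: the function names are hypothetical, the KGE is assumed to follow the original three-component form (correlation, variability ratio, and bias ratio), and the percentage bias is assumed to be positive when the model overestimates discharge.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    # Population standard deviation; only ratios of it are used below.
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(obs, sim):
    """Linear correlation between observed and simulated discharge."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    return cov / (_std(obs) * _std(sim))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 equals the observed mean."""
    mo = _mean(obs)
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed three-component form)."""
    r = pearson_r(obs, sim)
    alpha = _std(sim) / _std(obs)    # variability ratio
    beta = _mean(sim) / _mean(obs)   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def percent_bias(obs, sim):
    """Percentage bias; positive values indicate overestimation (assumed sign)."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


# Made-up monthly discharge values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [20.0, 40.0, 60.0, 80.0]   # timing is right, volume is doubled
print(nse(observed, simulated), kge(observed, simulated),
      percent_bias(observed, simulated))
```

In this made-up case the correlation is perfect while NSE and KGE are strongly penalized, which mirrors the situation described for catchments in which the uncalibrated run overestimates discharge throughout all seasons.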

Advantages and Limitations of a Budyko-Based Model Calibration
We used a set of nine catchments of varying size, surface, and climate characteristics in order to evaluate the potential of a Budyko-based calibration approach for large-scale hydrologic models. We were able to show that a Budyko-based calibration method substantially increases model performance in the majority of catchments, but does not lead to similar improvements as those achieved through a classical discharge-based calibration approach. However, calibration based on the Budyko framework requires no additional data and is easy to implement into various modeling frameworks. As the Budyko framework provides a simple, conceptual characterization of the long-term water balance as a function of climatic aridity, it ensures its reasonable representation within a modeling context. It was especially shown to be useful in reducing biases and also in increasing model performance regarding monthly river discharge dynamics. As it is easy to implement within any calibration tool, we conclude that a Budyko-based calibration approach could be a useful alternative to uncalibrated, basic hydrological parameter sets that are widely used in large-scale hydrologic models, especially in ungauged or poorly gauged regions.
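The dependence of the long-term water balance on climatic aridity can be made concrete with a short sketch. The example assumes Fu's one-parameter form of the Budyko curve, E/P = 1 + φ − (1 + φ^ω)^(1/ω), where φ = Ep/P is the aridity index; the function names and the value ω = 2.6 are illustrative assumptions and are not taken from CWatM.

```python
def fu_evaporative_fraction(aridity, omega=2.6):
    """Long-term E/P from Fu's equation for a given aridity index Ep/P.

    Assumption: Fu's parametric Budyko curve with shape parameter omega > 1.
    """
    if aridity < 0:
        raise ValueError("aridity index must be non-negative")
    if omega <= 1:
        raise ValueError("omega must be greater than 1")
    return 1.0 + aridity - (1.0 + aridity ** omega) ** (1.0 / omega)


def runoff_fraction(aridity, omega=2.6):
    """Long-term Q/P, assuming negligible storage change (Q = P - E)."""
    return 1.0 - fu_evaporative_fraction(aridity, omega)


# Humid (phi < 1) to arid (phi > 1) conditions.
for phi in (0.5, 1.0, 2.0, 4.0):
    print(f"phi={phi:.1f}  E/P={fu_evaporative_fraction(phi):.3f}  "
          f"Q/P={runoff_fraction(phi):.3f}")
```

The curve stays below both the energy limit (E/P ≤ φ) and the water limit (E/P ≤ 1), which is the sense in which the framework constrains a model's long-term evaporative fraction without any discharge data.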
While it has advantages, a simple Budyko-based calibration is also subject to a variety of potential limitations, and the suggested calibration approach should be interpreted and applied with caution. Nonetheless, the Budyko model ensures a physically consistent representation of the long-term water balance based on climatic aridity. A Budyko-based calibration approach makes use of this and enables a better representation of long-term water balance components in complex, large-scale models, thereby substantially reducing biases and improving skill in regions with poor default model performance.
We used a set of catchments that allowed us to assess the disadvantages and caveats of the Budyko calibration across different climates. Focusing, for instance, on both the greater Danube basin and an Upper Danube subbasin provides some insight into issues arising from growing catchment size. In the large greater Danube basin, as opposed to the smaller Upper Danube subbasin, SimBud does not provide reasonable improvements, but rather reveals possible limitations of the Budyko-based calibration approach. In the larger, almost continental-sized greater Danube basin, other processes related to routing, water extraction, and the representation of lakes and reservoirs likely become more important. Such processes are not considered within the Budyko-based approach as these are assumed to be only effectively represented through discharge calibration. This becomes even more evident when considering one of the largest river basins in the world, the Amazon basin. In the Amazon, average discharge is around 170,000 m³/s and depends mostly on the timing of streamflow, which is primarily determined through river routing and the representation of lakes and reservoirs. Hence, even though SimBud slightly improves model performance for the Amazon, the improvements obtained in SimDis are far more substantial.
The Budyko-based calibration approach further improves the long-term water balance at grid level. Certain river basins that extend into semiarid to arid regions might not be well represented within SimBud since most of the runoff is generated in a rather small fraction of the catchment area. We illustrate this by considering the Murray-Darling river basin. The Murray-Darling rises within the humid eastern Australian mountain ranges, while most of the basin area is semiarid and the transmission loss from the river is substantial. The transmission loss is mostly controlled through routing processes, which are not assumed to be effectively calibrated in SimBud. Since the Budyko approach aims to better represent the long-term water balance as a function of climatic aridity at each grid cell, a lot of emphasis is put on the large number of semiarid grid cells that do not generate substantial amounts of runoff. Hence, large improvements are expected and achieved through using SimDis, while SimBud reduces the very large bias of Sim0, without substantially improving most other statistics. This might also apply to other river basins with similar characteristics, such as the Niger or the Nile. Additionally, it is important to consider that many of those basins (and especially the Murray-Darling basin) are heavily managed, and water extractions for irrigation are substantial and need to be taken into account to provide realistic discharge simulations.
Generally speaking, it appears that SimBud shows the best performance under humid and more temperate conditions (e.g., in midlatitude regions such as Europe and North America), although Sim0 already provides reasonable results in these regions. The highest potential for model improvements is indeed found in semiarid to arid catchments, where the overall model performance of Sim0 is rather poor and largely improved by SimBud, despite still showing substantial biases (see Table 4). A reasonable representation of discharge in arid regions also requires a good setup of model parameters that are not necessarily improved through SimBud. Even though not explicitly assessed here, it is likely that SimBud will also improve model performance in high-latitude catchments, since snowmelt coefficients are directly calibrated. As mentioned above, it is also important to consider that the nine catchments studied here are subject to human interventions. Previous studies suggest that population density (Jiang et al., 2015; Wang & Hejazi, 2011), irrigation (Han et al., 2011; Jaramillo & Destouni, 2015; Wang & Hejazi, 2011), and dams and reservoirs substantially impact the representation of the water balance within the Budyko framework. The results also need to be considered in this context.
Another example of a basin where no significant improvement in model performance could be expected from SimBud is the Indus. Neither SimBud nor SimDis substantially improves model performance in the Indus basin, likely as a result of either poorly constrained forcing data or a poor parameterization of relevant processes within CWatM. Even sophisticated calibration approaches cannot overcome biases in the forcing data, and SimBud is no exception. Discharge in the Indus basin is thus largely underestimated in both SimBud and SimDis.
It is also important to note that the calibration method presented here could be further improved. This might, for instance, be achieved by using different reference distributions representing variations in . Such distributions depend, for example, on the climate type (Padrón et al., 2017). However, this also implies that we might overestimate the variability in by using a reference distribution obtained from observations covering a large climatic gradient. Another improvement could be the use of other, more comprehensive goodness-of-fit measures (such as the Anderson-Darling test) or approaches based on maximum likelihood estimation. These could, in comparison to the Kolmogorov-Smirnov distance, provide a better representation of the characteristics of the particular reference distribution and might increase the selectivity of a parameter set in order to represent the entirety of and E∕P ratios. It should further be considered that, at the moment, only the water balance of an individual grid cell is estimated: neither the storage term, incoming discharge from upstream cells, nor evaporation from rivers, lakes, and wetlands is taken into account. Taking this into consideration, possible limitations with respect to the Budyko-based calibration approach need to be evaluated in the future.
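To illustrate the role of the Kolmogorov-Smirnov distance as a selection criterion, the following sketch compares samples of modeled values against a reference sample and picks the candidate parameter set with the smallest distance. It is a simplified illustration under stated assumptions, not the calibration code of this study: the function names, the candidate labels, and all numbers are hypothetical, and the reference distribution is represented by a finite sample rather than a fitted distribution.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance.

    Returns the largest absolute difference between the two empirical
    cumulative distribution functions, evaluated at every pooled data point.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    distance = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)   # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)   # fraction of b that is <= x
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def best_parameter_set(candidates, reference):
    """Return the label of the candidate closest to the reference sample."""
    return min(candidates, key=lambda name: ks_distance(candidates[name], reference))


# Hypothetical per-grid-cell values produced by three candidate parameter sets.
reference_sample = [1.8, 2.1, 2.4, 2.6, 2.9, 3.3, 3.8]
candidate_runs = {
    "set_a": [1.1, 1.2, 1.4, 1.5, 1.7, 1.9, 2.0],
    "set_b": [1.9, 2.2, 2.5, 2.6, 3.0, 3.2, 3.9],
    "set_c": [3.5, 3.9, 4.2, 4.6, 5.0, 5.5, 6.1],
}
for name, values in candidate_runs.items():
    print(name, round(ks_distance(values, reference_sample), 3))
print("selected:", best_parameter_set(candidate_runs, reference_sample))
```

Because the Kolmogorov-Smirnov distance depends only on the single largest gap between the two cumulative distributions, it is comparatively insensitive to differences in the tails, which is one motivation for the alternative goodness-of-fit measures mentioned above.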

Concluding Remarks
The Budyko-based approach reduced biases and improved model performance in comparison to an uncalibrated model simulation throughout a set of nine catchments. However, as expected, these improvements in model performance are not as substantial as those obtained with a classical approach using river discharge as the objective function for calibration. Nonetheless, our initial hypothesis, stating that a Budyko-based calibration method potentially provides better results than using a priori sets of basic hydrological parameters without calibration, was supported by the results. Hence, the Budyko-based calibration method introduced here can be seen as an independent and beneficial alternative to predefined parameter sets in ungauged catchments or in regions with sparse data.
Given previous conceptual research to constrain the parameter space of hydrological models based on the Budyko framework (Kapangaziwiri et al., 2012;Li et al., 2014), we present here a novel method to directly and independently calibrate hydrological model parameters at grid scale without the use of any additional data. As calibration based on the Budyko framework ensures a good representation of the long-term water balance based on climatic aridity, improvements in annual and partly also in seasonal hydrological statistics are achieved and improve hydrological model performance most in poorly gauged basins.
The simplicity of the Budyko framework should enable an easy implementation into other existing large-scale hydrological models. Compared to calibration based on various data products, Budyko-based calibration approaches can be directly applied and require no preprocessing and data handling. While remote sensing products potentially provide more advanced results, especially in data-scarce and ungauged regions, it is also important to note that these products obtained from satellite data are based on various (model) assumptions and are thus subject to large uncertainties from multiple sources. Hence, a Budyko-based calibration approach will provide an easily accessible tool to ensure an improved representation of the long-term water balance as a function of climatic aridity within large-scale hydrological models. It could further be implemented without any data prerequisites. Even if not used on its own, it will at least provide a useful addition to calibration approaches using multiple information sources based on, for example, remote-sensing products (Nijzink et al., 2018).
It is important to note, however, that the improvement in model performance is subject to the particular model setup and the choice of model parameters. Using other hydrological models or more sophisticated parameter sets might not permit a substantial improvement in model performance under Budyko-based calibration approaches. Nevertheless, due to the general applicability of the Budyko framework, representing (and therefore ensuring) valid and consistent long-term hydrological conditions as a function of climatic aridity, it is likely that a Budyko-based calibration will improve model performance in other hydrological models as well (at least in comparison to uncalibrated setups).
In summary, the Budyko-based calibration approach presented in this study enables us to implement a calibration tool for CWatM, ideally performing a global calibration for all catchments in order to replace the basic hydrological parameter assumptions that are used in most GHMs. This will potentially overcome unrealistic representations of available water resources in river catchments.