Bayesian Model Averaging With Fixed and Flexible Priors: Theory, Concepts, and Calibration Experiments for Rainfall‐Runoff Modeling

This paper introduces for the first time the concept of Bayesian model averaging (BMA) with multiple prior structures, for rainfall‐runoff modeling applications. The original BMA model proposed by Raftery et al. (2005, https://doi.org.10.1175/MWR2906.1) assumes that the prior probability density function (pdf) is adequately described by a mixture of Gamma and Gaussian distributions. Here we discuss the advantages of using BMA with fixed and flexible prior distributions. Uniform, Binomial, Binomial‐Beta, Benchmark, and Global Empirical Bayes priors along with Informative Prior Inclusion and Combined Prior Probabilities were applied to calibrate daily streamflow records of a coastal plain watershed in the southeast United States. Various specifications for Zellner's g prior including Hyper, Fixed, and Empirical Bayes Local (EBL) g priors were also employed to account for the sensitivity of BMA and derive the conditional pdf of each constituent ensemble member. These priors were examined using the simulation results of conceptual and semidistributed rainfall‐runoff models. The hydrologic simulations were first coupled with a new sensitivity analysis model and a parameter uncertainty algorithm to assess the sensitivity and uncertainty associated with each model. BMA was then used to subsequently combine the simulations of the posterior pdf of each constituent hydrological model. Analysis suggests that a BMA based on combined fixed and flexible priors provides a coherent mechanism and promising results for calculating a weighted posterior probability compared to individual model calibration. Furthermore, the probability of Uniform and Informative Prior Inclusion priors received significantly lower predictive error, whereas more uncertainty resulted from a fixed g prior (i.e., EBL).


Introduction
Model uncertainty is a critical problem that raises questions about the alternative modeling paradigm to simulate observed processes. Which set of the model approaches is appropriate to faithfully simulate and explain the variation in the observational records? How should one explicitly or implicitly evaluate the suitability of alternative models and characterize predictive uncertainty arising from different modeling assumptions? In addressing these questions, multimodel ensembles (MME) has become a popular alternative for probabilistic merging of simulations. By exploiting the information contained in multiple modeling structures, the MME approach is expected to provide better and more reliable estimates of forecast uncertainty. In the last decade, MME has gained popularity in different research disciplines including climatology (Barnston et al., 2003;Grimitt & Mass, 2002;Palmer et al., 2004;Raftery et al., 2005), public health (e.g., Thomson et al., 2006), and agriculture (e.g., Cantelaube & Terres, 2005). In hydrology, MME has led to significant improvement in flow simulation and estimates of the forecast probability density function (pdf; e.g., Duan et al., 2007;Doblas-Reyes et al., 2005;Gneiting et al., 2005;He et al., 2018;Madadgar & Moradkhani, 2014;Min & Hense, 2006;Najafi & Moradkhani, 2015;Rajagopalan et al., 2002Rajagopalan et al., , 2005Rings et al., 2012; among others).
The first attempt in using MME in hydrology used the simple average method (SAM), the weighted average method (WAM) and the neural network method (NNM) in simulation (Shamseldin et al., 1997). Fuzzy systems were also employed to combine the simulation results of different conceptual rainfall-runoff models in a flood forecasting study (Xiong et al., 2001). Subsequently, a number of studies showed that ensemble simulations outperformed the best model if the aim was to use the outputs in operational forecasting system Grantz et al., 2005;Georgakakos et al., 2004;Rajagopalan et al., 2005).
The theoretical basis of MME was strengthened by the introduction of Bayesian model averaging (BMA), a statistical method of producing probabilistic forecasts from ensembles of regression models, accounting for model uncertainty within a Bayesian framework. An excellent overview of the theory behind BMA is given by Hoeting et al. (1999). Raftery et al. (2005) provided an early application of BMA to weather forecasting. BMA addresses uncertainty by calculating a weighted average of all potential model combinations (Feldkircher & Zeugner, 2009) and describes both between-and in-model variances (e.g., Ajami et al., 2006). These weights arise naturally as posterior model probabilities (PMPs) and sum all strengths of individual competing models based on the probabilistic likelihood measures of a model.
Despite recent applications of BMA in hydrology (Ajami et al., 2006;Duan et al., 2007;Darbandsari & Coulibaly, 2019;Grantz et al., 2005;Mendoza et al., 2014;Najafi & Moradkhani, 2015;Parrish et al., 2012;Rajagopalan et al., 2005;Schepen & Wang, 2015;Sharma et al., 2019;Vicuña et al., 2011;Xu et al., 2019), limited insight has been provided into the conditional distribution of each individual forecast model. The preliminary reason is that the most common BMA approach centers on a linear function of the original forecast and standard deviation and assumes each ensemble member has a normal prior distribution with a similar variance. However, Rings et al. (2012) have recently strengthened this approach and proposed a joint particle filtering and Gaussian mixture modeling framework to derive the conditional pdf. Nevertheless, both approaches can be criticized for relying too strongly on a symmetric prior distribution (i.e., normal or Gaussian priors) that may be difficult to justify in hydrological applications (see Samadi et al., 2018). In other words, BMA PMPs in the context of model uncertainty are typically rather sensitive to the specification of the priors (Fernández et al., 2001). An alternative and more appealing approach would apply a hierarchy of prior specifications.
BMA uses a Bayesian approach to quantify model uncertainty. Specifically, consider a class of regression models denoted by M γ where the subscript represents a model index. The analyst specifies a prior on the unknown θ γ = (α, β γ , φ) ∈ Θ γ , where α denotes an intercept that is common to all models, β γ is the p γ -dimensional vector of nonzero regression coefficients, and φ represents a precision parameter (the inverse of the error variance) in each model. The priors on the regression parameters generate prior distribution for the possible models p(M γ ), and from these posterior probabilities of each model can be obtained (Liang et al., 2009): This study aims to use four rainfall-runoff models to predict (via simulation) daily average discharge. The outputs of these rainfall-runoff models are denoted as X 1 , X 2 , X 3 , X 4 (Equation 2). As explained in section 2.2, the models use forcing input data to simulate the streamflow. The simulated outputs from the hydrological models are then used as the regressors in the BMA approaches to predict observed daily discharge Y from 2003 to 2005, each simulation model provided a weight to the predicted daily streamflow records: where β i is considered as the weight of the ith hydrological model and ε represents the random error term.
BMA was applied in model selection to calculate the weights based on the performance of each hydrological model during a training period. Various priors on the regression coefficients will be considered. One should be wary of subjective priors for model-specific coefficients that are not robust enough for a high-dimensional model space (Fernández et al., 2001;Liang et al., 2009). However, despite the acknowledgment that PMPs can be quite sensitive to the specification of the prior distribution (Garthwaite & Mubwandarikwa, 2010;George, 1999;Kass & Raftery, 1995;Zellner, 1986), the BMA approach has not gained significant attention in practice primarily due to the perceived computational burden.
To address this shortcoming, Zellner's (1986) g prior for β γ (Equation 3) with sample size N and design matrix X (which contains K potential variables or, in this case, hydrologic simulations) was employed in this study.
Zellner's prior takes the form of the normal distribution with zero mean vector and covariance matrix gφ Zellner's priors can be used as a special case of mixtures of g priors (Liang et al., 2009), usually combined with a locally uniform (Jeffreys) prior on β 0 (Bové & Held, 2011), if the design matrix As an extension to Zellner's g priors, the hyper-g priors distribution, proposed by Liang et al. (2009), was applied in this study. These priors permit a closed-form expression for the corresponding marginal likelihood f(y|γ) which is vital for efficient model inference (Bové & Held, 2011). The prior on the hyperparameter g is selected, based on standard Bayesian asymptotic theory proposed by Bernardo and Smith (2000), so that the prior distribution converges to the normal distribution (Equation 4).
Both the Zellner's g and hyper-g prior families provide flexibility, adaptability, and a computationally efficient procedure to deal with high-dimensional (noisy) data that are common in hydrological simulation.
The practical advantage of Zellner's g is that it exerts nonnegligible influence on posterior inference and governs how posterior mass is spread over the models while hyper-g prior families adjust the distribution of posterior mass based on the information provided by the data (prior dependence) and greatly reduce the g prior sensitivity. More significantly, the hyper-g prior has several advantages for hydrological model simulation. First, the hyper-g prior greatly reduces the g prior sensitivity of posterior mass by shrinking the estimated coefficients more toward zero in noisier data sets, which allows for a data-dependent shrinkage factor (also known as a Bayesian "goodness-of-fit" indicator). Second, it adjusts the distribution of posterior mass based on the patterns in the data. Thus, if noise dominates the data set, the posterior distribution will be distributed more evenly, whereas in the case of minor noise, posterior mass will be concentrated even more as in fixed settings that impose large values for g (Feldkircher & Zeugner, 2009). Third, in addition to being computationally feasible, it gives the user the flexibility of formulating prior belief without the risk of affecting posterior statistics.
Since the introduction of these priors in 2009, they have rarely been implemented in any hydrological modeling application. In this sense, we implemented the contributions of Liang et al. (2009) and Feldkircher and Zeugner (2009) to BMA for a coastal plain streamflow calibration where a closed-form representation of posterior mass was appropriate based on the noisy data set. In addition, we included several prominent prior structures such as the benchmark prior (Fernandez et al., 2001) to examine the predictive properties of various settings for g. These priors lead to simple closed-form expressions of posterior statistics and the resulting marginal likelihood that may shed light on predictive posterior mass driven by various prior structures for streamflow calibration. We hypothesize that these priors can enhance hydrological simulation if we relax the assumption of a preconceived and time-invariant form of prior in favor of fixed (Zellner's g families) and flexible (hyper-g prior families) priors. This should further help integrate the concept of BMA with multiple prior structures into hydrological modeling calibration.
This paper introduces the theory and concepts of fixed and flexible priors and demonstrates their usefulness and applicability for daily streamflow simulation of a coastal plain drainage system in the southeast United States (SEUS). A new sensitivity analysis model, the generalized parameter sensitivity analysis (GSA), was developed to define which parameters have significant influence on streamflow calibration of conceptual to semidistributed hydrological models. The simulations were then accomplished using the BMA procedure with various prior specifications, combining the simulations of the posterior model distribution for each potential hydrological model. As the first attempt to employ this learning process, this paper does not deal with the details of each individual modeling simulation, but rather focuses on the application of multiple priors for the BMA simulation, aiming at better modeling performance and introducing this procedure to the hydrology community.
This paper is organized as follows: In section 2, the underlying theory and concepts of BMA with various priors are introduced. This is followed with a detailed description of the hydrological models, parameter uncertainty algorithm, and the study area. In section 3, we present the results of parameter uncertainty and streamflow simulation. Section 4 shows the results using fixed and flexible priors in BMA analyses. Finally, in section 5 a summary with conclusions is presented.

BMA
BMA is a statistical postprocessing method that addresses model uncertainty in a canonical regression problem . If we consider a linear model structure, with y being the dependent variable, α γ a constant, β γ the coefficient, and ε a normal independent and identically distributed (IID) error term with variance σ 2 : In the following discussion, X is a model matrix whose columns correspond to candidate explanatory variables and X γ denotes any model containing a subset of such explanatory variables. If there are many potential explanatory variables, each subset of which forms a model X γ from the columns of matrix X, then variables that should be included in the model need to be determined by estimating models for all possible combinations of these variables X 1 , X 2 , .... In perusing this, a single linear model that includes all possible variables may be inefficient and infeasible especially if there are a limited number of observations. Including unnecessary explanatory variables can also lead to bias in the estimates of the coefficients of the important explanatory variables.
To overcome this challenge, BMA has become a popular alternative to model selection. BMA tackles the problem by estimating models for all possible combinations of models and constructs a weighted average over all of them (Zeugner & Feldkircher, 2015). If X contains K potential variables, this means estimating 2 k variable combinations and thus 2 k models. The weighted average stems from PMPs that arise from Bayes' theorem shown below: where P(y| X) represents the integrated likelihood which is constant over all models (a multiplicative term). Thus, the PMP (α) is proportional to P(y| M γ , X) as the integrated likelihood (Zeugner & Feldkircher, 2015) which reflects the probability of the data given model M γ . By renormalization of the Equation 6, the PMPs and the model weighted posterior distribution can be inferred for any statistic θ or the estimator of the coefficient β γ .
One can elicit the model prior p(M γ ) that reflects prior distribution. In BMA theory, Markov chain Monte Carlo (MCMC) samplers are used to gather results on the most important part of the posterior model distribution; thus, to approximate it as closely as possible (e.g., Zeugner & Feldkircher, 2015). This study used two MCMC samplers that are different in the way they propose candidate models. Specifically, we used Birth-Death (BD) and Reversible-Jump (Rev.jump) MCMC samplers. BD is the most common sampler in BMA that wanders through model space by adding or dropping regressors from the model. In this algorithm, a potential covariate (K) is randomly chosen; if K forms as part of model M i , then M j as the candidate model will have the same set of covariates as M i but for the chosen variable (Zeugner & Feldkircher, 2015).
Rev.jump was proposed by Madigan and York (1995) that draws a candidate model using BD algorithm with 50% probability. M j randomly drops one covariate with respect to M i and (randomly) adds one chosen from the potential covariates that were not included in the model M i (Zeugner & Feldkircher, 2015). We refer the readers to Zeugner and Feldkircher (2015) for more information on this sampler. The readers are referred to Raftery et al. (2005), Fernández et al. (2001) and Ley and Steel (2009) for more information on BMA.

Mathematical Structures of Priors 2.2.1. Zellner's g Priors
Noting that the Zellner's g prior forms part of the conjugate normal-gamma family in Equation 3, the posterior probabilities of models can be expressed through the Bayes factor for pairs of hypotheses using Equation 8: where BF M γ 0 : M b h i is the Bayes factor that compares each M γ to a base model M b (known as the "encompassing" approach; see Zellner & Siow, 1980) given by Here, we refer the choice of the base model of M N (the null model) as the null-based approach, while the full-based approach utilizes the full model as the base model (see Liang et al., 2009). M γ and M N are compared through the hypotheses H 0 : β γ = 0 and H 1 : β γ ∈ R pγ .
We may assume here that the columns of X γ have been centered so that 1 T X γ = 0, in which case the intercept α may be regarded as a common parameter to both M γ and M N (Liang et al., 2009). This has led to the adaptation of where α represents the (scalar) intercept. The priors in Equations 8 and 9, as a default prior specification for α, β γ , and φ under M γ are simply Zellner's g prior. The marginal likelihood of Equations 10 and 11 can be estimated as below: where R 2 γ denotes the ordinary coefficient of determination of the regression model M γ . If we compare M γ with the null model M N , the resulting Bayes factor is In the next subsection, we explain fixed and flexible g priors based on Zellner's priors used in this study.

Fixed g Priors
In BMA theory, the choice of g effectively controls model selection. A large g typically concentrates the prior on parsimonious models with a few large coefficients (Liang et al., 2009), while a small g leads to concentrate the prior on saturated models with small coefficients (George & Foster, 2000). In this study we used below fixed g priors, and key terms are explained below in the following.
1. Unit information prior (UIP or g-UIP) or uniform prior. The UIP g prior, proposed by Kass and Wasserman (1995), includes the amount of information about the parameter equal to the amount of information contained in one observation. The amount of information in a parametric family is defined through the Fisher information. In this prior, the unit information prior corresponds to taking g = n, leading to a Bayes factor that behaves like the Bayesian information criterion (BIC; Liang et al., 2009). The UIP prior reflects a common prior model probability of p(M γ ) = 2 −K . K denotes the number of potential regressors (in our context, the number of hydrological models). The restriction in choosing prior expected model size and other factors is the main drawback of uniform prior distribution model in the BMA theory. 2. g-Risk inflation criterion (g-RIC or RIC). Foster and George (1994) proposed to calibrate a BMA model based on the g-RIC. Setting g = K 2 calibrates the PMP to asymptotically match the risk inflation criterion proposed by Foster and George (1994). 3. Benchmark (g) prior (BRIC). Fernández et al. (2001) recommended the use of g = max(N, K 2 ) where K is the number of potential regressors, and N represents the sample size. BRIC bridges the g-UIP and the g-RIC priors, depending on the dimension of K. 4. Empirical Bayes Local (EBL) prior. EBL estimates a separate g for each model. The estimated g using the EBL prior is the maximum marginal likelihood estimate that is nonnegative and is computed using the following formula.ĝ denotes the usual F statistic to test β γ = 0.

5.
Global empirical Bayes (GEB) prior. This prior assumes one common g for all models in the BMA model and estimates g from the marginal likelihood of the data and averages them over all models (Equation 15).
One may view the marginal maximum likelihood estimate of g as a posterior mode under a uniform (improper) prior distribution for g.

6.
Binomial model prior. The binomial model prior is a simple alternative to the uniform prior. This approach sets a common probability θ of including each regressor. The prior probability of a model of size k γ is given by

Flexible g Priors
Let π(g) denote the prior on g. The marginal likelihood of the data p(Y| M γ ) is proportional to the Bayes factor computed as below: whereα andβ γ are the ordinary least squares estimates of α and β, respectively, for model M γ . As explained above, the PM of β γ under a specific selected model is a linear shrinkage (or Bayesian "goodness-of-fit" indicator) estimator with a fixed shrinkage factor g/(1+g). On the other hand, using a set of flexible g priors leads to adaptive data-dependent shrinkage. In the flexible g priors under model averaging, the optimal (Bayes) estimate of μ under squared error loss is the PM calculated using Equation 19.
Since g is involved in Bayes factors, model probabilities, and PMs and predictions, the choice of prior on g is vital for accurate computations of these quantities. In this study, we used flexible priors based on fully Bayesian approaches proposed by Zellner & Siow, 1980 (or Zellner-Siow's Cauchy prior) which is based on an inverse-gamma prior on g as well as an extension of the Strawderman (1971) prior to the regression context (i.e., the hyper-g prior). Zellner-Siow's Cauchy prior is a special case of mixtures of g priors. A brief description of these two priors is given below.

Binomial-Beta model prior.
To reflect prior uncertainty about model size, one should rather impose a prior that is less tight around the expected model size (Zeugner & Feldkircher, 2015). Therefore, Ley and Steel (2009) proposed a hyperprior on θ effectively drawing it from a Beta distribution. They adopted a combination of a "noninformative" improper prior on the common intercept and scale, a so-called g prior (Zellner, 1986) on the regression coefficients, leading to the prior density of p α; which makes θ random rather than fixing it. In this formula, α is an intercept, Z represents a design matrix (all possible combinations of the models), K represents a set of possible regressors (here the number of rainfall-runoff models) in Z, M j is the model with the 0 ≤ k j ≤ k regressors grouped in Z j , β j contains the relevant regression coefficients and σ is a scale parameter. 2. Custom prior inclusion probabilities. In this prior, for each model size k, there are K over k models (K choose k), of which K-1 over k-1 models (K-1 choose k-1) contain the user preferred variable i (e.g., Zeugner & Feldkircher, 2015). This information can be integrated into a more general model prior "creator" function that can be achieved using . The advantages of choosing this prior is that one could add much more general options to the model and define the proportion of weights (or probability) that each model contributes to the BMA posterior probability. We refer the readers to Zeugner and Feldkircher (2015) for more information. 3. Zellner-Siow prior. Zellner and Siow (1980) introduced multivariate Cauchy priors. If the two models under comparison are nested, the Zellner-Siow strategy is to place a flat prior on common coefficients and a Cauchy prior on the remaining parameters (Liang et al., 2009). The Zellner-Siow prior can be represented as a mixture of g priors with an inverse-gamma prior on g, computed as with 10.1029/2019MS001924

Journal of Advances in Modeling Earth Systems
This prior computes a one-dimensional integral over g using standard numerical integration techniques or using a Laplace approximation (see Liang et al., 2009).

4.
Hyper-g prior. The posterior distribution corresponding to the hyper-g prior can be derived using Equation 22.
where 2 F 1 (a, b; c; z) is the Gaussian hypergeometric function explained by Abramowitz and Stegun (1970). Liang et al. (2009) advocated that the normalizing constant in the prior on g is a special case of the 2 F 1 function with z = 0, which is referred to as the hyper-g prior. This normalizing constant in the posterior g that leads to the null-based Bayes factor formulated in Equation 23.
The PM of g under M γ can be calculated as The shrinkage factor of under each model can be estimated using Equation 25.
represents the posterior distribution of the shrinkage factor or goodness of fit that behaves similar to information criteria. Readers are referred to Tierney and Kadane (1986) and Liang et al. (2009) for further information on the hyper-g prior.
The practical application of BMA prior selection and posterior calculation were performed using two MCMC samplers. Specifically, this study used Birth-Death (BD) and Rev.jump MCMC samplers to summarize the conditional pdf. BD is the most common sampler in BMA, which wanders through the model space by adding or dropping regressors (i.e., rainfall-runoff models) from the model simulation. In this algorithm, a potential covariate (from the set of K potential covariates) is randomly chosen; if at the ith step in the algorithm, the current model is denoted model M i , then M j as the candidate model will have the same set of covariates as M i except for the chosen variable (Zeugner & Feldkircher, 2015).
Rev.jump MCMC was proposed by Madigan and York (1995) that draws a candidate model using BD algorithm with 50% probability. M j randomly drops one covariate with respect to M i and (randomly) adds one chosen from the potential covariates that were not included in the model M i (Zeugner & Feldkircher, 2015). For more details on the sampler readers see Zeugner and Feldkircher (2015).

Rainfall-Runoff Models
This study coupled BMA with a range of conceptional to semidistributed hydrological models and simulated daily average streamflow data. HYdrological MODel (HYMOD; Boyle, 2001), a modified version of soil conservation service curve number model (SCS-CN;Soil Conservation Service, 1956), and Hydrologic Engineering Center-Hydrologic Modeling System (HEC-HMS; USACE, 2000) models were used as lumped/conceptual hydrological models while Soil and Water Assessment Tool (SWAT; Arnold et al., 1993) was employed as a semidistributed model. The simulation of all four rainfall-runoff models was conducted using daily average streamflow data from 2003-2005 (continuous simulation), excluding the 3-year spinning-up period (i.e., [2000][2001][2002]. All models were validated using daily streamflow records of 2006-2007, the calibration simulation (2003)(2004)(2005) was used for the BMA modeling. We calibrated hydrological models using different starting points. Each simulation period was shifted by 1 year, such that subsequent periods have 2 years of data in common. Overall five different calibration periods were considered, and for each data set, parameter sensitivity was evaluated. Model sensitivity did not vary significantly during those subsequent periods suggesting that 3 years of daily streamflow data contains enough information about the estimation of parameters in hydrological models for this particular watershed system, and therefore, no significant variation in parameter estimates between calibration data sets is anticipated. The four hydrological models are briefly described below.

HYMOD
HYMOD is a parsimonious daily step hydrological model with typical conceptual hydrological components, based on the theory of runoff yield under excess infiltration (Moore, 1985). HYMOD computes the rainfall-runoff processes using five parameters including the maximum storage capacity in the catchment C max (L), the degree of spatial variability of soil moisture capacity within the catchment b exp , the factor distributing the flow between the two series of reservoirs Alpha, and the residence times of the linear slow and quick flow reservoirs, R s (days) and R q (days; see Table 1). HYMOD uses potential evapotranspiration (PET) if enough water is available; otherwise, (actual) evapotranspiration (ET) is calculated based on the available water storage. The Alpha parameter divides the surface runoff into quick flow and slow flow, which are routed through three identical quick flow tanks (or surface flow; Q1, Q2, and Q3) and a parallel slow flow tank (groundwater), respectively. The resident time in the quick (Kq (day)) and the slow (Ks (day)) tanks are then used to compute the flow rates in the routing system. HYMOD calculates the evaporation based on water storage concept in the watershed. If the available water in storage is greater than the potential evaporation, the real evaporation is equal to the potential evaporation, otherwise all available water evaporates (e.g., Boyle, 2001).

HEC-HMS
HEC-HMS, devueloped by the United States Army Corps of Engineers (USACE, 2000), is a standard and widely used model to simulate the complete hydrological processes of a watershed system. This model includes the procedures for both continuous modeling (long-term daily rainfall-runoff simulation based on the soil moisture accounting; SMA) and single-event based hydrological modeling (SCS-CN; USDA, 1986). SMA as a lumped bucket-type model is employed in this study. It represents a subbasin with well-linked storage layers/buckets accounting for canopy interception, infiltration, surface depression storage, ET, as well as soil water and groundwater percolation. Given precipitation and potential ET, the SMA model computes basin surface runoff, groundwater flow, losses due to ET, and deep percolation over the entire basin. Potential ET is calculated using the Priestly-Taylor (P-T) method (Priestly & Taylor, 1972). The parameters associated with the SMA approach used in this study are provided in Table 2.

Journal of Advances in Modeling Earth Systems
SAMADI ET AL.

SWAT
SWAT is a watershed modeling program developed by the United States Department of Agriculture (USDA)-Agricultural Research Service to simulate hydrological and water quality at various scales (Arnold et al., 1998). It was developed to simulate streamflow, sediment, and agricultural chemical yields in large complex watersheds with varying soils, land use, and management conditions (Neitsch et al., 2004). SWAT integrates various spatial environmental data such as soil, land cover, climate, and topographic features Zhang et al., 2017).
SWAT subdivides the watershed system into subwatersheds and hydrologic response units (HRUs) connected by a stream network. The HRUs vary in terms of land cover, climate, forest-covered area, cultivation, and hydrologic behavior, and therefore provide an opportunity to test the SWAT procedure under different conditions. In this study, the output of SWAT modeling is based on P-T ET method (see Samadi, 2016), the Muskingum method (Schroeter & Epp, 1988), and the improved one-parameter depletion coefficient for adjusting the CN based on plant ET (see Samadi & Meadows, 2017). For PET calculation, the P-T method is preferable compared to other models such as Hargreaves and Penman-Monteith due to wet and humid surfaces of the coastal plain drainage system, as stated in Lu et al. (2005) and Samadi (2016). SWAT parameters and their ranges are given in Table 3.

SCS-CN
In this study, the modified SCS-CN code was developed in a MATLAB environment. This model simulates temporal and spatial variations of various processes involved in the runoff generation mechanism by incorporating storage concepts to represent the catchment response over time. The differences between the original and the modified model are that the former model is based on an infiltration excess model assuming that the surface runoff generates from the entire catchment, whereas the latter model assumes that certain dynamic contributing areas vary with storm intensity (e.g., Geetha et al., 2008). Further, it considers three different stores of moisture: interception store, soil moisture store, and groundwater store. Modified SCS-CN simulates flow using 13 different parameters by accounting for the antecedent moisture effect and temporal variations of the curve number (see Geetha et al., 2008), while in this study, a slight second modification is achieved in the original SCS-CN methodology and included a pan coefficient (PANC) parameter to calculate evaporation. A description of the modified SCS-CN parameters and their absolute ranges are given in Table 4.

Parameter Sensitivity and Uncertainty Algorithm
This study used the GSA method with three primary components proposed by Spear and Hornberger (1980). GSA code was linked to the outcome of sampling method (here DREAM (zs); Laloy & Vrugt, 2012) as a post processing step to carry out parameter sensitivity analysis. This method uses a parameter set after DREAM (zs) reaches a convergence. Next, the parameter set is classified into behavioral and nonbehavioral solutions using a cut-off threshold (e.g., Nash-Sutcliffe) to distinguish between behavioral (>0.55) and nonbehavioral

Journal of Advances in Modeling Earth Systems
(<0.55) solutions. The behavioral parameter sets are then divided into 10 equally sized groups based on sorted Nash-Sutcliffe efficiency (NSE; Nash & Sutcliffe, 1970) value as recommended by Wagener and Kollat (2007). The modeling procedure was preceded by plotting the cumulative distribution function (CDF) of the parameters within each group (10 CDF curves, in total). The sensitivity of the parameters was determined by looking at the spread among the produced CDF curves. The Kolmogorov-Smirnov (K-S) test (Kottegoda & Rosso, 1997) was then used to calculate differences among the CDF curves. A high K-S value (a value close to 1) indicates higher parameter sensitivity while a low K-S ensures low sensitivity.
The four hydrological models explained above contain parameters η h that were calibrated using DREAM (zs) with a generalized likelihood (GL) function (Schoups & Vrugt, 2010). This research used a GL function that is especially developed for nontraditional residual distributions with correlated, heteroscedastic, and non-Gaussian errors (Schoups & Vrugt, 2010). GL skillfully described the heteroscedastic and autocorrelated error model. This approach yielded a tight predictive uncertainty band that was far less sensitive to the particular time period used for calibration. DREAM (zs) used the b R-statistic (Gelman & Rubin, 1992) to determine convergence to the stationary posterior distribution. Readers are referred to Vrugt et al. (2009), Schoups and Vrugt (2010), and Pourreza-Bilondi et al. (2017) for more discussion on DREAM (zs).
Several indices were used to quantify the goodness of sensitivity analysis as well as calibration performance for BMA: the P factor, which is the percentage of data bracketed by a 95% prediction uncertainty band (95PPU; maximum value is 100%), and R factor (or d factor), which is the average width of the uncertainty band divided by the standard deviation of the corresponding measured variable (minimum value is zero; Abbaspour et al., 2004). Theoretically, the value for the P factor ranges between 0% and 100%, while the R factor ranges between 0 and infinity (see Pourreza-Bilondi et al., 2017, for further information). Based on the requirement of the geometric structure of the prediction bounds, two different indices for assessing the average asymmetry degree of the prediction bounds with respect to the observed hydrograph are proposed. The first index is defined as S (Equations 26-28): The second index for assessing the average asymmetry degree of the prediction bounds with respect to the observed hydrograph is defined as T: where Q i , q u i , q l i and b i , respectively represent observed discharge, upper and lower limits of predictive bound and actual bandwidth (Xiong et al., 2009). Small values of T and S are desirable for a perfect simulation. In addition, KGE (Kling and Gupta efficiency; Gupta et al., 2009), root-mean-square error (RMSE), and NSE were used to calculate calibration performance.

Study Area and Data
The methodologies and procedures explained above were applied to the upper Waccamaw watershed (UW2), a coastal plain drainage system located in North Carolina (NC; Figure 1). Due to tidal effects in 10.1029/2019MS001924

Journal of Advances in Modeling Earth Systems
the downstream region, this study simulated daily average streamflow of the upstream part of the watershed. The study area is 1881.67 km 2 and characterized by a low elevation (5.5-46.3 m), low erosive energy streams, varying soil wetting fronts, dense vegetation, broad and flat alluvial floodplains and complex groundwater structure dominated by a shallow aquifer (e.g., Samadi et al., 2018). The climate of the region is specified as humid subtropical and precipitation in the summer is dominated by convection storms and in the winter by frontal boundaries Samadi & Meadows, 2017). In the study region, spring and fall are wetter, which receives the highest amount of rainfall in the summer due to convective storms. The average annual precipitation in the study area ranges between 46.3 inches (1,176 mm) and around 80 inches (2,032 mm) occurring throughout the year. Average temperature ranges near 90°F (32°C) with overnight lows near 70°F (21°C). Winter temperatures are much less uniform. During the calibration period (2003)(2004)(2005), precipitation ranged from an extremely wet year in 2003 (320 mm above average rainfall) to an average range in 2005 (~1,350 mm). Soils are typically sandy loam and sandy clay loammoderately drained in the uplands and poorly drained in the floodplain.
Meteorological data (daily precipitation, maximum and minimum air temperatures, wind speed, humidity, and solar radiation), and spatial data inputs (digital elevation model [DEM], land use, and soil coverage) were acquired from the National Climatic Data Center (NCDC) and U.S. Geological Survey (USGS) portals on September 25, 2015. Model calibration was carried out using data from USGS gauging station at Freeland. We used three climate stations namely Longwood, Loris, and Whiteville to incorporate the rainfall and temperature fields in the hydrological models. Data from climate stations were interpolated using Thiessen polygon and inverse distance weighting (IDW) methods to capture the spatial continuity of rainfall fields in the study area.
A linkage between multiple hydrological models used in this study and BMA with the various priors is illustrated in Figure 2. Briefly, the simulation of the UW2 is achieved using conceptual to semidistributed rainfall-runoff models. Due to the computational burden of the BMA code, this study used two steps to combine the results of hydrological models. First, the outputs of models were coupled with GSA and DREAM (zs) algorithm to assess the sensitivity and parameter uncertainty of the models. Next, the outcomes of

Journal of Advances in Modeling Earth Systems
hydrological models were fed to the BMA to combine the simulation with the most important parts of the posterior model distribution of hydrological model and improve daily streamflow prediction. A weighted posterior probability simulation was then outperformed using BMA with the most appropriate fixed and flexible priors for the watershed under study.

Parameter Sensitivity Analysis
In this study, SWAT and HEC-HMS models were forced by 18 parameters, while HYMOD and modified SCS-CN were calibrated using five and 14 parameters, respectively. Parameter sensitivity analysis proved that the DREAM (zs) sampling algorithm is onerous and time consuming, especially for the SWAT and HEC-HMS analyses. This is partly related to the fact that some parameters contributed more weight to simulation, making the MCMC results being more sensitive to changes of these parameters. For instance, shallow aquifer and soil properties are two key parameters that are proved to have a significant effect on the coastal plain simulation (e.g., Samadi & Meadows, 2017).
Marginal distributions derived from DREAM (zs) were computed for all four hydrological models (results not shown here). Optimal parameter values, and the upper and lower bounds that define the prior uncertainty ranges of HYMOD parameters as well as sensitivity rank are given in Table 1. In HYMOD, residence time slow flow reservoir (days) that computes the groundwater parameter showed more spiked and narrower range and depicted most sensitive parameters (see Table 1) whereas parameters related to quick flow Figure 2. A conceptual model explaining the linkage between four rainfall-runoff models and BMA. R denotes rainfall, PET is potential evapotranspiration, ET represents actual evapotranspiration, T is transpiration, REVAP means groundwater "revap" coefficient, EPCO is plant uptake compensation factor, and ESCO denotes soil evaporation compensation factor.

Journal of Advances in Modeling Earth Systems
(e.g., Rq) were ranked as the most insensitive ones. Note that the marginal pdfs of the HYMOD parameters appeared approximately Gaussian except for those of b exp and R s , which significantly departed from a normal distribution and tended to concentrate most of the probability mass at their upper and lower bounds (sharp response; results not shown here). Indeed, the mathematical approaches used to estimate b exp and R s appear to be insufficient to capture appropriate ranges for the spatial variability of the soil moisture storage as well as the residence time of groundwater flow.
The HEC-HMS model results indicate that the range of maximum infiltration was considerably narrowed by DREAM (zs), while other parameters such as groundwater, maximum percolation, and threshold to peak flow occupied a relatively large region interior to the uniform prior distribution. These ranges were not further narrowed by DREAM (zs) even when the number of iterations were increased. Further, the ranges of the GW2 routing coefficient, storage capacity and GW1 storage capacity narrowed significantly by the MCMC algorithm indicating the sensitivity associated with these parameters in the model (see Table 2). HEC-HMS showed low sensitivity when the tension capacity was lower than storage capacity; thus, a multiplier value was used for the storage capacity. What should be noted here is that the degree of sensitivity of infiltration rate, storage and groundwater routing showed more fluctuations that could be decreased if groundwater/shallow aquifer properties were formulated appropriately by the HEC-HMS. This is because there is a strong interaction among shallow aquifer properties, overland flow and channel routing parameters in the coastal plain drainage system as recently shown by Samadi and Meadows (2017). This interaction may affect the runoff amount and runoff travel time in downstream. Thus, the groundwater routing and storage properties of the HEC-HMS model should be discreetly limited in this current version.
In terms of SWAT parameter sensitivity, CN2, SOL_AWC, CH_N2 and SOL_BD appear to be the most sensitive parameters (see Table 3) while other parameters with K-S values less than 0.75 seem to be almost identical. Most parameter values covered their completely predefined ranges except SOL_AWC and SOL_K. The ranges of these parameters are narrower than their predefined scopes suggesting high sensitivity of SWAT groundwater parameters. Further, some of SWAT parameters occupied a relatively small region interior to the uniform prior distributions of the individual dimensions (results not shown). This reveals that the observed streamflow data contains sufficient information to estimate these parameters. This is further confirmed with relatively small parameter ranges as shown in Table 3. Table 4 shows modified SCS-CN parameter ranges, their optimal values, the KS values and parameters sensitivity rank. As expected, subsoil drainage coefficient and coefficient of transpiration from soil zone are ranked as the most sensitive parameters. Like SWAT, modified SCS-CN showed more sensitivity to curve number. Further, this model showed more sensitivity to hydrogeological properties (e.g., subsoil permeability) and surface energy balance. It appears C 1 depends on the available soil water in the topsoil layer. PNAC is another sensitive parameter that indicates inadequate atmospheric evaporation capability in the model.

DREAM (zs) Predictive Uncertainty
Once the posterior distribution of the model parameters is determined, the predictive uncertainty of daily discharge simulation for each hydrological model was computed by propagating the different samples of the posterior distribution at the 95% confidence interval. Table 5 compares the results of DREAM (zs) simulation for different hydrological models. Based on several performance criteria, SWAT well calibrated streamflow records compared to the rest of models. This would imply that the GL function, used as an objective function in DREAM (zs) to remove heteroscedasticity and autocorrelation, was particularly successful for a semidistributed hydrological model calibration. The modified SCS-CN model and HEC-HMS are posed in the next ranks. The HYMOD model showed the weakest performance, which may be because of its simplicity and lumped concept.
Figures 3-6 show diagnostic plots of the residuals (i.e., difference between observed and simulated streamflow) derived from the GL likelihood function. In most cases, the heteroscedasticity has been removed by the GL function and the residuals are not sensitive to the magnitude of streamflow. Figures 3b, 4b, 5b, and 6b clearly show that the double exponential (heavy-tailed) distribution used by the error model is suitable and consistent with the pdf of the residuals. This would reveal that MCMC Bayesian algorithm applied in this study is promising for relaxing the residual error assumption of the upper Waccamaw watershed.

Journal of Advances in Modeling Earth Systems
Temporal dependence of the residuals is illustrated in Figures 3c, 4c, 5c, and 6c, which indicate that the residuals still exhibit substantial dependence at higher lag autocorrelations. Although autocorrelation of the residuals has been substantially reduced by DREAM (zs), the challenge of omitting temporal correlation in the coastal plain simulation is understandable. DREAM (zs) was particularly successful in eliminating temporal dependencies in SWAT and modified SCS-CN simulation. HEC-HMS and modified SCS-CN predictive uncertainty seems to be similar, although the latter one provided a wider band. HYMOD, on the other hand, converged to a very different posterior pdf revealing less capability of this model in simulating flow

Journal of Advances in Modeling Earth Systems
dynamics across a coastal plain environmental system. However, DREAM (zs) was somewhat successful in removing correlation in the HYMOD model.
Overall, the total predictive uncertainty bound seems reasonably accurate for SWAT and HEC-HMS (see Figures 7-10). These two models mimicked the observed data quite well, reproducing most minor and major flow events. However, closer inspection of the calibration results indicates that DREAM (zs) was not quite successful in calibrating the modified SCS-CN model. Further, HYMOD results revealed that, although error assumptions are fulfilled, the predictive uncertainty band is too large and meaningless. This indicates that this model is less capable of simulating coastal rainfall-runoff processes. On the other hand, the inconsistency in simulation may also arise due to uncertainty in input data when repeated rainfall events occur in the coastal plain (e.g., Samadi et al., 2018).
The assumption of a double exponential prior distribution, as explained above, relatively well approximated the conditional pdf of SWAT, HEC-HMS and the modified SCS-CN model. Daily streamflow data of the coastal drainage system are naturally bounded by fat-to highly skewed-tail distributions which is difficult to justify according to the outcomes of one specific prior distribution. This inspired the authors to employ BMA with a hierarchy of prior formulations to marginally combine the simulation on the most important parts of the posterior probability distribution of each potential hydrological model explained in next section.

BMA Computation
The outcomes of four hydrological models were coupled with BMA using a variety of prior structures. This study included fixed to flexible model priors and examined the possibility of subjective inference using prior inclusion probabilities according to the user's belief. Two efficient MCMC samplers (i.e., BD and Rev.jump MCMC) were then applied to create a weighted posterior probability distribution from BMA exercise that sorted through the model space. Below is a discussion of the results of BMA simulation for different prior structures.

Fixed Priors
A uniform prior with the unit information prior on Zellner's g was first applied to compute the expected prior parameter size. The variable names and corresponding statistics are shown in Table 6. PM displays the coefficients averaged over all models. Posterior inclusion probabilities (PIP) or shrinkage factor represents the sum of the PMP for all models wherein a covariate (a modeled streamflow) was included. In other words, PIP displays the importance of the various hydrological models in explaining daily streamflow data. Interestingly, the majority of the posterior mass virtually rests on models that include HEC-HMS, SWAT, and modified SCS-CN. In contrast, HYMOD has the lowest PIP (i.e., 4%), indicating that HYMOD does not seem to matter much, implying an inability of a simple lumped model in simulating coastal observational records.

Journal of Advances in Modeling Earth Systems
Accordingly, the rank of models (see Table 6) that are sorted by PIP reveals that HEC-HMS and SWAT were successful in simultaneously describing observations in terms of magnitude. The posterior expected model size (i.e., the average number of included regressors, which ranges from 1 to 5) using the uniform prior was 3.04, which is the sum of PIPs. The uniform model prior puts more mass on intermediate model sizes (e.g., k/2 = 3 with 31% probability); therefore, it is important to consider other popular priors that allow more freedom in choosing model size and other factors. It is interesting to note that the uniform prior better calibrates the shrinkage factor (or PIP) by avoiding overfitting. As the shrinkage factor increases, the model tends to yield a tighter posterior.
Next, we used a Binomial model prior to place a common and fixed inclusion probability on each hydrological model. The model included a prior model size of 2.5 (i.e., k/2 = 2.5) which tilted the prior distribution toward smaller model sizes. Simulating BMA using the Binomial prior with model size of 2.5 yielded a posterior model size of 3.07, near that of the uniform prior. As a result, the PIP of each rainfall-runoff model is the same as uniform prior. Although the PIP of HYMOD improved slightly, the rank of the hydrological models is similar to when using the uniform prior (see Table 6).
In addition, BMA calibration using g-RIC, BRIC, EBL, and GEB priors were computed and the results presented in Table 6. Among these priors, EBL provided the most discouraging results, indicating marginal likelihood evaluation using a Laplace approximation was less appropriate for integrating the posteriors. Overall, these results indicate that the BMA model concentrated posterior mass tightly on HEC-HMS and somewhat SWAT whereas modified SCS-CN model resulted less model mass concentration. HYMOD was the least capable model, and thus BMA avoided including this model for summarizing the posterior mass.

Flexible Priors
In view of the pervasive impact of HEC-HMS and SWAT models on posterior model distribution, one might wonder whether their importance would still remain robust to a greatly unfair prior. In perusing this, we specified our own model priors (custom prior inclusion) and offered a possibility of subjective inference by setting prior inclusion probabilities according to the user belief. We defined a low prior inclusion probability (i.e., θ = 0.01) for the HEC-HMS and SWAT simulations while setting a standard prior inclusion probability of θ = 0.5 for the rest of the hydrological models. Results indicated that HEC-HMS and SWAT still retain their shrinkage factors near 100%. Posterior model size, on the other hand, increased to 3.2 while HYMOD obtained a larger PIP (0.16). The modified SCS-CN model also retained its PIP of 99%. The coefficients averaged over all models improved slightly for HYMOD where it remained similar for the rest of models.
The Binomial-Beta prior was the second prior that was implemented for the BMA calibration. Since the fixed common θ in the Binomial prior (explained above) centers the mass of its distribution near the prior model size, this may increase prior uncertainty about model size. Thus, Ley and Steel (2009) proposed to include a hyperprior on the inclusion probability of θ that effectively draws from a Beta distribution. In pursuing this, the Binomial-Beta prior put a completely flat prior on an expected model size of each hydrological model. Consequently, the posterior model size resulted in a value of 3.32. In terms of coefficient and posterior model size distribution, the results are similar to Binomial model priors, although the latter approach involved a tighter model prior. The use of the Binomial-Beta framework supports the results found in aforementioned

Journal of Advances in Modeling Earth Systems
prior calibrations that 100% of all posterior mass virtually rests on models that include HEC-HMS and SWAT.
BMA was also calibrated using Zellner-Siow and hyper-g priors. Zellner-Siow implicitly proposed an inverted Gamma distribution as a prior and put a lot of weight on regions of g with high marginal likelihood. Thus, this type of prior does particularly well in simulation. Unlike Zellner-Siow, the hyper-g prior seems to weaken the marginal likelihood estimates. The hyper-g assigns large prior mass to the model which provides a fat-tailed posterior that seems difficult to justify for the coastal hydrological dynamics. Roughly 66% and 59% of the Zellner-Siow and hyper-g posterior mass, respectively, virtually rests on the model that include HEC-HMS (see Table 7).
Overall, marginal information of different priors on g resulted a tighter pdf with a heavier tail. As a consequence, the pattern of Bayes factors among the models with different priors on g is different. In this respect, custom prior inclusion and uniform priors showed better results compared to the rest of priors. We therefore combined the results of these two priors and computed the posterior probability distribution explained in next section. Results suggest that HEC-HMS, SWAT and somewhat modified SCS-CN put more weight (mass) on the posterior probability distribution. However, complete prediction of the coastal plain streamflow records, especially during low and high flow events, can be quite challenging, as pointed out recently by Joseph and Guillaume (2013), Samadi and Meadows (2017), and Samadi et al. ( , 2018.

Ensemble of Rainfall-Runoff Models 4.3.1. Combining Sampling Chains
Based on the results that were achieved using different priors, it is now straightforward to enumerate all potential model combinations to obtain an ensemble posterior pdf. This study used BD and Rev.jump MCMC samplers that are different in the way they propose candidate models. In this respect, MCMC samplers combined results on the most important parts of the posterior model distribution of each potential hydrological model and approximated it as closely as possible. We retained model convergence and posterior statistics for HEC-HMS, SWAT and modified SCS-CN that presented the highest PMPs. The quality of an MCMC approximation to the actual posterior distribution depends on the number of draws the MCMC sampler runs for. Thus, our simulation started out with 20,000 simulation runs and slightly increased the numbers of iterations until no difference was found between PM and PIP.
A combined uniform and custom inclusion priors with the BD MCMC sampler were primarily employed with 20,000 iterations and the numbers of runs increased slightly. The results suggest 200,000 iterations, after a substantial number of burn-in iterations (i.e., 80,000), provided a good PMP and proper marginal likelihood. The same procedure for the numbers of iterations and burn-ins were also used to ensemble three hydrological models using the Rev.jump MCMC sampler. PMP correlation using the BD algorithm indicated a good degree of convergence among analytical likelihoods. However, the more complicated the distribution of marginal likelihood, the more difficulties the sampler meets before converging to a good approximation of PMPs (Zeugner & Feldkircher, 2015). In addition, the sum of PMPs using the Rev.jump MCMC algorithm indicated that in total, modified SCS-CN accounted for less than 20% of posterior model mass while the rest of models accounted for greater than 80%.
Further, the PMP correlation of the combined model using the BD algorithm seems to be more promising because it estimated more than 90% of posterior probability. The coefficient of the combined model was also better than the individual ones when we compared the simulation results. The results are presented in Figures 11-13. All statistics are based on a combined iteration chain using uniform and custom inclusion priors as the best model likelihoods. BMA simulation revealed a tight posterior shrinkage that is concentrated around 0.91 when a combined prior was used. It is interesting to note that an overfitting shrinkage factor that was too large leads to tight PMP concentrations and skewed distribution. In contrast, an excessively small shrinkage factor does not reflect the data signals, and typically leads to intermediate PIPs for models. Therefore, a combined prior provided an average shrinkage factor or Bayesian goodness-of-fit indicator.
The sharper the posterior density, the more information the sample contains about models and the less important prior choice of g becomes. In other words, if the posterior density is sharper relative to the prior density, this means BMA more strongly relies on the data. As illustrated in Figures 11-13, HEC-HMS put more significant prior mass around small values of the shrinkage factor than SWAT and modified SCS-CN. Specifically, 95% of the posterior coefficient mass of HEC-HMS seems to be concentrated between 0.56 to 0.73, while this range is 0.25 to 0.37 and 0.11 to 0.24 respectively, for SWAT and modified SCS-CN.  This would imply that HEC-HMS and somewhat SWAT are more robust hydrological models in simulating the upper Waccamaw watershed daily streamflow since both included in virtually all models mass.
BMA analysis was preceded by predicting streamflow during 2003-2005. Three hundred sixty-five data sets (2003 data) were used to train the BMA model. Table 8 and Figures 14 and 15 present BMA training and testing results. As illustrated, the sharpness of the predictive uncertainty ranges has substantially increased, which led to reducing the average spread of the 95PPU. However, the overall best results are obtained by combining fixed and flexible priors in the BMA model. This approach not only exhibited the best predictive performance, but also adequately narrowed the predictive uncertainty band and captured the expected percentage of observations. We thus conclude that combined flexible (time varying conditional pdf) and fixed priors have significant practical advantages for the coastal plain hydrological simulation when the aim is to skillfully simulate the data records that contain frequent high and low flow events. We presented the error associated with the BMA modeling as well as the spread and percent of observations (see Table 8) that refer to the average width of the 95% uncertainty ranges, and the percentage of discharge observations contained in this interval, respectively.
These results also support model averaging rather than the selection of any individual model based on performance criterion, the approach that was performed by DREAM (zs). In summary, the use of BMA for combining the results of conceptual to semidistributed hydrological simulations provided considerable predictive improvement compared to relying on one individual model, although care should be taken in the methodology adopted in the BMA modeling and the prior.

Conclusions
This paper presents a two-step procedure that includes model calibration for a range of conceptual to semidistributed hydrological models using DREAM (zs) algorithm, followed by ensemble prediction of streamflow using BMA with various prior structures. DREAM (zs) was first employed to address the parameter uncertainty of individual model simulation. A new sensitivity model based on the Monte Carlo sampler GSA was implemented to post process DREAM (zs). GSA determined the sensitivity of the parameters by calculating the spread among the CDF curves of model parameters. The sensitive parameters were then used in DREAM (zs) to simulate daily streamflow  records. DREAM (zs) used the GL error assumption to diminish parameter and predictive uncertainty of each hydrological model.
GSA analysis revealed that soil and groundwater properties are the most sensitive parameters for streamflow simulation. In coastal plain watersheds, high and repeated storm events increase water levels to the saturated condition, resulting in frequently ponded regions, especially in the riparian wetland area, that might be the reason why both soil and groundwater parameters were both particularly sensitive. In addition, this result may reflect the deficiency of the numerical solver employed in each hydrological model to capture coastal flow dynamics, a topic which is out of the scope of this study. DREAM (zs) calibration showed proficiency in calibrating surface and subsurface flow interaction and summarizing the posterior parameter distribution. Furthermore, GL was useful in removing heteroscedasticity and skewness; thus, it may prove to be a good choice for the coastal plain hydrological simulation, as correlation and the absence of homoscedasticity (error variances do not depend on the magnitude of streamflow) in standardized residual errors have been reported in several other studies (e.g., Samadi et al., , 2018. The results of BMA with fixed and flexible priors used in this study led to tight and sharp posterior distributions. BMA provides a coherent mechanism and promising results for calculating a weighted posterior probability compared to the DREAM (zs) calibration for each individual model. BMA analysis revealed the influence that a poorly chosen prior exerts on the (weighted) posterior when applied to noisy data (coastal  plain streamflow records), which leads to the relative merits of BMA being less pronounced and its predictive power deteriorating.
Marginalizing out the PMPs with the fixed priors induced a much flatter pdf while flexible priors led to a much tighter pdf. BMA calibration using two different MCMC algorithms revealed that compared to the Rev.jump approach, the Birth-Death algorithm is preferable when the aim is to deal with mixture g priors with unknown numbers of states. The advantage of using the BD algorithm is that it is a continuous time MCMC sampler, which is used to construct ergodic chains for models with a varying parameter space. This makes prediction reasonably straightforward for every g drawn in the sampler as predictions are simply mixed over values of g in the sampler. The prediction of the Rev.jump MCMC algorithm as a general extension of the Metropolis-Hastings algorithm seems less straightforward and the quality of its approximation is not that reliable.
Focusing on the simulation of various prior structures, this study can make recommendations for hydrological simulation. Assuming that modelers/users want BMA priors to be consistent, to avoid the information paradox and to perform well in a wide variety of situations, combined flexible and fixed priors are consistent and perform well on continuous daily streamflow data. The global empirical Bayes prior is hard to recommend for situations where ranges of hydrological models are used. This is because it assumes one common g for all models in the BMA formula making the marginal likelihood estimation unreliable. Binomial, BRIC and Binomial-Beta priors do not fare well in prediction in terms of the PM. The Zellner-Siow prior performs relatively well, except for the HYMOD model, where it gets very little support from the data. However, the results of BMA depend on how well each individual hydrological model simulates the streamflow. Nevertheless, we feel the Zellner-Siow prior deserves a place in the toolbox of BMA for hydrologists, especially if the aim is to simulate flash flood events (the number of data records is relatively small).
In our view, the two priors that stand out by not having displayed any truly poor behavior in our experiment are the custom prior inclusion and uniform prior. When dealing with complexity and lack of fit in simulation, we view the custom prior inclusion and uniform prior as a reasonable default prior and starting place.
Marginalizing out the PMPs with these priors on θ and g induced a much better result and much tighter/sharper posterior median of g. This made BMA analysis more strongly supported by the data. Thus, both priors provided an interesting compromise and would be our general recommendation to hydrological communities.
Putting a prior on both θ and g makes the analysis naturally adaptive and avoids the information paradox (Liang et al., 2008) compare to analyses with fixed g. We now allow the data to inform us (the concept of "letting the data decide the function" expressed by Philips et al., 2018) on variable inclusion probabilities and the appropriate region for g. To a much greater extent, this will reduce the lack of fit for each given hydrological model. Therefore, we feel the BMA model used herein with the recommended priors on θ and g can be considered a safe "automatic" choice for hydrological calibration, although, additional efforts are needed to build up series of benchmarks in simulation. For example, the BMA model with multicollinear regressors that includes alternative priors such as different sets of Heredity priors introduced by Chipman (1996) and the Tessellation prior presented by George (2010) can be further applied. These priors propose some other promising approaches to dilution prior construction based on predictive and empirical Bayes ideas that make the analysis naturally adaptive. We feel the models used herein with the fixed and flexible priors on both θ and g can be considered a safe benchmark prior structure choice for use in BMA that can be applied in a variety of hydrological/geoscience settings. The BMA, GSA and modified SCS-CN codes used herein can be obtained from the corresponding author upon request.