Uncertainty in hydrological signatures for gauged and ungauged catchments

Reliable information about hydrological behavior is needed for water‐resource management and scientific investigations. Hydrological signatures quantify catchment behavior as index values, and can be predicted for ungauged catchments using a regionalization procedure. The prediction reliability is affected by data uncertainties for the gauged catchments used in prediction and by uncertainties in the regionalization procedure. We quantified signature uncertainty stemming from discharge data uncertainty for 43 UK catchments and propagated these uncertainties in signature regionalization, while accounting for regionalization uncertainty with a weighted‐pooling‐group approach. Discharge uncertainty was estimated using Monte Carlo sampling of multiple feasible rating curves. For each sampled rating curve, a discharge time series was calculated and used in deriving the gauged signature uncertainty distribution. We found that the gauged uncertainty varied with signature type, local measurement conditions and catchment behavior, with the highest uncertainties (median relative uncertainty ±30–40% across all catchments) for signatures measuring high‐ and low‐flow magnitude and dynamics. Our regionalization method allowed assessing the role and relative magnitudes of the gauged and regionalized uncertainty sources in shaping the signature uncertainty distributions predicted for catchments treated as ungauged. We found that (1) if the gauged uncertainties were neglected there was a clear risk of overconditioning the regionalization inference, e.g., by attributing catchment differences resulting from gauged uncertainty to differences in catchment behavior, and (2) uncertainty in the regionalization results was lower for signatures measuring flow distribution (e.g., mean flow) than flow dynamics (e.g., autocorrelation), and for average flows (and then high flows) compared to low flows.


Introduction
Reliable information about the hydrological behavior of both gauged and ungauged catchments is needed for a wide range of scientific and water-resources management purposes. Such information is often summarized as an index value, or hydrological signature, calculated from data time series in gauged catchments. Examples include the base-flow index and flow descriptors such as flow percentiles or statistics of high and low flow behavior. Signatures have a long history of use in eco-hydrology [Olden and Poff, 2003] and hydrology for, e.g., change detection [Archer and Newson, 2002; Juston et al., 2014; Sawicz et al., 2014], model evaluation [Hrachowitz et al., 2014; Montanari and Toth, 2007; Refsgaard and Knudsen, 1996; Sugawara, 1979], model-structure diagnostics [Coxon et al., 2013; Gupta et al., 2008; Jothityangkoon et al., 2001; McMillan et al., 2011], and catchment classification [Sawicz et al., 2011]. In particular, they have been widely used for transferring information about hydrological behavior from gauged to ungauged catchments [Bloeschl et al., 2013]. In this paper we consider regionalization procedures that transfer flow signature information directly from gauged to ungauged catchments (i.e., without using a hydrological model), and the uncertainties that affect such procedures.
Uncertainty in signature values for gauged catchments stems from the observed data from which they are calculated and, for more complex signatures such as recession parameters, from the choice of calculation method. Such uncertainties reduce the information gained from the signature values for hydrological analyses and thus also the reliability of those analyses, e.g., when used to study differences in catchment behavior [Wagener and Montanari, 2011]. It is therefore important to understand the magnitude and characteristics of signature uncertainty under different conditions. The main sources of data uncertainty are the measurements' accuracy, precision and representativeness for the studied variable [McMillan et al., 2012], but also data postprocessing [Hamilton and Moore, 2012]. Studies have shown that rating curve uncertainty propagates to uncertainty in flood-frequency estimates and in signatures used for model calibration or change detection for individual catchments [Blazkova and Beven, 2009; Juston et al., 2014; Kuczera, 1996; Westerberg et al., 2011]. Westerberg and McMillan [2015] found that rainfall-runoff signature uncertainty as a result of observational uncertainty for two catchments in the UK and New Zealand was on the order of ±10–40% and varied between the signatures. However, there have been no large-scale studies investigating these uncertainties across multiple catchments and multiple signature types.
When regionalizing signature values to an ungauged catchment, the uncertainty in the regionalized signatures has several sources: (1) uncertainty in the signatures calculated for the gauged catchments, and (2) uncertainty stemming from the regionalization procedure, where the latter may include (3) uncertainty in catchment characteristics data (e.g., geomorphological descriptors like elevation and soils) used to describe catchment similarity for the transfer of information. There is a long tradition of regionalization of flow signatures to ungauged basins [Bloeschl et al., 2013], with common approaches including those based on regression against catchment descriptors [e.g., Almeida et al., 2012; Bardossy, 2007; Castiglioni et al., 2010; Nathan and McMahon, 1992], donor catchments or pooling groups [Burn, 1990; Holmes et al., 2002; Kjeldsen et al., 2014], and, more recently, geostatistics [Pugliese et al., 2014; Viglione et al., 2013]. Yadav et al. [2007] investigated uncertainty in the regionalization procedure when using regression to regionalize signature values for 30 UK catchments. They found that regionalization performance varied widely between signatures and that the most useful independent catchment characteristics were climate, topography and geology characteristics. Hannaford et al. [2013] evaluated the utility of the hydrometric network in England and Wales for regionalization based on catchment descriptors and gauging station data quality. They found that for low (high) flows 22% (45%) of the catchments with the highest regionalization potential have low utility because of inadequate hydrometric data quality. Westerberg et al. [2014] investigated signature uncertainty resulting from observed data and the regionalization procedure for 36 Central American catchments. They regionalized flow duration curves (FDCs) using a typical estimate of discharge uncertainty for the region, and found that the majority of the predicted uncertainty bounds encompassed the observed values. However, discharge uncertainty is known to vary with flow range depending on site-specific measurement conditions [Le Coz et al., 2014; McMillan and Westerberg, 2015; Morlot et al., 2014]. Further investigation using data sets that allow site-specific uncertainty estimates is therefore needed to gain a better understanding of uncertainty for a wider range of flow signatures than in earlier studies, both for gauged catchments and in regionalization for ungauged catchments.
The aim of this study was to investigate uncertainty in flow signatures for gauged and ungauged catchments. In particular, the objectives were to: (1) regionalize signatures while accounting for discharge uncertainty in the gauged donor catchments as well as uncertainty in the regionalization procedure, and (2) investigate the role and relative magnitude of the different uncertainty sources in defining the predicted signature uncertainties. Uncertainty in the catchment characteristics data used to describe catchment similarity was not included in this study.

Data
The study was performed using a comprehensive data set consisting of 15 min water level time series (1 October 2003 to 30 September 2008) in combination with rating curve and gauging data for 43 catchments in England and Wales ranging in size from 8 to 1480 km² (Figure 1). The catchment characteristics that we used for the regionalization procedure were: mean annual precipitation for the study period, BFIHOST (a baseflow index from the UK Flood Estimation Handbook derived from soil characteristics in the hydrology of soil types (HOST) classification [Institute of Hydrology, 1999]), and the 90th percentile of the catchment elevation distribution (see also section 3.3). The BFIHOST and elevation indices were obtained from the UK Hydrometric Register [Marsh and Hannaford, 2008]. We selected catchments that fulfilled a number of criteria to ensure reliable discharge data uncertainty estimates and that the regionalization performance was not affected by anthropogenic factors or nested catchment locations. The criteria were: (1) the station was active, (2) it was classified as having a natural flow regime in the UK Hydrometric Register, (3) the station was classified as having a Service License Agreement in the register (part of a strategic monitoring network subject to more rigorous quality control), (4) data suitable for reliable discharge uncertainty analyses were available (e.g., sufficient information about out-of-bank rating, no stilling-well problems, etc.), (5) it was a gauged weir and/or velocity-area station, and (6) it was not upstream/downstream of another catchment in the data set. Only water level data not classified as suspect by the data provider were used; other uncertainties in the water level data series were not considered. Five stations had 5-12% missing water level data, 32 stations had less than 2% missing data, and the rest had between 2% and 5%. The chosen catchments spanned a wide range of hydrological behavior (Figure 1), representing most of the range of catchments classified as having a natural flow regime in England and Wales. In central and Eastern England, there are few catchments with natural flow regimes, and only one of these fulfilled all the selection criteria for this study.

10.1002/2015WR017635

Choice and Calculation of Signatures
We used nine signatures describing the flow distribution and six signatures that describe flow dynamics (Table 1). These signatures describe the magnitude and dynamics of high and low flows together with average flow conditions and overall flow variability. They represent signature information of interest for a wide range of applications and illustrate the effect of data uncertainty across a range of flow behavior. Most of these have been used in previous studies aiming to regionalize specific signature information separately [Castellarin et al., 2004; Holmes et al., 2002], when using multiple signatures to constrain rainfall-runoff models [Euser et al., 2013; Hrachowitz et al., 2014; Yadav et al., 2007; Yilmaz et al., 2008], and in eco-hydrological studies [Clausen and Biggs, 2000; Jowett and Duncan, 1990].

Figure 1. The 43 catchments in England and Wales used in the study and their range of characteristics (top, where Elevation 90 p. is the 90th percentile of the catchment elevation distribution) and signature values (bottom, where the parallel coordinate plot shows the range of signature values with the minimum (maximum) values below (above) the plot). The signatures were calculated with the optimal rating curves from the uncertainty analysis (signature definitions in Table 1).

Discharge and Signature Uncertainty for Gauged Catchments
The uncertainty in the signatures for the gauged catchments was estimated as follows. Discharge uncertainty was estimated for each gauging station. The uncertainty in the rating curve parameters was estimated from the stage-discharge gauging data, obtaining 40,000 equally likely rating curves for each station (see below). Each rating curve was used to calculate a discharge time series from the water level data. The resulting set of 40,000 discharge time series were aggregated to hourly time scale, converted to specific discharge (i.e., per unit area, expressed in mm/h) and each used to calculate a signature value, thus obtaining an uncertainty distribution for each signature. A detailed description of this method is given by Westerberg and McMillan [2015].
The uncertainty in the rating curve parameters was estimated in a Markov Chain Monte Carlo (MCMC) analysis with the Voting Point likelihood method. This method accounts for random and epistemic uncertainty sources. Random (aleatory) gauging measurement uncertainty was estimated as logistic distribution functions for UK conditions for a set of stations where uncertainty due to temporal rating curve variability was assumed negligible [Coxon et al., 2015]. Epistemic uncertainty related to the rating curve approximation of the true stage-discharge relationship is important to consider at many gauging stations. This approximation may be uncertain outside the gauged range where the curve is extrapolated, or where processes such as erosion, seasonal weed growth, hysteresis, and variable backwater induce nonstationarity in the stage-discharge relationship. Such epistemic uncertainties imply that not all gauging points are compatible with the same "true" rating curve, and the voting point likelihood was therefore defined in terms of the fraction of time that a candidate rating curve could have been representative of the channel conditions (see definition and equations in McMillan and ). The analysis was constrained to the functional form of the official rating curves used in the study period, which was a power-law function often containing multiple segments. The priors for the rating curve parameters were set to standardized ranges defined relative to the official parameter values. These were adjusted for some stations where visual inspection showed that these ranges did not fully capture the gauging data uncertainty. Only the gaugings that were representative for the rating curve were used, where the gauging data were pooled based on deviations between historical rating curves [Coxon et al., 2015].
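The propagation from sampled rating curves to a signature uncertainty distribution can be sketched as follows. This is a minimal illustration and not the study's implementation: it assumes a single-segment power-law curve Q = a(h - h0)^b, uses random parameter draws as a stand-in for MCMC posterior samples, and uses a synthetic stage series. The function names, parameter values and catchment area are all hypothetical.

```python
import numpy as np

def power_law_discharge(stage_m, a, b, h0):
    """Single-segment power-law rating curve Q = a * (h - h0)^b, in m3/s."""
    depth = np.clip(stage_m - h0, 0.0, None)  # zero flow below the cease-to-flow level h0
    return a * depth ** b

def signature_distribution(stage_15min, curves, area_km2, signature):
    """Evaluate `signature` once per sampled rating curve (a, b, h0)."""
    values = np.empty(len(curves))
    for k, (a, b, h0) in enumerate(curves):
        q = power_law_discharge(stage_15min, a, b, h0)       # m3/s at 15 min steps
        q_hourly = q.reshape(-1, 4).mean(axis=1)             # aggregate to hourly means
        q_mm_h = q_hourly * 3600.0 / (area_km2 * 1e6) * 1e3  # specific discharge, mm/h
        values[k] = signature(q_mm_h)
    return values

rng = np.random.default_rng(0)

# Synthetic 30-day stage record at 15 min resolution (hypothetical data).
n_steps = 4 * 24 * 30
stage = 0.5 + 0.3 * np.abs(np.sin(np.linspace(0.0, 20.0, n_steps)))

# Stand-in for posterior rating-curve samples (assumption: independent normals).
curves = np.column_stack([
    rng.normal(10.0, 0.5, 500),   # a
    rng.normal(1.8, 0.05, 500),   # b
    rng.normal(0.1, 0.01, 500),   # h0
])

q_mean_samples = signature_distribution(stage, curves, 100.0, np.mean)
lo, med, hi = np.percentile(q_mean_samples, [5, 50, 95])
```

Because each sampled curve is applied to the whole stage record, errors in the curve shift the entire discharge series in a correlated way, which is why the resulting signature distribution does not collapse when averaging over time.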

Regionalization of Signatures to Ungauged Catchments With Uncertainty
Traditional regionalization methods, e.g., regression of signature values using catchment descriptors as independent variables, allow estimation of predictive uncertainty, but involve strong assumptions on the signature error distribution (e.g., normality). Those assumptions may not be compatible with uncertainties estimated in site-specific analyses of gauged catchments. Instead, our regionalization method is based on hydrologic similarity, allowing for different empirical distributions for the gauged signature uncertainties, drawing on previous studies by Holmes et al. [2002] and Westerberg et al. [2014]. Hydrologic similarity was expressed as the Euclidean distance $d_{it}$ in the standardized catchment descriptor space between each gauged catchment, i, and the target catchment, t:

$$d_{it} = \sqrt{\sum_{m=1}^{M} \left(X_{mi} - X_{mt}\right)^{2}} \qquad (1)$$

where $X_{mi}$ is the standardized catchment descriptor m (normalized by the standard deviation for all stations) for catchment i, and M the number of descriptors. The catchment descriptors were chosen in a correlation analysis aimed at finding descriptors that were highly correlated with the signature values, but weakly correlated with each other (following Yadav et al. [2007] and Westerberg et al. [2014]). The chosen descriptors were mean annual precipitation in the study period, the 90th percentile catchment elevation, the BFIHOST base-flow index and catchment area (section 2 and Figure 1). These describe climate, topography and geology, similar to descriptors previously found to explain most of the observed daily streamflow behavior for UK catchments [Yadav et al., 2007]. We found it useful to also include catchment area [Kjeldsen et al., 2014; McIntyre et al., 2005], which can, for example, explain differences in flow peak attenuation that are more pronounced in hourly data.
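The similarity distance described above, a Euclidean distance between descriptors scaled by their standard deviation across stations, can be sketched as below. The descriptor values are made up for illustration, the function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption since the text does not specify it.

```python
import numpy as np

def descriptor_distances(X, target_index):
    """Distance from every catchment to the target in standardized descriptor space.

    X has shape (n_catchments, M); each column is one catchment descriptor.
    """
    Xs = X / X.std(axis=0, ddof=1)      # normalize by the std across all stations
    diff = Xs - Xs[target_index]
    return np.sqrt((diff ** 2).sum(axis=1))

def pooling_group(distances, target_index, n_donors):
    """Indices of the n_donors most similar catchments, excluding the target."""
    order = np.argsort(distances)
    order = order[order != target_index]
    return order[:n_donors]

# Hypothetical descriptors: precipitation (mm/yr), 90th percentile elevation (m),
# BFIHOST (-), area (km2).
X = np.array([
    [1200.0, 350.0, 0.45, 120.0],
    [ 900.0, 150.0, 0.60, 300.0],
    [1500.0, 500.0, 0.35,  60.0],
    [ 700.0,  90.0, 0.80, 800.0],
])
d = descriptor_distances(X, target_index=0)
donors = pooling_group(d, target_index=0, n_donors=2)
```

Scaling by the standard deviation keeps a descriptor with large numeric values, such as precipitation in mm, from dominating the distance.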
A dynamic region of influence (i.e., a pooling group) was defined as the N catchments that were most similar to the target catchment [Burn, 1990]. The signature PDFs for each target catchment were then estimated by sampling from each pooling catchment's signature PDF, with the number of samples proportional to the catchment's similarity weight, $w_{it}$ (Figure 2): [...]

This method of deriving the predicted signature distribution makes the assumption that the weights, based on the similarity quantified in the catchment descriptor space through (1), can be interpreted as the probability that each donor catchment is the "nearest neighbor" of the target catchment in terms of the streamflow signatures. That is, that the catchment descriptor similarity is proportional to the probability that the gauged catchment signatures are representative of the ungauged catchment signatures. Our method is equivalent to making multiple draws from the gauged catchment signature uncertainty distributions, and each time selecting the "probable nearest neighbor" according to those probabilities. Regionalization uncertainty was thus represented by the weighted pooling group variability (equivalent to a nearest-neighbor method with uncertainty). This reflects the expectation that hydrologic similarity is always approximate [Olden et al., 2012; Oudin et al., 2010; Reichl et al., 2009; Wagener et al., 2007] and that there is no ideal donor catchment [Beven, 2000]. The method provides a direct visualization of the gauged versus the regionalization uncertainty components. We also present "gauging-uncertainty-only" regionalized results, where only the gauged uncertainty component was considered, to illustrate the effect of excluding uncertainty stemming from the regionalization procedure. These were obtained by randomly sampling values from each observed signature distribution in the N catchment pooling group and calculating the predicted value as a linear weighted combination [e.g., Holmes et al., 2002].

Figure 2. Schematic illustration of the signature regionalization procedure. The signature distribution for the target ungauged catchment is estimated by sampling from the signature distributions for the most hydrologically similar gauged catchments proportional to their hydrological similarity weight ($w_i$). The hydrologic similarity was calculated as a function of the catchment descriptors BFIHOST, the 90th percentile of the elevation distribution, catchment area and mean annual precipitation.

Water Resources Research
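The difference between the weighted pooling-group sampling and the gauging-uncertainty-only variant can be illustrated with a short sketch. The weight definition is not recoverable from the text here, so normalized inverse distance is used purely as an assumption; the donor distributions, distances and function names are likewise hypothetical.

```python
import numpy as np

def pooled_signature_samples(donor_samples, weights, n_draws, rng):
    """Weighted pooling: pick a donor with probability w, then draw one of its samples."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    donor_idx = rng.choice(len(donor_samples), size=n_draws, p=w)
    return np.array([rng.choice(donor_samples[i]) for i in donor_idx])

def gauging_only_samples(donor_samples, weights, n_draws, rng):
    """Gauging-uncertainty-only: weighted linear combination of one draw per donor."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    draws = np.column_stack([rng.choice(s, size=n_draws) for s in donor_samples])
    return draws @ w

rng = np.random.default_rng(1)

# Three hypothetical donors with different signature values and equal gauged spread.
donor_samples = [rng.normal(mu, 0.1, 2000) for mu in (1.0, 2.0, 4.0)]
distances = np.array([0.5, 1.0, 2.0])
weights = 1.0 / distances  # ASSUMPTION: inverse-distance weights, not taken from the paper

pooled = pooled_signature_samples(donor_samples, weights, 5000, rng)
gauging_only = gauging_only_samples(donor_samples, weights, 5000, rng)
```

In this sketch the pooled samples form a multimodal mixture that spans the donors' values, whereas the weighted average concentrates around a single value and becomes narrower as donors are added.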

Uncertainty in Rating Curves and Flow Percentiles for Gauged Catchments
The rating curve estimation method succeeded in capturing the uncertainty in the gauging data over the diverse range of gauging data and rating curve characteristics for the 43 stations. Results from five stations with different flow magnitudes, rating curve sections and gauging data error characteristics are shown in Figure 3 as examples that illustrate the range of rating curve uncertainties across the data set. The first station (Figure 3a) is affected by seasonal weed growth at low flows and extrapolation uncertainty due to lack of high-flow gaugings. The second station (Figure 3b) has low uncertainty; it is well gauged for almost the whole flow range with little gauging data scatter. The third station (Figure 3c) was the "worst case" in the data set. It has a considerable gauging scatter for the whole flow range as a result of tidal influence and heavy weed growth, and the gauging authority has therefore downgraded it to a level-only station with a high-flow rating. The fourth station (Figure 3d) is a velocity-area station with large scatter at low flows and a gauged out-of-bank section with high flows reaching large magnitudes. In contrast, the fifth station (Figure 3e) has very low flow magnitudes, and a rating curve that appears to underestimate discharge for the whole flow range. Figure 4 shows how rating curve uncertainties propagate to uncertainty in hourly flow percentiles, with the results for the five stations in Figure 3 highlighted in blue. The relative uncertainties were calculated with respect to the optimal rating curve from the MCMC estimation. The signature uncertainties result from the combination of the rating curve uncertainty distribution and the variability of the flow time series during the period. An extrapolated and uncertain high-flow part of a rating curve will therefore have more or less impact on the signature uncertainties depending on how often the highest gauged discharge was exceeded. This is illustrated in Figure 5 for a station which had one of the largest rating curve extrapolations in the data set (about 2 m). However, the large extrapolation mainly affects one 5 h peak-flow event that is more than twice the size of the other annual maximum flows. This demonstrates that the time series variability needs to be considered when determining the effect of the rating curve uncertainty on the flow signatures. The largest relative uncertainties occurred at high and low flows (Figure 4) where uncertainty in the rating curve is normally highest (section 3.2), similar to results from a large Norwegian rating curve study [Petersen-Overleir et al., 2009].

Uncertainty in Signature Values for Gauged Catchments
The relative uncertainty ranges were calculated as the half-width of the 5-95 percentile range for each signature and catchment. Then the 5, 50 and 95 percentiles of these ranges were calculated to illustrate typical values for catchments with low, medium and high uncertainty respectively (Table 2). The uncertainties were in general lowest (median values ±10–15%) for Q BFI measuring average groundwater contribution and Q MEAN and Q 5 measuring average flow behavior, and highest (median values ±30–40%) for signatures measuring low and high flow magnitude (e.g., Q 99 and Q 0.01) and dynamics (Q LV and Q HV). This shows that careful consideration of data uncertainty is needed when using the latter type of signatures, e.g., in flood and drought studies. Signatures measuring flow variability across the time series (Q AC and Q CV) had uncertainty magnitudes similar to or somewhat higher than the average flow signatures, where Q AC had generally higher uncertainty in the above-median range, but the lowest uncertainty in the below-median range. [...] high flows are less extreme, thus facilitating gauging of the whole flow range. In addition, the slow water-level dynamics would control the values of the Q AC and Q BFI signatures, rather than rating curve uncertainty.
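The half-width measure described above can be computed along these lines. This is a sketch under one stated assumption: the half-width is expressed relative to a reference value such as the signature from the optimal rating curve, which the text states for the flow percentiles but not explicitly for every signature. The sample values and names are hypothetical.

```python
import numpy as np

def relative_half_width(samples, reference):
    """Half-width of the 5-95 percentile range, in percent of a reference value."""
    p5, p95 = np.percentile(samples, [5, 95])
    return 100.0 * 0.5 * (p95 - p5) / abs(reference)

rng = np.random.default_rng(2)

# Hypothetical signature samples for 43 catchments with differing uncertainty.
spreads = rng.uniform(0.05, 0.4, 43)
half_widths = np.array([
    relative_half_width(rng.normal(1.0, s, 4000), reference=1.0) for s in spreads
])

# Typical values for catchments with low, medium and high uncertainty.
low, medium, high = np.percentile(half_widths, [5, 50, 95])
```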
The uncertainty magnitudes were correlated within the low/high flow signature groups, but poorly correlated between these two groups, which is expected given the different factors affecting the uncertainty of high and low flows (Table 2). The uncertainty magnitude for S FDC was poorly correlated (correlation coefficient <0.5) with all the other signatures. We found that the uncertainty in this signature was high for most stations that had a breakpoint in the rating curve in the 33-66 percentile flow range.
The relative signature uncertainties were in some cases considerable even for signatures measuring average behavior, such as Q MEAN (±30–40% for three catchments in Figure 6b). This demonstrates the systematic nature of discharge uncertainties caused by rating curve uncertainty; they do not cancel out when averaging over longer time periods. These large uncertainties in Q MEAN occurred for the stations that had a large gauging scatter over the whole flow range (e.g., 70005 in Figure 3c that was affected by tides and heavy weed growth), and when there was large uncertainty in the flow range that contributed the largest flow volumes (Table 2). This was also illustrated by a high correlation between the magnitude of the Q MEAN uncertainty range and that for Q 50 and Q 5.
For cross-catchment comparisons, signature uncertainty is important when the absolute uncertainties overlap and therefore impede the interpretation of differences between catchments. This situation was in particular found for S FDC (Figure 6c, note the contrast to Q MEAN in Figure 6a), but would also be of concern for several other signatures (e.g., Q LV and Q AC , supporting information Figures S1-S5). We found no clear links between the type of gauging station and the uncertainty magnitudes; these are likely more controlled by local gauging conditions such as weed growth and backwater causing variability in the stage-discharge relation.

Uncertainty in Signature Predictions for Ungauged Catchments
To evaluate the signature predictions for ungauged catchments we compared the uncertainty distributions for the gauged signatures with those for the regionalized predictions. The comparisons were made 1) across all signatures and catchments using summary measures, 2) by analyzing differences between signatures and catchments, and 3) by analyzing the contributions of the different uncertainty sources to the predicted PDFs. We first evaluated the performance of the regionalization method in a leave-one-out cross-validation by comparing the overlap between the 5-95 percentile ranges of the gauged and regionalized signature distributions. This comparison accounts for the uncertainty in the observed as well as the regionalized values and was made in terms of reliability (the overlapping range as a percentage of the gauged range) and precision (the overlap as a percentage of the regionalized range, see definitions in Figure 7c). Ideally predictions should have both high reliability and precision, but high reliability is more important than high precision. These measures have previously been used by, e.g., Westerberg et al. [2014] and are similar to those used by Yadav et al. [2007]. We also compared the distributions using two standard metrics, the Kolmogorov-Smirnov distance (K-S, the maximum absolute distance between the CDFs) and the earth mover's distance (EM, the sum of the absolute distances between the CDFs).
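The four evaluation measures described above can be sketched as follows. The reliability and precision definitions follow the wording in the text (overlap of the 5-95 percentile ranges as a percentage of the gauged and regionalized range, respectively). The distance between CDFs is approximated on a regular grid, and scaling the summed CDF differences by the grid spacing is an assumption of this sketch. All names are hypothetical.

```python
import numpy as np

def overlap_measures(gauged, regionalized):
    """Reliability and precision (percent) from overlapping 5-95 percentile ranges."""
    g_lo, g_hi = np.percentile(gauged, [5, 95])
    r_lo, r_hi = np.percentile(regionalized, [5, 95])
    overlap = max(0.0, min(g_hi, r_hi) - max(g_lo, r_lo))
    reliability = 100.0 * overlap / (g_hi - g_lo)  # share of the gauged range covered
    precision = 100.0 * overlap / (r_hi - r_lo)    # share of the regionalized range used
    return reliability, precision

def cdf_distances(a, b, n_grid=1000):
    """Kolmogorov-Smirnov and earth mover's distances between two empirical CDFs."""
    grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), n_grid)
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
    gap = np.abs(cdf_a - cdf_b)
    ks = gap.max()                        # maximum absolute CDF difference
    em = gap.sum() * (grid[1] - grid[0])  # summed CDF differences times grid spacing
    return ks, em

gauged = np.linspace(0.0, 1.0, 101)
regionalized = np.linspace(0.0, 2.0, 201)
reliability, precision = overlap_measures(gauged, regionalized)
ks, em = cdf_distances(gauged, regionalized)
```

In this example the regionalized range is twice as wide as the gauged range, so reliability is high while precision is below 50%, which is the trade-off discussed for increasing pooling group size.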
Reliability (across all signatures and catchments, Figure 7a) increased with the number of stations, while precision decreased (the regionalized distributions became wider as the pooling group size increased). The average reliability was high (>80%) even for a pooling group of 4-5 stations, and there was a marked increase in the reliability for poor performance stations (10th percentile) when the group size increased, with a smaller drop in overall precision. The K-S and the EM distance metrics showed similar results with a large initial decrease in the distances when the number of pooling catchments increased (Figure 7d). This was followed by a slight increase in the distances as more and more samples were taken from catchments with low similarity weights, often resulting in somewhat wider and flatter distributions. The number of stations in the pooling group was chosen to be 10 as a trade-off between increase in reliability and decrease in precision across all the signatures. Increasing the number of stations means that the pooling group is less homogeneous, thus allowing more reliable predictions for catchments with low hydrologic similarity. However, for catchments near the extremes of the signature distributions, this implies less precise predictions [Burn, 1990; Holmes et al., 2002].
With 10 pooling catchments the average reliability (precision) varied between 83 and 93% (24-48%) for the different signatures, while the 10th percentile of the overall reliability (measuring poor performance) was 54%. The pooling group size is similar to that used by Westerberg et al. [2014] who used 8 donor catchments in regionalization of FDCs with uncertainty, while Holmes et al. [2002] also used 10 donors in deterministic regionalization of Q 95 in the UK. Previous studies using catchment similarity as a basis for conceptual model [...]

Largest uncertainty where there was a large scatter for the whole flow range (e.g., Figure 3c), or when there was large uncertainty in the range of flows that contribute most of the total flow volume.

For comparison we investigated the reliability and precision of a gauging-uncertainty-only regionalization where regionalization uncertainty was not included (section 3.3). This resulted in predicted regionalized distributions that were generally narrower than the observed distributions and much less reliable (average reliability 18% for 10 pooling stations). The gauging-uncertainty-only predicted distributions became narrower as more pooling stations were included in the weighted average predictions, leading to a decrease in average reliability with increase in pooling group size (Figure 7a). This was also seen in the K-S and EM distances that increased with the number of pooling stations (Figure 7d).

Differences in Regionalization Performance Between Signatures
The average flow signatures had the highest reliability followed by the high flow signatures, whereas the low flow signatures had the worst performance (Figure 7b). In general, the results were reliable except for the most extreme signature values that were not captured well (Figures 7 and 8). This is expected as the regionalization predictions were constrained to the observed variability among the pooling catchments. In the best performance range, there were 7 (2) catchments where all 15 signatures had a reliability of 90% (100%), and 31 (27) catchments where 10 or more signatures had a reliability of 90% (100%). In the poor performance range, there was one catchment for which 7 signatures had a reliability of 0, and an additional 9 catchments where 1-3 signatures had a reliability of 0. The station that had the poorest results had the [...]

Both the magnitude of the gauged uncertainties and the explanatory strength of the regionalization method need to be considered when evaluating the results (Figure 8). The average signatures (Q MEAN and Q 5), which had both low gauged uncertainty and strong correlations with the catchment characteristics, had the highest number of stations (>84%) with high reliability (>95%). The opposite was seen for S FDC that had the lowest number of stations (49%) with high reliability. This was caused by high gauged uncertainty in relation to the signature range across the data set (Figures 6c and 8), in combination with poor correlation with the catchment characteristics. This poor correlation may partly be a result of the high gauged uncertainties, exemplifying how consideration of gauged uncertainty is important when interpreting the regionalization results.
When using the regionalized signature information, high reliability is important, but high precision is also desirable. Regionalized values with large uncertainty compared to the gauged range give little information about the regional signature variability (e.g., Q AC, Figure 8). The signatures measuring flow dynamics (Table 1, Q AC, Q HV, etc.) had less precise regionalized results than those measuring flow distribution (Q MEAN and the flow percentiles, Figure 8). Better results might be obtained for some flow dynamics signatures by tailoring the regionalization to each signature separately, e.g., by giving a higher weight to the BFIHOST characteristic in predicting Q BFI.

Contribution of the Different Uncertainty Sources
The regionalized distributions provide additional information about the success of the regionalization and the role of the different uncertainty sources (Figure 9). The color of the distributions illustrates the contributions from catchments with different hydrologic similarity to the target catchment. For example, where the whole distribution is light-blue (e.g., 25003 and 72015 in Figure 9), all the pooling group catchments have a low hydrologic similarity with the ungauged catchment, indicating that the regionalization is likely to be imprecise.
The shape of the distributions varied from unimodal (where gauged uncertainty dominates) to multimodal distributions (where regionalization uncertainty dominates). In general, the low flow signatures were the most unimodal (gauged uncertainty was high), and the average flow signatures were the most multimodal (gauged uncertainty was low). The widths of the distributions are also important. For example, if there is both a wide range of signature values and multiple separated peaks (e.g., Q 5 for 27051) this reflects a large variability within the pooling group and that the regionalization uncertainty dominates. This contrasts with other cases where the regionalized distribution is more compact and the gauged uncertainty dominates over the regionalization uncertainty (e.g., S FDC for 27084). Where stations have different levels of gauged uncertainty this gave regionalized distributions with multiple peaks of different width (e.g., S FDC for 60003 and Q 5 for 41022). These results clearly illustrate the risk of attributing differences between catchments that are a result of gauged uncertainty to differences in catchment behavior. In other cases, disregarding gauged uncertainty may lead to underestimation of signature variability between catchments. Neglecting the gauged uncertainties thus leads to overconditioning of the regionalization inference, i.e., the domain of possible predicted values is too constrained because the full range of possible data values is not taken into account.
A comparison with a gauging-uncertainty-only regionalized simulation (i.e., not including regionalization uncertainty, section 3.3) was also made (grey lines in Figure 9). These distributions were often narrower than the gauged uncertainties and often completely outside the gauged distributions, showing that the regionalization uncertainty needs to be considered to obtain reliable results. In addition, the optimal gauged and regionalized signature values (the optimal values from the MCMC rating curve estimation) are shown as black and grey dots respectively on the x-axes in the figures. These illustrate how overconditioned analyses that do not consider gauged uncertainty can be (e.g., Q_0.01 for 25003, Q_CV for 41022 and S_FDC for 27084).

Rating Curve Uncertainty
Our rating curve uncertainty estimation method captured the uncertainty for diverse gauging data sets with different epistemic errors (e.g., weed growth or high-flow extrapolation), and different multisection power-law rating curves. Site-specific hydraulic information might reduce the uncertainties, but would require detailed information and investigation [e.g., Le Coz et al., 2014], which is typically not available across large catchment data sets. We found it important to check the estimates for each station against available metadata, including information about nonideal conditions such as weed growth, backwater and out-of-bank flow ranges. The last was especially important to avoid unreliable extrapolation where there was insufficient information about the out-of-bank rating, and we excluded several such catchments. It is important to note that rating curve uncertainty may vary with time. This means that our estimates are not necessarily representative for other time periods as a result of station modifications, rating shifts, and/or different flow ranges (in particular out-of-bank flows). The rating curve estimations involved a considerable effort, suggesting that similar estimations for hundreds of stations are not easily achievable and that depth may need to be balanced with breadth for large-sample hydrology also in terms of data uncertainty estimation [Gupta et al., 2014].
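The general idea of propagating rating curve uncertainty to discharge can be sketched as follows. This is a simplified illustration, not the estimation method used in the study: it uses a single-section power law, Q = a(h - h0)^b, and draws the parameters independently from uniform ranges instead of an MCMC posterior. The parameter ranges, stage values and function names are all hypothetical.

```python
import random

def rating_curve(stage, a, b, h0):
    """Power-law rating Q = a * (h - h0)**b; zero flow at or below h0."""
    depth = stage - h0
    return a * depth ** b if depth > 0.0 else 0.0

def sample_discharge_ensemble(stages, n_curves, seed=0):
    """Return one discharge series per randomly sampled rating curve."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_curves):
        # Hypothetical parameter ranges (assumed, not from the study)
        a = rng.uniform(8.0, 12.0)
        b = rng.uniform(1.5, 2.0)
        h0 = rng.uniform(0.05, 0.15)
        ensemble.append([rating_curve(h, a, b, h0) for h in stages])
    return ensemble

stages = [0.2, 0.4, 0.8, 1.6, 0.6, 0.3]   # hypothetical water levels (m)
ensemble = sample_discharge_ensemble(stages, n_curves=500)

# Max/min discharge ratio across the sampled curves at each time step
spread = [
    max(q[t] for q in ensemble) / min(q[t] for q in ensemble)
    for t in range(len(stages))
]
for h, s in zip(stages, spread):
    print(f"stage {h:.1f} m: max/min discharge ratio {s:.2f}")
```

Each sampled curve yields a complete discharge time series, from which a signature value can then be computed. With these assumed ranges the relative spread is largest at the lowest stage, because the offset h0 matters most there; at real stations the pattern depends on the local conditions discussed above.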

Gauged Signature Uncertainty
The medium-level gauged signature uncertainty magnitudes (Table 2) we found were similar to the ±10–40% range found in two catchments by Westerberg and McMillan [2015]. However, we found a large range in uncertainty magnitudes across the 43 catchments for all signatures, including those measuring average conditions (e.g., ±10–30% range in typical low and high uncertainty values for Q_MEAN). This large variability, and the absence of clear links between station types and uncertainty magnitudes, illustrate the importance of site-specific factors in controlling uncertainty. We found factors linked to high uncertainty in particular signature types, such as uncertainty in breakpoint location for multisection rating curves affecting S_FDC. Our analyses could be extended in the future to include catchments with greater human impacts, and to include signatures requiring both rainfall and runoff data such as runoff ratio, which may have different uncertainty characteristics. Uncertainties in the water level time series may be important, but were not considered here other than removing suspect, flagged data and excluding stations with documented problems (e.g., blocked stilling-well intake pipes). In addition to data uncertainty, the data time step determines the information gained from signature values, as temporal averaging leads to loss of information about short-term response patterns. We used an hourly time scale as catchments in England and Wales are small, and we could see a clear loss of flow-peak information when averaging data to a daily time scale.
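The loss of flow-peak information under temporal averaging can be shown with a small worked example. The two-day hourly series below is hypothetical (a constant base flow with a single 6-hour event), and `daily_means` is an illustrative helper, not a function from the study.

```python
def daily_means(hourly):
    """Average consecutive, complete blocks of 24 hourly values."""
    return [sum(hourly[i:i + 24]) / 24.0 for i in range(0, len(hourly) - 23, 24)]

# Hypothetical two-day hourly discharge series (m3/s): base flow of 1.0
hourly = [1.0] * 48
# A short event on day 1, hours 10 to 15
for hour, q in zip(range(10, 16), [3.0, 8.0, 12.0, 7.0, 4.0, 2.0]):
    hourly[hour] = q

daily = daily_means(hourly)
peak_hourly = max(hourly)   # 12.0
peak_daily = max(daily)     # (18 * 1.0 + 36.0) / 24 = 2.25

print(f"hourly peak: {peak_hourly}, daily-mean peak: {peak_daily}")
```

In this constructed case the daily mean retains less than a fifth of the hourly peak, so any signature based on high-flow magnitude or flow dynamics would differ substantially between the two time steps.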

Signature Regionalization
The signatures quantifying low-flow magnitude and dynamics had the poorest regionalization. These signatures had high gauged uncertainty, the lowest correlations with the independent catchment descriptors, and they may be more susceptible to water level time series uncertainty (e.g., from moderate human impacts like sluices) than average and high flow signatures. As discussed by Olden et al. [2012], limitations in data and process understanding (e.g., surface water-groundwater connectivity) make it difficult to accurately characterize spatial variation in low flow magnitude and duration using catchment descriptors. However, including additional geologic data might improve prediction [Holmes et al., 2002]. Similar to the Austrian study by Viglione et al. [2013], the average flow signatures had the best regionalization results. In contrast, they found poorer results for high than low flows, but they did not account for gauged uncertainty that is often large at high flows.
Our regionalization method enabled visualization of how uncertainties in the gauged data and the regionalization contribute to the predicted uncertainty, thus providing valuable additional information about the reliability of each prediction. For example, if the highest weighted catchments have distinct signature peaks, this shows that regionalization uncertainty is large and that the hydrologic similarity definition might be improved. The influence of the gauged uncertainties in the individual pooling catchments is also made explicit. Catchments with high gauged uncertainty contribute less information to the regionalization, but do not compromise the reliability of the predictions when their uncertainties are accounted for.
Our results clearly show that regionalization uncertainty is important: the gauging-uncertainty-only regionalized distributions were much less reliable and often completely outside the gauged distributions. We represented the regionalization uncertainty by the weighted pooling group variability, which is a simple and straightforward method that enabled us to incorporate the site-specific gauged uncertainty distributions. However, the predicted signature distributions are conditional on both the hydrologic similarity weights and the assumption that the signature value in the target catchment can be selected according to the uncertain nearest neighbor method that we describe in section 3.3. This assumption could be explored in the future by developing methods to incorporate site-specific gauged uncertainty estimates in other regionalization methods, such as the Bayesian regression technique based on conditional probabilities, suggested by Muller et al. [1996]. Such methods could also enable prediction of "extreme" signature values outside the observed variability in the pooling data set (which here covered most of the variability among catchments with a natural flow regime in England and Wales, see section 2). Our method could be further developed by considering uncertainty in the definition of hydrologic similarity, both in the type of measure/weighting used and in the catchment characteristics data used to calculate it [Burn, 1990; Reichl et al., 2009]. The latter was found to be important in snow-dominated catchments [Arsenault and Brissette, 2014] and may be difficult to estimate, e.g., uncertainty in catchment area where groundwater and surface water catchments differ, but sensitivity analyses could be used for a first investigation of their impact.
For the characteristics we used, we expect elevation to have low uncertainty, uncertainty in catchment area to be important in flat and karstic areas, uncertainty in BFIHOST to be dependent on the underlying model, and uncertainty in mean annual precipitation to depend strongly on the number of rain gauges. The latter study found that uncertainty in mean annual precipitation was around ±10% in two catchments (50 and 135 km²) using 1 rain gauge, which would not have a large effect on the relative weightings given the range of values in our data set (710–2400 mm/y, Figure 1). The choice and/or weighting of the catchment characteristics could also be tailored to the different signatures, e.g., using the most correlated characteristics for each signature, or enabling dynamic regions where the number of catchments depends on within-region similarity as suggested by Holmes et al. [2002].

Implications of Signature Uncertainty
Uncertainties in discharge data and derived signature values affect analyses such as catchment classification, eco-hydrological analyses, change detection and model calibration [e.g., Juston et al., 2014; Kennard et al., 2010; McMillan et al., 2012]. Signatures have in particular been used to understand differences in catchment function, and to reduce predictive uncertainty in ungauged catchments [Wagener and Montanari, 2011]. Understanding whether differences in signature values are a result of data uncertainty or a difference in catchment behavior is fundamental to comparative analyses [Kennard et al., 2010], and when transferring information to ungauged catchments [Olden et al., 2012]. To understand the impact of data uncertainty the signature type needs to be considered, e.g., signatures describing the magnitude and dynamics of extreme flows are more susceptible to data uncertainty than those describing average behavior. Viglione et al. [2013] find that regionalization performance decreases with catchment area and discuss the generally poorer results found in arid regions [Bloeschl et al., 2013] as a result of greater nonlinearity in runoff processes and larger space-time variability. In addition to these factors, discharge uncertainties likely play an important role: in arid catchments there are few high flow events with rapid flow variability, which impedes reliable gauging of the high-flow rating curve. Similarly, greater discharge uncertainty can be expected in small catchments because greater flow variability and shorter rainfall-runoff lag times impede reliable gauging of the full flow range. Understanding the sources of data uncertainty, under what conditions they are active, and how they affect different types of signatures and analyses is therefore important for reliable estimation of predictive uncertainty.
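The question of whether a difference in signature values can be attributed to catchment behavior can be illustrated with a deliberately simple check. The sketch assumes symmetric relative uncertainty bounds around an optimal value, which is a simplification since the gauged distributions in the study varied in shape; the signature values and the helper names `interval` and `overlaps` are hypothetical.

```python
def interval(value, rel_unc):
    """Symmetric bounds around a signature value for a relative uncertainty."""
    return (value * (1.0 - rel_unc), value * (1.0 + rel_unc))

def overlaps(a, b):
    """True if two (lower, upper) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical signature values for two catchments, differing by 40%
value_a, value_b = 10.0, 14.0

# Relative uncertainty of about +/-12%, within the range reported for average-flow signatures
distinct_at_12 = not overlaps(interval(value_a, 0.12), interval(value_b, 0.12))

# Relative uncertainty of about +/-35%, within the range reported for extreme-flow signatures
distinct_at_35 = not overlaps(interval(value_a, 0.35), interval(value_b, 0.35))

print(distinct_at_12, distinct_at_35)
```

Under these assumptions the same 40% difference between two catchments is distinguishable at ±12% uncertainty but not at ±35%, so the signature type matters when deciding whether catchments behave differently.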
The gauged signature uncertainty distributions varied in size and shape between the stations and over the flow range in a site-specific way. This means that regression-based regionalization that assumes normally distributed errors [e.g., Yadav et al., 2007], or fuzzy methods that use general discharge uncertainty estimates [e.g., Westerberg et al., 2014], do not fully represent the nature of these errors. Using the regionalized signatures from this study for constraining model predictive uncertainty in ungauged basins (as in these previous studies) could therefore provide valuable insights. Valuable further information would also be gained by investigating the effects of rating curve uncertainty on signature analyses in other regions as they are determined by measurement practices in combination with natural conditions such as topography, catchment size, geology and climate (e.g., snow and ice conditions would introduce different uncertainties [Hamilton and Moore, 2012]).

Conclusions
This study has demonstrated how rating curve uncertainty propagates to uncertainty in hydrological signatures and their regionalization across a large set of catchments with diverse flow series characteristics and across multiple signature types. The gauged uncertainty varied with signature type, and for each station, local measurement conditions (e.g., weed growth, backwater, and station design) in combination with flow variability determined the uncertainty magnitudes. The catchments with the most dampened flow variability had the lowest signature uncertainties in our data set. The highest uncertainty magnitudes were found for signatures measuring high/low flow magnitude and dynamics (relative uncertainty ±30–40% as the median across all catchments). Signatures measuring average flow behavior had lower uncertainty (median relative uncertainty ±10–15%), but there was a large range in uncertainty magnitudes across the 43 catchments for all signatures. Our regionalization method allowed us to assess the role and relative magnitudes of the gauged and regionalized uncertainty sources in shaping the signature uncertainty distributions predicted for catchments treated as ungauged. We found that (1) if the gauged uncertainties were neglected there was a clear risk of overconditioning the regionalization inference, e.g., by attributing differences between catchments resulting from gauged uncertainty to differences in catchment behavior, and (2) the uncertainty in the regionalization results was lower for signatures measuring flow distribution than flow dynamics, as well as for average flows (and then high flows) compared to low flows.
Our results provide a strong demonstration of the need to investigate data uncertainties in analyses where signatures are used. Consideration of data uncertainty may often make these analyses more complex. But, as emphasized by Juston et al. [2013], moving beyond deterministic frameworks by recognizing these inherent data uncertainties increases our possibilities to draw robust conclusions about present and future hydrologic behavior, in gauged and ungauged catchments.