Regionalization of land-use impacts on streamflow using a network of paired catchments

Quantifying the impact of land use and cover (LUC) change on catchment hydrological response is essential for land-use planning and management. Yet hydrologists are often not able to present consistent and reliable evidence to support such decision-making. The issue tends to be twofold: a scarcity of relevant observations, and the difﬁculty of regionalizing any existing observations. This study explores the potential of a paired catchment monitoring network to provide statistically robust, regionalized predictions of LUC change impact in an environment of high hydrological variability. We test the importance of LUC variables to explain hydrological responses and to improve regionalized predictions using 24 catchments distributed along the Tropical Andes. For this, we calculate ﬁrst 50 physical catchment properties, and then select a subset based on correlation analysis. The reduced set is subsequently used to regionalize a selection of hydrological indices using multiple linear regression. Contrary to earlier studies, we ﬁnd that incorporating LUC variables in the regional model structures increases signiﬁcantly regression performance and predictive capacity for 66% of the indices. For the runoff ratio, baseﬂow index, and slope of the ﬂow duration curve, the mean absolute error reduces by 53% and the variance of the residuals by 79%, on average. We attribute the explanatory capacity of LUC in the regional model to the pairwise monitoring setup, which increases the contrast of the land-use signal in the data set


Introduction
Quantifying the impacts of land use and land cover change (LUCC) on the water cycle is still fraught with difficulties, and hydrologists are not able to present consistently reliable evidence to support management decisions. Although a large body of research on LUCC effects exists, in general there is still a paucity of evidence that unambiguously links them with hydrological nonstationarity in individual catchments [e.g., O'Connell et al., 2004[e.g., O'Connell et al., , 2007Bulygina et al., 2009;Bearup et al., 2014;Biederman et al., 2015;Livneh et al., 2015]. Given that much of the earth's surface is ungauged or poorly gauged [Fekete and V€ or€ osmarty, 2007;Wohl et al., 2012], the uncertainty about LUCC impacts is further exacerbated in sparsely gauged regions [Sivapalan, 2003;Visessri and McIntyre, 2016].
Evidence of LUCC impacts is commonly obtained by analyzing the dynamics of the hydrological response of an individual catchment, but this approach has several problems including interference of climate variability [Lørup et al., 1998], the gradual nature of changes that could take many years to reach equilibrium [Ashagrie et al., 2006], and therefore require long-term records before and after the change. Using pairwise catchment comparisons helps alleviating these problems. The ideal monitoring setup consists of a baseline period in which two catchments are under the same land-use, and an evaluation period in which one catchment's land-use is changed. In practice, this still requires long monitoring times and control over the landuse of a catchment, which is often not practical. An alternative to this setup, is to 'trade space for time ' [e.g., Buytaert andBeven, 2009, 2011;Oudin et al., 2010;Singh et al., 2011;Sivapalan et al., 2011], and to consider paired, collocated, catchments with different land-use types. While such a setup makes it much more difficult to attribute observed differences to land-use impacts, the catchments can be selected in such way that their size, shape, topography, soils, and climate are as similar as possible, such that land use and land cover Key Points: A paired catchment approach increases the detectability of land-use signals in regionalization Explicit inclusion of land use and cover in regional models increases significantly performance and predictive capacity It is possible to separate and regionalize land use and cover impacts from natural variability Supporting Information: Supporting Information S1 Table S1  Table S2  Table S3  Table S4 (LUC) are the major (known) source of variability [C elleri et al., 2010]. Although in practice catchments are unique [Beven, 2000] and therefore any differences observed between them can never be attributed for certain to differences in land-use, this approach is much more efficient and straightforward to implement. An additional advantage is that it provides more rapid results to feed into the decision-making process.
Direct approaches to assessing the impact of LUCC on the hydrological response include the comparison of flow statistics over time and space. For instance, hydrological regulation can be assessed by comparing the Flow Duration Curves (FDC) of catchments under different LUC regimes: a steep slope may indicate high flashiness of the hydrological response to input precipitation, whereas a more horizontal curve may represent a buffered behavior and larger storage capacity [Brown et al., 2005;Yadav et al., 2007]. However, this approach cannot explicitly deal with other differences between the catchments such as topography, and neither does it allow to correct for nonstationarity in boundary conditions [e.g., Beck et al., 2013a]. Although the use of rainfall-runoff models is an alternative that allows more explicit incorporation of catchment characteristics and climatic nonstationarity [Lørup et al., 1998;Schreider et al., 2002;Hundecha and B ardossy, 2004], such models have data requirements that are often problematic in data scarce environments [Wooldridge et al., 2001;Wagener and Wheater, 2006;Samaniego et al., 2010]. Alternatively, pooling of catchments may be a way to reduce the impact of uncontrolled variability such as topography in the statistical analysis, and to increase the statistical strength of the variable of interest (i.e., LUC).
Such pooling, commonly known as regionalization [Wagener and Wheater, 2006;Yadav et al., 2007], is most often applied in the context of prediction in ungauged basins [Sivapalan, 2003;Beck et al., 2013b]. The regionalization can be seen as an exercise in addressing nonstationarity (between donor and receptor catchments), which is conceptually similar to predicting the impact of LUCC (i.e., nonstationarity between the old and new land-use). Two regionalization approaches are typically used to account explicitly for differences between receptor and donor catchments [Bulygina et al., 2009[Bulygina et al., , 2011: (i) dependence of rainfallrunoff model parameters on catchment physical characteristics [e.g., Lamb and Kay, 2004;McIntyre et al., 2005;Parajka et al., 2005;Lee et al., 2006;Young, 2006], and (ii) dependence of hydrological indices (also referred to as streamflow signatures in hydrological literature) on catchment physical characteristics [e.g., Berger and Entekhabi, 2001, Shamir et al., 2005, B ardossy, 2007Yadav et al., 2007;Visessri and McIntyre, 2016].
The first approach uses knowledge on how rainfall-runoff model parameters vary spatially [e.g., Wagener and Wheater, 2006;Wagener, 2007;Beven, 2007;Buytaert and Beven, 2009]; however, parameter identifiability is evidently affected by model structural error and parameter interaction during calibration [Beven and Freer, 2001;McIntyre et al., 2005;Beven, 2006]. Recently, Samaniego et al. [2010] have found promising results to address the nontransferability of parameters across scales by introducing a multiscale parameter regionalization technique to consider the difference in sizes of catchments included in the pooling. Another problem related to lumped rainfall-runoff models is that parameters have generally no direct physical meaning, which makes it difficult to identify from the regionalization the sought links with a physical change [McIntyre et al., 2014]. This has led LUCC effects to be based on speculative changes in catchmentscale rainfall-runoff model parameters [Packman et al., 2004].
The second approach relies on relationships between catchment attributes and hydrological indices [e.g., Mazvimavi et al., 2005;Longobardi and Villani, 2008;van Dijk, 2010;Peña-Arancibia et al., 2010;Ahiablame et al., 2013;Beck et al., 2013b]. The major advantages of this regionalization procedure are that it: (i) is not specific to any rainfall-runoff model (thus not impacted by its model structural error) [Bulygina et al., 2012]; (ii) allows the inclusion of most commonly available hydrological information [Zhang et al., 2008]; (iii) does not establish relationships with conceptual rainfall-runoff model parameters (that are potentially affected by nondentifiability and equifinality during calibration) [Yadav et al., 2007]; (iv) estimates an explicit link between the catchment's physical properties as controls of its hydrological behavior [Yadav et al., 2007;Visessri and McIntyre, 2016]; and (v) provides an estimate of the error variance on which analyze uncertainty [Visessri and McIntyre, 2016]. As a result, such regional models have been widely adopted (e.g., the BFIHOST classification in the UK [Boorman et al., 1995] and the Curve Number in the US [USDA, 1986]). Additionally, some studies have investigated the calibration of rainfall-runoff model parameters by including hydrological index estimations in the objective functions [e.g., Zhang et al., 2008], and others have established formal probabilistic methods to integrate data from different sources and assess uncertainty in the modeling [Bulygina et al., 2009[Bulygina et al., , 2011[Bulygina et al., , 2012.

10.1002/2016WR018596
Although regionalization methods can be used to test the capacity of rainfall-runoff models to assess and predict the impacts of LUCC, in practice the identification of LUCC signals in the hydrological response is still problematic [e.g., Merz and Bloschl, 2009], especially for larger catchments. While previous studies have investigated the regionalization of hydrological indices, the conclusions about the importance of specific catchment attributes vary strongly [Beck et al., 2013b;McIntyre et al., 2014], and particularly on the importance of LUC variables as predictors in the regionalization [e.g., Visessri and McIntyre, 2016]. One reason for the difficulty of finding a statistically significant contribution of LUC variables, to explain catchment variability in a regionalization exercise, may be the heterogeneity within catchments used for regionalization. In consequence, in catchments where different land-uses and vegetation covers are mixed indistinctively, the particular effects of LUC and changes therein are more challenging to identify. Additionally, LUCC signals will often be difficult to separate from other spatially varying processes that affect the hydrological response, such as topography, soils, and climate.
Therefore, we hypothesize that the use of monitoring data from pairwise catchment setups may be a way to increase the contrast of LUCC signals in a regionalization exercise, and thus makes it increasingly likely to find a statistically significant relation between LUC variables and the hydrological response. In a regionalization context, a pairwise catchment setup has two main advantages: it increases the homogeneity of LUC within each catchment, and it compares catchments that are as similar as possible in all other physical characteristics (including climate). To test this hypothesis, the remainder of the paper is structured as follows: first we introduce a regionalization model that includes LUC as explanatory variables; this model is then applied to a network of 24 paired catchments in the tropical Andes defined according to the principles previously outlined; and the obtained are then discussed.

Selection of Hydrological Indices for LUCC Assessment
A wide range of indices exists to summarize the dynamics of the hydrological regime in catchments with different conditions. In principle, they attempt to represent relevant characteristics or alteration signals in catchment properties in means of hydrological information of streams [Hughes and James, 1989;Poff and Ward, 1989;Richards, 1989Richards, , 1990Poff, 1996;Richter et al., 1996Richter et al., , 1997Richter et al., , 1998Biggs, 1997, 2000;Wood et al., 2000;Gippel, 2001;Mathews and Richter, 2007;Beck et al., 2013a]. Following Poff and Ward [1989] and Richter et al. [1996] the flow regime is grouped in five widely accepted components: (i) magnitude, (ii) frequency, (iii) duration, (iv) timing, and (v) rate of change in flow events. Olden and Poff [2003] have extended this classification with subcategories for average, low, and high flows.
The selection of hydrological indices should focus on streamflow characteristics that respond to the contextualized practical issue, for instance, those that are most susceptible to change under LUCC [Archer et al., 2010]. Previously, watershed management was related to limited information obtained from streams, especially about quality and only one aspect of quantity, the minimum flow [Poff et al., 1997]. Clausen and Biggs [1997], Buytaert et al. [2007] and Ochoa-Tocachi et al. [2016] are good examples of simple calculations of hydrological indices that put on evidence relevant LUCC effects. However, one issue about the selection of hydrological indices, especially in regionalization studies, is their interdependence [Almeida et al., 2016]. Indices should be relatively independent and well-defined among catchments so that relationships with physical catchment properties minimize ambiguity [Sefton and Howarth, 1998;Bulygina et al., 2009]. Another challenge in the context of LUCC assessment is that changes of practical interest may occur at different scales than those at which the hydrological indices are estimated and regionalized [Bulygina et al., 2009]. The challenge is therefore to identify adequately informative, independent, and well defined hydrological indices for robust regional relationships. Olden and Poff [2003] compiled 171 hydrological indices and performed Principal Component Analysis (PCA) using 36 years of daily streamflow data from 420 catchments in the US. The PCA was leveraged to inspect the governing intercorrelation patterns between variables and to determine subsets that explain the major sources of variation while minimizing redundancy. The disadvantages of PCA are the assumptions of linear combination among variables and normality in data. Their study resulted in 25 indices being listed under the first four principal components, which explained 75% of the original variance [Olden and Poff, 2003]. This approach becomes useful for hydro-ecological studies by supporting the selection of high-Water Resources Research 10.1002/2016WR018596 informational and nonredundant variables for a particular region [Sefton and Howarth, 1998;Yadav et al., 2007].

Robust Regionalization of Hydrological Indices to Assess LUCC Impacts
Although relationships between physical catchment properties and hydrological indices can be done by different approaches [e.g., Parajka et al., 2013], linear regression is one of the most straightforward and common methods [Berger and Entekhabi, 2001;Yadav et al., 2007;Visessri and McIntyre, 2016]: where I j is the expected value of the j th hydrological index, C 1 , C 2 , . . ., C p-1 are p-1 physical catchment properties, b 0 , b 1 , b 2 , . . ., b p-1 are the p regression coefficients, and e is an error term.
A forward stepwise regression technique is generally adopted to discard variables that do not significantly contribute to the index prediction using a critical p-value of 0.05 [Yadav et al., 2007;Visessri and McIntyre, 2016]. However, Visessri and McIntyre [2016] point that this procedure may remove predictors that have physical significance due their potential high interdependency with other descriptors. To ensure robustness in the regional models, catchment properties must be both statistically significant and physically important to explain the hydrological indices.
In this study, we implement this regression model to assess the statistical significance of LUC as explanatory variables. The approach consists of the following steps: 1. Data were obtained from a monitoring network of paired catchments that cover a wide range of physical, climatic, and LUC characteristics in the studied region; 2. Comprehensive sets of physical catchment descriptors and hydrological indices were calculated; 3. Redundancy assessment to analyze multicollinearity and reduce the number of descriptors to a manageable size (i.e., lower than the number of catchments used); 4. Regionalization of hydrological indices against physical and climatic properties with and without the consideration of LUC variables; 5. Leave-one-out cross-validation of the regional models; and, 6. Analysis and discussion of the physical meaning of LUC variables in the resulting regression model structures.
It should be noted that our model is not a pairwise comparison in a statistical sense, which would require the regionalization of the within-pair differences of each catchment characteristic. In a statistical pairwise comparison, pairs are assumed to be identical except for the treatment, which is clearly problematic for catchments [Beven, 2000]. Instead, we treat each catchment as an independent observation in the linear regression model and, rather than questioning the difference between two paired catchments, the collective differences within a pool of paired catchments are statistically analyzed. We hypothesize that the geographical vicinity of catchments, with similar physical properties but contrasting LUC, enhances the LUCC signal within the data set and thus increases the predictive capacity of LUC as descriptor variables in the regression model.

Study Region
We tested our methodology in the Tropical Andes, which is a region characterized by a remarkable natural variability in biogeography and meteorology and thus exhibits diverse hydrological processes. Ecologically, natural Andean ecosystems are encompassed in five large landscape units [Cuesta et al., 2009], from which we consider high Andean grasslands with some forest occurrences. The entire region is affected by common issues of land degradation, compaction and erosion, as a result of conversion to grazing, cultivation or afforestation fields. Therefore, it is particularly challenging from a management perspective.
We used 24 catchments sized between 0.5 km 2 to 7.8 km 2 and distributed in 9 sites along the studied region ( Figure 1). The catchments are located between 0 and 18 south, covering an elevation range from 2500 m to 5000 m altitude. Sites are rural and without water abstractions or stream alterations. Shapes are typically oval tending to circular or stretched. Slopes are generally steep and uneven. The vegetation cover The catchments feature a characteristic high tropical mountain climate regime. [Viviroli et al., 2007;Ochoa-Tocachi et al., 2016]. Seasonal variability is diverse, mainly low in catchments close to the equator, while climatic intraday ranges are larger [Buytaert et al., 2006a;C ordova et al., 2015]. In general, their hydrological response is mainly driven by precipitation and strongly related to their soil conditions, with rainfall intensities typically lower than soil infiltration [Buytaert et al., 2005]. The catchments share a baseflow soildominated response due to the virtual absence of infiltration excess overland flow  and the presence of underlying impermeable bedrock that minimize deep infiltration and groundwater storage . Large water yields have been reported in natural catchments [Buytaert et al., 2006a;Ochoa-Tocachi et al., 2016], and have been linked to the extent of wetlands likely due to the occurrence of saturation excess flow [Mosquera et al. 2015]. Buytaert and Beven [2011] also mention processes triggered by thresholds such as hydrologically disconnected storages found within the catchment microtopography, and nonstationarity in evapotranspiration, infiltration, and routing produced by growing vegetation.

Monitoring Setup
The catchments are part of a participatory network of collocated research basins in Andean ecosystems (known as iMHEA by its Spanish abbreviation) allowing monitoring in many regions that would otherwise be prohibitively expensive or impractical [C elleri et al., 2010]. The generated data are aimed at evaluating the effects of LUCC over ecosystem hydrology. Typically, one catchment is chosen such that it is representative for a reference state, while its pair represents the land-use or management practice to be evaluated. The network focuses on small catchments with homogeneous LUC in at least 75% of its surface. Key physical characteristics are surveyed at the beginning of the monitoring period.
Precipitation is measured with at least 2 and typically 3 tipping bucket rain gauges (resolution of 0.254 mm or better) at a height above ground level of 1.50 m and distributed in the catchment areas to account for the high spatial variability of rainfall [Buytaert et al., 2006b[Buytaert et al., , C elleri et al., 2007. Correlation between the multiple rain gauges within each catchment allows for detecting and correcting errors, filling data gaps, and obtaining reliable averaged values. Streamflow is measured at the outlet of each catchment using compound triangular-rectangular weirs equipped with pressure transducers. Water level is recorded at an interval of at least 15 min and generally 5 min. The lengths of the monitoring for each site are shown in Figure 1.

Calculation of Catchment Characteristics and Hydrological Response Indices
A set of 50 physical and climatic descriptors was calculated for all catchments (Figure 2). Characteristics are grouped by shape, drainage, elevation, topography, meteorology, rainfall intensity, land cover and land use.
In contrast to studies in relatively well-gauged regions, many countries suffer from data-scarcity (e.g., in soil hydrological properties, [Visessri and McIntyre, 2016]), which affects the regionalization approach. Shape, drainage, and LUC variables were derived from available geographic data. Elevation and slopes were calculated using contour lines at 40 m vertical resolution from national cartographic data. Land cover (LC) percentages were classified in three distinctive categories, and land use (LU) was based on a straightforward division of 0 (absent) and 1 (present) use. Despite the limited classification of LUC variables used (Figure 2), they represent the main interventions in the region, which have been linked to impacts on the hydrological response in the available literature [C elleri, 2010].
The tipping bucket rainfall data were interpolated by a composite cubic spline on the cumulative rainfall curve [Sadler and Busscher, 1989;Ciach, 2003;Wang et al., 2008;Padr on et al., 2015] and aggregated at intervals compatible with discharge. Rainfall intensities were calculated using a 5 min scale moving window for durations between 5 min and 2 days. The seasonality index [Walsh and Lawler, 1981] was computed and normalized between 0 and 1. Reference evapotranspiration was calculated using Worldclim temperature data [Hijmans et al., 2005] and the Hargreaves formula [Hargreaves and Samani, 1985;Allen et al., 1998].
A set of 50 hydrological indices (Figure 3) was derived from averaged daily discharge data normalized by catchment areas. The indices were extracted from the review provided by Olden and Poff [2003], including the commonly used Indicators of Hydrologic Alteration [Richter et al., 1998], and complemented with those of interest of the iMHEA partners, such as the Richards-Baker Flashiness Index [Baker et al., 2004]. Most of the magnitude indices are normalized by the median daily flow (MA2) to provide nondimensional comparable values for the first statistical moment. Coefficients of variation (standard deviation divided by mean) for different flow components (MA3, ML21, ML18, DL17, DH16, TL2) represent the second moment. Skewness of daily flows (MA55mean/median) provides an estimate of the third moment. Frequency, magnitude, duration, and variability in pulses above and below given thresholds in the hydrograph were similarly calculated at a daily basis [Poff, 1996].

Selection of Variables for Regionalization
The commonly adopted criteria for selecting catchment descriptors in regionalization studies are data availability and expected physical relationship with the predicted hydrological indices [Sefton and Howarth, 1998;Wagener and Wheater, 2006;Young, 2006;Yadav et al., 2007;Visessri and McIntyre, 2016]. While linear regressions may allow for dependence between predictors, interpreting their significance in a multilinear regression is complicated by high inter-dependencies. As seen in Figure 2, the 50 physical and climatic characteristics are highly correlated within each category. To reduce the number of descriptors while selecting nonredundant high-informative variables, Yadav et al. [2007] used Spearman [1904] correlation coefficients (q) to explore pairwise patterns of variability. Analogously, we discarded variables based on a threshold for q of 60.75 and p-values on a significance level of 5% under the null-hypothesis of no correlation.
The application of linear regressions generally involves the assumption of normally distributed random errors, in this case, attributed to natural variability in hydrological response (i.e., aleatoric uncertainty) and the limited number of physical descriptors used in their estimation (i.e., epistemic uncertainty) [Bulygina et al., 2011]. Kjeldsen and Jones [2010], Visessri and McIntyre [2016], and Almeida et al.
[2016] provide thorough error descriptions in hydrological regressions and the former study addresses the issue of datatransferability. In contrast to sampling errors derived from the data used to calculate a hydrological index, they highlight that each different selection of catchment descriptors would produce a particular regression error structure. In practice, variables are generally skewed and data transformations are needed to approximate normal distributions [Beck et al., 2013b]. Therefore, to improve the results of the linear regressions and to avoid bias in the estimates, we primarily retained those descriptors that better approximate normality (Figure 4).

10.1002/2016WR018596
While physical characteristics may be relatively independent, this is not completely the case for hydrological indices as they are calculated from the same streamflow data [Olden and Poff, 2003]. Three indices, which are generally used in hydrological studies and watershed management, were selected for further study. The   Gringorten [1963]. These last two indices are indicators of hydrological regulation at short and long temporal scales, respectively.

Regionalization and Model Evaluation
The total set of 50 hydrological indices was regionalized through stepwise regression of equation (1) based on a reduced set of 10 physical and climatic properties, and then adding LC only and later all LUC variables. The relative importance of descriptors is commonly quantified as the number of times they occur in the stepwise regressions (e.g., using cross-validations [Yadav et al., 2007]). However, being the stepwise  regression necessarily a subjective process, the appearance of specific explanatory variables needs cautious interpretation. Previous studies [e.g., Merz and Bloschl, 2009;Visessri and McIntyre, 2016] found little or ambiguous representation of LUC influences on regionalized hydrological indices. The value of using paired catchments as independent data points in the common regionalization process increases the potential of the data set to identify the impact of LUCC. Instead of using these catchments in a classic statistical pairwise analysis, our discussion focuses on assessing their effectiveness to regionalize the effects on the hydrological response by increasing the contrast in LUC variables between collocated, physically similar catchments.
To assess the goodness of fit, the coefficient of determination (ordinary R 2 [Pearson, 1895]) has been typically used [e.g., Yadav et al., 2007]. However, as descriptors are added to the regression model, R 2 increases as the fit improves, producing an 'artificial' idea of success. The adjusted-R 2 (R 2 adj ) decreases as predictors are added to the model structure if the increase in performance does not compensate the loss of degrees of freedom: where I* i is the observed hydrological index, I i is the estimated index, I m is the average index value between catchments, W is the number of catchments used, and w is the number of descriptors used in the linear regression model. The success in performance was quantified as the ratio of models with R 2 adj over 0.6, while the success in improvement due to the inclusion of LUC variables was estimated as the ratio of models which have had a positive change in R 2 adj . The width of the 95% confidence intervals (D) derived from the regressions was used as a measure of uncertainty.
To evaluate the predictive capacity of regional models, a leave-one-out cross-validation was applied for the 3 hydrological indices defined in section 2.4. The explanatory variables were fixed (i.e., keeping the same model structures identified from the stepwise regression with the 24 catchments), and each catchment was treated as 'ungauged' in turn while its pair was part of the model derivation. The average bias of the regression models (mean of the absolute values of the residuals between the actual observation and the regional prediction) and the variance of the residuals were used to compare model performance. Additionally, the ranges in R 2 adj and in D from the resulting iterations were also compared between the model structures.

Physical, Climatic and Hydrological Characteristics
The complete sets of catchment properties and hydrological indices, and correlation matrices between the 50 descriptors are available as supporting information.
Physical and climatic properties are shown in Figure 2. Some catchments have particular shape and drainage characteristics, for instance, while catchments in LLO have the largest drainage densities (DD), HUA catchments have particularly low elongation ratios (RE). On the other hand, elevation, slope, and meteorological characteristics are relatively well distributed among their ranges, exhibiting typical mountainous features. As seen in Figure 1, precipitation properties, especially rainfall intensities, are generally grouped by site. From these results, it was found that (i) the catchments show some consistent and comparable characteristics where key physical features are diverse, (ii) catchment collocation (i.e., latitude, elevation) is a dominant driver of their climate, and (iii) several descriptors within each category are highly correlated and can be reduced throughout a redundancy assessment.
A large diversity in hydrometeorological characteristics can be seen in Figures 1, 2 and 3. Annual precipitation ranges from 613 mm to 2677 mm, with a wetness index (RPPE) between 0.49 and 2.27. The rainfall seasonality index (SINDX) ranges from 0.13 to 0.51, while maximum daily intensities present medians

10.1002/2016WR018596
Many hydrological indices tend to be positively skewed and, in general, similar magnitudes in the hydrological responses are grouped by site (Figures 1 and 3). This confirms the established idea that paired catchments, which are physically similar and generally collocated, will provide similar responses unless an external driver, such as LUCC, affects their hydrological behavior. Because several indices depend on the same data and similar calculation, for instance RR and DIFF, RBI1 and RB2, or QDMY and QDML (due to the short streamflow records used that omit interannual variations), they are closely correlated. Hydrological indices from Olden and Poff [2003] show high variability among catchments. Those derived from high and low flow pulses (e.g., MH22 or FL3) result in several null values attributed to thresholds that are too high (e.g., 3 or 7 times the median flow) or too low (e.g., 5% of the mean flow).

Redundancy Analysis and Selection of Variables for Regionalization
While physical and climatic characteristics are relatively independent between different categories, some are correlated within each group because they are functions of the same variables. Most shape descriptors are based on AREA and PER, for instance, so q 5 61 between CC, KF2, and RC. Similarly, p-values <0.05 occur between properties that are combinations of others (e.g., as expected from the Hargreaves formula, LAT, ETYEAR, and RPPE are redundant) or that are highly intrinsically related (e.g., elevations and rainfall intensities). Meteorological parameters (ETYEAR, PYEAR, RPPE) are highly correlated to rainfall variability (PVAR, DAYP0, PMDRY, SINDX) and intensities (RMED, iMAX), suggesting that although several descriptors can be derived from precipitation data, they provide highly interdependent information. The 7 LUC variables were judged to be independent from other descriptors and between them, resulting in |q| <0.6 (far below the threshold of 60.75 [Yadav et al., 2007]), although some occurrences of p-values <0.05 were observed.
We retained two descriptors from each category that are relatively independent between them yet share information with the discarded properties. Exceptions were elevations and rainfall intensities from which only one descriptor was extracted from each group. This resulted in a reduced set of 10 catchment properties, excluding LUC, to be used in the regionalization. Because of the varying degrees of interdependency between hydrological indices [Olden and Poff, 2003], the complete set of 50 indices was used in the first step of the regionalization exercise, from which only 3 representative indices were retained for validation. Figure 4 shows the univariate relationships, q and p-value between the 10 descriptors and the 3 hydrological indices. In general, the histograms show distributions with no obvious bias.

Performance and Validation of the Regional Models
We find evident differences in performance due to the explicit incorporation of LUC variables as predictors in the derivation of regional models. Without them, out of the 50 hydrological indices tested, only 20% have R 2 adj above the acceptable threshold of 0.6. The success in performance increases to 48% after including LC, and to 66% with the complete LUC set (Table 1). Even though some regressions remained below the threshold, 72% of the 50 indices show a positive change in R 2 adj (improvement success) when adding LC. On top of that, the addition of LU further improves the R 2 adj for 74% of the 50 indices (Table 1). Because the relationships are outputs of a stepwise regression, this improvement in model performance reflects the statistical significance of LUC variables to explain the hydrological response of the catchments. This suggests that it is possible to separate LUCC impacts from natural variability, and thus to regionalize them. However, the predictability of some of the indices is still challenging; for instance, DH15, TH3, and RA5 did not improve after the inclusion of LUC variables, while QDMIN, K, DL17, and TL1 remained far below the acceptable threshold. Figure 5 illustrates the results for the 3 selected hydrological indices. The plots show the index observations used to develop the regressions (x-axis) against those estimated from the regional relationships (y-axis) with a 95% confidence interval. In the case of the RR, the regressions have significant R 2 adj values of 0.68 without LUC variables (left), 0.67 adding LC (middle), and 0.88 with LUC variables (right). Although the performance is slightly impaired when LC is part of the descriptor set, the model considerably improves when LU is also added. Therefore, the difference in water yield between paired catchments is more accurately represented when all LUC variables are considered. A different trend is observed for the BFI, in which R 2 adj increases from 0.22 (without LUC variables) to 0.58 (adding LC) and up to 0.77 (with LUC variables). Although a progressive improvement in performance is observed as R 2 adj only depends on the predictand variable, uncertainty may also escalate as the confidence intervals depend on the predictor variables as well. This can be noticed by an increase in D for BFI when only LC is part of the model structure ( Figure 5).

10.1002/2016WR018596
In the case of the R2FDC, R 2 adj significantly increases from 0.21 (without LUC variables) to 0.86 (adding LC). However, the further inclusion of LU does not produce any improvement.
Results of the leave-one-out cross-validation for the 3 selected indices are shown in Table 2 and Figure 6. The ranges in R 2 adj indicate consistency in the results of the regionalization when one catchment is excluded  On the other hand, although the average D decreases when LUC variables are added by 14% for the RR and by 45% for the R2FDC, it increases by 41% for the BFI. Shrinking the width of confidence bounds helps decreasing uncertainty in estimations; however, these results also highlight the difficulty of obtaining robust regional relationships with few data. Nevertheless, the mean absolute error (MAE) reduces when adding LUC variables by 41% (RR), 59% (BFI), and 58% (R2FDC). Similarly, the variance of the residuals (VAR) considerably decreases as a result of this improvement in model structure by 70% (RR), 82% (BFI), and up to 84% (R2FDC). (left) without land use and cover variables, (middle) adding land cover only, and (right) additionally including land use. The x axis corresponds to the index value calculated from each catchment time series; the y axis corresponds to the estimated value from the regional models; the squares color-coding relates to each catchment location in Figure 1; the gray lines correspond to the 95% confidence interval for each estimation.

Discussion
This paper describes the potential of a paired catchment monitoring network to provide statistically robust, regionalized predictions of LUCC impact in an environment of high hydrological variability such as the Tropical Andes. Instead of performing a classic pairwise statistical analysis [e.g., Buytaert et al., 2007;Ochoa-Tocachi et al., 2016], we focus on using paired catchments to increase the contrast of LUC variables and capture LUCC signals in the regionalized hydrological responses. The collocation of catchments minimizes climatic differences between pairs, which at the same time are selected in such way that their physical characteristics are as similar as possible [C elleri et al., 2010]. The paired catchment monitoring setup applied to the common regionalization procedure is a novel contribution of this study for both approaches. Furthermore, our results contrast with earlier studies that found weak or ambiguous signals of LUCC effects on the regressions based on 'conventional' catchment sets [e.g., Visessri and McIntyre, 2016].
We find that the explicit incorporation of LUC variables as predictors in the derivation of regional models increases regression performance (R 2 adj >0.6) for 66% out of 50 hydrological indices tested. The resulting regional models for the 3 indices selected by their hydrological relevance (RR, BFI, and R2FDC) are shown in Table 3. The importance of different descriptors changes as the model structure is modified and, although there is no evidence of ambiguous results (e.g., inversion of coefficient signs between models), any physical explanation needs cautious interpretation. For instance, shape is expressed by RC and RE and both approach 1 as the catchment approaches a circle [Schumm, 1956]. In contrast to RC, which is a function of AREA and PER, RE is inversely proportional to the maximum length parallel to the principal stream (LPL). As a result, RC keeps a positive relation to RR (circularity would favor water yield), while RE would diminish both RR and R2FDC (i.e., a smaller water storage in short catchments may affect the long-term hydrological regulation). Similar to the results of Visessri and McIntyre [2016], elevation is necessary to explain the RR and BFI, which is possibly due to its relation to soil properties. Correa et al. [2016] have found that in mountainous catchments of southern Ecuador, hydrological responses are highly influenced by soils, but the effect of geology cannot be distinguished. Although soil data in this region is scarce, Buytaert et al. [2005] have presented general trends that link soil properties to high-elevation and rainfall. Additionally, southern catchments (Figure 1), where rainfall seasonality is strong and water yield is generally low, are commonly located at higher elevations than their counterparts further north [Ochoa-Tocachi et al., 2016]. Other interesting results are those of the meteorological descriptors. As expected, the wetness excess in the water balance (explained by RPPE) is positively related to RR and BFI, while rainfall intensities (RMED1D) negatively affect the short-term hydrological regulation. Furthermore, in those catchments where an important number of months lack precipitation, virtually the totality of the streamflow is sustained by a baseflow-dominated regime, which may explain the positive relation between SINDX and BFI.
The effects of LUCC in Andean catchments [see e.g., Buytaert et al., 2006a,;C elleri and Feyen, 2009;C elleri, 2010;Crespo et al., 2010;Ochoa-Tocachi et al., 2016] are clearly reflected in the 3 regional relationships. R 2 adj range between the minimum and maximum performance values obtained from the set of validation models. c D range between the minimum and maximum widths of the 95% confidence interval from the set of validation models. d Mean absolute error (mean of the residuals between observation and prediction for each excluded catchment). e Variance of the residuals (between observation and prediction for each excluded catchment).

10.1002/2016WR018596
Vegetation cover (LC GRASS , LC FOREST ), especially the percentage of tall flora, affects negatively water yield (RR) but favors the long-term hydrological regulation (R2FDC). Forested catchments are often associated to a trade-off between high water consumption rates and enhanced soil infiltration by tree roots [Bruijnzeel, 2004;Buytaert et al. 2007;Crespo et al., 2012;Beck et al., 2013a]. The negative relation between LC WETLAND and BFI should be treated with care, although it is likely because of a reduced soil storage capacity that favors the occurrence of saturation excess overland flow. Although Visessri and McIntyre [2016] suggested the possibility that swamps and wetlands may reduce the RR for a group of catchments in Thailand, Mosquera et al. [2015] have found clear positive correlations between the extent of permanently near-saturation zones and water yield in Andean catchments. The grey bars correspond to the difference between the actual observation and the regional prediction, and the white bars are for the 95% confidence interval in the predictions.

10.1002/2016WR018596
Signals of land use are also evident in the derived relationships. In natural catchments (LU N ), both RR and BFI are expected to increase with respect to their altered pairs. Although in temperate catchments, a lower RR may be often associated to a conservation management [e.g., Brown et al., 2005], studies in natural Tropical Andean catchments have reported runoff ratios between 0.50 and 0.70 [Buytaert et al., 2006a,;Ochoa-Tocachi et al., 2016]. Even though Padr on et al. [2015] argue that tipping-bucket rain gauges may underestimate the contribution of drizzle to total precipitation in these ecosystems, this error would be smaller than 15%. Agriculture (LU A ) is related positively to RR and negatively to BFI, which is consistent to the attributed impacts of artificial drains and soil hydraulic conductivity that enhance the catchment drainage [Buytaert et al., 2004[Buytaert et al., , 2005Crespo et al., 2010]. Grazing (LU G ) is found to be negatively related to RR. The grazing extent is linked to forest cover removal [Molina et al., 2015] and, in contrast to the common expectation [see also Bosch and Hewlett, 1982;Bearup et al., 2014;Livneh et al., 2015], Adams et al. [2012] and Biederman et al.; have reported that water yield following tree die-off can potentially decrease rather than increase and, similarly, Crespo et al. [2010] have also observed water yields 15% lower in grazed Andean catchments than in their natural counterparts. The impacts of livestock, however, depend on the animal density, and have been associated to soil compaction affecting water regulation [D ıaz and Paz, 2002;Quichimbo, 2008]. Nonetheless, this descriptor is neither in BFI nor in R2FDC regional models, which suggests that its effects may pass unnoticeable in aggregated indices [see also Crespo et al., 2011;Ochoa-Tocachi et al., 2016]. Lastly, in forested catchments (LU F ), the BFI appears to be impaired. This contrasts to the expected improvement in hydrological regulation due to an enhanced soil infiltration caused by tree roots; however, the apparent role of exotic plantations to control catchment regulation (e.g., flooding) is still under debate [C elleri, 2010]. Negative hydrological impacts of exotic pine plantations in Andean catchments have also been reported by Buytaert et al. [2007], Crespo et al. [2010], and Ochoa-Tocachi et al. [2016].
Under the leave-one-out cross-validation of the regional models, we find that LUC variables significantly increase their predictive capacity and reduce the prediction uncertainty. For the 3 tested indices, on average, the mean absolute error was reduced by 53% and the variance of the residuals by 79%. In the case of the R2FDC, the main enhancement was observed when adding LC rather than LU, which suggests that vegetation (and possibly its effects on soil structure) is the principal driver for controlling the distribution of flows [see e.g., Brown et al., 2005]. On the other hand, the RR regression was slightly impaired when adding LC; yet after including all LUC variables, both model performance and predictive capacity increased. The histograms in Figure 6 show that the distributions of the residuals for the 3 indices better approximate a normal distribution when LUC variables are added to the models. This not only satisfies the main assumption of linear regression [Freedman, 2009], but demonstrates that the inclusion of LUC variables improves the model structure making linear relationships more suitable for regionalization. The remaining uncertainty in index predictions, which is due to different error sources [Kjeldsen and Jones, 2010], possibly includes LUC differences at an intra-catchment scale. Distributed information of different variables (e.g., soils, land use, climatologies [Boorman et al., 1995;Chu et al., 2010;Manz et al., 2016]) is needed for improving regionalization, especially in mountain regions [Visessri and McIntyre, 2016]. In practice, in the absence of data, using regionalized estimations may provide relevant knowledge to evaluate the hydrological impacts of different watershed management practices and rapidly feed results into the decision-making process.

Conclusions
In order to link LUCC signals to catchment hydrological response, we applied a regional analysis that relates a set of hydrological indices to physical and climatic descriptors using a network of collocated, paired catchments. Our model is effectively a regionalization procedure that is thought to be more powerful than pairwise comparisons and evidence-based approaches separately. The collocation of catchments in the monitoring setup is meant to increase the contrast of LUC variables and to minimize climatic and physical differences between pairs. We find that the explicit inclusion of LUC variables in the derivation of regional regressions clearly improves model performance and predictive capacity, while reduces uncertainty in predictions. By treating each catchment as 'ungauged' during the validation process yet still keeping its pair with similar physical characteristics as part of the regionalization, we attribute the increasing accuracy in hydrological index predictions to the LUC information added by the other contrasting pairs. A promising route for further research may include a sensitivity analysis of performance to the number and nature of pairs included in the regionalization.
Furthermore, this methodology can be applied to evaluate ungauged catchments, in which physical and climatic descriptors may be available but streamflow time series are absent. Robust extrapolation to ungauged catchments can be improved, including an adequate quantification of uncertainty, if reliable regional regression models can be obtained from a sensible number of distributed, collocated catchments covering different ecosystems, characteristics, and contrasting LUC types. Predicted hydrological responses in the ungauged catchment can be compared between different LUC regimes using regionalized hydrological indices. This is a useful approach to generate information about the impact of LUCC on the hydrology of data-scarce regions, with potential application in other regions of the world.
Lastly, when using index predictions, for example, for constraining a rainfall-runoff model in ungauged catchments, different streamflow responses are expected to represent and impact on particular parts of the hydrograph [Yadav et al., 2007]. Modeling these different parts would unlikely achieve good performance unless specific criteria are included in the conditioning Bulygina et al., 2012]. A sensible combination of hydrological indices should provide a better predictive uncertainty constraint than using individual characteristics [Almeida et al., 2016]. The present study demonstrates that the inclusion of LUC variables in regionalization exercises effectively improves regression performance for most of the hydrological indices that characterize different features of the flow. This reflects the impacts of LUCC on the hydrological response, and thus potentially increases predictions in ungauged basins.