Volume 56, Issue 7 e2019WR026855
Research Article

Effects of Spatial Variability in the Groundwater Isotopic Composition on Hydrograph Separation Results for a Pre-Alpine Headwater Catchment

Leonie Kiewiet

Department of Geography, University of Zürich, Zürich, Switzerland

Correspondence to: L. Kiewiet,

[email protected]

Ilja van Meerveld

Department of Geography, University of Zürich, Zürich, Switzerland

Jan Seibert

Department of Geography, University of Zürich, Zürich, Switzerland

Department of Aquatic Sciences and Assessment, University Uppsala, Uppsala, Sweden

First published: 04 May 2020

Abstract

Isotope hydrograph separation is a powerful tool to investigate catchment functioning. In most hydrograph separation studies, a pre-event baseflow sample is used to represent the pre-event water, and thus, baseflow is assumed to be a mixture of all the water that is stored in the catchment. However, baseflow may not be representative of all water stored in the catchment because some sources may not contribute to baseflow. This is problematic when the isotopic composition of the sources is highly variable. We quantified the effects of spatial variability in the shallow groundwater isotopic composition on pre-event water characterization and hydrograph separation results. We compared the composition of groundwater sampled at 38 wells in a 0.2 km² pre-alpine catchment with stream water sampled before, during, and after three rainfall events. We estimated the number of groundwater samples needed to characterize the average groundwater composition in the catchment and its spatial variability and compared the results of two-component hydrograph separations for different ways to characterize the pre-event water. We found that differences in the calculated pre-event water fractions and uncertainties were large and depended on which and how many samples were used to characterize the pre-event water composition. Analyses based on a limited number of groundwater samples likely underestimate the real uncertainty and can give a false impression of accuracy. Our results highlight the importance of representing the variability in the pre-event water composition when applying hydrograph separation analyses. We therefore recommend sampling pre-event water at multiple locations or estimating the variability based on literature values.

Key Points

  • Baseflow did not reflect the catchment average groundwater composition
  • Uncertainties in hydrograph separation are likely higher than usually reported due to spatial variability in pre-event water
  • Using samples that represent the spatial variability in pre-event water composition yields more robust pre-event water fractions

Plain language summary

For prediction of floods, droughts, or water quality, it is important to understand how rainfall becomes streamflow. One question is how much rainfall contributes to streamflow immediately (“new” water) and how much of the streamflow is groundwater that has been in the catchment for some time (“old” water). One way to answer this question is to look at changes in the stream water composition. For this, the composition of the “old” water needs to be specified, for instance, by taking a stream water sample before the rain starts or a number of groundwater samples. Usually, researchers take only a few samples to determine this “old” water composition. However, the groundwater composition varies from location to location. We calculated at how many locations one has to take a groundwater sample to reliably estimate the “old” water composition. We also calculated the amounts of “new” and “old” water in streamflow for three rainfall events based on different samples to characterize the “old” water. We found that the calculated amounts were different and that using more samples provides more robust results. Thus, we should take multiple samples that represent the variability in groundwater across the entire catchment when estimating rainfall and groundwater contributions to streamflow.

1 Introduction

Groundwater is the main contributor to streamflow in undisturbed headwater catchments in temperate climates (Buttle, 1994; Klaus & McDonnell, 2013; Rodhe, 1987; Sklash & Farvolden, 1979), but the relative contribution of groundwater varies between and during events and is affected by antecedent moisture conditions and rainfall amount and intensity (e.g., Fischer et al., 2016; Penna et al., 2015; Tetzlaff et al., 2014). The two-component isotope hydrograph separation method is often used to determine the relative contributions of pre-event water (or groundwater and soil water) and event water (precipitation) to the stream. The method assumes conservative mixing of event and pre-event water. One of the main assumptions of isotope hydrograph separation is that these sources have a constant and distinctly different isotopic signature or that any variation in the signature can be accounted for (Buttle, 1994). For a viable hydrograph separation, the variability in the signature of the water sources should be smaller than in streamflow (Hooper, 2001). However, the spatial variability in the isotopic signature of the water sources is often not fully characterized or accounted for due to a lack of data.

Usually, we assume that pre-event streamflow is a mixture of all the water that is stored in the catchment and thus represents the catchment average pre-event water composition (Cpe). However, the water that is stored in a catchment can be highly variable in its isotopic composition (Kendall et al., 2001; Kiewiet et al., 2019; McDonnell et al., 2007). This spatial variability in the pre-event water composition does not affect hydrograph separation results as long as the relative contributions of the different water sources to streamflow during the event do not differ from the pre-event contributions (Figure 1, Situation II). However, not all parts of the catchment are hydrologically connected to the stream during baseflow conditions (Jencso et al., 2010; Jencso & McGlynn, 2011), and the relative contributions from different groundwater (and soil water) stores change with the expansion of the contributing area and connection of different source areas (Rinderer et al., 2019) (Figure 1, Situation III). For example, McGlynn and McDonnell (2003) found that between events, throughout small events, and in the early part of large events, streamflow consisted mainly of riparian groundwater in a 2.5-ha, steep headwater catchment in New Zealand. However, for large events, the contribution of hillslope runoff was similar to that of riparian groundwater. Similarly, Jencso et al. (2010) showed that the composition of stream water in the moderately sloping Tenderfoot Creek Experimental Forest in the United States (range subcatchment sizes: 3–22.8 km²) shifted toward the hillslope signature once a groundwater connection was established. Other three-component hydrograph separation studies have also shown significant contributions of hillslope water during rainfall events (Burns et al., 2003; Inamdar & Mitchell, 2007; Penna et al., 2016).
If hillslope groundwater has a composition that is different from riparian groundwater and contributes to the stream during an event but not to baseflow, then the pre-event baseflow sample does not accurately reflect the composition of the pre-event groundwater that contributes to streamflow during the event (Figure 1, Situation III). The effects of this difference between the sampled and the actual pre-event water composition on hydrograph separation results have been highlighted by modeling studies (e.g., McCallum et al., 2010; Jones et al., 2006) but have not been quantified with field data on the variability in the isotopic composition of groundwater.

Figure 1.
Schematic representation of the changes in the contributing area (white area: does not contribute to streamflow, light gray area: only contributes during intermediate or high flow conditions, dark gray area: always contributes) and the spatial variability in the groundwater composition (represented by different colored dots) for three situations, and contributions of event (Ce) and pre-event (Cpe1 and Cpe2) water to streamflow (Ct) during peakflow conditions for each situation. Situation I (top): There is no significant spatial variability in the isotopic composition of the groundwater. Even though the contributing area changes during the event, the pre-event baseflow sample characterizes the pre-event water composition well. Situation II (middle): Even though there is significant spatial variability in the isotopic composition of the groundwater, the hillslopes do not contribute to streamflow and the relative contributions of the water sources do not change during the event. The pre-event baseflow sample, thus, represents the pre-event water that contributes to streamflow during the event. Situation III (bottom): There is significant spatial variability in the isotopic composition of the groundwater and the contributing area changes during the event. The pre-event baseflow sample does not adequately represent the pre-event water that contributes to streamflow during the event.

A theoretical calculation, assuming that streamflow is a mixture of riparian groundwater, hillslope groundwater, and precipitation but that pre-event streamflow (i.e., baseflow) is fed only by riparian groundwater, provides insight into the associated error in the event water fraction (see supporting information S1). For example, if the difference in the isotopic composition of riparian and hillslope groundwater is 2‰ (with the hillslope pre-event water source being more depleted than the riparian water that contributes to baseflow) and the precipitation is 5‰ enriched compared to baseflow, then the error in the calculated event water fraction is 8% when hillslope pre-event water contributes 20% to streamflow, but the error is 20% when the hillslope water contributes 50% of the streamflow. However, it is unlikely that there are only two sources of pre-event water in a catchment and that both sources are well mixed, as the isotopic composition of groundwater, soil water, seeps, and springs can be highly variable across the catchment. Above all, this theoretical calculation of the uncertainty assumes that we know the contribution of the pre-event water sources, which we usually do not know.
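The arithmetic behind this example can be reproduced with a short sketch. It is not the calculation from supporting information S1; it simply assumes conservative mixing of three sources and applies a two-component separation with baseflow (riparian groundwater only) as the pre-event end member. The function name and the assumed true event water fraction of 0.3 are illustrative; under these assumptions the error does not depend on that fraction.

```python
def event_fraction_error(f_hillslope, f_event=0.3,
                         c_riparian=0.0, c_hillslope=-2.0, c_event=5.0):
    """Error in the event water fraction (true minus calculated).

    Compositions are in permil relative to baseflow (riparian groundwater),
    so hillslope water is 2 permil depleted and rainfall 5 permil enriched.
    f_event is an assumed true event water fraction (hypothetical value).
    """
    f_riparian = 1.0 - f_event - f_hillslope
    # Stream composition from conservative mixing of the three sources
    c_stream = (f_event * c_event
                + f_riparian * c_riparian
                + f_hillslope * c_hillslope)
    # Two-component separation that uses baseflow as the pre-event end member
    f_event_calculated = (c_stream - c_riparian) / (c_event - c_riparian)
    return f_event - f_event_calculated


for f_hill in (0.2, 0.5):
    error = event_fraction_error(f_hill)
    print(f"hillslope contribution {f_hill:.0%}: error {error:.2f}")
```

With these numbers the error equals 0.4 times the hillslope contribution, which gives the 8% and 20% quoted in the text.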

Generally, the uncertainty in tracer-based hydrograph separation studies is estimated using the Gaussian error-propagation method as, for instance, presented by Genereux (1998). In this method the uncertainty depends on the difference in the composition of the event and pre-event water and the variability in the composition of the two water sources (i.e., the spatial and temporal variability in the event and pre-event water composition). Methods to handle the temporal variability in the event water composition are well established and frequently applied (Laudon et al., 2002; McDonnell et al., 1990), but there is often a lack of data on the spatial variability (e.g., Cayuela et al., 2019; Fischer et al., 2017). Information about the spatial variability of the pre-event water composition is rarely available, and thus, the uncertainty due to this variability is not well characterized (Penna & van Meerveld, 2019). Any spatial variability will result in temporal variability if the relative contributions of the water stored in different parts of the catchment change during an event. This is problematic because the total uncertainty in hydrograph separation results is most sensitive to the uncertainty in the component that contributes most to streamflow (i.e., pre-event water in most undisturbed headwater catchments in temperate climates; Genereux, 1998). Due to the lack of information, the variability in the pre-event water composition in the Gaussian error-propagation method is sometimes replaced by the analytical accuracy (e.g., Cayuela et al., 2019; Jefferson et al., 2015). This significantly underestimates the total uncertainty of the hydrograph separation results because the spatial variability is likely much larger than the analytical uncertainty and can even be the largest source of uncertainty (Uhlenbrook & Hoeg, 2003).
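To make the sensitivity to the pre-event term concrete, the following is a minimal sketch of first-order Gaussian error propagation for the two-component mixing equation. The partial derivatives are derived here from fpe = (Ct − Ce)/(Cpe − Ce) and are not copied from Genereux (1998). The stream, event, and pre-event compositions are hypothetical; 0.6‰ is the analytical precision for δ2H and 5.3‰ is the spatial standard deviation for Campaign I, both quoted elsewhere in this article.

```python
import math


def pre_event_fraction(c_stream, c_event, c_pre):
    """Two-component mixing: fraction of pre-event water in streamflow."""
    return (c_stream - c_event) / (c_pre - c_event)


def pre_event_fraction_uncertainty(c_stream, c_event, c_pre,
                                   w_stream, w_event, w_pre):
    """First-order propagated uncertainty of the pre-event water fraction."""
    d = c_pre - c_event
    term_pre = (c_event - c_stream) / d**2 * w_pre    # d(fpe)/d(Cpe) * W_Cpe
    term_event = (c_stream - c_pre) / d**2 * w_event  # d(fpe)/d(Ce)  * W_Ce
    term_stream = w_stream / d                        # d(fpe)/d(Ct)  * W_Ct
    return math.sqrt(term_pre**2 + term_event**2 + term_stream**2)


# Hypothetical d2H values (permil) for one stream sample
c_stream, c_event, c_pre = -70.0, -50.0, -75.0
fpe = pre_event_fraction(c_stream, c_event, c_pre)

# Pre-event uncertainty set to the analytical precision versus a spatial SD
u_analytical = pre_event_fraction_uncertainty(c_stream, c_event, c_pre,
                                              0.6, 0.6, 0.6)
u_spatial = pre_event_fraction_uncertainty(c_stream, c_event, c_pre,
                                           0.6, 0.6, 5.3)
print(f"fpe = {fpe:.2f}, +/-{u_analytical:.2f} (analytical), "
      f"+/-{u_spatial:.2f} (spatial)")
```

For these illustrative numbers, replacing the spatial variability by the analytical precision reduces the propagated uncertainty several-fold.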

An alternative way to assess the uncertainty of hydrograph separation results is to perform multiple hydrograph separation calculations in which the (pre-)event water composition varies over the observed or estimated range. McDonnell et al. (1991) used this approach and showed that a ±1‰ variability in δ2H of the pre-event water resulted in a ±10% variability in the calculated pre-event water fraction (fpe) for the Maimai M8 catchment in New Zealand (i.e., fpe ± 0.10). Similarly, Rodhe (1981) showed that a 0.5‰ variability in δ18O resulted in a ±15% variability in the pre-event water fraction (i.e., fpe ± 0.15) for the boreal Stormyra and Nåsten basins in northern Sweden. Measurements in other catchments indicate that the range in the isotopic composition of groundwater can be as large as the ranges assigned by Rodhe (1981) and McDonnell et al. (1991) but can also be much larger (and admittedly also smaller). For instance, Carey and Quinton (2005) reported a range of 0.7‰ to 0.8‰ δ18O for a sparsely vegetated catchment in Canada based on three campaigns in which they sampled seven groundwater wells. Klaus et al. (2015) reported a range of 1.8‰ δ18O for 14 wells in three small catchments in South Carolina, USA, that were sampled monthly for eight consecutive months. Kendall et al. (2001) concluded from prestorm and poststorm sampling of soil water and groundwater at the artificial Hydrohill catchment in China that the variability in δ18O across the catchment and in the soil profile was about 4‰. If the variability in the pre-event water composition is larger than the 0.5‰ δ18O assumed by Rodhe (1981) or the 1‰ δ2H assumed by McDonnell et al. (1991), then the uncertainty in the two-component hydrograph separation results is likely also larger than the reported 10% to 15%.
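The repeated-calculation approach can be sketched as follows. All compositions are hypothetical and are not taken from the cited studies; the point is only to show how a range in the pre-event water composition maps onto a range in the calculated pre-event water fraction.

```python
def pre_event_fraction(c_stream, c_event, c_pre):
    """Two-component mixing: fraction of pre-event water in streamflow."""
    return (c_stream - c_event) / (c_pre - c_event)


def fraction_range(c_stream, c_event, c_pre_values):
    """Smallest and largest pre-event fraction over candidate Cpe values."""
    fractions = [pre_event_fraction(c_stream, c_event, c_pre)
                 for c_pre in c_pre_values]
    return min(fractions), max(fractions)


# Hypothetical d2H values (permil): stream sample, rainfall, central Cpe
c_stream, c_event, c_pre_central = -70.0, -50.0, -75.0

# Repeat the separation for Cpe values spanning +/-1 permil and +/-4 permil
for half_range in (1.0, 4.0):
    candidates = [c_pre_central + half_range * step / 10
                  for step in range(-10, 11)]
    lowest, highest = fraction_range(c_stream, c_event, candidates)
    print(f"Cpe +/-{half_range:.0f} permil: "
          f"fpe between {lowest:.2f} and {highest:.2f}")
```

The spread widens quickly when the difference between the event and pre-event compositions is small relative to the assumed range.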

The aim of this study was to assess how the characterization of the pre-event water composition and the spatial variability in the isotopic composition of the groundwater affect two-component hydrograph separation results. We addressed the sensitivity of the hydrograph separation analysis to the samples that were chosen to characterize the pre-event water composition and how the number (and choice) of groundwater samples affects the calculated pre-event water fraction and its uncertainty. For this analysis, we used groundwater data from 38 wells in the 0.2 km² Studibach catchment in Switzerland and stream water and rainfall data for three rainfall events. More specifically, we addressed the following research questions:
  • How many wells do we need to sample to adequately represent the average isotopic composition of shallow groundwater and its variability in a small headwater catchment?
  • How different are hydrograph separation results when using samples from different (combinations of) wells or the pre-event streamflow sample to characterize the pre-event water composition?
  • How does the number of groundwater samples used to characterize the pre-event water composition influence the uncertainty of the hydrograph separation results?

2 Study Site Description

The data for this study were collected in the Studibach, a 0.2 km² headwater catchment of the Zwäckentobel catchment, located in the Alptal, Switzerland (47.038°N, 8.723°E). The catchment elevation ranges from 1,270 to 1,650 m above sea level (Figure 2). Mean annual precipitation is about 2,300 mm year⁻¹ and is relatively evenly distributed throughout the year. About one third of the precipitation falls as snow (Feyen et al., 1999). During the snow-free season (June–October), it rains on average every other day (van Meerveld et al., 2018).

Figure 2.
The Studibach with (a) the stream network (blue), 20-m elevation contour lines (gray), the catchment boundary (solid black line) and boundary of subcatchment C5 (dashed black lines) and the streamflow gauging stations (blue diamonds), groundwater wells (riparian groundwater in red circles, all other (i.e., non-riparian) wells in gray circles), and the location of the rain gauges (1–3, light gray squares). (b) δ2H of the shallow groundwater for the 5 October 2016 snapshot campaign (color ranging from black (more depleted) to white (more enriched)) projected on an aerial photo of the catchment (source: Federal Office of Topography (Swisstopo); aerial images no. 20000090712703). See supporting information S2 for maps of the shallow groundwater isotopic composition for the other snapshot campaigns.

Soil creep and landslides have created a complex topography of very steep slopes and flatter areas. The average slope is 35° (Rinderer et al., 2014). The steeper parts of the catchment (about half of the catchment area) are covered by open coniferous forest (Picea abies L. with an understory of Vaccinium sp.) (Hagedorn et al., 2000). The flatter areas (about a third of the area) are characterized by moorlands or wet grasslands; the remaining area is covered by alpine meadows. The upper part of the catchment is used for cattle grazing in the summer. Springs and streams emerge at the transition from convex to concave slopes (Molnar et al., 2010). The stream response to rainfall is flashy, and previous studies suggest that event water fractions are generally low but highly variable. The average event water fraction for 24 events in the neighboring Erlenbach catchment ranged between 0.04 and 0.75 (median: 0.21; von Freyberg et al., 2018). The maximum event water fraction for five subcatchments throughout the Zwäckentobel catchment ranged from 0.09 to 0.90 for 13 events (Fischer et al., 2016). Event water fractions are largest for large events with wet antecedent conditions (Fischer et al., 2016; von Freyberg et al., 2018).

The flysch bedrock consists of shale, calcareous slate, and sandstone banks and is assumed to be poorly permeable (Mohn et al., 2000). The bedrock is overlain by gleysols. The soils are wet throughout most of the year. For a large part of the catchment the groundwater is close to the soil surface (Rinderer et al., 2014).

The chemistry of the shallow groundwater is dominated by the carbonate-rich bedrock and usually has a calcium-bicarbonate signature, although some sites have high magnesium and sulfate concentrations. Based on the chemical and isotopic composition, the shallow groundwater can be divided into four types, of which three represent hydrogeomorphic units (Kiewiet et al., 2019):
  1. Riparian zone, near-stream areas, and other flat areas with riparian-like characteristics (referred to as “riparian” in the remainder of the text and figures) that are characterized by above average concentrations of manganese, iron, and cobalt.
  2. Hillslopes and steeper areas that are characterized by above average concentrations of copper, zinc, chromium, and nickel.
  3. Areas with discharge of “deep” groundwater that are characterized by higher concentrations of strontium and molybdenum.

The fourth water type was characterized by high magnesium and sulfate concentrations and likely represents flow through a specific part of the flysch. This water type was found in six wells (15% of all wells), of which five were located within 100 m of each other in a 1,000 m² subcatchment (Kiewiet et al., 2019).

The isotopic composition of the shallow groundwater in the Studibach is affected by seasonal changes in the precipitation composition and is most depleted directly after snowmelt and most enriched in late summer. Although the difference in the δ2H values was not statistically significant, riparian and hillslope groundwater (i.e., Types I and II) resembled the recent precipitation (i.e., these sites had a more enriched isotopic composition during the spring to fall sampling period), whereas groundwater Type III had a more depleted isotopic composition (Kiewiet et al., 2019).

There was no significant relation between the shallow groundwater isotopic composition and topography (e.g., slope, Topographic Wetness Index, distance to stream, well depth, and elevation) or hydrodynamic situation (e.g., flushing frequency and persistency of the groundwater table) (Kiewiet et al., 2019; Figure 2). The spatial variability in the groundwater isotopic composition is smallest directly after snowmelt and in late fall (Kiewiet et al., 2019). During this period, it is also most similar to baseflow (Kiewiet et al., 2019). This reflects the seasonal changes in the connectivity based on groundwater level observations between 2010 and 2014. Approximately 27% of the catchment area was continuously connected to the stream in March to June 2013, whereas only 9% of the catchment area was continuously connected in July 2013 (Rinderer et al., 2019).

3 Methods

3.1 Field Measurements

The main analyses in this study are based on stream water samples for three events (A–C) and groundwater sampled on two different dates (Table 1). We sampled one additional rainfall event for which hydrograph separation was not possible, but we use this event (D) to exemplify how the composition of baseflow and groundwater may differ.

Table 1. Characteristics of the Events: Total Rainfall (mm), Average and Maximum 10-min Intensity (mm h⁻¹), Duration of the Event (hr), Number of Streamflow Peaks During the Sampling Period, and Minimum (Qmin) and Maximum Specific Discharge (Qmax) During the Event (both in mm h⁻¹)

| Characteristic | Unit | Event A | Event B | Event C |
|---|---|---|---|---|
| Date event | | 3 October 2016 | 3 October 2017 | 5 October 2017 |
| Date snapshot campaign | | 5 October 2016 (I) | 12 October 2017 (II) | 12 October 2017 (II) |
| Rainfall: total rainfall | mm | 15^a | 27^a,b | 29^a,b |
| Rainfall: average intensity | mm h⁻¹ | 1.0 | 3.8 | 1.9 |
| Rainfall: maximum intensity | mm h⁻¹ | 7.2 | 20.4 | 8.4 |
| Rainfall: duration | hr | 15 | 8.5 | 13 |
| Rainfall: number of samples | | 10 | 8 | 11 |
| Rainfall: total rainfall between event and snapshot campaign | mm | 0.2 | 8.8 | 42 |
| Streamflow: number of peaks during sampling period | | 3 | 1 | 1 |
| Streamflow: Qmin | mm h⁻¹ | 0.2^c | 0.8 | 0.8 |
| Streamflow: Qmax | mm h⁻¹ | 0.7^c | 4.3 | 3.0 |
| Streamflow: Qmax/Qmin | | 3.5 | 5.4 | 3.8 |
| Streamflow: number of streamflow samples | | 24 | 22 | 23 |

  • a Average values based on measurements at RG1 and RG2
  • b Average values based on measurements at RG1 and RG3
  • c Specific discharge for the catchment is estimated based on the streamflow measured at subcatchment C5 because of missing data for the catchment outlet

3.1.1 Hydrometric Measurements

Rainfall was monitored at three locations with tipping-bucket rain gauges (0.2-mm resolution, Davis, Odyssey Dataflow Systems Pty Limited, New Zealand) (Figure 2a). Stream stage was monitored every 5 min at the (sub)catchment outlets (Figure 2a) with a pressure transducer (DCX-22 CTD, Keller AG für Druckmesstechnik, Switzerland). The pressure data were corrected for changes in barometric pressure using the temperature- and elevation-adjusted barometric pressure data from the MeteoSchweiz meteorological station in Einsiedeln (910 m above sea level; ~10 km from the catchment). The stream stage data were converted to streamflow using a stage-discharge relationship based on 16 (C5, V-notch weir) to 20 (catchment outlet, stream stage measured directly in the river) salt dilution measurements. Due to technical issues, we do not have a complete time series of stage at the catchment outlet for Event A. We therefore used the streamflow time series at C5 (Figure 2a) to estimate the streamflow at the outlet. More specifically, we compared the streamflow for the four months directly following Event A for both sites and used this relation to estimate the flow at the outlet for the 15-day data gap. The coefficient of determination (r²) between specific discharge at the catchment outlet and at the C5-subcatchment outlet was 0.66; the root-mean-square error was 0.75 mm h⁻¹ (the 10th and 90th percentiles of specific discharge at the catchment outlet from November to March are 0.35 and 2.11 mm h⁻¹, respectively). Although the streamflow magnitude during Event A is thus uncertain, the average pre-event water fraction would be minimally affected because the same offset would be used for the entire event.
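The gap-filling step can be illustrated with a small sketch. The text does not state the functional form of the relation between the two gauging sites, so the ordinary least-squares line used here is an assumption, and the discharge values are invented for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def goodness_of_fit(x, y, slope, intercept):
    """Coefficient of determination and root-mean-square error of the fit."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


# Hypothetical specific discharge (mm/h) for the period with data at both sites
q_c5 = [0.2, 0.5, 1.0, 1.5, 2.0]
q_outlet = [0.3, 0.6, 0.9, 1.7, 2.1]

slope, intercept = fit_line(q_c5, q_outlet)
r2, rmse = goodness_of_fit(q_c5, q_outlet, slope, intercept)

# Estimate outlet discharge for the gap from the C5 record (hypothetical values)
q_c5_gap = [0.2, 0.4, 0.7]
q_outlet_estimated = [slope * q + intercept for q in q_c5_gap]
print(f"r2 = {r2:.2f}, RMSE = {rmse:.2f} mm/h, estimates = {q_outlet_estimated}")
```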

3.1.2 Stream Water, Rainfall, and Groundwater Sampling

We used automatic samplers (full-size portable sampler, 6712, ISCO Teledyne, USA) to sample streamwater at the catchment outlet. We manually turned the samplers on when the predicted time of the onset of precipitation was during the day and used a timer if the event was predicted to start during the night. We used a multi-interval program and adjusted the sampling interval to the predicted event duration. The first six stream water samples were taken every 10 (only the shortest event: A) to 20 min; the remaining 18 samples were taken at an hourly interval.

We used sequential rainfall samplers (built after Kennedy et al. (1979), but see Fischer et al. (2019) for a detailed description) to sample the rainfall at rain gauge locations RG1 and RG3 (Figure 2a) and additionally sampled at location RG3 during Event A. The samplers function mechanically, so we calculated the time of sampling from the rainfall time series. Each sampler had 100-ml glass bottles and a collection area of 214 cm², which resulted in one sample for approximately every 5 mm of rainfall. The maximum number of bottles that could be filled was 12. We emptied the rainfall and stream water samplers within 24 hr after the event to avoid isotope fractionation. The samples were collected in polyethylene bottles (50 ml) and stored in the fridge (6°C) until they were transferred, within a few days, to 20-ml glass vials with a membrane screw cap.
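The rainfall depth per bottle follows directly from the bottle volume and the collection area. A short check of that arithmetic, using only the values stated above (variable names are illustrative):

```python
BOTTLE_VOLUME_ML = 100.0      # bottle volume from the text (1 ml = 1 cm3)
COLLECTION_AREA_CM2 = 214.0   # collection area from the text
MAX_BOTTLES = 12              # maximum number of bottles from the text


def depth_per_bottle_mm(volume_ml, area_cm2):
    """Rainfall depth (mm) needed to fill one bottle."""
    depth_cm = volume_ml / area_cm2
    return depth_cm * 10.0


depth_mm = depth_per_bottle_mm(BOTTLE_VOLUME_ML, COLLECTION_AREA_CM2)
max_sampled_rainfall_mm = depth_mm * MAX_BOTTLES
print(f"{depth_mm:.1f} mm per bottle, "
      f"{max_sampled_rainfall_mm:.0f} mm for {MAX_BOTTLES} bottles")
```

One bottle thus corresponds to roughly 4.7 mm of rainfall, consistent with the approximate 5 mm stated above, and twelve bottles cover about 56 mm.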

We used the groundwater samples collected during two baseflow snapshot sampling campaigns (I and II) for the main analysis and a third sampling campaign (III) for the comparison with Event D. The snapshot campaigns are described in detail by Kiewiet et al. (2019). In short, the wells in the catchment were installed in 2009–2010 (Rinderer et al., 2014). They were hand-augered to the bedrock, screened over their entire length except for the top 10 cm, and sealed with a bentonite layer. The locations of the wells were based on the distribution of the Topographic Wetness Index (Beven & Kirkby, 1979), so that the wells cover the range of wet and dry locations in each subcatchment (Rinderer et al., 2014). More specifically, 8 wells are located at ridge sites, 22 at midslope sites, and 21 at footslope locations; 20 of the wells are located in forested areas and 31 in non-forested areas. Well depths range from 0.5 m at ridge sites to 2.5 m at footslope locations. All wells were purged the day before the sampling campaign by pumping them dry or extracting at least twice the well volume. During the snapshot sampling campaigns, groundwater samples were taken from all wells that contained water (n = 34 to 38). After sample collection, the groundwater samples were treated and stored in the same way as the stream water and rainwater samples. The electrical conductivity (EC) of each sample was measured in the field with a handheld device (Multi 3420, WTW GmbH, Germany), except for the stream water samples of Event A, which were measured only later in the laboratory due to an equipment malfunction.

All stream water, rainwater, and groundwater samples were analyzed for stable water isotopes with a cavity ring-down spectrometer (CRDS; L2140-I or L2130-I, Picarro, Inc., USA) at the Chairs of Hydrology, University of Freiburg, Germany. The reported precision is ±0.16‰ for δ18O and ±0.6‰ for δ2H. We calculated the line-conditioned excess (LC-excess; Landwehr & Coplen, 2006), which describes the deviation of a sample from the local meteoric water line. None of the samples deviated significantly from the local meteoric water line: the median LC-excess was −2.0‰ and −2.6‰ δ2H for stream water and groundwater samples, respectively. Using δ18O or δ2H as a tracer yielded similar hydrograph separation results. Because the ratio of precision to range (range δ18O: 13.98‰ and range δ2H: 121.9‰) was more favorable for deuterium, we only show the results for δ2H here.
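Both the tracer choice and the LC-excess can be checked with a few lines. The precision and range values are those given above. The LC-excess is written in its usual form, δ2H − a·δ18O − b, with a and b the slope and intercept of the local meteoric water line; the slope and intercept used below (8 and 10, the global meteoric water line) are placeholders, because the local line for this catchment is not given in this section.

```python
def precision_to_range(precision, value_range):
    """Analytical precision as a fraction of the observed range."""
    return precision / value_range


def lc_excess(d2h, d18o, slope, intercept):
    """Deviation of a sample (permil d2H) from a meteoric water line."""
    return d2h - slope * d18o - intercept


# Values from the text
ratio_d18o = precision_to_range(0.16, 13.98)
ratio_d2h = precision_to_range(0.6, 121.9)
print(f"precision/range: d18O {ratio_d18o:.4f}, d2H {ratio_d2h:.4f}")

# Placeholder meteoric water line and a hypothetical sample
PLACEHOLDER_SLOPE, PLACEHOLDER_INTERCEPT = 8.0, 10.0
sample_lc_excess = lc_excess(-72.0, -10.0,
                             PLACEHOLDER_SLOPE, PLACEHOLDER_INTERCEPT)
print(f"LC-excess of hypothetical sample: {sample_lc_excess:.1f} permil")
```

The ratio is about 0.011 for δ18O and about 0.005 for δ2H, which is why deuterium is the more favorable tracer here.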

3.2 Data Analyses

3.2.1 Number of Samples Required to Characterize the Isotopic Composition of the Groundwater

We estimated how the number of sampled wells affects the estimates of the average and standard deviation of the isotopic composition of the groundwater in the catchment. We randomly selected a number of wells (without replacement) and calculated the average and standard deviation of δ2H and EC for the selected samples. The selections were based on a constrained random approach (i.e., the values were randomly selected samples from our wells, but the locations of the wells were chosen based on their topographic characteristics; see section 3.1.2). We did this for all possible numbers of wells (1 to 38) and for all possible numbers of riparian wells (1 to 11). We found that the standard deviation of the average pre-event water composition for 1,000 realizations (i.e., 1,000 random selections of wells for each set of number of wells) differed by less than 0.01‰ and therefore chose to limit our analysis to 1,000 realizations rather than computing all possible combinations (e.g., 3.5 × 10¹⁰ combinations are possible when sampling 19 out of 38 wells; unordered random sampling n!/(k!(n − k)!), where n is the total number of observations and k the number of selected observations).
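A minimal sketch of this resampling idea is given below. It draws wells by simple random sampling without replacement, which is a simplification of the constrained approach described above. The 38 well values are synthetic (drawn from a normal distribution with a hypothetical mean of −75‰ and the 5.3‰ standard deviation reported for Campaign I), and the function name is illustrative.

```python
import math
import random
import statistics


def subsample_statistics(values, k, n_realizations=1000, seed=0):
    """Mean and standard deviation of k values drawn without replacement,
    repeated n_realizations times."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_realizations):
        picked = rng.sample(values, k)
        means.append(statistics.fmean(picked))
        if k > 1:  # a standard deviation needs at least two values
            sds.append(statistics.stdev(picked))
    return means, sds


# Synthetic d2H values (permil) for 38 wells; not measured data
data_rng = random.Random(42)
well_d2h = [data_rng.gauss(-75.0, 5.3) for _ in range(38)]

spread_of_mean = {}
for k in (3, 10, 30):
    means, _ = subsample_statistics(well_d2h, k)
    spread_of_mean[k] = statistics.stdev(means)
    print(f"k = {k:2d}: SD of the estimated mean = {spread_of_mean[k]:.2f} permil")

# Number of unordered ways to choose 19 out of 38 wells
n_combinations = math.comb(38, 19)
print(f"{n_combinations:.2e} possible combinations")
```

The spread of the estimated mean shrinks as more wells are included, and the combination count shows why enumerating every subset is impractical.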

We also used the basic confidence interval equation based on the normal distribution (equation 1) and rearranged it to solve for the sample size (n, equation 2) to estimate the number of wells that need to be sampled to obtain an estimate of the average isotopic composition with a 95% confidence interval. This calculation must be viewed as a rough estimation because it assumes that the samples are independent, which can be debated for shallow groundwater, and because the groundwater isotope and EC measurements were only approximately normally distributed (the Shapiro-Wilk test suggested that the data were normally distributed, p values were 0.35 and 0.41 for δ2H and 0.61 and 0.55 for EC for Campaigns I and II, respectively).
\[ E = t_{\alpha/2} \, \frac{\sigma}{\sqrt{n}} \tag{1} \]
\[ n = \left( \frac{t_{\alpha/2} \, \sigma}{E} \right)^{2} \tag{2} \]
where n is the required number of wells, σ is the standard deviation, E is the error margin (here we use a value of 1.0‰, 1.2‰, and 2.5‰), and tα/2 is the critical value associated with a specific confidence level. tα/2 approaches 1.960 for a 95% confidence interval for sample sizes larger than 100. We used tα/2 values from the t distribution (Pearson & Hartley, 1954) when we applied this equation to small sample sizes (n < 30) (e.g., when selecting three samples, two degrees of freedom, tα/2 = 4.303).
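As a rough illustration of the sample-size calculation, the sketch below evaluates n = (t·σ/E)², the standard rearrangement of the confidence-interval half-width E = t·σ/√n, for the three error margins mentioned above. It uses the large-sample critical value of about 1.960 and, as inputs, the spatial standard deviations of δ2H reported in this article for Campaigns I and II (5.3‰ and 3.9‰). The results are illustrative and are not the values reported by the authors; for small n the critical value from the t distribution would be larger and would increase the required number of wells.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, error_margin, critical_value):
    """Smallest integer n with critical_value * sigma / sqrt(n) <= error_margin."""
    return math.ceil((critical_value * sigma / error_margin) ** 2)


# Large-sample critical value for a 95% confidence interval (about 1.960)
Z_95 = NormalDist().inv_cdf(0.975)

for campaign, sigma in (("I", 5.3), ("II", 3.9)):
    for margin in (1.0, 1.2, 2.5):
        n = required_sample_size(sigma, margin, Z_95)
        print(f"Campaign {campaign}: sigma = {sigma} permil, "
              f"E = {margin} permil -> n = {n}")
```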

3.2.2 Hydrograph Separation

3.2.2.1 General Approach

For the three events (A–C), we applied a two-component hydrograph separation to determine the pre-event water fraction:
\[ f_{pe} = \frac{C_t - C_e}{C_{pe} - C_e} \tag{3} \]
where fpe is the fraction of pre-event water, Ct the isotopic composition of stream water, and Ce and Cpe the isotopic composition of event water and pre-event water, respectively. For the pre-event water composition (Cpe), we used the pre-event baseflow sample, the baseflow sample taken during the snapshot sampling campaign, or different groundwater samples taken during the snapshot sampling campaign. We did not use any soil water samples, nor did we perform three-component hydrograph separations, although soil water may also contribute to streamflow. We used the incremental weighted mean of the rainwater samples (McDonnell et al., 1990) to characterize the event water composition (Ce). We did not consider any spatial variability in the event water isotopic composition. Spatial sampling of rainfall in the Zwäckentobel suggested that the event water composition does not vary significantly with elevation (Fischer et al., 2017), and our data from the two rain gauges did not suggest a significant relation either. However, the spatial variability in the event water composition can be large due to the complex topography, surrounding mountains, and forest cover, and the two rain gauges might not have been enough to capture this variability.
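The two steps described here can be sketched as follows. The incremental weighted mean is implemented as a running rainfall-weighted mean of all rain samples collected up to a given time, which is our reading of the method of McDonnell et al. (1990) and should be checked against that paper. The separation itself is the standard two-component mixing equation. All sample values are hypothetical.

```python
def incremental_weighted_mean(rain_amounts, rain_deltas):
    """Running rainfall-weighted mean composition after each rain sample."""
    means = []
    cumulative_amount = 0.0
    cumulative_product = 0.0
    for amount, delta in zip(rain_amounts, rain_deltas):
        cumulative_amount += amount
        cumulative_product += amount * delta
        means.append(cumulative_product / cumulative_amount)
    return means


def pre_event_fraction(c_stream, c_event, c_pre):
    """Two-component mixing: fraction of pre-event water in streamflow."""
    return (c_stream - c_event) / (c_pre - c_event)


# Hypothetical sequential rain samples: depth (mm) and d2H (permil)
rain_mm = [5.0, 5.0, 5.0]
rain_d2h = [-40.0, -50.0, -60.0]
event_composition = incremental_weighted_mean(rain_mm, rain_d2h)

# Hypothetical stream sample taken after the second rain sample
c_stream = -70.0
c_pre = -75.0  # e.g., a baseflow sample or a mean of groundwater samples
fpe = pre_event_fraction(c_stream, event_composition[1], c_pre)
print(f"Ce over time: {event_composition}, fpe = {fpe:.2f}")
```

Swapping `c_pre` for a different well, or for the mean of a subset of wells, is all that is needed to compare the alternative characterizations of the pre-event water.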

The two snapshot sampling campaigns used in the analyses are a subset of the nine campaigns during the snow-free seasons of 2016 and 2017 of Kiewiet et al. (2019). Ideally, the snapshot groundwater sampling campaigns would have taken place right before the sampled rainfall events. The original goal of the snapshot sampling campaigns was to determine the spatial variability in shallow groundwater composition across the catchment. The timing of these campaigns was therefore not aligned with the event sampling (which would have been logistically very challenging due to the time required to purge and sample all groundwater wells and the low predictability of the moderately sized rainfall events in this mountainous terrain). However, two of the snapshot sampling campaigns took place shortly (2–9 days) after the sampled rainfall events (Table 1). Daily precipitation measurements and samples from the neighboring Erlenbach catchment (precipitation gauge and sampler ~500 m from the Studibach outlet; Rücker et al., 2019) suggest that there was no precipitation between the sampled events and the snapshot sampling campaigns (Event A) or there were only small, low-intensity events with an isotopic composition similar to the rainfall before the sampled event (Events B and C; see supporting information S3). We assume that these small- to medium-sized events did not significantly change the isotopic composition of the shallow groundwater because their isotopic composition was typical for the season. Furthermore, we assume that these events did not affect the spatial variability in the isotopic composition of the shallow groundwater. The observed spatial variability in the groundwater composition was large for all of the nine sampling campaigns (median standard deviation of δ2H for all nine snapshot campaigns: 3.8‰; range: 2.3‰ to 9.3‰; Campaign I: 5.3‰; Campaign II: 3.9‰). Thus, although the calculated pre-event water fractions might be somewhat inaccurate because the samples taken during the snapshot sampling campaigns do not perfectly represent the pre-event groundwater prior to the sampled events, we expect that this difference is small and has a negligible effect on the sensitivity and uncertainty analyses (described below).

3.2.2.2 Sensitivity to the Characterization of the Pre-Event Water Composition

To determine the sensitivity of the hydrograph separation results to the characterization of the pre-event water composition (Cpe), we used different (combinations of) samples (Table 2). We used the pre-event baseflow sample (BFpe) and baseflow sampled during the snapshot campaign (BFss), the average isotopic composition for all groundwater wells (GWavg), the average isotopic composition for all riparian wells (RPavg), the composition for individual wells (GW) or individual riparian wells (RP), and subsets of three, six, or nine randomly selected groundwater wells (GWn) or riparian wells (RPn). Note that calculations for all groundwater wells (GWavg or GWn) also include all riparian groundwater wells. When using the isotopic composition derived from samples taken at individual wells, we repeated the hydrograph separation 11 (RP) or 38 (GW) times, that is, once for each individual well. When selecting subsets of three, six, or nine wells, we repeated the hydrograph separation calculation 1,000 times using the average composition of the randomly selected wells. For 50 realizations the standard deviation in the calculated pre-event water fraction was less than 0.1% (i.e., difference in standard deviation fpe < 0.001). This implies that the number of 1,000 repetitions was sufficient.
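The repeated random selection of wells can be sketched as follows. This is a minimal illustration under stated assumptions: the well compositions are hypothetical values (not the measured data), wells are drawn without replacement, and the function name `subset_fpe` and the fixed seed are choices made only for this example.

```python
import random
import statistics


def subset_fpe(well_delta, n_wells, c_t, c_e, reps=1000, seed=0):
    """Pre-event water fractions for repeated random subsets of wells.

    Each repetition draws n_wells wells without replacement, uses their
    average composition as Cpe, and applies the two-component mixing equation.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(reps):
        c_pe = statistics.fmean(rng.sample(well_delta, n_wells))
        fractions.append((c_t - c_e) / (c_pe - c_e))
    return fractions


# Hypothetical d2H values (permil) for 12 wells
wells = [-86.3, -80.1, -77.4, -75.9, -74.2, -73.0,
         -72.5, -71.8, -70.6, -69.9, -68.7, -67.8]

for n in (3, 6, 9):
    f = subset_fpe(wells, n, c_t=-67.0, c_e=-37.0)
    print(n, round(statistics.median(f), 2), round(statistics.stdev(f), 3))
```

The spread of the calculated fractions shrinks as more wells are averaged, which is the effect the subsets of three, six, and nine wells are meant to expose.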

Table 2. Abbreviations, Description of the Sample(s) Used for each Pre-event Water Characterization, and Reason for Including the Method to Characterize the Pre-event Water Isotopic Composition (Cpe)
Abbreviation Sample(s) used for the pre-event water characterization Reason for selecting this (set of) sample(s)
BFpe and BFss A baseflow sample taken before the event (BFpe) or during the snapshot sampling campaign (BFss). Baseflow is often assumed to represent the average composition of the water stored in the catchment.
RPavg The average isotopic composition of all riparian wells sampled during the snapshot campaign. In many catchments wells are mainly located in the riparian zone or near the stream. Furthermore, this groundwater most likely contributes to baseflow.
GWavg The average isotopic composition of all wells sampled during the snapshot campaign. These samples are assumed to represent the average composition of shallow groundwater in the catchment.
RP or GW The composition of a sample from an individual (riparian) well. This represents a situation where only one well is used to characterize the composition of the groundwater.
RPn or GWn The average composition of three, six or nine randomly selected wells (GWn) or three, six, or nine randomly selected riparian wells (RPn). This represents a situation where sampling is limited to three, six, or nine wells throughout the catchment or the riparian zone to represent the average groundwater composition.

We report the median, the minimum, and the 10th to 90th percentile range of the event-averaged pre-event water fraction for each pre-event water characterization. To determine the significance of the difference in the median event-averaged pre-event water contribution to streamflow for the different characterization methods, we performed pairwise comparisons for all combinations of characterizations (Tukey-Kramer test; Sheskin, 2003; Tukey.HSD in the “agricolae” R-package). We used a 95% confidence level for all statistical analyses.

For about 10% of the groundwater samples (different wells for each event), hydrograph separation led to physically impossible pre-event water fractions (i.e., smaller than 0 or larger than 1). We allowed a 2.5% error margin and considered all fpe values outside this range (i.e., fpe < −0.025 or fpe > 1.025) as impossible. We excluded these values from the analysis for calculations of the minimum and event-averaged fpe. For the analyses in which we repeated the hydrograph separations 1,000 times, we set physically impossible pre-event water fractions to our lower limit or upper limit (−0.025 or 1.025) to reduce the bias induced by the exclusion of these results.
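The two ways of handling physically impossible fractions can be sketched as follows. This is a minimal illustration; the function names and the example values are hypothetical, and only the limits of −0.025 and 1.025 come from the text.

```python
LOWER_LIMIT = -0.025
UPPER_LIMIT = 1.025


def is_plausible(f_pe):
    """True if the fraction lies within the accepted 2.5% error margin."""
    return LOWER_LIMIT <= f_pe <= UPPER_LIMIT


def drop_impossible(fractions):
    """Exclude impossible values (used for the minimum and event-averaged fpe)."""
    return [f for f in fractions if is_plausible(f)]


def clip_to_limits(fractions):
    """Set impossible values to the nearest limit (used for the 1,000 repetitions)."""
    return [min(max(f, LOWER_LIMIT), UPPER_LIMIT) for f in fractions]


example = [-0.1, 0.5, 1.01, 1.2]  # hypothetical fractions
print(drop_impossible(example))
print(clip_to_limits(example))
```

Clipping keeps the number of realizations constant, whereas exclusion removes the extreme values from the sample altogether.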

3.2.2.3 Uncertainty Estimation

We estimated the uncertainty of the calculated pre-event water fractions (Wfpe) using the Gaussian error-propagation method suggested by Genereux (1998):
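$$W_{f_{pe}} = \sqrt{\left[\frac{C_e - C_t}{(C_{pe} - C_e)^2} W_{C_{pe}}\right]^2 + \left[\frac{C_t - C_{pe}}{(C_{pe} - C_e)^2} W_{C_e}\right]^2 + \left[\frac{1}{C_{pe} - C_e} W_{C_t}\right]^2} \qquad (4)$$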
where Cpe, Ce, and Ct are the isotopic composition of the pre-event water, event water, and stream water and WCpe, WCe, and WCt are the uncertainties for the pre-event water, event water, and stream water composition, respectively. We used the standard deviation of the isotopic composition of the rain samples taken during the event for WCe and the laboratory precision for WCt. We used the standard deviation of δ2H for the groundwater samples that were used in the hydrograph separation calculation multiplied by the appropriate t value (based on the number of samples; Pearson & Hartley, 1954) for WCpe. If only one sample was used to determine Cpe (i.e., for BFpe, BFss, RP, or GW), we used the laboratory precision for WCpe because no information on the standard deviation was available.
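The error propagation can be sketched as follows. This is a minimal illustration that assumes first-order Gaussian error propagation applied to the two-component mixing equation fpe = (Ct − Ce) / (Cpe − Ce), so that each term is the partial derivative of fpe with respect to one composition multiplied by its uncertainty. The function name and the input values are hypothetical.

```python
import math


def fpe_uncertainty(c_pe, c_e, c_t, w_pe, w_e, w_t):
    """Uncertainty of the pre-event water fraction (Gaussian error propagation).

    c_pe, c_e, c_t : composition of pre-event water, event water, stream water
    w_pe, w_e, w_t : uncertainty of each composition, in the same units
    """
    diff = c_pe - c_e
    if diff == 0:
        raise ValueError("event and pre-event water must differ in composition")
    term_pe = (c_e - c_t) / diff ** 2 * w_pe   # sensitivity to Cpe
    term_e = (c_t - c_pe) / diff ** 2 * w_e    # sensitivity to Ce
    term_t = w_t / diff                        # sensitivity to Ct
    return math.sqrt(term_pe ** 2 + term_e ** 2 + term_t ** 2)


# Hypothetical values (permil): the uncertainty of the event water dominates here
w_fpe = fpe_uncertainty(c_pe=-70.0, c_e=-40.0, c_t=-64.0,
                        w_pe=0.6, w_e=6.0, w_t=0.6)
print(round(w_fpe, 3))
```

Because every term is divided by the difference between the pre-event and event water composition, the uncertainty grows quickly when the two end members are isotopically similar.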

4 Results

4.1 Description of the Events

4.1.1 Precipitation and Streamflow

The total rainfall for Events A, B, and C was 15, 27, and 29 mm, which occurred in 15, 8.5 and 13 hr, respectively (Table 1 and Figure 3). The average and maximum 10-min rainfall intensities were 1.0, 3.8 and 1.9 mm h−1 and 7.8, 20.4, and 8.4 mm h−1, respectively. The catchment response to rainfall was quick; streamflow increased 3.5 (Event A) to 5.4 times (Event B) during the events (Table 1). Event A caused multiple streamflow peaks, whereas Events B and C resulted in only one peak (Table 1 and Figure 3).

Time series of 10-min precipitation (bar graph, mm h−1) and specific discharge at the catchment outlet (black lines, mm h−1), isotopic composition (δ2H, ‰) of streamflow (SF, orange triangles) and rainwater sampled at the lower rain gauge (RG1, lower P, yellow circles) and upper rain gauge (RG3, upper P, red triangles), and the incremental weighted mean isotopic composition of the rainwater (IWM, gray squares connected by a dashed line) for Events A (left), B (middle), and C (right). The baseflow sample for Event B was taken one day prior to the event but is projected at 04:00 for better visualization. Please note that the axes differ for the different events. See Table 1 for the dates of the events and event characteristics.

4.1.2 Isotopic Composition of Rainwater

The intra-event variability in the isotopic composition of rainfall was large. The standard deviation of the δ2H of rainwater varied from 5.2‰ for Event A (n = 10) to 12.6‰ for Event B (n = 8). For Event A, the rainwater became isotopically more enriched throughout the event, whereas during Events B and C, it became more depleted (Figure 3). During Event B, the rainwater shifted from a composition that was more enriched than stream water to a composition that was more depleted than stream water. However, the incremental weighted mean of rainwater remained 10‰ more enriched than the stream water, so that hydrograph separation was still possible for this event.

4.1.3 Isotopic Composition of Stream Water

The isotopic composition of the stream water changed toward the composition of the precipitation during all events (Figure 3). The stream water isotopic composition changed as soon as the water level rose, but the magnitude of the response depended on the amount of rain and the difference in the isotopic composition of the rainwater and pre-event baseflow. Pre-event baseflow and the incremental weighted mean of rainwater were isotopically most similar for Event B (difference in δ2H: 10.7‰ to 18.1‰) and most different for Event A (difference in δ2H: 27.0‰ to 31.9‰). The stream water isotopic composition changed most during Event A (from −70.5‰ to −65.7‰) and least during Event C (from −69.1‰ to −65.2‰; Table 3 and Figure 4). The relation between the stream water isotopic composition and specific discharge varied from being rather linear (Event A) to more hysteretic (Events B and C) (Figure 4). For Event C, the isotopic composition changed markedly just after peakflow (from −65.2‰ to −67.3‰).

Table 3. Average δ2H ± Standard Deviation (‰) for Event Water (Ce), Pre-event Water (Cpe) Based on the Sample Taken Before the Start of the Event (BFpe), the Streamflow Sample Taken During the Snapshot Sampling Campaign (BFss), the Average of All Samples From Riparian Wells (RPavg), and the Average of All Groundwater Samples (GWavg) and Minimum and Maximum δ2H for Streamwater (Ct-min,Ct-max) for Events A–C
Event A B C
Date stormflow event 3 October 2016 3 October 2017 5 October 2017
Date groundwater campaign (number) 5 October 2016 (I) 12 October 2017 (II) 12 October 2017 (II)
Baseflow pre-event BFpe Cpe −70.5 −73.8 −72.5
Baseflow snapshot campaign BFss Cpe −71.0 −72.7 −72.7
Riparian groundwater RPavg Cpe −70.2 ± 4.3 (11) −71.8 ± 3.6 (11) −71.8 ± 3.6 (11)
All groundwater GWavg Cpe −73.0 ± 5.3 (38) −73.2 ± 3.9 (38) −73.2 ± 3.9 (38)
Rainwater (average) Ce −37.0 ± 5.2 (10) −64.6 ± 12.6 (8) −49.2 ± 6.5 (11)
Streamwater (minimum) Ct-min −70.5 −73.7 −69.1
Streamwater (maximum) Ct-max −65.7 −69.1 −65.2
Note. The sample size is given in parentheses (n) for sample sizes larger than 1.
Relation between specific discharge (mm h−1) and stream water isotopic composition (δ2H, ‰) for Events A–C. The color of the symbols changes from light (first sample: square) to dark (last sample: triangle). The hysteresis index class (cf. Zuecco et al., 2016) was I for Events B and C, indicating a clockwise loop and an increase from the initial concentration, and IV for Event A, indicating a counterclockwise direction and an increase from the initial concentration. Note that the axes differ for the events.

4.1.4 Spatial Variability in the Isotopic Composition of Groundwater

The spatial variability in the isotopic composition of the shallow groundwater was large: δ2H varied from −86.3‰ to −67.8‰ and −80.9‰ to −57.2‰ for snapshot Campaigns I (5 October 2016) and II (12 October 2017), respectively (Table 3). Riparian groundwater (δ2H mean ± sd: −70.2 ± 4.3‰ and −71.8 ± 3.6‰ for Campaigns I and II, respectively) was slightly more enriched than the catchment average groundwater (i.e., average of all sampled groundwater wells: −73.0 ± 5.3‰ and −73.2 ± 3.9‰) for both snapshot campaigns (Figure 5). This difference was larger than twice the laboratory precision (0.6‰ δ2H) but not statistically significant. Baseflow at the catchment outlet during the snapshot campaigns (−71.0‰ and −72.7‰ δ2H, Table 3 and Figure 5) differed by 0.3‰ and 2.2‰ δ2H from the average composition of all groundwater wells and by 0.8‰ and 2.5‰ from the average composition of all riparian wells for Campaigns I and II, respectively. Pre-event baseflow differed by less than 1‰ δ2H from baseflow sampled during the snapshot campaigns.

The 5th to 95th percentile of the average (left column) and standard deviation (right column) of the isotopic composition (δ2H) as a function of the number of randomly selected groundwater samples (n = 1 to 38 for all groundwater wells (GW, gray) and n = 1 to 11 for the riparian wells (RP, red), 1,000 repetitions) based on samples taken during snapshot Campaigns I (upper panels) and II (lower panels). The horizontal lines indicate the isotopic composition of baseflow at the outlet during the snapshot campaign (BFss, dashed lines) and prior to the event (BFpe, solid line). Note that all groundwater wells (GW) also includes the riparian wells.

4.2 Number of Wells Required to Characterize the Isotopic Composition of the Groundwater

The range in the calculated average isotopic composition of the groundwater decreased with an increasing number of samples (Figure 5). The 5th and 95th percentiles of the average isotopic composition of the groundwater for six randomly selected groundwater samples were −75.7‰ and −69.4‰ for Campaign I and −75.4‰ and −70.8‰ for Campaign II (Figure 5). For nine randomly selected samples, they were −75.2‰ and −70.2‰ for Campaign I and −74.9‰ and −71.3‰ for Campaign II. The difference between the 5th and 95th percentiles of the calculated average groundwater composition was less than 2.5‰ (i.e., half of the average change in the isotopic composition of stream water during the three studied events) as soon as more than 21 and 16 randomly selected samples were used to determine the average composition of the groundwater for Campaigns I and II, respectively.
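The dependence of the percentile range on the number of sampled wells can be sketched as follows. This is a minimal illustration under stated assumptions: the well compositions are hypothetical, wells are drawn without replacement, and the function names are chosen only for this example. The 5th and 95th percentiles are taken as the first and last of the 19 cut points that divide the data into 20 groups.

```python
import random
import statistics


def mean_percentile_range(values, n, reps=1000, seed=0):
    """Difference between the 95th and 5th percentile of the subset mean."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(values, n)) for _ in range(reps)]
    cuts = statistics.quantiles(means, n=20)  # cut points at 5%, 10%, ..., 95%
    return cuts[-1] - cuts[0]


def smallest_n_below(values, threshold, reps=1000, seed=0):
    """Smallest number of samples for which the range drops below the threshold."""
    for n in range(1, len(values) + 1):
        if mean_percentile_range(values, n, reps, seed) < threshold:
            return n
    return None


# Hypothetical d2H values (permil) for 12 wells
wells = [-86.3, -80.1, -77.4, -75.9, -74.2, -73.0,
         -72.5, -71.8, -70.6, -69.9, -68.7, -67.8]

print(smallest_n_below(wells, threshold=2.5))
```

When every well is selected, the range collapses to zero because each repetition returns the same average.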

The sample size calculation based on the basic confidence interval equation (equation 2) suggests that in order to obtain an estimate of the average groundwater composition within 1.2‰ δ2H, we would have to sample 41 or 95 wells for Campaigns II and I, respectively. For an estimate of the average within 2.5‰, 12 or 24 wells need to be sampled, respectively. Adding restrictions to the random sampling scheme based on landscape characteristics (e.g., by selecting only samples from wells that are close to the stream or that have a high Topographic Wetness Index) did not yield different results.

Sometimes EC is used instead of the isotopic composition for hydrograph separation (e.g., Inserillo et al., 2017; Pellerin et al., 2008). In our campaigns, the spatial variability in EC was even larger than the variability in the isotopic composition, with a mean ± standard deviation of 443 ± 100 and 414 ± 130 μS cm−1 for Campaigns I and II, respectively. The average change in stream water EC was 98 μS cm−1 (range: 55 to 167 μS cm−1). For an estimate of the average EC of groundwater within half of the change in stream water EC with 95% confidence (i.e., an error smaller than 50 μS cm−1), we would need 18 or 28 samples for Campaigns I and II, respectively.

The range in the calculated variability of the groundwater isotopic composition also decreased with increasing sample size (Figure 5). The 5th and 95th percentiles of the standard deviation of the isotopic composition of the groundwater for six randomly selected groundwater samples were 2.7‰ and 7.7‰ for Campaign I and 1.9‰ and 5.5‰ for Campaign II. For nine randomly selected samples, they were 3.2‰ and 7.1‰ for Campaign I and 2.5‰ and 5.1‰ for Campaign II. The difference between the 5th and 95th percentiles of the calculated standard deviation of the groundwater was less than 1.2‰ (which equals twice the laboratory precision) as soon as more than 29 and 22 randomly selected samples were used to determine the variability in δ2H of groundwater for Campaigns I and II, respectively.

4.3 Sensitivity of Two-Component Hydrograph Separation to the Characterization of the Pre-Event Water Composition

The pre-event water fractions (fpe) were highest for Event A (range: 0.85 to 1) and lowest for Event B (range: 0.65 to 0.95) when the baseflow sample taken before the event was used to characterize the pre-event water composition (BFpe, solid black lines in Figure 6, Table 4). Using different (riparian) wells to characterize Cpe resulted in a large range of the pre-event water fractions; the maximum difference in fpe for samples from individual riparian wells ranged from 0.28 to 0.47 for Events A and C, respectively (gray and red lines in Figure 6 for all groundwater (GW) and all riparian groundwater (RP), respectively). The difference between the minimum fpe calculated when a baseflow sample (BFss or BFpe) was used to characterize Cpe and when the average composition of all riparian wells (RPavg) was used varied between 0.03 for Event B and 0.06 for Event A (Table 4). The difference in the minimum fpe calculated using the baseflow sample or the average isotopic composition of all groundwater samples (GWavg) varied between 0.01 for Event A and 0.12 for Event B.

Time series of the calculated pre-event water fraction (fpe) for Events A–C using δ2H as a tracer and the pre-event baseflow sample (BFpe, solid black line), the baseflow sample taken during the snapshot sampling campaigns (BFss, dashed black lines), each sample from a riparian well (RP, red lines), and all other groundwater wells (GW, gray lines) to represent the isotopic composition of the pre-event water (Cpe), as well as the frequency distribution of the event-averaged pre-event water fraction (kernel density plot, right side of each subplot) for each method used to represent the pre-event water composition (BFpe: black dash, BFss: black asterisk, RP: red solid line, GW: gray dashed line). See Table 2 for a detailed explanation of the different pre-event water characterization methods. Calculations for all groundwater wells (and thus also the kernel distribution) also include all riparian wells.
Table 4. The Range (Min-Max) and Event-Averaged Pre-event Water Fractions (fpe) for Stream Water at the Catchment Outlet Calculated for Different Characterizations of the Pre-event Water Composition (See Table 2): Using the Pre-event Baseflow Sample (BFpe), the Baseflow Sample From the Snapshot Campaign (BFss), the Average Groundwater Composition of All Riparian Wells (RPavg) or All Groundwater Wells (GWavg), or for Each Riparian Well (RP) or Each Groundwater Well (GW) Individually

Event Cpe characterization: BFpe BFss RPavg GWavg RP (n = 11) GW (n = 38)
Range fpe for individual sampling times (n = 24)
A 0.85–1 0.84–0.98 0.79–0.92 0.86–1 0.78–0.97 0.66–0.99
B 0.65–0.95 0.71–1 0.68–1 0.77–1 0.59–1 0.54–1
C 0.69–0.97 0.67–0.84 0.67–0.82 0.71–0.87 0.62–1 0.59–1
Event-averaged fpe (average ± sd for RP and GW)
A 0.92 0.91 0.86 0.93 0.89 ± 0.07 0.85 ± 0.10
B 0.78 0.85 0.81 0.91 0.86 ± 0.13 0.79 ± 0.13
C 0.79 0.78 0.77 0.82 0.53 ± 0.12 0.79 ± 0.11
Note. For the event-averaged pre-event water fractions calculated based on the samples from the individual wells (RP and GW), the average ± standard deviation are given. Calculated pre-event water fractions below −0.025 or above 1.025 were excluded from the calculations.

The temporal pattern of the change in the pre-event water fraction did not depend on how the pre-event water composition was characterized (Figure 6) because the same data for stream water (Ct) and rainwater (Ce) were used for all calculations. For some stream water samples (up to half of the samples, depending on which characterization for the pre-event water was used), the calculated pre-event water fractions were physically impossible (fractions >1.025 or <−0.025). This was particularly the case at the beginning or end of the event when samples from wells with a very different isotopic composition than the baseflow were used.

The event-averaged pre-event water fractions were also sensitive to the choice of the sample used to characterize the pre-event water composition (density plots in Figure 6, Table 4). The spread in the event-averaged fpe was, not surprisingly, largest for the ensemble of the calculations based on the individual well samples, as they spanned the whole range of possible isotopic compositions from which the average groundwater composition was calculated (GW, Table 3 and Figure 6). Selecting only riparian wells to characterize Cpe resulted in a higher event-averaged pre-event water fraction than either a selection from all groundwater wells or a pre-event baseflow sample (Figure 7, Table 4). Ultimately, however, this difference will depend on the distribution of isotopic compositions prior to each event and might thus differ from event to event.

Boxplots of the event-averaged pre-event water fractions (fpe, left) and the associated uncertainty (Wfpe, right) for Events A–C (rows), when the pre-event water composition is represented by a baseflow sample taken before the event (PE, dash) and a few days later during the snapshot campaign (SS, asterisk) and the average isotopic composition based on samples from one, three, six, or nine randomly selected wells in riparian areas (red) or across the entire catchment (gray) and based on the average composition of all riparian wells (all, red, n = 11) and all wells across the catchment (all, gray, n = 38). Calculations for all wells also include all riparian wells. All boxplots are based on 1,000 random selections of wells. The box extends from the 25th to the 75th percentile, the solid line represents the median, the whiskers extend to the 25th percentile − 1.5 × interquartile range and the 75th percentile + 1.5 × interquartile range; the circles represent the outliers.

4.4 Uncertainty of the Event-Averaged Pre-Event Water Fraction

The median uncertainty in the event-averaged pre-event water fraction (Wfpe in equation 4 and Table 5) ranged from a low of 0.04 when using pre-event baseflow to characterize the pre-event water composition for Event A (BFpe) to a high of 0.92 (median value for all combinations) when using three riparian wells to characterize the pre-event water composition (RP3) for Event B. Overall, the calculated uncertainties in fpe were smaller for Events A and C (range: 0.04 to 0.31) than for Event B (range: 0.22 to 0.92) because of the smaller variation in event water isotopic composition (standard deviation of Ce: 5.2‰ for Event A and 6.5‰ for Event C vs. 12.6‰ for Event B).

Table 5. The Event-Averaged Pre-event Water Fraction (fpe) and the Associated Uncertainty (Wfpe, 95% confidence interval, equation 4) When the Pre-event Water Composition (Cpe) Was Based on the Pre-event Baseflow (BFpe) Sample, a Baseflow Sample Taken During the Snapshot Sampling Campaign (BFss), and the Median Event-Averaged Pre-event Water Fractions (fpe) and Associated Uncertainty (Wfpe) When the Average Composition of Samples From One, Three, Six, or Nine Randomly Selected Wells, or all Available Wells From the Riparian Areas (RP1–RPavg) or the Entire Catchment (GW1–GWavg) Were Used to Characterize the Pre-event Water Composition (Cpe) for Events A–C
Event A B C
Cpe characterization (n repetitions) Event-averaged fpe ± Wfpe 10% - 90% Range fpe Event-averaged fpe ± Wfpe 10% - 90% Range fpe Event-averaged fpe ± Wfpe 10% - 90% Range fpe
BFpe (1) 0.92 ± 0.04^abcd 0.78 ± 0.30^cdefg 0.79 ± 0.07^abcde
BFss (1) 0.91 ± 0.04^abcd 0.85 ± 0.22^abcdefg 0.78 ± 0.07^abcde
RP1 (1,000) 0.92 ± 0.05^a 0.88–0.97 0.96 ± 0.48^a 0.86–1.03 0.82 ± 0.06^a 0.78–0.87
RP3 (1,000) 0.93 ± 0.30^b 0.89–0.94 0.94 ± 0.92^b 0.87–1.01 0.82 ± 0.30^b 0.80–0.84
RP6 (1,000) 0.92 ± 0.14^b 0.89–0.94 0.93 ± 0.50^c 0.89–0.97 0.82 ± 0.15^b 0.80–0.83
RP9 (1,000) 0.93 ± 0.11^a 0.90–0.93 0.93 ± 0.38^d 0.90–0.95 0.81 ± 0.12^c 0.81–0.82
RPavg (1) 0.93 ± 0.09^abc 0.91 ± 0.27^abcdef 0.81 ± 0.11^abcde
GW1 (1,000) 0.86 ± 0.05^d 0.82–0.91 0.90 ± 0.43^e 0.75–0.97 0.77 ± 0.07^d 0.73–0.82
GW3 (1,000) 0.86 ± 0.31^c 0.83–0.89 0.84 ± 0.76^f 0.76–0.87 0.76 ± 0.29^e 0.74–0.79
GW6 (1,000) 0.86 ± 0.15^c 0.84–0.88 0.81 ± 0.45^g 0.77–0.85 0.76 ± 0.14^e 0.75–0.78
GW9 (1,000) 0.86 ± 0.12^c 0.84–0.88 0.80 ± 0.38^g 0.77–0.83 0.76 ± 0.12^e 0.75–0.78
GWavg (1) 0.86 ± 0.06^acd 0.78 ± 0.30^bcdefg 0.76 ± 0.08^abcde
Note. The number of repetitions is indicated in parentheses. Calculated pre-event water fractions below −0.025 or above 1.025 were set to −0.025 or 1.025, respectively. Event-averaged fpe values with different superscript letters (a–g) for an event are significantly different.

Increasing the number of samples to determine Cpe reduced the variability in the event-averaged pre-event water fraction and the uncertainty in the pre-event water fraction (Figure 7). Wfpe was largest when three samples were used to calculate the pre-event water composition due to the high t value for small sample sizes and the high standard deviation for some of the combinations of samples (see right column of Figure 5). As a result, the reduction in the median uncertainty was largest when the number of samples increased from three to six (Table 5).

The uncertainty in the pre-event water fraction (Wfpe) was smallest for the calculations based on a baseflow sample or one groundwater sample because we assumed that the uncertainty of the pre-event water composition (WCpe in equation 4) was equal to the measurement precision for this situation. For the uncertainty estimation for the pre-event water fraction based on the selection of three, six, or nine groundwater samples or the average composition of all (riparian) groundwater samples, WCpe was based on the standard deviation of the selected samples and corresponding t value for small sample sizes and thus, to some extent, reflects the variability in the pre-event water composition.

4.5 Event D

In addition to the three events presented above, we also sampled a 130-mm rainfall event between 31 August and 2 September 2017 (Figure 8), which exemplifies the difference between the pre-event groundwater composition and pre-event baseflow. The event lasted 51 h and caused the discharge to increase tenfold, from 0.6 to 5.9 mm h−1. We sampled the groundwater across the catchment on 24 August 2017 (map with groundwater isotopic composition in supporting information S3). The average δ2H for samples from the riparian wells was −67.2‰; the average for the samples from all 34 groundwater wells that contained water was −71.2‰. The baseflow δ2H during the sampling campaign was −68.2‰, whereas the δ2H of the baseflow sample taken just before the rain started on 31 August 2017 was −74.2‰. Rainfall at the start of the event was more enriched than later in the event. The average δ2H for the first 12 rainfall samples (i.e., the rainfall sampled at two locations during the first 8 hr of the event) was −79.8‰. For the next six samples the average δ2H was −108.3‰. The event-averaged rainfall δ2H was −108.8‰ (range: −69.5‰ to −151.9‰).

(left) Time series of 10-min precipitation (mm h−1, bar graph) and δ2H of rainwater sampled at the lower rain gauge (yellow circles) and upper rain gauge (red triangles) (upper plot) and specific discharge (line graph) and δ2H of stream water (orange squares) (lower plot). (right) The dual-isotope plot (δ2H vs δ18O) for the stormflow (orange squares), pre-event baseflow (orange circle), baseflow during the snapshot sampling campaign (asterisk), the average groundwater (light gray diamond) and riparian groundwater (red diamond), and the rainwater sampled at the lower (yellow circles) and upper rain gauge (red triangles). The error bars for the (riparian) groundwater samples indicate the average composition ±1 standard deviation. The rainfall samplers were full during the last part of the event, and thus, this part was not sampled.

Although baseflow and rainfall had a similar isotopic composition (−74.2‰ and −79.8‰ δ2H, respectively), stream water became more enriched (change from −74.2‰ to −63.1‰ δ2H, Figure 8) during the first 8 hr of the event. This suggests that neither the groundwater samples taken the week before the event nor the baseflow sample taken before the event represented the pre-event water composition that contributed to streamflow during the first hours of the event. Later (2 September), the streamflow became more depleted and reflected a mixture of baseflow/groundwater and rainfall (Figure 8).

A very simple inverse hydrograph separation calculation, assuming a pre-event water fraction of 0.79 (which is the median fpe for 24 events in the neighboring Erlenbach catchment (von Freyberg et al., 2018)), suggests that the average pre-event water composition must have been approximately −61‰. However, the pre-event water fraction was likely lower (because it was a large event). Assuming a pre-event water fraction (fpe) of 0.3 would imply a pre-event water composition (Cpe) of −43‰. These estimates are highly uncertain but show that the pre-event water that contributed to the streamflow had to be at least 10‰ more enriched than the baseflow sample taken before the event and also more enriched than the average composition of the groundwater measured in any of the baseflow snapshot campaigns (Kiewiet et al., 2019). In this comparison we did not consider the spatial variability in rainfall isotopic composition. However, we cannot exclude its influence on the stream water composition. It might be that part of the rainfall was more depleted (or enriched) at some locations than we sampled, so that the difference between pre-event baseflow and pre-event water might have been smaller (or larger). Daily precipitation collected at a rainfall sampler in the Erlenbach showed that there was a 17-mm rainfall event with an enriched isotopic composition (−32.4‰ δ2H) on the evening of 24 August (i.e., between the snapshot campaign and the sampled event; see supporting information S2). For comparison, the mean isotopic composition of the daily precipitation samples taken between June and October 2017 was −48.7 ± 23.5‰ δ2H (n = 85); the weighted mean composition of the precipitation was −60.7‰ δ2H.
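The inverse calculation can be sketched as follows. This is a minimal illustration that assumes the two-component mixing equation is rearranged to Cpe = Ce + (Ct − Ce) / fpe. The function name and the input values are hypothetical and are not the values used for Event D.

```python
def inverse_pre_event_composition(c_t, c_e, f_pe):
    """Pre-event water composition implied by an assumed pre-event fraction."""
    if not 0.0 < f_pe <= 1.0:
        raise ValueError("f_pe must be larger than 0 and at most 1")
    return c_e + (c_t - c_e) / f_pe


# Hypothetical stream water and event water d2H (permil)
c_t, c_e = -60.0, -80.0
for f_pe in (0.8, 0.5):
    print(f_pe, round(inverse_pre_event_composition(c_t, c_e, f_pe), 1))
```

For stream water that is more enriched than the event water, a lower assumed pre-event water fraction implies a more enriched pre-event water composition, which is the direction of the argument made for Event D.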

5 Discussion

5.1 Spatial Variability in Shallow Groundwater Composition

The snapshot groundwater sampling campaigns highlighted the large spatial variability in the shallow groundwater isotopic composition in the Studibach (standard deviations of 3.9‰ and 5.3‰ δ2H and 0.43‰ and 0.60‰ δ18O for Campaigns II and I, respectively). Large spatial variabilities in the isotopic composition of groundwater were also reported by Carey and Quinton (2005) (0.7‰ to 0.8‰ δ18O) and by Klaus et al. (2015) (range: 1.8‰ δ18O and 8.3‰ δ2H). However, in contrast to our observations, their samples indicated evaporative enrichment of the groundwater. Kendall et al. (2001) concluded that the variability of soil water and groundwater was ±4‰ δ18O. Given that a large spatial variability in the isotopic composition of groundwater is thus not uncommon, we expect that the observed variability in the Studibach is a reasonable representation of the actual spatial variability in similar small pre-alpine headwater catchments.

The basic sample size calculation to estimate the average groundwater composition with an error margin that is twice the analytical precision (1.2‰ δ2H) suggested that for the Studibach we would need to sample 41 or 95 wells, provided that the measurements represent all hydrogeomorphic units in the catchment, as was the case in our sampling design. This number of wells and samples is unrealistic in terms of sampling effort for most catchments, even though this error margin still spans 17% to 31% of the change in the isotopic composition of stream water during the three events analyzed in this study. Results for the groundwater EC indicate that one can expect even larger uncertainties than we presented for the isotopic composition when other tracers are used for hydrograph separation (although that will admittedly depend on the tracer, the typical concentrations, and the site characteristics).
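As an illustration of what such a basic sample size calculation can look like, the sketch below uses the common normal-approximation formula n = (z·s/E)², with s the standard deviation and E the error margin. This is an assumption: the formula applied in the manuscript is not restated in this section and may differ. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(std_dev, margin, confidence=0.95):
    """Samples needed to estimate a mean within +/- margin.

    Assumes the normal approximation n = (z * s / E)^2.
    """
    if std_dev <= 0 or margin <= 0:
        raise ValueError("std_dev and margin must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return math.ceil((z * std_dev / margin) ** 2)


# Campaign I standard deviation (3.9 permil d2H) and a 1.2 permil margin.
print(required_sample_size(3.9, 1.2))  # -> 41
```

Because the required number of samples scales with the square of s/E, halving the error margin roughly quadruples the sampling effort.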

A review by Penna and van Meerveld (2019) suggested that only a third of the small catchment (<10 km2) studies that determined the isotopic composition of groundwater sampled five or more wells. The results from this study suggest that for five randomly selected samples the 5th percentile of the standard deviation was 2.2‰ and 1.6‰ for snapshot sampling Campaigns I and II, respectively. This is already larger than the 1‰ δ2H variation for pre-event water suggested by McDonnell et al. (1991) and indicates that a reasonable number of groundwater samples can give a rough estimate of the spatial variability in the pre-event water composition. It also indicates that we should increase the number of spatially distributed samples beyond the typical sampling effort if we want to characterize the spatial variability in the pre-event water composition and obtain a better estimate of the uncertainty of hydrograph separation results.
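The subsampling idea behind these percentiles can be sketched as follows: draw a fixed number of wells at random, compute the standard deviation of that subset, repeat many times, and read off a percentile of the resulting distribution. The function name, the number of draws, the nearest-rank percentile rule, and the synthetic well values are all assumptions made for illustration; they are not the data or the exact procedure of this study.

```python
import math
import random
import statistics


def subsample_sd_percentile(values, n_wells=5, n_draws=5000,
                            percentile=5, seed=0):
    """Percentile of the standard deviation over random well subsets."""
    rng = random.Random(seed)
    sds = sorted(
        statistics.stdev(rng.sample(values, n_wells))
        for _ in range(n_draws)
    )
    # Nearest-rank percentile of the sorted standard deviations.
    rank = max(0, math.ceil(percentile / 100 * n_draws) - 1)
    return sds[rank]


# Synthetic stand-in for 38 wells (permil d2H); hypothetical values only.
generator = random.Random(42)
synthetic_wells = [generator.gauss(-70.0, 4.0) for _ in range(38)]

print(subsample_sd_percentile(synthetic_wells))
```

A low percentile of this distribution gives a conservative view of the variability that a small well network would typically reveal.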

Although pre-event water dominated the isotopic composition of streamflow, event water also impacted its composition. The event water composition can vary strongly in space (Fischer et al., 2017), and this variability might have been larger than captured with our measurement set-up. Therefore, we cannot exclude that the unexpected changes in stream water composition are (partially) caused by spatial variability in the event water composition, rather than only by the spatial variability of pre-event water. For example, the sudden change in stream water isotopic composition during peakflow of Event C (Figure 4) might have been influenced by late Event B rainfall, but it could have also been caused by rainfall that fell during a short episode of increased rainfall intensity (and would thus have been more depleted) or by spatial variability in event water. Similarly, the unexpected stream water composition for Event D could also partially be due to event water variability.

5.2 Baseflow Does Not Reflect the Catchment Average Groundwater Composition

The results from the snapshot groundwater sampling campaigns show that baseflow is not per se a mixture of all groundwater in the catchment, nor a mixture of all riparian groundwater (Figure 5, and see Kiewiet et al. (2019) for a comparison using multiple tracers). It is not surprising that the composition of baseflow does not reflect the catchment average groundwater composition. Only a small part of the Studibach is hydrologically connected to the stream during baseflow conditions (Rinderer et al., 2019) because of the steep slopes and differences in hydraulic conductivity between the different landscape elements (i.e., the forested hillslopes retain much less water than the flatter grassland sites, where the hydraulic conductivity is much lower). Hence, with the expansion of the connected contributing area during events, different landscape elements with different isotopic signatures contribute to the streamflow mixture (Figure 1).

Singh et al. (2016) report a similar observation when they sampled the isotopic composition of shallow groundwater in two adjacent headwater catchments at the Coweeta Hydrologic Laboratory (North Carolina, USA). They found that the spatial variability of the shallow groundwater composition was larger than the variability in baseflow in both catchments. They showed that during low baseflow conditions (low connectivity) and high baseflow conditions (high connectivity), groundwater and baseflow were almost identical and baseflow was spatially least variable, while during transition periods, the spatial variability in the baseflow composition was largest. Soulsby et al. (2007) similarly showed that the sources of baseflow shifted with changes in the hydrological conditions in the Bruntland Burn catchment (Scotland). They suggested that the much higher stream water alkalinity during low flow conditions reflected the smaller influence of soil water seepage and larger influence of the well-buffered groundwater.

A better understanding of which areas are hydrologically connected to the stream can aid the development of comprehensive sampling schemes to determine the pre-event water composition. This might avoid the calculation of physically impossible (pre-)event water fractions of streamflow (<0 or >1) in hydrograph separation analyses (McDonnell et al., 1991). To accomplish this, we deem it important to also consider events for which isotope hydrograph separation does not “work,” such as Event D on 31 August 2017 (Figure 8). Unfortunately, such events now often remain unpublished. Unexpected responses in stream water composition are, potentially, the result of heterogeneity in the pre-event water composition and could provide important information to test our hypotheses on runoff generation processes and evaluate interpretations of previous hydrograph separation results. The event on 31 August 2017 highlighted that at the beginning of the event stormflow was not a mixture of the sampled baseflow and rainfall and that a different type of water contributed to stormflow.

5.3 Sensitivity and Uncertainty of Hydrograph Separation Results

The calculated uncertainties for the pre-event water fractions (Wfpe) that we obtained (Figure 7 and Table 5) are either comparable to or larger than the uncertainties reported in other studies. Penna et al. (2017) used EC and δ2H in a three-component hydrograph separation to quantify snowmelt fractions in streamflow for the Rio Vauz catchment (Italy) and used the standard deviation of stream water δ2H and samples from springs (collected over a 5-year period) to quantify the uncertainty of the composition of the pre-event water (WCpe). They used a 70% confidence interval and calculated an uncertainty range of 8% to 10% (fpe ± 0.08 to 0.10) for two of the catchments and 6% to 21% (fpe ± 0.06 to 0.21) for the catchment with the smallest snowmelt fraction. Pellerin et al. (2008) used EC to perform hydrograph separation for 19 rainfall events in the Saw Mill Brook watershed (Massachusetts, USA). They assumed ±10 μS cm−1 for WCe and the measured standard deviation over 24-hr baseflow periods for WCpe (±52 to 130 μS cm−1). The reported uncertainties varied between 1% and 10% (median: 4.5%). The minimum uncertainties presented by Penna et al. (2017) and Pellerin et al. (2008) correspond quite closely to our minimum uncertainties (Table 5), but our event-averaged uncertainties were much higher (range Wfpe: 0.14 to 0.50 when using the spatial variability based on six riparian groundwater samples; RP6). Although we only considered the uncertainty due to the spatial variability in the pre-event water composition, it is important to note that uncertainties in hydrograph separations due to the spatial variability in event water can be just as large. For instance, Cayuela et al. (2019) reported an uncertainty of 0.01–0.14 for the Can Vila catchment (Spain). The fpe estimations of Lyon et al. (2009) and Fischer et al. (2017) differed by more than 50% between computations based on different rainfall sampling locations for the Upper Sabino catchment (Arizona, USA) and Zwäckentobel (Switzerland), respectively. Altogether, these findings suggest that uncertainties in the pre-event water fractions can be large, even when the variability that is included in the calculations might still be smaller than the actual variability.

The sensitivity of hydrograph separation results to the variability of the pre-event water compositions has been addressed previously, by repeating the hydrograph separation calculations for a range of (observed or estimated) pre-event water compositions. McDonnell et al. (1991) found that a ±1‰ δ2H range in the pre-event water composition led to a pre-event water fraction that was within ±5% (i.e., fpe ± 0.05) of the original estimate, except for the peak flow sample, for which the range was larger. They also found that shifting to three-component hydrograph separation increased uncertainties compared to a two-component approach. Carey and Quinton (2005) varied the event water composition by 2‰ δ18O and found that the calculated pre-event water fraction changed by up to 19% (i.e., fpe ± 0.19) for a three-component hydrograph separation using EC and δ18O. Our results demonstrate that the event-averaged pre-event water fraction can differ by 10% to 14% (i.e., fpe ± 0.10 to 0.14) when using three samples from different randomly selected wells for the pre-event water characterization and by 4% to 13% when using seven samples, even if the samples are all from riparian areas (Figure 6).
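This type of sensitivity analysis amounts to repeating the two-component separation, fpe = (Cs − Ce)/(Cpe − Ce), for each candidate pre-event water composition and comparing the resulting fractions. A minimal sketch follows; the function names and all input values are hypothetical and serve only to show the mechanics.

```python
def pre_event_fraction(c_s, c_pe, c_e):
    """Two-component pre-event water fraction of streamflow."""
    if c_pe == c_e:
        raise ValueError("end-members must differ in composition")
    return (c_s - c_e) / (c_pe - c_e)


def sensitivity_range(c_s, c_e, c_pe_candidates):
    """Smallest and largest f_pe over candidate pre-event compositions."""
    fractions = [pre_event_fraction(c_s, c_pe, c_e)
                 for c_pe in c_pe_candidates]
    return min(fractions), max(fractions)


# Hypothetical values (permil d2H): one stream sample, one event water
# value, and a pre-event composition shifted by -1, 0, and +1 permil.
low, high = sensitivity_range(-60.0, -40.0, [-71.0, -70.0, -69.0])
print(f"f_pe ranges from {low:.3f} to {high:.3f}")
```

The spread between the smallest and largest fraction widens as the difference between the event and pre-event water compositions shrinks, which is why the same shift in Cpe matters more for some events than for others.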

In most studies, the uncertainty of the hydrograph separation results is estimated with the Gaussian standard error method of Genereux (1998; equation 4 in this manuscript). The uncertainties of WCpe and WCe are ideally based on the observed variability in the catchment. However, that requires knowledge about the spatial and temporal variation in the pre-event and event water composition. Often, only a pre-event streamwater (or baseflow) sample is available (Penna & van Meerveld, 2019), and sometimes researchers even assume that WCpe is equal to the analytical precision of the isotope analyzer (e.g., Jefferson et al., 2015) or present results without any uncertainty estimations (Qu et al., 2017; Zhao et al., 2016). It would be better to use the standard deviation of samples taken from multiple wells, to base WCpe on repeated sampling along the stream (cf. James & Roulet, 2009 or Singh et al., 2016), or to use literature values for the variability in the pre-event water composition. Using the analytical precision for WCpe due to a lack of data on the spatial variability in either the baseflow or the groundwater composition leads to an underestimation of the actual uncertainty because it neglects the variability of the pre-event water composition. This can in turn lead to a wrong interpretation of hydrological processes based on these results.
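The effect of the choice of WCpe can be illustrated with a sketch of first-order Gaussian error propagation applied to fpe = (Cs − Ce)/(Cpe − Ce), which is the general form underlying the method of Genereux (1998). The notation may differ from equation 4, the function name is hypothetical, and all numbers are illustrative: 0.6‰ stands in for an analytical precision and 3.9‰ for a spatial standard deviation of the order reported for Campaign I.

```python
import math


def fraction_uncertainty(c_s, c_pe, c_e, w_cs, w_cpe, w_ce):
    """First-order Gaussian uncertainty of the pre-event water fraction."""
    d = c_pe - c_e
    if d == 0:
        raise ValueError("end-members must differ in composition")
    term_pe = (c_e - c_s) / d ** 2 * w_cpe  # partial derivative wrt C_pe
    term_e = (c_s - c_pe) / d ** 2 * w_ce   # partial derivative wrt C_e
    term_s = w_cs / d                       # partial derivative wrt C_s
    return math.sqrt(term_pe ** 2 + term_e ** 2 + term_s ** 2)


# Hypothetical compositions (permil d2H): stream, pre-event, event water.
C_S, C_PE, C_E = -60.0, -70.0, -40.0

w_precision = fraction_uncertainty(C_S, C_PE, C_E, 0.6, 0.6, 0.6)
w_spatial = fraction_uncertainty(C_S, C_PE, C_E, 0.6, 3.9, 0.6)
print(f"W_fpe with analytical precision only: {w_precision:.3f}")
print(f"W_fpe with spatial variability:       {w_spatial:.3f}")
```

With these placeholder values, replacing the analytical precision by a spatial standard deviation for WCpe increases the propagated uncertainty several-fold, which is the underestimation described above.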

5.4 The Way Forward to Characterize the Pre-Event Water Composition for Hydrograph Separation

We have shown that a large number of samples might be needed to estimate the average pre-event water composition but that a smaller number of samples already gives an estimate of the variability (Figure 5). We also demonstrated that the outcomes of the hydrograph separation and the uncertainty estimates are sensitive to which samples and how many samples are used to characterize the pre-event water composition (Figures 6 and 7). Additionally, the data from the 130-mm event (Figure 8) and a theoretical example (supporting information S1) show that unidentified water sources can have a large effect on the calculated pre-event water fractions.

At first sight, these results might seem discouraging for hydrograph separation analyses or characterization of the pre-event water composition. One might decide to refrain from it altogether or take it as a challenge to determine a time-variable pre-event water composition that reflects the changes in the contributing areas and use this in hydrograph separation calculations (as it is done for the event water composition and is suggested by Harris et al. (1995)). However, this requires knowledge of the contributing areas and is in conflict with the simplicity and thus attractiveness of the hydrograph separation method. Instead, we are convinced that despite these challenges, isotope hydrograph separation can remain a useful tool but that the results need to be interpreted with care. Quantifying the uncertainty and sensitivity of the analyses by also considering the spatial variability in the (pre-) event water composition is a first step toward improving our interpretations. The results of this study show that when the variability in the pre-event water composition is included, the uncertainty in the pre-event water fractions is likely larger than reported for most studies in small headwater catchments. This should be acknowledged when comparing results for different events or different catchments.

A lack of data is a challenge for any method that estimates the error in mixing fractions. We, therefore, encourage researchers to sample baseflow or groundwater at more locations than is typically the case. Our results suggest that after sampling a few wells, we would have known that the spatial variability is large and would have had a rough estimate of the variability (Figure 5). We do not think that a network with more than 30 groundwater wells is feasible (or needed) for all research areas but encourage additional studies on the spatial variability in shallow groundwater in different climatic and geologic settings so that literature values on the typical spatial variability in the shallow groundwater composition become available and can be used in other studies. In future studies, it will be essential to consider that although samples from some wells might seem uninformative for hydrograph separation (because they come from areas that might not be hydrologically connected to the stream during small or intermediately sized events), they are still essential to characterize the variability in the catchment (average) pre-event water composition that contributes to streamflow during extreme events.

6 Conclusions

Isotope hydrograph separation is a powerful tool to investigate runoff sources and catchment functioning. For undisturbed headwater catchments in temperate climates, results usually show that groundwater makes up the largest portion of streamflow. However, the assumption of a constant pre-event water or groundwater composition during the event is likely violated in most of these studies because there is not a single well-mixed groundwater source. We assessed the spatial variability in the isotopic composition of groundwater in a small steep, humid headwater catchment and found that the spatial variability in the isotopic composition of shallow groundwater was large (standard deviation: 3.9‰ and 5.3‰ δ2H). A rough sample size estimation suggests that more than 12 wells need to be sampled to estimate the average groundwater composition within 2.5‰ δ2H (half of the variability in the streamflow during events). The difference between the isotopic composition of baseflow and the average (riparian) groundwater ranged from 0.5‰ to 2.2‰ δ2H. As such, the baseflow sample might represent the average pre-event water that contributes to streamflow during the event, but it might also be different because other sources (e.g., hillslopes) may only contribute to streamflow after the expansion of the contributing area. In other words, an apparent (or even physically impossible) event water contribution might be the result of a temporally varying composition of the pre-event water that contributes to streamflow.

We quantified the sensitivity of hydrograph separation results to different characterizations of the pre-event water composition by repeating the calculations for different sets of baseflow and groundwater samples. We found that hydrograph separation results based on riparian groundwater samples or a baseflow sample resulted in different calculated pre-event water fractions than when the catchment average groundwater composition was used to characterize the pre-event water composition. Even if we selected three riparian groundwater samples to characterize the pre-event water composition, the event-averaged pre-event water fractions varied by 0.07 to 0.17.

The uncertainties in the pre-event water fractions (Wfpe) were lowest when one baseflow sample was used to represent the pre-event water composition, but we argue that this gives a false sense of accuracy because it neglects the spatial variability in the groundwater isotopic compositions. Furthermore, we show that this sample may not represent the actual pre-event water composition that contributes to streamflow during the event. The reduction in the event-averaged uncertainty of the pre-event water composition was largest when the number of groundwater samples increased from three to six.

To summarize, our results highlight the importance of representing the variability in the pre-event water composition when applying hydrograph separation analyses to assess runoff processes. This can be achieved by, for instance, increasing the number of sampling locations or by using ranges reported in literature.

Acknowledgments

This research project would not have been possible without the help and support of many people in the lab and field. We particularly thank Barbara Herbstritt for the isotope analyses; Michael Rinderer and Benjamin Fischer for the helpful discussions and for the installation of the wells and stream gauges; and the Swiss Federal Institute for Forest, Snow and Landscape Research (WSL), Oberallmeindkorporation Schwyz (OAK), the municipality of Alpthal, and the Department of Environment of the Canton of Schwyz for the excellent cooperation. We thank the Editor, Associate Editor, and five anonymous reviewers for their helpful comments on previous versions of this manuscript. The data used in this analysis are available through the data repository of WSL (www.envidat.ch).