Transformation of Generation Processes From Small Runoff Events to Large Floods

Mixture of runoff generation processes poses a challenge for predicting upper flood quantiles. We examined transformations of generation processes from all identifiable runoff events to frequent and upper tail floods for a large set of mesoscale catchments and observed a substantial change of the dominant processes. Two trajectories of transformation were detected. In regions where floods occur almost exclusively in winter the dominance of processes related to snowmelt consistently increases from small events to frequent and upper tail floods. In catchments characterized by frequent winter‐spring floods and occasional summer‐autumn flood events triggered by rare meteorological phenomena (e.g., Vb cyclones), processes that dominate upper tails are not adequately represented in the sample of frequent floods. Predictions of extremes and projections of flood changes might remain highly uncertain in the latter cases.


Introduction
Predicting the magnitude and frequency of large (extreme) floods and their possible changes is critical for hazard assessment, making the topic of their origins of significant practical importance (Smith et al., 2018). Detailed studies of extreme and catastrophic floods (e.g., Blöschl et al., 2013; Hirschboeck, 1987a; Nakamura et al., 2013; Rogger et al., 2012; Smith et al., 2018) and of specific atmospheric mechanisms generating the largest flood peaks (e.g., Barth et al., 2017; Doswell et al., 1996; De Luca et al., 2017; Lima et al., 2017; Ralph et al., 2006; Villarini et al., 2011) provided evidence of the heterogeneity of processes in flood samples. These works fueled debates on the origins of upper tail (i.e., the largest) floods and hence on the choice of appropriate methods for estimating them. Moreover, recent process-based investigations of flood changes (e.g., Berghuijs et al., 2019; Blöschl et al., 2019; Kemter et al., 2020) showed that their observed disparate regional patterns might in fact be driven by changes in the mix of flood generation processes, thus further complicating reliable estimation of future flood hazards.
Discussions on the origins and predictability of upper tail and extreme flood events from the sample of annual floods and ordinary runoff events have a long history (e.g., Klemes, 1993; Rossi et al., 1984). Several studies on the hydroclimatic nature of flood events in the United States (e.g., Barth et al., 2019; Hirschboeck, 1987b; Smith et al., 2011, 2018; Villarini & Smith, 2010) showed that the nature of upper tail events is often different from that of annual floods. On the other hand, recently developed approaches that utilize the whole range of observed streamflows to derive flood frequency curves (Basso et al., 2016; Claps & Laio, 2003; Miniussi et al., 2020) delivered more accurate predictions of upper tails, thus supporting the hypothesis that the largest floods (statistically) might indeed originate from ordinary runoff events.
Regional and global surveys of processes generating annual and partial duration flood series (Hirschboeck, 1987b; Merz & Blöschl, 2003; Sikorska et al., 2015; Stein et al., 2019) showed that a mixture of processes is the norm. However, it is still unclear whether the mixture of processes detected in frequent floods (e.g., annual maxima or peaks-over-threshold) is representative of upper tail events. A consistent process-based characterization of all observed runoff events spanning the whole range of magnitudes might help to detect possible transformations of processes from ordinary runoff events to annual floods and further to events with higher return periods. Such analyses in a wide set of catchments can provide valuable insights into regional variations in the transformation of processes and highlight regions where special care should be taken when predicting extremes or projecting possible future changes of floods.
Therefore, this study aims to investigate the nature and composition of processes generating different event samples, and to examine their transformations going from all identifiable runoff events to frequent (e.g., peaks-over-thresholds and annual maxima) and upper tail floods (e.g., events with return periods >10 years) for a large set of mesoscale catchments.

Data and Methods
The event characterization was performed on a set of 203,852 runoff events that occurred in 172 German catchments (31-23,700 km², median = 516 km², Figure 1a) and were separated from daily hydrometeorological time series by means of the automated method of Tarasova et al. (2018). The study period spans from 1951 to 2013 with a median observation length of 61 years. Only catchments with limited effects from reservoir operation were considered. Land use changes during the study period are limited (<2% of the area for most catchments) and do not considerably affect peak discharges (Tarasova et al., 2018).
In this study we primarily focused on four event samples: ordinary (all identifiable) runoff events isolated by the above-mentioned event separation method (see Text S1 in the supporting information for details), two alternative samples of frequent floods defined as series of maximum annual floods (MAFs) and peak-over-threshold events (POT4; the threshold was selected iteratively to sample on average four events per year), and upper tail floods (flood events with discharges corresponding to return periods of at least 10 years [HQ10]; Smith et al., 2018; Villarini & Smith, 2010). Discharges corresponding to specific empirical return periods were estimated using the Weibull plotting position of annual maxima. A median number of 1,214 ordinary, 216 POT4, 61 MAF, and 6 HQ10 events per catchment were identified. Event samples were identified and analyzed catchment-wise, so that less frequent events always correspond to larger discharges in each catchment. Hence, in this study we refer to less frequent events as large events and to more frequent ones as small events.
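The sampling steps described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, the discharge for a given return period is obtained by linear interpolation between plotting positions (the text does not say how intermediate values were handled), and the POT threshold is taken directly as the k-th largest event peak rather than found iteratively.

```python
import numpy as np

def weibull_return_periods(annual_maxima):
    """Empirical return period (years) of each annual maximum.

    Weibull plotting position: the peak with rank m (1 = largest) out of n
    has exceedance probability m / (n + 1), i.e. return period (n + 1) / m.
    """
    x = np.asarray(annual_maxima, dtype=float)
    n = x.size
    order = np.argsort(-x, kind="stable")   # indices from largest to smallest
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(1, n + 1)
    return (n + 1) / ranks

def discharge_for_return_period(annual_maxima, T):
    """Discharge with empirical return period T (assumption: linear
    interpolation between neighbouring plotting positions)."""
    x = np.sort(np.asarray(annual_maxima, dtype=float))   # ascending
    n = x.size
    T_sorted = (n + 1) / np.arange(n, 0, -1)              # ascending
    return float(np.interp(T, T_sorted, x))

def pot_threshold(event_peaks, n_years, events_per_year=4):
    """Threshold that retains on average `events_per_year` peaks per year
    (assumption: direct selection of the k-th largest peak)."""
    peaks = np.sort(np.asarray(event_peaks, dtype=float))[::-1]
    k = min(int(round(events_per_year * n_years)), peaks.size)
    return float(peaks[k - 1])

# Toy series of five annual maxima (m3/s), purely illustrative
maxima = [10.0, 50.0, 30.0, 20.0, 40.0]
print(weibull_return_periods(maxima))
print(discharge_for_return_period(maxima, T=3.0))
```

With a record of about 61 years, the largest annual maximum receives an empirical return period of 62 years, and an HQ10 sample would contain the events whose peaks reach `discharge_for_return_period(maxima, 10)`.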
To enhance robustness of the possibly detected transformations of dominant processes in the above samples, we also considered maximum seasonal floods (MSF) (i.e., maximum flows in predefined seasons) and flood events with return periods of at least 2, 5, 15, and 20 years (HQ2, HQ5, HQ15, and HQ20). Samples of flood events with larger return periods were not considered due to the limited length of the available discharge series.
Moreover, we examined sensitivity of the results to specific sampling approaches by using different POT samples (two, three, four, and five events per year), different MSF samples (two or four seasons) and deriving samples based on empirical return periods (e.g., HQ10) from both annual and partial duration series. Maximum, median, and minimum numbers of events per catchment corresponding to each event sample are reported in Table S1.
For characterizing event conditions we used daily gridded rainfall data from the REGNIE data set (Rauthe et al., 2013), gridded snowmelt and soil moisture series simulated by the mesoscale Hydrological Model (Kumar et al., 2013; Samaniego et al., 2010), and air temperature interpolated using external drift kriging (Zink et al., 2017).
All events were categorized according to the multi-layer process-based framework of Tarasova et al. (2020). The framework uses indicators that characterize space-time dynamics of inducing precipitation events and their spatial interaction with antecedent catchment states, such as snow cover, frozen soils, and soil moisture content (Figure 1b). Each event was categorized at each layer based on predefined thresholds applied to the layer-specific dimensionless indicators (see Text S2 and Table S2) and labeled accordingly (Figure 1b). The states categorized at each layer of the framework can be combined hierarchically (Figure 1c) to obtain a unique type for each event. The framework and hierarchical decision tree applied for classification of runoff events in Germany showed regionally consistent performance, allowing comparison of classified events in the study catchments (Tarasova et al., 2020).

Figure 1 (caption excerpt). A framework for process-based event characterization. The nature of the inducing event is analyzed using the volumetric ratio of rainfall and snowmelt, the spatial covariance of rainfall volume, pre-event snow cover, and degree of soil freezing. Intensity- or Volume-dominated events are categorized by means of the temporal coefficient of variation of the precipitation rates and the ratio of intensity and volume, while their space-time organization is stratified into Local Steady, Local Unsteady, Extensive Steady, and Extensive Unsteady using the spatial coefficient of variation of the precipitation volume and the spatial covariance of precipitation fields between consecutive time steps. Events are labeled according to the state of the antecedent soil moisture into Wet and Dry. The spatial distribution of soil moisture is quantified by its spatial coefficient of variation (Uniform or Patchy). In case of Patchy (uneven) distribution, the spatial interaction of soil moisture and event precipitation is evaluated (Overlap or No Overlap) by means of their spatial covariance. Indicators and thresholds used to categorize events at each layer are described in Text S2.
The frequency of occurrence (fraction) of each possible category was calculated for each layer of the framework (Figure 1b) for all considered event samples (e.g., ordinary events, MAF, HQ10) separately.
To evaluate the statistical significance of possibly detected transformations of processes, we applied the chi-square (χ²) test for categorical data in two different fashions. First, we checked if generation processes characterized at individual layers of the framework (e.g., Intensity- or Volume-dominated events) are evenly distributed in a particular event sample (e.g., MAF). This goodness-of-fit test examines if the frequency distribution of processes is statistically different from an expected (i.e., equal) distribution and rejects the null hypothesis if at least one of the processes is more likely to occur. In the second setting, we used the χ² test as a homogeneity test to check if the distributions of generation processes are different between event samples (e.g., between MAF and HQ10). The null hypothesis that the distributions of generation processes are homogeneous (i.e., similar) is rejected if the tested event samples have statistically different distributions of generation processes. p values were estimated using Monte Carlo simulations with 10,000 replicates (Hope, 1968) to avoid issues with asymptotic approximations that might arise due to imbalanced event samples (Agresti, 2007).
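The two test settings described above can be illustrated with a short NumPy sketch. It is a hedged approximation rather than the authors' implementation: the function names are hypothetical, the goodness-of-fit replicates are drawn from a multinomial with equal category probabilities, and the homogeneity replicates are 2 × k tables with both margins fixed, drawn with `Generator.multivariate_hypergeometric` (NumPy 1.18 or later). The p value follows the usual Monte Carlo convention (1 + number of replicates at least as extreme) / (replicates + 1).

```python
import numpy as np

def gof_equal_mc(counts, n_rep=10_000, seed=0):
    """Chi-square goodness-of-fit test against equal category frequencies,
    with a Monte Carlo p value."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n, k = counts.sum(), counts.size
    expected = np.full(k, n / k)
    stat = float(((counts - expected) ** 2 / expected).sum())
    # Replicates under the null: every category is equally likely
    sims = rng.multinomial(int(n), np.full(k, 1.0 / k), size=n_rep)
    sim_stats = ((sims - expected) ** 2 / expected).sum(axis=1)
    p = (1 + np.sum(sim_stats >= stat - 1e-9)) / (n_rep + 1)
    return stat, float(p)

def homogeneity_mc(counts_a, counts_b, n_rep=10_000, seed=0):
    """Chi-square homogeneity test between two event samples, with a
    Monte Carlo p value (replicate tables keep both margins fixed)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(counts_a, dtype=np.int64)
    b = np.asarray(counts_b, dtype=np.int64)
    keep = (a + b) > 0                 # drop categories absent from both samples
    a, b = a[keep], b[keep]
    total = a + b
    na, nb, n = int(a.sum()), int(b.sum()), int(total.sum())
    exp_a, exp_b = na * total / n, nb * total / n

    def chi2(sa, sb):
        return ((sa - exp_a) ** 2 / exp_a).sum(axis=-1) + \
               ((sb - exp_b) ** 2 / exp_b).sum(axis=-1)

    stat = float(chi2(a, b))
    # Randomly reassign the pooled events to sample A; the rest form sample B
    sim_a = rng.multivariate_hypergeometric(total, na, size=n_rep)
    sim_stats = chi2(sim_a, total - sim_a)
    p = (1 + np.sum(sim_stats >= stat - 1e-9)) / (n_rep + 1)
    return stat, float(p)

# Toy counts for one layer, e.g. [Volume-dominated, Intensity-dominated]
print(gof_equal_mc([55, 6]))              # within one sample (MAF-like)
print(homogeneity_mc([55, 6], [2, 4]))    # between two samples (MAF-like vs HQ10-like)
```

Simulation-based p values avoid relying on the asymptotic χ² distribution when expected counts are small, as is the case for samples with only a handful of upper tail events per catchment.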
The dissimilarity between distributions of hierarchical event types for different event samples was quantified using the Euclidean distance:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

where the event type frequency distributions of two different event samples are given as $p = (p_1, p_2, \ldots, p_n)$ and $q = (q_1, q_2, \ldots, q_n)$ in the Euclidean $n$-space, with $n$ the total number of possible hierarchical event types (i.e., the combinations of the categories from each layer of the framework in Figure 1c). Accordingly, $p_i$ is the frequency of one event type (e.g., Rain-on-snow) in one event sample (e.g., MAF) and $q_i$ is the frequency of the same event type in the alternative event sample (e.g., HQ10). The Euclidean distance is 0 when two event samples have identical distributions of event types.
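As a minimal illustration of this distance, the sketch below turns two lists of event type labels into frequency vectors over a common set of types and computes the Euclidean distance between them. The function names and the three type labels are hypothetical placeholders, not the full set of hierarchical types used in the study.

```python
from collections import Counter
import numpy as np

def type_frequencies(event_types, all_types):
    """Relative frequency of each possible event type in one event sample."""
    counts = Counter(event_types)
    n = len(event_types)
    return np.array([counts[t] / n for t in all_types])

def euclidean_distance(p, q):
    """Euclidean distance between two event type frequency distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

# Hypothetical labels standing in for hierarchical event types
all_types = ["Rain-on-snow", "Rainfall Wet", "Rainfall Dry"]
maf = ["Rain-on-snow"] * 3 + ["Rainfall Wet"]    # frequent floods (toy sample)
hq10 = ["Rainfall Dry"] * 2                      # upper tail floods (toy sample)

p = type_frequencies(maf, all_types)
q = type_frequencies(hq10, all_types)
print(p, q, euclidean_distance(p, q))
```

Because both vectors sum to one, the distance is bounded above by the square root of 2, which is reached when each sample consists of a single, different event type.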

Results
Figures 2 and 3 show a substantial transformation of dominant generation processes from small runoff events to larger floods. Pairwise χ² homogeneity tests indicated significant differences (Table S3) in the frequency of generation processes between samples of ordinary events and frequent floods (Figures 2 and 3, All-POT4; for 98% of the catchments on average for the different layers of the framework), between ordinary and upper tail events (HQ10-All, 36% of catchments), between frequent and upper tail floods (MAF-HQ10, 11% of catchments), and between annual floods and peak-over-threshold events (POT4-MAF, 7% of catchments).
The χ² tests for goodness-of-fit indicated that the frequency of different generation processes is distributed unequally for most event samples (Table S4). In the following, we describe the patterns that emerged in the distribution of generation processes among different event samples in the study catchments.
Ordinary events are mostly Rainfall-induced, while a considerable portion of frequent floods is classified as Rain-on-snow or Mixture of rainfall and snowmelt (Figure 2, All). The emphasis shifts back to Rainfall-induced events for upper tail floods, especially in the southern regions and in mountainous catchments in Eastern Germany. Rain-on-snow floods instead dominate upper tails in catchments draining the Central Uplands. Rain-on-ice events generally occur very rarely (Figure 2, All) but can lead to large floods in several catchments (Figure 2, HQ10).
Among ordinary events, Volume-dominated events are more common than Intensity-dominated ones (Figure 2, All). This is not surprising, since intense rainstorms occur relatively seldom in temperate climates like that of Germany, where long synoptic precipitation events related to frontal activity prevail (Mediero et al., 2015). The dominance of Volume-dominated events is even more pronounced for frequent floods (Figure 2, POT4, MAF). Nonetheless, floods with higher return periods can also be generated by Intensity-dominated events in the southern and eastern regions (Figure 2, HQ10).

Figure caption (excerpt). ... (Figure 1b) in the event samples indicated in black font on the left side of the panels: ordinary events (All), peak-over-threshold events (POT4), maximum annual floods (MAF), and events with return periods of at least 10 years (HQ10). For each layer the relative frequency of occurrence of the corresponding categories sums up to unity. Significant difference (p < 0.1) in the distribution of categories between pairs of event samples (indicated in red font on the right side of the panels) was confirmed by χ² tests of homogeneity (Table S3) for catchments displayed with a red outline in the maps. The results of the χ² tests for goodness-of-fit, which test the hypothesis of equal frequency distribution of different categories within each event sample, are available in Table S4.
Equal portions of ordinary events are attributed to Local and Extensive categories. Instead, Steady events (i.e., precipitation occurring over the same portion of a catchment during consecutive days) are more frequent than Unsteady ones (Figure 3, All). This tendency becomes more prominent for frequent floods, which mainly exhibit Extensive spatial structures (Figure 3, POT4, MAF). Similarly, the majority of upper tail events have an Extensive Steady organization (Figure 3, HQ10). However, Local Steady events are as likely to trigger larger floods (Figure 3, HQ10) in small mountainous catchments of the Central Uplands and Alpine Forelands.
Dry catchment conditions slightly prevail among all events (Figure 3, All), whereas frequent and upper tail floods are mainly generated during Wet conditions (Figure 3, POT4, MAF, HQ10). Nonetheless, a portion of the largest floods can be generated during Dry conditions as well (Figure 3, HQ10).
Regarding the spatial interactions between catchment wetness states and inducing precipitation, runoff events are rarely generated by rainfall that does not overlap with wet areas of the catchment (Figure 3, Patchy No Overlap). When considering larger events, a Uniform spatial distribution of catchment wetness

Similar findings on the transformation of generation processes from small runoff events to larger floods emerge when alternative samples of events (e.g., maximum seasonal floods and events with return periods of at least 2, 5, 15, and 20 years; Figures S1 and S2) are considered, further supporting the identified transformations of dominant generation processes.
Different sampling approaches for MSF (i.e., different number of seasons) and POT events (i.e., different number of events selected yearly) seem to have no major effect on the derived distributions of generation processes (no statistically significant differences among these samples were detected in most catchments, Table S5). Deriving samples of events with specific return periods (e.g., HQ10) either using annual maxima or peaks-over-threshold has no effect on the identified distributions of processes (Table S6).

Discussion
Our results showed the presence of substantial transformations of dominant generation processes from small events to larger floods along two contrasting trajectories. For some catchments the portion of a particular generation process (e.g., Rain-on-snow or Volume-dominated events) consistently increases from the sample of ordinary events to frequent and upper tail floods. For others, the dominance of certain processes (e.g., Rainfall-induced events or events during Dry conditions) only emerges for upper tail events, while their portions in frequent floods are smaller than in the sample of all events.
The detected transformations of dominant generation processes can be summarized by the difference in the distributions of event types between the samples, here quantified by the Euclidean distance between these distributions and evaluated statistically by means of a χ² test of homogeneity. Since the sample of all events includes the whole range of possible event types (e.g., Figures 4d and 4e) and is expected to differ from the sample of upper tail floods, we chose the Euclidean distance between these samples as a reference and evaluated the relative change in the distance between the samples of frequent and upper tail floods. A negative relative change in their distance implies that frequent floods are more similar to upper tails than ordinary events in terms of event type distribution. This is observed in catchments with a pronounced winter flood regime (western and central parts of Germany) (Figures 4a and 4b, small blue markers, pronounced seasonality). For most of these catchments no significant differences in the distribution of event types between frequent and upper tail floods were detected (Table S7 and Figures 4a and 4b, blue circles). This situation embodies the first detected trajectory, where the dominance of a certain process consistently increases from small to large events and sampling of frequent floods from ordinary events (e.g., by means of annual maxima or discharge thresholds) leads to prioritizing generation processes that are dominant for upper tails.
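The relative change discussed above can be made concrete with a small sketch. The formula is not spelled out in the text, so the version below is an assumption: the distance between frequent and upper tail floods is compared with the reference distance between all events and upper tail floods, and the difference is normalized by that reference. The function name and the two-type toy distributions are hypothetical.

```python
import numpy as np

def relative_distance_change(p_all, p_frequent, p_upper):
    """Relative change of the Euclidean distance to the upper tail sample.

    Assumed definition: (d(frequent, upper) - d(all, upper)) / d(all, upper).
    Negative values: frequent floods resemble upper tails more than ordinary
    events do. Positive values: frequent floods resemble them less.
    """
    p_all, p_frequent, p_upper = (np.asarray(v, dtype=float)
                                  for v in (p_all, p_frequent, p_upper))
    d_ref = np.linalg.norm(p_all - p_upper)
    d_freq = np.linalg.norm(p_frequent - p_upper)
    if d_ref == 0:
        raise ValueError("reference distance is zero; relative change undefined")
    return float((d_freq - d_ref) / d_ref)

# Toy frequencies of two event types, e.g. [snowmelt-related, rainfall on dry soil]
p_all = [0.5, 0.5]
p_upper_winter = [1.0, 0.0]
# Frequent floods shift toward the type that dominates the upper tail
print(relative_distance_change(p_all, [0.75, 0.25], p_upper_winter))
# Frequent floods shift away from the type that dominates the upper tail
print(relative_distance_change(p_all, [0.75, 0.25], [0.0, 1.0]))
```

Under this reading, the first call mimics a catchment with a consistent increase of one process across samples, while the second mimics a catchment where the process that dominates the upper tail is underrepresented among frequent floods.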
A positive relative change instead implies that frequent floods are more dissimilar to upper tails than ordinary events in terms of event type distributions. This is observed in catchments characterized by mixed flood seasonality (southern and eastern parts of Germany) (Figures 4a and 4b, large red markers, low seasonality). This situation corresponds to the second trajectory (detected in 23-27% of the catchments depending on the sample of frequent floods considered, Table S7), where the contribution of a certain process decreases from ordinary events to frequent floods, but dominates the generation of upper tails. Sampling of frequent floods from ordinary events in this case leads to prioritizing processes that are not dominant for upper tails. For at least a quarter of these catchments (24-44%, Table S7) the difference between event type distributions of frequent and upper tail floods is statistically significant (Figures 4a and 4b, red triangles). As expected, more catchments with statistically different distributions were detected (41-44%, Table S7; Figures 4c and S3, red triangles) if higher return periods (HQ15, HQ20) are considered.
These contrasting trajectories and their possible outcomes for predicting upper tails of flood frequency distributions are well illustrated by considering two exemplary catchments (Figures 4d-4g). In the Rodach catchment (located in Central Germany) the sample of frequent floods accurately represents the distribution of different event types among the largest events. In the Müglitz catchment (located in Eastern Germany), instead, the majority of frequent floods are categorized as Rain-on-snow, Mixtures of rainfall and snowmelt, and Rainfall-induced events during Wet conditions (cold colors), whereas the largest floods are produced by Rainfall-induced events during Dry conditions (warm colors). In this particular mountainous catchment this is caused by the occurrence of rare but intense summer rainfall brought by Vb cyclones (Nied et al., 2014; Petrow et al., 2007), while the majority of annual floods is generated in winter and spring by less intense winter storms interacting with accumulated snowpack. As for most catchments (Figures 2 and 3), the distribution of generation processes in the POT4 sample is very similar to that of MAF in this catchment (Figure 4d) and does not guarantee a more accurate sampling of generation processes for upper tails.
Most catchments attributed to the second trajectory (Figures 4a-4c, red markers) are located in Southern and Eastern Germany, areas characterized by mixed flood seasonality and affected by the above-mentioned Vb cyclones (Hofstätter et al., 2016), and show a large variability in catchment sizes (Figure S4). The detected distributions of generation processes in these catchments (Figures 2 and 3) show that most annual floods are triggered by snowmelt processes (Rain-on-snow, Mixture of rainfall and snowmelt) or Volume-dominated rainfall events occurring in Wet conditions during the winter-spring season (Beurton & Thieken, 2009), while Intensity-dominated rainfall events associated with specific meteorological phenomena like Vb cyclones (Hofstätter et al., 2016; Nied et al., 2014), blocking conditions (Grams et al., 2014), or convective storms (Bronstert et al., 2020) trigger rarer floods under Dry conditions in the summer-autumn season, especially in mountainous catchments.
This is likely to pose a problem for the homogeneity assumption required to fit theoretical distributions. Mixed distributions could be applied instead (Barth et al., 2019; DWA, 2012; England et al., 2018), but their usage with annual or partial duration series might result in highly uncertain estimates if only a few events of the types responsible for generating upper tails are sampled. Similar situations are likely to occur in other regions of the world where relatively infrequent climatic phenomena or conditions that are underrepresented in the annual or partial duration series dominate the upper tail while a large portion of floods is generated by processes of a different nature. The most prominent examples of such phenomena, besides the mentioned Vb cyclones active in Central Europe (Hofstätter et al., 2016), are inland-penetrating atmospheric rivers in central Arizona, USA (Barth et al., 2017), tropical cyclones in the Eastern and Central United States (Villarini et al., 2014), as well as globally widespread organized convective thunderstorms (Doswell et al., 1996). A discrepancy between the event types included in the samples of frequent and upper tail floods supports using ordinary events instead, as they capture the full variety of possible generation processes. In such catchments recently developed metastatistical and physically based approaches (Basso et al., 2016; Marani & Ignaccolo, 2015; Miniussi et al., 2020; Zorzetto et al., 2016) might be a viable alternative to expand information on rare generation processes using the whole sample of ordinary events.

Geophysical Research Letters, 10.1029/2020GL090547
Besides their relevance for the predictability of upper tail events, the revealed differences in the distribution of generation processes from small to larger events provide insights into the causes of observed disparate patterns of changes in small and large floods (Bertola et al., 2020). These alterations, possibly driven by the changing frequency of their corresponding generation processes (e.g., more frequent Intensity-dominated events or less frequent Rain-on-snow events), are likely to continue in the future (Kendon et al., 2014), thus altering the probabilistic structure of the extremes and modifying flood hazard in the affected catchments. Knowledge of the dominant generation processes of the full range of runoff events might indicate directions toward suitable adaptations to future flood risk.

Conclusions
We examined transformations of generation processes from all identifiable runoff events to frequent (MAFs and peak-over-threshold events) and upper tail floods (i.e., with return period of at least 10 years) for a large set of mesoscale catchments, detecting substantial changes of the dominant processes along two contrasting trajectories. In regions where floods occur almost exclusively in winter the dominance of processes related to snowmelt consistently increases from small events to larger floods and the samples of frequent and upper tail floods show similar distributions of generation processes. In mountainous catchments characterized by frequent winter-spring floods related to snowmelt and occasional summer-autumn floods triggered by rare meteorological phenomena (e.g., Vb cyclones), instead, processes that dominate upper tails are not adequately represented in the sample of frequent floods. In these cases, conventional statistical approaches for the estimation of flood frequency from annual maxima and partial duration series, widely used in several fields of geoscience and in engineering practice, should be applied with care. In fact, the prediction of upper tails in these catchments might remain highly uncertain even when mixed distributions are utilized. Future studies might advance in this direction by comparing the performance of traditional and emerging metastatistical approaches in catchments exhibiting different trajectories of process transformation and by examining whether unpredictable floods occur where the upper tail is dominated by rare processes underrepresented in the sample of frequent floods.

... terms of the research group FOR 2416 "Space-Time Dynamics of Extreme Floods (SPATE)" and grant number 421396820 "Propensity of rivers to extreme floods: climate-landscape controls and early detection (PREDICTED)," as well as the Helmholtz Centre for Environmental Research-UFZ, is gratefully acknowledged.
We are grateful to the Editor Valeriy Ivanov and two anonymous reviewers for insightful comments that greatly improved the original manuscript. We thank Arianna Miniussi for useful discussions on metastatistical approaches. We are grateful to Rohini Kumar, Matthias Zink and Luis Samaniego for providing simulations of the mHM model.