Volume 60, Issue 8 e2023WR036900
Research Article
Open Access

Streamflow Intermittence in Europe: Estimating High-Resolution Monthly Time Series by Downscaling of Simulated Runoff and Random Forest Modeling

Petra Döll

Corresponding Author

Institute of Physical Geography, Goethe University Frankfurt, Frankfurt/Main, Germany

Senckenberg Leibniz Biodiversity and Climate Research Centre (SBiK-F) Frankfurt, Frankfurt/Main, Germany

Correspondence to:

P. Döll,

[email protected]

Contribution: Conceptualization, Methodology, Writing - original draft, Writing - review & editing, Supervision, Funding acquisition

Mahdi Abbasi

Institute of Physical Geography, Goethe University Frankfurt, Frankfurt/Main, Germany

Contribution: Methodology, Software, Formal analysis, Data curation, Writing - review & editing, Visualization

Mathis Loïc Messager

INRAE, UR RiverLy, Lyon-Villeurbanne, Paris, France

Department of Geography, McGill University, Montreal, QC, Canada

Contribution: Methodology, Software, Formal analysis, Data curation, Writing - review & editing, Visualization

Tim Trautmann

Institute of Physical Geography, Goethe University Frankfurt, Frankfurt/Main, Germany

Contribution: Methodology, Software, Formal analysis, Writing - review & editing

Bernhard Lehner

Department of Geography, McGill University, Montreal, QC, Canada

Contribution: Conceptualization, Methodology, Writing - review & editing

Nicolas Lamouroux

INRAE, UR RiverLy, Lyon-Villeurbanne, Paris, France

Contribution: Methodology, Writing - review & editing, Funding acquisition

First published: 31 July 2024

Petra Döll and Mahdi Abbasi contributed equally to this work.

Abstract

Knowing where and when rivers cease to flow provides an important basis for evaluating riverine biodiversity, biogeochemistry and ecosystem services. We present a novel modeling approach to estimate monthly time series of streamflow intermittence at high spatial resolution at the continental scale. Streamflow intermittence is quantified at more than 1.5 million river reaches in Europe as the number of no-flow days grouped into five classes (0, 1–5, 6–15, 16–29, 30–31 no-flow days) for each month from 1981 to 2019. Daily time series of observed streamflow at 3706 gauging stations were used to train and validate a two-step random forest modeling approach. Important predictors were derived from time series of monthly streamflow at 73 million 15 arc-sec (∼500 m) grid cells that were computed by downscaling the 0.5 arc-deg (∼55 km) output of the global hydrological model WaterGAP, which accounts for human water use. Of the observed perennial and non-perennial station-months, 97.8% and 86.4%, respectively, were correctly predicted. Interannual variations of the number of non-perennial months at non-perennial reaches were satisfactorily simulated, with a median Pearson correlation of 0.5. While the spatial prevalence of non-perennial reaches is underestimated, the number of non-perennial months is overestimated in dry regions of Europe where artificial storage abounds. Our model estimates that 3.8% of all European reach-months and 17.2% of all reaches were non-perennial during 1981–2019, predominantly with 30–31 no-flow days. Although estimation uncertainty is high, our study provides, for the first time, information on the continent-wide dynamics of non-perennial rivers and streams.

Key Points

  • Streamflow intermittence at more than 1.5 million European reaches was estimated for every month during 1981–2019

  • 18.7% of the European river network length and 3.8% of all reach-months are non-perennial, predominantly with 30–31 no-flow days

  • 15 arc-sec monthly streamflow obtained by downscaling the output of a global hydrological model serves as input to random forest modeling

Plain Language Summary

Even in wet climates, small streams can seasonally dry up. In drier areas, large rivers might not carry water for weeks or months. However, because streamflow observations are lacking for most drying rivers, we know little about when, where, and for how long rivers experience streamflow intermittence, which is crucial for both river life and human water supply. We developed and applied a novel approach to estimate, for the first time, the temporal dynamics of streamflow intermittence across European rivers and streams, including small ones. This approach combines the output of a global hydrological model with streamflow observations and other data. We refined the global model output available for 50 km cells to monthly streamflow in 500 m cells. We then applied a machine learning model to predict the number of days without water flow in each month during the period 1981–2019 for over 1.5 million river segments. We found that 17% of all European segments and 4% of all months at all segments experienced at least one day without flow. In the future, the model will be used to estimate the impact of climate change on streamflow intermittence.

1 Introduction

It has been estimated that most rivers and streams on Earth have reaches that naturally cease to flow or dry at least one day per year (Messager et al., 2021). Natural streamflow intermittence is most prevalent in semi-arid and arid regions, where it may occur even in large rivers (Merrit et al., 2021), but it is also widespread in smaller headwater streams across humid regions. For example, 25%–40% of the total length of streams and rivers in France are estimated to be non-perennial (Snelder et al., 2013). In most basins, the likelihood and degree of streamflow intermittence, that is, the fraction of no-flow days, increases with decreasing mean streamflow or upstream area (Datry et al., 2014; Messager et al., 2021). Waterways can also cease to flow without being dry due to freezing conditions, so non-perennial streams are a significant feature of cold landscapes as well (Buttle et al., 2012; Shanafield et al., 2021). Anthropogenic alterations of the natural flow regime resulting, for example, from human water abstractions or the operation of artificial reservoirs, can increase or decrease the number of no-flow days (Richter et al., 1997). Following Busch et al. (2020), we use the term “non-perennial reach” to refer to a stream or river reach that ceases to flow at least one day during the reference period, without distinguishing between “ephemeral” and “intermittent”. Considering the temporal dynamics of streamflow intermittence, we use the term “non-perennial reach-month” for a month in which water ceases to flow in the reach for at least one day.

While streamflow intermittence can be monitored by measuring streamflow at gauging stations, these measurements come with numerous limitations (Zimmer et al., 2020) and only cover a very small part of all reaches, being particularly sparse where non-perennial conditions prevail, which is often the case in headwater streams (Krabbenhoft et al., 2022; Sauquet, Shanafield, et al., 2021). In addition, streamflow observations are insufficient to derive projections of future changes in intermittence due to climate change, changes in land and water use or the management of artificial reservoirs (Döll & Müller Schmied, 2012; Sauquet, Beaufort, et al., 2021). Therefore, comprehensive analyses of streamflow intermittence and its effects on water resources for humans and other biota require a modeling approach.

Large-scale modeling of streamflow intermittence is necessary for assessments of biodiversity, ecosystem functions and ecosystem services of rivers and streams at national to global scales. Improved knowledge about spatial and temporal patterns of streamflow intermittence helps raise awareness for non-perennial streams and their value and may support a more extensive hydrological and ecological monitoring of these streams. Until now, however, continental- or global-scale modeling studies on streamflow intermittence have either provided only a static classification of river reaches into non-perennial or perennial at high spatial resolution (15 arc-sec, ca. 500 m; Messager et al., 2021) or time series of non-perennial streamflow conditions at a low spatial resolution (0.5 arc-deg, ca. 50 km; Döll & Müller Schmied, 2012). Messager et al. (2021) used random forest (RF) modeling to estimate which river reaches cease to flow at least one day per year or for at least 30 days per year; this was achieved for 23.3 million km of mapped rivers and streams across the globe (except Antarctica) whose long-term average naturalized discharge exceeds 0.1 m3/s. Despite its fine resolution, such a static classification of reaches as either perennial or non-perennial fails to characterize the temporal structure of flow intermittence (e.g., the number of no-flow days or seasonality of intermittence) which is required for analyzing the biodiversity and ecosystem functions of non-perennial streams and rivers (Datry et al., 2018). By contrast, daily streamflow time series simulated by global hydrological models such as the WaterGAP model used in Döll and Müller Schmied (2012) do represent the temporal dynamics of streamflow intermittence. However, these coarser models overlook headwater stream reaches with small drainage basins, which are more prone to intermittence than larger downstream reaches and comprise the majority of global river length (Messager et al., 2021). 
What is missing is an approach for modeling streamflow intermittence at a high spatial and temporal resolution, using dynamic time series data of streamflow rather than static averages. Furthermore, the statistical prediction method should be capable of estimating the impact of climate change on streamflow intermittence.

Simulating daily streamflow in small headwater streams requires small computational grid cell sizes (e.g., 500 m or less). Such small grid cells can easily be implemented in hydrological models if the drainage basin of study is small (Mahoney et al., 2023; Yu et al., 2020). However, this is not feasible across large geographic extents like entire continents or the world due to the lack of high-resolution climate data at these scales and computational constraints resulting from the large number of small (high-resolution) grid cells (Bierkens et al., 2015; Döll et al., 2016). A 0.5 arc-deg grid cell, typical for global hydrological models, contains 14,400 individual 15 arc-sec grid cells; in Europe alone (without Russia and Turkey), for example, there are about 73 million 15 arc-sec cells. Furthermore, hydrological models are often less successful in simulating low flows than mean flows (Zaherpour et al., 2018). Most hydrological models are process-based, that is, they attempt to estimate water storage and fluxes across the different compartments of the terrestrial part of the hydrological cycle with sets of mathematical equations (Telteu et al., 2021). However, a satisfactory process-based simulation of low-flow, and particularly no-flow conditions, is difficult even at small scales, in part because the simulation of two-way exchange flows between surface water bodies and groundwater bodies requires coupling of a hydrological model with a gradient-based groundwater model (Döll et al., 2016). To help advance the science and management of freshwater ecosystems globally, new approaches are thus needed to produce large-scale high-resolution models of streamflow intermittence that provide information on the frequency, duration and timing of flow cessation across the entire river network, from the headwaters to river mouths.
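The cell count quoted above follows directly from the two grid resolutions. A minimal sketch of the arithmetic (the variable names are illustrative and not taken from the study):

```python
# Grid resolutions in arc-seconds (1 arc-degree = 3600 arc-seconds).
LR_ARCSEC = 0.5 * 3600   # low-resolution cell of a global model: 0.5 arc-deg
HR_ARCSEC = 15           # high-resolution cell: 15 arc-sec

# Number of HR cells along one side of an LR cell, and per LR cell.
cells_per_side = int(LR_ARCSEC / HR_ARCSEC)
cells_per_lr_cell = cells_per_side ** 2

print(cells_per_side, cells_per_lr_cell)
```

Each LR cell is thus a 120 by 120 block of HR cells, which illustrates why running a daily process-based model directly at 15 arc-sec becomes computationally demanding at the continental scale.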

Machine learning methods such as RF have the advantage over process-based models in that they do not require detailed knowledge of the processes underlying the phenomenon of interest and are thus a promising tool to produce large-scale high-resolution predictions of no-flow conditions. However, to achieve temporally explicit predictions, these models require temporally explicit predictors. The respective strengths of global hydrological models and machine learning methods can hence be combined by using the dynamic output of the former as an input predictor for the latter to achieve large-scale high-resolution modeling of the temporal structure of streamflow intermittence.

Here, we present such a combined modeling approach for computing monthly time series of streamflow intermittence conditions at the continental scale for river reaches that can be defined with a spatial resolution of 15 arc-sec. This high spatial resolution has already been used for globally predicting the prevalence of non-perennial streams and is necessary for representing fine-scale variations in streamflow intermittence observed in humid regions (Messager et al., 2021). Our RF modeling approach combines temporally explicit predictor variables derived from the low-resolution (LR, 0.5 arc-deg) state-of-the-art global hydrological model WaterGAP 2.2e (Müller Schmied et al., 2021) with several high-resolution (HR, 15 arc-sec) static predictor variables (e.g., drainage area and irrigated area). As part of this approach, WaterGAP LR output is spatially downscaled to derive HR monthly time series of streamflow. While all predictors used in the model are based on globally available data, the approach was developed using time series of daily streamflow observed at 3706 gauging stations throughout Europe (resulting in more than 1 million station-months with information on the number of no-flow days). It was then applied to estimate streamflow intermittence in Europe.

Section 2 presents the data and methods of this study. In Section 3, the downscaled HR monthly streamflow time series are compared to observations at all gauging stations that were used to set up and calibrate the RF model. RF model performance and results of the RF application are presented in Section 4. Section 5 provides validation and discussion of the streamflow intermittence modeling approach, while conclusions are drawn in Section 6.

2 Methods and Data

Below, we first explain the downscaling method applied to derive HR time series of streamflow from the LR output of the global hydrological model WaterGAP (Section 2.1). We then describe the compilation of the data set of observed daily streamflow in Europe that was used for both validating the HR streamflow and for deriving the target data of the RF modeling approach (Section 2.2). In Section 2.3, the RF modeling approach, which consists of two sequential RF models, is described, including the predictor variables (Section 2.3.2). Both RF models were calibrated by twice-repeated three-fold nested cross-validation, using oversampling of the minor classes of the target observations and hyperparameter tuning (Section 2.3.3). This is followed by the definition of European river reaches for which model predictions are made (Section 2.4). In Section 2.5, performance metrics are explained.

The hydrographic data set applied throughout this study is the global HR drainage direction map of HydroSHEDS v1 (Lehner et al., 2008; www.hydrosheds.org). This data set represents, for each 15 arc-sec grid cell on land, the direction in which water would flow from that cell to its neighboring cells given topography. It serves to downscale LR outputs from WaterGAP, to co-register streamflow gauging stations, to delineate river reaches for which the RF model produces predictions and to quantify predictors that are aggregated over the upstream areas of stations and reaches. In this study for Europe, HydroSHEDS was modified in three drainage basins (each about 200 km2) in Finland, Hungary and Croatia due to their use as case study basins in the related DRYvER project (see Döll, Abbasi, & Trautmann, 2023 and https://www.dryver.eu/about/case-studies).
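To illustrate how a drainage direction map supports the aggregation of quantities over upstream areas, the following minimal sketch accumulates drainage area along a toy network. It assumes a simplified representation in which each cell simply points to its downstream neighbor; this is not the raster encoding used by HydroSHEDS, and the function name and cell areas are hypothetical.

```python
def upstream_area(downstream, cell_area):
    """Accumulate area along a drainage direction map.

    downstream: dict cell -> next downstream cell (None at outlets/sinks)
    cell_area:  dict cell -> area of the cell itself (km2)
    Returns dict cell -> total drainage area including the cell itself.
    """
    # Count how many cells drain directly into each cell.
    n_inflows = {c: 0 for c in downstream}
    for c, d in downstream.items():
        if d is not None:
            n_inflows[d] += 1

    total = dict(cell_area)
    # Start from headwater cells (no inflows) and move downstream.
    ready = [c for c, n in n_inflows.items() if n == 0]
    while ready:
        c = ready.pop()
        d = downstream[c]
        if d is None:
            continue
        total[d] += total[c]
        n_inflows[d] -= 1
        # A cell is complete once all of its inflows have been added.
        if n_inflows[d] == 0:
            ready.append(d)
    return total


# Toy network: A and B drain into C, and C drains into the outlet D.
ddm = {"A": "C", "B": "C", "C": "D", "D": None}
area = {c: 0.25 for c in ddm}  # illustrative cell area in km2
acc = upstream_area(ddm, area)
print(acc)
```

The same traversal could carry any other cell-level quantity (e.g., irrigated area or lake area) to obtain upstream means or fractions of the kind used as predictors.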

2.1 Downscaling of LR WaterGAP Output to Obtain Time Series of Monthly HR Streamflow

2.1.1 WaterGAP

WaterGAP is a global water resources and use model that covers all continents except Antarctica (see Müller Schmied et al., 2021, for details). It computes time series of water use for irrigation, livestock, manufacturing, cooling of thermal power plants and households, distinguishing groundwater and surface water sources. It also simulates water flows (e.g., evapotranspiration and runoff) and water storages (e.g., in soil and groundwater), taking into account the impact of net abstractions from groundwater and surface water bodies as well as of artificial reservoirs. However, only the operation of the globally largest 1,109 artificial reservoirs (including regulated lakes) is simulated explicitly in WaterGAP, while smaller reservoirs only add to the fraction of each LR cell that is made up of so-called local lakes, thus affecting evapotranspiration and flow dynamics in a very coarse way only. In WaterGAP, daily water flows and storages of 10 storage compartments are simulated in each LR grid cell. Total runoff from land is partitioned into fast (surface) runoff and groundwater recharge. Surface runoff from within a grid cell reaches surface water bodies (wetlands, lakes, reservoirs and rivers) on the same day, while groundwater recharge flows from the soil into the groundwater, which then releases groundwater discharge to surface water bodies as a function of groundwater storage. Only one river is assumed to exist within each LR grid cell, and the streamflow computed by WaterGAP refers to the outflow from the LR grid cell to the next downstream grid cell, which is prescribed by the LR drainage direction map DDM30 (Döll & Lehner, 2002). Groundwater discharge to surface water bodies may become zero in case of groundwater depletion, but the loss of streamflow to the groundwater cannot be simulated. 
The LR WaterGAP output used in this study was computed by forcing version 2.2e of WaterGAP with the climate data set GSWP3-W5E5 (Müller Schmied et al., 2023a) for the time period 1901–2019. The model was calibrated against long-term mean annual streamflow observed at 1509 gauging stations globally (with a drainage area of at least 9,000 km2) by adjusting 1–3 model parameters. This calibration helps to avoid large over- or underestimations of simulated mean annual LR runoff and streamflow such that the fit to observed streamflow is often superior to that of other global hydrological models (e.g., Zaherpour et al., 2018).

2.1.2 Downscaling Approach

A number of approaches for generating time series of high-resolution streamflow from the output of global hydrological models were recently developed (Chuphal & Mishra, 2023; Kallio et al., 2021; Lin et al., 2019). Our approach for downscaling the LR output of a global hydrological model to HR streamflow is based on the conceptual framework developed by Lehner and Grill (2013), which was previously applied at the global scale (e.g., Linke et al., 2019). We chose this approach because, to our knowledge, it is currently the only downscaling method that utilizes multiple runoff and discharge components from the coarse-scale input model, thereby incorporating streamflow modulations that occur during the river routing process. In this study, we generalized and adapted the approach, including some simplifications, to enable a computationally efficient generation of HR time series of monthly streamflow. As a distinct feature, the downscaling approach does not simply disaggregate and then route the sum of LR surface runoff and groundwater recharge (i.e., total runoff from land) along the HR river network, as this would disregard water retention in groundwater and surface water stores, evaporation from lakes and wetlands, as well as human water use. Instead, our approach uses both surface runoff and groundwater discharge estimates from the LR model and projects the results onto the HR river network using geospatial interpolation methods. Considering the original LR groundwater discharge estimates allows for a better representation of HR streamflow variability because it takes into account the storage capacity of groundwater aquifers that smoothen and delay the streamflow signal. Further corrections incorporate the LR net cell runoff of WaterGAP, that is, the difference between the outflow from the LR cell and the inflow from all upstream cells, which accounts for the dynamics of surface water bodies and anthropogenic streamflow alterations that occur inside the LR cell.

Here, we only describe the core elements of the downscaling method; for details see Text S1 in Supporting Information S1. The sum of LR monthly surface runoff and groundwater discharge (expressed as specific volume flow per unit area, i.e., m3 s−1 km−2) is first interpolated from the original 0.5 arc-deg resolution to an intermediate resolution of 0.1 arc-deg to reduce abrupt changes in streamflow at the edges between LR cells. This is performed using an inverse distance interpolation with a power of 2 and taking into account the nearest 9 LR data points. A maximum interpolation radius of 1.8 arc-deg is allowed to extend data into areas where land cells are represented in the HR hydrography but not in the LR river network. This is the case in coastal regions and in missing cells within large lakes of the LR model. In the next step, the 0.1 arc-deg values are disaggregated to the 15 arc-sec HR grid cells by assigning the same 0.1 arc-deg value to all respective HR cells and assigning null values to HR cells outside of the continental boundaries of HydroSHEDS.
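The interpolation step can be sketched as follows, using the parameters stated above (power of 2, nearest 9 data points, maximum radius of 1.8 arc-deg). This is a simplified illustration, not the study's implementation: it assumes planar distances in degrees between LR cell centers, and the function name and data layout are hypothetical.

```python
import math


def idw(target, points, power=2, k=9, max_radius=1.8):
    """Inverse-distance-weighted value at `target` (lon, lat in degrees).

    points: list of ((lon, lat), value) pairs for LR cell centers.
    Uses the k nearest points within max_radius; returns None if no
    point lies within that radius.
    """
    dists = []
    for (x, y), v in points:
        d = math.hypot(x - target[0], y - target[1])
        if d <= max_radius:
            dists.append((d, v))
    if not dists:
        return None
    dists.sort(key=lambda t: t[0])
    nearest = dists[:k]
    # A target that coincides with a data point takes that value directly.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


# Two LR cell centers with specific runoff values (illustrative numbers).
lr_points = [((0.0, 0.0), 1.0), ((1.0, 0.0), 3.0)]
midpoint_value = idw((0.5, 0.0), lr_points)
print(midpoint_value)
```

In the subsequent disaggregation step, each 0.1 arc-deg value would be copied to the 24 by 24 block of 15 arc-sec cells it covers (0.1 arc-deg equals 360 arc-sec).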

The disaggregated runoff is then corrected to integrate information from the routing routine of the LR model, in particular about the impact of surface water bodies and human water use on streamflow (see Text S1 in Supporting Information S1 for details). For example, the global hydrological model WaterGAP computes streamflow not only by routing surface runoff and groundwater discharge but also by considering the impacts of reservoirs, lakes and wetlands as well as human abstractions of groundwater and surface water within each LR grid cell. These impacts are included in the downscaling by calculating and applying correction factors that are derived from the LR input and projected onto the HR output. Further correction terms are added in specific grid cells, such as HR endorheic sinks, outflow cells of lakes and reservoirs, or cells containing large rivers with a drainage area of more than 50,000 km2. The final correction term is applied in a spatially weighted way to the HR grid cells. With the help of the correction weight, a greater share of the total correction occurs in the downstream HR grid cells within each LR cell, which reflects the assumption that downstream HR cells are more affected by surface water bodies and human water use than upstream cells within the LR cell. The downscaling approach was implemented in Python.

2.2 Compilation and Processing of Measured Streamflow for the Computation of Target Observations and the Validation of Simulated HR Streamflow

Long-term historical information on the number of no-flow days per month in waterways, the target for the RF modeling, can only be derived consistently from uninterrupted observations of daily mean streamflow at gauging stations, that is, if no daily streamflow value is missing in a month. We collected most of these observations from the Global Runoff Data Centre (GRDC; https://www.bafg.de/GRDC/) and the Global Streamflow Indices and Metadata archive (GSIM; Do et al., 2018; Gudmundsson et al., 2018), the largest existing global repositories of streamflow gauging station data. Altogether, daily streamflow records for 2930 GRDC and GSIM stations are available through these data sets for Europe. However, most of the GRDC and GSIM stations are on perennial streams, without any no-flow days in their record, which reflects the global underrepresentation of streamflow gauging stations on non-perennial river reaches (Krabbenhoft et al., 2022). Therefore, we used metadata on gauging stations with flow intermittence in 19 European countries from the SMIRES meta-database (Sauquet, 2020) to obtain daily streamflow time series directly from national streamflow data providers for 375 additional gauging stations listed in the database. As flow intermittence in Europe is most prevalent in Mediterranean regions, we additionally retrieved daily streamflow data from governmental websites for 55 gauging stations in Corsica (https://www.sandre.eaufrance.fr/), 648 in Italy (http://meteoniardo.altervista.org/) and 1,031 in Spain (https://ceh.cedex.es/anuarioaforos/demarcaciones.asp).

From this compiled streamflow data set, records suitable for deriving target observations were selected for subsequent analyses. We first checked whether each gauging station was correctly located on the updated 15 arc-sec HydroSHEDS drainage direction map by comparing the upstream area given in the metadata with the upstream area of the HR cell where the station was located. Confirmatory checks also involved inspecting high-resolution satellite imagery and comparing the river and station names provided in the metadata to topographic maps (ESRI ArcGIS basemaps). If the drainage areas deviated by more than 10%, the stations were manually relocated to a suitable HR grid cell with a deviation of less than 10% and/or associated to a river or stream with the correct name in topographic maps (if provided in the metadata). If this was not possible, the station was excluded from the RF modeling. For the remaining stations, we excluded all station-months with any missing or suspicious daily flow values following the approach of Gudmundsson et al. (2018). We then excluded all stations that had less than 36 station-months of daily streamflow data. Finally, we labeled all days with a mean streamflow of 0.001 m3 s−1 or less as no-flow days and computed, as the target of the RF modeling, the number of no-flow days per month and station (i.e., per station-month). The maximum period with observed no-flow days and streamflow per station-month is 1981–2019 (468 months).

In total, data on streamflow at 3706 stations during 1981–2019 were used for calibrating and validating the RF models, corresponding to 1,166,944 station-months (26 years of useable data per station on average). While 2.80% of the station-months were non-perennial, that is, had at least one no-flow day within the month, 23.9% of the stations had at least one no-flow day over the whole study period. 0.48% of the station-months had 1–5 no-flow days, 0.52% 6–15 no-flow days, 0.74% 16–29 no-flow days and 1.06% 30–31 no-flow days. 30–31 no-flow days represent months in which the stream is (almost) completely without streamflow. In addition, daily streamflow values were aggregated to monthly values for the same station-months to serve for the validation of the simulated HR streamflow (Section 2.1).
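The derivation of the target variable can be illustrated with a short sketch. It uses the no-flow threshold of 0.001 m3 s−1 and the five classes given in the text; the function names and the use of None for missing days are assumptions made for illustration. Note that a literal mapping places a completely dry February (28 or 29 days) in the 16–29 class; how such months were treated in the study is not specified here.

```python
NO_FLOW_THRESHOLD = 0.001  # m3/s, as stated in the text


def no_flow_days(daily_flow):
    """Count no-flow days in one station-month.

    daily_flow: list of daily mean streamflow values (m3/s), with None
    marking a missing day. Returns None if any day is missing, because
    such station-months are excluded from the target data.
    """
    if any(q is None for q in daily_flow):
        return None
    return sum(1 for q in daily_flow if q <= NO_FLOW_THRESHOLD)


def intermittence_class(n_days):
    """Map a no-flow day count to one of the five classes."""
    if n_days == 0:
        return "0"
    if n_days <= 5:
        return "1-5"
    if n_days <= 15:
        return "6-15"
    if n_days <= 29:
        return "16-29"
    return "30-31"


# Illustrative 31-day month: 20 flowing days, 7 days at the threshold, 4 dry days.
month = [0.5] * 20 + [0.001] * 7 + [0.0] * 4
n = no_flow_days(month)
print(n, intermittence_class(n))
```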

2.3 Random Forest Modeling Approach

2.3.1 Overview

The supervised machine learning method RF is well suited for both classification and regression tasks (Breiman, 2001). RF modeling has already been used for hydrological classification problems, that is, for predicting classes of hydrological characteristics including intermittence (global: Messager et al., 2021; Australia: Bond & Kennard, 2017; France: Snelder et al., 2013). Tyralis et al. (2019) provide a review of RF methods with a focus on hydrological applications.

With less than 3% of all observed station-months in our European streamflow data set being non-perennial (Section 2.2), the data set of target observations used for training the model is highly imbalanced, which can severely bias the resulting predictions (Japkowicz & Stephen, 2002). To mitigate this problem, two RFs are set up sequentially in our modeling approach. The first RF is developed to predict months with and without no-flow days (non-perennial station-months and perennial station-months, respectively) in a binary way. The second RF is trained only with data for non-perennial station-months to predict the number of no-flow days in four classes. The two calibrated RFs were then successively applied to predict the occurrence of five intermittence classes (0, 1–5, 6–15, 16–29 and 30–31 no-flow days per month) for each of the 468 months from 1981 to 2019 at more than 1.5 million river reaches in Europe (without Russia and Turkey, see Section 2.4 for the definition of reaches).
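The sequential application of the two RFs can be sketched as a simple cascade. The two model callables below are hypothetical stand-ins rather than trained random forests, and the probability threshold of 0.5 is an assumption; the sketch only shows how a binary prediction and a four-class prediction combine into five intermittence classes.

```python
def predict_class(predictors, rf_binary, rf_classes, threshold=0.5):
    """Combine two sequential classifiers into one five-class prediction.

    rf_binary(predictors)  -> probability that the month is non-perennial
    rf_classes(predictors) -> dict mapping each of the four non-perennial
                              classes to a probability
    Both callables and the threshold are placeholders for illustration.
    """
    if rf_binary(predictors) < threshold:
        return "0"  # perennial month: no no-flow days
    probs = rf_classes(predictors)
    # Pick the non-perennial class with the highest probability.
    return max(probs, key=probs.get)


def toy_binary(p):
    # Hypothetical stand-in for the first RF: low flow, high probability.
    return 0.9 if p["Q"] < 0.001 else 0.1


def toy_classes(p):
    # Hypothetical stand-in for the second RF: fixed class probabilities.
    return {"1-5": 0.1, "6-15": 0.2, "16-29": 0.3, "30-31": 0.4}


dry_month = predict_class({"Q": 0.0002}, toy_binary, toy_classes)
wet_month = predict_class({"Q": 0.02}, toy_binary, toy_classes)
print(dry_month, wet_month)
```

Because the second model is only consulted for months flagged as non-perennial, it can be trained on the much smaller and less imbalanced subset of non-perennial station-months.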

In this study, we used a derivative of the standard RF algorithm for making probabilistic predictions of class membership (Malley et al., 2012), which is included in the “ranger” R package (Wright & Ziegler, 2017) that we used for the RF modeling. The “ranger” R package is a fast implementation of RF suited for high-dimensional data (Tyralis et al., 2019). The two consecutive RF models were trained and optimized by cross-validation, that is, calibrated, by relating observations of the number of no-flow days per station-month at streamflow gauging stations (target of RFs) to 23 predictors, 7 of them temporally-explicit (i.e., dynamic). For the RF training, each streamflow station is assigned to an HR grid cell. For RF predictions, each European river reach is assigned to the HR cell containing its downstream end. In the following two sections, the compilation and processing of the predictor variables and the 2-step RF modeling approach are explained.

2.3.2 Predictors

A total of 23 predictor variables were used in both RFs (Table 1). We selected predictors based on their potential causal influence on streamflow intermittence, taking into account the robust body of literature about the controls of streamflow intermittence (Costigan et al., 2016; Hammond et al., 2021; Shanafield et al., 2021). In contrast to multiple regression analysis, RF can leverage information from highly correlated predictors while producing unbiased predictions (Tyralis et al., 2019). To predict monthly time series of the number of no-flow days, predictors that vary from month to month are required. While climate time series might be considered as predictors, monthly time series of area-specific streamflow are more directly related to the monthly number of days without streamflow than climate data, which would also have to be downscaled to HR values. Considering the streamflow conditions of a stream reach during several months, no-flow days are more likely to occur during a month with low monthly streamflow than in a month with high monthly streamflow.

Table 1. Predictors Used in RF Modeling, With Their Abbreviations, Units and Data Sources
Category | Predictor type | Predictor | Abbreviation (unit) | Source
--- | --- | --- | --- | ---
Hydrology | Monthly time series, HR | Monthly area-specific streamflow | Q (m3 s−1 km−2) | Downscaled WaterGAP 2.2e
Hydrology | | Minimum monthly area-specific streamflow of the past 12 months | Q_min_p12 (m3 s−1 km−2) | Downscaled WaterGAP 2.2e
Hydrology | | Mean monthly area-specific streamflow of the past 12 months | Q_mean_p12 (m3 s−1 km−2) | Downscaled WaterGAP 2.2e
Hydrology | | Minimum monthly area-specific streamflow of the past 3 months | Q_min_p3 (m3 s−1 km−2) | Downscaled WaterGAP 2.2e
Hydrology | | Mean monthly area-specific streamflow of the past 3 months | Q_mean_p3 (m3 s−1 km−2) | Downscaled WaterGAP 2.2e
Hydrology | Monthly time series, LR | Ratio of diffuse groundwater recharge to runoff from land, mean over ub^a | gwr_to_runoff_ratio (%) | WaterGAP 2.2e
Climate | | Number of wet days, mean over ub^a | wet_days (days mon−1/100) | WaterGAP 2.2e
Hydrology | | Interannual variability of monthly area-specific streamflow, per calendar month, in terms of standard deviation | Q_iav_sd (m3 s−1 km−2) | Downscaled WaterGAP 2.2e
Hydrology | | Interannual variability of monthly area-specific streamflow, per calendar month, in terms of coefficient of variation | Q_iav_cv (−) | Downscaled WaterGAP 2.2e
Climate | Static, HR | Aridity index (long-term average P/PET), per calendar month, mean over ub^a | P_to_PET_ratio (1/10,000) | Global-AI_PET_v3^c
Land cover | | Potential natural vegetation classes (ranges: 1–15), spatial majority in ub^a | pot_nat_vegetation (−) | EarthStat^d
Land cover | | Land cover classes (ranges: 1–22), spatial majority in ub^a | land_cover (−) | GLC2000^e
Land cover | | Glacier area fraction in ub^a | glacier_frac (%) | GLIMS^f
Physiography | | Drainage area | drainage_area (km2) | HydroSHEDS^g
Physiography | | Terrain slope, mean over ub^a | slope (deg/100) | EarthEnv-DEM90^h
Geology | | Fraction of karst area in ub^a | karst_frac (%) | WOKAM^i
Geology | | Occurrence of karst (100 if karst, 0 if not) at HR grid cell | karst_status (−) | WOKAM^i
Anthropogenic drivers | | Fraction of area equipped for irrigation in ub^a | irri_frac (%/100) | HID v1.0^j
 | | Fraction of area equipped for irrigation in iub^b | irri_frac_im (%/100) | HID v1.0^j
 | | Population density in ub^a | pop_dens (people km−2) | WorldPop^k
 | | Population density in iub^b | pop_dens_im (people km−2) | WorldPop^k
 | | Degree of regulation (total upstream artificial reservoir storage volume/annual streamflow volume) at HR grid cell | dor (%/10) | HydroSHEDS^g & GranD^l
Lakes | | Fraction of lake area in ub^a | lake_frac (%/100) | HydroLAKES^m
  • Note. Area-specific streamflow is streamflow at the HR grid cell divided by upstream drainage area. The units are those for the data sets used as input to the RF modeling, in which the integer values were partly multiplied by 10, 100 or 10,000 to increase the precision.
  • a ub: HR upstream basin.
  • b iub: HR immediate upstream drainage basin, refers to all the HR grid cells that drain directly into the respective stream reach.
  • c Zomer et al. (2022).
  • d Ramankutty and Foley (1999).
  • e Bartholomé and Belward (2005).
  • f GLIMS & NSIDC (2012).
  • g Lehner et al. (2008).
  • h Robinson et al. (2014).
  • i Chen et al. (2017).
  • j Siebert et al. (2015).
  • k Bondarenko et al. (2020).
  • l Lehner et al. (2011).
  • m Messager et al. (2016).

We therefore used a total of seven predictors that are derived from the time series of WaterGAP HR monthly specific streamflow, of which five are dynamic, that is, vary from month to month. The five HR dynamic predictors indicate the streamflow conditions in each HR stream reach (represented by its respective most downstream HR cell) in the current month and the past 3 and 12 months. Before computing these predictors, streamflow was converted into area-specific streamflow by dividing it by the drainage area of the HR cell (i.e., the area of the upstream drainage basin). This area normalization is applied because streamflow (m3/s) can vary by multiple orders of magnitude within short flow distances, whereas specific streamflow (m3/(s km2)) is more indicative of the general runoff formation characteristics of a region. Specific streamflow is therefore a better complementary predictor for streamflow intermittence in the RF model when combined with the drainage area of the HR cell as a static predictor (Table 1).
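
The derivation of the five HR dynamic predictors from a monthly streamflow series can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and it assumes that "past 3" and "past 12" months refer to the months preceding the current month and that incomplete windows at the start of the series are left undefined.

```python
from statistics import mean


def dynamic_streamflow_predictors(streamflow_m3s, drainage_area_km2):
    """Return one dict with the five HR dynamic predictors per month.

    streamflow_m3s: monthly streamflow at the HR cell (m3/s), oldest first.
    drainage_area_km2: upstream drainage area of the HR cell (km2).
    """
    # Area normalization: specific streamflow in m3/(s km2).
    q = [flow / drainage_area_km2 for flow in streamflow_m3s]
    rows = []
    for i, q_now in enumerate(q):
        row = {"Q": q_now}
        for window in (3, 12):
            # ASSUMPTION: the window covers the months before month i
            # and excludes month i itself.
            past = q[max(0, i - window):i]
            complete = len(past) == window
            # ASSUMPTION: incomplete windows are reported as None.
            row[f"Q_min_p{window}"] = min(past) if complete else None
            row[f"Q_mean_p{window}"] = mean(past) if complete else None
        rows.append(row)
    return rows
```

For a cell with a drainage area of 10 km2 and streamflow of 10, 20, 30 and 40 m3/s in four consecutive months, the fourth month has Q = 4, Q_min_p3 = 1 and Q_mean_p3 = 2 m3/(s km2), while the 12-month predictors are still undefined.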

Of the two LR dynamic predictors, one is the ratio of diffuse groundwater recharge to total runoff from land for each month as computed by WaterGAP; a higher ratio of delayed groundwater discharge is expected to decrease the likelihood of no-flow days. The other is the average number of days with substantial precipitation (i.e., >2.5 mm/d) per month according to the WaterGAP climate forcing; a low number of days with substantial precipitation in a month may lead to an increased likelihood of no-flow days. While a spatially more highly resolved European daily precipitation data set is available for historic conditions (E-OBS, 0.1 arc-deg, https://surfobs.climate.copernicus.eu/dataaccess/access_eobs.php), the LR input data set of WaterGAP was used in our study as the developed RF model is to be applied to project the impact of climate change on streamflow intermittence, and precipitation projections of global climate models that are bias-adjusted to this precipitation data set are available from the ISIMIP (www.isimip.org). For both LR dynamic predictors, the average value over the upstream basin of each HR grid cell was computed assuming that the values in all upstream HR cells are identical within a given LR cell.

The five HR and two LR dynamic predictors vary between the 468 months of the study period. Three of the 16 static HR predictors vary with the calendar month: the two predictors that quantify the interannual variability of monthly streamflow, and the aridity index, which is included as long-term mean values for the 12 calendar months. Interannual variability was computed from the HR monthly time series of area-specific streamflow as either the standard deviation or the coefficient of variation of all streamflow values of each of the 12 calendar months for the period 1981–2019 (Table 1). It is expected that more no-flow days occur in the case of a large interannual variability of streamflow.
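
The two interannual variability predictors can be illustrated with a short sketch that groups a monthly series by calendar month. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def interannual_variability(q_monthly, first_month=1):
    """Return {calendar_month: (sd, cv)} for a monthly specific-streamflow series.

    q_monthly: monthly values, oldest first.
    first_month: calendar month (1-12) of the first value.
    """
    by_month = {m: [] for m in range(1, 13)}
    for i, value in enumerate(q_monthly):
        calendar_month = (first_month - 1 + i) % 12 + 1
        by_month[calendar_month].append(value)

    result = {}
    for month, values in by_month.items():
        if not values:
            continue
        mu = mean(values)
        sd = pstdev(values)  # ASSUMPTION: population standard deviation
        cv = sd / mu if mu > 0 else None  # CV is undefined for a zero mean
        result[month] = (sd, cv)
    return result
```

Each reach thus receives 12 values of Q_iav_sd and 12 values of Q_iav_cv, and the value matching the calendar month of a given reach-month is used as the predictor.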

In addition to drainage area, the other 12 HR static predictors include the dominant potential natural and actual land cover class in the upstream basins and the average slope, glacier fraction and lake fraction in the upstream basin. These static HR predictors are selected from the set of globally important predictors from Messager et al. (2021). Additional static predictors include five suspected anthropogenic drivers of streamflow intermittence and two newly developed karst-related predictors derived from the World Karst Aquifer Map (WOKAM) of karstifiable areas (Table 1). It is expected that intermittence occurs more often in karst regions than in other areas due to the high permeability of karst. In the case of the static predictors of karst status and degree of regulation, the value for the HR grid cell for which the number of no-flow days is to be predicted (target cell) is used as a predictor. In the case of the anthropogenic drivers irrigated area fraction and population density, two sets of predictor values are taken into account for each: one set of values computed by aggregating over the (total) upstream basin and the other computed by aggregating over the immediate upstream basin, which only encompasses upstream HR grid cells that drain directly into the respective stream reach (see Linke et al., 2019, for additional descriptions of these spatial units).

To train the RF models, the values of these predictors were assembled for each of the 1,166,944 station-months for which daily streamflow observations are available, that is, for the 3706 HR grid cells that contain a gauging station. For the model application, the predictor values for each reach (i.e., for the most downstream HR grid cell of each reach) were computed to predict the occurrence of one of the five intermittence classes for each reach-month.

2.3.3 Two-Step RF Modeling Approach

The first RF model in our approach (Figure 1, step 1) results in a binary classification of station-months as either non-perennial or perennial, whereas the second RF model (step 2) was only applied to non-perennial station-months and classified them into four ordinal non-perennial classes: 1–5, 6–15, 16–29 and 30–31 no-flow days per month. We performed a classification into four classes based on a previous study with a two-step RF model (with fewer target observations and slightly different predictors) where the performance for six classes was not satisfactory (Döll, Abbasi, & Trautmann, 2023). The four classes were defined such that they are informative for biodiversity and ecosystem function studies while keeping the number of observations per class approximately balanced. Following model training and validation for each of the RFs (Figure 1, left-hand side), we sequentially applied the calibrated models (right-hand side) to predict monthly streamflow intermittence for all reaches in Europe derived from the HR drainage direction map (see Section 2.4).
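
The class definition and the sequential application of the two models can be sketched as follows. The class boundaries are taken from the text; `step1_is_non_perennial` and `step2_class` are hypothetical callables that stand in for the two calibrated RFs.

```python
def intermittence_class(no_flow_days):
    """Map the number of no-flow days in a month to the class index 0-4."""
    if not 0 <= no_flow_days <= 31:
        raise ValueError("no_flow_days must be between 0 and 31")
    if no_flow_days == 0:
        return 0  # perennial month
    if no_flow_days <= 5:
        return 1  # 1-5 no-flow days
    if no_flow_days <= 15:
        return 2  # 6-15 no-flow days
    if no_flow_days <= 29:
        return 3  # 16-29 no-flow days
    return 4      # 30-31 no-flow days


def predict_two_step(predictors, step1_is_non_perennial, step2_class):
    """Apply the step 1 model first and the step 2 model only if needed.

    step1_is_non_perennial and step2_class are hypothetical stand-ins
    for the two calibrated RF models.
    """
    if not step1_is_non_perennial(predictors):
        return 0
    return step2_class(predictors)
```

The first function applies to the observed targets, while the second mirrors the order in which the two models are applied to each reach-month.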

Figure 1. RF modeling workflow for simulating monthly time series of streamflow intermittence on river reaches, that is, the number of no-flow days per reach-month in five classes (0, 1–5, 6–15, 16–29, 30–31 no-flow days). Each of the two RF models is set up by calibrating it such that the observed targets are best simulated; this includes the tuning of three hyperparameters in a non-spatial cross-validation (left-hand side of schematic). The intermittence status of each month and reach in Europe during 1981–2019 is calculated with the two calibrated RFs by applying first the step 1 RF for all reach-months and then the step 2 RF for all non-perennial reach-months (right-hand side of schematic).

Despite implementing the two-step approach, class imbalance persists in each step of the modeling process, with many more perennial station-months than non-perennial ones in step 1 and a relatively large number of station-months with 30–31 no-flow days in step 2. Therefore, we applied standard oversampling of the minority class (non-perennial) in step 1 by a factor of 34.68, the ratio of perennial to non-perennial station-months. In step 2, the three minority classes were oversampled such that, for each minority class, the number of training observations in that class was equal to the number of observations in the majority class (30–31 no-flow days).
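
The oversampling factor and the balancing of classes can be illustrated with a small sketch. The station-month counts are those reported in Table 2; the function names are hypothetical, and random resampling with replacement is an assumption about how "standard oversampling" was carried out.

```python
import random


def oversampling_factor(n_majority, n_minority):
    """Ratio of majority to minority observations."""
    return n_majority / n_minority


def oversample_to_majority(labels, seed=0):
    """Return sample indices with every class resampled up to the majority size."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    target = max(len(members) for members in by_class.values())

    selected = []
    for members in by_class.values():
        selected.extend(members)
        # ASSUMPTION: the shortfall is drawn with replacement from the same class.
        selected.extend(rng.choices(members, k=target - len(members)))
    return selected


# Step 1: perennial versus non-perennial station-months (Table 2).
step1_factor = oversampling_factor(1_134_237, 32_707)
```

Rounded to two decimals, `step1_factor` reproduces the factor of 34.68 stated above.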

Unlike hydrological models, RF models, which are ensembles of classification trees, do not have parameters that represent the properties of the modeled system. However, they have hyperparameters that determine exactly how the RF algorithm works and that can be tuned to maximize predictive accuracy while minimizing overfitting. RF hyperparameters are (a) the sample fraction, that is, the fraction of the training data that is randomly sampled without replacement for generating each individual tree, (b) the number of predictors that are sampled from the full set of predictors and considered by each tree when splitting each node (MTRY) and (c) the minimum number of observations that a terminal node can contain, which influences the depth of the trees (i.e., when tree construction stops). Model performance increases asymptotically with the number of decision trees. In this study, the number of decision trees was set to 800 to limit run times.

In each step, the RF was tuned and evaluated by twice-repeated three-fold nested cross-validation. Nested cross-validation, a resampling method that combines two levels of cross-validation loops (outer and inner loops), separates hyperparameter tuning in the inner loop from model performance evaluation in the outer loop (Bischl et al., 2012). In each loop, cross-validation uses different portions of the data to iteratively test and train a model on the different subsets of the data. A three-fold cross-validation means that the RF is trained with a random selection of two-thirds of the samples (training data), each sample consisting of the predictors and the target for one station-month. The predictive accuracy of the model is then evaluated with the remaining third of the samples (testing data). In a twice-repeated three-fold nested cross-validation, there are six rounds of cross-validation in total with different training and test data. For each round, hyperparameters were tuned by evaluating the performance of 15 and 55 unique combinations of hyperparameters in the case of the step 1 RF and the step 2 RF, respectively.
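
The structure of the twice-repeated three-fold nested cross-validation can be sketched as follows. This is a schematic under stated assumptions rather than the authors' code: `fit_and_score` is a hypothetical callable that trains a model with one hyperparameter combination and returns a score, and the use of a single three-fold split in the inner loop is an assumption, since the text does not specify the inner resampling scheme.

```python
import random


def repeated_kfold(n_samples, n_folds=3, n_repeats=2, seed=0):
    """Yield (train_indices, test_indices) for each cross-validation round."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[k::n_folds] for k in range(n_folds)]
        for k, test in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test


def nested_cv(n_samples, candidates, fit_and_score, seed=0):
    """Return one (best_candidate, outer_score) pair per outer round.

    fit_and_score(train, test, candidate) -> float is a hypothetical callable.
    """
    results = []
    for train, test in repeated_kfold(n_samples, seed=seed):

        def inner_score(candidate):
            # Inner loop: tune on the outer training data only.
            # ASSUMPTION: one three-fold split is used for tuning.
            scores = []
            splits = repeated_kfold(len(train), n_repeats=1, seed=seed + 1)
            for inner_train_pos, inner_test_pos in splits:
                inner_train = [train[p] for p in inner_train_pos]
                inner_test = [train[p] for p in inner_test_pos]
                scores.append(fit_and_score(inner_train, inner_test, candidate))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)
        # Outer loop: evaluate the tuned model on data unseen during tuning.
        results.append((best, fit_and_score(train, test, best)))
    return results
```

With two repetitions and three folds, the outer loop yields six rounds, and every sample is part of the test data exactly twice.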

Model validation (Sections 4.1 and 4.2) was done using the results of the six rounds of cross-validation, that is, the results of the six RF models with an optimal combination of hyperparameters as determined by the inner loop. For each station-month, the two predicted probabilities pertaining to a certain class (one from each repetition of the cross-validation) were averaged and the class was assigned. The threshold for assigning the perennial or non-perennial class was set to a probability of 50%, consistent with our efforts to balance the training data set.
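
The averaging of the two predicted probabilities and the subsequent class assignment can be illustrated as follows. The function name is hypothetical, and assigning the class with the highest averaged probability is an assumption for the multiclass step 2; for the two classes of step 1 it is equivalent to the 50% threshold, apart from exact ties.

```python
def assign_class(probability_runs):
    """Average per-class probabilities over runs and pick the most probable class.

    probability_runs: list of dicts mapping class label -> predicted probability,
    one dict per cross-validation repetition.
    """
    classes = list(probability_runs[0].keys())
    n_runs = len(probability_runs)
    averaged = {
        c: sum(run[c] for run in probability_runs) / n_runs for c in classes
    }
    # ASSUMPTION: ties are resolved in favor of the class listed first.
    best = max(classes, key=lambda c: averaged[c])
    return best, averaged


runs = [
    {"perennial": 0.7, "non-perennial": 0.3},
    {"perennial": 0.2, "non-perennial": 0.8},
]
label, mean_probabilities = assign_class(runs)
```

In this example the averaged non-perennial probability is about 0.55, so the station-month is assigned to the non-perennial class.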

The RF showing the highest balanced accuracy (BACC; Section 2.5) across all six rounds was used for model application (Section 4.3). This resulted in a calibrated RF model consisting of the best-performing step 1 RF and the best-performing step 2 RF. For step 1, the optimal values for sample fraction, MTRY and minimum number of observations for the terminal node were 0.25, 4 and 2, respectively; the corresponding values for step 2 were 0.75, 6 and 10, respectively.

We computed the relative contribution of predictors to the predictive ability of the model, in the form of the Actual Impurity Reduction (AIR) predictor importance metric. The higher the AIR, the more important the predictor. The role of predictor variables was also evaluated with partial dependence plots, which depict the marginal relationship between each predictor variable and the probability of a predicted class while holding the rest of the predictors at their respective mean values. Using 20 processors (Intel Xeon Silver 4114, 2.2 GHz) in parallel, the run time was about 14 days for setting up the step 1 RF and about 14 hr for setting up the step 2 RF.

2.4 Definition of Stream Reaches for Model Application

It would be computationally too expensive to estimate the streamflow intermittence status for all HR grid cells in Europe, regarding both computation time and data storage. With 73 million HR grid cells across Europe and 468 months (1981–2019), more than 34 billion predictions would have to be computed. Therefore, we applied the two RF models sequentially to predict the streamflow intermittence status of river reaches rather than individual grid cells. Predictions are made for the most downstream HR grid cell of each river reach and are assumed to represent the mean conditions over the whole river reach.

River reaches at the HR resolution of 15 arc-sec are available in HydroSHEDS (HydroRIVERS, Lehner & Grill, 2013, https://www.hydrosheds.org/products/hydrorivers) but they insufficiently cover headwater streams for the purpose of our study (Döll, Abbasi, & Trautmann, 2023); in addition, we had slightly modified the HydroSHEDS drainage direction map. Therefore, river reaches were newly generated from the modified HydroSHEDS HR drainage direction map by applying the following delineation thresholds: streams were defined to start at all HR grid cells with an upstream drainage area of more than 2 km2 (instead of 10 km2 in HydroRIVERS) or at a grid cell where the mean annual downscaled HR streamflow of WaterGAP 2.2e during the period 1981–2019 exceeds 0.03 m3/s (instead of 0.1 m3/s in HydroRIVERS). Decreasing the threshold for streamflow to 0.02 m3/s would lead to potential “aggregates” of multiple streams in one grid cell in wet areas. Using these delineation thresholds, the resulting number of reaches in Europe is 1,533,471, with an average reach length of 2.0 km (standard deviation 1.7 km), representing a total stream network length of 3.06 million km. Accordingly, the European data set of monthly streamflow intermittence status contains a total of 717,664,428 reach-months covering the period 1981–2019.
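
The delineation thresholds and the resulting data volumes can be checked with a few lines of arithmetic. The function and constant names are hypothetical; the thresholds and counts are those stated in the text.

```python
def is_stream_start(upstream_area_km2, mean_annual_flow_m3s):
    """Return True if an HR grid cell meets either stream delineation threshold."""
    return upstream_area_km2 > 2.0 or mean_annual_flow_m3s > 0.03


# Study period 1981-2019 in months.
N_MONTHS = (2019 - 1981 + 1) * 12

# Predictions needed per reach versus per HR grid cell.
N_REACHES = 1_533_471
N_HR_CELLS = 73_000_000  # approximate number of HR grid cells in Europe
REACH_MONTHS = N_REACHES * N_MONTHS
CELL_MONTHS = N_HR_CELLS * N_MONTHS
```

Predicting per reach rather than per grid cell reduces the number of predictions from about 34 billion to about 0.72 billion.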

The river reaches as derived from the drainage direction data set may not correspond to actual river reaches. In particular, river reaches (and therefore the streamflow intermittence status) are also delineated inside the boundaries of lakes and artificial reservoirs. Users of the streamflow intermittence data set may therefore need to mask out simulated reaches as appropriate.

2.5 Performance Metrics

As the observation data were strongly imbalanced, we evaluated model performance through the cross-validation of the two RFs based on the Balanced ACCuracy (BACC). BACC provides a better indication of the classification performance of imbalanced models than raw accuracy (the percentage of correctly classified observations). In the binary case of step 1, BACC is the mean of sensitivity and specificity, with
$\text{sensitivity}=\frac{\text{TP}}{\text{TP}+\text{FN}}$ (1)
$\text{specificity}=\frac{\text{TN}}{\text{TN}+\text{FP}}$ (2)
where TP: true positive, FN: false negative, TN: true negative and FP: false positive, resulting from the confusion matrix (Figure S1 in Supporting Information S1). Here, sensitivity and specificity are the hit rates for non-perennial and perennial station-months, respectively. In the multiclass case of step 2, we follow the definition of Urbanowicz and Moore (2015) whereby the mean of sensitivity and specificity is calculated for each of the four classes and then averaged over the classes.
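
Equations 1 and 2 and the multiclass averaging can be sketched as follows. The function names are hypothetical, and the one-versus-rest counting of TP, FN, FP and TN per class is our reading of the definition described above. The step 1 example uses the station-month counts reported in Table 2.

```python
def balanced_accuracy_binary(tp, fn, tn, fp):
    """Mean of sensitivity (Equation 1) and specificity (Equation 2)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def balanced_accuracy_multiclass(confusion):
    """Average the per-class mean of sensitivity and specificity.

    confusion[i][j]: number of samples observed in class i and predicted as class j.
    ASSUMPTION: each class is evaluated one-versus-rest.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class = []
    for c in range(n_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp
        tn = total - tp - fn - fp
        per_class.append(balanced_accuracy_binary(tp, fn, tn, fp))
    return sum(per_class) / n_classes


# Step 1 counts from Table 2 (positive class: non-perennial).
bacc_step1 = balanced_accuracy_binary(
    tp=28_244, fn=32_707 - 28_244, tn=1_108_741, fp=1_134_237 - 1_108_741
)
```

Rounded to two decimals, `bacc_step1` gives 0.92, in line with the value reported for the step 1 RF.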
Model performance was also evaluated with the Nash-Sutcliffe efficiency (or model efficiency), a traditional performance metric in hydrological modeling. It provides an integrated measure of model performance concerning mean values and variability and is computed as
$\text{NSE}=1-\frac{\sum_{t=1}^{n}\left(\text{sim}_{(t)}-\text{obs}_{(t)}\right)^{2}}{\sum_{t=1}^{n}\left(\text{obs}_{(t)}-\mu_{\text{obs}}\right)^{2}}$ (3)
where μobs is the mean of observations across all time steps; sim(t) and obs(t) refer to the simulated and observed values, respectively, at time step t of a total number of time steps n. NSE can range from −∞ to 1; a value of 0 indicates that the model performs no better than simply using the mean of the observed data to predict the values, and a value of 1 indicates perfect agreement between the observed and modeled values.
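
Equation 3 translates directly into a short function. The sketch below is illustrative; the function name and the error raised for a constant observed series (for which NSE is undefined) are our own choices.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency of simulated versus observed values (Equation 3)."""
    if len(sim) != len(obs) or not obs:
        raise ValueError("sim and obs must be non-empty and of equal length")
    mu_obs = sum(obs) / len(obs)
    squared_error = sum((s - o) ** 2 for s, o in zip(sim, obs))
    obs_variance_sum = sum((o - mu_obs) ** 2 for o in obs)
    if obs_variance_sum == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1 - squared_error / obs_variance_sum
```

A simulation identical to the observations returns 1, and a simulation equal to the observed mean at every time step returns 0, which matches the interpretation given above.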

3 Validation of HR Time Series of Monthly Streamflow in Europe

Comparing the downscaled HR monthly streamflow time series to the monthly time series of observed streamflow at the 3706 gauging stations across Europe yielded a median NSE value of 0.41; NSE exceeds 0 for 69% of the stations, and 25% of stations exceed the value of 0.64, which indicates a rather good performance. When NSE is computed with the logarithm of streamflow, which puts a larger weight on low-flow months of interest for intermittence, NSE exceeds 0 for 63% of stations and 0.57 for 25% of stations. This shows that streamflow during the low-flow months is also estimated reasonably well. However, the performance of simulated HR streamflow is very poor in most of Spain, where human activities strongly impact streamflow (Figure 2). Although the impacts of artificial reservoirs as well as groundwater and surface water abstractions are simulated by WaterGAP, the coarse resolution of the original model calculations (at LR grid cells) prevents the identification of the specific locations of these impacts in the downscaling procedures. Also, the HR location of natural surface water bodies, that is, lakes and wetlands, is not explicitly taken into account in the downscaling method, causing potential misallocation of their attenuating effects on HR streamflow. Furthermore, other anthropogenic disturbances such as weirs are not accounted for in the original WaterGAP estimates. A poorer performance of HR streamflow in strongly altered streams is therefore due to both downscaling constraints and the difficulty of simulating human impacts at the LR resolution.

Figure 2. NSE of monthly HR streamflow for 3706 gauging stations in Europe. The continental area considered to belong to Europe in this study is shown in gray.

The performance of the downscaling algorithms can be assessed by comparing the NSE values at gauging stations with different upstream areas (Figure 3). LR streamflow as computed by WaterGAP is generally only compared to streamflow observed at gauging stations with upstream areas of more than 10,000 km2, as a single LR grid cell can cover more than 2,500 km2 (Müller Schmied et al., 2021). The high uncertainty of the global climate data sets used as the input of WaterGAP also inherently limits model performance for smaller basins. The performance of simulated streamflow does not decrease much with decreasing upstream area of the gauging stations (Figure 3a). For example, the median NSE for drainage basins larger than 10,000 km2 is 0.51, while it is only slightly lower at 0.38 for the smallest drainage basins with areas below 2 km2. The median NSE of logarithmic streamflow decreases from 0.40 for the basins larger than 10,000 km2 to 0.14 for basins smaller than 2 km2 (Figure 3b). When interpreting the NSE values, it should be noted that the stations are not equally distributed between the different catchment area classes; for instance, there are fewer than 100 stations with an upstream area of less than 10 km2. Furthermore, the relationship between predictive performance and drainage area is not consistent among stations on non-perennial and perennial waterways. In the case of non-perennial stations (n = 885), there is a decline in NSE values from basins with upstream areas of less than 50 km2 to basins larger than 10,000 km2 (Figure S2 in Supporting Information S1), whereas the opposite is true for perennial stations (n = 2,821; Figure S3 in Supporting Information S1). While non-perennial basins smaller than 2 km2 are characterized by a median NSE of 0.49 (median NSE for log streamflow: 0.21), the large non-perennial basins over 10,000 km2 show a very poor performance with a median NSE of less than 0 (Figure S2 in Supporting Information S1). This might be due to the difficulty of simulating the impact of reservoir operations on intermittence. Considering the size class of 50–500 km2, which includes the most stations of both non-perennial (>100 stations) and perennial types (>1000 stations), the median NSE is 0.23 for non-perennial stations and 0.43 for perennial stations (Figures S2 and S3 in Supporting Information S1).

Figure 3. NSE of monthly streamflow time series (left) and of the logarithm of monthly streamflow time series (right) for all 3706 streamflow stations with observations, grouped in size classes of the upstream area of the streamflow gauging stations. The boxes indicate the 25th, 50th (median) and 75th percentiles and the whiskers the 5th and 95th percentiles of the samples. The blue lines of the violin plot show the smoothed distribution of the data points. The “number of stations not shown” indicates the number of stations with an NSE of less than −1.

4 RF Modeling Results

4.1 Model Validation

4.1.1 Step 1 RF

The cross-validation of the calibrated step 1 RF resulted in a BACC of 0.92. Of all perennial station-months, 98% were correctly identified as perennial, that is, without any no-flow day (Table 2). Consequently, 25,496 (2%) of all perennial station-months were erroneously identified as non-perennial. Of the non-perennial station-months, 86% were correctly identified as non-perennial, that is, 4,463 non-perennial station-months were wrongly identified as perennial. Thus, with 25,496 false non-perennial predictions against 4,463 missed non-perennial station-months, the step 1 RF tends to overestimate the occurrence of non-perennial months in absolute terms. In Europe, streamflow intermittence is more prevalent in the summer (JJA) and in the fall (SON) than in winter (DJF) and spring (MAM), and this is also the case for the number of predicted non-perennial months (Table 2). A higher percentage of non-perennial station-months, about 88%, was correctly identified as non-perennial in JJA and SON than in the other two seasons (Table 2).

Table 2. Number of Observed and Correctly Simulated Perennial and Non-Perennial Stations and Station-Months
| Correctly simulated / observed | Number of stations | Station-months, all | Station-months, DJF | Station-months, MAM | Station-months, JJA | Station-months, SON |
|---|---|---|---|---|---|---|
| Perennial | 2806 / 2821 (99.5%) | 1,108,741 / 1,134,237 (97.8%) | 280,627 / 284,423 (98.7%) | 287,832 / 291,917 (98.6%) | 268,165 / 276,920 (96.8%) | 272,117 / 280,977 (96.8%) |
| Non-perennial | 551 / 885 (62.3%) | 28,244 / 32,707 (86.4%) | 3,445 / 4,297 (80.2%) | 3,627 / 4,391 (82.6%) | 10,294 / 11,643 (88.4%) | 10,878 / 12,376 (87.9%) |
  • Note. Each cell gives the correctly simulated number, followed by the observed number and, in parentheses, the percentage correctly simulated. Information on station-months is provided for all months and the four seasons December to February (DJF), March to May (MAM), June to August (JJA) and September to November (SON).

The overestimation of non-perennial months mainly occurs at stations that are both observed and simulated to be non-perennial, that is, stations that have at least one no-flow day in the whole period 1981–2019, as only 15 perennial gauging stations, scattered throughout Europe, were erroneously predicted to be non-perennial (dark red symbol in Figure 4b). Thus, 99.5% of all 2821 stations observed to be perennial were correctly simulated to be perennial (Table 2, gray symbols in Figure 4b). The 885 gauging stations with at least one non-perennial month, that is, 24% of all stations considered in this study, are particularly concentrated on the Iberian Peninsula, Sardinia and Cyprus (Figure 4a), where gauging stations commonly recorded more than 20% of non-perennial months. Elsewhere, almost all non-perennial stations have less than 20%, and mostly less than 10%, of non-perennial months. No intermittence is observed in winter months in the northern parts of Scandinavia, even though no-flow conditions are commonly reported in these climates because of dry conditions, the storage of precipitation as snow, and freezing (Buttle et al., 2012). Intermittence was not even observed at a station on a northern Norwegian island with a small drainage area of 19 km2. Intermittence at the stations in western Finland occurs only in the summer and only at stations with small upstream areas. The two non-perennial stations in northern Sweden are located downstream of large artificial reservoirs.

Figure 4. Percentage of observed non-perennial months (with at least one no-flow day) per gauging station for all observations during 1981–2019 (a), ratio of the number of predicted to the number of observed non-perennial months (P: perennial, NP: non-perennial) (b) and Pearson correlation of the observed and predicted annual time series of the number of non-perennial months (c), as simulated by the step 1 RF model.

Over a third of non-perennial stations (334 out of 885) were wrongly simulated to be perennial by the step 1 RF (dark blue dots in Figure 4b); these stations are distributed across Europe with no clear spatial clustering. Many of these stations are located on streams that normally flow year-round but that exceptionally dried, for example, during a severe drought. Indeed, these non-perennial stations that were wrongly classified as perennial have a median of only 2 non-perennial months across their entire record (range: 1–19 months), while the 551 correctly classified non-perennial stations have a median of 35 months (range 2–431 non-perennial months).

When considering only the 885 non-perennial stations, the median and mean percent of observed non-perennial months are 5.6% and 15.8%, respectively. Whereas 86% of all observed non-perennial station-months (28,244 out of 32,707) are correctly predicted to be non-perennial, 11% of observed perennial station-months at non-perennial stations (25,398 out of 233,195) are wrongly predicted to be non-perennial. This resulted in a general overestimation of the total share of non-perennial station-months at non-perennial stations. While 13% of all station-months at non-perennial stations are observed to be non-perennial (and 11% correctly predicted as such), 21% are predicted to be non-perennial. The overestimation is concentrated in regions with a relatively high prevalence of intermittence, that is, large parts of the Iberian Peninsula, Sardinia and Cyprus (compare Figures 4a and 4b), where non-perennial months are often overestimated by a factor of more than 2 (Figure 4b). The main suspected reasons for this overestimation are the poor ability of the downscaled streamflow estimates (Figure 2) and the RF model to capture the strong human impacts on streamflow dynamics in large parts of Spain as well as Cyprus (not Sardinia). In these semi-arid regions, a multitude of small and large dams as well as water transfers by canals often make naturally non-perennial streamflow perennial (Chiu et al., 2017). Even though some large reservoirs are considered when computing the LR net cell runoff used to estimate HR streamflow, the simulation of reservoir outflow is already very uncertain at LR. In addition, information on reservoirs, weirs or canals in the individual HR cells within each LR cell is not taken into account in the streamflow downscaling approach. The reservoirs included in the computation of the static HR predictor dor (degree of regulation by upstream dams; Lehner et al., 2011) (Table 1) are only a subset of the actual reservoirs, and small reservoirs are missing.

Interannual variability of the number of non-perennial months per year is simulated quite satisfactorily, in particular for gauging stations in southern Spain (Figure 4c). Considering all 885 non-perennial stations, the median Pearson correlation coefficient between the observed and predicted annual time series of the number of non-perennial months is 0.50. Thus, the step 1 RF is able to capture the interannual variability of climatic conditions. That said, the corresponding NSE values (i.e., based on the annual time series; not shown) are below zero at almost all stations due to the strong overall overestimation of non-perennial months.

As expected, gauged streams in smaller drainage basins are both observed and simulated to be more strongly non-perennial than larger drainage basins, especially in the two smallest drainage basin classes 0–2 km2 and 2–5 km2 (Figure S4 in Supporting Information S1). However, non-perennial months are also most overestimated in these size classes; the predicted median proportion of non-perennial months for these stations is twice the observed median of about 13%. For drainage basins larger than 2,500 km2, on the contrary, the step 1 RF tends to underestimate the already low percentage of non-perennial months (though it strongly overestimates intermittence for a few basins, too; Figure S4 in Supporting Information S1).

4.1.2 Step 2 RF

The targets of the step 2 RF are the observations of the number of no-flow days, in four classes (1: 1–5, 2: 6–15, 3: 16–29, 4: 30–31 no-flow days), in observed non-perennial months. At most non-perennial gauging stations, class 1 (1–5 no-flow days) dominates, whereas class 4 (30–31 no-flow days) dominates at many stations with more than 10% of non-perennial months, in particular in the central and southern part of the Iberian Peninsula and in Cyprus (Figure S5 in Supporting Information S1). With a BACC of 0.67 (averaged over the four classes) in the cross-validation of the calibrated step 2 RF, the classification performs satisfactorily. More than three quarters of the station-months with observed class 4 (30–31 no-flow days) are correctly classified, and almost half of the station-months with 1–5 and 16–29 no-flow days are correctly classified (Figure 5). Although the model exhibits weaker performance for station-months with 6–15 observed no-flow days, these observations are still more likely to be correctly classified than to be assigned to any one of the three other classes. Classification performance is highest for the class with the most observations, 30–31 no-flow days, as can be expected in RF modeling. In total, 54% of the 32,707 station-months are classified into the correct observed class, and of the wrongly classified observations, 70% are predicted to belong to neighboring classes (Figure 5).

Confusion matrix of predicting four classes of no-flow days per station-month. The top number in each box shows the total number of station-months belonging to the observed and simulated intermittence class, the bottom number the percent of the total number of station-months that are observed to be in the intermittence class (step 2 RF model).

The percentage of non-perennial months that are correctly classified into the four classes shows no spatial pattern across Europe (Figure 6a), although the overestimation of no-flow days is most pervasive in Spain, where the number of observed no-flow days is already high (red in Figure 6b). The step 2 RF tends to overestimate the number of no-flow days in the non-perennial station-months, and the step 1 RF also overestimates the number of non-perennial months (e.g., at many stations on the Iberian Peninsula). The bias shown in Figure 6b correlates weakly with the ratio of predicted to observed non-perennial months shown in Figure 4b, with a Pearson correlation coefficient of 0.11. The correlation between the monthly time series of observed and simulated intermittence classes, as measured by the Spearman rank correlation coefficient, is positive for most gauging stations, and larger than 0.3 for 38% of stations (Figure 6c). This correlation analysis does not include the perennial months at a station. The overall performance of the monthly time series of five classes, with class 0 for perennial months, reflects the combined performance of the step 1 and step 2 RFs and thus the overall RF modeling approach used for estimating streamflow intermittence for all reach-months in Europe. These correlation values, shown in Figure 6d, are much higher than the correlation for only the non-perennial months; values larger than 0.9 dominate. The median Spearman rank correlation coefficient for the monthly time series of the five intermittence classes is 0.81, with 90% of the stations exceeding a value of 0.58 and 14% of the stations exceeding a value of 0.99.
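To illustrate how a Spearman rank correlation between two class time series can be obtained, the sketch below ranks both series (tied values receive their average rank, which matters because the classes take only a few distinct values) and then computes the Pearson correlation of the ranks. It is a plain-Python illustration under these assumptions, not the software used in the study; the function names and the 12-month example series are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(observed, simulated):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = average_ranks(observed), average_ranks(simulated)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one series is constant
    return cov / sqrt(vx * vy)


# Hypothetical class series for one station and one year
# (0 = perennial, 1-4 = non-perennial classes)
obs = [0, 0, 0, 0, 1, 2, 4, 4, 3, 1, 0, 0]
sim = [0, 0, 0, 0, 0, 2, 4, 4, 4, 2, 0, 0]
rho_five_classes = spearman(obs, sim)
```

Note that the coefficient is undefined for a constant series, which is why the sketch returns NaN in that case.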

Percentage of non-perennial months that are correctly classified into the four classes (1: 1–5 no-flow days per month, 2: 6–15, 3: 16–29, 4: 30–31) by the step 2 RF at each of 885 gauging stations with at least 1 no-flow day in their record (a), bias expressed as simulated mean class number (1 through 4) minus observed mean class number (green: correct average classification, red: overestimation of no-flow days, blue: underestimation of no-flow days) (b), and Spearman rank correlation coefficient for the monthly time series of simulated and observed intermittence classes, for four classes 1–4 (c) and five classes 0–4, with class 0: 0 no-flow days (d). All correctly classified perennial stations were omitted from the maps and would show a correlation coefficient of 1.

4.2 Importance of Predictors and Dependence of Predicted Class on Predictor Values

All 23 predictors were found to be significant at the p-value = 0.05 level. The relative importance of the 23 predictors differs strongly between step 1 RF (identifying whether a station-month is non-perennial) and step 2 RF (identifying the number of no-flow days in non-perennial months, in four classes) (Figure 7). However, two predictors computed from the downscaled HR monthly streamflow, namely the monthly area-specific streamflow (Q) and the mean of the area-specific streamflow of the previous 3 months (Q_mean_p3), are among the five most important predictors in both RFs. Both are predicted to be negatively correlated to the probability of intermittence, as was expected (Figure S6 in Supporting Information S1).

Predictor importance for step 1 Random Forest (RF) (left) and step 2 RF (right). The higher the impurity reduction, the larger the relative importance of a predictor. The higher absolute values for the step 1 RF are due to the larger number of station-months available as target. Error bars show the standard deviation across the six cross-validation training sets calculated for both the step 1 and step 2 RFs. The relatively larger error bars for the step 2 RF are due to considering four classes instead of only two in the step 1 RF. Dynamic HR predictors are indicated by * and dynamic LR predictors by +.

The most important predictor in the step 1 RF is the size of the drainage basin of the streamflow gauging station (Figure 7), with the probability of intermittence decreasing with increasing size up to a drainage area of about 20,000 km2 (Figure S6 in Supporting Information S1). Terrain slope (slope) and the precipitation to potential evapotranspiration ratio (P_to_PET_ratio) show similar importance in step 1 and take up ranks 4 and 5, respectively. The partial dependence plots for the step 1 RF show, for all but 2 of the 23 predictors, correlations between the predictor and the likelihood of intermittence that are expected by hydrologists. For example, the partial dependence plot for interannual variability as expressed by the coefficient of variation (Q_iav_cv) shows the expected behavior, with the intermittence probability increasing with increasing Q_iav_cv for Q_iav_cv > 0.4. Exceptions to this correspondence between model predictions and hydrological understanding include the terrain slope (slope) and, albeit less conclusively, the degree of regulation (dor) (Figure S6 in Supporting Information S1). Steeper slopes across the upstream drainage area are expected to make intermittence more likely (Šarauskienė et al., 2020) due to a decrease in the fraction of runoff that recharges groundwater and thus a decrease in baseflow, but the RF predicted the opposite correlation. This negative correlation can be explained by the spatial distribution of the gauging stations; gauging stations in steeper terrain are those in the mountainous regions along the Spanish Atlantic coast, the Pyrenees and the Alps, that is, wet regions with large runoff. As for the degree of regulation, artificial reservoirs can make streams either more perennial or more non-perennial, depending on reservoir management (e.g., for hydroelectricity, irrigation, flood control) and river type (Datry et al., 2023). 
Here, the step 1 RF showed that increased regulation was associated with greater levels of intermittence (Figure S6 in Supporting Information S1). A likely reason for this correlation is that many stations downstream of large dams in our training data set were located in dry areas like Spain, where intermittence is common and flow regulation by reservoirs is associated with extensive water withdrawal (Sabater & Tockner, 2009). This predictor's importance in RF 1 is very low (Figure 7), so the impact of this counterintuitive relationship on model predictions is minor.

In the step 2 RF, all of the five most important predictors are dynamic predictors. They include four HR predictors derived from the downscaled WaterGAP output (Figure 7). In addition to the highest ranking Q and Q_mean_p3, the minimum area-specific streamflow over the previous 3 months (Q_min_p3) and the mean area-specific streamflow over the previous 12 months (Q_min_p12) are among the five most important predictors. The LR predictor of the number of wet days per month is ranked second in importance.

4.3 Predicted Time Series of Monthly Streamflow Intermittence Status of Stream Reaches in Europe

In total, 96.2% of the approximately 718 million reach-months at more than 1.5 million stream reaches in Europe are simulated as perennial in the period 1981–2019 (Table 3). 82.2% of the stream reaches and 81.3% of the European network length of 3.06 million km are simulated to never have experienced a no-flow day during this period. Reaches with non-perennial months are simulated to exist in almost all European countries, but high percentages of non-perennial months are prevalent on the Iberian Peninsula, Sardinia and Cyprus and also occur in southern Italy and Greece (Figure 8). Large regions with low fractions of non-perennial months exist in France but also in Finland, Belarus and Ukraine.

Table 3. Occurrence of the Five Intermittence Classes at the Gauging Stations and All Reach-Months in Europe

| Class | Station-months, observed | Station-months, predicted | Reach-months, predicted |
| --- | --- | --- | --- |
| 0: Perennial | 1,134,237 | 1,113,204 | 690,269,534 |
| | 97.20% | 95.39% | 96.20% |
| 1: 1–5 no-flow days | 5,643 | 5,248 | 413,786 |
| | 0.48% | 0.45% | 0.06% |
| 2: 6–15 no-flow days | 6,030 | 5,338 | 549,107 |
| | 0.52% | 0.45% | 0.08% |
| 3: 16–29 no-flow days | 8,634 | 8,484 | 1,742,476 |
| | 0.74% | 0.73% | 0.24% |
| 4: 30–31 no-flow days | 12,400 | 13,637 | 24,689,525 |
| | 1.06% | 1.17% | 3.43% |
| Total | 1,166,944 | 1,145,911 | 717,664,428 |
| | 100% | 98.20% | 100% |
  • Note. In this study, Europe does not include Russia and Turkey. The gauging stations represent those that were used to set up the RF model, where the fraction of all station-months with observed and simulated classes is provided. For each class, the first row shows the total number of station-months or reach-months in the class and the second row shows the percentage in the class. The percentage values for the reach-months relate to the total number of reach-months (468 months per reach) during 1981–2019; and for the station-months, to the number of station-months with observations. As the step 2 RF model predicting the four classes of no-flow days was set up only for the station-months that are observed to be non-perennial, the predicted class percentages do not add up to 100%.
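
The totals in Table 3 can be checked with a few lines of arithmetic. The sketch below only re-adds the counts listed in the table; the variable names are illustrative. It shows that the predicted classes 1–4 cover exactly the observed non-perennial station-months, that the predicted station-month classes cover about 98.2% of all station-months, and that the number of reach-months corresponds to 468 months at each reach.

```python
# Station-month counts per class, copied from Table 3
observed = {0: 1_134_237, 1: 5_643, 2: 6_030, 3: 8_634, 4: 12_400}
predicted = {0: 1_113_204, 1: 5_248, 2: 5_338, 3: 8_484, 4: 13_637}

total_observed = sum(observed.values())
total_predicted = sum(predicted.values())

# The step 2 RF was set up for the observed non-perennial station-months,
# so the class 1-4 totals agree between observation and prediction.
non_perennial_observed = sum(observed[c] for c in range(1, 5))
non_perennial_predicted = sum(predicted[c] for c in range(1, 5))

# Share of all station-months that carry a predicted class (in percent)
predicted_share = 100 * total_predicted / total_observed

# Reach-months divided by months per reach gives the number of reaches
months_per_reach = 39 * 12  # 1981-2019
n_reaches = 717_664_428 // months_per_reach
```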

Percentage of months with at least one no-flow day for European stream reaches during the period 1981–2019.

The predicted prevalence of perennial conditions across reaches is similar to the observed prevalence at streamflow gauging stations, where 97% of the observed station-months and 76% of the stations are perennial. As drainage area is the most important predictor for a station-month being perennial or non-perennial, with small basin size leading to a higher probability of intermittence, it is surprising that a higher percentage of reaches is simulated to be perennial as compared to the gauging stations. Reaches with small upstream basins of less than 50 km2 comprise 78% of all reaches, whereas only 12% of gauging stations have such small basins (Table 4). This highlights the importance of the interplay of all predictors of the step 1 RF and may be affected by our deliberate addition of non-perennial data to the observations.

Table 4. Mean Streamflow Per Station-Month and Reach-Months Averaged for Drainage Basin Area Classes

| Upstream area [km2] | Gauging stations: mean observed (m3 s−1) | Gauging stations: mean predicted (DSS^a) (m3 s−1) | Gauging stations: number of stations in this study/in Döll, Abbasi, and Trautmann (2023) | Gauging stations: total station-months (%) | Reaches: mean predicted (DSS^a) (m3 s−1) | Reaches: total reach-months (%) |
| --- | --- | --- | --- | --- | --- | --- |
| (0–2] | 0.05 | 0.03 | 16/8 | 0.45 | 0.05 | 10.92 |
| (2–5] | 0.12 | 0.08 | 22/10 | 0.62 | 0.04 | 30.97 |
| (5–10] | 0.21 | 0.17 | 53/29 | 1.49 | 0.1 | 14.10 |
| (10–50] | 0.79 | 0.58 | 393/272 | 9.70 | 0.32 | 21.70 |
| (50–500] | 4.55 | 3.25 | 1,786/896 | 47.54 | 2.33 | 14.56 |
| (500–2,500] | 17.29 | 15.65 | 789/358 | 22.41 | 15.65 | 4.41 |
| (2,500–10,000] | 56.91 | 57.73 | 366/178 | 10.13 | 62.25 | 1.90 |
| >10,000 | 512.31 | 544.30 | 281/164 | 7.69 | 594.55 | 1.49 |

  • ^a Downscaled streamflow.

The fraction of reach-months with 30–31 no-flow days (3.4%) is much higher than the corresponding fraction of the station-months that are observed and predicted to occur at the streamflow gauges (1.1%; Table 3). This is not due to the much higher prevalence of reaches with small upstream basins than of stations with such small basins (Table 4), as also in each drainage area size class, the fraction of months with 30–31 no-flow days is larger for the reach-months than for the station-months (Table 5). Both station observations and reach predictions agree that the likelihood of perennial months increases and the likelihood of 30–31 no-flow days decreases with increasing size of the drainage basin (Table 5). The exception is the smallest reaches with an upstream area of 2 km2 or smaller, because we only generated such small reaches from the 15 arc-sec drainage direction map where the mean annual downscaled HR streamflow during the period 1981–2019 exceeds 0.03 m3/s (Section 2.4); this explains the high fraction of perennial months in the smallest size class. One reason for the higher prevalence of the class 30–31 no-flow days for the reach-months as compared to the station-months in all size classes between 2 and 500 km2 may be that the average streamflow for all reach-months of a certain size class is smaller than for the gauges (both observed and predicted) (Table 4). This discrepancy likely led to more dry months, because streamflow is the most important predictor in the step 2 RF (Figure 7). At the same time, the fraction of perennial reach-months, which is determined by the step 1 RF, is also higher than the fraction of perennial station-months in each size class, such that the other three intermittence classes are predicted to be very rare among the reach-months. The reason for this is unknown, but one aspect to consider may be that streamflow is not the most important predictor in the step 1 RF (Figure 7).

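Table 5. Percent of Observed Station-Months and Predicted Reach-Months (1981–2019) in the Five Intermittence Classes

Columns "Obs. 0" to "Obs. 4": observed station-months in classes 0–4 (%). Columns "Pred. 0" to "Pred. 4": predicted reach-months in classes 0–4 (%).

| Upstream area [km2] | Obs. 0 | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 | Pred. 0 | Pred. 1 | Pred. 2 | Pred. 3 | Pred. 4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (0–2] | 87.30 | 2.57 | 4.42 | 4.42 | 1.28 | 99.80 | 0.008 | 0.15 | 0.03 | 0.14 |
| (2–5] | 88.48 | 1.93 | 3.03 | 3.77 | 2.79 | 93.55 | 0.14 | 0.20 | 0.58 | 5.52 |
| (5–10] | 94.18 | 1.11 | 1.58 | 1.87 | 1.25 | 95.27 | 0.05 | 0.06 | 0.30 | 4.32 |
| (10–50] | 95.85 | 0.59 | 0.85 | 1.13 | 1.58 | 96.58 | 0.01 | 0.01 | 0.06 | 3.33 |
| (50–500] | 96.83 | 0.34 | 0.55 | 0.86 | 1.41 | 97.55 | 0.003 | 0.006 | 0.03 | 2.41 |
| (500–2,500] | 98.33 | 0.24 | 0.34 | 0.49 | 0.59 | 99.34 | 0.02 | 0.006 | 0.01 | 0.62 |
| (2,500–10,000] | 99.32 | 0.10 | 0.13 | 0.14 | 0.30 | 99.84 | 0.02 | 0.007 | 0.005 | 0.13 |
| >10,000 | 98.85 | 0.23 | 0.23 | 0.28 | 0.40 | 99.83 | 0.02 | 0.007 | 0.01 | 0.14 |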
  • Note. These values are represented as a function of upstream drainage area [km2] of the streamflow gauging stations or the reach. Classes: 0: perennial, 1: 1–5 no-flow days, 2: 6–15 no-flow days, 3: 16–29 no-flow days, 4: 30–31 no-flow days. In total, 1,166,944 station-months and 717,664,428 reach-months are considered.

The actual number of perennial months in reaches with upstream areas of 2–50 km2, the dominant upstream area classes listed in Table 4, may even be higher as the step 1 RF tends to underestimate the fraction of perennial station-months (Figure S4 in Supporting Information S1). However, the number of streamflow gauging stations for that class, in particular in the size class under 10 km2, is rather small (Table 4).

The prevalence of intermittence across the European river network shows a clear seasonal and interannual variability. While 97.6%–99.8% of the European reaches are perennial in January and February, this is the case for only 89.6%–93.4% in August and September (Figure 9). There is no overall trend over the whole 39-year period, but seasonal minima and maxima of the fraction of perennial reaches decreased from 2013 to 2019, while the opposite is true for the fraction of months with 30–31 no-flow days (Figure 9). The southern European countries of Portugal, Spain, Italy, Greece and Cyprus have a much higher seasonal range of the fraction of perennial reaches; in July to August, only about 70% of the reaches are perennial, while in winter, it is close to 90%–99%, depending on the year (Figure S7a in Supporting Information S1). In the Scandinavian countries Norway, Sweden and Finland, the (very low) occurrence of non-perennial conditions is larger in the second half of the study period, but the highest level of intermittence occurred in 1996, related to unusually low precipitation (Figure S7b in Supporting Information S1). As an illustration of the spatial distribution and seasonality of streamflow intermittence, the European maps for streamflow intermittence in January and August 2019 are shown in Figure S8 in Supporting Information S1.

Monthly time series of the percent of all European stream reaches in the five intermittence classes for the period 1981–2019. The dots in the uppermost graph show the percent of all stream reaches that are perennial (class 0), the graphs below the percent values in the four non-perennial classes.

5 Discussion

5.1 Validation of Streamflow Intermittence Predictions Using ONDE Observations for France

In this study, we chose to use all daily streamflow observations available for the study period to set up the RF model, to obtain a robust model based on the maximum amount of information. Temporal validation of the models with independent data was not conducted. The RF model of Döll, Abbasi, and Trautmann (2023), with fewer streamflow observations and slightly different predictor variables and intermittence class definitions, was trained with data for a calibration period that encompassed, for each gauging station, the first two-thirds of the available observed months, while the rest was left for independent validation; 99% and 95% of the perennial station-months were predicted correctly for the calibration and the validation period, respectively. Considering only the non-perennial station-months (i.e., predictions from the step 2 RF), the frequency of predicting the correct class decreased from 56% in the calibration period to 47% in the validation period. In this study, we validate our predictions with a data set of visually observed intermittence for France.

We used observations from the French National River Drying Observatory (Beaufort et al., 2018; ONDE, 2020) to validate our predictions for 2,865 reaches and 148,004 reach-months in France of whether each reach-month was non-perennial (with at least one no-flow day) or perennial, that is, the step 1 RF. The ONDE network consists of a stable set of approximately 3,300 sites on river and stream reaches of Strahler orders under five, which have been inspected since 2012 by trained public staff from the French Biodiversity Office (OFB in French), at least monthly between May and September with the objective of identifying all drying events. If either the status “no visible flow” or “dried out” was assigned in any month, we considered the reach-month to be observed as non-perennial. Considering that its objective is to track intermittence in mostly headwater streams, the ONDE data set has a much higher percentage of non-perennial reaches and reach-months than the European streamflow gauging station data set used to set up the RF model. While 61% of the reaches and 15% of the reach-months are non-perennial in the ONDE data set, only 24% of the European gauging stations and 2.8% of the station-months are non-perennial. Considering only French gauging stations, the respective values are 38% and 3.5%. About 73% of ONDE reaches have a drainage area of less than 50 km2, which is similar to the fraction of European reaches in this size class (Table 4), whereas this is the case for only 12% in the European data set of gauging stations.

Compared to the ONDE data, the step 1 RF model underestimates the number of non-perennial reach-months (Figure 10d), whereas it tends to overestimate the number of non-perennial station-months relative to the 3,706 European streamflow stations. With a BACC of 0.53, only 8% of the non-perennial reach-months in ONDE are correctly identified (Figure 10c). Underestimation occurs in all size classes, increasing from an underestimation of, on average, 4 months for upstream areas of less than 10 km2 to an underestimation of 6–7 months for basins between 10 km2 and 2,500 km2. Considering whether reaches are non-perennial or perennial, only 23% of the non-perennial reaches were correctly predicted as such, compared to 62% for the European stations used to set up the step 1 RF (Table 2). Our RF model achieves a balanced accuracy of only 0.54 (Figure 10a) in its binary classification of ONDE reaches, while the global static RF model of naturally non-perennial reaches of Messager et al. (2021) yielded a slightly higher value of 0.59 (Figure 10b). The spatial pattern of agreement of the static global model is less patchy than that of our model. The global model predicts intermittence to occur in large contiguous areas, as it is mainly driven by larger-scale climatic predictors whereas our dynamic European model is strongly driven by small-scale streamflow characteristics. In addition, our model is based on more streamflow gauging stations. For unknown reasons, our RF model cannot predict non-perennial reaches along the Mediterranean coast, which differs from the static global model. If the threshold for perennial conditions is increased to a probability of 75%, which does not lead to a decrease of BACC, 91% of the non-perennial reaches would be correctly identified, but then 85% of the perennial reaches would be incorrectly predicted as non-perennial.
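
The trade-off at the 75% threshold can be made explicit with the usual binary definition of balanced accuracy, that is, the mean of the fraction of correctly identified non-perennial reaches (sensitivity) and the fraction of correctly identified perennial reaches (specificity). The sketch below assumes this definition and uses only the rounded percentages given in the text; the function name is illustrative.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Binary balanced accuracy: mean of the recalls of the two classes."""
    return (sensitivity + specificity) / 2


# Threshold of 75%: 91% of non-perennial reaches are identified correctly,
# while 85% of perennial reaches are wrongly predicted as non-perennial.
sensitivity_75 = 0.91
specificity_75 = 1 - 0.85
bacc_75 = balanced_accuracy(sensitivity_75, specificity_75)
```

With these rounded inputs, the result is about 0.53, close to the value of 0.54 reported for the default threshold: the gain in identified non-perennial reaches is offset by the loss in correctly identified perennial reaches.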

Comparison of simulated intermittence of reaches and reach-months with the ONDE data set of visually observed intermittence. Correspondence between the simulated and observed intermittence state of reaches for our Random Forest (RF) model (a) and the RF model of Messager et al. (2021) (b), percent of correctly classified reach-months in our model (c) and ratio of predicted to observed non-perennial months in our model (d).

5.2 Challenges of Continental-Scale Estimation of Streamflow Intermittence Caused by the Lack of Streamflow Observations

The upstream area is the most important predictor for the likelihood of a station-month to be non-perennial, yet we cannot assume that the inclusion of this predictor in the RF model development leads to a good representation of the effect of upstream area on the likelihood of intermittence given the existing spatial distribution of our target data derived from observations of daily streamflow. To represent the upstream area appropriately in the target data, we would need a data set of streamflow gauging stations that show the same distribution of upstream areas as the stream reaches; however, the distributions are extremely different (Table 4). While 77.7% of the reaches have an upstream area of up to 50 km2, this is only the case for 12.3% of the gauging stations. The largest size class for the reaches is, with 31%, the class 2–5 km2, but only 0.6% of the station-months are in this class. As an illustration, if we wanted to have the same size distribution with the 22 stations in class 2–5 km2 that were available, then we would have to consider only 70 stations in total, instead of 3,706. As the fraction of perennial months is higher for reaches than for the stations, especially for drainage areas below 50 km2 (except for the smallest size class due to the definition of the smallest reaches, Table 5), a further decrease of the average fraction of non-perennial months for the gauging stations by the extension of the data set might have led to an even stronger underestimation of intermittence in these headwater reaches. However, by our extension, we more than doubled the number of stations in the class 2–5 km2 by raising the number of stations from 10 to 22 (Table 4), which increased the information base upon which the RF models were trained.
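
The figure of about 70 stations follows from the values in Table 4: for each size class, the available number of stations divided by the share of reaches in that class gives the largest total sample in which that class would not be over-represented, and the smallest of these totals is the binding one. The sketch below reproduces this arithmetic; the variable names are illustrative, and the reach-month shares are used as reach shares, since every reach contributes the same number of months.

```python
# Values copied from Table 4: stations in this study and reach-months (%)
stations = {
    "(0-2]": 16, "(2-5]": 22, "(5-10]": 53, "(10-50]": 393,
    "(50-500]": 1786, "(500-2,500]": 789, "(2,500-10,000]": 366,
    ">10,000": 281,
}
reach_share = {
    "(0-2]": 10.92, "(2-5]": 30.97, "(5-10]": 14.10, "(10-50]": 21.70,
    "(50-500]": 14.56, "(500-2,500]": 4.41, "(2,500-10,000]": 1.90,
    ">10,000": 1.49,
}

# Largest total sample size that each class could support on its own
max_total = {k: stations[k] / (reach_share[k] / 100) for k in stations}

# The class with the smallest supported total limits the whole sample
limiting_class = min(max_total, key=max_total.get)
n_total = max_total[limiting_class]
```

The limiting class is 2–5 km2, and the resulting total is roughly 71 stations, in line with the approximately 70 stations stated above.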

The performance of our model certainly suffers from the general problem of imbalanced target data, with 97.2% of the station-months being perennial. The most important approach to handle this problem was the two-step approach whereby the prediction of perennial months in step 1 was followed by the prediction of the number of no-flow days per month only for that 2.8% of all station-months for which at least one no-flow day was observed. In addition, various alternative methods for handling imbalanced data were tested for the step 2 RF. Oversampling resulted in slightly better BACC values than undersampling and the Synthetic Minority Oversampling Technique (SMOTE) (Chawla et al., 2002).
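
As an illustration of what oversampling means here, the sketch below duplicates randomly chosen samples of the smaller classes until every class is as frequent as the largest one. This is simple random oversampling with replacement, written in plain Python under the assumption that it approximates the tested approach; the exact procedure and software used in the study are not specified in this section, and the function name, seed and example labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are equally frequent."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Hypothetical training set: ten station-months with step 2 classes 1-4
features = list(range(10))
classes = [4, 4, 4, 4, 4, 3, 3, 2, 1, 1]
balanced_features, balanced_classes = random_oversample(features, classes)
class_counts = Counter(balanced_classes)
```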

The unavailability of data for small rivers and in specific regions, in particular for intermittent river reaches, is a structural problem in hydrometric networks (Krabbenhoft et al., 2022; van Meerveld et al., 2020) that is beyond the scope of this study. However, we made a diligent effort to compile more empirical data in small streams than is represented in the GRDC data set through the addition of SMIRES, Spanish, Italian and French gauges. We have also created a custom river network with a smaller channel initiation threshold (the minimum catchment size for a river segment to be created) as compared to Messager et al. (2021) for including gauges on smaller headwater streams.

5.3 Comparison to Previous Modeling Studies

In this study, we estimate that 18.7% of the European stream network length was non-perennial in the period 1981–2019, while the global model of Messager et al. (2021) predicts a value of 17.1% for our European study area. However, these values cannot be compared directly for various reasons. Our river network includes smaller headwater streams than the global model (representing 12.4% of the European river network used in this study; see Section 2.4) and the definition of non-perennial reach is slightly different (global model: 1 no-flow day per year, our model: 1 no-flow day during the period 1981–2019). In addition, the global model aimed to predict natural intermittence by excluding heavily influenced gauging stations, relying on naturalized hydrology for the period 1971–2000. Still, our model predicts a similar prevalence of non-perennial reaches in Europe.

The prevalence of intermittence across European rivers and streams in this study, with 17.8% of non-perennial reaches and 3.8% of the reach-months, is much lower than in the study of Döll, Abbasi, and Trautmann (2023), with values of 39.6% and 9.1%, respectively, even though the same HR streamflow estimates were used in the RF modeling. While some predictors (related to irrigation, population and the degree of regulation by reservoirs) were added and one (daily streamflow variability) removed in this study, we attribute this strong discrepancy to the different observations of the RF target variables, which were derived from daily time series of streamflow observations. We explicitly tried to obtain streamflow data from dry areas and small streams, which have a higher likelihood of intermittence, and added data from Cyprus and Italy (for Sardinia and the Po basin, but time series for the rest of Italy were shorter than our inclusion threshold of 36 months), but we could not obtain data in time for, for example, Greece, Albania and Bulgaria. Instead, the data set was mainly extended by stations in more humid regions such as Scandinavia, the three Baltic states, Poland and Belarus, most of which are perennial (cf. Figure 4a). When the streamflow observations data set was extended from the one used by Döll, Abbasi, and Trautmann (2023), that is, from 1,915 gauging stations to the 3,706 stations in this study (see Table 4 for station numbers per drainage area class), the additional stations had a smaller fraction of non-perennial months than the original data set. In this way, we have further biased the target data set and therefore may have caused a biased streamflow intermittence prediction for the reaches. Still, we expect that almost doubling the number of target observations as compared to the study of Döll, Abbasi, and Trautmann (2023) increased the reliability of the RF models. 
The additional streamflow data and predictors are informative because the BACC for the step 1 RF increased from 0.85 in Döll, Abbasi, and Trautmann (2023) to 0.92 in this study (while the BACC for the step 2 RF is the same). However, the fit to the ONDE data as measured by the BACC for the identification of non-perennial reaches remained the same as that of the step 1 RF model by Döll, Abbasi, and Trautmann (2023). The comparison of the European streamflow intermittence estimation by Döll, Abbasi, and Trautmann (2023) and this study shows the major impact of available target observations on RF modeling results.

5.4 Information Content of the Predicted HR Streamflow Intermittence Data

Given the amount, spatial resolution and uncertainty of available input data, it is very challenging to achieve a good prediction of HR streamflow intermittence for all of Europe. One reason is that continental and global-scale streamflow simulations for relatively large rivers represented by LR models often strongly differ from observations. This mismatch stems from large-scale models not being calibrated in a basin-specific manner against observed daily streamflow, as is done with small-scale models. In addition, their input data are coarser and usually less accurate than those of small-scale models. Here, using an advanced downscaling algorithm, the output of an LR hydrological model (Figures 11a and 11b) was downscaled, by a factor of 14,400, to generate monthly time series of streamflow at 15 arc-sec resolution (Figure 11c). It is encouraging that these simulated HR streamflow time series show skill for most streamflow stations with upstream areas smaller than an LR cell, even down to upstream areas of less than 5 km2 (Figures 2 and 3). However, the number of evaluated gauging stations with such small upstream areas was very small (Figure 3). The estimated HR streamflow time series enabled, together with other predictors, the estimation of HR streamflow intermittence. A comparison of Figures 11b and 11d, which shows LR and HR intermittence, respectively, for a part of France illustrates the strongly increased information content of the European HR streamflow intermittence data set as compared to an LR intermittence estimation.
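
The downscaling factor of 14,400 follows directly from the two grid resolutions: a 0.5 arc-deg cell spans 1,800 arc-sec, so it contains 120 cells of 15 arc-sec along each side. A short check of this arithmetic, with illustrative variable names:

```python
ARCSEC_PER_DEGREE = 3600

lr_cell_arcsec = int(0.5 * ARCSEC_PER_DEGREE)  # low-resolution cell: 1,800 arc-sec
hr_cell_arcsec = 15                            # high-resolution cell: 15 arc-sec

cells_per_side = lr_cell_arcsec // hr_cell_arcsec  # HR cells along one LR cell edge
downscaling_factor = cells_per_side ** 2           # HR cells per LR cell
```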

Illustration of downscaling of LR WaterGAP output to the HR stream network of HydroSHEDS and the resulting resolution-dependent characterization of intermittence. Panels show LR (0.5 arc-deg) grid cells with the sum of surface runoff and groundwater discharge (the main input to the downscaling algorithm) (a), LR reaches with their intermittence status (b), HR (15 arc-sec) grid cells with downscaled streamflow (c) and HR reaches with intermittence status in 5 classes (d). The figure shows the situation for the example of August 2003. In c and d, the locations of the streamflow gauging stations used for validation of downscaled streamflow and as target for the RF model are added.

It is very difficult to judge the realism or plausibility of the predicted reach intermittence. The model validation against an independent data set for France (Section 5.1) indicates a severe underestimation of observed non-perennial reach-months, while the comparison of predicted intermittence to streamflow observations used for setting up the RF model indicates that the RF model overestimates intermittence, particularly for the relatively dry regions of Europe such as large parts of the Iberian Peninsula. The latter may be explained by the suspected anthropogenic perennialization of streamflow by many small and large dams that have been constructed to make water available even in periods of low or no flow. Still, the BACC for predicting non-perennial station-months (0.92) was very good. We found that the RF model can simulate well the interannual variability of the number of non-perennial months at the streamflow gauging stations (Figure 4c), which is an important positive characteristic if the model is to be used for assessing the impact of drought conditions or climate change. The partial dependence plots for the step 1 RF show that the model identifies tendencies in the probability of a station-month being non-perennial that agree with expert expectations (except for terrain slope), which increases our trust in the derived RF. Moreover, the correlation between the observed and predicted monthly time series of the five intermittence classes is high at most non-perennial stations (Figure 6d), which indicates a good representation of the seasonality of streamflow intermittence. Averaged over all station-months with available intermittence observations, there is no bias in the prediction of the five intermittence classes per size class of upstream area as compared to observations (Table 3). 
However, the prediction of the number of no-flow days per reach-month in four classes must be considered to be less reliable than the prediction of a reach-month as either non-perennial or perennial, as indicated by the lower BACC of 0.67 for the step 2 RF.
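To make the BACC values quoted above easier to interpret, the following minimal sketch computes balanced accuracy as the mean of the per-class recall. This is the standard definition and is assumed, not confirmed here, to match the one used for the step 1 and step 2 RFs; the function name and the toy labels are illustrative only.

```python
from collections import defaultdict


def balanced_accuracy(observed, predicted):
    """Return the mean per-class recall over the classes present in `observed`.

    Assumption: this standard definition corresponds to the BACC reported
    in the text; the function name is hypothetical.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")

    total = defaultdict(int)    # number of observations per class
    correct = defaultdict(int)  # number of correct predictions per class
    for obs, pred in zip(observed, predicted):
        total[obs] += 1
        if obs == pred:
            correct[obs] += 1

    # Average the recall of each observed class with equal weight.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy example (not data from the study): 8 perennial (0) and
# 2 non-perennial (1) station-months, all predicted as perennial.
observed = [0] * 8 + [1] * 2
predicted = [0] * 10
print(balanced_accuracy(observed, predicted))  # 0.5, although plain accuracy is 0.8
```

Because each class contributes equally regardless of its frequency, a model that ignores the rare non-perennial class cannot reach a high score, which is why this metric suits strongly imbalanced station-month data.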

It can generally be expected that uncertainties in the prediction of streamflow intermittence are higher for streams with small upstream areas than for large streams, owing, for example, to the low resolution of the global hydrological model and its climate input data. Yet our analyses regarding model performance as a function of drainage area of streamflow gauging stations show a more nuanced picture. While the performance of the downscaled HR monthly streamflow time series tends to improve with increasing drainage area in the case of perennial rivers (Figure S3 in Supporting Information S1), the opposite is observed in the case of intermittent rivers (Figure S2 in Supporting Information S1). Regarding the performance of estimating perennial versus non-perennial months, our modeling approach correctly portrays that streams with smaller drainage basins are more likely to be non-perennial than streams with larger drainage basins, especially in the two smallest drainage basin classes 0–2 km² and 2–5 km² (Figure S4 in Supporting Information S1). However, these size classes also show the highest overestimations of non-perennial months in the model. This suggests that further investigations are needed to understand local influences on intermittence in headwaters, for example, the potential effects of local groundwater levels and stream bed permeability (Snelder et al., 2013). For drainage basins larger than 2,500 km², on the contrary, our approach tends to underestimate the already low percentage of non-perennial months (Figure S4 in Supporting Information S1).

6 Conclusions and Outlook

For the first time, streamflow intermittence dynamics could be quantified at the continental scale at a high spatial resolution, that is, for stream reaches with an upstream area down to only 2 km² (or even smaller in wet regions). We simulated monthly time series of streamflow intermittence in five classes (0, 1–5, 6–15, 16–29 and 30–31 no-flow days per month) in the period 1981–2019 for more than 1.5 million stream reaches in Europe. This was achieved by (a) downscaling the 0.5 arc-deg output of the global hydrological model WaterGAP to obtain time series of monthly streamflow at about 73 million 15 arc-sec grid cells and (b) combining this information with daily data of streamflow as observed at 3,706 gauging stations and a number of static hydro-environmental characteristics of the upstream basins (plus two WaterGAP-related data sets) in an RF modeling approach. The model captures the interannual variability of the number of non-perennial months satisfactorily, and the monthly time series of the predicted five streamflow intermittence states is highly correlated with observations. The spatial prevalence of weakly non-perennial conditions appears to be underestimated, while the number of non-perennial months is overestimated in the dry regions of Europe where reservoirs tend to perennialize streamflow. While the generated streamflow intermittence data set does diverge from reality for many reach-months, it is nevertheless a valuable basis for macro-scale studies of biodiversity, ecosystem functions and ecosystem services under conditions of potential streamflow intermittence.
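The five intermittence classes listed above can be expressed as a simple mapping from the number of no-flow days in a month to a class. The sketch below is illustrative only: the integer codes 0 to 4 and the function name are assumptions, and the thresholds are applied exactly as stated in the text, without any special treatment of months shorter than 30 days.

```python
# Class labels as given in the text (no-flow days per month).
CLASS_LABELS = ("0", "1–5", "6–15", "16–29", "30–31")


def intermittence_class(no_flow_days):
    """Map a monthly count of no-flow days to a class code 0..4.

    Assumption: the integer codes are hypothetical; only the day ranges
    come from the text. Short months are not handled specially.
    """
    if not isinstance(no_flow_days, int) or not 0 <= no_flow_days <= 31:
        raise ValueError("no_flow_days must be an integer between 0 and 31")
    if no_flow_days == 0:
        return 0  # perennial month
    if no_flow_days <= 5:
        return 1
    if no_flow_days <= 15:
        return 2
    if no_flow_days <= 29:
        return 3
    return 4  # 30 or 31 no-flow days


# Toy example: class label for a month with 12 no-flow days.
print(CLASS_LABELS[intermittence_class(12)])  # 6–15
```

Under these thresholds, a fully dry February falls into the 16–29 class; whether the study treats such months differently is not specified in this section.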

The monthly time series of streamflow intermittence can be used as input to ecological and biogeochemical models. The presented modeling approach was designed to enable the computation of intermittence changes due to climate change. For this purpose, the LR output of a WaterGAP run that is driven by the bias-corrected output of global climate models, instead of observed historic climate, can be downscaled to calculate monthly time series of HR streamflow in, for example, a 30-year reference period and a 30-year period in the future. These time series, together with the LR WaterGAP time series of monthly diffuse groundwater recharge, runoff from land and the number of wet days under climate change, can then serve to compute the dynamic predictor values that are, in addition to the unchanged static predictor values, the input for the two developed RF models. In addition, the developed modeling approach can be used to analyze the occurrence of drought in non-perennial streams (Sarremejane et al., 2022).
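How the two RF models might be chained when they are applied to new predictor values, for example from a climate change run, can be sketched as follows. This is a hedged illustration, not the published workflow: the callables `step1_prob` and `step2_class`, the 0.5 probability threshold and the class codes 0 to 4 are assumptions introduced here, and the toy stand-in functions replace the trained RFs.

```python
def predict_intermittence(predictor_rows, step1_prob, step2_class, threshold=0.5):
    """Combine a binary step 1 model and a four-class step 2 model.

    Assumptions (all hypothetical): `step1_prob(row)` returns the
    probability that a reach-month is non-perennial, `step2_class(row)`
    returns a class code 1..4 for non-perennial reach-months, and
    `threshold` separates perennial from non-perennial predictions.
    """
    classes = []
    for row in predictor_rows:
        if step1_prob(row) < threshold:
            classes.append(0)  # predicted perennial: no no-flow days
        else:
            code = step2_class(row)
            if code not in (1, 2, 3, 4):
                raise ValueError("step 2 must return a class code from 1 to 4")
            classes.append(code)
    return classes


# Toy stand-ins for the trained models (not the RFs of the study).
rows = [{"p": 0.1, "cls": 2}, {"p": 0.8, "cls": 3}, {"p": 0.6, "cls": 1}]
print(predict_intermittence(rows, lambda r: r["p"], lambda r: r["cls"]))  # [0, 3, 1]
```

The step 2 model is only consulted for reach-months that step 1 flags as non-perennial, so errors in step 1 propagate directly into the final five-class time series.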

Acknowledgments

The authors thank their collaborators in the DRYvER project, in particular Jean-Philippe Vidal and Thibault Datry, for their contributions. MA was funded by the EU Horizon 2020 DRYvER project (Project ID 869226). MLM was supported by a Vanier Canada Graduate Scholarship (Natural Sciences and Engineering Research Council of Canada) and an H2O'Lyon Doctoral Fellowship (Université de Lyon, ANR-17-EURE-0018). We also thank the three reviewers for their constructive feedback. Open Access funding enabled and organized by Projekt DEAL.

Data Availability Statement

WaterGAP 2.2e input and output used for deriving HR streamflow and as LR predictors in the RF model are available from Müller Schmied et al. (2023b). The code for deriving HR streamflow (Trautmann, 2023) is available at https://doi.org/10.5281/zenodo.10301003, and the code and workflow for the RF modeling (Abbasi & Messager, 2023) at https://github.com/mahabbasi/europeanIRmap.git. Due to the very large file sizes, the HR monthly streamflow time series are only available on request from the first authors. The following data are available at https://doi.org/10.6084/m9.figshare.24591807: (a) input files for deriving HR streamflow (Table S1 in Supporting Information S1), (b) the monthly time series of streamflow at the 3,706 gauging stations, (c) shapefiles of locations of streamflow gauging stations and European reaches, (d) all predictors and target variables for the 3,706 gauging stations used to generate the step 1 and step 2 RFs and (e) shapefiles with the five intermittence classes for each reach-month in the period 1981–2019 as well as the shapefiles for generating all figures (Döll, Abbasi, Trautmann, et al., 2023). The original data used for deriving the HR static predictors are available as described in Section 2.3.2 and Table 1.