Added Value of Large Ensemble Simulations for Assessing Extreme River Discharge in a 2 °C Warmer World
Abstract
The assessment of return periods of extreme hydrological events often relies on statistical analysis using generalized extreme value (GEV) distributions. Here we compare the traditional GEV approach with a novel large ensemble approach to determine the added value of a direct, empirical distribution-based estimate of extreme hydrological events. Using the global climate and hydrological models EC-Earth and PCR-GLOBWB, we simulate 2,000 years of global hydrology for a present-day and 2 °C warmer climate. We show that the GEV method has inherent limitations in estimating changes in hydrological extremes, especially for compound hydrological events. The large ensemble method does not suffer from these limitations and quantifies the impacts of climate change with greater precision. The explicit simulation of extreme events enables better hydrological process understanding. We conclude that future studies focusing on the impact of climatic changes on hydrological extremes should use large ensemble techniques to properly account for these rare hydrological events.
Key Points
- Statistical models to describe extreme river discharge can be unreliable when multiple processes lead to extreme events
- Large ensemble model experiments (many simulation years) are suitable for analysis of extreme events and do not rely on statistical models
- Hydrological large ensembles provide better estimates of changes in extreme floods and droughts and their characteristics
Plain Language Summary
Extreme hydrological events such as droughts and floods can cause severe harm to people and nature. It is therefore important to understand why and how often they occur now and in the future. We compare two methods of studying these extreme events: a frequently used statistical method and a new direct simulation method (called “large ensemble simulations”). We show that this new method better represents the extreme events, that it reduces the uncertainties of the expected effects of climate change on extreme events, and that it allows us to study why extreme events occur. We are therefore better able to quantify the impact of climate change on hydrological extremes, and we recommend the large ensemble method for future studies on extreme events.
1 Introduction
Global climate change affects the hydrological cycle around the world. Projected changes include changes in climatological precipitation patterns, precipitation types, evaporation amounts, soil moisture availability, and discharge levels (e.g., Berghuijs et al., 2014; Held & Soden, 2006; Intergovernmental Panel on Climate Change, 2013; Milly et al., 2005; Wanders & Wada, 2015a). Besides changes in mean conditions, extreme hydrological events are also projected to change, including more frequent extreme precipitation events, more severe droughts, and changing flood occurrence (e.g., Min et al., 2011; Prudhomme et al., 2014; Van der Wiel et al., 2017; Wanders & Van Lanen, 2015; Wanders & Wada, 2015b; Winsemius et al., 2015). These extreme events can have severe negative impacts on societies and ecosystems. For instance, the recent drought in South Africa (2015 to present, Baudoin et al., 2017) and the 2017 floods in Bangladesh (Philip et al., 2018) had severe negative consequences for people's livelihoods, agricultural production, and the national economy. Scientific understanding of the probability of occurrence, severity, and characteristics of extreme hydrological events is therefore of societal importance.
Extreme events can be studied in two ways: (i) by sampling from long time series or (ii) by means of a statistical model of the tail of the distribution. Because observational hydrological records are of limited length, statements regarding extreme hydrological events frequently rely on the second approach, and the same approach is often taken in simulation studies (e.g., Abaurrea & Cebrián, 2002; Smith et al., 2014; Sousa et al., 2011). These statistical models extrapolate sample statistics of the extreme values in limited-length records to estimate the probabilities of unobserved extreme values. It is assumed that extreme values follow a single given probability distribution. This is common practice when estimating defense levels against floods and designing drought protection infrastructure (e.g., dams and reservoirs).
Recent advances in computing power now allow sufficiently long model simulations or large ensembles to be created to study extreme events by means of sampling, thereby removing the need to rely on statistical models, assumptions on distributions, and extrapolation of data. This is especially valuable for hydrological extremes, estimates of which are sensitive to minor changes in the parameters of statistical models and their extrapolations (Engeland et al., 2004), often leading to large uncertainties. Of course, a modeling approach depends on the quality of modeled data. Model validation with observations remains important.
Here we aim to address this problem by showing that a large ensemble modeling approach can improve estimates of the risk of extreme events in the present climate and reduce uncertainties in the future projections of the occurrence of these extreme events. We use output from a global climate model (GCM) to force a global hydrological model (GHM) and create two large ensembles of 2,000 years each. Taking the perfect model approach, we show the advantages of the large ensemble approach and the explicit simulation of extreme events over the common statistical approach. We provide global estimates of changes in extreme river discharge that may occur in a world with 2 °C global mean warming, both for extreme low discharge (droughts) and extreme high discharge (floods). We mostly focus on extreme events with an average return period of 100 years, which is representative of flood protection levels around the world (Ward et al., 2017) and sampled relatively well in these simulations.
2 Models and Methods
2.1 Global Climate Model: EC-Earth v2.3
The EC-Earth GCM (Hazeleger et al., 2012) was used to create two large ensembles of meteorological data. We used the model in the same configuration (v2.3, 1.1° resolution) as in the Coupled Model Intercomparison Project phase 5 (CMIP5, Taylor et al., 2012). EC-Earth combines an atmospheric model, an ocean model, a land surface model, and a sea ice model. Full details on EC-Earth, its configuration, parametrizations, and individual components are provided in Hazeleger et al. (2012).
The two large ensembles represent two periods which differ in global mean surface temperature (GMST, supporting information Figure S1; James et al., 2017). The first “present-day” large ensemble was created to have a modeled absolute GMST equal to observed GMST in the years 2011–2015 (based on HadCRUT4 data; Morice et al., 2012). The second “2 °C warming” large ensemble was created to have a modeled absolute GMST equal to observed preindustrial temperature +2 °C warming, motivated by the Paris climate agreements (United Nations Framework Convention on Climate Change, 2015). Each large ensemble consists of 2,000 years of global, daily meteorological data. More details on the design of the large ensembles are provided in the supporting information (Buizza et al., 1999).
2.2 Global Hydrological Model: PCR-GLOBWB 2
Terrestrial hydrology was simulated using the GHM PCR-GLOBWB 2 (Sutanudjaja et al., 2018). PCR-GLOBWB computes a water balance at a 0.5° grid scale, forced by daily precipitation, 2-m air temperature, and potential evapotranspiration from EC-Earth. Potential evapotranspiration was calculated from EC-Earth variables following the Penman-Monteith procedure (Zotarelli et al., 2010). PCR-GLOBWB includes modules to simulate land surface processes, groundwater, irrigation, domestic and industrial water use, and river routing including lakes and man-made reservoirs. Full details are provided in Sutanudjaja et al. (2018). We compare simulated extreme discharge events to those in the Global Runoff Data Centre data set.
A bias correction was applied to the EC-Earth variables before they were used as meteorological drivers for PCR-GLOBWB. Regridding from the coarser EC-Earth grid to the PCR-GLOBWB grid was done by means of a bilinear interpolation. Near-surface temperatures and potential evapotranspiration were corrected to have a seasonal cycle and interannual variability as observed by a shift of the mean and a multiplicative correction. Precipitation was corrected to have the same number of monthly wet days and monthly totals as observed. This is essential because excessive drizzle in EC-Earth (a common problem in GCMs; Dai, 2006) otherwise prevents realistic droughts developing in the simulations. Correction parameters were derived by comparison of the simulated present-day climate and an observational-based baseline (ERA-Interim reanalysis product, 1979–2016; Dee et al., 2011). The obtained correction factors are applied to both the present-day and 2 °C warming ensembles assuming that biases in EC-Earth are equal for the two climate states.
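The wet-day and monthly-total correction for precipitation can be illustrated with a minimal Python sketch. It assumes a simple scheme in which, for a single month, only the wettest simulated days are kept (as many as the observed number of wet days) and the remaining precipitation is rescaled to the observed monthly total. The function name `correct_monthly_precip` and this particular scheme are illustrative assumptions, not the documented procedure applied to the EC-Earth forcing.

```python
def correct_monthly_precip(sim_daily, obs_wet_days, obs_total):
    """Illustrative drizzle correction for one month of daily precipitation.

    Keeps the `obs_wet_days` wettest simulated days, sets all other days
    to zero, and rescales so the monthly sum equals `obs_total`.
    """
    n = len(sim_daily)
    k = max(0, min(obs_wet_days, n))
    # Indices of the k wettest days (ties broken by day order).
    wettest = sorted(range(n), key=lambda i: (-sim_daily[i], i))[:k]
    keep = set(wettest)
    trimmed = [p if i in keep else 0.0 for i, p in enumerate(sim_daily)]
    total = sum(trimmed)
    if total == 0.0:
        return trimmed  # a completely dry month cannot be rescaled
    factor = obs_total / total
    return [p * factor for p in trimmed]


# Hypothetical six-day "month" with drizzle on most days (values in mm/day).
simulated = [0.2, 0.1, 5.0, 0.3, 10.0, 0.1]
corrected = correct_monthly_precip(simulated, obs_wet_days=2, obs_total=30.0)
print(corrected)
```

In this toy case the four drizzle days are removed, so dry spells can develop, while the two wet days carry the full observed monthly total.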
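2.3 Analysis of Extreme Events
We compare three methods of estimating discharge levels of extreme events for given return periods: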
- Generalized extreme value (GEV) fit to 100 years of data
- GEV fit to 2,000 years of data
- Empirical distribution estimate based on 2,000 years of data
The first two methods follow a commonly used statistical approach: GEV estimation. We fit GEVs and estimate confidence intervals to a randomly selected subsample of 100 years and to the full ensemble of 2,000 years (Gilleland & Katz, 2016). The first approach somewhat mimics the reality in operational water management with limited observational records or short model experiments. Comparing the two GEV methods shows the benefits of having long modeled time series.
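The contrast between a GEV fit to a 100-year subsample and a fit to all 2,000 years can be sketched in Python. The sketch below uses an L-moment estimator (Hosking's approximation) as a stand-in for the fitting routine, which the text does not specify beyond the citation; the synthetic Gumbel-distributed "annual maxima", the function names, and the parameter values are all assumptions made for illustration.

```python
import math
import random


def sample_l_moments(values):
    """Return the first three sample L-moments (l1, l2, l3)."""
    x = sorted(values)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0


def fit_gev_lmom(values):
    """Estimate GEV (loc, scale, shape) from L-moments.

    The returned shape is positive for a heavy upper tail.
    """
    l1, l2, l3 = sample_l_moments(values)
    t3 = l3 / l2  # L-skewness
    c = 2.0 / (3.0 + t3) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c  # Hosking's k equals minus the shape
    if abs(k) < 1e-8:  # Gumbel limit
        scale = l2 / math.log(2)
        return l1 - 0.5772156649 * scale, scale, 0.0
    g = math.gamma(1.0 + k)
    scale = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    loc = l1 - scale * (1.0 - g) / k
    return loc, scale, -k


def gev_return_level(loc, scale, shape, return_period):
    """Discharge level exceeded on average once per `return_period` years."""
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(shape) < 1e-8:
        return loc - scale * math.log(y)
    return loc + scale / shape * (y ** (-shape) - 1.0)


# Synthetic stand-in for 2,000 simulated annual maximum discharges (m3/s).
random.seed(1)
annual_max = [1000.0 - 200.0 * math.log(-math.log(random.random()))
              for _ in range(2000)]
subsample = random.sample(annual_max, 100)

for label, data in (("100 years", subsample), ("2,000 years", annual_max)):
    loc, scale, shape = fit_gev_lmom(data)
    level = gev_return_level(loc, scale, shape, 100)
    print(f"{label}: shape={shape:.3f}, 100-year level={level:.0f}")
```

Repeating the subsampling with different seeds gives a feel for how strongly a 100-year fit can vary, which is the sampling uncertainty the longer series is meant to reduce.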
The third method does not rely on an a priori assumption of the statistical distribution of the data. GEV fits are made under the assumption that the full distribution of extreme events can be described by a GEV function. This assumption is not always valid, especially for rivers where different runoff generating processes can cause extreme events. In that case, extreme events may not be part of a single statistical distribution, and a GEV analysis will be unreliable. We used the full empirical distribution of events as simulated in the large ensembles to directly determine discharge levels for different return periods. Confidence intervals are estimated by means of bootstrap resampling (N = 10,000). Comparing the GEV fit based on 2,000 years of data to this empirical distribution approach shows the benefits of direct sampling over a statistical approach. A comparison of the empirical distribution approach to fits based on log-normal or Gumbel distributions is included in the supporting information (Figures S2 and S3). The GEV-based fits provide better statistical descriptions of the modeled data than the fits based on the other distributions; we therefore only show the GEV-based analysis here.
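The empirical distribution approach with bootstrap confidence intervals can be sketched as follows. This is a minimal illustration under stated assumptions: the return level is taken as the order statistic that is exceeded in about n/T of the n simulated years, the interval is a simple 95% percentile interval, and the function names and synthetic data are hypothetical. The default number of resamples is kept small for speed; the analysis described above uses N = 10,000.

```python
import random


def empirical_return_level(annual_extremes, return_period, kind="max"):
    """Level exceeded (kind='max') or undershot (kind='min') in about
    n / return_period of the n sampled years."""
    n = len(annual_extremes)
    m = n // return_period  # number of years more extreme than the level
    if return_period < 2 or m < 1:
        raise ValueError("return period must be >= 2 and <= sample length")
    ordered = sorted(annual_extremes, reverse=(kind == "max"))
    return ordered[m]


def bootstrap_interval(annual_extremes, return_period, kind="max",
                       n_boot=1000, seed=0):
    """95% percentile interval from resampling years with replacement."""
    rng = random.Random(seed)
    n = len(annual_extremes)
    estimates = sorted(
        empirical_return_level(rng.choices(annual_extremes, k=n),
                               return_period, kind)
        for _ in range(n_boot)
    )
    lo = estimates[int(round(0.025 * n_boot))]
    hi = estimates[min(n_boot - 1, int(round(0.975 * n_boot)))]
    return lo, hi


# Synthetic stand-in for 2,000 simulated annual maximum discharges (m3/s).
rng = random.Random(42)
annual_max = [rng.lognormvariate(7.0, 0.3) for _ in range(2000)]
level = empirical_return_level(annual_max, 100)
low, high = bootstrap_interval(annual_max, 100, n_boot=500)
print(f"100-year flood: {level:.0f} (95% interval {low:.0f} to {high:.0f})")
```

Because no distribution is fitted, the estimate follows whatever shape the simulated tail has, at the cost of being limited to return periods well below the length of the ensemble.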
3 Results
3.1 Changes in Extreme Floods
We compare the three methods of analyzing extreme flood events for some of the major rivers around the world (Figure 1). These rivers were selected to be highlighted because they display different behavior in response to global climate change and for the different methods. The model reproduces floods in these rivers quite well (Figures 1c, 1f, 1i, and 1l). However, the limited length of observational records prevents the verification of the most extreme simulated floods.
Uncertainties in the GEV fit to 100 years of data are so large that we find no significant change in the flood occurrence for the Lena River (Figure 1a). The GEV fit to 2,000 years shows a statistically significant increase in flood magnitudes (Figure 1b). Based on the empirical distribution approach, we confirm this increase in flood magnitude for all return periods (Figure 1c). The GEV-fitted estimates of the 100-year flood magnitude are 4% and 1% lower for the GEV fits based on 100 and 2,000 years, respectively, than those based on the empirical distribution (supporting information Figure S4).
For the Mekong River, the GEV fit to 100 years of data again does not show significant change (Figure 1d). When 2,000 years of data are used to fit the GEV (Figure 1e), a statistically significant increase of floods of all return periods is found. The GEV estimates of the 100-year flood are 8% and 1% lower than the empirical estimates (supporting information Figure S4). The empirical distribution approach here indicates that only floods with return periods of up to 200 years are projected to increase; changes of more extreme floods are uncertain (Figure 1f).
In contrast to the Lena and Mekong Rivers, no changes in flood occurrence are projected in the Columbia River based on all three methods (Figures 1g–1i). The Columbia River shows a shift in the distribution for the most extreme floods, caused by cooccurrence of severe precipitation, high snowmelt volumes, and high groundwater levels. This behavior is not observed in the 2 °C warming climate, likely due to a reduction of the contribution of snowmelt (Vano et al., 2015) or due to the fact that a temporal shift of the snowmelt season reduces cooccurrence of these processes.
The GEV fit to 2,000 years of data fails to capture the most extreme events in the Amazon River (Figure 1k). The GEV-based estimates of the 100-year flood are 8% lower than the empirical distribution for both GEV fits (supporting information Figure S4), indicating that the use of longer time series in the GEV approach does not help here. Larger differences are found for greater return periods. In this region we see that the cooccurrence of severe precipitation in different contributing regions of the Amazon causes the most severe floods in the downstream regions of the river (as in 2014; Espinoza et al., 2014). These very extreme floods are part of a different distribution than the distribution of less extreme floods. Any statements on discharge levels of extreme floods or projected changes based on these GEV fits are unreliable. The empirical distribution approach, which is capable of capturing the double distribution in extreme floods and indeed shows multiple branches in the southeast catchment with very high discharge for very extreme floods (Figure 2), shows no projected change in extreme floods in the Amazon River (Figure 1l).
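One way to make the "different distribution" argument concrete is a two-regime mixture. The notation below is introduced purely for illustration and is not taken from the analysis: suppose a fraction $p$ of years produces floods through a rarer mechanism with distribution $F_2$, while the remaining years follow $F_1$.

```latex
% Annual-maximum CDF when a fraction p of years belongs to a second regime
F(x) = (1 - p)\,F_1(x) + p\,F_2(x)

% The T-year return level z_T solves
F(z_T) = 1 - \frac{1}{T}
```

If $F_2$ sits well above $F_1$ and $p$ is small, the bulk of the sample that dominates a single GEV fit is governed by $F_1$, whereas return levels for large $T$ are governed by $F_2$. A single GEV fitted to the whole sample then tends to misrepresent the upper branch, which is consistent with the underestimation reported here for the Amazon.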
We apply the same three methods to investigate changes in the value of the 1-in-100-year flood globally (Figure 3). In these maps, we only show statistically significant changes to highlight differences in statistical significance between methods. Based on the empirical distribution method, we find that 25% of land shows a significant increase of 1-in-100-year floods (Figure 3a). Just 5% of global land shows a significant decrease in 1-in-100-year floods. The remaining land surface area mostly shows nonsignificant increases in 1-in-100-year floods. This global shift toward more extreme flooding conditions may be related to projected increases in extreme precipitation (e.g., Min et al., 2011; Van der Wiel et al., 2017), while at high latitudes changing precipitation types may contribute (Bintanja & Andry, 2017).
The 100-year-based GEV fit does not provide any statistically significant estimates of changes in extreme floods (Figure 3b). The large-scale pattern of changes estimated by means of the GEV fit to 2,000 years is comparable to that based on the empirical distribution approach (23% of land with significant worsening floods). Regional differences exist; for example, in West Africa, India, and along the Nile the empirical distribution method provides more statistical significance.
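Land-area percentages such as those quoted above can be computed from per-cell significance flags with an area weighting. The sketch below assumes a regular latitude-longitude grid, on which cell area is proportional to the cosine of latitude; whether the reported percentages were computed in exactly this way is not stated, and the function name and toy inputs are hypothetical.

```python
import math


def area_weighted_fraction(latitudes_deg, flags):
    """Fraction of total cell area for which `flags` is True.

    `latitudes_deg` holds the centre latitude of each land cell and
    `flags` the matching booleans (for example, "significant increase").
    """
    weights = [math.cos(math.radians(lat)) for lat in latitudes_deg]
    flagged = sum(w for w, f in zip(weights, flags) if f)
    return flagged / sum(weights)


# Toy example: four land cells at different latitudes.
lats = [0.25, 30.25, 60.25, 75.25]
significant_increase = [True, False, True, False]
print(f"{100 * area_weighted_fraction(lats, significant_increase):.1f}% of land area")
```

Without the weighting, high-latitude cells on a 0.5° grid would count as much as tropical cells despite covering far less area.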
3.2 Changes in Extreme Droughts
We repeat the above analysis for extreme drought events. The GEV fits for extreme droughts have smaller confidence intervals (Figure 4) than those for extreme floods (Figure 1). Based on the comparison of the GEV fits to 2,000 years and the large ensemble bootstrap resampling method based on 2,000 years, we suspect the GEV fits are overconfident. Droughts in the Lena and Columbia Rivers are simulated quite well, though again very extreme droughts cannot be constrained based on observational data (Figures 4c and 4i). For the Amazon River, the observed record suggests drought events are best described by a double GEV distribution, which is, however, not found for the simulated values (Figure 4l). The observational record for the Mekong River is relatively short (34 years), with one event well outside the simulated range. This one event can either be a very extreme event, with a return period much longer than 34 years, or the first event of a second distribution. The limited length of the record does not allow us to distinguish between these possibilities, nor can we say much about the quality of the model chain for very extreme events here.
The Lena River is the only river shown here that shows a decreasing trend in extreme drought magnitude (Figures 4a–4c). The GEV fits (Figures 4a and 4b) have difficulties capturing the higher return levels, which are caused by prolonged periods of below 0 °C temperature, below normal precipitation, and low groundwater levels. These are typically multiyear droughts, where two consecutive drought events cause a strong reduction in river discharge. These multiyear events violate the GEV assumption that events are independent, which could be the reason for the failing fits here.
The benefits of having 2,000 years of data over 100 years of data are most obvious for the Amazon River, which shows a significant increase of droughts only when the full large ensemble is taken into account (Figures 4j and 4k), and for the Mekong River, where the GEV fit to 100 years of data leads to a different conclusion than the GEV fit to 2,000 years of data (Figures 4d and 4e). In general, the difference between the GEV fits and the empirical distribution is smaller for the 2,000-year-based fits (supporting information Figure S5).
The benefits of the empirical distribution method over the GEV method are largest for the Mekong River. The GEV fit to 2,000 years does not provide a good description of the most extreme drought events (Figure 4e and supporting information Figure S5). As a result, the extreme discharge droughts are underestimated in the GEV fit compared to the values from the empirical distribution approach (Figure 4f).
Figure 5a shows the projected change for 100-year droughts following the large ensemble approach. The large-scale pattern reflects changes in large-scale precipitation patterns. For example, in Europe, a northward shift of summer precipitation (Intergovernmental Panel on Climate Change, 2013) leads to more severe 100-year droughts in the Mediterranean region and central Europe (Rhône, −20%; Danube, −12%; Rhine, −14%), though less severe droughts in Scandinavia. Rivers located in the Boreal North show a strong reduction in drought severity; here global warming shortens the low flow season, reducing drought magnitudes. In equatorial Africa the widening of the Hadley circulation (Hu & Fu, 2007; Lu et al., 2007) results in worsening drought magnitude in the subtropical dry zone (e.g., Nile, −8%).
The 2,000-year GEV-based estimates of 100-year drought events (Figure 5c) show comparable large-scale patterns, though with less statistical significance. At smaller scales there are differences, for example, on the island of New Guinea, where the large ensemble method indicates more severe droughts and the GEV methods show no significant change. Comparing the two 2,000-year-based methods (Figures 5a and 5c), we find that more rivers show statistically significant changes for the empirical distribution method (41% versus 28% of land surface, respectively), despite the suspected overconfidence of GEV drought estimates (Figure 4).
4 Discussion
The large ensembles as simulated here represent a chosen climate state. This is fundamentally different from observational records (which are subject to changing management conditions and transient climate) and transient model simulations. Analysis of transient time series requires assumptions to be made on driving covariates, for example, GMST. The design of the large ensembles used here, in combination with the empirical distribution approach, limits the need for (statistical) assumptions, descriptions, and corrections. However, for analysis of events of very high return periods sampling uncertainties remain large.
Although the hydrological model has been extensively validated and shows good agreement with observational records (Sutanudjaja et al., 2018), there is insufficient observational data to evaluate the performance of PCR-GLOBWB for extreme floods and droughts. Repeating the analysis with different GCMs and different GHMs will provide insight into the model sensitivity of the results. This is important as both EC-Earth and PCR-GLOBWB are imperfect models: coarse model resolution, model parameterizations, and missing physical processes give rise to biases in modeled fields. The present analysis was done under the perfect model assumption. This assumption can be relaxed in a multimodel framework, because the results would no longer depend on a single imperfect model formulation and biases tend to reduce (Tebaldi & Knutti, 2007). When looking at relative changes, as done here, model biases also partly cancel. Finally, by combining multiple GCMs with multiple GHMs, the dominant sources of uncertainty in projections (meteorological variability, hydrological response, or model formulation) can be isolated (e.g., Marx et al., 2018; Thober et al., 2017).
Though having an off-line GHM provides us with the opportunity to apply a bias correction to the meteorological forcing variables, there is no closure of the water budget in the complete land-atmosphere system (water budget closure is ensured in the GCM and GHM separately). The lack of full water system closure may introduce biases in future projections of drought, most notably due to the off-line computation of potential evapotranspiration (Kay et al., 2018; Milly & Dunne, 2017). Hydrological variables from GCMs do not suffer from this bias. Direct analysis of GCM hydrology is therefore a possible solution (e.g., Van der Wiel et al., 2018). However, representation of hydrological processes in GCMs is not as sophisticated as in GHMs, often neglecting the groundwater system, human-water interaction, and river routing (Bierkens, 2015). This creates a need for the GCM-GHM modeling sequence. Here we have assumed that the uncertainties introduced by the coupling are compensated by a more realistic hydrologic representation of the system.
5 Conclusions
The aim of this analysis was to show the added value of large ensemble GCM-GHM simulations for the study of extreme hydrological events. This novel method complements existing methods based on observational data, statistical extrapolation, and short (multi)model simulations. By means of a technical comparison we have identified limitations of GEV approximations and shown that these are resolved when extreme events are sampled directly from the empirical distribution of river discharge. The three main advantages are (i) improved estimates of discharge levels of extreme events, (ii) better constraints on projections of changes in extreme events due to global climate change, and (iii) information on the physics of extreme events.
Increasing computing power now allows us to simulate sufficiently long time series or large ensembles to sample extreme events directly from their empirical distribution. We have shown that the described limitations of GEV estimates do not apply to the large ensemble approach, assuming the model simulates extreme events and changes therein with some accuracy. We therefore strongly recommend the use of large ensemble techniques for future studies of extreme hydrological events and note that similar advances can be made in studies of climate-induced extreme events in other fields.
Acknowledgments
This work is part of the HiWAVES3 project (NWO ALWCL.2016.2). N. W. acknowledges funding by NWO 016.Veni.181.049. The authors would like to thank two anonymous reviewers and the Editor, Valeriy Ivanov, for their comments, which helped to improve the manuscript. The data set of simulated annual minimum and maximum discharge values used in this study is available from the website http://doi.org/10.5281/zenodo.2536396.