Modeling multidecadal surface water inundation dynamics and key drivers on large river basin scale using multiple time series of Earth‐observation and river flow data

Periodically inundated floodplain areas are hot spots of biodiversity and provide a broad range of ecosystem services but have suffered alarming declines in recent history. Despite their importance, their long‐term surface water (SW) dynamics and hydroclimatic drivers remain poorly quantified on continental scales. In this study, we used a 26 year time series of Landsat‐derived SW maps in combination with river flow data from 68 gauges and spatial time series of rainfall, evapotranspiration and soil moisture to statistically model SW dynamics as a function of key drivers across Australia's Murray‐Darling Basin (∼1 million km2). We fitted generalized additive models for 18,521 individual modeling units made up of 10 × 10 km grid cells, each split into floodplain, floodplain‐lake, and nonfloodplain area. Average goodness of fit of models was high across floodplains and floodplain‐lakes (r2 > 0.65), which were primarily driven by river flow, and was lower for nonfloodplain areas (r2 > 0.24), which were primarily driven by rainfall. Local climate conditions were more relevant for SW dynamics in the northern compared to the southern basin and had the highest influence in the least regulated and most extended floodplains. We further applied the models of two contrasting floodplain areas to predict SW extents of cloud‐affected time steps in the Landsat series during the large 2010 floods with high validated accuracy (r2 > 0.97). Our framework is applicable to other complex river basins across the world and enables a more detailed quantification of large floods and drivers of SW dynamics compared to existing methods.


Introduction
Wetlands and other periodically inundated surface water (SW) areas are hot spots of biodiversity [Gopal, 2009;Pearson et al., 2013;Arthington et al., 2014] play a crucial role in global climatic, hydrologic, and biogeochemical cycles [Hamilton, 2010] and provide numerous ecosystem services of value to people [Maltby and Acreman, 2011;Costanza et al., 2014]. In recent history however, increasing development of water resources, land use transformations, and agricultural intensification have led to an alarming disappearance and decline of floodplains and other types of wetlands around the world [Finlayson et al., 1999;Lemly et al., 2000;Jones et al., 2009;Costanza et al., 2014], with the most comprehensive recent review estimating a loss of 69-75% of inland wetland area present in 1900 AD [Davidson, 2014]. Our study area, the semi-arid Murray-Darling Basin (MDB) in Australia, is a good example of this global trend, where intensive agricultural and water resources development in the second half of the 20th century has led to deterioration of many of the basins floodplain wetlands . This problem was further intensified by the recent Millennium Drought (from mid-1990s to 2009), the most severe recorded hydrological drought in the MDB, which was followed by widespread extreme flooding in 2010/2011 commonly referred to as the La Nina Floods [Leblanc et al., 2012].
As a result of the dramatic global loss in wetland area and the paramount value of terrestrial SW resources in the face of widely proclaimed global freshwater crisis [V€ or€ osmarty et al., 2010;Bakker, 2012;World Economic Forum, 2015], the usage of remote sensing for quantifying the distribution and temporal variability of SW over large areas and periods of time has become an important field of research in recent years Alsdorf et al., 2007;Papa et al., 2010;Tulbure and Broich, 2013;Bates et al., 2014;Bierkens, 2015;Hu et al., 2015;Kuenzer et al., 2015;Lettenmaier et al., 2015;Pekel et al., 2016;Tulbure et al., 2016;World Bank, 2016]. A better understanding of SW extent dynamics, that is the distribution of inland water bodies through time, at continental scale can enable better management of water resources Overton, 2005;Powell et al., 2008;Viney et al., 2014], assist in managing floods and droughts [Sakamoto et al., 2007;Huang et al., 2014;Hu et al., 2015] or help to identify priority conservation areas by studying ecological connectivity across space and time [Tulbure et al., 2014;Bishop-taylor et al., 2015]. Remotely sensed series of SW maps have further been used to quantify reservoir storage Zhang et al., 2014b] and to estimate wetland area in the context of malaria management [Midekisa et al., 2014].
Large-scale flooding events are a critical component of the terrestrial water cycle [Alsdorf et al., 2007] but their propagation through river systems and the corresponding dynamics remain poorly quantified on continental or global scales . The traditional approach for quantifying the extent of floods is through hydrodynamic modeling and increases in computation power and data availability have led to major advances in continental and even global scale flood modeling in recent years [Yamazaki et al., 2011;Getirana et al., 2012;Neal et al., 2012Neal et al., , 2015Cauduro et al., 2013;Sampson et al., 2015]. Schumann et al. (2016) applied a hydrodynamic model to simulate floodplain inundation across Australia continuously over a 40 year period. Nevertheless, parameterization of these models remains difficult and accurate representation of complex river and floodplain topographies and the corresponding storage effects is still limited by the quality of suitable data sets such as digital elevation models (DEMs) with global coverage . A promising alternative for quantifying and monitoring SW dynamics over large areas and long periods of time is the use of time series of satellite data [Alsdorf et al., 2007;World Bank, 2016], with the Landsat and MODIS satellites currently representing the most commonly used data source [Musa et al., 2015;Karpatne et al., 2016]. The unique spectral signature of SW in the infrared and visible bands allows for separation between water and dry land surface by applying a suitable classification technique to satellite imagery, enabling the generation of time series of SW maps on large river basin [Tulbure and Broich, 2013;Huang et al., 2014;Kuenzer et al., 2015;Tulbure et al., 2016], continental [Klein et al., 2014;Mueller et al., 2016], and even global scales [Papa et al., 2010;Klein et al., 2015;Pekel et al., 2016].
Apart from providing a promising data source for calibrating large-scale hydrodynamic models [Karim et al., 2011;Ticehurst et al., 2013], such series of SW maps derived from satellite imagery have previously been used for developing statistical inundation models by empirically linking remotely sensed SW extents to river flow or stage [Costelloe et al., 2003;Gumbricht et al., 2004;Powell et al., 2008;Westra and De Wulf, 2009;Frazier and Page, 2009;Ren et al., 2010;Jung et al., 2011;Leauthaud et al., 2013;Sims et al., 2014;Huang et al., 2014;Ogilvie et al., 2015;Azman et al., 2016]. In addition to their high cost-benefit ratio, satellite-based inundation models are useful for a variety of applications including the prediction of wetland inundation extent [Gumbricht et al., 2004;Overton, 2005;Westra and De Wulf, 2009;Huang et al., 2014] or the improvement of environmental flow strategies [Shaikh et al., 2001;Powell et al., 2008], a water management technique for recovering floodplain ecosystem health by releasing flushes of water from reservoirs. Most existing models, however, are based on a single cohesive floodplain site, used only a limited number of manually selected satellite images and did not account for lag times between river gauges and the area of inundation response. Heimhuber et al. [2016] addressed these limitations by developing a prototype framework for modeling SW extent dynamics, depicted in a Landsat-derived time series of SW maps   , continuously as a function of river flow (Q), local rainfall (P), soil moisture (SM), and evapotranspiration (ET), yet to be tested and applied on large river basin scale. In this study, we built on and adapted this framework to model SW extent dynamics and drivers across the entire MDB (1 million km 2 ). Like many large river basins, the MDB contains a broad variety of smaller river systems with unique climate and hydromorphological characteristics, flow and flooding regimes as well as highly variable levels of river regulation and water abstraction for irrigated agriculture and other uses [Murray-Darling Basin Authority (MDBA), 2010; Leblanc et al., 2012]. All these aspects impose difficulties for accurate quantification of the distribution and movement of SW across river and floodplain systems [Rudorff et al., 2014], which is key knowledge for managing these resources more sustainably [World Bank, 2016]. Satellite-based time series of SW extent, which are rapidly becoming more available [Karpatne et al., 2016], hold good potential for overcoming these difficulties, but their application for modeling large-scale inundation processes remains poorly explored . Our study addresses this gap by integrating multiple novel time series of Earthobservation and ground data for quantifying and modeling SW extent dynamics and drivers within a large and complex river system. Specific objectives were to (1) statistically model inundation extent dynamics Water Resources Research 10.1002/2016WR019858 across the MDB, (2) quantify the role of river flow and local climate drivers (i.e., P, ET, and SM) in SW dynamics, and (3) apply the models developed in (1) to predict SW extents for highly cloud-affected Landsat observations during the 2010/2011 La Nina Floods for two floodplains with contrasting flooding regimes.

Study Area
The study area of this research is one of Australia's largest river systems and the country's bread basket, the predominantly arid MDB (Figure 1). The MDB is often divided into a northern part covering the entire Darling River catchment until its confluence with the Murray River and a southern part covering the remaining area to the south [Leblanc et al., 2012;Huang et al., 2014]. The basin hosts almost 30,000 wetlands [MDBA, 2010] with 16 listed as ''wetlands of international importance'' [Ramsar Convention Secretariat, 2014] and around 200 specified in the ''Directory of Important Wetlands in Australia'' [Environment Australia, 2001]. Together, all wetlands cover around 6% of the total area of the MDB and 89% of all wetland areas are floodplains . The system's long-term average annual rainfall is 469 mm of which around 90% is lost due to evapotranspiration [Leblanc et al., 2012]. The MDB has a pronounced climate gradient with average annual rainfall decreasing and climate variability and evapotranspiration increasing from S-E to N-W [MDBA, 2010]. Due to this climate gradient and the diverse terrain, flooding regimes differ Figure 1. Location, major rivers [BOM, 2012], ecohydrological-zonation , and topography [CSIRO and Geoscience Australia, 2011] of the Murray-Darling Basin along with the spatial framework used for modeling SW dynamics. Colored 10 3 10 km grid cells contained SW areas that are hydraulically connected (i.e., floodplain and floodplain-lake category) to one of the 68 model gauges (colors of grid cells correspond to the color of their respective modeling gauge). For better readability of Figure 1a, irrigation areas [BRS, 2008] and the floodplain and floodplain-lake SW categories (derived from ) are only shown in Figures 1b and 1c. The nonfloodplain category comprised all remaining areas outside the two floodplain categories (colored grid cells and uncolored areas). Blue areas outside of grid cells in Figure 1a illustrate the location of mapped water bodies within the nonfloodplain SW category .

10.1002/2016WR019858
substantially across the MDB, with many of the ephemeral subcatchments in the northern basin having a dry-land river flow regime characterized by extreme variability with long dry spells, punctuated by large flood events [Bunn et al., 2006]. In the rivers of the southern basin, floods typically occur in winter and spring as a result of reliable rainfall and snowmelt in the runoff generating catchment areas and can last for several months at a time in the lower Murray River [Penton and Overton, 2007].
The very different hydrology of the northern compared to the southern basin is reflected in the fact that despite the large size of its catchment, the Darling River only contributes about 14% of the total Murray River inflows under natural conditions [Leblanc et al., 2012]. The MDB is a hot spot for agricultural activity with irrigated agriculture accounting for about 60% of Australia's total agricultural water use and the overall SW use is about half of the MDB's average annual water availability [Leblanc et al., 2012]. This very high level of water usage and related water resources development has led to the deterioration of many of the basin river and floodplain ecosystems [Kingsford, 2000;Leblanc et al., 2012].

Model Development and Validation
The following paragraphs provide a detailed description of the input data, the spatial modeling framework, and the statistical techniques used for modeling SW extent as a function of key driver variables across the MDB. In this study, we built on and adapted a prototype modeling framework that was developed for three local river and floodplain systems in the MDB and is explained in detail in Heimhuber et al. [2016]. This modeling framework is based on a segmentation of the river basin into spatial units that are suitable for establishing meaningful statistical variable relationships. This segmentation accounts for the hydrological structure of the river basin, the type of water body that is being modeled as well as for the hydraulic connection of water bodies to a river with available gauge data. For each spatial unit, a statistical model between a numerical time series of SW extent as the dependent variable and Q, P, ET, and SM as potential driver variables was then fitted and used to analyze the variable relationships.

Time Series of Surface Water Extent
For modeling SW extent holistically across the study area, we used an existing, currently unprecedented Landsat-based time series of SW extent for the MDB . This time series was derived from all available Landsat 5 and 7 scenes between 1986 and 2011 (25,000) of the USGS/EROS L1T standard product with less than 50% overall cloud cover by using random forest classification models for water and clouds. For each satellite observation time step, the series consisted of a raster file in which each pixel was classified into water, dry-land, cloud or no-data. A rigorous accuracy assessment revealed overall classification accuracy of 99%. The median number of classified Landsat scenes per year per path/row was 13 images as opposed to the potentially possible number of 22 scenes as a result of cloud cover .

Spatial Modeling Framework
Modeling SW extent dynamics consistently across the MDB required the development of a spatial modeling framework. Here we built on a previously developed and tested prototype framework [Heimhuber et al., 2016] to model SW dynamics across the entire river basin (Figure 1). To account for the hydrological structure of the study area, we used an existing ecohydrological (EH) zonation of the MDB into 89 zones with uniform ecological and hydrological characteristics [Chen et al., 2012;Huang et al., 2013] as the major units for analysis and modeling. We then combined this zonation with a regular 10 3 10 km grid, which enabled the application of a consistent modeling approach across the river system. The MDB contains a large variety of SW areas and even though the majority of wetlands are floodplains, not all floodplains have a direct hydraulic connection to a river with available gauge data. To account for the different dynamics of different type of SW areas, we categorized them into floodplains, floodplain-lakes, and nonfloodplain areas (hereafter referred to as SW categories) based on the type of wetland and its connection to gauged rivers and modeled them separately using a tailored modeling approach (see Figures 1b and 1c). It is important to note here that this categorization did not intend to accurately define floodplain areas across the MDB. Instead, it is intended to differentiate between SW areas (floodplains and lakes) that are primarily driven by the dynamics of larger rivers with available gauge data and all remaining SW areas which are driven by more local rainfall runoff processes. Hence, the nonfloodplain SW category contained a large variety of SW areas including isolated wetland systems, lakes and areas of irrigated agriculture.
For generating the SW categorization, we first selected 68 river gauges with long-term records to establish homogenous gauge coverage across the MDB (Figure 1). We then used the Australian Geofabric [BOM, 2012], a fully connected and directed river network, to perform network routing operations and define Water Resources Research 10.1002/2016WR019858 hydraulic connectivity between river gauges and grid cells that contained floodplain or lake areas as defined by an existing static wetland layer for the MDB  (see colored grid-cells in Figure  1). If the river network indicated that there was a hydraulic connection between a grid cell and the river gauge, all floodplain areas in that cell were assigned to the floodplain category, and all lakes to the floodplain-lake category. The river network did not always represent the hydraulic connectivity that may occur on vast floodplains during times of flooding. To address this limitation, we used a Landsat-based relative inundation frequency layer  as an additional indicator for hydraulic connectivity to assign those SW areas to the floodplain category that are only connected to gauged rivers during major flooding events. In some areas, this methodology led to numerous modeling grid cells with hydraulic connectivity to a gauge that did not have any wetlands according to the static wetland layer but showed significant SW dynamics in the Landsat-based inundation frequency layer. For these grid cells that predominantly occurred in the very N-E of the MDB, we considered all areas that were inundated at least once during the analysis period as floodplain area, in order to capture and model the corresponding SW extent dynamics accordingly.
For generating the final modeling units, we applied a buffer of 300 m to both the floodplain and the floodplain-lake areas to ensure full coverage of floodplain areas that might inundate during the largest floods. The resulting areas were then clipped to individual modeling units based on the regular grid and the EH-zonation so that each grid cell potentially consisted of a combination of floodplain, floodplain-lake, and nonfloodplain area. The resulting framework comprised a total of 4120 individual floodplain and 447 floodplain-lake modeling units across the MDB which were each linked to one of the 68 river gauges ( Figure 1). In addition, there were a total of 13,954 grid cells that contained nonfloodplain area.

River Flow Data
The river gauges that were used to model SW dynamics across the MDB were selected based on the following considerations. The flow data should cover the entire analysis period from 1986 to 2011, should be without major gaps and there should roughly be at least one gauge per EH-zone. Out of the 68 selected gauges, 50 had full data coverage for the analysis period, nine covered between 20 and 25 years and nine less than 20 years, with the two most problematic ones having less than 10 years of data. Since the majority of gauges had full or nearly full data coverage, we did not interpolate any flow data and instead, only used data for the exact period of coverage of each gauge in the modeling process. All flow data were downloaded from respective government repositories [NSW Government, 2014; Government of SA, 2016; QLD Government, 2016; State Government VIC, 2016].

Driver Variables and Data Preparation
For each modeling unit within each of the three SW categories, we derived a numerical time series of SW extent in the form of the fraction of the percentage of SW area of the total area of the cloud-free modeling unit. We dropped SW extents of Landsat observations, where more than 40% of the area of a modeling unit was classified as cloud. The resulting time series of SW extent served as the dependent variable for each SW category and was modeled as a function of all driver variables, using a dynamic multivariate regression framework. In order to analyze the role of local climate conditions in SW dynamics, we used spatially explicit daily time series of P, ET, and SM as additional driver variables (hereafter referred to as local climate drivers) besides river flow, which are all known to influence the magnitude and duration of floods [S anchez-Carrillo et al., 2004;Lamontagne and Herczeg, 2009]. The selection criteria for the corresponding data sets were that they had to be spatially explicit and cover the entire analysis period. The P time series was based on interpolation of gauge records throughout Australia [BOM, 2015] and the ET time series was modeled output of the landscape component of the Australian Water Resource Assessment System (AWRA-L 4.5) continental scale water balance model [Vaze et al., 2013;Viney et al., 2014]. The SM time series was based on a combination of active and passive microwave data from multiple satellites [Liu et al., 2012;Wagner et al., 2012]. All three local climate drivers were converted into numerical daily time series by computing spatial averages on the level of the 10 3 10 km grid cells.

Statistical Modeling Approach
During the development of the basin-wide SW extent model, the original linear regression approach [Heimhuber et al., 2016] was found not to accurately capture the large variety of SW extent to Q relationships that occurred on modeling units across the MDB. Therefore, we used generalized additive models (GAM) [Wood, 2011], which allowed us to model the nonlinear relationships between SW extent and Q using a smooth function, while the remaining explanatory variables were modeled additively based on linear regression. The resulting model equation was: Where SWextent t is the SW extent (%) at time t, SWextent (t21) is the SW extent of the previous available Landsat observation (t 2 1), s is the GAM smooth function, Lag(Q) is the discharge at the related gauge lagged by the time it takes for a flood to travel from the gauge to the respective modeling cell, P is local rainfall, ET is evapotranspiration, SM is soil moisture, and e is the error term. The equation used for modeling SW extent dynamics on nonfloodplain areas is the same as equation (1) but without discharge as a predictor variable, given that those areas are not hydraulically connected to a river. For each SW category, SW extent was also modeled as a function of SW extent on the previous available satellite observation as a fixed variable, which is hereafter referred to as the lagged dependent variable. Lagged dependent variables are often used for modeling time series data in economics [Keele and Kelly, 2006;Shumway and Stoffer, 2006] and help to improve predictions when a process is known to depend on the state of the process as observed on a previous time step, which is the case for SW extent dynamics. The smooth function in the GAM models is essentially a combination of two or more polynomial curves, which are joined together at ''knots,'' with the smoothness of the curve depending on a single smoothing parameter [Wood, 2011]. Due to the large number of modeling units, it was not feasible to define this smoothing parameter individually for each model. Instead, we used a global smoothing parameter of four for all GAM models. Based on visual examination of model fits, this value led to good approximations of the variable relationship in a variety of different areas of the MDB.
The local climate drivers were modeled linearly and in the form of sums for the 16 day time period before each observation for P and ET (mm/16 days) and a moving average for the same time period for SM (%). While some variables (Q for floodplain and floodplain-lakes and P for nonfloodplain models) and the lagged dependent variable were always included in the models (hereafter referred to as fixed variables), the local climate drivers P, ET, and SM, respectively, were subject to an automated stepwise variable selection process and only included in the models if adding them led to a reduction in root mean squared error (RMSE) in fivefold cross validation (CV). The variables were added to the model in the order given in equation (1). For the nonfloodplain models, only ET and SM were subject to the automated variable selection process since P was the fixed driver for this model category, with Q dropped from these nonfloodplain models. To ensure that only meaningful empirical relationships were modeled, we only used SW extent observations >0 and set a minimum of 20 observations >0 as a threshold for fitting a model. In addition, we also masked out reservoirs with surface area of more than 5 km 2 for analysis. We quantified the performance of the statistical models based on the adjusted r 2 and RMSE in fivefold CV. The absolute and relative improvement in the model's RMSE in fivefold CV after accounting for the selected local climate drivers was used to analyze the role of these drivers in SW extent dynamics. The absolute improvement in RMSE was calculated as the difference in fivefold CV RMSE before and after accounting for the local climate drivers and by dividing this difference by the initial model RMSE (before accounting for the additional drivers), we obtained the relative improvement in RMSE. In addition, we also quantified the percentage of models in each zone, for which each local climate driver was selected. The management of time series data and all statistical modeling was implemented in R [R Development Core Team, 2008].

Quantifying Flood Travel Times
To account for the time that it takes for a flood to travel from the gauge to the modeling unit, Q was lagged by a previously quantified lag time before including it into the models. This lag time for discharge (hereafter referred to as Q lag) is the modeled timing between discharges recorded at the gauge and the corresponding remotely sensed inundation response of a downstream or upstream floodplain or floodplain-lake modeling unit. Q lags were quantified by fitting GAM spline models between SW extent on the floodplain and floodplain-lake units and Q lagged by each possible number of days between a lower limit of 220 and an upper limit of 40 days, and selecting the number of days that led to the highest r 2 . For a detailed analysis of the effect of Q lags as well as the suitability of GAM for capturing different Q to SW extent relationships, we exemplified four floodplain modeling units with different SW extent dynamics and distances to the respective modeling gauges (see Ex-A to Ex-D in Figure 1).

Predicting Surface Water Extent
One promising application of statistical SW inundation models is that they can be used to predict SW extent time steps that are not captured by the Landsat time series as a result of cloud cover. Here we used two of the four previously selected example floodplain modeling units that have contrasting inundation regimes Water Resources Research 10.1002/2016WR019858 (see Ex-A and Ex-B in Figure 1) and applied the statistical models to predict SW extents at a regular 8 day time step during the 2010/2011 La Nina Floods. These predictions were implemented using the previously fitted GAM models and the predict function in R. We validated these predictions by calculating r 2 and RMSE between predictions and Landsat observations of SW extent that were available during the selected time periods.

Example Floodplain Modeling Units
In many areas across the basin, the automated quantification of Q lags was a key step for establishing a meaningful Q to SW extent relationship, to which the GAM models were then fitted along with the other driver variables. The relationship between Q and SW extent is highly dependent on the floodplain topography [Frazier and Page, 2009] and was found to vary substantially across different floodplains in the study area (Figure 2). The four example floodplain modeling units (Ex) were located 140 (Ex-A), 85 (Ex-B), 130 (Ex-C), and 40 km (Ex-D) from their respective modeling gauges and had Q lags of 11, 5, 12, and 4 days, respectively, which equals to an average flood propagation speed of 13 km/d. It can be seen that the application of Q lags reduces the scatter in the data (comparing data points before (green) and after (black) applying Q lags in Figure 2) and reveal distinctive curves, which indicate the flows that are needed to achieve a certain SW extent on the corresponding floodplain area. In comparison to the two Paroo floodplain units (Ex-B and Ex-C), the Lower Murray floodplain unit (Ex-A) does not respond to increased river flows until a threshold of about 500 m 3 /s is reached and the floodplain starts to fill up. In the Paroo floodplain units, increasing river flow immediately leads to increasing floodplain inundation until a saturation point, above which only the highest peak flows of the 26 year analysis period leads to further increase in SW extent. It is also evident that the Q lag has a more pronounced effect on the variable relationship in the Paroo examples (Ex-B and Ex-C) as compared to the lower Murray (Ex-A) example, where there is already little scatter in the relationship before applying a lag time. The Barwon River example (Ex-D) shows a linear relationship for the entire range of recorded discharge values which was also captured well by the GAM model.

Basin-Wide Modeling
Only 2953 out of all 4120 individual floodplain modeling units exceeded the threshold of more than 20 >0 observations of SW extent that was set for fitting statistical models. For the floodplain-lake category, 332 out of 447 units were modeled and 8926 out of all 13,954 nonfloodplain units (Table 1). For illustration purposes, we averaged model metrics of individual modeling units spatially for the floodplain and floodplainlake categories based on zones that result from combining all grid cells that were modeled using the same gauge (Figures 3 and 4). For the nonfloodplain models, we summarized the individual modeling results on the basis of EH-zones ( Figure 5). Despite using a single smoothing parameter for the GAM spline function, the models were able to capture the large variety of Q to SW extent relationships occurring across the basin. The average r 2 for all floodplain models within gauge connected cells was 0.65, with better average goodness of fit in the northern (0.68) as compared to the southern (0.58) MDB (Figure 3a). Floodplain-lakes showed a similar pattern in model performance with an overall average r 2 of 0.66 for the entire MDB and r 2 of 0.7 and 0.59 for the northern and southern basin, respectively (Figure 4a). In contrast to this, the nonfloodplain models had a low total-basin r 2 of 0.25 and slightly better goodness of fit in the southern (0.28) than in the northern (0.24) basin (Figure 5a). In this category, there was a noticeable pattern with the highest r 2 predominantly occurring in the S-W part of the basin, where 27 adjacent EH-zones achieved an average r 2 of 0.33 (see highlighted EH-zones in Figure 5a).
The relative improvement in model RMSE (fivefold CV) resulting from the lagged dependent variable was by far the highest in the floodplain-lake category, with 47% improvement as compared to 22% for floodplains and 27% for nonfloodplain areas. The lagged dependent variable also led to the highest absolute RMSE improvement in the floodplain-lake category with 0.15 compared to 0.05 for floodplains and 0.005 for nonfloodplains and there was no noticeable difference between the northern and southern basin. In comparison to the improvements resulting from the lagged dependent variable, the improvements in RMSE after adding the local climate drivers were small for all three model categories. The average absolute RMSE reduction of the floodplain models across the entire basin was 0.002, with 0.0005 in the southern and Water Resources Research 10.1002/2016WR019858 0.0026 in the northern basin (compare Figure 3b). In this category, there was also a distinctive patch of 14 adjacent zones in the N-W of the basin with a particularly high average RMSE improvement of 0.0036 for 1160 individual floodplain units (see highlighted areas in Figure 3b). It is important to note here that RMSEs of floodplain models also differed across the basin before accounting for the local climate drivers (initial RMSE), with an average initial RMSE of 0.048 in the northern compared to 0.025 in the southern basin (Table 1).
To account for the different magnitude in RMSE of models prior to the variable selection process, we also calculated the relative improvement in RMSE that was achieved through the local climate drivers (Figures  3c, 4c, and 5c). We found that the local climate drivers helped to improve predictive performance of floodplain models by 5.5% in the northern compared to 2.0% in the southern basin with an overall average improvement of 4.9% (Table 1). Furthermore, there were 280 individual floodplain models where the local climate drivers led to RMSE improvement of more than 10% of which 251 were in the northern and 29 were in the southern basin. The average initial RMSE of these 280 floodplain models prior to accounting for the   local climate drivers was high with a RMSE of 0.070 compared to an overall average initial RMSE of 0.041 for all models. This illustrates that large improvement in RMSE after accounting for the local climate drivers may partially be linked to high initial model RMSEs. To understand this connection better, we regressed the absolute RMSE improvement resulting from the local climate drivers against the initial RMSEs for all individual models. These regressions revealed an average r 2 of 0.2 for the floodplain and 0.3 for the nonfloodplain category which indicates that despite a limited connection between these two metrics, high absolute improvements after accounting for the local climate drivers can be achieved for low initial model RMSEs and vice versa.
For floodplain-lakes, the average improvement in RMSE after accounting for the local climate drivers was 0.0025 across the basin and twice as high in the northern (0.0035) than in the southern (0.0019) basin. For nonfloodplains, where P was a fixed variable and only ET and SM were accounted for in the variable selection process, the average improvement was much lower as compared to the other two model categories with a reduction in RMSE of 0.00015 across the basin and 0.00007 and 0.00016 for the southern and northern basin respectively. While the distribution of r 2 -based model performance between the northern and southern basin was the opposite between the floodplain and floodplain-lake categories and the nonfloodplain category, the RMSE improvement after accounting for the local climate drivers was higher in the northern basin for all three model categories.

10.1002/2016WR019858
Based on the agreement between absolute and relative improvements in RMSE, we found the northern and N-W part of the MDB to represent a hot spot for the role of the local climate drivers in explaining SW dynamics in both the floodplain (see Figures 3b and 3c) and floodplain-lake (see Figures 4b and 4c) category. In comparison, many of the nonfloodplain areas that had the highest absolute RMSE improvements showed the lowest relative improvements after accounting for the local climate drivers (see Figures 5b and 5c). The corresponding zones contain large areas of irrigated agriculture, which inhibit the establishment of meaningful variable relationships due to the high level of artificial SW areas that are subject to irrigation management strategies. The artificially increased inundation frequency of the corresponding areas (see irrigation areas in Figures 5b and 5c) can be seen when looking at the number of >0 observations in the nonfloodplain areas (Figure 5f), which is higher in these areas compared to the surrounding areas. The resulting dynamics of these SW areas led to poor model performance and consequently high model RMSEs, which make larger absolute improvements in RMSE more likely when accounting for the local climate drivers. At the same time, the very low relative improvements in RMSE indicate that the local climate drivers did not play significant roles in the dynamics of these artificial SW bodies. Instead, the relative RMSE improvements indicate that similar to the other two model categories, the local climate drivers were most important for explaining SW dynamics on nonfloodplain areas in the N-W part of the MDB.  . Shown are zonal averages of (a) r 2 , (b) the absolute, and (c) relative improvement in RMSE after adding the local climate driver variables along with irrigation areas [BRS, 2008] and the percentage of models, for which adding (d) ET and (e) SM led to RMSE improvement. (f) The number of SW extent observations >0 of each modeling grid cell for the 26 year analysis period. For better illustration of patterns in the modeling results, the symbology was adapted individually for each result metric.

10.1002/2016WR019858
Both the absolute and relative improvements in the model's predictive performance (RMSE improvement) were based on each model's individual combination of local climate drivers as determined in the variable selection process. To get a better understanding of the role of each local climate driver, we also analyzed their individual importance in general and across different zones in the MDB (d, e, f in Figures 3 and 4 and d, e, in Figure 5). For both floodplain categories, P was the most important local climate driver and was selected for 58% of all individual floodplain models as compared to 36% for ET and 40% for SM. There was a noticeable difference in the role of P for explaining SW extent dynamics across the basin with 63% of all floodplain model units in the northern basin accounting for P compared to 48% in the southern basin. P helped to improve the predictive power of more than 70% of all floodplain models in many of the zones in the N-W of the MDB (Figure 3d) with a maximum percentage of 77% within a zone with more than 50 individual models in the zone that contains example cell Ex-A in Figure 1. The highest percentage within a zone with more than 50 individual models where SM helped to explain SW extent dynamics on floodplains was 67% which was also achieved in the same zone.
P was selected for 54% of all floodplain-lake models and ET and SM for 30% and 41%, respectively. Similar to the floodplain category, the highest percentages of models accounting for these drivers occurred in the N-W of the MDB, which represented a hot spot for these local climate drivers. For the nonfloodplain category, ET and SM were selected for about half of all models across the basin with a noticeable difference of 49% compared to 41% (ET) and 56% compared to 41% (SM) of models accounting for these drivers in the northern and southern part of the basin, respectively. These findings indicate that P played the most important role for explaining SW dynamics on floodplains and floodplain-lakes followed by SM and ET. Additionally, P was the fixed driver variable for the nonfloodplain models and the RMSE improvements resulting from the local climate drivers (ET and SM) were by far the smallest in this category which indicates that P also played the most important role in explaining SW dynamics across nonfloodplain areas. In summary, our results suggest that the local climate drivers played a more important role in SW dynamics in the northern compared to the southern basin for all three model categories, with the N-W of the MDB representing a hot spot.

Modeling the 2010/2011 La Nina Floods
The extreme dynamics during the large 2010/2011 La Nina Floods revealed the strengths and limitations of Landsat data and the statistical inundation models for quantifying SW extent dynamics during major flooding events. The very different flooding dynamics of the two example floodplain modeling units can be seen clearly with several short but intense flood bursts in the Paroo site (Ex-C in Figure 6), which is located close to the runoff generating catchment, as compared to the Lower Murray site (Ex-A in Figure 6), where the large flood lasted for over half a year (see Figure 1, for location of Ex-A and Ex-C). The Paroo example illustrates the limitation of Landsat for capturing the short and intense flooding as a result of the comparatively Figure 6. Comparison of the ability of statistical inundation models for quantifying SW extent dynamics during the 2010/2011 La Nina floods based on two example floodplain modeling units with contrasting flooding regimes (see Ex-A and Ex-C in Figure 1). Shown are the hydrographs of the modeling gauge along with observed (green bars) and statistically modeled (black bars) SW extents. For each example floodplain unit, r 2 and RMSE of the validation against observed Landsat-based SW extents during the analysis period are given.

Water Resources Research
10.1002/2016WR019858 long satellite revisit interval and missing observations resulting from cloud cover. Even though this area in the Paroo is located within overlapping satellite paths resulting in a potential time step of 8 days, there was not a single observation with less than 50% cloud cover which would allow to quantify the flooding dynamics during the 1 month period of the largest flood peak in March 2010. In both examples, the statistical inundation models were able to overcome this limitation by predicting SW extents of the missing time steps accurately.
Validation of predicted SW extents (black bars in Figure 6) against the observed Landsat-based SW extents (green bars in Figure 6) revealed an r 2 of 0.97 and a RMSE of 2% for the Paroo floodplain unit and an r 2 of 0.97 and RMSE of 4% for the Murray example. It can be seen that in the Murray example, the prediction of the maximum SW extent during the 9 month period of the flood (see 2011-03 in Figure 6a) is slightly bigger than the highest available Landsat observation in the same period. This means that there was a SW extent observation in a similar range during a different flood before the illustrated time period to which the model was trained as no predictions above the maximum satellite-observed SW extent were made. The statistical inundation models captured the contrasting flooding dynamics of the two example cells well and provided predictions of SW extent at a regular interval of 8 days. Based on the error statistics of the validation time steps, these predictions were a good approximation of the corresponding temporal flooding dynamics for both examples. Both of the example floodplain modeling units had high r 2 for their respective GAM model fits with an r 2 of 0.9 in the Paroo and 0.88 in the Murray example, which partly explains the high accuracy of the predicted validation time steps. Considering the differences in goodness of fit across the basin for all three model categories (see Figures 3a, 4a, and 5a), the accuracy of predicting missing time steps is likely to vary correspondingly for different modeling units and would likely lead to the least accurate predictions for the nonfloodplain category.

Modeling Framework
Despite the growing need for better understanding the role of periodically inundated SW areas in the terrestrial water cycle and the Earth system in general, the propagation of large floods through river systems and the corresponding SW dynamics remain poorly quantified on continental or global scales . Here we used an unprecedented Landsat time series of SW maps to quantify inundation dynamics and the role of local climate drivers across a large and highly regulated river system. We quantified these complex dynamics by building on a transferable grid-based framework [Heimhuber et al., 2016] and developed a model that comprised 68 river gauges and 18,521 individual spatial modeling units among three SW categories (i.e., floodplains, floodplain-lakes, and nonfloodplains), out of which 12,211 exceeded the threshold of 20 >0 observations of SW extent. The development of this framework required a variety of ambiguous decisions in some areas such as selecting the most suitable gauge out of multiple candidates or defining hydraulic connectivity between model gauges and grid cells on the edge of large and scattered floodplains (i.e., top right corner in Figure 1b). Additionally, despite using comparatively small modeling units, the framework was still a simplified approximation of the complex hydrological and hydraulic structure of the river system. Nevertheless, it enabled us to capture and model the spatiotemporal dynamics of large floods and their propagation through the river system at an unprecedented level of detail.
The majority of all SW areas in the MDB are floodplains and our modeling framework achieved high average goodness of fit (r 2 0.65) for the floodplain and floodplain-lake model categories. A key step for achieving good model performance across the large variety of river and floodplain locations was the quantification of Q lags. In many areas of the basin, meaningful variable relationships were revealed only after quantifying and applying these Q lag times to the river flow data prior to fitting the models (i.e., Figure 2). Although this automated quantification can lead to implausible Q lags for some individual modeling units [Heimhuber et al., 2016], the basin-wide estimation of flow travel times that this study provides is an important step toward quantifying the propagation of floods across large river systems. Furthermore, the application of Q lags allowed us to provide statistical models of SW extent for all major river and floodplain systems across the basin, which are essentially rating curves that relate flow in the river to the corresponding inundation extent response on the modeling unit (i.e., Figure 2). In an unprecedented effort to restore the health of the MDB's water-dependent ecosystems, the Australian Government is currently undertaking large investments Water Resources Research 10.1002/2016WR019858 in buying back water licenses and improving irrigation efficiencies to increase the share of water available for environmental flows in the MDB [Burke, 2007;Banks and Docker, 2014]. In the context of this effort, our whole-of-basin inundation response models can provide useful knowledge for managing the distribution of environmental flow water across the MDB holistically and more effectively [Powell et al., 2008;Chen et al., 2011;Sims et al., 2014].

Surface Water Categories
The unique dynamics of different type of SW bodies have previously been addressed by either focusing study sites entirely on lakes Wang et al., 2014;Zhang et al., 2014b;Hu et al., 2015]or floodplains [Frazier and Page, 2009;Jung et al., 2011;Sims et al., 2014], or by using large spatial modeling units and modeling all SW areas simultaneously [Huang et al., 2014]. We found that nonfloodplain areas had by far the lowest model performance (average r 2 of 0.24), which means that modeling all SW areas within a given EH-zone simultaneously would have resulted in lower average goodness of fit across the basin. The poor performance of this category may partly be owed to the fact that the corresponding SW areas were modeled as a function of P over the grid cell as the fixed driver variable instead of Q so that potentially occurring lateral inflows of SW through the ungauged local drainage network into the modeling unit were not accounted for. This category also covered all areas that were not part of the two floodplain categories and hence, comprised a wide range of natural and artificial SW bodies such as farm dams, irrigation paddies, and isolated wetland systems. Additionally, many nonfloodplain areas did not exhibit sufficient SW dynamics for establishing meaningful empirical relationships. In contrast to this, the S-W of the MDB contains numerous isolated ephemeral river and wetland systems (see blue areas outside of colored grid cells in Figure 1a) that seem to have a pronounced relationship with P and had comparatively good model performance (i.e., Figure 5a). Another major difference that we found between the three model categories was that the lagged dependent variable was clearly most important for explaining SW extent dynamics in the floodplain-lake category compared to the other two categories as a result of the slow changes in SW area that occur on lakes. This implies that modeling applications focusing on more static SW areas such as lakes and reservoirs can particularly benefit from statistical methods designed for dynamic processes such as the regression-based lagged dependent variable approach used in this study or Auto Regressive Integrated Moving Average (ARIMA) models. All these differences between the three model categories illustrate that future studies should take into account that SW areas of large and complex rivers systems can have highly variable dynamics that may require tailored modeling approaches.

Local Climate Drivers
We found that P was the most important local climate driver for explaining SW extent dynamics in all three model categories, followed by SM and ET, which may partly be the result of the different nature of the data sets (i.e., observed versus modeled) that we used for representing these drivers [Heimhuber et al., 2016]. In addition, P is arguably the main input of water to a modeling unit despite river flow, which might be another reason for the higher importance of this variable. We also quantified the combined role of the local climate drivers for SW dynamics and found that they played a more relevant role in the northern as compared to the southern MDB with a noticeable hot spot in the N-W. It is important to note that the improvements in model RMSE achieved through the local climate drivers were generally small and hardly ever exceeded 10%. Nevertheless, the model results showed distinctive spatial patterns that were in line with key site characteristics of the MDB, which are likely to have an effect on the role of local climate drivers in large-scale inundation processes.
The highest levels of river regulation are found in the mountainous areas along the S-E and eastern boundary of the basin [CSIRO, 2008;MDBA, 2010] and the adjacent zones to the west have large areas of irrigated agriculture (i.e., Figure 5b). These areas are affected by high levels of SW abstraction and diversion, which highly alters river flow and floodplain inundation regimes and may consequently inhibit the establishment of meaningful statistical relationships between the model variables. In comparison, the Paroo and Warregoo river systems in the very N-W of the MDB have little to no intervention to their natural flow regimes, are of ephemeral and, in the case of the Paroo, also of terminal nature [MDBA, 2010], which means that the drying of inundated floodplains is solely the result of infiltration and evapotranspiration. Additionally, the floodplains of these river systems are extensive, shallow, and sparsely vegetated as compared to many of the southern basin's floodplains, which are often more confined  and covered with Water Resources Research 10.1002/2016WR019858 floodplain forest. It is likely that the increased importance of the local climate drivers in the N-W of the basin is the result of a combination of these characteristics. However, we applied a simplified data-driven empirical modeling approach for quantifying complex land surface processes and further analyses are needed to reinforce these findings on the role of hydroclimatic site characteristics in SW extent dynamics. It is well known that the local climate characteristics before and during flooding such as P, ET, and SM can influence the extent and duration of large-scale inundation processes [Overton et al., 2006;World Bank, 2016] but to our best knowledge, their role in long-term SW extent dynamics has not been quantified across a large river basin to date. Future work on SW dynamics should take into account that local P and other local climate drivers can play an important role in the flooding and drying dynamics of certain SW areas in addition to lateral inflows and outflows of SW.

Quantifying Surface Water Inundation Dynamics
Traditionally, the inundation characteristics of an area have been quantified based on the statistical return period of floods. The corresponding maximum flood extents are typically obtained through hydrodynamic modeling for a range of relevant return periods (i.e., 5, 25, and 100 years) and incorporated into flood hazard maps, which provide useful and well known tools for water resources and disaster management [Heimhuber et al., 2015]. This physically based approach has been upscaled to generate return period flood hazard maps at 90 m resolution for the whole terrestrial land surface between 568S and 608N [Sampson et al., 2015]. There have been recent advances in improving the representation of complex floodplain topographies and corresponding interactions between rivers and their adjacent floodplains in large-scale hydrodynamic models [Yamazaki et al., 2011;Neal et al., 2012Neal et al., , 2015Sampson et al., 2015]. In addition, a recent study [Schumann et al., 2016] has applied a hydrodynamic model to generate a 40 year climatology of floodplain inundation dynamics across Australia and found good agreement with total inundated areas derived from a Landsat-based time series of SW extent. Nevertheless, these large-scale hydrodynamic models are still limited for accurately quantifying the complex and fine-scaled dynamics of periodically inundated and potentially shifting SW areas across large regulated river basins and over long periods of time .
In comparison to hydrodynamic modeling, satellite-based approaches typically do not provide information about the return periods of floods. Instead, SW dynamics are characterized based on composite flooding frequency maps which are derived directly from a series of classified images. Landsat-based studies have expressed inundation frequency as the number of times a given pixel was flagged as water, divided by the total number of satellite observations during a given year or the entire analysis period [Mueller et al., 2016;Tulbure et al., 2016]. One of the limitations with this approach is that flooding may be underestimated, since there are typically less cloud-free observations available during floods and peak flood extents are often missed as a result of the 16 day revisit cycle of the Landsat satellite. As an alternative, MODIS imagery has been used to create daily series of SW maps which were combined into temporally consistent composite maps, showing the number of days that every pixel was inundated during a given year Kuenzer et al., 2015]. CSIRO has combined MODIS-based SW maps with river flow data, on which frequency analysis was performed, to generate return period inundation maps for the MDB [Huang et al., 2014]. Despite their potentially achievable daily time step, MODIS-based SW time series are limited by their coarse 250 or 500 m resolution as well as their, in comparison to Landsat, lower SW classification accuracies and limitations for capturing fine-scaled water bodies and inundation patterns . It is evident, that there is currently no validated method available for quantifying SW inundation dynamics continuously at a high Landsat-like resolution on subcontinental or continental scale.
Our study partly addresses this limitation by using a Landsat-based time series of SW maps to provide models that statistically relate SW extent to river flow and other hydrological key drivers, for which long-term daily data is available. The resulting statistical models can predict SW extent for each individual floodplain and floodplain-lake modeling unit (max. 10 3 10 km) at a regular 8 or 16 day time step, which can lead to substantially improved quantification of the temporal dynamics of large floods (i.e., Figure 6) as compared to using only cloud-free satellite observations. One limitation of this approach was that estimation of predictive accuracy was based on validation against a limited number of available SW extent observations. In addition, the accuracy of predictions partly depends on the goodness of fit of each individual model, which was variable across the MDB but predominantly high for all gauge connected SW areas.

10.1002/2016WR019858
Assuming that a certain numerical SW extent always corresponds to the same distribution of SW across the modeling unit, statistical predictions of SW extent could be converted into SW maps, to generate a spatial time series with regular time step. Our modeling units, however, are still large compared to a Landsat or MODIS pixel so that conversion of numerical SW extents back into the spatial domain would be subject to a variety of uncertainties. To address this limitation, future work should evaluate the suitability of image fusion algorithms for quantifying SW dynamics [Gao et al., 2006;Zhu et al., 2010]. Image fusion takes advantage of the complementary data characteristics of different satellite sensors such as Landsat and MODIS and has previously been used to generate image time series with Landsat-like resolution and a regular time step of MODIS composite images of eight days [Hilker et al., 2009;Walker et al., 2012;Emelyanova et al., 2013;Knauer et al., 2016]. Although a few applications have applied image fusion to areas that are subject to flooding [Emelyanova et al., 2013;Zhang et al., 2014a], a comprehensive analysis of the suitability for quantifying SW extent dynamics on large river basin scale is currently missing. Therefore, future research should assess the potential of data-fusion for quantifying long-term and large-scale SW extent dynamics and compare this approach with the statistical modeling approach presented here.

Conclusions
We integrated 26 year long time series of Landsat-derived SW maps, P, ET, and SM with flow data from 68 river gauges to model SW extent dynamics and the role of drivers holistically across the MDB, and quantified flood propagation times for all of the basin's key river and floodplain sites. GAM models captured the large variety of nonlinear empirical Q to SW extent relationships across the basin well and the lagged dependent variable led to large improvements in model performance, particularly for the floodplain-lake category, which exhibits the slowest changes in SW extents. Categorizing SW areas was another crucial step for achieving high goodness of fit of statistical inundation models, which was much higher for gaugeconnected SW areas (i.e., floodplains and floodplain-lakes) than for the remaining SW areas (nonfloodplain areas) and better in the northern compared to the southern MDB. The local climate drivers were also more important for SW dynamics in the northern basin with the N-W representing a hot spot. These N-S gradients that our models revealed were in good accordance with key basin characteristics such as lower levels of river regulation, vaster and shallower floodplains and higher climate variability in the northern as compared to the southern MDB. Our study illustrates that integrating multidecadal time series of earth observation data with statistical modeling techniques can provide cost-effective tools for improving the management of limited SW resources and better understanding of long-term inundation dynamics and respective drivers. The data-driven modeling framework is applicable to other large and complex river basins across the world and provides statistical models that can predict SW extent for cloud-affected Landsat observations or during the peak of floods and hence, allows a more detailed quantification of the spatiotemporal inundation dynamics of large floods as compared to existing approaches.