Volume 59, Issue 2 e2022WR032312
Research Article
Open Access

Insights From Dayflow: A Historical Streamflow Reanalysis Dataset for the Conterminous United States

Ganesh R. Ghimire

Corresponding Author

Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA

Correspondence to: G. R. Ghimire

Contribution: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Visualization

Carly Hansen

Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA

Contribution: Methodology, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing

Sudershan Gangrade

Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA

Contribution: Methodology, Validation, Writing - original draft, Writing - review & editing

Shih-Chieh Kao

Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA

Contribution: Conceptualization, Methodology, Software, Validation, Resources, Writing - original draft, Writing - review & editing, Supervision, Funding acquisition

Peter E. Thornton

Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA

Contribution: Validation, Writing - original draft, Writing - review & editing

Debjani Singh

Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA

Contribution: Data curation, Writing - original draft, Writing - review & editing

First published: 17 January 2023

Abstract

Reconstructed historical streamflow time series can supplement limited streamflow gauge observations. However, typical modeling approaches face common challenges: process-based hydrologic models can be data- and computation-intensive, and statistics-based models can be region- or stream-specific. Here we present a nationally scalable modeling framework that integrates simulated runoff from the Variable Infiltration Capacity (VIC) model with the Routing Application for Parallel computatIon of Discharge (RAPID) routing model, leveraging high-performance computing. We demonstrate an efficient method of assimilating streamflow at US Geological Survey (USGS) streamflow monitoring sites using a simple hierarchical approach in the VIC-RAPID framework. The result is a reconstructed 36-year (1980–2015) daily and monthly streamflow dataset (Dayflow) at ∼2.7 million NHDPlusV2 stream reaches in the conterminous US (CONUS). We perform a comprehensive evaluation at 7,526 USGS sites and characterize their error statistics. For the daily naturalized streamflow, 49% of the USGS sites show Kling–Gupta Efficiency (KGE) > 0.5 and 58% show percentage bias within ±20%. Streamflow data assimilation across CONUS yields an overall improvement over naturalized streamflow, notably in the western semiarid-to-arid regions. Comparison with other national and global streamflow reanalysis datasets, such as the National Water Model and the Global Reach-scale A priori Discharge Estimates for SWOT, demonstrates improved KGE and reduced bias and points to directions for Dayflow improvements. Investigations of error statistics against key hydrologic, hydroclimatic, and geomorphologic basin characteristics reveal region-specific patterns that may help improve future applications of the framework. Overall, Dayflow may enable a better understanding of hydrologic conditions in a changing environment, especially in locations not currently represented by streamflow monitoring networks.

Key Points

  • Conterminous US-scale daily streamflow reanalysis data (Dayflow) for 36 years (1980–2015) are reported and evaluated against observations

  • Hierarchical observational data substitution shows an overall improvement in terms of several error metrics with region-specific patterns

  • Streamflow prediction errors show region-specific patterns with key basin characteristics

1 Introduction

Accurate historical streamflow data are key information to support long-term water resources planning, ecological system conservation, and flood resilience and mitigation. Regional- to continental-scale streamflow assessments are particularly desired to evaluate changes in streamflow trend and variability (Biederman et al., 2015; Pagano & Garen, 2005; Sadeghi et al., 2019), timing (Ryberg et al., 2020), low flow occurrence (Dudley et al., 2020), human influence (Forbes et al., 2019), hydroclimatic drivers (Rumsey et al., 2020), permafrost melting (Walvoord & Striegl, 2007), irrigation pumping (Kustu et al., 2010), flood hazard (Alfieri et al., 2015, 2017; Wobus et al., 2021), and impacts on fish communities (Gido et al., 2010). While gauge observations may provide the most reliable streamflow estimates, they are limited in both space and time (Fekete & Vörösmarty, 2007), leaving a significant data gap in ungauged basins (Lin et al., 2019; Sivapalan, 2003).

In the United States, streamflow observations are mostly provided by the US Geological Survey (USGS) National Water Information System (NWIS; USGS, 2021). Although NWIS includes over 24,000 streamflow gauges, most of the ∼2.7 million river reaches in the medium-resolution National Hydrography Dataset (NHD) remain unmonitored. Furthermore, the number of active streamflow gauges continues to decline: at the end of water year 2021, NWIS had only ∼8,400 active streamflow gauges, with inconsistent record lengths. While other streamflow observations may be available through state or local agencies, these monitoring efforts are generally shorter-term and less consistent than the observations provided by NWIS.

Given this data challenge, Blöschl et al. (2019) highlighted the need for new technologies at all scales to offset the continuing decline in streamflow measurements (Hannah et al., 2011). Hydrologic modeling is often employed to supplement limited streamflow observations. However, representing streamflow through either process- or statistics-based models can be challenging at large scales because of diverse land surface and hydrologic processes, high computational cost, and a lack of sufficient data for model calibration and validation. Several earlier approaches used statistical methods to provide coarser-resolution streamflow estimates (e.g., annual and monthly averages) across the conterminous US (CONUS). For instance, Vogel et al. (1999) derived regional regression formulas to estimate annual average streamflow based on climatic and geomorphological characteristics. The current version of NHDPlus (NHDPlusV2.1) uses the Enhanced Unit Runoff Method, which combines unit runoff from a water balance model, regression-based adjustments, reference gauge observations, and an accounting process, to estimate monthly and annual average streamflow (McKay et al., 2012). While these data-driven approaches can provide good first-order flow information, more accurate and finer-resolution estimates are desired for broader applications.

Meanwhile, process-based land surface models (LSMs) such as the Sacramento Soil Moisture Accounting Model (Burnash et al., 1973), Noah-MP (Niu et al., 2011), the Unified Land Model (Livneh et al., 2011), and Variable Infiltration Capacity (VIC; Liang et al., 1994) have been used to simulate daily or sub-daily streamflow at regional, continental, and global scales. For instance, VIC has been widely used to study streamflow response to changing climate conditions (Catalán et al., 2016; Hamlet & Lettenmaier, 2005; Hamlet et al., 2007; Liu et al., 2013; Marx et al., 2017; Naz et al., 2016; Raymond et al., 2013; Safeeq et al., 2014), droughts (Shukla & Wood, 2008), hydropower production (Hamlet et al., 2010; Kao et al., 2015), and floods (Bates et al., 2021; Porter, 2021). Coupling an LSM with a high-resolution river routing scheme may further provide refined streamflow estimates at the reach scale. For instance, Tavakoly et al. (2017) coupled VIC with the Routing Application for Parallel computatIon of Discharge (RAPID) routing model (David, 2019; David et al., 2011) to simulate daily streamflow time series over 10 years (2005–2014) at ∼1.2 million NHDPlusV2 river reaches in the Mississippi River Basin. A continental-scale hindcast study by Salas et al. (2018) for the summer of 2015 indicated that such a framework shows promise for reach-scale flood forecasting. Lin et al. (2019) extended this framework to the global scale to reconstruct historical daily streamflow (GRADES; Princeton-ReachHydro, 2022) over a 36-year (1979–2014) period at ∼2.94 million river reaches derived from MERIT Hydro (Yamazaki et al., 2019). The US National Water Center provides a CONUS-scale streamflow reanalysis (1979–2020) over the ∼2.7 million NHDPlusV2 river reaches using the operational National Water Model (NWM), which employs the WRF-Hydro model with Noah-MP and the Muskingum–Cunge river routing scheme.
These historical streamflow reanalysis products, generated by process-based models and supported by modern high-performance computing (HPC), have created new opportunities to provide richer streamflow information. However, process-based models alone do not guarantee better accuracy. In many cases, errors that accumulate during routing can bias the results in ways that the numerical models alone cannot fully remedy. Data assimilation with observed streamflow is therefore still needed to improve the accuracy of streamflow estimates.

A comprehensive evaluation of large-scale high-resolution streamflow reanalyses has not yet been reported, except for a few at regional scales (e.g., Hansen et al., 2019; Jachens et al., 2021; Karki et al., 2021; Rojas et al., 2020). Using a lumped hydrologic modeling framework, Newman et al. (2015) provided a well-calibrated Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) streamflow reanalysis dataset for 671 small-to-medium headwater basins across the CONUS, but such an approach cannot provide flow estimates for all river segments in a basin. Using grid-based river networks, some studies provided large-scale streamflow reanalyses (e.g., Alfieri et al., 2020; Harrigan et al., 2020), in which streamflow is routed through flow direction grids derived from a digital elevation model (DEM) (Mizukami et al., 2021; Yamazaki et al., 2009). However, because of insufficient spatial resolution, grid-based river networks often fail to accurately represent actual streams and small-scale processes. Integrating features such as lakes, reservoirs, and gauges into a gridded river network requires special attention to avoid misplacement in the model domain (Tavakoly et al., 2017; Wood et al., 2011). Vector-based river networks represent the real river system more efficiently and reliably than high-resolution gridded river networks because they employ physics-based spatial discretization (Lehner & Grill, 2013; Mizukami et al., 2021). Regardless of the modeling approach, and despite the availability of HPC, exhaustive calibration and validation efforts still depend largely on assimilated or naturalized streamflow (Safeeq et al., 2014). Therefore, a large-scale evaluation may provide insights and help identify sources of potential uncertainty. For instance, Wenger et al. (2010) raised questions about VIC's ability to predict high and low flows during summer and in regions with strong groundwater interaction, while Rakovec et al. (2019) indicated VIC's limitations in capturing the terrestrial water cycle. Further, efforts to derive insights and attribute uncertainties in streamflow estimates across a wide range of scales and hydroclimatic regions, including for the NWM reanalysis, have not been comprehensively reported.

In this study, we demonstrate an efficient large-scale integration of VIC and RAPID models within a framework that assimilates streamflow observations across the NHDPlusV2 river network. The result is a long-term historical streamflow reanalysis dataset over the CONUS at the daily scale, referred to as ‘Dayflow’. We also characterize the Dayflow prediction errors based on hydroclimatic, hydrologic, and geomorphologic basin characteristics and suggest directions for future model improvement. Dayflow data are provided as daily time series at ∼2.7 million NHDPlusV2 river reaches with accompanying observational data and model performance metrics. We expect that Dayflow may provide useful benchmarks for comparisons with other large-scale and long-term model outputs and also enable a better understanding of potential streamflow response to changing hydrologic conditions, especially for ungauged locations.

2 Materials and Methods

2.1 Computational Domain

This study builds on Oubeidillah et al. (2014) and Naz et al. (2016) to simulate streamflow along the NHDPlusV2 river network across the CONUS. The effort provides daily streamflow time series for 36 years (1980–2015) at ∼2.7 million NHDPlusV2 river reaches (median local drainage area = 1.38 km²) in the US (Figure 1). In addition, we include ∼10,000 river reaches in British Columbia, Canada, from Forbes et al. (2019) in the RAPID river routing. We select the vector-based RAPID model rather than conventional grid-based routing models (e.g., Lohmann et al., 1998; Naz et al., 2018) because RAPID may simulate streamflow more accurately and efficiently across a wide variety of watershed sizes (Figures 1b–1e), especially for smaller or irregular watersheds that cannot be represented by rectangular grids. The selection of gauge stations is discussed in Section 2.3.3.


Figure 1. Computational domain: (a) National Water Information System streamflow gauges along the NHDPlusV2 river network; only streams with Strahler order > 2 are shown, and the labels denote the HUC2 hydrologic region numbers (01–18) in the conterminous US. Panels (b) and (c) show distributions of stream order and catchment area, respectively, as a percentage of the number of reaches in NHDPlusV2. Panels (d) and (e) show distributions (y-axis as counts) of basin scales and historical record lengths, respectively, at US Geological Survey monitoring sites. The dashed red line in each distribution plot marks the median.

2.2 Input Data

The CONUS-scale hydrologic modeling effort in the VIC-RAPID framework requires several inputs (see Table S1 of the Supporting Information [SI] for a detailed list), including meteorological forcing (e.g., daily precipitation, daily maximum and minimum temperature, and wind speed), elevation, land cover, soil features, and vegetation. We use the Daymet (Thornton et al., 2016a, 2016b, 2021) meteorological forcing, which is available at a daily time step and 1 km spatial resolution from 1980 to the present for continental North America, Hawaii, and Puerto Rico, making it well suited for daily streamflow simulation across the CONUS. Daymet interpolates and extrapolates ground-based meteorological observations through statistical modeling, with improved handling of observation-timing biases and high-elevation temperature biases (Thornton et al., 2021). Further efforts are underway to temporally disaggregate Daymet to a sub-daily scale to improve the representation of rainfall extremes (Kao, Thornton, et al., 2022), but such an effort is yet to be included in this initial Dayflow framework.

The 1 km spatial resolution CONUS-SOIL dataset (Miller & White, 1998a, 1998b) provides high-resolution soil information important for infiltration and baseflow partitioning. Vegetation information, which is crucial for quantifying evapotranspiration, is obtained from the 1 km resolution MODIS leaf area index dataset (Myneni, Knyazikhin, & Park, 2015a, 2015b). The USGS National Elevation Dataset (USGS, 2018a, 2018b) provides seamless elevation and topography inputs for representing snow accumulation and snowmelt processes in the VIC simulation. These diverse data inputs have been merged and resampled to 1/24° (∼4 km) spatial resolution for the CONUS-scale VIC simulation and the subsequent RAPID routing. The input data also include observed runoff and streamflow for model calibration and evaluation. Apart from NWIS, the USGS WaterWatch monthly runoff dataset (Brakebill, Wolock, & Terziotti, 2011a, 2011b), which represents streamflow per unit area for each US HUC8 hydrologic subbasin, is used to calibrate VIC parameters (Naz et al., 2016; Oubeidillah et al., 2014). All VIC-related data inputs are described in further detail by Oubeidillah et al. (2014). We use the NWIS streamflow observations in Figure 1 to assimilate and evaluate Dayflow at the corresponding NHDPlusV2 river reaches.

2.3 Modeling Framework

Figure 2 presents a graphical summary of the proposed VIC-RAPID modeling framework. Within this framework, the purpose of VIC is to simulate the overall runoff outputs from each watershed and the purpose of RAPID is to simulate the river routing process through the river network. Consequently, the calibration of VIC focused on preserving the monthly water balance while the calibration of RAPID focused on preserving the daily streamflow performance. Each step is described further in the subsequent sections. The final output is the Dayflow dataset that provides the CONUS-scale historical streamflow reanalysis at the daily time step. Additionally, the post-modeling analysis provides a characterization of the underlying prediction errors.


Figure 2. Graphical description of the Variable Infiltration Capacity (VIC)-Routing Application for Parallel computatIon of Discharge (RAPID) model simulation and evaluation framework.

2.3.1 VIC Runoff Simulation and Calibration

The VIC model simulates land surface processes, including evaporation, transpiration, infiltration, and snow accumulation and melt, to produce hydrologic outputs such as total runoff (i.e., the sum of baseflow and surface runoff). VIC simulation requires a large number of soil, vegetation, and elevation parameters in addition to meteorological forcings. To reduce the dimensionality of the calibration problem, Oubeidillah et al. (2014) used sensitivity analysis following Demaria et al. (2007) to narrow the calibration parameters down to five: the variable infiltration curve parameter (b_infilt), the soil layer 2 thickness (thick2), the exponent of the Brooks–Corey drainage equation (exp), the fraction of the maximum baseflow velocity at which nonlinear baseflow begins (Ds), and the fraction of maximum soil moisture at which nonlinear baseflow occurs (Ws). Oubeidillah et al. (2014) used 1980 as the spin-up period, 1981–2000 as the calibration period, and 2001–2008 as the validation period. Calibration was performed independently for each HUC8 subbasin across the CONUS using an HPC-enabled multi-site calibration approach, which allows model performance to be refined uniformly across all subbasins. This process produced a CONUS-scale VIC parameter dataset at the HUC8 subbasin scale, which is used to simulate national-scale daily total runoff at 4 km resolution during 1980–2015.

While the HPC-enabled calibration represented an intensive effort to improve CONUS-scale VIC performance, many challenges remain given the complex hydrologic conditions across the nation. For instance, the WaterWatch runoff benchmark may be less reliable in HUC8s with few gauges, in arid regions, or in regions with highly regulated river flows. The HUC8-based calibration may also create sharp parameter discontinuities across HUC8 boundaries, which could be addressed by a seamless large-domain regionalized parameter calibration strategy such as that of Mizukami et al. (2017). Since the main focus of this study is streamflow routing and assimilation, we only acknowledge these limitations of the adopted VIC parameter set here. Using alternative parameter sets or different land surface models can be the topic of future studies.

2.3.2 RAPID Streamflow Routing and Calibration

We routed the simulated VIC total runoff through the NHDPlusV2 river network using the RAPID routing model (David, 2019; David et al., 2011). RAPID is designed for large-scale flow routing and has demonstrated skill in several large-scale applications (e.g., Lin et al., 2019; Salas et al., 2018; Snow et al., 2016; Tavakoly et al., 2017). Here we adopted the RAPID model, which uses the Muskingum method (McCarthy, 1939) with parameters K and x. This approach enhances the computational efficiency of routing along large river networks such as NHDPlusV2 (David et al., 2011; Tavakoly et al., 2017).

For RAPID implementation, the Muskingum method is cast in a vector-matrix formulation as shown in Equation 1.

$$(\mathbf{I} - \mathbf{C}_1\mathbf{N})\,\mathbf{Q}(t+\Delta t) = \mathbf{C}_1\mathbf{Q}^e(t) + \mathbf{C}_2\left[\mathbf{N}\mathbf{Q}(t) + \mathbf{Q}^e(t)\right] + \mathbf{C}_3\mathbf{Q}(t) \qquad (1)$$

where I is the identity matrix; N is the river network connectivity matrix (N_ij = 1 if reach j flows directly into reach i, and 0 otherwise); C1, C2, and C3 are diagonal parameter matrices (see Equations 2–4); Q is the vector of reach outflows; Q^e is the vector of lateral inflows to each reach, held constant over each time step; t is time; and Δt is the flow routing time step.
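$$C_{1j} = \frac{\Delta t/2 - K_j x_j}{K_j(1 - x_j) + \Delta t/2} \qquad (2)$$

$$C_{2j} = \frac{\Delta t/2 + K_j x_j}{K_j(1 - x_j) + \Delta t/2} \qquad (3)$$

$$C_{3j} = \frac{K_j(1 - x_j) - \Delta t/2}{K_j(1 - x_j) + \Delta t/2} \qquad (4)$$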
Here, j is the river reach index, K_j is a storage constant (with units of time), and x_j is a dimensionless weighting factor describing the relative influence of inflow on storage. The scheme is numerically stable regardless of the values of K and Δt as long as 0 ≤ x ≤ 0.5 (Cunge, 1969; David et al., 2011). A value of x = 0 corresponds to maximum attenuation, whereas x = 0.5 represents no flow attenuation. The parameter K_j is analogous to the travel time of flow through river reach j of length L_j (Cunge, 1969; David et al., 2011). If the flow travels with wave celerity c_j, K_j can be obtained from Equation 5.
$$K_j = \frac{L_j}{c_j} \qquad (5)$$
Calibrating K_j and x_j separately for every reach in the network is computationally demanding, even though Equation 5 allows K_j to be computed theoretically. Therefore, we restrict the calibration to two parameters, k and x, such that
(6)
We calibrated the RAPID parameters k and x to minimize a squared-error cost function between Dayflow and streamflow observations at multiple NWIS stream gauge sites, given by
(7)
where the terms of the cost function are the simulated Dayflow, the daily observed streamflow, and the average of the daily observed streamflow; n is the number of gauges used for the calibration; and t_0 and t_f are the first and last days of the calibration period, respectively. One could compute the flow wave celerity from streamflow observations at upstream and downstream stations using lagged cross-correlation (David et al., 2011) or from a terrain-based parameterization (Thober et al., 2019). However, for the CONUS-scale implementation, we adopted a uniform wave celerity of 0.28 m/s (i.e., 1 km/h), which results in initial estimates of k = 0.131 and x = 0.258. We constrain parameter k to lie between 0 and 1. In our setup, we route flow through the network from the headwater streams to a stream gauge with complete streamflow records (i.e., an assimilation site), evaluate the objective function (Equation 7), and calibrate the parameters k and x. We used Δt = 900 s for these simulations.
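The vector-matrix update in Equations 1–5 can be illustrated with a small NumPy sketch. It is not RAPID's own implementation and does not include the k/x calibration; it assumes a hypothetical three-reach network in which reaches 0 and 1 drain into reach 2, illustrative reach lengths and lateral inflows, the uniform 1 km/h celerity and Δt = 900 s quoted above, and x = 0.258 applied to every reach, with lateral inflow held constant in time.

```python
import numpy as np


def muskingum_coefficients(K, x, dt):
    """Per-reach Muskingum coefficients C1, C2, C3 (Equations 2-4)."""
    K = np.asarray(K, dtype=float)
    x = np.asarray(x, dtype=float)
    den = K * (1.0 - x) + dt / 2.0
    c1 = (dt / 2.0 - K * x) / den
    c2 = (dt / 2.0 + K * x) / den
    c3 = (K * (1.0 - x) - dt / 2.0) / den
    return c1, c2, c3


def route(N, Qe, K, x, dt, n_steps, Q0=None):
    """Step Equation 1 forward n_steps times with constant lateral inflow Qe."""
    n = N.shape[0]
    c1, c2, c3 = muskingum_coefficients(K, x, dt)
    # Left-hand side (I - C1 N); c1[:, None] * N equals diag(C1) @ N.
    A = np.eye(n) - c1[:, None] * N
    Q = np.zeros(n) if Q0 is None else np.asarray(Q0, dtype=float).copy()
    for _ in range(n_steps):
        rhs = c1 * Qe + c2 * (N @ Q + Qe) + c3 * Q
        Q = np.linalg.solve(A, rhs)
    return Q


# Hypothetical network: N[i, j] = 1 if reach j flows directly into reach i.
N = np.array([[0, 0, 0],
              [0, 0, 0],
              [1, 1, 0]], dtype=float)
celerity = 1000.0 / 3600.0                    # 1 km/h, about 0.28 m/s
lengths = np.array([3000.0, 2000.0, 5000.0])  # reach lengths L_j in m (illustrative)
K = lengths / celerity                        # Equation 5, in seconds
x = np.full(3, 0.258)
dt = 900.0
Qe = np.array([1.0, 2.0, 0.5])                # lateral inflows in m3/s (illustrative)

Q = route(N, Qe, K, x, dt, n_steps=2000)
print(K)  # travel times of about 10800, 7200, and 18000 s
print(Q)  # approaches [1.0, 2.0, 3.5] at steady state
```

Because C1 + C2 + C3 = 1 for every reach, Equation 1 reduces at steady state to (I − N)Q = Q^e, so each reach's outflow equals its own lateral inflow plus everything entering from upstream; the outlet value of 3.5 m³/s in the sketch is exactly the sum of the three lateral inflows.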

Figure 3 shows the spatial variability of the calibrated Muskingum parameters for the RAPID model. As previous studies (e.g., David et al., 2011; Lin et al., 2019; Tavakoly et al., 2017) demonstrated, x shows much smaller spatial variability across the CONUS. In contrast, parameter k shows much larger spatial variability, mainly owing to the variability of reach length, and is more responsible for the performance of streamflow routing. Note that both parameters are temporally constant in this setup. The RAPID parameter dataset (k, x) presented here could serve as a catalog for future large-scale RAPID simulations along the NHDPlusV2 network.


Figure 3. Spatial variability of Muskingum routing parameters across the NHDPlusV2 river network: (a) k-parameter and (b) x-parameter.

2.3.3 Processing of Historical Streamflow Observations

The simulation and evaluation of Dayflow require streamflow observations. We first identified 13,521 NWIS gauges with at least some daily streamflow observations during 1980–2015. We then identified 7,526 NWIS gauges that (a) have at least 25% of daily streamflow records flagged as “approved for publication” by USGS, (b) have average annual streamflow greater than 0.15 m³/s (5 ft³/s), and (c) are on the NHDPlusV2 river network. Excluding gauges with lower flows (<0.15 m³/s or 5 ft³/s), which generally monitor smaller basins, is necessary because our current daily meteorological forcings and modeling framework (with runoff simulated on a 4 km grid) cannot reasonably simulate runoff and streamflow at very small spatial and temporal scales. For instance, flash flood events in small headwater basins may be difficult to capture with daily meteorological forcings. Overall, 2,934 gauges had complete daily streamflow records during 1980–2015 (also see Figure 1). We used these complete gauges as streamflow data assimilation points (discussed in the next section) and the other gauges for evaluation only. For the 2,934 assimilation gauges, we also generated the simulated streamflow prior to data assimilation to allow a fair evaluation of Dayflow performance at these locations. These gauges represent a broad range of basin sizes, from 0.03 km² (canal/ditch) to 2.915 million km², with a median of about 600 km² (Figure 1d).
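The three selection criteria and the split into assimilation and evaluation-only gauges can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the `GaugeSummary` record, its fields, and the site IDs are hypothetical; criterion (a) is interpreted as the approved fraction of the days a site actually reported during 1980–2015; and a record counts as complete only if it has a value for every day of that period.

```python
from dataclasses import dataclass
from datetime import date

# Number of days in the 1980-2015 study period (inclusive).
PERIOD_DAYS = (date(2015, 12, 31) - date(1980, 1, 1)).days + 1


@dataclass
class GaugeSummary:
    """Hypothetical per-gauge summary of NWIS daily values for 1980-2015."""
    site_no: str
    n_days: int                  # days with a daily value
    n_approved: int              # days flagged "approved for publication"
    mean_annual_flow_cms: float  # average annual streamflow, m3/s
    on_nhdplus: bool             # gauge lies on the NHDPlusV2 network


def classify_gauges(gauges, min_approved_frac=0.25, min_flow_cms=0.15):
    """Apply criteria (a)-(c); split survivors into assimilation/evaluation-only."""
    assimilation, evaluation_only = [], []
    for g in gauges:
        if g.n_days == 0 or not g.on_nhdplus:            # criterion (c)
            continue
        if g.n_approved / g.n_days < min_approved_frac:  # criterion (a)
            continue
        if g.mean_annual_flow_cms <= min_flow_cms:       # criterion (b)
            continue
        if g.n_days == PERIOD_DAYS:                      # complete record
            assimilation.append(g.site_no)
        else:
            evaluation_only.append(g.site_no)
    return assimilation, evaluation_only


gauges = [
    GaugeSummary("A", PERIOD_DAYS, PERIOD_DAYS, 10.0, True),  # complete record
    GaugeSummary("B", 5000, 4000, 2.0, True),                 # partial record
    GaugeSummary("C", 5000, 1000, 2.0, True),                 # too few approved
    GaugeSummary("D", PERIOD_DAYS, PERIOD_DAYS, 0.1, True),   # flow too low
    GaugeSummary("E", 5000, 5000, 3.0, False),                # off the network
]
print(PERIOD_DAYS)              # 13149
print(classify_gauges(gauges))  # (['A'], ['B'])
```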

We further indexed these 7,526 gauges to the NHDPlusV2 network. USGS-USEPA (2015) had indexed 28,164 USGS monitoring sites to the network. To complement this dataset and index additional gauges missing from it, we first assigned each gauge to the nearest river reach. We then compared the drainage areas of each NWIS gauge and its assigned NHDPlusV2 river segment to identify erroneous linkages. We also manually adjusted some river network linkages to ensure that the overall routing results were reasonable.
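The nearest-reach assignment followed by a drainage-area check can be sketched as below. This is an illustrative simplification, not the procedure's actual code: each reach is represented by a single point rather than its line geometry, the COMIDs and drainage areas are hypothetical, and the 20% relative drainage-area tolerance is an assumed value used only to flag linkages for manual review.

```python
import math


def index_gauge(gauge_xy, gauge_da_km2, reaches, max_rel_da_diff=0.2):
    """Link a gauge to the nearest reach and flag drainage-area mismatches.

    reaches: iterable of (comid, (x, y), drainage_area_km2), where (x, y) is a
    representative point of each reach in projected coordinates (meters).
    Returns (comid, rel_da_diff, needs_review).
    """
    comid, _, reach_da = min(reaches, key=lambda r: math.dist(gauge_xy, r[1]))
    rel_da_diff = abs(reach_da - gauge_da_km2) / gauge_da_km2
    return comid, rel_da_diff, rel_da_diff > max_rel_da_diff


reaches = [
    (101, (0.0, 0.0), 500.0),   # hypothetical COMIDs and drainage areas (km2)
    (102, (100.0, 0.0), 50.0),
]
print(index_gauge((90.0, 5.0), 55.0, reaches))   # nearest is 102; areas agree
print(index_gauge((90.0, 5.0), 480.0, reaches))  # nearest is 102; flagged for review
```

The second call shows the failure mode the drainage-area comparison is meant to catch: a gauge on a large river that happens to plot closest to a small tributary.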

2.3.4 Hierarchical Observational Data Assimilation

The VIC-RAPID modeling framework allows observed streamflow to be assimilated by updating the RAPID-routed streamflow at any river reach with complete historical streamflow observations. Here, we route streamflow through the river network starting from the headwater streams until it reaches a gauge with complete streamflow records. Sites 1, 2, and 5 in Figure 4 illustrate such locations. We refer to them as assimilation points because this substitution can be viewed as the simplest form of data assimilation, one that assumes streamflow observations are free of measurement uncertainty. Sites 3 and 4 have partial streamflow records and are used for evaluation only (Figure 4a). In this illustration, the assimilated flow from sites 1 and 2 is routed downstream, together with lateral runoff contributions (light blue subbasins in Figure 4c), until it reaches the next assimilation point, site 5. We then evaluate the simulated RAPID streamflow at sites 3, 4, and 5 and calibrate the RAPID routing parameters. This procedure continues until the routed streamflow arrives at the outlet of the river system. For comparison, we also perform a separate VIC-RAPID simulation without data assimilation. Its output is referred to as naturalized streamflow, which is conceptually similar to the streamflow reanalysis outputs produced by many other modeling studies that account only for natural processes and do not reflect withdrawals or other complex human interactions.
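The substitution logic can be sketched as a single upstream-to-downstream pass over the network. To keep the focus on the hierarchy, this hypothetical sketch replaces RAPID's Muskingum routing with instantaneous routing (no lag or attenuation); the reach names, lateral inflows, and observations are illustrative. As described above, the simulated flow at each assimilation reach is kept before substitution so that it can still be evaluated.

```python
from graphlib import TopologicalSorter


def hierarchical_substitution(downstream, lateral, observed):
    """Route flow downstream, replacing simulated flow at assimilation reaches.

    downstream: dict reach -> downstream reach (None at the outlet)
    lateral:    dict reach -> list of lateral inflows, one per time step
    observed:   dict reach -> list of observed flows (complete records only)
    Returns (flow, pre_assimilation): final flows per reach, and the simulated
    flow at each assimilation reach before substitution (kept for evaluation).
    """
    upstream = {r: [] for r in downstream}
    for r, d in downstream.items():
        if d is not None:
            upstream[d].append(r)
    n_t = len(next(iter(lateral.values())))
    flow, pre_assimilation = {}, {}
    # static_order() yields each reach only after all of its upstream reaches.
    for r in TopologicalSorter(upstream).static_order():
        # Simplification: instantaneous routing (no Muskingum lag/attenuation).
        simulated = [lateral[r][t] + sum(flow[u][t] for u in upstream[r])
                     for t in range(n_t)]
        if r in observed:
            pre_assimilation[r] = simulated
            flow[r] = list(observed[r])  # substitute the observations
        else:
            flow[r] = simulated
    return flow, pre_assimilation


# Hypothetical network: a and b drain into c, c drains into outlet d.
downstream = {"a": "c", "b": "c", "c": "d", "d": None}
lateral = {"a": [1.0, 1.0], "b": [2.0, 2.0], "c": [0.5, 0.5], "d": [0.25, 0.25]}
observed = {"a": [3.0, 3.0], "d": [10.0, 10.0]}  # complete-record gauges
flow, pre = hierarchical_substitution(downstream, lateral, observed)
print(flow["c"])  # [5.5, 5.5]: observed flow at a + simulated b + lateral c
print(pre["d"])   # [5.75, 5.75]: simulated flow at d, used for evaluation
print(flow["d"])  # [10.0, 10.0]: substituted observations passed downstream
```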


Figure 4. Demonstration of the hierarchical data assimilation (substitution) approach in the river network: (a) classification of National Water Information System gauges as either assimilation gauges (i.e., with complete records) or evaluation-only gauges (i.e., with partial records), (b) substitution of streamflow observations at assimilation gauges (1 and 2), (c) routing of the assimilated flow (from pink subbasins) downstream together with runoff from the light blue subbasins, and calculation of model performance metrics at evaluation-only gauges (3 and 4), and (d) assimilation of the streamflow observations at downstream gauge 5.

We recognize that more sophisticated assimilation approaches may be preferable to hierarchical data assimilation, which propagates information only in the downstream direction of the river network. For instance, one may perform quantile-based bias correction to incorporate gauges with partial records. Similarly, one may insert reservoir or water withdrawal/return modules to better address human influence on streamflow regulation. One could go a step further and implement more robust assimilation approaches such as the ensemble Kalman filter (e.g., Moradkhani et al., 2005), the particle filter (e.g., Abbaszadeh et al., 2018), or the integration of deterministic four-dimensional variational (4DVAR) assimilation with the particle filter (Abbaszadeh et al., 2019). However, to demonstrate the value of data assimilation, even in its simplest form, for a CONUS-scale implementation through Dayflow, we opt for hierarchical observational data assimilation in this initial study.

2.4 Dayflow Evaluation

Evaluating the streamflow simulations against observations indicates both their predictive skill and their suitability for reuse. We computed several standard performance metrics at the NWIS gauges selected in Section 2.3.3 (Figure 1) at both daily and monthly scales. Among these error metrics, the key ones reflecting different streamflow characteristics are the Kling–Gupta Efficiency (KGE), normalized mean absolute error (nMAE), percentage bias (Pbias), and percentage peak difference (PPD). Refer to Appendix A for details on their computation.
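The exact definitions used for Dayflow are given in Appendix A, which is not reproduced here. As a rough guide, the four metrics can be sketched with NumPy as follows; KGE uses its standard correlation, variability-ratio, and mean-ratio components, while the Pbias sign convention, the nMAE normalization by mean observed flow, and the PPD form are assumptions that may differ from Appendix A.

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta Efficiency from correlation, variability ratio, and mean ratio."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]  # correlation (timing)
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # mean ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def pbias(sim, obs):
    """Percentage bias; positive means overestimation (assumed sign convention)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return 100.0 * (sim.sum() - obs.sum()) / obs.sum()


def nmae(sim, obs):
    """Mean absolute error normalized by mean observed flow (assumed normalization)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return np.abs(sim - obs).mean() / obs.mean()


def ppd(sim, obs):
    """Percentage difference between simulated and observed peaks (assumed form)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return 100.0 * (sim.max() - obs.max()) / obs.max()


obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(kge(2.0 * obs, obs))                  # r = 1, alpha = beta = 2: about -0.414
print(pbias(1.1 * obs, obs))                # about 10
print(nmae(obs + 1.0, obs))                 # about 0.333
print(ppd([1.0, 2.0, 3.0, 4.0, 4.0], obs))  # peak 4 vs 5: about -20
```

Negative PPD values in this form indicate underestimated peaks, which is the tendency reported for both naturalized and assimilated Dayflow.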

2.5 Hydrologic, Hydroclimatic, and Geomorphological Basin Characteristics

To identify potential regional patterns in the error metrics, we investigate their relationships with hydrologic, hydroclimatic, and geomorphological basin characteristics from the GAGES dataset (Falcone et al., 2010). GAGES is a comprehensive national effort characterizing basin characteristics for 6,785 NWIS gauges. Table 1 lists 24 key basin characteristics from the GAGES dataset that could be most relevant (e.g., Wu et al., 2021) to our analysis. The distributions of these characteristics across the 6,785 USGS sites are mostly positively skewed (see Figure 5). Comparing the distributional behaviors of the basin characteristics and the Dayflow error metrics may provide further insight into their interrelationships, which we discuss further in Section 3.4.

Table 1. Key Hydrologic, Hydroclimatic, and Geomorphological Characteristics Used in This Study
| Category | Basin characteristic | Acronym |
|---|---|---|
| Basin ID | Drainage area [km²] | DA |
| Basin classification | Hydrologic disturbance index | HDI |
| Census block | Population density | PD |
| Climate | Mean annual precipitation [cm] | MAP |
| Climate | Max. monthly precipitation [cm] | MXMP |
| Climate | Min. monthly precipitation [cm] | MIMP |
| Climate | Mean annual temperature [°C] | MAT |
| Climate | Max. monthly temperature [°C] | MXMT |
| Climate | Min. monthly temperature [°C] | MIMT |
| Climate | Mean annual PET [mm/year] | MAPET |
| Climate | Snow percent precipitation | SPP |
| Climate | Precipitation seasonality index | PSI |
| Hydrology | Stream density | SD |
| Hydrology | Baseflow index | BFI |
| Hydrology | Topographic wetness index | TWI |
| Hydrology | Annual runoff (30 years) [mm/year] | AR30 |
| Hydrology | Annual runoff (50 years) [mm/year] | AR50 |
| Hydrology modified by dams | Number of dams | ND |
| Hydrology modified by dams | Dam density | DD |
| Infrastructure | % Impervious | PI |
| Basin land cover | % Urban | PU |
| Land cover change | % Change urban | PCU |
| Topography | Mean elevation [m] | ME |
| Topography | % Slope | PS |

Figure 5. Distributions of key hydrologic, hydroclimatic, and geomorphologic characteristics of the selected National Water Information System gauges.

3 Results and Discussion

3.1 Comparison Between Naturalized Streamflow and Assimilated Streamflow

In Figure 6, we present the distributions of four error metrics for both naturalized and assimilated daily streamflow from the VIC-RAPID model. Focusing on KGE, which reflects overall performance, two key messages emerge. First, the KGE distribution is negatively skewed, with ∼92% of gauges showing KGE above the mean-flow benchmark (KGE ≈ −0.41; Knoben et al., 2019), indicating that the model performs better than the mean observed streamflow at those gauges. About 49% of the NWIS gauges show KGE > 0.5 for naturalized streamflow (Figure 6), with some regional patterns (Figure S1 of Supporting Information S1). Likewise, about 58% of the sites show Pbias within ±20%. These results indicate the robustness of the simulated streamflow (Towner et al., 2019) across the CONUS and its promise for water resource applications. Second, streamflow assimilation in the VIC-RAPID framework yields an overall improvement in performance. In terms of KGE, about 34% of the gauges improve over the naturalized streamflow, 9% deteriorate, and 57% remain unaffected after assimilation. The breakdown of KGE components (Figures S1 and S2 of Supporting Information S1) shows that assimilation yields similar improvements in correlation (∼32%), variance ratio (∼29%), and mean ratio (∼31%). The slightly larger improvement in correlation suggests that assimilation mainly improves streamflow timing, which contributes to the overall improvement of the reanalysis. The nMAE, Pbias, and PPD show very similar patterns of improvement (Figure 6). Both naturalized and assimilated streamflow tend to underestimate flow peaks, and the underestimation is more apparent in smaller basins (Figure S3 of Supporting Information S1).
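The −0.41 benchmark follows from evaluating KGE for a simulation that always equals the observed mean flow. Following the argument of Knoben et al. (2019), such a constant simulation has σs = 0, so the variance ratio α = σs/σo = 0 and the mean ratio β = μs/μo = 1, while the correlation, undefined for a constant series, is taken as r = 0:

```latex
\mathrm{KGE}_{\bar{Q}} = 1 - \sqrt{(0-1)^2 + (0-1)^2 + (1-1)^2} = 1 - \sqrt{2} \approx -0.41
```

Any gauge with KGE above this value therefore outperforms the mean-flow benchmark.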


Figure 6. Distributions of key error metrics: Kling–Gupta Efficiency (KGE), normalized mean absolute error (nMAE) [10−2 m3/s/km2], percentage bias (Pbias) [%], and percentage peak difference (PPD) [%]. The first row compares the empirical cumulative distribution functions, and the second row compares the histograms of naturalized and assimilated streamflow from Dayflow for each error metric. The broken blue and red lines correspond to the median values of the error metrics for naturalized and assimilated Dayflow, respectively.

Although flow assimilation improves performance overall (mean ΔKGE = 0.15), the improvement is not uniform across the CONUS. Improved KGE is observed throughout the country, but with distinct regional patterns (Figure 7). Flow assimilation generally improves performance in river reaches with substantial streamflow regulation or alteration (e.g., directly downstream of major dams). We also find notable improvements in the upper Missouri, upper Arkansas-White-Red, Rio Grande, Upper Colorado, and Lower Colorado regions. These are mostly arid regions with thin vegetation cover and limited stream gauge coverage (see Table 2), where streamflow generation is primarily governed by Hortonian (infiltration-excess) overland flow (Wu et al., 2021). The large gains from assimilation in these regions suggest that the VIC model's representation of such physical processes in runoff generation could be improved there (Oubeidillah et al., 2014; Safeeq et al., 2014).
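The improved, deteriorated, and unaffected fractions and the mean ΔKGE reported above can be tallied from per-gauge KGE pairs, as sketched below. The tolerance used to call a gauge "unaffected" is not stated in this section; the `tol=0.01` default and the function name are hypothetical.

```python
from statistics import fmean


def summarize_delta_kge(kge_assim, kge_nat, tol=0.01):
    """Tally per-gauge dKGE = KGE_assimilated - KGE_naturalized.

    Gauges with |dKGE| <= tol count as unaffected (hypothetical rule).
    """
    if len(kge_assim) != len(kge_nat) or not kge_nat:
        raise ValueError("need equal-length, non-empty KGE lists")
    deltas = [a - n for a, n in zip(kge_assim, kge_nat)]
    n = len(deltas)
    improved = sum(d > tol for d in deltas)       # booleans sum as 0/1
    deteriorated = sum(d < -tol for d in deltas)
    return {
        "mean_delta": fmean(deltas),
        "improved": improved / n,
        "deteriorated": deteriorated / n,
        "unaffected": (n - improved - deteriorated) / n,
    }


if __name__ == "__main__":
    # Hypothetical KGE values at four gauges
    print(summarize_delta_kge([0.7, 0.5, 0.3, 0.6], [0.5, 0.5, 0.4, 0.6]))
```

A small nonzero tolerance keeps gauges whose KGE changes only by floating-point noise (e.g., gauges with no upstream assimilation) in the unaffected group.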


Figure 7. Spatial distribution of the difference in Kling–Gupta Efficiency (KGE) between the assimilated and naturalized streamflow across NWIS stream gauges. The KGE difference is computed as ΔKGE = KGE(assimilated) − KGE(naturalized). The plot in the bottom left corner shows the empirical CDF of ΔKGE, with the blue area depicting better performance of the assimilated streamflow and the red area depicting better performance of the naturalized streamflow.

Table 2. Summary of Median Error Metrics Across HUC2 Hydrologic Regions
Hydrologic region (HUC2) | Assimilation gauge density (10−3) | Median drainage area [km2] | Median KGE | Median nMAE [10−2 m3/s/km2] | Median Pbias [%] | Median PPD [%]
New England (01) | 0.71 | 234 | 0.66 (0.66) | 0.96 (0.96) | −3.89 (−4.54) | −36.42 (−39.49)
Mid Atlantic (02) | 1.25 | 188 | 0.60 (0.57) | 0.84 (0.91) | −1.11 (−3.15) | −48.10 (−53.77)
South Atlantic-Gulf (03) | 0.32 | 407 | 0.57 (0.55) | 0.71 (0.74) | 3.19 (4.67) | −42.95 (−46.37)
Great Lakes (04) | 0.20 | 488 | 0.59 (0.56) | 0.53 (0.58) | 0.12 (0.58) | −22.69 (−29.29)
Ohio (05) | 0.50 | 581 | 0.62 (0.59) | 0.85 (0.88) | −1.60 (−2.13) | −38.95 (−44.27)
Tennessee (06) | 0.25 | 369 | 0.64 (0.62) | 0.94 (0.96) | 0.45 (−0.64) | −42.39 (−48.33)
Upper Mississippi (07) | 0.40 | 925 | 0.57 (0.55) | 0.50 (0.52) | 0.79 (1.19) | −33.27 (−40.48)
Lower Mississippi (08) | 0.05 | 988 | 0.59 (0.58) | 1.05 (1.05) | 2.62 (2.55) | −37.15 (−39.66)
Souris-Red-Rainy (09) | 0.17 | 2,115 | 0.50 (0.41) | 0.19 (0.20) | 0.12 (1.60) | −11.47 (−18.32)
Missouri (10) | 0.21 | 1,466 | 0.38 (0.30) | 0.16 (0.18) | 0.83 (1.87) | −36.23 (−44.55)
Arkansas-White-Red (11) | 0.24 | 1,554 | 0.49 (0.38) | 0.31 (0.43) | 1.61 (4.81) | −36.72 (−46.50)
Texas-Gulf (12) | 0.23 | 896 | 0.36 (0.28) | 0.38 (0.45) | 9.68 (18.60) | −49.85 (−59.79)
Rio Grande (13) | 0.08 | 2,688 | 0.41 (−0.20) | 0.04 (0.10) | 14.23 (75.70) | −11.80 (−8.94)
Upper Colorado (14) | 0.35 | 444 | 0.43 (0.38) | 0.41 (0.44) | 14.52 (24.84) | −3.08 (2.09)
Lower Colorado (15) | 0.13 | 2,877 | 0.23 (0.06) | 0.06 (0.08) | 13.31 (22.41) | −43.80 (−63.08)
Great Basin (16) | 0.19 | 543 | 0.32 (0.25) | 0.25 (0.35) | 9.42 (20.41) | −15.82 (−25.03)
Pacific Northwest (17) | 0.33 | 592 | 0.58 (0.51) | 0.96 (1.09) | 6.89 (10.37) | −21.46 (−25.34)
California (18) | 0.35 | 487 | 0.43 (0.37) | 0.47 (0.55) | 7.12 (9.02) | −31.15 (−35.46)
  • Note. Values are for assimilated Dayflow; values in parentheses are for naturalized Dayflow.

Table 2 summarizes the median error metrics by HUC2 hydrologic region; virtually all regions show improvement with assimilation. In particular, the table highlights the potential of data assimilation in the more arid HUC2 regions. For example, the Rio Grande region, in addition to being arid, is highly regulated, and its assimilation gauges are mostly concentrated along the main stem of the river. Note that the improvement across HUC2 regions is not necessarily governed by the fraction or density of assimilation gauges (i.e., these show a weaker correlation with ΔKGE). Rather, it is governed by the fraction of the upstream drainage area that the assimilation gauges monitor or by the downstream channel distance between the assimilation point and the evaluation gauge (e.g., Krajewski et al., 2020, 2021), as is the case in the Rio Grande region. As an illustration, the simulated hydrographs in Figure S4a of Supporting Information S1 show that the assimilated streamflow captures the observed streamflow much better than the naturalized streamflow. In that example, most of the assimilation gauges are located along the main stem of the James River and therefore represent larger upstream monitored area fractions, hence the improved performance of the assimilated streamflow. In contrast, the hydrographs in Figure S4b of Supporting Information S1 show performance deterioration after flow assimilation, primarily because of the smaller upstream monitored area fraction and the flow regulation between the evaluation (downstream) and assimilation (upstream) gauges.
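The paragraph above attributes assimilation gains to the fraction of upstream drainage area monitored by assimilation gauges. The sketch below computes such a fraction on a toy river network. It counts only the most downstream assimilation gauge on each upstream branch, so that nested drainage areas are not double counted. The network representation (a `downstream` mapping of reach IDs), the function names, and the nesting rule are illustrative assumptions, not the paper's procedure.

```python
def reaches_between(start, target, downstream):
    """Reaches strictly between start and target along the downstream path.

    Returns None if target is not downstream of start. Assumes an acyclic,
    tree-like network where downstream[reach] is the next reach (None at the outlet).
    """
    between = []
    current = downstream.get(start)
    while current is not None:
        if current == target:
            return between
        between.append(current)
        current = downstream.get(current)
    return None


def monitored_area_fraction(eval_reach, assim_reaches, downstream, area):
    """Fraction of eval_reach's drainage area monitored by upstream assimilation gauges."""
    assim = set(assim_reaches)
    if eval_reach in assim:
        return 1.0
    monitored = 0.0
    for reach in assim:
        between = reaches_between(reach, eval_reach, downstream)
        if between is None:
            continue  # not upstream of the evaluation gauge
        if any(r in assim for r in between):
            continue  # nested: a gauge further downstream already covers this area
        monitored += area[reach]
    return monitored / area[eval_reach]


if __name__ == "__main__":
    # Hypothetical toy network: 1 -> 3, 2 -> 3, 3 -> 5, 4 -> 5, 5 -> 6 (outlet)
    downstream = {1: 3, 2: 3, 3: 5, 4: 5, 5: 6, 6: None}
    area = {1: 100.0, 2: 50.0, 3: 200.0, 4: 120.0, 5: 400.0, 6: 450.0}  # km2
    print(monitored_area_fraction(6, [1, 3, 4], downstream, area))
```

Here gauge 1 is skipped for evaluation reach 6 because gauge 3, further downstream on the same branch, already includes its drainage area.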

3.2 Comparison Across Streamflow Timescales

Streamflow aggregated over longer timescales (e.g., monthly) is often used for long-term water resources planning, which depends on seasonal rather than day-to-day variability. Therefore, in addition to the daily evaluation, we also evaluate monthly Dayflow across the CONUS. The KGE distribution of monthly Dayflow (Figure 8) shows a systematic shift to the right relative to daily Dayflow for both naturalized and assimilated streamflow. This comparison also indicates that the left tail (poorly performing sites) generally improves less than the right tail (well-performing sites). The overall improvement at the monthly time step is similar for naturalized (median ΔKGE = 0.09) and assimilated (median ΔKGE = 0.07) streamflow. There are, however, many gauges in the arid regions where monthly streamflow does not improve, particularly for naturalized flow (Figure 8a). For both naturalized and assimilated streamflow, monthly Dayflow generally performs better across the central-to-eastern US. This result is consistent with previous findings (e.g., Ghimire et al., 2020; Lin et al., 2019) showing improved performance at longer timescales.
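The daily-to-monthly comparison can be sketched as aggregating each daily series to calendar-month means, computing KGE at both scales, and taking ΔKGE = KGE(monthly) − KGE(daily). Using monthly means rather than monthly totals, and assuming gap-free daily records, are choices made for this illustration; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from math import sqrt
from statistics import fmean, pstdev


def monthly_means(dates, flows):
    """Average daily flows within each (year, month); assumes gap-free daily records."""
    groups = defaultdict(list)
    for d, q in zip(dates, flows):
        groups[(d.year, d.month)].append(q)
    return {key: fmean(values) for key, values in sorted(groups.items())}


def kge(sim, obs):
    """Kling-Gupta Efficiency from correlation, variance ratio, and mean ratio."""
    ms, mo = fmean(sim), fmean(obs)
    ss, so = pstdev(sim), pstdev(obs)
    if so == 0 or ss == 0 or mo == 0:
        raise ValueError("KGE undefined for constant series or zero observed mean")
    r = fmean([(s - ms) * (o - mo) for s, o in zip(sim, obs)]) / (ss * so)
    return 1.0 - sqrt((r - 1.0) ** 2 + (ss / so - 1.0) ** 2 + (ms / mo - 1.0) ** 2)


def delta_kge_monthly_minus_daily(dates, sim, obs):
    """KGE of monthly means minus KGE of the daily series."""
    daily = kge(sim, obs)
    sim_m = list(monthly_means(dates, sim).values())  # same (year, month) order
    obs_m = list(monthly_means(dates, obs).values())
    return kge(sim_m, obs_m) - daily


if __name__ == "__main__":
    from math import pi, sin

    start = date(2001, 1, 1)
    dates = [start + timedelta(days=i) for i in range(730)]
    obs = [10 + 5 * sin(2 * pi * i / 365) for i in range(730)]
    # Hypothetical simulation: 5-day lag plus alternating daily noise
    sim = [10 + 5 * sin(2 * pi * (i - 5) / 365) + (1 if i % 2 else -1) for i in range(730)]
    print(delta_kge_monthly_minus_daily(dates, sim, obs))
```

In this synthetic case, monthly averaging cancels most of the day-to-day noise, which illustrates why KGE tends to rise at longer timescales.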


Figure 8. Spatial distribution of the difference in Kling–Gupta Efficiency (KGE) between monthly and daily timescales: (a) without data assimilation (i.e., naturalized streamflow) and (b) with data assimilation. The KGE difference is computed as ΔKGE = KGE(monthly) − KGE(daily). The inset in the bottom left corner depicts the distribution of KGE for daily and monthly Dayflow.

3.3 Comparison Between Dayflow and NWM Streamflow Reanalysis

The NWM streamflow reanalysis is the most relevant CONUS-scale high-resolution streamflow simulation for comparison. The latest version, NWM V2.1 (NOAA, 2021), made notable upgrades over NWM V1.2 (NOAA, 2021), such as snow model updates for snow-dominant systems, improved representation of lakes and reservoirs, and the use of a significantly larger set of calibration basins. NWM V2.1 is forced by the Office of Water Prediction Analysis of Record dataset. Because the NWM V2.1 reanalysis (NOAA, 2021) covers 1979–2020, we use the common 1980–2015 period for comparison. Because the NWM V2.1 streamflow reanalysis is provided at an hourly time step, we aggregate it to the daily time step for comparison. The difference in KGE (i.e., ΔKGE = KGE(Dayflow) − KGE(NWM)) shows better performance of Dayflow for both assimilated (Figure 9a) and naturalized (Figure 9b) streamflow. The overall improvement is more notable for assimilated streamflow. Note that NWM V2.1 in its current form does not assimilate streamflow, which explains the larger improvement of the assimilated over the naturalized streamflow. For instance, the median ΔKGE for assimilated streamflow is 0.07 (Figure 9a), with ∼60% of NWIS gauges showing improved performance. The KGE difference for Dayflow relative to NWM V1.2 shows a regional pattern similar to that relative to NWM V2.1 (Figure S5 of Supporting Information S1), with particular improvement in the semiarid-to-arid regions. Note that NWM V2.1 (see Figure 9) generally performs better than NWM V1.2 (see Figure S5 of Supporting Information S1) across the CONUS.
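Aggregating the hourly NWM V2.1 reanalysis to daily values before computing KGE could look like the sketch below. The calendar-day boundary taken directly from the timestamps (for example, UTC) and the rule that a day needs all 24 hourly values are illustrative assumptions; this section does not specify how partial days or time zones were handled.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def hourly_to_daily(timestamps, flows, min_hours=24):
    """Average hourly flows by calendar day, keeping only days with >= min_hours values.

    The day boundary follows the timestamps as given; the completeness rule is an
    illustrative assumption.
    """
    by_day = defaultdict(list)
    for t, q in zip(timestamps, flows):
        by_day[t.date()].append(q)
    return {
        day: fmean(values)
        for day, values in sorted(by_day.items())
        if len(values) >= min_hours  # drop incomplete days
    }


if __name__ == "__main__":
    t0 = datetime(2010, 6, 1)
    stamps = [t0 + timedelta(hours=h) for h in range(51)]  # two full days plus 3 hours
    flows = [2.0] * 24 + [float(h) for h in range(24)] + [9.0] * 3
    print(hourly_to_daily(stamps, flows))
```

Dropping incomplete days avoids biasing daily means toward whichever hours happen to be present.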


Figure 9. Distribution of the Kling–Gupta Efficiency (KGE) difference (ΔKGE) between Dayflow and National Water Model V2.1 streamflow reanalysis data. The KGE difference is computed as ΔKGE = KGE(Dayflow) − KGE(NWM V2.1). (a1) Spatial distribution of ΔKGE, (a2) histogram of ΔKGE, and (a3) empirical CDF of ΔKGE for assimilated streamflow. (b1) Spatial distribution of ΔKGE, (b2) histogram of ΔKGE, and (b3) empirical CDF of ΔKGE for naturalized streamflow. The red dotted line corresponds to the median ΔKGE.

Most regions in the central-to-eastern US show a roughly balanced distribution of positive and negative ΔKGE, which is also apparent from the similar median error metrics in Table 3 and Table S2 of Supporting Information S1. The NWM, particularly V2.1, employs an improved lake/reservoir routing module, while Dayflow (Figure 9a) assimilates observed flow in the RAPID routing model, which could explain their similar performance. Dayflow shows a marked improvement across the central-to-western US (Table 3 and Table S2 of Supporting Information S1; Figure 9a and Figure S4a of Supporting Information S1). Dayflow performs consistently better in arid regions such as the upper Missouri, Arkansas-White-Red, Texas-Gulf, Rio Grande, Upper Colorado, Lower Colorado, and Great Basin (also see Wu et al., 2021). It is important to note that NWM V2.1 shows an overall improvement across the CONUS over NWM V1.2 (Table 3 and Table S2 of Supporting Information S1), particularly in snow-dominant systems. The comparison of naturalized streamflow to NWM estimates (Figure 9b and Figure S5 of Supporting Information S1) also shows improved performance, but on average the improvement is minimal: about 52% of NWIS gauges show an increase in KGE for naturalized streamflow over NWM V2.1 estimates. Interestingly, both naturalized (62% of gauges) and assimilated (65% of gauges) streamflow show improved Pbias across all hydrologic regions. Overall, the improvement could be attributed to flow assimilation, better representation of infiltration-excess overland flow, and better meteorological forcings, among other factors, particularly in the semiarid-to-arid regions.
We stress that, despite the VIC-RAPID framework's better overall performance relative to NWM (see an example in Figure S4 of Supporting Information S1), further investigation is needed into how physical process representation could be improved in these regions, given the framework's low performance without assimilation, including its representation of peak flows (note PPD in Table 3 and Table S2 of Supporting Information S1). For instance, forcing the VIC model with daily Daymet precipitation might not adequately represent hydrologic processes in smaller watersheds, such as flash floods that typically occur at hourly to sub-daily timescales; this may explain the lower peak flow performance of Dayflow relative to NWM.

Table 3. Summary of Median Error Metrics Across HUC2 Hydrologic Regions
Hydrologic region (HUC2) | Median KGE | Median nMAE [10−2 m3/s/km2] | Median Pbias [%] | Median PPD [%]
New England (01) | 0.66 (0.66) | 0.96 (0.88) | −3.89 (−10.62) | −36.42 (−11.78)
Mid Atlantic (02) | 0.60 (0.60) | 0.84 (0.73) | −1.11 (−6.33) | −48.10 (−13.50)
South Atlantic-Gulf (03) | 0.57 (0.56) | 0.71 (0.67) | 3.19 (−3.61) | −42.95 (−6.61)
Great Lakes (04) | 0.59 (0.52) | 0.53 (0.57) | 0.12 (−5.27) | −22.69 (0.47)
Ohio (05) | 0.62 (0.62) | 0.85 (0.77) | −1.60 (−8.19) | −38.95 (−13.02)
Tennessee (06) | 0.64 (0.71) | 0.94 (0.83) | 0.45 (−5.03) | −42.39 (−14.65)
Upper Mississippi (07) | 0.57 (0.60) | 0.50 (0.47) | 0.79 (2.71) | −33.27 (−6.57)
Lower Mississippi (08) | 0.59 (0.61) | 1.05 (0.85) | 2.62 (−10.96) | −37.15 (−5.34)
Souris-Red-Rainy (09) | 0.50 (0.42) | 0.19 (0.19) | 0.12 (17.48) | −11.47 (−0.81)
Missouri (10) | 0.38 (0.25) | 0.16 (0.24) | 0.83 (15.73) | −36.23 (0.10)
Arkansas-White-Red (11) | 0.49 (0.33) | 0.31 (0.39) | 1.61 (12.89) | −36.72 (−11.71)
Texas-Gulf (12) | 0.36 (0.28) | 0.38 (0.37) | 9.68 (24.36) | −49.85 (−26.11)
Rio Grande (13) | 0.41 (−1.61) | 0.04 (0.21) | 14.23 (213.88) | −11.80 (43.38)
Upper Colorado (14) | 0.43 (0.31) | 0.41 (0.49) | 14.52 (24.39) | −3.08 (1.94)
Lower Colorado (15) | 0.23 (−0.76) | 0.06 (0.13) | 13.31 (129.33) | −43.80 (2.60)
Great Basin (16) | 0.32 (0.07) | 0.25 (0.48) | 9.42 (45.80) | −15.82 (45.16)
Pacific Northwest (17) | 0.58 (0.55) | 0.96 (0.99) | 6.89 (9.75) | −21.46 (13.31)
California (18) | 0.43 (−0.02) | 0.47 (0.74) | 7.12 (60.72) | −31.15 (14.59)
  • Note. Values are for the assimilated Dayflow; values in parentheses are for the National Water Model (NWM V2.1) streamflow reanalysis.

The relationship between ΔKGE (i.e., KGE(Dayflow) − KGE(NWM V2.1)) and basin scale reveals a weak but non-negligible dependence on basin size (see Figure S7 of Supporting Information S1). The dependence is stronger for assimilated than for naturalized streamflow, which is expected because streamflow assimilation at gauges with larger drainage areas tends to improve streamflow predictability more than at other gauges.

In addition, we compute the differences in annual average streamflow across the ∼2.7 million NHDPlusV2 river reaches. Figure 10 presents the percent difference in annual average streamflow between Dayflow and NWM V2.1 (see Figure S6 of Supporting Information S1 for NWM V1.2). The median difference is about 35%, suggesting that Dayflow generally has a larger runoff volume than NWM V2.1. If one considers the NWIS gauges (Figure 10c) as a small subset of the river network, a much larger number of gauges show underestimation of annual average streamflow by NWM V2.1 than by Dayflow. The variability of these differences is considerably smaller for NWM V2.1 (Figure 10c) than for NWM V1.2 (Figure S6c of Supporting Information S1). We note that the number of evaluation gauges is small compared to the total number of river reaches in the NHDPlusV2 network. Indeed, most evaluation gauges show a smaller difference from observed annual average streamflow for Dayflow, which explains the overall reduction in MAE and Pbias for Dayflow. This connection helps explain the general pattern across the river network in all hydrologic regions. Most notable is the spatial signature across the semiarid-to-arid regions highlighted earlier: river reaches in these regions with a larger percentage difference in annual average streamflow between Dayflow and NWM generally show better Dayflow performance.
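The reach-level comparison in Figure 10 can be sketched as a relative difference of annual average flows followed by a median across reaches. Normalizing by NWM (so positive values mean Dayflow is larger) and skipping reaches with zero NWM flow are assumptions for this illustration; this section does not give the exact formula, and the function name is hypothetical.

```python
from statistics import median


def percent_difference(dayflow_mean, nwm_mean):
    """100 * (Dayflow - NWM) / NWM per reach; reaches with zero NWM flow are skipped."""
    diffs = []
    for d, n in zip(dayflow_mean, nwm_mean):
        if n == 0:
            continue  # relative difference undefined (e.g., dry reaches)
        diffs.append(100.0 * (d - n) / n)
    return diffs


if __name__ == "__main__":
    # Hypothetical annual average flows [m3/s] at three reaches
    diffs = percent_difference([1.35, 2.0, 0.5], [1.0, 2.0, 0.0])
    print(diffs, median(diffs))
```

The median is used because relative differences on low-flow reaches can be very large and would dominate a mean.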


Figure 10. Characterization of annual average streamflow differences across the NHDPlusV2 river network. (a) Percentage difference of annual average streamflow between assimilated Dayflow and National Water Model (NWM) V2.1, (b) distribution of the percentage difference in (a), and (c) two-dimensional scatter density plot of percentage difference of NWM V2.1 and Dayflow relative to corresponding National Water Information System gauge observations, wherever available, in x- and y-axes, respectively. The color code in (b) is similar to that in (a). In (c), the color code represents the point density with the dark blue color being the largest.

3.4 Comparison Between Dayflow and GRADES Streamflow Reanalysis

Global Reach-scale A priori Discharge Estimates for SWOT (GRADES; Lin et al., 2019; Princeton-ReachHydro, 2022) is a global streamflow reanalysis product that uses the same VIC-RAPID modeling framework with Multi-Source Weighted-Ensemble Precipitation (MSWEP) forcing (0.1°, 3-hourly precipitation; Beck et al., 2019). The product provides 1980–2013 streamflow across ∼2.94 million global river reaches (median local drainage area in the US = 107 km2) extracted from the ∼90 m MERIT-Hydro DEM (Yamazaki et al., 2019). It uses a machine learning-derived global runoff characteristics map (Beck et al., 2015) for grid-by-grid calibration and bias correction of the simulated runoff (Lin et al., 2019). Its scale and modeling framework make it highly relevant for comparison with Dayflow. After filtering out gauges that could not be indexed to the river network, 6,323 USGS gauges are common to Dayflow and GRADES. The empirical distributions of KGE and Pbias (Figure 11) show that assimilated streamflow from Dayflow substantially outperforms GRADES. As observed before, the improvement in Dayflow performance across the CONUS shows a pronounced regional pattern (Figure S8 of Supporting Information S1). The improved performance of Dayflow illustrates the value of high-resolution modeling setups (e.g., input DEM, VIC simulation grids) and calibration strategies for large-scale streamflow reanalysis. Naturalized streamflow from Dayflow shows a similar pattern of improvement relative to GRADES (Figure S9 of Supporting Information S1).


Figure 11. Comparison of Kling–Gupta Efficiency and Pbias distributions between assimilated streamflow from Dayflow and Global Reach-scale A priori Discharge Estimates for SWOT (GRADES; Lin et al., 2019; Princeton-ReachHydro, 2022). The first row compares the empirical CDFs, and the second row compares the histograms. The broken blue and red lines correspond to the median values of the error metrics for Dayflow and GRADES, respectively. Naturalized streamflow from Dayflow shows a similar pattern (Figure S9 of Supporting Information S1).

3.5 Relationships Between Basin Characteristics and Streamflow Prediction Errors

Correlation analysis between key hydrologic, hydroclimatic, and geomorphological basin characteristics and the error metrics can provide insights into potential sources of streamflow prediction errors. Figure 12 shows the Pearson correlation matrix between the basin characteristics and key error metrics. We present only statistically significant correlations (at the 5% significance level), and even these vary in strength. For instance, the correlation between the mean absolute error (MAE) and the number of dams (ND) is the largest of all (Figure 12 and Figure S10 of Supporting Information S1). Here, we focus on KGE and nMAE. The larger prediction errors at evaluation gauges with more dams emphasize the need to improve how the VIC-RAPID framework represents regulated flows. Downstream flow assimilation clearly improves streamflow predictions (also see Figure 9). However, there is an apparent need to improve the representation of lake processes and reservoir routing, for example, by incorporating reservoir modules.
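Each cell of Figure 12 pairs one error metric with one basin characteristic across gauges. The sketch below computes Pearson's r and a two-sided p-value using the Fisher z-transformation, a normal approximation chosen here to stay within the Python standard library. The paper's exact significance test is not stated in this section, and an exact t-test (e.g., `scipy.stats.pearsonr`) would differ slightly for small samples. The function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist, fmean, pstdev


def pearson_r(x, y):
    """Pearson correlation using population moments."""
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return fmean([(a - mx) * (b - my) for a, b in zip(x, y)]) / (sx * sy)


def correlation_test(x, y, alpha=0.05):
    """Return (r, p, significant) via the Fisher z approximation (needs n >= 4)."""
    if len(x) != len(y) or len(x) < 4:
        raise ValueError("need equal-length samples with at least 4 pairs")
    r = pearson_r(x, y)
    if abs(r) >= 1.0:
        return r, 0.0, True  # perfect (or rounding-limited) linear relation
    z = atanh(r) * sqrt(len(x) - 3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return r, p, p < alpha


if __name__ == "__main__":
    # Hypothetical per-gauge values: mean annual precipitation [cm] vs. KGE
    map_cm = [40.0, 55.0, 70.0, 90.0, 110.0, 130.0]
    kge = [0.10, 0.25, 0.35, 0.50, 0.55, 0.62]
    print(correlation_test(map_cm, kge))
```

Cells whose `significant` flag is false would be left empty, as in Figure 12.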


Figure 12. Pearson's correlation matrix between streamflow error metrics for assimilated streamflow and the key basin characteristics at the evaluation gauges. Refer to Table 1 for the abbreviations of the basin characteristics. The color bar on the right represents the correlation among variables. Pies represent the magnitude of correlation. Empty cells correspond to the statistically insignificant relationships at a 5% significance level. The additional error metrics presented here are Nash–Sutcliffe Efficiency (NSE; Nash & Sutcliffe, 1970), Timing (timing of the hydrographs that maximize the cross-correlation), and Tp (timing of the peak).

Results discussed in previous sections showed that there are regional patterns of prediction errors. With that recognition, we explore the dominant regional relationships for KGE and nMAE. Mean annual precipitation (MAP), average annual runoff from 50 years (AR50), and stream density (SD) show the largest correlation with KGE while nMAE shows the largest correlation with MAP, AR50, and percent slope of basins (PS). Figure 13 presents the geographical distribution of these relationships.


Figure 13. Spatial variability of prediction errors (demonstrated by the error metrics) and key basin characteristics (represented by dot sizes). The left column shows the variability of Kling–Gupta Efficiency with (a) mean annual precipitation (MAP), (c) average annual runoff from 50 years (AR50), and (e) stream density (SD). The right column shows the variability of normalized mean absolute error [10−2 m3/s/km2] with (b) mean annual precipitation (MAP), (d) average annual runoff from 50 years (AR50), and (f) percent slope of basins (PS).

There is a strong regional pattern in the relationship of KGE with MAP and AR50 (Figure 13). The central-to-eastern US and the Pacific Northwest, which are characterized by humid to very humid climates and dense vegetation cover, receive relatively more precipitation and hence produce more runoff. Streamflow generation in these regions is generally governed by saturation-excess overland flow, groundwater flow, and lateral preferential flow (Wu et al., 2021). These regions generally show larger KGE, indicating that the VIC-RAPID modeling framework represents the saturation-excess rainfall-runoff process better than the infiltration-excess process typical of the arid regions. For the arid regions, finer temporal resolution of the precipitation forcing in the VIC-RAPID framework could provide a better representation of streamflow dynamics. The distribution of SD across the US is more even than those of MAP and AR50, although SD in the arid regions is generally smaller than elsewhere. Smaller SD implies a weaker flow aggregation effect by the river network, which would otherwise help filter out random prediction errors, hence a smaller KGE. It also points to the importance of improving flow routing in these regions.

MAE depends strongly on drainage area, so comparing it across scales is not as straightforward as comparing nondimensional error metrics such as KGE. For that reason, we normalize it by the upstream drainage area (i.e., nMAE). Still, the distribution of nMAE is somewhat similar to that of MAE (Figure S11 of Supporting Information S1). The geographic distribution of nMAE with MAP, AR50, and PS shows some interesting regional patterns. Central-to-eastern regions such as the Ohio and Mississippi, as well as the Pacific Northwest, generally have basins with smaller median drainage areas, larger precipitation and runoff, and greater hydrologic prediction uncertainty, and these basins generally show larger nMAE. The semiarid-to-arid regions show the opposite pattern for nMAE (and MAE) to that for KGE, that is, relatively small nMAE despite low KGE, because these regions have less precipitation and consequently less runoff (Figure 13).

3.6 Limitations and Uncertainties

Despite the promise that Dayflow shows for CONUS-scale streamflow simulation, many limitations and uncertainties remain to be addressed. For instance, comparing Dayflow with the CAMELS dataset (see Figure S12 of Supporting Information S1) at 671 small-to-medium unregulated headwater basins reveals better performance by CAMELS (median KGE of 0.55 and 0.76, and 67% and 87% of gauges with Pbias within ±20%, for Dayflow and CAMELS, respectively). The difference in KGE originates predominantly from differences in correlation and variance ratio, while there is little difference in bias (see Pbias in Figure S12 of Supporting Information S1). The better performance of CAMELS is expected, since CAMELS used basin-specific calibration to enhance streamflow simulation at the outlet of each selected basin. However, such an approach cannot be directly extended to CONUS-scale simulation and may not address gauges that are largely or partially regulated. Regardless, the comparisons with CAMELS and other relevant large-scale reanalysis datasets presented in this study point to opportunities for improving Dayflow and give water managers, modelers, and planners a choice of the most accurate locally relevant streamflow predictions (Bierkens et al., 2015).

Further, uncertainties can arise from meteorological forcings (inputs), model structure and parameters, and observations (Liu & Gupta, 2007). Newman et al. (2015), for example, highlighted the negative temperature bias of the Daymet forcing in the Pacific Northwest, indicating potential for improving Dayflow in that region. Moreover, uncertainties in real-world hydrologic processes and in streamflow predictability are often intertwined, so future implementations could follow a more holistic approach. More sophisticated uncertainty estimation approaches, such as Generalized Likelihood Uncertainty Estimation (Beven & Binley, 1992), Bayesian Recursive Estimation (Thiemann et al., 2001), and the Structure for Unifying Multiple Modeling Alternatives (Clark et al., 2015), could further improve Dayflow. For instance, the hierarchical data assimilation framework adopted in this study does not consider observational uncertainty in streamflow; future implementations of data assimilation in the VIC-RAPID framework should account for it. Finding ways to assimilate streamflow at gauges with partial records should also benefit downstream streamflow routing over a much larger domain. Future implementations should also seek more robust forms of data assimilation (e.g., Abbaszadeh et al., 2018, 2019; Moradkhani et al., 2005). The current implementation does not include lake/reservoir modules in the RAPID routing scheme. Gutenson et al. (2020) demonstrated in a test study the potential of the D03 (Döll et al., 2009) and H06 (Hanasaki et al., 2006) reservoir routing models to improve RAPID performance. Incorporating these models, downstream reservoir release data (e.g., Tavakoly et al., 2017), or machine learning-based hybrid approaches (e.g., Gangrade et al., 2022) into the RAPID framework should improve downstream streamflow predictions.

4 Conclusions

This study demonstrates an efficient approach to integrating a large-scale VIC model with a high-resolution RAPID routing scheme. The VIC-RAPID framework requires the use of meteorological forcing (e.g., daily precipitation, daily maximum and minimum temperatures, and wind speed), digital elevation, land cover, soil feature, vegetation, and NHDPlusV2 river network. In addition, we demonstrate an efficient method of assimilating streamflow at 2,934 NWIS gauges with complete historical streamflow records using a simple hierarchical substitution approach. The result is the Dayflow streamflow reanalysis for 36 years (1980–2015) at ∼2.7 million NHDPlusV2 river reaches. We also perform a comprehensive evaluation against streamflow observations at 7,526 NWIS gauges and characterize the prediction errors.

Our results show that 49% of the evaluation gauges show KGE >0.5 while about 58% of the gauges show Pbias within ±20% for the daily naturalized streamflow from Dayflow. The evaluation at the monthly scale suggests a significant overall improvement in the performance over daily streamflow. Simple hierarchical streamflow assimilation across the CONUS also shows an overall improvement (e.g., mean ΔKGE = 0.15) in all error metrics indicating the promise of Dayflow for further water resources applications. Several error metrics associated with Dayflow demonstrate region-specific patterns across the CONUS. Our comparison against two versions of the NWM streamflow reanalysis also demonstrates an overall improvement in performance. Overall, the greatest improvements in KGE were observed in the semiarid-to-arid regions. Comparison with NWM reveals that Dayflow should be improved further in capturing the peak flow. The detailed intercomparison metrics used in this study allow us to highlight the strengths of each streamflow reanalysis product based on the region or area of application. Comparison with a global reanalysis product GRADES, which also uses the VIC-RAPID modeling framework for streamflow simulation, reveals significant improvement by Dayflow with a strong regional pattern. Moreover, investigations of relationships of streamflow prediction errors with key hydrologic, hydroclimatic, and geomorphologic basin characteristics reveal region-specific variability providing insights for improving the modeling framework in future applications (e.g., Gutenson et al., 2020; Tavakoly et al., 2017).

The upcoming Surface Water and Ocean Topography (SWOT) mission should be able to provide near-real-time reservoir storage changes, which could then be assimilated into the routing model. Moreover, the SWOT mission and other hydrologic applications could benefit from the high-resolution Dayflow estimates, particularly at headwater streams. Discharge estimation from SWOT requires estimating unobservable parameters such as channel friction and bathymetry, which depend on a priori estimates of river flow (Lin et al., 2019); Dayflow could provide bounds for such inversion algorithms. CONUS-scale climate change impact assessments of streamflow (e.g., using downscaled CMIP6 forcings) could readily build on the framework developed in this study. For example, Kao, Ashfaq, et al. (2022) implemented a multi-model ensemble approach following the VIC-RAPID modeling framework of Dayflow with a comprehensive uncertainty assessment. Other avenues stemming from this research include, but are not limited to, (a) estimating VIC parameters seamlessly for large domains using approaches such as multiscale parameter regionalization (Mizukami et al., 2017), (b) using channel attributes such as reach length, bed slope, and residence time for RAPID parameter optimization (e.g., Thober et al., 2019), (c) evaluating a different large-scale river network routing model, such as mizuRoute (Mizukami et al., 2016; Vanderkelen et al., 2022), which can model lake and reservoir release, or power-law-based hydrologic routing models (e.g., Ghimire et al., 2018), and (d) forcing the VIC model with a different meteorological forcing, among other efforts aimed at further improving streamflow predictability.

Acknowledgments

This study is supported by the US Department of Energy (DOE) Water Power Technologies Office. The research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is a DOE Office of Science User Facility. PET was supported as part of the Energy Exascale Earth System Model (E3SM) project, funded by the US DOE, Office of Science, Office of Biological and Environmental Research. The authors are employees of UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US DOE. Accordingly, the US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript or allow others to do so, for US Government purposes.

Conflict of Interest

The authors declare no conflicts of interest relevant to this study.

Appendix A: Evaluation Metrics

The KGE (Gupta et al., 2009) metric is composed of three components that represent different characteristics of the hydrograph: Pearson's correlation ($r$), the mean ratio $\beta = \mu_s/\mu_o$, and the variability ratio $\alpha = \sigma_s/\sigma_o$, where $\mu$ and $\sigma$ represent the mean and standard deviation, respectively. The subscripts ‘s’ and ‘o’ correspond to the simulation and observation, respectively. KGE is easy to interpret in terms of its components, because setting each component equal to 1 yields the ideal KGE value of 1. Camici et al. (2020), Knoben et al. (2019), and Pool et al. (2018), among others, provide further interpretation of KGE for various purposes. For instance, Knoben et al. (2019) show that a simulation equal to the mean observed flow yields KGE = 1 − √2 ≈ −0.41, so KGE > −0.41, rather than KGE > 0, indicates an improvement upon the mean-flow benchmark.

$$\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\beta-1)^2 + (\alpha-1)^2} \qquad \text{(A1)}$$
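To make Equation A1 concrete, the following minimal sketch computes KGE and its three components for one pair of simulated and observed series using only the Python standard library. The function name `kge_components` is hypothetical, and two choices are assumptions rather than details from the text: population standard deviations are used (the ratio $\alpha$ and the correlation $r$ do not depend on this choice as long as it is applied consistently), and $r$ is set to 0 for a constant simulation, where the correlation is otherwise undefined. That convention reproduces the mean-flow benchmark value of 1 − √2 discussed above.

```python
import math
from statistics import mean, pstdev


def kge_components(sim, obs):
    """Return (KGE, r, beta, alpha) for paired simulated/observed streamflow."""
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("sim and obs must be equal-length series with >= 2 values")
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)  # population standard deviations
    if sd_s == 0:
        # Correlation is undefined for a constant simulation; treat r as 0
        # (assumption), which yields KGE = 1 - sqrt(2) for the mean-flow benchmark.
        r = 0.0
    else:
        cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(obs)
        r = cov / (sd_s * sd_o)  # Pearson's correlation
    beta = mu_s / mu_o   # mean ratio
    alpha = sd_s / sd_o  # variability ratio
    kge = 1.0 - math.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (alpha - 1.0) ** 2)
    return kge, r, beta, alpha


obs = [1.0, 2.0, 3.0, 4.0]
print(kge_components([2.0, 4.0, 6.0, 8.0], obs))  # perfectly correlated but doubled
print(kge_components([mean(obs)] * len(obs), obs))  # mean-flow benchmark
```

A simulation that doubles every observation has $r = 1$ but $\beta = \alpha = 2$, so it scores the same KGE as the mean-flow benchmark (about −0.41). This illustrates why the three components are worth reporting alongside the aggregate score.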
We computed the mean absolute error (MAE) normalized by the upstream drainage area ($A$) at the evaluation site, that is, nMAE. This normalization enables us to compare performance across basins of different sizes. MAE is given by

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Q_{s,i} - Q_{o,i}\right| \qquad \text{(A2)}$$

where $Q_s$ and $Q_o$ correspond to simulated and observed streamflow, respectively, and $N$ represents the length of the streamflow time series. Dividing MAE by $A$ gives nMAE.
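A minimal sketch of Equation A2 and the area normalization follows. The function names are hypothetical, and the units are left to the caller as an assumption: the text does not state them, so with flow in m³/s and area in km², nMAE would come out in m³ s⁻¹ km⁻².

```python
def mae(sim, obs):
    """Mean absolute error (Equation A2) over N paired time steps."""
    if len(sim) != len(obs) or not obs:
        raise ValueError("sim and obs must be non-empty and of equal length")
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def nmae(sim, obs, drainage_area):
    """MAE divided by the upstream drainage area A (units follow the inputs)."""
    if drainage_area <= 0:
        raise ValueError("drainage area must be positive")
    return mae(sim, obs) / drainage_area


obs = [10.0, 20.0, 30.0]
sim = [12.0, 18.0, 33.0]
print(mae(sim, obs), nmae(sim, obs, 100.0))
```

Here the absolute errors are 2, 2, and 3, so MAE is 7/3 and, for a hypothetical 100-unit drainage area, nMAE is 7/300. Dividing by area puts a large river and a small headwater stream on a per-unit-area footing.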
The percentage bias (Pbias) measures the average tendency of the simulated streamflow to overestimate or underestimate the corresponding observations. Pbias (Equation A3) represents the over- or underestimation of the corresponding runoff volume: a positive value indicates overestimation and a negative value indicates underestimation relative to the observations.

$$\mathrm{Pbias} = 100 \times \frac{\sum_{i=1}^{N}\left(Q_{s,i} - Q_{o,i}\right)}{\sum_{i=1}^{N} Q_{o,i}} \qquad \text{(A3)}$$
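Equation A3 reduces to a ratio of volume totals, as the short sketch below shows. The function name is hypothetical; the sign convention (positive means overestimation) follows the text.

```python
def pbias(sim, obs):
    """Percentage bias (Equation A3); positive values mean the volume is overestimated."""
    if len(sim) != len(obs):
        raise ValueError("sim and obs must have equal length")
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("observed volume is zero; Pbias is undefined")
    # sum(Qs - Qo) equals sum(Qs) - sum(Qo)
    return 100.0 * (sum(sim) - total_obs) / total_obs


obs = [10.0, 20.0, 30.0]
print(pbias([12.0, 18.0, 33.0], obs))  # simulated volume 63 vs. observed 60
print(pbias([8.0, 16.0, 24.0], obs))   # simulated volume 48 vs. observed 60
```

The first case gives +5% and the second −20%, which sits exactly on the ±20% band used to summarize the evaluation gauges. Because positive and negative daily errors cancel in the sums, a Pbias near zero says nothing about timing errors; that is why it is read alongside KGE and nMAE.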
Finally, we computed the annual peak flow difference between the Dayflow and observed streamflow time series using Equation A4.
(A4)
where $\mathrm{peak}_s$ and $\mathrm{peak}_o$ represent the annual peak flows from the simulated and observed streamflow time series, respectively. In addition, we computed the Nash–Sutcliffe Efficiency (NSE; Nash & Sutcliffe, 1970), Timing (the time lag between the simulated and observed hydrographs that maximizes their cross-correlation), and Tp (the timing of the peak), but our discussion focuses primarily on the metrics in Equations A1–A4.
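The supplementary metrics can be sketched in the same style. The code below, written under stated assumptions, computes NSE, a Timing lag found by scanning shifts within a hypothetical `max_lag` window for the highest Pearson correlation, and annual peaks. All function names are hypothetical. The exact form of Equation A4 is not reproduced in this text, so `mean_peak_difference` uses a plain mean of $\mathrm{peak}_s - \mathrm{peak}_o$ in flow units over calendar years purely as an illustrative assumption; the authors' definition may be normalized or use water years. Tp is not implemented.

```python
import math
from datetime import date
from statistics import mean


def nse(sim, obs):
    """Nash-Sutcliffe Efficiency: 1 - SSE / sum of squared deviations of obs from its mean."""
    mu_o = mean(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    return 1.0 - sse / sum((o - mu_o) ** 2 for o in obs)


def _pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def best_lag(sim, obs, max_lag):
    """Shift (time steps) of sim relative to obs that maximizes correlation.

    Positive values mean the simulated hydrograph arrives later than the observed one.
    """
    best_shift, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = sim[lag:], obs[:len(obs) - lag]  # pairs sim[t + lag] with obs[t]
        else:
            a, b = sim[:lag], obs[-lag:]            # pairs sim[t] with obs[t - lag]
        r = _pearson(a, b)
        if r > best_r:
            best_shift, best_r = lag, r
    return best_shift


def annual_peaks(dates, flows):
    """Maximum flow in each calendar year (assumption; water years are also common)."""
    peaks = {}
    for d, q in zip(dates, flows):
        peaks[d.year] = max(q, peaks.get(d.year, q))
    return peaks


def mean_peak_difference(dates, sim, obs):
    """Hypothetical peak metric: mean of (peak_s - peak_o) over years present in both."""
    ps, po = annual_peaks(dates, sim), annual_peaks(dates, obs)
    years = sorted(ps.keys() & po.keys())
    return mean(ps[y] - po[y] for y in years)


obs = [1.0, 3.0, 7.0, 4.0, 2.0, 1.0, 1.0, 2.0, 5.0, 3.0]
sim = [1.0] + obs[:-1]  # the same hydrograph delayed by one step
print(nse(sim, obs), best_lag(sim, obs, max_lag=3))
days = [date(2001, 12, 30), date(2001, 12, 31), date(2002, 1, 1), date(2002, 1, 2)]
print(mean_peak_difference(days, [7.0, 8.0, 5.0, 4.0], [5.0, 9.0, 4.0, 6.0]))
```

In the usage example, a one-step delay lowers NSE even though the hydrograph shape is identical, while the lag scan recovers the shift as +1. Reporting Timing separately helps distinguish phase errors from magnitude errors.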

Data Availability Statement

All hydrography data (NHDPlusV2) are publicly available from https://www..gov/core-science-systems/ngp/national-hydrography/access-national-hydrography-products. The Dayflow dataset, along with associated error metrics, is publicly available for research purposes at https://hydrosource.ornl.gov/dataset/dayflow-V1 (Ghimire et al., 2022). The dataset can also be downloaded interactively by selecting HUC8 subbasins of interest from https://hydrosourcedataexplorer.ornl.gov/#externalaccess.