Volume 58, Issue 12 e2021WR031131
Research Article
Open Access

Combined Effects of Stream Hydrology and Land Use on Basin-Scale Hyporheic Zone Denitrification in the Columbia River Basin

Kyongho Son

Corresponding Author

Pacific Northwest National Laboratory, Richland, WA, USA

Correspondence to:

K. Son

Contribution: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization

Yilin Fang

Pacific Northwest National Laboratory, Richland, WA, USA

Contribution: Conceptualization, Methodology, Software, Writing - review & editing

Jesus D. Gomez-Velez

Department of Civil and Environmental Engineering, Vanderbilt University, Nashville, TN, USA

Climate Change Science Institute & Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA

Contribution: Software, Writing - review & editing

Kyuhyun Byun

Department of Environmental Engineering, Incheon National University, Incheon, South Korea

Contribution: Writing - review & editing

Xingyuan Chen

Pacific Northwest National Laboratory, Richland, WA, USA

Contribution: Conceptualization, Methodology, Writing - review & editing, Supervision, Project administration, Funding acquisition

First published: 21 November 2022

Abstract

Denitrification in the hyporheic zone (HZ) of river corridors is crucial to removing excess nitrogen in rivers from anthropogenic activities. However, previous modeling studies of the effectiveness of river corridors in removing excess nitrogen via denitrification were often limited to the reach scale and to low-order stream watersheds. We developed a basin-scale river corridor model for the Columbia River Basin and used random forest models to identify the dominant factors associated with the spatial variation of HZ denitrification. Our modeling results suggest that the combined effects of hydrologic variability in reaches and substrate availability influenced by land use are associated with the spatial variability of modeled HZ denitrification at the basin scale. Hyporheic exchange flux can explain most of the spatial variation of denitrification amounts in reaches of different sizes, while among the reaches affected by different land uses, the combination of hyporheic exchange flux and stream dissolved organic carbon (DOC) concentration can explain the denitrification differences. We can also generalize that the most influential watershed and channel variables controlling denitrification variation are channel morphology parameters (median grain size (D50) and stream slope), climate (annual precipitation and evapotranspiration), and stream DOC-related parameters (percent of shrub area). The modeling framework in our study can serve as a valuable tool to identify the limiting factors in removing excess nitrogen pollution in large river basins where direct measurement is often infeasible.

Key Points

  • Hyporheic exchange flux controls the spatial variation of denitrification across reaches with different sizes and land uses

  • The combination of hyporheic exchange flux and stream dissolved organic carbon (DOC) explains the differences in denitrification for different land use streams

  • D50, stream slope, precipitation, evapotranspiration, and shrub area can explain most of the spatial variability in denitrification

1 Introduction

Air pollution, fertilizer use in agricultural lands, and wastewater effluents and polluted stormwater runoff from urban lands often result in stream nitrogen pollution, which also increases the frequency of eutrophication, hypoxia, and harmful algal blooms in lakes and estuaries (Boyer et al., 2006; Frei et al., 2020; Le Moal et al., 2019; Pinay et al., 2015, 2018). To lessen stream nitrogen pollution, we can reduce the nutrient loading or increase nitrogen removal through in-stream nitrogen decay or denitrification in river corridors or soils (Frei et al., 2020; Pinay et al., 2018). Generally, denitrification is the most effective way to transform excess inorganic nitrogen into a gaseous form (N2) that is emitted to the atmosphere (Boyer et al., 2006). However, despite the importance of denitrification, considerable uncertainties remain in modeling it in terrestrial and aquatic systems (Groffman, Butterbach-Bahl et al., 2009) because the key controlling factors (oxygen, nitrate, carbon, pH, temperature, etc.) are highly heterogeneous in space and time. Therefore, quantifying denitrification in river corridors across spatial and temporal scales is challenging, especially for the hyporheic zone (HZ) at large spatial scales (Lee-Cullin et al., 2018).

Denitrification in the HZ varies with local conditions, including substrate availability (e.g., dissolved organic carbon (DOC), dissolved oxygen (DO), and nitrate), sediment properties (e.g., grain size), and hydrologic exchange flux/residence time (Boyer et al., 2006; Findlay et al., 2011; Fork & Heffernan, 2014; Kreiling et al., 2019; Seitzinger et al., 2006; Tank et al., 2008; Zarnetske et al., 2015). Large-scale drivers, including land use/cover and climate, can alter local conditions; for example, agricultural and urban watersheds tend to have higher potential denitrification than undisturbed watersheds (Mulholland et al., 2008). However, the critical controlling factors may change with scale and land use. Kreiling et al. (2019) showed that stream nitrate availability is a crucial variable that controls the spatial variation of denitrification in the Fox River watershed in Wisconsin, a mixed land use landscape. Baker and Vervier (2004) showed that the concentration of low molecular weight organic acids is the best predictor of spatiotemporal patterns of denitrification. Even though we know that the combined effects of hydrologic variability and substrate concentration control denitrification, it is unclear which factors become dominant and under what conditions. Bardini et al. (2012) used numerical modeling to demonstrate that streambeds can alternate between net nitrification and net denitrification states when physical and chemical constraints vary. In particular, their numerical simulation study showed that hydrologic variability is more important than reaction substrate availability (DOC and NO3−) in driving such changes in streambed biogeochemical transformations. The relative importance of hydrologic and substrate variables may vary with land use and stream size; for example, a study by Myers (2008) found that, for a selected number of sites, denitrification in agricultural streams is limited by hyporheic exchange flux, while in forest streams it is limited by substrate availability.

Previous denitrification studies are often limited to the reach scale and to lower order streams and have emphasized the importance of lower order streams in denitrification (Alexander et al., 2000, 2007; Gomez-Velez et al., 2015; Tank et al., 2008). Due to the higher ratio of benthic surface to water volume and the higher nutrient loading in lower order streams, denitrification efficiency in lower order streams is higher than in higher order streams (Wollheim, 2016). This result may reflect sample bias in the empirical studies, as Tank et al. (2008) pointed out in their meta-analysis that most stream nutrient uptake studies for NH4+ and NO3− were conducted in streams with discharge below 200 l/s. Using a pulse tracer test method, Tank et al. (2008) also demonstrated that larger streams in the Upper Snake River (seventh order and 12,000 l/s) have higher inorganic nitrogen uptake (NH4+ and NO3−) than smaller streams. Ensign and Doyle (2006) analyzed the results of nutrient spiraling experiments spanning first order to fifth order streams. They found that the cumulative uptake rate of urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0006 increases with stream order. Similarly, a recent modeling study showed the potentially important role played by larger rivers in removing excess nitrogen (Wollheim, 2016). Therefore, it is vital to investigate further how stream size affects hyporheic exchange processes (Gomez-Velez & Harvey, 2014; Hotchkiss et al., 2015; Tank et al., 2008; Wollheim et al., 2006). Furthermore, many previous modeling studies did not separate the role of HZ denitrification from whole-stream denitrification (Alexander et al., 2000, 2007, 2009; Schmadel et al., 2021; Wollheim, 2016), so studying HZ denitrification along streams with varying hydrologic and biogeochemical conditions is critical.

Previously, few basin-scale numerical models have been developed to simulate the role of river corridors in removing excess nitrogen from streams and rivers (Alexander et al., 2007, 2009; Curie et al., 2011; Fang et al., 2020; Gomez-Velez & Harvey, 2014). However, most of the basin-scale models are based on empirical reaction models, or their reaction parameters are estimated by fitting empirical data (Alexander et al., 2000, 2009; Wise et al., 2019). For example, the Networks with Exchange and Subsurface Storage (NEXSS) model used an empirical hydrogeomorphic model and a suite of hydraulic and groundwater models to compute the hyporheic exchange flux and residence time along river networks (Gomez-Velez & Harvey, 2014; Gomez-Velez et al., 2015). The NEXSS model determines potential denitrification based on the ratio of the computed Damköhler number and river turnover length. However, this potential denitrification does not consider how substrate availability limits the denitrification rate. The spatially referenced regressions on watershed attributes (SPARROW) model was used to estimate in-stream removal of nitrogen in the Mississippi River Basin (Alexander et al., 2000, 2007) and the Pacific regions (Wise et al., 2019). In-stream removal of nitrogen was estimated by fitting the model parameters to the measured mean nitrogen fluxes without explicitly considering nitrogen processes in streams. Also, this model does not separate nitrogen removal in the water column from that in the HZ. Thus, the contribution of the HZ alone to nitrogen removal cannot be quantified. An integrated surface and subsurface model (Amanzi-ATS) was developed to compute aerobic respiration and denitrification in the HZ at the watershed scale (Jan et al., 2021), but this study is still limited to demonstrating the capability of the watershed model to simulate the HZ processes and their impacts on stream water quality in an agriculture-dominant watershed. Applying the ATS model to a large river basin to understand the important factors associated with denitrification is computationally too expensive.

On the other hand, Fang et al. (2020) developed SWAT-MRMT-R, a model that couples the watershed water quality model, soil and water assessment tool (SWAT), with the reaction module from a flow and reactive transport code (PFLOTRAN). It can compute aerobic respiration and denitrification in the HZ. The model was successfully tested in the upper Columbia–Priest Rapids watershed in the Columbia River Basin (CRB). It showed that the spatial variation of HZ denitrification depends on a combination of varying hyporheic exchange and source locations of nitrate.

While physically based numerical models can represent explicit mechanisms and simulate HZ denitrification at varying spatial and temporal scales, these models are computationally expensive (Ren et al., 2021) and require various data sources for model calibration (Chen et al., 2021). As an alternative, machine learning approaches show high performance with limited data and capture complex relationships between inputs and outputs (Mori et al., 2019). In some cases, both approaches can be combined to gain further insight and predictability. For example, a machine learning model can be used to reveal the dominant processes or features through variable importance analysis (Ren et al., 2020, 2021; Ward et al., 2022).

In this study, we adopted the reaction network model from the SWAT-MRMT-R to study the role of the HZ in removing excess nitrogen at the basin scale. We applied this modeling framework to the CRB, covering a wide range of channel sizes and land uses. A detailed description follows in the methodology section. We used the CRB as a testbed to study the spatial variation of HZ denitrification at the basin scale. The developed basin-scale HZ river corridor model (RCM) aims to quantify the spatial variation of HZ denitrification across the reaches of the CRB. A random forest model, a machine learning approach, is then used to identify the dominant factors associated with the spatial variation of HZ denitrification at the basin scale (Figure 1). Specifically, we ask two questions:
  1. What dominant variables explain the spatial variation of HZ denitrification in the CRB? We hypothesized that (a) the relative importance of hydrologic variability and substrate availability can control the spatial variation of HZ denitrification and (b) their significance may change with stream size and dominant land use. We built random forest models with key input variables and modeled denitrification results to test this hypothesis. With this approach, we identify the variables that can better explain the spatial variation of modeled denitrification across streams with different sizes and land uses.

  2. Which watershed/stream characteristics can better explain the spatial variation of HZ denitrification in the CRB? We extended our efforts to develop another random forest model to capture the modeled denitrification in the CRB with publicly available watershed and stream characteristic data. This random forest model can generalize which watershed/stream characteristics can better explain the spatial variation of the HZ denitrification in the CRB.

Figure 1. The framework for studying key factors controlling spatial variation of hyporheic zone (HZ) denitrification in streams across different sizes and land uses in the Columbia River Basin (CRB).

2 Methodology

This study uses the RCM to explore the spatial patterns of HZ denitrification across reaches with different sizes and land uses in the CRB. Our main objective is to use the RCM as a virtual reality model, and the machine-learning models as surrogates that encapsulate the complexities of the physics-based model while identifying the importance of different variables that are not evident in the model conceptualization. We do not include a direct comparison of the modeled HZ denitrification with measurements; however, we believe that the RCM can capture the overall spatial patterns of HZ denitrification because the model inputs and its reaction networks are based on well-established theory (Fang et al., 2020; X. Song et al., 2018), a physically based model (Gomez-Velez & Harvey, 2014; Gomez-Velez et al., 2015), or measurements (Li et al., 2017). The combination of the model-based predictions and a machine-learning approach (e.g., random forest) is used to improve our understanding of which model variables are associated with the spatial patterns of modeled denitrification across reaches with different sizes and land uses, and to develop a proxy model that uses measurable variables to reproduce the simulated patterns.

2.1 Columbia River Basin

The study site is the CRB (Figure 2), a large transboundary river basin with approximately 5,230 m of relief and a drainage area of 620,000 km2. Here, we focus on 570,413 km2 of the basin within the continental United States. We selected this fraction of the basin due to data availability. For example, only the U.S. CRB has data from the National Hydrography Dataset (NHD) Plus v2, and our spatial template and the hyporheic exchange and residence time estimates are only available for this region.

Figure 2. Columbia River Basin (CRB) maps: (a) Mean annual precipitation (mm); (b) elevation and nine major sub-river basins: (1) Lower Columbia (LC), (2) Middle Columbia (MC), (3) Upper Columbia (UC), (4) Lower Snake (LS), (5) Middle Snake (MS), (6) Upper Snake (US), (7) Kootenai-Pend Oreille-Spokane (KO), (8) Willamette (WM), and (9) Yakima (YK); and (c) land use and cover map (National Land Cover Database 2016 data).

The CRB can be divided into nine sub-river basins: (a) Lower Columbia; (b) Middle Columbia; (c) Upper Columbia; (d) Lower Snake; (e) Middle Snake; (f) Upper Snake; (g) Kootenai-Pend Oreille-Spokane; (h) Willamette; and (i) Yakima River (Figure 2b). The basin spans various climatic and land use/cover classes. For example, western Washington and Oregon have a humid continental climate; eastern Washington and Oregon, and Idaho have a semi-arid steppe climate; and the Cascade Range in Washington and Oregon, and the Rocky Mountains in Idaho, Montana, and Wyoming have an alpine climate. The variations in climate are reflected in the annual precipitation, which ranges from 158 to 5,230 mm (based on 30 years of normalized PRISM data), and the mean annual temperature, which ranges from −3 to 12°C. The seasonal pattern of precipitation is consistent across the basin, with winter precipitation being dominant. At higher elevations precipitation falls mainly as snow, while in lower elevation regions it falls primarily as rain. Major land use/cover (Figure 2c) comprises 33.7% forest land (33% evergreen forest and about 0.3% and 0.4% deciduous forest and mixed forest, respectively), 33% shrub lands, 12% agricultural lands (10% croplands and 2% hay and pasture), and 2.3% urban lands.

2.2 Basin-Scale Hyporheic Zone River Corridor Model

The RCM used in this study is a simplified, spatially resolved, basin-scale model that couples carbon and nitrogen dynamics. We focus on simulating the spatial variation of HZ denitrification in the CRB (Figure A1). The model adopted the reaction network model from SWAT-MRMT-R (Fang et al., 2020). Three microbially driven reactions, including two-step denitrification and aerobic respiration, are considered within the HZ (Table A1). Note that this model only simulates HZ denitrification in the stream sediments and does not account for denitrification in the water column. The detailed equations and descriptions are found in the appendix and in Fang et al. (2020). Key model inputs are stream substrate concentrations (DOC, DO, and NO3−), HZ exchange flux, and residence time. The model runs at hourly time steps to capture the fast reaction times that characterize the biogeochemical processes represented in Tables A1 and A2, but the model inputs are constant over time; thus, we consider that the modeled HZ denitrification represents long-term averaged conditions. The RCM computes mean annual NO3− removal (kgN/day) at the scale of the NHDPLUS stream reaches over the simulation period and normalizes it by stream surface area (m2), which is computed from two parameters (channel width and length). The stream length was derived from the NHDPLUS database (Wieczorek et al., 2018), and the stream width was derived from the power relationship between measurements of instantaneous flow and bankfull width and NHD cumulative drainage area (Gomez-Velez et al., 2015). The model separately calculates the NO3− removal amounts via vertical and lateral hyporheic exchange. To test the variation of mean annual NO3− removal amounts between years, we ran the model over 10 years and found that after 2 years of simulation, the removal amounts reached a dynamic steady state (Figure S1 in Supporting Information S1). For our modeling analysis, the second-year simulation results were used.
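The area normalization described above can be sketched as follows. This is only a minimal illustration: the function name and the example values are hypothetical and are not taken from the RCM code, and the stream surface area is assumed to be the simple product of channel width and length.

```python
def areal_removal(removal_kg_n_per_day: float, width_m: float, length_m: float) -> float:
    """Normalize a reach-level removal (kg N/day) by stream surface area (m2).

    Assumption: surface area = channel width x channel length.
    """
    if width_m <= 0 or length_m <= 0:
        raise ValueError("channel width and length must be positive")
    return removal_kg_n_per_day / (width_m * length_m)


# Hypothetical reach: 0.5 kg N/day removed over a 10 m wide, 1,000 m long channel.
example_rate = areal_removal(0.5, width_m=10.0, length_m=1000.0)  # kg N/m2/day
```

Expressing removal per unit area makes reaches of very different sizes comparable, which is why the later results are reported in kg N/m2/day.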

Among model inputs, the exchange rate and residence time between stream and HZ were estimated using NEXSS (Gomez-Velez & Harvey, 2014). The NEXSS model couples empirical geomorphologic models with a suite of existing physical hyporheic exchange flux models; for example, NEXSS estimates the values of bankfull channel width, discharge, median grain size (D50), channel slope, sinuosity, and regional head gradients along the NHDPLUS stream networks. In addition, physical hyporheic exchange modeling is used to predict the average hyporheic exchange flux, residence time distribution, and median residence time in the vertical and lateral directions. Vertical hyporheic flux represents exchange between channel water and bedforms, while lateral exchange flux represents exchange between channel water and river bars and meander banks.

Stream substrate concentrations, including DOC, DO, and NO3− (Figure 3), are determined via empirical regression-based estimates or the output of the 2012 SPARROW model. For the stream NO3− concentration, we used results of the 2012 SPARROW model (https://www.sciencebase.gov/catalog/item/5d407318e4b01d82ce8d9b3c). SPARROW is a statistical regression model and has been used to identify key pollutant sources and determine the role of in-stream processes in removing nutrients at the regional scale (Alexander et al., 2007; Wise et al., 2019). SPARROW outputs include mean annual streamflow, total nitrogen loading, total phosphorus loading, and suspended solid loading at the NHDPLUS stream reaches. Since our RCM requires stream nitrate concentration, we first calculated the mean annual total nitrogen concentration by dividing the mean annual total nitrogen loading by the mean annual streamflow estimate, and then multiplied it by the ratio of NO3− to total nitrogen concentration. The ratio was computed from the measured NO3− and total nitrogen concentrations at the U.S. Geological Survey gauge stations in the CRB. Detailed analysis is included in the Supporting Information.
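A minimal sketch of this load-to-concentration conversion is given below. The units (kg/yr for the total nitrogen load, m3/s for streamflow), the function name, and the example ratio are assumptions made for illustration; the source does not state the units of the SPARROW outputs it used.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def nitrate_concentration_mg_per_l(tn_load_kg_per_yr: float,
                                   flow_m3_per_s: float,
                                   nitrate_to_tn_ratio: float) -> float:
    """Estimate mean annual nitrate concentration (mg/l) for one reach.

    Assumed units: total nitrogen load in kg/yr, streamflow in m3/s.
    nitrate_to_tn_ratio is the measured nitrate-to-total-nitrogen ratio (0-1).
    """
    if flow_m3_per_s <= 0:
        raise ValueError("streamflow must be positive")
    if not 0.0 <= nitrate_to_tn_ratio <= 1.0:
        raise ValueError("ratio must lie between 0 and 1")
    annual_volume_m3 = flow_m3_per_s * SECONDS_PER_YEAR
    # kg -> g, and g/m3 is numerically equal to mg/l
    tn_mg_per_l = tn_load_kg_per_yr * 1000.0 / annual_volume_m3
    return tn_mg_per_l * nitrate_to_tn_ratio


# Hypothetical reach: this load at 1 m3/s corresponds to 1 mg/l total nitrogen.
conc = nitrate_concentration_mg_per_l(SECONDS_PER_YEAR / 1000.0, 1.0, 0.5)
```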

Figure 3. Key input data for the river corridor model (RCM): (a) stream mean annual dissolved organic carbon (DOC) concentrations (mg/l); (b) stream mean annual NO3− concentrations (mg/l); (c) stream mean annual dissolved oxygen (DO) concentrations (mg/l); (d) total (lateral and vertical) residence time (log10, second); and (e) total (lateral and vertical) hyporheic exchange flux (log10, m/s).

For stream DOC and DO concentrations, we developed multilinear regression models based on the NHD stream database (Wieczorek et al., 2018) and the measured stream DOC/DO concentrations at the gauging stations in the CRB. The developed stream DOC concentration model is a function of the percentage of basin/catchment shrub areas (tshrub and logshrub) and the basin agriculture area (logtargc): stream DOC = −0.03 (tshrub) + 0.45 (logtargc) − 0.12 (logshrub) + 3.15. Reaches with more agricultural land tend to have higher DOC concentrations, whereas those with more shrub land tend to have lower DOC concentrations. The developed stream DO concentration model is a function of basin soil bulk density (TOT_BDAVE), basin topographic wetness index (TOT_TWI), basin drainage area (logTOT_BASIN_AREA), and catchment dam storage (logCAT_NID): stream DO = −2.85 (TOT_BDAVE) − 0.49 (TOT_TWI) + 0.31 (logTOT_BASIN_AREA) + 0.12 (logCAT_NID). Reaches with larger drainage areas and dam storage tend to have higher DO concentrations, whereas those with higher soil bulk density and a higher topographic wetness index tend to have lower DO concentrations. The detailed procedures for building the multiple regression models for spatial DOC/DO mean concentrations are included in the Supporting Information S1.
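The DOC regression can be written as a small function, as sketched below. The coefficients are those quoted in the text; the function name is hypothetical, the predictors are assumed to be supplied already transformed (the text does not state the logarithm base or any offset used for logtargc and logshrub), and the output unit of mg/l is assumed from the Figure 3 caption.

```python
def stream_doc_mg_per_l(tshrub: float, logtargc: float, logshrub: float) -> float:
    """Evaluate the stream DOC regression quoted in the text.

    Assumption: all three predictors are passed in their transformed form,
    exactly as they enter the regression equation.
    """
    return -0.03 * tshrub + 0.45 * logtargc - 0.12 * logshrub + 3.15


# With all predictors at zero, the estimate equals the intercept.
baseline = stream_doc_mg_per_l(0.0, 0.0, 0.0)

# Raising the agriculture term increases DOC; raising shrub terms lowers it.
more_agriculture = stream_doc_mg_per_l(0.0, 1.0, 0.0)
more_shrub = stream_doc_mg_per_l(10.0, 0.0, 1.0)
```

The signs of the coefficients reproduce the qualitative statement in the text: agricultural land pushes the DOC estimate up, and shrub land pushes it down.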

2.3 Spatial Variation of Modeled Hyporheic Zone Denitrification

2.3.1 Reach- and Basin-Scale HZ Denitrification Within the CRB

We quantified the spatial variability of the mean annual NO3− removal amount at the NHDPLUS reach and sub-basin scales and explored how the spatial patterns change with channel size and land use. We classified the channel sizes into three groups based on Strahler’s stream ordering system: (a) small streams (first–third), (b) medium rivers (fourth–sixth), and (c) large rivers (seventh–twelfth). Because the largest stream/river in the CRB is ninth order, the large rivers in our analysis comprise the seventh to ninth orders. To determine the dominant land use for each reach, we calculated the percentage of each land use (forest, urban, agriculture, and shrub) within the total upstream routed accumulated area. If the percentage of the drainage area covered by a land use type was larger than 80%, we assigned that type as the dominant land use. National Land Cover Database (2001) land cover (https://www.mrlc.gov/) was used to calculate the percentage of each land cover. To simplify the classification, forest land use includes mixed, deciduous, and evergreen forest types; urban land use includes developed open spaces and developed low/medium/high density areas; agriculture land use includes pasture/hay and cultivated crop areas; and shrub land use includes dwarf scrub and shrub/scrub. We quantified the difference in the mean daily HZ NO3− removal amounts in the reaches with different sizes (small, medium, and large streams/rivers) and different land uses (forest, urban, agriculture, and shrub). The significance of the effect of land use and reach size on the mean daily HZ NO3− removal amount was tested using the Kruskal-Wallis test.
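The two classification rules above can be sketched as follows. The function names and dictionary keys are hypothetical, and returning None for reaches where no land use exceeds the threshold is an assumption, since the text does not say how such reaches were handled.

```python
from typing import Dict, Optional


def size_class(strahler_order: int) -> str:
    """Map a Strahler stream order to the three size groups used in the text."""
    if 1 <= strahler_order <= 3:
        return "small"
    if 4 <= strahler_order <= 6:
        return "medium"
    if 7 <= strahler_order <= 12:
        return "large"
    raise ValueError("Strahler order must be between 1 and 12")


def dominant_land_use(percent_by_type: Dict[str, float],
                      threshold: float = 80.0) -> Optional[str]:
    """Return the land use covering more than `threshold` percent of the
    upstream accumulated area, or None if no type qualifies (assumption)."""
    best = max(percent_by_type, key=percent_by_type.get)
    return best if percent_by_type[best] > threshold else None


# Hypothetical reach: fifth order, draining a mostly forested area.
reach_size = size_class(5)
reach_land_use = dominant_land_use(
    {"forest": 85.0, "urban": 2.0, "agriculture": 5.0, "shrub": 8.0}
)
```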

2.3.2 Sensitivity of HZ Denitrification to Substrate Concentrations

The stream substrate concentrations at the NHDPLUS reach scale are estimated via the existing SPARROW model or regressions based on measured stream DOC/DO concentrations; therefore, these estimates are expected to have a high uncertainty that can affect the modeling results. To quantify the impact of substrate concentration on the model estimates, we created four seasonal stream DOC and DO concentration maps and evaluated how the modeled NO3− removal amount changes with the different seasonal concentrations. The detailed descriptions of the seasonal substrate concentrations are included in the Supporting Information S1. We also applied the maximum and minimum substrate concentrations and evaluated which substrate limits the denitrification process in the reaches across the different sizes and land uses. For example, the maximum values of predicted DOC and NO3− and the minimum value of predicted DO concentration were applied to all reaches.

2.3.3 Key Factors Controlling Spatial Variability of Mean Annual NO3− Removal at Basin Scale

To evaluate the relative importance of hydrologic and substrate variables for the modeled NO3− removal in the CRB, we used variable importance analysis implemented in a random forest model to identify which factors are associated with the spatial variation of NO3− removal amounts (Figure 1). A random forest model was built with the R “randomForest” package using the key input variables and modeled NO3− removal amounts (kgN/m2/day), with 80% of samples used to train the random forest model and 20% used to test the model prediction. We used the R2 and mean squared error (MSE) to quantify the model prediction accuracy.
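The evaluation protocol (an 80/20 split scored with R2 and MSE) can be sketched in plain Python as below. This is not the R workflow used in the study: it omits the random forest itself, the helper names are hypothetical, and R2 is assumed to be the coefficient of determination, which the text does not specify.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(rows: Sequence, test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle the samples and hold out `test_fraction` of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# 100 hypothetical reach identifiers split into 80 training and 20 test samples.
train_ids, test_ids = train_test_split(list(range(100)))
```

In practice, the model would be fitted on the training samples and both metrics computed on the held-out samples only.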

The random forest model we developed was used to compute the partial dependence of the modeled NO3− removal amount on each variable and to rank the importance of key input variables. We tested whether the ranks of variable importance vary across the reaches with different sizes and land uses. To measure the importance of key variables in the random forest model, we used the Gini impurity measure, which quantifies how well the splits on each variable separate the samples in each tree and the variance that remains within the resulting nodes; lower variance indicates that the variable produces better splits. Also, to generalize which watershed and stream properties can better represent the spatial variation of the HZ NO3− removal amount in the CRB, we developed a random forest model with publicly available watershed/stream variables (Figure 1 and Table 1). The detailed information for each variable used in the random forest model is found in the Supporting Information (Table S4 in Supporting Information S1). The watershed and stream properties are based on the NHDPLUS database (Wieczorek et al., 2018).
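To make the idea of impurity-based importance concrete, the sketch below computes the impurity decrease of a single split. It assumes a regression setting in which node impurity is the residual sum of squares; this is an illustrative choice and not necessarily the exact measure used in the study, and the function names and data are hypothetical. In a forest, such decreases would be accumulated per variable over all splits and trees.

```python
from typing import Sequence


def rss(values: Sequence[float]) -> float:
    """Residual sum of squares around the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def impurity_decrease(x: Sequence[float], y: Sequence[float],
                      threshold: float) -> float:
    """Reduction in RSS obtained by splitting the samples at x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return rss(y) - rss(left) - rss(right)


# Hypothetical response and two candidate predictors.
response = [0.0, 0.0, 10.0, 10.0]
informative = [1.0, 2.0, 3.0, 4.0]    # separates low and high responses
uninformative = [1.0, 3.0, 2.0, 4.0]  # mixes low and high responses

gain_informative = impurity_decrease(informative, response, 2.5)
gain_uninformative = impurity_decrease(uninformative, response, 2.5)
```

A variable that repeatedly yields large decreases, like the first predictor here, ranks high in importance; one that leaves the within-node variance unchanged ranks low.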

Table 1. List of Key Watershed/Stream Characteristics and Properties
Properties | Variables
Climate | Precipitation and air temperature
Topography | Elevation, slope, wetness index, and drainage area
Hydrology | Annual flow, baseflow index, potential evapotranspiration, and actual evapotranspiration
Land | Percent of land use/cover types (forest, wetland, agriculture, urban, and shrubland), vegetation index
Soil | Hydraulic conductivity of soil and permeability of surface geology, percent of soil texture and organic matter
Stream | D50, sinuosity, contact time and stream slope, bankfull width, and channel depth

3 Results

3.1 Variation of Hydrologic Variability and Substrate Availability

We computed the distribution of key model inputs of hydrologic/substrate variables in the reaches across orders and dominant land uses (Figure 4). In the following, we summarize our results, starting with the role of stream size and concluding with land use. Note that we excluded data for ninth order reaches given the small sample (only five).

Figure 4. Distribution of key hydrologic and substrate variables in streams with different stream orders. In the violin plot, the white point represents the median value, the thick black line represents the interquartile range (Q1 and Q3), and the thin black lines represent the 1.5 × interquartile range.

The inputs consistently vary with stream order (Figures 4a–4e). For example, the median hyporheic exchange flux increased from first to fifth order streams and decreased from sixth to eighth order rivers. Median residence time increased from first to eighth order. In contrast, median stream NO3− concentrations did not display an obvious trend with channel size. For stream DOC and DO concentrations, the median values increased with stream order, while lower order streams had larger variation in DOC concentration than higher order streams/rivers.

When considering land use, reaches in the forest land tended to have the highest hyporheic exchange fluxes, while those in the shrub land had the lowest values (Figure 5). For residence time, reaches in the shrub land had the longest residence time, while forest reaches had the shortest residence time. This is likely explained by the strong correlation between elevation and the drivers for hyporheic exchange. For substrate availability, reaches in the forest and shrub lands had relatively lower stream DOC and NO3− but higher DO concentrations than the reaches in the urban and agricultural lands. Reaches in the agricultural lands had the highest DOC and NO3−. The reaches in the forest land had the highest DO concentration, while those in the urban land had the lowest DO concentration.

Figure 5. Distribution of key hydrologic and substrate variables in streams with different land uses. In the violin plot, the white point represents the median value, the thick black line represents the interquartile range (Q1 to Q3), and the thin black lines represent 1.5 × the interquartile range.

We also created seasonal substrate concentration products. The relationship between seasonal DOC and stream order did not change with season (Figure S2 in Supporting Information S1); stream DOC increased with stream order in every season. However, the relationship between stream DO and stream order changed with season. The median spring and summer DO concentrations did not vary with stream order, but the fall DO concentration decreased with stream order and the winter DO concentration increased. On the other hand, the land use patterns of DOC and DO changed little with season (Figure S3 in Supporting Information S1). For example, reaches in forest and shrub lands had lower DOC than those in urban and agricultural lands in all seasons, and reaches in agricultural land had the highest DOC concentration except in winter, when urban reaches had the highest DOC. Similarly, the pattern of stream DO across land uses did not vary with season.

3.2 Spatial Variation of Hyporheic Zone NO3− Removal Amounts via Different Flow Paths

We computed the mean annual HZ NO3− removal amount (kg N/m2/day) via vertical and lateral hyporheic exchange, respectively (Figure 6). The spatial patterns of HZ NO3− removal via the two flow paths were similar; the spatial correlation (as measured by the Spearman correlation coefficient) between the two estimates was 0.85. The vertical HZ NO3− removal was about one order of magnitude higher than the lateral HZ NO3− removal. The vertical HZ NO3− removal ranged from 0 to 0.33 kg N/m2/day with a mean of 0.00032 kg N/m2/day, while the lateral HZ NO3− removal ranged from 0 to 0.00517 kg N/m2/day with a mean of 2.25e−05 kg N/m2/day. The ratio of vertical HZ NO3− removal to total HZ NO3− removal ranged from 0.001 to 0.99, with a mean of about 0.78. The ratio increased with stream order. For example, the median ratios for first and second order streams were about 0.67 and 0.83, respectively, and the median ratio for higher order rivers (>fifth) was close to 1. This result suggests that HZ NO3− removal tends to be more dominated by vertical exchange in higher order streams and rivers. This is consistent with the modeling results of Gomez-Velez et al. (2015), in which the potential denitrification (measured by the reaction significance factor) was higher via vertical hyporheic exchange than via lateral hyporheic exchange in the Mississippi River Basin.
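The two reach-level diagnostics used in this comparison, the Spearman rank correlation between the vertical and lateral estimates and the vertical share of total removal, can be illustrated with the short sketch below. It is a minimal standard-library implementation written for illustration; the function names and the five sample values are hypothetical and are not the study's data.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def vertical_fraction(vertical, lateral):
    """Share of total (vertical + lateral) removal contributed by vertical exchange."""
    total = vertical + lateral
    return vertical / total if total > 0 else float("nan")


# Hypothetical per-reach removal amounts (kg N/m2/day).
vertical = [3.2e-4, 1.0e-5, 5.0e-3, 8.0e-4, 2.0e-6]
lateral = [2.0e-5, 4.0e-6, 1.0e-4, 9.0e-5, 1.0e-6]
print(spearman(vertical, lateral))
print([vertical_fraction(v, l) for v, l in zip(vertical, lateral)])
```

Because Spearman correlation depends only on ranks, it is insensitive to the order-of-magnitude difference between the two flow paths, which makes it a reasonable choice for comparing spatial patterns rather than absolute amounts.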

Figure 6. Spatial variation of modeled mean annual HZ NO3− removal amount (log10, kg N/m2/day): (a) NO3− removal amount via lateral hyporheic exchange; (b) NO3− removal amount via vertical hyporheic exchange; (c) NO3− removal amount via total hyporheic exchange; and (d) ratio of the vertical NO3− removal amount to the total (vertical and lateral) NO3− removal amount by stream order.

3.3 Spatial Variation of Hyporheic Zone NO3− Removal Amounts in Reaches With Different Orders and Land Uses

We quantified the HZ NO3− removal amount (kg N/m2/day) across reaches with different orders and land uses (Figure 7, Figures S4 and S5 in Supporting Information S1). Modeled NO3− removal amounts are a unimodal function of stream/river order (or size); medium-sized rivers (fourth–sixth orders) had the highest NO3− removal amounts (Figure 7a). Among the reaches with different land uses, forest reaches had the largest NO3− removal amounts (Figure 7b), urban reaches the second largest, and shrub reaches the smallest. The differences were statistically significant according to the Kruskal-Wallis test; the p-values of the two tests were both less than 2.2e−16. We also tested the impact of seasonal substrate concentrations on the spatial variation of NO3− removal amounts (Figures S4 and S5 in Supporting Information S1). Using seasonal substrate concentrations does not change the relationship between modeled HZ NO3− removal amounts and stream/river order; for example, medium-sized rivers still had the largest NO3− removal amounts under each set of seasonal substrate concentrations (Figure S4 in Supporting Information S1). However, with seasonal concentrations, the ranking of HZ NO3− removal amounts among land uses changes; for example, urban reaches had the largest NO3− removal amounts with fall substrate concentrations, while forest reaches had the largest NO3− removal amounts in spring. The difference in NO3− removal amounts between forest and urban reaches was not statistically significant in summer and winter.
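For readers who want to see what the Kruskal-Wallis comparison computes, the sketch below evaluates the H statistic (with the standard tie correction) for several groups of values. It is a standard-library illustration, not the authors' code; the function name and the group values are hypothetical. The p-value is then usually obtained by comparing H with a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups, which in practice would be done with a statistics library.

```python
def kruskal_wallis_h(groups):
    """Return (H, degrees_of_freedom) for a list of non-empty samples."""
    # Pool all observations, remembering which group each one came from.
    pooled = [(value, g) for g, sample in enumerate(groups) for value in sample]
    pooled.sort(key=lambda pair: pair[0])
    n_total = len(pooled)

    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average rank shared by tied values
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += mean_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    h = 12.0 / (n_total * (n_total + 1)) * sum(
        rs**2 / len(sample) for rs, sample in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)

    correction = 1.0 - tie_term / (n_total**3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction, len(groups) - 1


# Hypothetical log10 removal amounts for three land use groups.
forest = [-3.1, -3.3, -2.9, -3.0]
urban = [-3.4, -3.6, -3.2, -3.5]
shrub = [-4.8, -5.1, -4.6, -4.9]
print(kruskal_wallis_h([forest, urban, shrub]))
```

The test is rank-based, so it suits removal amounts that span several orders of magnitude and are far from normally distributed.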

Figure 7. Variation of modeled hyporheic zone (HZ) mean daily NO3− removal amount in the reaches with different orders and land uses: (a) effects of sizes and (b) effects of land use.

3.4 Factors Influencing Spatial Variation of Hyporheic Zone NO3− Removal Amounts

To identify the factors that play a dominant role in the spatial variation of HZ NO3− removal, we developed a random forest model relating the inputs to the HZ NO3− removal amounts. The partial dependence plots (Figure S6 in Supporting Information S1) showed that stream DOC, residence time, and exchange flux had strong nonlinear relationships with the modeled NO3− removal across streams and rivers of different sizes. Modeled NO3− removal increased with stream DOC and exchange flux, but it decreased with residence time. For reaches with different dominant land uses, exchange flux and residence time had strong positive and negative relationships with the HZ NO3− removal amounts, respectively. For all reaches, stream DOC had a strong positive nonlinear relationship with the HZ NO3− removal amounts, while stream NO3− and DO had weak nonlinear relationships.

The variable importance analysis using our random forest model showed that hydrologic variables were more important than substrate variables in explaining the spatial variation of HZ NO3− removal amounts (Figure 8). Among the hydrologic variables, hyporheic exchange flux was the most important and residence time the second most important for reaches of all sizes. Among the substrate variables, stream DOC was the most important. Similarly, hyporheic exchange flux and residence time were the most and second most important variables, respectively, for reaches with different land uses. Among the substrate variables, stream DOC was the most important in all reaches except the shrub reaches, where stream NO3− showed higher importance than stream DOC.

Figure 8. Relative importance of hydrologic variability and substrate availability in controlling spatial variation of the hyporheic zone (HZ) NO3− removal amount in reaches of different sizes and dominant land uses. The variable importance (measured by the Gini value) is normalized to calculate the relative importance value (percent contribution), which ranges from 0 to 100.
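The normalization to a percent contribution can be sketched as follows. The caption does not state the exact formula, so this example assumes each raw importance is divided by the sum over all variables; the function name, the variable names, and the raw values are hypothetical and chosen only for illustration.

```python
def relative_importance(raw_importance):
    """Convert raw importance scores to percent contributions that sum to 100.

    Assumes normalization by the sum of all scores (an assumption, since the
    source does not spell out the formula).
    """
    total = sum(raw_importance.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return {name: 100.0 * value / total for name, value in raw_importance.items()}


# Hypothetical raw importance scores from a fitted random forest.
raw = {
    "exchange_flux": 50.0,
    "residence_time": 25.0,
    "stream_doc": 15.0,
    "stream_no3": 5.0,
    "stream_do": 5.0,
}
percent = relative_importance(raw)
for name, value in sorted(percent.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.1f}%")
```

Expressing importance as a share of the total makes results comparable across the separate models fitted for each reach size and land use class, since raw scores depend on the scale of each model's response.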

We evaluated the impact of substrate availability on the HZ NO3− removal amount in reaches of different sizes and land uses (Figure 9). On average, removing substrate concentration limits tended to increase HZ NO3− removal amounts. Among the substrates, applying the maximum DOC concentration increased HZ NO3− removal the most for reaches of all sizes and land uses, while applying the maximum NO3− concentration increased HZ NO3− removal amounts the least. Among the reaches with different land uses, shrub reaches showed the largest increase in HZ NO3− removal when DOC limits were removed, and agricultural reaches showed the smallest increase when substrate limits were removed. Among the reaches of different sizes, small streams showed the largest increases in HZ NO3− removal amount. This result suggests that stream DOC is the most limiting substrate for NO3− removal, especially for reaches with relatively low DOC concentrations (Figures 4 and 5).

Figure 9. Sensitivity of modeled NO3− removal amount (log10(kg N/m2/day)) to the available substrate concentrations across reaches with different sizes and land uses: (a) all reaches; (b) small streams; (c) medium rivers; (d) large rivers; (e) forest; (f) shrub; (g) agriculture; and (h) urban. The base scenario used the modeled substrate concentration data (Figures 3a–3c). The maxDOC scenario applied the maximum concentration of modeled dissolved organic carbon (DOC) (Figure 3a) to all reaches, the maxN scenario applied the maximum concentration of modeled NO3− (Figure 3b) to all reaches, and the minO scenario applied the minimum concentration of modeled dissolved oxygen (DO) (Figure 3c) to all reaches.
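The way the four scenarios differ in their inputs can be sketched as below. This only illustrates how the substrate fields would be overwritten before the denitrification model is rerun; the function name, the dictionary keys, and the concentration values are hypothetical, and the model run itself is not shown.

```python
def build_scenarios(reaches):
    """Return the base, maxDOC, maxN, and minO input sets for a list of reaches.

    Each reach is a dict with hypothetical keys "doc", "no3", and
    "dissolved_oxygen". The input list is left unmodified.
    """
    max_doc = max(r["doc"] for r in reaches)
    max_no3 = max(r["no3"] for r in reaches)
    min_do = min(r["dissolved_oxygen"] for r in reaches)
    return {
        "base": [dict(r) for r in reaches],
        # One substrate at a time is set to its least limiting value everywhere.
        "maxDOC": [dict(r, doc=max_doc) for r in reaches],
        "maxN": [dict(r, no3=max_no3) for r in reaches],
        "minO": [dict(r, dissolved_oxygen=min_do) for r in reaches],
    }


# Hypothetical reaches with illustrative concentrations.
reaches = [
    {"id": 1, "doc": 1.0, "no3": 0.2, "dissolved_oxygen": 9.0},
    {"id": 2, "doc": 3.0, "no3": 0.5, "dissolved_oxygen": 7.0},
]
for name, inputs in build_scenarios(reaches).items():
    print(name, inputs)
```

Changing one substrate at a time while holding the hydrologic inputs fixed isolates which substrate limits removal in each group of reaches.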

3.5 Relationship Between Watershed/Stream Characteristics and NO3− Removal Amounts

Using publicly available watershed and stream property data, we developed another random forest model to predict the HZ NO3− removal amounts in the CRB and to generalize which watershed/stream characteristics best explain the spatial variation of HZ denitrification. We built separate random forest models for the HZ NO3− removal amounts via vertical, lateral, and total hyporheic exchange. Each model showed high predictive accuracy, with R2 values of 0.96 or higher and MSE values of 0.06 or lower (Figure 10a and Table 2). The variable importance plots showed that for the lateral NO3− removal amounts, D50, annual precipitation, annual evapotranspiration, and stream slope were the most important variables (Figure 10b), while for the vertical NO3− removal amounts, D50, annual precipitation, annual evapotranspiration, vegetation index, and percent of shrub area were the most important variables (Figure 10c). For the total NO3− removal amounts, D50, annual precipitation, annual evapotranspiration, and percent of shrub area were the most important variables (Figure 10d). D50, stream slope, and annual precipitation were highly associated with the hyporheic exchange rate because these variables were used to calculate streambed hydraulic conductivity in NEXSS (Gomez-Velez et al., 2015).

Figure 10. Predictions of the random forest model in the testing period and variable importance analysis results: (a) test results for the total HZ NO3− removal amount; (b) top 10 important variables for modeled lateral NO3− removal amount (kg N/m2/day); (c) top 10 important variables for modeled vertical NO3− removal amount (kg N/m2/day); and (d) top 10 important variables for modeled total NO3− removal amount (kg N/m2/day). The variables shown are D50_m (median grain size), TOT_PPT7100_ANN (30-year mean annual precipitation at the NHD cumulated drainage), CAT_PPT7100_ANN (30-year mean annual precipitation at the NHD catchment), TOT_AET (mean annual evapotranspiration at the NHD cumulated drainage), CAT_AET (mean annual evapotranspiration at the NHD catchment), tshrub (percent of shrub land at the NHD cumulated drainage area), TOT_EVI_JAS_2012 (vegetation index at the NHD cumulated drainage area), CAT_STREAM_SLOPE (stream slope at the NHD catchment), tforest (percent of forest land at the NHD cumulated drainage), forest (percent of forest land at the NHD catchment), tagrc (percent of agricultural land at the NHD cumulated drainage), logd_m (log10(stream depth, m)), and sinuosity (stream sinuosity).

Table 2. Summary of Model Performance in the Developed Random Forest Model

Model | Train R2 | Train MSE | Test R2 | Test MSE
Lateral denitrification | 0.96 | 0.06 | 0.96 | 0.05
Vertical denitrification | 0.97 | 0.04 | 0.97 | 0.04
Total denitrification | 0.97 | 0.03 | 0.97 | 0.03
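The two scores reported in Table 2 can be computed from paired reference and predicted values as in the sketch below. It assumes that R2 denotes the coefficient of determination (one minus the ratio of residual to total sum of squares), which the text does not state explicitly; the function names and sample numbers are hypothetical.

```python
def mean_squared_error(reference, predicted):
    """Average squared difference between reference and predicted values."""
    return sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)


def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out values: river corridor model output versus
# random forest predictions, both in the same units.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(reference, predicted))
print(mean_squared_error(reference, predicted))
```

Comparing the training and testing scores, as Table 2 does, is a simple check for overfitting: similar values on both subsets suggest the model generalizes to reaches it was not fitted on.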

The percent of shrub area was a key predictor in estimating stream DOC concentrations (Figure 4 and Figure S9 in Supporting Information S1). The variable importance results supported the finding that the HZ NO3− removal amount increased with hyporheic exchange flux, which was positively correlated with streambed hydraulic conductivity (or D50). The modeled NO3− removal was also sensitive to the available DOC concentration, which was negatively correlated with the percent of shrub area. To test how well our random forest approach can be applied to sub-basins of the CRB, we also built a random forest model for each sub-basin with the same input data. As with the CRB, the most important variable in every sub-basin was D50 (Figure S10 in Supporting Information S1), and the second most influential variable was mean annual precipitation, basin area, or bankfull width, depending on the sub-basin.

4 Discussion

4.1 Key Controls on Spatial Hyporheic Zone Denitrification Variations

This study used the basin-scale RCM and random forest models to identify key factors associated with spatial variation of HZ denitrification in the CRB. Results showed that hydrologic variables were more important than substrate variables in explaining the spatial variation of HZ denitrification in reaches across different sizes and land uses. Among the selected hydrologic variables, hyporheic exchange flux was the most important variable for all reaches with different sizes and land uses. Among the substrate variables, stream DOC was the most important. Previous studies have shown that hydrologic variables can explain HZ denitrification. For example, among 13 biogeochemical and 12 hydrologic proxies, annual runoff explained 91% of nitrogen attenuation across 49 watersheds in northwestern France (Frei et al., 2020). Stream depth has been used to explain in-stream nitrogen loss rates in many studies (Alexander et al., 2000). Residence time, exchange flux, or their combination have been used to explain the potential denitrification capacity in different river basins (Gomez-Velez & Harvey, 2014; Gomez-Velez et al., 2015; Harvey et al., 2019). The importance of stream DOC in regulating HZ denitrification has also been highlighted previously. Zarnetske et al. (2011) showed through reach-scale experiments that labile DOC limits HZ denitrification. Jan et al. (2021) showed through numerical experiments at the watershed scale that DOC became a limiting factor at higher exchange flux and that denitrification was less sensitive to stream nitrate concentration, which is similar to our substrate sensitivity analysis result (Figure 9). Hester et al. (2014) showed through numerical experiments that surface DOC, groundwater NO3−, and streambed hydraulic conductivity were the most sensitive parameters affecting HZ denitrification.

Among the different sized reaches, medium rivers (fourth–sixth orders) had the highest denitrification due to the largest exchange flux. The literature shows mixed results on the effects of reach size on denitrification (Alexander et al., 2007, 2009; Tank et al., 2008; Wollheim et al., 2006). In our modeling, the highest exchange flux in the medium-sized rivers was mainly due to the coarser grain size (or higher hydraulic conductivity) of the streambed sediment. While stream DOC, which limits denitrification, increased with stream order (or size) in the CRB (Figure 4 and Figure S2 in Supporting Information S1), the spatial pattern of hyporheic exchange flux controlled the relationship between denitrification amounts and reach sizes. The differences between studies may be due to differences between river basins in how sediment hydraulic conductivity varies with reach size, provided that substrate availability has less influence on denitrification than hydrologic variables. Our modeling study also showed that hydrologic variables were more important than substrate variables in determining the spatial variation of denitrification in the stream networks. Thus, the hyporheic exchange attributed to the streambed hydraulic properties determined the effect of reach size.

Among the four dominant land use types, forest reaches had the highest HZ denitrification due to the highest hyporheic exchange flux (Figure 5b). The urban reaches had the second largest denitrification. However, the relative ranking of forest and urban reaches in HZ denitrification varies with seasonal substrate concentrations; for example, in fall, urban reaches had larger denitrification than forest reaches. Therefore, the substrate concentration can be important in the denitrification process, especially for the forest reaches where the denitrification is limited by sources rather than transport.

Agricultural reaches had the largest DOC and NO3− concentrations and the second lowest DO concentration. These reaches, however, were characterized by lower denitrification than forest and urban reaches. Lower denitrification in the agricultural reaches was mainly due to lower exchange flux. Shrub reaches showed the lowest exchange flux and substrate concentration, so they had the lowest denitrification amounts. This limiting factor on HZ denitrification in streams with different land uses is consistent with the result of Myers (2008), who showed that among nine streams in western Wyoming, agricultural and forest reaches had the lowest and highest exchange fluxes, respectively, while agricultural reaches had higher DOC and NO3− concentrations than forest reaches. In that study, however, the agricultural reaches showed the highest denitrification due to the highest substrate availability (e.g., organic matter) in the hyporheic sediments, even though the exchange flux was the lowest in the agricultural reaches. Also, a study by Mulholland et al. (2008), using data from nitrogen stable isotope tracer experiments across 72 streams and eight regions, obtained results that contrast with ours: urban streams had the highest denitrification rate, agricultural streams the second highest, and forest streams the lowest.

Our modeling study showed that agricultural reaches had lower denitrification than urban and forest reaches due to lower hyporheic exchange. Interestingly, our study and that of Myers (2008) showed opposite results, even though they shared the same limiting factor on denitrification in agricultural and forest reaches. The differences can be explained by the representative time scale implicit in our model, which represents long-term average conditions. The experimental study of Myers (2008), on the other hand, represents short-term conditions. Similarly, the difference in both substrate concentration and exchange flux between reaches with different land uses may determine denitrification. In our modeling study, while forest reaches showed the largest denitrification in most scenarios, in fall the urban reaches showed higher denitrification than forest reaches when the highest DOC concentration was observed. Therefore, our modeling results suggest that the combination of substrate concentration and hydrologic exchange determines the difference in HZ denitrification among reaches with different land uses.

4.2 Generalization of Important Watershed/Stream Variables in Controlling HZ Denitrification

This study used a machine-learning approach (i.e., random forest model) to improve our understanding of which watershed/stream variables can better explain the spatial variation of HZ denitrification in the CRB. This approach is a powerful tool for predicting complex systems, but because of its low interpretability, machine learning is often considered a black box. However, our modeling study demonstrated that our random forest models successfully captured sub-basin/basin-scale modeled denitrification, and the selected important variables all represented the dominant processes that controlled denitrification across streams with different sizes and land uses.

Our random forest models showed very high prediction accuracy; R2 values were 0.96 or higher and MSE values were 0.06 or lower (Table 2). This result suggests that a random forest model with publicly available watershed and stream property data can capture the key variables controlling basin-scale spatial variation of denitrification, even though there are complex interactions between the many processes and variables that determine the spatial variation of HZ denitrification.

Also, the variable importance analysis showed that the stream morphological parameters (D50 and stream slope), climate (annual precipitation and evapotranspiration), and stream DOC (percent of shrub area) can explain most HZ denitrification variability. D50 and stream slope were highly correlated with the modeled exchange flux used in this study. The percent of shrub area was one of two predictor variables in stream DOC concentration, which was a major limiting substrate concentration in the modeled denitrification. Our study demonstrates that our random forest model and a small number of key watershed/stream variables (D50, stream slope, precipitation/evapotranspiration, and land cover), which are fairly easy to measure or characterize, can be used to determine the spatial variation of HZ denitrification at the basin scale, without explicit and complex numerical modeling. Therefore, the important variables and random forest model we developed can be used as a hypothesis testing tool for spatial variation of HZ denitrification at the basin scale and as a sampling design tool for large-scale HZ experimental studies.

4.3 Implications for Role of Hyporheic Zone in River Corridor Processes Under Future Climate Changes

In the CRB, future climate change is expected to increase winter/spring flow, decrease summer flow (Hamlet et al., 2013), and increase stream water temperature (Ficklin et al., 2014). The sensitivity of hydrologic conditions to future climate change will also vary between sub-basins in the CRB. These changes are likely to alter the effectiveness of the HZ in regulating water quality in rivers. Based on our modeling results, denitrification increased with hyporheic exchange, which was a function of streambed grain size, annual precipitation/evapotranspiration, and stream slope, while lower stream DOC availability may limit denitrification. Compared with other river basins in the United States, the streams of the CRB had lower DOC concentrations (Yang et al., 2017), and watershed DOC processes were characterized as transport-limited rather than source-limited (Zarnetske et al., 2018). Therefore, we expect that increasing runoff can generate higher DOC flux (or concentration) in streams, which may promote denitrification in the HZ.

More frequent and intense fires are expected under future climate conditions (Abatzoglou & Williams, 2016), which can alter the conditions of terrestrial and aquatic systems. For example, fire removes vegetation and delivers more nitrogen and sediment via higher peak flows. On the other hand, fire reduces DOC transport in streams due to the burning of biomass and soil carbon (Wei et al., 2021). Therefore, higher exchange and greater nitrogen availability in the HZ may increase denitrification, while lower sediment hydraulic conductivity due to fire-induced transport of finer sediment particles and reduced DOC concentrations can reduce denitrification. The impact of fire on HZ denitrification requires extensive future work. Also, climate and land use changes, or their combination, may alter future stream water quality in different ways (El-Khoury et al., 2015). Therefore, future studies should consider both projected changes in determining the role of the HZ.

4.4 Implications for Stream/Watershed Management

Intensive agricultural activity and urbanization continue to degrade water quality in streams and rivers through increased atmospheric pollutant deposition and excess nutrient export (Frei et al., 2020; Le Moal et al., 2019). To improve water quality in rivers, both reducing nutrient loading and increasing nutrient removal should be considered. Our modeling study suggests that denitrification can be increased by enhancing the exchange flux between the stream and the HZ. This result is aligned with previous work (Liu & May Chui, 2020; Ward et al., 2011). For example, Liu and May Chui (2020) demonstrated through surface and hyporheic flow simulations that increasing hyporheic flux by raising the height of weirs maximized the nitrogen removal amounts and nitrogen removal ratios. Our modeling also shows that denitrification through vertical exchange is larger than that through lateral exchange and that the difference is larger in large rivers. This result suggests that enhancing vertical exchange with coarser (more permeable) streambed materials is more effective in reducing excess nitrogen than enhancing lateral exchange through induced channel meandering or other measures. In addition to enhancing exchange flux, modifying substrate concentration may alter the efficiency of denitrification processes in the HZ. For example, our modeling shows that when exchange flux is high, stream DOC concentration is a limiting factor in HZ denitrification, consistent with Jan et al. (2021). Therefore, to maximize nitrogen removal in the HZ, a combination of high exchange flux and stream DOC availability may be required.

4.5 Current Research Limitations and Future Study

This study demonstrated that the combination of the reaction network model and empirical methods can quantify the spatial variation of HZ denitrification at the basin scale. However, due to the simplified model structure and assumptions used, the model has several limitations. The first limitation is that hydrologic/substrate variables were assumed to be constant over time, and the variables were empirically estimated or dependent on other model outputs (e.g., SPARROW flow and total nitrogen fluxes). This assumption may create biases that differ depending on hydrologic and substrate conditions. For example, in streams where hydrologic conditions are unsynchronized or synchronized with substrate variables, modeled denitrification may be overestimated or underestimated under the current model assumptions. Future studies should implement dynamic hydrologic conditions and substrate concentrations in the stream and in the HZ; for example, the SWAT-MRMT-R model (Fang et al., 2020) can be used, and to account for the dynamic hydrologic exchange flux and residence time in the HZ, SWAT-MODFLOW (Bailey et al., 2016) or other integrated hydrologic–biogeochemistry models (Chen et al., 2021) may be considered.

The current model was heavily dependent on the NEXSS-based hyporheic exchange flux and residence time. Even though NEXSS used the physical hydraulic/groundwater models, the exchange flux and residence time were highly correlated with the estimated hydraulic conductivity of the streambed. The NEXSS model used an empirical relationship between D50 and sediment hydraulic conductivity to derive the hydraulic conductivity of the streambed at the NHDPLUS stream reach (Gomez-Velez et al., 2015). High spatial heterogeneity of grain size distribution within reach-scale stream sediment (Ren et al., 2020) and its change due to disturbance make it challenging to estimate the representative hydraulic conductivity at the reach-scale (Stewardson et al., 2016). The hydrologic condition also alters vertical distribution of hydraulic conductivity in streambeds; for example, gaining streams have higher conductivity with depth, but losing streams have lower conductivity (Chen et al., 2013). Therefore, a future study should focus on introducing advanced methods (i.e., machine learning approaches) and find better predictor variables for streambed hydraulic conductivity (Abimbola et al., 2020) to reduce the uncertainty in the RCM.

The second limitation is that this model does not explicitly simulate nitrification processes in the HZ. The current model only implements aerobic respiration and denitrification. When oxygen is abundant and residence time is short, nitrification can be dominant (Zarnetske et al., 2012). This model assumes that nitrification is not dominant. Based on the Damköhler number, lower order streams tend to have shorter residence times, so nitrification may be an important process there. Interestingly, most streams in the CRB with short residence times tend to have drainage areas dominated by forest land. Our modeling study suggests that denitrification in the forest streams was mainly limited by the available DOC, not by stream nitrate concentration. Even if nitrification makes nitrate more abundant because of shorter residence times in the HZ, denitrification in forest streams may not increase because nitrate is not a major limiting factor.
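For readers unfamiliar with it, the Damköhler number is, in its generic form, the ratio of a transport time scale to a reaction time scale. The expression below is that generic definition only; the symbols (residence time in the HZ and a characteristic time for oxygen consumption) are notation introduced here for illustration, and the exact form used with the model is not stated in this section.

```latex
\mathrm{Da} = \frac{\tau_{\mathrm{res}}}{\tau_{\mathrm{rxn}}},
\qquad
\begin{cases}
\mathrm{Da} < 1: & \text{water leaves the HZ before oxygen is consumed (aerobic conditions)} \\
\mathrm{Da} > 1: & \text{oxygen is consumed along the flow path (anaerobic conditions can develop)}
\end{cases}
```

Under this reading, short residence times in lower order streams correspond to small values of the ratio, which is why nitrification may matter more there than denitrification.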

The last limitation is that the modeled HZ denitrification has not been validated against field measurements, even though the RCM computed it using the reaction network model with reasonable estimates of hydrologic and substrate variables. This deficiency may reflect the scarcity of denitrification measurements for the HZ, especially in large river basins. Many experimental studies focus on total in-stream nutrient uptake rather than on denitrification alone (Findlay et al., 2011; Tank et al., 2008). Because our model estimates represent spatially varying, temporally averaged conditions, comparing them with the short-term snapshot measurements usually available from experimental studies is a major challenge. A recent study in the H.J. Andrews watershed in Oregon mapped stream geomorphology, hydrology, biology, and chemistry in detail along the fifth-order streams of the forested watershed (Ward et al., 2019). This may be a good starting data set for validating, in a future study, the model inputs (for example, concentrations of DOC, DO, and nitrate in the HZ and streambed hydraulic conductivity) and the modeled denitrification across stream orders.

5 Summary and Conclusions

The important role of HZ denitrification is well recognized in the hydrologic and biogeochemistry communities (Groffman, Davidson, & Seitzinger, 2009; Harvey & Gooseff, 2015); however, modeling studies that quantify basin-scale HZ denitrification are still limited. To fill this knowledge gap, this study used a simplified, spatially fine-resolution, basin-scale, coupled carbon and nitrogen HZ model together with random forest models to identify key controls on the spatial variation of HZ denitrification in the CRB. The variable importance analysis showed that hydrologic variables (hyporheic exchange flux and residence time) were more important than substrate variables (stream DOC, nitrate, and DO) in explaining the spatial variation of HZ denitrification across reaches of different sizes and land uses. Among the hydrologic variables, hyporheic exchange flux explained most of the spatial variation in modeled denitrification amounts. Among the substrate variables, available DOC limited denitrification the most. Among reaches of different sizes, medium rivers (fourth to sixth order), which had the highest exchange fluxes, had the largest denitrification amounts. Among reaches affected by different land uses, forest reaches exhibited the most denitrification because of the highest exchange flux, and urban reaches had the second-largest denitrification because of relatively high exchange flux and stream DOC. However, the relative ranking of forest and urban reaches can change with seasonal substrate concentrations; for example, with fall substrate concentrations, urban reaches showed higher denitrification than forest reaches. These results suggest that the combination of hydrologic variability and stream DOC controls the spatial differences in HZ denitrification among reaches with different land uses. Also, although reaches in agricultural lands had the highest DOC concentrations, their HZ denitrification amounts were the second lowest because of lower exchange flux. Reaches in shrubland had the lowest denitrification because of both the lowest exchange flux and the lowest DOC availability.

We also developed a general random forest model, based on publicly available watershed and stream property data, to identify key factors associated with the spatial variation of HZ denitrification in the CRB. The random forest model performed well (R² > 0.96 and MSE < 0.06), and stream morphology (D50), climate (annual precipitation and annual evapotranspiration), and land use (percent of shrub area) were the most important variables for explaining the spatial variation of modeled HZ denitrification. These results support the relative importance analysis of the model’s input variables, which identified hyporheic exchange flux and available DOC concentration as the key limiting factors for HZ denitrification in the CRB. In this study, hyporheic exchange flux was estimated from the NEXSS simulation (Gomez-Velez et al., 2015) and depended strongly on estimates of streambed sediment grain size and hydraulic conductivity. To reduce the uncertainty of the RCM, future studies should collect detailed measurements of hydraulic conductivity (Ren et al., 2020; Stewardson et al., 2016) and develop advanced methods for characterizing its spatial variation (Abimbola et al., 2020). In addition, the current model represented only the spatially averaged conditions of HZ denitrification in the CRB, and key model input variables were constant in time. Therefore, temporal dynamics should be incorporated using integrated hydrologic–biogeochemistry models to represent basin-scale denitrification in the CRB more accurately.
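For readers who want to reproduce the two performance metrics quoted above, the following minimal sketch shows how the coefficient of determination and the mean squared error are conventionally computed from paired model outputs and predictions. The function names and the four sample values are hypothetical; the study's own data and random forest implementation are not reproduced here.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when the reference values are constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical values standing in for RCM output and random forest predictions.
    reference = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(r_squared(reference, predicted), mean_squared_error(reference, predicted))
```

Both metrics depend on the scale and any transformation of the target variable, so an MSE threshold such as 0.06 is only meaningful relative to the units used in the fitted model.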

Overall, this study indicates that combining reaction network modeling with empirical substrate concentration models can quantify the spatial variation of HZ denitrification at the basin scale. This modeling framework can be readily applied at regional and continental scales and can help clarify the role of the HZ across stream networks in large river basins with different hydrologic and geochemical conditions.

Acknowledgments

This research was supported by the Department of Energy (DOE), Office of Science (SC) Biological and Environmental Research (BER) program, as part of BER’s Environmental System Science program. This contribution originates from the River Corridor Scientific Focus Area at Pacific Northwest National Laboratory (PNNL). This research used resources from the National Energy Research Scientific Computing Center, a DOE-SC User Facility. PNNL is operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RL01830. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of DOE or the U.S. Government. We are thankful to the editors and two anonymous reviewers for providing helpful comments on a previous version of this manuscript. We also thank Dr. Daniel R. Wise, who helped us better understand the SPARROW model inputs and outputs. This manuscript has been coauthored by staff from UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the Department of Energy (DOE). The U.S. government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes.

    Appendix A: Descriptions of the Basin-Scale River Corridor Model

    The RCM computes aerobic respiration and two-step denitrification in the HZ at the scale of NHDPLUS stream reaches within the CRB. Figure A1 shows the conceptual diagram of the RCM. Tables A1 and A2 list the three reactions and their associated model parameter values. The model runs at hourly time steps, but its key input data (exchange flux, residence time, and stream DOC, DO, and nitrate concentrations) are constant over time; the modeled denitrification should therefore be regarded as a long-term averaged estimate. In addition, the reactions in the HZ and the exchange between the HZ and the stream are computed independently for the vertical and lateral exchange pathways. The model computes the solute exchange between the stream and the HZ as expressed in Equations A1 and A2. In Equation A2, the exchange volume (V) is computed by multiplying the exchange flux (q) by the residence time (urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0098) and the stream surface area (width (w) × length (l)). The three reactions R1, R2, and R3 are solved with the approach proposed by Song et al. (2017), and the associated parameters are obtained from Table 2 in Song et al. (2018).


    Figure A1. Simplified conceptual diagram of the river corridor model (RCM). The RCM computes aerobic respiration and two-step denitrification in the hyporheic zone (HZ) at the reach scale. The model requires five key inputs: stream dissolved organic carbon (DOC) and dissolved oxygen (DO) concentrations were estimated by two regression models; stream nitrate concentrations were estimated from the SPARROW 2012 model (Wise et al., 2019); and the vertical and lateral exchange fluxes (urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0101) and their median residence times (urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0102) between the stream and the HZ were estimated from the Networks with Exchange and Subsurface Storage (NEXSS) model (Gomez-Velez et al., 2015).

    Table A1. Aerobic Respiration and Two Steps of Denitrification Reactions

    | Reaction process | Reaction equations |
    | --- | --- |
    | Aerobic respiration | R1: urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0119 |
    | Denitrification | R2: urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0120 |
    | Denitrification | R3: urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0121 |

    Table A2. Reaction Parameter Values and Initial Substrate Concentrations

    | Parameter | R1 | R2 | R3 |
    | --- | --- | --- | --- |
    | Reaction rates | | | |
    | fi | urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0028 × 0.65 | 0.65 | 0.99 |
    | ki (mole/L/hr) | 3 × 1.17 | 1.17 | 0.97 |
    | Kd,i (mmole/L) | 0.25 | 0.25 | 0.25 |
    | Ka,i (mmole/L) | 0.001 | 0.001 | 0.004 |
    | Initial concentrations (mole/L) in hyporheic zone | DOC | Nitrate | DO |
    | | 6.37e−5 | 7.92e−5 | 2.87e−4 |

    Note. R1 is aerobic respiration; R2 and R3 are the two steps of anaerobic respiration (denitrification).

    The following equation is used to calculate the concentration change in the HZ due to the mass exchange between the stream and HZ, as well as microbial reactions in the HZ:
    urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0103(A1)
    where urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0104 is the HZ residence time, Cs,i is the stream concentration of solute i (DOC, nitrate, and DO), Ci,t is the hyporheic concentration of solute i at time step t, urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0106 is the stoichiometric coefficient of solute i in reaction j, and Rj is the rate of the jth reaction.
    urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0107(A2)
    where V is the hyporheic exchange volume (urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0108). Equation A2 computes the mass exchange between the stream and the HZ.
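The exchange terms described above can be sketched in a few lines. The exchange volume follows the verbal definition given in this appendix (exchange flux × residence time × stream width × length). The concentration update is an assumed form, a first-order relaxation toward the stream concentration plus a net reaction term, written only to illustrate the idea; it is not the published Equation A1, and all names, units, and sample values are hypothetical.

```python
def exchange_volume(q: float, residence_time: float, width: float, length: float) -> float:
    """Hyporheic exchange volume V = q * residence_time * (width * length).

    Assumed units: q in m/hr, residence_time in hr, width and length in m,
    which gives V in m^3.
    """
    return q * residence_time * width * length


def hz_concentration_step(c_hz: float, c_stream: float, residence_time: float,
                          reaction_term: float = 0.0, dt: float = 1.0) -> float:
    """One explicit Euler step for the HZ concentration of a single solute.

    Assumed form (not taken from the paper): exchange relaxes the HZ
    concentration toward the stream concentration over the residence time,
    and reaction_term is the net production (+) or consumption (-) rate.
    The step is only well behaved when dt is no larger than residence_time.
    """
    exchange_term = (c_stream - c_hz) / residence_time
    return c_hz + dt * (exchange_term + reaction_term)


if __name__ == "__main__":
    v = exchange_volume(q=0.01, residence_time=5.0, width=10.0, length=1000.0)
    print(f"exchange volume: {v:.1f} m^3")
    c = 0.0
    for _ in range(24):  # 24 hourly steps with constant inputs and no reaction
        c = hz_concentration_step(c, c_stream=1.0, residence_time=10.0)
    print(f"HZ concentration after 24 steps: {c:.3f}")
```

With constant inputs and no reaction, repeated steps drive the HZ concentration toward the stream concentration, which is consistent with treating the model output as a long-term averaged estimate.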
    urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0109(A3)
    urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0110(A4)
    urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0111(A5)
    where urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0112, urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0113, and urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0114 denote the maximum specific uptake rate of organic carbon, the half-saturation constants of the electron acceptors, and the half-saturation constants of the electron donors, respectively; urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0115 is the concentration of the electron acceptor (mol/L); urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0116 is the concentration of the electron donor (mol/L); and BM is the concentration of biomass (mol/L). The reaction rate Ri is computed from the unregulated effect, a Monod-type kinetics term (urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0117) given in Equation A4, and the regulated effect (urn:x-wiley:00431397:media:wrcr26339:wrcr26339-math-0118) given in Equation A5.
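The structure of the rate calculation described above can be illustrated with a short sketch. The unregulated rate uses a standard dual-Monod form built from the quantities named in the text (maximum uptake rate, donor and acceptor concentrations, their half-saturation constants, and biomass). The regulation step shown here, which weights each reaction by its share of the total unregulated rate, is an assumption in the spirit of cybernetic models; it is not confirmed to be the exact form of Equation A5, and the function names and sample numbers are hypothetical.

```python
from typing import List


def monod_rate(k_max: float, donor: float, acceptor: float,
               k_donor: float, k_acceptor: float, biomass: float) -> float:
    """Unregulated dual-Monod rate: k_max * donor limit * acceptor limit * biomass.

    Concentrations and half-saturation constants must share the same units.
    """
    donor_limit = donor / (k_donor + donor)
    acceptor_limit = acceptor / (k_acceptor + acceptor)
    return k_max * donor_limit * acceptor_limit * biomass


def regulated_rates(unregulated: List[float]) -> List[float]:
    """Scale each rate by its share of the total unregulated rate.

    Assumed regulation form: e_i = r_i / sum_j r_j and R_i = e_i * r_i.
    """
    total = sum(unregulated)
    if total == 0.0:
        return [0.0 for _ in unregulated]
    return [(r / total) * r for r in unregulated]


if __name__ == "__main__":
    # Illustrative inputs only; they are not the calibrated values in Table A2.
    r1 = monod_rate(k_max=1.0, donor=0.25, acceptor=0.001,
                    k_donor=0.25, k_acceptor=0.001, biomass=1.0)
    print(r1)
    print(regulated_rates([0.3, 0.1]))
```

Under this assumed weighting, the reaction with the larger unregulated rate is favored more than proportionally, which is how a regulation term can suppress denitrification while aerobic respiration is still fast.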

    Data Availability Statement

    The model codes and scripts for this study will be made available in the PNNL GitLab repository at https://gitlab.pnnl.gov/sbrsfa/basin-scale-hyporheic-zone-denitrification-modeling, and the key model inputs and outputs are freely available at https://doi.org/10.5281/zenodo.7152249. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).