Volume 9, Issue 5 e2021EA002086
Research Article
Open Access

Characterizing Drought Behavior in the Colorado River Basin Using Unsupervised Machine Learning

Carl J. Talsma

Earth and Environmental Sciences, Los Alamos National Laboratory, Los Alamos, NM, USA

Carbon Solutions LLC, Bloomington, IN, USA

Contribution: Validation, Formal analysis, Data curation, Writing - original draft, Writing - review & editing, Visualization

Katrina E. Bennett

Corresponding Author

Earth and Environmental Sciences, Los Alamos National Laboratory, Los Alamos, NM, USA

Correspondence to:

K. E. Bennett,

[email protected]

Contribution: Conceptualization, Methodology, Investigation, Resources, Writing - original draft, Writing - review & editing, Supervision, Project administration, Funding acquisition

Velimir V. Vesselinov

Earth and Environmental Sciences, Los Alamos National Laboratory, Los Alamos, NM, USA

Contribution: Methodology, Software, Writing - original draft, Writing - review & editing

First published: 29 April 2022
Citations: 1

Abstract

Drought is a pressing issue for the Colorado River Basin (CRB) due to the social and economic value of water resources in the region and the significant uncertainty of future drought under climate change. Here, we use climate simulations from various Earth System Models (ESMs) to force the Variable Infiltration Capacity hydrologic model and project multiple drought indicators for the sub-watersheds within the CRB. We apply an unsupervised machine learning (ML) method based on Non-Negative Matrix Factorization using K-means clustering (NMFk) to synthesize the simulated historical, future, and change in drought indicators. The unsupervised ML approach can identify sub-watersheds where key changes to drought indicator behavior occur, including shifts in snowpack, snowmelt timing, precipitation, and evapotranspiration. While changes in future precipitation vary across ESMs, the results indicate that the Upper CRB will experience increasing evaporative demand and surface-water scarcity, with some locations experiencing a shift from a radiation-limited to a water-limited evaporation regime in the summer. Large shifts in peak runoff are observed in snowmelt-dominant sub-watersheds, with complete disappearance of the snowmelt signal for some sub-watersheds. The work demonstrates the utility of the NMFk algorithm to efficiently identify behavioral changes of drought indicators across space and time and to quickly analyze and interpret hydroclimate model results.

Key Points

  • Unsupervised machine learning automatically identifies key sub-watersheds with significant changes in their future drought indicators

  • In the Colorado River Basin mountains, distinct differences in future runoff seasonality and intensity changes are established

  • Significant uncertainty in drought behavior is observed among the applied climate models

Plain Language Summary

Our study applies a pattern recognition computer program to categorize regions within the Colorado River Basin (CRB), based on the modeled future behavior of several indicators important to drought. We use the results from models of climate and water to estimate how drought will change in the future. We then group the behavior of sub-watersheds based on identified similarities in their response to the changes we observed. We show that areas of the Upper CRB could experience a large reduction in available water for evapotranspiration (e.g., for use by trees), and that future hydrologic conditions may more closely resemble those of the Southwest CRB regions today. We are also able to pinpoint which sub-watersheds should expect large losses in snowpack based on expected changes to the spring runoff contribution to streamflow. The work is important in that it highlights a key tool that can be used for rapid assessment of vast amounts of climate and hydrology data in a region that may be critically impacted by future changes in extreme events, such as drought.

1 Introduction

Climate and hydrology science has increasingly relied on large-scale models and data sets to simulate future climate conditions and inform practical concerns over water infrastructure and environmental management (Kingston et al., 2020; Schewe et al., 2014). As these simulations have increased in scale and complexity, so has the need for improved processing and explainability of model results (Gupta et al., 2014; Hrachowitz et al., 2014; Mascaro et al., 2015; Orth et al., 2015). Emerging techniques such as machine learning (ML) to study changes across these vast amounts of data are only now being applied to better understand and inform decision making within the climate, hydrology, and earth science realms (Camps-Valls et al., 2020; Harpold et al., 2012; Reichstein et al., 2019; Virts et al., 2020).

A particular concern of late in hydroclimate modeling and analysis is the case of drought. Around the globe, drought causes tremendous economic and environmental loss each year. It has been estimated that the monetary loss of drought for American farmers and businesses is $6–8 billion each year (in 2004 value, which is equivalent to today's value of $8.16–10.88 billion; Western Governors Association (WGA), 2004). Despite its economic importance, drought is poorly understood relative to other climate-induced disasters (e.g., flooding) due to (a) the lack of a unanimous definition for drought among scientists and stakeholders (Blauhut, 2020) and (b) the complex set of factors that influence drought and its effects on society (Wilhite, 2009). Drought is often defined categorically as hydrologic (low supply of surface and sub-surface water), meteorological (low rainfall, high evapotranspiration), or agricultural (low water availability for plants). The drivers of drought are even more numerous (Xiao et al., 2018). Further, climate change is expected to amplify and intensify the hydrologic, meteorologic, and climatic factors that induce drought events, leading to higher intensity and frequency of drought events in the future, with consequences for ecology, economy, and society (Zhou et al., 2019). Therefore, drought is arguably one of the greatest climate change related risks to the stability of society and the economy facing humans today.

The Colorado River Basin (CRB) constitutes an area of increasing drought risk (Strzepek et al., 2010) and an area of high economic importance related to its freshwater resources (Bennett et al., 2021; James et al., 2014). Additionally, there is a broad diversity in ecological, climatic, and hydrologic conditions within the CRB contrasted by the arid Southwest United States and the high-elevation snow-dominant mountains of Colorado, Utah, and Wyoming through which the Colorado River flows. Changes in future climate within the CRB are especially concerning due to the CRB's reliance on high-elevation snowpack for annual runoff (Christensen et al., 2004; Li et al., 2017). In addition, observed snowpack has been declining historically (Fassnacht & Hultstrand, 2015), and is projected to decline strongly into the future (Ray et al., 2008).

To better understand complex behavior in hydroclimatic applications and its relation to physical processes, we use the recently developed unsupervised ML algorithm, Nonnegative Matrix Factorization using k-means clustering (NMFk; Vesselinov et al., 2018), to extract signals from modeled results of future drought conditions within the CRB. This novel application of NMFk allows us to quickly identify and understand the behavioral differences that exist within the model output both spatially and temporally. We present the capabilities of NMFk as a tool for processing hydroclimatic data and model output for understanding complex behavior and change detection, here in the novel context of future drought behavior in the CRB. We use this example of NMFk to illustrate how ML can be utilized to estimate and elucidate earth science phenomena.

By performing these ML analyses, we can identify spatial patterns as well as threshold changes in hydrologic behavior across the CRB. Using ML models to isolate specific drought-indicator behaviors, we can limit our analysis of the observed indicator behavior to key seasonal periods and sub-watersheds within the CRB. NMFk allows us to disentangle the complex spatial and temporal relationships between drought-indicators and their influencing factors. Through use of a novel ML approach, we demonstrate a capability to automatically isolate where key indicator behavior contributes to drought and where and how behavior will change in the future. Using NMFk, we also reduce the size of the output data to analyze by separating relevant behaviors to quickly process large hydrologic model outputs (30 GB for each ESM over a 30-year time period), identify possible errors, and target unforeseen responses. This approach allows us to dramatically narrow our analysis and processing of the hydrologic model outputs, improving our ability to understand the spatial and temporal behavior of drought indicators.

This paper is organized as follows. In Section 2, we describe the study site and the methods and data used for modeling the hydrology of the CRB under different estimates of future climate change. We describe the drought indicators chosen and how they are calculated, based on the outputs from the hydrologic modeling. We further describe the NMFk algorithm, a novel unsupervised ML method applied to cluster the sub-watersheds within the CRB based on their annual temporal behavior. In Results, we detail ML outputs related to the clustering of drought indicators both spatially and temporally. We interpret the ML results in Discussion, including the causes and implications for drought in the CRB. The Conclusion contains a brief description of the key findings as well as a description of the utility of the ML algorithm for interpretation of model results.

2 Materials and Methods

2.1 Study Site

The study area for this research is the CRB. Located in the Southwestern United States and Northern Mexico, the CRB covers an area of 6.4 × 10⁵ km² (Figure 1). The basin stretches from sea level in the Gulf of California to higher than 4,000 m in the Southern Rocky Mountains. The CRB contains a broad range of climate zones and ecosystems, with the annual average temperature ranging from 4 to 24°C and the average annual precipitation ranging from 79 to 1,699 mm, based on the gridded historical (1981–2010) climate data archive used to force the hydrologic model (Livneh et al., 2015). Much of the precipitation throughout the basin falls as snow at high elevations, and ∼70% of the annual streamflow originates in the Upper CRB upstream from Glen Canyon, Arizona (Christensen et al., 2004; Li et al., 2017). For this reason, the CRB is often characterized in two portions: the high-elevation, snow-dominant Upper CRB and the arid, low-elevation Lower CRB. The water resources of the CRB are critical to water security within the CRB and to many population centers outside the watershed boundaries where CRB water is diverted (i.e., Los Angeles, San Diego, Salt Lake City, Albuquerque, Denver, Figure 1).

Figure 1. The domain of the Colorado River Basin with adjacent areas that receive Colorado River water. Adapted from United States Geological Survey, 2012 (accessed January 11th, 2021; USBR, 2012).

2.2 Earth System Model Simulations

In this study, we use six different, commonly used Earth System Models (ESMs) run with dynamic vegetation. The ESMs and their dynamic vegetation models are: HadGEM2-ES365 (Collins et al., 2011; Cox, 2001), MIROC-ESM (Sato et al., 2007; Watanabe et al., 2011), MPI-ESM-LR, IPSL-CM5A-LR (Dufresne et al., 2013; Krinner et al., 2005), GFDL-ESM2M, and GFDL-ESM2G (Delworth et al., 2006; Shevliakova et al., 2009). We used statistically downscaled data from the Multivariate Adaptive Constructed Analogue (MACA) database (Abatzoglou & Brown, 2012).

For this work, we examine the representative concentration pathway (RCP) 8.5 emissions scenario, which follows shifting greenhouse gas (GHG) emissions levels over time (Le Quéré et al., 2015) and anticipates substantial increases in GHG emissions by 2100 (van Vuuren et al., 2011). The six ESMs were chosen to represent the spread of projected change in precipitation and temperature for the CRB as calculated by ESMs available in the downscaled MACA data set used in the fifth version of the Coupled Model Intercomparison Project (CMIP5). They were selected to capture the spread of scenarios from dry to wet and from the lowest to the highest temperature increase, both annually and seasonally.

2.3 Hydrologic Modeling & Drought Indicators

The ESM projected precipitation and temperature were used to force the Variable Infiltration Capacity (VIC, version 4.2) hydrology model (Liang et al., 1996) using different climate scenarios for historical (1970–1999) and future (2070–2099) time periods. The output from VIC captures the historical and future climate conditions (as physical indicators) for flow and drought conditions within the CRB. VIC was implemented and run as described in Bennett et al. (2018, 2019) and Solander et al. (2019), and is thus only briefly described herein. VIC is a spatially distributed, macroscale hydrologic model simulating the full water and energy balance while accounting for 1-D variably saturated infiltration through the vadose zone. VIC includes a decoupled routing model that is used to estimate surface water discharge (Lohmann et al., 1996, 1998). We ran VIC at a 1-hourly timestep and a 1/16° latitude/longitude (∼7 km) spatial resolution across the CRB.

Parameters utilized to run the VIC model simulations include fractional canopy coverage and a spatially varying monthly climatology of leaf area index (LAI), albedo, and canopy fraction (Bohn & Vivoni, 2016), and land cover fractional areas (calculated using the Moderate-Resolution Imaging Spectroradiometer [MODIS] MCD12Q1 Collection 5 Plant Functional Type product; Friedl et al., 2010). Seasonal cycles of vegetation parameters, including LAI, canopy fraction, and albedo, were derived from the MODIS Collection 5 MOD15A2, MCD43A3, and MOD13A1 products (Huete et al., 2002; Myneni et al., 2002; Schaaf et al., 2002). Vegetation structural parameters were estimated from Ducoudré et al. (1993). Soil physical properties were derived using the United Nations Food and Agriculture Organization (FAO) Digital Soil Map of the World (FAO, 1995). More details on the VIC model parameters are described in Bennett et al. (2018). Historical climate data used to force VIC were daily precipitation, minimum and maximum temperature, and wind speed (Livneh et al., 2015). The daily fields were disaggregated to hourly intervals within the VIC model as described in Bohn et al. (2013).

VIC model parameters were adjusted using an automatic multi-objective calibration approach that uses simulated streamflow compared to United States Geological Survey (USGS) naturalized gauged monthly streamflow data (USBR, 2012) for 23 sub-basins within the CRB as described in Solander et al. (2019). The Multi-Objective Complex Evolution algorithm (MOCOM-UA; Yapo et al., 1998) was applied over three objective functions: streamflow peak matching (Nash-Sutcliffe efficiency), streamflow low-flow matching (log Nash-Sutcliffe efficiency) and streamflow volume bias (normalized Root Mean Square Error [nRMSE]). Streamflow was calibrated by modifying six soil parameters: the thicknesses of the second and third soil layers, the infiltration capacity curve shape parameter, and three non-linear baseflow parameters. We additionally calibrated the albedo of newly fallen snow after Mendoza et al. (2015).
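As an illustration of the three calibration objectives named above, the following minimal sketch computes the Nash-Sutcliffe efficiency, its log-transformed variant, and a normalized RMSE for a pair of simulated and observed flow series. The function names, the small `eps` offset in the log transform, the choice to normalize RMSE by the observed mean, and the flow values are all assumptions made for this example; the study's exact formulations may differ.

```python
import math

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def log_nse(sim, obs, eps=1e-6):
    """NSE on log-transformed flows, which gives low flows more weight (eps avoids log(0))."""
    return nse([math.log(s + eps) for s in sim], [math.log(o + eps) for o in obs])

def nrmse(sim, obs):
    """RMSE divided by the observed mean (one common normalization, assumed here)."""
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))
    return rmse / (sum(obs) / len(obs))

# Made-up monthly flows for illustration only (not USGS or VIC data).
obs = [10.0, 40.0, 120.0, 60.0, 20.0, 8.0]
sim = [12.0, 35.0, 110.0, 66.0, 18.0, 9.0]
scores = {"nse": nse(sim, obs), "log_nse": log_nse(sim, obs), "nrmse": nrmse(sim, obs)}
```

A multi-objective search such as MOCOM-UA would trade these three scores off against each other rather than collapse them into a single number.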

Using the hydrologic and meteorological output from the VIC model, we calculated five individual drought indicators: number of dry days (dryd), maximum temperature (tempx), minimum soil moisture (soilmn), minimum runoff (rn), and maximum evapotranspiration (evapx). As a first step, we calculate all drought indicators for the 134 HUC8 sub-watersheds for 5-day periods (73 each year, with leap year days removed, e.g., January 1st–5th, 6th–10th, and so on) over the historical and future 30-year periods (1970–1999 and 2070–2099, respectively). We then average the 5-day periods over the appropriate 30-year period, giving us the average annual cycle for each time period at a 5-day resolution. The “delta” case is simply the averaged historical annual cycle for a drought indicator subtracted from the averaged future annual cycle. The dryd indicator is the number of days within a 5-day period with no precipitation, while the other indicators represent either the maximum (95th percentile) or minimum (fifth percentile) daily value for each 5-day period. Runoff here is the average non-routed contribution of both runoff and baseflow from an individual VIC model grid cell, but for brevity is referred to simply as “runoff” throughout.
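The indicator calculation described above can be sketched for a single sub-watershed as follows. This is a minimal illustration on synthetic daily series, not VIC output; the function names, the synthetic climate, and the uniform 4°C warming are assumptions introduced only for the example.

```python
import numpy as np

N_PERIODS, PERIOD_LEN = 73, 5   # 73 five-day periods in a 365-day year (leap days removed)

def to_periods(daily):
    """Reshape (years, 365) daily values into (years, 73, 5) five-day blocks."""
    return daily.reshape(daily.shape[0], N_PERIODS, PERIOD_LEN)

def dryd(precip):
    """Days with zero precipitation in each period, averaged over all years."""
    return (to_periods(precip) == 0.0).sum(axis=2).mean(axis=0)

def high_extreme(daily):
    """95th percentile of daily values in each period, averaged over all years."""
    return np.percentile(to_periods(daily), 95, axis=2).mean(axis=0)

def low_extreme(daily):
    """5th percentile of daily values in each period, averaged over all years."""
    return np.percentile(to_periods(daily), 5, axis=2).mean(axis=0)

# Synthetic stand-in for one sub-watershed over a 30-year period.
rng = np.random.default_rng(0)
shape = (30, 365)
day = np.arange(365)
precip_hist = np.where(rng.random(shape) < 0.7, 0.0, rng.gamma(2.0, 3.0, shape))
tmax_hist = 15 + 12 * np.sin(2 * np.pi * (day - 105) / 365) + rng.normal(0, 2, shape)
tmax_fut = tmax_hist + 4.0      # assumed uniform warming, for illustration only

dryd_hist = dryd(precip_hist)               # average annual cycle, 73 values
tempx_hist = high_extreme(tmax_hist)
tempx_fut = high_extreme(tmax_fut)
tempx_delta = tempx_fut - tempx_hist        # "delta" = future minus historical cycle
```

Stacking the 73-value cycle of each of the 134 sub-watersheds row by row would give the 134 × 73 matrix that is passed to the factorization.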

2.4 Machine Learning Methodology: NMFk

A novel unsupervised ML approach was applied in this work (Vesselinov et al., 2018). The ML methods are based on Nonnegative Matrix/Tensor Factorization (NMF/NTF) coupled with k-means clustering (NMFk/NTFk). The factorization is solved as a minimization problem, which also allows various optimization constraints (sometimes referred to as regularization terms) to be applied. In this way, the constraints provide an efficient way to add physics information in the ML process.

NMF is a Blind Source Separation (BSS) technique that has been widely applied to the automated extraction of hidden signals present in complex data sets (e.g., earth sciences, astronomy, biology) with little or no a-priori knowledge or physical modeling efforts (Jung et al., 2000; Nuzillard & Bijaoui, 2000; Sadhu et al., 2017). Perhaps the most prominent benefit of using unsupervised ML is that any bias from past experience or subject-matter expertise is minimized (Belouchrani et al., 1997). Instead, the signals extracted are based only on the information within the data. NMF does not assume any specific statistical distribution or independence of the original data. However, NMF does impose nonnegative constraints on the estimated factorization matrices, so the extracted signals are readily interpretable with relation to the original data. This is an improvement over other BSS techniques, such as Principal Component Analysis (PCA) or Independent Component Analysis (ICA), which generate negative matrix elements, are not additive, and therefore do not provide direct interpretability of the original data (Kayano & Konishi, 2009). Further, PCA results in the loss of features within the original data set, instead preserving the variance of the original data set, and is more suitable for linearly separable data. NMFk results in additive features that are used to reconstruct the original data set as well as possible irrespective of linear separability, providing highly interpretable results. NMFk-based algorithms have shown favorable comparisons against alternative ML methods when applied to well-known benchmark data sets (Nebgen et al., 2020; Vangara et al., 2021).

The fundamental task of NMF is to decompose a data matrix $X \in \mathbb{R}^{m \times n}$ (a matrix of real numbers, $\mathbb{R}$, with dimensions $m \times n$) into two non-negative matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$ so that
$$X \approx W \times H$$

In our case, $m$ is the number of sub-watersheds (134 HUC8 sub-watersheds), and $n$ is the number of 5-day time periods throughout the year (73). Note that $k$ is a positive integer (less than $\min(m, n)$) defining the unknown number of original features (signals) hidden in the data (Lin, 2007). Here, a feature or signal is a unique temporal behavior that is identified among the 134 sub-watersheds but could be more broadly defined for other applications as any unique pattern of data present within a data matrix. $W$ is often regarded as the feature matrix (i.e., representing the unique signals or features present in the original data), and $H$ is called the mixing matrix, capturing how the signals are mixed at each watershed.

NMF determines $W$ and $H$ by minimizing the cost function $O$, which is a measure of discrepancy between actual data ($X$) and factorized reconstruction of $X$ ($W \times H$). In this study, we use the Frobenius matrix norm during the minimization process:
$$O = \frac{1}{2} \lVert X - W \times H \rVert_F^2 = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( X_{ij} - (W \times H)_{ij} \right)^2$$
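To make the factorization concrete, the sketch below fits a non-negative factorization of a synthetic 134 × 73 matrix by reducing the Frobenius-norm reconstruction error. It uses the classic multiplicative update rules, which are one common solver and are swapped in here for illustration; the solver actually used inside NMFk is not specified in this section. The names `nmf`, `rel_error`, `X`, `W`, and `H`, and the synthetic bump-shaped signals, are assumptions of the example.

```python
import numpy as np

def nmf(X, k, n_iter=1000, seed=0, eps=1e-12):
    """Factor non-negative X (m x n) into W (m x k) and H (k x n) by multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1          # strictly positive random initial guesses
    H = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        # Each update keeps entries non-negative and does not increase ||X - WH||_F.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def rel_error(X, W, H):
    """Frobenius norm of the residual, relative to the norm of the data."""
    return np.linalg.norm(X - W @ H, "fro") / np.linalg.norm(X, "fro")

# Synthetic data: three seasonal "bumps" mixed with random non-negative weights.
t = np.arange(73)
signals = np.stack([np.exp(-0.5 * ((t - c) / 4.0) ** 2) for c in (12, 36, 60)])
weights = np.random.default_rng(1).random((134, 3))
X = weights @ signals                      # 134 sub-watersheds x 73 five-day periods

W3, H3 = nmf(X, 3)
W1, H1 = nmf(X, 1)
err3, err1 = rel_error(X, W3, H3), rel_error(X, W1, H1)
```

With this orientation of `X`, the rows of `H` are the temporal patterns and the rows of `W` hold per-watershed weights; transposing `X` swaps those roles.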

Here, our goal is to identify and extract the hidden features (signals) in the drought indicators that contribute to the changes in historical and future hydroclimatic conditions. However, a significant limitation of traditional NMF is that a priori knowledge of the number of signals is required to solve the objective function, but this is often unknown in practice. Our novel method NMFk (Alexandrov & Vesselinov, 2014; Vesselinov et al., 2018) addresses this limitation using the assumption that an optimum number of signals can be obtained based on the robustness and reproducibility of the NMF results. To this end, NMFk computes solutions for all possible numbers of signals $k$ ranging from 1 to $d$ (less than $\min(m, n)$) and then estimates the accuracy and robustness of these solution sets for different values of $k$. For each $k$ value, the robustness is estimated in NMFk by performing a series of NMF runs (e.g., 1,000) with random initial guesses for the $W$ and $H$ elements. After that, the series of NMF solutions are grouped using a custom semi-supervised $k$-means clustering. The customization to the original algorithm is to keep the number of solutions in each cluster equal to the number of NMF runs (e.g., 1,000). The clustering is applied to measure how accurately and robustly a particular number of extracted signals, $k$, describes the original data. The optimal number of signals $k$ is estimated automatically by the NMFk algorithm. A detailed description of NMFk can be found in Vesselinov et al. (2018).
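The idea of judging a candidate number of signals by its reproducibility across random restarts can be sketched as below. This is a simplified stand-in and not the NMFk implementation: it uses only 10 restarts, aligns each run's signals to the first run by the best permutation (so every cluster holds exactly one signal per run), and scores the resulting clusters with a mean silhouette on cosine distance. The function names, the multiplicative-update solver, and the synthetic data are all assumptions of the example.

```python
import numpy as np
from itertools import permutations

def nmf(X, k, n_iter=500, seed=0, eps=1e-12):
    """Multiplicative-update NMF (an assumed solver, used here only for illustration)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + 0.1
    H = rng.random((k, X.shape[1])) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def signal_stability(X, k, n_runs=10):
    """Mean silhouette of the k signals recovered from n_runs random restarts (k >= 2)."""
    runs = []
    for seed in range(n_runs):
        H = nmf(X, k, seed=seed)[1]
        runs.append(H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12))
    ref, aligned = runs[0], []
    for H in runs:
        sim = ref @ H.T                                   # cosine similarity to run 0
        order = max(permutations(range(k)),
                    key=lambda p: sum(sim[i, p[i]] for i in range(k)))
        aligned.append(H[list(order)])                    # row i now matches ref row i
    pts = np.concatenate(aligned)                         # (n_runs * k, n_times)
    labels = np.tile(np.arange(k), n_runs)
    D = np.clip(1.0 - pts @ pts.T, 0.0, None)             # cosine distance
    scores = []
    for i in range(len(pts)):
        same = labels == labels[i]
        same[i] = False
        a = D[i, same].mean()                             # cohesion within own cluster
        b = min(D[i, labels == c].mean() for c in range(k) if c != labels[i])
        scores.append((b - a) / max(a, b, 1e-12))
    return float(np.mean(scores))

# Synthetic 134 x 73 data built from three seasonal signals; each row has one
# dominant signal and about a third of the rows carry a weaker second signal.
rng = np.random.default_rng(1)
t = np.arange(73)
signals = np.stack([np.exp(-0.5 * ((t - c) / 4.0) ** 2) for c in (12, 36, 60)])
dominant = rng.integers(0, 3, size=134)
weights = np.zeros((134, 3))
weights[np.arange(134), dominant] = 0.5 + rng.random(134)
mixed = rng.random(134) < 0.3
weights[mixed, (dominant[mixed] + 1) % 3] = 0.3 * rng.random(mixed.sum())
X = weights @ signals

stability = {k: signal_stability(X, k) for k in (2, 3, 4)}
```

A selection rule in this spirit would keep the largest `k` whose stability stays high while the reconstruction error remains acceptable; the thresholds for "high" and "acceptable" are left open here.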

NMFk automatically identifies plausible solutions for the number of drought indicator signals present in the analyzed data set, with the optimal number of signals estimated by the solution robustness. The data capture the annual temporal signal from 134 HUC8 sub-watersheds, resulting in a 134 × 73 matrix. The extracted drought indicator signals are defined as columns in the feature matrix, $W$. The estimated mixing matrix, $H$, represents how each of the common drought indicator signals is represented in each sub-watershed. Then, the sub-watersheds are grouped based on the dominance of extracted drought indicator signals within each sub-watershed. This workflow is shown in Figure 2, which illustrates the clustering process for rn and allows us to identify the temporally unique drought indicator signals observed throughout the study region for different ESM modeled climate projections. Then we apply theoretical and site knowledge to relate the extracted signals to physiographical characteristics, which allows us to clarify the contributing factors to the low flow and drought events in the CRB. NMFk has previously been applied to various climate and hydrologic applications (Alexandrov & Vesselinov, 2014; Fleming et al., 2021; Throckmorton et al., 2016; Vesselinov et al., 2018, 2019), but has never been applied to future hydroclimate projections on this spatial scale or to the interaction of climate and surface hydrology.
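The grouping step, in which each sub-watershed is assigned to the signal that dominates it, can be sketched as follows. The array `mix` is a hypothetical table of non-negative weights with one row per sub-watershed and one column per signal; rescaling each column by its maximum before comparing is an assumption of this example, as are the function names and the toy numbers.

```python
import numpy as np

def dominant_signal(mix):
    """Assign each sub-watershed (row) to the signal (column) with the largest scaled weight."""
    scaled = mix / (mix.max(axis=0, keepdims=True) + 1e-12)   # put signals on a common scale
    return scaled.argmax(axis=1)

def cluster_medians(cycles, labels, k):
    """Median annual cycle of each cluster at every time step (assumes no empty cluster)."""
    return np.stack([np.median(cycles[labels == c], axis=0) for c in range(k)])

# Toy example: four sub-watersheds, two signals, six time steps.
mix = np.array([[0.90, 0.10],
                [0.20, 0.80],
                [0.60, 0.30],
                [0.05, 0.40]])
signals = np.array([[0.0, 1.0, 3.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0, 2.0, 1.0]])
cycles = mix @ signals                      # reconstructed annual cycles

labels = dominant_signal(mix)
medians = cluster_medians(cycles, labels, k=2)
```

The per-cluster medians computed this way correspond to the kind of summary line drawn through each group of sub-watershed curves in the temporal clustering figures.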

Figure 2. Process by which the Non-Negative Matrix Factorization using K-means clustering (NMFk) algorithm is applied to the drought indicator data. A 2-D matrix of minimum localized streamflow contribution (qn) is created using the 134 HUC8 sub-watersheds with 73 5-day timesteps of streamflow contribution throughout a year. This matrix is input into NMFk, which clusters similar temporal signals of qn together.

3 Results

The change in temperature and precipitation across the CRB for a suite of 14 ESMs in the MACA database is shown in Figure 3. The mean temperature increase of the 14 ESMs is approximately urn:x-wiley:23335084:media:ess21149:ess21149-math-0029°C. The mean precipitation also increases but has large variance among the models (urn:x-wiley:23335084:media:ess21149:ess21149-math-0030). Three of the selected ESMs used in the analysis project decreased annual precipitation (IPSL-CM5A-LR, −15.6%; MPI-ESM-LR, −3.33%; HadGEM2-ES365, −4.04%), while the other three project increased annual precipitation (GFDL-ESM2M, +1.38%; MIROC-ESM, +7.79%; GFDL-ESM2G, +8.51%). The mean changes in annual precipitation and temperature are shown in Table 1 for each of the six models.

Figure 3. Average annual precipitation changes (%) plotted against temperature changes (°C) for the Colorado River Basin region for 14 different Earth System Models (ESMs) and the RCP 8.5 scenario. GFDL-ESM2G and IPSL-CM5A-LR models are highlighted in red. Of the 14 ESMs, six were used in our analysis to cover the range of ESM results for precipitation and temperature change. Those six models are presented in Supporting Information S1 with the two highlighted models (GFDL-ESM2G and IPSL-CM5A-LR) discussed in detail here. The vertical and horizontal black lines represent the multi-model mean of projected temperature and precipitation change, respectively.

Table 1. Projected Change in Mean Annual Temperature and Precipitation in the Colorado River Basin Simulated Using the Six Earth System Models Used in This Study

Model            ΔT (°C)    ΔP (%)
IPSL-CM5A-LR     6.33       −15.60
HadGEM2-ES365    6.35       −4.04
MPI-ESM-LR       5.03       −3.33
GFDL-ESM2M       4.07       1.38
MIROC-ESM        6.98       7.79
GFDL-ESM2G       4.56       8.51

  • Note. IPSL-CM5A-LR and GFDL-ESM2G are discussed in detail in this study, while the other models are presented in Supporting Information S1.

For brevity, we focus our presentation of results on the wettest and driest models assessed (GFDL-ESM2G and IPSL-CM5A-LR, respectively), and these models are highlighted in Figure 3. GFDL-ESM2G also exhibits significantly less warming (+4.56°C) than IPSL-CM5A-LR (+6.33°C), providing us with a warm and wet scenario (GFDL-ESM2G, referred to herein as the warm/wet scenario) and a hot and dry scenario (IPSL-CM5A-LR, referred to herein as the hot/dry scenario). Results for the other ESMs at three signals can be found in Supporting Information S1 and are mentioned in the text where the results of the ESMs showed similar or dissimilar behavior. GFDL-ESM2G is labeled Wet, and IPSL-CM5A-LR is labeled Dry in figures. Here, we present results at 2, 3, and 4 signals where NMFk has determined a robust solution. While NMFk is capable of determining the optimal number of signals, we present multiple solutions to allow for comparison between ESMs and indicators. Further, the mathematically optimal solution may not be the most interpretable with respect to our research focus on drought. Thus, we present a range of the robust solutions as determined by NMFk.

3.1 Maximum Temperature (Tempx)

The spatial clustering of maximum temperature (tempx) for 2, 3, and 4 signals under the warm/wet and hot/dry scenarios is shown in Figure 4. The rows in Figure 4 show the NMFk results for differing numbers of signals (2, 3, or 4), while each column shows the results for a particular climate scenario and time period (hot/dry or warm/wet scenario; Historical/Future/Delta). With two signals (panels a1–a6), the sub-watersheds sort into the high-elevation Upper CRB and the low-elevation Lower CRB for both future and historical periods. NMFk consistently produces two-signal solutions across the differing climate scenarios. The two extracted tempx signals consistently separate into the Upper and Lower CRB, with only a few NMFk solutions found beyond two signals (panels b4, b6, c6). Nevertheless, the spatial clustering based on extracted tempx for higher numbers of signals still roughly follows latitudinal and elevational gradients similar to those in the 2-signal solution.

Figure 4. Non-Negative Matrix Factorization using K-means clustering (NMFk) spatial grouping of HUC8 sub-watersheds based on the tempx (maximum temperature) data set using solutions for 2, 3, and 4 extracted signals. The historical and future time periods, as well as the delta, are shown for both wet and dry scenarios. Each panel represents an independent NMFk clustering, and the colors shown are not comparable across panels. Blank panels represent cases for which NMFk could not produce an acceptable solution.

Figure 5 shows the temporal signal separation in tempx for the warm/wet and hot/dry scenarios. There is a clear separation in the temporal pattern in tempx between the Upper and the Lower CRB clusters for the case of two signals. For both historical and future periods, the Upper CRB exhibits cooler temperatures, as expected. The separation between signals is consistent throughout the year, with slightly more separation during the winter months (panels a1, a2). However, the clustering based on tempx extracted signals varies across the models, exhibiting large differences between panels a3 and d3 of Figure 5. Further, seasonal tempx differences in the “delta” period vary across ESMs, as can be seen in Supporting Information S1, and do not appear to have a clear relationship with the projected change in precipitation.

Figure 5. Temporal Non-Negative Matrix Factorization using K-means clustering (NMFk) clustering of HUC8 sub-watersheds based on the annual tempx (maximum temperature) signals for both IPSL-CM5A-LR (dry scenario) and GFDL-ESM2G (wet scenario) simulations. Solutions for 2, 3, and 4 extracted signals are presented for each time period. The clustering in this figure corresponds directly to the spatial clustering in the appropriate panels of Figure 4 (e.g., blue, red, etc.). Each line represents a single sub-watershed, while the dashed lines represent the cluster medians at each time-step.

3.2 Dry Days (Dryd)

The spatial clustering of dryd at two signals shows a distinct grouping in the southeast of the CRB, with the remainder of the CRB clustering together (Figure 6, panels a1–a4). This grouping grows slightly from historical to future and largely remains intact with increasing numbers of signals. At higher numbers of signals, we see less convergence and less agreement in groupings across models and time periods (Figure 6, panels b1–c6). However, the southeast grouping is represented across different scenarios and time periods, while the clustering of the remainder of the CRB sub-watersheds is more varied.

Figure 6. Non-Negative Matrix Factorization using K-means clustering (NMFk) spatial grouping of HUC8 sub-watersheds based on the dryd (dry days) data set using solutions for 2, 3, and 4 extracted signals. The historical and future time periods, as well as the delta, are shown for both wet and dry scenarios. Each panel represents an independent NMFk clustering, and the colors are not comparable across panels. Blank panels represent cases for which NMFk could not produce an acceptable solution.

Looking at the temporal pattern for two signals (Figure 7, panels a1–a3, d1–d3), it is evident that the grouping of the southeast portion of the watershed is characterized by fewer dryd during the summer months, for both historical and future periods. At a higher number of signals in the historical and future periods (panels b1, e1, f1–f2), the separation in signal magnitude is more evident in the spring and fall as well. Still, the strength of the summer seasonality in dryd remains a determining factor in the clustering of sub-watersheds, especially for the cluster in the southeast basin (blue).

Figure 7. Temporal Non-Negative Matrix Factorization using K-means clustering (NMFk) clustering of HUC8 sub-watersheds based on the annual dryd (dry days) signals for both IPSL-CM5A-LR (dry scenario) and GFDL-ESM2G (wet scenario) simulations. Solutions for 2, 3, and 4 extracted signals are presented for each time period. The clustering in this figure corresponds directly to the spatial clustering in the appropriate panels of Figure 6 (e.g., blue, red, etc.). Each line represents a single sub-watershed, while the dashed lines represent the cluster medians at each time step.

The delta period results for dryd again tend to cluster along the Upper and Lower CRB at two signals across all climate scenarios, while the magnitude of the temporal signal of these groupings tends to be quite different between climate scenarios. In the warm/wet scenario, the Upper CRB mostly experiences fewer dryd throughout the year, and the Lower CRB experiences more dryd in the spring and fewer in the summer. In the hot/dry scenario, both the Upper and Lower CRB mostly experience more dryd throughout the year, with some variability. The hot/dry scenario also shows a distinct increase in dryd in the Lower CRB for the month of July.

3.3 Maximum Evapotranspiration (Evapx)

The spatial results for evapx, shown in Figure 8, again exhibit a separation between Upper CRB and Lower CRB at two signals (panels a1–a6), although more watersheds tend to fall into the Lower CRB grouping compared to tempx and dryd. We also see that a few watersheds located geographically in the Lower CRB are grouped with the Upper basin under the historical evapx time period but group with the Lower basin under future periods. While we see similar spatial clustering between scenarios for the historical and future periods for two signals (panels a1–a2), the patterns diverge dramatically for the delta for two signals. The hot/dry scenario groups a large portion of the Southwest CRB along with the Upper CRB (Figure 8; panel a5), while the warm/wet scenario shows a delineation between clusters further to the north and running roughly east-west (panel a6). At three or more signals, evapx again shows a similar spatial cluster across scenarios in the historical period but diverges under the future period (Figure 8; panels b1–c6). Further, the spatial clusters become less contiguous in some, but not all, cases (panels b4–b5, c3).

Figure 8. Non-Negative Matrix Factorization using K-means clustering (NMFk) spatial grouping of HUC8 sub-watersheds based on the evapx (maximum evapotranspiration) data set using solutions for 2, 3, and 4 extracted signals. The historical and future time periods, as well as the delta, are shown for both wet and dry scenarios. Each panel represents an independent NMFk clustering, and the colors are not comparable across panels. Blank panels represent cases for which NMFk could not produce an acceptable solution.

The temporal signals of evapx, exhibited in Figure 9, show a clear pattern. At two signals (Figure 9; panels a1–a2, d1–d2), the Upper CRB exhibits a peak in evapotranspiration in the summer and a minimum in the winter, while the Lower CRB grouping shows a peak in March and a larger peak in the late summer months, with a dip in evapotranspiration during May and June. At three or more signals (panels b1–b2, c1–c2, e1–e2, f1–f2), we see that the separation in temporal signals is largely determined by whether the signal has one peak in the early summer or two peaks in the spring and late summer. Further, clustering is determined by the intensity of the second peak in the late summer and fall.
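
The one-peak versus two-peak distinction described above can be illustrated with a minimal sketch. The monthly values are invented for illustration and are not taken from the study, and `count_peaks` is a hypothetical helper that simply lists local maxima in a 12-month series. It is a far simpler test than the NMFk signal separation used in this work.

```python
def count_peaks(monthly):
    """Return the indices (0 = January) of local maxima in a monthly series.

    A month counts as a peak if it is strictly greater than both neighbours.
    The series is treated as cyclic, so December neighbours January.
    """
    n = len(monthly)
    peaks = []
    for i in range(n):
        prev_val = monthly[(i - 1) % n]
        next_val = monthly[(i + 1) % n]
        if monthly[i] > prev_val and monthly[i] > next_val:
            peaks.append(i)
    return peaks


# Illustrative (made-up) monthly evapotranspiration values, arbitrary units.
# Single early-summer peak, loosely resembling the Upper CRB description.
upper_like = [0.3, 0.5, 1.0, 1.8, 2.8, 3.4, 3.2, 2.7, 2.0, 1.2, 0.6, 0.3]
# March peak, May-June dip, larger late-summer peak (Lower CRB description).
lower_like = [0.6, 0.9, 1.5, 1.3, 0.9, 0.7, 1.4, 2.1, 1.8, 1.0, 0.7, 0.5]

upper_peaks = count_peaks(upper_like)
lower_peaks = count_peaks(lower_like)
print("Upper-like peaks at month indices:", upper_peaks)
print("Lower-like peaks at month indices:", lower_peaks)
```

Real monthly series are noisier than these examples, so a practical version would likely need smoothing or a minimum peak prominence before counting.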

Figure 9. Temporal Non-Negative Matrix Factorization using K-means clustering (NMFk) clustering of HUC8 sub-watersheds based on the annual evapx (maximum evapotranspiration) signals for both IPSL-CM5A-LR (dry scenario) and GFDL-ESM2G (wet scenario) simulations. Solutions for 2, 3, and 4 extracted signals are presented for each time period. The clustering in this figure corresponds directly to the spatial clustering in the appropriate panels of Figure 8 (e.g., blue, red, etc.). Each line represents a single sub-watershed, while the dashed lines represent the cluster medians at each time step.

The scenario results show large disagreement on whether evapx is decreasing or increasing, particularly in the summer (Figure 9, panels a3, d3), when the discrepancy in temperature is greatest. The hot/dry scenario shows that evapx is decreasing across the entire basin, especially during the summer months. Further, the future hot/dry scenario shows the Upper CRB exhibiting the same summer dip in evapx as the Lower CRB. The warm/wet scenario shows increasing evapx in the Upper CRB throughout the year and increasing evapx across the entire CRB during July. In the warm/wet scenario, the Upper CRB cluster that exhibits a single peak early in the summer is consistent between historical and future time periods, both spatially and temporally.

3.4 Minimum Soil Moisture (Soilmn)

The spatial clustering of soilmn, shown in Figure 10, forms the least contiguous groupings of any of the drought indices. At three signals, sub-watersheds within a single group (red) are scattered throughout the CRB. Further, no NMFk solutions for any scenario or time period converge beyond three signals. When evaluating the delta in soilmn, it appears that differences between clusters are more localized and that local topography plays a major role in the spatial clustering. Further, at three signals, a small band of sub-watersheds is grouped at the center of the Lower CRB (blue at three signals; panels b4–b6), while many of the highest elevation sub-watersheds of the CRB tend to group together.

Figure 10. Non-Negative Matrix Factorization using K-means clustering (NMFk) spatial grouping of HUC8 sub-watersheds based on the soilmn (minimum soil moisture) data set using solutions for 2, 3, and 4 extracted signals. The historical and future time periods, as well as the delta, are shown for both wet and dry scenarios. Each panel represents an independent NMFk clustering, and the colors are not comparable across panels. Blank panels represent cases for which NMFk could not produce an acceptable solution.

Figure 11. Temporal Non-Negative Matrix Factorization using K-means clustering (NMFk) clustering of HUC8 sub-watersheds based on the annual soilmn (minimum soil moisture) signals for both IPSL-CM5A-LR (dry scenario) and GFDL-ESM2G (wet scenario) simulations. Solutions for 2, 3, and 4 extracted signals are presented for each time period. The clustering in this figure corresponds directly to the spatial clustering in the appropriate panels of Figure 10 (e.g., blue, red, etc.). Each line represents a single sub-watershed, while the dashed lines represent the cluster medians at each time step.

The temporal signal for soilmn, shown in Figure 11, similarly shows a wide range of behavior and a large range in soilmn magnitudes. In both historical and future periods, the temporal pattern shows a grouping of sub-watersheds with little to no soilmn and little soilmn seasonality. Other sub-watersheds show a spring peak in soil moisture but exhibit a large range in soilmn magnitude. Looking at the delta for soilmn, we see that the spring peak is shifting earlier in the year and becoming larger. The grouping mentioned previously as a band of sub-watersheds across the Lower CRB is largely losing soilmn (Figure 11; panels a3, b3, e3, f3). The signals and seasonality of soilmn clusters are quite similar between climate scenarios, although the models disagree on the magnitude of soilmn and the magnitude of the seasonality. The hot/dry scenario exhibits a decrease in soil moisture across the CRB and a smaller peak in spring soil moisture in the future, while the warm/wet scenario shows mostly increasing soil moisture throughout the year and a similar magnitude in the spring soilmn peak from historical to future.

Figure 12. Non-Negative Matrix Factorization using K-means clustering (NMFk) spatial grouping of HUC8 sub-watersheds based on the rn (minimum combined runoff and baseflow) data set using solutions for 2, 3, and 4 extracted signals. The historical and future time periods, as well as the delta, are shown for both wet and dry scenarios. Each panel represents an independent NMFk clustering, and the colors are not comparable across panels. Blank panels represent cases for which NMFk could not produce an acceptable solution.

3.5 Minimum Runoff (rn)

The spatial clustering of rn shows a clear separation in Figure 12 (panels a1–b6) between the highest elevation, mountainous sub-watersheds within the CRB and the lower elevation sub-watersheds. At four signals (panels c1–c6), the clustering further splits the lower elevation and downstream sub-watersheds such that we begin to see sub-watersheds of the larger Green River valley grouped together (red; panels a1, a2, and a4) and a southeastern portion of the CRB grouped together (blue). From historical to future, the clusters of the Lower CRB begin to expand into the Upper CRB clusters. The delta panels show similar clustering to the historical and future time periods. However, the high elevation clusters tend to be less contiguous at three and four signals, and several individual sub-watersheds in the southern portion of the CRB associate with the highest elevation sub-watersheds at three signals.

Figure 13. Temporal Non-Negative Matrix Factorization using K-means clustering (NMFk) clustering of HUC8 sub-watersheds based on the annual rn (minimum combined runoff and baseflow) signals for both IPSL-CM5A-LR (dry scenario) and GFDL-ESM2G (wet scenario) simulations. Solutions for 2, 3, and 4 extracted signals are presented for each time period. The clustering in this figure corresponds directly to the spatial clustering in the appropriate panels of Figure 12 (e.g., blue, red, etc.). Each line represents a single sub-watershed, while the dashed lines represent the cluster medians at each time step.

The temporal signals of rn, shown in Figure 13, exhibit separation between signals based on the strong spring seasonality between different sub-watersheds. There are clear differences between clusters based on the timing and magnitude of a spring peak in rn, with the largest peaks in runoff occurring later in the spring. The sub-watersheds with the largest seasonal peak in rn also correspond to the high-elevation mountainous sub-watersheds seen in Figure 12. For both models, the peak in rn shifts earlier in the year during the future period.

The delta also shows an increase in rn in the mountainous sub-watersheds during March through May, followed by a decrease during June, when rn peaks during the historical period. At three or more signals, the sub-watersheds with the larger changes in rn tend to be those with a peak in runoff later in the year. The warm/wet scenario shows a seasonal runoff peak in the future equal to or greater than that of the past, while the hot/dry scenario shows a much smaller runoff peak in the future.
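
The relationship between an earlier peak and the sign pattern of the delta can be made concrete with a small worked example. The monthly values below are invented for illustration and are not model output. The helper names (`peak_month`, `monthly_delta`) are hypothetical, and the delta is assumed here to be the future value minus the historical value for each month.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def peak_month(monthly):
    """Return the index (0 = January) of the largest monthly value."""
    return max(range(len(monthly)), key=lambda i: monthly[i])


def monthly_delta(historical, future):
    """Month-by-month change, assumed to be future minus historical."""
    return [f - h for h, f in zip(historical, future)]


# Illustrative (made-up) monthly runoff for one snowmelt-dominated sub-watershed.
historical = [2, 2, 3, 6, 14, 22, 12, 5, 3, 2, 2, 2]   # peak in June
future = [2, 3, 6, 12, 18, 12, 6, 3, 2, 2, 2, 2]       # peak moves to May

delta = monthly_delta(historical, future)
shift = peak_month(future) - peak_month(historical)     # negative = earlier

print("Historical peak:", MONTHS[peak_month(historical)])
print("Future peak:", MONTHS[peak_month(future)])
print("Delta by month:", dict(zip(MONTHS, delta)))
print("Peak shift (months):", shift)
```

In this example, moving the peak one month earlier produces positive deltas in March through May and a large negative delta in June, the same sign pattern described for the mountainous sub-watersheds.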

Overall, NMFk was able to converge on a solution for nearly all scenarios and time periods at four signals and some instances beyond four signals, suggesting that significant behavioral differences exist in the rn signal and the expected delta in rn signal.

4 Discussion

4.1 Usefulness of ML to Identify Climate Impacts of Drought

The applied unsupervised ML algorithm based on non-negative matrix factorization (NMFk) proved useful in separating the annual signatures of various drought indicators. The algorithm automatically detected seasonal differences in rn, soilmn, evapx, and dryd, which can be explained by differences in climate, precipitation sources, and snowmelt timing. NMFk was also able to distinguish between watersheds based on the magnitude of the extracted signal, as in the case of soilmn, tempx, and evapx. NMFk was particularly useful when applied to the delta estimates in drought indicators for the sub-watersheds, that is, the difference between the historical and future model outputs. NMFk was able to identify key watersheds where drought indicators were projected to change the most or experience a significant change in seasonality. However, because we are not modeling drought or using a specific drought index (Dai, 2011; Palmer, 1965) directly, it is difficult to quantify how the indicators will concurrently contribute to drought in the future. NMFk could be applied to a drought index directly to achieve the same separation of sub-basins based on seasonal behavior. However, for our study, we chose to focus on the hydrologic process level indicators that contribute to drought.
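
As a rough illustration of how a factorization can separate sub-watersheds by seasonal signature, the sketch below factors a small non-negative matrix (rows are sub-watersheds, columns are months) into mixing weights and temporal signals, then labels each row by its dominant signal. This is a plain multiplicative-update NMF written in pure Python, not the NMFk/SmartTensors implementation. It omits the repeated runs and clustering step that NMFk uses to choose the number of signals, and the dominant-signal labeling is an assumption made for illustration. All data and names are hypothetical.

```python
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def nmf(x, k, iterations=1000, seed=0, eps=1e-12):
    """Approximate x (n x m, non-negative) as w (n x k) times h (k x m)
    using multiplicative updates for the squared (Frobenius) error."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # Update signals: h <- h * (w^T x) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update weights: w <- w * (x h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    # Rescale each signal to a maximum of 1 so weights share a common scale.
    for s in range(k):
        peak = max(h[s]) or 1.0
        h[s] = [v / peak for v in h[s]]
        for row in w:
            row[s] *= peak
    return w, h


def dominant_signal(w):
    """Label each row by the signal with the largest mixing weight."""
    return [max(range(len(row)), key=lambda s: row[s]) for row in w]


# Made-up seasonal shapes: a spring peak and a late-summer peak.
snowmelt = [0, 0, 1, 3, 5, 3, 1, 0, 0, 0, 0, 0]
monsoon = [0, 0, 0, 0, 0, 0, 1, 3, 4, 2, 0, 0]
x = ([[scale * v for v in snowmelt] for scale in (1.0, 1.5, 2.0)]
     + [[scale * v for v in monsoon] for scale in (1.0, 1.2, 0.8)])

w, h = nmf(x, k=2)
labels = dominant_signal(w)
print("Cluster label per sub-watershed:", labels)
```

The label values themselves are arbitrary and can swap between runs with different seeds, which mirrors the caveat in the figure captions that colors are not comparable across independent clusterings.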

4.2 ESM Projections of Drought Indicators for the CRB

The spatial and temporal pattern of signal separation in dryd clearly demonstrates the influence of the North American Monsoon (NAM) as a dominant precipitation signal in the southern CRB. The NAM is most prominent in the Southeastern CRB from late June to September, resulting in an increase in precipitation (Adams & Comrie, 1997). The results show the spatial influence of the NAM increasing in the future. However, the separation of temporal signals for dryd does not change significantly during the active summer monsoon season, and the change in summer dryd varies across climate scenarios. Previous studies on the modeled trajectory or observed trends in the NAM are often contradictory as to whether the NAM is intensifying or weakening (Colorado-Ruiz et al., 2018; Demaria et al., 2019; Luong et al., 2017). The ML analysis of evapx also shows signs of influence from the NAM. The second spike in evapotranspiration in the Lower CRB in the late summer demonstrates the water inputs provided by the NAM. Further, the ML-extracted spatial patterns for the Lower CRB sub-watersheds at three and four signals appear dependent on the strength of the NAM in those areas.

The evapx extracted signals show a clear separation between two evaporation regimes: the water-limited Lower CRB and a more radiation-limited Upper CRB. The water-limited nature of the Lower CRB explains its bi-modal annual signal, where water is available for ET predominantly in the wetter spring months and later during the late-summer monsoon. Comparatively, the historical Upper CRB shows an annual ET signal similar to what we would expect from annual radiation, suggesting that ET is determined by radiation and not by the available water. The hot/dry scenario shows a distinct shift in the future toward an increasingly water-limited regime in the summer across the entire CRB. The future hot/dry scenario shows a large dip in evapx across the basin in June and July, when evapotranspiration decreases because of a lack of available water. Increasing evaporative demand associated with climate change is a key driver of drought in the American Southwest, with previous studies showing that increases in evaporative demand may overcome any increases in future precipitation (Ault et al., 2016; Cook et al., 2014, 2015). Our study shows increasing evaporative demand in critical sub-basins as an important driver of drought.

The future hot/dry scenario clustering also shows many of the sub-watersheds within the Green River Valley near the border of Colorado, Utah, and Wyoming clustering together. The extracted temporal signal for this clustering is characterized by a large peak in evapx during the late spring, and a large dip in evapotranspiration in June. The results show that the Green River Valley area may experience large drought pressures from increasing aridity combined with changes in the seasonality of runoff contribution to streamflow and snowmelt upstream. Further, previous studies have cited increasing evapotranspiration as a major risk in the reduction of Colorado River streamflow (Milly & Dunne, 2020; Udall & Overpeck, 2017).

The ML results of both soilmn and rn exhibit large influences from changes in snowmelt behavior. A seasonal increase in soilmn and rn occurs concurrently during the spring snowmelt period. Spatially, rn separates neatly into the snow-dominated mountainous regions of the CRB and sub-watersheds with relatively little snowfall. Soilmn, however, does not. Instead, influences from vegetation, geology, topography, and soil type likely complicate the soil moisture signals, as we see a large difference in soil moisture magnitude in the ML results. Changes in soilmn seem to reflect both seasonal changes in snowmelt and larger changes in soil moisture magnitude throughout the year. A key area of change is the collection of sub-watersheds in the mountainous region of Arizona, which group together in the “delta” analysis. This region exhibits a large loss in soilmn throughout the year, especially when projected by the hot/dry scenario, but also for the wet scenario. This could be caused by a decrease in orographic precipitation due to drier air, combined with an increase in evapotranspiration due to an increase in vapor pressure deficit. The combined pressures of increasing vapor pressure deficit and loss of snowmelt could drive this region to experience a severe decrease in existing soil moisture, regardless of precipitation changes. The delta of soilmn is drastically different between climate models, as the hot/dry scenario shows large decreases in soilmn and the warm/wet scenario exhibits large increases across nearly the entire basin. The consistency of this discrepancy suggests that differences in projected temperature contribute to large changes in soil moisture, as higher temperatures shift the moisture balance toward drier conditions (Ault et al., 2016).
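
To illustrate why warming alone can raise vapor pressure deficit (VPD), the sketch below uses the Tetens approximation for saturation vapor pressure, in the form given in FAO-56, and holds relative humidity fixed. The temperatures and humidity are arbitrary example values, the function names are hypothetical, and nothing here implies that VPD was computed this way in the study.

```python
import math


def saturation_vapor_pressure_kpa(temp_c):
    """Tetens approximation (FAO-56 form); temperature in degrees Celsius."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapor_pressure_deficit_kpa(temp_c, relative_humidity_pct):
    """VPD = saturation vapor pressure minus actual vapor pressure."""
    es = saturation_vapor_pressure_kpa(temp_c)
    return es * (1.0 - relative_humidity_pct / 100.0)


# Same relative humidity (30%), two air temperatures (example values only).
vpd_now = vapor_pressure_deficit_kpa(25.0, 30.0)
vpd_warmer = vapor_pressure_deficit_kpa(30.0, 30.0)
print(f"VPD at 25 C: {vpd_now:.2f} kPa")
print(f"VPD at 30 C: {vpd_warmer:.2f} kPa")
```

Because saturation vapor pressure rises roughly exponentially with temperature, the deficit grows with warming even when relative humidity is unchanged.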

The runoff delta indicates a significant shift in the timing of peak streamflow contribution for the entire CRB and especially the mountainous regions. This shift in streamflow is well documented and has implications for reservoir management and water availability for irrigation (Christensen et al., 2004; Ficklin et al., 2013; Solander et al., 2017). However, the variability in projected climate scenarios results in significant variability in the magnitude of runoff. The hot/dry scenario forecasts significantly lower rn values in the future, while the wet scenario forecasts little change in rn magnitude but still exhibits significant shifts in the timing of spring snowmelt runoff.

Previous studies of snowpack trends in the western United States have found that while large snowpack losses have been observed in mid-altitude areas, the relatively higher altitude regions have experienced little to no change in the snowpack (Bales et al., 2006; Howat & Tulaczyk, 2005). However, high elevation areas of the CRB are projected to see a large loss of snowpack as temperatures continue to rise (Fyfe et al., 2017; Pederson et al., 2013; Rhoades et al., 2018). The ML-detected behavior shifts for snowmelt regimes in the CRB are interesting in this context, as they demonstrate the capability of the ML algorithm to separate the shifts in hydrologic behavior related to climate change. For example, ML results for two extracted signals clearly identify the areas of large runoff changes due to snowmelt in the mountainous regions of the CRB. Further, at a greater number of signals, the algorithm was able to separate the mountainous regions exhibiting snowmelt into separate groups where snowmelt changes were more or less severe, delineating where differences in behavior exist based on threshold hydrologic responses to gradients of temperature change.

4.3 Limitations and Future Work

While NMFk is an efficient tool for clustering the indicators and separating behavioral changes and differences in space and time, the results require interpretation, including additional work to parse the direction of change and the relative importance of the drought indicators. Additionally, experimental design is crucial in the application of NMFk, as without a narrowly defined research objective the output of NMFk can be difficult to understand. Further, while NMFk can determine the mathematically optimal number of signals for a given data set, we found that the optimal number of signals was not always the most interpretable solution with respect to our specific research questions.

The ESM projections and VIC modeling results in the CRB show large changes in hydrologic functioning. The temperature projections are generally similar across all ESMs, including those presented in Supporting Information S1. However, large variance exists in the projection of future precipitation (Dai, 2006), which tends to complicate the projection of CRB hydrologic behavior.

Future work will include a more detailed analysis of the raw data and processes (e.g., evaporation, runoff) contributing to future drought pressures within the areas identified by NMFk, but at finer scales (i.e., grid cell). Additionally, NMFk will be applied to a drought index or to metrics of drought effects (e.g., economic) to better tie the process-level changes identified by NMFk to drought impacts within the CRB. Alternatively, research could focus on quantifying the differences in economic drought impact between the identified NMFk clusters based on the drought indicators identified in this study. The ability of NMFk to rapidly separate behavioral changes in spatio-temporal patterns is particularly useful in climate change research and could be applied to other hydroclimate change detection applications as well.

5 Conclusions

Using a novel application of unsupervised ML based on non-negative matrix factorization, we were able to separate seasonal watershed behaviors related to drought across a large range of environmental and climatic factors. Using historical and future climate projections from ESMs, we could rapidly assess seasonal changes in the behavior of drought under different climate conditions. Overall, we found that the NMFk algorithm is a valuable tool for identifying and interpreting the key regions, timing, and magnitude of change in drought indicators, so that future research and analysis can focus on the processes or regions where drought pressures appear to be increasing.

Among the most pertinent changes were those in the seasonality and magnitude of rn related to the timing and magnitude of snowmelt runoff. The ML algorithm automatically separated the sub-watersheds in the mountainous regions of the CRB into separate groups based on differences in the rn signal response. While large changes in soilmn for some regions were observed in the results, the modeled climate scenarios showed large disagreement on whether soilmn was decreasing or increasing across large areas in the CRB. Some mountainous regions of Arizona indicated a decrease in soilmn for both ESM scenarios, likely a result of changes in precipitation and temperature inputs, loss of snowpack, and increases in evapotranspiration demands. A decrease in summer evapx was found in many basins, which indicates a lack of water available for evapotranspiration in these basins. The shift toward a water-limited evaporation regime was most evident in the hot/dry scenario model (IPSL-CM5A-LR) but was also observed in some sub-watersheds in the warm/wet scenario model (GFDL-ESM2G). Areas of the Green River Valley in the Upper CRB appear to be particularly vulnerable to a shift in evapx due to water availability. The range of possible climate change considered here, regardless of ESM, does point to a hotter CRB with large changes in the timing and magnitude of runoff, evapotranspiration, and soil moisture that will present challenges in managing water resources in the future.

Acknowledgments

CT and KEB were funded under the Los Alamos National Laboratory Lab Directed Research and Development (LDRD) Early Career Research Program (20180621ECR) for components of the work. Additionally, we acknowledge the initial support under LDRD DR (20150397DR), which allowed KEB to develop the VIC modeling work the analysis is based on (completed under Los Alamos Director's Postdoctoral Fellowship, 20160654PRD). We acknowledge the support of Dr. Feng Yu in early iterations of this work.

Data Availability Statement

Downscaled CMIP5 climate model projections may be downloaded via the MACA web portal: https://climate.northwestknowledge.net/MACA/data_portal.php (accessed on 30 March 2022; Abatzoglou & Brown, 2012; Taylor et al., 2012). The VIC model may be downloaded via GitHub: https://github.com/UW-Hydro/VIC and https://doi.org/10.5281/zenodo.5781377 (accessed on 30 March 2022; Hamman et al., 2021). Historical VIC forcing data may be obtained from ftp://gdo-dcp.ucllnl.org/pub/dcp/archive/OBS/livneh2014.1_16deg/ (accessed on 20 October 2020; Livneh et al., 2015). Naturalized streamflow data for the Colorado River basin may be obtained from USBR: https://www.usbr.gov/lc/region/g4000/NaturalFlow/current.html (accessed on 30 March 2022; U.S. Bureau of Reclamation, 2012). Extreme event indicators used in this work may be downloaded from https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1862040 (accessed on 9 May 2022; Bennett et al., 2022). Other VIC model outputs and VIC model parameter files for the Colorado River basin may be obtained by contacting the authors. The applied unsupervised machine learning method based on non-negative matrix factorization (NMFk) is open source and part of a general AI/ML framework called SmartTensors. The source code, documentation, examples, and results from other ML studies are available at https://github.com/SmartTensors (accessed on 30 March 2022; Vesselinov et al., 2019).