Volume 58, Issue 11 e2021WR031753
Research Article
Open Access

Community Workflows to Advance Reproducibility in Hydrologic Modeling: Separating Model-Agnostic and Model-Specific Configuration Steps in Applications of Large-Domain Hydrologic Models

W. J. M. Knoben (Corresponding Author)
Centre for Hydrology, University of Saskatchewan, Canmore, AB, Canada
Correspondence to: W. J. M. Knoben, [email protected]

M. P. Clark
Centre for Hydrology, University of Saskatchewan, Canmore, AB, Canada
Department of Geography and Planning, University of Saskatchewan, Saskatoon, SK, Canada

J. Bales
Consortium of Universities for the Advancement of Hydrologic Science, Inc., Cambridge, MA, USA

A. Bennett
Hydrology & Atmospheric Sciences, University of Arizona, Tucson, AZ, USA

S. Gharari
Centre for Hydrology, University of Saskatchewan, Saskatoon, SK, Canada

C. B. Marsh
Centre for Hydrology, University of Saskatchewan, Saskatoon, SK, Canada

B. Nijssen
Civil and Environmental Engineering, University of Washington, Seattle, WA, USA

A. Pietroniro
Schulich School of Engineering, Department of Civil Engineering, University of Calgary, Calgary, AB, Canada

R. J. Spiteri
Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada

G. Tang
Centre for Hydrology, University of Saskatchewan, Canmore, AB, Canada

D. G. Tarboton
Utah Water Research Laboratory, Utah State University, Logan, UT, USA

A. W. Wood
National Center for Atmospheric Research, Boulder, CO, USA

First published: 28 October 2022

Abstract

Despite the proliferation of computer-based research on hydrology and water resources, such research is typically poorly reproducible. Published studies have low reproducibility due to incomplete availability of data and computer code, and a lack of documentation of workflow processes. This leads to a lack of transparency and efficiency because existing code can neither be quality controlled nor reused. Given the commonalities between existing process-based hydrologic models in terms of their required input data and preprocessing steps, open sharing of code can lead to large efficiency gains for the modeling community. Here, we present a model configuration workflow that provides full reproducibility of the resulting model instantiations in a way that separates the model-agnostic preprocessing of specific data sets from the model-specific requirements that models impose on their input files. We use this workflow to create large-domain (global and continental) and local configurations of the Structure for Unifying Multiple Modeling Alternatives (SUMMA) hydrologic model connected to the mizuRoute routing model. These examples show how a relatively complex model setup over a large domain can be organized in a reproducible and structured way that has the potential to accelerate advances in hydrologic modeling for the community as a whole. We provide a tentative blueprint of how community modeling initiatives can be built on top of workflows such as this. We term our workflow the “Community Workflows to Advance Reproducibility in Hydrologic Modeling” (CWARHM; pronounced “swarm”).

Key Points

  • Reproducible, transparent modeling increases confidence in model simulations and requires careful tracking of all model configuration steps

  • We show an example of model configuration code applied globally that is traced and shared through a version control system

  • Standardizing file formats and sharing of code can increase efficiency and reproducibility of modeling studies

1 Introduction

Confidence in published findings depends on the reproducibility of the experiments and analyses that support these findings. In computational Earth System sciences research, reproducibility requires knowledge of the computer code and data that underpin a given manuscript. Such computer code can range from a few lines of code that are used to turn data into figures or compute certain statistical properties of the data to modern process-based hydrologic models that can contain many thousands of lines of code. Despite encouraging progress in journal policies (Blöschl et al., 2014; Clark, Luce, et al., 2021), it is still difficult to reproduce published findings in the hydrologic sciences (Hutton et al., 2016; Stagge et al., 2019). Stagge et al. (2019) estimate that results may only be reproducible for between 0.6% and 6.8% of nearly 2000 peer-reviewed manuscripts published in six hydrology and water resources journals, due to a lack of sufficiently clearly described methods and a lack of the necessary input data or processing code.

In complex process-based hydrologic model applications, one additional barrier to reproducibility is the effort required to configure the model. It is not uncommon to hear claims that in such modeling studies 80% of overall effort is spent on configuring the model for a specific use case, and only 20% of overall effort is spent on using the model to answer research questions (e.g., Table 2.8 in Miles, 2014). Model configuration efforts are spent on assembling appropriate data sources for meteorological forcing data and geospatial parameter fields, wrangling these data into the specific format required by the model, defining appropriate model settings, and specifying the required computational infrastructure (e.g., finding the right collection of software libraries, installing or compiling the model, and creating the required scripts to run the model). Additional time costs arise from dealing with the subjectivity in defining appropriate computational subdomains (such as where to draw the boundaries for Hydrologic Response Units (HRUs; Flügel, 1995)), interpreting soil and land cover maps, aggregating geospatial data into some form of representative value for a computational unit, and the associated iterative model configuration and testing steps. This model configuration process is typically poorly documented and extremely time-consuming. In short, the reproducibility problem for process-based hydrologic modeling occurs in part because of the lack of efficiency in model configuration tasks.

Reproducibility of computational science can be improved by following certain recommended best practices for open, accessible, and reproducible science (e.g., Gil et al., 2016; Hutton et al., 2016; Sandve et al., 2013; Stodden & Miguez, 2013). Most focus is currently on advancing the Findable, Accessible, Interoperable, Reusable (FAIR) principles (Wilkinson et al., 2016). Reproducibility requires FAIR data but also includes sharing details about hardware, software versions, and data versions (Añel, 2017; Bast, 2019; Hut et al., 2017; Sandve et al., 2013). The environmental modeling community is interacting with these prescribed best practices in multiple ways. Choi et al. (2021) identify three ongoing main thrusts aimed at making computational environmental science more open, reusable, and reproducible. First, data and models are increasingly openly available online through services such as GitHub, HydroShare, and institutional repositories. Second, computational environments are increasingly recorded and standardized through container applications (e.g., Docker and Singularity) or in self-documenting notebooks. Third, Application Programming Interfaces (APIs) such as the pySUMMA API (Choi, 2020; Choi et al., 2021) make it increasingly easy to interact with complex models or data. In practice, however, most progress in FAIR science is arguably on Accessibility, whereas the other aspects of FAIR have received less attention.

A key issue is that little attention is devoted to efficient reproducibility of the full modeling workflow, which includes data acquisition, data preprocessing, model installation, model runs, and postprocessing of simulations. Efficiency is promoted in a general sense through freely shared code and packages that perform specific tasks in the modeling chain (e.g., see Slater et al., 2019, for an overview of R packages that can be used to populate a modeling workflow), and with model-specific tools such as VIC-ASSIST (Wi et al., 2017). Dedicated efforts to ensure end-to-end reproducibility of modeling studies are less common. Exceptions are Leonard and Duffy (2013, 2014, 2016), who provide an in-depth description of a web-based interface for data preprocessing and visualization of simulations from the PIHM model, geographically constrained to the United States; Havens et al. (2019, 2020), who provide an end-to-end workflow for setting up, running, and analyzing a physics-based snow model; Vorobevskii et al. (2020) and Vorobevskii (2022), who develop an R package that sets up a simple hydrologic model anywhere on the planet for a given domain discretization shapefile provided by the user; and Coon and Shuai (2022), who provide a Python-based tool to configure watershed models across the United States. Compared to sharing a model's input and output data (which would also enable a study to be reproduced), sharing complete workflows can be more efficient in terms of required storage space. A workflow also provides a transparent record of all modeling decisions and enables a more broadly defined form of reproducibility in which a study can be repeated for a different region, a different data set, or a different version of the same model to see if the original conclusions still hold.

The examples mentioned in the previous paragraph show that it is possible to document workflows for a specific model (or, perhaps more accurately, for a specific version of a model). A further challenge is to design workflows in such a way that parts of a workflow that configures Model A can be reused in a workflow that configures Model B. We refer to such a design as separating the model-agnostic and model-specific parts of model configuration (see also Miles, 2014; Miles & Band, 2015, for an example of this concept using EcoHydroLib for general data preprocessing and RHESSysWorkflows for creating model-specific input files applied to small watersheds across the CONUS and Australia). In the case of process-based hydrologic modeling, models such as VIC (Hamman et al., 2018; Liang et al., 1994), MESH (Pietroniro et al., 2007), SUMMA (Clark et al., 2015a, 2015b; Clark, Zolfaghari, et al., 2021), and SVS (Husain et al., 2016) can be different in how they discretize the modeling domain, the physical processes they include, and the equations used to describe a given process. However, at their core, these models are designed to solve the same general water and energy conservation equations (Clark, Zolfaghari, et al., 2021).

Consequently, the data requirements for a myriad of extant hydrologic models will vary in the specifics but are similar in a general sense. In particular, process-based hydrologic models have similar needs for meteorological forcing data and geospatial parameter fields. Preprocessing of these similar data requirements does not need to rely on specifics of the models themselves. For example, in the case of satellite-based MODIS land cover data, model-agnostic steps are (a) downloading the source data, (b) stitching the source data together into a coherent global map, (c) projecting this map into the Coordinate Reference System of interest, (d) subsetting from the global data only the domain of interest, and (e) mapping the resulting data in pixels onto model elements. Model-specific steps would be to convert the resulting information (i.e., which pixels/land classes are present per model element) to the specific format a model requires (e.g., storing the most common land class per model element as a value in a netCDF file which the model reads during initialization), and, if necessary, perform some form of data transformation to connect land class data to model parameter values or settings (e.g., by defining a lookup table that contains parameter values for each land cover type). Community-wide efficiency gains are possible if workflows distinguish between model-agnostic and model-specific steps and enable straightforward reuse of the workflow for model-agnostic steps (see also Essawy et al., 2016; Gichamo et al., 2020, who make this argument in the context of web-based model configuration tools).
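
To make this concrete, a minimal sketch of steps (b) through (d) using the GDAL Python bindings could look as follows; the tile names, output names, and bounding box are illustrative placeholders rather than the exact calls used in our workflow.

    from osgeo import gdal

    # (b) stitch individual MODIS tiles into a single virtual mosaic (tile names are placeholders)
    gdal.BuildVRT('modis_mosaic.vrt', ['modis_tile_h10v03.tif', 'modis_tile_h11v03.tif'])

    # (c) project the mosaic into the Coordinate Reference System of interest (here regular lat/lon)
    gdal.Warp('modis_global.tif', 'modis_mosaic.vrt', dstSRS='EPSG:4326')

    # (d) subset the domain of interest; projWin order is [ulx, uly, lrx, lry]
    gdal.Translate('modis_domain.tif', 'modis_global.tif', projWin=[-116.5, 51.7, -115.5, 50.9])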

The previous discussion leads us to conclude that the hydrologic modeling community can substantially improve how it shares model configuration code across modeling groups. The key issue is that model physics code is increasingly distributed under open-source licenses but the code that creates the necessary model inputs is typically neither well-documented nor available without contacting the model developers. To move toward a culture of community Earth System modeling, we define three distinct steps:
  1. For a given model, model configuration code should be publicly available and divided into model-agnostic and model-specific steps;

  2. The configuration workflows of multiple different models, ideally using different data sets, should be integrated into a proof-of-concept of a generalized model configuration workflow;

  3. A community-wide collaborative effort should refine the proof-of-concept into a flexible model configuration framework.

The purpose of this paper is to introduce an open-source model configuration workflow that enables full reproducibility of a process-based hydrologic model setup for any location on the planet, with the workflow code divided into model-agnostic and model-specific parts. In other words, we perform the first of the three steps outlined above. This advances our immediate goal of using this model configuration for a variety of projects by reducing the time commitment needed to create model configurations for different domains and by increasing confidence in the modeling outcomes due to increased transparency and the possibility to reproduce results. Our broader goal is to foster a community modeling culture within the Earth System sciences.

The workflow described in this manuscript contributes to this goal in two separate ways. First, our code is openly accessible and therefore reusable by others who wish to use all or part of it for their own experiments. Second, the documented lack of reproducible hydrologic science (e.g., Stagge et al., 2019) suggests that there are barriers within the hydrologic community to adopting more reproducible science. By providing a full example of how a reproducible modeling study can be designed, we intend to lower at least some of these barriers. A model-agnostic workflow approach, as proposed here, would also conform directly to ISO 9001 requirements for quality assurance and quality control systems for software development, as the World Meteorological Organization (WMO) describes in its guidance to WMO members on implementing a quality management system for national meteorological and hydrological services (World Meteorological Organization, 2017).

The remainder of this paper is organized as follows. In Section 2, we outline several high-level design considerations for reproducible modeling workflows and describe how we implemented these principles in an example of such a workflow. The example workflow uses open-source input data with global coverage, an open-source, spatially distributed, physics-based hydrologic modeling framework (SUMMA; Clark et al., 2015a, 2015b; Clark, Zolfaghari, et al., 2021), and an open-source network routing model (mizuRoute; Mizukami et al., 2016, 2021) to generate hydrologic simulations across multiple spatial scales. Technical details about the models and a step-by-step description of the workflow code are given in Appendix A. In Section 3, we present three test cases, covering large-domain (global and continental) and local-scale model configurations, to show that a single workflow can be used to configure experiments that vary in terms of spatial and temporal resolution and coverage. In Section 4, we reflect on the current state of reproducibility in large-domain hydrologic modeling, with particular focus on why existing efforts have seen only limited uptake, and outline a path forward.

2 Increasing Efficiency and Reproducibility in Earth System Modeling

2.1 Workflow Design Considerations

The reproducibility of modeling studies can be improved through openly published workflows that track all decisions made during model configuration. We propose four general guidelines for such model configuration workflows in the Earth System sciences. These guidelines are informed by existing efforts to promote reproducibility and efficiency in large-domain modeling efforts and by our own experience with creating such large-domain model configurations for process-based hydrologic models. We consider challenges for novice and advanced modelers. Briefly, our recommendations are as follows:
  1. Separate model-agnostic and model-specific tasks. The steps in the workflow must remain model-agnostic for as much of the workflow as possible and provide outputs in standardized, commonly used data formats. This increases the potential utility of the code base for use in different projects and for users of different models.

  2. Clarity for modelers. The workflow must be easily accessible and usable in its default form. A clear structure of the code, accompanied by accurate documentation and in-line comments, increases the ease of use for novice and advanced modelers alike.

  3. Modularity encourages use beyond the original application. Customization of the workflow must be possible and easy. This makes it possible to adapt, improve, or change specific parts of the workflow to access new data sets, use new processing algorithms, or target different models.

  4. Traceability is key. Every outcome of each step in the workflow must be accompanied by metadata that describe the configuration code that generated the outcome. This guarantees that, even if changes are made to the model configuration code, any workflow outcome can still be traced back to its original settings.

In Section 2.2, we discuss an example of a model configuration workflow based on these design considerations; in Section 2.3, we describe how each of the four points outlined above is implemented in that example.

2.2 An Example Workflow for Large-Domain Hydrologic Modeling

2.2.1 Workflow Description

Based on the design considerations listed in Section 2.1, we created a model configuration workflow for the Structure for Unifying Multiple Modeling Alternatives (SUMMA; Clark et al., 2015a, 2015b; Clark, Zolfaghari, et al., 2021) hydrologic model and the mizuRoute routing model (Mizukami et al., 2016, 2021). Briefly, SUMMA is a process-based, spatially distributed hydrologic model that can be used to simulate the water and energy balance for given locations in space. mizuRoute is a vector-based routing model that can be used to route runoff from a hydrologic or land surface model through a river network. Detailed descriptions of both models can be found in Appendix A1. We selected both models for their flexible nature, computational capacity to model very large domains, and availability of local expertise. Implementing configuration code for specific models (i.e., SUMMA and mizuRoute) in a generalized workflow, as we describe in this paper, is the first step on a possible path toward a community modeling culture that we outline in the Introduction.

Figure 1 provides a high-level overview of our workflow in five key steps:
  1. Workflow preparation, where workflow settings are defined and the necessary folder structures are generated;

  2. Model-agnostic preprocessing, accomplishing data preparation steps that do not rely on any characteristics of the models being used. Data resulting from this step can thus be used for multiple different models;

  3. Remapping of prepared data onto model elements. This step is listed as optional because not all models need this step;

  4. Model-specific preprocessing to create model input files based on the prepared data sources and generate model simulations;

  5. Analysis and visualization to summarize model simulations into statistics and figures.

Figure 1. High-level overview of a workflow that separates model-agnostic and model-specific tasks. Model-agnostic tasks are shown in blue and model-specific tasks are shown in orange and red. A similarly high-level but more technical flowchart of such a workflow, using SUMMA (a process-based hydrologic model) and mizuRoute (a routing model) as example models, can be found in Figure A2. Technical details of our implementation of model-agnostic and model-specific processing steps can be found in Figures A3 and A4, respectively.

Progressively more detailed overviews of model-agnostic and model-specific tasks can be found in Appendix A2 and Figures A2–A4. Despite the seemingly large number of model-specific tasks in those figures, the time costs (in terms of code development) are larger for the model-agnostic tasks. The design considerations presented in this section and our implementation of them as described in Section 2.3 are comparable to existing efforts in the field of ecohydrology involving the EcoHydroLib, RHESSysWorkflows, and HydroTerre tools (Choi, 2021; Leonard et al., 2019; Miles, 2014; Miles & Band, 2015), suggesting that this is a logical way to organize modeling workflows.

2.2.2 Workflow Scope

The workflow scope deliberately excludes spatial discretization and parameter estimation (Figure 2). Our workflow implementation assumes that the user has access to a basin discretization stored as an ESRI shapefile that defines the area of interest as discrete modeling elements (e.g., grid cells and subbasins). Such a discretization may be derived from digital elevation models (see e.g., TauDEM or the geospatialtools code base; Chaney & Fisher, 2021; Sazib, 2016; Tesfa et al., 2011), or obtained from existing basin discretization products, such as HydroBASINS (Lehner & Grill, 2013) or the MERIT Hydro basin delineation (Lin et al., 2019). Moreover, the workflow does not currently include fine-tuning of model parameter values through calibration or estimation from auxiliary data sources. These calibration methods require selecting from a wide variety of calibration algorithms, each with their own strengths and weaknesses (e.g., Arsenault et al., 2014), and an even wider variety of objective functions that express the (mis)match between a model's simulations and observations of hydrologic states and fluxes (e.g., Clark, Vogel, et al., 2021; Gupta et al., 2008; McMillan, 2021; Mizukami et al., 2019; Murphy, 1988; Nash & Sutcliffe, 1970; Olden & Poff, 2003; Pushpalatha et al., 2012), as well as a variety of further choices related to spatial scaling (e.g., Samaniego et al., 2010), regionalization (e.g., Bock et al., 2015), and regularization of the calibration problem (e.g., Doherty & Skahill, 2006). These model calibration choices are not easily standardized and require auxiliary data in the form of observations that are not readily available globally. The modular nature of our workflow implementation allows methods for basin discretization and parameter estimation to be integrated easily into our existing code base, but we leave this for future work to keep the scope of this first workflow example manageable.

Figure 2. Schematic overview of a typical modeling workflow, with the scope of the example workflow described in this paper shown by the colored box. Dashed lines indicate potential connections between elements (such as geospatial parameter fields informing basin discretization and parameter calibration feeding back into the model setup step where parameters for a new run are defined) that are not yet included as part of our workflow.

2.2.3 Workflow Execution

We present this workflow as a collection of Bash and Python scripts, stored inside a folder structure that clearly indicates the appropriate order in which the scripts should be executed (see Section 4.2.3 for a discussion of the choice to use scripts instead of other options). The latest version of the workflow is available through GitHub: https://github.com/CH-Earth/CWARHM. The GitHub repository also contains further documentation that helps a user set up the required computational environment and provides succinct explanations of the purpose of various scripts, decisions, and assumptions in cases where such explanations are necessary. Last, the repository contains the basin discretization used for our third test case that divides the upper part of the Bow River basin (Alberta, Canada) into discrete modeling elements, so that users have immediate access to all the materials needed to implement our workflow.

2.3 Implementation of Workflow Design Recommendations

2.3.1 Separate Model-Agnostic and Model-Specific Tasks

Our first design principle recommends separating model-agnostic and model-specific tasks. Model-agnostic tasks (shown in blue in Figure 1; light gray in Figures A2 and A3) are those tasks that are the same regardless of the model being used, under the assumption that the model requires a given data input at all. In our workflow implementation, these tasks include downloading meteorological forcing data and geospatial parameter fields (i.e., a digital elevation model [DEM], soil classes, and vegetation classes), in some cases clipping the raw data sets to the domain of interest, and mapping these data onto model elements such as grid cells or catchments. Fully model-agnostic outputs in this example are netCDF (.nc) files of meteorological forcing data (i.e., gridded hourly data at 0.25° latitude/longitude resolution) and GeoTIFF (.tif) files of various geospatial parameter fields.
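
As a small illustration of one such model-agnostic step, the sketch below subsets a hypothetical ERA5 forcing file to the domain bounding box listed in Table 1; the file and coordinate names are assumptions, and the latitude slice runs from north to south because ERA5 stores latitudes in descending order.

    import xarray as xr

    # open a (hypothetical) ERA5 forcing file and subset it to the domain bounding box;
    # the latitude slice runs from lat_max to lat_min because ERA5 latitudes are descending
    ds = xr.open_dataset('era5_forcing_197901.nc')
    subset = ds.sel(latitude=slice(51.7, 50.9), longitude=slice(-116.5, -115.5))
    subset.to_netcdf('era5_forcing_197901_BowAtBanff.nc')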

Model-specific tasks (shown in orange and red in Figure 1; dark gray in Figures A2 and A4) involve installing the chosen models, transforming the preprocessed data into the specific format the model requires, and running the models. In our workflow implementation this involves finding the mean elevation, mode land class, and mode soil class per model element and exporting certain information about the modeling elements (area, latitude and longitude location, slope of the river network, etc.) into the netCDF files our models expect.
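
As a sketch of one such model-specific step, the snippet below derives the mode land class per model element from a hypothetical intersection table and writes it to a netCDF file; the file, column, and variable names are illustrative and not necessarily those used by SUMMA or the workflow.

    import pandas as pd
    import xarray as xr

    # hypothetical intersection result: one row per (HRU, land class) pair with its areal fraction
    ix = pd.read_csv('intersect_modis_catchment.csv')   # columns: hru_id, land_class, frac

    # keep the land class with the largest areal fraction (i.e., the mode) for each model element
    mode_class = (ix.sort_values('frac', ascending=False)
                    .groupby('hru_id')
                    .first()['land_class'])

    # store the per-element attribute in a netCDF file for the model to read during initialization
    xr.Dataset({'landClassIndex': ('hru', mode_class.values)},
               coords={'hruId': ('hru', mode_class.index.values)}).to_netcdf('attributes_landclass.nc')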

Due to the complex nature of existing models and their long histories of development, certain tasks cannot be cleanly separated into model-agnostic and model-specific tasks. The mapping of prepared forcing data and geospatial parameter fields onto model elements (shown in dark blue in Figure 1; intermediate gray shade in Figures A2 and A3) is an example of such a task. Certain models run at the same spatial resolution as the forcing and/or geospatial data grid or are able to ingest gridded data in their native alignment and internally map these onto the required model discretization. In our case, this remapping must be done outside the models. In the case of forcing data, the model-agnostic output of meteorological forcing files is mapped onto the model elements (catchments in this case), resulting in catchment-averaged model forcing. Temperature time series are further modified with catchment-specific lapse rates to account for elevation differences between the forcing grid and model elements. In the case of parameter fields, intersections between the model-agnostic GeoTIFF files and the shapefile of the modeling domain are generated. These intersections show how often each elevation level, soil class, and land class occurs in each model element. These processes cannot be called truly model-agnostic, because some models do not require them, but neither are they fully model-specific. To ensure maximum usability for different models, workflows must therefore be as modular as possible so that modelers can mix and match from available code to suit the particularities of their chosen model (i.e., our third design principle, described later).
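
The temperature lapsing mentioned above can be sketched as follows; the lapse rate and the per-HRU values are placeholders for illustration only.

    import numpy as np

    lapse_rate = -0.0065  # assumed environmental lapse rate in K per m (i.e., -6.5 K per km)

    # hypothetical values for three HRUs
    T_mapped = np.array([271.2, 270.8, 269.9])   # catchment-averaged air temperature (K)
    elev_hru = np.array([1450., 1900., 2400.])   # mean HRU elevation (m)
    elev_grid = np.array([1600., 1600., 1600.])  # area-weighted elevation of contributing forcing cells (m)

    # shift the areally averaged temperature from the forcing-grid elevation to the HRU elevation
    T_hru = T_mapped + lapse_rate * (elev_hru - elev_grid)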

2.3.2 General Layout and Workflow Control

Our second design principle prescribes an intuitive interface for hydrologic modelers. We recognize two elements here: first, the code and data structure must be clear and easy to understand. Second, interacting with the workflow must be straightforward. Our example implementation strives to achieve both of these goals through a clean separation of code and data and the use of a single configuration file (hereafter referred to as a “control file”) that outlines high-level workflow decisions such as file paths, spatial and temporal extent of the experiment, and details about the shapefiles that contain the domain discretization. Using configuration or control files is common practice in software design applications (see e.g., Sen Gupta et al., 2015) and avoids the need to introduce hardcoded elements such as file paths and variable values in the code itself.

In a typical application of our example workflow, the user first creates a local copy of the code provided on our GitHub repository. We refer to this local code as the “code directory.” The user then specifies in the control file the path where workflow data (such as forcing and parameter data downloads, model input files, and model simulations) will be stored. The workflow is set up to read this path from the control file, create the specified folder structure, and store all data for a given modeling domain in the user-specified data folder (referred to as the “data directory”). This allows a clean separation between the workflow code itself and the data downloaded and preprocessed by the workflow code (Figure 3). The workflow's default settings ensure that the data directory is populated with folders and subfolders with descriptive names, making navigation of the generated data clear.

Figure 3. Example of separated code and data directories. The code directory (left) contains the scripts as available on the repository's GitHub page. The data directory (right) contains the forcing data, parameter data, setting files, shapefiles, and model simulations that are used and generated by the workflow code.

Table 1 contains a subset of the information that is stored in the control file that defines the workflow settings for a model configuration for the Bow River at Banff, Canada (see Section 3 for a description of this test case). The control file contains the high-level information needed by the workflow, such as the name of the user's shapefiles, the names of required attributes in each shapefile, the spatial extent of the modeling domain, the years for which forcing data should be downloaded, and file paths and names for all required data. The workflow scripts read information from the control file as needed. Keeping all information in one place enables a user to quickly generate model configurations for multiple domains, without needing to scour all individual scripts for hardcoded file paths, domain extents, etc. For example, changing the simulation period for a given domain requires changing two values in the control file, after which selected code can be rerun to download and preprocess the necessary forcing data and run new simulations. To configure our chosen models for a new domain (assuming that no changes to the model or desired data sets are introduced), a user only needs to provide a new domain discretization file and update in the control file the name of the domain (so that a new data folder can be generated), the names of the discretization files, and the bounding box of the new domain. The workflow can then be fully rerun to create a model configuration for the new domain, without any changes being made to the workflow scripts themselves.

Table 1. Example of Part of a Workflow Control File, Showing Settings for the Bow at Banff Test Case

Setting | Value | Description

Modeling domain settings
root_path | /user/CWARHM_data | Root folder where data will be stored
domain_name | BowAtBanff | Used as part of the root folder name for the prepared data

Settings of user-provided catchment shapefile
catchment_shp_path | default | If “default,” uses “root_path/domain_[name]/shapefiles/catchment”
catchment_shp_name | bow_dist_elev_zone.shp | Name of the catchment shapefile. Requires extension “.shp”
catchment_shp_gruid | GRU_ID | Name of the GRU ID column (can be any numeric value; HRUs within a single GRU have the same GRU ID)
catchment_shp_hruid | HRU_ID | Name of the HRU ID column (consecutive from 1 to the total number of HRUs; must be unique)
catchment_shp_area | HRU_area | Name of the catchment area column. Area must be in units of m2
catchment_shp_lat | center_lat | Name of the latitude column. Should be a value representative of the HRU
catchment_shp_lon | center_lon | Name of the longitude column. Should be a value representative of the HRU

Forcing settings
forcing_raw_time | 2008, 2013 | Years to download: Jan-[from], Dec-[to]
forcing_raw_space | 51.7/-116.5/50.9/-115.5 | Bounding box of the shapefile: lat_max/lon_min/lat_min/lon_max. Converted to ERA5 download coordinates in the script. Order and use of “/” to separate values is mandatory
forcing_time_step_size | 3600 | Size of the forcing time step in seconds (s)
  • Note. The actual control file (see Section 3) is available on the GitHub repository—see the section “Data Availability” at the end of this manuscript. These control files are simple text files containing three columns. The “Setting” column contains specific strings that each script in the repository looks for to identify which line in the control file contains the information the script needs. The “Value” column contains the actual information, such as file paths, names of shapefiles, shapefile attributes, etc. Descriptions of each field are included for the user's benefit but are not used by the setup scripts. The benefit of collecting all information and settings in a single file is that it avoids hard-coding this information in the workflow itself, making it straightforward to apply the same workflow to a new experiment by simply updating the control file.
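
Given this layout, reading a value from the control file can be as simple as the following sketch; this is an illustration of the approach rather than the exact function used in the repository (values that contain whitespace would need a delimiter-aware parser).

    def read_from_control(control_file, setting):
        """Return the value associated with 'setting', assuming lines of the
        form 'setting  value  description' as shown in Table 1."""
        with open(control_file) as cf:
            for line in cf:
                if line.strip().startswith(setting):
                    return line.split(None, 2)[1]  # second whitespace-delimited token is the value
        return None

    # example: find where workflow data should be stored
    root_path = read_from_control('control_BowAtBanff.txt', 'root_path')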

2.3.3 Flexibility at Each Step of Model Setup

Our third design principle recognizes that process-based models are complex entities and that the setup procedures for any given model are model- or even experiment-specific. Not all models will need to go through the same configuration steps nor will every model experiment need the settings as defined in our example workflow. Our example workflow (Figure 1; details in Appendix A2) therefore aims to encourage adaptation beyond our original application through modularity and documentation.

First, we have chosen to present the workflow as a collection of scripts (i.e., the workflow code is stored in simple text files that can be executed from the command line) rather than a Python package, R library, executable module or similar, so that the user has straightforward access to the workflow code. This presentation simplifies adapting the code to different models or experiments by lowering the skill threshold needed to make adaptations to our code base and is likely closer to the ways in which model configuration is currently often done. Second, the workflow separates model setup into numerous small tasks (see Figures A2–A4) and saves all intermediate results to files. This modularity makes it straightforward to branch out from our chosen defaults at any given step in the modeling workflow. Third, for this iteration of our workflow, we have chosen to move high-level decisions into the control file and leave various modeling decisions as assumptions in the workflow scripts. We have spent considerable effort on documenting any such assumptions (see Appendix A2) to let advanced users make targeted changes to the workflow code. Examples of these decisions include the number of soil layers used across the modeling domain, values for the initial model states, and default routing parameters. In future versions of our workflow, such decisions may be moved to a dedicated experiment-control file.

2.3.4 Code Provenance

Our fourth design principle relates to traceability. The decision to separate code and data directories potentially introduces a disconnect between code and data, and situations may arise where it is no longer clear which version of a given piece of code generated a particular piece of data. This can happen in cases where the workflow code is updated after having already been used to create (part of) a model configuration. Although the changes to the workflow code can be tracked through version control systems such as Git, it is much more difficult to trace which version of the code generated the data. Every script in our example workflow therefore places both a log file and a copy of its code in the data subdirectory on which it operates. This ensures that, even if a user makes changes to the code directory, a record exists in the data directory of the specific code used to generate the files in that data directory. Copies of the model settings are stored in their simulation data directories by default so that simulation provenance can be traced as well.
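
A minimal sketch of this provenance mechanism, with hypothetical folder and file names, is shown below; the workflow scripts apply the same idea at each processing step.

    import datetime
    import pathlib
    import shutil

    def record_provenance(script, data_dir, note):
        """Copy the executing script into the data directory it populated and append a log entry."""
        log_dir = pathlib.Path(data_dir) / '_workflow_log'
        log_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(script, log_dir)  # copy of the exact code that generated the data
        stamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        with open(log_dir / 'log.txt', 'a') as f:
            f.write(f"{stamp} - {pathlib.Path(script).name} - {note}\n")

    # example call at the end of a preprocessing script
    record_provenance(__file__, '/user/CWARHM_data/domain_BowAtBanff/forcing', 'Subset ERA5 forcing to domain')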

3 Test Cases

The test cases described in this section use the SUMMA and mizuRoute models. We refer the reader to Appendix A1 for details about both models and definitions of certain model-specific terms, such as Grouped Response Units (GRUs) and Hydrologic Response Units (HRUs). For all test cases, meteorological input data are obtained from the ERA5 data set (Hersbach et al., 2020), elevation data are obtained from the MERIT Hydro data set (Yamazaki et al., 2019), land use data are obtained from the MODIS MCD12Q1 data set (Friedl & Sulla-Menashe, 2019), and soil data are obtained from the Soilgrids 250m data set (Hengl et al., 2017). Detailed descriptions of the input data can be found in Appendix A2.

3.1 Global Model Configuration

This first test case simulates hydrologic processes across planet Earth to illustrate the large-domain applicability of our approach. The global domain (excluding Greenland and Antarctica) is divided into 2,939,385 subbasins or Grouped Response Units (GRUs; median GRU size is 36 km2; mean size is 45 km2) derived from the global MERIT basins data set (Lin et al., 2019). Simulations are run for a single month (1 January 1979 to 31 January 1979) at a 15-min temporal resolution. Figure 4 shows summary statistics of several simulated variables. By design, we ran these simulations without a model spin-up period so that we could confirm that the models function even in regions where, under typical conditions after spin-up, we would not expect to see much hydrologic activity (e.g., extremely water-limited regions). The value of this test case is to demonstrate that the workflow is applicable anywhere on the planet and that the size of the model domain does not present an insurmountable barrier to open and reproducible hydrologic modeling. The workflow documents every decision made during model configuration and enables repeatable simulations of this model domain with only a fraction of the original effort needed.

Figure 4. Overview of global simulations. SUMMA does not perform any computations for Grouped Response Units (GRUs) that are classified as being mostly open water, and mizuRoute was run without using its option to simulate routing through lakes and reservoirs. Lake delineations of lakes >100 km2 are obtained from the HydroLAKES data set (Messager et al., 2016) and used to mask open-water GRUs in this figure. Model setup uses default parameter values, and results are for illustrative purposes only. (a) Mean simulated total evapotranspiration, calculated as the sum of transpiration, canopy evaporation, and soil evaporation. Note that the color scale has been designed to show global variability and local variability in Oceania simultaneously. (b) Mean runoff, calculated as the sum of surface runoff, downward drainage from the soil column, and lateral flow from the soil column. (c) Mean streamflow as determined by mizuRoute's Impulse Response Function approach.

3.2 Continental Model Configuration

This second test case uses 40 years of hourly forcing data to simulate hydrologic processes over the North American continent and illustrates the combined large-domain and multidecadal applicability of our approach. The continental domain is divided into 517,315 subbasins (median GRU size is 33 km2; mean size is 40 km2) derived from the global MERIT basins data set (Lin et al., 2019). Simulations are run from 1 January 1979 to 31 December 2019, again at a 15-min temporal resolution. Figure 5 shows summary statistics of several simulated variables: as expected, snow accumulation tends to be higher in mountainous and higher latitude locations; total soil water values are lower in the arid regions of the central and western United States and Canada and northern Mexico; evapotranspiration rates fluctuate according to available energy (i.e., by latitude) and water; and large river networks are clearly visible as a result of accumulation of upstream river flow. These results are outputs from a model run with default process parametrizations and parameter values, and improvements to either or both will likely improve local model accuracy. However, the visible large-scale patterns appear hydrologically sensible and give us confidence that this initial model configuration is a solid basis for further model improvement and development. The modular nature of our workflow enables improvements to any single part of it without needing to change any other parts of the model configuration code, which contributes to increased efficiency in model improvement and use.

Figure 5. Overview of large-domain multidecadal simulations. SUMMA does not perform any computations for Grouped Response Units (GRUs) that are classified as being mostly open water, and mizuRoute was run without using its option to simulate routing through lakes and reservoirs. Lake delineations of lakes >1,000 km2 are obtained from the HydroLAKES data set (Messager et al., 2016) and used to mask open-water GRUs in this figure. Model setup uses default parameter values, and results are for illustrative purposes only. (a) Maximum simulated Snow Water Equivalent per GRU, capped at 1,000 kg m−2 for visualization purposes. (b) Mean simulated total soil water content, which includes both liquid and solid water in the soil profile. (c) Mean simulated evapotranspiration, defined as the sum of evaporation from the soil profile and the canopy, and transpiration by vegetation. (d) Mean streamflow as determined by mizuRoute's Impulse Response Function approach.

3.3 Local Model Configuration

The modeling domain in the global and continental test cases is discretized into subbasins (GRUs, in SUMMA terminology) of roughly equal area. SUMMA uses a flexible spatial discretization approach that allows GRUs to be subdivided into as many HRUs as the modeler thinks practical and relevant. These HRUs can be used, for example, to represent different elevation zones, differences in soil or land use, differences in topography, or a combination of several of these elements (see Appendix A1 for a more detailed explanation). As a more localized test case, we created a subset of the MERIT basins data set (Lin et al., 2019) that covers the Bow River from the Continental Divide to the town of Banff, Alberta, Canada. We then subdivided each MERIT subbasin (i.e., each GRU) into multiple HRUs based on 500 m elevation increments (Figure 6a), created a new control file for this new domain, and reran the workflow code. No changes were necessary to any of the workflow scripts, because the scripts obtain all the required information from the updated control file and the code is generalized to handle both the large-domain case, where GRUs are not subdivided into HRUs, and this local case, where HRUs are used. Note that this local test case could be for any basin on the planet.
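
A sketch of the elevation-band classification that underlies this subdivision, applied to DEM pixels within a single GRU, is given below; the DEM values are placeholders, whereas the actual workflow operates on the gridded MERIT Hydro DEM.

    import numpy as np

    band_size = 500.0                                           # elevation increment (m)
    dem_in_gru = np.array([1350., 1480., 1720., 2250., 2610.])  # hypothetical DEM values (m) in one GRU

    # integer band index per pixel: elevations of 1000-1500 m map to index 2,
    # 1500-2000 m to index 3, and so on; pixels sharing an index form one HRU
    band_index = np.floor(dem_in_gru / band_size).astype(int)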

Figure 6. Overview of local simulations. Model setup uses default parameter values, and results are for illustrative purposes only. (a) Mean Hydrologic Response Unit (HRU) elevation as derived from MERIT Hydro digital elevation model. (b) Mean of maximum Snow Water Equivalent (SWE) per water year shown for each HRU, and mean annual streamflow shown for each river segment. Only data from complete water years are included.

This third test case uses hourly forcing data from 1 January 2008 to 31 December 2013 (again run at a 15-min substep resolution). Temperature lapse rates are applied to the forcing data for each individual HRU, meaning that the hydrometeorological conditions are somewhat different in each HRU despite the forcing grid cells being relatively large compared to the delineated catchments (see Figure A7). Figure 6b shows that simulated Snow Water Equivalent (SWE) varies per HRU and accumulated streamflow varies per stream segment. These figures provide a rudimentary test of the generated model setup for a location for which we have clear expectations about how the simulations should look (see also a cautionary note on the use of global data products in Appendix B). As may be expected, more snow accumulates at higher elevations, whereas the valley bottoms have a lower snowpack due to warmer air temperatures but larger flows due to their larger accumulated upstream area. As with the global and continental simulations, this local test case is fully reproducible and all model configuration decisions are stored as part of the workflow. This local test case also shows that different model configurations (in terms of spatial discretization in GRUs and HRUs) can be generated by the same model-specific workflow code.

4 Discussion

4.1 To What Extent Does Our Workflow Fulfill Reproducibility Requirements?

Best practices for open and reusable computational science can be briefly summarized as follows (e.g., Gil et al., 2016; Hutton et al., 2016; Stodden & Miguez, 2013): data must be available and accessible, code and methods must be available and accessible, active development on issues with data, code, and methods must be possible, and licensing of data and code should be as permissive as possible. These requirements are formalized in the FAIR principles (Wilkinson et al., 2016) but by themselves are not enough to guarantee reproducibility of computational science (e.g., Añel, 2017; Bast, 2019; Hut et al., 2017). To be fully reproducible, details about hardware, software versions, and data versions also need to be recorded and shared (e.g., Choi et al., 2021; Chuah et al., 2020; Essawy et al., 2020). Such practices require a certain time investment, but the benefits are clear: the resulting science is more transparent and more easily reproduced, and follow-up work is more efficient because less time is spent on mundane tasks such as data preparation.

Sandve et al. (2013) outline 10 rules for reproducible computational science in the field of Computational Biology, and these are also applicable to Earth System modeling. Our workflow follows nine of these guiding principles:
  1. Our workflow stores copies of the scripts that generate data together with the data itself, which allows a researcher to track how a given result was produced;

  2. Our workflow contains no manual data manipulation: all changes to the data are done in scripts and can be traced;

  3. An exact version of all software used is tracked, partly as installable Python environments and partly on the workflow repository for command line utilities;

  4. All scripts are version controlled through Git;

  5. Our workflow is modular and stores intermediate results in individual folders to aid in debugging of setups and to allow easy diversion from our workflow;

  6. All data that may support analysis and figures are systematically stored in a logical folder structure;

  7. Our chosen model structure is flexible in prescribing outputs, removing a need to modify the model source code to display specific results;

  8. Our visualization code keeps a precise record of which results file contains the data that underpin a given figure and thus a record exists of which data support a given textual statement about the analysis;

  9. The workflow code is publicly accessible.

Their tenth principle, keeping an accurate record of the seeds that underpin any element of randomness in the analysis, does not apply here. Sandve et al. (2013) also recommend sharing access to simulation results. This can be done through repositories such as HydroShare or Pangaea but may be infeasible in the case of large-domain Earth System modeling. For example, storing all input and output data of our continental test case takes approximately 13 TB.

Internal tests on different hardware and by different researchers indicate that our workflow effectively implemented these principles for open and reproducible science in practice: the workflow can be used to generate identical model inputs and outputs by specifying exact library, package, and model versions. Some caveats apply, however. Although it is possible to trace model source code versions through Git commit IDs, such IDs can obviously not account for local code modifications that are not tracked through Git. Good “computational lab hygiene” is needed to ensure consistency between what is reported to have been done and what has in fact been done. Further, not all data sets that underlie our model setups have Digital Object Identifiers assigned to specific versions of the data set. Given the size of the data sets involved, sharing the data itself is infeasible, and some care must be taken to precisely track when data were downloaded as a means of making the use of data without DOIs traceable. Last, reproducibility is ensured through specifying exact versions of packages and libraries but many of these packages and libraries are undergoing rapid development and new versions are released frequently. There is a potential issue for reproducibility if older software versions for one reason or another are no longer available (though for fully open-source software this should theoretically not happen). New versions of specific software may, however, become incorporated into a new version of a workflow if they provide some needed functionality. To ensure backward compatibility, such new workflow versions must therefore also be assigned a new DOI so that any specific workflow version can be tracked and reused when needed.
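
As a small illustration of recording exact software versions alongside a model configuration, a script could write the versions of its key dependencies to a file that is stored with the generated data; the packages listed here are examples only and not the full environment used in our workflow.

    import sys
    import numpy
    import pandas
    import xarray

    # record interpreter and key package versions next to the generated model configuration
    with open('environment_versions.txt', 'w') as f:
        f.write(f"python {sys.version.split()[0]}\n")
        for pkg in (numpy, pandas, xarray):
            f.write(f"{pkg.__name__} {pkg.__version__}\n")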

4.2 Toward Community Modeling

4.2.1 Short-Term Benefits of Using Workflows

This paper introduces a modular model configuration workflow that separates model-agnostic and model-specific configuration steps. The two main benefits of approaching environmental modeling from this angle are clear: configuring multiple modeling experiments becomes much more efficient, and results are reproducible, because all model configuration decisions can be traced. These benefits address two problems that currently affect Earth System modeling. First, creating a typical model configuration is both difficult and time consuming, and it is possible that model configuration tasks do not receive the attention they deserve. Code may not be checked as thoroughly as may be necessary because bugs may not be readily apparent, and any time spent on model configuration is consequently not spent on writing journal articles or meeting report deadlines. Configuring models can be more efficient if model configuration code is freely and openly shared. This enables time that is currently spent on creating model configurations to instead be spent on in-depth analysis, improving the model representation of real-world processes, and fixing any bugs that may be found in the configuration code or the model source code. If bugs are found, tracing the experiments that are affected by these bugs is possible, and it will be clear which studies need to be corrected. Openly shared model configuration code therefore has the potential to increase the robustness of model simulations and accelerate advances in modeling capabilities. Second, by publishing workflow code alongside a manuscript, the provenance of scientific results remains traceable (see e.g., Hutton et al., 2016; Melsen et al., 2017). This can increase confidence in model results. It also enables more effective follow-up studies because all decisions that underpin the original study can be found in the public domain.

4.2.2 Long-Term Vision for Community Workflows

We see workflows such as the one presented in this paper as the first step toward a community-wide modeling framework. Figure 7 illustrates an example of such a framework using the workflow code presented in this paper as examples of each framework layer (see also Miles, 2014). In addition to a division between model-specific and model-agnostic tasks, we envision a framework that distinguishes between data-specific and data-agnostic preprocessing steps. Processing layers would be separated by standardization layers that prescribe the output format for the preceding processing layer and consequently the input format for the following processing layer (see also Miles, 2014, where the use of common file types is discussed as a recommended approach between the model-agnostic and model-specific parts of workflows). Community-wide agreement on the formats used in standardization layers will promote efficient interoperability of different data-specific processing modules, possibly as part of broader work on international hydrologic standards (e.g., HY_Features, Blodgett & Dornblut, 2018). Using our workflow as an example, we have created data-specific processing modules for ERA5 meteorological data, SOILGRIDS-derived soil classes, MODIS-derived land classes, and a MERIT Hydro-derived DEM. These modules generate data in standardized formats (in this case, netCDF4 forcing data and GeoTIFF spatial maps) that in turn feed into the data-agnostic remapping layer. This layer generates further model-agnostic data in netCDF4 and Shapefile formats that are then transformed into SUMMA's inputs through a model-specific processing layer.

Figure 7. Schematic overview of a generalized community modeling framework, populated with examples from our SUMMA setup configuration workflow. Key to this modular approach is community-wide agreement on the formats used in each model-agnostic standardization layer. Such standards enable a modular approach to model configuration, where existing modules can be seamlessly replaced, as long as they are designed to read and output data in the agreed-upon formats.

Our currently defined model-agnostic tasks are of course still implicitly SUMMA-centric (i.e., we have completed those tasks because they generate the data that SUMMA requires), though in principle the outputs can immediately be used by other models. The modular nature of our workflow makes adding new data sets and processing steps as straightforward as writing new data-specific and data-agnostic routines and inserting them in a further unchanged workflow. Changing to a different model requires writing a new model-specific interface layer, but existing data processing scripts can remain untouched (again, assuming that the new model has data needs that can be met by already existing data processing scripts). This means that the workflow can be tailored to a specific model or experiment in a fraction of the time needed to create the model configuration from scratch. The modified workflow can then be published alongside the new modeling results to keep those results traceable.

It is of course possible that our attempt to separate model tasks into model-agnostic and model-specific parts is not equally applicable to different models that are currently in use. In such cases, we hope that providing a tangible example of how model configuration code can be organized and shared in a structured way will nevertheless inspire others to create their own workflows. Modifying our workflow or adapting it for different purposes in such ways is the second step we anticipate as needed to move toward a community modeling paradigm. By creating new or modifying existing workflows for new experiments and models, the required structure of a generalized model setup workflow may become apparent. As a third and final step, this generalized workflow can be formalized into a community-driven modeling framework that enhances efficiency and transparency in Earth System modeling.

To initiate the process of creating a community-driven modeling framework, our workflow is available as open-source code: https://github.com/CH-Earth/CWARHM (last access: 27 July 2022). We have chosen a copyleft license (GNU GPLv3) that allows others to freely use and modify our code under the condition that the modified code base is published under the same license, with attribution of its source and a list of changes. We envision a gradual process in which our repository is modified by others (either piecemeal or by incorporating our entire code base in a new repository as, e.g., a Git submodule), increasingly more data-specific and model-specific processing capabilities are made public, and appropriate choices of standard file formats become apparent. If and how to integrate these different elements into a single modeling framework is a decision the community will need to make in due course.

4.2.3 Where Do Workflows Stand in the Existing Reproducibility Landscape in Hydrologic Modeling?

We approach the workflow problem from a catchment modeling perspective within the wider Earth System modeling community (see the definitions of different communities in Archfield et al., 2015). Calls for more efficient, transparent, and shareable model configuration approaches are not new in the catchment modeling community (see e.g., Blair et al., 2019; Famiglietti et al., 2011; Hutton et al., 2016; Tarboton et al., 2009; Weiler & Beven, 2015) and considerable progress along these lines has been made. For example, Sen Gupta et al. (2015) standardize model inputs and outputs to efficiently couple a snow accumulation and melt routine with an existing open-source modeling framework; Ecohydrolib (Miles, 2014; Miles & Band, 2015; Miles et al., 2022) is a Python API that automatically preprocesses ecohydrologic parameter fields and forms the basis of a model configuration workflow for the RHESSys model; Bandaragoda et al. (2019) develop a general interface for building and coupling multiple models, using the Landlab toolkit (Barnhart et al., 2020; Hobley et al., 2017); Gan et al. (2020) integrate a web-based hydrologic model service with a data sharing system to promote reproducible workflows; HydroDS (Dash & Tarboton, 2022; Gichamo et al., 2020) is a web-based service that can be used to prepare input data for modeling; Bennett et al. (2018, 2020) create a tool to estimate hourly forcing input for physics-based models from commonly available daily data; Bavay et al. (2022) describe a tool that can be used to effectively create a Graphical User Interface for a given model; Essawy et al. (2016) provide an example of how containerization (storing a full computational environment into a software container) enhances reproducibility; and Kurtzer et al. (2017, 2021) develop a means of saving and transferring software and computing environments on and between High Performance Computing clusters. Put together, most if not all elements for fully reproducible, easy-to-use computational hydrology already exist. So far, however, uptake of these tools is regrettably not widespread.

We speculate that uptake of existing tools is somewhat low for multiple reasons. First, these tools are typically provided as self-contained packages where some form of interface exists between the user and the source code. Such packages tend to be easy to use for their intended purpose but take time to understand and do not necessarily provide much flexibility to deviate from their intended purpose. Layering additional functions on top of an existing package or modifying a package's source code is certainly possible, but can be outside the comfort zone of many users. Second, several model-configuration tools are provided as web-based services. This can be appealing because, for example, data can be predownloaded to speed up model configuration and model simulations can be easily shared. The advantage of such approaches is that they can be combined with some form of server-side data transformations (e.g., subsetting or averaging), which minimizes data transfers. Storing the inputs for and outputs of large-domain simulations can, however, be cumbersome, and keeping predownloaded data up-to-date and sufficient for all user needs takes sustained, long-term effort. A further complication is that it is regrettably common that such web-based services require some form of manual interaction with the webpage, limiting opportunities to automate data acquisition tasks. Third, the lack of community agreement on standard data formats means that developers of new tools typically decide to have their tool output data in a format relevant to their own application, which may not be a format that is widely used by others. It is cumbersome for developers to have their tools ingest multiple different data formats and such functionality is therefore somewhat rare. Community-wide agreement on a set of standard data formats, such as proposed in Figure 7, will make it easier for developers to know which data formats their tools must be able to ingest and produce to guarantee seamless interaction with other existing tools.

In short, some of the existing tools may be overdesigned or unsuitable for where the majority of the community currently stands. Such tools are typically designed by a small group of people, using a proof of concept or test case that is directly applicable to the developers' own work. Developers can make educated guesses about how their tool can be made more general beyond their proof of concept, as we had to do here. Actually extending these proofs of concept typically relies on the original developers having both the motivation and opportunity to implement functionality for others (e.g., incorporating new data sets or including model-specific layers for other models) or on new developers being willing to first understand the existing package or web service and then modify it.

Our approach to provide a tangible example of how to structure model configuration tasks is different. First, our use of scripts that allow a user to immediately access the workflow code is likely much more similar to how many models are currently configured than if we had wrapped our workflow code in some form of user interface (such as a Python package, R library, or web interface). This lowers the barrier to trying our approach. Second, our use of standardization layers that require intermediate files to be in commonly used data formats (GeoTIFF, netCDF, and ESRI shapefiles) makes it easy to adapt small parts of our workflow without needing to change any upstream or downstream configuration tasks. Third, there are clear and immediate benefits of adopting a workflow approach of the type proposed in this paper that are unrelated to how widely (or not) this approach is adopted: creating new configurations for the models used in such workflows will be more efficient and the resulting science will stand on a firmer foundation than closed-source results. Should our approach become more widely adopted, then the path to a community modeling framework builds itself: as more examples of model configuration workflows become available, our preliminary sketch of a community modeling framework in Figure 7 can be refined or redrawn. The best approach to design, build, and maintain such a community framework can be decided in due course, and appropriate funding may be sought when needed. Advancing the paradigm of community modeling requires active participation of the community. By providing an example of a community modeling workflow, we hope to encourage uptake, modification, and adaptation of such community approaches.

4.3 Future Work

We outlined three steps to move toward a culture of community modeling in the Earth sciences in Section 1:
  1. For a given model, model configuration code should be publicly available and divided into model-agnostic and model-specific steps;

  2. The configuration workflows of multiple different models, ideally using different data sets, should be integrated into a proof-of-concept of a generalized model configuration workflow;

  3. A community-wide collaborative effort should refine the proof-of-concept into a flexible model configuration framework.

This manuscript provides an example of the first step in this list, by showing how configuration code for a single model can be implemented in a more general framework. Ongoing work focuses on the second step, integrating multiple different models such as MESH (Pietroniro et al., 2007) and HYPE (Arheimer et al., 2020; Lindström et al., 2010) into our workflow by adding the necessary processing code for these models. This work is nearing completion, and both models have successfully been able to reuse the model-agnostic part of the code base described in this paper, suggesting that a "bottom-up" approach to community modeling is feasible.

Adding new models naturally involves writing new model-specific routines that convert existing preprocessed data into the specific formats each new model needs. Inclusion of additional models also necessitates certain new model-agnostic processing routines. For example, whereas SUMMA works on the assumption that a single computational element has a single (possibly dominant) land cover type (but allows spatially flexible configurations so that each different land cover type can be assigned its own computational element), MESH lets the user specify a histogram of land cover types within each grid cell. Our current implementation of model-agnostic land cover remapping therefore still follows the implicit assumption that the required processing output is a single land cover class per model element. A new routine is needed that returns the histogram of land classes per model element that MESH requires. Examples such as these show that a modular approach to a generalized community modeling framework, as described in Section 4.2.2 and Figure 7, where new processing modules can be inserted without requiring changes to existing upstream and downstream routines, is a likely path forward on the road to community modeling.

5 Conclusions

This paper describes a code base that provides a general and extensible solution to configure hydrologic models. Specifically, the paper provides a tool that can be used to create reproducible configurations of the Structure for Unifying Multiple Modeling Alternatives (SUMMA, a process-based hydrologic model) and mizuRoute (a vector-based routing model). We consider this the implementation of a single model in a general framework that separates model-agnostic and model-specific configuration tasks. Such a separation of tasks makes inclusion of new models in this framework relatively straightforward because most of the data preprocessing code can remain unchanged and only model-specific code for the new model needs to be added.

The critical components of this framework are the standardization layers, which prescribe the details of the file formats that must come out of the preceding processing layer and form the input of the following processing layer. By standardizing inputs and outputs, the code that forms the processing layers only needs to concern itself with these prescribed formats. Changing specific processing modules to, for example, preprocess a different data set, perform a different way of mapping data onto model elements, or prepare input files for a different model, can therefore happen in isolation from the remainder of the workflow as long as the new processing code accepts and returns data in the prescribed formats. We show examples of this approach with global and multidecadal continental SUMMA and mizuRoute simulations, and with a local SUMMA configuration that uses a more complex spatial discretization than the global and continental simulations use.

Future work will involve adding model-specific code for multiple additional models and any needed data-specific preprocessing modules. We have termed this initiative “Community Workflows to Advance Reproducibility in Hydrologic Modeling” (CWARHM; “swarm”) and we encourage others to be part of this model-agnostic workflow initiative. The configuration code for the SUMMA and mizuRoute setup shown in this manuscript is available on GitHub: https://github.com/CH-Earth/CWARHM (last access: 27 July 2022).

Acknowledgments

We are thankful for the general comments for workflow improvement given by Dave Casson, Hongli Liu, Jim Freer, and Hannah Burdett. We are also thankful for the efforts of Reza Zolfaghari and Kyle Klenk who assisted in understanding and solving specific numerical issues, to Louise Arnal for feedback on the workflow schematics, and to Ala Bahrami, Dayal Wijayarathne, and Tricia Stadnyk for letting us share their progress on incorporating the MESH and HYPE models into our workflow concept. Finally, we gratefully acknowledge the effort of the reviewers whose comments helped us clarify the message in this paper. Wouter Knoben, Martyn Clark, Shervan Gharari, Chris Marsh, Ray Spiteri, and Guoqiang Tang were supported by the Global Water Futures program, University of Saskatchewan. The work of David Tarboton was supported by the US National Science Foundation under collaborative Grants ICER 1928369 and OAC 1835569. The work of Jerad Bales was supported by NSF Grant EAR 1849458. NCAR coauthor contributions were supported by the US Army Corps of Engineers Responses to Climate Change program and the US Bureau of Reclamation under Cooperative Agreement R16AC00039.

    Appendix A: Workflow Description

    This section describes in detail our example of a model setup workflow that follows the design principles outlined in Section 2. The workflow code, model code, software requirements, and data are fully open-source to follow the Findable, Accessible, Interoperable, Reusable (FAIR) principles. The workflow is written in Python and Bash, using input data with global coverage, a spatially distributed, physics-based hydrologic modeling framework designed to isolate individual modeling decisions (Clark et al., 2015a, 2015b; Clark, Zolfaghari, et al., 2021), and a network routing model (Mizukami et al., 2016, 2021) that connects the individual hydrologic model elements through a river network. This example workflow can be used to generate a basic SUMMA and mizuRoute setup anywhere on the globe and is designed such that the model-agnostic parts of the code can easily feed into other modeling chains.

    Part of the code in this repository is adapted from or inspired by work performed at the National Centre for Atmospheric Research and the University of Washington.

    A1 Models

    This section provides a brief overview of SUMMA (Clark et al., 2015a, 2015b; Clark, Zolfaghari, et al., 2021) and mizuRoute (Mizukami et al., 2016, 2021) to the extent relevant to understand our workflow. We refer the reader to the original papers that describe each model for further details. We selected both models for their flexible nature, computational capacity to model very large domains, and availability of local expertise. Both models are written in Fortran, and their source code needs to be compiled before the models can be used.

    A1.1 Structure for Unifying Multiple Modeling Alternatives (SUMMA)

    SUMMA is a process-based modeling framework designed to isolate specific modeling decisions and evaluate competing alternatives for each decision, with the ability to do so across multiple spatial and temporal configurations. SUMMA solves a general set of mass and energy conservation equations (Clark et al., 2015a; Clark, Zolfaghari, et al., 2021) and includes multiple alternative flux parametrizations (Clark et al., 2015b). It separates the equations that describe the model physics from the numerical methods used to solve these equations, allowing the use of state-of-the-art numerical solving techniques (Clark, Zolfaghari, et al., 2021). SUMMA is available as Free and Open-Source Software (FOSS) and under active development (see https://www.github.com/CH-Earth/summa).

    SUMMA organizes model elements into Grouped Response Units (GRUs) that can each be further subdivided into multiple Hydrologic Response Units (HRUs). This enables flexible spatial discretization of modeling domains. For example, point-scale studies are possible by defining the domain as a single GRU that contains exactly one HRU (GRU area can be an arbitrary value because all fluxes and states are calculated per unit area; see e.g., Clark et al., 2015b). It is equally possible to mimic grid-based model setups such as commonly used in land-surface modeling schemes by defining each GRU to be equivalent to a grid cell and optionally using the HRUs to account for subgrid variability (e.g., mimicking the tiled grid approach of traditional VIC and MESH setups; Liang et al., 1994; Pietroniro et al., 2007). Finally, GRUs can represent the (sub-)catchments of a given river system with HRUs being areas of similar hydrologic behavior within each GRU. Such model configurations can use GRUs and HRUs of irregular shape, which has several advantages over grid-based setups (see e.g., Gharari et al., 2020). Most importantly, such spatial configurations can accurately follow the actual topography of the modeling domain, and this makes model results easier to visualize and interpret. SUMMA is configured with irregularly shaped computational elements in the test cases presented in this paper.

    A1.2 mizuRoute

    mizuRoute is a vector-based river routing model specifically designed for large-domain applications such as modeling of hydrologic processes across a continental domain. It organizes the routing domain into HRUs (i.e., catchments) and stream segments that meander through the HRUs and provide connections between them (Mizukami et al., 2016, 2021). It can process inputs from hydrologic models with both grid- and vector-based setups and provides different options for channel routing: Kinematic Wave Tracking (KWT) and Impulse Response Function (IRF). For a given stream segment, the IRF method constructs a set of unique Unit Hydrographs (UHs), one for each upstream segment, which are used to route runoff from each upstream reach independently. In other words, the routed runoff in a given stream segment is a simple sum of the UH runoff generated in all upstream segments. The KWT method instead tracks channel runoff as kinematic waves moving through the stream network with their own celerity. mizuRoute is available as FOSS and under active development (see https://github.com/ESCOMP/mizuRoute), with a particular focus on improving its representation of lakes and reservoirs (Gharari et al., 2022; Vanderkelen et al., 2022).

    A1.3 Note on Definitions

    SUMMA distinguishes between GRUs and HRUs. SUMMA's main modeling element is the GRU, which can be subdivided into an arbitrary number of HRUs. SUMMA can handle GRUs and HRUs of any shape (e.g., points, grid cells, and catchments) and these terms therefore refer to model elements of arbitrary shape and size. In this workflow, we use mizuRoute to route runoff between SUMMA's GRUs. Potentially confusingly, mizuRoute refers to all routing basins as HRUs only and does not use the term GRU. As a result, what SUMMA calls GRUs are referred to as HRUs by mizuRoute. For consistency with both sets of documentation, we use their own terminology for model elements where possible. Figure A1 shows a graphical example of the differences in terminology.

    Figure A1. Catchment of the Bow River at Banff (Alberta, Canada) discretized into (a) SUMMA and (b) mizuRoute model elements, showing associated terminology. SUMMA Hydrologic Response Units (HRUs) in (a) represent different elevation bands within each SUMMA Grouped Response Unit (GRU). A SUMMA GRU always contains at least one SUMMA HRU. There is no upper limit to the number of HRUs a single SUMMA GRU can be divided into. A single SUMMA HRU is never part of more than one SUMMA GRU. In our example, SUMMA GRUs are identical to mizuRoute HRUs. mizuRoute stream segments are shown in different colors to emphasize that in this case each mizuRoute HRU maps 1:1 onto a single stream segment; only a single color is shown in the legend for brevity, but all nonblack lines are stream segments.

    A2 Workflow Description

    This section briefly describes each step shown in the workflow diagram (Figure 1 in the main manuscript, with further technical details in Figures A2–A4). Figures are generated using the test case configured for the Bow River catchment located in Alberta, Canada (see Figure A1 for an overview of this domain). This test case covers a geographically small area (approximately 2,200 km2) and uses a more complex model setup (SUMMA GRUs subdivided into multiple HRUs) than the continental and global test cases (where SUMMA GRUs contain exactly one HRU each), making it the best choice to visualize model setup procedures. Italicized phrases in this section indicate folders, scripts, or variables as found in the GitHub repository. To start, a user would download or clone the complete GitHub repository. The following sections provide more detail about the scripts found within the GitHub repository. Although our workflow requires only limited user interaction to generate a model configuration for a new domain, we do make certain assumptions about this model configuration which users should be aware of. These assumptions are specified in each subsection.

    Figure A2. High-level overview of model configuration steps, using SUMMA (a process-based hydrologic model) and mizuRoute (a routing model) as example models. Configuration tasks are separated into model-agnostic and model-specific tasks (details in Figures A3 and A4, respectively). Each rounded box specifies the outcomes of that configuration task as a numbered list.

    Figure A3. Model-agnostic configuration steps. Each rectangular block corresponds to a specific model setup task and is accompanied by a specific script with Python or Bash code, stored in a GitHub repository. Rounded rectangles indicate starting points of specific subtasks (mainly showing which folder in the repository contains certain parts of the workflow) and the outcomes of each subprocess. Parallelograms indicate actions the user must perform. Numbers show connections with the model-specific configuration tasks in Figure A4.

    Figure A4. Model-specific configuration steps. Each rectangular block corresponds to a specific model setup task and is accompanied by a specific script with Python or Bash code, stored in a GitHub repository. Rounded rectangles indicate starting points of specific subtasks (mainly showing which folder in the repository contains certain parts of the workflow) and the outcomes of each subprocess. Parallelograms indicate actions the user must perform. The hexagon indicates an aspect of SUMMA's input requirements (i.e., not an action or script) and is shown to clarify why creating the forcing files is on the critical path toward creating the other necessary model configuration files. Numbers show connections with the model-agnostic configuration tasks in Figure A3.

    A2.1 Workflow Setup and Folder Structure

    This section describes the steps “User updates control file” and the steps contained in the box “Initial setup” (Figure A3).

    A2.1.1 Control Files

    Control files are the main way for a user to interact with the workflow. They contain high-level information such as file paths, file names, variable names, and specification of the spatial and temporal extent of the modeling domain (see also Sen Gupta et al., 2015). A new control file needs to be created by the user for each new domain. As an example, the control file for the Bow_at_Banff test case is included as part of the Github repository, in the folder ./CWARHM/0_control_files. The READMEs of each subfolder on the GitHub repository contain a list of the settings in the control file on which the scripts in that subfolder rely.
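    To make the control file concept concrete, the sketch below parses a plain-text control file consisting of "setting | value" pairs into a Python dictionary. The delimiter, the comment character, and the example setting names are illustrative assumptions; the authoritative list of settings is given in the READMEs of the repository subfolders.

        def read_control_file(path):
            """Parse a control file of 'setting | value' lines into a dictionary.
            Assumes '#' starts a comment; blank and malformed lines are skipped."""
            settings = {}
            with open(path) as infile:
                for line in infile:
                    entry = line.split('#')[0]          # drop trailing comments
                    if '|' not in entry:
                        continue                        # skip blank or header lines
                    key, value = (part.strip() for part in entry.split('|', 1))
                    settings[key] = value
            return settings

        # Hypothetical usage: read the active control file and inspect a setting
        control = read_control_file('control_active.txt')
        print(control.get('domain_name'))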

    A2.1.2 Folder Preparation

    The workflow separates generated data from the code used to generate the data. The script in the folder ./CWARHM/1_folder_prep generates a basic data folder structure in a location of the user's choosing (see Figure 3b). This basic folder structure consists of a main data folder with a subdirectory for the current domain. In this domain folder, it further generates a dedicated folder where the user can place their shapefiles that delineate the SUMMA catchments (hydrologic model GRUs and HRUs), mizuRoute catchments (routing model HRUs), and mizuRoute river network. This is the only script in the workflow that needs to be manually modified if a setup for a new domain is generated. A user will need to modify the variable sourceFile so that it points to the control file for the current domain. In our example, this is set to control_Bow_at_Banff.txt. The script then copies the contents of this control file into a new file called control_active.txt, which is the file every other workflow script will search for. The variable sourceFile needs to be updated when a control file for a new domain is used. Note that the contents of the file control_active.txt determine which folders and files the other workflow scripts operate on.
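    A minimal sketch of this folder-preparation step is shown below. The data root and subfolder names are assumptions for illustration; in the repository, these values are read from the control file (e.g., with the parsing sketch in Section A2.1.1).

        import shutil
        from pathlib import Path

        sourceFile = 'control_Bow_at_Banff.txt'            # the only variable a user edits per domain
        shutil.copyfile(sourceFile, 'control_active.txt')  # every other workflow script reads this copy

        # Data location for the current domain (hard-coded here for brevity; normally
        # taken from the settings in control_active.txt)
        domain_dir = Path('/project/data') / 'domain_Bow_at_Banff'

        # Create the basic data folder structure, including folders for user-provided shapefiles
        for sub in ('shapefiles/catchment', 'shapefiles/river_network', 'forcing', 'parameters'):
            (domain_dir / sub).mkdir(parents=True, exist_ok=True)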

    A2.1.3 Domain Shapefiles

    With a basic folder structure in place, the user can now move their prepared shapefiles into the newly generated folders (assuming the control file uses “default” values for these shapefile paths). Briefly, the shapefiles should contain: geometries that delineate the hydrologic model GRUs and HRUs, the routing model HRUs, and the routing model river network in a regular latitude/longitude projection (in other words, in the Coordinate Reference System defined by EPSG:4326; https://epsg.io/4326 [last access, 11 October 2021]). Each shapefile needs to specify certain properties of the model domain, such as identifiers for each GRU, HRU, and stream segment; HRU area and centroid location, stream segment slope and length; and the stream segment ID into which a given HRU drains.

    Detailed requirements for the shapefiles are provided in the README in ./CWARHM/1_folder_prep. Example shapefiles for the Bow_at_Banff test case are part of the repository and can be found in the subfolders of ./CWARHM/0_example.
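    Before running the rest of the workflow, it can be useful to confirm that the prepared shapefiles use the expected coordinate reference system and contain the required attributes. The snippet below sketches such a check with geopandas; the file and column names are placeholders, because the authoritative attribute names are listed in the README in ./CWARHM/1_folder_prep.

        import geopandas as gpd

        catchment = gpd.read_file('bow_at_banff_hrus.shp')   # placeholder file name

        # The workflow expects geometries in regular latitude/longitude coordinates (EPSG:4326)
        assert catchment.crs.to_epsg() == 4326, 'Catchment shapefile must use EPSG:4326'

        # Placeholder attribute names; see the repository README for the required fields
        required = ['GRU_ID', 'HRU_ID', 'HRU_area', 'center_lat', 'center_lon']
        missing = [col for col in required if col not in catchment.columns]
        if missing:
            raise ValueError(f'Catchment shapefile is missing attributes: {missing}')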

    A2.2 Model-Agnostic Workflow Elements

    This section provides details about the model-agnostic elements of the workflow (shown in light gray in Figure A3). For convenience, this section is organized to follow the four model-agnostic subprocesses: preprocessing of forcing, elevation, soil, and land use data.

    A2.2.1 Preprocessing of Forcing Data

    Our chosen forcing product is the ERA5 reanalysis data set (Copernicus Climate Change Service (C3S), 2017; Hersbach et al., 2020) provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). ERA5 data are available as hourly data for the period 1979 to present minus 5 days, at a 31 km spatial grid that covers the Earth's surface or at a regridded 0.25° × 0.25° latitude/longitude resolution. ERA5 data preparation includes two-way interactions between atmosphere, land surface, and ocean surface components. The ERA5 model setup includes different atmospheric layers and ERA5 data are available at 137 different pressure levels (i.e., heights above the surface), as well as at the surface. The lowest atmospheric level is L137, at geopotential and geometric altitude 10 m (i.e., 10 m above the land surface). To limit the influence of ECMWF's land model on our required forcing variables (simulating the land surface response is SUMMA's role after all), we obtain air temperature, wind speed, and specific humidity at the lowest pressure level (Hersbach et al., 2017) instead of at the land surface. Precipitation, downward shortwave radiation, downward longwave radiation, and air pressure are unaffected by the land model coupling and can be downloaded at the surface level (Hersbach et al., 2018).

    Surface and pressure level data are stored in two different data archives and are accessed in different ways. Download scripts for each separate archive are found in folder ./CWARHM/3a_forcing/1a_download_forcing. These scripts access the C3S Climate Data Store (CDS) using the user's credentials (instructions on how to obtain and store credentials can be found in the README in the download folder) and download the necessary data in monthly blocks of hourly data at a regular 0.25 × 0.25° latitude/longitude resolution. The spatial and temporal extents of the domain are taken from the control file. As per the ERA5 documentation, ERA5 data should be seen as point data, even though standard visualization approaches typically show this kind of data as an interpolated grid. In our example workflow, we make the simple assumption that each ERA5 point contains forcing data that are representative for the grid of size 0.25° × 0.25° of which the grid point is the centroid. The workflow code automatically finds which ERA5 grid points to download based on the catchment bounding box specified in the control file (Figure A5). Once downloaded, the code in ./CWARHM/3a_forcing/2_merge_forcing can be used to merge the surface and pressure level downloads into a single netCDF file, which is used for further processing. During this merging process, the ERA5 variable names are also changed to more descriptive ones.
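    For illustration, a single month of ERA5 surface-level data could be requested through the CDS API as sketched below. The download scripts in ./CWARHM/3a_forcing/1a_download_forcing loop over the full simulation period and additionally request the pressure-level variables; the variable list, bounding box, and file name shown here are assumptions.

        import cdsapi

        c = cdsapi.Client()   # reads the user's CDS credentials from ~/.cdsapirc

        bounding_box = [51.74, -116.55, 50.95, -115.52]   # [North, West, South, East], placeholder values

        c.retrieve(
            'reanalysis-era5-single-levels',
            {
                'product_type': 'reanalysis',
                'variable': ['surface_pressure',
                             'total_precipitation',
                             'mean_surface_downward_short_wave_radiation_flux',
                             'mean_surface_downward_long_wave_radiation_flux'],
                'year': '2008',
                'month': '01',
                'day': [f'{d:02d}' for d in range(1, 32)],
                'time': [f'{h:02d}:00' for h in range(24)],
                'area': bounding_box,
                'format': 'netcdf',
            },
            'ERA5_surface_2008-01.nc')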

    Figure A5. Overview of ERA5 data points, catchment, and bounding box, and how ERA5 data are assumed to overlap the catchment for the Bow at Banff test case.

    Gridded forcing data does not map directly onto irregular model elements such as HRUs. Code in ./CWARHM/3a_forcing/3_create_shapefile generates a shapefile for the forcing data that outlines the forcing grid (dotted red lines in Figure A5), which is later used to find the relative contribution of each forcing grid cell to the forcing of each HRU. The elevation of each ERA5 grid point is added to this shapefile. Elevation data are later used to apply temperature lapse rates based on the difference in elevation of the ERA5 data and mean HRU elevations. As per the ERA5 documentation, the elevation of each ERA5 data point is found by dividing the geopotential (m2 s−2) of each point (downloaded through scripts in ./CWARHM/3a_forcing/1b_download_geopotential) by the gravitational acceleration (m s−2).
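    The geopotential-to-elevation conversion itself is a single division, sketched below with xarray; the file name is illustrative, and the short variable name "z" for geopotential follows ERA5 conventions.

        import xarray as xr

        g = 9.80665   # gravitational acceleration (m s-2), assumed constant

        with xr.open_dataset('ERA5_geopotential.nc') as ds:   # illustrative file name
            elevation = ds['z'] / g   # geopotential (m2 s-2) divided by g gives elevation (m)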

    Key assumptions in this part of the workflow are (a) that the user has access to the Copernicus Data Store. Instructions on how to obtain access are given in the README in folder ./CWARHM/3a_forcing. (b) We consider that using forcing data that are the result of interaction between the atmospheric and land surface model components is undesirable and hence somewhat limit this interaction by downloading certain variables at the lowest pressure level instead. (c) ERA5 data points are assumed to be representative of grids of size 0.25° × 0.25°. (d) Gravitational acceleration is assumed to be constant at g = 9.80665 (m s−2) (Tiesinga et al., 2019), although in reality this value would vary depending on latitude and altitude. (e) ERA5 variable names are changed to more descriptive ones that are also the names SUMMA expects these variables to have.

    A2.2.2 Preprocessing of Geospatial Parameter Fields

    Three different types of geospatial data are required for our example model setup. A Digital Elevation Model (DEM) provides the elevation of each HRU and is both a SUMMA input and required to apply temperature lapse rates as a preprocessing step. Maps of soil classes and vegetation classes are needed to utilize parameter lookup tables. These tables specify values for multiple parameters for a variety of soil and land classes. By knowing the soil or land class for a given HRU, SUMMA uses the predefined parameter values for those classes.

    A2.2.3 Digital Elevation Model

    We use the hydrologically adjusted elevations that are part of the MERIT Hydro data set (Yamazaki et al., 2019) to determine HRU elevations. The MERIT Hydro hydrography maps cover the area between 90° north and 60° south at a spatial resolution of 3 arc-seconds. They are derived from the MERIT DEM (Yamazaki et al., 2017), which itself is the result of extensive error correction of the SRTM3 (Farr et al., 2007) and AW3D-30m (Tadono et al., 2016) DEMs. Scripts can be found in the subdirectories of ./CWARHM/3b_parameters/MERIT_Hydro_DEM.

    MERIT Hydro data are provided as compressed data packages that cover 30° × 30° areas. Based on the spatial extent of the domain, as given in the control file, the required 30° areas are downloaded in compressed format. Data are then uncompressed so that the individual GeoTIFF files are accessible. These files are first combined into a Virtual Data set (VRT), from which the exact modeling domain is extracted into a new VRT. The VRT with the extracted subdomain is then converted into a single GeoTIFF file that contains the DEM for the modeling domain. A key assumption is that the user has access to the MERIT Hydro data. Instructions on how to obtain access are given in the README in folder ./CWARHM/3b_parameters/MERIT_Hydro_DEM.
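    A sketch of the VRT-based subsetting with GDAL's Python bindings is shown below; the repository implements these steps as separate scripts, and the file names and bounding box values are placeholders.

        from glob import glob
        from osgeo import gdal

        tiles = glob('merit_hydro_elv/*.tif')   # unpacked MERIT Hydro elevation tiles

        # Combine all tiles into one virtual data set, then clip out the modeling domain
        gdal.BuildVRT('merit_full.vrt', tiles)
        gdal.Translate('merit_domain.tif', 'merit_full.vrt',
                       projWin=[-116.55, 51.74, -115.52, 50.95],   # [ulx, uly, lrx, lry], placeholders
                       format='GTiff')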

    A2.2.4 Vegetation Classes

    We use MODIS MCD12Q1_V6 data (Friedl & Sulla-Menashe, 2019) to determine land cover classes at the HRU level. MODIS MCD12Q1 data are available for the years 2001–2018 at a 500-m resolution. The data set contains land cover classes for multiple different land cover classification schemes. Each data layer is the result of supervised classification of MODIS reflectance data (Friedl & Sulla-Menashe, 2019). Scripts can be found in the subdirectories of ./CWARHM/3b_parameters/MODIS_MCD12Q1_V6.

    MODIS MCD12Q1 data are provided as multiple Hierarchical Data Format (HDF) files that each cover a part of the planet's surface at a given time. The source data files are in a sinusoidal projection and of irregular shape which makes it difficult to extract a specific region. Therefore, the workflow downloads all available individual HDF files for each year (i.e., global coverage). The individual files for each data year are combined into one VRT per year for easier processing. Only the data layer of interest, the International Geosphere Biosphere Programme (IGBP) land cover classification, is included in the VRT. The VRT is reprojected from its original sinusoidal projection into a regular latitude/longitude grid (EPSG:4326) from which the modeling domain is extracted. The annual VRTs are then combined into a single multiband VRT, which is then converted to a multiband GeoTIFF file. The MODIS documentation advises against using the data of an individual year due to data uncertainty (Sulla-Menashe & Friedl, 2018). Therefore, the mode land class between 2001 and 2018 is identified as the most likely class for each pixel and stored as a new GeoTIFF file.
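    The reprojection and mode calculation can be sketched as follows, again with GDAL's Python bindings and numpy; in the repository these steps are split across several scripts, and the file names are placeholders.

        import numpy as np
        from osgeo import gdal

        # Reproject the multiband (one band per year) VRT from sinusoidal to regular lat/lon
        gdal.Warp('modis_igbp_epsg4326.tif', 'modis_igbp_sinusoidal.vrt', dstSRS='EPSG:4326')

        # Find the most common land class per pixel across all years
        stack = gdal.Open('modis_igbp_epsg4326.tif').ReadAsArray()   # shape: (years, rows, cols)
        classes = np.unique(stack)
        counts = np.stack([(stack == c).sum(axis=0) for c in classes])
        mode_class = classes[counts.argmax(axis=0)]                  # per-pixel mode land class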

    Key assumptions are (a) that the user has access to NASA's Earth Data website. Instructions on how to obtain access are given in the README in folder ./CWARHM/3b_parameters/MODIS_MCD12Q1_V6. (b) Our example uses the International Geosphere Biosphere Programme (IGBP) land cover classification data, which is one of multiple options available.

    A2.2.5 Soil Classes

    Our example uses a global map of soil texture classes (Knoben, 2021) derived from the SoilGrids 250m data set (Hengl et al., 2017) to specify representative soil classes at the HRU level. The SoilGrids data are provided at a 250-m resolution and at seven standard depths (up to 2 m depth). Data are the result of a combination of approximately 150,000 observed soil profiles, 158 remote sensing-based soil covariates, and multiple machine learning methods. SoilGrids maps of sand, silt, and clay percentages were converted to a soil texture map for each depth using the soil texture class boundaries of Benham et al. (2009). For each 250 m map point, the mode soil class of the seven soil layers was selected as a representative value for the soil column as a whole, resulting in a single global map of soil texture classes. The preprocessing code needed to create this map (data download, data merge into a coherent map, conversion from percentages to soil texture, and finding the mode of each soil column) is accessible as part of the data resource (Knoben, 2021). Scripts can be found in the subdirectories of ./CWARHM/3b_parameters/SOILGRIDS.

    The global soil texture class map is provided at the same horizontal resolution as the underlying SoilGrids data. The workflow first downloads a map with global coverage. The spatial extent of the modeling domain is extracted based on the bounding box specified in the control file and stored as a new GeoTIFF file.

    Key assumptions are (a) that the user has access to Hydroshare (instructions on how to obtain access are given in the README in folder ./CWARHM/3b_parameters/SOILGRIDS) and (b) that the global soil map used assumes that the mode soil class in each soil column can be considered representative of the entire soil column and that the soil properties (such as saturated conductivity and pore volume) for the mode class are representative of the properties of the column. This approach ignores the existence of layered soil profiles and the differences in water movement this can cause (e.g., Vanderborght et al., 2005). This also assumes that the most common class contains the layers that are most hydrologically active and relevant for modeling purposes.

    A2.3 Mapping of Data to Model Elements

    This section provides details about the mapping of preprocessed forcing data onto model elements (shown in the intermediate gray shade in Figure A3). This process cannot be called truly model-agnostic because whether it is needed depends on the model in question: some models are able to ingest the preprocessed data directly.

    A2.3.1 Geospatial Parameter Fields

    In our example, geospatial data in the form of GeoTIFF files containing the DEM, land classes, and vegetation classes cannot be ingested by the hydrologic model directly. The data must be mapped onto the model elements (HRUs) as delineated in the catchment's shapefile. These procedures use the open-source QGIS project (QGIS Development Team, 2021) to provide the necessary Python functions (./CWARHM/4b_remapping/1_topo). Key assumptions are (a) that MERIT Hydrologically Adjusted Elevation data need to be aggregated into mean elevation values per model element, whereas (b) soil and vegetation classes need to be aggregated into histograms that summarize the distribution of values per model element.
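    The repository performs this step with QGIS's Python functions; as an alternative illustration of the same idea, the rasterstats package can compute a mean elevation per HRU and a histogram of classes per HRU directly from the GeoTIFF files and the catchment shapefile. File names below are placeholders.

        from rasterstats import zonal_stats

        # Mean elevation per HRU from the clipped DEM
        elev = zonal_stats('catchment_hrus.shp', 'merit_domain.tif', stats=['mean'])
        print(elev[0])   # e.g., {'mean': 1843.2} for the first HRU

        # Histogram of land cover classes per HRU (counts of raster cells per class)
        land = zonal_stats('catchment_hrus.shp', 'modis_mode_igbp.tif', categorical=True)
        print(land[0])   # e.g., {1: 523, 7: 88} -> {class: cell count}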

    A2.3.2 Forcing Data

    Figure A6 shows the original gridded air temperature values on an arbitrary day and the HRU-averaged values on that same day that are obtained by mapping the gridded forcing data onto the model elements. For each model element, the relative overlap with each ERA5 grid cell determines the weight with which that forcing grid cell contributes to the HRU-averaged value. This procedure is applied to all seven forcing variables and all time steps to generate HRU-averaged forcing (./CWARHM/4b_remapping/2_forcing).
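    One way to derive such intersection weights is sketched below with geopandas; the repository's remapping code differs in its implementation, and the equal-area projection (EPSG:6933) and column names are assumptions.

        import geopandas as gpd

        hrus = gpd.read_file('catchment_hrus.shp')   # placeholder file names
        grid = gpd.read_file('era5_grid.shp')

        # Intersect HRUs with forcing grid cells in an (assumed) equal-area projection
        overlap = gpd.overlay(hrus.to_crs(epsg=6933), grid.to_crs(epsg=6933), how='intersection')
        overlap['area'] = overlap.geometry.area
        overlap['weight'] = overlap['area'] / overlap.groupby('HRU_ID')['area'].transform('sum')

        # The (HRU, grid cell, weight) triplets can then be applied to every variable and time
        # step in the forcing files: the HRU-averaged value is the weighted sum of the values
        # of all overlapping grid cells.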

    Figure A6. (a) Original gridded air temperature data as found in the ERA5 data. (b) Hydrologic Response Unit (HRU)-averaged air temperature obtained as a weighted average of the relative contributions of each ERA5 grid cell to each HRU. Temperatures shown outside the catchment boundaries are the original gridded values.

    We then apply a constant environmental lapse rate of 0.0065 K⋅m−1 (Wallace & Hobbs, 2006, p. 421) to the HRU-averaged air temperature data to account for any differences between ERA5 data point elevation and mean HRU elevation (Figure A7). To avoid excessive data access, the SUMMA-specific variable data_step (which specifies the temporal resolution of the forcing data in [s]) is added to each forcing file at the same time as lapse rates are applied.
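    The lapse-rate adjustment itself amounts to one line per variable, sketched below with xarray; the file and variable names are placeholders for the HRU-averaged forcing and the two elevation fields.

        import xarray as xr

        lapse_rate = 0.0065   # K m-1, assumed constant in space and time

        forcing = xr.open_dataset('forcing_hru_remapped.nc')   # placeholder file name
        z_forcing = forcing['forcing_elevation']   # ERA5 elevation mapped to each HRU (m)
        z_hru = forcing['hru_elevation']           # mean HRU elevation from the DEM (m)

        # Temperature decreases with elevation: correct from forcing elevation to HRU elevation
        forcing['airtemp'] = forcing['airtemp'] - lapse_rate * (z_hru - z_forcing)
        forcing['data_step'] = 3600.0              # SUMMA-specific: forcing resolution in seconds
        forcing.to_netcdf('forcing_hru_lapsed.nc')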

    Key assumptions are that (a) a temporally and spatially constant lapse rate can be used. This is common in gridded analysis but typically not locally accurate (Minder et al., 2010). Local lapse rates may be very different from this assumed value, especially in complex terrain and at seasonal or hourly timescales (Cullen & Marshall, 2011; Minder et al., 2010). Regionally and temporally variable lapse rates are a possible way to improve this part of the workflow (e.g., Dutra et al., 2020) but doing so is beyond the scope of this study. (b) The influence of slope and aspect on radiation fluxes is currently not accounted for in forcing data preparation.

    Figure A7. (a) Hydrologic Response Unit (HRU)-averaged elevation derived from MERIT Hydro adjusted elevations data. ERA5 grid point elevation calculated from geopotential data and a spatially constant gravitational acceleration value, visualized as grid cells. (b) Temperature lapse values based on a constant lapse rate and a weighted difference between ERA5 grid point elevation and HRU mean elevation. (c) Air temperature data before lapse rates are applied. (d) Air temperature data after lapse rates are applied.

    A2.4 Model-Specific Workflow Elements

    This section provides details about the model-specific steps of the workflow (shown in dark gray in Figure A4). These steps form the interface between preprocessed data and models.

    A2.4.1 SUMMA and mizuRoute Installation

    The source code for both SUMMA and mizuRoute can be obtained through GitHub (see Appendix A1). Scripts in ./CWARHM/2_install provide code to download the latest version of both models to a local machine. Both models are written in Fortran and need to be compiled to create executables. The exact commands and settings needed will vary between different computational environments. The workflow contains examples of model compile code for a specific High Performance Computing environment.

    Key assumptions are as follows. (a) The user has determined the appropriate settings to compile both models on their own computational infrastructure and made the necessary changes to our provided example code. (b) Both scripts assume that the “develop” branch of each model is the version of interest. (c) A Linux or MacOS environment is recommended because compiling the SUMMA and mizuRoute source code requires a netCDF-Fortran library to be installed locally and this library is not supported on Windows yet. A basic alternative that avoids compiling the source code is to install pySUMMA and mizuRoute through Conda, but this provides precompiled executables only. Access to the source code is not possible and updates present on GitHub may not immediately appear in the pySUMMA Conda distribution.

    A2.4.2 Shapefile Sorting to Ensure Expected Order of Model Elements

    SUMMA makes certain assumptions about GRU and HRU order in its input files. These are as follows: (a) GRUs and HRUs are in the same order in the forcing files and in all SUMMA input files that contain information at the GRU and HRU level; and (b) HRUs inside a given GRU are found at consecutive indices in each netCDF file. Note that these requirements do not specify anything about the values of the GRU and HRU IDs and only focus on the order in which the IDs appear in files. The code in ./CWARHM/4a_sort_shape sorts the shapefile that contains the catchment delineation into GRUs and HRUs before this shapefile is used by other scripts. This is more efficient than postponing this sorting until the SUMMA input files are generated. A key assumption is that computational efficiency is an important consideration and that this model-specific sorting step should therefore be executed before the (model-agnostic) remapping is performed.
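    A sketch of this sorting step with geopandas is given below; the column names for the GRU and HRU identifiers are placeholders for whatever names the user's shapefile uses.

        import geopandas as gpd

        hrus = gpd.read_file('catchment_hrus.shp')   # placeholder file name

        # Sort so that all HRUs of a GRU occupy consecutive rows; SUMMA only cares about the
        # order in which IDs appear, not about the ID values themselves
        hrus = hrus.sort_values(by=['GRU_ID', 'HRU_ID']).reset_index(drop=True)
        hrus.to_file('catchment_hrus_sorted.shp')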

    A2.4.3 SUMMA Input Files

    SUMMA requires several different configuration files: (a) default parameter values at the GRU and HRU level; (b) lookup tables with predefined soil and vegetation parameters for different soil and land classes; (c) a model decisions file that specifies which modeling decisions (e.g., the type of numerical solver) and flux parametrizations to use; (d) an output control file that specifies which internal model variables to write as model output, at which temporal resolution to do so and which, if any, summary statistics to provide; (e) a file manager file that specifies the file paths to all model inputs and outputs as well as the time period for the simulation; (f) a forcing file list file that specifies the names of all meteorological forcing files to use; (g) a trial parameter file that can be used to overrule any parameter value specified in the default parameter files and in the lookup tables that can be helpful to quickly test different parameter values during, for example, calibration; (h) an initial conditions file that specifies the model states at the beginning of the first time step; and (i) an attribute file that contains topographic information such as elevation, soil type, and land use type at the HRU level.

    In our example setup, files with default parameter values, lookup tables, model decisions, and requested outputs are provided as part of the repository. These files do not require any information from the preprocessing steps for forcing data and geospatial parameter fields and can therefore simply be copied into the new SUMMA settings directory. The file manager and forcing file list are populated with information available in the workflow control file. The workflow generates a trial parameter file that, for our test cases, specifies a required value for only one parameter. This parameter controls the time resolution of SUMMA's simulations and is here specified as 900 s (i.e., one quarter of the 1-hourly forcing data resolution) to improve numerical convergence of the model equations. The initial conditions file serves a dual purpose: it specifies the model states at the start of the simulation and the vertical discretization of the soil domain into discrete layers. In this example, SUMMA is initialized with eight soil layers of increasing thickness (0.025 m for the top layer and 1.50 m for the bottom layer), without any snow or ice present, with some liquid water storage in the soil and groundwater, and with the soil and canopy domains at a constant temperature of 10°C. The attributes file is populated with data from the user's shapefiles (GRU and HRU IDs, HRU-to-GRU mapping, longitude and latitude, and HRU area) and from the geospatial preprocessing steps. Figure A8 shows the original geospatial parameter fields that are the outcomes of our model-agnostic preprocessing steps and how these are converted into model-specific values for SUMMA's attributes file. All scripts are available in the subdirectories of ./CWARHM/5_model_input/SUMMA.
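    As one concrete example of these model-specific files, the forcing file list is essentially a list of forcing file names. A sketch of how it could be generated is shown below; the paths are placeholders, and the exact formatting SUMMA expects (e.g., whether names are quoted) should be checked against the SUMMA documentation.

        from pathlib import Path

        forcing_dir = Path('/project/data/domain_Bow_at_Banff/forcing/SUMMA_input')   # placeholder

        # One forcing file name per line, in chronological order
        with open('forcingFileList.txt', 'w') as outfile:
            for nc_file in sorted(forcing_dir.glob('*.nc')):
                outfile.write(nc_file.name + '\n')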

    Figure A8. Mapping of geospatial parameter fields onto model elements. (a and b) MERIT Hydro adjusted elevations digital elevation model source data and the mean elevation per Hydrologic Response Unit (HRU). (c and d) Soil texture classes derived from SOILGRIDS sand, silt, and clay percentages and the most common class per HRU. (e and f) International Geosphere Biosphere Programme (IGBP) land classes from MODIS data and the most common class per HRU.

    Key assumptions are (a) that the HRU and GRU default parameter files, model decisions, and lookup table files are sensible choices for the domain of interest. In particular, the choice of ROSETTA lookup table for soil properties (NCAR Research Applications Laboratory RAL, 2021; U.S. Department of Agriculture: Agricultural Research Service [USDA ARS], 2021) and the modified IGBP table for vegetation properties (NCAR Research Applications Laboratory RAL, 2021) inform how the geospatial data is preprocessed (i.e., which geospatial data sets are used and how they are transformed). (b) Vertical discretization of the domain is currently set to eight soil layers whose thickness increases with depth. (c) Initial conditions are dry and warm, and there is no snow and ice present in the domain. (d) Model decisions relying on contourLength and tan_slope attributes are not supported (currently this is the baseflow model decision qbaseTopmodel, as well as certain radiation calculations that account for slope inclination). Attribute variable downHRUindex is only used if decision qbaseTopmodel is active and is therefore set to zero.

    A2.4.4 mizuRoute Input Files

    mizuRoute requires several configuration files: (a) a default parameter file that has values for its different routing schemes; (b) a network topology file that contains a description of the river network and its properties; (c) optionally, a remapping file that shows how output from a hydrologic model should be mapped onto mizuRoute's routing network; and (d) a mizuRoute.control file that specifies the necessary file paths and routing settings. In our example setup, a default routing parameter file is provided as part of the repository. This file does not require any information from the preprocessing steps for forcing data and geospatial parameter fields and can therefore simply be copied into the new mizuRoute settings directory. The network topology file contains a description of the routing basins and their associated stream segments. It specifies which basins and segments exist, which segment each basin drains into, and physical properties of the domain such as drainage area, segment length, and segment slope. The optional remapping file only needs to be used in cases where the hydrologic model operates on model elements that do not map directly onto mizuRoute's routing basins. In such a case the remapping file specifies the weight with which each hydrologic catchment contributes flow to each routing basin. The mizuRoute control file is populated with information available in the workflow control file. All scripts are available in the subdirectories of ./CWARHM/5_model_input/mizuRoute.

    Key assumptions are (a) that the provided routing parameter values are appropriate for the domain and (b) that hillslope routing (i.e., routing between different SUMMA HRUs inside a given SUMMA GRU) is performed by SUMMA. mizuRoute is configured to do the river network routing between different SUMMA GRUs.

    A2.4.5 Model Runs

    Model runs use the compiled SUMMA and mizuRoute executables to perform simulations using the inputs and settings defined in their respective configuration files (./CWARHM/6_model_runs). As part of the model run scripts, model configuration files are copied into the simulation output directories. This ensures traceability of the simulations by keeping a record of the settings used to generate them.
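    For completeness, a run script could look like the sketch below. The command-line arguments shown (a -m flag pointing SUMMA at its file manager, and a control file as the single argument to mizuRoute) and all paths are assumptions based on common usage and should be verified against each model's documentation.

        import shutil
        import subprocess
        from pathlib import Path

        settings_dir = Path('/project/data/domain_Bow_at_Banff/settings')      # placeholder paths
        output_dir = Path('/project/data/domain_Bow_at_Banff/simulations')
        output_dir.mkdir(parents=True, exist_ok=True)

        # Assumed invocations; verify against the SUMMA and mizuRoute documentation
        subprocess.run(['./summa.exe', '-m', str(settings_dir / 'SUMMA' / 'fileManager.txt')], check=True)
        subprocess.run(['./mizuroute.exe', str(settings_dir / 'mizuRoute' / 'mizuroute.control')], check=True)

        # Copy the configuration files next to the simulations to keep the run traceable
        shutil.copytree(settings_dir, output_dir / 'run_settings', dirs_exist_ok=True)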

    A2.5 Postprocessing

    Postprocessing of model results in this example is limited to the code needed to generate the modeling domain figure in this manuscript (./CWARHM/7_visualization). Further visualization code may be added over time, as such code is created for specific experiments.

    Appendix B: Note on Data Accuracy

    Our example workflow uses ERA5 forcing data, MERIT Hydro DEM, SOILGRIDS-derived soil texture classes, and MODIS IGBP land classes for their global coverage. This enables global applications of the workflow. Such global data sets are based on a combination of observations and geospatial data processing methods to estimate data values for locations where no observations are available. These approaches may need to sacrifice local information content for global coverage and are not always able to utilize the most accurate local data available.

    ERA5 is a reanalysis product from a data assimilating numerical weather prediction model. ERA5 precipitation estimates compare favorably to other global products at a daily resolution (Beck et al., 2019) but are typically not as accurate as local gauge or radar-based observations, especially in regions with complex topography (e.g., Amjad et al., 2020; Jiang et al., 2021; Tang et al., 2020; Xu et al., 2019). Jiang et al. (2020) show a similar reduced accuracy of ERA5 compared to station observations for direct and diffuse solar radiation estimates. Less is known about the accuracy of the remaining ERA5 forcing variables used in our workflow, and it is possible that the relatively coarse resolution of ERA5 data means that these variables may not be as accurate as local products.

    The MERIT Hydro hydrologically adjusted elevation data set (Yamazaki et al., 2019) is based on the MERIT DEM (Yamazaki et al., 2017), which itself is the result of applying an error-removal algorithm to existing space-borne DEMs. It is available globally at approximately 90 m spatial resolution. The MERIT Hydro data represent an advance over earlier products such as HydroSheds (Lehner et al., 2008), especially at higher latitudes, but some uncertainty in the produced hydrography data remains in regions with low topographic variation, with endorheic basins, with seasonally varying connectivity, and with channel bifurcations. The MERIT Hydro hydrologically adjusted elevations are a modification of the MERIT DEM that satisfies the condition “downstream pixels are not higher than upstream pixels.” This procedure relies on a combination of correctly identifying endorheic basins, connections between subbasins, and adjusting pixel elevations to create continuous flow paths. It is unknown to what extent this procedure affects the mean catchment elevation we derive from the hydrologically adjusted elevation. It is plausible that mean catchment elevations derived from this data will be less accurate in regions with rapidly varying topography, where catchment slopes are steep compared to the MERIT Hydro resolution.

    The SoilGrids database uses observations of approximately 150,000 soil profiles, pseudo-observations that encode expert knowledge in a similar way to actual observed soil profiles, and machine learning to provide global estimates of various soil properties at a 250-m resolution. Ten-fold cross-validation of the resulting sand, silt, and clay percentage data used in our workflow shows that this approach explains approximately 75% of the variation in these soil properties. There is no systematic over or under prediction of these properties, but large differences between estimates and observations exist nonetheless in certain cases (Hengl et al., 2017).

    MODIS MCD12Q1_v6 data use a combination of random forests, bias and error correction based on ancillary data, and a hidden Markov model approach to convert preprocessed satellite reflectance imagery into land cover classification categories. Ten-fold cross-validation of the resulting classification indicates that the IGBP classes used in our workflow are accurate in approximately two thirds of cases. Misclassifications tend to occur in regions that contain substantial land cover variability at scales smaller than the 500-m resolution at which MODIS data are provided, and along climatic gradients where the cover type changes gradually (Sulla-Menashe et al., 2019).

    We therefore recommend that users replace our chosen global data products with more appropriate local data if such data are available and the project scope lies within the data domain. Due to the modular nature of the workflow, this replacement requires only minimal changes to the model configuration code. In terms of Figure 7, incorporating a different data set would require a new data-specific preprocessing module for which our existing workflow can serve as a guide. We emphasize that this workflow is intended to provide a baseline configuration upon which a user can improve. Our workflow does not contain any elements that compare the resulting simulations to observations to ascertain the quality of these simulations. A model setup generated through this workflow should thus not be assumed to be fit for a given purpose, unless shown to be so by the user's own model evaluation procedures.

    Data Availability Statement

    The latest version of the workflow code presented in this study is available on https://github.com/CH-Earth/CWARHM, with the specific version used to generate Figures 4–6, A1, and A5–A8 via https://dx.doi.org/10.5281/zenodo.7134868 (Knoben, Marsh, & Tang, 2022), accessible under GNU GPL v3.0.

    The SUMMA (Clark et al., 2015a, 2015b; Clark, Zolfaghari, et al., 2021) versions used for simulations in this paper can be identified by Git commit ID edd328c8c2e7b81c3b222d4c7d2544769036fd45 (global domain excluding North America) and Git commit ID 3d17543db618cb5b9c7600d6d0de658943056c93 (North America domain and Bow at Banff domain). Source code accessible on https://github.com/CH-Earth/summa under the GNU GPLv3 license.

    The mizuRoute (Mizukami et al., 2016, 2021) versions used for simulations in this paper can be identified by Git commit ID 137820620f624f84f8cdb1d4e9884b8222a3f3df (global domain excluding North America), Git commit ID c2de53d242fc41b94c48119d23b78da1f35719ee (North America domain), and Git commit ID d43066b56a7361f3d4a9c7b07264d7d52a9686f1 (Bow at Banff domain). The source code is accessible at https://github.com/ESCOMP/mizuRoute under the GNU GPLv3 license.

    The single-level ERA5 data (Hersbach et al., 2018) used as meteorological model input data are available at the Copernicus Climate Change Service (C3S) Climate Data Store (CDS) via https://dx.doi.org/10.24381/cds.adbb2d47 under the Licence to use Copernicus Products (https://cds.climate.copernicus.eu/api/v2/terms/static/licence-to-use-copernicus-products.pdf; last access: 4 November 2021).
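    As an indication of how these data can be retrieved programmatically, the sketch below uses the cdsapi package to request a small ERA5 single-levels subset. The variable selection, dates, and spatial extent are illustrative only and do not reproduce the workflow's actual download requests.

```python
# Illustrative ERA5 single-levels request via the CDS API (requires a free CDS
# account and a ~/.cdsapirc credentials file). Variables, dates, and area are
# examples only, not the workflow's actual requests.
import cdsapi

c = cdsapi.Client()
c.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": ["2m_temperature", "surface_pressure"],
        "year": "2008",
        "month": "01",
        "day": "01",
        "time": ["00:00", "06:00", "12:00", "18:00"],
        "area": [52, -117, 50, -114],  # North, West, South, East
        "format": "netcdf",
    },
    "era5_single_levels_example.nc",
)
```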

    The pressure-level ERA5 data (Hersbach et al., 2017) used as meteorological model input data are available at the Copernicus Climate Change Service (C3S) Climate Data Store (CDS) via MARS request (no DOI) under the Licence to use Copernicus Products (https://cds.climate.copernicus.eu/api/v2/terms/static/licence-to-use-copernicus-products.pdf; last access: 4 November 2021). Data were downloaded on 17 April 2021 for the Bow at Banff test case; between 14 November 2020 and 23 December 2020 for the North America test case; and on 20 June 2021 for the global test case.

    The MERIT Hydro Hydrologically Adjusted Elevations (Yamazaki et al., 2019) used as the DEM to determine mean catchment elevations are available at http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro/ (last webpage access on 4 November 2021) as version v1.0.1 (no DOI available; data downloaded on 17 April 2021 for the Bow at Banff test case; on 15 May 2021 for the North America test case; between 3 June 2022 and 2 July 2022 for the global test case), accessible under CC-BY-NC 4.0 or ODbL 1.0.

    The MODIS MCD12Q1 V6 data (Friedl & Sulla-Menashe, 2019; Sulla-Menashe & Friedl, 2018; Sulla-Menashe et al., 2019) used to find a representative IGBP land cover class for each model element are available at the NASA EOSDIS Land Processes DAAC via https://dx.doi.org/10.5067/MODIS/MCD12Q1.006, with no restrictions on reuse, sale, or redistribution.

    The Global USDA-NRCS soil texture class map (Knoben, 2021) derived from the SoilGrids 250m data set (Hengl et al., 2017) and used to find a representative soil texture class for each model element is available as a Hydroshare resource via https://dx.doi.org/10.4211/hs.1361509511e44adfba814f6950c6e742, under ODbL v1.0.

    The shapefiles that contain the catchment delineations for all test cases are derived from the MERIT Hydro basins data set (Lin et al., 2019), which was originally made available for research purposes at http://hydrology.princeton.edu/data/mpan/MERIT_Basins/. The basin discretization and river network files for the Bow at Banff test case are a subset of the original files, with the original basins further discretized into elevation bands. The Bow at Banff shapefiles are provided as part of the workflow repository. For the global test case, the original MERIT Hydro basin and hillslope files were merged into a single shapefile per continent, as were the separate river network files. For the continental test case, the original MERIT Hydro basin and hillslope files were likewise merged into a single shapefile per continent and additionally updated to correct any invalid geometries in basin polygons and to separate coastal hillslope polygons into two separate polygons where the original polygon was intersected by a river segment; the separate river network files were also merged into a single file. The shapefiles that contain the catchment delineation and river network for the global and continental test cases are available as a Hydroshare resource (Knoben, Clark, et al., 2022) via https://dx.doi.org/10.4211/hs.46d980a71d2c4365aa290dc1bfdac823, under CC BY-NC-SA.