The CR of Evaporation: A Calibration‐Free Diagnostic and Benchmarking Tool for Large‐Scale Terrestrial Evapotranspiration Modeling

Monthly evapotranspiration (ET) rates for 1979–2015 were estimated by the latest, calibration‐free version of the complementary relationship (CR) of evaporation over the conterminous United States. The results were compared to similar estimates of three land surface models (Noah, VIC, Mosaic), two reanalysis products (National Centers of Environmental Protection Reanalysis II, ERA‐Interim), two remote‐sensing‐based (Global Land Evaporation Amsterdam Model, Penman‐Monteith‐Leuning) algorithms, and the spatially upscaled eddy‐covariance ET measurements of FLUXNET‐MTE. Model validations were performed via simplified water‐balance derived ET rates employing Parameter‐Elevation Regressions on Independent Slopes Model precipitation, United States Geological Survey two‐ and six‐digit Hydrologic Unit Code (HUC2 and HUC6) discharge, and terrestrial water storage anomalies from Gravity Recovery and Climate Experiment, the latter for 2003–2015. The CR outperforms all other multiyear mean annual HUC2‐averaged ET estimates with root‐mean‐square error = 51 mm/year, R = 0.98, relative bias of −1%, and Nash‐Sutcliffe efficiency = 0.94, respectively. Inclusion of the Gravity Recovery and Climate Experiment data into the annual water balances for the shorter 2003–2015 period does not have much effect on model performance. Similarly, the CR outperforms all other models for the linear trend of the annual ET rates over the HUC2 basins. Over the significantly smaller HUC6 basins where the water‐balance validation is more uncertain, the CR still outperforms all other models except FLUXNET‐MTE, which has the advantage of possible local ET measurements, a benefit that clearly diminishes at the HUC2 scale. As the employed CR is calibration‐free and requires only very few meteorological inputs, yet it yields superior ET performance at the regional scale, it may serve as a diagnostic and benchmarking tool for more complex and data intensive models of terrestrial evapotranspiration rates.


Introduction
Globally, about 70% of the precipitation over land returns to the atmosphere via terrestrial evapotranspiration (ET; Oki & Kanae, 2006), which at the same time consumes approximately 60% of the energy available at the land surface (Trenberth et al., 2009). The variations in ET are also regarded as an important indicator of hydrological responses to global warming since ET plays a key role in controlling terrestrial water availability and climatic conditions (Seager et al., 2007;Zeng et al., 2017).
While ET can be accurately monitored by a wide variety of ground measurements (e.g., lysimeters, energy balance Bowen ratio, eddy covariance [EC], and scintillometry; Allen et al., 2011), estimation of large-scale, long-term ET remains a difficult task since these in situ techniques typically cover short periods (mainly less than a decade) with limited spatial extent. As a result, a number of approaches employing land surface models (LSM; Cai et al., 2014;Ma et al., 2017;Xia et al., 2016), remote sensing (RS) algorithms (Fisher et al., 2008;Miralles et al., 2011;Mu et al., 2011), data assimilation systems (Lu et al., 2017;Xu et al., 2019), and/or geostatistical upscaling of ground ET measurements (Jung et al., 2011), all with inputs from meteorological as well as surface property data, have been devised to quantify ET over larger areas and longer periods. However, most methods mentioned above require a significant number of soil-and vegetation-related parameters as inputs, which are typically interpolated from limited point-scale measurements/surveys and/or are retrieved from satellite/airborne observations (Masson et al., 2003), thus leading to additional uncertainties. These uncertainties emerge because (i) ground survey of soil profiles is yet relatively scarce in developing countries (Shangguan et al., 2013;Shi et al., 2004) and the simple average of those small-scale soil features will not result in a value representative at the scale of the modeling grid due to typically considerable spatial variability in them, although high-resolution spatially resolved and continuous soil properties maps become available recently in certain developed countries (Chaney et al., 2019) and (ii) translation of remote sensing signals into vegetation traits has great uncertainties because of the existing challenges in the treatment of confounding factors in spectrum-trait relations (Zurita-Milla et al., 2015). For example, a wide range of evaluations (Dai et al., 2019;Mölders, 2005;Teuling et al., 2009) have identified that different LSMs obviously diverge in accuracy for predicting surface fluxes due to empirical parameters (e.g., physiological, phenological, thermal, hydraulic, radiative) employed to represent multiple vegetation and soil types. Moreover, the poor scalability of the algorithms and the limited consideration of subgrid heterogeneity are also great challenges in LSMs for effectively representing the terrestrial hydrological processes, as was highlighted by Samaniego et al. (2017).
While state-of-the-art LSMs and RS models with advanced parameterizations of the surface fluxes are able to simulate large-scale ET in a satisfactory manner, they typically need detailed soil and vegetation data as inputs (e.g., Martens et al., 2017;Mu et al., 2011;Xia et al., 2012). For circumventing the difficulties in representing substantial heterogeneity in soil type, thickness, layering, vegetation cover, and rooting depth on the regional, continental, and/or global scales, the complementary relationship (CR) of evaporation (Bouchet, 1963) may be an appropriate choice for the estimation of ET because it does not require any surface, soil, or vegetation information as input.
The CR predicts the latent heat flux of the (vegetated or bare) land surface over a suitably long period of time, typically five days or longer (to filter out the effect of passing weather fronts (Morton, 1983) that may temporarily disturb the dynamic balance between land ET rate and the resulting moisture content of the air) by relating the ET rate of a small wet patch (ET p ), typically affected by horizontal energy advection (E h ), to that (ET w ) of a wet land surface of regional extent (thus with vanishing E h ) under the same assumed net surface radiation and wind conditions. For almost half a century following the pioneering work of Bouchet (1963), the CR relationship between three evaporation terms: actual ET, wet patch ET (ET p ), and wet-environment ET (ET w ), was thought to be linear in nature. Inspired by the work of Han et al. (2012) on the boundary conditions of the CR, Brutsaert (2015) formulated a more general, nonlinear version of the complementary relationship of evaporation.
Recently, Szilagyi et al. (2017), building on the latest developments of the nonlinear formulation of the CR by Brutsaert (2015) and Crago et al. (2016), proposed a calibration-free nonlinear CR model with more appropriate physical constraints. While the CR model of Brutsaert (2015) was originally intended to be generally applicable, it still contains a tunable parameter on top of its reliance of accurate plot-scale EC measurements (e.g., Brutsaert et al., 2017;Crago & Qualls, 2018;Han & Tian, 2018;Hu et al., 2018) for the calibration of a second model parameter. In the context of large-sample hydrological studies, however, Gupta et al. (2014) argued that the flexibility and transferability of a model cannot be clarified without evaluations over different regions encompassing various land surface and climatic conditions. This highlights the need of assessing any ET model's performance over multiple regions with hydroclimatic regimes that vary considerably. To this end, an appropriate way of setting the value of the key parameter, the Priestley-Taylor (PT) α (Priestley & Taylor, 1972) of the CR, becomes vital for large-scale model applications because it would be particularly difficult to calibrate this parameter at a grid-cell basis in continental-and/or global-scale applications due primarily to the unavailability of measured ET data on a cell-by-cell basis (Ma et al., 2019). Even if measured ET data from a limited number of locations were available for calibrating this model parameter, it may not be directly transferable to other regions or temporal scales (Clark et al., 2017;Samaniego et al., 2010;Zink et al., 2018). To avoid this problem, Szilagyi et al. (2017) proposed a method of identifying wet cells in regional-to-continental-scale grids of meteorological data for obtaining an appropriate constant value of the PT coefficient without resorting to any ET measurements (or their proxies). By inverting the Priestley-Taylor equation over those wet cells, and having an estimate of the wet surface temperature (Szilagyi, 2014), the PT α could be expressed with the help of temperature and humidity gradients between the wet surface and the air, thus making such a CR model calibration-free for estimating large-scale ET. See the Appendix for a brief explanation of the CR theory and some of its historically widely used model versions.
While preliminary evaluations indicated that this calibration-free CR model could, though with a minimum number of input variables, yield ET estimates on a par with current LSM outputs (Szilagyi, 2018a(Szilagyi, , 2018bSzilagyi et al., 2017), such a conclusion was based on an assessment of only one specific LSM product within the North American Regional Reanalysis data (Mesinger et al., 2006). Therefore, it is not clear whether this calibration-free CR model would perform better (or worse) than other ET products generated by (i) remote sensing (RS) models; (ii) atmospheric reanalysis; (iii) spatial upscaling of EC flux tower measurements; and (iv) other LSMs, for example, those widely employed in the second phase of the North American Land Data Assimilation System (NLDAS-2; Xia et al., 2012). Additionally, the comparisons made by Szilagyi et al. (2017) focused mainly on the mean annual ET rates of the six-digit Hydrologic Unit Code (HUC6) basins, while the CR model's (in long-term studies very important) ability to capture tendencies (e.g., linear trends) in annual ET rates in a wide variety of climates of the conterminous United States (CONUS) still remains largely unknown. In this context, a key question that needs also to be addressed is how well current main-stream ET products and also the CR are able to capture trends in annual ET.
In this study, to fully test whether the latest calibration-free CR model improves upon already existing ET estimates across CONUS, long-term (>30 years) water balance data from 18 two-digit HUC (HUC2) and 334 HUC6 basins were employed to evaluate the CR model when run with monthly aggregated inputs and to compare its performance to that of eight other ET products (i.e., three from LSMs, two from reanalysis, two from RS models, and an upscaling of EC measurements). The comparisons are focused on their skills in describing (i) the multiyear mean annual HUC2-and HUC6-averaged ET rates, (ii) the linear trends in their annual ET rates over the individual model coverage within the 37-year period of 1979-2015, and (iii) interannual variability of basin-averaged ET rates during corresponding periods. A series of statistical metrics were also applied to assess how the CR and the main-stream ET models perform in simulating large-scale, long-term ET rates over the CONUS.

Complementary Relationship-Based ET Estimates for 1979-2015
The Szilagyi et al. (2017) formulation of Brutsaert's nonlinear CR model (2015) was employed for large-scale ET estimation across the CONUS, relating two dimensionless evapotranspiration terms in the form where y and X are defined as Here E is the actual, while E p is the potential ET rate, that is, the evapotranspiration rate of a small wet patch in a drying (i.e., not fully wet) environment, typically expressed by the Penman (1948) equation as where Δ (hPa/°C) is the slope of the saturation vapor pressure curve at air temperature, T (°C), and γ is the psychrometric constant (hPa/°C). R n and G are the net radiation and soil heat flux into the ground (the latter typically negligible at a monthly scale) in water equivalent of mm/day, respectively. e* and e [=e*(T d )] are the saturation and actual vapor pressure of the air (hPa), T d is the dew-point temperature, and f u is the wind function containing the 2-m wind speed (u 2 ; m/s), that is, E w is the wet-environment ET rate, observed over a regionally extensive well-watered surface, specified by the Priestley-Taylor equation (Priestley & Taylor, 1972), that is, Note that equation (6) was derived for completely wet environments by Priestley and Taylor (1972), and therefore, Δ should be evaluated at the air temperature, T w (°C), observed in a wet environment, instead of the typical, drying environment T (Szilagyi, 2014;Szilagyi & Jozsa, 2008). This is important since previous studies have found that the difference between these two may reach 5-10°C (Huntington et al., 2011;Ma et al., 2015;Ortman, 2009;Szilagyi, 2014). By making use of a mild vertical air temperature gradient (de Vries, 1959;Szilagyi, 2014;Szilagyi & Jozsa, 2009) observable in wet environments (as R n is consumed predominantly by the latent and not by sensible heat flux), T w can be approximated by the wet surface temperature, T ws (°C). Note that T ws may still be larger than T when the air is close to saturation, but not T w , due to the cooling effect of evaporation, and in such cases T w should be capped by T (Szilagyi, 2014;Szilagyi & Jozsa, 2018). Szilagyi and Schepers (2014) demonstrated that the wet surface temperature is independent of areal extent; thus, T ws can be obtained by iteration from the Bowen ratio (β; Bowen, 1926) of a small wet patch (assuming that available energy for the wet patch is close to that of the drying surface) for which the Penman equation is valid, that is, Here e*(T ws ) is the saturation vapor pressure at T ws . Parameter α in equation (6) is the dimensionless Priestley-Taylor coefficient, with typical values from the range of [1.1-1.32] (Morton, 1983). For large-scale model applications when measured ET is lacking for the calibration of α, Szilagyi et al. (2017) proposed a novel method (see Appendix for details) for assigning an appropriate value of α by automatically identifying wet grid cells and utilizing observed gridded T and humidity data over them. The α value of 1.15 derived by Szilagyi (2018b) for the conterminous United States was retained for the present CONUS-wide ET simulation.
E p max is the maximum value that E p can, in theory, reach during a complete dry-out (i.e., when e a is negligible) of the land surface, that is, in which T dry (°C) is the dry-environment air temperature. The latter can be estimated from the adiabat of an air parcel in contact with the drying surface under constant R n − G (Szilagyi, 2018a), that is, where T wb (°C) is the wet-bulb temperature. T wb is derived by another iteration of writing out the Bowen ratio for adiabatic changes (Szilagyi, 2014) as Please refer to the Appendix for (i) a graphical illustration of the saturation vapor pressure curve, and the different temperatures and related ET rates defined above; (ii) a brief historical description of how the CR evolved into equation (1) over the past 40 years; (iii) plots of selected CR functions over sample data; (iv) how assigning a value of α is performed without resorting to any calibration; and (v) a pseudo-algorithm.

Water Resources Research
Equation (1) was applied in a continuous monthly simulation over the 37-year period of 1979-2015 across CONUS, employing the 4-km spatial resolution Parameter-Elevation Regressions on Independent Slopes Model (PRISM; Daly et al., 1994) air and dew-point temperature data. The 32-km North American Regional Reanalysis surface net radiation and 10-m wind data (Mesinger et al., 2006) were linearly interpolated onto the PRISM grid employing a power transformation (Brutsaert, 1982) of the 10-m wind (u 10 ) values into u 2 [=u 10 (2/10) 1/7 ], required by equation (5). The CR-derived monthly ET rates were then aggregated into annual sums for further evaluations.

Long-Term Large-Scale ET Products
Eight long-term, publicly available ET products were selected to represent mainstream approaches in largescale ET estimations, that is, LSMs, RS models, reanalysis, and spatial upscaling of EC measurements. Table 1 provides an overview of the eight products, which includes the following. 2.2.1. LSM-Based Products: Noah, VIC, and Mosaic Three LSMs, Noah (Chen & Dudhia, 2001), VIC (Liang et al., 1994), and Mosaic (Koster & Suarez, 1996), from NLDAS-2 were selected to represent ET estimates from the LSMs. While these LSMs were developed by different groups for use in global climate models, the underlying principle they calculate ET is essentially the same. They all use a resistance-type stress factor to adjust the Penman-Monteith-based (when surface resistance is set to zero; Monteith, 1965) potential ET rate to represent actual ET. However, there are substantial differences among the models in parameterizations of soil water flow, stomatal conductance, root water uptake, etc., which all impact the estimated stress factor (Xia et al., 2016) in these LSMs (see Figure 2 for the differences in LSM-modeled ET rates).
All three LSMs are driven by NLDAS-2 atmospheric forcing (air temperature, specific humidity, wind speed, surface pressure, incoming solar and longwave radiation, and precipitation) at a spatial resolution of 0.125° ( Xia et al., 2012), which are derived primarily from identical forcing fields of North American Regional Reanalysis data by the National Centers of Environmental Protection (NCEP), except for precipitation. NLDAS-2 precipitation comes from the NOAA Climate Prediction Center's 0.125°gauge-based precipitation analysis, with monthly PRISM adjustments for orographic impacts. The soil parameters for these LSMs are determined by the global 1-km hybrid State Soil Geographic Database (Miller & White, 1998). The ground land cover classification for the NLDAS-2 LSMs was based on the global, 1-km vegetation database of the University of Maryland (Hansen et al., 2010), which was retrieved from the advanced very high resolution radiometer. For each 0.125°model cell, the vegetation field considers the relative frequency (RF) value of each vegetation class established on the 1-km resolution. Noah uses the predominant vegetation class, while VIC and Mosaic use subgrid vegetation tiles weighted by the RF of the classes. Additional vegetation greenness fractions were derived from the NOAA normalized difference vegetation index data (Gutman & Ignatov, 1998) on a multiyear-mean monthly basis without any interannual variation. For a thorough introduction on NLDAS-2 LSM forcing, parameters, configuration, and outputs, readers are suggested to refer to https://ldas.gsfc.nasa.gov/nldas/NLDAS2model.php. In the present study monthly ET rates from three NLDAS-2 LSMs (Noah, VIC, and Mosaic) were employed for 1979-2015 in a spatial resolution of 0.125°.

Reanalysis Products: NCEP-II and ERA-Interim
Reanalysis data are produced by a specific data assimilation scheme which combines various observations (e.g., ground stations, aircrafts, and/or satellites) and forecast outputs from weather prediction models. Two atmospheric reanalysis data, NCEP-DOE (Department of Energy) Reanalysis 2 (NCEP-II) and ERA-Interim, were selected in the present study. NCEP-II (Kanamitsu et al., 2002) is an improved version of the first global reanalysis data (i.e., NCEP-I) released by NCEP-NCAR (National Center for Atmospheric Research), which fixes numerous problems occurring in the latter, although both are based on the three-dimensional variational data assimilation scheme. ERA-Interim is a global atmospheric reanalysis produced by the European Centre for Medium-Range Weather Forecasts (Dee et al., 2011), which used the most advanced, that is, a four-dimensional variational data assimilation, technique. Three-dimensional variational scheme treats observations that occur within a certain time interval around the target-analysis time as occurring simultaneously with the target time. Four-dimensional variational one, however, uses temporal weights of those near-simultaneous measurements to estimate the target time value. Note that while the spatial resolution of reanalysis data is usually coarse, that is, 2.5°for NCEP-II and~79 km for ERA-Interim, they can, however, be produced in a near-real-time manner, supporting long-term hydroclimatic research. In the present study, monthly ET values from NCEP-II and ERA-Interim for 1979-2015 over the CONUS domain were employed.

RS-Based Products: Global Land Evaporation Amsterdam Model and Penman-Monteith-Leuning
The Global Land Evaporation Amsterdam Model (GLEAM; Miralles et al., 2011) is a two-step method for the estimation of ET, meaning that first the Priestley and Taylor (1972) potential ET rate is obtained which is subsequently adjusted by the help of a stress factor. The latter is a function of root-zone soil moisture and vegetation optical depth and could be derived using microwave satellite data. This model is actually designed using remote sensing observations only, but can also be driven by reanalysis-or station-based forcing to extend its temporal coverage. In the present study GLEAM v3.2a ET product , covering 1980-2015 on a daily time step with a spatial resolution of 0.25°, was selected, driven by net radiation and air temperature from ERA-Interim plus the Multi-Source Weighted-Ensemble Precipitation data, and snow water equivalent on top of the already mentioned inputs. The static input variables of GLEAM include the ground cover fraction from the Moderate Resolution Imaging Spectroradiometer and the soil properties from the Global Gridded Surfaces of Selected Soil Characteristics generated by the International Geosphere-Biosphere Program.
The Penman-Monteith-Leuning (PML) model combines a Penman-Monteith-type equation for plant transpiration, the Leuning model for canopy conductance (Leuning et al., 2008), and the Priestley-Taylor equation for soil evaporation . The global scale 0.5°, monthly PML ET product for 1981-2012 was then generated by Zhang et al. (2017) using the Princeton Meteorological Forcing precipitation, air temperature, vapor pressure, shortwave and longwave downward radiation, wind speed (Sheffield et al., 2006), and the leaf area index data derived from advanced very high resolution radiometer normalized difference vegetation index . The static land cover input for the PML model was also from International Geosphere-Biosphere Program. The PML-obtained ET rates were constrained by the Budyko framework using global streamflow observations at the mean annual scale to ensure an internal water balance (Zhang et al., 2016). It should be mentioned that GLEAM and PML also partition ET into its components (e.g., soil evaporation, plant transpiration), thus representing state-of-the-art RS-based ET products.

Spatially Upscaled Eddy-Covariance Measurement Product: FLUXNET-MTE
FLUXNET-MTE is a spatially upscaled ET product of global eddy-covariance measurements using a machine learning technique called model tree ensembles (MTE) (Jung et al., 2011). The MTE was trained using 29 explanatory variables with measured flux data of 198 FLUXNET towers across a wide range of biomes worldwide, which are mostly located in North America and Europe. FLUXNET-MTE generated a monthly, 0.5°latent heat flux data set for 1982-2011 over the continents. Inputs of the model comprise of (i) the fraction of absorbed photosynthetic active radiation derived from merged remote sensing products of Global Inventory Modeling and Mapping Studies, Sea-viewing Wide Field-of-view Sensor, and Moderate Resolution Imaging Spectroradiometer; (ii) meteorological data from the Climatic Research Unit and the Global Precipitation Climatology Centre; and (iii) a land cover data fused from multiproducts called SYNMAP (Jung et al., 2006). It should be noted that although nearly 30 predictor variables were used to train the MTE, most of them did not vary interannually. Therefore, the FLUXNET-MTE product tends to underestimate the interannual variability of ET (Jung et al., 2011) that translates into subdued linear trend values (seen later).

Water-Balance-Derived ET of the HUC2 and HUC6 Basins Across CONUS
For a standard geographical framework for hydrological research and water resource management, the United States Geological Survey divides the country into successively smaller hydrological units with a unique "hydrological unit code" (HUC; Seaber et al., 1987) identifier. The CONUS has 18 first-level two-digit (i.e., HUC2) hydrological units (Figure 1), which comprise either the drainage area of a major river (e.g., Missouri region) or the combined drainage areas of a series of rivers (e.g., Texas-Gulf region). These HUC2 basins are also further divided into 204 second-level four-digit (i.e., HUC4) and 334 third-level six-digit basins (i.e., HUC6) based on surface topography (Seaber et al., 1987; Figure 1). In the present study, all (except the CONUS averaged mean annual ET rates of Figure 2) evaluations are based on basin-wide ET rates averaged over either HUC2 or HUC6 basins.
Basin-scale water-balance-based evapotranspiration rates (ET wb ) can be derived by where P, Q, and δS are the basin-wide precipitation, stream discharge/runoff, and the changes in terrestrial water storage, respectively. The last term of equation (11) represents the combined changes of water storage in the soils, groundwater, open water bodies, and the snow/ice system, and usually considered small enough to be ignored at an annual (or longer) scale (Senay et al., 2011), that is, Equation (12), however, may not be accurate for smaller basins affected by significant (i) interbasin water exchanges (Szilagyi & Jozsa, 2018) and/or (ii) reservoir regulations of large (relative to mean flow) storage volumes (Han et al., 2015). While the Gravity Recovery and Climate Experiment (GRACE) data (Tapley et al., 2019) enable the community to estimate basin-scale δS in equation (11), its available timecoverage (i.e., 2002-2017) is too short to allow for a reliable estimation of the HUC2/HUC6 mean annual ET wb values and especially for the estimation of long-term tendencies; therefore, equation (12)  Annual δS in the present study was calculated as the difference in terrestrial water storage anomaly of successive Decembers, the latter taken as the arithmetic average of three GRACE products processed by Geoforschungs Zentrum Potsdam; by the Center for Space Research at the University of Texas, Austin; and by the Jet Propulsion Laboratory in Pasadena, CA, after applying the relevant gain factors for each type of data (Landerer & Swenson, 2012). The monthly P and Q data came from PRISM precipitation (version AN81m; Daly et al., 1994) and United States Geological Survey runoff (at both HUC2 and HUC6 scales) records (Brakebill et al., 2011(Brakebill et al., ) for 1979(Brakebill et al., -2015. Altogether seven HUC6 basins ( Figure 1) were removed (about 2% of all HUC6 basins) from the analysis because their ET wb values were outliers from similar values in their neighborhood.
Because of a considerable difference that exists in the spatial resolutions of the selected ET products, including the CR-based estimates, all ET data (also the PRISM P and GRACE-derived δS values) were first resampled into a common 0.125°grid using the nearest-neighbor method, followed by a spatial averaging to calculate basin-wide ET (as well as P and δS) rates for each HUC2 and HUC6 units (see below).
The performance of the nine ET products was measured by the (i) Pearson correlation coefficient (R), (ii) root-mean-square error (RMSE), (iii) relative bias (RB), (iv) ratio of standard deviations (SR) in modeled and water-balance derived values, and (v) Nash-Sutcliffe efficiency (NSE) against water-balance derived ET wb rates for 18 HUC2 and 327 HUC6 basins. The comparisons involved the multiyear mean annual ET rates and the sum-of-squares-fitted linear trends in annual ET for the available data periods (see Table 1) of the ET products, and separately for the overlapping period of the GRACE data and the given ET product.
In addition, the same performance measures were calculated for the annual ET time series of the 18 HUC2 basins to assess how the models capture interannual variability.

Spatial Pattern of the Multiyear Mean Annual ET From CR and Eight Other Products
There is no considerable difference in the spatial pattern of multiyear mean annual ET values among seven products (without VIC and NCEP-II), all generally displaying lower ET rates over the arid regions in the western CONUS and higher values in the humid regions of the southeastern CONUS ( Figure 2). However, it seems that VIC simulates particularly low values over most parts of the eastern CONUS, while NCEP-II does the opposite. A country-averaged multiyear mean annual ET comparison indeed yields the highest value for NCEP-II with 838 ± 60 mm/year (mean ± standard deviation), which is followed by ERA-Interim (653 ± 28 mm/year) and Mosaic (625 ± 23 mm/year), while VIC yields the smallest ET rate of only 435 ± 14 mm/year ( Figure 2). CR yields 534 ± 19 mm/year, which is close to GLEAM's 530 ± 17 mm/year, both are "in the middle" ET rates when comparing all model values. As seen, FLUXNET-MTE displays the lowest interannual variations (10 mm/year) among all the models, mainly because the great majority of the explanatory variables used to train the MTE was assumed static over the years, thereby leading to the observed decreased interannual variability (Jung et al., 2011), which then probably also translates to reduced spatial variance of the resulting ET rates, seen later.

Basin-Scale Evaluation of the Multiyear Mean Annual ET Rates
The water-balance-based (equation (12)) multiyear mean annual ET rates (ET wb ) for the HUC2 basins are displayed in Figure 3. Not surprisingly, the Great Basin (#16; see Figure 1) and the Lower Colorado (#15) are the driest (ET wb = 260-305 mm/year, respectively) while the South Atlantic-Gulf (#03) and Lower Mississippi (#08) the wettest (ET wb = 920-960 mm/year, respectively) HUC2 basins. Figure 4 displays the evaluations of the multiyear mean annual ET estimates against the water-balance-based values over the 18 HUC2 basins for each product, with error bars denoting the interannual variability. Note that the averaging periods throughout the study follow the temporal coverage of the corresponding products (Table 1). While the multiyear mean annual values are not sensitive to the changes in the averaging periods (see Table S1), the trend values, calculated later, are.
In comparison with ET wb , the CR performs the best in estimating the multiyear mean annual ET rate of the HUC2 basins with a NSE value of 0.94 and RMSE value of 51.2 mm/year ( Figure 4 and  Figure 5 illustrates the spatial pattern of the ratio of multiyear mean HUC2-averaged ET from nine products to water-balance derived ET wb across CONUS. It can be seen that the errors of CR are within ±10% for most HUC2 basins with the largest exception for the Lower Colorado (#15). Noah and VIC generally underestimate the multiyear mean annual ET rates over almost all basins, with the most significant bias observed     and CR (RB = −1% in both cases), while the spatial variance is also reproduced the best by GLEAM (SR = 1.01) and the worst by VIC and FLUXNET-MTE (Table 2).
The water-balance derived (equation (12)) multiyear mean annual ET rates (ET wb ) for the HUC6 basins are displayed in Figure 6. While the wettest HUC6 basins (ET wb > 1,000 mm/year) are found within the two wettest HUC2 basins, that is, the South Atlantic-Gulf (#03) and Lower Mississippi (#08), the driest (ET wb < 100 mm/year) basins are found in the California (#18) HUC2 basin.
For the HUC6 basins the FLUXNET-MTE ET rates exhibit the highest NSE value of 0.87 (Figure 7 and  (Figure 7). VIC and ERA-Interim are also among the worst performers with a significant underestimation (RB = −21%) and overestimation (RB = 22%), leading to the lowest positive NSE values of 0.45 and 0.48, respectively (Figure 7). The spatial variance is best captured by the two RS models (PML and GLEAM) plus CR with SR values of 0.98, 0.97, and 1.04, respectively (Table 2).
Again, by switching to the GRACE period for using equation (11), there is no change in the best NSE and RMSE performance order among FLUXNET-MTE, CR, and PML (Table 2). For RB, GLEAM and CR yield the same 1% value. The order for the highest R values remains the same as before; however, CR comes up to the level of PML with an R value of 0.94. NCEP-II performs even worse than before, while VIC and ERA-Interim both improve a bit. For reproducing spatial variability both RS models continue to dominate with the highest SR. Figure 8 presents how the different models predict the multiyear mean annual ET rates of the individual HUC6 basins in a spatially referenced manner. As seen, three models, CR, GLEAM, and FLUXNET-MTE, yield a similar spatial pattern in the modeled ET rates, relative to ET wb values. All of them tend to overestimate ET wb in the northwestern and underestimate it in the southwestern CONUS. Mosaic and ERA-Interim overestimate ET wb in the majority of the HUC6 basins, while NCEP-II seriously overestimates it in almost every basin. The most serious underestimation of ET wb at differing locations can be seen in the CR, Noah, and VIC models.

Spatial Pattern of the Trends in Annual ET Rates From CR and Eight Other Products
While not all products (mainly PML and FLUXNET-MTE) cover the full period of 1979-2015, almost all products demonstrate that annual ET increased over the majority of the upper north-eastern third of the CONUS, while decreasing trends dominated the western parts ( Figure 9). Note again that the trend of a given product is for the period it covers, displayed in Figure 9. It appears that the trends in the reanalysis products, NCEP-II and ERA-Interim, may be unreasonably strong when compared with other products (i.e., the increasing trends in the former and the decreasing ones in the latter). The spatial pattern of the trends from the CR are generally consistent with those of LSMs (Noah, VIC, Mosaic) over most regions, except for the South Atlantic-Gulf HUC2 (#03) basin (e.g., Florida, Georgia, and Alabama) where the CR displays significant increasing trends but the LSMs do not, although GLEAM, PML, and FLUXNET-MTE (even with a shortened modeling period) do so as well. Similarly, in most parts of California the CR-derived ET rates display significant decreasing trends, less apparent in other model outputs, except ERA-Interim.

Basin-Scale Evaluation of the Trends in Annual ET Rates
For every HUC2/HUC6 basin, a linear trend in the basin-averaged annual ET wb values is calculated for the data coverages of the different ET products and also for their overlap with the GRACE period (2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015). Figure 10 displays the spatial distribution of the resulting linear tendencies in annual ET wb over 1979-2015 for the 18 HUC2 basins. West of the Lower Mississippi (#08) and Missouri (#10) HUC2 basins the tendencies are negative, with the largest ones (between −2 and −2.5 mm/year) found  The evaluations of the modeled linear trends at the HUC2-basin scale are presented in Figures 11 and 12 as well as in Table 3. It can be seen that the CR captures the trends most effectively among all the models and model periods considered; that is, CR produces the highest NSE (0.80), R (0.91), and the lowest RMSE Figure 8. Spatial distribution of the ratio of multiyear mean annual HUC6-averaged modeled ET to ET wb rates. In each panel, the period for calculating the multiyear mean values of ET wb follows that of the corresponding ET product.  Figures 11 and 12). Unlike for mean annual ET rates, VIC and Noah also perform relatively well in the estimation of the trends with NSE values of 0.74 and 0.63, respectively. The largest errors are found in the two reanalysis, NCEP-II and ERA-Interim, data ( Figure 11 and Table 3), leading to negative NSE values ( Figure 11). Note that ERA-Interim is the only product that yields decreasing trends in most basins of the eastern CONUS, contrary to what is observed in ET wb data (Figures 10 and 13). Among the RS models, GLEAM and PML provide reasonable estimates for the trends in annual ET; however, the former predicts a very narrow range for the trends in the HUC2 basins (see Figure 12 and SR = 0.49 in Table 3), similar to that of FLUXNET-MTE, this latter one due, most probably, to its underestimation of the interannual variability of the ET values (see section 3.5). When

10.1029/2019WR024867
Water Resources Research considering the three products that do not cover the full range of 1979-2015 (Figure 12), the CR yields better performance in every (except R) statistical measure (Table 3) employed. In the model evaluation of trends, RB should not be weighed heavily since the denominator may be very small if ET wb demonstrates little change over a given period.
The linear trend estimates really deteriorate for the GRACE period ( Figures S1 and S2 and Table 3) due primarily to the shortness (13 years altogether at best, but only 9 years for FLUXNET-MTE and 10 years for PML) of the periods (notice the enlarged horizontal whiskers in the ET wb trend estimates of Figures S1 and S2 in comparison with those in Figures 11 and 12). Note that only the CR and GLEAM produce (similar) positive NSE values for the 13-year period, and also the smallest RMSE and largest R values. For the even shorter periods in Figure S2, there is not much difference between the models. From Figures S3 and S4 and Table 3 it becomes clear that the skill of all products in predicting the waterbalance derived trend deteriorates significantly due to the large uncertainty in the HUC6 water-balanceobtained trends caused by large interannual variability in the data. The resulting whiskers are particularly long and the number of data points is so numerous that displaying these uncertainties became impossible in these graphs. The largest NSE (0.4) and the smallest RMSE (1.53 mm/year) values are provided by VIC, followed by an overall equal performance by Noah and CR (0.32 and 1.6 mm/year, respectively) for the 1979-2015 period ( Figure S3 and Table 3). Similar to HUC2-basin performance, the reanalysis models also end up with negative NSE values for the HUC6 basins. For the shortened periods ( Figure S4), the CR and the relevant models (GLEAM, PML, FLUXNET-MTE) behave about the same during the corresponding periods. With regard to the whole GRACE period of 2003-2015, only the CR and GLEAM could produce positive NSE values of 0.14 and 0.12 (Table 3), respectively; while for the shortened GRACE periods the models involved perform about the same again for the HUC6 basins (Table 3).

Basin-Scale Evaluation of the Temporal Variations of Annual ET Rates
In order to understand better how the different ET models capture interannual variability (IAV) of the ET values, annual time series from the nine ET products plus that of ET wb are displayed in Figures S5 and S6 for each HUC2 basin and each period considered. As expected, ET wb in general display reduced variability for the GRACE period due to changes in annual water storage. As seen, FLUXNET-MTE data displays by far the smallest IAV. While all ET products fail to reproduce the IAV of ET wb in the eastern part of CONUS

10.1029/2019WR024867
Water Resources Research (i.e., left column), it appears that most of them show a more comparable IAV to that of ET wb in the generally more arid western part of the CONUS (i.e., right column). When averaged for all 18 HUC2 basins and over all periods, the mean SR values of most ET products are similar (50-70%)-with the best performing model being Mosaic (SR = 76%)-except for NCEP-II which significantly overestimates the IAV of ET wb ( Figure S7), and FLUXNET-MTE doing the opposite. Figure S7 and Table S2 also display additional model skills in simulating temporal variations in HUC2 basin annual ET rates. As seen, the CR produced the smallest period-combined RMSE (71 mm/year) and largest NSE (−0.24%) values in comparison with the other eight products. Note that for NSE the median was computed instead of the arithmetic average due to frequently observed large negative values in individual HUC2 basins which happens when the mean of model ET differs significantly from water-balance one. Noah yielded the highest overall R value (0.58), while GLEAM and CR succeeded the most (RB of −0.3% and −3%, respectively) with being on target with the HUC2-basin model-ET means.

Discussions
While the water-balance method is often regarded the most accurate avenue for deriving basin-scale ET rates , the uncertainties in such "ground-truth" data are not negligible. First, although the subsurface leakage to adjacent basins should be minimal for HUC2 basins, this may not be true for all HUC6 basins (e.g., in the Sand Hills of Nebraska) whose mean area are only approximately 23,000 km 2 (Bleed & Flowerday, 1989). Besides, in watersheds with an obvious trend in the stored water (e.g., due to groundwater pumping) volumes, equation (12) may be inaccurate. Similarly, large-capacity water storage reservoirs may disturb the annual water balance on smaller watersheds. Second, the United States Geological Survey runoff value for each hydrological unit was generated by weighting the stream discharge data collected at HUC8 stream gauges with the corresponding drainage area and subsequently averaging the so-derived values over the (i.e., HUC6 or HUC2 in our case) basins (Brakebill et al., 2011). Although this data set is regarded as a close surrogate of the natural runoff in hydroclimate studies (Ashfaq et al., 2013;Szilagyi, 2018a), those basins with relatively few discharge gauging stations may be less reliable. Note also that water transfer between watersheds is not always negligible, see, for example, the Colorado River water transfer into California (Hanak et al., 2011).
Third, PRISM precipitation data (Daly et al., 1994) may not be perfectly appropriate for computing long-term tendencies for a given location (or grid cell) since certain stations experienced location and/or equipment changes, which may potentially lead to inhomogeneity in the time series of the data in that location/cell. This effect however vanishes for the HUC2 and most likely for the majority of the HUC6 basins as well, because such changes can affect only basins with very few precipitation stations. Fourth, the original spatial resolution of GRACE plus the nine ET products differ significantly (range of 0.125°to 2.5°). Accuracy of the basin-wide ET estimates therefore may deteriorate to some extent over small watersheds. For this reason, basin-wide ET wb and modeled ET rates may be more reliable for HUC2 than for HUC6 basins. In general, future detailed studies on the accuracy of long-term, multi-level basin-wide P, Q, and δS could benefit a comprehensive evaluation of the large-scale ET products.
It should be noted that the accuracy of any ET product is highly dependent on the quality of inputs used to drive the model. With regard to the eight ET products considered in the present study, only Noah, VIC, and Mosaic from NLDAS-2 were intended exclusively for the CONUS where high-resolution, more reliable atmospheric forcing, soil and vegetation data could be obtained (Xia et al., 2012), while the other five ET products cover the whole globe with a limited number of available atmospheric forcing and soil/vegetation property data sets. For example, atmospheric forcing of the GLEAM v3.2a product  includes precipitation from the Multi-Source Weighted Ensemble Precipitation data set (Beck et al., 2017) together with air temperature and radiation data from the ERA-Interim reanalysis (Dee et al., 2011). It is therefore believed that the quality of such global forcing is not as high as those developed exclusively for the CONUS, that is, PRISM temperature and precipitation data. Note that while NLDAS-2 LSMs were driven by better atmospheric forcing and soil/vegetation data across the CONUS (Xia et al., 2012), they did not show much improved skills in comparison with GLEAM, PML, and FLUXNET-MTE, indicating great challenges due to not only parameter uncertainties (Clark et al., 2017;Mendoza et al., 2015;Mölders, 2005) but also the representation of various physical processes related to ET, such as vegetation dynamics (Ma et al., 2017), surface exchange coefficients (LeMone et al., 2008), and root water uptake , which would potentially be upgraded in future NLDAS LSMs (Xia et al., 2016). Finally, most LSMs and reanalysis products do not consider anthropogenic activities such as agricultural irrigation, which may impact both regional ET and precipitation to a large extent (Szilagyi, 2018a;Szilagyi & Jozsa, 2018).

Summary and Conclusions
This study evaluated the performance of the latest calibration-free CR model together with eight other mainstream methods (LSMs, RS-based models, reanalysis products, and a spatial upscaling of eddy-covariance measurements) of estimating monthly ET rates and their multiyear linear tendencies with the help of water-balance-derived annual ET rates (ET wb ) over 18 HUC2 and 327 HUC6 basins within the CONUS.
At the HUC2 basin level where the ET wb estimates are expected to be more accurate (due to a higher number of precipitation stations, more accurate discharge measurements in larger rivers, and a smaller relative effect of any possible interbasin water transfers and/or reservoir regulations on the water balance), the CR produced the highest NSE and best RMSE values among the nine products for the multiyear mean annual ET rates, independent of the different evaluation periods (due to somewhat different data coverage among models). The CR's RB value was matched/exceeded only by GLEAM, the latter also excelling in SR. Similarly, the CR's (spatial) R value was only surpassed by Noah. Still at the HUC2 level, the CR produced the highest R and NSE values together with the best RMSE for the periods considered in the estimation of the linear tendencies in annual ET rates. In the RB value FLUXNET-MTE excelled, followed by the CR, while in the SR value Noah and CR did so.
At the HUC6 level where the water-balance derived ET rates are less reliable for the above reasons, all models' performance worsened. For the mean annual values FLUXNET-MTE achieved the highest NSE and best RMSE values closely followed by CR and PML, in that order. The RB value of GLEAM reflected almost perfectly unbiased estimates, again closely followed by CR. In the R value Noah excelled, followed by FLUXNET-MTE, PML, and CR, in that order. However, SR was the best for PML and GLEAM in both full and GRACE periods, followed by CR. For the linear tendencies all model-performance measures further declined. Yet the CR managed to produce the lowest RB and the best RMSE values overall. CR, VIC, and FLUXNET-MTE produced about the same R values, while CR together with VIC and GLEAM showed comparable NSE values during the full period. The SR value was the best for ERA-Interim and Mosaic for the full and GRACE period, respectively.
Trend-performance statistics are sensitive to the actual data period (especially if the number of data points is low) when the fitting is performed in the typical least sum-of-square-sense (employed here too), as such fitting is not robust (well known in statistics) since outliers exert a disproportionally large weight in the fitting process making it sensitive to such values and errors in the data. Note that here trends served as another tool of comparing model performance, and not the focus of analysis of this study.
Among the models and periods considered CR, GLEAM, PML, and FLUXNET-MTE never produced any negative value for the spatial NSE.
When the combined (i.e., for full and GRACE periods) performance statistics of the modeled annual ET time series of the 18 HUC2 basins are considered (Table S2) then CR excelled in RMSE and NSE, GLEAM in RB, Noah in R, and Mosaic in SR. Only VIC and PML never excelled in any category.
It can be stated that altogether the CR outperformed all other models (even the one which utilizes measured ET fluxes) considered in this study for the estimation of the multiyear mean annual ET value and linear tendency in the annual values at the HUC2 level across the CONUS and was only outdone by FLUXNET-MTE at the HUC6 level. The CR was especially strong with the estimation of the linear trend for the HUC2 basins. This seems quite a feat if one considers that the model was not calibrated against water-balance data as it is calibration-free (however, setting the PT α value is required; detailed in the Appendix) and employs only a minimum number of input variables, such as net surface radiation, wind speed, and air-and dew-point temperature (or any other humidity measure) which also make it easily applicable on a global scale (with the PT α possibly set by, e.g., continents). Future research may clarify if setting the PT α may require further regionalization techniques such as discussed by Samaniego et al. (2010). As of today it is not clear what surface and/or boundary layer flow parameters the value of the PT α depends on despite of several exploratory studies (de Bruin, 1983;de Bruin & Keijman, 1979;Heerwaarden et al., 2009;Lhomme, 1997;Szilagyi, 2015).
For these reasons, it is believed that the CR model here described could serve as a benchmark to other ET estimation models requiring a significantly larger number of input variables often obtained by state-ofthe-art data acquisition (often remotely sensed) and assimilation systems while running on strong computer platforms. These authors believe that this CR could especially benefit the LSMs of current climate models with their calibration (when water-balance data or surface flux measurements of latent and sensible heat are absent or of poor quality) and "back-evaluation" to verify that the atmospheric measurements are indeed in balance with the LSM-predicted latent heat fluxes, especially in future climate scenarios when water balance data are nonexistent for verification. Note that the CR is only a diagnostic tool for the ET rates, as it estimates the cause (ET) from the effect (moisture content and temperature of the air), and therefore, it cannot be employed for predictions in the same sense as those models that predict the ET rate at a future time (i.e., at t + dt) from, for example, the soil moisture and vegetation status obtained at time t.
The success of the present version of the CR lies in its tracking the state of an air parcel, in contact with the surface, via adiabatic processes. It directly relates the surface latent heat flux averaged over a suitable spatial and temporal scale (allowing for a full adjustment of the atmospheric boundary layer to surface properties and fluxes) to the average state of the air overlying the land surface and not through a proxy, like soil moisture. It also makes use of the possible limit values in regional evaporation rates under a given surfaceavailable energy and wind conditions together with how fast evaporation rates may depart from their limit values with changing atmospheric conditions of air humidity and temperature, that is, the very same atmospheric response that changing ET rates triggered in the first place.

Appendix A
Under a constant R n − G term at the drying surface the sum of the latent (LE; i.e., the evaporation rate expressed in energy flux units) and sensible (H) heat fluxes is also constant, from which it follows that a change in one will cause a similar but opposite change (δ) in the other, that is, δLE = −δH. The corresponding response in terms of vapor pressure (e) and temperature (T) of the air in contact with the surface will follow the adiabatic line of Figure A2 (Monteith, 1981). With H changing, the surface temperature (T s ) will also change proportionally (proportionality denoted from here on by the ∝ symbol), which will affect the air temperature, so that one can write -δLE = δH ∝ δT s ∝ δT = −δe/γ (the latter from the adiabat), thus obtaining δLE ∝ δe/γ. By normalizing the changes with their maximum range, and taking advantage of the linear nature of the adiabat, yields the following nondimensional form Under a constant R n − G and f u terms T w (similar to T wb ) stays constant [and thus e (T w ) too, which then can be replaced by any other constant in the proportionality, such as e*(T w ) or e*(T wb )] under adiabatic changes of the surface air (Monteith, 1981;Szilagyi & Schepers, 2014). Therefore, the proportionality (equation (A1)) can be written as In the classical versions of the CR (Brutsaert & Stricker, 1979;Morton, 1983), changes in the vapor pressure are replaced by corresponding changes in the E p term normalized also by E w as E is in equation (A2) and are inserted into a first order (i.e., linear) polynomial (as a first guess for the unknown functional relationship) by considering that 1 ≤ E p /E w ≤ 2, while E/E w takes up values of 1 and 0, respectively, at the limit/boundary values of 1 and 2. Note that the upper limit value of E p /E w assumes, in addition to linearity, a symmetrical relationship between δ(E/E w ) and δ (E p /E w ) around the shared value of unity ( Figure A2). The solution for the resulting set of linear equations thus becomes The replacement of e(T) [=e*(T d )] by a linear function of E p in equation (A2) works because the second term of the Penman equation for E p contains the vapor pressure deficit [VPD = e*(T)e(T)] which changes in accordance ( Figure A1) with e(T) (but opposite in sign) along the adiabatic line [and slightly moderated by the change in the γ/(Δ + γ) term of equation (4) with temperature]. As seen from Figure A1, the VPD term is much more sensitive to temperature changes than the vapor pressure term and may explain its better success with predicting ET.
Equation (A3) yields two curves (one for E/E w and another for E p /E w as functions of E w /E p , the latter a possible measure of water availability on the land surface) in Figure A2a, axisymmetric around the reference level of unity, when E = E w = E p . As the land dries out of a completely wet stage (the [1, 1] coordinate point in the graph), E is decreasing (therefore, T is increasing and with it e decreases along the adiabatic line of Figure A1) with water availability, and therefore, E p is increasing due to corresponding increases in VPD.
With a different grouping of the evaporation terms, equation (A3) can equally be brought into a form such as Figure A2b. Note that the only way of ensuring that the theoretical lines overlap the measured data is via the choice of the PT α value of equation (6). As a result, the same water-balance derived E wb /E p values may appear in different locations (are shifted horizontally) in the graphs, depending on the α value that fits the given CR method.
As Kahler and Brutsaert (2006) and Szilagyi (2007) noted, there is no physical necessity for the CR to be symmetric since E p can be interpreted in several ways, the Penman equation just one of them. E p can equally be represented by different evaporation-pan data or by different wind functions (Ma et al., 2015) in the Penman equation. Therefore, Kahler and Brutsaert (2006) modified equation (A3) but still keeping its linear nature between its terms. The unity and zero values of E/E p will now take place at the corresponding limit/boundary values of 1 ≤ E p /E w ≤ 1 + b, where b > 0. The solution thus becomes See Figure A2 for the corresponding curves with the b = 1.3 choice, as illustration. The CR curves for E/E w and E p /E w in Figure A2a are no longer symmetric.
By allowing for a higher-order (i.e., third-order) polynomial between y = E/E p and x = E w /E p and taking into consideration the following boundary conditions (BC): (i) y = 1 at x = 1, (ii) y = 0 at x = 0, (iii) dy/dx = 1 at x = 1, and (iv) dy/dx = 0 at x = 0, Brutsaert (2015) obtained the y = 2x 2x 3 nonlinear CR. When written out in the original variables it yields To make it more flexible Brutsaert (2015) introduced an additional parameter, c, which thus yields y = (2 − c)x 2 + (2c − 1)x 3 − cx 4 . Szilagyi et al. (2016) warned that −1 ≤ c ≤ 2 must be met for (i) a monotonic functional relationship between E/E p and E w /E p as well as (ii) making sure that y ≤ x for 0 ≤ x ≤ 1, which means that E ≤ E w must always be met. With the c = 2 choice, displayed in Figure A2, one obtains and respectively. Szilagyi et al. (2016) pointed out that unless E w is zero, the E w /E p term will always be larger than zero (thus, the lower BC is misconstrued in general) as the VPD term in the Penman equation is bounded by e*(T dry ) (see Figure A1) and wind velocities at 2-m height above the ground are also limited in value. As a remedy, BC (ii) can be replaced by (ii)' y = 0 at x = d (0 ≤ d < 1) since y may vanish already at x = d . To make the model even more flexible, Szilagyi et al. (2016) let the slope (s) of the nonlinear CR function be larger than zero at the origin, d, via the new BC (iv)' Figure A2. Theoretical relationships between the nondimensional terms of the different CR versions discussed in the Appendix, overlain a sample data set of mean annual HUC6 values across CONUS (data source: Szilagyi et al., 2017). No parameters (where appropriate) were calibrated for the data displayed which here serve only a demonstrational purpose between theoretical lines and data points. B2015 = Brutsaert (2015) dy/dx = s at x = d, with s ≥ 0. Solution of the third-order polynomial of y = a 3 x 3 + a 2 x 2 + a 1 x + a 0 thus becomes with the admissible range for the slope as 0 ≤ s ≤ (2d + 1)/(1 − d), resulting from the (monotony and PT limit line) constraints above on the behavior of the expected CR function. Note that equation (A11) is valid for x ≥ d; when x < d, y = 0 is expected . For the E/E w term the polynomial becomes (with the same coefficients above) See Figure A2 for the two sample curves when d = 0.25 and s = 0.
The nonlinear CR solution of Szilagyi et al. (2016) improves upon the unrealistic boundary value choice of E w /E p = 0 by Brutsaert (2015) by exploiting that E p is bounded. The unrealistic zero value of x by Brutsaert (2015) leads to similarly unrealistic low values of the PT α (=1.03 in Figure A2) in Brutsaert's (2015) solution (see Brutsaert et al., 2017), and also to the present choice of the largest possible value of c (=2), the latter making the CR curve as wide as possible in Figure A2, both necessary to bring observations match with theory in Figure A2. The problem though with the improved BC of Szilagyi et al. (2016) becomes that it is just a constant, while the maximum value (E p max ) of the E p term may change by the measurement period. These E p max values for each measurement period however can be estimated by the Penman equation through the e*(T d ) = 0 choice, and by the help of the adiabatic line of Figure A1 for the estimation of the corresponding dry-environment air temperature, T dry (equation (9)), reached when the environment becomes completely dry, or is very close to it. A proper scaling of the x = E w / E p values thus becomes as X = (E p max − E p ) (E p max − E w ) −1 x, which ensures that X indeed is zero when E p has reached its maximum value of E p max Szilagyi et al., 2017) and allows X become unity when the environment is wet, that is, itself can be considered as an index of aridity (A i ), having a value of zero under total lack of moisture conditions and a value of unity in a wet environment. See Szilagyi et al. (2017) for the mean annual value of A i over CONUS. By employing the same BC's Brutsaert (2015) prescribed but now for X, one obtains (Szilagyi et al., 2017) Note the slight fluctuations in the equation (A14) curve of Figure A2a and in the corresponding E p /E w curve (=A i /X) due to the emergence of A i outside of X (but not in equation (A13)).
Let us finally discuss Brutsaert's (2015) BCs of (iii) dy/dX = 1 at X = 1 and (iv) dy/dX = 0 at X = 0, employed now for X. Let us use finite differences for the derivatives and denote the new values the variables acquire through the changes with an index of "2". At X = 1 one can write where the symbol means a positive number very close to the one shown in its argumentum. The corresponding change in X is 10.1029/2019WR024867

Water Resources Research
Thus, dy/dX at X = 1 is unity. The same changes at X = 0 are where the change, E p max − E p2 , in E p is likely much larger than the corresponding change in E (and thus in T, therefore in e along the adiabat), because of the steep slope of the e*(T) curve at high temperatures causing a large change in VPD and thus in E p due to a relatively small change in T (see Figure A1). As a result, dy/dX = 0 at X = 0 can be expected as the E w (E p max − E w ) −1 term can never be too small (will not vanish).
Finally, the nonlinear CR model (equations (A13) and (A14)) with BCs (i) y = 1 at X = 1, (ii) y = 0 at X = 0, (iii) dy/dX = u (≥1) at X = 1, and (iv) dy/dX = v (>0) at X = 0 were derived and tested (the results not displayed) with calibrated values of u and v with the help of water-balance data of the HUC6 watersheds of CONUS to see if relaxing BCs (iii) and (iv) for X would result in better ET predictions. None of these calibrated models performed significantly better than the calibration-free CR version applied in this study, which shows at the very least that the current formulation is indeed highly robust and the assumptions/approximations as a whole, employed in its derivation, are not far-fetched from reality.

Appendix B
Estimation of a spatially and temporally constant value of α can be obtained with the help of the Priestley and Taylor (1972), equation (6), inserted into the Bowen ratio (Bowen, 1926) written for a wet environment as in which all variables are defined in section 2.1. After rearrangement of equation (B1) one obtains For large-scale gridded data, it is possible to identify wet cells as they must satisfy certain predefined requirements, thereby calculating their α values that must fall within the theoretical limits of {1, [Δ (T w ) + γ]/Δ (T w )} (Priestley & Taylor, 1972). Note that when a cell is wet then T w in equations (B1) and (B2) equals the measured T, therefore these equations must be supplied by T during the identification process of the possible wet cells. For cells where the measured T is not equal to T w (i.e., are not wet), the α value of equation (B2) will likely be outside the specified interval. Following Szilagyi et al. (2017), a grid cell is considered "wet" provided that (i) T ws > T + 2°C (note that if a cell is not wet then T ws < T or T ws ≈ T is likely), (ii) relative humidity (RH) is larger than 90%, and (iii) the equation (B2) calculated α value falls into the above specified interval. RH [=100 e*(T d )/e*(T)] was derived using T and T d data from PRISM. T ws comes from equation (7) and here should not be limited by T. The average value of α resulting from such wet cells yielded a value of 1.15 for CONUS, which was retained for use in the current CR model.

Pseudocode of ET Rate (mm/day) Estimations
The calculations may be performed by temporally averaged input values over intervals of integer multiples of a day. Temporal averaging of the input variables five days or longer are expected to yield the most accurate ET estimates due to the assumptions inherent in the complementary relationship (CR) of evaporation. The optimal spatial scale for the CR is 1 km 2 and up.
Input variables: Air temperature (°C) T Output: Estimated actual ET rate, E, in mm/day Start 1. Convert R n and G into water-depth equivalent of mm/day 2. Calculate E p by equations (4) and (5) 3. Calculate T ws by iteratively solving equation (7), cap the value of T ws by T, if T ws > T 4. Calculate T wb by iteratively solving equation (10) 5. Calculate T dry by equation (9) 6. With gridded large-scale data of the input variables follow Appendix B for obtaining α, otherwise choose a trial value from [1.1-1.32] 7. Assign the value of T ws to T w and calculate E w by equation (6) 8. Calculate E p max by equation (8) 9. Calculate the actual ET rate (E) by equations (1)- (3) End The value of α can be calibrated (provided that measured ET rates are available) by trial and error relatively fast as it is typically satisfactory to specify it to two decimals, and as such, there are only 23 distinct values within the [1.1-1.32] interval.