ECOSTRESS: NASA's Next Generation Mission to Measure Evapotranspiration From the International Space Station

The ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS) was launched to the International Space Station on 29 June 2018 by the National Aeronautics and Space Administration (NASA). The primary science focus of ECOSTRESS is evapotranspiration (ET), which is produced as Level-3 (L3) latent heat flux (LE) data products. These data are generated from the Level-2 land surface temperature and emissivity product (L2_LSTE), in conjunction with ancillary surface and atmospheric data. Here, we provide the first validation (Stage 1, preliminary) of the global ECOSTRESS clear-sky ET product (L3_ET_PT-JPL, Version 6.0) against LE measurements at 82 eddy covariance sites around the world. Overall, the ECOSTRESS ET product performs well against the site measurements (clear-sky instantaneous/time of overpass: r² = 0.88; overall bias = 8%; normalized root-mean-square error, RMSE = 6%). ET uncertainty was generally consistent across climate zones, biome types, and times of day (ECOSTRESS samples the diurnal cycle), though temperate sites are overrepresented. The high spatial resolution of ECOSTRESS (70 m) improved correlations by 85%, and RMSE by 62%, relative to 1-km pixels. This paper serves as a reference for the ECOSTRESS L3 ET accuracy and Stage 1 validation status for subsequent science using these data.

The ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS) was launched to the International Space Station (ISS) on 29 June 2018. ECOSTRESS is a thermal radiometer built by the National Aeronautics and Space Administration (NASA) Jet Propulsion Laboratory (JPL) that measures thermal infrared radiation (TIR) in five bands from 8 to 12.5 μm, plus an additional sixth band at 1.6 μm for geolocation and cloud detection (ecostress.jpl.nasa.gov). On board the ISS, which has an irregular orbit (rather than a regular polar or geostationary orbit), ECOSTRESS collects measurements continuously between ~52°N and ~52°S at different times of day. The overpass return frequency for any given spot on Earth is 1-5 days, depending on latitude, with some areas measured multiple times in a single day (particularly the higher latitudes where the ISS orbital direction shifts) within the ECOSTRESS swath width of 384 km (Figure 1). The pixel size at nadir is 38 m × 69 m, which is resampled by the ECOSTRESS data production software to 70-m × 70-m pixels for noise reduction and ease of use. As such, ECOSTRESS now provides a combination of good spatial and temporal resolutions with diurnal cycle sampling.
ECOSTRESS produces four levels of data products, with each increasing level incorporating additional ancillary information. The first data-product level includes raw instrument and calibration information (L0-1A) (Logan & Johnson, 2015), calibrated at-sensor radiances (L1B_RAD), and geolocation (L1B_GEO) (Smyth & Leprince, 2018). The second data-product level incorporates additional data from numerical weather prediction for atmospheric correction (Malakar & Hulley, 2016; Matricardi, 2008; Saunders et al., 1999) to generate land surface temperature and emissivity (L2_LSTE) and a cloud mask (L2_CLOUD) using the Temperature and Emissivity Separation retrieval algorithm also used in other missions (Gillespie et al., 1998; Hulley et al., 2017; Hulley & Hook, 2011; Hulley & Hook, 2018). The third data-product level incorporates additional atmospheric data from MODIS and surface properties from MODIS and Landsat to generate ET (as the latent heat flux, LE), including the ET components of canopy transpiration, soil evaporation, and interception evaporation using the Priestley-Taylor (PT) JPL retrieval algorithm (L3_ET_PT-JPL) (Fisher et al., 2008; Fisher & ECOSTRESS algorithm development team, 2015; Halverson, 2018). An additional ET product (L3_ET_ALEXI) is produced for a subset of agricultural sites in the United States with the Disaggregated Atmosphere-Land Exchange Inverse model (Anderson, Kustas, et al., 2013). Finally, the fourth data-product level includes a stress index based on the Evaporative Stress Index (ESI) (Anderson et al., 2010; Otkin et al., 2013) (L4_ESI_PT-JPL; L4_ESI_ALEXI), and a water use efficiency (WUE) product (L4_WUE) (Fisher & ECOSTRESS algorithm development team, 2018). The WUE product originally incorporated gross primary production from MODIS (Zhao et al., 2005) but switched to a native 70-m gross primary production product retrieved using the Breathing Earth System Simulator algorithm (Ryu et al., 2011).
At the outset, the ECOSTRESS Early Adopters program was the largest in NASA history, with a large community of scientists using ECOSTRESS data for a wide variety of applications (ecostress.jpl.nasa.gov/early-adopters). NASA develops Early Adopters programs for new missions to enable support for learning to use the data products while the mission and data production are still in development. Nonetheless, it is essential that ECOSTRESS data are validated first to provide an assessment of accuracy and error before these scientific investigations can be established. Validation is a necessary and important first step to launch these science investigations forward. First, the L1 and L2 products were validated in Hook et al. (2019), who reported uncertainties for those products at <1 K. The objective of this study is to conduct the initial validation and error assessment of the global ECOSTRESS ET product (L3_ET_PT-JPL). Specifically, we ask: how well does ECOSTRESS capture ET across different biome types and climate zones? Are there biases in ECOSTRESS across different times of day?
To conduct this analysis, we used ET measurements from eddy covariance sites (Baldocchi, 2008; Baldocchi et al., 2001). Typically, eddy covariance data sets require many months to years for systematic collection, organization, consistency, gap-filling, energy balance closure, spike removal, quality flags, and other processing procedures (Baldocchi, 2003; Falge et al., 2001; Foken, 2008; Moffat et al., 2007; Papale et al., 2006; Wilson et al., 2002). In order to develop an initial preliminary and fast error assessment for ECOSTRESS to allow the science community to proceed forward, we collected as large a data set as fast as possible, conducting a rapid processing of data from over a hundred disparate eddy flux sites. Consequently, we classify this as the Land Product Validation Stage 1 (Committee on Earth Observation Satellites; lpvs.gsfc.nasa.gov), or preliminary validation, with the understanding that both future refined eddy flux-based validation data sets and fully processed versions will become available, as well as further reprocessing for future versions of the ECOSTRESS data.
Figure 1. The 384-km swath width, high spatial resolution (70 m × 70 m) and accuracy of ECOSTRESS provide field-scale detail and large coverage of evapotranspiration worldwide. Here, ECOSTRESS is acquiring data over Texas, USA, showing differences between pivot irrigation fields as well as within fields.

ECOSTRESS ET Data
The ET retrieval approach for the ECOSTRESS L3_ET_PT-JPL product is the PT-JPL algorithm (Fisher et al., 2008), which has been widely validated throughout the literature as one of the top performing global remote sensing ET models (e.g., Y. Chen et al., 2014; Ershadi et al., 2014; Gomis-Cebolla et al., 2019; Jiménez et al., 2018; McCabe et al., 2016; Michel et al., 2016; Miralles et al., 2016; Polhamus et al., 2013; Purdy et al., 2018; Talsma et al., 2018; Vinukollu et al., 2011). Through ecophysiological constraint functions, PT-JPL retrieves actual ET by reducing potential ET (PET), starting with the PT equation (Priestley & Taylor, 1972):
$$\mathrm{PET} = \alpha \frac{\Delta}{\Delta + \gamma} R_n$$
where $\Delta$ is the slope of the saturation vapor pressure curve, dependent on near-surface air temperature ($T_a$; °C) and water vapor pressure ($e_a$; kPa), $\gamma$ is the psychrometric constant (0.066 kPa/°C), $R_n$ is net radiation (W/m²), and $\alpha$ is the PT coefficient of 1.26 (unitless); PET is in units of W/m².
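As an illustration of the PT equation, the following minimal Python sketch computes PET from net radiation and air temperature. The constants for α and γ are those given in the text. The function names and the FAO-56 (Tetens-type) expression for Δ, which depends on air temperature only, are assumptions made for this example and are not taken from the ECOSTRESS production code.

```python
import math

GAMMA = 0.066  # psychrometric constant (kPa/°C), value given in the text
ALPHA = 1.26   # Priestley-Taylor coefficient (unitless), value given in the text


def saturation_slope(t_air_c: float) -> float:
    """Slope of the saturation vapor pressure curve (kPa/°C).

    Assumption: FAO-56 Tetens-type form, a function of air temperature only.
    """
    e_sat = 0.6108 * math.exp(17.27 * t_air_c / (t_air_c + 237.3))  # kPa
    return 4098.0 * e_sat / (t_air_c + 237.3) ** 2


def potential_et(rn_wm2: float, t_air_c: float) -> float:
    """Priestley-Taylor potential ET (W/m^2) from net radiation (W/m^2)."""
    delta = saturation_slope(t_air_c)
    return ALPHA * delta / (delta + GAMMA) * rn_wm2


if __name__ == "__main__":
    # Hypothetical inputs: 500 W/m^2 net radiation at 25 °C air temperature.
    print(round(potential_et(500.0, 25.0), 1))
```

Because Δ/(Δ + γ) increases with temperature, the same net radiation yields a larger PET under warmer conditions.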
A series of ecophysiological scalar functions (unitless; 0-1), based on atmospheric vapor pressure deficit ($D_a$; kPa), relative humidity (RH; fraction), and vegetation indices, including normalized difference and soil adjusted vegetation indices (NDVI and SAVI; unitless), simultaneously reduce PET to actual ET, and partition total ET into three sources for canopy transpiration ($ET_c$), soil evaporation ($ET_s$), and interception evaporation ($ET_i$):
$$ET = ET_c + ET_s + ET_i$$
$$ET_c = (1 - f_{wet}) \, f_g \, f_T \, f_M \, \alpha \frac{\Delta}{\Delta + \gamma} R_{nc}$$
$$ET_s = \left(f_{wet} + f_{SM}(1 - f_{wet})\right) \alpha \frac{\Delta}{\Delta + \gamma} (R_{ns} - G)$$
$$ET_i = f_{wet} \, \alpha \frac{\Delta}{\Delta + \gamma} R_{nc}$$
where $f_{wet}$ is relative surface wetness ($RH^4$) (Stone et al., 1977), $f_g$ is green canopy fraction ($f_{APAR}/f_{IPAR}$) (Zhang et al., 2005), $f_T$ is a plant temperature constraint (June et al., 2004; Potter et al., 1993), $f_M$ is a plant moisture constraint ($f_{APAR}/f_{APARmax}$) (Potter et al., 1993), and $f_{SM}$ is a soil moisture constraint ($RH^{D_a}$) (Bouchet, 1963; Fisher et al., 2008). $f_{APAR}$ is absorbed photosynthetically active radiation (PAR), $f_{IPAR}$ is intercepted PAR, $T_{opt}$ (°C) is the optimum temperature linked to plant phenology, and $G$ is the soil heat flux (W/m²) (Purdy et al., 2016). $R_{nc}$ and $R_{ns}$ are $R_n$ for the canopy and the soil, respectively, based on leaf area index derived from NDVI. PT-JPL is run globally and continuously in space and time with no need for calibration or site-specific parameters.
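The partitioning described above can be sketched in a few lines of Python. This is a minimal illustration that follows the published PT-JPL formulation (Fisher et al., 2008) as we understand it, not the ECOSTRESS production code. The function and argument names are hypothetical, and the scalars $f_g$, $f_T$, and $f_M$ are taken as precomputed inputs rather than derived from vegetation indices.

```python
def pt_jpl_partition(rn_canopy, rn_soil, g, delta, rh, vpd_kpa,
                     f_g, f_t, f_m, alpha=1.26, gamma=0.066):
    """Partition ET (W/m^2) into canopy, soil, and interception components.

    rn_canopy, rn_soil: net radiation for canopy and soil (W/m^2)
    g: soil heat flux (W/m^2); delta: slope of saturation curve (kPa/°C)
    rh: relative humidity (fraction, 0-1); vpd_kpa: vapor pressure deficit (kPa)
    f_g, f_t, f_m: precomputed 0-1 scalars (assumed supplied by the caller)
    """
    f_wet = rh ** 4       # relative surface wetness
    f_sm = rh ** vpd_kpa  # soil moisture constraint
    pt = alpha * delta / (delta + gamma)  # shared Priestley-Taylor term

    et_c = (1.0 - f_wet) * f_g * f_t * f_m * pt * rn_canopy
    et_s = (f_wet + f_sm * (1.0 - f_wet)) * pt * (rn_soil - g)
    et_i = f_wet * pt * rn_canopy
    return {"ET_c": et_c, "ET_s": et_s, "ET_i": et_i,
            "ET": et_c + et_s + et_i}


if __name__ == "__main__":
    # Hypothetical midday values for a vegetated pixel.
    print(pt_jpl_partition(rn_canopy=350.0, rn_soil=150.0, g=40.0,
                           delta=0.189, rh=0.5, vpd_kpa=1.5,
                           f_g=0.9, f_t=0.95, f_m=0.85))
```

Note that when RH approaches 1 the surface is treated as fully wet: transpiration is suppressed and the canopy energy goes entirely to interception evaporation.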
Net radiation is computed from its four components:
$$R_n = (R_{SD} - R_{SU}) + (R_{LD} - R_{LU})$$
where $R_{SD}$ is downwelling shortwave radiation, $R_{SU}$ is upwelling shortwave radiation, $R_{LD}$ is downwelling longwave radiation, and $R_{LU}$ is upwelling longwave radiation. The $R_n$ components are retrieved implementing the Forest Light Environmental Simulator (Iwabuchi, 2006; Kobayashi & Iwabuchi, 2008) and Breathing Earth System Simulator (Ryu et al., 2011, 2012). $R_{SD}$ is calculated from eight inputs: (1) solar zenith angle, (2) aerosol optical thickness at 550 nm, (3) cloud optical thickness, (4) land surface albedo, (5) cloud top height, (6) atmospheric profile type, (7) aerosol type, and (8) cloud type. $R_{SU}$ is calculated from broadband surface albedo, which integrates black and white sky albedo, and $R_{SD}$. $R_{LD}$ and $R_{LU}$ are calculated from Stefan-Boltzmann's law using LST, emissivity, and $T_a$ (Prata, 1996; Verma et al., 2016).
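The radiation balance can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, the effective air emissivity is passed in by the caller (the text cites Prata, 1996, for the longwave treatment, which is not implemented here), and reflected downwelling longwave radiation is ignored.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant (W m^-2 K^-4)


def net_radiation(r_sd, albedo, t_air_k, lst_k, emissivity, air_emissivity):
    """Net radiation (W/m^2) from shortwave and longwave components.

    r_sd: downwelling shortwave (W/m^2); albedo: broadband surface albedo (0-1)
    t_air_k, lst_k: air and land surface temperature (K)
    emissivity: broadband surface emissivity (0-1)
    air_emissivity: effective atmospheric emissivity (assumed input, 0-1)
    """
    r_su = albedo * r_sd                         # upwelling shortwave
    r_ld = air_emissivity * SIGMA * t_air_k ** 4  # downwelling longwave
    r_lu = emissivity * SIGMA * lst_k ** 4        # upwelling longwave
    return (r_sd - r_su) + (r_ld - r_lu)


if __name__ == "__main__":
    # Hypothetical clear-sky midday values.
    print(round(net_radiation(800.0, 0.2, 300.0, 310.0, 0.98, 0.8), 1))
```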
ECOSTRESS additionally computes ET using two other models, which are not provided as standard ET data products but are used to produce a multimodel uncertainty product as the standard deviation among the three models: a Penman-Monteith (Monteith, 1965)-based model (PM-Mu) (Mu et al., 2007, 2011) and the Surface Energy Balance System (SEBS) (Su, 2002).
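For a single pixel, the multimodel uncertainty amounts to the standard deviation of three ET estimates. The sketch below uses the population standard deviation; whether the production software uses the population or sample form is not stated in the text, so that choice, along with the function name, is an assumption.

```python
import statistics


def multimodel_uncertainty(le_pt_jpl, le_pm_mu, le_sebs):
    """Standard deviation (W/m^2) among three model LE estimates.

    Assumption: population standard deviation (divides by N = 3).
    """
    return statistics.pstdev([le_pt_jpl, le_pm_mu, le_sebs])


if __name__ == "__main__":
    # Hypothetical LE estimates (W/m^2) from the three models.
    print(round(multimodel_uncertainty(210.0, 185.0, 240.0), 1))
```

The uncertainty is most informative when compared with the magnitude of the retrieved ET: the same spread matters more for a small flux than for a large one.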
PM-Mu partitions total ET into $ET_c$, $ET_s$, and $ET_i$. In the PM-Mu formulation, $C_p$ is the specific heat of air at constant pressure (J·kg⁻¹·K⁻¹), $\rho$ is air density (kg/m³), $g_B$ is aerodynamic conductance of dry canopy (m/s), $g_S$ is stomatal conductance, $g_{vc}$ is aerodynamic canopy conductance, $g_{hrc}$ is wet canopy conductance, $g_{as}$ is soil aerodynamic conductance, and $g_{tot}$ is soil total conductance. Mu et al. (2011) provide constants for the conductances in a biome-specific lookup table.
SEBS is a single-source approach that targets the sensible heat flux (H), calculating ET from the residual of the energy balance:
$$ET = \Lambda_r \, ET_{wet}$$
$$\Lambda_r = 1 - \frac{H - H_{wet}}{H_{dry} - H_{wet}}$$
where $\Lambda_r$ is the relative evaporation, $ET_{wet}$ is the wet limit of ET, $H_{wet}$ is the wet limit of sensible heat flux, $H_{dry}$ is the dry limit of sensible heat flux, and $H$ is the actual sensible heat flux (W/m²). This was later updated to the TSEB model.
The ECOSTRESS L2 product is used for LST and broadband emissivity. Landsat is used for the ancillary surface properties NDVI, SAVI, and albedo. MODIS is used for ancillary atmospheric properties (with cloud gap-filling from the National Centers for Environmental Prediction) and to gap-fill cloudy Landsat surface properties (MODIS surface products are provided as gap-filled multiday aggregates) (Fisher & ECOSTRESS algorithm development team, 2015; Verma et al., 2016). PT-JPL, PM-Mu, and SEBS/TSEB are all forced with the same input data for shared variables, including uniform calculation of, for example, $R_n$ and $G$. The multimodel uncertainty product provides spatiotemporally varying information, necessary because a constant uncertainty value cannot be applied to the ET estimates. This is due to non-Gaussian and spatiotemporally variable controls on ET, derived from Monte Carlo sensitivity experiments, Gaussian error propagation, and Method of Moments uncertainty quantification (Fisher et al., 2005, 2008). More information and detail on all the approaches can be found in the respective references.
Large synthesis databases coinciding with the recent time period of this analysis were not yet available, so we collated sites individually. Potential sites were identified through a wide variety of methods and sources. We developed individualized emails based on research of the contacts and sites, and networks or other connections, to establish a social connection to facilitate response. Data from 35% of the sites (54) came from flux data networks; data from the other 101 sites were received directly from principal investigators (PIs). Coauthorship (and gratitude) was offered for data use. Introduction to other site points of contact (POCs) or PIs by already established partners led to a higher likelihood of positive response and subsequent data contribution and collaboration. Nearly every contact made led to at least an attempt to contribute data. The flux community in general was very supportive and interested in supporting a new NASA mission linked directly to their shared science interests.
Each site POC/PI was contacted to discuss a number of details, including the operational status of their tower from July 2018 onward, verification of site descriptive information, confirmation of the required variables, and ability to deliver data relatively quickly (e.g., <1 month). To facilitate participation, a suggested formatting, delivery, and quality assurance/quality control (QAQC) was offered, but not required. As such, multiple data delivery mechanisms, formats, and QAQC were ultimately ingested. Data came from email attachments, institutional servers and websites, automatic SFTP and FTP downloads, and continuously updated internet-based services such as Google sheets. More than a dozen data formats were received across DAT, ASCII, text, Microsoft Excel (XLSX), comma/tab separated value (CSV/TSV), Hierarchical Data Format (HDF/H5), WINACE (.C**, .m**, .s**), Touchstone/SnP (s34, s46, c00, m34, and m36), and Block Compression (BC1).
The data received varied widely in level of processing, from extensive to raw. As needed, we formatted data for consistency, including renaming data gaps with NaN, conversion of timestamps to ECOSTRESS timestamps, and resampling of time steps finer than 30 min to 30-min time steps (for ease of analysis, understanding that ECOSTRESS does not overpass exactly on the hour/half-hour). We matched data to quality flags and excluded data that were flagged in the source data as either "poor quality" or "not a direct observation" (Foken & Wichura, 1996; Göckede et al., 2008). Given the recency of these data, we identify them as initial estimates, which may be subject to change once they are ingested into repositories and network-wide QA/QC is applied. This limits the quality of our analysis, although the limitation is partially offset by the large number of sites.
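The harmonization step of resampling to 30-min time steps with NaN-marked gaps can be sketched as follows. This is a stdlib-only illustration, not the processing code used in the study. The function name, the choice to label each bin by its start time, and the use of a simple mean of the non-gap values are all assumptions.

```python
import math
from collections import defaultdict
from datetime import datetime


def resample_to_30min(records):
    """Average (timestamp, value) records into 30-min bins.

    records: iterable of (datetime, float); NaN marks a data gap.
    Returns {bin_start: mean of non-gap values, or NaN if the bin is all gaps}.
    """
    bins = defaultdict(list)
    for ts, value in records:
        start = ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)
        bucket = bins[start]  # creates the bin even if it only holds gaps
        if not math.isnan(value):
            bucket.append(value)
    return {start: (sum(vals) / len(vals) if vals else math.nan)
            for start, vals in sorted(bins.items())}


if __name__ == "__main__":
    # Hypothetical 10-min LE record (W/m^2) with gaps.
    data = [
        (datetime(2018, 8, 1, 10, 0), 100.0),
        (datetime(2018, 8, 1, 10, 10), 110.0),
        (datetime(2018, 8, 1, 10, 20), math.nan),
        (datetime(2018, 8, 1, 10, 40), math.nan),
    ]
    for start, mean_le in resample_to_30min(data).items():
        print(start.isoformat(), mean_le)
```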
Finally, we processed the data through the FLUXNET 2015 data processing standards and code for half-hourly energy balance closure (fluxnet.fluxdata.org/data/fluxnet2015-dataset/data-processing) (Foken, 2008; Stoy et al., 2013; Twine et al., 2000; Wilson et al., 2002). The method also produces a range of uncertainty on the eddy covariance measurements. The method requires data availability for LE, $R_n$, H, and G; 30% of the 155 sites were unable to provide all of these data, in which cases the statistical distribution in energy balance closure across all available sites was applied to these sites. The lack of closure across the sites was 10% at the first quartile (Q1, 25th percentile), 30% at the median (Q2), and 50% at the third quartile (Q3, 75th percentile). As such, we applied these closure quartiles to the 30% of sites that did not provide enough data for site-specific closure, flagging them, along with those sites with Q3 energy balance closure, for additional assessment. These sites were given energy balance closures of 30% with uncertainties ranging from 10% to 50%. Ultimately, we found that these flagged sites mostly did not noticeably degrade the comparison against ECOSTRESS.
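To make the closure concept concrete, the sketch below computes the closure ratio for one half-hour and applies a Bowen-ratio-preserving correction in the spirit of Twine et al. (2000). This is a deliberately simplified stand-in: the actual FLUXNET 2015 procedure is more involved, and the function names here are hypothetical.

```python
def closure_ratio(le, h, rn, g):
    """Energy balance closure ratio: turbulent fluxes over available energy."""
    return (le + h) / (rn - g)


def close_energy_balance(le, h, rn, g):
    """Scale LE and H so that LE + H = Rn - G, preserving the Bowen ratio H/LE.

    Assumption: simple ratio scaling, not the full FLUXNET 2015 method.
    """
    ratio = closure_ratio(le, h, rn, g)
    return le / ratio, h / ratio


if __name__ == "__main__":
    # Hypothetical half-hour (W/m^2) with a 30% lack of closure.
    le, h, rn, g = 210.0, 140.0, 550.0, 50.0
    print("lack of closure:", round(1.0 - closure_ratio(le, h, rn, g), 2))
    print("corrected LE, H:", close_energy_balance(le, h, rn, g))
```

In this example the measured turbulent fluxes account for 70% of the available energy, matching the median lack of closure of 30% reported above.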

Analysis
ECOSTRESS instantaneous LE (time of overpass), uncertainty, quality flags, and cloud mask were used in this analysis. For each daytime ECOSTRESS scene collected over a flux tower, a selection of 5 by 5 pixels (70 m × 70 m for each pixel) was extracted centered on the tower coordinates, providing a subset 350-m × 350-m scene for each site. The mean, median, and interquartile ranges were calculated for the 5 × 5 subset, as well as for a further refined subset of 3 × 3 pixels (210 m × 210 m) meant to mitigate impacts from landscape heterogeneity or smaller tower footprints. These subsets were selected based on conservative assumptions of general footprint sizes. Generally, the 5 × 5 and 3 × 3 statistics were not statistically significantly different from one another because eddy flux sites are commonly located in relatively homogeneous landscapes. Nonetheless, some sites (especially agricultural sites) were located in very heterogeneous areas, and differences in these calculations became more prevalent. We mostly used the 5 × 5 subset (70%) but visually examined each site in Google Earth to determine when the 3 × 3 subset should be used instead (30%). Ideally, the spatially and temporally varying coordinates of each tower footprint ("footprint-aware") would be provided and ingested. This was not available, however; as such, the overall assessment of ECOSTRESS error also includes some (unquantified) error due to footprint uncertainty (Chasmer et al., 2011;DuBois et al., 2018;Montaldo & Oren, 2016;Xu et al., 2017). Dynamic footprint information should be included in future validations. For comparison, we also evaluated a 1-km box around each tower (not all sites were available for this larger comparison).
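The window extraction described above can be sketched with the Python standard library. The function name and the nested-list scene layout are hypothetical, and the sketch assumes that the tower pixel lies far enough from the scene edge for a full window and that the window contains no missing values.

```python
import statistics

PIXEL_SIZE_M = 70  # resampled ECOSTRESS pixel size (m), from the text


def window_stats(scene, row, col, size):
    """Mean, median, and interquartile range of a size x size pixel window.

    scene: 2-D list of LE values (W/m^2); (row, col): tower pixel indices;
    size: odd window width in pixels (5 or 3 in the analysis).
    """
    half = size // 2
    values = [scene[r][c]
              for r in range(row - half, row + half + 1)
              for c in range(col - half, col + half + 1)]
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {"n": len(values),
            "extent_m": size * PIXEL_SIZE_M,
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "iqr": q3 - q1}


if __name__ == "__main__":
    # Hypothetical 7 x 7 scene with a smooth gradient, tower at the center.
    scene = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
    print(window_stats(scene, 3, 3, 5))
    print(window_stats(scene, 3, 3, 3))
```

Comparing the two windows at a site gives a quick check on landscape heterogeneity: large differences between the 5 × 5 and 3 × 3 statistics suggest that the smaller window is the safer choice.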
We used the ECOSTRESS L2 and L3 quality flags to filter for high quality ECOSTRESS data. The L3 quality flag product, which itself is the combination of the quality flags of all of the ancillary inputs such as MODIS and Landsat, is provided as integers but must be read as 8-, 16-, or 32-bit binaries. For example, with the MODIS Cloud Mask flag, a single 8-bit number encodes the cloud mask status flag (Bit 0), the cloud mask cloudiness flag (Bits 1 and 2), the day/night flag (Bit 3), the sunglint flag (Bit 4), the snow/ice flag (Bit 5), and the surface type flag (Bits 6 and 7). To decode each flag, one needs to shift the bits to the proper location and read the appropriate length. Reading a quality flag of 51 as an 8-bit binary results in 00110011. In this example, Bit 0 (read from right to left) = 1, which means the data are useful. Specifically, data marked with bad quality flags in the MODIS forcing data for clouds (MOD06; bit code 119 marks clear-sky conditions) and aerosol optical depth (MOD04; bit codes 85 or 119 mark good conditions) were avoided, as they would introduce contamination into the ECOSTRESS ET retrieval. Users will also want to use the ECOSTRESS uncertainty product for assessment of the ET quality, especially relative to the magnitude of the retrieved ET estimate. After quality control and given available cloud-free and high-quality ECOSTRESS acquisitions during this time period, 82 sites and 502 acquisitions were ultimately used for analysis, representing the majority of available site-to-satellite data pairs (supporting information Figure S1).
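The shift-and-mask decoding can be sketched as follows, using the bit layout listed in the text for the MODIS Cloud Mask flag. The function name and dictionary keys are hypothetical, and the sketch returns the raw field values without interpreting them beyond what the text states.

```python
def decode_modis_cloud_mask(flag: int) -> dict:
    """Split an 8-bit MODIS Cloud Mask flag into its bit fields.

    Bit 0 is the rightmost (least significant) bit.
    """
    return {
        "binary": format(flag, "08b"),        # e.g., 51 -> "00110011"
        "status": flag & 0b1,                 # Bit 0
        "cloudiness": (flag >> 1) & 0b11,     # Bits 1-2
        "day_night": (flag >> 3) & 0b1,       # Bit 3
        "sunglint": (flag >> 4) & 0b1,        # Bit 4
        "snow_ice": (flag >> 5) & 0b1,        # Bit 5
        "surface_type": (flag >> 6) & 0b11,   # Bits 6-7
    }


if __name__ == "__main__":
    print(decode_modis_cloud_mask(51))
    print(decode_modis_cloud_mask(119))
```

Each field is obtained by shifting the integer right by the field's starting bit and masking with as many ones as the field has bits, which is the operation the paragraph above describes in words.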
Our comparison of ECOSTRESS to eddy covariance measurements includes basic metrics of correlation, absolute root-mean-square error (RMSE), and overall bias. We summarize these statistics across all sites, by IGBP vegetation class, Köppen-Geiger climate zone, and time of day. For visualization and reduction, we grouped the Köppen-Geiger climate zones into 7 groups and the IGBP vegetation classes into 5 groups, and we binned the times of day. We note that although there was large diversity in the sites, they do not necessarily sample a complete and unbiased statistical representation of the entire global land surface or their respective vegetation classes and climate zones (e.g., temperate sites are overrepresented); the available time window also precludes analyses of interannual variability and seasonal analyses. Future validations with a longer ECOSTRESS record and larger FLUXNET synthesis data sets should encompass such an analysis of representative distributions (e.g., Chu et al., 2017; Famiglietti et al., 2018).
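The validation metrics can be sketched in a few lines of Python. The definitions here are assumptions where the text does not spell them out: r² is taken as the squared Pearson correlation, normalized RMSE divides by the range of the observations, and bias is the mean difference expressed as a fraction of the observed mean. The function name is hypothetical.

```python
import math


def validation_metrics(observed, estimated):
    """Return r^2, RMSE, range-normalized RMSE, and fractional bias."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    rmse = math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)
    return {
        "r2": cov ** 2 / (var_o * var_e),
        "rmse": rmse,
        "nrmse": rmse / (max(observed) - min(observed)),
        "bias": (mean_e - mean_o) / mean_o,
    }


if __name__ == "__main__":
    # Hypothetical tower LE and ECOSTRESS LE pairs (W/m^2).
    tower = [100.0, 200.0, 300.0, 400.0]
    satellite = [110.0, 210.0, 310.0, 410.0]
    print(validation_metrics(tower, satellite))
```

Grouping the site-to-satellite pairs by vegetation class, climate zone, or time-of-day bin and calling the function once per group yields the per-group summaries described above.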

Results
ET from ECOSTRESS (L3_ET_PT-JPL) compared well against a wide range of eddy covariance sites, vegetation classes, climate zones, and times of day (Figures 3-5). For instantaneous ET, the r² was 0.88, normalized (by range) RMSE was 6%, and overall bias was 8% (Figure 4). The overall RMSE was 41.3 W/m² compared to a mean of 182.0 W/m² and a range of 713.8 W/m². The mean absolute bias was 19%. The eddy covariance measurements were generally contained within ECOSTRESS uncertainty, which was often relatively well constrained (Figures 3 and 4).
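As a check on how the normalized RMSE relates to the absolute values reported above, the normalization by range can be worked through directly. The second line, which normalizes by the mean instead, is shown only for comparison and is not a statistic reported in the text.

```latex
\mathrm{nRMSE}_{\mathrm{range}} = \frac{\mathrm{RMSE}}{\mathrm{range}}
  = \frac{41.3\ \mathrm{W/m^2}}{713.8\ \mathrm{W/m^2}} \approx 0.058 \approx 6\%

\frac{\mathrm{RMSE}}{\mathrm{mean}}
  = \frac{41.3\ \mathrm{W/m^2}}{182.0\ \mathrm{W/m^2}} \approx 0.23
```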
Correlation, RMSE, and bias were generally uniformly good across all group differentiations (Figure 5). RMSE was relatively consistent across climate zones, though significantly lower in the Bwk and Csa-Csb climate zones. Bias was largest in the Bsh climate zone and consistently low across all the other climate zones. RMSE was also relatively consistent across vegetation classes, though significantly lower in the EBF class, and bias was largest in EBF. RMSE and bias were consistent across times of day. r² was generally consistent across climate zones, vegetation classes, and times of day.

10.1029/2019WR026058
Water Resources Research FISHER ET AL.
For comparison, we evaluated a 1-km box around a subset of towers to provide insight into accuracy improvement with the ECOSTRESS high spatial resolution (Figure 6). As expected, most sites showed only marginal improvement with the high spatial resolution. This is because most FLUXNET sites are, by design, established in areas of relatively homogeneous surrounding land cover. However, some sites showed marked improvement with the high spatial resolution. These sites were primarily agricultural, though some natural ecosystems also benefited from the high spatial resolution, due to large landscape heterogeneity around the respective sites. Generally, 1-km pixels underestimated eddy covariance ET. This means that the land surrounding flux sites tended to be drier than the flux site. This difference may be due to irrigation for agricultural sites, or generally good growing conditions for natural ecosystems (Jung et al., 2011). Relative to the eddy flux data, the high spatial resolution of ECOSTRESS ET improved correlation by 85% (0.76 to 0.89) and normalized RMSE by 62% (13% to 8%). Note that the error from a 1-km pixel from, for example, MODIS, may be even greater than we report because each of our 1-km boxes is centered perfectly on the tower site; a 1-km resolution imager would not necessarily be perfectly centered on each site.

Discussion
These results were for the initial, or Stage 1, validation. Although previous analyses have shown good performance of PT-JPL, results tend to be worse at the instantaneous level (generally better at daily/weekly/monthly aggregates) (Y. Chen et al., 2014; Ershadi et al., 2014; Fisher et al., 2008; Fisher et al., 2009; Jiménez et al., 2018; McCabe et al., 2016; Michel et al., 2016; Miralles et al., 2016; Polhamus et al., 2013; Purdy et al., 2018; Talsma et al., 2018; Vinukollu et al., 2011). Some studies have shown PT-JPL to have a high bias, so the small bias shown here was an improvement (Y. Chen et al., 2014; Jiménez et al., 2018; McCabe et al., 2016; Polhamus et al., 2013; Purdy et al., 2018; Talsma et al., 2018). We postulate that the results shown here are attributable to five main reasons: (I) the high spatiotemporal resolution of ECOSTRESS; (II) systematic eddy covariance energy balance closure computation; (III) overrepresentation of temperate ecosystems/underrepresentation of tropical ecosystems in the validation data set; (IV) careful treatment and filtering of the ECOSTRESS quality flags and cloud mask; and (V) the high accuracy and precision of the ECOSTRESS measurement, L2(LSTE) product, and the PT-JPL model itself.
Figure caption: Multimodel uncertainty (thin vertical gray lines) is assigned to the ECOSTRESS value, and energy balance closure uncertainty is assigned to the eddy covariance value (thin horizontal gray lines). The light gray shaded region around the regression is the 95% confidence interval, and the dark gray shaded region is the prediction interval.
Previous global-scale analyses of remotely sensed ET products and algorithms, such as PT-JPL, have often been hindered by spatial resolutions of 1 km or coarser (e.g., MODIS) (e.g., Jiménez et al., 2018; Purdy et al., 2018; Vinukollu et al., 2011; Yao et al., 2013). These resolutions introduce pixel-to-footprint mismatch when comparing a single pixel to eddy covariance sites, which measure fluxes from footprints generally up to 1 km but usually much smaller, depending on tower height and wind conditions (Baldocchi, 1997; B. Chen et al., 2009; Göckede et al., 2004). Consequently, correlation coefficients may be decreased due to pixel contamination outside the footprint (e.g., from other land covers/uses) (e.g., Figure 6). In the case of ECOSTRESS, with 70-m pixels, multiple pixels are encompassed well within the conservatively approximated footprint area used here of 350 m × 350 m, or 210 m × 210 m, depending on the landscape conditions surrounding each site. This sampling provides good representation of the footprint while minimizing contamination. Still, as noted in the Methods, this comparison could be improved further by more detailed spatiotemporal information on footprint coordinates at each site (Montaldo & Oren, 2016; Xu et al., 2017). In comparison, Landsat-based ET validations have comparably fine spatial resolution, but they often lack the frequency of ET retrievals and multisite validation due to limited temporal resolution (at most every 16 days) and challenges in data production (ET is not produced operationally from Landsat as it is for ECOSTRESS) (e.g., Allen et al., 2005; Anderson et al., 2012; Senay et al., 2016).
Eddy covariance energy-balance closure varies significantly from site to site and even within a site from hour to hour and season to season (Da Rocha et al., 2009; Franssen et al., 2010; Leuning et al., 2012; Stoy et al., 2013; Wilson et al., 2002). What has not been historically well standardized, however, is how to correct the data to account for the error indicated by that lack of energy-balance closure (Barr et al., 2006; Foken, 2008; Foken et al., 2011; Twine et al., 2000). The FLUXNET synthesis team, with contributions from the larger eddy flux community, has established rigorous postprocessing procedures for eddy covariance measurements, enabling a robust envelope of closure for all sites; this procedure was instituted in the FLUXNET 2015 synthesis data set (Moffat et al., 2007; Papale et al., 2006, 2012; Pastorello et al., 2017). As a result, a higher-quality eddy covariance observation is matched to the satellite-based estimate of ET. The goodness of fit metrics between site and satellite, therefore, are likely improved because of improvements in energy balance closure estimation and site level data quality.
It may be that the validation performance here is partially due to an overrepresentation of temperate ecosystems in the validation data set, and an underrepresentation of other ecosystems, especially in the tropics. ET in temperate ecosystems is typically easier to capture via satellite estimation than ET in tropical or semiarid ecosystems (Ershadi et al., 2014; Fisher et al., 2008, 2009, 2017; Jiménez et al., 2018; Jung et al., 2010; McCabe et al., 2016; Michel et al., 2016). Humid tropical ecosystems mix a multitude of species responses and water use rates with a high radiation and moisture environment, in addition to cloud interference to the remote measurement (Fisher et al., 2009; Gomis-Cebolla et al., 2019; Hasler & Avissar, 2007; Larson et al., 1999; Vergopolan & Fisher, 2016; Werth & Avissar, 2004). A small percentage error in tropical ET estimation can result in a large absolute flux error. ET from semiarid ecosystems is also challenging to estimate due to strong biotic, or atmospherically decoupled, control over ET, as opposed to abiotic, or atmospherically coupled, control largely dominating other ecosystems (Fisher et al., 2008; Fisher et al., 2017; García et al., 2013; Jarvis & McNaughton, 1986; Jung et al., 2010; Moyano et al., 2018; Nemani et al., 2003; Purdy et al., 2018). In contrast, ET in temperate ecosystems tends to be strongly coupled to atmospheric conditions, which also vary strongly diurnally and seasonally (Ershadi et al., 2014; Fisher et al., 2008; Jiménez et al., 2018; Jung et al., 2010; McCabe et al., 2016; Michel et al., 2016; Nemani et al., 2003; Purdy et al., 2018). As such, it is relatively easy to capture temperate ET so long as the atmospheric conditions are well tracked. Here, the majority of sites obtained were from the United States and temperate ecosystems, and under clear-sky conditions, which also improve retrieval accuracy.
We chose to include all of the sites in the analysis as we did not have enough data to exclude or downweight/upweight sites for a more globally representative assessment, as we have done with much larger synthesis data sets (e.g., Famiglietti et al., 2018). But this may inflate the interpretation of ECOSTRESS performance at the global scale.
We cannot overstate the importance of incorporating the ECOSTRESS quality flags, cloud mask, and uncertainty product in analyses such as validations. Often, validations ignore the quality flags altogether, both from the satellite and from the in situ data, leading to incorrect assessment or attribution for error calculations.
In the case of ECOSTRESS, the L3_ET_PT-JPL product is the result of a combination of multiple sources of data, each with its own quality flags and sources of error. These span from the ECOSTRESS L2 LSTE and cloud mask products derived from the calibrated L1 products and ancillary numerical weather prediction data, to the Landsat surface and MODIS atmospheric data, which include varying degrees of product levels. The quality flags of the Landsat and MODIS data are worth noting here. Given the coarse temporal resolution of Landsat, especially when cloudy, ECOSTRESS shifts reliance to MODIS for surface properties (NDVI, albedo) when the latest Landsat clear overpass becomes out of date (>16 days). This is now being improved with incorporation of Sentinel-2A/B data. The MODIS atmospheric properties are generally retrieved with some diurnal time separation from ECOSTRESS; if there happens to be a cloud beneath MODIS but not ECOSTRESS that day, then MODIS is infilled with numerical weather prediction data (e.g., backup algorithm). As such, ECOSTRESS is sharper when tied more closely to Landsat than to MODIS for the surface properties and more closely to MODIS than to the numerical weather prediction for the atmospheric properties. This may or may not have bearing on the comparison to the eddy flux site, depending on landscape homogeneity/heterogeneity surrounding the site. It may be possible to incorporate higher temporal resolution visible and near-infrared (VNIR) data from commercial cubesats (e.g., Planet) or other satellites in the future to enable more coincident overpasses with the TIR acquisition (Aragon et al., 2018).
These issues are relevant to future missions considering coincident versus separated TIR and VNIR measurements; ET uncertainty can be significant when the TIR-VNIR temporal separation is large, and phenological change (including agricultural harvest, deforestation, and leaf flush) is also significant (National Academies of Sciences, Engineering, and Medicine, 2018). Here, we took the quality flags and cloud mask into careful consideration at each site in order to ensure that the comparison against the eddy covariance measurement was directly tied to the ECOSTRESS retrieval for each site and day. The ECOSTRESS multimodel uncertainty product is a useful indicator of ET quality: the larger the uncertainty relative to the magnitude of the retrieved ET estimate, the more likely the estimate has lower accuracy.
Finally, the performance of the ECOSTRESS ET validation may be due in part to the high accuracy (<1 K) and precision (<0.2 K) of the ECOSTRESS measurement, the L2(LSTE) product, and the PT-JPL model itself (Hook et al., 2019). PT-JPL is controlled by multiple drivers, from surface temperature and vegetation characteristics to atmospheric properties, and the influence of those drivers varies depending on space and time (Badgley et al., 2015; Fisher et al., 2008, 2017; Polhamus et al., 2013; Ryu et al., 2011). While it is possible to generate an approximate estimate of ET from vegetation and atmospheric properties alone, LST dominates the ET signal at fine spatial scales, indicating when green vegetation is or is not transpiring (Fisher et al., 2017) (supporting information Figure S2). Because of this sensitivity to LST by PT-JPL, the accuracy and precision of the ECOSTRESS L2(LSTE) is critical to fine-scale ET estimates, such as in comparisons to eddy flux footprints (García et al., 2013). In turn, the accuracy and precision of the L2(LSTE) product is a result of the ECOSTRESS instrument and L1 products themselves. ECOSTRESS was built with a state-of-the-art combination of thermal bands, blackbody calibrations, spatial resolution, temporal resolution, measurement accuracy, and precision, which, when used in conjunction with the established Temperature and Emissivity Separation and atmospheric correction retrieval algorithms, allows for a high-quality measurement that propagates upward through the higher-level ET and other data products.

Conclusion
ECOSTRESS provides new thermal infrared temperature measurements from the vantage point of the ISS at 70-m spatial resolution, every 1-5 days, and sampling the diurnal cycle. These measurements are used to generate a suite of data products, with the primary science focus on ET from the Level-3 latent heat flux (LE) product (L3_ET_PT-JPL). We produced a relatively rapid and robust preliminary (Stage 1) validation of ECOSTRESS clear-sky LE against 82 eddy covariance sites from around the world. ECOSTRESS LE matched well with site measurements (instantaneous: r² = 0.88; overall bias = 8%; normalized RMSE = 6%), showing good correlations and low bias across a range of vegetation classes, climate zones, and times of day. This paper serves as a reference for the ECOSTRESS L3 ET accuracy and preliminary Stage 1 validation status for subsequent science using these data.