Evaluation of Regional‐Scale Soil Moisture‐Surface Flux Dynamics in Earth System Models Based on Satellite Observations of Land Surface Temperature

There is a lack of high‐quality global observations to evaluate soil drying impacts on surface fluxes in Earth system models (ESMs). Here we use a novel diagnostic based on the observed warming of the land relative to the atmosphere during dry spells (relative warming rate, RWR) to assess ESMs. The ESMs show that RWR is well correlated with changes in the partition of surface energy between sensible and latent heat across dry spells. Therefore, comparisons between observed and simulated RWR reveal where models are unable to capture a realistic soil moisture‐heat flux relationship. The results show that in general, models simulate dry spell ET dynamics well in arid zones while decreases in evaporative fraction appear excessive in some models in continental climate zones. Our approach can help guide land model development in aspects that are key in simulating extreme events like heat waves.


Introduction
Soil moisture (SM) plays a crucial role in water and energy partitioning within the Earth system. In waterlimited regions, SM fluctuations strongly control the ratio of latent to sensible heat flux. Dry soils can amplify near-surface air temperature (T a ) extremes through feedbacks with evapotranspiration (ET; Miralles et al., 2014;Teuling, 2018) and influence rainfall generation (Koster et al., 2004;Taylor et al., 2012). Considering that large changes in water availability are expected later in this century, SM-atmosphere feedbacks will likely affect T a and precipitation projections (Seneviratne et al., 2013). This is particularly important for the occurrence of extreme events that have devastating socioeconomic and environmental impacts .
The representation of SM feedbacks in Earth system models (ESMs) is limited by the poor depiction of ET in land surface schemes (Gevaert et al., 2018). Simulations from the Coupled Model Intercomparison Project Phase 5 (CMIP5) disagree on when, where, and how much ET is limited by SM (Mueller & Seneviratne, 2014). Levine et al. (2016) found that land-atmosphere coupling strength was stronger in models than in satellite observations and Dirmeyer et al. (2018) showed that models tend to overestimate correlations between SM and ET when comparing to flux tower observations. Sippel et al. (2017) showed that most CMIP5 simulations systematically produce too frequent coincidences of high T a anomalies with negative ET anomalies in midlatitude regions during the warm season and in several tropical regions year-round. Looking at the evaporative fraction (EF) during heat extremes, Ukkola et al. (2018) also found that ESMs overestimate the strength of land-atmosphere coupling, causing the overamplification of heat extremes in wet regions. ©2019. The Authors. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. Evaluation of the relevant modeled processes still suffers from a lack of reliable, global observations of ET at the ESM grid box scale. Several studies have presented observation-based metrics of varying complexity for describing different aspects of the coupling between SM and atmospheric conditions. These metrics include correlations between ET and T a , ET and solar radiation (Teuling et al., 2009), an index of surface flux sensitivity to SM (Dirmeyer, 2011), the dependence of EF on SM (Schwingshackl et al., 2017), and coincidences between anomalies of T a and ET (Sippel et al., 2017) and of ET and photosynthetically active radiation (Zscheischler et al., 2015).

10.1029/2019GL082962
Following the approach of Teuling et al. (2006) in assessing land surface behavior during dry downs (rainfree periods), Gallego-Elvira et al. (2016;hereafter GE16) presented the global diagnostic relative warming rate (RWR; mean increase in land surface temperature [LST] relative to T a during dry spells) to evaluate SM effects on land fluxes based on satellite observations and reanalysis data. The novelty of the RWR is that it highlights the direct thermal response of the surface energy balance to SM. Alternative global observationbased approaches either link SM to atmospheric variables (Levine et al., 2016;Mueller & Seneviratne, 2012) or rely on assumptions about the SM-ET relationship to interpret the observed variable (Miralles et al., 2012).
In this study, we assess the potential of the RWR diagnostic for the evaluation of regional-scale SM-surface flux relationships in ESM simulations. We derived global modeled RWRs from 10 ESMs and compare them to observed RWRs computed from 15 years of satellite observations and reanalysis data.

Theoretical Background
The dynamics of LST and T a during dry spells, when negligible rain falls, can be used to characterize the changing partition of energy between sensible and latent heat. When soils dry out sufficiently, ET becomes limited, leading to increases in LST (direct response) and T a (indirect response, via enhanced surface warming of the planetary boundary layer). In fact, as shown recently by Panwar et al. (2019), the imprint of evaporative conditions on temperature is stronger at the land surface than in the atmosphere. Thus, the rate of land surface warming relative to the atmosphere (RWR) emphasizes the impact of changes in SM control on the surface energy balance. For theoretical illustration, we presented in GE16 a simple model to simulate LST evolution during dry downs across the drying stages I-III, as in Teuling et al. (2010) and SM and ET regimes, as in Seneviratne et al. (2010). The sensitivity of RWR to initial SM can be described by a bell-shaped curve (supporting information Figure S1). In the wet, energy-limited regime (stage I), ET is unconstrained by SM, and there is no change in LST during the dry spell (for constant atmospheric conditions, RWR = 0). In the transitional regime (stage II), ET decreases over time as the soil dries, and RWR thus increases. If the onset of the dry spell occurs when the soil is already relatively dry, the decrease in ET over time will be relatively small. This implies that as soils approach stage-III drying, RWR falls toward 0 again. Observations of how RWR varies with soil wetness thus gives an indication of which regimes a region typically experiences. For example, in an arid (wet) region, RWR will rise (fall) with increasing initial SM.

Observational and Model Data
The observed RWRs were derived from satellite and reanalysis data. We use clear-sky daytime LST observations from MODIS (Moderate Resolution Imaging Spectroradiometer, on board of Terra and Aqua satellites) for the period 2000 to 2014 (Collection-5) and aggregated from 0.01°to 0.5°. We used 3-hourly T a from European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis (Dee et al., 2011), regridded to a regular 0.5°grid and linearly interpolated to the average satellite scan time of each 0.5°grid box. For both land and air temperatures, we use anomalies (relative to a long-term monthly mean) to minimize the impact of variable spatial sampling in regions with strong temperature gradients and to allow us to combine data from the Terra (late morning) and Aqua (early afternoon) satellites. The anomalies of both (surface and air) temperatures are defined using the same method and sampling. It should be noted that by using temperature anomalies, the RWR effectively filters out slowly varying climatological changes in EF and focuses on anomalous higher-frequency land dynamics.
Satellite precipitation was used to identify rain-free periods. We define a dry spell as a sequence of at least 10 days in which the rainfall of all the three following data sets is below 0.5 mm: Tropical Rainfall Measuring Mission (TRMM, Huffman et al., 2007), Climate Prediction Center MORPHing (CMORPH, Joyce et al., 2004), and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN, Sorooshian et al., 2008). We considered that 10 days is enough to observe the LST response to soil drying while keeping a sufficient quantity of dry spells to enable the analysis globally. The SM condition at the start of each dry spell was estimated from the rainfall accumulation over the previous 30 days. Further details on observational data processing and filtering can be found in GE16.
We evaluated a subset of ten ESMs running the Atmospheric Model Intercomparison Project experiment. The models were selected based on the availability of 3-hourly data of the variables required for the analysis: LST, T a , precipitation, latent and sensible heat fluxes, and surface downward solar radiation (rsds) plus clear-sky rsds (rsdscs) or cloud area fraction. Only six models in the CMIP5 archive provide subdaily LST. The data from the additional four ESMs were provided by partners within the framework of the EU FP7 EMBRACE (Earth system Model Bias Reduction and assessing Abrupt Climate change) project as summarized in Table S1. As with the observations, anomalies in simulated land and air temperature are used. When comparing with observed RWRs, this approach reduces the impact of systematic biases in the models and instead emphasizes the short-term evolution of the SM-ET relationship over a dry spell.

Computation of RWRs and EF Gradients
To calculate observed RWRs, the first step is to compute the composite mean difference between LST and T a anomalies for each dry spell day (LST anom − T a,amon , equation (1)).
where j is the dry spell day, n is the number of dry spell events, i is a single event with a valid LST observation, LST c and T c a are the climatological LST and T a , and the weighting factor w is the number of 0.01°observations used to construct the 0.5°LST. Values of T a are only selected on days with a valid 0.5°LST value, so LST c and T c a represent clear-sky climatologies. We removed dry spell events with days in which T a was below 10°C at satellite overpass time to avoid artefacts from soil freezing. The second step is to retrieve RWR as the slope of the linear regression of LST anom − T a,amon on days 2 to 10 of the dry spell. The first day of each dry spell was discarded because of lower number of LST observations, timing errors in precipitation, and to exclude days affected by water on the vegetation canopy. Note that during long dry downs, only the 10 first days are considered for the analysis.
To derive ESM RWRs, model 3-hourly output was processed to replicate the clear-sky sampling of LST during dry spells. We considered clear-sky conditions when rsds/rsdscs > 0.9 or the cloud area fraction was below 10% at the average satellite overpass local time for MODIS Terra and Aqua combined (~1200 LT). The rest of the computations were analogous to the observations, but they were performed on the native grid of each model. In addition to the variables needed to retrieve RWRs, clear-sky anomalies of modeled fluxes during dry spells were computed to look at the dry-down evolution of the EF. In the same way as for RWRs, we first calculated the composite mean EF anomalies for each dry spell day, and the EF gradient was computed by linear regression of EF on days 2 to 10 of the dry spell.
Finally, in order to assess the sensitivity of RWRs and EF gradients to SM at the onset of a dry down, dry spell events were stratified by observed 30-day antecedent precipitation in decile bins. We studied this sensitivity at regional level using 20 SREX regions defined in Seneviratne et al. (2012), where there were sufficient dry spells and LST observations to enable the analysis. A dominant climatic type was assigned to each SREX region based on the most frequent grid cell class according to the Köppen-Geiger classification (Kottek et al., 2006); equatorial (As, Aw, and Cw); arid cold (Bk) and hot (Bh); temperate humid (Cf) and dry (Cs); and continental hot/warm summer (Dfa and Dfb; hereafter continental).

Results
Global observed and modeled RWRs (between 60°S and 60°N) are presented in Figure 1 at 1°resolution for observations and on the native grid for each model. Globally, the area with positive RWR dominates in both observations (73% of the grid boxes) and ESM simulations (from 56% to 85% for EC-EARTH and MRI-AGCM3-2H, respectively). Positive RWR means that the land surface warms (on average) faster than the atmosphere during 10-day dry downs. Given no clear trend in solar radiation and wind speed over the dry spell (as shown in GE16), positive RWRs can be associated with increases in sensible heat flux. There are some common broadscale features in all models and observations, notably the widespread positive RWR in dry regions such as Australia, southern Africa, and west North America and the dominance of small or even negative RWR in wet and forested regions. Negative RWRs in these areas indicate that there is abundant water in the soil during dry downs for evaporative cooling, and therefore, the air warms faster than the surface. However, in the midlatitudes we can detect several areas of disagreement (in sign and spatial pattern of RWR) among the models and compared to observations: for example, Central Europe (CEU), Eastern North America, and Eastern Asia. In these areas, transitional and wet ET regimes are typical depending on the season (Schwingshackl et al., 2017).
While illustrative of gross properties of the models, a drawback in directly comparing simulated and observed RWR in Figure 1 is that there are a number of factors beyond underlying SM-ET coupling that control RWR. For instance, there is a large range of global mean RWR across the models (e.g., HadGEM2-A and MRI-AGCM3-2H). Large contrasts in mean RWR may reflect different aerodynamic coupling between  Table S1. Inset bar plots show the distribution of RWR across all grid cells and are annotated with the percentage of grid cells with positive RWR. Land grid cells with less than one event per year with suitable data for the analysis are shaded gray. models or different approaches to representing subgrid land surface properties. Also, biases in precipitation forcing from an individual ESM may place the land surface in the wrong drying regime. To provide more insight into the modeling of SM-ET coupling, we examine the variation of RWR under different hydrological conditions across a region. Following GE16, we compute RWR as a function of precipitation in the previous 30 days. This proxy of pre-dry spell hydrological status is chosen as, unlike SM, it has a common definition across the models and the observations. Figure 2 illustrates this analysis for two SREX regions with contrasting hydroclimatic regimes, Southern Africa (SAF) and CEU. The same plot for the remaining SREX regions is available in Figures S2a-S2r. The data from each dry spell and grid cell are grouped in 10 bins defined by the observed deciles of predry spell antecedent precipitation, and these bins are also applied to the models. This allows a comparison between the simulated and observed RWR that is insensitive to monthly rainfall biases in individual models and provides information about the drying stage of the land surface Teuling et al., 2010). In SAF, observed RWR is strongly sensitive to antecedent rain, particularly following relatively dry Figure 2. Observed versus modeled relative warming rate diagnostic (RWR, K day -1 ) and modeled evaporative fraction (EF) gradient (day −1 ) stratified by 30-day observed antecedent precipitation in decile bins for the 10 Earth system models described in Table S1. Error bars correspond to the standard error of the estimated gradient of the linear least squares regression. Plots in columns 1 and 2 correspond to the SREX region Southern Africa (SAF) and columns 3 and 4 to Central Europe (CEU). Gray bars show the distribution of dry spell events produced by the models for each observed antecedent precipitation bin. The maximum number for modeled events per bin is indicated in gray on the top left corner of each plot and for observed events is shown in the top right of the first plot for each region. Correlations between simulated RWR and observations (R 2 obs ) and EF gradient (R 2 EF ) are provided for each model, except when R obs < 0 or R EF > 0, which is indicated with "−" or "+," respectively. months (<30 mm/month) and has a maximum value in the 10th decile. This is consistent with dry spell events falling within drying stages II (SM constrains ET) to III (SM is unavailable for ET). All the models exhibit this semiarid behavior to some extent, which is quantified by the correlation between observed and simulated RWR (R 2 obs ). The relationship between RWR and surface flux dynamics in the models is also illustrated in Figure 2, using the mean gradient of EF over time during dry spells. For each model in SAF, Figure 2 shows that there is a strong negative correlation between RWR and EF gradient. Four of the models (BNU-ESM, CNRM-CM5, IPSL-CM5C-MR, and NorESM1-M) depict the strongest dry spell decrease in EF (most negative value) in the 10th decile, suggesting no influence of stage I drying. These models also have the highest RWR in the 10th decile, consistent with the observations. The relationships between EF gradient and rainfall in the remaining models appear more like inverted bell curves; the EF gradient rises (becomes less negative) under the rainiest conditions. This is indicative of a weakening of the SM control on ET at the wet end and is also reflected in the shape of the RWR curve in these six models. The negative correlation between RWR and EF gradient in each model highlighted here demonstrates the validity of our approach in using RWR as a proxy for dry spell flux dynamics. In supporting information Figure S3 this correlation is quantified (R 2 EF ) for all models and all regions, with 81% of model regions having values exceeding 0.7. CEU exhibits a broader range of hydrological conditions than SAF, with an observed bell-shaped RWR curve (maximum in the seventh decile), consistent with the analysis over a larger European domain by Folwell et al. (2016). The models with the highest values of R 2 obs (CNRM-CM5 and MRI-AGCM3-2H) also exhibit a strong inverted bell-shaped EF gradient curve; in other words, EF decreases most rapidly in dry spells with intermediate soil wetness, consistent with theory. The models ranked lowest using R 2 obs (IPSL-CM5C-MR, MIROC5, MPIESM, and MRI-CGCM3) have almost no sensitivity of RWR to rainfall, at odds with observations. They do not simulate relatively strong dry spell decreases in EF at intermediate rainfall, suggesting poor underlying SM-ET relationships in these models for this region. Note that the CNRM-CM5 model performs best here in spite of sampling a disproportionately high fraction of dry spells following dry months (shown by gray bars in Figure 2). This illustrates that our approach isolates the effect of the underlying land model from biases in rainfall forcing by the host ESM.
The R 2 obs metric of land model performance in simulating observed LST-SM functional relationships is provided for all SREX regions and models in Figure 3. Broadly speaking, the best agreement between observations and models is found in arid climates, while models struggle to simulate RWR behavior in continental climate regions. For the most part, the ranking of the regions in Figure 3 is not directly attributable to a weakening in the correlation between simulated RWR and EF gradient ( Figure S3). In the vast majority of regions and models, there is a strong correlation between our proxy (RWR) and the object of our interest (dry spell ET dynamics). The location (in antecedent rainfall space) of the maximum RWR is the main source of discrepancy between models and observations. We note that the lowest ranked models simulate the location of the RWR curve maximum at higher rainfall than in observations ( Figure S4). This implies a tendency toward stage III drying in these model regions, which is inconsistent with observed RWR. This finding is in agreement with previous studies reporting that models tend to overestimate the response of ET to soil drying and simulate overly SM-limited conditions in several regions (Berg & Sheffield, 2018;Dirmeyer et al., 2018;Levine et al., 2016;Sippel et al., 2017).

Summary and Discussion
In this work, we have developed a methodology for evaluating the sensitivity of ET to SM in ESMs across the world. This is based on the dynamics of LST and T a during dry spells of 10 days, when SM limitation may affect the daily evolution of ET. The models show that RWR is well correlated with changes in the partition of surface energy between sensible and latent heat during a dry spell ( Figure S3). Comparisons between observed and simulated RWR (Figure 3) thus highlight where models are unable to capture a realistic SM-ET relationship. The results show that in general, models capture dry spell ET dynamics well in arid climate zones. The largest differences, both across models and with observations, occur in continental climate zones. Here there is a tendency for dry spell decreases in EF to be too large (or in other words, too strong an influence of stage III drying).
Our approach exploits the direct response of LST to changing hydrological conditions. This has an advantage over previous studies examining the atmospheric response to SM (for example through T a ), which brings in additional factors external to land processes. On the other hand, using our observational measure (RWR) brings its own challenges. While the quality of LST data derived from satellites is high relative to more derived products (such as ET), we also need accurate values of T a from reanalysis. We recognize that the method relies on 1-to 2-K differences between 10-day tendencies in surface and air temperatures, which might be difficult to estimate, though our previous work (GE16) showed relatively little sensitivity in RWR to alternative choices of reanalysis data. Another important issue for the regional scale analyses in Figures 2 and 3 is the spatial and temporal sampling of the dry spells. Climatological patterns of both cloudiness and dry spells dictate where and when the data are sampled. Moreover, it is challenging to interpret results from some of the SREX regions when they incorporate such diverse landscapes as tropical forest and semidesert (e.g., WAF).
We interpret absolute differences between modeled and observed RWR with caution. This is partly because of the aforementioned uncertainties in estimating observed RWR. Additionally, modeled RWR is sensitive to properties such as roughness length and bare soil fraction, and their prescription from remotely sensed land cover maps. Biases in these properties will affect the absolute value of RWR but may only weakly affect dry spell ET dynamics. For these reasons, we consider how RWR varies with antecedent rainfall to be a more robust measure. Finally, our methodology is currently restricted to 10-day dry spells out of a pragmatic choice of sampling sufficient events globally. Land schemes may perform well at this time scale but poorly at longer time scales relevant for seasonal drought (Ukkola et al., 2016). vertically (horizontally). The best performing model per region is indicated with "*." Gray numbers indicate that the corresponding R 2 EF in Figure S3 is below 0.7. Cells shaded dark red and annotated with "−" indicate that R obs is negative.
It is beyond the scope of this paper to provide more than an overview of the approach and summarize model results. We can speculate that simulated dry spell ET dynamics may be poor in CNA, ENA, EAS, and SAS because of their extensive cropland. Many land models lack a realistic representation of the impact of agriculture (particularly when irrigated) on the surface energy balance. The inclusion of groundwater would also suppress simulated ET sensitivity to antecedent rainfall. We find that no single model performs well in all regions. On the other hand, the regional median R 2 obs used to rank models in Figure 3 clearly distinguishes between good and bad performance against our metric globally, though without further analysis, we would caution against overinterpretation.
Notwithstanding some of the issues highlighted above, we think that this approach provides a valuable new tool for modelers seeking to evaluate and upgrade the description of land surface processes (Balsamo et al., 2018). Considering the growing body of evidence that in many regions ESMs tend to overestimate SM-limited ET conditions (Dirmeyer et al., 2018;Levine et al., 2016;Sippel et al., 2017), our method can make a relevant contribution to identify actual water-limited conditions and surface heat release and help to constrain heat extremes projections. We provide the code and aggregated observations for developers to benchmark their individual models. With the advent of CMIP6, which will see subdaily LST data archived from many more models, we hope to provide a more complete assessment of these state-of-the-art models and continue to refine our methodology to identify model shortcomings.