Large Uncertainties in Runoff Estimations of GLDAS Versions 2.0 and 2.1 in China

Gauge-observed runoff reflects the influences of both the natural hydrological cycle and human intervention. The Global Land Data Assimilation System (GLDAS) versions 2.0 and 2.1 provide abundant runoff data that are useful for water resources assessment in ungauged or poorly gauged regions. However, GLDAS2.0 and GLDAS2.1 runoff have only been validated and inter-compared in very limited regions. In this study, they are evaluated and inter-compared using gauge observations in 11 large river basins in China. Results show that both datasets have large runoff uncertainties, with average absolute relative bias (|RB|) of at least 39% and average Nash-Sutcliffe efficiency no higher than 0.15, although GLDAS2.1 performs better. Both have large uncertainty in the Tibetan Plateau, where |RB| is 40% or higher. The gap between GLDAS runoff and observations could be attributed both to GLDAS system uncertainty and to the fact that GLDAS does not consider human intervention. Therefore, caution should be taken when using these data in coupled human-natural systems.


Introduction
Water resources management in coupled human-natural systems is important to human well-being. Human activities can influence the natural hydrological cycle, and natural systems provide services to human activities. With a rapidly increasing population in the anthropogenic era, the interactions between human and natural systems are becoming stronger than ever before [Liu et al., 2007].
Runoff is one of the most integrative indicators of basin-scale hydrology [Liang et al., 1994; Chen et al., 1996; Sellers, 1997; Soroosh et al., 2005; Zaitchik et al., 2010; Qi et al., 2015; Wang et al., 2017; Qi et al., 2020]. The Global Land Data Assimilation System (GLDAS) [Rodell et al., 2004] provides abundant runoff data that are useful for water resources assessment in ungauged or poorly gauged regions. GLDAS estimates runoff using global Land Surface Models (LSMs) driven by global-scale forcing data, such as the Princeton Global meteorological Forcing (PGF) dataset [Sheffield et al., 2006]. Currently, the LSMs used in GLDAS do not consider human intervention [Scanlon et al., 2018], such as human water abstraction. Therefore, GLDAS datasets could be problematic when studying coupled human-natural systems. Hydrological gauge observations, in contrast, result from natural hydrological processes combined with the influences of human activities. Investigating the gap between GLDAS runoff and hydrological gauge observations could therefore reveal the strengths and weaknesses of GLDAS in coupled human-natural systems, indicating where caution is needed when using GLDAS and guiding future development of the system.
Several studies have evaluated GLDAS1.0. For example, Qi et al. [2015] evaluated GLDAS1.0 input data in Northeast China and found that GLDAS1.0 overestimated downward solar radiation; Bai et al. [2016] evaluated GLDAS1.0 runoff in the Tibetan Plateau and found that the Noah model overestimated runoff. Studies on the quality of GLDAS version 2 remain few. For example, Wang et al. [2016a] assessed GLDAS2.0 soil temperature estimates and found good agreement with in situ measurements; Wang et al. [2016b] studied the applicability of GLDAS2.0 in terms of precipitation, evapotranspiration, air temperature, water storage and runoff, and found that runoff is underestimated. For GLDAS2.1, Lv et al. [2018] compared GLDAS2.1 runoff with the University of New Hampshire and Global Runoff Data Centre composite dataset and found considerable uncertainties. However, large-scale studies of GLDAS2.1 runoff uncertainty based on hydrological gauge observations remain limited, as do large-scale comparisons of GLDAS2.0 and GLDAS2.1 runoff estimates against gauge observations.
The overall objective of this study is to evaluate and inter-compare GLDAS2.0 and GLDAS2.1 runoff data against hydrological gauge observations in China from 2000 to 2010 (their only overlapping period). The investigated datasets are GLDAS_NOAH025_3H_2.0 [Matthew and Hiroko Kato, 2015] and GLDAS_NOAH025_3H_2.1 [Matthew and Hiroko Kato, 2016]. This study provides unique insights into the strengths and weaknesses of GLDAS2.0 and GLDAS2.1 runoff data in China.

Study regions and methods
In this study, three regions of China are investigated: Northeast China, the middle part of China (Middle China) and the Tibetan Plateau. Together they cover an area of 3,511,244 km². The selected basins in the three regions are shown in Fig. 1. Northeast China has long winters and warm summers, with a multi-year mean monthly temperature of 4.26 °C. The basins in the Tibetan Plateau lie at high altitude (above 4,000 m on average), and their multi-year mean monthly temperature is around 0 °C. Middle China covers a large area with various land covers, e.g., forest, desert and short vegetation. More details of the study regions and data are given in Table 1, including basin area, mean altitude, mean annual rainfall, mean air temperature and the temporal coverage of the monthly runoff observed at hydrological gauges.
Runoff in a grid cell ($q_{grid}$) is calculated as the sum of GLDAS overland flow, interflow and groundwater:

$$q_{grid} = q_{sur} + q_{g+inter} \quad (1)$$

where $q_{sur}$ represents overland flow (m³/s) and $q_{g+inter}$ represents the sum of interflow and groundwater (m³/s). In this study, because the time of runoff concentration is less than one month [Wang et al., 2016b; Allen et al., 2018], the monthly runoff of GLDAS2.0 and GLDAS2.1 at each gauging site is calculated by summing the monthly runoff of all upstream grid cells:

$$q_{gauge} = \sum_{\text{upstream grids}} q_{grid} \quad (2)$$

where $q_{gauge}$ represents the calculated monthly runoff at the gauging site. This routing method is commonly used in previous studies (e.g., Crooks et al. [2014]; Li et al. [2015]; Wang et al. [2016b]) because it is a nonparametric approach (Eq. 2) and therefore does not introduce parameter uncertainty. It is therefore considered acceptable for this study.
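The two steps of Eq. 1 and Eq. 2 can be sketched as below. This is a minimal illustration, not the study's processing code: it assumes the overland-flow and interflow-plus-groundwater values have already been converted to monthly mean m³/s per 0.25° cell, and that a hypothetical boolean mask `is_upstream` marks the cells draining to the gauge. The function names and the sample numbers are illustrative only.

```python
from typing import Sequence


def grid_runoff(q_sur: float, q_g_inter: float) -> float:
    """Eq. 1: runoff of one grid cell (m^3/s) = overland flow + (interflow + groundwater)."""
    return q_sur + q_g_inter


def gauge_runoff(q_sur: Sequence[float], q_g_inter: Sequence[float],
                 is_upstream: Sequence[bool]) -> float:
    """Eq. 2: monthly runoff at a gauge as a plain sum over upstream grid cells.

    No lag or routing parameters are applied; this is only reasonable when the
    runoff concentration time is shorter than the one-month time step.
    """
    if not (len(q_sur) == len(q_g_inter) == len(is_upstream)):
        raise ValueError("all inputs must describe the same grid cells")
    return sum(grid_runoff(s, g)
               for s, g, up in zip(q_sur, q_g_inter, is_upstream) if up)


# Hypothetical monthly means for four cells; only the first three drain to the gauge.
q_sur = [1.0, 2.0, 0.5, 3.0]
q_g_inter = [0.5, 1.0, 0.25, 2.0]
is_upstream = [True, True, True, False]
print(gauge_runoff(q_sur, q_g_inter, is_upstream))  # 5.25
```

Because nothing is tuned, the only inputs that affect the result are the runoff fields and the upstream mask, which is why the text describes the approach as free of parameter uncertainty.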
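Three evaluation criteria are used, Correlation Coefficient (CC), Relative Bias (RB) and Nash-Sutcliffe Efficiency (NSE), because they are commonly used in uncertainty evaluations, especially for runoff [Beven and Binley, 1992; Beven and Freer, 2001; Wang et al., 2011; Qi et al., 2016a; Qi et al., 2016b; Qi et al., 2016c; Yang et al., 2017; Qi et al., 2018; Qi et al., 2019]:

$$CC = \frac{\sum_{i=1}^{n}(Q_{pi}-\bar{Q}_p)(Q_{ti}-\bar{Q}_t)}{\sqrt{\sum_{i=1}^{n}(Q_{pi}-\bar{Q}_p)^2}\,\sqrt{\sum_{i=1}^{n}(Q_{ti}-\bar{Q}_t)^2}} \quad (3)$$

$$RB = \frac{\sum_{i=1}^{n}(Q_{pi}-Q_{ti})}{\sum_{i=1}^{n}Q_{ti}} \times 100\% \quad (4)$$

$$NSE = 1-\frac{\sum_{i=1}^{n}(Q_{pi}-Q_{ti})^2}{\sum_{i=1}^{n}(Q_{ti}-\bar{Q}_t)^2} \quad (5)$$

where $Q_{pi}$ and $Q_{ti}$ represent simulated and observed data, respectively, at time $i$; $\bar{Q}_p$ and $\bar{Q}_t$ represent the averages of the simulated and observed data; and $n$ represents the total number of data points. The higher the CC and NSE values, the better the GLDAS data replicate monthly variations; RB measures the relative difference in total volume, and values closer to zero indicate smaller systematic bias. For comparison with runoff uncertainty, the three criteria are also used to evaluate rainfall.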
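The three criteria can be computed directly from paired monthly series. The sketch below follows the standard definitions of Pearson correlation, relative bias (in percent) and Nash-Sutcliffe efficiency; the function names and the synthetic series are assumptions for illustration, and edge cases such as a constant observed series (which would make the NSE denominator zero) are not handled.

```python
import math
from typing import Sequence


def _validate(sim: Sequence[float], obs: Sequence[float]) -> None:
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("need paired series with at least two time steps")


def cc(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient between simulated and observed series."""
    _validate(sim, obs)
    n = len(obs)
    mean_sim, mean_obs = sum(sim) / n, sum(obs) / n
    cov = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    return cov / math.sqrt(var_sim * var_obs)


def rb(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Relative bias in percent; negative values mean underestimation."""
    _validate(sim, obs)
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


def nse(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency; 1 is perfect, below 0 is worse than the observed mean."""
    _validate(sim, obs)
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical monthly runoff (m^3/s): the simulation is a constant 2 units too low.
obs = [10.0, 20.0, 30.0, 40.0]
sim = [8.0, 18.0, 28.0, 38.0]
print(cc(sim, obs), rb(sim, obs), nse(sim, obs))  # CC = 1, RB = -8 %, NSE about 0.968
```

The example shows why the criteria are complementary: a constant offset leaves CC at 1 but is picked up by RB and, more weakly, by NSE.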
The rainfall data used in this study were generated by the National Meteorological Information Center (http://data.cma.cn/data/) from over 2,400 in situ rain gauges in China at a resolution of 0.5° × 0.5° [Zhao and Zhu, 2015; Shen and Xiong, 2016]. In this study, we regridded the rainfall data onto 0.25° × 0.25° grids, the same resolution as the GLDAS2.0 and GLDAS2.1 grids, and used the data located inside the study regions.
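The regridding method is not specified above. One simple option, shown here purely as an assumption, is nearest-neighbour replication: if the 0.25° grid nests exactly inside the 0.5° grid (shared cell edges), every fine cell lies in exactly one coarse cell and can inherit its rainfall depth. The function name `to_quarter_degree` and the grid representation are hypothetical; bilinear or conservative remapping would be equally plausible alternatives.

```python
from typing import List

Grid = List[List[float]]  # rows of cell values, e.g. monthly rainfall depth in mm


def to_quarter_degree(coarse: Grid) -> Grid:
    """Split every 0.5-degree cell into the 2 x 2 block of 0.25-degree cells it contains.

    Each fine cell inherits the coarse value (nearest-neighbour replication).
    Because rainfall is a depth, the rainfall volume over each coarse cell is unchanged.
    Assumes the two grids share cell edges, so every fine cell lies in one coarse cell.
    """
    fine: Grid = []
    for row in coarse:
        fine_row = [value for value in row for _ in range(2)]
        fine.append(fine_row)
        fine.append(list(fine_row))  # second fine row: a copy, not an alias
    return fine


coarse = [[1.0, 2.0],
          [3.0, 4.0]]
for row in to_quarter_degree(coarse):
    print(row)
```

Replication adds no spatial detail; it only puts the gauge-based rainfall on the same cells as GLDAS so that the two can be compared basin by basin.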

Relative Bias
Fig. 2 shows the evaluations of runoff and rainfall in terms of RB. GLDAS2.0 greatly overestimates runoff in the Liao river (RB of 21%) and underestimates it in the other basins, with the exception of the Yarlung Tsangpo river. GLDAS2.0 has the highest uncertainty in the Upper Yellow river, with RB reaching -93%, and the lowest uncertainty in the Yarlung Tsangpo river (1%). The average of the absolute RB values is 53% for GLDAS2.0; therefore, GLDAS2.0 runoff estimation has large uncertainty. Similarly, GLDAS2.1 greatly overestimates runoff in the Liao river (RB of 30%) and largely underestimates runoff in the Upper Yellow river (RB of -66%). Its RB is 26% in the Yarlung Tsangpo river, much higher than that of GLDAS2.0. Like GLDAS2.0, GLDAS2.1 underestimates runoff in most basins, with the exceptions of the Liao river, Middle Yellow river and Yarlung Tsangpo river. The average of the absolute RB values is 39% for GLDAS2.1. Therefore, GLDAS2.1 is closer to observation than GLDAS2.0 on average but still has large uncertainty. The absolute RB values are 9% for GLDAS2.1 in the Middle Yellow river and the Lancang river, and the RB value is 1% for GLDAS2.0 in the Yarlung Tsangpo river. Therefore, GLDAS2.1 (in the first two rivers) and GLDAS2.0 (in the Yarlung Tsangpo river) could be used if the total volume of runoff is of interest. Root Mean Square Error (RMSE) values are shown in supporting information Tables S1 and S2.
Regarding RB for rainfall (Fig. 2b), GLDAS2.0 overestimates rainfall in the Yarlung Tsangpo river (RB of 27%), whereas it seriously underestimates rainfall in the Heihe (RB of -65%). The average absolute RB is 23% for GLDAS2.0. Similarly, GLDAS2.1 seriously overestimates rainfall in the Yarlung Tsangpo river and underestimates rainfall in the Heihe. The average absolute RB of GLDAS2.1 is 20%, lower than that of GLDAS2.0. Therefore, GLDAS2.1 is better than GLDAS2.0 on average, consistent with the runoff evaluation. Differences between the two versions also exist. For example, GLDAS2.0 underestimates rainfall in most basins except the Yarlung Tsangpo and Liao rivers, whereas GLDAS2.1 overestimates rainfall in most basins except the Upper Yangtze river and Heihe. Note that, although GLDAS2.1 underestimates rainfall in the Middle Yellow river, the RB value is very small (-1%).
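The averages above are taken over absolute RB values, so that overestimation in one basin cannot cancel underestimation in another. A minimal sketch of that averaging, with hypothetical basin names and values (not results from this study), is:

```python
from typing import Dict


def mean_abs_rb(rb_by_basin: Dict[str, float]) -> float:
    """Average of |RB| (%) over basins, so over- and underestimates cannot cancel."""
    if not rb_by_basin:
        raise ValueError("no basins given")
    return sum(abs(v) for v in rb_by_basin.values()) / len(rb_by_basin)


def worst_basin(rb_by_basin: Dict[str, float]) -> str:
    """Basin with the largest |RB|, i.e. the largest relative volume error."""
    return max(rb_by_basin, key=lambda basin: abs(rb_by_basin[basin]))


# Hypothetical RB values (%) for four unnamed basins; not results from this study.
example = {"Basin A": 20.0, "Basin B": -60.0, "Basin C": 1.0, "Basin D": -15.0}
signed_mean = sum(example.values()) / len(example)
print(signed_mean)            # -13.5: signs partly cancel
print(mean_abs_rb(example))   # 24.0
print(worst_basin(example))   # Basin B
```

The signed mean here understates the typical error by almost half, which is why an average of |RB| is the more informative summary when basins are biased in both directions.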
Comparing the runoff and rainfall evaluations, GLDAS2.0 underestimates runoff when it underestimates rainfall. However, GLDAS2.1 underestimates runoff even though it overestimates rainfall in the Lancang river, Lower and Upper Songhua river, Middle and Lower Yangtze river and Upper Yellow river, and it overestimates runoff even though it slightly underestimates rainfall in the Middle Yellow river. These uncertainties may be caused by both rainfall uncertainty and Noah model uncertainty (including model structure and parameterization) [Swenson et al., 2012; Chen et al., 2013; Yang et al., 2018]. Niu et al. [2011] pointed out that the Noah model may produce large uncertainty in surface hydrological processes (including vegetation photosynthesis and transpiration, and groundwater simulation). The Noah model uses a combined surface layer of vegetation and soil, which cannot explicitly compute photosynthesis and transpiration. It also uses a shallow soil column with a free-drainage scheme at the bottom, which could lead to large uncertainty in groundwater simulation. In addition, interactions between Noah model uncertainty and rainfall data uncertainty may also influence the results, as shown by Qi et al. [2016a], who revealed that such interactions can contribute uncertainty of a magnitude similar to that of the models. Further, human activities (such as irrigation, domestic and industrial water use) can consume large amounts of water [Tang et al., 2008], which may also contribute to the difference between GLDAS simulations and hydrological gauge observations, because the LSMs used in GLDAS do not consider human activities in runoff simulation. For example, according to the water resources reports of the Songliao Water Conservancy Committee of the Ministry of Water Resources of China (http://www.slwr.gov.cn/szy2011/), the water resources utilization percentage for irrigation, household and industry is higher than 40%; in the Yellow river, the water resources utilization rate is above 20% (http://www.yrcc.gov.cn/other/hhgb/); and in the Heihe, it is above 50% [Wu et al., 2005]. The four river basins on the Tibetan Plateau contain large areas of snow and glaciers. However, the Noah model used in GLDAS2.0 and GLDAS2.1 does not include a glacier model, which may also contribute to the uncertainty in the simulated runoff.
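The argument above rests on basins where the sign of the runoff bias disagrees with the sign of the rainfall bias, since there the rainfall error alone cannot explain the direction of the runoff error. A small screening helper makes that comparison explicit. It is a hypothetical illustration, not part of the study's method, and the values below are invented.

```python
from typing import Dict, List


def bias_sign_mismatches(rain_rb: Dict[str, float],
                         runoff_rb: Dict[str, float]) -> List[str]:
    """Basins where runoff RB and rainfall RB have opposite signs.

    In such basins the direction of the rainfall error cannot by itself explain
    the direction of the runoff error. A zero bias on either side is not flagged.
    """
    common = sorted(rain_rb.keys() & runoff_rb.keys())
    return [b for b in common if rain_rb[b] * runoff_rb[b] < 0]


# Hypothetical RB values (%), not results from this study.
rain = {"Basin A": 5.0, "Basin B": -10.0, "Basin C": -1.0}
runoff = {"Basin A": -40.0, "Basin B": -30.0, "Basin C": 9.0}
print(bias_sign_mismatches(rain, runoff))  # ['Basin A', 'Basin C']
```

Flagged basins are the natural candidates for the other explanations discussed above: model structure, model-forcing interactions and unmodelled human water use.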

Nash-Sutcliffe Efficiency
Fig. 3 shows the evaluations of runoff and rainfall in terms of monthly NSE. GLDAS2.0 runoff has the highest NSE in the Yarlung Tsangpo river (0.56) and the lowest in the Upper Yellow river (-1.39). The average NSE is -0.21 for GLDAS2.0. GLDAS2.1 runoff also has its lowest NSE in the Upper Yellow river (-0.41), but, unlike GLDAS2.0, its highest NSE is in the Lancang river (0.66). The average NSE is 0.15 for GLDAS2.1. Therefore, GLDAS2.1 is better than GLDAS2.0 on average, but its NSE still indicates very low accuracy. Because GLDAS2.0 and 2.1 use the same LSM (i.e., Noah 3.3), the differences here could be attributed to their rainfall uncertainty, model uncertainty [Barlage et al., 2010] and interactions between model uncertainty and rainfall data uncertainty.
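Negative NSE values such as those above can arise from volume bias alone, even when the timing of monthly variations is captured. A worked example with hypothetical numbers (not data from this study), using Eqs. (3) to (5) for a simulation that is a perfectly correlated but 70% underestimated copy of the observations:

$$Q_t = (10, 20, 30, 40), \qquad Q_p = 0.3\,Q_t = (3, 6, 9, 12), \qquad \bar{Q}_t = 25$$

$$CC = 1, \qquad RB = \frac{30 - 100}{100} \times 100\% = -70\%$$

$$NSE = 1 - \frac{7^2 + 14^2 + 21^2 + 28^2}{15^2 + 5^2 + 5^2 + 15^2} = 1 - \frac{1470}{500} = -1.94$$

In this idealized case a large underestimation of volume drives NSE well below zero, which suggests that CC and RB should be read alongside NSE rather than inferred from it.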
Regarding rainfall (Fig. 3b), GLDAS2.0 has the highest NSE in the Middle and Lower Yangtze river (0.95) and the lowest in the Heihe (0.14). The average NSE is 0.71 for GLDAS2.0. For GLDAS2.1, NSE is highest in the Middle Yangtze river (0.97), the same value as in the Middle Yellow river. The average NSE is 0.82 for GLDAS2.1, which is better than GLDAS2.0. Combined with the results in Fig. 2, GLDAS2.0 and GLDAS2.1 have large uncertainties in runoff estimation in terms of both RB (average absolute values of at least 39%) and NSE (average values no higher than 0.15), but GLDAS2.1 performs better than GLDAS2.0. This result implies that GLDAS2.1 is preferable to GLDAS2.0 in water-related research, and that more confidence can be placed in analyses using GLDAS2.1 than in those using GLDAS2.0.

Time series analysis
Fig. 4 compares GLDAS2.0 and GLDAS2.1 runoff with observations as time series. GLDAS2.0 and GLDAS2.1 underestimate most of the peak values in the Lower and Upper Songhua river, Middle and Upper Yangtze river, Upper Yellow river and Heihe. In the Upper Yangtze river, Upper Yellow river and Heihe, the underestimation may result from underestimated rainfall (Supplementary Information Fig. S1g, h and i). In addition, human activities strongly influence river runoff in the Heihe [Zang and Liu, 2013]; therefore, the runoff uncertainty of GLDAS2.0 and GLDAS2.1 in the Heihe could also be attributed to the fact that they do not consider the influence of human activities.
In the other basins, the peak underestimation may be due to model uncertainty and interactions among rainfall data uncertainty, Noah model uncertainty and human intervention, because peak rainfall is not substantially underestimated there (Supplementary Information Fig. S1a, b and e). In the Lancang river, GLDAS2.0 underestimates all the peak runoff, possibly because it also underestimates all the peak rainfall (Supplementary Information Fig. S1j).
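Peak underestimation of this kind can be quantified with a simple ratio of simulated to observed annual maximum monthly runoff. The sketch below is a hypothetical diagnostic, not the study's method; it assumes the series start in January and cover whole years, and the data are invented.

```python
from typing import List, Sequence


def annual_peak_ratios(sim: Sequence[float], obs: Sequence[float],
                       months_per_year: int = 12) -> List[float]:
    """Ratio of simulated to observed maximum monthly runoff for each year.

    A ratio below 1 means the peak magnitude is underestimated. Only magnitudes are
    compared; the simulated peak may fall in a different month from the observed one.
    """
    if len(sim) != len(obs) or len(obs) % months_per_year != 0:
        raise ValueError("series must be paired and cover whole years")
    ratios = []
    for start in range(0, len(obs), months_per_year):
        year_sim = sim[start:start + months_per_year]
        year_obs = obs[start:start + months_per_year]
        ratios.append(max(year_sim) / max(year_obs))
    return ratios


# Two hypothetical years of monthly runoff (m^3/s); not data from this study.
obs = [1.0] * 11 + [10.0] + [2.0] * 6 + [8.0] + [2.0] * 5
sim = [1.0] * 11 + [5.0] + [2.0] * 6 + [6.0] + [2.0] * 5
print(annual_peak_ratios(sim, obs))  # [0.5, 0.75]
```

Applying the same function to rainfall would show whether a basin's low runoff peaks coincide with low rainfall peaks, which is the distinction drawn above between forcing-driven and model-driven underestimation.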
Compared with GLDAS2.0, GLDAS2.1 replicates the observed peak runoff relatively well in the Lancang river. In the Lower and Middle Yangtze river and the Yarlung Tsangpo river, both GLDAS2.0 and GLDAS2.1 replicate the seasonal variation of runoff well. In other basins, however, the CC values are low. The average CC values are 0.59 and 0.63 for GLDAS2.0 and GLDAS2.1, respectively; thus, GLDAS2.1 is better than GLDAS2.0.
In the Middle and Lower Yangtze river and the Middle Yellow river, the rainfall data of GLDAS2.0 and GLDAS2.1 replicate observations well, with NSE above 0.88 and CC above 0.97. In the Middle and Lower Yangtze river, the absolute RB values are less than 9% for both GLDAS2.0 and GLDAS2.1. In the Middle Yellow river, the RB of GLDAS2.1 is -1% and that of GLDAS2.0 is -18%. Therefore, rainfall data may not be a main source of runoff uncertainty in the Middle and Lower Yangtze river. In the Middle Yellow river, rainfall may not be a main source of runoff uncertainty for GLDAS2.1, but it may be for GLDAS2.0.
As pointed out by Qi et al. [2016a], models can amplify or dampen input data uncertainty along the propagation chain from input to model output because of imperfect or incomplete model structures, which is often termed interactions or non-linear uncertainty propagation [Bosshard et al., 2013]. For example, the monthly rainfall from GLDAS2.1 is very close to observations in the Upper Yellow river (Fig. S1), but its monthly runoff is much lower than observed (Fig. 4); in the Lower and Upper Songhua river, the monthly rainfall from GLDAS2.1 is higher than observed (Fig. S1), but its monthly runoff is much lower than observed (Fig. 4). In such cases, the results may imply that the model and its interactions with input data have a more significant influence on the output than input data uncertainty does.

Regional average analysis
Fig. 5 shows regional averages of NSE and absolute RB. The absolute RB of GLDAS2.0 runoff is higher in the Tibetan Plateau (64%) than in Middle and Northeast China. Unlike GLDAS2.0, GLDAS2.1 has its highest absolute RB in Northeast China (43%), while its value in the Tibetan Plateau (40%) is still very high. These results indicate that both GLDAS2.0 and GLDAS2.1 have large uncertainty in the Tibetan Plateau. Similarly, the absolute RB of rainfall is highest in the Tibetan Plateau (Fig. 5c), at 36% and 24% for GLDAS2.0 and GLDAS2.1, respectively.
Regarding NSE, Fig. 5b shows that the NSE of GLDAS2.0 runoff is lowest in the Tibetan Plateau (-0.46), which is similar to rainfall (Supplementary Information Fig. S1d): the NSE of GLDAS2.0 rainfall is also lowest in the Tibetan Plateau (0.60). Unlike GLDAS2.0, the NSE of GLDAS2.1 runoff is highest in the Tibetan Plateau; although it exceeds that of GLDAS2.0 there, it still indicates very low accuracy, with a value of only 0.30 (Fig. 5b). For GLDAS2.1, unlike runoff, rainfall NSE is lowest in the Tibetan Plateau (0.73; Fig. 5d). Overall, both GLDAS2.0 and GLDAS2.1 have large uncertainties in runoff estimation in the Tibetan Plateau, with GLDAS2.0 having higher uncertainty in terms of both RB and NSE. The large uncertainty may arise because the rainfall data of both GLDAS2.0 and GLDAS2.1 have the highest uncertainty in the Tibetan Plateau among the three regions.
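The text does not state whether the regional averages in Fig. 5 weight each basin equally or by area. The sketch below supports both options so the difference can be seen; the record layout `(region, basin_area_km2, rb_percent)`, the function name and all numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def regional_mean_abs_rb(records: Iterable[Tuple[str, float, float]],
                         area_weighted: bool = False) -> Dict[str, float]:
    """Mean |RB| (%) per region from (region, basin_area_km2, rb_percent) records.

    With area_weighted=False every basin counts equally; with True, larger basins
    count more. Which option matches Fig. 5 is an assumption, not stated in the text.
    """
    weighted_sum: Dict[str, float] = defaultdict(float)
    total_weight: Dict[str, float] = defaultdict(float)
    for region, area_km2, rb_percent in records:
        weight = area_km2 if area_weighted else 1.0
        weighted_sum[region] += weight * abs(rb_percent)
        total_weight[region] += weight
    return {region: weighted_sum[region] / total_weight[region] for region in weighted_sum}


# Hypothetical basins: (region, area in km^2, RB in %); not results from this study.
records = [
    ("Tibetan Plateau", 100.0, -50.0),
    ("Tibetan Plateau", 300.0, 30.0),
    ("Northeast China", 200.0, 20.0),
    ("Northeast China", 200.0, -10.0),
]
print(regional_mean_abs_rb(records))
# {'Tibetan Plateau': 40.0, 'Northeast China': 15.0}
print(regional_mean_abs_rb(records, area_weighted=True))
# {'Tibetan Plateau': 35.0, 'Northeast China': 15.0}
```

When basins in a region differ greatly in size, the two conventions can shift the regional value noticeably, so the choice is worth reporting alongside the averages.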

Conclusions and future work
This study evaluates and inter-compares GLDAS2.0 and GLDAS2.1 runoff simulations on large scales against gauge observations in China. Both GLDAS2.0 and GLDAS2.1 have large uncertainties in runoff simulation, with average absolute RB values of at least 39% and average NSE values no higher than 0.15. Therefore, they are not reliable enough to underpin water-related studies in China. GLDAS2.0 has larger uncertainties in the Tibetan Plateau than in Middle China and Northeast China. We also found that, across regions, the NSE of GLDAS2.1 runoff follows a pattern opposite to that of rainfall. Thus, caution should be taken when using these datasets.
Most hydrological gauges do not monitor snow- and/or glacier-melt flow, and frozen ground distribution is not well monitored in the study regions. Therefore, we did not specifically evaluate the influence of snow- and/or glacier-melt flow and frozen ground in the Tibetan Plateau and Northeast China. Future research is encouraged once well-monitored snow- and/or glacier-melt flow and frozen ground data become available. In addition to uncertainty from input data and parameterization, the runoff uncertainty may partly result from the absence of glacier melt and human intervention in the models used by GLDAS2.0 and GLDAS2.1, which consider neither glacier melt nor human water abstraction. Future research is encouraged to verify whether system performance improves once the GLDAS models are updated to include these processes.

GLDAS2.0 and GLDAS2.1 runoff is calculated with the Noah land surface model (Noah 3.3), which accounts for vegetation transpiration, the surface water and energy balance, snowmelt water, etc. Hereafter, GLDAS system uncertainty refers to input data uncertainty and model uncertainty. Three regions with different hydro-climatic characteristics are chosen for the analysis, i.e., the Tibetan Plateau, Middle China and Northeast China. Uncertainties in the GLDAS data and the possible factors causing them are discussed, as are cautions for using the GLDAS data in China.

Fig. 2 and supporting information Table S1 and S2 are because the uncertainty criteria

Fig. 5

Fig. 5 also shows the NSE values of runoff are decreasing from Northeast China to the Tibetan

Acknowledgements:
This study was supported by the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 51809136), the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDA20060402), and the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 51509176). Additional support was provided by the Guangdong Provincial Key Laboratory of Soil and Groundwater Pollution Control (2017B030301012) and the State Environmental Protection Key Laboratory of Integrated Surface Water-Groundwater Pollution Control. The GLDAS data used in this study were acquired as part of the mission of NASA's Earth Science Division and archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC). GLDAS data were downloaded from https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS. Data for the Songhua and Liao rivers can be obtained by contacting the Songliao Water Resources Commission (http://www.slwr.gov.cn/slwjj/slwgk/). Data for the Yellow and Yangtze rivers can be obtained by contacting the National Earth System Science Data Sharing Infrastructure, National Science & Technology Infrastructure of China (http://www.geodata.cn); we acknowledge its data support. Data for the Heihe are from the Zhangye Water Resource Management Bureau and can be obtained by contacting the Heihe data management center (http://www.heihedata.org/). Data for the Yarlung Tsangpo and Lancang rivers are from the Nuxia and Changdu hydrology bureaus and can be obtained by contacting the Climate Change and Water Resources in the Great River Regions in the Southeast and South Asia

©2019 American Geophysical Union. All rights reserved.

Fig. 3. Nash-Sutcliffe Efficiency (NSE) of runoff and rainfall on a monthly scale.

Table 1. Details of the study regions.

Fig. 1. The regions studied.