The Assessment and Comparison of TMPA and IMERG Products Over the Major Basins of Mainland China

The Integrated Multi-satellitE Retrievals for Global Precipitation Measurement mission (IMERG) aims to deliver the "best" precipitation estimation from space and has attracted much attention. The Version 05 IMERG products, including the near-real-time "Early" and "Late" run products (IMERG-E and IMERG-L, respectively) and the post-real-time "Final" run product (IMERG-F), are assessed at both national and basin scales against gauge observations over Mainland China for a 4-year period (April 2014 to March 2018). As control products for comparison, their predecessors, the Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) products (i.e., TMPA-RT and TMPA-V7), are also employed. Error-component analysis confirms that IMERG-F performs best among the five satellite-based precipitation estimations (SPEs) in three different categories. All five SPEs feature increasing bias and root mean square difference (RMSD) with increasing daily gauge total precipitation, and this issue is less pronounced for IMERG-F, as evidenced by its lowest bias and RMSD across all precipitation rates. Compared to TMPA, IMERG products also detect real precipitation events more accurately, especially for light-to-medium rain (<60 mm/day), but they do not demonstrate significant improvement in the assessment of severe over/underestimation. In the basin-scale comparison, all five SPEs capture the key variation features of the basin-averaged precipitation time series (except TMPA-RT over the Continental River Basin). Overall, IMERG-F demonstrates the best performance over all nine basins despite a slight overestimation, followed by TMPA-V7. IMERG-E and IMERG-L show performance close to or even better than TMPA-V7 in terms of the correlation coefficient and RMSD.


Introduction
Floods are among the most common and costly natural hazards worldwide. Over the past decades, flood-related disasters have caused economic losses of tens of billions of dollars (USD) and a large number of casualties each year (Hirabayashi et al., 2013). Most of these losses and casualties occurred in densely populated and underdeveloped countries (Wu et al., 2012), and increasing climate variability may exacerbate flood damage in the future (Hirabayashi et al., 2013). China, as a populous and developing country, has experienced several devastating flood events (Ma et al., 2018), such as the unprecedented great flood of the Yangtze River in 1998, which affected 223 million people and resulted in a direct economic loss of 166,600 million CNY (Chinese Yuan) (Zong & Chen, 2000). Frequently occurring flood events highlight the importance of a flood forecasting system (FFS), which can provide sufficient lead time for risk response. In practice, the core component of an operational FFS is an applicable hydrological model, which is calibrated with retrospective hydrometeorological data to reproduce the observed hydrographs with acceptable errors and is driven by real-time forcing to provide up-to-date hydrological states (Zhang & Tang, 2015). However, appropriate past or real-time precipitation data are often insufficient in most developing countries (e.g., China) and many remote parts of the world, which makes it difficult to build an operational FFS.
Precipitation is a crucial component of global water and energy cycles, and the reliability of an FFS is strongly dependent on the quality of the rainfall inputs (Brunetti et al., 2018). Generally, rain gauges and ground-based weather radars are the major sources of precipitation data, and they are considered the most reliable sources (Tapiador et al., 2012). However, uneven distribution, insufficient density, and limited spatial coverage hamper the wide application of these ground-based precipitation observations (GPOs), especially in basins with complex topography (Anjum et al., 2018; Li et al., 2017). In recent years, the rapid development of satellite observations and retrieval techniques has prompted the continuous improvement of satellite-based precipitation estimations (SPEs), providing new opportunities for regional, national, and global scale applications (Kim et al., 2018; J Su et al., 2017; Wu et al., 2012) to overcome the limitations of GPOs.
Since the Tropical Rainfall Measuring Mission (TRMM), operated by the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA), was launched in 1997 as the first dedicated meteorological precipitation satellite, several diverse quasi-global SPEs have been released publicly through different techniques, such as the TRMM Multi-satellite Precipitation Analysis (TMPA) (George J Huffman et al., 2007), Climate Prediction Center (CPC) MORPHing technique (CMORPH) (Joyce et al., 2004), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) (Sorooshian et al., 2000), and Global Satellite Mapping of Precipitation (GSMaP) (Ushio et al., 2009). Among them, the TMPA products, especially the latest Version 7 (V7), are regarded as the most reliable and extensively used products in the TRMM era over most parts of the world (Dinku et al., 2008; Gao et al., 2017; Qin et al., 2014; F Su et al., 2008; J Su et al., 2017; H Yang et al., 2007; Yong et al., 2015). Over Mainland China, Yong et al. (2016) evaluated both near-real-time and post-real-time TMPA with an improved error-component analysis procedure and suggested that the error components in TMPA exhibit rather strong regional and seasonal differences over diverse climate regimes. Shen et al. (2010) assessed six high-resolution SPEs over China and pointed out that all six SPEs could capture the overall spatial distribution and temporal variations of precipitation reasonably well, while CMORPH demonstrated slightly better accuracy than TMPA. Sheng et al. (2013) systematically evaluated and compared the two successive versions TMPA V6 and V7 over Mainland China and demonstrated that TMPA V7 clearly improved upon its predecessor. Thanks to the contributions of previous researchers, the spatial and temporal distribution, structure, and magnitude of errors contained in TRMM-era SPEs have been preliminarily understood.
The TRMM satellite re-entered the earth's atmosphere on 15 June 2015, ending more than 17 years of productive data gathering; it achieved great success both in providing huge volumes of quasi-global SPEs and in accumulating retrieval experience and techniques (J Su et al., 2019). Introduced as a successor to TRMM, the Global Precipitation Measurement (GPM) mission's Core Observatory (CO) satellite was launched on 27 February 2014, marking a transition from the TRMM era to the GPM era. Equipped with the latest Dual-Frequency Precipitation Radar (DPR, the Ku-band at 13.6 GHz and Ka-band at 35.5 GHz) and a conical scanning, multichannel GPM Microwave Imager (GMI, frequency ranging from 10 to 183 GHz), the CO satellite of GPM is designed to improve the accuracy in detecting light and solid precipitation and thus provide a better reference standard for a constellation of research and operational satellites (Hou et al., 2014; George J. Huffman et al., 2014; J Su et al., 2019). In addition, the GPM mission is supported by approximately 10 partner satellites. The joint action of the CO satellite and its partner satellites can effectively enhance the temporal and spatial coverage of sampling and further improve the spatiotemporal resolution.
As the Level 3 products of the GPM mission, the Integrated Multi-satellitE Retrievals for GPM (IMERG) precipitation estimations are produced by merging all available satellite microwave estimations from the low earth orbit satellites and microwave-calibrated infrared estimations from the geosynchronous orbit (GEO) satellites (George J. Huffman et al., 2017). To accommodate different user requirements, IMERG supplies three types of products: the near-real-time "Early" run (IMERG-E), the "Late" run (IMERG-L), and the post-real-time "Final" run (IMERG-F) products (Hou et al., 2014; George J. Huffman et al., 2014). Among them, IMERG-E and IMERG-L are satellite-only SPEs released approximately 4 and 12 hr after the observation time, respectively. IMERG-F is a satellite-gauge merged SPE, which is calibrated with the Global Precipitation Climatology Center (GPCC) monthly precipitation data via the gauge calibration algorithm (GCA). As the IMERG algorithm potentially delivers the "best" precipitation estimations (George J. Huffman et al., 2017), assessment and benchmarking of IMERG products have been in focus. For instance, an earlier study globally intercompared and regionally assessed different versions of the IMERG products (i.e., V03, V04, and V05). Its results indicate that both IMERG-F V04 and V05 show significant differences and improvements from V03, particularly over the ocean, and that IMERG-F V05 improves upon both V04 and V03 over Mainland China. An initial comparison of monthly TMPA and IMERG products conducted by Liu (2015) at a global scale reported that large systematic differences exist between them and vary with the surface type and precipitation rate.
Simultaneously, several initial statistical performance evaluations of IMERG products have been conducted in different regions, such as India (Ganesh et al., 2019; Prakash et al., 2016), Iran (Sharifi et al., 2016), South America (Palomino-Ángel et al., 2019), West and East Africa (Dezfuli et al., 2017), East Asia (Lee et al., 2019), Singapore (M L Tan & Duan, 2017), and the USA (J Tan et al., 2016). Most of them found that IMERG products generally outperform the earlier TMPA products and better capture both the spatial and temporal variation of precipitation.
Mainland China has a complex topography and spatially heterogeneous precipitation, which makes it a good test-bed for evaluating SPEs. More importantly, the natural pattern of northern droughts and southern floods in Mainland China has generated a large number of potential users of SPEs. Hence, the assessment of IMERG products over Mainland China is valuable for both algorithm developers and data users. Thus far, there have been several preliminary assessments of IMERG products over Mainland China. For instance, Chen and Li (2016) evaluated IMERG at the monthly scale and compared it with TMPA 3B43 products, while Ning et al. (2016) directly compared IMERG and GSMaP against gauge observations over a period from April 2014 to November 2015. Moreover, IMERG and TMPA are compared in the works of L Jiang and Bauer-Gottwein (2019) and Wei et al. (2018). These studies all used a sparse gauge network (fewer than 840 weather stations over the whole of China) as the benchmark. However, a sparse gauge network may not provide sufficient spatial coverage and detail of precipitation information and thus leads to high uncertainty in the assessments. Additionally, IMERG and TMPA were evaluated in Tang, Ma, et al. (2016) and Guo et al. (2016) with a relatively dense network of more than 2,400 gauges. Nevertheless, these assessments and comparisons were only performed over a short time frame (9 months for Tang, Ma, et al., 2016, and 1 year for Guo et al., 2016). Similarly, Zhao, Yang, You, et al. (2018) and related work intercompared the performance of IMERG and GSMaP in eight geographical subregions based on the gridded Chinese daily Precipitation Analysis Product for a short period from March 2014 to December 2015. Due to the great interannual variation of precipitation in Mainland China, it is hard to obtain reliable statistical performance metrics of these SPEs over such a short duration. Meanwhile, J Su et al. (2018) attempted to reveal the 3-hr error features of multiple IMERG products and GSMaP via error-component analysis, but the reference data they used are an hourly gauge-satellite merged product produced by merging the ground data with CMORPH. Since these reference data are not truly independent, they may bias the assessment results. In addition, most previous studies have focused on the earlier versions of IMERG products (i.e., V03 and V04); thus, an assessment of IMERG V05 is still urgently needed, as it has undergone many improvements upon its previous versions (C Wang et al., 2018). Also, note that almost all assessments related to IMERG focused on the "Final" run products, and comprehensive evaluations of the "Early" and "Late" runs of the IMERG products are still rare. Considering the high practical value of near-real-time SPEs in actual forecasting, it is necessary to evaluate the accuracy of near-real-time IMERG products.
Therefore, the objectives of this study are twofold: (1) evaluating the quality of near-real-time and post-real-time IMERG V05 and comparing them with the corresponding TMPA products against gauge observations from the China Meteorological Administration (CMA) over Mainland China, and (2) comparing IMERG and TMPA products synchronously over the major basins of Mainland China to provide basic accuracy information for potential users in flood forecasting and water resources management. The remainder of this paper is organized as follows: Section 2 introduces the study area, precipitation datasets, and metrics; Sections 3 and 4 present the main results and the discussion, respectively; and a summary of the work is given in the last section.

Study Area
The study area is the entire Mainland China, which spans 73°E-135°E and 18°N-53°N and covers a land area of about 9.6 million km². As shown in Figure 1a, the terrain of China is high in the west but low in the east, which determines the direction of major rivers and plays an important role in the formation of China's climate. According to the water resources and river classification, Mainland China can be divided into nine major hydrological zones (http://www.resdc.cn), namely, Continental River Basin (CRB), Yellow River Basin (YeRB), Haihe River Basin (HaiRB), Song-Liao River Basin (SLRB), Southwest River Basin (SwRB), Yangtze River Basin (YaRB), Huai River Basin (HuaiRB), Pearl River Basin (PeRB), and Southeast Basin (SeRB). The basic characteristics of these basins are presented in Table 1 (Fang et al., 2016). The CRB is located in northwest China, which is a typical arid area with severe water shortage. The YeRB, HaiRB, and SLRB are distributed in the north and northeast of China with semi-arid and semi-humid climates. The SwRB, YaRB, and HuaiRB are distributed at the mid-latitudes of China, while the PeRB and SeRB are located at the low latitudes. Most parts of these five hydrological zones (SwRB, YaRB, HuaiRB, PeRB, and SeRB) are in the humid zone of China, whereas the PeRB and SeRB receive significantly more precipitation than the other three. In addition, Mainland China is located in the typical Asian monsoon region, where the monsoonal circulation significantly affects the spatial and temporal distribution of precipitation. Coupled with its diverse topography, this makes Mainland China a good test-bed for assessing the performance of SPEs, as well as the effects of such factors on the accuracy of the related retrieval algorithms.

Ground Reference Data
Ground-based observations are normally employed as the reference to assess the performance of SPEs. In this study, the daily precipitation observations from the Chinese national ground weather network, which includes more than 2,400 weather stations, are employed as the reference for April 2014-March 2018. All the observations from this network are subjected to rigorous quality control, including analysis of extremes, contiguous values, and spatial consistency checks. The spatial distribution of these stations is shown in Figure 1b. Generally, the weather stations are distributed unevenly over Mainland China, with dense stations in eastern China (i.e., HaiRB, HuaiRB, the eastern part of YaRB, SeRB, and PeRB) and relatively sparse stations in western China, especially in the CRB. In addition, it should be noted that, over the whole country, only 194 weather stations of this network have been employed by GPCC to produce and calibrate TMPA and IMERG products. Given that the dependent stations account for only a small fraction of the total gauges (<9%), the evaluation and comparison in this research are based on more than 91% independent weather stations. Besides, both post-real-time TMPA and IMERG products are calibrated by the GPCC at the monthly scale, whereas the assessment in this study is performed at the daily scale. Therefore, we believe that the observations from the Chinese national ground weather network can act as an acceptable benchmark in evaluating and comparing TMPA and IMERG products.

TMPA and IMERG Products
In this study, the TMPA and IMERG products are selected as typical representatives of two overlapping eras of quantitative SPEs. For the primary objectives mentioned above, both their near-real-time and post-real-time products are evaluated and compared at national and basin scales.
The TMPA products were produced by combining various precipitation estimations from different satellite systems as well as gauge observations to provide the "best" SPEs in the TRMM era (George J Huffman et al., 2007). The Version 7 TMPA products, including the near-real-time TMPA-RT and the post-real-time TMPA-V7, are available from the NASA website (https://pmm.nasa.gov/data-access/downloads/trmm) at a spatial resolution of 0.25° × 0.25° over the 50°N-50°S global latitude band. The main difference between them is that TMPA-RT is a satellite-only product directly derived from the combination of the calibrated passive microwave (PMW) data and the PMW-calibrated infrared (IR) data, whereas TMPA-V7 is a gauge-calibrated product that employs the GPCC monthly gauge precipitation analysis.
The IMERG algorithm was designed by integrating the TMPA, the PERSIANN-Cloud Classification System (PERSIANN-CCS), and the CMORPH-Kalman Filter (CMORPH-KF) techniques into a unified United States (US) algorithm. In this algorithm, multiple satellite microwave precipitation estimates are intercalibrated, merged, and interpolated to provide quantitative SPEs at fine spatiotemporal resolution (0.1° × 0.1° and half hourly). As with TMPA, the IMERG algorithm also provides three kinds of products to accommodate different user requirements, among which IMERG-E and IMERG-L are produced in near real time and calibrated with climatological coefficients, while IMERG-F is produced post-real-time and calibrated by GPCC. Since their first release in 2015, the IMERG products have been updated several times, i.e., V03, V04, V05, and V06. Since the near-real-time products of IMERG V06 are not yet available, the V05 products acquired from https://pmm.nasa.gov/data-access/downloads/gpm are employed in this study.
In addition, to structure the daily evaluation framework, TMPA and IMERG products are temporally aggregated to produce daily precipitation accumulations commensurate with the gauge observations (Yuan et al., 2017). Then, to facilitate direct comparisons, only the grids containing at least one weather station are selected from the TMPA and IMERG products. If there are two or more weather stations in the same grid, their mean value is employed as the reference.
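The aggregation and grid-matching steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the processing chain used in the study: the function names, the grid origin at (0°, 0°), and the assumption that the half-hourly fields are rain rates in mm/hr are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def grid_index(lat, lon, res):
    """Return the (row, col) index of the res-degree cell containing a point.

    Assumption: cell edges fall on integer multiples of `res`.
    """
    return (math.floor(lat / res), math.floor(lon / res))


def daily_total(half_hourly_rates_mm_per_hr):
    """Accumulate half-hourly rain rates (assumed mm/hr) into a daily total (mm)."""
    return sum(rate * 0.5 for rate in half_hourly_rates_mm_per_hr)


def gauge_reference(stations, res):
    """Average the daily gauge values of all stations that share a grid cell.

    stations: iterable of (lat, lon, daily_mm) tuples.
    Returns a dict mapping cell index -> mean gauge value; cells without
    any station never appear, so they are excluded from the comparison.
    """
    cells = defaultdict(list)
    for lat, lon, value in stations:
        cells[grid_index(lat, lon, res)].append(value)
    return {cell: sum(values) / len(values) for cell, values in cells.items()}
```

Calling `gauge_reference` with `res=0.1` would give an IMERG-like matching and `res=0.25` a TMPA-like one; only the cells returned by it would then be sampled from the satellite fields.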

Methodology
This research focuses on evaluating and comparing the five SPEs over the major basins of Mainland China.
To build the assessment framework, a series of widely used evaluation metrics, including the Pearson correlation coefficient (CC), the Bias, and the root mean square difference (RMSD), are utilized. Among them, CC describes the linear agreement between SPEs and gauge observations; Bias, which assesses the systematic bias of SPEs (i.e., overestimation or underestimation), is essentially the mean value of the difference between SPEs and gauge observations; and RMSD shows the average magnitude of the error and gives greater weight to larger errors. Besides, to evaluate the precipitation detection capability of SPEs, two categorical indices are employed: the probability of detection (POD) and the false alarm ratio (FAR). POD quantifies the fraction of true rainfall occurrences that are correctly detected by the SPE, while FAR denotes the fraction of rainfall events detected by the SPE for which the gauges recorded no rain. Considering the detection resolution of weather stations, a small value (0.1 mm/day) is used as the rain/no-rain threshold in this study. Detailed formulas and perfect values of the metrics mentioned above are listed in Table 2. For detailed information, please refer to Wilks (2006) and Ebert et al. (2007).
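As a rough illustration of the metrics described above, the following Python sketch computes them for paired daily series. Because Table 2 is not reproduced here, the formulas follow the conventional definitions (e.g., POD = H/(H+M), FAR = F/(H+F)); treating a value equal to the 0.1 mm/day threshold as rain is an assumption, and the function names are hypothetical.

```python
import math


def continuous_metrics(sat, gauge):
    """Return (CC, Bias, RMSD) for paired satellite and gauge values (mm/day)."""
    n = len(sat)
    mean_s = sum(sat) / n
    mean_g = sum(gauge) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(sat, gauge))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    # CC is undefined when either series is constant.
    cc = cov / math.sqrt(var_s * var_g) if var_s > 0 and var_g > 0 else float("nan")
    bias = sum(s - g for s, g in zip(sat, gauge)) / n  # mean difference
    rmsd = math.sqrt(sum((s - g) ** 2 for s, g in zip(sat, gauge)) / n)
    return cc, bias, rmsd


def detection_metrics(sat, gauge, threshold=0.1):
    """Return (POD, FAR) using a rain/no-rain threshold in mm/day."""
    hits = misses = false_alarms = 0
    for s, g in zip(sat, gauge):
        sat_rain = s >= threshold      # assumption: ">=" rather than ">"
        gauge_rain = g >= threshold
        if sat_rain and gauge_rain:
            hits += 1
        elif gauge_rain:
            misses += 1
        elif sat_rain:
            false_alarms += 1
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far
```

Perfect agreement would give CC = 1, Bias = 0, RMSD = 0, POD = 1, and FAR = 0.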

Grid-scale Errors in IMERG and TMPA Products
To visually assess the accuracy of SPEs in capturing precipitation accumulation from April 2014 to March 2018, the spatial distributions of 4-year daily average precipitation derived from the five SPEs and ground weather stations are shown in Figure 2. Since the southeastern and eastern parts of Mainland China are located in the East Asian monsoon region, the spatial distribution of precipitation accumulation exhibits the influence of the monsoon over the corresponding regions. From Figure 2g, it is obvious that the gauge-observed precipitation gradually decreases from the southeast (>4 mm/day) to the northwest (<1 mm/day) of China. This is mainly because the southeast monsoon from the Pacific Ocean and the southwest monsoon from the Bay of Bengal (as shown in Figure 1a) bring abundant moisture to southeastern China, generating a large amount of rain. The moisture carried by the monsoon declines as it moves north, and the precipitation accumulation gradually decreases accordingly. Since the moisture can hardly reach northwest China, precipitation in such areas is scarce. Besides, Mainland China is also influenced by the midlatitude westerlies. Since the midlatitude westerlies are much weaker than both the southeast and southwest monsoons, the vast northwest of China frequently suffers from drought and water shortage. Due to the sparse weather stations in northwest China, the impact of the midlatitude westerlies is not captured in enough detail.
All five SPEs capture this coarse-scale spatial distribution pattern of precipitation well, whereas some differences at finer scales still exist among them. In order to quantify such differences, the daily average precipitation sampled from all grids with at least one weather station is compared in the scatter plots of Figure 3. The related error metrics, including CC, Bias, and RMSD, are also provided in Figure 3. The near-real-time TMPA-RT presents the poorest performance in terms of all three measures of error (CC, Bias, and RMSD). The near-real-time IMERG-E and IMERG-L show superior performance to TMPA-RT. Interestingly, the performance of IMERG-L is slightly lower than that of IMERG-E in terms of Bias and RMSD, although IMERG-L employs more sensor information and has a longer data latency. As for the post-real-time products, TMPA-V7 demonstrates a noticeable improvement over its near-real-time counterpart, whereas the bias of IMERG-F is higher than that of TMPA-V7 and the near-real-time IMERG products (0.20 mm/day for IMERG-F versus 0.10 mm/day for TMPA-V7, and 0.02 mm/day and −0.04 mm/day for IMERG-E and IMERG-L, respectively). Considering the high RMSD values and the loose distribution of points shown for IMERG-E and IMERG-L, IMERG-F features the largest bias but the smallest spread, whereas IMERG-E/L have smaller bias but wider spread (than both TMPA-V7 and IMERG-F). TMPA-V7 has a smaller bias than IMERG-F but a similar spread (their RMSD values are similar). The systematic error (bias) is easier to correct than the random error (the scatter around the mean trend). For further comparison and to track the error sources, the error of the SPEs needs to be subdivided into different components (i.e., error components in the Hits, Misses, and False categories).
The comparison of the five SPEs versus gauge observations in three different detection categories (i.e., Hits, Misses, and False categories) is summarized in Table 3. The IMERG products show a remarkable improvement in detecting real precipitation events, with the POD values increasing from less than 0.6 for TMPA products to more than 0.75 for IMERG products. Two major factors could contribute to such improvement: (1) the extension of the GPM sensors, which effectively improves the ability to detect light rain (J Su et al., 2019); and (2) the finer temporal resolution of IMERG products, which provides a higher chance of capturing short-term precipitation events. However, the enhanced sensitivity of sensors and increased sampling frequency also increase the detection of false precipitation events, as evidenced by the increase in FAR values from approximately 0.31 for TMPA products to approximately 0.39 for IMERG products. In terms of Bias and RMSD, since IMERG products have much lower Bias and RMSD values in all three categories, they exhibit superior performance to TMPA. In the intercomparison of IMERG products, IMERG-F exhibits the lowest negative bias in the Misses category (Bias is −1.81 mm/day for IMERG-F versus −2.04 mm/day and −1.81 mm/day for IMERG-E and IMERG-L, respectively), but its positive bias is slightly higher than that of IMERG-L in the False category. Besides, in the Hits category, the two near-real-time IMERG products both tend to underestimate precipitation, whereas IMERG-F exhibits a slight overestimation. The relatively high precipitation accumulation of IMERG-F in the Hits and False categories is mainly caused by the monthly gauge calibration, which attempts to alleviate the underestimation of near-real-time IMERG products over southeast China but in turn results in slight overestimation for IMERG-F detections over such regions (Xu et al., 2016).
Besides, note that IMERG-F shows the lowest RMSD in all three categories, which highlights the excellent performance of IMERG-F.
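The Hits/Misses/False decomposition discussed above can be sketched as follows. The per-category Bias and RMSD are computed here as averages over the events in each category, which is an assumption about how Table 3 was built; the function name and the handling of the 0.1 mm/day threshold are likewise illustrative only.

```python
import math


def component_errors(sat, gauge, threshold=0.1):
    """Split paired daily values into Hits, Misses, and False categories.

    Returns a dict: category -> (count, bias, rmsd), where bias and rmsd
    are averaged over the events of that category (assumed convention).
    """
    diffs = {"hits": [], "misses": [], "false": []}
    for s, g in zip(sat, gauge):
        sat_rain = s >= threshold
        gauge_rain = g >= threshold
        if sat_rain and gauge_rain:
            diffs["hits"].append(s - g)    # either sign
        elif gauge_rain:
            diffs["misses"].append(s - g)  # non-positive: rain was missed
        elif sat_rain:
            diffs["false"].append(s - g)   # non-negative: spurious rain
        # days that are dry in both sources carry no error component

    summary = {}
    for category, values in diffs.items():
        if values:
            bias = sum(values) / len(values)
            rmsd = math.sqrt(sum(v * v for v in values) / len(values))
        else:
            bias = rmsd = float("nan")
        summary[category] = (len(values), bias, rmsd)
    return summary
```

By construction the Misses bias can only be negative or zero and the False bias positive or zero, which matches the signs reported for those categories in the text.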
Severe over/underestimation of precipitation can cause serious issues in disaster-related risk assessment (e.g., floods, landslides, and hurricanes) and thus reduce the practical usability of SPEs. Hence, the fractions of severe over/underestimation events for the five SPEs are compared in Figure 4. Here, severe overestimation is defined as the precipitation detected by an SPE exceeding the gauge observation by 100% or more (including part of the Hits category and the whole False category), and severe underestimation is defined as the precipitation detected by an SPE being less than half of the gauge observation. Figure 4 shows a high proportion (approximately 60% of the SPE's detections) of severe overestimation for all five SPEs, but the components of severe overestimation are different. In comparison with TMPA products, IMERG products have a higher proportion of false alarms and a lower probability of severe overestimation in the Hits category. With regard to severe underestimation, the fraction is relatively low for all five SPEs (approximately 17% of the SPE's detections). However, since the severe underestimation of rainstorms can cause catastrophic damage, such a drawback still needs to be addressed. In addition, the comparison between near-real-time and post-real-time SPEs (i.e., TMPA-RT versus TMPA-V7, and IMERG-E/L versus IMERG-F) shows that the gauge calibration contributes very little to reducing the fraction of severe overestimation/underestimation, which highlights the potential utility of the near-real-time SPEs in the analysis of extreme precipitation events. Table 4 summarizes the Bias and RMSD values of the five SPEs over all grids (with at least one gauge) at 11 different ranges of precipitation rates, and their associated differences from gauge observations (i.e., SPE minus gauge observation) are shown in Figures 5a-5e.
Meanwhile, taking TMPA-V7 and IMERG-F as representatives of TMPA and IMERG products, respectively, the variation of their POD at different precipitation rates is shown in Figure 5f. A notable feature of the errors shown in Figures 5a-5e is the large number of outliers, which indicates extreme overestimation or underestimation by the five SPEs. The bulk of the boxplot outliers across all rain rates except >100 mm/day highlights the positively skewed distributions of the differences caused by overestimation. Besides, with increasing daily precipitation rates, all five SPEs exhibit a trend of gradually increasing negative bias, starting from a slight positive bias of 1.02-1.82 mm/day for light rain (<10 mm/day). In other words, the SPEs usually overestimate light rain but underestimate medium-to-heavy rain (>20 mm/day) (Pipunic et al., 2015; Yong et al., 2010). Meanwhile, the RMSD of the SPEs increases with the precipitation rate, indicating a multiplicative nature of the error. In the comparison of multiple SPEs, near-real-time IMERG products generally have the highest negative bias (except for light rain), but post-real-time IMERG-F exhibits the lowest negative bias, which highlights the importance of the GCA in producing IMERG-F. Besides, among the five SPEs, IMERG-F also has the lowest RMSD. In Figure 5f, the results show a clear trend of increasing POD with increasing rain rate, indicating that the higher the daily total precipitation, the lower the risk of a missed event. Even so, for heavy rain events, the relatively high negative Bias and RMSD as well as the numerous outliers indicate that the performance is still far from excellent. Besides, for light/moderate rain events (≤90 mm/day), IMERG products have higher POD than TMPA products.
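The severe over/underestimation rules and the rate-stratified statistics described above can be illustrated with a short Python sketch. The bin edges used here (10 mm/day steps up to 100 mm/day, giving 11 ranges) are an assumption, since Table 4 is not reproduced in this section; the function names and the treatment of the 0.1 mm/day threshold are also hypothetical.

```python
import math
from bisect import bisect_right

# Assumed bin edges (mm/day): [0,10), [10,20), ..., [90,100), >=100 -> 11 ranges.
ASSUMED_EDGES = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]


def classify_event(s, g, threshold=0.1):
    """Label one satellite/gauge pair using the severe-error definitions."""
    sat_rain = s >= threshold
    gauge_rain = g >= threshold
    if not sat_rain and not gauge_rain:
        return "no_rain"
    if sat_rain and not gauge_rain:
        return "severe_over"      # the whole False category
    if s < 0.5 * g:
        return "severe_under"     # less than half of the gauge value
    if s >= 2.0 * g:
        return "severe_over"      # exceeds the gauge value by >= 100%
    return "other"


def binned_errors(sat, gauge, edges=ASSUMED_EDGES, threshold=0.1):
    """Return {bin index: (count, bias, rmsd)} stratified by gauge rain rate."""
    groups = {}
    for s, g in zip(sat, gauge):
        if g < threshold:
            continue              # stratify rainy gauge days only
        groups.setdefault(bisect_right(edges, g), []).append(s - g)
    result = {}
    for index, diffs in groups.items():
        bias = sum(diffs) / len(diffs)
        rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
        result[index] = (len(diffs), bias, rmsd)
    return result
```

With these assumed edges, bin index 0 corresponds to light rain (<10 mm/day) and index 10 to gauge totals of 100 mm/day or more.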

Spatial Distribution of Errors in IMERG and TMPA Products
In order to analyze the spatial variations of SPEs, the temporal error metrics (including CC, Bias, and RMSD) of the five SPEs over all grids with at least one weather station are calculated and compared in this section. Note that the no-rain days are excluded from the analysis to reduce their impact on the error metrics, particularly over arid regions. Besides, since TMPA products cannot provide the precipitation information above the latitude of 50°N, the weather stations over such regions are not involved in the comparison.
Figures 6a-6e show the spatial distributions of CC for the five SPEs, with associated boxplots in Figure 6f. In general, the spatial patterns of CC are similar to that of daily average precipitation shown in Figure 2, with higher values in the southeast and lower values in the northwest. For near-real-time TMPA-RT, the CC is acceptable (generally higher than 0.6) over humid southeast China, but it decreases appreciably over the semi-arid and semi-humid areas (varying between 0.4 and 0.6). The CC of TMPA-RT is lower still over northwest China, particularly around the northern boundary of the Tibetan Plateau, where CC values are even lower than 0.2. TMPA-V7 shares an almost identical spatial pattern of CC with TMPA-RT but exhibits higher values over most parts of Mainland China. The near-real-time IMERG products (i.e., IMERG-E and IMERG-L) demonstrate CC values close to or even slightly better than those of TMPA-V7, and the post-real-time IMERG-F displays the best CC of the five selected SPEs, with CC values generally higher than 0.7 over most of central and southern China. Even so, IMERG products still suffer from low CC over northwest China. Tang, Ma, et al. (2016) suggested that the varied topography and arid climate contribute most to the low CC over such areas, and the fact that few rain gauges over these regions are included in the GPCC system is also an important factor for the post-real-time SPEs. For visual comparison, the Tunxi and Ruoqiang weather stations (refer to Figure 6a) are chosen as representatives of humid and arid areas, respectively. The precipitation time series of the five SPEs versus gauge observations at these two stations are compared in Figure 7. It is obvious that the high probability of false precipitation (FAR values are higher than 0.85 for TMPA products and approximately 0.8 for IMERG products) is the key shortcoming of all SPEs over arid regions.
This is mainly because hydrometeors detected by spaceborne sensors partially or totally evaporate before reaching the surface. Since rain-area delineation and rain/no-rain delineation cannot be corrected by the GCA (J Su et al., 2018; Xu et al., 2016), the post-real-time SPEs show no significant improvement over the corresponding near-real-time products in this respect. Hence, to further improve the performance of SPEs (particularly over arid regions), more attention should be paid to developing more accurate rain-area delineation algorithms.
As with CC, the spatial distributions of Bias (mm/day) and RMSD (mm/day) for the five SPEs are shown in Figure 8. TMPA-RT exhibits severe overestimation over most parts of China, particularly over north China, the Tibetan Plateau, and southeast China (R1 in Figure 8a), where Bias values of TMPA-RT are generally higher than 1.5 mm/day. IMERG-E and IMERG-L also overestimate precipitation over most of north China and the North-China Plain (R3 in Figure 8a), but the magnitude of Bias is significantly lower than that of TMPA-RT. Besides, IMERG-E and IMERG-L display remarkable underestimation over south China and the Sichuan Basin (R2 in Figure 8a), unlike the overestimation of TMPA-RT over these regions. With regard to the post-real-time products, given that the same ground-based benchmark (i.e., GPCC) and GCA are employed by TMPA-V7 and IMERG-F, they share almost identical distributions of Bias. However, the GCA takes effect in different ways in these two products. In TMPA-V7, the GCA alleviates the overestimation shown in TMPA-RT, whereas in IMERG-F it suppresses the overestimation over north China and increases the total precipitation over south China.
Hence, compared to the corresponding near-real-time products, both TMPA-V7 and IMERG-F have lower spatial heterogeneity of Bias. However, as suggested by Xu et al. (2016), the calibration algorithm of IMERG-F significantly reduces the negative hits bias but at the same time exaggerates the false bias over south China. Therefore, compared to the near-real-time IMERG products and TMPA-V7, IMERG-F exhibits slightly higher overestimation over south China.
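The hits bias and false bias mentioned above come from splitting the total bias by detection category. A minimal sketch of one common form of this decomposition is given below; the 0.1 mm/day rain/no-rain threshold, the normalization by the number of days, and the function name `bias_components` are illustrative assumptions and may differ from the scheme actually used in this study.

```python
import numpy as np

def bias_components(sat, gauge, rain_threshold=0.1):
    """Split the mean error (mm/day) into hit, miss and false components.

    Assumption: a day is "rainy" for a source when its value is at least
    rain_threshold (mm/day); the threshold value is hypothetical.
    """
    sat = np.asarray(sat, dtype=float)
    gauge = np.asarray(gauge, dtype=float)
    sat_rain = sat >= rain_threshold
    gauge_rain = gauge >= rain_threshold
    n = sat.size

    hit = sat_rain & gauge_rain      # both report rain
    miss = ~sat_rain & gauge_rain    # gauge rain missed by the satellite
    false = sat_rain & ~gauge_rain   # satellite rain not seen by the gauge

    hit_bias = (sat[hit] - gauge[hit]).sum() / n
    miss_bias = -gauge[miss].sum() / n   # always <= 0
    false_bias = sat[false].sum() / n    # always >= 0
    return {
        "hit": float(hit_bias),
        "miss": float(miss_bias),
        "false": float(false_bias),
        "total": float(hit_bias + miss_bias + false_bias),
    }

# Hypothetical four-day series (mm/day): one hit, one miss, one false alarm, one dry day.
components = bias_components([3.0, 0.0, 2.0, 0.0], [5.0, 4.0, 0.0, 0.0])
print(components)
```

When sub-threshold amounts are zero, as in this example, the three components sum exactly to the overall mean error. A calibration that raises hit amounts can therefore reduce a negative hit bias while leaving the false component untouched or even enlarging it.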
With respect to RMSD, all five SPEs display similar spatial distributions. This is understandable considering the multiplicative nature of the RMSD contained in SPEs (Pipunic et al., 2015; Tian et al., 2013) and the similar spatial distribution of their total precipitation (shown in Figure 2). Beyond that, because of its severe overestimation, TMPA-RT features relatively higher RMSD over the southeast Tibetan Plateau (R4 in Figure 8). Overall, according to the boxplots shown in Figure 8i, TMPA-RT exhibits the highest RMSD while IMERG-F shows the lowest. In terms of RMSD, the near-real-time IMERG-E and IMERG-L show performance similar to or slightly better than that of post-real-time TMPA-V7.
To compare the capability of precipitation detection, the spatial distributions of POD and FAR are illustrated in Figure 9. As discussed in section 3.1, the enhanced sensitivity of sensors and increased sampling frequency contribute to the improved POD of IMERG products (shown in Figure 9f) but at the same time exaggerate the FAR (shown in Figure 9l). Besides, the post-real-time SPEs share almost identical distributions of POD and FAR with their respective near-real-time products, which further confirms that the GCA cannot correct rain-area delineation and rain/no-rain delineation. As for regional differences, all five SPEs demonstrate high FAR metrics over northwest China, caused mainly by the arid climate, which is an important factor in the poor performance of SPEs over this region. In comparison to TMPA products, IMERG products exhibit relatively high POD values (generally higher than 0.8) and FAR values (generally higher than 0.45) over the North-China Plain (R3 in Figure 9a), which is closely related to the high frequency of light rain. Over central and southeast China, TMPA products demonstrate low POD metrics, whereas IMERG products show significant improvement (except for the Sichuan Basin). In summary, IMERG products better catch precipitation events but also detect more false precipitation.
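For reference, POD and FAR can be computed from daily contingency counts as in the sketch below. It uses the standard definitions POD = H / (H + M) and FAR = F / (H + F), where H, M, and F are the numbers of hits, misses, and false alarms; the 0.1 mm/day rain/no-rain threshold and the function name `detection_scores` are assumptions made for illustration.

```python
import numpy as np

def detection_scores(sat, gauge, rain_threshold=0.1):
    """Probability of detection (POD) and false alarm ratio (FAR).

    Assumption: rain is "detected" when the daily value is at least
    rain_threshold (mm/day); the threshold value is hypothetical.
    """
    sat_rain = np.asarray(sat, dtype=float) >= rain_threshold
    gauge_rain = np.asarray(gauge, dtype=float) >= rain_threshold

    hits = int(np.sum(sat_rain & gauge_rain))
    misses = int(np.sum(~sat_rain & gauge_rain))
    false_alarms = int(np.sum(sat_rain & ~gauge_rain))

    # Guard against empty denominators (e.g., no gauge rain at all).
    pod = hits / (hits + misses) if (hits + misses) > 0 else float("nan")
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) > 0 else float("nan")
    return {"POD": pod, "FAR": far}

# Hypothetical five-day series (mm/day): two hits, one miss, one false alarm.
scores = detection_scores([1.0, 0.0, 2.0, 3.0, 0.0], [2.0, 1.0, 0.0, 4.0, 0.0])
print(scores)
```

Because both scores share the hit count, a more sensitive retrieval that flags rain more often tends to raise POD and FAR together, which is the pattern reported for IMERG.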

Basin-scale Assessments and Comparisons
Many studies have used SPEs for water balance analysis, water resource management, hydrological simulation, and disaster forecasting (W Yang et al., 2016). However, basin-scale assessment of SPEs is a necessary step before their wide operational application, as their inherent biases and indeterminate errors can be propagated or even amplified through the applied integration processes and thus lead to poor results (J Su et al., 2017; J Su et al., 2019). Figure 10 summarizes the evaluation metrics for the daily basin mean precipitation time series sampled from the five SPEs against gauge observations, with the associated monthly basin mean precipitation time series shown in Figure 11. Here, the monthly time series (instead of the daily one) is shown to reduce visual clutter.
Compared to the evaluation metrics shown in Figures 6 and 8, Figure 10 features increased CC and reduced RMSD, which is primarily attributable to the ability of spatial aggregation to partly filter out uncorrelated errors and thus improve the estimation of real precipitation. Consequently, all five SPEs in Figure 11 (except TMPA-RT over the CRB) catch the key variation features of basin-averaged precipitation, further indicating that the SPEs can provide accurate information on precipitation occurrence and amount in basin-averaged estimations. Meanwhile, among the five SPEs, the two post-real-time SPEs best catch the monthly precipitation variation of all basins, as expected. Furthermore, in terms of the daily CC and RMSD shown in Figure 10, IMERG-F demonstrates the best performance over all nine basins, as evidenced by the highest CC and lowest RMSD. However, in comparison to TMPA-V7, it still shows slightly higher overestimation, particularly over YaRB, HuaiRB, and SeRB. The high probability of false precipitation (shown in Figure 9) and the overestimation in the Hits scenario (shown in Table 3) contribute mainly to this overestimation. In practical applications, if the Bias of IMERG-F is spatially and temporally homogeneous, it is not a key drawback, as hydrologic integration processes can tolerate bias to a certain extent (J Su et al., 2019), and the calibration of model parameters can also suppress it (Li et al., 2017; Yuan et al., 2017). Hence, although the outliers of IMERG-F in Figure 8f indicate some severe overestimation/underestimation, the short interquartile range, which indicates low spatial heterogeneity, also highlights its high potential hydrological utility over Mainland China.

Figure 10. Summary of evaluation metrics for the five SPEs over the nine hydrological zones. The metrics were calculated based on the daily basin mean precipitation time series averaged from all grids containing at least one weather station.
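The error-filtering effect of spatial aggregation can be illustrated with a small synthetic experiment. The sketch below is not based on the study's data: the gamma-distributed "gauge" field, the Gaussian error with a 3 mm/day standard deviation, the gauge mask, and the names `basin_mean_series` and `rmsd` are all assumptions chosen only to show the mechanism. The synthetic satellite values can be negative, which is acceptable here because only the averaging of errors is of interest.

```python
import numpy as np

def rmsd(estimate, reference):
    """Root mean square difference between two arrays of equal shape."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def basin_mean_series(grid_series, has_gauge):
    """Daily basin mean over the grids that contain at least one gauge.

    grid_series: array of shape (n_days, n_grids)
    has_gauge:   boolean array of shape (n_grids,)
    """
    grid_series = np.asarray(grid_series, dtype=float)
    mask = np.asarray(has_gauge, dtype=bool)
    return grid_series[:, mask].mean(axis=1)

rng = np.random.default_rng(0)
n_days, n_grids = 365, 50
has_gauge = np.arange(n_grids) % 2 == 0  # hypothetical: every other grid has a gauge

# Synthetic "truth" and a satellite estimate with spatially uncorrelated error.
gauge = rng.gamma(shape=0.5, scale=8.0, size=(n_days, n_grids))
satellite = gauge + rng.normal(loc=0.0, scale=3.0, size=(n_days, n_grids))

grid_rmsd = rmsd(satellite[:, has_gauge], gauge[:, has_gauge])
basin_rmsd = rmsd(basin_mean_series(satellite, has_gauge),
                  basin_mean_series(gauge, has_gauge))
print(f"grid-scale RMSD: {grid_rmsd:.2f} mm/day, basin-mean RMSD: {basin_rmsd:.2f} mm/day")
```

With independent errors, the RMSD of the basin mean shrinks roughly with the square root of the number of averaged grids. Errors that are correlated across grids, such as a regional bias, do not average out in this way, which is why Bias does not benefit from aggregation as CC and RMSD do.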
Near-real-time spatial precipitation products with quantified uncertainty are extremely valuable for short-term streamflow forecasting, which can provide sufficient lead time for flood-related risk responses. This study shows that the performance of the near-real-time TMPA and IMERG products generally exhibits greater spatial heterogeneity. TMPA-RT shows limited performance over the CRB because of its high overestimation and poor agreement with gauge observations, whereas IMERG-E and IMERG-L demonstrate a significant improvement, as evidenced by both improved daily evaluation metrics and better agreement with gauge-based monthly precipitation. For the other northern basins at high latitudes (e.g., YeRB, HaiRB, and SLRB), TMPA-RT significantly overestimates precipitation, particularly in winter. However, benefiting from the combined use of DPR and GMI, which helps improve the accuracy of solid precipitation detection, the performance of IMERG-E and IMERG-L is clearly improved in winter. The SwRB, which lies across the southeast Tibetan Plateau and the Yun-Gui Plateau, features complex topography and a volatile climate. Over this basin, TMPA-RT demonstrates remarkable overestimation, whereas IMERG-E and IMERG-L show notable underestimation, especially in summer. A similar situation also occurs in the moist YaRB, PeRB, and SeRB. In the HuaiRB, which lies in the transition zone between the southern and northern climates, IMERG-E and IMERG-L show severe overestimation in winter (Figure 11g), which is not observed in TMPA-RT. Such overestimation can principally be ascribed to the oversensitivity of the sensors on GPM's CO satellite over this region. Overall, near-real-time TMPA-RT usually exhibits the worst performance, while IMERG-E and IMERG-L show performance close to or even better than that of post-real-time TMPA-V7, in spite of slightly higher Bias.

Implications of the Results
The rapid development of satellite observations and retrieval algorithms has provided unprecedented opportunities for the improvement of SPEs. The IMERG products supported by the state-of-the-art GPM mission have been updated four times (i.e., V03, V04, V05, and V06) since they were first released in early 2015. Even so, IMERG is still in its infancy, and TMPA, as the predecessor of IMERG, will continue to be generated until the IMERG products can fully replace the TMPA products. This study demonstrates a systematic evaluation framework to explore the continuity and differences between IMERG and TMPA products across Mainland China, contributing toward a better understanding of the errors in these SPEs. However, note that this evaluation is conducted at the pixel-to-point scale, which means that the average precipitation of a large area (>100 km²) is directly compared to observations at specific points. This spatial mismatch may bias the assessment results, especially in regions where precipitation has great spatial variability, such as mountainous areas and monsoon regions. Given that the spatial resolutions of the IMERG and TMPA products are 0.1° and 0.25°, respectively, the spatial mismatch is more pronounced for the TMPA products. In this context, it is perhaps to be expected that IMERG products perform better than TMPA products.
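To give a sense of the scale of this pixel-to-point mismatch, the area of a latitude-longitude grid cell can be estimated with the spherical formula A = R² Δλ (sin φ₂ − sin φ₁). The sketch below assumes a spherical Earth with a mean radius of 6371 km and evaluates cells centered at 30°N, a latitude chosen only for illustration; the function name `cell_area_km2` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def cell_area_km2(lat_center_deg, res_deg):
    """Area (km^2) of a res_deg x res_deg cell centered at lat_center_deg."""
    lat_south = math.radians(lat_center_deg - res_deg / 2.0)
    lat_north = math.radians(lat_center_deg + res_deg / 2.0)
    dlon = math.radians(res_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat_north) - math.sin(lat_south))

area_imerg = cell_area_km2(30.0, 0.1)   # IMERG grid cell
area_tmpa = cell_area_km2(30.0, 0.25)   # TMPA grid cell
print(f"0.1 deg cell:  {area_imerg:.0f} km^2")
print(f"0.25 deg cell: {area_tmpa:.0f} km^2")
print(f"area ratio:    {area_tmpa / area_imerg:.2f}")
```

At this latitude a 0.1° cell covers on the order of 100 km², and a 0.25° cell is about 6.25 times larger, so a single gauge represents a considerably smaller fraction of a TMPA pixel than of an IMERG pixel. Cell areas shrink toward higher latitudes because the meridians converge.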
To promote potential hydrological applications, a preliminary evaluation and comparison of these SPEs is performed over the major basins of Mainland China. Our results highlight their excellent performance over humid southeast China (e.g., YaRB, PeRB, and SeRB) and confirm their poorer performance over arid northwest China (e.g., CRB) and the high-latitude basins (e.g., YeRB, HaiRB, and SLRB), as reported by previous studies (J Su et al., 2018; Tang, Ma, et al., 2016; Wei et al., 2018), providing some guidance for potential hydrologic applications over these basins. However, according to Camici et al. (2018) and L Jiang and Bauer-Gottwein (2019), the relation between the quality of a precipitation product and its hydrological performance is not straightforward; that is, a better precipitation product does not necessarily guarantee a better discharge simulation. Qi et al. (2016) suggested that a good discharge simulation depends closely on a good combination of hydrological model and precipitation product. So, although satellite-based precipitation products are not as accurate as gauge-based products, they could still perform better in discharge simulations when appropriately combined with hydrological models. Therefore, the hydrological assessment of SPEs over given catchments is also a meaningful topic.
Over Mainland China, previous studies, such as Li et al. (2017) and Tang, Zeng, et al. (2016), have assessed discharge simulations using SPEs as forcing input. Most of them confirmed that SPE-forced hydrological simulations can hardly outperform or equal gauge-based hydrological simulations because of their seasonal and regional systematic biases and random errors (D Jiang & Wang, 2019), although their hydrologic utility may be acceptable. In addition, some studies found that recalibrating the model with SPEs as rainfall forcing could significantly increase the performance of the discharge simulation (S Jiang et al., 2018; Z Wang et al., 2017; Yuan et al., 2018), although the resulting model parameters may be unrealistic and thus limit the model's performance at the sub-basin scale (D Jiang & Wang, 2019; Maggioni & Massari, 2018). Moreover, the hydrological assessments of SPEs over Mainland China are usually conducted in a small number of catchments, and therefore it is not clear to what extent the results can be generalized (Beck et al., 2017). Furthermore, only fairly large catchments (usually larger than 10,000 km²) and complex distributed hydrological models (e.g., the Grid-based Xinanjiang model (S Jiang et al., 2018; Yuan et al., 2017), the Variable Infiltration Capacity model (J Su et al., 2019; Z Wang et al., 2017), and the Coupled Routing and Excess Storage model (Li et al., 2017; Tang, Zeng, et al., 2016)) have been employed in evaluating the hydrological utility of SPEs, leading to combined rainfall and model uncertainty that is not easily interpreted (Beck et al., 2017). In this context, it is of great value to evaluate and compare the hydrological utility of IMERG and TMPA products over different small watersheds with diverse landscapes, topography, and climatic conditions using a traditional lumped hydrological model (i.e., a simple rainfall-runoff model).
L Jiang and Bauer-Gottwein (2019) explored the feasibility of using SPEs to force a lumped Hydrologiska Byråns Vattenbalansavdelning (HBV) hydrological model over 300 headwater catchments of varying size and climate for a 2-year period (2016-2017). Their results indicated that IMERG products provide performance comparable to that of gauge-based precipitation, whereas post-real-time TMPA-V7 performs relatively poorly in terms of discharge simulation. However, the short study period, which cannot be representative of long-term catchment conditions, is a major limitation. Also, the results may differ when a different rainfall-runoff model is used. Hence, more effort is urgently needed to explore how SPEs perform as hydrological model forcing, particularly over ungauged or poorly gauged basins.

Conclusion
In this study, we focus on evaluating and comparing IMERG and TMPA products over Mainland China against a relatively dense gauge network from April 2014 to March 2018. For our research objectives, both their near-real-time and post-real-time products are employed. Subsequently, their quality is compared over the major basins of China to promote their wide hydrological application. The main conclusions are summarized as follows:
1. Compared to the gauge observations, all five SPEs capture the coarse-scale spatial distribution pattern of precipitation accumulation well. In the pixel-to-point comparison, post-real-time IMERG-F presents accuracy comparable to that of TMPA-V7, in spite of slightly higher overestimation (0.20 mm/day for IMERG-F versus 0.1 mm/day for TMPA-V7). The near-real-time IMERG-E and IMERG-L demonstrate poorer accuracy than the post-real-time products, but both do much better than TMPA-RT in terms of almost all evaluation metrics.
2. Owing to the enhanced sensitivity of sensors and increased sampling frequency, IMERG products have better accuracy in detecting real precipitation events but also contain more false precipitation. Component analysis in three different detection categories highlights the excellent performance of IMERG-F among the five SPEs. However, IMERG-F does not demonstrate significant improvement in the comparison of severe overestimation/underestimation.
3. All five SPEs present high accuracy in detecting heavier precipitation, in spite of notable underestimation. For light and moderate rain, IMERG products are more reliable, as evidenced by increased POD metrics. With lower Bias and RMSD, IMERG-F shows the best performance at almost all precipitation rates.
4. Regarding spatial variations, all five SPEs show better performance over south China but poor performance over northwest China. Owing to the GCA, which effectively alleviates overestimation/underestimation, the post-real-time SPEs usually exhibit low spatial heterogeneity over the whole of Mainland China.
5. All five SPEs catch the key variation features of the basin-averaged precipitation time series (except TMPA-RT over the CRB), indicating high potential hydrological utility. In terms of CC and RMSD, post-real-time IMERG-F demonstrates the best performance over all nine basins. Nevertheless, it also shows slightly higher overestimation than TMPA-V7. Although IMERG-E and IMERG-L exhibit poorer Bias, they demonstrate performance close to or even better than that of TMPA-V7 in terms of daily CC and RMSD.
6. In summary, in answer to the original scientific question, we confirm the significant improvement of IMERG, particularly for the near-real-time products. However, care should be taken when employing these SPEs (especially TMPA-RT) over the CRB and high-latitude catchments (e.g., YeRB, HaiRB, and SLRB). Hence, it is necessary to develop appropriate predictive error models to reduce their uncertainty. Besides, further studies are needed to evaluate and compare their hydrological utility over different catchments of varying size and climate.