A Review of Global Precipitation Data Sets: Data Sources, Estimation, and Intercomparisons
Abstract
In this paper, we present a comprehensive review of the data sources and estimation methods of 30 currently available global precipitation data sets, including gauge-based, satellite-related, and reanalysis data sets. We analyzed the discrepancies between the data sets from daily to annual timescales and found large differences in both the magnitude and the variability of precipitation estimates. The magnitude of annual precipitation estimates over global land deviated by as much as 300 mm/yr among the products. Reanalysis data sets had a larger degree of variability than the other types of data sets. The degree of variability in precipitation estimates also varied by region. Large differences in annual and seasonal estimates were found in tropical oceans, complex mountain areas, northern Africa, and some high-latitude regions. Overall, the variability associated with extreme precipitation estimates was slightly greater at lower latitudes than at higher latitudes. The reliability of precipitation data sets is mainly limited by the number and spatial coverage of surface stations, the satellite algorithms, and the data assimilation models. The inconsistencies described limit the capability of the products for climate monitoring, attribution, and model validation.
Key Points
- We conduct a comprehensive review of precipitation data sets
- We evaluate the differences between data sets at different spatial and temporal scales
- We explore the opportunities and challenges in generating reliable precipitation estimates
1 Introduction
Ongoing climate change is unequivocal and is most likely caused by increasing concentrations of atmospheric carbon dioxide and other greenhouse gases and by anthropogenic activities (Flato et al., 2013; Mitchell & Jones, 2005). Climate change has substantial impacts on physical, biological, and human-managed systems. Observational data underpin our understanding of global and regional climate change (Feng et al., 2004) and are essential for research into climate variability and change and for identification of the related impacts.
Precipitation is a crucial component of the water cycle (Eltahir & Bras, 1996; Trenberth et al., 2003) and is the most important and active variable associated with atmospheric circulation in weather and climate studies (Kidd & Huffman, 2011). Accurate and reliable precipitation records are crucial not only for the study of climate trends and variability but also for the management of water resources and weather, climate, and hydrological forecasting (Jiang et al., 2012; Larson & Peck, 1974; Liu et al., 2017; Yilmaz et al., 2005). Gauge observations are typically used to measure precipitation directly at the Earth's surface (Kidd, 2001). Various large-scale climate data sets at different spatiotemporal scales have been developed from station (in situ) observations. For example, the Global Historical Climatology Network is an integrated database with about 31,000 stations and observations covering the entire twentieth century. However, gauge measurements have several drawbacks, such as incomplete areal coverage and deficiencies over most oceanic and sparsely populated areas (Kidd et al., 2017; Rana et al., 2015; Xie & Arkin, 1996). With advanced infrared (IR) and microwave (MW) instruments, satellite observations make up for these deficiencies by providing coverage that is more spatially homogeneous and temporally complete for vast areas of the globe (Kidd & Levizzani, 2011; Xie et al., 2003). Some satellite-derived data sets are now operationally available, including the Tropical Rainfall Measuring Mission (TRMM) (Huffman et al., 2007), the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) (Ashouri et al., 2015), and the Climate Prediction Center (CPC) morphing technique (CMORPH) (Joyce et al., 2004) products. 
Furthermore, products merging satellite and gauge measurements have been designed to improve the accuracy of climate-variable measurements; this approach is expected to maximize the relative benefits of each data type (Huffman et al., 1995; Xie et al., 2003). For instance, the Global Precipitation Climatology Project (GPCP) monthly precipitation analysis merges gauge observations with low-orbit-satellite MW data and geosynchronous-orbit-satellite IR data and is one of the most popular products used in climate studies (Adler et al., 2003).
These different types of precipitation data products have proved useful across a wide range of research fields. Global and regional climate-change trends have been quantified and made evident on the basis of multiple different data sets. Climatology in different regions and changes in climate means and extremes have also been investigated via these data products (Alexander et al., 2006; Kunkel et al., 2015; Rajah et al., 2014; Sun, Kong, et al., 2014). Moreover, droughts and floods can be monitored by high-resolution satellite-based products (AghaKouchak et al., 2015; AghaKouchak & Nakhjiri, 2012; Wu et al., 2014; Yilmaz et al., 2010). Precipitation measurements are arguably the most vital meteorological input for forcing and calibrating hydrological and ecological models. Although an increasing number of climate data sets with higher spatial and temporal resolution have already been constructed and applied in a substantial number of studies, the different data sets are not completely consistent (Tapiador et al., 2017). Differences exist among the so-called observational data sets owing to deficiencies in the data sources and to the way each product is generated. A range of studies comparing climate data sets has appeared in recent years (Derin & Yilmaz, 2014; Donat et al., 2014; Gehne et al., 2016; Jiang et al., 2012; Kidd et al., 2012; Miao et al., 2015; Sun, Kong, et al., 2014; Wang & Zeng, 2015). However, most of these studies focused on the applications of, or comparisons between, just some of the data sets. Few studies provide a comprehensive overview of the existing data products on a global scale. Moreover, according to the World Meteorological Organization (WMO), more than 30 years of historical data are needed for climate studies. Some data sets, especially the gauge-based and reanalysis data sets, generally provide long-term precipitation records that are suitable for climate studies. 
The satellite-related data sets are limited by their short records, but they still provide valuable information for monitoring weather processes, drought, and hydrology. We aim to provide a full review of the different data sets so that readers in different research fields can more easily choose the data set that suits them. This study therefore reviews the existing precipitation data sets generated from different data sources and quantifies the discrepancies between these data sets over multiple time scales.
2 Data Set Sources and Estimation Procedures
2.1 Gauge-Based Estimates
2.1.1 Gauging Instruments
Precipitation shows high spatial and temporal variability. The analysis of climate change and variability commonly depends on surface gauge observations. Rain gauges, disdrometers, and radar are the tools usually used for measuring precipitation (Figure 1). Rain gauges are the most common tool for directly assessing point precipitation at the surface, measuring the depth of rainfall as it accumulates over time. There are several types of rain gauge, such as accumulation gauges, tipping-bucket gauges, weighing gauges, and optical gauges; these gauges all have strengths and limitations (Strangeways, 2006; Tapiador et al., 2012). Accumulation gauges are simple collecting vessels that provide direct measurement of rainfall accumulation via the water level in the collection cylinder at the gauge location. The accumulated rainfall measurement is the sum of the raindrop volumes collected. Tipping-bucket rain gauges consist of a funnel that collects and channels precipitation into a small seesaw-like container. Once the bucket is filled, the container tips and empties the collected water, producing a signal in an inbuilt electrical circuit. However, clock synchronization and the mechanical accuracy of bucket filling and emptying are potential problems (Michaelides et al., 2009). Tipping-bucket gauges may remain partially filled at the end of a rainfall event and tip only when a new period of rain starts, resulting in a built-in uncertainty of one bucket tip and possible inaccuracy in the recorded quantity of low-intensity rain. Optical rain gauges are based on visibility instruments and detect falling raindrops by their effect on a horizontal beam of light. Although these gauges sense the rate of precipitation rather than the amount, the total amount of rainfall can be derived (Strangeways, 2010). Weighing precipitation gauges incorporate a storage bin for weighing the collected water and recording the mass. 
Compared with tipping-bucket gauges, the advantage of this type of gauge is that it can measure all forms of precipitation, including rain, hail, and snow. All of these instruments produce rainfall measurements that are highly correlated with one another. Nevertheless, gauges suffer from environmental problems and other sources of error, such as wind, evaporation, wetting, splashing, site location, instrument error, spatiotemporal variation in drop-size distribution, and frozen versus liquid precipitation (Michelson, 2004; Peterson et al., 1998).
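The one-tip uncertainty of tipping-bucket gauges described above can be illustrated with a minimal sketch. The bucket capacity of 0.2 mm, the rainfall increments, and the function name `tipping_bucket_record` are assumptions chosen for illustration and do not describe any particular instrument.

```python
def tipping_bucket_record(rain_mm, bucket_mm=0.2):
    """Simulate a tipping-bucket gauge (hypothetical helper).

    rain_mm   : rainfall increments per time step, in mm
    bucket_mm : assumed bucket capacity in mm (0.2 mm is an illustrative value)
    Returns (tips per step, recorded total in mm, water left in the bucket in mm).
    """
    # Work in integer micrometres to avoid floating-point drift in the bookkeeping.
    bucket_um = round(bucket_mm * 1000)
    stored_um = 0
    tips = []
    for inc in rain_mm:
        stored_um += round(inc * 1000)
        n_tips, stored_um = divmod(stored_um, bucket_um)  # full buckets tip, rest stays
        tips.append(n_tips)
    recorded_mm = sum(tips) * bucket_um / 1000
    residual_mm = stored_um / 1000
    return tips, recorded_mm, residual_mm


# A light event totalling 0.65 mm: the gauge records three tips (0.6 mm)
# and the remaining 0.05 mm stays in the bucket until the next event.
tips, recorded, residual = tipping_bucket_record([0.05, 0.05, 0.05, 0.1, 0.3, 0.1])
print(tips, recorded, residual)
```

The unrecorded residual is always smaller than one bucket volume, which is why the relative error is largest for low-intensity rain.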

More recently, technologically sophisticated devices such as radar and disdrometers have been used to enhance our knowledge about the composition of precipitation and the potential physical processes underlying its formation. Unlike rain gauges, disdrometers can detect individual raindrops and measure their size. Knowledge of the drop-size distribution is essential for understanding precipitation processes, estimating rainfall, and improving microphysics parameterizations in numerical cloud models. Disdrometers can be categorized into two classes: impact disdrometers and imaging (line or area scan) disdrometers. Weather radar is another alternative to rain gauges and provides real-time measurement with high spatial and temporal resolution. Radar can also capture the three-dimensional structure of precipitation. However, lack of accessibility and funding has limited the development of a global radar network (Habib et al., 2012; Sauvageot, 1994).
2.1.2 Gauge-Based Precipitation Products
Gauge-based observations of precipitation tend to be collected by national weather services. For climate research, it is necessary to assemble all of the data from different nations into one integrated global data set. The World Meteorological Organization (WMO), which traces its origins to the International Meteorological Organization founded in 1873, is an intergovernmental organization with 191 member states and territories (http://www.wmo.int). WMO promotes the development of observation networks for the fields of climatology, hydrology, and geophysics, as well as the exchange, processing, and standardization of related data. It also provides assistance for technology transfer, training, and research. For instance, the WMO Global Telecommunication System (GTS) is a complex worldwide communication system for the exchange of meteorological data that forms the basis for many applications. The Global Climate Observing System (GCOS), founded in 1992, aims to meet the demand for climate-related observations and to make the data freely available to all nations (Spence & Townshend, 1995). The GCOS is essentially an aggregate of all climate-related activity in the observing systems from which it is built, from the global to the local scale (Houghton et al., 2012). The total number of rain gauges operated worldwide is estimated to be between 150,000 and 250,000 (Groisman & Legates, 1995; New et al., 2001; Strangeways, 2006). The wide range in estimates is due to the different criteria used to count gauges. Although many gauges exist, not all have operated continuously or concurrently (Kidd et al., 2017).
Owing to the irregular distribution of observation stations, gridding of data is required for many climatic and related applications. Several gridded precipitation data sets based entirely on gauge information have been constructed and are in wide use (summarized in Table 1). The Climate Research Unit (CRU) data set comprises a suite of climate variables, including precipitation. This data set is popular because of its relatively long history and its fine spatial resolution. The principal sources used for construction of the CRU monthly precipitation data set were obtained through the auspices of the national meteorological agencies (NMAs), the WMO, the CRU, the Centro Internacional de Agricultura Tropical, the Food and Agriculture Organization (FAO), and others.
Data set | Resolution | Frequency | Coverage | Period | Source | Reference |
---|---|---|---|---|---|---|
CRU | 0.5° × 0.5° | Monthly | Global land | 1901–2015 | The CRU of the University of East Anglia | (Harris et al., 2014; New et al., 2000) |
GHCN-M | 5° × 5° | Monthly | Global land | 1900–present | National Climatic Data Center | (Peterson & Vose, 1997) |
GPCC | 0.5° × 0.5°, 1.0° × 1.0°, 2.5° × 2.5° | Monthly | Global land | 1901–2013 | GPCC | (Rudolf et al., 2009) |
GPCC-daily | 1.0° × 1.0° | Daily | Global land | 1988–2013 | GPCC | (Schamm et al., 2014) |
PRECL | 0.5° × 0.5°, 1.0° × 1.0°, 2.5° × 2.5° | Monthly | Global land | 1948–2012.1 (0.5°); 1948–present | NCEP/NOAA | (Chen et al., 2002) |
UDEL | 0.5° × 0.5° | Monthly | Global land | 1900–2014 | University of Delaware | (Willmott & Matsuura, 1995) |
CPC-Global | 0.5° × 0.5° | Daily | Global land | 1979–2005 | CPC | (Xie et al., 2010) |
The Global Precipitation Climatology Centre (GPCC) has established a unique capability to collect, quality-control, and analyze rain gauge data from across the globe. NMAs provide the primary data for the GPCC, with 158 countries and 31 regional suppliers providing the majority of the gauge data in the GPCC database. Furthermore, the GPCC receives daily surface synoptic observations and monthly climate messages from the WMO GTS. The global data collections of the CRU (11,800 stations), FAO (13,500 stations), and GHCN at the National Centers for Environmental Information (34,800 stations from GHCN2 and GHCN daily), as well as data collections from international regional projects, are all integrated into the GPCC. The resulting database covers more than 200 years and uses data acquired from more than 85,000 stations worldwide. The GPCC requires a minimum of 10 uninterrupted years at each station as a screening criterion for the cadre data set for the background climatology. As a result, 67,298 stations are used for the GPCC Climatology in the best-covered month (June) and 67,149 stations in the worst-covered month (December). A total of 65,335 stations pass the 10 year constraint for every month of the year and are therefore used for the annual climatology. Four globally gridded monthly precipitation products have been constructed from this database (Becker et al., 2013): ClimatologyV2011, GPCC Full Data version 7, Monitoring Product V4, and GPCC First Guess Products. The GPCC Full Data product is the most commonly used product and covers the period from 1901 to the present. The GPCC Full Data product is described in Table 1; the distribution of gauges as of July 2005 is plotted in Figure 2b.
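The effect of the 10 year screening criterion on monthly station counts can be illustrated with a minimal sketch. It assumes that "uninterrupted" means consecutive years with a valid value for a given calendar month, which is one reading of the criterion and not the GPCC's documented procedure; the array layout and the names `longest_run` and `months_passing` are hypothetical.

```python
import numpy as np


def longest_run(valid):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for v in valid:
        run = run + 1 if v else 0
        best = max(best, run)
    return best


def months_passing(series, min_years=10):
    """Flag the calendar months for which one station passes the screening.

    series : array of shape (n_years, 12) with monthly totals, NaN = missing
    Returns a boolean array of length 12 (assumed interpretation: at least
    `min_years` consecutive years of valid data for that calendar month).
    """
    valid = ~np.isnan(series)
    return np.array([longest_run(valid[:, m]) >= min_years for m in range(12)])


# A station with 12 years of data and a single missing December.
series = np.ones((12, 12))
series[5, 11] = np.nan
passing = months_passing(series)
print(passing)        # December fails, the other months pass
print(passing.all())  # so the station is excluded from an all-month climatology
```

Under this reading, a station can contribute to some monthly climatologies but not to the annual one, which is consistent with the smaller station count quoted for the annual climatology.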

For the University of Delaware (UDEL) product, station data are gathered from various sources: a recent version of the GHCN data set (GHCN2); a version of the Daily GHCN (GHCN-Daily); an Atmospheric Environment Service/Environment Canada archive; data from the Hydrometeorological Institute in St. Petersburg, Russia; Greenland Climate Network data; daily records from the Global Surface Summary of the Day; the National Center for Atmospheric Research (NCAR) daily India data; Nicholson's archive of African precipitation data; Webber and Willmott's South American monthly precipitation station records; and the Automatic Weather Station Project's Greenland station records. Part of the background climatology is taken from Legates and Willmott's (1990) unadjusted archive.
To meet the need for a high-quality, observation-based, large-scale precipitation data set covering both land and ocean for the period before the 1970s, the U.S. CPC has constructed a monthly precipitation data set beginning in 1948. Termed the precipitation reconstruction (PREC), this global analysis is constructed by interpolating gauge observations over land (PRECL) and by empirical-orthogonal-function reconstruction of historical observations over ocean (Chen et al., 2002). PRECL data include measurements from over 17,000 stations in the GHCN2 and the Climate Anomaly Monitoring System data sets.
The CPC Gauge-Based Analysis of Global Daily Precipitation (CPC-Global) is the first product from the CPC Unified Precipitation Project in progress at the National Oceanic and Atmospheric Administration (NOAA). This project is devoted to combining all information sources available at the CPC and using the optimal-interpolation objective analysis technique to create a set of unified precipitation products that have consistent quantity and improved quality. Gauge reports from 30,000 stations form the CPC-Global product, including reports from the GTS, Cooperative Observer Network (COOP), and other NMAs.
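The gridding of irregularly distributed gauge reports can be sketched as follows. The products above rely on more elaborate schemes such as optimal interpolation; the sketch below substitutes plain inverse-distance weighting as a simpler stand-in, treats longitude and latitude as planar coordinates, and uses hypothetical station values. The function name `idw_grid` and its parameters are assumptions made for illustration.

```python
import numpy as np


def idw_grid(st_lon, st_lat, st_val, grid_lon, grid_lat, power=2.0, eps=1e-6):
    """Interpolate station values to a regular grid by inverse-distance weighting.

    Distances are computed in degrees on a plane (a simplification that
    ignores the Earth's curvature). `eps` avoids division by zero where a
    grid point coincides with a station.
    Returns an array of shape (len(grid_lat), len(grid_lon)).
    """
    st_lon = np.asarray(st_lon, dtype=float)
    st_lat = np.asarray(st_lat, dtype=float)
    st_val = np.asarray(st_val, dtype=float)
    glon, glat = np.meshgrid(grid_lon, grid_lat)
    # Distance from every grid point to every station: shape (ny, nx, n_stations).
    dist = np.hypot(glon[..., None] - st_lon, glat[..., None] - st_lat)
    weights = 1.0 / (dist + eps) ** power
    return (weights * st_val).sum(axis=-1) / weights.sum(axis=-1)


# Three hypothetical gauges (monthly totals in mm) gridded at 0.5° spacing.
grid = idw_grid(
    st_lon=[0.0, 1.0, 0.0],
    st_lat=[0.0, 0.0, 1.0],
    st_val=[10.0, 20.0, 30.0],
    grid_lon=np.array([0.0, 0.5, 1.0]),
    grid_lat=np.array([0.0, 0.5, 1.0]),
)
print(grid.round(1))
```

Grid points that coincide with a gauge reproduce its value almost exactly, whereas points far from any gauge tend toward the average of the surrounding stations, which is one reason sparse networks yield smooth and uncertain analyses.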
These databases depend on historical precipitation observations from ground stations, and the original data are sometimes identical in the different databases. Thus, these data sets have always been in general agreement across spatiotemporal scales. Moreover, the number of gauges in use is growing smaller (Figure 2). For instance, in the GPCC Full Data Reanalysis Version 7.0 data set, there were around 10,900 usable stations across the world in 1901. The number increased steadily to a maximum of about 49,470 in July 1970, and then fell thereafter to 30,000 in 2005 and subsequently to only about 10,000 by 2012 (Figure 2e). This reduction in ground-based measurements is wide-ranging and has occurred for all climatic data, not just precipitation. This reduction may be due to increasing operational and staffing costs associated with ground-based data collection, restrictions on the release of the data by NMAs, migration and abandonment of sites, and economic and political factors (Strangeways, 2006). The general decline in the use of gauges is a serious concern and may reduce our ability to follow changes in precipitation in the future (Strangeways, 2006). Timing of gauge observations is also an important issue because it differs among the various national networks. This sometimes creates untagged multiday accumulations, with clear implications for comparisons with other sources of data (e.g., Viney & Bates, 2004).
2.2 Satellite Estimates
Satellite systems are invaluable tools for global measurements of atmospheric parameters at regular intervals. In April 1960, the first Television and IR Observation Satellite (TIROS) was launched, producing images of clouds that could be compared with concurrent meteorological observations (Kidd, 2001). Since then, the number of satellite sensors for observing the atmosphere has grown markedly. Sensors onboard satellites are currently the only instruments that can provide global, homogeneous precipitation measurements. The sensors can be classified into three categories: visible/IR (VIS/IR) sensors on geostationary (GEO) and low Earth orbit (LEO) satellites, passive MW (PMW) sensors on LEO satellites, and active MW sensors on LEO satellites (Michaelides et al., 2009; Prigent, 2010). Corresponding methods used to derive precipitation have been developed, including VIS/IR-based methods, active and passive MW techniques, and merged VIS/IR and MW approaches (Kidd & Levizzani, 2011).
The principle underlying VIS/IR methods is that cold and bright clouds are related to convection; cold cloud tops suggest greater vertical development in the cloud and therefore more rain. The link between IR cloud top temperature and the probability and intensity of rainfall at the ground can be used to estimate precipitation from IR readings. VIS/IR satellite observations have the advantage of providing wide coverage over tropical regions with adequate temporal resolution and fine spatial resolution. A number of different VIS/IR algorithms are widely used, such as the Griffith-Woodley algorithm (Griffith et al., 1978), the GEO Operational Environmental Satellites (GOES) Precipitation Index (GPI) (Arkin & Meisner, 1987), the Convective/Stratiform technique (Adler & Negri, 1988), the GOES multispectral rainfall algorithm (Ba & Gruber, 2001), and so on. It should be noted, however, that the relationship between cloud top temperature and precipitation is indirect and not all clouds form precipitation.
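The link between cold cloud tops and rainfall can be made concrete with a minimal sketch of a GPI-style estimate. The GPI is commonly described as assigning a fixed rain rate of 3 mm/h to the fraction of pixels colder than 235 K; the sketch takes these two constants as given, and the function name `gpi_rainfall` and the sample brightness temperatures are hypothetical.

```python
import numpy as np

# Constants commonly quoted for the GPI (Arkin & Meisner, 1987).
GPI_RATE_MM_PER_H = 3.0
GPI_THRESHOLD_K = 235.0


def gpi_rainfall(tb_kelvin, hours):
    """GPI-style rainfall estimate (mm) for one grid box.

    tb_kelvin : IR brightness temperatures of the pixels in the box
    hours     : length of the accumulation period in hours
    """
    cold_fraction = np.mean(np.asarray(tb_kelvin) < GPI_THRESHOLD_K)
    return GPI_RATE_MM_PER_H * cold_fraction * hours


# Two of the four pixels are colder than 235 K,
# so the daily estimate is 3 mm/h x 0.5 x 24 h = 36 mm.
tb = np.array([[220.0, 240.0], [230.0, 260.0]])
print(gpi_rainfall(tb, hours=24))
```

Because the estimate depends only on cloud-top temperature, cold but nonprecipitating cirrus is counted as rain and warm-cloud rain is missed, which illustrates the indirectness noted above.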
PMW radiometry is a more direct method of measuring precipitation because PMW is sensitive to precipitation-sized particles. Unlike VIS/IR observations, PMW observations from satellites can sense through clouds. PMW-based techniques for estimating precipitation have advanced significantly since the first Special Sensor Microwave/Imager (SSM/I) was launched in 1987 (Hollinger et al., 1987). In late 1997, the launch of the TRMM included the PMW TRMM MW Imager (TMI) (Kummerow et al., 1998), which resulted in great progress in depicting and analyzing tropical precipitation. The first Advanced MW Sounding Unit (AMSU) onboard NOAA 15 was launched in July 1998. It provided information at higher frequencies between 23.8 and 190 GHz and was useful for deriving cloud and precipitation products. These products, combined with those derived from the U.S. Defense Meteorological Satellite Program (DMSP) SSM/I, provide more complete global coverage in space and time and are useful for weather and climate analyses (Goodrum et al., 1999). In 2002, the Advanced MW Scanning Radiometer for the Earth Observing System (AMSR-E) was deployed as a multichannel PMW radiometer to measure water-related geophysical parameters (Kawanishi et al., 2003). The spatial resolution of AMSR-E data is double that of Scanning Multichannel MW Radiometer and SSM/I data. Early PMW-based retrieval methods were simple regressions between surface rain rates and the associated simulated or measured brightness temperature; such methods are still used in long-term climatologies. Recently, however, other approaches have been developed, including probabilistic, physical, and iterative algorithms, and these have been widely applied to rainfall estimation (Bauer et al., 2001; Petty, 1994; Pierdicca et al., 1996; Wentz & Spencer, 1998). 
The Goddard PROFiling scheme is one of the most popular retrieval methods, from which instantaneous rainfall and the vertical structure of rainfall can be obtained (Kummerow et al., 2001). Although PMW-based methods can make realistic instantaneous rainfall estimates, PMW sensors are at present only onboard LEO satellites, leading to relatively poor temporal sampling (Hong et al., 2012) compared with the rapid temporal update cycle (30 min or less) of the GEO-based IR instruments. The LEO satellites that currently carry PMW sensors allow near-global coverage every 3 h or less. Data collected from VIS/IR and PMW sensors are often merged to increase the accuracy, coverage, and resolution of precipitation analyses. For example, Sorooshian et al. (2002) developed combined PMW and IR algorithms, and Tapiador et al. (2004) used a neural network-based estimation approach that drew on both PMW and IR satellite measurements. CMORPH uses motion vectors derived from GEO satellite IR imagery (sampled at 30 min intervals) to produce high-quality precipitation estimates from PMW data (Joyce et al., 2004). The use of active MW observations from satellites for precipitation began with the launch of the first spaceborne precipitation radar in the TRMM mission in 1997, which made it possible to capture the three-dimensional structure of rain (Kummerow et al., 2000). The Global Precipitation Measurement (GPM) mission was designed to offer new rainfall and snowfall observations from space (Figure 3). These data underpin the next generation of global precipitation products, characterized by more accurate instantaneous precipitation estimates and unified precipitation retrievals from a constellation of MW radiometers (Hou et al., 2014).

A number of satellite precipitation data sets are currently available (summarized in Table 2), including CMORPH (Joyce et al., 2004), the TRMM Multi-satellite Precipitation Analysis (TMPA) (Huffman et al., 2007), and PERSIANN (Hong et al., 2004; Hsu et al., 1997; Sorooshian et al., 2000). Maggioni et al. (2016) presented a consolidated and detailed review of the algorithms used in satellite precipitation data sets, including a comparison of the algorithms over six continents and oceans.
Data set | Adjusted | Resolution | Frequency | Coverage | Period | Data source | Algorithm | Reference |
---|---|---|---|---|---|---|---|---|
GPCP | GPCC, GHCN | 2.5° | Monthly | Global | 1979–present | GPI, OPI, SSM/I scattering, SSM/I emission, TOVS | | (Adler et al., 2003) |
GPCP 1dd | GPCC, GHCN | 1.0° | Daily | Global | 1996–present | SSM/I-TMPI, TOVS | | (Huffman & Bolvin, 2013) |
GPCP_PEN_v2.2 | GPCC, GHCN | 2.5° | 5-daily | Global | 1979–2014 | OPI, SSM/I, GPI, MSU | | (Xie et al., 2003) |
CMAP | GPCC, GHCN | 2.5° | Monthly | Global | 1979–present | GPI, OPI, SSM/I scattering, SSM/I emission, MSU, NCEP–NCAR | | (Xie et al., 2003; Xie & Arkin, 1997) |
CPC-Global | GTS, COOP, NMAs | 0.5° | Daily | Global land | 2006–present | GTS, COOP, NMAs | | (Xie et al., 2010) |
TRMM 3B43 | GPCC | 0.25° | Monthly | 50°S–50°N | 1998–present | TMI, TRMM Combined Instrument, SSM/I, SSMIS, AMSR-E, AMSU-B, MHS, and GEO IR | Probability Matching | (Huffman et al., 2007) |
TRMM 3B42 | X | 0.25° | 3 h/Daily | 50°S–50°N | 1998–present | TMI, TRMM Combined Instrument, SSM/I, SSMIS, AMSR-E, AMSU-B, MHS, and GEO IR | Probability Matching | (Huffman et al., 2007) |
GSMaP | X | 0.1° | 1 h/Daily | 60°S–60°N | 2002–2012 | TMI, AMSR-E, AMSR-E, SSM/I, multifunctional transport satellites (MTSAT), Meteosat-7/8, GOES 11/12 | Kalman filter model | (Ushio et al., 2009) |
PERSIANN-CCS | X | 0.04° | 30 min/3, 6 h | 60°S–60°N | 2003–present | Meteosat, GOES, GMS, SSM/I, polar/near polar precipitation radar, TMI, AMSR | Artificial Neural Networks | (Sorooshian et al., 2000) |
PERSIANN-CDR | GPCP | 0.25° | 3, 6 h/Daily | 60°S–60°N | 1983–present | GOES 8, GOES 10, GMS-5, Meteosat-6, and Meteosat-7, TRMM, NOAA 15, 16, 17, DMSP F13, F14, F15 | Artificial Neural Networks | (Ashouri et al., 2015) |
CMORPH | X | 0.25°/8 km | 30 min/3 h/Daily | 60°S–60°N | 2002–present | TMI, SSM/I, AMSR-E, AMSU-B, Meteosat, GOES, MTSAT | Propagation & Morphing | (Joyce et al., 2004) |
GPM | | 0.1° | 30 min/3 h/Daily | 60°S–60°N | 2015–present | GMI, AMSR-2, SSMIS, MADRAS, MHS, Advanced Technology Microwave Sounder | IMERG | (Hou et al., 2008, 2014) |
MSWEP | CPC, GPCC | 0.1°/0.5° | 3 h/Daily | Global | 1979–present | CPC, GPCC, CMORPH, GSMaP-MVK, TMPA, ERA-Interim, JRA-55 | | (Beck et al., 2017) |
CMORPH has half-hourly analyses at a grid resolution of 8 km, depends on MW retrievals exclusively, and only uses IR data to propagate MW-derived precipitation features during times when updated PMW data are unavailable. Time-weighted linear interpolation is used to modify the shape and intensity of precipitation features during the times between PMW sensor scans to ensure precipitation estimates are temporally and spatially complete. CMORPH retrieves PMW precipitation estimates from the NOAA polar-orbiting operational meteorological satellites (including NOAA 15, 16, and 18); DMSP 13, 14, and 15; and NASA's Aqua and TRMM spacecraft. IR images over different time intervals are provided by the GOES 8, GOES 10, Meteosat-5, Meteosat-7, and GEO Meteorological Satellite-5 (GMS-5) satellites.
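The time-weighted interpolation between PMW scans can be illustrated with a minimal one-dimensional sketch. It is not the CMORPH implementation: the IR-derived motion vectors are replaced by an assumed constant shift of one pixel per half-hour step, `np.roll` wraps features around the array edge, and the function name `morph` and the sample fields are hypothetical.

```python
import numpy as np


def morph(prev_scan, next_scan, shift_per_step, n_steps, step):
    """Estimate the rain field at an intermediate time between two PMW scans.

    prev_scan, next_scan : 1-D rain-rate fields at the earlier and later scan
    shift_per_step       : assumed feature displacement (pixels) per time step
    n_steps              : number of time steps between the two scans
    step                 : intermediate step, from 0 (earlier) to n_steps (later)
    """
    # Propagate the earlier scan forward and the later scan backward in time.
    forward = np.roll(prev_scan, shift_per_step * step)
    backward = np.roll(next_scan, -shift_per_step * (n_steps - step))
    # Weight each propagated field by its temporal proximity to the target time.
    w_next = step / n_steps
    return (1.0 - w_next) * forward + w_next * backward


# A feature moves three pixels and weakens from 4 to 2 mm/h between scans.
prev_scan = np.array([0.0, 4.0, 0.0, 0.0, 0.0, 0.0])
next_scan = np.array([0.0, 0.0, 0.0, 0.0, 2.0, 0.0])
for step in range(4):
    print(step, morph(prev_scan, next_scan, 1, 3, step).round(2))
```

In this sketch the feature is displaced step by step while its intensity changes linearly from the earlier to the later scan, so the estimates at the two scan times reproduce the PMW fields themselves.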
The Global PERSIANN Cloud Classification System (PERSIANN-CCS) estimates rainfall distribution at a finer scale (0.04° and 30 min) from IR brightness temperature data from GEO satellites and uses PMW measurements from LEO satellites to update its parameters. The variable-threshold cloud segmentation algorithm extracts informative features from cloud patches and then classifies these patches into different groups on the basis of the similarity of selected features. Rainfall mapping for each classified cloud cluster is achieved by using histogram matching and exponential regression to fit curves to the plots of pixel brightness temperature versus rainfall rate (Hong et al., 2007). The PERSIANN Climate Data Record (PERSIANN-CDR) provides longer (1983 to present) and finer (0.25°) daily precipitation estimates using the PERSIANN algorithm on GridSat-B1 IR satellite data. The artificial neural network is trained with stage IV hourly precipitation data from the National Centers for Environmental Prediction (NCEP). The high-resolution PERSIANN estimates are then adjusted by GPCP data at a resolution of 2.5° for bias reduction (Ashouri et al., 2015).
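The histogram-matching idea used to relate brightness temperature to rain rate can be sketched as follows. This is a generic quantile-matching illustration and not the PERSIANN-CCS code: it assumes one pooled sample of IR brightness temperatures and one of coincident rain rates, pairs the coldest temperatures with the heaviest rain, and uses the hypothetical function name `probability_match`.

```python
import numpy as np


def probability_match(tb_sample, rain_sample, tb_new, n_quantiles=101):
    """Map brightness temperature to rain rate by matching distributions.

    The q-th quantile of brightness temperature is paired with the (1 - q)-th
    quantile of rain rate, so that colder cloud tops map to heavier rain.
    """
    q = np.linspace(0.0, 1.0, n_quantiles)
    tb_q = np.quantile(tb_sample, q)            # increasing temperature
    rain_q = np.quantile(rain_sample, 1.0 - q)  # decreasing rain rate
    return np.interp(tb_new, tb_q, rain_q)


# Hypothetical training samples: brightness temperature (K) and rain rate (mm/h).
tb_sample = np.array([200.0, 210.0, 220.0, 230.0, 240.0])
rain_sample = np.array([0.0, 0.0, 1.0, 4.0, 10.0])
print(probability_match(tb_sample, rain_sample, np.array([200.0, 215.0, 240.0])))
```

The resulting curve is monotonic by construction; fitting a smooth function (for example an exponential) to such a curve for each cloud class is a separate step that is not shown here.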
The primary rainfall sensors on the TRMM spacecraft include the Precipitation Radar, the TMI, and the VIS/IR Radiometer. A suite of precipitation products has been produced based on the data from these sensors, available at three levels. Level 2 products include 2A25, 2A23, 2A12, and 2B31. TRMM rainfall products that have a uniform space and time grid are level 3 products and include 3A25, 3A11, 3A12, 3B21, 3B42, and 3B43. TRMM 3B42 and 3B43 are the most commonly used products and combine precipitation estimates from multiple satellites (Liu et al., 2012). TRMM 3B43 combines the TRMM 3B42 data set with the GPCC rain gauge analysis. The TMPA algorithm derives precipitation by combining high-quality PMW observations and IR data from GEO satellites. The MW precipitation estimates are calibrated and combined from different sources, including the TMI onboard the TRMM spacecraft, the SSM/I on DMSP satellites, the AMSR-E on the Aqua spacecraft, AMSU-B on the NOAA satellite series, the MW Humidity Sounders (MHS) on later NOAA-series satellites and the European Operational Meteorological satellite (Huffman et al., 2007).
The Global Satellite Mapping of Precipitation (GSMaP) project was sponsored by the Japan Science and Technology Agency during the period of 2002–2007. It aims to produce high-resolution global precipitation maps and develop precise MW radiometer algorithms. This precipitation product is based on MW radiometer data from TMI, AMSR-E, SSM/I (F13, F14, and F15), AMSU-B (N15, N16, N17, and N18), and IR data merged from all available GEO satellites (GOES 8/10, Meteosat-7/5, and GMS) provided by NCEP/CPC. Surface precipitation rates are retrieved according to Aonashi et al. (1996). High-resolution (0.1°/1 h) precipitation maps are created with a morphing technique using the IR cloud motion vector and Kalman filtering (Ushio & Kachi, 2010). GSMaP_MVK refers to the Kalman filter-based system; a near-real-time system named GSMaP_MVK_RT contains the propagation process forward in time.
Rain gauges provide relatively accurate and trusted measurements of precipitation at single points but are unavailable over many sparsely populated and oceanic areas and can be affected by sampling errors. Satellite observations provide precipitation information with homogeneous spatial coverage but contain nonnegligible random errors and biases owing to the indirect nature of the relationship between the observations and precipitation, inadequate sampling, and deficiencies in the algorithms. Many attempts have been made to merge different sources of information to overcome these problems while tapping into the individual advantages of the different methods, to obtain optimal precipitation analyses with regular gridded fields (Figure 4). The CPC Merged Analysis of Precipitation (CMAP) (Xie & Arkin, 1997) and GPCP (Adler et al., 2003) are the most widely recognized and used merged data sets. The GPCP precipitation product was first released in 1997 and version 2 was released in 2002. It is based on the sequential combination of MW, IR, and gauge data. For the SSM/I period 1987 to the present, MW measurements from the SSM/I and the Special Sensor Microwave Imager Sounder (SSMIS) calibrate the GPI between 40°S and 40°N and are combined with estimates based on data from the TIROS Operational Vertical Sounder (TOVS) and the Atmospheric IR Sounder to offer globally complete satellite-only precipitation estimates. For the pre-SSM/I periods, the calibrated Outgoing long-wave radiation (OLR) Precipitation Index (OPI) (trained against GPCP for the period of 1988–1997) is used globally between 1979 and 1985; for other time periods, the Adjusted Global Precipitation Index is used between 40°S and 40°N and the calibrated OPI is used elsewhere. 
Then, the multisatellite field is merged with rain gauge analyses (over land) by adjusting the satellite estimates to the gauge bias and then combining the (adjusted) satellite and gauge fields by inverse error-variance weighting (Adler et al., 2003).
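The two-step merge described above (bias adjustment followed by inverse error-variance weighting) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GPCP production algorithm: the function names, the simple ratio-based bias adjustment, and the sample error variances are all hypothetical.

```python
def adjust_to_gauge_bias(satellite, gauge):
    """Scale the satellite field so that its mean matches the gauge mean.

    Assumption: a single multiplicative (ratio) adjustment over the cells
    passed in; the operational scheme is more elaborate.
    """
    sat_mean = sum(satellite) / len(satellite)
    gauge_mean = sum(gauge) / len(gauge)
    if sat_mean == 0:
        return list(satellite)
    ratio = gauge_mean / sat_mean
    return [value * ratio for value in satellite]


def inverse_variance_merge(sat, gauge, sat_err_var, gauge_err_var):
    """Combine two estimates cell by cell, weighting each by 1/error variance."""
    merged = []
    for s, g, var_s, var_g in zip(sat, gauge, sat_err_var, gauge_err_var):
        weight_s, weight_g = 1.0 / var_s, 1.0 / var_g
        merged.append((weight_s * s + weight_g * g) / (weight_s + weight_g))
    return merged


# Hypothetical monthly totals (mm) for three land grid cells.
satellite = [80.0, 120.0, 100.0]
gauge = [90.0, 130.0, 110.0]
sat_var = [400.0, 400.0, 400.0]    # assumed error variances (mm^2)
gauge_var = [100.0, 100.0, 400.0]  # third cell: sparse gauges, larger error

adjusted = adjust_to_gauge_bias(satellite, gauge)
merged = inverse_variance_merge(adjusted, gauge, sat_var, gauge_var)
```

Where the gauge error variance is small, the merged value stays close to the gauge analysis; where the two variances are equal, the result is the simple average of the two fields.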

CMAP uses very similar input data to GPCP, but there are some differences in the merging technique. CMAP is constructed in two steps, merging seven independent sources with different characteristics. First, the IR-based GPI, the OLR-based OPI, the MSU-based Spencer data set, the SSM/I-scattering-based NOAA/National Environmental Satellite, Data, and Information Services data set, the SSM/I-emission-based Chang data set, and the precipitation forecast from NCEP-NCAR reanalysis are linearly combined using a maximum-likelihood method in which the weighting coefficients are in inverse proportion to the squares of the individual random errors. The errors over land and ocean are determined by comparison with the GPCC and atoll gauge measurements. Second, the variational blending method is used to combine the output from the first step with gauge-based analyses to remove possible biases (Xie & Arkin, 1997). Yin et al. (2004) reported that the GPCP and CMAP analyses are generally consistent, with just a few differences occurring owing to discrepancies in the source data and merging techniques. The GPCP pentad precipitation analysis is constructed by modulating the pentad CMAP analysis on the basis of observations only to ensure that the overall magnitude of the adjusted pentad analyses matches the monthly GPCP while the high-frequency components remain unchanged (Xie et al., 2003).
The TRMM 3B43 monthly fields are constructed by combining multisatellite and gauge analyses via inverse random-error variance weighting; all the individual 3-hourly combined PMW-IR fields are then scaled so that they sum to the 3B43 grid box values.
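The rescaling step can be illustrated for a single grid box as follows. This is a minimal sketch that assumes one multiplicative factor per grid box and month, with the 3-hourly values expressed as accumulations in mm; the function name and the numbers are hypothetical.

```python
def scale_to_monthly_total(three_hourly_mm, monthly_total_mm):
    """Rescale 3-hourly accumulations so that they sum to the monthly total."""
    current_total = sum(three_hourly_mm)
    if current_total == 0:
        # A completely dry series cannot be rescaled; leave it unchanged.
        return list(three_hourly_mm)
    factor = monthly_total_mm / current_total
    return [value * factor for value in three_hourly_mm]


# Hypothetical 3-hourly accumulations (mm) and a monthly grid-box value (mm).
three_hourly = [0.0, 2.0, 5.0, 1.0, 0.0, 2.0]
scaled = scale_to_monthly_total(three_hourly, 12.0)
```

Because every time step is multiplied by the same factor, the timing and relative intensity of events in the 3-hourly series are preserved while the total matches the monthly value.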
The GPCP 1° daily precipitation analysis (GPCP 1dd) was released to meet the initialization requirement in numerical models, to drive land-surface models, to resolve the advance and retreat of precipitation, and to validate model forecasts (Huffman et al., 2001). The algorithm for obtaining instantaneous precipitation in the GPCP 1dd uses the Threshold Matched Precipitation Index (TMPI) for 40°N–40°S, based on a merged geo-IR data set from IR brightness temperatures, and rescaled TOVS precipitation estimates at higher latitudes (Huffman et al., 2001). It should be noted that all merged data sets make assumptions that the precipitation distribution estimated from combined satellite estimates is optimal and that the gauge observations are bias-free.
The global coverage Multi-Source Weighted-Ensemble Precipitation (MSWEP) rainfall data set provides 3-hourly temporal resolution and 0.25° spatial resolution (Beck et al., 2017). MSWEP merges the highest-quality precipitation data sources available as a function of timescale and location. It uses a combination of rain gauge measurements, satellite observations, and estimates from atmospheric models (Beck et al., 2017). The weight assigned to the gauge-based estimates is calculated from the gauge network density, and the weights assigned to the satellite- and reanalysis-based estimates are calculated from their comparative performance at the surrounding gauges. This determines the temporal variability of MSWEP at each grid.
2.3 Reanalysis
The idea behind reanalysis systems is to merge irregular observations and models that encompass many physical and dynamical processes in order to generate a synthesized estimate of the state of the system across a uniform grid, with spatial homogeneity, temporal continuity, and a multidimensional hierarchy. Many essential climate variables output from reanalysis systems maintain a physically consistent framework and can be obtained after only a short time delay. A reanalysis system includes a background forecast model and a data assimilation routine. The observations assimilated into the reanalysis system, the model parameterizations, and the complex interactions between the model and the observations all influence the subsequent precipitation forecast generated by the system (Bosilovich et al., 2008). Successive generations of reanalysis products produced by various organizations have improved in quality, with better models, input data, and assimilation methods. We focus on global reanalysis systems in this study, including the two NCEP/NCAR Reanalysis systems (NCEP1 and NCEP2), two European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis systems (ERA-40 and ERA-Interim), the Twentieth Century Reanalysis system (20CRv2), the Modern-Era Retrospective Analysis for Research and Application system (MERRA), the NCEP Climate Forecast System Reanalysis system (CFSR), and the Japanese 55-year Reanalysis (JRA-55).
NCEP2 was conceived as an updated and human-error-fixed version of NCEP1, with similar input data and vertical resolution (Kanamitsu et al., 2002); however, some evaluations have indicated that there are few differences between the performance of NCEP2 and NCEP1. ERA-40 had some data assimilation problems, so the ECMWF created ERA-Interim in an attempt to overcome them. ERA-40 overestimated rainfall over tropical oceans owing to the humidity analysis scheme and bias adjustments for IR radiance (Uppala et al., 2005). ERA-Interim applies four-dimensional variational data assimilation (4D-Var), uses a completely automated scheme to adjust for biases in satellite radiance observations, and executes modified convective and boundary layer cloud schemes, increasing the atmospheric stability and producing less precipitation (Dee et al., 2011). 20CRv2, CFSR, and MERRA are generally classified as modern reanalysis systems with higher spatial resolution (Table 3) that apply advanced numerical models and assimilation schemes to combine observations from multiple sources. CFSR is based on a fully coupled ocean-land-atmosphere model and uses numerical weather prediction techniques to assimilate and predict atmospheric states (Saha et al., 2010). CFSR applies three-dimensional variational data assimilation (3D-Var) based on grid-point statistical interpolation (GSI), consistent with MERRA. Since 2014, the second MERRA reanalysis system (MERRA-2) has increasingly replaced the original reanalysis system; it uses version 5 of the Goddard Earth Observing System Model data assimilation system and an updated GSI. The NOAA 20CR data set has the longest record of variables, spanning the twentieth century. Observations of synoptic surface pressure, prescribed monthly sea-surface temperatures, and sea ice distributions were assimilated to form boundary conditions for the atmosphere in the 20CR reanalysis system (Compo et al., 2011).
In 2010, the second Japanese global atmospheric reanalysis project, Japanese 55-year Reanalysis (JRA-55), was improved by the Japan Meteorological Agency to overcome the deficiencies in the first Japanese reanalysis project, Japanese 25-year Reanalysis, and to provide a long-term comprehensive atmospheric data set. JRA-55 adopts a new radiation scheme in the forecast model as well as 4D-Var with variational bias correction (Ebita et al., 2011). In addition, JRA-55 includes greenhouse gases at time-varying concentrations to improve the data quality (Ebita et al., 2011).
Data set | Resolution | Freq. | Coverage | Period | Source | Assimilation schemes | Reference |
---|---|---|---|---|---|---|---|
NCEP1 | 2.5° × 2.5° | Monthly/Daily/6 hourly | Global | 1948–present | NCEP/NCAR | 3D-Var (Spectral statistical interpolation) | (Kalnay et al., 1996) |
NCEP2 | 1.875° × 1.875° | Monthly/6 hourly | Global | 1979–present | NCEP/DOE | 3D-Var | (Kanamitsu et al., 2002) |
ERA 40 | 2.5° × 2.5°/1.125° × 1.125° | Monthly/6 hourly | Global | 1957–2002 | ECMWF | 3D-Var | (Uppala et al., 2005) |
ERA Interim | 1.5° × 1.5°/ 0.75° × 0.75° | Monthly/6 hourly | Global | 1979–present | ECMWF | 4D-Var | (Dee et al., 2011) |
20CRv2 | 2.0° × 2.0° | Monthly/daily/6 hourly | Global | 1871–2012 | NOAA | Ensemble Kalman Filter | (Compo et al., 2011) |
JRA-55 | 60 km | Monthly/3 hourly/6 hourly | Global | 1958–present | Japan Meteorological Agency | 4D-Var | (Ebita et al., 2011) |
MERRA | 0.5° × 0.67° | Daily | Global | 1979–present | NASA | 3D-Var | (Rienecker et al., 2011) |
MERRA Land | 0.5° × 0.67° | Monthly/Daily/1 hourly | Global land | 1980–present | NASA | 3D-Var | (Reichle et al., 2011) |
CFSR | 38 km | 6 hourly | Global | 1979–2010 | NOAA | 3D-Var | (Saha et al., 2010) |
3 Intercomparison of Precipitation Estimates Among the Different Data Sets
The data sets described above have been used in numerous studies, including the detection of climate variability, the attribution of climate change, and the evaluation of climate models at global and regional scales (Ceglar et al., 2017; Gehne et al., 2016; Huffman et al., 2001). However, the estimated precipitation is not completely consistent among these data sets owing to their different data sources, quality control schemes, and estimation procedures. Consequently, comparing the performance of the data sets at different spatiotemporal scales will provide essential information for related studies that use these data sets. Some comparisons of global precipitation data sets have been carried out in previous studies. For instance, Gehne et al. (2016) compared the characteristics of precipitation estimates from 11 global products. Herold et al. (2017) explored the difference in observed daily precipitation extremes over tropical land (50°S–50°N) estimated by five products. These studies focused on a limited number of data sets and compared only certain aspects of precipitation characteristics at temporal and spatial scales. In the following sections, we investigate the differences among these data sets at annual, seasonal, and daily scales. We will also examine uncertainties in mean and extreme precipitation values.
The websites used to access the data sets and the code necessary to read them can be found in the supporting information. Owing to differences in the time periods covered by the data sets (Figure 5), only the data sets with overlapping time periods were used in the comparison. Thus, CRU, GPCC, UDEL, PRECL, GPCP, CMAP, CPC, CMORPH, TRMM 3B43, PERSIANN-CDR, PERSIANN-CCS, MSWEP, 20CR, CFSR, NCEP1, NCEP2, JRA-55, ERA Interim, and MERRA were compared at the annual and seasonal scales. For the daily scale, 15 products were compared: GPCC-daily, GPCP 1dd, CPC, CMORPH, TRMM 3B42, PERSIANN-CDR, PERSIANN-CCS, MSWEP, 20CR, CFSR, NCEP1, NCEP2, JRA-55, ERA Interim, and MERRA. In total, 22 monthly or daily data sets were evaluated. The data sets cover a range of spatial (0.04° to 2.5°) and temporal resolutions (30 min to monthly). We also summarize the preexisting regional intercomparisons of estimated precipitation in different areas.

3.1 Intercomparison of Annual Precipitation Estimates
Figure 6 displays the annual precipitation estimates over global land (excluding Antarctica) and tropical land (50°S–50°N) from 17 precipitation products (PERSIANN-CCS was not included owing to a short time period). The data sets provide precipitation estimates over different time periods. The largest discrepancy occurs in the magnitudes of the different precipitation estimates. Although the 20CR data set provides the longest precipitation time series, this data set overestimates precipitation compared with many of the other data sets. The gauge-based CRU, GPCC, and UDEL data sets are popular products covering the 20th century and show reasonably consistent interannual variability, but with deviations in magnitudes of up to about 100 mm. In the Taylor diagrams (Taylor, 2001) (Figure 7), GPCC is used as the reference object because it is the largest gauge-observation data set, with data from more than 70,000 different stations (Schneider et al., 2008). In the period of overlap (1979–2010), the annual series from gauge-based estimates are similar and are located close to each other in the Taylor diagrams, whereas there is great inconsistency in the annual values obtained from reanalysis products. In tropical regions, gauge-based products showed consistent interannual variability and small biases. The discrepancies increased when other products were added. However, some products, including the satellite-based TRMM 3B43 and PERSIANN-CDR products and GPCP, match the gauge-based precipitation well in tropical regions (Figure 7). The GPCC data set is used to calibrate both GPCP and TRMM 3B43, and PERSIANN-CDR is adjusted according to GPCP estimates (Ashouri et al., 2015). This can partly explain the higher correlation among these data sets. 
Precipitation estimates for tropical land from reanalysis products other than 20CR show relatively high correlation coefficients with GPCC (>0.6) compared with the coefficients over global land, but there are larger discrepancies in the variation of annual precipitation.
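The statistics summarized in a Taylor diagram (correlation, standard deviations, and centered root-mean-square difference relative to a reference series) can be computed as in the following sketch. It assumes two annual series of equal length with nonzero variance; the function name and the sample values are hypothetical, and the reference series simply stands in for a gauge-based product such as GPCC.

```python
import math


def taylor_statistics(test, reference):
    """Return the quantities plotted in a Taylor diagram for one product."""
    n = len(reference)
    mean_t = sum(test) / n
    mean_r = sum(reference) / n
    dev_t = [x - mean_t for x in test]
    dev_r = [x - mean_r for x in reference]
    std_t = math.sqrt(sum(d * d for d in dev_t) / n)
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    covariance = sum(a * b for a, b in zip(dev_t, dev_r)) / n
    correlation = covariance / (std_t * std_r)
    # Centered RMS difference: the mean bias is removed before differencing.
    centered_rmsd = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(dev_t, dev_r)) / n
    )
    return {
        "std_test": std_t,
        "std_ref": std_r,
        "std_ratio": std_t / std_r,
        "correlation": correlation,
        "centered_rmsd": centered_rmsd,
    }


# Hypothetical annual land precipitation series (mm/yr).
reference = [800.0, 820.0, 790.0, 810.0, 805.0]
product = [850.0, 880.0, 830.0, 870.0, 860.0]
stats = taylor_statistics(product, reference)
```

The three quantities are linked by the law of cosines, which is what allows them to be shown on a single two-dimensional diagram. Note that a constant offset between a product and the reference does not appear in the diagram, so differences in magnitude have to be assessed separately.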


Figure 8 shows the spatial distribution of annual precipitation from different products during the overlap period 2003–2010. All products capture a similar spatial precipitation pattern. The reanalysis products, especially CFSR, NCEP2, and JRA-55, show higher precipitation estimates over tropical regions than the gauge-based, satellite-based, or merged products. This overestimation is especially apparent over the tropical oceans and results from overestimation of small and medium-sized precipitation amounts (Pfeifroth et al., 2013). Furthermore, tropical precipitation is characterized by high spatiotemporal variability and stems mainly from convective events, so representing it requires accurate parameterization schemes and high resolution in the reanalysis model (Pfeifroth et al., 2013). Owing to a lack of abundant direct observations, precipitation estimates over the oceans remain challenging and largely uncertain. Over global land, the largest differences among precipitation estimates from gauge-based, reanalysis, and satellite-gauge merged products are concentrated in northern Africa, northwest China, eastern Russia, northern North America, Greenland, and the west coast of South America, areas that are characterized by sparse measurements owing to sparse populations and complex terrain. High-elevation regions have relatively warm clouds. Incorrect discrimination between raining and nonraining clouds with thermal IR could cause the IR rainfall retrieval algorithms to miss light-precipitation events and underestimate total rainfall (Bitew & Gebremichael, 2010; Maggioni et al., 2016). Conversely, reanalysis data sets tend to overestimate precipitation at higher elevations compared with observations from stations. For instance, compared with the station observations from the China Meteorological Administration and the U.S. 
National Snow and Ice Data Center, MERRA, ERA-Interim, and CFSR significantly overestimate precipitation at high elevations; however, the TRMM 3B42 satellite data underestimate precipitation in mountainous areas in Central Asia (Hu et al., 2016). In addition, the discrepancies between products are slightly greater in arid and semiarid regions than in humid regions (Cattani et al., 2016; Dinku et al., 2011). In tropical regions, discrepancies between products did not increase when satellite estimates were included (Figure 9).


3.2 Intercomparison of Seasonal Precipitation Estimates
At the seasonal scale, satellite-gauge merged products produced low precipitation estimates, whereas reanalysis products produced high estimates (Figure 10). The CPC estimates are the lowest in all four seasons, leading to the underestimation of annual precipitation. In March-April-May (MAM), September-October-November (SON), and December-January-February (DJF), the CFSR estimates exceed those of the other products. GPCP, CRU, and GPCC agree well with each other in all seasons. Although PRECL and UDEL are also gauge-based, precipitation estimates for MAM, June-July-August (JJA), and SON from these products are higher than the estimates from CRU and GPCC. The coverage of the raw data sources, orographic correction, and interpolation techniques may all affect the agreement among gridded data sets. The seasonal contributions to the difference in annual precipitation are slightly larger for JJA and MAM than for the other seasons. Differences in JJA and MAM precipitation span more than 100 mm. For tropical land, the relatively lower MAM, JJA, and SON precipitation estimates from CPC and the relatively higher MAM, SON, and DJF precipitation estimates from PERSIANN-CCS result in these two products having the lowest and highest annual precipitation estimates, respectively. The estimates from GPCP and the four gauge-based products are consistent, because they use the same gauges. The seasonal precipitation estimates are similar for TRMM 3B43 and PERSIANN-CDR. However, the seasonal estimates from the reanalysis products are uneven and display discrepancies.

At the spatial level, there are discrepancies between products in northwest Africa, inland Asia, northern Eurasia, northern North America, and Greenland, with a normalized discrepancy range of more than 0.4 in MAM (Figure 11). All products capture the zonal characteristics of global precipitation. The overestimation of global precipitation in CFSR relative to other products is mainly due to overestimation of precipitation in the equatorial regions. Underestimation of precipitation in the subtropical zone and temperate regions is the major contributor to the low CMORPH MAM precipitation estimates. Precipitation estimates for JJA show large discrepancies over Siberia, Alaska, southeast Brazil, western Asia, and northern Africa. For SON, the normalized discrepancy range over Siberia, Alaska, and Greenland is generally greater than 0.5. It should also be noted that clear discrepancies in DJF precipitation estimates exist for large areas of Eurasia and Alaska. In JJA, SON, and DJF, CPC (Figure 11) tends to underestimate seasonal precipitation in equatorial regions compared with other products.

3.3 Intercomparison of Daily Precipitation Estimates
Figure 12 shows the long-term histograms of daily precipitation intensity, from 0.1 mm d−1 to 250 mm d−1, averaged over six continents. Light precipitation events occur more frequently than other precipitation events, and there is a large divergence in the frequency of light events estimated by the different products. For Asia, North America, Europe, Africa, South America, and Australia, the spread of frequency estimates covers 16.34%, 23.72%, 21.65%, 15.94%, 19.84%, and 17.65%, respectively. GPCP has the lowest frequency estimate for light precipitation events on all continents. PERSIANN-CDR estimates of light precipitation frequency in Asia, North America, and Australia are generally higher than the estimates from other precipitation products. For moderate-intensity precipitation events, NCEP1 has higher frequency estimates for Asia, North America, Africa, and, in particular, South America, with the latter having an estimated frequency of greater than 40%. However, the estimated frequencies of precipitation at higher bins (>50 mm d−1) are lower for NCEP1 than for most products, especially in South America. For heavy precipitation intensity events, the lowest estimated frequencies for all six continents come from NCEP1. NCEP2, CFSR, TRMM 3B42, and PERSIANN-CCS, however, generally produce higher frequency estimates at high precipitation intensity bins than most of the other products. The satellite precipitation products estimate a greater frequency of high-intensity rain events. The PERSIANN algorithm applied pattern recognition to develop a “patch-based” cloud classification rainfall estimate based on satellite IR images. The higher frequency estimates in PERSIANN-CCS are likely related to the uncertainties in the statistical relationship between cloud-top brightness temperature and precipitation rate. 
This relationship contains some uncertainties associated with the height, thickness, and type of cloud, which translate into uncertainties in the ensuing precipitation estimation (AghaKouchak et al., 2011; Shah & Mishra, 2016; Tian et al., 2007). Because daily precipitation is a key forcing input to hydrological models, discrepancies in reported daily precipitation may induce large biases in simulated streamflow (Shah & Mishra, 2016).
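An intensity histogram of the kind discussed above can be built for a single product as sketched below. The bin edges (0.1, 1, 10, 50, and 250 mm d−1) and the choice to express frequencies as a percentage of days falling within the 0.1–250 mm d−1 range are assumptions made for this illustration; the function name and sample data are hypothetical.

```python
def intensity_frequencies(daily_mm, bin_edges):
    """Return the percentage of in-range days in each [low, high) bin."""
    counts = [0] * (len(bin_edges) - 1)
    for amount in daily_mm:
        for i in range(len(counts)):
            if bin_edges[i] <= amount < bin_edges[i + 1]:
                counts[i] += 1
                break  # days outside all bins (e.g. dry days) are skipped
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * count / total for count in counts]


# Assumed bin edges (mm/d) and a hypothetical daily series for one region.
edges = [0.1, 1.0, 10.0, 50.0, 250.0]
daily = [0.0, 0.5, 0.8, 3.0, 12.0, 60.0, 0.05, 7.0, 0.2, 25.0]
freqs = intensity_frequencies(daily, edges)
```

Applying the same bins to every product and taking the difference between the largest and smallest frequency in a given bin gives a spread comparable in spirit to the percentages quoted for light events.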

The spatial distributions of daily precipitation 90th percentiles during the 2003–2010 period were plotted to investigate the differences between product estimates of extreme events (Figure 13). Overall, the satellite-related and gauge-based products show higher extreme precipitation over Africa, southern Asia, and South America than the reanalysis products, except for MERRA. Across all products, NCEP1 and ERA Interim provide the lowest estimates of extreme precipitation and MERRA the highest. Compared with other products, MERRA greatly overestimates extreme precipitation in the Indian Ocean, the Pacific Ocean, and the western Atlantic Ocean. Consistent with the mean values (Figure 9), the differences in estimates of extreme precipitation are larger for arid regions than for humid regions. Large differences in the estimates are located in northern Africa, western and central Asia, and central Australia (Figure 14). However, differences in extreme precipitation estimates are lower at higher latitudes than at lower latitudes, which is not consistent with the results for mean precipitation estimates.
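For a single grid cell, the 90th percentile of daily precipitation and its spread across products can be computed as in this sketch. Whether the percentile is taken over all days or over wet days only is not specified here, so the optional `wet_day_threshold` parameter is an assumption, as are the function names, the interpolation rule, and the sample series.

```python
def percentile(values, q):
    """Percentile (0 <= q <= 100) of a non-empty list, linearly interpolated."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def p90_spread(daily_by_product, wet_day_threshold=None):
    """Return each product's 90th percentile and the max-min spread."""
    p90 = {}
    for name, series in daily_by_product.items():
        if wet_day_threshold is not None:
            # Optional: restrict the sample to wet days only.
            series = [p for p in series if p >= wet_day_threshold]
        p90[name] = percentile(series, 90)
    spread = max(p90.values()) - min(p90.values())
    return p90, spread


# Hypothetical daily series (mm/d) for one grid cell from two products.
cell = {
    "product_a": [0, 0, 1, 2, 3, 4, 5, 6, 8, 10, 40],
    "product_b": [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 20],
}
p90, spread = p90_spread(cell)
```

Repeating the calculation for every grid cell yields maps of the percentile for each product and of the spread among products.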


Sources of error include the satellite sensors themselves and the retrieval algorithms, the spatial and temporal sampling of satellite products, and the numerical models used in reanalysis products. Understanding the sources of error is important for improving the retrieval algorithms, model optimization, and bias-correction techniques. Therefore, the Willmott decomposition technique (Willmott, 1981) was used to separate the total error into systematic and random components. Figures 15 and 16 show the systematic and random error components for daily precipitation data in the satellite products and reanalysis products, respectively, with GPCC-daily as the reference. Overall, systematic errors are significantly smaller than random errors for all satellite and reanalysis precipitation products over large regions. For satellite estimates, higher systematic errors are found in the Himalayas, central Asia, and East Asia. PERSIANN-CCS has larger systematic errors than CMORPH, TRMM 3B42, and PERSIANN-CDR. The spatial distribution of the systematic errors is similar for all reanalysis products. Systematic errors are the main source of errors over large parts of Africa, northern South America, and Greenland. Random errors are the dominant form of error for large regions of global land, especially at high latitudes.
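The decomposition can be sketched as follows. In the form used here, an ordinary least-squares line is fitted to the estimates as a function of the reference observations; the systematic part of the mean squared error is the squared distance of that line from the observations, and the random (unsystematic) part is the scatter of the estimates around the line. The function name and the sample data are hypothetical, and the sketch assumes the reference series is not constant.

```python
def willmott_decomposition(estimates, observations):
    """Split the MSE into systematic and unsystematic (random) parts."""
    n = len(observations)
    mean_o = sum(observations) / n
    mean_p = sum(estimates) / n
    sxx = sum((o - mean_o) ** 2 for o in observations)
    sxy = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observations, estimates))
    slope = sxy / sxx
    intercept = mean_p - slope * mean_o
    # Estimates predicted by the regression line at each observation.
    fitted = [intercept + slope * o for o in observations]
    mse_systematic = sum((f - o) ** 2
                         for f, o in zip(fitted, observations)) / n
    mse_random = sum((p - f) ** 2 for p, f in zip(estimates, fitted)) / n
    mse_total = sum((p - o) ** 2
                    for p, o in zip(estimates, observations)) / n
    return mse_systematic, mse_random, mse_total


# Hypothetical daily values (mm/d): reference gauge analysis vs. a product.
reference = [0.0, 2.0, 5.0, 10.0, 20.0, 1.0]
product = [0.5, 3.0, 4.0, 14.0, 15.0, 2.5]
mse_s, mse_u, mse = willmott_decomposition(product, reference)
```

With a least-squares fit the two components add up to the total mean squared error, so the share of each can be mapped as a fraction of the total.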


3.4 Regional Intercomparison of Precipitation Estimates
Previous studies have presented evidence for the reliability and usability of global precipitation data sets at the regional scale. In this section, we summarize and discuss the results of this previous research for a number of important world regions. At the regional scale, precipitation product estimates can be compared with data from local collection sites.
3.4.1 South Asia
South Asia is the most prominent monsoon region in the world, and accurate precipitation data are essential for understanding the monsoon. Precipitation products have succeeded in depicting region-specific rainfall patterns across climatologically different parts of India. Nevertheless, most data sets, including gauge-based estimates, reanalysis systems, and satellite retrievals, have difficulty estimating orographic rainfall, particularly in the Western Ghats mountain range, northeast India, and the Himalayan foothills (Hu et al., 2016; Palazzi et al., 2013; Prakash, Mitra, Momin, Pai, et al., 2015; Prakash, Mitra, Momin, Rajagopal, et al., 2015; Shah & Mishra, 2016). Most reanalysis products exhibit higher interannual variability in monsoon-season precipitation than the actual observations. Most products, especially satellite-based ones, underestimate monsoon-season precipitation at high elevations but overestimate it in other parts of India relative to the gauge-based gridded precipitation data set from the India Meteorological Department (Shah & Mishra, 2014; Sunilkumar et al., 2015). The positive bias observed in monsoon-season reanalysis estimates might be related to an overestimation of moisture content, and hence precipitable water, by the observation systems (Shah & Mishra, 2014). Larger biases in the frequency of daily extreme precipitation events are found in satellite products than in other products. Extreme rainfall frequency is largely overestimated in TRMM products over India, except in northern India and the Western Ghats (Rana et al., 2015; Shah & Mishra, 2016).
3.4.2 East Asia
For China, all products capture the overall spatial distribution and temporal variations in precipitation (Huang et al., 2016). However, the performance of the products depends on the region and the particular precipitation regime. There is a greater discrepancy in precipitation estimates in the northwest regions of China and over the Tibetan Plateau (Ma et al., 2009; Miao et al., 2015; Sun, Miao, et al., 2014). The satellite products, including PERSIANN-CDR, TRMM 3B42, and CMORPH, perform better over wet regions and in warm seasons (Shen et al., 2010; Zhao & Yatagai, 2014). At the daily scale, CMORPH generates more light-rain events and fewer heavy-rain events, which is partially due to the bilinear interpolation process during the generation of gridded satellite products (Yu et al., 2009). TRMM 3B42 overestimates the frequency of heavy precipitation events for some areas of southeastern China but underestimates the frequency of light and moderate precipitation events over most of northwestern China compared with data from 756 Chinese stations archived by the China Meteorological Administration (Zhao & Yatagai, 2014). All other satellite products show lower frequencies for both light and heavy rain events (Shen et al., 2010). The reanalysis products ERA-Interim, MERRA, and NCEP1 reproduce the frequency of heavy precipitation events reasonably well, but NCEP2 significantly overestimates the frequency of heavy precipitation events, especially very heavy rainfall, compared with GPCP and CMAP estimates and rain gauge observations over mainland China (Huang et al., 2016). Because it includes relatively few gauge observations, CPC tends to smooth the precipitation structure and miss local heavy precipitation events (Shen & Xiong, 2016). Over South Korea, TRMM successfully reproduces medium-intensity precipitation frequency but is less accurate during periods of heavy rain (Koo et al., 2009). 
CMORPH and PERSIANN significantly underestimate the summer mean distribution compared with observations from 520 automated weather stations in a network operated by the Korea Meteorological Administration (Sohn et al., 2010). In Japan, CMORPH estimates have higher correlations with rain gauge data than TRMM 3B42 and PERSIANN (Kubota et al., 2009).
3.4.3 North America
Overall, CMORPH and PERSIANN tend to overestimate precipitation and the frequency of high-intensity precipitation, particularly during the warm months, but tend to underestimate precipitation during the cold months, when compared with Stage IV radar-based multisensor precipitation estimates (AghaKouchak et al., 2011). TRMM 3B42 has a lower probability of detection and lower false-alarm rates than other products in both warm and cold seasons. MERRA reproduces the continental-scale patterns of change observed in the CPC U.S. Unified gridded data in a reasonable manner, although it underestimates the magnitude of extremes, especially over the Gulf Coast regions; the value of the 99th percentile of precipitation was lower for MERRA than for CPC (Ashouri et al., 2016).
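The probability of detection (POD) and the false-alarm ratio (FAR) mentioned above are contingency-table scores. The sketch below uses their standard definitions, POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms); the 0.1 mm rain/no-rain threshold, the function name, and the sample values are assumptions made for illustration.

```python
def detection_scores(estimated, reference, threshold=0.1):
    """Return (POD, FAR) for paired estimate and reference series."""
    hits = misses = false_alarms = 0
    for est, ref in zip(estimated, reference):
        est_rain = est >= threshold
        ref_rain = ref >= threshold
        if est_rain and ref_rain:
            hits += 1
        elif ref_rain:
            misses += 1          # observed rain that the product missed
        elif est_rain:
            false_alarms += 1    # product rain with no observed rain
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = (false_alarms / (hits + false_alarms)
           if hits + false_alarms else float("nan"))
    return pod, far


# Hypothetical paired daily values (mm): reference analysis vs. a product.
reference = [0.0, 5.0, 2.0, 0.0, 8.0, 0.0, 1.0, 0.0]
estimated = [0.0, 4.0, 0.0, 3.0, 10.0, 0.0, 0.5, 1.0]
pod, far = detection_scores(estimated, reference)
```

A product can score low on both measures at once, as reported for TRMM 3B42: it then misses more of the observed events but also raises fewer false alarms.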
3.4.4 South America
Compared with rain gauge data, most satellite-based products, such as CMORPH and TRMM, overestimate precipitation, principally for extreme precipitation over northern Argentina and southern Brazil, which are strongly affected by convective systems (Salio et al., 2015). The reanalysis products have systematic limitations in depicting precipitation across South America, with low spatial correlations (Bosilovich et al., 2008). The reanalysis data sets generally overestimate the amount of mountain rainfall in South America compared with the CPC data set (Silva et al., 2011). Although CFSR is notably better than NCEP1 and NCEP2 at estimating large-scale precipitation patterns, it does show dry bias during the onset phase of the South American monsoon wet season and wet bias during the peak and decay phases of the monsoon wet season (Silva et al., 2011). Gauge-related products (CPC, GPCC, CMAP, and GPCP) show a fairly consistent pattern of annual and seasonal precipitation, with 5% and 11% differences over the Amazon region and northeast Brazil; larger disagreements are found in the spatial and temporal patterns for the interannual to decadal variation (Juarez et al., 2009).
3.4.5 Europe
The statistical characteristics for interannual variability among reanalysis products are more consistent for northern and eastern Europe than for the mountainous regions in southern Europe. When compared with station data for extreme precipitation, the NCEP products (especially NCEP2) fare better than the ECMWF products, which can be attributed to the differences in model parameterizations, spatial resolution, and input data assimilation (Zolina et al., 2004). For satellite-based products, orography and seasonal variability affect the accuracy of the satellite rainfall retrieval techniques (Stampoulis & Anagnostou, 2012). Satellite products generally perform well in the summer but relatively poorly in the winter, owing to the difficulties in retrieving low-intensity precipitation during the winter and/or cold surface backgrounds affecting the PMW retrievals (Kidd et al., 2012). Compared with surface radar data and gauge data, all products underestimate precipitation in northwest Europe throughout the year but overestimate summer precipitation in Germany, which tends to have more convective regimes in the summer than at other times of the year (Kidd et al., 2012). The convective nature of rainfall can also increase gauge-interpolation uncertainty, with higher values for the standard deviation of gauge error during the warm season (Stampoulis & Anagnostou, 2012).
3.4.6 Africa
All reanalysis products capture the regionally averaged seasonal cycle, with only a few spatial mismatches between estimates and observations seen in the climate pattern for the rainy season over southern Africa (Zhang et al., 2012). CFSR reproduces both the regional pattern and the local details for the precipitation mean and variability fairly well, because it has the finest resolution of all data sets and includes coupled ocean-atmosphere assimilation in the models (Zhang et al., 2012). For the satellite-based estimations, all data sets show general underestimation of heavy precipitation over eastern Africa relative to the observations from 205 gauge stations (Thiemig et al., 2012). Satellite-based estimates have some difficulties depicting the precipitation gradient normal to the elevated terrain (Derin & Yilmaz, 2014). For the long-term mean, GPCP and CMAP display the major precipitation patterns, but substantial discrepancies occur in areas with low gauge densities, such as equatorial West Africa (Yin et al., 2004). GPCC, GPCP, CMAP, and CRU are consistent in exhibiting the drying trend in the East African long rains, although the GPCC and CRU have smaller trend rates than the two satellite-gauge data sets, GPCP and CMAP (Yang et al., 2014). There are large discrepancies between GPCC, CMAP, GPCP, and CPC in the interannual and decadal variations in rainfall over the Congo basin (Juarez et al., 2009).
3.4.7 Australia
Both TRMM and GPCP miss many light-rain events (intensities less than 1 mm/h), which may account for the low correlations between gridded data sets and the merged satellite-gauge data sets over the most arid regions of Australia (Contractor et al., 2015). Furthermore, relative to a gauge-based gridded rainfall product from the Australian Water Availability Project, TRMM 3B42 generally overestimates tropical cyclone rain at low rain rates but underestimates it at high rain rates (Chen et al., 2013). Larger differences occurred for heavy precipitation during the winter months over southern Australia (Peña-Arancibia et al., 2013). For the long-term precipitation series, PRECL is in relatively poor agreement with GPCC, GPCP, and CMAP over Australia compared with other continents, partly because there have been fewer gauge observations for PRECL in the latest decade (Simmons et al., 2010).
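The light-rain detection problem noted above can be quantified with a simple detection fraction. The following is a minimal sketch, not the method used in the cited studies: the function name `light_rain_detection`, the wet threshold of 0 mm/h, and the sample values are all assumptions introduced for illustration. Only the 1 mm/h upper bound for light rain comes from the text.

```python
def light_rain_detection(reference, estimate, light_max=1.0, wet_min=0.0):
    """Fraction of reference light-rain events that the estimate also reports as wet.

    reference, estimate: equal-length sequences of rain rates (mm/h).
    light_max: upper bound for a "light" event (1 mm/h, as in the text).
    wet_min: rate above which a time step counts as wet (assumed 0 mm/h).
    Returns None when the reference contains no light-rain events.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Keep only the time steps at which the reference records light rain.
    light = [(r, e) for r, e in zip(reference, estimate) if wet_min < r < light_max]
    if not light:
        return None
    hits = sum(1 for _, e in light if e > wet_min)
    return hits / len(light)


# Hypothetical hourly rain rates (mm/h), for illustration only.
reference = [0.0, 0.2, 0.5, 3.0, 0.8, 0.0, 0.1, 6.0]
estimate = [0.0, 0.0, 0.4, 2.5, 0.0, 0.0, 0.0, 7.0]
pod_light = light_rain_detection(reference, estimate)
```

A low detection fraction for light events alongside good detection of heavier events would be consistent with the behavior reported for TRMM and GPCP over arid regions.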
4 Discussion and Conclusions
Reliable and accurate estimates of precipitation are not only crucial for the study of climate variability but are also important for water-resource management, agriculture, and weather, climate, and hydrological forecasting (Sarojini et al., 2016). This study provides a comprehensive overview of 30 existing precipitation products and quantifies the discrepancies in the different precipitation estimates over timescales ranging from daily to annual. The 22 monthly or daily precipitation products evaluated had spatial resolutions varying from 0.04° to 2.5° and included gauge-based (CRU, GPCC, GPCC-daily, PRECL, UDEL, and CPC-Global), satellite-related (PERSIANN-CCS, PERSIANN-CDR, CMORPH, TRMM 3B43, TRMM 3B42, GPCP, GPCP 1dd, CMAP, and MSWEP), and reanalysis (NCEP1, NCEP2, ERA Interim, 20CRv2, JRA-55, MERRA, and CFSR) products. We found that current observations had large uncertainties in the magnitude and variability of precipitation at multiple timescales. There were deviations of up to 300 mm/yr in the estimated magnitude of annual precipitation, even among products within the same category. The reanalysis data sets generally had the largest discrepancies when compared with the other data sets. JJA and MAM made slightly greater contributions to the differences in annual precipitation than the other seasons. At the daily scale, light precipitation events occurred more frequently than other precipitation events, and the divergence in product estimates was greatest for light events. There were slightly greater discrepancies associated with estimates of extreme precipitation events at lower latitudes than at higher latitudes, which is inconsistent with the results for mean values.
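One simple way to express the deviation among products for a given region and year is the range and standard deviation of their annual totals. The sketch below is illustrative only and is not the procedure used in this study: the function name `product_spread`, the product labels, and the numerical values are hypothetical.

```python
from statistics import mean, pstdev


def product_spread(annual_totals):
    """Summarize the spread among products for one region and year.

    annual_totals: dict mapping product name -> annual precipitation (mm/yr).
    Returns the multi-product mean, the range (max minus min), and the
    population standard deviation across products.
    """
    values = list(annual_totals.values())
    if len(values) < 2:
        raise ValueError("need at least two products to measure spread")
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "std": pstdev(values),
    }


# Hypothetical annual totals (mm/yr), for illustration only.
example = {
    "product_a": 780.0,
    "product_b": 850.0,
    "product_c": 1010.0,
    "product_d": 920.0,
}
summary = product_spread(example)
```

Applying the same summary separately to the gauge-based, satellite-related, and reanalysis groups would show whether the spread within a category is comparable to the spread across all products.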
The intermittency of precipitation coupled with sampling in time and space is the major challenge for precipitation observation (Hegerl et al., 2015; Trenberth et al., 2017). Rain gauges are indispensable for measuring precipitation directly, but global gauge density is limited and gauge distribution is uneven because there are few gauges in the oceans (Kidd et al., 2017). Critically, the number of gauges available has been declining, and this may reduce our ability to track precipitation variability in the future. At the temporal scale, gauge-based precipitation products are restricted to monthly sampling. The data latency and lack of fine temporal resolution make it difficult to use these products in real-time research or for tracking changes in short-term extremes. Further, gauge observations undergo interpolation to form gridded data sets in order to cover land worldwide, and this smooths the extreme values and affects the long-term trends, especially in regions with sparse gauges. Gauge-based precipitation estimates are inconsistent between products and vary greatly across the globe, depending on the number of stations used, their homogeneity, the manner of analysis, quality control procedures, and how data coverage that changes over time is treated (Hegerl et al., 2015; Sun, Miao, et al., 2014).
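The smoothing effect of interpolation can be seen with inverse-distance weighting (IDW), used here only as a simple stand-in; the gridded products reviewed above use their own, generally more elaborate, analysis schemes. Because an IDW estimate is a weighted average of the gauge values, it can never exceed the largest gauge value. The function name `idw`, the gauge coordinates, and the daily totals below are hypothetical.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at the point (x, y).

    stations: list of (x, y, value) tuples.
    power: exponent applied to the distance (2.0 is a common default).
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            # The target point coincides with a gauge: return its value.
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical daily totals (mm); one gauge records a localized extreme.
gauges = [(0.0, 0.0, 5.0), (1.0, 0.0, 8.0), (0.0, 1.0, 120.0), (1.0, 1.0, 10.0)]
gauge_max = max(value for _, _, value in gauges)

# Interpolate to the centers of a 2 x 2 grid covering the gauge network.
cell_centers = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
gridded = [idw(gauges, cx, cy) for cx, cy in cell_centers]
gridded_max = max(gridded)
```

In this example the largest gridded value is lower than the 120 mm recorded at the wettest gauge, and the gap grows as the grid becomes coarser or the gauges sparser.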
Satellite data provide adequate temporal resolution and fine spatial resolution with wide coverage, enabling accurate precipitation estimates in some ungauged regions, such as the oceans, complex mountain areas, and deserts. Several algorithms and models based on multiple wavelengths have been developed to derive precipitation estimates. Nevertheless, it is essential to note that precipitation estimates derived from satellite data are indirect and are inevitably accompanied by a large degree of variability. For instance, many previous studies have indicated that satellite-based products generally have difficulty representing precipitation in areas with complex topography in which precipitation is controlled by the orography and characterized by high spatiotemporal variability (Derin & Yilmaz, 2014). IR retrievals generally fail to capture light precipitation events and underestimate orographic rains, whereas PMW retrievals face challenges detecting orographic precipitation, especially in the cold season (Derin & Yilmaz, 2014). In addition, it is difficult to use satellite data for climate-related research because only about 40 years of satellite data have been obtained. The recent GPM mission is the most promising plan for better calibration of space-based observations. It uses a constellation of satellites to improve sampling, making it possible to provide accurate and timely precipitation estimates and to capture the intermittency of precipitation. With the current products, better calibration of satellite data or better methods for the optimal combination of measurements, estimates, and model outputs may provide a better understanding of precipitation.
Another challenge is that the degree of discrepancy in precipitation estimates varies from region to region; some regions with insufficient observations display relatively large discrepancies. Over tropical oceans, precipitation events are mainly dominated by convective systems and feature high spatiotemporal variability (Pfeifroth et al., 2013). Hence, more accurate convective parameterization schemes, reasonable representation of the physical processes, and higher resolution are required in the reanalysis models. Satellite data can be included in the data assimilation to improve precision in reanalysis estimates. However, a single algorithm is not always applicable to different regions. Cross-validating the differences among multiple data sets is essential for reducing discrepancies.
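A pairwise comparison is one straightforward way to cross-validate several data sets over the same region and period. The sketch below computes the root-mean-square difference (RMSD) for every pair of products; it is an illustration under stated assumptions rather than a prescribed procedure, and the function names, product labels, and monthly values are hypothetical.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """Root-mean-square difference between two equal-length series."""
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pairwise_rmsd(series):
    """RMSD for every pair of products.

    series: dict mapping product name -> list of precipitation values
    (same region, same time steps). Keys of the result are name pairs
    in alphabetical order.
    """
    return {
        (p, q): rmsd(series[p], series[q])
        for p, q in combinations(sorted(series), 2)
    }


# Hypothetical regional monthly totals (mm), for illustration only.
series = {
    "product_a": [60.0, 80.0, 100.0, 40.0],
    "product_b": [62.0, 78.0, 104.0, 40.0],
    "product_c": [90.0, 50.0, 130.0, 70.0],
}
table = pairwise_rmsd(series)
closest_pair = min(table, key=table.get)
```

Repeating the comparison region by region would highlight where products diverge most, which is where additional observations or algorithm refinement is likely to matter.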
Acknowledgments
This research was supported by the National Natural Science Foundation of China (416622101 and 91547118), the National Key Research and Development Program of China (2016YFC0501604), the U.S. Department of Energy (DOE Prime Award, DE-IA0000018), and the State Key Laboratory of Earth Surface Processes and Resource Ecology. We are grateful to the products' developers for providing the precipitation data sets; the data sets and the MATLAB code for reading data are included in the supporting information.
Glossary
- 20CRv2: Twentieth century reanalysis system
- 3D-Var: Three-dimensional variational data assimilation
- 4D-Var: Four-dimensional variational data assimilation
- AMSR-E: Advanced Microwave Scanning Radiometer for the Earth Observing System
- AMSU: Advanced Microwave Sounding Unit
- AMSU-B: Advanced Microwave Sounding Unit-B
- CFSR: Climate Forecast System Reanalysis system
- CMAP: CPC Merged Analysis of Precipitation
- CMORPH: Climate Prediction Center morphing technique
- COOP: Cooperative Observer Network
- CPC: Climate Prediction Center
- CPC-Global: Gauge-Based Analysis of Global Daily Precipitation
- CRU: Climatic Research Unit
- DMSP: Defense Meteorological Satellite Program
- ECMWF: European Centre for Medium-Range Weather Forecasts
- ERA: European Centre for Medium-Range Weather Forecasts reanalysis systems
- FAO: Food and Agriculture Organization
- GCOS: Global Climate Observing System
- GEO: Geostationary
- GHCN: Global Historical Climatology Network
- GMS: Geostationary Meteorological Satellite
- GOES: Geostationary Operational Environmental Satellites
- GPCC: Global Precipitation Climatology Centre
- GPCP 1dd: GPCP one-degree daily precipitation analysis
- GPCP: Global Precipitation Climatology Project
- GPI: Geostationary Operational Environmental Satellites Precipitation Index
- GPM: Global Precipitation Measurement
- GSI: Grid-point statistical interpolation
- GTS: Global Telecommunication System
- IR: Infrared
- JJA: June-July-August
- JRA-55: Japanese 55-year Reanalysis
- LEO: Low Earth orbit
- MAM: March-April-May
- MERRA: Modern-Era Retrospective Analysis for Research and Applications system
- MTSAT: Multifunctional Transport Satellites
- MHS: Microwave Humidity Sounders
- MSU: Microwave Sounding Unit
- MSWEP: Multi-Source Weighted-Ensemble Precipitation
- MW: Microwave
- NCAR: National Center for Atmospheric Research
- NCEP: National Centers for Environmental Prediction
- NMAs: National meteorological agencies
- NOAA: National Oceanic and Atmospheric Administration
- OLR: Outgoing long-wave radiation
- OPI: Outgoing long-wave radiation precipitation index
- PERSIANN: Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks
- PERSIANN-CCS: Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks Cloud Classification System
- PERSIANN-CDR: Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks Climate Data Record
- PMW: Passive microwave
- PREC: Precipitation reconstruction
- PRECL: Precipitation reconstruction over land
- SON: September-October-November
- SSM/I: Special Sensor Microwave/Imager
- SSMIS: Special Sensor Microwave Imager Sounder
- TIROS: Television and Infrared Observation Satellite
- TMI: TRMM Microwave Imager
- TMPA: TRMM Multi-Satellite Precipitation Analysis
- TMPI: Threshold Matched Precipitation Index
- TOVS: Television and Infrared Observation Satellite Operational Vertical Sounder
- TRMM: Tropical Rainfall Measuring Mission
- UDEL: University of Delaware
- VIS/IR: Visible/infrared
- VIRS: Visible and infrared radiometer
- WMO: World Meteorological Organization