Improving Global Forecast System of extreme precipitation events with regional statistical model: Application of quantile-based probabilistic forecasts
Abstract
Forecasting extreme precipitation events at a regional scale is of high importance because of their severe impacts on society. The impacts are strongest in urban regions owing to high flood potential as well as high population density, which leads to high vulnerability. Although significant scientific improvements have taken place in global weather forecasting models, they are still not adequate at a regional scale (e.g., for an urban region), suffering from high false alarms and low detection. There is a need to improve weather forecast skill at the local scale with a probabilistic outcome. Here we develop a methodology based on quantile regression, in which reliably simulated variables from the Global Forecast System are used as predictors and different quantiles of rainfall are generated corresponding to a given set of predictors. We apply this method to Mumbai, a flood-prone coastal city of India that has experienced severe floods in recent years, and find significant improvements in the forecast, with high detection and skill scores. We apply the methodology to 10 ensemble members of the Global Ensemble Forecast System and find a reduction in the ensemble uncertainty of precipitation across realizations with respect to that of the original precipitation forecasts. We validate our model for the monsoon seasons of 2006 and 2007, which are independent of the training/calibration data set used in the study. We find promising results and emphasize the value of implementing such data-driven methods for better probabilistic forecasts at the urban scale, primarily for early flood warning.
1 Introduction
Extreme precipitation events are among the most devastating weather phenomena. An overabundance of precipitation, usually followed by flash floods and strong surface winds, inflicts large-scale destruction on human society and the environment in different parts of the world every year [Intergovernmental Panel on Climate Change, 2010; Easterling et al., 2000]. At the same time, the occurrence of precipitation extremes is expected to increase under global warming [Diffenbaugh et al., 2005]. Skillful forecasting is crucial for early detection and for broadcasting alerts that safeguard people through efficient evacuation, flow diversion/regulation, and preparedness of disaster mitigation teams. Accurate prediction of extreme precipitation is, however, a challenging task because of the multiscale nature of the atmospheric processes responsible for the occurrence of precipitation and the inherent variability of these processes over space and time [Fritsch et al., 1998]. In a monsoon system it becomes even more challenging because of the complicated underlying geophysical processes. Furthermore, over urban regions a good forecasting system needs proper incorporation of urban feedbacks. Interactions between the atmospheric processes associated with the monsoon and the land processes associated with the urban canopy make forecasting of extremes a challenging area of weather research. Moreover, extreme precipitation events are relatively rare; hence, it is difficult to obtain detailed information about them. Significant research has been carried out in this area during the last few decades [Kunkel et al., 1999; Coles and Powell, 1996]. However, a considerable number of extreme events remain unpredicted, causing loss of life and property in many parts of the world [Coumou and Rahmstorf, 2012].
Several attempts have been made to modernize observing systems and data processing capabilities and to use them to improve precipitation forecasts. Numerical weather prediction (NWP) models based on dynamical weather equations are commonly used to provide accurate and meaningful forecasts of weather conditions [Giorgi and Lionello, 2008]. With advances in computing power, NWP modeling has spurred new atmospheric models with enhanced spatial and temporal resolutions. Dynamic NWP models are primarily used to produce ensemble forecasts of mesoscale (20–200 km) precipitation, along with other meteorological variables, at a synoptic scale (200–2000 km). An ensemble forecast quantifies the forecast uncertainty and enables risk assessments for decision-makers. However, current NWP ensemble forecasts are typically biased and often underestimate precipitation, with large ensemble uncertainty resulting primarily from model structure and approximations of subgrid-scale processes [Goddard et al., 2001; Maraun et al., 2010]. Furthermore, NWP models are not very efficient in predicting heavy rainfall events [Březková et al., 2010; Hong and Lee, 2009; Khaladkar et al., 2007; Selvam, 2011]. To overcome this limitation, mesoscale atmospheric models incorporating land surface processes are used at finer spatial resolutions of 0.5–1 km; they are effective in predicting extreme heavy rainfall events [Dodla and Ratna, 2010; Chang and Chiang, 2009]. These high-resolution NWP models, however, demand enormous computational resources. Although they are good at producing realistic weather simulations, they remain impractical for simulating extreme events over a larger region within the short lead times required for forecasts.
Quantitative Precipitation Forecasting (QPF) combines dynamical NWP model output with statistical methods to produce skillful rainfall forecasts. Several studies have attempted to link extreme rainfall with the corresponding atmospheric behavior [Hart and Grumm, 2001; Panziera and Germann, 2010]. These methods build on the fact that large-scale (synoptic scale) processes have better predictability than regional rainfall itself; the simulated synoptic scale circulation patterns are used as predictors for rainfall forecasts in QPF. Some QPF techniques, such as analogue methods [Lorenz, 1969], the perfect prog method [Klein, 1971], and the fingerprinting technique [Root et al., 2007], have been applied efficiently in the prediction of extreme events.
Extreme precipitation over the monsoonal region of India is caused by the formation and subsequent movement of cyclonic, low-pressure disturbances originating in the Bay of Bengal and the Arabian Sea [Rakhmecha and Pisharoty, 1996; Francis and Gadgil, 2006]. A large number of extreme precipitation events are caused by thunderstorms, which are nonuniformly distributed in space and time. Precipitation extremes over the Indian landmass also result from convective instabilities in moist air, which are mostly confined to small spatial extents and short durations [Goswami et al., 2006]. Extreme rainfall over many regions, especially northwest India, also occurs due to strong interactions between the easterlies and western disturbances [Krishnamurthy and Shukla, 2000]. Indian summer monsoon precipitation is reported to show overall increasing trends in both heavy and very heavy rainfall events over the entire country [Rajeevan et al., 2008] as well as an increase in extreme rainfall events over Central India [Goswami et al., 2006]. Further, these increasing trends in the occurrence of extreme events are spatially nonuniform, which can possibly be attributed to changing regional processes such as urbanization and land use change [Ghosh et al., 2012]. With increased urbanization and industrialization [United Nations Habitat, 2010], the importance of local scale land surface processes for regional meteorology and extreme precipitation has grown [Zheng et al., 2006; Karl and Trenberth, 2003].
The impact of extreme precipitation becomes highly distressing when it hits a densely populated urban area that is a center of the country's economic activity. Impacts of urbanization on extreme precipitation are evident from global studies [Oke, 1988; Shepherd et al., 2002; Rozoff et al., 2003; Gero and Pitman, 2006; Niyogi et al., 2011]. India, the second most populous country in the world, is urbanizing at a fast rate [World Urbanization Prospects, 2014]. The country houses three of the world's megacities [United Nations Habitat, 2012] and has 27 urban centers with populations of more than 1 million, housing 43% of the country's population [Census of India, 2011]. This growing urbanization affects rainfall extremes in India [Vittal et al., 2013; Kishtawal et al., 2010], and the signature of urbanization on rainfall extremes is prominent over Southern, Central, and Western India [Shastri et al., 2015]. This shows the necessity of an extreme rainfall forecasting system for urban regions in India. An extreme rainfall event with an intensity of 944 mm/d that occurred in July 2005 over the urban center of Mumbai caused nearly 500 fatalities and economic losses of 2 billion U.S. dollars [Ranger et al., 2011], and such risk and vulnerability have increased in Mumbai over the decades [Sherly et al., 2015]. This event has drawn the attention of meteorologists to the study of trends in extreme rainfall events over the Indian subcontinent as well as to the development of an efficient flood forecasting system [Goswami et al., 2006; Rajeevan et al., 2008].
1.1 Synoptic Scale and Modeling
Significant improvements in the quality of numerical forecasts have been made in the last two decades. They have come as a result of increased resolution; improved physical parameterizations; improved chemistry and aerosol physics; improved estimates of the initial state due to better data assimilation techniques; and improved couplings between the atmosphere and the land surface, cryosphere, ocean, and more [Hamill et al., 2013]. In India, the National Centre for Medium Range Weather Forecasting produces rainfall forecasts with a Unified Model (UM) at N512L70 resolution [Rajagopal et al., 2012]. Regional configurations of the UM have also been set up at resolutions of 12 km, 4 km, and 1.5 km. The domain over South Asia (approximately 30°–115°E, 9°S–45°N) has a spatial resolution of 12 km and 70 vertical levels. A nested domain has also been set up with a mesoscale atmospheric model at a resolution of 4 km and a variable-grid atmospheric model at a resolution of 1.5 km over the urban region of Delhi (76°–79°E, 26°–29°N). The regional model is run for 3 days, and the mesoscale configurations are run for even shorter periods, their primary purpose being nowcasting. The state-of-the-art global weather models are found to be poor at simulating extreme rainfall in India [Khaladkar et al., 2007]. One possible reason is the nonhomogeneity of Indian regions with respect to land-ocean interaction, terrain distribution, and prevailing weather systems. This nonhomogeneity makes it necessary to take into account the mesoscale conditions of the atmosphere, along with the synoptic scale circulations, for the prediction of heavy rainfall events. The forecasts, hence, have a low hit rate and high false alarms [Durai and Bhowmik, 2014] and are not widely accepted by society.
The Global Ensemble Forecast System (GEFS), an NWP system developed at the National Centers for Environmental Prediction (NCEP), is a physics-based dynamic forecasting system. To address uncertainty in the weather observations, GEFS generates an ensemble of multiple forecasts, each slightly perturbed from the original analysis. The operational GEFS, run on the NCEP supercomputer, produces global output consisting of 21 members and is run 4 times daily (00:00, 06:00, 12:00, and 18:00 UTC). The perturbed initial conditions for both the operational GEFS and the reforecast use the ensemble transform technique with rescaling [Wei et al., 2008] and Stochastic Total Tendency Perturbation. A detailed description of the system can be found in Hamill et al. [2013] and at www.emc.ncep.noaa.gov/gmb/moorthi/gam.html. The current horizontal resolution of the GEFS atmospheric model is ~70 km, with 28 vertical levels (T190L28). The model is run daily up to 240 h from 00:00 UTC initial conditions. The model outputs are postprocessed at 6 h intervals onto a 1° × 1° regular latitude-longitude grid. GEFS provides 29 years (December 1984 to present) of reforecast (hindcast) data with 10 perturbed forecast members and one control forecast.
The outputs of weather and climate prediction models are, however, contaminated by systematic errors. To provide reliable and accurate guidance to end users, statistical postprocessing may be helpful. Especially for rare events and longer lead forecasts, a long training reforecast data set provides enough similar cases to statistically correct the forecasts. Here we present a data-driven approach applied to the outputs of GEFS for forecasting extreme precipitation events over the urban region of Mumbai, India. The city of Mumbai is situated at latitude 18.9°N and longitude 72.8°E on the western coast of India, in a region locally known as the Konkan. Mumbai, with an estimated metropolitan population of 20.7 million, is the second most populous city in India [Census of India, 2011] and the eighth most populous city in the world [United Nations Habitat, 2010]. Mumbai is a major commercial hub of India, known as the financial capital of the country. The city is located (Figure 1a) on the west coast of India, on the windward side of the Western Ghats, a major mountain range of India. The occurrence of rainfall over the city is associated with the southwest Indian summer monsoon [Kumar et al., 2008]. Being on the windward side of the Western Ghats, the city receives high orographic precipitation because of the prevailing westerly winds perpendicular to the range. The mean and maximum annual precipitation over the period 1979–2007 at the Santacruz-Mumbai station are found to be 180 mm and 9942 mm, respectively. We first evaluate the performance of GEFS for extreme precipitation over Mumbai with respect to the observed rainfall data at the Santacruz station. We find that the climatology of monsoon rainfall is well captured by GEFS, although heavy rainfall epochs are underestimated (Figure 1b).
The mean rainfall is well captured by the GEFS simulations, but the standard deviation is underestimated because of the underestimation of extremes (Figure 1c). The GEFS forecasts of precipitation over Mumbai suffer from a very low probability of detection (PD, the ratio of detected extremes to the total number of extremes) [Hsu and Murphy, 1986] and a very high false alarm ratio (FAR, the ratio of the number of forecasts of an extreme event on a nonextreme day to the total number of extreme forecasts) [Hsu and Murphy, 1986]. Here we define a daily rainfall event as extreme when it exceeds the 95th percentile of the observed daily precipitation time series. To test whether a standardization-based bias correction [Wilby et al., 2004] improves these metrics, we evaluate the bias-corrected output and find only a modest improvement: the PD remains around 20%, whereas the FAR is around 80% for Mumbai (Figure 1d). This further results in a very low value of the Heidke Skill Score (HSS) [Heidke, 1926], which combines both PD and FAR.
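These categorical metrics follow from a standard 2 × 2 contingency table of forecast versus observed extreme days. A minimal sketch (the function and variable names are ours, for illustration only, not from the study):

```python
import numpy as np

def contingency_scores(obs, fcst, threshold):
    """PD, FAR, and Heidke Skill Score for binary extreme-event forecasts;
    an 'extreme' is a daily value exceeding `threshold` (e.g., the observed
    95th percentile of daily precipitation)."""
    obs_ev = np.asarray(obs) > threshold
    fc_ev = np.asarray(fcst) > threshold
    a = np.sum(obs_ev & fc_ev)     # hits
    b = np.sum(~obs_ev & fc_ev)    # false alarms
    c = np.sum(obs_ev & ~fc_ev)    # misses
    d = np.sum(~obs_ev & ~fc_ev)   # correct negatives
    pod = a / (a + c)              # probability of detection
    far = b / (a + b)              # false alarm ratio
    hss = 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    return pod, far, hss
```

With one hit, one false alarm, one miss, and one correct negative, PD and FAR are both 0.5 and the HSS is 0, illustrating how the HSS penalizes forecasts no better than chance.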

This poor performance of GEFS for precipitation extremes over Mumbai motivates us to develop a methodology that applies quantile regression for probabilistic forecasting of extreme rainfall events. Forecast skill as well as the reliability of the forecast is assessed through validation with historical data. Quantile regression was earlier applied in meteorology to estimate different quantiles of rainfall corresponding to the precipitable water over a region [Friederichs and Hense, 2007]. Here we apply this concept to develop a forecast model for extreme rainfall events over Mumbai, India. Quantile regression is applied to the synoptic circulation patterns derived from bias-corrected GEFS-simulated predictors; these predictors are known to be reliably simulated by the models. The objective of this study is to test whether such quantile regression improves the GEFS forecasts and to develop probabilistic forecasts with such methods. To the best of our knowledge, this is the first application of a data-driven approach to the ensemble forecasts made by GEFS for improving the forecasting skill of extreme precipitation during the Indian summer monsoon. The outline of the manuscript is as follows: section 2 describes the study region and the data used for the study. Section 3 explains the methodology adopted to assess the occurrence of extreme rainfall. The results and discussion of the statistical analysis are provided in section 4, followed by the summary and conclusions in section 5.
2 Data
Ground-based observed precipitation data at the meteorological station of Santacruz, Mumbai, for the period 1979–2007 at the daily time scale are obtained from the India Meteorological Department. The rainfall data for the southwest summer monsoon months, June-July-August-September, are extracted from the annual time series and used in this study. Meteorological variables at the synoptic scale that directly affect the rainfall process are used as predictors to forecast the occurrence of extreme precipitation events. The selection of predictor variables is important for obtaining the desired skill of model performance. Wilby et al. [1999] described three requirements for the selection of predictor variables: (1) the data for the particular predictor should be available for the desired period, (2) the selected variable should be well simulated by the model, and (3) the predictor should show a good correlation with the predictand. Based on these criteria, the meteorological variables geopotential height, relative humidity, air temperature, eastward wind, and northward wind, at the pressure levels of 1000 hPa and 500 hPa as well as at the surface, are selected as predictors. The 500 hPa level is selected as representative of the mean steering flow for convective storms [Hagemeyer, 1991]. The first step of data-driven forecasting is to develop the statistical relationship between predictor and predictand. Here we use the ERA-Interim reanalysis data provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) [Dee et al., 2011] as the observed predictors.
The forecasting is achieved with the help of synoptic scale weather forecasts from the currently operational NCEP Global Ensemble Forecast System (GEFS). The reforecast data (1985–2007) are obtained through the National Oceanic and Atmospheric Administration/National Operational Model Archive and Distribution System (http://nomads.ncdc.noaa.gov). The operational medium-range GEFS data are available at 3 h intervals from 0 to 72 h and at 6 h intervals thereafter up to 192 h. The GEFS reforecast data are collected at 1° latitude × 1° longitude spatial resolution for the selected meteorological variables and spatial extent. An ensemble data set with 10 perturbed forecast members is obtained for the period 1984–2007 to understand the associated intramodel uncertainty.
3 Methodology
An overview of the methodology used in this study is summarized in Figure 2a. The methodology is based on censored quantile regression (Figure 2b), which provides different quantile-based rainfall forecasts corresponding to a specific synoptic scale circulation pattern. The approach involves two basic steps: training/calibration of the regression model based on observed/reanalysis data, and application of the calibrated model to the synoptic scale forecasts from the GEFS ensemble. The areal extent of the predictors used in the regression is delimited by latitudes 10°–30°N and longitudes 60°–80°E to account for physical processes over the Indian subcontinent and the Arabian Sea that are possibly associated with monsoon rainfall in Mumbai. The reanalysis data on a 1° latitude × 1° longitude grid are obtained for the baseline period 1979–2007 (29 years), which is the same as the availability period of the rainfall data. This duration is sufficient to establish a reliable climatology as well as for training and validation of the proposed statistical model. Figure 3 shows the spatial extent of the ERA-Interim reanalysis grid points (441 points in total, 21 × 21) over the map of India, along with the location of the Santacruz station (Mumbai).


3.1 Preprocessing of Predictors
Preprocessing of the predictors for preparation of the regression model input involves three steps: (i) normalization of the predictors, (ii) dimensionality reduction, and (iii) bias correction of the GEFS output. The data availability period (1979–2007, 29 years) is divided into two successive time windows: the first 15 years (1979–1993) are used for model calibration/training, and the rest (1994–2007, 14 years) for model validation. First, the predictor variables are individually normalized with the standardization method; normalization adjusts the values of the selected variables to a common scale. The 10 selected predictor variables at 441 (21 × 21) grid points around the Santacruz station (Mumbai) yield a total of 4410 dimensions for the regression analysis. Using all dimensions of the predictors in the regression poses two difficulties: high dimensionality and, because the climatic variables are highly correlated, multicollinearity. The use of high-dimensional correlated data is also computationally expensive. On the other hand, if the dimensions are reduced without preserving the internal pattern and variability of the data, the accuracy of the model output is hampered. Principal component analysis (PCA) is a powerful and widely used multivariate statistical technique for reducing the dimension of a multidimensional data set without losing much of its variability. With the help of PCA, the complete set of predictor variables (4410 in number) is reduced to a new set with fewer dimensions that retains a large fraction of the variability of the original data. This new set of variables, termed the principal components (PCs), is used as the regressors, with the observed rainfall as the regressand. Preisendorfer [1988] investigated various rules for the selection of principal components.
Here we use Kaiser's rule, as adopted by Jolliffe [1986], to retain principal components accounting for 60–98% of the variability present in the complete predictor data set (Table 1). Apart from PCA, we also use another dimensionality reduction technique, the least absolute shrinkage and selection operator (LASSO), to obtain an optimal set of predictor variables. LASSO, developed by Tibshirani [1996], selects variables on the basis of their effect on the response and reduces the number of variables under the constraint that the sum of the absolute values of the coefficients be less than a threshold. A detailed description of LASSO is presented in section S1 in the supporting information. We also combine both dimensionality reduction methods by applying LASSO to the PCs. We perform the regression for all the cases presented in Table 1 and select the best one based on regression performance.
| Sr. No. | Methodology Used | % Variance Explained by the Selected Predictors | No. of Predictors Selected |
|---|---|---|---|
| 1 | PCA | 98 | 200 |
| 2 | PCA | 90 | 48 |
| 3 | PCA | 88 | 39 |
| 4 | PCA | 87 | 35 |
| 5 | PCA | 86 | 32 |
| 6 | PCA | 85 | 31 |
| 7 | PCA | 80 | 21 |
| 8 | PCA | 75 | 15 |
| 9 | PCA | 70 | 11 |
| 10 | PCA | 65 | 9 |
| 11 | PCA | 60 | 7 |
| 12 | LASSO | 87 | |
| 13 | PCA and LASSO | 59 | |
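The PCA route in Table 1 (standardize the predictors, then keep the leading PCs up to a chosen explained-variance cutoff) can be sketched as follows, with synthetic correlated data standing in for the actual 4410-dimensional predictor set; all names here are illustrative:

```python
import numpy as np

def pca_reduce(X, var_fraction):
    """Standardize the columns of X (days x predictor dimensions) and
    project onto the leading principal components that together explain
    at least `var_fraction` of the total variance."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)           # normalization step
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)   # PCA via SVD
    explained = s**2 / np.sum(s**2)                    # variance fraction per PC
    k = int(np.searchsorted(np.cumsum(explained), var_fraction)) + 1
    return Z @ Vt[:k].T, k                             # PCs used as regressors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40)) @ rng.normal(size=(40, 40))  # correlated predictors
pcs, k = pca_reduce(X, 0.87)  # keep PCs explaining ~87% of variance (cf. experiment 4)
```

Because the predictors are strongly correlated, far fewer than the original 40 dimensions are needed to reach the cutoff, which is the same effect Table 1 shows for the 4410-dimensional set.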
The forecast distributions are verified with the continuous ranked probability score (CRPS), which compares the forecast cumulative distribution function $F(x)$ with the observation $x_{\mathrm{obs}}$:

$$\overline{\mathrm{CRPS}} = \overline{\int_{-\infty}^{\infty} \left[ F(x) - H(x - x_{\mathrm{obs}}) \right]^2 \, dx} \quad (1)$$

Here the overbar refers to an average over a preferably large sample. The CRPS generalizes the absolute error and therefore provides a direct way of comparing various deterministic and probabilistic forecasts by using a single metric. The minimal value of the CRPS is zero, achieved for $F(x) = H(x - x_{\mathrm{obs}})$, that is, in the case of a perfect deterministic forecast. The CRPS has the dimension of the parameter $x$ (which enters via the integration over $dx$). $H$ denotes the Heaviside function

$$H(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases} \quad (2)$$

The associated skill score with respect to a reference forecast is

$$\mathrm{CRPSS} = 1 - \frac{\overline{\mathrm{CRPS}}}{\overline{\mathrm{CRPS}}_{\mathrm{ref}}} \quad (3)$$

Here we calculate the CRPSS for the probabilistic forecast based on the 10-member ensemble of GEFS. The standard CRPS is calculated with the help of the standard R verification statistical package. A negative value of the CRPSS reveals no gain with respect to the reference forecast.
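For an ensemble forecast, the CRPS can be evaluated directly from the members through the identity CRPS = E|X − y| − ½ E|X − X′|, which is equivalent to the integral form when F is the empirical CDF of the ensemble. The study itself used the R verification package; this Python version is an illustrative sketch:

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against a single observation,
    via the identity CRPS = E|X - y| - 0.5 * E|X - X'|."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))                    # E|X - y|
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # 0.5 * E|X - X'|
    return term1 - term2

def crpss(crps_model, crps_ref):
    """Skill relative to a reference forecast; values <= 0 mean no gain."""
    return 1.0 - crps_model / crps_ref
```

A single-member forecast equal to the observation gives a CRPS of exactly zero, the perfect-deterministic-forecast case noted above.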
3.2 Quantile Regression
Censored quantile regression models the $\tau$-quantile of precipitation, which is left censored at zero, as

$$q_{\tau}(Y \mid X = \mathbf{x}) = \max\left(0, \mathbf{x}^{T}\boldsymbol{\beta}_{\tau}\right) \quad (4)$$

The coefficients $\boldsymbol{\beta}_{\tau}$ are estimated with a three-step procedure:

- In the first step, the conditional probability of the occurrence of precipitation, $\pi = \mathrm{prob}(Y > 0 \mid X)$, is estimated by using a generalized linear model with a logit link function [Fahrmeir and Tutz, 1994]. The estimated probability of precipitation, denoted $\hat{\pi}$, indicates the probability of not being censored.
- Based on this estimate, a subsample $J_0 = \{i : \hat{\pi}_i > 1 - \tau\}$ is chosen, where $i$ varies from 1 to $N$ (the number of observations). Using the subsample $J_0$, an initial estimate of the quantile coefficients $\hat{\boldsymbol{\beta}}_{\tau}$ is obtained.
- An updated subsample $J_1 = \{i : \mathbf{x}_i^{T}\hat{\boldsymbol{\beta}}_{\tau} > 0\}$ is then selected, and a new estimate of $\hat{\boldsymbol{\beta}}_{\tau}$ is obtained with $J_1$. This may be considered the final value of $\hat{\boldsymbol{\beta}}_{\tau}$, or the same procedure may optionally be repeated.

Here we perform the quantile regression by using the R open source statistical program [R Development Core Team, 2003] and the R quantreg package; quantreg was earlier used by Friederichs and Hense [2007] for statistical downscaling of extremes.
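The three-step procedure above can be sketched as follows. The study used R's quantreg; here the quantile fits are direct minimizations of the check-function (pinball) loss on synthetic data, so every name, optimizer setting, and data-generating choice below is an illustrative assumption:

```python
import numpy as np
from scipy.optimize import minimize

def pinball(u, tau):
    """Check (pinball) loss rho_tau(u)."""
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

def fit_quantile(X, y, tau):
    """Linear quantile regression by direct minimization of the pinball
    loss -- a stand-in for R's quantreg used in the study."""
    loss = lambda beta: np.sum(pinball(y - X @ beta, tau))
    res = minimize(loss, np.zeros(X.shape[1]), method="Nelder-Mead",
                   options={"maxiter": 20000, "fatol": 1e-8, "xatol": 1e-8})
    return res.x

def censored_quantile_fit(X, y, tau):
    """Three-step censored quantile regression for Y = max(0, latent)."""
    # Step 1: pi = P(Y > 0 | X) from a logistic (logit-link) model,
    # fitted here by minimizing the log-loss directly.
    z = (y > 0).astype(float)
    def logloss(w):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        return -np.sum(z * np.log(p + 1e-9) + (1 - z) * np.log(1 - p + 1e-9))
    w = minimize(logloss, np.zeros(X.shape[1]), method="BFGS").x
    pi_hat = 1.0 / (1.0 + np.exp(-X @ w))
    # Step 2: initial coefficients on the subsample J0 = {i: pi_hat_i > 1 - tau}.
    J0 = pi_hat > 1.0 - tau
    beta0 = fit_quantile(X[J0], y[J0], tau)
    # Step 3: refit on J1 = {i: x_i' beta0 > 0}; optionally iterate further.
    J1 = X @ beta0 > 0
    return fit_quantile(X[J1], y[J1], tau)

# Illustrative synthetic data: rainfall-like, left censored at zero.
rng = np.random.default_rng(42)
n = 800
x1 = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x1])
y = np.maximum(0.0, -0.5 + 2.0 * x1 + rng.normal(0.0, 0.3, n))
beta = censored_quantile_fit(X, y, tau=0.9)
```

On such data, roughly the expected fraction of observations should fall below the fitted censored 0.9-quantile `max(0, X @ beta)`.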
3.3 Verification of Estimated Quantiles
The skill of the estimated quantiles is verified with the check function $\rho_{\tau}$,

$$\rho_{\tau}(u) = \begin{cases} \tau u, & u \ge 0 \\ (\tau - 1)\,u, & u < 0 \end{cases} \quad (5)$$

which defines the censored quantile verification score

$$\mathrm{CQVS} = \sum_{i=1}^{N} \rho_{\tau}\left(y_i - \max\left(0, \mathbf{x}_i^{T}\hat{\boldsymbol{\beta}}_{\tau}\right)\right) \quad (6)$$

Equation (6) is the censored least absolute deviation (LAD) function of a regression quantile. The corresponding skill score with respect to a reference forecast is

$$\mathrm{CQVSS} = 1 - \frac{\mathrm{CQVS}}{\mathrm{CQVS}_{\mathrm{ref}}} \quad (7a)$$

where the reference score is computed with the climatological $\tau$-quantile $\bar{q}_{\tau}$:

$$\mathrm{CQVS}_{\mathrm{ref}} = \sum_{i=1}^{N} \rho_{\tau}\left(y_i - \bar{q}_{\tau}\right) \quad (7b)$$
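A sketch of the quantile verification in Python, with the climatological τ-quantile of the observations used as the default reference forecast (an assumption on our part; all names are illustrative):

```python
import numpy as np

def check_loss(u, tau):
    """Check (pinball) function rho_tau."""
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

def cqvss(y, q_forecast, tau, q_reference=None):
    """Censored quantile verification skill score, 1 - CQVS/CQVS_ref.
    The default reference is the climatological tau-quantile of y."""
    y = np.asarray(y, dtype=float)
    if q_reference is None:
        q_reference = np.quantile(y, tau)
    cqvs = np.sum(check_loss(y - np.maximum(0.0, q_forecast), tau))
    cqvs_ref = np.sum(check_loss(y - np.maximum(0.0, q_reference), tau))
    return 1.0 - cqvs / cqvs_ref
```

A perfect quantile forecast scores 1, a forecast no better than climatology scores 0, and anything worse is negative, matching the interpretation of the skill scores reported in section 4.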
To further substantiate the quantile forecasts, we investigated how well the forecasts capture individual extreme events. If a quantile forecast is reliable, the observed precipitation value is expected to lie around the forecast at high quantiles. The performance of the quantile model is validated for the extreme events occurring over the last two years, 2006 and 2007, of the data availability period. The validation samples comprise paired forecasts and observations at a lead time of 24 h.
4 Results
The results obtained from the different steps of the proposed methodology are explained in the following subsections.
4.1 Bias Correction of GEFS Forecasts
We first present the bias that exists in the GEFS forecasts of the synoptic scale meteorological variables. Figures 4a and 4b show the mean and standard deviation of the observed temperature (a predictor used in the study); the corresponding fields for the GEFS multirealization average forecast are presented in Figures 4c and 4d. Although the observed spatial pattern, with high temperature in Northern India and low temperature in Southern India, matches the forecast quite well, there are discrepancies at a very local level (e.g., the temperature mean over Northern India and the standard deviation of temperature), which are corrected here. Because the statistical model used here is a black box and we have no control over its sensitivity to small changes in the predictor patterns, the result may differ entirely from that expected if we do not perform the bias correction. After bias correcting the GEFS forecasts by standardization, the forecast patterns closely resemble the observed ones (Figures 4e and 4f).
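The standardization-based correction maps each forecast variable onto the observed climatology by z-scoring it with the forecast calibration-period statistics and rescaling with the observed mean and standard deviation. A minimal sketch (the names are ours):

```python
import numpy as np

def standardize_correct(fcst, obs_clim, fcst_clim):
    """Bias correct a forecast value (or array): z-score it with the
    forecast climatology, then rescale with the observed climatology."""
    z = (np.asarray(fcst, dtype=float) - np.mean(fcst_clim)) / np.std(fcst_clim)
    return np.mean(obs_clim) + z * np.std(obs_clim)
```

By construction, a forecast series corrected against its own calibration climatology acquires the observed mean and standard deviation, which is exactly the property exploited in Figures 4e and 4f.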

4.2 Selection of Principal Components and Evaluation of the Model
An optimal number of PCs is selected by performing the experiments listed in Table 1 and evaluating the forecast skill of each set during the testing period. Figure S1 in the supporting information presents the percentage variance explained by the first 50 empirical orthogonal functions (EOFs) obtained with the PCA. Table 1 shows the proportion of total variance explained by the PCs selected in each experiment; varying the cutoff on explained variability results in selections ranging from 200 down to 7 PCs. The skill scores achieved in the validation of the forecasts from each experiment are presented in Table 2 and Figure 6. The results are obtained for a lead time of 24 h. We perform the quantile regression for the 80th, 85th, 90th, 95th, and 99th percentiles and present the score evaluations based on the 95th percentile. We find that, initially, PD increases and FAR decreases as the number of PCs decreases. They reach their optimum values at experiment no. 4, i.e., with 35 selected PCs that collectively represent 87% of the variability of the original data set. The variability explained by the selected 35 predictors is provided in Table S1 in the supporting information. Even within the first 35 dimensions, the last few PCs represent less than 1% of the variability. However, further reduction of the predictor dimension (experiments 5–11, Table 1) reduces the model forecast skill (Table 2 and Figure 5). We also compute the CQVSS for all 13 experiments, and this too shows the highest score for experiment no. 4 (Figure 5a). Consistency exists across the different metrics, and based on the scores we select experiment no. 4 for further forecasting. Section S2 provides an interpretation of the contributions of the predictors to the PCs.
| Experiment | Probability of Detection (PD) | False Alarm Ratio (FAR) | Heidke Skill Score (HSS) | CQVSS (80th) | CQVSS (85th) | CQVSS (90th) | CQVSS (95th) | CQVSS (99th) |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.83 | 0.82 | 0.158 | −0.140 | −0.096 | 0.032 | 0.200 | 0.537 |
| 2 | 0.91 | 0.78 | 0.219 | −0.002 | 0.0753 | 0.187 | 0.414 | 0.770 |
| 3 | 0.92 | 0.79 | 0.212 | 0.0251 | 0.0776 | 0.201 | 0.410 | 0.771 |
| 4 | 0.94 | 0.78 | 0.223 | 0.0167 | 0.0753 | 0.204 | 0.412 | 0.776 |
| 5 | 0.92 | 0.79 | 0.199 | 0.0144 | 0.0769 | 0.195 | 0.421 | 0.773 |
| 6 | 0.94 | 0.79 | 0.214 | 0.0100 | 0.0798 | 0.202 | 0.438 | 0.783 |
| 7 | 0.92 | 0.79 | 0.200 | 0.0189 | 0.0801 | 0.210 | 0.449 | 0.780 |
| 8 | 0.91 | 0.79 | 0.202 | 0.0291 | 0.1001 | 0.220 | 0.456 | 0.777 |
| 9 | 0.90 | 0.79 | 0.200 | 0.0329 | 0.1023 | 0.220 | 0.450 | 0.782 |
| 10 | 0.87 | 0.80 | 0.193 | 0.0375 | 0.1087 | 0.224 | 0.444 | 0.788 |
| 11 | 0.86 | 0.80 | 0.193 | 0.0390 | 0.1130 | 0.228 | 0.452 | 0.793 |
| 12 | 0.90 | 0.81 | 0.174 | −0.080 | −0.007 | 0.133 | 0.339 | 0.672 |
| 13 | 0.84 | 0.82 | 0.155 | −0.089 | −0.055 | 0.045 | 0.254 | 0.591 |

We have also included the GEFS-simulated (forecasted) precipitation as an extra predictor along with the existing 35 PCs. The experiment does not result in any improvement; rather, it leads to a marginal deterioration of the forecast skill. This is attributable to the poor skill of GEFS in forecasting precipitation, specifically extreme events in Mumbai. The results of this experiment are presented in Figure S3 and Table S2. It is worth mentioning that with the data-driven method the PD has increased from around 0.2 to around 0.9; however, the FAR remains almost the same (Figure 5b). We further analyzed the days with false alarms and found that they are largely attributable to the temporal uncertainty of extreme precipitation: on most days with a false alarm, heavy precipitation occurs either 1–2 days earlier or 1–2 days later. Considering the entire 5 day window (the targeted day plus the 2 days before and the 2 days after) decreases the FAR by around 90% (Figure 5c).
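The 5 day tolerance can be implemented by counting a forecasted extreme as a hit whenever an observed extreme falls within ±2 days of it; a sketch with hypothetical binary event arrays (names are ours):

```python
import numpy as np

def windowed_far(obs_event, fcst_event, window=2):
    """False alarm ratio in which a forecasted extreme counts as a hit
    if an observed extreme occurs within +/- `window` days of it."""
    obs_event = np.asarray(obs_event, dtype=bool)
    fcst_event = np.asarray(fcst_event, dtype=bool)
    hits = 0
    for i in np.flatnonzero(fcst_event):
        lo, hi = max(0, i - window), min(len(obs_event), i + window + 1)
        if obs_event[lo:hi].any():   # any observed extreme in the 5 day window
            hits += 1
    n_fcst = int(fcst_event.sum())
    return 1.0 - hits / n_fcst if n_fcst else 0.0
```

Comparing `window=0` (the strict day-for-day FAR) with `window=2` on the same series shows how the windowed definition credits forecasts that are right about the event but off by a day or two.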
With the GEFS outputs, we also perform forecasts at a 3 day lead time. Here we present the scores for all 10 ensemble members forecasted by GEFS. We do not find any visible differences in the scores across the ensemble runs; all of them perform nearly the same (Figure 6a). When we compare the 1 day and 3 day forecasts, the CQVSS is lower for the 3 day forecasts, as expected. However, we do not find any significant differences in PD and FAR (Figure 6b). This suggests that the data-driven model is capable of identifying a possible extreme event 3 days ahead, and that the forecast value improves as the event approaches, as is evident from the CQVSS. As there is a good possibility that an identified extreme day may occur even 2 days earlier (Figure 5), a good strategy would be to make use of the 3 day forecasts: if an extreme event is identified with the 3 day forecast, then preparedness should start from the next day (2 days before the forecasted day).

The CRPSS for ensemble uncertainty is estimated for the proposed model with respect to the original GEFS-simulated rainfall, the bias-corrected GEFS-simulated rainfall, and the rainfall forecasted from the linear regression model. The results are presented in Figure 7. The highest CRPSS (above 0.7) is estimated at the highest (99th) quantile, and the CRPSS reduces for the lower quantiles. A negative CRPSS is also observed at the lower quantiles (80th–85th) with reference to the linear regression model, indicating that no gain in performance is achieved by the proposed model for these quantiles. At the higher quantiles (extreme rainfall), the quantile regression model presents higher skill than the considered reference models.
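For reference, the CRPS of an ensemble forecast and the skill score relative to a reference ensemble can be computed as follows (a standard kernel-form sketch; the averaging choices are our assumptions, not the study's exact procedure):

```python
import numpy as np

def crps_ensemble(members, y):
    """CRPS of one ensemble forecast `members` for one observation `y`,
    using the kernel (energy) form: CRPS = E|X - y| - 0.5 * E|X - X'|."""
    m = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(m - y))
    term2 = 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))
    return float(term1 - term2)

def crpss(model_members, ref_members, y_obs):
    """CRPSS of the model relative to a reference ensemble, averaged over days:
    1 = perfect, 0 = no improvement, negative = worse than the reference."""
    cm = np.mean([crps_ensemble(f, y) for f, y in zip(model_members, y_obs)])
    cr = np.mean([crps_ensemble(f, y) for f, y in zip(ref_members, y_obs)])
    return 1.0 - cm / cr
```

Because the CRPS rewards both accuracy and sharpness, a positive CRPSS at the high quantiles reflects the narrower ensemble spread of the proposed model around the observed extremes.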

4.3 Forecasts for Extreme Events During 2006 and 2007
We use the present model for forecasting of extreme events during the monsoon months of 2006 and 2007 and validate the forecasts against the observed data. Here we evaluate the model for each of the extreme events during the period individually. We find 29 such extreme events in the observed data (exceeding the 95th percentile of observed rainfall). The ensemble average forecasts for those 29 days are plotted (Figure 8) with the ensemble uncertainty bands. The ensemble uncertainty is defined by the differences between the lower and upper bounds from the 10 ensemble members of GEFS. The forecasted (hindcast) precipitations with uncertainties, as obtained from the bias correction model as well as from the quantile forecast model, are presented in Figure 8. The observed precipitation is plotted with the red solid line, the bias-corrected GEFS ensemble forecasts are plotted with the green solid line along with their spread as a green band, and the quantile forecast is plotted with the blue solid line along with its spread as a cyan band. We consider a forecast correct if the line showing the forecast (green or blue) intersects (or lies above) the observed line (red). We find that the bias-corrected GEFS ensemble average forecasts can identify only 6 extreme events out of 29; three more are identified by the ensemble uncertainty band of GEFS. This improves significantly when we apply the proposed data-driven quantile regression: 17 events (marked with red asterisks) are identified by the present method. The interpretation of the results obtained at different quantiles is left to the users; an overestimation of the extreme event may occur if only the forecast at a higher quantile is picked up. Furthermore, the width of the ensemble uncertainty band is considerably smaller as compared to the original bands. From an application point of view, this is a significant improvement, made possible by the use of the present methodology.
It is essential to mention that the ensemble uncertainty from quantile regression need not always be smaller than that from bias correction; however, for the present study, it is smaller in all the cases.

We evaluate the gain in performance of the quantile forecasts of precipitation from GEFS predictors against those obtained by a standard linear regression model. The forecast obtained with the linear regression model shows a considerable reduction in ensemble uncertainty (green band). However, the linear regression methodology fails to capture the extreme events with its ensemble mean (green). Since linear regression approximates a quantile close to the 50th, an intersection between the linear regression line and the quantile regression line would not be expected. However, in the present case, quantile regression is applied to the censored data, while linear regression is applied to the whole sample. This difference in the samples used by the two methods does not guarantee "no intersection," which is reflected in multiple subplots of Figure 9. We further investigate the events missed by the proposed forecast model and check whether the misses are due to temporal variability, with the extreme event occurring not on the forecasted day but within a window of 1–2 days earlier/later. We use the forecasts for 2 days earlier, 1 day earlier, 1 day later, and 2 days later around the extreme days. The results are presented in Figure 10. Figures 8-10 are similar to the reliability diagram [Hamill, 1997] computed for the forecasts of specific extreme events that occurred in 2006 and 2007. Here we specifically focus on the extreme days that are missed by the quantile regression forecasts (24 h ahead). We find that, among the 12 missed events, 5 are forecasted 1–2 days earlier and 4 are forecasted 1–2 days later. This originates from the uncertainty associated with the exact time of occurrence of an extreme event, given a synoptic scale circulation pattern that remains favorable for a number of days. However, the model still fails to forecast three extreme events, and this can be reported as one of its limitations.
A possible solution would be a coupled dynamic-statistical regional model with feedback from the urban canopy, where the regional dynamic model captures the circulation pattern and thereby provides better input to the quantile regression for an improved forecast. This can be considered a potential area of future research.
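The contrast drawn above between linear regression fitted on the whole sample and quantile regression fitted on the censored (wet) sample can be illustrated with a toy example. The synthetic data, the censoring rule, and the simple subgradient fit of the pinball loss below are our own illustrative choices, not the study's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example (hypothetical data): rainfall with positively skewed noise
n = 500
x = rng.uniform(0, 1, n)
y = 10 * x + rng.gamma(2.0, 2.0, n)

# Ordinary least squares on the WHOLE sample (approximates a central quantile)
A = np.column_stack([np.ones(n), x])
beta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

def fit_quantile(Ac, yc, tau, lr=0.05, iters=20000):
    """Linear quantile regression at level tau via subgradient descent
    on the pinball loss (a deliberately minimal solver)."""
    beta = np.zeros(Ac.shape[1])
    for _ in range(iters):
        r = yc - Ac @ beta
        grad = -Ac.T @ np.where(r > 0, tau, tau - 1.0) / len(yc)
        beta -= lr * grad
    return beta

# Censoring: fit the 90th-percentile line only on the wetter half of the data
keep = y > np.quantile(y, 0.5)
beta_q = fit_quantile(A[keep], y[keep], tau=0.9)
```

Because the two fits use different samples, the 90th-quantile line is not guaranteed to stay above the least squares line everywhere, which is the "no intersection" caveat noted above; on this particular synthetic data it does lie above it over the predictor range.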


5 Summary and Conclusions
We develop and demonstrate a data-driven quantile regression-based forecasting model for extreme precipitation at a weather scale, which uses the synoptic scale circulation pattern from the state-of-the-art GEFS and improves significantly upon its original precipitation forecasts. The ensemble forecasts of precipitation from NCEP-GEFS suffer from a very low PD and a high FAR, which makes them inefficient for extreme rainfall prediction. Use of a dynamic regional model would probably improve this; however, such models suffer from high computational cost, and these forecasts must be performed in real time. The present model, being data driven, is computationally very efficient and presents different quantiles of precipitation, given a favorable circulation pattern. The present methodology leads to a significant improvement in the ensemble mean forecasts along with a reduction in their spread for extreme events. For instance, the original GEFS forecasts of precipitation do not capture the majority of the observed extremes, whereas 90% of them are forecasted by the present method (considering the window of 5 days). The underlying principle of the model is not to use the variables with low skill, such as precipitation, but to use the reliable variables and derive the circulation patterns based on them. These patterns work as the input to the quantile regression. It should be noted that a careful selection of predictors is crucial to obtaining good forecasts, and the model skill can be improved by an appropriate selection of forecast variables. A clear and precise measure of goodness is equally important. The CRPSS confirms a reduction of forecast ensemble uncertainty, especially at the higher quantiles, in application of the proposed model. The CQVSS provides a critical check on the performance of the model with respect to the state-of-the-art simulation outputs.
The forecast skill in terms of both CQVSS and CRPSS reduces with the forecast lead time, but interestingly, the PD and FAR remain similar. This is an important conclusion, as the CQVSS essentially measures the quality of quantile values with respect to a reference output, whereas PD and FAR measure the binary output denoting an extreme or nonextreme day. The small decline in PD with increasing lead time gives the population sufficient time for preparedness toward an extreme event. The model is still unable to produce a satisfactory FAR; however, this is attributable to the uncertainty in the timing of extreme rainfall, and if we consider a larger time window (such as the 5 day window considered earlier), the FAR improves. This needs to be considered while using the forecasts for extreme event preparedness. The quantile regression model is specifically useful for higher quantile rainfall events. Given a synoptic scale circulation pattern, a precipitation event has huge space-time variability, which is even larger for extremes. The extreme events are rare, and the associated small sample size may not yield a good standard linear regression model for a forecast system. Hence, we use quantile regression, which provides a complete picture of the chances of rainfall of different quantities. The purpose is not to provide a specific value of rainfall, which is prone to significant error, but to provide different quantiles of rainfall associated with different probability levels, corresponding to the forecasted synoptic scale circulation pattern. The users need to consider the quantile regression results at multiple probability levels. The results from quantile regression also show a reduction in uncertainty across ensemble members. It should be noted that forecasts are not reliable if the uncertainty associated with them is overspread or underspread.
The results obtained for the city of Mumbai emphasize that the combination of dynamic (GEFS) and statistical (quantile regression) methods is an efficient way to progress in the field of objective weather forecasting. Increased effort in this direction can prove highly beneficial for a fast-developing country like India, with its huge number of urban centers.
Acknowledgments
The work presented here is financially supported by the Ministry of Earth Science (project code: MoES/PAMC/H&C/36/2013-PC-II) and the Ministry of Water Resources (project code: 06/23/2013-INCSW/194-213). The precipitation data over Santacruz, Mumbai, have been collected from the India Meteorological Department, Pune. The GEFS forecasts have been collected from http://www.esrl.noaa.gov/psd/forecasts/reforecast2/download.html. The ERA-Interim reanalysis data have been collected from the European Centre for Medium-Range Weather Forecasts (ECMWF).





