Determining the Parameters of the Ångström‐Prescott Model for Estimating Solar Radiation in Different Regions of China: Calibration and Modeling

The Ångström‐Prescott model (referred to as the A‐P model) is one of the most accurate and widely used models for estimating global solar radiation (Rs). Because Rs measurements are often lacking and the model parameters differ between regions, it is crucial to increase the availability of these parameters and to establish the applicability of parameter‐prediction models in different regions. In this study, we evaluated and compared the applicability and performance of the calibrated model and eight predictive models for the A‐P model parameters, using daily Rs and meteorological data from 105 radiation stations in seven natural geographic zones in China. The models were evaluated using the coefficient of determination (R2), root mean square error, Nash‐Sutcliffe efficiency coefficient, percent bias, and global performance index. Results indicated that altitude was the main factor determining the Ångström‐Prescott parameters in most regions. All models performed well, with acceptable accuracy across the whole country; however, their performance varied among regions. The best performing predictive models for the northeast region (Zone 1), north China (Zone 2), central China (Zone 3), south China (Zone 4), Inner Mongolia (Zone 5), the northwest region (Zone 6), and the Qinghai‐Tibet region (Zone 7) were Models 6, 1, 7, 3, 6, 1, and 7, respectively. The present results support the application of these predictive models for the estimation of daily global Rs in the corresponding regions of China where measured Rs data are not available, and possibly in other regions with a similar climate.


Introduction
Solar radiation (Rs) is the main source of energy for the biological, physical, and chemical processes that occur in the Earth's surface systems and is an important driving factor for ecological, hydrological, and other process models (Liu et al., 2014). Accurate estimation of Rs is crucial for the simulation of the ecosystem carbon cycle, crop growth, evapotranspiration, and other ecological processes (Pan et al., 2013; Qin et al., 2011). However, unlike other routine meteorological data (e.g., precipitation and temperature), Rs data are not readily available for many locations worldwide owing to the high cost of acquisition and equipment maintenance and to technical complexities (De Souza et al., 2016). This lack of adequate Rs data hinders research programs and practical applications. Therefore, many studies have sought to estimate Rs by developing and applying appropriate methods. Empirical models are one widely used approach; these are mainly based on sunshine hours (Ångström, 1924; Al-Mostafa et al., 2014; Lockart et al., 2015), cloud cover (Kimball, 2009), temperature (Bristow & Campbell, 1984; Yacef et al., 2014), or a combination of meteorological variables. Physical radiation models, in contrast, account for radiative transfer processes (aerosol absorption and scattering) and have proved effective for predicting Rs around the world (Pyrina et al., 2015; Wang et al., 2016). In addition, satellite remote sensing methods (Sanchez-Lorenzo et al., 2017; Zhang et al., 2015) and artificial intelligence methods, such as artificial neural network models (Fadare, 2009; Kashyap et al., 2015; Qin et al., 2011), have also been developed in recent years.
The Ångström-Prescott (A-P) model, which is based on sunshine hours, has been commonly used all over the world with good simulation performance. The model is simple and performs better than other empirical models, and the reference values for its parameters a and b given in the Food and Agriculture Organization (FAO) Irrigation and Drainage Paper No. 56 (FAO-56: a = 0.25, b = 0.5) can be used where Rs data are not available (Allen et al., 1998). Many studies conducted in various areas have shown that the use of these reference parameters to estimate Rs yields limited accuracy, and therefore, the parameters of the A-P model should be calibrated locally (Liu et al., 2009; Sabziparvar et al., 2013). However, the limited availability of long-term measured Rs data makes it difficult to calibrate these parameters. To solve this problem, researchers have averaged parameters at different locations (Wu et al., 2006) or fitted parameters to a regional database combining the datasets of all locations to obtain regional A-P parameters. However, these methods may fail to estimate radiation in complex terrains (Liu et al., 2017). Several earlier studies (Gopinathan, 1988; Jin et al., 2005) have found that the parameters in the model are site dependent. Models for the estimation of the A-P parameters have subsequently been developed by introducing variables, such as sunshine duration, and geographical factors, such as latitude and altitude. Liu et al. (2012, 2014) evaluated the applicability of these models and developed several parameter models for three regions in China; however, all of the models lacked general validity and were locally tuned. Zhao et al. (2013) developed and validated three kinds of models and established a countrywide general equation for China, based on daily Rs, sunshine hours, and Air Pollution Index data at nine meteorological stations for the period 2001 to 2011. Liu et al. (2017) used observation data from 15 radiation stations to validate different empirical estimation methods over the Tibetan Plateau and improved the Ångström-type model using altitude and water vapor pressure as the leading factors. Fan et al. (2018) developed four new combined sunshine-based models, two of which used temperature and precipitation to estimate daily horizontal global Rs in south China. These studies showed that A-P model parameters are influenced by geographical and meteorological factors, such as cloudiness and altitude, and vary greatly between regions. Much work has been conducted by Liu et al. (2014), who estimated and developed models for predicting A-P parameters covering three agro-climatic zones in China. However, China covers a vast geographic area, and A-P parameters vary greatly among its regions. To improve the accuracy of Rs estimation, it is necessary to validate the parameter models, assess their applicability, and select suitable models for different geographic domains.
The objectives of the current study were (1) to calibrate and predict the A-P parameters using long-term daily radiation data from 105 radiation stations in China and analyze their distribution and variation and (2) to evaluate and compare the applicability and performance of the calibrated model and eight selected predictive models and to choose the best performing predictive models for different regions in China.

Sites and Data
Daily meteorological data and daily radiation data from 105 radiation stations (Figure 1) in China from 1981 to 2016 were obtained from the National Meteorological Information Center. In accordance with the natural division reported by Zhao (1995), China can be divided into seven temperature zones: the northeast region (referred to as Zone 1), north China (Zone 2), central China (Zone 3), south China (Zone 4), Inner Mongolia (Zone 5), northwest region (Zone 6), and Qinghai-Tibet region (Zone 7). Zone 1 has a cold-temperate to temperate continental monsoon climate, in which the annual precipitation is 400-700 mm in most areas and the annual temperature range is large. Zone 2 has a warm, temperate, continental monsoon climate, with considerable temperature variability and concentrated precipitation. Zone 3 has a subtropical, humid, monsoon climate, with abundant thermal resources and precipitation (800-1,800 mm annually) and significant differences in surface elevation and landforms. Zone 4 is located in the southernmost part of China and has a tropical and south subtropical monsoon climate; the precipitation in most areas reaches 1,400-2,000 mm, with high intensity but uneven distribution, and the average annual temperature in most areas is more than 20°C. Zone 5 is dominated by plateau and has a mid-temperate, semiarid climate, with low temperature and little precipitation. The annual precipitation decreases from east to west, and its seasonal distribution is extremely uneven. Zone 6 is the most arid region in China, with abundant thermal resources, great temperature variability, and little precipitation. Zone 7 is located in the southwest of China, where the average altitude is greater than 4,500 m. This region has low temperature, strong solar radiation, and a large number of sunshine hours.

The A-P Equation
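According to the A-P model, the global Rs was calculated using the astronomical and geographical factors (such as latitude) of 105 radiation stations from 1981 to 2003. The model is as follows:

$$R_s = \left(a + b\,\frac{n}{N}\right) R_a \tag{1}$$

where a and b are the A-P parameters, n/N is the long-term annual average daily sunshine fraction, and Ra is the daily extraterrestrial solar irradiance on a horizontal surface (MJ·m−2·d−1), which can be calculated in the FAO-56 form (Allen et al., 1998) by the following formulas:

$$R_a = \frac{24 \times 60}{\pi}\, G_{sc}\, d_r \left(\omega_s \sin\varphi \sin\delta + \cos\varphi \cos\delta \sin\omega_s\right) \tag{2}$$

$$d_r = 1 + 0.033 \cos\left(\frac{2\pi}{365} J\right) \tag{3}$$

$$\delta = 0.409 \sin\left(\frac{2\pi}{365} J - 1.39\right) \tag{4}$$

$$\omega_s = \arccos\left(-\tan\varphi \tan\delta\right) \tag{5}$$

$$N = \frac{24}{\pi}\, \omega_s \tag{6}$$

where Gsc is the solar constant, equal to 0.0820 MJ·m−2·min−1; dr is the inverse relative Earth-Sun distance; δ is the solar declination (rad); ωs is the sunset hour angle (rad); J is the day of the year; N is the maximum possible sunshine duration (hr); and φ is latitude (rad).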
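The calculation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the FAO-56 expressions for the inverse relative Earth-Sun distance, solar declination, and sunset hour angle (Allen et al., 1998); the function names and the default parameter values (the FAO-56 reference values a = 0.25, b = 0.50) are chosen here for illustration and are not taken from the study's own code.

```python
import math

GSC = 0.0820  # solar constant, MJ m-2 min-1 (value given in the text)


def extraterrestrial_radiation(lat_rad, day_of_year):
    """Return (Ra in MJ m-2 d-1, N in hours) for a latitude in radians.

    Uses the FAO-56 daily formulas (an assumption of this sketch).
    """
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    delta = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Clamp the argument so acos stays defined at polar day or night.
    x = max(-1.0, min(1.0, -math.tan(lat_rad) * math.tan(delta)))
    ws = math.acos(x)  # sunset hour angle, rad
    ra = (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(lat_rad) * math.sin(delta)
        + math.cos(lat_rad) * math.cos(delta) * math.sin(ws)
    )
    n_max = 24.0 * ws / math.pi  # maximum possible sunshine duration, hr
    return ra, n_max


def ap_radiation(sunshine_hours, lat_rad, day_of_year, a=0.25, b=0.50):
    """Angstrom-Prescott estimate: Rs = (a + b * n/N) * Ra."""
    ra, n_max = extraterrestrial_radiation(lat_rad, day_of_year)
    return (a + b * sunshine_hours / n_max) * ra
```

As a sanity check, calling `extraterrestrial_radiation(math.radians(-20.0), 246)` should give Ra close to 32.2 MJ m-2 d-1, the value of the corresponding FAO-56 worked example. With locally calibrated or predicted parameters, the same `ap_radiation` call applies after replacing `a` and `b`.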

Selected Equations for Predicting a and (a + b)
To satisfy both simplicity and accuracy, six equations containing one to three variables for a and two equations for (a + b) were selected (summarized in Table 1). All of these equations were acceptable according to Liu et al. (2012, 2014). The combination of these equations for a and (a + b) formed eight models. Models 1-6 referred to the combinations of equations (1) to (5) for a with equation (2) for (a + b), and Models 7 and 8 referred to the combination of equations (3) and (4) for a with equation (1) for (a + b), which showed the best performance in Liu et al. (2012).

Table 1 note. Z is altitude in km, P is annual average precipitation in cm, and Φ is longitude (°E).

10.1029/2019EA000635
Earth and Space Science

Performance Indicators
The accuracy and performance of the studied models were evaluated and compared using four commonly used statistical indicators and a single composite index proposed by Despotovic et al. (2015): the coefficient of determination (R2, equation (7)), root mean square error (RMSE, equation (8)), Nash-Sutcliffe efficiency (NSE, equation (9)), percent bias (PBIAS, equation (10)), and global performance index (GPI, equation (11)). The R2 value indicates how well the observed values are replicated by a model, with higher R2 values indicating better performance.
The RMSE is the square root of the mean of the squared errors between the predicted and true values. In actual measurement, the number of observations (n) is always finite, and the true value can only be replaced by the most reliable (best) value. The RMSE is highly sensitive to very large or very small errors within a set of measurements and therefore reflects the accuracy of the estimates well. The smaller the RMSE, the higher the accuracy.
NSE values lie between −∞ and 1 (including 1). When NSE < 0, the model performance is unacceptable. For a better model performance, the NSE should approach 1.0 as closely as possible.
PBIAS measures the average tendency of the simulated data to be larger or smaller than the observed data. When the PBIAS is positive, the model underestimates Rs, and vice versa. A value approaching 0 indicates better model performance.
In the GPI, $\tilde{y}_j$ is the median of the scaled values of the jth indicator, $y_{ij}$ is the scaled value of the jth indicator for the ith model, and $\alpha_j$ is equal to −1 for R2 and NSE and 1 for the other statistical indicators.
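The indicators described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the study's own code: the formulas are the standard textbook definitions, the PBIAS sign convention is chosen so that positive values mean underestimation (as stated in the text), and the GPI scales each indicator to the range 0-1 by min-max scaling before taking the median, which is assumed here to follow Despotovic et al. (2015). All function names are hypothetical, and PBIAS is assumed to be passed to `gpi` as an absolute value.

```python
import math
from statistics import mean, median


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def rmse(obs, sim):
    """Square root of the mean squared error."""
    return math.sqrt(mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1 is a perfect fit."""
    mo = mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - residual / sum((o - mo) ** 2 for o in obs)


def pbias(obs, sim):
    """Percent bias; positive means the model underestimates on average."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def gpi(table, higher_is_better=("R2", "NSE")):
    """table: {model: {indicator: value}} -> {model: GPI score}.

    Each indicator is min-max scaled across models; alpha is -1 for
    indicators where larger is better and +1 otherwise.
    """
    models = list(table)
    names = list(table[models[0]])
    scaled = {m: {} for m in models}
    for name in names:
        vals = [table[m][name] for m in models]
        lo, hi = min(vals), max(vals)
        for m in models:
            scaled[m][name] = 0.0 if hi == lo else (table[m][name] - lo) / (hi - lo)
    scores = {}
    for m in models:
        total = 0.0
        for name in names:
            med = median([scaled[k][name] for k in models])
            alpha = -1.0 if name in higher_is_better else 1.0
            total += alpha * (med - scaled[m][name])
        scores[m] = total
    return scores
```

With this sign convention a higher GPI indicates a better model, so the regionally best model can be picked with `max(scores, key=scores.get)`.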

The A-P Parameters of the Calibrated and Predictive Models

The A-P Parameters of the Calibrated Model
The statistics of the calibrated A-P parameters in different zones (Figure 2 and Table 2) showed that parameter a had low values in southern and central China and high values in the northern regions. The average value of a was highest in Zone 6 and lowest in Zone 3. The spatial distribution of parameter b was opposite to that of a, with the largest mean value in Zone 7 and the smallest in Zone 4. The mean value of (a + b) was largest in Zone 7 and smallest in Zone 4, showing a similar trend to that of parameter a in the spatial distribution pattern.
Compared with the values of the parameters (a = 0.25 and b = 0.50) given by FAO-56, the average calibrated parameter a in each zone was lower, while the average calibrated parameter b was higher. The national average (a = 0.19, b = 0.55) was 25% lower for a and 10% higher for b than the FAO-56 recommended values. These deviations indicate that local calibration of the parameters is important, although it is often omitted owing to the limited availability of measured Rs data. Partial correlation analysis was conducted between the calibrated A-P parameters and other factors, such as latitude, precipitation, and altitude, at the 105 sites (Figure 3). Results revealed that parameter a was significantly (r > r0.05) affected by n/N, followed by P, Z, Φ, and T. Parameter b was significantly correlated (r > r0.05) with n/N and T. The sum (a + b) was related to several variables, such as P, Z, and T; however, none of these correlations was significant.
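For readers who want to reproduce a site calibration, one common approach is an ordinary least-squares fit of the daily clearness index Rs/Ra against the sunshine fraction n/N. The sketch below assumes that approach; the text does not state which fitting procedure was used, and the function name `calibrate_ap` is hypothetical.

```python
def calibrate_ap(sunshine_fraction, clearness_index):
    """Least-squares fit of Rs/Ra = a + b * (n/N); returns (a, b).

    sunshine_fraction: daily n/N values
    clearness_index:   daily Rs/Ra values for the same days
    """
    n = len(sunshine_fraction)
    mean_x = sum(sunshine_fraction) / n
    mean_y = sum(clearness_index) / n
    sxx = sum((x - mean_x) ** 2 for x in sunshine_fraction)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(sunshine_fraction, clearness_index)
    )
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b
```

The intercept a is then the fraction of Ra reaching the surface on fully overcast days, and a + b is the fraction on clear days, which matches the physical interpretation of the parameters given later in the discussion.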


The A-P Parameters of the Predictive Model
Parameter a was predicted by six equations, referred to as a1 to a6, and parameter (a + b) was predicted by two equations, referred to as (a + b)1 and (a + b)2. Following the two-step method, in which b is obtained by subtracting the predicted a from the predicted (a + b), eight series of parameter b data were obtained, described as b1 to b8. The statistics of the predicted parameters for the whole country and the seven zones (supporting information Table S1) showed that, compared with the calibrated a, the predicted a values of the eight predictive models all had a smaller variation across China. The spatial distribution of parameter a in the eight predictive models was similar to that of the calibrated a, which was largest in Zone 6, followed by Zones 1, 5, 7, 2, 3, and 4. There was almost no variation between the zones for the mean values of a3, which were all nearly equal to 0.18. Similarly, the predicted parameter (a + b) displayed less variation than the calibrated values for the whole country, and the variation was largest in Zones 2 and 7.
The mean values of predicted (a + b) in the zones were ranked from large to small as Zones 7, 1, 2, 6, 5, 1, 3, and 4. With the exception of b3, the mean values of parameter b from the predictive models for the whole country were the same and slightly lower than the calibrated value. The distribution of the predicted parameter b, with the exception of b3 and b8, had the largest mean value in Zone 3, followed by Zones 2, 4, 7, 5, 1, and 6, which differed from the distribution of the calibrated value. A large variation of the predicted value of b was evident in Zones 2 and 7, while b8 in each area varied little when calculated using parameter a3.
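The two-step derivation of b referred to above can be illustrated as follows. This sketch assumes that b is simply the predicted (a + b) minus the predicted a; the pairing of equations in `layout` is supplied by the caller, and the numbers in the usage example are invented for illustration and are not taken from Table 1 or Table S1.

```python
def derive_b(a_values, ab_values, layout):
    """Combine predicted a and (a + b) into (a, b) pairs per model.

    a_values:  {a_key: predicted a}
    ab_values: {ab_key: predicted (a + b)}
    layout:    {model_name: (a_key, ab_key)}
    """
    params = {}
    for model, (a_key, ab_key) in layout.items():
        a = a_values[a_key]
        b = ab_values[ab_key] - a  # second step: b = (a + b) - a
        params[model] = (a, b)
    return params


# Illustrative values only (hypothetical, not the paper's results).
a_values = {"a1": 0.20, "a3": 0.18}
ab_values = {"ab1": 0.76, "ab2": 0.74}
layout = {"Model 1": ("a1", "ab2"), "Model 7": ("a3", "ab1")}
params = derive_b(a_values, ab_values, layout)
```

Because b inherits the errors of both predictions, a model with a nearly constant a (such as a3 in the text) yields a b series whose variation mirrors that of the (a + b) equation it is paired with.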

The Regional Performance of Individual Models

The Performance of the Calibrated Model
Across the whole country, the mean values of R2, RMSE, NSE, and PBIAS for the calibrated model were 0.89, 2.44 MJ/m2, 0.88, and 2.20, respectively. The performance of the calibrated model varied between zones (Table S2). When tested using the R2 value, the calibrated model was found to perform best in Zone 1, followed by Zones 2, 6, 4, 3, and 5. The RMSE statistic indicated that the calibrated model had the smallest error in Zone 1, followed by Zones 6, 5, 2, 7, 4, and 3, meaning that this model yielded higher accuracy in the drier northern areas. The mean value of NSE was 0.88 for the whole country, and the calibrated model performed best in Zone 6, followed by Zones 1, 5, 2, 3, 4, and 7. PBIAS showed that the calibrated model underestimated the Rs value at most stations, with an average value of 2.20 for the country; however, there was an overestimation in Zone 6.

The Performance of the Predictive Models
The results for the four validating indexes showed that all predictive models performed well in estimation of Rs across the country (Table S2). The mean R2 and NSE values for the predictive models were above 0.89 and 0.88, respectively, indicating that Rs could be estimated with acceptable accuracy. The mean RMSE values for the eight predictive models were lower than 2.44, which also indicated acceptable accuracy. However, PBIAS varied between −14 and 17, and the smallest and largest variations were 4.98 and 5.44, indicating a respective overestimation and underestimation at the different stations.

Figure 3. Partial correlations between the calibrated Ångström-Prescott (A-P) parameters and various factors (φ is latitude, Φ is longitude, Z is altitude, P is average precipitation, n/N is the long-term annual average daily sunshine fraction, and T is the average annual temperature).
The performance of the eight predictive models showed extremely small variation within a single region; however, there were differences between the zones. The mean R2 value of the predictive models was largest in Zone 6 (0.92), followed by the values for Zones 1, 5, 2, 7, 3, and 4. The RMSE statistic showed that all models were most accurate in Zone 1, with an average value of 2.26 MJ/m2, followed by Zones 6, 7, 2, 5, 4, and 3. The NSE statistical index indicated that all models performed best in Zone 6, with an average value of 0.92, followed by Zones 1, 5, 2, 3, 7, and 4. This indicated that the A-P model performed best in arid zones with temperate climates and poorest in subtropical zones with humid climates. The performance of the A-P model varied considerably in Zones 4 and 7, which had the most diverse climate types among the various zones. As with the calibrated model, the PBIAS at most sites indicated that the predictive models underestimated the Rs.

Determining the Regionally Best Performing Model
The fact that all predictive models validated by the four statistical indicators performed well and that there was no significant difference between the models in each zone revealed that these four indicators could not be used alone to determine the best model in each area. However, model performance varied across the different areas owing to differences in climate and topography. It was necessary to determine whether there was a significant deviation among these models and to select the appropriate model for Rs estimation in each area. Thus, the GPI was applied to compare the performances of the models in each zone.
The GPI values and ranking of the eight predictive models for each area (Table 3) showed that in Zone 1, Model 6 performed best and Model 7 performed poorest. Model 6 considered three factors for predicting parameter a, while Model 7 only considered altitude. This indicated that in northeast China, the A-P parameter was influenced by multiple factors and that considering only altitude to estimate Rs would result in large errors. In Zone 2, Model 1 performed best and Model 7 performed poorest, with the PBIAS value of Model 7 indicating an underestimation of Rs. This demonstrated that precipitation was the most important factor for the prediction of the A-P parameter a in north China, where Model 1 estimated Rs with acceptable accuracy. In Zone 3, Model 7 was the best performing model, followed by Model 3, while Model 2 was the worst. The equation of parameter a was based on altitude in Models 7 and 3, and on sunshine duration in Model 2, indicating that altitude rather than sunshine duration was the dominant factor affecting Rs in the region. In Zone 4, Model 3 performed best, followed by Model 7, while Model 8 performed worst. The equation for parameter a in Models 3 and 7 was the same, being based on altitude, while in Model 8, it was based on altitude and precipitation, indicating that the influence of altitude on parameter a played a leading role in the region. The prediction of parameter (a + b), which was affected by many factors, was vital to the estimation of Rs because the regional climatic conditions were changeable and complex. In Zone 5, Model 6 (based on multiple factors) yielded the best performance, while Models 3 and 8 gave the worst, indicating that the A-P parameter a was affected by multiple factors in this region. Large errors were produced if only altitude was considered in the predictive model. In Zone 6, Model 1 performed best, while Models 7 and 3 performed worst, showing that precipitation had the most important influence on Rs in this arid desert region. In Zone 7, Model 7, based on altitude, gave the best performance, followed by Models 8, 6, 5, 4, 2, 3, and 1. This demonstrated that the prediction of (a + b) was more important than that of a in the Qinghai-Tibet region and that altitude was an important factor affecting (a + b) when estimating Rs.
To evaluate the prediction accuracy of Rs, estimated from the regional best performing predictive models and the calibrated model, specific values of the models' statistics in each zone were compared (Figure 4). As shown in Figure 4, the R2 values of both the calibrated model and the predictive models in each region were very close to the 1:1 line, which means that the Rs determined from the predictive models and calibrated model were in good agreement. This was also indicated by comparison of NSE values. These two statistics also indicated that the predictive models in arid areas, such as Zones 6, 5, and 1, were more accurate for estimating Rs than the models in humid areas. Unlike the R2 and NSE statistics, the RMSE, and especially the PBIAS, indicated differences between the performances of the calibrated and best predictive models.
Although the PBIAS values of the calibrated and predictive models showed underestimations at most stations, there were large variations in the PBIAS values between the calibrated model and the best regional predictive models in each region. In particular, the deviation between the PBIAS values of the calibrated model and the best performing predictive model in Zones 1, 4, and 7 was clearly larger than in other regions. This was because the predictive model in some stations generated an overestimation bias, while the calibrated model underestimated the Rs, and vice versa, resulting in a large difference between the two models. It is not known why the results from two individual stations, Changning station in Zone 3 and Yushu station in Zone 7, had R2 and NSE values lower than 0.75 for both the calibrated and the predictive models, and why the RMSE values were larger than 4 MJ/m2, while the PBIAS values of the two stations were smaller than 1.

Main Factors Affecting the A-P Parameters
The variation in parameter values was related to geographical location, weather systems, and atmospheric conditions (Adaramola, 2012; Martínez-Lozano et al., 1984). According to the literature, parameter a is the fraction of astronomical radiation that reaches the surface of the earth on a cloudy day and is affected by atmospheric conditions such as humidity, dust content, cloud type and thickness, and the concentration of pollutants (Almorox & Hontoria, 2004); it varies with the altitude of stations (Rensheng et al., 2006) and depends mainly on the type and thickness of the cloud, increasing with increasing cloudiness (De Souza et al., 2016). Parameter b reflects the transport properties (aerosol density) of a cloudless atmosphere; it is affected by altitude and depends mainly on the total water content and turbidity of the atmosphere (Liu et al., 2009). The sum (a + b) represents the clearness index under a clear sky and increases slightly with altitude (Paulescu et al., 2016).
In this study, a partial correlation analysis between the calibrated parameters and variables indicated that the calibrated parameters were mainly influenced by sunshine duration, temperature, altitude, and precipitation. The spatial distribution of the calibrated parameters, described in section 3.1, shows how the parameters generally varied with altitude. Liu et al. (2014) reported that only Z and P in the parameter model could be reliably used to predict the parameters. Our study confirmed that the predictive models that take account of variations in altitude performed best in most regions. For example, Model 6 in Zones 1 and 5, Model 7 in Zones 3 and 7, and Model 3 in Zone 4 performed best, with all these models taking altitude into account, which implies that altitude was the main factor determining the A-P parameters in most regions. Our results concurred with those of Paulescu et al. (2016), which demonstrated that altitude is a mandatory input variable to the A-P model. De Souza et al. (2016) also revealed that a is dependent on altitude, while b is dependent on latitude and altitude. They showed that the dependency of parameters a and b on latitude and longitude was crucial in their fitting; in contrast, in our study, predictive Model 6, which depended on latitude and longitude, did not perform better than altitude-based predictive models. Additionally, the partial correlation between the parameters and latitude or longitude was not significant. Thus, latitude and longitude were no more important than altitude for predicting the parameters. Liu et al. (2014) concluded that the accuracy of the predictive model increased with the number of variables (≤3) considered in the model (e.g., Model 6 was the best performing model in their study). In our study, Model 6 performed best only in Zones 1 and 5, while in other regions, the predictive models that considered one or two variables for parameter a performed better. This indicates that the inclusion of more variables in the A-P parameter model is no guarantee of wider applicability, which provides strong evidence of the need to simplify the predictive models for the parameters.

Determination of the Best Regional Parameter Model
The differences between the performance of the calibrated model and the predictive models in different regions demonstrated the importance of studying the local applicability of different parameterization methods. There is unlikely to be a globally applicable radiation model, and local climate factors dictate the performance of these models; therefore, they must be locally tuned. China is a vast country, with large differences in local geographic and climatic conditions, and there is a lack of Rs measurements in many areas, where the factors influencing the parameters vary. Therefore, it is necessary to select suitable models for predicting the parameters in different regions. In this study, a comparative analysis of the predictive models in the regions discussed in section 3.3 showed that there were differences in the regional performance of the eight predictive models, reflecting the regional applicability of the models. Moreover, the calibrated model and the eight predictive models had the best performance in the arid regions of northern China, with R2 and NSE values higher than 0.91 and 0.90, respectively, while the biases in the southern humid regions were relatively large. The RMSE values in south China and central China (Zones 4 and 3) were 2.7 and 2.5 MJ/m2, respectively, which was consistent with the results of previous studies. The reason for these differences was that southeast China has a humid subtropical and tropical climate, with abundant precipitation and high cloud cover, which affects the accuracy of determining the A-P parameters. Liu et al. (2012) suggested that it is important to improve the accuracy of the model in humid areas. In this study, Models 7 and 3 had the highest GPI values in Zones 3 and 4, respectively, corresponding to RMSE values of 2.45 and 2.67 MJ/m2, that is, a relatively small error. Zone 7 (the Qinghai-Tibet Plateau), with abundant radiation resources, deserves further study. Owing to its high elevation, few radiation stations have been established, and therefore, few studies have applied the A-P model to estimate the Rs over the region (Pan et al., 2013). Our results showed that Model 7, which was based only on altitude, performed best in the Qinghai-Tibet Plateau, with an NSE of 0.862, indicating that altitude is a major factor in predicting the parameters. Liu et al. (2017) used altitude and water vapor pressure as the main factors to predict A-P model parameters, with the results indicating an NSE of 0.856, which was lower than that obtained using Model 7 in our study. Therefore, owing to its better performance and accessibility, we recommend using the simple Model 7 to estimate Rs on the Tibetan Plateau.

Conclusions
Based on daily Rs and meteorological data from 105 radiation stations over seven natural geographic zones in China during 1981-2016, the performance of the calibrated A-P model and eight predictive models for the A-P parameters was evaluated, and the best performing predictive model for each region was identified. Altitude was the main factor determining the A-P parameters in most regions. The performance of all models was acceptable but varied regionally. The best performing predictive models for Zones 1 to 7 were Models 6, 1, 7, 3, 6, 1, and 7, respectively. These predictive models are recommended for the estimation of daily global Rs in the corresponding regions where calibration cannot be conducted. The findings of this study have practical implications for increasing parameter availability and provide a reference for Rs estimation with the A-P model.