Testing reanalysis data sets in Antarctica: Trends, persistence properties, and trend significance
Abstract
Reanalysis data sets provide important sources for investigating the climate in Antarctica, where stations are sparse. In this paper, we compare the 2 m near-surface temperature data from five major reanalysis data sets with observational data from Antarctic stations over the last 36 years: (i) the National Centers for Environmental Prediction and the National Center for Atmospheric Research Reanalysis (NCEP1), (ii) NCEP-DOE Reanalysis 2 (NCEP2), (iii) the European Centre for Medium-Range Weather Forecasts Interim Reanalysis (ERA-Interim), (iv) the Japanese 55 year Reanalysis (JRA-55), and (v) the National Aeronautics and Space Administration Modern-Era Retrospective-analysis for Research and Applications (MERRA). In our assessment, we compare (a) the annual and seasonal trends obtained by linear regression analysis, (b) the standard deviation around the annual trends, (c) the detrended lag-1-autocorrelation C(1), (d) the Hurst exponent α that characterizes the long-term memory in a record, and (e) the significance levels of the warming/cooling trends. We find that all five reanalysis data sets are able to reproduce quite well the long-term memory in the instrumental data. In contrast, C(1), which is needed as input for the conventional significance analysis, shows fully erratic behavior. The observational warming/cooling trends in East and West Antarctica are not reproduced well by all reanalysis data sets; in particular, NCEP1, NCEP2, and JRA-55 show spurious warming trends in many parts of East Antarctica, even in those parts where cooling has been observed. In contrast, the standard deviation around the trends is quite well reproduced by all reanalysis data sets. In the peninsula where the station density is quite high, the performance of the reanalysis data is considerably better.
It is remarkable that all reanalysis data sets as well as the observational data show (under the assumption of a long-term persistent process) that in the considered time period since 1979 the warming in the peninsula is not significant with p values well above 0.1.
Key Points
- The observational warming/cooling trends in East and West Antarctica are not reproduced well by all reanalysis data sets
- In the peninsula where the station density is quite high, the performance of the reanalysis data is considerably better
- All reanalysis data sets and the observational data show that in the considered time period the warming in the Peninsula is not significant
1 Introduction
In recent years, the warming patterns of Antarctica have received much attention [e.g., Turner et al., 2005; Steig et al., 2009; Ding et al., 2011; Bromwich et al., 2013]. The large Antarctic Ice Sheet is one of the crucial tipping elements in the global climate system [Lenton et al., 2008], and a change of the dynamics and mass balance of the Antarctic Ice Sheet may have widespread implications, contributing to and finally dominating the global and regional sea level rise [Turner and Marshall, 2011]. Thus, addressing the Antarctic temperature change in a long-term context and determining its significance is an important issue [Schneider et al., 2006; Steig et al., 2013; Bromwich et al., 2013; Bromwich and Nicolas, 2014; Bunde et al., 2014; Ludescher et al., 2015; Tamazian et al., 2015]. However, this is quite a challenging problem due to the lack of long observational data sets.
Accordingly, reanalysis data sets that are able to fill the gaps in data-sparse regions and help give the full physical picture of the climate change in Antarctica are highly needed and appreciated [Ding et al., 2011, 2012; Bromwich et al., 2013]. Reanalysis data also play a central role in reconstructing Antarctic near-surface temperature [Monaghan et al., 2008; Steig et al., 2009; Nicolas and Bromwich, 2014]. The most commonly used physical quantities are mean sea level pressure, geopotential height at 500 hPa (Z500), near-surface air temperature (at 2 m, T2m), as well as geopotential height at higher pressure levels such as 30 hPa (Z30) [Thompson and Solomon, 2002; Ding et al., 2011, 2012; Bromwich et al., 2013].
Until now, several reanalysis data sets have been released. The most commonly used data sets include the National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (NCAR) reanalysis (hereafter, NCEP1), NCEP-DOE Reanalysis 2 (hereafter, NCEP2), the European Centre for Medium-Range Weather Forecasts (ECMWF) Interim Re-Analysis (hereafter, ERA-Interim), the Japanese 55 year Reanalysis (hereafter, JRA-55), and the National Aeronautics and Space Administration (NASA) Modern-Era Retrospective-analysis for Research and Applications Reanalysis (hereafter, MERRA). Besides Antarctic research, these data sets are also widely used to study atmospheric dynamics and provide invaluable data sources for interdisciplinary research [e.g., Rozanov et al., 2005; Rodger et al., 2008; Seppälä et al., 2013; Lam and Tinsley, 2015; Regi et al., 2016; Gozolchiani et al., 2011; Yamasaki et al., 2008; Donges et al., 2009; Berezin et al., 2012; Wang et al., 2013; Ludescher et al., 2013, 2014]. Accordingly, the reliability of these reanalysis data sets is of great interest.
In this paper, we are interested in the quality of the near-surface (2 m) temperature data provided by the five reanalysis data sets over Antarctica. Previous evaluations of the performance of reanalysis data [Hines et al., 2000; Marshall and Harangozo, 2000; Marshall, 2003; Bromwich and Fogt, 2004; Bromwich et al., 2007, 2011; Cullather and Bosilovich, 2011; Bracegirdle and Marshall, 2012; Jones and Lister, 2015; Nicolas and Bromwich, 2014] concentrated mainly on the biases of the data and the temperature patterns. Here we assess the reanalysis skills by focusing for the first time on (a) the annual and seasonal trends obtained by linear regression, (b) the standard deviation around the annual trends, (c) the detrended lag-1-autocorrelation C(1), (d) the Hurst exponent α that characterizes the long-term memory in a record, and (e) the significance levels of the annual warming/cooling trends. We concentrate on the time span between January 1979 and December 2014, when the modern satellite data were assimilated.
It is obvious that the magnitude of a trend and the fluctuations around the trend line are central quantities of a temperature record, in particular, in the context of global change. For describing the persistence of a record, C(1) and α are central quantities. In most previous attempts to obtain quantitatively the significance of a warming or cooling trend it has been assumed that the data are short-term persistent such that the detrended lag-s-autocorrelation function C(s) decays exponentially, $C(s) = C(1)^{s}$. As far as we know, there are no theoretical or observational arguments why the autocorrelation of atmospheric (or sea surface) temperature should decay exponentially. The exponential decay has been assumed as the simplest way of describing the persistence of the record. Furthermore, there are no theories that describe the value of C(1) as a function of the location of the considered station. Since the persistence is fully determined by C(1) in this case, the significance of a trend is also determined solely by C(1) [Santer et al., 2000], making C(1) an essential quantity.
In the past few decades it has been realized based on data analysis that temperature records are not short-term persistent but long-term persistent all over the globe [e.g., Hurst, 1951; Mandelbrot and Wallis, 1968; Bloomfield and Nychka, 1992; Koscielny-Bunde et al., 1998; Malamud and Turcotte, 1999; Fraedrich and Blender, 2003; Eichner et al., 2003; Monetti et al., 2003; Vyushin et al., 2004; Cohn and Lins, 2005; Király et al., 2006; Rybski et al., 2008; Lennartz and Bunde, 2009; Franzke, 2010, 2012; Lovejoy and Schertzer, 2013; Ludescher et al., 2015; Yuan et al., 2015]. In long-term persistent records, C(s) has been found not to decay exponentially, but by a power law, $C(s) \sim s^{-\gamma}$ with 0 < γ < 1. The exponent γ can be obtained best by using the detrended fluctuation analysis (DFA) [Peng et al., 1994; Kantelhardt et al., 2001] (see section 5), where the Hurst exponent α = 1−γ/2 is measured. Typically, for coastline stations α is around 0.65 (±0.10), while for island stations and sea surface temperature α is significantly larger, being around 0.75 (±0.15) for island stations and around 0.8 (±0.10) for sea surface temperatures [Koscielny-Bunde et al., 1998; Malamud and Turcotte, 1999; Fraedrich and Blender, 2003; Eichner et al., 2003; Monetti et al., 2003; Király et al., 2006; Franzke, 2010, 2012; Ludescher et al., 2015; Yuan et al., 2015]. The error bars refer to the 95% confidence interval. Recently, it has been demonstrated explicitly by using DFA2 that the Antarctic temperature records cannot be considered as short-term persistent [Bunde et al., 2014; Ludescher et al., 2015; Yuan et al., 2015]. Instead, as for the rest of the globe, the Antarctic temperature data are also long-term persistent, with α values similar to those found in other locations around the globe.
There is no comprehensive theory that describes the origin of the long-term persistence in temperature data. However, studies of general circulation models show that apart from the inertia of the oceans, the natural forcings play an important role, in particular, volcanic forcing [Vyushin et al., 2004]. Since the atmospheric and the sea surface temperatures are long-term persistent, the Hurst exponent α is the central quantity that quantifies the temperature “landscape” of a record, and it is important to know if the reanalysis data are able to reproduce proper Hurst exponents or not. From the Hurst exponent of a long-term persistent record one can derive quantitatively the statistical significance of a warming or cooling trend [Lennartz and Bunde, 2009, 2011; Tamazian et al., 2015, 2009]. Since in the context of global change the statistical significance of a trend is of great interest, we also compare their values for the different reanalysis data sets in Antarctica.
In this article, we are not the first to study the temperature patterns in Antarctica and the long-term persistence in the near-surface temperature data. We do not aim here to establish a theory for the various temperature patterns as well as for the long-term persistence with the slightly varying Hurst exponents. Instead, we wish to quantify the cooling/warming trends and the Hurst exponents in the observational and the reanalysis data and compare them to each other. This comprehensive comparison is important since the observational data are sparse in Antarctica, and we need to know to which extent we can rely on the reanalysis data sets, in order to investigate the Antarctic temperature change as well as the teleconnections with other regions of the globe. Our study shows that all reanalysis projects are unable to describe satisfactorily the annual and seasonal trends in West and East Antarctica where the station data are sparse. Regarding the dynamical behavior, we find that C(1) fluctuates very strongly in all reanalysis data sets, such that different reanalysis sets would provide very different persistence properties for the same station, ranging from intermediate antipersistence to strong persistence, if the conventional picture of a dominating short-term persistent process were correct. We show that this is another indication that the Antarctic temperatures are not mainly short-term persistent.
Indeed, one of the remarkable results of this study is that all reanalysis data are able to reproduce the long-term persistence of the observational temperature records, with comparable Hurst exponents. The second remarkable result is that all reanalysis data sets except JRA-55 describe the situation in the peninsula (which is considered to be one of the fast warming regions on Earth) reasonably well. The results show that from 1979 on the warming is not significant, with p values well above 0.1. Accordingly, after 1979 the temperature trends in the peninsula are well within the bounds of natural variability, in agreement with the results of Ludescher et al. [2015] and Turner et al. [2016].
This paper is organized as follows. In section 2, we provide the source of the Antarctic stations data and describe the five reanalysis data sets. In section 3, we present the results of our assessment. Section 4 concludes the article with a short summary. Section 5 describes the methods, including long- and short-term persistence, detrended fluctuation analysis (DFA2), and statistical significance analysis in detail.
2 Data
2.1 Observation Data Set
In this study, we use monthly mean observation T2m data (from January 1979 to December 2014) from 12 Antarctic stations (four in West Antarctica including the peninsula and eight in East Antarctica) provided by the British Antarctic Survey Reference Antarctic Data for Environmental Research project (READER) [Turner et al., 2004] (see http://www.antarctica.ac.uk/met/READER/). The stations in the peninsula are Bellingshausen, Rothera, and Faraday/Vernadsky, the West Antarctic station is McMurdo, and the East Antarctic stations are Halley, Syowa, Mawson, Davis, Casey, Dumont-Durville, Mirny, and Amundsen-Scott (the South Pole). We have chosen these stations because they provide us with the longest reliable temperature records and also have been the subject of a recent study on the persistence of Antarctic temperatures and the significance of their warming/cooling trends [Ludescher et al., 2015]. The READER data set provides continuous observational temperature records for the Antarctic region. To fill the short temporal gaps in the data, we apply linear interpolation. In addition, because of the lack of long records of station data in the interior of West Antarctica, we also analyzed the reconstructed data set at Byrd station [Bromwich et al., 2013]. Figure 1 shows the locations of all stations studied here. At each location, we show the temperature change and the Hurst exponent that characterizes the long-term persistence in the considered time window.
2.2 Reanalysis Data
We consider five widely analyzed reanalysis data sets: NCEP1, NCEP2, ERA-Interim, JRA-55, and MERRA. NCEP1 uses a frozen state-of-the-art global assimilation system [Kalnay et al., 1996]. The data assimilation and the model used are identical to the global system implemented operationally at NCEP on 11 January 1995 but with a reduced horizontal resolution of T62 (∼209 km). The spectral model used in the assimilation system contains 28 vertical levels from 1000 hPa to 3 hPa. Its temporal coverage is four times per day from 1 January 1948 to the present day.
NCEP2 is an improved version of the NCEP1 model with the same horizontal and vertical resolutions, but fixed errors and updated parameterizations of the physical processes, such as the new boundary layer, new short wave radiation in the model, and improved sea ice sea surface temperature fields [Kanamitsu et al., 2002]. The data records range from 1 January 1979 to the present day. Both NCEP1 and NCEP2 reanalyses use a three-dimensional variational analysis scheme in the data analysis module. The major input observational data for NCEP1/NCEP2 are the NCEP global upper air Global Telecommunication System (GTS) data from 1962 provided by the National Center for Atmospheric Research (NCAR). Another data source is the NCEP global surface GTS data from 1967. Both data include all available Antarctic stations. These input data are combined with other data sets, such as satellite data, surface marine data, and surface wind data. Therefore, the output T2m is strongly affected by both the observational data and the model parameterizations [Kalnay et al., 1996; Hines et al., 2000].
ERA-Interim, on the other hand, uses a four-dimensional variational assimilation scheme with a 12 h analysis window, and the data assimilation system is based on a 2006 version of the ECMWF Integrated Forecast Model. The spatial resolution of the model is T255 (∼80 km) on 60 vertical levels from the surface up to 0.1 hPa [Dee et al., 2011]. The data are continuously updated in real time from 1 January 1979. Moreover, ERA-Interim and NCEP1/NCEP2 use different assimilation schemes of satellite data. While ERA-Interim assimilates raw satellite radiances, both NCEP1 and NCEP2 use satellite retrievals. Retrievals estimate the vertical temperature and humidity profiles through a series of empirical and statistical relationships, while raw radiances are direct measurements of atmospheric radiation acquired by the satellite sensors [Bromwich et al., 2007]. In addition, ERA-Interim reanalysis data use the GTS surface data from both ECMWF and external institutions, such as NCEP/NCAR and Japan Meteorological Agency (JMA).
JRA-55 is the second reanalysis project conducted by the JMA. It uses a constant state-of-the-art data assimilation system that is more sophisticated than the one used for their first reanalysis data set. JRA-55 reanalysis mainly uses observational data provided by ECMWF, which includes both ECMWF and NCEP/NCAR surface and upper air observations from 1958. Therefore, it includes similar Antarctic stations as the ERA-Interim reanalysis. In addition, newly available observational data sets are collected whenever possible, including those used in past operational systems and delayed observations as well as digitized observations. High-quality reprocessed satellite data are also assimilated where available. The reanalysis data cover the period from 1958, when regular radiosonde observation began on a global basis [Kobayashi et al., 2015].
MERRA is a NASA atmospheric data reanalysis for the satellite era using a major new version of the Goddard Earth Observing System Data Assimilation System Version 5 (GEOS-5). MERRA focuses on historical analyses of the hydrological cycle on a broad range of weather and climate time scales. MERRA also has very high spatial resolution (0.5° × 0.67°) that might show improved skills [Rienecker et al., 2011]. The MERRA reanalysis data set mainly uses NCEP land surface observations from 1970. Therefore, all available Antarctic stations are also included as a part of the input data. MERRA also used radiosonde data that were quality controlled by NCEP, with additional corrections. The conventional observational data used in MERRA include British Antarctic Survey radiosonde observational data.
Here we study the monthly mean T2m records in these five reanalysis data sets. To compare with the observational station records, we choose in the reanalysis records the nearest land point to a given selected station, since most stations analyzed are located near the Antarctic coast. However, if the nearest land point is more than 150 km away (only in the Antarctic Peninsula), we choose the nearest reanalysis grid point instead. Note that many previous studies used a fixed lapse rate (the dry adiabatic value of 9.8°C km⁻¹, or 6°C km⁻¹) to adjust the Antarctic near-surface air temperature for the height difference between station and grid point [Jones and Lister, 2015; Bracegirdle and Marshall, 2012]. We do not need to use height adjustment in this study since all quantities we are interested in are independent of this correction.
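The grid point selection rule described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the flat arrays of grid coordinates, the boolean land mask `is_land`, and the use of the haversine great-circle distance on a spherical Earth are all assumptions made for the example.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumes a spherical Earth)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))


def select_grid_point(station_lat, station_lon, grid_lat, grid_lon, is_land,
                      max_land_km=150.0):
    """Return the index of the nearest land point, or of the nearest grid
    point of any type if the nearest land point is beyond max_land_km."""
    grid_lat = np.asarray(grid_lat, dtype=float)
    grid_lon = np.asarray(grid_lon, dtype=float)
    is_land = np.asarray(is_land, dtype=bool)
    dist = great_circle_km(station_lat, station_lon, grid_lat, grid_lon)
    land_idx = np.flatnonzero(is_land)
    if land_idx.size > 0:
        nearest_land = land_idx[np.argmin(dist[land_idx])]
        if dist[nearest_land] <= max_land_km:
            return int(nearest_land)
    # Fallback: nearest grid point regardless of surface type
    return int(np.argmin(dist))
```

With a hypothetical station at 67.5°S, 68°W, a land point 0.5° of latitude away is selected even if an ocean point is closer, whereas a land point 2.5° of latitude away (about 278 km) is rejected in favor of the nearest grid point.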
3 Results
3.1 Annual Trends and Standard Deviation Around the Trend Lines
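We obtain the trend of each record $T_i$, i = 1,2,⋯,L, by linear regression, which yields the regression line $r_i = a + b\,i$ with slope b. Accordingly, we define the total temperature change by Δ = b(L − 1). The standard deviation around the regression line is $\sigma = \left[\frac{1}{L}\sum_{i=1}^{L}(T_i - r_i)^2\right]^{1/2}$. The relative trend x relevant for the statistical significance analysis is defined by the ratio between the total trend and the standard deviation, x = Δ/σ, as defined by Tamazian et al. [2015].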
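The three quantities defined above can be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `trend_statistics` is hypothetical, the time index runs from 0 to L − 1, and σ is taken as the root-mean-square residual with a 1/L normalization, which may differ from the normalization used by the authors.

```python
import numpy as np


def trend_statistics(values):
    """Return slope b, total change delta, scatter sigma, relative trend x."""
    y = np.asarray(values, dtype=float)
    L = y.size
    t = np.arange(L)
    # Ordinary least-squares regression line a + b*t
    b, a = np.polyfit(t, y, 1)
    residuals = y - (a + b * t)
    # Standard deviation around the regression line (1/L normalization assumed)
    sigma = np.sqrt(np.mean(residuals ** 2))
    # Total change along the regression line over the record
    delta = b * (L - 1)
    # Relative trend used in the significance analysis
    x = delta / sigma
    return b, delta, sigma, x
```

For a record of 36 annual values, `delta` is the change along the regression line between the first and the last year, and `x` expresses that change in units of the scatter around the line.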
Figure 2 compares, for 4 of the 13 stations (Dumont-Durville, Amundsen-Scott, Rothera, and McMurdo), the annual READER temperature time series (upper curves) with the corresponding data sets of the five reanalysis projects considered here. The straight lines are the regression lines. The y axis is in degrees Celsius. To avoid overlapping of the data, we have added arbitrary offsets to the different reanalysis sets (this improves clarity and is harmless here, since we are interested in the trends and the structure of the temperature landscapes). One can see that not all reanalysis data sets reproduce the observational temperature trends well. For Dumont-Durville, the observed temperature trend is negative, but two of the data sets (NCEP1 and NCEP2) show a remarkable warming trend. For Amundsen-Scott, there is a moderate warming trend in the observational data. The reanalysis data show a mixed behavior, ranging from strong warming (NCEP1) to considerable cooling (ERA-Interim). In contrast, the variation of the data around the trend line is comparable in all cases, and the characteristic “mountain-valley” structure of the READER data that is generated by their persistence properties is also roughly in line with the reanalysis data sets.
Figure 3 shows the result of our quantitative analysis of Δ and the standard deviation σ along the regression line. In the figures, Bellingshausen, Dumont-Durville, and Amundsen-Scott are abbreviated by Bell, DD, and AS, respectively. In the Antarctic Peninsula, the observational data show warming trends at Faraday and Rothera, while in Bellingshausen there is no obvious temperature change in the past 36 years. Almost all reanalysis data sets show warming patterns except JRA-55, and the warming trends are in reasonable agreement with the observations. JRA-55, however, shows almost no temperature change in the last 36 years in the peninsula, which is inconsistent with the observational records. In West Antarctica, however, the ERA-Interim and JRA-55 reanalysis data are in good agreement with the observations, while the NCEP2 reanalysis exaggerates the warming trends considerably. In particular, NCEP2 overestimates the trends Δ at Byrd and McMurdo by a factor of 3.3 and 1.8, respectively. MERRA, on the other hand, shows almost zero warming at Byrd and McMurdo.
In East Antarctica, Figure 3 shows that the performance of NCEP1 and NCEP2 is not reliable, with considerable warming trends at Halley, Casey, Dumont-Durville, and Amundsen-Scott. Moreover, the JRA-55 reanalysis data also exhibit a strong warming trend at Casey. In contrast, the observational data at Halley and Casey show cooling trends with Δ = −0.26°C per decade and −0.23°C per decade, respectively. For Amundsen-Scott, NCEP1 and JRA-55 exaggerate the warming trend Δ by a factor of 3.6 and 1.5, respectively. The performance of the ERA-Interim/MERRA reanalysis data in East Antarctica is considerably better. An exception for ERA-Interim perhaps is the negative trend at Amundsen-Scott (∼−0.4°C per decade) that is inconsistent with the observation. Indeed, recent careful investigations showed that ERA-Interim did not assimilate any observations for Amundsen-Scott before the start of 1986 [Jones and Lister, 2015].
The physical reason that West Antarctica and the Antarctic Peninsula exhibit a warming trend in the near surface temperature data has been addressed by previous studies. For example, the warming in West Antarctica during austral spring is partly due to the Pacific South American (PSA) mode, especially PSA-1 mode that is a wave train extending from the tropics to the high southern latitudes [Schneider et al., 2012]. Moreover, the warming in the peninsula may be due to the extratropical Rossby wave train associated with tropical Pacific sea surface temperature anomalies [Ding et al., 2011; Ding and Steig, 2013].
To quantify the agreement between a reanalysis data set and the observations, we consider the standard error $\delta_E = \left[\frac{1}{13}\sum_{i=1}^{13}(\Delta_i - \Delta_i^{O})^2\right]^{1/2}$ and normalize it by $\delta_O = \left[\frac{1}{13}\sum_{i=1}^{13}(\Delta_i^{O})^2\right]^{1/2}$. Here Δi is the temperature change of a specific reanalysis data set at station i, i = 1,2,⋯,13, and $\Delta_i^{O}$ is the corresponding observational temperature change from the READER data set. Table 1 shows the dimensionless relative standard error δr=δE/δO for all reanalysis data sets. We can obtain a threshold for δr by comparing the observational data with their shuffled counterpart. For the shuffled data, Δi≅0 for all stations, which leads to δr=1. We consider a reanalysis data set as unsatisfying when δr is above the threshold. When the Δi have the same signs as the $\Delta_i^{O}$ of the observational data, then δr<1. The lower δr is, the better the performance of the reanalysis data set.
Table 1. Dimensionless relative standard error δr for all reanalysis data sets (a)

Data set | Annual | JJA | SON | DJF | MAM | σ | α | C(1) |
---|---|---|---|---|---|---|---|---|
NCEP1 | 1.44 | 1.62 | **0.93** | 1.24 | 1.94 | 0.27 | **0.08** | 1.23 |
NCEP2 | 1.66 | 1.72 | 1.27 | 1.47 | 1.68 | 0.18 | **0.08** | **0.93** |
ERA-I | **0.96** | **0.85** | **0.70** | 1.32 | **0.94** | 0.21 | **0.08** | 1.12 |
JRA55 | 1.21 | 1.16 | **0.82** | 1.37 | 1.30 | 0.21 | **0.09** | 1.05 |
MERRA | **0.77** | **0.65** | **0.85** | 1.23 | **0.79** | 0.24 | **0.09** | 2.08 |
- a Bold δr values are smaller than the threshold of the null model. JJA: June–August, SON: September–November, DJF: December–February, and MAM: March–May.
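The accuracy measure δr reported in Table 1 can be sketched as follows. This is an illustration under an explicit assumption: δE is taken as the root-mean-square difference between reanalysis and observational values over all stations, and δO as the root-mean-square of the observational values themselves. This choice reproduces the property stated in the text that vanishing reanalysis trends (the shuffled null model) give δr = 1. The function name and the sample numbers are hypothetical.

```python
import numpy as np


def relative_standard_error(reanalysis, observed):
    """delta_r = delta_E / delta_O for one quantity over all stations."""
    r = np.asarray(reanalysis, dtype=float)
    o = np.asarray(observed, dtype=float)
    # Root-mean-square deviation of the reanalysis from the observations
    delta_e = np.sqrt(np.mean((r - o) ** 2))
    # Root-mean-square magnitude of the observations (assumed normalization)
    delta_o = np.sqrt(np.mean(o ** 2))
    return delta_e / delta_o


# Hypothetical observational trends at three stations (deg C per decade)
obs = [0.4, -0.2, 0.1]
null_model = relative_standard_error([0.0, 0.0, 0.0], obs)    # no trend anywhere
wrong_sign = relative_standard_error([-0.4, 0.2, -0.1], obs)  # all signs flipped
```

Under this definition a perfect reanalysis gives δr = 0, the trend-free null model gives δr = 1, and trends of equal magnitude but opposite sign give δr = 2.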
As expected from the foregoing discussion, NCEP1, NCEP2, and JRA-55 show a quite poor performance, with δr of the annual trend well above 1, δr=1.44,1.66, and 1.21, respectively. The best performance can be found in ERA-Interim and MERRA, with δr=0.96 and 0.77, respectively. The very poor performance of NCEP1, NCEP2, and JRA-55 is a puzzle for us. It is unlikely that the inconsistency of the temperature change between observational data and reanalysis data is simply a manifestation of a low quantity of observations in this region, since many observational data have been assimilated after the year 1979. The inconsistency may rather result from the data assimilation methods and model details, in particular, from the biases in surface radiative flux that affects the surface temperature inversion. Note that JRA-55 does not show any temperature change in the Antarctic Peninsula in the past 36 years, which is inconsistent with the observational temperature change.
For the standard deviation σ along the regression line, there is no dramatic difference between reanalysis and observation data except at Mawson and McMurdo, where the ERA-Interim reanalysis data slightly exaggerate the variability. Moreover, at Amundsen-Scott, σ of JRA-55 and MERRA is one third smaller than that of the observational data. As a result, the relative temperature trend x will show a very similar picture as the absolute trend Δ (i.e., Figure 3a). We have calculated our quality measure δr for the standard deviations. According to Table 1, δr varies between 0.18 for NCEP2 and 0.27 for NCEP1, representing only small deviations between reanalysis and observational data. In this case, however, unlike for the temperature trends, δr=1 does not provide the threshold for shuffled data, since shuffling leaves σ unchanged.
3.2 Seasonal Warming/Cooling Trends
Next we address the seasonal warming/cooling trends in Antarctica. The results are shown in Figure 4. In the Antarctic Peninsula, the reanalysis data sets show a reasonable performance, as for the annual trends discussed above. An exception is again JRA-55, which during all seasons shows weaker warming trends at Faraday and Rothera than the observational data. The weaker warming in JRA-55 challenges the reliability of its underlying model, since the observational data in this region were well assimilated after the year 1979. Moreover, NCEP2 provides higher trends at Bellingshausen than the observational data during austral winter; ERA-Interim shows little evidence of warming trends at Faraday across all seasons.
In West and East Antarctica where the density of stations is low, the performance is much worse. During austral winter, the warming trends of NCEP1 and NCEP2 at Byrd are exaggerated by a factor of 2.6 and 4.7, respectively (reaching 0.88°C and 1.58°C per decade). MERRA, however, does not show any temperature change during austral spring, which is inconsistent with the strong warming trend in the observational data.
In East Antarctica, NCEP1 and NCEP2 are unreliable in describing the temperature trends during nonsummer seasons, where spurious warming trends are found at Halley, Casey, and Amundsen-Scott. Moreover, JRA-55 shows considerable warming trends at Casey and Amundsen-Scott during austral nonsummer seasons and austral summer, respectively. The observational data, in contrast, show either cooling (Halley and Casey) or modest warming (Amundsen-Scott). ERA-Interim and MERRA perform considerably better and show trends much closer to observation with slight deviations of ERA-Interim at Amundsen-Scott.
We have applied our accuracy measure δr to the seasonal trends in the reanalysis data sets. The results are listed in Table 1. As for the annual trends, the accuracy of NCEP2 is unacceptable, with δr values between 1.27 and 1.72. These high values can only occur when the reanalysis project produces warming, while the observational data show cooling and vice versa. Surprisingly, NCEP1 is slightly better, with the best performance (0.93) in austral spring and the worst performance in austral fall (1.94). The performance of JRA-55 is only slightly better, again with the worst performance (1.37) in austral summer and the best performance (0.82) in austral spring. In contrast, ERA-Interim and MERRA have a reasonable performance, where our accuracy index is below 1 for all seasons except austral summer.
3.3 Persistence Analysis: Lag-1 Autocorrelation
The conventional significance analysis rests on two assumptions. The first assumption is that the record is short-term persistent, such that C(s) decays exponentially and the persistence is fully characterized by C(1). The second assumption is that C(1) can be obtained sufficiently accurately also in short records. However, it has been shown [Lennartz and Bunde, 2009] that this assumption is not fulfilled. The accuracy in the autocorrelation function depends on the record length L, and the results for C(s) are only reliable for s well below L/50. This means that in records of length L below 50 even C(1) cannot be calculated reasonably well. This casts doubt on the conventional significance analysis as long as short records are considered, even in the case that the record is short-term persistent.
Under the condition that the record is short-term persistent, the sign of C(1) highly matters. When C(1) is positive, the data are persistent. When C(1) is negative, the data are antipersistent. Under the assumption that the record is long-term persistent, C(1) does not play an important role and is replaced by the Hurst exponent α. Figure 5 shows the linearly detrended lag-1 autocorrelation C(1) obtained from the annual mean temperature data at the different stations. The figure shows that for the same station, C(1) fluctuates strongly in the different reanalysis projects (with a δr value even exceeding 2 for MERRA), fluctuating between positive and negative values. If the data were indeed short-term persistent as postulated in several articles [e.g., Bromwich et al., 2013; Jones and Lister, 2015], this would imply that for the same station, some reanalysis projects would produce short-term persistent temperature curves, while the others would produce antipersistent curves. When observing the temperature data in Figure 2, however, such a difference cannot be recognized, since all curves for one station showed the same characteristic mountain-valley structure. For obtaining a threshold value for δr, we consider again shuffled observational data where C(1) = 0. In this case, as for the temperature trends considered above, δr=1 is the threshold. Table 1 shows that except for NCEP2 where δr is slightly below 1, all δr values are above 1.
Accordingly, the value of C(1) cannot describe sufficiently the structure of the temperature records, and thus, C(1) cannot represent a central quantity for the statistical significance of the data. Following Lennartz and Bunde [2009], we believe that the large fluctuations in C(1) between the different reanalysis projects are mainly produced by the finite size effects in calculating C(1), and not by different dynamical features in the reanalysis projects.
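The linearly detrended lag-1 autocorrelation discussed in this section can be estimated along the following lines. This is a minimal sketch: the function name is hypothetical, and the estimator (sum of products of neighboring residuals divided by the sum of squared residuals) is one common convention that may differ in detail from the one used for Figure 5.

```python
import numpy as np


def detrended_lag1_autocorrelation(values):
    """Lag-1 autocorrelation of the residuals around the regression line."""
    y = np.asarray(values, dtype=float)
    t = np.arange(y.size)
    # Remove the linear trend first
    b, a = np.polyfit(t, y, 1)
    r = y - (a + b * t)
    r = r - r.mean()
    # Correlate each residual with its successor
    return np.sum(r[:-1] * r[1:]) / np.sum(r ** 2)


# Two artificial 36-point records: one alternating, one smooth
alternating = [(-1.0) ** i for i in range(36)]
smooth = np.sin(2 * np.pi * np.arange(36) / 36)
c1_alternating = detrended_lag1_autocorrelation(alternating)
c1_smooth = detrended_lag1_autocorrelation(smooth)
```

The alternating record yields a strongly negative C(1) and the smooth record a strongly positive one. For uncorrelated data, the standard error of such an estimate is roughly 1/√L, that is, about 0.17 for 36 annual values, which gives a sense of how large the finite-size scatter can be.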
3.4 Persistence Analysis: Long-Term Correlations and Hurst Exponent
For γ between 0 and 1, α ranges between 1/2 and 1. Long-term persistence is not a feature of the Antarctic stations alone. Long-term persistence occurs all over the globe, in atmospheric temperatures as well as in sea surface temperatures. Typical values for α are 0.65 for continental and coastline stations [Koscielny-Bunde et al., 1998; Malamud and Turcotte, 1999; Fraedrich and Blender, 2003; Eichner et al., 2003; Király et al., 2006; Franzke, 2010, 2012; Ludescher et al., 2015; Yuan et al., 2015], 0.75 for island stations, and 0.8 for sea surface temperatures [Monetti et al., 2003]. The error bars corresponding to the 95% confidence interval are ±0.10 for coastline and continental stations, ±0.15 for island stations, and ±0.10 for sea surface temperatures. Long-term persistence (long-term memory) does not only occur in temperature records but is also known to characterize systems as diverse as river flows [Hurst, 1951; Mandelbrot and Wallis, 1968; Tessier et al., 1996; Montanari et al., 2000; Koutsoyiannis, 2006; Kantelhardt et al., 2006; Koscielny-Bunde et al., 2006; Mudelsee, 2007; Livina et al., 2003], sea level heights [Beretta et al., 2005; Dangendorf et al., 2014; Becker et al., 2014], wind fields [Santhanam and Kantz, 2005], and midlatitude cyclones [Blender et al., 2015]. Other examples include heartbeat intervals [Bunde et al., 2000; Peng et al., 1993], DNA sequences [Peng et al., 1992], the volatility in financial markets [Lux and Ausloos, 2012], and the arrangement of rare words in literary texts [Ebeling and Pöschel, 1994; Altmann et al., 2012].
We have performed a DFA2 analysis and determined F(s) for all temperature records considered here. Typical results for two stations, Rothera and Dumont-Durville, are shown in Figure 6. The figure shows that all fluctuation functions display the same perfect power law behavior, from which we can easily extract the Hurst exponent as the slope in the double-logarithmic plots. The results for the Hurst exponents of all records are shown in Figure 7. One can see that the Hurst exponents for the observational data found by us in Antarctica are within the ranges expected for continental, coastline, and sea surface temperatures discussed above. The figure shows that the Hurst exponents of the five reanalysis projects and the READER data are very close to each other, revealing that all the reanalysis projects are able to reproduce the long-term persistent nature of the instrumental data. Indeed, this may be expected from Figure 2, since the reanalysis data show the same characteristic mountain-valley structure generated by the long-term memory in the data as the instrumental data. Our accuracy measure gives the same very low value for δr for all reanalysis projects considered (see Table 1). A threshold value for δr can be obtained, as for the temperature trends and C(1), from shuffling the READER data. When shuffling, the long-term persistence vanishes, and the Hurst exponents become 1/2. This leads to the threshold value 0.31, which is above the δr values for all reanalysis data sets.
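The extraction of the Hurst exponent as the slope of F(s) in a double-logarithmic plot can be sketched as follows. This is a minimal illustration of the standard DFA2 recipe (profile, quadratic detrending in windows of length s, root-mean-square fluctuation), not the code used for Figures 6 and 7. The function names, the fit range (s from 10 to N/4), and the random input series are assumptions made for this example.

```python
import numpy as np

def dfa2(x, scales):
    """Fluctuation function F(s) with quadratic detrending in each window."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())
    n = profile.size
    F = []
    for s in scales:
        n_win = n // s
        # Non-overlapping windows counted from the start and from the end.
        starts = [k * s for k in range(n_win)]
        starts += [n - (k + 1) * s for k in range(n_win)]
        t = np.arange(s)
        variances = []
        for a in starts:
            seg = profile[a:a + s]
            fit = np.polyval(np.polyfit(t, seg, 2), t)
            variances.append(np.mean((seg - fit) ** 2))
        F.append(np.sqrt(np.mean(variances)))
    return np.array(F)

def hurst_exponent(x, s_min=10, n_scales=12):
    """Slope of log F(s) versus log s (assumed fit range: s_min to N/4)."""
    n = len(x)
    scales = np.unique(
        np.logspace(np.log10(s_min), np.log10(n // 4), n_scales).astype(int)
    )
    F = dfa2(x, scales)
    alpha, _ = np.polyfit(np.log(scales), np.log(F), 1)
    return float(alpha)

# Stand-in for a monthly record of 36 years (36 * 12 = 432 values).
# Uncorrelated random numbers mimic a shuffled record without persistence.
rng = np.random.default_rng(0)
shuffled_like = rng.standard_normal(432)
alpha_shuffled = hurst_exponent(shuffled_like)
```

For uncorrelated input such as `shuffled_like`, the estimate should scatter around 1/2, which corresponds to the shuffling argument used above to obtain a threshold for δr.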
3.5 Significance of Trends
We have described, in section 5, how the significance S of a trend can be obtained for (a) a short-term persistent process characterized by C(1) and (b) a long-term persistent process characterized by the Hurst exponent α. Despite the evidence that the Antarctic temperature data are long-term persistent (see Figure 6) [Franzke, 2010, 2012; Bunde et al., 2014; Ludescher et al., 2015; Yuan et al., 2015], we discuss the significance level of warming/cooling trends also for the hypothetical case of short-term persistence.
The results for the p values (p ≡ 1−S) are shown in Figure 8: (a) under the hypothesis that the data are short-term persistent and (b) for the assumption that the data are long-term persistent. We would like to emphasize that this result is only valid for the considered time window of 36 years, since our purpose is to assess the skills of reanalysis data sets. For most observational records the time window is indeed longer. Since the statistical significance of a trend increases with increasing length, the significance of the observational trend in the full time window might be larger than for the considered time window (see section 5 and Tamazian et al. [2015]).
Under hypothesis (a), only the observational records at Faraday and Dumont-Durville have p values below 0.05 and are thus statistically significant. In contrast, NCEP1, NCEP2, and JRA-55 produce highly significant trends at several stations in West and East Antarctica. As expected from Figure 3, the ERA-Interim and MERRA data sets produce more realistic trend significances. There are only a few exceptions: ERA-Interim at Amundsen-Scott implies a significant cooling trend that is not observed in reality, and MERRA exhibits a significant warming trend at Mawson.
Under the more realistic assumption (b), all observational records have p values above 0.1, and thus, the warming/cooling trends are not statistically significant in the considered time window. It is remarkable that despite all their differences, in the Antarctic Peninsula, the reanalysis data produce a comparable significance level, revealing that in the time period after 1979, the warming in the Peninsula is not significant. In the rest of Antarctica, however, NCEP1 and NCEP2 again show significant spurious warming trends at many stations. At Halley, NCEP2 produces highly significant trends with p below 0.05. In contrast, the observational data have p values above 0.1, and the contrast could hardly be higher. Again, JRA-55 shows significant warming at Casey and Amundsen-Scott (p≤0.05). ERA-Interim (except at Amundsen-Scott) and MERRA (except at Mawson) perform much better.
4 Conclusions
In this paper, we assessed, for the first time, the warming trend and its significance as well as the persistence properties of five widely used global reanalysis data sets (NCEP1, NCEP2, ERA-Interim, JRA-55, and MERRA) in Antarctica. We considered the time period from January 1979 to December 2014, when modern satellite data were assimilated into the reanalysis data sets. We compared the reanalysis data sets with the longest observational T2m data from staffed observation stations across Antarctica. In our performance test, we first compared the absolute trends Δ and the standard deviation along the regression line σ. Then we considered the seasonal warming trends. Finally, we studied the persistence properties (lag-1 autocorrelation C(1) and Hurst exponent α) as well as the significance level (p value) of the relative trends.
We found that all five reanalysis data sets were able to reproduce nicely the long-term persistence in the instrumental data, with Hurst exponents quite close to each other. Also, the standard deviations along the regression lines are in very good agreement between reanalysis and instrumental data sets. In contrast, C(1), which is needed as input for the conventional significance analysis, shows fully erratic behavior, and the observational warming/cooling trends in East and West Antarctica (where the data are sparse) are not reproduced well by the reanalysis data sets. NCEP1, NCEP2, and JRA-55 performed worst, with spurious warming trends even in those parts of East Antarctica where cooling has been observed. In the peninsula, where the station density is quite high, the performance of the reanalysis data is considerably better. Under the assumption of a long-term persistent process, all reanalysis data sets as well as the observational data show that in the considered time period since 1979 the warming in the peninsula is not significant, with p values well above 0.1.
Overall, ERA-Interim and MERRA showed the best performance when testing the local temperature data in Antarctica. Further work is needed in order to see if the data sets are also better in reproducing the known teleconnections between locations on different parts of the globe.
5 Methods
However, it has been argued recently that the temperature records in Antarctica [Bunde et al., 2014; Tamazian et al., 2015; Ludescher et al., 2015; Yuan et al., 2015], as well as in other places of the globe, are not short-term, but long-term persistent [e.g., Koscielny-Bunde et al., 1998; Eichner et al., 2003; Fraedrich and Blender, 2003]. In a long-term persistent stationary process, the autocorrelation function decays algebraically as C(s) = (1 − γ) s^{−γ}, where the correlation exponent γ is between 0 and 1. For detecting long-term memory one usually does not consider C(s) directly, because it exhibits strong finite size effects, but instead uses the detrended fluctuation analysis (DFA) [Peng et al., 1994; Kantelhardt et al., 2001]. The current standard method for detecting long-term memory in climate records is DFA2 [Kantelhardt et al., 2001], which is a modification of DFA [Peng et al., 1994] and eliminates linear external trends.
The detrended fluctuation analysis (DFA2) has been applied to a large number of temperature records all over the globe [e.g., Koscielny-Bunde et al., 1998; Fraedrich and Blender, 2003; Eichner et al., 2003; Lovejoy and Schertzer, 2013] revealing the long-term persistent nature of temperature records. It has been shown in Vyushin et al. [2004] that a major cause of the long-term memory is volcanic forcing.
DFA2 has also been applied to the monthly Antarctic records considered here in Bunde et al. [2014], Bromwich and Nicolas [2014], Ludescher et al. [2015], Yuan et al. [2015], and Tamazian et al. [2015]. It has been shown explicitly in Bunde et al. [2014] and Ludescher et al. [2015] that for each record, the fluctuation function F(s) agreed with the fluctuation function of long-term correlated surrogate data with the same length and Hurst exponent α.
It has been shown in Rybski et al. [2008] when considering millennium runs that the Hurst exponent was the same for monthly, annual, and biannual data, i.e., did not depend on the length of the averaged region in the time series. For a meaningful DFA analysis, the length of the record should not be smaller than 400 data points, and thus, for the Antarctic stations we can consider only monthly data.
When a record is fully characterized by a certain Hurst exponent α, the significance S of a relative trend x depends only on α and the record length N. It has been shown recently by Tamazian et al. [2015] that S(x,N) also follows equation 10, but with different parameters a and l. These parameters depend on α and N and have been tabulated (for 0.5≤α≤1.5 and N≥400) in Tamazian et al. [2015]. Accordingly, for given α and N, the trend significance can be obtained straightforwardly from Tamazian et al. [2015], making the trend estimation as easy as for short-term persistent processes. Analytic approximate formulas for the significance of the trends in long-term persistent records can be found in Lennartz and Bunde [2009, 2011].
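As a rough cross-check of the dependence of the significance on α and N, one can also estimate a p value by brute-force Monte Carlo simulation. The sketch below is not the tabulated formula of Tamazian et al. [2015]. It assumes surrogate records generated by Fourier filtering (power spectrum proportional to f^(−(2α−1)), suitable for 0.5≤α<1), a relative trend defined as x = Δ/σ, and p taken as the fraction of surrogates whose |x| is at least as large as the observed one. All function names and the example values are hypothetical.

```python
import numpy as np

def surrogate(n, alpha, rng, oversample=8):
    """Long-term correlated Gaussian record of length n (Fourier filtering).

    A longer series is generated and cut to avoid the periodicity of the
    inverse FFT (the oversampling factor is an assumption of this sketch).
    """
    m = oversample * n
    spec = np.fft.rfft(rng.standard_normal(m))
    freqs = np.fft.rfftfreq(m)
    filt = np.zeros_like(freqs)  # zero-frequency component is dropped
    filt[1:] = freqs[1:] ** (-(2.0 * alpha - 1.0) / 2.0)
    return np.fft.irfft(spec * filt, m)[:n]

def relative_trend(x):
    """x = Delta / sigma: total linear change over the record divided by the
    standard deviation around the regression line (assumed definition)."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    resid = x - (slope * t + intercept)
    return slope * (len(x) - 1) / resid.std(ddof=2)

def p_value(x_obs, n, alpha, n_sur=1000, seed=0):
    """Fraction of trend-free surrogates with |x| >= |x_obs|."""
    rng = np.random.default_rng(seed)
    x_sur = np.array(
        [relative_trend(surrogate(n, alpha, rng)) for _ in range(n_sur)]
    )
    return float(np.mean(np.abs(x_sur) >= abs(x_obs)))

# Hypothetical example: relative trend 0.5 in a monthly record of 36 years.
p_uncorrelated = p_value(0.5, 432, 0.5)
p_persistent = p_value(0.5, 432, 0.8)
```

In this sketch the same relative trend yields a larger p value for the larger Hurst exponent, since natural fluctuations in long-term persistent records mimic trends more easily.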
Acknowledgments
The authors would like to acknowledge the support of the LINC project (289447), funded by the EC's Marie-Curie ITN program (FP7-PEOPLE-2011-ITN), and of the Israel Science Foundation. The data sets used in this work can be accessed through the following sources. NCEP1 is provided by the National Oceanic and Atmospheric Administration (NOAA), from website “http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html”. NCEP2 is provided by NOAA, from website “http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis2.html”. ECMWF ERA-Interim is provided by the European Centre for Medium-Range Weather Forecasts (ECMWF), from website “http://www.ecmwf.int/en/research/climate-reanalysis/era-interim”. JRA-55 is provided by the Japan Meteorological Agency (JMA), from website “http://jra.kishou.go.jp/JRA-55/index_en.html”. MERRA is provided by the Global Modelling and Assimilation Office (GMAO), National Aeronautics and Space Administration (NASA), from website “https://gmao.gsfc.nasa.gov/reanalysis/MERRA/”. READER is provided by the Scientific Committee on Antarctic Research (SCAR), from website “https://legacy.bas.ac.uk/met/READER/data.html”.