Sources of Nonergodicity for Teleconnections as Cross-Correlations
Abstract
Teleconnections are prominent features of internal variability which, along with climatic means, can also undergo forced changes. The assessment of such changes based on single realization observational data, by calculating moving window temporal correlation coefficients, is burdened by biases. Previously, we attributed one source of such a bias, that is, nonergodicity, to the omission of low-frequency (LF) variability, but find here that it is negligible for typical processes due to their spectral characteristic, given that LF, say, interdecadal, variability is much weaker than the high-frequency interannual variability. The unnecessary omission of detrending, as another source, is potentially much more deleterious, as we find that the bias depends quadratically on both the window size and the trends of the means. Given that forced changes in the variances should not introduce much bias, a nonlinear change of the teleconnection strength is implied as the main irreducible source of nonergodicity.
Key Points

Analyzing changes in teleconnections based on temporal statistics in observed data is expected to lead to biases, that is, nonergodicity

The effect of omitting low-frequency variability is typically negligible, but the omission of detrending could lead to considerable biases

A nonlinear change of the teleconnection strength is the only irreducible source of bias; it can weaken a nonstationarity test but does not make it liberal
Plain Language Summary
The variability of far separated regions can be correlated, that is, teleconnected, which can give rise to seasonal predictability. Teleconnections can also undergo forced changes under external influence. These changes are not perfectly reflected in observed time series because of systematic errors, or biases, of the estimates of correlation coefficients. We identify five sources of such biases for correlations, including trends and spectral filtering, but find only the former potent. We propose that correlations always be calculated after detrending via spectral high-pass filtering. We take this approach in our case study of the El Niño–Southern Oscillation-Indian summer monsoon teleconnection, and demonstrate, using a new statistical test, that no forced change, including a much debated weakening since ∼1980, is detectable. Yet, considering the extreme weakness of the test, we conclude that the strengthening teleconnection in the Max Planck Institute Grand Ensemble model data might not be a robust result.
1 Introduction
Man-made climate change poses one of the greatest societal challenges, not least because of the many uncertainties about its unfolding and impacts. Some of the changes may be relatively small but potentially rather impactful. The El Niño–Southern Oscillation-Indian summer monsoon (ENSO-ISM) teleconnection, the subject of our case study, could be one such example. Changes like this might not even be detectable when they are already taking their toll, in this case potentially by hindering predictability. For studies concerning the changes of the ENSO teleconnections, see Bódai et al. (2021), Domeisen et al. (2019), Fasullo et al. (2018), Haszpra et al. (2020), Hu et al. (2021), Michel et al. (2020), Müller & Roeckner (2008), Power & Delage (2018), Rodgers et al. (2021), Taschetto et al. (2020), and Yeh et al. (2018). The ENSO-ISM teleconnection, in particular, has been argued to have already detectably weakened since about 1980 (Kumar et al., 1999), but other statistical tests, for example, that of Yun and Timmermann (2018), might cast doubt on this claim.
The forced change, exerted by external causes such as the anthropogenic ones, is defined in a conceptually sound sense in terms of a converged (Drótos et al., 2017) ensemble (Bódai & Tél, 2012; Drótos et al., 2015; Hasselmann, 1976; Leith, 1978; Tél et al., 2019). Therefore, such an accurate analysis can possibly be done only in a model world, but not for Earth's climate system, for which only a single observational realization is available. The first accurate analysis of the “transient” forced change (Bódai, Drótos, et al., 2020), examining the ensemble-wise correlation coefficient r (Herein et al., 2016; Yettella et al., 2018), detected a strengthening of the ENSO-ISM teleconnection at the end of the 20th c. in the MPI-GE Large Ensemble data set (Maher et al., 2019), and further analysis (Bódai et al., 2021) of the robustness of this finding did not alter the conclusion.
This strengthening in the model is in apparent contradiction with the weakening argued from observations. The following hypotheses could resolve it:
(I) Model error: the trivial one.

(II) Undersampling (Gershunov et al., 2001; Yun & Timmermann, 2018): fluctuations of the finite τ-window temporal correlation coefficient (cc) r_{τ} evaluated from observational data might be consistent with the scenario, that is, null-hypothesis, of a stationary teleconnection climatology.

(III) The temporal cc r_{τ} is missing any contribution from low-frequency (LF), say, interdecadal, variability and, so, it is biased with respect to the true ensemble-wise r.
Hypothesis (II) seems rather convenient to resolve the apparent contradiction. Our own statistical test (documented in detail in the Supporting Information S1 Part 1) cannot detect a significant change at the usual 95% confidence level either. However, we also point out that both our test and that of Yun & Timmermann (2018) are somewhat conservative and very weak, with a detection rate only slightly in excess of 5% even for a very considerable end-of-century scenario of a forced change of r. That is, detecting nonstationarity of the correlation coefficient is very data hungry, for which reason we can have little satisfaction in resolving the apparent contradiction by (II). We believe that there is plenty of room for hypothesis (I), and that further physical modeling and statistical analysis should focus on this question.
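The undersampling scenario (II) can be illustrated with a minimal sketch. It assumes Gaussian white-noise surrogates with a constant true correlation; the function names and the values of r_true, tau, and n_years are hypothetical choices for illustration, not taken from the data or from the tests discussed here.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def stationary_pair(n_years, r, rng):
    """Draw white-noise series x, y whose true correlation r never changes."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
    y = [r * a + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for a in x]
    return x, y


def moving_window_cc(x, y, tau):
    """Temporal correlation coefficient r_tau in every window of length tau."""
    return [pearson(x[i:i + tau], y[i:i + tau]) for i in range(len(x) - tau + 1)]


rng = random.Random(0)                 # fixed seed, for reproducibility only
r_true, tau, n_years = -0.5, 21, 150   # hypothetical values
x, y = stationary_pair(n_years, r_true, rng)
r_tau = moving_window_cc(x, y, tau)
spread = max(r_tau) - min(r_tau)
print(f"true r = {r_true}; r_tau ranges over [{min(r_tau):.2f}, {max(r_tau):.2f}]")
```

Even though the surrogate teleconnection is strictly stationary, r_tau wanders on a timescale set by the window length, which is why a formal test is needed before reading such fluctuations as forced change.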
In Bódai, Drótos, et al. (2020), regarding hypothesis (III), we posited alternatively, with a view to possibly resolving the apparent contradiction, that the LF contribution to the change of r could be dominant and strengthening, but it is missing from r_{τ}. In fact, Figure 10a of Bódai, Drótos, et al. (2020) already served to falsify this; from that figure we reproduce the ensemble mean (E-mean), or expectation, as 〈r_{τ,p}〉 in Figure 1 here. (The subscript p stands for “preprocessing,” referring to the methodology of Kumar et al. (1999) to calculate the cc.) Looking beyond the sign of change, however, it may still be the case that 〈r_{τ,p}〉 ≠ r, that is, that the temporal statistics are biased. Previously, we referred to a bias of this type as nonergodicity, and argued that it is a generic feature of nonautonomous, that is, externally forced, systems (Drótos et al., 2016). This inequality is not very obvious in any single year, as the finite-ensemble-size errors of the estimates of r, also reproduced in Figure 1, are very large. A smoothing r* of r, plotted in the same diagram, loosely indicates the presence of nonergodicity, most prominently around 2020. In Bódai et al. (2021), we developed a statistical test to formally detect nonergodicity. However, a question remained as to whether this nonergodicity implies a nonlinear forced change of r.
We identify five possible sources of nonergodicity:

(SN) Nonlinear forced change of r itself.

(SL) Omission of the contribution of LF variability. Identical with (III).

(SD) Omission of detrending.

(SH) Forced change of the E-std, that is, heteroscedasticity.

(SE) Finite-τ sample size for the sample correlation coefficient.
(SN) is already studied in Bódai et al. (2021). Here, we will explain why (SL) is not a prominent source for the ENSO-ISM teleconnection, as prompted already by Figure 1, whereas (SD) potentially is, and so should be eliminated by always detrending, using, for example, r_{τ,p} instead of r_{τ,d}. Finally, we will discuss our formal results in Section 4.
2 SL: Filtering Out Low-Frequency Variability
Here, we relate some spectral property of the signals to the bias of r_{τ,p}. In order to analyze the source (SL) in isolation from (SN), we will analyze ensemble-wise correlation coefficients of LF and high-frequency (HF) anomaly components. We define the LF component operatively as x_{L} = x*, where * denotes some smoothing, for example, running window mean or Savitzky-Golay (SG) filtering, or these combined. The LF anomaly itself is then defined naturally (i.e., with 〈x′_{L}〉 = 0) as: x′_{L} = x_{L} − 〈x_{L}〉. The HF anomaly is also defined operatively: x′_{H} = x − x_{L}. We note that this is not a natural definition, as 〈x′_{H}〉 ≠ 0 in general due to nonergodicity. However, if 〈x〉 ≈ 〈x*〉 for our purposes, then x′_{L} and x′_{H} provide approximately a decomposition of the full anomaly x′, as collecting the above yields: x′_{L} + x′_{H} = x − 〈x*〉 ≈ x − 〈x〉 = x′.
That is, even if r_{L} and its changes are nonnegligible, their contribution to r, and, so, the bias of r_{H}, or possibly r_{τ,p}, might still be negligible.
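The statement above can be illustrated with a short derivation. This is a sketch under our own simplifying assumptions, which may differ in detail from Equations 1 and 2: all ensemble-wise cross-covariances between LF and HF components are taken to vanish, σ denotes an E-std, and ϵ_{x} = σ_{x_L}/σ_{x_H} (likewise for y).

```latex
\begin{align}
\langle x' y' \rangle &\approx \langle x'_H y'_H \rangle + \langle x'_L y'_L \rangle
  = r_H\,\sigma_{x_H}\sigma_{y_H} + r_L\,\sigma_{x_L}\sigma_{y_L}, \\
\sigma_x^2 &\approx \sigma_{x_H}^2 + \sigma_{x_L}^2 = \sigma_{x_H}^2\,(1+\epsilon_x^2),
  \qquad \sigma_y^2 \approx \sigma_{y_H}^2\,(1+\epsilon_y^2), \\
r = \frac{\langle x' y' \rangle}{\sigma_x \sigma_y}
  &\approx \frac{r_H + \epsilon_x \epsilon_y\, r_L}{\sqrt{(1+\epsilon_x^2)(1+\epsilon_y^2)}}
  \approx r_H + \epsilon_x \epsilon_y\, r_L
  \quad (\epsilon_x, \epsilon_y \ll 1).
\end{align}
```

In this form the LF correlation enters r only through the product ϵ_{x}ϵ_{y}r_{L}, so a sizeable r_{L} still contributes little when the LF variability is weak relative to the HF variability.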
For a white noise signal of flat power spectrum, ϵ^{2} ∼ 1/τ as per the law of large numbers, that is, the Central Limit Theorem (CLT). Given that observed time series of the June-July-August mean Niño 3 (Rayner et al., 2003; NOAA ESRL Physical Sciences Laboratory, n.d.) and the June-July-August-September mean All-Indian Summer Monsoon Rainfall (AISMR) (Parthasarathy et al., 1994; Indian Institute of Tropical Meteorology, n.d.) are more like white noise, the bias in question is indeed expected to be small, rather negligible, for the common choice of τ = 21 [yr]. In this regard, we display the E-mean of the power spectra of the true anomalies taken between years 1880–2000 in Figure 2a. (Similar spectra are shown in Figures 2 and S5 of Rodgers et al. (2021) for the CESM2-LE.) Both Niño 3 and AISMR feature a maximum at the ∼10 years period, and then, for longer periods, AISMR levels off sooner than Niño 3. Correspondingly, ϵ^{2}(τ), plotted in a log-log diagram in panel (b), also approaches the asymptote of slope −1 associated with the CLT sooner for AISMR. We note that in the regime of what is referred to as “macroweather” in Figure 2 of Lovejoy (2015), the power exponent (β) in this case is not slightly positive but negative, although this does not really make a qualitative difference here.
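The 1/τ scaling for white noise can be checked numerically with a minimal sketch. It assumes that the LF component is a plain running mean, that ϵ^{2} is the ratio of LF to HF variance, and that temporal variances of one long stationary series stand in for E-variances; the helper names are hypothetical.

```python
import random


def running_mean(x, tau):
    """Running mean over every window of length tau (len(x) - tau + 1 values)."""
    csum = [0.0]
    for a in x:
        csum.append(csum[-1] + a)
    return [(csum[i + tau] - csum[i]) / tau for i in range(len(x) - tau + 1)]


def variance(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def eps_squared(x, tau):
    """Variance ratio LF/HF, with LF = centered running mean and HF = x - LF."""
    h = tau // 2                      # tau is assumed odd, so windows are centered
    lf = running_mean(x, tau)
    hf = [x[i + h] - lf[i] for i in range(len(lf))]
    return variance(lf) / variance(hf)


rng = random.Random(1)                # fixed seed, for reproducibility only
x = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
for tau in (11, 21, 41):
    # For white noise and a running mean, the expected ratio is 1 / (tau - 1).
    print(tau, round(eps_squared(x, tau), 4), round(1.0 / (tau - 1), 4))
```

For τ = 21 the ratio is about 0.05, which gives a sense of how small the factor ϵ_{x}ϵ_{y} is when both signals are close to white noise.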
Nevertheless, here we determine r_{L} for the MPI-GE data set, as well as some measure of the bias in question. We apply a shorter window length of τ = 11 [yr]. In Figure 3a, we display r along with r_{H} for a comparison, which indicates indeed a rather small bias. In panel (c), we compare a direct calculation of r_{L} with a reconstruction of it via rearranging Equation 1 and calculating ϵ_{x} and ϵ_{y} directly from the E-variances. The latter are shown in panel (b). The reconstruction features very large error fluctuations, likely owing to dividing a small number by another small number. We apply a smoothing to the reconstruction, which achieves a rather good match with the direct estimate. For one thing, this serves as a numerical verification of Equation 1. Otherwise, we find that: r_{L} might be considerably different from zero at times; it might take both positive and negative values; and it might also undergo a much larger forced change than r_{H}. Given that the LF anomalies are by definition serially correlated very considerably, r_{L} should also be serially correlated, and, therefore, determining the significance of its change should take that into account. Given that the reconstruction is a function of variables featuring no or considerably less serial correlation, it might be suitable for assessing the significance of the forced changes of r_{L}. The results of Yun and Timmermann (2019) suggest that r_{L} changes considerably more than r_{H}.
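Why such a reconstruction is noisy for a moderate ensemble size can be seen in a minimal synthetic sketch. It assumes the relation r ≈ (r_{H} + ϵ_{x}ϵ_{y}r_{L})/√((1 + ϵ_{x}^{2})(1 + ϵ_{y}^{2})), which is our assumed form and may differ in detail from Equation 1, and it uses Gaussian surrogates with hypothetical values r_h, r_l, and eps rather than the MPI-GE data.

```python
import math
import random


def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def corr(u, v):
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))


def pair(n, r, scale, rng):
    """n ensemble members of a Gaussian pair with correlation r and std scale."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [r * s + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for s in a]
    return [scale * s for s in a], [scale * s for s in b]


def reconstruct_r_l(n, rng, r_h=-0.5, r_l=0.4, eps=0.22):
    """Return (reconstructed r_L, directly estimated r_L) for ensemble size n."""
    xh, yh = pair(n, r_h, 1.0, rng)   # HF anomalies across the ensemble
    xl, yl = pair(n, r_l, eps, rng)   # LF anomalies, weaker by the factor eps
    x = [a + b for a, b in zip(xh, xl)]
    y = [a + b for a, b in zip(yh, yl)]
    ex = math.sqrt(cov(xl, xl) / cov(xh, xh))
    ey = math.sqrt(cov(yl, yl) / cov(yh, yh))
    r, r_hf = corr(x, y), corr(xh, yh)
    # Rearranged (assumed) relation: a small numerator over a small denominator.
    rec = (r * math.sqrt((1 + ex**2) * (1 + ey**2)) - r_hf) / (ex * ey)
    return rec, corr(xl, yl)


rng = random.Random(3)                # fixed seed, for reproducibility only
rec_small, direct_small = reconstruct_r_l(100, rng)
rec_large, direct_large = reconstruct_r_l(100_000, rng)
print(f"N = 100:    reconstructed {rec_small:+.2f}, direct {direct_small:+.2f}")
print(f"N = 100000: reconstructed {rec_large:+.2f}, direct {direct_large:+.2f}")
```

Sampling errors of the LF-HF cross-covariances are divided by the small product ϵ_{x}ϵ_{y}, so the reconstruction only settles near the direct estimate for very large ensembles or after smoothing.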
To see what drives the changes of r, in panel (d), we compare the (smoothed) changes of r_{H} and a scaled version of r_{L}, being ρ_{L} = ϵ_{x}ϵ_{y}r_{L}, which is “missing from r” (the bias) according to the approximation 2 (but using the direct estimate r_{L}). We also plot alongside a (smoothed) reconstruction of the latter, originating from the more accurate Equation 1 rather than 2; the two match rather well. Clearly, r_{H}, driving the changes of r, changes much more than, for example, ρ_{L}. We can quantify this naively by calculating their temporal standard deviations. For this, we trim the beginning and end of the time series, which are affected by end-effects upon applying the SG filter. (This is even more prominent in the ϵ’s in panel (b).) The figures given in Table 1 indicate that std[r_{H}] is larger than std[ρ_{L}] by a factor of 4. The match of the reconstruction is somewhat worse with about double the window length, the rather usual choice, as presented in Supporting Information S1 Part 2 (Figure S2); otherwise, in that case std[r_{H}] proves, as seen in Table 1, to be an order of magnitude larger than std[ρ_{L}].
τ [yr]    std[r_{H}]    std[ρ_{L}]    std[r_{L}]
11        0.043         0.013         0.19
21        0.042         0.005         0.20
3 SD: Omitting Detrending
That is, in contrast with ϵ_{x}ϵ_{y} ∼ τ^{−1} of (SL), we have here (SD) a quadratic τ-dependence of the bias ϵ_{x}ϵ_{y} ∼ τ^{2} in Equation 4. There is no trade-off situation, though, as we can just opt for the use of r_{τ,p}. Furthermore, the bias depends on the changes of the climatological means also quadratically (αβ).
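The quadratic dependences can be made plausible with a short calculation. This is a sketch under our own simplifying assumptions, which may differ in detail from Equations 3 and 4: within a window of length τ the means change linearly with slopes α and β, the fluctuations about them have standard deviations σ_{x} and σ_{y} and correlation r_{f}, the ramps are uncorrelated with the fluctuations, and ϵ is allowed to carry the sign of the slope so that r_{r} = 1.

```latex
\begin{align}
\mathrm{Var}_\tau[\alpha t] &= \frac{\alpha^2}{\tau}\int_{-\tau/2}^{\tau/2} t^2\,\mathrm{d}t
  = \frac{\alpha^2 \tau^2}{12}, \\
\epsilon_x &= \frac{\alpha\,\tau}{\sqrt{12}\,\sigma_x}, \qquad
\epsilon_y = \frac{\beta\,\tau}{\sqrt{12}\,\sigma_y}, \qquad
\epsilon_x \epsilon_y = \frac{\alpha\beta\,\tau^2}{12\,\sigma_x \sigma_y}, \\
\langle r_{\tau,d} \rangle &\approx
  \frac{r_f + r_r\,\epsilon_x \epsilon_y}{\sqrt{(1+\epsilon_x^2)(1+\epsilon_y^2)}},
  \qquad r_r = 1 .
\end{align}
```

Doubling either the window length or both trends therefore quadruples the leading-order bias term ϵ_{x}ϵ_{y}.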
For completeness, as a third contrast with (SL), we point out that r_{r} = 1 ≫ r_{L}. Some previous studies failed to identify this as a bias but included it in the meaning of the teleconnection, breaking with the principle of disentangling internal variability from the climatological means. Just the fact that r_{r} is trivially 1 under any circumstances indicates that it is not a meaningful measure of a connection. (Considering nonlinear ramps, generically, it would be the nonlinear rank correlation coefficient that would be trivially 1.) In practice, given a finite ensemble size, the estimate would not be 1, but some nontrivial value contingent on internal variability “seeping through,” which could mask its meaninglessness. See also Bódai et al. (2021) (Footnote 2) and Lee & Bódai (2021) on this matter.
Next, using the MPI-GE data, we reconstruct 〈r_{τ,d}〉 numerically from Equations 3 and 4, in the latter of which α and β are calculated via moving-window regression on the E-means 〈x〉, 〈y〉. The reconstruction is rather accurate, as seen in Figure 4b. r_{f} can be likened to r_{H}, or, as per Figure 4a, to 〈r_{τ,p}〉. Therefore, for small ϵ_{x}ϵ_{y}’s, Equation 3 implies that 〈r_{τ,d}〉 − 〈r_{τ,p}〉 ≈ ϵ_{x}ϵ_{y}. However, in panel (b) we see that ϵ_{x}ϵ_{y} is not so small in this sense. Otherwise, we see that αβ becomes large enough that already with τ = 21 [yr], r_{τ,d} is considerably biased, even to the extent that in the 21st c. the true trend is canceled, or even reversed slightly.
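The effect of omitting detrending can be reproduced with a minimal synthetic sketch. It assumes white-noise fluctuations of unit variance with correlation r_f, linear trends with hypothetical slopes alpha and beta, and a least-squares linear detrending within the window as a simple stand-in for the high-pass preprocessing behind r_{τ,p}; the predicted value uses our assumed form (r_{f} + ϵ_{x}ϵ_{y})/√((1 + ϵ_{x}^{2})(1 + ϵ_{y}^{2})), which may differ in detail from Equation 3.

```python
import math
import random


def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def detrend(v):
    """Subtract the least-squares straight line fitted to v."""
    n = len(v)
    t_mean, v_mean = (n - 1) / 2, sum(v) / n
    slope = (sum((t - t_mean) * (a - v_mean) for t, a in enumerate(v))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [a - v_mean - slope * (t - t_mean) for t, a in enumerate(v)]


rng = random.Random(2)         # fixed seed, for reproducibility only
tau, r_f = 21, -0.5            # window length [yr]; correlation of the fluctuations
alpha = beta = 0.05            # hypothetical trends of the means, in std units per year
raw, det = [], []
for _ in range(2000):          # ensemble of independent realizations of one window
    xi = [rng.gauss(0.0, 1.0) for _ in range(tau)]
    eta = [r_f * a + math.sqrt(1 - r_f**2) * rng.gauss(0.0, 1.0) for a in xi]
    x = [alpha * t + a for t, a in enumerate(xi)]
    y = [beta * t + b for t, b in enumerate(eta)]
    raw.append(corr(x, y))                      # analogue of r_{tau,d}
    det.append(corr(detrend(x), detrend(y)))    # detrended analogue

mean_raw, mean_det = sum(raw) / len(raw), sum(det) / len(det)
ex = ey = alpha * math.sqrt((tau**2 - 1) / 12)  # ramp std over fluctuation std (= 1)
predicted = (r_f + ex * ey) / math.sqrt((1 + ex**2) * (1 + ey**2))
print(f"E-mean without detrending: {mean_raw:+.3f} (predicted {predicted:+.3f})")
print(f"E-mean with detrending:    {mean_det:+.3f} (r_f = {r_f})")
```

With these values the undetrended E-mean is pulled noticeably toward zero, while detrending recovers a value close to r_f up to the small finite-τ bias of the sample correlation coefficient.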
4 Discussion
Yun and Timmermann (2018) made the point implicitly that physical mechanisms for a phenomenon should be sought after a solid statistical analysis finds that the hypothesis is unlikely to be false, or, that there is a strong enough signal to care about. Regarding the multidecadal fluctuations of moving window correlation coefficients linking the ENSO and the Indian summer monsoon, they found that various data sets are all consistent with the hypothesis that the fluctuations arise from interannual variability alone. That is, there is no indication that drivers of multidecadal variability are influencing the ENSO-ISM teleconnection, but its multidecadal fluctuations are in a way an artifact of the analysis technique: the timescale is introduced by the window length. This is the hypothesis of undersampling labeled by (II) in the Introduction. More to the point of our subject, their statistical test implies at the same time that externally forced changes of the teleconnection strength cannot be detected either. However, they used the CRU precipitation data and a rectangular box encompassing the Indian subcontinent as well as the surrounding seas, and the resulting moving window correlation coefficient time series does not in fact suggest a forced change. Here, instead, we used the AISMR data, which does seem to suggest an end-of-the-century forced weakening, as exposed by Kumar et al. (1999). We also propose a new test. However, even with such a setup, we cannot reject the same null-hypothesis (Supporting Information S1 Part 1). This counters the claim of Kumar et al. (1999) and leaves the hypothesis of undersampling (II) intact.
Nevertheless, considering that the test of Yun and Timmermann (2018) and ours are both very weak, with very little power to detect even considerable end-of-the-century forced changes of the teleconnection (Supporting Information S1 Part 1), we argue that it is meaningful to examine the physical hypothesis of a forced change of the teleconnection. It is also possible that our tests, sharing the same philosophy, will turn out upon more careful consideration to be overly conservative. That is, we conclude that the strengthening ENSO-ISM teleconnection in the MPI-GE model data might not be robust, that is, that there is room for hypothesis (I) (Section 1).
Besides the soundness of the statistical hypothesis test, in the sense in which that of Yun and Timmermann (2018), or rather ours in Supporting Information S1 Part 1, contrasts with that of Kumar et al. (1999), the main focus of the present paper was the soundness of other aspects of the statistical analysis, namely, the choice of the quantity for the purpose of quantifying the teleconnection strength and its forced changes. The conceptually sound choice, the ensemble-wise correlation coefficient, r, is unfortunately not an option when it comes to observational data. We assessed therefore what mistakes are expected in the case of some alternatives based on temporal statistics. We found (Section 3) that omitting detrending, that is, directly evaluating moving window correlation coefficients, r_{τ,d}, can potentially incur considerable biases, or nonergodicity, which can corrupt the conclusion of a study (Figure 1). Therefore, it is worth using a somewhat more involved definition of a correlation coefficient, r_{τ,p}, that is based on preprocessing time series by high-pass filtering (Kumar et al., 1999). The latter too is biased, but its bias is negligible for long enough windows when the processes of interest are more like white noise (Section 2). On the multidecadal timescales of interest, Lovejoy (2015) posits that most observables behave like this, being more like white noise (the power exponent of the power spectrum being β = 0) than random walk (β = 2). These scenarios correspond to ϵ ≪ 1 and ϵ > 1 in our Equation 1, respectively.
The main remaining source of the bias, that is, nonergodicity, is a nonlinear forced change of r (SN). Therefore, if the statistical test introduced in Bódai et al. (2021) is performed involving r_{τ,p} instead of r_{τ,d}, then nonergodicity could robustly imply nonlinearity (SN). Instead of using r_{τ,p}, for Figure S44 of Bódai et al. (2021) (showing the p-values of detecting nonergodicity), we evaluated r_{τ} using the true anomalies x′, y′. From our present analysis it turns out that not only did it eliminate (SD), as intended, but also (SL). Therefore, Figure S44 of Bódai et al. (2021) does robustly imply a nonlinear forced change of r (SN).
We note that our concern in this paper, namely what sources of nonergodicity for teleconnections there might be, is purely conceptual and mathematical. Concerning any particular teleconnection, however, there might be a physical explanation of how a particular source, for example, a nonlinear change of the teleconnection, comes about. As for the ENSO-ISM teleconnection, in particular, dedicated analysis should be devoted to the nonlinear change of the teleconnection in the MPI-GE. A trivial reason for a nonlinear change could be the nonlinear forcing signal even when the response characteristic is linear (Bódai, Lucarini, et al., 2020). Or, a nonlinear response characteristic can emerge via a shift of spatial patterns (Lee & Bódai, 2021) besides a change of intensity. Such an effect might also make the projections of the ENSO-ISM teleconnection change not at all robust across different models. In Bódai et al. (2021) we identified one source of nonlinearity being the nonlinear/nonmonotonic change of ENSO variability (Kim et al., 2014).
The bottom line is that (SN) is the only considerable source of bias that could potentially weaken the detectability (without making it liberal) of the forced change of the teleconnection using single realization observational data. However, often it might not even matter, given that such changes are hardly detectable anyway (Supporting Information S1 Part 1). In modeling studies, on the other hand, we have the possibility of generating a large enough ensemble to detect change (Bódai, Drótos, et al., 2020; Bódai et al., 2021). Our observation is that in that setting the use of typically biased temporal statistics instead of the sound ensemble-wise statistics (Figure 1) should be explicitly justified or avoided (Bódai et al., 2021).
Acknowledgments
T. Bódai was supported by the Institute for Basic Science (IBS) under IBS-R028-Y1, and S. Aneesh and J.-Y. Lee under IBS-R028-D1. T. Bódai thanks Jan Vrbik for useful discussions about the Fisher transformation and possible improvements; Kyung-Sook Yun for calling his attention to Yun and Timmermann (2019) on high- and low-frequency ENSO teleconnections; and Gábor Drótos for many useful discussions and feedback on the manuscript. We are grateful to the Indian Institute of Tropical Meteorology for providing the AISMR data (Parthasarathy et al., 1994; Indian Institute of Tropical Meteorology, n.d.), and to the NOAA ESRL Physical Sciences Laboratory for providing the Niño 3 time series (Rayner et al., 2003; NOAA ESRL Physical Sciences Laboratory, n.d.). We are thankful to B. Stevens, T. Mauritsen, Y. Takano, and N. Maher for providing early access to the output of the MPI-ESM ensembles via Gábor Drótos, also used in Bódai, Drótos, et al. (2020). We would like to acknowledge the constructive feedback from two anonymous reviewers which helped improve the quality of the presentation.
Open Research
Data Availability Statement
No new data have been used for this work. The AISMR data (Parthasarathy et al., 1994) is available from Indian Institute of Tropical Meteorology (n.d.); the Niño 3 data (Rayner et al., 2003) is available from NOAA ESRL Physical Sciences Laboratory (n.d.); the MPI-GE data (Maher et al., 2019) is available from Max Planck Institute for Meteorology (n.d.). See also Section 2.1 of Bódai et al. (2021) on the MPI-GE data availability with details as to variable names and codes. Matlab codes to reproduce the results in this paper, including the analysis scripts and Matlab data files for Niño 3 and AISMR data previously (Bódai et al., 2021) extracted from the SST and precipitation NetCDF files of the MPI-GE data set, are available at https://github.com/bodait/Nonergodicityteleconnections or https://zenodo.org/record/6099874, https://doi.org/10.5281/zenodo.6099874 (Bódai, 2022).