When Does the Lorenz 1963 Model Exhibit the Signal‐To‐Noise Paradox?

Seasonal prediction systems based on Earth System Models (ESMs) exhibit a lower proportion of predictable signal to unpredictable noise than the actual world. This puzzling phenomenon has been widely referred to as the signal‐to‐noise paradox (SNP). Here, we investigate the SNP in a conceptual framework of a seasonal prediction system based on the Lorenz 1963 Model (L63). We show that the SNP is not apparent in L63 if the uncertainty assumed for the initialization of the ensemble is equal to the uncertainty in the starting conditions. However, if the uncertainty in the initialization overestimates the uncertainty in the starting conditions, the SNP is apparent. In these experiments, the metric used to quantify the SNP also shows a clear lead‐time dependency on subseasonal timescales. We therefore formulate, as an alternative to the hypotheses of previous studies, the hypothesis that the SNP could also be related to the magnitude of the initial ensemble spread.

Previous attempts at identifying the origin of the SNP have focused on either deficiencies in the model formulation or statistical uncertainties. Studies focusing on model deficiencies proposed hypotheses such as too weak stratosphere-troposphere interactions (Stockdale et al., 2015) and too weak ocean-atmosphere coupling (Ossó et al., 2020; Smith et al., 2014). The latter is substantiated by the apparently too weak persistence of the NAO (Zhang & Kirtman, 2019) and surface air temperature (Sévellec & Drijfhout, 2019) identified in uninitialized simulations. Since ocean-atmosphere feedback improves with increasing resolution, Sévellec and Drijfhout (2019) hypothesized that higher resolution prediction systems may suffer less from the SNP; however, Scaife et al. (2019) find no discernible impact of atmospheric resolution. In contrast, Weisheimer et al. (2019) focus on statistical uncertainties in the quantities used to assess the level of underconfidence. Furthermore, Strommen and Palmer (2018) and Strommen (2020) propose that the SNP might be related to deficiencies of the models in representing the regime behavior of the NAO. Despite these previous efforts, there is so far no scientific consensus on the origin of the SNP.
Here, we investigate the occurrence of the SNP in a conceptual framework for an ensemble-based prediction system based on the dynamical Lorenz 1963 Model (Lorenz, 1963). We conduct a variety of experiments using different parameters in the initialization of the ensemble. We then analyze these experiments to examine under which conditions the SNP occurs for seasonal predictions in our framework. We also analyze how the SNP evolves on subseasonal timescales in this framework, aiming to identify characteristics of the SNP that future studies might relate to the SNP in comprehensive ESMs.

General Model Description
The Lorenz 1963 model (Lorenz, 1963) describes atmospheric convection in an idealized setup and comprises the following three equations: $\dot{x} = \sigma (y - x)$, $\dot{y} = x (\rho - z) - y$, $\dot{z} = x y - b z$, where the dot denotes the derivative of a quantity with respect to time. The three dimensionless parameters ρ, σ and b describe the physical properties of the fluid and the idealized setup (for details see Lorenz, 1963). We set them as σ = 10, ρ = 28, and b = 8/3. All numerical integrations are done using a fourth-order Runge-Kutta method and a timestep of Δt = 0.01.
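The integration described above can be sketched in plain Python. This is a minimal illustration and not the authors' code: the function names `lorenz63`, `rk4_step`, and `integrate` are hypothetical, and only the standard L63 equations, the parameter values, the classical fourth-order Runge-Kutta scheme, and the timestep are taken from the text.

```python
# Minimal sketch of the L63 integration; all function names are illustrative assumptions.

def lorenz63(state, sigma=10.0, rho=28.0, b=8.0 / 3.0):
    """Return the time derivatives (dx/dt, dy/dt, dz/dt) of the L63 system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - b * z)


def rk4_step(state, dt=0.01):
    """Advance the state by one timestep with the classical fourth-order Runge-Kutta scheme."""

    def shifted(base, slope, h):
        # base + h * slope, component by component
        return tuple(s + h * k for s, k in zip(base, slope))

    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b2 + 2 * c + d)
        for s, a, b2, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(state, n_steps, dt=0.01):
    """Return the trajectory as a list of n_steps + 1 states, including the start."""
    trajectory = [tuple(float(v) for v in state)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(trajectory[-1], dt))
    return trajectory


# Short demonstration run from the initial conditions [1, 1, 1] used for the spinup.
trajectory = integrate((1.0, 1.0, 1.0), n_steps=1000)
```

A longer run only requires a larger `n_steps`; the list-of-tuples layout is chosen for readability rather than speed.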

Creation of Idealized Hindcast Experiments
In the following, the general setup of seasonal hindcast experiments in the L63 system is introduced, partly following Saetra et al. (2004):
1. Spinup. The model is integrated starting with the initial conditions [1,1,1] for 100,000 timesteps, allowing the trajectory to evolve to the model attractor.
2. Generating initial conditions. The model is integrated for a further 100,000 timesteps starting from the last timestep of the model spinup. The set of all states that occur during this phase serves as a pool of initial conditions for the following hindcast experiments (Figure 1a, blue line).
3. Forecast.
(a) A random state from the pool of initial conditions is chosen as the initial state of the real world (Figure 1a, black dots).
(b) The reference run is acquired by integrating the model for 2.000 time units starting from the initial state of the real world.
(c) The observation of the initial state of the real world is acquired by adding normally distributed noise with zero mean to each dimension of the initial state. The standard deviation (observational spread) is denoted σ_o and is the same for each dimension.
(d) A 100-member ensemble is initialized by adding normally distributed noise with zero mean to each dimension of the observation. The standard deviation (initial ensemble spread) is denoted σ_e and is the same for each dimension.
(e) The time series for verification of the hindcast (verifying analysis) is obtained by adding normally distributed noise with zero mean to each dimension of the reference run, with standard deviations corresponding to the observational spread for each dimension.
4. Hindcast experiment. The forecast (Step 3) is repeated 100 times, each time with a different randomly chosen initial state. The set of these 100 forecasts combined with the verifying analysis is denoted a hindcast experiment.
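Steps 3(c) to 3(e) above differ only in what the noise is added to, which a short sketch may make easier to see. The code below is an illustration under stated assumptions: the names `perturb` and `initialize_forecast` and the fixed seed are hypothetical, and the reference run is passed in as a list of states rather than computed here.

```python
import random


def perturb(state, spread, rng):
    """Add zero-mean Gaussian noise with standard deviation `spread` to each dimension."""
    return tuple(v + rng.gauss(0.0, spread) for v in state)


def initialize_forecast(true_state, reference_run, sigma_o, sigma_e,
                        n_members=100, seed=0):
    """Sketch of steps 3(c)-(e): observation, ensemble initial states, verifying analysis."""
    rng = random.Random(seed)  # assumption: a fixed seed, only for reproducibility
    # (c) observation of the initial state of the real world
    observation = perturb(true_state, sigma_o, rng)
    # (d) members are drawn around the observation, not around the true state
    members = [perturb(observation, sigma_e, rng) for _ in range(n_members)]
    # (e) verifying analysis: the reference run with observational noise added
    verifying_analysis = [perturb(state, sigma_o, rng) for state in reference_run]
    return observation, members, verifying_analysis


# Stand-in reference run with made-up values, not model output.
reference_run = [(1.0, 2.0, 3.0), (1.1, 2.1, 3.1)]
observation, members, verifying_analysis = initialize_forecast(
    reference_run[0], reference_run, sigma_o=0.01, sigma_e=0.1
)
```

In this sketch the high initial spread case corresponds to `sigma_e` ten times larger than `sigma_o`, as in the call above.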

Experiment Descriptions
We conduct four sets of experiments in total, each comprising multiple hindcast experiments, which are all calculated over 100 years and with 100 ensemble members, using different and randomly chosen initial states. The first set consists of 100 hindcast experiments in which the initial ensemble spread and the observational spread are equal (σ_e = σ_o = 0.01); it is referred to in the following as the equal initial spread set. The second and third sets also comprise 100 hindcast experiments each, but the initial ensemble spread underestimates or overestimates the observational spread (σ_o = 0.01) by one order of magnitude; they are referred to as the low initial spread set and the high initial spread set, respectively. The fourth set (incremental initial spread) consists of 501 experiments in which the initial ensemble spread is incremented over two orders of magnitude from σ_e = 0.001 to σ_e = 0.1, spaced evenly on a log scale, with constant observational spread σ_o = 0.01.
MAYER ET AL. 10.1029/2020GL089283

Postprocessing
The raw hindcast experiments (forecasts and verifying analysis) are averaged over 0.1 time units (10 timesteps) using a running mean. According to Palmer (1993), this period amounts to approximately 1 day in the real atmosphere. A month then corresponds to the average over 30 consecutive days (300 timesteps) and accordingly we define the season as the average over 90 consecutive days (900 timesteps). For a seasonal prediction (Figure 1b) we leave a gap of 1 month (300 timesteps) between the initialization and the beginning of the season, as is general practice in some operational seasonal forecasting systems (e.g., Fröhlich et al., 2020; Johnson et al., 2019).
We normalize the timeseries by subtracting their mean and dividing by the standard deviation. Mean and standard deviation are calculated over time and ensemble members. In this study all results are obtained for predictions of the x-component.
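The averaging and normalization described above can be sketched as follows. The function names are hypothetical, and two simplifications are assumptions of this sketch: index 0 of a series is taken to be the initialization time, and the seasonal mean is computed directly over the 900 timesteps of the season (which equals the mean of 90 non-overlapping daily means). The use of the population standard deviation in `normalize` is also an assumption.

```python
from statistics import fmean, pstdev

STEPS_PER_DAY = 10     # 0.1 time units at a timestep of 0.01
DAYS_PER_MONTH = 30
DAYS_PER_SEASON = 90


def running_mean(series, window=STEPS_PER_DAY):
    """Running mean over `window` consecutive values (output is shorter by window - 1)."""
    return [fmean(series[i:i + window]) for i in range(len(series) - window + 1)]


def seasonal_mean(series, gap_days=DAYS_PER_MONTH, season_days=DAYS_PER_SEASON):
    """Mean over one season that starts `gap_days` after initialization (index 0)."""
    start = gap_days * STEPS_PER_DAY
    stop = start + season_days * STEPS_PER_DAY
    if len(series) < stop:
        raise ValueError("series is too short for the requested season")
    return fmean(series[start:stop])


def normalize(values):
    """Subtract the mean and divide by the standard deviation of `values`.

    To normalize over time and ensemble members, pass the flattened
    collection of all seasonal means.
    """
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std for v in values]
```

With these conventions a forecast needs at least 1,200 timesteps after initialization: 300 for the gap and 900 for the season.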

Calculation of the RPC
To test whether the amount of predictable signal relative to total variability is comparable in model and actual world, Eade et al. (2014) introduced the ratio of predictable components (RPC) as the ratio between the predictable component in the actual world (PC_act) and the predictable component in the model (PC_mod): $\mathrm{RPC} = \mathrm{PC}_{\mathrm{act}} / \mathrm{PC}_{\mathrm{mod}}$. The predictable component in the actual world is estimated by the actual predictability (ACP), calculated as the correlation coefficient between the mean of the full ensemble and the verifying analysis. In this study we estimate the predictable component in the model by the model predictability (MOP) using the so-called perfect model approach (e.g., Weisheimer et al., 2019). We first choose a single ensemble member as a substitute for the verifying analysis and calculate the ensemble mean over the remaining reduced 99-member ensemble. The correlation between the reduced ensemble mean and the substitute for the verifying analysis yields one sample of the MOP. This process is repeated for every ensemble member. The mean over these samples is then used to estimate the mean MOP (e.g., Ehsan et al., 2013; Kumar et al., 2014).
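The ACP, the leave-one-out MOP, and their ratio can be written down compactly. The following is a sketch under assumptions, not the authors' implementation: the function names are hypothetical, `members[m][t]` is assumed to hold the (normalized) seasonal prediction of member m for start date t, and the RPC is estimated as ACP divided by mean MOP, following the estimators named in the text.

```python
from statistics import fmean


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def ensemble_mean(members):
    """Average over members for every start date."""
    return [fmean(values) for values in zip(*members)]


def actual_predictability(members, verifying_analysis):
    """ACP: correlation between the full ensemble mean and the verifying analysis."""
    return pearson(ensemble_mean(members), verifying_analysis)


def model_predictability(members):
    """Mean MOP: each member in turn replaces the verifying analysis."""
    samples = []
    for i, substitute in enumerate(members):
        reduced = members[:i] + members[i + 1:]  # ensemble without member i
        samples.append(pearson(ensemble_mean(reduced), substitute))
    return fmean(samples)


def ratio_of_predictable_components(members, verifying_analysis):
    """RPC estimated as ACP / mean MOP."""
    return actual_predictability(members, verifying_analysis) / model_predictability(members)
```

A value above one then indicates an underconfident ensemble and a value below one an overconfident ensemble, as used in the remainder of the text.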

Calculation of the RPC for Smaller Ensemble Sizes
In order to investigate the dependency of the SNP on the number of ensemble members, we also select reduced ensembles from the full 100-member ensemble according to the following procedure. First, a random permutation of the full ensemble is generated. The first ensemble member in this permutation is considered to be the substitute for the verifying analysis. Subsequently, ensemble mean predictions of different ensemble sizes are generated by iterative averaging over the remaining ensemble members: the second ensemble member in this specific permutation gives one possibility of a 1-member ensemble, the average over the second and third ensemble members gives one possibility of a 2-member ensemble, and so forth. The correlation of the ensemble mean prediction for every ensemble size with the verifying analysis (substitute for the verifying analysis) then yields one sample for the ACP (MOP). This process is repeated 100 times, starting each time with a different permutation, and the mean over all samples is then used to calculate the mean ACP and MOP.
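The permutation procedure above amounts to a cumulative average over a shuffled ensemble. The sketch below illustrates it under assumptions: the function name `skill_by_ensemble_size`, the fixed seed, and the data layout `members[m][t]` are hypothetical and not taken from the original code.

```python
import random
from statistics import fmean


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def skill_by_ensemble_size(members, verifying_analysis, n_permutations=100, seed=0):
    """Return {size: (mean ACP, mean MOP)} for ensemble sizes 1 .. len(members) - 1."""
    rng = random.Random(seed)
    sizes = range(1, len(members))
    acp = {size: [] for size in sizes}
    mop = {size: [] for size in sizes}
    for _ in range(n_permutations):
        order = rng.sample(members, len(members))  # random permutation of the ensemble
        substitute = order[0]                      # stands in for the verifying analysis
        running_sum = [0.0] * len(verifying_analysis)
        for size, member in enumerate(order[1:], start=1):
            # extend the ensemble by one member and update its mean
            running_sum = [s + v for s, v in zip(running_sum, member)]
            mean_prediction = [s / size for s in running_sum]
            acp[size].append(pearson(mean_prediction, verifying_analysis))
            mop[size].append(pearson(mean_prediction, substitute))
    return {size: (fmean(acp[size]), fmean(mop[size])) for size in sizes}
```

Keeping the per-permutation samples instead of only their means would additionally give the spread across permutations, such as an interquartile range.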

Selection of Representative Experiments
When we conduct multiple hindcast experiments with different randomly chosen initial conditions, we select one experiment whose ACP and RPC are both close to their corresponding mean values over all hindcast experiments. This selection is done by first normalizing the ACP as well as the RPC over all hindcast experiments and then calculating the root mean square (RMS) of the normalized ACP and RPC index. The hindcast experiment with the smallest RMS value is considered to be representative of the average behavior.
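One way to read the selection rule above is sketched below. It rests on assumptions that the text does not spell out: normalization is taken to mean z-scores with the population standard deviation, and the RMS is taken over the two normalized values of each experiment. The names `zscores` and `select_representative` are hypothetical.

```python
from statistics import fmean, pstdev


def zscores(values):
    """Normalize values across experiments: subtract the mean, divide by the std."""
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std for v in values]


def select_representative(acp_values, rpc_values):
    """Index of the experiment whose ACP and RPC are jointly closest to their means."""
    z_acp, z_rpc = zscores(acp_values), zscores(rpc_values)
    # RMS of the two normalized indices, one value per experiment
    rms = [((a * a + r * r) / 2) ** 0.5 for a, r in zip(z_acp, z_rpc)]
    return min(range(len(rms)), key=rms.__getitem__)
```

Because both indices are normalized first, neither the ACP nor the RPC dominates the selection merely because of its larger numerical range.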

Dependency of Reliability in Seasonal Predictions on Initial Ensemble Spread and Observational Uncertainty
We first analyze the SNP for seasonal predictions of the x-component in the Lorenz 1963 Model with a lead time of 1 month for the equal initial spread experiments. Since the predictability in the Lorenz 1963 system depends on the position of the initial state on the Lorenz attractor (e.g., Huai et al., 2017), estimates of ACP and MOP are also subject to uncertainty related to the position of the initial states chosen. Therefore, both ACP and RPC exhibit variability across the 100 different hindcast experiments with the same general model setup but different initial conditions (Figure 2a). Our analysis shows that the sample mean ACP over all 100 hindcast experiments in the equal initial spread set amounts to 0.515 ± 0.008, which is comparable to the hindcast skill in a typical seasonal forecast system for the winter NAO (e.g., Dobrynin et al., 2018; Baker et al., 2018; Scaife et al., 2014). The distribution of the RPC over all 100 equal initial spread hindcast experiments exhibits a peak around the value of one (Figure 2a) and the sample mean RPC amounts to 0.985 ± 0.013, which is close to the perfectly reliable case, for which the RPC is expected to be equal to one. We then select the one hindcast experiment out of the 100 (Figure 2a) that is representative of the average behavior (Section 2.6). This representative experiment is further analyzed for the ACP as well as the MOP at different ensemble sizes, allowing us to investigate the uncertainty associated with the selection of ensemble members independently of the uncertainty associated with the position of the initial states. Our results show that, independent of the ensemble size, the mean ACP is within the interquartile range of the MOP for different permutations of the ensemble, and therefore ACP and MOP are found to be almost equal (Figure 2b).
These results obtained in our equal initial spread experiment set indicate that, while individual hindcast experiments might be overconfident (RPC < 1) or underconfident (RPC > 1) simply by chance, the average of the RPC over all hindcast experiments appears to be close to reliable (RPC = 1). This finding is also independent of the ensemble size and robust with respect to random permutations of the ensemble for a representative hindcast experiment. In comparison, the results of a similar analysis for seasonal predictions of the NAO in a comprehensive ESM report values for the RPC larger than two.
Therefore, if the model is perfect and the initial ensemble spread represents the observational spread, the conceptual framework does not show the SNP.
The previous result raises the question whether this overall statement about the occurrence of the SNP in our conceptual framework changes if the initial ensemble spread and the observational spread are not equal. To test whether this is the case, we analyze the seasonal forecasts of the x-component for the high initial spread experiment set and low initial spread experiment set.
In the case of low initial spread (Figure 3a  sample mean RPC, not even in the extremes of the distribution (Figure 3a). For the representative experiment (Section 2.6), the mean MOP exceeds the mean ACP independent of the ensemble size (Figure 3b). Furthermore, the mean ACP lies outside the range of the MOP acquired by random permutations of the ensemble. The ensemble appears to be overconfident.
In contrast, seasonal forecasts in the high initial spread case exhibit a sample mean ACP (RPC) of 0.315 ± 0.008 (1.369 ± 0.038) and RPC = 1 is only just included in the extremes of the distribution. It is worth noting that the ACP, as well as the mean RPC found in this study for the high initial spread case, are lower than the ones found in seasonal predictions of the winter NAO in Scaife and Smith (2018). However, the investigated GloSea5 system (MacLachlan et al., 2014) belongs to the systems with the highest reported skill and the results of Baker et al. (2018) suggest that systems with higher skill exhibit higher values of the RPC.
The analysis of the representative experiment (Section 2.6) in the high initial spread set shows that the mean ACP exceeds the mean MOP independent of the ensemble size (Figure 3d). For most ensemble sizes the mean ACP also exceeds the interquartile range of the MOP for different permutations of the ensemble; however, it still lies within the range of the extremes. Compared to the equal and low initial spread sets, the uncertainty associated with the permutation of the ensemble is highest in the case of high initial spread.
These results show that increasing (decreasing) the ratio of initial ensemble spread to observational spread in the initialization by one order of magnitude leads, on average, to underconfident (overconfident) hindcasts (Figures 3a-3f), with the former resembling the situation empirically found in seasonal predictions of the NAO in comprehensive ESMs (e.g., Baker et al., 2018). Similar to the previous sets of experiments, we first analyze the RPC in the incremental initial spread case for seasonal predictions in the Lorenz 1963 Model for each individual hindcast experiment. The analysis of seasonal forecasts in the incremental initial spread set shows a systematic dependency of the RPC on the ratio of initial ensemble spread to observational spread (Figure 3e). A linear ordinary least squares regression of the RPC on the decimal logarithm of the ratio of initial ensemble spread to observational spread reveals a statistically significant slope (p < 0.01) of 0.450 ± 0.014. This means that, depending on the size of the initial ensemble spread relative to the observational spread, the seasonal forecasts based on the Lorenz 1963 Model can appear to be overconfident or underconfident, and in general the level of over- or underconfidence increases with increasing mismatch between initial ensemble spread and observational spread. Since these results are all obtained in a perfect model framework, they suggest that, even if the comprehensive ESMs at the core of seasonal prediction systems were perfect, the same paradoxical result found in these systems could also be obtained simply by initializing the ensemble with an initial spread that is too large compared to the observational spread.
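The regression described above relates one RPC value per experiment to the decimal logarithm of the spread ratio. The sketch below sets up the 501 log-spaced initial spreads of the incremental set and a plain least squares fit. It is an illustration under assumptions: the names `log_spaced` and `ols_fit` are hypothetical, and the sketch returns only slope and intercept, not the uncertainty of the slope or the p-value reported in the text.

```python
from math import log10
from statistics import fmean


def log_spaced(start, stop, n):
    """n values from start to stop (inclusive), evenly spaced on a log scale."""
    lo, hi = log10(start), log10(stop)
    return [10 ** (lo + (hi - lo) * i / (n - 1)) for i in range(n)]


def ols_fit(x, y):
    """Slope and intercept of the least squares line y = intercept + slope * x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


SIGMA_O = 0.01                                    # constant observational spread
initial_spreads = log_spaced(0.001, 0.1, 501)     # initial ensemble spread per experiment
log_ratio = [log10(s / SIGMA_O) for s in initial_spreads]  # regression predictor
```

Given one RPC value per experiment in a list (called `rpc_values` here purely for illustration), `ols_fit(log_ratio, rpc_values)` returns the slope and intercept of the fitted line.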

Lead-Time Dependency of Statements About Over- or Underconfidence on Subseasonal Timescales
Using the same representative experiments chosen from the three different sets of hindcast experiments (low initial spread, equal initial spread, high initial spread), we further analyze the evolution of the RPC on monthly timescales for different sizes of the ensemble (Figures 4a-4f). In the 1st month after initialization, all three representative hindcast experiments exhibit values of the ACP and MOP close to one for all sizes of the ensemble (Figures 4a, 4c, and 4e).
In the 2nd month after initialization the difference between the experiments becomes clearly apparent (Figures 4b, 4d, and 4f). The representative low initial spread experiment is overconfident with the MOP exceeding the ACP over all sizes of the ensemble (Figure 4b). The representative equal initial spread experiment is close to reliable with ACP and MOP being almost equal over all sizes of the ensemble (Figure 4d). The representative high initial spread experiment is underconfident with the ACP exceeding the MOP over all sizes of the ensemble (Figure 4f).
We investigate this time-dependency further in the incremental spread set for predictions of monthly means at different lead-times with 1 day increments (Figure 4g)  the initial ensemble spread is larger (smaller) than the observational spread until it reaches a maximum (minimum) shortly before the prediction skill is lost. When the prediction skill is lost, the absolute RPC takes on large positive numbers by construction.
Our results for the SNP on subseasonal timescales indicate that in the 1st month after the initialization, hindcast experiments with different ratios of initial ensemble spread to observational spread show little deviation of the RPC from the expected value of one. However, the only difference between these experiments is their ratio of initial ensemble spread to observational spread, which, even though not clearly apparent in the 1st month after initialization, determines when and to which degree the hindcast evolves to become over- or underconfident in the 2nd month.

Conclusion
Based on four sets of experiments using a conceptual framework for a seasonal prediction system based on the Lorenz 1963 Model (Lorenz, 1963), we conclude that the SNP is not apparent if the initial ensemble spread represents the observational spread. However, the SNP can occur if, in the process of initialization, the ensemble spread is overestimated compared to the observational spread. Since reference runs and forecast runs are time integrations of the same model using the same parameterization, we effectively create a perfect model framework in which uncertainties in the model formulation can be ruled out when interpreting these results. Zhang and Kirtman (2019) conclude that the persistence in uninitialized simulations is weaker in models than in reanalysis products. According to the authors, their results point toward a fundamental model problem instead of a problem in the initialization procedures, which is contrary to the present study. However, their analysis investigates the SNP in uninitialized ESMs, while our study investigates the SNP in an initialized conceptual model, and therefore both setups differ from the initialized ESMs that are subject to the SNP. The degree to which persistence in uninitialized simulations is related to forecast skill in initialized simulations, as well as the degree to which the results in the conceptual model are related to ESMs, remains uncertain in both studies and we therefore find that the two sets of results are not mutually exclusive.
Our results suggest that the magnitude of the initial ensemble spread relative to the observational spread could be an alternative hypothesis on the origin of the SNP. Based on these results, we suggest extending the set of candidates potentially responsible for the SNP and including in further investigations of the SNP an analysis of how well the initial ensemble spread actually represents the observational uncertainty. While it would clearly be desirable to conduct such a study in comprehensive ESMs, such studies will inevitably face challenges such as how to quantify the observational uncertainty and how to separate the effect of initial ensemble spread from other effects occurring in comprehensive models. Nevertheless, our results indicate that, despite these challenges, further investigations in comprehensive ESMs might be worthwhile.

Data Availability Statement
The code used to generate and analyze the data as well as to create all figures has been made available (BjoernMayer92, 2020). The original data generated for this study as well as the code have also been archived in the Climate and Environmental Retrieval and Archive (CERA) (https://cera-www.dkrz.de/WDCC/ui/cerasearch/entry?acronym=DKRZ_LTA_1075_ds00003).