Spatial Covariance Modeling for Stochastic Subgrid‐Scale Parameterizations Using Dynamic Mode Decomposition

Stochastic parameterizations are increasingly being used in climate modeling to represent subgrid‐scale processes. While different parameterizations are being developed considering different aspects of the physical phenomena, less attention is given to technical and numerical aspects. In particular, empirical orthogonal functions (EOFs) are employed when a spatial structure is required. Here, we provide evidence they might not be the most suitable choice. By applying an energy‐consistent parameterization to the two‐layer quasi‐geostrophic (QG) model, we investigate the model sensitivity to a priori assumptions made on the parameterization. In particular, we consider here two methods to prescribe the spatial covariance of the noise: first, by using climatological variability patterns provided by EOFs, and second, by using time‐varying dynamics‐adapted Koopman modes, approximated by dynamic mode decomposition (DMD). The performance of the two methods are analyzed through numerical simulations of the stochastic system on a coarse spatial resolution and the outcomes compared to a high‐resolution simulation of the original deterministic system. The comparison reveals that the DMD‐based noise covariance scheme outperforms the EOF‐based one. The use of EOFs leads to a significant increase of the ensemble spread and to a meridional misplacement of the bimodal eddy kinetic energy (EKE) distribution. Conversely, using DMDs, the ensemble spread is confined, the meridional propagation of the zonal jet stream is accurately captured, and the total variance of the system is improved. Our results highlight the importance of the systematic design of stochastic parameterizations with dynamically adapted spatial correlations, rather than relying on statistical spatial patterns.


Introduction
Geophysical flows involve a multitude of phenomena with vastly different spatial and temporal scales (e.g., Franzke et al., 2019;Vallis, 2006), which interact with each other due to the underlying nonlinear equations of motion. In order to obtain dynamically consistent and stable long-time simulations, geophysical models need, in principle, to cover the whole range of scales. This poses great computational challenges: Processes occurring on spatial scales smaller than the prescribed numerical grid spacing and processes occurring on temporal scales faster than the prescribed numerical time step cannot be resolved. These unresolved subgrid-scale processes nevertheless may be energetically important, such as convective and turbulent processes, which are not resolved by current climate models and may significantly affect the dynamics on the large resolved scales. To capture the effects of the subgrid-scale processes, parameterizations are typically introduced, whereby the unresolved scales are conditioned on the resolved scales (Stensrud, 2007).
Further complications, caused by the inevitable distinction between resolved and unresolved spatial scales, in numerical schemes occur for nonlinear fluid systems which exhibit energy and enstrophy cascades. For atmospheric dynamics, it is well known that enstrophy is transferred from larger to smaller scales, until it is dissipated at the dissipation scale, whereas energy is transported from smaller to larger scales (Dubrulle, 2019;Vallis, 2006). For the majority of models, as for instance for general circulation models, the numerical resolution is not fine enough to resolve the dissipation processes. Subsequently, the enstrophy piles up at the truncation level, making the numerical model unstable and subject to numerical blow-up. In order to guarantee numerical stability, artificial hyperviscosity is introduced, leading to an increased viscosity of the fluid, which dissipates also the kinetic energy. Furthermore, the injection of energy from the unresolved subgrid scales leads to an unphysical grid-size-dependent representation of the kinetic energy.
We consider here a forced and damped two-layer quasi-geostrophic (QG) model, and as stochastic parameterization, we employ the projection operator approach introduced in Frank and Gottwald (2013). The energy-consistent parameterization developed in Frank and Gottwald (2013) had been devised only for a low-dimensional Hamiltonian ordinary differential equation. Subsequently, it was successfully adapted for an unforced inviscid QG model in Gugole and Franzke (2019). However, the spatial covariance of the stochastic parameterization is not specified by the methodology suggested in Frank and Gottwald (2013), and in Gugole and Franzke (2019), it was shown to be crucial for the system to have physically meaningful results. Our aim here is to further investigate the sensitivity of the model dynamics with respect to the definition of the noise covariance. Such a noise covariance is usually determined a priori and is not representative of some specific scale dynamics. Often, pieces of information obtained by means of empirical orthogonal functions (EOFs) (von Storch & Zwiers, 2003) are employed for this purpose (Cotter et al., 2018;Resseguier et al., 2019).
Here we will contrast EOFs, which capture the climatological dominant patterns of the variability and hence focus on the statistics of the field, with spatial covariances, which retain information about the dynamics in order to examine whether they provide superior means for the modeling of stochastic covariance matrices. In particular, we try to address the research question whether a flow-dependent covariance matrix performs better than a constant covariance matrix. For this purpose, we consider here patterns derived by means of dynamic mode decomposition (DMD) (Kutz et al., 2016;Schmid, 2010Schmid, , 2011. DMD is a computationally cost-effective algorithm attempting to compute a finite-dimensional approximation of the Koopman operator. The infinite-dimensional Koopman operator encodes the dynamics of a dynamical system and propagates observables in time (Lasota & Mackey, 1994). The intimate relationship between DMD modes and the eigenfunctions of the Koopman operator was established in Rowley et al. (2009). The patterns extracted by the DMD method, the so called DMD or Koopman modes, describe the dominant dynamical structures, and their corresponding eigenvalues characterize their temporal oscillation periods and their growth rates. In contrast to EOFs, DMD decomposes the dynamics according to its local in time oscillatory behavior. Connections between DMD and other model reduction techniques such as EOFs, singular vectors, and linear inverse modeling (Penland, 1989;Penland & Magorian, 1993) are discussed in Schmid et al. (2011) and Tu et al. (2014). By projecting the full system onto the subspace spanned by the leading DMD modes, the governing equations may be approximated by a low-dimensional dynamical system to study flow stability and bifurcations, among other characteristics (Bagheri, 2013;Jovanovic et al., 2014;Noack et al., 2016;Schmid, 2010;Schmid et al., 2009Schmid et al., , 2011. Here we shall use DMDs to construct the spatial structure of the noise covariance matrix. DMDs have the same numerical complexity as EOFs and have the advantage of using information of the system on the fly. For more details on DMDs and its limitations in approximating Koopman modes, the interested reader is referred to Tu et al. (2014) and Williams et al. (2015). Our analyses will be based on the comparison between the stochastic system run on a coarser spatial resolution and a simulation of the deterministic model but on a finer spatial grid.
In contrast to approaches attempting to determine subgrid-scale information from highly resolved simulations (e.g., Berloff, 2005;Franzke et al., 2005;Hermanson et al., 2009;Porta Mana & Zanna, 2014), our approach using DMDs has the potential to seamlessly adapt to any grid resolution and is, hence, scale adaptive. In fact, when changing resolution, the DMD approach automatically takes as input the new data and computes the modes at the newly defined resolution without requiring any extra offline computation. Our results show that the use of a dynamically adapted noise covariance keeps the ensemble spread confined and the meridional propagation of the zonal jet is better captured than with EOFs.
The remainder of this paper is structured as follows: In section 2, we introduce the forced and damped two-layer QG model. Section 3 describes the energy-consistent stochastic parameterization scheme. The spatial covariance of the noise is determined in section 4 using EOF and DMD analysis. Section 5 presents results from numerical simulations exploring the effect of employing either climatological or dynamically adapted spatial covariances. We conclude with a discussion in section 6.

The QG Model
We consider the nondimensional forced and damped two-layer QG equations on a β-plane with double periodic boundary conditions (Vallis, 2006). This model represents synoptic-scale atmospheric dynamics around the midlatitudes based on the QG approximation and simulates a jet-like zonal flow when suitable values for the parameters and forcing are chosen. A vertical structure of two discrete layers, which we assume to have equal depth, is the minimal vertical resolution that allows the representation of baroclinic processes (Holton, 2004).
Subgrid-scale eddies and bottom friction are modeled by biharmonic viscosity, while in the upper layer (i.e., i = 1), large-scale forcing is provided by a prescribed background-flow U = 0.6 as, for instance, in Cotter et al. (2018) and Jansen and Held (2014). The external forcing leads to the formation of a jet stream with nontrivial meridional structure whose location experiences meridional shifts-a prominent feature of the observed atmospheric jet stream (Feldstein, 1998;James & Dodd, 1996;Riehl et al., 1950). Since we consider a nondimensional description, the horizontal extensions have been rescaled to a 2π × 2π square. Finally, the evolution equations for the potential vorticities (PVs) q i of layer i ∈ {1,2} Þþ βy on the horizontal plane x ¼ ðx yÞ T ∈ R 2 , where x and y denote the zonal and the meridional directions respectively, read where ψ i (x,t)i ∈ {1,2} are the corresponding streamfunctions and τ f = 10 the frictional time scale. The term Þ 2 quantifies the strength of the shear between the two layers and, hence, also the intensity of the baroclinic instability (N = 1.2·10 −2 being the Brunt-Väisälä frequency, h = 200 the mean depth of the layers, and f≈f 0 +βy the approximate Coriolis term with f 0 = 1 and β = 0.509). These values imply a Rossby deformation radius k − 1 d ≈ 0:85 . In this setting, one nondimensional time unit corresponds to roughly 2.5 days. The strength of the effective damping of the subgrid-scale eddies is quantified by ν i = ν(ψ i ). We follow Jansen and Held (2014) and Leith (1996) and set where C Leith = 0.005 is an empirical constant and Δ is the size of the numerical grid spacing. ∇ and ∇ 2 denote, respectively, the horizontal gradient and the Laplacian operator, while the Jacobian operator J is defined as In order to have a better defined distinction between slow and fast modes, we rewrite Equations 1 as barotropic and baroclinic modes by assuming that barotropic modes evolve more slowly than baroclinic modes. Barotropic and baroclinic streamfunctions, ψ B and ψ T , can be defined as which lead to the corresponding barotropic and baroclinic PVs, q B and q T , It can easily be shown that barotropic and baroclinic PVs can also be written as and we can use these relations to determine the evolution equations for q B and q T from (1). After some manipulations, we obtain where the derivative operator d is only with respect to time, and the biharmonic viscosity coefficient has been decomposed in its constant and nonconstant parts. The unforced inviscid part of System 3 is Hamiltonian with the Hamiltonian H given by corresponding to the total energy. The Hamiltonian allows for the following relationships, which we will use in the next section, For a general review of Hamiltonian mechanics and its application to geophysical fluid dynamics, see, for instance, Badin and Crisciani (2018), Salmon (1988), and Shepherd (1990).
The numerical truncation affects deeply the dynamics by introducing a larger error at coarser resolutions. In particular, since smaller scales are not represented, the reinjection of kinetic energy from the unresolved into the resolved scales is reduced. This implies that the kinetic energy is dependent on the grid resolution (Dwivedi et al., 2019;Jansen & Held, 2014), leading, for instance, to a misrepresentation of the eddy kinetic energy (EKE) at coarser resolutions (Juricke et al., 2019;Porta Mana & Zanna, 2014). Since the computational cost of high-resolution simulations is often prohibitive, we aim at recovering the large-scale variability induced by the faster modes, and hence increase the EKE at lower resolutions, by correcting the numerical error through the introduction of a stochastic parameterization for the subgrid scales. In the next section, we present a stochastic parameterization which ensures that the stochastic noise does not break the inherent energy balance of the system.

Energy-Consistent Stochastic Parameterization
Our underlying model assumption is that there are many fast baroclinic modes which drive both the resolved and the large-scale barotropic modes and which can be efficiently represented by a stochastic Ansatz. Since barotropic modes are mainly large-scale, its spectra are dominated by the large scales, and the noise forcing can effectively affect just the baroclinic modes. Hence, as in Gugole and Franzke (2019), we represent the unresolved fast subgrid processes by means of a stochastic forcing, which we assume to act directly on the baroclinic modes and only indirectly on the barotropic modes. In order to introduce only dynamically consistent perturbations, we employ the projection operator method proposed in Frank and Gottwald (2013) to construct a stochastic forcing such that the energy of the unforced inviscid core of the two-layer QG model is preserved. This choice allows to retain the balance between the external forcing and the dissipation while redistributing the energy among the scales. The approach by Frank and Gottwald (2013) also introduces seamlessly state dependent noise and dissipation. This potentially also allows for a realistic representation of subgrid-scale effects as in previous studies (Berner et al., 2009;Dwivedi et al., 2019;Franzke et al., 2015;Jansen et al., 2015) since this approach also predicts the corresponding nonlinear damping. In previous approaches, the damping needed to be tuned in order to ensure numerical stability (Whitaker & Sardeshmukh, 1998;Zhang & Held, 1999). Our approach avoids any empirical tuning of the damping.
Since Gaussian white noise exists only as a distribution, stochastic evolution equations should be interpreted as integral equations (Gardiner, 2009;Pavliotis & Stuart, 2008). Hence, we slightly change notation toward this interpretation, where we dropped the integral symbol in order to have a not too heavy notation. In this work we, adopt Itô's interpretation of the stochastic integrals (Gardiner, 2009). We propose the following stochastically forced modification of the two-layer QG system (1): where W t denotes a Wiener process. The auxiliary stochastic process Y t , which is parametrized by B = B(x, t) and S = S(x, t), is determined to ensure that the stochastic forcing Σ(x, t)dW t preserves the energy given by the Hamiltonian (4) (Frank & Gottwald, 2013). Using Itô's formula (Gardiner, 2009), the change in the energy is given by where the matrix inner product is defined as A:B = a ij b ij = Tr(AB T ), and where Our aim is to control the stochastic forcing in order to preserve the energetic balance between the external forcing and the dissipation. In order to guarantee the total energy not to be affected by the stochastic forcing, we set σ H and the sum of those terms in μ H due to the stochastic processes to be zero. The auxiliary process must be constructed to force the deviations from the manifold of constant energy, caused by the stochastic forcing Σ(x,t)dW t , back onto the manifold. It should therefore only have components orthogonal to the manifold of constant energy. Thus, we define a projection operator P, which projects onto the tangent space of the energy manifold, and we require PS ¼ PB ¼ 0. Since the Wiener process affects only the evolution equation of the baroclinic mode, it is sufficient to project onto the manifold of constant baroclinic energy, and we define the projection operator P as where I stands for the identity operator. Using P ∇ q T H À Á ¼ 0, the condition σ H = 0 provides an expression for S, while it is possible to determine B by considering only the terms of μ H due to the introduction of the stochastic processes: We can now finally express our stochastic forced and damped two-layer QG model (5) as 10.1029/2020MS002115

Journal of Advances in Modeling Earth Systems
The stochastic forced and damped two-layer QG model (6) contains multiplicative noise and nonlinear damping, due to the specific definition of the projection operator. The multiplicative noise is in fact a correlated additive multiplicative (CAM) noise (Majda et al., 2009;Sardeshmukh & Sura, 2009). The interested reader may find more details about the necessary steps for the derivation of (6) in Frank and Gottwald (2013) and Gugole and Franzke (2019). An advantage of the projection operator approach is that it automatically predicts the necessary nonlinear damping in a energetically consistent form. The nonlinear damping is of cubic order and, thus, can ensure global stability (Majda et al., 2009;Peavoy et al., 2015).
In Equations 6, the noise strength Σ(x, t), which specifies the spatial covariance of the noise, is still unspecified. In Gugole and Franzke (2019), it was shown that the choice of a dynamically consistent spatial structure of the noise covariance is crucial for a stochastic parameterization to be reliable. We propose in the next section ways to prescribe the spatial structure.

The Spatial Covariance Structure of the Noise
We prescribe the spatial covariance of the noise by expressing Σ(x,t) through p dynamically relevant patterns of the large-scale dynamics φ i x; t ð Þ, i = 1,…,p. In particular, we write where the γ i ∈ R are weights associated with each pattern.
We shall discuss here two choices of patterns φ i : first, EOFs, which capture time-invariant climatological patterns, and, second, patterns obtained by means of DMD, which describe time-varying, dynamically adapted dominant patterns.

Empirical Orthogonal Functions 4.1.1. Theory
EOFs are a multivariate statistical analysis technique that derives the dominant patterns of variability from an n-dimensional field, usually indexed by location in space (von Storch, 1995;von Storch & Zwiers, 2003). Let X be an n-dimensional random vector, whose mean is assumed to be zero; otherwise, the anomalies of the field with respect to the mean should be considered. At its first stage, the EOF analysis computes the vector φ 1 with φ 1 ¼ 1 such that is minimized, where we denoted with E the expectation operator, the vector norm by · , and the inner product with · ; · h i. Equation 8 describes the projection of the field X onto a 1-D subspace spanned by the vector φ 1 . Minimizing ϵ 1 is equivalent to maximizing the variance of X contained in this subspace; in fact it can be shown that where the variance of X is defined to be the sum of the variances of its elements. Let Γ denote the covariance matrix of X.
It can be shown that φ 1 is an eigenvector of Γ with corresponding eigenvalue λ 1 . Therefore, the minimum of Equation 8 is achieved by the vector associated to the largest eigenvalue of Γ, that is, vector φ 1 .
The same procedure is repeated to find the second EOF, which is the vector φ 2 with φ 2 ¼ 1 minimizing and corresponding to the second largest eigenvalue λ 2 of Γ. Finally, we remark that Γ is an Hermitian matrix; hence, its eigenvectors are orthogonal to one another. Moreover, in case of translationally invariant systems, they correspond to Fourier modes.

Constructing Σ Using EOF
EOFs are computed on a time series of the baroclinic streamfunction (after the dynamics settled on the attractor) of the deterministic System 3 over a spatial grid with 128 × 128 elements.
To construct the spatial structure of the noise, we compute a linear combination of the first p EOF patterns φ EOF i for i = 1,…,p with weights given by the square roots of their corresponding eigenvalues λ EOF i i ¼ 1; …; p writing (7) as Hence, Σ is constant in time and Λ = ΣΣ T corresponds to the variance of the QG model's baroclinic stream function as approximated by the first p EOFs.

EOF Patterns
As in the majority of cases, the spectrum of the EOF eigenvalues rapidly decay, and the first five EOFs carry approximately 95% of the variance in our simulations. Higher EOFs do not carry significant variance and hence might be considered as numerical noise (see graph in the top left corner in Figure 1). EOFs 1 and 2 represent the predominant traveling Rossby wave supported by the two-layer QG model. EOF 3 (graph in the bottom left corner in Figure 1) does not represent any wave but captures the spatial dominant pattern associated with the jet stream. EOFs 4 and 5 capture again dominant wave patterns. In our numerical simulations, we use either only the first two EOFs, corresponding to Λ≈0.36, or the first five EOFs, that is, Λ≈0.47.
EOFs are widely used in climate science, thanks to their robust computability given a large data set. Nonetheless, EOFs have known limitations. In particular, their physical interpretation is restricted. While it is possible to associate the first EOF with observed physical features, this becomes increasingly complicated for higher-order EOFs, because of the orthogonality constraint (von Storch & Zwiers, 2003). We therefore introduce in the next section DMDs, which capture relevant modes, adapted to the prevailing dynamics.

Dynamic Mode Decomposition 4.2.1. DMD and the Koopman Operator
Here we briefly present the Koopman operator and its connection with DMD. Detailed reviews about the Koopman operator can be found, for instance, in Budišić et al. (2012) and Mezić (2013)  Let _ x ¼ f ðxÞ denote a general continuous-time dynamical system with initial condition xð0Þ ¼ x 0 ∈ R n . On the assumption that there exists a unique solution of this initial value problem, it is possible to introduce the flow map ϕ t such that x(t) = ϕ t (x 0 ). Define an arbitrary observable ψ(x). The value of this observable ψ, which the system sees starting in x 0 at time t, is The Koopman operator is a semigroup of operators K t , acting on the space of observables parameterized by time t It is important to underline that the operator K t is linear also in case of nonlinear dynamics f, thus it makes sense to consider its spectral properties, but the eigenfunctions of the Koopman operator are not necessarily linear.
DMD is a data-driven technique for computing an approximation of the Koopman modes. Consider a dynamical system as above, and two sets of data, either of the state variables or of any observable of them, such that where mΔt defines the time window, and δt ≤ Δt determines the accuracy of the reconstructed dynamics.
It is important to mention that matrices X and X′ are assumed to be tall and skinny, that is, it is assumed that the size n of a snapshot is larger than the number m−1 of snapshots. In the DMD algorithm, the Koopman operator is approximated by means of a least square fit operator K δt relating data X′≈K δt X. The numerically stable algorithm, based on a singular value decomposition and outlined for the first time in Schmid (2010) and improved in Tu et al. (2014), allows for a low-rank r ≤ m representation of the operator K δt onto the first r EOF modes of matrix X. Details about the algorithm, as well as a MATLAB © function, are provided in Kutz et al. (2016). The DMD modes φ i are the (complex) eigenvectors of K δt , and they are not orthogonal. Furthermore, they represent dynamically relevant structures, the so-called Koopman modes, whose temporal oscillation periods and growth rates are provided by their associated (complex) eigenvalues λ i . There exists a real eigenvalue λ 0 = 1 with eigenvector φ 0 corresponding to the mean of the observable x. Whereas EOF decomposes the dynamics according to dominant stationary patterns, DMD decomposes the dynamics according to its local in time oscillatory behavior.
We remark that there exists an intimate relationship between the DMD matrix K δt and the Koopman operator, first realized in Rowley et al. (2009). However, it is well established that DMD provides a good approximation of the actual Koopman operator-and hence constitutes a good representation of the underlying dynamics-only in case of sufficiently rich and diverse observations (Budišić et al., 2012;Tu et al., 2014;Williams et al., 2015). The least square approximation of the Koopman operator suggests that a good approximation is guaranteed for sufficiently small δt and for sufficiently small time intervals mΔt such that the dynamics are essentially linear.

Defining the Noise Covariance by Means of DMD
As for EOFs, we choose the baroclinic stream function ψ T to determine the DMD modes. In deterministic systems, the eigenvalues of the Koopman operator lie on the complex unit circle and, apart from the

Journal of Advances in Modeling Earth Systems
eigenvalue λ DMD 0 corresponding to the mean mode, appear as complex conjugate pairs. In stochastic systems, however, eigenvalues inside or outside the unit circle may appear; see Figure 2 for an instance of the DMD eigenvalues for the stochastic QG model (6). Since we want to capture the dynamically relevant patterns of the deterministic QG system, we exclude all eigenmodes φ DMD i whose eigenvalues do not lie on the unit circle (within some tolerance to account for numerical noise). The eigenvectors and eigenvalues are sorted with decreasing real part according to λ DMD we choose also its complex conjugate pair. To give a graphical illustration, the blue dot in Figure 2 corresponds to λ DMD 0 , while the green and orange dots to λ DMD 1 and λ DMD 2 , respectively, and their complex conjugates. The eigenmodes corresponding to the eigenvalues marked in red in Figure 2 are neglected since they are away from the unit circle.
To construct the spatial structure Σ(x,t) of the noise, we choose the first p = 2 dominant DMD patterns φ DMD 1;2 obtained from the low-resolution simulation of the stochastic two-layer QG system (6). Since the eigenvalues and the eigenfunctions are now complex, each mode is considered together with its complex conjugate; hence, Σ reads where ı 2 = −1 and c.c. denotes the complex conjugate. Finally, we normalize Λ = ΣΣ T to be either 47. This is done to ensure that the noise has equal intensity both with EOFs and DMDs and therefore have a fairer comparison of the results. To numerically estimate the first two complex conjugate DMD eigenpairs ðλ DMD i ; φ DMD i Þ for i = 1,2, we choose a small time interval δt = 0.1 (recall that δt needs to be chosen sufficiently small to allow for a reliable estimation of the DMD matrix K δt which encodes the dynamics). Furthermore, we choose a time window of mΔt = 4.8 time units, which corresponds to roughly half an eddy turnover time for the parameters of our setup (see section 5.1 for details), and a separation of snapshots of Δt = 3δt (implying m = 16). When numerically estimating singular value decompositions, only the first few singular vectors are reliable. An optimal truncation criterion was provided in Gavish and Donoho (2014) which, applied to our data, amounts to setting a low-rank approximation with r = 7 eigenmodes. We have tested that for the selected values of the parameters, DMD provides a good reconstruction of the dynamics in a time window of length mΔt time units, as can be seen in Figure 3, where the actual dynamic is shown alongside the DMD reconstruction. Other sets of parameters corresponding to different time windows spanning between 2 and 10 time units have been tested, but this particular choice was the only one among those tested which does not present two eigenvalues with null imaginary part and real part very close to 1. This second mean-mode cannot be excluded by our procedure since the module of its corresponding eigenvalue is still very close to 1, but by plotting

Journal of Advances in Modeling Earth Systems
and comparing it to the other modes, it can be seen that it is numerically spurious and not dynamically meaningful. We tested also the case with m = 48, r = 7, Δt≡δt = 0.1, that is, we considered a time window of the same length, and instead of subsampling-that is, sampling consecutive snapshots in the same data set every Δt>δt-we chose a small value of r, but the results show that subsampling is more efficient in filtering out the numerical noise.
Contrary to EOFs, which require a long offline simulation to be determined, the DMD pairs φ DMD 1;2 and λ DMD 1;2 are computed on the fly after each mΔt time units; hence, in this case, Σ is a function also of time. Since for the first mΔt time units the DMDs are not available yet, Σ is initialized using the first two EOFs. For simplicity, we do not propagate the DMD modes by means of the Koopman operator, but keep them constant for mΔt time units. We have checked that our results do not change much when propagating the DMDs in time to evaluate Σ at each time step. In our setup, the DMD modes do not move much away from the initial state in the selected mΔt time window, and this might be a reason why we obtained similar outcomes. For more complex models, a computationally cheaper alternative might be to recompute the DMD modes less often and to propagate the DMD modes for longer times.

DMD Patterns
In Figure 4, we show real and imaginary parts of the first two DMD modes as computed with the aforementioned set of parameters. The mode representing the mean has been neglected, and only one of the two modes corresponding to a complex conjugate pair of eigenvalues is displayed. Since the DMD analysis is repeated along the simulation, the resulting modes are not exactly the same for the entire run, but the eddies move in the zonal direction. Moreover, the eddies in the first mode slowly shift toward higher latitudes because of the meridional jet movement (as detected by the third EOF eigenvector). Since DMD decomposes the dynamics according to its oscillatory behavior, the jet cannot be represented by a DMD eigenmode for the reason that it is not a wave. Hence, in the DMD decomposition of the dynamics, the jet can be noticed only indirectly via its effect on the other modes. This is particularly evident when looking at the first mode as computed at the beginning (left column in Figure 4) and at the end (central column in Figure 4) of a simulation, when the difference in the meridional coordinate of the eddies is at its maximum. In this specific case, only the first mode is affected by the jet, while the eddies in the other modes retain the same meridional coordinate while revolving in the zonal direction. Hence, for sake of simplicity, we display the second mode only at the onset of a simulation (right column in Figure 4).
Real and imaginary parts of DMD mode number 1 resemble closely EOFs 1 and 2, although in the DMD mode, the eddy patterns look smaller and less regular. Furthermore, the eddies are centered in different meridional coordinates. This is likely due to the fact that EOFs capture directly the jet behavior, which is represented by EOF 3, while DMD perceives it indirectly by noticing the meridional shift of the eddies in the first mode. EOFs 4 and 5 are the most comparable eigenvectors to the second DMD mode (right column in Figure 4), but significant differences can be spotted for y ∈ 0:8; 1:8 ½ , where some eddy structure is present in the EOF vectors but is absent in the DMD mode. This could be an artifact due to the orthogonality constraint of the EOF algorithm.

Results
We now present numerical results comparing outputs of a high-resolution simulation of the deterministic forced and damped two-layer QG model (3) with those of a deterministic low-resolution simulation as well as with the energy-consistent stochastic parameterization (6) run at a low resolution. Particular emphasis is given on comparing the effect of the respective prescribed spatial noise structures, using either (9) or (10) for EOFs and DMDs, respectively.

Model Setup
As in most current ocean and climate models, we discretize Equations 3-6 by means of finite differences in a grid-point-based framework. The numerical discretization of the Jacobian operator in our QG model is based on the energy and enstrophy conserving scheme by Arakawa (1966). This scheme ensures that energy and enstrophy are conserved for all truncations in the inviscid case. In particular, this scheme does not require any numerical diffusion nor dissipation for numerical stability. For the time stepping of the deterministic part, we employ a fourth-order explicit Runge-Kutta method, while we use the Euler-Maruyama scheme for the stochastic terms (Pavliotis & Stuart, 2008). The inversion of the Laplacian is achieved in spectral space using fast Fourier transforms.
The simulations of the stochastic system (6) are run with a spatial resolution of 128 × 128 grid points and a time step of dt = 10 −3 . All simulations start from the same initial condition, which we have assured to lie on the attractor by having employed a preceding integration of the deterministic equations at resolution 128 × 128 for 8,000 time units. For each setting of the stochastic system, we run the analyses on an ensemble of 10 independent simulations and compare the outcomes with those of an equivalent deterministic low resolution and with those of a deterministic high-resolution simulation. The latter, which will be referred also as the reference solution, has been obtained by running the deterministic model (3) on a finer grid of 512 × 512 grid points. For numerical stability reasons, the reference solution is run with dt = 10 −4 , and its results are projected on the coarser 128 × 128 spatial grid, to allow for a direct comparison with the outcomes of the respective low-resolution simulations.

Total Energy
Looking at the total energy graphs of the different realizations in the various setups with EOF (orange) and DMD (green) reported in Figure 5, it can be noticed that on average, the energy is stable with both techniques, fluctuating by about 1-2% of its absolute value, which is about the same as in an inviscid setting (Gugole & Franzke, 2019). Although the EOF ensemble members show more variance (left column in Figure 5), when only the first two EOFs are used, the system seems to be slightly dissipative in time. This is particularly evident when looking at the ensemble mean (blue line in Figure 5). The inclusion of EOFs 3-5 reduces the dissipative effect, but realizations with a clear increasing trend can be present. Furthermore, some ensemble members drift away from the high-resolution simulation. This also raises questions about long-term stability of the simulations. On the other hand, the spread of the DMD ensemble members has less variance but individual runs are more energetically stable, and the system seems to be less dissipative compared with the EOF-based simulations (right column in Figure 5). This suggests that the usage of a dynamically adapted noise structure may help the numerical model to remain on the manifold of constant energy and in a dynamically consistent flow regime. In any case, deviations from the mean are less than 2%. Hence, they might be considered as negligible.

Long Time Statistics
The probability distribution functions (PDFs) are first computed per each grid point by means of a kernel density function and then averaged in the zonal coordinate. The outcomes reveal that the stochastic parameterization leads to improvements with respect to the low-resolution deterministic model. In particular, as it can be noticed in Figure 6, for the baroclinic PV, but similar outcomes are found also for the barotropic mode (not shown), in case of the low-resolution deterministic simulation, the field variables take values on, in general, smaller intervals with respect to the reference solution (see top row in Figure 6). The inclusion of the stochastic parameterization helps increase the extension of such intervals, but no relevant difference between the two methods for the construction of the noise covariance can be spotted (bottom row in Figure 6), as confirmed by the computation of the relative entropy by means of the Kullback-Leibler divergence (e.g., Cover & Thomas, 2012) (not shown).
Calculation of the eddy length, as in (Gugole & Franzke, 2019), does not reveal substantial differences either (not shown). This might be due to the weakly chaotic state of the system in our current setup, where the eddy length of the low-resolution deterministic model is already quite close to the reference solution. We expect this analysis to be more relevant in case of more chaotic systems. More insights are provided instead by the total variance. This has been computed per each grid point and considered as a spatial map. As the total energy graphs revealed, the EOF-forced ensembles display more variance with individual runs getting closer to the reference solution while others strongly depart from it. On the other hand, the DMD ensemble members instead display results more consistent to each other, and overall, it showed to have better performances. To give an idea, we computed the norm of the difference between the variance of any stochastic simulation with Λ≈0.47 and the variance of the reference solutions. We report here the results concerning the barotropic mode, but similar conclusions hold also for the baroclinic mode (not shown). In case of the EOF ensemble, the Euclidean norm of the difference in variance can vary between 0.0024 and 0.0070, while in case of the DMD ensemble, it remains between 0.0014 and 0.0042 (for reference, the norm of the difference between the highand low-resolution deterministic simulations is 0.0026).
As an example, we display in Figure 7 the difference between the variance of the reference solution and, from left to right, the low-resolution deterministic simulation, an EOF 1-5 induced stochastic simulation, and a DMD Λ≈0.47 forced stochastic run. This shows that our DMD scheme performs better than the EOF-based scheme and the low-resolution deterministic simulation.

Eddy Kinetic Energy
In order to compute the EKE, we first computed the horizontal velocities for the barotropic and baroclinic modes from the respective streamfunctions using where u is the zonal and v the meridional velocity. Then we considered a time window of k time units to compute the temporal mean velocities, that is, ū B , v B and ū T , v T for barotropic and baroclinic modes, respectively. Afterward, for each time unit, we computed the deviations from the mean, for example, u ′ B ðtÞ ¼ u B ðtÞ − ū B , and used these quantities to compute the EKE for each grid point for all t. As a last step, we averaged in time and then also in the zonal direction; therefore, the EKE is displayed simply as a function of the meridional direction y (Figures 8 and 9).
We split the time series in windows of 1,000 time units and consider each window individually. Such a length of the time intervals ensures one not to be looking just at transient dynamics while considering small movements of the jet, due to its low-frequency variability. Although the time-averaged EKE shows a bimodal behavior in all windows, the meridional location of the peaks varies according to the jet movement. Hence, we want to check how well the stochastic parameterization keeps track of the jet shift. The time-averaged EKE of the baroclinic mode for t ∈ 1; 000; 2; 000 ½ and for t ∈ 3; 000; 4; 000 ½ in the different  stochastic setups with EOF (orange) and DMD (green) are reported in Figures 8 and 9, respectively. The EKE of the barotropic mode shows similar results as for the baroclinic mode; hence, it is not reported here.
Both for t ∈ 1; 000; 2; 000 ½ and t ∈ 3; 000; 4; 000 ½ , it can be seen that the ensemble forced by EOFs 1 and 2 displays higher EKE values with respect to the reference solution, which are compensated in the mean (blue line in Figures 8 and 9) by simulations with lower EKE. This is particularly evident at later times (Figure 9), where the uncertainties grow in time and the single members do not display a coherent behavior, that is, different realizations have different meridional locations for the bimodal structure and rather different EKE amplitudes. The introduction of EOFs 3-5 brings the maximum EKE values closer to the reference but leads also to lower minimum values and does not help the ensemble members to maintain a coherent behavior for longer times. It can be further noticed in Figure 9 that, both with EOFs 1 and 2 and with EOFs 1-5, the EKE of the stochastic realizations is shifted to too high meridional positions. On the other hand, the DMD-forced ensembles have less variance and do not always enclose the reference solution, but they remain close to it and they follow quite well the meridional movement of the jet. This suggests that the DMD approach is introducing physically meaningful perturbations without disrupting the large-scale dynamics. Furthermore, in the DMD ensembles, the uncertainties grow much more slowly in time, allowing the single members to display a coherent behavior also at later stages of the system evolution.
Our results suggest that the use of a dynamically adapted noise covariance matrix in stochastic parameterizations is better suited to model phenomena, which do not reach statistical equilibrium, while keeping track of the large-scale dynamics. Moreover, considering the wide usage of DMD to detect dynamical features like instabilities and bifurcations (Bagheri, 2013;Budišić et al., 2012;Gottwald & Gugole, 2020;Kutz et al., 2016), a dynamically adapted spatial correlation might more easily foster the system toward tipping points, while

10.1029/2020MS002115
Journal of Advances in Modeling Earth Systems the use of climatic statistical patterns might push the system toward more statistically common scenarios and away from, for example, extreme events. On the other hand, climatic statistical patterns induce more variance in the ensemble, which might be desirable for certain applications.

Summary and Discussion
In this study, we develop a novel way to derive dynamically based noise covariance matrices which are flow dependent. In the framework of the forced and damped two-layer QG model, we consider an energy-consistent stochastic parameterization based on a projection operator approach (Frank & Gottwald, 2013). As shown in Gugole and Franzke (2019), the definition of the noise spatial structure is of fundamental importance for this parameterization to return physically meaningful results; hence, we analyze here two different procedures for its estimation. In particular, we investigate a statistical and a dynamical approach by using two different dimension reduction techniques: EOFs and DMD. The former looks at the variance field of the fluid, while the latter is strictly linked to the Koopman operator and hence to the generator of the dynamics. EOFs have been widely used in the literature; nevertheless, there is in general no one-to-one correspondence between the EOF eigenvectors and physical modes (von Storch & Zwiers, 2003). Moreover, being a statistical technique, it requires long time series in order to obtain reliable patterns. In contrast, DMD is able to work with tall and skinny matrices (Kutz et al., 2016), hence also with very short time series, and it describes oscillatory modes. Therefore, the choice of the length of the time series, mΔt, and the temporal shift between the two input matrices, δt, are crucial and serve as scale selection. In our model setup, we use half an eddy turnover time as a physically based time interval to recompute the DMD. Since DMD decomposes the dynamics according to its local in time oscillatory behavior, its modes and the noise covariance have to be recomputed periodically. This is a new approach in stochastic parameterizations allowing the noise covariance to be a function of time, while typically a fixed noise covariance is used during the whole realization.
Total energy graphs reveal that the EOF ensembles are either more dissipative or might include realizations with a clear increasing trend. On the other hand, DMD runs are individually more energetically consistent, suggesting that a dynamically adapted noise structure might help the model to stay on the manifold of constant energy. This might suggest that this approach may have the potential to lead to more dynamically consistent simulations and long-term stability. The computation of the PDFs revealed a general improvement of the stochastic simulations with respect to the low-resolution deterministic simulation but does not expose any meaningful difference due to the different method for the noise covariance definition. The analysis of the variance instead displays a clear preference for a dynamically adapted noise covariance. By analyzing the EKE, it has been discovered that in case of EOFs, the uncertainties grow faster, which induce the single ensemble members to display very different amplitudes of the EKE. Furthermore, the location of the bimodal structure of the EKE ensemble mean is not well defined among the individual realizations, and it is in general moved toward too high meridional locations. The DMD-forced ensembles instead are able to follow the meridional jet shift and well catch the meridional location of the double peak also at later times. Lastly, the uncertainties within the DMD ensemble grow more slowly with respect to the EOF ensemble. On one hand, this allows the individual members to display a coherent behavior during the entire simulation and to stay close to the reference solution. On the other hand, this implies a reduced spread among the ensemble members.
As regards computational time, DMD is very cheap, can reliably deal with rather short time series, and does not need extra computations beforehand, but can be run alongside the main code. These aspects allow the DMD algorithm to periodically reanalyze the dynamics and redefine the noise covariance accordingly. Hence, it is a very good candidate to parameterize scales undergoing phase transitions, or which do not reach statistically stable profiles. This might also allow DMD to be used for scale-adaptive parameterization schemes. Moreover, due to its close link to the Koopman operator and to its ability to detect instabilities and bifurcations within dynamical systems (Bagheri, 2013;Kutz et al., 2016), it might foster the system to reach tipping points or regime transitions.
Our results suggest that a dynamically adapted spatial structure should be considered in future developments of stochastic parameterizations. This is further motivated by the physics. Not only are the large scales affected by the small scales, but also the small-scale processes are influenced by the large-scale motions. Hence, physically correct parameterizations of the unresolved scales should allow the subgrid processes to be influenced by the resolved modes. This is a key aspect, which is gaining more and more consideration when developing stochastic parameterizations for the unresolved processes; see, for instance, Edeling and Crommelin (2019). Furthermore, the propagation of the DMD modes by means of the Koopman operator might be seen as a sort of memory term, which in turn has been shown to be important in parameterization schemes Gottwald et al., 2017;Hu & Franzke, 2017;Sakradzija et al., 2015). However, more detailed studies are required to establish what kind of relation, if any, exists between the propagation of the DMD modes and memory terms. If such a connection is proved to exist, this would be very useful from the numerical point of view since many memory terms require not easy computations and are often unstable; see, for instance, Demaeyer and Vannitsem (2018). More studies are required also to analyze the technical performance of our approach when implemented in parallel. As a matter of fact, GCMs make use of many CPUs, which means that our approach would require more communication among the CPUs during the simulation and could potentially slow down the model. This could be prevented by recomputing the DMD modes less often and propagating them for a longer time, although this could potentially depend both on the system and on its particular state. Finally, the results here obtained in the framework of the projector operator approach might not hold when considering other stochastic parameterizations. In fact, different parameterizations start from different assumptions about the mathematics or the physics involved; hence, they might in general lead to different conclusions.

Data Availability Statement
Model scripts are publicly available on Zenodo © : https://doi.org/10.5281/zenodo.3726977. Data have been generated by use of the aforementioned scripts.