Treating sample covariances for use in strongly coupled atmosphere-ocean data assimilation

Strongly coupled data assimilation requires cross-domain forecast error covariances; information from ensembles can be used, but limited sampling means that ensemble derived error covariances are routinely rank deﬁcient and/or ill-conditioned and marred by noise. Thus, they require modiﬁcation before they can be incorporated into a standard assimilation framework. Here we compare methods for improving the rank and conditioning of multivariate sample error covariance matrices for coupled atmosphere-ocean data assimilation. The ﬁrst method, reconditioning, alters the matrix eigenvalues directly; this preserves the correlation structures but does not remove sampling noise. We show that it is better to recondition the correlation matrix rather than the covariance matrix as this prevents small but dynamically important modes from being lost. The second method, model state-space localization via the Schur product, eﬀectively removes sample noise but can dampen small cross-correlation signals. A combination that exploits the merits of each is found to oﬀer an eﬀective alternative.


Introduction
The ideal approach to coupled data assimilation is strongly (or fully) coupled in which components of the Earth system are analyzed within a single seamless assimilation framework: a single assimilation algorithm is used to combine the available observations from each subcomponent with a fully coupled model forecast state, producing a single fully coupled analysis.Despite recent advancements (e.g., Frolov et al., 2016;Laloyaux et al., 2016;Lea et al., 2015;Sluka et al., 2016) there are still a number of scientific challenges to solve before coupled assimilation can become the primary technique for weather and climate forecasting and reanalysis (Penny & Hamill, 2017).A particular challenge for strongly coupled data assimilation systems is modeling the cross covariances that characterize the relationship between errors in the atmosphere and ocean model forecasts (Smith et al., 2017).
Our initial work (Smith et al., 2015) with an idealized single column coupled atmosphere-ocean model and incremental 4D-Var assimilation framework found that even when the errors in the initial atmosphere and ocean forecast states are assumed to be uncorrelated (i.e., cross-domain error covariances set to zero) the strongly coupled approach generally outperforms both the standard uncoupled and intermediate weakly coupled formulations in terms of producing more balanced initial analysis fields and reducing initialization shock in subsequent forecasts.This is attributed to the ability of the strongly coupled 4D-Var algorithm to implicitly generate atmosphere-ocean error cross covariances (see, e.g., Bannister, 2008) thereby enabling observations in one fluid to directly influence the analysis increments in the other.It is expected that explicitly prescribing nonzero atmosphere-ocean cross-domain forecast error covariances a priori will have further positive impact on the assimilation in that it will allow near-surface observations to be used to even greater effect and thus lead to even greater consistency between the atmosphere and ocean analysis states.
The problem of 4D-Var data assimilation is to find the initial state such that the model forecast best fits the available observations over a given time window, while remaining close to a given a priori guess (the "background") and allowing for the uncertainty in each.This best estimate (the "analysis") should be consistent with both the observations and the system dynamics.Estimates of the statistics of the errors in the background state and observations are described by the background (or forecast) and observation error covariance matrices, B and R, respectively.Variational methods have traditionally used a static or climatological estimate of B, but more recently, methods have been developed to incorporate flow-dependent ensemble A covariance matrix derived from a sample of size m can have at most rank (m − 1) and so can only be full rank if m > n where n is the dimension of the state space.Given that the size of most ensembles is restricted by computational resources, ensemble error covariances will routinely be rank deficient; further, when the size of the ensemble is significantly smaller than the dimension of the state the effect of sampling errors is likely to be nonnegligible.
For uncoupled ensemble-based data assimilation systems, various localization techniques have been designed to treat the problems associated with small ensembles, in both model and observation space (see, e.g., Ménétrier et al., 2015, for a review of current methods).Standard 4D-Var formulations work in model space and require B to be symmetric positive definite (so all eigenvalues are strictly positive) so that B −1 exists.By construction, a sample covariance/correlation matrix will be symmetric positive semidefinite; this means that it may have zero eigenvalues and hence be numerically rank deficient.Even if a matrix is positive definite, it may still be poorly conditioned and hence not accurately invertible (Golub & Van Loan, 2013).In either case, before we can use such a matrix in a standard data assimilation framework it will need to be modified.
The purpose of this letter is to address these issues in the context of strongly coupled atmosphere-ocean data assimilation; specifically, we investigate two strategies for treating fully coupled sample error covariance matrices that are rank deficient and/or ill-conditioned and evaluate their effectiveness in terms of preserving the underlying error correlation structures, particularly those between the atmosphere and ocean.The first approach, which we term "reconditioning," is to perform an eigenvector decomposition and alter the spectrum of the matrix directly: making any zero eigenvalues positive and moving the largest and smallest eigenvalues closer together; the matrix is then reformed using the modified eigenvalues and original eigenvectors (see, e.g., Daniels & Kass, 2001;Ledoit & Wolf, 2004 for discussion of variations of this type of approach, and Weston et al., 2014 andCampbell et al., 2017 for recent examples of its application).The second option is to use localization techniques (e.g., Houtekamer & Mitchell, 2001).A standard approach to model space localization is to use the Schur product (element-wise multiplication) of the raw ensemble forecast error covariance matrix and a positive-definite correlation matrix,  ∈ R n×n ; this process acts to remove the zero eigenvalues of the raw ensemble forecast error covariance/correlation matrix and thus increase its rank.
Although localization in model space is the natural choice for variational-based data assimilation, the size of the coupled model state, the contrast in scales between the atmosphere and ocean, plus the use of different coordinate systems all combine to make the coupled localization problem more complex.Both the strategies we present can be used to ensure that the matrix is strictly positive definite and at the same time improve its conditioning; improving the conditioning will in turn improve the convergence and stability of the 4D-Var minimization.

Reconditioning
There are various ways in which the eigenvalues of a matrix can be modified.The simplest approach is to set a minimum threshold,  tol , and reset all eigenvalues less than this value equal to it (e.g., Weston et al., 2014).Since only the smallest eigenvalues are changed, this method modifies the overall eigenvalue structure.As discussed in Weston et al. (2014), resetting the smallest eigenvalues to the same fixed value results in many of the small diagonal values of the original matrix being reset to an almost constant value.An alternative approach is to specify a required condition number,  tol , and increment all eigenvalues by a fixed amount  inc such that where  max ,  min are the largest and smallest eigenvalues, respectively.Although incrementing all eigenvalues means that the largest eigenvalues are changed, the effect is relatively modest compared to the effect on the smallest eigenvalues; furthermore, this method has the advantage that the overall structure of the eigenvalues is preserved.
Note that reconditioning a covariance matrix as opposed to a correlation matrix will have differing effects on the underlying correlation structures.As we discuss in section 4.1, this result is particularly important for coupled systems due to the range of scales involved.
SMITH ET AL.

Localization
Localization is used as a way of filtering out spurious correlations from ensemble covariances when small ensembles are used but can also be designed to increase the effective ensemble size and ensure that the covariance matrix is full rank (Hamill et al., 2001;Oke et al., 2007); therefore, it can potentially also be used as a way of improving the conditioning of ensemble error covariance/correlation matrices that are derived from a large ensemble size (and so relatively noise free) but are rank deficient or poorly conditioned.Typically, the localization matrix  is assumed to be an isotropic distance-dependent function such that the amplitude of the error correlation between two locations decreases as the distance between them increases and falls to zero beyond a given separation; the rate of correlation decrease and cutoff depends on a user defined length-scale parameter.
The development of a coupled atmosphere-ocean vertical localization function presents two key issues: first, the atmosphere and ocean model components may use different vertical coordinate systems and have very different localization length scales; second, the majority of preexisting localization methods have been developed for uncoupled systems and are focused on a single state variable, or use control variable transform/balance operator techniques that convert to a new set of uncorrelated variables (e.g., Kepert, 2009).The cross media atmosphere-ocean vertical localization problem is naturally multivariate, and the assumption that the errors in the atmosphere and ocean model forecasts are correlated is a fundamental one.
A popular form for  in meteorological applications is the 5th-order piecewise rational function described by Gaspari and Cohn (1999) (their equation (4.10), hereafter referred to as GC).As it is written, this is a univariate localization function and it is easy to see how the entries of  might be constructed for a single state variable on a discrete 1-D model grid; it is less immediately obvious how to construct  when the model state vector is multivariate.Roh et al. (2015) use a simple bivariate model to investigate two strategies for the design of localization functions for multivariate state vectors.The first method is based on forming a multivariate localization matrix from the univariate GC function.A similar approach is adopted by Frolov et al. (2016) for localizing the coupled cross-domain ensemble covariances in their atmosphere-ocean interface solver.One potential flaw in this approach, noted by Roh et al. (2015), is that a multivariate localization matrix constructed this way is not guaranteed to be positive definite; they propose fixing this by setting  equal to the product of a univariate function and a symmetric, positive-definite matrix whose diagonal entries are one.However, this acts to further attenuate cross-variable correlations and so is undesirable for cases where these correlations are small but significant.
For coupled atmosphere-ocean assimilation, the importance of intervariable error correlations is well acknowledged, especially in the lower atmosphere-upper ocean boundary layer; but these correlations may be small relative to single variable autocorrelations and to correlations between errors in variables within the same fluid; therefore, extra care must be taken to that ensure that the localization does not destroy the smaller, yet physical, atmosphere-ocean error cross correlations.We have developed a vertical localization strategy built on similar ideas to those proposed by Roh et al. (2015) and Frolov et al. (2016).Our approach is the same on a broad level, but here we give specific consideration to the trade-off between filtering sampling noise and improving conditioning, and preserving the atmosphere-ocean error cross-correlation signals; this aspect of coupled model space localization is not addressed in previous studies.
Unlike the reconditioning approach, localization via the Schur product can be applied to either the raw error covariance matrix or error correlation matrix with the same effect; for ease of presentation we focus our discussion on the ensemble forecast error correlations.We partition the ensemble atmosphere-ocean forecast error correlation matrix into blocks where n = n a + n o is the dimension of the combined atmosphere-ocean model state vector.The blocks C AA ∈ R n a ×n a and C OO ∈ R n o ×n o are square matrices representing the atmosphere and ocean state forecast error correlations, respectively.The off-diagonal block C AO ∈ R n a ×n o contains the cross correlations between errors in the atmosphere and ocean state variables.These blocks can be further decomposed into submatrices containing the error autocorrelations for individual variables and error cross correlations between different variables.Note that neither C AO nor its submatrices are square.We apply a separate GC-type localization function to each submatrix so our multivariate  has the same block structure as (2).SMITH ET AL.

The Coupled 1-D Model System
To compare the methods described in sections 2.1 and 2.2, we use sample coupled error covariances and correlations that were generated using an idealized strongly coupled single-column atmosphere-ocean incremental 4D-Var assimilation framework.In this section we provide a brief overview of this system; a full description of the model and strongly coupled incremental 4D-Var assimilation algorithm is given in Smith et al. (2015); details of the ensemble methodology used to estimate the coupled forecast error correlations are presented in Smith et al. (2017).

The Model
The model consists of a simplified version of the European Centre for Medium-range Weather Forecasts atmosphere single column model, which is based on an early cycle of their Integrated Forecast System code, coupled to a 1-D mixed layer ocean model developed by the National Centre for Atmospheric Science Centre for Global Atmospheric Modelling (Woolnough et al., 2007) and based on the K-Profile Parametrization (KPP) vertical mixing scheme of Large et al. (1994).The two components communicate via the exchange of sea surface temperature and surface fluxes of heat, moisture, and momentum.The full model equations are presented in Smith et al. (2015), together with the model validation.A detailed discussion of the different interactions between the atmosphere and ocean model variables and their errors is presented in Smith et al. (2017).

Strongly Coupled Incremental 4D-Var Data Assimilation
Solving the full nonlinear 4D-Var problem directly can be a complex and expensive procedure; the incremental formulation (Courtier et al., 1994) circumvents these issues by linearizing about the current state estimate and replacing the nonlinear problem with a sequence of linear least squares problems.Rather than searching for the optimal initial state directly, it searches for increments to the initial background estimate; this is done iteratively via a series of linearized inner-loop quadratic cost function minimizations and nonlinear outer-loop update steps (see, e.g., Lawless et al., 2005).For strongly coupled incremental 4D-Var the control vector consists of both the atmosphere and ocean prognostic variables, the observation vector contains both atmosphere and ocean observations, and the coupled model is used in both the inner and outer loops.

Ensemble Derived Forecast Error Correlations
In a follow up (Smith et al., 2017) to our initial study (Smith et al., 2015), estimates of the atmosphere-ocean error cross correlations for our system were derived using the analysis ensemble methodology of Zagar et al. (2005).This approach generates samples of background error by taking short-range forecast differences between members of a data assimilation ensemble; the statistics of these differences are then used as a proxy for the forecast error covariance matrix.Experiments were identical twin; each ensemble consisted of 500 cycled strongly coupled incremental 4D-Var data assimilations, with each member starting from a different randomly perturbed initial background state and assimilating a different set of randomly perturbed observations over eight consecutive 12 h cycles.An ensemble of 500 was chosen after looking at the convergence of the correlation structures for successively larger ensemble sizes.For the purposes of this paper, we consider the effect of modifying the raw ensemble error covariance/correlation matrix derived for a winter case in which the true state is given by a coupled model forecast from 00 UTC on 2 December 2013; the matrix was computed by averaging 12 h forecast differences between pairs of analysis ensemble members from cycles 2, 4, 6, and 8.Both the covariance and correlation matrices have zero eigenvalues and so are rank deficient; they also clearly contain sampling noise.

Matrix Reconditioning
For our system the smallest eigenvalues are mainly associated with the ocean where variability is low compared to the atmosphere.Increasing the smallest eigenvalues (either to the same constant value or incrementally) mainly has the effect of weakening the error correlations in and between the ocean fields, and the error cross correlations between the ocean and atmosphere; the larger the value of  tol , or the smaller the specified  tol , the bigger the increase in the smallest eigenvalues and the greater the effect on the ocean error correlations.This effect is particularly pronounced when we recondition the error covariance matrix as opposed to the error correlation matrix, as illustrated in Figure 1 for the case  tol = 10 4 .The differences between the original and reconditioned correlation matrix are very small (the minimum and maximum SMITH ET AL. 10.1002/2017GL075534 differences in Figure 1d are ∼ (10 −3 )); the structure of the correlations has been retained but so too has the sampling noise.Figure 1c shows the result of reconditioning the covariance matrix and then normalizing by premultiplying and postmultiplying with a diagonal matrix of inverse forecast error standard deviations.Here the structure of the ocean cross correlations and the atmosphere-ocean cross-correlation structure has been almost completely lost.
The largest and smallest (nonzero) eigenvalues of the ensemble error covariance and correlation matrices will not be the same and will not necessarily correspond to the same modes of variability, that is, the directions of greatest variance are not necessarily the same as the directions of greatest correlation, and so modifying the eigenvalues of the error covariance matrix versus the error correlation matrix will have differing effects on the correlation structures.In order to retain as much of the atmosphere-ocean cross-domain error correlation information as possible, it is better to modify the ensemble correlation matrix rather than the covariance matrix.This highlights an important point: even though a particular eigenvector may be associated with a small eigenvalue of the covariance matrix (and therefore assumed to account for little variance) it may still be dynamically important, particularly for coupled systems where the individual components have very different space and time scales.Care should therefore be taken with methods that truncate matrix eigenvalues, such as reduced order assimilation techniques.

Localization
Construction of the localization matrix  requires specification of a distance measure and decorrelation length scales.Generally, the localization distance is measured as the Euclidean distance between model grid points.Our ocean model uses a fixed vertical grid that stretches from 1 to 250 m depth, and so computing the distance between two ocean model grid points (for subblocks of  OO ) is simple.The atmosphere model uses a 60 level hybrid coordinate system that depends on the surface pressure; however, this is prescribed and does not vary significantly over the period of our experiments.We therefore assign each model level a fixed mean pressure value which we convert to a set of altitudes; this enables us to define the distance between atmosphere model levels in terms of difference in height (m) and makes the definition of the distance between the atmosphere and ocean points (for subblocks of  AO ) straightforward.If we denote the height at level i (i = 1, … , 60) of the atmosphere model as z a (i), and the depth of level j (j = 1, … , 35) of the ocean model as z o (j), element (i, j) of each submatrix of  AO will then depend on the distance since atmosphere heights are positive upward and ocean depths are positive downward.
For the localization length scales, we should, ideally, specify a separate value for each individual state variable and for each combination, or pair, of variables.However, initial experiments showed that the positive definiteness of the constructed  matrix depended on how these values were chosen.In general, using different length scales for different subblocks of  failed to produce even a semidefinite matrix whereas using a single fixed value for all atmosphere variables in  AA , and similarly for all ocean variables in  OO did.Although we found some variations in the ensemble error correlation length scales from one variable to another within each fluid, they were all of the same order of magnitude and so this approach is not unreasonable.Further, we found that if  and C are both positive semidefinite then their Schur product can be strictly positive definite; however, we are not currently aware of any theorems that state the conditions for which this result is guaranteed.
For the block C AO we are primarily interested in retaining the atmosphere-ocean error cross correlations within the near-surface atmosphere-ocean boundary layer, as this is where they are strongest (Smith et al., 2017).For our model, the atmosphere ensemble error correlation length scales are approximately 2 orders of magnitude greater than those for the ocean over the time scales we consider; to ensure that the atmospheric variables do not have undue influence on the ocean beyond the near-surface boundary, we need to account for this disparity when we define the localization length scales for the subblock  AO .Similarly to Frolov et al. (2016), we found that this could be achieved by replacing (3) with the scaled distance  where L a and L o represent the height and depth of the near-surface atmosphere-ocean boundary region, respectively.The elements of  AO then take the form (cf. GC equation (4.10)) (5) When both z a > L a and z o > L o ,  AO = 0; when z a < L a and z o < L o ,  AO will be close to 1 for points near the surface and decrease as z a → L a and/or AO will be nonzero but small and become smaller as z o → L o or z a → 2L a ; a similar argument holds when z a < L a and L o < z o < 2L o .Appropriate values for L a and L o will depend on a number of factors including the ensemble size, model system, and forecast period.In this case, choosing short correlation length scales reduces the condition number of the localized matrix but overdamps the correlation structures, particularly those in the atmosphere-ocean surface region, meaning that smaller correlations are lost.Unlike the reconditioning approach we cannot choose the condition number or minimum eigenvalue.Here localization produces a bigger adjustment to the larger eigenvalues in addition to increasing the smallest eigenvalues and so has a much greater impact on the overall eigenvalue structure; it is not possible to reduce the condition number to the same extent using this approach without completely destroying the matrix cross correlations.As an example, Figure 2 compares the raw and localized error correlation matrices for the cases L a = 7, 500 m, L o = 75 m (Figures 2b and 2e) and L a = 3, 750 m, L o = 37.5 m (Figures 2c and 2f ).Halving the length scales reduces the condition number, (C), by a factor of 10 from (10 9 ) to (10 8 ) but the strength of the cross correlations are noticeably reduced.We found that we could obtain (C) ∼ (10 4 ) by reducing the length scales to L a = 200 m, L o = 2 m but the localized correlations are quite obviously incorrect (Figures 2d and 2g).

Combining Reconditioning and Localization
Another potential strategy is to employ a combination of the first two: first reconditioning the raw error correlation matrix to produce a required condition number and then localizing to remove noise; since the reconditioned matrix will already be positive definite this will allow more flexibility in the choice of localization function and length scales.Figure 3 shows the result of this approach; first reconditioning using  tol = 10 4 and then localizing using L a = 7, 500 m, L o = 75 m.Comparing this with the results of the individual methods (Figures 1 and 2) shows that this combined approach produces a matrix that is visually similar to the localized correlation matrix in Figure 2b in that the sampling noise has been filtered (compare Figures 2e and 3c).However, the matrix in Figure 3b has condition number (10 4 ), as opposed to (10 9 ) for the matrix in Figure 2b; the reconditioning step has enabled the matrix condition number to be reduced without destroying the correlation structure.

Summary
In this study we have compared methods for improving the rank and conditioning of multivariate sample forecast error covariance matrices in the context of strongly coupled atmosphere-ocean data assimilation: SMITH ET AL.

10.1002/2017GL075534
reconditioning whereby the matrix eigenvalues are altered directly, and model space localization via the Schur product.There are advantages and disadvantages to each of these strategies.Assuming the computational effort required to perform the eigen-decomposition is not restrictive, matrix reconditioning is simple: it does not require changes to the model coordinate system or tuning of length scales and so offers a practical option when sampling noise is not an issue, for example, when the sample size is large relative to the dimension of the state.However, as illustrated, modifying the eigenvalues of an error covariance matrix as opposed to the associated error correlation matrix will have differing effects on the underlying correlation structures.Care needs to be taken with the choice of  tol or  tol particularly when applying the approach to systems with a range of scales.An important finding from this study is that methods that retain only the leading eigenpairs after decomposition, such as empirical orthogonal functions (EOFs), may not be optimal for representing the dynamics of a coupled system.At the very least, consideration needs to be given to the scaling of the problem to ensure that eigenvalues that are small but correspond to dynamically significant modes are not truncated.
Within the data assimilation community, localization is often the default choice for treating noisy, rank deficient ensemble covariances.However, the standard model space Schur product technique does not immediately translate to coupled systems; here we have presented a potential approach in the context of our 1-D system.Compared to reconditioning, Schur product localization is very effective at removing spurious noise, but because it is a more blanket approach it can also destroy true error correlation signals that are small.Because a pair of atmosphere and ocean points in a layered model will always have vertical separation, the scaled distance d in (5) can never be zero and  AO ( d) will always be less than 1; therefore, any cross-domain error correlations will always be damped by this type of distance-dependent localization.This can be remedied to a certain extent by an appropriate choice of atmosphere and ocean length scales, but the GC based localization function presented here is restrictive in its current form in that it does not allow us to account for differences in the correlation length scales for different components of the state vector.There are strategies proposed in the statistics literature for constructing positive-definite localization matrices with inhomogeneous length scales that could potentially be used to overcome this limitation (see, e.g., Kleiber & Porcu, 2015;Paciorek & Schervish, 2006).There is also a trade-off between the length scales and the conditioning of the localized matrix; shorter length scales give a lower condition number but also wipe out the off-diagonal correlation matrix elements and so are useless in this context.
Finally, we presented a third strategy that combines the benefits of the first two approaches; using reconditioning to reduce the matrix condition number and then localizing to reduce the sampling noise.Obviously, this approach will be computationally more expensive, but it offers greater flexibility in that it places less restriction on the choice of correlation length scales when a certain level of conditioning is desired.This may prove to be useful in the development of multivariate localization functions with multiple length scales.
Naturally, the topic of cross-fluid localization requires further exploration and will no doubt ultimately demand a far more sophisticated approach for full 3-D systems; nonetheless, this study has highlighted some crucial issues that will be important to build upon going forward.

Figure 1 .
Figure 1.Comparison of the structure of (a) the original ensemble error correlation matrix, (b) the reconditioned error correlation matrix, and (c) the normalized reconditioned error covariance matrix when  tol = 10 4 .(d and e) Show the difference between Figures 1a and 1b, 1c.Model levels 1-240 correspond to atmosphere temperature, specific humidity, and zonal and meridional wind on 60 levels; model levels 241-380 correspond to ocean temperature, salinity, and zonal and meridional current on 35 levels.

SMITHFigure 2 .
Figure 2. Comparison of the structure of (a) the original ensemble error correlation matrix and the localized error correlation matrix for length scales: (b) L a = 7, 500 m, L o = 75 m; (c) L a = 3, 750 m, L o = 37.5 m; and (d) L a = 200 m, L o = 2 m. (e-g) Show the difference between Figures 2a and 2b-2d.Model levels are as in Figure 1.

Figure 3 .
Figure 3.Comparison of the structure of (a) the original ensemble error correlation matrix and (b) error correlation matrix that has been reconditioned by setting  tol = 10 4 and then localized using L a = 7, 500 m and L o = 75 m.(c) The difference between Figures 3a and 3b.Model levels are as in Figure 1.