An Adjoint‐Free Alternating Direction Method for Four‐Dimensional Variational Data Assimilation With Multiple Parameter Tikhonov Regularization

Tikhonov regularization is critical for accurately specifying both the background (B) and observational (R) error covariances in four‐dimensional variational data assimilation (4DVar). The ratio of the background and observation error variances (referred to as the B‐R ratio) is the key to ensuring that 4DVar maximizes the information extracted from the observations. However, it is difficult to specify the regularization parameters in a high‐dimensional variational data assimilation (VDA) system for both the single‐ and multiple‐parameter regularization schemes. In this study, we used a maximum likelihood estimation (MLE)‐based inflation scheme that originated from the ensemble Kalman filter (EnKF) community and proposed an alternating direction method (ADM) to minimize the 4DVar cost function with the iterative application of multiple regularization parameters to simultaneously optimize the regularization parameters and model states under the framework of the nonlinear least‐squares 4‐D ensemble variational data assimilation method (NLS‐4DVar). The big‐data‐driven version of NLS‐4DVar (BD‐NLS4DVar) with multiple‐parameter Tikhonov regularization was able to adjust the B‐R ratios more accurately. Several groups of observing system simulation experiments (OSSEs) based on 2‐D shallow‐water equations demonstrated that BD‐NLS4DVar with multiple‐parameter Tikhonov regularization produced a substantial performance improvement over the standard BD‐NLS4DVar method with no regularization.


Introduction
Data assimilation can effectively combine observations and the background in an optimal way to yield the optimal estimation of model initial fields and thus improve the accuracy of numerical weather prediction (NWP) (e.g., Rabier et al., 2000). The effective fusion of the background states and observations is heavily dependent on the a priori specification of both the background (B) and observational (R) error covariances, as their accuracy will eventually influence the final assimilation performance (e.g., Li et al., 2009;Zhong et al., 2014). As the two major methods for data assimilation, both four-dimensional variational data assimilation (4DVar; Navon et al., 2005;Rabier et al., 2000) and the ensemble Kalman filter (EnKF; Evensen, 2009;Houtekamer et al., 2014) must estimate both the background and observational error covariances accurately to improve their assimilation accuracy.
In 4DVar, the ratio of the background and observation error variances (referred to as the B-R ratio) is believed to be critical for maximizing the information that can be extracted from observations (Johnson et al., 2000). However, an accurate estimate of the variance ratios is not generally available; 4DVar with multiple-parameter Tikhonov regularization is utilized to simultaneously optimize the model states and variance ratios by incorporating the Tikhonov regularization theory into 4DVar. Furthermore, the B-R ratio can be interpreted as a regularization parameter (Johnson et al., 2005). The accurate specification of the Tikhonov regularization parameter is the first and most important step in the implementation of the 4DVar with Tikhonov regularization. However, this specification process is extremely difficult in a high-dimensional variational data assimilation (VDA) system, and previous studies were largely implemented with low-dimensional toy models rather than actual high-dimensional systems. Using the L-curve rule regularization parameter selection, Zhao et al. (2011) first tested the Tikhonov regularization method to conduct radar observation data assimilation in a real high-dimensional VDA system based on the Weather Research and Forecasting (WRF) three-dimensional variational data assimilation (3DVar) system. Zhong et al. (2014) proposed a multiple-regularization-parameter selection method with acceptable computational complexity based on the posterior information of the 4DVar system. The proposed 4DVar with multiple regularization parameters as a weak constraint had superior performance compared with the traditional 4DVar for bogus data assimilation (BDA) in the initialization and simulation of a typhoon. In the EnKF, many studies have inflated or multiplied forecast ensemble perturbations by an appropriate inflation factor to adjust ensemble variance (Evensen, 2009). In the early stage, the inflation factor is estimated empirically. Subsequently, adaptive inflation approaches have been proposed to account for inhomogeneous covariance underestimation (Anderson, 2007;Li et al., 2009;Miyoshi, 2011;Wang & Bishop, 2003). Wang and Bishop (2003) developed an approach to optimize the inflation factor. Their scheme was further improved by Li et al. (2009). All of these schemes are based on a first moment estimation of the squared observation-minus-forecast residual, which was first introduced to data assimilation by Dee (1995). Zheng et al. (2013) proposed an alternative approach to optimize the inflation factor using maximum likelihood estimation (MLE), which was introduced by Dee and colleagues . Algorithm analyses showed that the inflation factor in EnKF was expressed as a reciprocal relationship to its corresponding regularization parameter in 4DVar. Generally, these approaches belong to the parameter estimation algorithms based on EnKF (e.g., ELSheikh et al., 2013).
The inclusion of the EnKF-estimated (Evensen, 2009;Houtekamer et al., 2014) and flow-dependent error covariances in 4DVar (Rabier et al., 2000) has been a subject of recent intense study (Bannister, 2017;Gustafsson, 2007;Kalnay et al., 2007;Lorenc, 2017). The achievement of the background and observation error variance regularization in their (EnKF and 4DVar) fusion is a logical necessity. Existing composite methods can be roughly divided into two groups based on their dependence on tangent/adjoint models, the "En/hybrid-4DVar" (Clayton et al., 2013;Zhang et al., 2009) and the "4DEnVar" (Qiu et al., 2007;Tian et al., 2008Tian et al., , 2018Tian & Feng, 2015;Wang et al., 2010) methods. En/hybrid-4DVar completely or partially utilizes the ensemble background error covariance B e in its 4DVar minimization, assisted by the tangent/adjoint models (Clayton et al., 2013;Zhang et al., 2009). Conversely, by approximating the tangent models with the linear assumption between the model state perturbations (MPs) and their corresponding simulated observation perturbations (OPs), the tangent/adjoint models are no longer needed in 4DEnVar (Qiu et al., 2007;Tian et al., 2008Tian et al., , 2018Tian & Feng, 2015). Lorenc (2013) concluded that the 4DEnVar method might be the optimal approach for integrating 4DVar and EnKF because of its independence from the adjoint/tangent models, ease of implementation, and parallelism. Among the many 4DVar methods, the nonlinear least-squares 4-D variational data assimilation method (NLS-4DVar; Tian et al., 2018;Tian & Feng, 2015) is distinctive as an advanced 4DEnVar algorithm that utilizes a Gauss-Newton iterative scheme (Dennis & Schnabel, 1996) to handle the nonlinearity of the 4DVar cost function. It therefore enhances assimilation accuracy while maintaining an overall structure that still resembles traditional 4DEnVar algorithms (i.e., with no dependence on the adjoint model) and provides a flow-dependent B e . Very recently, NLS-4DVar was further upgraded to a big-data-driven version (BD-NLS4DVar; Tian & Zhang, 2019a, 2019b, which has better performance than the standard NLS-4DVar without increased computational costs. In this study, to minimize the 4DVar cost function with multiple regularization parameters, we utilized an alternating direction method (ADM; the convergence of ADM for two separable variables has been established for years; for more details, see Yang et al., 2017) iteratively to simultaneously optimize the regularization parameters and the model states under an ensemble forecasting framework: (1) The MLE-based approach (Zheng et al., 2013) was adopted to estimate the inflation factor (and thus the regularization parameter) by minimizing the 2log-likelihood of the observation-minus-forecast residual; and (2) the NLS-4DVar was implemented to seek the solution of the 4DVar problem with the optimized regularization parameters.
We present the formulation of an adjoint-free ADM for 4DVar with multiple-parameter Tikhonov regularization in section 2. The performance evaluation of NLS-4DVar with Tikhonov regularization is presented in section 3, along with a comparison against the standard NLS-4DVar without Tikhonov regularization using the shallow-water equation model. Finally, we conclude with a summary and concluding remarks in section 4.

Methodology
With the growing number of observations available from remote sensing techniques, more asynchronous observations with different error properties and from a variety of sources are increasingly assimilated in real 4DVar data assimilations. In response, a multiple regularization parameters method was introduced by setting different regularization parameters for different observations and then transforming the traditional 4DVar cost function into the following function, with multiple-parameter Tikhonov regularization: (3) and Here, x ′ ∈ R mx (m x is the dimension of the state vector) is the perturbation of the background field x b ∈ R mx at t 0 , the superscript T stands for the matrix transpose, y The B-R ratio is therefore the key factor affecting the performance of 4DVar (Johnson et al., 2000). We further transformed Equation 1 into the following format: Equation 10 indicates that the ratio of the inflation parameters (Zheng et al., 2013) for the background error variance B and ith observation error variance R i is 1 : μ i (i = 1,⋯,n obs ).
In this study, an adjoint-free ADM is proposed to solve the 4DVar problem with the iterative application of multiple regularization parameters under a framework of ensemble forecasting as follows. For the jth (j = 1,⋯,j max ) iterative step, (1) the MLE-based approach proposed by Zheng et al., 2013, with some modifications, was applied to estimate the inflation factor λ j i for the ith observation by minimizing the following 2log-likelihood of the observation-minus-forecast residual: where . Thus, the ratio of the inflation parameters (Zheng et al., 2013) for the background error variance B and the ith observation error variance R j ð Þ i is λ j i : 1 (i.e., 1: (2) NLS-4DVar was then implemented to obtain the solution of the 4DVar problem with multiple optimized regularization parameters μ j i : (3) The observation error variances R Ignoring the different types of observations, Equation 10 can then be degraded into the 4DVar with single-parameter Tikhonov regularization: Correspondingly, Equation 11 is also simplified as where

10.1029/2020EA001307
Earth and Space Science Steps 1 and 2 were conducted under a framework of ensemble forecasting.
Applying the ith observation operator H i ð Þ k to the simulated model states, the corresponding simulated observation perturbations (OPs) were then yielded as follows: Furthermore, and According to the following approximation (Tian & Zhang, 2019b), Equation 14 can be rewritten as Obviously, the minimal solution to Equation 23 can be computed iteratively by performing evaluations of the cost function of Equation 23 and its gradient using a suitable descent algorithm (e.g., the limited memory quasi-Newton method [L-BFGS]; Liu & Nocedal, 1989). For the convenience of calculation, we first simplified the computation of the cost function of Equation 23 and its gradient based on a singular value decomposition (SVD): and Here, and, d j,k is the kth diagonal element of D j .
Simple mathematical calculations (see Zheng et al., 2013) yield and . We accomplished the first step to estimate the inflation factor λ j i for all of the ith observations by minimizing Equation 11 in the jth outer iteration. In Step 2, NLS-4DVar was applied to express the analysis increment x ′ a ∈ R mx by the linear combinations of the MPs, P x ∈ R mx × N , as Substituting Equation 31 and the ensemble background error covariance B ¼ P x ð Þ P x ð Þ T N − 1: (Evensen, 2009)

into
Equation 12 with the optimized μ j i and expressing the cost function in terms of the new control yields β (for more details, see Tian et al., 2018): To circumvent the use of the adjoint models, Equation 32 is further rewritten as a nonlinear least-squares problem, which, by making some necessary approximations, is solved by the Gauss-Newton iteration scheme as follows: for l = 1,⋯,l max , where l max is the maximum iteration number, and I N × N denotes the N × N identity matrix (see Tian et al., 2018 for more details).
To filter out the spurious long-range correlations resulting from the finite ensemble number N, Tian et al. (2018) localized Equation 33 as follows: and where and y × r is computed together with ρ m , and r is the selected truncation mode number . The definition of the notation "(· < e > ·)" is provided in Zhang and Tian (2018).

10.1029/2020EA001307
Earth and Space Science TIAN ET AL.
Additionally, NLS-4DVar uses the square root analysis scheme (Evensen, 2009) for the online ensemble update in the assimilation cycles, as follows: where and Φ is a random orthogonal matrix (for more details, see Evensen, 2009). Noticeably, the diagonal elements of ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi I − Σ T 2 Σ 2 q are less than 1.0, which would result in a gradual ensemble collapse. Since R is symmetric, it has the Cholesky factorization , and the eigenvalue decomposition can be easily obtained through Equations 43-45: and To maintain the ensemble spread, we further transformed Equation 38 into the following format: It can be seen that Equation 46 is actually a special inflation of Equation 38, which is used for the online ensemble update in the assimilation cycles. However, we still used Equation 38 to construct the analysis error covariance B a .

Observing System Simulation Experiments Using the Shallow-Water Equation Model
Several groups of observing system simulation experiments (OSSEs) based on the shallow-water equation model were designed to demonstrate the potential merits and advantages of the NLS-4DVar with single/multiple-parameter Tikhonov regularization, especially compared to the standard NLS-4DVar method with no regularization.

Shallow-Water Equation Model
The following 2-D shallow-water equations, with the f-plane formulation, were utilized as the forecast model for the OSSEs: Root mean square errors NLS-TK_m/s BD-NLS4DVar with multiple−/single-parameter Tikhonov regularization Observation error covariance at time t k α The regularization parameter γ i The impact of the regularization parameter α on each observation term The ith observation operator at time t k where f = 7.272 × 10 −5 s −1 is the Coriolis parameter, H = 3,000 m is the basic state depth, h s = h 0 sin(4πx/ L x )[sin(4πy/L y )] 2 is the terrain height, h 0 = 250 m, and the lengths of the two sides of the computational domain are D = L x = L y = 44d km, respectively, where d = Δ x = Δ y = 300 km is the uniform grid size. The domain was partitioned into a square mesh with 45 grid points in each coordinate direction, and periodic boundary conditions were imposed at x = (0, L x ) and y = (0, L y ). The second-order central finite difference and two-step backward difference schemes of Matsuno (1996) were selected to discretize the spatial and local time derivatives, respectively, and thereby ensure computational stability (Qiu et al., 2007). The time step was 360 s (6 min). It should be noted that the damping effect deserves special attention in the Matsuno scheme (Pfeffer et al., 1992). The model state vector was represented by height h, and the horizontal velocity components were represented by u and v at the grid points.

Experimental Setup
The true initial fields for all OSSEs were first produced by integrating the perfect (i.e., h 0 = 250 m) shallowwater equation model from the following initial conditions for 60 hr: h ¼ 360 sin u ¼ −f −1 g ∂h ∂y Correspondingly, the background state x b was produced in a similar manner but using the imperfect model (i.e., h 0 = 0 m). The different model configurations resulted in significant differences between the background and true states (not shown) due to the two 60-hr model integrations under the perfect-and imperfect-model scenarios, respectively. Specifically, the spatially averaged root mean square errors (RMSEs) were 23.4 m, 1.53 m s −1 , and 2.58 m s −1 for h, u, and v, respectively.
There were six assimilation cycles (windows) in each evaluation experiment. The length of each window was 12 hr, and observations were available every 3 hr (i.e., at 3, 6, 9, and 12 hr during each assimilation window). Each model grid had one randomly distributed observation site, resulting in a total of 44 × 44 observational sites at each observation time point. The observations were generated by adding random noise to the true values at the observational locations, which were obtained using a simple bilinear interpolation method. Therefore, the observation operator H k was simply the bilinear interpolation operator.
The big-data-driven version of NLS-4DVar was adopted to minimize the 4DVar cost function with/without Tikhonov regularization in the OSSEs. The total initial MPs P was composed of two ensembles, a preprepared historical big-data ensemble (P x; h ∈ R mx × Nh ) and a small online ensemble (P Zhang (2019a, 2019b) for more details about the preparation and update of the ensemble for the BD-NLS4DVar. The default parameter values were N o = 10, N h = 100, α = 11, β = 2, l max = 1, and j max = 4, and the covariance localization Schür radius r x = r y = 15 × 300 km for the x and y coordinates, respectively (Gaspari & Cohn, 1999). Also see Tian and Zhang (2019a) for the definition of α and β. The 4-D moving strategy (Tian & Feng, 2015) was adopted to produce initial MPs for BD-NLS4DVar with/without Tikhonov regularization, respectively. Both the perfect (h 0 = 250 m) and imperfect (h 0 = 0 m) models were used as forecast models in all subsequent OSSEs to investigate the BD-NLS4DVar with Tikhonov regularization under the perfect-and imperfect-model scenarios, respectively.

Experimental Results
We first evaluated the performance of BD-NLS4DVar with multiple-(NLS-TK_m) and single-parameter (NLS-TK_s) Tikhonov regularization and without regularization (referred to as NLS). The spatially and temporally averaged RMSEs of height ( r h mse ) and wind ( r wind mse ) (see Tian & Zhang, 2019b for definitions of r h mse and r wind mse ) for NLS-TK_m/s and NLS were compared for each assimilation window under the perfect-model assumption (h 0 = 250 m) for all true, forecast, and assimilation runs (Figures 1a and 1c). Figures 1a and 1c show that both NLS-TK_m and NLS-TK_s performed significantly better than NLS. We suggest this is due to the proposed (multiple/single) regularization parameter ADM selection approach, which could produce an accurate ratio of background and observation error covariances iteratively, resulting in superior performance compared to the NLS (i.e., the BD-NLS4DVar without regularization). In addition, the accuracy of the observation error variances could be effectively improved by the multiple regularization parameters, which contributed significantly to the substantially better performance of NLS-TK_m compared with NLS-TK_s. Another group of experiments using severe model error was conducted by changing the maximum terrain height (h 0 = 0 m) from that used in the truth simulations (h 0 = 250 m). Under this model error, BD-NLS4DVar with and without Tikhonov regularization performed slightly worse than under the perfect model scenario, with increased RMSEs for both the height and wind components. Nevertheless, the NLS-

Earth and Space Science
TK_m and NLS-TK_s methods still outperformed the original BD-NLS4DVar (i.e., without Tikhonov regularization), leading to substantially lower RMSEs of NLS-TK_m and NLS-TK_s in terms of both the height and wind fields than were observed in NLS (Figures 1b-1d). Not surprisingly, NLS-TK_m still performed much better than NLS-TK_s under this severe model error due to its better derived ratios of the background and observation error variances. These results clearly demonstrate that the implementation of the Tikhonov regularization scheme (especially the multiple schemes) improved the data assimilation performance of BD-NLS4DVar with and without model error.
To further investigate the impacts of Tikhonov regularization on the (NLS-) 4DVar method, the variations in the multiple (μ i , i = 1,2,3) and single (μ) regulation parameters with increasing iteration steps in the first assimilation window are shown in Figure 2. It was clear that, along with the increase in iteration time, both the multiple and single regularization parameters would gradually approach 1.0 for both the perfect and imperfect model assumptions. This indicated that the B-R ratio was adjusted to an accurate level through the proposed ADM. Consequently, the improvement of the B-R ratio helped to increase NLS-4DVar accuracy. Additionally, the differences in the three regulation parameters (i.e., μ i , i = 1,2,3) in NLS-TK_m with much lower RMSEs compared with NLS-TK_s further demonstrated the necessity of setting different regularization parameters for different observations.
The the sensitivity of the proposed ADM to the initial MPs, we compared NLS-TK_m performance with that of three different initial online MPs (i.e., 5P x,o , P x,o , and 1/5P x,o ). Their impacts on the NLS-4DVar for the imperfect model scenario (Figures 3b and 3d) were far less significant than those for the perfect model scenario (Figures 3a and 3c). However, as the NLS-TK_m proceeded, the adverse impacts of the initial MPs gradually decreased due to the Tikhonov regularization.
Obviously, the predetermined observation error variances R would be of much more importance to the final NLS-TK_m assimilation performance. To investigate the impact of the predetermined R on the NLS-TK_m results, three further experiments were conducted using NLS-TK_m with three different predetermined Rs (i.e., 10 2 R, R, and 10 −2 R) to contrast with each other. Similarly, the NLS-TK_m RMSEs derived from the imperfect model scenario (Figures 4b and 4d) were actually more influenced by the various predetermined Rs than those from the perfect model scenario (Figures 4a and 4c). Compared with the initial MPs, the effect of contamination of and inaccurate R value on the NLS-TK_m results was difficult to correct, even though the impact could be gradually reduced, especially for the imperfect model scenario (Figures 4b and 4d). Fortunately, compared with the background error variance B, the observation error variances R were usually confirmed easily and accurately.
Our final experiment examined the sensitivity of the impact of ensemble sizes N h and N o on the NLS-TK_m results by varying N o (=5, 10, and 20)/N h (=80, 100, and 120) and comparing with fixed values N h = 100/ N o = 10 for the imperfect model scenario. As with the original BD-NLS4DVar with no regularization, the increase in the N h = 100 usually resulted in improved NLS-TK_m assimilation performance, with much lower RMSEs (Figures 5b and 5d). Conversely, larger sizes (N o ) of the online MPs did not result in better assimilation results (Figures 5a and 5c). These results were combined with those from Tian and Zhang (2019a) to further verify the results, and the following conclusions were reached. The BD-NLS4DVar method (with/without Tikhonov regularization) was similar to the hybrid-4DVar method, where the historical ensemble P x,h (size N h ) was analogous to the climatological ensemble (to construct the climatological covariance, B c ), and the online ensemble represented the instantaneous and flow-dependent perturbations P x,o (size N o ), as with its counterpart in the hybrid-4DVar.

Summary and Concluding Remarks
Accurate specification of both the background (B) and observational (R) error covariances is essential to the success of data assimilation. For 4DVar, the B-R ratio is particularly critical to maximizing the information that can be extracted from the observations. To provide an accurate B-R ratio, the Tikhonov regularization theory was thus simultaneously incorporated into 4DVar to optimize the model states and the variance ratios. Furthermore, the B-R ratios can be interpreted as the (Tikhonov) regularization parameters (Johnson et al., 2005).
However, it is difficult to specify the regularization parameters in a high-dimensional VDA system for both the single-and especially the multiple-parameter regularization schemes. In the multiple-parameter regularization scheme, it became increasingly important to deal with different error properties from a variety of observations. Correspondingly, the EnKF usually adopts the inflation strategy to adjust the ensemble variances, which is relatively simple to implement, especially in terms of computational complexity.
In this study, we incorporated an MLE-based inflation approach into the 4DVar Tikhonov regularization. An ADM was proposed to minimize the 4DVar cost function with multiple regularization parameters iteratively to simultaneously optimize the regularization parameters and the model states under the framework of the NLS-4DVar method. The MLE-based approach was first adopted to estimate the regularization parameters (according to the reciprocal relationship between the regularization and inflation parameters) by minimizing the 2log-likelihood of the observation-minus-forecast residual. The NLS-4DVar was then implemented to obtain the solution of the 4DVar problem with optimized regularization parameters.
We conducted a comprehensive performance evaluation of the BD-NLS4DVar method with the proposed single-/multiple-parameter Tikhonov regularization using several groups of OSSEs based on 2-D shallowwater equations and compared the results with those obtained by the standard BD-NLS4DVar method. These experiments showed that the BD-NLS4DVar method with single-or multiple-parameter Tikhonov regularization performed substantially better than the BD-NLS4DVar method alone when all had the same online/historical ensemble sizes. Additionally, the BD-NLS4DVar method with multiple-parameter regularization (NLS-TK_m) performed much better than the corresponding single-parameter regularization method because NLS-TK_m provided better B-R ratios.
The proposed NLS-TK_m method has many advantages and great potential due to its superior performance, which was achieved without noticeably increasing computational costs, as well as its ease of implementation. The potential applications of NLS-TK_m in satellite radiance data assimilation are particularly promising due to the possible large discrepancies between the measured and simulated radiance, especially within regional models (Zhu et al., 2019). It is generally known that satellite data assimilation is responsible for much of the improvement in forecasting skill in global NWP over recent decades, especially in the Southern Hemisphere (Le Marshall et al., 2007). Therefore, our future work will focus on applying this proposed method in more complex real-world models to further investigate its potential application in real satellite data assimilation. Moreover, how to improve the B-R ratios and model perturbations simultaneously is indeed quite important in ensemble-based assimilation methods and still needs to be addressed.