Parameterizing the Impact of Unresolved Temperature Variability on the Large‐Scale Density Field: Part 1. Theory.

Unresolved temperature and salinity fluctuations interact with a nonlinear seawater equation of state to produce significant errors in the ocean model evaluation of the large‐scale density field. It is shown that the impact of temperature fluctuations dominates the impact of salinity fluctuations and that the error in density is, to leading order, proportional to the product of a subgrid‐scale temperature variance and a second derivative of the equation of state. Two parameterizations are proposed to correct the large‐scale density field: one deterministic and one stochastic. Free parameters in both parameterizations are fit using fine‐resolution model data. Both parameterizations are computationally efficient as they require only one additional evaluation of a nonlinear equation at each grid cell. A companion paper will discuss the climate impacts of the parameterizations proposed here.


Introduction
An important component of any ocean model is the seawater equation of state (EOS) which evaluates density as a function of temperature, salinity, and pressure. The choice of which EOS to use is crucial to the performance of an ocean general circulation model. Improved accuracy in the EOS, and hence the resolved density field, has been shown to improve simulations of the Atlantic Meridional Overturning Circulation, an important process in determining climate variability (Ma et al., 2020). A large source of error in the resolved density field is due to the interaction of unresolved temperature and salinity fluctuations with the nonlinear EOS. The EOS is empirically fit for water parcels in thermodynamic equilibrium (Millero, 2010). Thus, the true large-scale density is obtained by applying the EOS first at scales where the assumption of local thermodynamic equilibrium holds and then averaging to the large scale. However, most ocean models instead calculate density by applying the nonlinear EOS directly to large-scale temperature and salinity, where local equilibrium is generally not a valid assumption. Because of the curvature of the EOS, this method systematically overestimates the large-scale density. If the EOS were linear, these two calculations would be the same since averaging operators commute with linear functions. The effect of unresolved mesoscale fluctuations on the large-scale horizontal density gradient has been shown to be significant. McDougall and McIntosh (1996) estimate that temperature fluctuations of 1°C, which are common in regions of strong temperature fronts (Figure 1), can lead to errors that are 3% of the typical mean density gradient term. Brankart (2013) developed a method to simulate the effects of these unresolved temperature and salinity fluctuations and found that this parameterization has a significant effect on the large-scale circulation of the ocean and improves the Gulf Stream pathway. Further studies have shown that Brankart's parameterization impacts the large-scale density field across the Gulf Stream front at both coarse (2°) and eddy-permitting (1/4°) resolutions (Zanna et al., 2019).
The errors incurred by ignoring the rectified effect of subgrid-scale variability on the resolved density field are expected to be especially pronounced in areas with significant mesoscale eddy activity. Most climate models use a horizontal grid spacing for their ocean component that is too coarse to resolve mesoscale eddies (Tsujino et al., 2020). Mesoscale eddies play an important role in oceanic dynamics; they contain over half of the ocean's kinetic energy and contribute substantially to the transport of heat, salt, and nutrients (McWilliams, 2013). Although mesoscale eddies are typically not resolved in the current generation of climate models, their effects can be parameterized. Turbulent mesoscale eddy flows are chaotic, so comparison of ocean model simulations to the real ocean can only be done in a statistical sense. One way to improve the alignment between the statistics of the modeled mesoscale eddies and the statistics from observations is through stochastic parameterizations. Stochastic parameterizations have a long history in climate models, having been foreseen by Lorenz (1975). Stochastic parameterizations were implemented in weather models by the European Center for Medium-Range Weather Forecasts in 1999 with great success, leading to increased skill in their forecasts (Buizza et al., 1999). Stochastic parameterizations also improve ensemble spread, an important quality for data assimilation systems, and can improve patterns of low frequency variability (Berner et al., 2017).
Despite the advantages of stochastic parameterizations, ocean climate models remain largely deterministic. The generic approach to stochastic parameterization of Buizza et al. (1999) has been explored in ocean climate models (Andrejczuk et al., 2016;Brankart et al., 2015;Juricke et al., 2017Juricke et al., , 2018 and has generally demonstrated a positive effect on the spread in ensembles but only minor improvements with respect to the mean state. Other methods of adding stochastic perturbations to the resolved density field have also been explored (e.g., Williams et al., 2016). Several recent studies have focused on developing stochastic parameterizations in more idealized ocean models (e.g., Berloff, 2005;Cooper & Zanna, 2015;Grooms, 2016;Grooms & Majda, 2013Grooms & Kleiber, 2019;Jansen & Held, 2014;Kitsios et al., 2013;Porta Mana & Zanna, 2014), where the effects of these parameterizations can be more easily understood.
The scheme of Brankart (2013) is a stochastic parameterization of the impact of unresolved temperature and salinity fluctuations on the evaluation of large-scale density, used only to correct errors in the buoyancy force. In this paper we propose a new, computationally efficient and data-informed parameterization of the effect of unresolved temperature and salinity fluctuations on large-scale density. Our parameterization requires only one additional evaluation of a nonlinear function at each grid cell, compared to a minimum of two in Brankart's parameterization (Brankart, 2013). This parameterization can be used to correct the hydrostatic pressure gradient force, as in Brankart (2013); it can also be used in other places where the density is required, for example, to correct computation of the isopycnal slope in the Gent-McWilliams and Redi parameterizations (Gent & McWilliams, 1990;Redi, 1982). In section 2 we develop our deterministic parameterization. In section 3 we develop our stochastic parameterization. The final form of both the deterministic and the stochastic parameterizations are summarized in section 4. We conclude and provide discussion in section 5.

Derivation of Parameterization
Ocean models cannot resolve all spatial scales, so they typically discretize the domain into grid cells and track volume averages of desired quantities (though there are a few models not based on volume averages; e.g., Wang et al., 2014). Quantities of particular interest to this study are temperature, salinity, and pressure, which are used in the calculation of seawater density. Consider a grid cell G with volume V. Let T, S represent temperature and salinity. The volume-averaged temperature and salinity on G are where x ¼ ðx; y; zÞ is a spatial location. In the definition of potential density the pressure is a constant and is completely unaffected by averaging. For in situ density in the Boussinesq approximation the pressure is a linear function of depth p ¼ p 0 ðzÞ, so we can write the volume-averaged pressure as Density of a parcel of seawater in thermodynamic equilibrium is calculated through the EOS, a function of temperature, salinity, and pressure, ρ ¼ρðT; S; pÞ: We use the decoration· to distinguish the density, that is, the value ρ, from the EOS, that is, the functionρ that computes density. The standard ocean model calculation of density evaluates the EOS at the volume-averaged temperature and salinity and reference pressure, ρ m ¼ρðT ; S; pÞ: This calculation neglects any effects from unresolved temperature, salinity, and pressure perturbations. By contrast, the true volume-averaged density is impacted by unresolved scales. To make this explicit, we decompose temperature, salinity, and pressure into resolved and unresolved parts, where ΔT, ΔS, and Δp are the unresolved fluctuations of temperature, salinity, and pressure. Note that in the Boussinesq approximation Δp is a linear function of depth only. The volume-averaged density on a grid cell is Equation 6 includes the effect of unresolved temperature, salinity, and pressure perturbations, which is not captured in the standard model calculation of density in Equation 4.
The density calculated by the ocean model, ρ m , is not equal to the true average density, ρ, when the EOS is nonlinear. Brankart (2013) showed that this error in density is substantial and that it has a significant impact on the average large-scale circulation of the ocean. Our goal is to correct this error through a parameterization that is more computationally efficient and more accurate than Brankart's parameterization. That is, we wish to find a correction ρ c which is a function of resolved variables, is computationally inexpensive to compute, and improves the calculation of average density ρ ≈ ρ m þ ρ c : We develop the following parameterization for models with depth-based vertical coordinates, for example, z coordinates, and sufficient vertical resolution such that the vertical variations of ΔT and ΔS are much smaller than the horizontal variations. In practice, the vertical resolutions in operational ocean models are sufficient to ensure that subgrid-scale fluctuations of temperature and salinity are dominated by horizontal fluctuations. Thus, the volume averages we consider can be well approximated by horizontal averages, and we proceed to consider horizontal averages rather than volume averages. Throughout the rest of this paper, averages over the grid cell G are horizontal averages. We denote the cross-sectional area of a horizontal slice of the grid cell by A. Spatial locations are two-dimensional, x ¼ ðx; yÞ, unless otherwise stated. In a Boussinesq model, pressure is unaffected by horizontal averaging and there are no unresolved horizontal pressure fluctuations, Δp ¼ 0. For simplicity of presentation we express the EOS as a function of temperature and salinity only and denote itρðT; S; pÞ ¼ρðT; SÞ. This enables us to use the same correction for potential density and in situ density in a Boussinesq model. We discuss the utility of this parameterization in isopycnal models in Appendix B. The effect of non-Boussinesq subgrid-scale variations in pressure on the computation of volume-averaged density is beyond the scope of the current investigation.
where T ′ i and S ′ i are stochastic and the subscript i ¼ 1; …; N indicates N-independent samples from a joint distribution on T′ and S′. Brankart parameterized the joint distribution on T′ and S′ using physical arguments. The resulting stochastic approximation to density had salutary effects when implemented in a low-resolution global ocean model (Brankart, 2013;Brankart et al., 2015), but it was never verified whether the physical arguments that produced the parameterized joint distribution on T′ and S′ were consistent with data. Rather, the resulting density correction was compared with data. Our interest here is to construct a different but related form of parameterization that achieves the same ultimate goal of correcting the computation of coarse-grained density, taking care to ensure that the statistics of T′ and S′ are consistent with data from a high-resolution reference simulation.
To that end, it is worth beginning by recalling how the unresolved temperature and salinity fluctuations ΔT and ΔS can be related to the random variables T′ and S′. We will use this connection to diagnose the statistics of T′ and S′ directly from a reference simulation at eddying resolution (nominal 0.1°; see section 2.2). The connection is straightforward: Let x ′ be a random location drawn from a uniform distribution within the coarse grid cell; then we may define T′ ¼ ΔTðx′Þ, and similarly S′ ¼ ΔSðx′Þ. The joint probability density function on T′ and S′ is denoted ϕ(T′, S′) and is related to the spatial distribution of ΔT and ΔS as follows where δ(·) is the Dirac delta distribution, whose units in this context are one over the units of temperature and salinity. Using this definition, the spatial integral that defines the coarse-grained true density (6) can now be transformed, without approximation, into an average over the jointly distributed random variables T′ and S′ ρ ¼ (The proof of this identity involves inserting 9 into 10, rearranging the order of integration, and using familiar properties of the Dirac delta distribution to arrive back at the original definition (6).) Brankart's parameterization (8) is a Monte Carlo approximation to the integral (10). The weaknesses of the scheme are the ad hoc specification of ϕ, and the need to evaluate the nonlinear EOS 2N times.
where RðT ′ ; S ′ Þ is the Taylor remainder. Plugging the expansion in Equation 11 into the integral in 10, recalling that the fluctuations have zero mean, and discarding higher-order terms (i.e., the remainder) gives where σ 2 T is the variance of T′, σ 2 S is the variance of S′, and σ 2 TS is the cross covariance of T′ and S′. Our parameterization of the density correction, ρ c , is based on 12. We will begin by diagnosing all the terms in 12 to verify that it is a good approximation. Next we will find that the second-order terms involving S contribute little to the overall density correction, and will focus on constructing deterministic and stochastic parameterizations for the subgrid-scale temperature variance, σ 2 T . To lay a foundation for the diagnostic analysis, we begin by relating the second-order statistics of T′ and S′ back to spatial integrals as follows and similarly We will use the spatial integral formulation to diagnose σ 2 T , σ 2 S , and σ 2 TS . As noted above, the diagnostic analysis uses data from a high-resolution reference simulation. The values of temperature T i and salinity S i on the grid in this simulation are in fact cell-averaged values themselves, over the smaller cells in the high-resolution grid. For convenience we will assume that the values of T i and S i produced by the reference simulation are the exact values of T and S averaged over the ith fine grid cell. This assumption implies that we are ignoring errors in the high-resolution reference simulation.
Let the cells i ¼ 1; …; N from the reference simulation combine to form a single coarse grid cell. Let A i be the cross-sectional area of the ith cell of the fine-resolution grid, and define weights Then, under our assumption that the reference simulation is without error, we can exactly compute T and S as follows Having computed T and S, we can define and approximate the integrals (13-15) by Similarly, we can approximate the exact cell-averaged density as follows These are all quadrature, or numerical integration, approximations on the fine resolution grid, so the errors in the approximation decrease as the resolution of the reference simulation improves. Succinctly, scales that are unresolved in the reference simulation are ignored in our computation of subgrid-scale variances and covariances, and in our computation of the cell-averaged density.
A list of frequently used notation is compiled in Table 1. In the next section we describe a 0.1°model run whose output we use in section 2.3 to further develop our parameterization.

Ocean Model Configuration
We use output from the POP2 ocean model run at 0.1°resolution from Johnson et al. (2016) to inform details of our parameterization. The ocean and ice components are forced in accordance with the Coordinated Ocean-Ice Reference Experiments (Griffies et al., 2009) guidelines. The spin up is 15 model years and is forced with CORE-II normal-year forcing. After the spin up, the model is integrated for 33 years using CORE-II interannually varying forcing corresponding to the years 1977-2009. The model uses z coordinates and has 62 vertical levels. The output is saved as 5-day averages from which we use 1 year, which is 73 snapshots. We coarse grain the output to (1/2)°, 1°, and 2°nominal resolutions by averaging spatially over horizontal boxes of 5 × 5, 10 × 10, and 20 × 20 grid cells, respectively. The following presentation is validated at 1°resolution. Our results for coarsening scales from (1/2)°to 2°are qualitatively similar and are summarized in Appendix A. We use the Wright EOS for all evaluations of density (Wright, 1997). We perform all calculations over 1 year and report the average of the statistics calculated at each of the 73 snapshots.

Using Model Output to Inform Parameterization
We now use the output from the 0.1°model run described in section 2.2 to develop our parameterization for a 1°resolution model. The true cell-averaged density ρ (Equation 6) is not the same as the model's density ρ m (Equation 4) and we wish to find a correction, ρ c , which improves the calculation of large-scale density, ρ ≈ ρ m þ ρ c . In this section we use model output to further develop the approximation in 12.
We coarse grain the 0.1°model output to a nominal 1°resolution by assigning a 10 × 10 block of model grid cells to one lower resolution grid cell. Consider one coarse resolution grid cell and the high-resolution grid cells i ¼ 1; …; 100 associated with it. Let T i and S i be the values of temperature and salinity on the ith high-resolution grid cell. As in section 2.1, we assume that the reference simulation produces the exact values of temperature and salinity averaged over a fine grid cell. That is, we ignore errors in the high-resolution reference simulation. We calculate average temperature, T , and average salinity, S , on the coarse grid as in Equation 17 with N ¼ 100 and w i ¼ 1 100 . Note that we assume that the variation in cross-sectional area of the high-resolution grid cells is small within a 1°grid cell, and hence the weights Table 1 List of Frequently Used Notation

Notation used in both parameterizations
Density as calculated in a standard ocean model, ρ m ¼ρðT ; SÞ. ρ c Proposed deterministic correction to density, ρ c ¼ ∂ 2 Tρ ðT ; SÞ 2 s 2 T . Notation used in the stochastic parameterization only χ Noise field, enters into the proposed stochastic parameterization as multiplicative noise e χ . φ

10.1029/2020MS002185
Journal of Advances in Modeling Earth Systems defined in 16 simplify to w i ¼ 1 100 . The temperature and salinity fluctuations ΔT i , ΔS i are calculated as in Equation 18.
We additionally assume that the unresolved variability at 1°resolution is dominated by resolved scales in the 0.1°reference simulation. This assumption implies that we are ignoring the impact of unresolved variability in the 0.1°simulation on our parameterization of resolved density in 1°models. Under this assumption the approximations of subgrid-scale variances in 19 and cell-averaged density in 20 incur negligible errors. Thus, we diagnose the subgrid-scale variances at 1°from the 0.1°reference simulation as in 19 and the true cell-averaged density as in 20 with N ¼ 100 and w i ¼ 1 100 . The density that would be calculated by a 1°model using sample averages is ρ m ¼ρðT ; SÞ.
Using the 0.1°model output as data, we find that the model in Equation 12 shows excellent skill, with an average R 2 value of 0.999. The R 2 value of a data set [y 1 , … y n ] and modeled, or predicted, values [ f 1 , … , f n ] is calculated as where SS res is the sum of squares of residuals and SS tot is the total sum of squares, The R 2 value ranges between 0 and 1 and measures the proportion of variance in the data that is explained by the model. However, R 2 is sensitive to outliers and for this reason we also report a second measure of similarity, pattern correlation. Pattern correlation is calculated as The pattern correlation value ranges between −1 and 1, with a value of 1 indicating that the data [y i ] and the model [ f i ] are the same, up to a positive multiplicative factor. The average pattern correlation of the three term Taylor expansion in Equation 12 and the density correction, ρ − ρ m is 0.999. Both the R 2 and pattern correlation values are calculated globally for all depths at each of 73 different snapshots. We report the average of the 73 values.
Using just the temperature term to predict the density correction is nearly as accurate as using all three terms, with an R 2 value of 0.994 and a pattern correlation of 0.999. By contrast, the R 2 value when using just the salinity term is −0.044, indicating that the salinity term by itself is a worse predictor of the density correction than using the average density correction everywhere. The temperature term by itself is a good predictor because it is at least an order of magnitude larger than either of the other terms (Williams et al., 2016). Moreover, using only one term makes the parameterization more parsimonious and reduces the computational cost. Thus the proposed correction to the resolved density field is where σ 2 T is the subgrid-scale temperature variance. Note briefly that the density is strongly sensitive to salinity but that this sensitivity is via a linear term in the Taylor expansion, which does not have an impact on the cell-average, and is therefore already accurately represented by the model. Further note that the corrected density is always positive. The derivative of the EOS ∂ 2 Tρ is everywhere negative while the variance σ 2 T is always positive, leading to a correction that is always negative. Theoretically, this could lead to a negative value for density, which is unphysical. In practice, ρ m is on the order of 1,000 kg m −3 while the largest correction is −0.2 kg m −3 so that the predicted density is always positive. Next we move on to parameterizing the subgrid-scale temperature variance.

Parameterization of Subgrid-Scale Temperature Variance
We have so far used subgrid-scale temperature variance, σ 2 T , that we diagnose directly from the 0.1°model output to validate that the second-order Taylor expansion including only the temperature term gives a very accurate approximation to the true cell-averaged density. In a low-resolution model we will not have access to the true subgrid-scale temperature variance, thus requiring a model which is a function of resolved variables. We propose the following simple model where δx ¼ ðδx; δyÞ, δx is the zonal grid-scale length, δy is the meridional grid-scale length, c is a constant to be estimated, ∘ is the Hadamard product, and ∇ is the horizontal gradient operator, so that jδx∘∇T . Note that in section 2.1 we used δ(·) to refer to the Dirac delta function. Going forward we will use δ to refer to grid-scale lengths and step sizes only. For clarity of exposition, we introduce the notation s 2 T which denotes the parameterized subgrid-scale temperature variance, while σ 2 T denotes the diagnosed subgrid-scale temperature variance. Brankart proposes a stochastic parameterization of subgrid-scale temperature fluctuations which, much like our parameterization, relies on the gradient of large-scale temperature. His parameterization implies a parameterization of subgrid-scale temperature variance which has the same form as 25, albeit with a coefficient c that varies with latitude.
There are two contributions to the subgrid variance that can be modeled: (i) that due to anomalies associated with unresolved turbulent structures and (ii) that due to the large-scale gradient. While the former is the main focus of this work, the latter can be estimated by a simple expansion; Taylor expand ΔT at the center of the grid cell, (x 0 , y 0 ), and keep only the linear term Then approximate σ 2 T by an integral This is the same model as that in Equation 25, with c ¼ 1=12 and accounts for the subgrid variance due to the large-scale gradient. We expect the amplitude of unresolved turbulent anomalies to be related to the local large-scale gradients which suggests that c will likely be larger than 1/12 where unresolved turbulence contributes to the subgrid variance. We diagnose c below and find c > 1/12 for all resolutions considered, and that the form of the model is still appropriate as long as a different value of c is used at each resolution.
The discrete approximations to the derivative operators in the relation (25) require careful attention. We compute σ 2 T at the center of the grid, so we also want to compute jδx∘∇T j 2 at the center of the grid. We use a centered difference scheme so that jδx∘∇T j 2 is calculated as We could use a forward or backward difference instead of a centered difference, but we found that this adversely affected the quality of our parameterization. Interestingly, the formulation in Equation 28 depends only on the volume-averaged temperature T at neighboring locations, not on the grid-scale length. Also note that the difference between the predicted value of c ¼ 1=12 above and the diagnosed value below could be partly due to errors in approximating ∇T by a finite difference scheme.
We estimate the coefficient c in the relation (25) from 0.1°model output iteratively. First we use ordinary least squares (OLS) regression, where we regress over all grid cells for a given snapshot. We then take the average of the 73 estimates for c and find c ¼ 0:25. Next we re-estimate c by minimizing Huber's loss function (Equation 29) evaluated at the residuals, aðcÞ ¼ σ 2 T − c · jδx∘∇T j 2 , and summed over all grid cells in one snapshot.
We use Huber's loss function because it is less sensitive to outliers than the squared loss function used in ordinary least squares (Huber & Ronchetti, 2009). When we use OLS to estimate c at each depth independently, we find that c varies greatly over depth. Upon inspection of the data we hypothesize that this is due to a few large outliers. When we re-estimate c over all depths independently using Huber's loss, we find a consistent estimate of c across all depths. For this reason, we feel confident in using Huber's loss to estimate c for all depths simultaneously.
Huber's loss function has a free parameter b which dictates the threshold between quadratic and linear loss. This parameter is commonly calculated as the top percentile of errors which are considered outliers. We choose b to be the 90th percentile of the OLS residuals. With this choice of b we estimate c ¼ 0:20, which is an average of the values for c calculated at each of 73 different snapshots. When c ¼ 0:20, s 2 T ¼ c · jδx∘∇ T j 2 is an acceptable model for σ 2 T , with an R 2 ¼ 0:569 and a pattern correlation of 0.783, as shown in Figure 1. In the next section we discuss the deterministic parameterization of the impact of subgrid-scale variability on large-scale density that is associated with this parameterization of subgrid-scale temperature variance.

Summary of Deterministic Parameterization
In section 2.3 we show that the impact of subgrid-scale temperature fluctuations on the evaluation of large-scale density is proportional to the product of the subgrid-scale temperature variance and the second derivative of the EOS with respect to temperature (24). In section 2.4 we propose a parameterization of subgrid-scale temperature variance that is proportional to the product of the grid-scale length and the resolved temperature gradient, squared. Together this establishes our deterministic parameterization of the impact of subgrid-scale temperature fluctuations on the evaluation of large-scale density, where s 2 T ¼ c · δx∘∇T 2 with c ¼ 0:20 for 1°resolution models. Our parameterization, ρ c , which is calculated using sample averages from 0.1°model output is an acceptable model for ρ − ρ m with R 2 ¼ 0: 546 and a pattern correlation of 0.779. Using our model of subgrid-scale temperature variance, rather than the diagnosed subgrid-scale temperature variance leads to decreased R 2 and pattern correlation values.

Journal of Advances in Modeling Earth Systems
This is a result of the variability in the subgrid-scale variance shown in Figure 1. We compare the diagnosed density correction to our parameterization in Figure 2 and see very close alignment between the two fields.
We close this section with a few remarks about the deterministic parameterization. Our parameterization requires just one extra evaluation of a nonlinear function, the second derivative of the EOS with respect to temperature. Brankart's parameterization, given in Equation 8, requires at least two evaluations of a nonlinear EOS, and more evaluations are needed for added accuracy. Thus, our parameterization has the potential to significantly increase the computational efficiency of accounting for the impact of subgrid-scale variability on large-scale density.
The results presented in this section are diagnosed from a z coordinate model coarsened to a nominal 1°resolution. However, the results generalize to other resolutions and vertical coordinates. We reevaluate the results presented in this section at resolutions ranging from (1/2)°to 2°and find qualitatively similar results. The estimated constant of proportionality c varies between 0.1 and 0.25. Appendix A summarizes our findings at different resolutions.
We find also that this parameterization is a good fit to the density correction in isopycnal models. The magnitude of the impact of temperature fluctuations on resolved density is a factor of two smaller in isopycnal models than in z-coordinate models because temperature fluctuations along isopycnal layers are smaller than along surface of constant depth. The estimate of the constant of proportionality is slightly larger but still within the range estimated for a z coordinate model. Further discussion of this parameterization in a isopycnal model is in Appendix B.
In future studies we wish to use our parameterization to correct the calculation of the hydrostatic pressure gradient force and of the isopycnal slope in the Gent-McWilliams and Redi parameterizations of tracer transport. These calculations use the large-scale horizontal density gradient. The results in this section show that our parameterization corrects the evaluation of large-scale density. However, we desire a parameterization that corrects the calculation of the horizontal gradient of density. We propose ∇ d ρ c , where ∇ d is a discrete horizontal operator, as a parameterization of this error. In Appendix C we show that the fit of our parameterization is not as good, though still passable, at the level of the gradient.
Finally, Figure 1 shows that while s 2 T provides a reasonable fit to σ 2 T , there is a significant amount of variability which is unaccounted for in the deterministic parameterization. Some of this variability could presumably be reduced with a better model of σ 2 T , but some of the subgrid-scale temperature variance is likely an inherently unpredictable result of turbulent subgrid-scale variability. One idea is to model this part stochastically, as we do in the following section.

Stochastic Parameterization
The main limitation of the correction to density presented above is our parameterization of subgrid-scale temperature variance. We should not expect to be able to perfectly parameterize subgrid-scale temperature variance as it is inherently an unresolved quantity, affected by turbulent and unpredictable fluctuations at the mesoscale. Instead, we aim to replicate its statistical properties: mean, variance, and spatiotemporal correlation structure.
In this section we build on our deterministic mean model (Equation 30) and propose the following stochastic parameterization of subgrid-scale temperature variance σ 2 T ðx; y; z; tÞ ≈ e χðx; y; z; tÞ · s 2 T ðx; y; z; tÞ: The quantity χ(x, y, z, t) is a random variable indexed over both space and time. Stochasticity has been introduced in a multiplicative and exponential form for two reasons: (i) the right panel of Figure 1 suggests that the random errors in the deterministic model are in fact a multiplicative exponential and (ii) multiplication by an exponential is a simple way to maintain a positive parameterization of subgrid-scale temperature variance. In section 3.1 we show that χ(x, y, z, t) can be approximated by a depth-independent field χ(x, y, t). In section 3.2 we show that χ(x, y, t) is approximately normally distributed with constant mean and variance. This vastly simplifies our model as it requires us to further estimate only the spatiotemporal correlation structure of χ(x, y, t). We discuss the correlation through time (t) in section 3.3 and the correlation through horizontal space (x, y) in section 3.4. In section 4 we substitute our stochastic parameterization of subgrid-scale temperature variance into the model for the correction to density given in Equation 24 and discuss the resulting stochastic parameterization.

Depth Structure
We exploit the vertical structure already present in our deterministic model and approximate χ(x, y, z, t) by a field χ(x, y, t) that is independent of depth. We observe that the diagnosed subgrid-scale temperature variance has a depth structure that is similar to the depth structure of our deterministic model. This observation is motivated by the comparison of leading empirical orthogonal functions (EOF) and validated through the calculation of the percent of variance explained by a depth-independent noise term. Figure 3 shows the leading EOFs of the diagnosed and modeled subgrid-scale temperature variance for two random samples of horizontal locations. Within each sample, the leading EOF of the diagnosed subgrid-scale temperature variance resembles the leading EOF of our deterministic model, with notable similarities in the behavior near the surface and at depths greater than 1,000 m. There is a discrepancy in the behaviors around 500 m. Nonetheless, this approximation explains, on average, 75% of the variance in our deterministic model. Figure 3 are calculated globally for one snapshot. Both the diagnosed and modeled subgrid-scale temperature variance fields approach 0 at depths greater than 2,000 m, as shown in Figure 3. Hence, we include only the horizontal locations where the ocean depth is 2,000 m or greater and consider only the top 2,000 m, 46 out of 62 vertical levels. To speed up calculations, we calculate the EOFs over a random sample of 2,000 horizontal locations between 60°S and 60°N. Outside of these latitudes subgrid-scale temperature variances are small as are lateral temperature gradients. Thus, the deterministic parameterization is near zero and the impact of multiplicative noise is small. Results are similar across sev- Figure 4. Left: Empirical probability density function of the percent of variance explained when the vertical structure of the stochastic parameterization is taken to be the same as the vertical structure of our deterministic parameterization of subgrid-scale temperature variance. The peak at 95% indicates that at a random horizontal location the most probable amount of variance explained by a depth-independent noise field is 95%. Right: The probability ( y axis) that at a random horizontal location a depth-independent noise field explains at least a given percent of the variance (x axis). For example, at a random horizontal location there is a 90% chance that a depth-independent noise field explains at least 40% of the variance and a 50% chance that it explains at least 80% of the variance. Data are taken from a temporal snapshot at locations between 60°S and 60°N, where the water column is at least 2,000 m deep and the depth-averaged subgrid-scale temperature variance is at least 10 −4 (°C) 2 . eral samples. The leading EOF of the diagnosed subgrid-scale temperature variance, σ 2 T , explains close to 80% of the variance over depth. Similarly, the leading EOF of our deterministic model, s 2 T , also explains close to 80% of the variance.

The EOFs shown in
The comparison of leading EOFs motivates our decision to use the vertical structure inherent in our deterministic parameterization as the vertical structure for our stochastic parameterization, that is, to use a depth-independent multiplicative noise. We quantify the strength of this approximation by calculating the percent of variance explained by a depth-independent noise field and find that it explains, on average, 75% of the variance, as shown in Figure 4.
The choice to use our deterministic model as the vertical structure for the unresolved subgrid-scale temperature variance leads to the following simplified model, σ 2 T ðx; y; z; tÞ ≈ e χðx; y; tÞ · s 2 T ðx; y; z; tÞ: In the following section we discuss how we diagnose the field χ(x, y, t) and some interesting properties of this field.

Diagnosing the Noise Field
Consider a single snapshot at time t and let χðx; yÞ ¼ χðx; y; tÞ. Our proposed multiplicative field is independent of depth, so at each horizontal location (x, y) we multiply the entire column by a single multiplicative factor e χ(x, y) . That is, We use boldface to indicate that the quantities σ 2 T ðx; yÞ and s 2 T ðx; yÞ are vectors. We denote the depth of the ith grid cell in a column by z i . In our diagnosis using the 0.1°model data, N ¼ 46 out of a possible 62 vertical levels because we consider only the top 2,000 m.
Next we see that the mean of e χ(x, y) is close to 1, so that the mean of our stochastic parameterization is close to the mean of our deterministic parameterization. Recall that our deterministic model has the form σ 2 T ¼ s 2 T þ η where the error term, η, is small. We fit our model at the level of individual data points. Nonetheless, when we consider the entire column, σ 2 T ¼ s 2 T þ η, we see that each entry in the column η is small, and hence conclude that the vector η is also small when compared to the deterministic model ?
Since η is small compared to the deterministic model, the second term on the right hand side of Equation 36 is also small. Thus the mean of the field e χ(x, y) is approximately 1 everywhere.
The quantity χ that we wish to provide a stochastic model for can be estimated from the data as follows χðx; yÞ ¼ ln σ 2 T ðx; yÞ; s 2 T ðx; yÞ We diagnose the field χ (x, y, t) at each horizontal location (x, y) on the coarse-grained 1°grid over one year of model output, which is 73 snapshots (as described in section 2.2). We include only the horizontal locations where the ocean depth is 2,000 m or greater and considered only the top 2,000 m, 46 out of 62 vertical levels.
We consider data with latitudes between 60°S and 60°N. We calculate sample mean and variance at each spatial location (x, y) by time-averaging over the year of model data.
The sample mean of the diagnosed field χ (x, y, t) is 0.10 and the sample variance is σ 2 χ ¼ 0:39. In our stochastic model we choose to use a mean of 0 for simplicity because the sample mean is small compared to the sample standard deviation, and as we see in Equation 36 the mean of e χ is approximately 1. Importantly, although there is some spatial variability in the distribution of χ, neither the mean nor the variance display any large-scale spatial structure. Additionally, we find that χ (x, y, t) is approximately normally distributed so that the multiplicative noise e χ (x, y, t) is log-normally distributed. We validate normality of χ through Q-Q plots and histograms, shown in Figure 5. The assumption of normality is very good for the left tail of the distribution. The right tail follows a normal distribution out to 1.5 standard deviations after which it diverges. In choosing to model this field with a mean zero normal distribution, we underpredict the large positive values. An improved representation might be achieved using an alternative distribution like the "stochastically generated skewed" distribution of Penland and Sardeshmukh (2012). In the next section we discuss the temporal correlation of this diagnosed field χ (x, y, t).

Temporal Correlation
We choose to model the temporal correlation of the diagnosed field χ (x, y, t) (Equation 37) with an AR(1) time series model at each spatial location (x, y). That is, Figure 5. Left: Empirical probability density function (solid blue) of the diagnosed field χ(x, y) and proposed normal distribution (dashed red). Right: Q-Q plot comparing quantiles of χ to quantiles of a standard normal. Where the blue crosses lie along the red line, the assumption of normality is reasonable. The left tail follows a normal distribution. The right tail follows a normal distribution out to 1.5 standard deviations. χðx; y; tÞ ¼ φðx; y; tÞχðx; y; t − δtÞþεðx; y; tÞ; where the AR(1) coefficient φ and the innovation, or error, ε are allowed to vary over space. In practice, we model φ(x, y, t) as a function of instantaneous resolved surface velocity. However, for the purposes of diagnosing φ(x, y, t), we assume that it varies slowly in time so that one spatial field φ(x, y) with no time dependence is a reasonable model for φ(x, y, t) over the year of model data we consider. We use maximum likelihood estimation to diagnose the time-independent AR(1) field φ(x, y) at each (x, y) location. The AR(1) parameter φ(x, y) is related to a decorrelation time τ(x, y) in the following way φðx; yÞ ¼ e −δt=τðx; yÞ ; where δt is the time interval over which the output is saved, here 5 days. We now consider the diagnosed field of decorrelation time τ. Rearranging Equation 39 gives We now seek a parameterization of the decorrelation time of subgrid-scale temperature variance. The unresolved temperature fluctuations ΔT which comprise σ 2 T are carried by the mean flow. We expect that the inherently subgrid-scale quantity σ 2 T will decorrelate over a time period comparable to the time it takes a fluctuation ΔT to be carried through the grid cell. This time is precisely the transit time, which is approximated by grid cell length divided by speed, ‖δx‖/‖u‖, where u ¼ ðu; vÞ is velocity. We investigate two values of ‖u‖: (i) integrated over depth and (ii) at the surface and find that both are good predictors of decorrelation time. We choose to work with resolved ‖u‖ at the surface going forward for computational ease, and because the resolved velocity is primarily responsible for sweeping perturbations ΔT through a coarse grid cell. In our actual parameterization, the AR(1) parameter φ(x, y, t) is a function of instantaneous resolved surface velocity, ‖u‖. For the purposes of diagnosing φ we model a single spatial field φ(x, y) for the whole year of model (bottom). The color scale is on a log scale and is generated with cmocean (Thyng et al., 2016). Spatial locations with less than 2,000 m depth are not considered and are shown in white. The overall large-scale spatial structure of the diagnosed decorrelation time (top) resembles that of the modeled decorrelation time (bottom) with a pattern correlation of 0.80 after discarding outliers where the diagnosed decorrelation time is greater than 500 days. data. For this reason, we fit our model using ‖u‖ averaged over one year. We time average ū 2 ðx; y; tÞþv2 ðx; y; tÞ over a year where ūðx; y; tÞ is the zonal surface velocity on the coarse-grained 1°grid at the horizontal location (x, y) and time t and vðx; y; tÞ is the analogous quantity for the meridional surface velocity. We denote the time averaged energies by ū 2 ðx; yÞþv2ðx; yÞ, dropping the time variable t.
We propose a model for decorrelation time τ, which we diagnose through Equation 40, of the following form, where δx and δy are the zonal and meridional grid-scale lengths, respectively. Using the diagnosed decorrelation time τ, we estimate the coefficient k by calculating the sample geometric mean of τðx; yÞ · ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi δx 2 þ δy 2 ū 2 ðx; yÞþv2ðx; yÞ which gives k ¼ 3:7. This value varies slightly across different coarse-grained resolutions: At (1/2)°we estimate k ¼ 2:3, and at 2°we estimate k ¼ 2:7. The overall large-scale structure of the modeled values of τ resemble the spatial patterns seen in the diagnosed values of τ, as shown in Figure 6.
Solving for the AR (1) As the AR(1) coefficient varies in space, so does the theoretical process variance. At a given horizontal location (x, y) the variance of the AR(1) process in Equation 38 is where σ 2 χ is the variance of the noise field χ(x, y, t) and σ 2 εðx; yÞ is the variance of the innovations at the location (x, y). The field χ(x, y, t) is approximately homoskedastic, meaning the variance is approximately constant across space and time. We estimate the variance to be σ 2 χ ¼ 0:39. In order to satisfy Equation 44, we must have that the variance σ 2 ε of the AR(1) innovation ε also varies, and we set σ 2 εðx; yÞ ¼ σ 2 χ 1 − φðx; yÞ 2 À Á .
In the next section we consider the spatial correlation of the scaled innovations, with unit variance, given by

Horizontal Spatial Correlation
We diagnose the scaled innovations θ(x, y, t) with modeled AR(1) coefficient from Equations 38, 43, and 45. We calculate nearest neighbor correlations for the 20 nearest east-west neighbors as the time average of the product θ(x, y, t)θ(x + n · δx, y, t) where n ¼ ±1; 2; …; 20. We calculate north-south nearest neighbor correlations analogously. We are building this model for an inherently subgrid-scale quantity and thus expect correlations across grid cells to be small. As expected, the sample correlations, shown in Figure 7, are negligible.
We proceed with a model that is uncorrelated in space. The resulting stochastic parameterization and the preceding deterministic parameterization are summarized in the following section.

Summary of Deterministic and Stochastic Parameterizations
Summarizing the previous section, our full stochastic model for subgrid-scale temperature variance is σ 2 T ðx; y; z; tÞ ≈ e χðx; y; tÞ · s 2 T ðx; y; z; tÞ; where the time correlation of χ(x, y, t) is given by the following AR (1) where k ¼ 3:7 at 1°, k ¼ 2:3 at (1/2)°, and k ¼ 2:7 at 2°. The innovations θ(x, y, t) are independent and identically distributed standard Gaussian white noise in space and time. From our diagnosed field we estimate σ 2 χ ¼ 0:39. Finally, we substitute this stochastic model for subgrid-scale temperature variance into our model for the density correction, which yields ρðx; y; z; tÞ ≈ ρ m ðx; y; z; tÞþe χðx; y; tÞ · s 2 T ðx; y; z; tÞ 2 · ∂ 2 Tρ ðT ðx; y; z; tÞ; Sðx; y; z; tÞÞ: The deterministic parameterization is given by Equation 49 with χ set to 0. The constant of proportionality c used in the parameterization of subgrid-scale temperature variance, s 2 T defined in Equation 25, is estimated from model data to be c ¼ 0:20 for nominal 1°resolution models. Estimates of c for other resolutions are shown in Figure A1.
Because the multiplicative noise has a lognormal distribution the mean of the stochastic model is not equal to the deterministic model. Instead, the median of the stochastic model is equal to the deterministic model, while the mean of the stochastic model is larger than the deterministic model by a factor of exp σ 2 χ =2 h i , or about 21.5%. The mode of the stochastic model is smaller than the deterministic model by a factor of exp −σ 2 χ h i , or about 32.3%. The stochastic model frequently predicts smaller density corrections than the deterministic model, and occasionally predicts density errors much larger than the deterministic model.

Summary and Conclusions
In this paper we analyze the impact of unresolved temperature and salinity fluctuations on the evaluation of cell-averaged density. The standard model calculation of cell-averaged density applies the nonlinear EOS to cell-averaged temperature and salinity. This is not the same as the true cell-averaged density, which is calculated by applying the EOS locally and then averaging over a grid cell. We parameterize this difference with a spatially averaged second-order Taylor expansion of the EOS about cell-averaged temperature and salinity.
The linear terms average to 0, which leaves the three second-order terms. The EOS is nearly linear in salinity, so the second derivatives involving salinity are small. We use model data to show that the corresponding second-order salinity and the temperature salinity cross term are negligible compared to the temperature term. This simplifies our parameterization to one term, the subgrid-scale temperature variance times the second derivative of the EOS with respect to temperature. We propose a deterministic parameterization of subgrid-scale temperature variance that is proportional to the grid cell length times the resolved horizontal temperature gradient, squared. Using model data, we estimate the coefficient of proportionality and show that this value changes only slightly, between 0.1 and 0.25, over a range of model resolutions ( Figure A1). This parameter may need to be tuned in future studies because lateral density gradients in 1°models are expected to be smaller than those we diagnosed from coarse graining the 0.1°model. Although the deterministic parameterization of subgrid-scale temperature variance is fit to model data, there is quite a bit of unexplained variability.
We propose a multiplicative stochastic parameterization which replicates the statistical properties of subgrid-scale temperature variance. We diagnose the noise field and find that it is largely depth independent. It suffices to model a two-dimensional noise field, which we diagnose by projecting the subgrid-scale temperature variance onto our deterministic parameterization. We find that the field is approximately log-normally distributed, implying that the mean of the stochastic noise is somewhat larger than the deterministic model due to the rare occurrence of large multiplicative factors. For the temporal correlation we use an AR(1) process with a coefficient that is a function of resolved surface velocity with innovations that are uncorrelated in horizontal space. Our deterministic and stochastic parameterizations are both data informed because we use model output to estimate free parameters. Throughout this work we make the assumption that the scale separation between the model data we use and the coarse-grained resolution is sufficient to provide reliable statistics of unresolved scales on the coarse model grid. Moreover, while the parameters we estimate in this paper change very little over the snapshots we consider, it is always possible that the parameters may shift in a changing climate. Both of our parameterizations are computationally efficient because they require only one additional evaluation of a nonlinear function, compared to two or more in other parameterizations of this effect. This work demonstrates a density correction which provides a good fit to static fields. However, this does not guarantee an improvement in the dynamical model, a requirement for the adoption of any new parameterization.
An upcoming paper will report the impact of each version of our parameterization on large-scale circulation in MOM6 when used to correct: (1) the hydrostatic pressure gradient force, as in Brankart (2013); and (2) computation of the isopycnal slope in the Gent-McWilliams and Redi parameterizations (Gent & McWilliams, 1990;Redi, 1982). Based on previous experience with the Brankart parameterization we expect an improvement in the path of the Gulf Stream, and an increase in ensemble spread for ensemble forecasts and data assimilation (Brankart, 2013;Brankart et al., 2015;Zanna et al., 2019). Brankart's parameterization only had a stochastic version, so it is not clear how many of the improvements were due to the mean and how much were due to the stochasticity. However, we speculate that many of the improvements to the mean circulation come from the mean density correction. Indeed, the curvature of the EOS is negative in all physically relevant regimes. As a result, ignoring the effect of unresolved temperature and salinity fluctuations on large-scale density leads to a systematic overestimation of the density in eddy energetic regions such as the Gulf Stream. We contend that one of the main effects of Brankart's parameterization is to improve the mean density used in the model's hydrostatic equation, steering currents, such as the Gulf Stream, in more realistic pathways. When interpreted like that, Brankart's parameterization, and by extension the new parameterization proposed in this manuscript, can be seen as variations of the semiprognostic method from Greatbatch et al. (2004). The difference is that the semiprognostic method relies on a convex combination of the model density and an input density usually derived from climatological hydrographic data to compute a correction term to the horizontal momentum equations, while we and Brankart (2013) parameterize the impact of unresolved temperature and salinity fluctuations on the evaluation of the density to infer their correction terms to the horizontal momentum equations. The semiprognostic method has been used in many ocean models and across a large range of resolutions. It has been shown to consistently reduce the model systematic error and, for example, improve the representation of the Gulf Stream/North Atlantic Current systems. This gives us confidence in the ability of our parameterization to improve the simulated average large-scale circulation of the ocean.
using the modeled subgrid-scale temperature variance. One possibility is that the terms involving salinity in the Taylor expansion (12), while not significant at the level of density, become significant when differenced at the level of the density gradient. However, as in the correction to density, we find that a finite difference of the temperature term alone with the diagnosed subgrid-scale temperature variance is an excellent model of the correction to the horizontal gradient of density, with R 2 ¼ 0:993. Thus, we conclude that the inclusion of salinity terms is not necessary. An improvement to the parameterization of subgrid-scale temperature variance is likely to make a bigger impact on the correction to the horizontal gradient of density than the inclusion of subgrid-scale salinity fluctuations.
To see why our model with parameterized subgrid-scale variance performs worse at the level of the gradient, we compare the errors present in each model. We parameterize the density correction through ρ c given in Equation 30. Our parameterization is not perfect, so there is some nonzero error η associated with our parameterization. That is, We now compare the error η 1 in the parameterized density correction to the error in our parameterized correction to the gradient of density.
Apply the discrete horizontal gradient to both sides of Equation C1 to find The true discrete large-scale horizontal gradient of density is ∇ d ρ. The horizontal density gradient calculated by the model is ∇ d ρ m . Note that some ocean models use thermal expansion and haline contraction coefficients to calculate the horizontal gradient of density. Our results do not change when we calculate the horizontal density gradient this way, rather than through the finite difference, ∇ d ρ m .
Comparing Equation C2 to Equation C1, we see that the total error at the level of the gradient is ∇ d η, compared to η at the level of density. The operator ∇ d is bounded because it is discrete. Nonetheless, ∇ d η can be significantly larger than η when η is dominated by small scales, which explains the substantially decreased R 2 value.

Data Availability Statement
Data and scripts used in this article are available online (at https://doi.org/10.5281/zenodo.4019845).