Volume 12, Issue 12 e2020MS002108
Research Article
Open Access

Uncertainty Quantification of Ocean Parameterizations: Application to the K-Profile-Parameterization for Penetrative Convection

A. N. Souza, G. L. Wagner, A. Ramadhan, B. Allen, V. Churavy, J. Schloss, J. Campin, C. Hill, A. Edelman, J. Marshall, G. Flierl, and R. Ferrari

Massachusetts Institute of Technology, Cambridge, MA, United States

Correspondence to: A. N. Souza, [email protected]

First published: 24 October 2020

Abstract

Parameterizations of unresolved turbulent processes often compromise the fidelity of large-scale ocean models. In this work, we argue for a Bayesian approach to the refinement and evaluation of turbulence parameterizations. Using an ensemble of large eddy simulations of turbulent penetrative convection in the surface boundary layer, we demonstrate the method by estimating the uncertainty of parameters in the convective limit of the popular “K-Profile Parameterization.” We uncover structural deficiencies and propose an alternative scaling that overcomes them.

Key Points

  • A Bayesian methodology can be used to probe turbulence parameterizations and better understand their biases and uncertainties
  • Parameterization parameter distributions, learned using high-resolution simulations, can be used as prior distributions for climate studies

Plain Language Summary

Climate projections are often compromised by significant uncertainties which stem from the representation of physical processes that cannot be resolved—such as clouds in the atmosphere and turbulent swirls in the ocean—but which have to be parameterized. We propose a methodology for improving parameterizations in which they are tested against, and tuned to, high-resolution numerical simulations of subdomains that represent them more completely. A Bayesian methodology is used to calibrate the parameterizations against the highly resolved model, to assess their fidelity and identify shortcomings. Most importantly, the approach provides estimates of parameter uncertainty. While the method is illustrated for a particular parameterization of boundary layer mixing, it can be applied to any parameterization.

1 Introduction

Earth System Models (ESMs) require parameterizations for processes that are too small to resolve. Uncertainties arise both from deficiencies in the scaling laws encoded in the parameterizations and from nonlinear interactions with resolved model components, sometimes leading to unanticipated and unphysical results. The first challenge can be addressed by improving the representation of the unresolved physics (e.g., Schneider et al., 2017), while the second requires “tuning” of the parameterizations when implemented in the full ESM (e.g., Hourdin et al., 2017). In this paper, we illustrate how to leverage recent advances in computation and uncertainty quantification to make progress on the first challenge. Our focus will be on oceanic processes, but the approach can be applied to any ESM parameterization, provided that a high-resolution submodel can be constructed.

The traditional approach to the formulation of parameterizations of subgrid-scale processes is to derive scaling laws that relate the net effect of such processes to variables resolved by the ESMs. These scaling laws are then tested with field observations (e.g., Large et al., 1994; Price et al., 1986), laboratory experiments (e.g., Cenedese et al., 2004; Deardorff et al., 1980), or results from high-resolution simulations (e.g., Harcourt, 2015; Li & Fox-Kemper, 2017; Reichl et al., 2016; Wang et al., 1996). Rarely are parameterizations tested over a wide range of possible scenarios, due to the logistical difficulty and high cost of running many field experiments, the time necessary to change laboratory setups, and computational demand. The computational limitations have become much less severe over the last few years through a combination of new computer architectures such as Graphics Processing Units (GPUs; Besard, Churavy, et al., 2019), new languages that take advantage of these architectures (e.g., Julia; Bezanson et al., 2017), and improved large eddy simulation (LES) algorithms (Sullivan & Patton, 2011; Verstappen, 2018). Modern computational resources have opened up the possibility of running libraries of LES simulations to explore a vast range of possible scenarios. This paper discusses how such computational advances can be applied to assess parameterizations in ocean models.

LES simulations alone are not sufficient to formulate parameterizations. Statistical methods are needed to extract from the LES solutions the functional relationships between small-scale processes and coarse variables available in ESMs. A common approach is to rely on well-established scaling laws and use the LES solutions to constrain the nondimensional parameters that cannot be determined from first principles. In this approach, only a few LES simulations are necessary to find the optimal parameter values. However, it is rare that scaling laws and associated parameterizations perfectly capture the functional dependencies of large-scale variables—if they did, they would be referred to as solutions rather than parameterizations. In general, it is necessary to run a large ensemble of LES simulations to estimate optimal parameter values and test whether those values hold for different scenarios, thereby supporting the functional dependencies.

State estimation, which has a long tradition in geophysics (Wunsch, 2006), has been used to constrain parameter values. A loss function is chosen to quantify the mismatch between the prediction of the parameterization and observations. Uncertain parameters are then adjusted to minimize the loss function. One can also estimate the standard deviation around the optimal values by computing the Hessian of the loss function (Sraj et al., 2014; Thacker, 1989).

An alternative approach, based on the seminal work of Bayes (1763) and its modern incarnation (Jaynes, 2003), is arguably better suited to constrain the transfer properties of turbulent processes. The Bayesian method allows one to estimate the entire joint probability distribution of all parameters. The method is a crucial extension over state estimation, because the statistics of turbulent processes are generally far from Gaussian (Frisch, 1995) and thus are not fully characterized by the first and second moments alone. In the Bayesian approach, one defines a prior parameter distribution, based on physical considerations, and a “likelihood function,” which measures the mismatch between the parameterized prediction and the LES simulation. Based on this information, Bayes' formula shows how to compute the posterior distribution of the parameters consistent with the LES simulations and the parameterization. If the posterior distribution is narrow and peaked, then one can conclude that a unique set of parameters can be identified which can reproduce all LES results. In this limit, the Bayesian approach does not provide more information than state estimation. However, the power of Bayes' formula is that it can reveal distinct parameter regimes, the existence of multiple maxima, relationships between parameters, and the likelihood of parameter values relative to optimal ones.

The Bayesian approach can also be used to test the functional dependence of the parameterization on large-scale variables. One estimates the posterior distribution on subsets of the LES simulations run for different scenarios. If the posterior probabilities for the different scenarios do not overlap, the functional form of the parameterization must be rejected. We will illustrate how this strategy can be used to improve the formulation of a parameterization.

Bayesian methods are particularly suited to constrain ESM parameterizations of subgrid-scale ocean processes. Most of these processes, such as boundary layer or geostrophic turbulence, are governed by well-understood fluid dynamics and thermodynamics; thus, LES simulations provide credible solutions for the physics. The atmospheric problem is quite different: there, leading-order subgrid-scale processes such as cloud microphysics are governed by poorly understood physics that may not be captured by LES simulations.

In this paper, we will apply Bayesian methods to constrain and improve a parameterization for the surface boundary layer turbulence that develops when air-sea fluxes cool the ocean. LES simulations that resolve all the relevant physics will be used as ground truth to train the parameterization. Our paper is organized as follows: In section 2 we describe the physical setup and the LES model. In section 3 we introduce Bayesian parameter estimation for the parameters in the K-Profile Parameterization (KPP). We then perform the parameter estimation in the regime described by section 2 and show how the Bayesian approach provides insight on how to improve the KPP parameterization. Finally, we end with a discussion in section 4.

2 Large Eddy Simulations and K-Profile-Parameterization of Penetrative Convection

During winter, high-latitude cooling induces near-surface mixing by convection, which generates a “mixed layer” of almost uniform temperature and salinity that can reach depths of hundreds of meters—see Marshall and Schott (1999) for a review. At the base of the mixed layer, convective plumes can penetrate further into the stratified layer below—called the “entrainment layer”—where plume-driven turbulent mixing between the mixed layer and the stratification below cools the boundary layer. This process, in which the layer is cooled both at the surface and by turbulent mixing from the entrainment layer below, is called penetrative convection. Here we evaluate the ability of the K-Profile Parameterization (Large et al., 1994) to capture penetrative convection by comparing predictions based on it against large eddy simulations (LES) of idealized penetrative convection into a resting stratified fluid. This provides the context in which we outline the Bayesian approach to parameter estimation that we advocate.

2.1 Penetrative Convection Into a Resting Stratified Fluid

We apply a constant surface cooling Qh > 0 to a resting, linearly stratified boundary layer with the initial state

$$\boldsymbol{u}\big|_{t=0} = 0, \qquad b\big|_{t=0} = N^2 z + \xi, \tag{1}$$

where z ∈ [−Lz, 0], $\boldsymbol{u}$ is the resolved velocity field simulated by LES, b is buoyancy, N² is the initial vertical buoyancy gradient, and ξ is a Gaussian white noise process added to induce a transition to turbulence. The surface buoyancy flux Qb is related to the imposed surface cooling Qh, which has units W m⁻², via

$$Q_b = \frac{\alpha g}{\rho_0 c_p}\, Q_h, \tag{2}$$

where α is the thermal expansion coefficient (assumed constant), g is gravitational acceleration, ρ0 is a reference density, and cp ≈ 3993 J/(kg °C) is the specific heat capacity. Our software and formulation of the large eddy simulation model are discussed in Appendix A.
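As a quick sanity check on Equation 2, the Julia snippet below converts a 100 W m⁻² surface cooling into a buoyancy flux. Only cp is quoted in the text; the values of α and ρ0 here are typical assumed values.

```julia
# Convert an imposed surface cooling Qʰ [W m⁻²] into a buoyancy flux
# Q_b = α g Qʰ / (ρ₀ cₚ), Equation 2. α and ρ₀ are assumed typical values.
α  = 2e-4      # thermal expansion coefficient [°C⁻¹] (assumed)
g  = 9.81      # gravitational acceleration [m s⁻²]
ρ₀ = 1027.0    # reference density [kg m⁻³] (assumed)
cₚ = 3993.0    # specific heat capacity [J kg⁻¹ °C⁻¹]

buoyancy_flux(Qʰ) = α * g * Qʰ / (ρ₀ * cₚ)

println(buoyancy_flux(100.0))   # ≈ 4.8e-8 m² s⁻³ for 100 W m⁻² of cooling
```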
Results from a large eddy simulation of turbulent penetrative convection, computed with 256 × 256 × 512 grid cells, are presented in Figure 1. The resulting horizontally averaged temperature profiles are not affected by the domain size. The left panel shows the three-dimensional temperature field T associated with the buoyancy b through the linear equation of state of Appendix A, with the reference temperature taken to be the surface temperature at t = 0. The right panel shows the horizontally averaged buoyancy profile

$$\overline{b}(z, t) = \frac{1}{L_x L_y} \int b \;\mathrm{d}x\, \mathrm{d}y. \tag{3}$$

Figure 1. A 3-D simulation of the LES model of the Boussinesq equations and its horizontal average. The Δh region of the figure on the right corresponds to the entrainment layer, h − Δh corresponds to the mixed layer, and h corresponds to the boundary layer depth.

The visualization reveals the two-part boundary layer produced by penetrative convection: close to the surface, cold and dense convective plumes organized by surface cooling sink and mix ambient fluid, producing a well-mixed layer that deepens in time. Below the mixed layer, the momentum carried by sinking convective plumes leads them to overshoot their level of neutral buoyancy (nominally, the depth of the mixed layer), “penetrating” the stably stratified region below the surface mixed layer and generating the strongly stratified entrainment layer. The total depth of the boundary layer is h and includes the mixed layer and the entrainment layer of thickness Δh. Turbulent fluxes are assumed negligible below z = −h.

In Figure 2, we show the evolution of h(t), defined as the first depth from the bottom where the stratification equals a weighted average of the maximum stratification and the initial stratification. (The weights are 2/3 for the initial stratification N² and 1/3 for the maximum stratification $N^2_{\max}$, so that h satisfies $\partial_z \overline{b}\,\big|_{z=-h} = \tfrac{2}{3} N^2 + \tfrac{1}{3} N^2_{\max}$. This guarantees that h is a depth where the local stratification lies between the background stratification and the maximum stratification, since it is defined as the first depth starting from the bottom that satisfies such a criterion.) The dotted line confirms that the evolution after an initial transient is best fit by the formula

$$h(t) \approx \sqrt{3.0\, \frac{Q_b\, t}{N^2}}, \tag{4}$$

where N² is the initial stratification and the numerical factor is a best-fit parameter.

Figure 2. Boundary layer depth and its evolution in time after initial transients. The blue squares are the analytic scaling of Equation 4, the red line is an estimate of the boundary layer depth directly from the LES (described in the text), and the purple line is the classic scaling, Equation 8, which ignores the entrainment layer.
Equation 4 is easily understood through dimensional considerations (up to prefactors), but more information flows from an analysis of the horizontally averaged buoyancy equation,

$$\partial_t \overline{b} = -\,\partial_z \left( \overline{wb} + \overline{q} \right), \tag{5}$$

where $\overline{b}$ is the horizontally averaged buoyancy, $\overline{wb}$ is the horizontally averaged vertical advective flux, and $\overline{q}$ is the horizontally averaged vertical diffusive flux. Integrating the equation in time between t = 0 and some later time t, and in the vertical between the surface, where the net flux equals Qb, and the base of the entrainment layer, where all turbulent fluxes vanish, one finds

$$\int_{-h}^{0} \left[\, \overline{b}(z, t) - \overline{b}(z, 0) \,\right] \mathrm{d}z = -\,Q_b\, t. \tag{6}$$

Substituting $\overline{b}(z, 0) = N^2 z$ and a piecewise-linear profile with a well-mixed layer above an entrainment layer of thickness Δh, an approximation of the profile shown in Figure 1b except at very early times in the simulation, yields

$$\frac{N^2 h^2}{2} \;-\; \Delta b \left( h - \frac{\Delta h}{2} \right) = Q_b\, t. \tag{7}$$

The first term on the left of Equation 7 describes boundary layer deepening due to buoyancy loss at the surface, while the second term corresponds to the further cooling caused by turbulent mixing in the entrainment layer. Other authors have also arrived at a similar expression for the boundary layer depth upon taking into account turbulent entrainment. See, for example, Appendix F in Van Roekel et al. (2018).

Ignoring turbulent mixing in the entrainment layer, that is, setting Δb = 0, yields the deepening rate

$$h(t) = \sqrt{2\, \frac{Q_b\, t}{N^2}}, \tag{8}$$

which differs by roughly 20% from the best-fit expression in Equation 4 due to the effects of turbulent mixing in the entrainment layer. Equation 8 is the deepening rate associated with a convective adjustment parameterization and is known as the empirical law of free convection. We now review how these processes are represented in the KPP model.
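A minimal sketch comparing the two deepening laws, Equations 4 and 8. The values of Qb and N² below are illustrative, chosen so that h reaches roughly 70 m after 8 days, in the ballpark of the simulation described above; they are not the exact simulation values.

```julia
# Boundary layer depth from the best-fit scaling (Equation 4) and the
# free-convection law without entrainment (Equation 8).
Qᵇ = 4.8e-8                            # surface buoyancy flux [m² s⁻³] (illustrative)
N² = 2e-5                              # initial stratification [s⁻²] (illustrative)

h_fit(t)  = sqrt(3.0 * Qᵇ * t / N²)    # Equation 4, with entrainment
h_free(t) = sqrt(2.0 * Qᵇ * t / N²)    # Equation 8, no entrainment

t = 8 * 86_400.0                       # eight days in seconds
println("with entrainment:    ", round(h_fit(t),  digits = 1), " m")  # ≈ 70.5 m
println("without entrainment: ", round(h_free(t), digits = 1), " m")  # ≈ 57.6 m
println("ratio = ", sqrt(3.0 / 2.0))   # ≈ 1.22, the ~20% difference
```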

2.2 The K-Profile Parameterization of Penetrative Convection

In penetrative convection in a horizontally periodic domain, the K-Profile Parameterization models the horizontally averaged temperature profile $\overline{T}(z, t)$ with the coupled equations

$$\partial_t T = -\,\partial_z F(T, h; \boldsymbol{C}), \tag{9}$$

$$\mathcal{N}(T, h; \boldsymbol{C}) = 0, \tag{10}$$

where T(z, t) is the modeled temperature meant to approximate $\overline{T}$, h(t) is the boundary layer depth, $\boldsymbol{C} = \{C_S, C_N, C_D, C_H\}$ is a set of free parameters, F(T, h; C) is the parameterized temperature flux, and $\mathcal{N}(T, h; \boldsymbol{C}) = 0$ is a nonlinear constraint that determines the boundary layer depth at each time t. Our formulation, which isolates the four free parameters {CS, CN, CD, CH}, is superficially different but mathematically equivalent to the formulation in Large et al. (1994) (see Appendix C for details). Finally, we emphasize that the K-Profile Parameterization is deemed successful only if it accurately models the evolution of the entire observed temperature profile $\overline{T}(z, t)$, rather than, say, the boundary layer depth or the buoyancy jump across the base of the mixed layer.
The K-Profile Parameterization (KPP) represents F through the sum of a downgradient flux and a non-local flux term (Large et al., 1994),

$$F = -\,C_D\, w_\star h\; \varsigma(d)\; \partial_z T \;+\; \Phi \tag{11}$$

for −h ≤ z ≤ 0 and 0 otherwise, and $d \equiv -z/h$ is a nondimensional depth coordinate. Here $w_\star \equiv (Q_b h)^{1/3}$ is the convective turbulent velocity scale, h is the boundary layer depth, $\varsigma(d)$ is the “K-profile” shape function (K is the namesake downgradient diffusivity of the K-Profile Parameterization), and Φ, scaled by the free parameter CN, is a “non-local” flux term that models convective boundary layer fluxes not described by downgradient diffusion.
The KPP model estimates the boundary layer depth h using the nonlinear constraint (10). The boundary layer geometry introduced in the right panel of Figure 1 motivates the form of the nonlinear constraint. The jump in buoyancy, Δb, is the difference between the buoyancy in the mixed layer and at the base of the entrainment region. The buoyancy jump may thus be written in terms of the entrainment region thickness, Δh, and the entrainment region buoyancy gradient, $N_e^2$, as $\Delta b = N_e^2\, \Delta h$. Using the plume theory outlined in Appendix B to motivate the scaling $\Delta h \propto w_\star / N_e$, we thus find

$$\frac{\Delta b}{w_\star\, N_e} = C_H', \tag{12}$$

for some universal proportionality constant $C_H'$. KPP posits that the boundary layer depth h is the first such depth from the surface at which Equation 12 holds.

Large et al. (1994) estimate the mixed layer buoyancy with an average over the ‘surface layer’, $\langle b \rangle_{\mathrm{sl}} = \frac{1}{C_S h} \int_{-C_S h}^{0} \overline{b}\; \mathrm{d}z$, where 0 < CS < 1 is a free parameter that defines the fractional depth of the surface layer relative to the total boundary layer depth, h. The buoyancy jump becomes, therefore,

$$\Delta b = \langle b \rangle_{\mathrm{sl}} - \overline{b}(-h). \tag{13}$$

Large et al. (1994) then express the stratification in the entrainment region, Ne, in terms of the stratification at the base of the boundary layer, such that

$$N_e = C_e\, N(-h), \qquad \text{where } N^2(-h) \equiv \partial_z \overline{b}\,\big|_{z=-h}. \tag{14}$$

The scaling in Equation 14 introduces a new free parameter $C_e$ in addition to $C_H'$; however, because this free parameter is not independent from $C_H'$, we combine the two into a new free parameter $C_H \equiv C_H' C_e$, which we call the “mixing depth parameter.” To prevent division by zero, a small dimensional constant ($10^{-11}\ \mathrm{m^2\, s^{-2}}$) is added to the denominator of Equation 12 (Griffies et al., 2015). Combining Equations 12, 13, and 14, we can write

$$\frac{\langle b \rangle_{\mathrm{sl}} - \overline{b}(-h)}{w_\star\, N(-h)} = C_H. \tag{15}$$

Equation 15 is the implicit nonlinear constraint in Equation 10 that determines the boundary layer depth, h. In Appendix B we discuss the physical content of Equation 15 for the case of penetrative convection.

The boundary layer depth criterion in Equation 15 is often referred to as the bulk Richardson number criterion, because in mechanically forced turbulence the denominator is replaced by an estimate of the mean shear squared and CH becomes a critical bulk Richardson number (Large et al., 1994). In penetrative convection there is no mean shear, and CH is not a Richardson number. See Appendix C for more details.

The representation of penetrative convection in KPP has four free parameters: the surface layer fraction CS, the flux scalings CN and CD in Equation 11, and the mixing depth parameter CH in Equation 15. Ranges for their default values are reported in Large et al. (1994). We choose reference parameters within those ranges, denoted

$$\boldsymbol{C}_{\mathrm{ref}} = \left\{ C_S,\; C_N,\; C_D,\; C_H \right\}_{\mathrm{ref}}. \tag{16}$$

These parameters are not the original set of independent parameters proposed by Large et al. (1994), but rather algebraic combinations thereof. Nevertheless, we emphasize that our formulation is mathematically identical to that proposed by Large et al. (1994). The mapping between the current set of parameters and the original is one-to-one; hence, no information is lost in transforming from the current set of parameters to the original ones; see Appendix C for details. With regard to the numerical implementation, we do not use enhanced diffusivity as explained in the appendices of Large et al. (1994). Our objective is to calibrate the free parameters $\boldsymbol{C} = \{C_S, C_N, C_D, C_H\}$ by comparing KPP temperature profiles T(z, t; C) with the LES output $\overline{T}(z, t)$.
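To make the structure of Equations 9–15 concrete, here is a schematic single-column integration in Julia. It is a sketch under stated assumptions — a common cubic shape function, placeholder parameter values, a simplified form of the depth criterion, and helper names (`boundary_layer_depth`, `step!`) invented for illustration — and not the calibrated implementation used in this paper.

```julia
# Schematic single-column model in the spirit of Equations 9-15: explicit
# finite-volume steps of ∂ₜT = -∂z F with a K-profile flux and a bulk
# criterion for the boundary layer depth h. All values are placeholders.

α, g = 2e-4, 9.81                         # linear equation of state (assumed values)

shape(d) = 0 ≤ d ≤ 1 ? d * (1 - d)^2 : 0.0   # "K-profile" shape function ς(d)

# Diagnose h as the shallowest depth where Δb exceeds Ch w⋆ N (cf. Equation 15).
function boundary_layer_depth(T, z, Qᵇ, Cs, Ch)
    b = α * g .* T                        # buoyancy from temperature
    for i in length(z)-1:-1:2             # scan downward from the surface
        h = -z[i]
        js = findall(z .> -Cs * h)        # cells inside the surface layer
        bsl = isempty(js) ? b[end] : sum(b[js]) / length(js)  # surface-layer mean
        Δb = bsl - b[i]                   # buoyancy jump, Equation 13
        N = sqrt(max((b[i] - b[i-1]) / (z[i] - z[i-1]), 0.0))
        wstar = (Qᵇ * h)^(1/3)            # convective velocity scale
        Δb * h > Ch * (wstar * N * h + 1e-11) && return h
    end
    return -z[1]                          # criterion never met: whole column
end

function step!(T, z, dt, Qᵀ, Qᵇ, C)
    Cs, Cn, Cd, Ch = C
    dz = z[2] - z[1]
    h = boundary_layer_depth(T, z, Qᵇ, Cs, Ch)
    wstar = (Qᵇ * h)^(1/3)
    n = length(T)
    F = zeros(n + 1)                      # fluxes on cell faces; F[n+1] at surface
    for i in 2:n
        d = -(z[i] - dz / 2) / h          # nondimensional depth of face i
        K = Cd * wstar * h * shape(d)     # downgradient diffusivity, Equation 11
        Φ = Cn * Qᵀ * shape(d)            # nonlocal convective flux
        F[i] = -K * (T[i] - T[i-1]) / dz + Φ
    end
    F[n+1] = Qᵀ                           # imposed surface cooling
    for i in 1:n
        T[i] -= dt * (F[i+1] - F[i]) / dz # ∂ₜT = -∂z F, Equation 9
    end
    return h
end

# Usage: 100 m column, 1 m resolution, 100 W/m² cooling, placeholder parameters.
ρ₀, cₚ, dz = 1027.0, 3993.0, 1.0
z = collect(-100.0 + dz/2 : dz : -dz/2)   # cell centers, bottom to top
N² = 2e-5
T = @. 20.0 + N² * z / (α * g)            # linear initial temperature profile
Qᵀ = 100.0 / (ρ₀ * cₚ)                    # surface temperature flux [°C m s⁻¹]
Qᵇ = α * g * Qᵀ                           # buoyancy flux via Equation 2
C, h = (0.3, 1.0, 1.0, 1.0), 0.0          # (Cs, Cn, Cd, Ch) placeholders
for _ in 1:43_200                         # one day with dt = 2 s
    global h = step!(T, z, 2.0, Qᵀ, Qᵇ, C)
end
println("boundary layer depth after one day ≈ ", round(h, digits = 1), " m")
```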

3 Model Calibration Against LES Solutions

We outline a Bayesian method for optimizing and estimating the uncertainty of the four free parameters through a comparison of the parameterization solution for T(z, t; C) and the output $\overline{T}(z, t)$ of the LES simulations. First we introduce a loss function to quantify the parameterization-LES difference,

$$\mathcal{L}(\boldsymbol{C}) = \max_{t \in [t_1, t_2]} \int_{-L_z}^{0} \left[\, T(z, t; \boldsymbol{C}) - \overline{T}(z, t) \,\right]^2 \mathrm{d}z. \tag{17}$$

We choose the square error in space to reduce the sensitivity to vertical fluctuations in the temperature profile. We take the maximum value of the squared error in time for t ∈ [t1, t2] to guarantee that the temperature profile never deviates too far from the LES simulation at any instant in time. The parameterization is taken to be the KPP model given by Equations 9 through 15, and the data are the horizontally averaged LES output. The initial time t1 is chosen after the initial transition to turbulence of the LES simulations.
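In code, Equation 17 is a maximum over saved output times of a depth-integrated squared error. A minimal sketch follows; the midpoint-rule quadrature is an arbitrary choice, and the matrix layout is an assumption.

```julia
# Loss function of Equation 17: the worst-case (over times t ∈ [t₁, t₂])
# depth-integrated squared mismatch between the KPP profile T and the LES
# average T̄. Both are nz × ntimes matrices on the same grid with spacing dz.
function loss(T::AbstractMatrix, T̄::AbstractMatrix, dz)
    @assert size(T) == size(T̄)
    worst = -Inf
    for j in axes(T, 2)
        err = sum(abs2, T[:, j] .- T̄[:, j]) * dz   # ∫ (T - T̄)² dz, midpoint rule
        worst = max(worst, err)
    end
    return worst
end
```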

A natural way to extend the concept of loss functions to account for parameter uncertainty is to introduce a likelihood function for the parameters. Similar to how the form of the loss function is critical to the estimation of optimal parameters, the form of the likelihood function is critical for estimating the parameter uncertainties. The likelihood function quantifies what we mean by “good” or “bad” parameter choices. The Bayesian method uses this information to estimate parameter uncertainties. These estimates are only as good as the choice of likelihood function, much like optimal parameters are only as good as the choice of the loss function. See, for example, Morrison et al. (2020), Nadiga et al. (2019), Schneider et al. (2017), Sraj et al. (2016), Urrego-Blanco et al. (2016), Zedler et al. (2012), and van Lier-Walqui et al. (2012) for definitions of likelihoods in various geophysical/fluid dynamical contexts. In Appendix D we discuss in detail the rationale for the choices made in this paper.

Following Schneider et al. (2017) we introduce the likelihood function as the probability that parameter values explain the data $\mathcal{D}$, as

$$\mathbb{P}(\mathcal{D} \mid \boldsymbol{C}) \propto \exp\!\left( -\, \mathcal{L}(\boldsymbol{C}) / \sigma \right), \tag{18}$$

where $\mathcal{L}$ is the loss function, which depends both on data and parameters C, and σ is a hyperparameter associated with the likelihood function as opposed to a parameter in the parameterization. The posterior distribution, $\mathbb{P}(\boldsymbol{C} \mid \mathcal{D})$, is then given by Bayes' formula

$$\mathbb{P}(\boldsymbol{C} \mid \mathcal{D}) = \frac{\mathbb{P}(\mathcal{D} \mid \boldsymbol{C})\; \mathbb{P}(\boldsymbol{C})}{\mathbb{P}(\mathcal{D})}, \tag{19}$$

where $\mathbb{P}(\boldsymbol{C})$ is the prior distribution. In terms of probability densities, letting ρ0(C) and ρ(C) denote our prior and posterior distributions for the parameters C (the proportionality sign is introduced because Bayes' formula applies to probabilities, while ρ0(C) is a probability density function), Bayes' formula becomes

$$\rho(\boldsymbol{C}) \propto \exp\!\left( -\, \mathcal{L}(\boldsymbol{C}) / \sigma \right) \rho_0(\boldsymbol{C}). \tag{20}$$

In our context Bayes' formula updates prior guesses about KPP parameter values and yields a posterior distribution based on the LES data.

We choose the hyperparameter σ as the minimum of the loss function, $\mathcal{L}^\star \equiv \min_{\boldsymbol{C}} \mathcal{L}(\boldsymbol{C})$. The minimum is found using a modified simulated annealing procedure (in simulated annealing one finds the minimum of the loss function by decreasing σ to zero as one explores the parameter space through a random walk; here we keep updating σ to the new local minimum every time the random walk stumbles on a set of parameters for which $\mathcal{L}(\boldsymbol{C}) < \sigma$) (Kirkpatrick et al., 1983). Once the parameter values $\boldsymbol{C}^\star$ that minimize the loss function have been found, that is, $\mathcal{L}(\boldsymbol{C}^\star) = \mathcal{L}^\star$, the likelihood of any other parameter choice C1 relative to the optimum is given by

$$\frac{\rho(\boldsymbol{C}_1)}{\rho(\boldsymbol{C}^\star)} = \exp\!\left( -\, \frac{\mathcal{L}(\boldsymbol{C}_1) - \mathcal{L}^\star}{\mathcal{L}^\star} \right). \tag{21}$$

For example, if the choice C1 increases the minimum of the loss function by a factor of two, that is, $\mathcal{L}(\boldsymbol{C}_1) = 2 \mathcal{L}^\star$, then it is 1/e less likely. The probability distribution ρ(C) is then sampled with a Random Walk Markov Chain Monte Carlo (RW-MCMC) algorithm (Metropolis et al., 1953), described further in Appendix E.
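The sampler itself is a few lines of Metropolis. Below is a minimal Julia sketch that assumes a loss function like the one above, box bounds for the uniform prior, and a value of σ already in hand; the toy quadratic loss at the end merely stands in for the KPP-LES mismatch of Equation 17.

```julia
# Random Walk Metropolis sampling of ρ(C) ∝ exp(-L(C)/σ) ρ₀(C) with a uniform
# (box) prior, cf. Equations 18-21. `loss` is the loss function L, `lo`/`hi`
# the prior bounds, and `stepsize` the proposal scale in each direction.
function rw_mcmc(loss, C₀, lo, hi, σ; niters = 10^5,
                 stepsize = 0.05 .* (hi .- lo))
    C, ℓ = copy(C₀), loss(C₀)
    chain = Matrix{Float64}(undef, length(C₀), niters)
    for k in 1:niters
        C′ = C .+ stepsize .* randn(length(C))   # Gaussian random-walk proposal
        if all(lo .≤ C′ .≤ hi)                   # uniform prior: reject outside box
            ℓ′ = loss(C′)
            if rand() < exp(-(ℓ′ - ℓ) / σ)       # Metropolis acceptance
                C, ℓ = C′, ℓ′
            end
        end
        chain[:, k] .= C
    end
    return chain
end

# Toy usage: a quadratic loss standing in for the KPP-LES mismatch.
toy_loss(C) = sum(abs2, C .- [0.3, 4.0, 2.0, 1.5])
chain = rw_mcmc(toy_loss, [0.5, 2.0, 1.0, 1.0], zeros(4), [1.0, 8.0, 8.0, 4.0], 0.1)
```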

To illustrate our choices, as well as the RW-MCMC algorithm, we show a typical output from an RW-MCMC algorithm for a 2-D probability distribution in Figure 3. We use the probability density function for the KPP parameterization presented in the next section, but keep two of the four parameters fixed (CD and CH) to reduce the problem from four to two parameters (CN and CS). The prior distributions for CN and CS are uniform over the ranges reported at the end of this section. The parameters CD and CH are set to the values that minimize the loss function. We show results for two arbitrary values of σ for illustrative purposes. Starting from a poor initial guess, the RW-MCMC search proceeds towards regions of higher probability (lower loss function) by randomly choosing which direction to go. Once a region of high probability is found, in this case parameter values in the “blue” region, the parameters hover around the minimum of the loss function, as suggested by the high values of the likelihood function. The orange hexagons represent the process of randomly walking towards the minimum of the loss function and correspond to the “burn-in” period. The burn-in period is often thrown away when calculating statistics, since it corresponds to an initial transient before the RW-MCMC settles around the maximum of the likelihood function. We see that the choice of σ does not change the overall structure of the probability distribution but does affect how far from the optimal parameters the random walk is allowed to drift.

Figure 3. An example of an RW-MCMC search trajectory based on a sample probability distribution for KPP parameters using 10⁵ RW-MCMC iterations. The trajectory starts from a region of very low probability (white areas) and moves toward progressively higher probabilities (the darker the blue shading, the higher the probability). The blue probability distributions on the left side and the top are the corresponding marginal distributions of CN and CS, respectively. The green star is the best known optimum of the probability distribution (i.e., the mode of the probability distribution). The value of $\mathcal{L}^\star$ is the value of the loss function at the green star.

Parameterizations such as KPP exhibit a dependence on resolution in addition to their nondimensional parameters. Here we perform all calculations with a vertical resolution Δz and timestep Δt representative of those used in state-of-the-art ESMs. We do not use enhanced diffusivity as in Large et al. (1994) for this resolution. The parameterization is relatively insensitive to halving Δz and Δt for a fixed set of parameters, but the results are sensitive to doubling either one. Thus, the optimal parameter values and their uncertainties are only appropriate for the resolution used for the calibration and would need to be updated, especially if the parameterization were run at a coarser resolution. This dependence on resolution could be handled within the Bayesian method by introducing Δz and Δt as additional parameters in the probability distribution, but we do not pursue this approach.

The temporal window used to compute the loss function extends from t1, chosen to eliminate initial transients in the LES, to the final simulation day, chosen to be when h ≈ 70 m. We apply the Bayesian parameter estimation procedure to KPP using data from one LES simulation in section 3.1 and from multiple LES simulations using different initial stratifications in section 3.2. We use uniform prior distributions for the KPP parameters over the ranges

$$C_S \in (0, 1), \qquad C_N \in (0, C_N^{\max}), \qquad C_D \in (0, C_D^{\max}), \qquad C_H \in (0, C_H^{\max}). \tag{22}$$

The surface layer fraction CS, being a fraction, must stay between zero and one. The other parameter limits are chosen to span the whole range of physically plausible values around the reference values given in Equation 16. The choice of uniform distributions is made to avoid favoring any particular value at the outset.

3.1 Calibration of KPP Parameters From One LES Simulation

In this section we apply the Bayesian calibration method to the LES simulation of penetrative convection described in section 2.1 and quantify uncertainties in the KPP parameters introduced in section 2.2. The horizontal averages from the LES simulations are compared with predictions from solutions of the KPP boundary layer scheme, Equations 9 and 10. The boundary and initial conditions for KPP are taken to be the same as those for the LES simulation, that is, 100 W/m² cooling at the top, a fixed temperature gradient $\partial_z T = N^2 / (\alpha g)$ °C m⁻¹ at the bottom, and an initial profile $T(z, 0) = T_0 + N^2 z / (\alpha g)$.

To estimate the full probability distribution function, we use the RW-MCMC algorithm with 10⁶ iterations to sample the probability distributions of the four KPP parameters (CS, CN, CD, CH). The large number of forward runs is possible because the forward model consists of a one-dimensional equation, namely, KPP in single column mode. The Markov chain leads to roughly 10⁴ statistically independent samples, as estimated using an autocorrelation length; see Sokal (1997). The RW-MCMC algorithm generates the entire four-dimensional PDF, Equation 18.
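The effective number of independent samples quoted above can be estimated from the integrated autocorrelation time of each parameter's trace (Sokal, 1997); a minimal sketch:

```julia
# Effective sample size N/(2τ) from the integrated autocorrelation time τ,
# with Sokal's adaptive truncation: sum lags until lag ≥ c·τ (c ≈ 5).
function autocor_at(x, lag)
    n = length(x)
    μ = sum(x) / n
    v = sum(abs2, x .- μ) / n
    return sum((x[1+lag:n] .- μ) .* (x[1:n-lag] .- μ)) / (n * v)
end

function effective_sample_size(x; c = 5)
    τ = 0.5
    for lag in 1:length(x)-1
        τ += autocor_at(x, lag)
        lag ≥ c * τ && break          # truncate the noisy tail of the sum
    end
    return length(x) / (2τ)
end

# e.g., effective_sample_size(chain[4, :]) for the C_H trace of a sampler run
```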

The parameter probability distribution can be used to choose an optimal set of KPP parameters. Of the many choices, we pick the most probable value of the four-dimensional probability distribution, the mode, because it minimizes the loss function; see Appendix D for the detailed calculation. In Figure 4a we show the horizontally averaged temperature profile from the LES simulation (continuous line) and the temperature profiles obtained running the KPP parameterization with reference and optimal parameters (squares and dots) at t = 8 days. The optimized temperature profiles are more similar to the LES simulation than the reference profiles, especially in the entrainment region. Figure 4b confirms that the square root of the instantaneous loss function, the error, grows much faster with the reference parameters. The oscillations in the error are a consequence of the coarseness of the KPP model: only one grid point is being entrained at any given moment.

Figure 4. KPP and horizontally averaged LES temperature profiles for different point estimates of parameters at t = 8 days, as well as the error in time. In the left plot, the blue squares correspond to reference parameter choices, the red circles correspond to the optimized parameterization (using the mode of the probability distribution), and the blue line to the horizontally averaged LES solution, all at time t = 8 days. On the right plot we show the instantaneous error at each moment in time. We see that the “optimal” parameter does indeed reduce the bias over the time period. The loss function is the largest squared error over the time interval.

The improvement in boundary layer depth through optimization of the parameters is about 10%, or 10 m over 8 days. As discussed in section 2.1, the rate of deepening can be predicted analytically within 20% by simply integrating the buoyancy budget over time and depth and assuming that the boundary layer is well mixed everywhere, that is, ignoring the development of enhanced stratification within an entrainment layer at the base of the mixed layer. KPP improves on this prediction by including a parameterization for the entrainment layer. The reference KPP parameters contribute a 10% improvement on the no entrainment layer prediction, and the optimized parameters contribute another 10%. While these may seem like modest improvements, they can prevent large biases for the boundary layer depth when integrated over a few months of cooling in winter rather than just 8 days. We will return to this point in the next section when we discuss structural deficiencies in the KPP formulation.

To visualize the probability distribution we focus on 2-D marginal distributions, for example,

$$\rho(C_S, C_N) = \int \rho(\boldsymbol{C}) \;\mathrm{d}C_D\, \mathrm{d}C_H, \tag{23}$$

along with the other five possible pairings, as well as the 1-D marginal distributions such as

$$\rho(C_S) = \int \rho(\boldsymbol{C}) \;\mathrm{d}C_N\, \mathrm{d}C_D\, \mathrm{d}C_H, \tag{24}$$

and similarly for the other three parameters.

The marginal distribution can intuitively be thought of as the distribution of a parameter (or pair of parameters) while taking into account the total uncertainty of the other parameters. Furthermore, the marginal distribution takes into account potential compensating effects that different parameters may have on one another. The marginal distribution does not capture the effect of individually varying a parameter while keeping all the other parameters fixed at a particular value (that is, unless the other parameters have essentially delta function 1-D marginal distributions); that is an effect represented by a conditional distribution.

Constructing the marginal distributions requires nothing more than histograms of the trajectories generated by the RW-MCMC algorithm. The 2-D marginal distributions are visualized with heatmaps in Figure 5, and the 1-D marginal distributions of the corresponding parameters are shown along the outermost edges. For the 2-D marginal distributions, the dark blue regions correspond to regions of high probability, and the light blue regions are regions of low probability. The white space corresponds to regions that the RW-MCMC algorithm never visited. The 2-D marginal distributions show that parameters must be changed in tandem with one another in order to produce a similar model output. Furthermore, their structure is distinctly non-Gaussian.
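Concretely, a histogram sketch of Equations 23 and 24: integrating out parameters amounts to ignoring their rows of the chain. Bin counts and the matrix layout are arbitrary choices for illustration.

```julia
# 1-D and 2-D marginal distributions (Equations 23 and 24) as normalized
# histograms of the chain. `samples` is an nparams × niters matrix.
function marginal_1d(samples, i; nbins = 50)
    x = samples[i, :]
    lo, hi = extrema(x)
    counts = zeros(nbins)
    for v in x
        counts[min(nbins, 1 + floor(Int, (v - lo) / (hi - lo) * nbins))] += 1
    end
    width = (hi - lo) / nbins
    return counts ./ (length(x) * width)          # normalized density
end

function marginal_2d(samples, i, j; nbins = 50)
    x, y = samples[i, :], samples[j, :]
    (xlo, xhi), (ylo, yhi) = extrema(x), extrema(y)
    counts = zeros(nbins, nbins)
    for k in eachindex(x)
        bx = min(nbins, 1 + floor(Int, (x[k] - xlo) / (xhi - xlo) * nbins))
        by = min(nbins, 1 + floor(Int, (y[k] - ylo) / (yhi - ylo) * nbins))
        counts[bx, by] += 1
    end
    return counts ./ length(x)                    # heatmap weights, as in Figure 5
end
```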

Figure 5. Marginal distributions for KPP parameters. The dark blue regions correspond to regions of high probability and the light blue regions are regions of low probability. The white space corresponds to regions that the RW-MCMC algorithm never visited. The corresponding 1-D marginal distributions (corresponding to integrals of the 2-D marginal distributions) are displayed on the left and on top of the plots for reference.

The 1-D marginal distribution of the mixing depth parameter CH (the bottom left rectangular panel) is much more compact than that of the other three parameters suggesting that it is the most sensitive parameter. The mixing depth parameter's importance stems from its control over both the buoyancy jump across the entrainment layer and the rate-of-deepening of the boundary layer. (Again it may be useful to remember that CH is often referred to as the bulk Richardson number in the KPP literature, even though it takes a different meaning in convective simulations, see Appendix C.) The parameters CD and CN set the magnitude of the local and nonlocal fluxes. Results are not sensitive to their specific values, as long as they are large enough to maintain a well-mixed layer. The value of the surface layer fraction CS is peaked at lower values but is less sensitive to variations than CD or CH.

The uncertainties of the parameters can be used to infer the uncertainties of the temperature profile at each depth and time, predicted by KPP. To do this, we subsample the 10⁶ parameter values down to 10⁴ and evolve KPP forward in time for each set of parameter choices. We construct histograms for the temperature field at the final time for each location in space individually. We then stack these histograms to create a visual representation of the model uncertainty. This uncertainty quantifies the sensitivity of the parameterization with respect to parameter perturbations as defined by the parameter distributions.
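A sketch of this forward propagation, assuming a hypothetical single-column integrator `run_kpp(C)` that returns the final temperature profile:

```julia
# Propagate parameter uncertainty through the parameterization: thin the chain,
# run the forward model for each draw, and histogram the final temperature at
# every depth (one histogram per grid point, stacked as in Figure 6).
function propagate(chain, run_kpp; nsub = 10^4)
    idx = round.(Int, range(1, size(chain, 2); length = nsub))  # thinned subsample
    profiles = [run_kpp(chain[:, k]) for k in idx]              # forward runs
    return reduce(hcat, profiles)   # nz × nsub matrix; histogram each row
end
```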

The histogram of temperature profiles at time t = 8 days as calculated by both our prior distribution (uniform distribution) and the posterior distribution (as obtained from the RW-MCMC algorithm) is visualized in Figure 6. We see that there is a reduction of the uncertainty in the temperature profile upon taking into account information gained from the LES simulation. The salient features of the posterior distribution temperature uncertainty are as follows:
  1. 0–10 m depth: There is some uncertainty associated with the vertical profile of temperature close to the surface.
  2. 20–60 m depth: The mean profile of temperature in the mixed layer is very well predicted by KPP.
  3. 60–70 m depth: The entrainment region contains the largest uncertainties.
  4. 70–100 m depth: There is virtually no uncertainty. The unstratified region below the boundary layer does not change from its initial value.
Figure 6. Uncertainty propagation of the temperature profile with respect to the prior and posterior probability distributions. The use of probability distributions for parameters has the consequence that the temperature field is no longer a point estimate, but rather a probability distribution at each moment in space and time. By sampling from the parameter probability distributions and evolving the parameterization forward in time, we obtain a succinct representation of what it means to “fiddle” with parameters. The legend on the right shows what the colors correspond to in terms of the base-10 logarithm of the probability distributions.

Now that we have applied the Bayesian methodology to one LES simulation and explored its implications, we are ready to apply the method to multiple LES simulations covering different regimes in the following section.

3.2 Calibration of KPP Parameters From Multiple LES Simulations

We now use our Bayesian framework to explore possible sources of bias in the KPP model. To this end we investigate what happens when we change the initial stratification in penetrative convection simulations. This is motivated by recent work on boundary layer depth biases in the Southern Ocean (DuVivier et al., 2018; Large et al., 2019). In those studies, KPP failed to simulate deep boundary layers in winter when the subsurface summer stratification was strong.

We perform 32 large eddy simulations and calculate parameter distributions for each case. In the previous section we saw that CH is the most sensitive parameter. Thus, our focus now will be on the optimization and uncertainty quantification of CH; in the background, however, we estimate all parameters. We keep the surface cooling constant at 100 W/m² for all regimes and only vary the initial stratification. The integration was stopped when the boundary layer depth filled about 70% of the domain in each simulation. We used 128³ grid points in the LES, with 0.8 m resolution in each direction. (Although the parameter estimates will vary upon using a lower LES resolution, the qualitative trends are expected to be robust.) We use a lower resolution for the LES in these trend studies than in the previous section, but results were not sensitive to this change. In the Bayesian inference, each one of the probability distributions was calculated with 10⁵ iterations of RW-MCMC, leading to an effective sample size on the order of 10³. The stratifications ranged from N² ≈ 1 × 10⁻⁶ to N² ≈ 3.3 × 10⁻⁵ s⁻².

We find, as visualized in Figure 7, that CH is not constant but depends on the background stratification, N2. The blue dots are the median values of the probability distributions, and the stars are the modes (minimum of the loss function). The error bars correspond to 90% probability intervals, meaning that 90% of parameter values fall between the error bars. The large discrepancy between the median and the mode is due to the mode being the optimal value of the entire four-dimensional distribution whereas the median only corresponds to the marginal distribution. The reference KPP value is plotted as a dashed line.

Figure 7. Mixing depth parameter optimized across various background stratifications. The dots are the median values, the stars are the modes, and the error bars correspond to 90% probability intervals. The horizontal dashed line is the default value of the mixing depth parameter, for reference. Here one can see that the mixing depth parameter, when estimated across various regimes, produces different results. This is a signature of a systematic bias in the parameterization.

The median values and optimal values increase monotonically with the initial stratification, revealing a systematic bias. Furthermore, it exposes where the systematic bias comes from: no single value of CH in Equation 15 can correctly reproduce the deepening of the boundary layer for all initial stratifications. This suggests that the scaling law for the boundary layer depth criterion is incommensurate with the LES data.

The failure of Equation 15 can be understood by going back to the buoyancy budget in Equation 7. Using the KPP estimate for the buoyancy jump across the entrainment layer,

$$\Delta b = C_H\, w_\star\, N_h, \tag{25}$$

and introducing $N_h$ for the stratification at the base of the entrainment layer to distinguish it from the interior stratification N², we find that the boundary layer depth criterion, Equation 15, implies

$$\Delta b = C_H\, (Q_b\, h)^{1/3}\, N_h. \tag{26}$$

Substituting this expression in the buoyancy budget, Equation 7, one obtains an implicit equation for the evolution of the boundary layer depth h,

$$\frac{N^2 h^2}{2} \;-\; C_H\, (Q_b\, h)^{1/3}\, N_h \left( h - \frac{\Delta h}{2} \right) = Q_b\, t. \tag{27}$$

The LES simulation described in section 2.1, and many previous studies of penetrative convection, for example, Deardorff et al. (1980) and Van Roekel et al. (2018), show that the boundary layer depth grows as $h \propto \sqrt{Q_b\, t / N^2}$. To be consistent, Nh would have to scale as h^(2/3), but this is not observed in the LES simulations nor supported by theory. This suggests that we must modify the formulation of the boundary layer depth criterion, as we now describe.
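Spelling out the scaling argument, using Equation 27 and neglecting Δh/2 relative to h:

```latex
% Insert h ∝ √t into Equation 27. The surface term and the right-hand side
% both scale as h², so the entrainment term must as well:
\[
\underbrace{\tfrac{1}{2} N^2 h^2}_{\sim\, h^2}
\;-\; C_H \underbrace{(Q_b h)^{1/3}\, N_h\, h}_{\sim\, h^{4/3} N_h}
\;=\; \underbrace{Q_b\, t}_{\sim\, h^2}
\quad \Longrightarrow \quad
N_h \sim h^{2/3}.
\]
```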

3.3 Modification of the KPP Parameterization to Reduce Biases

From the multi-regime study of the previous section we found that there is no optimal KPP mixing depth parameter CH that works for arbitrary initial stratification. This prompted us to look for an alternative formulation of the depth criterion which satisfies the well-known empirical result that the boundary layer depth deepens at a rate

$$\frac{\mathrm{d}}{\mathrm{d}t}\, h^2 = c\, \frac{Q_b}{N^2}, \tag{28}$$

where c is a dimensionless constant found to be close to 3.0 with the LES simulation in section 2.1. Furthermore, c was found to be close to 3.0 across all the numerical experiments from section 3.2. Substituting this expression into the buoyancy budget, Equation 7, we find that

$$\Delta b = \left( \frac{1}{2} - \frac{1}{c} \right) N^2\, h. \tag{29}$$

This expression can then be used as a new boundary layer depth criterion to replace Equation 15,

$$\frac{\Delta b}{N^2\, h} = C, \tag{30}$$

where C replaces CH as the dimensionless parameter whose value sets the boundary layer depth. The value of N² here is the background stratification. Based on Equation 29 and our LES data, we expect

$$C = \frac{1}{2} - \frac{1}{c} \approx \frac{1}{6}. \tag{31}$$

Equation 30 is an implicit equation for h which guarantees that Equation 28 holds.

We now repeat the model calibration of section 3.2, but using this new boundary layer depth criterion, to test whether there is an optimal value of C that is independent of the initial stratification. We estimate all KPP parameters and show the new mixing depth parameter for simulations with different initial stratifications in Figure 8. Encouragingly, there is no obvious trend in the optimal values of C, and the error bars overlap for all cases. This supports the new criterion in the sense that parameters estimated in different regimes are now consistent with one another. The uncertainties in C translate into an uncertainty in the boundary layer depth prediction. In particular, values between 0.05 ≤ C ≤ 0.2 imply a boundary layer depth growth in the range $\sqrt{2.2\, Q_b t / N^2} \lesssim h \lesssim \sqrt{3.3\, Q_b t / N^2}$.
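This range follows from Equations 28–31, since C = 1/2 − 1/c implies c = 1/(1/2 − C); a one-line check:

```julia
# Map the modified mixing depth parameter C to the growth factor c in
# h = √(c Q_b t / N²), using C = 1/2 - 1/c from Equations 29-31.
c_of(C) = 1 / (1/2 - C)
println(c_of(0.05))   # ≈ 2.22
println(c_of(1/6))    # = 3.0, the LES best fit
println(c_of(0.2))    # ≈ 3.33
```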

Figure 8. The modified mixing depth parameter optimized across various background stratifications. The dots are the median values, the stars are the modes, and the error bars correspond to 90% probability intervals. The dashed line corresponds to 1/6, the theoretical expectation based on Equation 31. This is similar to Figure 7, but using the modification from section 3.3. Here one can see that the mixing depth parameter, when estimated across various regimes, produces similar results. This is a desirable feature in a parameterization.

Additionally, one can check whether the constants estimated following the methodology of section 3 are consistent with an independent measure diagnosed directly from the LES simulations. In particular, the LES simulations suggest that C ≃ 1/6, as per Equation 31. From Figure 8 we see that the optimal C is smaller than 1/6 (the dashed black line), and the value 1/6 is not within the confidence intervals for many of the simulations. There are several potential reasons for the discrepancy, for example, the neglect of curvature in the buoyancy budget (since we assumed a piecewise-linear buoyancy profile) or the finite resolution of the parameterization. Perhaps the most likely explanation is the difference in how the boundary layer depth was diagnosed in the LES, which need not have the same meaning as the one in KPP. A different definition in the LES simulation, such as the depth of maximum stratification, would yield a different scaling law, but one still proportional to $\sqrt{Q_b\, t / N^2}$. Whatever the choice, the Bayesian parameter estimation bypasses these ambiguities and inconsistencies by direct comparison with the entire horizontally averaged temperature profile from the LES.

We do not explore other modifications to the boundary layer depth criterion, as this would greatly expand the scope of this article. Furthermore, biases in KPP are not limited to the cases explored here; see Van Roekel et al. (2018) for discussions and remedies. The criterion described in this section assumes a constant initial stratification and a constant surface heat loss, which leads to the $h \propto \sqrt{t}$ growth of the boundary layer depth. It would be interesting to extend the criterion to arbitrary initial stratification and variable surface heat fluxes, not to mention the interaction with wind-driven mixing. The goal here is not to derive a new parameterization, but rather to illustrate and argue for a Bayesian methodology in the refinement and assessment of parameterizations.

4 Discussion

We presented a Bayesian approach to assess the skill of the K-Profile Parameterization (KPP) for turbulent convection triggered by surface cooling in an initially stably stratified ocean. The KPP model for this physical setting consists of a one-dimensional model with an algebraic constraint for the mixing-layer depth together with four non-dimensional parameters. These parameters consisted of an algebraic reorganization of the original KPP parameters so that terms in the equations could be associated with choices of parameters. Parameters were estimated by reducing the mismatch between the vertical buoyancy profile predicted by KPP and the area-averaged buoyancy profile simulated with a three-dimensional LES code for the same initial conditions and surface forcing. Using Bayes' formula we further estimated the full joint probability distribution of the four parameters. Furthermore, the probability distribution was used to quantify interdependencies among parameters and their uncertainty around the optimal values.

Repeating the Bayesian parameter optimization and uncertainty quantification for different initial stratifications, we found that no unique set of parameters could capture the deepening of convection in all cases. This implied that the KPP formulation does not capture the dependence of convection on the initial stratification, even in the simple test case considered here: constant surface cooling, constant initial stratification, no wind, and no background flow. The parameter that required re-tuning for each case was the one associated with the boundary layer depth criterion, thereby suggesting that this criterion has the wrong functional dependence on stratification. We thus reformulated the boundary layer depth criterion to capture the semi-analytical result, supported by the LES simulations, that the boundary layer deepens as the square root of time when the initial stratification is constant. The new formulation was vindicated when the Bayesian approach found a set of parameters that captured the evolution of the boundary layer, as compared to the LES simulations, for all initial stratifications. In this way, the Bayesian methodology allowed us to identify and remove a bias in the KPP formulation.

The methodology outlined here could be applied just as easily to other parameterizations of boundary layer turbulence, such as those reviewed in CVMix (Griffies et al., 2015), Pacanowski and Philander (1981), Mellor and Yamada (1982), Price et al. (1986), and Kantha and Clayson (1994). The inclusion of additional physics, such as wind-driven mixing and its interaction with convection, should also be amenable to the techniques described in this manuscript. Our experience is that progress is faster if one starts with simple, idealized setups, like the ones considered here, and then moves to progressively more realistic ones that account for variable stratification and surface heat fluxes, wind-stress forcing, background shear, surface waves, and so forth. The Bayesian method would then provide a rigorous evaluation of parameter uncertainty, parameter dependencies, and biases in the formulation of the parameterization.

Ultimately, our hope is that parameter probability distributions estimated in local regimes will serve as useful prior information for the calibration and tuning of Earth System Models (ESMs). Local simulations of turbulence must be carefully designed and incorporate suites of subgrid-scale processes that have leading-order impact on global ocean dynamics: surface and bottom boundary layer turbulence, surface wave effects, deep convection, mesoscale and submesoscale turbulence, and so forth. Bayesian calibration of the parameterization for each subgrid-scale process will then result in probability distributions for all the nondimensional parameters associated with the parameterizations. These distributions can then be used as prior information for the reasonable range of values that each parameter can take when the parameterizations are implemented in an ESM.

With regard to calibration of ESMs, the parameterizations of different subgrid-scale processes may nonlinearly interact with each other and with the resolved physics. Additional calibration is then required for the full ESM. Presently, this is achieved by perturbing the parameters within plausible ranges (Mauritsen et al., 2012; Schmidt et al., 2017). The Bayesian approach provides an objective way to determine a plausible range. The same algorithm cannot be used to calibrate the ESM, because the methodologies described here are not computationally feasible when applied to larger systems. Promising approaches to address this challenge through the use of surrogate models are described in Sraj et al. (2016) and Urrego-Blanco et al. (2016). Such models bring internal sources of uncertainty, and it is not clear to what extent one can trust a surrogate of a full ESM. One potential way to address this additional challenge is the Calibrate, Emulate, and Sample (CES) approach outlined in Cleary et al. (2020), where the surrogate model's uncertainty is estimated through the use of Gaussian processes and included as part of a consistent Bayesian calibration procedure.

Should the global problem still exhibit significant biases, even when all available prior information about parameterizations and about global data is leveraged using emulators or traditional methods of tuning, then one would have to conclude that there is a fundamental deficiency in our understanding of how the different components of the climate system interact with one another, or that the models do not include some key process. For example, Rye et al. (2020) argue that glacial melt might be one such process currently missing from ESMs. The advantage of the systematic calibration approach outlined here is that it allows us to quantify uncertainty in ESM projections and to identify the sources of that uncertainty.

Acknowledgments

The authors would like to thank Carl Wunsch, Tapio Schneider, Andrew Stuart, and William Large for numerous illuminating discussions. We would also like to thank the reviewers for their helpful suggestions on the manuscript. Our work is supported by the generosity of Eric and Wendy Schmidt by recommendation of the Schmidt Futures program, and by the National Science Foundation under grant AGS-6939393.

Appendix A: Oceananigans.jl

Oceananigans.jl (Ramadhan et al., 2020) is open source software for ocean process studies written in the Julia programming language (Besard, Churavy, et al., 2019; Bezanson et al., 2017). For the large eddy simulations (LESs) reported in this paper, Oceananigans.jl is configured to solve the spatially filtered, incompressible Boussinesq equations with a temperature tracer. Letting $\boldsymbol{u}$ be the three-dimensional, spatially filtered velocity field, $\theta$ be the conservative temperature, $p$ be the kinematic pressure, $f$ be the Coriolis parameter, and $\boldsymbol{\tau}$ and $\boldsymbol{q}$ be the stress tensor and temperature flux due to subfilter turbulent diffusion, the equations of motion are

$$\partial_t \boldsymbol{u} + \left(\boldsymbol{u}\cdot\boldsymbol{\nabla}\right)\boldsymbol{u} + f\,\hat{\boldsymbol{z}}\times\boldsymbol{u} = -\boldsymbol{\nabla}p + b\,\hat{\boldsymbol{z}} - \boldsymbol{\nabla}\cdot\boldsymbol{\tau}, \tag{A1}$$

$$\partial_t \theta + \boldsymbol{u}\cdot\boldsymbol{\nabla}\theta = -\boldsymbol{\nabla}\cdot\boldsymbol{q}, \tag{A2}$$

$$\boldsymbol{\nabla}\cdot\boldsymbol{u} = 0. \tag{A3}$$

The buoyancy $b$ appearing in Equation A1 is related to conservative temperature by a linear equation of state,

$$b = g\,\alpha\left(\theta - \theta_0\right), \tag{A4}$$

where $\theta_0$ is a reference temperature, $\alpha$ is the thermal expansion coefficient, and $g$ is gravitational acceleration at the Earth's surface.
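In code, the linear equation of state in Equation A4 is a one-line function. The following sketch is ours, and the default values of $\alpha$, $g$, and $\theta_0$ are illustrative placeholders rather than the settings used in the simulations:

```julia
# Linear equation of state b = g α (θ − θ₀), Equation A4.
# Defaults are placeholder values for illustration only.
buoyancy(θ; α = 2e-4, g = 9.81, θ₀ = 20.0) = g * α * (θ - θ₀)
```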

A1 Subfilter Stress and Temperature Flux

The subfilter stress and temperature fluxes are modeled with downgradient closures, such that

$$\boldsymbol{\tau} = -2\,\nu_e\,\boldsymbol{\Sigma}, \qquad \boldsymbol{q} = -\kappa_e\,\boldsymbol{\nabla}\theta, \tag{A5}$$

where $\boldsymbol{\Sigma} = \tfrac{1}{2}\left[\boldsymbol{\nabla}\boldsymbol{u} + \left(\boldsymbol{\nabla}\boldsymbol{u}\right)^{\mathrm{T}}\right]$ is the strain rate tensor and $\nu_e$ and $\kappa_e$ are the eddy viscosity and eddy diffusivity of conservative temperature. The eddy viscosity $\nu_e$ and eddy diffusivity $\kappa_e$ in Equation A5 are modeled with the anisotropic minimum dissipation (AMD) formalism introduced by Rozema et al. (2015) and Abkar et al. (2016), refined by Verstappen (2018), and validated and described in detail for ocean-relevant scenarios by Vreugdenhil and Taylor (2018). AMD is simple to implement, accurate on anisotropic grids (Vreugdenhil & Taylor, 2018), and relatively insensitive to resolution (Abkar et al., 2016).
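For orientation, a minimal Oceananigans.jl configuration with the AMD closure looks roughly as follows. This sketch is ours, not the paper's setup script: the grid size, domain extent, and the use of a generic buoyancy tracer are illustrative placeholders, and the constructor names follow a recent Oceananigans.jl release, which may differ from the version used for the simulations reported here.

```julia
using Oceananigans

# A 128³ grid (horizontally periodic, vertically bounded by default) and a
# nonhydrostatic model closed with the AMD subfilter model of Equation A5.
grid = RectilinearGrid(size = (128, 128, 128), extent = (100, 100, 100))

model = NonhydrostaticModel(; grid,
    closure  = AnisotropicMinimumDissipation(),  # νₑ and κₑ from AMD
    buoyancy = BuoyancyTracer(),                 # evolve buoyancy directly, for brevity
    tracers  = :b)
```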

A2 Numerical Methods

To solve Equations A1-A3 with the subfilter model in Equation A5, we use the software package Oceananigans.jl, written in the high-level Julia programming language to run on graphics processing units (GPUs) (Besard, Churavy, et al., 2019; Besard, Foket, et al., 2019; Bezanson et al., 2017). Oceananigans.jl uses a staggered C-grid finite volume spatial discretization (Arakawa & Lamb, 1977) with centered second-order differences to compute the advection and diffusion terms in Equations A1 and A2, a pressure projection method to ensure the incompressibility of $\boldsymbol{u}$, a fast Fourier-transform-based eigenfunction expansion of the discrete second-order Poisson operator to solve the discrete pressure Poisson equation on a regular grid (Schumann & Sweet, 1988), and second-order explicit Adams-Bashforth time stepping. For more information about the staggered C-grid discretization and second-order Adams-Bashforth time stepping, see section 3 of Marshall et al. (1997) and references therein. The code and documentation are available at https://github.com/CliMA/Oceananigans.jl.
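For reference, the second-order Adams-Bashforth update of the explicit tendencies takes the familiar two-level form sketched below. This is a generic illustration, ours rather than the package's internals, which interleave this update with the pressure projection:

```julia
# One second-order Adams-Bashforth step for du/dt = F(u):
#   uⁿ⁺¹ = uⁿ + Δt [ (3/2) Fⁿ − (1/2) Fⁿ⁻¹ ]
ab2_step(u, F_now, F_prev, Δt) = u .+ Δt .* (1.5 .* F_now .- 0.5 .* F_prev)
```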

Appendix B: Parcel Theory Derivation for the KPP Boundary Layer Depth Criterion

Here we summarize the derivation of the KPP boundary layer depth criterion for penetrative convection, because we could not find a succinct description in the published literature. Following Deardorff et al. (1980) we consider the vertical momentum equation for a parcel punching through the entrainment layer,

$$\frac{1}{2}\frac{\mathrm{d} w^2}{\mathrm{d} z} = b - \bar{b}(z), \tag{B1}$$

where $b$ is the buoyancy of the parcel, assumed to be equal to the mixed layer value, and $\bar{b}(z)$ is the area mean buoyancy profile in the entrainment layer. This equation holds if the area occupied by sinking plumes is small compared to the total area, so that $\bar{b}(z)$ is a good proxy for the buoyancy in the environment around the plumes and $b - \bar{b}(z)$ represents the buoyancy force experienced by the parcel. The parcel velocity decelerates from $w \equiv w_e$ at the mixed layer depth ($z = -h$) to zero at the boundary layer depth ($z = -(h + \Delta h)$) where turbulence vanishes. Assuming that the background stratification $N^2$ is approximately constant in the entrainment layer, we also have $\bar{b}(z) = b + N^2(z + h)$. The momentum equation can then be integrated from $z = -h$ to $z = -(h + \Delta h)$,

$$\frac{1}{2} w_e^2 = \frac{1}{2} N^2 \Delta h^2. \tag{B2}$$

Introducing $\Delta b$ as the difference between the environment buoyancy in the mixed layer and that at the base of the entrainment layer, we have $\Delta b = N^2 \Delta h$, and hence,

$$w_e^2 = \Delta b\,\Delta h, \tag{B3}$$

and Deardorff et al. (1980) assume that $w_e \propto w(-h + \Delta h)$. The criterion for diagnosing the boundary layer depth follows from this relationship; $h$ is defined as the first depth $z$ below the ocean surface where

$$\frac{|z|\,\Delta b(z)}{w(z)^2} = C_H, \tag{B4}$$

for some universal constant $C_H$. In the main text we show that this scaling fails to predict the rate of deepening of the boundary layer depth in LES simulations. Further analysis, not reported here, shows that this failure stems from relationship (B3), which is not supported by the simulations.
Equation B4 is often referred to as a critical Richardson number criterion, which may seem odd given that no Richardson number appears in the expression. This is best understood by extending the criterion to the case where there is momentum shear in the boundary layer, typically induced by mechanical stresses, such that in addition to the buoyancy jump $\Delta b(z)$ there is also a momentum jump $\Delta u(z)$ across the entrainment layer. The entrainment layer base is then found where the Richardson number matches a critical value $\mathrm{Ri}_c$,

$$\frac{|z|\,\Delta b(z)}{\left|\Delta u(z)\right|^2 + C_H\,w(z)^2/\mathrm{Ri}_c} = \mathrm{Ri}_c. \tag{B5}$$

The rationale behind this extended criterion can be found in Large et al. (1994). For the purely convective limit $\Delta u(z) = 0$ and the dependence on $\mathrm{Ri}_c$ drops out.
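To make the criterion concrete, the following sketch (ours, not the paper's code) diagnoses $h$ from a discrete horizontally averaged buoyancy profile using Equation B4; the turbulent velocity scale `w_scale` and the constant `C_H` are supplied by the parameterization and are placeholders here.

```julia
# Sketch: diagnose the boundary layer depth h as the shallowest depth where the
# bulk criterion |z| Δb(z) / w(z)² exceeds the constant C_H (Equation B4).
# `b` holds the horizontally averaged buoyancy at cell centers `z` (z < 0,
# ordered from the surface downward); `w_scale(z)` is a user-supplied velocity
# scale. Both are illustrative placeholders.
function boundary_layer_depth(z, b, w_scale, C_H)
    b_surface = b[1]                  # near-surface (mixed layer) buoyancy
    for i in eachindex(z)
        Δb = b_surface - b[i]         # buoyancy jump down to depth z[i]
        criterion = abs(z[i]) * Δb / w_scale(z[i])^2
        criterion > C_H && return abs(z[i])
    end
    return abs(z[end])                # criterion never met: mixed to the bottom
end
```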

Appendix C: Relationship Between the Model in Section 2.2 and Large et al. (1994)'s Formulation of KPP

The formulation of KPP in Section 2.2 represents an algebraic reorganization of the formulation proposed by Large et al. (1994). The two formulations are mathematically equivalent. In this appendix, we discuss in detail how the four free parameters $C_H$, $C_S$, $C_D$, and $C_N$ are algebraically related to the free parameters proposed by Large et al. (1994).

Large et al. (1994)'s formulation of KPP for the case of penetrative convection with no horizontal shear introduces six nondimensional parameters: the von Kármán constant $\kappa$, the ratio of the entrainment flux to the surface flux $\beta_T$, a constant that sets the amplitude of the nonlocal flux $C^*$, a constant that ensures the continuity of the buoyancy flux profile $c_s$, the surface layer fraction $\epsilon$, and a parameter $C_v$ that controls the smoothing of the buoyancy profile at the base of the boundary layer. Large et al. (1994) argue that $C_v$ can take any value between 1 and 2. We set the reference value of $C_v$ to the strong stratification limit in the model proposed by Danabasoglu et al. (2006) and given by equation (8.184) in Griffies et al. (2015).

In our formulation we introduce four parameters which are related to the original Large et al. (1994) parameters as follows,

$$C_S = \epsilon, \qquad C_D = \left(c_s\,\kappa^4\,\epsilon\right)^{1/3}, \qquad C_N = C^*\,C_D, \qquad C_H = \frac{C_v\,(\beta_T)^{1/2}}{\left(c_s\,\kappa^4\,\epsilon\right)^{1/2}}. \tag{C1}$$

We are able to reduce the number of parameters from six ($\epsilon$, $c_s$, $C_v$, $\beta_T$, $\kappa$, $C^*$) to four ($C_H$, $C_S$, $C_D$, $C_N$) because, in the case of penetrative convection, the two combinations $C_v(\beta_T)^{1/2}$ and $c_s\kappa^4$ always appear together.

Using the reference KPP parameter values reported above, our parameters take the reference values

$$C_H = \ldots, \qquad C_S = \ldots, \qquad C_D = \ldots, \qquad C_N = \ldots \tag{C2}$$

We refer to these as the reference parameters.

It is worth commenting on why the critical Richardson number, the focus of much literature on KPP, does not appear when considering penetrative convection. The boundary layer depth is determined implicitly through equations (21) and (23) in Large et al. (1994),

$$\mathrm{Ri}_b(z) = \frac{\left(B_r - B(z)\right)|z|}{\left|\boldsymbol{V}_r - \boldsymbol{V}(z)\right|^2 + V_t^2(z)}, \qquad V_t^2(z) = \frac{C_v\,(\beta_T)^{1/2}}{\mathrm{Ri}_c\,\kappa^2}\left(c_s\,\epsilon\right)^{-1/2}|z|\,N\,w_s, \tag{C3}$$

where $B$ is buoyancy and $B_r$ is the average of $B$ between the surface and the depth $\epsilon z$. The boundary layer depth is defined as the depth $z = -h$ where $\mathrm{Ri}_b = \mathrm{Ri}_c$. For convection without shear, the case considered in this paper, $\boldsymbol{V}_r = \boldsymbol{V}(z) = 0$. The two equations can therefore be combined together:

$$\frac{C_v\,(\beta_T)^{1/2}}{\kappa^2\left(c_s\,\epsilon\right)^{1/2}} = \frac{B_r - B(-h)}{N\,w_s}, \tag{C4}$$

and the critical Richardson number drops out from the expression. This expression further supports our decision to introduce the single parameter $C_H$ in favor of the combination of original parameters appearing on the left hand side of (C4). In penetrative convection it is the parameter $C_H$ that controls the boundary layer depth, rather than the critical Richardson number.
The optimal parameters and probability distributions for ($C_H$, $C_S$, $C_D$, $C_N$) can be mapped onto ($\epsilon$, $C_v(\beta_T)^{1/2}$, $c_s\kappa^4$, $C^*$) using the inverse transformation,

$$\epsilon = C_S, \qquad c_s\,\kappa^4 = \frac{C_D^3}{C_S}, \qquad C_v\,(\beta_T)^{1/2} = C_H\,C_D^{3/2}, \qquad C^* = \frac{C_N}{C_D}. \tag{C5}$$

Appendix D: A Primer on Uncertainty Quantification

The probability distribution of the parameters in a parameterization must quantify the likelihood that the parameters take on values other than those that minimize the loss function $\mathcal{L}(C)$. To achieve this, the probability distribution must satisfy two key properties:
1. In the limit of no uncertainty, the probability distribution should collapse to a delta function centered at the optimal parameter values that minimize the loss function.
2. The uncertainty of the parameter values $C$ should increase in proportion to the minimum value of the loss function, $\min_C \mathcal{L}(C)$, which measures the parameterization's bias.
There are many probability distributions that satisfy the above properties. We choose the following:

$$\rho(C) \propto \rho_0(C)\,\exp\!\left(-\frac{\mathcal{L}(C)}{\bar{\mathcal{L}}}\right), \tag{D1}$$

where $\rho_0$ is a uniform prior distribution, $\mathcal{L}(C)$ is a loss function, and $\bar{\mathcal{L}}$ is a hyperparameter.

The hyperparameter $\bar{\mathcal{L}}$ sets the shape of the likelihood function $\exp\left(-\mathcal{L}(C)/\bar{\mathcal{L}}\right)$ and its associated uncertainty quantification. The limit $\bar{\mathcal{L}} \to 0$ corresponds to no uncertainty, because the likelihood function and the probability distribution collapse to a delta function peaked at the optimal parameter values that minimize the loss function. The limit $\bar{\mathcal{L}} \to \infty$ instead corresponds to a likelihood function that adds no information to reduce the uncertainty, and the posterior distribution $\rho(C)$ is then equal to the prior $\rho_0(C)$. Thus, $\bar{\mathcal{L}}$ must take finite values between zero and infinity if the likelihood function is to add useful information.
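As a concrete illustration of Equation D1, the unnormalized log-posterior can be written as below. This is our sketch: `loss`, the prior bounds `lower` and `upper`, and `Lbar` are placeholders for the user's loss function, uniform-prior box, and hyperparameter $\bar{\mathcal{L}}$.

```julia
# Unnormalized log-posterior, log ρ(C) = log ρ₀(C) − 𝓛(C)/𝓛̄ (Equation D1),
# with a box-shaped uniform prior on [lower, upper].
function log_posterior(C, loss, lower, upper, Lbar)
    all(lower .<= C .<= upper) || return -Inf  # zero prior probability outside the box
    return -loss(C) / Lbar                     # uniform prior adds only a constant inside
end
```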

For any finite value of $\bar{\mathcal{L}}$, the probability distribution has its mode (maximum) at the optimal parameters if the prior distribution is uniform. This can be easily demonstrated. Let $C^\star$ denote the parameter values for which the loss function attains its global minimum and $C$ denote any other set of parameter values. It is then the case that $\rho(C)$ is no larger than $\rho(C^\star)$ for any $C$,

$$\rho(C) \propto \rho_0\,\exp\!\left(-\frac{\mathcal{L}(C)}{\bar{\mathcal{L}}}\right) \le \rho_0\,\exp\!\left(-\frac{\mathcal{L}(C^\star)}{\bar{\mathcal{L}}}\right) \propto \rho(C^\star), \tag{D2}$$

since $\mathcal{L}(C) \ge \mathcal{L}(C^\star)$. Hence, the most probable value of the probability distribution is achieved at the minimum of the loss function, independent of $\bar{\mathcal{L}}$, for a uniform prior distribution.

As mentioned in section 3, it is convenient to set the hyperparameter $\bar{\mathcal{L}}$ equal to the minimum of the loss function, $\min_C \mathcal{L}(C)$. This choice satisfies two key requirements. First, the uncertainties of the parameters are then independent of the units of the loss function. Second, the hyperparameter $\bar{\mathcal{L}}$ is then larger the larger the minimum of the loss function; the latter is a measure of the parameterization bias, and the former should be larger when there is more uncertainty about acceptable parameter values.

In practice it is seldom possible to find the global minimum of $\mathcal{L}$, and instead we adopt a "best guess" $\hat{C}$ of the optimal parameters and set $\bar{\mathcal{L}} = \mathcal{L}(\hat{C})$. Since $\mathcal{L}(\hat{C}) \ge \min_C \mathcal{L}(C)$, our choice is conservative, because a larger $\bar{\mathcal{L}}$ corresponds to more uncertainty.

Appendix E: Random Walk Markov Chain Monte Carlo

We use the random walk Markov Chain Monte Carlo method (RW-MCMC) introduced by Metropolis et al. (1953) to sample values from the probability distribution. While more efficient algorithms exist, our parameter space is only four dimensional and computational cost is not an issue. RW-MCMC samples the probability distribution by taking a random walk through parameter space. The algorithm generates a sequence of sample parameter values $C_i$ in such a way that, as more and more samples are produced, their empirical distribution more closely approximates the joint probability distribution of the parameters. At each iteration, the algorithm picks a candidate parameter set based on the current sample value. Then, with some probability, the candidate parameter set is either accepted (in which case it becomes the next sample) or rejected (in which case it is discarded and the current values are reused in the next iteration). The criterion for acceptance and its relation to the probability distribution is best described by sketching the algorithm:
1. Choose a set of initial parameter values $C_0$. We pick our best guess at the set of values that minimizes the negative log-likelihood function, as estimated from standard minimization techniques.
2. Choose a new set of candidate parameters, $C_1 = C_0 + \xi$, by adding to the current set a Gaussian random variable $\xi$ with mean zero and covariance matrix $\Sigma$. The algorithm is guaranteed to work independently of the choice of $\Sigma$, as long as it is nonzero and does not vary throughout the random walk. However, suitable choices can speed up convergence and are discussed below.
3. Calculate $\Delta = \log\rho(C_1) - \log\rho(C_0)$. This is a measure of how much more likely $C_1$ is relative to $C_0$.
4. Draw a random variable $u$ uniformly from the interval [0, 1]. If $\log u < \Delta$, accept the new parameter values and keep $C_1$. Otherwise reject the new parameter values and set $C_1 = C_0$. This is the "accept/reject" step. Note that if $\Delta > 0$, that is, if the proposed parameters produce a smaller output of the negative log-likelihood function, the proposal is always accepted.
5. Repeat steps 2–4, replacing $C_0 \to C_i$ and $C_1 \to C_{i+1}$, to generate a sequence $C_i$ of parameter values; a minimal code sketch of this loop is given below.
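The loop below is a minimal, self-contained realization of steps 1–5 (our sketch, not the paper's implementation); `log_density` stands for the unnormalized log of the probability distribution, for example the `log_posterior` above.

```julia
using LinearAlgebra, Random

# Random walk Metropolis sampler (Metropolis et al., 1953).
#   log_density : C ↦ log ρ(C), up to an additive constant
#   C0          : initial parameter vector (best guess of the optimum)
#   Σ           : fixed proposal covariance;  n : number of samples
function rw_mcmc(log_density, C0, Σ, n; rng = Random.default_rng())
    A = cholesky(Symmetric(Σ)).L                          # draw ξ ~ N(0, Σ) as A * randn
    samples = [copy(C0)]
    logρ = log_density(C0)
    for _ in 2:n
        C1 = samples[end] .+ A * randn(rng, length(C0))   # step 2: propose
        logρ1 = log_density(C1)
        Δ = logρ1 - logρ                                  # step 3
        if log(rand(rng)) < Δ                             # step 4: accept/reject
            push!(samples, C1); logρ = logρ1
        else
            push!(samples, copy(samples[end]))
        end
    end
    return samples
end
```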

The sequence of parameter values generated by the algorithm can then be used to construct any statistic of the probability distribution in Equation 18, including empirical distributions, marginal distributions, and joint distributions. In the context of KPP, it can generate the uncertainty of the temperature at any depth and time, as well as the uncertainty of the boundary layer depth at a given time.
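For example, given the `samples` returned by the sketch above, empirical marginal statistics follow directly (a usage illustration only):

```julia
using Statistics

X = reduce(hcat, samples)                 # parameters × samples matrix
marginal_means = mean(X, dims = 2)
credible_intervals = [quantile(X[i, :], [0.025, 0.975]) for i in 1:size(X, 1)]
```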

To guide the choice of an appropriate value for $\Sigma$, one diagnoses the “number of independent samples” using approximations of the correlation length, as described by Sokal (1997). If $\Sigma$ is too small, the acceptance rate is very high, since each candidate barely differs from the current sample, but the chain explores parameter space slowly; too large a $\Sigma$ yields too low an acceptance rate. To find an appropriate compromise, we perform a preliminary random walk and estimate the covariance matrix of the resulting distribution. We then set $\Sigma$ equal to this covariance matrix.
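In terms of the same sketch, this two-stage tuning reads as follows, where `log_density`, `C0`, and the rough initial proposal `Σ0` are placeholder names:

```julia
using Statistics

pilot = rw_mcmc(log_density, C0, Σ0, 10_000)     # preliminary walk
Σ = cov(reduce(hcat, pilot), dims = 2)           # empirical covariance of pilot samples
samples = rw_mcmc(log_density, C0, Σ, 100_000)   # production walk with tuned proposal
```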

Last, in order to sample parameters within a finite domain, we artificially make the parameter space periodic, so that the random walk is guaranteed never to leave the desired domain.
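One way to implement this, keeping the proposal symmetric, is to wrap each candidate periodically back into the box (same placeholder bounds as above):

```julia
# Map a candidate elementwise into the box [lower, upper] with periodic
# wrapping, so the random walk never leaves the domain.
wrap(C, lower, upper) = lower .+ mod.(C .- lower, upper .- lower)
```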

Data Availability Statement

Code and data may be found via figshare at Souza (2020).