Characterizing Convection Schemes Using Their Responses to Imposed Tendency Perturbations

Convection is usually parameterized in global climate models, and there are often large discrepancies between results obtained with different convection schemes. Conventional methods of comparing convection schemes using observational cases or directly in three‐dimensional (3D) models do not always clearly identify parameterization strengths and weaknesses. In this paper we evaluate the response of parameterizations to various perturbations rather than their behavior under particular strong forcing. We use the linear response function method proposed by Kuang (2010) to compare 12 physical packages in five atmospheric models using single‐column model (SCM) simulations under idealized radiative‐convective equilibrium conditions. The models are forced with anomalous temperature and moisture tendencies. The temperature and moisture departures from equilibrium are compared with published results from a cloud‐resolving model (CRM). Results show that the procedure is capable of isolating the behavior of a convection scheme from other physics schemes. We identify areas of agreement but also substantial differences between convection schemes, some of which can be related to scheme design. Some aspects of the model linear responses are related to their RCE profiles (the relative humidity profile in particular), while others constitute independent diagnostics. All the SCMs show irregularities or discontinuities in behavior that are likely related to threshold‐related mechanisms used in the convection schemes, and which do not appear in the CRM. Our results highlight potential flaws in convection schemes and suggest possible new directions to explore for parameterization evaluation.

way. Arakawa (2004) defines convective parameterization as "an attempt to formulate the statistical effects of cumulus convection without predicting each individual cloud." Convection parameterizations typically simulate subgrid-scale precipitation and adjust the vertical distribution of heat, moisture, and momentum (Kain & Fritsch, 1990). Most convection schemes used in GCMs today are mass-flux based and updated from schemes developed in the 1980s and 1990s (Rio et al., 2019). More recently, new approaches to parameterize convection have been proposed, for example with the introduction of stochastic elements (e.g., Berner et al., 2017;Grell & Freitas, 2014) and new processes such as cold pools (e.g., Del Genio et al., 2015;Grandpeix & Lafore, 2010;Rio et al., 2013). There are also now attempts based on machine learning (e.g., Gentine et al., 2018;O'Gorman & Dwyer, 2018).
The wide array of convection schemes employing different underlying assumptions is one of the major sources of uncertainties in GCMs. For instance, schemes often use different trigger functions and closure assumptions. As Arakawa (2004) points out, there are at least six types of convection schemes based on their closure assumptions alone. Trigger functions can be constructed using various variables such as convective available potential energy (CAPE), vertical velocity at the lifting condensation level (Bechtold et al., 2001;Kain & Fritsch, 1990), cloud work function (Arakawa & Schubert, 1974), and surface temperature and moisture (Tawfik & Dirmeyer, 2014). Certain assumptions that are widely used in convective parameterization have been found to be flawed. The quasi-equilibrium assumption (Arakawa & Schubert, 1974;Emanuel et al., 1994), for example, has been recognized to be incomplete in some cases (Bechtold et al., 2014;Davies et al., 2013;Mapes, 1997;Raymond, 1995;Yano & Plant, 2012). Further, convection schemes inherently have adjustable parameters that can be "tuned," in particular to allow simulation results to better match certain observed features of the Earth system such as clouds, temperature, and winds (e.g., Kain & Fritsch, 1990;Mauritsen et al., 2012). All these factors have led to considerable differences in model outputs when different convection schemes were employed (e.g., Emanuel & Živković-Rothman, 1999). Convective parameterization has also been identified as one of the major contributors to the discrepancies in climate sensitivity predictions between GCMs (e.g., Bony & Dufresne, 2005;Boucher et al., 2013;Vial et al., 2013). Studies have attributed the biases in various simulated variables, such as precipitation variability (DeMott et al., 2007;Wang & Zhang, 2013;Zhang & Mu, 2005), clouds (Chepfer et al., 2008;Zhang et al., 2010), convective organization (Bony et al., 2015), and the diurnal cycle of convection (Bechtold et al., 2014;Langhans et al., 2013;Rio et al., 2009), to the parameterization of convection.
Conventional methods of comparing convection schemes typically use observational case studies, where model outputs are compared with a selection of relevant observed properties in the atmosphere (e.g., Grell & Freitas, 2014;Han & Pan, 2011;Kwon & Hong, 2017;Zhang & Wang, 2017;Zhang et al., 2011). However, this method relies on sometimes difficult derivation of large-scale forcing and is based on a limited selection of observed situations. An alternative approach was suggested by Arakawa (2004), wherein he notes that differences between convection schemes could perhaps be better understood if they were expressed in a common mathematical framework instead of the physical theories they were based on. Along these lines, Kuang (2010) (hereafter K10) proposed the linear response function as an assessment method for convective parameterizations based on their behavior, that is how they actually react to atmospheric variations. There have been many studies that examined the convective responses of cloud-resolving models (CRMs) as well as convection schemes to perturbation of its large-scale environment (e.g., Derbyshire et al., 2004;Lambert et al., 2020;Redelsperger et al., 2002;Takemi et al., 2004;Tulich & Mapes, 2010).
In this study, we base our approach on K10's method and assess how it can be applied to explore the behavior of convection schemes in a systematic way. K10 points out that the responses of a cumulus ensemble to weak perturbations of its large-scale environment can be quite linear even though cumulus convection involves many non-linear processes. The behavior of a cumulus ensemble (i.e., its variation around a reference state) can therefore be approximated with a linear response function (or linear response matrix), M, which can be used to probe the mean response of a non-linear system to small imposed perturbations. The anomalous convective tendencies are given as where x is the anomalous state vector, that is vertical profiles of anomalous temperature T' and moisture q' corresponding to the vector of the anomalous temperature or moisture tendency (dT'/dt or dq'/dt). Prime indicates departure from the equilibrium state of the control (unperturbed) run and bold characters denote column vectors, for example q' = q'(k), where k is the vertical levels. In K10's experiments, small perturbations are applied to the tendencies of the thermodynamic variables, and maintained until the system reaches a new equilibrium. The anomalous convective tendencies (dx /dt) in this new equilibrium state then balance the additional perturbed forcing applied. The deviation of the temperature and moisture profiles from their profile in the control unperturbed run is x. To construct the matrix M, perturbations are applied to the temperature and moisture tendencies separately, using similarly shaped profiles that peak at successive models levels. The resulting vectors of dx /dt and x are stacked together so that In this matrix formulation, each column of Y represents a profile of the prescribed tendency perturbation that peaks at a given model level (dT'/dt sfc , … dT'/dt top , dq'/dt sfc , …, dq'/dt top ) T , where the subscripts sfc and top denote the lowest and highest model levels, and the corresponding column of X is the corresponding state responses (T' sfc , …, T' top , q' sfc , …, q' top ) T . The negative sign of Y is due to the fact that dx /dt is the anomalous convective tendencies that balance the imposed perturbations.
Our study focuses on the temperature and moisture responses to small tendency perturbations using single-column model (SCM) simulations, following Herman and Kuang (2013) (hereafter HK13). To be precise, we present the "response per unit perturbation" of the models, that is, the −M −1 matrix (the "steady state responses" of the models to imposed tendency perturbations, see Appendix A of HK13). The overarching goal is to characterize and compare some widely used convection schemes using K10's linear response function method. Further efforts to investigate the underlying mechanisms and assumptions of the individual schemes that may explain their behavior presented here form part of our ongoing work and will be presented in future publications. 12 physical packages in five SCMs are tested. We also compare our results with the corresponding CRM (SAM6.8.2, 2 km resolution) results of K10. The focus on the steady state responses (−M −1 matrix) of the SCMs in this paper allows us to easily recognize salient features of the schemes and locate discrepancies between them to gain insights into their behavior.
The mean state used in this study is that of a radiative-convective equilibrium (RCE), in which the climate system is represented by a balance between radiative cooling and convective heating. RCE resembles the tropical atmosphere on a large scale, where there is no vertical motion on average (Manabe & Strickler, 1964). It is the simplest framework to describe the atmosphere and has been applied to study a myriad of climate phenomena such as convective self-aggregation (Wing et al., 2020), precipitation extremes (Pendergrass et al., 2016), and convective updraught velocities (Singh & O'Gorman, 2015). Besides comparing between convection schemes, we also compare simulations with different planetary boundary layer (PBL) and microphysics (MP) schemes.
The specific objectives of this paper are: (1) to compare the RCE mean states of the different SCMs, (2) to examine and compare the steady state responses (T' and q') of the different schemes to small tendency perturbations, and (3) to test the sensitivity of the RCE mean state and the responses to the types of parameterization typically used in global models (convection, PBL, and MP).
The sea surface temperature (SST) used in all models is 28°C. Additionally, the surface sensible and latent heat fluxes (SH and LH, respectively) in all SCMs are computed using a bulk aerodynamic formula following Equations 3 and 4 in Friehe and Schmitt (1976): where ρ, p, T, q and π are, respectively, the air density, pressure, temperature, specific humidity and the Exner function, with their subscripts s and 1 referring to surface and lowest model level, respectively; C h and C e are the surface exchange coefficients for heat and moisture, respectively; U is the near surface wind speed; q sat (T s , p s ) is the saturation specific humidity at surface temperature and pressure; c p is the heat capacity of dry air and L v is the latent heat of water vapourization. We used a fixed value of 0.001 for the exchange coefficients C h and C e and constant of 4.8 m s -1 for the near surface wind U. This removes any surface exchange feedback caused by winds. The horizontal mean wind speeds are relaxed to a vertically uniform value of 4.8 m s -1 for zonal and 0 m s -1 for meridional wind, with a relaxation time constant of 3 h.
Our approach assumes that closely examining model behavior under RCE conditions (w = 0) will be helpful for characterizing model physics behavior. However, a few convection schemes in WRF-specifically, the Kain-Fritsch and New Simplified Arakawa-Schubert schemes-use mechanisms that involve the largescale vertical velocity in their convection triggering functions (see Table 2), even though this is arguably unphysical . Our experimental setup is possibly not well suited to such schemes, since they require non-zero vertical velocity (i.e., a departure from local RCE) to behave properly. The WRF SCM, however, does have small fluctuating w values in its individual grids due to the 3 × 3 horizontal grid stencil that it uses (described in Hacker & Angevine, 2013), which are sufficient to trigger convection in those schemes. Although the w values remain small (∼0.1 cm s -1 in individual grids, almost zero averaged over all grids) compared to those in nature, we believe that this is a reasonable test of any scheme since the average condition of the atmosphere on a large scale is close to RCE (i.e., no large scale w).

Perturbation Experiment
We apply the method described in HK13 to get the T and q responses to small perturbation of T and q tendencies ("inverse technique"). The procedure is briefly described here. We first use the PreRCE state to initiate a control run with no perturbations. For the perturbation runs, we initiate the same way but force the models with small, steady perturbations, separately, in temperature (dT/dt) and moisture (dq/dt) tendencies at every time step until a new RCE is reached. The applied perturbation follows that of HK13 and is the sum of a delta and Gaussian function. The form of the perturbation applied at the j-th model level is as follows: where p i is the local pressure, p j is the pressure at model level j, and  ij is a delta function at the j-th model level. The amplitudes of the perturbations are 0.5 K day -1 for temperature tendency perturbations and 0.2 g HWONG ET AL.

10.1029/2021MS002461
kg -1 day -1 for moisture tendency perturbations. The profile of a perturbation that peaks at a given model level is hence the respective amplitude multiplied by the function in Equation 5. For brevity, in this paper we refer to a perturbation profile that peaks at pressure level p as "perturbation at pressure level p". For instance, "perturbation at 850 hPa" denotes a perturbation profile where the magnitude of the perturbation peaks at 850 hPa.
Positive and negative perturbations are applied at every model level in separate runs. The anomalous state response vectors T' and q' are then the differences of the time-averaged T and q profiles between the perturbation and control runs. We ensure that the simulation lengths and averaging windows used in the models are long enough to attain sufficient signal-to-noise ratio (see Table 3). The anomalies of the positive and negative perturbation runs are averaged to obtain the best-estimate T and q responses presented in this paper; they can also be compared to assess linearity. Note that linearity is assessed following formula B1 in HK13: where    j D z is the discrepancy for perturbation applied at the j-th model level,

 
 j x z is the T or q anomaly for the perturbation at that level, with the +/− subscript denoting positive or negative perturbation, respectively.  0 D indicates perfect linearity. Detailed investigation into linearity is beyond the scope of this study. We merely ensure that the linearity of our models is satisfactory and comparable to that of the SCMs in HK13 (Appendix B1 in HK13). For a few models (UM-MF, SCAM, and LMDZ) we reduced the perturbation amplitudes to 0.2 K day −1 and 0.1 g kg −1 day −1 to improve linearity. In addition, for SCAM results are the average of an ensemble of five members after a series of random noise is added to specific humidity over the whole perturbation period, based on the procedure described in Appendix B4 in HK13 but with a longer period for each random perturbation. This additional step improved the linearity of the system, bringing the linearity of SCAM closer to that of the other SCMs. Table 3 summarizes the simulation details of the SCMs.
As mentioned in Section 1, we present the responses in the form of the matrix −M −1 , which shows the steady state responses per unit perturbation. To construct −M −1 , we multiply both sides of Equation 2 by −Y −1 and then by −M −1 to get −M −1 = XY −1 . Y −1 is a diagonal matrix where the diagonal elements are the inverses of the total power input (i.e., the mass-weighted vertically integrated heating or moistening rate over all model layers) for perturbation of a given model level in the units of W m −2 . Additionally, we multiply −M −1 by the standard power inputs of the SAM CRM (noting that the total power input to each model is different owing to the different vertical resolutions) so that the matrices of the SCMs are expressed in the more intuitive units of K or g kg −1 (

Individual Scheme Sensitivity Tests
We anticipate that the SCM behaviors examined here will largely be determined by their convective schemes but this is not guaranteed a priori. To test this, in addition to comparing the behavior of the SCMs as configured in Table 1, we also run two separate sets of simulations with alternate PBL and MP schemes. We run this part of the study only in WRF, as it is the only model system that provides multiple options for each parameterization and allows switching between schemes. We run these tests for only two perturbation levels: 850 and 650 hPa. As the radiative profile is prescribed, radiation schemes are not considered here. Four PBL schemes are tested: Yonsei University (

RCE Mean States
We begin by examining the RCE mean state of the SCMs for temperature and relative humidity (RH), as shown in Figure 1. These are calculated from the temporal averages of the state variables after the models have reached RCE in the control run (see Table 3). For the temperature profiles, saturation equivalent potential temperatures (θ es ) are shown instead of temperatures as they are more informative and show the spread better (in a temperature plot the curves are indistinguishable from each other). Note that for a given pressure there is a unique, monotonic relationship between θ es and absolute temperature T. The mean states of the CRM are shown for comparison (Figures 1a and 1d). The SCMs are generally colder than the CRM, probably due to the warmer SST used in K10's experiment (K10 used an SST of 29.5°C as opposed to 28°C in his SCM experiments in HK13. Sensitivity tests show that using SST = 29.5°C does not change the pattern of the perturbation results by much. For consistency with HK13's SCM experiments we used SST = 28°C). The profiles are all near moist-adiabatic but there are significant departures (Figures 1b  and 1c). In the region of scientific interest to this study (below 160 hPa), a maximum θ es difference of around 25 K (∼5 K in T) is detected around the surface regions (below 900 hPa) and around 20 K (∼8 K in T) in the free troposphere (except for UM-SBM). UM-SBM has an outlying RCE temperature profile that is consistently warmer than the other SCMs between the lifting condensation level (LCL) and tropopause. Additionally, UM-SBM's θ es profile, which is constant above 900 hPa, indicates that its model-predicted temperature above this level is a moist adiabat. As UM-MF and UM-SBM simulations are identical except for the convection scheme, it is realistic to assume this is not an implementation error. Despite the warm bias in UM-SBM, this SCM is included in this study as the pattern of the perturbation results is the primary interest (we further show in Section 5 that no correlation was found between the mean state temperature and the perturbation results). Nevertheless, this warm bias should be borne in mind in interpreting UM-SBM's results. Apart from UM-SBM, the spread in RCE temperature profiles among the SCMs is consistent with other similar studies (Daleu et al., 2015;Wing et al., 2020). Even among the WRF cases, which use the same experimental setups except for the convection scheme, there is a similar spread throughout the column.
A large spread is also found in the RCE RH profiles (Figures 1e and 1f), similar to what HK13 found, and consistent with results of comparable studies (Emanuel & Živković-Rothman, 1999;Rennó et al., 1994;Sobel & Bretherton, 2000;Wing et al., 2020). The RH values of the SCMs range between 56% and 88% at the surface levels and between 6% and 85% in the mid-troposphere. CNRM, UM-MF, UM-SBM, and WRF-BMJ are generally moister than the other SCMs in the free troposphere, while WRF-KF, WRF-ZM, and LMDZ5A are generally drier. Again, the WRF cases diverge considerably in their RH profiles despite identical simulation setups. A kink in the RH profile around the cloud base level (∼850-950 hPa) is detected in the CRM and the SCMs, albeit generally steeper in the SCMs. In a few SCMs these coincide with a slight inversion in their temperature profiles, although this is not always evident. The SCMs in our experiment are generally drier than the CRM, except for CNRM. The RH profiles of the SCMs also frequently display kinks in the free troposphere, which are not found in the CRM, for example ∼600 hPa for UM-MF and WRF-ZM, ∼700 hPa for WRF-NT. The RCE mean precipitation rates of the SCMs lie between 3.92 and 5.14 mm day −1 ( x = 4.78, σ = 0.38), similar to the SCM values of the RCE Model Intercomparison Project of Wing et al. (2020) and consistent with the expected precipitation rates diagnosed from the prescribed radiative profile.
The two cases involving the Zhang-McFarlane convection scheme (WRF-ZM and SCAM) display similar temperature profiles and comparable shape in their RH profiles, although WRF-ZM is consistently drier than SCAM by around 10%-20% in the free troposphere. Given that these two SCMs use largely the same model physics (Convection, PBL, and MP schemes), the differences in their mean state could be due to numerics or the way the schemes are implemented. The same applies for the two Betts-Miller cases (WRF-BMJ and UM-SBM), which also display quite different temperature and RH profiles, although HWONG ET AL.
10.1029/2021MS002461 11 of 34 in this case the models use different PBL and MP schemes. Additionally, the BMJ and SBM convection schemes-although based on the same concept of relaxation toward a reference profile-differ considerably in their implementation. The two LMDZ6A versions (6A and 6Ab) display almost identical temperature and RH profiles, while the profiles of LMDZ5A differ considerably from those of the LMDZ6A versions.
It is difficult to analyze the SCMs behavior based solely on their RCE mean states presented in Figure 1.
In order to investigate this further, we next present in Section 4 the linear responses outlined in Section 2, which convey richer information about the models behavior. We will explore whether the RCE mean states and linear responses are related in Section 5, and investigate the impact of PBL and MP schemes on the RCE mean states in Section 6.

Key Aspects of the SCM Responses
In this section we present vertical profiles of the T and q responses (i.e., departure from RCE profiles presented in Section 3) resulting from temperature and moisture tendency perturbations at two particular levels (850 and 650 hPa), for the SAM CRM and four selected SCMs (Figures 2 and 3). The goal is to illustrate a few high-level observations in a more intuitive format before delving into the full results. The complete −M −1 matrices of all models and a more detailed analysis of their behavior are presented in Section 4.2.
HWONG ET AL. Overall, the responses vary greatly among the models. Here, for each variable (T or q response) we show the responses of the SAM CRM from K10, one SCM that closely resembles the CRM (CNRM for T response, WRF-BMJ for q response), and one that differs greatly from it (UM-MF for T response, WRF-NSAS for q response).
As K10 pointed out, the CRM responds to both heating and moistening perturbations by warming throughout the column, approximating the difference between two moist adiabats (Figures 2b and 2f). The attendant q responses roughly resemble the expected change in specific humidity computed using the corresponding change in T, if RH remains the same as in the reference state (Figures 3b and 3f). CNRM and WRF-BMJ largely echo this CRM behavior in their T and q responses, respectively (Figures 2c and 2g; Figures 3c and 3g). The observation that WRF-BMJ responds in a similar way to the CRM is perhaps unsurprising, given that the shift in the CRM's response profiles largely conforms to the difference between two moist adiabats. This is the way Betts-Miller type schemes are constructed, where convective activity acts to relax the atmospheric state back to a reference profile, often a moist adiabat (Betts, 1986;Betts & Miller, 1986). We elaborate further on the behavior of WRF-BMJ and CNRM in Section 4.2.2.
By contrast, UM-MF and WRF-NSAS exhibit significantly different behavior compared to the CRM. UM-MF shows cool anomalies above the heating levels ( Figure 2d). When moistening is applied, its T response drops abruptly to zero around 650 hPa, above which the change in T appears to intensify (Figure 2h). This happens for both perturbation levels. WRF-NSAS shows sharp negative anomalies in its q responses around 850 hPa when heating or moistening is applied (Figures 3d and 3h), again for both perturbation levels.
10.1029/2021MS002461 13 of 34 Nevertheless, there are a few similarities between the four SCMs and the CRM. Perturbations applied at the higher level (650 hPa) induce stronger responses, likely because convective damping is weaker at higher altitudes, making the convection less able to counter the applied forcing at those levels. A greater change in the equilibrium state is then required to sufficiently alter the convection. All four SCMs display the greatest q responses at the surface levels where the specific humidity itself is largest.
One notable difference between the four SCMs and the CRM is the sharp kinks in SCM responses, commonly around the model-predicted cloud base level (850-950 hPa), but also in the mid-troposphere in UM-MF. These kinks often appear to divide the responses into distinctive regions, signaling a level shift in sensitivity. In UM-MF, for example, the T responses either decrease (for heating perturbation, Figure 2d) or increase (for moistening perturbation, Figure 2h) dramatically above the kink around 600 hPa. This characteristic is not observed in the CRM, whose responses are generally smoother and do not appear to have discontinuities, except for a slight kink in its T response when perturbing 650 hPa (Figure 2b), which could be because applied heating produces a small inversion that reduces the T response just above it. The presence of sharp kinks in the SCMs and not the CRM suggests that the kinks might reflect threshold-related behavior common in convective parameterizations. Another possible cause could be the simple bulk plume assumption used in many convection schemes, which might not be able to respond smoothly to localized perturbations, leading to bumpier model responses. We elaborate on this in Section 4.2.2.

Matrices of T and q Responses
In this section, we present the −M −1 matrix, which gives a more complete overview of the SCMs' behavior. Note that these are the inverses of the M matrices of K10 and should not be directly compared with previous studies showing M. For plotting, we divide −M −1 into four quadrants: T response to heating perturbation (Figure 4), q response to heating ( Figure 5), T response to moistening (Figure 6), and q response to moistening ( Figure 7). Basically, the quadrants show the T or q response profiles for successive perturbation levels stacked next to each other, with the main diagonal representing the local responses (i.e., responses at pressure level p to perturbation applied at p). The profiles in Figures 2 and 3 comprise two columns of these matrices: The x-axis in these figures is the perturbation level and the y-axis the response level. First, we present the broad features that are largely similar between the models (Section 4.2.1); then, notable differences between the models are presented (Section 4.2.2); finally, we compare the matrices of SCMs with similar or comparable model physics (Section 4.2.3).

Similarities Between Models
We first examine if the features presented in Section 4.1 are valid across all perturbation levels and models.
Overall, as noted before, the CRM and SCMs all show a general tendency toward stronger T and q responses when perturbations are applied higher in the troposphere (Figures 4-7, increasing warmer colors toward the right columns of the matrices), and changes in q responses are generally largest at the surface levels where moisture content is the largest (Figures 5 and 7, warmer-colored horizontal layers close to surface), although sudden surges in q response are sometimes observed higher up. CNRM and Betts-Miller type schemes (WRF-BMJ and UM-SBM) behave most similarly to the CRM (d, h, and j in Figures 4-7), especially in their T responses. We offer a potential explanation for this in Section 4.2.2.
Additional similarities between the models can now also be observed when scrutinizing their complete −M −1 matrices. In general, upper-tropospheric heating produces strong upper-tropospheric warming responses in all models (Figure 4, warmer colors in upper right corners, indicating stronger positive T responses), but inconsistent lower-tropospheric warming. Lower-tropospheric heating, on the other hand, leads to weak lower-tropospheric warming, but usually stronger upper-tropospheric warming. In other words, in the upper troposphere larger T responses are required to balance the imposed heating there, while heating applied in the lower troposphere requires much smaller T responses to stabilize. Also, heating applied at any level tends to increase the moisture below the perturbation level ( Figure 5, red lower right triangles indicating positive q responses) and reduce it above, but to varying degrees among the models.
Next we examine the responses to moistening perturbations (Figures 6 and 7). Overall, the T responses to moistening are the most consistent across models of the four response types, and moreover are relatively uniform across a wide range of perturbation levels ( Figure 6). This indicates that moistening tends to produce warming responses that are independent of where forcing is applied, while (as with the response to heating perturbations) increasing with height. Moistening also tends to provoke a stronger q response at and/or below the moistening level, sometimes with a weaker response above (Figure 7).
The above observations can be explained with the following physical interpretation. The difference in local T responses to heating perturbations in the upper and lower troposphere indicates strong lower tropospheric damping and weak upper tropospheric damping as noted earlier. Note that weaker damping is indicated by warmer colors in the figures (i.e., stronger responses required to compensate for the imposed perturbation). The increase in moisture below a heating level is also expected since heating stabilizes the atmosphere locally, inhibiting convection and trapping moisture below the heating level, leading to drying of the air above. The near-invariance of the response of T to the moistening level is interesting and the reason not obvious, but suggests that moisture added at any level ends up benefitting deep convection throughout the column.

Differences Between Models
Next we analyze the notable differences between the models. First, we note that the outlying behavior of UM-MF in its T response (horizontal discontinuity around 600 hPa and cool anomalies above heating levels; Figures 4g and 6g) and WRF-NSAS in its q responses (exceedingly strong, mostly negative, q responses around 850 hPa, Figures 5f and 7f) described in Section 4.1 is now observable across all perturbation levels.
In general, the matrices of the SCMs are not as smooth as the CRM, containing more splotchy patterns that indicate jumpy responses, with discontinuities sometimes evident with respect to forcing level (vertical stripes) and sometimes with respect to response level (horizontal stripes). This is most apparent in the lower troposphere, possibly because responses in these layers are more dependent on contributions from different physics schemes (e.g., PBL and convection schemes). The inconsistent responses in the lower levels could be reflective of the different ways schemes represent shallow convection, downdrafts, and the evaporation of precipitation.
The kink around cloud base (∼900 hPa) noted in Section 4.1 is clearly visible as a horizontal stripe across all perturbation levels and in all SCMs, most prominently in their q responses (Figures 5 and 7) but in a few cases also their T responses (Figures 4 and 6).  Figure 4, but for q responses to temperature tendency perturbation, in the units of g kg −1 . and 6) and stronger for q change (warmer-colored horizontal layers near surface in Figures 5 and 7). As mentioned, these discontinuities are not observed in the CRM, indicating that they probably reflect threshold-like behavior common in convective parameterization, or perhaps deficiencies in the coupling to the PBL schemes. Our speculation of threshold values used in convection schemes as the cause for these discontinuities is also supported by analyzing the linearity of the SCMs' responses (not shown). As mentioned in Section 2.2, we ensure that the responses are linear to a large extent (calculated with Equation 6). Nevertheless, non-linearities are sometimes detected and we found that they often coincide with the heights of the discontinuities. This suggests that threshold-related mechanisms-which are inherently non-linear-could be the common cause for both the discontinuities and response non-linearity.
It is noteworthy that the discontinuity around the LCL is more pronounced in the q responses than the T responses. This echoes findings of GCM studies, where moisture errors in convective regions are usually larger than temperature errors, possibly a consequence of deficiencies in the formulation of the entrainment and detrainment processes of moisture in some convection schemes (Gregory, 1997). For example, in a mass-flux based approach, errors in estimating the apparent moisture sink (Q2 in the notation of Yanai et al., 1973) can arise when the effect of entrainment into the areas near cloud base is not properly represented, leading to an underestimation of drying in regions below 800 hPa and overestimation above this HWONG ET AL.
10.1029/2021MS002461 Figure 6. As in Figure 4, but for T responses to moisture tendency perturbation, in the units of K.
level (Gregory & Miller, 1989). This has consequences in the way a convection scheme behaves when additional heating or moistening is imposed in our experiment.
A few SCMs also display kinks or discontinuities in their T responses around the freezing level, which are not present in the CRM: Around 650 hPa for WRF-NT, and around 600 hPa for WRF-ZM, UM-MF, and SCAM (c, e, g, and i in Figures 4 and 6). For the latter models T responses near the freezing level are generally weak (cooler color stripe), while for WRF-NT they are strong (warmer color stripe). All four SCMs use plume-based mass-flux schemes with CAPE closure, although the location of these anomalies near the freezing level suggests a possible role for microphysics and phase transitions around the freezing level.
Overall, we note two main groups of SCMs: The first displays smoother responses (especially in their T responses) that are more similar to the CRM, and the second exhibits more jumpy and disjointed behavior. As mentioned, the former consists of SCMs employing Betts-Miller adjustment type schemes (WRF-BMJ and UM-SBM) and CNRM. The remaining models belong to the latter group, and all employ mass-flux based convection schemes. A steep decrease in T response (at times negative) immediately above the imposed heating is often detected in the second group, most evident in WRF-KF, WRF-NSAS, UM-MF, and LMDZ6A (blue hues in b, f, g, and i in Figure 4). The discontinuity in responses (horizontal stripe) mentioned before HWONG ET AL.

10.1029/2021MS002461
18 of 34 is also more prominent in the second group. These behaviors may be a reflection of the way convection balances the imposed forcing. In mass-flux schemes, where it is mainly the subsidence term that balances the forcing, this can be achieved either through modification of the mass-flux shape or the environmental profile (or a mixture of the two). Where the mass-flux shape is less flexible, the environment has to be modified substantially to accommodate the forcing; where the mass-flux shape is more adaptive, less modification to the environment is required. It will be a subject of future research to identify the correct balance between these two.
The simpler assumptions of Betts-Miller adjustment-type schemes might result in more efficient balancing of the applied perturbations. We speculate that this could be due to how the closures are applied in the BM schemes for deep convection. In UM-SBM, the CAPE closure is applied by ensuring enthalpy conservation, which is achieved by either shifting the temperature reference profile (a cooling effect), or by reducing the precipitation rate computed from the moisture relaxation (a drying effect). Both methods are applied with a constant change to the convective tendencies at each vertical level between the ascent level and the level of neutral buoyancy. In WRF-BMJ, the enthalpy conservation is broadly similar to UM-SBM, as the applied enthalpy correction is smooth between the vertical levels. The closure of the BM schemes might explain why they are more effective in balancing the imposed forcing. The smooth CRM-like response of CNRM is interesting, as it is the only mass-flux scheme that exhibits smooth responses. What sets it apart from the schemes in the second group is its consistent use of buoyancy as the forcing term in the scheme design, including triggering condition, mass-flux calculation, entrainment and detrainment rates (Guérémy, 2011). It is possible that this smoother and continuous treatment of convection enhances the scheme's ability to respond locally to perturbations and could have contributed to its CRM-like responses. However, further tests are required to confirm this.

Comparison of SCMs with Similar Physics
We now analyze the −M −1 matrices of similar or comparable SCMs: The three LMDZ cases (LMDZ5A, 6A, and 6Ab), the two Betts-Miller cases (WRF-BMJ and UM-SBM), and two Zhang-McFarlane cases (SCAM and WRF-ZM). Since these groups of SCMs share related convection schemes, they might be expected to produce similar results.
The three LMDZ versions share the same deep convection scheme but with different ways of handling shallow convection and associated clouds, and cold pools (Tables 1 and 2). They display significantly different responses (k, l, m in Figures 4-7). Two additional parameterizations are introduced in LMDZ6A that were not available in LMDZ5A: The representation of dry and shallow convection by a thermal plume model, and near-surface cold pools created by the evaporation of precipitation. Indeed, differences in response between LMDZ5A and LMDZ6A are the largest at low levels (below 800 hPa), with LMDZ6A displaying weaker T and q responses. The big discontinuity in the q responses of LMDZ5A around cloud base (950 hPa, purple line in Figures 5k and 7k) appears to be attenuated in LMDZ6A, perhaps an effect of the new parameterizations which are active at this level in LMDZ6A. The T responses to perturbations above 500 hPa are also stronger at high levels in LMDZ6A than LMDZ5A, which could be related to the different representation of entrainment between the two versions .
LMDZ6A displays unusually strong q responses within its shallow convective cloud layer (between 800 and 600 hPa) when perturbations are applied at certain levels (dark red blocks in Figures 5l, 7l). Its T responses also show unusual behavior in this layer, with a clear horizontal discontinuity at around 600 hPa and irregular responses below (800-600 hPa, Figures 4l, 6l), for example negative anomalies are observed when heating perturbations are applied below 800 hPa (blue hues in Figure 4l). In fact, our perturbation experiments have shed light on a problematic behavior of LMDZ6A that was not identified earlier with traditional 1D case-studies nor 3D experiments. Following these results, further investigation pointed to potential flaws in the representation of the evaporation of precipitation in the large-scale cloud scheme of LMDZ6A, which also handles shallow clouds. In LMDZ6A, evaporation of precipitation has two limitations: (1) it assumes that precipitation falls into clear sky, which could potentially overestimate evaporation in the shallow cloud layer, (2) at a given level it is not possible to saturate a fractional area greater than the maximum cloud fraction above, which can lead to underestimation of evaporation. LMDZ6Ab is a slightly modified version of LMDZ6A where a new scheme, inspired by Jakob and Klein (2000), has been introduced to take into account the overlap between clouds in the formation and evaporation of precipitation, thus addressing the two limitations outlined above. Our results here show that this new development has a significant effect on the model behavior, improving its q responses between 800 and 600 hPa (l and m in Figures 4-7), as well as the linearity of the responses (not shown). This implies that the representation of the evaporation of precipitation may be an important factor in the response of a model to a modification of its environment. Note also that the RCE mean states of LMDZ6A and LMDZ6Ab are almost identical, showing that their −M −1 matrices have captured important features of the models which are not obvious by only scrutinizing their mean states.
The two Betts-Miller SCMs employ related convection schemes but in two different SCM architectures and with otherwise different model physics. Even though they both exhibit behavior that is close to the CRM, there are telling differences between them. While their T responses are largely similar, they display quite different q responses. The q responses of UM-SBM to moistening applied at mid-levels (800-400 hPa) are more localized (dark red diagonal in Figure 7h), that is a peak in forcing is attenuated by a peak in response in the same region, whereas WRF-BMJ displays more uniform q responses to moistening (Figure 7d), more similar to the CRM. While these two models have similar convection schemes, the implementation of the two schemes is different enough that we would not expect a priori for the perturbation responses to be the same. Compared to the original Betts-Miller scheme, UM-SBM is a simplified version while WRF-BMJ is more complex. One possible explanation for the different q responses in these two cases could be that our experiments have picked up on the changes implemented by Janjic (2000) in the BMJ scheme that include a more sophisticated formulation of the moisture profile and variable relaxation time. It is also possible that other model differences play a role, although we think this is less likely (see Section 6).
The two Zhang-McFarlane SCMs employ nominally identical model physics in two different SCM model systems (WRF vs. CAM). As we would hope, they exhibit largely similar behavior (e and i in Figures 4-7).
Although not identical, they are still significantly more similar to each other than to the other SCMs. In any case, their −M −1 matrices are more similar than their RCE profiles in Figure 1, again suggesting that linear responses may provide a clearer window into model physics than mean profiles. A slight horizontal discontinuity around the freezing level (∼600 hPa) in the T responses is visible in both SCMs (e and i in Figures 4 and 5). Given the location of this discontinuity, one possible explanation could be the interaction between the Zhang-McFarlane deep and the UW shallow convection schemes. To test this, using WRF we reran the experiments with only the Zhang-McFarlane deep convection scheme switched on and the UW shallow convection scheme switched off. Results show that the horizontal discontinuity around the freezing level remains present (not shown). We further tested altering a constant that defines the freezing level in the ZM scheme, which shifted the horizontal stripe to the new specified freezing level, confirming that it is caused by the ZM scheme. As mentioned before, such discontinuities could indicate threshold setting in the scheme; in the ZM scheme, for example, a threshold is implemented to restrict precipitation production only to clouds that extend beyond the freezing level (Zhang & McFarlane, 1995), which could explain our results.
We note that the physical explanations presented in Section 4 are preliminary and speculative at this point. Nonetheless, they serve as useful hypotheses to guide ongoing research. The main takeaway from this section is that the idealized framework based on the linear response function that we have applied is able to illuminate and locate areas of agreement and differences between model physics, which can provide insights into physical processes or ways to simplify or improve current convective parameterizations.

Relationship Between RCE Mean States and Responses
We noted in the previous section a couple of examples where aspects of model behavior changes were more evident in the linear responses than in RCE mean states (temperature and RH) described in Section 3. In this section we examine more generally if the linear responses can be linked to the RCE mean states in any way. One aspect that is well documented is the interaction between environmental humidity and convection. Convective activity has been shown to be sensitive to environmental humidity in observational studies (Brown & Zhang, 1997;Parsons et al., 2000;Sherwood et al., 2004) and experimental analyses using CRMs (Grabowski, 2003;Tompkins, 2001). Derbyshire et al. (2004), for example, found a significant impact of mid-tropospheric humidity on convective activity, where a dry RH inhibits deep convection and encourages shallow convection instead. A recent study by Wolding et al. (2020) found a cyclical behavior of moisture and convection which points to a joint evolution of the two variables. In our experiment, we found a large spread in the SCMs' RH profiles as well as their responses. Convection plays a role in influencing both. However, we do not know if they (RH and T, q responses) respond in similar fashion. This section addresses this question. Specifically, "do a model's temperature and moisture responses to heating and moistening perturbations correlate with its RCE mean state?" The first aspect we examine is whether the shape of a model's mean RH profile is linked to the shape of its responses. As pointed out in Section 3, the mean RH profiles often contain kinks. We found that these kinks almost always coincide with discontinuities in the linear responses (horizontal stripes in the −M −1 matrices). These kinks are ubiquitous at cloud base but can also be observed at ∼700 hPa for WRF-NT, ∼800 hPa for WRF-NSAS, ∼600 hPa for WRF-ZM, SCAM, and UM-MF, and ∼500 hPa for LMDZ5A. This collocation of RH kinks and discontinuities in linear responses are found in both T and q responses to heating and moistening perturbations. The smooth −M −1 quadrants of the CRM are likely related to its smooth RH profile. Additionally, the size of the kinks in the RH profile appears to have an impact on the size of the kinks in the responses: For example, the big RH kinks around 800 hPa of WRF-NSAS and around 600 hPa of UM-MF ( Figure 1e) coincide with strong discontinuities in their responses at the same heights (f and g in Figures 4-7), while the smaller RH kinks around 600 hPa of WRF-NT, WRF-ZM, and SCAM coincide with smaller or less obvious discontinuities in their responses. The even smaller RH kink of WRF-BMJ around 600 hPa hardly registers in its responses. The correspondence appears to fade away in higher altitudes (above 500 hPa): For example, the RH kinks around 450 and 350 hPa of WRF-NT, around 450 hPa of WRF-NSAS, and around 400 hPa of WRF-KF are not noticeably associated with discontinuities in their respective model responses. This is probably because the amount of moisture available at these higher altitudes is too small for any sharp changes to be registered in the responses. Now that we have seen that the shape of the RCE mean RH profile is linked to the linear responses, next we examine if there is a correlation between the magnitude of these two components in either temperature or moisture. This will tell us whether, if a model's environment is warmer or moister, it will also respond more strongly to heating or moistening perturbations. To this end, we correlate the RCE θ es and RH values of all the models at specific pressure levels with their responses at various levels and averaged over all perturbation levels, that is the average of a horizontal stripe in a −M −1 quadrant (negative anomalies are set to zero to avoid ambiguity in interpreting the correlations). We compute the Spearman correlation coefficient of these correlations as it is more suitable for non-parametric data and less sensitive to outliers than Pearson correlation coefficient (Kokoska & Zwillinger, 2000), although both methods for computing the coefficients return similar results for our experiment. Eight common pressure levels were selected between 1,000 and 200 hPa, in intervals of 100 hPa. This yields eight 8 × 8 correlation matrices, one for each combination of RCE variable (θ es or RH), forcing variable (dT/dt or dq/dt) and response variable (  T or  q ), with the matrix entry in the i-th column and j-th row representing the correlation coefficient between the RCE variable at pressure level p i and response at pressure level p j . In other words, an entry in our correlation matrix denotes the Spearman correlation coefficient r ij between two data series A i and B j : where      , , ..., representing the mean response (  T or  q to dT/dt or dq/dt perturbation) at pressure level p j for the models.
We found significant correlations (p-value < 0.05) only between the RCE RH and q responses to applied heating, shown in Figure 8. The other correlation matrices contain mostly weak correlations (|r ij | < 0.5) and are not explored here. Apart from in the boundary layer, RCE RH is positively correlated with q responses locally and at levels higher up, evident by the red tiles in and above the main-diagonal. That is, a high RCE RH at level p tends to correspond to strong q responses at p and above, or a strong q response at p tends to correlate with high RH values at p and below. The local correlations suggest that high RCE RH values at certain levels indicate that convection is acting strongly and introducing moisture near those levels, and thus when convection is slightly enhanced via a temperature or moisture tendency perturbation, the q responses at those levels are also larger due to the stronger effect of convection there. The strong positive correlations above the main diagonal are interesting. These results suggest high RH at level p permits convection to penetrate that level more easily, possibly an effect of entrainment of the environmental air, which leads to stronger q responses above p. It is also possible that variations in scheme behavior affect convection at level p, which leads to RH changes in the same direction near and below p. Interestingly, the same correlations are not observed for T responses. In other words, while the shape of the RCE RH profile reflects that of both T and q responses, the magnitude of mean RH reflects only the magnitude of q responses.

Sensitivity to PBL and MP Schemes
Here we present results from the tests to determine the role of schemes other than convection schemes in a model's linear response, as described in Section 2.3. Specifically, this section addresses the question: "Do a model's RCE mean state and responses to heating and moistening perturbations change significantly when different PBL or microphysics (MP) schemes are used?" We first present the sensitivity of the RCE mean states to the choice of PBL and MP schemes (Figures 9  and 10). Figure 9 shows clearly that the impact of the other schemes on the mean state temperature, especially the microphysics scheme, is small compared to that of the convection scheme. The RCE profiles of RH do show some sensitivity to choice of PBL and MP schemes, but at different heights of the troposphere ( Figure 10). For PBL sensitivity, differences are more prominent in the lower-to mid-troposphere (below HWONG ET AL. 500 hPa). For MP sensitivity, divergence between the MP schemes appears mostly in the upper troposphere (above 500 hPa). This is consistent with expectations that the treatment of convective outflows and cloud hydrometeors will be most important to the water vapor budget in the upper troposphere where vapor amounts are smallest. Overall, the RCE temperature (θ es ) profiles are predominantly decided by the convection scheme while the RH profiles can be influenced by the PBL and MP schemes at different elevations.
Next, we present the sensitivity of T and q responses to the choice of PBL and MP schemes (Figures 11  and 12). To explore this we only perturbed the two levels shown in Figures 2 and 3 (850 and 650 hPa). As perturbing both levels return similar results, only results from the 850 hPa perturbation case are shown. We also combine the results for temperature and moisture tendency perturbations and show only the average as their sensitivities are very similar. Overall, the responses are not sensitive to MP schemes (Figure 12), and slightly more sensitive to PBL schemes ( Figure 11). WRF-KF is not sensitive to changes in either PBL or MP schemes. For WRF-NT, WRF-NSAS, and WRF-BMJ, the responses to temperature and moisture tendency perturbations when combined with different PBL schemes retain their general shape, except for the case of ACM2 PBL scheme, which shows outlying q response when combined with WRF-NT.
10.1029/2021MS002461 23 of 34 In summary, we find that the T and q responses are much more sensitive to the convection schemes than the PBL or MP schemes, indicating that our perturbation experiments can isolate the impact of convection schemes. This is also true for the RCE RH profile, but only at low and mid-levels, above which it is affected by microphysics. However, there are important caveats to these findings. These experiments have only been conducted in the WRF model, which has a modular design and relatively independent physics schemes. The same insensitivity might not hold in other models that employ a more integrated approach in the design of its model physics, where there is a tighter coupling between the schemes. See, for example, the differences between the response matrices of LMDZ6A and LMDZ6Ab, where only the large-scale cloud scheme has been modified. Note, however, the large-scale cloud scheme in LMDZ also handles shallow clouds (in the WRF cases shallow clouds are handled by the convection scheme), hence it is still reasonable to postulate that convective parameterization (including convective MP) dominates the linear responses at least in the lower-and mid-troposphere. Also, the weak sensitivity to PBL and MP schemes is most likely exaggerated by our experimental setup. Specifically, the use of RCE with an idealized radiative cooling profile, and a constrained surface flux computation. If these sensitivity tests are repeated with interactive HWONG ET AL.

10.1029/2021MS002461
24 of 34 radiation, surface wind and exchange coefficients, sensitivity to PBL and MP schemes becomes more significant (see Appendix A).

Conclusions
The overall goal of this paper is to advance our understanding of what can be learned about model physics from single-column models (SCMs) run in radiative-convective equilibrium (RCE) configurations. The objectives are threefold: First, to compare the RCE mean states of a few SCMs containing state-of-the-art physics currently used in atmospheric modeling; second, to compare and examine the behavior of the SCMs by observing their steady temperature and moisture responses to small temperature and moisture tendency perturbations (−M −1 matrices) using the linear response framework (Herman & Kuang, 2013;Kuang, 2010); and third, to determine which physical schemes control the RCE mean state and/or linear responses.
In terms of the first objective, similar to other recent intercomparison studies (e.g., Wing et al., 2020) we found substantial differences between the SCMs in their RCE temperature and relative humidity (RH) profiles, with ∼5 K differences in absolute temperature in the near-surface levels and ∼8 K in the free troposphere (with the exception of one outlying SCM) and free-tropospheric RH spanning nearly the entire pos-HWONG ET AL. sible range (0%-100%). Even between the SCMs that use similar convection schemes, the difference in their RCE profiles is non-trivial: The two Zhang-McFarlane cases (WRF-ZM and SCAM) show similar shapes in their RH profiles but WRF-ZM is consistently somewhat drier than SCAM and the temperatures vary by several K at some levels, while the RH profiles of the two Betts-Miller cases (Betts-Miller-Janjic in WRF and Simplified Betts-Miller in UM) differ in both shape and magnitude.
In addressing the second and third objectives, we arrive at the following main conclusions: (1) The idealized SCM testing framework appears capable of isolating the behavior of convection schemes, thus enabling direct evaluation of these schemes against CRM or LES reference calculations (2) This framework identifies areas of agreement, but also substantial differences in behavior among the models, which in some cases can be related to scheme design (3) Some linear responses correlate with the RCE mean profiles (RH in particular), while others do not and hence constitute independent information. While the RCE RH profile is strongly influenced by the convection scheme, it is more sensitive to other physics schemes than are the linear responses. The RCE temperature profile is however insensitive to schemes other than the convection scheme, in this setup (4) Almost all SCMs show irregularities or discontinuities in behavior that are likely related to threshold-related mechanisms built into the convection scheme(s), and which do not appear in the SAM CRM HWONG ET AL.
10.1029/2021MS002461 26 of 34 These conclusions will now be briefly discussed in turn.
First, our experiments manage to largely isolate the behavior of the convection schemes in the SCMs. We found multiple lines of evidence for this. In the WRF model, the temperature and moisture responses to applied heating and moistening vary greatly among the convection schemes but do not deviate much when different microphysics (MP) or planetary boundary layer (PBL) schemes are used. This shows that-although in some cases the PBL scheme exerts some influence-the T and q responses are predominantly decided by the convection scheme. Also, the linear responses of the same or comparable convection schemes (the two Zhang-McFarlane and Betts-Miller cases) are considerably more alike than their RCE profiles are, supporting this finding.
Second, our framework highlights the areas of agreement and disagreement between the SCMs, and between them and the CRM, which can potentially be linked to the convection scheme design of the SCMs. The SCMs in our experiment generally reproduce the broad behavior of the CRM, albeit to different degrees. Their responses are often not as smooth and contain more splotchy and irregular patterns. Nevertheless, many SCMs exhibit behavior that is closer to the CRM than the SCMs in HK13. In general, heating perturbations lead to more diverse responses among the SCMs than do moistening ones. These disparities in response point to the different characteristics of the convection schemes and provide clues as to where to focus further investigations. Overall, two main groups emerge from inspecting their responses: The first group exhibits smooth responses akin to that of the CRM and the second displays more jumpy responses. The former group includes two variations of an adjustment-type convection scheme (Betts-Miller) and a buoyancy-based mass-flux convection scheme (CNRM), while the latter contains only mass-flux based convection schemes with CAPE closures. A scheme's responsiveness in the vertical might hold the key to the smoothness of its response. The CRM-like responses of the Betts-Miller cases point to the efficiency of adjustment-type schemes to counteract the applied localized perturbations, while the dependency of the mass-flux based schemes on vertically integrated quantities perhaps hindered their responsiveness and contributed to their bumpier responses. Our experiments also highlight important discrepancies between the three versions of the LMDZ model that employ different physical packages, uncovering shortcomings in LMDZ6A that previous studies using traditional methods have not discovered. Notably, LMDZ6A and LMDZ6Ab display almost identical RCE mean states, but very different linear responses, with LMDZ6A exhibiting abnormally strong q responses within the shallow convective cloud layer. Following an update in the way evaporation of precipitation is represented in the model (LMDZ6Ab), a marked improvement in the model's moisture responses in the shallow cloud layer was observed, demonstrating the usefulness of our framework in parameterization development.
Third, some aspects of the linear responses correspond to features of the RCE mean profiles, while others do not and can be regarded as independent diagnostics. As mentioned above our experimental setup can isolate the behavior of the convection scheme. This is also true for the RCE temperature profiles although they provide less information about the differences between convection schemes. It is partially true for the RH profiles, where the convection scheme has the strongest influence, but only at low and mid-level altitudes, above which the MP scheme plays a significant role. In other words, multiple physics schemes could potentially exert control on a model's RCE mean state, whereas its T and q responses depend mainly on the convection scheme. It is unclear how to physically interpret links between the RCE mean profile and linear responses, since either could affect the other. The extent to which the models' diverse RCE mean states directly influence their responses is hard to estimate. Like HK13, we did not attempt to tune the parameters of the SCMs to bring their mean states closer to each other (The vertical resolution of a model likely also has an impact on its responses, which we also did not standardize between the SCMs). Nonetheless, in our experiments we found evidence that the two measures are correlated to some extent, particularly the RCE RH profiles and the perturbation responses. The responses correspond to the model's RH profile in two ways. First, the shape of the RH profile is related to the shape of the responses in the sense that kinks in the RH profiles often locally coincide with kinks in the responses (both T and q responses). The models that display more uniform responses also produce smoother RH profiles in RCE (the SAM CRM, Betts-Miller schemes, and CNRM). Second, the magnitude of RH is positively correlated with the magnitude of q (but not T) responses locally, as well as higher above, suggesting that a wetter environment corresponds with convective activity that introduces moisture locally, and hence when we apply perturbation the models with larger RH react more vigorously in their moisture response, possibly caused by detrainment. It is noteworthy that the shape of RH corresponds to the shape of both T and q responses, while the magnitude of RH is linked only to the magnitude of q responses. This implies that the two moisture-related variables (RH and q responses) tend to behave in a consistent manner, while T responses can be regarded as a complementary diagnostic.
Fourth, all SCMs in our study show discontinuities in their behavior that are likely associated with thresholds embedded in the convection scheme design, and which are not observed in the CRM. Although the responses of our SCMs are linear to a large extent, the locations (heights) of the larger non-linearities often coincide with discontinuities in their responses, suggesting a common cause. Since threshold-related mechanisms are inherently non-linear, it is reasonable to suggest that they are a possible explanation for both non-linearity and response discontinuities. These discontinuities manifest themselves as horizontal stripes in the −M −1 matrices, which often divide the responses into regions with distinctive behaviors. For example, a discontinuity is observed around the model-predicted cloud base level in all the SCMs. In a few SCMs, discontinuity is also observed around the freezing level, indicating an inability of the scheme to respond smoothly to phase transition. Admittedly, the vertical transport of heat and moisture through transitioning levels is challenging to parameterize (Neggers et al., 2017). To simplify matters, convection schemes often use threshold values in their design. Thresholds are also a common feature used for the triggering of deep convection. For example, the Arakawa-Schubert scheme uses the threshold value for a concept called cloud work function to trigger convection (Arakawa & Cheng, 1993). Suhas and Zhang (2014) analyze the triggering systems of a few widely used convection schemes and found that some of their performance can be improved by optimizing the threshold values used. Ultimately, these threshold values are often subjective and sometimes arbitrary. They are at best ad-hoc limitations placed in a scheme to represent processes that we do not yet fully understand, and our experiment captures this flaw.
In this study, we have tried to standardize the simulation setup among the SCMs to the best of our abilities, including using an idealized radiative cooling profile and surface fluxes computation. However, our experiments are still not a completely untainted assessment of the effects of convection schemes as other model aspects potentially play a role in the model responses. Ideally, all convection schemes in our study would be run in the same SCM to allow the purest evaluation of their behavior. This is far from a trivial task and we have not attempted to do so in the present study. Nevertheless, we have accomplished this to a limited extent with WRF, where five convection schemes were tested under identical simulation setup apart from the convection scheme, and the differences in their linear responses can therefore be considered to be largely contributed by the respective convection scheme.
By expanding on the experiments of HK13 to a few widely used models and convection schemes, we demonstrate that the idealized framework based on a model's responses to small heating and moistening perturbations is a useful approach to study the behavior of models and their parameterizations. In this study we compare our results to the CRM (2 km resolution) results of K10 as it is the most viable option available to us. However, we caution that these CRM results cannot be regarded as the "truth", as past studies have shown that CRMs can potentially return different results depending on model resolution (Fan et al., 2017;Lebo & Morrison, 2015;Varble et al., 2014) and other parameterized physics such as the microphysics schemes (Khain et al., 2015;Kim et al., 2014;Liu & Moncrieff, 2007). There is a need for more studies to be done-large-eddy simulations (LES), for example-to verify K10's results. Nevertheless, the T and q responses presented here are a simple and helpful way to characterize and evaluate convection schemes. Clues for deficiencies in a scheme can be diagnosed from the irregularities in the −M −1 matrices and the location of these irregularities could provide guidance in examining the causes of errors in model physics. The results presented in this study are a useful first step in investigating the behavior of convection schemes. We acknowledge the importance of providing physical explanations for the features identified here, which can help improve certain aspects of convection schemes. This is a non-trivial task and forms part of our ongoing work.

Appendix A
We report here the impact of our idealized experimental procedure on the sensitivity of the responses to PBL and MP schemes. Specifically, we show the effects of the idealized radiative profile and surface flux computation. In the first set of simulations we enabled interactive radiation and kept the idealized surface Figure A1. WRF-KF sensitivities comparison for (a) ideal radiation and surface fluxes, (b) interactive radiation and ideal surface fluxes, and (c) fully interactive surface fluxes and ideal radiation. As in Section 6, responses to dT/dt and dq/dt perturbations are averaged. PBL sensitivities are shown in first and second rows, and MP sensitivities in third and fourth rows. Figure A2. As in Figure A1 but for WRF-BMJ.
flux computation (following Equations 3 and 4); in the second set of simulations we kept the idealized radiative profile and enabled fully interactive surface flux computation. These simulations were carried out with two WRF cases: WRF-KF (a mass-flux scheme) and WRF-BMJ (an adjustment-type scheme).
Results are shown in Figures A1 and A2. For both SCMs, PBL and MP schemes affect results when either the radiation or surface wind and exchange coefficients are made interactive, but to varying degrees. For WRF-KF, the responses are sensitive to the choice of MP scheme when the radiation is interactive, likely due to the impact of cloud changes on radiation (which are negated in the idealized setup), while fully interactive surface fluxes only slightly decrease the sensitivity. For WRF-BMJ, the responses are significantly more sensitive to the choice of PBL scheme and slightly more sensitive to the choice of MP scheme when either of the idealized settings is disabled. In any case, applying both idealized settings decrease the dependence of the responses on PBL and MP schemes.

Data Availability Statement
The data and scripts required to reproduce the results described in this paper are available in a Zenodo repository: https://zenodo.org/record/4571658 (DOI: 10.5281/zenodo.4571658).