A Machine-Learning-Assisted Stochastic Cloud Population Model as a Parameterization of Cumulus Convection
Abstract
A machine-learning-assisted stochastic cloud population model is coupled with the Advanced Research Weather Research and Forecasting (WRF) model to represent fluctuations in the cloud-base mass flux associated with the life cycles and interactions among cumulus convection cells. In this cloud population model, the size distribution and the associated cloud-base mass flux of the convective cells are related to their previous state and to the change in the total convective area via a transition function. The convective area tendency in turn is assumed to depend on the cloud-base mass flux that is resolved by the host WRF model. The transition function is represented by a single hidden-layer neural network trained by the evolution of convective cell size distributions in a 1-km grid-spacing WRF simulation run over the Australian Monsoon region. At every grid point of the host model, the cloud population model predicts the cell size and cloud-base mass flux distributions from which a random sample of cells is fed to an entraining parcel model that calculates precipitation as well as the associated liquid water potential temperature and total moisture tendencies. These tendencies are averaged over the cells and provided to the host model. Several regional simulations are performed over tropical and midlatitude domains to test this as a potential approach to scale-aware parameterization. It is shown that such an approach could be a new promising path to simulating realistic precipitation statistics and propagation of precipitation associated with the Madden-Julian Oscillation while maintaining realistic depictions of the diurnal cycle over both land and ocean.
Key Points
- A stochastic cloud population model is coupled with a mesoscale model for applications as a parameterization of cumulus convection
- The model predicts the evolution of the cloud-base mass flux distribution via a transition function obtained using machine learning
- The potential of the approach as a path to improvement of statistics and variability of precipitation in models is demonstrated
Plain Language Summary
A stochastic cloud population model is coupled with the Weather Research and Forecasting (WRF) model for applications as a parameterization of cumulus convection. The model predicts the evolution of the cloud-base mass flux distribution via a transition function obtained using a machine learning algorithm trained on a 1-km grid-spacing convection-permitting model simulation. The predicted mass flux statistics are then provided to an entraining plume model to obtain heating and moistening tendencies. The potential of the approach as a path to improvement of statistics and variability of precipitation in models is examined over multiple regimes.
1 Introduction
The expansion of computational resources and numerical methodologies has allowed global weather forecasting and experimental climate models to run with horizontal grid spacings of tens of kilometers or less. But this opportunity comes with the requirement to re-examine the treatment of subgrid convection processes. Traditionally, climate models, which typically run at a grid spacing of 100 km or more, have relied on cumulus parameterizations that implicitly or explicitly assume a “quasi-equilibrium” (QE) balance between large-scale (resolved) forcing and convection. Under the QE assumption, low-level heating and moistening along with upper-level radiative cooling and other processes that destabilize the troposphere are assumed to be in balance with the net large-scale effects of unresolved, convective cloud processes, including upper-level warming and boundary-layer drying (Arakawa & Schubert, 1974; Emanuel, 1994). These assumptions, while reasonable for coarse grid spacing and under slowly varying conditions, fail for high-frequency variability from forcings such as the diurnal cycle and for organized convection that covers areas comparable to the grid cell size, such as mesoscale convective systems (MCSs; Jones & Randall, 2011; Xu et al., 1992). At grid spacings of order 1 km, convection-permitting simulations have proved valuable, typically for local or regional scales (Clark et al., 2016; Poujol et al., 2020). However, neither traditional parameterization nor explicit modeling is optimal for models with intermediate grid spacings (Arakawa & Wu, 2013; Gerard, 2015).
Several approaches have been proposed to address the issue of representation of variability associated with lifecycles of the convection and interactions among clouds. Reviews are provided by Rio et al. (2019), Goswami et al. (2017), S. Hagos et al. (2018), and S. Hagos et al. (2020), and only a summary of relevant points is given here. One approach is to introduce stochasticity into parameterizations with a prescribed distribution of the mass flux or the convective area (Khouider et al., 2003; Plant & Craig, 2008; Teixeira & Reynolds, 2008; Wang et al., 2016, 2021). Alternatively, noise has been introduced to the heating/moistening tendencies or to state variables in GCM vertical columns (Davini et al., 2017) in some schemes. Yet another approach put forward by Majda and Khouider (2002) and Khouider et al. (2010) is a stochastic multicloud representation based on the Markov process on a subgrid lattice.
In our recent paper, S. Hagos et al. (2018), a stochastic prognostic model for the population dynamics of convective clouds was proposed. The model takes a nonequilibrium statistical mechanical approach to represent the evolution of the size distribution of convective cells and their associated cloud-base mass flux. At the core of the model is a transition function that relates the size distribution of convective cells at a given time to a previous state and to a given change in the total convective area in the box. The transition function was derived from physical and probabilistic arguments. In a follow-up study, S. Hagos et al. (2020), it was shown that such a transition function can also be directly derived from C-band radar observations of precipitating clouds via machine learning. In that study, the use of the transition function was extended to construct a simple model for describing the interactions between convective and stratiform clouds. The present study explores the potential applications of the ideas developed by S. Hagos et al. (2018, 2020) for the development of stochastic cloud population model-based cumulus parameterizations that can represent precipitation statistics in a regional mesoscale model at grid spacings between 25 and 1 km. The next section presents a description of the cloud population model (Sections 2.1-2.3) and its coupling to the mesoscale model (Section 2.5). Section 3 examines the impacts on precipitation statistics, the spatial distribution and diurnal cycle of precipitation, and precipitation propagation as part of Madden-Julian Oscillation events. The implications of the study are discussed in Section 4.
2 Development
2.1 Design
The overarching design of the cloud population model and its coupling to the host model is depicted in the schematic in Figure 1. The forcing to the cloud population model is provided by the resolved cloud-base mass flux (Mcb_res) in a grid box of the host model. Specifically, we use the Weather Research and Forecasting (WRF; Skamarock et al., 2008) version 4.0 model for the tests below, but the method could straightforwardly be applied to other host models with appropriate modification to the closure depending on the grid spacing of interest. The cloud population model applies a closure assumption to calculate in turn the change in the total convective area ΔAc in the grid box. The closure is presented in Section 2.5.
The cloud population is divided into N bins of the cloud-base mass flux, with a subscript i being used henceforth to denote a bin label. In the practical implementation for this study, we choose N = 30 bins, with the mean cloud-base mass flux in each bin exponentially increasing from a minimum of mcb(1) = 0.2 kg m−2 s−1 to mcb(30) = 1.4 kg m−2 s−1. These values correspond to the minimum mcb used to define a mass flux object in the WRF-CPM training simulation (Section 2.2) and to the maximum value obtained from the simulation. Precipitation from objects within this range accounts for almost all of the precipitation in the WRF-CPM domain.
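The bin layout described above can be sketched in a few lines. This is only an illustration: it assumes that "exponentially increasing" means a constant ratio between consecutive bin means, and the variable names are hypothetical rather than taken from the LAMP code.

```python
import numpy as np

N_BINS = 30      # number of mass flux bins used in this study
M_CB_MIN = 0.2   # kg m^-2 s^-1, mean cloud-base mass flux of the first bin
M_CB_MAX = 1.4   # kg m^-2 s^-1, mean cloud-base mass flux of the last bin

# Assumption: bin means are geometrically spaced (constant ratio between neighbors).
m_cb = np.geomspace(M_CB_MIN, M_CB_MAX, N_BINS)

# Ratio between consecutive bin means; constant for geometric spacing.
ratio = m_cb[1:] / m_cb[:-1]
```

With this spacing, weak cells are resolved with finer bins than strong cells, which is one plausible reading of the bin definition.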
The vector c has as its ith element the total convective area associated with convective cells in the bin with a mean cloud-base mass flux of mcb(i). Therefore, the total convective area in a grid box is given by $A_c = \sum_{i=1}^{N} c_i$. The mean cell area for the ith bin, denoted as ai, is constant and obtained by taking the average of the sizes of all cells in the mass flux bin mcb(i). Thus, ci = ni ai, where ni is the number of distinct cells in the bin.
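As a small worked example of this bookkeeping, the sketch below builds c from cell counts and mean cell areas and sums it to obtain the total convective area. The three-bin values are made up purely for illustration and do not come from the training simulation.

```python
import numpy as np

# Hypothetical values for three bins (the study uses N = 30 bins).
a = np.array([2.0, 5.0, 12.0])   # mean cell area per bin, km^2 (constant in time)
n = np.array([4, 2, 1])          # number of distinct cells per bin

c = n * a        # total convective area per bin: c_i = n_i * a_i
A_c = c.sum()    # total convective area in the grid box: sum of c_i over bins
```

Here c is [8, 10, 12] km^2 and the total convective area is 30 km^2.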
In the following subsections, the simulations performed as part of the development (Section 2.2), the machine learning algorithm (Section 2.3), and the closure (Section 2.5) are discussed. As it is not central to the work, the detailed description of the entraining plume model used to calculate the moisture and heating tendencies and precipitation is provided in Appendix A. Hereafter, the cloud population model being used as a cumulus parameterization will be referred to as the machine Learning-Assisted Model for Parameterization of convective cloud populations (LAMP).
2.2 Development Simulations
In the development of the scheme, five supporting WRF simulations are performed, and these are described here. For all the simulations, lateral boundary and surface conditions, including sea-surface temperature, are obtained from ERA5 (Copernicus Climate Change Service [C3S], 2017) and are updated every 6 hr. The first simulation has 1-km grid spacing and covers the month of January 2006, an active phase of the Australian monsoon season. The purpose of this simulation is to obtain the training data for the cloud-base mass flux. This simulation is hereafter referred to as WRF-CPM (1 km). Before describing the other simulations, the choice of a 1-km grid-spacing simulation for training must be justified. It has been shown that to fully resolve mass flux fluctuations in the convection, grid spacings of 250 m or less are required (e.g., Bryan et al., 2003; Heinze et al., 2017). Since the objective of this work is to demonstrate the potential of this approach and its key feature is the representation of the evolution of a large population of convective clouds over water and land, the large domain and hence relatively coarse grid spacing are necessary due to computational resource constraints. In our previous work (S. Hagos et al., 2014), the performance of such kilometer-scale simulations in representing radar-observed precipitating cloud population statistics is extensively documented. It was shown that the WRF model represents well the statistics of the convective area distribution, the relationship between echo-top height and cell area, and cold pool properties. Ideally, one would perform multiple training simulations to cover regions with diverse environmental conditions. In fact, one can even imagine using global convection-permitting (km-scale) simulations like those under development in the DYAMOND project (Stevens et al., 2019) for training applications.
As computational resources increase, one could also envision performing a development like that proposed here based on sub-km grid-spacing simulations over multiple regions. As will be discussed in the evaluation section, the use of convection-permitting simulations as ground truth comes with its own challenges. One has to acknowledge the biases associated with the treatment of microphysical, boundary-layer, and other processes in such simulations.
The second and third simulations are similar but with larger domains and run at 8-km and 25-km grid spacings. They are referred to as WRF-CTL (8 km) and WRF-CTL (25 km), respectively. In all three of these simulations, cumulus parameterization is turned off, and the time steps are 4, 30, and 100 s, respectively. The domains for these simulations are shown in Figure 2. The 8-km and 25-km grid-spacing simulations are also repeated with LAMP coupled with WRF, and these are referred to as WRF-LAMP (8 km) and WRF-LAMP (25 km), respectively. All the simulations are centered at Darwin, Australia (12.46°S and 130.85°E) and include both tropical ocean and land conditions. The four simulations at coarser resolutions (WRF-CTL [8 km], WRF-CTL [25 km], WRF-LAMP [8 km], and WRF-LAMP [25 km]) are run for two months from 1 January to 28 February 2006. The model configuration choices that are common to all the simulations are listed in Table 1.
Table 1. Model configuration choices common to all the simulations
Configuration | Choice |
Longwave radiation scheme | The rapid radiative transfer model (Mlawer et al., 1997) |
Shortwave radiation scheme | The rapid radiative transfer model (Morcrette et al., 2008) |
Microphysics scheme | Thompson (Thompson et al., 2008) |
Land surface model | Unified Noah model (Ek et al., 2003) |
Boundary layer scheme | Mellor-Yamada-Janjic scheme (Janjic, 2001) |
Surface, initial, and boundary condition data | ERA5 (C3S, 2017), updated every 6 hr |
Number of vertical levels | 35 |
Model top | 50 hPa |
2.3 Determination of the Transition Function Using Machine Learning
A convective cell is defined as a contiguous area of cloud-base mass flux greater than 0.2 kg m−2 s−1. The domain is gridded into 100 km × 100 km boxes as shown in the example frame in Figure 3a. Although the target grid spacing for the LAMP parameterization is less than or equal to 25 km, this larger box is selected to include a sufficient sampling of larger cells. The convective cells within the 100 km × 100 km boxes are placed into N = 30 bins according to the mean cloud-base mass flux of their respective contiguous areas, as described in Section 2.1. The bin-average mass fluxes constitute the N-dimensional mcb vector discussed above, while the bin-average and bin-total cell areas are represented by the vectors a and c, respectively. Figure 3b shows the mean values of mcb(i) and ai for the 30 bins. For a given 100 km × 100 km box (Figure 3a), c(t), the corresponding Δc(t) = c(t + 1) − c(t), and ΔAc are calculated from the 1-km grid-spacing CPM simulation for training the machine learning algorithm and determining f(c) from Equation 1. The 10-min frequency of the training data, motivated by the frequency of precipitation radar scans (S. Hagos et al., 2018, 2020), is found to be computationally reasonable for capturing the convective variations of interest on the scale of the box. However, in the implementation of WRF-LAMP, c is evolved at the frequency of the model time step, along with Ac.
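The cell identification and binning step can be illustrated with a short sketch. It is not the authors' code: it assumes 4-point connectivity for "contiguous", 1 km² per grid point, and explicit bin edges (the function names, `bin_edges`, and the toy field are all hypothetical).

```python
from collections import deque
import numpy as np

THRESHOLD = 0.2  # kg m^-2 s^-1, minimum cloud-base mass flux defining a cell

def label_cells(mass_flux, threshold=THRESHOLD):
    """Label 4-connected regions with mass_flux > threshold (0 = background)."""
    mask = mass_flux > threshold
    labels = np.zeros(mask.shape, dtype=int)
    ny, nx = mask.shape
    current = 0
    for j in range(ny):
        for i in range(nx):
            if not mask[j, i] or labels[j, i]:
                continue
            current += 1
            labels[j, i] = current
            queue = deque([(j, i)])
            while queue:  # breadth-first flood fill of one cell
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    yy, xx = y + dy, x + dx
                    if (0 <= yy < ny and 0 <= xx < nx
                            and mask[yy, xx] and not labels[yy, xx]):
                        labels[yy, xx] = current
                        queue.append((yy, xx))
    return labels, current

def bin_total_area(mass_flux, bin_edges, point_area_km2=1.0):
    """Return c: total cell area per bin, binned by each cell's mean mass flux."""
    labels, n_cells = label_cells(mass_flux)
    c = np.zeros(len(bin_edges) - 1)
    for k in range(1, n_cells + 1):
        cell = labels == k
        mean_flux = mass_flux[cell].mean()
        i = np.searchsorted(bin_edges, mean_flux, side="right") - 1
        i = min(max(i, 0), len(c) - 1)  # clamp to the first/last bin
        c[i] += cell.sum() * point_area_km2
    return c

# Toy 5 x 5 km field with one weak 2-point cell and one strong 4-point cell.
field = np.zeros((5, 5))
field[0, 0:2] = 0.3
field[3:5, 3:5] = 1.0
edges = np.array([0.2, 0.5, 1.4])   # two hypothetical bins
c_example = bin_total_area(field, edges)
```

Applying the same function to consecutive 10-min snapshots of a box would give c(t) and c(t + 1), from which Δc(t) and ΔAc follow by differencing.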
In many ways, the machine learning algorithm developed here, shown in Figure 3c, is like that presented in S. Hagos et al. (2020). The key differences are that (a) the bins are defined by mass flux values from the 1-km grid-spacing simulation instead of convective cell sizes from radar observations and (b) in the current implementation, the explicit prediction of the stratiform area is excluded for simplicity. The algorithm is a neural network with a single nonlinear hidden layer, in which a rectified linear unit activation function introduces the nonlinearity. The transition function is assumed to be completely defined by the pair of weight arrays (w0 and w1) and the pair of bias vectors (b0 and b1), which are determined by minimizing the error (Error), defined as the sum of the squares of the differences between the elements of the predicted and true Δc arrays. The code is written in TensorFlow and the Adaptive Moment Estimation (Adam) optimizer (Kingma & Ba, 2015) is used in the minimization. As is standard in machine learning applications, one set of snapshots (40,000 pairs) is used for training while another set of the same size is used for testing. The results are rather insensitive to the choice of the training and testing sets.
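To make the structure of the network concrete, the following is a minimal NumPy sketch of the forward pass and the error measure. It is not the TensorFlow code used in the study: the hidden-layer width, the weight initialization, and the choice to feed c(t) and ΔAc as one concatenated input vector are assumptions made for illustration, and the training loop (Adam in the paper) is omitted.

```python
import numpy as np

N_BINS = 30     # size of c and of the predicted change in c
N_HIDDEN = 64   # hypothetical hidden-layer width (not given in the text)

def init_params(rng, n_in=N_BINS + 1, n_hidden=N_HIDDEN, n_out=N_BINS):
    """Random small weights and zero biases; names follow w0, w1, b0, b1."""
    return {
        "w0": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b0": np.zeros(n_hidden),
        "w1": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b1": np.zeros(n_out),
    }

def transition(params, c, delta_A):
    """Predict the change in c from c(t) and the change in total convective area."""
    x = np.concatenate([c, [delta_A]])                      # assumed input layout
    h = np.maximum(0.0, x @ params["w0"] + params["b0"])    # ReLU hidden layer
    return h @ params["w1"] + params["b1"]                  # linear output layer

def sum_squared_error(pred, true):
    """Error: sum of squared differences between predicted and true change in c."""
    return float(np.sum((pred - true) ** 2))

rng = np.random.default_rng(0)
params = init_params(rng)
c_t = rng.random(N_BINS)
delta_c_pred = transition(params, c_t, delta_A=1.0)
```

In training, the four arrays in `params` would be adjusted to minimize `sum_squared_error` over the training pairs.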
The weights and biases that define the transition function are optimized such that the error in the predicted changes in the convective area in each mass flux bin is minimized, as discussed above. Thus, in a statistical sense, given the evolution of the total convective area, the transition function should correctly predict the evolving frequency distributions of the mass flux. Figure 4 shows the frequency distribution of the mass flux as a function of the total convective area obtained from the CPM simulation and the corresponding distribution obtained from applying the transition function to the total convective area time series. The machine learning algorithm provides a reasonable description of the mass flux distribution. At smaller values of the total convective area, the distribution is relatively broad, with both weak and strong convective cells co-existing with comparable frequency. As the total convective area becomes larger, the mass flux distribution becomes narrower and has a peak at about 0.8 kg m−2 s−1. There are relatively few weak cells and the convection is dominated by strong cells, as one would expect from the increased organization.
2.4 Precipitation Data Set for the Evaluation
To evaluate the model-simulated precipitation over various regions and seasons in a consistent manner, the Integrated Multi-satellite Retrievals for Global Precipitation Measurement V06B data set (IMERG; Huffman et al., 2019) is obtained and processed. This data set has a horizontal grid spacing of 0.1° and a half-hourly frequency with global coverage. Recent evaluations of the IMERG data set against ground-based observations have documented significant improvements over its predecessor, the Tropical Rainfall Measuring Mission (TRMM), as well as other satellite products, especially over tropical land such as India (Murali Krishna et al., 2017), Africa (Dezfuli et al., 2017), Australia (Islam et al., 2020), and Taiwan (Hong et al., 2006). However, over Western Africa, it has been shown to overestimate the rainfall amount from frequently occurring weak convective events while underestimating that from rare but strong MCSs (Maranan et al., 2020). Over the continental US, on the other hand, IMERG was found to have a wet bias in total precipitation, an overestimation of the frequency of weak precipitation (<2 mm hr−1), and an underestimation of the frequency of intense precipitation (>10 mm hr−1) in warm-season MCSs (Cui et al., 2020). The IMERG data set has been used to develop a global MCS tracking data set in a recent study (Feng et al., 2021), where it is found that maximum MCS rainfall intensity over the ocean is higher than that over land. The authors of that study cautioned that IMERG precipitation retrievals may be biased in intense convection, like those based on spaceborne radar/microwave platforms such as TRMM or GPM (Gingrey et al., 2018). While the IMERG data set is by no means perfect, it has been examined over a broad range of environmental conditions. In this study, we use it as a benchmark for the development and evaluation of the cloud-population-model-based parameterization.
For the applications throughout this paper, the data set is regridded onto a 25-km grid and averaged to hourly, as are all of the precipitation data from the various simulations that it is compared to.
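The temporal and spatial averaging can be illustrated as follows. This is a simplified sketch, not the processing actually used: it averages pairs of half-hourly fields to hourly values and coarsens by an integer block factor, whereas mapping a 0.1° grid onto a 25-km grid would in practice require a proper remapping (for example, conservative interpolation). Function names are hypothetical.

```python
import numpy as np

def to_hourly(half_hourly):
    """Average consecutive half-hourly fields; input shape is (time, lat, lon)."""
    nt = half_hourly.shape[0]
    if nt % 2:
        raise ValueError("expected an even number of half-hourly time steps")
    return half_hourly.reshape(nt // 2, 2, *half_hourly.shape[1:]).mean(axis=1)

def block_mean(field, factor):
    """Coarsen the last two dimensions by an integer factor using block means."""
    *lead, ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    blocks = field.reshape(*lead, ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(-3, -1))

# Toy data: two half-hourly 4 x 4 fields.
data = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
hourly = to_hourly(data)          # shape (1, 4, 4)
coarse = block_mean(hourly, 2)    # shape (1, 2, 2)
```

Applying the same two steps to model output and observations keeps the comparison on a common space-time grid.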
2.5 Coupling, Closure, and Optimization
- Based on the closure, the total convective area is determined.
- The distribution of the convective area c is updated.
- A random subsample of the convective activity, determined by p, is assumed to have taken place within the given grid box.
- That subsample is passed on to the entraining plume model.
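The sequence of steps above can be sketched as a single update function. Everything specific here is an assumption for illustration: the proportionality constant `alpha` in the closure, the interpretation of p as the probability that each cell is realized in the grid box, the clipping of negative areas, and the placeholder transition function that stands in for the trained network.

```python
import numpy as np

def lamp_step(c, a, m_res, alpha, transition, p, rng):
    """One hypothetical cloud-population update for a single grid box.

    c: total convective area per bin; a: mean cell area per bin;
    m_res: resolved cloud-base mass flux; alpha: assumed closure constant;
    transition: callable (c, delta_A) -> change in c; p: sampling probability.
    """
    # 1. Closure: total convective area taken proportional to the resolved flux.
    A_new = alpha * max(m_res, 0.0)
    delta_A = A_new - c.sum()
    # 2. Update the distribution; clipping negative areas is an added assumption.
    c_new = np.clip(c + transition(c, delta_A), 0.0, None)
    # 3. Random subsample: each cell is realized with probability p (assumed).
    n_cells = np.rint(c_new / a).astype(int)
    sampled = rng.binomial(n_cells, p)
    # 4. The sampled cell counts per bin would be passed to the plume model.
    return c_new, n_cells, sampled

def uniform_transition(c, delta_A):
    """Placeholder: spread the area change evenly over bins (not the trained net)."""
    return np.full_like(c, delta_A / c.size)

rng = np.random.default_rng(0)
c0 = np.array([4.0, 6.0])   # km^2 per bin (two bins for illustration)
a0 = np.array([2.0, 3.0])   # km^2 per cell
c1, n1, s1 = lamp_step(c0, a0, m_res=0.5, alpha=40.0,
                       transition=uniform_transition, p=0.5, rng=rng)
```

In the coupled model this update would run at every grid point and time step, with the plume-model tendencies averaged over the sampled cells before being returned to the host model.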
Figure 6 provides a comparison of the resulting precipitation frequency distribution for WRF-LAMP with those for WRF-CTL and observations. The distribution represents the count of the number of grid points in each of 40 precipitation intensity bins in each hour. The mean values in the 40 bins increase exponentially from 10^−2.0 to 10^2.5 mm/hr. As the comparison between Figures 6a and 6b indicates, WRF-LAMP produces precipitation statistics that are more in line with WRF-CPM (1 km) as well as IMERG, and the sensitivity to grid spacing is much reduced. While beyond the scope of this study, the discrepancy between the precipitation statistics from the WRF-CPM simulation and the IMERG observations is worth noting. Ultimately, the success of this approach to parameterization hinges on continued improvements in the training simulations or, perhaps even preferably, on using actual observations for training. The latter is the subject of ongoing work, which will be reported elsewhere. The percentage contributions of various precipitation intensities to the total are also calculated by multiplying the frequency by the corresponding precipitation intensity (Figures 6c and 6d). The convergence toward the observations of the WRF-LAMP simulations in comparison to WRF-CTL is apparent.
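The two statistics described above can be computed along the following lines. This is an illustrative sketch only: the bin edges are assumed to lie halfway (in log space) between the 40 bin means, the frequency is normalized by the number of samples, and the function name is hypothetical.

```python
import numpy as np

N_RAIN_BINS = 40
log_centers = np.linspace(-2.0, 2.5, N_RAIN_BINS)   # log10 of bin means, mm/hr
bin_centers = 10.0 ** log_centers
step = log_centers[1] - log_centers[0]
# Assumed edges: half a log-step on either side of each bin mean.
bin_edges = 10.0 ** np.linspace(-2.0 - step / 2, 2.5 + step / 2, N_RAIN_BINS + 1)

def rain_statistics(rain_rates):
    """Return (frequency, percentage contribution to total) per intensity bin."""
    counts, _ = np.histogram(rain_rates, bins=bin_edges)
    frequency = counts / rain_rates.size
    amount = frequency * bin_centers          # frequency times intensity
    total = amount.sum()
    if total > 0:
        contribution = 100.0 * amount / total
    else:
        contribution = np.zeros_like(amount)
    return frequency, contribution

# Toy hourly rain rates in mm/hr.
rates = np.array([0.1, 1.0, 1.0, 10.0])
frequency, contribution = rain_statistics(rates)
```

Weighting by intensity shifts the emphasis from the frequent weak rain rates to the rarer intense ones, which is why the two panels of such a comparison can look quite different.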
The spatial distribution of mean precipitation is also considered. Figure 7 shows comparisons of the 2-month mean precipitation from the WRF-CTL and WRF-LAMP simulations, along with IMERG observations. The WRF-CTL simulations typically underestimate the precipitation over the waters surrounding the Maritime Continent region, and the WRF-LAMP simulations appear to somewhat reduce that bias. Although these improvements are encouraging, they are direct results of the development of WRF-LAMP focused over the Australian monsoon region specifically. To assess the potential value of this approach to parameterization, its impacts on precipitation over other regimes must be examined. This is the subject of the next section.
3 Evaluation of Impact on Precipitation
The present evaluation is meant to assess the potential value of developing this approach toward a more mature parameterization for the coming generations of climate models. Thus, one must keep in mind that the development of the cloud population model has several steps, each with room for improvement. For example, one can imagine an increased resolution and fidelity of the training simulation or, preferably, even using observational data for training. One could also improve on the plume model, the closure, etc. With that in mind, a comprehensive evaluation of the full parameterization against multiple suites of observations and other more mature parameterizations is left for future work. In this section, we provide a preliminary documentation of the performance of LAMP under various environmental conditions. Specifically, we focus on precipitation statistics, the diurnal cycle over the global tropics and continental US, and the propagation of precipitation associated with MJO events. To that end, in addition to the simulations already described in Section 2, two more pairs of simulations are performed. These simulations cover the tropical channel (hereafter TROPICS) and the continental US (hereafter CONUS) for periods of 1 October – 31 December 2011 (a period including a series of MJO events) and 1 April – 30 June 2016 (a rainy season with MCS-related precipitation), respectively. As for the Australian monsoon simulations discussed in Section 2, for the control cases (WRF-CTL, hereafter), the cumulus parameterization is turned off, and in the companion simulations, the LAMP parameterization is turned on (WRF-LAMP). All the evaluation simulations are run at a 25-km grid spacing. All other parameterizations and the data sources used for lateral and surface conditions follow the Australian monsoon development case discussed above.
Figure 8 shows the simulation domains and spatial distributions of the seasonal mean precipitation over the tropical channel (TROPICS) from the WRF-CTL and WRF-LAMP simulations compared with IMERG. While both simulations capture the main precipitation features (i.e., the ITCZ and continental precipitation over the various regions), WRF-LAMP has particularly high precipitation over the Western Pacific. Because this region is particularly moist, the bias is more likely due to oversensitivity of the cloud model to moisture than to the neglect of evaporation of rain. Figure 9 shows the spatial distribution of mean precipitation in the CONUS simulations compared to IMERG observations. Both the WRF-CTL and WRF-LAMP simulations capture the spatial distribution of precipitation well.
The model-simulated precipitation statistics over the five regions marked by the red boxes in Figure 8a and over the CONUS region are calculated in the same manner as those shown in Figure 6, and they are compared with statistics from IMERG in Figure 10. The precipitation statistics in the WRF-LAMP simulations (green) generally improve over the control to a varying degree, particularly for medium or high rain rates. In general, the underestimated frequencies of weak-to-moderate rain rates (0.1–10 mm/hr) in WRF-CTL are improved with LAMP. The frequencies of intense rain rates (>10 mm/hr) are generally reduced with WRF-LAMP compared to WRF-CTL, which is also mostly an improvement. Similar conclusions can also be drawn from the percentage contributions of various precipitation intensities (Figure 11). Specifically, over land, the excessive amounts of rain produced by intense rain rates (>10 mm/hr) in WRF-CTL are reduced in WRF-LAMP, and the underestimated amounts from moderate rain rates (2–10 mm/hr) are also improved. Over the ocean, the reduction in the contribution from intense rain rates is excessive in WRF-LAMP, but the improvements in low and moderate rain rates are nonetheless apparent.
The diurnal cycles over these regions are represented well in both the WRF-CTL and WRF-LAMP simulations (Figure 12). Many simulations with explicit convection are known to produce reasonable representations of the diurnal cycle (e.g., Keat et al., 2019; Marsham et al., 2013), whereas many simulations with parameterized convection struggle to do so (C. Zhang et al., 2020; Dirmeyer et al., 2012), despite some recent improvements (e.g., Baba, 2020; Bechtold et al., 2014; Xie et al., 2019). WRF-LAMP has a bias of excess precipitation over the western Pacific (Figure 12f), which indicates that its convective activity is overly sensitive to moisture; this may be related to the choice of entrainment rate in the entraining plume model. The relatively good diurnal cycle arises from the WRF-LAMP closure, in which the convective area fraction is set to be proportional to the resolved cloud-base mass flux, in contrast to many common closures that link the parameterized convection to measures of instability that peak at noon.
Finally, the propagation of MJO precipitation signals in the WRF simulations is examined using MJO events observed in the winter of 2011. Figure 13 shows the Hovmöller diagrams of precipitation averaged between 5°S and 5°N. The eastward propagation of the precipitation signals associated with three MJO events is most apparent in IMERG and in WRF-LAMP. As is common in many traditional parameterizations, the control simulation has a persistent precipitation signal during an otherwise suppressed phase of the MJO (S. M. Hagos et al., 2016). WRF-LAMP reduces the contribution of the low-intensity precipitation that prevails in WRF-CTL during the suppressed phases of the MJO, resulting in better agreement with the observations. This contrasts with many quasi-equilibrium-based cumulus parameterizations that typically degrade the propagation of precipitation signals (e.g., Jiang, 2017).
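The latitude-band average behind such a Hovmöller diagram can be sketched as below. This is a minimal illustration assuming precipitation stored as a (time, lat, lon) array and an unweighted mean over the band (area weights vary little this close to the equator); the function name and toy data are hypothetical.

```python
import numpy as np

def hovmoller(precip, lat, lat_min=-5.0, lat_max=5.0):
    """Average precip(time, lat, lon) over a latitude band -> (time, lon)."""
    band = (lat >= lat_min) & (lat <= lat_max)
    return precip[:, band, :].mean(axis=1)

# Toy data: 3 times, 5 latitudes, 4 longitudes; value equals the latitude index.
lat = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
precip = np.broadcast_to(np.arange(5.0)[None, :, None], (3, 5, 4))
hov = hovmoller(precip, lat)   # rows are times, columns are longitudes
```

Plotting the resulting array with longitude on the horizontal axis and time on the vertical axis would show eastward-propagating signals as bands tilted toward increasing longitude with time.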
While this study demonstrates the utility of a machine-learning-assisted stochastic cloud population model for parameterization, actual application in global modeling research and operations would require addressing a few issues. The first issue is related to the imperfection of the training data. This has two aspects. The first is related to biases in the convection-permitting model simulation used for training. One possible way to address this is to train the algorithms directly on observations such as data from radar scans and soundings, although such observations have their own limitations. The second is generalizability, in the sense of the regional dependence of the relationship captured by the machine learning algorithm. Ideally, the training would sample all major regions of convective activity, which in theory could be obtained from global convection-permitting model simulations like those from DYAMOND (Stevens et al., 2019), as discussed in Section 2.2. The second issue is related to closure. In this work, we argue that the resolved cloud-base mass flux contains information about subgrid convective activity that could be exploited for closure. This is like moisture-convergence-based closures (Kuo, 1974) because the resolved mass flux at cloud base and the height of cloud base define the mass convergence, while the effect of moisture is implicit in the height of the cloud base. This approach has distinct advantages in that its accuracy improves with increasing resolution and scale-awareness can be introduced in a straightforward manner, but it could introduce large errors at coarser resolutions, as scale separation is the main rationale for parameterization in the first place. The third issue is related to the cloud model, which ultimately calculates the thermodynamic tendencies based on the predicted cloud-base mass flux and the environment.
An obvious limitation of the cloud model presented here is a lack of representation of evaporation of precipitation, which can potentially lead to excessive precipitation over regions with dry boundary layers. For more practical applications, coupling a complex cloud model that accounts for this and other processes would be necessary.
4 Discussion
Increased availability of computational resources is enabling climate model simulations with grid spacings on the order of tens of kilometers. These advances require rethinking the role and design of cumulus parameterizations, since the use of traditional cumulus parameterizations developed for coarser grid spacings has been widely documented to contribute to biases in precipitation statistics as well as issues in simulating modes of variability such as the diurnal cycle, the propagation of the MJO, and associated tropical precipitation signals. Convection-resolving models with grid spacings of order 1 km have shown some successes in eliminating or reducing these biases. The present study contributes to the development of parameterizations for grid spacings between 25 and 1 km by exploiting high-resolution limited-area modeling and machine learning.
The WRF-LAMP parameterization aims to represent variability in the cloud-base mass flux that is associated with lifecycles of clouds and interactions among them using a cloud population model with a machine-learning derived transition function. Given the change in the total convective area in a grid box, the population model predicts a distribution of convective areas for given bins of the cloud-base mass flux and cell size statistics. As a closure, the total convective area in a grid box is assumed to be proportional to the resolved cloud-base mass flux obtained from the host model, WRF. Cloud-base mass fluxes and cell sizes are drawn randomly as a subset of the predicted distribution and these are then fed to an entraining plume model that calculates heating and moistening tendencies as well as precipitation. The tendencies from each bin are combined and fed back to the host model.
The model performance is evaluated in terms of precipitation statistics, the diurnal cycle, the precipitation climatology (over tropical and midlatitude regions), and the propagation of precipitation signals associated with the MJO. Precipitation statistics in coupled cloud-population-model simulations are significantly improved over those in control simulations with no cumulus parameterization over many regions. It is also shown that the proposed parameterization represents well the diurnal cycle over both land and ocean regions in the tropics and over the continental US. This is attributed to the closure that relates the subgrid convective area to the resolved-scale cloud-base mass flux. This key assumption is based on the arbitrariness of the model grid spacing and the need to do away with the separation between “large-scale” and “convective-scale” processes. Similarly, unlike traditional quasi-equilibrium-based parameterizations, it improves the propagation of precipitation signals associated with the MJO by reducing excessive precipitation during suppressed phases.
It should be noted that the parameterization approach proposed here is in some ways similar to other stochastic representations of convection. Unlike many approaches that prescribe the distribution from which the convective events are drawn, however, the cloud population model here predicts the distribution using the machine-learning-trained transition function. As such, it has limitations that follow from the specificity of the training simulation, as is apparent, for example, in the wet bias over the western Pacific. Moreover, as a simple demonstration of the approach, several simplifying assumptions are made. Thus, the present study should be considered a step toward a more general class of approaches that use observations, high-resolution model simulations, and machine learning for an efficient representation of convection in the next generation of global models. Some potential benefits of this parameterization approach have been demonstrated in comparison to control simulations without a parameterization. Future work will include an examination of the performance of this approach in comparison to other approaches with comparable levels of complexity, an examination of the scope for tuning, and more extensive evaluations against observations from multiple platforms. Assessing the global applicability of transition functions obtained from a given set of observations or model simulations is also of scientific interest, as it can reveal previously unknown cloud transition processes specific to a regime.
Acknowledgments
This research is based on work supported by the U.S. Department of Energy Office of Science Biological and Environmental Research as part of the Atmospheric Systems Research (ASR) Program. Computing resources for the model simulations are provided by the National Energy Research Scientific Computing Center (NERSC). Pacific Northwest National Laboratory is operated by Battelle for the U.S. Department of Energy under Contract DE-AC05-76RLO1830.
Appendix A: Entraining Plume Model
As in many mass-flux-based cumulus parameterizations, WRF-LAMP relies on an entraining plume model to calculate moistening and heating tendencies for the host model as well as the associated precipitation. In addition to the thermodynamic profile, the WRF-LAMP entraining plume model is provided with the subsample cloud-base mass fluxes mcb and the corresponding subsample cell sizes a. In this appendix, a description of the plume model is provided. It is based on that developed by Kim and Kang (2012) but with several differences detailed below.
The buoyancy is then updated and the plume calculations integrated up to the next model level until the model top is reached.
Table A1 shows the constants and parameter choices in the entraining plume model.
Parameter | Value |
---|---|
c0 | 1.0e−4/m |
ce | 2.0e−3 |
R | 287.05 J/(kg·K) |
Rv | 461.61 J/(kg·K) |
Lv | 2.5e6 J/kg |
Lf | 3.35e5 J/kg |
Κ | 0.622 |
p0 | 90,000.0 Pa |
cp | 1,005.0 J/(kg·K) |
T0 | 288.15 K |
Tmax | 263.15 K |
Tmin | 233.15 K |
qv0 | 0.006 kg/kg |
λmax | 0.001/m |
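
As a quick consistency check on Table A1, the dimensionless value 0.622 appears to correspond to the ratio of the gas constant for dry air to that for water vapor, R/Rv. This reading is an inference from the tabulated numbers, not a definition stated in the text, and the variable names below are illustrative.

```python
# Values copied from Table A1.
R_DRY = 287.05      # J/(kg K), gas constant R
R_VAPOR = 461.61    # J/(kg K), gas constant Rv
TABLE_RATIO = 0.622 # dimensionless value listed in Table A1

def gas_constant_ratio(r_dry, r_vapor):
    """Return the ratio of the dry-air to water-vapor gas constants."""
    return r_dry / r_vapor

ratio = gas_constant_ratio(R_DRY, R_VAPOR)

# The tabulated value should match the computed ratio to three decimals.
assert abs(ratio - TABLE_RATIO) < 1e-3
print(f"{ratio:.3f}")  # prints 0.622
```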
Open Research
Data Availability Statement
The WRF-LAMP model and analysis codes and data are available at https://portal.nersc.gov/project/cpmmjo/WRF-LAMP/.