Observational Constraints on Warm Cloud Microphysical Processes Using Machine Learning and Optimization Techniques
Abstract
We introduce new parameterizations for autoconversion and accretion rates that greatly improve the representation of warm rain growth processes. The new parameterizations capitalize on machine-learning and optimization techniques and are constrained by in situ cloud probe measurements from the recent Atmospheric Radiation Measurement Program field campaign in the Azores. The uncertainties in the new estimates of autoconversion and accretion rates are about 15% and 5%, respectively, outperforming existing parameterizations. Our results confirm that cloud and drizzle water contents are the most important factors for determining accretion rates. However, for autoconversion, in addition to cloud water content and droplet number concentration, we discovered a key role of drizzle number concentration that is missing in current parameterizations. The relation between autoconversion rate and drizzle number concentration is surprising but robust, and is supported by theory. Thus, drizzle number concentration should be considered in parameterizations for improved representation of the autoconversion process.
Key Points
- Machine learning trained on in situ data constrains autoconversion and accretion rates with uncertainties of 15% and 5%, respectively
- There is a surprising relation between autoconversion rate and drizzle number concentration that significantly improves parameterizations
- The magnitude of the exponent of autoconversion rate dependence on cloud droplet number concentration is 0.75, lower than in existing parameterizations
Plain Language Summary
Drizzle has been a key element of research, because its formation modulates cloud properties and evolution, and affects the Earth's water cycle. Since drizzle formation involves cloud droplets of all sizes, simulating it explicitly requires extensive computational time. Hence, weather and climate prediction models often use simplified methods to obtain a bulk estimate of how fast and how many cloud droplets collide with each other, or with bigger drops, to form drizzle. However, many models still represent drizzle formation inadequately, calling for improvements to these simplified methods. We introduce new methods to estimate the rates of these microphysical processes, capitalizing on aircraft measurements and recent advances in machine-learning techniques. Our techniques significantly outperform the current methods. Importantly, our analyses reveal that the rate of drizzle formation via collisions between cloud drops is related to the drizzle drop number concentration itself, a relation missing from existing methods. This relation occurs because drizzle drop number concentration provides information on the stage of evolution of the cloud size distribution during drizzle formation. Although the relationship is not causal, it is important to incorporate it into models for better prediction of drizzle formation.
1 Introduction
Warm rain formation plays a crucial role in determining the properties and life cycle of marine boundary layer clouds, and has significant impacts on radiative and hydrological budgets. Yet, many global weather and climate models continue to produce rain over oceans too frequently (e.g., Stephens et al., 2010), too lightly (Ahlgrimm & Forbes, 2014; Jing et al., 2017), or too heavily (Abel & Boutle, 2012; Bodas-Salcedo et al., 2008). The intermodel spread in precipitation rate in the southeast Pacific, one of the major marine boundary layer cloud decks, can reach an order of magnitude (Wyant et al., 2015). The model discrepancies and spread in precipitation are linked to diverse issues, such as raindrop size distributions and the representation of boundary layer, autoconversion, and accretion processes. The effects of autoconversion on precipitation can change surface temperature predictions significantly (Golaz et al., 2013).
Many warm rain parameterizations for autoconversion and accretion processes have been developed in the past (e.g., Beheng, 1994; Berry, 1968; Kessler, 1969; Khairoutdinov & Kogan, 2000; Liu & Daum, 2004; Seifert & Beheng, 2001; and many others). These parameterizations have been reviewed critically (Lee & Baik, 2017; Liu & Daum, 2004; Wood, 2005), and confronted with in situ observations (Hsieh et al., 2009; Wood, 2005). By applying in situ size-resolved cloud measurements to the continuous collection equation, the two aforementioned observational studies showed that parameterized accretion rates generally agree with in situ data, but parameterized autoconversion rates can be significantly different from observational estimates. While these results are encouraging and informative, there has been little follow-up observational work. It remains unclear how to maximize the use of observations for improving understanding and model representations of these microphysical processes, and how to extend constraints from in situ to remote sensing platforms that can provide continuous observations in various cloud regimes on a global scale.
The objectives of this study are manifold. Instead of evaluating existing parameterizations, here we use in situ observations to build machine-learning (ML) models to “predict” autoconversion and accretion rates. Since translating ML results to physical formulations is not trivial and remains an active research area, we also perform nonlinear optimizations to fill the gap and to quantify the relationships of autoconversion and accretion with cloud/drizzle properties. These results are compared and contrasted with widely used parameterizations, and the implications are discussed.
2 In Situ Cloud Measurements
In situ cloud measurements were taken from the Aerosol and Cloud Experiments in the Eastern North Atlantic (ACE-ENA) field campaign, deployed by the Atmospheric Radiation Measurement (ARM) user facility. The aircraft flew near the ARM site on Graciosa Island during two intensive operational periods in June-July 2017 (IOP1) and January-February 2018 (IOP2). The cloud types sampled in ACE-ENA are mainly marine stratocumulus, with some scattered or precipitating cumulus. Measurements from three cloud probes were merged to form combined drop size distributions (DSD). Cloud droplets are defined as those with radii smaller than 25 µm, and drizzle drops as those with radii from 25 µm up to 400 µm. This cloud/drizzle separation threshold is appropriate for marine stratocumulus (Khairoutdinov & Kogan, 2000; Kogan, 2013). We define a sample as cloudy when its cloud water content (qc) is ≥ 0.01 g m–3. Based on this definition, after data screening, a total of ∼93,000 in situ cloudy DSDs comprise 11% drizzle-free DSDs and 89% drizzling DSDs (see Table S1). The smallest drizzle water content (qr) observed in the drizzling DSDs is on the order of 10⁻⁵ g m–3, so we set it as the threshold for drizzle delineation, denoted qr,crit. Property distributions from individual days for cloudy samples are shown in Figures 1a–1d.
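The cloud/drizzle partition and the two thresholds above can be sketched as follows. This is a minimal illustration, not the campaign processing code: the function names are hypothetical, bin-centre radii in µm and per-bin number concentrations in m⁻³ are assumed inputs, and treating any DSD with qr below qr,crit as drizzle free is an assumption that mirrors the delineation threshold described above.

```python
import numpy as np

RHO_W = 1.0e6        # density of liquid water, g m^-3
R_SPLIT_UM = 25.0    # cloud/drizzle boundary radius, um
R_MAX_UM = 400.0     # largest drizzle radius retained, um
QC_CLOUDY = 0.01     # g m^-3, minimum qc for a cloudy sample
QR_CRIT = 1e-5       # g m^-3, drizzle delineation threshold qr,crit


def split_moments(radius_um, number_m3):
    """Return (qc, Nc, qr, Nr) from bin-centre radii (um) and concentrations (m^-3)."""
    r = np.asarray(radius_um, dtype=float)
    n = np.asarray(number_m3, dtype=float)
    water = 4.0 / 3.0 * np.pi * (r * 1e-6) ** 3 * RHO_W * n  # g m^-3 in each bin
    cloud = r < R_SPLIT_UM
    drizzle = (r >= R_SPLIT_UM) & (r <= R_MAX_UM)
    return water[cloud].sum(), n[cloud].sum(), water[drizzle].sum(), n[drizzle].sum()


def classify_sample(qc, qr):
    """Label a DSD as noncloudy, drizzle-free, or drizzling (hypothetical helper)."""
    if qc < QC_CLOUDY:
        return "noncloudy"
    return "drizzling" if qr >= QR_CRIT else "drizzle-free"
```

For example, 100 cm⁻³ of 10 µm droplets plus 100 m⁻³ of 100 µm drops gives qc ≈ 0.42 g m–3 and qr ≈ 4 × 10⁻⁴ g m–3, i.e., a drizzling cloudy sample.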
3 Parameterization Derived From Machine Learning Techniques
The in situ DSDs from ACE-ENA are used to develop two ML models. The first, dubbed the “initiation model,” uses two inputs, cloud water content and cloud droplet number concentration (qc, Nc), to predict the autoconversion rate (Pau) in drizzle-absent conditions. The second, dubbed the “standard model,” uses four inputs (qc, Nc, qr, and the drizzle number concentration Nr) to predict Pau and the accretion rate (Pac) in drizzling conditions. As discussed further below, we use the initiation model to generate nonzero qr and Nr values. Once qr and Nr exist, the standard model is superior and is used to predict Pau and Pac.
Both models use an artificial neural network: a deep feedforward neural network (Schmidhuber, 2015) comprising eight fully connected hidden layers with 1,024 nodes each. All input and output variables are transformed to their logarithms. Since the input variables have rather different magnitudes, we normalized them using their mean and standard deviation. We used LeakyReLU (Maas et al., 2013) as the activation function and trained the models with the Adam optimizer (Kingma & Ba, 2015), using a loss function defined as the mean squared error between the true values and the predictions.
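The architecture and preprocessing described above can be sketched in NumPy as a forward pass plus loss. This is a minimal sketch under stated assumptions, not the authors' trained model: the function names are hypothetical, the LeakyReLU negative slope (0.01) and He-normal weight initialization are assumed values the text does not give, and the output layer is assumed to be linear in log space.

```python
import numpy as np


def leaky_relu(x, slope=0.01):
    """LeakyReLU activation; the 0.01 negative slope is an assumed value."""
    return np.where(x > 0.0, x, slope * x)


def standardize_log(x, mean=None, std=None):
    """Take logs of positive inputs, then normalize each column by mean and std.

    Pass the training-set mean/std back in when transforming test data.
    """
    z = np.log(np.asarray(x, dtype=float))
    if mean is None or std is None:
        mean, std = z.mean(axis=0), z.std(axis=0)
    return (z - mean) / std, mean, std


def init_mlp(n_in, n_out, n_hidden=8, width=1024, seed=0):
    """Weights for n_hidden fully connected hidden layers of equal width.

    He-normal initialization is an assumption, not taken from the text.
    """
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [width] * n_hidden + [n_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)),
             np.zeros(fan_out))
            for fan_in, fan_out in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """LeakyReLU on every hidden layer; linear output layer (log-space rates)."""
    for w, b in params[:-1]:
        x = leaky_relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def mse_loss(params, x, y_log):
    """Mean squared error between predicted and true log-transformed rates."""
    return float(np.mean((forward(params, x) - y_log) ** 2))
```

The standard model would take four input columns (qc, Nc, qr, Nr) and the initiation model two (qc, Nc). Gradient computation and the Adam update are left to an automatic-differentiation framework and are not shown.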
The training data sets for the two models are different but originate from the same pool of data points. The pool was generated by using the in situ DSDs as the initial conditions and propagating the DSDs forward in time with the stochastic collection equation (SCE). We used the two-moment bin model of Tzivion et al. (1987) to compute Pau and Pac directly from the explicit drop-drop interaction terms at 1-s time steps for 10 min. The bin model uses the Hall (1980) kernel. The 10-min time period is based on the typical in-cloud residence time (Feingold et al., 1996). Since our focus is on clouds, we exclude noncloudy data points from the pool.
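The split of collision events into autoconversion and accretion can be illustrated with an explicit pairwise sum over a discrete DSD. This is a simplified sketch, not the two-moment Tzivion et al. (1987) scheme: it computes only instantaneous rates, and it uses a Golovin sum kernel K = b(x + y) with the commonly used test constant b = 1.5 × 10³ cm³ g⁻¹ s⁻¹ as a stand-in for the tabulated Hall (1980) kernel. Autoconversion here counts cloud-cloud collisions whose product reaches 25 µm radius, and accretion counts cloud water collected by drizzle drops; all names are hypothetical.

```python
import numpy as np

RHO_W = 1.0          # density of liquid water, g cm^-3
R_SPLIT_UM = 25.0    # cloud/drizzle separation radius, um
B_GOLOVIN = 1.5e3    # Golovin constant, cm^3 g^-1 s^-1 (stand-in, not Hall 1980)


def drop_mass_g(radius_um):
    """Mass (g) of a spherical water drop whose radius is given in micrometres."""
    radius_cm = np.asarray(radius_um, dtype=float) * 1e-4
    return 4.0 / 3.0 * np.pi * RHO_W * radius_cm ** 3


def golovin_kernel(m1, m2):
    """Golovin sum kernel K = b (m1 + m2), in cm^3 s^-1 for masses in g."""
    return B_GOLOVIN * (m1 + m2)


def collection_rates(radius_um, number_cm3, kernel=golovin_kernel):
    """Instantaneous (P_au, P_ac) in g m^-3 s^-1 for a discrete DSD.

    P_au: mass moved into drizzle by cloud-cloud coalescence whose product
          reaches the 25 um threshold.
    P_ac: cloud water collected by drizzle drops.
    """
    r = np.asarray(radius_um, dtype=float)
    n = np.asarray(number_cm3, dtype=float)
    m = drop_mass_g(r)
    m_split = drop_mass_g(R_SPLIT_UM)
    is_cloud = r < R_SPLIT_UM
    p_au = 0.0
    p_ac = 0.0
    for i in range(len(r)):
        for j in range(i, len(r)):
            # Collisions per cm^3 per s; self-pairs are halved to avoid double counting.
            rate = kernel(m[i], m[j]) * n[i] * n[j]
            if i == j:
                rate *= 0.5
            if is_cloud[i] and is_cloud[j]:
                if m[i] + m[j] >= m_split:
                    p_au += rate * (m[i] + m[j])
            elif is_cloud[i] or is_cloud[j]:
                # Exactly one partner is a cloud droplet: its mass is accreted.
                p_ac += rate * (m[i] if is_cloud[i] else m[j])
    g_cm3_to_g_m3 = 1e6
    return p_au * g_cm3_to_g_m3, p_ac * g_cm3_to_g_m3
```

Two 20 µm droplets coalesce to about 25.2 µm and so count as autoconversion, whereas two 10 µm droplets do not; drizzle-drizzle collisions contribute to neither rate.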
For the initiation ML model, the training data set is based on data points generated from the initially drizzle-free DSDs in the pool. To ensure that these DSDs are strictly drizzle free, we exclude DSDs that contain cloud droplets in the instrument size bin (17.5–22.5 µm radius) adjacent to the cloud/drizzle boundary (i.e., 25 µm radius), given the uncertainty of 1.5–5 µm in in situ size measurements (Glienke & Mei, 2019, 2020). We do retain all DSDs that have no droplets in that bin initially but produce nonzero qr within 5 s. These practical choices are as inclusive of DSDs as possible and also facilitate the ML. As shown in Figures 2a and 2b, the errors of the initiation model range from –60% (25th percentile) to 80% (75th percentile).
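The screening rule for the initiation model can be written as a small predicate. This is a minimal sketch: the function name is hypothetical, and the bin edges in the usage example are illustrative except for the 17.5–22.5 µm bin named in the text.

```python
import numpy as np

EDGE_BIN_UM = (17.5, 22.5)  # instrument bin next to the 25 um cloud/drizzle boundary


def keep_for_initiation_model(bin_edges_um, counts, qr_initial):
    """True if a DSD starts drizzle free and its 17.5-22.5 um bin is empty.

    DSDs that pass may still produce nonzero qr within 5 s; they are kept.
    """
    edges = np.asarray(bin_edges_um, dtype=float)
    counts = np.asarray(counts, dtype=float)
    in_edge_bin = (edges[:-1] >= EDGE_BIN_UM[0]) & (edges[1:] <= EDGE_BIN_UM[1])
    return qr_initial == 0.0 and not np.any(counts[in_edge_bin] > 0.0)
```

With hypothetical edges [2.5, 7.5, 12.5, 17.5, 22.5] µm, a DSD with counts [50, 80, 5, 0] and zero initial qr is kept, while a single droplet in the last bin rejects it.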
Note that these initially drizzle-free DSDs generate qr ranging between 10⁻¹⁸ and 10⁻⁹ g m–3 in 5 s. The lower bound, 10⁻¹⁸ g m–3, is equivalent to a single drizzle drop with a radius of 25 µm in a 10 km × 10 km × 500 m cloudy volume. The upper bound, 10⁻⁹ g m–3, is at least four orders of magnitude smaller than any in situ measured qr. Therefore, although the initiation model generates nonzero qr, these values are very small and should still be considered nondrizzling in any practical sense. We emphasize that the initiation ML model is mainly used as a gateway to the standard ML model, which requires nonzero qr as input.
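The single-drop equivalence quoted for the lower bound can be checked directly; the helper name below is hypothetical and the only constant is the density of water.

```python
import math

RHO_W = 1.0e6  # density of liquid water, g m^-3


def single_drop_water_content(radius_um, volume_m3):
    """Water content (g m^-3) of one spherical drop spread over volume_m3."""
    radius_m = radius_um * 1e-6
    mass_g = 4.0 / 3.0 * math.pi * radius_m ** 3 * RHO_W
    return mass_g / volume_m3


cloud_volume_m3 = 10e3 * 10e3 * 500.0  # 10 km x 10 km x 500 m = 5e10 m^3
qr_lower = single_drop_water_content(25.0, cloud_volume_m3)
print(f"{qr_lower:.2e}")  # prints 1.31e-18

# Gap between the smallest measured qr (1e-5) and the upper bound (1e-9), in decades.
decades = math.log10(1e-5 / 1e-9)
```

One 25 µm drop carries about 6.5 × 10⁻⁸ g, so spreading it over 5 × 10¹⁰ m³ gives about 1.3 × 10⁻¹⁸ g m–3, consistent with the stated order of magnitude.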
For the standard model, we sample all cloudy points every 5 s from the pool to form the training data set, provided their qr ≥ 10⁻¹⁸ g m–3. This threshold is the qr magnitude that can be initiated by the initiation model, ensuring that the qr ranges of the two models overlap as much as possible. This leads to a total of ∼10.7 M training points. From the remainder of the pool not sampled for training, we randomly selected 2.5 M points for testing. The ratio of training to testing points is about 4, similar to standard practice in ML. Both the training and testing data sets contain ∼25% data with a ratio of Pau to Pac ≥ 1 (i.e., in the early stages of drizzle formation) and ∼75% data with a ratio < 1.
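The training/testing split can be sketched with index masks over the pool of bin-model output points. This is a hedged illustration: the function name is hypothetical, time is assumed to be stored in whole seconds of the 10-min integration, and restricting the random test draw to qr ≥ 10⁻¹⁸ g m–3 is an assumption (the standard model needs nonzero qr as input).

```python
import numpy as np

QR_MIN = 1e-18  # g m^-3: smallest qr produced by the initiation model


def split_pool(time_s, qr, n_test, seed=0):
    """Return (train_idx, test_idx) into a pool of bin-model output points.

    Training: every point at a 5-s multiple with qr >= QR_MIN.
    Testing: a random draw, without replacement, from the remaining points
    that also satisfy qr >= QR_MIN (an assumption, see lead-in).
    """
    time_s = np.asarray(time_s)
    qr = np.asarray(qr, dtype=float)
    has_drizzle = qr >= QR_MIN
    train_mask = (time_s % 5 == 0) & has_drizzle
    candidates = np.flatnonzero(~train_mask & has_drizzle)
    rng = np.random.default_rng(seed)
    test_idx = rng.choice(candidates, size=n_test, replace=False)
    return np.flatnonzero(train_mask), np.sort(test_idx)
```

Because test points are drawn only from indices outside the training mask, the two sets are disjoint by construction.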
Figures 2 and 3 (panels 2d, 3a, and 3b) show the performance of the standard ML model. For both Pau and Pac, most data points fall on the 1:1 line in the scatter plots, confirming the appropriateness of the neural network. The uncertainty is 15% for Pau and 5% for Pac. The good performance on the testing data set indicates that the ML model does not suffer from overfitting. Predicting Pau and Pac for these 2.5 M points takes about 100 s on a single Intel Xeon E5-2697V4 processor. Training separate models for Pau and Pac with the same inputs yielded similar results (see Table S2).
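Error statistics of the kind reported here and in Table 1 (25th/50th/75th percentiles and a correlation) can be computed as below. The definitions are assumptions: the relative error is taken as 100 × (predicted − true)/true, and the correlation is taken between log-transformed values, since the text does not state whether it is computed in linear or log space.

```python
import numpy as np


def relative_error_percentiles(predicted, true, q=(25, 50, 75)):
    """Percentiles (%) of the relative error 100 * (predicted - true) / true."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    return np.percentile(100.0 * (predicted - true) / true, q)


def log_correlation(predicted, true):
    """Pearson correlation between log-transformed predictions and true values."""
    return float(np.corrcoef(np.log(predicted), np.log(true))[0, 1])
```

For predictions of 0.5, 0.8, 1.0, 1.2, and 1.5 against a true value of 1, the 25th/50th/75th percentile errors are −20%, 0%, and 20%.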
4 Parameterizations Based on a Simple Form
We use the testing data set with 2.5 M points to derive the parameters in Equation 3, a power law of the form P = k qc^a Nc^b qr^c Nr^d, where P is Pau or Pac; the prefactor k and the exponents a–d are listed in Table 1. The in situ qc and qr have an uncertainty of 30%, while Nc and Nr have uncertainties of 50% and 20%, respectively (Glienke & Mei, 2019, 2020; Mei et al., 2020). These uncertainties are accounted for in Equation 3, leading to an additive error of 0.8 in Equation 5. Since not all variables necessarily constrain the solution, we systematically reduce the number of variables and adjust the additive error accordingly in the minimization.
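To make the structure of such a fit concrete, the sketch below recovers a prefactor and exponents by ordinary least squares on log-transformed data. This is a swapped-in simplification, not the uncertainty-weighted nonlinear optimization described above: it ignores the measurement uncertainties and the additive error term, and the function name is hypothetical. Dropping a predictor, as in the systematic variable reduction, just means passing fewer arrays.

```python
import numpy as np


def fit_power_law(rate, *predictors):
    """Fit rate = k * prod(x_i ** e_i) by ordinary least squares in log space.

    Returns (k, exponents). Measurement uncertainties and the additive error
    term used in the text are deliberately ignored in this simplified sketch.
    """
    y = np.log(np.asarray(rate, dtype=float))
    design = np.column_stack(
        [np.ones_like(y)] + [np.log(np.asarray(x, dtype=float)) for x in predictors]
    )
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.exp(coef[0])), coef[1:]
```

For example, `k, (a, b, d) = fit_power_law(p_au, qc, nc, nr)` would return the prefactor and the exponents of qc, Nc, and Nr for a (qc, Nc, Nr) parameterization.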
Table 1 summarizes the parameter estimates and error statistics for predicting Pau and Pac, based on the testing data set restricted to qr ≥ qr,crit. This restriction is needed because a power law cannot fit a range of qr spanning 18 orders of magnitude; it reduces the sample size from 2.5 to 2.3 M. If we must predict Pau from qc and Nc alone, as existing parameterizations do, the corresponding exponents are about 2.90 and –1.69, respectively. Our qc exponent is closer to the value of Liu and Daum (2004; LD, a = 3) than to that of Khairoutdinov and Kogan (2000; KK, a = 2.47), and our Nc exponent is closer to KK's value (b = –1.78) than to LD's (b = –1). As shown in Table 1, adding qr to the parameterizations improves the correlation of the Pau predictions. In general, however, the parameterizations involving Nr perform best and have the smallest errors. Once Nr is included, the exponents of both qc and Nc, that is, the sensitivities of Pau to these two variables, are reduced. Interestingly, the exponents of Nr and Nc are nearly reciprocal, that is, of similar magnitude and opposite sign (e.g., 0.78 and –0.78 in the first row of Table 1). The key role of Nr in the autoconversion rate is counterintuitive and is discussed in the next section.
Table 1. Parameter estimates (prefactor k and exponents a, b, c, and d of qc, Nc, qr, and Nr) and error statistics for predicting Pau and Pac, based on the testing data set with qr ≥ qr,crit.

| Corr. | Error (%), 25th/50th/75th | k | a (qc) | b (Nc) | c (qr) | d (Nr) |
|---|---|---|---|---|---|---|
| Autoconversion | | | | | | |
| Nonturbulent conditions | | | | | | |
| 0.96 | −39/−8/55 | 2.44 ± 0.05 | 2.0681 ± 0.0007 | −0.7760 ± 0.0007 | −0.1285 ± 0.0004 | 0.7844 ± 0.0005 |
| 0.96 | −39/−9/55 | 5.9 ± 0.1 | 1.9839 ± 0.0008 | −0.7496 ± 0.0007 | −0.0642 ± 0.0005ᵃ | 0.7043 ± 0.0006 |
| 0.92 | −52/−9/90 | (164 ± 2) × 10⁷ | 2.2742 ± 0.0007 | −1.0930 ± 0.0006 | 0.3177 ± 0.0002 | — |
| 0.93 | −49/−9/77 | (71.1 ± 1.4) × 10⁷ | 2.5247 ± 0.0009 | −1.0548 ± 0.0008 | 0.4185 ± 0.0004ᵃ | — |
| 0.96 | −39/−8/55 | 16.8 ± 0.3 | 2.0150 ± 0.0006 | −0.7461 ± 0.0006 | — | 0.6403 ± 0.0003 |
| 0.88 | −59/−8/129 | (375 ± 4) × 10¹² | 2.8957 ± 0.0005 | −1.6945 ± 0.0004 | — | — |
| Turbulent conditions with a dissipation rate of 400 cm² s⁻³ | | | | | | |
| 0.96 | −39/−8/53 | 11.1 ± 0.2 | 1.9777 ± 0.0007 | −0.7366 ± 0.0006 | — | 0.6511 ± 0.0003 |
| Nonturbulent conditions, but only using points in the autoconversion-dominant regime (Pau/Pac ≥ 1) | | | | | | |
| 0.96 | −31/1/49 | (201 ± 7) × 10⁶ | 1.7699 ± 0.0018 | −0.7975 ± 0.0014 | 0.8043 ± 0.0009 | — |
| 0.96 | −31/1/48 | 1.22 ± 0.05 | 1.7656 ± 0.0017 | −0.7929 ± 0.0013 | — | 0.8432 ± 0.0008 |
| 0.84 | −56/3/129 | (73 ± 2) × 10¹⁰ | 2.611 ± 0.0014 | −1.512 ± 0.001 | — | — |
| Nonturbulent conditions, but only using initially drizzle-free cloud size distributions | | | | | | |
| 0.89 | −54/−2/113 | (4 ± 2) × 10¹⁷ | 4.08 ± 0.02 | −2.25 ± 0.02 | — | — |
| Accretion | | | | | | |
| Nonturbulent conditions | | | | | | |
| 0.996 | −22/−3/22 | (94.8 ± 1.5) × 10⁵ | 1.4030 ± 0.0007 | −0.3147 ± 0.0006 | 1.3069 ± 0.0004 | −0.2389 ± 0.0004 |
| 0.997 | −25/−3/29 | (89 ± 1) × 10³ | 1.4159 ± 0.0006 | −0.3018 ± 0.0005 | 1.1172 ± 0.0001 | — |
| 0.960 | −64/27/244 | (631 ± 9) × 10⁻⁴ | 1.9603 ± 0.0006 | −0.6487 ± 0.0005 | — | 1.2153 ± 0.0001 |
| 0.996 | −23/−6/21 | 69.5 ± 0.2 | 1.1476 ± 0.0002 | — | 1.1587 ± 0.0001 | — |
| Turbulent conditions with a dissipation rate of 400 cm² s⁻³ | | | | | | |
| 0.997 | −20/−5/17 | 55.2 ± 0.1 | 1.1287 ± 0.0003 | — | 1.1462 ± 0.0001 | — |
For accretion, the key roles of qc and qr are consistent with existing parameterizations. Taking the KK parameterization as an example, its exponent for both qc and qr is 1.15, which is very close to our exponents of 1.148 for qc and 1.159 for qr.
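Two rows of Table 1 can be turned into callable parameterizations to show what the exponents imply. This is a sketch: the function names are hypothetical, the coefficients are copied from the nonturbulent (qc, Nc, Nr) autoconversion row and the (qc, qr) accretion row, and the input units (and hence the units of k) are those of the original study, which this section does not restate. The sample input values are arbitrary; only unit-independent ratios are interpreted below.

```python
# Prefactors and exponents copied from Table 1 (nonturbulent conditions).
PAU_NR = {"k": 16.8, "a": 2.0150, "b": -0.7461, "d": 0.6403}  # P_au = k qc^a Nc^b Nr^d
PAC_QR = {"k": 69.5, "a": 1.1476, "c": 1.1587}                # P_ac = k qc^a qr^c


def autoconversion_rate(qc, nc, nr, p=PAU_NR):
    """Table 1 power law for P_au using qc, Nc, and Nr (inputs in Table 1 units)."""
    return p["k"] * qc ** p["a"] * nc ** p["b"] * nr ** p["d"]


def accretion_rate(qc, qr, p=PAC_QR):
    """Table 1 power law for P_ac using qc and qr (inputs in Table 1 units)."""
    return p["k"] * qc ** p["a"] * qr ** p["c"]


# Sensitivities as unit-independent ratios (arbitrary base values):
base = autoconversion_rate(0.3, 100.0, 0.1)
double_nc = autoconversion_rate(0.3, 200.0, 0.1) / base    # 2**-0.7461, about 0.60
double_both = autoconversion_rate(0.3, 200.0, 0.2) / base  # 2**-0.1058, about 0.93
```

Doubling Nc alone lowers Pau by about 40%, while doubling Nc and Nr together lowers it by only about 7%, which is the near-reciprocity of the two exponents noted above.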
5 The Dependence of Autoconversion on Drizzle Number Concentration Nr
Results in Sections 3 and 4 demonstrate that both Pau and Pac are influenced by cloud and drizzle simultaneously. The influence of cloud and drizzle properties on accretion makes sense and is consistent with collision-coalescence theory and existing parameterizations, but the influence of drizzle on autoconversion is less straightforward.
From the definition of autoconversion, one should not expect a causal relationship between Nr and Pau. Instead, the dependence of Pau on Nr represents the influence of the evolution stage of the cloud DSD on Pau, which is related to the appearance of raindrops. Such an influence was first pointed out by Cotton (1972), followed by Seifert and Beheng (2001), who incorporated this associated relationship using the ratio of qr to total water content (shown as one of the options in Table 1). Zeng and Li (2020) also demonstrated that qr is a good predictor of the width of the cloud droplet size distribution, and thus of Pau. However, it remains unclear what the best form for describing this associated relationship is, and whether qr and Nr are equally effective predictors.
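For comparison, the way Seifert and Beheng (2001) let an evolution-stage variable modify autoconversion can be sketched as follows. The universal function and its coefficients (600 and 0.68) are quoted from that paper as they are usually cited, not derived in this study, and should be checked against the original before use; τ is the ratio of qr to total water content mentioned above, and the function names are hypothetical.

```python
def sb_tau(qc, qr):
    """Evolution-stage variable: fraction of liquid water in drizzle, qr / (qc + qr)."""
    return qr / (qc + qr)


def sb_phi_au(tau):
    """Seifert-Beheng (2001) universal function Phi_au = 600 tau^0.68 (1 - tau^0.68)^3."""
    t = tau ** 0.68
    return 600.0 * t * (1.0 - t) ** 3


def sb_autoconversion_factor(tau):
    """Multiplier 1 + Phi_au(tau) / (1 - tau)^2 on the 'initial' autoconversion rate.

    Diverges as tau -> 1 (almost all water already in drizzle), so keep tau < 1.
    """
    return 1.0 + sb_phi_au(tau) / (1.0 - tau) ** 2
```

At τ = 0 (no drizzle yet) the factor is exactly 1, and it grows as drizzle water accumulates, which is the same kind of stage information that Nr carries in the parameterizations of Table 1.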
To understand whether Nr contains information different from qr and whether Nr is an effective predictor for all coalescence regimes, we conducted a number of ML tests (see Table S2). For the autoconversion-dominant regime (Pau/Pac ≥ 1), we found that the use of (qc, Nc, qr, Nr) remains the best, and the performances of (qc, Nc, Nr) and (qc, Nc, qr) are similar. For all regimes combined, the use of (qc, Nc, qr, Nr) is better than (qc, Nc, Nr), and the latter is better than (qc, Nc, qr). These results suggest that qr and Nr contain different information, that they are equally effective predictors in the autoconversion-dominant regime, and that Nr is the better choice across all regimes. This is understandable, as qr also depends on the accretion rate, while Nr is not affected by accretion.
To examine the role of the first term in Equation 21, we calculate all terms for a wide range of size distributions approximated by various combinations of lognormal and gamma distributions with realistic cloud and drizzle properties (Figure S1). Our results show that although the first term alone cannot completely replicate Pau, compared to the second, third, -related, and the last term, the first term is closer to Pau than each of them in 95%, 100%, 71%, and 100% of all cases, respectively, with the relative contributions of terms dependent on the size distributions. This supports the finding in Table 1 that the first term is a good predictor of Pau, and provides evidence as to why our Pau estimates show a dependence on Nr, and why the inclusion of Nr in the autoconversion parameterization is beneficial.
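Test distributions of this kind can be generated by fixing the number concentration and water content and solving for the slope of a gamma distribution. The sketch below covers only the gamma case (a lognormal analogue follows the same moment-matching idea), uses SI units, and its function names are hypothetical; it does not evaluate the terms of Equation 21, which are not reproduced here.

```python
import math

import numpy as np

RHO_W = 1.0e6  # density of liquid water, g m^-3


def gamma_dsd_params(number_m3, lwc_g_m3, mu):
    """Return (n0, lam) for n(r) = n0 r^mu exp(-lam r) with given N and LWC.

    Uses N = n0 Gamma(mu+1) / lam^(mu+1) and
    LWC = (4/3) pi rho_w n0 Gamma(mu+4) / lam^(mu+4).
    """
    lam = (4.0 / 3.0 * math.pi * RHO_W * number_m3 * math.gamma(mu + 4.0)
           / (math.gamma(mu + 1.0) * lwc_g_m3)) ** (1.0 / 3.0)
    n0 = number_m3 * lam ** (mu + 1.0) / math.gamma(mu + 1.0)
    return n0, lam


def gamma_dsd(radius_m, number_m3, lwc_g_m3, mu):
    """Number density n(r) in m^-4 on a radius grid given in metres."""
    n0, lam = gamma_dsd_params(number_m3, lwc_g_m3, mu)
    r = np.asarray(radius_m, dtype=float)
    return n0 * r ** mu * np.exp(-lam * r)
```

Integrating the returned n(r) over radius recovers the prescribed N, and integrating (4/3)πρw r³ n(r) recovers the prescribed water content, so cloud and drizzle modes built this way can be combined with realistic (qc, Nc, qr, Nr).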
6 The Effect of Turbulence
The ML model and power-law parameterizations introduced above are based on SCE calculations in nonturbulent conditions. Since small-scale turbulence can enhance the collection rate (e.g., Ayala et al., 2008; Chen et al., 2018; Grabowski & Wang, 2013; Wang & Grabowski, 2009), we evaluate the turbulence impact by incorporating the enhancement of the collision efficiency tabulated in Wang and Grabowski (2009). Using the enhancement factor for turbulent cloud conditions with a dissipation rate of 400 cm² s⁻³, we found that the exponents of the power-law relationships do not change significantly (see Table 1). Ignoring turbulence effects in the ML model leads to errors of −15% ± 13% in Pau and −10% ± 7% in Pac. The medians of the error histograms are −18% for Pau and −7% for Pac. These errors due to turbulent collision effects are smaller than the median errors introduced by the KK parameterization, which are 45% for Pau and −20% for Pac (see Figures 2 and 3), and they can be accounted for if the dissipation rate can be estimated from radar or lidar observations.
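How a tabulated enhancement enters the collection calculation, and why ignoring it produces a negative error, can be illustrated with a pairwise collection sum. The kernel and enhancement matrices here are hypothetical placeholders in arbitrary units, not the Wang and Grabowski (2009) tables; with a uniform enhancement η the error is exactly 100(1/η − 1)%, whereas a size-dependent enhancement produces the spread of errors reported above.

```python
import numpy as np


def pairwise_collection_rate(kernel, n):
    """Sum of K_ij n_i n_j over pairs i <= j, halving self-collisions (i == j)."""
    k = np.asarray(kernel, dtype=float)
    n = np.asarray(n, dtype=float)
    pair_rates = k * np.outer(n, n)
    return np.triu(pair_rates, 1).sum() + 0.5 * np.trace(pair_rates)


def error_ignoring_turbulence(kernel, enhancement, n):
    """Relative error (%) of the gravitational-only rate against the enhanced rate."""
    p_gravity = pairwise_collection_rate(kernel, n)
    p_turbulent = pairwise_collection_rate(np.asarray(kernel) * np.asarray(enhancement), n)
    return 100.0 * (p_gravity - p_turbulent) / p_turbulent
```

A uniform enhancement of 1.25, for instance, yields an error of −20% for any kernel and size distribution.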
7 Summary
We have built machine-learning models to predict autoconversion and accretion rates from cloud and drizzle properties, using cloud probe measurements from the ACE-ENA campaign in the Azores and the stochastic collection equation solved with a two-moment bin model. Overall, the autoconversion and accretion rates estimated by the machine-learning model agree with those computed by the bin model from the observed DSDs to within about 15% and 5%, respectively. The standard model requires concurrent, separate cloud and drizzle water contents and number concentrations, which can be obtained from in situ observations or from remote sensing retrievals (e.g., Fielding et al., 2015; Mace et al., 2016; Rusli et al., 2017; Wu et al., 2020).
The joint analyses with the machine-learning model and optimization techniques revealed a robust dependence of autoconversion on drizzle number concentration. This dependence also shows a reciprocity with the dependence on cloud droplet number concentration. These findings are unexpected, because the autoconversion process represents coalescence between cloud droplets and is causally related only to cloud properties. However, drizzle number concentration does contain information on the width and evolution of the DSD, and hence indirectly on the autoconversion rate. By using simple collection kernels, we replicate the dependence and reciprocity in theoretical derivations, implying that these features are physical and can be incorporated to improve parameterizations of the autoconversion rate. The power-law parameterizations also suggest that the autoconversion rate depends on cloud droplet number concentration with an exponent of about –0.75, whose magnitude is smaller than often assumed; this will affect precipitation susceptibility and therefore warrants further investigation.
Acknowledgments
This research was supported by the Office of Science (BER), DOE under Grants DE-SC0021167, DE-SC0013489, DE-SC0020259, and DE-89243020SSC000055. Van Leeuwen was supported by the European Research Council under the CUNDA project 694509.
Open Research
Data Availability Statement
ARM data are available online through http://www.archive.arm.gov. The work on machine learning used resources of the Compute and Data Environment for Science (CADES) at the Oak Ridge National Laboratory, under DOE Contract No. DE-AC05-00OR22725. The training and testing data sets, and the machine-learning trained models are available freely in the ARM Archive and in Github (https://github.com/yang0920colostate/AuAc).