Initial Results From the Super‐Parameterized E3SM

Results from the new Department of Energy super‐parameterized (SP) Energy Exascale Earth System Model (SP‐E3SM) are analyzed and compared to the traditionally parameterized E3SMv1 and previous studies using SP models. SP‐E3SM is unique in that it utilizes Graphics Processing Unit hardware acceleration, cloud resolving model mean‐state acceleration, and reduced radiation to dramatically increase the model throughput and allow decadal experiments at 100‐km external resolution. It also differs from other SP models by using a spectral element dynamical core on a cubed‐sphere grid and a finer vertical grid with a higher model top. Despite these differences, SP‐E3SM generally reproduces the behavior of other SP models. Tropical wave variability is improved relative to E3SM, including the emergence of a Madden‐Julian Oscillation and a realistic slowdown of Moist Kelvin Waves. However, the distribution of precipitation exhibits indicates an overly frequent occurrence of rain rates less than 1 mm day −1 , and while the timing of diurnal rainfall shows modest improvements the signal is not as coherent as observations. A notable grid imprinting bias is identified in the precipitation field and attributed to a unique feedback associated with the interactions between the explicit cloud resolving model convection and the spectral element grid structure. Spurious zonal mean column water tendencies due to grid imprinting are quantified—while negligible for the conventionally parameterized E3SM, they become large with super‐parameterization, approaching 10% of the physical tendencies. The implication is that finding a remedy to grid imprinting will become especially important as spectral element dynamical cores begin to be combined with explicitly resolved convection.


Introduction
The scales of atmospheric processes deemed important for a comprehensive understanding of the climate system range from global forcing on the scale of decades to microscopic processes that govern hydrometeor formation on the scale of microseconds. Unfortunately, the largest uncertainties live on the smallest scales, which is problematic for building a model that can faithfully represent all these scales and their interactions. Putting aside the complexity of microscale processes, the apogee of the atmospheric model hierarchy is a comprehensive global turbulence resolving model. Although simulations that resolve convection on a global scale are being explored (Satoh et al., 2008), the computational cost of this goal remains a prohibitive barrier for decadal simulations. Nevertheless, we can seek practical alternatives within the spectra of model complexity to include elements of explicit convection to test hypotheses and make predictions (Jeevanjee et al., 2017).
Super-parameterization is one such approach in which a global general circulation model (GCM) can use a relatively coarse grid and ignore nonhydrostatic effects, while an embedded cloud resolving model (CRM) represents convective-scale circulations explicitly (Grabowski, 2001;Grabowski & Smolarkiewicz, 1999;Khairoutdinov et al., 2005;Randall et al., 2003). The embedded CRM can be thought of as a representative sample of the unresolved processes at each GCM grid location (Figure 1), and the two models exchange tendencies such that they remain tightly coupled as the two states evolve (Grabowski, 2004). Thus, this framework can provide realistic convective feedbacks to large-scale dynamics and radiative heating. Notwithstanding its own simplifying idealizations (e.g., two dimensionality and periodic CRM boundaries), these properties of super-parameterization have yielded improvements over traditionally parameterized GCMs over a range of timescales, such as simultaneous improvements in intraseasonal variability, equatorial waves, and the diurnal cycle (Benedict & Randall, 2009;Goswami et al., 2015;Kooperman et al., 2016a;Pritchard and Somerville, 2009;Subramanian and Palmer, 2017;Sun and Pritchard, 2016).
The new U.S. Department of Energy Energy Exascale Earth System Model Version 1 (E3SM) was conceived to extend the capabilities of existing state-of-the-science Earth system models (Bader et al., 2014) and the initial version, E3SMv1, has proven to compare favorably to other Climate Model Intercomparison Project (Eyring et al., 2016) models (Golaz et al., 2019). In line with these goals, the super-parameterized E3SM (SP-E3SM) was also developed, not only to build upon previous work with the SP Community Atmosphere Model (SP-CAM; Khairoutdinov & Randall, 2001) but also to more closely examine an algorithm that has the potential to make good use of the ongoing growth in petascale computing, and imminent emergence of exascale computing architectures.
It is not obvious where best to focus added computing power in a modeling framework that resolves two scale regimes. One enticing option is to more explicitly resolve planetary boundary layer turbulence to overcome the shortcomings of turbulence parameterizations. This addresses the limitation of typical super-parameterization implementations that use a relatively coarse CRM resolution that does not resolve boundary layer turbulence, nor the complex transitions between forms of shallow cloud organization that it mediates Stan and Xu, 2014;Tao et al., 2009;Tulich, 2015). Grabowski (2016) used idealized mock Walker-Cell simulations to suggest that if the cost barrier of a finer-resolution CRM grid can be overcome, then we can expect a more faithful representation of the dominant cloud regime in any region. Parishani et al. (2018) explored this idea for the first time in a global model of realistic complexity but found the change of cloud feedbacks under global scale forcing exhibited a surprising lack of sensitivity to including versus excluding the planetary boundary layer scales of motion. Much more investigation is needed to fully understand the benefits of explicitly resolving finer-scale turbulence in the embedded CRM.
Another option that is attractive in a computationally richer future is using a finer host model grid, which has been shown to improve the representation of precipitation extremes in traditional GCMs (Kopparla et al., 2013), although stochastic approaches are another promising avenue (Watson et al., 2017). Stan and Xu (2014) demonstrated some modest sensitivities in the SP Community Earth System Model (SP-CESM) when the GCM grid was reduced from roughly 2 • to 1 • , but further refinement is needed for reproducing important regional phenomena. For instance, super-parameterization is often touted as a way to include a representation of mesoscale convective systems (MCS) in a GCM (Pritchard et al., 2011), which has important implications for regional climate projections (Kooperman et al., 2014). But despite success in capturing their large cirrus anvils, coarse-resolution SP GCMs have historically struggled to exhibit realistic MCS precipitation. Moving to 25-km exterior resolution provides dramatic improvements in case studies of individual storms (Tulich, 2015). This is logical to expect given that it has been known for over a decade that models begin to resolve important mesoscale circulations at grid resolutions below 30 km (Moncrieff & Liu, 2006). Furthermore, MCSs can be significantly influenced by synoptic frontal systems (Peters et al., 2014;Schumacher et al., 2005) and orographic features (Houze, 2012), so the ability to accurately reproduce MCS precipitation statistics will be negatively affected if these synoptic scales remain underresolved. This is especially important in the central eastern United States, where GCMs continue to disagree on future projections .
Exploring the benefits of finer resolution in a SP model, either in the embedded CRM or the host GCM, comes with a noteworthy increase in computational expense. The same obstacles prohibit numerous additional CRM enhancements that would likely lead to further benefits over traditional models, such as more sophisticated microphysics schemes or semi-explicit aerosol interaction (Gustafson et al., 2008). The straightforward solution to overcoming these issues is to make the model faster, which is the central motivation guiding the development of SP-E3SM.
Several techniques have been implemented to accelerate SP-E3SM and allow further exploration with refined grids. The fact that there is no direct communication between CRMs (and thus no internode MPI data transfers in the CRM code) means the CRM code can be refactored to efficiently utilize the Graphics Processing Units (GPUs) on state-of-the-art high-performance computing architectures while keeping all data resident in fast GPU memory the entire time. A substantial algorithmic speed-up is achieved by including a mean state acceleration technique pioneered by Jones et al. (2015) that reduces the number of time steps required by the CRM to respond to the large-scale forcing. Further acceleration is obtained by performing radiative heating calculations on the average state of CRM column groups rather than individual columns.
The primary goal of the present study is to describe the initial results from SP-E3SM, which implements all three of the acceleration techniques outlined above. Additionally, we wish to know whether SP-E3SM can produce similar behavior to SP-CAM with a different dynamical core and vertical grid by comparing our results with previous studies. The model configuration, acceleration techniques, and experiment setup are described in section 2, followed by select comparisons of the models' climate and variability in section 3. Section 4 discusses a grid imprinting bias revealed by the analysis in section 3 and quantifies its effect on the budget of total water. Discussion of solutions to the grid imprinting bias and other ongoing development are discussed in section 5.

Model Description
E3SMv1 was originally forked from the National Center for Atmospheric Research CESM Version 1 (Hurrell et al., 2013) but has undergone significant development since then. The most notable differences between the atmosphere component in E3SM and CESM are that the number of vertical levels was increased from 30 to 72 and the model top was moved from approximately 40 to 60 km. The various physics schemes are broadly similar, but many details differ in important ways (Xie et al., 2018). The dynamical core uses a spectral element method on a cubed-sphere geometry (Ronchi et al., 1996;Taylor et al., 2007). The land, ocean, sea ice components also differ significantly from CESM.
The embedded CRM in SP-E3SM is adapted from the System for Atmospheric Modeling  similar to that used in the SP-CAM. Microphysical processes are parameterized with a single moment scheme and sub-grid-scale turbulent fluxes are parameterized using a diagnostic Smagorinsky-type closure. Aerosol concentrations are prescribed with present-day values because the prognostic aerosol scheme in E3SM requires information about hydrometeor size distributions, which are not available with the single moment microphysics in the CRM. Implementation and testing of double moment microphysics is currently underway, which will allow future experiments to explore feedbacks with prognostic aerosols.
Early explorations with SP-E3SM revealed several troubling stability issues, which have been overcome through a variety of code updates and changes to the model configuration. For instance, the CRM subgrid turbulent mixing did not include a check to see if the eddy diffusivity violated any stability criterion, which periodically crashed the model. This has been corrected in the current version. Additionally, through trial and error we also discovered that the use of a shorter 20-min GCM time step and temporal surface flux smoothing greatly enhanced the stability of the model.

Maximizing Model Throughput
The computational cost of the current SP-E3SM configuration when run on CPUs is roughly 10 times larger than the conventional E3SM, which makes it difficult to produce adequate output for analysis. This cost ratio is much smaller than the factor of 10 2 − 10 3 that was reported by Randall et al. (2003) for an older version of SP-CAM, but this can be mostly explained by a large increase in the cost of E3SMv1 physics relative to these previous model comparisons. For the current simulation we have implemented several changes that can dramatically decrease the model cost without significantly altering the model behavior.
Radiative heating calculations are often the most expensive part of a GCM physics package, and a typical SP model repeats these calculations on each CRM column, making it even more expensive. In SP-E3SM and SP-CAM the GCM radiation scheme is called at the frequency of the GCM time step and takes the time average of each CRM column over the previous CRM call as input. In theory, using every CRM column for radiation allows the simulation of realistic cloud-radiative feedbacks because radiatively driven circulations can set up between the clear and cloudy regions (Cole et al., 2005), but in reality a developing cloud is typically dislocated from its radiative feedback since the radiation is called so infrequently. Thus, cloud-radiative feedbacks in SP-E3SM are more statistical than explicit as they would be in a typical CRM that calculates radiative tendencies every few minutes or less. Since the computational expense of radiative transfer can significantly decrease model throughput, a further compromise was developed where the radiative heating calculations are done on the average state of even subdivisions of the CRM domain. As the number of radiation columns becomes very small (i.e., 1 or 2) the cloud radiative effects become increasingly unrealistic, but sensitivity tests show that reasonable results are still obtained with as little as four radiation columns (not shown). Detailed analysis of these sensitivities will be explored in a future publication.
Mean state acceleration is a novel technique that reduces the number of CRM time steps per GCM time step (Jones et al., 2015), which dramatically increases the overall throughput of SP-E3SM. This technique exploits the scale separation between the relatively slow evolution of the domain-mean state of a CRM (i.e., the state exposed to the GCM) and the relatively fast evolution of the turbulent eddies simulated by the CRM. Mean state acceleration effectively boosts the domain-mean CRM evolution, thereby reducing the number of time steps needed to get a comparable response from the CRM. Tests with SP-E3SM using acceleration factors up to 5 have proven that this method does not degrade the solution quality or stability (not shown), with nearly linear speedup in the CRM integration.
By far the largest source of model speed up was obtained by porting the CRM code to run on modern GPU architectures. This required a substantial refactor of the CRM code. Some of these changes were minor, such as encapsulating subroutines within modules and making all data explicitly allocatable. Large structural changes were also needed to expose parallelism, which included pushing the loop over multiple CRMs instances down the call stack so that individual subkernels of CRM physics, like advection, can be integrated in parallel across multiple independent CRMs on the GPU. This also required promoting most CRM variables to include the new dimension over CRM instances. Another important change was a restructuring of the CRM code so that all loops were tightly nested to optimize GPU performance.
An unavoidable consequence of these changes is that when running on CPUs the model could not be guaranteed to be bit-for-bit when compared to the original SP-E3SM. This is mainly due to changes to accommodate different numbers of subcycles per time steps over multiple CRMs and for the fallout of hydrometeors. Furthermore, no two GPU runs are bit-for-bit identical from one run to the next because of atomic floating point operations that are used to do data reductions (i.e., sums or averages). For these operations multiple parallel threads will update the same memory location, but the order of these updates is not guaranteed to be the same each time, and adding floating point numbers in a different order will give a different result because floating point arithmetic is not commutative. To overcome this issue during development, we verify refactored code by comparing its difference from baseline against two baseline calculations at different compiler optimization levels (simulating machine precision changes) after one model day. The difference of the refactored code at every stage was required to be within the compiler optimization-based envelope.
The average throughput of these experiments for SP-E3SM and E3SM was 1.35 and 3.33 simulated years per wall-clock day (sypd), respectively, which includes the impact of file I/O. Unfortunately, we cannot compare these throughput estimates directly since they were done on different machines with different numbers of nodes. The SP-E3SM run on Oak Ridge Leadership Computing Facility Summit used 60 nodes with 36 tasks per node (2,160 total MPI tasks), whereas the E3SM run on National Energy Research Scientific Computing Center Cori used 41 noes with 33 tasks per node (1,350 total MPI tasks). In spite of not being a fair comparison, the throughput estimates still suggest a substantial improvement over previous timing tests of SP-E3SM without any acceleration techniques that were roughly 10 times the cost of E3SM. From development benchmarks of SP-E3SM without file I/O we observed 3 sypd and estimated the effect of using six V100 GPUs can contribute an acceleration of up of 15 times (Norman et al., 2019) compared two Power9 CPUs.

Experiment Setup
A 5-year SP-E3SM simulation was conducted using 60 nodes (i.e., 360 V100 GPUs) of the Oak Ridge Leadership Computing Facility Summit computer. The global cubed-sphere grid was set at ne30np4 (30 × 30 elements per cube face and 4 × 4 Gauss-Lobatto-Legendre (GLL) nodes per element), which roughly corresponds to an average grid spacing of 100 km. The embedded CRM was configured with a two-dimensional domain with 64 CRM columns in a north-south orientation, 1-km horizontal grid spacing, 5-sec CRM time step. Although E3SM uses 72 vertical levels, the CRM shares only the bottommost 58 vertical levels due to fact that the anelastic approximation is not valid at very low pressures, and for computational expedience. The radiation columns were limited to 4 groups of 16 CRM columns (i.e., radiative heating is calculated on time and spatial averages over 4 groups of 16 CRM columns within each GCM column), and the CRM mean state acceleration factor was set to 2. Sea surface temperatures were prescribed using present-day monthly climatological values that are temporally interpolated to give a smooth evolution (Taylor et al., 2000).
A second 5-year simulation was conducted with E3SMv1 for comparison on the Department of Energy's National Energy Research Scientific Computing Center Cori supercomputer (http://www.nersc.gov/ users/computational-systems/cori). To make this simulation consistent with SP-E3SM, the 20-min physics time step, temporal surface flux smoothing, and prescribed aerosols were used, which makes this simulation slightly different from previously published E3SMv1 results (Rasch et al., 2019), but allows a cleaner isolation of the effects of super-parameterization that is helpful for validating against historical expectations.

Observational Benchmarks
Several observational data sets are used to benchmark the model results. The observational data used here span the years 2000-2004, as this was determined to be a reasonable period for comparison due to the lack of strong El Niño-Southern Oscillation events and relatively weak interannual variability. Daily averaged outgoing longwave radiation (OLR) data are from the National Oceanic and Atmospheric Administration polar-orbiting satellites (Liebmann and Smith, 1996). Monthly precipitation estimates are from the Tropical Rainfall Measuring Mission (TRMM) 3B43 monthly data product (Huffman et al., 2007). Higher-frequency TRMM 3B42 3-hourly precipitation and the Global Precipitation Climatology Project daily precipitation (Huffman et al., 2001) data are also used as baselines to compare with the simulated daily precipitation statistics. Reanalysis data are taken from the European Centre for Medium-Range Weather Forecasts' next-generation reanalysis (Hoffmann et al., 2019). Cloud radiative effect estimates are provided by Version 4 of the Clouds and the Earth's Radiant Energy System Energy Balanced and Filled top of atmosphere data product (Loeb et al., 2018).

Climatology
The annual climatology of precipitation is shown in Figure 2 for both models and 5 years of TRMM 3B43 data. Data are plotted using shaded quadrilaterals that represent the native grid. The general pattern and magnitude of precipitation in both models is comparable, but SP-E3SM has a smaller global mean bias relative to E3SMv1 to the extent that prescribed SST configurations can be compared to observations. Figure 3 shows the precipitation differences relative to TRMM after regridding all data to a 1 • grid to highlight some regional bias reductions, such as over central Africa and western equatorial Indian Ocean. Stippling indicates points with statistically significant differences at the 95% confidence level. There are also some regions where SP-E3SM shows larger precipitation biases, such as over the North West Tropical Pacific and the Amazon, which are regions where other SP model versions have struggled Chern et al., 2016). Aside from the model differences that come from differing treatments of physical processes, there is also an unphysical grid imprinting pattern that is evident in the precipitation climatology of SP-E3SM. This is the same grid imprinting mechanism discussed by Herrington, Lauritzen, Reed, et al. (2019) but is considerably amplified by the embedded CRM. The visible signature of the imprinting is the result of precipitation being systematically higher at the corners and edges of the quadrilateral spectral elements and can also be detected in other fields such as vertical velocity. The root cause of the grid imprinting and an estimate of its consequences are discussed in further detail in section 4 The comparison of time-averaged precipitation can mask importance differences in the variability, so it is useful to examine the distribution of daily mean precipitation. Here we calculate the distribution over logarithmically spaced bins following Pendergrass and Hartmann (2014). The width of each successive bin increases by a fixed percentage of 25%, which is a larger percentage than previous studies (Kooperman et al., 2016b;Pendergrass and Hartmann, 2014) but does not affect the interpretation of the results. A total of 40 bins are used with the lowest bin centered on 0.04 mm day −1 with a width of 0.02 mm day −1 . Using this bin structure the rain frequency and amount distribution (i.e., each bin's contribution to the area-averaged mean value) were calculated using daily mean, area-weighted data from 50 • S to 50 • N. Observational data from the Global Precipitation Climatology Project and TRMM 3B42 are used for comparison of simulated daily precipitation. The resulting distributions are shown in Figure 4. Figure 4 reveals several interesting differences in the rain rate frequency and amount distributions of both models when compared to observations. E3SMv1 exhibits a pronounced bias of overly frequent rain rates from 1 to 10 mm day −1 (Figure 4a). This is a signature of a known issue with conventional convective parameterizations where they frequently produce weak intensity rainfall, especially over land (Sun et al., 2006;Stephens et al., 2010). E3SMv1 also underestimates the frequency of the highest precipitation compared to TRMM. The rain amount distribution (Figure 4b) of E3SMv1 shows a larger area under the curve compared to observations, which reflects the larger mean precipitation rate (Figure 2b). The rain amount distribution of SP-E3SM more closely resembles TRMM data, although the lowest (<3 mm day −1 ) and highest (>50 mm day −1 ) rain rates provide a larger contribution. SP-E3SM shows a curiously high frequency of very low rain rates less than 1 mm day −1 . Maps of dry day frequency, defined as the fraction of time with rain rates less than 1.0 mm day −1 indicates that these overly frequent low rain rates are widespread in SP-E3SM and are not confined to any specific region (not shown).  The more frequent occurrence of very low and very high rain rates in SP-E3SM is consistent with the "convective throttling" hypothesis of Pritchard et al. (2014) and Yu and Pritchard (2015) who postulated that a small CRM domain reduces the efficiency of convective mixing by locally trapping gravity waves and subsidence associated with cloud updrafts such that further convection is suppressed. This ultimately delays the convective response to a large-scale instability until it reaches a level that warrants a more dramatic adjustment, increasing the occurrence of precipitation extremes and the overall precipitation variance. The number of columns in the CRM used here is 64 with a grid length of 1 km. The total domain size of 64 km is less than the typical SP-CAM configuration with a 128-km domain using 32 CRM columns and 4 km grid spacing, so some degree of throttling in SP-E3SM is not surprising. Limited sensitivity tests with SP-E3SM have been analyzed to show the throttling effects diminish as the domain size is increased, but the model appears to be most strongly sensitive to the number of CRM columns (not shown). A more thorough sensitivity analysis to explore this issue is planned for future work. Figure 5 shows the annual climatology of zonally averaged zonal wind, temperature, and specific humidity for E3SM and the difference between SP-E3SM and E3SM. The large-scale circulation in SP-E3SM does not differ much from E3SM except for slight shifts in jet positions. This includes a slight poleward shift in the subtropical jets, which is consistent with the slightly warmer upper tropical troposphere in SP-E3SM based on thermal wind balance. There is also a notable increase in specific humidity in the lower tropical troposphere of SP-E3SM. A complete explanation for this increase in SP-E3SM is unclear, but it coincides with an increase in low-level cloud condensate ( Figure 6) in ways that are also evocative of the convective throttling hypothesis . The excessive boundary layer humidity could also be a symptom of overentrainment leading to unrealistically deep boundary layers, as has been witnessed before when using super-parameterization at higher than usual horizontal and vertical grid resolution (Parishani et al., 2017). Figure 6 reveals interesting effects of the initial implementation of super-parameterization on mean cloud amount and its liquid-ice phase partitioning, and Figure 7 shows associated implications for top-of-atmosphere cloud radiative effect. The use of super-parameterization is well known to strongly impact vertical distributions cloud condensate and the associated radiative forcing in ways that depend sensitively on the grid resolutions of the embedded CRM (Parishani et al., 2017) as well as their microphysical formulation and parameter settings (Parishani et al., 2018). Consistent with this expectation, it is unsurprising that super-parameterization produces dramatic effects on mean cloud liquid relative to E3SM (a 30% increase equatorward of 60 • ) and a threefold increase in cloud ice water content. Whereas this coincidentally reduces zonal mean longwave cloud radiative biases relative to E3SM (which produces deep clouds that do not trap enough longwave radiation), it also leads to a strong "bright bias" in the tropics, suggesting that SP-E3SM's deep tropical clouds are too reflective in the shortwave, by as much as 20 W m −2 . The implication is that some microphysical re-tuning of SP-E3SM is needed before it can be used operationally. Preliminary analysis suggests that targeting unconstrained parameters such as the terminal ice velocity fall speed or liquid-ice autoconversion rate could be a prudent strategy.

Variability
One of the often touted successes of SP-CAM was an increased variance associated with certain Tropical wave types, such as the Madden-Julian Oscillation (MJO), and an improvement (slowdown) in the phase speed of Moist Kelvin Waves-both modes that tend to be distorted in traditionally parameterized global models (Benedict & Randall, 2009;Benedict et al., 2014;Khairoutdinov et al., 2005;Pritchard et al., 2014;Stan & Xu, 2014). But in spite of these improvements, other tropical wave modes, such as easterly waves, tend to be too active in SP-CAM relative to observations (Hannah et al., 2017;McCrary et al., 2014).
To examine how SP-E3SM compares to these previous results, Figure 8 shows the normalized zonal spectra of top-of-atmosphere OLR from National Oceanic and Atmospheric Administration observations, E3SM, and SP-E3SM. Each raw base 10 log power estimate is normalized by red background spectra calculated by smoothing the raw power spectra with a 1-2-1 filter following Wheeler and Kiladis (1999). Positive and negative wave numbers indicate eastward and westward propagating modes, respectively. Dispersion lines are shown for Kelvin, Rossby, and interio-gravity waves (Matsuno, 1966).

Journal of Advances in Modeling Earth Systems
The most prominent feature of the observed equatorial wave spectrum is a peak around 30-to 90-day periods (0.01-0.03 day −1 ) and eastward zonal Wave Numbers 1-2, which is the signature of the MJO. E3SM shows very weak variance in this MJO band consistent with the results of Rasch et al. (2019), whereas the results from SP-E3SM indicate considerably larger MJO amplitude, albeit weaker than observations. A similar result is found for the signature of Kelvin waves, which are too weak and too fast (i.e., peaks at higher frequencies along the dispersion curve) in E3SM and notably more realistic in SP-E3SM. Equatorial Rossby waves are comparable to observations in SP-E3SM and too weak in E3SM. Variance in the 2-to 10-day easterly wave band (0.1-0.3 day −1 ) is slightly too large in SP-E3SM consistent with previous studies. Neither E3SM nor SP-E3SM shows a peak associated with ∼2-day westward inertio-gravity waves. Strong intraseasonal variance does not necessarily mean that a model is producing an MJO. The observed MJO has a particular phasing of circulation and convection and sometimes models can have intraseasonal variance that is associated with stationary or westward propagating features, so it is important to check for these properties before concluding whether a model's MJO is at all accurate. Figure 9 shows an all-season lagged regression of 20-to 100-day-filtered OLR in shading and 850-hPa zonal wind in contours. Data were meridionally averaged from 15 • S to 15 • N before filtering and regressing against an index of 20-to 100-day-filtered OLR averaged over 80 • S to 100 • E and 15 • S to 15 • N. The 20-to 100-day filtering was done using a Lanczos filter with 91 weights and all 5 years of data were used. The observed MJO exhibits an envelope of convection and reduced OLR that is led by anomalous westerlies and followed by anomalous easterlies.
E3SM does not show an eastward propagating OLR signal as in observations but instead shows a westward propagating OLR signal that is not coupled to a circulation feature (Figure 9b). On the other hand, SP-E3SM shows much more consistent phasing and amplitude of the OLR and wind anomalies, suggesting that the increased intraseasonal variance in Figure 8 can indeed be considered an MJO (Figure 9c). The lack of eastward propagation in E3SM differs from the results of Golaz et al. (2019) who found a coherent signal in fully coupled E3SM experiments. We repeated the analysis to match their methods using a lag correlation based on precipitation and recovered a similar eastward propagation (not shown). E3SM is much less coherent than SP-E3SM with either method, and the lack of an eastward propagating OLR signal in E3SM suggests that important radiative feedback mechanisms are missing, which is consistent with the results of previous studies (Andersen and Kuang, 2012;Hannah and Maloney, 2014;Kim et al., 2015).
Another notable success of some configurations of SP-CAM was the improvement in the timing of the peak in daily precipitation . Pritchard and Somerville (2009) explored this by utilizing several analysis techniques to highlight various regional improvements. Pritchard et al. (2011) further showed that SP-CAM augmented with some reconfiguration of the CRM grid could reproduce the nocturnal longwave cloud forcing peak over the central United States that is observed in boreal summer due to eastward propagating orogenic MCSs, although this did not translate to surface precipitation. However, Pritchard et al. (2011) suggested many of these promising initial signals were not robustly reproducible in alternate configurations of the host model and subsequent updates to CRM formulation. Figure 10 provides a limited analysis of the boreal summer diurnal precipitation to compare with these previous results. The color hue and intensity in Figure 10 represent the phase and amplitude of the first harmonic of the diurnal precipitation composite for TRMM, E3SM, and SP-E3SM. All data were resampled to 3-hourly before compositing. Figure 10 shows several areas where SP-E3SM exhibits a peak timing that is more consistent with TRMM data, such as over the central United States, the Bay of Bengal, the Himalayas, and most tropical and subtropical ocean regions. However, over most tropical land regions SP-E3SM does not show any obvious improvement over E3SM, unlike early versions of SP-CAM. A recent update to the deep convection scheme in E3SM, which is not included in E3SMv1 simulation presented here, was shown to significantly improve the diurnal cycle by Xie et al. (2019).

Grid Imprinting
We have shown that a previously unencountered issue with cloud super-parameterization, when applied within a modern spectral element dynamical core, is a profound amplification of "grid-imprinting" symptoms. These effects were originally noticed by Herrington, Lauritzen, Reed, et al. (2019) but are significantly exacerbated with explicit convection and therefore worth especial scrutiny.

What Causes the Grid Imprinting?
To understand how the grid imprinting pattern in Figure 2 arises, it is important to understand the grid structure in E3SM. The dynamical core is based on continuous Galerkin finite elements on a cubed-sphere grid (Taylor & Fournier, 2010). Each element of the grid is a quadrilateral that employs an irregularly spaced 4 × 4 grid of GLL quadrature nodes (np4), which are used to define the polynomial basis set. The representation of any field in the element interior is infinitely smooth with continuous derivatives. However, the colocated edge GLL points between neighboring elements are arithmetically averaged to create a globally continuous field, and derivatives are not guaranteed to be continuous across element edges. This difference in numerical behavior is associated with a larger variance of derivative based quantities, such as vertical velocity diagnosed from horizontal wind gradients in a hydrostatic model, on the element edge and corner points that can influence other quantities, such as precipitation (Herrington, Lauritzen, Reed, et al., 2019).
A simple diagnostic of this phenomenon can be created by averaging a quantity over each node type (i.e., corner, edge, or interior). If there were no grid imprinting then the mean value for each node type would be statistically indistinguishable from the others. Figure 11 shows an example of this type of diagnostic where precipitation rate is binned by GLL node area. The GLL nodes do not actually cover a specific area like cells used in finite volume methods, but a representative area can be determined using the spectral projection weights for each node. For the np4 GLL grid the corner, edge, and middle node areas fall into distinct ranges that can be leveraged to isolate each bin type. The bounds of the area bins used here are 0.0, 0.0001, 0.0003, and 0.0006 in units of steradians. Figure 11 reveals that precipitation in SP-E3SM has a strong dependence on GLL node area, such that it is systematically higher on corner nodes. E3SM also shows a dependence of precipitation on GLL node area, but the slope is much weaker. We can test the significance of these relationships by using a standard t test to determine if the difference of means between corner and middle nodes is statistically different from zero. Assuming a 95% confidence level we find that the difference is significant for SP-E3SM, but not significant for E3SM.
The systematic differences between precipitation on GLL node corners versus interiors imply the existence of an artificial grid-imposed circulation, which we believe arise from the interaction between deep convective latent heating and the divergent circulations. For instance, rising air over a large area can destabilize the environment for moist convection and convection will respond non-linearly. An outbreak of convection can also drive a divergent circulation and further large-scale upward motion. The same mechanisms do not operate in reverse, because subsidence generally suppresses convective development and a lack of convection does not directly suppress future convection. In this same sense we hypothesize that the nonlinear behavior of realistic convection simulated by the CRM in SP-E3SM can lead to an upward circulation bias as a result of interacting with the higher vertical velocity variance on element edges and corners revealed by Herrington, Lauritzen, Reed, et al. (2019). This interaction produces an upward bias on edge and corner nodes as shown in profiles of omega averaged by GLL node area bins in Figure 12. This is consistent with the systematically higher precipitation on these nodes in Figure 11. Similarly, there is systematically weaker precipitation and stronger subsidence on the element interior nodes. Similar to the significance test above for precipitation, the difference between corner and interior nodes is significantly different from zero throughout the free troposphere in both models. Pulling these aspects together paints a picture of the grid imprinting as an intra-element circulation with rising motion at the edge and sinking motion in the middle.

Isolating the Grid Imprinting
The circulation associated with the grid imprinting bias may be influencing modes of variability or the long-term climate, but to know this with any certainty, we need to quantify these effects. To do this, we develop a framework that isolates the grid imprinting signal by decomposing a given field, q, into two components, where q does not exhibit any grid imprinting. Ideally, q ′ would only contain the grid imprinting signal, but due to the nature of the imprinting pattern, it is not clear how to cleanly isolate the imprinting signal. Nonetheless, we can still provide an estimate of the grid imprinting effect by designing a filter that minimizes the grid imprinting signal in q. To do this we use a simple average of the N nearest neighboring columns considering all directions to obtain a "boxcar"-like spatial low-pass filter and then calculate q ′ as the residual.
To verify our choice of N, Figure 13 shows the result of filtering the time mean precipitation of SP-E3SM with a range of N values. The significance of the difference between the corner and interior node mean values was tested for each case using the t test approach described above in section 4.
1. An open circle denotes when the difference between corner and interior points was significant at the 95% confidence level, indicating that grid imprinting was detectable. A closed circle denotes that the difference was not significant from 0, indicating that the filter successfully removed the grid imprinting. Low values of N do not do well removing the grid imprinting since they do not span the scale of the induced circulation. Not surprisingly, choosing the number of neighbors roughly equal to the Figure 13. Average precipitation difference between corner and interior GLL node types (see Figure 11) as a function of the number of nearest neighbors used by the filter to remove the grid imprinting signal (see text).
Open circles correspond to differences that are significantly different from zero at the 95% confidence level, indicating that the grid imprinting was not removed by the filter. Similarly, filled circles correspond to differences that are not significantly different from zero, indicating that the filter successfully removed the grid imprinting.
number of nodes per element (i.e., 16) adequately removes the grid imprinting signal. The filter is not effective for N between 19 and 28 for a similar reason that the footprint roughly encompasses one and a half of these circulations. Based on this result we use N = 16 for further analysis. Figure 14 demonstrates the effects of the spatial filter on the precipitation climatology. The filtered precipitation closely resembles the total precipitation in Figure 2 without any imprinting. The residual precipitation shows the imprinting signal as well as some indication of coastal and topographic features. The presence of physical features in the residual is partly due to the scale of these features, but is also partly due to the fact that these features are naturally associated with higher precipitation and higher precipitation tends to be associated with larger imprinting.

Moisture Budget Analysis
With a framework in place for decomposing a field into components with and without grid imprinting, we apply the decomposition in (1) to the terms of the advective form of the column integrated budget of total water, where q t = q v +q l +q i is the total water mixing ratio and E and P represent the source and sink of total water through evaporation and precipitation at the surface, respectively. Angle brackets represent a mass-weighted integral through the troposphere from the surface to 50 hPa. We can then apply the filter from (1) to decompose each individual variable in (2) and  separate each term into components.
To demonstrate whether the grid imprinting is having a significant impact on the budget, it is sufficient to reduce the problem by focusing on the precipitation and vertical advection terms of (2) since and P were found to have the largest grid imprinting signal. Figures 15a and 15b show area-weighted zonal means of the filtered column total water tendencies from vertical advection (−⟨ p q t ⟩) and precipitation (−P), which account for the majority of the total, unfiltered tendency (not shown). The filtered fields provide a useful context for the imprinting tendencies shown in Figures 15c and 15d. Analysis of the q t field shows no significant imprinting in either model (not shown), which means that any signal in the −⟨ p q ′ t ⟩ term (orange line) can be interpreted as a signal isolated by the filter that is not related to the grid imprinting. Comparing this to the −⟨ ′ p q ′ t ⟩ (green line) term, we can see a similar pattern in the zonal mean, suggesting that there is no significant contribution to the grid imprinting.
Given the statistically significant imprinting in and P, it is no surprise that the −⟨ ′ p q t ⟩ and P ′ terms both show a pronounced signal in SP-E3SM (Figure 15d). The signal is largest in the tropics but is generally 10% of the magnitude of the corresponding filtered terms in Figure 15b at all latitudes. The difference of these two terms (not shown) confirms that −⟨ ′ p q t ⟩ is larger such that the net effect is a slight positive column tendency in SP-E3SM, which is significantly different from 0 at the 95% confidence level. A similar relationship is evident in E3SMv1, but smaller and less systematic in sign. This result shows that the grid imprinting has a nonnegligible effect on the climate of SP-E3SM. For E3SMv1, the net effect is arguably small enough to be considered negligible.

Conclusions
Initial results of SP-E3SM have been analyzed and compared to E3SMv1, affording a new chance to test which effects of super-parameterization are robust versus sensitive to details of its implementation. To this end, the SP-E3SM is helpfully unlike historical precursors, such as SP-CAM, in its use of finer tropospheric vertical resolution and a spectral element dynamical core with associated grid eccentricities. SP-E3SM is also unique in that it incorporates techniques to increase model throughput in order to overcome the significant cost of the embedded CRM. These techniques include GPU hardware acceleration, CRM mean-state acceleration, and a reduction in the number of columns used for radiation.
In short, the main effects of super-parameterization on the tropical wave spectrum are consistent with expectations from past experience-such as the emergence of a reasonably strong MJO and the slowdown of equatorial Kelvin waves to more realistic phase speeds. However, the Boreal summer diurnal cycle shows modest improvements in SP-E3SM that do not live up to the successes of early SP-CAM studies. This includes the representation of propagating nocturnal convective systems over the central United States, which are not as robust as those found by Pritchard et al. (2011).
SP-E3SM also exhibits a previously unencountered challenge unique to the use of super-parameterization when coupled to modern spectral element dynamical cores. A surprisingly large grid imprinting pattern was identified in SP-E3SM for fields such as precipitation and vertical velocity. The grid imprinting is also detectable in the vertical velocity of E3SMv1, but the amplitude is weaker and the signal is not detectable in precipitation. A limited analysis of the column total water budget with a spatial filter designed to isolate the grid imprinting signal shows that the net effect of imprinting is on the order of 10% for vertical advection and precipitation. The column integrated vertical advection and precipitation tend to be large with a lot of mutual cancelation (Hannah et al., 2016), so it is unclear whether the small contribution from the grid imprinting is actually changing the model behavior all that much. But given that this unphysical signal is not negligible, there is sufficient motivation to develop a solution that removes it.

Discussion
Addressing the grid imprinting is a top development priority. The spectral element method used for dynamics calculations in E3SM requires a fourth-order hyperviscosity operator for stability, which effectively smooths the model state. Our first attempt to address the grid imprinting was to enhance this hyperviscosity to further smooth the state, but unfortunately, this increased the computational cost of the model with unacceptably modest reductions in the grid imprinting (not shown), so this does not offer an avenue for adequately addressing the problem.
The preferred way to address the grid imprinting is to completely remove or mask the inhomogeneity that is inherent between the nodes of the GLL grid. Herrington, Lauritzen, Reed, et al. (2019) and Herrington, Lauritzen, Taylor, et al. (2019) did this by moving the physics calculations to a quasi-equal-area finite volume grid. We have implemented a scheme inspired by their work that uses a 2 × 2 grid of finite volume cells for each spectral element and an interpolation method that does not require information from neighboring elements. This method is undergoing further development and testing to ensure that the mapping between these grids does not create new issues, but preliminary results are very encouraging.
With a throughput over 1 sypd we consider this model viable for decadal scale experiments, although including the higher microphysical moments needed to simulate realistic cloud and aerosol feedbacks will significantly reduce throughput. Ongoing work seeks to increase model throughput to 5 sypd by exploiting and expanding the acceleration techniques that make SP-E3SM unique. This includes developing a GPU accelerated double moment microphysics scheme such that prognostic aerosols can interact with the CRMs' explicit convection through the use of the explicit-cloud-parameterized-pollutants scheme of Gustafson et al. (2008).
A test that extends the concept of super-parameterization to the land component is being explored and is expected to produce improved land-atmosphere interactions. Another line of development aims to treat convective momentum transport with a 2-D CRM using the scheme developed by Tulich (2015), which transports the zonal and meridional momentum components as scalars along with an approximation of the pressure gradient force induced by rising thermals. Ultimately, we plan to pull this work together in the context of unprecedented national GPU supercomputing resources in order to push the boundaries of the represented scales in the model to explicitly resolve boundary layer turbulence and finer synoptic features that will inform future studies of regional climate impacts. used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. This research was supported as part of the Energy Exascale Earth System Model (E3SM) project, funded by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract DE-AC02-05CH11231. The E3SM project, code, simulation configurations, model output, and tools to work with the output are described at the website (https://e3sm. org). Instructions on how to get started running E3SM are available at the website (https://e3sm.org/model/ running-e3sm/e3sm-quick-start). All code for E3SM may be accessed on the GitHub repository (https://github. com/E3SM-Project/E3SM). All code for SP-E3SM may be accessed from a separate GitHub repository (https:// github.com/E3SM-Project/ E3SM-ECP). Model output data are accessible directly from the DOE's National Energy Research Scientific Computing Center (NERSC) or through the DOE Earth System Grid Federation (https://esgf-node.llnl.gov/ projects/e3sm-ecp).