A New Volcanic Stratospheric Sulfate Aerosol Forcing Emulator (EVA_H): Comparison With Interactive Stratospheric Aerosol Models

Idealized models or emulators of volcanic aerosol forcing have been widely used to reconstruct the spatiotemporal evolution of past volcanic forcing. However, existing models, including the most recently developed Easy Volcanic Aerosol (EVA; Toohey et al., doi: 10.5194/gmd‐2016‐83), (i) do not account for the height of injection of volcanic SO 2 ; (ii) prescribe a vertical structure for the forcing; and (iii) are often calibrated against a single eruption. We present a new idealized model, EVA_H, that addresses these limitations. Compared to EVA, EVA_H makes predictions of the global mean stratospheric aerosol optical depth that are (i) similar for the 1979–1998 period characterized by the large and high‐altitude tropical SO 2 injections of El Chichón (1982) and Mount Pinatubo (1991); (ii) significantly improved for the 1998–2015 period characterized by smaller eruptions with a large variety of injection latitudes and heights. Compared to EVA, the sensitivity of volcanic forcing to injection latitude and height in EVA_H is much more consistent with results from climate models that include interactive aerosol chemistry and microphysics, even though EVA_H remains less sensitive to eruption latitude than the latter models. We apply EVA_H to investigate potential biases and uncertainties in EVA‐based volcanic forcing data sets from phase 6 of the Coupled Model Intercomparison Project (CMIP6). EVA and EVA_H forcing reconstructions do not significantly differ for tropical high‐altitude volcanic injections. However, for high‐latitude or low‐altitude injections, our reconstructed forcing is significantly lower. This suggests that volcanic forcing in CMIP6 last millenium experiments may be overestimated for such eruptions.


Introduction
Stratospheric volcanic sulfate aerosol radiative forcing (volcanic forcing hereafter) is a major driver of Earth's climate variability. Volcanic eruptions can inject sulfur dioxide (SO 2 ) into the stratosphere and form long-lived (1-3 years) sulfate aerosol that modify Earth's radiative balance, causing a net cooling at the surface and affecting major modes of climate variability (e.g., Kremser et al., 2016;Robock, 2000;Timmreck, 2012). Recently, it has emerged that even relatively small eruptions (injecting less than around 1 teragram (Tg) of SO 2 ) of the early 21st century exert small but significant radiative forcing (e.g., Schmidt et al., 2018) and have a statistically discernible cooling effect on sea surface and tropospheric temperatures (Santer et al., 2015).
Models are key tools to reconstruct past volcanic impacts on climate and societies, as well as to predict the impacts of future volcanic eruptions. Interactive stratospheric aerosol models (e.g., Timmreck et al., 2018) predict the full life cycle of volcanic sulfate aerosol, and the associated radiative and climate response following an injection of volcanic SO 2 into the atmosphere. However, there is a large spread among the forcing predicted by these models for a specified volcanic SO 2 injection (e.g., Zanchettin et al., 2016). This intermodel uncertainty adds to intramodel uncertainties as well as uncertainties related to constraining eruption source parameters, for example, the mass of SO 2 and eruption latitude reconstructed from ice cores when investigating the impacts of past eruptions (Marshall et al., 2018;Toohey & Sigl, 2017). Given the computational cost of interactive stratospheric aerosol models, exploring how the propagation of model and source parameter uncertainties affect the assessment of the climate response to a volcanic eruption is challenging and requires significant efforts such as model intercomparison exercises (e.g., Timmreck et al., 2018;Zanchettin et al., 2016).
1. The vertical structure of the forcing produced by the model does not depend on characteristics of volcanic sulfur injections, in particular, plume height. 2. It is calibrated using data from the 1991 Mount Pinatubo eruption. Given the sensitivity of the relationship between the erupted sulfur mass and the subsequent volcanic forcing on eruption source parameters (such as the latitude or altitude of injection, e.g., Marshall et al., 2019;Toohey et al., 2019), one should be careful when applying this model to other eruptions. In particular, most eruptions whose plume reaches the stratosphere inject order(s) of magnitude less sulfur than Mount Pinatubo, with injections between 10 and 20 km altitude (instead of ∼20-25 km for Mount Pinatubo), and commonly occur in high latitudes instead of the tropics (Carn et al., 2016).
Consequently, the major objective of this study is to extend the EVA methodology to develop EVA_H (with "H" standing for height), an idealized model of volcanic aerosol forcing: (i) accounting for plume height to determine the forcing resulting from a sulfur injection; (ii) predicting the vertical structure of aerosol extinction; and (iii) calibrated against eruptions spanning a large range of mass of erupted sulfur, plume height, and latitude. We compare outputs of EVA_H to EVA and to interactive stratospheric aerosol models. We also provide example applications to improve reconstructions of past volcanic forcing and provide fast response to present/future eruptions.

Data 2.1.1. Primary Data Sets Used to Calibrate the Model
Our strategy is to calibrate the model so that its output best reproduces observations of atmospheric optical properties given an input inventory of volcanic sulfur emission estimates. For optical properties, we use the Global Space-based Stratospheric Aerosol Climatology (GloSSAC, version 1.1; Thomason et al., 2018), which is the National Aeronautics and Space Administration's latest reconstruction of extinction from satellite data. It contains latitude-and altitude-dependent extinction at 525 nm from 1979 to 2016. Typical uncertainties on extinction coefficients are about 10% (Thomason et al., 2018), although uncertainties associated with the processing and combination of the various observational data sets used in GloSSAC remain to be precisely quantified. In addition, whereas 1984-2005 climatological tropopause height from the Modern-Era Retrospective Analysis for Research and Applications (MERRA; Rienecker et al., 2011) are provided with the GloSSAC data set, we use time-varying tropopause height from the NCEP/NCAR reanalysis (Kalnay et al., 1996). This enables us to account for trends related to climate change (Santer et al., 2003) and the large variability of tropopause height at high latitudes when calculating stratospheric aerosol optical depths (see Figure S1 in the supporting information for a comparison of GloSSAC versions and tropopause height treatment). For the volcanic sulfur emission inventory, we use data reported by Carn et al. (2016) who report the date, location, mass of SO 2 , and altitude of volcanic emissions over 1978-2015. Typical uncertainties for the total mass of SO 2 injected by an eruption range from 20% to up to 50-100% (Carn et al., 2016), while typical uncertainties on the injection height are up to 20% (e.g., Aubry et al., 2017;Carboni et al., 2016).

Model Structure
The new model, EVA_H, maintains the overall approach of EVA , that is, • The global mean SAOD at 525 nm and effective radius are scaled from the total mass of SO 4 (sections 2.4 and 3.4). • Transport equations govern the production, transfer, and loss of SO 4 among the model grid boxes (section 2.3). • The latitudinal and vertical distribution of extinction is produced using the distribution of SO 4 mass in the model boxes and 2-D shape functions (section 3.3). • Wavelength-dependent extinction, single scattering albedo, and scattering asymmetry factor are calculated from the effective radius and extinction at 525 nm using Mie theory (section 3.4).
EVA separates the stratosphere into three latitudinal bands (southern extratropics, tropics, and northern extratropics) which is consistent with respect to the structure of the Brewer-Dobson circulation (e.g., Butchart, 2014;Neu & Plumb, 1999;Plumb, 1996). To add a vertical dimension while maintaining the simplified approach of a box model, we use three vertical bands: • The lowermost extratropical stratosphere (≤16 km), where cross-tropopause mixing and transport at midlatitudes is an important control on the transport of aerosols between the stratosphere and the troposphere. • The lower stratosphere (16-20 km) where aerosols in the tropics may be transported directly into the lowermost extratropical stratosphere due to the latitudinal dependence of isolines of potential temperature. • The middle stratosphere (≥20 km).
The proposed structure including three latitudinal and three vertical bands results in an "8-box" model ( Figure 1) if we keep only stratospheric boxes and exclude the uppermost tropical troposphere. To be consistent with the grid of the GloSSAC data, against which the model will be calibrated, the top of the model is at 39.5 km altitude, and the tropical boxes comprise latitudes ≤22.5 • .

Evolution of Sulfate Mass in the Model Boxes
The equations governing the evolution of the mass of sulfur in a model box will follow the approach of EVA, adapted to the new two-dimensional structure of EVA_H. The calibration of all parameters involved in the equations presented throughout section 2 is detailed in section 3. We assume that the evolution of the mass of SO 2 in a box i (see Figure 1 for box indices) M i SO 2 is governed by the equation: where S i is a source term, and i prod is an effective time scale for the conversion of SO 2 into sulfate aerosols. Accordingly, the production of SO 4 in a box i will be of the form: where the mass of SO 2 in a box i is decomposed into the mass from volcanic injections M i vSO 2 and a flux B i , assumed constant, corresponding to background nonvolcanic sulfur injections.
We assume that two-way mixing can occur between two adjacent boxes belonging to the same vertical band and/or between the lower tropical stratosphere (Box 5) and the lowermost extratropical stratosphere (Boxes 7 and 8). The two-way mixing flux from a box i to a box is proportional to the SO 4 mass difference between the boxes: where i mix is a mixing time scale. As for two-way mixing, we assume that one-way mixing, that is, residual transport, can happen between two adjacent boxes belonging to the same vertical band and/or between Box 5 and Boxes 7 and 8. The one-way mixing (OWM) flux from a box i to a box is proportional to the mass of SO 4 in box i: where i owm is a one-way mixing time scale. In EVA, one-way mixing terms are used to represent the residual Brewer-Dobson circulation from the tropics to the extratropics not accounted for in the two-way mixing terms.
We assume that the loss of aerosol in box i is proportional to the mass of SO 4 in the same box: where i loss is a loss time scale. In EVA_H, we assume that the SO 4 loss flux from a box that is not in contact with the tropopause (i.e., all boxes except Boxes 5, 7, and 8) corresponds to a positive flux for the box located directly below. For example, the loss term in Box 1, − in one of the eight boxes i will then be dM i where the production term PROD is governed by equation (2), two-way and one-way mixing term(s) MIX-ING and OWM are governed by Equations (3) and 4, respectively, and the loss term LOSS is governed by equation (5) and may include positive terms related to the loss of aerosols in the box located above box i (e.g., for Box 4, cf. fluxes illustrated on Figure 1). Note that time scales loss , mix , and owm are not physical time scales and depend on the geometry (e.g., thickness) of the eight boxes of the model.

10.1029/2019JD031303
The final configuration of the model depends on the following choices: 1. Between which boxes to include two-way and one-way mixing terms 2. The dependence of the time scales prod , loss , mix , and owm on latitude, altitude, and season.
We further discuss these choices in section 3.2.

Scaling for the Stratospheric Aerosol Optical Depth
The calibration of the model requires linking the model primary output (i.e., the mass of sulfate in each box) to optical properties that can be directly observed. Following previous studies (e.g., Crowley & Unterman, 2013;Gao et al., 2008;Toohey et al., 2016), we assume that the relationship between the global mean SAOD at 525 nm (SAOD 525 ) and the total stratospheric SO 4 burden M SO4 is adequately represented by a power law scaling: where is an exponent and A is a prefactor. In contrast with previous studies (e.g., Gao et al., 2008, we use observations from a large number of eruptions (19 eruptions with sulfur mass ranging from ∼10 −2 to 10 1 Tg S, latitude from 41 • S to 50 • N and height from 12 to 25 km) and simulations from interactive stratospheric aerosol models to constrain the exponent ( ) of this scaling: 1. Limited direct observational measurements of the stratospheric SO 4 burden exist. Consequently, we identified all SAOD peaks in the 1979-2016 GloSSAC SAOD time series, smoothed over 6 months to avoid peaks related to nonvolcanic signals. We then defined corresponding SAOD increases by removing the minimum SAOD value between two peaks from the second peak value. We defined the associated SO 2 loading as the mass of sulfur-taken from Carn et al. (2016)-injected by eruptions which occurred no earlier than 1 month before the minimum SAOD value and no later than 1 month before the SAOD peak. The chosen 1-month lags excludes eruptions for which most SO 2 would likely not been transformed into sulfate aerosols . We filtered eruptions for which H * = SO 2 inj. height tropopause height ≤ 1. Last, we fit SAOD increases as a function of corresponding stratospheric SO 2 injections using a power law (Figure 2.a). We find an exponent of 1 ± 0.2. 2. We use the 1979-2015 experiments run with the Community Earth System Model version 1 with a prognostic aerosol scheme (Whole Atmosphere Community Climate Model, WACCM) using the Neely and Schmidt (2016) volcanic sulfur emission inventory (Mills et al., 2016;Schmidt et al., 2018), with adjusted mass of 10 Tg of SO 2 (instead of 18 Tg in Neely & Schmidt, 2016) and height of 18-20 km (instead of 23-27 km in Neely & Schmidt, 2016) for the 1991 eruption of Mount Pinatubo. We fit the monthly mean values of global mean SAOD anomaly (i.e., the difference between runs with and without volcanic emissions) at 550 nm to the stratospheric SO 4 burden anomaly using a power law fit and find an exponent of 1.01 ± 0.01 (Figure 2b). 3. We use 30 experiments from the MAECHAM5-HAM interactive stratospheric aerosol model, where 8.5 Tg S were injected at six different sets of altitudes and latitudes (Toohey et al., 2019). We fit the monthly mean values of global mean SAOD anomalies at 550 nm to the total stratospheric SO 4 burden anomaly using a power law fit and find an exponent of 0.84 ± 0.03 (Figure 2c). 4. We use 41 experiments from the UM-UKCA interactive stratospheric aerosol model, where injection mass, altitude and latitude were varied between 5-50 Tg S, 15-25 km, and 80 • S to 80 • N, respectively (Marshall et al., 2019). We fit the monthly mean values of global mean SAOD anomalies at 550 nm to the total stratospheric SO 4 burden anomaly for burden ≤10 Tg S using a power law fit and find an exponent of 0.89 ± 0.02 (Figure 2d).
In agreement with scaling used in previous studies (e.g., Crowley & Unterman, 2013;Toohey et al., 2016), observations and the WACCM run with historical volcanic emission (Figures 2a and 2b) suggest that a linear scaling between the stratospheric sulfur burden and the global mean SAOD holds for eruptions of the 1979-2015 period, that is, for eruptions injecting on the order of or less SO 2 than the 1991 eruption of Mount Pinatubo (≃9 Tg S). However, the observational constraint on should be considered carefully as it was not derived from an observed relationship between monthly SAOD 525 and M SO4 . It is also very sensitive to the set of eruptions included, with, for example, a value of 2.3 ± 0.8 when excluding the 1991 eruption of Mount Pinatubo. The two sets of interactive stratospheric aerosol model simulations used here suggest that the value of the exponent to be used in equation (7) should be close to ∼0.86 for stratospheric sulfate burdens up to 10 Tg S (Figure 2c and 2d). Given the proximity of this value to 1 and for simplicity, we will use a linear scaling to calibrate all model parameters in section 3 -including the prefactor A in equation (7)-using 1979-2015 observational data sets (Carn et al., 2016;Thomason et al., 2018). However, our analysis shows that the assumption of a linear scaling between the mass of sulfate and SAOD should be considered with caution, even for relatively small stratospheric burdens (on the order of those following the Mount Pinatubo 1991 eruption).
For large SO 2 injections, previous studies have suggested that the relationship between the sulfate burden and the SAOD follows a 2/3 power law (Crowley & Unterman, 2013;Metzner et al., 2014;Timmreck et al., 2010;Toohey et al., 2016), although the critical mass above which a nonlinear scaling should apply has been suggested to be as low as 2.5 Tg S (Metzner et al., 2014) and as high as 30 Tg S . Here we take advantage of the recent simulations of Marshall et al. (2019), with sulfur burdens of up to 50 Tg S and a large variety of eruption source parameters, to revisit these results. We perform a linear fit of SAOD versus sulfate burden for burdens ≤5 Tg S, and a 2/3 power law fit for burdens ≥20 Tg S. These fits are shown on Figure 2d and intersect for a burden of 10 Tg S, which we choose as the threshold sulfate burden M * above which to apply a 2/3 scaling. This estimate falls in the large range of thresholds previously estimated. Note that when fitting SAOD to sulfate burdens larger than 20 Tg S using a power law fit without a prescribed exponent, we find an exponent of 0.72 ± 0.12 which is compatible although a bit larger than the usually suggested 2/3 power law. The final scaling we adopt for SAOD at 525 nm in EVA_H is thus with M * = 10 Tg S and where the prefactor A × (M * ) 1∕3 for the 2/3 scaling guarantees the continuity at M SO4 = M * .

Volcanic SO 2 Injection in the Model
The Carn et al. (2016) data set provides the latitude, date, estimated mass of SO 2 , and estimated height for each reported volcanic SO 2 injection into the atmosphere. A simple method to include SO 2 in the eight-box model is to inject the entire mass into the box which contains the point defined by the eruption latitude and estimated injection height. However, in the absence of a transport equation for SO 2 in the model, a more realistic approach may be to distribute the SO 2 spatially instead of injecting 100% of the mass in a single box. To determine the spatial distribution of injected SO 2 , we investigated patterns of extinction increase in GloSSAC for the first 5 months following eruptions from the Carn et al. (2016) data set (see Supporting Information S1 and Figure S2). We found that the latitudinal and vertical positions of regions of initial extinction increase are in good agreement with the injection latitude and altitude reported in Carn et al. (2016) (Figure S3) and have average extents of 1.2 km and 7 • in height and latitude, respectively ( Figure S4). Accordingly, in EVA_H, we distribute the SO 2 mass injection among the boxes using Gaussian distributions centered on latitude and altitude estimates from Carn et al. (2016), with widths of 7 • and 1.2 km.

Overview of the Calibration Process
The linear scaling for the global mean SAOD for eruptions injecting less than 10 Tg S, in particular all eruptions of the 1979-2015 period, can be written where A is the same prefactor as in equation (8), AOD i is the spatially averaged AOD in a box i (i.e., extinction integrated from the lower to the upper vertical boundary of the box), and w i are weights calculated from the latitudinal extent of each box. For the 1979-2015 calibration period, each box thus follows the scaling in a box i of the model at any time t is given by: where wAOD i obs (t) are the observed time series calculated from GloSSAC (Thomason et al., 2018). E is a root-mean-square error (RMSE) on AOD calculated over all time steps and all boxes. Figure 3 shows the corresponding SO 2 inputs and wAOD obs time series in the eight model boxes. To calculate E, we run the model without a nonvolcanic background injection (terms B i in equation (1)) and compare its output with wAOD obs time series from which we substract a nonvolcanic background (black dashed lines on Figure 3). We define this background as the 1999-2002 average because this period has the lowest stratospheric volcanic SO 2 injections in the post-Pinatubo era (e.g., Carn et al., 2016;Schmidt et al., 2018). We come back to the inclusion and calibration of background injections in the model in section 3.2.
Our calibration problem is nonlinear and involves between 4 and 54 parameters depending on the choices made for the model configuration, such as the latitudinal and vertical dependence of loss time scales, which will be discussed in section 3.2. Given a specific model configuration, we use a "genetic algorithm" to find a set of optimal parameter values minimizing the error metric E (equation (10)). Genetic algorithms use strategies inspired from natural selection processes to efficiently solve nonlinear optimization problems with a large number of parameters (Goldberg, 1989). Text S4 provides a detailed description of the algorithm used and tests conducted using synthetic AOD data sets. Note.The contender model becomes the new reference model when the probability p cont<re that the error E of the contender model, which is smaller than that of the reference model, is larger than 0.95 and that the calibration process leads to physically consistent parameter values (e.g., in terms of range or ranking). Significant probabilities are reported in bold.

Optimal Model Configuration
Section 3.1 provides an overview of the method employed to calibrate any configuration of the idealized aerosol model described in section 2. We now have to choose a procedure for deciding which model "configuration" to use, that is, (i) the dependence of the time scales prod , loss , mix , and owm on latitude, altitude, and season; (ii) between which boxes to include two-way and one-way mixing fluxes. Configurations of increasing complexity will include more parameters and better fit the data. However, we have to decide whether improved fitness is significant given uncertainties in SO 2 and extinction observations. Figure 4 sketches the methodology used to determine whether a relatively complex "contender" model configuration performs significantly better than a relatively simple "reference" model configuration. Differences between a contender and reference model are kept minimal, for example, the only difference may be that all boxes have the same loss time scale in the reference model while loss time scale depends on altitude in the contender model, resulting in three loss time scales instead of one. First, we use the Carn et al. (2016) and GloSSAC data sets to calibrate the contender model using a genetic algorithm (Text S2). To test whether the calibrated contender model is significantly better than the reference model, we create 100 sets of perturbed model input and output data by randomly perturbating SO 2 injection mass and height (Carn et al., 2016) and weighted AOD time series in the eight boxes (Thomason et al., 2018) by up to 30%, 20%, and 10%, respectively. The error E of both the contender and reference model are calculated for each perturbed data set and we then obtain the probability p cont<ref that the contender model is better than the reference model given uncertainties in observational data used to calibrate the model. We use a significance level of 95%, so that if p cont<ref ⩾ 0.95, the contender model becomes the new reference model. If p cont<ref < 0.95, we maintain the previous reference model and choose a new contender model by trying a different incremental change in the model configuration. The 95% confidence level chosen is somewhat arbitrary because we would need to better constrain the level of uncertainty in observational data and/or to use uncertainties specific to each eruption to rigorously determine a confidence level. However, it provides us with a threshold to discriminate model configurations that we estimate to be significantly fitter.
In our initial reference model (model "0"), there are no one-way mixing fluxes (equation (4)), two-way mixing fluxes (equation (3)) are horizontal only, and all model parameters are independent of latitude, altitude, and season. The resulting model configuration has four parameters (A, prod , loss , and mix ). Table 1 summa- rizes the result of our iterative process to determine an optimal model configuration ( Figure 4). For example, the first row indicates that in the first contender model (model 1), loss time scales loss depend on altitude. Model 1 outperformed model 0 for 63% of the perturbed input/output data sets, which is not significant at the 95% level. The reference model has thus not been changed before testing a new contender model, as reflected in the second row.
The only changes that we retain relative to our initial model configuration are to make loss time scales dependent on latitude and altitude. Making the production time scales dependent on altitude or latitude significantly improved the model error, but the calibration results in ≥18 months production time scales in model boxes that do not receive significant injections from the 1982 El Chichòn and 1991 Pinatubo eruptions. When fitting global mean SAOD time series following individual eruptions using a one-box model (not shown), production time scales for the 1982 El Chichòn and 1991 Pinatubo eruptions are 6-9 months whereas production time scales for six eruptions injecting smaller SO 2 mass at lower altitude (such as Sarychev Peak 2009 and Nabro 2011) range from 0.5-2 months. Production time scales of 18 months are thus unrealistic, in particular, for the lower boxes of the model. In fact, such large production time scales result in an extended aerosol production in other boxes, meaning that a minimum in our error metric is achieved by fitting AOD variability associated with the 21st century eruptions by a relatively constant term, which is not physically satisfying. Consequently, we maintain a constant production time scale in the model and further discuss this choice in section 4.
Most other tested changes, such as adding one-way mixing terms or making mixing time scales seasonally dependent, did not result in significant error improvement. Following our calibration process, we thus do not retain some of the parameterizations implemented in EVA  that are physically consistent and result in good predictions of the spatiotemporal evolution of SAOD following the 1991 Pinatubo eruption (e.g., seasonal mixing, one-way mixing). However, the model scripts provided with this paper are not restricted to our choice of configuration but enable the user to choose latitudinal, vertical, and seasonal dependence for all model time scales (see Text S4). The values of the SAOD-sulfate mass scaling factor (A = 0.0187), the production time scale (7.8 months) and mixing time scales (10.7 months) are moderately but significantly different from the values used in EVA (0.036, 6 and 15 months, respectively). The production time scale corresponds to the effective production time scale of SO 2 into radiatively active SO 4 aerosol and should not be interpreted as the decay time scale of SO 2 , which is on the order of days to weeks (e.g., Carn et al., 2016). The loss time scales span an important range (2.3-14.5 months), with most of them being much lower that the value used in EVA (≃11 months), which is expected as EVA_H comprises three vertical layers. For Boxes 1-3 and 4-6, extratropical loss time scales are significantly smaller than the tropical ones, which is consistent with the tropical pipe model (Neu & Plumb, 1999;Plumb, 1996). Most model parameters are well constrained, with relative uncertainties on  Table  S1 shows that when calibrating the model using different periods (e.g., 1990-2015 or 1990-1997), obtained parameter values are in close agreement with those obtained in Table 2. Using the full 1979-2015 period results in better constrained parameter values, in particular, for the SAOD-sulfate mass scaling factor and the production time scale. We also repeated the calibration process with a mass of 10 Tg of SO 2 for the 1991 Mount Pinatubo (Table S1) instead of 18 Tg of SO 2 in Carn et al. (2016). Some authors (e.g., Mills et al., 2016) have argued that a smaller mass is representative of the SO 2 not scavenged by ash and ice on the basis of the reanalysis of Pinatubo SO 2 evolution by Guo et al. (2004). The resulting parameter values are not significantly different from the one shown in Table 2, although values for the SAOD-sulfate mass scaling factor (A, equation (8)) and production time scale ( prod ) respectively lie in the upper and lower range of those exhibited in Table 2.
Last, we find background sulfate injection terms B i (equation (2)) by fitting a model run without volcanic injections to the background AOD in each box defined as 1999-2002 average. Corresponding background injections and their uncertainties are reported in Table S2. The total injection in the model is 0.28 Tg S/yr, a bit larger but not significantly different from the value of 0.2 Tg S/yr used in EVA. Note that version 2.0 of GloSSAC became available after formal acceptance of our study whereas we use version 1.1. The global mean SAOD time series of both versions are shown in Figure S1, and we verified that their differences: i) do not affect the fact that spatially-varying production timescales do not significantly improve the model (Table 1); and ii) do not result in significant differences in the calibrated model parameter values (Table 2  and Table S1).
With all key model parameters calibrated, Figure 5 shows AOD predictions (area-weighted) for each box in comparison to GloSSAC. The Northern Hemisphere lowermost stratosphere (Box 8) accounts for over 25% of the model error E, with an important overestimation of AOD related to post-2005 eruptions and underestimation of AOD related to the 1982 El Chichón eruption. Similar errors are observed for the Northern Hemisphere lower stratosphere (Box 6). In general, the AOD responses associated with the Kasatochi 2008, the Sarychev Peak 2009 and the Nabro 2011 eruptions are slightly overestimated by the model. However, the observed AOD mostly falls within the model prediction confidence interval, whose magnitude is largely driven by uncertainties in injected SO 2 altitude and mass. The model seems to overestimate typical rise and decay time scales of AOD peaks associated with lower stratospheric injections despite the latitude and altitude dependence of the loss time scales.

Shape Functions for Prediction of Latitudinally and Vertically Dependent Properties
In EVA, Gaussian shape functions (in latitude and height) are used to produce latitude-altitude distribution of extinction given mass of aerosols in the three latitudinal boxes. Here, we use a multilinear regression approach to produce extinction distributions from observations. At each latitude and altitude z, we perform a multilinear regression where the extinction time series EXT 525 ( , z, t) from GloSSAC is the dependent variable and the weighted AOD time series predicted by the model in the eight boxes wAOD i (t) (using the Carn et al. (2016) SO 2 inventory) are the independent variables: where i = 1...8 is the box index, ( , z, t) is the error, and c i ( , z) are the regression coefficients of box i for latitude and altitude z. We impose that coefficients c i are positive and that their upper limit decay exponentially with distance from the edge of their associated box i. As the global mean SAOD is equal to the sum of wAOD in the eight boxes as well as to the global mean of the vertical integral of extinction, we also normalize each shape function c i by its global mean vertical integral. Additional procedures related to smoothing and extension to high latitudes are described in Text S3. The final shape functions of EVA_H are shown in Figure 6. Last, the global mean vertical integral of extinction equals the global mean SAOD and must follow our chosen scaling for SAOD (equation (8)). Consequently, for sulfate burdens larger than M * , we normalize each shape function by

Effective Radius and Wavelength-Dependent Optical Properties
Climate models without an interactive stratospheric aerosol scheme generally require wavelength-dependent extinction, single scattering albedo and scattering asymmetry factor to simulate the climate response to volcanic eruptions. We adopt the same approach as EVA to produce these parameters . We assume that the aerosol size distribution is log-normal with a single mode and a width parameter = 1.2. We then use look-up tables calculated from Mie theory to calculate wavelength-dependent optical properties from the extinction at 525 nm and the effective radius of the aerosol size distribution.
To calculate the global mean effective radius (R eff ),  used the following scaling: with = 1∕3, R = 0.78 μm (Tg S) −1∕3 , and setting a minimum effective radius of 0.2 μm. We first test whether an exponent of 1/3 seems appropriate using observations and derived products from GloSSAC and simulations from the three interactive stratospheric aerosol model previously described (section 2.4). In GloSSAC, extinction at 525 nm and 1,020 nm are the only variables issued from direct observations, while the effective radius is derived from these variables using methods described by Thomason et al. (2008). Consequently, instead of investigating the relationship between the effective radius and the mass of sulfate, we look at the relationship between the SAOD at 525 nm and the effective radius (Figure 7), which follows a scaling of the type R eff = r 1 ×SAOD given our assumed linear scaling between SAOD and M SO4 for eruptions injecting less than 10 Tg S (section 2.4). When fitting the global mean effective radius (mass weighted or surface area density weighted) to SAOD using a power law, both GloSSAC and the simulations from UM-UKCA suggest that a 1/3 scaling is adequate, although simulations from WACCM and MAECHAM suggest values of with ≃ 0.2 instead of 1/3. We thus maintain a value of = 1∕3 as in EVA. We set a minimum value of effective radius of 0.1 μm which seems broadly consistent with GloSSAC and simulations from interactive stratospheric aerosol models (Figure 7). Fitting the effective radius to SAOD using a 1/3 power law, values of r 1 range from 0.47 to 0.93 (for GloSSAC), corresponding to values of R (equation (12)) ranging from 0.17 to 0.26 μm (Tg S) −1∕3 using the relationship R = r 1 × A 1∕3 and our estimate of 0.0187 (Tg S) −1 for A (Table 1). Such values of R are 3-4 times lower than the value of 0.78 used in EVA. However, Figure S9 shows that EVA_H captures best the evolution of the global mean SAOD at 1020 nm following the 1991 Mount Pinatubo eruption when using a value of 0.26 (close to the value derived from GloSSAC effective radius and SAOD at 525 nm). We thus use a value of R = 0.25 μm (Tg S) −1∕3 in EVA_H. The local effective radius is then calculated so that (i) its mass-weighted global mean matches equation (12); and (ii) it follows the same spatial distribution as the aerosol mass, raised to the power 1/3.

Comparison With EVA and WACCM for the Historical Period
In this subsection, we compare the predictions of EVA_H for the historical period  with those made by • EVA, the idealized model on which EVA_H builds, but which does not account for SO 2 injection height, has a prescribed vertical structure, and is calibrated against the 1991 Mount Pinatubo eruption only. • WACCM, which includes a prognostic stratospheric aerosol scheme (Mills et Neely and Schmidt (2016) (VolcDB2, 1990-2014, Carn et al. (2016) (VolcDB3, 1979, and the subset of the strongest eight eruptions over 1998-2012 with parameters (SO 2 mass and height) averaged from all other databases used in Timmreck et al. (2018). Figure 8 shows the global mean SAOD time series for GloSSAC, EVA_H, EVA, and WACCM. In panel (a), idealized models are run with the Carn et al. (2016) volcanic SO 2 emissions inventory, against which we calibrated EVA_H. In panel (b), models are run using data from Neely and Schmidt (2016). WACCM uses an adjusted SO 2 mass for the 1991 Pinatubo eruption that has been argued to be representative of the mass of SO 2 not affected by ash and ice scavenging, and results in a good agreement between the model and observations (Mills et al., 2016;Schmidt et al., 2018). For each eruption, we inject exactly the same mass of SO 2 in EVA_H and EVA. Table 3 shows each model's root-mean-square error (RMSE) for the two volcanic SO 2 emissions inventories and two different time periods (full 1979-2015 period and post-Pinatubo period).
Regardless of the SO 2 emissions inventory used, EVA_H reproduces well the time evolution of the global mean SAOD. Over the 1998-2015 period, it even performs better using the Neely and Schmidt (2016) inventory instead of the Carn et al. (2016) inventory against which it was calibrated. The observed SAOD following the El Chichón 1982 and Mount Pinatubo 1991 eruptions lies within the estimated 95% confidence interval for model predictions. EVA_H tends to overestimate the global mean SAOD associated with 21st century eruptions when using the Carn et al. (2016) inventory and to underestimate it when using the Neely and Schmidt (2016) inventory. The main reason is the lower plume height estimates provided in the Neely and Schmidt (2016) inventory that result in less injected SO 2 and shorter-lived SO 4 in our box model.

) of Model-Predicted Global Mean SAOD Time Series (Figure 8) Relative to the GloSSAC Time Series
SO 2 database/Period Carn et al. (2016) Neely and Schmidt (2016) Model 1979-2015 1998-2015 1979-2015 1998- Timmreck et al., 2018). In particular, we show that for the 21st century, uncertainties in model prediction related to the different inventories existing are often larger than discrepencies between two SAOD observational data sets (GloSSAC and ; Friberg et al., 2018). Regardless of the inventory or SAOD data set used, the main failure of EVA_H lies in a clear overestimation over the rise and decay time of SAOD associated with 21st century eruptions, despite the latitudinal and vertical dependence of loss time scales in the model. This failure is related to the fact that the production time scale is constant with a value of ∼7.8 months. Consequently, in addition to overestimating SAOD rise time scales, we also overestimate decay time scales of relatively small eruptions for which the long production time scale compensates the small loss time scales in lower stratospheric boxes. We further discuss this problem and our choice of model configuration for production time scales in the following sections.
Despite imperfections in the prediction and behavior of EVA_H, it represents a clear improvement over EVA.
For the 1979-2015 period, EVA_H has a RMSE 30-50% smaller than that of EVA although differences are not significant (Table 3), and for the 1998-2015 period, the RMSE of EVA_H is a factor of ∼3 lower than EVA, with this difference being significant for both the Carn et al. (2016) and Neely and Schmidt (2016) inventories. In particular, EVA overestimates global mean SAOD over 2008-2014 by almost a factor of 3 using the Carn et al. (2016) inventory. Differences between EVA and EVA_H are not straightforward to interpret as they result from the following: (i) a different model structure; (ii) an additional input (injection height) in EVA_H; and (iii) different data sets used to calibrate the model. To gain insights on the importance of injection height to accurately predict volcanic forcing, we run EVA_H with all injections height fixed to the Pinatubo 1991 height (25 km in ; Carn et al., 2016), which is the only eruption used to calibrate EVA. In this run, we inject exactly the same mass of SO 2 for each eruption as for the run with observed injection height (only the distribution among boxes changes). The corresponding global mean SAOD prediction is the thin dashed line on Figures 8a and 8b with associated RMSE reported in Table 3. It is in close agreement with EVA, demonstrating that accounting for injection height makes a significant difference for accurately capturing volcanic forcing over a large range of volcanic injection parameters (e.g., Pinatubo 1991 vs. Sarychev Peak 2009).
When using the Neely and Schmidt (2016) inventory, EVA_H has slightly lower RMSE on global mean SAOD than WACCM, but with differences between the two models being insignificant (Table 3). In general, WACCM predicts larger SAOD peaks than EVA_H for 21st century eruptions, with significant differences for the Kasatochi 2008 eruption. Given the relatively low average injection heights in the Neely and Schmidt (2016) inventory, we suspect that these differences are related to the self-lofting of volcanic gases in WACCM, which increases the fraction of sulfur ending in the stratosphere following upper tropospheric/lower stratospheric injections. This process is absent in EVA_H, and analyses done to determine SO 2 distribution among the box did not reveal any systematic bias between injection heights reported in Carn et al. (2016) and the height at which observed peak extinction enhancements occur following eruptions (Supporting Information S1, Figure S3). Lastly, WACCM captures well the short rise and decay time scales of SAOD peaks associated with relatively small volcanic injections in the 21st century in contrast to EVA_H.
Beyond improving predictions for the global mean SAOD, a major motivation for our new idealized model is to better capture the vertical structure of extinction changes associated with volcanic stratospheric sulfur injections. Figure 9 shows  Figures 8 and 9 show that EVA_H overestimates the decay time scale of SAOD associated with the 21st century eruptions, compared to both observations and simulations by WACCM. To further investigate this limitation, we investigate the sensitivity of two forcing metrics to injection altitude and latitude:

Model Sensitivity to Injection Height and Latitude: Comparison With EVA, UM-UKCA, and MAECHAM
• The cumulative global mean SAOD at 525 nm, in months, calculated as the time-integrated SAOD between 0 and 38 months following the eruption. • The e-folding time of the global stratospheric SO 4 burden, in months, calculated using an exponential fit of the SO 4 burden time series between 1 month after the peak value is reached and the month at which it reaches 10% of its peak value.
We calculate these parameters for a July injection of 8.5 Tg S and compare the results to simulations conducted with UM-UKCA by Marshall et al. (2019) and with MAECHAM by Toohey et al. (2019). Results for a January eruption are also shown for MAECHAM. Figure 10 shows cumulative global mean SAOD as a function of injection height and latitude for EVA_H (top left panel) and UM-UKCA (top center panel). Values for UM-UKCA are calculated using a Gaussian process emulator trained with 41 simulations (Marshall et al., 2019). The two models are in broad agreement on the following features: (i) cumulative SAOD decreases as the injection latitude increases (in absolute value); (ii) cumulative SAOD decreases with decreasing injection height below ∼20 km. However, there are important differences between the two models. First, the cumulative SAOD predicted by UM-UKCA is much larger than that of EVA_H. For example, for tropical injections between 20 and 25 km, UM-UKCA has cumulative SAOD of ∼4.5 months versus 1.8 months for EVA_H. Second, UM-UKCA is much more sensitive to injection latitude, with the cumulative SAOD of an eruption at ≥45 • latitude being 30-60% smaller than an eruption with the same injection height in the tropics while this difference is only ∼20% in EVA_H. Third, the only seasonal effect in EVA_H is related to the tropopause height seasonal cycle, which explains the slight differences in cumulative SAOD for injections in the lowermost Southern Hemisphere stratosphere and lowermost Northern Hemisphere stratosphere. In contrast, for the July injection shown, UM-UKCA predicts a clearly larger cumulative SAOD and e-folding time for eruptions in the Southern Hemisphere compared to those in the Northern Hemisphere for injection heights between 18 and 27 km. This may be related to the more pronounced transport and stratosphere-troposphere exchange in the winter hemisphere (Butchart, 2014) in January-March (i.e., the Northern Hemisphere), when the aerosol burden of a July eruption peaks in UM-UKCA. Figure 10 (top right) shows cumulative SAOD for EVA_H, UM-UKCA, MAECHAM, and EVA for six sets of injection latitude and height for which simulations were conducted with MAECHAM, for an 8.5 Tg S July injection. Although the cumulative SAOD predicted by MAECHAM and UM-UKCA differ by up to 30%, both interactive stratospheric aerosol models agree remarkably well on the dependence of SAOD to injection latitude for a 24 km injection, with a decrease by a factor 2-2.5 between an injection at 4 • S and one at 56 • N. In comparison, EVA_H produces a weaker dependence with a decrease by ∼15%. However, for a 56 • N injection, EVA_H and MAECHAM are in reasonable agreement for the dependence of cumulative SAOD to injection height. Lastly, regardless of the set of injection height and latitude used, the cumulative SAOD predicted by EVA is ∼1.7 months. This constant value is expected as EVA does not account for injection height, and uses injection latitude only to determine the latitudinal distribution of aerosol. The loss time scales are independent of latitude so that the time evolution of the total sulfate burden and global mean SAOD only depend on the injected mass.
Bottom panels of Figure 10 are similar to the top panels, but showing results for the SO 4 e-folding time instead of the cumulative SAOD. EVA_H and UM-UKCA agree well on e-folding time for tropical injections above ≥20 km, ∼12 months, while MAECHAM predicts a smaller value of ∼8 months. However, for both interactive stratospheric aerosol models, the e-folding time strongly decreases with increasing latitude, whereas EVA_H exhibits a weak dependence on latitude. The e-folding time in EVA (12.1 months) is independent of both eruption latitude and height. Overall, the e-folding time scale in EVA_H varies between 9 and 11.5 months for injections heights between 10 and 26 km and all latitudes. This range is very small compared to the one of MAECHAM and UM-UKCA and may appear surprising given that loss time scales loss in the model are as small as 2.3 months in extratropical boxes (3.8 months for the lowermost extratropical stratosphere, that is, Boxes 7 and 8). However, the production time scale prod is large (7.8 months) and independent of latitude or height. As a result, sulfate is produced long after the peak sulfate burden, and the e-folding time scale largely exceeds the loss time scales for extratropical injections. Lastly, Figure 10 shows MAECHAM's results for a January eruption in addition to a July eruption. For an injection height of 24 km and at latitudes spanning 15-56 • N, the e-folding time scale and cumulative SAOD tend to be larger for eruptions occuring in winter (January for latitudes considered), which is consistent with the explanation proposed above for the hemispheric asymmetry observed for UM-UKCA e-folding time scale and cumulative SAOD. In contrast, the total stratospheric aerosol burden evolution does not depend on eruption season in EVA and EVA_H.
All in all, comparison with both observations (Figure 8) and interactive stratospheric aerosol models (Figures 8 and 10) suggests that the forcing predicted by EVA_H still lacks sensitivity to eruption latitude. Despite this limitation, it is important to stress that the sensitivity of forcing to eruption source parameters is more realistic in EVA_H compared to EVA in which the total sulfate burden and global mean SAOD evolution are independent of both injection altitude and latitude.

EVA_H Limitations and Future Developments
In light of sections 4.1 and 4.2, the most important future improvement to EVA_H is to implement a dependence of the production time scale prod on the injection parameters. The currently constant time scale results in a lack of sensitivity of the model-predicted forcing to the eruption latitude. The calibration methodology and/or data sets used in our study did not enable us to constrain such dependence, with unrealistically high values of prod obtained when implementing a height or latitude dependence (section 3.2). If we calibrate a model with height-dependent production time scales bounded to a maximum of 2.5 months for Boxes 4-8, it is significantly outperformed by the model configuration retained with constant production time scales (using the same performance criteria as in section 3). The primary reason is that with all other parameters being kept constant, a smaller production time scale results in larger SAOD peaks. Consequently, when enabling smaller production time scales in Boxes 4-8, the overestimation of SAOD over the 21st century is 10.1029/2019JD031303 worsened although the predicted rise and decay time scales compare better with observations ( Figure S10). A solution and potential future development is to make the scaling factor A (equation (7)) dependent on height as well, so that SAOD signals associated with both the 1991 Pinatubo and the 21st century eruptions can be reproduced, despite the tendency of smaller production time scales to produce larger SAOD peaks. However, constraining the sulfate mass-SAOD scaling with available observations and interactive stratospheric aerosol models is already challenging, even at global scale (see section 2.4), and such solution would largely increase the complexity of both the calibration process and the box model. In addition, we cannot exclude that the apparent overestimation of SAOD peak and rise time scale for the 21st century eruptions is a consequence of errors in the observational data sets chosen to calibrate the model (Carn et al., 2016;Thomason et al., 2018). For example, Figure 8c shows that for two SO 2 emission inventories, EVA_H tends to underestimate post-2000 SAOD which would facilitate the implementation of short production time scales in Boxes 4-8 while maintaining good predictions for the Pinatubo eruption. Altogether, given the significant improvements of EVA_H over EVA, we choose to maintain the model configuration resulting from the calibration process described in section 3. The scripts provided make it trivial for users of EVA_H to implement different values of production time scales in each box, in which case we recommend values of 0.5-2.5 months in Boxes 4-8 (see section 3.2 for justification of these values and Figure S10 for the corresponding model run).
Given the empirical nature of EVA_H, its calibration and predictions are limited by the parameter space covered by the set of eruptions used for calibration. In particular, the calibration of parameters of Boxes 1-3 (≥20 km) is constrained mostly by two large tropical eruptions (El Chichón 1982 andPinatubo 1991). Furthermore, whereas the ice core and geological records suggest that some of the most important volcanic events of the Common Era injected material well above 30 km in the atmosphere (e.g., Samalas 1257, Vidal et al., 2015), no eruptions used to calibrate EVA_H injected sulfur above ∼25 km. Until future eruptions contribute to fill this gap, interactive stratospheric aerosol model experiments could be valuable to help inform idealized models outside the parameter space in which they are calibrated.
Following our calibration procedure, seasonal mixing was not included in our chosen model configuration, in contrast to EVA, because it did not significantly improve the model performance as defined by our error metric (equation (10)). However, the seasonality of stratospheric mixing is apparent both in observations and models (e.g., Butchart ;2014) and is implemented as an option in EVA_H (see Text S4). Lastly, whereas interactive aerosol size evolution is key to accurately predict volcanic forcing (e.g., Mann et al., 2015), the parameterization we use for aerosol effective radius is simplistic (section 3.4) and effective radius does not affect, for example, the model sulfate loss time scales. Improving the representation of aerosol size distribution in the box model is thus an important area of future development.

Examples of Application of EVA_H: Reconstruction of Past Volcanic Forcing and Fast Response During Volcanic Eruptions
A major application of EVA (Toohey & Sigl, 2017;Toohey et al., 2016) is to produce forcing data sets for the experiments of the Model Intercomparison Project on the climatic response to Volcanic forcing (VolMIP; ; Zanchettin et al., 2016) and the Paleoclimate Modeling Intercomparison Project (Jungclaus et al., 2017;Kageyama et al., 2018). For VolMIP, the large spread among predictions from state-of-the-art aerosol-chemistry-climate models indeed prevented the identification of consensual forcing data sets derived from these models, motivating the use of an idealized model. Consequently, an important question is whether using EVA_H would significantly affect forcing data sets used in VolMIP or PMIP. We test this hypothesis using the following: • A Tambora (1815)-like eruption with the same injections conditions as those used in Zanchettin et al., 2016Zanchettin et al., (2016 Figure 3), that is, 60 Tg of SO 2 at 0 • N and 24 km altitude in April. • An Eldgjá (939)-like eruption with 32 Tg of SO 2 (Toohey & Sigl, 2017) at 63.6 • N and 12.5 km altitude (Moreland, 2017;17.5 km for plume top which corresponds to ∼12.5 km for the umbrella cloud) in April.
The resulting global mean SAOD time series for EVA_H and EVA are shown in Figure 11, along with VolMIP runs from four interactive stratospheric aerosol models for the Tambora case.
For the Mount Tambora case (Figure 11, left), the peak SAOD predicted by EVA_H is 20% smaller than the one predicted by EVA, which is largely due to our lower value of the threshold sulfate burden above which Figure 11. Global mean SAOD anomalies following a volcanic SO 2 injection with source parameters similar to those estimated for ( (Marshall et al., 2018;Zanchettin et al., 2016). These models are WACCM (Mills et al., 2016), UM-UKCA (Dhomse et al., 2014), SOCOL (Sheng et al., 2015), and MAECHAM (Niemeier et al., 2009). We use the latest runs available after some modeling groups updated their contributions and always use runs with point injection (as opposed to band injection) for modeling groups that tested both types of injection of volcanic SO 2 .
we apply a 2/3 scaling for SAOD (equation (8)). However, differences between EVA_H and EVA are not statistically significant. This result is not surprising given the similarity of injections parameters (tropical injection at ≃25 km) for Mount Tambora 1815 and Mount Pinatubo 1991, against which EVA is calibrated. We thus expect a reasonable agreement between EVA_H and EVA for high-altitude tropical injections and, in particular, for most experiments of VolMIP. Figure 11 also shows for EVA_H the uncertainty related to model parameter values and injection parameters, with uncertainty on the erupted mass of SO 2 taken from Toohey and Sigl (2017) and a 20% uncertainty on injection height. Although the predicted SAOD is uncertain by a factor of 2, the spread among predictions of interactive stratospheric aerosol models remains much larger. The predictions of two models (WACCM and UM-UKCA) are also clearly incompatible with the predictions of EVA_H. Although no conclusion can be made on which models are more realistic given the absence of SAOD observations and large uncertainties on the SO 2 mass and injection altitude for the 1815 Tambora eruption, these results stress again the large magnitude of intermodel spread, even in the face of the important uncertainties related to constraining sulfate injections from ice cores or model calibration against recent eruptions.
For the Eldgjá case ( Figure 11, right), there are significant differences between the SAOD predicted by EVA and EVA_H. If we use a latitude of 63.6 • but a height of 25 km in EVA_H (similar to that of the Pinatubo 1991 eruption), the peak SAOD is 40% smaller than the one predicted by EVA. This difference is solely due to differences in model structure (including sensitivity to eruption latitude) and calibration processes. When we use the estimated injection height of 12.5 km for this eruption (Moreland, 2017), the resulting SAOD is significantly lower than the one predicted by EVA_H with a 25 km injection height or the one predicted by EVA.
In particular, the predicted SAOD is 50-90% smaller than the one predicted by EVA. As a consequence, we conclude that (i) using EVA_H instead of EVA would significantly affect the forcing reconstruction for extratropical eruptions; and (ii) injection height is an important parameter that should be accounted for-when constrained-in past volcanic forcing reconstruction. A comprehensive reconstruction of volcanic forcing associated with all eruptions for which the mass, latitude and altitude of injection are constrained is beyond 10.1029/2019JD031303 the scope of this paper but is the subject of ongoing work which will greatly benefit from recent efforts to better constrain eruption source parameters (Burke et al., 2019;Gautier et al., 2019;Hartman et al., 2019). However, the preliminary results shown in Figure 11 reinforce the discussion of uncertainties given by Toohey and Sigl (2017) and help to quantify the degree to which the recommended PMIP4 forcing represents an upper estimate for extratropical eruptions.
As a final comment to this section, one of the main advantages of EVA_H over interactive stratospheric aerosol models is that it is computationally inexpensive. Consequently, it can be used to produce rapid estimates of future SAOD perturbations immediately following volcanic eruptions. A recent example of such application of the model is the June 2019 eruption of Raikoke (Kurile Islands). Shortly after the first estimates of SO 2 loading and injection height were available, we ran EVA_H and provided global mean SAOD predictions to members of the "Volcano Response" (VolRes) initiative (https://wiki.earthdata.nasa.gov/display/ volres). The model was run 1,000 times to span the large range of SO 2 mass and height estimates available during the first few days after the eruption. The figures provided to the community are shown on Figure  S11, and were shared with the VolRes community less than 30 min after deciding to apply EVA_H to the Raikoke 2019 eruption. EVA_H predicts relatively small perturbations of SAOD confined to the Northern Hemisphere, with a peak value of 9 × 10 −3 at most for global mean SAOD. This upper estimate was later refined to 6.5 × 10 −3 after a more detailed SO 2 injection profile was provided. Following our discussion of EVA_H limitations (section 4), we expect that the rise and decay time scales of SAOD shown on Figure S11 are overestimated. It will be an interesting test for the model to compare Figure S11 with SAOD observations over the next year.

Conclusions
We take advantage of recently developed data sets of volcanic SO 2 injections (Carn et al., 2016) and atmospheric optical properties (GloSSAC; Thomason et al., 2018) to develop EVA_H, a new idealized model of volcanic aerosol forcing that accounts for the mass, latitude, and height of the sulfur injected by a volcanic eruption. Compared to the most recently developed idealized model (EVA; Toohey et al., 2016) that did not account for injection altitude and was calibrated only against the 1991 Mount Pinatubo eruption, we show that EVA_H • captures significantly better the global mean stratospheric aerosol optical depth variations during the 21st century; • captures well the vertical evolution of extinction following eruptions of the 1979-2015 period; • exhibits a forcing sensitivity to the eruption latitude and injection height that is in better agreement with observations and interactive stratospheric aerosol model results.
Despite this latter improvement, an extensive comparison of EVA_H with interactive stratospheric aerosol model simulations shows that the latter remain more sensitive to the eruption latitude.
We apply EVA_H to discuss potential biases and uncertainties in EVA-based volcanic forcing data sets recommended for use in VolMIP (Zanchettin et al., 2016) and PMIP (Jungclaus et al., 2017), components of Phase 6 of the Coupled Model Intercomparison Project. While the volcanic forcing constructed from EVA_H does not significantly differ for high-altitude tropical volcanic injections, it is significantly lower for high-latitude or low-altitude emissions. As a consequence, we expect that the forcing produced by EVA_H would be similar for most experiments of VolMIP (Zanchettin et al., 2016) but may have significant differences with EVA(eVolv2k) (Toohey & Sigl, 2017), the reference volcanic forcing data set used in PMIP (Jungclaus et al., 2017;Kageyama et al., 2018).
In contrast to interactive stratospheric aerosol models, idealized models like EVA and EVA_H are computationally inexpensive and can be used to extensively explore eruption source parameter space, which is, for example, required to rigorously quantify uncertainties associated with reconstructed forcing of past eruptions. We provide Matlab ® scripts that enable to run EVA_H in the configuration selected in our study (section 3.2), but also in different configurations, for example, with additional dependence of mixing time scales on season or production time scales on height and latitude. All scripts are available in Text S4 and EVA_H.zip or via T. J. A.'s website (https://sites.google.com/view/thomasjaubry/products), GitHub (https:// github.com/thomasaubry/EVA_H), and Code Ocean (https://codeocean.com/capsule/5107752/tree/v1, clik on "files" then "metadata" to get started) where users without a Matlab ® license can run the EVA_H model.