The Probability of Mantle Plumes in Global Tomographic Models

While the downward mass flux in the Earth's deep interior is well constrained by seismic tomography, the upward flux is still poorly understood and debated. Recent tomography studies suggest that we are now starting to resolve deep mantle plume structures. However, a lack of uncertainty quantification impedes a full assessment of their significance and whether they are statistically distinct from noise. This work uses a spherical wavelet transform and random noise realizations to quantify the probability of deep plume‐like features in six recent global tomographic models. We find that out of 50 possible mantle deep plumes, 12 are highly likely, with probabilities larger than 80%, and 12 are likely, with probability between 70% and 80%. Objective, quantitative approaches as proposed in this study should be used for model interpretation. The five most likely deep mantle plumes are Tahiti, Macdonald, East Africa, Pitcairn, and Marquesas, which have some of the largest buoyancy fluxes estimated in a previous study that used hotspot swell volumes. This could resolve past discrepancies between deep mantle plumes inferred by visual analysis of tomography models and flux estimations from hotspot swell data. In addition, a notable unlikely deep mantle plume is Yellowstone, with probability lower than 50%. We also identify a likely deep mantle plume associated with the Amsterdam‐St Paul hotspot, a region scarcely discussed in previous studies and that deserves future investigation. Hence, our automated, objective approach is a valuable alternative approach for the quantitative interpretation of tomographic models.


Introduction
The upward flow in the Earth's mantle in the form of plumes is key for producing Earth's largest melting events, which are recorded in the geological record as large igneous provinces (LIPs) (Kellogg & Wasserburg, 1990). In turn, LIPs coincide with virtually all major recent extinctions, supercontinent breakups, and also with changes in geodynamo behavior (Sobolev et al., 2011). Significant advances in forward and inverse modeling, and in computational power, as well as an expanding network of seismic stations, have led to increasingly more detailed tomographic models of the Earth's deep interior that are starting to resolve deep mantle plumes (e.g., Bozdag et al., 2016;Chang et al., 2015;French & Romanowicz, 2014). However, a lack of uncertainty quantification makes the interpretation of plumes in these models difficult.
Mantle upwelling sources are classically seen as large plume heads of hot buoyant material, followed by a narrow tail rooted in a hot thermal boundary layer deep in the mantle (Campbell & Griffiths, 1990). This view has since evolved to include effects from the physical and chemical properties of the boundary layer (Lin & van Keken, 2006), resulting in what are now called thermochemical plumes. These plumes exhibit a variety of structures in geodynamic experiments and simulations such as stagnant piles, a tail with no head or filaments in the tails (e.g., Farnetani & Samuel, 2005). Geoid anomalies in the central Pacific have been explained by oscillating thermochemical domes topped by clusters of plumes at different stages of evolution (Davaille, 1999). A proposed mechanism for thermochemical plumes is subducted mantle material pushing up against a thermal boundary layer in the lowermost mantle (Steinberger & Torsvik, 2012), though it has also been shown that subducted slabs are not necessary (Ballmer et al., 2016). These hypotheses can explain why plumes appear predominantly around the edges of two large low shear velocity provinces (LLSVPs) at the core-mantle boundary (CMB), which have been suggested to be stable, ancient thermochemical piles (Steinberger & Torsvik, 2012). Alternatively, it has also been suggested that at least some mantle plumes are rooted in ultralow-velocity zones (ULVZs) near the CMB (Yuan & Romanowicz, 2017).
When considering the classical thermal model of mantle plumes, the narrow tail poses problems when it comes to being imaged in tomographic models. Seismic waves can diffract around the plumes and obscure any delay of seismic energy that passed through the plume (Nolet & Dahlen, 2000), a phenomenon known as wavefront healing. This has been shown in synthetic tomographic experiments to significantly hinder the resolution of narrow structures, particularly in the lower mantle (Maguire et al., 2018;Rickers et al., 2012). In addition to the general loss of seismic resolution in the lower mantle, this makes it difficult to image low-velocity anomalies that are vertically continuous through the mantle, which could be interpreted as plumes. Furthermore, vertical smearing due to, for example, vertically propagating waves (Maguire et al., 2018) can result in steeply elongated anomalies in tomography models based on traveltime data, which may be misinterpreted as deep mantle plumes.
Lower mantle plumes started to appear in tomographic models at the turn of the century using finite-frequency tomography, as well as traditional ray theory (e.g., Montelli et al., 2004Montelli et al., , 2006D. Zhao, 2004). More recently, French and Romanowicz (2015), which we shall refer to as FR15 throughout this manuscript, presented a list of 27 hotspots (surface expressions of mantle thermal instabilities) and one other location where lower mantle plumes are resolved in their global tomography model (French & Romanowicz, 2014). The hotspots were categorized as corresponding to either "primary," "clearly resolved," or "somewhat resolved" plumes. These plumes are much broader than suggested by the classical thermal plume model. Below 1,000 km depth, the broad plumes are almost vertical and continuous and often pinched between 500 and 1,000 km above the CMB, which suggests that they have a thermochemical origin (Lin & van Keken, 2006). At around 1,000 km depth, some plumes deflect laterally, others become thinner and may even split, and some exhibit ponding (FR15). In addition, other global tomography studies also started to show complex mantle plume behavior (e.g., Bozdag et al., 2016), including intriguing deep plume-slab interactions (Chang et al., 2016).
There is now a plethora of global tomographic models, each constructed with different data sets and modeling schemes, and each claiming to image its own features. Objective, quantitative model comparisons are useful to identify different as well as common features in the models. In turn, structures that appear across multiple models are in principle more likely to be required by the data rather than due to arbitrary model-specific choices. One example of such a quantitative comparison of tomographic models was performed by Lekic et al. (2012), who used k-means cluster analysis of shear wave velocity profiles in the lower mantle of five tomography models to constrain the shape of the LLSVPs beneath Africa and the Pacific and, encouragingly, found that the models largely agreed. The work was extended by Cottaar and Lekic (2016) to analyze P wave tomography models, which showed similar features but with lesser agreement. Moreover, Boschi et al. (2007) used a pixel picking algorithm to find plumes in a set of tomographic models and compared with geodynamic simulations. Our method focuses on what can be inferred from the tomographic models alone, removing the uncertainties and limitations of geodynamic simulations.
The use of wavelets in global seismic tomography is still in a rather exploratory stage, being used primarily as a basis for regularization in tomographic models (e.g., Charléty et al., 2013;Simons et al., 2011). The novelty of this study is that wavelets are used to identify physical structures in data, as has been done successfully in fields such as cosmology (e.g., Cayón et al., 2001;McEwen et al., 2008), computational fluid dynamics (e.g., Farge et al., 1990;Schneider & Vasilyev, 2010), and image feature extraction (e.g., M. Zhao et al., 2009). Piromallo et al. (2001) used wavelets to identify subducting slabs in tomographic models, but only locally in the Mediterranean. The appeal of using wavelets for feature identification is in their multiresolution and localization properties, which give both the scale and position of the feature of interest. This makes them ideal for identifying mantle plumes.
In this study, we analyze a set of recent global shear wave velocity models (Supporting Information Table S1 and Figure 1) to determine the probability that features in the models are statistically robust, rather than artifacts or noise. We aim to produce an automated, objective assessment of the models themselves, rather Figure 1. Depth slices at 100, 1,500, and 2,800 km depth through the seven tomographic models used in this study. From left to right: (top group) S20RTS (Ritsema et al., 1999), S40RTS (Ritsema et al., 2011), S362WMANI+M (Moulik & Ekström, 2014), (bottom group) SAVANI (Auer et al., 2014), SEMUCB-WM1 (French & Romanowicz, 2014), SGLOBE-rani (Chang et al., 2015), and SEISGLOB2 (Durand et al., 2017). Colorbars are in percent perturbation from PREM (Dziewonski & Anderson, 1981). Black lines show land masses. Green lines in the 1,000 km slices show tectonic boundaries, and in the 1,500 and 2,800 km slices show the LLSVP boundaries from Cottaar and Lekic (2016). Filled yellow circles show the primary plumes of FR15, and open circles show the hotspots of Courtillot et al. (2003). than relying on subjective visual inspection or comparison with geodynamic simulations. The analysis is performed using for the first time a spherical wavelet transform and random noise simulations. These repeated simulations produce a sample from a statistical distribution, from which we can obtain a measure of the probability of the presence of plumes subject to noise in the model. In this way we incorporate some measure of uncertainty in the velocity models, despite model uncertainties not being reported. This is a different approach to that used, for example, by Piromallo et al. (2001), which involved using wavelets to calculate length-scale correlations.
In section 2 we describe the tomographic models analyzed in this study. In section 3 we outline the wavelet transform on the sphere, noise realizations, and statistical calculations. Section 4 presents the results of our analysis of the tomographic models, giving new constraints on global mantle plumes, which are discussed in section 5. Section 6 presents our conclusions. Table S1 in the Supporting Information provides details on the data sets, parameterizations, and modeling techniques used to build the seven global shear wave speed tomographic models analyzed in this study: S20RTS (Ritsema et al., 1999), S40RTS (Ritsema et al., 2011), S362WMANI+M (Moulik & Ekström, 2014), SAVANI (Auer et al., 2014), SEMUCB-WM1 (French & Romanowicz, 2014), SGLOBE-rani (Chang et al., 2015), and SEISGLOB2 (Durand et al., 2017). S362WMANI+M, SAVANI, SEMUCB-WM1, and SGLOBE-rani include 3-D radial anisotropy, but here we analyze only their isotropic parts. We choose these models because they use a variety in sizes and types of data sets, different parameterizations both horizontally and vertically, and different modeling theories. In particular, comparisons between S20RTS and S40RTS would be almost entirely explained by increasing the lateral nominal resolution of the model and could determine if any plumes are missed in lower-resolution models. Furthermore, with the exception of S20RTS, these models are all more recent than those studied by Boschi et al. (2007). As such, our results may show if the most recent global tomographic models have revealed anything new about mantle plumes. We note that, while the data sets used to build these models are all different, they are not entirely independent. Many (e.g., S20RTS, S40RTS, SAVANI, and SGLOBE-rani) include the Visser et al. (2008), Ritsema et al. (2011), and Ekström (2011) data sets, and so similar results may be expected from these models. The models SEMUCB-WM1 and SEISGLOB2 are built from largely independent data sets. For completeness, we also analyze the finite-frequency P wave tomography model GAP_P4 (Obayashi et al., 2013). However, due to the generally worse data coverage of P wave models (at least in the upper mantle, Becker & Boschi, 2002), our discussion will focus mostly on the shear wave models. While P wave traveltime data may produce more robust images of the lower mantle than shear wave traveltimes, the use of normal mode data in some of the models should at least partly compensate for this.

Global Tomographic Models
In this study, models are re-expressed as perturbations relative to PREM (Dziewonski & Anderson, 1981), as in Chang et al. (2015) for fair comparison. We discretize the models at 100 km depth intervals between 100 and 2,800 km depth. Laterally, we use HEALPix pixelization (Górski et al., 2005), most commonly used in cosmological mapping. This is an equal-area pixelization with associated libraries for fast spherical harmonic transforms which are necessary for the spherical wavelet transform, as discussed in the following section. All maps have a pixelization parameter N side = 32, corresponding to 12,288 pixels. As such, pixels at the Earth's surface have an area around 40,000 km 2 , that is, roughly 2 • × 2 • at the equator.
Overall, the models show large-scale similarities in shear wave velocity structure. Figure 1 shows depth slices at 100, 1,500, and 2,800 km. At shallow depths (down to ∼300 km) we find the expected fast velocity perturbations under cratons and subduction zones, and slow perturbations at spreading centers. At the CMB all models show low velocities beneath Africa and the Pacific, corresponding to the LLSVPs, though agreement on the boundaries is not perfect. The LLSVPs in S362WMANI+M and SEISGLOB2 in particular have different shapes to the other models, and the magnitude of velocity perturbations is generally smaller. This could possibly be due to these models using more normal mode data or, in the case of SEISGLOB2, not having any body waves diffracted along the CMB. At midmantle depths (about 1,000 to 2,000 km depth) there is less of a clear pattern between all model slices other than slow anomalies beneath Africa and the Pacific, and fast near deep subduction signatures (e.g., India and North America), but details vary. A clear difference between the models is the level of detail. S20RTS, S362WMANI+M, and SAVANI are generally smoother models showing large-scale features, while S40RTS, SEMUCB-WM1, SEISGLOB2, and SGLOBE-rani are more detailed with smaller-scale features. This is a result of the different models being parameterized horizontally up to different scales and also of model regularization choices (Table S1). Cross-sections through the models centered at the Afar, Cape Verde, and Marquesas hotspots are shown in the Supporting Information ( Figure S1). Again, the level of detail is different from model to model. Clearly seen in all cases are low-velocity anomalies spreading throughout much of the mantle. In Afar, the anomaly tilts from the southwest up to the Afar depression, and all models except SEISGLOB2 seem to suggest a branch detaching at about 1,000 km depth and rising under Arabia. At Cape Verde there is a generally narrower and more vertical anomaly than in Afar, which stops at 1,000 km depth. S20RTS, S40RTS, SEISGLOB2, and SGLOBE-rani suggest lateral deflections in all directions at this depth, but this behavior is not so clear in the other models. At Marquesas, and in the Pacific in general, individual, vertically continuous, slow anomalies are not clear in SGLOBE-rani or SEISGLOB2. However, all the models show a strong anomaly just southwest of the hotspot location. SEMUCB-WM1 and SAVANI also seem to have a second distinct plume-like anomaly to the northeast of the hotspot location. The distinction between these anomalies is not so clear in the other models. All the models also show a flat slow anomaly in the top ∼300 km of the mantle across the majority of the Pacific, corresponding to a well-known low-velocity zone (Tan & Helmberger, 2007).
The general agreement of these tomographic models on large-scale features and disagreement on smaller scales are reflective of the vast majority of models published in recent years. Indeed, some may argue that small-scale features may belong to the null space of the inversion. On the other hand, some recent studies (e.g., Chang et al., 2015) have shown improved correlations between recent tomographic models compared to older ones, even at high angular degrees. This, along with a lack of uncertainty quantification, is why our method (described in detail in the following section) is based around simulating the high-frequency information in tomographic models.

Spherical Wavelet Transform
The wavelet transform is a method to decompose a signal into components of different length scales. It is typically seen as the application of a series of successively higher-frequency band-pass filters to a signal, thereby separating the information at different frequency bands, all the while maintaining where in space those frequencies arise. In contrast, the Fourier transform (and equivalently the spherical harmonic transform) only reveals the frequency content of a signal and loses the localization in space. We make use of wavelets to separate out the high-frequency noise from the tomographic models.
We use the scale-discretized axisymmetric spherical wavelet transform of S2LET (Leistedt et al., 2013) to decompose depth slices of a chosen tomographic model. We use axisymmetric wavelets, rather than directional wavelets or curvelets (e.g., Starck et al., 2006;Wiaux et al., 2008), for their simplicity because we assume nothing about the directionality of possible tilting plumes. While a brief summary of the spherical wavelet transform is given here, details of the implementation can be found in Leistedt et al. (2013). A spherical signal f ( , ), where and are colatitude and longitude, respectively, can be reconstructed from its scaling and wavelet coefficients W Φ , W Ψ via the inverse wavelet transform (Leistedt et al., 2013) where Φ is the scaling function encompassing the largest-scale information of f ; Ψ j are the J max − J min + 1 wavelet functions, with Ψ J max encompassing the smallest-scale information of f ; and  is a rotation operator that rotates the basis functions, Φ and Ψ j , around the surface of the sphere S 2 . The coefficients W are the scaling and wavelet coefficients that are found by the inner product of f and the basis functions. The result of the transform on a depth slice of a tomographic model is a set of maps showing where the large-and small-scale velocity anomalies occur in the depth slice. Figure 2 shows the spherical wavelet transform at a depth of 200 km using the tomographic model SGLOBE-rani (Chang et al., 2015). It can be seen that, for example, midocean ridges are retrieved as medium-scale low-velocity anomalies (e.g., Figure 2e), as expected.

Noise Simulations
As we look to test whether small-scale features in the tomographic models are due to noise or artifacts, we assume that the model is one sample from a random normal distribution and that the difference between samples is high-frequency noise. To simulate this distribution, the wavelet maps of a depth slice at scales j to J max , for some chosen j, are chosen to represent noise and replaced with Gaussian random fields. The scales that are replaced are chosen by comparing their power spectra with that of the initial depth slice. The wavelets are by definition localized in harmonic space; that is, each wavelet function Ψ j has power between angular degrees 1 and 2 . As such, we choose to simulate the wavelet maps that have 1 and 2 greater than about half the bandlimit (maximum angular degree) of the model. This corresponds to angular degrees greater than around 16 for SAVANI, SEMUCB-WM1, S40RTS, and SGLOBE-rani and around 9 for S362WMANI+M and S20RTS. In these ranges the power spectra of the velocity perturbation maps are generally flat, which is indicative of white noise. Normalized spectra for all the models and their wavelet maps at all depths are given in the Supporting Information ( Figures S2-S8). The random fields are generated by scaling standard normally distributed random values of W Ψ in harmonic space by the power spectrum of the initial map at the jth scale ( Figure S9). This ensures that the random scale maps give the same contribution to the reconstruction as the initial scale map. Power spectra are calculated for each wavelet scale j by where W Ψ lm = ⟨W Ψ |Y lm ⟩ for spherical harmonics Y lm ( , ) of degree l and order m, ⟨· |·⟩ denotes an inner product, and * denotes complex conjugation. Figure S9 in the Supporting Information shows examples of power spectra of the smallest-scale maps shown in Figure 2 and those of a randomly generated map at the same scales. A new map is constructed by Equation 1 using the new wavelet scales. We create 500 samples in this way and include the initial tomographic model in the sample set, resulting in a sample size of N s = 501. This is performed for each depth slice and each model independently.
To summarize, for each model we manually choose angular degrees that can be considered as noise based on the wavelet map power spectra. Each depth slice is wavelet transformed, the wavelet coefficients at the noisy angular degrees are replaced with (pseudo)random values, and then an inverse wavelet transform gives a new random map. We end up with N s = 501 samples of each depth slice in each model. The different samples have the same large-scale information, but the small-scale information differs. Examples of this are shown in Figure S10.

Statistical Calculations
Having created our samples in wavelet space, we then perform some statistical calculations on their real space maps. Each map is tiled into an 8 • × 8 • grid for analysis. The tile size is chosen to be large enough to cover the lateral extent of a plume tilting by about 10 • from vertical at the CMB in any direction. While it is possible for a mantle wind to cause tilting of over 40 • in geodynamic modeling (e.g., Steinberger, 2000), this would result in a tile size so large that any signature of (sub)vertical plumes, like those reported by FR15, would be covered by their surroundings. Since the tiles tessellate, a strongly tilting plume could be visible in a group of neighboring tiles. Tests with larger tile sizes showed similar results in terms of the general regions where plumes are found, but resolving separate plumes proved difficult, particularly in the Pacific where a cluster of plumes is expected, as discussed below. The main drawback of smaller tile sizes is a significant increase in computational cost.
We define the signal-to-noise ratio (SNR) in each tile t of all the maps i at depth z as where p t represents a map pixel in tile t, N p t is the number of pixels in tile t, v (z) s i is the ith velocity perturbation map at depth z, ⟩) 2 is the map of standard deviations in velocity perturbation computed over the ensemble of maps at depth z (e.g., Figure S10). Note that N p t is not constant between tiles, as HEALPix is an equal-area pixelization scheme. SNR reflects the overall velocity perturbation above noise in the tile, with negative values representing a slow region. We consider plumes to be defined as, very simply, vertically continuous low-velocity regions. As such, we expect SNR for plumes to be negative at all depths. On the other hand, ridges are expected to have a strongly negative signal only at shallow depths, and conversely, cratons should have a strongly positive signal at shallow depths. From this we can calculate the probability of each tile in the tomographic model being a plume. First, we fit a Gaussian to the N s = 501 samples of SNR at each depth and tile. Then, at a given depth z, we define the probability of there being a plume at a tile t as the cumulative distribution of the fitted Gaussian at SNR = 0, that is, where  is the Gaussian probability density function, and (tz) and (tz) are the sample mean and standard deviation, respectively, of SNR in tile t at depth z. Taking the mean over all depths then gives the overall probability of a plume at tile t, where N z is the number of depths considered. We have chosen to integrate in Equation 4 from −∞ to 0 as our definition of "plume" simply requires low velocity, which is given by SNR < 0, regardless of if the region is noise dominated. To help identify noise-dominated regions, we also define a confidence metric. Our plume probability definition is equivalent to integrating what is known as the mixture distribution, which is obtained by taking an equally weighted average of the distributions of SNR at each depth, that is, We define our confidence in the probability measured as the distance between the mean of this mixture distribution, mix , and 0, measured as a number of standard deviations mix ; that is, confidence n is given by We define this only for when mix ≤ 0, as this is where we will find plumes. Noise-dominated areas will have −1 < mix < 0 and thus a low confidence. However, since this metric describes how much SNR stands out from its uncertainty, rather than directly how much velocity perturbations stand out from the simulated noise in the model, its interpretation in terms of plume probability is not obvious. As such we use this metric more as a verification of our plume probability. Figure S11 in the Supporting Information shows graphical representations of plume probability and confidence in terms of SNR. Figure 3 shows the depth profiles of SNR for the 11 primary plumes of FR15 and 11 other "elsewhere" locations chosen to represent various tectonic settings, for example, ridges, cratons, subduction zones, and mountain belts. The results shown here are for the model SGLOBE-rani, with corresponding figures for the other models shown in the Supporting Information (Figures S12-S17). The colored lines show the mean SNR profile, and the corresponding shaded areas represent 2 standard deviations. As previously mentioned, we expect this profile to be negative for mantle plumes. This is generally the case, particularly at depths below 1,000 km, with the exception of Cape Verde that shows positive SNR in the transition zone (between ∼400 and ∼700 km depth) and of Comoros, which show slightly positive SNR between ∼2,000 and ∼2,600 km depth. Overall, the depth profile for Comoros shows the lowest SNR, while Samoa has a much stronger signal throughout much of the lower mantle than the other plumes. The depth profiles for the "elsewhere" locations show more variability and are distributed around the zero line, as can be expected from their different tectonic settings. Large separations between the profiles occur from about 1,500 km depth. Cratonic regions such as beneath Australia and Russia show the expected strongly positive signal down to ∼300 km. The East Pacific Rise shows the expected strongly negative signal down to ∼400 km depth, which is also seen in the slower spreading Mid-Atlantic ridge, albeit not so strongly pronounced. At Yellowstone, where the nature of volcanism is hotly debated, SNR oscillates between positive and negative regions. Among all the "elsewhere" depth profiles, the SNR beneath Namibia stands out by becoming strongly negative below ∼1,800 km depth. Figures S12-S17 in the Supporting Information show the SNR profiles for the other six global tomography models considered. They show overall similar characteristics to those in Figure 3, except for model SAVANI, which shows greater and smoother variability with depth than the other models, that may be a result of the smoothing regularization used (Table S1). Moreover, model S362WMANI+M shows larger uncertainties (i.e., larger standard deviations) than other models, most likely due to this model being expanded only up to a maximum angular degree of 16. As a result, the high-frequency noise we simulate for this model is much lower frequency relative to the other models, and so the features that are simulated are much larger, resulting in quite different velocity map samples. SEISGLOB2 shows strong increases in magnitude of SNR from 2,000 to 2,500 km depth at the plume locations and then a decrease in magnitude down to the CMB. This could again be a result of having more normal mode data. However, there are also some plume locations . Colors run from blue (low probability) to red (high probability). Middle column: Significant detections. Orange, brown, and red show regions where probability is greater than 68.2%, 95.4%, and 99.6%, respectively, and gray shows low probability. Right column: Confidence in probability measurements. Blue, orange, and red show regions where the mean of the mixture distribution of SNR is up to 1, 2, and 3 standard deviations, respectively, from 0, and gray shows where the mean is positive. Each row shows results for S20RTS, S40RTS, S362WMANI+M, SAVANI, SEMUCB-WM1, SGLOBE-rani, and SEISGLOB2 individually and all combined, respectively.

10.1029/2020GC009276
with positive SNR and lower magnitude SNR in the midmantle. This could be a result of using relatively fewer body wave traveltime measurements, which provide good constraints in the midmantle (e.g., Ferreira et al., 2019). Figure 4 shows the probability, significant detection (probabilities greater than 68.2%, 95.4%, and 99.6%), and confidence maps obtained for the seven tomographic models individually and for the seven models combined. When combining the seven models, we take the seven SNR data sets of N = 501 samples and combine them into one larger data set of N = 3,507 samples, rather than averaging them. In all cases, probabilities over 50% are concentrated in the Pacific and Africa regions (Figure 4, left column), indicating a possible influence from LLSVPs. The northern elongation of the high probability region toward Iceland beyond the African LLSVP is present in all models, as well as a slight extension eastward into the southern Indian Ocean. Additionally, the "Perm Anomaly" in western Russia found by Lekic et al. (2012) is also an area of high probability, particularly for S20RTS and S40RTS. It is also present in SGLOBE-rani and SEMUCB-WM1 but is less well detached from the African LLSVP.

Plume Probability and Confidence
The significant detection maps (Figure 4, middle column) show where probability of a plume is greater than 68.2%, 95.4%, and 99.6%. The confidence maps (Figure 4, right column) indicate which regions stand out from noise in orange and red. In all the models we find the highest probabilities in the eastern Pacific near the Tahiti, Macdonald, Pitcairn, and Marquesas hotspots, some of the primary plumes found by FR15. This area also tends to be where there is the highest confidence in the probability measurements, with confidence greater than 2 . Another area of high confidence and high probability consistent throughout the models, except S362WMANI+M, is at the East Africa hotspot (referred to as Afar in FR15). This high probability detection extends southward toward Kenya, but not northeast toward Afar. While much of the area above the African LLSVP has probability greater than 68.2%, much of this region is noise dominated (blue regions in confidence maps) except around the edge of the LLSVP and near hotspots. This is similar for the Pacific LLSVP. Reassuringly, all the primary plumes of FR15 are found to have probability above 68.2% in all models, except Comoros in S20RTS, S40RTS, and SGLOBE-rani. Table S2 in the Supporting Information details the probabilities found for these plumes and for all the hotspots of Courtillot et al. (2003). SGLOBE-rani and SEMUCB-WM1 also show high confidence and significance near Iceland. Interestingly, the location of the Iceland plume in SEMUCB-WM1 is confidently found further northeast-at the Jan Mayen hotspot-than reported by FR15. Additionally, we note that confidence is lost and generally lower probabilities are obtained when combining SNR data sets from all the models (Figure 4, bottom row). For completeness, we also performed our analysis for the P wave model GAP_P4 ( Figure S23 in the Supporting Information). We see the same pattern of higher probability and confidence in eastern Africa and the Pacific. A notable difference is that features near the Canaries and Cape Verde hotspots seem to have been detected better than in the shear wave models. We repeated our analysis with the bottom 1,000 km of the mantle removed to see the effect of the LLSVPs. The results for this are shown in Figure S18 in the Supporting Information. The regions of highest probability and confidence remain largely unchanged in this case, although notably, lower probabilities are present under central and southern Africa. We also repeated our analysis using a coarser depth discretization for SGLOBE-rani ( Figure S19), to reduce correlation between depth slices and somewhat eliminate effects of possible vertical smearing in the model. Again, differences are minimal. Since there are no particular regions with increased confidence or very high probability when excluding the lowermost mantle, we continue our analysis using the whole mantle. Figure 5 shows a vote map where each model gives a vote to the tiles where it has assigned a plume probability greater than 94%, with the LLSVPs of Cottaar and Lekic (2016) overlain. We use this very high probability threshold to account for relatively high probability regions with low confidence (e.g., orange and blue regions in the middle and right columns, respectively, of Figure 4). This threshold is the highest percentage point at which all seven agree on at least one tile, which is just south of the surface expression of the East Africa hotspot. The standout regions are again the eastern Pacific and along the East African Rift, each gaining at least five votes, although Afar only gains two. Iceland and Hawaii each gain up to three votes; the Canary Islands and Cape Verde earn none. An area of interest is around the Southwest Indian Ridge around the Marion, Kerguelen, and Crozet hotspots, as well as near the surface expression of the Amsterdam-St Paul hotspot further east and not investigated by Courtillot et al. (2003). While no tile in this area gains more than two votes, the proximity of multiple tiles with votes indicates the possibility of some plume-like feature. Indeed, the Amsterdam-St Paul has plume probabilities greater than 80% in S20RTS, S40RTS, SEMUCB-WM1, and SGLOBE-rani (Table S2). We note also that votes in Africa and the Indian Ocean appear around the edges of the African LLSVP. The case is similar in the Pacific, though not as clearly due to clustering of many votes in the area. Figure S20 in the Supporting Information shows additional vote maps with thresholds at 80%, 85%, 90%, and 95%. At all these thresholds, there are no particular regions away from current hotspots where four or more models give a vote.

Hotspots
We present the probabilities for all the hotspots of Courtillot et al. (2003) in Figure 6 as boxplots, with the detailed numerical results in Table S2. We also include the East Africa and Amsterdam-St Paul hotspots, while the Meteor hotspot is omitted as it lies in the same tile as the Bouvet hotspot and so has identical results. The boxplots show the interquartile range (IQR) of probabilities across all seven tomographic models, with the median probability shown as a vertical bar. Presenting the probabilities in this manner allows us to summarize plume probabilities at each hotspot without mixing models, which, as shown above, has some undesirable effects. The hotspots have been classified as "highly likely," "likely," "unclear," and "unlikely" plumes according to their lower quartile being greater than 80%, between 70% and 80%, between 50% and 70%, and lower than 50%, respectively. We clarify here that a hotspot being classed as "unlikely" does not necessarily mean that a plume is unlikely to exist here, rather the models do not show a robust, continuous slow feature. Figure 6 also shows a map of median per model confidence at the hotspot locations, identifying which hotspots stand out from noise. All of FR15's primary plumes are classified as "highly likely," except for Cape Verde, Iceland (both "likely"), and Comoros ("unclear"). Crozet, Marion, and Easter are also classed as "highly likely." In general, our "highly likely" and "likely" plumes tend to have better agreement between models, with smaller IQRs and occasionally a single low-end outlier. These are also the hotspots that stand out from noise the most, with the highest per model confidence. Many of the "unclear" plumes on the other hand have outliers at both ends of their IQR, indicating fairly strong disagreements between models at these hotspots. Amsterdam-St Paul is just below being classified as "likely," with a fairly large IQR countering the high probabilities in some models previously mentioned. From the top inset map in Figure  6 we can see that the "highly likely" plumes are mostly around the edges of the LLSVPs, as are many of the "likely" and "unclear" plumes. Some "highly likely" plumes also appear on top of the Pacific LLSVP. On the other hand, "unlikely" plumes, for example, Yellowstone, are found away from the LLSVPs, with the exception of Hoggar. Figure 6. Boxplots of probabilities across seismic tomography models at the hotspots of Courtillot et al. (2003), in addition to Amsterdam-St Paul and East Africa, and without Meteor. Colored boxes show the interquartile range (IQR) of probabilities, with the vertical line inside the box representing the median probability. Colors are based on our plume classification, which is determined by the lower quartile. The whiskers extend to the lowest probability within 1.5 IQR beneath the lower quartile (left end of boxes) and to the highest probability within 1.5 IQR above the upper quartile (right end of boxes). Any probabilities beyond 1.5 IQR from either quartile are considered an outlier, shown as black diamonds. The top inset map shows hotspot locations with colors corresponding to the plume classification, and the bottom inset map shows the median per model confidence at each hotspot. Dark patches are the LLSVPs of Cottaar and Lekic (2016). Coordinates of the hotspot locations can be found in Table S2 of the Supporting Information.

Discussion
The fact that there is a general loss of confidence in probability when the SNR data from all models are combined into one data set shows that combining tomographic models that have been constructed using different seismic data sets and methods does not lead to a more complete solution. In fact, information seems to have been lost, most likely due to discrepancies between models. As such it is best to assess each model independently, while looking for consistencies and differences between them.
Most tomography models show a high probability and a high confidence anomaly beneath the East Africa hotspot extending south through Kenya and slightly lower probabilities beneath Afar ( Figure 5). Additionally, all models have strongly negative SNR in the lower mantle beneath Namibia (Figures 3 and S9-S13). This is consistent with results from fluid dynamics experiments suggesting that at least three instabilities exist beneath South Africa, Tanzania/Kenya, and Afar, at different stages of evolution (Davaille et al., 2005). While South Africa is suggested to lay above a large superswell that has yet to reach the upper mantle, Tanzania/Kenya is above a deep mantle plume and Afar is above a possible dying plume that used up all the material in the CMB layer beneath the region (Davaille et al., 2005). Figure S21 shows the SNR profiles at the Afar hotspot for all the models. All except S362WMANI+M show SNR becoming less negative in the lowermost mantle, which could indicate that a plume here is indeed dying. Alternatively, the high probability beneath the East Africa hotspot and the lack of evidence for a deep mantle plume beneath Afar could also be consistent with the presence of a single superplume in the region rising from the CMB beneath Kenya and then deflecting toward Afar in the upper mantle (e.g., Ritsema et al., 1999). Since our probability analysis results in an average over all mantle depths, it is difficult to distinguish the nature of the Afar hotspot from the probabilities alone.
Hawaii is distinct in Figure 5; however, the many votes in the southern Pacific make it difficult to distinguish individual plumes here. This is consistent with what is observed when visually inspecting the models ( Figure S1). While all the Pacific hotspots (Macdonald, Marquesas, Tahiti, Pitcairn, and Samoa) have high probabilities across all the models (Figure 6), how distinct these plumes are is not clear from our analysis. However, our results are consistent with the interpretation of a dome with multiple plumes on top (e.g., Davaille, 1999). The dome, along with a low-velocity layer at the top of the mantle suggested to be due to the presence of partial melt (Tan & Helmberger, 2007), would cause low velocities throughout much of the mantle and hence generally high probabilities. The highest probabilities occurring near the hotspots would thus correspond to the plumes on top of the dome.
Clusters of individual votes around the Kerguelen, Marion, Crozet, and Amsterdam-St Paul hotspots in the southern Atlantic and Indian oceans, as well as around 2 confidence in this area (Figure 4), suggest that features have been detected here in some models, but a relative lack of seismic stations in the area makes exact locations difficult to resolve. Plumes at these locations are likely according to our classification. Indeed, FR15 classified Kerguelen and Crozet as clearly and somewhat resolved plumes, respectively, in model SEMUCB-WM1. While Amsterdam-St Paul is not mentioned at all in that study, our analysis shows that SEMUCB-WM1 does give a vote in that area with some confidence (Figure 4). Amsterdam-St Paul is also not considered at all as a possible deep-origin hotspot by Courtillot et al. (2003), though has high plume probabilities in multiple models. To the best of our knowledge, there is only one seismic study referring to this hotspot being associated to a plume in a P wave tomography model (D. Zhao, 2004Zhao, , 2007. Moreover, studies on the geochronology and isotopes of the lava composition (e.g., Janin et al., 2012) have shown that there must be a deep plume here, making it a major target for plume-ridge interaction studies (Janin et al., 2012). Finding high plume probabilities at this hotspot shows that our numerical method can find features that may be missed upon visual inspection of tomographic models. Moreover, for example, Boschi et al. (2007) did not detect a feature here from pixel picking, which may also be indicative of an improvement in more recent tomographic models. Of the 11 primary plumes identified in model SEMUCB-WM1 by FR15, all are found with probabilities greater than 75% in that model (Table S2). However, some, such as Tahiti and Iceland, are just next to an area of higher probability (Figure 4), which could be a result of deflections in the top 1,000 km. In the case of Iceland, the plume is confidently placed at the Jan Mayen hotspot, with slightly lower probability in Iceland. This indicates the possibility of two plumes in the area, as suggested by Rickers et al. (2013) or a single broad plume, a distinction which, similarly to Afar and the Pacific, is difficult to make from our analysis. Alternatively, the location could also be an edge effect of our relatively coarse 8 • × 8 • tiling.
More generally, out of the 50 hotspots listed in Courtillot et al. (2003) (plus Amsterdam-St Paul and East Africa), we find that 14 are highly likely to correspond to deep mantle plumes, with probability larger than 80%, and 13 are likely, with probability between 70% and 80%. Our classification agrees well with most likely deep plumes of Boschi et al. (2007), for example, Macdonald, Tahiti, East Africa, Hawaii, Samoa, and the Canaries. In their work, Boschi et al. (2007) made direct comparisons with geodynamic simulations, allowing them to account for deflections of plumes, so it is encouraging that we achieve similar results from the tomographic models alone and as such we expect Figure 6 to further contribute to the plume debate. The strongest differences are that Marquesas and Pitcairn were ranked much lower, and Reunion and Comoros were ranked much higher, in Boschi et al. (2007) than in our work. Given the improved agreement in the more recent tomographic models used in this study (Chang et al., 2015), these differences may be due to an improvement in the models. There is an overall good overlap between the highly likely plumes found in our study and the hotspots with the largest buoyancy fluxes estimated from hotspot swell volumes (Afar, Bermuda, Galapagos, Hawaii, Iceland, Macdonald, Marquesas, Pitcairn, Samoa, and Tahiti; King & Adam, 2014), apart from Bermuda and Galapagos. This would resolve previously reported discrepancies between deep mantle plumes inferred by visual analysis of tomographic models and flux estimations from hotspot swell data (King & Adam, 2014). However, a recent study has revised these flux measurements, taking into account a bias for faster moving plates (Hoggard et al., 2020). This revised down the fluxes at many of the Pacific hotspots, and many of our "unlikely" and "unclear" plumes have been revised up ( Figure S22). As such, it seems that discrepancies between tomographic models and flux estimations may remain. A notable example of unlikely deep mantle plume is Yellowstone, where the origin of volcanism is heavily debated (e.g., Nelson & Grand, 2018;Zhou et al., 2018). However, we note that a thin plume impinging on a subducting slab as has been suggested at Yellowstone (Zhou et al., 2018, and references therein) may not be detectable in the global tomography models due to resolution limitations. Quantitative approaches such as that used in this study should be the way forward to try to resolve this debate; however, as discussed below, it is possible that our particular approach has missed a thin plume here.
Our method is a powerful way to interpret tomography models objectively and numerically and has the advantage over simpler analyses in that it can take into account some uncertainty in the high-frequency information of the models. It does however have some limitations. The primary one is in how we define a plume based on our SNR test statistic. With the aim of being as general as possible, our definition does not take into account tilts of more than 10 • from the CMB, deflections in the midmantle, gaps, plume heads, and other complexities in plume behavior. The most one could do to account for these complexities would be to consider different depth ranges, weigh each depth differently, or use a larger tile size. Incorporating deflections into our plume definition based on where plumes seem to deflect by visual inspection, or tracing a plume in tiles depth by depth, would bias the result, meaning we cannot define a plume in tiles where plumes are not expected and the analysis would no longer be completely objective. We could, however, incorporate, for example, instantaneous mantle flow calculations based on tomographic models as in Yoshida (2012), defining a plume to have both low velocity and upward flow. Another consequence of our definition is that what we have described as "deep" plumes in this work effectively means "slow on average throughout the whole mantle," and as a result, thin plumes under subducted slabs can be missed, which may have been the case for Yellowstone. Future work may include a comprehensive analysis of all possible hotspots using, for example, regional tomography models. Moreover, in order to refine our plume classification, it will be valuable to combine seismic tomography information with other observables (hotspot tracks, rock compositions, etc).

Conclusions
We have used for the first time a spherical wavelet transform and random noise simulations to assess the probability of deep mantle plumes in seven global tomographic models. This is a completely objective, numerical assessment that is mostly automated, requiring only a few manual choices. Analysis of the set of 50 global hotspots reveals that 12 are highly likely associated with deep mantle plumes, with probability larger than 80%, and 12 are likely, with probability between 70% and 80%. The Amsterdam-St Paul hotspot is found to have high plume probabilities in most of the models analyzed, despite being scarcely discussed in the seismic tomography literature. This shows the potential of our numerical assessment for finding features not found upon visual inspection. Future seismic instrumentation in the region of Amsterdam-St Paul should provide some valuable insights on plume-ridge interactions. We have also shown that combining the information from multiple tomography models leads to an overall loss of information and confidence, and thus, it is important to assess models independently, prior to looking for similarities. Further work can take advantage of the tool used in this study to analyze the probability of other features in tomographic models such as subduction zones, or plumes could be analyzed in more detail in regional models.

Data Availability Statement
Code for performing noise simulations and SNR measurements is available in the online repository (Marignier, 2020).