Volume 125, Issue 7 e2020JB019714
Research Article

Optimization of the Match-Filtering Method for Robust Repeating Earthquake Detection: The Multisegment Cross-Correlation Approach

Dawei Gao (Corresponding Author)

School of Earth and Ocean Sciences, University of Victoria, Victoria, British Columbia, Canada

Pacific Geoscience Centre, Geological Survey of Canada, Sidney, British Columbia, Canada

Correspondence to: D. Gao and H. Kao, [email protected]; [email protected]

Honn Kao (Corresponding Author)

School of Earth and Ocean Sciences, University of Victoria, Victoria, British Columbia, Canada

Pacific Geoscience Centre, Geological Survey of Canada, Sidney, British Columbia, Canada

First published: 23 June 2020

Abstract

Waveform match-filtering (MF), based on cross-correlation between an earthquake pair, is a powerful and widely used tool in seismology. However, its performance can be severely affected by several factors, including the length of the cross-correlation window, the frequency band of the applied digital filter, and the presence of a large-amplitude phase(s). To optimize the performance of MF, we first systematically examine the effects of different operational parameters and derive general rules for selecting the window length and the optimal frequency passband. To minimize the influence of a large-amplitude phase(s), we then propose a new approach, namely, MF with multisegment cross-correlation (MFMC). By weighting the contributions from the various segments of the waveforms equally, this new approach is much more sensitive to small separations between two sources than the conventional MF method, which uses the entire waveform template. To compare the reliability and effectiveness of both methods in capturing interevent source separation and identifying repeating earthquakes, we systematically conduct experiments with both synthetic data and real observations. The results demonstrate that the conventional MF method can detect the existence of an event but sometimes lacks the resolution to tell whether the template and detected events are co-located, whereas MFMC works in all the cases tested. The far-reaching implication of this study is that inferring the source separation between an earthquake pair based on the conventional MF method, particularly with data from a single channel/station, may not be reliable.

Key Points

  • Selecting a proper template window length and an optimal filter is of great importance in optimizing the match-filtering method
  • High cross-correlation coefficients obtained by conventional match-filtering do not necessarily imply small separation between two events
  • Match-filtering with multisegment cross-correlation (MFMC) is much more effective and reliable in discriminating source separations

1 Introduction

The match-filtering (MF) method uses waveform cross-correlation to determine the similarity between a pair of events. It is a powerful tool in modern seismology to identify repeating earthquakes (e.g., Huang & Meng, 2018; Igarashi et al., 2003; Matsuzawa et al., 2004; Meng et al., 2015; Nadeau et al., 1995; Naoi et al., 2015; Schaff & Richards, 2004, 2011; Schmittbuhl et al., 2016; Uchida et al., 2003; Yamashita et al., 2012) and to detect events that can be easily missed by conventional phase arrival-based methods (e.g., Chamberlain & Townend, 2018; Gibbons & Ringdal, 2006; Peng & Zhao, 2009; Ross et al., 2019; Shelly et al., 2007; Schultz et al., 2014, 2017; Skoumal et al., 2014, 2015, 2019; Warren-Smith et al., 2017, 2018; Zhang & Wen, 2015). The extensive applications of this technique have led to major observational breakthroughs (e.g., Shelly et al., 2007).

The cross-correlation coefficient (CC), a value characterizing the degree of similarity between waveforms, is often taken as the sole criterion in MF for repeater identification (e.g., Buurman & West, 2010; Ma & Wu, 2013; Schaff & Richards, 2004, 2011) and earthquake detection (e.g., Zhang & Wen, 2015). When dealing with repeating earthquakes with nearly identical waveforms recorded at one or more common stations, it is commonly assumed that events with a very high CC belong to the same cluster and physically represent repeated ruptures in the vicinity of the same patch of the same fault. The CC thresholds employed typically range from 0.75 to 0.98 (e.g., Bohnhoff et al., 2017; Hayward & Bostock, 2017; Huang & Meng, 2018; Igarashi et al., 2003; Matsuzawa et al., 2004; Meng et al., 2015; Nadeau & McEvilly, 1999; Naoi et al., 2015; Schaff & Richards, 2004, 2011; Schmittbuhl et al., 2016; Schultz et al., 2014; Uchida et al., 2003).

Little attention has been paid to the operational parameters, such as the cross-correlation window length and filter frequency band, that can significantly affect the CC values. Because the calculation of CC is most sensitive to the phase(s) with large amplitudes within the template window (often the shear wave and surface waves) (e.g., Buurman & West, 2010; Calderoni et al., 2015; Li et al., 2017; Myhill et al., 2011; Schmittbuhl et al., 2016), the contribution of other phases with low amplitudes, such as depth phases (Ma, 2010; Ma & Atkinson, 2006) and coda waves (Robinson et al., 2011; Snieder & Vrijlandt, 2005), which contain additional source location information, can be overwhelmed. Thus, a very high CC due to the match of a specific high-amplitude phase(s) does not necessarily represent a small distance between two hypocenters. In other words, the traditional waveform cross-correlation-based MF approach may be an excellent tool to detect events, but whether they are truly co-located or not remains to be verified.

The focus of this study is to understand the limitations of the traditional MF method and to develop a new approach that can reliably identify repeating earthquakes, especially with data from a single station due to sparse network coverage. Even in regions with excellent network coverage and high station density (e.g., Japan), it is common practice to use only two stations in searching for repeating earthquakes because of their generally low magnitudes (e.g., Igarashi et al., 2003; Matsuzawa et al., 2004; Nishikawa & Ide, 2018; Uchida et al., 2003). To optimize the performance of MF, we first investigate how the operational parameters influence CC values and define general rules for specifying the window length and the frequency passband of the applied filter. To minimize the effects of a large-amplitude phase(s) present in the waveforms, we propose a modification to the existing MF: match-filtering with multisegment cross-correlation (MFMC). This new approach is very effective in recognizing the difference in hypocenter location between a pair of events because it weights the contributions from all segments of the waveforms equally. A slight hypocenter shift that causes subtle changes in the waveforms is reflected in a sudden drop of CC values with the new approach. Finally, we verify the effectiveness of the MFMC method by applying it to both synthetic data and real observations. The results demonstrate that the MFMC approach can unambiguously discriminate event pairs with small separation from those with large separation in either the horizontal or the vertical direction.

2 Factors Affecting CC Values

Traditionally, the waveform similarity between two events, compared over windows of the same length, is defined by the normalized CC:

$$CC = \frac{\sum_{i=1}^{n}(a_i-\bar{a})(b_i-\bar{b})}{\sqrt{\sum_{i=1}^{n}(a_i-\bar{a})^2\,\sum_{i=1}^{n}(b_i-\bar{b})^2}} \qquad (1)$$

where a and b correspond to the discrete time series of the two events, $\bar{a}$ and $\bar{b}$ are the mean values of each time series, and n is the total number of samples. CC ranges from −1 (reversed shapes) to 1 (identical shapes), and a CC value of 0 represents no correlation (i.e., totally different shapes). For earthquake detection and repeater identification, part or all of the wave train of a well-located event (known as the template event) is normally used as a template to scan through continuous waveforms. CC is usually calculated at every sample point, and once the computed CC exceeds a certain threshold, an event is declared. A few studies take additional steps to confirm the detection (e.g., Warren-Smith et al., 2017, 2018).
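To make Equation 1 and the scanning procedure concrete, the following is a minimal NumPy sketch, not the authors' code. It assumes the template and the continuous record share one sampling rate; the function names (`normalized_cc`, `scan_template`, `detect`) and the 0.9 detection threshold are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def normalized_cc(a, b):
    """Equation 1: mean-removed, normalized CC of two equal-length series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    denom = np.sqrt(np.sum(da * da) * np.sum(db * db))
    return float(np.sum(da * db) / denom) if denom > 0 else 0.0


def scan_template(template, continuous):
    """Evaluate Equation 1 at every sample offset of the template."""
    tpl = np.asarray(template, dtype=float)
    tpl = tpl - tpl.mean()
    # One row per candidate window of the continuous record.
    win = sliding_window_view(np.asarray(continuous, dtype=float), tpl.size)
    win = win - win.mean(axis=1, keepdims=True)
    num = win @ tpl
    den = np.sqrt(np.sum(win * win, axis=1) * np.sum(tpl * tpl))
    # Flat (zero-variance) windows get CC = 0 instead of a division by zero.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def detect(cc, threshold=0.9):
    """Sample offsets where the CC trace reaches the (hypothetical) threshold."""
    return np.flatnonzero(cc >= threshold)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.standard_normal(50)
    record = rng.standard_normal(1000)
    record[300:350] = 2.0 * template  # a scaled copy of the template event
    cc = scan_template(template, record)
    print(detect(cc))  # the embedded copy at offset 300 is flagged
```

Because CC is normalized, the scaled copy still scores 1.0; amplitude differences between the template and a detection do not affect the match.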

2.1 Window Length

CC is clearly a function of the time window length. However, there is no standard rule for selecting time windows for CC calculation. Different studies favor different window lengths, ranging from seconds to minutes, that generally cover the high-amplitude phases (e.g., Huang & Meng, 2018; Igarashi et al., 2003; Nadeau et al., 1995; Ross et al., 2019; Schaff & Richards, 2004, 2011; Schultz et al., 2017; Skoumal et al., 2014; Zhang & Wen, 2015). A shorter time window (normally containing at least one high-amplitude phase) is more likely to yield a higher CC value. In the extreme case, the CC will be exactly ±1 if the window contains only two data points, because once the means are removed in Equation 1, each series reduces to a pair of values of equal magnitude and opposite sign. In contrast, a longer time window compares all the phases of interest within that window and tends to result in a relatively lower CC value.

Because the total length of a seismic wave train increases with source-receiver distance, using a time window with a fixed length can only work for specific cases. To properly consider the distance effect, a more general and reasonable choice is to use a dynamic window based on the differential traveltime between the P and S phases (e.g., Baisch et al., 2008), i.e.,
$$T_{win} = k\,(T_s - T_p) \qquad (2)$$
where Twin is the window length, k is a constant, and Ts and Tp represent the S wave and P wave arrival times, respectively. If the window starts from the P onset, k is set to 3, which is similar to the preferred value proposed by Baisch et al. (2008). If the window starts from the S onset, k is usually 2.
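A minimal sketch of Equation 2 in Python follows. It assumes arrival times are given in seconds from the start of the trace and uses a hypothetical default sampling rate of 100 Hz; the function name `template_window` is illustrative.

```python
def template_window(t_p, t_s, start="P", fs=100.0):
    """Dynamic template window from Equation 2, T_win = k (T_s - T_p).

    k = 3 when the window starts at the P onset and k = 2 when it starts at
    the S onset, following the text. Times are in seconds from the trace
    start; fs is the sampling rate in Hz (100 Hz is only an assumed default).
    Returns the window length and the [start, end) sample indices.
    """
    if t_s <= t_p:
        raise ValueError("S arrival must follow P arrival")
    k = {"P": 3.0, "S": 2.0}[start]
    t_win = k * (t_s - t_p)
    t0 = t_p if start == "P" else t_s
    i0 = int(round(t0 * fs))
    return t_win, i0, i0 + int(round(t_win * fs))


if __name__ == "__main__":
    # P at 2.0 s and S at 3.5 s give T_s - T_p = 1.5 s.
    print(template_window(2.0, 3.5, "P"))  # (4.5, 200, 650)
    print(template_window(2.0, 3.5, "S"))  # (3.0, 350, 650)
```

Both choices end at the same sample because Ts + 2(Ts − Tp) = Tp + 3(Ts − Tp); starting from the S onset simply drops the P-to-S part of the window.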

2.2 Frequency Band of the Digital Filter

A variety of filters ranging from very broad to very narrow frequency bands have been adopted by different studies before performing waveform cross-correlation (Table 1). To test how different filters can affect CC values, we apply some commonly used band-pass filters (Table 1) as well as high-/low-pass filters to a real waveform example.

Table 1. A List of Digital Filters Commonly Used in Waveform Match-Filtering
| Filter (Hz) | References |
| --- | --- |
| 1–20 | Bao and Eaton (2016) |
| 1–10 | Schmittbuhl et al. (2016) and Warren-Smith et al. (2017, 2018) |
| 0.8–8 | Schultz et al. (2014) |
| 1–8 | Shelly et al. (2007), Meng et al. (2015), and Huang and Meng (2018) |
| 2–8 | Peng and Zhao (2009) and Zhang and Wen (2015) |
| 0.5–5 | Schaff and Richards (2004, 2011) and Schaff (2010) |
| 0.8–5 | Schultz et al. (2015) |
| 1–5 | Schultz et al. (2017) |
| 1–4 | Igarashi et al. (2003), Uchida et al. (2003), Matsuzawa et al. (2004), Meng et al. (2015), and Huang and Meng (2018) |

In Figure 1, we present an event pair with similar waveforms that occurred in northeastern British Columbia, Canada (Visser et al., 2017). This pair (Nos. 3258 and 3257) is characterized by a very high signal-to-noise ratio (SNR) (see waveforms in Figure 1a and amplitude spectra in Figure 1c). To calculate the CC value of the event pair, we cut a template of length 2(Ts − Tp), starting at the S wave arrival, from one event. The template is then slid from 0.2 s before to 0.2 s after the S arrival of the other event, with a step of one sample. A ±0.2 s shift should be adequate to account for any manual phase-picking error. The maximum CC obtained during the sliding is defined as the final CC value of the event pair. Using raw or band-pass-filtered waveforms yields similar cross-correlation results, except for the 2–8 Hz band-pass filter (Figure 1b). In this particular case, the 2–8 Hz band-pass filter is an improper choice because it removes the dominant frequency band between 1 and 2 Hz in the waveform (Figure 1c), making the seismic waveforms less similar (Figure 1a, bottom), as shown by the sudden drop of the CC value (Figure 1b).
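The pair-wise procedure described above (an S wave template of length 2(Ts − Tp), a ±0.2 s search in one-sample steps, and the maximum CC retained) can be sketched with NumPy and SciPy as follows. The fourth-order zero-phase Butterworth filter and the helper names (`bandpass`, `pair_cc`) are assumptions for illustration; the text does not specify the filter design.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(x, fmin, fmax, fs, order=4):
    """Zero-phase Butterworth band-pass (the filter order is an assumption)."""
    sos = butter(order, [fmin, fmax], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def normalized_cc(a, b):
    """Equation 1 for two equal-length arrays."""
    da, db = a - a.mean(), b - b.mean()
    return float(np.sum(da * db) / np.sqrt(np.sum(da * da) * np.sum(db * db)))


def pair_cc(tr1, tr2, s1, s2, ts_minus_tp, fs, max_shift=0.2):
    """Maximum CC of an S wave template slid +/- max_shift seconds.

    tr1, tr2: traces of the template event and the other event;
    s1, s2: S-pick sample indices; template length is 2(Ts - Tp).
    """
    n = int(round(2.0 * ts_minus_tp * fs))
    tpl = tr1[s1:s1 + n]
    shift = int(round(max_shift * fs))
    best = -1.0
    for lag in range(-shift, shift + 1):  # one-sample steps
        i = s2 + lag
        if 0 <= i and i + n <= len(tr2):
            best = max(best, normalized_cc(tpl, tr2[i:i + n]))
    return best


if __name__ == "__main__":
    fs = 100.0
    rng = np.random.default_rng(1)
    tr1 = rng.standard_normal(3000)
    tr2 = np.roll(tr1, 7)  # same waveform, S pick off by 0.07 s
    raw = pair_cc(tr1, tr2, 1200, 1200, 1.2, fs)
    filt = pair_cc(bandpass(tr1, 1.0, 10.0, fs), bandpass(tr2, 1.0, 10.0, fs),
                   1200, 1200, 1.2, fs)
    print(raw, filt)
```

Running `pair_cc` on the same pair with different (fmin, fmax) choices, for example 1–10 Hz versus 2–8 Hz, gives the kind of comparison summarized in Figure 1b.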

Figure 1. Effect of band-pass filtering on the result of waveform cross-correlation (CC). (a) Normalized waveforms of a pair of events (Nos. 3258 and 3257 taken from Visser et al., 2017) aligned according to S wave arrivals. (top) Original and (bottom) filtered waveforms. CC value determined from the yellow shaded segment is labeled for each panel. (b) CC values determined after applying some commonly used band-pass filters. (c) Amplitude spectra of each event and its corresponding pre-signal noise. Pink bar outlines the dominant frequency band of seismic energy. (d) The coherence function of this event pair.

We reach the same conclusion when various high-pass filters are applied. Once the dominant seismic energy (in the 1–2 Hz frequency band) is filtered out, the CC value drops significantly (Figure S1a in the supporting information). With a low-pass filter, however, the CC can be extremely high (>0.99) even though the dominant frequency energy has been removed (corner frequency ≤1 Hz, Figure S1b). This is because the remaining frequency content of the signal becomes so narrow banded that the waveforms of these two events are essentially identical (Figure S1c). Another way of verifying the selection of frequency passband is to examine the coherence function between the two signals (Figure 1d). Overall, the coherence function provides consistent information that the two waveforms match best at lower frequencies. However, it cannot identify which frequency band (1–2 Hz in this case) we have to keep in order to boost the waveform similarity.
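The coherence check can be sketched with `scipy.signal.coherence`, which estimates the magnitude-squared coherence using Welch's method. The segment length (`nperseg`), the 0.8 level used to report well-matched frequencies, and the helper name `coherent_frequencies` are illustrative assumptions, not values from the study.

```python
import numpy as np
from scipy.signal import butter, coherence, sosfiltfilt


def coherent_frequencies(x, y, fs, nperseg=256, level=0.8):
    """Magnitude-squared coherence of two aligned records (Welch estimate).

    Returns the frequency axis, the coherence, and the frequencies where the
    coherence reaches `level` (an arbitrary, illustrative cut-off).
    """
    f, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    return f, cxy, f[cxy >= level]


if __name__ == "__main__":
    fs = 100.0
    rng = np.random.default_rng(2)
    # Toy records: a shared 1-3 Hz "source" signal plus independent noise.
    sos = butter(4, [1.0, 3.0], btype="bandpass", fs=fs, output="sos")
    shared = sosfiltfilt(sos, rng.standard_normal(2**14))
    x = shared + 0.05 * rng.standard_normal(shared.size)
    y = shared + 0.05 * rng.standard_normal(shared.size)
    f, cxy, good = coherent_frequencies(x, y, fs)
    print(cxy[(f >= 1.5) & (f <= 2.5)].mean(), cxy[f >= 20.0].max())
```

As the text notes, a high coherence shows where the two records agree, but it does not by itself say which band must be kept to maximize CC.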

Taken together, CC is definitely a function of the frequency band of the applied digital filter. Choosing a proper filter is very important when performing waveform cross-correlation. By checking amplitude spectra of both signal and presignal noise, we can easily design a filter that keeps the dominant seismic energy while reducing background noise effectively. We note, however, that keeping the dominant energy in the signal does not mean choosing a filter with a very narrow frequency bandwidth, which may lead to meaningless correlation and even false detections (Carrier & Got, 2014; Dodge & Walter, 2015; Harris, 1991; Schaff, 2008). Instead, we could consider the time-bandwidth product of the seismogram that characterizes the amount of information contained in the signal to determine the optimal bandwidth. With a larger time-bandwidth product, i.e., longer signal duration and wider frequency bandwidth, the cross-correlation can be of greater statistical significance (Schaff, 2008; Schaff et al., 2018).
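One simple way to turn the signal and presignal-noise amplitude spectra into a passband, and to report the resulting time-bandwidth product, is sketched below. The spectral-ratio criterion (ratio = 3), the requirement of equal-length windows, the lack of tapering or spectral smoothing, and the helper names are all assumptions made for brevity; real data would call for tapered, smoothed spectra.

```python
import numpy as np


def snr_passband(signal, noise, fs, ratio=3.0):
    """[fmin, fmax] spanning all nonzero frequencies where the signal
    amplitude spectrum exceeds `ratio` times the noise amplitude spectrum.

    Assumes equal-length signal and presignal-noise windows so that both
    spectra share one frequency grid (a simplification).
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if signal.size != noise.size:
        raise ValueError("use equal-length signal and noise windows")
    f = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    s_amp = np.abs(np.fft.rfft(signal - signal.mean()))
    n_amp = np.abs(np.fft.rfft(noise - noise.mean()))
    good = f[(f > 0) & (s_amp > ratio * n_amp)]  # skip the DC bin
    if good.size == 0:
        raise ValueError("no frequency passes the SNR criterion")
    return float(good.min()), float(good.max())


def time_bandwidth(duration, fmin, fmax):
    """Time-bandwidth product: signal duration (s) times bandwidth (Hz)."""
    return duration * (fmax - fmin)


if __name__ == "__main__":
    fs = 100.0
    t = np.arange(1000) / fs  # a 10 s window
    sig = sum(np.sin(2 * np.pi * f0 * t) for f0 in (1.0, 1.5, 2.0))
    noise = 0.01 * np.random.default_rng(4).standard_normal(1000)
    fmin, fmax = snr_passband(sig, noise, fs)
    tb = time_bandwidth(10.0, fmin, fmax)
    print(f"{fmin:.1f}-{fmax:.1f} Hz, TB = {tb:.1f}")  # 1.0-2.0 Hz, TB = 10.0
```

Widening the band (or lengthening the window) raises the time-bandwidth product, which is the quantity the text links to the statistical significance of a correlation.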

We caution that the choice of frequency band should also depend on the source size and rupture process when identifying repeaters (Uchida, 2019). A predominantly low-frequency passband may not have sufficient spatial resolution to identify neighboring events whose rupture areas do not overlap, whereas a passband focusing on very high frequencies may increase the likelihood of excluding true repeaters that occur in the same source area but with different rupture characteristics (e.g., the rupture nucleation point and/or directivity). Readers interested in further discussion of this aspect are referred to Harris (1991), Schaff (2008), and Uchida (2019).

2.3 Large-Amplitude Phase(s)

As outlined in section 1, the conventional way of calculating CC is most sensitive to the large-amplitude phase(s) within the window. For demonstration purposes, we show one synthetic example here. The configuration of this experiment is given in Figure 2a. We place two strike-slip events at the center of an array at depths of 3 km (Event 1) and 13 km (Event 2), respectively. The stations are 5 km from the epicenter. We calculate the synthetic seismograms using the frequency-wavenumber approach (Zhu & Rivera, 2002). The velocity model used in the calculation (Figure 2b) is taken from the ToC2ME experiment (Eaton et al., 2018; Tan et al., 2019). The source time function is a simple triangle with a duration of 0.1 s, suitable for small earthquakes (Harrington & Brodsky, 2009). All synthetic seismograms are filtered in the range of 1–10 Hz, a frequency band typical for local earthquakes (Warren-Smith et al., 2017).

Figure 2. Synthetic experiment to test the performance of the match-filtering (MF) method. (a) Station-source setup. Brown triangles mark the seismic stations with azimuths labeled below the symbols. Open star represents the common epicentral location of Events 1 and 2. Inset shows the focal mechanism used in the synthetic waveform calculation. (b) Velocity model from Eaton et al. (2018). Red and blue arrows mark the depths of Events 1 and 2, respectively. Red and black arrows also denote the template depths used in the tests in section 4. (c and d, top) Band-pass-filtered (1–10 Hz) and normalized synthetic seismograms of Event 1 at Station 1 and Station 2, respectively. Dashed fuchsia boxes show the template windows with one segment used in the conventional MF method. Color shaded areas represent the four segments employed in the MFMC method. Dashed thick black line indicates the surface wave train after the S phase. (c and d, middle) Band-pass-filtered (1–10 Hz) and normalized synthetic seismograms of Event 2 at Station 1 and Station 2, respectively. Template from Event 1 is superposed at the location of the best match according to the conventional MF method. (c and d, bottom) The cross-correlation results computed with the conventional and MFMC methods.

The east-west component of Event 1 at Station 1 (azimuth = 315°) exhibits clear P wave, S wave, and surface wave trains (Figure 2c, top). The amplitudes of the P and S waves are comparable because Station 1 is located at an antinode for P and SV waves and at a node for the SH wave. In contrast, the P wave and surface waves of Event 2 are obscure (Figure 2c, middle) because of its much greater focal depth. Here we employ a template window that starts from the P wave onset of Event 1, with a length of 3(Ts − Tp), to cross-correlate the corresponding segment of Event 2 using the traditional method (Equation 1). The low CC indicates that these two events are not correlated owing to the very large difference in their hypocentral locations (Figure 2c, bottom).

However, if we take a closer look at the east-west component of Station 2 (azimuth = 30°), both events show a very strong S wave yet very little P and surface waves (Figure 2d, top and middle). Even though these two events are 10 km apart in depth, with very different S-P differential traveltimes, and most of the waveforms before and after the S waves are very different (Figure 2d, middle, gray shaded box), the CC computed with the conventional approach remains very high (>0.9) simply because the high-amplitude S phases match (Figure 2d, bottom). Therefore, a high CC obtained through the conventional waveform cross-correlation method cannot guarantee a small spatial separation, especially when one or a few large-amplitude phases dominate the waveform in the calculation window.

3 Proposed Alternative Form of Cross-Correlation

3.1 Theory

As shown in Figure 2d, the traditional approach is very sensitive to the large-amplitude phase(s), and thus, the result may not be reliable in identifying repeating earthquakes. To overcome this issue, we introduce the MFMC method. The first step is to divide the template (with n samples) into Nseg segments of equal lengths (n/Nseg). By shifting the segments together along the continuous waveform, we perform the cross-correlation calculation for each segment and its corresponding waveform, i.e.,
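$$cc_j = \frac{\sum_{i=(j-1)m+1}^{jm}(a_i-\bar{a}_j)(b_i-\bar{b}_j)}{\sqrt{\sum_{i=(j-1)m+1}^{jm}(a_i-\bar{a}_j)^2\,\sum_{i=(j-1)m+1}^{jm}(b_i-\bar{b}_j)^2}}, \quad j = 1, \ldots, N_{seg} \qquad (3)$$

where m = n/Nseg is the segment length and $\bar{a}_j$ and $\bar{b}_j$ are the mean values of the jth segment of each time series. The final CC at each sample point is the equally weighted average of the segment values:

$$CC = \frac{1}{N_{seg}}\sum_{j=1}^{N_{seg}} cc_j \qquad (4)$$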
The parameter Nseg essentially acts as a weighting factor for each segment (Equation 4). The value of Nseg can be flexible in theory. If Nseg equals 1, the new method using a template with only one segment is identical to the conventional approach (Equation 1). Using a larger Nseg divides the template into more segments and is more sensitive to the waveform changes of another event due to location difference but increases the computing time. A practical way of assigning Nseg is to consider the cycles of the longest period wave (1/fmin) in the band-pass-filtered template, i.e.,
$$N_{seg} = \frac{T_{win}}{1/f_{min}} = T_{win}\,f_{min} \qquad (5)$$

Based on our experience, a minimum Nseg of 4 is required to achieve the optimal performance of our new method. In Figure 3, we summarize the work flow of MFMC.
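The MFMC steps (Equations 3 and 4, with Nseg guided by Equation 5) can be sketched in NumPy as below. This is an illustrative sketch rather than the authors' implementation: the helper names, dropping leftover samples when n is not divisible by Nseg, and the floor-plus-minimum-of-4 rule in `choose_n_seg` are assumptions.

```python
import numpy as np


def normalized_cc(a, b):
    """Equation 1 applied to one segment (mean removed within the segment)."""
    da, db = a - a.mean(), b - b.mean()
    denom = np.sqrt(np.sum(da * da) * np.sum(db * db))
    return float(np.sum(da * db) / denom) if denom > 0 else 0.0


def choose_n_seg(t_win, f_min, minimum=4):
    """Assumed rule: cycles of the longest period (T_win * f_min), at least 4."""
    return max(minimum, int(np.floor(t_win * f_min)))


def mfmc_scan(template, data, n_seg):
    """Equal-weight average of per-segment CCs (Equations 3 and 4).

    All segments are shifted together, giving one value per sample offset
    of the template along `data`. Samples left over after splitting into
    n_seg equal segments are dropped (an assumption). n_seg = 1 reproduces
    the conventional CC of Equation 1.
    """
    template = np.asarray(template, dtype=float)
    data = np.asarray(data, dtype=float)
    m = template.size // n_seg  # segment length n / N_seg
    used = m * n_seg
    if data.size < used:
        raise ValueError("data shorter than the template")
    out = np.empty(data.size - used + 1)
    for k in range(out.size):
        window = data[k:k + used]
        out[k] = np.mean([normalized_cc(template[j * m:(j + 1) * m],
                                        window[j * m:(j + 1) * m])
                          for j in range(n_seg)])
    return out


if __name__ == "__main__":
    # Toy pair: three weak segments of opposite polarity plus one strong,
    # identical "S-like" segment that dominates the conventional CC.
    t = np.arange(100)
    seg = np.sin(2 * np.pi * t / 25)  # four full cycles per segment
    a = np.concatenate([seg, seg, seg, 10 * seg])
    b = np.concatenate([-seg, -seg, -seg, 10 * seg])
    print(mfmc_scan(a, b, 1)[0], mfmc_scan(a, b, 4)[0])  # ~0.942 and -0.5
```

In this toy pair, the strong, identical final segment keeps the conventional CC near 0.94 (97/103), while MFMC, which gives the three mismatched weak segments equal weight, returns −0.5. This mirrors the Station 2 behavior described for Figure 2d.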

Figure 3. A flowchart illustrating the steps of match filtering with multisegment cross-correlation (MFMC).

To demonstrate the strength of our new method, we apply it with four segments to the two synthetic events discussed in section 2.3. For the east-west component of Station 1, both the old and new methods yield very similar, low CC values (Figure 2c, bottom), suggesting that these two events are poorly correlated. For the same component of Station 2, the CC value obtained with our new method is significantly lower than that obtained with the conventional method, correctly indicating the large spatial separation between these two events. This simple experiment demonstrates that the multisegment approach is much more effective in recognizing subtle changes in the waveforms due to source location differences.

3.2 Effect of Background Noise

To test how sensitive our new method is to the level of background noise, we conduct several additional experiments. First, we take the three components of Event 1 at Station 2 as the signal (Figure 4a) and generate Gaussian noise (Figure 4b) with NumPy (Walt et al., 2011; https://numpy.org). The noise is filtered in the same frequency range as the signal. We then make noisy seismograms by adding noise, scaled according to a given SNR, to the signal. The SNR in this paper is defined as the ratio of the mean absolute amplitude of the signal to that of the noise, i.e.,
$$\mathrm{SNR} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{signal}_i\right|}{\frac{1}{m}\sum_{j=1}^{m}\left|\mathrm{noise}_j\right|} \qquad (6)$$
where n and m are the numbers of samples in the signal and noise waveforms, respectively. Note that, when calculating the mean absolute amplitude of the signal, we purposely exclude the samples in the segment with the largest amplitude (e.g., the second segment in Figure 4a, top) to emphasize the comparison between low-amplitude phases and noise. Waveform cross-correlation is carried out with both the old and new methods between the signal and the noisy seismograms. For each assumed SNR, we repeat the experiment 100 times to account for the randomness of the noise.
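The noise-scaling step implied by Equation 6, including the option of excluding the largest-amplitude segment from the signal level, can be sketched as follows. The helper name `add_noise_at_snr`, the toy signal, the 100-realization loop, and the use of `np.corrcoef` (Equation 1 at zero lag) are illustrative choices; in the study, the noise is additionally band-pass filtered to match the signal.

```python
import numpy as np


def add_noise_at_snr(signal, noise, snr, exclude=None):
    """Scale `noise` so that Equation 6 gives the requested SNR, then add it.

    signal, noise: equal-length arrays (noise assumed already filtered like
    the signal). exclude: optional (start, end) sample range, e.g. the
    largest-amplitude segment, left out of the signal's mean absolute
    amplitude as described in the text.
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    keep = np.ones(signal.size, dtype=bool)
    if exclude is not None:
        keep[exclude[0]:exclude[1]] = False
    signal_level = np.mean(np.abs(signal[keep]))
    noise_level = np.mean(np.abs(noise))
    scaled = noise * signal_level / (snr * noise_level)
    return signal + scaled, scaled


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    sig = np.sin(2 * np.pi * np.arange(400) / 20.0)
    sig[100:200] *= 10.0  # stand-in for a dominant S wave segment
    ccs = []
    for _ in range(100):  # 100 noise realizations for one SNR
        noisy, scaled = add_noise_at_snr(sig, rng.standard_normal(400), 2.0,
                                         exclude=(100, 200))
        ccs.append(np.corrcoef(sig, noisy)[0, 1])
    print(np.mean(ccs), np.std(ccs))
```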
Figure 4. A test on the effect of background noise. (a) Normalized waveforms of Event 1 at Station 2. Symbols are the same as in Figure 2d. (b) Normalized Gaussian noise filtered between 1 and 10 Hz. (c) Curves showing the relationship between CC and SNR. Thin gray and light pink lines represent the results of the conventional MF and MFMC methods, respectively, for each noise realization. Thick black line denotes the average of 100 noise realizations for the conventional method, and the fuchsia line shows that for the MFMC method.

The results are summarized in Figure 4c. With a high SNR (≥6), the effect of background noise is negligible for both the conventional MF and MFMC methods. With a low SNR, however, the influence from noise becomes much more pronounced. Overall, the CC value calculated with the multisegment approach is lower and drops faster than that obtained with the conventional method. This suggests that our new approach is more sensitive to the level of background noise as, intrinsically, it emphasizes minor details in the waveform.

Furthermore, we notice that the noise sensitivity of the CC curves derived with the conventional MF method varies among channels, whereas the curves obtained with MFMC are sensitive to the particular noise realization added each time (Figure 4c). For example, the CC value calculated with the conventional method shows a clear decreasing trend when SNR ≤ 4 for both the north-south and vertical channels, but the east-west channel shows the same trend only when SNR ≤ 2. In contrast, the CC curves derived with the MFMC method generally display a similar pattern across channels, indicating that the MFMC results depend less on the specific channel used in the calculation. However, the MFMC-derived CC values appear to depend more on the particular noise realization, as the variation between individual tests is more apparent.

In summary, there is a trade-off between the ability to reliably detect repeating earthquakes and the tolerance of high noise. Both methods work well when SNR ≥ 1 and suffer from high noise levels when SNR ≤ 0.5 (Figure 4c).

4 Verification: Constraining Interevent Separations Using Synthetic Data

Having shown that the conventional one-segment cross-correlation approach is not always reliable in differentiating the location difference between two earthquakes, we now apply both methods to synthetic data to systematically compare their sensitivity to interevent separations. In this experiment, we consider spatial separation in either the horizontal or the vertical direction. The station geometry is the same as that in Figure 2a, and one event (the template event) is placed at the center of the array. We then incrementally shift the source of the other event (the detected event) to the north or toward the surface by 0.1 km each time. The synthetic seismograms are generated and processed in the same way as in section 2.3. When implementing our new method, we follow the working procedure illustrated in Figure 3.

In our experiment, we consider three different focal mechanisms: strike-slip, normal, and reverse. The results for normal and reverse faulting lead to essentially the same conclusions as those for strike-slip faulting (Figures 6, 7, and S2–S13). To keep the discussion focused, only the strike-slip results are presented and discussed here; the other cases are presented in the supporting information (Figures S10–S13) for readers interested in more technical details.

4.1 Horizontal Interevent Separations

For horizontal interevent separations, we consider two situations: a template event at shallow depth (3 km) and one at greater depth (10 km). The most pronounced difference between the waveforms of the two template events is the amplitude of the surface waves. When the source depth is 3 km, the amplitude of the surface waves can be comparable to that of the S waves (Figure 5a). In contrast, the waveforms of the template event at 10 km are all dominated by the S waves except on the vertical channel (Figure 5b).

Figure 5. Band-pass-filtered (1–10 Hz) and normalized synthetic seismograms of two template events at different depths. Source-station setup is the same as that in Figure 2a. Red lines mark the templates that start from the P phase.

To investigate how the CC value derived from both methods changes in response to horizontal source separations (Figure 6e, inset), we test three scenarios: a single channel, a single station with three channels (Z, N, and E), and multiple stations with all channels. For the first scenario, the seismogram of the template event is slid from 0.5 s before the corresponding phase of the detected event to 0.5 s after. We consider two kinds of templates, starting from either the P or the S onset, with the window length determined dynamically by Equation 2. The peak CC value during the sliding is taken as the final CC between the template and detected events. For the second scenario, the CC values from all three channels of the same station are averaged, and the maximum within the sliding window is taken as the final CC. For the third scenario, we average the CC values over different stations and components with a fixed time difference between the template and detected events. In this case, a larger shifting window (from 0.5 s before the earliest P or S arrival to 0.5 s after the latest P or S arrival) is used during the CC calculation.
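The three ways of forming the final CC can be sketched as follows, assuming each channel's CC trace has already been computed on a common lag axis (for the multistation case, lags measured as a single time difference between the template and detected events). The function names are hypothetical.

```python
import numpy as np


def single_channel_cc(cc_trace):
    """Scenario 1: peak CC of one channel within the sliding window."""
    return float(np.max(cc_trace))


def averaged_cc(cc_traces):
    """Scenarios 2 and 3: average CC traces on a common lag axis, then peak.

    cc_traces has shape (n_channels, n_lags); rows are the channels of one
    station (scenario 2) or all channels of all stations (scenario 3).
    """
    return float(np.max(np.mean(np.asarray(cc_traces, dtype=float), axis=0)))


if __name__ == "__main__":
    lags = np.arange(21)
    # Toy CC traces: each channel matches perfectly, but at different lags,
    # as when the detected event is offset from the template location.
    ch_a = np.where(lags == 5, 1.0, 0.0)
    ch_b = np.where(lags == 8, 1.0, 0.0)
    print(single_channel_cc(ch_a), single_channel_cc(ch_b))  # 1.0 1.0
    print(averaged_cc([ch_a, ch_b]))  # 0.5
```

The toy case shows why averaging over an array is so sensitive to location: when a source shift moves the best-matching lag differently on different channels, each channel alone still peaks at 1.0, but their average cannot.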

Figure 6. CC variation due to horizontal interevent separation with a template starting from the P phase. Detailed configuration of the experiment, including the assumed strike-slip focal mechanism, is shown in Figure 2. The results of the template event at depths of (a–e) 3 and (f–j) 10 km. Dashed black line marks the CC value of 0.9 for reference. Inset in (e) schematically shows the arrangement of horizontal separation between two sources. Red star denotes the template event, and blue star represents the detected event. Station geometry is the same as that in Figure 2a.

We first conduct the experiment with a template containing both the P and S phases. Figures 6a–6e show how the CC value calculated by both methods varies with source separation when the template event is shallow (depth = 3 km). When a single channel is used, both the one-segment and multisegment methods yield widely scattered results (Figures 6a–6c). Overall, the CC obtained by both methods decreases with increasing horizontal separation between the template and detected events, although not always monotonically. For example, the CC curve derived with the conventional method using the east-west component of Station 2 reaches a local minimum at a separation of 2.0 km but increases again between 2.0 and 2.5 km (Figure 6a). This implies that using CC values to constrain source separation may not be stable when the separation is large. For a given separation distance between two sources, the CC value obtained with our new method is in general lower than that obtained with the conventional method, suggesting that our new method is more effective in recognizing spatial changes. Notably, both methods work similarly well for all three channels of Station 1 (Figures 6a–6c), because the template waveforms there are not dominated by any specific phase, as they are at the other stations (Figure 5).

If three channels of the same station are included in the calculation, the scatter is reduced (Figure 6d). For the new method in particular, Stations 1, 2, and 4 all give similar results at small separations. If more stations are included, both methods work extremely well because the template and detected events have different traveltimes to different stations. The computed CC becomes very sensitive to source location differences: a horizontal hypocenter shift of only 0.2 km can lead to a dramatic CC drop (below 0.4) no matter which method is applied (Figure 6e).

When the template event is at greater depth (e.g., 10 km), using the conventional method with the east-west or north-south channel of a single station is far from sufficient to constrain the hypocenter separation (Figures 6f and 6g). Using all three channels leads to a similar conclusion (Figure 6i), which is not surprising given the equal weighting assigned to each channel in the final CC calculation. The CC value remains high (>0.9 for the east-west and north-south channels and >0.8 for the three-channel average) even for a separation as large as 3 km (Figures 6f, 6g, and 6i). The vertical channel performs relatively better at small separations (<2 km) but loses sensitivity at larger separations (≥2 km), where the CC curves become almost flat (Figure 6h). In comparison, the new method works more effectively in such cases, although the CC change for the same shift in source location differs considerably among channels and stations. As in the previous test, both methods work very well in constraining spatial separations when all stations are included. It is worth noting that the performance of both methods using an array seems to become slightly poorer when the template event is deeper (Figures 6e and 6j).

We further conduct experiments with a template that starts from the S phase. Excluding the P wave makes it even harder for the conventional MF method to distinguish any source location variation, especially on the vertical channel (e.g., Figures S2c and S2h). In contrast, whether the template starts from the P or the S phase is not a major concern for the MFMC method, as the results are very similar (Figures 6 and S2). These experiments suggest that the MFMC method is more stable and less dependent on the choice of template window. It is also worth noting that, in discriminating horizontal interevent separations, the MFMC method with a template starting from the S phase can outperform the conventional MF method with a template containing both the P and S phases (Figures 6 and S2).

Adding Gaussian noise with an SNR of 2 appears to have a pronounced influence only on the MFMC-derived CC values at small separations (<1 km; Figures S3 and S4). Even with no separation, the CC determined by MFMC can drop from 1.0 to slightly above 0.8. This is not surprising, as we have learned from section 7 that MFMC is more sensitive to noise than the conventional MF method. At larger separations (>1 km), however, the source location shift dominates, and the difference between the MFMC-derived CC curves with and without noise becomes negligible.
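The noise experiment can be mimicked by scaling white Gaussian noise to a target SNR. The sketch below assumes that SNR is defined as the ratio of the signal RMS amplitude to the noise standard deviation; the definition used in this study is not restated here, so that choice, the function name, and the test waveform are assumptions.

```python
import numpy as np


def add_gaussian_noise(signal, snr, rng=None):
    """Return signal plus white Gaussian noise with std = rms(signal) / snr (assumed SNR definition)."""
    rng = np.random.default_rng() if rng is None else rng
    rms = np.sqrt(np.mean(signal ** 2))
    return signal + rng.normal(0.0, rms / snr, size=signal.shape)


rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 100_000)
clean = np.sin(2 * np.pi * 1.5 * t)
noisy_a = add_gaussian_noise(clean, snr=2.0, rng=rng)
noisy_b = add_gaussian_noise(clean, snr=2.0, rng=rng)

# Zero-lag CC between two independently noised copies of the same waveform.
cc_noisy_pair = np.corrcoef(noisy_a, noisy_b)[0, 1]
```

Under this definition, two independently noised copies of an identical waveform have an expected zero-lag CC of 1/(1 + 1/SNR²), i.e. 0.8 at SNR = 2, so noise alone can lower the CC of co-located events. Segments containing only low-amplitude phases have a lower local SNR, which is one plausible reason an equal-weight segment average reacts more strongly to noise.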

From the tests above, we conclude that our new method is more effective in recognizing horizontal source location changes when data from only a single channel or a single station are used. With more stations, however, both methods work similarly well, and even a slight horizontal hypocenter change leads to a rapid drop in CC values.

4.2 Vertical Interevent Separations

For vertical interevent separations (Figure 7e, inset), we also investigate two cases by placing the template event at two different depths. Consistent with the findings of the preceding section, our new method is more capable of capturing hypocenter changes than the conventional method (Figures 7 and S5). For the conventional method, using more channels/stations helps significantly in constraining vertical hypocenter separations only when the template event is shallow (Figures 7e and S5e). When the template event is deeper, the conventional method fails to recognize the depth difference between the two sources no matter how many channels and stations are used (up to 12 channels from 4 stations in our experiment), whereas the new method can still differentiate source depth changes, albeit with much lower sensitivity (Figures 7j and S5j).

Figure 7. CC variation due to vertical interevent separation with a template starting from the P phase. Symbols and layout are the same as those in Figure 6.

For the new method, including more channels and/or stations does not appear to help, as the single-channel MFMC method produces results similar to those obtained with an array (Figures 7 and S5). Note that, for both methods, the CC curves from the vertical channels of different stations are nearly identical (Figures 7c and 7h). This suggests that, for strike-slip events, the ability of the vertical channel to constrain the vertical separation of two sources is independent of station azimuth. Similar to the conclusion obtained in section 4.1, the MFMC method with a template starting from the S phase can outperform the conventional MF method with a template containing both the P and S phases in most cases (Figures 7 and S5). Including the P phase improves the depth resolution of MFMC only when the source is deep (≥10 km). Also, the effect of adding random noise in this test is generally insignificant (Figures S6 and S7). Interestingly, both the conventional and new methods show less CC variation across different channels and stations when constraining vertical interevent separation than when constraining horizontal separation.

4.3 Importance of Station Distribution

We have demonstrated that the conventional multistation MF method lacks resolution for vertical interevent separation when the template event is deep (Figures 7 and S5–S7). It is worth considering whether this is a worst-case scenario caused by the model setup, in which the same epicentral distance is used for all stations. Hence, we conduct additional experiments with an identical setup except that the epicentral distance differs from station to station (Figure S8). With more variation in traveltime moveout, the conventional multistation MF method is indeed able to distinguish the assumed depth changes more effectively, and its performance becomes much closer to that of the multistation MFMC method (Figure S9). For horizontal separations, the resolution is approximately the same for both conventional MF and MFMC, independent of station locations (Figure S9). Therefore, station distribution can be an important factor affecting the depth resolution of the conventional multistation MF method, consistent with the findings of Chamberlain and Townend (2018). In contrast, the performance of MFMC is independent of the station configuration.
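The role of station geometry can be seen from traveltimes alone. The sketch below assumes straight rays in a homogeneous half-space with a hypothetical P velocity of 6 km/s and hypothetical station distances. A 1 km depth change beneath stations at the same epicentral distance delays all arrivals equally; a common delay is absorbed into the origin time of a stacked detection, so no relative moveout remains to resolve depth. Varying the epicentral distances produces station-dependent delays.

```python
import numpy as np


def p_traveltime(epi_km, depth_km, vp=6.0):
    """Straight-ray P traveltime (s) in a homogeneous half-space (simplifying assumption)."""
    return np.hypot(epi_km, depth_km) / vp


def relative_moveout_spread(epi_km, z_template, z_detected, vp=6.0):
    """Range (s) of station-dependent delays after removing the common origin-time shift."""
    dt = p_traveltime(epi_km, z_detected, vp) - p_traveltime(epi_km, z_template, vp)
    return float(dt.max() - dt.min())


same_distance = np.array([20.0, 20.0, 20.0, 20.0])  # hypothetical equal-distance array
varied_distance = np.array([5.0, 15.0, 30.0, 50.0])  # hypothetical varied-distance array

print(relative_moveout_spread(same_distance, 10.0, 11.0))    # 0.0
print(relative_moveout_spread(varied_distance, 10.0, 11.0))  # about 0.12 s
```

A horizontal shift, by contrast, changes the epicentral distance differently at each azimuth, so even an equal-distance array retains relative moveout for horizontal separations.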

5 Demonstration: A True Repeating Event or Not?

5.1 Real Examples From the Blanco Fracture Zone

To demonstrate how the conventional MF method may misidentify different earthquakes as repeating events owing to its limited resolution of interevent separation, we apply both the conventional MF and MFMC methods to a group of real earthquakes that occurred in the Blanco Fracture Zone (BFZ) in the northeast Pacific (approximate location shown in Figure S14).

This group consists of three events that occurred on 8 October 2012, 12 December 2012, and 1 January 2013, which we refer to as the Oct08, Dec12, and Jan01 events, respectively. These events are so small that only one nearby ocean-bottom seismometer (station X9.BB200, Figure S14) recorded clear signals; hence, their hypocenters cannot be precisely determined. The waveforms of these events are all dominated by large S phases and prolonged ringing (Figure 8, middle). Although the P phases have very small amplitudes, they are clearly distinguishable from the background noise (Figure 8, left). The S-P differential traveltimes of the three events are 0.93 s (Jan01), 0.28 s (Oct08), and 1.94 s (Dec12). Based on this evidence, there is no doubt that they are not repeating earthquakes. Assuming average P and S wave velocities of 6.0 and 3.24 km/s, respectively (Kuna et al., 2019), and that these events occurred along the BFZ, we can make a first-order estimate of the interevent distance in the source-station direction: 4.58 km for the Jan01–Oct08 pair and 7.11 km for the Jan01–Dec12 pair.
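These first-order distances follow from the straight-ray relation d ≈ Δ(Ts − Tp) · VpVs/(Vp − Vs), which maps a difference in S-P times onto a difference in source-station distance. A short check with the values quoted above, assuming constant velocities (the function name is illustrative):

```python
def sp_distance_km(dt_sp_s, vp=6.0, vs=3.24):
    """Distance (km) equivalent to an S-P time (difference) for straight rays and constant vp, vs."""
    return dt_sp_s * vp * vs / (vp - vs)


sp_times = {"Jan01": 0.93, "Oct08": 0.28, "Dec12": 1.94}  # S-P times (s) from the text
for other in ("Oct08", "Dec12"):
    d = sp_distance_km(abs(sp_times[other] - sp_times["Jan01"]))
    print(f"Jan01-{other}: {d:.2f} km")
# Jan01-Oct08: 4.58 km
# Jan01-Dec12: 7.11 km
```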

Figure 8. Normalized waveforms of three events that occurred in the Blanco Fracture Zone. Red, blue, and dark green lines correspond to the waveforms of the Jan01, Oct08, and Dec12 events, respectively. The waveforms of both the Oct08 and Dec12 events are aligned at the position of best match with the Jan01 event based on the conventional MF method. The reference time (i.e., time 0) is defined by the P arrival of the Jan01 event. (left column) Zoomed-in P waves without filtering. (middle column) Full unfiltered waveforms with the mean removed. Color-shaded segments indicate the S-P differential traveltimes. (right column) Band-pass-filtered waveforms. The templates from the Jan01 event are superposed at the position of best match obtained with the conventional method for the Oct08 and Dec12 events.

Here we take the Jan01 earthquake as the template event. The waveform templates start from the P phase and have a length of 3(Ts − Tp) (Figure 8, right). All three events are band-pass filtered between 3 and 9 Hz according to the signal and presignal noise spectra (Figure S15). With the conventional MF approach, the three-channel averaged CC value for the Jan01–Oct08 event pair is as large as 0.85 (mainly due to the good match of the high-amplitude portion), despite the clear 0.65 s difference in S-P differential traveltime (Figure 8). This CC value is higher than the threshold used by many previous studies to define repeating earthquakes (e.g., Buurman & West, 2010; Cannata et al., 2013; Ma & Wu, 2013; Petersen, 2007; Schaff & Richards, 2004, 2011). In a hypothetical scenario in which the Oct08 event is incorrectly considered a repeating earthquake at the hypocenter of the Jan01 event, the misidentification would inevitably introduce a large bias/error into subsequent analyses, such as estimating fault creep rates (e.g., Materna et al., 2018; Nadeau & Johnson, 1998; Uchida et al., 2003, 2006; Yu, 2013) and/or monitoring subsurface velocity changes (e.g., Li et al., 2017; Poupinet et al., 1984; Sawazaki et al., 2015; Schaff & Beroza, 2004). With the MFMC approach, however, the CC value is lower than 0.5, mainly due to the mismatch of the low-amplitude portions of the waveforms (Figure 8), suggesting that they are not repeating earthquakes.
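The dynamic template length can be expressed in a few lines. This sketch assumes that the P and S picks are given in seconds relative to the first sample of an evenly sampled trace; the function name and arguments are hypothetical, and the factor of 3 follows the text.

```python
import numpy as np


def template_window(data, fs, tp, ts, factor=3.0):
    """Cut a template that starts at the P pick and lasts factor * (ts - tp) seconds.

    data: 1-D samples; fs: sampling rate (Hz); tp, ts: P and S times (s) from data[0].
    """
    if ts <= tp:
        raise ValueError("the S arrival must follow the P arrival")
    start = int(round(tp * fs))
    stop = start + int(round(factor * (ts - tp) * fs))
    if start < 0 or stop > len(data):
        raise ValueError("template window exceeds the data bounds")
    return data[start:stop]


fs = 100.0
trace = np.zeros(2000)  # placeholder 20 s trace
tpl = template_window(trace, fs, tp=5.0, ts=5.93)
print(len(tpl))  # 279 samples, i.e. 3 x 0.93 s at 100 Hz
```

Because the window scales with Ts − Tp, more distant sources, whose wave trains are longer, automatically receive longer templates.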

For the Jan01–Dec12 event pair, we notice that the P first motions have opposite polarities (Figure 8). In addition to the large difference in S-P differential traveltime, this observation provides strong evidence that these two events are not repeaters. Yet the CC calculated with the conventional MF method is very high (0.91), compared with 0.52 when the MFMC approach is used.

We also tested three other filters with much wider frequency passbands for both event pairs and obtained comparable results (Figure S16). We find that the performance of MFMC is less sensitive to the choice of filter passband. Taking the Jan01–Dec12 event pair as an example, the conventional method-derived CC can vary by as much as 0.13 when different filters are applied (e.g., a narrow 3–9 Hz band-pass filter vs. a wider 1–20 Hz one; Figure S16). In contrast, the MFMC-derived CC is very stable, varying by only 0.02.
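Choosing a passband from the signal and presignal noise spectra can be sketched as finding the widest contiguous frequency range in which the smoothed signal-to-noise amplitude ratio exceeds a threshold. The Hann taper, the 11-bin running-mean smoothing, the threshold of 3, and the synthetic 3–9 Hz test signal are illustrative assumptions; the criteria behind Figure S15 may differ.

```python
import numpy as np


def smoothed_amplitude_spectrum(x, fs, n_smooth=11):
    """Hann-tapered amplitude spectrum smoothed with a running mean over n_smooth bins."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    kernel = np.ones(n_smooth) / n_smooth
    return np.fft.rfftfreq(len(x), d=1.0 / fs), np.convolve(spec, kernel, mode="same")


def snr_passband(signal, noise, fs, threshold=3.0, n_smooth=11):
    """Widest contiguous band where smoothed |S(f)| / |N(f)| >= threshold; (fmin, fmax) in Hz or None."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise windows must have the same length")
    freqs, s_amp = smoothed_amplitude_spectrum(signal, fs, n_smooth)
    _, n_amp = smoothed_amplitude_spectrum(noise, fs, n_smooth)
    good = s_amp >= threshold * np.maximum(n_amp, np.finfo(float).tiny)
    best = (0, 0)  # [start, stop) bin indices of the widest run found so far
    start = None
    for i, ok in enumerate(np.append(good, False)):  # trailing False closes a final run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    if best[1] == best[0]:
        return None
    return float(freqs[best[0]]), float(freqs[best[1] - 1])


rng = np.random.default_rng(7)
fs, n = 100.0, 2000
t = np.arange(n) / fs
band_freqs = np.arange(60, 181) * fs / n  # on-bin frequencies from 3 to 9 Hz
phases = rng.uniform(0.0, 2 * np.pi, band_freqs.size)
event = np.sin(2 * np.pi * band_freqs[:, None] * t + phases[:, None]).sum(axis=0)
signal = event + 0.1 * rng.standard_normal(n)
noise = 0.1 * rng.standard_normal(n)  # stands in for the presignal window
band = snr_passband(signal, noise, fs)  # a band bracketing the injected 3-9 Hz energy
```

The corner frequencies of the actual band-pass filter (for example, a Butterworth design) would then be placed inside the returned range.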

5.2 Real Examples From the Queen Charlotte Plate Boundary

The BFZ examples above demonstrate the capability of MFMC in differentiating distant sources. However, for closely spaced neighboring earthquakes with very similar P and S waveforms and S-P differential traveltimes (e.g., Cheng et al., 2007), it is always a challenge to distinguish whether they are merely neighboring events or true repeaters unless a very dense local array is available. Here we demonstrate a solution to this common but challenging problem by applying the MFMC method to two event pairs taken from the repeating earthquake catalog that Hayward and Bostock (2017) compiled for the Queen Charlotte plate boundary using conventional waveform cross-correlation of data from a single station (Figure 9).

Figure 9. Normalized waveforms of two event pairs that occurred in the Queen Charlotte plate boundary zone. (a) An example of a misidentified repeating pair. Gray shaded boxes show the amplified waveforms for the segments before and after the S phase (marked by black dotted lines). (b) An example of a true repeating pair. For both (a) and (b), yellow shaded segments indicate the windows used for CC calculation.

We process the waveforms of both pairs with the same filter (2–14 Hz) as that used by Hayward and Bostock (2017). For the first event pair, the conventional method-derived CC is as high as 0.93, in contrast to a low CC of 0.45 calculated with the MFMC method. This pair has almost identical P and S waveforms, yet the scattered phases both before and after the S phase are visibly distinct (especially on the vertical component; gray shaded boxes in Figure 9a). Given the extremely low noise level (as indicated by the nearly flat waveforms before the P arrival), the discrepancy in the scattered phases can arise from either changes in the scatterers along the raypaths or, more likely, a difference in source location. Therefore, the first event pair could be neighboring events rather than repeating earthquakes, in good agreement with the low MFMC-derived CC. For true repeaters like the second pair, each segment of the waveforms matches almost perfectly when the background noise level is low (Figure 9b). In such a case, the CC values derived from both methods are high (0.96).

The examples above clearly show that the conventional MF method works well in detecting the existence of an event in the neighborhood of the template hypocenter but lacks the resolution to determine unambiguously whether the template and detected events are co-located. From this perspective, we emphasize that the MFMC method should not be used as a replacement for the conventional MF method. Depending on the purpose of the research, the conventional MF method may be the preferred tool if the primary objective is to detect as many events as possible, whether or not they are repeaters. On the other hand, if the objective is to identify true repeating earthquakes with limited station coverage, the MFMC method can work much more effectively.
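The division of labor described here can be written as a two-stage screen: the conventional MF CC decides whether an event is detected, and the MFMC CC decides whether it is a candidate repeater. The thresholds below are hypothetical placeholders rather than values recommended by this study; the example pairs use the CC values reported above.

```python
def classify_pair(cc_mf, cc_mfmc, detect_threshold=0.7, repeat_threshold=0.9):
    """Two-stage screening with hypothetical thresholds.

    cc_mf: conventional (one-segment) MF CC, used for detection.
    cc_mfmc: multisegment CC, used to judge co-location.
    """
    if cc_mf < detect_threshold:
        return "not detected"
    if cc_mfmc >= repeat_threshold:
        return "candidate repeater"
    return "detected, not co-located"


# (conventional MF CC, MFMC CC) values quoted in section 5
pairs = {
    "BFZ Jan01-Dec12": (0.91, 0.52),
    "Queen Charlotte pair 1": (0.93, 0.45),
    "Queen Charlotte pair 2": (0.96, 0.96),
}
for name, (cc_mf, cc_mfmc) in pairs.items():
    print(f"{name}: {classify_pair(cc_mf, cc_mfmc)}")
# BFZ Jan01-Dec12: detected, not co-located
# Queen Charlotte pair 1: detected, not co-located
# Queen Charlotte pair 2: candidate repeater
```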

6 Conclusions

The main conclusions from this study are summarized as follows:
  1. The value of CC can be affected by many factors, including the window length of the waveform template, the frequency band of the applied digital filter, and the existence of a large-amplitude phase(s). As a result, a variety of CC thresholds, ranging from 0.75 to 0.98 depending on the choice of data processing parameters, were used in previous studies to identify repeating earthquakes. It is important to bear in mind that the computed CC should not be taken at face value. Claims that an event pair are repeaters should be made with caution if the identification criterion is based solely on a certain CC threshold.
  2. To avoid ambiguity arising from the choice of operational parameters and to optimize the performance of MF, we propose generic rules for selecting the length of the waveform template and the frequency passband of the digital filter used in data processing. Specifically, a dynamic window length that depends on the differential traveltime between the P and S phases can properly account for the lengthening of the wave train with increasing source-receiver distance. Comparing the spectra of the signal and presignal noise before performing cross-correlation is of great importance in designing an optimal filter with the highest SNR. Our results show that the performance of MF is significantly improved when an optimal filter is applied before cross-correlation.
  3. As evident from both synthetic and real earthquake examples, the conventional MF method (i.e., the one-segment cross-correlation approach) is very sensitive to large-amplitude phases in the calculation window. Consequently, the CC value obtained with the conventional approach is not a proper indicator of interevent spatial separation. A high CC value does not necessarily imply a small hypocenter separation, especially when one (or a few) large-amplitude phase(s) dominate(s) the waveform in the calculation window. The situation becomes worse if the template event is deep (e.g., ≥10 km). With data from multiple stations, the conventional approach seems to work much better in constraining the horizontal separation but not always the vertical separation. Significant improvement can be achieved if the station distribution is optimized. A key conclusion from these synthetic and real earthquake experiments is that inferring interevent separation based on the conventional MF method, especially with data from only a single channel or station, may not be reliable.
  4. To reliably identify repeating earthquakes, we propose the MFMC approach, which properly incorporates the contributions from low-amplitude phases in the CC calculation. The new method is very sensitive to subtle waveform changes caused by minor hypocenter shifts: a slight location difference can lead to an apparent drop in the final CC value. For horizontal interevent separation, including multiple stations may help reduce ambiguity; for vertical separation, a single channel can perform as well as an array. When the template event is deep, the new method becomes slightly less effective in recognizing vertical changes but still outperforms the conventional method.
  5. Finally, the MFMC approach can be applied directly to continuous waveforms to search for repeating earthquakes. It can also be used as a postanalysis tool to extract and verify repeating earthquakes from an existing catalog or from the output of the conventional MF. It is not a replacement for the conventional MF approach if the primary objective is to detect as many events as possible, whether or not they are repeaters.

Acknowledgments

We thank the Associate Editor, Calum J. Chamberlain, and an anonymous reviewer for their constructive comments. We are grateful to Lupei Zhu for providing the FK code used to generate synthetic seismograms. Insightful discussions with Jianlong Yuan, Stan Dasso, Ryan Visser, Fengzhou Tan, Ramin Mohammad Hosseini Dokht, Bei Wang, Douglas A. Dodge, and Jean-Luc Got are much appreciated. Real waveform data used in this study were downloaded from the Incorporated Research Institutions for Seismology (http://ds.iris.edu/ds/nodes/dmc/, last accessed May 2020). Seismic data were processed with ObsPy (Beyreuther et al., 2010; https://github.com/obspy/obspy/). Figures were made with Matplotlib (Hunter, 2007; https://matplotlib.org) and GeoMapApp (Ryan et al., 2009; www.geomapapp.org). This study is partially supported by a University of Victoria Fellowship (D. G.), the Induced Seismicity Research Project of NRCan (H. K.), Geoscience BC (H. K.), and an NSERC Discovery Grant (H. K.). This paper is NRCan contribution 20200096.