Predictability in the Epidemic‐Type Aftershock Sequence model of interacting triggered seismicity
Abstract
[1] As part of an effort to develop a systematic methodology for earthquake forecasting, we use a simple model of seismicity based on interacting events which may trigger a cascade of earthquakes, known as the Epidemic-Type Aftershock Sequence (ETAS) model. The ETAS model is constructed on a bare (unrenormalized) Omori law, the Gutenberg-Richter law, and the idea that large events trigger more numerous aftershocks. For simplicity, we do not use the information on the spatial location of earthquakes and work only in the time domain. We demonstrate the essential role played by the cascade of triggered seismicity in controlling the rate of aftershock decay as well as the overall level of seismicity in the presence of a constant external seismicity source. We offer an analytical approach that accounts for the yet unobserved triggered seismicity, adapted to the problem of forecasting future seismic rates at varying horizons from the present. Tests on synthetic catalogs strongly validate the importance of taking into account all the cascades of still unobserved triggered events in order to predict correctly the future level of seismicity beyond a few minutes. We find a strong predictability if one is willing to predict only a small fraction of the large-magnitude targets. Specifically, we find a prediction gain (defined as the ratio of the fraction of predicted events over the fraction of time in alarms) equal to 21 for an alarm duration fraction of 1%, a target magnitude M ≥ 6, an update time of 0.5 days between two predictions, and realistic parameters of the ETAS model. However, the probability gains degrade quickly when one attempts to predict a larger fraction of the targets, because a significant fraction of events remain uncorrelated with past seismicity. This delineates the fundamental limits underlying forecasting skills, stemming from an intrinsic stochastic component in these models of interacting triggered seismicity. Quantitatively, the fundamental limits of predictability found here are only lower bounds of the true values corresponding to the full information on the spatial location of earthquakes.
1. Introduction
[2] There are several well-documented facts in seismicity: (1) spatial clustering of earthquakes at many scales [e.g., Kagan and Knopoff, 1980], (2) the Gutenberg-Richter (GR) distribution of earthquake magnitudes [Gutenberg and Richter, 1944], and (3) clustering in time following large earthquakes, quantified by Omori's ≈1/t^p law for aftershocks (with p ≈ 1) [Omori, 1894]. However, there are some deviations from these empirical laws and a significant variability in their parameters. The b value of the GR law, the Omori exponent p and the aftershock productivity are spatially and temporally variable [Utsu et al., 1995; Guo and Ogata, 1995]. Alternative laws have also been proposed, such as a gamma law for the magnitude distribution [Kagan, 1999] or a stretched exponential for the temporal decay of the rate of aftershocks [Kisslinger, 1993].
[3] However, these "laws" are only the beginning of a full model of seismic activity and earthquake triggering. In principle, if one could obtain a faithful representation (model) of the spatiotemporal organization of seismicity, one could use this model to develop algorithms for forecasting earthquakes. The ultimate quality of these forecasts would be limited by the quality of the model, the amount of data that can be used in the forecast and its reliability and precision, and the stochastic component of seismic activity. Here, we analyze a simple model of seismicity known as the Epidemic-Type Aftershock Sequence (ETAS) model [Ogata, 1988, 1989] and we use it to test the fundamental limits of predictability of this class of models. We restrict our analysis to the time domain, that is, we neglect the information provided by the spatial location of earthquakes, which could be used to constrain the correlation between events and would be expected to improve forecasting skills. Our results should thus give lower bounds of the achievable predictive skills. This exercise is rather constrained but turns out to provide meaningful and useful insights. Because our goal is to estimate the intrinsic limits of predictability in the ETAS model, independently of the additional errors coming from the uncertainty in the estimation of the ETAS parameters in real data, we consider only synthetic catalogs generated with the ETAS model. Thus the only source of error of the prediction algorithms results from the stochasticity of the model.
[4] Before presenting the model and developing the tools necessary for the prediction of future seismicity, we briefly summarize in the next section the available methods of earthquake forecasting based on past seismicity. In section 3, we present the model of interacting triggered seismicity used in our analysis. Section 4 develops the formal solution of the problem of forecasting future seismicity rates based on the knowledge of past seismicity quantified by a catalog of times of occurrences and magnitudes of earthquakes. Section 5 gives the results of an intensive series of tests, which quantify in several alternative ways the quality of forecasts (regression of predicted versus realized seismicity rate, error diagrams, probability gains, information‐based binomial scores). Comparisons with the Poisson null‐hypothesis give a very significant success rate. However, only a small fraction of the large‐magnitude targets can be shown to be successfully predicted while the probability gain deteriorates rapidly when one attempts to predict a larger fraction of the targets. We provide a detailed discussion of these results and of the influence of the model parameters. Section 6 concludes.
2. A Rapid Tour of Methods of Earthquake Forecasts Based on Past Seismicity
[5] All the algorithms that have been developed for the prediction of future large earthquakes based on past seismicity rely on the characterization of past earthquakes either as witnesses or as actors. In other words, these algorithms assume that past seismicity is related in some way to the approach of a large-scale rupture.
2.1. Pattern Recognition (M8)
[6] In their pioneering work, Keilis-Borok and Malinovskaya [1964] codified the observation of a general increase in seismic activity with a single measure, "pattern Sigma," which characterizes the trailing total sum of the source areas of medium-size earthquakes. Similar strict codifications of other seismicity patterns, such as a decrease of b value, an increase in the rate of activity, an anomalous number of aftershocks, etc., were proposed later and define the M8 algorithm of earthquake forecast (see Keilis-Borok and Kossobokov [1990a, 1990b] and Keilis-Borok and Soloviev [2003] for useful reviews). In these algorithms, an alarm is declared when several precursory patterns are above thresholds calibrated in a training period. Predictions are updated periodically as new data become available. Most of the patterns used by this class of algorithms are reproduced by the model of triggered seismicity known as the ETAS model [see Sornette and Sornette, 1999; Helmstetter and Sornette, 2002a; Helmstetter et al., 2003]. The prediction gain G of the M8 algorithm, defined as the ratio between the fraction of predicted events and the fraction of time occupied by alarms, is usually in the range 3 to 10 (recall that a random predictor gives G = 1 by definition). A preliminary forward test of the algorithm for the time period July 1991 to June 1995 performed no better than the null hypothesis using a reshuffling of the alarm windows [Kossobokov et al., 1997]. Later tests indicated however a confidence level of 92% for the prediction of M7.5+ earthquakes by the algorithm M8-MSc for real-time intermediate-term predictions in the Circum-Pacific seismic belt, 1992–1997, and above 99% for the prediction of M ≥ 8 earthquakes [Kossobokov et al., 1999]. We use the term confidence level for 1 minus the probability of observing, under the null hypothesis that everything is due to chance alone, a predictability at least as good as the one actually observed. As of July 2002, the scores (counted from the formal start of the global test initiated by this team in July 1991) are as follows: for M8.0+, 8 events occurred, 7 predicted by M8, 5 predicted by M8-MSc; for M7.5+, 25 events occurred, 15 predicted by M8 and 7 predicted by M8-MSc.
2.2. Short‐Term Forecast of Aftershocks
[7] Reasenberg and Jones [1989] and Wiemer [2000] have developed algorithms to predict the rate of aftershocks following major earthquakes. The rate of aftershocks of magnitude m following an earthquake of magnitude M is estimated by the expression
λ(t, m) = 10^(a + b(M − m)) / (t + c)^p .    (1)
2.3. Branching Models
[8] Simulations using branching models as a tool for predicting earthquake occurrence over large time horizons were proposed in the work of Kagan [1973], and first implemented in Kagan and Knopoff [1977]. In a recent work, Kagan and Jackson [2000] use a variation of the ETAS model to estimate the rate of seismicity in the future but their predictions are only valid at very short times, when very few earthquakes have occurred between the present and the horizon.
[9] To overcome this limitation and to extend the predictions further in time, Kagan and Jackson [2000] propose using Monte Carlo simulations to generate many possible scenarios of the future seismic activity. However, they do not use this method in their forecasting procedure. These Monte Carlo simulations will be implemented in our tests, as we describe below. This method has already been tested by Vere-Jones [1998] to predict a synthetic catalog generated using the ETAS model. Using a measure of the quality of seismicity forecasts in terms of mean information gain per unit time, he obtains scores usually worse than the Poisson method. We use below the same class of model and implement a procedure taking into account the cascade of triggering. We find, in contrast with the claim of Vere-Jones [1998], a significant probability gain. We explain in section 5.6 the origin of this discrepancy, which can be attributed to the use of different measures for the quality of predictions.
[10] In the work of Helmstetter et al. [2003], the forecasting skills of algorithms based on three functions of the current and past seismicity (above a magnitude threshold), measured in a sliding window of 100 events, were compared. The functions are (1) the maximum magnitude Mmax of the 100 events in that window, (2) the Gutenberg-Richter b value measured on these 100 events by the standard Hill maximum likelihood estimator and (3) the seismicity rate λ defined as the inverse of the duration of the window. For each function, an alarm was declared for the target of an earthquake of magnitude larger than 6 when the function is either larger (for Mmax and λ) or smaller (for b) than a threshold. These functions Mmax, b and λ are similar and in some cases identical to precursors and predictors that have been studied by other authors. Helmstetter et al. [2003] found that these three predictors perform considerably better than a random prediction, with the prediction based on the seismicity rate λ being by far the best. This is a logical consequence of the model of interacting triggered seismicity used in Helmstetter et al. [2003] and also in the present work, in which any relevant physical observable is a function of the seismicity rate. At least in the class of interacting triggered seismicity models, the largest possible amount of information is recovered by targeting the seismicity rate; all other targets are derived from it as linear or nonlinear transformations. Our present study refines and extends the preliminary tests of Helmstetter et al. [2003] by using a full model of seismicity rather than the coarse-grained measure λ. We note also that the forecasting methods of Rundle et al. [2001, 2002] are based on a calculation of the coarse-grained seismicity above a small magnitude threshold, which is then projected into the future.
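To make the above concrete, the following minimal Python sketch shows how such sliding-window predictors could be computed from a catalog of times and magnitudes. It is not the code used by Helmstetter et al. [2003]; the window size of 100 events follows the text, while the use of the smallest catalog magnitude as m0 is an assumption of this sketch.

```python
import numpy as np

def window_predictors(times, mags, w=100, m0=None):
    """Sliding-window predictors: maximum magnitude, GR b value and seismicity rate.

    times, mags : arrays of occurrence times (days) and magnitudes, sorted by time
    w           : number of events per window (100 in the text)
    m0          : cutoff magnitude; defaults to the smallest magnitude of the catalog
    """
    m0 = mags.min() if m0 is None else m0
    rows = []
    for i in range(w, len(times)):
        t_win, m_win = times[i - w:i], mags[i - w:i]
        m_max = m_win.max()                                   # predictor 1: largest magnitude in the window
        b_hat = 1.0 / (np.log(10.0) * np.mean(m_win - m0))    # predictor 2: maximum likelihood b value
        rate = w / (t_win[-1] - t_win[0])                     # predictor 3: seismicity rate (inverse window duration, in events per day)
        rows.append((times[i], m_max, b_hat, rate))
    return np.array(rows)
```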
3. Model of Triggered Seismicity
[11] The parametric form that defines the ETAS model used in this paper was formulated by Ogata [1985, 1987, 1988]. See Ogata [1999] and Helmstetter and Sornette [2002a] for reviews of its origins, a description of the different versions of the model and of its applications to model or predict seismic activity. It is important to stress that the ETAS model is not only a model of aftershock sequences, as the acronym ETAS (Epidemic-Type Aftershock Sequence) would lead one to believe, but is fundamentally a model of triggered interacting seismicity.
[12] In addition to the strict definition of the ETAS model used by Ogata [1985, 1987, 1988, 1989, 1999], there were and still are a variety of alternative parametric forms of the extended "mutually exciting point processes" with marks (that is, magnitudes) introduced by Hawkes [1971, 1972], which have been applied to earthquakes, including those of Kagan and Knopoff [1987], Kagan [1991] and Lomnitz [1974]. The model of Kagan and Knopoff [1987] differs from that of Ogata [1985, 1987, 1988] in replacing the role played by the parameter c in the modified Omori law (1) by an abrupt cutoff which models the duration of the main shock. These authors consider that a nonzero value of c is merely an artifact of the missing events immediately after the main shock. In contrast, based on the observation of the records of seismic waves, Utsu [1970, 1992] considers that the parameter c is not merely due to such an artifact but also possesses some physical meaning. The analysis of Helmstetter and Sornette [2002a] shows that the choice of a nonzero c value [Ogata, 1988] or of an abrupt cutoff [Kagan and Knopoff, 1987] does not lead to any detectable differences in simulated catalogs at timescales beyond c (which is usually very small). Thus, from the point of view of the collective behavior of the model, both formulations lead to essentially indistinguishable catalogs and statistical properties. Lomnitz's [1974] model (the "Klondike model") was also directly inspired by Hawkes [1971] and is similar to the ETAS model, but assumes different parametric forms: in particular, the number of triggered events is taken proportional to the magnitude rather than to the exponential of the magnitude. Kagan and Jackson [2000] also use a formulation of the same class but with a more complex specification of the time, space and magnitude dependence of the triggering process and propagator.
3.1. Definition of the ETAS Model
[13] The ETAS model of triggered seismicity is defined as follows [Ogata, 1988]. We assume that a given event (the "mother") of magnitude mi ≥ m0 occurring at time ti and position r_i gives birth to other events ("daughters") in the time interval between t and t + dt and at point r at the rate
φ_mi(t − ti, r − r_i) = ρ(mi) Φ(t − ti) Φ(r − r_i) .    (2)
[14] Φ(t) is the direct Omori law normalized to 1
Φ(t) = θ c^θ / (t + c)^(1+θ)   for t > 0, and Φ(t) = 0 for t ≤ 0.    (3)
[15] Φ(r − r_i) is a normalized spatial "jump" distribution from the mother to each of her daughters, which quantifies the probability for a daughter to be triggered at a distance r − r_i from the mother, taking into account the spatial dependence of the stress induced by an earthquake.
[16] ρ(m) gives the total number of aftershocks triggered directly by an event of magnitude m
ρ(m) = K 10^(α(m − m0)) .    (4)
[17] The model is complemented by the Gutenberg‐Richter (GR) law which states that each earthquake has a magnitude chosen according to the density distribution
P(m) = b ln(10) 10^(−b(m − m0))   for m ≥ m0.    (5)
[18] In this first investigation, we limit ourselves to the time domain, studying time series of past seismicity summed over an overall spatial region, without taking into account information on earthquake locations. This amounts to integrating the local Omori law (2) over the whole space. In the following, we will thus use the integrated form of the local Omori law (2) given by
φ_mi(t − ti) = ρ(mi) Φ(t − ti) .    (6)
[19] We stress that not taking into account the spatial positions of the earthquakes is not saying that earthquakes occur at the same position. The synthetic catalogs we generate in space and time are similar to real catalogs and our procedure just neglects the information on space. Clearly, this is not what a full prediction method should do and it is clear that not using the information on the location of earthquakes will underestimate (sometimes grossly) the predictive skills that could be achieved with a full spatiotemporal treatment. However, the problem is sufficiently complex that we find it useful to go through this first step and develop the relevant concepts and first tests using only information on seismic time sequences.
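As a concrete illustration of these time-domain building blocks, the minimal Python sketch below encodes the normalized direct Omori kernel Φ(t), the productivity ρ(m) of equation (4) and Gutenberg-Richter magnitude sampling from (5). The parameter values are purely illustrative, not fits to any catalog.

```python
import numpy as np

# Illustrative parameter values (not fitted to any catalog)
theta, c = 0.2, 0.001          # exponent and regularizing timescale (days) of the direct Omori law
b, alpha, m0 = 1.0, 0.8, 3.0   # GR slope, productivity exponent, magnitude cutoff
n = 0.8                        # branching ratio, used to fix the productivity constant K via (7)
K = n * (b - alpha) / b

def phi(t):
    """Direct Omori law Phi(t) of equation (3), normalized to 1 over 0 < t < infinity."""
    return np.where(t > 0.0, theta * c**theta / (t + c)**(1.0 + theta), 0.0)

def rho(m):
    """Mean number of direct aftershocks of a magnitude-m event, equation (4)."""
    return K * 10.0**(alpha * (m - m0))

def gr_magnitudes(size, rng):
    """Magnitudes drawn from the Gutenberg-Richter density (5) above m0."""
    return m0 - np.log10(1.0 - rng.random(size)) / b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.logspace(-5, 6, 200_001)
    # The integral approaches 1 only slowly because of the heavy power law tail of Phi(t)
    print("integral of Phi up to 10^6 days:", np.trapz(phi(t), t))
    mags = gr_magnitudes(100_000, rng)
    print("recovered b value:", 1.0 / (np.log(10.0) * np.mean(mags - m0)))
```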
3.2. Definition of the Average Branching Ratio n
[20] The key parameter of model (2) is the average number (or “branching ratio”) n of daughter‐earthquakes created per mother event. This average is performed over time and over all possible mother magnitudes. This average branching ratio n is a finite value for θ > 0 and for α < b, equal to
n ≡ ∫_{m0}^{∞} dm P(m) ρ(m) = K b / (b − α) .    (7)
[21] Since n is defined as the average over all main shock magnitudes of the mean number of events triggered by a main shock, the branching ratio does not give the number of daughters of a given earthquake, because this number also depends on the specific value of its magnitude, as shown by (4). As an example, take α = 0.8, b = 1, m0 = 0 and n = 1. Then, a main shock of magnitude M = 7 will have on average 80000 direct aftershocks, compared with only 2000 direct aftershocks for an earthquake of magnitude M = 5 and about 0.2 direct aftershocks for an earthquake of magnitude M = 0.
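A few lines of Python suffice to reproduce the numbers of this example and to check equation (7) by Monte Carlo averaging of ρ(m) over the GR density. The sample size is arbitrary, and the Monte Carlo estimate converges slowly because ρ(m) is heavy tailed.

```python
import numpy as np

alpha, b, m0, n = 0.8, 1.0, 0.0, 1.0        # the values used in the example above
K = n * (b - alpha) / b                     # productivity constant implied by equation (7)

def rho(m):
    # Mean number of direct aftershocks of a magnitude-m main shock, equation (4)
    return K * 10.0**(alpha * (m - m0))

for M in (7.0, 5.0, 0.0):
    print(f"M = {M}: about {rho(M):.1f} direct aftershocks on average")
# prints roughly 79600, 2000 and 0.2, in agreement with the text

# Monte Carlo check of equation (7): the GR average of rho(m) recovers n = 1
rng = np.random.default_rng(1)
mags = m0 - np.log10(1.0 - rng.random(1_000_000)) / b
print("Monte Carlo estimate of n:", rho(mags).mean())   # close to 1, with sizable fluctuations
```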
[22] The branching ratio defined by (7) is the key parameter of the ETAS model, which controls the different regimes of seismic activity. There are two observable interpretations of this parameter [Helmstetter and Sornette, 2003]. The branching ratio can be defined as the ratio of triggered events over total seismicity when looking at a catalog of seismicity at large scale. The branching ratio is also equal to the ratio of the number of secondary and later-generation aftershocks to the total number of aftershocks within a single aftershock sequence.
4. Formal Solution of the Earthquake Forecast Problem in the ETAS Model
[23] Having stressed the importance of the indirect triggered seismicity in determining both the overall level of seismicity and its decay law, we now formulate the task of earthquake forecast within this model of triggered seismicity restricted to the time domain. In this paper, we do not address the delicate issue related to the fact that not all earthquakes are observable or observed. Indeed, calibrations of the ETAS parameters using the magnitude cut‐offs dictated by the requirement of seismic catalog completeness rather than by the physics of triggered seismicity may lead to misleading results, as unobserved events may play a significant role (in their sheer number) in triggering observable seismicity. To our knowledge, all previous calibrations of real seismic catalogs have bypassed this problem, which will be studied using a technique derived from our renormalized Omori law in a subsequent paper.
[24] We first summarize the single source prediction problem, which has been studied previously by Helmstetter and Sornette [2002a]. We then consider the complete prediction problem and derive analytically the solution for the future seismicity rate triggered by all past events and by a constant external loading.
4.1. Formulation of the Global Seismicity Rate and Renormalized Omori's Law
[25] We define the “bare propagator” ϕ(t) of the seismicity as the integral of (2) over all magnitudes
φ(t) = ∫_{m0}^{∞} dm P(m) ρ(m) Φ(t) = n Φ(t) .    (8)
[26] The total seismicity rate λ(t) at time t is given by the sum of an “external” source s(t) and the aftershocks triggered by all previous events
λ(t) = s(t) + Σ_{i | ti ≤ t} ρ(mi) Φ(t − ti) .    (9)
[27] Taking the ensemble average of (9) over many possible realizations of the seismicity (or equivalently taking the mathematical expectation), we obtain the following equation for the first moment or statistical average N(t) of λ(t) [Sornette and Sornette, 1999; Helmstetter and Sornette, 2002a]
N(t) = s(t) + ∫_{−∞}^{t} dτ φ(t − τ) N(τ) .    (10)
[28] The global rate of aftershocks, including secondary and later-generation aftershocks, triggered by a main shock of magnitude M occurring at t = 0 is given by ρ(M) K(t)/n, where the renormalized Omori law K(t) is obtained as a solution of (10) with the general source term s(t) replaced by the Dirac function δ(t):





[29] Once the seismic response K(t) to a single event is known, the complete average seismicity rate N(t) triggered by an arbitrary source s(t) can be obtained using the theorem of Green functions for linear equations with source terms [Morse and Feshbach, 1953]
N(t) = ∫_{−∞}^{t} dτ K(t − τ) s(τ) .    (15)
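The effect of the cascade can be made concrete by solving the renewal equation (10) numerically with a Dirac source, which yields the regular part of the renormalized propagator K(t). The sketch below uses a simple time-binned recursion; the parameters, bin width and horizon are illustrative only.

```python
import numpy as np

# Illustrative parameters
n, theta, c = 0.8, 0.2, 0.001        # branching ratio, Omori exponent, regularizing time (days)
dt, horizon = 0.001, 20.0            # bin width and time horizon (days)
nbins = int(horizon / dt)
edges = np.arange(nbins + 1) * dt

# Expected number of *direct* aftershocks of one event in each time bin,
# from the exact cumulative integral of the bare propagator n * Phi(t).
bare = n * np.diff(1.0 - (c / (edges + c))**theta)

# Dressed response: add the aftershocks of the aftershocks over all generations by
# solving the discretized renewal equation R[k] = bare[k] + sum_{j<=k} bare[k-j] * R[j].
dressed = np.zeros(nbins)
for k in range(nbins):
    conv = np.dot(bare[k:0:-1], dressed[:k]) if k else 0.0
    dressed[k] = (bare[k] + conv) / (1.0 - bare[0])   # the division handles the same-bin term j = k

print("direct aftershocks within the horizon:", bare.sum())
print("all aftershocks within the horizon   :", dressed.sum())
print("expected total over an infinite horizon, n/(1-n):", n / (1.0 - n))
```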
4.2. Multiple Source Prediction Problem
[30] We assume that seismicity which occurred in the past until the “present” time u, and which does trigger future events, is observable. The seismic catalog consists of a list of entries {(ti, mi), ti < u} giving the times ti of occurrence of the earthquakes and their magnitude mi. Our goal is to set up the best possible predictor for the seismicity rate for the future from time u to time t > u, based on the knowledge of this catalog {(ti, mi), ti < u}. The time difference t − u is called the horizon. In the ETAS model studied here, magnitudes are determined independently of the seismic rate, according to the GR distribution. Therefore the sole meaningful target for prediction is the seismic rate. Once its forecast is issued, the prediction of strong earthquakes is obtained by combining the GR law with the forecasted seismic rate. The average seismicity rate N(t) at time t > u in the future is made of two contributions:
[31] 1. The external source of seismicity of intensity μ acting at time t, plus the external source events that occurred between u and t together with their subsequent aftershocks, which may trigger an event at time t.
[32] 2. The earthquakes that have occurred in the past at times ti < u, together with all the events they triggered between u and t and their subsequent aftershocks, which may trigger an event at time t. We now examine each contribution in turn.
4.2.1. Seismicity at Times t > u Triggered by a Constant Source μ Active From u to t
[33] Using the external seismicity source μ alone to forecast the seismicity in the future would underestimate the seismicity rate, because it does not take into account the aftershocks of the external loading. Conversely, using the "renormalized" seismicity rate μ/(1 − n) derived in the work of Helmstetter and Sornette [2003] would overestimate the seismicity rate, because the earthquakes that were triggered before time u by the external source would be counted twice, since they are registered in the catalog up to time u. The correct procedure is therefore to evaluate the rate of seismicity triggered by a constant source μ starting only at time u, thereby removing the influence of earthquakes that have been recorded at times less than u; their influence at times larger than u is examined later.
[34] The response Kμ(t) of the seismicity to a constant source term μ starting at time u is obtained using (15) as

where the kernel entering (16) is the integral of K(t) − δ(t) given by (12), taken from the lower bound u excluded (noted u+) to t. This yields

Expression (16) takes care both of the external source seismicity of intensity μ at time t and of its aftershocks and their subsequent aftershocks triggered between u and t that may trigger events at time t.
4.2.2. Hypermetropic Renormalized Propagator
[35] We now turn to the effect, counted from time u, of the past known events prior to time u on the future seismicity at t > u, taking into account the direct, secondary and all later-generation aftershocks of each earthquake that occurred in the past at times ti < u. Since the ETAS model is linear in the rate variable, we consider first the problem of a single past earthquake at time ti < u and will then sum over all past earthquakes.
[36] A first approach for estimating the seismicity at t > u due to event i that occurred at time ti < u is to use the bare propagators Φ(t − ti), e.g., as done by Kagan and Jackson [2000]. This extrapolation leads to an underestimation of the seismicity rate in the future because it does not take into account the secondary aftershocks. This is quite a bad approximation when n is not very small, and especially for n > 0.5, since the secondary aftershocks are then more numerous than direct aftershocks.
[37] An alternative would be to express the seismicity at t > u due to an earthquake that occurred at ti < u by the global propagator K(t − ti). However, this approach would overestimate the seismicity rate at time t because of double counting. Indeed, K(t − ti) takes into account the effect of all events triggered by event i, including those denoted j that occurred at times ti < tj < u and which are directly observable and counted in the catalog. Using K(t − ti) would thus count these events j twice, since they also appear as sources in the sum of contributions over all events of the catalog.
[38] The correct procedure is to calculate the seismicity at t > u due to event i by including all the seismicity that it triggered only after time u. This defines what we term the “hypermetropic renormalized propagator” Ku*(t − ti). It is “renormalized” because it takes into account secondary and all subsequent aftershocks. It is “hypermetropic” because this counting of triggered seismicity starts only after time u such that this propagator is oblivious to all the seismicity triggered by event i at short times from ti to u.
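This counting rule can also be phrased directly at the level of simulated cascades: among the direct aftershocks of the old event i, only those born after the present u, together with all of their own descendants, are attributed to i, since earlier daughters are already in the observed catalog and are treated as sources in their own right. The sketch below illustrates this rule by Monte Carlo; the parameters are illustrative and subcritical (n < 1) so that the cascades terminate quickly.

```python
import numpy as np

# Illustrative, subcritical parameters
n, theta, c = 0.8, 0.2, 0.001
b, alpha, m0 = 1.0, 0.5, 3.0
K = n * (b - alpha) / b
rng = np.random.default_rng(0)

def omori_delays(size):
    """Waiting times drawn from the normalized direct Omori law Phi(t) (inverse CDF)."""
    return c * ((1.0 - rng.random(size))**(-1.0 / theta) - 1.0)

def offspring(t0, m):
    """Direct daughters (times and magnitudes) of one event of magnitude m at time t0."""
    k = rng.poisson(K * 10.0**(alpha * (m - m0)))
    return t0 + omori_delays(k), m0 - np.log10(1.0 - rng.random(k)) / b

def cascade_times(t0, m):
    """Occurrence times of all descendants of one event, over all generations."""
    times, mags = offspring(t0, m)
    out = list(times)
    for t_, m_ in zip(times, mags):
        out.extend(cascade_times(t_, m_))
    return out

# One realization: a M = 7 main shock at t = 0, present time u = 10 days
M, u = 7.0, 10.0
d_times, d_mags = offspring(0.0, M)
counted = []
for t_, m_ in zip(d_times, d_mags):
    if t_ > u:                       # hypermetropic rule: keep only cascades seeded after u
        counted.append(t_)
        counted.extend(cascade_times(t_, m_))
print(len(counted), "events attributed to the main shock at times t > u in this realization")
```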
[39] We now apply these concepts to estimate the seismicity triggered directly or indirectly by a main shock with magnitude M that occurred in the past at time ti while removing the influence of the triggered events j occurring between ti and u. This gives the rate


[40] Ku*(t) defined by (19) recovers the bare propagator Φ(t) for t ≈ u, i.e., when the rate of direct aftershocks dominates the rate of secondary aftershocks triggered at time t > u. Indeed, taking the limit of (19) for u → t gives

[41] In the other limit, u ≈ ti, i.e., for an event that occurred at a time ti just before the present u, Ku*(t) recovers the dressed propagator K(t) (up to a Dirac function) since there are no other registered events between ti and t and all the seismicity triggered by event i must be counted. Using equation (11), this gives

[42] Using again (11), we can rewrite (19) as


[43] We present below useful asymptotics and approximations of Ku*(t):
[44] 1. Hypermetropic renormalized propagator for t ≪ t*: Putting the asymptotic expansion of K(t) for t < t* (14) in (19) we obtain for t ≫ c and t > u

[45] 2. Hypermetropic renormalized propagator for t ≫ u: In the regime t ≫ u, we can rewrite (22) as

[46] 3. Hypermetropic renormalized propagator for t ≈ u: In the regime t ≈ u and t − u ≫ c, we can rewrite (19) as

[47] We have performed numerical simulations of the ETAS model to test our predictions on the hypermetropic renormalized propagator Ku*(t) (19, 23). For the unrealistic case where α = 0, i.e., all events trigger the same number of aftershocks whatever their magnitude, the simulations show a very good agreement (not shown) between the results obtained by averaging over 1000 synthetic catalogs and the theoretical prediction (19, 23).
[48] Figure 2 compares the numerical simulations with our theoretical prediction for the more realistic parameters α = 0.5 with n = 1, c = 0.001 day, θ = 0.2 and μ = 0. In the simulations, we construct 1000 synthetic catalogs, each generated by a large event that happened at time t = 0. The numerical hypermetropic renormalized propagator or seismic activity Ku*(t) is obtained by removing, for each catalog, the influence of aftershocks that were triggered in the past 0 < ti < u, where the present is taken equal to u = 10 days, and then by averaging over the 1000 catalogs. It can then be compared with the theoretical prediction (19, 23). Figure 2 exhibits a very good agreement between the realized hypermetropic seismicity rate (open circles) and Ku*(t) predicted by (19) and shown as the continuous line, up to times t − u ∼ 10^3 u. The hypermetropic renormalized propagator Ku*(t) is significantly larger than the bare Omori law Φ(t) but smaller than the renormalized propagator K(t), as expected. Note that Ku*(t) first increases with the horizon t − u up to horizons of the order of u and then crosses over to a decay law Ku*(t) ∼ 1/t^(1−θ) parallel to the dressed propagator K(t). At large times however, one can observe a new effect in the clear deviation between our numerical average of the realized seismicity and the hypermetropic seismicity rate Ku*(t). This deviation pertains to a large deviation regime and is due to a combination of a survival bias and large fluctuations in the numerics. For the larger value α = 0.8, which may be more relevant for seismicity, the deviation between the simulated seismicity rate and the prediction is even larger. Indeed, for α ≥ 1/2, Helmstetter et al. [2003] have shown that the distribution of first-generation seismicity rates is a power law with exponent less than or equal to 2, implying that its variance is ill-defined (or mathematically infinite). Thus average rates converge slowly, all the more so at long times where average rates are small and fluctuations are huge. Another way to state the problem is that, in this regime α ≥ 1/2, the average seismicity may be a poor estimate of the typical or most probable seismicity. Such an effect can be approximately accounted for by including the coupling between the fluctuations of the local rates and the realized magnitudes of earthquakes at a coarse-grained level of description [Helmstetter et al., 2003]. A detailed analytical quantification of this effect for K(t) (already discussed semiquantitatively in the work of Helmstetter et al. [2003]) and for Ku*(t), with an emphasis on the difference between the average, the most probable and different quantiles of the distribution of seismic rates, will be reported elsewhere.

[49] Since we are aiming at predicting single catalogs, we shall resort below to robust numerical estimations of K(t) and Ku*(t) obtained by generating numerically many seismic catalogs based on the known seismicity up to time u. Each such catalog synthesized for times t > u constitutes a possible scenario for the future seismicity. Taking the average and calculating the median as well as different quantiles over many such scenarios provides the relevant predictions of future seismicity for a single typical catalog as well as its confidence intervals.
5. Forecast Tests With the ETAS Model
[50] Because our goal is to estimate the intrinsic limits of predictability in the ETAS model, independently of the additional errors coming from the uncertainty in the estimation of the ETAS parameters in real data, we consider only synthetic catalogs generated with the ETAS model. Testing the model directly on real seismicity would amount to testing simultaneously several hypotheses/questions: (1) is ETAS a good model of real seismicity? (2) is the method of inversion of the parameters correctly implemented and stable? and what is the absolute limit of predictability of (3) the ETAS model and (4) real seismicity? Since each of these four points is difficult to address separately and none is fully solved, we use only synthetic data generated with the ETAS model to test the intrinsic predictive skills of the model (question (3)), independently of the other questions. Our approach thus parallels several previous attempts to understand the degree and the limits of predictability in models of complex systems, developed in particular as models of earthquakes [see, e.g., Pepke and Carlson, 1994; Pepke et al., 1994; Gabrielov et al., 2000].
[51] Knowing the times ti and magnitude mi of all events that occurred in the past up to the present u, the mean seismicity rate Nu(t) forecasted for the future t > u by taking into account all triggered events and the external source μ is given formally by

where Kμ(t) is given by (17). In the language of the statistics of point processes, expression (27) amounts to using the conditional intensity function. The conditional intensity function gives an unequivocal answer to the question of what is the best predictor of the process. All future behaviors of the process, starting from the present time u and conditioned by the history up to time u, can be simulated exactly once the form of the conditional intensity is known. To see this, we note that the conditional intensity function, if projected forward on the assumption that no additional events are observed (and assuming no external variables intervene), gives the hazard function of the time to the next event. Simulating with this hazard function, recording the time of the next event when it does occur, and then repeating the procedure ensures that one is always working with the exact formula for the conditional distributions of the interevent times. The simulations then truly represent the future of the process, and any functional can be taken from them in whatever form is suitable for the purpose at hand.
[52] In practice, we thus use the catalog of known earthquakes up to time u and generate many different possible scenarios for the seismicity trajectory which each take into account all the relevant past triggered seismicity up to the present u. For this, we use the thinning simulation method, as explained by Ogata [1999]. We then define the average, the median and other quantiles over these scenarios to obtain the forecasted seismicity Nu(t).
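A minimal sketch of this scenario generation is given below, assuming the time-only ETAS intensity defined in section 3 and the thinning (acceptance-rejection) idea of Ogata's simulation method. The parameter values, the single-main-shock "observed" history and the number of scenarios are illustrative only; a real application would condition on the full observed catalog up to u.

```python
import numpy as np

# Illustrative ETAS parameters (time domain only)
mu, n, theta, c = 1.0, 0.8, 0.2, 0.001     # events/day, branching ratio, Omori parameters
b, alpha, m0 = 1.0, 0.8, 3.0
K = n * (b - alpha) / b

def intensity(t, times, mags):
    """Conditional intensity lambda(t) = mu + sum over past events of rho(m_i) * Phi(t - t_i)."""
    past = times < t
    dt_ = t - times[past]
    return mu + np.sum(K * 10.0**(alpha * (mags[past] - m0))
                       * theta * c**theta / (dt_ + c)**(1.0 + theta))

def one_scenario(hist_t, hist_m, u, horizon, rng):
    """One possible future on (u, u + horizon], simulated by thinning."""
    times, mags = hist_t.copy(), hist_m.copy()
    t = u
    while True:
        lam_bar = intensity(t + 1e-9, times, mags)   # valid upper bound: lambda decays until the next event
        t += rng.exponential(1.0 / lam_bar)          # candidate occurrence time
        if t > u + horizon:
            break
        if rng.random() * lam_bar < intensity(t, times, mags):   # thinning (acceptance) step
            times = np.append(times, t)
            mags = np.append(mags, m0 - np.log10(1.0 - rng.random()) / b)
    keep = times > u
    return times[keep], mags[keep]

# Forecast of the number of events in the next 5 days, 10 days after a lone M = 7 main shock
rng = np.random.default_rng(0)
hist_t, hist_m = np.array([0.0]), np.array([7.0])
counts = [one_scenario(hist_t, hist_m, u=10.0, horizon=5.0, rng=rng)[0].size for _ in range(200)]
print("median forecast:", np.median(counts), "  10th-90th percentiles:", np.percentile(counts, [10, 90]))
```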
5.1. Fixed Present and Variable Forecast Horizon
[53] Figure 3 illustrates the problem of forecasting the aftershock seismicity following a large M = 7 event. Imagine that we have just witnessed the M = 7 event and want to forecast the seismic activity afterward over a varying horizon from days to years in the future. In this simulation, u is kept fixed at the time just after the M = 7 event and t is varied. A realization of the instantaneous rate of seismic activity (number of events per day) of a synthetic catalog is shown by the black dots. This simulation has been performed with the parameters n = 0.8, α = 0.8, b = 1, c = 0.001 day, m0 = 3 and μ = 1 event per day. This single realization is compared with two forecasting algorithms: the sum of the bare propagators of all past events ti ≤ u, and the median of the seismicity rate obtained over 500 scenarios generated with the ETAS model, using the same parameters as used for generating the synthetic catalog we want to forecast, and taking into account the specific realization of events in each scenario up to the present. Figure 4 is the same as Figure 3 but shows the seismic activity as a function of the logarithm of the time after the main shock. These two figures illustrate clearly the importance of taking into account all the cascades of still unobserved triggered events in order to forecast correctly the future rate of seismicity beyond a few minutes. The aftershock activity forecast gives a very reasonable estimation of the future activity rate, while the extrapolation of the bare Omori law of the strong M = 7 event underestimates the future seismicity beyond half‐an‐hour after the strong event.


5.2. Varying “Present” With Fixed Forecast Horizon
[54] Figure 5 compares a single realization of the seismicity rate, observed and summed over a 5 day period and divided by 5 so that it is expressed as a daily rate, with the predicted seismicity rate using either the sum of the bare propagators of the past seismicity or the median of 100 scenarios generated with the same parameters as for the synthetic catalog we want to forecast: n = 0.8, c = 0.001 day, μ = 1 event per day, m0 = 3, b = 1 and α = 0.8. The forecasting methods calculate the total number of events over each 5 day period lying ahead of the present, taking into account all past seismicity including the still unobserved triggered seismicity. This total number of forecasted events is again divided by 5 to express the prediction as daily rates. The thin solid lines indicate the first and 9th deciles of the distributions of the number of events observed in the pool of 100 scenarios. Stars indicate the occurrence of large M ≥ 7 earthquakes. Only a small part of the whole time period used for the forecast is shown, including the largest M = 8.5 earthquake of the catalog, in order to illustrate the difference between the realized seismicity rate and the different methods of forecasting.

[55] The realized seismicity rate is always larger than the seismicity rate predicted using the sum of the bare propagators of the past activity. This is because the seismicity that will occur up to 5 days in the future is dominated by events triggered by earthquakes that will occur in the next 5 days. These events are not taken into account by the sums of the bare propagators of the past seismicity. The realized seismicity rate is close to the median of the scenarios (crosses), and the fluctuations of the realized seismicity rate are in good agreement with the expected fluctuations measured by the deciles of the distributions of the seismicity rate over all scenarios generated.
[56] Figure 6 compares the predictions of the seismicity rate over a 5 day horizon with the seismicity of a typical synthetic catalog; a small fraction of the history was shown in Figure 5. This comparison is performed by plotting the predicted number of events in each 5 day horizon window as a function of the actual number of events. The open circles (crosses) correspond to the forecasts using the median of 100 scenarios (the sum of the bare Omori propagators of the past seismicity). This figure uses a synthetic catalog of N = 200000 events of magnitude larger than m0 = 3 covering a time period of 150 years. The dashed line corresponds to the perfect prediction when the predicted seismicity rate is equal to the realized seismicity rate. This figure shows that the best predictions are obtained using the median of the scenarios rather than using the bare propagator, which always underestimates the realized seismicity rate, as we have already shown.

[57] The most striking feature of Figure 6 is the existence of three clusters, reflecting two mechanisms underlying the realized seismicity: (1) cluster LL with large predicted seismicity and large realized seismicity; (2) cluster SL with small predicted seismicity and large realized seismicity; (3) cluster SS with small predicted seismicity and small realized seismicity. Cluster LL along the diagonal reflects the predictive skill of the triggered seismicity algorithm: this is when future seismicity is triggered by past seismicity. Cluster SL lies horizontally at low predicted seismicity rates and shows that large seismicity rates can also be triggered by an unforecasted strong earthquake, which may occur even when the seismicity rate is low. This expresses a fundamental limit of predictability since the ETAS model permits large events even for low prior seismicity, as the earthquake magnitudes are drawn from the GR law independently of the seismic rate. About 20% of the large values of the realized seismicity rate above 10 events per day fall in the LL cluster, corresponding to a predictability of about 20% of the large peaks of realized seismic activity. Cluster SS is consistent with a predictive skill, but small seismicity is not usually an interesting target. Note that there is no cluster of large predicted seismicity associated with small realized seismicity.
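For reference, this classification of the forecast windows into the three clusters can be made explicit as follows; the threshold of 10 events per day for "large" realized seismicity follows the text, while the threshold applied to the predicted rate is an assumption of this sketch.

```python
import numpy as np

def classify_windows(predicted, realized, realized_thr=10.0, predicted_thr=10.0):
    """Fractions of forecast windows falling in the LL, SL and SS clusters of Figure 6."""
    lp, lr = predicted >= predicted_thr, realized >= realized_thr
    return {"LL": np.mean(lp & lr),      # large predicted, large realized: predictable activity
            "SL": np.mean(~lp & lr),     # small predicted, large realized: unforecasted strong events
            "SS": np.mean(~lp & ~lr)}    # small predicted, small realized
```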
[58] Figure 7 is the same as Figure 5 for a longer time window of 50 days, which stresses the importance of taking into account the yet unobserved future seismicity in order to accurately forecast the level of future seismicity. Figure 8 is the same as Figure 6 for the forecast time window of 50 days, with forecasts updated every 50 days. Increasing the time window T of the forecasts from 5 to 50 days leads to a smaller variability of the predicted seismicity rate. However, fluctuations of the seismicity rate of one order of magnitude can still be predicted with this model. The ETAS model therefore performs much better than a Poisson process for large horizons of 50 days.


5.3. Error Diagrams and Prediction Gains
[59] In order to quantify the predictive skills of different prediction algorithms for the seismicity of the next five days, we use the error diagram [Molchan, 1991, 1997; Molchan and Kagan, 1992]. The predictions are made from the present to 5 days in the future and are updated every 0.5 day. Using a shorter time between predictions, or updating the prediction after each major earthquake, would obviously improve the predictions, because large aftershocks often occur just after the main shock. However, in practice the forecasting procedure is limited by the time needed to estimate the location and magnitude of an earthquake. Moreover, predictions made only a very short time in advance (a few minutes) are not very useful.
[60] An error diagram requires the definition of a target, for instance M ≥ 6 earthquakes, and plots the fraction of targets that were not predicted as a function of the fraction of time occupied by the alarms (total duration of the alarms normalized by the duration of the catalog). We define an alarm when the predicted seismic rate is above a threshold. Recall that in our model the seismic rate is the physical quantity that embodies completely all the available information on past events. All targets one might be interested in derive from the seismic rate.
[61] Figure 9 presents the error diagram for M ≥ 6 targets, using a time window T = 5 days to estimate the seismicity rate, and a time dT = 0.5 days between two updates of the predictions. We use different prediction algorithms, either the bare propagator (dots), the median (circles) or the mean (triangles) number of events obtained for the 100 scenarios already generated to obtain Figures 5 and 6. Each point of each curve corresponds to a different threshold ranging from 0.1 to 1000 events per day. The results for these three prediction algorithms are considerably better than those obtained for a random prediction, shown as a dashed line for reference.

[62] Ideally, one would like the minimum number of failures and the smallest possible alarm duration. Hence a perfect prediction corresponds to points close to the origin. In practice, the fraction of failures to predict is 100% without alarms, and the skill of the prediction algorithm is quantified by how fast the fraction of failures to predict decreases from 100% as the fraction of alarm duration increases. Formally, the gain G reported below is defined as the ratio of the fraction of predicted targets (= 1 − fraction of failures to predict) to the fraction of time occupied by alarms. A completely random prediction corresponds to G = 1.
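The construction of the error diagram and of the gain G from a series of forecasts can be summarized by the following sketch; the synthetic inputs at the bottom are purely illustrative stand-ins for the forecasted rates and target occurrences, not the catalogs of this study.

```python
import numpy as np

def error_diagram(predicted_rate, target_in_window, thresholds):
    """Fraction of alarm time, fraction of failures to predict, and prediction gain
    for a sliding alarm threshold applied to the forecasted seismicity rate.

    predicted_rate   : array, forecasted rate for each prediction window
    target_in_window : boolean array, True if the window contains at least one target event
    """
    n_targets = target_in_window.sum()          # number of windows containing a target
    tau, miss, gain = [], [], []
    for thr in thresholds:
        alarm = predicted_rate >= thr                           # windows in a state of alarm
        frac_alarm = alarm.mean()                               # fraction of time covered by alarms
        frac_pred = (alarm & target_in_window).sum() / n_targets
        tau.append(frac_alarm)
        miss.append(1.0 - frac_pred)
        gain.append(frac_pred / frac_alarm if frac_alarm > 0 else np.nan)
    return np.array(tau), np.array(miss), np.array(gain)

# Toy usage with synthetic numbers (illustrative only)
rng = np.random.default_rng(0)
rate = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)          # stand-in for the forecasted rate
targets = rng.random(100_000) < rate / rate.sum() * 200          # targets more likely when the rate is high
tau, miss, gain = error_diagram(rate, targets, thresholds=np.logspace(-1, 2, 20))
for t_, m_, g_ in zip(tau, miss, gain):
    print(f"alarm fraction {t_:6.3f}   missed {m_:5.2f}   gain {g_:6.2f}")
```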
[63] We observe that about 50% of the M ≥ 6 earthquakes can be predicted with a small fraction of alarm duration of about 20%, leading to a gain of 2.5 for this value of the alarm duration. The gain is significantly larger for smaller fractions of alarm duration: as shown in Figure 9b, 25% of the M ≥ 6 earthquakes can be predicted with a small fraction of alarm duration of about 2%, leading to a gain of 12.5. The origin of this good performance for only a fraction of the targets has been discussed in relation with Figure 6, and is associated with those events that occur in times of large seismic rate (the cluster along the diagonal in Figure 6). Figure 9c shows the dependence of the prediction gain G as a function of the alarm duration: the three prediction schemes give approximately the same power law increase of the gain, with an exponent close to 1/2, as the duration of alarms decreases. For small alarm durations, the gain reaches values of several hundreds. The saturation at very small values of the alarm duration is due to the fact that only a few targets are sampled. Figures 10 and 11 are similar to Figure 9, for a smaller target magnitude threshold of 5 and a larger threshold of 7, respectively.


[64] Table 1 presents the results for the prediction gain and for the number of successes using different choices of the time window T and of the update time dT between two predictions, and for different values of the target magnitude between 5 and 7. The prediction gain decreases if the time between two updates of the prediction increases, because most large earthquakes occur at very short times after a previous large earthquake. In contrast, the prediction gains do not depend on the time window T for the same value of the update time dT.
| T, days | dT, days | Mt | N1 | N2 | Gmax | A | Ns | N1% | N10% | N50% | G1% | G10% | G50% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 1.0 | 5.0 | 2003 | 1332 | 40.4 | 3.2 × 10−4 | 17 | 120 | 332 | 806 | 9.01 | 2.49 | 1.21 |
| 1.0 | 1.0 | 5.5 | 637 | 461 | 117. | 7.4 × 10−5 | 4 | 58 | 136 | 303 | 12.6 | 2.95 | 1.31 |
| 1.0 | 1.0 | 6.0 | 198 | 159 | 339. | 3.7 × 10−5 | 2 | 30 | 56 | 94 | 18.9 | 3.52 | 1.18 |
| 1.0 | 1.0 | 6.5 | 66 | 55 | 979. | 1.9 × 10−5 | 1 | 10 | 15 | 28 | 18.2 | 2.73 | 1.02 |
| 1.0 | 1.0 | 7.0 | 29 | 27 | 665. | 5.6 × 10−5 | 1 | 7 | 11 | 14 | 25.9 | 4.07 | 1.04 |
| 5.0 | 0.5 | 5.0 | 2003 | 1389 | 77.5 | 1.1 × 10−4 | 12 | 155 | 382 | 853 | 11.2 | 2.75 | 1.23 |
| 5.0 | 0.5 | 5.5 | 637 | 483 | 223. | 7.4 × 10−5 | 8 | 72 | 155 | 320 | 14.9 | 3.21 | 1.33 |
| 5.0 | 0.5 | 6.0 | 198 | 164 | 656. | 1.9 × 10−5 | 2 | 35 | 64 | 106 | 21.3 | 3.90 | 1.29 |
| 5.0 | 0.5 | 6.5 | 66 | 57 | 1889. | 9.3 × 10−6 | 1 | 12 | 18 | 32 | 21.0 | 3.16 | 1.12 |
| 5.0 | 0.5 | 7.0 | 29 | 28 | 3847. | 9.3 × 10−6 | 1 | 8 | 12 | 17 | 28.6 | 4.29 | 1.21 |
| 5.0 | 5.0 | 5.0 | 2003 | 1172 | 9.2 | 6.5 × 10−4 | 7 | 53 | 222 | 652 | 4.52 | 1.89 | 1.11 |
| 5.0 | 5.0 | 5.5 | 637 | 420 | 25.6 | 3.7 × 10−4 | 4 | 30 | 93 | 253 | 7.14 | 2.21 | 1.20 |
| 5.0 | 5.0 | 6.0 | 198 | 145 | 74.3 | 2.8 × 10−4 | 3 | 16 | 38 | 85 | 11.0 | 2.62 | 1.17 |
| 5.0 | 5.0 | 6.5 | 66 | 53 | 203. | 1.9 × 10−4 | 2 | 7 | 12 | 30 | 13.2 | 2.26 | 1.13 |
| 5.0 | 5.0 | 7.0 | 29 | 26 | 414. | 1.9 × 10−4 | 1 | 6 | 9 | 14 | 23.1 | 3.46 | 1.08 |
| 10. | 10. | 5.0 | 2003 | 1067 | 5.1 | 5.6 × 10−4 | 3 | 32 | 167 | 584 | 3.00 | 1.57 | 1.09 |
| 10. | 10. | 5.5 | 637 | 400 | 13.5 | 3.7 × 10−4 | 2 | 19 | 77 | 229 | 4.75 | 1.93 | 1.15 |
| 10. | 10. | 6.0 | 198 | 137 | 39.3 | 1.9 × 10−4 | 1 | 10 | 30 | 78 | 7.30 | 2.19 | 1.14 |
| 10. | 10. | 6.5 | 66 | 50 | 107. | 1.9 × 10−4 | 1 | 5 | 8 | 26 | 10.0 | 1.60 | 1.04 |
| 10. | 10. | 7.0 | 29 | 24 | 224. | 1.9 × 10−4 | 1 | 5 | 7 | 13 | 20.8 | 2.92 | 1.08 |
| 50. | 50. | 5.0 | 2003 | 701 | 1.5 | 0.016 | 17 | 11 | 84 | 370 | 1.57 | 1.20 | 1.06 |
| 50. | 50. | 5.5 | 637 | 329 | 3.3 | 9.3 × 10−4 | 1 | 8 | 43 | 181 | 2.43 | 1.31 | 1.10 |
| 50. | 50. | 6.0 | 198 | 123 | 8.8 | 9.3 × 10−4 | 1 | 5 | 20 | 62 | 4.07 | 1.63 | 1.01 |
| 50. | 50. | 6.5 | 66 | 48 | 22.4 | 9.3 × 10−4 | 1 | 4 | 7 | 32 | 8.33 | 1.46 | 1.33 |
| 50. | 50. | 7.0 | 29 | 22 | 48.9 | 9.3 × 10−4 | 1 | 4 | 5 | 16 | 18.2 | 2.27 | 1.45 |
| 50. | 5. | 5.0 | 2003 | 1172 | 9.2 | 6.5 × 10−4 | 7 | 53 | 209 | 657 | 3.37 | 1.78 | 1.12 |
| 50. | 5. | 5.5 | 637 | 420 | 25.6 | 3.7 × 10−4 | 4 | 27 | 89 | 251 | 4.76 | 2.12 | 1.20 |
| 50. | 5. | 6.0 | 198 | 145 | 74.3 | 2.8 × 10−4 | 3 | 13 | 37 | 82 | 7.24 | 2.55 | 1.13 |
| 50. | 5. | 6.5 | 66 | 53 | 203. | 1.9 × 10−4 | 2 | 7 | 11 | 24 | 8.49 | 2.08 | 0.91 |
| 50. | 5. | 7.0 | 29 | 26 | 414. | 1.9 × 10−4 | 2 | 7 | 9 | 13 | 15.4 | 3.46 | 1.00 |
- a N1 is the number of targets M ≥ Mt; N2 is the number of intervals with at least one target. Gmax is the maximum prediction gain, which is realized for an alarm duration A (in proportion of the total duration of the catalog), which is also given in the table. All three prediction algorithms used here provide the same gain as a function of the alarm duration, corresponding to different choices of the alarm threshold on the predicted seismicity rate. Ns is the number of successful predictions, using the alarm threshold that provides the maximum prediction gain Gmax for an alarm duration A (we count only one success when two events occur in the same interval). This number Ns is always very small, but a much larger number of successes can be obtained with a larger alarm duration. N1%, N10%, N50% are the numbers of successes corresponding to an alarm duration (in proportion of the total duration of the catalog) of 1%, 10% and 50% respectively, corresponding to the prediction gains G1%, G10% and G50%. The values of G50% show a saturation in the predictive power when increasing the fraction of alarm time, reflecting the fundamental limitation stemming from the fraction of large earthquakes not associated with a large seismic rate. Reading for instance the last line of this table, we observe that, out of 26 time windows of 50 days that contained a M ≥ 7 earthquake, we are able to predict 7 of them with only 1% of the time occupied by alarms. Only two additional ones are predicted when using 10% of the time occupied by alarms. And only another four are predicted by increasing the time of alarms to half the total duration of the catalog. The catalog spans 150 years, corresponding to a little more than 10^5 half-day periods.
[65] The prediction gain is observed to increase significantly with the target magnitude, especially in the range of small fractions of alarm duration (see Table 1 and Figures 9–11). However, this increase of the prediction gain does not mean that large earthquakes are more predictable than smaller ones, in contrast with, for example, the critical earthquake theory [Sornette and Sammis, 1995; Jaumé and Sykes, 1999; Sammis and Sornette, 2002]. In the ETAS model, the increase of the prediction gain with the target magnitude is due to the decrease of the number of target events with the target magnitude. Indeed, choosing N events at random in the catalog independently of their magnitude gives on average the same prediction gain as for the N largest events. This demonstrates that the larger predictability of large earthquakes is solely a size effect.
[66] We now clarify the statistical origin of this size effect. Let us consider a catalog of total duration D with a total number N of events, analyzed with D/T time windows of horizon T. These D/T windows can be sorted by decreasing seismicity r1 > r2 > … > ri > …, where ri is the i-th largest number of events in a window of size T. There are n1, n2, …, ni, … windows of type 1, 2, …, i, … respectively, such that Σi ri ni = N. Then, the frequency-probability that an earthquake drawn at random from the catalog falls within a window of type i is
P_i = r_i n_i / N .    (28)
[67] In our previous discussion, we have not distinguished the skills of the three algorithms, because they perform essentially identically with respect to the assigned targets. This is very surprising from the perspective offered by all our previous analysis, which showed that the naive use of the direct Omori law, without taking into account the effect of the indirect triggered seismicity, strongly underestimates the future seismicity. We should thus expect a priori that this prediction scheme should be significantly worse than the two others based on a correct counting of all unobserved triggered seismicity. The explanation for this paradox is given by examining Figure 12, which presents further insight into the prediction methods applied to the synthetic catalogs used in Figures 3–11. Figure 12 shows three quantities as a function of the threshold in seismicity rate used to define an alarm, for each of the three algorithms. These quantities are respectively the duration of alarms normalized by the total duration of the catalog, shown in Figure 12a, the fraction of successes (= 1 − fraction of failures to predict), shown in Figure 12b, and the prediction gain, shown in Figure 12c. These three figures tell us that the incorrect level of seismic activity predicted by the bare Omori law approach can be compensated by the use of a lower alarm threshold. In other words, even if the seismicity rate predicted by the bare Omori law approach is wrong in absolute values, its time evolution in relative terms contains basically the same information as the full-fledged method taking into account all unobserved triggered seismicity. Therefore an algorithm that can detect a relative change of seismicity can perform as well as the complete approach for the forecast of the assigned targets. This is an illustration of the fact that predictions of different targets can have very different skills, which depend on the targets. Using the full-fledged renormalized approach is the correct and only method to obtain the best possible predictor of the future seismicity rate. However, other simpler and more naive methods can perform almost as well for more restricted targets, such as the prediction of only the strong earthquakes.

5.4. Optimization of Earthquake Prediction
[68] The optimization of the prediction method requires the definition of a loss function γ, which should be minimized in order to determine the optimum alarm threshold of the prediction method. The probability gain defined in the previous section cannot be used as an optimization criterion, because it is maximized for a vanishing alarm time. The strategies that optimize the probability gain are thus very impractical. An error function commonly used [e.g., Molchan and Kagan, 1992] is the sum of the two types of errors in earthquake prediction, the fraction of alarm duration τ and the fraction of missed events ν. This loss function is illustrated in Figure 13 for the same numerical simulation of the ETAS model as in the previous section, using a prediction time window of 5 days and an update time of 0.5 day, for different values of the target magnitude between 5 and 7. For a prediction algorithm that has no predictive skill, the loss function γ should be close to 1, independently of the alarm duration τ. This is indeed what we observe for the prediction algorithm using a Poisson process with a constant seismicity rate equal to the average seismicity rate of the realized simulation. For the prediction methods based on the ETAS model, we obtain a significant predictability by comparison with the Poisson process (Figure 13). The loss function is minimum for a fraction of alarm duration of about 10%, and its minimum decreases with the target magnitude Mt from 0.9 for Mt = 5 down to 0.7 for Mt = 7. We expect that the minimum loss function will be much lower when using the information on earthquake locations.
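Building on the error-diagram sketch of section 5.3, the loss function γ = τ + ν and its minimizing threshold can be obtained directly from the same quantities; this is again an illustration on stand-in data, not the computation behind Figure 13.

```python
import numpy as np

# Reuse error_diagram, rate and targets from the sketch of section 5.3
thresholds = np.logspace(-1, 2, 200)
tau, miss, gain = error_diagram(rate, targets, thresholds)
loss = tau + miss                      # gamma = fraction of alarm time + fraction of missed targets
best = np.argmin(loss)
print(f"optimal threshold ~ {thresholds[best]:.2f}   gamma = {loss[best]:.2f}   "
      f"alarm fraction = {tau[best]:.2f}   gain = {gain[best]:.2f}")
```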

5.5. Influence of the ETAS Parameters on the Predictability
[69] We now test the influence of the model parameters α, n, m0 and p on the predictability. We did not test the influence of the b value, because this parameter is rather well constrained in seismicity, and because its influence is felt only relative to α. We keep c equal to 0.001 day because this parameter is not critical as long as it is small. The external loading μ is also fixed to 1 event/day because it acts only as a multiplicative factor of the global seismicity. The value of the minimum magnitude m0, above which earthquakes may trigger aftershocks of magnitude larger than m0, is very poorly constrained. This value is no larger than 3 for the Southern California seismicity, because there is direct evidence of M3+ earthquakes triggered by M = 3 earthquakes [Helmstetter, 2003]. We are limited in our exploration of small values of m0 because the number of earthquakes increases rapidly as m0 decreases, and thus the computation time becomes prohibitive. We have tested the values m0 = 1 and m0 = 2 and find that the effect of a decrease of m0 is simply to multiply the seismicity rate obtained for M ≥ 3 by a constant factor. Using a smaller m0 does not change the temporal distribution of seismicity and therefore does not change the predictability of the system.
[70] We use the minimum value of the error function γ introduced in the previous section to characterize the predictability of the system. This function is estimated from the seismicity rate predicted using the bare propagator, as done in previous studies [Kagan and Knopoff, 1987; Kagan and Jackson, 2000]. While all three prediction methods that we have investigated give approximately the same results for the error diagram, the method of the bare propagator is much faster, which justifies using it. The results are summarized in Table 2, which gives the minimum value of the error function γ for each set of parameters, and the corresponding values of the alarm duration, the proportion of predicted M ≥ 6 events and the prediction gain. All values of γ are in the range 0.6–0.9, corresponding to a small but significant predictability of the ETAS model. The predictability increases (i.e., γ decreases) with α, n and p, because for large α, large n and/or large p, there are larger fluctuations of the seismicity rate, which have a stronger impact on the future seismicity. The minimum magnitude m0 has no significant influence on the predictability. We recover the same pattern when using another estimate of the predictability measured by the prediction gain G1% for an alarm duration of 1%.
| n | m0 | α | p | γ | G1% |
|---|---|---|---|---|---|
| 0.8 | 3.0 | 0.5 | 1.2 | 0.86 | 3.4 |
| 0.8 | 3.0 | 0.8 | 1.2 | 0.84 | 8.9 |
| 0.8 | 3.0 | 0.9 | 1.2 | 0.81 | 12.5 |
| 0.5 | 3.0 | 0.8 | 1.2 | 0.89 | 4.8 |
| 1.0 | 3.0 | 0.8 | 1.2 | 0.64 | 18.8 |
| 0.8 | 1.0 | 0.8 | 1.2 | 0.83 | 6.5 |
| 0.8 | 2.0 | 0.8 | 1.2 | 0.82 | 9.9 |
| 0.8 | 3.0 | 0.8 | 1.1 | 0.86 | 9.3 |
| 0.8 | 3.0 | 0.8 | 1.3 | 0.76 | 19.1 |
- a Another measure of the predictability is the prediction gain (G1%) corresponding to an alarm duration of 1%, which is equal to the percentage of predicted events. Predictions are done for the next day and updated each day (T = dT = 1 day).
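As an illustration of how a gain such as G1% can be obtained from a predicted-rate series, here is a minimal sketch that reuses the hypothetical arrays of the previous snippet: the alarm threshold is chosen so that alarms occupy a prescribed fraction of the catalog, and the gain is the fraction of predicted targets divided by that alarm fraction. The function name and arguments are ours, not from the original study.

```python
import numpy as np

def prediction_gain(predicted_rate, target_occurred, alarm_fraction=0.01):
    """Prediction gain G for a fixed alarm duration (fraction of the catalog).

    The alarm threshold is the (1 - alarm_fraction) quantile of the predicted
    rate, so alarms cover approximately `alarm_fraction` of the windows.
    G = (fraction of target events falling in alarm windows) / alarm fraction.
    """
    predicted_rate = np.asarray(predicted_rate, dtype=float)
    target_occurred = np.asarray(target_occurred, dtype=bool)
    thr = np.quantile(predicted_rate, 1.0 - alarm_fraction)
    alarm = predicted_rate >= thr
    tau = alarm.mean()                                        # realized alarm fraction
    predicted = (target_occurred & alarm).sum() / target_occurred.sum()
    return predicted / tau

# With alarm_fraction = 0.01, G coincides with the percentage of predicted
# targets, as noted in the footnote of Table 2:
# G_1pct = prediction_gain(rate, hits, alarm_fraction=0.01)
```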
5.6. Information Gain
[71] We now follow Kagan and Knopoff [1977], who introduced the entropy/information concept linking the likelihood gain to the entropy per event and hence to the predictive power of the fitted model, and Vere‐Jones [1998], who suggested using the information gain to compare different models and to estimate the predictability of a process.
[72] For a Poisson process, and assuming a constant magnitude distribution given by (5), the probability pi to have at least one event above the target magnitude Mt in the time interval (ti, ti + T) can be evaluated from the average seismicity rate λi above m0 by
pi = 1 − exp[−λi T 10^(−b(Mt − m0))]    (29)
[73] The binomial score B compares the prediction pi with the realization Xi, with Xi = 1 if a target event occurred in the interval (ti, ti + T) and Xi = 0 otherwise. For the whole sequence of intervals (ti, ti + T), the binomial score is defined by
B = Σi [Xi ln pi + (1 − Xi) ln(1 − pi)]    (30)
[74] In order to test the performance of our forecasting algorithm, we compare the binomial score BETAS of the ETAS model with two time‐independent processes. First, we use a Poisson process with a seismicity rate equal to the average seismicity rate of the realized catalog, as done by Vere‐Jones [1998], and use equation (29) to estimate the probability pi. Because the Poisson process assumes a uniform temporal distribution of target events and thus neglects the clustering of large events, it overestimates the proportion of intervals which have at least one target event. Indeed, the probability of having several events of magnitude M ≥ Mt in the same time window is much higher for the ETAS model than for a Poisson process. We thus propose another null hypothesis: a time‐independent, non‐Poissonian process obtained by setting all values of pi equal to the fraction of intervals in the realized catalog that contain at least one target event. For small time intervals and/or large target magnitudes, the proportion of intervals containing several target events is very small, and the two time‐independent processes give similar results. An alternative way to construct a time‐independent null hypothesis including clustering is to replace the Poisson distribution assumed in (29) by a negative binomial law [Kagan and Jackson, 2000], which has a larger variance.
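To fix ideas, the following sketch evaluates the probability of equation (29) from an average rate, computes the binomial score of equation (30), and indicates how the two time‐independent null hypotheses can be set up. The function names and calling conventions are ours; only the formulas follow the text above.

```python
import numpy as np

def poisson_prob(rate, T, b, m_target, m0):
    """Eq. (29): probability of at least one M >= m_target event in a window of
    length T, for a Poisson process with rate `rate` above m0 and a
    Gutenberg-Richter magnitude distribution of slope b."""
    return 1.0 - np.exp(-rate * T * 10.0 ** (-b * (m_target - m0)))

def binomial_score(p, x):
    """Eq. (30): sum over windows of X_i ln p_i + (1 - X_i) ln(1 - p_i), where
    X_i = 1 if the window contains a target event and 0 otherwise."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0 - 1e-12)  # avoid log(0)
    x = np.asarray(x, dtype=float)
    return np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))

# Poisson null hypothesis: the same probability, from the catalog-average rate,
# for every window:
#   p_pois = poisson_prob(mean_rate, T, b, m_target, m0)
# Second (clustering-aware) null hypothesis: p_i set to the observed fraction
# of windows containing at least one target event:
#   p_nh = x.mean()
```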
[75] The results for different choices of the time interval T and of the target magnitude Mt are listed in Table 3. We evaluate the binomial score B for four prediction algorithms: (1) the ETAS model, using numerical simulations to evaluate the probability pi entering (30) (BETAS); (2) the sum of the bare propagators of the past seismicity to estimate the average seismicity rate λi, assuming a Poisson process (29) to estimate the probability pi from the rate λi (Bϕ); (3) a Poisson process (Bpois); and (4) the second null hypothesis, with pi fixed to the observed fraction of time intervals which contain at least one target event (Bnh). The results for the forecasting method based on simulations of the ETAS model (BETAS) are generally better than for the time‐independent processes, i.e., the binomial score for the ETAS model is larger than the score obtained with a Poisson process (Bpois) or with the second null hypothesis (Bnh). The second null hypothesis (Bnh) performs much better than a Poisson process because it takes clustering into account. For large time scales T and small target magnitudes Mt, it performs even better than predictions based on the ETAS model. The score BETAS, which takes into account the cascade of secondary aftershocks, is significantly better than the score Bϕ obtained with the bare propagator, even for short time intervals T, because this predictor underestimates the realized seismicity rate. For large time intervals T ≥ 10 days, and for large target magnitudes, the scores Bϕ for the bare propagator are even worse than the results obtained with a Poisson process. Our results are in disagreement with those reported by Vere‐Jones [1998] on the same ETAS model: we conclude that the ETAS model has a significantly higher predictive power than the Poisson process, while Vere‐Jones [1998] concludes that the forecasting performance of the ETAS model is worse than that of the Poisson model. Vere‐Jones and Zhuang's procedure and ours are very similar. They use the same method to generate ETAS simulations and to update the predictions at rigid time intervals, with a similar time between two updates of the predictions. They use the same method to estimate the probability of having a target event for the Poisson process. However, rather than deriving the probability pi from the number of events in each scenario, as done in this work, they measure this probability directly from the fraction of scenarios which contain at least one target event (a minimal sketch of these two scenario‐based estimates is given after Table 3). We have compared the two methods and found that the method of Vere‐Jones gives results very similar to ours. However, for large target magnitudes and small time intervals, the method of Vere‐Jones requires generating a huge number of scenarios to obtain accurate estimates of the fraction of scenarios containing at least one target event. We thus believe that our method might be better in this case and give a more accurate estimate of the probability pi of having at least one target event. This is one possible origin of the discrepancy between Vere‐Jones' results and ours.
| T, days | Mt | N1 | N2 | BETAS | Bϕ | Bpois | Bnh |
|---|---|---|---|---|---|---|---|
| 1 | 5.0 | 2003 | 1332 | −6155.3 | −6057.3 | −6361.8 | −6243.4 |
| 1 | 5.5 | 637 | 461 | −2569.1 | −2545.0 | −2678.7 | −2653.7 |
| 1 | 6.0 | 198 | 159 | −1031.0 | −1023.2 | −1089.4 | −1085.0 |
| 1 | 6.5 | 66 | 55 | −418.4 | −416.8 | −434.3 | −433.7 |
| 1 | 7.0 | 29 | 27 | −217.8 | −224.0 | −233.3 | −232.1 |
| 5 | 5.0 | 2003 | 1172 | −3677.2 | −3717.8 | −3862.8 | −3705.5 |
| 5 | 5.5 | 637 | 420 | −1737.1 | −1765.4 | −1810.7 | −1774.3 |
| 5 | 6.0 | 198 | 145 | −739.5 | −752.8 | −776.7 | −768.7 |
| 5 | 6.5 | 66 | 53 | −321.0 | −331.3 | −335.4 | −334.5 |
| 5 | 7.0 | 29 | 26 | −168.3 | −179.3 | −183.5 | −182.7 |
| 10 | 5.0 | 2003 | 1067 | −2686.6 | −2736.0 | −2852.4 | −2680.7 |
| 10 | 5.5 | 637 | 400 | −1407.0 | −1439.9 | −1465.4 | −1424.7 |
| 10 | 6.0 | 198 | 137 | −627.6 | −640.3 | −648.6 | −638.2 |
| 10 | 6.5 | 66 | 50 | −278.1 | −286.1 | −285.2 | −283.7 |
| 10 | 7.0 | 29 | 24 | −145.0 | −155.3 | −154.3 | −153.9 |
| 50 | 5.0 | 2003 | 701 | −706.6 | −758.7 | −817.2 | −696.7 |
| 50 | 5.5 | 637 | 329 | −668.6 | −702.7 | −706.2 | −662.8 |
| 50 | 6.0 | 198 | 123 | −384.5 | −398.5 | −395.5 | −382.6 |
| 50 | 6.5 | 66 | 48 | −191.9 | −204.3 | −197.9 | −196.2 |
| 50 | 7.0 | 29 | 22 | −102.4 | −113.3 | −107.5 | −107.4 |
- a We use nonoverlapping time intervals for the predictions of length T with a time dT = T between two predictions. BETAS is estimated using numerical simulations of the ETAS model, Bϕ is estimated using the sum of the bare propagators of the past seismicity to estimate the average seismicity rate, Bpois is obtained for a Poisson process and Bnh for a time‐independent process with pi fixed to the observed fraction of time intervals which have at least one target event. N1 is the number of target events M ≥ Mt ; N2 is the number of intervals with at least one target event.
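The two scenario‐based estimates of pi compared above can be sketched as follows. This is an illustration under our own assumptions about how each simulated scenario is summarized (by its number of M ≥ m0 events, or by its largest magnitude), not the code of the original study: the first estimator converts event counts into a target probability through the Gutenberg‐Richter law, while the second simply counts the scenarios that contain a target event.

```python
import numpy as np

def p_from_event_counts(n_events_per_scenario, b, m_target, m0):
    """Estimate p_i from the number of M >= m0 events in each simulated
    scenario: for a scenario with N events, the probability that none exceeds
    m_target is (1 - 10^{-b (m_target - m0)})^N (assumed form, following the
    Gutenberg-Richter law); p_i is one minus this, averaged over scenarios."""
    n = np.asarray(n_events_per_scenario, dtype=float)
    q = 1.0 - 10.0 ** (-b * (m_target - m0))   # P(a single event stays below m_target)
    return np.mean(1.0 - q ** n)

def p_from_scenario_fraction(max_magnitude_per_scenario, m_target):
    """Vere-Jones-style estimate: fraction of scenarios whose largest simulated
    event reaches the target magnitude."""
    m_max = np.asarray(max_magnitude_per_scenario, dtype=float)
    return np.mean(m_max >= m_target)
```

Because the count‐based estimate uses the magnitude distribution analytically, it remains well constrained even when very few scenarios contain a target event, which is precisely the regime (large Mt, small T) discussed above.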
[76] The ETAS parameters used in the two studies are also very similar and cannot account for the disagreement. The tests of Vere‐Jones [1998] were performed using an α/b ratio of 0.57/1.14 = 0.5, smaller than the value α/b = 0.8 used in our simulations. This difference may lead to a smaller predictability for the simulations of Vere‐Jones [1998] because there are fewer large aftershock sequences. The branching ratio n = 0.78 used by Vere‐Jones [1998] is very close to our value n = 0.8. However, these differences in the ETAS parameters cannot explain why Vere‐Jones [1998] obtains a better predictability for a Poisson process than for the ETAS model. Vere‐Jones [1998] concludes that the Poisson process has better predictive power than ETAS because the binomial score measured for time periods containing target events, i.e., obtained by keeping only the first term of equation (30), is larger for the Poisson process than for ETAS. While we agree with this observation, the fact that the first term of the binomial score is larger for the Poisson process than for ETAS does not imply that the Poisson process is more predictive. Indeed, the score for periods containing events is maximum for a trivial model specifying a probability pi = 1 of having an event in every time interval. Because this model never misses an event, it gives the maximum binomial score for periods containing target events, but it is of course not better than the ETAS or the Poisson model once the second term of the binomial score, corresponding to periods without target events, is taken into account. If we use our second null hypothesis, which takes clustering into account, instead of the Poisson process, the ETAS model obtains a better score for periods containing target events than this time‐independent process. We thus think that the conclusion of Vere‐Jones [1998] that the ETAS model is sometimes less predictive than the Poisson process results from an inadequate measure of the predictability. We caution that a suitable assessment of the forecasting skills of a model requires several complementary quantitative measures, such as the predicted versus realized seismicity rates, the error diagram and prediction gain, and the entropy/information gain, together with a large number of target events.
6. Conclusions
[77] Using a simple model of triggered seismicity, the ETAS model, based on the (bare) Omori law, the Gutenberg‐Richter law and the idea that large events trigger more numerous aftershocks, we have developed an analytical approach to account for the triggered seismicity adapted to the problem of forecasting future seismic rates at varying horizons from the present. Tests presented on synthetic catalogs have validated the use of interacting triggered seismicity to forecast large earthquakes in these models. This work provides what we believe is a useful benchmark from which to develop prediction tests on real catalogs. These tests have also delineated the fundamental limits underlying forecasting skills, stemming from an intrinsic stochastic component in the seismicity models. Our results offer a rationale for the fact that pattern recognition algorithms may perform better for strong earthquakes than for weaker events. Although the predictability of an earthquake is independent of its magnitude in the ETAS model, the prediction gain is better for the largest events because they are less numerous, so that it is more probable that they are associated with periods of large seismicity rate, which are themselves more predictable.
[78] We have shown in the work of Helmstetter et al. [2003] that most precursory patterns used in prediction algorithms, such as a decrease of the b value or an increase of seismic activity, can be reproduced by the ETAS model. If the physics of triggering is fully characterized by the class of models discussed here, this suggests that such patterns and precursory indicators are suboptimal compared with predictions based on a full modeling of the seismicity. The calibration of the ETAS model or some of its variants on real catalogs, as done in the work of Kagan and Knopoff [1987], Kagan and Jackson [2000], Console and Murru [2001], Ogata [1988, 1989, 1992, 1999, 2001], Kagan [1991], and Felzer et al. [2002], represents an important step in this direction. However, in practical terms, the issue of model errors, associated with the use of an imperfect model calibrated on an incomplete data set with imperfectly known parameters, may weaken this statement or even turn it on its head.
Acknowledgments
[79] We are very grateful to D. Harte, Y. Y. Kagan, Y. Ogata, F. Schoenberg, D. Vere‐Jones, and J. Zhuang for useful exchanges and for their feedback on the manuscript. This work is partially supported by NSF‐EAR02‐30429, by the Southern California Earthquake Center (SCEC), and by the James S. McDonnell Foundation 21st Century Scientist Award/Studying Complex Systems. SCEC is funded by NSF Cooperative Agreement EAR‐0106924 and USGS Cooperative Agreement 02HQAG0008. The SCEC contribution number for this paper is 743.




