Improving Physics‐Based Aftershock Forecasts During the 2016–2017 Central Italy Earthquake Cascade

The 2016–2017 Central Apennines earthquake sequence is a recent example of how damages from subsequent aftershocks can exceed those caused by the initial mainshock. Recent studies reveal that physics‐based aftershock forecasts present comparable skills to their statistical counterparts, but their performance remains a controversial subject. Here we employ physics‐based models that combine the elasto‐static stress transfer with rate‐and‐state friction laws, and short‐term statistical Epidemic Type Aftershock Sequence (ETAS) models to describe the spatiotemporal evolution of the earthquake cascade. We then track the absolute and relative model performance using log‐likelihood statistics for a 1‐year horizon after the 24 August 2016 Mw = 6.0 Amatrice earthquake. We perform a series of pseudoprospective experiments by producing seven classes of Coulomb rate‐state (CRS) forecasts with gradual increase in data input quality and model complexity. Our goal is to investigate the influence of data quality on the predictive power of physics‐based models and to assess the comparative performance of the forecasts in critical time windows, such as the period following the 26 October Visso earthquakes leading to the 30 October Mw = 6.5 Norcia mainshock. We find that (1) the spatiotemporal performance of the basic CRS models is poor and progressively improves as more refined data are used, (2) CRS forecasts are about as informative as ETAS when secondary triggering effects from M3+ earthquakes are included together with spatially variable slip models, spatially heterogeneous receiver faults, and optimized rate‐and‐state parameters. After the Visso earthquakes, the more elaborate CRS model outperforms ETAS highlighting the importance of the static stress transfer for operational earthquake forecasting.


Introduction
Earthquake cascades with multiple moderate to large magnitude events over weeks, months, or years expand the damage zone extensively causing disruption to livelihoods. Operational earthquake forecasts seek to provide reliable real-time information about the time dependence of earthquake hazard to the public and to decision makers (Jordan et al., 2011(Jordan et al., , 2014, but still little is known about earthquake nucleation processes and short-term fault interaction. For the last 30 years, statistical Epidemic Type Aftershock Sequence (ETAS) forecasts (Ogata, 1988(Ogata, , 1998 have shown considerable skills in capturing the clustering characteristics of triggered seismicity. However, these models offer limited insight into the physics of earthquake nucleation and fault interaction in terms of continuum mechanics. A commonly accepted physical mechanism to explain earthquake triggering is the static stress transfer hypothesis (Harris & Simpson, 1992). Nowadays, the implementation of laboratory-derived friction laws describing the seismicity response to an earthquake perturbation (Dieterich, 1994) is the basis of many physics-based forecasts known as Coulomb rate-state (CRS) models (Cattania et al., 2018;Cocco et al., 2010;Parsons et al., 2012Parsons et al., , 2014Segou, 2016;Toda & Enescu, 2011;Toda et al., 2005, among others). Recent advances in CRS modeling demonstrated that addressing the uncertainties behind physics-based models leads to a predictive power comparable to empirical models (Cattania et al., 2018;Segou & Parsons, 2016). However, the limitation for an operational use of CRS models is that their parametrization goes beyond a mere earthquake catalog and requires representations of earthquake sources and nearby receiver faults that are unlikely to be available immediately after a major event.
The Collaboratory for the Study of Earthquake Predictability (CSEP) provides a platform to evaluate the performance of forecast models in retrospective and prospective mode (Jordan, 2006;Michael & Werner, 2018). Here we assess how data quality and individual model choices driven by real-time conditions affect the performance of physics-based forecasts benchmarked against a statistical ETAS model. The 2016-2017 central Apennines (Italy) seismic sequence is an ideal testing ground with nine M5+ earthquakes between August 2016 and January 2017 that devastated many villages and historical heritage buildings.
We present seven classes of physics-based models with gradual complexity increase in a pseudoprospective experimental framework for the first year after the M w = 6.0 Amatrice event. The simplest forecasts include preliminary data available a few minutes after each M w ≥ 5.4 event, featuring synthetic source models with empirically derived fault length and previously determined fault constitutive parameters. More complex models incorporate (1) optimized rate-state parameters, (2) spatially heterogeneous receiver fault planes, (3) best-available slip models, and (4) secondary triggering effects. We evaluate the absolute and relative model performance using CSEP's metrics over different time horizons.
Our results show that the preliminary assumptions made to fill the early postdisaster knowledge gap severely hamper the absolute performance of CRS forecasts. When we incorporate revised data, optimize the rateand-state parametrization on previous regional seismicity, and account for the multilevel heterogeneities as (1) 3D spatial distribution of receivers, (2) spatially variable fault slip models, and (3) smaller magnitude aftershocks that reshape the local stress field, we obtain a dramatic performance improvement of physicsbased models. Results suggest that CRS performance is comparable to ETAS only when secondary triggering is taken into account.

The Central Apennines Earthquake Sequence
The Central Apennines are among the highest seismic hazard zones in Europe (Woessner et al., 2015). Present day deformation is taken up by NW-SE trending normal fault systems, expressing the postorogenic extension at a rate of~3 mm/year (Serpelloni et al., 2005). Historical and modern seismicity indicate moderate (M5+) to large (M6+) magnitude earthquake cascades associated with heavy damage patterns in the broader area (Rovida et al., 2016) such as the 1703 Norcia cascade where three M6.2+ earthquakes occurred within less than 20 days (Boschi et al., 2000), the six M5.2+ events that struck Colfiorito between September and October 1997 (Amato et al., 1998;Chiaraluce et al., 2003Chiaraluce et al., , 2004, and the 2009 M w = 6.3 L'Aquila sequence with an accelerating seismicity pattern observed few days before the mainshock (Chiarabba et al., 2009;Chiaraluce et al., 2011).
The 24 August 2016 Amatrice earthquake activated a 60-km long normal fault system ( Figure 1a) and was followed by an M w = 5.4 aftershock within less than 1 hr at the northern updip part of the mainshock rupture (Chiaraluce et al., 2017, and references therein). Two months later, on 26 October, two M w = 5.4 and M w = 5.9 earthquakes occurred further north near the village of Visso, closer to the southernmost aftershock zone of the 1997 Colfiorito sequence. The 30 October M w = 6.5 Norcia event remains the largest event of the sequence to date and the strongest in Italy since the 1980 M w = 6.9 Irpinia earthquake. Few months later, on 18 January 2017, four M5+ earthquakes occurred within a 4-hr window to the south of the Norcia mainshock rupture, coinciding with the northern aftershock zone of the 2009 L'Aquila sequence. The January 2017 seismicity raised immediate concerns about the Campotosto artificial dam lake, the second largest man-made lake in Europe.
In Table 1 we summarize the preliminary and revised source parameters of the major earthquakes of the Amatrice-Visso-Norcia (hereinafter, AVN) sequence. et al. (2017) for the Visso II and Norcia events. For the 18 January 2017 M w = 5.5 and M w = 5.4 Campotosto earthquakes, we use the only preliminary slip models available, as no refined versions have been issued. For the sources that lack a rupture model, we construct a synthetic slip distribution from their moment tensor solution, with empirically derived fault dimensions (Wells & Coppersmith, 1994) and uniform slip (from the moment relation of Hanks & Kanamori, 1979) tapered at the edges.  25 October 2016;green, 26 October to 29 October 2016;blue, 30 October 2016to 17 January 2017yellow, 18 January to 24 August 2017. The focal mechanisms of the three largest events are also displayed. Gray symbols indicate the 1997 Colfiorito and 2009 L'Aquila seismic sequences (M3+). We report the mapped active faults of the region (EMERGEO Working Group, 2016), (b) cumulative seismicity (M3+) with time, and (c) depth distribution of M3+ events in the first 24 hr following the M w = 6.0 Amatrice (red bars) and M w = 6.5 Norcia (blue bars) mainshocks. AVN = Amatrice-Visso-Norcia.

CRS Modeling
We estimate the coseismic Coulomb stress change (ΔCFF) by using the formulation of Rice (1992): where Δτ is the shear stress change on a given receiver fault plane (positive in direction of fault slip), Δσ n is the normal stress change, μ′ = μ(1 − B k ) is the effective coefficient of friction, and B k is Skempton's coefficient that accounts for pore fluid pressure effects. We set μ′ = 0.4 for all models, although we also tested the effect of implementing a lower μ′ (see Text S1.3). We assume an elastic half-space with a shear modulus of 30 GPa and Poisson's ratio ν = 0.25.
The receiver planes formulation follows two approaches: the optimally oriented planes (King et al., 1994) that assumes that earthquakes nucleate on hypothetical faults in favorable orientation with respect to the regional stress field and the geological receiver plane (GRP) approach. Both approaches received extensive criticism in recent years: the optimally oriented planes for relying on the knowledge of the largely unknown regional stress tensor to resolve stress changes on hypothetical planes that might not even exist (Segou & Parsons, 2016), whereas GRP may miss unmapped faults, when even in well-studied regions such as California~30% of seismicity occurs on previously unidentified structures . Here we adopt the Segou et al. (2013) approach where GRP is informed by presequence focal mechanisms and active fault maps to achieve a representation of the spatial structural heterogeneity.
We use the coseismic stress change estimates within a rate-and-state friction framework (Dieterich, 1994) to derive the time-dependent evolution of seismicity, expressed in expected number of events within 1-day time intervals for a 1-year time horizon. The seismicity rate R is related to the background rate r 0 : where _ τ represents the secular shear stressing rate and γ is a state variable. Under a constant secular shear stressing rate, γ reaches the steady state with a value given by In the absence of stress perturbations, the seismicity rate R equals the background rate r 0 . At the time of a stress step the state variable value is given by where ΔCFF is the coseismic stress change, and Aσ is a constitutive parameter accounting for the effective normal stress acting on the fault. A stress increase causes a drop in γ, resulting in a higher seismicity rate through equation (2). Due to the transient nature of the seismicity rate change, the evolving values of γ produce an Omori decay and R recovers to the background value in a time equal to t a (Dieterich, 1994): Given this inverse correlation between the stressing rate and the aftershock duration, it is evident that the seismicity rate on the most slowly stressed faults takes more time to decay toward background values (Stein & Liu, 2009).
To address the influence of the rate-and-state parameters Aσ and _ τ in the CRS forecasts, we test (1) predetermined values based on the study by Catalli et al. (2008) of the 1997 Colfiorito sequence and (2) parameters derived from a log-likelihood optimization on pre-AVN seismicity (see Text S1.1) using a common learning phase catalog for all CRS models.
The implementation of CRS models is based on a parallel computer code for calculating seismicity induced by time-dependent stress changes (Cattania & Khalid, 2016).

The ETAS Model
Statistical ETAS models are often used to describe short-term earthquake clustering (Ogata, 1988;Ogata, 1998). As a benchmark for the CRS forecasts, we use one standard version of ETAS (Seif et al., 2016). Although the focus of our study is on the improvements of CRS models, we acknowledge that other ETAS parametrizations may perform differently. Previous experiments show a good performance of ETAS both in retrospective (Cattania et al., 2018;Helmstetter et al., 2006;Marzocchi, Murru, et al., 2012;Werner et al., 2011;Woessner et al., 2011) and during unfolding sequences (e.g., Marzocchi et al., 2017;Marzocchi & Lombardi, 2009), but some weaknesses related with early catalog incompleteness (Omi et al., 2016;Segou & Parsons, 2016) and the absence of fault interaction effects (e.g., Marzocchi & Lombardi, 2009) have been reported.
The ETAS seismicity corresponds to a point process with a stochastic spatiotemporal branching evolution, where each earthquake triggers its own offspring events, whose number depends on the parent's magnitude and follow an Omori law decay. In the ETAS model, triggered earthquakes can have a larger magnitude than their parent event. The total space-time seismicity rate λ (or conditional intensity) is defined as where μ(x,y) is the background rate, a time-independent, and spatially heterogeneous Poisson process, while the summation term represents the triggering history (H t ) from all preceding earthquakes occurring at t i < t. The triggering function is expressed by empirical relations, according to the form of Ogata (1998): with normalized temporal and spatial distributions as second and third terms on the right-hand side, respectively. The parameter K 0 regulates the short-term aftershock productivity by a parent event with magnitude M equal or above a minimum triggering magnitude (M cut ), here set to 3.0; α establishes the efficiency of earthquakes in triggering aftershocks as a function of magnitude. The second term on the right-hand side of equation (7) is the modified Omori law (Utsu, 1961) describing the temporal distribution of triggered earthquakes. The term f(x,y|M) represents the probability distribution function of the spatial decay of triggered seismicity near the parent event given the parent's magnitude. Here we adopt a spatially isotropic power law distribution (e.g., Ogata & Zhuang, 2006;Werner et al., 2011): where q describes how triggered events decay in space, and the term d·e γ Mi−Mcut ð Þ scales the size of the M cut aftershock zone with the magnitude of the parent earthquake.

Development of Forecast Models
In this section, we describe forecast characteristics and model parameters. All models share a 1-year forecast horizon, with 24-hr time windows (dt) and target seismicity magnitude M3+.
The reference model CRS-1 is based on (1) real-time preliminary catalog data, including hypocenters and focal mechanisms available within few minutes to 1 hr, (2) stress perturbations from M w ≥ 5.4 events estimated using a uniform slip model with kinematic parameters from focal mechanisms with nodal planes selection constrained by the predominant rupture geometries reported in the DISS database (DISS Working Group, 2018), (3) a spatially homogeneous background rate obtained by stochastic declustering (Zhuang et al., 2002) of the CRS learning phase catalog and averaged over the testing region (r 0 = 0.034 M3+ events/day), and (4) spatially uniform receiver planes for ΔCFF estimation expressing the NW-SE striking, SW dipping main trend of the central Apennines normal fault systems (Basili et al., 2008). This latter assumption is valid when we do not know enough about the complex structures of the neighboring faults (McCloskey et al., 2003). This preliminary forecast realization features rate-and-state parameters Aσ = 0.04 MPa and _ τ = 10 −3 MPa/year (t a =40 years), previously used to characterize the active faults of the 1997 Colfiorito sequence (Catalli et al., 2008). The only difference between models CRS-1 and CRS-2 is that the latter implements a heterogeneous background rate, smoothed in space according to the adaptive kernel method proposed by Helmstetter et al. (2007).
From CRS-3 onward we use (1) revised hypocentral locations and moment tensor solutions available within 1-3 hr after a large earthquake (Table 1), and (2) fault constitutive parameters optimized on the learning phase catalog (see Text S1.1). During the optimization procedure, the grid search for Aσ ranges between [0.01-0.1] MPa enveloping all previous estimates (e.g., Catalli et al., 2008;Console et al., 2006;Toda et al., 1998) with aftershock durations (t a ) of [1-1000] years. The best fit Aσ-_ τ couples for each model are shown in Table 2. Otherwise, CRS-3 features the same implementation parameters as model CRS-2.
In model CRS-4, we introduce spatially variable receiver planes (SVP) to achieve a better representation of the structural heterogeneity of the fault system. We start by randomly selecting a nodal plane for each of the focal mechanisms included in the learning phase catalog ( Figure S1), and then we assign a rupture plane to each grid point through a 3D nearest neighbor spatial association. In regions where no previous focal mechanism exists, the assignment of a receiver fault is aided by the DISS database of active seismogenic structures (DISS Working Group, 2018). The resulting discrete fault grid is shown in Figure S2.
In CRS-5, we include the finite-length rupture models for events with M w ≥ 5.4 to implement a representation of spatial slip variability together with the structural heterogeneity expressed by the SVP receiver matrix in CRS-4.
The CRS-6 and CRS-7 models incorporate secondary triggering effects of smaller magnitudes with different thresholds (CRS-6, 41 M4+ events; CRS-7, 1167 M3+ events). In CRS-6, we represent 35 M4+ events using uniform slip distributions (similar to CRS-1 to CRS-4) with random selection of rupture planes from Time Domain Moment Tensor solutions of the Italian CMT database and peer-reviewed slip models for six events with M w ≥ 5.4. In CRS-7, we have rupture characterizations for only 4% of the M < 4.0 earthquakes. Therefore, for those events we assume for an isotropic distribution of coseismic stress changes (Helmstetter et al., 2005;Marsan, 2005), following the formulation of Chen et al. (2013): with M 0 the seismic moment and r the hypocentral distance.
We also produce two additional models, CRS-6s and CRS-7s, that isolate the sole contribution of secondary triggering (M4+ and M3+ stress sources, respectively) on model performance, where we omit the rate-andstate optimization by implementing the same fault constitutive parameters previously derived for CRS-5. We take these two auxiliary models into consideration during the model validation stage.
In the ETAS model, we set α = β = b · log (10) with Gutenberg-Richter b value = 1, requiring larger magnitude earthquakes to have a higher triggering potential than the small ones. For the estimation of the ETAS parameters, shown in Table S1, we use the M3+ ETAS learning phase seismicity within a polygon covering the entire Italian mainland (see Figure S3a and Text S2.1). To account for earthquake interactions outside the spatiotemporal boundaries of the inversion, we also use events in auxiliary spatial and temporal windows ( Figure S3). The inverted parameters are in close agreement with those of Seif et al. (2016) for the same areal extent, time window, and M cut . We fix the ETAS parameters for the whole forecast horizon, and we use them to simulate 1,000 synthetic catalogs within each forecast window (dt). Details on the simulation procedure are provided in the supporting information (see Text S2.2).

Results
In this section, we present the forecast results in the form of (a) temporal evolution of expected seismicity, (b) maps of predicted earthquake occurrence for period-specific (between mainshocks), and long-term (1 year) time windows. We then focus on the performance evaluation using the N, S, T-test metrics (Rhoades et al., 2011;Zechar et al., 2010) and comment on the absolute and relative predictive power of the models. Figure 2a compares the expected and observed daily rates for M3+ events. The preliminary CRS-1 and CRS-2 models systematically underestimate the daily seismicity rates by an order of magnitude over the entire period, whereas the first updated model CRS-3 presents a tenfold increase of expected rates arising from the optimized values of the fault constitutive variables.

Forecast Time Series
Refined source parameters together with the optimized rate-and-state variables (CRS-3), the systematic introduction of SVP receiver faults (CRS-4), and heterogeneity in slip models (CRS-5) bring stress-based models closer to the observed rates, especially during the first day following each mainshock.
CRS-5 and CRS-6 share very similar expected rates in the 24 hr following each large magnitude event, standing out among all physics-based models for fitting more closely the short-term (1 day) seismicity  Journal of Geophysical Research: Solid Earth after the M w = 6.0 Amatrice, M w = 5.4 Visso, and Campotosto earthquakes. These two model implementations share the same behavior as ETAS after the Amatrice and Norcia mainshocks. CRS-7, with M3+ secondary triggering effects, overestimates the short-term (≤ 1 day) seismicity rates after the larger magnitude events (e.g., 376 M3+ expected events against 237 observations in the 24 hr following the M w = 6.5 mainshock); however, this result is biased by early incompleteness of the real-time catalog, as the Mc is larger than 3.0 for at least 6 hr after the Norcia event ( Figure S5). For t > 10 days, CRS-6/7 and ETAS closely match the observed rates whereas CRS-3/4/5 generally underestimates.
In Figure 2b we show the cumulative number of expected events versus the cumulative real-time observations. We see that (1) ETAS captures the seismicity decay of the AVN sequence, while CRS models show a faster decay within the first few months, (2) CRS-7 is the only model that adequately forecasts the total number of observations within Poissonian error, and (3) ETAS and the stress-based models from CRS-3 onward satisfactorily estimate the seismicity after the Campotosto earthquakes. Figure 3 presents the forecast maps illustrating the difference in spatial distribution of expected rates between the preliminary (CRS-1) and the more elaborate physics-based models (CRS-7) against the statistical ETAS

10.1029/2019JB017874
Journal of Geophysical Research: Solid Earth forecast. A full set of forecast maps, including first-day forecasts, is provided in the supporting information ( Figures S8-S15).

Amatrice to Visso
The preliminary CRS-1 forecast presents low rates in the near-source region at the northern section of the Mount Gorzano fault, as well as in the epicentral area of the M w = 5.4 aftershock occurred 1 hr after the M w = 6.0 mainshock (Figure 3a). The most advanced CRS-7 model includes SVP receivers, a spatially variable slip model for the Amatrice I/II events, and secondary triggering effects from M3+ earthquakes; we observe good visual correlation between the observed and the expected seismicity rates over the entire region with notable increased rates near the two epicenters and close to Amatrice and Mount Vettore (Figure 3b). The ETAS model (Figure 3c) presents a smoother, more isotropic distribution of the higher expected rates over the entire epicentral area in comparison with CRS-7, including locations where no aftershock occurred in the first 2 months.
Comparing between CRS-1 and CRS-3 ( Figures S8a-S8c and S9a-S9c), we find that optimized rate-and-state parameters introduce 1-2 orders of magnitude rate increase around the M w = 6.0 event. Comparing CRS-4 to CRS-5 (Figures S8d and S8e and S9d and S9e) shows that heterogeneous slip representations for the Amatrice I/II mainshocks lead to high expected rates in epicentral regions and toward Mount Vettore. In CRS-6 (Figures S8e and S8f and S9e and S9f), we observe that secondary triggering effects from the 17 M4+ aftershocks do not exert an obvious difference in the spatial forecast from model CRS-5.

Visso to Norcia
Similar to the previous time window, CRS-1 predicts particularly low rates immediately after and in the region of the M w = 5.4 (Visso I) and M w = 5.9 (Visso II) October events and fails to capture the aftershock activity north of Ussita (Figure 3d). In CRS-7 we observe a noteworthy increase of expected rates in nearsource up to 5 orders of magnitudes with respect to CRS-1, and aftershock production is favored on the northern tip of the Mount Bove-Mount Vettore (MtBV) fault system thanks to the spatially variable rupture model for the Visso II event and the inclusion of M3+ secondary triggering effects (Figure 3e). The ETAS model presents a good fit with the observed seismicity; the area of higher expected rates extends eastward to Mount Bove, while the area of Norcia presents at least 10 times lower predicted rates compared to the Visso I/II epicentral region (Figure 3f).

Norcia to Campotosto
The preliminary CRS-1, in which stress perturbations are based on simplified uniform rupture distributions, fails to forecast the dramatic seismicity increase following the M w = 6.5 Norcia mainshock, presenting low rates in the near-source region and seismicity suppression on the Mount Bove fault (Figure 3g). The incorporation of the complex Norcia mainshock rupture process, which involved a~30 km long fault plane, produces (1) near-source coseismic stress heterogeneities and (2) a halving of the stress shadows SW of Mount Vettore and between Ussita and Mount Bove (see Figure S6); this results in a systematic increase of the expected seismicity in the respective areas from CRS-5 onward. The strength of the complex physicsbased model CRS-7 is that it better captures the triggered seismicity south of Amatrice due to the incorporation of M3+ secondary triggering effects (Figure 3h), while M4+ effects in CRS-6 ( Figure S12f) present an expected seismicity pattern similar to CRS-5 ( Figure S12e). From visual comparison, we find the main differences between ETAS and CRS-7 in the high clustering region around Mount Bove, where the former expects more than 1 order of magnitude higher rates than its stress-based counterpart, and in the near epicentral area, where the aftershock rates predicted by the ETAS model are approximately 2-3 times lower than CRS-7 (Figure 3i). This latter observation is possibly due to the early catalog incompleteness following the Norcia mainshock ( Figure S5).

One-Year Forecast
The preliminary CRS-1 (Figure 3j) widely underestimates the observed aftershock production; we see an acceptable spatial distribution of the expected seismicity, but the estimated rates are systematically <1 event/cell over the entire testing region. Model CRS-7 (Figure 3k) addresses this underestimation and is characterized by higher rates, particularly at intermediate off-fault distances (5-10 km from the mainshock faults) where a significant amount of smaller magnitude aftershocks occurred (~45%), contrary to the M4+ events that were mostly located at shorter ranges from the main ruptures ( Figure S7). The 1-year ETAS forecast (Figure 3l) captures the spatial distribution of the aftershock zone accurately, but due to its branching nature and the lack of a seismicity suppression mechanism, it projects a larger aftershock zone.

Model Performance in the Testing Region
We evaluate the absolute and relative predictive skills of the forecasts for a 1-year time period. We seek to test (1) the forecasted versus observed number of M3+ events using the modified N-test scores (Zechar et al., 2010) and rejection ratios (R N ), (2) the spatial consistency of the models through their S-test joint loglikelihood scores (jLLs) at short and long term (Schorlemmer et al., 2007;Zechar et al., 2010), and (3) the information gain per earthquake (G, T-test; Rhoades et al., 2011) with respect to the preliminary CRS-1 model. A thorough description of the statistical metrics used in this paper is available in the supporting information (see Text S3). In Table 3 we summarize model performance. Figure 4 presents the 1-day incremental N-test scores for the most preliminary (CRS-1) and updated (CRS-7) physics-based models and for the ETAS forecast, where quantiles δ 1 or δ 2 lower than the 0.025 significance level indicate model rejection due to rate underestimation or overestimation, respectively. The incremental N-test results for the full set of models are provided as supporting information ( Figure S16). We observe that (1) CRS-1 is rejected due to underestimation of the observed number of events (δ 1 = 0) at short term (1 day) and at t > 1 week after each mainshock ( Figure 4a); (2) CRS-7 and ETAS present~50% overestimation of the number of events immediately after the Amatrice, Visso, Norcia, and Campotosto events (Figures 4b and 4c); (3) the ETAS model features the lowest R N (6%) and while it shows a good performance in the time period following the Norcia mainshock, it suffers rejection for 3 days following the Campotosto events. On the other hand, CRS-7 presents a mixed performance after the Amatrice, Visso, and Norcia events, with overestimation within the first 24 hr and underestimation for t ≤ 1 week, but it passes the test for t > 24 hr after the Campotosto seismicity. CRS-2 performance is similar to CRS-1, as these two preliminary models feature the highest N-test rejection ratios (R N ≈ 30%) during the 1-year testing period (Figures S16a and S16b). Models CRS-3/4 show better overall R N values (15% and 14%, respectively) but underpredict (δ 1 < 0.025) during the two weeks following the two largest events of the sequence (Figures S16c and S16d). We find that this is most probably due to the implementation of simple uniform slip models that project negative coseismic stress changes on the Mount Vettore-Mount Bove fault system after the Amatrice earthquake (see Figures S8c and S8d) and suppress the near-source expected rates after the Norcia mainshock with a heavy stress shadow (Figures S11s and S12d).
Although overforecasting in the first 24 hr following the Amatrice and Norcia earthquakes, CRS-5 and CRS-6 are the only two models that pass the test after the M w = 5.4 Visso event and further reduce the overall R N to 12% and 8.5%, respectively (Figures S16e and S16f).
We use the S-test joint log-likelihood scores (jLL S ) to evaluate the predictive power of the forecasts in space. We present the earthquake forecasts expressed by the joint log-likelihood over the entire testing region,  Note. jLL S = S-test joint log-likelihood; N F/O = ratio between forecasted (F) and observed (O) number of events; G CRS-1 = information gain from the preliminary CRS-1 model. Log-likelihood values are negative by definition, and smaller absolute values indicate a better model performance. We note how CRS-3 performance is severely penalized in the 24-hr period after the Norcia mainshock by its poor spatial consistency, leading to a deterioration of the information gain on CRS-1 to a negative value.

10.1029/2019JB017874
Journal of Geophysical Research: Solid Earth where the expected rate at each spatial cell is multiplied by N obs /N fore , the ratio of forecasted to observed events in the entire region, so that the normalized forecast matches the observed number of events. Higher jLL S values indicate a better model performance (Zechar et al., 2010). Results from the short-term performance test (Table 3) show that (1) the ETAS model outperforms all the stress-based counterparts in the first 24 hr after the Amatrice, Norcia, and Campotosto events; (2) the elaborate physics-based model CRS-7 presents the best jLLs score following the Visso earthquakes and it constantly achieves a better   short-term performance than all the other CRS models; (3) CRS-5 is the second best CRS model in short-term windows, highlighting the importance of including spatially variable slip sources for physics-based modeling; the only exception is represented by the first 24 hr following the Campotosto seismicity where two out of the four M5+ earthquakes (namely the M w = 5.1 and M w = 5.0) lack a spatially variable rupture model, while for the M w = 5.4 and M w = 5.5 events only a preliminary slip model release is available.
In Figure 5a we compare the evolving spatial performance of the forecasts for the 1-year testing period after the M w = 6.0 Amatrice occurrence, including the auxiliary models CRS-6s and CRS-7s that isolate the effect of secondary triggering from M4+ and M3+ sources, respectively. Results show that (1) CRS-7/7s, including secondary triggering effects from M3+ aftershocks, are the best physics-based models, particularly when rate-and-state parameters are optimized on the learning phase seismicity (CRS-7); (2) the ETAS model has the highest spatial joint log-likelihood values over the entire testing period (jLL S = −4326), with a striking similar performance with CRS-7 in the 4 days between the Visso-Norcia earthquakes. While models CRS-1/2/4 perform similarly through the evaluation period, CRS-3 is characterized by the lowest scores because of the oversimplification of coseismic slip for the M w = 6.5 Norcia mainshock. When we compare CRS-6 to CRS-6s, we note that the parameter optimization degrades CRS-6 performance after the Norcia mainshock. The long-term likelihood scores of CRS-6s are only slightly better than CRS-5, but a lower magnitude threshold for stress sources (CRS-7/7s) remarkably increases the short-and long-term spatial consistency of the models.
We estimate the information gains per earthquake using the preliminary CRS-1 as benchmark model (G CRS-1 ). A positive G score indicates that a model is more informative than the selected benchmark. In Figures 5b-5e, we present the results in the form of average daily information gains per earthquake and their 95% confidence bounds from a paired Student's t-test (Rhoades et al., 2011) for CRS and ETAS forecasts. From the analysis of the G ranks, we note that CRS-7 is systematically the best performing among all CRS classes at all time periods (Figures 5b-5d and Table 3). The majority of CRS forecasts show positive information gains (G CRS-1 ≥ 2), indicating a significant improvement on the preliminary CRS-1 model. For the critical~4-day time window before the Norcia earthquake, CRS-7 and ETAS models present a comparable information gain (ΔG CRS-1 < 0.3) with respect to the reference CRS-1, and CRS classes 5/6/7 show a considerably good performance (G CRS-1 ≈6-7, Figure 5c). Especially in the first day after the Visso events, we find that CRS-7 outperforms the ETAS forecast (Table 3); the observed poorer performance of the Journal of Geophysical Research: Solid Earth statistical model is most likely caused by the lack of M3+ precursory seismicity in the near-epicentral region of the first M w = 5.4 Visso event (Chiaraluce et al., 2017). The 1-year performance evaluation reveals that the most advanced physics-based implementation, CRS-7, compares closely with the ETAS model (ΔG CRS-1 ≈0.5, Figure 5e). The trend of improving information gain from CRS-5 to CRS-6s/7s shows the importance behind considering stress changes from lower magnitude aftershocks. These results suggest that including SVP receivers, finite-length rupture models, and secondary triggering from M3+ sources in CRS models leads to a gradual information gain increase.
We assess the spatial component behind the models' performance by information gain maps for a 1-year time horizon (Figure 6), while period-specific maps are provided as supporting information (Figures S17-S19). We calculate the cumulative log-likelihood differences (ΔLL) at each grid point using CRS-1 (Figures 6a-6f) and ETAS (Figures 6g-6i) as reference models. In Figure 6, positive ΔLL values (green) highlight improved performance with respect to the reference model. Results show that (1) variable slip source representations lead to a sensible information gain increase in the Norcia-Visso near epicentral region (ΔLL up to 100) and on the Mount Bove-Vettore fault system (ΔLL > 60; Figure 6d); (2) CRS-7 better captures the unfolding sequence, as considering the contribution of M3+ sources into the evolving stress field entails a net performance improvement especially at the extreme edges of the fault system (ΔLL > 250 north of Visso and ΔLL > 50 in the Montereale-Pizzoli region; Figure 6f); (3) updated source parameter description of mainshock faults through optimized rate-and-state parameters improves the forecast in many locations but reduces the likelihood in a broader aftershock region (Figure 6b), particularly at Mount Bove location (ΔLL < −100); (4) model performance at the northwestern MtBV fault termination greatly benefits from the combined inclusion of SVP receiver planes (Figure 6c) and, to a greater extent, from finite-length slip models describing the Visso and Norcia coseismic slip (Figure 6d).
When we compare physics-based forecasts to ETAS, we observe that, although CRS models generally present a slightly positive information gain in the off-fault areas of the testing region (ΔLL ≈0.3), the log-likelihood differences between pairs of models are more marked where aftershocks were actually recorded. We find that ETAS widely outperforms its physics-based counterparts in the near-source regions (Figures 6g-6i), where slip models present large variability (Chiaraluce et al., 2017;Scognamiglio et al., 2016). However, we find that secondary triggering effects significantly increase the predictive power of CRS models that outperforms ETAS in the high clustering zone north of Visso-Ussita and in the Montereale-Pizzoli region (Figure 6i).
The model parameters that exert most influence on the predictive power of physics-based models are the spatially variable slip distributions and the inclusion of the secondary triggering effects from small magnitude earthquake (M3+). The most evolved CRS model captures the triggered seismicity at the fault edges, therefore improving the forecast in the case of the Campotosto cluster further south than Amatrice.
In Figure 7 we isolate the ΔLL in the critical period between the Visso and Norcia events, and we compare CRS-7 performance to CRS-1 and ETAS. We notice that CRS-7 is a significantly more informative model than CRS-1 in the time window that precedes the M w = 6.5 mainshock (Figure 7a). Results also confirm the comparable performance between ETAS and CRS-7; the former better captures the near epicentral seismicity in the area of the Visso events, while the latter is better performing in locations of high aftershock production at fault extremities.

Conclusions
The complex 2016-2017 Amatrice-Visso-Norcia (AVN) earthquake sequence in the Central Apennines represents a unique opportunity to test earthquake triggering hypotheses that are the basis for physics-based forecast models. Here we focus on how input data quality and model parameters influence the predictive power of physics-based forecast models. We design a pseudoprospective experiment for the first year of the AVN sequence. We implement a benchmark statistical ETAS model and seven classes of physics-based forecasts with gradually increasing level of input data quality and complexity. We then evaluate the absolute and relative model performance by means of the N (number), S (space) and T-test metrics that are currently implemented within the CSEP initiative.
We find that the representation of the crustal and stress-field heterogeneity increases the information gains of physics-based models. The CRS model that predicts closely and in cases outperforms the statistical ETAS

10.1029/2019JB017874
Journal of Geophysical Research: Solid Earth model combines best-available source models, spatially varying receivers informed by preexisting focal mechanisms and geological maps, and most importantly includes secondary triggering effects from M3+ events that enable us to describe the evolving coseismic stress field in greater detail. After the M w = 5.4 Visso event, the elaborate CRS-7 model outperforms ETAS, while in the critical 4-day time window before the M w = 6.5 Norcia mainshock and within the first year of the sequence, CRS-7 is as informative as ETAS.
The results support previous findings that in the near-source area, defined by the surface fault projection, ETAS models present higher predictive power (e.g., Segou et al., 2013), although the lack of precursory seismicity (e.g., before the Visso events) locally hampers their short-term performance. On the other hand, triggered seismicity at intermediate off-fault distances due to static stresses within an evolving sequence is better described by physics-based approaches. The preliminary unrevised source parameters and empirical source models released within a few minutes to hours following large events result in poor performance of stressbased models that by definition require a more demanding physical parametrization than their empirical counterparts. The comparative evaluation of the spatial consistency of CRS models suggests that the oversimplified uniform slip models implemented immediately after a large earthquake cannot adequately reproduce the early aftershock spatial distribution (< 1 day).
The Italian experiment reveals that in order for stress-based forecasts to reach and outperform ETAS models we need to use best-available data within heterogeneous fault and source representations. Although highquality relocated catalogs are not yet readily available during the early stages of a seismic crisis, both CRS and ETAS models in our study will improve their predictive skills with enhanced detection and event characterization techniques.