A New Statistical Test Based on the WR for Detecting Offsets in GPS Experiment

Detecting probable offsets is a crucial step in the preprocessing of Global Positioning System (GPS) coordinate time series. Undetected offsets lead to biased estimates of the time series parameters and their uncertainties, resulting in data misinterpretation. In the current research, a DIA (detection, identification, and adaptation)‐based procedure in the maximal overlap discrete wavelet transform (MODWT) rough space has been introduced to address the location of offsets in long GPS time series without a priori information about the functional or stochastic models. A remarkable property of the wavelet rough (WR) at lower‐scale (j ≤ 5) details is to reflect the local regularity of the time series, being large where the signal is irregular and small where it is smooth. The performance and effectiveness of the proposed approach have been shown with the DOGEx (Detection of Offsets in GPS Experiment) data set, a simulated data set that mimics realistic GPS data, consisting of a velocity component, a seasonal cycle, offsets, and white and flicker noise combined in an additive model. The results showed that the 5th to 95th percentile range of velocity biases was equal to 1.24 mm/yr, smaller than that of all tested automatic solutions. Furthermore, the offsets detected by this method were split into 34.3% true positive (TP), 36.5% false positive (FP), and 29.2% false negative (FN) events, establishing the proposed method as the best automatic solution.


Introduction
Repeated Global Positioning System (GPS) observations have been used over the past two decades to estimate crustal deformation at discrete points on the Earth's surface. As the quantity and quality of the measurements improved, it became evident that GPS coordinate time series provide reliable and accurate information even for small geophysical signals (Kawamoto et al., 2017; Montillet et al., 2015). Furthermore, a proper analysis of long-period GPS time series (several years up to 10 yr or more) is essential for subsequent geodetic and geophysical interpretations (Williams, 2003a). The functional model of a long-term GPS time series consists of a linear trend, a seasonal cycle (e.g., annual and semiannual periodic terms), probable offsets (change points), data gaps, and a combination of white and flicker noise (Williams et al., 2004).
The offsets can be caused by rapid releases of seismic stress or by artificial events such as environmental factors, equipment changes, or human interventions (Williams, 2003b). The least squares estimation of dynamics and deformation patterns requires offset-free observations to provide unbiased trends and associated uncertainties; remaining unmodeled effects may lead to biased model estimates or unreliable coordinate forecasts. By analyzing 10 published data sets, Williams (2003b) estimated that, on average, time series are contaminated by one offset every 9 yr, although this can be as frequent as one every 2 yr, and found that unidentified offsets in a time series can mimic random walk behavior. Likewise, Gazeaux et al. (2013) pointed out that even the effect of repeated small offsets can significantly impair the estimates of station velocities, especially for long time series.
Various approaches have already been proposed for detecting offsets in GPS time series. Teunissen (1998) introduced the detection, identification, and adaptation (DIA) approach, which uses both functional and stochastic models to detect unknown systematic effects, identify the potential source of the model's error, and adapt the solution when the null hypothesis must be rejected, so as to eliminate biases in the solution. This combination of estimation and testing can be rigorously captured in the DIA estimator introduced in Teunissen (2018).

©2020. The Authors. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Several studies have used the DIA scheme within different algorithms to detect the most probable discontinuities in GPS coordinate time series (Ostini et al., 2008; Perfetti, 2006; Roggero, 2012). Borghi et al. (2012) and Vitti (2012) investigated approaches based on data segmentation for detecting discontinuities. Khodabandeh et al. (2012) used a robust M-estimator to treat observational blunders or offsets for the reliable estimation of unknown parameters when the number of observations is sufficiently large. Gazeaux et al. (2013) compared the results of offset identification approaches proposed by different researchers: all automated solutions were compared against the true offset epochs (within a 5-day window) and the true velocities using the DOGEx data set, and the results showed that manual methods (where offsets are handpicked) almost always provided better results than automated or semiautomated methods. Furthermore, Bruni et al. (2014) suggested a method in which offset epochs were determined with the sequential t test analysis of regime shifts (STARS) and then manually removed, as recommended in Gazeaux et al. (2013). However, their results were not directly comparable with those obtained in the framework of DOGEx for two reasons: (i) the assumption of time-constant noise at each site and (ii) the identification of a true positive (TP) within a 25-day window. Montillet et al. (2015) developed a way to estimate coseismic offsets in the far field of an earthquake based on the statistical analysis of a coseismic offset series. Gazeaux et al. (2015) introduced a joint segmentation procedure for multiple GPS coordinate series from nearby stations; the approach simultaneously estimates station-specific trends, seasonal signals, and meaningful offsets between series from adjacent stations. Furthermore, Blewitt et al. (2017) used the Median Interannual Difference Adjusted for Skewness (MIDAS), a robust trend estimator that is resistant to undetected offsets. Later on, Amiri-Simkooei et al. (2019) presented a method for detecting offsets using multivariate analysis, considering the three coordinate components (north, east, and up) simultaneously. The idea was based on comparing the original model under the null hypothesis (a model without offsets) with the extended model under the alternative hypothesis using the generalized likelihood ratio (GLR) test.
The theory of wavelets permits the decomposition of functions into localized oscillating components, making it an ideal tool to study localized changes, such as offsets, in one dimension as well as in three dimensions (Wang, 1995). Accordingly, many researchers, such as Mallat and Hwang (1995), Yuan and Zhongjie (2000), Antoniadis and Gijbels (2002), and Raimondo and Tajvidi (2004), have focused on change point detection by wavelet methods. The current study aims at detecting possible offsets using the multiresolution analysis (MRA) based on the maximal overlap discrete wavelet transform (MODWT), which is locally adapted to track local time-frequency variations. In the MRA decomposition, the last element is known as the wavelet smooth, while the remaining elements are known as the wavelet rough (WR). A remarkable property of the WR at fine scales is to reflect the local regularity of the original signal, being large where it is irregular and small where it is smooth (Percival & Walden, 2000). This property can be very useful for detecting offsets in GPS coordinate time series.
The main concepts of the MODWT are explained in the next section. The subsequent section deals with the criterion for choosing the decomposition level, namely, the wavelet variance, for an optimal scale choice during the transformation process. The third section reviews the model parameters of the GPS coordinate time series and then examines offset detection using the DIA scheme, both under the least squares assumption and in the frame of the WR. The capabilities of the WR method in detecting offsets are then illustrated with the simulated DOGEx data set. Finally, some concluding remarks are drawn.

MODWT
The MODWT is presented to overcome the deficiencies of the discrete wavelet transform (DWT). The DWT has a number of drawbacks: (i) it requires the length of the time series to be a power of 2 for the full transform, (ii) it lacks the shift invariance associated with the wavelet and scaling coefficients, (iii) it limits the ability to carry out statistical analyses on the wavelet and scaling coefficients because their number decreases by a factor of 2 with each increase in the level of the transform, (iv) a time series and a circular shift of that series can have different empirical power spectra, and (v) an analysis of variance (ANOVA) based on the wavelet and scaling coefficients may be influenced by circular shifting of the original time series (Mondal & Percival, 2012). Namely, for any time series $y$ of length $N$, the $j$th-level MODWT wavelet and scaling coefficients are the $N$-dimensional vectors $\widetilde{W}^y_j$ and $\widetilde{V}^y_j$ given by

$$\widetilde{W}^y_j = \widetilde{\mathcal{W}}_j\, y, \qquad \widetilde{V}^y_j = \widetilde{\mathcal{V}}_j\, y,$$

10.1029/2019EA000810

Earth and Space Science
where the rows of $\widetilde{\mathcal{W}}_j$ and $\widetilde{\mathcal{V}}_j$ contain circularly shifted versions of the $j$th-level MODWT wavelet filter $\{\tilde{h}_{j,l}\}$ and scaling filter $\{\tilde{g}_{j,l}\}$ periodized to length $N$, respectively. The detail and smooth sets of the MODWT at level $j$ are defined by

$$\widetilde{D}^y_j = \widetilde{\mathcal{W}}_j^T \widetilde{W}^y_j, \qquad \widetilde{S}^y_{J_0} = \widetilde{\mathcal{V}}_{J_0}^T \widetilde{V}^y_{J_0},$$

and, for any $J_0 \ge 1$, the MODWT-based MRA is given by

$$y = \sum_{j=1}^{J_0} \widetilde{D}^y_j + \widetilde{S}^y_{J_0}.$$

Namely, the MRA can be viewed as the sum of a smooth vector $\widetilde{S}^y_{J_0}$ and $J_0$ other vectors $\widetilde{D}^y_j$, each of which contains a time series related to variations in $y$ at a certain scale. A level of an MRA is coarser (finer) with respect to another whenever the index of the corresponding subspace is smaller (bigger) (Mallat & Hwang, 1995). Furthermore, the $j$th-level rough for the MODWT may be defined as

$$\widetilde{R}^y_j = y - \widetilde{S}^y_j = \sum_{k=1}^{j} \widetilde{D}^y_k.$$

The rough $\widetilde{R}^y_j$ is what remains after removing the smooth from the vector of observations; for smaller $j$, the rough is less smooth. A vector of observations may thus be decomposed through the wavelet smooth and rough sets via $y = \widetilde{S}^y_j + \widetilde{R}^y_j$ for all $j$.
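The pyramid algorithm behind these definitions can be illustrated with a minimal sketch. The paper does not specify the wavelet family, so the Haar filter used here is an assumption made only for illustration:

```python
import numpy as np

def modwt_haar(y, J):
    """Haar MODWT via the pyramid algorithm with circular filtering.
    Returns lists of wavelet (W) and scaling (V) coefficient vectors
    for levels 1..J."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    # MODWT Haar filters: the DWT filters rescaled by 1/sqrt(2)
    h = np.array([0.5, -0.5])   # wavelet filter
    g = np.array([0.5, 0.5])    # scaling filter
    W, V = [], []
    v = y.copy()
    for j in range(1, J + 1):
        shift = 2 ** (j - 1)    # filter upsampling factor at level j
        Wj = np.empty(N)
        Vj = np.empty(N)
        for t in range(N):
            # circular convolution with the upsampled two-tap filters
            Wj[t] = h[0] * v[t] + h[1] * v[(t - shift) % N]
            Vj[t] = g[0] * v[t] + g[1] * v[(t - shift) % N]
        W.append(Wj)
        V.append(Vj)
        v = Vj                  # feed scaling coefficients to the next level
    return W, V
```

As a sanity check on the decomposition, the MODWT is energy preserving: the squared norms of all wavelet coefficient vectors plus that of the final scaling vector equal the squared norm of the input series.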

Wavelet Variance
The wavelet variance decomposes the variance of a time series into components associated with different scales. Namely, it breaks the process variance into pieces, each of which represents the contribution to the overall variance due to variability at a particular scale (Mondal & Percival, 2012). In particular, an unbiased MODWT estimator of the wavelet variance may be obtained on the condition that only wavelet coefficients free of boundary effects are used (Percival & Walden, 2000):

$$\hat{\nu}^2_y(\tau_j) = \frac{1}{M_j} \sum_{t=L_j-1}^{N-1} \widetilde{W}^2_{j,t},$$

where $M_j = N - L_j + 1$ and $L_j$ is the length of the $j$th-level MODWT wavelet filter $\{\tilde{h}_{j,l}\}$ or scaling filter $\{\tilde{g}_{j,l}\}$. The wavelet variance (also called the wavelet power spectrum) decomposes the variance of a time series on a scale-by-scale basis and provides a time- and scale-based analysis of variance. Small variance values indicate noise contamination, whereas large values indicate signal features (Mondal & Percival, 2012). This leads to the optimal scale choice during the transformation process, which can be taken as the first local minimum of the plot of the wavelet variance versus scale. The selected scale contributes less to the total variance due to noise contamination; it also removes the noisy high-frequency variations, revealing the true underlying signal.
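The unbiased estimator above can be sketched for the Haar case, where the equivalent filter width at level j is L_j = 2^j; the choice of the Haar filter is again an assumption for illustration:

```python
import numpy as np

def haar_modwt_coeffs(y, j):
    """Level-j Haar MODWT wavelet coefficients via direct circular filtering:
    (average of the most recent 2**(j-1) samples minus the average of the
    preceding 2**(j-1) samples) scaled by the MODWT normalization."""
    y = np.asarray(y, dtype=float)
    N, half = len(y), 2 ** (j - 1)
    W = np.empty(N)
    for t in range(N):
        idx = [(t - l) % N for l in range(2 ** j)]
        W[t] = (y[idx[:half]].sum() - y[idx[half:]].sum()) / 2 ** j
    return W

def wavelet_variance(y, J):
    """Unbiased MODWT wavelet-variance estimates for levels 1..J, using only
    coefficients unaffected by the circular boundary conditions."""
    N = len(y)
    nu2 = []
    for j in range(1, J + 1):
        Lj = 2 ** j                 # Haar equivalent filter width at level j
        Mj = N - Lj + 1             # number of boundary-free coefficients
        W = haar_modwt_coeffs(y, j)
        nu2.append(np.sum(W[Lj - 1:] ** 2) / Mj)
    return np.array(nu2)
```

In the scale-selection rule described above, one would compute `wavelet_variance` for a range of levels and pick the first level at which the estimate reaches a local minimum.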

Model Parameters of the GPS Coordinate Time Series
The recorded daily solution of a GPS coordinate component may be written as

$$x(t_i) = \varrho + \vartheta\, t_i + \zeta \sin(2\pi t_i) + \xi \cos(2\pi t_i) + \sum_{g=1}^{n_g} \varsigma_g\, u(t_i - T_g) + e(t_i), \qquad (6)$$

where the probable offsets, indexed by the number of offsets $n_g$, with magnitudes $\varsigma$ and epochs $T_g$, are modeled by the unit step function $u$. The last term in Equation 6 describes the error term. Assuming the offset epochs to be known, the vector of the unknown parameters is $\beta = [\varrho\ \vartheta\ \zeta\ \xi\ \varsigma]^T$ and the linear model is given by

$$x = A\beta + e,$$

where $A$ is the design matrix of the observation equations. Furthermore, the first- and second-order moments of the observations $x$ are, respectively,

$$E\{x\} = A\beta, \qquad D\{x\} = Q_x,$$

where $Q_x$ is the covariance matrix of the observation vector $x$ or the residual vector $e$. As shown by several authors (Amiri-Simkooei et al., 2007, 2019; Williams et al., 2004; Zhang et al., 1997), the combination of flicker noise (power law noise with spectral index $\kappa = -1$) and white noise (spectral index $\kappa = 0$) is the best stochastic model for GPS coordinate time series. Furthermore, several researchers have also acknowledged the presence of random walk noise (spectral index $\kappa = -2$) (King & Williams, 2009; Langbein, 2008).
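The functional model above maps directly onto a design matrix A. The sketch below uses annual terms only (semiannual columns would be built analogously) and assumes a particular column ordering, which is our choice rather than the paper's:

```python
import numpy as np

def design_matrix(t, offset_epochs):
    """Design matrix A with columns for intercept, linear trend, annual
    sine/cosine, and one unit-step (Heaviside) column per offset epoch.
    Times t are assumed to be expressed in years."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t),            # intercept
            t,                          # linear trend
            np.sin(2 * np.pi * t),      # annual sine
            np.cos(2 * np.pi * t)]      # annual cosine
    for Tg in offset_epochs:
        cols.append((t >= Tg).astype(float))   # unit step u(t - Tg)
    return np.column_stack(cols)
```

With A in hand, the unknowns would be estimated by (weighted) least squares given a covariance matrix Q_x.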

DIA Procedure for the Offset Detection
The method employs both functional and stochastic models in a hypothesis testing procedure that originated from the works of Baarda (1968) and Teunissen (1998). In this method, the vector of observables $x$ was assumed to be normally distributed, $x \sim N(A\beta, Q_x)$, with known covariance matrix $Q_x$. The null hypothesis stated that the functional model, namely, $H_0{:}\ E\{x\} = A\beta$, was unbiased and subject to the covariance matrix of observations $D\{x\} = Q_x$. Furthermore, it was assumed that the functional model was correct; that is, it did not contain any unmodeled systematic effects or gross errors. Meanwhile, time-correlated noise should be considered in the structure of the covariance matrix: neglecting colored noise can result in high rates of false negatives (FNs) and false positives (FPs), which degrade the performance of the DIA procedure (Gazeaux et al., 2013). The alternative hypothesis $H_A$ stated that the functional model contains a potential bias $C\nabla$, with $C$ the known bias signature and $\nabla$ the unknown magnitude, namely, $H_A{:}\ E\{x\} = A\beta + C\nabla$ subject to $D\{x\} = Q_x$. The purpose of testing $H_0$ against $H_A$ was to infer whether a set of observations confirmed $H_0$ or the data rejected $H_0$ in favor of $H_A$. The test statistic was given by (for further details, we refer the reader to Teunissen, 1998)

$$\underline{T}_d = \frac{1}{d}\, \hat{e}^T Q_x^{-1} C \left( C^T Q_x^{-1} Q_{\hat{e}}\, Q_x^{-1} C \right)^{-1} C^T Q_x^{-1} \hat{e}, \qquad (9)$$

where $\hat{e} = \left( I - A (A^T Q_x^{-1} A)^{-1} A^T Q_x^{-1} \right) x$ was the vector of least squares residuals under the null hypothesis and $d$ is the dimension of $\nabla$. The null hypothesis was rejected in favor of the alternative hypothesis if $\underline{T}_d \ge F_\alpha(d, \infty, 0)$, where $F_\alpha(d, \infty, 0)$ corresponds to the $F$ distribution with $d$ and $\infty$ degrees of freedom and $\alpha$ the level of significance.
The bias term corresponding to an offset occurring at the $j$th epoch $t_j$ could be represented by the vector

$$c_j = [\,\underbrace{0, \dots, 0}_{j-1}, \underbrace{1, \dots, 1}_{N-j+1}\,]^T,$$

that is, a unit step at epoch $t_j$. The DIA procedure found the most probable offset as the epoch with the largest value of the test statistic, provided it was statistically significant. According to the test statistic given by Equation 9, offset detection based on the DIA procedure depended not only on the correct structure of the design matrix ($A$) but also on the full covariance matrix of observations ($Q_x$).
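A scan over candidate offset epochs with this one-dimensional test can be sketched as follows. To keep the sketch short it assumes white noise (Q_x = sigma2 · I), which deliberately sidesteps the colored-noise structure the text emphasizes; the full test would use the inverse of the complete Q_x:

```python
import numpy as np

def dia_scan(x, A, sigma2=1.0):
    """One-dimensional DIA offset scan under a white-noise assumption
    (Q_x = sigma2 * I), a simplification of the full test statistic.
    Returns the per-epoch test values and the most likely offset epoch."""
    N = len(x)
    P = np.eye(N) - A @ np.linalg.solve(A.T @ A, A.T)  # residual projector
    e_hat = P @ x                                      # LS residuals under H0
    T = np.full(N, np.nan)
    for j in range(1, N):                              # candidate offset epochs
        c = np.zeros(N)
        c[j:] = 1.0                                    # step bias signature c_j
        denom = c @ P @ c
        if denom > 1e-12:
            T[j] = (c @ e_hat) ** 2 / (sigma2 * denom)
    j_best = int(np.nanargmax(T))
    return T, j_best
```

The epoch maximizing T would then be tested for significance against the chosen critical value before being accepted as an offset.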

Offset Detection Using the WR
Due to the drawbacks of the ordinary DIA procedure, namely its dependency on the design matrix ($A$) and the covariance matrix ($Q_x$), it was preferred to transform the test into the MODWT rough space. Under the alternative hypothesis, the model could be rewritten as

$$E\{\widetilde{R}^x_J\} = \widetilde{R}^c_J\, \nabla, \qquad (10)$$

where $\widetilde{R}^c_J$ was the offset bias term for the rough set $\widetilde{R}^x_J$, and $\hat{e}$ was the vector of least squares residuals given by

$$\hat{e} = \widetilde{R}^x_J - \widetilde{R}^c_J\, \hat{\nabla}, \qquad (11)$$

where the associated covariance matrix in Equation 11 is given by

$$Q_{\widetilde{R}} = I - \frac{1}{N}\, J, \qquad (12)$$

where $J \in \mathbb{R}^{N \times N}$ was an integer matrix consisting of all ones. Furthermore, according to Equation 11, offset detection based on the WR depended neither on the design matrix ($A$) nor on the covariance matrix of observations ($Q_x$). Table 1 illustrates the symbolic algorithm for the implementation of the WR to address the location of the offsets in continuous GPS time series.

DOGEx Data Set

Gazeaux et al. (2013) have simulated three-dimensional coordinate time series containing known and realistic GPS signals, white and flicker noise (1/f-spectrum noise), offsets, and data gaps. They produced up to 18 yr of simulated GPS daily coordinate time series for the three components (north, east, and up) and for 50 idealized sites. Following Langbein (2008), they assumed that the noise was not necessarily time-constant at each site. Three types of events were considered to assess the quality of a method, namely, TP, FN, and FP events. A TP is an offset that was originally simulated and detected by a solution within a 5-day window; a FP is an offset that was not simulated but has been reported by a solution; and a FN is an offset that was simulated but has not been detected. A total of 25 solutions, consisting of 20 automatic and 5 manual solutions, were previously tested by Gazeaux et al. (2013). Likewise, they found that the magnitude of the offsets detectable by the manual solutions was smaller than for the automated solutions; the smallest detectable offsets for the best manual and automatic solutions were equal to 5 and 8 mm, respectively. Overall, the best automatic solution in terms of these metrics was AIUBCOD2. This solution was an iterative DIA procedure based on a step-by-step algorithm for the modification of the functional model, which allowed introducing new offsets, velocity changes, or periodic functions at any iteration step.
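The fine-scale irregularity property that motivates detection in the rough can be illustrated with a didactic sketch: the level-1 Haar detail is large where the series is irregular, so epochs whose detail exceeds a robust threshold are candidate offsets. Both the Haar filter and the MAD-based threshold rule are our assumptions for illustration; this is not the paper's full DIA-in-rough-space algorithm:

```python
import numpy as np

def flag_irregular_epochs(y, k=5.0):
    """Flag epochs where the level-1 Haar MODWT detail exceeds k robust
    standard deviations (MAD-based), as candidate offset locations."""
    y = np.asarray(y, dtype=float)
    d = 0.5 * (y - np.roll(y, 1))          # circular level-1 Haar detail
    mad = np.median(np.abs(d - np.median(d)))
    sigma = 1.4826 * mad                   # MAD -> Gaussian-consistent sigma
    return np.flatnonzero(np.abs(d) > k * sigma)
```

Note that circular filtering also produces a large detail at the series boundary, so a full implementation would discard boundary-affected coefficients, as in the wavelet variance estimator.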

Processing Using the WR Method and Results
The proposed method was first applied to the simulated DOGEx data set and compared with the 18 best automatic and 5 manual solutions indicated in Gazeaux et al. (2013). It should be noted that we have used a 3-day window for defining TP, FP, and FN events, whereas Gazeaux et al. (2013) defined them based on a 5-day window. Therefore, the statistics for the previous solutions have been recomputed according to the new window criterion (Williams, personal communication, 2019), and our results may differ somewhat from theirs.
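The event definitions can be sketched as a greedy matching of detected to simulated offsets. This is a hypothetical re-implementation, not DOGEx's exact scorer, and whether the window means ±3 days or 3 days in total is our assumption:

```python
def classify_detections(true_epochs, detected_epochs, window=3):
    """Match detected offsets to simulated ones within +/- window days and
    return (TP, FP, FN) counts. Each simulated offset is matched at most once."""
    unmatched = sorted(true_epochs)        # simulated offsets not yet matched
    tp = 0
    for d in sorted(detected_epochs):
        hit = next((t for t in unmatched if abs(d - t) <= window), None)
        if hit is not None:
            unmatched.remove(hit)          # a simulated offset can match once
            tp += 1
    fp = len(detected_epochs) - tp         # reported but not simulated
    fn = len(unmatched)                    # simulated but never detected
    return tp, fp, fn
```

Aggregating these counts over all sites and components would yield the TP/FP/FN percentages reported below.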

The ratios of the three variables TP, FP, and FN are shown by their positions in an equilateral triangle in Figure 1. A perfect method would appear in the bottom right-hand corner of the triangle. Accordingly, the proposed method was the best solution, with TP, FP, and FN rates of 34.3%, 36.5%, and 29.2%, respectively, whereas the corresponding values for the former best automatic solution (AIUBCOD2) were TP = 29%, FP = 33%, and FN = 38%. Thus, the proposed method offered a lower FN rate but higher TP and FP rates compared to AIUBCOD2.
According to the definitions of Ostini (2012), the sensitivity or true positive rate (TPR) for the best automatic solution (AIUBCOD2) was TP/(TP + FN) = 0.43, and the false positive rate (FPR), or 1 − specificity, was FP/(FP + TN) = 0.25 × 10⁻³, where TN (true negative) is the complement of FP. If all offsets were correctly identified, the TPR and FPR values would be 1 and 0, respectively. The corresponding values for the WR method were TPR = 0.54 and FPR = 0.28 × 10⁻³, as illustrated in Figure 2.
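As a quick arithmetic check, the quoted TPR values follow directly from the event percentages given earlier:

```python
# TPR = TP / (TP + FN), computed from the event percentages quoted above
tpr_wr = 0.343 / (0.343 + 0.292)      # proposed WR method
tpr_aiub = 0.29 / (0.29 + 0.38)       # AIUBCOD2
print(round(tpr_wr, 2), round(tpr_aiub, 2))   # 0.54 0.43
```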
Furthermore, the performance of the different solutions in terms of the 5th to 95th percentile ranges of absolute velocity biases is illustrated in Figure 3. A perfect solution would have a low velocity bias together with a small offset detection threshold and would appear in the bottom left-hand corner of the plot. The results indicated that AIUBCOD2 was the best automatic solution of Gazeaux et al. (2013), with an offset detection threshold of 8 mm and a velocity bias of 1.5 mm/yr, whereas the proposed method had an equivalent offset detection threshold of 5.7 mm with a velocity bias of 1.24 mm/yr.
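The summary metric can be computed as the width of the 5th to 95th percentile interval of the per-site velocity biases; reading the "fifth percentile range" this way is our interpretation:

```python
import numpy as np

def percentile_range(velocity_biases, lo=5.0, hi=95.0):
    """Width of the [lo, hi] percentile interval of the velocity biases,
    our reading of the '5th to 95th percentile range' metric."""
    p_lo, p_hi = np.percentile(np.asarray(velocity_biases, dtype=float),
                               [lo, hi])
    return p_hi - p_lo
```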
Overall, the results based on these metrics indicate the remarkable performance of the proposed method on the simulated time series. However, the best manual solution (SDPWMANL) still outperforms the proposed approach.

Conclusion
Due to the drawbacks of the ordinary DIA procedure regarding its dependency on the design and covariance matrices, it was preferred to transform the test into the MODWT rough space, which made it free of dependencies on those matrices. The performance of the method in finding offsets was tested using the simulated DOGEx data set, and the results were compared with the other offset detection methods introduced in Gazeaux et al. (2013) using three different metrics. The WR approach yielded 34.3% TP, 36.5% FP, and 29.2% FN events. Furthermore, the TPR and FPR values for the WR method were, respectively, 0.54 and 0.28 × 10⁻³. In addition, the proposed method had an equivalent offset detection threshold of 5.7 mm, and 90% of the velocity estimates were within 1.24 mm/yr of the simulated velocities. These outcomes confirmed that the recommended method was the best available automatic method for detecting offsets in the GPS experiment. However, this approach is still not better than the best manual estimates.

Data Availability Statement
The DOGEx data set can be found in King et al. (2010).