A generalized Grubbs‐Beck test statistic for detecting multiple potentially influential low outliers in flood series
Abstract
[1] The Grubbs‐Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs‐Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less‐than” values, and a frequency distribution can be developed using censored‐data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right‐hand tail of a frequency distribution and provide protection from lack‐of‐fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
1. Introduction
[2] An important concern in the development of national flood frequency guidelines, such as Bulletin 17B [Interagency Committee on Water Data (IACWD), 1982] or a new Bulletin 17C [Stedinger and Griffis, 2008; England and Cohn, 2008], is that the procedure be robust. That is, the recommended procedure should be reasonably efficient when the assumed characteristics of the flood distribution are true, while not performing poorly when those assumptions are violated. A critical issue is whether the low (or zero) flows in an annual flood series are relevant in estimating the probabilities of the largest events.
[3] Annual‐peak‐flow series, particularly those in the Western United States, often contain so‐called “low outliers.” In the context of Bulletin 17B, low outliers are small values which depart from the trend of the rest of the data [IACWD, 1982] and often reflect a situation wherein the smaller flood flows are unusually small given what one would expect based on the larger flood flows. For example, Figure 1 depicts the logarithms of 1 day rainfall flood annual peak flows for the Sacramento River at Shasta Dam, as computed by the Army Corps of Engineers, 1932–2008 (77 years). At this site, the three smallest observations appear visually to be unusually small. The figure includes a lognormal distribution fit to the top 74 observations. The standard Grubbs‐Beck test [Grubbs and Beck, 1972] (see equation 2, 10% significance level) generates a threshold of 7373 cubic feet per second (cfs), and thus correctly identifies the smallest observation as a low outlier, but not the second and third smallest observations.

[4] Low outliers and potentially influential low flows (PILFs) in annual‐peak‐flow series often reflect physical processes that are not relevant to the processes associated with large floods. Consequently, the magnitudes of small annual peaks typically do not reveal much about the upper right‐hand tail of the frequency distribution, and thus should not have a highly influential role when estimating the risk of large floods. Klemes [1986] correctly observes:
[5] “It is by no means hydrologically obvious why the regime of the highest floods should be affected by the regime of flows in years when no floods occur, why the probability of a severe storm hitting this basin should depend on the accumulation of snow in the few driest winters, why the return period of a given heavy rain should be by an order of magnitude different depending, say, on slight temperature fluctuations during the melting seasons of a couple of years.”
[6] The distribution of the proposed test statistic is derived specifically for the purpose Klemes suggests: to identify small “nuisance” values. Paradoxically, moments‐based statistical procedures, when applied to the logarithms of flood flows to estimate flood risk, can assign high leverage to the smallest peak flows. For this reason, procedures are needed to identify potentially influential small values so a procedure can limit their influence on flood‐quantile and flood‐risk estimates.
[7] This paper presents a generalization of the Grubbs‐Beck statistic [Grubbs and Beck, 1972] that can provide a standard to identify multiple potentially influential small flows. The present work is motivated by ongoing efforts [England and Cohn, 2008; Stedinger and Griffis, 2008] to explore potential improvements to Bulletin 17B [IACWD, 1982] with moments‐based, censored‐data alternatives [Cohn et al., 1997; Griffis et al., 2004]. The proposed statistic is constructed following the reasoning in Rosner [1975], who developed a two‐sided R‐Statistic "many outlier" test (RST), which is based on the following argument:
[8] “[t]he idea is to compute a measure of location and spread (a and b) from the points that cannot be outliers under either the null or alternative hypotheses, i.e. the points that remain after deleting 100 p% of the sample from each end.”
[9] In Rosner's implementation, p is some fraction of the total number of observations, n. We consider a one‐sided test statistic based on these concepts to detect PILFs in the left‐hand tail.
2. Literature Review
[10] A wide range of test procedures for identifying low outliers have been studied in the statistical literature [Thompson, 1935; Grubbs and Beck, 1972; Barnett and Lewis, 1994], including methods for dealing with the case of multiple low outliers considered here [Tietjen and Moore, 1972; Rosner, 1975; Prescott, 1975; Gentleman and Wilk, 1975; Prescott, 1978; Rosner, 1983; Marasinghe, 1985; Rousseeuw and van Zomeren, 1990; Hadi and Simonoff, 1993; Spencer and McCuen, 1996; Rousseeuw and Leroy, 2003; Verma and Quiroz‐Ruiz, 2006].
One classical approach is a Dixon‐type test of the most extreme observation:

(1)

where $x_{(1)}$, $x_{(n)}$, and $x_{(n-1)}$ denote the smallest, largest, and second largest observations, respectively. Barnett and Lewis [1994] also discuss a Dixon‐type test for the second most extreme observation in either tail.
Bulletin 17B recommends the one‐sided 10% Grubbs‐Beck test [Grubbs and Beck, 1972], which flags observations below the threshold

(2) $X_{crit} = \bar{X} - K_n S$

where $\bar{X}$ and $S$ denote the sample mean and standard deviation of the entire data set. Any observation less than $X_{crit}$ is declared a "low outlier" [IACWD, 1982]. With Bulletin 17B, low outliers are omitted from the sample and the frequency curve is adjusted using a conditional probability adjustment [IACWD, 1982]. $K_n$ values are tabulated in section A4 of IACWD [1982] based on Table A1 in Grubbs and Beck [1972].
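In log space the screening threshold is a one‐line computation once the critical deviate is known. The following minimal sketch (the function name is ours) assumes $K_n$ has been read from the table in section A4 of IACWD [1982] for the given record length:

```python
import numpy as np

def grubbs_beck_threshold(peaks_cfs, kn):
    """Bulletin 17B single low-outlier screen (equation (2)).

    peaks_cfs : positive annual peak flows, in cfs (zeros handled separately)
    kn        : 10%-level critical deviate for the record length, read from the
                Kn table in section A4 of IACWD [1982] (not computed here)

    Returns the threshold back-transformed to cfs; any peak below it would be
    flagged as a low outlier.
    """
    x = np.log10(np.asarray(peaks_cfs, dtype=float))   # Bulletin 17B works in log10 space
    x_crit = x.mean() - kn * x.std(ddof=1)             # X_crit = mean - Kn * s
    return 10.0 ** x_crit
```

With the 77 year Shasta record and the tabulated $K_n$ for n = 77, this computation should reproduce the 7373 cfs threshold quoted above.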
The generalized extreme studentized deviate (ESD) test [Rosner, 1983] is based on the statistic

(4) $T_1 = \max_i \dfrac{\left|X_i - \bar{X}\right|}{S}$

where $\bar{X}$ and $S$ are the sample mean and standard deviation of the entire sample. The observation corresponding to $T_1$ is removed and $T_2$ is computed from the remaining sample, i.e., the sample mean and standard deviation are computed from the remaining n − 1 observations. This process is repeated until $T_k$ is computed, for some prespecified k.
[15] The RST procedure of Rosner [1975] is very similar to the ESD procedure, except that it is much less computationally intensive. As described above, an ESD test for k suspected outliers requires the computation of k trimmed sample means and standard deviations to construct k test statistics. The RST procedure only requires the trimmed sample moments to be computed once (after removing the k suspected outliers from the sample). Each of the k R‐statistics is then computed from the single trimmed mean and standard deviation.
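To make the computational difference concrete, the following schematic sketch (our simplification, restricted to low outliers rather than Rosner's exact two‐sided formulation) contrasts the two approaches:

```python
import numpy as np

def esd_low_statistics(x, kmax):
    """ESD-style statistics: the sample moments are recomputed after each removal."""
    x = np.sort(np.asarray(x, dtype=float))
    out = []
    for _ in range(kmax):
        xbar, s = x.mean(), x.std(ddof=1)
        i = int(np.argmax(np.abs(x - xbar)))     # most extreme remaining value
        out.append(abs(x[i] - xbar) / s)         # T_1, T_2, ..., T_kmax
        x = np.delete(x, i)                      # remove it and repeat
    return out

def rst_low_statistics(x, kmax):
    """RST-style statistics: trimmed moments are computed only once."""
    x = np.sort(np.asarray(x, dtype=float))
    a, b = x[kmax:].mean(), x[kmax:].std(ddof=1)   # moments without the kmax suspects
    return [(x[j] - a) / b for j in range(kmax)]   # one R-statistic per suspect
```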
[16] The Rosner [1975, 1983] tests detect outlier observations that are either unusually large or small. The ESD statistic is the Grubbs‐Beck test statistic generalized to consider both large and small outliers, and to test more than the smallest observation in a sample. The one‐sided multiple Grubbs‐Beck test statistic proposed in this paper differs from the Rosner [1975, 1983] ESD statistic in that it only considers low outliers and does not include the suspected outlier in its computation of the trimmed mean and standard deviation.
[17] In addition to equation (4), several other test statistics have been employed in sequential multiple outlier tests [Barnett and Lewis, 1994, p. 235]. Published critical values achieve an overall type I error which matches the specified α for a given $k_{max}$, even though they employ multiple comparisons to decide on the number of outliers k, for $k \le k_{max}$. This works well if the number of potential outliers is known.
[18] Bayesian outlier tests have also been proposed [see Ondo et al., 2001; Bayarri and Morales, 2003]. A primary feature of such tests is that they incorporate an explicit model for the alternative hypothesis, based on a contaminating distribution that is responsible for the outlier(s). Barnett and Lewis [1994] provides a good summary of the Bayesian approach to outliers and contrasts that approach with the frequentist approach. The analysis proposed here employs a traditional frequentist approach which does not require a specific alternative distribution for outliers.
2.1. Masking, Swamping, and Block Tests
[19] The Grubbs‐Beck test recommended in Bulletin 17B is, by construction, a 10% test for a single outlier in a flood sample [IACWD, 1982]; 10% is the probability of rejecting the hypothesis of normality when samples are drawn from a normal distribution. A reasonable concern is that a flood record could contain more than one low outlier and the additional outliers can cause the Grubbs‐Beck test statistic to fail to recognize the smallest observation as an outlier (by inflating the sample mean and variance). This effect is known as masking [Tietjen and Moore, 1972]. Siyi [1987] demonstrated the masking problem using the Bulletin 17B Grubbs‐Beck test with several flood records in China. Basically, the problem is that if a sample has several severe outliers, then the smallest observation does not look that unusual in the context of the other unusually small observations.
[20] Some multiple outlier tests consist of successive application of a single outlier test. These are generally divided into step‐forward and step‐backward tests [Spencer and McCuen, 1996]. Step‐backward tests first test if the most extreme (smallest) observation (k = 1) is an outlier. If the smallest (k = 1) proves significant, the next smallest (k = 2) is tested. This continues until no additional outliers are identified. Such step‐backward tests are particularly susceptible to masking [Barnett and Lewis, 1994, pp. 109–115]. Suppose the three smallest flood flows in a sample are much smaller than the remainder of the sample, and a successive step‐backward outlier detection test that uses the Grubbs‐Beck test statistic is used. Having three small outliers can cause the test to conclude at the first step (k = 1) that the smallest observation is not an outlier, so the procedure stops.
[21] The Rosner [1983] outlier test avoids masking by using a step‐forward test for at most k outliers. This means that the Rosner [1983] procedure first tests the kth most extreme observation; if the result is significant, all more extreme observations are also considered outliers. If not, the (k − 1)th most extreme observation is tested, and so forth, working outward to the most extreme observation.
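A minimal sketch of such a step‐forward sweep for low outliers, assuming a generic p_value(k, x_sorted) routine (for example, the MGBT p value developed in section 4) is available:

```python
def step_forward_low_outliers(x_sorted, kmax, p_value, alpha=0.10):
    """Step-forward sweep: start at the kmax-th smallest value and move toward
    the smallest.  The first significant order statistic determines the answer,
    and all smaller observations are declared low outliers along with it.

    p_value(k, x_sorted) must return the significance of the k-th smallest
    observation; it is supplied elsewhere (e.g., the MGBT p value of section 4).
    """
    for k in range(kmax, 0, -1):
        if p_value(k, x_sorted) < alpha:
            return k            # k low outliers identified
    return 0                    # no low outliers identified
```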
[22] If the number of potential outliers is thought to be $k_0$, then a block test simultaneously tests for $k_0$ outliers by computing a single statistic, perhaps based on the ratio of mean deviations from partial sample averages [see Tietjen and Moore, 1972]. Such block tests for $k_0$ low outliers are particularly susceptible to swamping [Barnett and Lewis, 1994, pp. 109–115]. Consider use of a block outlier test for $k_0 = 2$ low outliers applied to a sample with only one unusually small low outlier. The smallest observation can cause the second smallest to erroneously be identified as one of two outliers. The test statistic proposed here avoids the problem of swamping by not including the suspected outlier and all more extreme observations in the computation of the test statistic.
3. Concerns With the Grubbs‐Beck Test
[23] Spencer and McCuen [1996] raise three concerns with the Bulletin 17B low outlier‐identification procedure:
[24] 1. it provides critical values for only a single significance level, 10% (because of the limited tables);
[25] 2. it assumes zero skew (i.e., a normal distribution as the null hypothesis); and
[26] 3. it does not address multiple low outliers.
[27] To resolve these concerns, Spencer and McCuen [1996] used Monte Carlo simulation to identify critical values for $x_{(1)}$, $x_{(2)}$, and $x_{(3)}$, the three smallest observations in a sample, for several significance levels between 10% and 1%, a range of log‐space skews (up to 1.0), and a range of sample sizes. The test requires specification of the log‐space skew that defines the log‐Pearson type 3 (LP3) population that represents the null hypothesis. They recommend using a weighted average of the sample skew and a regional skew. If no regional skew is available, the sample skew is recommended. It is not clear whether the estimated sample skew or an estimated weighted skew is appropriate for an outlier test which assumes the population skew is known independent of the data set.
[28] An additional issue is that the step‐forward multiple outlier test recommended by Spencer and McCuen [1996] uses the trimmed sample moments, including a trimmed sample skew coefficient. The trimmed skew coefficient is a very biased estimator of the true skew of a sample, and its use in a successive step‐forward test is highly suspect because it is not consistent with how the critical deviates were developed.
[29] Hosking and Wallis [1997, p. 152] make two criticisms of Bulletin 17B's approach to low outlier detection. First, "the [Grubbs‐Beck] outlier adjustment is a complication that would not have been necessary had the logarithmic transformation not been used," and, second, that "the criterion used … is arbitrary, being based on an outlier test from samples from the normal distribution at a subjectively chosen significance level of 10%." Hosking and Wallis [1997] note that "no justification" for the choice of procedure was given beyond "saying that it was based on 'comparing results' of several procedures."
[30] Answers to these concerns can be found in Thomas [1985]. Thomas describes the process used to select Bulletin 17B's low outlier detection procedure from among 50 alternatives. This process involved first subjectively comparing their performance. Of the 50 procedures, 10 were judged to agree adequately with the subjective visual identification. Based on a Monte Carlo analysis of the 10 promising candidates, six procedures were selected. Comparison of subjective visual identification of additional outliers in flood records resulted in the selection of the current methodology.
4. Statistical Development
[31] A procedure is provided to compute critical values (p values) for the k smallest flows (PILFs) in flood frequency applications. We first present the context and notation compatible with current U.S. national flood frequency guidelines [IACWD, 1982], and outline the generalization of the test statistic employed therein. A p‐value equation is then derived using a semianalytical approach with numerical integration, which enables the estimation of a critical value for any combination of k and record length n.
[32] Let $\{Q_1, \ldots, Q_n\}$ be a series of annual maximum flows. The values are assumed to be independent. Some annual peaks may not be floods. In arid and semiarid regions, such as the Southwestern United States, many watersheds have no discharge at all in some years, so zero flows appear in peak flow records. In the United States, one typically works with the logarithms of the annual peaks, denoted $\{X_1, \ldots, X_n\}$, and zero flows are always treated as low outliers that are afforded special treatment.
[33] We consider the sorted data set $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$, where $x_{(1)}$ denotes the smallest observation (the smallest order statistic) in a sample of size n. The problem is how to determine whether the kth smallest observation in the data set, $x_{(k)}$ (and consequently all smaller observations $x_{(1)}, \ldots, x_{(k-1)}$), is unusually small under the null hypothesis (H0) that $X_1, \ldots, X_n$ are drawn from a population of independent and identically distributed normal variates.
[34] The Grubbs‐Beck test [Grubbs and Beck, 1972, hereinafter GBT] is designed to determine if the single smallest observation in a sample is a low outlier. If more than one, say k, observations are below the Grubbs‐Beck threshold, all would be considered low outliers. Alternatively, the GBT has been implemented iteratively to identify multiple low outliers. As each outlier is detected, it is removed from the data set and the test is repeated on the remaining sample. The statistical properties of this iterative procedure have not been thoroughly investigated, though the discussion below shows this approach is misguided.
[35] We propose to test whether the k smallest observations $x_{(1)}, \ldots, x_{(k)}$ are consistent with a normal distribution and the other observations in the sample by examining the statistic

(5) $\omega_k = \dfrac{x_{(k)} - \hat{\mu}_{(k)}}{\hat{\sigma}_{(k)}}$

where $x_{(k)}$ denotes the kth smallest observation in the sample, and

(6) $\hat{\mu}_{(k)} = \dfrac{1}{n-k} \displaystyle\sum_{j=k+1}^{n} x_{(j)}$

(7) $\hat{\sigma}^2_{(k)} = \dfrac{1}{n-k-1} \displaystyle\sum_{j=k+1}^{n} \left(x_{(j)} - \hat{\mu}_{(k)}\right)^2$

[36] The partial mean ($\hat{\mu}_{(k)}$) and partial variance ($\hat{\sigma}^2_{(k)}$) are computed using only the observations larger than $x_{(k)}$ to avoid swamping.
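As a concrete illustration, a minimal sketch of the statistic in equations (5)-(7) (the function and variable names are ours) might look like:

```python
import numpy as np

def mgbt_statistic(x, k):
    """Studentized deviation of the k-th smallest observation (equation (5)),
    using the partial mean and partial standard deviation (equations (6)-(7))
    computed only from the observations larger than x_(k), to avoid swamping.

    x : log-transformed annual peaks;  k : number of suspected low outliers (k >= 1).
    """
    xs = np.sort(np.asarray(x, dtype=float))
    upper = xs[k:]                          # observations strictly above x_(k)
    mu_k = upper.mean()                     # partial mean, equation (6)
    sd_k = upper.std(ddof=1)                # partial standard deviation, equation (7)
    return (xs[k - 1] - mu_k) / sd_k        # omega_k, equation (5)
```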
4.1. Deriving the p Value
[37] This section presents a derivation of the probability density function of $\omega_k$. That distribution can be used to determine quantitatively whether the kth smallest observation in a sample of size n is, in fact, unusually small. A low outlier test can subsequently be based upon the p value for $\omega_k$, which is the probability, given H0, of obtaining a value of $\omega_k$ as small or smaller than that observed with the sample. If that probability is less than, say, 0.10, the k smallest observations could be declared to be low outliers based on the selected significance level.
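This tail probability can also be estimated by brute force. The sketch below (our own check, reusing the mgbt_statistic helper sketched above; it is not the semianalytical procedure derived in this paper) simulates the null distribution of $\omega_k$ directly:

```python
import numpy as np

def mgbt_p_value_mc(x, k, n_sim=100_000, seed=1):
    """Brute-force Monte Carlo estimate of P(omega_k <= observed | H0: iid normal).

    Because omega_k does not depend on the unknown mean and variance (see
    equation (9)), standard normal samples of the same length n suffice to
    simulate the null distribution.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    observed = mgbt_statistic(x, k)                       # sketch above
    null = np.array([mgbt_statistic(rng.standard_normal(n), k)
                     for _ in range(n_sim)])
    return float(np.mean(null <= observed))
```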
[38] Under H0, each observation can be written as

(8) $X_i = \mu + \sigma Z_i$

where the $Z_i$ are independent standard normal variates. Substituting (8) into $\omega_k$ from equation (5) and rearranging the terms yields

(9) $\omega_k = \dfrac{x_{(k)} - \hat{\mu}_{(k)}}{\hat{\sigma}_{(k)}} = \dfrac{\left(\mu + \sigma z_{(k)}\right) - \left(\mu + \sigma \hat{\mu}^z_{(k)}\right)}{\sigma \hat{\sigma}^z_{(k)}} = \dfrac{z_{(k)} - \hat{\mu}^z_{(k)}}{\hat{\sigma}^z_{(k)}}$

[39] where $z_{(k)}$ is the kth‐order statistic in a random standard normal sample of size n, and $\hat{\mu}^z_{(k)}$ and $\hat{\sigma}^z_{(k)}$ are the partial mean and standard deviation of the standard normal sample. Thus the distribution of the final ratio in equation (9) does not depend on the unknown moments $\mu$ and $\sigma$. Without loss of generality, we can therefore limit this investigation to the case where $X_1, \ldots, X_n$ are drawn from a standard normal distribution.
[40] In order to estimate the distribution of $\omega_k$, we now turn our attention to the three random variables on the right‐hand side of equation (9) and present an analytical approach to estimate their distribution.
4.2. Computing the p‐Value
[41] The p value of $\omega_k$ in equation (9) can be computed from a multiple integral that reflects the distributions of the three random quantities in equation (9): $z_{(k)}$, $\hat{\mu}^z_{(k)}$, and $(\hat{\sigma}^z_{(k)})^2$. We define:

(10) $P\!\left[\omega_k \le \eta\right] = \displaystyle\int_{-\infty}^{\infty} \int_{0}^{\infty} \int_{-\infty}^{\infty} \mathbf{1}\!\left[\frac{z - m}{s} \le \eta\right] f_{\hat{\mu}\mid z, s^2}(m)\, f_{\hat{\sigma}^2 \mid z}(s^2)\, f_{z_{(k)}}(z)\; dm\; ds^2\; dz$

where

[42] η is the observed value of $\omega_k$,

[43] z is a variable of integration corresponding to the kth smallest observation in the sample, and m and $s^2$ are variables of integration corresponding to the partial mean and partial variance,

[44] $f_{\hat{\mu}\mid z,s^2}$ is the pdf of $\hat{\mu}^z_{(k)}$ conditioned on z and $s^2$,

[45] $f_{\hat{\sigma}^2\mid z}$ is the pdf of $(\hat{\sigma}^z_{(k)})^2$ conditioned on z, and

[46] $f_{z_{(k)}}$ is the pdf of $z_{(k)}$, the kth order statistic in a standard normal sample of size n.
[47] Evaluating equation (10) presents challenges. First, the integral is semi‐infinite and three‐dimensional. Second, aside from the trivial case where no low outlier is present, the estimators of the partial mean and partial variance are not independent. We solve this by using a nearly closed‐form analytical approach with direct numerical integration, presented in the next section. The result is a p value for any combination of k and n, which eliminates the need for tables and interpolation schemes for critical values. Tables generated by standard Monte Carlo studies, including the critical deviates in IACWD [1982] and Rosner [1983] (among many others), are limited to discrete (and finite) combinations of n and significance level.
4.3. Direct Numerical Integration
[48] An approximate solution to equation (10) is generated with the following steps and approximations. First, the mean is reparametrized to eliminate the correlation between the partial mean and the partial variance, and moments for the partial mean and standard deviation are derived. Second, ratios of standardized random variables are approximated by noncentral Student t variates. The result is a one‐dimensional univariate integral for the p value of $\omega_k$.
[49] First, the partial mean conditioned on z is expressed in terms of:

(11)

(12)

[50] The partial mean and the partial variance are then approximated, respectively, as a normal variate and an independent Gamma variate with parameters that can be derived analytically. Using that approach, we obtain:
(13)

where

[51] $\mu_M$ is the mean of the partial mean $\hat{\mu}^z_{(k)}$,

[52] $\sigma^2_M$ is the variance of the partial mean,

[53] $\mu_S$ is the mean of the partial variance $(\hat{\sigma}^z_{(k)})^2$,

[54] $\sigma^2_S$ is the variance of the partial variance,

[55] U is the numerator in equation (13), and

[56] L is the denominator in equation (13).

[57] The required moments ($\mu_M$, $\sigma^2_M$, $\mu_S$, and $\sigma^2_S$) are derived in Appendix A.
[58] U can be approximated by a normal variate with a mean derived from these moments and unit variance, and L can be approximated by the square root of an independent chi‐square variate divided by its degrees of freedom. Thus their ratio, U/L, is approximately a noncentral Student t variate, a well‐known distribution whose statistical properties are easily evaluated numerically [Johnson and Kotz, 1970]. The needed p values can then be readily computed as:

(14)

where the degrees of freedom are those of the approximating Gamma distribution (see equation (A12) in Appendix A).
[59] The p value of $\omega_k$ can now be evaluated numerically as a single univariate integral:

(15) $P\!\left[\omega_k \le \eta\right] = \displaystyle\int_{-\infty}^{\infty} P\!\left[\omega_k \le \eta \mid z_{(k)} = z\right]\, f_{z_{(k)}}(z)\, dz$

where $f_{z_{(k)}}(z)$ is the pdf for the kth‐order statistic in a standard normal sample of size n, given by [David, 1981]:

(16) $f_{z_{(k)}}(z) = \dfrac{n!}{(k-1)!\,(n-k)!}\, \Phi(z)^{k-1}\, \left[1-\Phi(z)\right]^{n-k}\, \varphi(z)$

[60] Thus equations (14)-(16) should provide an accurate approximation of the p values for $\omega_k$, thereby avoiding the triple integral in equation (10) or the need for tables.
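The structure of this univariate integral can be sketched as follows (our own illustration; order_stat_pdf implements equation (16), while the conditional probability cond_prob is passed in as a placeholder for the noncentral‐t approximation of equation (14), whose parameters come from the Appendix A moments and are not reproduced here):

```python
import numpy as np
from scipy import integrate, special, stats

def order_stat_pdf(z, k, n):
    """pdf of the k-th order statistic of n iid standard normals (equation (16))."""
    c = special.comb(n, k) * k               # n! / ((k-1)! (n-k)!)
    return (c * stats.norm.cdf(z) ** (k - 1)
              * stats.norm.sf(z) ** (n - k)
              * stats.norm.pdf(z))

def mgbt_p_value(eta, k, n, cond_prob):
    """Univariate integral of equation (15).

    cond_prob(eta, z, k, n) must return P(omega_k <= eta | z_(k) = z), e.g.,
    the noncentral-t approximation of equation (14) built from the Appendix A
    moments (not reproduced in this sketch).
    """
    integrand = lambda z: cond_prob(eta, z, k, n) * order_stat_pdf(z, k, n)
    p, _ = integrate.quad(integrand, -8.0, 8.0)    # effectively (-inf, inf)
    return p
```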
5. Validation of p‐Value Approximation Accuracy
[61] Where only one observation is suspected of being a low outlier (k = 1), critical values computed with the MGBT were compared with published values in Grubbs and Beck [1972] to assess the precision of the MGBT approximation (using the relationships in section A8). The largest error, expressed as a percentage difference in Kn, occurred with n = 10 and was about 0.5% of Kn. The errors decreased monotonically with n, with a maximum error of less than 0.2% for moderate sample sizes and less than 0.1% for larger samples. This comparison indicates the derived formula accurately reproduces the existing published values for this special case.
[62] Monte Carlo experiments were used to evaluate the accuracy of the p values estimated with equations (14)-(16). The experiments consist of essentially three steps (a compact sketch follows the list):
[63] 1. generate a sample of n independent and identically distributed standard normal variates $Z_1, \ldots, Z_n$;
[64] 2. compute the approximate p value for each candidate number of low outliers k using equations (14)-(16); and
[65] 3. determine whether the computed p values for each sample are less than each of several nominal significance levels (up to 50%).
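A compact sketch of this validation loop, assuming a p_value(k, x) routine such as those sketched earlier, might be:

```python
import numpy as np

def rejection_rate(n, k, alpha, p_value, n_sim=100_000, seed=2):
    """Fraction of iid standard normal samples whose k-th smallest value is
    flagged at level alpha; it should be close to alpha if the approximate
    p value is accurate."""
    rng = np.random.default_rng(seed)
    hits = sum(p_value(k, np.sort(rng.standard_normal(n))) < alpha
               for _ in range(n_sim))
    return hits / n_sim
```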
[66] Figure 2 shows the rejection rate for the MGBT based on equation (15) for a range of sample sizes, at selected order statistics and significance levels. The results in Figure 2 are based on 100,000 replicate samples and are accurate to at least three significant digits. The rejection rates are generally quite close to the nominal significance levels for all α considered. As the sample size increases, the approximation in equation (15) becomes more accurate. The precision of the approximation, even in samples of size 10, is remarkable.

Figure 2. Rejection rate of the MGBT as a function of sample size n, number of suspected low outliers k, and nominal significance level (shown as red line), based on 100,000 simulated normal samples for each sample size. The number of outliers k was rounded down to integers for samples resulting in noninteger values.
6. Examples
[67] Two flood frequency examples illustrate the p value computations and the identification of PILFs. Annual maximum 1 day rainfall flood flows from the Sacramento River at Shasta Dam, California are revisited. Annual peak‐flow data from Orestimba Creek, California demonstrate the challenge of flood records with many PILFs.
[68] For the 1932–2008 Sacramento River record, Figure 3 depicts p values for the low outlier criterion with the smallest several observations considered as candidate low outliers. The horizontal line on the graph represents the 10% significance level; if the p value for the kth smallest observation is below the line, that observation and those smaller could be called outliers. For the Sacramento River, this is the case for the smallest three observations, that is, k ≤ 3. This is entirely consistent with the visual conclusion that there are three low outliers in the flood series (Figure 1). Unfortunately, the standard GBT identifies only a single low outlier for this data set. This is not surprising because the test is designed to find a single low outlier and does not consider the actual distribution of the second or third smallest observations, as the new MGBT does.

[69] Figure 4 displays the annual peak flow series from 1932 to 1973 for Orestimba Creek, California (U.S. Geological Survey streamgage 11274500) [IACWD, 1982, see 12–32]. The data set is discussed in Bulletin 17B and presents a case with an unusually large number of low outliers. The series includes 6 zero flows. Zero flows show conclusively that the LP3 distribution cannot describe the full range of observed flood flows, because the support of the LP3 distribution, fit to the logarithms of the flows, does not include zero. The standard GBT identifies only one nonzero low outlier. It is safe to conclude that the data set contains at least seven low outliers including zeros, but should we conclude it contains more than seven?

[70] Figure 5 shows four fitted frequency curves where low outlier thresholds have been set at flow thresholds of T = 1, T = 10, T = 100, and T = 1000 cfs, respectively. The zeros and other values below the low outlier thresholds have been recoded as censored observations, reported only as less than 1, 10, 100, and 1000 cfs, respectively. In the case of a low outlier threshold of T = 1000 cfs, the fit between the model and the remaining observed flood series appears to be reasonably good; the fit is not very good for the cases employing smaller flow thresholds. However, employing a low outlier threshold of T = 1000 cfs implies censoring 19 of 42 observations. Is 19 a reasonable number of small flood values to treat as PILFs?

[71] Figure 6 presents p values corresponding to the MGBT for k = 7 through 21 as a proposed guide for identifying PILFs. Because the six smallest observations (the zero flows) are low outliers, we consider the 7th through 21st smallest values as possible low outliers in the sample. The horizontal line in Figure 6 represents the 10% significance level; if the p value for the kth smallest observation is below the line, that observation and all smaller values can be considered to be low outliers. For Orestimba Creek, this is the case for a large number of the smallest observations. Figure 7 shows that, employing a low outlier threshold of 1200 cfs, the LP3 distribution fits the censored data set reasonably well with a skew near zero.


[72] These two examples also illustrate the challenge of developing an objective algorithm for identifying the number of observations in a record to be labeled as outliers or as PILFs. For example, with the Shasta record, if one intended to determine the number of PILFs by testing successively k = 1, k = 2, k = 3 (etc.) at the 5% level, one would test the first observation, find a p value > 5%, and thus would stop, even though k = 3 is significant at the 0.1% level (Figure 3). So, when records have several PILFs, the smallest may not be statistically significant because of the masking effect of the other unusual values.
[73] The Orestimba record further illustrates the challenge of deciding how many observations in a record should be labeled as PILFs: the record has 6 zero flows plus a single value of 10 cfs, which would be treated as the seventh smallest observation. However, even on the log‐scale (Figure 4), five values in the range of 100–200 cfs are likely to be appropriate values to treat as PILFs. On the other hand, larger observations such as those above the lower quartile have relatively little leverage on estimated design flood flows, and thus we are very reluctant to classify them as PILFs. Moreover, when a decision results from several separate (but not independent) hypothesis tests, the overall type I error is not immediately clear, though it will be at least as large as the largest type I error for the individual tests. Rosner [1983] illustrates the concern with the overall type I error for a test that involves multiple individual tests on different order statistics.
[74] The p values generated by the relationships derived here can serve as the basis of an outlier identification algorithm. They are the basis of several algorithms considered in Lamontagne et al. [2013], which uses a different type I error for large‐order and small‐order statistics in step‐forward (median toward smallest observation) and step‐backward (smallest observation toward the median) sweeps.
7. Discussion and Interpretation
[75] Two independent approaches have been used to evaluate the integral in equation (10):
[76] 1. An almost analytical result based on the approximate joint distribution of the partial mean and partial variance, resulting in a univariate numerical integral; and
[77] 2. Monte Carlo simulation experiments.
[78] The semianalytical equation was found to provide reasonably accurate p values for the relevant cases, even in samples as small as n = 10.
[79] Spencer and McCuen [1996] raise three concerns with the Bulletin 17B outlier detection procedure. One concern is that Bulletin 17B only provides critical values for a test with a 10% significance level. While it is unclear why this is problematic for a uniform procedure for flood frequency analysis, the p‐value approximation derived here can be used to test for outliers at any desired significance level. A second concern is that the Bulletin 17B outlier test is for only a single outlier. The p‐value approximation derived here can be used to test whether any order statistic is unusually small, and can be iteratively applied in a many‐outlier test as described by Lamontagne et al. [2013].
[80] Their final concern is that the test assumes the sample is drawn from a normal distribution (equivalent to an LP3 with zero skew) as its null hypothesis. The p‐value approximation derived here also assumes the sample is normally distributed, so it is important to consider this concern. An objective here is to reduce the sensitivity of moment estimators to outlying observations in the left‐hand tail. Thus the issue is whether our fitted distribution describing the frequency of large floods is unduly influenced by the smallest observations in a sample. A skewness coefficient less than zero indicates that the smaller observations in a sample are more important than the larger observations, which is a situation we want to avoid. Thus a test employing a threshold based on a skew of zero should yield reasonable results. If the true skew happens to be positive, few outliers will be identified, and that causes no problem because the retained small values will have relatively little influence on sample moments. If the true skew is negative, then many of the smallest flows will be identified as PILFs, which also works well because we do not want the magnitude of unusually small floods to have too large an effect on the frequency distribution derived for large events.
[81] There is an additional concern with the logic of employing the standard GBT when more than a single observation is zero or suspected of being a low outlier. If a sample has m zeros, then the kth smallest nonzero observation is an outlier if it is unusually small when considered to be the (k + m)th smallest observation in a sample of size n. This is a more stringent test than the common, misguided practice of treating the smallest nonzero retained observation as the smallest observation in an independent sample of size n − m. Similarly, if the GBT identifies one outlier in a sample of size n, then treating the second smallest observation as if it were the smallest observation in an independent sample of size n − 1 does not yield a statistical test with the anticipated type I error. The second smallest observation is indeed the second smallest observation in a sample of size n; the MGBT proposed here reflects that reality.
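In terms of the mgbt_statistic sketch above, the correct indexing can be illustrated as follows (the flow values are purely illustrative and are not the Orestimba record):

```python
import numpy as np

# Illustrative record with m = 6 zero flows (made-up nonzero peaks, in cfs).
nonzero_peaks_cfs = np.array([10., 120., 180., 260., 400., 950., 1800., 3500.])
m = 6

# Represent the zeros by -inf placeholders: they sort below every log peak and
# never enter the partial moments, because mgbt_statistic() uses only the
# observations above the tested order statistic.
x_full = np.concatenate([np.full(m, -np.inf), np.log10(nonzero_peaks_cfs)])

j = 1                                        # test the j-th smallest NONZERO peak
omega = mgbt_statistic(x_full, k=m + j)      # correctly treated as the (m+j)-th of n
# By contrast, mgbt_statistic(np.log10(nonzero_peaks_cfs), k=j) would wrongly
# treat it as the smallest observation of an independent sample of size n - m.
```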
[82] There might be several reasons to be concerned about removing or censoring the smallest observations in a sample. First, efficient censored‐data estimators are more challenging conceptually and numerically than the relatively straightforward closed‐form estimators that employ complete data sets [Griffis et al., 2004]. The availability of good statistical software solves that problem.
[83] The second concern is that censoring might result in a loss of information. However, the impact of censoring the smallest observations, describing them only as being less than a threshold, does not result in much loss of effective sample size if one is concerned about estimating the frequency of large floods. From a statistical as well as hydrological perspective, the smaller observations contain relatively little information pertaining to the magnitude of large quantiles Qt with return periods of many years. Kroll and Stedinger [1996] demonstrate that when fitting the two‐parameter lognormal distribution with efficient algorithms, censoring up to the 60th percentile of the sample does not noticeably diminish the precision of the 10 year flood estimator. Figure 4 in Cohn et al. [1997] and Figures 2 and 3 of Griffis [2008] show that when fitting the three‐parameter LP3 distribution, censoring even at the 90th percentile level has very little impact on the precision of 100 year flood estimators.
[84] Part of the confusion is related to how we visualize flood data. Traditional probability plots display the noncensored observations versus their plotting positions and give little hint of the additional information provided by the knowledge that k censored observations were less than the smallest retained observation. Furthermore, one fails to realize that the smallest retained observation has the precision of the (k + 1)th smallest observation in a sample of size n, and not the smallest in a sample of size n − k. Thus, while it may not be consistent with our intuition, sampling experiments confirm that if samples are analyzed efficiently, the precision of t‐year flood estimators for large return periods t, obtained by fitting traditional three‐parameter distributions, is not substantially affected by censoring a major fraction of the smallest observations in a sample.
[85] More to the point, Lamontagne et al. [2013] show that use of a forward‐backward MGBT (instead of GBT or no PILF identification) to fit the LP3 distribution with EMA increased the precision of design flood quantile estimators. This perhaps surprising result occurs because the log‐moments employed by EMA are not the most statistically efficient estimators for the LP3 distribution, particularly when the log‐space skew is far from zero.
8. Conclusions
[86] This paper introduces a nearly closed‐form approximation of the p value of a generalization of the Grubbs‐Beck (1972) test statistic that can be used to objectively identify multiple potentially influential low flows in a flood series. It can be used to construct a one‐sided variation of the Rosner [1983] test. Once such low outliers have been identified, they can be treated as “less‐than” values, thereby increasing robustness of the frequency analysis without substantially reducing the precision with which moments‐based fitting methods such as EMA [Cohn et al., 1997; Griffis et al., 2004] can estimate the flood quantiles of interest.
Acknowledgments
[106] The authors would like to thank the members of the Hydrologic Frequency Analysis Workgroup, and Will Thomas Jr., Beth Faber, and Martin Becker in particular, for providing thoughtful analyses and rich discussion that led to this research.
Appendix A: Derivation of the Parameters
[87] Key relations in equations 13 and 14 make use of several means and variances for critical statistics. Formulas for the computation of those values are given here, along with brief derivations.
A1. Moments of the Truncated Normal Distribution
(A1)

(A2)

(A3)

(A4)

(A5)

[91] This leads immediately to the recursion given in the final line of Table A1. This result is significant because it provides the expected value (as a function of the number of low outliers or PILFs) that is needed for estimating moments.
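For reference, the standard conditional moments of a standard normal variate above a truncation point z, which presumably underlie the recursion referenced in Table A1, are (with $\varphi$ and $\Phi$ the standard normal pdf and cdf):

```latex
% Textbook conditional moments of a standard normal variate Z above a
% truncation point z (phi = standard normal pdf, Phi = standard normal cdf).
E[Z   \mid Z > z] = \frac{\varphi(z)}{1-\Phi(z)}
\qquad
E[Z^2 \mid Z > z] = 1 + \frac{z\,\varphi(z)}{1-\Phi(z)}
% and, for r >= 2, the recursion
E[Z^r \mid Z > z] = (r-1)\,E[Z^{r-2} \mid Z > z] + \frac{z^{r-1}\,\varphi(z)}{1-\Phi(z)}
```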
A2. The Asymptotic Distribution of the Partial Mean Given z

This section derives the mean and the variance of the partial mean needed in equation (13). We restate equation (6), noting the use of the variate Z instead of X:

(A6) $\hat{\mu}^z_{(k)} = \dfrac{1}{n-k} \displaystyle\sum_{j=k+1}^{n} z_{(j)}$

The required expectations are directly computed from the results in equation (A5) above.

(A7)

One can now estimate the variance of this mean, using the standard equation for a variance and the key results from Table A1.

(A8)

A3. The Approximate Distribution of the Partial Variance Given z

This section derives the mean and the variance of the partial variance needed in equation (13). We restate equation (7), noting the use of the variate Z instead of X:

(A9) $(\hat{\sigma}^z_{(k)})^2 = \dfrac{1}{n-k-1} \displaystyle\sum_{j=k+1}^{n} \left(z_{(j)} - \hat{\mu}^z_{(k)}\right)^2$

(A11)

(A12)

(A13)

(A14)

A4. Covariance of the Partial Mean and the Partial Variance

The covariance between the partial mean and the partial variance is given by:

(A17)

A5. Approximating the Joint Distributions of the Partial Mean and the Partial Variance

Because the partial variance is described as a Gamma variate with parameters derived above, the moments of the partial standard deviation can be expressed as the fractional moments of that Gamma variate [Johnson et al., 1994, equation (18.13), p. 421]:

(A18)

(A19)

(A20)

A6. Covariance of U and L

(A21)

A7. Moments of U and L

These are the moments needed for equation (13).

(A22)

(A23)

[103] Those are the required results.

A8. Relationship of GBT Critical Kn and the Critical MGBT Statistic

[104] The GBT critical deviate Kn and the critical MGBT statistic can be compared by noting:

(A24)

(A25)

[105] By setting k = 1 and using the relationships in equations (A24) and (A25), the critical MGBT statistic derived from the approximations in this work can be compared to the GBT critical statistic published in section A4 of Bulletin 17B.