Volume 58, Issue 6 e2021WR031641
Research Article
Open Access

Rainfall Generation Revisited: Introducing CoSMoS-2s and Advancing Copula-Based Intermittent Time Series Modeling

Simon Michael Papalexiou (Corresponding Author)

Department of Civil Engineering, University of Calgary, Calgary, AB, Canada

Department of Civil, Geological and Environmental Engineering, University of Saskatchewan, Saskatoon, SK, Canada

Faculty of Environmental Sciences, Czech University of Life Sciences Prague, Prague, Czechia

Correspondence to: S. M. Papalexiou

First published: 18 May 2022
Citations: 18


What elements should a parsimonious model reproduce at a single scale to precisely simulate rainfall at many scales? We posit that these elements are: (a) the probability of dry and the linear correlation structure of the wet/dry sequence, as a proxy for reproducing the distribution of wet/dry spells, and (b) the marginal distribution of nonzero rainfall and its correlation structure. We build a two-state rainfall model, the CoSMoS-2s, that explicitly reproduces these elements and is easily applicable at any timescale. Additionally, the paper: (a) introduces the Generalized Exponential (GE) distribution system, comprising six flexible distributions with desired properties to describe nonzero rainfall and facilitate time series generation; (b) extends the CoSMoS framework to allow simulations with negative correlations; (c) simplifies the generation of binary sequences with any correlation structure by analytical approximations; (d) introduces the rank-based CoSMoS-2s, which preserves Spearman's correlations, has an analytical formulation, and is also applicable to infinite-variance time series; (e) introduces the copula-based CoSMoS-2s, enabling intermittent time series generation with nonzero values having the dependence structure of any desired copula; and (f) offers conceptual generalizations for rainfall modeling and beyond, with specific ideas for future improvements and extensions. The CoSMoS-2s is tested using four long hourly rainfall records; the simulations reproduce rainfall properties at multiple scales, including the wet/dry spells, probability of dry, characteristics of nonzero rainfall, and the behavior of extremes.

Key Points

  • New flexible system of probability distributions for rainfall intensity

  • Advanced rainfall generation preserving wet/dry spells, marginal distributions, and copula dependence of nonzero rainfall

  • Applicable to a single time scale and reproducing rainfall characteristics at multiple scales

1 Brief Review of Rainfall Models

“Water is the driving force of all nature.” ∼Leonardo da Vinci

Nature cannot escape randomness. In fact, randomness is one of nature's building blocks, with mechanistic laws allowing only short-term prediction of hydrometeorological processes such as rainfall. Yet long-term rainfall modeling is imperative for investigating hydroclimatic variability and natural or human-made environments. This motivated the development of probabilistic rainfall models built on different foundations: different models target different rainfall characteristics or are developed under different theoretical mandates. The perfect stochastic model would reproduce the joint distributions (of any order and at any scale) describing the process under investigation. Yet, in the general case, this approach might not be technically feasible, at least not in a meaningful way. An operational model should identify and reproduce the key properties of a process, be as parsimonious as possible, and still represent the process adequately at multiple scales.

It would be a colossal task to describe here the mechanics of all known stochastic rainfall models or to cite the whole literature; this is beyond the study's scope. Yet some characteristic and popular modeling strategies include those based on point processes, Markov chains (MCs), and multifractals, as well as more recent ones that aim to reproduce any marginal distribution and correlation structure.

A popular modeling strategy relies on a specialized class of point processes (e.g., Cox & Isham, 1980), the so-called cluster-based models. These models represent point rainfall as storms generated by clusters of rectangular pulses (cells). The best-known rectangular pulse models are the Neyman-Scott (NS; Neyman & Scott, 1958) and the Bartlett-Lewis (BL; Rodriguez-Iturbe et al., 1987) models. Originally, Neyman and Scott (1958) used point processes to model the spatial clustering of galaxies. Later, this approach inspired applications in rainfall modeling and resulted in the NS rectangular pulse model (Kavvas & Delleur, 1976, 1981; Le Cam, 1961). In both models, storm origins (epochs) occur according to a Poisson process. A random number of cells, following a Poisson or Geometric distribution, is attributed to each storm event. The rectangular pulses have random intensity and duration, usually exponentially distributed. The main difference between the models is the placement of cell origins relative to storm origins. In the NS model, the time intervals between the storm origin and the birth of individual cells are independent, exponentially distributed random variables. In the BL model, the storm duration is exponentially distributed and the intervals between successive cells are independent. The cells may overlap both within the same storm and with cells of other storms. Many studies explored variations of the NS model (e.g., Cowpertwait, 1991; Cowpertwait et al., 2002; Wheater et al., 2005), while others reparametrized the BL model and randomized the storm arrivals and durations as well as the cell intensity (e.g., Kaczmarska et al., 2014; Onof & Wang, 2020; Rodriguez-Iturbe et al., 1988). These models are typically used to simulate daily or sub-daily rainfall time series.
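To make the NS mechanics described above concrete, here is a minimal Python sketch that simulates hourly totals. The parameter names (`lam`, `nu`, `beta`, `eta`, `mu_x`) and their values are hypothetical. Storm origins follow a Poisson process, the number of cells per storm is Poisson, and cell delays, durations, and intensities are exponential. Overlapping cells simply add up. Storms that start before the simulation window are ignored, which introduces only a small edge effect in long runs. Under these assumptions, the long-run mean intensity is lam·nu·mu_x/eta, which provides a quick check on the output.

```python
import numpy as np

def simulate_nsrp(hours, lam, nu, beta, eta, mu_x, rng):
    """Hourly rainfall totals from a Neyman-Scott rectangular pulse sketch.

    lam  : storm-origin rate (storms per hour) of a Poisson process
    nu   : mean number of cells per storm (Poisson distributed)
    beta : rate of the exponential delay between storm origin and cell origin
    eta  : rate of the exponential cell duration (1/h)
    mu_x : mean of the exponential cell intensity (mm/h)
    """
    totals = np.zeros(hours)
    # Poisson process on [0, hours): Poisson count, then uniform arrival times
    n_storms = rng.poisson(lam * hours)
    for t0 in rng.uniform(0.0, hours, n_storms):
        n_cells = rng.poisson(nu)
        starts = t0 + rng.exponential(1.0 / beta, n_cells)
        ends = starts + rng.exponential(1.0 / eta, n_cells)
        intensities = rng.exponential(mu_x, n_cells)
        for s, e, i in zip(starts, ends, intensities):
            # Spread the cell's depth over the hourly bins it overlaps;
            # overlapping cells (within or across storms) simply add up
            for h in range(int(s), min(int(np.ceil(e)), hours)):
                totals[h] += i * (min(e, h + 1) - max(s, h))
    return totals

rng = np.random.default_rng(1)
x = simulate_nsrp(200_000, lam=0.01, nu=5.0, beta=0.1, eta=0.5, mu_x=2.0, rng=rng)
print(f"mean = {x.mean():.3f} mm/h (theory {0.01 * 5.0 * 2.0 / 0.5:.3f}), "
      f"fraction dry = {np.mean(x == 0):.3f}")
```

Fitting such parameters to observed statistics is a separate step that this sketch does not attempt.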

MC models (Markov, 1906) have been used extensively in hydrology for topics ranging from stochastic reservoir and flood cascade theories (see Pegram, 1971) to the simulation of rainfall occurrence, that is, of wet/dry states. Gabriel and Neumann (1957, 1962) seem to have been the first to propose a first-order MC probability model for rainfall occurrence, with some properties derived by Gabriel (1959). In the first-order MC, the probability of rain on a given day is conditioned on the wet/dry state of the previous day. The lengths of the alternating wet and dry spells are independent and follow the Geometric distribution. Higher-order MC models, which condition wet-day probabilities on the states of several preceding days, can improve the wet/dry clustering; other variations include the hybrid-order (Stern & Coe, 1984) and the multi-state MC (Haan et al., 1976). Over the past few decades, several stochastic rainfall generators have coupled MC models with continuous probability distributions describing nonzero rainfall (e.g., the Exponential distribution in WGEN [Richardson, 1981], the Gamma in CLIGEN [Nicks & Gander, 1994], the Weibull in ClimGen [Stockle et al., 2001], and the Exponential and Gamma in WeaGETS [J. Chen et al., 2012]; see also Wilks and Wilby [1999] for a relevant review). Another modeling strategy for rainfall occurrence uses renewal processes (e.g., Foufoula-Georgiou & Lettenmaier, 1987; Jones et al., 1972; Quélennec, 1973), assuming independence between wet and dry spells. Distributions such as the logarithmic series, truncated negative binomial, and truncated geometric distributions were used to generate wet/dry spells (Wilks & Wilby, 1999), with other distributions describing nonzero rainfall.
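The first-order MC described above can be sketched in a few lines. The sketch also checks the claim that spell lengths are geometric: a wet spell continues with probability p11, so its mean length is 1/(1 − p11), and a dry spell ends with probability p01, so its mean length is 1/p01. The transition probabilities used here are hypothetical values chosen for illustration.

```python
import numpy as np

def markov_wet_dry(n, p01, p11, rng):
    """First-order two-state Markov chain (1 = wet, 0 = dry).

    p01 : P(wet | previous dry), p11 : P(wet | previous wet)
    """
    states = np.zeros(n, dtype=int)
    # Start from the stationary wet probability p01 / (p01 + 1 - p11)
    states[0] = rng.random() < p01 / (p01 + 1.0 - p11)
    u = rng.random(n)
    for t in range(1, n):
        p_wet = p11 if states[t - 1] == 1 else p01
        states[t] = u[t] < p_wet
    return states

def spell_lengths(states, value):
    """Lengths of the runs of consecutive states equal to value."""
    lengths, run = [], 0
    for s in states:
        if s == value:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return np.array(lengths)

rng = np.random.default_rng(7)
s = markov_wet_dry(200_000, p01=0.2, p11=0.6, rng=rng)
wet, dry = spell_lengths(s, 1), spell_lengths(s, 0)
# Geometric spells: mean wet = 1/(1 - p11) = 2.5, mean dry = 1/p01 = 5
print(f"P(wet) = {s.mean():.3f}, mean wet spell = {wet.mean():.2f}, "
      f"mean dry spell = {dry.mean():.2f}")
```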

Many researchers, circa the mid-1960s, identified the so-called fractal properties in nature; that is, an “object” can be subdivided into reduced-size copies of the whole in a cascade (Mandelbrot, 1982). Extensions to multifractal theory assume that fluctuations at a single scale inform fluctuations at other scales via scale invariance (Grassberger, 1983). The multifractal representation of rainfall has found applications, for example, in predicting rainfall extremes and constructing IDF curves (e.g., Langousis & Veneziano, 2007; Veneziano et al., 2006). In time series or random field simulation based on multifractals, scale invariance is reproduced using random multiplicative cascade processes (Lovejoy & Mandelbrot, 1985; Schertzer & Lovejoy, 1987). Multiplicative cascades, probably first introduced by Yaglom (1966), have been explored and applied in many later studies to disaggregate daily rainfall to finer temporal resolutions (e.g., Deidda, 2000; Gaume et al., 2007; Güntner et al., 2001; Menabde et al., 1997; Molnar & Burlando, 2005; Olsson, 1998; Perica & Foufoula-Georgiou, 1996; Rupp et al., 2009; Serinaldi, 2010). In the context of rainfall simulation, several advances and applications of multiplicative cascade models have been presented over the last decade (see, e.g., Aguilar-Flores et al., 2021; Akrour et al., 2015; Gires et al., 2013; Licznar et al., 2015; Lombardo et al., 2012; Müller & Haberlandt, 2015; Müller-Thomy, 2020; Paschalis et al., 2014; and references therein). Finally, several variations of multifractal models exist, such as pulse-based models, non-pulse-based models using wavelet decompositions, and non-pulse-based models using discrete or continuous multiplicative cascades (Flores, 2004).
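The multiplicative idea behind cascade disaggregation can be illustrated with a microcanonical binary cascade, which conserves mass exactly. In this sketch, each box splits into two halves, and the branching weights are drawn from a symmetric Beta distribution with a hypothetical parameter `a`. Practical cascade models typically also allow zero weights to generate intermittency. That feature is omitted here for brevity, so this is not any specific published model.

```python
import numpy as np

def microcanonical_cascade(total, levels, rng, a=2.0):
    """Disaggregate a coarse total with a random binary multiplicative cascade.

    At each level every box splits in two: the first half receives a fraction
    W ~ Beta(a, a) of the box's depth and the second half 1 - W, so the coarse
    total is conserved exactly (microcanonical cascade).
    """
    boxes = np.array([float(total)])
    for _ in range(levels):
        w = rng.beta(a, a, size=boxes.size)
        # Interleave the two children so that temporal order is preserved
        boxes = np.column_stack((boxes * w, boxes * (1.0 - w))).ravel()
    return boxes

rng = np.random.default_rng(0)
fine = microcanonical_cascade(24.0, levels=5, rng=rng)  # 2**5 = 32 sub-intervals
print(fine.round(2), fine.sum())
```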

The previous modeling strategies have dominated the literature for decades, yet other flexible approaches are gaining momentum. Their origins can be traced to time series generation by autoregressive moving average (ARMA) models in the works of Box and Jenkins (1970), Matalas (1967), Pegram and James (1972), Thomas and Fiering (1962), and others. Initially, ARMA models were applied to non-intermittent processes such as river flows (see also Salas, 1980), generating Gaussian time series with linear dependence. A more general approach, allowing simulations with various dependence structures beyond linear, is based on conditional sampling from copulas (e.g., Joe, 1997). This method was mainly applied in econometrics, yet its potential has been highlighted in hydrology for river flow simulation (Lee & Salas, 2011), while copula-based multisite models for daily rainfall have been suggested by Bárdossy and Pegram (2009) and Serinaldi (2009a, 2009b). Still, the most prominent approach is based on generating Gaussian time series and transforming them to match any desired marginal distribution. Such transformations, however, alter the Gaussian correlations (see Section 4 for details), and early approaches (e.g., Li & Hammond, 1975) relied on cumbersome numerical procedures focused on reproducing continuous marginals and short-term correlations. This strategy was extended and simplified in the CoSMoS framework (Papalexiou, 2018; see also an earlier approach in Papalexiou, 2010) by introducing simple parametric correlation transformation functions (CTFs). This allowed rainfall generation reproducing the probability of dry, any marginal distribution describing nonzero rainfall, and the whole correlation structure of the intermittent process.
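Conditional sampling from a copula, mentioned above, can be sketched as follows. For illustration only, the sketch assumes a first-order Markov series in which consecutive values are linked by a Clayton copula with a hypothetical parameter theta = 2. The Clayton conditional cdf has a closed-form inverse, so each new uniform value follows from the previous one and an independent uniform draw. A Gamma marginal with hypothetical parameters is then imposed through its quantile function. This step is monotone, so rank measures such as Kendall's tau are unaffected; for the Clayton copula, tau equals theta/(theta + 2).

```python
import numpy as np
from scipy import stats

def clayton_markov(n, theta, rng):
    """Uniform Markov series whose lag-1 dependence is a Clayton copula."""
    u = np.empty(n)
    u[0] = rng.random()
    w = rng.random(n)
    for t in range(1, n):
        # Inverse of the Clayton conditional cdf C(v | u) evaluated at w[t]
        u[t] = (u[t - 1] ** (-theta) * (w[t] ** (-theta / (1.0 + theta)) - 1.0)
                + 1.0) ** (-1.0 / theta)
    return u

rng = np.random.default_rng(3)
u = clayton_markov(20_000, theta=2.0, rng=rng)
x = stats.gamma(a=0.8, scale=5.0).ppf(u)   # any continuous marginal can be used
tau, _ = stats.kendalltau(x[:-1], x[1:])
print(f"lag-1 Kendall tau = {tau:.3f} (theory: {2.0 / (2.0 + 2.0):.3f})")
```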
In general, one-state rainfall generation based on transformations of Gaussian variables has been applied many times in hydrology, typically for space-time modeling, with zero rainfall corresponding to Gaussian values below a threshold; see, for example, Bardossy and Plate (1992), Bell (1987), and Glasbey and Nevison (1997), to mention just a few early works, since the literature on space-time modeling is vast and outside the scope of this study.
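The threshold idea described above can be sketched in Python. The sketch assumes a Gaussian AR(1) parent process, a hypothetical probability of dry p0 = 0.85, and a Weibull marginal for nonzero values; all are illustrative choices, not values fitted to data. No correlation transformation function is applied, so the sketch also shows why one is needed: the lag-1 correlation of the intermittent output is lower than that of the Gaussian parent.

```python
import numpy as np
from scipy import stats

def gaussian_ar1(n, rho, rng):
    """Standard Gaussian AR(1) series with lag-1 correlation rho."""
    z = np.empty(n)
    z[0] = rng.standard_normal()
    innovations = rng.standard_normal(n) * np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):
        z[t] = rho * z[t - 1] + innovations[t]
    return z

def to_intermittent(z, p0, nonzero):
    """Map Gaussian values to a mixed-type marginal with probability dry p0.

    Values with Phi(z) <= p0 become zero; the rest pass through the quantile
    function of the nonzero distribution, after rescaling to (0, 1).
    """
    u = stats.norm.cdf(z)
    x = np.zeros_like(u)
    wet = u > p0
    x[wet] = nonzero.ppf((u[wet] - p0) / (1.0 - p0))
    return x

rng = np.random.default_rng(42)
z = gaussian_ar1(100_000, rho=0.8, rng=rng)
x = to_intermittent(z, p0=0.85, nonzero=stats.weibull_min(c=0.7, scale=1.5))

p_dry = np.mean(x == 0.0)
r_parent = np.corrcoef(z[:-1], z[1:])[0, 1]
r_rain = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"P(dry) = {p_dry:.3f}, lag-1 parent = {r_parent:.2f}, lag-1 rainfall = {r_rain:.2f}")
```

In a CoSMoS-type generator, the parent correlation would instead be chosen, for example through a CTF, so that the output matches a target correlation. This sketch leaves that step out.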

To improve the simulation of statistical characteristics at multiple time scales, many of the previous models were coupled with disaggregation schemes. Such schemes, initially applied to river flows, were based on the ideas of Harms and Campbell (1967) and Matalas (1967) and were followed by the works of Stedinger and Vogel (1984), Valencia and Schaake (1972, 1973), and many others. Later, disaggregation was extended to rainfall, recognizing the challenge of preserving intermittency and multiscale characteristics. The target was to simulate rainfall sequences at a fine scale (e.g., hourly) conditioned on rainfall totals at a coarser one (e.g., daily). Many rainfall disaggregation schemes were based on the Glasbey et al. (1995) framework, which uses conditional simulation from a point-process model. They suggested simulating long rainfall sequences at the desired fine scale and selecting sub-sequences matching coarse-scale totals or, alternatively, generating sequences iteratively until the coarse-scale total is achieved; many variations and extensions followed (e.g., Connolly et al., 1998; Cowpertwait et al., 1996). Recently, the DiPMaC scheme (Papalexiou, Markonis, et al., 2018) enabled disaggregation that explicitly reproduces the probability distributions and correlations at a fine scale, also allowing non-stationary disaggregation that reproduces time-varying properties. For a more detailed historical evolution of the disaggregation literature, see Papalexiou, Markonis, et al. (2018).
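The iterative variant described above can be sketched as follows. The hourly generator is a hypothetical toy (independent wet hours with exponential depths) that stands in for a point-process model. The final proportional rescaling, which makes the hourly values sum exactly to the daily total, is an assumption of this sketch rather than part of the framework cited above.

```python
import numpy as np

def disaggregate_by_repetition(daily_total, generate_hourly, rng,
                               rel_tol=0.1, max_tries=10_000):
    """Regenerate hourly sequences until their sum is within rel_tol of the
    daily total, then rescale proportionally so that it matches exactly."""
    if daily_total == 0:
        return np.zeros_like(generate_hourly(rng))
    best, best_err = None, np.inf
    for _ in range(max_tries):
        hourly = generate_hourly(rng)
        err = abs(hourly.sum() - daily_total)
        if hourly.sum() > 0 and err < best_err:  # keep the closest wet candidate
            best, best_err = hourly, err
        if best_err <= rel_tol * daily_total:
            break
    if best is None:
        raise RuntimeError("no wet hourly sequence was generated")
    return best * (daily_total / best.sum())

def toy_hourly_generator(rng, p_wet=0.3, mean_depth=1.5):
    """Hypothetical stand-in generator: independent wet hours with
    exponentially distributed depths (mm)."""
    wet = rng.random(24) < p_wet
    return np.where(wet, rng.exponential(mean_depth, 24), 0.0)

rng = np.random.default_rng(5)
hourly = disaggregate_by_repetition(12.0, toy_hourly_generator, rng)
print(hourly.round(2), hourly.sum())
```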

This study aims to explore stochastic modeling strategies for intermittent processes and to build a rainfall model that is easily applicable at a single scale and reproduces rainfall characteristics at multiple scales. Toward this aim, several novelties are introduced and merged into the CoSMoS-2s model, which shows promising results in reproducing wet/dry spells and rainfall characteristics across a large range of temporal scales.

The paper does not follow the typical structure but is rather based on the “natural” steps that build the CoSMoS models. The major components of these models are: (a) the marginal distribution and (b) the autocorrelation structure (ACS), or dependence structure, that describe the process under investigation, and (c) the characteristics of a parent process to be transformed (typically Gaussian, although it can have other types of dependence). To help the reader navigate the outline: Section 2 introduces a new system of flexible marginal distributions; Section 3 explains the use of ACSs and their link with autoregressive (AR) models; Section 4 focuses on correlation transformations and extends previous works to negative correlations; Section 5 introduces the theoretical framework of the two-state intermittent model (CoSMoS-2s) that couples a binary and a continuous process; Section 6 introduces an analytical approach to simulate binary time series that facilitates the operational use of CoSMoS-2s; Section 7 offers guidelines for different modeling strategies to simulate intermittent processes, including the one-state approach (CoSMoS-1s) and variations of the CoSMoS-2s, among them new analytical rank-based and copula-based alternatives; Section 8 tests the performance of CoSMoS-2s in simulating hourly rainfall and its potential to preserve statistical properties across a large range of time scales; Section 9 offers a discussion and conceptual generalizations for rainfall modeling, with specific ideas for future improvements and extensions toward building the “ultimate” rainfall model; and Section 10 summarizes and concludes the paper.

2 New System of Probability Distributions

2.1 Probabilistic Behavior of Rainfall

The statistical characteristics of rainfall and of other hydroclimatic processes typically vary depending on: (a) the temporal and spatial scale, (b) the season, (c) the location, and (d) the period under investigation in non-stationary cases. For example, rainfall at sub-monthly scales (daily, hourly, etc.) is intermittent, having a high probability of dry (probability mass at zero) and continuous, positively skewed marginals describing nonzero values. This can also be conceptualized as a mixed-type (zero-inflated) marginal distribution. Notably, intermittency at fine temporal scales should be analyzed with caution, as rainfall measurement methods (e.g., tipping buckets) can affect the results (Mascaro et al., 2013). At larger time scales, such as annual or interannual, and in most places of the world, rainfall can be described by continuous marginals, typically bell-shaped due to the central limit theorem. The same holds for rainfall at different spatial scales: fine-scale characteristics (e.g., of point measurements) differ from those at large scales (e.g., at 1° × 1° grids), which is why concepts such as the areal reduction factor are so popular in hydrology (e.g., Wright et al., 2014). Seasons typically affect rainfall greatly; for example, Mediterranean regions have dry summers and wet winters (see Papalexiou & Koutsoyiannis, 2016 for a global analysis of the seasonal variation of daily rainfall). Seasonal patterns change with location too; for example, the tropics have low seasonal variability, while high-latitude arctic and subarctic regions have distinct seasons. Location, however, may alter rainfall characteristics even within the same latitude band due to regional weather patterns, the presence of mountains, etc. For example, rainfall climatology differs across the United States, as the eastern parts receive more rainfall than the western ones.
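The central limit theorem argument above can be illustrated with a quick simulation. The sketch assumes hypothetical independent daily values, with a probability of dry of 0.7 and Gamma-distributed nonzero depths; both are illustrative choices. Aggregating 365 days removes the probability mass at zero and pulls the skewness toward zero. Real daily rainfall is autocorrelated and seasonal, so the convergence may be slower than in this idealized case.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
years, p0 = 2_000, 0.7
# Hypothetical daily rainfall: dry with probability p0, otherwise Gamma(0.7, 8 mm)
wet = rng.random((years, 365)) > p0
daily = np.where(wet, rng.gamma(0.7, 8.0, size=(years, 365)), 0.0)
annual = daily.sum(axis=1)

skew_daily = stats.skew(daily.ravel())
skew_annual = stats.skew(annual)
print(f"daily:  P(dry) = {np.mean(daily == 0):.2f}, skewness = {skew_daily:.2f}")
print(f"annual: P(dry) = {np.mean(annual == 0):.2f}, skewness = {skew_annual:.2f}")
```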
The behavior of extremes also varies with location, as the tail heaviness shows clear regional patterns (e.g., Nerantzaki & Papalexiou, 2019; Papalexiou, AghaKouchak, & Foufoula-Georgiou, 2018). Finally, if nonstationarity is assumed, then time-varying distributions should be considered, yet nonstationarity (e.g., Serinaldi & Kilsby, 2015) should be used with caution. Clearly, stationarity should also be used with caution, since describing a nonstationary process with a stationary model can lead to underestimating the risk of extremes. While the natural variability of rainfall is high and potentially masks deterministic changes in many locations, many recent large-scale or global studies indicate changes in the rainfall regime (e.g., Alexander, 2016; Barbero et al., 2017; Markonis et al., 2019; Moustakis et al., 2021; Papalexiou & Montanari, 2019; Prein et al., 2017; Ye et al., 2017).

The previous arguments dictate that probability distributions suitable for describing rainfall across a large range of spatiotemporal scales, seasons, and regions must have desired properties such as: (a) Consistent domain: the support must be (0, ∞), since nonzero rainfall is positive. This excludes distributions with a location parameter, as it imposes a lower (or upper) bound different from zero. (b) Flexibility: the probability density function (pdf) of nonzero rainfall might be J-shaped (e.g., at subdaily scales) or bell-shaped (e.g., at annual scales), with a thin or heavy right tail. (c) Parsimony: the most parsimonious distributions that are flexible enough to meet the previous demands have three parameters, that is, a scale parameter and two shape parameters controlling the left and right tails. Two-parameter simplifications should be preferred where they are adequate. For example, the Exponential, Gamma, Lognormal, and Weibull distributions have been popular choices; however, the Exponential has a fixed shape, the Lognormal allows control over the right tail but not the left, the Gamma has J- and bell-shaped densities but always a thin tail, and the Weibull has stretched-exponential tails for J-shaped densities and hyper-exponential tails for bell-shaped densities. These limitations prevent such distributions from performing well across a wide range of spatial and temporal scales.
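The tail classes mentioned above can be checked numerically through −ln S(x)/x, where S is the survival function. This ratio levels off at a positive constant for an exponential-type tail (Gamma). It decreases toward zero for a stretched-exponential, heavier tail (Weibull with shape below 1, J-shaped density), and it grows without bound for a hyper-exponential, thinner tail (Weibull with shape above 1, bell-shaped density). The sketch uses scipy.stats with unit scale and illustrative shape values.

```python
import numpy as np
from scipy import stats

x = np.array([10.0, 30.0, 100.0])
dists = {
    "Gamma, shape 0.5": stats.gamma(a=0.5),          # exponential-type tail
    "Weibull, shape 0.7": stats.weibull_min(c=0.7),  # J-shaped density
    "Weibull, shape 1.5": stats.weibull_min(c=1.5),  # bell-shaped density
}
# -ln S(x) / x tends to a positive constant for an exponential-type tail,
# decreases toward 0 for a stretched-exponential (heavier) tail, and grows
# without bound for a hyper-exponential (thinner) tail.
ratios = {name: -d.logsf(x) / x for name, d in dists.items()}
for name, r in ratios.items():
    print(f"{name:20s}", np.round(r, 3))
```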

2.2 The Generalized Exponential (GE) System

Known three-parameter distributions with the previous three properties exist, yet their use has been limited, with most studies, as previously mentioned, focusing on the Exponential, Gamma, Lognormal, and Weibull distributions. Possible reasons are: (a) complex expressions are less favored than simpler two- or three-parameter models that include a location parameter; (b) fitting challenges; (c) complicated or non-analytical expressions for moments and L-moments; and (d) the need for simple distributions in stochastic models to facilitate their mathematical formulation. Yet advances in stochastic modeling (Papalexiou, 2018; Papalexiou & Serinaldi, 2020; Papalexiou, Serinaldi, & Porcu, 2021) allow rainfall modeling with any desired distribution and correlation structure. Also, global studies (Papalexiou & Koutsoyiannis, 2012, 2016) analyzing thousands of records indicate that such distributions effectively describe nonzero rainfall. In particular, the Generalized Gamma (GG; Equation A1 in Appendix A) introduced by Stacy (1962) and the Burr type XII (BrXII; Equation A2) from the Burr (1942) system have been reparametrized and used extensively for rainfall (Papalexiou & Koutsoyiannis, 2012, 2016). The Burr type III (BrIII; Equation A3) has also been suggested (Papalexiou, 2018), while a generalization of the Beta of the second kind (Equation A4) and the Generalized Standard Gompertz (Equation A5) were introduced in Papalexiou and Serinaldi (2020) to describe rainfall in random fields.
The BrXII, the BrIII, and the generalization of the Beta of the second kind are power-type distributions and can have very heavy tails. The GG and the Generalized Standard Gompertz are of exponential and double-exponential form, respectively, and their tails can be heavier (stretched exponential) or thinner (hyper-exponential) than the tail of the Exponential distribution.

In general, the GG distribution describes rainfall well; however, (a) its cumulative distribution function (cdf) and quantile function are not analytical (they involve the incomplete gamma function and its inverse), which slows down quantile transformations in time series generation, and (b) its shape parameters do not converge to meaningful values when it is fitted to heavy-tailed rainfall. Power-type distributions can describe heavy rainfall, yet some shape parameter values correspond to distributions with infinite variance. This excludes time series generation based on the CoSMoS framework, since the ACS cannot be defined. Thus, to better describe rainfall and further advance stochastic modeling, we need alternative exponential-form distributions with simple, analytical cdfs and finite moments (to guarantee that the ACS exists) that are more versatile than the GG.

This motivated the formulation of a general framework to build valid probability distributions. The idea is applied here to introduce a new distribution system, the Generalized Exponential (GE) system, comprising six distributions (GE1 to GE6); their cdfs are:

The GE pdfs are, in general, J-shaped or bell-shaped depending on the value of the first (left-tail) shape parameter, while the second shape parameter mainly controls the right-tail heaviness (see Figures 1a and 1b for pdf shapes). Exceptions are two members of the system in which both shape parameters can affect the density shape. These expressions follow a consistent notation: one positive scale parameter and two positive shape parameters controlling mainly the left and right tail, respectively (see notation in Appendix A).


Figure 1. Flexibility of the GE distributions. Probability density functions of one GE distribution for (a) one shape parameter varying while the other is fixed, and (b) vice versa; (c) example of exceedance probability functions of the six GE distributions fitted to the same first three L-moments, demonstrating different asymptotic tail behavior.

The framework creating these distributions starts with a valid distribution function F1(x) defined in [0, ∞) and replaces its argument x with a continuous, increasing function of x that is zero at x = 0 and tends to infinity as x → ∞; these conditions guarantee that the resulting composite function is also a valid distribution function in [0, ∞). Such increasing functions can also be formed from distribution functions defined in [0, ∞): an appropriate transformation of the survival function (sf) of a second distribution F2(x) has the required properties. Thus, the composition of F1(x) with such a function of F2(x), with F1(x) and F2(x) being valid distribution functions in [0, ∞), defines a valid distribution function in [0, ∞).
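One function with the required properties is the cumulative hazard of F2, −ln S2(x): it is zero at the origin, increasing, and unbounded. The sketch below uses it, with a unit-scale Weibull as F1 and a Pareto II as F2, purely to illustrate the composition idea described above. The parameter names `beta`, `gamma1`, and `gamma2` are hypothetical, and the resulting distribution is not claimed to be one of Equations 1–6. Both the cdf and its inverse come out in closed form, and as `gamma2` → 0 the cdf approaches a Weibull cdf.

```python
import numpy as np

def composite_cdf(x, beta, gamma1, gamma2):
    """F(x) = F1(g(x)): F1 is a unit-scale Weibull cdf with shape gamma1 and
    g(x) = -ln S2(x), with S2 the Pareto II survival function
    S2(x) = (1 + gamma2 * x / beta) ** (-1 / gamma2)."""
    g = np.log1p(gamma2 * np.asarray(x, dtype=float) / beta) / gamma2
    return -np.expm1(-g ** gamma1)          # 1 - exp(-g**gamma1)

def composite_quantile(u, beta, gamma1, gamma2):
    """Closed-form inverse: first invert F1, then invert g."""
    y = (-np.log1p(-np.asarray(u, dtype=float))) ** (1.0 / gamma1)
    return beta * np.expm1(gamma2 * y) / gamma2

params = dict(beta=2.0, gamma1=0.8, gamma2=0.3)
x = np.linspace(0.0, 50.0, 501)
F = composite_cdf(x, **params)
u = np.array([0.1, 0.5, 0.9, 0.999])
q = composite_quantile(u, **params)
print(F[0], np.all(np.diff(F) >= 0), np.allclose(composite_cdf(q, **params), u))

# As gamma2 -> 0, g(x) -> x / beta and F approaches the Weibull cdf
weibull = 1.0 - np.exp(-(x / 2.0) ** 0.8)
print(np.max(np.abs(composite_cdf(x, 2.0, 0.8, 1e-8) - weibull)))
```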

For example, one member of the system emerges by combining the Exponential distribution (Equation A6) with the BrXII distribution (Equation A2); it inherits the parameters of the BrXII, having one scale and two shape parameters. Another member combines the Pareto II distribution (Equation A7) with the Weibull distribution (Equation A8). Information on how each GE distribution emerges is given in Table 1. Here, we formed a system of powered-exponential distributions with analytical, invertible cdfs, J- and bell-shaped densities, and control over the right-tail heaviness. For particular values of their shape parameters, all GE distributions simplify to the Exponential distribution and to the Weibull distribution. This does not imply that all GE distributions have the same asymptotic behavior when fitted to real data.
For example, if we estimate the parameters of these distributions so that they have the same first three L-moments (i.e., the same first L-moment, L-variation, and L-skewness), their predictions for low exceedance probabilities differ (Figure 1c), indicating that some GE distributions have heavier tails than others.

Table 1. Combinations of the F1 and F2 Functions Forming the Six Distributions of the GE System (Equations 1–6) Emerging From the Function urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0088
GE type | F1(x) | F2(x)
GE1 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0091 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0092
GE2 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0094 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0095
GE3 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0097 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0098
GE4 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0100 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0101
GE5 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0103 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0104
GE6 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0106 | urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0107

One could name these distributions after the distributions used as F1 and F2, yet since they were designed to generalize the Exponential distribution, they were named accordingly. The GE system is not exhausted here, and more GE distributions can be formed. In general, this framework can create infinitely many distributions beyond the GE type, such as power-type distributions or distributions with a different support, for example, defined in (0, 1). Also note that these distributions are continuous and suited to describe nonzero rainfall. Yet we can conceptualize rainfall as a one-state process with mixed-type marginals, and this conceptualization can facilitate its stochastic modeling (Papalexiou, 2018). The mixed-type expressions of the cdf, quantile functions, and moment-related quantities are given in Equations A9–A15 in Appendix A.
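For reference, the standard form of such mixed-type expressions can be written as follows. The notation here is our own and may differ from that of Equations A9–A15: p0 denotes the probability of dry, and X⁺ denotes nonzero rainfall with cdf F_{X⁺} and quantile function Q_{X⁺}.

```latex
F_X(x) = p_0 + (1 - p_0)\, F_{X^+}(x), \qquad x \ge 0

Q_X(u) =
\begin{cases}
0, & 0 \le u \le p_0, \\
Q_{X^+}\!\left(\dfrac{u - p_0}{1 - p_0}\right), & p_0 < u < 1,
\end{cases}

\mathrm{E}\!\left[X^q\right] = (1 - p_0)\, \mathrm{E}\!\left[(X^+)^q\right], \qquad q > 0

\mathrm{Var}[X] = (1 - p_0)\, \mathrm{E}\!\left[(X^+)^2\right] - (1 - p_0)^2\, \mathrm{E}\!\left[X^+\right]^2
```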

2.3 Remarks on Applications

The GE system offers flexible distributions that aim to describe the whole sample (main body and tail) of skewed variables such as rainfall. This is crucial in stochastic modeling, since synthetic time series are used as inputs to many models applied in risk assessment, streamflow prediction, water resources management, crop production, energy production and consumption, building resilience, etc. To produce reliable outputs, these models need reliable inputs that reproduce the marginal and joint properties of observations, since the frequencies of all quantiles can affect the system's response; as the saying goes: garbage in, garbage out. For example, if we feed a hydrologic model with two time series whose distributions share the same tail but have different densities, the outcomes will differ.

A point of caution concerns the fitting of such distributions, as with great flexibility comes great responsibility. There are more than 20 distribution fitting methods (for a detailed review see Nerantzaki & Papalexiou, 2022); some are easily applicable, but this does not imply they should be preferred. All distributions of the GE system have analytical pdf's and analytical, invertible cdf's. Thus, generic methods such as maximum likelihood and least squares estimation can be implemented numerically. The method of moments is probably the most popular one, but it should be used with caution, especially when higher-order moments need to be estimated, since there is large uncertainty in estimating such moments from skewed samples. An alternative and more robust method is that of L-moments (Greenwood et al., 1979; Hosking, 1990; Sillitto, 1951), which has gained popularity over the last decades and has been extensively applied in hydrology (e.g., Royston, 1992; Vogel & Fennessey, 1993). While the GE system does not have analytical L-moments, the method can be applied numerically (for details see Zaghloul et al., 2020).
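As a rough illustration of the L-moments route, the following sketch computes sample L-moments from unbiased probability-weighted moments, in the standard form described by Hosking (1990). The function name `sample_lmoments` is hypothetical, and fitting a GE distribution would further require matching these sample values to numerically computed population L-moments, which is not shown here.

```python
import random


def sample_lmoments(data):
    """Return (l1, l2, tau3, tau4) from unbiased probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    for j, xj in enumerate(x):  # j = rank - 1 (0-based position in sorted sample)
        b0 += xj
        b1 += xj * j / (n - 1)
        b2 += xj * j * (j - 1) / ((n - 1) * (n - 2))
        b3 += xj * j * (j - 1) * (j - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0                                   # L-location (mean)
    l2 = 2 * b1 - b0                          # L-scale
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2           # tau3 = L-skewness, tau4 = L-kurtosis


if __name__ == "__main__":
    random.seed(1)
    sample = [random.expovariate(1.0) for _ in range(100_000)]
    l1, l2, t3, t4 = sample_lmoments(sample)
    print(round(l1, 3), round(l2, 3), round(t3, 3), round(t4, 3))
```

For an Exponential(1) parent the population values are λ1 = 1, λ2 = 1/2, τ3 = 1/3, and τ4 = 1/6, so the printed estimates should lie close to them for a long sample.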

Another point is whether such three-parameter distributions reproduce the behavior of extremes as well as classical methods do. Two methods dominate the analysis of extremes: block maxima, typically coupled with the Generalized Extreme Value (GEV) distribution, and peaks over threshold (POT), typically coupled with the Generalized Pareto (GP) distribution; yet an alternative approach can use the whole sample (see e.g., Salas et al., 2020). Here, we performed a toy-model Monte Carlo experiment comparing return-level estimates from different methods, which does not confirm a general superiority of the classical methods (see Section S1 and Figure S1 in Supporting Information S1). Samples from three popular distributions (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0119, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0120, and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0121) are generated, and the 100 and 500 yr return levels are estimated by fitting the GEV to block maxima, the GP to POT values, and the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0124 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0125 to the whole sample. The GEV and GP estimates are almost unbiased in all cases but have very large variance. In contrast, the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0128 or urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0129 estimates show very low variance but can be biased depending on the distribution that generated the sample. For example, for thin-tailed urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0130 samples the heavy-tailed urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0131 overestimated return levels, while for urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0132 samples (heavy tail) the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0133 (thinner tail than urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0134) underestimated return levels. This toy-model experiment shows that if the fitted distribution is consistent with the underlying tail, then it can describe extremes accurately.
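The return-level calculations behind such comparisons can be sketched with the textbook quantile formulas of the GEV and GP, plus a whole-sample alternative. This is not the experiment of Section S1: the function names are hypothetical, a positive shape parameter denotes a heavy tail, and the whole-sample function assumes n independent values per year, so that the annual-maximum cdf is F raised to the power n (an assumption that ignores dependence and intermittency).

```python
import math


def gev_return_level(T, loc, scale, shape):
    """T-year return level of a GEV fitted to annual maxima (shape > 0: heavy tail)."""
    y = -math.log(1.0 - 1.0 / T)              # -ln of the non-exceedance probability
    if abs(shape) < 1e-12:                    # Gumbel limit
        return loc - scale * math.log(y)
    return loc + scale / shape * (y ** (-shape) - 1.0)


def gp_return_level(T, threshold, scale, shape, rate):
    """T-year return level from a GP fitted to POT excesses; rate = exceedances per year."""
    m = rate * T                              # expected number of exceedances in T years
    if abs(shape) < 1e-12:                    # Exponential limit
        return threshold + scale * math.log(m)
    return threshold + scale / shape * (m ** shape - 1.0)


def whole_sample_return_level(T, quantile, n_per_year):
    """Return level from a distribution fitted to all values, assuming n_per_year
    independent values per year so that the annual-maximum cdf is F**n_per_year."""
    p = (1.0 - 1.0 / T) ** (1.0 / n_per_year)
    return quantile(p)


if __name__ == "__main__":
    # Illustrative Exponential(1) parent with 365 independent values per year:
    # annual maxima are close to Gumbel(ln 365, 1), and excesses over any
    # threshold u are Exponential(1) with rate 365 * exp(-u) per year.
    n, u = 365, 3.0
    exp_quantile = lambda p: -math.log1p(-p)
    for T in (100, 500):
        print(T,
              round(gev_return_level(T, math.log(n), 1.0, 0.0), 3),
              round(gp_return_level(T, u, 1.0, 0.0, n * math.exp(-u)), 3),
              round(whole_sample_return_level(T, exp_quantile, n), 3))
```

With this Exponential parent the three routes give nearly identical levels (about ln(365T)), which is the kind of agreement expected when each fitted model matches the true tail.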

In general, identifying the tail type or quantifying tail heaviness is not trivial, and many methods have been developed and tested (see e.g., El Adlouni et al., 2008; Embrechts et al., 1997; Langousis et al., 2016; Nerantzaki & Papalexiou, 2019; Serinaldi, 2013; Smith, 1987; Wietzke et al., 2020). Tail-type identification or tail-heaviness estimates can be more robust if informed by global or regional studies (e.g., Papalexiou et al., 2013; Rajulapati et al., 2020; Serinaldi & Kilsby, 2014). For example, Papalexiou, AghaKouchak, and Foufoula-Georgiou (2018) used regional tail estimates to fix the tail parameter in the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0135 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0136 distributions before fitting them to the whole sample; this approach was used to assess hourly precipitation depths at large return periods. Additionally, many recent studies have used ordinary events (most of the sample) to assess extremes, reporting better performance at high return periods (e.g., Marani & Ignaccolo, 2015; Marra et al., 2018; Zorzetto et al., 2016). Finally, the block maxima and POT methods are not free of limitations. For example, convergence to these limiting laws is not guaranteed, and in many cases the estimated parameters indicate that extremes have an upper bound, which might lead to underestimating risk (e.g., Moccia et al., 2021).

3 Autocorrelation Structures (ACS)

A stochastic process, in simple terms, is a collection of random variables (rv's) typically associated with each other. If urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0137 is such a collection of rv's in time, with urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0138 being an index set (e.g., urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0139), then any form of association among the rv's, loosely speaking, implies that information on some rv's (e.g., on their values) can provide information on others. This underlines the importance of this association. However, quantifying association is not easy, and there has been more than a century of research on different association measures (see Särndal, 1974, for an early comparative study). Probably the most popular association measure is the Pearson correlation coefficient $\rho$, quantifying linear dependence. If rv's are associated nonlinearly, then quantifying correlations using $\rho$ can be misleading (see e.g., Altman & Krzywinski, 2015). These issues led to different correlation measures. For instance, rank-based measures were introduced by Spearman (1904) and Kendall (1938) (suitable for monotonic, not only linear, dependencies), and later Linfoot (1957) formed the informational coefficient of correlation, an entropy-based measure that generalizes the Pearson correlation and is invariant under one-to-one transformations of the rv's.

A stochastic process exploits the associations among the rv's representing the states of a natural process (or system) to model its variability in time and space. Nature “connects things” in time (and space), and the dependence structure among the rv's governs the temporal (and spatial) dynamics, leading to features crucial for risk analysis, such as the clustering of high (or low) values. This also implies that future states of a process depend on its current and past states. This, probably, led Thomas and Fiering (1962) to introduce autoregressive (AR) models in hydrology; these were later extended and extensively explored by Box and Jenkins (1970) and many others. AR models intuitively express the link between current and past states. Indeed, the formula of a univariate AR model of order $p$ is

$$X_t = \sum_{i=1}^{p} a_i X_{t-i} + \varepsilon_t,$$

stating that the process at time $t$ depends on the previous $p$ states weighted by constant parameters $a_i$, plus a random disturbance $\varepsilon_t$ expressing unpredictability or our ignorance (random noise).
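A minimal sketch of the AR(p) recursion follows, assuming zero-mean Gaussian white noise and zero initial states that are discarded through a burn-in period; these choices, and the names `simulate_ar` and `sample_acf`, are illustrative rather than taken from the article.

```python
import random


def simulate_ar(a, n, sigma_eps=1.0, burn_in=1000, seed=None):
    """Simulate X_t = a_1 X_{t-1} + ... + a_p X_{t-p} + eps_t with Gaussian noise."""
    rng = random.Random(seed)
    p = len(a)
    x = [0.0] * p                              # zero initial states, discarded below
    for _ in range(burn_in + n):
        # a[i] weights X_{t-1-i}, which is x[-1 - i] before appending X_t
        x.append(sum(a[i] * x[-1 - i] for i in range(p)) + rng.gauss(0.0, sigma_eps))
    return x[p + burn_in:]


def sample_acf(x, lag):
    """Sample lag-k autocorrelation, normalised by the lag-0 sum of squares."""
    m = sum(x) / len(x)
    den = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag))
    return num / den


if __name__ == "__main__":
    x = simulate_ar([0.5, 0.3], 50_000, seed=1)
    # Theoretical AR(2) values: rho_1 = 0.5 / 0.7 ≈ 0.714, rho_2 = 0.5 * rho_1 + 0.3 ≈ 0.657
    print(round(sample_acf(x, 1), 3), round(sample_acf(x, 2), 3))
```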

A temporal ACS is a parametric mathematical formula describing parsimoniously how the rv's of a stochastic process are correlated with each other in time. For example, the ACS can describe linear dependence, as expressed by the lag-$\tau$ Pearson correlation coefficient $\rho_\tau$, for all possible lags in a stationary process. This is convenient since a parametric ACS $\rho(\tau; \boldsymbol{\theta})$ (where $\boldsymbol{\theta}$ is a parameter vector) can be reproduced by AR models up to any desired lag. The model's parameters $\mathbf{a} = (a_1, \ldots, a_p)^\top$ reproducing a desired positive definite ACS are given by $\mathbf{a} = \mathbf{P}^{-1}\boldsymbol{\rho}$, where $\boldsymbol{\rho} = (\rho_1, \ldots, \rho_p)^\top$ and $\mathbf{P}$ is a $p \times p$ matrix with elements $P_{ij} = \rho_{|i-j|}$ (e.g., Box et al., 2008). The notion of the ACS should not be limited to expressing linear dependence, since we could also use it to express rank correlations (or any other correlation measure) at different lags, provided that a model can reproduce such ACS's. For instance, we could reproduce a rank-based ACS expressing the decay of Spearman's rho with time using Gaussian or other multidimensional copulas (e.g., urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0159-copula).
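The Yule-Walker relation above amounts to solving a small linear system built from the target ACS. The sketch below does this with plain Gaussian elimination; `ar_coefficients_from_acs`, `noise_variance`, and the powered-exponential example with b = 2 and c = 0.5 are hypothetical choices, and the noise variance 1 − Σ a_i ρ_i (for a unit-variance process) is the standard AR result rather than something stated in the text.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ar_coefficients_from_acs(acs, p):
    """Yule-Walker: AR(p) weights reproducing acs(1), ..., acs(p), with acs(0) = 1."""
    P = [[acs(abs(i - j)) for j in range(p)] for i in range(p)]
    rho = [acs(k) for k in range(1, p + 1)]
    return solve_linear(P, rho)


def noise_variance(acs, a):
    """Noise variance that keeps the AR process at unit variance."""
    return 1.0 - sum(ai * acs(i + 1) for i, ai in enumerate(a))


if __name__ == "__main__":
    markov = lambda tau: 0.8 ** tau                       # Markovian target
    print([round(v, 6) for v in ar_coefficients_from_acs(markov, 3)])
    powexp = lambda tau: math.exp(-((tau / 2.0) ** 0.5))  # slower-decaying target
    a = ar_coefficients_from_acs(powexp, 10)
    print([round(v, 3) for v in a], round(noise_variance(powexp, a), 3))
```

For the Markovian target the weights beyond lag 1 vanish; for the slower-decaying target the weights are generally spread over all ten lags, which is how a sufficiently high-order AR model reproduces a parametric ACS up to the desired lag.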

Reproducing an observed (empirical) ACS expressing linear or rank-based dependencies should not be the target per se. The target is mimicking a natural process as precisely as possible and there is no guarantee that simple correlation measures can express complex dependence patterns. For instance, we can construct copulas having the same Pearson correlation and marginals but different dependence structures. Yet reproducing linear dependencies to model natural processes seems to provide a useful approximation of “reality”.

The number of parametric ACS's in the literature is large, and many date back to the 1960s and 1970s (see e.g., the list of ACS's given by Buell, 1972). These early works were followed by efforts to define the theoretical requirements of valid ACS's, such as positive definiteness (e.g., Franke et al., 1988; Julian & Thiebaux, 1975). Later, in the 1990s, as Gneiting (1999) remarks, advances in global analysis systems led to a quest for flexible parametric ACS's; Gneiting also points to the work of Gaspari and Cohn (1999) as a detailed mathematical treatise on correlation functions.

In practice, one could test many parametric ACS's for rainfall and choose the one that best fits the empirical ACS. This, however, is fruitless given the many existing ACS's and those that can be formed. For hydroclimatic processes, testing a few ACS's covering different types of asymptotic behavior should be sufficient. Here, we recall three ACS's suggested in Papalexiou (2018) as cases of a general framework that uses survival functions (sf's) as ACS's. The Weibull (W) ACS, Pareto II (PII) ACS, and Generalized Logarithmic (GL) ACS are given, respectively, by:

These ACS's have a scale parameter urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0163 and a shape parameter urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0164 (for the W ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0165). The asymptotic behavior, that is, the rate at which the ACS approaches zero for increasing lags, is controlled by the parameter urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0166 and affects the long-term temporal dynamics. The asymptotic behavior, demonstrated in a log-log plot with the lag-1 correlation fixed to the same value (Figure 2), shows that the W ACS is concave (powered exponential), the PII is asymptotically a straight line (power type), and the GL (for large urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0167 values) is convex (slower decay than power type). The PII and GL for urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0168 simplify to the Markovian ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0169. These ACS's can be generalized by adding a third parameter; for example, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0170 can be replaced by urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0171 in the PII and GL, or a survival function of the GE distribution system can be used instead of the W ACS. Yet parsimony should be sought, and two-parameter ACS's should always be preferred over three-parameter ones when they fit adequately.
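The sketch below assumes the common parametrizations ρ_W(τ) = exp[−(τ/b)^c] and ρ_PII(τ) = (1 + cτ/b)^(−1/c), which match the properties listed above (powered exponential for W, power-type tail for PII, and a Markovian limit exp(−τ/b) as c → 0); treat them as assumptions, since the article's display is not reproduced here. The GL ACS is omitted, and the helper `loglog_slope` (a hypothetical name) estimates the local slope one would read from a log-log plot such as Figure 2.

```python
import math


def acs_weibull(tau, b, c):
    """Assumed W ACS: powered exponential exp(-(tau/b)**c)."""
    return math.exp(-((tau / b) ** c))


def acs_pareto2(tau, b, c):
    """Assumed PII ACS: (1 + c*tau/b)**(-1/c); tends to exp(-tau/b) as c -> 0."""
    if c == 0:
        return math.exp(-tau / b)
    return (1.0 + c * tau / b) ** (-1.0 / c)


def loglog_slope(acs, tau, h=1e-4):
    """Local slope d ln(rho) / d ln(tau) by a forward difference."""
    return (math.log(acs(tau * (1 + h))) - math.log(acs(tau))) / math.log(1 + h)


if __name__ == "__main__":
    w = lambda t: acs_weibull(t, 2.0, 0.7)
    pii = lambda t: acs_pareto2(t, 2.0, 0.5)
    for tau in (10, 100, 1000):
        # W slopes keep steepening (concave curve); PII slopes approach -1/c = -2
        print(tau, round(loglog_slope(w, tau), 3), round(loglog_slope(pii, tau), 3))
```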


Figure 2. Demonstration of parametric autocorrelation structures having the same lag-1 correlation and different asymptotic behaviors. In a log-log plot the Weibull (W) autocorrelation structure (ACS) is concave, the Pareto II (PII) is asymptotically a straight line, and the Generalized Logarithmic (GL) can be asymptotically convex, having extremely slow decay.

Different parametrizations of the ACS's suggested here, or their special cases and generalizations, have appeared several times in the literature. For example, a special case of the W ACS is given in Buell (1972), while in other works (mainly as a spatial correlation) it is referred to as the powered exponential (e.g., Gneiting, 2013). Martin and Walker (1997), in their work on power-law correlations, mentioned urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0173 as a valid isotropic ACS with long-range dependence (LRD) for 0 urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0174. Gneiting (2000) generalized this power-type ACS by proposing urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0175 and gave the permissible parameter space. This ACS (named Cauchy by Gneiting), based on the framework suggested in Papalexiou (2018), coincides with the Burr type XII survival function and was also used to simulate 10 s rainfall in Papalexiou et al. (2011). The motivation for these power-type ACS's was to generalize the LRD correlations of the celebrated fractional Gaussian noise (fGn) process (Mandelbrot & Wallis, 1968), given by $\rho(\tau) = \tfrac{1}{2}\left(|\tau - 1|^{2H} - 2|\tau|^{2H} + |\tau + 1|^{2H}\right)$, with the Hurst parameter $H$ controlling the correlation strength (see Beran, 1994; Graves et al., 2017). Note, however, that LRD is elusive and should be used with caution; strong short-term correlations can give a false impression of LRD (Markonis et al., 2018).
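For the power-type structures discussed above, a short sketch evaluates the exact fGn ACS, ρ(τ) = ½(|τ − 1|^{2H} − 2|τ|^{2H} + |τ + 1|^{2H}), and compares it with its large-lag approximation H(2H − 1)τ^{2H−2}. It also includes the generalized Cauchy form (1 + |τ|^α)^{−β/α}, which we assume is the Cauchy class referred to above (with 0 < α ≤ 2 and β > 0); the function names are hypothetical.

```python
import math


def acs_fgn(tau, H):
    """Fractional Gaussian noise ACS at lag tau for Hurst parameter 0 < H < 1."""
    t = abs(tau)
    return 0.5 * (abs(t - 1) ** (2 * H) - 2 * t ** (2 * H) + (t + 1) ** (2 * H))


def acs_cauchy(tau, alpha, beta):
    """Generalized Cauchy form (1 + |tau|**alpha)**(-beta/alpha), 0 < alpha <= 2, beta > 0."""
    return (1.0 + abs(tau) ** alpha) ** (-beta / alpha)


if __name__ == "__main__":
    H = 0.8
    for tau in (1, 10, 100, 1000):
        exact = acs_fgn(tau, H)
        asym = H * (2 * H - 1) * tau ** (2 * H - 2)     # large-lag power law
        # A Cauchy ACS with beta = 2 - 2H shares the fGn decay exponent, while
        # alpha separately controls the short-lag behavior
        print(tau, round(exact, 4), round(asym, 4), round(acs_cauchy(tau, 1.0, 2 - 2 * H), 4))
```

The two-parameter Cauchy form decouples the short-lag behavior (α) from the long-range decay (β), whereas fGn ties both to the single parameter H.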

The previous ACS's, and most of those found in the literature, decay monotonically to zero, ranging in urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0178, with urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0179 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0180. That is mainly because negative autocorrelations are not frequently observed in nature. However, non-monotonic ACS's allowing negative correlations have also been suggested; for example, those describing the so-called hole effect (see e.g., Gneiting, 2002), or the Bessel-Lommel ACS's (see e.g., Hristopulos, 2020, p. 360), which fluctuate around zero, taking negative values before converging to zero. We can also modify any valid ACS to converge to a negative correlation or any other asymptotic lower limit, yet real-world applications might be limited. For example, the
converges to urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0182 for urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0183, where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0184 is any valid ACS in urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0185.

4 Correlation Transformations and Extensions to Negative Space

It was observed early on that the linear correlation between two rv's urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0186 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0187 following the bivariate Gaussian distribution is maximal: any nonlinear transformation urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0188 applied to urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0189 and/or urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0190 will decrease it in absolute value. In mathematical terms, if urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0191 then urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0192, where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0193 denotes the linear correlation between the subscripted rv's. This is the maximal property of the bivariate Gaussian correlation; its proof is associated with Lancaster (1957), with first results dating back to Gebelein (1941) and Maung (1941). In fact, these concepts appear in the works of Karl Pearson related to contingency theory and contingency tables (see “On the theory of contingency and its relation to association and normal correlation”, Pearson, 1904). Interestingly, Lancaster (1958) refers to Hirschfeld (1935), who “…sought for transformations of the marginal variables that would yield linear least squares regression lines. He found that these variables maximized the coefficients of correlation”.

This property was exploited to generate time series with desired marginal distributions and linear autocorrelations. An early application is given by Conner (1971) in his Ph.D. thesis titled “Pseudo-random number generators having specified probability density functions and autocorrelations”. In the same year, Rowland and Holmes (1971) presented similar techniques in a report for the U.S. Army Missile Command, citing Conner's thesis. Li and Hammond (1975), Hammond having supervised Conner's thesis, published probably the first paper with a clear demonstration. In a nutshell, the key idea was to generate time series with Gaussian marginal distributions and appropriately inflated autocorrelations, which, when transformed, resulted in time series with the desired properties. These early works focused on simple cases, such as generating time series with continuous marginals and preserving observed autocorrelations up to a few lags. Analytical expressions linking urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0194 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0195 exist for the uniform (Baum, 1957) and the Lognormal (Matalas, 1967) distributions, while asymptotic approximations based on Hermite-Chebyshev polynomial expansions were used for other distributions (Lancaster, 1957; van der Geest, 1998).
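The two closed-form cases can be sketched as follows, assuming these standard identities: for X = exp(Z), with both Gaussian components sharing the log-space standard deviation σ, ρ_X = (e^{σ²ρ_Z} − 1)/(e^{σ²} − 1); for U = Φ(Z), ρ_U = (6/π) arcsin(ρ_Z/2). The function names are hypothetical.

```python
import math


def lognormal_rho(rho_z, sigma):
    """Correlation of exp(Z1), exp(Z2) for bivariate Gaussian (Z1, Z2) with common
    log-space standard deviation sigma and correlation rho_z."""
    return math.expm1(sigma ** 2 * rho_z) / math.expm1(sigma ** 2)


def lognormal_rho_inverse(rho_x, sigma):
    """Parent-Gaussian correlation giving a target Lognormal correlation rho_x
    (valid only for targets at or above the attainable lower limit)."""
    return math.log1p(rho_x * math.expm1(sigma ** 2)) / sigma ** 2


def uniform_rho(rho_z):
    """Correlation of Phi(Z1), Phi(Z2): (6/pi) * asin(rho_z / 2)."""
    return 6.0 / math.pi * math.asin(rho_z / 2.0)


def uniform_rho_inverse(rho_u):
    """Parent-Gaussian correlation giving a target uniform correlation rho_u."""
    return 2.0 * math.sin(math.pi * rho_u / 6.0)


if __name__ == "__main__":
    sigma = 1.0
    for target in (0.2, 0.5, 0.8):
        # Gaussian correlations must be inflated to reach positive targets
        print(target, round(lognormal_rho_inverse(target, sigma), 3),
              round(uniform_rho_inverse(target), 3))
    # Most negative correlation attainable with Lognormal(0, 1) marginals
    print(round(lognormal_rho(-1.0, sigma), 3))  # -> -0.368
```

For σ = 1 the Lognormal lower limit equals −1/e, an example of a negative correlation limit larger than −1.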

Papalexiou (2018) extended and unified this strategy into the CoSMoS framework (see also Papalexiou, 2010). CoSMoS enabled time series generation with any type of marginal distribution, including binary, discrete, continuous, and mixed-type distributions; the latter case was applied to simulate intermittent processes such as rainfall. This generalization reproduces the whole ACS of a process (not just correlations for a few lags) and is based on parsimonious parametric correlation transformation functions (CTFs) that link Gaussian and target correlations. The proposed CTF for continuous and mixed-type marginals was:
where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0197 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0198 are parameters that can be easily estimated (for estimation details see Papalexiou (2018), or use the CoSMoS R package (Papalexiou, Serinaldi, Strnad, Markonis, & Shook, 2021); also note that a variant was used for binary/discrete marginals, which is further simplified here in Equation 19). Once the CTF is identified for the target marginal distribution, this approach allows one to analytically estimate the Gaussian ACS as urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0199 corresponding to the desired ACS of the target process. Another benefit of this approach is that it allows one to link the CTF parameters with the shape parameters of the target distribution; this can be exploited to create analytical approximations and avoid re-estimating the CTF (more details in Sections 6 and 9).

The same approach can be extended to transform negative correlations; this allows one to use parametric ACS's having negative values (e.g., hole-effect ACS's or those suggested in Equation 12). To achieve desired negative correlations, in contrast to positive ones, the Gaussian correlations must be deflated (i.e., made more negative). Also, for processes with non-Gaussian marginals, the attainable negative correlations typically have a lower limit larger than −1. To assist readers unfamiliar with this framework and its extension to negative correlations, we graphically demonstrate the mapping of a Gaussian rv to one with a Bernoulli, continuous, and mixed-type marginal (see Figures 3a, 3d, and 3g, respectively). The corresponding CTFs (Figures 3b, 3e, and 3h) show the inflation for positive correlations, the lowest negative correlation that can be achieved for these marginals, and the rapid deflation for negative correlations. Once the CTFs are defined, they can be used to estimate the parent-Gaussian ACS's (Figures 3c, 3f, and 3i) for any desired target ACS.
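When no closed form exists, CTF points, including those for negative correlations, can be estimated by brute-force Monte Carlo for any marginal with a quantile function, and a parametric CTF can then be fitted through them. The sketch below is such a brute-force estimator (not the estimation procedure of Papalexiou, 2018) and assumes an Exponential(1) marginal purely for illustration; `ctf_point` and `exp_quantile` are hypothetical names. For this marginal the most negative attainable correlation is 1 − π²/6 ≈ −0.645, which the ρ_Z = −1 point should approach.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ctf_point(rho_z, quantile, n=100_000, seed=0):
    """Monte Carlo target correlation from mapping a standard bivariate Gaussian
    pair with correlation rho_z through x = quantile(Phi(z))."""
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    s = math.sqrt(max(0.0, 1.0 - rho_z ** 2))
    u_max = 1.0 - 2.0 ** -53                 # largest double below 1; avoids quantile(1)
    x, y = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho_z * z1 + s * rng.gauss(0.0, 1.0)
        x.append(quantile(min(cdf(z1), u_max)))
        y.append(quantile(min(cdf(z2), u_max)))
    return pearson(x, y)


def exp_quantile(u):
    """Exponential(1) quantile function (illustrative marginal)."""
    return -math.log1p(-u)


if __name__ == "__main__":
    for rho_z in (-1.0, -0.5, 0.0, 0.5, 0.9):
        print(rho_z, round(ctf_point(rho_z, exp_quantile, n=50_000), 3))
```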


Figure 3. (a) Transformation sketch of a Gaussian random variable (rv) into a binary (Bernoulli) rv, (b) the effects of this transformation expressed by a correlation transformation function (CTF) linking the desired binary correlation with the Gaussian one, and (c) a desired (target) autocorrelation for the binary time series and the corresponding parent Gaussian autocorrelation as estimated using the CTF. (d–f) and (g–i) are analogous to (a–c), but the transformation maps a Gaussian rv into continuous and mixed-type rv's, respectively.


Figure 4. Example of two intermittent time series simulations as products of binary and continuous-marginal processes. Case 1: (a) binary time series with urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0200; (b) time series with a continuous marginal; (c) intermittent time series; (d) correlogram of the target autocorrelations and comparison with the empirical values; (e) probability plot showing the target marginal distribution and its 95% confidence interval (gray region), and the empirical simulated distribution; (f–j) similar to (a–e) but for the Case 2 simulation.

Note that this modeling approach was applied in multivariate simulation using parametric cross-CTFs (Papalexiou, 2018), in the DiPMaC disaggregation scheme and for nonstationary simulation (Papalexiou, Markonis, et al., 2018), and, more recently, for static and Lagrangian spatiotemporal random fields in Papalexiou and Serinaldi (2020) and Papalexiou, Serinaldi, and Porcu (2021), respectively. The same approach is modified here to couple processes with Bernoulli and continuous marginal distributions, to improve the simulation of intermittent processes and reproduce rainfall characteristics across a wide range of scales.

5 Two-State Intermittent Rainfall

Let urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0201 be a mixed-type rv, with probability mass at zero, describing intermittent rainfall (zeros and nonzero values), urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0202 a discrete (Bernoulli) rv describing dry and wet states, and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0203 a continuous rv describing nonzero values. Then urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0204 denotes a stationary stochastic process with mixed-type marginals, and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0205 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0206 denote processes with Bernoulli and continuous marginals, respectively; for brevity, hereafter, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0207, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0208, and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0209 denote either processes or rv's depending on the context. Also, for clarity, the terms “intermittent process”, “binary process”, and “continuous process” refer to the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0210, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0211, and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0212 processes, respectively. Similarly, their ACS's and generated time series will be labeled “intermittent”, “binary”, and “continuous”.

If we assume that the intermittent process emerges as urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0213, then it is straightforward to show (see e.g., Goodman, 1960, for the variance) that:
where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0218, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0219, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0220, and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0221 denote, respectively, the mean, variance, lag-urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0222 covariance, and autocorrelation of the process indicated by the subscript. Thus, given the means, variances, and ACS's of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0223 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0224, we can estimate the ACS of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0225. This shows that an intermittent process with linear ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0226 can emerge from different products of binary and continuous processes; in other words, different combinations of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0227 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0228 can result in an intermittent process with the same ACS.
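These product relations can be checked with the sketch below. It assumes that the binary process B (1 = wet, with P(B = 0) = p0) and the continuous process Y are stationary and mutually independent, so that E[X_t X_{t+τ}] = E[B_t B_{t+τ}] E[Y_t Y_{t+τ}]; the function names and numerical inputs are hypothetical and are not those used in Figure 4.

```python
def product_moments(p0, mu_y, var_y):
    """Mean and variance of X = B * Y, with B ~ Bernoulli(1 - p0) independent of Y."""
    mu_b = 1.0 - p0
    mu_x = mu_b * mu_y
    # E[X^2] = E[B^2] E[Y^2], and E[B^2] = E[B] for a 0/1 variable
    var_x = mu_b * (var_y + mu_y ** 2) - mu_x ** 2
    return mu_x, var_x


def intermittent_rho(rho_b, rho_y, p0, mu_y, var_y):
    """Lag-tau correlation of X = B * Y from the binary and continuous correlations."""
    mu_b, var_b = 1.0 - p0, p0 * (1.0 - p0)
    mu_x, var_x = product_moments(p0, mu_y, var_y)
    e_bb = rho_b * var_b + mu_b ** 2          # E[B_t B_{t+tau}]
    e_yy = rho_y * var_y + mu_y ** 2          # E[Y_t Y_{t+tau}]
    return (e_bb * e_yy - mu_x ** 2) / var_x


def continuous_rho(rho_x, rho_b, p0, mu_y, var_y):
    """Correlation the continuous process needs so that X reaches rho_x."""
    mu_b, var_b = 1.0 - p0, p0 * (1.0 - p0)
    mu_x, var_x = product_moments(p0, mu_y, var_y)
    e_bb = rho_b * var_b + mu_b ** 2
    e_yy = (rho_x * var_x + mu_x ** 2) / e_bb
    return (e_yy - mu_y ** 2) / var_y


if __name__ == "__main__":
    p0, mu_y, var_y = 0.9, 1.0, 2.0           # hypothetical inputs
    for rho_b in (0.4, 0.8):
        print(rho_b, round(continuous_rho(0.2, rho_b, p0, mu_y, var_y), 3))
```

With these illustrative inputs, the stronger binary correlation (0.8) requires a slightly negative continuous correlation to reach the same intermittent target, qualitatively the behavior described for Case 2 below.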

The previous fact can profoundly affect the stochastic modeling of intermittent processes. We demonstrate this by simulating an intermittent process urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0229 resembling, for example, rainfall, with urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0230, the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0231 as continuous marginal (gray line in Figures 4e and 4j), and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0232 (red line in Figures 4d and 4i). Based on these characteristics, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0233, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0234, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0235, and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0236. We simulate this process in two different ways.

Case 1: We generate binary time series (Figure 4a) assuming that the binary process urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0237 has ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0238 (blue line; Figure 4d) and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0239. From Equation 16, given urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0240 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0241, we find urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0242 (green line; Figure 4d) and generate time series (Figure 4b) with the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0243 marginal. The two time series are multiplied, forming the intermittent time series (Figure 4c) that preserves the target ACS (red dots; Figure 4d) and marginal distribution (Figure 4e).

Case 2: We repeat the previous steps, now assuming that the binary process (Figure 4f) has a stronger ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0244 (blue line; Figure 4i). We re-estimate urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0245 (green line; Figure 4i) and regenerate time series from the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0246 process (Figure 4g). The two new time series are multiplied to give the new intermittent time series (Figure 4h).

The two intermittent time series (Figures 4c and 4h) have the same linear ACS, probability dry, and marginal distribution of nonzero values, and yet they emerge from multiplying very different processes. Comparing the binary time series (Figures 4a and 4f), we observe the effects of the stronger ACS in Case 2, expressed as longer and more frequent dry spells. Likewise, the difference between the continuous time series (Figures 4b and 4g) is evident, with stronger clustering of low/high values in Case 1. Interestingly, in Case 2, to match the target ACS (red line; Figure 4i), the continuous urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0247 (green line; Figure 4i) takes negative correlations. Differences between the two time series are better demonstrated by comparing the probabilities of wet and dry spells (Figures 5a and 5b); the Case 2 time series, given the stronger binary ACS, has a larger probability of long wet and long dry spells. This affects the scaling of the probability zero, with the stronger binary ACS resulting in a slower decrease with scale (Figure 5c). In contrast to the scaling of the probability zero, the weaker ACS of the continuous process urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0248 in Case 2 contributes to a faster decay of distributional shape measures with scale (here L-skewness urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0249 and L-kurtosis urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0250; Figure 5d).
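The spell and scaling diagnostics compared in Figure 5 can be computed from any simulated or observed series with a sketch like the one below. The function names are hypothetical; spells touching the ends of the series are counted as complete (a simplification), aggregation uses non-overlapping blocks of k steps, and values are assumed nonnegative, so that a block is dry only if all its steps are dry. The persistent two-state chain in the usage example is purely illustrative.

```python
from itertools import groupby


def spell_length_probabilities(b, state):
    """Empirical probability mass of run lengths of `state` (1 = wet, 0 = dry)."""
    lengths = [len(list(g)) for s, g in groupby(b) if s == state]
    if not lengths:
        return {}
    counts = {}
    for length in lengths:
        counts[length] = counts.get(length, 0) + 1
    return {k: c / len(lengths) for k, c in sorted(counts.items())}


def probability_zero_at_scale(x, k):
    """Fraction of non-overlapping k-step aggregates that are exactly zero."""
    blocks = [sum(x[i:i + k]) for i in range(0, len(x) - k + 1, k)]
    return sum(1 for v in blocks if v == 0) / len(blocks)


if __name__ == "__main__":
    import random
    rng = random.Random(42)
    # Hypothetical persistent wet/dry chain: keep the current state with probability 0.9
    b, state = [], 0
    for _ in range(10_000):
        if rng.random() > 0.9:
            state = 1 - state
        b.append(state)
    print(list(spell_length_probabilities(b, 0).items())[:5])
    print([round(probability_zero_at_scale(b, k), 3) for k in (1, 6, 24)])
```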


Figure 5. Comparison of the Case 1 and Case 2 intermittent time series generated as products of different binary and continuous processes: probability of (a) wet and (b) dry spells; scaling of (c) probability zero and (d) L-skewness urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0251 and L-kurtosis urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0252.

6 Binary Time Series Generation Made Simple

Simulating binary processes with any ACS is a key component in generating intermittent time series as products of binary and continuous processes. Here, we provide a solution that makes this task simple. For the binary case, we simplify the covariance transformation integral to:
where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0254 is the quantile value of the standard normal distribution corresponding to probability zero urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0255, and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0256 is the bivariate standard Gaussian pdf. A Bernoulli rv with urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0257 has mean urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0258 and standard deviation urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0259; thus, linking the Gaussian and binary correlations, we get:

The one-variable integral in Equation 17 facilitates the numerical integration and will be used to form a readily applicable solution.
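One way to evaluate the binary correlation numerically is sketched below. It assumes a specific one-variable form rather than reproducing Equation 17 verbatim: by Plackett's identity, the derivative of a bivariate Gaussian orthant probability with respect to the correlation equals the bivariate density, which gives Cov(B_t, B_{t+τ}) = ∫_0^{ρ_Z} φ₂(z₀, z₀; r) dr with φ₂(z₀, z₀; r) = exp[−z₀²/(1 + r)] / (2π√(1 − r²)), and ρ_B = Cov/[p₀(1 − p₀)]. The substitution r = sin θ removes the endpoint singularity, and the parent Gaussian correlation is then obtained by bisection, a generic root-finding step used here in place of the analytical CTF of Equations 19 and 20. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def binary_rho(rho_z, p0, n=400):
    """Correlation of B_t = 1{Z_t > z0} for Gaussian Z with lag correlation rho_z,
    where z0 = Phi^{-1}(p0), so that P(B = 0) = p0 (dry)."""
    z0 = NormalDist().inv_cdf(p0)
    upper = math.asin(max(-1.0, min(1.0, rho_z)))    # substitution r = sin(theta)

    def integrand(theta):
        # phi2(z0, z0; sin(theta)) * cos(theta) = exp(-z0^2 / (1 + sin(theta))) / (2 pi)
        if z0 == 0.0:
            return 1.0 / (2.0 * math.pi)
        denom = 1.0 + math.sin(theta)
        if denom <= 0.0:
            return 0.0                                 # limit as r -> -1 when z0 != 0
        return math.exp(-z0 ** 2 / denom) / (2.0 * math.pi)

    h = upper / n                                      # composite Simpson's rule, n even
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    cov = total * h / 3.0
    return cov / (p0 * (1.0 - p0))


def parent_gaussian_rho(rho_b, p0, tol=1e-10):
    """Bisection for the Gaussian correlation whose thresholding yields rho_b."""
    lo, hi = -1.0, 1.0
    if not binary_rho(lo, p0) <= rho_b <= binary_rho(hi, p0):
        raise ValueError("binary correlation not attainable for this p0")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_rho(mid, p0) < rho_b:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    p0 = 0.9                                           # hypothetical probability dry
    for target in (0.1, 0.3, 0.6):
        print(target, round(parent_gaussian_rho(target, p0), 4))
    # Lowest binary correlation reachable with this p0 (rho_z = -1)
    print(round(binary_rho(-1.0, p0), 4))
```

Because thresholding can only weaken a correlation, the parent Gaussian values come out larger than the binary targets, which is the inflation that the fitted CTF captures analytically.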

The two-parameter CTF for binary processes in Papalexiou (2018) is simplified here to:

Our investigation showed that this one-parameter version performs equally well. For example, Figure 6a depicts the integral-estimated urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0262 points (Equation 18) for a binary process with urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0263 and the fitted urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0264. The concavity of the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0265 function depends on urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0266, and thus urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0267 is a function of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0268. An easy-to-apply solution requires knowing the parameter urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0269 in urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0270 for any value of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0271. The steps to identify a function urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0272 are: (a) set urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0273 to a specific value; (b) use Equation 18 to estimate urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0274 points for a few urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0275 values; (c) fit urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0276 to estimate urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0277; (d) repeat steps (a)–(c) for a large number of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0278 values. This process results in a set of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0279 points (see dots in Figure 6b) that are used to identify the function urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0280.


Figure 6. Linking the correlations of a Gaussian process transformed to a binary one. (a) Estimated points and the fitted autocorrelation transformation function (CTF); (b) points and the fitted urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0281 function in Equation 20; (c) performance of the CTF using the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0282 function in comparison with points estimated from the integral-based Equation 18.

Here, we propose the two-parameter function
with urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0284 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0285, which accurately interpolates the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0286 points (dark gray curve in Figure 6b) for urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0287. Interestingly, Equation 20 can be written in terms of the standard deviation urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0288 of the binary process as:

The performance of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0290, with urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0291 given by Equation 20, is compared to integral-estimated points from Equation 18 (Figure 6c).

The framework above makes simulating binary time series with any valid ACS simple, as there is no need to calculate integrals or fit functions. Equation 20 (or Equation 21) is valid for all cases and provides the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0292 parameter analytically. Thus, to generate a binary process with desired properties:
  1. Select the characteristics of the target binary process, that is, the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0293 and the ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0294.

  2. Find the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0295 value for the desired urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0296 from Equation 20.

  3. Estimate the parent Gaussian ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0297 from Equation 19.

  4. Generate Gaussian time series using an AR model of sufficiently large order to reproduce urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0298.

  5. Transform the Gaussian values to binary by setting urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0299 where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0300.
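A compact sketch of Steps 1–5 follows, under stated assumptions. Instead of the analytical approximation of Equation 20 (whose exact form is not reproduced here), it inverts the integral relation numerically by bisection, which is slower but needs no fitted constants. The AR(p) coefficients are obtained from the parent-Gaussian ACS with the Levinson–Durbin recursion for the Yule–Walker equations. The target Weibull-type ACS, its parameters, the order p = 20, and all function names are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist


def binary_from_gaussian(rho_z, p0, n=200):
    """Binary correlation implied by Gaussian correlation rho_z (Simpson's rule, r = sin t)."""
    z = NormalDist().inv_cdf(p0)

    def f(t):
        d = 1.0 + math.sin(t)
        return math.exp(-z * z / d) if d > 0.0 else (1.0 if z == 0.0 else 0.0)

    b = math.asin(max(-1.0, min(1.0, rho_z)))
    h = b / n
    s = f(0.0) + f(b) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return s * h / 3.0 / (2.0 * math.pi) / (p0 * (1.0 - p0))


def gaussian_for_binary(rho_b, p0, iters=50):
    """Invert binary_from_gaussian by bisection (numerical stand-in for Equation 20)."""
    lo, hi = -1.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binary_from_gaussian(mid, p0) < rho_b:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def levinson_durbin(r):
    """AR(p) coefficients and innovation variance from autocorrelations r[0..p]."""
    phi, var = [], r[0]
    for k in range(1, len(r)):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / var
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        var *= 1.0 - kappa * kappa
        if var <= 0.0:
            raise ValueError("Gaussian ACS is not positive definite")
    return phi, var


def simulate_binary(p0, acs_b, p, n, seed=None, burn=1000):
    """Target (p0, binary ACS) -> parent-Gaussian ACS -> AR(p) -> threshold at z_p0."""
    rng = random.Random(seed)
    rho_z = [1.0] + [gaussian_for_binary(acs_b(k), p0) for k in range(1, p + 1)]
    phi, var = levinson_durbin(rho_z)
    sd, zq = math.sqrt(var), NormalDist().inv_cdf(p0)
    z, out = [0.0] * p, []
    for t in range(n + burn):
        zt = sum(phi[j] * z[-1 - j] for j in range(p)) + sd * rng.gauss(0.0, 1.0)
        z.append(zt)
        if t >= burn:  # discard the burn-in period started from zeros
            out.append(1 if zt > zq else 0)  # wet if above the p0-quantile
    return out


if __name__ == "__main__":
    target = lambda tau: math.exp(-((tau / 3.0) ** 0.7))  # hypothetical Weibull ACS
    b = simulate_binary(p0=0.8, acs_b=target, p=20, n=40000, seed=42)
    print("empirical p0:", 1 - sum(b) / len(b))
```

Because a Yule–Walker AR(p) reproduces the parent-Gaussian autocorrelations exactly up to lag p, the thresholded series should match the binary target at those lags up to sampling error.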

For demonstration, we generate binary time series with the three different ACS's shown in Figure 2 and different levels of probability zero. The first has urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0301 and the W ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0302 (Figures 7a and 7b); the second has urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0303 and the PII ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0304 (Figures 7c and 7d); and the third has urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0305 and the GL ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0306 (Figures 7e and 7f). The scale parameter urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0307 of each ACS was estimated so that the lag-1 autocorrelation is urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0308, that is, the same value for all three ACS's. The target theoretical ACS's (solid blue lines) agree with the empirical ones (blue dots) of the simulated time series (Figures 7b, 7d, and 7f). This scheme makes it possible to generate binary time series with any positive definite ACS, even in a spreadsheet environment (see CoSMoS.xlsx in Supporting Information S2).
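The scale parameter that fixes the lag-1 autocorrelation has a closed form for some ACS's. The sketch below assumes the parameterizations ρ(τ) = exp[−(τ/b)^c] for the Weibull ACS and ρ(τ) = (1 + cτ/b)^(−1/c) for the Pareto II ACS; these forms and the function names are assumptions for illustration. The GL ACS is omitted because its form is not reproduced in this section.

```python
import math


def weibull_acs(tau, b, c):
    """Weibull ACS, assumed form rho(tau) = exp(-(tau / b)**c)."""
    return math.exp(-((tau / b) ** c))


def pareto2_acs(tau, b, c):
    """Pareto II ACS, assumed form rho(tau) = (1 + c * tau / b)**(-1 / c)."""
    return (1.0 + c * tau / b) ** (-1.0 / c)


def weibull_scale_for_lag1(r1, c):
    """Scale b such that the Weibull ACS equals r1 at lag 1."""
    # exp(-(1/b)^c) = r1  =>  b = (-ln r1)^(-1/c)
    return (-math.log(r1)) ** (-1.0 / c)


def pareto2_scale_for_lag1(r1, c):
    """Scale b such that the Pareto II ACS equals r1 at lag 1."""
    # (1 + c/b)^(-1/c) = r1  =>  b = c / (r1^(-c) - 1)
    return c / (r1 ** (-c) - 1.0)


if __name__ == "__main__":
    r1 = 0.7  # hypothetical common lag-1 autocorrelation
    bw, bp = weibull_scale_for_lag1(r1, 0.6), pareto2_scale_for_lag1(r1, 0.4)
    print([round(weibull_acs(t, bw, 0.6), 3) for t in range(1, 6)])
    print([round(pareto2_acs(t, bp, 0.4), 3) for t in range(1, 6)])
```

Fixing the same lag-1 value isolates the effect of the ACS shape: the two sequences agree at lag 1 and diverge at longer lags.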


Figure 7. Binary time series with different levels of probability zero (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0309) and autocorrelation structures (ACS). (a) Time series with a Weibull ACS; (b) comparison of the target Weibull ACS with the empirical one and the corresponding parent Gaussian ACS; (c, d) and (e, f) same as (a, b) but for the Pareto II (PII) and Generalized Logarithmic (GL) ACS, respectively.

7 Rainfall Generation Strategies

7.1 CoSMoS-1s | One-State Generation

This section summarizes the rainfall generation method of Papalexiou (2018), which will be referred to as CoSMoS-1s. Conceptually, rainfall is considered a one-state process described by a single linear ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0310 and a mixed-type marginal distribution urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0311. Thus, CoSMoS-1s reproduces the observed ACS of the intermittent time series, the probability of zero, and the marginal distribution of nonzero values.

For one-step simulation:
  1. Use the observed time series to estimate the probability of zero urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0312, fit a parametric probability distribution urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0313 to nonzero values, and fit a parametric ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0314.

  2. Estimate the CTF urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0315 for the desired urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0316 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0317 (for details see Papalexiou, 2018), and the Gaussian ACS as urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0318.

  3. Generate standard Gaussian time series urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0319 using an AR model that reproduces the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0320.

  4. Apply urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0321 to transform the Gaussian time series into the desired intermittent one; where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0322 is the mixed-type quantile in Equation A10 (see also Figures 3g–3i).
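Step 4 can be sketched as follows. The code assumes that the mixed-type quantile of Equation A10 has the usual form Q(u) = 0 for u ≤ p₀ and Q(u) = Q₊((u − p₀)/(1 − p₀)) otherwise, where Q₊ is the quantile of the nonzero values. The Exponential marginal, the parameter values, and the independent Gaussian input (a placeholder for the AR output of Step 3) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def exp_quantile(u, scale=1.0):
    """Quantile of a hypothetical Exponential marginal for the nonzero values."""
    return -scale * math.log1p(-u)


def mixed_quantile(u, p0, q_pos):
    """Mixed-type quantile (assumed form of Equation A10).

    Returns 0 when u <= p0; otherwise rescales u to (0, 1) and applies the
    quantile q_pos of the continuous distribution of nonzero values.
    """
    if u <= p0:
        return 0.0
    v = min((u - p0) / (1.0 - p0), 1.0 - 1e-12)  # guard against u == 1
    return q_pos(v)


def gaussian_to_intermittent(z_series, p0, q_pos):
    """Step 4 of CoSMoS-1s: x_t = Q_X(Phi(z_t))."""
    cdf = NormalDist().cdf
    return [mixed_quantile(cdf(z), p0, q_pos) for z in z_series]


if __name__ == "__main__":
    rng = random.Random(0)
    # Independent Gaussians as a placeholder for the AR output of Step 3
    z = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    x = gaussian_to_intermittent(z, p0=0.85, q_pos=lambda u: exp_quantile(u, 2.0))
    print("fraction of zeros:", sum(v == 0.0 for v in x) / len(x))
```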

7.2 CoSMoS-2s | Two-State Generation

This strategy treats rainfall as a two-state process; that is, a binary process describes the wet/dry sequence, and a “hypothetical” process with a continuous marginal describes nonzero rainfall. This model, named CoSMoS-2s, reproduces the linear ACS's of the binary and continuous processes, the probability zero, and the marginal distribution of nonzero values.

For two-step simulation:
  1. Create the binary time series from the observed one by replacing positive rainfall values with 1.

  2. Use the binary time series to estimate the probability zero urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0323, and fit a parametric ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0324 to the empirical urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0325. Generate a synthetic binary time series reproducing urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0326 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0327 as described in Section 6 (see also Figures 3a–3c).

  3. Fit a parametric ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0328 to describe the continuous process. The sample urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0329 could be estimated from Equation 16 by solving for urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0330 and using the binary urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0331 and intermittent urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0332. Alternatively, it might be better to estimate urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0333 explicitly by defining a conditional correlation coefficient as:

     where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0335 is the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0336-th value in the time series; urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0337 is the time series length; urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0338 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0339 are sets of values for urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0340; urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0341 is the sample size of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0342 or urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0343 (the two sets have equal size); and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0344 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0345 are the mean and standard deviation estimates for the sets indicated by the subscripts.

  4. Fit a parametric probability distribution urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0346 to nonzero values and generate a synthetic time series reproducing the desired urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0347 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0348. The procedure is the same as in the one-step simulation; the difference is that the continuous urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0350, instead of the mixed-type quantile urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0349, is used both to estimate the CTF urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0351 (and hence the parent-Gaussian ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0352) and to transform the Gaussian time series (see also Figures 3d–3f).
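The conditional correlation of Step 3 can be estimated with a few lines of code. Because the typeset formula is not shown here, the sketch assumes that the two sets contain the lag-separated pairs in which both values are nonzero, that is, {x_i : x_i > 0, x_{i+τ} > 0} and {x_{i+τ} : x_i > 0, x_{i+τ} > 0}. This reading of the definition and the function name are assumptions.

```python
import math


def conditional_lag_corr(x, lag):
    """Lag-`lag` Pearson correlation using only pairs in which both values are nonzero.

    Assumed sets: X = {x_i : x_i > 0, x_{i+lag} > 0} and
    X' = {x_{i+lag} : x_i > 0, x_{i+lag} > 0}, so both have the same size m.
    """
    pairs = [(x[i], x[i + lag]) for i in range(len(x) - lag)
             if x[i] > 0 and x[i + lag] > 0]
    m = len(pairs)
    if m < 2:
        return float("nan")  # not enough wet-wet pairs at this lag
    mu_a = sum(a for a, _ in pairs) / m
    mu_b = sum(b for _, b in pairs) / m
    cov = sum((a - mu_a) * (b - mu_b) for a, b in pairs) / m
    sd_a = math.sqrt(sum((a - mu_a) ** 2 for a, _ in pairs) / m)
    sd_b = math.sqrt(sum((b - mu_b) ** 2 for _, b in pairs) / m)
    return cov / (sd_a * sd_b)


if __name__ == "__main__":
    rain = [0, 1.2, 2.5, 0, 0.4, 0.9, 3.1, 0, 0, 2.2, 1.8]  # hypothetical hourly values
    print([round(conditional_lag_corr(rain, k), 3) for k in (1, 2)])
```

Fitting the parametric urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0328 to these lag-wise estimates then proceeds as for any empirical ACS.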

7.3 Rank-Based CoSMoS-2s

A variation of the previous two-state generation, which is not evaluated in detail here, is a two-state generation based on rank-based (Spearman's) correlations. The motivation is its analytical solution and potentially competitive performance. The author posits that accurately reproducing the wet/dry spells contributes markedly to reproducing rainfall characteristics across a large range of scales. Pearson's and Spearman's correlations coincide for a binary process; hence, if explicitly reproducing the binary linear ACS leads to reproducing the wet/dry spells, then using the Spearman's ACS will give the same results. Thus, binary time series generation with a desired Spearman's ACS is the same as in Section 7.2.

Now, generating a continuous process with desired marginal and Spearman's ACS urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0353 can be done using analytical equations. The linear and Spearman's ACS are the same for a process with uniform (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0354) marginal distribution. Thus, the analytical CTF urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0355 linking Gaussian (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0356) and uniform (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0357) correlations (Baum, 1957) can be exploited to link Gaussian and Spearman's correlations, that is,

This implies that the parent-Gaussian ACS for a desired Spearman's ACS is estimated as urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0359. In practice, the sample urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0360 can be estimated by fitting a parametric ACS to Spearman's correlations estimated from the positive-value samples urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0361 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0362 defined in Section 7.2. The remaining steps are the same; that is, the parent-Gaussian time series with the desired urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0363 is transformed into one with the desired marginal by urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0364. In summary, the two-state rank-based generation reproduces the Spearman's ACS's of the binary and continuous processes and is essentially analytical, as no integrals need to be solved.
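For a bivariate Gaussian pair, Spearman's correlation is ρ_S = (6/π) arcsin(ρ_Z/2), so the parent-Gaussian correlation is ρ_Z = 2 sin(πρ_S/6). The sketch below applies this inversion lag by lag and checks it by simulation. It assumes the relationship in the (unrendered) equation above is this standard one; the function names and the example Spearman ACS are hypothetical.

```python
import math
import random


def gaussian_from_spearman(rho_s):
    """Parent-Gaussian correlation for a target Spearman correlation rho_s.

    Inverts rho_s = (6 / pi) * asin(rho_z / 2), the bivariate-normal link
    between Pearson and rank (uniform-score) correlation.
    """
    return 2.0 * math.sin(math.pi * rho_s / 6.0)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (ties ignored)."""
    def ranks(v):
        r = [0.0] * len(v)
        for k, i in enumerate(sorted(range(len(v)), key=v.__getitem__)):
            r[i] = float(k)
        return r

    rx, ry = ranks(x), ranks(y)
    m = (len(x) - 1) / 2.0  # mean rank, identical for rx and ry
    sxy = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    sxx = sum((a - m) ** 2 for a in rx)
    return sxy / sxx  # sxx equals syy when there are no ties


if __name__ == "__main__":
    rho_s_acs = [0.6, 0.4, 0.25]  # hypothetical Spearman ACS at lags 1-3
    print([round(gaussian_from_spearman(r), 4) for r in rho_s_acs])
```

No integral or fitted CTF is involved, which is what makes the rank-based variant essentially analytical.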

Such an approach might be useful for processes whose marginal distribution has infinite variance (e.g., power-type marginals with sufficiently heavy tails). In this case, the ACS of the parent Gaussian process that would reproduce the observed Pearson ACS cannot be defined, because Pearson's correlation itself is undefined when the variance is infinite. In contrast, the Gaussian ACS that reproduces the observed Spearman's ACS can be easily estimated by Equation 23. Clearly, a similar approach could focus on reproducing Kendall's ACS.

7.4 Copula-Based CoSMoS-2s

The framework of CoSMoS-2s can be extended to generate intermittent time series with nonzero values having the dependence structure of any desired copula. This extends past schemes, which generate copula-based time series with continuous marginals and are thus not applicable to intermittent processes such as rainfall and wind speed.

Briefly, an urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0365-dimensional copula is a function urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0366 that satisfies specific conditions and acts as a joint cdf urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0367 of uniform rv's urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0368 in urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0369. Following Sklar (1973), copulas are used to connect rv's and form their joint distribution. For example, a 2-dimensional joint cdf of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0370 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0371 is formed as urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0372, where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0373 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0374 are arbitrary marginals. For time series generation, copulas have mainly been used in econometrics. Characteristic studies and applications of copula-based time series include Darsow et al. (1992) on copulas and Markov processes, X. Chen and Fan (2006) on semiparametric time series models, Hofert (2008) and McNeil (2008) on sampling from Archimedean copulas, Ibragimov (2009) on higher-order Markov processes, Lee and Salas (2011) on generating annual flows, Ibragimov and Lentzas (2017) on copulas and long memory, among many others; see also the monographs of Joe (1997, 2014) and Nelsen (2006).

Here, we briefly describe copula-based Markov time series simulation; in this case, the value of the process at time urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0375 depends only on the value at time urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0376. Uniform Markov time series urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0377 can be generated based on the joint cdf urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0378, where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0379 is any valid copula and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0380 a parameter vector controlling the dependence structure. The method relies on sampling from the conditional distribution of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0381 given the value of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0382, which is given by:

Since this equation provides the cdf of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0384, it can be used to generate random values as urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0385, where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0386 is a random number sampled from a uniform distribution in urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0387. This formula is applied recursively to generate uniform time series having the dependence structure of the copula urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0388. The copula-based uniform time series urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0389 can be transformed into a time series urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0390 with any continuous marginal by urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0391, or into a binary time series urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0392 with any urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0393 by urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0394.
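The recursive conditional sampling has a closed form for the Clayton copula, C(u, v) = (u^(−θ) + v^(−θ) − 1)^(−1/θ) with θ > 0, whose conditional cdf can be inverted analytically. The following sketch generates a uniform Markov series this way and then maps it to a continuous and to a binary series; the parameter values, the Exponential marginal, and the function names are illustrative assumptions. Kendall's tau of the lag-1 pairs should be close to θ/(θ + 2), the theoretical value for the Clayton copula.

```python
import math
import random


def clayton_next(u_prev, w, theta):
    """Invert the Clayton conditional cdf C(v | u_prev) = w in closed form (theta > 0)."""
    return (u_prev ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)


def clayton_markov(n, theta, rng):
    """Uniform Markov series: each (u_{t-1}, u_t) pair follows a Clayton copula."""
    clip = lambda v: min(max(v, 1e-12), 1.0 - 1e-12)  # keep values inside (0, 1)
    u = [clip(rng.random())]
    for _ in range(n - 1):
        u.append(clayton_next(u[-1], clip(rng.random()), theta))
    return u


def kendall_tau(x, y):
    """O(n^2) Kendall's tau (no ties assumed), used here only as a check."""
    s, n = 0, len(x)
    for i in range(n):
        for j in range(i + 1, n):
            d = (x[j] - x[i]) * (y[j] - y[i])
            s += (d > 0) - (d < 0)
    return 2.0 * s / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(7)
    theta, p0 = 2.0, 0.8                 # hypothetical values
    u = clayton_markov(1500, theta, rng)
    x = [-math.log1p(-v) for v in u]     # continuous: Exponential(1) marginal
    b = [1 if v > p0 else 0 for v in u]  # binary with probability zero p0
    print("lag-1 Kendall tau:", kendall_tau(u[:-1], u[1:]), "theory:", theta / (theta + 2))
```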

Next, we apply this strategy to generate copula-based intermittent time series by coupling copula-based binary and continuous time series. We show two characteristic cases using copulas with different tail dependence.

Case 1. Clayton copula. We demonstrate intermittent time series simulation by coupling binary and continuous processes, both generated based on the Clayton (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0395) copula given by:

where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0397 controls the dependence structure, specifically the lower-tail dependence. The binary time series has urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0398 (Figure 8a) and is based on a Clayton copula with urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0399. The uniform time series transformed into binary shows the characteristic lower-tail dependence of the Clayton copula (scatter plot in Figure 8b). The Clayton-based (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0400) continuous time series (Figure 8c) has the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0401 marginal, and the lack of upper-tail dependence is clear in the high values (Figure 8d). The lower-tail dependence is also apparent in the time series (see e.g., the strong clustering of very low values before urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0402 in Figure 8c); in contrast, the lack of upper-tail dependence manifests as very weak clustering of high values, which resemble sporadic peaks. The intermittent time series (Figure 8e) is generated by multiplying the binary and continuous time series and preserves the desired marginal and the Clayton dependence structure within nonzero values. Clearly, we can also estimate the linear ACS's (Figure 8f), or even fit the copula to match the lag-1 Pearson autocorrelation; yet the purpose of using such copulas is to reproduce nonlinear dependence, so preserving a linear ACS may be of little relevance in this case.
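The coupling step of Case 1 amounts to an element-wise product. The sketch below, with hypothetical parameters and function names, generates two independent Clayton-based uniform series, thresholds one at p₀ to obtain the wet/dry sequence, maps the other through an assumed Exponential quantile, and multiplies them. By construction, the nonzero values are exactly the continuous values on wet steps.

```python
import math
import random


def clayton_series(n, theta, rng):
    """Uniform Markov series with Clayton lag-1 dependence (closed-form conditional inverse)."""
    clip = lambda v: min(max(v, 1e-12), 1.0 - 1e-12)
    u = [clip(rng.random())]
    for _ in range(n - 1):
        w = clip(rng.random())
        u.append((u[-1] ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta))
    return u


def intermittent_clayton(n, p0, theta_bin, theta_cont, q_pos, seed=None):
    """Couple a Clayton-based binary series with a Clayton-based continuous series.

    The two uniform series are generated independently; the binary one is
    thresholded at p0 and the continuous one is mapped through q_pos. Their
    product is zero on dry steps and equals the continuous value on wet steps.
    """
    rng = random.Random(seed)
    wet = [1 if u > p0 else 0 for u in clayton_series(n, theta_bin, rng)]
    cont = [q_pos(u) for u in clayton_series(n, theta_cont, rng)]
    return [b * c for b, c in zip(wet, cont)], wet, cont


if __name__ == "__main__":
    x, wet, cont = intermittent_clayton(
        20000, p0=0.8, theta_bin=2.0, theta_cont=1.0,  # hypothetical parameters
        q_pos=lambda u: -2.0 * math.log1p(-u),          # hypothetical Exponential marginal
        seed=5,
    )
    print("fraction dry:", x.count(0.0) / len(x))
```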

Case 2. Gumbel copula. We repeat the previous demonstration, keeping the same urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0403 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0404 marginal, but now the binary and continuous time series are generated based on the Gumbel (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0405) copula given by:

where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0407 controls the upper-tail dependence. Gumbel-based (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0408) uniform time series are generated (see the characteristic strong upper-tail dependence in Figure 8h) and transformed into binary time series (Figure 8g). Compared with the Clayton case, the Gumbel-based (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0409) continuous time series (Figure 8i) clearly shows weaker clustering of low values and stronger clustering of high values. This is also shown in the lag-1 scatter plot (Figure 8j), where the upper-tail dependence is clear and highlights the difference from the Clayton structure (see Figure 8d). The Gumbel-based intermittent time series (Figure 8k) preserves the desired marginal and the Gumbel dependence structure; for demonstration, we show their linear ACS's (Figure 8l), which preserve the theoretical relationship in Equation 16.
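Unlike the Clayton case, the conditional cdf of the Gumbel copula, C(u, v) = exp{−[(−ln u)^θ + (−ln v)^θ]^(1/θ)} with θ ≥ 1, does not have a simple closed-form inverse, so each step can be solved numerically. The sketch below uses bisection, which is robust because the conditional cdf increases monotonically in v; the tolerance, parameters, and function names are assumptions. Kendall's tau of the lag-1 pairs should approach 1 − 1/θ.

```python
import math
import random


def gumbel_h(v, u, theta):
    """Conditional cdf C(v | u) = dC(u, v)/du of the Gumbel copula (theta >= 1)."""
    a, b = -math.log(u), -math.log(v)
    s = a ** theta + b ** theta
    c = math.exp(-(s ** (1.0 / theta)))  # copula value C(u, v)
    return c * s ** (1.0 / theta - 1.0) * a ** (theta - 1.0) / u


def gumbel_next(u_prev, w, theta, tol=1e-10):
    """Solve C(v | u_prev) = w for v by bisection."""
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gumbel_h(mid, u_prev, theta) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gumbel_markov(n, theta, rng):
    """Uniform Markov series with Gumbel lag-1 dependence."""
    clip = lambda v: min(max(v, 1e-12), 1.0 - 1e-12)
    u = [clip(rng.random())]
    for _ in range(n - 1):
        u.append(gumbel_next(u[-1], clip(rng.random()), theta))
    return u


def kendall_tau(x, y):
    """O(n^2) Kendall's tau (no ties assumed), used here only as a check."""
    s, n = 0, len(x)
    for i in range(n):
        for j in range(i + 1, n):
            d = (x[j] - x[i]) * (y[j] - y[i])
            s += (d > 0) - (d < 0)
    return 2.0 * s / (n * (n - 1))


if __name__ == "__main__":
    u = gumbel_markov(1500, theta=2.0, rng=random.Random(13))  # hypothetical theta
    print("lag-1 Kendall tau:", kendall_tau(u[:-1], u[1:]), "theory:", 1 - 1 / 2.0)
```

For θ = 1 the Gumbel copula reduces to independence and C(v | u) = v, which is an easy consistency check of `gumbel_h`.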

An important advantage of the copula-based CoSMoS-2s over the one-state approach is that the nonzero values of the intermittent time series preserve the copula dependence exactly. To clarify, in the one-state approach, intermittency would be introduced by applying a mixed-type quantile (see Equation A10) to a single copula-based uniform time series. In this case, however, the nonzero values do not preserve the copula dependence structure exactly. Uniform values urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0410 are mapped to zero, while those urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0411 are mapped to nonzero values using the target marginal's quantile (see Equation A10). Thus, this mapping reproduces the copula dependence for urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0412, which does not coincide with the complete copula dependence expressed for urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0413 (see e.g., the characteristic Clayton copula structure in Figure 8b, where low and high values have very different dependence).


Figure 8. Intermittent time series generation using the copula-based CoSMoS-2s model and the Clayton and Gumbel copulas. (a) Clayton-based binary time series with urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0414, and (b) the scatter plot of the uniform time series prior to the transformation to binary (dark red regions indicate high density); (c) Clayton-based continuous time series with a urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0415 marginal, and (d) the lag-1 scatter plot of its values; (e) intermittent time series obtained by coupling the binary and continuous ones, and (f) their ACS's. (g–l) Same as (a–f) for the Gumbel copula.

The previous demonstration shows that copula-based binary series can also be generated. At this stage, this approach is experimental, and its operational use needs further research to identify how a copula should be calibrated to a binary time series. For example, two copulas with different tail dependence, even if calibrated to have the same ACS (a measure easily estimated from the observed binary series), could lead to very different wet/dry spell behavior. The best bet is to focus on the profile of wet/dry spells: one should identify how the copula dependence parameter (for a given probability zero) affects the occurrence probabilities of wet/dry spells, and exploit this to calibrate the copula parameter so that it matches the observed wet/dry spell profile. The “secret life” of copula-based binary time series and their link with transition matrices and the behavior of wet/dry spells will be the topic of a future communication. Until more is revealed on this topic, one could generate copula-based intermittent time series by coupling binary time series that reproduce the observed ACS (Section 6) with continuous copula-based time series whose copula is calibrated to nonzero rainfall using any of the available fitting methods (see e.g., Joe, 1997, 2014; Nelsen, 2006).

7.5 Operational Use and Parsimony

The previous sections focused on modeling strategies; here, we focus on operational use and calibration, mainly of CoSMoS-2s (Section 7.2), although most of the points made are valid for all CoSMoS variants.

7.5.1 Seasonality

Rainfall, like most hydroclimatic processes, exhibits seasonal variation. Typically, we assume that statistical characteristics do not change within each season and calibrate the model parameters on a seasonal basis. How many seasons to use is not a simple question. The optimal number of seasons and their individual lengths within a year can be subjective; one can apply clustering algorithms to estimate the number and lengths of seasons, but the estimates depend on the objective functions used in such algorithms. A common “out-of-the-box” approach is to treat each month as a different season and calibrate the model monthly. Once the model parameters are estimated for each month (or for each season in general), the model is applied continuously by switching its parameters cyclically. To clarify, the AR model urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0416 generating the Gaussian values runs continuously for urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0417, where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0418 is the desired time series length, yet the parameters urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0419 depend on the value of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0420. For example, if urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0421 falls within the hours of January, the parameters calibrated for January are used; once urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0422 reaches the first hour of February, the parameters switch to those calibrated for February, and so on. The same approach can be applied to all CoSMoS variants (e.g., the copula parameter can change monthly).
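Cyclic parameter switching can be sketched by mapping each hourly step to its calendar month. For brevity, the sketch below uses a Gaussian AR(1) with a month-dependent coefficient instead of the higher-order AR models used in CoSMoS; the innovation variance 1 − φ² keeps the marginal variance at one across switches. The start date, the coefficients, and the function names are hypothetical.

```python
import math
import random
from datetime import datetime, timedelta


def month_of_hour(start, t):
    """Calendar month (1-12) of hourly step t counted from `start`."""
    return (start + timedelta(hours=t)).month


def seasonal_ar1(n, start, phi_by_month, rng):
    """Gaussian AR(1) that runs continuously while its coefficient switches monthly.

    With innovation variance 1 - phi^2, a unit-variance z_{t-1} yields a
    unit-variance z_t whatever the current month's phi is.
    """
    z = [rng.gauss(0.0, 1.0)]
    for t in range(1, n):
        phi = phi_by_month[month_of_hour(start, t)]
        z.append(phi * z[-1] + math.sqrt(1.0 - phi * phi) * rng.gauss(0.0, 1.0))
    return z


if __name__ == "__main__":
    start = datetime(1900, 1, 1)                             # hypothetical start
    phi = {m: 0.55 + 0.02 * m for m in range(1, 13)}         # hypothetical monthly values
    z = seasonal_ar1(24 * 365, start, phi, random.Random(0))
    print(len(z), month_of_hour(start, 31 * 24))             # first hour of February
```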

7.5.2 Choice of Marginal

Potentially, many different marginals can be used to describe nonzero rainfall (as for other hydroclimatic variables). It is up to the user to assess which one describes the data well. Distributions with one scale and two shape parameters (see Section 2) offer enough flexibility for most cases; distributions with more than three parameters should be used with caution. The user, however, might choose simpler forms, for instance, marginals with one scale and one shape parameter, if they perform adequately. Similarly, any fitting method can be applied, but the method of moments should be used with caution for skewed samples and when higher-order moments are involved in parameter estimation. A related question is whether all the parameters of the marginal need to change in every season. For instance, if there is high uncertainty in the parameter controlling the heaviness of the tail (which is typically the case), then its value can be fixed for all seasons, either estimated from the whole sample or informed by a regional analysis.

7.5.3 Choice of ACS

The comments made for selecting and calibrating the marginal distributions also apply to selecting and calibrating the ACS's. It is up to the user to decide whether a two-parameter ACS is needed for both the binary and the continuous process, and whether all of their parameters need to change seasonally. For example, the shape parameter of the ACS could be fixed for all seasons, allowing changes only in the scale parameter to better match the short-term dependence. Of course, the copula-based CoSMoS does not reproduce a linear ACS but can be calibrated to match, for example, the lag-1 dependence.

7.5.4 Order of AR

The calibrated ACS's are reproduced by AR models of order urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0423. As a rule of thumb, the order should be large enough to reproduce the ACS; a precise answer cannot be given. For example, one could choose urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0424 so that the ACS value at urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0425 is close to zero (e.g., less than 0.05). Note that all the parameters of the AR model are derived analytically from the ACS (see Section 3); thus, an AR model fitted to a two-parameter ACS is always a two-parameter model, regardless of the order urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0426.
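The rule of thumb for the AR order is easy to automate. The sketch below returns the smallest lag at which a given ACS falls below a threshold (0.05 by default); the Weibull-type ACS form exp[−(τ/b)^c] and its parameter values are assumptions for illustration.

```python
import math


def weibull_acs(tau, b, c):
    """Weibull-type ACS, assumed form rho(tau) = exp(-(tau / b)**c)."""
    return math.exp(-((tau / b) ** c))


def choose_ar_order(acs, threshold=0.05, max_order=10_000):
    """Smallest lag p with acs(p) < threshold (a rule of thumb, not a formal criterion)."""
    for p in range(1, max_order + 1):
        if acs(p) < threshold:
            return p
    raise ValueError("ACS stays above the threshold up to max_order")


if __name__ == "__main__":
    p = choose_ar_order(lambda tau: weibull_acs(tau, b=3.0, c=0.7))
    print("AR order:", p)
```

With b = 3 and c = 0.7, ρ(14) ≈ 0.053 and ρ(15) ≈ 0.046, so the rule selects p = 15; the resulting AR(15) still has only the two ACS parameters as free parameters.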

7.5.5 Parsimony

Calibrating or building a model is partly art, in the sense that many choices depend on the “eye of the beholder”. Yet seeking parsimony in models, following Occam's philosophical razor, seems to have become a “naïve” panacea. As far as the author knows, there has never been evidence that the principle of parsimony is itself an irrefutable scientific result. Any model, at least theoretically, can become parsimonious, for example, by fixing parameter values, and this does not imply that it is a good model. Building a model, selecting its components, or comparing models based on the principle of parsimony is not straightforward. Parsimony is inevitably linked with specific conditions and is meaningful only if assessed against the desired model outputs. For instance, (a) if two models with different numbers of parameters reproduce the same desired characteristics, then the more parsimonious one may be considered better; (b) if two models with the same number of parameters reproduce the same desired characteristics but one of them also reproduces additional features, then that one may be considered better. But if two models with the same number of parameters reproduce an equal number of desired, but different, characteristics, which one is better? If two models with different numbers of parameters both reproduce some desired properties, but the one with extra parameters also reproduces extra desired properties, which one is better? Model selection is a scientific field in itself (see e.g., the review of Nerantzaki & Papalexiou, 2022), but such questions cannot be easily answered; they rely on defining the desired model outputs and the characteristics considered more important than others. In a nutshell, a general goal is to achieve the desired model output with the minimum number of parameters.

Regarding CoSMoS models, their structure allows one to select their components and control the number of parameters used. For example, as a general case, it was suggested that a two-parameter ACS and a three-parameter marginal per season offer enough flexibility for hydroclimatic processes. Yet it is up to the user to decide whether more parsimonious versions achieve the desired outputs. For example, one can use a Markovian ACS (one parameter) for both the binary and continuous processes, the Exponential distribution (one parameter), and four seasons, ending up with a twelve-parameter model. For two- and three-parameter ACS's and marginals, respectively, and assuming monthly seasonality, the model will end up with 72 parameters. As mentioned above, various approaches can be applied to reduce the number of parameters if desired (e.g., a minimum number of seasons, sinusoidal variation of the marginal's scale parameter, a fixed tail parameter informed by global or regional studies, a single ACS for the binary process and a single ACS for the continuous process across seasons, etc.).

8 Rainfall Simulation in Action

8.1 Simulating and Assessing Hourly Rainfall Time Series

We compare the CoSMoS-1s and -2s models using four long records of hourly rainfall. These simulations demonstrate the modeling strategies; their aim is not to identify the most parsimonious version of these models. In the main text, we present results for a century-long hourly rainfall record from the Philadelphia airport station (Figure 9a); the results for all stations are given in Section S2 in Supporting Information S1. The observed time series is analyzed monthly. The urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0427 distribution is used to describe the nonzero values for each month (Figure 9d, Figure S3 in Supporting Information S1), and the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0428 ACS is used to represent the empirical autocorrelations of the binary, continuous, and intermittent processes (Figure 9g, Figure S6 in Supporting Information S1).


Figure 9. (a) Observed hourly rainfall (1900–2011) at the Philadelphia airport station; (b) one-state, and (c) two-state simulated time series; (d) the fitted urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0429 distribution to January nonzero rainfall compared with the observed distribution; (e, f) the target urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0430 distribution and its 95% confidence interval (gray region) compared with the simulated one; (g) fitted intermittent urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0431, binary urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0432 and continuous urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0433 autocorrelation structures; (h, i) the target autocorrelation structures compared with the simulated ones. See the Figures in Section S2 in Supporting Information S1 for all months and stations.

The CoSMoS-1s simulated time series (Figure 9b), as expected, preserves the target marginal distributions (Figure 9e, Figure S4 in Supporting Information S1). However, the simulated ACS's of the binary and continuous processes generally do not match the target ones (Figure 9h, Figure S7 in Supporting Information S1). The autocorrelation of the binary process is underestimated: the simulated one is weaker than the observed (target). In contrast, the autocorrelation of the continuous process appears to agree with the observed one (except for October and December), although it is not explicitly preserved in CoSMoS-1s. The CoSMoS-2s simulated time series (Figure 9c), apart from preserving the target marginal distributions (Figure 9f, Figure S5 in Supporting Information S1), explicitly reproduces the ACS of the binary and continuous processes (blue and green dots, respectively, in Figure 9i and Figure S8 in Supporting Information S1). However, the intermittent ACS appears weaker in many months. These observations hold for the simulations at all stations; see the Figures in Section S2 in Supporting Information S1. To avoid any misunderstanding, we stress that the models are applied on a continuous basis as described in Section 7.5; thus, the hourly simulated time series (Figures 9b and 9c) span all months and years.

Differences between the two simulated time series are better demonstrated by comparing the probabilities of wet and dry spells (Figures 10a and 10b). As expected given its stronger binary ACS, the CoSMoS-2s simulated time series assigns a higher probability to long wet and long dry spells, and in general it accurately reproduces the distribution of wet and dry spells (Figures 10a and 10b). The better representation of wet/dry spells also affects the scaling of probability zero: the stronger binary ACS results in a slower decrease of probability zero with scale (Figure 10c). Similarly, when the two simulated time series are aggregated at coarser time scales, the marginal distributions of their nonzero values differ, especially at the largest scales. This is demonstrated, for example, by comparing distributional shape measures (here L-skewness τ3 and L-kurtosis τ4; see Figure 10d). The decrease of τ3 and τ4 with time scale is faster in the CoSMoS-2s simulation and matches the observed scaling (Figure 10d). Additionally, the box plots of nonzero rainfall values at different temporal scales (Figure 10e) reveal the superior performance of CoSMoS-2s, especially at scales ranging from 2 hr up to 2 days. The same evidence is found in the simulations for all stations studied (see the Figures in Section S2 in Supporting Information S1).
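The diagnostics compared in Figure 10 (wet/dry spell lengths, probability zero across aggregation scales, and L-moment ratios of nonzero totals) can be computed along the following lines. This is a minimal sketch in Python using only the standard library, not the code behind the figure; the function names are ours, "wet" is assumed to mean a value strictly above zero, aggregation uses non-overlapping blocks, and the L-moment ratios use the usual unbiased probability-weighted-moment estimators.

```python
from itertools import groupby


def spell_lengths(series, threshold=0.0):
    """Lengths of consecutive wet (> threshold) and dry (<= threshold) runs."""
    wet, dry = [], []
    for is_wet, run in groupby(series, key=lambda v: v > threshold):
        (wet if is_wet else dry).append(sum(1 for _ in run))
    return wet, dry


def aggregate(series, k):
    """Totals over non-overlapping blocks of k steps (a trailing partial block is dropped)."""
    return [sum(series[i:i + k]) for i in range(0, len(series) - k + 1, k)]


def prob_zero(series):
    """Empirical probability of a zero (dry) value."""
    return sum(1 for v in series if v <= 0.0) / len(series)


def l_ratios(values):
    """Sample L-skewness (tau3) and L-kurtosis (tau4) from unbiased PWMs b0..b3."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 values")
    b = [0.0, 0.0, 0.0, 0.0]
    for i, xi in enumerate(x):  # i is the 0-based rank in ascending order
        w = 1.0
        for r in range(4):
            b[r] += w * xi
            if r < 3:
                w *= (i - r) / (n - 1 - r)
    b0, b1, b2, b3 = (v / n for v in b)
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l3 / l2, l4 / l2


def multiscale_summary(hourly, scales):
    """Probability zero and (tau3, tau4) of nonzero totals at each aggregation scale."""
    out = {}
    for k in scales:
        agg = aggregate(hourly, k)
        nonzero = [v for v in agg if v > 0.0]
        out[k] = (prob_zero(agg), *l_ratios(nonzero))
    return out


print(spell_lengths([0, 1.2, 0.5, 0, 0, 3, 0]))  # ([2, 1], [1, 2, 1])
```

Applied to observed and simulated hourly series over scales such as 1, 2, 4, ..., 336 hr, `multiscale_summary` yields the kind of curves shown in Figures 10c and 10d.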


Comparison of observed and simulated (CoSMoS-1s and CoSMoS-2s) hourly rainfall at the Philadelphia International Airport station. (a) Probability of wet and (b) dry spells; (c) scaling of probability zero, and (d) of L-skewness τ3 and L-kurtosis τ4; (e) box plots of nonzero rainfall at different temporal scales (whiskers indicate the 95% empirical range). See the Figures in Section S2 in Supporting Information S1 for all stations.

8.2 Assessment of Extremes at Multiple Scales

Any CoSMoS model variant, by definition, reproduces the marginal distribution fitted to nonzero values at the scale at which it is calibrated (here hourly, but any time scale can be used). If the fitted marginal describes the behavior of rainfall well, then the model also reproduces the tail properties and thus the behavior of extremes. This highlights the importance of selecting an appropriate distribution to describe nonzero rainfall. However, there are two points of caution: (a) a distribution might appear to describe the observations well, but this does not guarantee that its tail precisely reproduces extremes, because fitting methods tend to reproduce properties of the main body of the observations, while the precise asymptotic tail behavior (or tail type) cannot be easily assessed; and (b) reproducing extremes accurately at a single time scale does not imply the same accuracy at coarser time scales. As previously shown, the structure of wet/dry spells and the strength of the ACS affect the properties of the process at larger scales, and thus its extremes too. For example, a very strong ACS (or strong upper tail dependence) in nonzero rainfall leads to clustering of high values, which in turn leads to larger extremes at larger time scales compared to a weak ACS; the same holds for a process with longer wet spells or a binary process with a strong ACS.
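The clustering argument in point (b) can be checked with a small Monte Carlo experiment of our own (not one from the paper): two series share exactly the same Exp(1) marginal, obtained by mapping a Gaussian AR(1) parent through the exponential quantile, but differ in their dependence. The AR(1) parent, the Exp(1) marginal, the lag-1 parameters, and the 24-step window are arbitrary illustrative choices.

```python
import math
import random


def exp_ar1(n, phi, seed):
    """Gaussian AR(1) parent with lag-1 correlation phi, mapped to Exp(1) marginals."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - phi * phi)
    z = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        z = phi * z + s * rng.gauss(0.0, 1.0)
        # Exp(1) quantile of Phi(z) is -log(1 - Phi(z)), and 1 - Phi(z) = erfc(z / sqrt(2)) / 2
        out.append(-math.log(0.5 * math.erfc(z / math.sqrt(2.0))))
    return out


def max_total(series, window):
    """Largest total over non-overlapping windows of the given length."""
    return max(sum(series[i:i + window]) for i in range(0, len(series) - window + 1, window))


n, window = 48_000, 24
weak = exp_ar1(n, phi=0.0, seed=1)
strong = exp_ar1(n, phi=0.9, seed=2)
print("max single-step value:", round(max(weak), 1), round(max(strong), 1))
print("max 24-step total    :", round(max_total(weak, window), 1), round(max_total(strong, window), 1))
```

Because the marginals are identical, the single-step maxima should be of similar size, whereas the largest 24-step totals should be markedly larger for the strongly autocorrelated series.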

Here, we compare and assess the performance of CoSMoS-1s and -2s in reproducing the observed annual maxima at scales ranging from 1 hr up to 14 days. The annual maxima (from observations and simulations) at each scale are extracted by accumulating the hourly values over a sliding window whose duration equals the investigated scale and picking the maximum of each year (e.g., Papalexiou et al., 2016; van Montfort, 1990). Another common approach accumulates values over non-overlapping windows; because a storm straddling two adjacent windows is split between them, this approach underestimates annual maxima and should be avoided. Once the observed annual maxima at each scale are identified, a urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0440 distribution is fitted and its 95% confidence interval (CI) is estimated (gray regions in Figure 11). Simulated annual maxima falling within the 95% CI at each scale can be considered well reproduced. Note that the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0441 95% CI is constructed assuming that the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0442 fitted to the observed maxima is the true one. Clearly, the true underlying urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0443 distribution is not known; thus, the actual 95% CI can be even broader, since the observed maxima might have emerged from a different urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0444 than the fitted one. Hence, this assessment does not favor the models.
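A minimal sketch of the two extraction approaches follows, assuming each year's hourly record is complete, at least d values long, and processed separately (so windows do not span the turn of the year, a simplification that a full implementation may handle differently). The function names are ours. The toy example shows why disjoint blocks underestimate maxima: a burst split across two blocks is never summed as a whole.

```python
def window_totals(x, d, sliding=True):
    """Totals of d consecutive values: every window (sliding) or disjoint blocks."""
    if d > len(x):
        return []
    if not sliding:
        return [sum(x[i:i + d]) for i in range(0, len(x) - d + 1, d)]
    total = sum(x[:d])
    out = [total]
    for i in range(d, len(x)):
        total += x[i] - x[i - d]  # slide the window one step forward
        out.append(total)
    return out


def annual_maxima(hourly_by_year, d, sliding=True):
    """Maximum d-hour total in each year; hourly_by_year is a list of per-year lists."""
    return [max(window_totals(year, d, sliding)) for year in hourly_by_year]


# Toy year: a 2-hr burst straddles the boundary between two disjoint 2-hr blocks.
year = [0.0, 5.0, 5.0, 0.0]
print(annual_maxima([year], 2, sliding=True))   # [10.0]
print(annual_maxima([year], 2, sliding=False))  # [5.0]
```

The running total keeps the sliding extraction linear in the record length, which matters when scanning century-long hourly records at many durations.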


Comparison of observed and simulated (CoSMoS-1s and CoSMoS-2s) annual maxima at different temporal scales at the Philadelphia International Airport station. The gray region shows the 95% confidence interval based on the urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0445 fitted to the observed maxima. See the corresponding Figures in Section S2 in Supporting Information S1 for all stations.

Even so, the empirical distributions of annual maxima simulated by CoSMoS-2s lie within the 95% CI at all scales and are very close to the empirical distribution of the observed maxima (Figure 11); the same holds for all stations simulated (see Figures in Section S2 in Supporting Information S1). CoSMoS-1s tends to predict larger exceedance probabilities than those observed or simulated by CoSMoS-2s, mainly at scales ranging from 4 hr to 2 days. Its deviations in simulating annual maxima at these scales also match the deviations in the scaling of probability dry, the L-ratios, and the box plots of nonzero values (Figures 10c–10e), which are evident at the same scales; this also applies to the other three stations. Overall, improving the structure of wet/dry spells in CoSMoS-2s improved the simulation of extremes at every time scale tested. This is an important improvement, considering that the model is calibrated at a single scale using the whole sample of values.

9 Insights and Future Quests

9.1 One- and Two-State Comparison

Comparing the CoSMoS-1s and -2s rainfall generation raises several points for discussion. Reproducing the linear ACS of the intermittent process does not explicitly reproduce the ACS's of the binary and continuous processes, which affects the length of wet/dry spells and the correlations within wet spells. The theoretical analysis shows that the same intermittent ACS can result from different combinations of binary and continuous processes; thus, reproducing the intermittent ACS does not imply matching the ACS's of the binary and continuous processes. The degree of matching, or when and whether reproducing the intermittent ACS is sufficient, has not been studied. The author speculates that at larger time scales (e.g., daily) it provides decent results; however, at fine scales (e.g., hourly or sub-hourly) the correlations of wet/dry spells and those within wet spells can be accurately described only by individual ACS's, and thus CoSMoS-2s performs better.

An intriguing question is why explicitly reproducing the ACS's of the binary and continuous processes does not exactly reproduce the intermittent ACS. For several months there are clear deviations between the observed and simulated intermittent ACS's. The theoretical relationship (Equation 16) linking the intermittent ACS with the binary and continuous ACS's holds whenever the binary and continuous processes are independent. Nature, of course, does not provide separate binary and continuous time series with which to test this hypothesis directly. We can, however, assess the cross-correlation between the binary and intermittent time series (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0446) in observations and simulations (Figure 12a). The monthly analysis at the four investigated stations (48 points in Figure 12a) shows that in most cases the observed urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0447 is higher than the simulated urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0448 (points above the diagonal). If this explains the deviations between the observed (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0449) and simulated (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0450) intermittent ACS's, then the deviations should be larger for larger urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0451. To assess this argument, we define an error measure as urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0452 and compare it with the observed urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0453 in each month and station (48 points in Figure 12b). The results indeed confirm that larger errors correspond to larger urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0454. Additionally, we performed a Monte Carlo simulation of binary and continuous time series with different levels of cross-correlation and estimated the error urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0455 between the simulated intermittent ACS and the one predicted by Equation 16.
As expected, there is no error when the time series are independent, but the error grows as the cross-correlation increases (Figure 12c).
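Equation 16 itself is not reproduced in this section. As a sketch, assume the intermittent process is the product X_t = B_t Y_t of a binary process B (wet probability p1 = 1 − p0, lag-τ correlation ρ_B(τ)) and an independent continuous process Y (mean μ, variance σ², lag-τ correlation ρ_Y(τ)). Independence gives E[X_t X_{t+τ}] = E[B_t B_{t+τ}] E[Y_t Y_{t+τ}], from which

ρ_X(τ) = [p0 ρ_B(τ) (σ² ρ_Y(τ) + μ²) + p1 σ² ρ_Y(τ)] / (σ² + p0 μ²).

This is our own derivation and may be arranged differently from Equation 16. The Python check below (an illustration with arbitrary choices: Gaussian AR(1) parents, p1 = 0.2, and an Exp(1) continuous marginal) compares the formula with the empirical lag-1 correlation of simulated independent series.

```python
import math
import random
from statistics import NormalDist


def gaussian_ar1(n, phi, rng):
    """Stationary Gaussian AR(1) series with lag-1 correlation phi."""
    s = math.sqrt(1.0 - phi * phi)
    z = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        z = phi * z + s * rng.gauss(0.0, 1.0)
        out.append(z)
    return out


def lag_corr(x, k):
    """Sample Pearson correlation between x[t] and x[t + k]."""
    a, b = x[:-k], x[k:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def intermittent_acf(p0, mu, var, rho_b, rho_y):
    """Lag-tau correlation of X = B * Y for independent binary B and continuous Y."""
    return (p0 * rho_b * (var * rho_y + mu ** 2) + (1 - p0) * var * rho_y) / (var + p0 * mu ** 2)


rng = random.Random(42)
n, p1 = 100_000, 0.2
cut = NormalDist().inv_cdf(1 - p1)  # Gaussian threshold giving P(wet) = p1
B = [1.0 if z > cut else 0.0 for z in gaussian_ar1(n, 0.8, rng)]
# Independent continuous process with Exp(1) marginal
Y = [-math.log(0.5 * math.erfc(z / math.sqrt(2.0))) for z in gaussian_ar1(n, 0.6, rng)]
X = [b * y for b, y in zip(B, Y)]

p0 = 1 - sum(B) / n
mu = sum(Y) / n
var = sum((y - mu) ** 2 for y in Y) / n
predicted = intermittent_acf(p0, mu, var, lag_corr(B, 1), lag_corr(Y, 1))
print("empirical:", round(lag_corr(X, 1), 3), "formula:", round(predicted, 3))
```

Correlating the two Gaussian parents (for example, by sharing part of their innovations) breaks the independence assumption, and the gap between the two numbers should then widen, in line with Figure 12c.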


Exploring deviations between observed and simulated intermittent ACS's. (a) Observed cross-correlations between binary and intermittent time series (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0456) are in general higher than the simulated urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0457 (points above the diagonal); (b) the error urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0458 between the observed and simulated ACS's increases with larger urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0459; (c) Monte Carlo results verifying that urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0460 increases when the binary and continuous time series are cross-correlated.

In theory, we could modify the model to generate cross-correlated binary and continuous processes that also reproduce the intermittent ACS exactly; for example, Equation 16 could probably be generalized for correlated processes, and Goodman (1960) offers a complex variance expression for the correlated case. However, this would complicate the model's operational use (large-scale application, implementation speed, extension to multisite simulation, etc.) with no clear benefits. CoSMoS-2s reproduces the structure of wet/dry spells, the probability dry, the distribution of nonzero values, and the behavior of extremes over a wide range of time scales, and it is not clear which aspects would be further improved if the intermittent ACS were additionally preserved. This point deserves further investigation, as the mismatch could be an artifact of non-stationarities in storm behavior (e.g., lower values at the beginning and end of a storm), remaining seasonality within each month, or other causes. In any case, there is always a trade-off in building models with operational functionality.

9.2 Reproducing Characteristics at Multiple Scales

CoSMoS-2s is more accurate than CoSMoS-1s (and more complex as it uses an additional parametric ACS) since it is calibrated to reproduce the ACS's of the binary and continuous processes. This improves its performance in reproducing the wet/dry spells and the correlations within wet spells; in turn, this leads to mimicking the process more precisely at multiple scales (see Figure 10).

However, an appealing question is whether reproducing any form of ACS (linear or nonlinear) can lead to exact multiscale representation. Reproducing the ACS acts as a proxy for reproducing the joint distribution of the process. A valid postulate would be that precise reproduction of the joint distribution at a single scale would lead to precise reproduction of statistical properties at all larger scales. Yet the extent to which the true joint distribution of rainfall can be approximated by reproducing ACS's and marginal distributions needs further investigation. For example, the binary process in CoSMoS-2s clearly contributes to improving the simulation at multiple scales. Theoretically, a binary process can be fully characterized by a transition matrix that defines the occurrence probability of any binary sequence. This is ideal but not parsimonious; a transition matrix estimated from data can have thousands of parameters. Whether transition matrices emerging from copula-based binary processes, such as those generated here, match observed transition matrices is unexplored and will be the topic of a future communication.
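One way to see where the "thousands of parameters" come from is to treat the wet/dry sequence as a Markov chain of order k, an assumption used here only for illustration: each of the 2^k possible histories needs its own conditional wet probability. The sketch below (function names are ours) estimates such a table from a 0/1 sequence and counts its free parameters; an order-12 chain, that is, 12 hr of memory at hourly resolution, already requires 4096 conditional probabilities.

```python
from collections import Counter


def transition_table(bits, order):
    """Estimate P(next state = 1 | previous `order` states) from a 0/1 sequence."""
    seen, wet_next = Counter(), Counter()
    for t in range(order, len(bits)):
        history = tuple(bits[t - order:t])
        seen[history] += 1
        wet_next[history] += bits[t]
    return {h: wet_next[h] / seen[h] for h in seen}


def n_parameters(order):
    """Free parameters of an order-k binary chain: one wet probability per history."""
    return 2 ** order


print(transition_table([0, 0, 1, 1, 0, 1], order=1))  # {(0,): 0.6666666666666666, (1,): 0.5}
for k in (1, 6, 12, 24):
    print(k, n_parameters(k))
```

Histories that never occur in the record are simply absent from the estimated table, which is one practical symptom of the parsimony problem: long records are needed before high-order tables are reliably populated.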

Disaggregation schemes have been the popular technique for reproducing characteristics at multiple scales. Yet such schemes are not always easily applicable and still have limitations. Ideally, competent simulation at all scales should be achieved by simulating at a single scale, which entails capturing the "soul" of the process. We can, however, exploit the improved performance of CoSMoS-2s by using it as a disaggregation kernel in the DiPMaC scheme (Papalexiou, Markonis, et al., 2018) to further enhance multiscale performance. DiPMaC is constrained to reproduce the intermittent ACS at a fine scale and match totals at a larger scale, but it is not constrained to reproduce wet/dry spells; thus, its performance could be markedly improved by coupling it with CoSMoS-2s.

9.3 One-, Two-, Three-… Multi-State Processes

The two-state approach can be generalized to simulate processes conceptualized as having multiple states, where each state is expressed by a specific occurrence probability. For example, a binary process can model two states (here dry as 0 and wet as 1); for rainfall, the dry-state value coincides with the actual rainfall value (zero). More generally, we can use a process with a discrete marginal to describe, for example, the probabilities urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0461, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0462, and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0463 of three states indicated by arbitrary values (e.g., 1, 2, 3). Each state can then be simulated by another process (e.g., with continuous marginals). Technically, an urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0464-state process, generalizing the binary simulation framework (Section 6), can be simulated by transforming a parent Gaussian (or non-Gaussian) process. For example, a three-state process can be formed by mapping Gaussian values as urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0465, where urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0466 is the z-score corresponding to the probability indicated by the subscript. This approach could improve the performance, for example, of the CoSMoS-1s suggested by Papalexiou and Serinaldi (2020) for processes having a mixed-type marginal with probability masses at two points, that is, at a minimum and a maximum. Such variables appear frequently in nature and include cloud cover, water depth in natural reservoirs, etc.
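A minimal sketch of the three-state mapping follows, assuming the thresholds sit at the z-scores of the cumulative probabilities p1 and p1 + p2 (our reading of the mapping, since the original expression did not survive extraction; names are ours). The parent here is a Gaussian AR(1), chosen only for illustration; any parent Gaussian process with a target correlation structure could be used, and the state frequencies depend only on the thresholds.

```python
import math
import random
from statistics import NormalDist


def gaussian_ar1(n, phi, seed):
    """Stationary Gaussian AR(1) parent with lag-1 correlation phi."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - phi * phi)
    z = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        z = phi * z + s * rng.gauss(0.0, 1.0)
        out.append(z)
    return out


def to_states(z_values, probs):
    """Map Gaussian values to states 1..m whose occurrence probabilities are probs."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    nd, cum, cuts = NormalDist(), 0.0, []
    for p in probs[:-1]:
        cum += p
        cuts.append(nd.inv_cdf(cum))  # z-score of the cumulative probability
    # State 1 below the first cut, state 2 between cuts, ..., state m above the last cut
    return [1 + sum(z > c for c in cuts) for z in z_values]


probs = (0.6, 0.3, 0.1)
states = to_states(gaussian_ar1(100_000, 0.7, seed=3), probs)
freqs = [states.count(s) / len(states) for s in (1, 2, 3)]
print([round(f, 3) for f in freqs])  # close to (0.6, 0.3, 0.1)
```

The same function with two probabilities reduces to the binary (dry/wet) case.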

9.4 Advancing Algorithmic Implementation

A model with more parameters, in general, explicitly reproduces more features. Yet more parameters increase complexity and jeopardize parsimony. Modeling the binary and continuous processes explicitly adds two parameters (if a two-parameter binary ACS is used). However, in practice, this approach facilitates simulation, since analytical approximations bypass the need for numerical estimation of correlation transformation integrals. The characteristics of the binary process are now assessed by analytical equations (Section 6). Similar approximations can be formed for the continuous process. The correlation decrease that Gaussian variables undergo when transformed to follow the desired marginals (see Section 4) depends only on the shape parameters of the target distribution; location and scale parameters have no effect, since their transformation is linear and Pearson correlation is invariant under positive linear transformations. Thus, if the target marginal has only one shape parameter urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0468, we can create parametric (or nonparametric) interpolation functions urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0469 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0470 (following Section 6). Such functions readily provide the CTF (Equation 12) parameters urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0471 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0472 for any value of the target distribution's shape parameter urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0473. If the target distribution has two shape parameters urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0474 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0475, it is still technically feasible to form bivariate functions (surfaces) urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0476 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0477 to estimate the CTF parameters.
For example, the parameters urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0478 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0479 are estimated on a grid of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0480 points, and bivariate interpolation is used to assess urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0481 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0482 for any desired urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0483. Clearly, this procedure must be repeated for each distribution, yet it is applicable to any distribution, and a "library" for popular distributions can be created. This approach can also be applied to the CoSMoS-1s model; however, there the probability zero urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0484 is an additional "shape" parameter affecting the correlation decrease. Thus, it is laborious to form functions urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0485 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0486 on a three-dimensional grid comprising a huge number of urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0487 points. In fact, this technique was applied a few years ago to create interpolation functions urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0488 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0489 for simulating intermittent rainfall with one-shape-parameter marginals (see the CoSMoS.xlsx file in Supporting Information S2).
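The one-shape-parameter "library" idea can be sketched as a tabulated lookup followed by a CTF inversion. This sketch assumes that Equation 12 has the two-parameter form used for CTFs in Papalexiou (2018), ρ_x = ((1 + b ρ_z)^(1−c) − 1) / ((1 + b)^(1−c) − 1), with b and c standing in for the CTF parameters whose symbols did not survive extraction. The table rows are hypothetical placeholders, not fitted values for any distribution; in practice they would come from fitting the CTF to numerically computed correlation transformations at each grid value of the shape parameter θ.

```python
from bisect import bisect_left

# HYPOTHETICAL placeholder rows (theta, b, c); not fitted values for any distribution.
TABLE = [(0.5, 4.0, -0.8), (1.0, 2.0, -0.4), (2.0, 1.0, -0.1)]


def ctf(rho_z, b, c):
    """Assumed two-parameter CTF: Gaussian correlation -> correlation after transformation."""
    return ((1 + b * rho_z) ** (1 - c) - 1) / ((1 + b) ** (1 - c) - 1)


def inverse_ctf(rho_x, b, c):
    """Gaussian correlation needed to reach the target correlation rho_x."""
    return ((rho_x * ((1 + b) ** (1 - c) - 1) + 1) ** (1 / (1 - c)) - 1) / b


def interp_bc(theta, table=TABLE):
    """Linearly interpolate (b, c) at shape parameter theta from a table sorted by theta."""
    thetas = [row[0] for row in table]
    if not thetas[0] <= theta <= thetas[-1]:
        raise ValueError("theta outside the tabulated range")
    j = bisect_left(thetas, theta)
    if thetas[j] == theta:
        return table[j][1], table[j][2]
    (t0, b0, c0), (t1, b1, c1) = table[j - 1], table[j]
    w = (theta - t0) / (t1 - t0)
    return b0 + w * (b1 - b0), c0 + w * (c1 - c0)


b, c = interp_bc(0.75)          # shape parameter between two grid points
rho_z = inverse_ctf(0.6, b, c)  # Gaussian lag-1 correlation to simulate for a target of 0.6
print(round(b, 3), round(c, 3), round(rho_z, 3), round(ctf(rho_z, b, c), 3))
```

A two-shape-parameter version would replace `interp_bc` with bilinear interpolation over a (θ1, θ2) grid; the CTF inversion step stays the same.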

9.5 Extending to Space-Time Simulation

A main quest would be to extend CoSMoS-2s, and potentially its rank- and copula-based variants, to space-time rainfall modeling. Advances in generating intermittent static (frozen) random fields (RF's; Papalexiou & Serinaldi, 2020), and the introduction of locally varying velocity and anisotropy capabilities (Papalexiou, Serinaldi, & Porcu, 2021), could potentially be enhanced. Formulating a two-state approach for space-time modeling would allow one to explicitly simulate the spatiotemporal (linear or rank) correlation structures (STCS) of the binary and continuous processes. This may lead to improved modeling of wet/dry regions and better representation of storm cells, which would also manifest in more accurate descriptions of the spatiotemporal scaling of the process. However, coupling the binary and continuous space-time processes is not as trivial as in the univariate case. If the two processes are independent of each other, unrealistic spatial patterns are formed: a first exploration shows that storm cells are not naturally smoothed out; that is, the transition from wet to dry regions looks artificial. This could be tackled by imposing some form of dependence between the binary and continuous processes. Finally, another option for space-time modeling could be a three-step approach that builds on: (a) a univariate binary process to model long-term clustering of wet/dry fields, (b) a spatiotemporal binary process to reproduce the short-term spatiotemporal structure of wet/dry regions, and (c) a continuous spatiotemporal process to imitate the spatiotemporal structure of wet regions or storm cells.

9.6 Further Explorations

The analysis indicates that rainfall properties are better reproduced at multiple scales by accurately simulating the wet/dry spells, that is, the intermittency and the correlations of wet clusters. This was accomplished by explicitly reproducing the ACS's of the binary and continuous processes. Yet such a framework admits many variations. For example, more methods to generate intermittency should be explored, such as generating binary sequences using transition matrices; explicitly reproducing the distribution of wet and dry spells (potentially as a bivariate process with discrete marginals); or reproducing linear or nonlinear binary ACS's based on different copulas. The same holds for simulating the continuous process, for example, using not only Gaussian and t-copulas but also others such as the Clayton, Gumbel, etc. These options and their combinations will be investigated in a follow-up study.

10 Conclusions

The Holy Grail in rainfall modeling is a consistent representation of rainfall at all scales. This implies that any characteristic of the observed time series, at any scale, should be reproduced in the simulated ones. Yet a competent modeling strategy cannot aim at explicitly reproducing too many characteristics; rather, it should identify the basic ones, keep them as few as possible, and still adequately represent the process.

Modeling essential rainfall characteristics at a single scale could lead to simulating rainfall well at many scales. The author deems that these characteristics are: (a) the probability of dry, (b) the probability distribution describing nonzero values, and (c) the dependence structure. There are different ways to combine and reproduce such features, mainly because there are different dependence structures and methods to introduce intermittency. Thus, identifying the form of these components to design easily applicable and accurate models that reproduce multiscale properties is not trivial. Here, we compare two conceptually different methods to generate rainfall (or intermittent processes in general). Both preserve the probability of dry and the marginal distribution of nonzero values. The first (CoSMoS-1s), introduced in Papalexiou (2018), treats rainfall as a one-state process and reproduces the intermittent linear ACS, that is, the autocorrelations derived from the complete observed time series (including zero and nonzero values). Technically, the intermittency is introduced in one step by transforming Gaussian time series using a mixed-type quantile function. The second (CoSMoS-2s), introduced here, considers rainfall as a two-state process. It couples a binary and a "hypothetical" continuous process, explicitly reproducing their linear (or rank- or copula-based) ACS's. Here, the intermittency is modeled by the binary process.

Highlights of this study on rainfall modeling include:
  1. A framework for building probability distributions is introduced and applied to form a new system of distributions suitable for rainfall. The Generalized Exponential distributions type 1–6 (urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0490–urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0491) are flexible, comprising a scale and two shape parameters that control the left and right tails; they have analytical quantile expressions that facilitate fitting and fast quantile transformations applicable in time series generation. In contrast to power-type distributions, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0492 distributions always have finite variance, securing the existence of correlations and CTFs.

  2. The use of CTFs expressed by simple parametric functions (Papalexiou, 2018) is extended to negative correlations. A process with non-Gaussian marginals has a lower limit of attainable negative correlation that is larger than −1 and can be easily estimated using the fitted CTF.

  3. Simulating binary time series having any linear (or Spearman's) ACS is simplified with analytical approximations.

  4. Theoretical analysis shows that the ACS of intermittent rainfall (including zero and nonzero values) can emerge from different combinations of binary and continuous processes. The gain in parsimony of CoSMoS-1s seems to be offset by a loss of accuracy in reproducing the wet/dry spells.

  5. The CoSMoS-2s model is introduced, which accurately simulates the wet/dry spells and the correlations within wet spells. This improved rainfall simulation at multiple scales.

  6. A rank-based CoSMoS-2s is proposed that reproduces the Spearman's rank correlations of the binary and continuous processes. This variant is "analytical" and does not require numerical integrations.

  7. A copula-based CoSMoS-2s model is introduced that enables generation of intermittent time series with nonzero values having the dependence structure of any desired copula.

  8. Many extensions of the two-state approach are suggested and can spark further research. Such extensions include multi-state processes, variations in simulating intermittency and wet clusters, as well as space-time generalizations.
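The lower limit in item 2 can be illustrated with a case whose answer is known in closed form: for two Exp(1) variables, the most negative attainable Pearson correlation is 1 − π²/6 ≈ −0.645, reached by the countermonotonic pair. Because the map from a Gaussian value to the target marginal is monotone increasing, that pair arises from a Gaussian correlation of −1 (Z and −Z). The Monte Carlo sketch below is our own check for Exp(1) marginals, not the paper's CTF-based estimate.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def exp_quantile_of_phi(z):
    """Exp(1) quantile evaluated at Phi(z), i.e., -log(1 - Phi(z))."""
    return -math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


rng = random.Random(11)
z = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
x = [exp_quantile_of_phi(v) for v in z]    # transformed from Z
y = [exp_quantile_of_phi(-v) for v in z]   # transformed from -Z, i.e., Gaussian correlation -1
print(round(pearson(x, y), 3), round(1 - math.pi ** 2 / 6, 3))
```

No Gaussian correlation in [−1, 1] can push the Exp(1) pair below this value, which is why the attainable negative range for non-Gaussian marginals is narrower than for Gaussian ones.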

No doubt, the list of available rainfall models keeps growing, and none is perfect. This study attempts to advance rainfall modeling by building an accurate and easily applicable model at a single scale that reproduces rainfall properties at multiple scales. Yet the quest for the "ultimate" model remains.


I am grateful to Geoff Pegram, the two anonymous Reviewers, and the Associate Editor for their constructive remarks that helped to improve the original manuscript. I also thank Sofia Nerantzaki for spotting early references on rainfall models, and Francesco Serinaldi and Giuseppe Mascaro for discussing several points with me during the revision. This work was supported by the project "Investigation of Terrestrial HydrologicAl Cycle" (ITHACA) funded by the Czech Science Foundation (Grant: 22-33266M). The support of the Natural Sciences and Engineering Research Council of Canada is also acknowledged (NSERC Discovery Grant: RGPIN-2019-06894). The CoSMoS R package (Papalexiou, Serinaldi, Strnad, et al., 2021), originally released in April 2019, is available at CRAN (R Core Team, 2021); see also https://cran.r-project.org/web/packages/CoSMoS/vignettes/vignette.html.

Conflict of Interest

The author declares no conflicts of interest relevant to this study.

Appendix A: Notation and Additional Equations

1 Notation

Probability distributions are abbreviated by using script letters followed by the parameters within parentheses. The parameters are omitted for brevity when appropriate. Distributions mentioned in this study include: the Exponential, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0493; Weibull, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0494; Pareto II, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0495; Burr type III, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0496; Burr type XII, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0497; Generalized Standard Gompertz, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0498; Generalized Gamma, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0499; and the Generalized Exponential type 1–6, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0500 to urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0501. In all expressions, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0502 is a positive scale parameter and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0503 a shape parameter. When more than one shape parameter exists, the shape parameters are subscripted; for example, urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0504 and urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0505 denote positive shape parameters controlling the left and right tail, respectively.

2 Additional Equations

The next expressions show the pdf's (Equations A1 and A4) or the cdf's (Equations A2, A3, and A5–A8) of distributions mentioned in the main text.
Let urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0514 be a mixed-type rv expressed by a probability mass at zero urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0515 and a continuous rv urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0516 for positive rainfall values. The mixed-type expressions of the cdf urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0517 and quantile function urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0518 are given by
and the mean urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0521 and variance urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0522 by
Expressions for higher-order moments can be derived using the formula that links central and raw moments. For example, the raw moments expression is
and can be used to estimate the third and fourth central moments as
and then the coefficients of skewness urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0528 and kurtosis urn:x-wiley:00431397:media:wrcr26026:wrcr26026-math-0529.
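The display equations of this appendix did not survive extraction. The standard mixed-type results they refer to can be sketched in Python with our own notation (p0 for the probability mass at zero, Y for the continuous part describing positive rainfall): F_X(x) = p0 + (1 − p0) F_Y(x) for x ≥ 0; Q_X(u) = 0 for u ≤ p0 and Q_Y((u − p0)/(1 − p0)) otherwise; raw moments E[X^k] = (1 − p0) E[Y^k]; and central moments from the usual raw-to-central conversions. The kurtosis returned is the non-excess coefficient, since the paper's convention is not visible here; the exponential example is an arbitrary choice for checking the code.

```python
import math


def mixed_cdf(x, p0, cdf_y):
    """F_X(x) = p0 + (1 - p0) * F_Y(x) for x >= 0, and 0 for x < 0."""
    return 0.0 if x < 0 else p0 + (1 - p0) * cdf_y(x)


def mixed_quantile(u, p0, quantile_y):
    """Q_X(u) = 0 if u <= p0, else Q_Y((u - p0) / (1 - p0))."""
    return 0.0 if u <= p0 else quantile_y((u - p0) / (1 - p0))


def mixed_moments(p0, raw_y):
    """Mean, variance, skewness, and (non-excess) kurtosis of X from E[X^k] = (1 - p0) E[Y^k]."""
    m1, m2, m3, m4 = ((1 - p0) * raw_y(k) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2  # equals (1 - p0) var_Y + p0 (1 - p0) mu_Y^2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3  # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # fourth central moment
    return m1, var, mu3 / var ** 1.5, mu4 / var ** 2


# Example nonzero part: exponential with scale beta (arbitrary illustrative choice)
BETA = 2.0


def cdf_exp(x):
    return 1 - math.exp(-x / BETA)


def q_exp(u):
    return -BETA * math.log1p(-u)


def raw_exp(k):
    return math.factorial(k) * BETA ** k


print(mixed_quantile(0.3, 0.5, q_exp), mixed_quantile(0.75, 0.5, q_exp))
print(mixed_moments(0.0, raw_exp))  # plain exponential: skewness 2, kurtosis 9
print(mixed_moments(0.5, raw_exp))  # half of the values are zero
```

Setting p0 = 0 recovers the moments of the continuous part, which is a convenient sanity check when swapping in one of the distributions listed above.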

Data Availability Statement

The author used hourly rainfall records from four stations (database codes: 366889, 310301, 097847, 237976) from the data set DSI-3240 archived at the National Climatic Data Center (NCDC), available at https://doi.org/10.5065/YP2D-XA17.