Random Fields Simplified: Preserving Marginal Distributions, Correlations, and Intermittency, With Applications From Rainfall to Humidity
Abstract
Nature manifests itself in space and time. The spatiotemporal complexity of processes such as precipitation, temperature, and wind, does not allow purely deterministic modeling. Spatiotemporal random fields have a long history in modeling such processes, and yet a single unified framework offering the flexibility to simulate processes that may differ profoundly does not exist. Here we introduce a blueprint to efficiently simulate spatiotemporal random fields that preserve any marginal distribution, any valid spatiotemporal correlation structure, and intermittency. We suggest a set of parsimonious yet flexible marginal distributions and provide a rule of thumb for their selection. We propose a new and unified approach to construct flexible spatiotemporal correlation structures by combining copulas and survival functions. The versatility of our framework is demonstrated by simulating conceptual cases of intermittent precipitation, double‐bounded relative humidity, and temperature maxima fields. As a real‐word case we simulate daily precipitation fields. In all cases, we reproduce the desired properties. In an era characterized by advances in remote sensing and increasing availability of spatiotemporal data, we deem that this unified approach offers a valuable and easy‐to‐apply tool for modeling complex spatiotemporal processes.
1 Introduction: Brief Literature Review on Random Fields
Hydroclimatic, environmental, and geophysical processes such as precipitation, temperature, relative humidity, and surface wind fields are characterized by spatial and temporal variability (Sun et al., 2012). The complexity of the dynamics of such processes evolving in both space and time often prevents the use of deterministic models based on the governing equations. Therefore, stochastic models represent viable alternatives that are generally parsimonious in terms of the number of parameters and correspond to processes mimicking many patterns of the observed spatiotemporal variability, without necessarily following the underlying governing equations (Kyriakidis & Journel, 1999).
Among stochastic models, random fields provide the most appropriate descriptions of natural processes evolving in space and time. Let Ω be the sample space of a random experiment, ℝn the n‐dimensional (spatial) Euclidean space, and
the 1‐dimensional (temporal) Euclidean space; a spatiotemporal random field is a stochastic process, {X(ω; s,t)}, where ω ∈ Ω and (s,t) ∈ ℝn × T, that is, a collection of random variables {X(s1,t1),...,X(sn,tT)} at points (coordinates) (s1,t1),...,(sn,tT) over the Cartesian product ℝn × T (see Christakos, 2005, chapter 2.2 and 5.3 for formal definitions and more details).
The study of spatial and spatiotemporal random field models has quite a long history (Sun et al., 2012; Vanmarcke, 1983). Nonetheless, it is still a very active area of research because of the new challenges related to analysis and modeling of large data sets coming from remote sensors (Sun et al., 2012) or the study of spatial extreme values (Bacro & Gaetan, 2012) in a variety of applied fields, such as epidemiology and health risk assessment (Angulo et al., 2013; Christakos & Hristopulos, 1998; Fei et al., 2019; He et al., 2019), meteorology, environmental monitoring, and design (e.g., Genton, 2007; Jona‐Lasinio et al., 2012; Kleiber, 2016; Li et al., 2008; Nevat et al., 2015; Sampson & Guttorp, 1992; Youngman & Stephenson, 2016), geotechnics and reservoir engineering (Hardy & Beier, 1994; Ronold, 1990), image analysis, computer vision and medical imaging (Karimaghaloo et al., 2016; Winkler, 1995, and references therein), finance and insurance (e.g., Evstigneev & Zhitlukhin, 2013; Goldstein, 2000; Lescourret & Robert, 2006), and many others. Since spatiotemporal modeling deals mainly with the second‐order properties of random fields evolving temporally over a given spatial domain, covariance functions describing the interactions between spatial and temporal components are fundamental for estimation, prediction, and simulation (Alegría & Porcu, 2017; Gneiting et al., 2006). Constructing valid positive definite spatiotemporal covariance functions is sometimes closely related to simulation methods (Schlather, 2012).
First‐ and second‐order properties (i.e., mean, variance, and covariance) are sufficient statistics when the process is Gaussian. However, Gaussian spatial or spatiotemporal models rarely fit the observations of environmental processes. Therefore, non‐Gaussian options should be considered. The approaches proposed for building non‐Gaussian spatial processes can be grouped in two main classes (Allard, 2012): (1) methods involving the transformation of the data into values that are compatible with a Gaussian process or another tractable process (Chilès & Delfiner, 2009; Wackernagel, 2003) and (2) methods assuming that the random field follows specific processes, such as χ2 or t‐Student (e.g., Adler & Taylor, 2007), Gamma (Wolpert & Ickstadt, 1998), or skew normal (Allard, 2012). However, these approaches are not flexible and general enough to model complex processes such as precipitation or wind. These processes are intermittent, and the continuous part of their marginal distribution is characterized by large variability in shape.
Although a variety of simulation algorithms are available in the literature (see Christakos, 2005; Lantuejoul, 2002; Schlather, 2012, for a review), most of them are devised for random processes with Gaussian marginal or joint distribution and enable only simulation of spatially correlated but temporally independent random fields (Schlather, 1999). Therefore, independently of the strategy used to build non‐Gaussian random fields, their simulation often relies on the simulation of parent Gaussian fields. This method, which is known in spatiotemporal stochastics literature as the nonlinear transformation approach (Christakos, 2005, pp. 332–334; Liou et al., 2011), comprises the transformation of the marginal distribution via the so‐called normal quantile/score transform (also known as Gaussian anamorphosis and marginal‐back transform), and the transformation (inflation) of the covariance function. The latter aims to obtain the covariance of the parent Gaussian process yielding the desired covariance of the target process after applying the marginal‐back transformation. However, we stress again these schemes lack the generality to simulate intermittent processes, or processes with mixed‐type marginals with probability mass at arbitrary points, and reproduce any spatiotemporal correlation structure. Yet their most prominent disadvantage is the use of cumbersome numerical approaches to estimate the inflated Gaussian correlations that typically are constrained in preserving the lag‐1 or lag‐2 correlations.
Following the rationale of nonlinear transformation approach and building on the methodological results reported by Papalexiou (2018), in this study we introduce a framework to efficiently simulate spatiotemporal random fields with prescribed marginal distribution and spatiotemporal correlation suitable to describe a variety of environmental variables evolving over spatiotemporal lattices.
The paper is structured as follows: section 2 presents the key elements of the modeling and simulation methods consisting of a set of marginal distributions, nonlinear transformation, and spatiotemporal correlation functions. We suggest a set of marginal distributions, that are based on entropy maximization and simple selection rules, which are flexible enough to model a great variety of natural and nonnatural processes. The nonlinear transformation method to obtain the Gaussian spatiotemporal correlation structure is optimized by an efficient fitting and parametrization procedure relying on a simple correlation transformation function. We also suggest an alternative approach to build potentially permissible (valid) spatiotemporal correlation functions based on copulas and univariate survival functions, discussing the case of separable (independent) spatial and temporal correlation functions as well. In section 3, we report some representative examples demonstrating the versatility of the proposed approach and its capability to simulate virtual fields of precipitation with discrete‐continuous marginal distributions, relative humidity with double‐bounded beta marginal distributions, and daily maximum temperatures. As a real‐world example, we show in section 4 the simulation of spatiotemporal rainfall fields reproducing rainfall records over a spatial lattice of 0.05° resolution extracted from CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data). An extensive discussion in section 5 is followed by perspectives and conclusions in section 6.
2 Random Fields Preserving Marginal Distribution and Correlations
2.1 Probability Distributions for Fields
In the most general case, and in layman's terms, a spatiotemporal field is a collection of a finite (or infinite) number of random variables (r.v.'s) interrelated with each other. The properties of these r.v.'s may vary in time and space. Thus, their marginal distributions are an essential component of the field, with the other being dependence structures among them. We could say that the marginal distribution embodies the “potential” of a process—it dictates the frequency or the likelihood of its different magnitudes (values) to occur; it governs the frequency and the magnitude of extremes events. Thus, choosing a marginal distribution is crucial, and yet the selection is often based on performance criteria that may lead to choices neglecting even the variable's feasible range. Random variables of natural processes such as precipitation, wind, and discharge take positive values and are typically positively skewed, or heavy tailed (at fine spatiotemporal scales); however, it is not uncommon in the practice to use distributions with inconsistent support or inconsistent tails. For example, distributions having a location parameter (except those defined in the whole real line) end up having a lower bound or even an upper bound. Widely used distributions such as the Gamma and the Exponential have thin tails and cannot model adequately extremes of heavy‐tailed variables; for example, global analyses show that nonzero daily precipitation has, in general, heavier tails than the exponential (see, e.g., Papalexiou et al., 2013; Serinaldi & Kilsby, 2014).
The number of distributions that can be formed, at least theoretically, is infinite. The common practice of fitting many distributions (now facilitated greatly by using software) and selecting the best fitting option, as aforementioned, may lead to inconsistencies. In this respect, Papalexiou and Koutsoyiannis (2012) derived consistent distributions for positive geophysical and hydroclimatic r.v.'s based on theoretical reasoning and using entropy measures. Yet as a general rule of thumb, we propose that the continuous distributions to be used in practice should be (1) consistent with the valid range of the variable under study, (2) flexible enough to model heavy tails and J‐ and Bell‐shaped probability densities, and (3) have the least number of parameters necessary to fulfil the previous two requirements.
) (Papalexiou & Koutsoyiannis, 2012; Stacy, 1962) and the Burr type III (
) and XII (
) distributions (Burr, 1942; Papalexiou, 2018; Tadikamalla, 1980) as general and flexible candidate distributions. We also propose here a generalization of the Beta of the second kind (
), which for γ2 → 0 (see equation 2) becomes the celebrated Gamma distribution (
. The probability densities of the
and the
distributions, and the distribution functions of the
and
, are given, respectively, by
(1)
(2)
(3)
(4)These distributions, defined in (0, ∞), have a scale parameter β > 0 and two shape parameters γ1 > 0 and γ2 > 0 that control, respectively, the left and right tails. The product moments of these distributions are given in Appendix A and can be used to estimate the mean, variance, skewness, or higher order moments. If simpler two‐parameter models, such as the Pareto II, Lognormal, Weibull, and Gamma, can describe the data and the extremes adequately, they should be preferred. Also, in cases dealing with extremes such as annual maxima or peaks over threshold, the well‐known extreme value distributions (Generalized Extreme Value distribution (
) and Generalized Pareto (
) should be considered as they are supported by theoretical justification.
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)Thus, the unconditional distribution function F(x) and the quantile function QX(u) become piecewise functions, while the mean and variance of X can be directly estimated. A real‐world case could be the water depth of small lakes or ponds that dry up (probability mass at zero depth) or overspill for long periods due to inflows or long‐lasting precipitation events (probability mass at maximum depth); the in‐between states can be described by a continuous distribution with values ranging in (min, max).
2.2 Correlation Transformations of the Parent‐Gaussian Fields
Let {X(s,t) : s ∈ ℝ2, t ∈ [0, ∞)} be a stationary spatiotemporal target process having a prescribed marginal distribution FX(x) with finite variance and a valid (positive definite) spatiotemporal correlation structure (STCS) ρX(δ, τ), where δ is the Euclidean distance between the points si and sj in space and τ the time lag. The simulation method proposed here relies on the fact that the process {X(s,t)} can be obtained by transforming a parent‐Gaussian process {Z(s,t) : s ∈ ℝ2, t ∈ [0, ∞)} with a standard Gaussian marginal distribution ΦZ(z) and a suitable STCS ρZ(δ, τ) that needs to be specified. One of the first papers that applied this idea was by Li and Hammond (1975) to generate univariate temporal time series using a few continuous marginals and preserving the autocorrelation up to lag 2.
This approach requires a twofold transformation involving the marginal distribution and the STCS. Denoting the quantile function of FX(x) as QX, the r.v. X results from the Gaussian r.v. Z by applying the transformation g(Z) ≔ QX(ΦZ(Z)). While the monotonic nonlinear transformation g(Z(s,t)) applied to a Gaussian process yields a process with the desired marginal distribution, it affects the STCS ρZ(δ, τ). Particularly, let (Z1, Z2) be a pair of r.v.'s following a bivariate normal distribution with correlation ρ; then, nonlinear transformations X1 = g1(Z1) and X2 = g2(Z2) lead to a pair of r.v.'s (X1, X2) with correlation less (in absolute values) than ρ. The proof of this result is credited to Lancaster (1957) who referred to it as the maximal property of the bivariate normal distribution. Yet similar results were proven even earlier in the works of Maung (1941) and Gebelein (1941); the topic was also extensively studied by Rényi (1959). This result is obviously valid in a spatiotemporal framework, that is, for any bivariate vector (Z(si, t), Z(sj, t − τ)) we have |Cor(g(Z(si, t)), g(Z(sj, t − τ)))| < |Cor(Z(si, t), Z(sj, t − τ))| where g is the monotonic nonlinear transformation leading to the desired marginal.
As the function g yields a target STCS ρX(δ, τ) weaker than the parent‐Gaussian STCS ρZ(δ, τ), we need to suitably inflate the ρZ(δ, τ) in order to obtain the desired ρX(δ, τ). Thus, simulating the process {X(s,t)} requires the introduction of a correlation transformation function (CTF)
yielding the required ρZ(δ, τ) corresponding to the desired ρX(δ, τ) such that
. In other words,
takes as argument the target STCS and returns the inflated STCS of the parent‐Gaussian process. In general, the functions linking the Gaussian and transformed correlations do not have analytical expressions apart from a few specific cases. For example, Baum (1957) provided the analytical expression when the target marginal is uniform; analytical expressions also exist for the Lognormal distribution (Matalas, 1967). In his seminal paper, Lancaster (1957) proposed the use of Hermite‐Chebyshev polynomial expansions as a method to form this link; such methods have further been improved, for example, by van der Geest (1998).
based on a simple parametric expression of a CTF applicable also for mixed‐type marginals. We provide a brief summary of the procedure to make the paper standalone. Apart from the Hermite‐Chebyshev expansion proposed by Lancaster (1957), another idea uses the expectation of a transformed random variable (Feller, 1971, p. 5). If X = g(Z) emerges by transforming Z and the probability density function (pdf) fZ(x) is known, then the expectation
is calculated without the need to know the pdf of X. This is valid also in two dimensions, and the correlations of the processes {X(s,t)} and {Z(s,t)} can be linked by
(13)
(14)
, and
is the bivariate standard Gaussian pdf. Even though the double integral
does not generally have an analytical solution, its numerical estimation is quite simple and it can be performed for a fixed number of values in order to obtain a map ρZ → ρX. The resulting (ρX, ρZ) pairs describing the required correlation transformation function are interpolated by fitting a simple parametric function. In particular, Papalexiou (2018) extended the use of equation 14 by allowing the mixed‐type quantiles and suggested the following function
(15)The CTF
in equation 15 passes from the points (0,0) and (1,1) (as ρX is 0 and 1, for ρZ equal to 0 and 1, respectively) and is concave, thus fulfilling the condition |ρZ| ≥ |ρX| for b > 0 and c ≥ 0. Finally, the link between the two correlations can also be established using the Jacobian of the transformation to estimate directly the bivariate density
and in sequence the covariance (see Papalexiou, 2018 for more details).
2.3 Unified Space‐Time Correlation Structure
A space‐time correlation structure (STCS) ρ(δ, τ; θ) is a single mathematical expression that provides the correlation between two r.v.'s X(si,t) and X(sj,t‐τ) located at two points of the RF and have Euclidian distance ‖δ‖ and are lagged in time by τ. An STCS provides an elegant way to express the spatiotemporal properties of an RF using a single expression, and should have desirable mathematical properties (positive definiteness) that enable the generation of RF's. The literature on STCS's is vast with efforts focusing on constructing positive definite expressions. Some celebrated expressions include the Spherical, Stable, Gaussian, Whittle‐Matérn, Bessel, and Cauchy (Wackernagel, 2003 pp. 334–336). This topic is still an active area of research in geostatistics with many interesting recent contributions (e.g., Alegría & Porcu, 2017; Bogaert & Christakos, 1997; Gneiting et al., 2006; Ma, 2002; Porcu et al., 2016; Schlather, 2012).
, where F(x) is a valid distribution function) can be used as correlation structures and several cases were demonstrated (Papalexiou, 2018). Here we extend this approach by defining an STCS as
(16)
and
are survival functions with (δ, τ) ∈ [0, ∞) and parameter vectors θS and θT, respectively. Note that this construction leads to functions with desired properties for an STCS, that is, strictly and monotonically decreasing functions with ρ(0,0; θ) = 1 and limδ,τ → ∞ρ(δ, τ; θ) = limδ → ∞ρ(δ, τ; θ) = limτ → ∞ρ(δ, τ; θ) = 0; it is stressed yet that solely these conditions do not suffice, in general, to lead in positive define STCS's. The parameter vector of the STCS is θ = θS ∪ θT ∪ θC. Also, the
and
can be interpreted as functions describing marginal correlations in space and time, respectively. Precisely, marginal space and temporal correlation structures are defined for τ = 0 and δ = 0, that is,
and
. We stress that the survival and copula functions are not used in the typical probabilistic sense; instead, they are used as mathematical functions enabling to build flexible strictly decreasing two‐dimensional functions that could form potential spatiotemporal correlation structures.
Theoretically, this framework enables the construction of an infinite number of STCS's. It is out of the scope of this study, however, to investigate the mathematical properties and try to assess which copula families and which sf's can be combined leading to positive definite STCS's (see section 5 for further remarks). As a general principle, the copula functions and sf's used to build STCS's should be as simple as possible with analytical expressions facilitating implementation. Note that we name the STCS's by the copula name and the corresponding distribution name of the sf used.
(17)
and
. This leads to the Clayton‐Weibull STCS
(18)
(3,0.9,0.2) and probability zero p0 = 0.8 (see Figure A1 for a sequence of 80 RF's).

Some other simple copula families with potential use in constructing STCS's are the Gumbel‐Hougaard, Ali‐Mikhail‐Haq, Joe, Farlie‐Gumbel‐Morgenstern, among others (Nelsen, 2006; Salvadori et al., 2007). Note that potentially different sf's can be combined with a copula. For example, the marginal space correlation sf could be a Pareto II, while the marginal temporal correlation could be a Weibull resulting in a Clayton‐ParetoII‐Weibull STCS. We stress again, however, that the complete parameter space resulting in valid positive definite STCS's needs detailed investigation; it is a complex mathematical problem that goes beyond the scopes of this study.
2.4 Space and Time Separable Correlation Structures
(19)For the Clayton copula in equation 17, the product copula corresponds to limiting case θ = 0 (Salvadori et al., 2007; p. 237). Therefore, building STCF via copulas allows modulating the spatiotemporal dependence including independence and therefore separable STCF, if the selected copula family includes the product copula as a special case.
The hypothesis of STCS separability means that the STCS can be decomposed into the product (or sum) of a purely spatial and a purely temporal correlation functions (e.g., Gneiting, 2002). Intuitively, this hypothesis corresponds to assuming that the cross correlation (spatial correlation) can be studied independently of the autocorrelation (temporal correlation; see, e.g., Genton, 2007; Gneiting et al., 2006, and references therein for a technical discussion). Of course, more general types of STCS's do exist (Genton & Kleiber, 2015; Gneiting et al., 2006), and separable STCS's have some shortcomings (Cressie & Huang, 1999, p. 1,331; Kyriakidis & Journel, 1999, pp. 664–666). However, although nonseparable models are often physically more realistic, separable models of space‐time covariance functions are valuable in both physical and health applications (Kolovos et al., 2004). Indeed, this class of models are the simplest way to account for spatiotemporal dependence especially in cases, such as the occurrence of rare events, where there are often not enough data to justify the use of more complex dependence structures. Separable STCS's are also easier to estimate and allow for quite simple simulation.
and
any valid survival function that can be used as a valid correlation structure. As mentioned above, sf's are used as parametric functions and not as probability laws. Two simple and flexible models are the Weibull and Pareto II structures, given, respectively, by
(20)
(21)
(22)
(23)In practice, using a nonseparable (equation 16) or separable (equation 19) STCS to describe the empirical STCS should be investigated. If using a separable STCS results in small differences from the empirical STCS, or, the estimated copula parameter in the nonseparable version indicates weak space‐time interrelation, then the separable formula should be preferred. This is specifically true for fields that are strongly correlated in time. For example, if generation of Gaussian fields is made using high‐order Multivariate Autoregressive models (MAR), then their formulation for separable functions is heavily simplified as only diagonal coefficient matrices Ai are used. Although the diagonal MAR models can explicitly preserve only the spatial structure for τ = 0, and the autocorrelation structure at single field points (δ = 0), this does not imply that lagged in time and space points have zero correlation; actually, the correlation between any two points and for any temporal lag τ can be estimated by the chosen separable STCS.
To further clarity, theoretical STCS's are mathematical surfaces and can be compared with their empirical counterparts. The best option is a surface that guarantees the best fit in a parsimonious way. Fitting a separable STCS that results in agreement between the spatial ρX(δ; θ) and temporal ρX(τ; θ) correlation structures with their empirical counterparts
and
does not imply agreement between the lagged in space (δ > 0) and time (τ > 0) correlations ρX(δ, τ; θ) = ρX(δ; θ)ρX(τ; θ) with the empirical ones
. It should be clear, however, that using a separable STCS also does not imply zero values for the lagged space‐time correlations; these values can be estimated by ρX(δ; θ)ρX(τ; θ) and compared with the empirical ones. If these differences are small, then simulations using separable functions are computationally more efficient than using nonseparable functions.
2.5 Blueprint for Simulating Fields in Simple Steps
- Fit or choose an appropriate target marginal distribution FX(x) that describes the values of the RF's (section 2.1).
- Compute the correlation transformation function
for the target marginal distribution (section 2.2).
- Describe the target spatiotemporal correlation structure (STCS) of the field by using nonseparable (section 2.3) or separable (section 2.4) STCS ρX(δ, τ). Separable STCS's facilitate computation but should be used with caution.
- Transform the target STCS by using the correlation transformation function to compute the Gaussian STCS, that is,
.
- Generate Gaussian RF's by using multivariate AR(p) models (or any other method) that reproduce the parent‐Gaussian STCS.
- Transform the Gaussian RF's using the marginal‐back transformation.
We emphasize that we have not observed any effect in the positive definiteness of an STCS when it was transformed by the correlation transformation function
. Yet Steps 3 and 4 can be modified to assure that the Gaussian STCS is positive definite. Particularly, once the correlation transformation function
is defined (Step 2), instead of fitting a positive definite function ρX(δ, τ) to the observed field, we can fit a function of the form
, where
is the inverse of the correlation transformation function and ρZ(δ, τ) can be any mathematically proven positive definite STCS. This assures that the Gaussian field will be generated by using the positive definite STCS ρZ(δ, τ).
We note that this framework can be readily extended to reproduce seasonality by switching the model that generates the Gaussian fields based on a seasonal cycle and using the corresponding seasonal marginal distributions (for details on the univariate case the reader is referred to Papalexiou, 2018).
3 Random Fields in Action
3.1 Precipitation
We generate precipitation RF's with a mixed‐type marginal distribution having probability zero p0 = 0.8 and with nonzero values following a J‐shaped power‐type distribution, that is, the Burr type XII
. This choice is consistent with global analyses on daily precipitation showing that the Generalized Gamma and the Burr type XII are two effective models to describe precipitation (Papalexiou & Koutsoyiannis, 2012, 2016). We describe the spatiotemporal correlation structure by the Clayton‐Weibull model ρCW(δ, τ), given in equation 18, with parameters (bS, cS, bT, cT, θ) = (20,0.7,1.1,0.8,2) (Figure 1a). For the target p0 and marginal
distribution the correlation transformation function is
(Figure 1b), which is used to transform the target STCS into the Gaussian STCS (Figure 1c).
Note that the target STCS with these parameters (Figure 1a) provides a stretched‐exponential decaying spatial correlation structure (for example, for temporal lag τ = 0 and space distance δ = 1 and δ = 100 we have ρCW(1,0) = 0.88 and ρCW(100,0) = 0.05, respectively). The temporal structure is assumed to be less intense, as is typically observed in daily precipitation time series, yet decaying slower than a Markovian structure; for example, the autocorrelation of each point for lag τ = 1 and τ = 5 is ρCW(0,1) = 0.4 and ρCW(0,5) = 0.03, respectively. For demonstration we use the full MAR(5) that reproduces precisely the spatiotemporal structure up to lag τ = 5 and up to any distance δ. Finally, we generate Gaussian RF's from the MAR(5) and transform them to the target fields using the mixed‐type quantile function (see also Papalexiou, 2018). A sample sequence of nine RF's (Figure 1d) demonstrates characteristic properties of precipitation fields, that is, the strong spatial structure that yields totally dry or wet fields but also partly dry. The temporal structure is better visualized in a sequence of 80 RF's (Figure A1) where a strong clustering is apparent with several wet days followed by several dry days.
We note that stochastic modeling of precipitation has been evolving over several decades and many methods exist dating back to the 1970s (see, e.g., Amorocho & Wu, 1977; Mejía & Rodríguez‐Iturbe, 1974) and 1980s (see, e.g., Bell, 1987). Apart from methods based on Gaussian parent processes, many more exist based on different rationales. For example, there are methods using atmospheric circulation patterns (e.g., Bardossy & Plate, 1992), wavelets transforms (Kumar & Foufoula‐Georgiou, 1997), or are based on fractal and multifractal techniques (see, e.g., Foufoula‐Georgiou, 1998; Gupta & Waymire, 1993; Langousis et al., 2009; Lovejoy & Mandelbrot, 1985; Lovejoy & Schertzer, 1990; Schertzer & Lovejoy, 1987; Veneziano et al., 2006). Additionally, models for high‐resolution space‐time rainfall modeling (e.g., at 5 min and 1 km2) may need to include advection and anisotropy components in order to simulate fields more realistically at these scales. For example, Jinno et al. (1993) and Kawamura et al. (1996) use advection‐diffusion equations, Pegram and Clothier (2001) introduced the “String of Beads” method, and Mantoglou and Wilson (1982) investigated the “Turning bands method” (later modified by Leblois & Creutin, 2013); many more simulators can also be found in the literature (see, e.g., Baxevani & Lennartsson, 2015; Benoit et al., 2018; Paschalis et al., 2013; Peleg et al., 2017). Our technique can be easily modified to include anisotropy (see Figure A2) and advection, but such details are out of the scope of this study. Yet the typical approach for introducing anisotropy is trivial and is based on simple coordinate transformations (see, e.g., Youngman & Stephenson, 2016). We provide these equations in the Appendix A, and we show a simulation with anisotropy for a sequence of 80 RF's (see Figure A2).
3.2 Relative Humidity
with probability density function (pdf)
(24)
. The
distribution is defined in [0,1] and, thus, can describe variables bounded in this range such as relative humidity. Of course, any other consistent model can be used; another option is the Kumaraswamy distribution
, which has simpler and analytical cdf and quantile functions compared to the Beta distribution. Here we use the
as a target marginal (Figure 2a) and a very strong spatiotemporal structure, that is, the Clayton‐Weibull STCS ρCW(δ, τ) with parameters (βS, γS, βT, γT, θ) = (100,1,2.8,1,0.5) (Figure 2b). The spatial structure for τ = 0 is a very strong Markovian, that is, ρCW(δ, 0) ≈ 0.99δ, while the temporal structure for δ = 0 is also Markovian, less intense though, with ρCW(0, τ) ≈ 0.70τ.
For the target
distribution, the estimated correlation transformation function
shows that only a slight correlation inflation is needed, in contrast with the precipitation case in the previous demonstration. This is apparent by the similarity of the target STCS (Figure 2b) and the transformed Gaussian STCS (Figure 2c). For more details on which distributions result in highly concave correlation transformation functions see section 3.3 in Papalexiou (2018). Here we use the MAR(1) to generate Gaussian RF's and transform them to the target fields using the quantile function of the
distribution. A random sequence of nine RF's (Figure 2d) shows the very strong spatial structure with smooth spatial changes or even with fields preserving very similar values for the whole field area. The temporal clustering is better demonstrated by a longer sequence of RF's (Figure A3) where we observe consecutive fields with high relative humidity (RH) values alternating with fields having low RH values.
3.3 Daily Maximum Temperature fields
with distribution function (cdf)
(25)It is well known that such a model can represent maxima emerging from exponential‐type parent distributions as it is expected for temperature. Particularly, we use the
assuming that the field is isotropic with mean value μ ≈ 23.6 ° C and standard deviation σ ≈ 1.3 ° C (Figure 3a).

(23,1) (see Figure A4 for a sequence of 80 RF's).
The flexibility of this approach allows the choice of independent correlation structures for space and time, having the drawback yet of not being able to preserve explicitly the temporal correlation of distant points in space. As target spatial correlation structure, we use a slowly decaying Pareto II correlation structure (equation 20), that is, ρSCS(δ) = ρPII(δ; 10,5) (red line in Figure 3c). For temporal correlation structure, representing the autocorrelation of each pixel, we use the Weibull structure in equation 21 with ρTCS(τ) = ρW(τ; 4,0.75) (blue line; Figure 3c). The fitted correlation transformation function
(Figure 3b) for the target
distribution shows that inflation is negligible.
To reproduce the spatial and temporal correlation of each pixel, we use a high‐order MAR(20) model with diagonal matrices. Similarly to the previous examples, we transform the Gaussian fields using the quantile function of the target
distribution. This approach facilitates the computations, and the model reproduces precisely the spatial correlations for time lag τ = 0 (red dots in Figure 3c), and the temporal autocorrelation of each pixel up to lag τ = 20 (blue dots in Figure 3c). A random sequence of nine daily Tmax RF's (Figure 2d) illustrates the strong spatial structure but also of the temporal clustering of high‐ and low‐value fields. The latter is clearly illustrated in a longer random sequence of 80 RF's (Figure A3).
4 Real‐world Simulation of Daily Precipitation Fields
Here we explore the properties of real‐world precipitation fields, and we simulate synthetic ones. We extracted daily precipitation fields from the CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) data set for a U.S. region defined by latitude 39.475° to 40.925° and longitude −111.425° to −109.925°. The spatial resolution of the data is 0.05° and covers the 1981–2017 period (see Figure 5 for six observed fields and Figure A5 for a sequence of 80 observed fields).
; it has an analytical cdf given by
(26)For γ1 = γ2 = 1 the cdf simplifies to
which can be considered as the “standard” Gompertz distribution emerging for shape parameter equal to 1 (Garg et al., 1970; Gompertz, 1825). The
is defined in [0, ∞) and has one scale parameter β > 0 and two shape parameters γ1 > 0 and γ2 > 0. Its pdf can be J‐ or bell‐shaped for γ1 < 1 and γ1 > 1, respectively, while the heaviness of the tail is controlled by γ2 (smaller γ2 lead to heavier tails).
For demonstration we use the precipitation fields of January. The empirical analysis shows that the
provides a good fit to the observed data (Figure 4a), while the probability dry is p0 = 0.78. The distribution was fitted by minimizing the square error between the observed values and the predicted ones by the quantile of the
. To describe the spatiotemporal correlation structure of the fields, we use the Clayton‐Weibull STCS introduced in equation 18 as follows: (1) we estimated the correlations between the time series in a randomly selected subsample of 62,500 pairs of grid cells (as the total number of pairs, i.e., 304, is too large) and for time lags up to 2 days; (2) we stratified the estimated correlations based on the distance of the pairs and calculated the mean correlation value; and (3) we fitted the Clayton‐Weibull STCS by numerically minimizing the square error between observed values and those yielded by the STCS. The estimated parameter vector of the Clayton‐Weibull STCS is (βS, γS, βT, γT, θ) = (3.12,0.74,0.44,0.49,0.64); this structure (see Figure 4b) provides an accurate description of the observed spatiotemporal correlation values.

distribution to the observed nonzero precipitation. (b) The fitted Clayton‐Weibull STCS compared with observed values for several time lags. (c) The empirical distribution of the generated field values compared with the target
. (d) The empirical STCS of the simulated field compared with the target Clayton‐Weibull STCS.
To simulate synthetic fields, we follow the rest of the procedure described in section 2.3: (1) we compute the correlation transformation function for the fitted (target)
and the estimated p0, that is,
; (2) we transform the fitted Clayton‐Weibull STCS ρCW(δ, τ) to obtain the Gaussian STCS, that is
; (3) we use the parent Gaussian STCS ρZ(δ, τ) to calculate the parameters of a MAR(2); and (4) we transform the generated Gaussian fields using the quantile of the target mixed‐type marginal distribution.
We show six simulated fields in Figure 5, while a sequence of 80 simulated ones is given in Figure A6. First, as anticipated, the simulated marginal distribution (Figure 4c) and the spatiotemporal structure (Figure 4d) match the target ones. Second, the comparison of the observed and simulated field (Figures 5, A5, A6) shows a clear spatial and temporal clustering. Sequences of no precipitation are followed by wet days with precipitating in the whole region or in parts of it. This behavior is clearly recognized also in the observed fields.

5 Discussion
Several issues of this blueprint should be emphasized, while others deserve further discussion. For marginal distributions of positive variables, we considered a set of distribution families (i.e., Generalized Gamma, Burr types III and XII, Beta of the Second Kind, and the Generalized Standard Gompertz introduced here) whose justification is based on simple criteria fulfilling basic distributional properties of the process under study (see section 2.1). These families comprise several well‐known and widely used distributions (Gamma, Weibull, Pareto, etc.) as special cases but are more general and flexible (enabling the modeling of both exponential and power‐type tails) while retaining parsimony in terms of number of parameters. Extending these continuous distributions by inflating them at zero, we built mixed‐type distributions that can describe zero‐inflated variables such as rainfall, or over‐threshold processes. Alternatively, these distributions can be easily combined with a Bernoulli process. We note that our framework also allows for the use of double‐bounded distributions such as beta with or without probability masses at one or both boundaries, describing, for instance, the occurrence of boundary conditions such as empty and full reservoirs.
In the proposed framework, these univariate distributions are used to transform the marginal distribution of spatiotemporal Gaussian random fields into the desired form (e.g., Generalized Gamma and Burr type XII). The spatiotemporal correlation function of these fields is suitably inflated to obtain the desired correlation after marginal transformation. To accomplish this task, we propose an efficient technique by using a transformation correlation function that enables the easy and straightforward estimation of the parent Gaussian correlation. Such an approach avoids multiple computation (or iterative procedures) of the double integral linking the correlation values of the parent and target processes, as well as the use of approximating polynomials, which are often proposed in the literature to deal with the transformation procedure.
Concerning the choice of the spatiotemporal correlation function, we proposed a new method to construct models resulting from the combination of copulas and survival functions. Of course, we are aware that new spatiotemporal correlation functions should be introduced with great care in order to guarantee that they are permissible, that is, valid functions fulfilling the condition of positive definiteness (Armstrong & Jabin, 1981). Although the proposed models have been tested over a variety of parameter sets, we did not provide theoretical proof of their positive definiteness. Moreover, some combinations of parameters surely do not fulfil this property. However, we recall that the simulation algorithm is based on vector AR models that are fed by correlation matrices resulting from the spatiotemporal correlation function. Therefore, from a practical point of view, positive definiteness of the feeding correlation matrices can be checked numerically for each specific application by calculating the sign and magnitude of their eigenvalues. Furthermore, if these matrices are not positive definite, they can be corrected to fulfil this property (Higham, 2002). Yet in all our examples, the proposed spatiotemporal correlation structures were always positive definite. This can help circumventing the problem in practical applications, while further research is required to provide mathematical proofs. We also note that the spatiotemporal correlation functions can involve the space and time separable models when the copula parameter tends to the value corresponding to the so‐called product copula, describing independence.
Noteworthy, the Gaussian dependence structure is characterized by zero tail dependence coefficient. Nonetheless, for finite size samples, the extreme values resulting by Gaussian copulas tend to preserve correlation values smaller than but not too far from those of models characterized by positive upper tail dependence, such as Student or extreme value (EV) copulas (e.g., Serinaldi et al., 2015). This behavior is due to the (elliptical) topology of the dependence structure that yields upper tail clustering weaker than that of EV or Student copulas but stronger, for example, than that of Frank or Clayton copulas. In this respect, a future line of research will be extending our model by including Student or grouped Student dependence structures building on Serinaldi and Kilsby (2016), thus accounting for asymptotic tail dependence. We also recall that the Gaussian dependence structures allow for modeling second‐order joint properties, while higher‐order joint properties can be of interest, yet they are hard to recognize and estimate based on data.
A point of special interest is the reproduction of extremes in terms of marginal properties. The frequency and the magnitude of extremes is quantified by the heaviness of the tail. Specifically, for precipitation extremes, recent global and large‐scale studies show that heavy‐tailed distributions represent extremes more accurately (see, e.g., Nerantzaki & Papalexiou, 2019; Papalexiou, AghaKouchak, et al., 2018; Serinaldi & Kilsby, 2014). In this respect, the flexibility offered by this scheme, allowing the choice of marginals with any asymptotic tail behavior (power type, stretched exponential, lognormal, etc.), facilitates the accurate reproduction of extremes if the selected marginal is indeed an adequate model. Of course, if the focus is specifically on reproducing extremes, extreme value theory also provides other options such as max‐stable processes and corresponding random fields (e.g., Blanchet & Davison, 2011; Cooley et al., 2006; Davison et al., 2012; Davison & Gholamrezaee, 2012; Robert, 2013; Martin Schlather, 2002; Tyralis & Langousis, 2019). Also, as a cautionary note, we mention that the reproduction of extremes does not only rely on competent models but also on precise data. This issue is specifically true when dealing with annual maxima where discretization bias may lead to severely underestimate the actual values. This important issue is typically neglected, while it could be tackled using the “forgotten” Hershfield factor (Papalexiou et al., 2016; van Montfort, 1990).
Also, the scheme presented here guarantees the reproduction of any marginal distribution but is not designed to reproduce specific product moments as in multifractals methods (see, e.g., Foufoula‐Georgiou, 1998; Langousis & Veneziano, 2007; Lovejoy & Mandelbrot, 1985; Schertzer & Lovejoy, 1987). For example, if a three‐parameter target marginal is fitted on the first three moments (any other fitting methods can be used), then the first three moments are explicitly preserved. All higher finite moments are completely specified by the fitted distribution. However, while the fitted marginal does not guarantee the reproduction of higher‐order moments, vice versa, preserving higher‐order moments does not guarantee the reproduction of the marginal. Similarly, this modeling approach does not explicitly reproduce moments at multiple scales; the interested reader is referred to the multifractal literature for such methods. Yet if the process at the fine scale (e.g., daily) is described with precision by the fitted model (correlation structure and marginal), then the aggregated process at higher scales will be consistent. Additionally, if long‐term persistence is needed, it can be achieved in two ways, that is, either by selecting a very strong temporal correlation structure (see examples in Papalexiou, 2018) or by disaggregating, for example, annual totals with persistence using the spatiotemporal extension of the DiPMaC disaggregation scheme (Papalexiou, Markonis et al., 2018). In any case, models reproducing long‐term persistence should be used with caution as it is difficult to distinguish or quantify persistence based on data (Markonis et al., 2018).
Finally, the use of multivariate AR models to simulate the parent Gaussian random fields is known to suffer from the so‐called curse of dimensionality, preventing the simulation of fields over spatiotemporal lattices with many grid nodes (at least in the nonseparable case and when the MAR order is very large). Therefore, further research is required to implement more efficient generators based, for instance, on spectral techniques (Lang & Potthoff, 2011).
6 Summarizing and Concluding
Geophysical and environmental processes naturally evolve in space and time, and their complexity usually prevents pure deterministic representations. Therefore, stochastic modeling coupled or not with deterministic components is a viable option to describe such natural processes. Despite the long history of geostatistics, modeling and simulation of random fields, that is, stochastic processes defined over the Cartesian product of multidimensional spatial domain and unidimensional time domain, is still an active research area. First, this is due to the challenging problems posed by the above mentioned complexity, and second, due to emerging advances in remote sensing and information technology that allow high‐resolution spatiotemporal observations and efficient storage and processing. The latter is verified by the many gridded products of precipitation, temperature, etc., that have been developed over the last decade and are readily available.
In this context, we aim to introduce a blueprint for effective, efficient, and easy‐to‐apply simulation of spatiotemporal random fields. The generated fields preserve important characteristics such as any valid spatiotemporal correlation structure and any prescribed marginal distribution. This includes also discrete‐continuous marginals to allow for simulation of intermittency, or, of more general cases where discrete probabilities exist in more than one value, but also purely discrete marginals. The scheme uses a basic idea, dating back in the 1970s, which forms a desired process by transforming a parent Gaussian process, and thus, it is based on the Gaussian dependence topologies. We build on the methodological techniques presented in Papalexiou (2018), but our results can be extended to non‐Gaussian dependences following the results shown by Serinaldi and Kilsby (2016).
As a proof of concept of the versatility of the proposed generator, we showed simulated fields for several conceptual cases mimicking zero‐inflated precipitation, double‐bounded relative humidity, and strongly dependent temperature maxima, as well as a real‐word example concerning intermittent precipitation fields. In all cases, we showed the faithful reproduction of the desired properties. Although the current study focuses on hydroclimatic random fields evolving typically in space and time, this scheme has potential applications far beyond hydroclimatology.
Further improvements concern the introduction of spatial anisotropy beyond simple coordinate transformations, and the efficient transport/advection terms allowing the simulation of weather systems, or spatial patterns, traveling over a given region. Finally, the current scheme forms the basis to extend the DiPMaC (Disaggregation Preserving Marginal and Correlations) algorithm (Papalexiou, Markonis et al., 2018) into spatial and spatiotemporal cases allowing downscaling that preserves the marginal distribution and the spatiotemporal correlation structure at any desired high‐resolution spatiotemporal scale.
Acknowledgments
S. M. P. was funded by the Global Water Futures program and acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC Discovery Grant: RGPIN‐2019‐06894). F. S. acknowledges the support from the Willis Research Network. We thank three anonymous reviewers for their constructive remarks and criticisms that helped improve the original manuscript. The observed data used are freely available at https://www.chc.ucsb.edu/data/chirps#. We have used partly the CoSMoS R‐package available at CRAN. The authors declare no competing interests.
Appendix A
1 Product Moments
), Beta of the Second Kind (
), Burr type XII (
), and Burr type III (
) distributions shown in equations 1 to 4.
(A1)
(A2)
(A3)
(A4)
is the Gamma function and
is the Beta function. The mean μ and the variance σ2 can be estimated by μ = m(1) and σ2 = m(2) − μ2.
2 Anisotropy
, where
(A5)
(A6)3 Additional Figures






References
Citing Literature
Number of times cited according to CrossRef: 7
- Francesco Serinaldi, Federico Lombardo, Chris G. Kilsby, All in order: distribution of serially correlated order statistics with applications to hydrological extremes, Advances in Water Resources, 10.1016/j.advwatres.2020.103686, (103686), (2020).
- Jonathan R. Lamontagne, Caitline A. Barber, Richard M. Vogel, Improved Estimators of Model Performance Efficiency for Skewed Hydrologic Data, Water Resources Research, 10.1029/2020WR027101, 56, 9, (2020).
- Mohanad Zaghloul, Simon Michael Papalexiou, Amin Elshorbagy, Paulin Coulibaly, Revisiting Flood Peak Distributions: A Pan-Canadian Investigation, Advances in Water Resources, 10.1016/j.advwatres.2020.103720, (103720), (2020).
- Simon Michael Papalexiou, Chandra Rupa Rajulapati, Martyn P. Clark, Flavio Lehner, Robustness of CMIP6 Historical Global Mean Temperature Simulations: Trends, Long‐Term Persistence, Autocorrelation, and Distributional Shape, Earth's Future, 10.1029/2020EF001667, 8, 10, (2020).
- Ioannis Tsoukalas, Panagiotis Kossieris, Christos Makropoulos, Simulation of Non-Gaussian Correlated Random Variables, Stochastic Processes and Random Fields: Introducing the anySim R-Package for Environmental Applications and Beyond, Water, 10.3390/w12061645, 12, 6, (1645), (2020).
- Guangxin He, Xiaoju Zhen, Zhongliang Li, Weishou Shen, Jing Han, Lu Zhang, Xun Li, Ran Zhang, Influence of Variations of Hydrothermal Conditions on Normalized Difference Vegetation Index in Typical Temperature Zones Along the East Coast of China, Frontiers in Earth Science, 10.3389/feart.2020.574101, 8, (2020).
- Katja Polotzek, Holger Kantz, An ARFIMA-based model for daily precipitation amounts with direct access to fluctuations, Stochastic Environmental Research and Risk Assessment, 10.1007/s00477-020-01833-w, (2020).





