Volume 115, Issue D22
Climate and Dynamics

Assimilation of multiresolution radiation products into a downwelling surface radiation model: 1. Prior ensemble implementation

B. A. Forman

Department of Civil and Environmental Engineering, University of California, Los Angeles, California, USA

S. A. Margulis

Department of Civil and Environmental Engineering, University of California, Los Angeles, California, USA

First published: 24 November 2010

This is a commentary on DOI:10.1029/2010JD013950


[1] An ensemble-based method for deriving high-resolution estimates of downwelling broadband shortwave and longwave radiation at the Earth's surface via merger of multiresolution (in space and time) satellite-based inputs is presented. An ensemble of data-derived, multiplicative, lognormal perturbations characterizes the uncertainty structure and is subsequently used to perturb nominal estimates of radiation model inputs. The resulting ensemble of model output implicitly contains radiative flux uncertainty associated with uncertain model inputs and explicitly accounts for clear sky versus cloudy sky uncertainties. The perturbations were generated using a two-dimensional conditional turning bands algorithm that accounts for both cross correlations and spatial correlations. Verification studies using independent, ground-based observations show the ensemble to perform well and to encapsulate the majority of observations with relatively little bias. The ensemble-based scheme adequately reproduced hourly downwelling radiative fluxes (and their uncertainty) under diverse atmospheric conditions over a 14 month simulation period in the Southern Great Plains of the United States, and was shown to outperform a more traditional, scalar perturbation approach. The prior ensemble is intended for inclusion into an ensemble-based data assimilation framework presented in part 2 of this study.

1. Introduction

[2] Significant spatial and temporal variability in downwelling longwave (LW) and downwelling shortwave (SW) radiative fluxes exists due to varying atmospheric characteristics. Much of this variability is associated with cloud structure, but it can also be due to air temperature and humidity fields as well as other atmospheric states. This variability, in turn, can have a profound influence on the evolution of land surface states (e.g., soil moisture) and fluxes (e.g., latent and sensible heat fluxes). In order to capture the variability in downwelling radiative fluxes, accurate characterization often requires high spatial and temporal resolution [Li et al., 2005; Rossow et al., 2002]. Uncertainty arising from errors in the estimation (and measurement) of radiative fluxes, however, must also be characterized in order to improve their applicability. Therefore, a complete characterization of downwelling radiative fluxes requires that estimates: (1) be produced at a fine resolution in both space and time, and (2) provide estimates of their corresponding uncertainty.

[3] Numerous researchers have demonstrated the viability of using satellite-based instrumentation for estimating downwelling radiative fluxes at the necessary spatial and temporal scales for use in distributed hydrologic modeling applications [Gautier et al., 1980; Pinker and Ewing, 1985; Gupta, 1989; Gupta et al., 1992; Gautier and Landsfeld, 1997; Diak et al., 2000; Pinker et al., 2003]. Satellite-derived measurements, however, contain complex spatiotemporal uncertainties due to instrumental and sampling errors, and as a result, models utilizing these measurements must account for these uncertainties in order to obtain an accurate yet meaningful result. One practical technique for characterizing uncertainty is the use of an ensemble generated via prescribed perturbations of model inputs and/or parameters [Hamill et al., 2000]. This technique has been successfully applied in hydrologic studies of streamflow [e.g., Carpenter and Georgakakos, 2004; Hong et al., 2006], soil moisture [e.g., Crow and Wood, 2003; De Lannoy et al., 2006; Reichle et al., 2008], surface energy fluxes [e.g., Dunne and Entekhabi, 2006; Margulis et al., 2002], solar insolation [e.g., Lee and Margulis, 2007b], and snow water equivalent estimation [e.g., Andreadis and Lettenmaier, 2006; Durand and Margulis, 2007]. Many of these studies utilize perturbations about a nominal value as a basis for ensemble generation, which can effectively address uncertainties in model parameters, forcings, and initial states in a probabilistic framework. This representation of uncertainty can then be used to optimally merge measurements with the prior ensemble to yield a conditioned (posterior) ensemble in a data assimilation (DA) framework. Proper generation of the prior ensemble is critical in any DA scheme as poorly specified input uncertainties can lead to assimilation estimates of fluxes that are worse than the modeled estimates without assimilation [Reichle et al., 2008].

[4] One significant difference between this study and previous studies is that we chose to focus on the land surface forcing and pose it as a DA problem due to the myriad of readily available radiation products for use during conditioning. Since land surface model (LSM) applications rely on the accurate propagation of land surface states/fluxes, they are largely dependent on the land surface forcings. In order to accurately estimate the key modes of variability of land surface states/fluxes, it is therefore important to represent the correct mode of variability in the forcings. This is a chief motivation for this study with particular emphasis placed on the generation of spatially distributed, ensemble estimates of radiative fluxes, where the ensemble mean captures the true variability across large regions of space while the ensemble as a whole implicitly contains a realistic estimate of the radiative flux uncertainty.

[5] Generating an ensemble of radiative fluxes that captures the uncertainty (in space and time) requires careful consideration of the sources of the uncertainty. Ensemble methods often attempt to represent this uncertainty via random perturbations about a nominal estimate [e.g., Durand and Margulis, 2007; Margulis et al., 2002]. However, these perturbation approaches, particularly on model forcings, are often oversimplified and sometimes fail to properly account for cross correlations, correlations in space, or temporal correlations in the forcings. As a result, the uncertainty structure could be incorrect and may lead to suboptimal results. Our approach, which is discussed in more detail below, accounts for the uncertainty in radiative forcings (for LSM and distributed hydrologic model applications) by explicitly considering cross correlations and spatial correlations between different sources of uncertainty.

[6] The resulting ensemble is then used in an ensemble-based DA scheme in part 2 of this study [Forman and Margulis, 2010] where readily available radiation measurements are used to condition prior estimates. This conditioning step merges information content derived from more sophisticated retrieval algorithms into the conditioned ensemble while simultaneously reducing uncertainty across the ensemble. This conditioned (or posterior) ensemble is what could ultimately be used as “open loop” forcing in LSM applications. Additionally, the same ensemble could be used as a downwelling radiative flux validation data set for climate modelers.

[7] The larger vision for this work is to eventually yield physically consistent ensemble estimates of both radiative and precipitation forcings. Only the radiation study is presented here, but a precipitation model is in development that uses many of the same inputs and employs the same overall framework. The findings in this study serve to demonstrate the feasibility of such an approach using the radiation model as an example with the caveat that precipitation forcing is an equally important consideration to be addressed in future work.

2. Model Description and Background

[8] A previously developed bulk downwelling radiative flux model [Forman and Margulis, 2009] is used in this study. The data-driven model relies solely on a suite of satellite-based measurements to specify atmospheric and land surface conditions necessary for estimation of radiative fluxes reaching the Earth's surface. An important hypothesis in the model formulation used in this study is that clouds are a first-order modulator of downwelling radiative flux processes. Clouds serve to attenuate SW radiation while simultaneously amplifying LW radiation, which effectively couples the two radiative fluxes through cloud states. Remotely sensed cloud states such as cloud base temperature, cloud phase (liquid versus ice), and cloud water path are used in the estimation of LW emission. Cloud states such as hydrometeor size, cloud water path, and cloud phase dictate the scattering and absorption processes of downwelling SW radiation. It is important to note the model is diagnostic rather than prognostic, which has implications on the DA scheme discussed in part 2 [Forman and Margulis, 2010]. The prior model could be made prognostic through the inclusion of a cloud resolving forward model, for example, but is left in diagnostic form for reasons of computational efficiency when applied in a reanalysis study.

[9] At the heart of the model is the high-resolution (in space and time) Visible Infrared Solar-infrared Split-window Technique (VISST) cloud product created by the NASA Langley Research Center [Minnis et al., 1995, 2008]. In addition to cloud states, a number of remotely sensed land surface and atmospheric states are utilized. Surface albedo derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument is used to estimate SW reflection while MODIS-based column-integrated bulk water vapor (total precipitable water) is used to estimate SW absorption. LW estimation utilizes reference-level atmospheric states of specific humidity and air temperature derived from both the MODIS instrument and the Atmospheric Infrared Sounder (AIRS) instrument. For brevity, the satellite-based inputs are only mentioned here while further details are given by Forman and Margulis [2009]. Additionally, a brief description of the radiation model formulation is found in Appendix A. It is important to remember that all of the remotely sensed cloud, atmospheric, and land surface states are uncertain. As will be discussed in more detail, perturbation of these uncertain inputs is a large part of the ensemble formulation discussed below.

3. Prior Ensemble Framework

[10] An important step en route to conducting ensemble-based simulations is to determine which model inputs and/or parameters (hereafter referred to as inputs for simplicity) should be perturbed so as to represent the appropriate amount of uncertainty in the model output. In addition, we desire realistic ensemble replicates that account for both spatial correlation within an input as well as cross correlations between different inputs. The low dimensionality of the data-driven model used in this study [Forman and Margulis, 2009] makes this procedure more feasible. Not accounting for such correlations could lead to unreasonable replicates that do not properly reflect system physics.

[11] It is difficult, if not impossible, to know the exact uncertainty structure of a radiative flux model [Lee and Margulis, 2007b]. In the absence of this information, however, we believe the spatial and temporal variability of the inputs serves as a reasonable proxy for its uncertainty. For example, we assume low variability in the inputs suggests low uncertainty due to the relatively simplified task of quantifying a low variability feature. Conversely, we assume large variability suggests greater uncertainty as the opportunities for error are increased when quantifying a dynamic, highly variable feature. This approach may not be ideal, but we argue it is more rigorous than traditional, ad hoc approaches (e.g., assuming spatial uniformity and/or mutual independence). For example, application of multiplicative, spatially uniform perturbations neglects cloud structure and explicitly assumes clear sky uncertainties are greater than cloudy sky uncertainties. However, this has been shown not to be true [Forman and Margulis, 2009; Pinker et al., 2003] and, in most cases, to be the exact opposite.

[12] Uncertainties within cloudy regions, too, may differ due to differences in cloud structure. For example, optically thick cloud regions affect SW fluxes differently than regions of moderate optical thickness; uncertainties in low-lying (and relatively warm) cloud regions influence LW fluxes differently than those in high-altitude (and relatively cold) cloud regions. Further, cross correlations between different sources of uncertainty should be considered. In addition, if large spatial domains are investigated, careful accounting often requires consideration of correlation lengths in order to properly reflect a characteristic length scale in uncertainty due to cloud structure and regional climatology. The accurate characterization of uncertainty should lead to better incorporation of uncertain characteristics and hence yield an ensemble that implicitly includes an improved representation of its inherent structure.

[13] The ensemble-based framework begins with a simple vector representation of the model output, y(x, t), that is explicitly dependent on both space (x) and time (t)
$$\mathbf{y}(\mathbf{x},t) = \begin{bmatrix} R_{SW}(\mathbf{x},t) \\ R_{LW}(\mathbf{x},t) \end{bmatrix} = \mathcal{M}[\mathbf{u}(\mathbf{x},t)] \tag{1}$$
where RSW(x, t) is the downwelling SW flux, RLW(x, t) is the downwelling LW flux, $\mathcal{M}[\cdot]$ is the radiation model operator, and u(x, t) is the vector of model inputs. u(x, t), as discussed in more detail in section 3.1, is composed of satellite-derived measurements as well as empirical parameterizations of atmospheric and land surface processes.

[14] We assume u(x, t) is a random vector that explains the uncertainty in y(x, t) and that u(x, t) subsequently follows some underlying (unknown) probability density function (pdf), $\mathbf{u} \sim p_{\mathbf{u}}(\mathbf{u})$. In principle, given the underlying pdf, we can sample it to generate an ensemble of fluxes such that
$$\mathbf{y}_j(\mathbf{x},t) = \mathcal{M}[\mathbf{u}_j(\mathbf{x},t)], \quad j = 1, \ldots, N \tag{2}$$
where j represents a single replicate from an ensemble of size N and $\mathbf{u}_j \sim p_{\mathbf{u}}(\mathbf{u})$. Since the underlying pdf is generally unknown, we assume that
$$\mathbf{u}_j(\mathbf{x},t) = \bar{\mathbf{u}}(\mathbf{x},t) \circ \boldsymbol{\gamma}_j(\mathbf{x},\mathbf{L}) \tag{3}$$
where $\bar{\mathbf{u}}(\mathbf{x},t)$ is the nominal estimate, $\circ$ denotes element-wise multiplication, γj(x, L) is a perturbation replicate sampled from the distribution of γ(x, L), L is the correlation length vector that defines spatial correlations in u(x, t), γ(x, L) ∼ LN(1, Cγ(x)) is lognormally distributed with unit mean, and Cγ(x) is the perturbation covariance matrix used to characterize the uncertainty structure where L is implicit in its formulation. The multiplicative γ(x, L) formulation used in this study assumes u(x, t) is unbiased; however, this assumption could be relaxed.

[15] Since all of the model inputs in u(x, t) are positive quantities, a multiplicative lognormal perturbation model is chosen. This type of model is commonly used in hydrological applications [Andreadis and Lettenmaier, 2006; Dunne and Entekhabi, 2006; Margulis et al., 2006; Lee and Margulis, 2007a; Reichle et al., 2008; Pan and Wood, 2006] and avoids the specification of physically unrealistic (i.e., negative) quantities. The selection of a lognormal perturbation model allows us to incorporate first- and second-moment information into the ensemble while maintaining feasibility by ignoring higher moments [Khare et al., 2008].
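As a minimal sketch of a multiplicative, mean-unity lognormal perturbation for a single positive input, the Python snippet below uses the standard moment-matching parameterization of the lognormal distribution. This is an assumption made for illustration: it is not necessarily the exact procedure used in this study, and it ignores spatial and cross correlations. The function name and the nominal cloud water path value are hypothetical.

```python
import math
import random

def lognormal_multipliers(cv, n, rng):
    """Draw n lognormal multipliers with mean 1 and coefficient of variation cv."""
    var_log = math.log(1.0 + cv * cv)   # variance of ln(gamma)
    mean_log = -0.5 * var_log           # chosen so that E[gamma] = 1
    sd_log = math.sqrt(var_log)
    return [math.exp(rng.gauss(mean_log, sd_log)) for _ in range(n)]

rng = random.Random(0)
nominal_wp = 120.0                      # hypothetical nominal cloud water path (g m^-2)
gammas = lognormal_multipliers(0.50, 20000, rng)   # CV = 0.50, as listed for WP in Table 2
ensemble = [nominal_wp * g for g in gammas]        # every replicate stays positive

mean_gamma = sum(gammas) / len(gammas)
sd_gamma = math.sqrt(sum((g - mean_gamma) ** 2 for g in gammas) / (len(gammas) - 1))
print(f"sample mean = {mean_gamma:.3f}, sample CV = {sd_gamma / mean_gamma:.3f}")
```

Because each multiplier is the exponential of a Gaussian draw, no replicate can become negative, which is the practical motivation for the lognormal choice.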

[16] As shown in (3), the perturbations used in this study do not fully characterize the temporal aspects of the perturbation correlations. In an effort to address this issue while lacking adequate information necessary to characterize the temporal uncertainty structure of the temporally sparse measurements, we make the following assumptions: (1) perturbations are uncorrelated day to day, and (2) perturbations are perfectly correlated throughout the course of a day. The first part of this assumption (i.e., day-to-day independence) is based on an investigation of continuous atmospheric states (e.g., air temperature, specific humidity, column-integrated water vapor) where temporal autocorrelations were computed and, in general, suggested relatively small (−0.3 to 0.3) correlations after 24–48 h (results not shown). Unfortunately, this same procedure could not be conducted using the discontinuous cloud states. That is, from a Eulerian perspective, clouds regularly advect into and out of the domain making computation of autocorrelations difficult due to the presence of large numbers of missing values. Therefore, the limited ability to compute autocorrelations within the highly dynamic and discontinuous fields, coupled with the results from the continuous field analysis, motivated the assumption of interday independence. Similarly, given the lack of available information to characterize uncertainties on a subdiurnal time scale, we feel the intraday assumption is reasonable.
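The two temporal assumptions can be sketched for a single location as follows: one multiplier is drawn per day (interday independence) and held fixed for every hour of that day (perfect intraday correlation). The function name, the hourly time step, and the use of a moment-matched lognormal draw are assumptions made for illustration only.

```python
import math
import random

def hourly_multipliers(n_days, cv, rng, hours_per_day=24):
    """One mean-unity lognormal draw per day, repeated for every hour of that day."""
    var_log = math.log(1.0 + cv * cv)
    series = []
    for _ in range(n_days):
        daily = math.exp(rng.gauss(-0.5 * var_log, math.sqrt(var_log)))
        series.extend([daily] * hours_per_day)   # perfectly correlated within the day
    return series

# Three days of hourly multipliers with CV = 0.15 (the WV value in Table 2)
series = hourly_multipliers(n_days=3, cv=0.15, rng=random.Random(1))
```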

[17] Sections 3.1–3.3 discuss the data-based approach for defining input perturbations. First, the selection criteria used to determine which uncertainties are significant, and hence should be included in the perturbation process, are considered. Next, the covariance matrix, Cγ(x), is constructed. The variance, covariance, and correlation lengths of the different sources of uncertainty are controlled by Cγ(x) within a regionalized space that is defined using computed variograms. Finally, the conditional turning bands algorithm used to generate the cross-correlated random fields is highlighted.

3.1. Sensitivity Analysis

[18] A sensitivity analysis was first performed to investigate the relative influence each input has on model output. Only sensitive inputs in u(x, t) that produce a relatively large amount of change in model output are included in the perturbation process. Relatively insensitive inputs should be excluded to reduce the number of degrees of freedom. A commonly used method in assessing model sensitivities is the computation of sensitivity coefficients. One such technique is the Influence Coefficient Method [Willis and Yeh, 1987]. Perturbations are made to one input at a time and are typically centered about a nominal value. Using the resulting output, normalized sensitivity coefficients for each input are calculated as
$$S_{u_m}(\mathbf{x}) = \frac{\delta R(\mathbf{x}) / \bar{R}(\mathbf{x})}{\delta u_m(\mathbf{x}) / \bar{u}_m(\mathbf{x})} \tag{4}$$
where $S_{u_m}(\mathbf{x})$ is the normalized sensitivity coefficient for a given input um(x), $\delta R(\mathbf{x})$ is the resulting change in radiative flux, $\bar{R}(\mathbf{x})$ is the radiative flux associated with the nominal input, $\bar{u}_m(\mathbf{x})$, and δum(x) is the amount of input perturbation. Table 1 shows typical results of local sensitivity analyses conducted at local solar noon during a variety of cloud cover conditions. Tests were conducted at local solar noon because total downwelling flux is relatively large at this time of day, which allows for a more magnified assessment of input sensitivity. A total of nine inputs with significantly large sensitivity coefficients, as highlighted by the bold font in Table 1, are subsequently included in the perturbation process. Additional details on the satellite-based inputs and/or process parameterizations are given by Forman and Margulis [2009]. The final size of the perturbation covariance matrix, Cγ(x), is defined based on the sensitivity results such that $\mathbf{C}_\gamma(\mathbf{x}) \in \mathbb{R}^{N_\gamma \times N_\gamma}$, where Nγ = 9.
Table 1. Normalized Sensitivity Coefficients From Longwave and Shortwave Models Near Local Solar Noon (a)
Model Input | Source | Model | Clear Sky | Cloudy Sky (b)
Air humidity | Sat | LW | 0.12 | 0.036
Air temperature | Sat | LW | 12 | 7.7
Cloud base temperature | Sat | LW | N/A | 0.52
Cloud water path | Sat | LW | N/A | 0.0013
Aerosol scattering | P | SW | −0.10 | −0.12
Albedo | Sat | SW | 0.009 | 0.15
Cloud hydrometeor size | Sat | SW | N/A | 0.52
Cloud water path | Sat | SW | N/A | −0.55
Column-integrated water vapor | Sat | SW | −0.13 | −0.21
Downward diffuse scattering | P | SW | 0.007 | 0.00013
Ozone absorption (ultraviolet) | P | SW | −0.0005 | −0.0005
Ozone absorption (visible) | P | SW | −0.0004 | −0.0004
Rayleigh scattering | P | SW | −0.06 | −0.09
  • (a) Variables in bold font perturbed as part of ensemble generation.
  • (b) Combined ice-phase and liquid-phase cloud effects; LW, longwave model; N/A, not applicable; P, parameterized as a function of solar zenith angle; Sat, satellite derived; SW, shortwave model.
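A one-at-a-time, centered sensitivity calculation of the kind described in this section can be sketched as below. The gray-body flux function is a hypothetical stand-in used only to make the example runnable; it is not the radiation model of this study, and the function names and the 1% perturbation size are assumptions.

```python
def normalized_sensitivity(model, nominal, name, fraction=0.01):
    """Centered one-at-a-time estimate of (dR / R) / (du / u) for input `name`."""
    du = fraction * nominal[name]
    high = dict(nominal, **{name: nominal[name] + du})
    low = dict(nominal, **{name: nominal[name] - du})
    d_flux = model(**high) - model(**low)       # change in flux over a 2*du window
    return (d_flux / model(**nominal)) / (2.0 * du / nominal[name])

def toy_longwave(air_temperature, emissivity):
    """Hypothetical gray-body flux (W m^-2); NOT the radiation model of the study."""
    return emissivity * 5.670374419e-8 * air_temperature ** 4

nominal = {"air_temperature": 290.0, "emissivity": 0.8}
s_temperature = normalized_sensitivity(toy_longwave, nominal, "air_temperature")
s_emissivity = normalized_sensitivity(toy_longwave, nominal, "emissivity")
```

For this toy function the coefficient for temperature is close to 4 (the exponent) and the coefficient for emissivity is 1, which illustrates why normalized coefficients allow inputs with different units to be ranked against one another.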

3.2. Uncertainty Characterization

[19] Uncertainties in satellite-based inputs used in radiation models are generally not well characterized. Spatial scale differences between coarse, satellite-based estimates and point-scale, ground-based radiometers commonly used in the characterization procedure make the process more difficult. Further, a shortage of independent observations available for comparison often limits the amount of uncertainty characterization that may be performed. Despite these limitations, an explicit attempt is made to better characterize these uncertainties in order to generate a covariance matrix, including specification of correlation lengths, such that a physically realistic set of perturbations is produced for use in (3). We are motivated to produce a physically realistic set of perturbations that describes the uncertainties in the satellite-based inputs such that a positive (or negative) perturbation of one input will produce appropriate perturbations in all other inputs. We believe such an approach will yield an ensemble of radiative fluxes that not only implicitly contains the complex spatiotemporal uncertainties inherent in satellite-based radiative flux estimation but does so in a physically consistent manner.

[20] Uncertainties are defined by the covariance matrix, Cγ(x), in which correlation lengths, L, are implicit. Since we assume the multiplicative perturbations are mean unity, we define a single element of the covariance matrix as
$$[\mathbf{C}_\gamma]_{mn} = \rho_{mn}\,\sigma_m\,\sigma_n = \rho_{mn}\,CV_m\,CV_n \tag{5}$$
where $\rho_{mn}$ is the correlation coefficient between variables um and un, $\sigma_m$ is the standard deviation of um, and $CV_m$ is the coefficient of variation of um. Computation of the CV values is done directly with the satellite-based inputs in conjunction with a consistency check using ground-based observations (when available), and is discussed further in section 3.2.1. Computation of the correlation matrix is highlighted in section 3.2.2. Finally, spatial correlations represented by L are estimated using variograms as discussed in section 3.2.3.
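To make the construction concrete, the sketch below assembles the covariance for the three cloud-state inputs only, using the CVs in Table 2 and the correlations in Table 3. It assumes each element is the product of the correlation coefficient and the two CVs (for mean-unity perturbations the standard deviation equals the CV); the function and variable names are hypothetical.

```python
names = ["HS", "Tc", "WP"]                       # cloud-state subset of Table 3
cv = {"HS": 0.35, "Tc": 0.015, "WP": 0.50}       # coefficients of variation (Table 2)
rho = [[1.0, -0.67, 0.35],
       [-0.67, 1.0, -0.30],
       [0.35, -0.30, 1.0]]                       # lognormal correlations (Table 3)

def perturbation_covariance(names, cv, rho):
    """C[m][n] = rho[m][n] * CV_m * CV_n for mean-unity multiplicative perturbations."""
    size = len(names)
    return [[rho[m][n] * cv[names[m]] * cv[names[n]] for n in range(size)]
            for m in range(size)]

cov = perturbation_covariance(names, cv, rho)
for name, row in zip(names, cov):
    print(name, [f"{value:.6f}" for value in row])
```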

3.2.1. Coefficients of Variation

[21] Coefficients of variation (CVs) for the inputs were first computed across space as a function of time. Next, the temporally varying CVs were averaged over the 14 month simulation period (discussed in section 4) in order to estimate a typical CV for each input. Since the perturbations in (3) are mean unity by definition, the CVs are inserted directly into (5). It is important to note that all CVs for cloud states were computed using only estimates collected near local solar noon. This was done because of the additional information content derived from visible and infrared channels rather than using less accurate estimates at nighttime, which are derived from infrared channels only [Minnis, 2001; Minnis et al., 2008]. When satellite-based state estimates were not available for use, CVs were based on reported literature values. Namely, aerosol scattering and Rayleigh scattering CVs were based on work by Lee and Margulis [2007b]. The collection of CVs ultimately used in the perturbation procedure are listed in Table 2.
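The two-step estimate described above (a spatial CV at each time, then an average over time) can be sketched as follows. The fields are hypothetical, missing-value handling is omitted, and the use of the population standard deviation is an assumption.

```python
import statistics

def typical_cv(fields):
    """Spatial CV of each time slice, then the average of those CVs over time."""
    cvs = []
    for pixels in fields:                       # one list of valid pixel values per time
        mean = statistics.fmean(pixels)
        cvs.append(statistics.pstdev(pixels) / mean)
    return statistics.fmean(cvs)

# Hypothetical column-integrated water vapor fields (cm) at three times
fields = [[2.0, 2.2, 1.8, 2.0],
          [1.0, 1.5, 0.5, 1.0],
          [3.0, 3.0, 3.0, 3.0]]
cv_typical = typical_cv(fields)
```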

Table 2. Approximated Correlation Lengths, Coefficients of Variation, and Upper Thresholds Used in the Generation of Correlated Ensemble Perturbations (a)
Model Variable | Units | Model | CL (km) | CV | UT
Aerosol scattering (AS) | - | SW | 200 | 0.030 | 3
Air specific humidity (qa) | kg kg⁻¹ | LW | 200 | 0.52 | 3
Air temperature (Ta) | K | LW | 250 | 0.010 | 1.03
Cloud hydrometeor size (HS) | μm | SW | 80 | 0.35 | 2
Cloud base temperature (Tc) | K | LW | 150 | 0.015 | 1.15
Cloud water path (WP) | g m⁻² | LW/SW | 100 | 0.50 | 3
Column-integrated water vapor (WV) | cm | SW | 200 | 0.15 | 2
Land surface albedo (Al) | - | SW | 100 | 0.17 | 2
Rayleigh scattering (RS) | - | SW | 200 | 0.028 | 3
  • (a) CL, correlation length; CV, coefficient of variation; UT, upper threshold.

[22] The reference-level state (i.e., air temperature and air humidity) CVs derived from MODIS and AIRS inputs included an additional check. Namely, these values were compared against uncertainty estimates derived via direct comparison to independent, ground-based observations. This procedure was conducted because published estimates of satellite-derived temperature and humidity uncertainty are generally derived at altitude using radiosonde measurements, and not at the reference-level elevation used in this study. Observations of reference-level (∼2 m above the ground surface) air temperature and humidity from the Oklahoma Mesonet [Brock et al., 1995] were used. Figure 1 shows the location of the Mesonet stations. Assuming the Mesonet observations are a measure of the truth, bias and root mean squared error (RMSE) were computed. A total of 127 observation stations were used in the calculations.
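The bias and RMSE used in this check can be computed as in the minimal sketch below, which treats the station observations as truth. The temperature values and variable names are hypothetical and serve only to make the example runnable.

```python
import math

def bias_and_rmse(estimates, observations):
    """Mean error and root mean squared error, treating observations as truth."""
    errors = [e - o for e, o in zip(estimates, observations)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err * err for err in errors) / len(errors))
    return bias, rmse

satellite_ta = [290.0, 291.0, 289.0]   # hypothetical satellite-derived air temperature (K)
mesonet_ta = [291.0, 291.0, 291.0]     # hypothetical colocated station values (K)
bias, rmse = bias_and_rmse(satellite_ta, mesonet_ta)
```

A negative bias in this convention means the satellite-derived estimate is lower than the ground-based observation.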

Figure 1. Southern Great Plains (SGP) domain used in this study as delineated by the thick black line. The black circles and white squares represent ground-based measurement stations from ARM SIRS and Oklahoma Mesonet, respectively.

[23] Over the entire 14 month period, the model reference-level air temperature shows a relatively small bias of −0.65 K whereas the dew point temperature shows a significant dry bias of −3.5 K. The latter agrees with other findings suggesting a dry bias during relatively moist conditions in the AIRS product [Wu et al., 2005] as well as the MODIS product [Seemann et al., 2003]. In addition, the bias appears to be seasonal, increasing during the winter months. This behavior, too, agrees with reported findings that MODIS contains a wet bias during dry conditions [Albert et al., 2005; Seemann et al., 2003]. The RMSE values generally agree with the known uncertainty of the MODIS and AIRS sensors [Albert et al., 2005; Seemann et al., 2003; Wu et al., 2005], and furthermore, do not appear to demonstrate the seasonality seen in the bias. The results from this additional check suggest the amount of uncertainty in the satellite-derived, reference-level estimates of air temperature and humidity is reasonable, and furthermore suggest the level of uncertainty introduced into the perturbations (Table 2) is representative of the uncertainty observed in comparison against Mesonet observations.

3.2.2. Cross Correlations

[24] Determination of cross correlations, and hence uncertainty cross correlations, began with direct examination of the model inputs. The investigation started with careful quality control and assurance, and verification that all variables were properly colocated with one another in both time and space. The vector form of these colocated spatial fields was then used to compute the (lognormal) correlation matrix using the following expression:
$$\rho_{mn} = \frac{E\left[(\mathbf{u}_m - \bar{\mathbf{u}}_m)(\mathbf{u}_n - \bar{\mathbf{u}}_n)\right]}{\sigma_m\,\sigma_n} \tag{6}$$
where $\rho_{mn}$ is the (m, n) element of the correlation matrix, E[] is the expectation operator, um and un are two model variables in vector form, $\bar{\mathbf{u}}_m$ and $\bar{\mathbf{u}}_n$ are the vector means of um and un, respectively, and $\sigma_m$ and $\sigma_n$ are the standard deviations of um and un, respectively. This process was conducted using regionalized blocks of data as a means of incorporating estimates of spatial correlation, which is discussed in section 3.2.3.

[25] An average of the cross correlations computed near local solar noon during the 14 month investigation period is provided in Table 3. The resulting correlation matrix was subsequently used to specify cross correlations during the perturbation generation procedure discussed in section 3.3. As is clearly seen, some inputs are strongly correlated with one another whereas others behave independently. A simple t test of the significance of the correlation coefficients at a level of α = 0.01 showed that correlations with a magnitude greater than 0.074 are significant; that is, for these pairs one can reject the null hypothesis that the variables are completely uncorrelated. For correlations with a magnitude less than 0.074 the null hypothesis cannot be rejected, and these correlations were correspondingly set equal to zero.
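The screening of weak correlations can be sketched as below. The critical value follows from the usual t statistic for a correlation coefficient, t = r·sqrt(n − 2)/sqrt(1 − r²), solved for r. The sample size n = 1200 is hypothetical (the number of samples is not reported here), the test is assumed to be two-sided, and a normal approximation to the t distribution is used, which is only reasonable for large n.

```python
import math
from statistics import NormalDist

def critical_correlation(n, alpha=0.01):
    """Two-sided critical |r|, using a normal approximation to the t distribution."""
    t_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # adequate when n is large
    return t_crit / math.sqrt(n - 2 + t_crit ** 2)

def zero_insignificant(rho, r_crit):
    """Set off-diagonal correlations with |r| below r_crit to zero."""
    return [[r if (i == j or abs(r) >= r_crit) else 0.0
             for j, r in enumerate(row)] for i, row in enumerate(rho)]

r_crit = critical_correlation(n=1200)   # hypothetical sample size, not from the text
screened = zero_insignificant([[1.0, 0.05, -0.11],
                               [0.05, 1.0, 0.35],
                               [-0.11, 0.35, 1.0]], r_crit)
```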

Table 3. Data-Derived Lognormal Correlation Matrix Used in Generating Variable Perturbations (a)
   | A | AS | qa | RS | Ta | WV | HS | Tc | WP
A | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
AS | 0 | 1 | −0.24 | 0.8 | −0.44 | −0.45 | 0 | 0 | 0
qa | 0 | −0.24 | 1 | −0.24 | −0.18 | 0.18 | 0 | 0.16 | −0.11
RS | 0 | 0.8 | −0.24 | 1 | 0 | −0.4 | 0 | 0 | 0
Ta | 0 | −0.44 | −0.18 | 0 | 1 | 0.24 | 0 | 0 | 0
WV | 0 | −0.45 | 0.18 | −0.4 | 0.24 | 1 | 0 | 0 | 0
HS | 0 | 0 | 0 | 0 | 0 | 0 | 1 | −0.67 | 0.35
Tc | 0 | 0 | 0.16 | 0 | 0 | 0 | −0.67 | 1 | −0.3
WP | 0 | 0 | −0.11 | 0 | 0 | 0 | 0.35 | −0.3 | 1
  • (a) A, albedo; AS, aerosol scattering coefficient; qa, air specific humidity; RS, Rayleigh scattering coefficient; Ta, air temperature; WV, column-integrated water vapor; HS, cloud hydrometeor size; Tc, cloud base temperature; WP, cloud water path.

3.2.3. Spatial Correlations

[26] Uncertainties may be considered as regionalized random variables up to a certain distance at which point they behave more or less independently. In radiative flux processes, correlation lengths are associated with either clear sky or cloudy sky conditions. For example, in the presence of clouds, uncertainty correlation lengths are typically equal to or less than the approximate length scale of the clouds present. During clear sky conditions, the correlation length scales are typically longer than the correlation length scale in the presence of clouds due to clear sky atmospheric conditions being more continuous over longer distances relative to localized, and often discontinuous, cloud fields.

[27] In a manner similar to determining the uncertainty cross correlations, a data-driven approach was used to derive uncertainty correlation lengths. The estimation of correlation lengths involved computing a variogram for each input listed in Table 1. The variogram is defined as the mean quadratic increment (divided by 2) of a given field, u(x), for any two points separated by a distance h. It may be generally expressed as
$$\Gamma(h) = \frac{1}{2}\,E\left[\left(u(x + h) - u(x)\right)^2\right] \tag{7}$$
where Γ(h) is the variogram and E[] is the expectation operator [Wackernagel, 2003]. The correlation length for a given input is estimated as the distance at which the variogram reaches a near-asymptotic value less than 1/e but greater than zero. Variograms were computed for each input over a range of atmospheric conditions across the 14 month investigation period. In general, the correlation lengths showed relatively little seasonality in their values except for the cloud states, which contained shorter correlation lengths during the summer months due to the presence of convective cloud systems. However, these changes as a function of season were relatively insignificant. The typical correlation length estimated for each input is compiled in Table 2.
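An empirical version of the variogram (half the mean squared increment at each separation) can be sketched for a regularly spaced one-dimensional transect as below. The transect values are hypothetical, lags are expressed in grid cells rather than kilometers, and the step of reading a correlation length off the resulting curve is not reproduced.

```python
def empirical_variogram(values, max_lag):
    """Half the mean squared increment for lags 1..max_lag on a regular 1-D transect."""
    gamma = {}
    for lag in range(1, max_lag + 1):
        increments = [(values[i + lag] - values[i]) ** 2
                      for i in range(len(values) - lag)]
        gamma[lag] = 0.5 * sum(increments) / len(increments)
    return gamma

transect = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]   # hypothetical alternating field values
gamma = empirical_variogram(transect, max_lag=2)
```

Multiplying each lag by the pixel spacing converts the separation to a physical distance, which is the form needed to compare against the correlation lengths in Table 2.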

3.3. Ensemble Perturbation Generation

[28] Correlated lognormal fields were derived using the Turning Bands Co-simulation (TBCOSIM) algorithm of Emery [2008]. TBCOSIM produces isotropic, multidimensional stationary random fields that are mean zero with unit variance. Inputs to the TBCOSIM algorithm include the Gaussian transformation of the lognormal correlation matrix shown in Table 3 as well as specification of correlation lengths shown in Table 2. The Gaussian transformation of the correlation matrix, $\boldsymbol{\rho}^{G}$, first required the Gaussian transformation of the covariance matrix, $\mathbf{C}_\gamma^{G}$, which is a function of the lognormal coefficients of variation listed in Table 2 and the lognormal correlation matrix listed in Table 3. The transformed Gaussian covariance [Wang, 1998] was computed as
$$[\mathbf{C}_\gamma^{G}]_{mn} = \ln\left(1 + \rho_{mn}\,CV_m\,CV_n\right) \tag{8}$$
where $\rho_{mn}$ is the lognormal correlation coefficient between inputs m and n and $CV_m$ is the lognormal coefficient of variation of input m. Next, the Gaussian transformation of the correlation matrix was computed as
$$\rho^{G}_{mn} = \frac{[\mathbf{C}_\gamma^{G}]_{mn}}{\sqrt{[\mathbf{C}_\gamma^{G}]_{mm}\,[\mathbf{C}_\gamma^{G}]_{nn}}} \tag{9}$$
The Gaussian transformation of the correlation matrix along with prescribed spatial correlations were used as input to the TBCOSIM algorithm. Spatial correlations were assigned using a “scale factor” parameter discussed by Emery [2008]. In order to ensure the (lognormal) random fields contained correlation lengths as listed in Table 2 utilizing the specified “scale factor”, correlation lengths were recomputed from the newly generated fields to make certain L matched the values listed in Table 2. The random fields were transformed from normal to lognormal quantities by computing their exponentials and subsequently multiplied by their corresponding CVs (Table 2) in order to ensure the desired covariance. Each set of lognormal perturbations was tested to verify they comprised a unit mean across the ensemble and that the prescribed covariances and correlation lengths were preserved.
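The lognormal-to-Gaussian conversion can be illustrated at a single point for two inputs, here cloud hydrometeor size (HS) and cloud water path (WP) with the values from Tables 2 and 3. This is a sketch under stated assumptions: it uses the standard moment relations for a mean-unity lognormal pair, it does not reproduce TBCOSIM or any spatial structure, and the exponentiation step below is a textbook parameterization rather than the exact scaling procedure described above. All function names are hypothetical.

```python
import math
import random

def gaussian_moments(cv_m, cv_n, rho_mn):
    """Log-space variances and correlation for two mean-unity lognormal multipliers."""
    var_m = math.log(1.0 + cv_m * cv_m)
    var_n = math.log(1.0 + cv_n * cv_n)
    cov_mn = math.log(1.0 + rho_mn * cv_m * cv_n)
    return var_m, var_n, cov_mn / math.sqrt(var_m * var_n)

def sample_pair(var_m, var_n, rho_g, rng):
    """Correlated standard normals mapped to a mean-unity lognormal pair."""
    z_m = rng.gauss(0.0, 1.0)
    z_n = rho_g * z_m + math.sqrt(1.0 - rho_g ** 2) * rng.gauss(0.0, 1.0)
    return (math.exp(-0.5 * var_m + math.sqrt(var_m) * z_m),
            math.exp(-0.5 * var_n + math.sqrt(var_n) * z_n))

def correlation(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
var_hs, var_wp, rho_g = gaussian_moments(0.35, 0.50, 0.35)   # HS and WP, Tables 2 and 3
pairs = [sample_pair(var_hs, var_wp, rho_g, rng) for _ in range(20000)]
hs, wp = zip(*pairs)
rho_sample = correlation(hs, wp)
print(f"Gaussian rho = {rho_g:.4f}, lognormal sample rho = {rho_sample:.3f}")
```

The Gaussian-space correlation is slightly larger in magnitude than the lognormal correlation it is meant to reproduce, which is why the transformation is needed before simulating the underlying Gaussian fields.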

[29] A few minor modifications were made to some of the fields in order to remove perturbations that resulted in physically unrealistic quantities. Examples include correction of albedos greater than one as well as cloud base temperatures exceeding reference-level air temperatures. These modifications were only made to values at the tail of the distributions and generally constituted changes to less than 0.5% of the total number of values. An exception to this is the reference-level air temperature perturbations in which approximately 2% of the values exceeded the threshold and were subsequently modified. Upper thresholds listed in Table 2 were based on reported literature values [Dong et al., 2002; Minnis et al., 2008] as well as on the need to impose realistic physical constraints on the generated fields.

[30] When a perturbation exceeded an upper threshold at a given location, the threshold value was applied to that location. Acceptable values at these locations could instead have been obtained by simply drawing a new sample from the specified distribution; however, this was avoided for two reasons: (1) assigning the threshold value introduces less negative skew into the lognormal distributions, and (2) assigning the threshold value more closely preserves the correlation lengths specified in Table 2. Any fields modified by this procedure were reexamined to ensure mean unity was maintained.
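The thresholding step can be illustrated with a short sketch. The function name, the nominal albedo of 0.8, and the perturbation values are hypothetical; only the rule itself (assign the threshold value where it is exceeded) comes from the text.

```python
import numpy as np

def apply_upper_threshold(nominal, perturbation, upper):
    """Perturb a nominal field multiplicatively and cap it at `upper`.

    Returns the capped field and the fraction of values that were modified.
    """
    perturbed = nominal * perturbation
    exceeded = perturbed > upper
    capped = np.where(exceeded, upper, perturbed)
    return capped, exceeded.mean()

# Hypothetical example: albedo cannot exceed one.
nominal = np.full(5, 0.8)
perturbation = np.array([0.9, 1.0, 1.1, 1.3, 0.7])
capped, frac_modified = apply_upper_threshold(nominal, perturbation, upper=1.0)

# Effective perturbation after capping, whose ensemble mean could then be rechecked.
effective = capped / nominal
```

Tracking `frac_modified` corresponds to the reported share of modified values, and `effective` is the quantity one would reexamine for mean unity.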

4. Model Application

[31] The Southern Great Plains (SGP) of the United States (Figure 1) was used for model application in this study. This is the same region used in the deterministic model application of Forman and Margulis [2009]. The region covers a 10° × 10° area in the middle of the continental United States and, for the most part, includes the states of Nebraska, Kansas, Oklahoma, and Texas. The relatively homogeneous terrain in the SGP allows one to treat the satellite pixel field of view (on the order of kilometers) as a homogeneous region, which helps minimize much of the scale differences between satellite-scale estimates and the point-scale observations used for comparison.

[32] The selection of the SGP was further motivated by the wealth of ground-based observations used for radiative flux comparison (or verification) and for uncertainty characterization of satellite-based inputs. As shown in Figure 1, multiple ground-based measurement networks exist in the region. These include the Solar Infrared Radiation Stations (SIRS) operated by the Atmospheric Radiation Measurement (ARM) Program under the auspices of the United States Department of Energy as well as the Oklahoma Mesonet [Brock et al., 1995].

[33] A simulation period of 14 months from 1 August 2003 through 30 September 2004 was used in this study. This time period yielded coincident satellite-based inputs used in the radiation model and enabled the investigation of a minimum of one annual cycle. The former was necessary to maximize the amount of model output that could be produced whereas the latter was necessary to investigate the effects of seasonality.

5. Results and Discussion

5.1. Correlated, Spatially Distributed Perturbed Model Inputs

[34] An example of perturbed cloud hydrometeor size along with perturbed cloud temperature during cloudy sky conditions is shown in Figure 2. The nominal field is included for reference. Figure 2 is an ideal example of how spatially correlated and cross-correlated uncertainties are accounted for between states. Notice how an increase in cloud hydrometeor size, in general, corresponds to a decrease in cloud temperature, and vice versa. The prescribed negative correlation listed in Table 3 was incorporated as an input in the random field generation procedure. Random perturbations were simultaneously generated for all other inputs shown in Table 2 and were regenerated for each day of the simulation. As a result, perturbations were perfectly correlated in time during a given day and uncorrelated between days.

Figure 2. Example of nominal and perturbed cloud hydrometeor sizes on 19 August 2004 near local solar noon for (a) nominal, (b) replicate 8, and (c) replicate 9. (d, e, and f) Corresponding perturbed cloud base temperatures for Figures 2a, 2b, and 2c, respectively. White areas in the southwestern and northeastern portions of the domain represent cloud-free areas. Axis labels have been omitted for simplicity.

[35] One additional aspect of ensemble generation worth noting is the selection of an appropriate ensemble size. Larger ensembles better approximate specified error distributions; however, computational constraints limit feasible ensemble sizes. Based on the convergence of root mean squared difference (RMSD) and mean difference (MD) values computed between ensemble means and corresponding SIRS measurements (results not shown), an ensemble size of 50 was deemed sufficiently large for this study.

5.2. Spatially Distributed Estimates

[36] Prior (or unconditioned) ensemble realizations were generated for the specified simulation period. Qualitative assessment was performed via investigation of radiative fluxes distributed across the study domain. Figure 3 shows results of the ensemble mean, 〈y〉, and the ensemble standard deviation, σy, during cloudy sky conditions near local solar noon on 19 August 2004. The 〈 〉 symbol represents the arithmetic mean operator. The southwestern corner of the domain is cloud free and, in general, contains lower uncertainty in both the LW and SW ensembles. In regions with cloud cover, the uncertainty increases due to the modulation of downwelling radiation by uncertain cloud states. LW fluxes are most uncertain, in general, in regions with relatively low-lying (and hence relatively warm) cloud base elevations. The multiplicative formulation of the perturbations introduces a greater degree of uncertainty into the uncertain lower (warmer) cloud base temperatures and hence yields more uncertainty in areas of enhanced downwelling LW flux via cloud contributions. SW fluxes are most uncertain in regions of clouds associated with small to moderate optical thicknesses. This is due to the fact that small perturbations in cloud hydrometeor size and cloud water path can exert a relatively large impact on scattering and absorption processes, respectively. In regions of optically thick clouds (e.g., deep convective cells), the SW uncertainty is small because changes in hydrometeor size or water path cause relatively little change in the already strongly attenuated SW signal. Figure 3 demonstrates the ability of the ensemble framework to capture the large-scale structure of downwelling radiative fluxes while preserving the fine-scale uncertainty structure. This is a good example of how the approach quantifies the heterogeneous uncertainty structure in satellite-based radiative fluxes where more traditional methods (e.g., assumption of spatial uniformity) do not.

Figure 3. Prior ensemble means for (a) longwave and (b) shortwave fluxes during cloudy sky conditions near local solar noon on 19 August 2004. Ensemble standard deviations are shown for (c) longwave and (d) shortwave radiation.

[37] Further evidence of the advantage of using such a perturbation approach over a more simple, lumped perturbation approach is illustrated in Figure 4. The simple approach shown on the right-hand side utilized a multiplicative, lumped perturbation set (mean unity) to perturb the nominal radiative flux estimate of (1). This scalar perturbation set was derived using the coefficient of variation computed from the ensemble shown on the left-hand side of Figure 4 such that the ensemble-averaged, domain-averaged uncertainty of the radiative fluxes is identical in each of the two different techniques.

Figure 4. Example of physical realism in shortwave uncertainty at local solar noon during cloudy sky conditions on 29 September 2003. Ensemble means for the (a) cross-correlated, spatially correlated and (b) lumped, scalar approaches. Ensemble standard deviations for the (c) cross-correlated, spatially correlated and (d) lumped, scalar approaches.

[38] Upon inspection of Figure 4 one notices two distinct features: (1) the ensemble means are nearly identical, and (2) the uncertainty structure between the two is vastly different. The methods outlined in section 3 (i.e., left-hand side of Figure 4) yield small uncertainties in areas of clear sky, or in areas of cloudy sky with large optical thickness, while relatively large uncertainties are found in cloudy sky regions with low to moderate optical thicknesses. The simple perturbation approach (Figures 4b and 4d) yields an uncertainty structure where high uncertainty coincides with large fluxes and low uncertainty coincides with low fluxes. This uncertainty structure is inaccurate [Forman and Margulis, 2009]. Uncertainty, in general, increases in cloudy sky regions because cloud presence introduces variability (and uncertainty) into downwelling radiative fluxes. Careful perturbation of both clear and cloudy sky states within the radiative flux model yields this type of an uncertainty structure whereas a simple, lumped perturbation of the nominal radiative flux does not.

[39] Accounting for such uncertainty structure in radiative forcings should lead to better characterization of the key modes of variability in land surface states and fluxes. For example, consider a land surface model (LSM) that provides estimates of latent (LE) and sensible (H) heat fluxes based in part on LW and SW forcings. Further, assume one seeks to merge existing, satellite-based estimates of soil moisture with model-based estimates of soil moisture in an effort to improve upon those of model or measurements alone. It is important to first note that the success of this merging (or assimilation) will partly depend on the accuracy (and magnitude) of the computed covariance between the model states and the predicted measurements. Suppose the LSM has been forced with estimates of LW/SW fluxes that accurately characterize their uncertainty, which in turn should lead to more certain LE/H fluxes. As a result, LSM-based soil moisture should be more certain, and in turn, presumably contain a stronger correlation structure with the predicted satellite measurements thereby allowing for a larger (and presumably more accurate) update. Next, suppose the LSM has been forced with LW/SW fluxes that do not accurately represent the uncertainty (e.g., large uncertainty in clear sky regions and small uncertainty in cloudy sky regions). When the covariance is computed between the LSM-based estimates and the predicted satellite measurements, the structure will likely be smaller in magnitude, and therefore less effective at transferring information from the measurements into the model. Further, an improved a priori uncertainty structure enhances spatial correlations between ensemble replicates such that information transfer between measurements at different locations within the domain is improved. Accurate characterization of prior uncertainty requires more work up front, but creates greater opportunities during posterior conditioning.

5.3. SIRS Comparisons

[40] Quantitative investigation into ensemble performance began with a comparison of the ensemble mean, 〈y〉, against all available SIRS observations as discussed in section 5.3.1. In addition, the ensemble mean of the replicate-wise RMSD (i.e., RMSD computed one replicate at a time for all available SIRS observations), 〈RMSDy〉, is presented in section 5.3.2.

5.3.1. Ensemble Mean

[41] The ensemble mean, 〈y〉, was first colocated with the SIRS locations, at which time performance metrics (i.e., MD, RMSD, and correlation coefficient (ρ)) were computed on an hourly basis. Table 4 shows the average of the hourly values over the 14 month simulation period. The same procedure was conducted with the nominal simulation results [Forman and Margulis, 2009] and is included for reference. It is important to note, however, that these values differ from those presented by Forman and Margulis [2009] because no spatial aggregation procedure was conducted and because hourly statistics were not subsequently averaged across the diurnal cycle. As is shown, the ensemble mean contains a similar amount of MD and RMSD compared to the nominal simulation. RMSD increases due to the nonlinear behavior of both LW and SW radiation modules coupled with the effects of perturbations located in the elongated tail of the lognormal distributions and the effects associated with the use of a finite ensemble size. The low temporal correlation between LW estimates and SIRS observations is largely due to errors associated with the diurnal interpolation of temporally sparse air temperature and humidity measurements as well as errors in the satellite-based inputs [Forman and Margulis, 2009].

Table 4. Hourly Statistics of Mean Difference (MD), Root Mean Square Difference (RMSD), and Pearson Correlation Coefficient (ρ) Using Ensemble Mean Results Relative to Hourly SIRS Measurements for the Period of August 2003 Through September 2004 With Results From the Nominal Simulation Included for Reference (a)
Metric    MD (W m−2)    RMSD (W m−2)    ρ
LW-EM     1.4           29              0.59
LW-NS     0.4           25              0.61
SW-EM     2.3           46              0.98
SW-NS     −2            40              0.98
(a) EM, ensemble mean; NS, nominal simulation.
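The three metrics reported in Table 4 can be computed for a single hour as in the sketch below. The sign convention (estimate minus observation) and the sample values are assumptions made for illustration; the function name is hypothetical.

```python
import numpy as np

def hourly_metrics(est, obs):
    """Mean difference, root mean square difference, and Pearson correlation.

    est, obs: 1-D arrays of colocated estimates and observations.
    MD is taken as estimate minus observation (an assumed convention).
    """
    est = np.asarray(est, dtype=float)
    obs = np.asarray(obs, dtype=float)
    diff = est - obs
    md = diff.mean()
    rmsd = np.sqrt(np.mean(diff ** 2))
    rho = np.corrcoef(est, obs)[0, 1]
    return md, rmsd, rho

# Hypothetical ensemble-mean LW fluxes and observations (W m-2).
est = [300.0, 310.0, 290.0, 305.0]
obs = [298.0, 312.0, 291.0, 300.0]
md, rmsd, rho = hourly_metrics(est, obs)
```

Averaging such hourly values over the simulation period would give numbers comparable in kind to those in Table 4.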

[42] Closer inspection of the ensemble relative to SIRS observations shows the ability of the ensemble to encapsulate most measurements. Figure 5 shows an example of a single SIRS station during cloudy sky conditions on 19 August 2004 (keeping in mind that all stations for all hours were used in the statistical comparisons). The ensemble as a whole is able to capture the ground-based observations, excluding an early morning period affected by a gap in the satellite-based cloud inputs. Despite inherent scale differences between the point-scale observations and the satellite pixel-scale model output, the ensemble mean is able to accurately reproduce most of the observations during conditions with significant cloud cover. The nominal simulation (not shown) is similar to the ensemble mean and, in general, agrees with the time-integrated statistics shown in Table 4. Similar behavior is seen in Figure 6 during clear sky conditions, where the ensemble mean agrees well with SIRS and the ensemble as a whole is able to encapsulate the ground-based observations. The performance of the ensemble relative to independent, ground-based observations lends credibility to the ability of the individual replicates to reproduce the observations. The following sections highlight the performance of the ensemble as a whole and its ability to represent the inherent uncertainty.

Figure 5. Prior (a) longwave and (b) shortwave ensembles during cloudy sky conditions including corresponding ground-based measurements for a single SIRS station colocated in space and time on 19 August 2004 in local standard time (LST). Individual replicates are shown as solid lines, the ensemble mean is shown as a dashed line, and the SIRS measurements are shown as open circles. The gap near 0400 is due to a systematic gap in the satellite-derived cloud inputs.

Figure 6. Same as Figure 5 but for clear sky conditions on 15 September 2003.

5.3.2. Ensemble RMSD

[43] Computation of the ensemble-averaged RMSD, 〈RMSDy〉, begins with the replicate-wise RMSD at all available SIRS locations. Next, the ensemble-averaged RMSD is computed across all available replicates. This is done repeatedly as a function of time and subsequently compiled as a time series. Monthly averaged estimates of 〈RMSDy〉 are shown in Figure 7 for both the lumped, scalar (S) approach and the cross-correlated, spatially correlated (C) approach during both clear and cloudy sky conditions.
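The two-step calculation can be sketched for a single time as follows. The array layout (replicates by stations), the function name, and the values are assumptions for illustration.

```python
import numpy as np

def ensemble_averaged_rmsd(ensemble, obs):
    """Average of replicate-wise RMSD values.

    ensemble: array of shape (n_replicates, n_stations)
    obs:      array of shape (n_stations,)
    """
    ensemble = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Step 1: RMSD across stations, one replicate at a time.
    replicate_rmsd = np.sqrt(np.mean((ensemble - obs) ** 2, axis=1))
    # Step 2: arithmetic mean across replicates.
    return replicate_rmsd.mean()

# Hypothetical three-member ensemble at two stations (W m-2).
ensemble = np.array([[300.0, 310.0], [304.0, 306.0], [296.0, 314.0]])
obs = np.array([302.0, 308.0])
ens_avg_rmsd = ensemble_averaged_rmsd(ensemble, obs)

# RMSD of the ensemble mean, for comparison.
rmsd_of_mean = np.sqrt(np.mean((ensemble.mean(axis=0) - obs) ** 2))
```

Because RMSD behaves like a norm, the ensemble-averaged RMSD is never smaller than the RMSD of the ensemble mean, so it reflects ensemble spread as well as error in the mean.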

Figure 7. Monthly averaged, ensemble-averaged RMSD for (a) LW and (b) SW fluxes. Cross-correlated, spatially correlated (C) estimates are shown in black whereas lumped, scalar (S) estimates are shown in gray. August and September estimates include values from both 2003 and 2004.

[44] Inspection of Figure 7a shows relatively few differences between the two approaches when applied to LW fluxes. This is reasonable since both approaches produce increased uncertainty in cloudy sky regions. The cross-correlated, spatially correlated approach, in general, yields lower RMSD values during clear sky conditions and comparable RMSD during cloudy sky conditions. In addition, the correlated approach is able to encapsulate more observations based on a computed containing ratio (results not shown). The containing ratio, CR, is simply the number of observations that fall within the ensemble range normalized by the total number of observations, and is computed as $\mathrm{CR} = \frac{1}{N}\sum_{x,t} I[O(x, t)]$, where N is the total number of observations and I[O(x, t)] = 1 if ymin(x, t) ≤ O(x, t) ≤ ymax(x, t) or I[O(x, t)] = 0 otherwise. In general, the cross-correlated, spatially correlated approach applied to LW flux has a larger CR. That is, it captures more observations with less uncertainty during all sky conditions when compared against the lumped, scalar approach.
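The containing ratio follows directly from its definition. In the sketch below the function name, array layout, and values are hypothetical; each column holds the ensemble at one observation location and time.

```python
import numpy as np

def containing_ratio(ensemble, obs):
    """Fraction of observations that fall within the ensemble range.

    ensemble: array of shape (n_replicates, n_obs)
    obs:      array of shape (n_obs,)
    """
    ensemble = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)
    y_min = ensemble.min(axis=0)
    y_max = ensemble.max(axis=0)
    inside = (y_min <= obs) & (obs <= y_max)  # indicator I[O(x, t)]
    return inside.mean()

# Hypothetical three-member ensemble at four observation points (W m-2).
ensemble_cr = np.array([
    [300.0, 310.0, 290.0, 400.0],
    [304.0, 306.0, 295.0, 410.0],
    [296.0, 314.0, 285.0, 405.0],
])
obs_cr = np.array([302.0, 320.0, 285.0, 399.0])
cr = containing_ratio(ensemble_cr, obs_cr)
```

Here two of the four observations lie inside the ensemble range. A CR near one indicates that the ensemble brackets nearly all observations, although CR alone does not penalize an overly wide ensemble.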

[45] The SW statistics in Figure 7b suggest significant differences between the two approaches. The correlated approach is both more precise and accurate than the scalar approach. Further, the correlated approach is much more certain during clear sky conditions than when clouds are present. This agrees with the findings in section 5.2 where consideration of cross correlations and spatial correlations between different sources of uncertainty yields a more physically realistic uncertainty structure.

5.4. Rank Histograms

[46] Investigation of the ensemble mean and ensemble RMSD is useful in highlighting systematic errors and assessing general performance of the prior ensemble. However, these comparisons do not address the question of how adequately the ensemble captures some notion of the truth. Furthermore, questions remain as to the performance of individual replicates within the ensemble as well as performance of the ensemble as a whole. In order to examine these issues, we employ the rank histogram, which has been used in ensemble-based hydrologic studies [e.g., De Lannoy et al., 2006]. The reader is referred to Hamill [2001] for a thorough discussion on the interpretation of rank histograms.

[47] In essence, the use of rank histograms helps answer whether or not the spread of an ensemble represents the variability (and uncertainty) of a set of observations. Rank histograms diagnose whether an ensemble contains too much or too little spread, and whether or not it contains bias. Rank histograms can uncover the occurrence of a systematic bias; however, they do not have the ability to express the magnitude of that bias. In addition, it is important to note that an ensemble with a uniform rank histogram is no guarantee that the ensemble is an adequate representation of the truth [Hamill, 2001]. The rank histogram, however, is useful in diagnosing deficiencies in the ensemble, and when used in conjunction with other metrics, can lead toward the development of an ensemble that is well dispersed and maintains good agreement with independent observations.
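A rank histogram can be tallied by counting, for each observation, how many replicates fall below it. The sketch below is a minimal illustration on synthetic data: the function name is hypothetical, ties are ignored, and the deliberately underdispersed ensemble is constructed only to show the U shape discussed by Hamill [2001].

```python
import numpy as np

def rank_histogram(ensemble, obs):
    """Counts of observation ranks within the ensemble.

    ensemble: array of shape (n_replicates, n_obs)
    obs:      array of shape (n_obs,)
    Returns an integer array of length n_replicates + 1.
    """
    ensemble = np.asarray(ensemble)
    obs = np.asarray(obs)
    # Rank = number of replicates strictly below the observation (0..n_replicates).
    ranks = (ensemble < obs).sum(axis=0)
    return np.bincount(ranks, minlength=ensemble.shape[0] + 1)

rng = np.random.default_rng(1)
n_rep, n_obs = 9, 20_000
truth = rng.standard_normal(n_obs)                           # synthetic observations
narrow_ens = 0.5 * rng.standard_normal((n_rep, n_obs))       # too little spread
counts_under = rank_histogram(narrow_ens, truth)
```

For this synthetic case the outermost bins of `counts_under` are far more populated than the central bins, which is the signature of an underdispersed ensemble.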

[48] Figure 8 shows the rank histogram for the prior ensemble during all sky conditions at 23 observation locations over the course of the 14 month simulation. Hourly SIRS observations were used throughout the course of each day in the LW comparison and only during daylight hours in the SW comparison. Rank histograms were generated for shorter time periods such as daily and seasonal time scales (results not shown) that, in general, suggest the LW ensemble contains a positive bias during subzero temperatures and that the SW ensemble contains a small negative bias during clear sky conditions. The former results from a significant positive bias in the satellite-derived estimates of reference level air temperature and dew point temperature during the winter months (results not shown) despite a relatively small, negative bias found for the entire 14 month simulation period. The latter partially results from a positive bias in the MODIS water vapor product [Ferrare et al., 2002], which causes increased water vapor absorption of shortwave radiation. Separate rank histograms were generated for clear sky and cloudy sky conditions, too, but are not shown because the ensemble behavior, in general, is similar to that shown in Figure 8.

Figure 8. Rank histograms for prior ensembles of (a) longwave and (b) shortwave radiative fluxes. For graphical clarity, each histogram bar indicates bin counts from three consecutive rank histogram bins (e.g., rank 2 is the average of ranks 1–3).

[49] As shown in Figure 8, both LW and SW ensembles are underdispersed as expressed by the U-shaped features. That is, neither ensemble contains enough dispersion to always encapsulate all of the observations. This underdispersion is partially related to the systematic biases discussed in the previous paragraph. In addition, some underdispersion and bias result from the diurnal interpolation procedure employed during LW estimation [Forman and Margulis, 2009]. Underestimation, and hence underdispersion, of the upper ranks of the SW radiation estimates occurs primarily during clear sky conditions and is typically <5 W m−2, which is a relatively small flux. However, this consistent behavior yields the asymmetric form of the rank histogram found in Figure 8b, suggesting an underperforming SW ensemble. Underdispersion of the lower ranks of the SW ensemble typically occurs in the presence of optically thick cloud regions and is generally on the order of 10–100 W m−2 (see Figure 5b near local solar noon for an example). The SW prior ensemble, in general, agrees well with the SIRS observations and encapsulates (or comes close to encapsulating) the observations during the vast majority of conditions. The LW ensemble performs less satisfactorily than the SW ensemble, but in general, performs well at capturing the observations except during the arrival of large-scale cold fronts, which is related to deficiencies in the diurnal interpolation routine [Forman and Margulis, 2009] as well as errors in the satellite-based inputs. Despite the presence of underdispersion in the prior ensembles, the rank histograms in conjunction with containing ratios suggest both LW and SW ensembles capture a majority of the observations across a range of seasonal cycles, diurnal cycles, and cloud cover conditions while containing a representative amount of variability and uncertainty within the ensemble.

5.5. Conditional Quantile Plots

[50] In addition to scalar rank histograms, conditional quantile plots are a commonly used verification measure for continuous variables. Quantile plots graphically display certain aspects of the joint distribution of the estimated variable relative to observations, which can help with diagnosis of model errors and biases [Wilks, 1995]. A quantile plot involves comparing different percentiles of an ensemble (e.g., upper decile, lower decile) against observations colocated in space and time. The conditional distributions of the observations given the ensemble estimates are represented in terms of selected quantiles in comparison to the 1:1 diagonal line representing perfect estimates [Wilks, 1995]. The histograms in the lower portion of Figures 9a and 9b represent the frequency of the modeled estimates.
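The quantities behind a conditional quantile plot can be sketched by binning the estimates and summarizing the observations within each bin. This is an illustrative sketch on synthetic data, not the procedure used to produce the figures: the function name, bin edges, and chosen quantiles are assumptions, and the minimum bin population of 0.25% is borrowed from the quantile plot caption.

```python
import numpy as np

def conditional_quantiles(est, obs, bin_edges, probs=(0.1, 0.5, 0.9), min_frac=0.0025):
    """Quantiles of observations conditioned on binned estimates.

    Returns (quantiles, counts): quantiles has shape (n_bins, len(probs)) with NaN
    for sparsely populated bins; counts gives the frequency of estimates per bin.
    """
    est = np.asarray(est, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n_bins = len(bin_edges) - 1
    idx = np.digitize(est, bin_edges) - 1  # bin index of each estimate
    quantiles = np.full((n_bins, len(probs)), np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        sel = obs[idx == b]
        counts[b] = sel.size
        if sel.size >= max(1, min_frac * est.size):
            quantiles[b] = np.quantile(sel, probs)
    return quantiles, counts

# Synthetic, unbiased example: observations scatter about the estimates (W m-2).
rng = np.random.default_rng(2)
est_flux = rng.uniform(0.0, 1000.0, 50_000)
obs_flux = est_flux + rng.normal(0.0, 20.0, est_flux.size)
edges = np.linspace(0.0, 1000.0, 11)
q, n_per_bin = conditional_quantiles(est_flux, obs_flux, edges)
bin_centers = 0.5 * (edges[:-1] + edges[1:])
```

Plotting the columns of `q` against `bin_centers` alongside the 1:1 line, with `n_per_bin` as a histogram underneath, would give a plot of the kind described. For this unbiased synthetic case the conditional medians fall close to the 1:1 line.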

[51] Figure 9 shows the quantile plots for the prior LW and SW ensembles over the 14 month simulation period. Quantile plots were produced for short periods (not shown) and reaffirm many of the model biases discussed in section 5.4 on rank histograms. Figure 9 shows the SW ensemble to outperform the LW ensemble. In particular, the SW ensemble agrees well with the SW observations over the range of observed fluxes. The LW ensemble, on the other hand, indicates the presence of systematic biases. Namely, the prior LW ensemble tends to overestimate small LW fluxes and underestimate large LW fluxes. This behavior helps explain the overpopulated ranks of the two extremes in Figure 8a. In general, the LW ensemble is able to encapsulate the majority of observed values, but consistently overestimates values at the low end of the range. Given issues related to point-scale versus satellite pixel-scale differences coupled with observation errors, however, these findings suggest the ability of the ensemble-based scheme to reasonably reproduce independent, ground-based observations of both downwelling LW and SW flux.

Figure 9. Quantile plots of prior ensembles for downwelling (a) longwave and (b) shortwave radiative flux. Quantile bounds are only provided for bins that contain a minimum of 0.25% of the total population size.

6. Conclusions

[52] The ensemble-based scheme effectively incorporates uncertainties into downwelling fluxes of broadband LW and SW radiation. The resulting ensemble of prior (unconditioned) simulations implicitly contains uncertainties associated with a covariance structure derived from satellite-based inputs as well as parameterizations used within the model. An advantage of perturbing individual inputs and/or parameters is that the physical basis of such uncertainties remains transparent. This transparency, coupled with the flexibility of ensemble techniques at representing model error [Dunne and Entekhabi, 2006], provides a useful tool to account for uncertainties. Furthermore, such an approach yields a more realistic uncertainty structure relative to a simple, lumped perturbation approach.

[53] As demonstrated, the prior ensemble is able to encapsulate ground-based observations during a variety of cloud cover and atmospheric conditions. Computed error metrics show the ensemble mean to contain more uncertainty than the nominal simulation but without significant degradation to model bias. Namely, the LW RMSD for the nominal and ensemble mean simulations was 25 W m−2 and 29 W m−2, respectively, while the SW RMSD was 40 W m−2 and 46 W m−2, respectively. Model biases in the prior ensemble are comparable to those found in the nominal simulation, and were shown to be within reasonable limits upon inspection of rank histograms and quantile plots. The amount of uncertainty in the ensemble as expressed in 〈RMSDy〉 (Figure 7) is greater than the natural variability of the observations [Li et al., 2005]. However, the prior ensemble is designed to contain too much uncertainty rather than too little uncertainty, which is a conservative yet advantageous approach [Crow and Van Loon, 2006] employed in the ensemble-based data assimilation scheme presented in part 2 of this study [Forman and Margulis, 2010].

[54] Although the postulated uncertainty structure used in the ensemble framework is shown to reasonably represent the uncertainties associated with downwelling radiative flux estimation, it is important to highlight the shortcomings of the approach. For example, the temporal correlation assumptions (i.e., perfectly correlated during the diurnal cycle and uncorrelated day to day) are not appropriate for all atmospheric conditions. When large-scale cloud systems are present, day-to-day temporal correlations in radiative flux uncertainties may exist due to the possibility of day-to-day correlations in cloud structure and their subsequent effect on radiative processes. Conversely, the assumption of perfect temporal correlations during a diurnal cycle may not be appropriate in the presence of short-term cloud systems (e.g., convective cloud formation). During situations with convective cloud formation, temporal correlations in radiative flux uncertainties may exist over shorter time scales (e.g., hours) associated with short-term correlations in cloud structure. The assumption of a temporal correlation structure within a single diurnal cycle is reasonable given the diurnal nature of radiative fluxes (i.e., SW radiative fluxes decorrelate between sunset and sunrise); however, the representation of an alternative uncertainty formulation during certain atmospheric conditions may benefit the ensemble-based radiative flux framework. In addition, the assumed lognormal structure of the postulated uncertainties may not always be appropriate. Alternative perturbation models could be employed to better represent non-Gaussian uncertainties. However, the lognormal formulation allows for inclusion of first and second moment information of the statistical distributions while maintaining tractability by ignoring higher-order moments. Finally, the assumption of unbiased inputs may be incorrect for certain inputs during certain times of the year. This assumption could be relaxed through a bias correction, which could help address deficiencies in overestimated longwave fluxes during subzero temperature conditions resulting from a positive temperature bias as well as underestimated shortwave fluxes during clear sky conditions resulting from a positive water vapor bias.

[55] The findings presented in this study demonstrate an effective technique for incorporating uncertainties within a prior ensemble of downwelling radiative fluxes while accounting for spatial and cross correlations between cloud, atmospheric, and land surface states that modulate radiative fluxes. The intended use of the prior ensemble is for inclusion into an ensemble-based data assimilation scheme. Measurements derived from more sophisticated retrieval algorithms are then used to condition the prior ensemble in order to yield an improved posterior that more adequately represents the radiative flux while simultaneously containing reduced uncertainty from that of the prior. This work is discussed in part 2 [Forman and Margulis, 2010]. It is paramount that the prior ensemble receives sufficient attention so that the ensemble representation of the radiative fluxes contains a reasonable amount of uncertainty that adequately captures the true variability of the fluxes. Improper representation of the prior ensemble, including underestimation of the ensemble size and/or underrepresentation of the prior uncertainties, can have a deleterious effect on the results of an ensemble-based data assimilation framework.


[60] Funding was provided by the NASA Earth System Science Fellowship (contract NNX07AN64H) and NASA grants NNG04GO74G and NNG05GE58G. We thank Xavier Emery for help with TBCOSIM during the generation of cross-correlated random fields as well as the Minnis Research Group at the NASA Langley Research Center for answers to our questions regarding VISST. We thank three anonymous reviewers for their in-depth comments and suggestions that significantly helped improve the quality of this manuscript.

    Appendix A: Shortwave and Longwave Formulations

    [56] The SW module employs a single column plane-parallel atmosphere conceptualization at each pixel where the presence or absence of clouds is based on a high-resolution cloud product. The formulation describing downwelling SW radiation may be written concisely as
    equation image
    where RSW(x, t) is the downwelling broadband SW flux at the Earth's surface as a function of space (x) and time (t), image is the top-of-atmosphere (TOA) incoming flux, τsw is the composite direct beam SW transmissivity, (1 + diff) represents the forward/backscattered component, A is the blue-sky (diffuse plus direct) broadband surface albedo, αdiff is the diffuse contribution, rc is the cloud reflectance, and ac is the cloud absorptance. In the absence of clouds, rc and ac are both equal to zero. Though not explicitly shown, all expressions on the right-hand side of (10) are dependent on space and time.

    [57] The SW formulation used in this study is the same as that used by Forman and Margulis [2009] except that a modified parameterization for Rayleigh scattering is used. In order to reduce negative bias during clear sky conditions, the modified Rayleigh scattering coefficient, which is explicit in τsw, is computed as 0.0261 − 0.0625 loge(θ) where θ is the solar zenith angle.

    [58] Downwelling LW radiation is formulated as a simple, physically based expression that explicitly accounts for cloud variability. The impact of clouds on downwelling radiative processes is assumed to behave additively such that the downwelling radiation emitted from the cloud base is attenuated by the effective transmissivity of the subcloud layer [Diak et al., 2000]. Hence, the formulation describing downwelling LW radiation may be written concisely as
    $$R_{LW}(x, t) = \varepsilon_a \sigma T_a^4 + (1 - \varepsilon_a)\,\varepsilon_c \sigma T_c^4 \qquad (11)$$
    where RLW(x, t) is the downwelling broadband LW flux at the Earth's surface, σ is the Stefan-Boltzmann constant, ɛa is the effective atmospheric emissivity, Ta is the reference-level air temperature, (1 − ɛa) is the effective transmissivity of the subcloud layer, ɛc is the cloud emissivity, and Tc is the cloud base temperature. In the absence of clouds, both ɛc and Tc are equal to zero. Again, though not explicitly shown, all expressions on the right-hand side of (11) are dependent on space and time.

    [59] The LW formulation used in this study is the same as that used by Forman and Margulis [2009] except that an alternative parameterization for effective atmospheric emissivity is used. In an attempt to improve downwelling LW estimation during subzero temperature conditions, the methods of Satterlund [1979] rather than Idso [1981] were employed. Namely, atmospheric emissivity was computed as ɛa = 1.08(1 − exp(−ea^(Ta/2016))), that is, with ea raised to the power Ta/2016, where ea is the reference-level vapor pressure in mbar and Ta is in K.
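    The LW module described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: it assumes the additive form implied by the term definitions (clear sky emission plus cloud base emission attenuated by the subcloud transmissivity), it assumes the Satterlund [1979] exponent is the vapor pressure raised to the power Ta/2016, and the function names and input values are hypothetical.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant (W m-2 K-4)

def satterlund_emissivity(e_a_mbar, t_a_kelvin):
    """Effective atmospheric emissivity, assumed Satterlund [1979] form."""
    return 1.08 * (1.0 - np.exp(-(e_a_mbar ** (t_a_kelvin / 2016.0))))

def downwelling_lw(e_a_mbar, t_a_kelvin, eps_c=0.0, t_c_kelvin=0.0):
    """Downwelling LW flux (W m-2) with an additive cloud contribution.

    With eps_c = 0 and t_c_kelvin = 0 the expression reduces to clear sky emission.
    """
    eps_a = satterlund_emissivity(e_a_mbar, t_a_kelvin)
    clear_sky = eps_a * SIGMA * t_a_kelvin ** 4
    cloud = (1.0 - eps_a) * eps_c * SIGMA * t_c_kelvin ** 4
    return clear_sky + cloud

# Hypothetical inputs: 10 mbar vapor pressure, 288 K air, 280 K cloud base.
eps_a = satterlund_emissivity(10.0, 288.0)
lw_clear = downwelling_lw(10.0, 288.0)
lw_cloudy = downwelling_lw(10.0, 288.0, eps_c=1.0, t_c_kelvin=280.0)
```

    In this form the cloud term can only add to the clear sky flux, and its weight shrinks as the atmospheric emissivity approaches one, which is consistent with the subcloud layer attenuating the cloud base emission.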