Volume 118, Issue 14 pp. 7489-7504
Regular Article

Impacts of snow cover fraction data assimilation on modeled energy and moisture budgets

Kristi R. Arsenault

SAIC, Inc., Beltsville, Maryland, USA

NASA Goddard Space Flight Center, Greenbelt, Maryland, USA

Corresponding author: K. R. Arsenault, NASA/Goddard Space Flight Center, Greenbelt, MD 20771, USA. ([email protected])

Paul R. Houser

Department of Geography and GeoInformation Science, George Mason University, Fairfax, Virginia, USA

Gabriëlle J. M. De Lannoy

NASA Goddard Space Flight Center, Greenbelt, Maryland, USA

USRA, Columbia, MD, USA

Paul A. Dirmeyer

Department of Atmospheric, Oceanic and Earth Sciences, George Mason University, Fairfax, Virginia, USA

Center for Ocean-Land-Atmosphere Studies, Calverton, Maryland, USA

First published: 05 June 2013

Abstract

[1] Two data assimilation (DA) methods, a simple rule-based direct insertion (DI) approach and a one-dimensional ensemble Kalman filter (EnKF) method, are evaluated by assimilating snow cover fraction observations into the Community Land surface Model. The ensemble perturbation needed for the EnKF resulted in negative snowpack biases. Therefore, a correction is made to the ensemble bias using an approach that constrains the ensemble forecasts with a single unperturbed deterministic LSM run. This is shown to improve the final snow state analyses. The EnKF method produces slightly better results in higher elevation locations, whereas results indicate that the DI method has a performance advantage in lower elevation regions. In addition, the two DA methods are evaluated in terms of their overall impacts on the other land surface state variables (e.g., soil moisture) and fluxes (e.g., latent heat flux). The EnKF method is shown to have less impact overall than the DI method and causes less distortion of the hydrological budget. However, the land surface model adjusts more slowly to the smaller EnKF increments, which leads to smaller but slightly more persistent moisture budget errors than found with the DI updates. The DI method can remove almost instantly much of the modeled snowpack, but this also allows the model system to quickly revert to hydrological balance for nonsnowpack conditions.

Key Points

  • Snow cover assimilated via direct insertion (DI) and ensemble Kalman filter (EnKF)
  • EnKF method performed better in higher elevations and DI in lower elevations
  • Versus DI, EnKF leads to smaller model impacts but more persistent hydrologic budget errors

1 Introduction

[2] Snow cover fraction (SCF) is one of the key state variables that can affect much of a land surface model's (LSM's) geophysical fields and subsequent updates to, for instance, a coupled atmospheric model [e.g., Jin and Wen, 2012]. Snow cover fraction affects not only surface albedo and net shortwave radiation but almost all subsequent model energy and temperature calculations. Despite being such an important land surface condition, it is parameterized in many models as a diagnostic variable, derived from snowpack column state such as snow depth or snow water equivalent (SWE). Thus, the model's ability to represent snow cover conditions is dependent on the accuracy of the snowpack forecasts and the accuracy of the function applied to convert the snowpack state variables into snow cover estimates. To improve the LSM's snow cover estimates, data assimilation (DA) techniques and a substantial record of snow cover observations can be used. Through assimilation, the combined model and observations should produce more consistent and complete results than either individually.

[3] Several different snow cover data assimilation methods have been developed and applied to LSMs. It is important to understand how these different methods compare, since many are being used in different land surface reanalysis projects [e.g., Rodell et al., 2004], operationally based integrated snow products [e.g., Carroll et al., 1999], research-grade short-range weather forecasts [e.g., Drusch et al., 2004], and long-range based climate reanalyses [e.g., Khan et al., 2008; Saha et al., 2010]. Currently, snow DA approaches are mostly applied in off-line simulations but are beginning to be used more in coupled land-atmosphere simulations [e.g., Jin and Miller, 2011; Xu and Dirmeyer, 2011; de Rosnay et al., 2013]. Also, for snow assimilation, most reanalysis and operational products simply use direct insertion (DI) approaches [e.g., Rodell et al., 2004; Saha et al., 2010] or other single-member methods, like optimum interpolation [e.g., Brasnett, 1999; Liston and Hiemstra, 2008] and Cressman analysis [e.g., Drusch et al., 2004]. More complex assimilation methods, like Kalman filter (KF)-based approaches, have been used mostly in research mode [e.g., Sun et al., 2004; Slater and Clark, 2006; Andreadis and Lettenmaier, 2006; De Lannoy et al., 2012], but much interest in the more complex methods currently exists from both operational and research centers.

[4] This study's main objective is to evaluate two DA methods of different complexity and also to examine their impact on the model's states and fluxes. Such an evaluation is important for obtaining an improved snow state analysis, since snow cover can impact not only albedo and energy fluxes but also the larger global climate system [e.g., Yang et al., 2001; Levis et al., 2007; Euskirchen et al., 2007] through radiative and snowmelt-soil moisture connections [e.g., Shinoda, 2001]. Most snow data assimilation studies have examined the effect of assimilating observations on SWE and snow depth estimates [e.g., Andreadis and Lettenmaier, 2006; Slater and Clark, 2006], while only a few have examined the impacts on hydrologic and energy fluxes, albeit without much observational data [e.g., Dong et al., 2007; Zaitchik and Rodell, 2009; De Lannoy et al., 2012]. In this study, we examine the impacts of assimilated snow cover observations on the model's states and fluxes, and how they differ with the complexity of the data assimilation method.

[5] In the following sections, two different data assimilation algorithm types are described and applied: the Rodell and Houser [2004] (hereinafter RH04) rule-based DI approach and the ensemble KF (EnKF) [e.g., Reichle et al., 2002a, 2002b]. The DI algorithm is considered a much simpler approach that does not typically take into account model or observation uncertainty. For the EnKF algorithm, dynamic model and observation error structures are predicted and accounted for by an ensemble of realizations propagated in time to derive a best analysis estimate. However, the EnKF approach relies on unbiased Gaussian error distributions, while in reality most earth system variables are non-Gaussian and biased. Each approach possesses advantages and disadvantages, which will be addressed in the subsequent sections.

2 Model and Observation Background

2.1 Model Background

[6] The Community Land Model, version 2 (CLM2) [Bonan et al., 2002; Dai et al., 2003; Oleson et al., 2004] models heat and moisture states and fluxes for each gridcell and utilizes plant functional types (PFTs) to define vegetation cover. It uses leaf and stem area indices (LAI and SAI) as the main time-varying vegetation parameters. The soil column consists of ten soil layers with the thinnest layers (e.g., layers 1 and 2 thickness equivalent to 7 and 28 mm, respectively) near the column's top. The water balance equations are mass conserving. For the atmospheric surface layer, vertical kinematic fluxes of momentum and sensible and latent heat are derived using the Monin-Obukhov similarity theory. Bonan et al. [2002] and Oleson et al. [2004] discuss CLM2's governing soil and flux equations in further detail.

[7] CLM2's five-layer snow scheme accounts for layer-based liquid, ice, and heat energy. Snow compaction is accounted for via melting processes, destructive metamorphism, etc. [Anderson, 1976]. At each time step, snow layers can vary from one to five, depending on melting or accumulating conditions and layer thicknesses [Jordan, 1991; Oleson et al., 2004]. The snowpack's hydrology incorporates liquid and frozen precipitation, evapotranspiration, and surface and subsurface drainage. An upper snow accumulation limit is set mainly to 1000 mm, keeping snow from accumulating unrealistically for coarser gridcell sizes [Oleson et al., 2004]. Direct and diffuse beam ground albedos use snow and soil albedo combinations for net shortwave radiation calculations [Dickinson et al., 1993]. Direct beam and diffuse snow albedos depend on solar zenith angle, while diffuse also relies on snow aging properties (e.g., grain size and soot increases) and new liquid-equivalent snowfall of 10 mm [e.g., Dickinson et al., 1993].

[8] In CLM2, snow cover is a diagnostic variable, dependent mainly on total snow depth (see Figure 1). The snow fraction value is then used in the ground albedo calculations and subsequent energy and moisture flux predictions. Snow cover fraction can also be modified when snow-burying effects, like on vegetation, are accounted for in the model [Wang and Zeng, 2009].

Figure 1. The CLM2 fraction of snow-covered ground as a function of snow depth.

[9] CLM2 is integrated forward in time on a regular 0.01° spatial grid using the Land Information System [LIS; Kumar et al., 2006]. Within LIS, CLM's PFT classification scheme and parameters are mapped to the University of Maryland (UMD) classification, as done for the Global Land Data Assimilation System (GLDAS) [Rodell et al., 2004]; the same mapping is used in this study. CLM2 is run off line, uncoupled from an atmospheric model. Though feedbacks between the land and atmosphere are not present in this mode, an advantage is the ability to constrain and more easily quantify changes arising from the forcing inputs that drive the model, as well as any impacts of assimilating observations.

[10] CLM's snow parameterizations have been evaluated at point-scales [e.g., Feng et al., 2008; Rutter et al., 2009] and have been shown to simulate snow processes relatively well for different parts of the world [e.g., Dai et al., 2003; Wang and Zeng, 2009; Rutter et al., 2009]. For this study, some minor adjustments were made to reduce a snowmelt lag found in CLM2, which include (1) turning off the vertical vegetation burying effects (e.g., grassland points cannot be buried by the modeled snowpack), which do not account for blowing-snow impacts and cause higher-than-normal surface albedo and low net shortwave radiation, surface temperatures, and energy fluxes; (2) increasing the SWE cap of 1000 mm to 1500 mm for the Washington region, where SWE can greatly exceed 1000 mm; and (3) setting the hard-coded 800 mm SWE limit to the SWE cap to allow the snow to age when SWE exceeds 800 mm. Though the latest CLM version is 4.0, during the time that this research was conducted CLM2 was the only version available in LIS. The choice of LIS was made to take advantage of its existing data assimilation and meteorological forcing driver options. This work focuses on the role of snow cover fraction observations and the value they add to LSM analyses through different data assimilation methods.

2.2 Snow Cover Fraction Observations

[11] Terra's Moderate Resolution Imaging Spectroradiometer (MODIS) Level 3, Collection 5, 500 m daily snow cover fraction product (MOD10A1) [Salomonson and Appel, 2004] is used in this study. From their error analysis, Salomonson and Appel [2004, 2006] estimated a mean absolute error of less than 10% for the snow cover fraction range (0–100%). The SCF may be underestimated in dense forests [Salomonson and Appel, 2004] and for patchy snow cover conditions (e.g., <20%) [Déry et al., 2005]. The daily 500 m sinusoidal projection products are aggregated to the predefined LIS 0.01° geographic coordinate system grid, following the approach in De Lannoy et al. [2012]. Only the Terra MODIS SCF product, which has about a 10:00 am local overpass time, is used for the data assimilation experiments.

2.3 Snow Observation Validation Data Sets

[12] The SWE and snow depth validation data sets include the U.S. Department of Agriculture's (USDA) Natural Resources Conservation Service (NRCS) SNOTEL [National Climatic Data Center, 2002] and the National Weather Service Cooperative Observer Program (COOP) network daily observations, for the period of March 2000 to September 2010. The daily SNOTEL SWE data are based on snow pillow measurements at midnight local standard time. The SNOTEL sites are usually located in higher elevation regions, but this data set is one of the few available to validate model estimates in mountainous areas [Pan et al., 2003]. The USDA NRCS performs preliminary data quality assurance checks, but the data typically require additional screening for outlier values, as performed for this study [Serreze et al., 1999]. COOP snow depth data include manual daily measurements from weather stations in lower elevation locations.

2.4 Meteorological Forcing and Parameter Data Sets

[13] The meteorological forcing used is the 0.125° resolution North American Land Data Assimilation System (NLDAS) data set, which includes model-derived (i.e., Eta-based Data Assimilation System analyses) and observed precipitation and downward shortwave radiation products [Cosgrove et al., 2003]. The blended ground-based observation-model precipitation product is used. Within LIS, the NLDAS fields (i.e., temperature, specific humidity, downward longwave radiation, and surface pressure) are downscaled and adjusted to the in situ SNOTEL and COOP elevation values using the mean environmental lapse rate (6.5 K/km) and the hydrostatic relationship. The Terra MODIS (collection 4) UMD land cover classification product was used [Friedl et al., 2002]. For LAI, monthly climatologies (based on years 2001–2006) of the Terra MODIS LAI product were used in the simulations. For soils, the Pennsylvania State University-USDA State Soil Geographic Database (STATSGO) soil layered data sets were included [Miller and White, 1998]. Also, CLM requires a soil color map [Dickinson et al., 1993; Zeng et al., 2002].
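As an illustration of the elevation adjustment described above, the following minimal sketch shifts air temperature with the 6.5 K/km lapse rate and then adjusts surface pressure hydrostatically. The function name, the constants other than the lapse rate, and the use of a layer-mean temperature are assumptions made for this example; the actual LIS routines may differ and also adjust humidity and longwave radiation, which are not shown.

```python
import math

LAPSE_RATE = 6.5e-3  # K per m, mean environmental lapse rate quoted in the text
G = 9.81             # m s^-2, gravitational acceleration (assumed value)
R_DRY = 287.0        # J kg^-1 K^-1, gas constant for dry air (assumed value)


def adjust_to_elevation(temp_k, pres_pa, elev_forcing_m, elev_site_m):
    """Shift temperature and pressure from forcing-grid to site elevation.

    Hypothetical helper: lapse-rate correction for temperature, then a
    hydrostatic (hypsometric) correction for pressure using the mean of
    the original and corrected temperatures.
    """
    dz = elev_site_m - elev_forcing_m
    temp_site = temp_k - LAPSE_RATE * dz
    t_mean = 0.5 * (temp_k + temp_site)
    pres_site = pres_pa * math.exp(-G * dz / (R_DRY * t_mean))
    return temp_site, pres_site


# Example: a station 1000 m above the forcing-grid elevation.
t_site, p_site = adjust_to_elevation(275.0, 80000.0, 2000.0, 3000.0)
```

A station above the forcing-grid elevation ends up colder and at lower pressure, which is the direction of adjustment needed for the higher-elevation SNOTEL sites.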

2.5 Study Areas

[14] Two regions in Washington (WA) and Colorado (CO) were selected because they reflect different snow and vegetation conditions with topographic variability. The areas have lower left (45.005°N, 121.995°W) and upper right (48.995°N, 116.995°W) bounding coordinates for Washington, and lower left (37.005°N, 108.995°W) and upper right (40.995°N, 102.005°W) coordinates for Colorado. Figure 2 highlights these study areas and each region's SNOTEL and COOP network sites used in this work. After applying additional quality-control checks, 56 and 98 SNOTEL stations (black triangle) are identified as being within the “WA” and “CO” domains, respectively, and 75 and 152 COOP stations (gray plus sign) are within the WA and CO domains, respectively. COOP sites were selected based on how static the site location was (i.e., not moved more than once in the given 11 years).

Figure 2. Regional elevation maps of Washington and Colorado domains with a larger Western U.S. regional view (lower right figure with boxes highlighting the two featured regions). SNOTEL (black triangle) and NWS COOP (gray plus sign) sites are also shown. Source of the elevation maps is the National Elevation Dataset (NED) for the top two maps, and lower right map source is the GTOPO-1 km.

3 Data Assimilation Methods and Experimental Design

3.1 DI Method

[15] For DI, the observations are treated as being perfect, thus placing most of the weight on the observations. Many snow cover-based DI studies do produce better analyses than either the observations or the model separately [e.g., Rodell and Houser, 2004; Zaitchik and Rodell, 2009; Tang and Lettenmaier, 2010]. As modeling domains become larger and state vectors include more variables, the main advantage of the DI methods is that they are computationally simpler and more efficient.

[16] The rule-based RH04 DI approach applied here is used in GLDAS [Rodell et al., 2004] reanalysis snow products. Though this algorithm was originally applied for a coarser scale, it is hypothesized that the rules, which were designed to limit erroneous observations yet retain enough useful information for assimilation, may also apply to finer scales, such as 0.01°. The RH04 DI algorithm includes two main rules for when a MODIS SCF observation is available at a station point and at assimilation time for updating the model's SWE and snow depth (Snod) amounts:
\[ \text{If } \mathrm{SCF}_{obs} > 40\% \text{ and } \mathrm{SWE}_{model} = 0~\text{mm}: \quad \mathrm{SWE} = 5~\text{mm}, \quad \mathrm{Snod} = \frac{5~\text{mm}}{\rho_{snow,bulk}} \tag{1} \]
where $\rho_{snow,bulk}$ is the CLM2 bulk snow density with a value of 250 kg/m³. The 5 mm of SWE (equivalent to 5 kg/m²) and the resulting 0.02 m of snow depth are used to initiate snowpack growth.
\[ \text{If } \mathrm{SCF}_{obs} < 10\% \text{ and } \mathrm{SWE}_{model} > 0~\text{mm}: \quad \mathrm{SWE} = 0~\text{mm}, \quad \mathrm{Snod} = 0~\text{mm} \tag{2} \]
where SWE and snow depth are reduced to 0 mm when a MODIS SCF value indicates very little to no snow presence. If neither condition is satisfied, then no update to the model occurs. If the gridcell's cloud cover is greater than 50%, then the gridcell is treated as cloud covered and no update occurs; this applies to both data assimilation methods. The constant bulk snow density value of 250 kg/m³ is used even though it does not capture the low densities of “fresh” snow or the high densities associated with older or melting snow. When an assimilation update occurs for the two total column snowpack state variables (SWE and Snod), the information is then propagated to each individual snow layer variable (liquid content from SWE and depth of each layer from snow depth). Then, through physical process routines similar to those in CLM2, the layered state variables are further updated through snow compaction and the combination and division of snow layers to make the analysis physically consistent. This is done in the same way for the DI and EnKF methods.
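The two rules and the cloud screening described above can be sketched as a single update function. This is an illustrative reading of the rule-based approach, not the GLDAS or LIS code: the function name and argument layout are hypothetical, and the SCF thresholds (`scf_add`, `scf_remove`) default to 40% and 10% as assumed values that should be checked against equations (1) and (2).

```python
RHO_SNOW_BULK = 250.0  # kg/m^3, constant bulk snow density from the text
SWE_INIT_MM = 5.0      # mm, SWE used to initiate a snowpack


def direct_insertion(scf_obs, cloud_pct, swe_mm, snod_m,
                     scf_add=40.0, scf_remove=10.0):
    """Return updated (SWE in mm, snow depth in m) for one gridcell.

    scf_obs is the observed SCF in percent, or None when no observation
    is available. Threshold defaults are assumptions for this sketch.
    """
    # No observation, or a cloud-covered gridcell: leave the model untouched.
    if scf_obs is None or cloud_pct > 50.0:
        return swe_mm, snod_m
    # Rule 1: observation shows snow, model has none -> add a thin snowpack.
    if scf_obs > scf_add and swe_mm == 0.0:
        # 1 mm of SWE is 1 kg/m^2, so mm / (kg/m^3) gives depth in m.
        return SWE_INIT_MM, SWE_INIT_MM / RHO_SNOW_BULK
    # Rule 2: observation shows (almost) no snow, model has snow -> remove it.
    if scf_obs < scf_remove and swe_mm > 0.0:
        return 0.0, 0.0
    # Neither rule applies: no update.
    return swe_mm, snod_m
```

For example, `direct_insertion(80.0, 10.0, 0.0, 0.0)` adds the 5 mm, 0.02 m starter snowpack, while the same observation over an existing snowpack leaves the model state unchanged.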

3.2 1-D EnKF

[17] The EnKF applies Monte-Carlo type simulations to the KF, allowing randomly generated error samples to represent the model and observational uncertainties [e.g., Evensen, 1994]. One major assumption is that the perturbed forecast distribution is Gaussian and unbiased. However, these conditions are often not met and leave the filter operation suboptimal.

[18] The EnKF method applied in LIS [e.g., Kumar et al., 2008] adheres to the works of Reichle et al. [2002a, 2002b] and Reichle et al. [2009]. When observations become available, the assimilation update step is initiated by estimating the model analysis, which reflects model forecast inputs, observational inputs, and error covariance matrices. The analysis equation is expressed as:
\[ x^{a}_{t,i} = x^{f}_{t,i} + K_t \left[ y^{o}_{t,i} - h_t\left(x^{f}_{t,i}\right) \right] \tag{3} \]
where $x^{a}_{t,i}$ represents the analysis estimate for ensemble member $i$, and $K_t$ is the Kalman gain matrix. The vector function $h_t(\cdot)$, known as the observation operator, transforms the forecast state vector, $x^{f}_{t,i}$, into a predicted observation vector, $y^{f}_{t,i}$, equivalent to the actual observation set, $y^{o}_{t,i}$. Here, the observation set is perturbed (for each ensemble member $i$), assuming a normal distribution with a mean of 0 and a standard deviation, $\sigma$, as described below.

[19] Since the observation type being assimilated is MODIS SCF measurements, the observation operator used is the CLM2 snow cover fraction formulation [e.g., Oleson et al., 2004]:
\[ scf^{f}_{t,i} = \frac{snod^{f}_{t,i}}{10\, z_{0} + snod^{f}_{t,i}} \tag{4} \]
which uses the snow depth forecast, $snod^{f}_{t,i}$ (in m, with a ground roughness length $z_{0}$ of 0.01 m), to generate a predicted SCF observation, represented by $scf^{f}_{t,i}$. This observation operator is also called a snow depletion curve (SDC) and is shown in Figure 1. Finally, the weighted covariance matrix term required in updating the analysis is the Kalman gain matrix, $K_t$, which is solved as
\[ K_t = \mathrm{Cov}\left(x^{f}_{t}, y^{f}_{t}\right) \left[ \mathrm{Cov}\left(y^{f}_{t}, y^{f}_{t}\right) + \sigma_{obs}^{2} \right]^{-1} \tag{5} \]

[20] The $\sigma_{obs}^{2}$ is the scalar observation error variance, and Cov() refers to error covariance matrices. The model state vector contains the two total column snowpack variables, SWE and snow depth, and the observation set contains only the one element of SCF.
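To make the update step concrete, here is a minimal sketch of a one-gridcell EnKF analysis with a two-element state (SWE, snow depth) and a single SCF observation. It is not the LIS implementation. The depletion-curve form and the 0.01 m roughness length are assumptions patterned on the CLM2 formulation, all names are hypothetical, and the clipping of negative analysis values to zero is an added safeguard that the text does not describe.

```python
import random
from statistics import mean

Z0_M = 0.01  # m, assumed ground roughness length in the depletion curve


def scf_from_depth(snod_m):
    """Predicted SCF (%) from snow depth (m); assumed CLM2-style curve."""
    return 100.0 * snod_m / (10.0 * Z0_M + snod_m)


def cov(a, b):
    """Sample covariance of two equal-length ensembles."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def enkf_update(swe, snod, scf_obs, sigma_obs, rng):
    """Update SWE (mm) and snow depth (m) ensembles with one SCF obs (%)."""
    y_f = [scf_from_depth(d) for d in snod]       # predicted observations
    denom = cov(y_f, y_f) + sigma_obs ** 2        # scalar, single observation
    k_swe = cov(swe, y_f) / denom                 # gain element for SWE
    k_snod = cov(snod, y_f) / denom               # gain element for depth
    swe_a, snod_a = [], []
    for s, d, y in zip(swe, snod, y_f):
        y_o = scf_obs + rng.gauss(0.0, sigma_obs)  # perturbed observation
        innov = y_o - y
        swe_a.append(max(0.0, s + k_swe * innov))  # clipping is an assumption
        snod_a.append(max(0.0, d + k_snod * innov))
    return swe_a, snod_a


# Example: 12 members with a deep forecast snowpack and a low observed SCF.
rng = random.Random(42)
snod_f = [0.20 + 0.02 * i for i in range(12)]
swe_f = [250.0 * d for d in snod_f]
swe_a, snod_a = enkf_update(swe_f, snod_f, scf_obs=10.0, sigma_obs=10.0, rng=rng)
```

Because the observation vector has one element, the matrix inverse in the gain reduces to a scalar division, which is why no linear algebra library is needed here.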

[21] Here, the model and observational information are mapped 1:1 spatially where satellite observation data points correspond to a single model gridcell. This is considered a one-dimensional (1-D) EnKF method and is used in this study. More recent studies have begun to account for spatial uncertainty effects, like spatial-length error characteristics, using more sophisticated EnKF approaches such as 3-D filters [e.g., De Lannoy et al., 2010]. Forecast ensembles are generated by perturbing select atmospheric forcing fields and snow variables using a random field generator, which also cross correlates the perturbed atmospheric fields and select model states to account for cross-correlated errors within the forcing fields and states [e.g., Reichle et al., 2002a; De Lannoy et al., 2012]. Additive perturbations are applied to temperature and downward longwave radiation to have zero means, and multiplicative perturbation ensemble means are scaled to 1 for precipitation, downward shortwave radiation, and the SWE and snow depth variables.
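The two perturbation types can be sketched as follows: lognormal multiplicative factors with an expected value of 1 and zero-mean additive offsets. This is a simplified illustration only. It omits the cross-correlations between fields, and the standard deviations, function names, and forcing values are hypothetical placeholders rather than the settings used in the experiments.

```python
import math
import random


def lognormal_factors(n, std, rng):
    """n multiplicative factors from a lognormal with mean 1 and std `std`."""
    s2 = math.log(1.0 + std ** 2)  # variance of the underlying normal
    mu = -0.5 * s2                 # makes the lognormal mean exactly 1
    return [math.exp(rng.gauss(mu, math.sqrt(s2))) for _ in range(n)]


def perturb_forcing(precip, air_temp, n_members=12,
                    precip_std=0.5, temp_std=1.0, seed=0):
    """Return (precip, temperature) pairs for each ensemble member.

    Precipitation is perturbed multiplicatively, temperature additively.
    The std values are placeholders, not the tuned experiment settings.
    """
    rng = random.Random(seed)
    factors = lognormal_factors(n_members, precip_std, rng)
    offsets = [rng.gauss(0.0, temp_std) for _ in range(n_members)]
    return [(precip * f, air_temp + d) for f, d in zip(factors, offsets)]


members = perturb_forcing(precip=2.0, air_temp=270.0)
```

Multiplicative factors keep precipitation non-negative, which is the usual reason for treating it differently from temperature.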

[22] For the DI and EnKF experiments, a baseline (unperturbed deterministic) experiment and an ensemble, or open-loop (OL), experiment were conducted, respectively. For the OL and EnKF runs, 12-member ensembles were employed and assumed to obtain sufficient spread for the state variables. The perturbation settings are adapted from De Lannoy et al. [2012], though they tuned the perturbation settings for the Noah LSM snowpack state. Through additional tests, it was found that these perturbations are sufficient for our experiments with CLM2.

3.3 Experiment Background and Setup

[23] For the assimilation experiments, only Terra MODIS SCF observations are assimilated at 1800 UTC (or 10:00 am Pacific Standard Time) for WA and at 1700 UTC (10:00 am Mountain Standard Time; MST) for CO. Experiments are validated with SNOTEL observations at 0800 UTC, reflecting local midnight, and COOP observations when recorded by observers. Some of the validation statistics include temporal and spatial (i.e., station) averages, along with standard deviations and root mean squared errors (RMSE) to estimate the dispersion in the evaluated data and overall errors of the model state updates. To test a range of observation and DA method capabilities, two time periods are selected that encompass normal, negative, and positive snowpack anomalies. A rank analysis on the in situ snow observations for Water Years (WYs) 2000–2010 was performed to locate two experiment years that captured a range of snowpack conditions. A WY begins on 1 October and ends 30 September. One overlapping “normal” snow year for both regions is WY2004. For a negative snowpack anomaly year (or a “drought” year), the WA domain experienced this case in WY2005. This happens to coincide with a significant positive snowpack anomaly in CO. Thus, WYs 2004–2005 are used for the remaining experiments. All model experiments include a spin-up time period from 3 September 1996 to 30 September 2003, using a 1 h time step.
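The water-year convention used above (1 October through 30 September, labeled by the ending calendar year) can be expressed in a few lines. The function name is hypothetical and the snippet is only a convenience sketch for selecting the experiment period.

```python
from datetime import date


def water_year(d):
    """Water year label for a date: 1 Oct to 30 Sep, named by the ending year."""
    return d.year + 1 if d.month >= 10 else d.year


# The experiment period, WYs 2004-2005, spans these two dates.
first_day = date(2003, 10, 1)
last_day = date(2005, 9, 30)
in_experiment = [water_year(first_day), water_year(last_day)]
```

With this labeling, the spin-up ending on 30 September 2003 closes WY2003, and the experiments begin on the first day of WY2004.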

3.4 Ensemble Bias Correction

[24] One key feature that emerged in the CLM2 ensemble evaluations is a consistent underestimation of the ensemble mean SWE and snow depth with respect to the single deterministic run, as shown in Figure 3. This indicates that bias may be resulting from the ensemble generation, which will likely reduce the optimality of the DA filter. One cause of the bias is the nonlinearity of the snow model (e.g., processes like snow aging, compaction, etc.). Furthermore, precipitation, downward shortwave radiation, and the two snow state variables are found to be biased when applying the lognormally distributed multiplication factors [based on Reichle et al., 2002a, 2002b]. When upper value limits and maximum standard deviation limits are imposed on these variables, they cap or scale back high or unreasonable perturbed values that would otherwise be needed to keep the mean of the lognormal factors at 1. The imposed limits on precipitation and the two snow state variables therefore contribute to the underestimated ensemble means. However, these limits remain in place to constrain any extreme values. Others have reported ensemble biases using normal additive perturbation values even with sufficient ensemble sizes [e.g., Reichle et al., 2007; Ryu et al., 2009].
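The effect of an upper limit on mean-1 lognormal factors can be checked numerically. The sketch below is a toy demonstration with a hypothetical standard deviation (0.5) and cap (1.5), not the limits used in the experiments. It draws the same random factors with and without a cap so that the two sample means can be compared directly.

```python
import math
import random


def mean_factor(std, cap, n, seed=0):
    """Sample mean of n lognormal factors (true mean 1) after capping at `cap`."""
    rng = random.Random(seed)
    s2 = math.log(1.0 + std ** 2)  # variance of the underlying normal
    mu = -0.5 * s2                 # gives an uncapped expected value of 1
    total = 0.0
    for _ in range(n):
        total += min(math.exp(rng.gauss(mu, math.sqrt(s2))), cap)
    return total / n


uncapped = mean_factor(std=0.5, cap=math.inf, n=100000)
capped = mean_factor(std=0.5, cap=1.5, n=100000)
```

The capped mean falls below the uncapped one because only the upper tail is truncated. A persistent factor below 1 applied to precipitation or to the snow states would pull the ensemble mean snowpack below the deterministic run, which is the behavior described above.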

Figure 3. Spatial averages of SWE (mm) for the baseline run (thick gray line), the biased ensemble mean (thick red line), and bias-corrected ensemble mean (thick green line, overlaying the thick gray line) for both (a) WA and (b) CO. The dashed lines indicate ensemble maximums and minimums, representing ensemble spread for the biased (red-dashed lines) and bias-corrected runs (green-dashed lines). The thickness of the gray line is greater than the green line only to show the overlap of the bias-corrected ensemble mean.

[25] To best compare the EnKF method with the described DI method, a correction must be made to the ensemble bias. This bias, as shown in Figure 3, is corrected using the method outlined in Ryu et al. [2009]. To fully implement their bias-correction method, ensembles are still propagated and updated as normal. However, one additional unperturbed, single-member forecast is also updated during the assimilation step with an unperturbed SCF observation value. This is to maintain consistency with the analysis update to the perturbed snow variables. Ryu et al. [2009] only applied their ensemble bias correction to soil moisture layer state variables; here their equations 4 and 5 are also applied to the biased precipitation and downward shortwave radiation ensembles prior to any model or assimilation update steps.
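One way to read the correction described above is as a recentering step: the offset between the ensemble mean and the unperturbed run is removed from every member, which leaves the spread untouched. The sketch below illustrates that idea under this assumption. It is not a transcription of the Ryu et al. [2009] equations, and the function name and numbers are hypothetical.

```python
from statistics import mean


def recenter_ensemble(members, deterministic):
    """Shift all members so their mean equals the unperturbed value.

    Assumed reading of the bias correction: subtract the offset between
    the ensemble mean and the deterministic run from each member.
    """
    delta = mean(members) - deterministic
    return [m - delta for m in members]


# Example: a low-biased SWE ensemble (mm) and its deterministic counterpart.
biased = [90.0, 100.0, 95.0, 85.0]
corrected = recenter_ensemble(biased, deterministic=110.0)
```

A uniform shift preserves the difference between the ensemble maximum and minimum, consistent with the conserved spread noted for Figure 3. For strictly positive variables, a practical implementation would also need to guard against members shifted below zero, which this sketch does not do.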

[26] To show that this bias correction works, Figure 3 shows the original baseline, the underestimated or biased ensemble mean, and the bias-corrected ensemble runs for both (a) WA and (b) CO. These SWE time series show the spatial averages over all SNOTEL-equivalent points for WA and CO. Both SWE and snow depth (not shown) have biases minimized in comparison to the baseline run. As further evidence that the method works, ensemble minimum and maximum values are plotted (dashed lines), displaying the conserved spread for the two OL simulations. This simple scheme may correct for ensemble perturbation bias, but it does not truly account for other known biases in the model or observations, which require other bias mitigation approaches [e.g., Reichle and Koster, 2004].

3.5 EnKF Observation Error Estimation

[27] In many EnKF approaches, the observations are treated as random variables and perturbed similarly to the model states to ensure that filter performance does not degrade [Houtekamer and Mitchell, 1998; Burgers et al., 1998]. For the MODIS SCF observational error (σobs, with a unit of percentage), previous studies assigned a 10% SCF standard deviation value, which Andreadis and Lettenmaier [2006] and Su et al. [2008] selected, based on Maurer et al. [2003], to reflect MODIS SCF observation detection errors. To show what happens if only this 10% observation error is accounted for, a set of experiments were conducted and evaluated in section 4.

[28] In this study, MODIS SCF errors (σobs) are estimated for our particular study domains by comparing MODIS SCF observations with in situ snowpack observations (e.g., SNOTEL SWE) converted into SCF validation “truth” using the SDC-based observation operator (see Figure 1). More specifically, the observation error (σobs) is expressed in terms of RMSE between MODIS SCF and in situ SCF. Only nonzero snow observation values from WYs 2000–2003 and 2006–2010 are used, since WYs 2004–2005 are used for the DA experiment and validation period. To calculate the SNOTEL-SCF values, SNOTEL SWE is converted to snow depth using a bulk snow density of 250 kg/m³ and then to SCF using the CLM SDC, both of which might add some error to the observation error estimate. Furthermore, the observation error estimates represent not only MODIS SCF detection uncertainty but also representativeness errors (e.g., estimating ~1 km-based MODIS SCF errors using point-based SNOTEL/COOP data, SNOTEL site not representative of the local area, etc.). These final observation errors are estimated for each region and observation network, separately and combined, and presented in Table 1. As shown in Table 1, most of the total observational error estimates are near 30–35%, capturing both measurement and representativeness errors in the observations. The higher errors found for WA (e.g., 35.17%) may relate to denser forest cover [e.g., Hall et al., 1998] and slope-induced shadow effects on morning retrievals [e.g., Notarnicola et al., 2013], which can result in underestimated SCF estimates. Monthly based errors were also examined; wintertime errors tended to be lower (e.g., 21% for CO) and springtime errors much higher (e.g., 30% for CO) when compared to the annual errors. To examine the impact of the static annual errors, they are applied in subsequent EnKF experiments and presented in section 4.
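The error estimate described above amounts to an RMSE between MODIS SCF and an SCF "truth" derived from in situ SWE. The following sketch shows that chain for a handful of made-up values. The depletion-curve form with a 0.01 m roughness length is an assumption patterned on the CLM2 formulation, and the function names and sample data are hypothetical.

```python
import math

RHO_SNOW_BULK = 250.0  # kg/m^3, bulk snow density from the text
Z0_M = 0.01            # m, assumed roughness length in the depletion curve


def swe_to_scf(swe_mm):
    """Convert in situ SWE (mm) to SCF (%) via snow depth and the SDC."""
    snod_m = swe_mm / RHO_SNOW_BULK  # 1 mm SWE = 1 kg/m^2
    return 100.0 * snod_m / (10.0 * Z0_M + snod_m)


def scf_rmse(modis_scf, insitu_swe_mm):
    """RMSE (%) between MODIS SCF and SWE-derived SCF, nonzero snow only."""
    pairs = [(m, swe_to_scf(s))
             for m, s in zip(modis_scf, insitu_swe_mm) if s > 0.0]
    return math.sqrt(sum((m - t) ** 2 for m, t in pairs) / len(pairs))


# Made-up example: two snowy days at 25 mm SWE and one snow-free day.
sigma_obs = scf_rmse([60.0, 40.0, 0.0], [25.0, 25.0, 0.0])
```

In this toy case 25 mm of SWE maps to 0.1 m of depth and a 50% SCF, so the two MODIS values of 60% and 40% give an RMSE of 10%, and the snow-free pair is excluded.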

Table 1. Observation Error, σobs, for SCF (Expressed as an RMSE) in % Snow Cover
Case           Stations   σobs (%)
SNOTEL-CO      98         25.36
SNOTEL-WA      54         35.17
COOP-CO        152        34.04
COOP-WA        75         36.17
CO-combined    250        30.64
WA-combined    129        35.75

4 Results

4.1 Data Assimilation Method Comparison

[29] In this section, the RH04 DI and EnKF experiments are evaluated in terms of how their snowpack state variables, SWE and snow depth, are updated. For an initial comparison, the single-member deterministic run, the biased ensemble OL, RH04 DI, and biased EnKF experiments are compared with the SNOTEL SWE and COOP snow depth observations for both CO and WA regions. The purpose of including the biased OL and EnKF experiments is to illustrate the impact of bias introduced through commonly used perturbation methods [Reichle et al., 2002a, 2002b], which are currently used in several applications [e.g., Kumar et al., 2008; De Lannoy et al., 2012]. First, spatial snowpack averages are calculated across all station (and model) points at each observation time for the two WYs, 2004–2005. For the EnKF experiments, the combined observational error, σobs, values from Table 1 are applied as 30.64% for CO and 35.75% for WA. Figure 4 displays time series of the spatial averages for the four cases: CO domain (a) SNOTEL and (b) COOP sites, and WA domain (c) SNOTEL and (d) COOP sites. SNOTEL-based observations include SWE (in mm), and COOP includes snow depth (converted to mm).

Figure 4. Time series of snow observation and model point spatial averages are compared for the four cases: CO domain (a) SNOTEL and (b) COOP sites, and WA domain (c) SNOTEL and (d) COOP sites. SNOTEL-based observations include SWE (in mm) and COOP-based observations include snow depth converted to mm (blue lines). Black and gray lines indicate the single-member baseline CLM2 runs and 12-member CLM2 open-loop (OL) runs, respectively. The RH04 DI (red) and EnKF (with biased ensembles and combined total observation standard errors, in green) methods are also shown in this comparison.

[30] For almost all cases, the averaged sites indicate that CLM2 produces late spring snowpack melt-offs in comparison to the observations, except for the WA-COOP case. For the two SNOTEL cases (CO and WA), CLM2 shows a significant lag in the late spring snow accumulation and melt-off, partly due to a springtime cold bias in the NLDAS forcing [see Cosgrove et al., 2003]. As introduced in section 3.4, all OL experiment cases show underestimated peak snowpack in comparison to the deterministic CLM2 experiments, confirming the ensemble bias presence, especially for the deeper snowpack cases. For the two DA experiments, the RH04 DI and EnKF show similar time series patterns, and the assimilation of the MODIS SCF observations does reveal improved timing of snowmelt relative to the OL and baseline. For the CO cases, the RH04 DI method produces a slightly better agreement with the snow observations than the EnKF method. However, the opposite occurs for the WA cases. Also, in the WA cases, the DI method removes the snowpack too early due to MODIS SCF detection underestimation in denser forest areas, especially in spring months [e.g., Hall et al., 1998; Liu et al., 2008].

[31] To compare how these experiments differ overall, they are evaluated in terms of RMSE and temporal correlation coefficients (correl), averaged over all stations within a network for WYs 2004–2005 and summarized in Table 2. The statistics for SNOTEL sites cover December to May, and those for COOP sites cover December to March. The COOP period is shorter because these sites tend to be at lower elevations, where the snowpack has typically melted by the end of March. As shown in Table 2, SNOTEL points have the lowest errors (123 and 135 mm for CO and WA, respectively) with the EnKF analysis, specifically the unbiased ensemble-based experiments. Their correlation values (0.80 and 0.84, respectively) are also among the highest, although slightly lower than those of the biased EnKF experiments with combined σobs. The errors are reduced by roughly 10% relative to the unbiased OL runs but remain substantial because of the unavoidable spatial discrepancy in the point-to-gridcell comparison. The correlation values show significant improvements through data assimilation, especially for the CO-SNOTEL case, which shows statistically significant differences at the 99% (95%) level for EnKF (DI) experiments versus the OL (baseline) run. As for COOP points, the DI-based analyses tend to follow the COOP snow depth time series pattern more closely (see Figure 4), and for CO, the DI analyses have smaller differences with the observations than the other experiments for that case. Improvements over the OL experiments occur for almost all cases when MODIS SCF is assimilated.
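
The summary statistics described above can be illustrated with a minimal sketch: RMSE and temporal correlation are computed per station and then averaged over the network. The function names and the SWE values below are hypothetical and are not taken from the study's code or data.

```python
import math

def rmse(model, obs):
    """Root-mean-square error between two equal-length time series."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

def pearson(model, obs):
    """Temporal (Pearson) correlation between two equal-length time series."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)

def network_summary(stations):
    """Average the per-station RMSE and correlation over a station network.

    `stations` is a list of (model_series, obs_series) pairs, one per site.
    """
    rmses = [rmse(m, o) for m, o in stations]
    corrs = [pearson(m, o) for m, o in stations]
    return sum(rmses) / len(rmses), sum(corrs) / len(corrs)

# Made-up SWE values (mm) at two hypothetical sites.
stations = [
    ([100.0, 200.0, 300.0], [110.0, 190.0, 310.0]),
    ([50.0, 80.0, 20.0], [50.0, 80.0, 20.0]),
]
mean_rmse, mean_corr = network_summary(stations)
```

Computing the statistics per station before averaging, as assumed here, keeps a few deep-snowpack sites from dominating the temporal correlation, but it does not remove the point-to-gridcell mismatch noted in the text.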

Table 2. Spatial Averages of Temporally Based Summary Statistics for CLM2 Baseline, Open-Loop (OL), DI, and EnKF DA Experiments for Combined WYs 2004–2005 a

DA Comparison (RMSE units: mm)

                                                               EnKF biased     EnKF biased     EnKF unbiased
           Baseline     OL biased    OL unbiased  DI-RH04      10% σobs        Combined σobs   Combined σobs
SNOTEL-CO
  RMSE     136.46       136.26       136.20       133.59       159.50          129.00          123.47
  Correl   0.58         0.60         0.58         0.76         0.74            0.81            0.80
SNOTEL-WA
  RMSE     151.12       148.09       150.87       165.07       193.11          137.48          135.04
  Correl   0.76         0.78         0.77         0.73         0.65            0.85            0.84
COOP-CO
  RMSE     132.55       122.33       132.05       94.59        116.89          106.20          112.21
  Correl   0.48         0.49         0.48         0.58         0.49            0.52            0.52
COOP-WA
  RMSE     105.37       102.82       103.67       91.72        100.41          99.21           100.73
  Correl   0.60         0.60         0.60         0.66         0.64            0.62            0.62

  • a SNOTEL sites are evaluated in terms of SWE, and COOP sites in terms of snow depth. Statistics associated with SNOTEL sites encompass the months December–May, while those for COOP sites include the months December–March. Bold-italicized values indicate best results for each region and year.

4.2 Ensemble Bias Reduction Impact

[32] Many ensemble-based DA studies have addressed different bias issues in terms of filter performance, including observation and model biases, and also parameter and forcing biases [e.g., Dee and da Silva, 1998; Reichle and Koster, 2004; Bosilovich et al., 2007; De Lannoy et al., 2007; Pauwels et al., 2013]. To best compare the DI RH04 and EnKF methods, the bias in the ensembles should be addressed. In this section, the OL ensemble mean bias is corrected, using the Ryu et al. [2009] approach (as described in section 3.4) to explore how such errors are propagated to the model forecasts and the analysis estimates. Summary statistics are presented in Table 2 and indicate very small differences between the baseline and OL, biased and unbiased simulations. Since the summary statistics do not specify where any significant differences occur between the OL and EnKF experiments, another type of metric is applied.
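
As an illustration of the kind of correction discussed here, the sketch below shifts every ensemble member by the difference between the ensemble mean and a single unperturbed run, so that the corrected ensemble mean matches the deterministic forecast while the spread is preserved. This is one simple reading of a Ryu et al. [2009]-style constraint and is not the implementation used in this study; the function name and the SWE values are hypothetical.

```python
def correct_ensemble_bias(members, deterministic):
    """Remove the ensemble-mean bias relative to an unperturbed run.

    members:       forecast values (e.g., SWE in mm), one per ensemble member
    deterministic: forecast value from the single unperturbed model run
    """
    ensemble_mean = sum(members) / len(members)
    bias = ensemble_mean - deterministic  # negative when the ensemble is too low
    # Assumption: the same offset is applied to every member, so the spread
    # is unchanged. Physical bounds (e.g., SWE >= 0) are not enforced here.
    return [m - bias for m in members]

# Made-up example: the perturbed ensemble underestimates SWE on average.
members = [80.0, 95.0, 110.0]
corrected = correct_ensemble_bias(members, deterministic=100.0)
```

A uniform shift is used here only because it is the simplest way to constrain the mean without touching the ensemble spread that the filter relies on.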

[33] For climate and hydrologic applications, it is important to know when the snowpack completely melts. A measure of this timing is derived and used to see what impact, if any, the bias and the bias correction have relative to the SNOTEL and COOP observations. This snowmelt metric, referred to here as the final day of melt (“FDM”), is the day of the WY when the snowpack first melts completely after peak snowpack conditions. To see how the OL and EnKF experiments compare with the snow observations, differences are taken between the model's and observations' snowmelt metric and then spatially averaged over all available points for each case and year. The final results are presented in Figure 5. For the average FDM differences (i.e., FDMCLM2 − FDMOBS), the greatest difference in days occurs between the OL runs and the snow observations, which is expected given the model snowmelt lag bias. The differences are noticeably smaller for the unbiased than for the biased OL experiments, with the bias-corrected ensemble experiments showing closer agreement with the observations, except for the WA-SNOTEL, WY2005 case. These reductions translate to better agreement within the EnKF experiments, especially for SNOTEL-based cases. These small improvements in timing relate to reducing the wintertime “undercatch” conditions due to underestimated precipitation ensemble biases (especially at SNOTEL locations) and reducing negatively biased downward shortwave radiation fluxes in springtime, which again are perturbed with the lognormally distributed factors. Thus, bias correcting the ensembles does consistently benefit the snowpack analysis in this way. Therefore, the ensemble bias-correction scheme is applied for all subsequent evaluations.

Figure 5. Average differences between CLM2 and snow observation snowmelt metrics are compared for the biased and bias-corrected OL and EnKF experiments for all four cases. The snowmelt metric applied is defined as the “final day of melt” (FDM), identified as the day number (starting from 1 Oct.) when the snowpack first melts off.
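
The FDM metric can be sketched as follows, assuming daily SWE (or snow depth) series that start on 1 October and treating a value of zero as a fully melted snowpack. The function names, the zero threshold, and the sample values are illustrative assumptions rather than the study's actual processing.

```python
def final_day_of_melt(snow):
    """Return the 1-based water-year day when the snowpack first reaches
    zero after its seasonal peak, or None if it never melts out.

    `snow` is a daily SWE or snow depth series starting on 1 October.
    """
    peak_index = max(range(len(snow)), key=lambda i: snow[i])
    for i in range(peak_index + 1, len(snow)):
        if snow[i] <= 0.0:
            return i + 1  # day 1 corresponds to 1 October
    return None

def mean_fdm_difference(model_sites, obs_sites):
    """Spatially averaged FDM difference (model minus observations) in days."""
    diffs = []
    for model, obs in zip(model_sites, obs_sites):
        fdm_model = final_day_of_melt(model)
        fdm_obs = final_day_of_melt(obs)
        if fdm_model is not None and fdm_obs is not None:
            diffs.append(fdm_model - fdm_obs)
    return sum(diffs) / len(diffs)

# Made-up, very short series: the model melts out two days after the observations.
model = [0.0, 10.0, 40.0, 30.0, 20.0, 5.0, 0.0, 0.0]
obs = [0.0, 12.0, 35.0, 20.0, 0.0, 0.0, 0.0, 0.0]
fdm_model = final_day_of_melt(model)
fdm_obs = final_day_of_melt(obs)
lag_days = mean_fdm_difference([model], [obs])
```

A positive averaged difference corresponds to the modeled snowmelt lag described in the text.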

[34] With the ensemble bias correction applied, the DI and EnKF experiments are evaluated against the observations again but in terms of change relative to their respective control runs. This evaluation is presented by normalizing the SWE (or snow depth) RMSE values of each experiment with the SWE (or snow depth) RMSE values of the CLM2 baseline and unbiased OL runs. A normalized RMSE value less than one indicates that an experiment performs better than its control run [e.g., Crow and Ryu, 2009]. Figure 6 shows how the relative RMSE values compare for the DI and unbiased EnKF methods, normalized by their respective control runs, for all cases. The EnKF experiments have overall smaller spatial standard deviations and minimum/maximum ranges than the DI experiments, indicating that the DI experiments have more widely varying impacts on the analyses at both SNOTEL and COOP sites. Such impacts could translate to greater errors in subsequent processes, like streamflow conditions. For SNOTEL sites, as in Figures 6a (CO domain) and 6c (WA domain), the EnKF method has slightly improved mean conditions versus the OL and DI runs, especially for the WA domain. At the lower elevation COOP points, the DI experiments perform slightly better than the baseline runs and the EnKF experiments, as shown before. One reason why the DI method performs better at lower elevation sites is that it removes the persistently biased CLM2 snowpack lag more effectively than the EnKF method at sites where smaller snowpacks typically melt off by March. For higher elevation SNOTEL sites, the DI method removes the snow too early at several sites, whereas the EnKF method reduces the overall snowpack more gradually, improving the timing of snow melt-off with respect to the observations.

Figure 6. DA experiment RMSE values normalized by respective baseline or open-loop RMSE values for the CO domain (a) SNOTEL and (b) COOP cases, and WA domain (c) SNOTEL and (d) COOP cases. Black triangles indicate the statistical mean, gray bar lines indicate ±1 stdev unit from mean, and black error bar lines indicate the maximum and minimum extents of the normalized statistic. Values below 1 suggest improvement over the control runs.
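
The normalization summarized in Figure 6 can be sketched as a per-site ratio of experiment RMSE to control RMSE, followed by the spatial mean, standard deviation, and range. The function name and RMSE values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics

def normalized_rmse_stats(experiment_rmse, control_rmse):
    """Summarize per-site RMSE ratios (experiment / control).

    Ratios below 1 indicate that the experiment improves on its control run.
    """
    ratios = [e / c for e, c in zip(experiment_rmse, control_rmse)]
    return {
        "mean": statistics.mean(ratios),
        "stdev": statistics.pstdev(ratios),  # assumption: population stdev
        "min": min(ratios),
        "max": max(ratios),
    }

# Made-up per-site RMSE values (mm) for an experiment and its control run.
stats = normalized_rmse_stats([90.0, 120.0, 60.0], [100.0, 100.0, 100.0])
```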

[35] To show how these normalized values vary with elevation across the different sites, scatterplots of the four cases are shown in Figure 7 for the more normal snow year, WY2004, only. The relationships between the normalized RMSE values and elevation are somewhat weak. For CO, normalized RMSE values slightly increase (decrease) with an increase in elevation at SNOTEL (COOP) points. For WA, the relationships are even weaker. For WY2005 (not shown), similarly weak relationships also exist, except that the DI-based normalized RMSE values decrease with elevation for the CO-COOP case.

Figure 7. Scatterplots are shown between elevation (unit: meters) and the DI (open squares) and EnKF (“x” symbols) experiment RMSE values normalized by their respective baseline or open-loop RMSE values for the CO domain (a) SNOTEL and (b) COOP cases, and WA domain (c) SNOTEL and (d) COOP cases. Only WY2004 values are shown.

[36] In these DA method comparisons, the EnKF experiments were conducted with the combined σobs values, 30.64% and 35.75%, for CO and WA, respectively. These values could be optimized to account for variation in year, elevation, vegetation type, etc.

[37] Additional CO domain experiments were performed by applying the SNOTEL-CO and COOP-CO-based observation error values (25.36% and 34.04%, respectively) from Table 1. Applying the SNOTEL-only σobs value produces slightly greater standard deviation and min/max ranges versus the COOP-only estimates (not shown). Again, this may relate to the analysis depending more on the observational information and removing too much modeled snowpack. This is indeed the case for the EnKF experiment performed with the 10% detection error (summary statistics are reported in Table 2). Accounting only for MODIS detection errors degrades the EnKF algorithm by making it depend too much on the snow cover observation information and not enough on the model forecast.
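
The sensitivity to σobs can be illustrated with a scalar ensemble Kalman gain, K = cov(SWE, SCF) / (var(SCF) + σobs²), where the SCF values are each member's predicted observation. This is a generic textbook form under the assumption that σobs is expressed in SCF percent; the ensemble values and function name are made up and do not come from the experiments.

```python
def ensemble_gain(swe_members, scf_members, sigma_obs):
    """Scalar ensemble Kalman gain for updating SWE from an SCF observation.

    swe_members: ensemble of forecast SWE values (mm)
    scf_members: each member's predicted SCF observation (%)
    sigma_obs:   observation error standard deviation (assumed in % SCF)
    """
    n = len(swe_members)
    mean_swe = sum(swe_members) / n
    mean_scf = sum(scf_members) / n
    cov = sum((x - mean_swe) * (y - mean_scf)
              for x, y in zip(swe_members, scf_members)) / (n - 1)
    var_scf = sum((y - mean_scf) ** 2 for y in scf_members) / (n - 1)
    return cov / (var_scf + sigma_obs ** 2)

# Made-up five-member ensemble.
swe = [80.0, 100.0, 120.0, 90.0, 110.0]
scf = [60.0, 70.0, 80.0, 65.0, 75.0]
gain_detection_only = ensemble_gain(swe, scf, sigma_obs=10.0)
gain_combined = ensemble_gain(swe, scf, sigma_obs=30.64)
```

With the smaller 10% error, the gain is several times larger, so the same SCF innovation removes far more SWE, which is consistent with the overdependence on the observations described above.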

4.3 Impacts on the Hydrological and Energy Budgets and Fluxes

[38] The final objective is to show how assimilating snow cover fraction observations impacts the model's energy and water budgets. This is important since LSMs like CLM2 are typically coupled to numerical weather prediction, global climate, or streamflow models and can often control much of the surface radiative and energy flux conditions. To see how the different DA updates impact the LSM's surface budgets, the other LSM state variables (e.g., soil moisture) and fluxes (e.g., turbulent energy) are evaluated in terms of total impacts for a snowmelt period. Each DA experiment is evaluated qualitatively relative to its control run, similar to Dong et al. [2007] and Zaitchik and Rodell [2009]. For the DI experiments, the CLM2 baseline serves as the control run. For the EnKF experiments, the bias-corrected OL run serves as its control.

[39] For this evaluation, the snowmelt period from 1 March to 31 August 2004 is examined, which corresponds to an average snow year out of the 11 year period (2000–2010). Restart files (for 1 March 2004) from each experiment were used to initialize each model run. The model state and fluxes compared include: net shortwave (SWnet) and longwave (LWnet) radiation (Wm−2); latent (LHFlux), sensible (SHFlux), and ground (GHFlux) heat fluxes (Wm−2); total average surface temperature (SurfTemp; K); total column soil wetness (SoilWet; vol/vol %), surface (SurfRunoff) and subsurface (Subsurface) runoff (mm/day); snowmelt (Snowmelt; mm/day); and total evapotranspiration (Total Evap; mm/day).

[40] The energy budget can be written as follows:
$$\mathrm{SWnet} + \mathrm{LWnet} - \mathrm{LHFlux} - \mathrm{SHFlux} - \mathrm{GHFlux} \qquad (6)$$
where this expression equals zero in a perfectly balanced system. Deviations from zero can be expected when the fluxes are reacting to assimilation increments in select state variables. To see how energy fluxes, LHFlux and SHFlux, are impacted by the two DA methods, Figure 8 highlights the total (spatial) averages of the absolute differences between the combined daily averaged fluxes for each experiment and those of the control runs. The ground heat fluxes (not shown) almost mirror the combined LHFlux and SHFlux, indicating how the residual energy fluxes are balanced. The DI experiments' turbulent flux differences are slightly larger than the EnKF differences with its control run. This may suggest that the earlier reductions in the DI experiment's snowpack (as shown in Figure 4) contribute to higher combined flux values than those of the EnKF simulations. For the EnKF experiment, the smaller increments associated with the SCF updates can translate to reduced differences in energy fluxes relative to the control run, which could ultimately result in a smaller impact on the energy fluxes being communicated to an atmospheric model.

Figure 8. Absolute differences between the sum of latent and sensible heat fluxes from the RH04-DI experiment and the CLM2 baseline run (black line), and the bias-corrected EnKF run and its OL run (gray-dashed line), for CO domain (a) SNOTEL and (b) COOP sites, and WA domain (c) SNOTEL and (d) COOP sites. Time series represent spatial and daily averages of the differences with units of W m−2.

[41] An additional way to consider which experiment may impose less impact on subsequent states and fluxes is to calculate a total average of absolute differences between the experiment's and the control's variable of interest (e.g., latent heat flux). Table 3 provides estimates of the temporal averages of absolute differences made between each experiment and its control for 1 March to 31 August 2004, at each gridcell point then averaged over the region. In comparing the RH04 DI and the bias-corrected EnKF runs, larger differences are found in the DI run relative to its control than those between the EnKF and its control run. The differences are greatest for the higher elevation-based SNOTEL cases, especially WA sites, where the DI method removes greater amounts of SWE and sometimes earlier in the snowmelt period. However, none of the differences between the experiments and their respective control runs are statistically significant, nor are the differences between the DI and EnKF experiments. The averaged differences appear to be small for both DA-based experiments, but if the total summed differences are examined for the six-month period, the accumulated impacts are more apparent.
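
The averaging described here can be sketched as a temporal mean of absolute experiment-minus-control differences at each gridcell, followed by a spatial mean, with the summed total shown alongside for comparison. The function names and flux values are hypothetical.

```python
def mean_absolute_difference(experiment, control):
    """Temporal average of |experiment - control| at each gridcell,
    then averaged over all gridcells.

    Both inputs are lists of daily series, one series per gridcell.
    """
    per_cell = []
    for exp_series, ctl_series in zip(experiment, control):
        diffs = [abs(e - c) for e, c in zip(exp_series, ctl_series)]
        per_cell.append(sum(diffs) / len(diffs))
    return sum(per_cell) / len(per_cell)

def summed_absolute_difference(experiment, control):
    """Total of |experiment - control| over all gridcells and days."""
    return sum(abs(e - c)
               for exp_series, ctl_series in zip(experiment, control)
               for e, c in zip(exp_series, ctl_series))

# Made-up daily latent heat fluxes (W m-2) at two gridcells.
experiment = [[10.0, 12.0, 14.0], [5.0, 5.0, 5.0]]
control = [[10.0, 10.0, 10.0], [4.0, 6.0, 5.0]]
average_impact = mean_absolute_difference(experiment, control)
total_impact = summed_absolute_difference(experiment, control)
```

Small averaged values can still accumulate into a large total over a six-month period, which is the contrast drawn in the text.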

Table 3. Comparison of Total Daily Averages of Absolute Differences Between Experiment and Control Run Values a

DA Impacts on Energy Fluxes (1 Mar.–31 Aug. 2004)

           SWnet       LWnet       LHFlux      SHFlux      SurfTemp    EngBudget
           (W m−2)     (W m−2)     (W m−2)     (W m−2)     (K)         (W m−2)
SNOTEL-CO
  DI-RH04  0.28        1.50        2.81        2.81        0.30        0.00
  EnKF     0.20        1.05        2.03        2.02        0.22        0.00
SNOTEL-WA
  DI-RH04  0.07        1.81        3.10        4.46        0.38        0.00
  EnKF     0.05        1.23        2.16        3.05        0.26        0.00
COOP-CO
  DI-RH04  0.39        0.66        0.89        0.92        0.11        0.00
  EnKF     0.25        0.41        0.64        0.63        0.07        0.00
COOP-WA
  DI-RH04  0.13        0.25        0.25        0.33        0.03        0.00
  EnKF     0.07        0.15        0.14        0.17        0.02        0.00

  • a SWnet, LWnet, LHFlux, SHFlux, SurfTemp, and energy budget (EngBudget) terms presented as spatiotemporal (daily) averages of absolute differences between experiment and control for the period: 1 Mar.–31 Aug. 2004.

[42] A few studies have quantified the impacts of snow data assimilation on hydrological variables [Zaitchik and Rodell, 2009; De Lannoy et al., 2012], or more specifically in terms of just runoff and streamflow [e.g., Liston et al., 1999; Dressler et al., 2006]. Here, the CLM2 moisture mass budget is expressed at each time step, ∆t, as:
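$$\Delta\mathrm{TotalSM} + \Delta\mathrm{SWE} + \Delta\mathrm{CanopyWater} - \left(\mathrm{TotalPPT} - \mathrm{TotalRunoff} - \mathrm{TotalEvap}\right)\Delta t \qquad (7)$$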
where this expression equals zero in a perfectly balanced system. Deviations from zero are expected when assimilation increments are included in the ∆SWE term. ∆TotalSM is the change in total column soil moisture storage, and ∆SWE and ∆CanopyWater represent the changes in snowpack and canopy water storage, respectively. TotalPPT represents combined snowfall and rainfall quantities, TotalRunoff reflects surface and subsurface flow (including snowmelt), and TotalEvap includes all vegetation, ground, and snow evaporative (and sublimation) processes. For the following discussion, equation 7 is evaluated for the different experiments.
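
A minimal sketch of this budget check is given below, assuming the residual is the sum of the storage changes minus the net water input over the time step, which is consistent with the term definitions above and with negative residuals when an update removes SWE. The function name, argument names, and values are hypothetical.

```python
def moisture_budget_residual(d_total_sm, d_swe, d_canopy_water,
                             total_ppt, total_runoff, total_evap, dt=1.0):
    """Moisture mass budget residual (mm) for one time step.

    Storage changes are in mm; fluxes are in mm per unit time and are
    multiplied by the time step `dt`. A closed budget gives zero.
    """
    storage_change = d_total_sm + d_swe + d_canopy_water
    net_input = (total_ppt - total_runoff - total_evap) * dt
    return storage_change - net_input

# Made-up daily values for a melting snowpack with a closed budget:
# 5 mm of SWE melts, 2 mm enters the soil, and the rest leaves as runoff.
balanced = moisture_budget_residual(
    d_total_sm=2.0, d_swe=-5.0, d_canopy_water=0.0,
    total_ppt=1.0, total_runoff=3.5, total_evap=0.5)

# Same day, but an assimilation update removes an extra 20 mm of SWE
# that never appears as melt, runoff, or evaporation.
with_update = moisture_budget_residual(
    d_total_sm=2.0, d_swe=-25.0, d_canopy_water=0.0,
    total_ppt=1.0, total_runoff=3.5, total_evap=0.5)
```

In this sketch the residual equals the water mass that the update removed from the system without routing it through any flux.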

[43] To show the DA method impacts on the hydrologic budgets, Figure 9 compares the sum in equation 7 for the CLM2 baseline run, the RH04-DI, and the bias-corrected EnKF experiments, using the combined (both SNOTEL and COOP) sites for the CO (Figure 9a) and WA (Figure 9d) domains. By design, the CLM2 baseline run's moisture mass budget (black line) closes (~0 mm/day) at each daily averaged time point. For the RH04 DI experiment, the sudden SWE removals appear in the moisture budget as extreme negative values. These large negative values represent the amount of water mass removed from the system during an assimilation step, and they occur mostly in April to May, as indicated previously in Figure 4. For the EnKF experiment, substantial moisture imbalances can also occur (e.g., Figure 9a), but typically with partial snow removals at each update that leave some of the snowpack present, rather than the complete removal of the DI method. With these smaller increments and snow conditions present, the EnKF experiments tend to have slightly more accumulated moisture budget imbalances over time, despite both DA methods assimilating the same set of observations. This difference between the DA methods is highlighted in Figures 9b and 9c for the CO domain and Figures 9e and 9f for the WA domain, where errors exceeding ±3 mm/day are averaged and counted per month (March to June). Figures 9c and 9f show that EnKF moisture mass budget errors occur at higher monthly frequencies than those of the DI method.

Figure 9. Daily hydrologic budget error averages are shown (on the left) for all analyzed sites in (a) CO and (d) WA for the CLM2 baseline run (black line), the RH04 DI experiment (dark gray line with pluses), and the bias-corrected EnKF experiment (light gray line with open circles) for 1 March to 30 June 2004 only. On the right-hand side, (b and e) monthly averages of the hydrological budget errors, outside the error range of ±3 mm/day, and (c and f) the corresponding percentage of points outside this range are presented for CO and WA.
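
The monthly exceedance statistics in Figures 9b, 9c, 9e, and 9f can be sketched as follows: for each month, budget errors outside ±3 mm/day are averaged, and the percentage of points outside that range is reported. The function name, data layout, and error values are assumptions made for illustration.

```python
def exceedance_summary(errors_by_month, threshold=3.0):
    """For each month, return (mean of errors beyond +/-threshold,
    percentage of points beyond +/-threshold).

    `errors_by_month` maps a month name to a list of budget errors (mm/day).
    """
    summary = {}
    for month, errors in errors_by_month.items():
        outside = [e for e in errors if abs(e) > threshold]
        mean_outside = sum(outside) / len(outside) if outside else 0.0
        percent_outside = 100.0 * len(outside) / len(errors)
        summary[month] = (mean_outside, percent_outside)
    return summary

# Made-up budget errors (mm/day) for two months.
errors = {
    "April": [-0.5, -4.0, 0.2, -6.0],
    "May": [0.1, -0.2, 3.5, -1.0],
}
summary = exceedance_summary(errors)
```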

[44] For the DI method, instantaneously removing much of the modeled snowpack reverts the LSM system back to the hydrological budget of its nonsnowpack conditions and related physics, contributing to fewer hydrological budget errors in time and possibly being a benefit to some reanalysis data sets [e.g., Rodell et al., 2004; Saha et al., 2010]. For the EnKF method, these higher frequencies relate to the fact that SWE and snow depth reflect accumulated variables which have inherently high temporal correlations and lead to a suboptimal KF operation. Figure 10 shows this serial dependence through the autocorrelation (spatially averaged at daily lags) of the EnKF experiments' SWE increments over the spring months of 2004 and 2005, for both CO and WA. An optimal filter would show mostly white-noise correlations, whereas here a 5–8 day autocorrelation is found. This is a known problem which some snow DA studies have addressed [e.g., Slater and Clark, 2006], and it should be accounted for in future studies.

Figure 10. Autocorrelation time series of the (unbiased) EnKF experiments' SWE increments for both CO (black line with open circles) and WA (gray line with closed circles) regions, including the two spring seasons of 2004 and 2005. Also, fitted correlation length functions are overlaid for both regions. The time series reflect spatial averages across available stations at each time lag.
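
The serial dependence of the increments can be checked with a sample autocorrelation at daily lags, as in the sketch below. The estimator shown is the standard biased form, and the increment series are made up; neither is taken from the study.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a time series at a given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom

# Made-up daily SWE increments (mm): one persistent series, one alternating.
persistent = [-5.0, -4.0, -4.0, -3.0, -3.0, -2.0, -1.0, -1.0, 0.0, 0.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

persistent_lag1 = autocorrelation(persistent, 1)
alternating_lag1 = autocorrelation(alternating, 1)
```

Increments from a well-tuned filter would be expected to show autocorrelations near zero at nonzero lags, whereas the persistent series here mimics the multi-day dependence reported in the text.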

[45] For the model time step after the assimilation update, the other model variables and fluxes then adjust to the remaining SWE left in the system to obtain moisture mass balance. This is evident in Figure 11 which shows the differences in snowmelt (mm/day) between the DI experiment and its control and the EnKF experiment and its control run for all cases, though only the SNOTEL cases are shown. The large negative differences reflect the snowpack removed but not melted. This translates into decreased runoff processes for both DA experiments. The DI approach of Zaitchik and Rodell [2009] adjusts the forcing inputs, based upon MODIS SCF estimates, to update the model snow states. With that, they were able to maintain better hydrologic budget closure at each time step.

Figure 11. Same as Figure 8 but for snowmelt differences (mm/day) and for SNOTEL-only (a) CO and (b) WA sites.

[46] Similar to Table 3, total averaged absolute differences in relevant moisture fluxes and total soil column wetness variables are provided in Table 4, reflecting the same snowmelt period. The averaged hydrological budget errors (HydroBudget) shown in Table 4 indicate that both methods accumulate a similar amount of budget errors, but with the EnKF resulting in slightly higher moisture budget errors (as explained by the higher error frequency, see Figure 9), except for the SNOTEL-WA EnKF case, which has overall lower errors (i.e., 0.16 mm/day less). In terms of statistical significance, the WA-SNOTEL and both CO cases have significant differences between each experiment (both DI and EnKF) and its control run for snowmelt at least at the 95% level. This is also true for the surface and subsurface runoff terms for the CO-SNOTEL DI and EnKF experiments and the WA-SNOTEL DI experiment, indicating that the DI experiments can significantly impact the model and even subsequent streamflow estimates. If the absolute differences between each experiment and its control are summed for this six-month period, the total amount of water removed from the system can be substantial. For example, the summed total runoff difference (in mm) for all WA station points for the DI experiment is:
urn:x-wiley:2169897X:media:jgrd50542:jgrd50542-math-0008
Table 4. Description Similar to That of Table 3 but for Moisture Budget States and Fluxes a

DA Impacts on Moisture Fluxes (1 Mar.–31 Aug. 2004)

           SoilWet     SurfRunoff  Subsurface  Snowmelt    TotalEvap   HydroBudget
           (%)         (mm)        (mm)        (mm)        (mm)        (mm)
SNOTEL-CO
  DI-RH04  2.37        1.15        1.05        2.23        0.10        1.81
  EnKF     1.44        0.83        0.78        1.60        0.07        1.84
SNOTEL-WA
  DI-RH04  1.66        1.17        1.69        2.86        0.11        2.47
  EnKF     0.97        0.86        1.11        1.91        0.07        2.31
COOP-CO
  DI-RH04  1.12        0.34        0.24        0.54        0.03        0.34
  EnKF     0.56        0.26        0.16        0.37        0.02        0.39
COOP-WA
  DI-RH04  0.20        0.12        0.08        0.18        0.01        0.08
  EnKF     0.11        0.07        0.04        0.09        0.00        0.10

  • a SoilWet, SurfRunoff, Subsurface, Snowmelt, Total Evap, and hydrological budget (HydroBudget) terms reflect spatiotemporal (daily) averages of absolute differences between experiment and control for the period: 1 Mar.–31 Aug. 2004. Bold-italicized values indicate statistical significance at the 95% level.

[47] For the bias-corrected EnKF experiment, the result is:
urn:x-wiley:2169897X:media:jgrd50542:jgrd50542-math-0009
representing 179.07 mm less of total runoff for the six-month period in comparison to the DI experiment. This comparison indicates how much the DA method selected can affect snowmelt and thus total runoff, and effectively how much water mass was removed and unaccounted for in the system. One other notable difference is with the higher total column soil wetness differences (in %) found with the DI runs. With more snow removed and earlier in the DI experiments, less snowmelt and liquid drainage through the soil column occurs than with the EnKF experiments, affecting spring soil moisture presence and potential atmospheric responses like summer-time precipitation [e.g., Su et al., 2012].

5 Summary and Conclusions

[48] In this study, two data assimilation methods of varying complexity were applied and evaluated by assimilating MODIS snow cover fraction observations into the CLM2 land surface model. The two DA methods, the simpler Rodell and Houser [2004] direct insertion approach and the ensemble-based 1-D EnKF method [Reichle et al., 2002a, 2002b], were selected due to their wide usage in many LSM data assimilation studies. In order to compare the two methods, a bias correction to the EnKF's ensemble generation was required due to an underestimation of the resulting snow variables' ensemble means. The ensemble bias correction was made using the approach by Ryu et al. [2009], which was shown to improve the final snow state analysis and updates, including reducing the differences in days of total melt-off between the model analysis and observations. Another major objective was to evaluate the two DA methods in terms of their impacts on the LSM's other land state variables (e.g., soil moisture) and fluxes (e.g., latent heat flux) in the melt period. The EnKF method has less impact overall on CLM2's other variables and fluxes than the DI method, including smaller impacts on the hydrological budgets. These lower impacts by the EnKF method could translate to smaller and less abrupt influences when coupled to an atmospheric model.

[49] Of the two methods, the EnKF method performs slightly better overall at the higher elevation SNOTEL points, particularly when ensemble biases are accounted for, but the DI method shows slight advantages at lower elevation COOP sites in snowmelt timing and removal. Both techniques show similar integrated hydrological budget errors during the springtime melt. However, the DI experiments suffer from greater total snowpack removals but slightly fewer hydrological budget errors due to the LSM system reverting back to the hydrological budget of its nonsnowpack conditions. For the EnKF method, smaller incremental springtime snowpack reductions maintain a longer snow presence and relate to lingering small budget errors, in part due to the high temporal correlation persistence of such accumulated state variables, which also renders the filter operations suboptimal.

[50] In summary, the EnKF experiments show smaller changes than the DI experiments in energy fluxes and moisture budget terms, like soil moisture and total runoff, when compared to their control simulations. Even though MODIS SCF observations are effective at reducing the snowpack biases in CLM2, water mass is essentially being removed from the system. These SWE reductions lead to reductions in snowmelt water and runoff that would otherwise eventually go into streamflow or evaporative processes. Though this result may indicate a trade-off for improving the snowpack state, removing excess snow could account for forcing errors and biases as well. Other sophisticated approaches exist for dealing with the model imbalances caused by the updates, using either a priori knowledge and daily bias-correcting approaches [e.g., Bosilovich et al., 2007] or budget closure approaches built within the data assimilation scheme itself [e.g., Yilmaz et al., 2011]. Finally, both DA methods could be optimized further in terms of how observations get assimilated and how the model and forcing errors are accounted for. For example, reducing biases between observations and model states a priori [e.g., Reichle and Koster, 2004] or dynamically [e.g., De Lannoy et al., 2007] could improve the performance of the filters.

[51] Though the two DA methods in this study were evaluated at higher resolutions and for midlatitude mountainous regions, simpler methods, like DI, could still be considered for coarser-scale and larger domain-based reanalysis data sets and operational forecasts. However, if model, ensemble, and observation errors are well addressed and computing resources are not a concern, methods like the EnKF are still recommended over the simpler methods. Applying the more complex EnKF method gives researchers and forecasters the ability to adapt observation and model errors for a variety of situations and to better account for bias impacts, and it can be adapted for a range of hydrological modeling and numerical weather prediction applications.

Acknowledgments

[52] This work was funded by NOAA grant NA07OAR4310221 and NASA grants NNX08AU51G, NNX08AV05H. The authors would like to thank Jagadish Shukla, Zafer Boybeyi, and David Straus for their feedback and helpful discussions. Computer resources were provided by the Institute of Global Environment and Society. Gabriëlle De Lannoy was a research fellow of the Research Fund Flanders (Fonds Wetenschappelijk Onderzoek, FWO). We gratefully acknowledge Wade Crow and the three other anonymous reviewers for their constructive comments.