ICESat‐derived inland water surface spot heights

Accurate measurement of water surface height is key to many fields in hydrology and limnology. Satellite radar and laser altimetry have been shown to be useful means of obtaining such data where no ground gauging stations exist, and the accuracy of different satellite instruments is now reasonably well understood. Past validation studies have shown water surface height data from the ICESat instrument to have the highest vertical accuracy (mean absolute errors of ∼10 cm for ICESat, compared, for example, with ∼28 cm from Envisat), yet no freely available source of processed ICESat data currently exists for inland water bodies. Here we present a database of processed and quality checked ICESat‐derived inland water surface heights (IWSH) for water bodies greater than 3 arc sec (∼92 m at the equator) in width. Four automated methods for removing spurious observations or outliers were investigated, along with the impact of using different water masks. We find that the best performing method ensures that observations used are completely surrounded by water in the SRTM Water Body data. Using this method for removing spurious observations, we estimate transect‐averaged water surface heights at 587,292 unique locations from 2003 to 2009, with the number of locations proportional to the size of the river.


Introduction
The use of remote sensing data sets in water resources monitoring has become increasingly popular, especially in those large areas of the globe not covered by the existing ground gauge network or for areas where gauge data cannot be obtained in a timely manner Calmant et al., 2008]. Remote sensing altimetry can provide near-global spatial coverage and timely delivery of water surface height data, and will be increasingly important if the global network of river gauges continues to decline [Hannah et al., 2011].
Satellite altimeters now routinely provide estimates of water surface heights globally for many river and lake systems and for hydrological and limnological studies Calmant and Seyler [2006] and Calmant et al. [2008] present a list of applications which satellite altimetry can help address. These applications include: the estimation of discharge; the geodetic levelling of hydrological network gauging stations; the estimation of spatial and temporal variations in water storage; the creation of water surface slope profiles; the monitoring of lakes; and the understanding of hydrologic regimes in ungauged or poorly gauged basins. Other applications include the calibration and validation of both hydrological [Getirana, 2010;de Paiva et al., 2013] and hydraulic models [Wilson et al., 2007;Neal et al., 2012], and operational forecasting [Evensen and van Leeuwen, 1996;Janssen et al., 1997;Segschneider et al., 2000].
All satellite altimetry missions measure surface water height in a similar way: a short energy pulse is transmitted which is then reflected by the water surface back to the sensor. If the position of the satellite is known, then the distance from the altimeter to the surface is proportional to the time delay between transmission and return of the reflected energy. Depending on the application, the choice of satellite altimeter is extremely important, and Table 1 lists the characteristics of the principal altimetry instruments. While all the principal altimetry instruments have ground footprints less than 1 km, water bodies need to be 2-3 times wider than the ground footprint for water elevations to be well sampled. Birkett et al. [2002] noted for TOPEX/Poseidon whose ground footprint is 600 m that it was only suitable for use with water bodies whose width was 1.5 km. The accuracy of each mission is obtained from studies by Frappart et al. [2006] and Urban et al. [2008], which looked at a similar area of the lower Tapajos River in the Amazon Basin. While Table 1 shows ICESat laser data to be the most accurate source of altimetry information with the smallest ground footprint, there are far more studies that have used radar altimetry, despite the latter's lower spatial precision and higher vertical error. This may be due, in part, to preprocessed radar altimetry data being freely available from either LEGOS or from ESA's River and Lakes website. Researchers wishing to use ICESat to derive water height information must download and process their own data set, and develop their own methodology for removing any spurious observations that may exist. A further reason that ICESat has not been widely used to date is that most hydrologists tend to think in terms of time series of regularly repeated measurements, which ICESat does not produce. Radar altimetry data have been collected at regular intervals and over a long period, which suits many hydrological applications. For example, the Envisat instrument has a 35 day repeat cycle, whereas the ICESat data are irregular in time. However, for a number of hydrological applications, for example the calibration and validation of hydrodynamic models, the use of accurate single point data can be extremely useful, even if those data are irregular in time .
To date studies have utilized ICESat measurements to look at changes in lake levels [Swenson and Wahr, 2009;Zhang et al., 2013;Phan et al., 2012;Wang et al., 2013] as a substitute for in situ data. A number of studies have also been undertaken that assess the accuracy of ICESat water level observations. For example, Zhang et al. [2013] compared ICESat water levels with ground measured water levels for 10 lakes and found R 2 5 0.86. Phan et al. [2012] compared Tibetan lakes' water level changes estimated by both ICESat and radar altimetry measurements and showed a trend in lake level of 10.108 m/yr from ICESat and 10.123 m/yr from the radar altimetry data.
Other studies have used ICESat to correct the datum levels of sites on the existing ground gauge network [Bourgoin et al., 2007;Hall et al., 2012]. Both these studies focused mainly on the Amazon Basin; however, Hall et al. [2012] applied their methodology to another large river, the Mississippi, to estimate the errors associated with ICESat. They found an absolute vertical height error of 19 cm. These errors are comparable to those found for Lake L eman where Baghdadi et al. [2011] looked at the suitability of ICESat elevation data for monitoring hydrological systems. While Baghdadi et al. found an average root mean square vertical height error of 15 cm for Lake L eman, the root mean square error for selected rivers in France was 1.15 m. While this error is larger than that found by Hall et al. [2012] and others, it should be noted that the width of many of the rivers in the study of Baghdadi et al. [2011] was close to the 70 m ground footprint of ICE-Sat. Baghdadi et al. [2011] stated that precision greater than 15 cm was possible for water bodies greater than 1.5 km wide.
O' Loughlin et al. [2013] used ICESat water levels to generate surface water slopes for the middle reach of the Congo River and found that water surface slopes in this basin varied spatially far more than previous studies using ground data sources had suggested. Neal et al. [2012] used ICESat data to calibrate a hydraulic model of the inland delta of the Niger River and used 127 observations over 18 locations. Neal et al., [2012] found that calibration by ICESat data resulted in only a 0.002% decrease in the downstream Nash-Sutcliffe efficiency compared with calibrating using ground gauge discharge, while at the same time improving the overall root mean square error in water levels of the modeled simulation. ICESat was extremely useful in these studies as few in situ measurements were available and other altimeters could not match the spatial coverage and accuracy of ICESat. While ICESat data have proven useful and sufficiently accurate for many types of hydrological and limnological studies, no freely available source of processed and quality-checked ICESat data currently exists. In this study, we therefore produce a global database of ICESat-derived inland water surface heights. We test a number of different methods for removing spurious observations caused by bank vegetation contamination and returns from land during low flows, and investigate what impact the spatial resolution of the water mask used within the processing chain has on water level estimates. This database of transect-averaged inland water spot heights (which we term the Inland Water Surface Heights database or IWSH) has been made freely available at http://data.bris.ac.uk/data/dataset/15hbqgewcrti51hmzp69bi4gky, along with the source codes that were used to process the data and an intermediate product, containing individual measurements with an indicator of whether an observation is in a water body and how many of its surrounding cells are also water.

Data and Methods
In this study, we create a database (IWSH) of inland water surface levels using data from the satellite laser altimeter ICESat (described below). To identify whether returns from the ICESat altimeter are over water, we use a globally available water mask. We then test four different methods for removing spurious or outlying observations.

ICESat
The ICESat Geoscience Laser Altimeter System (GLAS) was the first orbiting satellite laser altimeter and started its mission in 2003 and ended in 2009 [Abdalati et al., 2010]. As detailed in Table 1, ICESat GLAS had a footprint of approximately 70 m and made observations every 172 m along its track [Schutz et al., 2005]. These small footprints and along track observation distances makes ICESat GLAS far more attractive for measuring river and lake water levels than radar altimeters, such as TOPEX/Poseidon (T/P) or Envisat, as narrow rivers can be observed and bank vegetation contamination can be reduced. T/P has a footprint of roughly 1 km and an along track observation distance of approximately 596 m. ICESat data were obtained from the Reverb website (reverb.echo.nasa.gov), and for this project the GLA14 Land Elevation Product, Release 34, was used, which already includes Geodetic and atmospheric corrections [Zwally et al., 2014]. The data were extracted using the Interactive Data Language (IDL) code provided by the National Snow and Ice Data Centre (NSIDC). The extracted data were converted to the World Geodetic System of 1984 (WGS84). Suitable observations were selected by the use of the elevation-use flag, and the saturation index to remove and/or correct saturated observations using the same criteria employed by Hall et al., [2012] and O'Loughlin et al. [2013]. Observations with a saturation index of zero or one were used without correction; observations with an index of two were corrected for saturation; and observations with an index greater than two were excluded. The selected observations were then converted from WGS84 to the vertical datum Earth Gravitational Model of 1996 (EGM96) using the conversion tool, F477, available from the National Geospatial-Intelligence Agency (NGA). The EGM96 geoid was chosen, as it is the vertical datum used in the Shuttle Radar Topography Mission (SRTM) data and is still the most common geoid used in hydrodynamic modeling.

Water Masks
Two different water masks were used in this study, one global and one covering only the Congo Basin region. The global water mask is a combination of the SRTM Water Body Data (SWBD) and MODIS data. SWBD is a by-product of the SRTM digital elevation model in which ocean, lake, and river shorelines were identified and delineated. Rivers wider than 183 m and longer than 600 m are identified in the SWBD. The SWBD were converted to 3 arc sec (90 m at the equator) using the Geospatial Data Abstraction Library (GDAL) and are available between 60 N and 60 S. To ensure a global water mask the Global Land Cover Facility (GLCF) MODIS Water Mask database [Carroll et al., 2009] was used to fill in water bodies missing in the SWBD. This water mask has previously been used to create the Global Width Database for Large Rivers [Yamazaki et al., 2014]. The GLCF has a spatial resolution of 250 m. water masks to ensure that the use of the coarser global water mask was consistent with the higher resolution and accuracy regional data. This region of the water mask was also chosen as it contains a wide distribution of channel widths and many bifurcated rivers.

Outlier Removal Methods
Initially all selected observations were given two flags. The first flag, i_water, highlights if the observation corresponded with a water (1) or land (0) pixel in the water mask. The second flag, i_adj, indicates how many of the adjacent pixels were water or land, with eight indicating a pixel completely surrounded by water and zero indicating a pixel completely surrounded by land. Observations were then grouped into river transects based on time of observation and distance between any two adjacent observations. If the distance between any two adjacent, with the same time of observation, was greater than 18 arc sec (540 m at the equator) they were assigned to different transects. This value was determined by trial and error and is, of course, a compromise. However, this adjacent distance threshold enables the identification of observations for different branches of a river system while ensuring that small islands/sandbanks do not cause the creation of additional transects.
Once observations were grouped into transects, the mean and standard deviations were calculated for each transect and four different methods for removing spurious or outlying observations were tested. Spurious observations may be caused in a number of ways. These include small geolocation errors in either the ICESat observations or water masks, or falsely classified water pixels in the water masks. Either may lead to a water pixel being falsely classified as land or vice versa. Alternatively, water mask pixels may be falsely classified due to the dynamic nature of water levels combined with the timing of the image used in creating the water mask. If the timing of the ICESat observation is from a low water period and the water mask is from high water, the ICESat observation may actually be from terra firma, which may have a higher elevation than the surrounding observations from water. However, as only a static global water mask is currently available this is an unavoidable uncertainty. When a dynamic global water mask becomes available this will be corrected. The four different methods for removing spurious observations identified pixels to include in the database as follows: Method 1: Observations from all water pixels, i.e., i_water 5 1; Method 2: Observations only from water pixels completely surrounded by water pixels, i.e., i_water 5 1 and i_adj 5 8; Method 3: Observations from water pixels whose elevation values are less than the mean of the corresponding transect plus two standard deviations; and Method 4: Observations from water pixels whose elevation values were within two standard deviations of the mean of the corresponding transect.
Method 1 represents a control where all elevation values flagged as being water are used. Methods 2-4 then provide different ways of identifying outliers. Method 2 attempts to deal with the problem of outliers by assuming that most misclassifications will occur at the edges of water bodies. By only selecting pixels where i_water 5 1 and i_adj 5 8, the method creates a buffer around each water body and only uses pixels that lie away from the water bodies' edge. Methods 3 and 4 make the assumption that most water surfaces are approximately flat over horizontal length scales of a few ICESat pixels and then any elevation values that deviate significantly from this are spurious. All the methods tested necessarily assume that the water mask is correct and does not change during the period of the ICESat mission. However, this may not always be the case and we recommend caution in using these data for river systems which show significant morphological dynamics over the duration of the ICESat mission.

Comparison of Outlier Removal Method
For each of the outlier removal methods, a number of evaluation criteria were used. The first criterion is the number of distinct transects, as, with the exception of method 2, the methods should produce the same number of these. There should be fewer transects when method 2 is used as many single observations will be removed. The second criterion is the total number of observations. Method 1 is the most relaxed method and therefore should have the most observations, followed by method 3, method 4 and, finally, method 2.

10.1002/2015WR018237
The third criterion is the average number of observations per transect. This is a measure of how many outliers have been removed. As we assume that over short distances the water surface is flat and that errors are normally distributed, the error in the average transect elevation is inversely proportional to the square root of the number of observations per transect. The fourth and final criterion is the distribution of the mean-median bias for each transect, which gives a measure of the spread of the individual observations used to calculate the average for each transect. The smaller the spread the more confidence one can have that spurious or outlying observations were removed successfully. Figure 1 shows these results. Figure 1 clearly shows that Method 2 mostly reduces the spread between the median and mean observation height for each transect. Under the assumption that the water surface at a given time is flat over a short distance and that the errors would be normally distributed if outliers due to dry land were removed, this metric suggests that this is the best method for removing outliers of the four tested. For Method 2, nearly 93% of transects have a mean-median difference of 60.125 m, compared to only approximately 83% for the other three methods. While Method 2 results in a smaller number of unique transects and smaller total number of observations, it also results in a higher number of observations per transect (Figure 1). This higher number of observations per transect is due to the removal of transects with small numbers of observations. We conclude from this that Method 2 is the best algorithm for outlier removal.    number of observations: 68% of transects have fewer than five observations. Because of the small sample size, care needs to be taken in using these transects, although we have chosen to include them for completeness. We advise the user to manually check that these observations are consistent with local information.
The remaining 32% of transects, with five or more observations, cover most of the world's largest rivers and lakes whose widths are greater than 1000 m. This 32% is equivalent to 185,857 unique transects, which is a far greater number of unique transects than any other database of water surface heights.  Figure 2 shows the number of observations used to create each unique transect around the globe. As expected, few or no transects are observed in boreal (e.g., Greenland), desert (e.g., Sahara, Saudi Arabia), or high elevation (e.g., Tibetan Plateau) regions.

Comparison of Water Masks
To establish if the global water mask was suitable, we compared it with a higher resolution and theoretically more accurate Landsat-derived water mask for the Congo region. This Landsat water mask was created for high water [O'Loughlin et al., 2013] using the Normalized Difference Vegetation Index (NDVI) and is available at 30 m resolution compared with the 90 m resolution of the global water mask. Prior to the comparison of water masks, it was expected that the Landsat water mask would result in a larger number of transects and more observations than the global water mask due to its finer resolution. These expectations were corroborated by the analysis (Table 3), where there was an increase of between 6% and 33% in the number of transects. The finer resolution water mask (Landsat) also resulted in smaller variations in the number of transects across the four outlier removal methods compared with the global water mask, with a maximum change of 12.7% compared with 30.4%.
While the Landsat water mask resulted in more unique transects, it also resulted in a far larger number of single observation transects compared with the coarser global water mask (Figure 3, bottom). The difference in the number of observations per transects by water mask is only clearly visible for transects with one  or two observations. The use of the coarser spatial resolution global water mask also resulted in slightly less spread in the number of observations per transect compared with the finer resolution Landsat water mask.

Validation of ICESat-Derived Surface Water Spot Heights
To evaluate the vertical accuracy of the developed global database of water surface spots heights, we compared our transect-averaged spot heights, after applying each of the outlier methods, with USGS ground stage data within the Mississippi Basin. Out of the entire USGS database, we found 81 in situ stage measurement locations across the Mississippi, operated during the same period as the ICESat mission. When we investigated the distance between the stage levels and ICESat data, we set a 5 km cut off for our comparison, which resulted in 14 stage levels locations. The 14 stage levels (41 measurements), corrected from their local vertical datum to EGM96, were compared with the ICESat-derived spot heights.
We looked at how the accuracy of our data varied with distance from the stage levels. We found that the accuracy of our data, using any outlier removal method was consistent with the accuracies stated in previous studies [Bourgoin et al., 2007;Baghdadi et al., 2011;Hall et al., 2012]. Clearly errors increase the further an in situ gauge is from an ICESat transect given typical water surface slopes of decimetres per km; however, for the three measurements where ICESat transects were within 0.5 km of an in situ level the root mean square errors (RMSE) were 0.350 m (Method 1), 0.259 m (Method 2), and 0.296 m (Method 3 and 4). This changed to 0.350 m (Method 1), 0.246 (Method 2), and 0.276 (Method 3 and 4) at 1 km (7 measurements). At 5 km (41 measurements), the RMSE were 0.572 m (Method 1), 0.378 m (Method 2), and 0.377 m (Methods 3 and 4). From this analysis, it is again clear the Method 2 produced the best results.

Conclusions
The aim of this study was to develop a global database of water surface spot heights from ICESat satellite laser altimetry data. This database (IWSH) should be of high value for levelling in situ gauges [Hall et al., 2012], identifying river water surface slopes [O'Loughlin et al., 2013] and in the calibration and validation of hydraulic models [Neal et al., 2012], especially in areas where in situ measurements are sparse or difficult to obtain.
We tested four different methods for removing both outliers and bank vegetation contamination. We found that the best performing method (Method 2) removed any observations that are over land or are within one pixel of land. This method also resulted in the smallest mean-median difference, which further indicates that this is the best method for removing spurious or outlying observations. However, all methods tested assume that the river courses remain constant throughout the ICESat mission.
All four outlier removal methods were validated against in situ gauge measurement across the Mississippi Basin. We found that Method 2 outperformed the alternative methods tested and resulted in an RMSE of 0.259 m for transects within 0.5 km of in situ gauge locations, albeit from a small sample of three measurements. As expected given typical water surface slopes RMSE increased to 0.378 m when transects were within 5 km of an in situ gauge. These results are consistent with previous studies which reported errors in ICESat water surface elevation determination of between 0.1 and 1.15 m.
The impact of spatial resolution of the water mask was also investigated. We compared the global water mask to a finer Landsat-derived water mask for a region in the Congo Basin between [58N, 258S] and [158E, 2258E]. Results showed that while the finer resolution water mask increased the number of transects and observations, it also resulted in a large number of transects with only one or two observations. For the Congo region, the distributions of the mean-median biases for both water masks were similar, with the global water mask performing marginally better, indicating that either mask is useful. However, to date only the coarser water mask is available globally [e.g., Yamazaki et al., 2015]. There is also a clear need to develop a time-varying global water mask to overcome limitations associated with the static global masks currently available.