GEWEX Water Vapor Assessment: Validation of AIRS Tropospheric Humidity Profiles with Characterised Radiosonde Soundings

18 The tropospheric water vapor profile record from the Atmospheric Infrared Sounder (AIRS) 19 now spans over a decade, making it a valuable resource for climate studies. To be con20 sidered as a Climate Data Record (CDR) it is key that the ultimate performance of these 21 observations are understood. The GEWEX water vapor assessment (G-VAP) has been 22 tasked with characterising the current state of the art in water vapor products currently 23 available for climate analysis. Within the scope of this exercise, water vapor profiles from 24 AIRS have been assessed using collocated characterised in situ measurements of tropo25 spheric water vapor between 2007 and 2012. We first show how previously published meth26 ods for correcting radiosondes can be applied to global records, which show high corre27 lations to GCOS Reference Upper-Air Network (GRUAN) performance at pressures be28 tween the surface and 250 hPa. We go further and show the first comparison of uncer29 tainties from both the newly created Characterised Radiosonde Measurement (CRM) 30 and GRUAN data sets. Global estimates of AIRS water vapor profile (wet/dry) biases 31 relative to GRUAN and CRM are within 6±0.3 % ppmv and 15±0.1 % ppmv below 300 32 hPa respectively. The CRM record allows latitudinal analysis for the first time, which 33 when examined shows sensitivity to changes in absolute concentration due to large scale 34 circulation in the International Tropical Convergence Zone (ITCZ). This paper advances 35 the use of state-of-the-art in situ records for characterising absolute performance, recog36 nising that long term stability needs further research. 37

cesses extends over a wide range of spatial and temporal scales, from the global climate 48 to micrometeorology (Bevis et al., 1992). 49 In order to use satellite observations to capture these varying scales, assessment 50 of their performance against characterised in situ measurements (with well-defined un- this is often referred to as "halftop/halfbot". The number of trapezoids are dependent 152 on the parameter being retrieved (e.g. H 2 O=11), and therefore limits the independent 153 structure that can be resolved. For further information on the trapezoids, layers and lev-154 els in the AIRS L2 products please see E. T. Olsen et al. (2011). Therefore, to interpo-155 late between the AIRS 100 level retrieval grid and the 11 H 2 O layers the layer averag-156 ing kernel A is mapped by multiplying it with the trapezoid function and its pseudo-

167
This study employs two different radiosonde datasets to investigate the performance manuscript submitted to JGR: Atmospheres the temperature profile does not deviate from the observed value by more than 1 K be-184 low, and 2 K above 300 hPa (UKMO, 2006). This data record form the basis for the  acterised Radiosonde Measurement (CRM) Archive that is described in section 3.

211
Published correction methods (Miloshevich et al., 2004(Miloshevich et al., , 2009(Miloshevich et al., , 2006 are performed 212 on a standardised vertical temporal grid (6 second) representing the ascent of the bal-213 loon due to the time-dependent components to the sensor response. Global operational 214 radiosonde databases, however only report on a subset of pressure levels which exclude 215 -6-©2018 American Geophysical Union. All rights reserved. manuscript submitted to JGR: Atmospheres the relevant time information. Therefore, to interpolate onto a vertical time grid (or time 216 profile) the ascent rate of the balloon needs to be estimated.

217
The method employed for this study assumes a constant rate of ascent and deter-218 mines an uncertainty that is propagated through the correction method. To estimate the 219 constant rate of ascent observations from the GRUAN data archive were used as they 220 include ascent rate values. At the time of calculation the data record included 14,414 221 soundings, with over 80 million individual measurements. Examination of the ascent rate 222 data from GRUAN observations yields a mean value of 4.73 ms −1 ±0.86 ms −1 (Trent, 223 2015). Next the time interval (δt) for adjacent pressure levels is calculated by convert-224 ing the difference between levels P1 and P2 from hPa into m (δz). This δz value is then 225 divided by the ascent rate v z to yield δt. The time profile is then the cumulative sum 226 of the δt values. Once the low resolution time profile has been created, the maximum 227 (time) value is then used to create the appropriate number of levels for the 6 second ver-228 tical grid. The ascent time information is then used to interpolate the low resolution tem-229 perature, relative humidity (RH) and pressure profiles onto the 6 second grid.

230
This method was tested on soundings from the UKMO Cambourne site to inves-231 tigate the effects of interpolation on the lower resolution profiles.The Cambourne site 232 was chosen as; i) it was possible to obtain both low and high (temporal) resolution data 233 products, ii) RS92 model radiosondes are used, iii) not part of the GRUAN network and 234 iv) the site experiences a large variability of weather originating from the Atlantic ocean 235 due to its location of the site in the south-west of the UK. This last point was the most 236 significant reason for choosing Cambourne as the factors that have the biggest effect on 237 the ascent rate are atmospheric density and the coefficient of drag outside factor retain-238 ing to the actual balloon itself. To ensure that a linear interpolation routine was suffi-239 cient for this task, five other algorithms; i) cubic spline, ii) quadratic, iii) least squares 240 quadratic, iv) Akima (1970) and v) Akima (1991) were used on a test set of profiles mea-241 sured at Cambourne from 2009 (see Trent (2015) for further discussion).

242
The low resolution profiles were interpolated using each algorithm, corrected and 243 then compared back to the correct high resolution profile. The linear interpolation method 244 performed as well, and in some cases, better than the other algorithms. In general dif-245 ferences are less than 0.5% RH below 300 hPa, with an increase to between 1.2 -1.5% 246 RH nearer 200 hPa. This increase is due to the lower sampling in the operational sound-247 ing which can fail to capture the dehydration in the UTLS accurately. Therefore, the 248 limiting factor for the profile interpolation is the residual structure in the low resolution 249 profile.

250
The newly interpolated RH profile then has a time-lag correction applied to account 251 for the response delay of the radiosonde sensors (Miloshevich et al., 2004): where X = e − ∆t τ and ∆t = t − t 0 . Vaisala laboratory measurements of sensor 253 time-lag (τ ) which are given as a function of Temperature (T) are then used to calcu-254 late the corrected humidity (RH c ) from the measured humidity (RH). By fitting a sim-255 ple exponential this gives the relationship: where A and B are constants (-0.7399 and -0.07718 respectively) and the scale fac-257 tor α (0.8) corresponds to approximately two standard deviations below the mean. This 258 is an attempt to tackle this problem conservatively, meaning that 5% of sensors will be 259 over corrected and 95% will be slightly under corrected. This approach simplifies the method 260 -7-©2018 American Geophysical Union. All rights reserved.
where RH T LAG is the smoothed, time-lag corrected RH profile (RH c ). The correc-270 tion factor G(P, RH) is generally: where F(P, RH) is polynomial function fit to the RH profile: here a and P are pre-computed coefficients and N is the order of the fit. This step 273 accounts for mean calibration bias and for solar radiation error in daytime soundings, 274 and they are valid from the surface to about 18 km altitude (100 hPa daytime). uncertainty (±0.5% RH). Comparisons to microwave radiometer (MWR) also showed 281 a calibration (production) batch-dependent variability of ±2% relative to the long-term 282 mean. As this additional bias uncertainty is not representative of the majority of cases, 283 therefore an "expected value" is adopted which when combined with other components 284 gives an uncertainty estimate of bias of ±(3% + 0.5% RH pressure level must be within 1 hPa of the AIRS grid. The collated radiosonde profiles 302 are then analysed at each pressure level using orthogonal distance regression (ODR), as 303 it accounts for uncertainties in both variables (Boggs and Rogers (1990) 2. CRM has a mean offset bias relative to GRUAN of 2.72% between 1000-300 hPa, 310 which reduces to 1.34% at altitudes above 300 hPa. sity Estimation (KDE) method (Parzen, 1962;Rosenblatt, 1956): where f(U RH ) is the PDF for either CRM or GRUAN, U RH is the respective ra-318 diosonde uncertainty, n is the number of data points, K is the kernel function and h is 319 the bandwidth smoothing parameter. The KDE method is used here as it allows for the 320 PDF to be estimated and compared without being effected by any discontinuities in ei-321 ther distribution. For this study, a Gaussian kernel density function is used and the band-  Figure 3. PDF distribution of RH uncertainties for CRM (blue) and GRUAN (orange) for the lowest 12 standard AIRS pressure levels. Data is taken from common soundings found in both data bases that cover 4 sites. The Hellinger distance (DH), a measure of the similarity between the 2 PDFs is also displayed. At 250 hPa the 2 distributions start to diverge.
The Hellinger distance (Hellinger, 1909) is a metric used to measure the difference 326 between PDF distributions, and is a probabilistic analogue of the Euclidean distance.

327
A √ 2 is included to allow the distance values to range between 0 and 1. Where values 328 exceed 1, the two distributions are completely different or non-comparable.
where x o is the AIRS first guess profile,Ã is the averaging kernel that has been 364 interpolated onto the 100 level retrieval grid using the trapezoid functions, x t is high res-365 olution reference profile on the AIRS 100 level grid, and x est is the convolved reference Once the matched radiosonde profile has been convolved three quality flags from  3. Qual H2O is equal to 2. 381 4. AIRS retrieval uncertainty is greater than 25% ppmv.

382
-11-©2018 American Geophysical Union. All rights reserved. providing the first estimates of collocation uncertainty between these match-ups. All re-400 sults are reported as % of absolute concentrations of water vapor in ppmv.

401
The bias (∆q) between AIRS and the radiosonde water vapor profiles is defined 402 as: where q A and q R are the AIRS and convolved radiosonde water vapor profile mea-404 surement respectively. Next the modified z score (Iglewicz & Hoaglin, 1993) provides a 405 final filter that is applied to the data in order to identify outliers: where µ ∆q is the median AIRS layer difference and σ ∆q is the median absolute dif-407 ference (MAD) of the AIRS layer differences. If the minimum calculated z value is greater 408 than the theoretical maximum value N−1 √ N , where N is the number of data points, then 409 those points are considered non-Gaussian and a default bad data flag is assigned to that 410 layer. All points whose |z| score is greater than 3.5 (recommendation by Iglewicz and Hoaglin 411 (1993)) are removed. Both the measurements contribute a uncertainty for each point in 412 the profile, so the uncertainty of the bias U(δq) is calculated using classical error prop-413 agation methods such that: where U(q A ) and U(q R ) are the AIRS retrieved profile (Susskind,Blaisdell,& Iredell,415 2014) and characterised radiosonde profile uncertainty (Immler et al. (2010) in the case 416 of GRUAN) respectively. Finally any propagated uncertainty is calculated as the linear 417 combination of the average bias uncertainty and standard error:      will be key to ensure the robust assessment of long-term stability in satellite records.

522
A key advantage of using operational radiosonde networks is that they generate large 523 numbers of matches relative to collocations performed at climate networks like GRUAN.

524
With these larger numbers there is a higher confidence that the true bias is within the 525 uncertainty of the reported median bias. However, the calculated uncertainty only par-526 tially describes the calculated bias. As the collocation framework is imperfect the un-527 certainty arising from the match up process also needs to be assessed. Characterising to estimate the collocation uncertainty (σ): where m 1 and u 1 are the AIRS retrieved water vapor and retrieval uncertainty re-535 spectively, and m 2 and u 2 are the radiosonde measured water vapor and measurement 536 uncertainty respectively. The coverage factor, k is the interval about the mean value as

541
Cases are considered consistent where k is less or equal to 1 and in agreement when 542 k is less or equal to 2. Therefore, to calculate σ three assumptions are applied to the data: 543 1. The median bias is a robust estimate of the theoretical bias due to large number 544 of collocations, which are in the order of 10 4 to 10 5 matches for the majority of 545 bins.

546
-17-©2018 American Geophysical Union. All rights reserved. manuscript submitted to JGR: Atmospheres 2. The median bias is also agrees to some degree within the uncertainty, i.e. k ¡ 2. 547 3. The uncertainties u 1 and u 2 correctly and fully describe the uncertainties of the 548 two data records.

549
Firstly equation 15 can be rewritten as: where δm = |m 1 − m 2 | and U δm = u 1 2 + u 2 2 1 2 . Next equation 16 is then rear-551 ranged to make σ the subject: The collocation criteria for this study uses a basic set of parameters to match AIRS 553 retrievals with CRM soundings. These can lead to large uncertainties in the collocation 554 especially in regions of high variability, or by where the collocation is sampling completely 555 different areas. Through averaging large numbers of matches the collocation uncertainty 556 will reduce in the same manner as the uncertainty on the bias. In this study we then make 557 the assumption that for the yearly statistics to be valid estimates for AIRS performance 558 then they must also be either consistent (k = 1) or within agreement (k = 2) (Immler 559 et al., 2010). Therefore, by substituting the values of 1 and 2 for k, σ can then be cal-560 culated. These results are shown in Figure 6 with the calculated collocation uncertainty 561 assuming k is equal to 1 on the left-hand-side and for when k is equal to 2 on the right- i) a significant increase in σ k due to a reduction in signal-to-noise, or ii) a higher frequency 591 in missing data relative to the 100 km and ±3 hour window. In the Northern Hemisphere, 592 the lower signal-to-noise causes σ k to increase by less than 1 % ppmv on average for both 593 k hypothesises. For match-ups below 60 • S, there is a average reduction in σ k between 594 1-2 % ppmv for most years. Collocations in this region come from Antarctica stations, 595 which are mainly sited near the coast (Figure 2). This tighter match-up criteria acts to 596 remove the majority of AIRS soundings over the Southern Ocean, which were inflating 597 the collocation criteria.

598
Ensuring that both the reference measurement, which is made over 90-120 minutes, manuscript submitted to JGR: Atmospheres are accurate, one has higher performance (low bias) and the other provides greater lat-617 itudinal distribution. Unique to this study we use the uncertainties to help characterise 618 the performance of AIRS water vapor biases, and apply the AIRS averaging kernels to 619 the radiosonde profiles. Commonly used in trace gas validation, the application of av-620 eraging kernels in the convolution of a reference profile allows for like-for-like compar-621 isons of the collocated AIRS water vapor observations.

622
In an initial study by Hagan et al. (2004), balloon soundings and aircraft measure-623 ments were used to look at AIRS version 3 retrieved water between 500-100 hPa. These 624 first results found that AIRS had RMSE values of 25% or better for closely matched ob-625 servations (within 1 hour). This result was a significant improvement over the beta val-626 idation estimate of 50.5% (E. Fetzer et al., 2003) which used operational radiosondes.

627
Studies using AIRS version 4 water vapor retrievals found profile biases were better than 628 ±15% when compared to a 2-year study with operational radiosondes (Divakarla et al.,629 2006). However, comparisons at dedicated sites from the study by Tobin et al. (2006) 630 showed that profile biases were within ±5% below 400 hPa, with an increasing dry bias 631 between 400-200 hPa of -10%. These biases were reported to be largely independent of 632 cloud fraction. Our analysis of AIRS version 6 water vapor profiles with 6 years of GRUAN 633 radiosondes shows improved performance relative to these studies. In all-sky conditions 634 biases are below ±1.5 % ppmv between the surface and 600 hPa, while in clear-sky con-635 ditions (cloud fraction <1%) there is an observed increased wet bias relative to GRUAN.

636
This behaviour is also observed with matches at CRM sites, though it is more pronounced 637 with clear-sky wet biases exceeding 20 % ppmv. It could be thought that because CRM 638 has denser sampling of very wet atmospheres that this would be responsible for the in-639 creased wet bias. However, analysis of AIRS performance in extreme wet and dry atmo-640 spheres in Table 2 does not suggest this as the sole reason as:  3. For the upper troposphere, CRM shows AIRS to be predominately wet-biased for 648 both high and low TCWV thresholds, while the inverse is true for GRUAN.

649
Therefore, this is suggestive that regions where TCWV amounts are between 5-50 kg m −2 650 the differences in regime variability influence the larger observed bias values. The diur-651 nal split for all scenarios in Table 2 confirms this with daytime matches showing wet-652 ter biases compared to night-time matches. Another point to consider here is that CRM 653 also under-samples the vertical H 2 O profile relative to GRUAN, which could also be in-654 troducing a bias into the comparisons.    The value of systematic differences can be enhanced if the total difference can be 748 broken down into individual components. In the characterisation of the RS92 archive we 749 also compare the vertical distribution of CRM uncertainty with those from GRUAN. One 750 apparent difference is the inability of CRM to fully capture the random uncertainty com-751 ponent which could allow for collocated profiles that should be excluded during the con-752 sistency test to be included. A large variety of such structural uncertainties can contribute 753 to the total bias (see Kummerow, Schulz, and Bojkov (2011) for a brief discussion).

754
The AIRS version 6 water vapor product is an example of data which is based on 755 the combination of hyper-spectral and microwave observations. Similar retrievals exist 756 for IASI and The Cross-track Infrared Sounder (CrIS), though they have not been re-757 processed consistently until now. Incorporating observations from companion instruments 758 such as AMSU and MODIS to characterise elements such as cloud fraction or surface mois-759 ture content would aid in constraining AIRS retrievals and further improving performance.