Testing the daily PRISM air temperature model on semiarid mountain slopes
Abstract
Studies in mountainous terrain related to ecology and hydrology often use interpolated climate products because of a lack of local observations. One data set frequently used to develop plot-to-watershed-scale climatologies is PRISM (Parameter-elevation Regression on Independent Slopes Model) temperature. Benefits of this approach include geographically weighted station observations and topographic positioning modifiers, which become important factors for predicting temperature in complex topography. Because of the paucity of long-term climate records in mountain environments, validation of PRISM algorithms across diverse regions remains challenging, with end users instead relying on atmospheric relationships derived in sometimes distant geographic settings. Presented here are results from testing PRISM against observations of daily maximum (TMAX) and minimum (TMIN) temperature at 16 sites in the Walker Basin, California-Nevada, located on open woodland slopes ranging from 1967 to 3111 m in elevation. Individual site mean absolute error varied from 1.1 to 3.7°C, with better performance in summer than in winter. We observed a consistent cool bias in TMIN for all seasons across all sites, with cool bias in TMAX varying with season. Model error for TMIN was associated with elevation, whereas model error for TMAX was associated with topographic radiative indices (solar exposure and heat loading). These results demonstrate that temperature conditions across mountain woodland slopes are more heterogeneous than interpolated models (such as PRISM) predict, that drivers of these differences are complex and localized in nature, and that scientific application of atmospheric/climate models in mountains requires additional attention to model assumptions and source data.
Key Points
- In situ testing of a popular gridded air temperature model in a semiarid watershed revealed high accuracy but also systematic biases
- Model accuracy increased by moving from 4 km to 800 m grid size, but further downscaling did not improve results
- Biases in daily temperature extremes are linked to siting differences between in situ and model source data, and topoclimatic mechanisms
Plain Language Summary
Knowledge of daily-to-seasonal climate (such as air temperature) in mountain areas is important for assessment of landscape conditions related to plants, animals, and resources such as water supply. Because few actual observations of climate processes exist in mountains, scientists have developed models to estimate parameters like temperature across landscapes. In this paper we test one commonly used spatial temperature model using observations and report the model error as well as influential factors. Our conclusions state that while for some science and management uses the model differences from observations are inconsequential, improper application of the model in other contexts without local verification or consideration of assumptions would lead to incorrect results. We also show that the location of long-term monitoring stations in mountain landscapes likely impacts model accuracy more than differences in network instrumentation practices. Therefore, scientists or managers seeking to leverage such models of temperature to make decisions need to be aware of both the representation of source data and assumptions made during the modeling process. This study underscores the need for additional long-term monitoring of climate processes in mountain areas given the importance of such regions to society in terms of resources and value.
1 Introduction
Atmospheric conditions in mountainous terrain at ecohydrologically important scales (e.g., subkilometer) remain challenging to both observe and estimate. Scientific disciplines ranging from snow hydrology to bioecology to palaeoclimatology struggle with obtaining accurate data with reasonable estimates of uncertainty on the most basic of climatic parameters in complex topography [Lundquist and Cayan, 2007; Fridley, 2009; Dobrowski, 2011; Graae et al., 2012; Stoklosa et al., 2014], which could have profound effects on interpretation of large-scale cause and local-scale effect [Lookingbill and Urban, 2005; Minder et al., 2010; Warren et al., 2014; Oyler et al., 2015]. The importance of research on mountain processes continues to be recognized as crucial for resiliency of socio-ecological systems on a global scale [Messerli and Ives, 1997; Viviroli and Weingartner, 2004; Foley et al., 2005; Gurung et al., 2012], and therefore, scientific investigators and policymakers will use whichever data are made available in an information-poor environment.
Gridded products extrapolating point observations of meteorological parameters to landscape scales have advanced in recent years [Brohan et al., 2006; Daly et al., 2008; Haylock et al., 2008; Thornton et al., 2012]. Regardless of model sophistication, the accuracy of these modeling efforts varies with density and quality of source data networks [Hamlet and Lettenmaier, 2005; Daly, 2006; Hofstra et al., 2009; McEvoy et al., 2014; Stoklosa et al., 2014; Oyler et al., 2015], and it is in mountainous topographic regions that observational data are the most scarce [Palecki and Groisman, 2011]. Source data for models are typically derived from valley-situated ground stations and ridgetop or upper air data, leaving the category of mountain slopes poorly represented in observational data sets. Temporal and spatial resolutions of climate models are increasing [e.g., Fowler et al., 2007; Skamarock et al., 2008; Feser et al., 2011], but newer ground networks that add calibration and verification data are generally coarse in spatial resolution [e.g., Leeper et al., 2015]. It is not uncommon to see analyses leveraging gridded climate data sets for process modeling or climate change impact prediction, and yet not including verification observations and resulting error estimates. In situ measurements of climate parameters such as air temperature are critical inputs to this process, as remote-sensing-derived estimates of near-surface (e.g., 2 m) air temperature still contain significant sources of error [Kalma et al., 2008; Hengl et al., 2012]. Validation of interpolated climate products within mountainous regions is therefore an important scientific activity, in order to inform users on model quality/accuracy as well as provide feedback to the modelers themselves on the performance of their products in varying geographical and seasonal settings [Minder et al., 2010; McGuire et al., 2012; Holden et al., 2015].
Here we present a case study of topographic temperature model testing in complex topography in the Great Basin region of North America, whereby the daily time step PRISM (Parameter-elevation Regression on Independent Slopes Model) [Daly et al., 2008] temperature product was evaluated for performance across a range of middle to high elevations on homogeneous topographic slopes at spatially distributed sites at the large watershed scale. Daily maximum (TMAX) and minimum (TMIN) air temperature estimations from PRISM at four spatial scales, i.e., 4 km grid, 30 arcsec (~800 m) grid, downscaled 3 arcsec (~80 m) grid, and point interpolations from surrounding grid cells, are compared to in situ observations made with air temperature sensors on 16 mountain sites. Results of the comparison are examined for possible instrumental, topographic, or mechanistic sources of differences. Impacts of model bias on example applications are also presented, with discussion on the challenges of modeling near-surface mountain meteorology.
2 Study Geography
Sixteen monitoring sites in mountainous topography are located within the Walker River Basin, a large (10,200 km²) semiarid watershed in the western United States. This watershed is considered to lie in the climate transition zone between the Sierra Nevada and Great Basin Desert ecoregions of North America (Figure 1). Several contemporary scientific, conservation, and policy-related projects in the watershed integrate PRISM data as part of their efforts [Lopes and Allander, 2009; Millar and Westfall, 2010; Mejia et al., 2012; Knick et al., 2013; Saito et al., 2014; Hatchett et al., 2015].

The monitoring sites are associated with an ongoing palaeoclimate study using upper and lower treeline species (Pinus flexilis and Pinus monophylla), ranging from 1967 to 3111 m in elevation and generally co-located in opposite-aspect pairs. Sites are distributed across four mountain ranges, ranging from west to east (Figure 1), and represent a spatial gradient from Eastern Sierra to Great Basin ecosystems. Topographic positioning of the sites varies (Table 1), but, in general, the locations were designed to represent homogeneous slope features rather than peaks, ridgetops, saddles, gullies, or canyon floors. In this way, the observations are specifically targeted at general air conditions that are free of influence from local cold-air pools and topographically enhanced wind velocity. Thus, the primary drivers of air temperature on the study sites are radiative processes, larger-scale airflow, and local lapse rates. While it is true that other processes such as cold-air pooling, windiness, and snow presence have dramatic and important local effects, proper estimation of general air conditions across topography is the first step to being able to correctly model air temperature behavior at the watershed scale. We have therefore optimized the locations of in situ observations to focus on response of temperature to larger-scale topographic characteristics such as elevation, slope, and aspect, which represent the first order of variability in mountainous terrain.
Table 1. Site Characteristics and PRISM800 Error Statistics, Daily Values October 2013 to September 2015

Site | Elev (m) | Slope (deg) | Aspect (deg) | DAH Index | TMAX Bias (°C) | TMAX MAE (°C) | TMAX SD (°C) | TMAX r2 (p < 0.0001) | TMIN Bias (°C) | TMIN MAE (°C) | TMIN SD (°C) | TMIN r2 (p < 0.0001) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Lucky.N | 2480 | 21 | 325 | −0.20 | −1.37 | 1.83 | 1.78 | 0.97 | −1.09 | 1.78 | 1.93 | 0.94 |
Lucky.S | 2497 | 30 | 136 | 0.18 | −3.45 | 3.49 | 1.54 | 0.97 | −2.95 | 2.99 | 1.74 | 0.96 |
DevGate.N | 2378 | 39 | 352 | −0.52 | 0.70 | 1.15 | 1.30 | 0.98 | −2.15 | 2.21 | 1.30 | 0.97 |
DevGate.S | 2360 | 27 | 200 | 0.43 | −3.47 | 3.49 | 1.31 | 0.97 | −3.28 | 3.30 | 1.42 | 0.96 |
CoreyLow.N | 2977 | 32 | 329 | −0.31 | 0.78 | 1.68 | 1.96 | 0.96 | −1.24 | 1.53 | 1.43 | 0.97 |
Corey.N | 3104 | 34 | 307 | −0.14 | 1.07 | 1.86 | 2.07 | 0.96 | −0.27 | 1.22 | 1.49 | 0.97 |
Corey.S | 3111 | 22 | 101 | −0.07 | 0.95 | 1.42 | 1.46 | 0.97 | −0.85 | 1.50 | 1.66 | 0.96 |
LWalker.N | 2452 | 16 | 304 | −0.04 | −1.81 | 1.98 | 1.39 | 0.97 | −0.91 | 1.14 | 1.30 | 0.96 |
Silverado.S | 2897 | 26 | 157 | 0.30 | −1.20 | 1.49 | 1.30 | 0.97 | −2.67 | 2.80 | 1.83 | 0.94 |
Silverado.N | 2937 | 31 | 270 | 0.20 | −1.14 | 1.62 | 1.68 | 0.96 | −1.99 | 2.13 | 1.69 | 0.94 |
PineGrove.N | 2355 | 18 | 347 | −0.24 | −0.26 | 1.65 | 2.07 | 0.95 | −2.58 | 2.65 | 1.68 | 0.95 |
PineGrove.S | 2371 | 17 | 226 | 0.27 | −3.19 | 3.32 | 1.86 | 0.96 | −2.17 | 2.27 | 1.57 | 0.96 |
Kavenaugh.N | 3000 | 37 | 310 | −0.17 | 1.09 | 1.66 | 1.71 | 0.95 | −1.29 | 2.07 | 2.17 | 0.91 |
Lundy.S | 2911 | 33 | 212 | 0.51 | −0.33 | 1.49 | 1.77 | 0.94 | −1.27 | 1.86 | 2.00 | 0.92 |
WalkerCyn.N | 2036 | 45 | 12 | −0.67 | 1.69 | 1.82 | 1.68 | 0.97 | −3.65 | 3.70 | 1.99 | 0.94 |
WalkerCyn.S | 1967 | 38 | 188 | 0.57 | −2.13 | 2.38 | 1.84 | 0.95 | −2.88 | 2.93 | 1.72 | 0.95 |
Overall mean | 2614 | 29 | 236 | 0.01 | −0.75 | 2.02 | 1.67 | 0.96 | −1.95 | 2.25 | 1.68 | 0.95 |
3 Data
3.1 PRISM Temperature Estimates
Grids of daily TMAX and TMIN estimates for the Walker River Basin were obtained from the PRISM AN81d data set [Parameter-elevation Regression on Independent Slopes Model (PRISM) Climate Group, 2016]. This data set spans 1981–present and covers the conterminous United States at 2.5 arcmin (~4 km) and 30 arcsec (~800 m) resolutions. Mapping of daily TMAX and TMIN was performed using the PRISM modeling system [Daly et al., 1994, 2002, 2003, 2008]. For each grid cell each day, PRISM calculated a local linear regression function between station temperature and a predictor grid (see below). Nearby stations entering the regression were assigned weights based primarily on the physiographic similarity of the station to the grid cell. In addition to distance and elevation differences between station and grid cell, physiographic factors accounted for included the level of the approximate wintertime inversion (if any) and a topographic index, which is a measure of local topographic position (i.e., the station's elevation relative to the surrounding terrain). Detailed descriptions of the PRISM model algorithms, structure, input grids, and operation are given in Daly et al. [2002, 2008]. Specific methods relevant to this study are described below.
The PRISM AN81d (http://prism.oregonstate.edu/documents/PRISM_datasets.pdf) temperature time series was developed using climatologically aided interpolation (CAI). CAI uses an existing climate grid to improve the interpolation of another climate element for which data may be sparse or intermittent in time [Willmott and Robeson, 1995; Funk et al., 2003; Hamlet and Lettenmaier, 2005; Daly, 2006; Daly et al., 2012, 2015]. This method relies on the assumption that local spatial patterns of the climate element being interpolated closely resemble those of the existing climate grid (called the predictor grid). The use of CAI in mapping daily temperature for the AN81d data set involved using existing PRISM 1981–2010 monthly TMAX and TMIN normals as the predictor grids [PRISM Climate Group, 2016]. For example, interpolating TMAX for 5 January 2014 involved using the January 1981–2010 normal TMAX grid as the independent variable in the local PRISM regression function for a grid cell and nearby station values of 5 January 2014 TMAX as the dependent variable. Details on normals mapping methods are available from Daly et al. [2008].
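The CAI step described above can be illustrated with a minimal sketch of a weighted linear regression between station values on the target day and the normal (predictor) grid sampled at those stations. The station values and weights below are hypothetical, and the function names are illustrative only; PRISM derives its weights from distance, elevation, and other physiographic factors, which are not reproduced here.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    if sw <= 0:
        raise ValueError("weights must sum to a positive value")
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("predictor values are all identical")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cai_estimate(station_normals, station_daily, station_weights, cell_normal):
    """Predict the daily value at a grid cell from its normal (predictor) value."""
    intercept, slope = weighted_linear_fit(
        station_normals, station_daily, station_weights
    )
    return intercept + slope * cell_normal


# Hypothetical inputs: January normal TMAX at four stations (deg C), the same
# stations' TMAX on the target day, and illustrative (made-up) weights.
normals = [8.0, 5.0, 2.0, -1.0]
daily = [10.0, 7.0, 4.0, 1.0]  # every station is 2 deg C above its normal
weights = [1.0, 0.8, 0.5, 0.2]

estimate = cai_estimate(normals, daily, weights, cell_normal=3.5)
```

Because every hypothetical station is 2°C warmer than its normal, the fitted line has a slope of 1 and the grid cell estimate is its own normal plus 2°C, which is the behavior CAI relies on when the daily pattern resembles the climatological one.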
Station temperature data in and around the Walker River Basin used in the PRISM AN81d data sets (Figure 1) came primarily from four networks: (1) U.S. Department of Agriculture (USDA) Natural Resources Conservation Service SNOw TELemetry (SNOTEL; http://www.wcc.nrcs.usda.gov/snow/), (2) National Weather Service (NWS) Cooperative Observer Program (COOP; http://cdo.ncdc.noaa.gov/CDO/cdo/), (3) NWS Automated Surface Observing System (ftp://ftp.ncdc.noaa.gov/pub/data/noaa/), and (4) USDA Forest Service and Bureau of Land Management Remote Automatic Weather Stations (RAWS; http://www.raws.dri.edu). Station data were screened for adherence to a “PRISM day” criterion. A PRISM day is defined as 1200 UTC–1200 UTC (e.g., 4 A.M.–4 A.M. PST). Once-per-day observation times, such as those from NWS COOP stations, must fall within ±4 h of the PRISM day to be included in the AN81d TMAX and TMIN data sets. The data set uses a day-ending naming convention; e.g., a day ending at 1200 UTC on 1 January is labeled 1 January. The PRISM day definition was established to match the “hydrologic day,” which is a standard in river modeling, and to align with the observation times of most once-per-day observers (e.g., NWS COOP), which are in the morning.
3.2 In Situ Observations
At each site, observations of daily TMAX and TMIN (in °C) for 23 months (1 October 2013 to 1 September 2015) were made using Maxim iButton DS1922L thermochron dataloggers placed at approximately 1.7 m height above ground level in the vegetation canopy interspace. These data are completely independent of PRISM and were not used in model generation. The iButtons were configured using Maxim 1-Wire software to log their case temperature every 60 min, and the data were subsequently processed to extract the daily high and low temperature. Real-Time-Clock (RTC) drift of each iButton was evaluated at the end of the collection period to ensure that cumulative RTC error did not exceed 50% of the observation interval. In order to replicate standard weather station temperature measurements as closely as possible with the sensor deployments, the iButtons were placed inside six-plate Gill-type radiation shields using nonconductive mounting holders that mimicked typical temperature probe head positioning. During the 2013–2015 interval, wintertime bias of observed temperatures due to snow presence or interference with sensors at 1.7 m heights was minimized by record-setting low snowpack levels across the region [Swain, 2015], verified on the study sites with ground-level iButtons monitoring snow presence qualitatively. A subset of these observations is being continued through longer-term study.
Because this study focused on daily extremes, the time alignment of PRISM with the observation data was checked using two different schemes: (1) “PRISM Day,” in which daily TMAX and TMIN are taken from observations spanning 1200 UTC on day −1 to 1200 UTC on day 0 (0400 PST on day −1 to 0400 PST on day 0), and (2) “Local Day,” which spans 0800 UTC on day 0 to 0800 UTC on day +1 (0000 PST on day 0 to 0000 PST on day +1) and lags the PRISM day by one, such that daily TMAX and TMIN from in situ observations over the interval 0000 PST on day 0 to 0000 PST on day +1 were aligned with PRISM daily TMAX and TMIN from 0400 PST on day 0 to 0400 PST on day +1. It was found that statistical comparisons using the “Local Day” scheme were slightly improved, likely due to PRISM's use of local midnight-to-midnight SNOTEL TMAX and TMIN data, which fall within the ±4 h PRISM day window. Therefore, the results shown are derived from Local Day time alignment rather than PRISM Day.
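The two alignment schemes can be sketched as a grouping of hourly readings into 24 h windows that start at different UTC hours. This is a minimal illustration, assuming timestamps in UTC and half-open windows (a reading exactly on a boundary belongs to the window it starts); the function name and the synthetic data are hypothetical.

```python
from datetime import date, datetime, timedelta


def daily_extremes(records, day_start_utc_hour):
    """Group (utc_datetime, temp_c) readings into 24 h windows beginning at
    day_start_utc_hour; return {window_start_date: (tmin, tmax)}."""
    windows = {}
    for when, temp in records:
        # Shifting by the start hour makes each window fall on one calendar date.
        key = (when - timedelta(hours=day_start_utc_hour)).date()
        windows.setdefault(key, []).append(temp)
    return {day: (min(vals), max(vals)) for day, vals in windows.items()}


# Synthetic hourly series: the "temperature" is just the hour index, so the
# contents of each window are easy to verify by eye.
start = datetime(2014, 1, 1, 0)
records = [(start + timedelta(hours=i), float(i)) for i in range(72)]

# PRISM Day: windows start at 1200 UTC (0400 PST).
prism_by_start = daily_extremes(records, 12)
# Day-ending naming: a window that ends on day D is labeled D.
prism_labeled = {d + timedelta(days=1): v for d, v in prism_by_start.items()}

# Local Day: windows start at 0800 UTC (0000 PST), keyed by the local date.
local_by_date = daily_extremes(records, 8)
```

Under these assumptions, the Local Day for a given local date pairs with the PRISM window that starts 4 h later on the same date, which carries the following day's label under the day-ending convention.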
4 Test and Analysis Methods
Our evaluation of PRISM temperatures in this context followed three general steps: (1) comparison of PRISM to observations, (2) investigation of potential error sources for both PRISM and the observations, and (3) evaluation of impacts of model departures on common hydrometeorological variables.
4.1 PRISM: Observation Comparison
Differences between model estimates and observations were initially calculated at the primary (i.e., 30 arcsec) scale of interest (PRISM800-Obs) for daily TMAX and TMIN, providing a first-order look at model performance.
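The comparison statistics reported in Table 1 can be sketched as follows. This is an illustration under stated assumptions: differences are taken as model minus observation, SD is assumed to be the sample standard deviation of those differences, and r2 is assumed to be the squared Pearson correlation between model and observations. The input values are made up.

```python
from math import sqrt


def error_stats(model, obs):
    """Return bias, MAE, SD of differences, and r2 for paired daily values."""
    n = len(obs)
    if n != len(model) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    diffs = [m - o for m, o in zip(model, obs)]  # model minus observation
    bias = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    sd = sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))

    m_bar, o_bar = sum(model) / n, sum(obs) / n
    smm = sum((m - m_bar) ** 2 for m in model)
    soo = sum((o - o_bar) ** 2 for o in obs)
    smo = sum((m - m_bar) * (o - o_bar) for m, o in zip(model, obs))
    if smm == 0 or soo == 0:
        raise ValueError("r2 is undefined for a constant series")
    r2 = smo ** 2 / (smm * soo)
    return {"bias": bias, "mae": mae, "sd": sd, "r2": r2}


# Hypothetical daily TMAX values (deg C) for four days at one site.
prism_tmax = [10.0, 12.0, 15.0, 11.0]
observed_tmax = [11.0, 14.0, 15.5, 13.0]
stats = error_stats(prism_tmax, observed_tmax)
```

A negative bias here corresponds to the model being cooler than the observations, matching the sign convention of a "cool bias" in the text.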
4.2 Potential Error Sources
In order to better interpret the PRISM-Observations test, we investigated the likely sources of error present in the comparison: (1) observational methods, (2) model scale, and (3) bias in the model driven by topography and mountain meteorology given test site and source data locations.
4.2.1 Accuracy of Deployed Sensors
For long-term testing/evaluation purposes of the observational method used in this study, an identical iButton/shield combination was placed on a scientific-grade weather station located at 2600 m elevation in the center of the study watershed. The baseline instruments included a Campbell Scientific CR3000 data logger, an HMP-60 temperature and relative humidity probe, and a Type-T thermocouple installed in identical shielding and configured to Western Regional Climate Center and World Meteorological Organization specifications [World Meteorological Organization, 2008].
4.2.2 Model Scaling Test
Because most of the sites are located on steep slopes containing rapid changes in elevation over short distances (Slope; Table 1), we tested the influence of model scale on error. Estimates of daily TMAX and TMIN (in °C) were obtained from PRISM at three additional spatial scales. Besides the native 30 arcsec grid (~800 m; PRISM800), an upscaled 2.5 arcmin (~4 km; PRISM4K) grid was used, as well as point-interpolated values (PRISMPOINT) that used a two-dimensional spatial weighting of the surrounding PRISM800 grid cells. In addition, the 30 arcsec PRISM model was downscaled to 3 arcsec (~80 m; PRISM80) for the study area. This downscaling was performed at each grid cell by: (1) calculating a local lapse rate between elevation and the 30 arcsec gridded values and (2) applying this lapse rate to a 3 arcsec digital elevation model (DEM).
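The two-step downscaling can be sketched as below. Several details are assumptions made for illustration: the local lapse rate is fit by ordinary least squares over a neighborhood of 30 arcsec cells, and it is applied as an elevation-dependent offset from the coarse cell value. The neighborhood size, function names, and numbers are hypothetical.

```python
def local_lapse_rate(elevations_m, temps_c):
    """Least-squares slope (deg C per m) of temperature against elevation."""
    n = len(elevations_m)
    if n != len(temps_c) or n < 2:
        raise ValueError("need at least two paired elevation/temperature values")
    z_bar = sum(elevations_m) / n
    t_bar = sum(temps_c) / n
    szz = sum((z - z_bar) ** 2 for z in elevations_m)
    if szz == 0:
        raise ValueError("elevations are all identical")
    szt = sum((z - z_bar) * (t - t_bar) for z, t in zip(elevations_m, temps_c))
    return szt / szz


def downscale_cell(coarse_temp_c, coarse_elev_m, fine_elevs_m, lapse_c_per_m):
    """Adjust the coarse cell value to each fine DEM elevation inside it."""
    return [coarse_temp_c + lapse_c_per_m * (z - coarse_elev_m) for z in fine_elevs_m]


# Hypothetical neighborhood of 30 arcsec cells: elevation (m) and TMAX (deg C).
elevs = [2000.0, 2100.0, 2200.0, 2300.0, 2400.0]
temps = [20.0 - 0.0065 * (z - 2000.0) for z in elevs]

lapse = local_lapse_rate(elevs, temps)
# Three hypothetical 3 arcsec DEM elevations inside the central coarse cell.
fine = downscale_cell(temps[2], elevs[2], [2150.0, 2200.0, 2250.0], lapse)
```

In this form the downscaled values only redistribute the coarse estimate along the elevation gradient, so any bias shared by the neighboring 30 arcsec cells carries through unchanged.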
4.2.3 Topographic Relationships to Error

Variable | Description | TMAX r2 | TMAX p | TMIN r2 | TMIN p | Reference |
---|---|---|---|---|---|---|
Elev | Elevation of 10 m DEM | 0.170 | 0.294 | 0.872 | 0.001 | |
Slope | Slope steepness | 0.401 | 0.042 | 0.121 | 0.452 | |
Asp | Aspect, degrees | 0.026 | 0.837 | 0.283 | 0.107 | |
TPI | Topographic position index | 0.535 | 0.002 | 0.106 | 0.490 | [Guisan et al., 1999] |
TRI | Terrain ruggedness index | 0.418 | 0.034 | 0.119 | 0.433 | [Riley et al., 1999] |
DAH | Diurnal anisotropic heating index | 0.783 | 0.001 | 0.021 | 0.871 | [Böhner and Selige, 2006; Böhner and Antonić, 2009] |
StdH | Standardized height | 0.168 | 0.300 | 0.491 | 0.014 | |
SlopH | Slope height | 0.170 | 0.299 | 0.349 | 0.055 | |
NormH | Normalized height | 0.303 | 0.103 | 0.192 | 0.230 | |
MSP | Midslope position | 0.333 | 0.074 | 0.280 | 0.114 | |
VllyD | Valley depth | 0.522 | 0.010 | 0.014 | 0.910 | |
TRASP | Topographic radiative aspect index | 0.818 | 0.001 | 0.039 | 0.754 | [Roberts and Cooper, 1989] |
- a Variables of primary (yellow) and secondary (green) significance are highlighted.
In an analysis similar to the topoclimatic work in Bunn et al. [2011], we investigated the influence of these nonindependent spatio-topographic variables on model departures from observations. A complementary two-step process using independent cluster and ordination analyses was performed on the daily PRISM800 bias time series. First, hierarchical clustering with the Ward2 minimal-variance implementation in multidimensional Euclidean space [Murtagh and Legendre, 2014] was used to group sites by similar error characteristics. Two distinct cluster groups existed for both daily TMAX and TMIN errors (each with different site subsets) based on evaluation of silhouette graphs generated using the same Euclidean distance calculations [Rousseeuw, 1987]. In the second step, we plotted each site's TMAX and TMIN error characteristics in two-dimensional ordination space by using Non-Metric Multidimensional Scaling (NMDS) [Oksanen et al., 2015].
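The silhouette evaluation of a candidate grouping can be sketched as follows. This illustrates only the silhouette width with Euclidean distances; it does not implement Ward clustering, and the points stand in for each site's multidimensional error series. All names and values are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean_silhouette(points, labels):
    """Mean silhouette width of a labeling, using Euclidean distance."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Four hypothetical sites described by two error features each.
sites = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = mean_silhouette(sites, [0, 0, 1, 1])
poorly_grouped = mean_silhouette(sites, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated groups, while values near or below 0 suggest the grouping does not reflect the structure in the data, which is the basis for choosing the number of clusters from silhouette graphs.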
NMDS is an ordination method which allows comparison of pairwise dissimilarities between variables in low-dimensional space using rank-based correlation (as opposed to linear correlation methods used in principal components analysis or principal coordinates analysis). It accommodates variables with unknown distributions and removes unit distances (losing the absolute magnitude of ordination distance but retaining relative positions of variables). In this case, topographic variables that are physiographically related but associated with different atmospheric mechanisms can still be evaluated separately and grouped for relative influence on PRISM departures in the same dimensional space. This aids in interpretation of topoclimatic mechanisms which may influence PRISM error, such as insolation or large-scale air convergence. NMDS is an iterative algorithm which requires successive ordinations (beginning with a randomized placement within n dimensions specified) to be compared to actual pairwise dissimilarities until the difference between the two (“stress”) is minimized. Stress values of 0.1 or below are considered fair ordination fits.
Vectors of the topographic variables were fit to the TMAX and TMIN error ordinations [Oksanen, 2015], and significance assessed using permutation (n = 999; Table 2) within the package vegan [Oksanen et al., 2015] in the open-source R software [R Development Core Team, 2015].
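The permutation approach to assessing a fitted vector can be sketched in plain Python. This is not the vegan implementation: it assumes a two-axis ordination, measures fit as the r2 of regressing the topographic variable on both axes, and estimates p as (hits + 1)/(permutations + 1). The axis scores and variable are synthetic.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / sqrt(saa * sbb)


def fit_r2(var, axis1, axis2):
    """r2 of the least-squares regression of var on two ordination axes."""
    r1, r2, r12 = pearson(var, axis1), pearson(var, axis2), pearson(axis1, axis2)
    denom = 1.0 - r12 ** 2
    if denom <= 1e-12:
        raise ValueError("ordination axes are collinear")
    return (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * r12) / denom


def permutation_p(var, axis1, axis2, n_perm=999, seed=0):
    """Observed r2 and the share of shuffled fits that match or exceed it."""
    rng = random.Random(seed)
    observed = fit_r2(var, axis1, axis2)
    shuffled = list(var)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fit_r2(shuffled, axis1, axis2) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic ordination scores for 16 sites and a variable tied to both axes.
axis1 = [float(i) for i in range(16)]
axis2 = [float((i * 7) % 16) for i in range(16)]
variable = [2.0 * x + 0.5 * y for x, y in zip(axis1, axis2)]

r2_obs, p_value = permutation_p(variable, axis1, axis2)
```

With 999 permutations this estimator cannot return a p value below 1/1000, which is consistent with 0.001 being the smallest p value listed in Table 2.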
4.3 Impacts to Commonly Derived Hydrometeorologic Variables
In order to evaluate effects of model error on potential science questions, three use cases were examined. Variables were targeted that are often used in applied mountain climate impact studies and estimated using models like PRISM [e.g., Horning et al., 2010; Johnson et al., 2010; West et al., 2015; Copenhaver-Parry and Cannon, 2016]. Because PRISM is leveraged for climate downscaling as well as plot-level studies, we considered it useful to briefly demonstrate three different scenarios of model application in mountain meteorology: (1) watershed-scale point-event prediction (such as first and last freezing days), (2) process modeling (precipitation as snow), and (3) cumulative temperature impacts (degree-day thermal sums).
4.3.1 Frost-Free Season (2014)
We calculated the 2014 frost-free season (FF14) using both the observed data and the model outputs. The length of time between the last spring frost and the first autumn frost (i.e., the period during which TMIN > 0°C) is considered an important metric of general growing season length for plants, and as a long-term climatic indicator has been increasing over recent decades in the western United States based on observations [Easterling, 2002; Kunkel et al., 2004]. The frost-free season at a given site is tied to synoptic mechanisms [Meehl et al., 2004] as well as local physiographic controls [Jordan and Smith, 1995], while regional trends in frost-free days are linked to global patterns [McCabe et al., 2015].
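A frost-free season calculation of this kind can be sketched as below. The sketch assumes a frost day is any day with TMIN at or below 0°C and separates spring from autumn frosts at 1 July; that split date, the function name, and the synthetic series are assumptions made for illustration.

```python
from datetime import date, timedelta


def frost_free_season(tmin_by_date, year, split=(7, 1), frost_threshold_c=0.0):
    """Return (last_spring_frost, first_autumn_frost, frost_free_days).

    A frost day has TMIN <= frost_threshold_c. Frosts before the split date
    count as spring frosts; frosts on or after it count as autumn frosts.
    """
    midpoint = date(year, *split)
    frosts = sorted(
        d for d, t in tmin_by_date.items()
        if d.year == year and t <= frost_threshold_c
    )
    spring = [d for d in frosts if d < midpoint]
    autumn = [d for d in frosts if d >= midpoint]
    last_spring = spring[-1] if spring else None
    first_autumn = autumn[0] if autumn else None
    if last_spring is None or first_autumn is None:
        return last_spring, first_autumn, None
    # Count only the days strictly between the two frost dates.
    return last_spring, first_autumn, (first_autumn - last_spring).days - 1


# Synthetic 2014 TMIN series: frost through 10 May and again from 20 September.
series = {}
day = date(2014, 1, 1)
while day.year == 2014:
    if day <= date(2014, 5, 10):
        series[day] = -2.0
    elif day < date(2014, 9, 20):
        series[day] = 5.0
    else:
        series[day] = -1.0
    day += timedelta(days=1)

last_frost, first_frost, season_days = frost_free_season(series, 2014)
```

Running the same function on observed and modeled TMIN series for a site gives a direct comparison of the frost dates, and a cool bias in modeled TMIN would tend to shorten the estimated season.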
4.3.2 Precipitation as Snow

Precipitation values from the daily PRISM 30 arcsec precipitation product were obtained for all 16 study sites and then partitioned according to this relationship with air temperature. Although this snow-fraction model was obtained in a different climatological region than the Walker Basin (western Oregon), given that this is strictly a test of relative estimates between observed temperatures and the PRISM model, we felt comfortable using it in this scenario.
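The partitioning step can be sketched with a placeholder snow-fraction function. The linear ramp below, its thresholds, and the use of the mean of TMAX and TMIN as the driving temperature are hypothetical stand-ins, not the relationship used in the study; the point is only to show how a temperature bias propagates into a relative difference in estimated snowfall.

```python
def snow_fraction_linear(temp_c, all_snow_c=0.0, all_rain_c=3.0):
    """Hypothetical ramp: all snow at or below all_snow_c, all rain at or
    above all_rain_c, and a linear transition in between."""
    if all_rain_c <= all_snow_c:
        raise ValueError("all_rain_c must exceed all_snow_c")
    if temp_c <= all_snow_c:
        return 1.0
    if temp_c >= all_rain_c:
        return 0.0
    return (all_rain_c - temp_c) / (all_rain_c - all_snow_c)


def snowfall_total(precip_mm, tmax_c, tmin_c, fraction=snow_fraction_linear):
    """Sum the daily precipitation assigned to snow by the fraction function."""
    total = 0.0
    for p, hi, lo in zip(precip_mm, tmax_c, tmin_c):
        total += p * fraction((hi + lo) / 2.0)  # assumed driving temperature
    return total


# Three hypothetical storm days with 10 mm of precipitation each.
precip = [10.0, 10.0, 10.0]
obs_tmax, obs_tmin = [0.0, 4.0, 9.0], [-4.0, -1.0, 1.0]
# The same days with a uniform 1.5 deg C cool bias applied to both extremes.
mod_tmax = [t - 1.5 for t in obs_tmax]
mod_tmin = [t - 1.5 for t in obs_tmin]

psnow_obs = snowfall_total(precip, obs_tmax, obs_tmin)
psnow_mod = snowfall_total(precip, mod_tmax, mod_tmin)
```

Because both estimates share the same precipitation input and the same fraction function, the difference between them isolates the effect of the temperature series alone.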
4.3.3 Heat Sums (2014 and 2015)
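Degree-day sums of this kind are commonly written in the form sketched below, with the daily mean approximated by the average of the two extremes and a base temperature TBASE subtracted. Setting negative daily contributions to zero is an assumption here, as conventions differ between studies, and N simply denotes the number of days in the accumulation period.

```latex
\mathrm{GDD} = \sum_{d=1}^{N} \max\!\left( \frac{T_{\mathrm{MAX},d} + T_{\mathrm{MIN},d}}{2} - T_{\mathrm{BASE}},\; 0 \right)
```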

We set TBASE = 5°C, as this represents a commonly used base temperature for climate-ecological studies [Crookston et al., 2010; Thompson et al., 2012, 2015; Bentz et al., 2013]; such thermal sums are also a common feature in mountain hydrology models [Rango and Martinec, 1995; Bergström et al., 2001; Hock, 2003]. Because PRISM does not contain hourly data, the true daily mean temperature was estimated using TMAX and TMIN. It is most common to use the mean of the two extremes as in the GDD equation above [McMaster and Wilhelm, 1997].
5 Test and Analysis Results
5.1 PRISM: Observation Comparison
Systematic departures were clearly present between the model and observations, with a seasonal cool bias in PRISM daily TMAX and consistent cool bias in PRISM daily TMIN (Figure 2). Across all sites, daily TMAX was underestimated on average (−0.75°C; Figure 2a), with mean bias varying by season. Mean absolute error (MAE) for daily TMAX ranged from 1.15 to 3.49°C, with standard deviations (SD) from 1.30 to 2.07°C, r2 values from 0.94 to 0.98 (p < 0.0001), and overall bias from −3.47 to 1.69°C (Table 1). For daily TMIN, negative bias was more pronounced (−1.95°C; Figure 2b). MAE ranged from 1.14 to 3.70°C, with SD spanning 1.30–2.17°C, r2 values from 0.91 to 0.97 (p < 0.0001), and overall bias from −3.65 to −0.27°C (Table 1). While some sites showed similar relative departures in MAE, SD, and r2 for both diurnal extremes, this was not true for all, indicating that underlying causes of the model departures were likely different for TMAX as opposed to TMIN.

5.2 Potential Error Sources
5.2.1 Accuracy of Deployed Sensors
Testing of the iButton/Gill shield deployment design revealed that this configuration, logging at hourly intervals, is capable of capturing the same daily TMAX and TMIN as the Campbell Scientific equipment. Over a continuous 93 week time frame, bias (calculated as iButton minus Campbell) of the iButton daily TMAX compared to the HMP-60 probe and Type-T thermocouple was 0.20°C and 0.45°C (standard deviation σ = 0.64 and 0.66), respectively. Bias (iButton minus Campbell) of the iButton daily TMIN compared to the HMP-60 probe and Type-T thermocouple was 0.15°C and −0.41°C (σ = 0.40 and 0.39), respectively. It should be pointed out that the two Campbell sensor types themselves diverged, with the mean difference (HMP-60 minus thermocouple) being 0.25°C for TMAX and −0.56°C for TMIN (σ = 0.31 and 0.10, respectively).
Engineering specifications for Maxim DS1922L iButtons require that all data loggers be calibrated/validated against National Institute of Standards and Technology traceable reference devices, with a temperature accuracy of ±0.5°C when postprocessed with Maxim iButton software [Maxim Integrated, 2016]. Campbell Scientific specifications state that HMP60 probes within calibration are accurate to ±0.6°C [Campbell Scientific, 2016]. With bias and standard deviation of the paired measurement differences equaling less than 1°C in all cases, we can safely state that the measurement methods using the iButtons for daily TMAX and TMIN are functionally equivalent to the research grade sensors in a noncanopied environment.
5.2.2 Model Scaling Test
Estimates of daily TMAX and TMIN were compared using the four different model scales (PRISM4K, PRISM800, PRISMPOINT, and PRISM80) over the entire 2 year study period. No systematic shifts were observed in model bias, error standard deviations, or correlation with observations by changing model scale below 800 m; however, we did observe greater departures in accuracy for many of the 4 km scale comparisons (Figure 3). This indicated that model departures from observations on aggregated time frames were not a matter of scale below the 800 m level even in very steep topography, but that application of the model at the 4 km resolution could yield different results. Patterns in these differences also indicated that separate error mechanisms were in play for TMIN as opposed to TMAX, suggesting that local land-atmosphere interaction becomes increasingly important for modeling temperature in mountains below 4 km spatial resolution.

5.2.3 Topographic Relationships to Error
Results from NMDS analysis of model error and associated topographic variables were as follows: TMAX nonmetric R2 = 0.999, stress = 0.034; TMIN nonmetric R2 = 0.994, stress = 0.079; indicating strong association with one or more topographic features for TMAX and a fair association with one or more features for TMIN. Variables of significance (measured using the squared correlation coefficient for the fit of the topographic vector and its scores in the ordination) for TMAX were the indices DAH (r2 = 0.783, p = 0.001) and TRASP (r2 = 0.818, p = 0.001), which are closely related surface heat-loading factors. VllyD (r2 = 0.522, p = 0.010) and TPI (r2 = 0.535, p = 0.002) were significant factors for TMAX as well. Variables associated with error characteristics for TMIN were Elev (r2 = 0.872, p = 0.001) and StdH (r2 = 0.491, p = 0.014).
The error ordinations, significance of topographic factors, and cluster groups were visualized in two-dimensional plots (Figure 4). These results indicated that the mechanisms associated with PRISM departures were tied to real processes associated with geographic location and topography. The observation that significant topographic factors are not directly aligned with the ordination axes tells us that multiple (and probably interacting) processes are at work behind the nature of daily model bias.

5.3 Impacts to Commonly Derived Hydrometeorological Variables
5.3.1 Frost-Free Season (2014)
PRISM800 correctly approximated the last and first frosts of FF14 at 11 of 16 sites. PRISM4K performed similarly but missed the last springtime freeze on three of the sites that the finer-scale models captured while correctly noting the first freeze on one site that the other scales did not (Figure 5). In four cases, PRISM800 underestimated the length of FF14, and in one case the season length was overestimated. Underestimates were primarily on south facing slopes, and the overestimate was on a north facing slope. In only one case did the PRISM80 and PRISMPOINT downscaled data improve the FF14 estimate. Differences among PRISM scales were limited to the lower elevation sites.

5.3.2 Precipitation as Snow
PRISM's estimates of PSNOW were similar to observation-driven estimates across most of the sites (Figure 6). However, we found that PRISM4K significantly underestimated PSNOW at four upper sites and noticeably overestimated PSNOW at three lower sites. Differences among the three finer-scale PRISM products were minimal, and their estimates were typically very close to PSNOW estimates made using the observed temperature.
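
To show why a temperature bias changes a snow-partitioning estimate, the sketch below classifies each day's precipitation as snow when the daily mean temperature is at or below a threshold. This simple threshold scheme, the 0°C value, the function name `snow_fraction`, and the example data are assumptions for illustration and are not necessarily the PSNOW formulation used in this study.

```python
def snow_fraction(tmax, tmin, precip, threshold_c=0.0):
    """Fraction of total precipitation falling on days classified as snow.

    A day counts as snow when (TMAX + TMIN) / 2 <= threshold_c.
    tmax, tmin in deg C; precip in any consistent depth unit.
    """
    total = sum(precip)
    if total == 0:
        return 0.0
    snow = sum(
        p for hi, lo, p in zip(tmax, tmin, precip)
        if (hi + lo) / 2.0 <= threshold_c
    )
    return snow / total


# Hypothetical four-day record.
tmax = [4.0, 1.0, 6.0, -1.0]
tmin = [-6.0, -3.0, 0.0, -9.0]
precip = [10.0, 5.0, 20.0, 5.0]

# The same record with an assumed uniform 2 C warm shift.
tmax_warm = [t + 2.0 for t in tmax]
tmin_warm = [t + 2.0 for t in tmin]

print(snow_fraction(tmax, tmin, precip))
print(snow_fraction(tmax_warm, tmin_warm, precip))
```

Days with mean temperatures close to the threshold switch class under even a small shift, so the estimate is most sensitive where and when temperatures sit near freezing during precipitation events.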

5.3.3 Heat Sums (2014 and 2015)
As expected given the consistent cool bias in both TMAX and TMIN, PRISM generally underestimated GDD at all sites in both years (Figure 7). Once again, PRISM4K behaved differently from the finer model scales, significantly overestimating the thermal sums at upper elevation sites in both years. Most of the sites experienced between 200 and 400 more degree days in each year than estimated by the finer resolution PRISM products, with only the highest elevation sites showing minimal bias in these cases. These differences between scales and elevations could be significant for applications of the data in climate impacts modeling scenarios.
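
The cumulative effect of a small daily bias on a thermal sum can be sketched as follows. The averaging method, the 5°C base (suggested by the GDD5 label), the function name `growing_degree_days`, and the example temperatures are assumptions for illustration and may differ from the calculation used in this study.

```python
def growing_degree_days(tmax, tmin, base_c=5.0):
    """Sum of daily mean temperature in excess of base_c (deg C days)."""
    return sum(
        max(0.0, (hi + lo) / 2.0 - base_c)
        for hi, lo in zip(tmax, tmin)
    )


# Hypothetical season: 300 days with TMAX = 20 C and TMIN = 6 C.
days = 300
obs_tmax = [20.0] * days
obs_tmin = [6.0] * days

# The same season with an assumed 1 C cool bias in both TMAX and TMIN.
mod_tmax = [t - 1.0 for t in obs_tmax]
mod_tmin = [t - 1.0 for t in obs_tmin]

gdd_obs = growing_degree_days(obs_tmax, obs_tmin)
gdd_mod = growing_degree_days(mod_tmax, mod_tmin)
print(gdd_obs, gdd_mod, gdd_obs - gdd_mod)
```

Because every day above the base temperature contributes its full bias to the sum, a 1°C cool bias sustained over 300 such days produces a 300 degree-day deficit in this example.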

6 Discussion
Consistently high r2 values at all sites between observations and PRISM temperatures demonstrate that the model approximates daily temperature variations in open woodland environments well and that the response of the model to changing atmospheric conditions at the watershed scale is quite reasonable. Behavior of error statistics and derived products from the upscaled PRISM4K data set proved different than subkilometer-scaled PRISM outputs, indicating the importance of scale when estimating atmospheric conditions in complex terrain. The first-order error mode for PRISM800 on these study sites appears to be a cool bias that underestimates daily TMAX at 10 of 16 sites and TMIN at all 16. Surprisingly, reducing PRISM800 model scale by an order of magnitude had no consistent effect on results. Differences in error seasonality and relationships to site topography instead suggest mechanistic sources of model departures, tied to landscape/atmosphere interactions and how these are represented in source data or model processes.
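
For reference, the bias and mean absolute error statistics discussed here can be computed per site as in the following minimal sketch. The sign convention (model minus observation, so a negative bias is a cool bias), the function name `error_stats`, and the example values are assumptions for illustration.

```python
def error_stats(model, observed):
    """Return (mean_bias, mean_absolute_error) for paired daily values.

    Bias is model minus observed, so negative values mean the model is cool.
    """
    diffs = [m - o for m, o in zip(model, observed)]
    n = len(diffs)
    mean_bias = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    return mean_bias, mae


# Hypothetical daily TMIN values (deg C) at one site.
observed_tmin = [2.0, 2.0, 5.0]
modeled_tmin = [1.0, 2.0, 3.0]

bias, mae = error_stats(modeled_tmin, observed_tmin)
print(bias, mae)
```

Mean bias and MAE coincide in magnitude only when every daily departure has the same sign, which is why a persistent one-sided error such as the TMIN cool bias shows up clearly in both statistics.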
6.1 Topographic Mechanisms of Error
Diurnal radiation loading is associated strongly with modes of model error in estimating true daily TMAX on open woodland mountain slopes, followed secondarily by relative topographic position (Table 2). While it is also possible that this specific error is being confounded by incorrect assumptions of local lapse rates, the ordination results clearly point to a strong contribution of hillslope heat-loading error vectors as opposed to strictly elevational ones (Figure 4a). When we plot the mean PRISM800 bias for the separate site clusters, we can immediately see that TMAX for Group 1 was consistently underestimated by PRISM (Figure 8a), whereas Group 2 was overestimated in winter. These groups are primarily split between sites with high (Group 1) and low (Group 2) radiative loading (DAH; Table 1). The difference in how PRISM treats these two categories of sites approached 5°C at times during the low-Sun (winter) season, when slope and aspect had the greatest effect on contrasting solar exposures. The relationship between daily temperature extremes and incident solar radiation is well known [Bristow and Campbell, 1984; Thornton and Running, 1999], and the interactions of radiation with geography, land cover, slope, and aspect have also been explored in the context of complex terrain [McCutchan and Fox, 1986; Bolstad et al., 1998; McCune and Keon, 2002; Bennie et al., 2008]. These relationships have been applied in gridded models with local calibration data obtained at relatively high spatial resolution [e.g., Daly et al., 2007; Holden et al., 2015].

In the case of TMIN departures (Figure 8b), there is no consistent seasonal or clustering pattern; rather, a systematic overprediction of drops in nighttime temperature occurred across all sites regardless of cluster group or season. That is, observed overnight lows were warmer across all instrumented sites relative to PRISM estimates. Because the most persistent negative bias in TMIN was actually during summer, we do not suspect snow-on climatology bias in the model during these low-snow years, although snow can be a factor in lowering nighttime temperatures [Pepin et al., 2011]. There were no significant topographic factors associated with these errors besides relative elevation in the watershed (Table 2). This suggests a systematic bias in PRISM's station suite compared to the study sites.
Most of PRISM's source stations are located in flat terrain or valley bottoms (Figure 1), which are conducive to local cold air pooling and atmospheric decoupling, resulting in relatively low minimum temperatures. In contrast, the steeply sloping study locations experience much less cold air pooling and are likely to be more closely coupled to the free atmosphere (Figures 1 and 9). Valley-bottom decoupling from the upper air during stable nighttime conditions is a notable (if understudied) feature of Great Basin topography, and cold-air pools at large scales have been documented as regularly occurring events in the region [Billings, 1954; Wells and Shields, 1964; Wells, 1983]. While PRISM's station-weighting functions account for topographic position for this very reason, the lack of midslope stations to draw from compromised its ability to replicate elevated TMIN values at the study sites. Thus, PRISM's success in dealing with cold-air pooling and stable-air nighttime lapse rates is largely dependent on the availability of stations in a variety of topographic positions. Cold-air pooling, in general, remains a challenge for temperature modeling in mountain environments, given that the frequency and depth of pools is dependent on regional climatology, local airflow responses, and seasonal surface-atmospheric energy exchanges of individual watersheds [e.g., Whiteman, 1982; Bell and Bosart, 1988; Lundquist et al., 2008; Daly et al., 2010; Lareau and Horel, 2014; Holden et al., 2015].

6.2 Biases Related to Hardware and Station Micrositing
Both the PRISM source data and the test observations in this study are subject to a certain amount of bias forced by hardware (primarily radiative shielding of the sensor) and sensor/station microsite characteristics (vegetative cover, snow presence/absence, and local surface albedo). While the test sensors were deployed in a uniform manner, the contributing network stations to PRISM consist of a wider variety of hardware and microsite factors. COOP stations in valley locations typically use the highly effective passively aspirated Stevenson screen sensor shelter [MacHattie, 1965], whereas the RAWS and SNOTEL networks use Gill-type passively aspirated plate shields similar to the test installations. COOP and RAWS stations generally have the sensors located at approximately 2 m in height above ground level (as do the test observations), whereas SNOTEL sensors can be much higher (Figure 9).
Measurements of TMAX are primarily biased by shortwave radiation either entering the shield housing or being transferred to the air inside by the shield material itself. The influence of shortwave radiation as a bias source is directly related to the exchange between ambient air and the air inside the shield (i.e., wind speed [Richardson et al., 1999]), as well as surrounding surface albedo [Huwald et al., 2009]. The albedo of typical geology in the watershed (granites and hydrothermally altered rocks/soils) is high (Figure 9), and this study was conducted during one of the lowest-snow periods recorded for the region, which would minimize snowpack-driven wintertime increase in surface albedo, especially on south aspects. Increased albedo of north facing or heavily vegetated surfaces due to snow retention in winter would be offset by reduced incident radiation angle and increased shading. Because of these factors, we expect that seasonal biases in both source and test data in TMAX due to snow presence are minimized during the study period. The strongest source of observation bias under these conditions is more likely to be the typical airflow across and through the shields during the warmest portion of the day. Surrounding vegetation significantly reduces average wind speeds at the sensor housing, with the consequence that Gill-type shields can experience TMAX warm biases up to several degrees when wind speed is near zero and the sensor housing is not shaded [Nakamura and Mahrt, 2005; Huwald et al., 2009].
Measurements of TMIN are more influenced by microsite position and cool airflow during nighttime hours, rather than hardware factors. The presence of snow can certainly impact local air temperatures [Pepin et al., 2011]; however, the abnormally low snow amounts during the study period would have minimized the duration of measurement bias during the snow season at these locations.
The question remains: how do these interacting factors impact source data bias versus test observation bias, and how do these conclusions relate to interpretation of PRISM model performance? A primary source of midelevation data for PRISM in the western U.S. is the SNOTEL network, a series of sites maintained for seasonal streamflow prediction that also includes basic meteorological sensors [Natural Resources Conservation Service, 2015]. The Walker Basin is no exception: there are eight SNOTEL stations within or near the watershed, and these dominate the local middle and upper elevation data contributions to the model. The topoclimatic siting characteristics of these stations therefore become an important factor in the performance of PRISM and other gridded models in the region. Because the SNOTEL network is designed to measure snow hydrology variables near the headwaters of major streams and rivers, the stations are frequently associated with upper elevation montane forests. They are situated such that they are not thermally representative of woodland slopes, but rather flats or even sink zones. Accordingly, it is likely that the SNOTEL stations in and around the Walker Basin experience air conditions with canopy-influenced reduction of radiation and increased cooling as well as more stable, frequent flow of cool air during nighttime conditions (Figure 9). These features make the SNOTEL sites a prime suspect in the PRISM departures in this study, due to local siting characteristics rather than systematic instrumental error.
Instruments and topographic position being equal, the siting characteristics of SNOTEL sensors would, in general, result in cooler TMAX values than the test observations during times of low wind speed simply due to local shading and reduced local albedo. Elevated, more exposed sites in this region (such as our test locations) often experience persistent wind that would reduce TMAX bias in the test data. For example, the mean wind velocity at the Rockland climate station (midelevation, in the center of the watershed, and open woodland ridge) during the study period was 5.3 m s−1. Because of elevated wind speeds, it is possible that many of the testing sites would not experience significantly more TMAX bias than forested SNOTEL sites, in spite of generally increased radiation factors. The differences in SNOTEL sensor height, however, could offset this on a case-by-case basis, depending on air turbulence at the sensor housing [Nakamura and Mahrt, 2005].
It has recently been pointed out that changes in SNOTEL temperature sensing hardware and calibration/processing practices could be introducing biases into the network [Oyler et al., 2015]. Reported shifts in temperature have been as much as ~1.1°C, primarily on the warm side, but varying with temperature. At least some SNOTEL stations in the Walker Basin were affected by this bias during the study period. A preliminary correction (PRISM800corr) using an empirically derived ninth-order function developed in collaboration with the National Park Service in Alaska (Daly et al., unpublished data) did not make significant improvements to the model performance in our study (mean adjusted PRISM800 TMAX bias = −0.58°C compared to 0.75°C, mean adjusted PRISM800 TMIN bias = −2.67°C, compared to −1.95°C). Thus, corrections to the model to compensate for SNOTEL bias appear to be overshadowed by other bias sources. We therefore interpret the primary mode of error in the overall test of the PRISM800 model to be a function of micrositing differences between the PRISM source data and the study sites, and secondarily the larger topographic settings.
6.3 Applications and Use of PRISM in Complex Terrain
Our tests of the PRISM temperature product at different scales describe both strengths and weaknesses of gridded climate interpolation models in mountainous terrain, particularly on the proportionally large areas that are composed of elevated slopes. Our work also highlights several sometimes confounding sources of measurement bias, which vary across data networks. These regional and network-specific biases create challenges for both the model developer and the end user. We feel that this study's results are likely transferable across large portions of the Intermountain West where vegetation is limited to shrub and woodland cover on steep topography. The reasoning for this is tied to typical spatio-topographic distribution of source data for the model as well as the land cover types. Studies using PRISM (or other gridded products) directly or as part of downscaling efforts applied to temperature-related questions should examine the scale of application and cumulative effects of potential bias in both regional source data and representative topography. For example, studies estimating frost-free periods on steep mountain slopes may well be quite accurate in their conclusions, as first and last frost days in the Intermountain West are most likely tied to transitional-season synoptic events which are interpolated well by PRISM regardless of source data bias. Predictive models which seek to partition precipitation or build/melt snowpack may be subject to greater error if coarse-scale grid products are used (Figures 6 and 7). If thermal sum calculations (such as GDD5) are used in snowmelt or climate impacts models, PRISM could introduce significant error (because of the cumulative nature of the statistic) if study sites do not share the topographic characteristics of PRISM's source data (Figure 7).
Thus, researchers focused on processes that are sensitive to small changes in atmospheric temperature may find that accurate conclusions require local observations over full seasonal cycles for model calibration, especially if model source data in the region are not topographically representative. This is becoming an increasingly common practice, although the method of measurement, micrositing factors, and radiative shielding practices introduce their own biases and must be understood when making comparisons with gridded data sets.
7 Conclusions for Climate Science in Mountain Regions
Our results indicate that PRISM temperature is reasonably representative of conditions on semiarid mountain slopes, but that absolute bias caused by a combination of factors can be significant and should be taken into account when applying PRISM data to science questions in similar settings or downscaling climate models over complex terrain. These findings highlight the importance of topoclimatic siting for near-surface observations in mountain science and how data processing (e.g., SNOTEL calibration) or prediction experiments (e.g., warming) that shift regional temperatures by a degree or two can be completely overshadowed or negated within interpolation models by source data bias at locally relevant scales.
We observed a significant improvement in model performance as grid resolution was increased from ~4 km to ~1 km, but no relative improvements were made by further downscaling to ~80 m or using point interpolation. This suggests that at scales of 1 km or less, factors other than elevation, such as slope, aspect, and vegetation cover (which drive local airflow and radiative effects), have greater significance for site-level daily temperatures on mountain slopes. Therefore, as gridded models mature and incorporate diverse sources of data in pursuit of greater resolution and accuracy, it falls to the individual researcher to ensure that the application and use of modeled climate data are in fact appropriate to the science in question given model assumptions.
The relative amount of geographic area in mountain watersheds encompassed by elevated slopes is often quite large compared to ridgetops or canyon bottoms [Strahler, 1952], representing an opportunity for significant spatial propagation of error if model assumptions are incorrect where slopes are concerned. The atmospheric and mountain science communities need to make concerted efforts to rectify the current lack of representative data across the gradients of elevation, aspect, and vegetative cover. Better characterization of local station biases and increasing awareness of these in scientific application are also critical needs.
As future studies conduct in situ tests, the need for expansion of sensor networks that better capture a range of climate variables across diverse topographies will become abundantly clear. Conducting similar tests across a greater diversity of landscapes and topographic exposures both here in the Walker Basin and elsewhere in the semiarid western U.S., with the express intent of observations replicating “standard” 2 m weather station diurnal measurements that are used in interpolated products, is a lofty but attainable goal in pursuit of improving the near-surface atmospheric modeling process and scientific applicability.
Acknowledgments
The authors wish to thank the following collaborators who have contributed to field and data operations supporting this climate work in the Walker Basin: Constance Millar from the USDA Forest Service; Graham Kent from the Nevada Seismological Laboratory; Sergiu Dascalu and Fredrick Harris from the Nevada Research Data Center at the University of Nevada, Reno; Gregory McCurdy from the Western Regional Climate Center; Matthew Doggett with the PRISM Climate Group; the NRCS SNOTEL program; and Jehren Boehm from the Department of Geography at the University of Nevada. We also acknowledge the valuable inputs of Jessica Lundquist at the University of Washington and three anonymous reviewers who helped us refine our initial manuscript. At this time, the data used in this study may be obtained from the authors. Future archival will be at the Nevada Research Data Center (https://sensor.nevada.edu). Scotty Strachan was supported in part by National Science Foundation grants GSS-1230329 and EPSCoR IIA-1301726, as well as the University of Nevada, Reno, College of Science Dean's Office.