An Updated Assessment of Near‐Surface Temperature Change From 1850: The HadCRUT5 Data Set

We present a new version of the Met Office Hadley Centre/Climatic Research Unit global surface temperature data set, HadCRUT5. HadCRUT5 presents monthly average near‐surface temperature anomalies, relative to the 1961–1990 period, on a regular 5° latitude by 5° longitude grid from 1850 to 2018. HadCRUT5 is a combination of sea‐surface temperature (SST) measurements over the ocean from ships and buoys and near‐surface air temperature measurements from weather stations over the land surface. These data have been sourced from updated compilations and the adjustments applied to mitigate the impact of changes in SST measurement methods have been revised. Two variants of HadCRUT5 have been produced for use in different applications. The first represents temperature anomaly data on a grid for locations where measurement data are available. The second, more spatially complete, variant uses a Gaussian process based statistical method to make better use of the available observations, extending temperature anomaly estimates into regions for which the underlying measurements are informative. Each is provided as a 200‐member ensemble accompanied by additional uncertainty information. The combination of revised input data sets and statistical analysis results in greater warming of the global average over the course of the whole record. In recent years, increased warming results from an improved representation of Arctic warming and a better understanding of evolving biases in SST measurements from ships. These updates result in greater consistency with other independent global surface temperature data sets, despite their different approaches to data set construction, and further increase confidence in our understanding of changes seen.

Since the release of the predecessor of the data set presented here, HadCRUT4 , new analyses of near-surface temperature have been undertaken, and with them understanding has improved of deficiencies in the observing network and in analysis methods. This has led to updates to analyses with long pedigrees (Lenssen et al., 2019;Zhang et al., 2019), the arrival of new and independent analyses (Rohde & Hausfather, 2020;Yun et al., 2019), and related studies (Benestad et al., 2019;Ilyas et al., 2017;Kadow et al., 2020).
Efforts to consolidate archives of instrumental air temperature series under the auspices of the International Surface Temperature Initiative (ISTI; Rennie et al., 2014) have greatly increased the availability of meteorological station series. The resulting ISTI databank underpins the updated GHCNv4 air temperature data set (Menne et al., 2018) and regional subsets of station series from the ISTI databank have been selectively included in updates to the CRUTEM4 and CRUTEM5 data sets Osborn et al., 2021). These improved data holdings have increased observational coverage of regions that were previously poorly represented, including the rapidly warming high northern latitudes.  and  introduced a new land air temperature analysis developed independently of preexisting studies. This analysis included a new method for bias-adjusting station records, a process that is commonly known as homogenization, and combined estimation of homogenization adjustments with an independently developed spatial analysis method. The study has since been extended to include analysis of HadSST3 SSTs (Kennedy et al., 2011a(Kennedy et al., , 2011b to produce a merged land-sea data product (Rohde & Hausfather, 2020).
A key uncertainty for estimating long-term change is that associated with corrections for systematic errors in SST measurements. Comparisons of long historical SST data sets  showed that there were differences between SST data sets which were larger than the estimated uncertainties. A comparison to modern "instrumentally homogeneous" data sets by Hausfather et al. (2017), found that HadSST3 (Kennedy et al., 2011a(Kennedy et al., , 2011b and COBE-SST-2 (Hirahara et al., 2014) underestimated recent warming. Cowtan et al. (2018) compared SST products to coastal weather stations highlighting discrepancies between temperature trends in land and ocean data sets. Carella et al. (2018) used characteristic daily cycles in SST measurements to infer how the measurements were made and showed that previous assumptions underestimated the prevalence of engine-room measurements. Freeman et al. (2017) compiled release 3.0 of the International Comprehensive Ocean Atmosphere Data Set (ICOADS) including newly digitized data. Two long-term historical SST analyses, HadSST and ERSST, which are based on ICOADS, have been updated using this new release. ERSST has gone through two updates-version 4 (Huang et al., 2016) and 5 -which extended bias adjustments to the whole SST record, implemented improvements to the analysis, and quantified uncertainty. HadSST.4.0.0.0 (Kennedy et al., 2019) revisited the bias adjustments applied to the data, using oceanographic measurements to understand and reduce some of the key uncertainties in HadSST3.
Recent updates to instrumental near-surface temperature data products have brought improvements in their assessment of uncertainty, and in provision of uncertainty information for use in onward analyses. Ensemble uncertainty assessments have become commonplace in air temperature data sets (Menne et al., 2018;Morice et al., 2012) and SST data sets (Huang et al., 2016Kennedy et al., 2011bKennedy et al., , 2019. The NOAAGlobalTemp version 5 analysis Zhang et al., 2019) updates previous NOAA analyses (Smith et al., 2008) by bringing together updates to underpinning data holdings over land (Menne et al., 2018) and merges the expanded land data holdings of GHCNv4 with the updated ERSSTv5 data set. An ensemble uncertainty assessment is included , sampling the uncertainty in parametric choices in the SST adjustments procedure, the station series homogenization algorithm (Menne et al., 2018) and the spatial analysis method used.
The NASA Goddard Institute for Space Studies GISTEMPv4 analysis (Lenssen et al., 2019) introduces an updated uncertainty assessment, applying the GISTEMP spatial analysis methods to the 100-member GH-CNv4 ensemble of homogenized station series and basing SST uncertainty assessments on the ERSSTv4 ensemble. Additional uncertainty associated with the production of spatial analyses from incomplete station data is assessed by subsampling reanalysis fields from a selection of modern reanalyzes.
The coverage of instrumental records of near-surface temperature changes is characterized by often sparse and nonuniform sampling of the globe. Assessments of uncertainty in global and regional average temperature changes have found that sparse data coverage is the most prominent source of uncertainty over monthly to decadal timescales Morice et al., 2012), outweighing uncertainty arising from changes in observing methods. Recent studies have also shown that poor representation of some regions, notably the rapidly warming high northern latitudes, may have contributed to an underestimation of globally averaged temperature changes in recent years (Cowtan & Way, 2014;Karl et al., 2015).
While efforts have been made to increase data coverage in the CRUTEM4 and now CRUTEM5 data sets through inclusion of additional meteorological station data in less well-observed regions Osborn et al., 2021) and marine data holdings have been expanded to include recently digitized marine reports (Freeman et al., 2017), statistical analysis methods were not used in HadCRUT4 or its underpinning land and marine data sets to infer temperature changes in regions where measurements are not available. An independent application of local statistical interpolation methods to HadCRUT4, in a study by Cowtan and Way (2014), found that statistically infilled reconstructions showed recent warming over high latitude regions that is not proportionately represented in global mean temperatures calculated from the noninfilled HadCRUT4 data set. The study also included an analysis that used satellite-based upper air temperature estimates as a proxy for near-surface temperature variability in the gaps in data coverage in HadCRUT4, which also showed warming in these high latitude regions. This high-latitude signal contributed to an increase in the assessed rate of change of global average temperatures since the beginning of the 21st century.
Unlike HadCRUT4, other existing near-surface temperature data sets utilize statistical analysis methods to infer spatial fields from scattered observations. Analysis methods based on spatial covariance structure, known variously as optimal interpolation (e.g., as used in Reynolds & Smith, 1994), kriging (e.g., as used in Cowtan & Way, 2014), Gaussian process regression (Rasmussen & Williams, 2006) and variants thereof, have a long history of use, particularly in analyses of SSTs (Donlon et al., 2012;Reynolds et al., 2002;Reynolds & Smith, 1994). These methods use knowledge of the covariance structure of spatial fields to infer field values as weighted averages of observations in locations with strong covariation. Typically, weighting is based on a statistical model in which nearby locations are expected to covary strongly and distant locations weakly. Methods of this form are a core part of the Rohde & Hausfather (2020) analysis and of the analysis of Cowtan and Way (2014). The GISTEMP data set also uses a distance-weighted average that, while similarly applying a weighted average of local observations, does not make use of a covariance model and so does not classify as a kriging type analysis.
A second form of spatial analysis methods that are commonly applied in instrumental climate analyses, reduced space methods, decompose spatial temperature variability into a finite, typically orthogonal, set of spatial patterns of variability (Kaplan et al., 1997). These patterns are generally, but not necessarily, global in extent. Spatial reconstructions are then formed as a weighted sum of these patterns. The Empirical Orthogonal Teleconnection (Smith et al., 2008) method employed within the NOAAGlobalTemp v5 analysis falls within this category of reduced space algorithms, employing a finite set of locally defined spatial patterns that are fit to the available data.
A recent assessment of the use of neural networks to estimate missing values in the HadCRUT4 data set (Kadow et al., 2020) expands the ensemble of methods used to reconstruct global temperatures. Derived global temperature series show good agreement with prior studies using more traditional methods.
Traditionally, surface temperature data sets have combined air temperatures over land with SSTs over the ocean, rather than the more natural choice of air temperatures over the ocean. SST measurements are currently far more numerous than marine air temperature (MAT) measurements because SST can be readily measured by automatic sensors on drifting buoys as well as being retrieved from satellite measurements of radiances, while observational sampling of MAT has been in recent decline . There are significant biases in daytime marine air temperature observations (Berry et al., 2004). Night-time measurements have therefore been used to develop observational records of marine air temperature changes (Kent et al., 2013), with up-to-date independent assessments of historical night-time MAT becoming available only recently (Cornes et al., 2020;Junod & Christy, 2020). Anomalies in MAT and SST have been expected to be similar over long space and time scales due to the strong physical link between the two. However, Cowtan et al. (2015) showed that MAT and SST changes simulated in coupled climate models differ, with MAT warming slightly faster than SST, affecting comparisons of observed and modeled global temperature change if care is not taken to ensure an "apples to apples" comparison. They also found that decisions about how to handle marginal sea-ice areas could affect the estimated changes, depending on the use of SST or MAT. Therefore, while there is good motivation for the use of MAT (Cowtan et al., 2015;Richardson et al., 2016), there are currently challenges relating to the MAT observational network  that provide an observational rationale for the continued use of SST in monitoring global surface temperature variability and change until these challenges are addressed.
Recent developments in satellite retrievals of surface skin temperatures present a new possibility for near-surface temperature monitoring, bringing the potential for detailed spatial information with sustained measurement over a time frame that is now of sufficient length for climate studies. Recent work (Rayner et al., 2020) has explored the potential of combining air temperature information inferred from satellite skin temperatures with traditional in situ observations, expanding on the understanding of relationships between satellite-derived skin temperatures and traditional near-surface air temperature observations, and on the stability of these relationships over time that is required to construct merged data products. Alternatively, dynamical reanalyzes, that combine numerical weather prediction models with a range of varied observational data sources, are increasingly being used to monitor the climate (e.g., ERA5, Hersbach et al., 2020;JRA-55, Kobayashi et al., 2015;and MERRA-2, Gelaro et al., 2017). These alternative sources of near-surface temperature data provide useful information in locations that are not well represented in traditional near-surface temperature data sets. However, in all cases, understanding of nonclimatic effects affecting observations and arising from analysis methods is required when combining observations from multiple sources.
Here, two ensemble surface temperature data sets are presented. The first, the "HadCRUT5 noninfilled data set," adopts the gridding and ensemble generation methods of HadCRUT4 . The second, the "HadCRUT5 analysis," uses a statistical infilling method to improve the representation of sparsely observed regions. Through application of the statistical infilling method to the HadCRUT5 noninfilled ensemble, the HadCRUT5 analysis ensemble samples the uncertainty in the gridded near-surface temperature data that arises from residual biases in observational data after correction, for example associated with uncertainty in changes in instrumentation and measurement practices at meteorological stations Morice et al., 2012) and changes in SST measurement methods (Kennedy et al., 2019). It also samples the effects of basic measurement uncertainty, uncertainty arising from estimation of gridded temperature fields from a finite number of observations and residual uncertainties associated with individual marine measurement platforms, where information identifying individual platforms is available (Kennedy et al., 2019). Statistical reconstruction uncertainty is also encoded in the HadCRUT5 analysis ensemble, producing an ensemble that samples a greater range of sources of uncertainty than was possible in HadCRUT4. Thus, the new ensemble analysis communicates the major known sources of uncertainty in an easily accessible way.
The remaining sections of this paper are structured as follows. Section 2 describes the data sets used as inputs and for comparison. Section 3 provides an overview of the methods used to construct HadCRUT5. Results are presented in Section 4 with conclusions and discussion in Section 5. Monitoring Service, as buoy data in ICOADS were incomplete following a change in data-transmission codes in late 2016. Early measurements made using buckets are adjusted using a physically based model of heat lost from water-sampling buckets (Folland & Parker, 1995;Rayner et al., 2006). From the 1940s onwards, ship measurements are adjusted based first on comparisons with near-surface oceanographic measurements (Atkinson et al., 2014) and then, from the early 1990s onwards, on comparisons with buoy measurements. The resulting HadSST.4.0.0.0 data set is presented as anomalies relative to 1961-1990 on a 5° latitude by 5° longitude grid and is representative of SST as measured by drifting buoys at an approximate depth of 20 cm.

Input
Overall, the global SST change estimated from HadSST.4.0.0.0 is larger than that estimated from HadSST.3.1.1.0 (and earlier versions). This is due to two factors. First, new estimates of biases associated with measurements made in ships' engine rooms show that these biases have declined since the 1950/1960s, artificially reducing the long-term change represented in the underlying data and in earlier versions of HadSST. Second, a greater proportion of measurements during the 1961-1990 period were estimated to have been made in ships' engine rooms. Other changes include: using buoys as a reference data set; producing ensemble members with step changes in the time evolution of the proportions of canvas and wooden buckets in the early 20th century alongside ensemble members which assume a linear transition; estimating the fraction of incorrect metadata using comparisons with oceanographic measurements; and using comparisons with oceanographic measurements to narrow the range of plausible transition dates from canvas buckets to modern rubber buckets (see Kennedy et al., 2019 for a detailed discussion).
Uncertainty in HadSST.4.0.0.0 is split into three main components associated with: pervasive systematic errors; systematic errors from individual ships or buoys; and uncorrelated errors from individual measurements and incomplete grid-box sampling. The pervasive systematic errors, which have complex temporal and spatial correlations, are represented using a 200-member ensemble generated by varying uncertain parameters and choices in the bias adjustment scheme. The systematic errors are represented using covariance matrices that encode the error covariances between grid cells that arise from ships making measurements in multiple grid cells in a month. Finally, uncertainties from uncorrelated errors are provided as gridded fields. Note that these uncertainty components do not span the full range of uncertainty. In particular, structural uncertainty remains (Thorne et al., 2011) and there may be an underestimate in the systematic error component because it does not currently deal explicitly with errors that correlate at the level of national shipping fleets (Chan & Huybers, 2019) or with marine reports that lack ship call signs or other identifying information (Carella et al., 2017).

CRUTEM.5.0.0.0
Monthly averages of near-surface air temperature measured at weather stations over the land surface for 1850-2018 are obtained from CRUTEM.5.0.0.0 (Osborn et al., 2021, referred to hereafter as CRUTEM5). The CRUTEM station database is a collection of station series obtained from National Meteorological and Hydrological Services (NMHSs) and large collections such as the European Climate Assessment and Data set (Klein Tank et al., 2002). CRUTEM incorporates corrections that NMHSs apply to their own data to minimize the impact of changes in weather station instrumentation or location on the measurement series. The monthly average temperatures from stations are subjected to quality control, converted to anomalies (differences from their 1961-1990 means) and then averaged into 5° latitude by 5° longitude grid boxes.
CRUTEM5 has improved quality control checks that: (i) improve the flagging of incorrect data during 1941-1990; (ii) reduce the trend toward increased flagging of suspect data outside of the 1941-1990 period; and (iii) reduce the number of genuine extreme values that are erroneously flagged as incorrect, for example, during coherent extreme events such as summer 2003 in Europe (see Osborn et al., 2021 for details). The station database has been expanded such that the number of those stations with sufficient data to estimate temperature anomalies has grown from 4,842 in CRUTEM.4.0.0.0 (as used in Morice et al., 2012) to 7,983 in CRUTEM5 (Osborn et al., 2021). Most of the new data acquisitions are in already-sampled regions, so the number of grid-box values is only moderately expanded (by 9%) relative to CRUTEM.4.0.0.0.
The changes in temperature seen in hemispheric or global averages since 1850 are not sensitive to these updates, but some regional differences are apparent. Osborn et al. (2021)  An alternative gridding method was explored in Osborn et al. (2021) for CRUTEM5 to address the underrepresentation of high latitude stations in the standard gridding. This alternative method allows each station to contribute to more than one neighboring 5° latitude by 5° longitude grid box on the same latitude, where the number of grid boxes to which each station can contribute is determined by an inverse cosine latitude weighting. In the current study, the alternative gridding method is not used because (a) the uncertainty model for the CRUTEM5 grids, as documented in Brohan et al. (2006), only applies to the standard gridding approach (where each station contributes to only one grid box); and (b) the issue of high-latitude sampling is dealt with here by statistical infilling.
HadCRUT5 uses an ensemble version of the CRUTEM5 uncertainty model. The HadCRUT5 noninfilled ensemble grids and accompanying uncertainty grids are produced from the CRUTEM5 station temperature anomaly series, following the methods of Morice et al. (2012), as described in Section 3.2.

HadISST.2.2.0.0
We use sea ice concentration from the Met Office Hadley Centre sea-Ice and Sea Surface Temperature data set, HadISST.2.2.0.0 (an update to Titchner and Rayner (2014)), on a 1° latitude by 1° longitude grid to determine the presence or absence of sea ice in any individual ocean grid box in each month from 1850 to 2018.
HadISST.2.2.0.0 is updated relative to version 2.1.0.0 in the following ways: (i) reinstatement of a small number of erroneously removed sea-ice-filled grid boxes after 1978; (ii) an alteration to the adjustments applied to the National Ice Center charts (used to determine the ice edge between 1972 and 1978) correcting a low-bias in the HadISST.2.1.0.0 fields in the Arctic then; and (iii) an improvement in the interpolation applied between two atlas-derived climatologies used to determine ice extents in the Antarctic to produce a smoother transition between them and between 1962 and the start of monthly observations in 1972.

ERA5
We have used monthly ERA5 analysis 2 m air temperature data from 1979 to 2018 (Hersbach et al., 2020) for coverage uncertainty estimation and for comparison of global and regional diagnostics. ERA5 was produced using 4D-Var data assimilation in the European Centre for Medium-range Weather Forecasts' (ECMWF) Integrated Forecast System. We used the (31 km) high resolution realization.
NOAAGlobalTemp version 5 is based on the Global Historical Climatology Network (GHCN) version 4 land station data set (Menne et al., 2018) and the Extended Reconstruction Sea Surface Temperature (ERSST) data set version 5 . Station records in GHCN v4 are homogenized using an automated algorithm. SSTs are adjusted using comparisons with marine air temperature and latterly drifting buoys. The data are interpolated using Empirical Orthogonal Teleconnections, providing improved coverage, although coverage does not extend fully into the polar regions.
GISTEMP version 4, like NOAAGlobalTemp v5, is based on a combination of GHCN v4 and ERSST v5. The SST data are interpolated as in NOAAGlobalTemp. Land surface air temperatures are interpolated from station data within a 1,200 km radius. Extrapolated land surface air temperatures are used over the oceans in sea-ice covered areas. Coverage of the GISTEMP data set is quasiglobal in the past 20 years, with good coverage of the poles and other data-sparse regions from interpolated station data.
The Berkeley Earth data set (Rohde & Hausfather, 2020) uses a kriging-based technique to interpolate and homogenize station data. A kriging based technique is also applied to SSTs from the HadSST3 data set to provide coverage over the whole globe. The version of the data set that uses extrapolated land-surface air temperatures over the oceans in sea-ice covered areas is used here. (2014) is based on the HadCRUT4 data set. The land and ocean data are interpolated using kriging. Grid cells that contain data in HadCRUT4 are not modified during interpolation (in contrast to the kriging of HadSST3 data in the Berkeley Earth data set). As with GISTEMP and Berkeley Earth, extrapolated land-surface air temperatures are used over the oceans in sea-ice covered areas.

Cowtan and Way
The Berkeley Earth (1° latitude by 1° longitude resolution) and ERA5 (0.25° latitude by 0.25° longitude resolution) analyses were regridded to 5° latitude by 5° resolution using an area-weighted average of all grid cells falling within a HadCRUT5 5° grid cell. Cowtan and Way and NOAAGlobalTemp were obtained on a 5° grid. The GISTEMP data, which were obtained on a 2° grid, were not regridded.

Methods
Two gridded data sets are provided as part of HadCRUT5. The first version of the data set is produced without statistical infilling, referred to here as the "HadCRUT5 noninfilled data set," following the methods of Morice et al. (2012), and is intended for use in applications where statistical infilling is not desired. This is accompanied by a second version of the data set, hereafter referred to as the "Had-CRUT5 analysis," that is produced using a statistical method to estimate more-complete temperature anomaly fields.
The HadCRUT5 noninfilled data set and the HadCRUT5 analysis are produced in the following steps. First, an ensemble land-surface air temperature data set, with accompanying additional uncertainty information, is generated from the CRUTEM5 station data (Section 3.2). The land data set is then merged with SST anomaly information from HadSST4 through a weighting method based on the land area fraction (Section 3.4) to produce the noninfilled data set. Next, monthly fields are estimated separately for the land surface air temperature data set and for HadSST4 using a statistical method to create an ensemble analysis for each (Section 3.3). The separate land and ocean analyses are then merged into a combined land and ocean ensemble analysis using a land-sea weighting scheme that also accounts for sea ice coverage (Section 3.4). Global and regional time series are then computed from the two merged data sets, following the methods of Morice et al. (2012) with updates to the method used to estimate uncertainty associated with incomplete observational coverage described in Section 3.5. Error models for each data set are described in Section 3.1. Full details of uncertainty propagation for land and ocean merging and global and regional time series are provided in the Supporting Information.

The HadCRUT5 Error Models
This section outlines the terms of the error model for grids and time series of the HadCRUT5 noninfilled data set and the HadCRUT5 analysis. Further details are given in the Supporting Information.
The error models are split into components according to the way that uncertainty information is presented in HadCRUT5. The sources of uncertainty modeled in HadCRUT5 are grouped according to their correlation structure to allow uncertainties to be propagated appropriately into derived diagnostics such as regional average time series.

The HadCRUT5 Noninfilled Data Set
The error model for the noninfilled data set describes the estimate of temperature anomaly   , T s t at spatial location s and time t as a sum of the true temperature anomaly T(s, t) and three error terms: a bias term ε b (s,t) representing biases with large-scale spatial and temporal structure; a partially correlated error term ε p (s,t) for errors with typically local structure; and an uncorrelated error term ε u (s,t) describing errors that are independent between spatial and temporal locations. The full error model for noninfilled fields is given by: This error model for the merged data set matches the structure of the error model for the land data set and for HadSST4. For the land data set, the contributions to the bias term are the land station homogenization error, urbanization and biases from nonstandard measurement enclosures. There is no contribution to the partially correlated term and the uncorrelated term models the within grid box measurement and sampling uncertainties . For HadSST4, the bias term models the effects of residual errors in the adjustments applied to account for changes in measurement methods, the partially correlated term models the effects of residual biases associated with individual observing platforms, and the uncorrelated term models the within grid cell measurement and sampling uncertainties (Kennedy et al., 2019).
The HadCRUT5 noninfilled ensemble samples the uncertainties for the combination   . The uncertainties for partially correlated and uncorrelated errors are not encoded into the noninfilled ensemble. Instead, uncertainty information for partially correlated errors ε p (s,t) is provided in spatial error covariance matrices and uncertainties for uncorrelated errors ε u (s,t) are provided for each observed grid cell.
The error model for estimates of spatial average time series   T t derived from the gridded data is given as a sum of the true temperature anomaly time series T(t) and four error terms: here ε b (t) is the effect of the bias term propagated into the spatial average, ε p (t) is the effect of the partially correlated term, ε u (t) the effect of the uncorrelated error term. The fourth error term, ε c (t), is the error in estimating the spatial average from incomplete spatial coverage, with missing grid cells resulting from limitations in the spatial sampling provided by the observation network. Full details of uncertainty propagation for each of these terms are given in the Supporting Information.

The HadCRUT5 Analysis
An overview of the HadCRUT5 analysis is provided in Section 3.4 and a detailed description of methods is provided in Appendix A. The HadCRUT5 analysis error model has fewer terms than that of the noninfilled data set as the analysis methods combine multiple sources of error into a single analysis error term. The error model for the HadCRUT5 analysis defines the temperature anomaly estimate as the sum of the true temperature T(s, t) and the analysis error ε a (s,t): The analysis error term combines all errors that are modeled in the Gaussian process analysis, both spatial reconstruction errors and observational errors, as described in Appendix A. The analysis ensemble samples the analysis uncertainty such that each ensemble member is a sample of T(s,t) + ε a (s,t).
For the HadCRUT5 analysis, errors in global and regional average time series are derived as a combination of the propagated analysis error and ε a (t) and an additional coverage error term ε c (t) that represents the error in estimating the spatial average from incomplete analysis grids, noting that this coverage error term differs from that of the noninfilled data set due to the different spatial coverage of the analysis.
The propagation of uncertainty associated with these errors is described in the Supporting Information.

Ensemble Land Air Temperature Data Set
As in the previous versions of HadCRUT, near-surface air temperature information over land is derived from the CRUTEM data set. As in Morice et al. (2012), an ensemble air temperature data set is produced by sampling from the distributions of known uncertainty in station temperature records. The station data on which the ensemble grids are based have been updated to now use the CRUTEM.5.0.0.0 data set (Osborn et al., 2021).
A detailed description of the land air temperature ensemble sampling method can be found in Morice et al. (2012). The sampling approach is designed so that the effects of known sources of residual systematic error in station anomaly series can be quantified for regional statistics and time series. The ensemble size has been increased to 200 members for HadCRUT5 to match the 200-member HadSST4 ensemble.
The sampling method is as follows. Samples are drawn from the distributions of known uncertainties during the station gridding process. Residual homogenization error and uncertainty in climatology normal information are sampled from distributions described in Brohan et al. (2006) and encoded into realizations of individual station series prior to gridding. The systematic effects of residual urbanization errors Parker, 2010) and nonstandard sensor enclosures (Folland et al., 2001;Parker, 1994) are sampled and encoded into the gridded ensemble at a regional level, again following the method of Morice et al. (2012).
Additional uncertainty information for errors that are uncorrelated between grid cells (e.g., from measurement error or incomplete sampling of a grid cell) is not encoded into the land ensemble. Instead, these measurement and sampling-related uncertainties are provided as additional uncertainty information outside of the ensemble, as in Morice et al. (2012).

Spatial Analysis of Temperature Anomaly Fields
HadCRUT5 now includes an ensemble spatial analysis that reconstructs more spatially extensive anomaly fields from the available observational coverage. The purpose of this analysis is to: (1) reduce uncertainty and bias associated with estimation of global and regional climate diagnostics from incomplete and uneven observational sampling of the globe; (2) provide improved estimates of temperature fields in all regions; and (3) provide a method to quantify uncertainty in anomaly patterns.
We adopt a Gaussian process based method for spatial analysis that is closely related to the ordinary kriging approach (Rasmussen & Williams, 2006), and apply the method independently to land air temperature and SST observations before merging the two to produce a global analysis. The method models monthly temperature anomaly fields as realizations of a Gaussian processes with a simple covariance structure, defined as a function of the distance between locations, and an a priori unknown mean, and accounts for observational uncertainty. A detailed description of the analysis method is presented in Appendix A.
The Gaussian process method is applied to the 5° latitude by 5° longitude gridded anomaly fields of the land ensemble and the HadSST4 ensemble. The additional observational uncertainty terms that accompany these input ensembles are provided to the Gaussian process estimation as monthly error covariance matrices. The spatial reconstructions are based upon a model of the covariance structure of the 5° latitude by 5° longitude anomaly fields. This covariance structure is modeled using a Matérn covariance function, for which the covariance decays as a function of Euclidian distance between locations. The parameters of the Matérn covariance function are fitted separately for land air temperature and SST anomalies (see Appendix A.2), representing typical variability in each domain.
As a Bayesian method, the approach provides a framework for assessing analysis uncertainties and provides the capability to draw samples from the posterior distribution of the analysis. We generate an ensemble of field estimates through application of the analysis method to each input ensemble member and then drawing samples from the posterior distributions of the Gaussian process estimates. The land and ocean analysis ensembles combine all sources of uncertainty represented in the input gridded data sets while respecting the estimated covariance structure of the temperature anomaly field so that each ensemble member is a plausible spatial analysis of the temperature anomaly field.
The analysis has limited capability to reconstruct temperatures at long distances from available observations, as the field estimates are based on a model of local covariance structure. We therefore introduce criteria for excluding regions where there is not a strong observational constraint on the analysis (see Appendix A.4). The masked land air temperature and SST anomaly ensembles are then merged, as described in Section 3.4.

Blending Land Air Temperatures with Sea-Surface Temperature Data
The 200-member ensemble land air temperature data set based on CRUTEM5 and the 200-member HadSST4 are merged as a weighted average of the 5° latitude by 5° longitude land and marine fields. Two versions of the data set are provided: one that uses the spatial analysis method presented in Section 3.2 and one that does not.

Merging Noninfilled Data Sets
For the noninfilled data set, the land air temperature ensemble and HadSST4 ensemble members are merged following the methods of Morice et al. (2012). The temperature anomaly T(s, t) at location s and time t is defined as the weighted average of the air temperature anomaly T L (s, t) and sea surface temperature anomaly T M (s, t), with weights f(s, t): T s t f s t T s t f s t T s t    The weighted average is based on the areal fraction of land and sea in a 5° latitude by 5° longitude grid cell using the same land fraction data set as HadCRUT4, originally derived from the Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA; Donlon et al., 2012) 0.05° land mask information. As in Had-CRUT4, land air temperature information receives a minimum weighting of 25% to prevent island stations from receiving near-zero weighting. Where only one of the land air temperature or SST data sets are available, the available data source receives 100% weighting.
Methods for merging the uncertainty fields and measurement error covariance information for land and marine data sets are unchanged from those described in Morice et al. (2012) and are detailed in the Supporting Information.

Merging Land and Ocean Analyses
The land-sea weighting scheme is modified for the HadCRUT5 analysis. Areas of sea ice are treated as if they were land in the weighting (consistent with the approach used by Cowtan and Way (2014)), so that temperature anomalies over sea ice are reconstructed as part of the air temperature analysis rather than the SST analysis.
Sea ice concentrations are obtained from the HadISST.2.2.0.0 data set. Where the ice concentration on the native 1° latitude by 1° longitude HadISST.2.2.0.0 grid exceeds 15%, the threshold value used to define the ice edge in Titchner and Rayner (2014), the area is considered to be ice covered for the purpose of deriving weights. Ice concentrations below 15% are treated as open water. For each HadISST.2.2.0.0 grid cell, a value of one is set if the sea-ice concentration is greater than 15% and zero otherwise. On the 5° latitude by 5° longitude HadCRUT5 grid, the fractional area of water covered by sea ice is then obtained through area-weighted averages of the nonland 1° grid cells of ones and zeroes. This area of ice-covered water is treated as land when deriving weights for land and ocean analyses.
The 25% minimum weighting for land air temperature is retained for any 5° latitude by 5° longitude grid cells that are observed in the noninfilled land air temperature data set so that information from island stations is not lost in the averaging. This minimum weighting is not applied in grid cells that are not directly observed. Reconstructed land air temperatures are not used over 100% sea regions where there are no land stations or sea ice and, similarly, interpolated SST is not used over 100% land regions. This prevents extrapolation of land air temperature far into ocean regions and prevents inland extrapolation of SSTs.

Estimating Uncertainty Arising from Incomplete Coverage
Spatial fields of temperature anomalies in the noninfilled HadCRUT5 data set and the HadCRUT5 analysis are not globally complete. Variability in regions of the world that are not represented in the spatial fields gives rise to uncertainty in global and regional time series. For the noninfilled HadCRUT5, the coverage uncertainty accounts for regions of the globe where insufficient observations are available to compute grid cell average anomalies in the underlying air temperature and SST data sets. For the HadCRUT5 analysis, the coverage uncertainty accounts for the masked regions of the analysis that are not well constrained by observations.
Coverage uncertainty is assessed by subsampling globally complete reanalysis fields to the coverage of Had-CRUT5 using the method presented in Brohan et al. (2006) and Morice et al. (2012), which is described in detail in the Supporting Information. The approach is updated here to use the recently released ERA5 reanalysis (Hersbach et al., 2020) as the globally complete reference data set, in place of the previously used NCEP/NCAR reanalysis (Kalnay et al., 1996). Temperature anomalies are computed for the ERA5 monthly 2 m air temperature grids, referenced to the period of ERA5 that overlaps with our climatology period: [1979][1980][1981][1982][1983][1984][1985][1986][1987][1988][1989][1990]. Anomalies are then averaged to the 5° latitude by 5° longitude grid used in HadCRUT5 to produce the reference fields for the coverage uncertainty calculations.

Effects of Updated Data and Methods in HadCRUT5
Differences in global and hemispheric mean time series between HadCRUT4 (version HadCRUT.4.6.0.0) and the HadCRUT5 noninfilled data set and HadCRUT5 analysis are shown in Figure 1. The differences between the noninfilled HadCRUT5 and HadCRUT4 primarily arise from updates to the SST observational bias assessment in HadSST4. The updated bias corrections result in slightly cooler anomalies globally and in each hemisphere from the 1880s to 1970s. Anomalies are warmer from the 1980s onwards.
The most obvious difference is the relative warming of HadCRUT5 between around 1970 and 1980. This arises from improved estimates of biases in measurements made in ship engine rooms at that time. Engine room measurements were biased warm in the 1960s with the warm bias dropping over time, first between 1970 and 1980 and then again between the early 2000s and present. There are also changes around the Second World War, where changes to the assumptions made in HadSST4 about how measurements were taken shifted the mean and broadened the uncertainty range, reflecting the lack of knowledge of biases during this difficult period (Kennedy et al., 2019).
Northern hemisphere uncertainty estimates for the noninfilled HadCRUT5 are slightly wider that those of Morice et al. (2012). This results from a combination of the changes in the SST bias adjustment model and the adoption of ERA5 as the reference data set for coverage uncertainty calculations (Section 3.5). This change of reference data set typically gives wider uncertainty estimates in the northern hemisphere for similar observational coverage. The reverse is true in the southern hemisphere, with similar or slightly smaller coverage uncertainty estimates for the noninfilled HadCRUT5. This reflects differences in regional variability in sparsely observed regions between reanalysis products.
Further differences from HadCRUT4 can be seen in the HadCRUT5 analysis. Temperatures in the latter decades of the 19th century are on average cooler than in the noninfilled HadCRUT5 data set in the global and northern hemisphere series. Temperatures in the 21st century are on average warmer than those in the noninfilled HadCRUT5, primarily due to estimation of additional areas of warm anomalies in high latitude regions in the northern hemisphere, including use of air temperature anomalies over sea ice inferred from land stations. Rebalancing the representation of land and marine regions also affects average temperatures throughout the record. This is consistent with previous studies that adopt local interpolation methods MORICE ET AL.
10.1029/2019JD032361 12 of 28 (c) (Cowtan & Way, 2014;Karl et al., 2015;Lenssen et al., 2019). Together these features result in greater warming throughout the 20th and 21st centuries in the HadCRUT5 analysis than is indicated by the noninfilled data set. However, for any given year, the effect of the reconstruction may be to produce either a warmer or cooler annual average and is dependent on variability in reconstructed regions that were not well represented in HadCRUT4 (see also Figures 5b and 5d). Global and northern hemisphere HadCRUT5 analysis series fall outside the upper 95% uncertainty limit of HadCRUT4 in the 21st century but rarely depart from the uncertainty range of the HadCRUT5 noninfilled data set, which includes the updated HadSST4 bias adjustments and has wider northern hemisphere coverage uncertainty ranges.
The uncertainty range for the HadCRUT5 analysis is narrower than that for the noninfilled data set, as the infilling effectively reduces the coverage uncertainty by filling gaps in the data and accounting for the nonuniform distribution of observations. The effect of this can be clearly seen in the Southern Hemisphere (Figure 1) where the narrowing of the uncertainty range before the 1950s is much less than after the 1950s, when routine monitoring on the Antarctic continent started, and coverage of the HadCRUT5 analysis thereafter approaches 100%.
As discussed in Section 3.1, the error model structure for the noninfilled HadCRUT5 data set is the same as in Morice et al. (2012), with observational bias adjustment uncertainties encoded into the ensemble and separate measurement and sampling uncertainty information provided and propagated into the uncertainty ranges on the hemispheric and global averages shown in Figure 1. The approach adopted for the Had-CRUT5 analysis differs in including the effects of measurement and sampling uncertainties in the ensemble while also sampling from the spatial analysis uncertainty. Examples of HadCRUT5 analysis ensemble members are shown in Figure 2.
There is little change in the HadCRUT5 analysis ensemble spread for global or hemispheric averages from the 1970s onwards, reflecting the spread in the underlying SST ensemble and the relatively stable spatial sampling during this period. The ensemble spread in the global average in the 1940s is similar to that prior to the 1870s, though in the 1940s, this spread arises predominantly from uncertainty in the SST biases, whereas prior to the 1870s, the spread is largely due to uncertainty in the spatial field estimates due to limited observational sampling of the globe.
There is coherent spatial structure in the deviations of ensemble member fields from the ensemble mean. This results from uncertainty in the spatial analysis and its estimation from uncertain observations. Some ensemble members may be cool while others are warm in regions where uncertainty is high (e.g., see differences between ensemble members in Antarctica in Figure 2). The additional coverage uncertainty arising from masked regions is a relatively smaller component of the total uncertainty as a result of the increased coverage in the HadCRUT5 analysis fields and the inclusion of reconstruction uncertainty within the ensemble. On multiannual timescales, the uncertainty in observational bias adjustments becomes prominent. This is reflected in persistently warm or cool departures from the ensemble mean in global and regional diagnostics over many years for individual ensemble members (for example see ensemble series in Figure 2). Noninfilled HadCRUT5 ensemble members are shown in Figure 3, matching those shown for the Had-CRUT5 analysis in Figure 2. HadCRUT5 analysis fields have greater spatial extent than the noninfilled data set and are also smoother as a result of measurement and sampling uncertainties being taken into account within the analysis framework. In regions of few, scattered observations, infilled analysis fields have much greater extent but also show diversity in reconstructed anomaly patterns, reflecting uncertainty in the reconstruction in these sparsely observed regions.
Uncertainty ranges for the global average temperature series in Figure 3 show the ensemble spread in relation to the full uncertainty range, accounting for all quantified sources of uncertainty. While the HadCRUT5 analysis and noninfilled data set quantify uncertainty from the same error sources, the HadCRUT5 analysis encodes a greater portion of the uncertainty into the ensemble, whereas the noninfilled ensemble only samples uncertainties that are most important over multidecadal time scales. The ensemble for the noninfilled HadCRUT5 data set samples the uncertainty associated with observational bias adjustments, with structure that is relevant to multidecadal climate assessments. Unlike the HadCRUT5 analysis, measurement and sampling uncertainties that are relevant at shorter time scales are not encoded into the ensemble and are instead provided as auxiliary information. Uncertainty from incomplete global coverage of the observing network is a greater portion of the total uncertainty for the noninfilled data set. In contrast, for the Had-CRUT5 analysis, the uncertainty from incomplete global coverage is divided between the analysis ensemble spread in reconstructed regions and a smaller coverage uncertainty term relating to regions that are masked.

Global, Hemispheric and Regional Series
Annual global and hemispheric average temperature anomaly series for HadCRUT5 are shown in Figure 4, along with the fraction of regional data coverage represented in the noninfilled data set and the HadCRUT5 analysis.
Areal data coverage in the HadCRUT5 analysis grids first reaches 90% in the 1900s, with two subsequent drops in coverage in the late 1910s and early 1940s associated with the two world wars. Northern hemisphere coverage exceeds 99% in the early 1920s and reaches 100% in the mid-1950s. Uncertainty in southern hemisphere temperatures is greatest in the period prior to the establishment of a sustained Antarctic MORICE ET AL.  1942, 1958, and 2016 in four example ensemble members. Lower panel: ensemble spread in global mean (°C), 1850-2018. The difference between each ensemble member and the ensemble mean is shown by the gray lines, with the first four ensemble members (corresponding to the maps above) highlighted in red. Gray shading: 95% confidence interval determined by the ensemble spread. Orange: full uncertainty range adding the additional coverage uncertainty term. Global means have been calculated by averaging anomalies for northern and southern hemispheres for each ensemble member. Maps require six months of data within a year for a grid cell average to be plotted. monitoring network in the 1950s (see also Figure 5a), after which global coverage exceeds 97% in the 1960s. The spatial extent of the observing network in the southern hemisphere is also a prominent contribution to uncertainty in global average series prior to the 1950s. Global coverage of the analysis fields is typically not complete even in modern years due to an absence of sustained observation in the southern South Pacific, and the nearby Southern Ocean and Antarctic.
Southern Hemisphere anomalies are cooler in the HadCRUT5 analysis in the 1990s from around 1992, particularly in 30-60S (Figure 5b). The observing network is less dense in these regions, with regular shipping covering only the equatorward half of the latitude band, leading to differences between noninfilled Had-CRUT5 and the HadCRUT5 analysis. Variability in the regional time series ( Figure 5) is smaller in the early record in the HadCRUT5 analysis than the noninfilled data set, particularly in the high latitude regions as a result of reduced uncertainty from spatial sampling in the HadCRUT5 analysis.
In regions where data are sparse, and hence uncertainty in surface temperature analyses is largest, data that might be used to validate the analyses is also highly limited. Here we have used the ratio of posterior to MORICE ET AL.

10.1029/2019JD032361
15 of 28  (°C, relative to 1961-1990) for 1877, 1942, 1958, and 2016 in four example ensemble members. Lower panel: ensemble spread in global mean (°C), 1850-2018. The difference between each ensemble member and the ensemble mean is shown by the gray lines, with the first four ensemble members (corresponding to the maps above) highlighted in red. Gray shading: 95% confidence interval determined by the noninfilled ensemble spread. Orange: full uncertainty range including additional measurement and sampling uncertainty terms, that are not sampled by the noninfilled ensemble, and the coverage uncertainty term. Global means have been calculated by averaging anomalies for northern and southern hemispheres for each ensemble member. Maps require six months of data within a year for a grid cell average to be plotted.
prior variances to remove regions with weak observational constraint (see Appendix A for details). Despite restricting the reconstruction to regions that are locally constrained, there is a marked increase in the area of the globe represented by the HadCRUT5 analysis in comparison to the noninfilled data set (see coverage timeseries in Figure 5 and example monthly fields in Figures S10-S13 of the Supporting Information). Figure 6 reveals the patterns of change in successive 30-year periods and the most recent 19 years of the HadCRUT5 analysis. Even in these longer-term averages, there are regions that are particularly warm or cool relative to the global mean. The final panel for 2000-2018 illustrates the greater warming at high northern latitudes and over the land compared to the ocean. The surface waters of the Southern Ocean, in contrast, have warmed more slowly than many other areas. We also see one area of long-term cooling, to the south of Greenland and Iceland . 1880-1909 was a particularly cool period, with centers of low average anomalies in the South Atlantic, Canada, and Central Russia.
MORICE ET AL.

Comparisons with Other Analyses
Average temperature changes over the whole period of record in 30° latitude bands for a range of analyses are shown in Figure 7. The HadCRUT5 analysis method is closely related to the method used in Cowtan & Way but differs in three key aspects. First, it accounts for the spatial variation in data uncertainty as well as the estimated measurement and sampling error covariances. This is particularly important for the oceans, where less-reliable ship data are combined with more accurate data from drifting and moored buoys. Second, the spatial analysis method is used to make improved temperature estimates at all locations, not just grid cells without data. Third, by using a full covariance model for both the temperature field and the observational uncertainty within a Bayesian analysis framework, it is possible to sample from the posterior of the distribution to generate a consistent ensemble data set that combines all known sources of uncertainty while respecting the estimated covariance structure of the temperature anomaly field.
The differences between the HadCRUT5 analysis ensemble mean and Cowtan & Way in the post 1950 period, are largely due to changes in the estimated SST biases. As Berkeley Earth shows similar differences and uses the same SST data set as Cowtan & Way, we can infer that changes in the estimated SST biases are the key difference here as well. The changes in SST bias estimates are larger in the more sparsely observed regions-the tropics and southern hemisphere-where there are fewer ships, so changes in assumptions about observing practice of a few countries can have a proportionately larger effect.
Differences between HadCRUT5 and the ERSST-based data sets, GISTEMP and NOAAGlobalTemp are also largely due to differences in estimated SST biases. In particular, ERSST tends to be cooler than HadSST4 from the early 20th century to the start of the Second World War and from the end of the war to around 1955; this difference is associated with uncertainty in the estimated biases associated with bucket measurements, particularly in the Southern Hemisphere and the tropics. From the 1960s, agreement between HadSST4 and ERSST is better, though there is a notable cooling of ERSST relative to HadSST4 in the early 1990s associated with a relative cooling of marine air temperature compared to SST (see Kennedy et al., 2019). From the late 1990s onwards, both ERSSTv5 and HadSST4 show good relative stability compared to instrumentally MORICE ET AL.
Differences can be seen in the first half of the 20th century between GISTEMP/NOAAGlobalTemp and Cowtan & Way/HadCRUT5 over the latitude band 0°N-30°S with GISTEMP/NOAAGlobalTemp cooler (Figure 7c). Regional differences over land partly result from differences in homogenization and the underlying station data sets. HadCRUT5 uses homogenized station data (from CRUTEM5), as provided by national meteorological services or research projects. Other data sets include automated homogenization algorithms Menne et al., 2018;. This may result in regional differences between data sets, particularly where the measurement network is less dense and, as a consequence, there is greater uncertainty in homogenization.
Temperature changes relative to the average over the late 19th century are shown in Figure 8. The 51-year period 1850-1900 is often considered for practical purposes to be representative of preindustrial conditions. This approximation of preindustrial temperatures is consistent with that adopted in IPCC AR5 (Hartmann et al., 2013) and IPCC SR1.5 (Allen et al., 2018), noting that any choice of period is a compromise, with MORICE ET AL.

10.1029/2019JD032361
19 of 28 (c) (f) natural variability and forcing playing a role (Hawkins et al., 2017). For analyses that do not extend back to 1850 (NOAAGlobalTemp and GISTEMP), 1880 to 1900 is used as the reference period here. By referencing the time series to this early period, the spread of temperature anomalies later in the series is increased. This increased spread reflects uncertainty in temperatures in the early reference period and not uncertainty in recent temperature changes. On the global mean, the analyses are remarkably consistent with one another despite the differences in their construction.

Conclusions
An updated data set of global near-surface temperature change, HadCRUT5, is presented. Updates in the CRUTEM5 data set have expanded the underlying land station series and provided additional data quality checks. Updates in HadSST4 have brought improved understanding of the evolution of the marine observing network, contributing improved bias adjustments and uncertainty estimates. These are combined both in a noninfilled data set and in a new ensemble statistical analysis that provides a more spatially complete assessment of global and regional changes and uncertainty therein.
The new HadCRUT5 analysis ensemble samples a greater range of the quantified uncertainties than our previous assessment ( For all data sets except for ERA5, anomaly series are computed by adjusting monthly time series to the appropriate baseline using data available in the anomaly reference period before averaging to annual series. ERA5 timeseries are shifted to match the 1981-2010 average for the HadCRUT5 analysis series, due to insufficient data in the climatology periods to compute anomalies. Anomaly series and uncertainties provided by the data set producers using each data set's native methods are shown in Supporting Information Figure S9.

(a) (b)
methods, measurement and sampling errors and spatial analysis uncertainty are all encoded into the expanded 200-member ensemble, communicating the major known sources of uncertainty in an easily accessible way.
Time series of globally averaged temperature anomalies show greater 21st century warming for the Had-CRUT5 analysis than for the HadCRUT5 noninfilled data set. The increased warming is predominantly associated with improved representation of the rapidly warming but sparsely observed high latitudes of the northern hemisphere. This finding is consistent with other independently produced statistical analyses of global temperature changes and is also consistent with temperature changes observed in reanalysis data sets that assimilate observational data into a numerical weather prediction model (Blunden & Arndt, 2019;Gelaro et al., 2017;Hersbach et al., 2020;Kobayashi et al., 2015).
The HadCRUT5 analysis indicates that globally averaged temperatures in the second half of the 19th century were on average cooler than estimates based on noninfilled HadCRUT5. This is also consistent with assessments based on other independently produced statistically infilled analyses. Combined with the evidence of increased warming in recent years, the infilled analyses indicate that warming since the 19th century is likely greater than is indicated by HadCRUT4 as a result both of observational sampling in the noninfilled data set and of updates to our understanding of biases in SST measurements resulting from changes in the make-up of the marine observing network.
There is, however, uncertainty in our understanding of 19th century temperatures resulting from limitations in observational sampling, particularly in the southern hemisphere, and uncertainty associated with residual observational biases. Uncertainty remains in the early instrumental record in locations for which observational data are not available to inform the analysis. This is most evident in the Antarctic, the Arctic, and regions of the southern hemisphere land, prior to the establishment of permanent observing sites.
Methodological choices in representation of data sparse regions in different data sets lead to differences between global and regional average temperature time series. The impacts of these choices are most evident in regions and at times in which the observational data required to constrain the analysis is limited or unavailable, particularly in regions of the southern hemisphere in the early record. The spread of 19th century temperature analyses produced by different monitoring centers in part reflects the sensitivity to differences in methods used. These methods assume different statistical models for the data; therefore, the differences between analyses are not necessarily captured by the uncertainty estimates of any single method.
The updated analysis methods assist in mitigation of the impacts of low availability of observational data in data sparse regions. We anticipate that an extension, in potential future work, of the analysis covariance model to describe regional variation in variability would further improve the analysis temperature fields and uncertainty estimates. However, digitization of as yet unavailable observations and submission of these to open archives continues to be invaluable to improve regional data coverage and reduce uncertainty further.
The use of marine air temperature observations has recently been proposed to reconcile differences between data sets produced as a blend of SST and air temperature observations and model-based studies using near-surface air temperatures over ocean (Cowtan et al., 2015;Richardson et al., 2016). However, uncertainties in observed long-term changes in marine air temperature and their differences from observed SSTs are important to understand (Chan et al., 2019;Chan & Huybers, 2019;Kennedy et al., 2019), and the marine air temperature observing network is less robust than that for SST and is in long-term decline . Challenges also remain in monitoring near-surface temperature changes in the cryosphere, given sparse observational coverage and changes in sea-ice extent, with impacts on downstream assessments (Richardson et al., 2018).
Relative biases in SST measurements arise from differences in measurement methods and instrumentation. Such biases change regionally and over time with gradual as well as abrupt changes in the composition of the observing network or underlying databases. The characteristics of different bias adjustment schemes can be seen in the differences between analyses, broadly grouping data sets into those (GISTEMP, Lenssen et al. (2019) and NOAAGlobalTemp, Huang et al. (2019)) that adopt the ERSST v5 data set , those (Cowtan & Way (2014) and Berkeley Earth, Rohde & Hausfather (2020)) that adopt HadSST3 (Kennedy et al., 2011a(Kennedy et al., , 2011b, and that which uses the improved HadSST4 data set (Kennedy et al., 2019), as is documented here. Differences between bias adjustments applied in each data set are smaller than the assessed adjustments themselves, which result in a net reduction in observed warming compared to unadjusted measurements (Kennedy et al., 2019).
Nevertheless, differences in SST bias assessments feature prominently as a source of difference between studies and remain a key uncertainty in assessing long-term change .
Despite methodological differences, temperature series derived from different analyses are in good agreement, generally lying within the assessed uncertainty range of the HadCRUT5 analysis. Updates in Had-CRUT5 bring our estimates of global and hemispheric series closer to those of other recent studies. Remaining differences between estimates are understood to predominantly arise from differences in spatial analysis methods applied and differences in how each analysis accounts for changes in marine observing methods.
Cross covariances between observed grid cells and prediction grid cells (i.e., the full output grid) are defined as K cross . We define K y as the sum of the covariance K obs and the observational error covariance R: The observational error covariance matrices are constructed from the error model terms of the noninfilled data sets. When the analysis method is applied to an ensemble member of the land air temperature ensemble (i.e., the observation vector y contains the grid cell values for an individual land ensemble member for one month), the observational error covariance R contains the additional uncorrelated within-grid-cell measurement and sampling error variances on the leading diagonal with zeros elsewhere. When applied to a SST ensemble member (i.e., y contains the grid cell values for an individual HadSST4 ensemble member), R is constructed from the HadSST4 per-platform uncertainties for the partially correlated error component, provided as full error covariances in HadSST4, with additional uncertainty from uncorrelated measurement and sampling error variances added onto the leading diagonal.
Estimation proceeds following Rasmussen and Williams (2006). The expected value of the anomaly field g * given the observations y is defined as where: and: The posterior covariance |  Σ g y for the Gaussian process prediction is given by: Together, |  g y µ and |  Σ g y define the full posterior distribution of the Gaussian process estimate of the gridded temperature anomaly field g * for all output grid cells, given observations y.

A.2. Kernel Hyperparameter Estimation
The estimation of the amplitude (σ) and decorrelation range (r) parametersof our spatial model is based on application of the maximum marginal likelihood method that is described in Rasmussen and Williams (2006 here, N is the number of observed grid cells in y and J is the number of covariates included in the regression portion of the analysis model. We include a single covariate for the analysis field mean, hence 1 J  in our application. The hyperparameters are fit to monthly 'best estimate' gridded temperature anomaly fields separately for land air temperatures and SSTs. Observational uncertainties are derived from the HadCRUT5 land ensemble uncertainty model (described in Morice et al., 2012) and HadSST4 uncertainty model (Kennedy et al., 2019), as described below.
As we fit hyperparameters to best estimates of the nonfilled grids, we include an additional uncertainty component in the observational error covariance to represent the observational bias uncertainty that is encoded into the land ensemble and the HadSST4 ensemble. Hence, when fitting hyperparameters, an extended observational error covariance R' is substituted for R where ensemble    Σ R R and Σ ensemble is an error covariance matrix that is empirically derived from the ensemble. The ensemble-derived error covariance matrices are only used when fitting hyperparameters for the best estimate fields. They are not included in the observational error covariance term when fitting the analysis fields for individual ensemble members in Appendix A.3.
For land hyperparameter estimation, the monthly observation vector y is constructed from a CRUTEM5 best estimate field. The observational error covariance R is constructed from the uncorrelated measurement and sampling uncertainty grids, from the Brohan et al. (2006) error model, while Σ ensemble is computed from the HadCRUT5 land ensemble. For marine hyperparameter estimation, the observation vector y is constructed from a HadSST4 ensemble median field. The observational error covariance matrices R are constructed by combining HadSST4 uncorrelated measurement and sampling uncertainties with the HadSST4 "micro bias" error covariance matrices and Σ ensemble is computed from the HadSST4 ensemble.
Hyperparameter estimates are computed for each of the 360 monthly fields in the 1961 to 1990 climatology period, during which the observational sampling is near global in extent. The hyperparameters used in the analysis are taken as the average of the hyperparameters fitted in the 360 monthly optimizations, with scale parameters rounded to the nearest 0.05˚C and range parameters rounded to the nearest 50 km. The resulting amplitude parameter σ and range parameter r for the land air temperature analysis are 1.2  ˚C and 1300 r  km. For the sea surface temperature analysis, the fitted parameters are 0.6  ˚C and 1300 r  km. The smoothing parameter was fixed at 1.5   . This model represents typical land and marine temperature anomaly variability. The model does not include regional and seasonal variations in these parameters, nonetheless where there is a sufficient observational constraint the method can reproduce appropriate regional and seasonal variability in the analysis anomaly fields. Additional information on the monthly hyperparameter fits can be found in the Supporting Information.
The criteria used to mask regions, defined in terms of a threshold α, is based on the ratio of posterior and prior variance of the local Gaussian process estimate, omitting the global regression term which has an improper prior, with regions of the analysis masked where the following inequality is satisfied: The left-hand side of Equation A14 is bounded between zero and one and we use a threshold of 0.25   to provide a balance between retaining regions with useful information content and masking those regions that have a weak observational constraint. Global and hemispheric average temperature series for varying α are provided in the Supporting Information and indicate that these diagnostics are insensitive to the choice of α values in the range 0.1-0.5

Data Availability Statement
The gridded temperature anomalies, the global and hemispheric timeseries and their uncertainty intervals will be available from the Met Office website (https://www.metoffice.gov.uk/hadobs) HadCRUT5 data will be archived for long term preservation and reuse as part of the