Using Ice Cores and Gaussian Process Emulation to Recover Changes in the Greenland Ice Sheet During the Last Interglacial

The shape and extent of the Greenland Ice Sheet (GIS) during the Last Interglacial (LIG) is a matter of controversy, with different studies proposing a wide range of reconstructions. Here, for the first time, we combine stable water isotopic information from ice cores with isotope‐enabled climate model outputs to investigate the problem. Exploring the space of possible ice sheet geometries by simulation is prohibitively expensive. We address this problem by using a Gaussian process emulator as a statistical surrogate of the full climate model. The emulator is calibrated using the results of a small number of carefully chosen simulations and then permits fast, probabilistic predictions of the simulator outputs at untried inputs. The inputs are GIS morphologies, parameterized through a dimension‐reduction technique adapted to the spherical geometry of the setting. Based on the emulator predictions, the characteristics of morphologies compatible with the available ice core measurements are explored, leading to a reduction in uncertainty on the LIG GIS morphology. Moreover, a scenario‐based approach allows to assess the gains in uncertainty reduction which would result from the availability of better dated LIG measurements in Greenland ice cores.


Introduction
The extent and shape of the Greenland ice sheet (GIS) during the Last Interglacial (LIG; 129 to 116 ka) is a matter of some controversy. Mean global sea level is thought to have been increased by between 6 and 9 m compared to present-day level during this time period (e.g., Dutton et al., 2015;Kopp et al., 2009), with a likely contribution from both the Antarctic (e.g., DeConto & Pollard, 2016;Sutter et al., 2016) and the GISs (e.g., Fyke et al., 2011;Helsen et al., 2013;Lhomme et al., 2005;Tarasov & Peltier, 2003). The GIS contribution to the LIG highstand has however been difficult to pin down: Studies suggest that it may have contributed anywhere between +0.3 and +5.5 m to global mean sea level (Born & Nisancioglu, 2012;Cuffey & Marshall, 2000;Helsen et al., 2013;Otto-Bliesner et al., 2006;Quiquet et al., 2013;Robinson et al., 2011;Stone et al., 2013;Tarasov & Peltier, 2003;Yau et al., 2016;NEEM community members, 2013). Possible drivers for these changes include stronger summertime insolation at high northern latitudes, and Arctic land summer temperatures 4-5 • C higher than those that occurred during the preindustrial (e.g., CAPE Last Interglacial Project Members, 2006).
Evidence proximal to Greenland about the LIG comes from sediment discharged from Greenland and from deep ice cores (e.g., Colville et al., 2011;NGRIP Project Members, 2004;NEEM community members, 2013). From the sedimentary evidence from cores drilled off the Greenland coast it is difficult to produce an assessment of the extent and shape of the LIG GIS (Colville et al., 2011). Ice sections corresponding to the LIG have been found, near the bed, at up to seven Greenland ice sites: NEEM, NGRIP, GRIP, Renland, DYE3, GISP2, and Camp Century (CC) (Johnsen & Vinther, 2007). See Figure 1 for location of the sites. The suggestion of LIG ice in every one of these cores hinges on the occurrence of high stable water isotope ( 18 O) measurements of the ice in the deepest parts of each core. At each of these sites, measurements from near the bottom show 18 O values that are above the range that occurred during the better dated most recent 120 ka (e.g., Johnsen & Vinther, 2007;NEEM community members, 2013). Climate model simulations of the LIG, forced by appropriate greenhouse gas and orbital changes alone, have tended to fail to capture these observed 18 O  (Govin et al., 2015;Bazin et al., 2013;Veres et al., 2013); GRIP (blue squares, dated by Landais et al., 2003); GISP2 (green squares dated by Suwa & Bender, 2008; and green circles dated by Yau et al., 2016). (b) Camp Century, 18 O on a depth scale (Johnsen et al., 2001). (c) DYE3, 18 O on a depth scale (Johnsen et al., 2001). (and temperature) changes (e.g., Gierz et al., 2017;Malmierca-Vallet et al., 2018;Masson-Delmotte et al., 2011;Sime et al., 2013;Sjolte et al., 2014).
If the LIG age interpretation of the near-bed ice at these ice core sites is correct, there must have been an extensive LIG GIS, with only a modest retreat during this period. However, this ice is difficult to date at several of the sites due to basal melting and complex flow interfering with age-depth modeling (NGRIP Project Members, 2004;NEEM community members, 2013;Landais et al., 2003). This LIG age uncertainty contributes to a wide range of possible LIG GIS scenarios. Some studies suggest that DYE3 was ice covered during the LIG (e.g., Colville et al., 2011;Willerslev et al., 2007), while others imply a complete absence of ice in southern Greenland (e.g., Koerner & Fischer, 2002;MacGregor et al., 2015). Some studies suggest strong melting in the south (e.g., Otto-Bliesner et al., 2006;Tarasov & Peltier, 2003), some in the north (e.g., Quiquet et al., 2013;Stone et al., 2013), and some in both (e.g., Born & Nisancioglu, 2012;Calov et al., 2015;Helsen et al., 2013;Robinson et al., 2011). Some studies even propose a two-dome structure with a small ice dome in south Greenland and a large ice dome in north Greenland (e.g., Calov et al., 2015;Langebroek & Nisancioglu, 2016).
Here we build on these studies with a new comprehensive investigation of the response of 18 O to changes in the shape and extent of the GIS at 125 ka. We combine a synthesis of all available 18 O data from deep Greenland ice cores, with simulation outputs from the Hadley Centre Coupled Model, Version 3 (HadCM3; Gordon et al., 2000;Tindall et al., 2009, referred to as simulator in the following). Since each simulation is computationally expensive and time consuming, the approach we follow is to use a Gaussian Process (GP) emulator: a fast and efficient statistical model providing probabilistic predictions of the simulator response to any GIS configuration, on the basis of the actual response on a selected set of GIS morphologies. Thanks Note. Most likely, maximum, and minimum values between 120 and 126 ka. All values are anomalies with respect to the present day. See section S1 in the supporting information for full details. a For NEEM, we take the value at the age 126 ka. For the other ice cores, this is the value provided by Johnsen and Vinther (2007). b We take the maximum/minimum value within 120-126 ka along the dated NEEM, GRIP, and GISP2 ice cores. For NGRIP, we take the maximum/minimum value from near the bed, as the ice core does not reach back to 126 ka. For CC and DYE3, we use the maximum value from near the bed as a maximum, and the present-day value as the minimum. Details are provided in section S1 of the supporting information.
to the emulator predictions, we inspect the properties of the GIS morphologies that are compatible with the available 18 O records, reducing the uncertainty with respect to the previous studies.
The work is structured as follows. In section 2 we summarize the available information from the current LIG 18 O measurements in Greenland. In section 3 the methodology is discussed. We describe the approach used to parameterize the range of possible GIS morphologies, section 3.1, and to select the ones used as input to the HadCM3 simulation runs, section 3.2. After a brief overview of the simulation setting, section 3.3, GP emulation is discussed in section 3.4: In section 3.4.1 a summary of the main steps and formulas needed to construct a GP emulator is provided, with the aim of providing a concise and self-contained reference of practical use; the specific choices and considerations to construct our GP emulators are detailed in sections 3.4.2 and 3.4.3. Section 3.5 illustrates the way in which the emulator predictions are compared to the ice core data, in order to identify record-compatible (RC) morphologies. In section 4, we discuss the emulator fit and show the results of the comparison between emulator predictions and ice core measurements, in terms of both mathematical properties and ice morphologies. Finally, implications for the GIS morphology during the LIG and for the ice core drilling community are discussed in section 5.

Data
We compile LIG stable water isotopic ( 18 O) information from six Greenland deep ice cores: NEEM, NGRIP, GRIP, GISP2, CC, and DYE3; see Figure 1 for references and illustration of the data, and also Johnsen and Vinther (2007) and Suwa et al. (2006). While LIG ice layers have been found in numerous Greenland deep ice cores, it is not straightforward to infer the LIG peak 18 O and the associated uncertainty range because of the difficulties attached to the dating of the LIG layers and the possibility of missing LIG layers.
NEEM is the only Greenland ice core depicting a continuous LIG climatic record; that is, it covers the time interval ∼114.5 to ∼128.5 ka (NEEM community members, 2013). Absolute dating uncertainties for NEEM are estimated to be of at least 2,000 years (Govin et al., 2015). Dating errors at the bottom for the other Greenland ice cores are likely larger. The ice stratigraphy at NGRIP is undisturbed but basal melting possibly removed the older LIG section. Hence, dating uncertainties are attached to the lack of data available to constrain the glaciological flow model used to establish the age model (NGRIP Project Members, 2004). While tentative reconstructions of the chronology of the bottom of the GRIP and GISP2 ice cores have been made using gas record synchronization with Antarctic ice cores (Landais et al., 2003;Suwa et al., 2006), the accuracy of dating the bottom of CC and DYE3 is limited by the poor preservation of deep ice samples. See section S1 in the supporting information for full details.
In Table 1, we summarize the previous information by providing a most likely, minimum, and maximum value at each site. All values are provided as anomalies with respect to the present day. At NEEM, GRIP, and GISP2, the LIG values are computed by considering the maximum and minimum 18 O values measured across the time interval 120-126 ka (Figure 1). At NGRIP, we take the maximum/minimum value from near the bed, as the ice core does not reach back to 126 ka. We acknowledge that inferred uncertainties attached to the NGRIP 18 O minimum anomaly might be underestimated due to dating uncertainties as well as elevation-related assumptions (see supporting information for details). For CC and DYE3 sites, we use the maximum value from near the bed as the maximum, and the present-day value as the minimum (see supporting information for details). The last choice is due to the fact that, at both sites, it is not possible to confidently identify ice from the 120-126 ka time interval. In section 4, we investigate a series of scenarios that help to estimate how much information would be gained from recovering better dated LIG ice from the CC and DYE3 core locations.

Methods
To tackle the problem of using ice core data to help constrain the morphology of the GIS at 125 ka, we develop a methodology using climate simulations and Gaussian process emulation to assess the response of 18 O in Greenland ice cores to changes in the shape and extent of the GIS. This requires a mathematical parameterization of the set of GIS morphologies (section 3.1), and the selection of a small number of design morphologies on which to run the 18 O climate simulator (section 3.2). Based on the 18 O-simulated responses (section 3.3) on the design morphologies, an emulator is built (section 3.4) that allows almost instantaneous 18 O predictions to any GIS morphology.

Parameterizing the Set of Ice Sheet Morphologies
We undertake the problem of parameterizing the set of ice sheet morphologies using a Principal Component Analysis (PCA) approach. This allows extrapolation of important morphology patterns from an initial set of LIG GIS reconstructions. A classical application of PCA to our ice sheet data set, however, would be inappropriate due to the spherical geometry of the Earth. We therefore apply the PCA ideas while weighting the components of a real vector, in our case representing the surface height of an ice sheet at different grid cells, by the area of the grid cells. This represents a particular case of generalized PCA, described, for example, in section 14.2.1 of Jolliffe (2002). The paper Salter et al. (2019) illustrates an application of the methodology within the uncertainty quantification literature. We provide full details of the approach applied to our case in section S4 of the supporting information.
To represent the wide range of GIS morphologies proposed for the LIG, we gather 14 GIS data sets from across several studies; see Figure 2 for references and illustration (and section S2 in the supporting information for full details). These ice sheet morphologies are used as a starting point to generate a much wider set of synthetic, potential morphologies. Prior to this, the 14 original morphologies are regridded onto a common, rectangular grid with resolution of 0.2 • in both directions, ranging over latitudes 59.7 • N to 83.9 • N and longitudes 73.6 • W to 11 • W. This results in a total of 122 × 314 grid cells for each morphology. The application of our PCA approach to this data set yields 13 Principal Components (PCs) V 1 , … , V 13 , each in the form of  Figure 3). We can generate new morphologies by taking linear combinations of the PCs, and adding the morphology M obtained as the average of the original data set. That is, any synthetically generated morphology M in this work is of the following form: for some real coefficients x 1 , … , x r . We choose r = 8 in light of the two following observations. Table 2 shows that the first eight PCs already account for more than 95% of the data set variance, hence capturing most of the variability displayed by the 14 original morphologies. Moreover, we observe that clear patterns can be seen in the plots of the first PCs: for example, PC1 in Figure 3 reveals that a positive coefficient x 1 in equation (1) will mainly increase the surface height of the generated morphology in the north, and decrease it in the south. Patterns can also be seen in higher PCs, but the omitted (9th to 13th) PCs seem more challenging to interpret in a physical sense.
Associating a morphology M to the vector x = (x 1 , … , x 8 ) of its coefficients in equation (1), and vice versa, allows us to parameterize morphologies using the Euclidean space R 8 . Within this parameterization, we can canonically associate the 14 original morphologies to vectorsx ( ) , … ,x ( ) ∈ R 8 , obtained by projecting each morphology onto the space generated by the first eight PCs. Notice that equation (1) with coefficients x =x does not recover exactly the jth original morphology, due to the truncation error. However, the differences are small, since the neglected components account for less than 5% of the original data set variance. By denoting withx ( ) i the ith component of the vectorx ( j) , one can consider the sample variance This provides a measure of the range covered by the original morphologies along the ith PC. The values of the standard deviation i associated with the ith PC are shown in Table 2.
For our analysis we consider a prior distribution on the shape of all ice sheets or, equivalently, on the parameter vectors x ∈ R 8 . Since the prior distribution should be concentrated in a region of the parameter space informed by the 14 original morphologies, we use independently, that is, under the prior the parameters are independent, normally distributed with mean 0 and variance i 2 . We will use this prior distribution, when designing the simulation runs in the next section, and when the compatibility with the ice core records is explored in section 4.2.
Finally, in order to avoid generating a GIS morphology with unrealistically low surface heights, at each grid point we consider either the bedrock height (using Stone et al., 2013 at 1.5 m of sea level rise) or the surface height from equation (1), whichever is greater. Once a morphology is generated, a corresponding land ice mask is required to carry out a 18 O climate simulation: at each grid cell, this specifies whether the morphology is covered by ice or not. For the 14 original morphologies, the mask is provided. For a new morphology M, we generate the mask cell by cell, by comparing the cell height to the corresponding heights of the original morphologies, further considering which of them have ice in the cell in question. Full details are provided in section S5 in the supporting information, where the mask generation method is also validated.

Choosing Morphologies for the 18 O Simulations
The isotope-enabled coupled atmosphere-ocean general circulation model employed in this work is HadCM3. The model has been used extensively, with 18 O specific code, to examine a variety of climate, sea ice, and ice sheet problems (e.g., Holloway et al., 2016Holloway et al., , 2017Malmierca-Vallet et al., 2018;Tindall et al., 2010). In the present study, HadCM3 is used to simulate 18 O during the LIG, at the six Greenland locations introduced in Section 2.
In order to build an emulator of the climate simulator HadCM3, the simulated 18 O response on some GIS morphologies is required. We call these design morphologies; the parameters x (j) ∈ R 8 associated with them through equation (1) will be referred to as design points. Since each HadCM3 simulation is expensive, needing more than a week of real time to complete, only a few simulations can be performed. Careful selection of design morphologies is important to ensure a good calibration of the emulator (see Montgomery, 2017).
In this study, we perform simulations in two waves. For the first wave, our aim is to generate design morphologies which are well scattered within the region of space that the emulator is required to explore. For this, we use a quasi-random sample from the prior distribution (3): We generate a Halton quasi-random sequence (Kocis & Whiten, 1997) of well-scattered vectorsx (j) in the eight-dimensional unit cube, and we apply the inverse of a cumulative distribution function of N(0, i 2 ) to the ith component of the vector x ( j) . This yields a sequence of design points x (j) with the correct variance along each PC, and whose corresponding morphologies systematically cover different GIS scenarios for 125 ka. The GIS morphologies corresponding to the sequence of points x (j) are generated, and the first 64 are selected that pass a basic plausibility test on their maximum height. The test consists in checking that the maximum surface height of a generated morphology is not greater than the mean plus four standard deviations of the set of maximum heights of the original morphologies. Two of the numerical simulations crashed for unknown reasons and were thus omitted from the analysis.
For the second wave, we use seven morphologies which are specially designed to assess how much small elevation changes in a morphology affect the simulator response. Each additional morphology is obtained upon perturbation of the parameter x ∈ R 8 associated with one of the first-wave morphologies (different in each case). In all seven cases, the induced difference on the surface elevation is order of only a few tens of meters. In two cases, differences of order of only few millimeters have been intentionally induced to test the chaoticity of the simulator response. This is also discussed in section 3.4.2, in relation to the use of a nugget term in equation (17). The total number of design morphologies that we use is therefore n = 62 + 7 = 69.

HadCM3 Simulation Runs
After the design morphologies are chosen, HadCM3 is used to reproduce the 18 O response to these modified GIS morphologies at the ice core sites. All LIG simulations are forced with greenhouse gas values and orbital forcing which are appropriate for 125 ka. Further details on the model and the simulations are provided in supporting information section S3. In order to minimize the impact of climate biases of the simulator, anomalies with respect to a preindustrial simulation are considered: More precisely, we use the difference between the time-averaged 18 O output from the last 50 years of the modified GIS LIG simulations and the time-averaged 18 O output from the last 50 years of the preindustrial simulation.
A well-known limitation in comparing paleoclimate data with model results is the discrepancy in resolution. Model grid cells cover areas of hundreds of square kilometers, while the location where the ice core has been extracted corresponds to a single point on the globe. Here, we use bilinear interpolation over the different grid cells in order to compute the simulated 18 O output at each of the ice core locations of interest. We note further that the coarse HadCM3 spacial resolution makes it difficult to reliably simulate 18 O values at the coastal margins. Accordingly, the small Renland ice dome (where LIG ice has been recovered) is not analyzed in this study in that not adequately captured within the HadCM3 resolution.
In Figure 4, the outputs of the n = 69 simulation runs are shown for each ice core site, together with bands highlighting the minimum and maximum 18 O anomalies reconstructed from the ice core records (Table 1). The figure shows that, for all pairs of locations, there are design morphologies whose simulated 18 O anomalies are within the corresponding data ranges, illustrating that the model is able to reproduce the observed data in principle. Closer inspection shows that there are seven runs that match the data on five of the sites simultaneously. Due to the limited number of design morphologies, there is however no run which matches all six of the sites. Shaded bands in red correspond to the ranges reconstructed using ice core records, using the minimum/maximum values shown in Table 1.

The Emulator
A GP emulator is a statistical model that effectively predicts the response of a complex simulator, on the basis of the actual simulator response at only a few inputs. In our case, we emulate the response of 18 O anomalies to changes in the GIS morphology at 125 ka, starting from the climate simulator response on the design morphologies. For the convenience of the reader, in section 3.4.1 we summarize the formulas needed to build an emulator. In the subsequent sections, we explain and justify the particular choices undertaken in relation to the problem explored here.

General Setting
The Bayesian framework of GP emulation is investigated, among others, in O'Hagan (1978), Sacks et al. (1989), and Kennedy and O'Hagan (2001). Further works address more specific questions, such as the use of appropriate covariance functions (Fricker et al., 2012), the use of a nugget term (Andrianakis & Challenor, 2012), or methods to validate the emulator (Bastos & O'Hagan, 2009). In the following, we provide a concise summary of the main steps and formulas leading to the construction of a GP emulator: We refer the reader to the aforementioned literature for a more in-depth overview.
We use lowercase letters for numbers and vectors (plain and bold, respectively), and uppercase letters for matrices (plain). The general setting is the one where the simulator has been run on n design points x ( ) , … , x (n) ∈ R p , leading to outputs 1 , … , n ∈ R. The emulator is a stochastic process providing probabilistic predictions of the simulator output at any point x ∈ R p , and which interpolates the value y j at the design point x (j) , j = 1, … , n.

10.1029/2019JF005237
To build the emulator, a prior mean function m(·) and covariance function v(·, ·) must be chosen. Following O'Hagan (1992), we assume for some coefficients ∈ R q and 2 > 0. Here, h(x) is a vector of q functions of the parameter x, and c(x, x ′ ) is a covariance function. The functions h(·) and c(·,·) must be chosen as part of the design of the emulator, while and 2 can be estimated from data.
The GP (·) with mean function m(·) and covariance function v(·, ·) is conditioned on the observations (x (j) ) = y j for j ∈ {1, … , n}. Usually, a "flat" prior distribution (O'Hagan, 1992) is chosen for the pair ( , 2 ). In this case, under the posterior distribution (·) is a Student's t process (Tracey & Wolpert, 2018) with n − q degrees of freedom. To provide the expression of the posterior mean and covariance functions, we introduce some notation: where y = (y 1 , … , y n ) ⊤ and, for any input x ∈ R p , define Using this notation, the posterior mean of 2 iŝ The posterior mean and covariance function of the process (·) are then given bŷ andv respectively. In particular, the emulator prediction at a point x ∈ R p is a Student's t distributed random variable with meanm(x), variancev(x, x) and = n − q degrees of freedom: this can be written aŝ where T denotes a standard t distributed random variable with degrees of freedom.
DOMINGO ET AL. 9 of 20

Choosing the Mean and Covariance Functions
In the present application, we fit separate emulators for each ice core site. For each site, the design points x (j) are the eight-dimensional vectors associated with the design morphologies (section 3.2), while the values y j are the corresponding simulated 18 O anomalies.
To help us decide on the form of the function h(·) used in the prior mean (4), we perform a multiple linear regression, describing y j as linear combination of x ( ) 1 , … , x ( ) 8 . For all the six locations, a linear model proves appropriate, showing unstructured residuals and adjusted coefficient of determination (adjusted R 2 ) greater than 0.89 for all sites. Given the good fit, we choose Using the notation in section 3.4.1, we have h(x) = ( 1, x ⊤ ) ⊤ , p = 8 and q = p + 1 = 9.
For the covariance function c(·, ·) used in (5) we use the form where D is the 8 × 8 diagonal matrix with diagonal elements d 1 2 , … , d 8 2 , and x,y equals 1 if x = y and 0 otherwise. The parameters d 1 , … , d 8 and will be estimated from data (section 3.4.3).
The first, squared-exponential term in the definition of c(·, ·) models the decay in correlation when the distance of inputs increases. The parameters d i are known as "correlation lengths." In addition to the squared-exponential covariance function used in (17), experiments with other covariance functions (Matérn with parameters = 3∕2 and = 5∕2; see Rasmussen & Williams, 2006) were carried out while training the emulator. The mean and variance of the emulator predictions, as well as cross-validated estimates of the emulation error, were remarkably similar in the three cases. Given the apparent smooth linear response of the vector y to the eight PC scores, we chose the smoothest of the three covariance functions, that is, the squared exponential.
The second term in the definition of c(·, ·) takes observational variance into account. In the presence of this term, the emulator provides non-deterministic predictions (i.e., with nonzero variance) even at the design points. This term is often referred to as "the nugget term" (see, e.g., Andrianakis & Challenor, 2012). In our case, its addition is necessary because the system simulated by HadCM3 is chaotic. Even after averaging the outputs over 50 years, GIS morphologies that are physically equivalent (i.e., with surface elevations differing only by millimeters) result in significantly different 18 O simulated anomalies. A numerical benefit of using a nugget term is that it makes the matrix A in (6) better conditioned. It thus helps to achieve numerical stability when computing the emulator predictions, a step that involves the use of the matrix inverse A −1 .

Estimation of Correlation Lengths and Nugget Term
The correlation lengths d i and the nugget term appearing in equation (17) can be estimated in different ways. A maximum likelihood approach is often used in the estimation of hyperparameters (see Andrianakis & Challenor, 2012;Berger et al., 2001): however, especially in higher dimensions, the maximization algorithm is often trapped in either local maxima or flat regions. In this work, we use the alternative approach described in this section.
First, we observe that all PCs are normalized and have therefore comparable size. Changes of similar magnitude in any two different components of an input parameter x ∈ R 8 therefore yield, overall, changes of similar magnitude in the corresponding ice morphologies. At a single-site level this becomes less true, with similar changes in different PCs affecting differently the surface height at the site. We notice, however, that the simulated 18 O response at a location does not depend only on the surface height at the location, but rather on the surface elevation within a larger area around it and, ultimately, on the whole morphology. In the light of this, assuming that similar changes in the components of a vector x have a comparable impact in the simulated 18 O at a given location does not appear unreasonable.
Separately, we note that the number of parameters that have to be estimated to build each emulator includes the nine mean regression coefficients 0 , … , 8 , the variance 2 , the nugget term , and the 8 correlation lengths d i . Given the limited number of training points (n = 69), an independent estimation of all the parameters presents a high risk of overfitting. In light of both this and the previous consideration, we Starting from the data set (x (j) , ) =1, … ,n , consider the n emulators calibrated on the data sets where the j th observation has been left out, respectively. Let ( ) d, (·) be the posterior density function at x (j) of such an emulator, where the parameters d = (d, … , d) and are used as correlation lengths and nugget respectively. From section 3.4.1 we know that ( ) d, is the density function of a t distribution with mean and variance as in (13) and (14). Now define .
The function f(d, ) is a measure of the goodness of the emulator predictions, when correlation lengths d and nugget are used. The parameters d and used in our emulators are chosen to maximize f(·, ·). We use the Matlab nonlinear solver fminunc and repeat the maximization multiple times from different starting points, to reduce the risk of getting stuck in a local extremum.

Data-Model Comparison
In order to carry out the data-model comparison, we follow the idea of using an implausibility measure, as, for example, in Craig et al. (1997), Vernon et al. (2010), and Williamson et al. (2015). Implausibility measures are used within the context of history matching: This last term, originally used in the oil industry (see the above-mentioned Craig et al., 1997), is now employed in the emulation literature to refer to the process of finding the inputs of a model whose outputs match historical observations. In our case, to select the morphologies compatible with the observed ice core records, we proceed as follows.
Letm L (x) andv L (x, x) be the mean and the variance of the 18 O emulator prediction at a given location L, for the morphology represented by x ∈ R 8 (equations (13) and (14)). Let us further denote by R L the most likely ice core record 18 O anomaly at that location, and by R − L and R + L the lower and upper bounds associated with it: values for R − L < R L < R + L are provided in Table 1. In order to quantify the mismatch, at the location L, between the emulator 18 O prediction and the ice core record information, we define the following "implausibility function": The quantity Var(Rec L ) is meant to provide a measure of the variability associated with the ice core record. Based on the analogy that the variance of a uniform random variable on an interval of length l is l 2 ∕12, we compute Var(Rec L ) as follows: Notice that, in equation (20), we deal with the two subintervals rather than the whole interval, hence the factor 1∕3 rather than 1∕12 appears. Finally, in order to obtain a quantitative measure of the mismatch between the emulator predictions for the morphology represented by x ∈ R 8 and the 18 O information at all the six locations, we define where L 1 , … , L 6 are the six locations in question.
We classify a morphology as RC, if the condition I(x) ≤ 2 holds, and at least 95% of the morphology cell heights are within 2 standard deviations of the average height of the 14 original morphologies at that cell. This ensures that RC morphologies pass a general physical plausibility test, in addition to being compatible with the data records. The possibility of relaxing the first criterion (I L (x) ≤ 2 for L = L 1 , … , L 6 ) to the five sites with lowest implausibility measure has also been considered, but the choice leads to neglect in most cases the DYE3 constraint (in line with the high 18 O anomaly recorded at the site). This is considered at odds with both the aim of the three-scenario analysis and the fact that DYE3 is the only southern site available in this study.
We conclude with a comment. In the classical emulation literature, implausibility measures are generally used in combination with a several-wave simulation approach: The latter sequentially rules out the parts of the input space for which the condition I(x) > c holds true and builds a new emulator on the "Not Ruled Out Yet" part of the space. The constant c is often chosen to be equal to 3 following Pukelsheim's 3 rule (Pukelsheim, 1994). In this work, resource constraints prevented us from performing several iterations of the procedure. Hence, the implausibility measure I(x), alongside the physical plausibility test described in the previous paragraph, is used to directly select morphologies, rather than rejecting them. For this reason we use a stricter criterion (c = 2) than the one more commonly found in the literature: The approach allows us to identify morphologies with a greater potential to match the ice core data. Properties of these morphologies are explored in the next section.

Fit of the Emulator
One emulator for each location is built according to the procedure described in section 3.4. The estimated values of the correlation lengths and nugget are shown in Table 3. With the only exception of CC, the estimated d are of comparable magnitude to the standard deviations i of the first PCs, see Table 2. In the case of CC, the estimation returns instead a relatively large d. After considerations, we decided to retain such value of d: indeed, it can be seen that for very large d the emulator converges to a simpler Bayesian linear regression model, which would still represent an appropriate limit case since the data at this site is well-described by a linear model, with an adjusted R 2 of 0.951.
To assess the quality of the fitted emulators, we consider the cross-validated residuals for each site: That is, for each simulation, we fit an emulator leaving out this simulation and use this emulator to predict the simulation output. This is done, in turn, for all n simulations. Figure 5 shows the differences between actual and predicted model outputs, at the six sites. The figure shows that nearly all simulation results lie within two emulator standard deviations of the prediction. Thus, both the values and the uncertainties predicted by the emulator seem appropriate. In particular, the emulator does not only make accurate predictions but also can accurately assess the uncertainty of its predictions.

Comparison Between Emulator Predictions and Ice Core Records
The six emulators allow us to efficiently predict, at the different locations, the 18 O response of the climate simulator to any ice sheet morphology. Using the methodology of section 3.5, emulator outputs can be compared to the available 18 O ice core information to decide whether a given GIS morphology is RC.
In order to carry out the data-model comparison detailed in section 3.5, values of R − L , R L and R + L must be specified at all the locations where compatibility with records is to be imposed. We use the values in Table 1. However, as explained in section 2, lower bounds for the 18 O anomalies at CC and DYE3 cannot be easily inferred from the records. In order to acknowledge this uncertainty, we carry out the analysis in the case of three different scenarios, according to how close these lower bounds are set to the most likely values R CC = +2.5‰ and R DYE3 = +4.7‰. The scenarios we investigate are as follows: 1. Loose scenario (only imposing nonnegative anomalies): 2. Middle scenario: 3. Tight scenario (R − L = R L − 1‰): For our analysis, we represent the prior distribution on all possible ice sheet morphologies by a set of N = 10 7 morphologies, sampled randomly from the prior distribution described in section 3.1. Each of these N morphologies is classified as either being RC or not, according to the criterion described in section 3.5. The resulting RC morphologies then form a sample from the posterior distribution, incorporating the constraints from the data. In the loose scenario case, where the imposition of the record compatibility at CC and DYE3 has little effect, the RC morphologies represent 5.75% of the morphologies sampled from the prior distribution. This percentage approximately halves (2.86%) in the middle scenario, and further decreases to 0.73% in the tight scenario. In each of the three scenarios, the percentage of morphologies that satisfy the condition I(x) < 2 but do not pass the physical plausibility test (end of section 3.5) is as follows: 16.5% (loose), 17.3% (middle), and 27.0% (tight). Perhaps not surprisingly, this shows that the condition of matching the six ice core records, as simulated by the HadCM3 model and later estimated via the emulator, is alone not sufficient to identify physical morphologies. Figure 6 shows how the posterior distribution compares to the prior, in the two cases of the loose and tight scenarios. Two-dimensional cross sections of the subspace spanned by the first three PCs are shown. The colored lines represent contours of the posterior densities (red for the tight scenario, blue for the loose one), and the labels indicate the percentage of RC morphologies that are selected along the corresponding contour.
The gray shaded background instead shows the Gaussian prior distribution.
From Figure 6, it can be appreciated that the posterior distribution of RC morphologies looks closer to the prior in the loose scenario than in the tight scenario, as expected. However, the particular directions in which the shifts happen, alongside the PC shapes, can provide insight on the patterns displayed by the RC morphologies. Especially in the tight scenario case, we can see that areas characterized by a higher density of RC morphologies tend to have a positive score in the first PC (more evident in the subplot of PC1 and PC3).
In light of the "North-South" pattern shown by the first PC, this seems to suggest the following: Imposing the compatibility with records leads to morphologies with lower surface height (i.e., less or no ice) in the south and higher surface height in the north, compared to the average morphology generated through the prior.
Similarly, the two subplots in Figure 6 concerning the second and third PCs reveal critical information. The RC morphology scores for the second PC are mostly negative in the loose scenario (blue) and positive in the tight scenario (red). Notice that the second PC has negative values almost everywhere, particularly in the southwestern block of Greenland. The strong constraint on DYE3 and CC that is imposed in the tight scenario seems therefore to induce a clear loss of ice in the southwestern block. However, when a loose constraint is imposed at the two locations, the typical surface elevation in the south would increase, reflecting the presence of more ice.

Shape and Uncertainty of RC Morphologies
The main characteristics displayed by RC morphologies are illustrated in Figure 7. Mean and standard deviation are shown for the prior (left), and for the posterior in the three scenarios (right). Most of the variability, and thus of the uncertainty, is displayed within two regions, one in the north and one in the center south 10.1029/2019JF005237 (top row). The uncertainty in surface elevation in these two areas is significantly reduced, once the constraints from the data are taken into account. However, while little difference is present in the north between the three scenarios, the imposition of a stronger constraint on the DYE3 and CC records induces a clear reduction in the uncertainty of the southern surface elevation.
The prior and posterior mean morphologies are shown in the bottom row of Figure 7. To ease the interpretation, we show the prior mean on the left and the difference between posterior and prior means on the right. Progressively from left to right, a distinct decrease of surface height in the south appears. From the loose to the middle scenario, the increase of 2‰ on the lower bound of R − DYE3 induces a decrease of the average surface height in the area surrounding DYE3 of up to 400 m. Moreover, by further tightening the R − DYE3 constraint of 1.7‰ (middle to tight scenario), we observe an additional decrease in the same area of up to 450-500 m. The other area in Greenland where a similar phenomenon arises, albeit less prominently, is the one around CC and in the immediate north of it, where the decrease is of approximately 150-200 m in each of the two steps (loose to middle and middle to tight scenarios).
Also observe that, in the case of the loose scenario, the areas displaying surface heights lower than the prior morphologies are concentrated around the four locations where record constraints play an active role (NEEM, NGRIP, GRIP, and GISP2), leaving the DYE3 and CC areas mostly unaffected.
We remark that our approach tightens simultaneously the DYE3 and CC records, hence it does not allow us to ascribe with certainty a given posterior feature to the constraint imposed on one or the other location. The geographical position of the two sites and the size of the constraint on their R − L nonetheless favor the interpretation that the DYE3 constraint is responsible for the southern surface elevation decrease of tighter scenarios, as implied above. Inspection of the land ice masks for morphologies compatible with the tight scenario shows that most of these morphologies present a two-dome structure, with DYE3 sitting at the northern top of the southern dome.
We conclude with a note. The authors have carried out further experiments aiming to assess the sensitivity of the results to changes in the minimum NGRIP anomaly, whose value is also uncertain as acknowledged in section 2. The experiments reveal that no tangible change to the shape of the RC morphologies can be appreciated when the minimum 18 O NGRIP anomaly is decreased by 0.5‰, with Figures 6 and 7 remaining essentially unchanged. The percentage of morphologies which result RC increases to approximately 7.5%, 4%, and 1%, in the loose, middle, and tight scenarios, respectively.

Record Compatibility in the Wider Sense
Within the ice sheet modeling community, Greenland ice age information from ice cores is used to attempt to constrain LIG GIS results. The presence of ice at NGRIP and at the Summit region through the (late) LIG period has previously been used to reject particular simulations (e.g., Robinson et al., 2011;Stone et al., 2013). CC and DYE3 sites are not generally used in the same way because the age of the (possible) LIG period ice is considerably more ambiguous. At DYE3, ice may or may not have covered the site during the LIG (Alley et al., 2010;Willerslev et al., 2007). At CC some ice at the base occurs that has been interpreted as being pre-LIG in nature (Dansgaard et al., 1982), however, as at DYE3, this is ambiguous. Without better ice age information there will not be community agreement on whether DYE3, or CC, was definitely ice-covered during the LIG.
As well as ice age information, ice core-based elevation change reconstructions, based on 18 O, have been used to infer LIG GIS changes (e.g., Johnsen & Vinther, 2007). However, this cannot be used as an additional constraint here, because this 18 O information is already accounted for by our analytical approach. Beyond 18 O elevation interpretations, air content measurements have also been used as an independent constraint.
In particular, the NEEM elevation change reconstruction, which specifies a NEEM surface elevation difference of +45 ± 350 m at 126 ka compared to present day, is derived from air content measurements (NEEM community members, 2013). This data is fully independent of our approach.
If we impose the NEEM elevation constraint on our resulting RC morphologies, in the loose and middle scenarios cases around 3.7% of the RC morphologies are compatible with this additional NEEM data. This percentage increases in the tight scenario, where 6.5% of our 18 O compatible morphologies are in agreement. The relatively small overlap between air content and 18 O compatible morphologies could possibly reflect uncertainties in the air content elevation reconstruction methodology. The method relies on calculating corrections to air content due to insolation and temperature (Eicher et al., 2016;Raynaud et al., 2007) alongside secular changes in surface pressure and winds (Krinner et al., 2000;Martinerie et al., 1994). Surface melting also affects NEEM air content measurements between 127 and 118.3 ka (NEEM community members, 2013). Thus, the interpretation of elevation from air content measurements is rather uncertain. However, in spite of these uncertainties, results from the two independent approaches ( 18 O and air content) do overlap. Malmierca-Vallet et al. (2018) demonstrated that an isotopic-enabled climate model cannot correctly simulate the LIG 18 O measurements at DYE3, GRIP, and Renland sites via sea ice changes alone. Here, our analysis using the emulator establishes that outputs from the isotope-enabled HadCM3 model can be reconciled with the best current LIG 18 O measurements from ice cores. We have shown that this requires a careful parameterization of ice sheet morphologies and a thorough approach to the characterization of uncertainty.

Conclusions
The major source of uncertainty, tackled here via the construction of suitable emulators, arises from the impossibility of running the simulator on a high number of morphologies. Additional uncertainties stem from the fact that, both in reality and in simulations, the isotopic response in the ice is dependent not only on the ice sheet morphology but also on other parameters such as evaporation conditions, transport pathway effects, and Arctic sea ice variations (Sime et al., 2013). In this regard, we note that the nugget term used to model the prior emulator covariance, alongside reflecting the chaotic behavior of the simulator, accounts as well for the variability in simulator outputs which is due to inputs not explicitly included in the emulator: This is referred to as residual variability in Kennedy and O'Hagan (2001). Last but not least, we stress once again that the available ice-core-based estimates of 18 0 anomalies are themselves uncertain. The case of DYE3 and CC is a prototypical example of this.
In this work we aim to account for all the previous unavoidable sources of uncertainty within a unified framework, joining mathematical parameterization of the space of morphologies, Gaussian process emulation and available ice core records. The scenario-based approach we undertake demonstrates the importance of well-constrained records. In common with previous studies, we show that dated ice from around DYE3 is crucial for certain inferences about Greenland (Sime et al., 2019). If more certain records at DYE3 and CC confirmed that the high 18 O values measured near the bottom of the ice cores were from the LIG, our results show that this would be compatible with morphologies characterized by low surface height in the south: Roughly, a decrease of 200 m has been observed corresponding to an increase of 1‰ in the minimal 18 O anomaly. We have also shown that the variability in reconstructed surface elevation would decrease in such tighter scenarios, with glimpses that the data would then favor a two-dome structure of the LIG ice sheet.
While this study's aims are to provide a methodology of wide applicability to infer LIG GIS changes and to stress the importance of statistical emulators in similar expensive tasks, we believe that further steps may be included to ensure wider data-model comparisons. In light of the above, we encourage the search of additional data at DYE3 and CC and suggest that ice, sea ice, and isostatic changes are all accounted for within a joint framework. This may enable a useful quantitative assessment of how Greenland ice sheet changes affected global sea levels during the LIG.

Data Availability Statement
The data generated and employed in this study are available at the repository Domingo et al. (2020). The data sets contain (a) the values of the PCs V i and of the average morphology M used in equation (1) and (b) the PC scores identifying the design morphologies and the associated simulated 18 O anomalies at the six ice core sites.