Spin‐up of UK Earth System Model 1 (UKESM1) for CMIP6

For simulations intended to study the influence of anthropogenic forcing on climate, temporal stability of the Earth's natural heat, freshwater, and biogeochemical budgets is critical. Achieving such coupled model equilibration is scientifically and computationally challenging. We describe the protocol used to spin‐up the UK Earth system model (UKESM1) with respect to preindustrial forcing for use in the sixth Coupled Model Intercomparison Project (CMIP6). Due to the high computational cost of UKESM1's atmospheric model, especially when running with interactive full chemistry and aerosols, spin‐up primarily used parallel configurations using only ocean/land components. For the ocean, the resulting spin‐up permitted the carbon and heat contents of the ocean's full volume to approach equilibrium over 5,000 years. On land, a spin‐up of 1,000 years brought UKESM1's dynamic vegetation and soil carbon reservoirs toward near‐equilibrium. The end‐states of these parallel ocean‐ and land‐only phases then initialized a multicentennial period of spin‐up with the full Earth system model, prior to this simulation continuing as the UKESM1 CMIP6 preindustrial control (piControl). The realism of the fully coupled spin‐up was assessed for a range of ocean and land properties, as was the degree of equilibration for key variables. Lessons drawn include the importance of consistent interface physics across ocean‐ and land‐only models and the coupled (parent) model, the extreme simulation duration required to approach equilibration targets, and the occurrence of significant regional land carbon drifts despite global‐scale equilibration. Overall, the UKESM1 spin‐up underscores the expense involved and argues in favor of future development of more efficient spin‐up techniques.


Introduction
To a first approximation, the behavior of the Earth system (ES) is governed by the dynand interactions of the two geophysical fluids-the atmosphere and the ocean-that comprise the majority of the planet's surface substrate. Despite a number of similarities, these two fluids diverge in many other respects, including a critical difference in the timescales of their internal dynamics. Features in the atmosphere form and dissipate over periods typically of the order of hours, days, or weeks in duration, with a residence time for one of its A further approach, "off-line ocean-only," separates the spin-up of ocean physics from that of biogeochemistry, by treating ocean circulation itself as just another part of the forcing. Once circulation has stabilized, either in full Earth system or ocean-only mode, it is used as a three-dimensional climatology to transport tracers of marine biogeochemistry. In this way, the subsequent cost of calculating ocean physics is avoided, permitting a more computationally efficient spin-up. A superficially similar approach to "off-line ocean-only" is that of the transport matrix method (TMM Khatiwala et al., 2005;Khatiwala, 2007). Rather than explicitly using a stored model circulation to drive biogeochemical tracers, this method describes the spatial connectivity driven by ocean circulation as a sparse matrix that can efficiently be used as a transport operator. While both of these approaches serve to spin-up passive tracers at a somewhat reduced computational cost, both still require an equilibrium physical circulation in the first place, which in turn requires its own spin-up. As we need to spin up both the physical circulation and ocean biogeochemistry of UKESM1, our ocean spin-up here makes use instead of the "online ocean-only" to do both in tandem.
Note that the discussion above effectively assumes equilibration is always for the good, essentially because of the resulting reduction in model drift. However, as imperfect tools, models do not necessarily converge toward a state similar to that of the real Earth system, and extended spin-up is liable to produce a divergent state relative to the true observed state (while revealing model biases). Paradoxically, by reducing model drift while increasing model bias, equilibration can seemingly reduce a model's skill when evaluated against observations (Seferian et al., 2016). Conversely, by limiting spin-up, a model will diverge less from its (typically) observationally derived initial state, and its state will show smaller biases (if greater drift). Nonetheless, the need for a stable control simulation from which to initialize historical runs (and then future projections) is more important than such considerations. Not least because the drift from an observation-based initial condition is likely larger, over the first few hundred years of a simulation, than the anthropogenic signal we wish to detect.
Within the Coupled Model Intercomparison Project Phase 6 (CMIP6), the Diagnostic, Evaluation, and Characterization of Klima (DECK) protocol (Eyring et al., 2016) describes the baseline simulations that all participating models must undertake to "benchmark" their performance. An underpinning part of the DECK is the production of a preindustrial control (piControl) simulation from which model states can be drawn to initialize simulations for both the DECK and other Model Intercomparison Projects (MIPs). While the DECK outlines certain boundary conditions for this piControl (e.g., atmospheric CO 2 concentrations, orbital parameters, and a mean solar cycle), it does not specify a particular methodology or duration for the production of this model state. This stems partly from the variety of models participating in CMIP, and the resulting difficulty in defining universal criteria for models that range widely in complexity, resolution, and degree of internal coupling. Additionally, the potential computational cost of spin-up is a factor, with participating groups varying in their access to compute resources. Some MIP protocols, such as C4MIP , suggest equilibrium criteria for participating models, but the DECK requirement of a multicentury piControl to shadow MIP simulations is intended as a means to quantify (and control for) drift in CMIP6 simulations.
This situation largely repeats that of CMIP5, where total spin-up durations varied widely from only 200 years up to almost 12,000 years (Seferian et al., 2016). As well as this wide span of spin-up durations, the CMIP5 models summarized by Seferian et al. (2016) also varied widely in the spin-up methodology used. Models adopted various off-line, accelerated off-line, and component-only online approaches, often in unique combinations, prior to final periods of fully coupled simulation. However, in the absence of formalized guidance or commonly accepted spin-up procedures, the documentation of spin-up typically remains a lower-priority activity. Nonetheless, a number of studies have examined aspects of spin-up, such as how specifically to equilibrate ("spin-down") from modern initial conditions to the preindustrial state (Stouffer et al., 2004), quantifying the sources of drift or variability in spun-up models Doney et al. (2006), and how drifts can be corrected without introducing bias (Hobbs et al., 2016). Furthermore, an increasing number of studies document the approaches used to spin-up ESMs (more comprehensive examples include Watanabe et al., 2011;Séférian et al., 2013;Lindsay et al., 2014).
Here we document the spin-up procedure followed in preparing a preindustrial control state of the new ESM, UKESM1, for the CMIP6 DECK and MIP experiments. The manuscript begins with a brief description of UKESM1 and its main components, followed by an extensive description of the procedure employed to equilibrate UKESM1 to CMIP6 preindustrial forcing. We then show the evolution of the model's state during spin-up, from both the parallel ocean-and land-only spin-up activities, followed by the final, fully coupled model. The model's degree of equilibration and biases in its final state are discussed, together with potential future avenues for addressing these. In addition to the results presented in the main body of this manuscript, the supporting information includes additional tables and figures to document the spin-up and performance of UKESM1.

UKESM1 Description
UKESM1 is a new state-of-the-art ESM composed of components that represent both physical and biogeochemical aspects of the Earth's atmosphere, ocean, cryosphere, and land systems. It is built on the recent Hadley Centre Global Environment Model Version 3 Global Coupled (GC) climate configuration, HadGEM3 GC3.1 Williams et al., 2017). This physical core model is extended through the addition of ocean and land biogeochemistry, and interactive stratospheric-tropospheric trace gas chemistry, which predicts atmospheric oxidant fields as input to the aerosol model as well as a range of radiatively active gases (e.g., O 3 , CH 4 , and N 2 O). As well as including dynamics internal to their components, these Earth system additions couple where it is believed that they potentially feedback upon one another (either negative and damping or positive and amplifying), or where they impact the time evolution of the physical climate system. For example, atmospheric aerosols play a key role in mediating the transfer or absorption of radiation within the atmosphere, and their occurrence and behavior is an outcome of interactions between chemical and physical processes in the atmosphere, ocean, and ice (Halloran et al., 2010;Carslaw et al., 2010;Quinn & Bates, 2011;Myhre et al., 2013;Kok et al., 2018). Representing and understanding the nature of such linkages between components is of critical importance if models are to accurately represent the true Earth system sensitivity to anthropogenic forcing. Figure 1 presents a schematic diagram of the components included within UKESM1, together with an indication of the nature of the coupling between them. In outline, atmosphere and land components are closely coupled together as a single, integrated executable; use a common grid and time step; and communicate their states directly at each time step without the need for a coupler. The same is true for the three ocean components-dynamics, sea ice, and biogeochemistry-which are also coupled together as a single executable. The two executables-atmosphere-land and ocean-ice-biogeochemistry-communicate once every 3 hr through interface layers, labeled OASIS3-MCT_3.0 (Craig et al., 2017;Valcke, 2013) in Figure 1. The atmosphere of UKESM1 as represented by GA7.1 represents the physical dynamics of the atmosphere, including processes such as mass transport, radiative transfer, thermodynamics, and the water cycle. Coupled to the GA7.1 is the UK Chemistry and Aerosols model (UKCA; Morgenstern et al., 2009;O'Connor et al., 2014), which represents stratospheric and tropospheric chemistry, as well as aerosols via the GLOMAP-mode scheme , with dust represented by a binned scheme (Woodward, 2011). UKESM1 differs from GA7.1 in its treatment of the natural emissions of monoterpenes, dimethyl sulphide (DMS), and primary marine organic aerosols (PMOAs), which are interactively calculated from elements of the land and ocean components, permitting feedbacks between the biosphere and aerosol/cloud-radiative behavior in UKESM1. A further coupling that uniquely links the land to the ocean in UKESM1 is the production of wind-borne mineral dust as a function of simulated climate and bare soil on land, and which can fuel ocean productivity (and uptake of CO 2 ) by supplying bioavailable iron. See Mulcahy et al. (2018), , and Archibald et al. (2020) for further details concerning atmospheric chemistry in UKESM1.
The physical ocean component of UKESM1 is represented by the Nucleus for European Modelling of the Ocean model (NEMO; Madec, 2016) for its dynamical circulation and by the Los Alamos sea-ice model (CICE; Rae et al., 2015) for its marine cryosphere. More complete descriptions of the NEMO and CICE configuration used in UKESM1, including details of its sensitivity and resulting tuning, can be found in Storkey et al. (2018), Ridley et al. (2018), and Kuhlbrodt et al. (2018). Marine biogeochemistry is represented by the Model of Ecosystem Dynamics, nutrient Utilisation, Sequestration, and Acidification (MEDUSA-2.1), which includes the cycles of nitrogen, silicon, iron, carbon, and oxygen. The version used in UKESM1 is identified as MEDUSA-2.1, to distinguish it from its earlier parent model, MEDUSA-2, described in Yool et al. (2013). Developments made for UKESM1 include (1) replacement of its carbonate chemistry with the MOCSY-2.0 scheme of Orr and Epitalon (2015); (2) the addition of an empirical submodel of surface seawater DMS concentration (Anderson et al., 2001); and (3) various code improvements including adaptations for variable volume (VVL) and upgrading to utilize the XML Input-Output Server (XIOS) adopted by NEMO (Xios, 2013).
The land component of UKESM1 is represented by the Joint UK Land Environment Simulator (JULES; Best et al., 2011;Clark et al., 2011;Walters et al., 2019), which handles physics and integrated biogeochemistry. This is closely coupled with the Top-down Representation of Interactive Foliage and Flora Including Dynamics model (TRIFFID; Cox, 2001;Jones et al., 2011), a dynamic global vegetation model that represents plant and soil dynamics on land. Developments since CMIP5 include (1) updating of plant growth and turnover parameters to reflect the plant trait database, TRY (Kattge et al., 2011); (2) an increase in the number of plant functional types (PFTs) from 5 to 13, further permitting the distinction of evergreen/deciduous plants and tropical/temperate evergreen trees (Harper et al., 2016); (3) the emission of volatile organic compounds (VOCs; e.g. , Pacifico et al., 2011); and (4) limitation on terrestrial primary production (and therefore CO 2 uptake) through the availability of soil and plant nitrogen. Land use by agriculture is represented in TRIFFID by reserving fractions of each grid cell for crops and pasture, with these fractions prescribed as external forcing that can vary with time. The Greenland and Antarctic land ice sheets are represented via a subgrid-scale scheme described in Shannon et al. (2019).  By default, UKESM1 has a relatively coarse horizontal resolution of N96 (approximately 135 km) in the atmosphere and 1 • (approximately 73 km) in the ocean. Vertical resolution is 85 levels in the atmosphere (with a model top at 85 km), and 75 levels in the ocean (with a maximum depth of 6 km), with, in both cases, high vertical resolution focused at the interface between the two fluids. This resolution corresponds to the HadGEM3 N96ORCA1 configuration, a full description of which can be found in Kuhlbrodt et al. (2018).
In the work described here, the fully coupled version of UKESM1 was only utilized for a restricted (latter) portion of the full spin-up process. This was in part because of its high computational cost, but also because this cost is largely associated with atmospheric components that spin-up to equilibrium much more quickly than the ocean or the land. The majority of the spin-up was therefore performed using parallel ocean-only and land-only versions of UKESM1, forced at their surface boundary conditions by atmospheric output from a shorter coupled model simulation. More complete details of the fully coupled UKESM1, including an analysis of its preindustrial and historical climate, can be found in Jones et al. (2019).

UKESM1 Spin-up
Spin-up of UKESM1 utilized a combination of phases using coupled climate, ocean-only, land-only, and full Earth system coupled versions of the model (with and without interactive atmospheric chemistry). The development cycle of the full model occurred in parallel with spin-up activities, with the result that spin-up did not use a single version of the model throughout. Periodically, model improvements and bugfixes were applied between model phases. The last segment of spin-up did employ the final coupled version of UKESM1. Figure 2 presents an overview diagram of the spin-up procedure. Several primary branches of parallel spin-up are shown, each focused on the equilibration of separate ES components: ocean, land, and atmospheric chemistry. Model states (i.e., restart files of prognostic variables) were shared between these main branches during the full spin-up, with the main points of restart state sharing specifically identified. To illustrate underlying operational details, Tables S1 to S3 in the supporting information present the chains of simulations performed as part of the "ocean," "land," and "atmosphere" branches of spin-up (respectively, the top, middle, and bottom paths of Figure 2). Significant changes along these branches are switches from component-only to coupled branches, the switch from prescribed, noninteractive atmospheric chemistry (designated UKESM1-CN) to fully interactive chemistry (UKESM1), and the adoption of component model states as initial conditions from other branches. Since it is the longest branch in terms of total simulated years, attention focuses here on the ocean branch summarized in Table S1.
As noted previously, the largest active carbon and heat reservoir in the Earth system is the ocean, and imbalances in this reservoir can have a large impact on simulation drift. Consequently, the ocean spin-up branch was prioritized and operationally began first, principally in ocean-only mode before switching to a fully coupled mode with prescribed, noninteractive atmospheric chemistry (designated UKESM1-CN).

10.1029/2019MS001933
This was followed by land spin-up, which also started in land-only mode and also subsequently transitioned to UKESM1-CN mode. Finally, the fully coupled model, complete with interactive atmospheric chemistry (designated UKESM1), was spun-up. The ocean, land, and atmosphere states from these parallel branches were then finally combined into a UKESM1 simulation (identifier u-av472; Table S1 that led into the preindustrial control simulation (identifier u-aw310; Table S1.) This piControl was then simulated for a duration of more than 1,000 years both to act as a control for numerous other simulations in CMIP6 and to serve as a source of initial states for the CMIP6 Historical ensemble of UKESM1. This latter ensemble forms the subject of the analyses of Jones et al. (2019) and Archibald et al. (2020).
UKESM1 is the successor ESM to CMIP5's HadGEM2-ES, and, as noted already, this latter model is the source of some components in UKESM1. The spin-up procedure adopted for HadGEM2-ES also parallels that used here for UKESM1, with periods of ocean-and land-only spin-up followed by a final phase of fully coupled simulation, although with some significant differences (Collins et al., 2011). Unlike UKESM1's ocean, where observationally derived initial conditions were used, spin-up of HadGEM2-ES was initialized using an existing ocean state from the preceding HadGEM1 model used in CMIP3 (Johns et al., 2006). This included both physical and biogeochemical state variables, with the new biogeochemical variables introduced in HadGEM2-ES initialized with climatogical (silicic acid) or uniform (total iron) values. Similarly to UKESM1, this ocean state was then spun-up in ocean-only mode under model atmospheric forcing for 400 y, but with the advantage of starting from the previous spun-up state of HadGEM1. In the case of the land component, HadGEM2-ES used an acceleration technique in which, after 3 year periods of coupled simulation, the model's land state was implicitly extrapolated forwards by 100 years before returning to a further period of conventional coupled simulation. This procedure was repeated four times, advancing the land state of HadGEM2-ES by 400 years. This approach did not fully equilibrate refractive soil organic material because of the time scales its equilibration (e-folding of 50 years), and its sensitivity to subannual litter input. Spin-up of the model's soil carbon was instead achieved using 2,000 years of off-line simulation of this reservoir, forced using monthly fields of litter inputs. The ocean and land states obtained using these procedures were then used in a final period of fully coupled simulation under preindustrial conditions for 280 years, to produce a piControl state.

Detailed Spin-up Approach
The key motivating factor in our spin-up was minimizing drift in the Earth system's carbon cycle, and attention was strongly focused on the net air-sea CO 2 flux. Analysis by Seferian et al. (2016) found the diverse array of spin-up protocols followed during preparation for CMIP5 resulted in models that exhibited large differences in simulated fields, and potentially biased performance evaluations. Recognizing this,  suggest a drift criterion of ≤10 Pg C century −1 (i.e., a long-term average of ≤0.1 Pg C year −1 ) for net fluxes between the atmosphere, land, and ocean reservoirs as part of the C4MIP protocol . Note that both the land and ocean components of UKESM1 only exchange carbon with the atmosphere component and not directly with each other (e.g., via rivers).
To evaluate this particular target, as well as track a range of critical physical and biogeochemical properties (e.g., ocean heat content, surface temperature, top-of-atmosphere heat balance, Atlantic meridional overturning circulation, and sea-ice cover), the spin-up was monitored throughout using the Met Office Climate Model Monitoring tool (CMM) and BGC-val tools (de Mora et al., 2018). Running routinely in parallel with the spin-up simulations, these tools greatly facilitated rapid decision-making during model development, as well as identifying undesireable drifts or model errors.
The spin-up path began with a short physical climate simulation (run ID u-ai567; the full list of run IDs is given in Tables S1 to S3) using a prototype of HadGEM3 GC3.1 Williams et al., 2017) using CMIP6 preindustrial control forcing. This constitutes the physical core of UKESM1. This simulation was initialized from rest (i.e., with zero u and v velocity fields), with observationally derived initial conditions for the ocean (EN4 Good et al., 2013), and initial states for the atmosphere and sea-ice drawn from a GC3.0 simulation Kuhlbrodt et al. (2018). After 30 years, the atmospheric state of this simulation was judged to be sufficiently spun-up to serve as a source of forcing data for ocean-only configurations, and the simulation was continued to provide a further 30 year period of forcing data.
The forcing data collected from this GC3.1 simulation (and for subsequent forcing) consisted of 1.5 m air temperature, air humidity, 10 m wind velocities (U and V directions), surface downwelling short-and long-wave radiation, precipitation (rain and snow) and aeolian dust flux at 3 hr frequency, and river runoff at monthly frequency. These data fields are the same as those available in observationally derived reanalysis forcing data sets, such as CORE (Large & Yeager, 2009) and DFS (DRAKKAR Group, 2007), although at higher temporal resolution for heat and freshwater fluxes. In addition to these properties, fields of ocean surface temperature and salinity were collected from the same GC3.1 simulation at monthly frequency for relaxation purposes.
Based on the variability found in the atmospheric component of GC3.1, a forcing period of 30 years was selected as broadly representative of interannual patterns (but see later). Test simulations using repeated cycles of this forcing did not find any significant issues associated with the forcing "kick" imparted between cycles (i.e., upon reaching the end of Year 30, Year 1 is simply recycled). This approach of recycling forcing has previously been used successfully with NEMO-MEDUSA (Couldrey et al., 2016).
After this initial spin-up phase with HadGEM3-GC3.1, a successor phase was prepared using an ocean-only configuration of a UKESM1 prototype (run ID u-aj588). The ocean physical state of this (i.e., ocean, sea-ice, and icebergs) was initialized using the model state at end of Year 30 of the preceding GC3.1 simulation. The ocean biogeochemical state was initialized using observationally derived fields from the World Ocean Atlas 2009 (WOA09 Garcia et al., 2010a;Garcia et al., 2010b) and Global Ocean Data Analysis Project v1.1 (GLODAPv1.1 Key et al., 2004) climatologies. Fields of dissolved inorganic nitrogen (DIN), silicic acid, and dissolved oxygen were drawn from WOA09, while preindustrial DIC and alkalinity were drawn from GLO-DAPv1.1. Following Yool et al. (2013), the fields of DIC and alkalinity from GLODAPv1.1 were modified to interpolate over large regional lacunae in the original climatology (the revised GLODAPv2 climatology was not fully available at this time Lauvset et al., 2016). Note that although older climatologies were used to initialize run u-aj588, subsequent evaluation primarily uses their revised and updated equivalents, World Ocean Atlas 2013, and GLODAPv2.
Once initialized, this ocean-only configuration was run under repeated cycles of the initial atmospheric forcing data for a total of 1,890 years (i.e., 63 cycles of 30 years; run IDs u-aj588 and u-ak900). During this initial, extended period of spin-up, a difference in the bulk formulae for atmosphere to ocean momentum flux between the coupled UKESM1 and the ocean-only configuration was found. Changing this calculation so the ocean-only model mimicked the coupled model calculation led to the updated ocean-only run u-an869. However, because of the long duration of the initial ocean-only phase and a consistent "direction of travel" in the carbon cycle, this subsequent phase (u-an869) used the end state of the initial phase (u-aj588) for its initial condition. This new ocean-only phase also allowed an update to the atmospheric forcing that was taken from a longer duration spin-up of the UKESM1 prototype, again using a 30 year period. This subsequent phase was run for further 2,905 years (96.5 cycles; including run ID u-ar538). During this ocean-only phase, trial simulations of the full coupled ESM, initialized using ocean states drawn from the ocean-only spin-up, found that the two modes were comparable-though not identical-in terms of their evolving ocean properties and in net air-sea CO 2 flux, with these test coupled runs typically reaching an equilibrated state within 150 years. During this ocean-only phase net air-sea CO 2 declined to less than 0.1 Pg C year −1 , as desired. Upon reaching this CO 2 target, spin-up was switched from ocean-only mode to coupled Earth system mode.
Having determined the ocean-only configuration had reached a near-equilibrium state, these ocean conditions were used to initialize the coupled ESM, which ran for a further 500 years of spin-up before the start of the CMIP6 piControl. The first 300 years of this coupled spin-up used the faster UKESM1-CN configuration of the model to maximize the equilibration of the ocean and terrestrial biosphere in the available time. This UKESM1-CN configuration differs from the full UKESM1 model by using prescribed chemical oxidants taken from a parallel UKESM1 preindustrial run but is otherwise identical; see Appendix A of Jones et al. (2019) for details. The fully coupled model required some science changes during this final coupled spin-up to address important biases, many of which emerged as a result of coupling components that had previously been spun up separately. These changes are extensively described in section 3 of Jones et al. (2019). The magnitude and impact of these changes decreased as the spin-up progressed, and the last 200 years were performed with the final UKESM1 science settings, and with the full-complexity model configuration (e.g., interactive atmospheric chemistry now included).
In parallel to the ocean spin-up, there were separate spin-up phases for terrestrial biogeochemistry and atmospheric chemistry, prior to their introduction into the main spin-up simulation ( Figure 2). The land state, and in particular forest cover and the soil carbon and nitrogen pools, takes many hundreds of years to equilibrate with the surface climate and carbon fluxes. Some aspects of land cover, such as grass cover, equilibrate relatively quickly, so initial priority was therefore given to improving simulation of slower equilibrating aspects, such as forest cover and soil carbon, with subsequent tunings applied to the grass plant functional types and snow-vegetation interactions. However, as the whole system is interactive, changes in grass colonization affects soil carbon and nutrients which, in turn, feeds back on vegetation productivity. The land was initially spun up in a 1,000 year off-line simulation of the JULES land surface model, driven by surface forcing from a GC3.0 simulation, a prototype of fully coupled UKESM1. This land-only phase was, itself, initialized using the land state from a land-only simulation run in excess of 10,000 years using CRU-NCEP reanalysis forcing as derived for the Global Carbon Project (Le Quere et al., 2018). Prior to this, the land model was run in excess of 10,000 years using CRU-NCEP observation-based meteorology from the Global Carbon Project. Similar to the ocean-only spin-up, this approach considerably reduces simulation cost. The land system underwent a further 665 years of coupled UKESM1-CN spin-up, including the implementation of final tunings, before being used to reinitialize the land state for the final, 200 year UKESM1 spin-up phase. Thus, the terrestrial BGC fields experienced 865 years of coupled spin-up in total, following the initial 1,000 year offline spin-up.
Atmospheric timescales (ranging from minutes to tens of years) are much shorter than those of the land and ocean. Nevertheless, a coupled spin-up of 230 years was performed prior to the resulting atmospheric state being combined with the evolving ocean and land states to initialize a final 200 year period of coupled UKESM1 spin-up. This extended duration was required because of solar radiation and surface temperature differences between UKESM1-CN and UKESM1 that impacted the land carbon and nitrogen pools. It also served to avoid any impacts on the model's climate, which might arise if the chemical tracers were far from equilibrium with the other components at initialization. The atmospheric chemical tracers therefore experienced 410 years of preindustrial coupled simulation during the spin-up (as did the rest of the atmosphere component).
In summary (with reference to Figure 2), the separate ocean and land spin-up states were combined into a single model initial condition after, respectively (4,800 + 230), years of ocean spin-up and (1,000 + 710) years of land spin-up. After a further 80 years of coupled integration, atmospheric chemistry fields were also combined with the evolving land and marine 3-D fields, providing a final initial state for a further 200 years of coupled UKESM1 spin-up. We deemed the spin-up to be complete when this adjustment amounted to a multidecadal land carbon flux of less than 0.1 Pg C year −1 (averaged over a century), as recommended in the C4MIP experimental protocol noted already Jones et al. (2016).
This simulation then initialized all components of the UKESM1's piControl simulation, from which preindustrial initial states were drawn for the CMIP6 historical ensemble (see the CMIP6 implementation paper of Jones et al., 2019).

Analyzing the UKESM1 Spin-up
The complexity of UKESM1 means a complete evaluation needs to be spread over several dedicated studies. Such studies to date include atmospheric chemistry in Archibald et al. (2020) and aerosols in Mulcahy et al. (2018). The physical climate model that underpins UKESM1, HadGEM3, is assessed at the same atmosphere-ocean resolution in Kuhlbrodt et al. (2018). Meanwhile, an overview of the entire model, analyzed for the near-present using the CMIP6 historical ensemble, is provided by Jones et al. (2019).
Evaluation here is focused primarily on the spin-up period itself to analyze the equilibration pathway of key climate-relevant properties. The model state of the piControl simulation-the end point of spin-up-is then briefly analyzed to evaluate the scientific performance of UKESM1. This is done across the point from which the piControl is first used to initialize CMIP6 Historical simulations. Evaluation of the piControl focuses on the slow timescale variables that need to be spun-up. The piControl continues beyond this point as a reference simulation for all other CMIP6 experiments.
The selection of the piControl somewhat complicates model evaluation since most target fields only have near-present day observations available. Such data contain signals of ongoing anthropogenic climate change that are absent in the preindustrial period that the piControl aims to represent. In the case of the deep ocean, one focus of the spin-up described here, these signals are relatively minimal or absent, but they are manifest in the surface ocean, the atmosphere, and the land, although natural or background processes are arguably still dominant for numerous variables. As such, intercomparison with observations is still informative, so long as differences are appropriately interpreted. Jones et al. (2019) provides comparisons of UKESM1 at Historical period time points aligned with modern observations. The specific observational data sets used for evaluation include the following: • World Ocean Atlas 2013, for ocean physics (interior Zweng et al., 2013) and biogeochemistry (interior and surface Locarnini et al., 2014a;2014b) fields. • Hadley Centre Sea Ice and Sea Surface Temperature (Rayner et al., 2003), for ocean SST and sea-ice fields.
In addition to the above, model-observation intercomparison makes use of multiannual periods throughout, rather than focusing on a single year. This aims to account for both interannual variability in the case of synoptic observations for which we have good observational data (e.g., satellite-derived surface fields), and the representative time frames associated with composite observational data sets that are assembled over time (e.g., point samples of the ocean interior). Observational products differ in their availability and reference periods but in general are available from the late 20th century, and typically used here for the period 1995 to 2010.

Time Evolution of the Spin-up Simulations
In the following, time series analysis focuses initially on the ocean spin-up branch because of its extended duration. Time series analysis of the land and atmosphere branches (per Figure 2) focuses on periods closer to the start of the piControl when these shorter branches have merged with the ocean branch.
The panels of Figure 3 (and Figure S1) track key physical properties over the full duration of the ocean branch of the spin-up. The panels break the spin-up into sections colored according to different run modes: two ocean-only phases, a coupled UKESM1-CN phase, a fully coupled UKESM1 phase, and a final section that corresponds to the formal CMIP6 DECK piControl. Table S1 presents the full list of run IDs associated with the spin-up period depicted, with some continuous phases actually split between several run IDs. In each case, the panels in Figure 3 present the 30 year rolling averages of the properties, together with the corresponding 30 year interannual range.
In terms of volume-averaged ocean bulk properties, while-unsurprisingly-neither exhibit large interannual variability when averaged globally, both temperature and salinity experience long-term drifts across ocean-only and fully coupled phases, and it is noticeable that the trends in the ocean-only phase are reversed (at least somewhat) during the coupled phase.
In the case of volume-averaged temperature (Figure 3, Row 1), an upward drift of approximately +0.06 • C ky −1 during the ocean-only phase flips to a downward drift of approximately −0.25 • C ky −1 when UKESM1 transitions to fully coupled simulation. For salinity ( Figure S1), a slight upward drift of +0.0015 PSU ky −1 is approximately reversed in the transition between ocean-only and fully coupled phases.
In the ocean-only phase, the small salinity trends are related to strong surface salinity relaxation and water flux balancing, while in the fully coupled phase, they reflect the conservation of water across the modeled Colors indicate different phases, with two ocean phases followed by a UKESM1-CN phase and then a full UKESM1 phase, prior to the start of the piControl. Solid lines indicate 30 year rolling averages of the properties, with the shaded areas denoting the corresponding 30 year range of annual averages. Row 1 shows the evolution of ocean-average volume and surface temperature. Row 2 shows the evolution of ice area in the northern and southern polar regions. Row 3 shows the evolution in circulation strength for the AMOC and Drake Passage. The time axis is indexed such that the end of spin-up (and the start of the piControl) is at zero, with total spin-up duration (per Table S1) indicated by the negative extent of the time axis.
Earth system. Meanwhile, modest drift in the heat content of the ocean-only phase is explained by the use of repeating surface forcing derived from a period of GC3.1 coupled spin-up simulation that exhibited a +0.2 W m −2 global mean, top-of-atmosphere (TOA) radiation imbalance (downward directed) under which the ocean warms. Disequilibrium creates a situation in which the upper boundary of the ocean is consistently driven in one direction. The switch to fully coupled UKESM1 (in which the global mean net TOA radiation balance is effectively 0 W m −2 ) permits a correction of this, as the slightly-too-warm ocean can then properly exchange heat with the dynamic overlying atmosphere. In short, the excess heat gained by the ocean during the ocean-only spin-up is lost again during the coupled phase of spin-up.
The panels of Figures 3 and S1 showing the corresponding surface quantities indicate more distinct behavior for temperature and salinity. The former finds comparable ocean-only and coupled phases once the bulk formulae are harmonized across the two model configurations. The latter shows surface salinity returning to its observationally derived initial value after a prolonged period of lower salinity during ocean-only spin-up.
Unlike the full-ocean averages of temperature and salinity, northern and southern sea-ice areas (Figure 3, Row 2) are highly dynamic, with large interannual variabilities across both ocean-only and fully coupled phases. In the coupled model, this variability in sea-ice area shows marked multidecadal patterns. In the ocean-only phase, sea-ice trends in both hemispheres quickly equilibrate (<100 years) under the repeating atmospheric forcing, albeit to slightly different averages between the forcings used. In the fully coupled phase, interannual variability is comparable in magnitude in the north, but noticeably larger in the south, and does not appreciably settle down in either hemisphere during the (short) duration of the fully coupled phase. Sea-ice area and its seasonality are discussed further later.
Finally, Row 3 of Figure 3 shows two important metrics of ocean circulation, the Atlantic Meridional Overturning Circulation (AMOC) at 26 • N and the Antarctic Circumpolar Current (ACC) transport through Drake Passage. The AMOC characterizes the poleward flow of warm water in the North Atlantic, playing both an important role in heat transport and the conditioning of water masses leading to deep water formation in the subpolar Atlantic (Smethie Jr et al., 2000). Since 2004, the AMOC has been well observed by the RAPID array at 26.5 • N, with an annual average ranging between 14.6 and19.3 Sv (Smeed et al., 2017). Drake Passage is a "pinchpoint" for the circular ACC that rings Antarctica, with a role in framing the continent's isolated and cold climate, and in climate variability modes such as the Southern Annular Mode (SAM) (Mayewski et al., 2009). The ACC is balanced by the meridional density gradient throughout the depth of the ocean, which, in turn, is set up by the wind and buoyancy forcing at the surface (Meredith et al., 2011). While not permanently instrumented like the AMOC, Drake Passage is intermittently sampled, with its transport estimated at 173 ± 11 Sv (Donohue et al., 2016). UKESM1's preindustrial AMOC is typically lower than that found by RAPID, but it necessarily omits the present-day greenhouse gas (GHG) and aerosol forcing. By contrast, all UKESM1 runs over the historical period simulate an AMOC that strengthens by approximately 2 Sv to a maximum of around 17 Sv in the 1990s . This is almost certainly linked to Northern Hemisphere aerosols changing the simulated interhemispheric energy gradient, cooling the north relative to the south, with AMOC strength responding to this.
During the first portion of the ocean-only phase ("Ocean 1" in Figure 3), both the AMOC and Drake Passage transport are at the bottom end of their observed ranges, and at significantly lower values than those found in the corresponding coupled UKESM1 precursor that provided the atmospheric forcing. As noted earlier, investigation of this uncovered a discrepancy in the bulk formulae used in the transfer of momentum between the atmosphere and ocean, with the ocean-only model following that of CORE (Large & Yeager, 2009) and the coupled model following COARE 3.5 (Edson et al., 2013). This was rectified in the subsequent, longer portion of ocean-only spin-up where the coupled model formulation was used ("Ocean 2" in Figure 3). Nonetheless, we retained the first portion of spin-up because we judged that it achieved an ocean carbon state that was closer to equilibrium than the initial state in spite of this discrepancy.
A potential issue in using distinct ocean-only and fully coupled spin-up phases is a mismatch in the behaviour of the model between these phases. In ocean-only mode, the ocean model experiences the atmosphere as unchanging forcing; in fully coupled mode, the ocean model dynamically interacts with the overlying atmosphere, potentially modifying the evolving atmospheric forcing. Broadly mirroring Figure 3, Figure 4 compares the behavior of both phases for physical properties across a 200 year period from the time point at which the coupled phase branches from the ocean-only phase with the coupled ocean initialized using the ocean-only state. Although the ocean-only phase is generally equilibrated at this point, to more clearly evaluate the significance of this transition, it was continued past this branch point.
In the case of temperature (Figure 4, Row 1) and salinity ( Figure S2), the volume-integrated panels show the differences between the equilibrium values to which the two phases have converged or are converging. At the surface, SST in the coupled phase remains close to that of its ocean-only parent (with larger interannual variability), while SSS quickly (<100 years) equilibrates at a slightly higher value (though with similar interannual variability; see Figure S2).
In the case of sea ice (Figure 4, Row 2), the coupled phase very quickly shows larger interannual to interdecadal variability, but the longer-term behavior takes a more extended period to manifest (visible in Figure 3). The difference between the two spin-up phases is more obvious in the case of Southern

10.1029/2019MS001933
Hemisphere ice, where the cyclic 30 year forcing period used in the ocean-only phase precludes the large multidecadal variability exhibited by the coupled model.
The pattern of increased variability in southern sea ice closely corresponds to that of variability of coupled mode Drake Passage transport in Row 3 of Figure 4. Here, high transport is associated with reduced sea-ice area, and vice versa, with the two properties connected via periodic deep ocean mixing off Antarctica influencing the latitudinal gradient of the ocean density field across the Southern Ocean and the Antarctic sea-ice extent in a coherent manner (as discussed by Latif et al., 2013;de Lavergne et al., 2014). In the ocean-only phase, ACC interannual variability is low (<10 Sv) but grows quickly as the coupled phase begins, reaching almost 40 Sv within the time period shown. As Figure 3 shows, this magnitude largely persists during the piControl simulation, with strong centennial-scale variation in Drake Passage transport.
The change between spin-up phases is more slight for the AMOC, and after an initial shock (<100 years), the AMOC between the two phases remains similar (Figure 3). And, unlike the relationship between Drake Passage transport and southern sea ice, the AMOC's relationship with northern sea ice is less clear. As noted, AMOC strength is related to the interhemispheric energy gradient (Marshall et al., 2014). Poleward heat transport driven by a strong AMOC might be expected to correlate with increased melt of northern sea ice. However, AMOC strength at 26 • N does not show such a clear correlation because the underlying relationship is more complex. For example, Northern Hemisphere cooling, relative to that in the south, can act to both directly increase Arctic sea ice and intensify the meridional energy gradient, leading to a strengthening of the AMOC, which can negate the direct cooling influence on sea-ice decrease. Furthermore, AMOC strength is also influenced by buoyancy and freshwater fluxes out of the Arctic, which impact the occurrence of deep water formation (Liu et al., 2019).
Across the ocean's physical parameters, the ocean-only and coupled phases do show similar patterns and magnitudes of variability. There is limited evidence of strong shocks as the model's state branches, and most properties quickly adjust, although the ocean's different equilibrium heat content between the phases manifests in the change of inflection of long-term drifts. Nonetheless, the prescribed 30 year atmospheric forcing in the ocean-only phase clearly prevents the model from reproducing the longer-term modes visible in the fully coupled phase, most noticeably in the circulation of the Southern Ocean.
Remaining with the ocean, Figure 5 shows corresponding spin-up time series for several key marine biogeochemistry metrics, over both the full period of the ocean branch spin-up, and for the same 200 year overlapping period for the ocean-only to fully coupled transition.
As already noted, the most significant features of this spin-up branch lie with how the two periods of ocean-only spin-up differ in response to a change to the formulation of surface momentum exchange. In the intergrated primary production panel, addressing this discrepancy results in a global increase of 15%. The mechanism for this large increase lies in the increased momentum transfer, which can be seen in Row 2 of Figure 5 to deepen average mixed layer depth (47 to 50 m), leading to elevated surface DIN concentrations that fuel productivity. This change between the ocean-only phases is markedly larger than the 4% decrease in primary production driven mainly by an 8% decrease in aeolian deposition of iron as the 30 year cycle of dust forcing becomes dynamic in the fully coupled simulation. It is also noticeable that the model's productivity response to such transitions requires a longer period to equilibrate than seen for the earlier physical properties. Here, periods of at least several hundred years, and approaching 1,000 years, are necessary for the model to reach a new quasi-steady state. Nonetheless, despite only a 30 year cycle in forcing during the ocean-only stages, the interannual variability of productivity is similar, though slightly greater, to that in fully coupled UKESM1. This can also be seen in intercomparison of the 100 year sections of ocean-only and fully coupled simulation.
Bar a short initial period of ingassing, net air-sea exchange of CO 2 is, on average, outgassing across the entire spin-up and into the piControl. Interannually, both ingassing and outgassing occur, but the long-term trend is to steadily outgas as equilibrium is approached. As already mentioned, an initial target for average net air-sea flux was 0.1 Pg C year −1 , and this was reached after around 3,500 years, during the ocean-only stage of spin-up. By the start of Historical ensemble simulations, an annual average outgassing flux of around −0.04 Pg C year −1 had been reached, with a multicentennial range of approximately −0.35 to 0.25 Pg C year −1 . Due to the repeating 30 year cycle of surface forcing (e.g., wind-driven piston velocity), progress toward this equilibrium is steadier during the ocean-only phase (with the exception of the jump between ocean-only stages), although the range of interannual variability is very similar between ocean-only and fully coupled phases. The continuous outgassing of CO 2 from the ocean is indicative of a model bias in ocean carbon content and is discussed in more detail later.
The bottom two panels of Figure 5 show how productivity and air-sea exchange vary interannually across the transition between the ocean-only and fully coupled phases. While there is a slight offset in primary production between these phases, the modeled variability is otherwise largely consistent. The same is true for air-sea CO 2 flux, for which both phases oscillate interannually around near-zero net flux. Overall, and much as with the model's physics, the differences between the two spin-up modes are relatively minor.  Table S2. The period shown follows on from after the initial land-only spin-up phase, using UKESM1-CN (from −865 years; #1) and then UKESM1 (from −210 years) prior to starting the piControl at 0 year. Together with the physical responses shown in Figure 4, these results indicate that the coupled model largely adjusts to a new equilibrium after around 150 years, even when its ocean is initialized from the end of an ocean-only simulation.
Finally, switching to the terrestrial system, Figure 6 illustrates the spin-up pathway of the carbon reservoirs in living biomass and soil, and the global fractional cover of three major aggregate land surface types: forests, grasslands, and bare soil. Unfortunately, due to archiving issues, not all of the model data are available. Any data gaps, however, occur earlier than 500 years before the piControl, so do not affect our evaluation of the model's final equilibrium state. The land spin-up branch began with the land surface model being run off-line under 30 year cycles of meteorological data drawn from a prototype of UKESM1. The land model was run for approximately 1,000 years off-line until the globally averaged vegetation cover and carbon and nitrogen pools (results not shown) reached a quasi-equilibrium state. These were then used to initialize the coupled model at Time Point 1 of Figure 6. As is evident from the immediate drift the land-model run off-line has a different stable state to the coupled model, reflecting the importance of land-atmosphere coupling and differences in meteorology to the forcing used off-line. Subsequently, after further 30 years at Time Point 1 (Figure 6), the turnover rate parameters were scaled to lower values to increase the size of the soil carbon and nitrogen pools. At the same time, the soil carbon and nitrogen pools were rescaled to be consistent with these turnover rate changes. The spin-up of the vegetation fractions continued with an additional tuning applied to the rate at which grass can expand and colonize bare ground, which was reversed at Time Point 2 ( Figure 6). The result was a rapid decrease in grass cover and concurrent increase in bare soil over a 10-year period. Over the course of the next 600 years, the spin-up continued with some changes made to parameters controlling snow-vegetation interaction and the rate of grass colonization resulting in a close to stable global mean state at the start of the final spin-up at Time Point 3 ( Figure 6). The drifts in soil and vegetation carbon over the course of the piControl were −0.07 and 0.0025 Pg C year −1 , respectively, over the first 1,000 years of the piControl, well within the C4MIP acceptable range. The drift in tree cover is also small at less than 0.5% over 1,000 years.
Although the global drift may be small, it can be more significant at the regional level, particularly if some regions are compensating for changes in carbon or vegetation cover in other regions. Figure 7 shows the drift in soil carbon across seven major biomes for the final part of spin-up and the piControl. The drift in most biomes is less than 0.001 Pg C year −1 with the exception of the tundra, boreal, and desert regions. Tundra and boreal regions lose soil Carbon at −0.012 and −0.017 Pg C year −1 of carbon per year, respectively. This reflects the particularly long residence of soil carbon in these regions and therefore the greater time required for the pools to equilibrate. The desert regions continue to accumulate carbon at 0.002 Pg C year −1 responding to changes made during the spin-up phase to grass colonization rates and therefore the flux of litter to the soil carbon. The pools also demonstrate some long-term variability. For instance, across the 250 years corresponding to the first Historical ensemble member, the Savanna biome accumulates 3 Pg C despite having lost carbon during the preceding spin-up period. Consequently, we recommend that any analysis accounts for both the ongoing drift in terrestrial carbon pools and the multiannual variability. Further, our regional drifts imply that benchmarking global drift pools is a necessary but possibly insufficient condition for evaluating the equilibration of land carbon models. Future spin-up efforts may wish to execute longer spin-ups in order to equilibrate all regions and ecosystems.
The atmosphere adjusts rapidly (over days or weeks) to a changed external forcing, such as associated with different sea surface temperatures or incoming top of the atmosphere (TOA) solar radiation. We, therefore, are generally not overly concerned with the spin up of atmospheric variables when bringing ESMs into balance with a preindustrial forcing. Nevertheless, one of the primary constraints used to evaluate ESM simulations of the preindustrial period is that, averaged over sufficiently long timescales, the global mean net TOA radiation balance should be zero (0 W m −2 ). This was a leading constraint used in developing UKESM1. A consequence of a zero TOA net radiation balance is that, also averaged over sufficiently long timescales, the global mean energy content of the climate system should be temporally stable. As ocean heat content constitutes the overwhelming majority of energy in the climate system Trenberth et al. (2014), this constraint equates to a temporally stable global mean ocean heat content. Observational constraints of the absolute value of the preindustrial ocean heat content are not available, neither are constraints on the component, solar (   estimates include an anthropogenic component. Observational estimates of preindustrial (more correctly the very early industrial period, e.g., 1850 to 1900) global mean surface air temperature (GSAT) do exist (e.g., HadCRUT4, Morice et al., 2012;GISSTMP, Lenssen et al., 2019), although observation coverage is limited during this early period. With this in mind, our primary targets for the UKESM1 preindustrial atmosphere are a near-zero global mean net TOA radiation balance and a temporally stable GSAT, close to observational estimates for the 1850-1900 period. As a consequence of these two constraints, temporally stable Arctic and Antarctic mean sea ice amount and volume are also a useful constraint. Figure 8 summarizes these quantities over the final 500 years of the coupled spin-up. The first 300 years of this run uses UKESM1 with off-line atmospheric chemistry (referred to as UKESM1-CN), with ozone and chemical oxidants prescribed from an earlier preindustrial simulation of UKESM1 with interactive chemistry. The latter 200 years are with interactive atmospheric chemistry enabled. The simulation is initialized at year minus 500, in the ocean by fields derived from the ocean-only spin-up (u-ar538) and on land and in the atmosphere using fields from the parallel land-spin-up run of UKESM1-CN (shown in Figure 8). At Year −200, interactive atmospheric chemistry is activated and the required chemistry fields initialized based on output from the parallel UKESM1 spin-up for atmospheric chemistry run shown in Figure 8. All other prognostic fields are propagated from the UKESM1-CN spin-up run (u-ar783 → u-au835, Years −500 to −200) into the final UKESM1 spin-up (years −200 to 0).
The primary spin-up characteristic in Figure 8 is an increase (of 1 W m −2 ) in both global mean TOA net downward solar (SW) and outgoing longwave (LW) radiation, leading to a near-zero, global mean net TOA radiation budget at Year 0 (the start of the piControl simulation). This shift clearly occurs at the point when UKESM1-CN switches to include interactive chemistry (UKESM1) and results from two differences between these model configurations: (i) in the manner marine-emitted DMS is processed through to cloud condensation nuclei (CCN) in the model atmosphere and (ii) in the simulation of stratospheric ozone. Both differences influence the absorption and reflection of solar radiation and therefore net TOA solar radiation. As a result of these differences, and to retain a near-zero global mean net TOA radiation balance as we transitioned form off-line to interactive chemistry, it was necessary to introduce a small tuning to the parameterization of seawater DMS in UKESM1 (see Jones et al., 2019, for more details). This tuning acted to reduce seawater DMS in biologically inactive regions of the global oceans, reducing the average cloud droplet number in marine clouds and thereby reducing the simulated atmospheric solar reflectivity and retaining the desired near-zero net TOA radiation balance (−0.09 W m −2 downward Jones et al., 2019).
From the start of the UKESM1 piControl (Year 0 on Figure 8), slower timescale variability in GSAT appears to largely disappear, in concert with a reduction in the variability of Antarctic sea ice. In the early part of Figure 8, these variables exhibit an inverse correlation, driven by variability in ocean overturning in the far Southern Ocean (as discussed earlier). While it is tempting to conclude the final coupled tuning reduced this internal variability, we note that similar timescale variability does intermittently reappear in later periods of the full UKESM1 piControl. Finally, long-term mean GSAT during the first 200 years of the piControl is around 287.5 K( 13.35 • C), suggesting a cold bias of 0.3-0.4 • C in UKESM1 compared to available observational estimates for the period 1850-1900 (Morice et al., 2012;Lenssen et al., 2019).

Equilibrium State
We now analyze the equilibrium state that results from the confluence of all three spin-up branches, focusing on the UKESM1 piControl simulation from the point at which CMIP6 historical ensemble members begin to be drawn. This time point occurs early in the piControl simulation but at a point where UKESM1 was judged to be sufficiently equilibrated. The strategy for using the piControl as the source for the historical ensemble, and additionally its role in controlling for model drift, is described in detail by Jones et al. (2019). We use a decadal climatology of the piControl from this point throughout. Figure 9 compares simulated sea surface temperature (SST) with the HadISST observation-derived product, HadISST, for the period 1870-1879 (Rayner et al., 2003). This period is chosen as it is closest to that which the piControl simulation aims to represent (1850), but note that HadISST is a data-assimilated reanalysis product with relatively limited observational constraint for this time period (but see Figure S3).

Ocean
Northern and southern summer periods are shown, together with the differences between the model and observations. In general terms, the model shows very similar patterns to those observed, both geographically and seasonally. Nonetheless, the difference plots show persistent biases in the model, including a warm Southern Ocean, warm upwelling regions, and strong cold bias in the western North Atlantic. The latter feature is the most pronounced of a series of zonal, dipole-like biases in the North Atlantic, which include warm biases off the eastern seaboard of North America and in the Irminger/Iceland basins, and a cold bias in the Greenland-Iceland-Norwegian (GIN) Sea.
In the case of the Southern Ocean, this warm SST bias is primarily driven by a corresponding positive bias in downward shortwave radiation that originates in cloud biases (i.e., in cloud amount and albedo). The warm SST biases in upwelling regions are principally a result of the relatively coarse resolutions of UKESM1's ocean and atmosphere components. In the ocean, the model cannot represent the necessary small-scale features of coastal upwelling, while in the atmosphere, coastal wind forcing cannot be resolved. The root of the North Atlantic dipole bias has a similar cause, with resolution causing the separation of the Gulf Stream from the eastern seaboard of North America to occur too far south, resulting in a path that is too zonal . Figure 10 illustrates the seasonal extents of northern and southern sea ice, again compared to the same period of HadISST (but see Figure S4). In the Arctic, modeled sea-ice extent is always greater than that observed, with the excess ranging between 1 and 4 10 6 km 2 seasonally, but greatest around the annual sea-ice minimum. By contrast, in the Antarctic, it is the simulated minimum sea-ice extent that most closely matches that observed, but modeled sea-ice growth is conspicuously weaker, leading to a maximum extent only around two thirds of that observed. These patterns of sea-ice biases generally align with the SST biases in the GIN Sea (cooler) and Southern Ocean (warmer).
Switching to focus on the ocean interior, Figure 11 illustrates the biases of temperature and salinity within the ocean, again compared to the WOA. In each case, the plots present so-called "thermohaline circulation" sections that center the zonal averages of both major basins around the interconnecting Southern Ocean (see Figure 11 for more details).
In the case of potential temperature, UKESM1 shows a general warm bias in the upper 3 km, a smaller cold bias below this. This pattern differs between basins, with the Atlantic showing a much stronger bias, particularly in the upper 1 km of the subtropics (30 • S-30 • N), where it can exceed 4 • C. The corresponding region of the Pacific generally shows a cool bias, although with a more complicated structure. Despite the marked warm bias in its surface waters noted earlier, the Southern Ocean shows generally weaker biases, particularly in its main Pacific sector. Similarly, the salinity biases in UKESM1 broadly track those of temperature, with a similar strong positive bias in the subtropical Atlantic and a negative bias in the subtropical Pacific.
Moving to ocean circulation, Figure 12 shows the global stream functions of meridional overturning circulation (MOC) for the model and the observationally derived Estimating the Circulation and Climate of the Ocean ( ECCO Forget et al., 2015;Fukumori et al., 2019) product. Qualitatively, the simulated MOC broadly follows that observed, with a strong positive cell focused in the upper water column, driven by the AMOC, and a weaker negative cell at depth, driven by Antarctic Bottom Water formation. Compared to that in ECCO, the model exhibits a slightly weaker Deacon Cell centered around 50 • S (Döös & Webb, 1994), indicative of weaker surface wind stress over the model's Southern Ocean. The maximum strength of the model's deep AABW cell is also weaker than estimated in ECCO. Figure S5 shows the corresponding simulated MOC patterns for the Atlantic and Indo-Pacific sections. In the Atlantic strong northward flow in the surface of the Atlantic is balanced by the production at high latitudes of NADW that flows southward at depth and overlies an Antarctic Bottom Water (AABW) cell driven from the Southern Ocean. However, while the strength of this northward flow is similar to that observed as noted earlier (cf. Figure 3), circulation of the large AABW cell underlying the NADW in the Atlantic is very weak (especially when compared to the corresponding cell in the Pacific). As noted in Figure 11, this cell is characterized by cool and fresh biases that are indicative of a Southern Ocean origin. While this pool is fed in part by Antarctic Bottom Water (AABW), it shows biogeochemical properties that are suggestive of a more sluggish transport than typical for this water mass (see later). Figure 11. A "thermohaline circulation" section of biases in modeled potential temperature (top) and salinity (bottom). The section tracks southward "down" the Atlantic basin from the Arctic to the Atlantic sector of the Southern Ocean, before tracking northward "up" the Pacific basin from the Pacific sector of the Southern Ocean to the Bering Straits. The aim is to capture the stereotypical transport of deep water from its formation as a "young" water mass in the high North Atlantic through to end as an "old" water mass in the North Pacific. Dotted lines mark the "boundaries" of the Southern Ocean at 40 • S in each basin. Biases in potential temperature are in • C, and in practical salinity units (PSU) for salinity. Observational data from the World Ocean Atlas climatology for the period 1995-2004. Switching to marine biogeochemistry, Figure 13 shows model-observation intercomparison of three fields of key properties: surface nitrogen nutrient, surface chlorophyll, and vertically integrated primary production. In each case, these fields are annual averages, with the nutrient field drawn from the present-day climatology of the World Ocean Atlas 2013 (Locarnini et al., 2014b), and the lower two fields from the period 2000-2009.
Row 1 shows DIN, the main limiting nutrient for biological productivity across the World Ocean. The general pattern of higher values in upwelling regions and at higher latitudes, particularly the Southern Ocean, and low values in ocean gyre regions is simulated. However, the model does display a number of marked biases: Concentrations are markedly elevated in and around equatorial Pacific upwelling, particularly in the adjacent South Pacific. These discrepancies stem from upwelling of excessively DIN-rich deep water, as is clearer from ocean interior concentrations (see later). These patterns of mismatch in MEDUSA-2.1 are very similar to those found previously by Yool et al. (2013) with MEDUSA-2 (despite a considerably longer period of spin-up).
Row 2 shows the observed and modeled surface chlorophyll concentrations (O'Reilly et al., 1998). While, again, MEDUSA-2.1 captures the broad patterns of the observed field, agreement is much weaker, and the model displays a number of strong biases. Most clearly, simulated Southern Ocean concentrations are noticeably higher, with elevated concentrations also seen in Equatorial and subtropical Pacific concentrations, in keeping with the corresponding DIN excess availability. Away from the Pacific, oligotrophic gyre concentrations are much more biased downward, with relatively large regions of very low chlorophyll in the subtropical Atlantic. Spatial patterns in this basin are also somewhat aberrant with a pronounced patchiness that is absent in the observations. As noted by Yool et al. (2013) (and Kwiatkowski et al., 2014), chlorophyll is generally poorly represented in marine biogeochemistry models, with models frequently performing much better for fields of other bulk properties (nutrients and carbon) and productivity. As well as its observed high dynamic range (note the plot  (Gent & McWilliams, 1990). Observational data from the ECCO V4r4 ocean circulation reanalysis for the period 1992-2017. log scale), Yool et al. (2013) suggest that this may stem from the strong plasticity (in reality and in models) of chlorophyll:carbon ratios relative to other quantities, and the resulting high dynamic range.
Finally, Row 3 shows vertically integrated net primary production, with observations represented by the simple average of three observationally estimated products, VGPM (Behrenfeld & Falkowski, 1997), Eppley-VGPM (Carr et al., 2006), and CbPM (Westberry et al., 2008). Although this is empirically derived from satellite chlorophyll (as well as other fields), MEDUSA-2.1's agreement with it is greater than for surface chlorophyll. The observed patterns of low and high values are generally reproduced, again with biases. These include excessive productivity in the Southern Ocean throughout the year, more latitudinally focused productivity in the equatorial Pacific, and noticeably low productivity in the North Atlantic.
As discussed, one of the drivers for UKESM1's spin-up is equilibration of the marine carbon reservoir. Figure 14 compares the observed and simulated surface concentrations of DIC and total alkalinity (Lauvset et al., 2016). DIC here is an observation-based estimate of preindustrial DIC, as this biogeochemical property has changed significantly since the beginning of the industrial revolution (especially so in the surface Lauvset et al., 2016). Modeled DIC concentrations generally show good agreement with the observed estimates, although model DIC is somewhat elevated in the Southern Ocean, while biased low in the Indonesian Archipelago and parts of the Arctic (although data availability remains somewhat limited in this region). While the patterns of model surface alkalinity are similar to those observed, alkalinity is generally lower in the model, and there are noticeable biases, particularly in the North Pacific and, again, the Indonesian Archipelago.
Since alkalinity acts in part as a buffering capacity for DIC, generally lower sea surface alkalinity will reduce the amount of DIC in surface waters and, in turn, the ocean interior. The interior impacts can be seen in Figures 15 and S6, which, respectively, show DIC and alkalinity along thermohaline transect sections (see earlier). These show both lower alkalinity in the upper 1 km of the water column, and the correspondingly lower DIC throughout the ocean interior. This is most obvious in the southward moving NADW and in the deep waters (>1 km) of the North Pacific, where model bias can exceed 100 mmol C m −3 .  Figure 16 presents a biome-based evaluation of land cover at the end of UKESM1 spin-up against two observationally derived estimates, the IGBP and CCI products. The upper panel provides a geographical perspective of where different biomes are located, while the lower panel shows the fraction of each land cover type that occurs in each biome for the model and from the data products, keeping in mind that vegetation type, amount, and geographical distribution are dynamically predicted in UKESM1. Overall, the model performs well, with simulated biomes largely capturing their observed compositions, although there are some biases. With the exception of the high latitude biomes, such as boreal forest and (especially) tundra, UKESM1 underestimates the observed fractional cover of C3 grasses. In tropical forest, it is largely replaced by broadleaf trees, while in extratropical forests and deserts, its low bias is countered by a high bias toward C4 grasses. In the grassland and savanna biomes, where grasses are found to dominate, C3 grasses are displaced by forest, mostly by broadleaf trees in savanna and needleleaf trees in grasslands. C4 grasses, meanwhile, are typically biased positive, while modeled shrubs show mixed biases with observations across the biomes. In terms of bare ground (i.e., no vegetation cover), UKESM1 only shows a bias in the tundra biome where C3 grasses are overly abundant. As the IGBP and CCI products are assembled from present-day observational data sets, they include vegetation cover changes driven by human influences. At least in part, the land "biases" identified in UKESM1 are consistent with these changes in land cover and therefore indicate the problem of evaluating a preindustrial climate state using present-day observations. Figure 16 by illustrating the geographical patterns of fractional cover of each land cover type. As noted, UKESM1 simulates excessive broadleaf forest cover and extent in the tropics, particularly in South America, but also in Africa and southeast Asia. However, UKESM1 does not include fire feedbacks, which may explain some of this overestimate. Inclusion of an interactive treatment of fire in other models (Hantson et al., 2016), as well as in our land surface scheme when driven by observed climate, improves this type of vegetation bias (Burton et al., 2019). The geographical range of needleleaf forest is generally modeled well, though fractional cover is elevated in northern boreal forests, and there is an anomalous Asian forest in the vicinity of Tibet. As well as being biased low, the extent of UKESM1's C3 grasses is noticeably circumscribed, with almost no grasslands in southeast Asia, and reduced extents in Europe and the Americas. However, as already mentioned, UKESM1 does erroneously simulate solid C3 grass cover across northern Siberia. The geographical cover of C4 grasses is better than for C3 grasses, but there is a marked positive bias in Australia, as well as westerly displacements of their abundance in both the northern and southern Americas. Shrub cover is generally underestimated across the world, with exceptions only in Asia, again around northern Siberia and Tibet. Finally, UKESM1's patterns of bare soil generally map well to those observed. The exceptions lie in high northern latitudes, where tundra areas have excessive C3 and shrub cover in UKESM1.

Discussion
The new UKESM1 model has been jointly developed by NERC and the UK Met Office to represent the fully coupled Earth system with a state-of-the-art level of realism. UKESM1 succeeds its CMIP5 predecessor, HadGEM2-ES, and, while incorporating evolved forms of components from this earlier model, is almost a wholly new model. A particular effort was made to ensure that coupling between ES components and physical climate components was fully prognostic, enhancing the utility of UKESM1 for investigating future coupled ES feedbacks. As part of its preparations toward use in CMIP6, UKESM1 requires the production of a preindustrial control state that can be used to initialize the DECK and other MIP experiments. Critically, this piControl should exhibit a near-steady state climate so that forced trends introduced to the model Earth system in various experiments are clearly distinct, and not confounded by model drift.
The primary components of the Earth system are its major reservoirs of heat and carbon-the atmosphere, the ocean, and the land. The physical sizes of these components, and the timescales of the major processes  Figure 11 explains the format of this section. Concentrations in mmol C m −3 . Observational data are from the GLODAPv2 climatology, for the preindustrial period. that govern them, both physical and biogeochemical, mean that equilibration to achieve a steady state is necessarily prolonged relative to the perturbation experiments typically performed in CMIP6. Furthermore, the full ESM is computationally expensive to run, with a turnaround time of only a few simulated years per wallclock day. However, the most expensive component of UKESM1, the atmosphere and its attendant chemistry, is also the fastest to equilibrate. Consequently, the strategy adopted here was to spin up the slow equilibrating components, the ocean and the land, decoupled from the atmosphere, and to only bring the full model together once much of their time evolution was complete.
In the case of the ocean, a period of 4,800 years of ocean-only simulation was required to achieve a net air-sea CO 2 flux within the threshold suggested by the C4MIP community. In the case of the land, a corresponding period of 1,000 years was used to bring the modeled reservoirs of carbon into the same net balance with the atmosphere. During both of these phases, the individual component models were run in forced mode, under an atmospheric data set (bulk properties, heat and freshwater fluxes, and winds) derived from simulations of precursor versions of UKESM1 run with preindustrial forcing. Subsequently, the model states from both component-only spin-up branches were combined with the atmosphere, and UKESM1's spin-up was finalized in fully coupled mode, first with prescribed atmospheric chemistry (UKESM1-CN), and subsequently with interactive chemistry also activated.
The branch of ocean spin-up found relatively rapid stabilization (≪1,000 years) of near-surface physical variables and major circulation metrics. Interior temperature and salinity, however, exhibit prolonged drift, which changes sign from ocean-only to coupled phases, although in the case of temperature, of very low Figure 16. Observationally derived geographical map of major land biomes (top), together with a comparison of the land cover type found in each biome for the model and two observational products, IGBP and CCI (bottom). Each biome appears as a separate triplet of bars, with the color composition of the bar relating to the vegetation cover types indicated in the key. The observational IGBP product is derived from Year 1992 data, while the CCI product is derived from Year 2000. magnitude compared to simulated trends over the Historical period . Biogeochemical processes, such as productivity and surface nutrients, typically had somewhat slower stabilization (≈1,000 years). Net ocean surface carbon flux, like interior temperature, essentially exhibited steady decrease over the spin-up, slowly reaching a net flux below the target threshold. Examination of interior carbon concentrations shows that this slow decline is a function of the marine biogeochemistry model "favoring" a slightly lower total carbon inventory, driven, at least in part, by a bias toward lower sea surface alkalinity (cf. Halloran et al., 2015). A notable, if unwelcome, feature of the ocean-only phase of spin-up was a bulk formulae discrepancy that initially resulted in lower momentum transfer between the atmosphere and ocean in the ocean-only configuration compared to the coupled model. While this clearly affected the absolute magnitudes of properties across the model (Figures 3 and 5), this was amended without any significant lasting impact on the broad state of the ocean, with the subsequent revised period of ocean-only spin-up ultimately coming to more closely resemble that of the final, fully coupled model. The transition between the ocean-only and fully coupled phases was found to introduce a "kick" across the model, although the Figure 17. Geographical maps of fractional cover associated with each land cover type for the model (left) and two observational products, IGBP and CCI (middle and right, respectively). In each case, increasing color intensity denotes greater fractional cover. The observational IGBP product is derived from Year 1992 data, while the CCI product is derived from Year 2000. immediate effects of this were typically found to quickly (≈100 simulated years) settle, followed by slower evolution to a slightly different final coupled state, for some predicted variables.
Overall, the ocean-only spin-up compares well with that in fully coupled mode. Inevitably, the modes and scale of variability exhibited in the ocean-only configuration are reduced compared to that of the fully coupled model (e.g., Drake Passage transport), partly because of the limited variability in the short period of atmospheric forcing used, but mostly because the absence of coupled responses between the ocean and atmosphere. In terms of model biases, several were noted in the ocean's physical and biogeochemical spun-up preindustrial state, most significantly the carbon deficit already noted.
The inclusion of an interactive nitrogen cycle in UKESM1 has made for very a slow spin-up because of the interaction between soil and vegetation. The mineralization of soil inorganic Nitrogen fertilizes vegetation and encourages growth and the turnover and quality of vegetation litter affects the soil state. The spin-up of the model has through necessity gone hand in hand with the application of tunings and fixes as the model advances to being frozen and ready without the possibility of a long (in excess of 500 years) spin-up postfreeze. The computationally efficient off-line JULES model, forced using surface level atmospheric fields, was used initially for approximately 1,000 years of spin-up. However, when subsequently coupled directly to the atmosphere the model's behavior was found to differ, primarily because in the coupled model the change in vegetation state is able to feedback on the climate. The result is that an extended period of online spin-up is still required. Furthermore, the model shows some long periods of variability, making it hard to assess the degree of drift while model integrations are proceeding.
One of the further challenges is deciding on an appropriate preindustrial state given the general lack of observational data for the 1850s. We generally rely on present day data such as the land cover ( Figure 16) and make informed assessments around the expected level of change over the past due to land use change and the role of climate.
In UKESM1, we have achieved a near spun-up state for the ocean and land carbon pools well within the requirements of C4MIP for making assessments of carbon budgets for climate targets. However, as is shown in Figure 7, there can be significant regional drifts, which in some cases may oppose each other and give the impression of a better steady state. In UKESM1, the land is generally losing carbon driven by soil carbon losses in the Boreal and Tundra biomes. These are the regions with the slowest carbon residence times and therefore the most difficult to equilibrate. The high-latitude losses are slightly offset by the small positive drift we see in the Savana biome. While both the land and ocean components are losing carbon to the atmosphere, its fixed preindustrial CO 2 concentration masks this. However, in the fully coupled emission-driven model, these net fluxes to the atmosphere would result in a positive drift in atmospheric CO 2 . The spin-up of the emission-driven model is a separate activity that takes advantage of the more completely spun-up state from a later time point of the piControl.
In terms of the major Earth system quantities pertinent to anthropogenic change, the duration of spin-up in UKESM1 allowed these to reach quasi-equilibrium. After tuning (see Jones et al., 2019), net top-of-atmosphere radiation balance reached −0.09 W m −2 by the conclusion of UKESM1 spin-up, compared to the perturbed present-day net flux of approximately 0.61-0.81 W m −2 (Johnson et al., 2016). Surface ocean temperature drift for the same spin-up period was 0.016 • C decade −1 , as compared with observation-based estimated ranges during the historical period of 0.042 to 0.054 • C decade −1 (1880-2012) and 0.072 to 0.124 • C decade −1 (1979-2012) (Hartmann et al., 2013). Exchange of CO 2 between the atmosphere, land and ocean, reached net fluxes of 0.020 and −0.039 Pg C y −1 with the land and ocean, respectively, over the final century of spin-up, well below the C4MIP target of 0.1 Pg C y −1 sought .
In the preceding analysis of equilibration, the focus has largely concerned the exchanges of carbon between the land, ocean, and atmosphere components of the model. Table 1 presents the linear trends in ocean properties at different depth horizons for the final 500 year periods of both the ocean-only and fully coupled spin-up phases. While carbon fluxes fall below C4MIP's drift criterion (see Figures 5 and 6), it is clear that the ocean's state is still drifting and that these drifts have generally increased with the transition from the long duration ocean-only phase to the much shorter duration fully coupled phase.
As already noted, drifts in ocean temperature between these phases are a response, respectively, to a heat flux imbalance in ocean-only forcing, and the subsequent equilibrating response when fully coupled. Trends in nitrogen and iron nutrients have leveled off during the long ocean-only phase but have grown into the fully coupled phase as dust-forcing both changes and becomes more dynamic. Opposite sense trends result in these two nutrients and are much larger in the upper ocean where the change in iron is affected. Meanwhile, although the air-sea flux continues to equilibrates, drift in the ocean's surface carbon cycle is affected by a more dynamic hydrological cycle that increases surface alkalinity (tracking salinity Jiang et al., 2014), buffering higher DIC concentrations. In general, model drift is greater at the surface than at depth, although this varies between properties, most obviously dissolved oxygen. Here, surface concentrations are essentially controlled by the temperature-dependent solubility of this gas, while interior concentrations are affected by remineralization of sinking organic material. As noted previously, the fully coupled phase has slightly lower production of organic material because of reduced dust deposition and greater iron limitation. In turn, this translates to elevating interior oxygen as less oxygen is consumed.
Overall, Table 1 underscores the difficulty in equilibrating ESMs, especially where spin-up modes such as ocean-only incompletely capture the behavior of the fully coupled model.
A number of lessons can be drawn from the experience of the spin-up of UKESM1. Note. Drift rates calculated as the linear fit across the final 500 year periods, and shown as ky −1 .
Inevitably, biases occur across the model components, but a particularly marked bias is that of the ocean's dissolved inorganic carbon pool. As illustrated in Figure 15, UKESM1 shows a general deficit in ocean DIC concentration, together with patterns of bias that align with those in nutrients and oxygen. Some of these biases stem from deficiencies in modeled circulation, but MEDUSA-2.1's biogeochemistry plays a key role in others. While some minor tuning of model parameters took place during the development of UKESM1, no tuning to improve these interior ocean biases was undertaken, principally because of the timescales necessary (simulated and wallclock) to equilibrate changes to identify improvement (Yool et al., 2013). As noted earlier, there are off-line and accelerated simulation modes that can assist with this, although none were mature enough within the infrastructure of UKESM1 to be used during CMIP6 preparations. As such, a key lesson, and future aspiration for UKESM1, is the adoption of techniques for more rapid model equilibration, to facilitate both the identification and tuning-out of such biases.
Focusing on the component-only phases of spin-up, an obvious lesson lies in ensuring the interface exchanges between model components and the atmosphere are calculated in a manner consistent with that of the fully coupled model. As the ocean results show, and drawing also from land-only preparations, inconsistency favors alternative steady states, with the potential to favor different evolutions of heat and carbon between the component-only and coupled configurations. Given the ultimate aim is a spun-up model state consistent with the coupled model, a requirement is that the component models behave as close to the coupled model as possible. It is also worth remarking that, since we expected our ocean-only phase to differ from that of the coupled model because of the absence of ocean-atmosphere interactions, the source of the differences noted in the first ocean-only spin-up phase took time to be discovered. Ideally, the relationship between the fully coupled and component-only versions of an ESM should be formally examined, both in terms of code and coupling (e.g., the same parameterizations being used with the same input properties in the same ordering), and in the resulting simulation dynamics.
Another factor that our spin-up experience identified is the selection of the atmospheric forcing itself. Here, our ocean-only phases were driven using atmospheric forcing from periods of GC3.1 piControl simulation in which there was a top-of-atmosphere imbalance. This imbalance (+0.15-0.2 W m −2 ) is in the GC3.1 run throughout its piControl. This led to our ocean-only model consistently warming during spin-up, admittedly only a small absolute amount, but large enough that the final, fully coupled spin-up phase could be seen reversing this in response. Since stable ocean heat content is one of the key targets of spin-up, this points to the need for careful selection of atmospheric forcing, again, with the aim to be as consistent as possible with the atmosphere in the target coupled model. On a similar note, another feature of the atmospheric forcing used is its duration and character. Initial experience with limited-duration GC3.1 simulations found only modest variability in the ocean and atmosphere, both in terms of the absolute magnitude of variability and its temporal profile. Consequently, a relatively short, multidecadal period was selected for use in ocean-only spin-up. However, as results shown here illustrate, the model clearly exhibits variability of much larger magnitude, and with much longer periods, most clearly in UKESM1's Southern Ocean, where Drake Passage transport exhibits strong centennial-scale cycles. While a forced ocean-only model is unable to respond in the same way as the ocean in a fully coupled simulation because of the absence of ocean-atmosphere interactions, the short periods of forcing used here are not necessarily representative of what the full model can produce. Overall, an assessment of the flux biases of downward atmospheric forcing, and the role of slow timescale variability in atmospheric forcing on both, ocean-and land-only spin-up, requires further analysis.
An item that is not apparent in the earlier description of UKESM1's spin-up was its interaction with the model's development cycle. Since UKESM1 includes numerous new model components and developments relative to its CMIP5 predecessor, HadGEM2-ES, it required a lengthy period of development. The timescales associated with CMIP6 and with the throughput of the fully-coupled model (approximately 4 simulated years per 1 wallclock day) meant that development and spin-up necessarily occurred in parallel. While this meant that the spin-up was not "clean" (i.e., was not made using a single identical model throughout) and that it was not without inconsistency as problems were ironed out (e.g., the ocean-only bulk formulae issue), this mode of operation maximized the time available for spin-up. It avoided the need to wait for code freezing of a final version and permitted the addition of features that would otherwise not have been included. A number of coupling interactions in UKESM1, in particular, became possible because of this flexible approach to the development of UKESM1 from new components and its spin-up. The alternative approach of finalizing first would necessarily have either delayed UKESM1's participation in CMIP6 or required the use of a less complete ESM.
Of particular value in the development, tuning, and spin-up of UKESM1 was the availability of the BGC-val evaluation suite (de Mora et al., 2018). Focused on the ocean component, this tool automated the analysis of simulations, providing a range of plots covering geographical, depth, and globally integrated properties, as well as comparisons with observational fields where available. Initially used on a run-by-run basis, BGC-val became invaluable in the intercomparison of multiple runs, and in monitoring the spin-up to identify and avoid runtime or model bias problems. While most ESM groups will already have access to such tools because of the role they can play, we would encourage new entrants to the field to acquire (by adoption or development) such a tool.

Conclusions
• The UKESM1 model was spun-up using a combination of component-only phases for land and ocean with atmosphere forcing derived from coupled climate precursors of UKESM1, followed by a period of fully-coupled simulation. • Model states from parallel ocean (≈5,000 year) and land (≈1,600 year) spin-up branches were united with the atmosphere and, later, the full atmosphere chemistry and aerosol branch (≈240 years). • The resulting preindustrial control has a top-of-atmosphere heat balance of less than −0.09 W m −2 and net atmosphere-ocean and atmosphere-land CO 2 fluxes of less than 0.1 Pg C y −1 . • Although equilibrated at global scale, analysis of land carbon fluxes indicated that regional shifts were significant, implying that longer spin-up periods are required to ensure regional as well as global equilibration. • Issues encountered during spin-up included consistency of the interfaces of component-only models, the duration, and variability of the atmospheric forcing, including its overall consistency with atmospheric forcing in the target coupled model, and the important role played by rapid-turnaround evaluation tool. • While some tuning of UKESM1 was undertaken during spin-up, the slow turnover of the ocean component and conventional spin-up modes used here limited its scope, supporting the future tailoring of accelerated spin-up techniques to UKESM1 to reduce ocean biases, as well as achieve better equilibration.