Estimating Ocean Heat Uptake Using Boundary Green's Functions: A Perfect-Model Test of the Method
Abstract
Ocean heat uptake is caused by “excess heat” being added to the ocean surface by air-sea fluxes and then carried to depths by ocean transports. One way to estimate excess heat in the ocean is to propagate observed sea surface temperature (SST) anomalies downward using a Green's function (GF) representation of ocean transports. Taking a “perfect-model” approach, we test this GF method using a historical simulation, in which the true excess heat is diagnosed. We derive GFs from two approaches: (a) simulating GFs using idealized tracers, and (b) inferring GFs from simulated CFCs and climatological tracers. In the model world, we find that combining simulated GFs with SST anomalies reconstructs the Indo-Pacific excess heat with a root-mean-square error of 26% for depth-integrated changes; the corresponding number is 34% for inferred GFs. Simulated GFs are inaccurate because they are coarse grained in space and time to reduce computational cost. Inferred GFs are inaccurate because observations are insufficient constraints. Both kinds of GFs neglect the slowdown of the North Atlantic heat uptake as the ocean warms up. SST boundary conditions contain redistributive cooling in the Southern Ocean, which causes an underestimate of heat uptake there. All these errors are of comparable magnitude, and tend to compensate each other partially. Inferred excess heat is not sensitive to: (a) small changes in the shape of prior GFs, or (b) additional constraints from SF6 and bomb 14C.
Key Points
- Green's functions (GFs) broadly reconstruct simulated ocean heat uptake when constrained by simulated observations in a historical simulation
- Errors in the GF method arise from insufficient observational constraints and forced changes in ocean transports
- GFs underestimate the Southern Ocean heat uptake when using sea surface temperature anomalies as boundary conditions
Plain Language Summary
Ocean warming is caused by “excess heat” being added to the ocean surface by air-sea fluxes and then carried to depths by ocean currents. Tracking global ocean warming is important for monitoring climate change. However, a substantial amount of ocean warming occurs at depths, where temperature measurements are scarce. A workaround is to treat well-observed surface ocean warming as a heat source, and propagate it downward using ocean currents. This method of estimating the interior ocean warming is called the Green's function (GF) method. But how accurate is the GF method? Here, we address this question by treating a computer simulation of the historical ocean as the real world, and comparing simulated ocean warming (as the “truth”) with that estimated using the GF method. We find that the GF method broadly reconstructs simulated ocean warming. The results contain some inaccuracies because: (a) neither computer simulations nor observations give an accurate estimate of ocean currents, and (b) part of the surface temperature changes cannot be treated as a source of the interior ocean warming.
1 Introduction
Imbalance in Earth's top-of-atmosphere radiative forcing leads to accumulation of “excess heat” in the climate system. Over 93% of the excess heat is stored in the ocean, causing ocean warming and sea-level rise (Meyssignac et al., 2019). Excess heat invades the ocean from the surface, much as a drop of dye spreads in a water tank. This process can be conveniently described using a mathematical tool called Green's functions (GFs). Here, we examine how accurately excess heat in the ocean can be estimated using GFs.
A change in ocean heat content can be understood in terms of excess and redistributed heat content. Excess heat is defined as the change (warming or cooling) that is added to the ocean by air-sea fluxes, and then carried to depths by ocean transports. Redistributed heat, on the other hand, is defined as the change of the pre-existing heat in the ocean (i.e., spatial redistribution). Isolating excess heat is useful because: (a) excess heat change can be constrained by observations of transient tracers in the ocean (see Section 6), and (b) excess heat change dominates global/basin integrated ocean heat content change. Under CO2 forcing, climate models show that excess heat largely accumulates in the North Atlantic and the Southern Ocean, while redistributed heat tends to accumulate at low latitudes (Gregory et al., 2016; Newsom et al., 2022). Excess and redistributed heat are both theoretical constructs; neither of them is directly observable in the ocean.
Excess heat at depths can be estimated by propagating its surface “source” downward using boundary GFs of the tracer equation (Holzer & Hall, 2000). We refer to this method as the GF method. The source or boundary condition (BC) of excess heat is often computed from observed sea surface temperature (SST) in the literature (e.g., Messias & Mercier, 2022; Zanna et al., 2019). Boundary GFs represent the ocean's surface-to-interior transport. They can be derived from: (a) simulating idealized tracers in a model (e.g., Khatiwala et al., 2005; Zanna et al., 2019) or (b) solving an inverse problem using tracer observations (e.g., Holzer et al., 2010; Khatiwala et al., 2009). The GF method adds useful information to the in-situ estimate of ocean heat content change. The two estimates are directly comparable for basin integrals; regionally, their differences indicate redistributed heat in the ocean.
We refer to GFs derived from model simulations as “simulated GFs,” and GFs inferred from tracer observations as “inferred GFs.” In practice, both types of GFs are at best approximations of the real-world ocean transport, due to various assumptions, simplifications and trade-offs. Simulated GFs are often coarse grained in space and time, hence they do not fully capture the covariance between the true GFs and surface BCs. In addition, simulated GFs rely on a model's ocean transports, but no model is perfect. Inferred GFs, on the other hand, do not rely on a model; but they too are inaccurate, because observations are insufficient constraints.
As well as the GFs, surface BCs are not perfectly known for estimating excess heat. SST anomalies, as used by Zanna et al. (2019), are contaminated by redistributed temperatures which are not BCs of excess heat. This error affects both types of GFs.
The accuracy of the GF method for estimating excess heat in the ocean has not been examined in the literature. In this study, we address this problem using a HadCM3 historical simulation (1860–2008). We treat this simulation as the real world, and compare excess heat diagnosed in it (as the “truth”) with that estimated using simulated/inferred GFs. This approach is useful because it allows a separation of excess and redistributed heat and a quantification of different errors, neither of which is accessible in observations.
Because our historical simulation agrees well with observations for large-scale ocean heat uptake, our error estimates are relevant to applying the GF method to the real world. Importantly, our result pinpoints the main error sources in the GF method, and provides a quantitative benchmark for each of them. Nonetheless, we expect that at least some of our error estimates are HadCM3 specific, especially since HadCM3 is a coarse resolution model and not constrained by observations. Future studies with high-resolution models or ocean state estimates would be useful to provide a more robust error estimate.
Setup of the HadCM3 historical simulation and definitions of excess and redistributed heat are explained in Section 2. In Section 3, we explain how to solve the passive tracer equation using GFs. Section 4 explains the method of simulating GFs. Section 5 evaluates excess heat estimates based on simulated GFs. The same pattern is repeated in Sections 6 and 7, but for inferred GFs. Finally, a summary is given in Section 8 and discussions in Section 9.
2 Historical Simulation and Temperature Tracers
2.1 Setup of Historical Simulation
HadCM3 is an Atmosphere-Ocean General Circulation Model (AOGCM) that has been used extensively for climate studies (Gordon et al., 2000). The HadCM3 atmosphere model is based on the UK Met Office Unified Model, with a horizontal resolution of 2.5° × 3.75° and 19 vertical layers. The HadCM3 ocean model is based on the Cox (1984) model with a horizontal resolution of 1.25° × 1.25° and 20 vertical levels (vertical resolution is enhanced near the surface). Horizontal eddy mixing of tracers in the HadCM3 ocean is parameterized using the Gent and McWilliams (1990) and Redi (1982) schemes.
We run a pre-industrial control experiment and a historical experiment in parallel with HadCM3. Both experiments start from a pre-industrial state at 1860 and run to 2008. (This choice omits the ocean's slow response to the global cooling before 1860; cf. Gebbie and Huybers (2019).) The historical experiment is conducted by adding historical effective radiative forcing QERF (space and time dependent) to the sea-water surface. To compute QERF, the atmosphere model ECHAM6.3 (Giorgetta et al., 2013) is forced with time-dependent historical changes in all forcing agents and fixed pre-industrial SSTs and sea ice concentrations, following the design of the piClim-histall experiment (Pincus et al., 2016). This ECHAM6.3 simulation is used in Gregory et al. (2020) to compute the global-mean QERF. Note that we choose to force HadCM3 with QERF instead of adding forcing agents to its atmosphere; Appendix A explains the motivation for this choice.
2.2 Evolution Equations of Temperature Tracers
2.2.1 Historical and Control Temperatures


The control and historical Θ fields are different because the two experiments have different Φ and Ψ. The ocean transport operator Φ is different because global warming affects ocean transports in many ways; for example, a reduction in high-latitude convection. The surface source Ψ = Qctrl/(ρ0cpdz1) in the control experiment, where Qctrl is the net surface heat flux (W m−2) under the pre-industrial condition. In the historical experiment Ψ = (Qctrl + QERF + Q′)/(ρ0cpdz1); the two additional terms come from: (a) the historical forcing (QERF) and (b) climate feedbacks (Q′) in response to the forcing. ρ0cpdz1 is the top layer thermal inertia (J K−1 m−2), wherein ρ0 is reference density, cp specific heat capacity, and dz1 top layer thickness.
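The conversion from a surface heat flux to the surface source Ψ described above can be sketched as follows. This is a minimal illustration, not code from the study: the function name and the sample flux and layer thickness are hypothetical, and ρ0cp takes the value 4 × 10^6 J K−1 m−3 quoted in Section 2.2.4.

```python
RHO0_CP = 4.0e6  # J K^-1 m^-3, volumetric heat capacity quoted in Section 2.2.4


def surface_source(q_net, dz1):
    """Convert a net surface heat flux (W m^-2) into a top-layer
    temperature tendency (K s^-1): psi = Q / (rho0 * cp * dz1)."""
    if dz1 <= 0:
        raise ValueError("top layer thickness dz1 must be positive")
    return q_net / (RHO0_CP * dz1)


# Hypothetical numbers, for illustration only.
q_ctrl, q_erf, q_feedback = 0.0, 1.5, -0.5   # W m^-2
dz1 = 10.0                                   # m (assumed layer thickness)

psi_ctrl = surface_source(q_ctrl, dz1)
psi_hist = surface_source(q_ctrl + q_erf + q_feedback, dz1)
```

With these sample values the historical source exceeds the control source by 1 W m−2 divided by the top-layer thermal inertia.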
2.2.2 Linear Equations of Temperature Evolution


It is important to note that Lctrl and Lhist are linear operators when applied to any tracer fields. For instance, we have Lhist(Θhist) − Lhist(Θctrl) = Lhist(Θhist − Θctrl). The same does not hold for the Φ operator because it is nonlinear in Θ. In Section 2.2.3, we will use the linearity of Lhist to derive the governing equation of redistributed temperature. Lctrl and Lhist both have time-varying coefficients due to variability and change in ocean transports.
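The distinction between a linear transport operator with prescribed coefficients and an operator whose coefficients depend on the tracer can be illustrated with a toy example. The sketch below uses a 1D discrete diffusion operator as a stand-in; it is an assumption-laden illustration and not the HadCM3 operators. Both function names are hypothetical.

```python
import numpy as np


def linear_transport(x, kappa):
    """Discrete 1D diffusion with prescribed coefficients kappa
    (independent of x) and no-flux ends. Linear in x."""
    flux = kappa * np.diff(x)                      # flux between adjacent cells
    return np.diff(flux, prepend=0.0, append=0.0)  # flux convergence per cell


def nonlinear_transport(theta):
    """Toy operator whose coefficients depend on the tracer itself,
    standing in for a transport that responds to temperature."""
    kappa = 1.0 + 0.5 * np.abs(np.diff(theta))
    return linear_transport(theta, kappa)


a = np.array([0.0, 1.0, 3.0, 6.0])
b = np.array([0.0, 2.0, 2.0, 5.0])
kappa = np.array([1.0, 2.0, 0.5])  # prescribed, may vary in space

lin_ok = np.allclose(linear_transport(a, kappa) - linear_transport(b, kappa),
                     linear_transport(a - b, kappa))
nonlin_ok = np.allclose(nonlinear_transport(a) - nonlinear_transport(b),
                        nonlinear_transport(a - b))
```

Here `lin_ok` is true and `nonlin_ok` is false, mirroring the statement that differencing commutes with Lhist but not with Φ.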
2.2.3 Excess and Redistributed Temperatures



The key difference between Θe and Θr is that Θe only comes from the surface, while Θr has sources throughout the volume of the ocean. The global volume integral of Θr is zero, because the effect of L integrates to zero over the global ocean. (Θr defined in Gregory et al. (2016) does not have exactly zero volume integral.)
2.2.4 Converting Temperature to Heat Content
We compute the ocean heat content anomaly by integrating the temperature anomaly over an arbitrary control volume and multiplying by ρ0cp, where positions in the ocean are given by the 3D vector r and ρ0cp ≡ 4 × 10^6 J K−1 m−3. Applying the same procedure to Θe and Θr yields the excess heat content and the redistributed heat content, respectively.
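The temperature-to-heat-content conversion can be sketched on a discrete grid as a volume-weighted sum. This is a minimal illustration assuming gridded temperature anomalies and cell volumes; the function name, the optional mask argument, and the sample numbers are hypothetical.

```python
import numpy as np

RHO0_CP = 4.0e6  # J K^-1 m^-3, as quoted in the text


def heat_content(theta_anom, cell_volume, mask=None):
    """Heat content anomaly (J) of a control volume: rho0*cp times the
    volume integral of a temperature anomaly (K), with volumes in m^3."""
    theta_anom = np.asarray(theta_anom, dtype=float)
    cell_volume = np.asarray(cell_volume, dtype=float)
    if mask is None:
        mask = np.ones(theta_anom.shape, dtype=bool)
    return RHO0_CP * np.sum(theta_anom[mask] * cell_volume[mask])


# Hypothetical 2 x 2 grid: 0.1 K anomaly in cells of 1e15 m^3 each.
theta = np.full((2, 2), 0.1)
volume = np.full((2, 2), 1.0e15)
basin = np.array([[True, False], [True, False]])  # assumed basin mask

h_total_zj = heat_content(theta, volume) / 1.0e21
h_basin_zj = heat_content(theta, volume, basin) / 1.0e21
```

The same function applies unchanged to Θe or Θr fields, since only the input temperature differs.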
2.3 Evaluation of Historical Simulation
The HadCM3 historical simulation captures the surface and depth integrated ocean warming in observations reasonably well (Figure 1). The global mean SST in HadCM3 generally follows that in HadISST (Rayner et al., 2003), but it does not capture the early 21st century warming hiatus in HadISST (Figure 1a). HadCM3 also tends to overestimate the surface cooling after volcanic eruptions compared to observations (Figure 1a); this is a common feature among CMIP5 models (D. M. Smith et al., 2016; Marotzke & Forster, 2015). In both HadCM3 and the observations of Cheng et al. (2017), the global integrated ocean heat content (0–2,000 m) increases by about 300 ZJ in 2008 relative to 1946–1955 (1 ZJ = 1 × 10^21 J); more than half of that is stored in the Indo-Pacific (Figures 1b–1d). HadCM3 does not capture the observed plateauing of the heat content increase after the 1963 Mount Agung eruption (Figures 1b and 1c).

Surface and depth integrated ocean warming in the HadCM3 historical simulation. (a) Global averaged sea surface temperature. (b) Global integrated ocean heat content. Panels (c and d) are the same as panel (b), but for basin integrals. Global and basin integrals are calculated for the 0–2,000 m layers. All quantities are shown as anomalies relative to the 1946–1955 average. For comparison, observational estimates from Rayner et al. (2003) (HadISST) and Cheng et al. (2017) (heat content) are also plotted. 1 ZJ = 1 × 10^21 J.
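We compare the excess and redistributed heat content simulated in HadCM3 with those in Bronselaer and Zanna (2020) (BZ2020). BZ2020 infers excess heat content by scaling the pattern of anthropogenic carbon in the ocean; redistributed heat content is then derived by subtracting the inferred excess heat content from the observed heat content change.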
HadCM3 and BZ2020 have similar patterns of excess heat content change during 1951–2008 (Figure 2, left column; changes are integrated over 0–2,000 m). In both of them, excess heat tends to accumulate in the subtropical gyres and the North Atlantic, but features little signal at low latitudes. HadCM3 has larger excess heat changes than BZ2020 in the North Atlantic and the Arctic. This is partly due to different definitions of excess heat. The excess heat of BZ2020 is defined in a fixed-circulation scenario, which has a smaller QERF + Q′, hence a smaller excess heat, than a free-circulation scenario (i.e., the HadCM3 simulation) at northern high latitudes (Winton et al., 2013). Redistributed heat is less coherent in space than excess heat in both HadCM3 and BZ2020; the two data sets show very different redistributed heat changes in the subpolar North Atlantic and the Southern Ocean (Figure 2, right column).

Linear trends of excess and redistributed heat content (0–2,000 m integrated) during the 1951–2008 period. (a and b) Bronselaer and Zanna (2020) (based on anthropogenic carbon). (c and d) HadCM3 historical simulation. 1 GJ = 1 × 10^9 J.
2.4 Heat Uptake by Control Ocean Transport






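Changes of the total heat content, the excess heat content, and the excess heat content evolved by the control ocean transport in 1999–2008 relative to 1946–1955 are shown in Figure 3 (a change is denoted as “Δ”). The total heat content change is much less uniform across latitudes than the two excess heat content changes (Figures 3a and 3b), highlighting the role of redistributed heat in shaping the patterns of total heat content change. A similar result is found in Zika et al. (2021) but on a shorter timescale (2006–2017).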

Excess heat content change resulting from (1) historical and (2) control ocean transports, shown as red and blue lines, respectively. Total heat content change (the sum of excess and redistributed heat content changes) is shown as black lines. (a and b) Zonal-and-depth integrated change (0–2,000 m). (c–f) Depth distribution of panels (a and b). A change is calculated as the difference between 1999–2008 and 1946–1955. In panels (e and f), contours indicate the two excess heat content changes; shading indicates the difference between them. Contour levels are 10, 30, and 45 in panel (c); 5, 15, and 25 in panel (d); 10, 20, 30, 50, and 70 in panel (e); and 5, 10, 15, and 25 in panel (f). 1 PJ = 1 × 10^15 J. 1 TJ = 1 × 10^12 J.
The latitude distributions of the excess heat content changes evolved by the historical and control ocean transports are very similar, especially in the southern subtropics (Figure 3, compare red and blue lines). This suggests that the pattern of excess heat content change is mostly driven by the climatological ocean transport (i.e., Lctrl), not its transient response (i.e., differences between Lhist and Lctrl). A similar conclusion was found in several climate models under a 1% per year increase of the atmospheric CO2 concentration (Couldrey et al., 2021; Gregory et al., 2016). Differences between the two are most evident at northern mid latitudes, where excess heat evolved by the historical transport is redistributed equatorward relative to that evolved by the control transport in 0–200 m (Figures 3e and 3f shading). This redistribution pattern implies a weakening of the poleward ocean transport in the historical simulation.
3 Formulating Tracer Evolution Using Green's Functions




3.1 Concentration GF Formulation


3.2 Interpretations of Concentration GFs
Gc can be interpreted from two perspectives. When we fix the surface coordinate (rs, ts), Gc(r, t) is a time-evolving 3D field in the ocean. The 3D field depicts how a tracer injected at (rs, ts) spreads in the ocean subject to zero concentration BCs at all other times and surface locations. The BCs remove any tracer that surfaces after ts, hence we have , where τ = t − ts is elapsed time. This perspective is useful for probing GFs from forward simulations in an ocean model (see Section 4).
When we fix the field coordinate (r, t), Gc(rs, ts) is a time-evolving 2D map of the ocean surface. It shows how sensitive X(r, t) is to individual pulses in its surface history Xs. Holzer and Hall (2000) interpreted the 2D map as a measure of how a tracer injected at (r, t) surfaces in the time-reversed flow after t − ts. This perspective is useful for inferring GFs from tracer data (see Section 6). Causality requires that Gc = 0 whenever t < ts.

Gc has been used to study the transit-time distribution of the ocean (e.g., Ito & Wang, 2017; Maltrud et al., 2010; Peacock & Maltrud, 2006) and to estimate the ocean's uptake of anthropogenic carbon and heat (Gebbie & Huybers, 2019; Khatiwala et al., 2009; Newsom et al., 2020; Zanna et al., 2019).
3.3 Air-Sea Flux GF Formulation



3.4 Limitation of Boundary GFs
We want to stress that Gc and Gf are both boundary GFs; that is, they only account for tracers emitted from the surface. The redistributed temperature Θr cannot be accounted for using Gc or Gf because it has non-zero source below the surface (Equation 7).
4 Simulating GFs in an Ocean Model
4.1 Approximations of Simulated GFs
The boundary GFs, Gc and Gf, can be generated for an ocean model by simulating passive tracers in it. By definition, we need to compute a GF for every possible (rs, ts), which is computationally demanding. To reduce computational cost, we make the following approximations.
First, we assume that ocean transports are constant. Taking Gc as an example, this assumption means: (a) Gc is the same for Lctrl and Lhist, (b) Gc does not depend on ts, hence Gc(r, t | rs, ts) = Gc(r, t − ts | rs, 0). Note that Gc(r, t − ts | rs, 0) only needs to be solved once (at ts = 0) for every rs. The constant-transport assumption neglects variability and forced change in ocean transports; we refer to the resulting errors as an “unforced-transport error” and a “forced-transport error,” respectively.
Second, we assume that the boundary terms, Xs and the air-sea flux term, are dominated by large-scale patterns, hence we can approximate tracers emitted from them using coarse-grained GFs. Specifically, we derive GFs using surface patches defined in Figure 4. For Gc, we divide the global ocean into 27 regions based on the climatological surface densities in HadCM3, similar to Khatiwala et al. (2009). For Gf, we divide the global ocean into 20° latitude bands for each basin (20 patches in total). This step greatly reduces the dimension of GFs at the surface (the dimension of rs is about 1 × 10^4 in a 1° × 1° model). Both boundary terms are averaged onto the corresponding patches when convolved with GFs.
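Averaging a surface boundary term onto patches can be sketched as an area-weighted mean per patch index. This is a minimal illustration, assuming the patch map is stored as an integer array; the function name and the sample data are hypothetical and do not correspond to the patches of Figure 4.

```python
import numpy as np


def patch_average(field, area, patch_index, n_patches):
    """Area-weighted mean of a surface field over each patch.
    patch_index holds an integer patch label for every surface cell;
    patches without any cells are returned as NaN."""
    labels = np.asarray(patch_index).ravel()
    area = np.asarray(area, dtype=float).ravel()
    field = np.asarray(field, dtype=float).ravel()
    total_area = np.bincount(labels, weights=area, minlength=n_patches)
    weighted = np.bincount(labels, weights=field * area, minlength=n_patches)
    safe_area = np.where(total_area > 0, total_area, 1.0)  # avoid divide by zero
    return np.where(total_area > 0, weighted / safe_area, np.nan)


# Hypothetical four-cell surface with two populated patches out of three.
sst_anom = np.array([1.0, 3.0, 5.0, 7.0])
cell_area = np.array([1.0, 1.0, 2.0, 2.0])
patches = np.array([0, 0, 1, 1])

patch_means = patch_average(sst_anom, cell_area, patches, n_patches=3)
```

Any variation of the boundary term within a patch is discarded by this step, which is one ingredient of the patch error discussed in this section.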

Surface patches for simulating boundary Green's functions Gc and Gf. Shading indicates the patch index. Gc propagates concentration boundary conditions, while Gf propagates surface sources/sinks.
Finally, we approximate the Dirac delta function in Equations 11 and 14 using a boxcar (rectangular) function with a unit height. The boxcar function lasts for 1 year after it is activated, so that the resulting GFs capture the effect of ocean transports averaged over a year, not that of a particular month. Using surface patches and the boxcar function neglects the covariance between GFs and the boundary terms within patches and years. We refer to this error as a “patch error.”
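The idea of probing a boundary GF with a unit-height boxcar pulse and then convolving it with a surface history can be sketched in a toy setting. The example below uses a 1D diffusive column with a prescribed surface concentration and steady transport; it is an illustration under those assumptions, not the HadCM3 configuration, and all function names and parameter values are hypothetical.

```python
import numpy as np


def step(x, xs, r=0.2):
    """Advance the interior tracer by one time step of explicit diffusion.
    The surface concentration is held at xs; the bottom has no flux.
    r = kappa * dt / dz**2 must stay below 0.5 for stability."""
    padded = np.concatenate(([xs], x, [x[-1]]))
    return x + r * (padded[2:] - 2.0 * x + padded[:-2])


def simulate(xs_history, n_levels, r=0.2):
    """Direct simulation from a zero initial state; returns the interior
    tracer after each step, shape (n_steps, n_levels)."""
    x = np.zeros(n_levels)
    out = []
    for xs in xs_history:
        x = step(x, xs, r)
        out.append(x.copy())
    return np.array(out)


def boxcar_gf(n_steps, n_levels, r=0.2):
    """Response to a unit-height boxcar lasting one step, followed by a
    zero surface concentration that removes any tracer resurfacing."""
    pulse = np.zeros(n_steps)
    pulse[0] = 1.0
    return simulate(pulse, n_levels, r)


def convolve_gf(gf, xs_history):
    """Superpose lagged copies of the GF weighted by the surface history.
    Time invariance lets one GF serve every pulse time."""
    est = np.zeros_like(gf)
    for k in range(len(xs_history)):
        for j in range(k + 1):
            est[k] += gf[k - j] * xs_history[j]
    return est


n_steps, n_levels = 30, 8
surface_history = np.linspace(0.0, 1.0, n_steps)
truth = simulate(surface_history, n_levels)
estimate = convolve_gf(boxcar_gf(n_steps, n_levels), surface_history)
max_abs_diff = float(np.max(np.abs(estimate - truth)))
```

In this toy case the transport is steady and the boxcar matches the model time step, so the convolution reproduces the direct simulation to rounding error. The errors named in this section arise when those conditions are relaxed, for example when the boxcar spans a year of varying boundary values or when the transport changes in time.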
4.2 Defining Simulated GFs








4.3 Estimating Tracers Using Simulated GFs





We introduce as a shorthand for
. Using this notation, the
estimate of Θe can be written as
(substitute Θe for X in Equation 17), where
is Θe at the surface averaged onto patches. Similarly, the
estimate of
can be written as
.
4.4 Error Definitions



























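We compute the errors for the air-sea flux GF estimate (Equation 18) in a similar way as for the concentration GF estimate (Equation 17). The only difference is that, in the air-sea flux GF estimate, the boundary term is by definition the same for Θe and for its counterpart evolved by the control ocean transport.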
4.5 Surface Concentration BCs
The surface temperature anomaly, the surface excess temperature, and the surface excess temperature evolved by the control ocean transport are supplied as anomalies relative to 1860–1880 when evaluating Equation 17. This step excludes a shock in the surface excess temperature (+0.15°C in global mean) shortly after the start of the historical simulation. Because its control-transport counterpart does not show a similar behavior, we suspect that the shock is due to an abrupt change in ocean transports. If not removed, the shock would cause a warm bias in the concentration GF estimate of Θe, because the simulated GF is derived from the control experiment. Since this warm bias can be removed easily, we do not count it as a forced-transport error.
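Expressing a boundary condition as an anomaly relative to a baseline period can be sketched as below. This is a minimal illustration assuming annual-mean values with time as the leading axis and an inclusive baseline window; the function name and the synthetic series are hypothetical.

```python
import numpy as np


def anomaly(series, years, base_start=1860, base_end=1880):
    """Subtract the time mean over [base_start, base_end] (inclusive)
    from a series whose first axis is time."""
    years = np.asarray(years)
    series = np.asarray(series, dtype=float)
    in_base = (years >= base_start) & (years <= base_end)
    if not in_base.any():
        raise ValueError("baseline period does not overlap the record")
    return series - series[in_base].mean(axis=0)


# Synthetic record: a linear warming of 0.01 K per year from 1860 to 2008.
years = np.arange(1860, 2009)
surface_temp = 0.01 * (years - 1860)

surface_anom = anomaly(surface_temp, years)
```

Because the baseline mean is removed at every location, any offset established during the baseline window does not enter the convolution with the GFs.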
A comparison between the surface temperature anomaly, the surface excess temperature, and the surface excess temperature evolved by the control ocean transport is shown in Figure 5 for the 1999–2008 average. The surface temperature anomaly (black line) consistently has less warming than the surface excess temperature (red line) in the Southern Ocean and the North Atlantic (north of 40°N) by as much as 2°C (Figures 5a and 5b). This difference is likely caused by a reduction of convection and a slowdown of the Atlantic meridional overturning circulation, as shown in Gregory et al. (2016).

Surface temperature anomaly and surface excess temperature. The control-transport surface excess temperature is the same as the surface excess temperature except that it is evolved by the control ocean transport instead of the historical one. Values shown are differences between 1999–2008 and 1860–1880. (a and b) Zonal average over a basin. (c and d) Spatial map.
The surface excess temperature (red line) and its counterpart evolved by the control ocean transport (blue line) are very similar at most latitudes (Figures 5a and 5b). The exceptions are the North Pacific and the North Atlantic, where the former is much warmer than the latter. This implies a reduction of the ocean's surface-to-interior transport in those regions during the historical simulation (because global warming stratifies the ocean and thus inhibits heat uptake).
4.6 Potential Nonlinear Errors
Equation 19 assumes that the function Φ is strictly linear when operating on passive tracers in models. This is not necessarily true because some models use flux-limited transport schemes, which make Φ nonlinear even when operating on passive tracers. This nonlinear error is included in the error computed from Equation 19, but it is likely small compared to the patch error.
Excess temperature Θe can alternatively be defined as a dynamical tracer that affects ocean transports (i.e., replacing Lhist with Φ in Equation 6). This definition leads to a set of “dynamical” GFs, as opposed to “passive” GFs of Section 3 (their distinctions are further discussed in Appendix B). The GF estimate of the dynamical Θe (Equation B1) contains a nonlinear error because the dynamical ocean response is not a linear function of the forcing. In contrast, the GF estimate of the passive Θe (Equation 6) does not have a nonlinear error, because Equation 6 is strictly linear. The errors introduced in Section 4.1 can all be eliminated by simulating GFs for Equation 6 at very fine space and time resolution. The nonlinear error, however, cannot be eliminated by any means.
5 Estimating Excess Heat Using Simulated GFs
In this section, we examine how well simulated GFs can reproduce excess heat changes in the historical simulation. The inaccuracy is partitioned into the patch, unforced-transport, forced-transport and BC errors (Equations 19-22).
We focus on three metrics when comparing the model truth with the GF estimates: (a) global/basin volume integral (0–2,000 m), (b) zonal-and-depth integral (0–2,000 m), and (c) depth distribution of (b) (0–1,500 m). Metric (a) is shown as anomalies relative to 1946–1955. Metrics (b) and (c) are shown as changes between 1999–2008 and 1946–1955 (denoted using “Δ”). The root-mean-square error (RMSE) of the GF estimate (total error) and the RMS value for each error source (Equations 19-22) are listed in Figures 6-8. Each realization of GFs gives an unforced-transport error; we report the unforced-transport error averaged over four realizations. Since our metrics are all extensive quantities, we also report their normalized RMSEs; that is, the ratio between RMSE and root-mean-square magnitude (RMSM).
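The normalized RMSE defined above, the ratio of RMSE to RMSM, can be sketched as follows. This is a minimal illustration; the function names, the optional weights argument (for example, to weight by latitude band or layer thickness), and the sample numbers are assumptions rather than details given in the text.

```python
import numpy as np


def rms(values, weights=None):
    """Root-mean-square value, optionally weighted."""
    values = np.asarray(values, dtype=float)
    return float(np.sqrt(np.average(values ** 2, weights=weights)))


def normalized_rmse(estimate, truth, weights=None):
    """RMSE of an estimate divided by the RMS magnitude of the truth."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    rmse = rms(estimate - truth, weights)
    rmsm = rms(truth, weights)
    return rmse / rmsm


# Hypothetical two-point example.
truth = [3.0, 4.0]
estimate = [3.0, 5.0]
ratio = normalized_rmse(estimate, truth)
```

Normalizing by the RMSM makes errors comparable between basins whose heat uptake differs in magnitude, which is why percentages rather than absolute values are quoted for the zonal and latitude-depth metrics.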

Estimating the global/basin integrated (0–2,000 m) excess heat in the historical simulation using simulated Green's functions (GFs) (Sections 5.1 and 5.2), for both the excess heat and the control-transport excess heat. Black and gray lines show the control-transport excess heat and the excess heat in HadCM3, respectively. The control-transport excess heat is the same as the excess heat except that it is evolved by the control ocean transport. Blue and green lines are the concentration GF estimates of the control-transport excess heat and the excess heat, respectively. The air-sea flux GF estimates of the two quantities are identical and both shown by red lines. The root-mean-square magnitude (RMSM) of the model truth and the root-mean-square errors (RMSEs) of different GF estimates are listed. The RMS values of the patch, unforced-transport and forced-transport errors are listed for the concentration GF and air-sea flux GF estimates from left to right. The two air-sea flux GF estimates have different RMSEs because the control-transport excess heat and the excess heat are different in the model truth.

Estimating the latitude distribution of excess heat change in the historical simulation using the simulated concentration GF (Section 5.1), for both the excess heat and the control-transport excess heat (excess heat evolved by the control ocean transport). (a and b) Zonal-and-depth integral (0–2,000 m). (c–f) Depth distribution of panels (a and b). In all panels, black and gray lines show the control-transport excess heat change and the excess heat change in HadCM3, respectively; blue and green lines show the concentration GF estimates of the control-transport excess heat change and the excess heat change, respectively. Shading in panels (c and d) indicates errors in the concentration GF estimate of the control-transport excess heat change (the patch error). Shading in panels (e and f) indicates errors in the concentration GF estimate of the excess heat change minus the patch error (the forced-transport error). For each metric, the root-mean-square magnitude (RMSM) of the model truth and the root-mean-square error (RMSE) of the concentration GF estimate are listed, along with the RMS values of the patch, unforced-transport and forced-transport errors. All changes are calculated as differences between 1999–2008 and 1946–1955.

Estimating the latitude distribution of excess heat change in the historical simulation using the simulated air-sea flux GF (Section 5.2), for both the excess heat and the control-transport excess heat (excess heat evolved by the control ocean transport). (a and b) Zonal-and-depth integral (0–2,000 m). (c–f) Depth distribution of panels (a and b). In all panels, black and gray lines show the control-transport excess heat change and the excess heat change in HadCM3, respectively; the air-sea flux GF estimates of the two quantities are identical, and shown by red lines. Shading in panels (c and d) indicates errors in the air-sea flux GF estimate of the control-transport excess heat change (the patch error). Shading in panels (e and f) indicates errors in the air-sea flux GF estimate of the excess heat change minus the patch error (the forced-transport error). For each metric, the root-mean-square magnitude (RMSM) of the model truth and the root-mean-square error (RMSE) of the air-sea flux GF estimate are listed, along with the RMS values of the patch, unforced-transport and forced-transport errors. The two air-sea flux GF estimates have different RMSEs because the control-transport excess heat and the excess heat are different in the model truth. All changes are calculated as differences between 1999–2008 and 1946–1955.
5.1 Concentration GF Estimate
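In this subsection, we evaluate the GF estimates of the excess heat and the control-transport excess heat based on Equation 17 (referred to as the concentration GF estimate). The control-transport excess heat is the same as the excess heat except that it is evolved by the control ocean transport (see Section 2.4). The GF is simulated in the control experiment. The BCs, namely the surface excess temperature and its control-transport counterpart, are diagnosed in HadCM3 (i.e., BCs are perfectly known). Θe and its control-transport counterpart are converted to the excess heat and the control-transport excess heat, respectively, following the procedure of Section 2.2.4. We exclude the contributions resulting from the Arctic patch to be consistent with Zanna et al. (2019).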
5.1.1 Patch and Unforced-Transport Errors
We start with the control-transport excess heat (excess heat evolved by the control ocean transport). The concentration GF estimate of this quantity is inaccurate because of the patch error. For all metrics, the RMS value of the unforced-transport error is less than 1/3 of the patch error (compare numbers in Figures 6 and 7). The global integrated control-transport excess heat (black line) increases by about 300 ZJ over 1860–2008, of which two thirds are stored in the Indo-Pacific and one third in the Atlantic (Figure 6). For this metric, the concentration GF estimate (blue line) reproduces the model truth well, with an RMSE of 16 ZJ for the global integral and 8 ZJ for basin integrals (Figure 6).
The zonal-and-depth integrated change in control-transport excess heat (black line) has an RMSM of 13.6 PJ m−1 in the Indo-Pacific and 6.2 PJ m−1 in the Atlantic, averaged over latitudes (Figures 7a and 7b). The concentration GF estimate of this metric (blue line) has an error of 24% in the Indo-Pacific and 42% in the Atlantic (Figures 7a and 7b). The patch error is most evident in the North Atlantic (underestimate) and in the Southern Ocean (overestimate) (Figures 7a and 7b, compare black and blue lines).
The latitude-depth pattern of the change in control-transport excess heat (black contour) has an RMSM of 14.9 TJ m−2 in the Indo-Pacific and 5.2 TJ m−2 in the Atlantic, averaged over latitudes and 0–1,500 m (Figures 7c and 7d). The concentration GF estimate of this metric (blue contour) has an error of 26% in the Indo-Pacific and 44% in the Atlantic (Figures 7c and 7d). The patch error is strongest in the upper 200 m (Figures 7c and 7d shading). Below that, the concentration GF estimate follows the model truth broadly, except in the Atlantic around 60°N (Figures 7c and 7d, compare black and blue contours).
5.1.2 Forced-Transport Error
We next examine the concentration GF estimate of the excess heat evolved by the historical ocean transport. By definition, this estimate is inaccurate because of the patch and forced-transport errors; we subtract the patch error (Section 5.1.1) from the total error to derive the forced-transport error (Equation 21). Global warming stratifies the ocean and weakens the surface-to-interior transport. The forced-transport error arises because the simulated GF does not capture this weakening effect, hence tends to overestimate warming at depths. This is evident in Figure 6; the concentration GF estimate (green line) overestimates the excess heat (gray line) in both global and basin integrals. The overestimate is strongest at northern mid latitudes (Figures 7e and 7f shading).
In the North Atlantic, the forced-transport error is associated with a 1 Sv slowdown of the overturning circulation at 45°N after 1960 (not shown). In contrast, the overturning circulation shows little change compared to the control in the North Pacific, implying the forced-transport error there is associated with parameterized transports. Interestingly, the forced-transport error is nearly zero in the Southern Ocean. This is probably because the Southern Ocean circulation has a strong wind-driven component (Marshall & Speer, 2012), hence is less sensitive to surface warming compared to the North Atlantic circulation.
The forced-transport error is more than twice as large as the patch error for global/basin integrated excess heat (Figure 6), and about the same size as the patch error for zonal integrated excess heat change (Figure 7). The patch and forced-transport errors partially compensate each other for zonal integrated excess heat change in some regions (Figure 7, compare middle and bottom row shading), hence the RMSE of the concentration GF estimate of the excess heat is only slightly larger than that of the control-transport excess heat in Figure 7.
5.2 Air-Sea Flux GF Estimate
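In this subsection, we evaluate the GF estimates of the excess heat and the control-transport excess heat (excess heat evolved by the control ocean transport) based on Equation 18 (referred to as the air-sea flux GF estimate). The GF is simulated in the control experiment. The flux BCs are diagnosed in HadCM3 as (QERF + Q′)/(ρ0cp). The flux BC is the same for the two quantities by definition, therefore the air-sea flux GF estimates of the excess heat and the control-transport excess heat are identical. Θe and its control-transport counterpart are converted to the excess heat and the control-transport excess heat, respectively, following the procedure of Section 2.2.4.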
The air-sea flux GF estimate has smaller RMSEs than the concentration GF estimate for global/basin integrals (Figure 6) and zonal-and-depth integrals (Figures 7 and 8, top row). In particular, the unforced- and forced-transport errors of the air-sea flux GF estimate are much smaller than those of the concentration GF estimate. Note that the basin integrated excess heat and control-transport excess heat are largely determined by their surface fluxes, which are directly supplied to the air-sea flux GF estimate.
In contrast, the air-sea flux GF estimate is less accurate than the concentration GF estimate for the latitude-depth patterns of the excess heat and control-transport excess heat changes in the Indo-Pacific (compare Figure 7 with Figure 8 for (c) and (e)). This is because the air-sea flux GF estimate has a larger patch error, especially in 0–200 m (compare shading in Figure 7c with Figure 8c). In the Atlantic, the concentration GF and air-sea flux GF estimates have similar RMSEs for the latitude-depth patterns of the two quantities, but the forced-transport error is smaller in the air-sea flux GF estimate, especially below 200 m (compare shading in Figure 7f with Figure 8f).
5.3 GF Estimate in a Real-World Application
In this subsection, we simulate a real-world application of the GF method in the model world. Specifically, we estimate excess heat in the historical simulation using: (a) simulated
and (b)
derived from HadCM3. This setup corresponds to Zanna et al. (2019) who reconstructed the real-world
by combining: (a) observed
and (b)
derived from an ocean model. To distinguish the
-based
estimate (examined in Section 5.1) from the
-based
estimate (to be examined below), we refer to the latter as the
estimate. The
estimate suffers an additional BC error compared to the
estimate, because of the differences between
and
(see Section 4.5).
5.3.1 BC Error
The BC error is the largest error for the latitude-depth pattern of (Figures 9e and 9f); it is as large as the forced-transport error for basin integrated
(Figures 9a and 9b) and depth integrated
(Figures 9c and 9d). For zonal-and-depth integrated
, the BC error causes an underestimate in most of the Atlantic and south of 40°S of the Indo-Pacific (Figures 9c and 9d, compare orange and green lines). In the Southern Ocean, the underestimate caused by the BC error partially compensates the overestimate caused by the patch and forced-transport errors, reducing the total error there.

Figure 9. Estimating excess heat in the historical simulation using
and
(Section 5.3). This estimate is referred to as the
estimate. (a and b) Basin-volume integral. (c and d) Zonal-and-depth integrated change (0–2,000 m). (e and f) Depth distribution of panels (c and d). In all panels, black lines are the model truth, orange lines are the
estimate, and green lines are the
estimate in Figures 6 and 7. Shading in panels (e and f) indicates the boundary condition (BC) error (Equation 22). For each metric, the root-mean-square magnitude (RMSM) of the model truth and the root-mean-square errors (RMSEs) of the two Green's function estimates are listed, along with the RMS values of the patch, forced-change and BC errors.
5.3.2 Total Error
When all error terms are considered, the estimate (orange line) reconstructs the model truth (black line) with an error of 48% and 39% for basin integrated
in the Indo-Pacific and Atlantic, respectively (Figures 9a and 9b). In the Indo-Pacific, the total error is 26% for zonal-and-depth integrated
and 39% for its depth distribution (Figures 9c and 9e). These numbers are larger in the Atlantic, at 37% and 68%, respectively (Figures 9d and 9f). For depth-integrated
, the largest error occurs at mid and high latitudes, for example, an overestimate in the North Pacific (Figures 9c and 9d, compare black and orange lines).
6 Inferring GFs From Tracer Data
6.1 Introducing the Inverse Problem
The GF model Equation 10 connects X at (r, t) with its surface history Xs via Gc at r. This forms a constraint on Gc at r for every pair of X(r, t) and Xs in observations. In this section, we introduce a method to infer Gc from such constraints. This problem is the “inverse” of the forward problem discussed in Section 5, which uses Xs and Gc at r to estimate X(r, t). Inferring Gc is useful as it only requires tracer data; for example, one can use it to estimate the real-world Gc from observed tracers.
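The forward calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the GF at one interior location has already been discretized onto yearly lags and a handful of surface patches, and the function name `propagate`, the array layout, and the toy numbers are all hypothetical.

```python
def propagate(gf, surface_history, t):
    """Estimate X(r, t) at one interior location r.

    gf[p][lag]            : discretized boundary GF for surface patch p and lag (years)
    surface_history[p][y] : surface concentration of patch p in year index y
    t                     : year index at which X(r, t) is wanted
    """
    total = 0.0
    for p, gf_p in enumerate(gf):
        for lag, weight in enumerate(gf_p):
            year = t - lag
            if year < 0:
                break  # assumption: surface history before the record starts is zero
            total += weight * surface_history[p][year]
    return total


# Toy example: two patches, three yearly lags (weights sum to one here).
gf = [[0.5, 0.2, 0.1], [0.1, 0.1, 0.0]]
surface = [[0.0, 1.0, 2.0], [0.0, 0.0, 1.0]]
x_interior = propagate(gf, surface, t=2)
```

The inverse problem discussed in this section runs the same relation backwards: `x_interior` and `surface` are known from tracer data, and the weights in `gf` are the unknowns.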

6.2 Maximum Entropy Method
At every r, N observations of Xn and their surface histories impose N constraints on Gc via Equation 23. In practice, N is much smaller than the number of unknowns in Gc; the latter is at least the number of locations in rs. Among the infinitely many Gc that satisfy the constraints, we choose the one that is the most “similar” to an initial guess of Gc (denoted as Gpr). This method is called the Maximum Entropy (MaxEnt) method and was first applied to infer Gc by Khatiwala et al. (2009) and Holzer et al. (2010). Formally, the above procedure can be cast as a constrained optimization problem, and solved using the method of Lagrange multipliers.
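To make the idea concrete, the sketch below solves a deliberately tiny version of the problem: four unknown weights, one tracer constraint, and a normalization. It assumes that “similarity” is measured by relative entropy with respect to the prior, in which case the solution is the prior multiplied by an exponential of the constraint kernel. The function name `maxent_update`, the bisection solver, and all numbers are hypothetical; a real application has many constraints and therefore many multipliers.

```python
import math


def maxent_update(prior, kernel, target, lo=-50.0, hi=50.0, iters=200):
    """Return the distribution closest to `prior` (in relative entropy)
    whose weighted sum with `kernel` equals `target`.

    The solution has the form g_j ∝ prior_j * exp(-lam * kernel_j); the single
    Lagrange multiplier lam is found by bisection.
    """
    def candidate(lam):
        w = [p * math.exp(-lam * k) for p, k in zip(prior, kernel)]
        z = sum(w)  # dividing by z keeps sum(g) == 1
        return [wi / z for wi in w]

    def misfit(lam):
        g = candidate(lam)
        return sum(gi * k for gi, k in zip(g, kernel)) - target

    # misfit decreases monotonically with lam, so bisection is safe
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if misfit(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return candidate(0.5 * (lo + hi))


# Toy problem: 4 unknowns, 1 tracer constraint.
prior = [0.25, 0.25, 0.25, 0.25]   # hypothetical initial guess
kernel = [1.0, 0.8, 0.5, 0.1]      # hypothetical surface tracer values
target = 0.7                       # hypothetical interior observation
g = maxent_update(prior, kernel, target)
```

With a single constraint the multiplier can be found by bisection; with N constraints one would instead solve N coupled nonlinear equations for the multipliers.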









There are other methods to infer Gc from observations. For example, Gebbie and Huybers (2010) and DeVries and Primeau (2011) estimate the operator L (Equation 9) from observations. Once L is derived, one can use it to calculate Gc analytically.
6.3 Transient Tracers in the Ocean
6.3.1 Introducing CFCs, SF6, and Bomb Δ14C
Observations of CFC-11, CFC-12, and SF6 (Fine, 2011) are often used as data constraints in the MaxEnt method (i.e., Xn in Equation 23). CFCs and SF6 are man-made chemical tracers that have been released into the atmosphere since the 1930s and gradually taken up by the ocean. CFCs and SF6 are stable in the oxygenated open ocean. Once entering the ocean, they are advected and diffused by ocean transports, like passive tracers.
We also explore the use of bomb 14C as data constraints in the MaxEnt method. 14C is commonly expressed as Δ14C, which is the deviation of the 14C/12C ratio relative to a standard value. 14C is naturally generated in the atmosphere by cosmic rays. The 14C content of a water parcel decays with a half-life of 5,730 years once it is out of contact with the atmosphere. During the 1950s and 1960s, nuclear weapons tests dramatically increased Δ14C in the atmosphere. This “bomb Δ14C” signal invades the ocean in a way similar to CFCs and SF6.
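As a small worked example of the numbers involved, the sketch below evaluates the radioactive decay implied by the 5,730-year half-life and a simplified Δ14C. The per-mil scaling is the usual reporting convention; the simplified formula omits the fractionation and sample-age corrections used in practice, and the function names are hypothetical.

```python
import math

HALF_LIFE_14C = 5730.0  # years, as quoted in the text


def remaining_fraction(years_isolated):
    """Fraction of the original 14C left after radioactive decay."""
    return math.exp(-math.log(2.0) * years_isolated / HALF_LIFE_14C)


def delta14c_permil(ratio, standard_ratio):
    """Simplified Delta14C: relative deviation from a standard, in per mil.
    (Fractionation and sample-age corrections are deliberately omitted.)"""
    return (ratio / standard_ratio - 1.0) * 1000.0


# Decay over 1,000 years of isolation from the atmosphere.
frac_1000yr = remaining_fraction(1000.0)
```

Because the decay is slow compared with the few decades since the weapons tests, the bomb signal over the historical period behaves almost like a conserved transient tracer.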
6.3.2 Spatial Distribution
We use results from a historical simulation of CESM2 (Danabasoglu et al., 2020) to illustrate the passage of CFCs, SF6 and bomb Δ14C in the ocean (Figure 10). The CESM2 historical simulation is conducted under the CMIP6 protocol (Eyring et al., 2016; Orr et al., 2017). We derive bomb Δ14C as anomalies in Δ14C relative to its 1850–1870 climatology. Measurements of CFCs, SF6 and Δ14C from historical cruises are made available as gridded and profile data by the Global Ocean Data Analysis Project (GLODAP; Key et al., 2004; Olsen et al., 2016).

Figure 10. Transient tracers in the ocean from the CESM2 historical simulation. (a) Sea surface concentrations of CFC-11, CFC-12, and SF6 (solid lines) and their atmosphere mixing ratios (dashed lines). Both quantities are shown as global mean. Dashed lines are multiplied by arbitrary scaling factors. (b and c) CFC-11 at year 1994 at the surface and the 150°W section (shading and black contours). For these two metrics, CFC-12 and SF6 have similar patterns compared to CFC-11, but with different magnitudes. Panels (d–f) are the same as panels (a–c), but for bomb Δ14C. CFC-11 and bomb Δ14C in HadCM3 (Section 6.4) and observations are shown as red and green contours, respectively, in the bottom row. 1 nmol = 1 × 10−9 mol.
Both CFC-11 and bomb Δ14C invade the ocean from the surface, similar to how excess heat is carried to depths, for example, at the 150°W section (Figures 10c and 10f, shading). A major difference between CFC-11 and 14C is that the latter has a much longer air-sea equilibration timescale than the former (10 years vs. weeks) (Broecker & Peng, 1974). This has two consequences for CFC-11 and bomb Δ14C in the ocean. First, in the global mean, the surface CFC-11 (solid line) follows its atmospheric history (dashed line) closely, while the surface bomb Δ14C shows a slower increase and decay compared to its atmospheric history (Figures 10a and 10d). Second, and more importantly, the surface CFC-11 and bomb Δ14C have very different patterns in the ocean (compare Figures 10b and 10e), because the pattern of bomb Δ14C is more affected by ocean transports (due to slow air-sea equilibration). CFC-12 and SF6 have air-sea equilibration timescales and spatial patterns similar to those of CFC-11.
6.4 Simulated Tracer Observations
To derive GME as one would do with observations, we include CFCs, SF6, and bomb Δ14C in the historical simulation (see Appendix D for details). The resulting CFC-11 and bomb Δ14C are similar to the gridded GLODAPv1 observations. Taking the 150°W section as an example, this is evident as a good agreement between red and green contours in Figures 10c and 10f. In polar regions, tracers in HadCM3 tend to penetrate to greater depths than those in CESM2, implying that HadCM3 has a stronger convection than CESM2 there. To isolate the forced-transport error, we also simulate CFCs, SF6, and bomb Δ14C in the control experiment. In Section 7, we use simulated observations in the historical and control experiments to estimate and
, respectively.
6.5 A Baseline Setup for Computing GME
It is important to note that GME is not uniquely defined, but depends on the choice of data constraints and priors. Because we want to test the application of GME in the real world, we construct a GME using HadCM3 equivalents of real-world observations. We refer to this GME as GMEb.
6.5.1 Data Constraints
We use four tracers simulated in HadCM3 to compute GMEb: CFC-11 and CFC-12 at year 1994, and climatological temperature and salinity. These four tracers are available in observations from the gridded GLODAPv1 data (Key et al., 2004). We choose this data set because it has nearly global coverage, so that GME can be computed everywhere in the ocean.
For climatological temperature and salinity, we repeat their surface BCs in time, and truncate the time integral in Equation 25 from (−∞, tn] to [tn − 7,999, tn] years. The 8,000-year limit is an upper bound of the timescale to tracer equilibrium in the global ocean under concentration BCs. One can set tn to an arbitrary number for climatological tracers because their Xn and surface BCs are both constant in time.
6.5.2 Space and Time Average
All data on the HadCM3 grid are averaged onto a 10° × 10° grid before solving for GME. Because every interior point has a GME, the spatial averaging reduces the total number of GME to be solved. Despite the low resolution, the coarse grid can still capture most of the spatial variability in the surface BCs of CFCs, SF6 and bomb Δ14C, because they all exhibit coherent spatial structures (Figures 10b and 10e). In the time dimension, we focus on annually averaged quantities as for simulated GFs (Sections 4 and 5). After the coarse-grained averaging, GME(r, 0 | rs, τ) becomes a 10° × 10° resolution 2D map defined on a yearly grid for a given r.
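The spatial averaging step can be sketched as a block average over ocean cells. This is an illustration only: the cos(latitude) area weighting, the use of `None` to mark land, and the function name `coarse_grain` are assumptions made here, since the text does not specify how the averaging is weighted.

```python
import math


def coarse_grain(field, lats, block):
    """Average a lat-lon field onto blocks of `block` x `block` cells.

    field[j][i] : value at latitude index j, longitude index i (None = land)
    lats[j]     : latitude of row j in degrees, used for cos(lat) area weights
    Returns a coarse field; a block with no ocean cells is None.
    """
    nj, ni = len(field), len(field[0])
    coarse = []
    for j0 in range(0, nj, block):
        row = []
        for i0 in range(0, ni, block):
            wsum = vsum = 0.0
            for j in range(j0, min(j0 + block, nj)):
                w = math.cos(math.radians(lats[j]))
                for i in range(i0, min(i0 + block, ni)):
                    if field[j][i] is not None:
                        wsum += w
                        vsum += w * field[j][i]
            row.append(vsum / wsum if wsum > 0.0 else None)
        coarse.append(row)
    return coarse


# Toy 2 x 4 field coarse-grained with 2 x 2 blocks.
field = [[1.0, 3.0, None, None],
         [1.0, 3.0, None, 5.0]]
coarse = coarse_grain(field, lats=[0.0, 0.0], block=2)
```

Averaging only over ocean cells keeps coastal blocks from being biased toward zero by land points.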
6.5.3 Computing Prior GFs





7 Estimating Excess Heat Using Inferred GFs
In this section, we examine how well inferred GFs GME can reproduce excess heat change in the historical simulation. We derive GME by updating a prior estimate of Gc to fit simulated tracer observations (Section 6.2).
7.1 Error Definitions






We compare the model truth with the GME estimates using the same metrics as Section 5. They are: (a) global/basin volume integral (0–2,000 m), (b) zonal-and-depth integral (0–2,000 m), and (c) depth distribution of (b) (0–1,500 m). All metrics are shown as anomalies relative to the 1946–1955 average. A change (denoted using “Δ”) is calculated as the difference between 1999–2008 and 1946–1955.
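The anomaly and change definitions above amount to two period means. A minimal sketch, with hypothetical function names and a toy time series standing in for any of the integrated metrics:

```python
def period_mean(series, years, start, end):
    """Mean of `series` over years in [start, end], inclusive."""
    vals = [v for v, y in zip(series, years) if start <= y <= end]
    return sum(vals) / len(vals)


def anomaly(series, years, base=(1946, 1955)):
    """Series expressed relative to its baseline-period mean."""
    ref = period_mean(series, years, *base)
    return [v - ref for v in series]


def change(series, years, base=(1946, 1955), late=(1999, 2008)):
    """The 'Δ' metric: late-period mean minus baseline-period mean."""
    return period_mean(series, years, *late) - period_mean(series, years, *base)


# Toy series: a linear trend of 2 units per year over 1946-2008.
years = list(range(1946, 2009))
series = [2.0 * (y - 1946) for y in years]
delta = change(series, years)
```

Using decadal means at both ends reduces the influence of interannual variability on the diagnosed change.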
7.2 Evaluating a Baseline Estimate
In this subsection, we evaluate the GMEb estimate of and
. This estimate is calculated from Equation 23, wherein we replace Gc with GMEb. We use GMEb derived from the historical and control experiments to estimate
and
, respectively.
is the same as
except that it is evolved by the control ocean transport (see Section 2.4). GMEb is a particular GME constrained by simulated observations in HadCM3 (see Section 6.5). The BCs
and
are diagnosed in HadCM3 (i.e., BCs are perfectly known). Note that Section 5 uses the same
to estimate
and
, which is different from here.
7.2.1 Information Error
The GMEb estimate (blue line) reproduces the global/basin integrated in HadCM3 (black line) well (Figure 11), with an error of 25% for the global ocean, 27% for the Indo-Pacific and 38% for the Atlantic. (A percentage error is calculated as the ratio between RMSE and RMSM.) A constant 50 ZJ offset between the GMEb estimate and the model truth is evident in Figure 11a after 1965 (compare blue and black lines).
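The percentage error quoted here is the RMSE normalized by the RMS magnitude of the truth. A minimal sketch with hypothetical names and toy numbers follows; whether the RMS is taken over time, latitude, or latitude and depth depends on the metric.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def percentage_error(estimate, truth):
    """RMSE of the estimate divided by the RMS magnitude (RMSM) of the truth, in percent."""
    rmse = rms([e - t for e, t in zip(estimate, truth)])
    rmsm = rms(truth)
    return 100.0 * rmse / rmsm


truth = [3.0, 4.0]      # hypothetical model-truth values
estimate = [3.0, 3.0]   # hypothetical GF estimate
err = percentage_error(estimate, truth)
```

Normalizing by RMSM makes errors comparable across metrics with different units and magnitudes, such as basin integrals and latitude-depth patterns.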

Figure 11. Estimating global/basin integrated (0–2,000 m) excess heat and
in the historical simulation using GMEb (Section 7.2). Black and gray lines show
and
in HadCM3, respectively. Blue and green lines are the GMEb estimates of
and
, respectively.
is the same as
except that it is evolved by the control ocean transport. The root-mean-square magnitude (RMSM) of the model truth, the root-mean-square errors (RMSEs) of the GMEb estimate (first number) and the prior estimate (second number), and the RMS values of the information and forced-transport errors are listed.
The GMEb estimate broadly captures the latitude-depth pattern of in the Indo-Pacific and the Atlantic, with a greater error in the latter (Figures 12a–12d compare black and blue lines/contours). The error for depth integrated
is 25% and 35% in the Indo-Pacific and the Atlantic, respectively. In both basins,
is underestimated by 0–5 PJ m−1 at most latitudes, except south of 50°S where it is overestimated (Figures 12a and 12b compare black and blue lines). The overestimate is evident over the 0–1,500 m depths, while the underestimate mostly comes from the 0–400 m depths (Figures 12c and 12d shading). For these zonal integrated metrics, the GMEb estimate has a similar accuracy compared to the
estimate (Section 5.1) in both basins.

Figure 12. Estimating the latitude distribution of excess heat change and
in the historical simulation using GMEb (Section 7.2). (a and b) Zonal-and-depth integral (0–2,000 m). (c–f) Depth distribution of panels (a and b). In all panels, black and gray lines show
and
in HadCM3, respectively; blue and green lines show the GMEb estimates of
and
, respectively. Shading in panels (c and d) indicates errors in the GMEb estimate of
(the information error). Shading in panels (e and f) indicates errors in the GMEb estimate of
minus the information error (the forced-transport error). For each metric, the root-mean-square magnitude (RMSM) of the model truth, the root-mean-square errors (RMSEs) of the GMEb estimate (first number) and the prior estimate (second number), and the RMS values of the information and forced-transport errors are listed. All changes are calculated as differences between 1999–2008 and 1946–1955.
7.2.2 Forced-Transport Error
The forced-transport error causes an overestimate in the GMEb estimate, especially at northern mid latitudes (Figures 12e and 12f, shading), similar to that in the estimate. The forced-transport error is more than twice as large as the information error for the global and the Atlantic integrated
(Figure 11), while it is about the same size as the information error for zonal integrated
(Figure 12). In the Atlantic, the underestimate caused by the information error is partially compensated by the overestimate caused by the forced-transport error, reducing the total error there (except south of 50°S) (Figure 12b).
7.2.3 Effects of Data Constraints
How do data constraints improve on the initial guess Gpr? We examine this question by comparing RMSEs between the GMEb estimate and the Gpr estimate. The Gpr estimate is calculated using the same equation as the GMEb estimate, except replacing GMEb with Gpr. The GMEb estimate has a smaller RMSE compared to the Gpr estimate for all the metrics examined in Figures 11 and 12 (shown by numbers in the legends). The reduction of RMSE is between 20% and 40% (the number is different for different metrics). The exceptions are the global and the Atlantic integrated , for which the GMEb estimate has a greater RMSE than the Gpr estimate (Figure 11c). We suspect that this increase of RMSE is related to the forced-transport error, because the same behavior is not found for
.
7.3 GF Estimate in a Real-World Application
In this subsection, we simulate a real-world application of the GF method in the model world. Specifically, we estimate excess heat in the historical simulation using: (a) simulated
and (b) GMEb derived from simulated observations. This calculation can be repeated using the real-world
and observations. To distinguish the
-based GMEb estimate (examined in Section 7.2) from the
-based GMEb estimate (to be examined below), we refer to the latter as the
estimate. The
estimate suffers an additional BC error compared to the GMEb estimate, because of the differences between
and
. The BC error of the
estimate is similar to that of the
estimate in Section 5.3 (compare Figures 9 and 13). In particular, the BC error is at least as large as the information and forced-transport errors for all metrics examined in Figure 13.

Figure 13. Estimating excess heat in the historical simulation using GMEb and
(Section 7.3). This estimate is referred to as the
estimate. (a and b) Basin-volume integral. (c and d) Zonal-and-depth integrated change (0–2,000 m). (e and f) Depth distribution of panels (c and d). In all panels, black lines are the model truth, orange lines are the
estimate, and green lines are the GMEb estimate in Figures 11 and 12. Shading in panels (e–f) indicates the boundary condition (BC) error (Equation 31). For each metric, the root-mean-square magnitude (RMSM) of the model truth and the root-mean-square errors (RMSEs) of the two Green's function estimates are listed, along with the RMS values of the information, forced-change and BC errors.
When all errors are considered, the estimate reconstructs the model truth with an error of 50% for basin integrated
and 40% for zonal-and-depth integrated
(Figures 13a–13d, RMSEs of orange lines). In the Indo-Pacific the error is largest around 40°S, while in the Atlantic the error is of similar magnitude across latitudes (Figures 13c and 13d, compare black and orange lines). It is important to note that the GMEb estimate is more accurate than the
estimate for all metrics examined here. This highlights the need to reduce the BC error when applying inferred GFs to estimate the real-world excess heat.
7.4 Sensitivity of the GMEb Estimate
In this subsection, we examine how sensitive the GMEb estimate is to the choice of data constraints and priors. For each sensitivity experiment, we focus on two metrics: (a) basin integral and (b) zonal-and-depth integrated change. Both metrics are calculated for only and integrated over the 0–2,000 m layers.
7.4.1 Constraints From SF6 and Bomb Δ14C
In the first experiment, we add SF6 and bomb Δ14C at year 1994 as additional constraints, while keeping other settings unchanged. Adding SF6 alone or SF6 and Δ14C together has little impact on the GMEb estimate (not shown). For instance, the RMSE change due to SF6 is less than 2% for all the metrics.

The GMEu estimate of (red line) is about 50% lower than the GMEb estimate (gray line) in both the Indo-Pacific and the Atlantic (Figure 14). The GMEu estimate is improved by adding bomb Δ14C (green line) as a constraint, but not by adding SF6 (blue line) (Figure 14). This is because bomb Δ14C and CFCs have very different surface BCs (Figures 10b and 10e), which provides additional constraints (equations) to the inverse problem. In contrast, the surface BCs are very similar between SF6 and CFCs. The improvement due to bomb Δ14C is much greater in the Indo-Pacific than in the Atlantic. This is probably because the surface BCs of Θe and bomb Δ14C are more alike in the Indo-Pacific than Atlantic (compare Figures 10e and 5d). In particular, the pattern of surface Θe peaks in the North Atlantic, whereas the pattern of surface bomb Δ14C is at its minimum in that region.

Figure 14. Sensitivity of the GMEu estimate to additional constraints from SF6 and bomb Δ14C. (a and b) Basin integrated excess heat. (c and d) Zonal-and-depth integrated excess heat change. The model truth is shown in black lines. The GMEu estimates constrained by different tracers are color coded. For comparison, the GMEb estimate is included as gray lines. The root-mean-square magnitude (RMSM) of the model truth and the root-mean-square errors (RMSEs) of different estimates are listed.
7.4.2 Perturbing Prior GFs
In the second experiment, we replace the FAMOUS prior in the GMEb estimate with Inverse Gaussian (IG) distributions of different shape, following Holzer et al. (2018). The IG distribution is the analytical form of Gc for constant 1D flow; a narrower IG distribution implies that the flow has a higher Peclet number (Waugh & Hall, 2002). The method of constructing the IG prior is described in Appendix E. The three IG priors tested here are called IG-0.5, IG-1.0, and IG-1.5.
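For reference, the IG form can be evaluated directly. The sketch below uses the standard two-parameter expression in terms of a mean age Γ and a width Δ, and checks numerically that it integrates to one with first moment Γ. The parameter values and function names are hypothetical, and the mapping between the IG-0.5/1.0/1.5 labels and these parameters is given in Appendix E rather than assumed here.

```python
import math


def inverse_gaussian(tau, mean_age, width):
    """IG transit-time distribution with mean `mean_age` (Γ) and width `width` (Δ)."""
    if tau <= 0.0:
        return 0.0
    pref = math.sqrt(mean_age ** 3 / (4.0 * math.pi * width ** 2 * tau ** 3))
    return pref * math.exp(-mean_age * (tau - mean_age) ** 2 / (4.0 * width ** 2 * tau))


def moments(mean_age, width, tau_max=2000.0, dt=0.05):
    """Midpoint-rule estimates of the integral and the first moment."""
    n = int(tau_max / dt)
    total = first = 0.0
    for k in range(n):
        tau = (k + 0.5) * dt
        g = inverse_gaussian(tau, mean_age, width)
        total += g * dt
        first += tau * g * dt
    return total, first


# Hypothetical parameters: mean age 50 years, width-to-mean ratio 0.5.
norm, mean = moments(mean_age=50.0, width=25.0)
```

For 1D advection-diffusion the ratio (Γ/Δ)² plays the role of the Peclet number, so reducing the width at fixed mean age gives a narrower, more advective distribution.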
Replacing the FAMOUS prior with the IG priors leads to a change in RMSE of less than 20% for all metrics examined in Figure 15. Among the three IG priors, IG-1.0 (corresponding to a Peclet number of one) gives the estimate closest to that obtained with the FAMOUS prior. The RMSE of the GMEb estimate (first column) is always reduced compared to that of the Gpr estimate (second column) regardless of which prior is used (Figure 15, numbers in the legends), except for the Atlantic integral. This highlights the constraints of CFCs and climatological temperature and salinity on the passage of excess heat in the ocean.

Figure 15. Sensitivity of the GMEb estimate to the choice of prior (Gpr). (a and b) Basin integrated excess heat. (c and d) Zonal-and-depth integrated excess heat change. The model truth is shown in black lines. The GMEb estimates with different Gpr are color coded. For each Gpr, the RMSEs of the GMEb and Gpr estimates are listed from left to right. IG-0.5, IG-1.0, and IG-1.5 are approximations of the FAMOUS prior using Inverse Gaussian forms of different shape.
7.4.3 Perturbing Time of Constraints
In the third experiment, we change the year of the data constraints in the GMEb estimate from 1994 to 1984, 1989, 1999, and 2004, while keeping other settings unchanged. The resulting change in RMSE is about 2% in the Indo-Pacific and 5% in the Atlantic (Figure 16, RMSEs in the legends).

Figure 16. Sensitivity of the GMEb estimate to the time of data constraints. (a and b) Basin integrated excess heat. (c and d) Zonal-and-depth integrated excess heat change. The model truth is shown in black lines. The GMEb estimates with different data years are color coded. The prior estimate is shown in gray lines. The root-mean-square magnitude (RMSM) of the model truth and the root-mean-square errors (RMSEs) of different estimates are listed.
8 Summary
8.1 Excess Heat and Green's Functions
The ocean stores over 93% of the “excess heat” that has entered the climate system in recent decades (Meyssignac et al., 2019). This excess heat is added to the ocean surface by air-sea fluxes (warming or cooling) and carried to depths by ocean transports. One method to estimate excess heat is to propagate its surface BCs downward using a GF representation of ocean transports. The GFs can be derived from: (a) simulating idealized tracers in a model (“simulated GFs”) or (b) solving an inverse problem using tracer observations (“inferred GFs”) (Holzer et al., 2010; Khatiwala et al., 2009; Zanna et al., 2019). In the literature, the BCs are often derived from SST anomalies.
8.2 Errors in the GF Method
-
Patch error: Simulated GFs are coarse grained in space and time, hence they partially neglect the covariance between the true GFs and surface BCs.
-
Transport error: Simulated/inferred GFs do not resolve time-varying ocean transports due to unforced variability and forced change.
-
Information error: Observations are insufficient constraints for inferring GFs.
-
BC error: SST anomalies are contaminated by redistributed changes.
-
Model error: Modeled ocean transports encoded in simulated GFs are different from those of the real world.
8.3 HadCM3 Perfect-Model Test
How different errors affect the accuracy of the GF method has not been examined in the literature. Here, we investigate this question using a historical simulation (1860–2008) conducted in the HadCM3 AOGCM. We treat this simulation as the real world, and compare excess heat diagnosed in it (as the “truth”) with that estimated using simulated/inferred GFs. Details on how different errors are computed are given in Sections 4.4 and 7.1. We focus on evaluating excess heat derived from GFs instead of GFs themselves, because not every detail in GFs matters for estimating excess heat.
8.4 Estimating Excess Heat Using Simulated GFs
We generate simulated GFs in a 200-year pre-industrial control experiment of HadCM3.
8.4.1 How Accurate Is the Method?
The simulated GFs reconstruct excess heat in the Indo-Pacific with an RMS error of 48% for the volume integral and 26% for zonal-and-depth integrated changes; the corresponding numbers are 39% and 37% in the Atlantic, respectively (including all errors except the model error). The volume integral is most affected by the forced-transport and BC errors; the patch error is <1/3 of the BC error in terms of the RMS value. The zonal-and-depth integral is affected by the patch, forced-transport and BC errors to a similar degree; the BC error is slightly larger than the other two in the Indo-Pacific. The unforced-transport error is <1/3 of the patch error for all metrics examined here. Results of this subsection are summarized in Figure 9.
8.4.2 Underestimated or Overestimated?
The patch error causes an underestimate of excess heat in the North Atlantic, and an overestimate south of 40°S. The forced-transport error causes an overestimate at most latitudes, especially in the northern subtropics. The BC error causes an underestimate in the Southern Ocean. This underestimate partially cancels out the patch and forced-transport errors, reducing the total error in the Southern Ocean. Note that the degree to which this error compensation works may differ in the real world and in other models.
8.5 Estimating Excess Heat Using Inferred GFs
We compute inferred GFs by using HadCM3 equivalents of the GLODAPv1 data as constraints to update a prior estimate of GFs. The GLODAPv1 data consist of CFC-11 and CFC-12 at year 1994 and climatological temperature and salinity (Key et al., 2004).
8.5.1 How Accurate Is the Method?
The inferred GFs reconstruct excess heat in the Indo-Pacific with an error of 50% for the volume integral and 34% for zonal-and-depth integrated changes; the corresponding numbers are 44% and 42% in the Atlantic, respectively (including all errors). The volume integral is most affected by the BC error; the information and forced-transport errors are about 2/3 of the BC error (in terms of the RMS value) in the Indo-Pacific. The zonal-and-depth integral is affected by the information, forced-transport and BC errors to a similar degree, although the BC error is slightly larger than the other two in the Indo-Pacific. Results of this subsection are summarized in Figure 13.
8.5.2 Underestimated or Overestimated?
The information error causes an underestimate of excess heat at most latitudes (except south of 50°S). The forced-transport and BC errors have the same effects as discussed for simulated GFs. In the Atlantic, the information error partially compensates the forced-transport error, reducing the total error there. It is unclear whether the same compensation would occur in the real world or in other models. Removing the BC error significantly improves the estimate with inferred GFs.
8.5.3 Sensitivity to Data Constraints and Priors
The estimate of excess heat from inferred GFs is not sensitive to: (a) shifting the data year by ±10 years, (b) small changes in the shape of prior GFs, or (c) adding 1994 SF6 and bomb 14C as additional constraints, although bomb 14C (but not SF6) helps when a less informative prior is used.
9 Discussions
9.1 Model Error of Simulated GFs
Because we use GFs simulated in HadCM3 to estimate excess heat in the HadCM3 world, our results do not include the model error. To explore this error, one could perturb simulated GFs to generate an ensemble of estimates (Zanna et al., 2019). An alternative would be to use more than one AOGCM; by treating one of them as though it were perfect, one could make an estimate with the GFs of another. This approach would not include the effect of errors common to all models.
9.2 Is Air-Sea Flux GF a Better Option?
As well as concentration BCs, one can propagate surface heat fluxes to estimate excess heat using simulated GFs (with a different configuration). In HadCM3, we find that this method gives a better estimate of excess heat than propagating concentration BCs. However, observations of surface heat flux are not adequate for the purpose of estimating excess heat. For example, the Objectively Analyzed air-sea Fluxes (Yu et al., 2008) are not available before 1985 and do not have the accuracy to resolve the global mean energy imbalance.
9.3 Simulated GFs Versus Ocean Model
The evolution of passive tracers in the model world can be studied using simulated GFs as well as ocean models. Ocean models are more accurate than simulated GFs in this regard, because they do not suffer from the patch and transport errors. In addition, GFs are computationally expensive to derive.
Nonetheless, simulated GFs are useful for the following purposes. First, simulated GFs encapsulate the effect of a model's ocean transports in a form that can be easily shared within the community. In particular, GFs are much easier to use than 3D ocean models. Second, simulated GFs can be used to quantify the surface sources and timescales of a tracer response (e.g., Marzocchi et al., 2021; Wu et al., 2021; Zanna et al., 2019).
9.4 Improving Simulated GFs
Simulating GFs with finer surface patches can reduce the patch error. In the limit where every grid box is its own patch, the patch error is completely eliminated. What is the best strategy to simulate GFs given a limited amount of computer time? Air-sea fluxes and surface concentrations of a tracer often exhibit low-dimensional structures in space. Designing patches around these structures can reduce the patch error at low computational cost (see Appendix F for two examples). In the time dimension, simulating GFs starting from various years in a historical simulation (e.g., Marzocchi et al., 2021) can reduce the unforced- and forced-transport errors. For instance, a set of GFs per year can capture time variation of ocean transports on interannual and longer timescales. Simulating GFs starting from every 10 years of the historical run would be less accurate, but more appealing computationally. Sensitivity tests to find the optimal time interval for time-dependent GFs would be useful.
9.5 Improving Inferred GFs
To reduce the information error, one could add observations in the GLODAPv2 data set (Olsen et al., 2016) as additional constraints. At present, CFCs only constrain GFs at multi-decadal and shorter lead times, limited by their surface histories. It is important to maintain observations of transient tracers in the ocean, so that new observations can be added to constrain GFs over longer lead times in the future.
9.6 Excess Temperature BCs
To derive excess temperature BCs, one could combine modeled patterns of surface excess temperature and observed global-mean SST anomalies to form hybrid excess temperature BCs. These new BCs may help reduce the contamination of redistributive cooling in the SST BCs (e.g., Figure 5). Note that there are uncertainties in the modeled patterns of excess temperature because of the spread in the modeled surface heat fluxes (e.g., in the CMIP6 ensemble).
Notation
-
- r
-
- 3D position vector of the ocean
-
- rs
-
- 2D position vector of the ocean surface
-
- t
-
- time variable in general
-
- ts
-
- time variable of surface source
-
- Qctrl
-
- Net surface heat fluxes in the pre-industrial control experiment
-
- QERF
-
- Effective radiative forcing of the historical experiment
-
- Q′
-
- Changes in surface heat fluxes due to climate feedbacks
-
- Θctrl
-
- Ocean potential temperature in the pre-industrial control experiment
-
- Θhist
-
- Ocean potential temperature in the historical experiment
-
- Φ
-
- 3D ocean transport operator in general
-
- Lctrl
-
- Φ saved from the pre-industrial control experiment
-
- Lhist
-
- Φ saved from the historical experiment
-
- Θa
-
- Historical ocean temperature anomaly, Θa = Θhist − Θctrl
-
- Θe
-
- Excess temperature tracer evolved by Lhist
-
-
- Excess temperature tracer evolved by Lctrl
-
- Θr
-
- Redistributed temperature tracer, Θa = Θe + Θr
-
-
- Historical ocean heat content anomaly, extensive form of Θa
-
-
- Excess heat content evolved by Lhist, extensive form of Θe
-
-
- Excess heat content evolved by Lctrl, extensive form of
-
-
- Redistributed heat content, extensive form of Θr
-
- X
-
- Concentration of a tracer; X could be Θa, Θe, etc.
-
- Xs
-
- X at the surface
-
-
- Xs averaged onto a yearly grid and surface patches
-
- G
-
- Boundary GF of tracer transport equation in general
-
- Gc
-
- G defined for surface concentration BCs
-
- Gf
-
- G defined for air-sea tracer fluxes or surface sources/sinks
-
-
- Simulated Gc defined for yearly- and patch-averaged surface conditions
-
-
- Simulated Gf defined for yearly- and patch-averaged surface sources/sinks
-
- GME
-
- Maximum entropy estimate of Gc in general
-
- Gpr
-
- Prior estimate of Gc used in GME
-
- GMEb
-
- GME constrained by GLODAPv1 data and the FAMOUS prior
-
- GMEu
-
- GME constrained by GLODAPv1 data and the uniform prior
Acronyms
-
- GF
-
- Green's function
-
- BC
-
- boundary condition
-
- MaxEnt
-
- maximum entropy
-
- IG
-
- Inverse Gaussian
-
- RMSE
-
- root-mean-square error
-
- RMSM
-
- root-mean-square magnitude
Acknowledgments
The authors are grateful to Samar Khatiwala for discussions of the maximum entropy method and the suggestion about testing anthropogenic radiocarbon, to Thorsten Mauritsen for providing the ECHAM6.3 surface forcing data set, and to Elaine McDonagh, Laure Zanna, Heather Graven, and other colleagues at TICTOC meetings for useful discussions. This project has received funding from the UK Natural Environment Research Council (NERC) Grant NE/P019099/1 for the TICTOC project and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant Agreement No. 786427, project “Couplet”). The authors thank Tom Haine and two anonymous reviewers for their careful evaluation of the manuscript. Their comments and suggestions helped to improve the manuscript significantly. The authors acknowledge the World Climate Research Program's Working Group on Coupled Modelling, which is responsible for CMIP, for producing and making available their model output.
Appendix A: Forcing the Historical Simulation
We did not conduct the historical simulation by prescribing CO2 and other forcing agents in the atmosphere. This is because HadCM3 does not include an up-to-date treatment of anthropogenic aerosol forcing, especially aerosol-cloud interaction. (In the historical experiments of Stott et al. (2000), the effect of tropospheric aerosol on cloud reflectivity was approximately represented by prescribed perturbations to surface albedo.) In addition, since our focus is on heat uptake, it is convenient to prescribe the heat flux which is added to the ocean. The use of surface forcing omits any non-radiative effects of forcing agents and their radiative effects directly in the atmosphere. But those effects are relatively minor and not important in this work. As shown in Section 2.3, the historical simulation is sufficiently close to observations for the purposes of this study.
Appendix B: Passive and Dynamical Θe Definitions

The GFs of Equation 6 are “passive” GFs; they are evolved by the same Lhist operator (see Equation 14) regardless of the time and location of the Θe perturbation, because Lhist is a pre-computed quantity. To simulate GFs for Equation 6 one only needs pre-computed velocities and diffusivities in principle. In contrast, the GFs of Equation B1 are “dynamical” GFs; they are evolved by different Φ operators, because Φ depends on the location and time of the Θe perturbation. To simulate GFs for Equation B1 one needs a full ocean model to interactively compute Φ for every Θe perturbation.
Because Φ depends on the forcing, the solution of Equation B1 cannot be written as a superposition of impulse responses scaled with the forcing. That is, the GF estimate of the dynamical Θe has a nonlinear error. The GF estimate of the passive Θe, however, does not have a nonlinear error, because Lhist is independent of the forcing (i.e., Equation 6 is strictly linear). The unforced- and forced-transport errors (Section 4.1) arise because we neglect the time-evolution of GFs to reduce computational time, not because the GF method is inherently inaccurate.
Appendix C: Maximum Entropy Method
C1 Problem Formulation
Inferring Gc(r, 0 | rs, τ) from Xn(r, tn) and the corresponding surface BCs is an underdetermined problem. Among the infinitely many estimates of Gc that satisfy Equation 23, we choose one based on the principle of Minimum Discrimination Information (MDI) (Kullback, 1959). Recall that Gc is like a Probability Distribution Function (PDF) (Section 3). The principle of MDI states that, to update a prior PDF using new facts (i.e., constraints), the new PDF should be chosen such that it is the “nearest” to the prior PDF. In MDI, the distance between two PDFs is measured using relative entropy.
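For reference, relative entropy can be written in a generic discrete form. The symbols p_j (the updated PDF) and q_j (the prior PDF) over discrete bins j are introduced here for illustration only and are not the notation of this paper.

```latex
D(p \,\|\, q) = \sum_j p_j \ln\frac{p_j}{q_j},
\qquad p_j \ge 0, \qquad \sum_j p_j = 1 .
```

MDI selects the p that minimizes D subject to the constraints; D is non-negative and vanishes only when p equals q.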





C2 Exact Solution












The principle of MDI is also known as the principle of Maximum Entropy (MaxEnt) (Jaynes, 1957), which maximizes the negative relative entropy subject to constraints. Here, we refer to the procedure of deriving Equation 24 as the MaxEnt method following Khatiwala et al. (2009).
C3 Least Squares Solution
In practice, an exact fit to Xn (i.e., Equation C6) is not desirable, because there are errors in Equation 23 and in the observations. Solving for a using Equation C6 sometimes results in an overfitted GME that is difficult to interpret physically. For instance, a GME may have extremely large values at just a few rs and τ locations, as opposed to the much broader distribution of a Gpr derived from models. These extreme cases often come with large an values, which modify Gpr via the exponential function in Equation 24.
To avoid overfitting, we relax the equality constraints and solve for a in a least squares sense with Tikhonov regularization (Tikhonov & Arsenin, 1977). This gives an in Equation 26. The regularization term penalizes large an values that cause GME to deviate from Gpr substantially.
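The regularized fit can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: we assume that GME takes the form of the prior multiplied by an exponential of a weighted sum of surface BCs and then normalized, and that the penalty enters as λ times a; the exact forms of Equations 24 and 26 may differ. The helper names `maxent_gf` and `fit_multipliers` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def maxent_gf(a, g_prior, s):
    """Assumed MaxEnt form: prior times exp(-sum_n a_n * s_n), normalized."""
    exponent = -(a @ s)
    exponent -= exponent.max()              # guard against overflow
    g = g_prior * np.exp(exponent)
    return g / g.sum()

def fit_multipliers(g_prior, s, x_obs, lam=1.0):
    """Solve for the multipliers a by least squares with a Tikhonov penalty."""
    def residuals(a):
        misfit = s @ maxent_gf(a, g_prior, s) - x_obs
        return np.concatenate([misfit, lam * a])   # data misfit, then penalty
    return least_squares(residuals, np.zeros(s.shape[0])).x

# Synthetic example: 3 tracers, 40 source bins (patch and lead-time combined).
rng = np.random.default_rng(0)
n_tracer, n_bin = 3, 40
lags = np.arange(1, n_bin + 1)
g_prior = np.exp(-lags / 10.0)
g_prior /= g_prior.sum()
g_true = np.exp(-lags / 15.0)
g_true /= g_true.sum()
s = rng.uniform(0.0, 1.0, size=(n_tracer, n_bin))   # surface BC per tracer and bin
x_obs = s @ g_true                                    # synthetic interior values

a_fit = fit_multipliers(g_prior, s, x_obs, lam=1.0)
g_me = maxent_gf(a_fit, g_prior, s)
```

Setting `a` to zero recovers the prior, so the penalty on `a` directly limits how far the estimate can move away from Gpr.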
We determine the λ value using the L-curve method (Hansen & O’Leary, 1993). A smaller λ value corresponds to a smaller model-data misfit and a larger magnitude of a. Setting λ = 0 gives a without the regularization (i.e., Equation C6). The L-curve method finds the smallest λ allowed before any further decrease of λ leads to a rapid increase in the magnitude of a. The L-curve method requires repeating the minimization process for different λ values, which is computationally prohibitive if carried out at each interior location. We therefore choose λ by applying the L-curve method at the centers of the subtropical gyres, where ocean tracers tend to accumulate. We find that the optimal λ value is between 0.1 and 10 at those locations; λ values within that range give a similar model-data misfit and a similar magnitude of a. Based on this evidence, we set λ to unity globally in this study.
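The scan over λ behind an L-curve can be illustrated with a linear stand-in problem. The sketch below uses ordinary linear Tikhonov regularization, which has a closed-form solution, rather than the nonlinear MaxEnt fit; the matrix `A`, vector `b`, and function name `l_curve` are hypothetical and serve only to show how misfit and solution size trade off.

```python
import numpy as np

def l_curve(A, b, lambdas):
    """Return misfit norm and solution norm for each regularization weight."""
    n = A.shape[1]
    misfit, size = [], []
    for lam in lambdas:
        # Minimizer of ||A a - b||^2 + lam^2 ||a||^2.
        a = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)
        misfit.append(np.linalg.norm(A @ a - b))
        size.append(np.linalg.norm(a))
    return np.array(misfit), np.array(size)

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
lambdas = np.logspace(-2, 2, 9)
misfit, size = l_curve(A, b, lambdas)
```

Plotting `size` against `misfit` on logarithmic axes gives the L-shaped curve; the preferred λ lies near its corner, below which the solution grows quickly for little gain in misfit.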
Appendix D: Simulating Tracer Observations
We include CFCs, SF6 and bomb Δ14C in the HadCM3 historical simulation. All these tracers are simulated in the ocean with zero initial conditions and prescribed surface concentration boundary conditions (BCs) from 1860 to 2008. The BCs are derived by interpolating monthly outputs of the CESM2 historical simulation to each timestep of HadCM3 linearly. CESM2 is chosen here because it is the only model available to us that includes CFCs, SF6 and Δ14C in a historical simulation.
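The linear interpolation of monthly BCs to model timesteps can be sketched as below. The time axis in units of months, the mid-month placement of the monthly values, and the number of timesteps are assumptions made for illustration; they are not the HadCM3 settings.

```python
import numpy as np

def monthly_to_timesteps(month_times, month_values, step_times):
    """Linearly interpolate monthly surface values to model timesteps."""
    return np.interp(step_times, month_times, month_values)

month_times = np.arange(12) + 0.5            # mid-month positions, in months
month_values = np.linspace(0.0, 11.0, 12)    # synthetic surface concentration
step_times = np.linspace(0.5, 11.5, 111)     # hypothetical model timesteps
bc_steps = monthly_to_timesteps(month_times, month_values, step_times)
```

Note that `np.interp` holds the end values constant outside the range of `month_times`, so the first and last half-months would need separate treatment in a real application.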
For simplicity, we choose not to simulate the air-sea gas transfer of chemical tracers in HadCM3, which differs from the CMIP6 biogeochemical protocol (Orr et al., 2017). This is because the MaxEnt method is not concerned with the air-sea gas transfer. GME is determined by the relationship between the surface BCs and Xn, which is only affected by ocean transports (e.g., Lhist or Lctrl).
The radioactive decay of 14C can be accounted for by adding an exponential decay term in Equation 23 (see Holzer et al., 2010). Because the history of bomb Δ14C is very short compared to its half-life, we neglect its radioactive decay in this study. The method of simulating Δ14C (hence the 14C/12C ratio) as a tracer was first proposed by Toggweiler et al. (1989).
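A quick calculation shows why neglecting the decay is reasonable. The sketch assumes the commonly quoted 14C half-life of 5,730 years and an elapsed time of 60 years, chosen here as a rough upper bound for the bomb era up to 2008; both numbers are our assumptions for illustration.

```python
HALF_LIFE_14C = 5730.0  # years (assumed, commonly quoted value)

def decayed_fraction(elapsed_years, half_life=HALF_LIFE_14C):
    """Fraction of a radioactive tracer lost after the elapsed time."""
    return 1.0 - 2.0 ** (-elapsed_years / half_life)

fraction = decayed_fraction(60.0)
```

Under these assumptions less than 1% of the bomb 14C decays over the period, which is small compared with the other errors discussed in this study.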
Appendix E: Inverse Gaussian Prior



We generate the IG prior for Λ/Γ ratios of 0.5, 1.0, and 1.5 to cover the likely range of this ratio in the ocean (Waugh et al., 2006). These priors are referred to as IG-0.5, IG-1.0, and IG-1.5, respectively.
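The three priors can be generated with a short sketch. It assumes the inverse Gaussian form commonly used for transit-time distributions, with Γ as the mean and Λ playing the role of the width parameter; the mean of 20 years, the time grid, and the function name `inverse_gaussian` are hypothetical choices for illustration.

```python
import numpy as np

def inverse_gaussian(t, mean_age, width):
    """Inverse Gaussian distribution with given mean and width parameter."""
    t = np.asarray(t, dtype=float)
    coef = np.sqrt(mean_age**3 / (4.0 * np.pi * width**2 * t**3))
    return coef * np.exp(-mean_age * (t - mean_age) ** 2 / (4.0 * width**2 * t))

def integrate(f, t):
    """Trapezoidal integral of f over the grid t."""
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))

t = np.linspace(0.01, 3000.0, 300000)    # lead time in years
mean_age = 20.0                          # hypothetical mean, in years
priors = {ratio: inverse_gaussian(t, mean_age, ratio * mean_age)
          for ratio in (0.5, 1.0, 1.5)}  # IG-0.5, IG-1.0, IG-1.5
norms = {ratio: integrate(g, t) for ratio, g in priors.items()}
means = {ratio: integrate(t * g, t) for ratio, g in priors.items()}
```

Each prior integrates to one and has the same mean; a larger ratio shifts the peak toward shorter lead times and lengthens the tail.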
Appendix F: Alternative Patch Designs
Instead of decomposing BCs into pulses in the lon-lat space, one could project BCs onto Empirical Orthogonal Functions (EOFs), and construct GFs based on the leading EOFs. (EOFs are the optimal coordinates to capture the spacetime variability of a field.) This method has one limitation: the sources of a tracer are often interpreted in terms of water-mass formation sites, but EOFs do not always project back to isolated regions in the lon-lat space.
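The EOF projection can be sketched with a singular value decomposition. The example below is a minimal illustration on a synthetic field; it omits area weighting, and the function name `leading_eofs` and the test field are hypothetical.

```python
import numpy as np

def leading_eofs(field, n_modes):
    """Leading EOFs of a (n_time, n_space) field via SVD of its anomalies."""
    anomalies = field - field.mean(axis=0)
    u, sv, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]                       # spatial patterns
    pcs = u[:, :n_modes] * sv[:n_modes]       # time series of each pattern
    explained = sv[:n_modes] ** 2 / np.sum(sv**2)
    return eofs, pcs, explained, anomalies

# Synthetic surface BCs built from two spatial patterns.
rng = np.random.default_rng(0)
time = np.linspace(0.0, 1.0, 50)
p1, p2 = rng.normal(size=30), rng.normal(size=30)
field = np.outer(np.sin(2 * np.pi * time), p1) + np.outer(time, p2)

eofs, pcs, explained, anomalies = leading_eofs(field, n_modes=2)
reconstruction = pcs @ eofs
```

In this sketch, one GF would be simulated per retained EOF, with the corresponding column of `pcs` serving as its time-dependent BC. The rows of `eofs` generally span the whole domain, which illustrates the interpretive limitation noted above.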
One could also reduce the patch error by prescribing a spatial pattern within every patch when simulating GFs. Such a pattern can be derived from long-term trends for surface excess temperature or long-term averages for surface heat fluxes, for instance. In this way, the covariance between the true GFs and BCs is better incorporated into simulated GFs than when BCs are assumed to be uniform within every patch. However, because tracers exhibit different patterns in their BCs, there is no universal pattern that would work for every tracer.
Open Research
Data Availability Statement
Outputs from the historical simulation are published at https://doi.org/10.5281/zenodo.6790458. Simulated tracer Green's functions are published at https://doi.org/10.5281/zenodo.6792335. The CESM2 data are available at https://esgf-node.llnl.gov. For the use of the HadCM3 model, contact [email protected].