Viewing Forced Climate Patterns Through an AI Lens

Many problems in climate science require extracting forced signals from a background of internal climate variability. We demonstrate that artificial neural networks (ANNs) are a useful addition to the climate science “toolbox” for this purpose. Specifically, forced patterns are detected by an ANN trained on climate model simulations under historical and future climate scenarios. By identifying spatial patterns that serve as indicators of change in surface temperature and precipitation, the ANN can determine the approximate year from which the simulations came without first explicitly separating the forced signal from the noise of both internal climate variability and model uncertainty. Thus, the ANN indicator patterns are complex, nonlinear combinations of signal and noise and are identified from the 1960s onward in simulated and observed surface temperature maps. This approach suggests that viewing climate patterns through an artificial intelligence (AI) lens has the power to uncover new insights into climate variability and change.


Introduction
Identifying a "signal" within a background of "noise" is a fundamental goal of many aspects of climate science. Examples include the tremendous body of climate literature seeking to identify climate change signals due to anthropogenic activities in global and regional climate records, as well as efforts to establish the influence of slowly varying ocean conditions on largely stochastic atmospheric modes of variability (e.g., Barsugli & Battisti, 1998). As a result, climate scientists have developed a large number of statistical methods specifically designed to extract a signal of interest from a background of climate noise. A few examples include regression to identify linear trends (e.g., Mudelsee, 2019), filtering via spectral analysis to highlight specific time scales (e.g., Wheeler & Hendon, 2004), empirical orthogonal functions to extract dominant patterns of variability (e.g., Thompson & Wallace, 1998), and even simply averaging across many ensembles of climate simulations to reduce uncertainties in projections related to unforced climate variability and model uncertainty (e.g., Hawkins & Sutton, 2009).
Even with such an extensive toolbox, identifying climate signals from either natural forcings (e.g., volcanic eruptions, solar irradiance variability) or anthropogenic forcings (e.g., greenhouse gas and aerosol emissions) remains a challenge, simply because the climate system is filled with internal noise. The El Niño-Southern Oscillation, Atlantic Multidecadal Variability, and the Interdecadal Pacific Oscillation are all well-documented examples of internal climate variability (e.g., Cassou et al., 2018; Meehl et al., 2014), and considerable debate remains over how, or even if, they respond to changes in forcings (e.g., Klockmann et al., 2018; Smith et al., 2016; Terray, 2012; Zhang & Delworth, 2016). A further complication is that the climate response to forcings can itself take the form of one of the system's own modes of internal variability (known as fluctuation-dissipation; e.g., Shepherd, 2014). Thus, the excitation of an internal mode could simply reflect stochastic, internal climate variability, or it could be evidence of a forced response.
Here, we demonstrate that artificial neural networks (ANNs) are an additional tool for the climate science "toolbox" for the purposes of extracting forced climate signals from noisy data. Specifically, we train an ANN on simulated maps of annual-mean surface temperature and precipitation to detect climate patterns forced under historical and future climate scenarios. By training the ANN to predict the year from which the maps came, the network learns to identify the spatial patterns that serve as the most reliable indicators of a changing climate: those patterns that best discriminate changes from the background noise of internal climate variability and model disagreement. These "indicator patterns" are thus a complex, nonlinear combination of the forced signal, internal climate variability, and model disagreement, and they highlight the regions of the globe with the most reliable response to a changing climate.
Recent advances in artificial intelligence (AI) have enabled autonomous detection of complex patterns in many different applications, ranging from facial recognition (Zhao et al., 2019) to extreme weather events (McGovern et al., 2017). AI is also a major new tool to improve forecasts and to aid in the development of numerical models, for instance, to better represent clouds and their atmospheric heating and moistening in relatively coarse resolution climate models (e.g., Brenowitz & Bretherton, 2019;O'Gorman & Dwyer, 2018). Our results suggest another potential application of AI approaches for climate science, namely, a new technique to identify signals amidst a large background of climate noise.

Data
We use annual-mean global 2-meter air temperature and precipitation rate output from climate model simulations performed for the Coupled Model Intercomparison Project, phase 5 (CMIP5; Taylor et al., 2012). The 29 models analyzed for temperature and the 22 models analyzed for precipitation are given in supporting information Tables S1 and S2. A single ensemble member is analyzed from each model. Each simulation is analyzed under historical forcings beginning in 1920 and then into the future (until 2099) under the RCP8.5 scenario. All of the simulations across models have similar external forcings, so deviations across model projections mainly stem from differences due to climate model physics, resolution, and numerics (i.e., model uncertainty), as well as model differences in the unforced (or internal) variability of the climate (Hawkins & Sutton, 2009).
In addition, we make use of a 40-member ensemble of the Community Earth System Model (CESM; Hurrell et al., 2013), where each simulation is run under identical RCP8.5 forcing but with differences in the initial conditions in the surface temperature field on the order of 10⁻¹⁴ K. Since the same model configuration and forcing are used for each ensemble member, differences across simulations can be attributed to internal climate variability alone. Along with the CESM-Large Ensemble (CESM-LE), we also utilize a multicentury CESM control simulation with no year-to-year changes in external forcings, and with estimated atmospheric greenhouse gas levels and aerosol loadings representative of 1850.
For observations of surface temperature, we use the BEST (Berkeley Earth Surface Temperature) gridded fields from Berkeley Earth (Rohde et al., 2013). Prior to the midtwentieth century, data coverage is poor in BEST; thus, we only analyze data from 1956 to 2018, when there is complete global coverage. Monthly observational precipitation fields were obtained from the NOAA Global Precipitation Climatology Project (GPCP), version 2.3 for 1979-2018 (Adler et al., 2018). We refer the reader to the supporting information for additional details on these data sets.

Neural Network Architecture and Training
Our analysis is set up as a prediction task, taking annual-mean global maps of temperature (or precipitation) as input and training an ANN to predict the year from which they originated. The type of ANN architecture employed is shown in Figure 1a. Each unit of the input layer (yellow; 4,050 total) represents the temperature/precipitation of one grid point of the input map. The input layer is followed by a number of hidden layers (blue) that contain auxiliary units. The output layer (red) consists of a single unit representing the estimated year; it learns a regression operation over the outputs of the last hidden layer, but without a nonlinear activation function. We use a simple architecture with a very small number of hidden layers (and units), as this setup turned out to be sufficient for our application. We opted for the simplest type of ANN with decent accuracy since the primary goal is not perfect prediction, but gaining insights into how the prediction is generated (i.e., the spatial patterns). Furthermore, we repeated our results using deeper networks (i.e., more layers and units) and found the prediction errors to be similar to those of the shallower networks employed here.
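The forward pass of such a network can be sketched as follows. This is a minimal illustration, not the authors' code: the choice of ReLU activations, the initialization scale, and the parameter layout are all assumptions made here for demonstration (the text specifies only nonlinear hidden units and a linear output unit).

```python
import numpy as np

def relu(x):
    # Assumed nonlinearity for the hidden units (illustrative choice)
    return np.maximum(0.0, x)

def predict_year(flat_map, params):
    """Forward pass of the shallow ANN: a flattened 4,050-point map passes
    through two hidden layers of 10 nonlinear units, then a single linear
    output unit that regresses the estimated year."""
    h = flat_map
    for W, b in params["hidden"]:          # two hidden layers
        h = relu(h @ W + b)
    W_out, b_out = params["output"]        # linear regression, no activation
    return (h @ W_out + b_out).item()

# Randomly initialized network over a 4,050-grid-point input map
rng = np.random.default_rng(0)
n_in, n_hidden = 4050, 10
params = {
    "hidden": [
        (rng.normal(0, 0.01, (n_in, n_hidden)), np.zeros(n_hidden)),
        (rng.normal(0, 0.01, (n_hidden, n_hidden)), np.zeros(n_hidden)),
    ],
    "output": (rng.normal(0, 0.01, (n_hidden, 1)), np.zeros(1)),
}
year = predict_year(rng.normal(size=n_in), params)
```

Before training, the output is of course arbitrary; training adjusts the weights so that the output tracks the true year of each input map.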
For all results in the main text except Figure 2, we utilize an ANN with two hidden layers of 10 units each. For visualization in Figure 2, we train the ANN with one hidden layer with a single hidden unit. Such a setup is identical to linear regression, but with the addition of a nonlinear transformation before the output unit (i.e., the year assigned to the map). Results for the single hidden unit architecture are shown in the supporting information Figure S1. When the number of hidden layers is zero, the model is a simple linear regression model, with the output being a linear weighted combination of the values at the grid points. We include the linear regression model (i.e., zero hidden layers; supporting information Figures S2, S7, and S8) as a baseline method for comparison with the more complex ANNs.
We trained the ANNs using a loss function given by the mean squared error between the predicted and true year within the training simulations. The network parameters were optimized to minimize this loss using Møller's Scaled Conjugate Gradient algorithm (Møller, 1993), an approximate second-order gradient descent method.
Regression models are often hampered by dependencies among input variables (termed multicollinearity). In our application, this arises through spatial correlation of temperature or precipitation across grid points. A common approach to deal with such collinearity is ridge regression (L2 norm), which limits the magnitude of the regression coefficients by penalizing large coefficients through an additional regularization term (Marquardt & Snee, 1975). We employ ridge regression by adding the mean of squared weight values as a penalty term to the error term in the loss function. The amount by which this penalty influences the optimization is controlled by a constant factor referred to as the ridge parameter (λ ≥ 0). For λ = 0, only the squared error in the year prediction determines the final weight values, while large values of λ limit how large individual weights can grow. This forces the model to combine values from many grid points to predict the year, resulting in indicator maps that are generally smoother, thus focusing on larger patterns and reducing the chances of overfitting. Here, we only penalize weights within the first hidden layer (or between the input layer and the output layer in the linear case), since we are only interested in controlling the growth of the input weights to aid the interpretation of indicator maps. Table S3 in the supporting information lists the ridge parameter values that were found to result in good trade-offs between low prediction error in both the CMIP5 testing data and the observations. However, we have repeated our analysis over a range of ridge parameters and find our general conclusions to be robust.
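The ridge-penalized loss described above can be sketched as follows, with `ridge_lambda` playing the role of λ and, per the text, only the first-layer weights entering the penalty. The numbers below are toy values for illustration only.

```python
import numpy as np

def loss(pred_years, true_years, first_layer_weights, ridge_lambda):
    """Mean squared error in the predicted year, plus an L2 (ridge) penalty
    on the input-to-first-hidden-layer weights only. ridge_lambda = 0
    recovers the unregularized loss."""
    mse = np.mean((pred_years - true_years) ** 2)
    penalty = ridge_lambda * np.mean(first_layer_weights ** 2)
    return mse + penalty

# Toy example: one map predicted 5 years early, a tiny 2x2 weight matrix
W1 = np.array([[0.5, -0.5],
               [1.0,  0.0]])
base = loss(np.array([1980.0]), np.array([1985.0]), W1, ridge_lambda=0.0)
regularized = loss(np.array([1980.0]), np.array([1985.0]), W1, ridge_lambda=1.0)
```

Increasing `ridge_lambda` makes large weights costlier, so the optimizer spreads predictive weight across many grid points rather than a few.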
Each ANN is trained over the entire 1920-2099 period for 500 iterations on 80% of the model simulations and then tested on the remaining 20%. Except for the results shown in Figure 3, which are a summary of 15 different training/testing sets, all temperature (or precipitation) results utilize the same training/testing sets and the same initialization of the model. This is done so that any differences in results across architectures (i.e., linear, one hidden layer, two hidden layers) can be attributed to the architecture and not the training/testing set. Values vary slightly for different training/testing sets, but overall results are robust.
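The 80%/20% split described above operates on whole simulations rather than individual years, so no model contributes maps to both the training and testing sets. A minimal sketch (the seeding and rounding conventions here are assumptions, not the authors' exact procedure):

```python
import numpy as np

def split_models(model_names, train_frac=0.8, seed=0):
    """Split whole simulations (not individual years) into training and
    testing sets, so no model appears in both."""
    rng = np.random.default_rng(seed)
    names = list(model_names)
    rng.shuffle(names)
    n_train = int(round(train_frac * len(names)))
    return names[:n_train], names[n_train:]

# Hypothetical placeholder names standing in for the 29 CMIP5 temperature models
models = [f"model_{i:02d}" for i in range(29)]
train, test = split_models(models)
```

With 29 temperature models this yields 23 training and 6 testing simulations, consistent with the six testing simulations shown for temperature in Figure 1.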

CMIP5 Results
First, consider the results obtained from training an ANN on simulated climate model output from a preindustrial control run with no year-to-year changes in anthropogenic or natural forcings. The ANN fits the noise of internal variability across the training data, but cannot identify the year from which the testing data came since there are no reliable indicator patterns to help distinguish any one year from another (Figure 1b). Contrast this with the ANN trained and tested on surface temperature maps that include external and natural forcings from 29 CMIP5 models (Figures 1c and S3). Because the forced temperature signal in the early twentieth century is small relative to internally generated variability (e.g., Figure 9.8 of Flato et al., 2013), the ANN cannot reliably predict the year; however, by the midtwentieth century a forced pattern emerges and the ANN performs well estimating the year through the end of the 21st century. During this time, anthropogenic greenhouse gas and aerosol forcings increase substantially (e.g., Lamarque et al., 2011; Figure 8.18 of Myhre et al., 2013), and thus, the learned indicator patterns become easier to differentiate from internal variability and intermodel differences as the forced signal increases.
The results for surface temperature confirm that the ANN is successfully learning forced patterns that first emerge in the twentieth century; however, the results are perhaps not surprising given that temperature exhibits one of the most robust and well-documented impacts of anthropogenic climate change from a variety of forcing agents (Andrews et al., 2018;Bindoff et al., 2013;Gregory & Andrews, 2016;Knutson et al., 2017). Precipitation, on the other hand, has lower signal-to-noise ratios and greater intermodel disagreement in response patterns (Santer et al., 1994). Nevertheless, the ANN is able to learn reliable patterns of change in annual-mean precipitation, which emerge by the 1980s when the ANN begins to successfully identify the year (Figure 1e).
To investigate whether the ANN results are primarily driven by global-mean signals in temperature and precipitation, we used the same ANN architecture and initialization seed, but first removed the global-mean value from each map. The results across the testing models are very similar to those obtained with the spatial mean included (Figures 1d and 1f). This indicates that the ANN is truly relying on spatial patterns for accurate prediction. The results also highlight one particular testing model (blue dots; CNRM-CM5) offset from the others, with the ANN-predicted year being ~40 years too early for surface temperature (Figure 1d). The fact that the network successfully identifies the ordering of the years, but is offset, suggests that this particular model exhibits similar patterns to the others but with a bias toward a smaller or delayed response.
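Removing the global mean from each map amounts to subtracting an area-weighted spatial average. A sketch using cosine-latitude weighting on a regular grid follows; the grid resolution and weighting details are assumptions, since the paper does not specify the exact procedure.

```python
import numpy as np

def remove_global_mean(field, lats):
    """Subtract the area-weighted (cos-latitude) global mean from a
    lat x lon map, leaving only the spatial anomaly pattern."""
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(field)
    global_mean = np.average(field, weights=weights)
    return field - global_mean

# Assumed 5-degree grid: a uniform 2 K warming leaves no spatial pattern
lats = np.linspace(-87.5, 87.5, 36)
field = np.full((36, 72), 2.0)
anom = remove_global_mean(field, lats)
```

A spatially uniform change is removed entirely, so any skill of the ANN on mean-removed maps must come from regional patterns alone.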

Observations
Our "model only" analysis demonstrates that the ANN successfully identifies reliable indicator patterns of change in the CMIP5 model simulations amidst the noise of internal variability and model uncertainty. But are these learned indicator patterns of change identifiable in observations? Maps of observed surface temperature from the BEST data set for years 1956-2018 are fed into the ANN trained on the climate models (Figures 1c and 1d). For the mean-included observational maps (Figure 1c), the ANN is successful at identifying the correct year after roughly 1960 from the observed maps. This demonstrates that the indicator patterns of change identified by the climate model-trained ANN are present in the observations. When the global mean is removed from the observations (Figure 1d), the ANN predictions are shifted ~25 years later, although the ordering of the years is still largely correct. This indicates that the regional patterns learned by the ANN are present in the observations, but look more like the mean-removed patterns roughly two decades later in the climate model simulations.
When observationally based precipitation data from GPCP are fed into the ANN, the network is also able to successfully identify the year (Figures 1e and 1f), indicating the detection of a forced response. When the mean is removed (Figure 1f), the ANN performs even better, with the ordering of the observed years still largely correct. As was the case for temperature (Figure 1d), the mean-removed indicator patterns are present in the observations but look more like the patterns 25 years later in the climate model simulations.

Indicator Patterns
To produce Figures 1b-1f, an ANN with two hidden layers of 10 units each was employed. We additionally trained the ANN with a much simpler architecture, one layer with only a single hidden unit (supporting information Figure S1). While the more complex architecture yields better results, its predictions take a similar form, and the ability to understand how the network learns decreases with complexity. Specifically, the simplified framework has the advantage of allowing visualization of a single set of regression weights as a map (Figures 2c and 2d). It is important to point out that the more complex ANN is not restricted to learn a single set of regression weights; instead, the patterns used by the ANN to identify the year are allowed to vary in time. Because of this, the maps shown in Figures 2c and 2d are an oversimplified picture of how the complex ANN identifies individual years. Visualizing how the network learns individual years is an area of ongoing research beyond the scope of this paper.
Classically, the response to external forcing (e.g., the RCP8.5 scenario) is obtained by averaging climate variables across multiple models and/or simulations and then plotting the linear trend or the difference between a future and a past climate state. For surface temperature, the result for the CMIP5 models reveals the well-known fingerprint of global-scale surface warming, with the strongest warming over the Arctic and continental regions (Figure 2a). In contrast, the indicator map produced by the ANN (Figure 2c) highlights regions where temperature changes offer the most reliable indication of a forced response, considering both the local internal variability as well as the level of model disagreement at each point. An illustrative example is that the ANN assigns lower weight to the warming over the Arctic as an indicator (Fyfe et al., 2013; Swart et al., 2015) because of the large internal variability there (supporting information Figures S12-S14). While the Arctic is expected to warm at an accelerated rate, a very warm Arctic in a particular year is not necessarily a good indicator of the magnitude of a forced climate response. Instead, the ANN identifies regions such as the Indian Ocean, a region characterized by low internal variability and statistically significant warming in both observations and models (Hurrell et al., 2004), as important indicators of a changing climate.
The forced response for precipitation reveals increases in the Intertropical Convergence Zone (ITCZ) over the central Pacific, whereas subtropical regions typified by subsidence become even drier (Figure 2b). Contrast this with the indicator map learned by the ANN (Figure 2d). This indicator map takes into account the internal noise in precipitation and also model disagreement in the precipitation response, which is substantial (IPCC, 2013, chap. 12). Because of this, the region with the largest forced response, the ITCZ, does not receive large weights from the ANN. For both temperature and precipitation, the indicator patterns with and without the mean removed highlight similar regions (Figures S6 and S7).

Year of Departure
For both temperature and precipitation, the learned indicator patterns are not identifiable in simulations of the early twentieth century, as seen in Figures 1c-1f. It is not until the midtwentieth century that the indicator patterns emerge from the noise and the network is better able to discern the true year, as visualized by the upward slope of the identified years in the middle-to-late twentieth century. To quantify this, the "year of departure" is defined as the year when the ANN is able to distinguish that year's map as different from any map that came from the baseline period of 1920-1939. Under this definition, the earliest possible year of departure for identifying a forced change is 1940. A step-by-step explanation of the calculation is given in the supporting information and is similar to that of Mora et al. (2013).
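A simplified version of this calculation can be sketched as follows. The paper's full step-by-step procedure is in the supporting information; here, as a rough approximation only, the departure year is taken as the first year whose predicted year exceeds every baseline-period prediction and remains above thereafter. The toy prediction series is contrived for illustration.

```python
import numpy as np

def year_of_departure(years, predicted, baseline_end=1939):
    """First year whose ANN-predicted year exceeds every prediction from
    the 1920-1939 baseline period and stays above it for all later years.
    A simplified stand-in for the paper's full procedure."""
    years = np.asarray(years)
    predicted = np.asarray(predicted)
    baseline_max = predicted[years <= baseline_end].max()
    above = predicted > baseline_max
    for i in range(len(years)):
        if above[i:].all():
            return int(years[i])
    return None  # never departs from the baseline

# Toy predictions: flat (indistinguishable) until 1978, then tracking the truth
years = np.arange(1920, 2100)
predicted = np.where(years < 1978, 1935.0, years.astype(float))
yod = year_of_departure(years, predicted)
```

Flat predictions early in the record, as in Figures 1c-1f, keep the departure year late; once predictions begin climbing above the baseline range, the year of departure is registered.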
Global maps of simulated surface temperature first depart from the baseline period by the 1960s, with a median year of 1978 computed across 15 iterations of training the ANN (Figure 3a). The spread displays the range of values obtained across all climate model testing simulations over 15 different combinations of training the ANN. When trained for subregions of the globe, the year of departure shifts later due to the decrease in the signal-to-noise ratio with smaller scales (Hawkins & Sutton, 2009). In the tropics, the ANN is able to reliably detect a forced response from individual surface temperature maps by the 1980s, with identification in the extratropics 10 years later. In contrast, the polar regions have much later years of departure due to the larger internal variability and model uncertainty (supporting information Figures S12 and S13). Maps of global precipitation depart from the baseline as early as the 1980s, with a median year of departure of 1997 (Figure 3b). This is nearly two decades after the corresponding result for temperature due to the smaller signal-to-noise ratio and larger model disagreement in precipitation projections (e.g., Marvel & Bonfils, 2013; Neelin et al., 2006).
To demonstrate the important role that model uncertainty plays in shifting the year of departure to later years, we trained the ANN on an ensemble of 40 simulations using a single climate model (CESM-LE; pink boxes in Figure 3). Years of departure are significantly earlier than those trained across the CMIP5 ensemble since the ANN need only separate the forced signal from internal variability; however, this substantially degrades the observational predictions ( Figure S4). Only for the Arctic are the years of departure for a single climate model similar to those from the CMIP5-trained ANN, suggesting the dominant role of internal variability in this region (e.g., Olonscheck et al., 2019) over model uncertainty in masking a forced response there.

Discussion
The Historical and RCP8.5 forcing scenarios include variations in solar irradiance, changes in anthropogenic aerosols and black carbon, aerosol loadings due to volcanic eruptions, and changes in both tropospheric and stratospheric ozone, among others. In addition, anthropogenic land cover change has a direct impact on Earth's radiation budget through changes in the surface albedo, as well as through impacts on greenhouse gas concentrations from (principally) deforestation (Ciais et al., 2013). Since we have made no attempt to isolate one particular forcing from the others, the indicator patterns learned by the ANN can only strictly be viewed as the response to the combined effects of these forcings that rise over the noise of internal variability and model disagreement. In this sense, even though the ANN detects a forced change in climate, this cannot be viewed as a formal attribution study.
With that said, best estimates of the time histories of forcings over the historical record exist (see IPCC, 2013, Ch. 8). Natural drivers of climate change operate on multiple time scales, particularly solar variability and highly episodic volcanic forcing. While it is possible that a deep neural network could learn the specific years of volcanic eruptions, for instance, it is clear that our shallow network is unable to do so. This can be seen in Figure S16, where the ANN's prediction of the year is approximately 10-15 years too early following the eruption of Mount Pinatubo in 1991. This underestimate of the year reflects the cooling of the climate due to volcanic aerosols (e.g., Robock, 2000), which the shallow ANN instead identifies as being due to the map being from a year a decade or more earlier. Similarly, that the ANN predictions are relatively flat prior to ~1960 (Figures 1c-1f) suggests that the neural network is not relying on early twentieth century increases in the solar irradiance, which shows a slight overall decline after 1950 (Figure 8.11 of Myhre et al., 2013).
In contrast, emissions of carbon dioxide have made the largest contribution to the increased anthropogenic forcing of the climate in every decade since the 1960s, and our results are consistent with the formal detection and attribution literature, which attributes changes in global temperature and precipitation to increasing concentrations of carbon dioxide and other greenhouse gases over the latter half of the twentieth century (Bindoff et al., 2013). This includes studies which have illustrated the "fingerprint" of a human influence on tropospheric and stratospheric temperatures, including the time of emergence of that forced signal (~year 2000; Santer et al., 2018).

Conclusions
What distinguishes this neural network method from other approaches for isolating forced climate patterns is that the signal, the internal variability, and the model differences are allowed to evolve nonlinearly over the 20th and 21st centuries, with no need to estimate the internal variability from a long control run or assume it is stationary. Furthermore, detecting the signal in observations does not require one to directly estimate the internal variability from a model or from a detrended historical record. Such estimates have their own limitations, since low-frequency internal variability may be included in the estimate of the forced response, leading to an underestimate of the noise (Deser et al., 2012). With that said, this approach is only useful insofar as the models are able to adequately simulate the observed internal variability as it evolves; if they do not, the ANN will perform poorly when predicting the year from observational maps.
The results shown here are strongly suggestive of the potential power of artificial neural networks for climate research. For example, this framework may be useful for quantifying differences in the timing and patterns of climate change across different climate models, and for evaluating models against observational data sets. Furthermore, this approach could be useful for detection and attribution of anthropogenic climate change, provided the simulations specifically isolate the forcings of interest. Application of additional analysis and interpretation techniques for ANNs, such as regularized input optimization (Erhan et al., 2009) and deep Taylor decomposition (Montavon et al., 2017), has the potential to reveal how the indicator patterns vary in time.