Quantifying the net contribution of the historical Amazonian deforestation to climate change

Recent large‐scale carbon (C) emissions from deforestation have been estimated by combining remotely‐sensed land use change information with satellite‐based aboveground biomass (AGB) data. However, these estimates are constrained to the satellite era while regions such as the Amazon basin have been heavily impacted by deforestation before this period. Assessing the net contribution of past tropical deforestation to the growth in atmospheric CO2 is therefore challenging. We address this lack of data by constructing two maps of potential AGB with a machine learning algorithm trained on the relationship between AGB and climate and topography in intact forest landscapes of the Amazon basin. Reconstructions converge to a current deficit of 11.5–12% in AGB or a net loss of ~7–8 Pg C of AGB in the Amazon basin compared to current estimates. This represents a net contribution of ~1.8 ppm of atmospheric CO2 or 1.5% of the historical growth.


Introduction
The terrestrial biosphere has taken up ~25% of anthropogenic emissions of carbon dioxide (CO2) since the 1960s [Le Quéré et al., 2014] and thereby helped offset global warming. However, the durability of this sink is threatened by continuous emissions from land use and land cover change, such as fire, logging, or conversion of forests to croplands, that offset about half of the terrestrial sink [Van der Werf et al., 2009].
In recent years, satellite observations have enabled us to monitor anthropogenic land cover changes globally [Hansen et al., 2013] and identify undisturbed Intact Forest Landscapes (IFL) [Potapov et al., 2008]. Combining the extensive spatial coverage of remote sensing data with ground-based measurements has enabled the generation of maps of aboveground biomass (AGB) over moist tropical regions [Saatchi et al., 2011; Baccini et al., 2012]. Satellite information about the evolution of the land cover and remote sensing-based AGB over changing regions can be used to provide first-order estimates of the amount of C emitted through deforestation between two points in time [Ramankutty et al., 2007; Loarie et al., 2009; Baccini et al., 2012; Harris et al., 2012].
However, the availability of high-resolution and high-frequency global land cover maps derived from satellite data is limited to a period from 2000 onward [Hansen et al., 2008]. Accordingly, AGB maps are assumed to be representative of the first decade of the 21st century [Saatchi et al., 2011; Baccini et al., 2012; Mitchard et al., 2014]. Regions of the arc of deforestation in the southern Amazon basin, clearly located by spatial discontinuities in the distribution of IFL [Potapov et al., 2008] and high rates of tree cover loss [Hansen et al., 2013], have been repeatedly exploited for logging decades before 2000 [Ramankutty and Foley, 1999; Loarie et al., 2009]. Therefore, the difference between current AGB in the Amazon basin and its potential (i.e., without past land use and land cover change, or LULCC) AGB remains unknown. Knowledge of this difference would shed light on the net contribution of the region's anthropogenic land use to increasing atmospheric CO2 concentrations and ongoing climate change.
To address this problem, we construct two data-informed maps of potential AGB in the Amazon basin based on two contemporary high-resolution, remote sensing-based estimates by Saatchi et al. [2011, hereinafter SA11] and Baccini et al. [2012, hereinafter BA12]. We compare the outcome of this approach for different land cover types [Jung et al., 2006] and with an ensemble (n = 25) of historical simulations (without anthropogenic LULCC) from the Intersectoral Impact Model Intercomparison Project (ISI-MIP) [Warszawski et al., 2014]. The goal of this validation is to check whether our data-informed potential AGB matches independent estimates obtained with state-of-the-art process-based global vegetation models (GVMs). Our analysis provides insights into the robustness of estimates of current AGB deficit in the Amazon basin. This deficit represents the net contribution of this region's LULCC to the historical increase in atmospheric CO2 and hence climate change.

Current Estimates of Aboveground Biomass in the Amazon Basin
Several estimates of AGB exist for the Amazon basin [Brown and Lugo, 1992; Fearnside, 1997; Potter, 1999; Houghton et al., 2001; Malhi et al., 2006; Saatchi et al., 2007, 2011; Baccini et al., 2012]. They exhibit up to a twofold difference in stocks, accompanied by large discrepancies in the spatial distribution of AGB [Mitchard et al., 2014]. Earlier studies have relied on spatial extrapolation [Brown and Lugo, 1992; Fearnside, 1997] and ecosystem models [Potter, 1999], while most recent estimates are based on high-resolution remote sensing products [Saatchi et al., 2007, 2011; Baccini et al., 2012]. Satellite sensors do not measure AGB, so recent estimates SA11 and BA12 have been created using machine learning algorithms trained to reproduce the relationship between ground-based estimates of AGB and multiple remote sensing products, such as the canopy height of the Geoscience Laser Altimeter System on board NASA's Ice, Cloud, and land Elevation Satellite [Zwally et al., 2002], elevation data from the Shuttle Radar Topography Mission, and visual and infrared data from Moderate Resolution Imaging Spectroradiometer (MODIS). SA11 further uses Quick Scatterometer radar data of surface moisture. Variations in algorithms (Maximum Entropy [Phillips et al., 2006] for SA11 and Random Forest [Breiman, 2001] for BA12) and training data lead to disagreements in the distribution and total amount of AGB between these remote sensing-based estimates [Mitchard et al., 2014]. Saatchi et al. [2011] provide a pixel-by-pixel measure of the uncertainty that integrates to represent less than ±1% of their mean total carbon stocks in Amazonia [Mitchard et al., 2014]. Baccini et al. [2012] do not provide spatially explicit uncertainty but report an uncertainty estimate of ±7% for the total carbon stocks in Amazonia [Mitchard et al., 2014].

Reconstruction of an Undisturbed Amazon Basin
Potential AGB is the amount of AGB that would exist if past and current large-scale deforestation and clearing had not replaced the original land cover, so that plant communities would have remained intact in currently cleared regions. We first trained two Random Forest algorithms [Breiman, 2001; Pedregosa et al., 2011] to reproduce AGB data SA11 and BA12 in regions identified as IFL [Potapov et al., 2008] as a function of the high-resolution (1/6° × 1/6°) climatology and topography CL2.0 data set available from the Climatic Research Unit of the University of East Anglia [New et al., 2002]. IFL have been identified by remote sensing and include continuous areas of forest where no sign of significant human LULCC can be identified [Potapov et al., 2008].
In order to consider the inherent uncertainty in the AGB maps used to train the algorithm, we performed this procedure once with each of the SA11 and BA12 maps previously aggregated to match the resolution of CL2.0 using a weighted area interpolation to the nearest neighbor. We assume that uncertainties in SA11 and BA12 are spatially uncorrelated [Mitchard et al., 2014] and that the AGB estimate reported in every training pixel is a function of the local climate. Furthermore, local and regional differences between SA11 and BA12 data sets of AGB are likely greater than their respective uncertainties. Therefore, we only consider the mean value reported in each pixel and consider the uncertainty of our estimates to be the difference between the two AGB maps. The selected climate variables correspond to quantities used by process-based ecosystem models to simulate the terrestrial carbon balance and can be derived from climate models. Monthly mean data of total precipitation, number of wet days, and number of groundfrost days were summed up to annual values. Monthly mean temperature, mean relative humidity, mean wind speed, diurnal temperature range, and mean fraction of maximum possible sunshine were averaged to annual values. Topographical information was limited to an elevation map as other landscape features that may influence the distribution of vegetation (such as slope) are smoothed at the utilized 1/6° × 1/6° spatial resolution. We also use latitude as a proxy of intraannual photoperiod amplitude. More details about the CL2.0 data set are included in the supporting information (Text S1, Figure S1, and Table S1).
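The aggregation of monthly climatologies into annual predictors described above can be sketched as follows. This is a minimal illustration only: the variable names, the dictionary layout, and the example pixel are assumptions made for the sketch and are not taken from the CL2.0 data set or from the authors' code.

```python
from statistics import mean

# Variables summed over the 12 months vs. averaged, following the text above.
# The key names are hypothetical labels chosen for this sketch.
SUMMED = {"precipitation", "wet_days", "frost_days"}
AVERAGED = {"temperature", "relative_humidity", "wind_speed",
            "diurnal_temperature_range", "sunshine_fraction"}


def annual_predictors(monthly, elevation, latitude):
    """Collapse 12 monthly values per variable into one annual predictor each."""
    annual = {}
    for name, values in monthly.items():
        if len(values) != 12:
            raise ValueError(f"{name}: expected 12 monthly values")
        if name in SUMMED:
            annual[name] = sum(values)
        elif name in AVERAGED:
            annual[name] = mean(values)
        else:
            raise KeyError(f"unknown climate variable: {name}")
    # Elevation and latitude are static and passed through unchanged.
    annual["elevation"] = elevation
    annual["latitude"] = latitude
    return annual


# Made-up pixel: six wet months and six dry months, constant temperature.
pixel = annual_predictors(
    {"precipitation": [200.0] * 6 + [50.0] * 6,
     "temperature": [26.0] * 12},
    elevation=120.0, latitude=-8.5)
print(pixel)
```

Summing versus averaging matters because counts and totals (rainfall, wet days, frost days) accumulate over the year, whereas state variables such as temperature do not.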
For simplicity, we only use annual mean or cumulative values and thereby disregard part of the seasonality in climate drivers such as the occurrence of a dry and a wet season in the southern part of the Amazon basin. However, information about precipitation seasonality is intrinsically included in other variables. For example, the amount of rainfall is similar between the northern and the southern parts of the basin, while the southern part only experiences half as many rainy days per year (Figure S1 of the supporting information). Furthermore, we assume that small-scale mechanisms that may have an impact on AGB in regions identified as IFL, such as cultivation practices of indigenous populations [Ramankutty and Foley, 1999], small-scale disturbances [Espírito-Santo et al., 2014], and past large-scale droughts, are intrinsically taken into account in the training AGB data and covered by the uncertainty represented by differences between the original maps SA11 and BA12.
The Random Forest method uses decision trees to group pixels based on their climate data. In each final cluster, a multiple linear regression is built to reproduce AGB as a function of climate data. In 90% of the disturbed pixels (non-IFL), climate data fall within the boundaries of climate data in undisturbed regions (IFL) for all variables, with an average of 97% when only one climate variable is considered. Therefore, the Random Forest models previously fitted in undisturbed regions (IFL) can predict potential AGB as a function of climate and topography in disturbed regions (non-IFL) with a reasonable degree of confidence.
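The envelope check behind the 90% figure can be illustrated with a short sketch: a disturbed pixel counts as covered when every one of its climate variables lies between the minimum and maximum observed over IFL pixels. The pixel values and variable names below are made up for illustration, and the function names are hypothetical.

```python
def climate_envelope(ifl_pixels):
    """Per-variable (min, max) over the undisturbed (IFL) training pixels."""
    names = ifl_pixels[0].keys()
    return {n: (min(p[n] for p in ifl_pixels), max(p[n] for p in ifl_pixels))
            for n in names}


def inside_envelope(pixel, envelope):
    """True if all variables of the pixel fall within the IFL range."""
    return all(lo <= pixel[n] <= hi for n, (lo, hi) in envelope.items())


def fraction_inside(non_ifl_pixels, envelope):
    """Share of disturbed pixels whose climate is covered by the training data."""
    inside = sum(inside_envelope(p, envelope) for p in non_ifl_pixels)
    return inside / len(non_ifl_pixels)


# Toy data (assumed values, not from CL2.0).
ifl = [{"precipitation": 1800.0, "temperature": 25.0},
       {"precipitation": 2600.0, "temperature": 27.0},
       {"precipitation": 2200.0, "temperature": 26.0}]
non_ifl = [{"precipitation": 2000.0, "temperature": 26.0},
           {"precipitation": 1500.0, "temperature": 26.0},   # too dry
           {"precipitation": 2400.0, "temperature": 26.5},
           {"precipitation": 2500.0, "temperature": 28.0}]   # too warm

share = fraction_inside(non_ifl, climate_envelope(ifl))
print(f"{100 * share:.0f}% of disturbed pixels lie inside the IFL envelope")
```

A per-variable range test like this is a necessary condition only: a pixel can sit inside every one-dimensional range and still represent a combination of conditions that never occurs in the training data.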
As the Random Forest will use the most relevant statistical relationship in each pixel, the whole Amazon basin can be simulated by the same algorithm regardless of the spatial distribution of IFL.
In the following, SA11r refers to the map reconstructed using SA11 as training data set, while BA12r refers to the map reconstructed using BA12 as training data set. The difference between potential AGB maps and current AGB maps provides us with a first-order estimation of the net contribution of land cover change in the Amazon basin. We provide a synthetic validation of the approach in the supporting information (Text S2 and Figure S2) using only data corresponding to IFL. It shows that the Random Forest method is skilled in predicting AGB as a function of climate variables in regions that were not used in the training data set and is therefore not overfitted. While we cannot verify potential AGB reconstructions, we are confident that their quality is comparable to the synthetic validation because of similarities in climate data, the larger number of training pixels (~10,000 versus ~5000) and the larger ratio of training to predicted pixels (1.4:1 versus 1:1) in the final reconstructions compared to the synthetic validation.
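The overall workflow (fit on IFL pixels, predict everywhere, subtract current AGB) can be sketched with scikit-learn, the library cited as Pedregosa et al. [2011]. Everything else in this sketch is an assumption: the data are synthetic, the hyperparameters are arbitrary, and the toy relationship between AGB and climate is invented. Note also that scikit-learn's `RandomForestRegressor` predicts the average of the training values in each leaf rather than fitting a regression per cluster, so this is an illustration of the idea, not a reproduction of the authors' configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the gridded predictors: one row per pixel.
n_pixels = 600
precip = rng.uniform(1200.0, 3000.0, n_pixels)     # mm per year
temperature = rng.uniform(22.0, 28.0, n_pixels)    # degrees C
elevation = rng.uniform(0.0, 800.0, n_pixels)      # m
X = np.column_stack([precip, temperature, elevation])

# Invented "undisturbed" AGB that depends on climate and elevation (Mg C per ha).
potential_true = 40.0 + 0.04 * precip - 0.02 * elevation
is_ifl = rng.random(n_pixels) < 0.6

# "Current" AGB: intact inside IFL, halved by clearing elsewhere, plus noise.
current_agb = np.where(is_ifl, potential_true, 0.5 * potential_true)
current_agb = current_agb + rng.normal(0.0, 3.0, n_pixels)

# Train on undisturbed pixels only, then predict potential AGB everywhere.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[is_ifl], current_agb[is_ifl])
potential_agb = model.predict(X)

# AGB deficit = potential minus current; it should concentrate outside IFL.
deficit = potential_agb - current_agb
mean_deficit_ifl = deficit[is_ifl].mean()
mean_deficit_non_ifl = deficit[~is_ifl].mean()
print(f"mean deficit in IFL:     {mean_deficit_ifl:6.1f} Mg C/ha")
print(f"mean deficit in non-IFL: {mean_deficit_non_ifl:6.1f} Mg C/ha")
```

Because the model never sees disturbed pixels during training, its predictions there reflect only the climate-AGB relationship learned from intact forest, which is the premise of the potential AGB reconstruction.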

Validation
Data-informed reconstructions of potential AGB in IFL cannot be validated against observations. However, we expect non-IFL currently dominated by trees to have a lower AGB deficit than landscapes without trees. Therefore, we study the distribution of AGB deficit for three broad land cover types of the land cover product SYNMAP [Jung et al., 2006]. SYNMAP is based on the synergies between three independent land cover products: the Global Land Cover Characterization Database, the Global Land Cover 2000, and the MODIS land cover product. It represents a more reliable estimate based on multiple land cover classes that we regrouped in three dominant types: trees, mixed landscapes, and landscapes without trees. We also compare our reconstructions to a large number of simulations from the recent Intersectoral Impact Model Intercomparison Project (ISI-MIP) [Warszawski et al., 2014]. Previous studies have described the high uncertainty of Amazonian AGB dynamics under climate change in GVMs [e.g., Galbraith et al., 2010]. Therefore, we use an ensemble of five global vegetation models (GVMs) driven by bias-corrected data [Hempel et al., 2013] from five general circulation models for the historical experiment of the fifth phase of the Coupled Model Intercomparison Project [Taylor et al., 2012]. This ensemble represents a good sample of the uncertainty in current state-of-the-art vegetation modeling. Models are summarized in Table S3 of the supporting information. The 25 simulations, available at a 0.5° × 0.5° spatial resolution, cover the existing uncertainties in state-of-the-art ecosystem and climate model structures. This ensemble has recently been used to address the main sources of uncertainty in terrestrial vegetation response to climate change [Friend et al., 2014].
ISI-MIP simulations do not include anthropogenic LULCC [Nishina et al., 2014] and therefore simulate potential modern AGB as a result of multiple years of model integration. This can be compared with our independent reconstructions of potential modern AGB, which are based on the empirical relationship between modern AGB in IFL and climate. Furthermore, differences in spatial resolution and boundary conditions guarantee a large degree of independence between ISI-MIP simulations and our reconstructions. We estimate belowground biomass (BGB) from AGB using equation (1) in each 1/6° × 1/6° pixel and consider total carbon to be 50% of total biomass (BGB + AGB). The above relationship is valid for forested and woodland regions [Mokany et al., 2006] and there may be a delay in the response of BGB to AGB removal. Therefore, we limit the interpretations of our results to the estimated deficit in AGB, which is the part of biomass directly removed by anthropogenic clearing activities.

Reconstructions
Reconstructed maps reproduce the spatial organization of AGB in the two original maps with a high degree of precision for IFL regions and exhibit large differences in previously disturbed (non-IFL) areas. Original and reconstructed AGB maps are presented in Figure 1 in Mg C ha⁻¹ (see also Table S2 of the supporting information). Spatial patterns observed in both training maps, such as the general lower AGB around the drainage network north of 10°S and between 75°W and 70°W, are well picked up by our reconstructions. Overall, the root-mean-square error of the reconstructions is comparable and amounts to 8.1 Mg C ha⁻¹ for SA11r and 8.4 Mg C ha⁻¹ for BA12r. Interestingly, the importance of explanatory variables varies between the two maps. In SA11, the most important variables are elevation (one of the actual training variables reported in Saatchi et al. [2011]), precipitation, and latitude, while BA12 is more sensitive to the number of wet days, wind speed, and diurnal temperature range, data that were not used in the creation of BA12. More information about variable importance can be found in Text S3 and Figure S4 of the supporting information.
The good agreement in spatial patterns and magnitude observed in Figure 1 translates into trivial underestimations of total AGB over the IFL of 0.007% (SA11r) and 0.003% (BA12r) in the reconstructed maps as compared to their respective original maps. However, reconstructions exhibit very different values compared to the original AGB maps outside the IFL (represented in Figure 2). This is obvious in the southernmost part of the Amazon basin, where the large patches of low AGB (<25 Mg C ha⁻¹) in northern Bolivia seen in SA11 (Figure 1a) and BA12 (Figure 1c) disappear in their respective reconstructions, with SA11r (Figure 1b) and BA12r (Figure 1d) converging toward values of ~75 Mg C ha⁻¹. Similarly, the large deficit in AGB found in both SA11 and BA12 maps at the border between Brazil and Guyana disappears in our reconstructions. There is more AGB around the Amazon River between the equator and 5°S in the reconstructed maps, although potential AGB outside IFL regions is consistently lower than 100 Mg C ha⁻¹ around rivers, while it can reach values around 150 Mg C ha⁻¹ elsewhere.
Reconstructions of potential AGB allow assessment of the current AGB deficit in the Amazon basin and corresponding net C losses and past contribution to atmospheric CO2 growth. Differences in AGB between reconstructions and the original maps are shown in Figure 2. We masked out IFL where differences are trivial, as seen in Figure 1. Both reconstructions agree well on the spatial distribution of AGB deficits (R² = 0.66, p < 0.001) in all regions except the westernmost part of the Amazon basin in Peru, where BA12r simulates a deficit larger than SA11r by up to 50 Mg C ha⁻¹. In both reconstructions, deficits reach up to 100 Mg C ha⁻¹. Figure 2c represents the distribution of AGB deficit according to the land use map. Both reconstructions agree that AGB deficit (i.e., in non-IFL regions) is higher in zones currently without trees (~60 Mg C ha⁻¹) than in mixed landscapes (~50 Mg C ha⁻¹) and especially in forested regions (~20 Mg C ha⁻¹). Overall, SA11r indicates that an undisturbed Amazon basin would hold 67.3 Pg C in AGB while SA11 currently indicates an AGB of 60.0 Pg C, a 7.3 Pg C deficit. The reconstruction BA12r indicates a higher estimate of 77.1 Pg C in potential AGB that corresponds to a current 8.0 Pg C deficit compared to the BA12 current AGB estimate of 69.1 Pg C.
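The basin-wide deficits follow directly from the totals quoted above. The short sketch below recomputes them and expresses each deficit both as a share of current AGB and as a share of potential AGB; only the four totals come from the text, while the dictionary layout and function name are choices made for this illustration.

```python
# Basin-wide AGB totals quoted in the text, in Pg C.
totals = {
    "SA11": {"current": 60.0, "potential": 67.3},
    "BA12": {"current": 69.1, "potential": 77.1},
}


def deficit_summary(current, potential):
    """AGB deficit and its size relative to current and potential stocks."""
    deficit = potential - current
    return {
        "deficit_pg_c": deficit,
        "pct_of_current": 100.0 * deficit / current,
        "pct_of_potential": 100.0 * deficit / potential,
    }


summary = {name: deficit_summary(**t) for name, t in totals.items()}
for name, s in summary.items():
    print(name, {k: round(v, 1) for k, v in s.items()})
```

With these totals the deficits are 7.3 and 8.0 Pg C, or about 12.2% and 11.6% of the current totals (about 10.8% and 10.4% of the potential totals), so the choice of denominator should be stated whenever a percentage is reported.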

Comparison With ISI-MIP Simulations
We have calculated the potential total plant biomass (i.e., BGB + AGB) using the relationship in equation (1) in order to compare results with the ISI-MIP ensemble. Reconstructions SA11r and BA12r provide estimates of total plant biomass in the Amazon basin of 86.8 and 99.1 Pg C, respectively. Both values lie within 1 standard deviation of the 72.7 ± 29.4 Pg C (mean ± 1 standard deviation) of total plant biomass simulated by ISI-MIP models at the beginning of the 21st century, although SA11r and BA12r exceed the ISI-MIP mean by 19% and 36%, respectively. The latitudinal distribution of SA11r- and BA12r-based potential total biomass is within 1 standard deviation of the model mean except in the southernmost part of the basin, where ISI-MIP models simulate lower biomass (Figure 3a). Both models and our reconstructions exhibit a trend of increasing biomass with latitude. The longitudinal distribution of our potential biomass estimates is within the uncertainty of the ISI-MIP models (Figure 3b). However, while ISI-MIP models exhibit a strong west-east decreasing trend in biomass, zonal averages based on our reconstructions do not show a clear trend, which is more in agreement with field plot data from IFL [Mitchard et al., 2014]. Overall, our reconstructions fall well within the high-confidence interval of potential biomass simulated by these GVMs without anthropogenic LULCC, both for total stocks and spatial distribution. This gives us confidence in the estimates of AGB deficit that can be derived from our data-informed approach.
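The comparison with the ensemble can be checked with a few lines of arithmetic. The sketch below uses only the basin totals quoted above; the function name and the use of a simple z-score as the agreement criterion are assumptions made for illustration.

```python
ISIMIP_MEAN, ISIMIP_SD = 72.7, 29.4                 # Pg C, mean and 1 standard deviation
reconstructions = {"SA11r": 86.8, "BA12r": 99.1}    # Pg C, total plant biomass


def compare_to_ensemble(value, mean, sd):
    """Excess over the ensemble mean (%), z-score, and whether |z| <= 1."""
    excess_pct = 100.0 * (value - mean) / mean
    z_score = (value - mean) / sd
    return excess_pct, z_score, abs(z_score) <= 1.0


comparison = {name: compare_to_ensemble(v, ISIMIP_MEAN, ISIMIP_SD)
              for name, v in reconstructions.items()}
for name, (excess, z, within) in comparison.items():
    print(f"{name}: +{excess:.0f}% vs ensemble mean, z = {z:.2f}, within 1 SD: {within}")
```

Both reconstructions sit above the ensemble mean but inside the one-standard-deviation range of 43.3 to 102.1 Pg C, with BA12r close to its upper edge.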

Discussion
The ability of the Random Forest method to reproduce its training data set is not surprising, considering the way it splits the AGB data into many nodes according to climate variables. However, results of the synthetic validation (Text S2 and Figure S2 of the supporting information) show that the Random Forest algorithms are not overfitted to the training data sets, as they are able to reproduce AGB in out-of-sample regions. Furthermore, the spatial distribution of AGB deficits is robust between SA11r and BA12r despite differences in their respective training data sets SA11 and BA12 and large discrepancies in the importance of explanatory variables (Text S3 and Figure S4 of the supporting information), which illustrate differences in the generation of the original maps from remote sensing data. The only mismatch, seen in the westernmost part of the basin, can be linked to this part of the basin having no matching region in the training data set in terms of climatic and topographic properties, so that the fitted model must extrapolate there. Nevertheless, SA11r and BA12r agree on larger absolute AGB deficits in currently tree-less non-IFL regions than in currently forested ones. This result is consistent with our expectation that more biomass is missing from land cover types without trees (i.e., cleared regions without forest regrowth) than in previously disturbed regions where clearance was not complete and/or regrowth has been allowed (current land cover includes trees). This agreement between AGB deficit distributions (Figure 2c) is an indicator of the robustness of our maps, as these were constructed without any land cover information. Furthermore, estimates of potential biomass derived from SA11r and BA12r are within 1 standard deviation of the mean state from the ISI-MIP ensemble of process-based GVMs for total biomass as well as the latitudinal and longitudinal distributions (Figure 3). This provides us with confidence in the robustness of our results even though the west-east longitudinal trend of biomass decrease exhibited by ISI-MIP models is absent from our reconstructions. In this respect, the reconstructions are nevertheless more in agreement with field plot data in IFL reported by Mitchard et al. [2014].
The estimated AGB deficits differ by 0.7 Pg C when integrated over the whole basin because of differences in the training data SA11 and BA12. However, AGB deficits derived from the reconstructions converge to a very similar fraction of the current AGB: 12.2% for SA11r versus 11.6% for BA12r. These results indicate that land clearance in the Amazon basin has left a deficit in aboveground carbon storage equivalent to ~12% of current stocks regardless of the AGB estimate used, although such estimates constitute the largest source of uncertainty in the tropical carbon balance [Houghton, 2003]. This reduction represents a net loss generated by large-scale human activities, the result of both biomass removal and regrowth processes occurring in regions where forests were replanted or allowed to regrow [Ramankutty et al., 2007]. Assuming that AGB maps are representative of 2005-2008 [Saatchi et al., 2011; Baccini et al., 2012] and that large-scale deforestation started in 1960 [Ramankutty et al., 2007], our reconstructions indicate an average net C loss of 0.16-0.18 Pg C yr⁻¹ over that period. This is at the lower end of estimates of gross C emissions from recent Amazonian deforestation, which range from 0.16 Pg C yr⁻¹ in 2001-2007 [Loarie et al., 2009] when only considering biomass removal to 0.41 Pg C yr⁻¹ in 1990-2009 when considering both instantaneous emissions and emissions due to subsequent changes in biogenic processes [Aguiar et al., 2012]. As deforestation rates decreased in recent decades [Nepstad et al., 2006; Malhi et al., 2008; Ramankutty et al., 2007], cumulative gross C emissions due to deforestation have been higher since 1960 than our estimates of net AGB deficits, which also include regrowth.
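The average net loss rate is simply the deficit divided by the length of the deforestation period. The sketch below reproduces the 0.16-0.18 Pg C yr⁻¹ range under the assumption that the period ends in 2005, the first year of the 2005-2008 window; that end year is an assumption of this sketch and is not stated explicitly in the text.

```python
def mean_annual_loss(deficit_pg_c, start_year, end_year):
    """Average net C loss (Pg C per year) over the deforestation period."""
    return deficit_pg_c / (end_year - start_year)


START_YEAR = 1960   # onset of large-scale deforestation, as assumed in the text
END_YEAR = 2005     # assumption: first year of the 2005-2008 window of the AGB maps

deficits = {"SA11r": 7.3, "BA12r": 8.0}   # Pg C
rates = {name: mean_annual_loss(d, START_YEAR, END_YEAR)
         for name, d in deficits.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} Pg C per year")
```

Taking 2008 as the end year instead lengthens the period to 48 years and lowers the rates to roughly 0.15-0.17 Pg C yr⁻¹, so the result is only mildly sensitive to this choice.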
Accordingly, estimates of total cumulative biomass extraction of 21.7 Pg C or 0.4 Pg C yr⁻¹ removed in Tropical South America between 1960 and 2010 [data retrieved from www.globalcarbonatlas.org; Le Quéré et al., 2014] are about 3 times larger than the current 7.3-8 Pg C AGB deficit reported here. Our estimates of the current AGB deficit in the Amazon basin represent from 1.1 to 1.7% of the 450-650 Pg C stored in global vegetation biomass [Prentice et al., 2001] and potentially more if we consider the impact of AGB removal on BGB. Using a unit conversion factor of 2.120 Pg C ppm⁻¹ [Joos et al., 2013], and considering that the land and ocean carbon sinks have maintained the airborne fraction of anthropogenic CO2 emissions at about 50% [Oeschger and Heimann, 1983], the total net losses of C from the Amazon basin AGB correspond to 1.7-1.9 ppm, or around 1.5% of the total increase in atmospheric CO2 from concentrations of 277 ppm at the beginning of the industrial era [Joos and Spahni, 2008; Le Quéré et al., 2014] to around 395 ppm at the end of 2013 [Dlugokencky and Tans, 2014; Le Quéré et al., 2014]. The 7.3-8 Pg C AGB deficit represents a potential long-term sink of about 35 years of emissions from fossil fuel burning, cement production, and gas flaring at the 2010 level for countries spanning the basin (a total of 0.21 Pg C yr⁻¹ for Bolivia, Brazil, Colombia, Peru, and Venezuela) but less than a year of emissions from humanity at the present level [Boden and Andres, 2013]. It would, however, take much longer for forests to regenerate to their potential state.
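The conversion from a carbon deficit to an atmospheric CO2 contribution can be written out explicitly. The constants below are the ones quoted in the text; the function names are hypothetical, and treating the airborne fraction as a single constant multiplier is the simplification the text itself makes.

```python
PG_C_PER_PPM = 2.120        # unit conversion factor quoted in the text
AIRBORNE_FRACTION = 0.5     # share of emitted C that remains in the atmosphere
PREINDUSTRIAL_PPM = 277.0   # atmospheric CO2 at the start of the industrial era
PPM_2013 = 395.0            # atmospheric CO2 at the end of 2013
REGIONAL_EMISSIONS = 0.21   # Pg C per year, basin countries at the 2010 level


def ppm_contribution(deficit_pg_c):
    """Atmospheric CO2 increase (ppm) attributable to a net C loss."""
    return AIRBORNE_FRACTION * deficit_pg_c / PG_C_PER_PPM


def share_of_increase(ppm):
    """Contribution as a percentage of the total rise in atmospheric CO2."""
    return 100.0 * ppm / (PPM_2013 - PREINDUSTRIAL_PPM)


def years_of_regional_emissions(deficit_pg_c):
    """Years of regional fossil emissions equivalent to the deficit."""
    return deficit_pg_c / REGIONAL_EMISSIONS


results = {}
for name, deficit in {"SA11r": 7.3, "BA12r": 8.0}.items():
    ppm = ppm_contribution(deficit)
    results[name] = (ppm, share_of_increase(ppm), years_of_regional_emissions(deficit))
    print(f"{name}: {ppm:.2f} ppm, {results[name][1]:.2f}% of the increase, "
          f"{results[name][2]:.0f} years of regional emissions")
```

The two deficits give about 1.7 and 1.9 ppm, or roughly 1.5% and 1.6% of the 118 ppm rise, and correspond to about 35 to 38 years of regional emissions at the 2010 level.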

Conclusion
We have reconstructed two maps of the potential biomass of the Amazon basin. We validated our data-informed reconstructions by comparing them to independent estimates derived from a large number of process-based GVMs. Results indicate that recent deforestation has caused a current deficit of 7.3-8.0 Pg C in the basin's total AGB. Both reconstructions agree that this deficit corresponds to a net contribution of anthropogenic activities in the Amazon basin of about 1.8 ppm, or 1.5%, of the recent increase in atmospheric CO2.

Our data-informed, spatially explicit reconstructions of potential AGB can also provide a benchmark for regional carbon cycle models operating without land use change, helping to constrain their simulations of the interactive effects of climate change and human disturbance, and to determine preclearance AGB conditions using inverse methods.

Acknowledgments
Carbon Dioxide Information Analysis Center for national emissions and land use change data. For their roles in producing, coordinating, and making available the ISI-MIP model output, we acknowledge the modeling groups (listed in Table S3 of the supporting information) and the ISI-MIP coordination team. We finally thank two anonymous reviewers and the Editor W. Knorr for their helpful comments.

Geophysical Research Letters 10.1002/2015GL063497
The Editor thanks two anonymous reviewers for their assistance in evaluating this paper.