Volume 128, Issue 6 e2022JG007196
Research Article
Open Access

EEAGER: A Neural Network Model for Finding Beaver Complexes in Satellite and Aerial Imagery

Emily Fairfax

Corresponding Author

California State University Channel Islands, Camarillo, CA, USA

Correspondence to:

E. Fairfax

Contribution: Conceptualization, Methodology, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization

Eric Zhu

Google, Mountain View, CA, USA

Contribution: Methodology, Software, Formal analysis, Investigation, Writing - original draft, Writing - review & editing

Nicholas Clinton

Google, Mountain View, CA, USA

Contribution: Methodology, Software, Formal analysis, Investigation, Writing - original draft, Writing - review & editing

Stefania Maiman

Google, Mountain View, CA, USA

Contribution: Methodology, Software, Formal analysis, Investigation, Writing - original draft, Writing - review & editing

Aman Shaikh

Google, Mountain View, CA, USA

Contribution: Methodology, Software, Investigation, Writing - review & editing

William W. Macfarlane

Utah State University, Logan, UT, USA

Contribution: Data curation, Writing - review & editing

Joseph M. Wheaton

Utah State University, Logan, UT, USA

Contribution: Data curation, Writing - review & editing

Dan Ackerstein

Ackerstein Sustainability, Santa Cruz, CA, USA

Contribution: Conceptualization, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Supervision, Project administration, Funding acquisition

Eddie Corwin

Google, Mountain View, CA, USA

Contribution: Conceptualization, Resources, Data curation, Writing - original draft, Writing - review & editing, Supervision, Project administration, Funding acquisition

First published: 27 May 2023

Abstract

Beavers are ecosystem engineers that create and maintain riparian wetland ecosystems in a variety of ecologic, climatic, and physical settings. Despite the large-scale implications of ongoing beaver conservation and range expansion, relatively few landscape-scale studies have been conducted, due in part to the significant time required to manually locate beaver dams at scale. To address this need, we developed EEAGER, an image recognition machine learning model that detects beaver complexes in aerial and satellite imagery. We developed the model in the western United States using 13,344 known beaver dam locations and 56,728 nearby locations without beaver dams. We assessed performance in twelve held-out evaluation polygons of known beaver occupancy but previously unmapped dam locations. These polygons represented regions similar to the training data as well as more novel landscape settings. Our model performed well overall (accuracy = 98.5%, recall = 63.03%, precision = 25.83%) in these areas, with stronger performance in regions similar to where the model had been trained. We favored recall over precision, which results in a more complete catalog of beaver dams found but also a higher incidence of false positives to be manually removed during quality control. These results have far-reaching implications for monitoring of beaver-based river restoration, as well as potential applications detecting other complex landforms.

Plain Language Summary

Beavers are ecosystem engineers that can dramatically change the shape of the landscape and how water moves through it. They create and maintain wetland environments across North America in a wide variety of places, including mountains, deserts, coasts, forests, grasslands, shrublands, etc. Despite their large influence on the landscape, there are very few programs that monitor them at the landscape scale. This is partially due to how much time it takes to find and identify beaver dams in satellite and aerial images. To make it easier for us to find and understand the influence of beavers at larger scales, we built a model that can automatically find beaver dams in satellite and aerial imagery. While our model is trained to find beaver dams, this type of model has promise for finding other landscape features too.

Key Points

  • Tracking the distribution and range of ecosystem engineers like beavers is increasingly important under a changing climate

  • A neural network machine learning model can be trained to find and identify beaver dams in aerial and satellite imagery automatically

  • This type of machine learning model may have applications for finding other landforms where geospatial context is an important input

1 Introduction

1.1 Past and Present Distribution of North American Beavers

The North American beaver (Castor canadensis) is widely known as a keystone species and an ecosystem engineer (Brazier et al., 2021). Beavers drastically alter the physical environment by building channel spanning dams, excavating networks of small canals throughout the floodplain, and selectively foraging woody riparian vegetation (Larsen et al., 2021). These landscape modifications create and maintain riparian wetland ecosystems in a variety of environmental settings, including deserts, tundra, forests, rangelands, shrublands, estuaries, alpine headwaters, and urban areas (Lanman et al., 2013; Law et al., 2017; Naiman et al., 1988; Tape et al., 2022). The construction of riparian habitat is a disproportionately important ecosystem service - the majority of terrestrial animals rely on riparian habitat during at least part of their life cycle, and beavers played a large role in creating and maintaining that critical habitat for millions of years (Wohl, 2021).

Though once abundant on the North American continent, beavers were rapidly and systematically trapped to near extinction during the European-American Fur Trade (1600s through early 1800s) (Müller-Schwarze & Sun, 2003). The value of beaver pelts at the time prompted large fur companies to implement extreme trapping policies. For example, to discourage further settlement in the Western US and allow companies to maintain control of the fur stock, the Hudson's Bay Company's “Fur Desert” policy explicitly aimed to eradicate all fur-bearers from large swaths of the Western United States (Ott, 2003). As a result of overexploitation, the beaver population crashed from a peak of 100–400 million beavers pre-trapping down into the hundreds of thousands (Naiman et al., 1988). As beavers were lost from the landscape, so were their dams, ponds, and carefully engineered habitat. The scale of landscape change that occurred due to the loss of beavers is not fully understood or appreciated (Scamardo et al., 2022; Wohl, 2021).

Today, beaver populations are slow to rebound. There are no systematic or widespread beaver population monitoring programs in the United States; however, it is estimated that the current population of beavers in North America is roughly 15–30 million (Naiman et al., 1988). The conversation around beavers has shifted in recent decades, with more emphasis placed on the idea of beavers as river restoration partners rather than as pests or commodities (Johnson et al., 2019; Jordan & Fairfax, 2022; Pollock et al., 2015; Pugh et al., 2022; Skidmore & Wheaton, 2022). Recent research highlights the ability of beaver activity to sequester carbon (Laurel & Wohl, 2019; Wohl, 2013), support sensitive, threatened, and endangered species (Anderson et al., 2015; Bouwes et al., 2016; Dittbrenner et al., 2022; Romansic et al., 2020), attenuate flood waves (A. Puttock et al., 2017, 2021; C. J. Westbrook et al., 2020), keep vegetation green and maintain baseflow during droughts (Fairfax & Small, 2018; Silverman et al., 2019), and create patches of wildfire refugia (Fairfax & Whittle, 2020; Foster et al., 2020; Whipple, 2019; Wohl et al., 2022). These ecosystem services, among others, are monetarily valuable (Thompson et al., 2020), and that value is likely to increase as climate change intensifies.

1.2 Challenges to Understanding Beaver Impacts at the Landscape Scale

Despite the large-scale physical and ecological implications of beavers recolonizing their historic range, most of the research on beaver populations and impacts has been limited to the reach or watershed scale. Relatively few landscape-scale or larger studies have been conducted, and those that have typically required significant time investments for the manual identification and mapping of beaver dams, lodges, or canals using aerial or satellite imagery. This is due in part to the wide variety of shapes and sizes of beaver dams, and their broad distribution across topographic, climatic, and ecological settings (Figure 1). While sophisticated habitat suitability models can estimate how many beaver dams a given stream or river can theoretically support based on its physical and ecological setting (Dittbrenner et al., 2018; Macfarlane et al., 2015), there are no models that can identify where beavers have already built their dams and created wetland habitat.

Figure 1. Variation in size and shape of beaver dams. Beaver dams naturally occur in a variety of shapes and sizes in many different landscape settings, which makes finding and mapping them a time-consuming task.

Machine Learning (ML) offers a scalable and efficient approach to identification of beaver dam complexes, allowing for landscape- and regional-scale tracking of expansion and contraction of beaver populations and the resulting ecosystem services provided by their ecosystem engineering. Beaver dams vary widely in size, shape, and construction (Karran et al., 2016; Ronnquist & Westbrook, 2021). Convolutional deep-learning models on high-resolution imagery and other geospatial data hold great promise for prediction of objects that depend on spatial context, including beaver dams, because of their ability to automatically extract complex, relevant features (Cheng et al., 2017; Kattenborn et al., 2021; Krizhevsky et al., 2012; Minetto et al., 2019; Rezaee et al., 2018). Convolutional neural networks are a class of deep-learning algorithms that specialize in detecting spatial features such as edges, textures, and abstract geometries that describe an object or landform, and then learn iteratively at different spatial scales, which allows them to identify high-level patterns and low-level specific features (Kattenborn et al., 2021). The shape of beaver dams and ponds is partially dependent on the physical setting they reside in, so we hypothesize that a convolutional neural network considering both landform features and landscape context would be able to reliably identify them.

Image recognition ML models are typically evaluated on the accuracy, recall, and precision of their predictions. We consider recall to be the most valuable measure of model success in the context of our study because a high-recall beaver dam detection model will find a large portion of true beaver dams in a landscape. Though high recall typically comes at the cost of precision, it is faster for researchers to discard model-identified false positives than it is to manually search the entire landscape again to find false negatives.

To that end, we developed the Earth Engine Automated Geospatial Element(s) Recognition (EEAGER) model, a convolutional neural network trained to identify the locations of actual beaver complexes within a given search area using high-resolution geospatial imagery. We built the model and trained it on 7,408 known locations of beaver dams throughout the American West, then evaluated its accuracy, recall, and precision both within and outside of our training regions to determine if this model specifically, and these methods in general, are useful for automating the identification of non-uniformly shaped landforms in satellite and aerial imagery.

2 Data Curation and Methodology

Actual beaver dam locations were mapped across the western United States by experts in beaver dam identification. The data were primarily locations collected by researchers for prior beaver-related research. These locations were used to generate training and validation datasets for an image recognition machine learning model. Twelve distinct areas of known beaver occupancy but unmapped dam locations were set aside to use as evaluation polygons in model performance testing. The model we developed, Earth Engine Automated Geospatial Element(s) Recognition (EEAGER), was built as a convolutional neural network model that takes RGB imagery and a digital elevation model as inputs. Then, EEAGER was trained on the input layers and known beaver dam locations throughout the western USA (Figure 2, green shaded areas). The input RGB imagery was collected on varying dates between 2014 and 2022. The model was then tested in the twelve evaluation polygons (Figure 2, black boxes) located both within general training regions and in novel regions, giving insight into the model's performance in familiar and unfamiliar landscape settings.

Figure 2. Locations of training data and evaluation areas. Geographic distribution of input beaver dam locations (shaded green) and evaluation polygons (black boxes labeled A–L).

2.1 Input Data Collection

2.1.1 Presence Points: Locations With Beaver Dams

We compiled a data set of the location (latitude and longitude) of 13,344 manually mapped beaver dams, spanning seven states in the Western United States. Each datapoint was recorded along with its date of collection, with dates ranging from 2014 to 2021. During the manual beaver dam mapping process, care was taken to thoroughly search reaches and areas of interest such that all beaver dams within a given search area were accounted for. Some beaver dams did not have enough high-quality imagery available to use in the training set, so these 13,344 dams resulted in 8,558 landscape patches containing beaver dams. No data were discarded in this step—the lower number of patches is a result of individual patches containing multiple beaver dams. Figure 2 shows the geographic distribution of dams included in the training data. Although beaver dams were verified to be present by experts in aerial beaver dam mapping, actual beaver presence is unknown in these sites given that dams can persist in the landscape even after beavers move on or are removed from a site.

2.1.2 Pseudo-Absence Points: Likely Locations Without Beaver Dams

To provide negative data to the model, we also collected latitude-longitude points assumed to have no beaver dams, which we call “pseudo-absences.” We used proximity sampling to collect negative data from an annulus with an outer radius of 10 km and an inner radius of 1 km surrounding each positive point. The inner radius was chosen to ensure that pseudo-absence points were not so close to the beaver dam as to include other surrounding dams, while still being close enough to the dam to sample similar terrain and landscape. The outer radius of the annulus was set to a size large enough to include a diversity of landscape types unsuitable for beaver habitat. See Figure 3 for an illustration of a pseudo-absence sampling annulus.

Figure 3. Pseudo-absence sampling design. Conceptual diagram of sampling structure for pseudo-absence data points. A buffer radius of 1 km (solid black line, inner ring) was placed around positive data points (green circles) within which no pseudo-absence points were generated. Then a 10 km radius buffer around each positive data point was generated (dashed black line, outer ring). The area in between these two buffer radii (shaded yellow) was sampled to create the pseudo-absence data.

For the input data set, we created a geometry comprising the union of all annuli, excluding all buffer radii as visualized in Figure 3. Then, within the unioned geometry, for each positive point, we sampled two pseudo-absence points randomly and two additional pseudo-absence points from valley bottoms only (Theobald et al., 2015). This 4:1 pseudo-absence ratio was chosen to accurately represent the characteristics of the landscape around positive points while the preference of pseudo-absences from valleys ensured that our negative data were robust. This approach gave us 53,376 negative data points. To ensure this sampling method did not produce an excessive number of false negatives inadvertently, we manually rated a random sample of 500 pseudo-absence data points. In the rating process, we found only three pseudo-absences to be positive beaver dam sites incorrectly classified as not having any dams present, making the false negative rate in our pseudo-absences only 0.6%.
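The annulus sampling described above can be sketched in a few lines. This is a minimal illustration and not the authors' Earth Engine pipeline: it assumes positive points are already in a planar coordinate system measured in kilometers, it samples each positive point's own ring instead of the unioned geometry as a whole, and it does not model the valley-bottom samples. The function names `sample_annulus_point` and `sample_pseudo_absences` are hypothetical.

```python
import math
import random

def sample_annulus_point(center_xy, inner_km=1.0, outer_km=10.0, rng=random):
    """Draw one point uniformly by area from the ring around center_xy (planar km)."""
    # Sampling r^2 uniformly (then taking the square root) gives uniform density over the ring.
    r = math.sqrt(rng.uniform(inner_km ** 2, outer_km ** 2))
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (center_xy[0] + r * math.cos(theta), center_xy[1] + r * math.sin(theta))

def sample_pseudo_absences(positives_xy, per_positive=2, inner_km=1.0, outer_km=10.0, seed=0):
    """Sample points inside the annuli while staying outside every positive point's inner buffer."""
    rng = random.Random(seed)
    samples = []
    for center in positives_xy:
        kept = 0
        while kept < per_positive:
            candidate = sample_annulus_point(center, inner_km, outer_km, rng)
            # Reject candidates that fall inside the 1 km buffer of ANY positive point.
            if all(math.dist(candidate, p) >= inner_km for p in positives_xy):
                samples.append(candidate)
                kept += 1
    return samples
```

With `per_positive=2` this yields two random pseudo-absences per dam; the two additional valley-bottom samples per dam would need a valley-bottom mask, which is outside the scope of this sketch.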

2.1.3 Assumed-Absence Points: Known Locations Without Beaver Dams

The above annulus-sampling approach was limited to areas within 10 km of a positive point. So, in addition to the randomly sampled pseudo-absences described above, we also identified a total of 44.08 km² known to have zero beaver dams and then randomly sampled assumed-absence points from those areas. This helped diversify our negative data and make the EEAGER model more robust to false positives. This approach added 3,352 negative data points.

2.1.4 Evaluation Polygons: Areas to Test the Model's Prediction Performance

We evaluated EEAGER's predictions on twelve evaluation polygons in diverse regions of the American West totaling an area of 280 square kilometers and 1,262 known beaver dams (Figure 2, black boxes). The polygons were chosen as diverse landscapes suspected to contain varying densities of beaver dams. The areas, however, were not preselected based on any assumptions of model performance. The density of beaver dams ranged from 1.0 to 46.7 dams per square kilometer in the evaluation polygons. All evaluations were performed using most-recent imagery from 2022. Details of each evaluation polygon are reported in Table 2, alongside the prediction results.

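Table 1. Input Data Partitioning

| Class | Data points (#) | Data points (%) | Total patches (#) | Total patches (%) | Train patches (#) | Train patches (%) | Validation patches (#) | Validation patches (%) |
|---|---|---|---|---|---|---|---|---|
| Positive (beaver dams present) | 13,344 | 19.0% | 8,558 | 28.5% | 7,408 | 27.1% | 1,149 | 43.0% |
| Negative (beaver dams absent) | 56,728 | 81.0% | 21,451 | 71.5% | 19,927 | 72.9% | 1,523 | 57.0% |
| Total | 70,072 | 100.0% | 30,009 | 100.0% | 27,335 | 100.0% | 2,672 | 100.0% |

  • Note. Positive and negative data points in each category.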

Some evaluation polygons are within regions containing training data and other polygons are completely unseen in training (Figure 2). The amount of overlap is cataloged in our reported results in Table 2. Because the model's training imagery is sourced primarily between 2014 and 2017, there was often a significant shift in dam locations between the training and evaluation imagery in the same polygon, so we did not consider it necessary to have completely separate training and evaluation regions. Evaluation polygons A-E, G, and L evaluate similar landscapes to those that EEAGER was trained with. Polygons F and H-K evaluate completely unseen landscapes for EEAGER (Figure 2).

2.2 Model Development and Training

2.2.1 Input Data and Model Training

In this section, we describe how the collected data are preprocessed for input into the neural network. First, the data points are spatiotemporally matched to high-resolution imagery in a proprietary image database. Specifically, we used imagery containing red, green and blue (RGB) bands from a variety of sensors, but the resolution was constrained to 0.2 m or better. Each positive dam point was temporally matched with imagery taken within 6 months of the date that the point was recorded. All imagery was resampled to 0.2-m pixel resolution and consistent projection (EPSG:4326). Additionally, we supplied slope as an input band, where slope is computed from the 0.2-m pixel resolution digital elevation models (DEMs) in a proprietary database over the given area. The DEMs were collected with LIDAR and photogrammetric methods. We allowed the model to determine its own slope thresholds for beaver dam building activity, though it is generally accepted within the environmental science community that beavers typically prefer low-grade slopes (Macfarlane et al., 2015; Naiman et al., 1988). A 256 × 256 pixel patch was extracted over the center of each training data point. In each patch, we computed the RGB and DEM-derived slope at 0.2 m spatial resolution, resulting in a 4-band (R, G, B, slope) 51.2 × 51.2 m multidimensional array centered on each point. When using the model for predictions, we input overlapping 38.4 × 38.4 m patches every 25.6 m to ensure better prediction coverage. Overlapping predictions were dropped. This resulted in a patch area of 25.6 × 25.6 m per prediction. Auxiliary input signals such as Normalized Difference Vegetation Index (NDVI; Carlson & Ripley, 1997) were not used since the proprietary image database used did not include historical high-resolution NIR imagery for the majority of the dam site data points at the date they were collected.
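The patch arithmetic in this paragraph can be checked with a short sketch. It works in pixel units along one axis and assumes that "overlapping predictions were dropped" means each 38.4 m input patch contributes only its central 25.6 m window; that reading, and the helper names `px_to_m` and `prediction_windows`, are assumptions made for illustration.

```python
import math

PIXEL_SIZE_M = 0.2      # resampled pixel size stated in the text
TRAIN_PATCH_PX = 256    # training patch width in pixels
PRED_PATCH_PX = 192     # 38.4 m / 0.2 m
PRED_STRIDE_PX = 128    # 25.6 m / 0.2 m

def px_to_m(pixels):
    """Convert a pixel count to metres at the 0.2 m resolution."""
    return pixels * PIXEL_SIZE_M

def prediction_windows(extent_px, patch_px=PRED_PATCH_PX, stride_px=PRED_STRIDE_PX):
    """Return 1-D windows as (input_start, input_end, kept_start, kept_end) in pixels."""
    # Each input patch extends past its kept window by this margin on both sides.
    margin = (patch_px - stride_px) // 2
    n_tiles = math.ceil(extent_px / stride_px)
    windows = []
    for i in range(n_tiles):
        kept_start = i * stride_px
        kept_end = kept_start + stride_px
        windows.append((kept_start - margin, kept_end + margin, kept_start, kept_end))
    return windows
```

Under these assumptions a 256-pixel training patch spans 51.2 m, and each prediction patch overlaps its neighbors by 32 pixels (6.4 m) on every side while the kept windows tile the search area without gaps.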

If slope and/or RGB imagery of suitable resolution were not available at the point's location or time of observation, the point was discarded. In total, 32.1% of positive points and 61.2% of negative points were discarded (Table 1). The training points were then partitioned into training and validation sets through a spatial partitioning process (Table 1). An equal-area, hexagonal tessellation method was used, where we created ten spatial partitioning zones (Figure 4). Each zone had an area of approximately 20 km². All points that fell into one randomly chosen zone were used as validation data and the remaining zones were used for training. This ensured that the training points were geographically distinct from the validation points while minimizing the boundary length between zones and therefore spatial autocorrelation between training and validation sets. The final partitioned data quantities are recorded in Table 1.
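One way to picture the hexagonal spatial partitioning is the sketch below. It is not the authors' implementation: it assumes planar coordinates in kilometers, pointy-top hexagons of 20 km² each, and a hypothetical rule, `(q + 3 * r) % n_zones`, for labeling hexagons with one of ten zone ids. Only the idea of holding out every point in one randomly chosen zone comes from the text.

```python
import math
import random

def hex_size_for_area(area_km2):
    """Center-to-corner distance of a regular hexagon with the given area."""
    return math.sqrt(2.0 * area_km2 / (3.0 * math.sqrt(3.0)))

def hex_cell(x_km, y_km, size_km):
    """Axial (q, r) index of the pointy-top hexagon containing a planar point."""
    q = (math.sqrt(3.0) / 3.0 * x_km - y_km / 3.0) / size_km
    r = (2.0 / 3.0 * y_km) / size_km
    # Cube rounding: round all three cube coordinates, then repair the one
    # with the largest rounding error so that they still sum to zero.
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)

def split_points(points_xy, area_km2=20.0, n_zones=10, seed=0):
    """Hold out every point whose hexagon belongs to one randomly chosen zone."""
    size = hex_size_for_area(area_km2)
    held_out = random.Random(seed).randrange(n_zones)
    train, validation = [], []
    for x, y in points_xy:
        q, r = hex_cell(x, y, size)
        zone = (q + 3 * r) % n_zones  # hypothetical labeling of hexagons into zones
        (validation if zone == held_out else train).append((x, y))
    return train, validation
```

Because all points inside one hexagon share a zone label, nearby points always land on the same side of the split, which is the property the spatial partitioning is meant to provide.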

Figure 4. Equal area hexagonal partitioning scheme used to create validation and training sets. Hexagon area is a tunable parameter in the model. Value used in this study is 20 km².

2.2.2 Model Structure

We developed Earth Engine Automated Geospatial Element(s) Recognition (EEAGER), a convolutional neural network trained to identify the locations of actual beaver complexes within a given search area using high-resolution geospatial imagery. EEAGER takes 51.2-by-51.2 m landscape patches at 0.2 m RGB imagery resolution and makes a prediction of whether a beaver dam is present in the square. EEAGER was built using Google Earth Engine, Google AI Platform, and Google Cloud Storage. A full description of all of the packages and functions used in model development can be found in the demonstration code pipeline archived at https://github.com/google/project-eager.

We employed a ResNet-50-v2 convolutional neural network architecture (He et al., 2016), with max-pooling and a 1 × 1 convolution at the output to produce a prediction between 0 and 1. Data augmentation was applied at model training time, including horizontal and vertical flips and random rotation up to 5% clockwise or counterclockwise. Prior to model training, both RGB and slope were normalized to a [0,1] range. The validation set (comprising 9% of the data) was used to determine early stopping; that is, model training was stopped when validation accuracy ceased to increase. A held-out test set of approximately 1,000 points from geographically diverse areas outside of both the training region and the evaluation regions was used to compare models. Hyperparameters were chosen based on a grid search over the test set. We trained for ten epochs over the training data set with a learning rate of 0.00005 and a batch size of two, using Adam (Kingma & Ba, 2014) with the objective of minimizing binary cross-entropy. The model was trained on the Google Cloud platform, using machines outfitted with auto-scaled NVIDIA T4 GPUs, 46 GB RAM and 99 GB disk. Training takes approximately one hour in this framework, including validation evaluation at each epoch.
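The early-stopping rule mentioned above can be illustrated with a small, framework-free sketch. The `patience` parameter and the function name `early_stopping_epoch` are assumptions; the text only states that training stopped when validation accuracy ceased to increase, within a budget of ten epochs.

```python
def early_stopping_epoch(val_accuracies, patience=1, max_epochs=10):
    """Return the 1-based epoch at which training would stop."""
    best = float("-inf")
    stale = 0  # consecutive epochs without an improvement in validation accuracy
    for epoch, acc in enumerate(val_accuracies[:max_epochs], start=1):
        if acc > best:
            best, stale = acc, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    # No stop was triggered: training runs to the end of the available epochs.
    return min(len(val_accuracies), max_epochs)
```

For example, with validation accuracies of 0.80, 0.85, 0.88, 0.87 and a patience of one epoch, training would stop after the fourth epoch, the first one that fails to improve on the best value seen so far.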

2.3 Model Evaluation and Statistics

To evaluate the success of the model, we consider its overall accuracy, recall, and precision (Junker et al., 1999; Powers, 2020). Accuracy is the percentage of patches correctly identified (i.e., as beaver-dammed or not beaver-dammed) in the landscape, recall is the percentage of known beaver dams that were correctly identified (i.e., what percentage of the manually mapped dams did the model find), and precision is the percentage of patches identified as beaver dams that were actually beaver dams (i.e., if the model predicts something is a beaver dam, what percentage of the time is it correct). The accuracy, recall, and precision varied significantly between evaluation areas (i.e., some had high recall and high precision, some had low values for both, etc.). These metrics were calculated as follows for each evaluation polygon and overall:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{3}$$
where TN is the number of true negatives (patches without beaver dams), TP is the number of true positives (patches with beaver dams), FN is the number of false negatives (patches predicted to have no beaver dams that actually do have beaver dams), and FP is the number of false positives (patches predicted to have beaver dams that actually do not have beaver dams).
To further understand the model's performance on positive points alone, we also analyze the F1 score, which is defined as the harmonic mean of precision and recall (Equation 4).
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

Higher F1 scores indicate stronger holistic model performance (Fawcett, 2006). However, F1 scores are not as useful a metric in contexts where there are reasons to value either recall or precision more heavily. In our context, we valued recall over precision. We also calculate Cohen's kappa coefficient, κ, to determine if the agreement between the model predictions and the dams manually mapped by trained researchers goes beyond chance (Cohen, 1960). Kappa statistics are calculated in this study using the standard methodology and are interpreted according to the scheme proposed by Landis and Koch (1977), where κ < 0 is no agreement, 0 < κ < 0.20 is slight agreement, 0.21 < κ < 0.40 is fair agreement, 0.41 < κ < 0.60 is moderate agreement, 0.61 < κ < 0.80 is substantial agreement, and 0.81 < κ < 1 is near perfect agreement. In short, higher kappa values indicate better model performance and that it is less likely that the model predictions and the manually mapped dam locations are in agreement by chance.
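The metrics defined in this section can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions of accuracy, recall, precision, F1, and Cohen's kappa for a binary classifier; the function names and the handling of the interval boundaries in the Landis and Koch labels are illustrative choices and are not taken from the EEAGER code.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision, F1, and Cohen's kappa from patch counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    # Expected chance agreement from the marginal totals of predictions and labels.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "f1": f1, "kappa": kappa}

def landis_koch_label(kappa):
    """Map a kappa value to the agreement categories listed in the text."""
    if kappa < 0:
        return "no agreement"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "near perfect"
```

As a usage check, `landis_koch_label(0.3601)` returns "fair", matching the interpretation reported for the overall kappa value.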

3 Results

3.1 Model Performance

EEAGER's accuracy, recall, and precision, both overall (all evaluation polygons combined) and for individual evaluation polygons, are reported in Table 2. We report performance at a model prediction threshold of 0.9 out of 1.0, which was chosen to maximize recall without significant precision losses. The model prediction is a score between 0.0 and 1.0, so our selected threshold of 0.9 means that we only count beaver-dammed areas identified with a score of 0.9 or higher as a predicted positive point (i.e., a landscape patch containing beaver dams). All other points that do not meet that threshold are considered a predicted negative point (i.e., a landscape patch that does not contain beaver dams).

Table 2. Model Performance Metrics and Site Information

| Name | Location description | Area (sq. km) | Count of dams | Dams/sq. km | Pos. training overlap | Neg. training overlap | Accuracy at 0.9 | Recall at 0.9 | Precision at 0.9 | F1 score at 0.9 | Kappa at 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| - | Overall | 280.0 | 1,262 | 4.5 | 85 | 52 | 98.50% | 63.03% | 25.83% | 36.64 | 0.3601 (fair) |
| A | Colorado, Alpine Meadow | 7.5 | 350 | 46.7 | 3 | 5 | 95.41% | 77.57% | 67.22% | 72.02 | 0.6953 (substantial) |
| B | Colorado, Mountain Foothills | 4.7 | 61 | 13.0 | 0 | 3 | 97.12% | 83.56% | 76.25% | 79.74 | 0.7819 (substantial) |
| C | Colorado, Pine Forest | 1.1 | 8 | 7.3 | 0 | 2 | 95.24% | 66.67% | 40.00% | 50.00 | 0.4766 (moderate) |
| D | Colorado, Mountain Meadow | 4.3 | 29 | 6.7 | 1 | 2 | 98.27% | 45.24% | 57.58% | 50.67 | 0.4980 (moderate) |
| E | Colorado, In-Channel River | 9.1 | 73 | 8.0 | 45 | 3 | 98.36% | 70.94% | 25.86% | 37.90 | 0.3725 (fair) |
| F | Colorado, Floodplain | 4.2 | 77 | 18.3 | 0 | 0 | 95.94% | 6.03% | 48.00% | 10.71 | 0.0990 (slight) |
| G | Colorado, Alpine | 20.2 | 133 | 6.6 | 17 | 8 | 97.60% | 70.04% | 31.96% | 43.89 | 0.4284 (moderate) |
| H | Montana, Riverine/Floodplain | 80.0 | 193 | 2.4 | 0 | 0 | 96.93% | 73.57% | 13.39% | 22.66 | 0.2185 (fair) |
| I | Nevada, Riverine/Floodplain | 7.0 | 19 | 2.7 | 0 | 0 | 98.66% | 51.72% | 18.29% | 27.03 | 0.2651 (fair) |
| J | Oregon, Shrub/Scrub - Fire Scar | 70.7 | 87 | 1.2 | 0 | 0 | 98.93% | 31.86% | 6.60% | 10.93 | 0.1063 (slight) |
| K | Washington, Pine Forest | 41.7 | 43 | 1.0 | 0 | 0 | 93.18% | 51.22% | 21.32% | 30.11 | 0.2716 (fair) |
| L | Wyoming, Shrub/Scrub | 29.5 | 189 | 6.4 | 19 | 29 | 99.13% | 56.23% | 58.41% | 57.30 | 0.5686 (moderate) |

  • Note. Model prediction performance metrics for classifying beaver dams in twelve evaluation polygons in the western United States.

The overall accuracy (98.5%), recall (63.03%), and precision (25.83%) at the 0.9 model prediction threshold indicate that our model is successful and useful for identifying beaver dams in satellite and aerial imagery in most cases (e.g., Figure 5). The overall Cohen's kappa coefficient (0.3601) indicates that our model has fair overall agreement with manually mapped beaver dams that generally goes beyond random chance (Cohen, 1960; Landis & Koch, 1977).

Figure 5. Example successful EEAGER predictions. (a) An example landscape (within evaluation polygon A) with beaver dams scattered amidst forests, wet meadow, roads, and shrubby hillslopes. (b) The same landscape with each beaver dam manually highlighted with a white line. (c) The same landscape with EEAGER model predicted beaver dam locations shaded with light blue at the 0.9 prediction threshold.

The accuracy of our model is high across all evaluation polygons (98.5%). Thus, it is far better at classifying the landscape as “beaver-dammed” or “not beaver-dammed” than a random coin-flip style prediction model where half the pixels would be arbitrarily labeled as containing beaver dams (∼50%). High accuracy alone, however, does not necessarily indicate a useful model. A hypothetical model coded to classify every pixel as “not beaver-dammed” would appear highly accurate (99.3%) in our evaluation polygons given the relative rarity of beaver dams. However, that model would have a recall of 0% and therefore not be useful. In the case of identifying beaver dams, we determined a useful model would be one that is highly accurate and leans toward over-classifying pixels as beaver-dammed (i.e., high recall and low precision) rather than one that is conservative in its predictions (i.e., low recall and high precision). This decision was made with specific consideration to how an ecologist or other researcher should use EEAGER. It is far faster and easier for a scientist trained to identify beaver dams in aerial imagery to look at a less precise collection of landscape patches predicted to have beaver dams by our model and manually remove the ones that are incorrect than it is for them to look at a more precise collection of landscape patches predicted to have beaver dams but then have to go manually review the entire search area again knowing that the model likely missed many dams. We report the overall precision, recall, and Cohen's kappa coefficient of different model confidence thresholds in Table 3. Users of the model can decide for themselves what balance of recall and precision they want by simply adjusting the prediction threshold when viewing or exporting results.
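The all-negative baseline described above can be worked through with the confusion-matrix counts TP, FP, FN, and TN used for the evaluation metrics. This is an illustrative derivation only; the roughly 0.7% prevalence is inferred here from the reported 99.3% baseline accuracy and is not a figure stated by the authors.

$$
TP = FP = 0 \;\Rightarrow\; \mathrm{Accuracy} = \frac{TN}{TN + FN} = 1 - \frac{FN}{TN + FN}, \qquad \mathrm{Recall} = \frac{0}{0 + FN} = 0
$$

For a classifier that labels everything "not beaver-dammed", FN counts every patch that truly contains a dam, so FN / (TN + FN) is the fraction of dammed patches in the landscape. A baseline accuracy of 99.3% therefore corresponds to about 0.7% of patches containing dams, which is why accuracy alone says little about a detector for such a rare class.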

Table 3. Precision and Recall Tradeoffs
Prediction threshold    Recall    Precision    Cohen's kappa coefficient
0.5                     85.59%    9.89%        0.1670
0.6                     81.63%    12.53%       0.2078
0.7                     77.94%    16.00%       0.2570
0.8                     72.44%    21.13%       0.3076
0.9                     63.03%    25.83%       0.3602
0.95                    53.36%    29.96%       0.3783
0.97                    47.00%    32.57%       0.3797
Note. Recall decreases and precision increases as the model prediction threshold is raised.
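One way a user might choose a prediction threshold from Table 3 is to set a minimum acceptable recall and then take the most precise threshold that still meets it. The sketch below encodes that rule; the selection rule and the function name `pick_threshold` are assumptions made for illustration and are not part of EEAGER, while the numbers are copied from Table 3.

```python
# (prediction threshold, recall %, precision %, Cohen's kappa), copied from Table 3.
TABLE_3 = [
    (0.5, 85.59, 9.89, 0.1670),
    (0.6, 81.63, 12.53, 0.2078),
    (0.7, 77.94, 16.00, 0.2570),
    (0.8, 72.44, 21.13, 0.3076),
    (0.9, 63.03, 25.83, 0.3602),
    (0.95, 53.36, 29.96, 0.3783),
    (0.97, 47.00, 32.57, 0.3797),
]


def pick_threshold(min_recall_pct, rows=TABLE_3):
    """Return the highest-precision threshold whose recall is at least
    min_recall_pct, or None if no tabulated threshold qualifies."""
    eligible = [row for row in rows if row[1] >= min_recall_pct]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[2])[0]


print(pick_threshold(60))  # recall of at least 60%
print(pick_threshold(80))  # recall of at least 80%
print(pick_threshold(90))  # no tabulated threshold reaches 90% recall
```

A reviewer who can tolerate more manual filtering would pass a higher minimum recall and accept the lower precision that comes with it.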

Because overall precision on evaluation polygons was relatively low at 25.83% at our chosen prediction threshold, we sought to understand the model's false positives. We noticed that many false positives tended to be very close to an actual beaver dam site. To quantify this, we also compared EEAGER predictions against expert-labeled broader beaver-influenced habitat in the evaluation polygons. To be considered beaver-influenced habitat, one or more of the following features had to be visible in the imagery: beaver dams, beaver-dug canals, beaver ponds, or beaver lodges. At the 0.9 prediction threshold, EEAGER's precision was 46.73% overall on beaver habitat, though overall recall fell to 34.98%. Thus, in the evaluation polygons, around half of the original false positive predictions were still in close spatial proximity to a beaver dam and represented landscapes modified by beaver in other ways (canal digging, tree-felling, etc.).

All model assessment metrics vary substantially across our evaluation polygons: accuracy ranges from 93.18% to 99.13%; recall ranges from 6.03% to 83.56%; precision ranges from 6.60% to 76.25%; and kappa ranges from 0.0990 to 0.7819. Evaluation polygons F, H, K, J, and I were completely outside of our training data regions (i.e., no spatial overlap with any training data, including pseudo-absences) and also had the lowest precisions and kappa values. Separating our predictions by whether they were within general training regions or not yielded a more nuanced model performance assessment (Figure 6).

Model assessment metrics by area. A comparison of model assessment metrics at the 0.9 prediction threshold within the model training region (light green) versus outside of the model training region (gray).

These results suggest that model performance can be improved by incorporating some training data from areas of interest prior to making predictions in that region. The generalizability of the model has a physical interpretation in the context of beaver dam construction. Though no clear trends in false positives or false negatives were observed between different landscape types, there are likely subtle trends in physical settings from our training data that the model is looking for that are not present in some of the evaluation polygons. In addition to topography, beaver dam form is a function of local climate conditions, hydrologic conditions, vegetation types, proximity to human infrastructure, etc. These are not direct inputs into the model. Thus, if differences in geographic context that the model does not account for result in structural changes to beaver dams, EEAGER will be less able to predict them reliably outside of training regions.

3.2 Model Scalability

To test the practicality of running the model over large areas, we performed county-wide runs for the following counties in the western USA: Carbon (Wyoming, 1,002 km²), Summit (Colorado, 1,603 km²), Tuolumne (California, 5,890 km²), Cache (Utah, 3,038 km²), and Teton (Wyoming, 10,919 km²). The average inference rate to assess an area of interest is 6 km² per minute using auto-scaled NVIDIA Tesla T4 GPUs. This equates to $100–$1,500 in computing costs to run the model per county, which requires significantly less money and time than field mapping or helicopter/aerial survey work (Courtois et al., 2003; Giudice et al., 2010; Robel & Fox, 1993), though it should be noted that acquiring very high resolution imagery can be expensive for larger areas of interest. At EEAGER's current prediction rate, the entire United States could be searched in 1,138 days. In practice, however, the actual time to search the United States could be significantly shortened by incorporating pre/post processing search area filtering based on land cover types or by utilizing more computing resources.
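The run-time figures above follow from simple arithmetic on the stated inference rate. The sketch below reproduces that arithmetic; the county areas and the rate of 6 km² per minute are taken from the text, whereas the United States area of roughly 9.83 million km² is an assumed value (the text does not state the area used) and the function names are hypothetical.

```python
INFERENCE_RATE_KM2_PER_MIN = 6.0  # average rate reported in the text

# County areas (km^2) as listed in the text.
COUNTY_AREAS_KM2 = {
    "Carbon, WY": 1002,
    "Summit, CO": 1603,
    "Tuolumne, CA": 5890,
    "Cache, UT": 3038,
    "Teton, WY": 10919,
}

# ASSUMPTION: approximate total area of the United States; not given in the text.
US_AREA_KM2 = 9_830_000


def inference_minutes(area_km2, rate=INFERENCE_RATE_KM2_PER_MIN):
    """Minutes of inference needed to cover area_km2 at the given rate."""
    return area_km2 / rate


def inference_days(area_km2, rate=INFERENCE_RATE_KM2_PER_MIN):
    """Days of continuous inference needed to cover area_km2."""
    return inference_minutes(area_km2, rate) / (60 * 24)


for county, area in COUNTY_AREAS_KM2.items():
    print(f"{county}: {inference_minutes(area) / 60:.1f} hours")

print(f"United States: {inference_days(US_AREA_KM2):.0f} days")
```

Under the assumed area, the result rounds to the 1,138 days quoted above. Because the estimate scales linearly with area and inversely with rate, halving the search area through filtering or doubling the computing resources would each roughly halve the run time.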

3.3 Model Limitations

The accuracy of the EEAGER model is constrained by limitations in training and input data. Specifically, more training data from a diversity of terrain types and surrounding ecosystem types would likely improve model generalizability. It is important to note that our training data was composed of areas with beaver dams, which is not inclusive of all types of beaver habitat. For example, beavers that live in large rivers typically do not build dams in the main channel of the river, where the water is already of suitable depth, and instead dig burrows directly into riverbanks. If they do build dams in these systems, they typically do so in the floodplain. There were not a significant number of large river floodplain beaver dams in the training data, nor were there beaver habitats without dams present. When we evaluated the model in these types of settings (e.g., evaluation polygons H and I), it overpredicted on channel bends and meanders, resulting in low precision. Similarly, beavers can also live in large pre-existing lakes where they may or may not dam any inlet or outlet streams. The model therefore should not be used to evaluate beaver activity in these settings without additional training.

Not all training data we obtained had corresponding RGB imagery or terrain data at the right time and sufficient resolution, meaning that we discarded over half of the original data. The geographic representation of the training data could also be improved. Currently, most training data are from the western United States. The training data set should be expanded to include beaver ponds from other locations. We expect this would improve model accuracy in wetter climates and allow us to characterize model accuracy in ecotypes not currently represented in the training (and testing) data. We also found that in highly developed, geomorphically complex (e.g., multithread channel morphology) beaver habitat the model often correctly identified portions of beaver dams but failed to detect the entire landform (Figure 7a). This resulted in many false negative points. In practice, however, the model still successfully identified the location of the beaver complex, just not every beaver dam within it. Depending on the specific model usage goals, these false negatives may not be overly detrimental.

Examples of unsuccessful EEAGER predictions at 0.9 threshold. Top row: false negatives (a) where the model correctly predicted portions of beaver dams but failed to recognize the entire landform and smaller adjacent dams. Bottom row: false positives (b, c, d) that were increasingly difficult (left to right) to evaluate in the context of our study.

The model exhibits mixed behavior on imagery unseen in training data. Including additional verified absence data would reduce misalignment between training data and model prediction usage in practice, lowering the prevalence of blatantly false positives, for example, roofs and parking lots that are clearly not beaver habitat but contain features that visually resemble aspects of beaver dams when taken out of broader context (Figure 7b). There are also cases where the model predicts beaver dams, such as along the edges of reservoirs and retention ponds (Figure 7c), that are typically incorrect. However, beavers do frequently live in and further modify human-built ponds and reservoirs, so these patches should not be automatically discarded or labeled as not containing beavers, especially if the false positive is near actual beaver dams. Finally, the model often overpredicted the area around small, in-channel beaver dams (Figure 7d). While the overpredicted patches clearly do not have beaver dams, they are likely active beaver habitat and have been influenced by beavers in other ways (canals, felled trees, bank burrows, etc.) that are geomorphically and ecohydrologically meaningful. Depending on the goals of individual projects, those areas may not be false positives in practice.

Including more input data types may also be useful for increasing model accuracy. In particular, water is highly absorptive in near-infrared (NIR) and we expect that high-resolution NIR data could improve model accuracy. Although Sentinel-2 data were evaluated as a way to supply NIR data to the model, due to the combination of a limited data record (Sentinel-2 data are only available since early 2017, meaning even more of our training data would be discarded) and substantially lower spatial resolution (10 vs. 0.2 m), we determined that Sentinel-2 was unsuitable for the study presented here (Drusch et al., 2012). Although higher resolution NIR data are available from the National Agriculture Imagery Program (NAIP), the combination of low temporal resolution (approximately 5 years) and lack of availability outside the continental United States means that NAIP data were also unsuitable. We expect additional features such as NIR, radar backscatter, or land cover to improve model accuracy given appropriate spatiotemporal resolution, particularly in areas where color contrast in the RGB imagery is low between beaver wetlands and the adjacent landscape (e.g., evaluation polygon K).

Machine learning techniques such as transfer learning, fine-tuning pre-trained models, hyperparameter tuning, and network architectural changes may be leveraged to improve model performance. Although we performed minimal parameter tuning, a full grid search of the parameter space was beyond the scope of the current project. Further research is needed to understand the effect of reusing pre-trained weights from models trained on natural scenes or overhead imagery. For example, because beavers rapidly transform a variety of landscape types into wetlands, fine-tuning a beaver dam identification model pre-trained on BigEarthNet land classification data (Sumbul et al., 2019) may speed up training and improve accuracy. Additionally, the inclusion of a reconstruction error during training may substantially lower false positives (Noh et al., 2022; Vallez et al., 2021).

In practice, filtering the search area before beginning a search may improve both the accuracy and efficiency of the EEAGER model. Because beavers are most likely to build in certain land cover types and not in others, one could topographically constrain the search to valley bottoms (O'Brien et al., 2019), exclude urban areas as designated in the National Land Cover Database, or otherwise limit the geographic scope of the model to increase its speed and tailor it to specific project needs.
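The filtering described above could be applied as a simple pre-processing step over candidate image tiles. The following is a minimal sketch under stated assumptions: the `Tile` record, its fields, and `filter_search_tiles` are hypothetical and not part of the EEAGER pipeline, and each tile is assumed to already carry a dominant National Land Cover Database (NLCD) class and a valley-bottom flag derived elsewhere. NLCD classes 21 to 24 are the "Developed" classes.

```python
from dataclasses import dataclass

# NLCD "Developed" classes: open space, low, medium, and high intensity.
NLCD_DEVELOPED = {21, 22, 23, 24}


@dataclass
class Tile:
    """Hypothetical record describing one candidate search tile."""
    tile_id: str
    nlcd_class: int         # dominant NLCD land cover class within the tile
    in_valley_bottom: bool  # from a separate valley-bottom delineation


def filter_search_tiles(tiles):
    """Keep only tiles that lie in a valley bottom and are not developed land."""
    return [
        t for t in tiles
        if t.in_valley_bottom and t.nlcd_class not in NLCD_DEVELOPED
    ]


candidates = [
    Tile("a", nlcd_class=90, in_valley_bottom=True),   # woody wetlands
    Tile("b", nlcd_class=23, in_valley_bottom=True),   # developed, medium intensity
    Tile("c", nlcd_class=42, in_valley_bottom=False),  # evergreen forest on a hillslope
]

kept = filter_search_tiles(candidates)
print([t.tile_id for t in kept])
```

Only the tiles that pass the filter would then be sent to the model, which reduces both the inference time and the opportunity for false positives on roofs and parking lots.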

These results indicate that convolutional neural networks can successfully identify beaver dams if given suitable inputs and training data. Our model performs well despite the wide variety of beaver dam shapes, sizes, and landscape settings. This suggests that similarly diverse landforms could also be identified and monitored through these methods, such as riffles, log jams, point bars, or other visually complex geomorphic features.

4 Conclusions

4.1 Model Applications

EEAGER, and in particular its generalizability outside the training regions, has significant room for improvement. Even so, the EEAGER model in its current form is a valuable tool for policymakers, land managers, and scientists tasked with monitoring beavers and their landscape impacts. Though the number of beaver dams in a landscape is not a simple linear function of absolute beaver population, our model gives a baseline from which changes in beaver population, impacts, and spatial distribution can be inferred (Johnson-Bice et al., 2022). This information is key for developing, implementing, and assessing beaver-based restoration programs, as well as for general conservation and population monitoring. Beavers are a keystone species and ecosystem engineers whose presence has been linked to enhanced biodiversity; improved water quality; ecosystem resilience to floods, droughts, and fires; more and larger endangered fishes in rivers; aquifer recharge; and carbon sequestration (Brazier et al., 2021; Jordan & Fairfax, 2022; Larsen et al., 2021). Monitoring their population and impact provides valuable information about numerous interconnected physical and biological processes—not just about the beavers themselves.

Tracking changes in beaver (Castor canadensis) population, distribution, and landscape impacts as they recolonize their historic range in North America is not the only application of our model. The closely related Eurasian beaver (Castor fiber) is currently being reintroduced in Europe (Puttock et al., 2015; Wróbel, 2020), invasive introduced beavers (Castor canadensis) in Tierra del Fuego are dramatically modifying Patagonian alpine environments (Anderson et al., 2009; Lizarralde et al., 2004; C. J. Westbrook et al., 2017), and beavers in general are recolonizing the Arctic (Jones et al., 2020; Tape et al., 2018, 2022), an area that has not had large-scale beaver presence since the Pliocene (Davies et al., 2022; Mitchell et al., 2016; Rybczynski, 2008), as anthropogenic climate change causes tundra ecosystems to shift to shrubland (Heijmans et al., 2022; Henry & Molau, 1997; Jung et al., 2016; Stow et al., 2004). As these large-scale changes take place, whether intentional as in the case of reintroduction and conservation or unintentional as in the case of beavers tracking climate change northwards, efficiently characterizing the total area of beaver-influenced habitat and its geographic distribution is integral to making informed land management and conservation decisions. Within the limits of model accuracy, predictions can represent baseline data and enable change tracking, especially when coupled with a tiered system in which model predictions are confirmed by trained observers.

Characterizing the area and distribution of beaver impacts should not be a one-time effort. Given the opportunity, beavers can rapidly transform entire watersheds. Infrequent monitoring has a high likelihood of missing crucial beaver population and habitat dynamics. Thus, large-scale identification of beaver dams, ponds, and habitat should be repeated as frequently as possible. Previously, that would have required significant financial resources and extensive trained researcher time to review aerial and satellite imagery or conduct field surveys. With our model, these costs would be greatly reduced. Even with manual review of the model-identified features, repeat surveys can be completed much faster than with a fully manual identification process. This frees up time for scientists and practitioners to focus their efforts on understanding large-scale processes and impacts rather than performing large-scale data collection.

Additionally, our model framework can be used in parallel with capacity models like the Beaver Restoration Assessment Tool (BRAT) to fine-tune beaver-based river restoration practices (Macfarlane et al., 2015). Knowing the actual locations of beaver dams can be used to calibrate, verify, and validate beaver dam capacity estimates. Additionally, it could be used to classify riverscapes as beaver conservation areas, that is, places that have high densities of beaver dams, or as restoration areas, that is, places that have low densities of beaver dams but high modeled capacity for them. In places where there is an archive of historic satellite data and aerial imagery, older input data can be used in the model to reconstruct the recent historic distribution of beaver in areas of interest, and then retrospectively characterize the dispersal patterns and beaver population growth or shrinkage. Those data will enable project managers and scientists to evaluate how beavers responded to land management decisions, conservation efforts, land use change, climate change, or other known environmental changes and disturbances—even if no active beaver monitoring was being conducted on site.
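The conservation versus restoration classification described above can be expressed as a simple rule on observed dam density and modeled dam capacity. The sketch below is illustrative only: the function `classify_reach` and both cutoff values are hypothetical placeholders, not values from BRAT or from this study, and a real application would choose cutoffs appropriate to the riverscape in question.

```python
def classify_reach(observed_dams_per_km, capacity_dams_per_km,
                   high_density=5.0, high_capacity=5.0):
    """Label a stream reach from observed dam density and modeled dam capacity.

    The cutoffs (dams per km) are arbitrary placeholders for illustration.
    """
    if observed_dams_per_km >= high_density:
        # Many dams already present: a candidate beaver conservation area.
        return "conservation"
    if capacity_dams_per_km >= high_capacity:
        # Few dams now but high modeled capacity: a candidate restoration area.
        return "restoration"
    return "other"


# Hypothetical reaches: (observed density, modeled capacity), both in dams per km.
reaches = {
    "reach_1": (8.0, 10.0),
    "reach_2": (0.5, 12.0),
    "reach_3": (0.5, 1.0),
}

for name, (observed, capacity) in reaches.items():
    print(name, classify_reach(observed, capacity))
```

In such a workflow the observed densities would come from reviewed model predictions and the capacities from a capacity model, so that the two sources can also be compared reach by reach.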

Our model was designed as a tool for restoration, conservation, scientific monitoring, and community outreach. It is intended to support rigorous research as well as help the public learn about and find beavers in their local community. However, as with all public-facing tools that identify the location of animals on public land (e.g., iNaturalist (“iNaturalist,”)), care must be taken to ensure that our model does not get used to facilitate illegal trapping, poaching, or habitat destruction. To that end, the model results are currently accessed by permission only, and limited to researchers and conservation professionals. As more areas are evaluated, the quality of the model predictions will improve, and broad state-level beaver dam distribution trends will be made publicly available.

4.2 Model Importance

Researchers have been manually counting beaver dams for decades to monitor beaver activity and landscape impacts, but that practice is too time and resource intensive to implement frequently and at large spatial scales. There is growing interest in partnering with beavers as a river restoration and climate mitigation strategy. Being able to monitor beaver distribution and impacts at county, regional, and continental scales annually or even seasonally as their populations naturally shift and are influenced by our own management activities is critically important. These baseline data and the functionality of the model open the door for larger scale beaver dam building monitoring, habitat expansion tracking, and beaver-based riverscape restoration evaluation than was previously reasonable.

There is a pressing need to be able to efficiently monitor changes in beaver distribution, dam building activity, and habitat creation as local, state, federal, and international agencies increasingly call for beaver reintroduction and conservation to be included in climate mitigation and adaptation policies. Our model is a key initial step in meeting that need. While beaver dams and ponds identified by our model alone are not a perfect proxy for beaver populations, through repeat monitoring of total beaver-influenced habitat, we can characterize trends in the beaver population and habitat, both in specific areas of interest and across the country. Use of the model by scientists and practitioners, with validation against traditional methods (e.g., manual identification in aerial or satellite imagery and field surveys), will increase model robustness.

Scientists are still piecing together fragments of evidence trying to understand the continental scale ecologic, hydrologic, and geomorphic changes driven by the widespread loss of beaver during the Fur Trade. We have the opportunity today to be more proactive in our data collection as beavers recolonize their historic habitat and respond to the stressors of climate change. Our model broadly supports large-scale beaver monitoring efforts: by using it, researchers can spend less time finding the places where beavers are building dams and creating habitat, and more time characterizing the impacts of that behavior on local, regional, and continental-scale ecologic, hydrologic, and geomorphic processes. Policymakers need that kind of data to develop effective land and beaver management strategies. Regardless of whether the end goal is wildfire mitigation, carbon sequestration, drought resilience, floodplain reconnection, flood wave attenuation, or biodiversity conservation, the first step in partnering with beavers is knowing where they are building dams, and our model significantly expedites that effort.

Acknowledgments

Authors have no funding sources to report.

    Conflict of Interest

    The authors declare no conflicts of interest relevant to this study.

    Data Availability Statement

    Earth Engine (https://earthengine.google.com/), AI Platform (https://cloud.google.com/ai-platform/docs/technical-overview), and Cloud Storage (https://cloud.google.com/) used to build the EEAGER model are all publicly available products and platforms. A sample of pseudo-absence points, a sample of assumed absence points, a demonstration code pipeline, and the results in our evaluation areas from this research are archived in the CUAHSI HydroShare Repository at Fairfax, E. (2022). EEAGERModel, HydroShare, http://www.hydroshare.org/resource/17c9b293214641bea6948ae0d8facaeb. A fully functional demonstration code pipeline and other model updates and development information can be found in the Google GitHub repository at https://github.com/google/project-eager. Input DEM and RGB data used in model development are proprietary and only available internally at Google, and as such are not accessible to the public or broader research community. Individuals seeking to recreate this model can do so using publicly available imagery, albeit at lower resolution, or with their own private high-resolution data. The full set of training data is not publicly available due to the ongoing risk of beaver trapping and intentional habitat destruction. However, the results discussed in this study (actual beaver dam locations and model predictions for all of the evaluation polygons) can be viewed via Earth Engine at https://maiman.users.earthengine.app/view/eeager. Additional training data will be made available to individuals within the research and river restoration communities upon receipt of a formal request via email to [email protected].