These two species of warbler are not likely to be found together in the same habitat. (L) Black-throated Green Warbler by Ian Davies. (R) Yellow Warbler by Brian E. Kushner. (Image: Cornell)

It begins with the more than 900,000 birders who report their sightings to the Cornell Lab of Ornithology's eBird program. When combined with innovations in technology and AI — the same innovations that power self-driving cars and real-time language translation — these sightings are revealing more than ever about patterns of bird biodiversity and the processes that underlie them.

“This method uniquely tells us which species occur where, when, with what other species, and under what environmental conditions,” said Lead Author Courtney Davis. “With that type of information, we can identify and prioritize landscapes of high conservation value — vital information in this era of ongoing biodiversity loss.”

Here is an exclusive Tech Briefs interview — edited for length and clarity — with Davis.

Tech Briefs: What was the catalyst for developing this technology?

Davis: Biodiversity loss is accelerating globally, but we lack robust information on species diversity (i.e., species richness, the number of species in a local community) and composition (i.e., the identity of species present in a local community) in most regions of the world because of data limitations and sampling biases. Fortunately, participation in digital citizen science data collection programs has grown exponentially in recent years, helping to fill in data gaps in many areas of the world.

The scale and amount of information provided by these large-scale data collection efforts are unparalleled. Leveraging such large observational datasets in combination with large feature sets (e.g., remote sensing data products describing environmental conditions) requires technologies that are ecologically relevant and computationally efficient, and that allow for complex, nonlinear associations and feature interactions. Our paper is the result of a long-standing collaboration between the Institute of Computational Sustainability and Cornell University that is focused on this very task, with the goal of providing new, robust technologies and data-driven science to meet global information needs on biodiversity.

Tech Briefs: Can you explain in simple terms how it works?

Davis: The model we use (DMVP-DRNets) is a deep learning implementation of the joint species distribution models developed and widely applied in ecology. Joint species distribution models decompose the spatial distributions of multiple species into shared environmental affinities and residual patterns of co-occurrence, thereby accounting for the interspecific interactions that, in addition to environmental features, influence what species can occur where and when.
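
To make that decomposition concrete, here is a minimal sketch in Python. It is not the authors' code. It assumes a two-species multivariate probit model, one common way of writing a joint species distribution model: each species is present when a latent score (environmental suitability plus a residual) is above zero, and the residuals of the two species are correlated. The function name, the suitability values, and the correlation values are all hypothetical.

```python
import math
import random

def simulate_cooccurrence(mu_a, mu_b, rho, n_sites=20000, seed=0):
    """Simulate presence/absence of two species at n_sites sites.

    mu_a, mu_b: environmental suitability on the probit scale (made-up values).
    rho: residual correlation left after accounting for environment.
    Returns (fraction with A present, fraction with B present, fraction with both).
    """
    rng = random.Random(seed)
    a_count = b_count = both = 0
    for _ in range(n_sites):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        # Correlated standard-normal residuals via a 2x2 Cholesky factor.
        res_a = e1
        res_b = rho * e1 + math.sqrt(1.0 - rho * rho) * e2
        present_a = mu_a + res_a > 0
        present_b = mu_b + res_b > 0
        a_count += present_a
        b_count += present_b
        both += present_a and present_b
    return a_count / n_sites, b_count / n_sites, both / n_sites

# Same environment in all three runs; only the residual correlation changes.
independent = simulate_cooccurrence(0.2, 0.2, rho=0.0)
attracted = simulate_cooccurrence(0.2, 0.2, rho=0.8)
avoiding = simulate_cooccurrence(0.2, 0.2, rho=-0.8)
```

In this toy setup each species occupies roughly the same share of sites in every run, because the environmental term is unchanged. Only the share of sites where both occur together moves with `rho`, which is the part of the pattern that a single-species model cannot express.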

To mirror that general model structure, DMVP-DRNets employs a three-layer, fully connected network encoder to learn the relative importance of a large number of environmental features and generates a two-part structured latent space that expresses species’ environmental associations as well as the interactions among species. DMVP-DRNets produces three outputs: 1) environmental association embeddings, which capture the multivariate associations of different environmental covariates, and of interactions among them, with species’ occurrences; 2) interactive association embeddings, which capture interactions among species; and 3) estimates of joint species occurrence probabilities across the study extent, which can be summarized at both the species and community levels (e.g., to map species-specific distributions or species richness).
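
As a rough illustration of the encoder and of the community-level summary described above, the following Python sketch passes one site's environmental features through three fully connected layers and converts the resulting scores into per-species occurrence probabilities. It is a simplified stand-in, not the DMVP-DRNets implementation: the layer sizes, the ReLU activations, the random untrained weights, the probit link, and the feature values are all assumptions chosen for illustration.

```python
import math
import random

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def dense(inputs, weights, biases, relu=True):
    """One fully connected layer; weights holds one row per output unit."""
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        out.append(max(0.0, z) if relu else z)
    return out

def random_layer(rng, n_in, n_out):
    """Untrained layer with random weights and zero biases (illustration only)."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def encode(features, layers):
    """Pass features through the layers; the last layer has no activation."""
    h = features
    for i, (w, b) in enumerate(layers):
        h = dense(h, w, b, relu=(i < len(layers) - 1))
    return h  # one probit-scale score per species

rng = random.Random(1)
n_features, n_hidden, n_species = 5, 8, 4  # hypothetical sizes
layers = [
    random_layer(rng, n_features, n_hidden),
    random_layer(rng, n_hidden, n_hidden),
    random_layer(rng, n_hidden, n_species),
]

site = [0.3, -1.2, 0.8, 0.0, 1.5]  # made-up standardized covariates
scores = encode(site, layers)
occurrence_probs = [normal_cdf(s) for s in scores]
# Expected species richness is the sum of the per-species probabilities.
expected_richness = sum(occurrence_probs)
```

Summing the per-species probabilities gives expected richness at a site whether or not species are correlated, because the expectation of a sum is the sum of the expectations. Mapping that sum across many sites is one way a richness map could be produced from outputs like these.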

Tech Briefs: What were the biggest technical challenges you faced while developing it?

Davis: The biggest challenge was developing a technology capable of scaling to the large numbers of species, locations, sample sizes, and environmental predictors necessary for broad-scale applications. This is not a trivial task and hasn’t — until now — been achieved in the ecological literature. Most widely used joint species distribution models have been developed and applied with a focus on statistical inference, restricting the amount of information used to describe environments to a relatively small number of linear predictors. This constraint limits the range and amount of environmental data that can be used in these models, which can reduce the accuracy of predictions and confound the effects of shared environmental affinities and species interactions.

A key advantage of deep learning in this context is the ability to incorporate large, complex environmental data sets, allowing for a more accurate characterization of the processes that structure entire ecological communities. Deep learning can also isolate patterns that are shared by multiple species, thereby improving predictions across all species but particularly those that are detected less frequently.

Another technical challenge that arises with scaling to a large number of species occurs when integrating the model likelihood over a constrained multidimensional space of latent variables. DMVP-DRNets combines and builds on previous work in computer science (e.g., deep multi-species embedding, end-to-end learning) to overcome these challenges, including the use of efficient parallel sampling processes that can be implemented on GPUs, thereby making even broad-scale applications such as ours computationally feasible.
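
One way to see why sampling helps with that integral is the sketch below. It is an illustration under stated assumptions, not the method as implemented in DMVP-DRNets: it assumes a multivariate probit model whose residual covariance can be written as a low-rank part plus the identity. Conditional on a draw of the shared latent factors, the species become independent, so the joint probability is an average of simple products over many draws. Each draw is independent of the others, which is what makes this kind of estimate easy to spread across GPU threads; here a plain loop stands in for that. The function and parameter names are hypothetical.

```python
import math
import random

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def joint_probability(y, mu, loadings, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(Y = y) under a multivariate probit model.

    Assumes residual covariance = loadings * loadings^T + identity.
    y: 0/1 outcome per species.
    mu: probit-scale environmental score per species.
    loadings: one row per species, one column per shared latent factor.
    """
    rng = random.Random(seed)
    n_factors = len(loadings[0])
    total = 0.0
    for _ in range(n_samples):
        # Draw the shared latent factors once per sample.
        z = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        prob = 1.0
        # Given z, species are independent, so probabilities multiply.
        for y_j, mu_j, row in zip(y, mu, loadings):
            shift = mu_j + sum(l * z_k for l, z_k in zip(row, z))
            p_present = normal_cdf(shift)
            prob *= p_present if y_j == 1 else 1.0 - p_present
        total += prob
    return total / n_samples

# Two species, neutral environment, one shared factor (residual correlation 0.5).
both_present = joint_probability([1, 1], [0.0, 0.0], [[1.0], [1.0]])
```

For this small case the answer is known analytically: with a residual correlation of 0.5 and neutral environmental scores, the probability that both species are present is exactly 1/3, so the estimate should land close to that value. With dozens or hundreds of species no such closed form is available, and the averaged-product estimate is what remains tractable.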

Tech Briefs: “The scientists are working now to make this method's outputs available to a broad array of users so they don't need computational expertise to reap the benefits…” How is that coming along? Any updates you can share?

Davis: We plan to develop and release new biodiversity data products on species richness that will be freely and publicly available alongside other products on species’ distributions and abundances currently produced by the Cornell Lab of Ornithology (science.ebird.org/en/status-and-trends). The timing of this release is still TBD but will likely occur in 2024.

Tech Briefs: Co-Author Carla Gomes is quoted as saying, “…we are also developing models to estimate bird abundance, the number of individual birds per species. We’re also aiming to enhance the model by incorporating bird calls alongside visual observations.” Do you have any plans for further research on, or development of, this tech? In other words, what are your next steps?

Davis: We see immense value in being able to extend this approach to also estimate species’ abundances.

Estimating abundance (i.e., expected count of individuals) is more difficult than estimating species’ occurrence (i.e., presence/absence), particularly for diverse ecological communities with many rare or infrequently observed species, but it is much more informative for guiding conservation action and decision-making. Extending this framework to accommodate multiple data modalities, including visual and audio observations of birds, is another ongoing area of research that has the potential to help fill spatial, temporal, and taxonomic data gaps across continental and hemispherical extents.

Tech Briefs: Do you have any advice for engineers aiming to bring their ideas to fruition?

Davis: Meeting our collective societal and ecological challenges will require the innovative solutions that arise from interdisciplinary collaboration. This work was only possible because our team brought together knowledge and expertise across multiple domains: ecology, computer science, materials science, and computational sustainability. Finding ways to inspire and support diverse teams and collaborations in your own work is critical, no matter the discipline.