OpenForest: a data catalog for machine learning in forest monitoring

Published online by Cambridge University Press: 27 February 2025

Arthur Ouaknine*
Affiliation:
School of Computer Science, McGill University, Montréal, QC, Canada; Mila, Quebec AI Institute, Montréal, QC, Canada
Teja Kattenborn
Affiliation:
Remote Sensing Centre for Earth System Research, Leipzig University, Leipzig, Germany; German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
Etienne Laliberté
Affiliation:
Institut de recherche en biologie végétale, Département de sciences biologiques, Université de Montréal, Montréal, QC, Canada
David Rolnick
Affiliation:
School of Computer Science, McGill University, Montréal, QC, Canada; Mila, Quebec AI Institute, Montréal, QC, Canada
*Corresponding author: Arthur Ouaknine; Email: arthur.ouaknine@mila.quebec

Abstract

Forests play a crucial role in the Earth’s system processes and provide a suite of social and economic ecosystem services, but are significantly impacted by human activities, leading to a pronounced disruption of the equilibrium within ecosystems. Advancing forest monitoring worldwide offers advantages in mitigating human impacts and enhancing our comprehension of forest composition, alongside the effects of climate change. While statistical modeling has traditionally found applications in forest biology, recent strides in machine learning and computer vision have reached important milestones using remote sensing data, such as tree species identification, tree crown segmentation, and forest biomass assessments. To this end, open-access data remain essential for enhancing such data-driven algorithms and methodologies. Here, we provide a comprehensive and extensive overview of 86 open-access forest datasets across spatial scales, encompassing inventories, ground-based, aerial-based, satellite-based recordings, and country or world maps. These datasets are grouped in OpenForest, a dynamic catalog open to contributions that strives to reference all available open-access forest datasets. Moreover, in the context of these datasets, we aim to inspire research in machine learning applied to forest biology by establishing connections between contemporary topics, perspectives, and challenges inherent in both domains. We hope to encourage collaborations among scientists, fostering the sharing and exploration of diverse datasets through the application of machine learning methods for large-scale forest monitoring. OpenForest is available at the following URL: https://github.com/RolnickLab/OpenForest.

Information

Type
Data Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Overview of forest monitoring topics and challenges associated with machine learning perspectives and challenges. Note: Each forest monitoring topic and its challenges are detailed with their corresponding section number (in red). They are associated with the three main machine learning perspectives and challenge categories, namely generalization, limited data, and domain-specific objectives, along with their corresponding section number (in red).

Figure 2. Illustration of forest monitoring datasets at different scales. Note: Inventories are in situ measurements taken at the tree level. Ground-based datasets are recorded within or below the canopy of the trees. Aerial datasets are composed of recordings from sensors mounted on unoccupied (drones) or occupied aircraft. Satellite datasets are collected from sensors mounted on satellites orbiting the Earth. Map datasets are generated at the country or world level using datasets at the aerial or satellite scales.

Figure 3. Distribution of the reviewed open-access forest datasets. Note: (Left) World map of the location of the reviewed datasets at the country level. Most of the datasets are regional and do not reflect the entire associated country. The datasets categorized with a “Worldwide” location or at the continent level have been excluded for visualization purposes. (Right) Distributions of the publication years and recording years used and/or released in the associated datasets.

Table 1. Review of open-access forest inventory datasets

Table 2. Review of open-access ground-based forest datasets

Table 3. Review of open-access aerial forest datasets

Table 4. Review of open-access satellite forest datasets before 2020 (included)

Table 5. Review of satellite recording datasets after 2021 (included)

Table 6. Review of open-access map forest datasets before 2019 (included)

Table 7. Review of open-access map forest datasets after 2020 (included)

Table 8. Review of open-access mixed forest datasets, including inventories and aerial-based (IA); inventories, aerial-based, and satellite-based (IAS); and inventories and maps (IM)

Table 9. Review of open-access mixed forest datasets, including ground-based and aerial-based (GA); aerial-based and satellite-based (AS); aerial-based and maps (AM); and satellite-based and maps (SM)

Author comment: OpenForest: a data catalog for machine learning in forest monitoring — R0/PR1

Comments

Dear Editors,

We are writing to submit our manuscript “OpenForest: a data catalog for machine learning in forest monitoring” to the journal Environmental Data Science.

In the context of a climate emergency, there is an urgent need to monitor forests worldwide.

This is essential for maintaining ecological equilibrium, as it helps mitigate human impacts and enhances our comprehension of forest composition.

This work aims to foster interest among both the machine learning and the forest biology communities regarding ongoing research topics and challenges for forest monitoring.

Forest biology research topics and their current challenges are discussed to target potential areas for future research within the community.

Machine learning methods are also introduced, bringing the potential to explore and tackle forest biology challenges.

Because both forest biology challenges and machine learning methods require large amounts of available data, a clear review of open-source datasets is also provided.

To highlight and strengthen the research trend in these fields, the OpenForest dynamic catalog is publicly released to centralize all available open-source datasets for forest monitoring while remaining open to updates by the community.

The overall objective of this work is to foster communication, inspire new applications of machine learning in forest monitoring, and motivate advancements in this field.

Review: OpenForest: a data catalog for machine learning in forest monitoring — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

The authors present a new dataset of open data on forests that can be useful for machine and deep learning applications. This effort is excellent and highly needed, as forests are becoming a key element of discussion in climate mitigation and adaptation.

The article is very well written and will be quite well received by the remote sensing and forestry community.

I have only a few minor comments on the introductory part and the second section:

Line 47 – it should be mentioned that the projected carbon sequestration related to the use of forests as a nature-based solution can be limited by the climate effects on forests that we are witnessing, so those numbers require constant updates.

https://www.nature.com/articles/s41467-018-05132-5

Figure 1 – I wonder if, within the challenges, one should merge biotic and abiotic factors into one category, “disturbance type detection”.

Figure 1 – Forest biomass is not listed as a challenge. Although this is true at a certain spatial resolution, biomass estimation at high resolution and, most importantly, biomass stock changes are key challenges for the remote sensing community. They are very important both scientifically and for monitoring, reporting, and verification activities in policy and carbon credit contexts.

Figure 1 – monitoring forest management is also a key challenge that, in my opinion, is missing.

I would also suggest highlighting these challenges in Section 2.1.

Page 7 line 4

Clarify what you mean by “these”: are they traits or ecosystem functional properties?

I would merge Sections 2.2.3 and 2.2.4 into one section and call it “Monitoring forest stress and attribution to biotic and abiotic factors”. In my opinion, monitoring the effects of these factors on forest functioning is relatively straightforward; the attribution of the observed stress to its cause is the main area we need to develop, also in order to map the type of disturbance at large scale.

I found the machine learning section very well structured.

For the inventories, the authors could consider accessing some inventories, such as those from Sweden and Catalonia, which can be freely requested.

Recommendation: OpenForest: a data catalog for machine learning in forest monitoring — R0/PR3

Comments

Dear authors,

You probably saw my earlier message about having problems finding reviewers. We finally got one review back, and for data papers we only need one. The reviewer is very enthusiastic about the paper, and I personally agree with this assessment. I would encourage you to address the minor issues raised so that we can proceed with the publication of this paper.

Best regards,

Miguel Mahecha

Decision: OpenForest: a data catalog for machine learning in forest monitoring — R0/PR4

Comments

No accompanying comment.

Author comment: OpenForest: a data catalog for machine learning in forest monitoring — R1/PR5

Comments

Dear Editors,

We are writing to submit our manuscript “OpenForest: a data catalog for machine learning in forest monitoring” to the journal Environmental Data Science.

In the context of a climate emergency, there is an urgent need to monitor forests worldwide.

This is essential for maintaining ecological equilibrium, as it helps mitigate human impacts and enhances our comprehension of forest composition.

This work aims to foster interest among both the machine learning and the forest biology communities regarding ongoing research topics and challenges for forest monitoring.

Forest biology research topics and their current challenges are discussed to target potential areas for future research within the community.

Machine learning methods are also introduced, bringing the potential to explore and tackle forest biology challenges.

Because both forest biology challenges and machine learning methods require large amounts of available data, a clear review of open-source datasets is also provided.

To highlight and strengthen the research trend in these fields, the OpenForest dynamic catalog is publicly released to centralize all available open-source datasets for forest monitoring while remaining open to updates by the community.

The overall objective of this work is to foster communication, inspire new applications of machine learning in forest monitoring, and motivate advancements in this field.

Review: OpenForest: a data catalog for machine learning in forest monitoring — R1/PR6

Conflict of interest statement

Reviewer declares none.

Comments

The authors addressed the comments, and I am fine with the manuscript.

Recommendation: OpenForest: a data catalog for machine learning in forest monitoring — R1/PR7

Comments

Dear Dr. Ouaknine: The reviewer and I are happy with the revisions; from my side, the paper is ready to go to production. Many thanks for submitting the paper to us.

Best wishes, Miguel Mahecha

Decision: OpenForest: a data catalog for machine learning in forest monitoring — R1/PR8

Comments

No accompanying comment.