
Assessing the performance and accuracy of invasive plant habitat suitability models in detecting new observations in Wisconsin

Published online by Cambridge University Press: 06 September 2021

Niels Jorgensen
Affiliation:
Former Graduate Student, University of Wisconsin–Madison, Madison, WI, USA
Mark Renz*
Affiliation:
Professor, University of Wisconsin–Madison, Madison, WI, USA
*Author for correspondence: Mark Renz, University of Wisconsin–Madison, 1575 Linden Drive, Madison, WI 53706. (Email: mrenz@wisc.edu)

Abstract

Land managers need tools that improve understanding of suitable habitat for invasive plants and that can be incorporated into survey efforts to improve efficiency. Habitat suitability models (HSMs) have attributes that can meet these needs, but how well they perform is not known, because they are rarely field-tested for accuracy. We developed ensemble HSMs in the state of Wisconsin for 15 species using five algorithms (boosted regression trees, generalized linear models, multivariate regression splines, MaxEnt, and random forests), evaluated performance, determined the variables that drive suitability, and tested accuracy. All models performed well during the development phase (area under the curve [AUC] > 0.7 and true skill statistic [TSS] > 0.4). Although variable importance and directionality were species specific, the most important predictor variables across all of the species’ models were mean winter minimum temperatures, total summer precipitation, and tree canopy cover. After model development, we obtained 5,005 new occurrence records from community science observations for all 15 focal species to test how accurately the models predict suitable habitat. Using a target correct classification rate of 80%, models for just 8 of the 15 species correctly predicted suitable habitat (α ≤ 0.05). Exploratory analyses found that the number of reporters of these new data and the total number of new occurrences reported per species contributed to increasing correct classification. These results suggest that although some models perform well on evaluation metrics, relying on these metrics alone is not sufficient and can lead to errors when the models are used for surveying. We recommend that any model be tested for accuracy in the field before use to avoid this issue.
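The evaluation steps named in the abstract can be illustrated with a short sketch. It computes the true skill statistic from a confusion matrix and checks an observed correct classification rate against the 80% target. The abstract does not state which statistical test was used, so the one-sided exact binomial test here is an assumption, and all function names and the example counts are hypothetical.

```python
from math import exp, lgamma, log


def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def _log_binom_pmf(i, n, p):
    # Log of the binomial probability mass, computed in log space so that
    # large sample sizes do not overflow. Requires 0 < p < 1.
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(exp(_log_binom_pmf(i, n, p)) for i in range(k + 1))


def meets_target_rate(correct, total, target=0.80, alpha=0.05):
    """Assumed test: H0 is 'true rate >= target'.

    Returns True when H0 is not rejected, i.e. the observed number of
    correctly classified points is not significantly below the target.
    """
    p_value = binomial_lower_tail(correct, total, target)
    return p_value > alpha


if __name__ == "__main__":
    # Hypothetical counts, not taken from the article.
    print(true_skill_statistic(tp=80, fn=20, tn=70, fp=30))
    print(meets_target_rate(correct=90, total=100))
    print(meets_target_rate(correct=60, total=100))
```

Under this assumed test, a species model passes when its new occurrence points fall in predicted suitable habitat often enough that a true rate of 80% remains plausible at the chosen α.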

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of the Weed Science Society of America

Table 1. Predictors used for model development, including data source and whether postprocessing of the data was required.a

Table 2. Species modeled, including common and scientific names, number of occurrence records used to develop the models, and how many counties in the state those points came from (an indication of the spatial extent of these data).

Figure 1. Occurrence records for model development and assessment (A), the binary ensemble habitat suitability model (HSM) created using five model algorithms (B), and a final binary HSM representation (C) of the ensemble model for Pastinaca sativa in the state of Wisconsin.

Table 3. Number of species models for which each input was within the top three most important variables.a

Table 4. Species used to test the accuracy of community scientist observations (n = 15) and how many of the new points (Figure 1; Supplementary Figures 1–14) were correctly classified as suitable by the habitat models.a

Figure 2. Partial dependence plots (PDPs) for the top two most important variables in a random forest model to determine drivers of accurate classification from community scientist observations of 15 habitat suitability models for individual invasive plant species in Wisconsin. A PDP depicts the marginal impact a single variable has on the result of a machine learning algorithm when all other variables are held at their mean values. The number of independently collected occurrence records (A) not used to train the original habitat models was the most important variable, followed closely by the number of reporters (B) of these new occurrence records.
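The partial dependence calculation described in the Figure 2 caption can be sketched as follows: one variable is swept across a grid of values while every other variable is held at its mean. This is a minimal illustration of the caption's wording, not the authors' code. The function name, the toy model, and the feature names are hypothetical. Note that many PDP implementations instead average predictions over all observed rows rather than evaluating at the mean row.

```python
from statistics import mean


def partial_dependence_at_mean(predict, rows, feature, grid):
    """Return (value, prediction) pairs for one feature.

    predict: callable taking a dict of feature name -> value
    rows:    list of dicts, the data used to compute feature means
    feature: name of the feature to vary
    grid:    values of that feature to evaluate
    """
    # Hold every feature at its mean across the supplied rows.
    baseline = {name: mean(row[name] for row in rows) for name in rows[0]}
    curve = []
    for value in grid:
        point = dict(baseline)
        point[feature] = value  # vary only the feature of interest
        curve.append((value, predict(point)))
    return curve


if __name__ == "__main__":
    # Hypothetical data and model, for illustration only.
    rows = [
        {"new_records": 50, "reporters": 5},
        {"new_records": 400, "reporters": 40},
    ]

    def toy_model(x):
        return 0.5 + 0.0005 * x["new_records"] + 0.002 * x["reporters"]

    for value, pred in partial_dependence_at_mean(
        toy_model, rows, "new_records", [0, 200, 400]
    ):
        print(value, round(pred, 3))
```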

Supplementary material: File

Jorgensen and Renz supplementary material
