Building occupancy type classification and uncertainty estimation using machine learning and open data

Published online by Cambridge University Press: 10 February 2025

Tom Narock*
Affiliation:
Center for Natural, Computer, and Data Sciences, Goucher College, Baltimore, MD, USA
J. Michael Johnson
Affiliation:
Lynker, Boulder, CO, USA
Justin Singh-Mohudpur
Affiliation:
Lynker, Boulder, CO, USA
Arash Modaresi Rad
Affiliation:
School of Computing, Boise State University, Boise, ID, USA
*Corresponding author: Tom Narock; Email: thomas.narock@goucher.edu

Abstract

Federal and local agencies have identified a need to create building databases to help ensure that critical infrastructure and residential buildings are accounted for in disaster preparedness and to aid decision-making in subsequent recovery efforts. To respond effectively, we need to understand the built environment: where people live and work, and the critical infrastructure they rely on. Yet a major discrepancy exists in the way data about buildings are collected across the United States. There is no harmonization in what data are recorded by city, county, or state governments, let alone at the national scale. We demonstrate how existing open-source datasets can be spatially integrated and subsequently used as training data for machine learning (ML) models that predict building occupancy type, a major component needed for disaster preparedness and decision-making. Multiple ML algorithms are compared. We address strategies to handle significant class imbalance and introduce Bayesian neural networks to handle prediction uncertainty. The 100-year flood in North Carolina is provided as a practical application in disaster preparedness.
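
The spatial integration described above amounts to attaching attributes of surrounding areas (for example, county-level census aggregates) to individual buildings. A minimal sketch of such a join, assuming buildings reduced to points and regions given as simple polygons (exterior ring only, planar coordinates), uses an even-odd point-in-polygon test. The names join_attributes, county_a and median_income are hypothetical illustrations, not the authors' code; a real pipeline would more likely use a GIS library with a spatial index.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # exterior ring only; holes are ignored in this sketch


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd (ray casting) test: count crossings of a ray cast toward +x."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_attributes(buildings: Dict[str, Point],
                    regions: Dict[str, Polygon],
                    region_attrs: Dict[str, dict]) -> Dict[str, Optional[dict]]:
    """Attach the attributes of the first region containing each building (None if none)."""
    out: Dict[str, Optional[dict]] = {}
    for bid, pt in buildings.items():
        out[bid] = None
        for rid, poly in regions.items():
            if point_in_polygon(pt, poly):
                out[bid] = {"region": rid, **region_attrs[rid]}
                break
    return out


# Hypothetical usage: two unit-square "counties" and three building points.
regions = {"county_a": [(0, 0), (1, 0), (1, 1), (0, 1)],
           "county_b": [(1, 0), (2, 0), (2, 1), (1, 1)]}
attrs = {"county_a": {"median_income": 50000}, "county_b": {"median_income": 65000}}
buildings = {"b1": (0.5, 0.5), "b2": (1.5, 0.25), "b3": (3.0, 3.0)}
joined = join_attributes(buildings, regions, attrs)
```

Buildings falling outside every region (b3 here) receive None, which makes gaps in the source data explicit rather than silently filled.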

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open data
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Table 1. Features utilized in our machine learning training data

Table 2. Summary statistics of building occupancy types in North Carolina

Table 3. Building occupancy type distribution for binary classification
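
The abstract mentions strategies for significant class imbalance without detailing them here. One common strategy, shown only as an illustration and not necessarily the one the authors used, is to weight each class inversely to its frequency, n_samples / (n_classes * class_count), which is the formula scikit-learn documents for class_weight="balanced". The class counts below are invented for the example, not taken from Table 3.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical 90/10 split between two occupancy classes.
labels = ["residential"] * 90 + ["commercial"] * 10
weights = balanced_class_weights(labels)
```

With these weights each class contributes the same total weight to the loss (90 * 0.556 = 10 * 5.0 = 50), so the minority class is no longer drowned out.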

Figure 1. Precision-recall curve for random forest threshold moving.
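
Threshold moving replaces the default 0.5 cutoff on a classifier's predicted probability with a cutoff chosen from the precision-recall curve. A minimal sketch, assuming binary labels (1 = positive class) and that the cutoff maximizing F1 is the selection criterion; the paper's caption does not state which criterion was used, so F1 here is an assumption, and the scores are made up.

```python
from typing import List, Sequence, Tuple


def pr_points(y_true: Sequence[int],
              scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, precision, recall) for each distinct score used as a cutoff."""
    n_pos = sum(y_true)
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        # tp + fp >= 1: the sample whose score equals t is always predicted positive.
        precision = tp / (tp + fp)
        recall = tp / n_pos if n_pos else 0.0
        points.append((t, precision, recall))
    return points


def best_f1_threshold(y_true: Sequence[int],
                      scores: Sequence[float]) -> Tuple[float, float]:
    """Pick the cutoff on the precision-recall curve with the highest F1."""
    best_t, best_f1 = 0.5, -1.0
    for t, p, r in pr_points(y_true, scores):
        f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical predicted probabilities from a random forest.
y = [0, 0, 1, 0, 1, 1]
s = [0.1, 0.3, 0.35, 0.4, 0.8, 0.9]
threshold, f1 = best_f1_threshold(y, s)
```

Here the chosen cutoff (0.35) is well below 0.5, which is typical when the positive class is rare and the default threshold under-predicts it.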

Table 4. Results from binary classification experiments

Figure 2. Confusion matrix for neural network binary classification. Values are listed as proportions.
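
Listing a confusion matrix as proportions requires choosing a normalization. A minimal sketch, assuming each row (true class) is divided by its total so that rows sum to 1; the caption does not say whether rows, columns, or the grand total were used, and the class labels and predictions below are hypothetical.

```python
from typing import Hashable, List, Sequence


def confusion_proportions(y_true: Sequence[Hashable],
                          y_pred: Sequence[Hashable],
                          classes: Sequence[Hashable]) -> List[List[float]]:
    """Row-normalized confusion matrix: rows are true classes, columns predictions."""
    idx = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    counts = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        counts[idx[t]][idx[p]] += 1
    props = []
    for row in counts:
        total = sum(row)
        # A class with no true examples gets a row of zeros instead of dividing by 0.
        props.append([v / total if total else 0.0 for v in row])
    return props


y_true = ["res", "res", "res", "res", "com", "com"]
y_pred = ["res", "res", "res", "com", "com", "res"]
matrix = confusion_proportions(y_true, y_pred, ["res", "com"])
```

Row normalization makes the diagonal equal to per-class recall, which keeps a small class readable even when it is heavily outnumbered.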

Figure 3. Permutation feature importance. Features are shown in ascending order of importance. The bars show the mean accuracy decrease after 10 repetitions with error bars displaying the standard deviation of the 10 repetitions. Building types and income values are county aggregates obtained from the US Census. Percent impervious is the percentage of area surrounding the unknown building composed of impervious materials.
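
Permutation importance measures how much accuracy drops when one feature's values are shuffled, breaking its link to the label while leaving the model untouched. A minimal sketch of the procedure in the caption (mean accuracy decrease and standard deviation over 10 repetitions), assuming a fitted model exposed as a predict function on a list of feature rows; the population standard deviation is used, matching scikit-learn's permutation_importance, though the caption does not specify which. The toy model and data are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[List[Row]], List[int]],
                           X: List[Row], y: Sequence[int],
                           n_repeats: int = 10,
                           seed: int = 0) -> List[Tuple[float, float]]:
    """(mean, std) of the accuracy decrease for each feature column."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    results = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        results.append((statistics.mean(drops), statistics.pstdev(drops)))
    return results


# Hypothetical model that only looks at feature 0; feature 1 is noise.
def toy_predict(rows: List[Row]) -> List[int]:
    return [1 if r[0] > 0.5 else 0 for r in rows]


X = [[i / 10, ((i * 7) % 10) / 10] for i in range(10)]
y = toy_predict(X)
imp = permutation_importance(toy_predict, X, y)
```

Because the model ignores feature 1, shuffling it never changes a prediction and its importance is exactly zero, which is the behavior the error bars in the figure are meant to distinguish from genuine signal.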

Table 5. Results from multiclass classification experiments

Figure 4. Confusion matrix for multiclass random forest.

Figure 5. Confusion matrices from the multiclass Bayesian Neural Network. On the left, results when all predictions are kept. On the right, results when prediction uncertainty is below 0.4.
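
Keeping only predictions whose uncertainty is below 0.4 presupposes an uncertainty score on a fixed scale. A minimal sketch, assuming the Bayesian network is sampled several times per building (for example, by drawing weights from the posterior) and that uncertainty is the entropy of the averaged class probabilities divided by log(K), so it lies in [0, 1]. This normalized predictive entropy is an assumption for illustration; the caption does not name the measure the authors used, and alternatives such as mutual information or predictive variance are also common. The sampled probabilities are invented.

```python
import math
from typing import List, Sequence, Tuple


def predictive_summary(samples: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """samples: T class-probability vectors, one per stochastic forward pass."""
    t = len(samples)
    k = len(samples[0])
    mean = [sum(s[c] for s in samples) / t for c in range(k)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    uncertainty = entropy / math.log(k)  # normalized predictive entropy in [0, 1]
    label = max(range(k), key=lambda c: mean[c])
    return label, uncertainty


def filter_confident(batch: Sequence[Sequence[Sequence[float]]],
                     threshold: float = 0.4) -> List[Tuple[int, int, float]]:
    """Return (index, predicted label, uncertainty) for predictions below threshold."""
    kept = []
    for i, samples in enumerate(batch):
        label, u = predictive_summary(samples)
        if u < threshold:
            kept.append((i, label, u))
    return kept


# Two hypothetical buildings, each with two posterior samples over three classes.
batch = [
    [[0.95, 0.03, 0.02], [0.90, 0.05, 0.05]],  # consistent, confident
    [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30]],  # near-uniform, uncertain
]
kept = filter_confident(batch)
```

Dropping high-uncertainty predictions trades coverage for accuracy: fewer buildings receive a label, but the retained labels are more trustworthy, which is the contrast the two panels of Figure 5 show.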

Figure 6. Predicted residential flooding for buildings with previously unknown occupancy type.

Figure 7. Predicted commercial flooding for buildings with previously unknown occupancy type.