FOLD-RM: A Scalable, Efficient, and Explainable Inductive Learning Algorithm for Multi-Category Classification of Mixed Data

Published online by Cambridge University Press:  01 July 2022

HUADUO WANG, FARHAD SHAKERIN and GOPAL GUPTA
Affiliation:
The University of Texas at Dallas, Richardson, USA (e-mails: huaduo.wang@utdallas.edu, farhad.shakerin@utdallas.edu, gupta@utdallas.edu)

Abstract

FOLD-RM is an automated inductive learning algorithm for learning default rules for mixed (numerical and categorical) data. It generates an explainable answer set programming (ASP) rule set for multi-category classification tasks while maintaining efficiency and scalability. FOLD-RM is competitive in performance with widely used, state-of-the-art algorithms such as XGBoost and multi-layer perceptrons; unlike these algorithms, however, it produces an explainable model. FOLD-RM outperforms XGBoost on some datasets, particularly large ones. It also provides human-friendly explanations for its predictions.

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Algorithm 1 FOLD-R++ Algorithm: Information Gain function

Table 1. Left: Comparisons between a numerical value and a categorical value. Right: Evaluation and count for literal $(i, >, 4)$.

Algorithm 2 FOLD-R++ Algorithm, Find Best Literal function

Table 2. Left: Examples and values on ith feature. Right: positive/negative count and prefix sum on each value

Table 3. The info gain on ith feature with given examples

Algorithm 3 FOLD-R++ Algorithm

Algorithm 4 FOLD-RM Algorithm

Table 4. Comparison of XGBoost and FOLD-R++ on various Datasets

Table 5. The size and label distribution of UCI datasets

Table 6. Comparison of FOLD-RM and XGBoost on UCI Datasets

Table 7. Comparison of FOLD-RM and MLP on UCI Datasets

Table 8. Comparison of RIPPER and FOLD-RM on UCI Datasets