
CON-FOLD Explainable Machine Learning with Confidence

Published online by Cambridge University Press:  28 October 2024

LACHLAN MCGINNESS
Affiliation:
School of Computer Science, ANU and CSIRO/Data61, Canberra, Australia (e-mail: lachlan.mcginness@anu.edu.au)
PETER BAUMGARTNER
Affiliation:
CSIRO/Data61 and School of Computer Science, ANU, Canberra, Australia (e-mail: peter.baumgartner@data61.csiro.au)

Abstract

FOLD-RM is an explainable machine learning classification algorithm that uses training data to create a set of classification rules. In this paper, we introduce CON-FOLD which extends FOLD-RM in several ways. CON-FOLD assigns probability-based confidence scores to rules learned for a classification task. This allows users to know how confident they should be in a prediction made by the model. We present a confidence-based pruning algorithm that uses the unique structure of FOLD-RM rules to efficiently prune rules and prevent overfitting. Furthermore, CON-FOLD enables the user to provide preexisting knowledge in the form of logic program rules that are either (fixed) background knowledge or (modifiable) initial rule candidates. The paper describes our method in detail and reports on practical experiments. We demonstrate the performance of the algorithm on benchmark datasets from the UCI Machine Learning Repository. For that, we introduce a new metric, Inverse Brier Score, to evaluate the accuracy of the produced confidence scores. Finally, we apply this extension to a real-world example that requires explainability: marking of student responses to a short answer question from the Australian Physics Olympiad.
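The abstract introduces the Inverse Brier Score without giving its formula. One natural reading, assuming IBS is simply one minus a Brier score computed from each prediction's confidence against the 0/1 outcome, can be sketched as follows. The function name and the (label, confidence) input format are illustrative assumptions, not the paper's definition or code.

```python
def inverse_brier_score(predictions, labels):
    """Sketch of an Inverse Brier Score: 1 minus the mean squared
    difference between each prediction's confidence and the 0/1
    outcome (1 if the predicted label was correct, else 0).

    predictions: list of (predicted_label, confidence) pairs
    labels:      list of true labels, same length
    """
    total = 0.0
    for (pred, conf), true in zip(predictions, labels):
        outcome = 1.0 if pred == true else 0.0
        total += (conf - outcome) ** 2
    return 1.0 - total / len(predictions)
```

Under this reading, a model that is always correct with confidence 1 scores 1, a confidently wrong model scores 0, and well-calibrated confidences are rewarded over overconfident ones.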

Information

Type
Original Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Algorithm 1 CON-FOLD


Fig. 1. This toy example illustrates the difference between the FOLD-RM and CON-FOLD core algorithms. Both produce rules of the form shown. When generating rule 2, CON-FOLD would not consider the Flamingo as part of the data to fit, whereas FOLD-RM would. Note that in many cases, both algorithms would generate an abnormal rule ab(X) :- flamingo(X), preventing the Flamingo from being covered by the first rule; in that case, both FOLD-RM and CON-FOLD would include the Flamingo. When harsh pruning occurs and few abnormal rules remain, this subtle change becomes noticeable.
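The rule-with-exception structure in the caption (a default rule guarded by abnormality predicates such as ab(X) :- flamingo(X)) can be read as a decision list: the first rule whose condition holds and whose exceptions all fail determines the class. The sketch below is an illustrative reading of those semantics in Python, not the paper's implementation; the rule representation is an assumption.

```python
def classify(rules, x):
    """Decision-list reading of FOLD-style rules.

    rules: list of (label, condition, exceptions), where condition and
           each exception are predicates over an example x.
    The first rule whose condition fires and whose exceptions (abnormal
    rules) all fail decides the label; otherwise fall through.
    """
    for label, condition, exceptions in rules:
        if condition(x) and not any(ab(x) for ab in exceptions):
            return label
    return None

# Toy ruleset mirroring the caption: birds fly, unless abnormal (flamingo);
# everything else falls through to a default rule.
toy_rules = [
    ("fly", lambda x: x["bird"], [lambda x: x["flamingo"]]),
    ("ground", lambda x: True, []),
]
```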


Algorithm 2 Evaluate Exceptions


Fig. 2. Scatter plot of the accuracy and number of rules for rulesets generated by the pruning algorithm with different values of the improvement threshold and the confidence threshold. Each point consists of two circles. The background circle displays the number of rules and accuracy without pruning, so all background circles are identical. The front circle displays the number of rules and accuracy when pruning is applied. Accuracy is indicated by the color shown in the scale bar on the right-hand side. Pruning conditions that are more accurate than the unpruned condition are marked with a black dot in the center. The number of rules is indicated by the area of the circle (ink area proportional to the number of rules), normalized by the number of rules in the unpruned case. The results shown for both accuracy and number of rules are averages over 300 trial runs for each test condition.
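One plausible reading of the two hyperparameters in this figure is that the confidence threshold discards whole rules whose confidence is too low, while the improvement threshold discards exceptions that do not raise a rule's confidence by enough. The sketch below is schematic: the dictionary fields and the exact threshold semantics are assumptions for illustration, not CON-FOLD's actual data structures or pruning procedure.

```python
def prune(rules, confidence_threshold, improvement_threshold):
    """Schematic two-threshold pruning pass.

    rules: list of dicts with a "confidence" score and a list of
           "exceptions", each annotated with the confidence "gain"
           it contributes to its parent rule (assumed representation).
    """
    pruned = []
    for rule in rules:
        # Confidence pruning: drop rules the model is not confident in.
        if rule["confidence"] < confidence_threshold:
            continue
        # Improvement pruning: keep only exceptions whose confidence
        # gain meets the improvement threshold.
        kept = [ex for ex in rule["exceptions"]
                if ex["gain"] >= improvement_threshold]
        pruned.append({**rule, "exceptions": kept})
    return pruned
```

Raising either threshold shrinks the ruleset, which is the accuracy-versus-rule-count trade-off the figure visualizes.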


Table 1. Accuracy, runtime, and number of rules and predicates for different methods and different benchmarks from the UCI repository. Pruning hyperparameters are provided for the CON-FOLD algorithm. The uncertainty values provided are the standard deviations from 30 trial runs. FOLD-SE results are taken from Wang and Gupta (2023)


Fig. 3. Plot of IBS against the percentage of data included in the stratified training data for the E. coli UCI dataset. Thirty trials for each condition were performed, and error bars indicate one standard deviation across the trials. Pruned CON-FOLD used a confidence threshold of $0.65$ and a pruning threshold of $0.07$.


Fig. 4. Each of the plots shows the performance of models using the Inverse Brier Score metric with different amounts of training data. Plots a and b show the regimes where large amounts of training data are available, while plots e and f explore model performance with very small amounts of training data available. Plots a, c, and e use automatic feature extraction, while plots b, d, and f use manual feature extraction using regular expressions, which allows for domain knowledge in the form of a marking scheme to be included. The total number of student responses was $n=1525$.