A Neurosymbolic Framework for Bias Correction in Convolutional Neural Networks

Published online by Cambridge University Press:  15 January 2025

PARTH PADALKAR
Affiliation:
The University of Texas at Dallas, Richardson, TX, USA (e-mail: parth.padalkar@utdallas.edu)
NATALIA ŚLUSARZ
Affiliation:
Heriot-Watt University, Edinburgh, UK (e-mail: nd1@hw.ac.uk)
EKATERINA KOMENDANTSKAYA
Affiliation:
Southampton University, Heriot-Watt University, Edinburgh, UK (e-mail: e.komendantskaya@soton.ac.uk)
GOPAL GUPTA
Affiliation:
The University of Texas at Dallas, Richardson, TX, USA (e-mail: gupta@utdallas.edu)

Abstract

Recent efforts in interpreting convolutional neural networks (CNNs) focus on translating the activation of CNN filters into stratified Answer Set Program (ASP) rule-sets. CNN filters are known to capture high-level image concepts, so each predicate in the rule-set is mapped to the concept that its corresponding filter represents. Hence, the rule-set exemplifies the decision-making process of the CNN with respect to the concepts that it learns for any image classification task. These rule-sets help in understanding the biases in CNNs, although correcting the biases remains a challenge. We introduce a neurosymbolic framework called NeSyBiCor for bias correction in a trained CNN. Given symbolic concepts, expressed as ASP constraints, that the CNN is biased toward, we convert the concepts to their corresponding vector representations. Then, the CNN is retrained using our novel semantic similarity loss, which pushes the filters toward (or away from) learning the desired (or undesired) concepts. The final ASP rule-set obtained after retraining satisfies the constraints to a high degree, thus showing the revision in the knowledge of the CNN. We demonstrate that our NeSyBiCor framework successfully corrects the biases of CNNs trained with subsets of classes from the Places dataset while sacrificing minimal accuracy and improving interpretability.
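
As a rough illustration of the idea of a similarity-based loss over concept vectors, the following minimal sketch scores a filter's vector against desired and undesired concept vectors using cosine similarity. This is not the loss defined in the article: the function names, the use of cosine similarity, the unweighted sum, and the toy two-dimensional vectors are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_similarity_loss(filter_vec, desired, undesired):
    """Hypothetical loss (an assumption, not the article's formula).

    The value decreases as filter_vec moves toward the desired concept
    vectors and away from the undesired ones.
    """
    pull = sum(cosine(filter_vec, c) for c in desired)
    push = sum(cosine(filter_vec, c) for c in undesired)
    return push - pull


# Toy concept vectors (hypothetical): axis 0 is a desired concept,
# axis 1 is an undesired concept.
desired = [[1.0, 0.0]]
undesired = [[0.0, 1.0]]

biased = [0.1, 0.9]      # filter mostly aligned with the undesired concept
corrected = [0.9, 0.1]   # filter mostly aligned with the desired concept

loss_biased = semantic_similarity_loss(biased, desired, undesired)
loss_corrected = semantic_similarity_loss(corrected, desired, undesired)
```

In this toy setting the corrected filter receives a lower loss than the biased one, which is the direction a retraining step would need to move in. In an actual training loop such a term would presumably be combined with the cross-entropy loss, as the caption of Fig. 3 indicates.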

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Fig. 1. The NeSyFOLD framework.

Fig. 2. Semantic labeling of a predicate.

Fig. 3. The NeSyBiCor framework. Note that the cross-entropy loss is calculated after the fully connected layer, while the semantic similarity loss is calculated using the filter output feature maps of the last convolution layer.

Table 1. The desired and undesired concepts for the classes of images in the Places dataset.

Fig. 4. The initial and final rule-sets after applying the NeSyBiCor framework on the CNNs trained on des (RULE-SET 1) and defs (RULE-SET 2).

Table 2. Comparison between the rule-sets generated before (Vanilla) and after (NeSyBiCor) bias correction with the NeSyBiCor framework. Mean Stats shows the average value for each evaluation metric. Bold values are better.