
Symbolic Rule Extraction From Attention-Guided Sparse Representations in Vision Transformers

Published online by Cambridge University Press: 08 September 2025

PARTH PADALKAR
Affiliation: The University of Texas at Dallas, USA (e-mail: parth.padalkar@utdallas.edu)
GOPAL GUPTA
Affiliation: The University of Texas at Dallas, USA (e-mail: gupta@utdallas.edu)

Abstract

Recent neuro-symbolic approaches have successfully extracted symbolic rule-sets from Convolutional Neural Network (CNN)-based models to enhance interpretability. However, applying similar techniques to Vision Transformers (ViTs) remains challenging due to their lack of modular concept detectors and their reliance on global self-attention mechanisms. We propose a framework for symbolic rule extraction from ViTs by introducing a sparse concept layer inspired by Sparse Autoencoders (SAEs). This linear layer operates on attention-weighted patch representations and learns a disentangled, binarized representation in which individual neurons activate for high-level visual concepts. To encourage interpretability, we apply a combination of L1 sparsity, entropy minimization, and supervised contrastive loss. The binarized concept activations are used as input to the FOLD-SE-M algorithm, which generates a rule-set in the form of a logic program. Our method achieves higher classification accuracy than the standard ViT while enabling symbolic reasoning. Crucially, the extracted rule-set is not merely post hoc but acts as a logic-based decision layer that operates directly on the sparse concept representations. The resulting programs are concise and semantically meaningful. This work is the first to extract executable logic programs from ViTs using sparse symbolic representations, providing a step forward in interpretable and verifiable neuro-symbolic AI.
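To make the pipeline described above concrete (attention-weighted pooling, a sparse linear concept layer, binarization, then FOLD-SE-M), the following is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the authors' implementation: the class name SparseConceptLayer, the CLS-attention pooling scheme, the 0.5 binarization threshold, and the weights lam_l1 and lam_ent are all hypothetical, and the supervised contrastive term is omitted for brevity.

```python
import torch
import torch.nn as nn

class SparseConceptLayer(nn.Module):
    """Hypothetical sketch: a linear map over attention-weighted patch
    features whose units are encouraged (via L1 and entropy penalties)
    to activate for individual high-level visual concepts."""

    def __init__(self, d_model: int, n_concepts: int):
        super().__init__()
        self.proj = nn.Linear(d_model, n_concepts)

    def forward(self, patch_feats: torch.Tensor, attn_weights: torch.Tensor) -> torch.Tensor:
        # patch_feats:  (batch, n_patches, d_model) ViT patch embeddings
        # attn_weights: (batch, n_patches), e.g. CLS-token attention, summing to 1
        pooled = (attn_weights.unsqueeze(-1) * patch_feats).sum(dim=1)
        return torch.sigmoid(self.proj(pooled))  # concept activations in (0, 1)

def sparsity_losses(z: torch.Tensor, lam_l1: float = 1e-3, lam_ent: float = 1e-3) -> torch.Tensor:
    # L1 pushes most concept activations toward zero; binary entropy pushes
    # each activation toward 0 or 1, so thresholding loses little information.
    l1 = z.abs().mean()
    eps = 1e-8
    ent = -(z * (z + eps).log() + (1 - z) * (1 - z + eps).log()).mean()
    return lam_l1 * l1 + lam_ent * ent

if __name__ == "__main__":
    feats = torch.randn(4, 196, 768)                   # e.g. ViT-B/16: 196 patches
    attn = torch.softmax(torch.randn(4, 196), dim=-1)  # stand-in attention weights
    layer = SparseConceptLayer(d_model=768, n_concepts=128)
    z = layer(feats, attn)
    binary_concepts = (z > 0.5).int()  # 0/1 table handed to FOLD-SE-M
    print(z.shape, binary_concepts.shape, sparsity_losses(z).item())
```

In training, sparsity_losses(z) would be added to the classification (and supervised contrastive) objective; at inference, the binarized concept vectors form the tabular input from which FOLD-SE-M induces the logic-program rule-set.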

Information

Type
Original Article
Creative Commons
CC BY-ND
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NoDerivatives licence (https://creativecommons.org/licenses/by-nd/4.0/), which permits re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press
Fig 1. The NeSyViT framework.

Table 1. Comparison between NeSyViT and the vanilla ViT. Bold values are better.

Table 2. Comparison of relative % accuracy change w.r.t. the vanilla model and rule-set size between NeSyFOLD and NeSyViT.

Fig 2. (a) The top-5 images overlaid with activation heat-maps for neurons 106, 43, and 105 when rules are extracted for the P3.1 dataset containing the classes "bathroom," "bedroom," and "kitchen." (b) The raw rule-set and the labelled rule-set when NeSyViT is employed on the P3.1 dataset.

Supplementary material

Padalkar and Gupta supplementary material (File, 1.3 MB).