Uniform convergence of adversarially robust classifiers

Published online by Cambridge University Press: 24 November 2025

Rachel Morris*, North Carolina State University, Raleigh, NC, USA
Ryan Murray, North Carolina State University, Raleigh, NC, USA
*Corresponding author: Rachel Morris; Email: rachel.morris@mail.concordia.ca

Abstract

In recent years, there has been significant interest in the effect of different types of adversarial perturbations on data classification problems. Many of these models include an adversarial power, an important parameter that governs a trade-off between accuracy and robustness. This work considers a general framework for adversarially perturbed classification problems in the large-data, or population-level, limit. In this regime, we demonstrate that, as the adversarial strength goes to zero, optimal classifiers converge to the Bayes classifier in the Hausdorff distance. This significantly strengthens previous results, which generally focus on $L^1$-type convergence. The main argument relies on direct geometric comparisons and is inspired by techniques from geometric measure theory.
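The difference between the two modes of convergence can be seen in a toy one-dimensional sketch. This sketch is not taken from the paper and makes the following assumptions:

- Classification regions are finite unions of closed intervals in $[0,1]$, approximated by the grid points they contain.
- The $L^1$ distance is the Lebesgue measure of the symmetric difference.
- The Hausdorff distance is taken between the sampled sets themselves.

The helper names `sample`, `l1_distance` and `hausdorff` are hypothetical. The example adds a spike of width $1/n$ far from a fixed interval. As $n$ grows, the $L^1$ distance goes to zero, while the Hausdorff distance stays at roughly $0.25$ or more. So $L^1$ convergence alone does not control where small stray pieces of a classifier can sit.

```python
"""Toy sketch: L^1 convergence of sets does not imply Hausdorff convergence.

Assumption (not from the paper): sets are finite unions of closed intervals
in [0, 1], approximated by the grid points k / n_grid that they contain.
"""
from bisect import bisect_left


def sample(intervals, n_grid=4000):
    """Sorted grid points k / n_grid in [0, 1] lying in the union of intervals."""
    return [
        k / n_grid
        for k in range(n_grid + 1)
        if any(lo <= k / n_grid <= hi for lo, hi in intervals)
    ]


def l1_distance(a, b, n_grid=4000):
    """Approximate Lebesgue measure of the symmetric difference of a and b."""
    pa, pb = set(sample(a, n_grid)), set(sample(b, n_grid))
    return len(pa ^ pb) / n_grid


def _gap(x, ys):
    """Distance from x to the nearest element of the sorted, nonempty list ys."""
    i = bisect_left(ys, x)
    best = float("inf")
    if i < len(ys):
        best = ys[i] - x
    if i > 0:
        best = min(best, x - ys[i - 1])
    return best


def hausdorff(a, b, n_grid=4000):
    """Hausdorff distance between the sampled versions of a and b."""
    pa, pb = sample(a, n_grid), sample(b, n_grid)
    forward = max(_gap(x, pb) for x in pa)   # sup over a of dist(., b)
    backward = max(_gap(y, pa) for y in pb)  # sup over b of dist(., a)
    return max(forward, backward)


if __name__ == "__main__":
    A = [(0.2, 0.6)]
    for n in (10, 100, 1000):
        # Add a spike of width 1/n far from A: small in L^1, not in Hausdorff.
        B_n = A + [(0.85, 0.85 + 1 / n)]
        print(f"n={n:5d}  L1={l1_distance(A, B_n):.4f}  "
              f"Hausdorff={hausdorff(A, B_n):.4f}")
```

Running the script prints both distances for $n = 10, 100, 1000$, and only the $L^1$ column shrinks. A brute-force grid keeps the sketch short. For two-dimensional masks, one could instead call `scipy.spatial.distance.directed_hausdorff` in both directions and take the maximum.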

Information

Type
Papers
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. This diagram illustrates the sets present in the energy exchange inequality for the adversarial training problem (3) when $E = B_{\mathrm{d}}(R)$. The sets comprising $\varepsilon \mathrm{Per}_{\varepsilon }(A;\,B_{\mathrm{d}}(R))$ are shaded blue and purple, whereas the sets comprising $\varepsilon \mathrm{Per}_{\varepsilon }(B_{\mathrm{d}}(R)^{\mathsf{c}};\,A)$ are shaded pink and purple.

Table 1. This table defines the 13 $U_i$ sets and lists every conclusion about the $\Lambda$-sets for $A\setminus E$ that follows from $\Lambda$-monotonicity, given the $\Lambda$-sets for $A$ and $E$. This set decomposition, together with the further refinement in (9), is key to proving the energy exchange inequality.

Figure 2. This diagram depicts the $U_i$ regions for the attack function $\phi _\varepsilon$ associated with adversarial training problem (3). The $\varepsilon$-perimeter regions of $A$ are shaded blue and purple, whereas the $\varepsilon$-perimeter regions of $A\setminus B_{\mathrm{d}}(R)$ are shaded pink and purple. Some sets, such as $\widehat U_{1}$, are null sets for the $\varepsilon$-perimeter and therefore do not appear in this figure.

Figure A1. A degenerate example in which $U_6$ and $U_9$ are neither purely attacked nor purely unattacked sets. The degeneracy arises because the boundaries of $A$ and $B_{\mathrm{d}}(R)$ coincide. The pink and purple sets represent the $\varepsilon$-perimeter regions of $A$, whereas the blue and purple regions represent the $\varepsilon$-perimeter regions of $A\setminus \overline {B_{\mathrm{d}}(R)}$.