
How to beat a Bayesian adversary

Published online by Cambridge University Press: 28 March 2025

Zihan Ding, Department of Electrical and Computer Engineering, Princeton University, Princeton, USA
Kexin Jin, Department of Mathematics, Princeton University, Princeton, USA
Jonas Latz*, Department of Mathematics, University of Manchester, Manchester, UK
Chenguang Liu, Delft Institute of Applied Mathematics, Technische Universiteit Delft, Delft, The Netherlands

*Corresponding author: Jonas Latz; Email: jonas.latz@manchester.ac.uk

Abstract

Deep neural networks and other modern machine learning models are often susceptible to adversarial attacks. Indeed, an adversary can often change a model’s prediction through a small, directed perturbation of the model’s input – an issue in safety-critical applications. Adversarially robust machine learning is usually based on a minmax optimisation problem that minimises the machine learning loss under maximisation-based adversarial attacks. In this work, we study adversaries that determine their attack using a Bayesian statistical approach rather than maximisation. The resulting Bayesian adversarial robustness problem is a relaxation of the usual minmax problem. To solve this problem, we propose Abram – a continuous-time particle system designed to approximate the gradient flow corresponding to the underlying learning problem. We show that Abram approximates a McKean–Vlasov process and justify its use by stating assumptions under which the McKean–Vlasov process finds the minimiser of the Bayesian adversarial robustness problem. We discuss two ways to discretise Abram and show its suitability in benchmark adversarial deep learning experiments.
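To make the word "relaxation" concrete, here is a short worked comparison for a single data point, writing $\Phi(\xi, \theta)$ for the loss under input perturbation $\xi$ as in the Figure 2 caption. It rests on an assumption that is not quoted from the paper: that the Bayesian adversary's law is the uniform distribution on the ball $B(\varepsilon)$ reweighted by $\exp(\gamma \Phi)$, which matches the behaviour described in Figures 1 and 2 (concentration at the boundary for large $\gamma$, near-uniform for small $\gamma$). The name $F_\gamma$ is ours for illustration; the paper's objective $F$ additionally averages over data.

```latex
% Assumed Bayesian adversary on the ball B(\varepsilon):
\pi^{\gamma,\varepsilon}(\mathrm{d}\xi \mid \theta)
  = \frac{\exp\big(\gamma\,\Phi(\xi,\theta)\big)\,\mathbf{1}_{B(\varepsilon)}(\xi)\,\mathrm{d}\xi}
         {\int_{B(\varepsilon)} \exp\big(\gamma\,\Phi(\xi',\theta)\big)\,\mathrm{d}\xi'}

% The average loss under a probability measure on B(\varepsilon) never exceeds the worst case:
F_\gamma(\theta) := \int_{B(\varepsilon)} \Phi(\xi,\theta)\,\pi^{\gamma,\varepsilon}(\mathrm{d}\xi \mid \theta)
  \;\le\; \sup_{\xi \in B(\varepsilon)} \Phi(\xi,\theta)

% Differentiating the ratio of integrals in \gamma gives E[\Phi^2] - E[\Phi]^2:
\frac{\partial}{\partial\gamma} F_\gamma(\theta)
  = \operatorname{Var}_{\pi^{\gamma,\varepsilon}(\cdot \mid \theta)}\big[\Phi(\cdot,\theta)\big] \;\ge\; 0

% Limits for continuous \Phi (Laplace principle):
\gamma \to 0:\ \pi^{\gamma,\varepsilon}(\cdot \mid \theta) \to \mathrm{Unif}\big(B(\varepsilon)\big),
\qquad
\gamma \to \infty:\ F_\gamma(\theta) \uparrow \sup_{\xi \in B(\varepsilon)} \Phi(\xi,\theta)
```

Under this assumption, replacing the inner maximum by an average gives a smaller objective for every finite $\gamma$. The minmax problem is recovered only in the limit $\gamma \to \infty$.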

Information

Type: Papers
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright © The Author(s), 2025. Published by Cambridge University Press

Figure 1. Plots of the Lebesgue density of $\pi_1^{\gamma, \varepsilon}(\cdot |\theta_0)$ for energy $\Phi(y_1 + \xi, z_1|\theta_0) = (\xi - 0.1)^2/2$, choosing parameters $\varepsilon \in \{0.025, 0.1, 0.4\}$ and $\gamma \in \{0.1, 10, 1000\}$.
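The densities in Figure 1 survive here only as images. Their qualitative shape can be reproduced by tabulating the density on a grid. The sketch below assumes that $\pi_1^{\gamma,\varepsilon}(\cdot|\theta_0)$ has a Lebesgue density proportional to $\exp(\gamma \Phi)$ on the interval $[-\varepsilon, \varepsilon]$. This is a reading consistent with the Figure 2 caption, not a definition quoted from the paper. The names `figure1_energy`, `bayes_adversary_density` and `mass_below` are hypothetical.

```python
import math


def figure1_energy(xi):
    """Energy from Figure 1: Phi(y_1 + xi, z_1 | theta_0) = (xi - 0.1)**2 / 2."""
    return 0.5 * (xi - 0.1) ** 2


def bayes_adversary_density(phi, gamma, eps, n=20001):
    """Grid approximation of a density proportional to exp(gamma * phi(xi)) on [-eps, eps].

    ASSUMPTION: the adversary's law is the uniform law on the eps-ball reweighted
    by exp(gamma * Phi); this is an illustrative reading, not the paper's definition.
    """
    h = 2.0 * eps / (n - 1)
    grid = [-eps + k * h for k in range(n)]
    log_w = [gamma * phi(x) for x in grid]
    top = max(log_w)  # subtract the maximum so exp() cannot overflow for large gamma
    w = [math.exp(v - top) for v in log_w]
    z = h * (sum(w) - 0.5 * (w[0] + w[-1]))  # trapezoidal normalising constant
    return grid, [v / z for v in w]


def mass_below(grid, dens, threshold):
    """Trapezoidal-weight estimate of P(xi < threshold)."""
    h = grid[1] - grid[0]
    last = len(grid) - 1
    total = 0.0
    for k, (x, d) in enumerate(zip(grid, dens)):
        if x < threshold:
            total += (0.5 * h if k in (0, last) else h) * d
    return total


if __name__ == "__main__":
    # The nine parameter combinations shown in Figure 1.
    for eps in (0.025, 0.1, 0.4):
        for gamma in (0.1, 10.0, 1000.0):
            grid, dens = bayes_adversary_density(figure1_energy, gamma, eps)
            print(f"eps={eps}, gamma={gamma}: P(xi < 0) = {mass_below(grid, dens, 0.0):.3f}")
```

Under this assumption, $(\xi - 0.1)^2/2$ is largest on $[-\varepsilon, \varepsilon]$ at $\xi = -\varepsilon$ for all three values of $\varepsilon$. For large $\gamma$ the mass therefore piles up at the left endpoint, while for small $\gamma$ the density is almost flat.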


Figure 2. Examples of the Abram method given $\Phi(\xi, \theta) = \frac{1}{2}(\xi + \theta)^2$, $\varepsilon = 1$, and different combinations of $(\gamma, N) = (10,3)$ (top left), $(0.1,3)$ (top right), $(10,50)$ (bottom left) and $(0.1, 50)$ (bottom right). In each of the four quadrants, we show the simulated path $(\theta_t^N)_{t \geq 0}$ (top), the particle paths $(\xi_t^{1,N},\ldots, \xi_t^{N,N})_{t \geq 0}$ (centre), and the path of probability distributions $(\pi^{\gamma, \varepsilon}(\cdot |\theta_t^N))_{t \geq 0}$ (bottom) that the particles are meant to approximate. A larger $\gamma$ concentrates $\pi^{\gamma, \varepsilon}$ at the boundary, whilst a small $\gamma$ makes it closer to uniform. More particles lead to a more stable path $(\theta^N_t)_{t \geq 0}$. Combining large $N$ with large $\gamma$ leads to convergence to the minimiser $\theta_* = 0$ of $F$.
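The toy problem of Figure 2 is small enough to simulate directly. The sketch below is our reading of the caption, not the paper's Algorithm 1. It assumes that $\theta$ descends along the particle average of $\partial_\theta \Phi$, and that each particle follows Langevin dynamics with drift $\gamma\,\partial_\xi \Phi$ that are reflected at $\pm\varepsilon$, so that the particles track $\pi^{\gamma,\varepsilon}(\cdot|\theta_t^N)$. The discretisation (synchronous Euler-Maruyama with mirror reflection) and the function names `abram_toy` and `reflect` are hypothetical choices made for illustration.

```python
import math
import random


def reflect(x, eps):
    """Mirror x back into [-eps, eps] until it lies inside the interval."""
    while x > eps or x < -eps:
        x = 2.0 * eps - x if x > eps else -2.0 * eps - x
    return x


def abram_toy(gamma, n_particles, eps=1.0, theta0=1.0, h=0.01, steps=1000, seed=0):
    """Euler-Maruyama sketch of an Abram-style particle system for the Figure 2
    energy Phi(xi, theta) = (xi + theta)**2 / 2 on the interval [-eps, eps].

    ASSUMED dynamics (our reading of the Figure 2 caption, not Algorithm 1):
      d theta_t = -(1/N) sum_i dPhi/dtheta(xi_t^i, theta_t) dt
      d xi_t^i  = gamma * dPhi/dxi(xi_t^i, theta_t) dt + sqrt(2) dW_t^i,
                  reflected at -eps and +eps.
    Returns the discretised path of theta and the final particle positions.
    """
    rng = random.Random(seed)
    theta = theta0
    xi = [rng.uniform(-eps, eps) for _ in range(n_particles)]
    path = [theta]
    for _ in range(steps):
        # For this Phi, dPhi/dtheta and dPhi/dxi both equal xi + theta.
        grads = [x + theta for x in xi]
        new_theta = theta - h * sum(grads) / n_particles
        # Particles move using the old theta, so both updates use the same state.
        xi = [
            reflect(x + h * gamma * g + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0), eps)
            for x, g in zip(xi, grads)
        ]
        theta = new_theta
        path.append(theta)
    return path, xi


if __name__ == "__main__":
    # The four (gamma, N) combinations shown in Figure 2.
    for gamma, n in [(10.0, 3), (0.1, 3), (10.0, 50), (0.1, 50)]:
        path, particles = abram_toy(gamma, n)
        print(f"gamma={gamma}, N={n}: theta_T = {path[-1]:+.3f}")
```

With large $\gamma$ the particles tend to stick near $\pm\varepsilon$ and switch sides only rarely. This is one way to see why few particles give a jumpy $\theta$ path in this regime. Mirror reflection is simply the easiest way to keep the particles inside the ball. Other boundary treatments would also be reasonable.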


Table 1. Definitions of the stochastic processes used throughout this work


Algorithm 1 Abram


Algorithm 2 Mini-batching Abram


Algorithm 3 Bayesian sample attack


Algorithm 4 Bayesian mean attack
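Algorithms 3 and 4 appear here only by title. Read alongside the abstract, one plausible interpretation is as follows. The sample attack perturbs the input with a single draw from the adversary's distribution, and the mean attack uses the mean of that distribution. The sketch below illustrates this interpretation on the one-dimensional energy of Figure 1. It uses reflected Langevin chains as the sampler, which is our choice. It also assumes a target density proportional to $\exp(\gamma\Phi)$ on $[-\varepsilon, \varepsilon]$. The sketch is not the paper's Algorithm 3 or 4, and all function names are hypothetical.

```python
import math
import random


def reflect(x, eps):
    """Mirror x back into [-eps, eps] until it lies inside the interval."""
    while x > eps or x < -eps:
        x = 2.0 * eps - x if x > eps else -2.0 * eps - x
    return x


def langevin_particles(grad_phi, gamma, eps, n_particles, h=1e-3, steps=1000, seed=0):
    """Reflected Langevin chains, ASSUMED to target exp(gamma * Phi) on [-eps, eps]."""
    rng = random.Random(seed)
    xi = [rng.uniform(-eps, eps) for _ in range(n_particles)]
    for _ in range(steps):
        xi = [
            reflect(x + h * gamma * grad_phi(x) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0), eps)
            for x in xi
        ]
    return xi


def bayesian_sample_attack(grad_phi, gamma, eps, seed=0):
    """Hypothetical sample attack: one approximate draw of the perturbation xi."""
    return langevin_particles(grad_phi, gamma, eps, n_particles=1, seed=seed)[0]


def bayesian_mean_attack(grad_phi, gamma, eps, n_particles=200, seed=0):
    """Hypothetical mean attack: particle average approximating the mean perturbation."""
    xi = langevin_particles(grad_phi, gamma, eps, n_particles=n_particles, seed=seed)
    return sum(xi) / len(xi)


def figure1_energy_grad(xi):
    """Derivative in xi of the Figure 1 energy (xi - 0.1)**2 / 2."""
    return xi - 0.1


if __name__ == "__main__":
    sample = bayesian_sample_attack(figure1_energy_grad, gamma=10.0, eps=0.4)
    mean = bayesian_mean_attack(figure1_energy_grad, gamma=10.0, eps=0.4)
    print(f"sample perturbation: {sample:+.3f}, mean perturbation: {mean:+.3f}")
```

For a real model, $\xi$ would be a tensor the size of the input, and $\partial_\xi \Phi$ would come from automatic differentiation. The one-dimensional setting only keeps the sketch self-contained. When the adversary's distribution is bimodal, the mean perturbation can fall between the modes. It may then be a weaker attack than a single sample.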


Table 2. Comparison of test accuracy (%) on MNIST under different adversarial attacks after Abram, mini-batching Abram and FGSM [57] adversarial training


Table 3. Comparison of test accuracy (%) on CIFAR10 under different adversarial attacks after mini-batching Abram and FGSM [57] adversarial training