The Weaponization of Imperfection: Quantifying Adversarial Vulnerability in Pre-Trained Vision Models and its Direct Implications for AGI Catastrophe

29 October 2025, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

This study presents a rigorous empirical quantification of adversarial vulnerability in state-of-the-art Convolutional Neural Networks (CNNs) and relates this measurable fragility directly to the looming systemic risk of Artificial General Intelligence (AGI) misalignment. I conducted an extensive testing campaign on three widely adopted ImageNet-pretrained architectures (ResNet-50, DenseNet-121, and VGG-16), evaluating them on image samples drawn from the same ImageNet distribution on which they were originally trained. My research focused exclusively on the vulnerability of these models to targeted and untargeted gradient-based perturbations, employing the Fast Gradient Sign Method (FGSM), Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and Momentum Iterative Method (MIM) across a range of perturbation budgets (ε). The core empirical objective was to determine how easily a specific, high-stakes failure could be triggered: forcing the models to misclassify a non-combatant ambulance as a hostile armored vehicle. My quantitative analysis establishes a clear hierarchy of model fragility: ResNet-50 demonstrated the highest average iterative attack success rate (80.5% ASR), DenseNet-121 showed moderate fragility (68.8% ASR), and VGG-16 exhibited the highest resilience (48.0% ASR). Critically, catastrophic failure (≈100% ASR) was consistently achieved against the more modern architectures (ResNet-50 and DenseNet-121) at minimal perturbation budgets (ε ≥ 8/255). I conclude that this ease of manipulation, and the well-established feature-level misalignment it exploits, serves as an immediate empirical warning. The current vulnerability of narrow AI, in which an ambulance can be turned into a military target by imperceptible noise, is a chilling preview of how a misaligned AGI could systematically misinterpret human values in pursuit of its goals, with potentially catastrophic consequences for global security and human life.
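
For readers who want to see the basic attack mechanics, the following is a minimal sketch of a single-step FGSM perturbation (the simplest of the four attacks listed above) against a pretrained ResNet-50, assuming PyTorch and torchvision (version 0.13 or later for the string weight identifier). The random input, the target class index, and the omitted normalization step are illustrative assumptions, not the paper's exact pipeline.

import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained ImageNet classifier (assumes torchvision >= 0.13 weight names).
model = models.resnet50(weights="IMAGENET1K_V1").eval()

def fgsm(x, labels, epsilon, targeted=False):
    # Single-step FGSM on inputs x in [0, 1].
    # Untargeted: step *up* the loss gradient for the true labels.
    # Targeted: step *down* the loss gradient for the desired labels.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), labels)
    grad_sign = torch.autograd.grad(loss, x)[0].sign()
    step = -epsilon * grad_sign if targeted else epsilon * grad_sign
    return (x + step).clamp(0.0, 1.0).detach()

# Illustrative targeted attack at epsilon = 8/255, one of the budgets cited in
# the abstract. The input is random noise standing in for a preprocessed
# ambulance image, and 847 is the conventional ImageNet-1k "tank" index;
# treat both as placeholders (input normalization is omitted for brevity).
x = torch.rand(1, 3, 224, 224)
target = torch.tensor([847])
x_adv = fgsm(x, target, epsilon=8 / 255, targeted=True)
print("Predicted class after attack:", model(x_adv).argmax(dim=1).item())

The iterative attacks reported in the paper (BIM, PGD, MIM) can be viewed as repeated applications of this step with a smaller step size and a projection back into the ε-ball around the original image.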

Keywords

machine learning
AGI misalignment
ANI-to-AGI
Convolutional Neural Networks (CNNs)

Supplementary materials

Figures for adversarial attacks: complete set of figures generated during the research for this paper.

Comments

Comment number 1, Алмаз Яруллин: Nov 23, 2025, 23:28

Thank you for this eye-opening preprint on "The Weaponization of Imperfection: Quantifying Adversarial Vulnerability in Pre-Trained Vision Models and its Direct Implications for AGI Catastrophe"! The empirical breakdown of attack success rates across ResNet-50 (80.5% ASR), DenseNet-121, and VGG-16 effectively illustrates model fragility, with stark implications for AGI misalignment and global security. The focus on low-perturbation catastrophic failures is a crucial wake-up call for robust AI development. Eager for further explorations—brilliant and urgent contribution!