Abstract
This study presents a rigorous empirical quantification of adversarial vulnerability in state-of-the-art Convolutional Neural Networks (CNNs) and directly relates this measurable fragility to the looming systemic risk of Artificial General Intelligence (AGI) misalignment. I conducted an extensive testing campaign on three widely adopted, ImageNet-pretrained architectures—ResNet-50, DenseNet-121, and VGG-16—using ImageNet image samples of the same kind the models were originally trained on. The research focused exclusively on the vulnerability of these models to both targeted and untargeted gradient-based perturbations, employing Fast Gradient Sign Method (FGSM), Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and Momentum Iterative Method (MIM) attacks across a range of perturbation budgets (ε). The core empirical objective was to determine how easily a specific, high-stakes failure could be triggered: forcing the models to misclassify a non-combatant ambulance as a hostile armored vehicle. My quantitative analysis establishes a definitive hierarchy of model fragility: ResNet-50 demonstrated the highest average iterative attack success rate (80.5% ASR), DenseNet-121 showed moderate fragility (68.8% ASR), and VGG-16 exhibited the highest resilience (48.0% ASR). Critically, catastrophic failure (≈100% ASR) was consistently achieved for the more modern architectures (ResNet-50 and DenseNet-121) at minimal perturbation budgets (ε ≥ 8/255). I conclude that this ease of manipulation, and the well-established feature-level misalignment it exploits, serves as an immediate, empirical warning. The current vulnerability of narrow AI, where an ambulance can be turned into a military target by imperceptible noise, is a chilling preview of how a misaligned AGI could systematically misinterpret human values and pursue unintended goals, with potentially catastrophic consequences for global security and human life.
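To make the attack setting concrete, the sketch below shows an untargeted FGSM perturbation against an ImageNet-pretrained ResNet-50 in PyTorch. It is a minimal illustration under assumed tooling (torchvision weights, a placeholder input tensor, and the standard ImageNet class index 407 for "ambulance"); it does not reproduce the paper's exact preprocessing, evaluation pipeline, or the iterative BIM/PGD/MIM variants.

```python
# Minimal FGSM sketch (assumed stack: PyTorch + torchvision). Illustrative only; the paper's
# exact preprocessing, data selection, and iterative attacks (BIM/PGD/MIM) are not reproduced.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

def fgsm_attack(x, label, epsilon):
    """Untargeted FGSM: one signed-gradient step of size epsilon in [0, 1] pixel space."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()      # step in the direction that increases the loss
    return x_adv.clamp(0.0, 1.0).detach()    # keep pixel values in the valid range

# Placeholder input; a real evaluation would load and normalize an actual ImageNet image.
x = torch.rand(1, 3, 224, 224)
y = torch.tensor([407])                      # 407 is "ambulance" in the standard ImageNet label set
x_adv = fgsm_attack(x, y, epsilon=8 / 255)   # the budget at which near-total failure is reported
print(model(x_adv).argmax(dim=1))            # predicted class after the perturbation
```

A targeted variant (e.g., steering the prediction toward a chosen military-vehicle class) would instead step so as to decrease the loss with respect to that target label, and the iterative attacks repeat such steps while projecting back into the ε-ball around the original image.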
Supplementary materials
Title: Figures for adversarial attacks
Description: Complete figures generated during the research for this paper.