Generalizable Neural Architectures for Robust Human Feature Analysis in Versatile Imaging Conditions

28 April 2026, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

This paper presents a framework for improving the robustness of human-centric computer vision systems that operate under severe hardware constraints. It addresses the performance degradation of authentication and dermatological analysis platforms in unconstrained imaging scenarios, which are characterized by variable illumination, occlusion, and a critical scarcity of labeled multi-modal data, particularly in the thermal and clinical spectral domains. Our contribution is a unified pipeline that combines deep generative models for cross-modal synthetic data augmentation with a hybrid convolutional architecture refined by embedded classical optimization layers. The generative component uses a Spectral-Consistent Generative Adversarial Network (SC-GAN) to produce physiologically plausible thermal and clinical image pairs from limited RGB-Depth inputs, mitigating both data scarcity and privacy concerns. The discriminative backbone is a multi-stream convolutional neural network optimized for fusing heterogeneous inputs. A key innovation is an Alternating Direction Method of Multipliers (ADMM) layer in the network's decoding stage, formulated as a learnable optimization step for precise segmentation and depth refinement. This layer enforces spatial consistency and boundary accuracy while remaining trainable within the gradient-based learning paradigm. Experimental validation on a newly compiled multi-modal dataset shows that the hybrid model significantly outperforms conventional deep learning baselines. We report a mean increase of 18.7% in segmentation Intersection-over-Union (IoU) for skin lesion boundaries under challenging lighting, a 22.3% reduction in biometric authentication error rates in occluded scenarios, and a 15.9% decrease in generalization error on unseen domains. The architecture maintains an inference latency of 47.3 ms on a mobile system-on-chip, which supports real-time, on-device deployment.
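
The abstract does not state the objective that the ADMM layer solves, so the following is only an illustrative sketch of what a fixed number of ADMM iterations can look like. It assumes a standard stand-in problem, one-dimensional total-variation denoising, min_x 0.5*||x - y||^2 + lam*||Dx||_1 with D the forward-difference operator, which is not necessarily the authors' formulation. The names `admm_tv_1d`, `solve_tridiagonal`, `soft_threshold`, `lam`, and `rho` are hypothetical. In a learnable layer, `lam` and `rho` would typically be trained parameters and the loop would be unrolled inside the network; here they are plain constants in dependency-free Python.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v| for a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm. lower[i] multiplies x[i-1] and upper[i] multiplies
    x[i+1] in row i; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def admm_tv_1d(y, lam=1.0, rho=1.0, n_iter=200):
    """Run n_iter ADMM iterations for 1D total-variation denoising.

    Assumed objective (illustrative only): 0.5*||x - y||^2 + lam*||D x||_1,
    split as z = D x with scaled dual variable u. Requires len(y) >= 2.
    """
    n = len(y)
    # I + rho * D^T D is tridiagonal: 1 + rho at the ends, 1 + 2*rho inside.
    diag = [1.0 + rho * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = [-rho] * n
    z = [0.0] * (n - 1)
    u = [0.0] * (n - 1)
    x = list(y)
    for _ in range(n_iter):
        # x-update: solve (I + rho D^T D) x = y + rho D^T (z - u).
        w = [z[i] - u[i] for i in range(n - 1)]
        rhs = []
        for i in range(n):
            dtw = (w[i - 1] if i > 0 else 0.0) - (w[i] if i < n - 1 else 0.0)
            rhs.append(y[i] + rho * dtw)
        x = solve_tridiagonal(off, diag, off, rhs)
        # z-update: soft-threshold the differences (promotes sharp edges).
        dx = [x[i + 1] - x[i] for i in range(n - 1)]
        z = [soft_threshold(dx[i] + u[i], lam / rho) for i in range(n - 1)]
        # u-update: accumulate the constraint residual D x - z.
        u = [u[i] + dx[i] - z[i] for i in range(n - 1)]
    return x


# Usage: a step edge with an alternating perturbation of amplitude 0.2.
noisy = [(0.0 if i < 10 else 1.0) + 0.2 * (-1) ** i for i in range(20)]
smoothed = admm_tv_1d(noisy, lam=0.5)
```

The design point this sketch is meant to convey is that each iteration consists of a linear solve, an elementwise shrinkage, and a residual update, all of which are differentiable almost everywhere, which is what makes such a step usable inside gradient-based training.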

Keywords

Domain Adaptation
Multi-Modal Learning
Generative Adversarial Networks
Convex Optimization
Edge Computing
Human-Centric Computer Vision
Hybrid Neural Architectures
