Mining Multimodal Fatigue Data Using Reasoning Foundation Models and Formalized Domain Knowledge

15 July 2025, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

The scarcity and expense of fatigue data limit the optimal design of components and constrain companies to a few well-qualified materials in safety-critical applications. This research investigates strategies to improve the extraction of structured information from unstructured scientific literature, to date the largest corpus of fatigue information. Given the latest developments in foundation vision and reasoning language models (VLM/RLM), successful generative extraction is within reach. In this work, a schema-based extraction is attempted, for which an object-oriented fatigue data schema is designed. The schema supplies labels, definitions, and type constraints for the target entities as contextual domain knowledge to the VLM/RLM. The importance of nuanced target field definitions within the schema and of constrained decoding is explored. Furthermore, the schema-based approach is gradually extended into two agentic language model systems: one that uses a step-wise, human-inspired approach to first determine discriminative cues from fatigue S-N diagrams, and one that additionally applies dynamic knowledge augmentation. The latter workflow exploits the synergy of reasoning language models and ontologies by performing logical reasoning and web search for dynamic knowledge augmentation and hallucination detection. On this complex fatigue data extraction task, which requires hierarchical pattern recognition and multimodal extraction, an overall F1-score of 0.82 is achieved, while fields contained in the narrative text modality are extracted with an F1-score of 0.92. The strengths and weaknesses of all models and methodologies are discussed, and extensions to our workflows are proposed.
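
To make the schema-based idea concrete, the following is a minimal sketch of how an object-oriented schema could carry labels, definitions, and type constraints, and how a field-level F1-score might be computed. It is not the schema or metric used in this work: the class `SNCurve`, its fields, the helper `schema_prompt`, and the exact-match scoring in `field_f1` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Optional


class LoadingType(Enum):
    """Hypothetical closed vocabulary for one target field."""
    AXIAL = "axial"
    BENDING = "bending"
    TORSION = "torsion"


@dataclass
class SNCurve:
    """Hypothetical target entity; field names are illustrative assumptions."""
    material: str = field(
        metadata={"definition": "Material designation as stated in the paper."})
    loading_type: LoadingType = field(
        metadata={"definition": "Loading mode of the fatigue test."})
    stress_ratio: Optional[float] = field(
        default=None,
        metadata={"definition": "Ratio of minimum to maximum stress (R)."})
    fatigue_limit_mpa: Optional[float] = field(
        default=None,
        metadata={"definition": "Reported fatigue limit in MPa, if any."})

    def __post_init__(self):
        # Type constraint: coerce strings into the enum, reject unknown values.
        if not isinstance(self.loading_type, LoadingType):
            self.loading_type = LoadingType(self.loading_type)
        # Value constraint: a stress amplitude limit must be positive.
        if self.fatigue_limit_mpa is not None and self.fatigue_limit_mpa <= 0:
            raise ValueError("fatigue_limit_mpa must be positive")


def schema_prompt(cls) -> str:
    """Render labels, types, and definitions as plain text context."""
    lines = []
    for f in fields(cls):
        type_name = getattr(f.type, "__name__", None) or str(f.type)
        lines.append(f"{f.name} ({type_name}): {f.metadata['definition']}")
    return "\n".join(lines)


def field_f1(pred: dict, gold: dict) -> float:
    """Exact-match F1 over fields; None means 'not extracted' or 'absent'."""
    keys = set(pred) | set(gold)
    tp = sum(1 for k in keys
             if pred.get(k) is not None and pred.get(k) == gold.get(k))
    fp = sum(1 for k in keys
             if pred.get(k) is not None and pred.get(k) != gold.get(k))
    fn = sum(1 for k in keys
             if gold.get(k) is not None and pred.get(k) != gold.get(k))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In such a setup, the text returned by `schema_prompt(SNCurve)` would be passed to the model as context, and constructing `SNCurve(**model_output)` would reject outputs that violate the type constraints.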

Keywords

Materials
Fatigue
Information Extraction
Multimodal
Vision Language Models
Reasoning Language Models
Agentic VLM Systems
Logical Reasoning
Ontology
