Mining Multimodal Fatigue Data Using Reasoning Foundation Models and Formalized Domain Knowledge

Jyoti Prakash Mohanty; Akhil Thomas; Tresa M. Pollock; Ali Riza Durmaz

doi:10.26434/chemrxiv-2025-xwd6c

Materials Science

15 July 2025, Version 1

Working Paper

This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

The scarcity and expense of fatigue data limit the optimal design of components and constrain companies to a few well-qualified materials where safety-critical applications are concerned. This research investigates different strategies to improve the extraction of structured information from unstructured scientific literature, to date the largest corpus of fatigue information. Successful generative extraction is within reach given the latest developments in foundation vision and reasoning language models (VLM/RLM). In this work, a schema-based extraction is attempted, for which an object-oriented fatigue data schema is designed. The schema provides labels, definitions and type constraints for the target entities as contextual domain knowledge to the VLM/RLM. The importance of nuanced target field definitions within the schema and of constrained decoding is explored. Furthermore, the schema-based approach is gradually extended to form two agentic language model systems: one that uses a step-wise, human-inspired approach to first determine discriminative cues from fatigue S-N diagrams, and one that additionally applies dynamic knowledge augmentation. The latter dynamic workflow exploits the synergy of reasoning language models and ontologies by performing logical reasoning and web search for dynamic knowledge augmentation and hallucination detection. On this rather complex fatigue data extraction task, which requires hierarchical pattern recognition and multimodal extraction, an overall F1-score of 0.82 is achieved, while fields contained in the narrative text modality are extracted with an F1-score of 0.92. The strengths and weaknesses of all models and methodologies are thoroughly discussed, and extensions to our workflows are proposed.
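To make the idea of a schema that carries labels, definitions and type constraints more concrete, the following is a minimal Python sketch. It is not the schema designed in this work: the class `FatigueTest`, its three fields, their definitions, the helpers `schema_prompt` and `field_f1`, and the example values are all hypothetical and chosen only for illustration. The field-level F1 shown here assumes that a wrong value counts as both a false positive and a false negative; the scoring used in the paper may differ.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


# Hypothetical schema: field names and definitions are illustrative only.
@dataclass
class FatigueTest:
    material: Optional[str] = field(
        default=None,
        metadata={"definition": "Material designation as named in the source."},
    )
    stress_ratio: Optional[float] = field(
        default=None,
        metadata={"definition": "Ratio R of minimum to maximum stress per cycle."},
    )
    frequency_hz: Optional[float] = field(
        default=None,
        metadata={"definition": "Test frequency in hertz."},
    )


def schema_prompt(cls) -> str:
    """Render label, type and definition of every field as prompt context."""
    return "\n".join(
        f"{f.name} ({f.type}): {f.metadata['definition']}" for f in fields(cls)
    )


def field_f1(pred: FatigueTest, gold: FatigueTest) -> float:
    """Field-level F1 between a predicted and a reference record.

    Assumed convention: a correct non-empty field is a true positive; a wrong
    value counts as one false positive and one false negative.
    """
    tp = fp = fn = 0
    for f in fields(gold):
        p, g = getattr(pred, f.name), getattr(gold, f.name)
        if p is not None and p == g:
            tp += 1
        else:
            if p is not None:
                fp += 1
            if g is not None:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example records: the prediction misses one of three fields.
gold = FatigueTest(material="Ti-6Al-4V", stress_ratio=0.1, frequency_hz=20.0)
pred = FatigueTest(material="Ti-6Al-4V", stress_ratio=0.1, frequency_hz=None)
score = field_f1(pred, gold)
```

In this sketch the text returned by `schema_prompt(FatigueTest)` would be placed in the model prompt as domain context, and `field_f1` compares one extracted record with its reference. Constrained decoding itself, which restricts the model output during generation, is not shown.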

Keywords

Materials

Fatigue

Information Extraction

Multimodal

Vision Language Models

Reasoning Language Models

Agentic VLM Systems

Logical Reasoning

Ontology

License

The content is available under CC BY 4.0

Funding

Bundesministerium für Forschung, Technologie und Raumfahrt: 13XP5226G

Bundesministerium für Forschung, Technologie und Raumfahrt: 13XP5094B

Deutsche Forschungsgemeinschaft: 550126120

National Nuclear Security Administration: DE-NA0004152

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
