Advancing Structure Elucidation with a Flexible Multi-Spectral AI Model

12 August 2025, Version 2
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Validating the success of a chemical synthesis requires confirming the desired product with several analytical techniques. While the collection of spectroscopic data is increasingly automated, interpreting the results remains a major bottleneck that often requires expert input. This challenge is expected to intensify as laboratory automation and high-throughput synthesis advance. We introduce the MultiModalSpectralTransformer (MMST), a machine learning method that predicts chemical structures directly from diverse spectral data (NMR, IR, MS). Trained on 4 million simulated compounds, MMST achieves top-1 and top-3 accuracies of 72% and 80%, respectively. To address out-of-distribution challenges, we implemented an active learning improvement cycle that generates molecules in similar chemical spaces, enabling the model to adapt to chemical structures beyond its original training data. We demonstrate MMST's capabilities through comprehensive benchmarking across diverse molecular weight ranges and chemical spaces. Notably, despite being trained solely on simulated data, MMST performs well on experimental spectra. This research advances automated structure elucidation by offering an adaptable tool that bridges the gap between simulated and real-world data.
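
For readers unfamiliar with the top-1 and top-3 metrics quoted above, the following is a minimal sketch of how top-k accuracy is commonly computed for ranked structure predictions. It is not taken from the MMST code base: the function name `top_k_accuracy`, the toy SMILES lists, and the assumption that predictions and targets are already canonicalized strings (so that exact string comparison is meaningful) are all illustrative.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of targets found among the first k ranked candidates.

    Assumes (hypothetically) that every string is a canonical SMILES, so two
    identical molecules always compare equal as strings.
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("need exactly one ranked candidate list per target")
    if not targets:
        return 0.0
    hits = sum(
        1
        for candidates, target in zip(ranked_predictions, targets)
        if target in candidates[:k]
    )
    return hits / len(targets)


# Hypothetical toy data: four targets, three ranked candidates each.
predictions = [
    ["CCO", "COC", "CC=O"],                 # correct at rank 1
    ["CC(C)O", "CCCO", "CCOC"],             # correct at rank 2
    ["c1ccccc1", "C1CCCCC1", "C1=CCCCC1"],  # target not among candidates
    ["CC(=O)O", "COC=O", "OCC=O"],          # correct at rank 1
]
targets = ["CCO", "CCCO", "c1ccncc1", "CC(=O)O"]

top1 = top_k_accuracy(predictions, targets, k=1)
top3 = top_k_accuracy(predictions, targets, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")  # top-1: 0.50, top-3: 0.75
```

In practice, predicted and reference structures would need to be canonicalized with a cheminformatics toolkit before comparison, since one molecule can be written as many different SMILES strings.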

Keywords

NMR
IR
MS
Structure Elucidation
MultiModalTransformer

Supplementary materials

Title: Advancing Structure Elucidation with a Flexible Multi-Spectral AI Model
Description: Supporting Information: Advancing Structure Elucidation with a Flexible Multi-Spectral AI Model
