Abstract
Validating chemical synthesis success requires confirming the desired product using various
analytical techniques. While spectroscopic data collection is increasingly automated,
interpreting results remains a major bottleneck often requiring expert input. With advances in
laboratory automation and high-throughput synthesis, this challenge is expected to intensify. We
introduce the MultiModalSpectralTransformer (MMST), a machine learning method that predicts
chemical structures directly from diverse spectral data (NMR, IR, MS). Trained on 4 million
simulated compounds, MMST achieves top-1 and top-3 accuracies of 72% and 80%, respectively.
To address out-of-distribution challenges, we implemented an active learning improvement
cycle that generates molecules in similar chemical spaces, enabling the model to adapt to
chemical structures beyond its original training data. We demonstrate MMST's capabilities
through comprehensive benchmarking across diverse molecular weight ranges and chemical
spaces. Notably, despite training solely on simulated data, MMST demonstrates good
performance with experimental spectra. This research represents a significant advancement in
automated structure elucidation, offering a powerful and adaptable tool that bridges the gap
between simulated and real-world data.
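For readers unfamiliar with the metric, top-k accuracy is the fraction of test compounds whose correct structure appears among the model's k highest-ranked candidates. The sketch below is a minimal illustration, not the MMST evaluation code: the function name `top_k_accuracy`, the SMILES strings, and the rankings are all hypothetical, and it assumes predictions and targets are already written in the same canonical form so that plain string comparison is meaningful.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of targets found among the first k ranked candidates.

    Assumes every string is already in one canonical form (an assumption of
    this sketch); otherwise equivalent structures could be counted as misses.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(targets):
        raise ValueError("one ranked candidate list is required per target")
    if not targets:
        raise ValueError("at least one target is required")
    hits = sum(
        target in candidates[:k]
        for candidates, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


# Hypothetical toy data: four target structures and three ranked guesses each.
targets = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
ranked = [
    ["CCO", "COC", "CCN"],            # correct at rank 1
    ["C1CCCCC1", "c1ccccc1", "CCC"],  # correct at rank 2
    ["CCC", "CCN", "CCO"],            # correct structure not proposed
    ["CCO", "CCC", "CCN"],            # correct at rank 3
]

top1 = top_k_accuracy(ranked, targets, k=1)  # 0.25
top3 = top_k_accuracy(ranked, targets, k=3)  # 0.75
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

Top-3 accuracy can never be lower than top-1 accuracy on the same test set, which is consistent with the 72% and 80% figures reported above.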
Supplementary materials
Title
Advancing Structure Elucidation with a Flexible Multi-Spectral AI Model
Description
Supporting Information: Advancing Structure Elucidation with a Flexible Multi-Spectral AI Model
Supplementary weblinks
Title
MultiModalSpectralTransformer
Description
MultiModalSpectralTransformer is a transformer-based architecture that integrates various spectroscopic modalities (NMR, HSQC, COSY, IR, MS) for automated molecular structure prediction, complete with a data generation pipeline and user-friendly HTML interface.
Title
Datasets for MultiModalTransformer project
Description
This folder contains all the necessary data related to the publication:
"Advancing Structure Elucidation with a Flexible Multi-Spectral AI Model"