From pockets to products: a data-driven framework for classification of monoterpene synthases

Cathal Ó Raghallaigh; Nigel Scrutton; Sam Hay

doi:10.26434/chemrxiv-2025-mf4xs

Biological and Medicinal Chemistry

10 October 2025, Version 1

Working Paper

This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Monoterpene synthases (mTSs) are a large family of enzymes with promising industrial applications, yet they remain difficult to engineer because their sequence-function relationships are complex and poorly understood. Here, we present a structure-based machine learning (ML) framework that accurately predicts whether an mTS produces linear or cyclic monoterpene products. Our approach identifies active-site properties and uses them as ML features, enabling functional predictions that go beyond sequence alone. Because residue positioning is important for monoterpene synthesis, we also created an algorithm that identifies structurally conserved residues in mTS active sites, revealing new and existing motifs essential for both general catalysis and specific cyclization steps. This integrated workflow provides insights into terpene synthases that were previously inaccessible and offers a generalizable strategy for probing other poorly understood enzyme families. Future work could use this approach to guide the rational design of mTSs and other hard-to-engineer enzymes.
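
The idea of predicting a linear or cyclic product from active-site properties can be illustrated with a toy example. The sketch below is not the authors' model (their code is in the linked repository); it is a minimal nearest-centroid classifier, and the two descriptors (pocket volume and hydrophobic fraction), the scaling constants, and every numeric value are hypothetical and synthetic, chosen only to show how structure-derived features could feed a binary classifier.

```python
from math import dist
from statistics import mean

# Hypothetical active-site descriptors per enzyme:
# (pocket volume in cubic angstroms, hydrophobic fraction of pocket residues).
# All values are synthetic and do not come from the paper.
TRAIN = [
    ((420.0, 0.55), "linear"),
    ((450.0, 0.50), "linear"),
    ((300.0, 0.75), "cyclic"),
    ((320.0, 0.80), "cyclic"),
]

# Assumed per-feature scales so that both descriptors contribute comparably.
SCALES = (100.0, 0.1)


def fit_centroids(samples):
    """Return the mean feature vector of each class."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def scale(features, scales):
    """Divide each feature by its scale."""
    return tuple(f / s for f, s in zip(features, scales))


def predict(features, centroids, scales):
    """Assign the label whose centroid is nearest in scaled feature space."""
    x = scale(features, scales)
    return min(
        centroids,
        key=lambda label: dist(x, scale(centroids[label], scales)),
    )


CENTROIDS = fit_centroids(TRAIN)

if __name__ == "__main__":
    query = (305.0, 0.78)  # synthetic descriptors for an unseen enzyme
    print(predict(query, CENTROIDS, SCALES))
```

A centroid-based model is used here only because it is easy to inspect: each class is summarized by one point in feature space, so it is clear which descriptor drives a prediction. A real workflow would need many more features, held-out validation, and the authors' actual feature definitions.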

Keywords

Monoterpene synthase

Active site

Structure-based classification

Interpretable models

End-product cyclization

Machine learning

Enzyme engineering

Supplementary materials

Supporting information

Supplementary weblinks

Code: All code used in this paper is freely available at https://github.com/cathaloraghallaigh/ATC

Version History

Oct 10, 2025 Version 1

License

The content is available under CC BY 4.0

Funding

BBSRC: BB/Y008456/1

EPSRC: EP/S01778X/1

EPSRC: EP/S022856/1

EPSRC: studentship ref. 2602504

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
