How to make machine learning scoring functions competitive with FEP

Matthew Warren; Ísak Valsson; Charlotte Deane; Aniket Magarkar; Garrett Morris; Philip Biggin

doi:10.26434/chemrxiv-2024-bth5z

Theoretical and Computational Chemistry

24 June 2024, Version 1

Working Paper

This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Machine learning offers a promising approach for fast and accurate binding affinity predictions. However, current models often fail to generalise beyond their training data and are not robustly evaluated on a diverse range of benchmarks, which limits their application in drug discovery projects. In this work, we address these issues by introducing a novel graph neural network model called AEV-PLIG (Atomic Environment Vector-Protein Ligand Interaction Graph), which encodes protein-ligand interactions via atomic environment vectors to improve generalisation. We evaluate our model on improved benchmarks, including a new out-of-distribution test set, which we call OOD Test, and two alternative benchmark systems used for free energy perturbation (FEP) calculations, and show that AEV-PLIG performs competitively across the board. Moreover, we demonstrate how augmented data can be leveraged to enhance prediction accuracy, and how enriching the training data with three complexes from a congeneric series of ligands binding to a target of interest improves performance further. Altogether, we show that these strategies improve the applicability of machine learning scoring functions and enable state-of-the-art performance nearing the accuracy of physics-based simulation methods, but at a fraction of their computational cost. This practical approach extends the predictive capabilities of machine learning for molecular discovery, paving the way for its broader use in computer-aided drug design.

Keywords

prediction

affinity

Absolute binding free energies

computer-aided drug design

protein-ligand

binding

Supplementary materials

Supporting Information: Additional supporting figures and tables.

Version History

Jun 24, 2024 Version 1

Metrics

5,547 views; 3,008 downloads

License

The content is available under CC BY-NC 4.0.

Funding

Engineering and Physical Sciences Research Council (EP/S024093/1)

Boehringer Ingelheim

Author’s competing interest statement

Aniket Magarkar is an employee of Boehringer Ingelheim. Matthew Warren was supported by a PDRA grant funded by Boehringer Ingelheim.

Ethics

The author(s) have declared that ethics committee/IRB approval is not relevant to this content.
