Deep Learning-Ready Voxel Representation of Protein-Ligand Complexes from an Enhanced PDBbind v.2020 Dataset

Matheus Müller Pereira da Silva; Isabella Alvim Guedes; Fábio Lima Custódio; Laurent Emmanuel Dardenne

doi:10.26434/chemrxiv-2023-f4q6k

Theoretical and Computational Chemistry

11 December 2023, Version 1

Working Paper

This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

A critical aspect of successful deep learning (DL) modelling in computer-aided drug discovery (CADD) is the representation of biomolecular data. Voxel grid representations have emerged as a straightforward method for depicting 3D molecular structures of protein-ligand complexes. Proper structural preparation of these complexes is also crucial, particularly in models where the orientation of hydrogen atoms and the accurate assignment of protonation/tautomeric states are vital. PDBbind, a widely used dataset, leaves room for improvement in this regard. This work presents three resources: an enhanced version of the PDBbind v.2020 refined set with improved structural preparation, a voxel representation of these structures suitable for DL model training, and a diverse set of docking-generated poses that could be used to develop new scoring functions for pose prediction. We also introduce DockTGrid, a software library developed to generate these voxel representations, which can be adapted to create new molecular features. With this work, we aim to provide the CADD community with high-quality, accessible resources to facilitate the development of DL models for drug discovery.
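
To make the voxel grid idea concrete, the following is a minimal NumPy sketch of how atoms of a complex could be mapped onto a multi-channel 3D grid. It is an illustration only and does not reproduce the DockTGrid API: the function name `voxelize`, its parameters, the binary occupancy scheme, and the default box size and resolution are all assumptions chosen for brevity.

```python
import numpy as np


def voxelize(coords, channels, n_channels, center, box_size=24.0, resolution=1.0):
    """Hypothetical helper: binary occupancy grid of shape (n_channels, D, D, D).

    coords     : (N, 3) atom coordinates in angstroms
    channels   : (N,) integer channel index per atom (e.g. by atom type)
    center     : (3,) grid center, e.g. the ligand centroid
    box_size   : edge length of the cubic box in angstroms (assumed value)
    resolution : voxel edge length in angstroms (assumed value)
    """
    coords = np.asarray(coords, dtype=float)
    channels = np.asarray(channels, dtype=int)
    dim = int(round(box_size / resolution))
    grid = np.zeros((n_channels, dim, dim, dim), dtype=np.float32)

    # Shift coordinates so the box corner sits at the origin, then bin them.
    origin = np.asarray(center, dtype=float) - box_size / 2.0
    idx = np.floor((coords - origin) / resolution).astype(int)

    # Discard atoms that fall outside the box.
    inside = np.all((idx >= 0) & (idx < dim), axis=1)
    idx, ch = idx[inside], channels[inside]

    # Mark each occupied voxel in the channel of the atom it contains.
    grid[ch, idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid


# Toy input: two atoms inside a 4 A box and one far outside it.
example = voxelize(
    coords=[[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [100.0, 0.0, 0.0]],
    channels=[0, 1, 0],
    n_channels=2,
    center=(0.0, 0.0, 0.0),
    box_size=4.0,
    resolution=1.0,
)
```

Binary occupancy is the simplest possible choice; smoother schemes that spread each atom over neighbouring voxels are common, and the abstract does not state which one DockTGrid uses.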

Keywords

structure-based drug design

virtual screening

Supplementary weblinks

DockTGrid

Create customized voxel representations of protein-ligand complexes using GPU.

DockTGrid dataset

Dataset based on the PDBbind v.2020 refined set used to generate the docking poses with DockThor and create the voxels.

Version History

Dec 11, 2023 Version 1

Metrics

Views: 1,572

Downloads: 772

License

The content is available under CC BY 4.0

DOI

10.26434/chemrxiv-2023-f4q6k

Funding

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

309744/2022-9

Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro

E-26/010.001415/2019, E-26/211.357/2021, E-26/200.393/2023, E-26/200.608/2022, E-26/210.372/2022

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
