Applications of machine learning in computer-aided drug discovery

Published online by Cambridge University Press:  01 September 2022

SM Bargeen Alam Turzo
Affiliation:
Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH 43210, USA
Eric R. Hantz
Affiliation:
Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH 43210, USA
Steffen Lindert*
Affiliation:
Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH 43210, USA
*Author for correspondence: Steffen Lindert, E-mail: lindert.1@osu.edu

Abstract

Machine learning (ML) has revolutionised the field of structure-based drug design (SBDD) in recent years. During the training stage, ML techniques typically analyse large amounts of experimentally determined data to create predictive models that inform the drug discovery process. Deep learning (DL), a subfield of ML that relies on multiple layers of a neural network to extract significantly more complex patterns from experimental data, has recently become a popular choice in SBDD. This review provides a thorough summary of the recent DL trends in SBDD with a particular focus on de novo drug design, binding site prediction, and binding affinity prediction of small molecules.

Information

Type
Perspective
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Fig. 1. Illustration of computational de novo drug design. (a) In atom-based drug design, the small molecule is built atom by atom by sampling additions of many different types of atoms. (b) In fragment-based drug design, the small molecule is built by sampling additions of a library of fragments.

Table 1. Summary of de novo drug design methods

Fig. 2. Illustration of computational binding site prediction. In this methodology, 3D voxels are used to identify regions of the protein as potential binding sites (shown as yellow rectangles in the figure). Next, these sites are ranked from most to least probable for a ligand to bind.
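
The two steps named in this caption, dividing space into 3D voxels and then ranking candidate sites, can be sketched in a few lines. This is a minimal illustration only, not the procedure of any method covered in the review: the names `voxelize` and `rank_sites`, the 2 Å voxel size, the coordinates and the per-site probabilities are all assumptions made for the example. In a real method the probabilities would come from a trained model.

```python
from collections import Counter
from math import floor


def voxelize(coords, voxel_size=2.0):
    """Count how many points fall into each cubic voxel of edge `voxel_size`."""
    counts = Counter()
    for x, y, z in coords:
        # Integer voxel index along each axis; floor handles negative coordinates.
        index = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        counts[index] += 1
    return counts


def rank_sites(site_scores):
    """Sort candidate sites from most to least probable."""
    return sorted(site_scores, key=site_scores.get, reverse=True)


# Toy protein atom coordinates in angstroms (made up for illustration).
atoms = [(0.5, 0.5, 0.5), (1.0, 1.5, 0.2), (3.0, 0.1, 0.1), (-0.5, 0.0, 0.0)]
occupancy = voxelize(atoms)

# Hypothetical per-site probabilities, standing in for a trained model's output.
scores = {"site_A": 0.2, "site_B": 0.9, "site_C": 0.5}
ranking = rank_sites(scores)
```

Here `occupancy` maps each voxel index to the number of atoms inside it, and `ranking` lists the hypothetical sites in descending order of probability.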

Table 2. Summary of binding site prediction methods

Fig. 3. Illustration of computational binding affinity prediction. (a) A small molecule is docked into a target protein. (b) The binding site and the small molecule are then characterised with many features in order to predict the binding affinity. The atoms in the small molecule are shown in grey, blue and red for carbon, nitrogen and oxygen, respectively. The carbon, nitrogen, oxygen and sulphur of the binding site residues are shown in blue, purple, red and yellow, respectively. Bonds in the small molecule ligand are shown in black.
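
One simple way to characterise a docked complex with features, as in panel (b), is to count ligand and binding-site atom pairs that lie within a distance cutoff, grouped by element. The sketch below is a hedged illustration of that idea and is not taken from the article: the function name `contact_features`, the 4 Å cutoff and the coordinates are assumptions chosen for the example. The resulting counts would typically be passed to a regression model, which is not shown.

```python
from collections import Counter
from math import dist


def contact_features(ligand_atoms, pocket_atoms, cutoff=4.0):
    """Count ligand-pocket element pairs closer than `cutoff` angstroms."""
    counts = Counter()
    for lig_element, lig_xyz in ligand_atoms:
        for poc_element, poc_xyz in pocket_atoms:
            # Euclidean distance between the two atom positions.
            if dist(lig_xyz, poc_xyz) < cutoff:
                counts[(lig_element, poc_element)] += 1
    return counts


# Toy atoms as (element, (x, y, z)) in angstroms; values are made up.
ligand = [("C", (0.0, 0.0, 0.0)), ("N", (1.5, 0.0, 0.0))]
pocket = [("O", (0.0, 3.0, 0.0)), ("S", (20.0, 0.0, 0.0))]

features = contact_features(ligand, pocket)
```

In this toy case only the pocket oxygen is close enough to contribute, so `features` holds one carbon-oxygen and one nitrogen-oxygen contact, while the distant sulphur atom adds nothing.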

Table 3. Summary of binding affinity prediction methods

Review: Applications of Machine Learning in Computer-Aided Drug Discovery — R0/PR1

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: Lindert and coworkers presented a thorough review of recent deep learning trends in computer-aided drug design with focus on de novo drug design, binding site prediction and binding affinity prediction of small molecules.

It could potentially be improved in the following respects:

1. It would help to explain briefly various scores, including the Q-value, SA and QED scores, Vinardo score, RF-score, X-score, cyScore, Chem score, AK-Score, etc.

How are they defined, what are the ranges and what values are needed for "good" predictions?

2. What are the main difference(s) between the deep learning algorithms for predicting allosteric sites and those for predicting orthosteric/primary ligand binding sites?

3. Although drug binding kinetics and efficacy are not covered in the review, it would help to still briefly comment on deep learning studies of these and related drug design aspects.

Review: Applications of Machine Learning in Computer-Aided Drug Discovery — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: In the present manuscript, the authors extensively review a wide range of machine learning applications for computer-aided drug design (CADD). Both machine learning and CADD are broad fields, and the article focuses in particular on the application of deep learning techniques to structure-based drug design. The review is divided into three main topics: de novo drug design, binding site prediction, and binding affinity prediction. For each topic, a description of available methods with the corresponding advantages and drawbacks is provided. Overall, the review is really interesting and well-written, and it can be a useful reference for people starting to use ML in the field of CADD. This field is constantly evolving and this review perfectly captures the current state of the art. I have some comments/suggestions that the authors may consider:

1. My major suggestion is to extend the information regarding the machine learning algorithms, which is limited to six lines on page 3. My feeling is that the review is too oriented toward people familiar with machine learning algorithms. I think that it would be relevant, for the ML-inexperienced reader, to add a more detailed description of the machine learning algorithms that are discussed throughout the manuscript. Many different methods are mentioned in the text (different types of convolutional neural networks, recurrent neural networks, graph neural networks, …). I suggest including a brief description of the different neural networks and reinforcement learning methods. This can help readers familiarize themselves with the different algorithms that are mentioned throughout the text.

2. One of the first examples of machine learning algorithms applied to binding site prediction is CryptoSite (J. Mol. Biol. 2016, 428, 709). It is not based on deep learning but it is one of the pioneering works in the application of machine learning to binding site identification. The authors may consider briefly mentioning CryptoSite in the manuscript.

3. Among the binding affinity prediction applications, there is a method called DeepBSP that provides good results for binding-pose predictions (J. Chem. Inf. Model. 2021, 61, 5, 2231-2240). I suggest the authors consider commenting on this protocol in the revised version of the manuscript.

4. One topic that is not discussed in the review is the use of deep learning to generate 3D structures for binding pose and affinity prediction. As the authors mention in the conclusions, deep learning has been used to predict the 3D structure of proteins with programs like AlphaFold and also RoseTTAFold. I think that an interesting topic is the quality of these deep learning generated structures for hit identification. For example, Alon et al. (Nature 600 759-764 (2021)) recently reported the crystal structure of the sigma2 receptor. In the same work, they used molecular docking to screen a large library of compounds against the X-ray structure, which resulted in the identification of around 130 active compounds. However, these hits scored relatively poorly when docked against the AlphaFold model, indicating that there is still a long way to go in terms of identifying relevant protein conformations for drug design. I understand that this topic may be out of the scope of the present review and that probably not enough information has been generated to include it in a review. Therefore, the authors should include some sentences about this topic only if they consider that it fits with the manuscript.

Additional minor suggestions:

5. The MolDQN acronym is introduced on page 4 but has not been previously defined. The same applies to HTMD (page 9) and GNN (page 6). The authors should define these terms in the revised version of the manuscript.

6. The Browne et al. reference is incomplete.

7. The Jiménez et al. KDeep reference is included neither in the main manuscript (page 13) nor in the reference list (J. Chem. Inf. Model. 2018, 58, 2, 287-296).

Review: Applications of Machine Learning in Computer-Aided Drug Discovery — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: The application of machine learning in drug discovery has become more and more popular. In recent years, machine learning techniques have provided a toolset to improve data analysis and decision-making for drug design. In this work, the authors review recently available tools, especially in de novo drug design, binding site prediction, and binding affinity prediction. Overall, the authors introduced the background and methods of each deep learning tool in structure-based drug design. The review clearly addresses all software and its code architecture and source. The manuscript should be a good publication for QRB Discovery. However, a couple of minor points can make the manuscript stronger.

The authors discuss each deep learning method. It would be good to include a comparison between the different methods and to address the potential pros and cons of each. This could help first-time users select the techniques best suited to their system.

Although the methods the authors discuss in this manuscript are quite new, have others applied them? To what kinds of systems were they applied? Were the results good? These points could be discussed to emphasize the importance of machine learning methods.

There have been many review papers about machine learning in drug discovery recently. It would be good to clearly state the difference between the current manuscript and other review papers. Also, those review papers should be cited. For example:

Applications of machine learning in drug discovery and development, Nature Reviews Drug Discovery volume 18, pages 463-477 (2019)

Machine Learning in Drug Discovery: A Review, DOI: 10.1007/s10462-021-10058-4

A review on machine learning approaches and trends in drug discovery, Computational and Structural Biotechnology Journal, Volume 19, 2021, Pages 4538-4558

The resolution of the figures could be improved. For example, it is hard to see Figure 3B. The authors may wish to use different representations to show atoms and bonds.

Some sentences are very long and hard to read. For example, on page 17, "Machine learning has become an increasingly popular field of study and its application to biological problems will only continue to grow as academic and industry users attempt to create better tools to predict biomolecular structure, treat disease and improve public health."

Typo. On page 10, "DeepSite was able to provide a more accurate biding site prediction" should be "DeepSite was able to provide a more accurate binding site prediction".

Recommendation: Applications of Machine Learning in Computer-Aided Drug Discovery — R0/PR4

Recommendation: Applications of Machine Learning in Computer-Aided Drug Discovery — R1/PR5

Comments

No accompanying comment.

Recommendation: Applications of Machine Learning in Computer-Aided Drug Discovery — R2/PR6

Comments

No accompanying comment.