Hostname: page-component-77f85d65b8-grvzd Total loading time: 0 Render date: 2026-03-30T04:05:07.850Z Has data issue: false hasContentIssue false

Computational prediction of ω-transaminase selectivity by deep learning analysis of molecular dynamics trajectories

Published online by Cambridge University Press:  12 December 2022

Carlos Ramírez-Palacios
Affiliation:
Molecular Dynamics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
Siewert J. Marrink*
Affiliation:
Molecular Dynamics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
*
*Author for correspondence: Siewert J. Marrink, E-mail: s.j.marrink@rug.nl
Rights & Permissions [Opens in a new window]

Abstract

We previously presented a computational protocol to predict the enzymatic (enantio)selectivity of an ω-transaminase towards a set of ligands (Ramírez-Palacios et al. (2021) Journal of Chemical Information and Modeling 61(11), 5569–5580) by counting the number of binding poses present in molecular dynamics (MD) simulations that met a defined set of geometric criteria. The geometric criteria consisted of a hand-crafted set of distances, angles and dihedrals deemed to be important for the enzymatic reaction to take place. In this work, the MD trajectories are reanalysed using a deep-learning approach to predict the enantiopreference of the enzyme without the need for hand-crafted criteria. We show that a convolutional neural network is capable of classifying the trajectories as belonging to the ‘reactive’ or ‘non-reactive’ enantiomer (binary classification) with a good accuracy (>0.90). The new method reduces the computational cost of the methodology, because it does not necessitate the sampling approach from the previous work. We also show that analysing how neural networks reach specific decisions can aid hand-crafted approaches (e.g. definition of near-attack conformations, or binding poses).

Information

Type
Research Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Figure 0

Fig. 1. Computational prediction of the ω-TA selectivity by (a) geometric analysis of near-attack conformations (NACs) and (b) deep learning. In both methodologies, short MD simulations (20 ps) are run using the docked complex as starting conformation. In the NAC-based methodology, presented in Ramírez-Palacios et al. (2021), a combination of docking and MD simulations was needed to sample through a slow-moving dihedral (χ1). The DL-based methodology, presented in this work, does not necessitate such a strategy of sampling through slow-moving parameters, which reduces the number of MD simulations required. Furthermore, no hand-crafted geometric features are needed for the NN to learn to distinguish between trajectories that contain ‘reactive’ or ‘non-reactive’ enantiomers, which would reduce the number of man-hours required for its implementation to new systems.

Figure 1

Fig. 2. Structures of the compounds used in this study. The ligands were the external aldimine form of the amines shown in the figure. Following CIP rules, all compounds shown are (S)-enantiomers, except 32, 33, 35, 36 and 45 which formally are (R)-enantiomers.

Figure 2

Fig. 3. (a) Structure of the complex of Vf-TA with the external aldimine intermediate of compound (S)-01. (b) Transamination reaction. The fully reversible transamination reaction consists of multiple steps in which an amino group is transferred from the substrate to the cofactor or, in the reverse direction, from the cofactor to the substrate. Among the reaction intermediates, we only modelled the external aldimine intermediate, because it is most likely involved in the rate-limiting step of the reaction. In this step, the nucleophilic proton abstraction of the external aldimine by the catalytic lysine leads to the formation of the quinonoid intermediate. Short MD simulations of the external aldimine intermediate complex were used as input for the neural networks, which were tasked with classifying them.

Figure 3

Fig. 4. Descriptors used to represent the MD trajectories. Colour codes are: blue, distances; green, angles; red, dihedrals. The 15 descriptors are shown in two parts only for clarity.

Figure 4

Fig. 5. Configuration of the NNs used to classify the MD trajectories. Each trajectory consists of 1,000 frames and each frame is represented by 15 descriptors (distances, angles and dihedrals thought to be relevant in the studied reaction). (a) In the 1D-CNN model, an MD trajectory is treated as a signal with multiple channels (descriptors). The input signal (sized 1,000 × 15) passes through multiple convolutional, pooling and dense layers until the final neuron outputs 0 or 1 to represent the class ‘non-reactive’ or ‘reactive’, respectively. (b) In the 2D-CNN model, the trajectory is represented as an image of size 1,000 × 15 (binary classification) or 320 × 320 (CAM visualisation), with one colour channel (analogous to a black-and-white picture). The input image passes through multiple convolutional, pooling, dropout and dense layers until the final neuron outputs the class prediction. (c) The LSTM model is trained to predict the next frame based on information from all other frames that came before it. After training, the embeddings obtained from the second LSTM layer are used to predict the trajectory’s class.

Figure 5

Table 1. Results of the tested NNs in the binary classification of MD trajectories

Figure 6

Fig. 6. Bar plot showing the number of misclassified trajectories per ligand molecule when evaluated with the trained 1D-CNN model. We want the number of misclassified trajectories to be small. The total number of trajectories per ligand is 200, half of which are labelled as ‘non-reactive’ and the other half are labelled as ‘reactive’. For example, ligand 06 has 100 trajectories containing the non-preferred enantiomer (class ‘non-reactive’) of which 37 were misclassified (red bars), and 100 trajectories containing the enantiomer preferred by Vf-TA (class ‘reactive’) of which 23 were misclassified (blue bars) by the NN. There are 49 ligands in total, adding up to a total of 9,800 trajectories.

Figure 7

Fig. 7. Accuracy obtained by using one input channel for training, $ {\boldsymbol{x}}_i\in {\mathbb{R}}^{N\times 1} $, and the accuracy obtained by using all the input channels, $ {\boldsymbol{x}}_i\in {\mathbb{R}}^{N\times 15} $ (first bar, labelled as ‘All’). The error bars are standard deviations over five replicas.

Figure 8

Fig. 8. t-SNE of the latent space representations ($ {\boldsymbol{h}}_i\in {\mathbb{R}}^{16} $) obtained from the penultimate layer of the 1D-CNN model. The evaluations were made in the validation dataset (n = 2,200 examples). The reader can think of the latent space as follows: imagine you show an image of a molecule to a chemist and scan their neurons to see which neurons are active and which are not. You then repeat the experiment with a non-expert in chemistry and compare the two scans. The scans will be different, because the expert can give meaning to the image shown. We expect the scan of the neuron activations of the expert to tell us something about the molecule that is not necessarily encoded in the input image (e.g. whether the molecule is realistic or not), coming from the experience (training) of the expert (neural network). That is latent space.

Figure 9

Fig. 9. (a) Heatmap showing the CAM activations of the 1D-CNN model of one input trajectory, presented as example. The y-axis is the filter number, and the x-axis is the time dimension. (b) CAM activations (heatmap) of filter number 4 overlapped to the input vectors of descriptors 3, 4 and 10 (line plot), along a small 2.0 ps window (x-axis). The y-axis contains the normalised values from each descriptor. The hand-drawn arrows show the direction each descriptor follows. At 1.67 ps, the activations are at their highest, and at 1.55 ps, activations are low. The red rectangle shows a region with little activations but where descriptor 3 looks visually similar to the region in which the activations were at their maximum (~1.67 ps).

Figure 10

Fig. 10. (a) Example of an input image of size 320 × 320 pixels, and (b) the CAM produced by the image after passing it through a 2D-CNN. Viridis colourmap: the regions with the lowest activations (low importance) are coloured blue, the regions that produce the highest activations are in yellow and the regions with middle importance are in green. The highest activations come from the pixels containing trajectory information.

Figure 11

Fig. 11. Violin plots showing the position-dependent values (y-axis) at which each descriptor (x-axis) showed the maximum CAM activation to be classified as either class ‘reactive’ (blue) or class ‘non-reactive’ (red). The plot shows all 9,800 trajectories. CAM visualisation was obtained from the 1D-CNN model (filter 4 of Fig. 9).

Supplementary material: PDF

Ramírez-Palacios and Marrink supplementary material

Ramírez-Palacios and Marrink supplementary material
Download Ramírez-Palacios and Marrink supplementary material(PDF)
PDF 1.4 MB

Review: Computational prediction of ω-transaminase selectivity by deep learning analysis of molecular dynamics trajectories — R0/PR1

Conflict of interest statement

None.

Comments

Comments to Author: The paper reports the use of sophisticated deep learning methods to analyse molecular dynamics trajectories. Such approaches have great promise for the future, so this paper is certainly of interest to the molecular dynamics (MD) community. To have the maximum impact, the authors should try and rewrite it so that it is more accessible to the molecular dynamics field, as currently it is very brief when describing the methods used and uses a lot of technical language and jargon that the MD community is currently unfamiliar with (e.g. latent space/input channels).

To improve readability within the MD field, I have the following suggestions:

1) Provide more background to the system of interest and why these calculations are useful. The role and importance of the protein itself has not been discussed.

2) The dataset used has been well described, but how does it interface with TensorFlow? The section on network building in TensorFlow is very technical for an MD audience, and should be described with simpler language.

3) The conclusions should relate more specifically to the chemistry results that were obtained, and the insight into the molecular structures that resulted. Currently it is focused on the technicalities of the machine learning. The attempt to track down the mechanism of the learning and decision making is very interesting, and it would be very helpful if the authors could pull out specific chemical examples from their study to explain what is happening from a structural perspective.

Review: Computational prediction of ω-transaminase selectivity by deep learning analysis of molecular dynamics trajectories — R0/PR2

Conflict of interest statement

No Conflicts of Interest.

Comments

Comments to Author: I believe the manuscript reports on substantial work well supported by the factual information.The results are interesting and important.However, I would prefer to see a more clear connection to commonly understood physical and chemical processes.For example, as simple as drawing the ligand in the active site would help understanding the details of the chemical process.Also, more clear description of the dynamics of the analysed process will help.If I understand correctly, only very short trajectories were analysed (20ps).If that’s the case, what information is contained in such short trajectories?

I think the manuscript can be published after the above is provided.

Recommendation: Computational prediction of ω-transaminase selectivity by deep learning analysis of molecular dynamics trajectories — R0/PR3

Comments

Comments to Author: Reviewer #1: I believe the manuscript reports on substantial work well supported by the factual information.The results are interesting and important.However, I would prefer to see a more clear connection to commonly understood physical and chemical processes.For example, as simple as drawing the ligand in the active site would help understanding the details of the chemical process.Also, more clear description of the dynamics of the analysed process will help.If I understand correctly, only very short trajectories were analysed (20ps).If that’s the case, what information is contained in such short trajectories?

I think the manuscript can be published after the above is provided.

Reviewer #2: The paper reports the use of sophisticated deep learning methods to analyse molecular dynamics trajectories. Such approaches have great promise for the future, so this paper is certainly of interest to the molecular dynamics (MD) community. To have the maximum impact, the authors should try and rewrite it so that it is more accessible to the molecular dynamics field, as currently it is very brief when describing the methods used and uses a lot of technical language and jargon that the MD community is currently unfamiliar with (e.g. latent space/input channels).

To improve readability within the MD field, I have the following suggestions:

1) Provide more background to the system of interest and why these calculations are useful. The role and importance of the protein itself has not been discussed.

2) The dataset used has been well described, but how does it interface with TensorFlow? The section on network building in TensorFlow is very technical for an MD audience, and should be described with simpler language.

3) The conclusions should relate more specifically to the chemistry results that were obtained, and the insight into the molecular structures that resulted. Currently it is focused on the technicalities of the machine learning. The attempt to track down the mechanism of the learning and decision making is very interesting, and it would be very helpful if the authors could pull out specific chemical examples from their study to explain what is happening from a structural perspective.

Recommendation: Computational prediction of ω-transaminase selectivity by deep learning analysis of molecular dynamics trajectories — R1/PR4

Comments

Comments to Author: The authors have significantly improved the accessibility and readability of their manuscript, and I am confident it will be of great interest to the biomolecular simulation community.

Recommendation: Computational prediction of ω-transaminase selectivity by deep learning analysis of molecular dynamics trajectories — R2/PR5

Comments

No accompanying comment.