
Graph deep learning locates magnesium ions in RNA

Published online by Cambridge University Press:  06 October 2022

Yuanzhe Zhou
Affiliation:
Department of Physics and Astronomy, University of Missouri, Columbia, MO 65211-7010, USA
Shi-Jie Chen*
Affiliation:
Department of Physics and Astronomy, Department of Biochemistry, Institute of Data Sciences and Informatics, University of Missouri, Columbia, MO 65211-7010, USA
*Author for correspondence: Shi-Jie Chen, E-mail: chenshi@missouri.edu

Abstract

Magnesium ions (Mg2+) are vital for RNA structure and cellular functions. Present efforts in RNA structure determination and understanding of RNA functions are hampered by the inability to accurately locate Mg2+ ions in an RNA. Here we present a machine-learning method, originally developed for computer visual recognition, to predict Mg2+ binding sites in RNA molecules. By incorporating geometrical and electrostatic features of RNA, we capture the key ingredients of Mg2+-RNA interactions, and from deep learning, predict the Mg2+ density distribution. Five-fold cross-validation on a dataset of 177 selected Mg2+-containing structures and comparisons with different methods validate the approach. This new approach predicts Mg2+ binding sites with notably higher accuracy and efficiency. More importantly, saliency analysis for eight different Mg2+ binding motifs indicates that the model can reveal critical coordinating atoms for Mg2+ ions and ion-RNA inner/outer-sphere coordination. Furthermore, implementation of the model uncovers new Mg2+ binding motifs. This new approach may be combined with X-ray crystallography structure determination to pinpoint the metal ion binding sites.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Fig. 1. The MgNet workflow (a,b) and applications (c,d). (a) The MgNet workflow begins with input of the 3D structure of an RNA. A 3D image is taken from a 24 × 24 × 24 Å cubic box centred at each given nucleotide and is used to capture the electrostatic and 3D-shape information for the binding and non-binding sites. MgNet accepts the input images and can be used to perform: (b) Mg2+ binding site prediction. The hot spots (left, with decreasing probability from red to green) were collected, sorted, and clustered into final predicted binding sites (right, green spheres); (c) Saliency analysis. MgNet can be used to reveal the most important coordinating RNA atoms by calculating the radial saliency distributions of different atom types around the bound ion; (d) Binding motif analysis. Statistics of the configurations of the coordinating atoms around the binding sites predicted by MgNet lead to newly discovered binding motifs.
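
The box construction described in panel (a) can be illustrated with a minimal sketch. It is not the MgNet implementation: the 1 Å grid spacing, the function name `voxelize`, and the choice to count atoms and sum partial charges per voxel are all assumptions made for illustration. Only the 24 Å box edge and the two input channels (volume occupancy and partial charge) come from the captions.

```python
import math

BOX = 24.0   # box edge in Å (from the caption)
STEP = 1.0   # grid spacing in Å (assumption; not stated in the caption)
N = int(BOX / STEP)

def voxelize(atoms, centre):
    """Map atoms into a cubic grid centred on `centre`.

    atoms:  list of (x, y, z, partial_charge) tuples
    centre: (x, y, z) of the box centre, e.g. a nucleotide position
    Returns two N x N x N nested lists: atom counts per voxel
    (a crude stand-in for volume occupancy) and summed partial charge.
    """
    occupancy = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    charge = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    half = BOX / 2.0
    for x, y, z, q in atoms:
        # Shift coordinates so the box spans [0, BOX) along each axis.
        idx = [math.floor((c - c0 + half) / STEP)
               for c, c0 in zip((x, y, z), centre)]
        # Atoms outside the box are ignored.
        if all(0 <= i < N for i in idx):
            i, j, k = idx
            occupancy[i][j][k] += 1.0
            charge[i][j][k] += q
    return occupancy, charge
```

Calling `voxelize(atoms, centre)` once per nucleotide would yield one two-channel 3D image per nucleotide, which is the kind of input the caption describes.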


Fig. 2. Investigation of MgNet performance and comparison between MgNet and other methods. (a) The TPR and PPV values of the MgNet model for cross-validation on both the general and high-quality sets. Values are obtained from validation results; PPV values on the high-quality set are not shown. (b,c) Example of MgNet-predicted (magenta spheres) versus experimentally determined (green spheres, labelled with residue identifiers) Mg2+ ion sites in (b) a 58 nt fragment of Escherichia coli 23S rRNA (PDB ID: 1HC8) and (c) the anticodon loop in tRNAAsp. The predicted site in (c) is shifted upward toward the G30·U40 wobble pair. Four residues shown in red are labelled with the residue names and residue sequence numbers. (d,e) Comparison of the success rates between MgNet and the molecular dynamics (MD) and Brownian dynamics (BD) simulation-based methods for various RMSD cut-offs. The test sets contain seven and three RNA structures for the MD-based and BD-based methods, respectively. Two different system conditions were used in the MD-based method: with Mg2+ as the counterion (CI) only ($ {\mathrm{Mg}}_{\mathrm{CI}}^{2+} $) and with the physiological salt (PS) concentration ($ {\mathrm{Mg}}_{\mathrm{PS}}^{2+} $; Mg2+ counterions and 0.15 M NaCl). (f) Comparison between MetalionRNA (Philips et al., 2011) and MgNet on the general set. The horizontal axis represents the rank of the predictions, where n on the axis means that the top-n predictions are used for each RNA, and the vertical axis represents the corresponding TPR and PPV values for the top-n predictions. The cut-off RMSD for a correct hit is 3 Å. Additional information can be found in Supplementary Tables S4–S8.
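
The top-n evaluation in panel (f) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' scoring code: the function name `tpr_ppv` is hypothetical, the RMSD between one predicted and one experimental ion is taken to be their Euclidean distance, TPR is assumed to be the fraction of experimental sites with at least one prediction within the cut-off, and PPV the fraction of the top-n predictions that fall within the cut-off of any experimental site.

```python
import math

def tpr_ppv(predicted, experimental, n, cutoff=3.0):
    """Return (TPR, PPV) for the top-n predictions.

    predicted:    list of (x, y, z), sorted by decreasing predicted score
    experimental: list of (x, y, z) experimentally determined ion sites
    cutoff:       distance in Å below which a prediction counts as a hit
                  (3 Å is the value quoted in the caption)
    """
    top = predicted[:n]
    # Predictions that land near at least one experimental site.
    good_predictions = sum(
        1 for p in top
        if any(math.dist(p, e) <= cutoff for e in experimental)
    )
    # Experimental sites recovered by at least one prediction.
    recovered_sites = sum(
        1 for e in experimental
        if any(math.dist(p, e) <= cutoff for p in top)
    )
    tpr = recovered_sites / len(experimental) if experimental else 0.0
    ppv = good_predictions / len(top) if top else 0.0
    return tpr, ppv
```

Sweeping `n` from 1 upward for each RNA and averaging would produce curves of the kind plotted in panel (f); under these definitions TPR cannot decrease as `n` grows, while PPV may fall as lower-ranked predictions are admitted.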


Fig. 3. Example of saliency calculation for eight binding motifs. These motifs differ by the type of ion coordination (i.e. inner-sphere or outer-sphere coordination), the number and type of the coordinating atoms, and the geometry of the coordination. Saliency values are calculated for eight binding sites: (a) 3Q3Z-V85; (b) 2Z75-B301; (c) 2YIE-Z1116; (d) 1VQ8-08004; (e) 3DD2-B1000; (f) 2QBA-B3321; (g) 4TP8-A1601; (h) 3HAX-E200, and two input channels: volume occupancy (top) and partial charge (bottom). Experimentally determined positions of Mg2+ cations are indicated by green spheres, and oxygen atoms in water molecules are shown as small red spheres. Direct coordination (inner-sphere coordination) is shown as magenta dashes, and indirect coordination (outer-sphere coordination, i.e. mediated by water molecules) is shown as black dashes. Residues and coordinating atoms other than oxygen atoms of water molecules are labelled with red text. One extra Mg2+ in (a) is shown as a cyan sphere. The saliency values of RNA atoms are shown on a blue scale, where atoms with larger saliency values are shown in a darker blue colour.


Fig. 4. Radial frequency distributions and relative saliency distributions of different (a–c) atom types and (d–f) representative atoms around the correctly predicted Mg2+ ion sites. The figure shows the contact radial frequency distributions (a,d), the relative saliency distributions for the volume occupancies (b,e) and the partial charges (c,f), respectively. The frequencies and saliency values are normalized to the [0, 1] range. In (d–f), only the representative atom of each atom type is shown (with the same colour as the corresponding atom type in (a–c)). $ {\overline{\mathrm{O}}}_{\mathrm{r}} $ is the average of two sugar oxygen atoms (O3’ and O5’) due to the similar radial frequencies and relative saliency distributions, and $ {\overline{\mathrm{O}}}_{\mathrm{ph}} $ is the average of the two phosphate oxygen atoms OP1 and OP2. The representative atoms are chosen by selecting the most abundant atom for each atom type. Details can also be found in Supplementary Information.
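
A radial frequency distribution of the kind shown in panels (a,d) can be sketched as a distance histogram. The function name `radial_frequency`, the 8 Å range, the 0.5 Å bin width, and normalization by the largest bin count are assumptions chosen for illustration; the caption states only that the frequencies are normalized to the [0, 1] range.

```python
import math

def radial_frequency(ion_sites, atoms, r_max=8.0, bin_width=0.5):
    """Histogram ion-atom distances and scale the counts to [0, 1].

    ion_sites: list of (x, y, z) Mg2+ positions
    atoms:     list of (x, y, z) positions of one atom type
    r_max:     largest distance considered, in Å (assumption)
    bin_width: histogram bin width, in Å (assumption)
    Returns a list with one normalized frequency per radial bin.
    """
    n_bins = int(r_max / bin_width)
    counts = [0] * n_bins
    for ion in ion_sites:
        for atom in atoms:
            d = math.dist(ion, atom)
            if d < r_max:
                counts[int(d / bin_width)] += 1
    peak = max(counts)
    # Divide by the largest count so the tallest bin equals 1.
    return [c / peak for c in counts] if peak else [0.0] * n_bins
```

Running the function once per atom type (for example phosphate oxygens, then sugar oxygens) would give one curve per type, comparable in form to those in the figure.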


Fig. 5. Representative sites for newly discovered motifs and relative abundance of various motifs. (a,b) Representative sites are defined by PDB code, chain ID, and the predicted Mg2+ residue number as follows: (a) “16-member ring” (1QU2-T-9) and (b) “Phosphate pyramid” (4FAR-A-30). Magnesium ions and inner-sphere interactions are shown as green spheres and black dashed lines, respectively. The coordinating RNA atoms and nearby nucleotides are labelled with red text. The “16-member ring” motif involves two inner-sphere coordinating oxygen atoms from two phosphate groups that are separated by one residue (i.e. not consecutive phosphate groups). The two coordinating oxygen atoms, the RNA backbone atoms in between, and the Mg2+ form a ring with 16 atoms. The “Phosphate pyramid” motif contains either a “10-member ring” or a “16-member ring” with another inner-sphere ion coordinating the phosphate oxygen atoms, forming a triangular pyramid. (c) Relative abundance of the top-5 previously reported and newly discovered inner-sphere Mg2+ binding motifs in the general set (red) and the MgRNA benchmark set (Zheng et al., 2015) (blue). The two newly discovered motifs are shown in the inset. The percentage of each motif is calculated by dividing the number of sites belonging to the corresponding motif by the total number of sites with inner-sphere coordinating RNA atoms.

Supplementary material

Zhou and Chen supplementary material 1 (PDF, 650.2 KB)
Zhou and Chen supplementary material 2 (File, 195.1 KB)

Review: Graph deep learning locates magnesium ions in RNA — R0/PR1

Conflict of interest statement

none.

Comments

Comments to Author: Correct modelling of RNA structure is imperative for the studies of RNA-based medicines, RNA interactions with proteins, etc. In the manuscript the authors try to improve the modelling of an integral yet elusive component of RNA structure, the binding of Mg2+ ions, using a deep learning approach.

I find the manuscript interesting, highly relevant to today’s challenges in RNA modelling, and overall well-written.

My major comment is the lack of a user-friendly tutorial/documentation supporting the GitHub code for MgNet, which limits the usefulness of the approach and is a pity! This tutorial/documentation should be provided on the GitHub page and/or presented as a supplementary file in the paper.

Minor comments:

I find this sentence somewhat incomprehensible:

Page 1 lines 63-66. An flexible RNA can lead to an ensemble of low-energy conformations, and Mg2+ binding preferences may be different in different conformations, and may induce the conformational change of the target RNA (Bergonzo and Cheatham, 2017; Bergonzo et al., 2015, 2016).

You start by saying that experimental studies of Mg-RNA interactions are difficult and then provide a conclusion derived from three computational studies. Please paraphrase.

Page 5, in the beginning of the Methods section, the authors say “we remove redundant structures of the same RNA”. Please clarify what you mean by that. Do you remove RNA structures with the same sequences, or with the same set of 3D elements (e.g. hairpin, bulge, etc.)? Also please clarify what you mean by “similar Mg2+ binding sites”; maybe you should use some measure of RMSD of the Mg2+ ions? Also, if you have identified the same RNA structures with sufficiently different (let’s say RMSD > 2–3 Å) binding sites for the Mg2+ ions, it should be commented on.

Also, did you use any criteria or some randomising procedure when dividing the 177 RNA structures into the 5 sets or, on the contrary, did you collect RNA structures with similar sequences/3D motifs/Mg2+ binding sites in one group?

Fig. 1, panel b, left-hand-side image: looking at the “hive” of hot spots for Mg2+ binding predicted by MgNet, which surrounds the RNA molecule, it seems to me that Mg2+ can bind practically anywhere. According to the provided description of the method, MgNet outputs the probabilities of Mg2+ binding. I presume that the hot-spot density should be higher in the regions where the binding of an ion is most probable. Can this be integrated into the image, through some sort of shading? Also, if you just go from the 3D binding probability densities, why do you need clustering? Isn’t it redundant? Please comment on that.

Page 7. Typo, line 197 “experimental RNA structure, where many experimentally sites not included in the set could be…”

I believe the authors meant to write "experimentally derived" or something similar?

Fig. 2 panel A. Some variation of the success rate is seen depending on the tested set (one out of five). Can you comment on that? See my question above about the division into 5 sets. According to your data, is there some RNA structural motif that appears to be more difficult to provide a prediction for? It could be interesting to discuss these aspects.

Review: Graph deep learning locates magnesium ions in RNA — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

Comments to Author: In this article, the authors present a machine-learning (ML) approach, called MgNet, to predict Mg2+ binding sites in RNA molecules. This is an important topic, since it is often difficult to observe Mg2+ ions through cryo-EM techniques. The paper is very well presented and the ML approach is validated over a large number of Mg2+-containing structures. MgNet is based on network theory and is an interesting innovation with respect to knowledge-based methods (e.g. Metalion) and molecular dynamics approaches. In my view, the paper will be of broad interest for the RNA community, and useful for structural biologists using cryo-EM.

The paper requires a couple of minor revisions. The prediction of Mg2+ ions has made extensive use of quantum mechanical methods, an aspect that should be discussed and is missing in the current version of the paper. The authors claim that this approach can be used by structural biologists to predict metal binding sites. A GitHub link to the code is provided, but its documentation appears difficult to understand for an audience beyond computational scientists. This is an issue that should be addressed, with clear guidance in the paper.

Decision: Graph deep learning locates magnesium ions in RNA — R0/PR3

Comments

Comments to Author: Reviewer #1: In this article, the authors present a machine-learning (ML) approach, called MgNet, to predict Mg2+ binding sites in RNA molecules. This is an important topic, since it is often difficult to observe Mg2+ ions through cryo-EM techniques. The paper is very well presented and the ML approach is validated over a large number of Mg2+-containing structures. MgNet is based on network theory and is an interesting innovation with respect to knowledge-based methods (e.g. Metalion) and molecular dynamics approaches. In my view, the paper will be of broad interest for the RNA community, and useful for structural biologists using cryo-EM.

The paper requires a couple of minor revisions. The prediction of Mg2+ ions has made extensive use of quantum mechanical methods, an aspect that should be discussed and is missing in the current version of the paper. The authors claim that this approach can be used by structural biologists to predict metal binding sites. A GitHub link to the code is provided, but its documentation appears difficult to understand for an audience beyond computational scientists. This is an issue that should be addressed, with clear guidance in the paper.

Reviewer #2: Correct modelling of RNA structure is imperative for the studies of RNA-based medicines, RNA interactions with proteins, etc. In the manuscript the authors try to improve the modelling of an integral yet elusive component of RNA structure, the binding of Mg2+ ions, using a deep learning approach.

I find the manuscript interesting, highly relevant to today’s challenges in RNA modelling, and overall well-written.

My major comment is the lack of a user-friendly tutorial/documentation supporting the GitHub code for MgNet, which limits the usefulness of the approach and is a pity! This tutorial/documentation should be provided on the GitHub page and/or presented as a supplementary file in the paper.

Minor comments:

I find this sentence somewhat incomprehensible:

Page 1 lines 63-66. An flexible RNA can lead to an ensemble of low-energy conformations, and Mg2+ binding preferences may be different in different conformations, and may induce the conformational change of the target RNA (Bergonzo and Cheatham, 2017; Bergonzo et al., 2015, 2016).

You start by saying that experimental studies of Mg-RNA interactions are difficult and then provide a conclusion derived from three computational studies. Please paraphrase.

Page 5, in the beginning of the Methods section, the authors say “we remove redundant structures of the same RNA”. Please clarify what you mean by that. Do you remove RNA structures with the same sequences, or with the same set of 3D elements (e.g. hairpin, bulge, etc.)? Also please clarify what you mean by “similar Mg2+ binding sites”; maybe you should use some measure of RMSD of the Mg2+ ions? Also, if you have identified the same RNA structures with sufficiently different (let’s say RMSD > 2–3 Å) binding sites for the Mg2+ ions, it should be commented on.

Also, did you use any criteria or some randomising procedure when dividing the 177 RNA structures into the 5 sets or, on the contrary, did you collect RNA structures with similar sequences/3D motifs/Mg2+ binding sites in one group?

Fig. 1, panel b, left-hand-side image: looking at the “hive” of hot spots for Mg2+ binding predicted by MgNet, which surrounds the RNA molecule, it seems to me that Mg2+ can bind practically anywhere. According to the provided description of the method, MgNet outputs the probabilities of Mg2+ binding. I presume that the hot-spot density should be higher in the regions where the binding of an ion is most probable. Can this be integrated into the image, through some sort of shading? Also, if you just go from the 3D binding probability densities, why do you need clustering? Isn’t it redundant? Please comment on that.

Page 7. Typo, line 197 “experimental RNA structure, where many experimentally sites not included in the set could be…”

I believe the authors meant to write “experimentally derived” or something similar?

Fig. 2 panel A. Some variation of the success rate is seen depending on the tested set (one out of five). Can you comment on that? See my question above about the division into 5 sets. According to your data, is there some RNA structural motif that appears to be more difficult to provide a prediction for? It could be interesting to discuss these aspects.

Decision: Graph deep learning locates magnesium ions in RNA — R1/PR4

Comments

No accompanying comment.