
RNA structure through multidimensional chemical mapping

Published online by Cambridge University Press:  18 March 2016

Siqi Tian
Affiliation:
Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
Rhiju Das*
Affiliation:
Department of Biochemistry, Stanford University, Stanford, CA 94305, USA; Department of Physics, Stanford University, Stanford, CA 94305, USA
*Author for Correspondence: Department of Biochemistry, Stanford University, Stanford, CA 94305, USA. Tel.: (650) 723-5976; E-mail: rhiju@stanford.edu

Abstract

The discoveries of myriad non-coding RNA molecules, each transiting through multiple flexible states in cells or virions, present major challenges for structure determination. Advances in high-throughput chemical mapping give new routes for characterizing entire transcriptomes in vivo, but the resulting one-dimensional data generally remain too information-poor to allow accurate de novo structure determination. Multidimensional chemical mapping (MCM) methods seek to address this challenge. Mutate-and-map (M2), RNA interaction groups by mutational profiling (RING-MaP and MaP-2D analysis) and multiplexed •OH cleavage analysis (MOHCA) measure how the chemical reactivities of every nucleotide in an RNA molecule change in response to modifications at every other nucleotide. A growing body of in vitro blind tests and compensatory mutation/rescue experiments indicates that MCM methods give consistently accurate secondary structures and global tertiary structures for ribozymes, ribosomal domains and ligand-bound riboswitch aptamers up to 200 nucleotides in length. Importantly, MCM analyses provide detailed information on structurally heterogeneous RNA states, such as ligand-free riboswitches that are functionally important but difficult to resolve with other approaches. The sequencing requirements of currently available MCM protocols scale at least quadratically with RNA length, precluding general application to transcriptomes or viral genomes at present. We propose a modify-cross-link-map (MXM) expansion to overcome this and other current limitations to resolving the in vivo ‘RNA structurome’.

Information

Type
Review
Copyright
Copyright © Cambridge University Press 2016 

Fig. 1. Schematics for multidimensional expansions of chemical mapping to infer RNA structure. (a) Schematic of 1D chemical mapping and simulated reactivity profile. The red pin illustrates a chemical modification event on an exposed (non-base-pairing) nucleotide. The red and green circles highlight a reactive (exposed) and unreactive (protected) nucleotide, respectively. (b) Schematic of 2D chemical mapping through the mutate-and-map (M2) strategy. A sequence mutation (cyan) breaks a base pair, exposing both itself and its partner (red), resulting in measurable increases in chemical reactivity at the partner (right). On a full dataset with mutations made separately at every position (right), a diagonal feature should trace perturbations near each single mutation position, while cross-diagonal features should report individual residues released upon mutation of their pairing partners. (c) Schematic of 3D chemical mapping. When all double mutants are chemically mapped, the entire dataset would fill a cube (mutate-mutate-map, M3, right). In practice, a smaller set of single and compensatory double mutations can target particular base-pair hypotheses. A quartet of chemical mapping profiles (WT, MutA, MutB, and MutAB) illustrates mutate-map-rescue (M2R, bottom). Here, perturbations that occur upon single mutations (at base pair partners, in MutA; or delocalized changes, in MutB; outlined in red) are rescued upon concomitant double mutation (outlined in green, MutAB). In all panels, simulated data are shown to illustrate concepts; see subsequent figures for experimental data. Orange dotted lines connect specific nucleotides or nucleotide pairs in RNA (left) to corresponding positions in multidimensional data (right).
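
The M2 idea in panel (b) can be illustrated with a small simulation. The sketch below is a toy model, not the authors' analysis code: the dot-bracket input, the reactivity values (0.1 for paired, 1.0 for exposed nucleotides) and the function names `pair_table` and `simulate_m2` are assumptions chosen for illustration. It reproduces the two expected features, a diagonal stripe at each mutated position and a cross-diagonal stripe at the released pairing partner.

```python
def pair_table(dot_bracket):
    """Return a dict mapping each paired index to its partner index."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def simulate_m2(dot_bracket, paired=0.1, exposed=1.0):
    """Simulate a toy mutate-and-map dataset.

    Row 0 is the wild type profile; row k + 1 is the profile after mutating
    position k. A mutation exposes itself (diagonal feature) and, if it was
    base paired, its partner (cross-diagonal feature).
    """
    pairs = pair_table(dot_bracket)
    wild_type = [paired if i in pairs else exposed
                 for i in range(len(dot_bracket))]
    rows = [wild_type]
    for k in range(len(dot_bracket)):
        profile = list(wild_type)
        profile[k] = exposed
        if k in pairs:
            profile[pairs[k]] = exposed
        rows.append(profile)
    return rows


# A 4-bp hairpin with a 4-nt loop (hypothetical test structure).
m2 = simulate_m2("((((....))))")
```

Real data are noisier: as the later figure captions note, some mutations disrupt an entire helix or induce alternative structures rather than releasing a single partner.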

Table 1. Multidimensional chemical mapping methods for RNA structure characterization

Fig. 2. Proof-of-concept experiments for the M2 methodology. (a) Experimental M2 measurements (left) and secondary structure (right) of an H-20/X-20 DNA/RNA hybrid construct (Kladwang & Das, 2010). Single mutations of the H-20 DNA result in mismatches in the hybrid helix, exposing nucleotides in the X-20 RNA (purple) to DMS chemical modification. Purple line outlines region with expected base pair features; orange, blue, and green circles highlight a few strong features that correspond to expected base pairs. (b) M2 data and secondary structure of a MedLoop test RNA (Kladwang et al. 2011a). The test helix is designed to be mostly A/C on one side and U/G on the other. DMS (blue) and CMCT (red) M2 datasets are overlaid. Regions corresponding to expected base pairs from the stem are outlined in green on the data. Yellow and cyan circles mark a few single-nucleotide features in the M2 data (left) that demarcate specific base pairs (right). In both (a) and (b), yellow arrows mark perturbations from mutation that extend beyond ‘punctate’ release of a single base pair and involve disruption of an entire helix. RMDB Accession IDs for datasets shown: (a). X20H20_DMS_0001; (b). MDLOOP_DMS_0002 and MDLOOP_CMC_0002.

Fig. 3. M2 reveals secondary structure of natural non-coding RNA domains. (a) M2 data and secondary structures of a double glycine riboswitch from F. nucleatum (Butler et al. 2011; Lipfert et al. 2007, 2010). RNA was probed in presence of 10 mM glycine. M2–SHAPE data are shown with helices outlined according to their assigned color. Solid outlines mark helices in which mutations cause punctate or localized increases of SHAPE reactivity around its expected partner, providing evidence for the helix; dashed outlines mark helices that do not give clear mutate-and-map signals. Magenta arrows mark exposure of P3-I loop upon disruption of tertiary structure that results not only from mutation of its tertiary contact partner (PI-II) but also from mutations in other helices. In secondary structures, bootstrapping confidence scores are marked under helix labels. The M2 predicted model using the automated Z-score analysis captured all six helices with > 80% bootstrapping support except for P3-I, which also has an extra base pair. (b) M2 data and secondary structures of the GIR1 lariat-capping ribozyme from D. iridis, RNA-Puzzle 5 (Miao et al. 2015). The data captured all helices and the pk2.1-5 tertiary contact observed in the subsequently released crystal structure (Meyer et al. 2014). Both a P5 helix (dark green) and an alternative alt-P5 (dark red), differing by a single-nucleotide register shift, were modeled by M2 with similar bootstrap supports. Visual inspection of M2-DMS [not shown; see (Miao et al. 2015)] suggested a tertiary contact involving non-canonical pairs between P9 and P2 (gray) that was indeed observed in the subsequently released crystal structure. (c) M2 data and secondary structures of the ydaO cyclic-di-adenosine riboswitch, RNA-Puzzle 12 (Gao & Serganov, 2014; Ren & Patel, 2014). RNA was probed in presence of 10 µM c-di-AMP. The differences of each model compared with the subsequently released crystallographic structure are marked by magenta and gray lines. The secondary structure based on expert sequence analysis (left), assumed by all RNA-Puzzle modelers, included an incorrect P4 (dark red), while the M2 predicted model (right) correctly rearranged this region. (d) M2R data and secondary structures of the GIR1 lariat-capping ribozyme from D. iridis. The discrepancy in M2-predicted model was resolved by M2-rescue data testing base pairs in P5 and alt-P5, showing that compensatory double mutations predicted to rescue P5 succeeded in restoring the sequence's chemical mapping profile (outlined in green) after their disruption by single mutations (outlined in red), while double mutants based on alt-P5 failed to rescue the profile. In panels (a)–(c), yellow arrows mark perturbations from mutation that involve disruption of helices or formation of alternative secondary structure. In panels (a) and (b), rows with red asterisks are mutants for which data were not acquired; to aid visual inspection, these rows have been filled in with wild type data. RMDB Accession IDs for datasets shown: (a). GLYCFN_SHP_0004; (b). RNAPZ5_1M7_0002; (c). RNAPZ12_1M7_0003; (d). unpublished result.
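
The caption refers to an automated Z-score analysis of M2 data. A minimal sketch of that kind of calculation is given here, assuming a square matrix in which row i is the reactivity profile of the mutant at position i. The function names, the threshold of 1.5 and the toy data are illustrative assumptions; the published pipeline applies further filtering and feeds the scores into secondary structure prediction with bootstrapping, none of which is reproduced here.

```python
from statistics import mean, pstdev


def m2_z_scores(reactivity):
    """Z-score each position's reactivity across all mutants.

    reactivity[i][j] is the reactivity at nucleotide j in mutant i.
    """
    n_pos = len(reactivity[0])
    z = [[0.0] * n_pos for _ in reactivity]
    for j in range(n_pos):
        column = [row[j] for row in reactivity]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # position never changes; leave its Z-scores at 0
        for i, value in enumerate(column):
            z[i][j] = (value - mu) / sigma
    return z


def candidate_pairs(z, threshold=1.5):
    """List off-diagonal (mutant, position) features above the threshold."""
    return [(i, j) for i, row in enumerate(z)
            for j, score in enumerate(row) if i != j and score > threshold]


# Toy data: 10 positions, every mutant exposes itself, and positions 2 and 7
# (a hypothetical base pair) expose each other.
n = 10
toy = [[0.1] * n for _ in range(n)]
for i in range(n):
    toy[i][i] = 1.0
toy[2][7] = toy[7][2] = 1.0

z = m2_z_scores(toy)
hits = candidate_pairs(z)  # the two off-diagonal features, (2, 7) and (7, 2)
```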

Fig. 4. Schematic of single-molecule correlated modification mapping and data comparison for the Tetrahymena group I intron P4–P6 domain. (a) Schematic of how multiple modifications can read out RNA structure. A primary modification serves as a ‘mutation’ similar to M2, leading to a correlated secondary modification at its base-pairing partner. Multiple chemical modification events on the same RNA are read out by reverse transcription under conditions in which mismatch nucleotides are incorporated into cDNA at modification sites. Simulated data are shown. (b) Secondary structure of the Tetrahymena group I intron P4–P6 domain. (c) M2-DMS measurements for the P4–P6 RNA; helix features color-coded as in (b). (d) Data using DMS in multiple-hit conditions, collected previously for RNA Interaction Group (RING-MaP) analysis (Homan et al. 2014) but displayed here in a distinct ‘MaP-2D’ view. The rate of modifications at each nucleotide position, given a detection of nucleotide modification at every other position, is shown. Each row shows such a profile, normalized by the sum of counts at each position. In panels (c) and (d), red arrows mark exposure of the P5b loop upon disruption of the RNA tertiary structure from not only mutation of this loop's ‘receptor’ (J6a/b) but also other helix perturbations. RMDB Accession IDs for datasets shown: (c) TRP4P6_DMS_0002; (d) adapted from (Homan et al. 2014).
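
The ‘MaP-2D’ view in panel (d) is a table of conditional modification rates. The sketch below shows one way such a table could be computed from per-read modification calls. It is an assumption-laden illustration: the input format (one 0/1 list per sequenced molecule), the function name `map_2d`, and the choice to normalize each row by the number of reads modified at the conditioning position are all hypothetical, and the published processing may normalize differently.

```python
def map_2d(reads):
    """Conditional modification rates from per-read modification calls.

    reads is a list of equal-length 0/1 lists, one per sequenced molecule.
    Entry [i][j] of the result estimates P(modified at j | modified at i).
    Rows for positions that are never modified are left at zero.
    """
    n = len(reads[0])
    joint = [[0] * n for _ in range(n)]
    for read in reads:
        hits = [k for k, flag in enumerate(read) if flag]
        for i in hits:
            for j in hits:
                joint[i][j] += 1
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total = joint[i][i]  # number of reads modified at position i
        if total:
            rates[i] = [joint[i][j] / total for j in range(n)]
    return rates


# Four toy reads over four positions; positions 0 and 3 tend to co-occur.
reads = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
rates = map_2d(reads)
```

In this toy input, two of the three reads modified at position 0 are also modified at position 3, so `rates[0][3]` is 2/3, while every read modified at position 3 is also modified at position 0.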

Fig. 5. MOHCA-seq provides pairwise tertiary proximity information of RNA. (a) Schematic of MOHCA-seq (multiplexed •OH cleavage analysis read out by deep sequencing). After generation of hydroxyl radicals (•OH, purple), a strand scission event (red lightning bolt) and the corresponding iron chelate radical source position (yellow circle marked Fe) can be mapped out by subsequent reverse transcription to cDNA (green arrow) and paired-end sequencing. Simulated data are shown. (b) Additional oxidative damage events (red pins) that were not detectable in the original gel-based readout of MOHCA but are detectable by MOHCA-seq through termination of reverse transcription (green arrows). (c–e) MOHCA-seq data and tertiary structure models of (c) a double-aptamer glycine riboswitch from F. nucleatum with 10 mM glycine with cross-aptamer tertiary contacts (magenta arrows in MOHCA-seq map), (d) the GIR1 lariat-capping ribozyme from D. iridis, RNA-Puzzle 5, and (e) the ydaO cyclic-di-adenosine riboswitch with 10 µM c-di-AMP, RNA-Puzzle 12. The latter two are blind tests. Structures labeled ‘MCM predicted model’ were based on an MCM pipeline of M2 secondary structure analysis, MOHCA-seq tertiary proximity mapping, and Rosetta computational modeling. Crystal structures are from the Protein Data Bank (PDB): (c) 3P49, (d) 4P8Z, (e) 4QK8. In (d), red asterisks mark two positions that undergo catalytic modification (lariat formation and hydrolytic scission) by the ribozyme; for visual clarity, data at those positions are not shown. MOHCA-seq maps of (c–e) are filtered to show features with signal-to-noise ratios above 2 (different from a cutoff of 1 in (Cheng et al. 2015b)). Cyan contours highlight map features corresponding to each secondary structure helix. Other contours mark hits that were inferred through visual inspection of MOHCA-seq maps; to aid visual comparison, only contours including at least one residue pair with phosphorus–phosphorus (P–P) distance <45 Å in the crystal structure are shown. Coloring of these tertiary contours reflects P–P distances of closest approach for residue pairs in the MCM predicted models (green, <30 Å; yellow, 30–45 Å; red, >45 Å). The same coloring is shown for cylinders in bottom panels of structures, which connect pairs of residues of closest distance corresponding to each contour; thick and thin cylinders correspond to strong and weak hits in (Cheng et al. 2015b). Each 3D model is shown with colored cylinders, or helices with matching color as in Fig. 3. MOHCA-seq maps have colored axes matching secondary structure in Fig. 3. In (e), gray spheres show positions of two c-di-AMP ligands in both model and crystal structure. RMDB Accession IDs for datasets shown: (c). GLYCFN_MCA_0002; (d). RNAPZ5_MCA_0001; (e–f). RNAPZ12_MCA_0000.
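
The color coding of tertiary contours described above amounts to a simple distance classification. The sketch below applies the thresholds stated in the caption (green, <30 Å; yellow, 30–45 Å; red, >45 Å) to the P–P distance of closest approach within a contour. The coordinate dictionary, residue numbers and function names are hypothetical, and placing exactly 30 Å and 45 Å in the yellow bin is an assumption about the boundaries.

```python
from math import dist


def closest_pp_distance(phosphates, residue_pairs):
    """Smallest P-P distance (in angstroms) among the residue pairs of a contour.

    phosphates maps residue number -> (x, y, z) of its phosphorus atom.
    """
    return min(dist(phosphates[i], phosphates[j]) for i, j in residue_pairs)


def contour_color(distance):
    """Map a closest-approach P-P distance to the caption's color scheme."""
    if distance < 30.0:
        return "green"
    if distance <= 45.0:
        return "yellow"
    return "red"


# Hypothetical phosphorus coordinates for three residues of a model.
phosphates = {10: (0.0, 0.0, 0.0), 55: (0.0, 0.0, 40.0), 90: (0.0, 0.0, 100.0)}
d = closest_pp_distance(phosphates, [(10, 55), (10, 90)])
color = contour_color(d)  # 40 angstroms falls in the yellow band
```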

Fig. 6. M2-REEFFIT reveals hidden states in secondary structure ensembles. (a, b) M2 data (left), fitted cluster weights (center), and fits from REEFFIT (right) of the ‘Tebowned’ riboswitch designed to interconvert between two states upon binding of flavin mononucleotide (FMN). RNA was probed (a) in absence of FMN and (b) in presence of 2 mM FMN. Red rectangles in (a) mark nucleotide A30, which was not expected to be reactive in either of two target states of the riboswitch, but is explained by a third state uncovered by REEFFIT. (c) Secondary structures of REEFFIT predicted states. TBWN-A and TBWN-B were target states of the riboswitch design problem; TBWN-C was an unexpected state modeled by REEFFIT. (d) Prospective tests of REEFFIT model. 1D-SHAPE profiles of each state-stabilizing mutant agree well with the SHAPE profiles predicted from REEFFIT analysis. Red rectangle marks nucleotide A30, predicted and confirmed to be exposed in TBWN-C-stabilizing mutants. Data are from (Cordero & Das, 2015). RMDB Accession IDs for datasets shown: (a). TBWN_1M7_0000; (b). TBWN_1M7_0001; (d). TBWN_STB_0000.
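
The ‘fitted cluster weights’ in panels (a) and (b) express an observed reactivity profile as a population-weighted mixture of per-state profiles. REEFFIT itself is considerably more elaborate and is not reproduced here; the sketch below only illustrates the mixture idea for the simplest case of two states with known profiles, using a closed-form least-squares weight. The function name and the toy profiles are assumptions.

```python
def two_state_weight(observed, profile_a, profile_b):
    """Least-squares weight w of state A in observed ~ w*A + (1 - w)*B.

    The weight is clipped to [0, 1] so that it can be read as a population
    fraction.
    """
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    denominator = sum(d * d for d in diff)
    if denominator == 0:
        raise ValueError("state profiles are identical; weight is undefined")
    numerator = sum((o - b) * d
                    for o, b, d in zip(observed, profile_b, diff))
    return min(1.0, max(0.0, numerator / denominator))


# Toy profiles for two hypothetical states and a 70:30 mixture of them.
state_a = [1.0, 0.0, 1.0, 0.0]
state_b = [0.0, 1.0, 0.0, 1.0]
mixture = [0.7 * a + 0.3 * b for a, b in zip(state_a, state_b)]
weight_a = two_state_weight(mixture, state_a, state_b)
```

A nucleotide such as A30 in the caption, reactive in the data but in neither target-state profile, would leave a residual that no two-state weight can absorb, which is the kind of signal that motivates adding a third state.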

Fig. 7. MOHCA-seq detects preformed tertiary contacts in riboswitches. MOHCA-seq data and tertiary structures for (a) a double glycine riboswitch from F. nucleatum (including a kink-turn-forming leader sequence), probed in presence of 10 mM glycine (left) or in absence of glycine (right); and (b) an adenosylcobalamin (AdoCbl) riboswitch from S. thermophilus (Peselis & Serganov, 2012), probed in presence of 70 µM AdoCbl (left) or in absence of AdoCbl (right). In each right panel, five MCM predicted models with lowest Rosetta energy provide an initial visualization of the ligand-free ensemble compared with the ligand-bound crystallographic structure (left panel). MOHCA-seq map filtering and color-coded contours in left panels (ligand-bound states) are the same as in Fig. 5, except that contours for tertiary contacts are colored uniformly in magenta. The same contours are shown in right-hand panels (ligand-free states). Yellow arrows point to regions in the MOHCA-seq maps showing tertiary contacts in the ligand-bound states (left) that appear at lower intensity in the ligand-free states (right). To avoid clutter, not all such hits are marked. RMDB Accession IDs for datasets shown: (a). GLYCFN_KNK_0005 and GLYCFN_KNK_0006; (b). RNAPZ6_MCA_0002 and RNAPZ6_MCA_0003.

Fig. 8. Subsampling of MCM data to determine minimal number of sequencing reads to infer RNA structure. MOHCA-seq data of a double glycine riboswitch from F. nucleatum were used (see also Fig. 3a). A subset (1, 1/5, 1/500, and 1/5000) of the raw FASTQ file was randomly resampled and subjected to the complete COHCOA data processing and error estimation pipeline (Cheng et al. 2015b). Signal-to-noise ratio was estimated as the ratio between the mean of reactivity and the mean of statistical error across the whole dataset. Yellow arrows point to tertiary features that disappear as the number of resampled reads decreases. RMDB Accession IDs for datasets shown: GLYCFN_MCA_0002.
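
The subsampling test and the signal-to-noise definition in this caption can be sketched in a few lines. This is not the COHCOA pipeline: the function names are hypothetical, reads are drawn without replacement (the caption does not say which resampling scheme was used), and the reactivity and error arrays are assumed to have been computed elsewhere.

```python
import random
from statistics import mean


def subsample(reads, fraction, seed=0):
    """Randomly keep the given fraction of reads, without replacement."""
    rng = random.Random(seed)
    keep = max(1, round(len(reads) * fraction))
    return rng.sample(reads, keep)


def signal_to_noise(reactivity, error):
    """Mean reactivity divided by mean statistical error over the dataset."""
    return mean(reactivity) / mean(error)


# Toy usage: 1000 placeholder reads reduced to one fifth.
reads = list(range(1000))
subset = subsample(reads, 1 / 5)

# Toy reactivities and their statistical errors, flattened over the 2D map.
snr = signal_to_noise([4.0, 6.0], [1.0, 1.0])
```

Repeating the downstream analysis on each subset and tracking this ratio gives the kind of read-depth threshold that the caption describes.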

Fig. 9. Scaling of sequencing costs for MCM. Expected sequencing costs (number of reads) versus RNA lengths, plotted on (a) linear scale and (b) logarithmic scale. The plotted values are the number of reads required to achieve usable signal-to-noise levels for 1D, 2D, and 3D chemical mapping methods described or proposed in text. Costs are estimated based on publicly available data for a number of RNAs and transcriptomes and the subsampling procedure described in Fig. 8. Most M2 data (orange triangles) were collected by capillary electrophoresis (CE); conversion to number of Illumina reads was achieved by comparison of signal-to-noise values of CE and Illumina datasets for a 16S rRNA 126–235 four-way junction, for which both measurements are available. References for next-generation sequencing technologies for 1D mapping (blue circles): SHAPE-Seq (Lucks et al. 2011), MAP-Seq (Seetin et al. 2014), SHAPE-MaP (Siegfried et al. 2014; Mauger et al. 2015), HRF-Seq (Kielpinski & Vinther, 2014), Mod-Seq (Talkish et al. 2014), PARS (Kertesz et al. 2010; Wan et al. 2012, 2014), DMS-Seq (Ding et al. 2014; Rouskin et al. 2014), CIRS-Seq (Incarnato et al. 2014), icSHAPE (Spitale et al. 2015). For the studies in which the number of total raw reads was not reported explicitly, plotted values were estimated by total length × coverage/average read length. Statistics (blue squares) from the Eterna massive open laboratory (Lee et al. 2014) used the MAP-Seq protocol and involved up to a thousand sequences per round; separate rounds are shown as separate data points. RMDB Accession IDs for datasets shown: (1D). 16SFWJ_STD_0001, TRP4P6_1M7_0006, ETERNA_R80_0001, ETERNA_R82_0001, ETERNA_R83_0003, ETERNA_R86_0000, ETERNA_R87_0003, ETERNA_R92_0000, ETERNA_R93_0000, ETERNA_R94_0000; (2D-M2). 16SFWJ_1M7_0001, 5SRRNA_SHP_0002, ADDRSW_SHP_0003, CIDGMP_SHP_0002, CL1LIG_1M7_0001, GLYCFN_SHP_0004, HOXA9D_1M7_0001, RNAPZ5_1M7_0002, RNAPZ6_1M7_0002, RNAPZ7_1M7_0001, RNAPZ12_1M7_0003, TRNAPH_SHP_0002, TRP4P6_SHP_0003; (2D-MaP). adapted from (Homan et al. 2014); (2D-MOHCA). 16SFWJ_MCA_0003, 5SRRNA_MCA_0001, CDIGMP_MCA_0003, GLYCFN_MCA_0002, HCIRES_MCA_0001, HOXA9D_MCA_0001, RNAPZ5_MCA_0001, RNAPZ6_MCA_0002, RNAPZ7_MCA_0001, TRP4P6_MCA_0004; (3D-M2-rescue). 16SFWJ_RSQ_0001.
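
Two pieces of arithmetic underlie this figure: the read-count estimate quoted in the caption (total length × coverage / average read length) and the extrapolation of cost with RNA length. The sketch below writes both out. The power-law exponents (1, 2 and 3 for 1D, 2D and 3D mapping) are a simplifying assumption, consistent with the abstract's statement that MCM costs scale at least quadratically, and all numbers in the usage lines are hypothetical rather than values from the figure.

```python
def estimated_reads(total_length, coverage, average_read_length):
    """Estimate raw read count as total length x coverage / average read length."""
    return total_length * coverage / average_read_length


def scale_reads(reference_reads, reference_length, length, exponent):
    """Extrapolate read cost to a new RNA length under a power-law assumption.

    exponent = 1, 2 or 3 stands in for 1D, 2D or 3D mapping; real costs may
    grow faster than this lower-bound model.
    """
    return reference_reads * (length / reference_length) ** exponent


# Hypothetical study: 10 Mb of target sequence at 50x coverage, 100-nt reads.
reads_1d = estimated_reads(10_000_000, 50, 100)

# Hypothetical 2D experiment: doubling RNA length quadruples the read cost.
reads_2d = scale_reads(1_000_000, 200, 400, exponent=2)
```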

Fig. 10. Schematic of the proposed modify-cross-link-map (MXM) expansion. (a) Correlated chemical modifications mark nucleotides brought together by RNA/protein structure in vivo. Shown are two sets of oxidative modifications produced by localized ‘spurs’ of hydroxyl radicals generated by scattering of a high-energy electron from water (Chatterjee et al. 1994; Krisch et al. 1991). (b) Additional processing steps of (i) sparse chemical cross-linking, (ii) nuclease digestion, and (iii) RNA ligation (Helwak & Tollervey, 2014) result in compact, chimeric RNA segments harboring correlated chemical modifications. This procedure removes unstructured RNA loops that yield no pairwise structural information and brings together segments distal in sequence or in different RNA strands. (c) Reverse transcription with mutational profiling (Siegfried et al. 2014) reads out modifications at nucleotide resolution; sequence contexts for the modifications allow their alignment to the reference genome sequence.