Secondary And Tertiary Structures Of RNA Molecules Biology Essay

To understand the role of specific nucleotides in an RNA molecule, researchers currently introduce point mutations and observe any resulting changes in the expression or structural profile of the molecule. Such experiments are critical for identifying the mutations that modify the function and structure of RNAs. However, they are time-consuming and costly, and testing every possible mutation experimentally is not feasible. The choice of which mutations to test is thus critical. This limitation could be circumvented using computational simulations.

This report describes our progress in developing a computational method for predicting the effect of mutations on the stability of RNA structures, using both topological space and secondary structure partition functions. Since the structure of a molecule dictates its function, this work will enable us to predict multiple point mutations that could enhance or reduce the activity of mutated molecules. Such techniques are essential to the development of RNA-based gene therapies (e.g. the design of interfering RNAs to regulate gene expression) and to the study of genetic diseases (e.g. the destabilizing effects of Single Nucleotide Polymorphisms).


Accurate prediction of deleterious mutations (i.e. mutations changing the structure) in non-coding RNA is essential for assisting experimental analysis of structural RNAs, and deciphering the role of SNPs (Single Nucleotide Polymorphisms) and other mutations associated with diseases.

First, many experimental studies of biomolecules proceed through mutagenesis experiments. Broadly speaking, these experiments measure the expression levels of mutant sequences in cell cultures in order to investigate the contribution of specific nucleotides to the molecular structure and function. This process is extensively used in genetic studies. A tool that predicts potentially interesting mutations will accelerate the scientific discovery process, since an experimental investigation of all possible k-mutants is impossible. In this report, we describe our efforts to develop a fast and reliable computational method to pick the best candidates.

Second, recent studies show a correlation between genetic diseases and the destabilizing effect of SNPs. This phenomenon is still under-documented, and we expect that further analyses will reveal that destabilizing SNPs are even more common than currently suspected. A method to predict deleterious mutations would therefore be very helpful for annotating them. In particular, we expect that our methods will help to reveal potential disease-associated mutations that have not been identified through comparative genomic studies because of the lack of data and the rarity of some diseases. The development of personalized genomics and medicine reinforces the need for quick and reliable techniques for analyzing the genomic data generated by high-throughput next-generation sequencers. The methodology described in this report promises to be fast (it relies on dynamic programming algorithms) and accurate (it uses a tertiary structure model), and is thus well suited to this task.

Literature review

RNA structure prediction

Advances in sequencing technology have made abundant RNA sequence information available, but the challenge of how to interpret these data remains. Because RNA structure determination is often experimentally difficult despite advances in RNA crystallography, nuclear magnetic resonance spectroscopy, and chemical probing, RNA structure prediction is an important tool for generating hypotheses about sequence-structure-function relationships in RNA.

The MC-Fold and MC-Sym software pipeline, which mimics the hierarchical folding of RNA molecules, has recently been described. The first program, MC-Fold, determines low free energy 2D structures that are input to MC-Sym, which produces all-atom 3D structures. Both MC-Fold and MC-Sym are based on nucleotide cyclic motifs, the smallest indivisible units in a graph grammar description of RNA structure, which includes phosphodiester linkage information as well as base pairing and base stacking interactions.

RNA mutational analysis

In recent years, the importance of RNA mutational analysis has been growing. New discoveries in non-coding regulatory RNAs, as well as notable advances in the understanding of RNA viruses, have led to an increasing number of mutagenesis experiments, and in turn to the development of programs that can computationally predict and analyze the effect of unique point mutations on the structure and function of RNAs. In a variety of cases of biological importance, ranging from hepatitis C virus (HCV) replication and translation initiation to bacterial resistance against antibiotics and the functional mechanism of spliced leader RNA, it was shown that point mutations causing a conformational rearrangement in the RNA secondary structure may bring about a complete change in function.

One program that can computationally predict and analyze deleterious mutations is RNAmutants. RNAmutants allows users to analyze the low-energy ensemble of mutant RNA sequences and structures. Given an RNA sequence s of n nucleotides, an upper bound K on the number of mutations allowed, a desired number N of secondary structure samples to be generated, and a temperature T, RNAmutants computes the following for all k ≤ K: (i) the minimum free energy (MFE) structure MFE_k^T, its free energy, and the Boltzmann partition function Z_k^T, over all secondary structures of all k-point mutants; (ii) a plot of the ensemble free energy −RT ln Z_k^T as a function of k; and (iii) a collection of N RNA mutant sequences and their secondary structures, sampled using the partition function. By comparing low-energy structures of mutant RNAs with the consensus structures from the Rfam database, one can infer putative deleterious mutations.

Chemical probing

Chemical probing of RNA is a field whose basic methods were worked out 25-30 years ago. An RNA of interest is treated with a chemical reagent that 'modifies' the RNA in some way, e.g. dimethyl sulfate. The reagent can be a small organic molecule, a metal ion, or an RNase enzyme. The experiment is performed so that reaction with the RNA is relatively limited and any two modification events are uncorrelated. Modification can result either in cleavage of the RNA or in formation of a covalent chemical adduct between the RNA and the probe molecule. Chemical cleavage is usually detected by resolving end-labeled RNA fragments by size. Both classes of modification, cleavage and adduct formation, can be detected as a stop to primer extension mediated by a reverse transcriptase enzyme, with sites of modification inferred from the length of the resulting cDNA fragments.

Materials and methods

Molecules studied

To better understand the effect of mutations on the structure of RNA molecules, we first performed a thorough analysis of the effect of mutations on the human Iron Response Element (IRE). The IRE is a short conserved stem-loop, which is bound by iron response proteins. It is found in the UTRs (Untranslated Regions) of various mRNAs whose products are involved in iron metabolism. We chose the IRE for two reasons. First, it is a short sequence (29 nucleotides), enabling an exhaustive analysis of all possible SNPs. Second, it has a simple hairpin secondary structure, a common and important structure for RNA molecules, on which the effects of mutations are clearly identifiable.

We then modeled the pre-catalytic type I hammerhead ribozyme and one of its mutants to illustrate the effect of mutations on a more complex structure. Hammerhead ribozymes are RNAs that self-cleave via a small conserved secondary structural motif termed a hammerhead because of its shape.

First, we used MC-Fold|MC-Sym to predict the secondary and tertiary structures respectively, of the wild type and mutant RNA molecules. Second, we used RNAmutants to determine the "mutational landscape" of the molecules.

Modeling and predicting RNA structures with MC-Fold|MC-Sym

We used the MC-Fold|MC-Sym Web service available at IRIC. We input a sequence of nucleotides to the first program, MC-Fold, which determines low free energy 2D structures. Any of these 2D structure predictions can then be submitted to MC-Sym, which produces all-atom 3D structures.

In a benchmark test of 13 sequences of 20 to 50 nucleotides with known 3D structures, including hairpin, multi-branched, and pseudoknotted RNAs, the pipeline predicted on average 98% of all the base pairs (including the non-Watson-Crick pairs) found in their corresponding native structures. For each tested sequence, the conformational space included models that agree with the native structure to an average of 2 Å RMSD over all atoms (excluding hydrogens). These data show that the pipeline achieves a predictive accuracy of 3D structure that is superior to that of any other program. The greater accuracy of prediction is attributed to a fundamental difference between the pipeline and other approaches, namely its first-order object: the nucleotide cyclic motif (NCM). The NCMs are basic building blocks of RNA structures. Single-stranded NCMs define hairpin loops; double-stranded NCMs define tandems of base pairs, as well as interior and bulge loops. The fitting of sequences in suites of NCMs is well characterized by a scoring function of four terms. The common base pair between two adjacent NCMs determines its validity. At the 3D level, a library of 3D fragments is associated with each sequence-instantiated NCM. Building the 2D and corresponding 3D structure invokes the same fusion algorithm.

Determining the mutation landscape with RNAmutants

RNAmutants uses efficient dynamic programming algorithms to compute, in polynomial time and space, the minimum free energy structures and the Boltzmann partition function for each set of k-mutants (sequences with k mutations).

We input the IRE sequence and set the maximum number of mutations at 5. RNAmutants outputs the values of the partition function Z(k), defined as the sum of Boltzmann factors exp(−E(S)/RT) over all secondary structures S of all k-point mutants of the given RNA sequence. RNAmutants also outputs the "superoptimal" secondary structure, defined as the secondary structure having the minimum free energy over all secondary structures of all k-point mutants of the given RNA sequence.
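As a minimal sketch of the definition above, the following Python snippet sums Boltzmann factors over a short list of free energies and derives the ensemble free energy −RT ln Z. The energy values, the function names and the temperature of 310.15 K are illustrative assumptions, not RNAmutants output; RNAmutants obtains this sum by dynamic programming rather than by explicit enumeration.

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K)


def partition_function(energies, temperature=310.15):
    """Sum of Boltzmann factors exp(-E/RT) over a list of free energies (kcal/mol)."""
    rt = R * temperature
    return sum(math.exp(-e / rt) for e in energies)


def ensemble_free_energy(energies, temperature=310.15):
    """Ensemble free energy -RT ln Z for the same list of energies."""
    rt = R * temperature
    return -rt * math.log(partition_function(energies, temperature))


# Hypothetical free energies (kcal/mol) of three structures, for illustration only
toy_energies = [-5.0, -4.2, 0.0]
z = partition_function(toy_energies)
g = ensemble_free_energy(toy_energies)
```

Because Z includes every structure, the ensemble free energy `g` is always lower than the lowest single-structure energy in the list.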



Figure 1 is a graphical representation of the secondary structure of the wild-type IRE molecule, rendered by MC-Fold. Figure 5 is a graphical representation of the tertiary structure of the wild-type IRE molecule, rendered by MC-Sym and based on the MFE secondary structure output by MC-Fold. The wild-type IRE has a basic hairpin structure, an important and common shape in RNA structure.

Table 1 displays the minimum free energy secondary structures, in the dot-bracket notation, resulting from all possible SNPs of the IRE, as predicted by MC-Fold. The effect of the SNPs on the structure of the molecule varies depending on the nature and the position of the mutation.

Of the 87 mutants, 38 have a secondary structure identical to that of the wild-type molecule (See Figures 2 and 4). Notably, mutating nucleotides located in the loop, at positions 14, 15 and 16, has no effect on the molecule's structure (See Figure 2). Nine mutants have a drastically changed secondary structure, in which a three-way junction has replaced the hairpin (See Figure 3). The remaining 40 mutants, although not identical to the wild-type, conserve the basic hairpin structure (See Figure 6 for an example).

Table 2 is a more detailed look at the topospace of one particular SNP, A23C. All possible secondary structures for the SNP are displayed and ranked by free energy. It is important to look at the conformational ensemble, since the minimum energy structure is not the only structure possible or even always the most dominant.
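One way to make this point concrete is to convert a ranked list of free energies into Boltzmann probabilities. The sketch below assumes that the probability of each structure is its Boltzmann factor divided by the sum of all factors; the energies are hypothetical and are not the values of Table 2.

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temperature=310.15):
    """Probability of each structure in an ensemble, given free energies in kcal/mol."""
    rt = R * temperature
    e_min = min(energies)
    # Shift by the minimum energy for numerical stability; the shift cancels in the ratio
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical free energies (kcal/mol) of four closely ranked structures
toy_energies = [-10.0, -9.8, -9.7, -9.5]
probs = boltzmann_probabilities(toy_energies)
```

With these assumed energies, the minimum free energy structure is the most probable single structure but accounts for less than half of the ensemble, which is why the suboptimal structures cannot be ignored.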

Interestingly, the wild-type hammerhead ribozyme and the mutant SNP G4U have identical secondary structures (See Figures 7 and 8), but drastically different tertiary structures (See Figures 9 and 10).


The output of RNAmutants for IRE is displayed in Table 3. While the previous exhaustive enumeration of all mutations with MC-Fold limited us to one mutation per sequence, RNAmutants was able to analyze the effect of up to five mutations per sequence, in a few minutes, by using a partition function.


As is apparent from Table 1, not all SNPs have the same effect on the structure of RNA molecules. Some have no effect (see Figure 2). Others drastically alter the structure (see Figure 3). Both the position and the nature of the mutation determine its effect on the structure.

Mutating nucleotides in the IRE that are not involved in base-pairing interactions, such as positions 14, 15 and 16 of the loop, has no effect on the structure (See Table 1). At position 9, the conversion of uracil to guanine disrupts the stem and results in a three-branched molecule (see Figure 3), because guanine cannot base-pair with nucleotide 22, another guanine, as uracil does in the wild-type structure. On the other hand, the conversion of uracil to cytosine at the same position does not disrupt the stem, since cytosine has a strong inclination to base-pair with guanine. The wild-type structure, a single hairpin, remains intact (see Figure 4).

Our results also expose the limitations of analyzing the effect of mutations on RNA molecules with MC-Fold|MC-Sym.

First, since the number of mutations increases exponentially with the number of nucleotides, it is only possible to enumerate all mutations for very short sequences. There are 4^n different sequences of n nucleotides. Even for a short sequence such as the Iron Response Element (29 nucleotides), there are over 2.88 × 10^17 possible mutants. If we impose an arbitrary limitation of one SNP per sequence, as we did to simplify our analysis in Table 1, there are still 3 × n = 3 × 29 = 87 mutations. It is thus impossible to analyze the effect of all possible mutations using MC-Fold|MC-Sym, except on very short sequences.
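The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the count of k-point mutants, C(n, k) × 3^k, is the standard combinatorial formula rather than something taken from MC-Fold or RNAmutants. The helper `single_point_mutants` labels each SNP in the same style as Table 1 (wild-type nucleotide, position, mutated nucleotide).

```python
from math import comb


def count_sequences(n):
    """Number of distinct RNA sequences of length n."""
    return 4 ** n


def count_k_mutants(n, k):
    """Number of sequences that differ from a given sequence at exactly k positions."""
    return comb(n, k) * 3 ** k


def single_point_mutants(sequence):
    """Yield (label, mutant) for every single-nucleotide substitution of a sequence."""
    for i, wild_type in enumerate(sequence):
        for nucleotide in "ACGU":
            if nucleotide != wild_type:
                label = f"{wild_type}{i + 1}{nucleotide}"
                yield label, sequence[:i] + nucleotide + sequence[i + 1:]


total_29 = count_sequences(29)     # 288230376151711744, about 2.88e17
snps_29 = count_k_mutants(29, 1)   # 87
```

Summing `count_k_mutants(29, k)` over k from 0 to 29 recovers 4^29, which shows how quickly the number of candidates grows as more simultaneous mutations are allowed.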

Second, to simplify our analysis, we took into account only the minimum free energy secondary structure (first prediction in the list). RNA molecules have multiple conformations (Table 2) and the minimum free energy structure is not necessarily the dominant one. Ideally, all possible conformations should be taken into account to understand the effect of mutations. Again, MC-Fold|MC-Sym offers no means to compare the ensemble of secondary structures.

RNAmutants addresses these issues by determining the mutational landscape in a computationally efficient manner. However, RNAmutants is currently restricted to secondary structures. Mutations that do not affect the secondary structure may affect the tertiary structure, as illustrated by SNP G4U of the hammerhead ribozyme (See Figures 7-10). RNAmutants also does not take the NCMs into account, relying instead on thermodynamic parameters, which we believe to be a less reliable basis for structure prediction.

Finally, our analysis does not include any experimental data to validate our software predictions. This is what I will address in my second rotation.

Future research

To remedy the problems raised above, improvements to the current analysis technique are required.

Dynamic programming version of MC-Fold

The current implementation of the MC-Fold software uses a brute force approach to explore the complete conformational landscape. This strategy results in reduced performance when the software is run on large sequences. Currently, sequences longer than 100 nucleotides require several hours to complete. Nevertheless, the cycle decomposition proposed by Major's group and implemented in the MC-Fold program to predict RNA 2D structures is compatible with the recursive equations for RNA secondary structure prediction of Zuker and Stiegler. Indeed, the work of Lefebvre shows how any RNA model that can be expressed as a decomposition of structures into loops can be solved using dynamic programming techniques. We propose to re-design and re-implement MC-Fold to fit this classical dynamic programming scheme. This upgrade will enable us to run the MC-Fold program in seconds on sequences of a few hundred nucleotides, and to run the software on very large RNA molecules (more than one thousand bases).
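To illustrate the kind of recursion involved, the sketch below implements the Nussinov base-pair maximization algorithm, a much simpler relative of the Zuker and Stiegler energy minimization. It is not the MC-Fold scoring function and does not use NCMs; it is shown only as an assumption-laden example of how a decomposition of a structure into independent sub-intervals lets dynamic programming replace brute-force enumeration. The function name and the minimum loop size of 3 are illustrative choices.

```python
def nussinov_max_pairs(sequence, min_loop=3):
    """Maximum number of nested base pairs in a sequence, by dynamic programming."""
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(sequence)
    if n == 0:
        return 0
    # best[i][j] holds the maximum number of pairs within positions i..j (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired
            score = best[i][j - 1]
            # Case 2: position j pairs with k, enclosing at least min_loop unpaired bases
            for k in range(i, j - min_loop):
                if (sequence[k], sequence[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


# Hypothetical 9-nucleotide hairpin, used only as a toy input
toy_pairs = nussinov_max_pairs("GGGAAACCC")
```

The table has O(n^2) entries and each is filled in O(n) time, so the run time grows as the cube of the sequence length instead of exponentially.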

Integrating MC-Fold|MC-Sym and RNAmutants

Once the dynamic programming version of MC-Fold|MC-Sym is implemented, we will integrate the dynamic programming equations into the RNAmutants framework, a software program developed by our collaborator at McGill, Dr. Waldispühl. Such an operation is possible since the theoretical framework used by RNAmutants generalizes the techniques developed by Lefebvre. This extension will enable the analysis of the consequences of mutations on 3D structures. Currently, RNAmutants is restricted to considering secondary structures. Integrating the cycle decomposition model in RNAmutants will alleviate this restriction and, in particular, will permit the prediction of deleterious mutations that change the RNA tertiary structure while conserving the same secondary structure. Such an extension is crucial since a loss of function may result from a change in the tertiary structure that is not visible at the secondary structure level.

Integrating MC-Fold|MC-Sym and RNAmutants will also enable us to use MC-Fold|MC-Sym to view the entire mutational landscape in a computationally efficient manner, since RNAmutants uses a partition function to capture the mutational ensemble.

Validating and feeding back the RNA model

Finally, we need to validate the software predictions experimentally. After predicting the structure of molecules in silico, we will analyze them in vitro by dimethyl sulfate (DMS) footprinting. We use a protocol in which dimethyl sulfate modification of the base-pairing faces of unpaired adenosine and cytidine nucleotides is used for structural analysis of RNAs. The protocol is optimized for RNAs of small to moderate size (≤500 nt). The RNA is first exposed to DMS under conditions that promote formation of the folded structure or complex, as well as under 'control' conditions that do not allow folding or complex formation. The positions and extents of modification are then determined by primer extension, polyacrylamide gel electrophoresis and quantitative analysis. From changes in the extent of modification upon folding (appearance of a 'footprint'), it is possible to detect local changes in the secondary and tertiary structure of RNA.

First, we treat the RNAs of interest with DMS. DMS methylates the N1 of adenines and the N3 of cytosines. Second, we anneal a labeled primer to the DMS-treated RNAs and perform a reverse transcription reaction, which stops at the site of the modification. Third, we destroy the RNA and sequence the DNA. We compare the result with that of the unfolded RNA (folded in the absence of MgCl2), as well as with that of untreated RNAs (no DMS), giving us the folding-specific trace. We then compare the trace of the wild-type RNA molecule to the traces of mutant RNA molecules.
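The comparison step could be sketched as follows. This is only one plausible reading of the procedure: it assumes that per-position stop intensities are available as lists of numbers, that the untreated (no DMS) signal is subtracted as background, and that the folding-specific trace is the folded signal minus the unfolded signal. The function names, the threshold and the data are hypothetical, and the actual quantitative analysis may differ.

```python
def folding_specific_trace(folded, unfolded, untreated):
    """Background-subtract both DMS traces, then return folded minus unfolded per position."""
    if not (len(folded) == len(unfolded) == len(untreated)):
        raise ValueError("all traces must have the same length")
    trace = []
    for f, u, background in zip(folded, unfolded, untreated):
        # Clamp at zero so that noise in the untreated lane cannot create negative signal
        folded_signal = max(f - background, 0.0)
        unfolded_signal = max(u - background, 0.0)
        trace.append(folded_signal - unfolded_signal)
    return trace


def changed_positions(wild_type_trace, mutant_trace, threshold=0.5):
    """1-based positions where the two traces differ by more than the threshold."""
    return [
        i + 1
        for i, (w, m) in enumerate(zip(wild_type_trace, mutant_trace))
        if abs(w - m) > threshold
    ]


# Hypothetical two-position example
wt_trace = folding_specific_trace([1.0, 3.0], [2.0, 3.0], [0.5, 0.5])
```

A negative value in the trace marks a position that is protected from DMS upon folding, and `changed_positions` flags where a mutant trace departs from the wild-type one.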

DMS reactivity cannot be linked directly to solvent accessibility. Rather, it is a complex function that depends on the local environment. It is therefore a very good method for deciding whether or not a structural change has occurred, but analyzing the nature of the change can be difficult without good modeling. Das's group at Stanford has automated this modeling using the sequencer's output, which is much easier to understand and interpret than a sequencing gel. Das and colleagues published an algorithm for this purpose that we will adapt to our problem. An alternative approach would be the SHAPE method. The difference between DMS and SHAPE is essentially the reagent used to modify the RNA. Both methods could be used in conjunction to provide more information.

The experimental data collected will be used to validate and improve the software predictions.


The anticipated outcomes of this project are the following. First, experimental biologists will have a tool to guide them in their mutagenesis experiments and will therefore save a substantial amount of time and money. Second, the medical community and pharmaceutical companies will be able to detect and understand pathological mutations and therefore design gene therapy drugs more efficiently.

Contribution and acknowledgements

First, I would like to thank Dr. Major for accepting me into his laboratory. I would also like to thank all the other members of the laboratory for their warm welcome.

In particular, I would like to thank Dr. Major for describing the project and its future to me. I would like to thank Véronique Lisi for reviewing some of my manuscripts, Marc-Frédéric Blanchet for showing me how to use the MC-Fold|MC-Sym software, and Paul Dallaire for discussing with me the implementation of the partition function in RNAmutants.


Iron Response Element

Secondary structure

Table 1. Secondary structure predictions of the wild-type and mutants of the IRE molecule in dot-bracket notation, determined by MC-Fold. The first letter of the SNP indicates the wild-type nucleotide. The number indicates the nucleotide's position in the sequence. The second letter indicates the mutated nucleotide. Paired brackets indicate paired nucleotides. Dots indicate unpaired nucleotides.
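The dot-bracket notation described in this caption can be read mechanically. The following sketch, with a hypothetical function name and a toy structure that is not taken from Table 1, matches each closing bracket to the most recent unmatched opening bracket and returns the paired positions.

```python
def parse_dot_bracket(structure):
    """Return a sorted list of 1-based (i, j) base pairs from a dot-bracket string."""
    stack = []
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Toy hairpin: two base pairs closing a three-nucleotide loop
toy_pairs = parse_dot_bracket("((...))")
```

Two mutants have the same secondary structure exactly when their dot-bracket strings, and therefore their lists of pairs, are identical.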


Free energy rank

Secondary structure

Free energy

Table 2. Secondary structures of mutant IRE SNP A23C, ordered by decreasing amounts of free energy.

RNAmutants output for IRE

RNA sequence

Table 3. RNAmutants output for the IRE. k is the number of mutations. Z(k) is the value of the partition function over all sequences with k mutations. RNA sequence (k) is the RNA sequence with k mutations that has the lowest free energy. Mutations are in lower case. MFE(k) is the optimal secondary structure of that RNA sequence. Energy(k) is the free energy of the structure of the optimal sequence.


All figures were obtained with MC-Fold and MC-Sym.

Figure 1. IRE wild-type.

Figure 2. IRE SNP G15C.

Figure 3. IRE SNP U9G.

Figure 4. IRE SNP U9C.

Figure 5. IRE wild-type tertiary structure.

Figure 6. IRE SNP U9C tertiary structure.

Figure 7. Wild-type hammerhead ribozyme secondary structure.

Figure 8. Mutant SNP G4U hammerhead ribozyme secondary structure.

Figure 9. Wild-type hammerhead ribozyme tertiary structure.

Figure 10. Mutant SNP G4U hammerhead ribozyme tertiary structure.