A complete understanding of the function of RNA molecules requires knowledge of their higher order structures (2D and 3D) as well as the characteristics of their primary sequence. RNA structure is important for many functions, including regulation of transcription and translation, catalysis, transport of proteins across membranes and the regulation of RNA viruses. Understanding these functions is important for basic biology as well as for the development of drugs that can intervene where these molecules function pathologically.
Our group researches and develops methodologies for improving RNA folding and analysis techniques to further our understanding of the functional properties of these molecules. In addition, we are focusing on the emerging field of RNA nanobiology. RNA represents a relatively new molecular material for the development of biologically oriented nanodevices. It is an interesting material because of its natural functionalities and its ability to fold into complex structures and to self-assemble. We have developed computational and experimental methodologies that permit the design of RNA-based nanoparticles that potentially have a variety of uses. Thus, our research on RNA covers five highly related and integrated areas of research:
- Research in algorithms for RNA secondary structure prediction and analysis;
- RNA biology and its relationship to sequence and secondary structure folding characteristics;
- Research in algorithms for RNA 3D structure prediction and analysis and their application to RNA biology;
- Research in algorithms for the design and analysis of RNA nanoparticles;
- Experimental design, synthesis and delivery of RNA-based nanoparticles.
What is learned in one area is applied to the other areas, enhancing our understanding of RNA structure, function, and RNA nanobiology and self-assembly.
Parallel Computational Biology and RNA Structure
Revolutionary changes in computational paradigms are required to maintain the computational power needed to solve problems in molecular biology. Methodologies based on sequential computer architectures cannot be expected to keep pace with the computational speeds required. To accommodate these speeds, highly parallel computational techniques are now employed. Our group was one of the pioneers in computational biology and in the use of parallel high-performance computer architectures for this endeavor.
Computational Techniques for RNA Secondary Structure Prediction and Analysis
We were the first to develop an RNA folding technique that uses concepts from genetic algorithms. Our algorithm, MPGAfold, was originally developed to run on a massively parallel SIMD supercomputer, a MasPar MP-2 with 16,384 processors. The algorithm was later modified and now runs on parallel high-performance Linux clusters. It scales exceptionally well and can be run with hundreds of thousands of population elements. RNA pseudoknot prediction is part of the genetic algorithm, which gives it the ability to predict tertiary interactions. Other features include simulation of co-transcriptional folding, the ability to incorporate different energy rules, and the forced inhibition and embedding of desired helical stems. In addition, STRUCTURELAB, our heterogeneous bioinformatical RNA analysis workbench, can be used in conjunction with MPGAfold and RNA2D3D to produce predicted 3D atomic coordinates of RNA structures and to visualize these structures. We also developed a novel interactive visualization methodology that is part of STRUCTURELAB. This technique enables the comparison and analysis of RNA folds of multiple sequences from a phylogenetic point of view, thus allowing improvement of predicted structural results across a family of sequences.
We developed KNetFold, a novel and powerful algorithm for RNA structure prediction from sequence alignments. The algorithm uses a unique hierarchical classification network based on mutual information, thermodynamics and Watson-Crick base-pairedness to predict structures. In addition, we have developed a web-based application, CorreLogo, that uses mutual information derived from RNA sequence alignments to determine covariations amongst base-paired positions. The algorithm includes a unique error measure and depicts results in 3D.
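As a rough illustration of the covariation signal that mutual information captures, the sketch below computes the mutual information (in bits) between two columns of a toy alignment. It is not the KNetFold or CorreLogo implementation; the function name `mutual_information` and the toy alignment are hypothetical, and gaps, sequence weighting and the error measure mentioned above are deliberately ignored.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (bits) between alignment columns i and j.

    `alignment` is a list of equal-length sequences. This is a plain
    frequency-based estimate with no gap handling or small-sample correction.
    """
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (hypothetical): columns 0 and 4 covary as Watson-Crick
# partners, while columns 1 and 2 are fully conserved.
alignment = ["GACUC", "CACUG", "AACUU", "UACUA"]

covarying = mutual_information(alignment, 0, 4)
conserved = mutual_information(alignment, 1, 2)
```

Columns that change together to preserve pairing score high, whereas fully conserved columns score zero even if they are paired, which is one reason to combine mutual information with thermodynamic and base-pairedness information.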
We developed CyloFold, a unique algorithm for predicting, from a single sequence, RNA secondary structures that may include pseudoknots. This algorithm utilizes a novel technique that approximates the potential for 3D steric clashes in the predicted structures, so that structures with likely clashes are filtered out of consideration. The algorithm has been shown to have high accuracy when compared to other algorithms of its type.
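To make the term concrete: in a secondary structure written as a list of base pairs, a pseudoknot shows up as two pairs (i, j) and (k, l) that cross, that is, i < k < j < l. The sketch below only checks a given pair list for such crossings. It is not CyloFold's prediction or steric-clash method, and the function names are hypothetical.

```python
def crossing_pairs(pairs):
    """Return every couple of base pairs (i, j), (k, l) with i < k < j < l."""
    # Normalize each pair so the smaller index comes first, then sort.
    ordered = sorted((min(p), max(p)) for p in pairs)
    crossings = []
    for idx, (i, j) in enumerate(ordered):
        for k, l in ordered[idx + 1:]:
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


def has_pseudoknot(pairs):
    """True if at least two base pairs cross."""
    return bool(crossing_pairs(pairs))


# A simple nested hairpin stem: no crossings.
nested = [(0, 9), (1, 8), (2, 7)]

# A toy H-type arrangement: the second stem pairs bases from the first
# stem's loop with bases downstream of that stem.
knotted = [(0, 6), (1, 5), (3, 9), (4, 8)]
```

The quadratic scan is adequate for illustration; structures that pass this test cannot be drawn as a purely nested (tree-like) fold, which is why pseudoknot prediction needs algorithms beyond standard nested-structure folding.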
We developed web software based on a Bayesian statistical approach that estimates the accuracy of base pair formation from data derived from SHAPE (Selective 2'-Hydroxyl Acylation analyzed by Primer Extension) experiments. The statistical/probabilistic results were derived by analyzing known RNA 3D structures having various types of known base interactions, and correlating them with SHAPE values. It was shown that low SHAPE values correlate well with Watson-Crick base pairing and stacking interactions, while high SHAPE values indicate single-stranded regions. Improvements could be seen if a two- or three-base context was also taken into account. We also showed that other types of known interactions did not correlate well with SHAPE values. This type of information is helpful in ultimately determining the secondary structure of RNAs.
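The underlying idea can be sketched with Bayes' rule applied to a single nucleotide. The example below is a minimal illustration, not the published model: the two-bin discretization, the threshold, the prior and both likelihood values are hypothetical placeholders, whereas the actual approach derives its statistics from known 3D structures and can use neighboring-base context.

```python
def posterior_paired(shape_value,
                     prior_paired=0.5,          # hypothetical prior
                     threshold=0.5,             # hypothetical low/high cutoff
                     p_low_given_paired=0.8,    # hypothetical likelihood
                     p_low_given_unpaired=0.3): # hypothetical likelihood
    """Posterior probability that a nucleotide is base paired.

    SHAPE reactivity is reduced to two bins (low / high) and Bayes' rule
    is applied: P(paired | bin) = P(bin | paired) * P(paired) / P(bin).
    """
    is_low = shape_value < threshold
    lik_paired = p_low_given_paired if is_low else 1.0 - p_low_given_paired
    lik_unpaired = p_low_given_unpaired if is_low else 1.0 - p_low_given_unpaired
    evidence = lik_paired * prior_paired + lik_unpaired * (1.0 - prior_paired)
    return lik_paired * prior_paired / evidence


low_reactivity = posterior_paired(0.1)   # low SHAPE value
high_reactivity = posterior_paired(0.9)  # high SHAPE value
```

With these placeholder numbers a low SHAPE value raises the probability of pairing above the prior and a high value lowers it, mirroring the qualitative correlation described above.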
Computational Studies of RNA Folding Pathways
RNA folding pathways are proving to be quite important in the determination of RNA function. Studies indicate that RNA may enter intermediate conformational states that are key to its functionality. These states may have a significant impact on gene expression. It is known that the biologically functional states of RNA molecules may not correspond to their minimum energy state, that kinetic barriers may exist that trap the molecule in a local minimum, that folding often occurs during transcription, and that in some cases a molecule will pass through one or more functional conformations before reaching its native state. Thus, methods for simulating the folding pathways of an RNA molecule, including co-transcriptional folding, and for locating significant intermediate states are important for the prediction of RNA structure and its associated function. Several biological RNA folding pathways have been successfully studied using MPGAfold and STRUCTURELAB. Examples include the potato spindle tuber viroid, the host-killing mechanism of Escherichia coli plasmid R1, the hepatitis delta virus, HIV, and the dengue virus. These computational results are consistent with those derived from biological experiments. In addition, novel structural interactions and important functional intermediate and native states have been predicted. These have led to further successful confirmatory experiments.
Computational Prediction of RNA Interaction Networks
We have also developed the programs CovaRna and CovStat to explore long-range co-varying RNA interaction networks using whole genome alignments. This new methodology, which was first applied to Drosophila genomes, is currently being applied to other genomes. A parallel version of the program was devised to speed up processing. The algorithms also rely on fast indexing schemes and conservative statistical methods to determine highly significant interactions. The methodology has found interesting interactions that are related to endogenous siRNAs, gene transport and genes related to morphogenesis.
Computational Studies of Three-Dimensional RNA Structures
Some structural elements of RNA molecules have been studied using molecular mechanics and molecular dynamics simulations. The structures examined include an RNA tetraloop, for which temperature-dependent denaturation and subsequent refolding to the original crystal structure were simulated. A three-way junction from the core central domain of the 30S ribosomal subunit from Thermus thermophilus was also explored. It has been experimentally determined that the intermolecular interactions between the three-way junction and the S15 ribosomal protein initiate the assembly of the 30S ribosomal subunit. By using molecular dynamics simulations we obtained insights into the conformational transitions of the junction associated with the binding of S15. Using molecular dynamics simulations, we also determined the structural effects of utilizing new types of modified RNA nucleotides containing carbocyclic sugars that are constrained to north or south conformations (C2' or C3' exo). In addition, we showed, using molecular dynamics simulations, how ions and flanking bases play a very important role in human immunodeficiency virus (HIV) kissing loop monomer conformations. These results correlate well with, and may explain in detail, experimental studies that indicate the importance of ions for HIV-1 dimerization.
We have also examined the pseudoknot domain of telomerase. Molecular modeling and molecular dynamics of the pseudoknot domain, including its hairpin loop, were performed. Results indicated how the hairpin loop dynamics affected the opening and closing of the non-canonical U-U base pairs found in the stem. The opening suggested nucleation points for the formation of the pseudoknot. We have also examined the effect of dyskeratosis congenita (DKC) mutations in the loop and how they reduced the propensity for the opening of the stem by forming a relatively stable hydrogen bond network in the hairpin loop. We modeled the pseudoknot itself using our RNA2D3D software combined with phylogenetic analysis. We studied the dynamical impact of the DKC mutations on the pseudoknot with the result that the pseudoknot became unstable while the hairpin form became more stable.
We discovered and elucidated the 3D structures of new types of translational enhancers that are found in the 3' UTRs of the Turnip Crinkle Virus (the first of its kind found) and the Pea Enation Mosaic Virus. The discovery of these structural elements has brought to light new mechanisms for translational enhancement in plant viruses that may have broader implications for understanding translational mechanisms in general. This was accomplished with the combined use of MPGAfold, our 3D molecular modeling software RNA2D3D, and close interactions with our experimental collaborators. We also modeled a novel pseudoknot found in the CCR5 mRNA. This pseudoknot is involved in frameshifting and appears to be stabilized by a microRNA, a novel function for a microRNA.
In addition, we have employed methods based on elastic network interpolation to reduce the computational costs related to RNA 3D dynamics. Three-dimensional dynamics trajectories can be determined using a reduced atom representation and given conformational states. Compute time can be reduced from weeks to hours using this approach.
Computational RNA Nanobiology
RNA nanobiology represents a new modality for the development of nanodevices that have the potential for use in a number of areas, including therapeutics. Building on our experience as outlined above, we developed several computational and experimental techniques (see below) that provide a means to determine a set of nucleotide sequences that can assemble into desired nano complexes. One of these tools is a relational database called RNAJunction. The database contains structure and sequence information for known RNA helical junctions and kissing loop interactions. These motifs can be searched for in a variety of ways, providing a source for RNA nano building blocks. Another computational tool, NanoTiler, permits a user to construct specified RNA-based nanoscale shapes. NanoTiler provides a 3D graphical view of the objects being designed and provides the means to work interactively or with computer scripts on the design process even though the precise RNA sequences may not yet be specified, and an all-atom model is not available. NanoTiler can use the 3D motifs found in the RNAJunction database with those derived from specified RNA secondary structure patterns to build a defined RNA nano shape. Also, a combinatorial search can be applied to enumerate structures that would not normally be considered.
Another web-based software tool for RNA nanostructure design is NanoFolder, which is one of the few software tools that are capable of predicting the structure and sequence attributes of multi-stranded RNA constructs. With this software it is possible to specify the desired secondary structure motifs and have the software predict the set of sequences that generate these desired motifs with the correct intra- and inter-strand folding characteristics.
Experimental RNA Nanobiology
Based on the above-described computational approaches to RNA nanodesign, we have demonstrated the ability to experimentally self-assemble and functionalize several RNA-based nanoparticles. This was accomplished through close interaction between the experimental and computational approaches, leading to enhancements to both sets of methodologies. Examples include the self-assembly of 6- and 10-stranded cubes; the self-assembly of hexagonal rings of various sizes and double rings utilizing an RNA motif extracted from nature; the modification of sequences in the motif to improve yield while also maintaining appropriate geometries; and the self-assembly of triangular structures. We also developed techniques that define self-assembly protocols and that allow for co-transcriptional assembly of constructs that can also include modified bases to increase the chemical stability of these nanoparticles. In addition, we have functionalized these particles with up to six different siRNAs to enable controlled stoichiometry and gene silencing, and showed that these particles do indeed silence the designated genes when transfected into various cell lines.
We have also been exploring another paradigm based on the use of RNA/DNA hybrid nanoconstructs containing split functionalities. This allows, for example, the splitting of a Diceable siRNA into two DNA/RNA hybrid components with DNA toeholds, which, when transfected into cells, reassemble into a DNA duplex and a Diceable siRNA. This hybrid approach has been incorporated in our hexagonal nanorings and nanocubes. The approach permits, amongst other things, controlled activation of functionalities, incorporation of molecular beacons on the DNA strands without interfering with RNA functionality, and resistance to nuclease degradation. This approach has been tried successfully in cell cultures and xenograft tumor mouse models.
Many of the computational systems have been adapted to other environments inside and outside our laboratory and the NIH and are accessible through our web site at http://www-CCRNP.ncifcrf.gov/~bshapiro.