Bruce A. Shapiro, Ph.D.
- Center for Cancer Research
- National Cancer Institute
- Building 558, Room 3
- Frederick, MD 21702-1201
Dr. Shapiro directed research on computational and experimental RNA structure prediction and analysis and pioneered research in the emerging field of RNA nanobiology. His work led to several novel RNA folding and analysis algorithms, experimental techniques and discoveries in RNA biology. His interests included RNA nanobiology, nucleic acid structure prediction and analysis, and the relationships between RNA structure and function. He fostered a synergy between computational and experimental techniques, in which computationally designed novel RNA-based nanostructures were shown to self-assemble as predicted and could be delivered to cell cultures and mouse models to control gene expression, thus showing potential for use in RNA-based therapeutics. For additional information, please visit our web site at https://rnastructure.cancer.gov
Areas of Expertise
1) RNA structure
2) RNA folding
3) RNA nanobiology – computational and experimental
4) computational RNA structure prediction and analysis
5) molecular dynamics
6) RNA 3D modeling
A complete understanding of the function of RNA molecules requires knowledge of their higher order structures (2D and 3D) as well as the characteristics of their primary sequence. RNA structure is important for many functions, including regulation of transcription and translation, catalysis, transport of proteins across membranes and the regulation of RNA viruses. Understanding these functions is important for basic biology as well as for the development of drugs that can intervene in cases where pathological functionality of these molecules occurs.
Our group pioneered research and development of methodologies for improving RNA folding and analysis techniques to help further the understanding of the functional properties of these molecules. In addition, we focused on the emerging field of RNA nanobiology. RNA represents a relatively new molecular material for the development of biologically oriented nano devices. It is an interesting material because of its natural functionalities, its ability to fold into complex structures and self-assemble. We developed computational and experimental methodologies that permit the design of RNA-based nanoparticles that potentially have a variety of uses. Thus, our research on RNA covered five highly related and integrated areas of research:
- Research in algorithms for RNA secondary structure prediction and analysis;
- RNA biology and its relationship to sequence and secondary structure folding characteristics;
- Research in algorithms for RNA 3D structure prediction and analysis and their application to RNA biology;
- Research in algorithms for the design and analysis of RNA nanoparticles;
- Experimental design, synthesis and delivery of RNA-based nanoparticles.
What is learned in one area is applied to the other areas, enhancing our understanding of RNA structure, function, and RNA nanobiology and self-assembly.
Parallel Computational Biology and RNA Structure
Revolutionary changes in computational paradigms were required to maintain the computational power needed to solve problems in molecular biology. Methodologies based on sequential computer architectures could not be expected to keep pace with the needed computational speeds. To accommodate the necessary speeds, highly parallel computational techniques are now employed. Our group was one of the pioneers in computational biology and in the use of parallel high-performance computer architectures for this endeavor.
Computational Techniques for RNA Secondary Structure Prediction and Analysis
We were the first to develop an RNA folding technique that uses concepts from genetic algorithms. Our algorithm, MPGAfold, was originally developed to run on a massively parallel SIMD supercomputer, a MasPar MP-2 with 16384 processors. This algorithm was modified and now runs on parallel high-performance Linux clusters. Exceptional scaling characteristics are obtained with the ability to run the algorithm with hundreds of thousands of population elements. RNA pseudoknot prediction is part of the genetic algorithm, resulting in its ability to predict tertiary interactions. Other features include simulation of co-transcriptional folding, the ability to incorporate different energy rules, and the forced inhibition and embedding of desired helical stems. In addition, STRUCTURELAB, our heterogeneous bioinformatical RNA analysis workbench, can be used in conjunction with MPGAfold and RNA2D3D to produce predicted 3D atomic coordinates of RNA structures along with the visualization of these structures. Also, we developed a novel interactive visualization methodology that is part of STRUCTURELAB. This technique enables the comparison and analysis of multiple sequence RNA folds from a phylogenetic point of view, thus allowing improvement of predicted structural results across a family of sequences.
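To make the genetic-algorithm idea concrete, the following is a toy sketch and not MPGAfold itself. It assumes a tiny, invented pool of candidate helical stems with made-up energies, treats a structure as a set of stems that share no bases (crossing stems are allowed, so pseudoknot-like combinations can appear), and evolves a small population by crossover, mutation and elitist selection. All names, stem coordinates and energy values are hypothetical.

```python
import random

# Hypothetical candidate stems: (5' start, 3' end, length, energy in kcal/mol).
# A stem pairs bases start..start+length-1 with bases end-length+1..end.
STEMS = [
    (0, 30, 5, -6.0),
    (7, 22, 4, -4.5),
    (3, 15, 4, -5.0),
    (18, 28, 3, -2.5),
]

def bases(stem):
    """Return the set of sequence positions occupied by a stem."""
    start, end, length, _ = stem
    return set(range(start, start + length)) | set(range(end - length + 1, end + 1))

def compatible(stem, structure):
    """A stem fits a structure if it shares no base with any stem already in it."""
    return all(bases(stem).isdisjoint(bases(other)) for other in structure)

def energy(structure):
    """Fitness: sum of stem energies (lower is better)."""
    return sum(stem[3] for stem in structure)

def mutate(structure, rng):
    """Try to add one randomly chosen stem; keep the structure unchanged if it clashes."""
    candidate = rng.choice(STEMS)
    if compatible(candidate, structure):
        return structure | {candidate}
    return structure

def crossover(parent_a, parent_b, rng):
    """Merge the stems of two parents in random order, skipping clashing stems."""
    pool = list(parent_a | parent_b)
    rng.shuffle(pool)
    child = set()
    for stem in pool:
        if compatible(stem, child):
            child.add(stem)
    return frozenset(child)

def fold(generations=50, population_size=20, seed=0):
    """Evolve a population of stem sets and return the lowest-energy one found."""
    rng = random.Random(seed)
    population = [frozenset([rng.choice(STEMS)]) for _ in range(population_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(population_size):
            parent_a, parent_b = rng.sample(population, 2)
            offspring.append(mutate(crossover(parent_a, parent_b, rng), rng))
        # Elitist selection: keep the best individuals from parents and offspring.
        population = sorted(population + offspring, key=energy)[:population_size]
    return population[0]

best = fold()
print(sorted(best), energy(best))
```

Because selection keeps parents alongside offspring, the best energy in the population never gets worse from one generation to the next. A production algorithm would generate the stem pool from the sequence and score it with real thermodynamic energy rules.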
We developed KNetFold, a novel and powerful algorithm for RNA structure prediction from sequence alignments. The algorithm uses a unique hierarchical classification network based on mutual information, thermodynamics and Watson-Crick base-pairedness to predict structures. In addition, we developed a web-based application, CorreLogo, that uses mutual information derived from RNA sequence alignments to determine covariations amongst base-paired positions. The algorithm includes a unique error measure and depicts results in 3D.
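The mutual-information signal that both tools draw on can be illustrated with a short sketch. This is the standard textbook definition computed on a made-up four-sequence alignment; it is not the KNetFold classifier or the CorreLogo error measure, and the function and variable names are assumptions chosen for illustration.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

# Hypothetical toy alignment: columns 0 and 4 covary as Watson-Crick pairs,
# while columns 1 to 3 are fully conserved.
toy_alignment = [
    "GAAAC",
    "CAAAG",
    "AAAAU",
    "UAAAA",
]

print(mutual_information(toy_alignment, 0, 4))  # covarying columns
print(mutual_information(toy_alignment, 1, 2))  # conserved columns
```

In this toy case the covarying columns reach the maximum of 2 bits for a four-letter alphabet, while the conserved columns score 0. This is why mutual information alone cannot detect conserved base pairs and is usefully combined with thermodynamic information.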
We developed CyloFold, a unique algorithm for predicting, from a single sequence, RNA secondary structures that may include pseudoknots. This algorithm utilizes a novel technique that approximates the potential for 3D steric clashes in the predicted structures, thus filtering out those structures from consideration. The algorithm was shown to have high accuracy when compared to other algorithms of its type.
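For readers unfamiliar with the term, a pseudoknot can be defined purely in terms of base-pair indices: two base pairs (i, j) and (k, l) cross when i < k < j < l. The sketch below checks a list of base pairs for such crossings. It only illustrates the definition and is unrelated to CyloFold's steric-clash approximation; the example structures are invented.

```python
from itertools import combinations

def crossing_pairs(base_pairs):
    """Return every pair of base pairs (i, j), (k, l) with i < k < j < l."""
    normalized = sorted(tuple(sorted(pair)) for pair in base_pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings

def has_pseudoknot(base_pairs):
    """A structure contains a pseudoknot if any two of its base pairs cross."""
    return bool(crossing_pairs(base_pairs))

# Hypothetical examples: a simple hairpin stem and an H-type pseudoknot.
hairpin = [(0, 9), (1, 8), (2, 7)]
h_type = [(0, 10), (1, 9), (5, 15), (6, 14)]

print(has_pseudoknot(hairpin))
print(has_pseudoknot(h_type))
```

Fully nested structures such as the hairpin contain no crossings, which is the property that classical dynamic-programming folding algorithms rely on. Structures with crossings need a different search strategy.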
We developed web software based on a Bayesian statistical approach that estimates the accuracy of base pair formation from data derived from SHAPE (Selective 2'-Hydroxyl Acylation analyzed by Primer Extension) experiments. The statistical/probabilistic results were derived by analyzing known RNA 3D structures having various types of known base interactions, and correlating them with SHAPE values. It was shown that low SHAPE values correlate well with Watson-Crick base pairing and stacking interactions, while high SHAPE values indicate single-stranded regions. Improvements could be seen if a 2- or 3-base context was also taken into account. We also showed that other types of known interactions did not correlate well with SHAPE values. This type of information is helpful in ultimately determining the secondary structure of RNAs.
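The underlying Bayesian reasoning can be sketched with Bayes' rule applied to binned SHAPE reactivities. The likelihood values below are invented placeholders chosen only to reflect the qualitative trend described above (low reactivity favors pairing); they are not taken from the web software or from any published analysis, and the bin names and function name are likewise assumptions.

```python
# Hypothetical likelihoods P(SHAPE bin | state); placeholder values only.
LIKELIHOOD_PAIRED = {"low": 0.7, "medium": 0.2, "high": 0.1}
LIKELIHOOD_UNPAIRED = {"low": 0.2, "medium": 0.3, "high": 0.5}

def posterior_paired(shape_bin, prior_paired=0.5):
    """Posterior probability that a nucleotide is base paired, given its SHAPE bin.

    Bayes' rule:
        P(paired | bin) = P(bin | paired) P(paired)
                          / (P(bin | paired) P(paired) + P(bin | unpaired) P(unpaired))
    """
    weight_paired = LIKELIHOOD_PAIRED[shape_bin] * prior_paired
    weight_unpaired = LIKELIHOOD_UNPAIRED[shape_bin] * (1.0 - prior_paired)
    return weight_paired / (weight_paired + weight_unpaired)

for shape_bin in ("low", "medium", "high"):
    print(shape_bin, round(posterior_paired(shape_bin), 3))
```

Extending the dictionary keys to cover the reactivities of neighboring nucleotides would be one way to model the 2- or 3-base context mentioned above, at the cost of needing more reference data per bin.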
Computational Studies of RNA Folding Pathways
RNA folding pathways are proving to be quite important in the determination of RNA function. Studies indicate that RNA may enter intermediate conformational states that are key to its functionality. These states may have a significant impact on gene expression. It is known that the biologically functional states of RNA molecules may not correspond to their minimum energy state, that kinetic barriers may exist that trap the molecule in a local minimum, that folding often occurs during transcription, and that cases exist in which a molecule will transition between one or more functional conformations before reaching its native state. Thus, methods for simulating the folding pathways of an RNA molecule, including co-transcriptional folding, and locating significant intermediate states are important for the prediction of RNA structure and its associated function. Several biological RNA folding pathways were successfully studied using MPGAfold and STRUCTURELAB. Examples include the potato spindle tuber viroid, the host-killing mechanism of Escherichia coli plasmid R1, the hepatitis delta virus, human immunodeficiency virus (HIV) and the dengue virus. These computational results are consistent with those derived from biological experiments. In addition, novel structural interactions and important functional intermediate and native states were predicted. These led to further successful confirmatory experiments.
Computational Prediction of RNA Interaction Networks
We also developed the programs CovaRNA and CovStat to explore long-range co-varying RNA interaction networks using whole genome alignments. This new methodology was applied to Drosophila genomes as well as to other genomes. A parallel version of the program was devised to speed up processing. The algorithms also relied on fast indexing schemes and conservative statistical methods to determine highly significant interactions. The methodology found interesting interactions that were related to endogenous siRNAs, gene transport and genes related to morphogenesis.
Computational Studies of Three-Dimensional RNA Structures
Some structural elements of RNA molecules were studied using molecular mechanics and molecular dynamics simulations. The structures examined include an RNA tetraloop, for which temperature-dependent denaturation and the subsequent refolding to the original crystal structure were performed. A three-way junction from the core central domain of the 30S ribosomal subunit from Thermus thermophilus was explored. It was experimentally determined that the intermolecular interactions between the three-way junction and the S15 ribosomal protein initiate the process of the assembly of the 30S ribosomal subunit. By using molecular dynamics simulations, we obtained insights into the conformational transitions of the junction associated with the binding of S15. We determined, using molecular dynamics simulations, the structural effects of utilizing new types of modified RNA nucleotides containing carbocyclic sugars that are constrained to north or south conformations (C2' or C3' exo). In addition, we showed, using molecular dynamics simulations, how ions and flanking bases play a very important role in HIV kissing loop monomer conformations. These results correlate well with, and may explain in detail, experimental studies that indicate the importance of ions for HIV-1 dimerization.
We also examined the pseudoknot domain of telomerase. Molecular modeling and molecular dynamics of the pseudoknot domain, including its hairpin loop, were performed. Results indicated how the hairpin loop dynamics affected the opening and closing of the non-canonical U-U base pairs found in the stem. The opening suggested nucleation points for the formation of the pseudoknot. We also examined the effect of dyskeratosis congenita (DKC) mutations in the loop and how they reduced the propensity for the opening of the stem by forming a relatively stable hydrogen bond network in the hairpin loop. We modeled the pseudoknot itself using our RNA2D3D software combined with phylogenetic analysis. We studied the dynamical impact of the DKC mutations on the pseudoknot with the result that the pseudoknot became unstable while the hairpin form became more stable.
We discovered and elucidated the 3D structures of new types of translational enhancers that are found in the 3' UTRs of the turnip crinkle virus (the first of its kind found) and the pea enation mosaic virus. The discovery of these structural elements brought to light new mechanisms for translational enhancement in eukaryotic plant viruses that may have broader implications for understanding translational mechanisms in general. This was accomplished with the combined use of MPGAfold, our 3D molecular modeling software RNA2D3D, and close interactions with our experimental collaborators. We also modeled a novel pseudoknot found in the CCR5 mRNA. This pseudoknot is involved in frameshifting and appears to be stabilized by a microRNA, a novel function for a microRNA.
In addition, we employed methods based on elastic network interpolation to reduce the computational costs related to RNA 3D dynamics. Three-dimensional dynamics trajectories can be determined using a reduced atom representation and given conformational states. Computer time can be reduced from weeks to hours using this approach.
Computational RNA Nanobiology
RNA nanobiology represents a new modality for the development of nanodevices that have the potential for use in a number of areas, including therapeutics. Building on our experience as outlined above, we developed several computational and experimental techniques (see below) that provide a means to determine a set of nucleotide sequences that can assemble into desired nano complexes. One of these tools is a relational database called RNAJunction. The database contains structure and sequence information for known RNA helical junctions and kissing loop interactions. These motifs can be searched for in a variety of ways, providing a source for RNA nano building blocks. Another computational tool, NanoTiler, permits a user to construct specified RNA-based nanoscale shapes. NanoTiler provides a 3D graphical view of the objects being designed and provides the means to work interactively or with computer scripts on the design process even though the precise RNA sequences may not yet be specified, and an all-atom model is not available. NanoTiler can use the 3D motifs found in the RNAJunction database with those derived from specified RNA secondary structure patterns to build a defined RNA nano shape. Also, a combinatorial search can be applied to enumerate structures that would not normally be considered.
Another web-based software tool for RNA nanostructure design is NanoFolder, which is one of the few software tools that are capable of predicting the structure and sequence attributes of multi-stranded RNA constructs. With this software it is possible to specify the desired secondary structure motifs and have the software predict the set of sequences that generate these desired motifs with the correct intra- and inter-strand folding characteristics.
Experimental RNA Nanobiology
Based on the computational approaches to RNA nanodesign described above, we demonstrated the ability to experimentally self-assemble and functionalize several RNA-based nanoparticles. This was accomplished with close interactions between the experimental and computational approaches, leading to enhancements to both sets of methodologies. Examples include the self-assembly of 6- and 10-stranded cubes; the self-assembly of hexagonal rings of various sizes and double rings utilizing an RNA motif extracted from nature; the modification of sequences in the motif to improve yield while also maintaining appropriate geometries; the self-assembly of triangular structures; and the assembly of 30-stranded truncated-tetrahedral structures containing 12 Dicer substrates. We also developed techniques that define self-assembly protocols and that allow for co-transcriptional assembly of constructs that can include modified bases to increase the chemical stability of these nanoparticles. In addition, we functionalized these particles with up to six to twelve Dicer substrate siRNAs to enable controlled stoichiometry and gene silencing, and showed that these particles do indeed silence the designated genes when transfected into various cell lines.
We also explored another paradigm based on the use of RNA/DNA hybrid nanoconstructs containing split functionalities. This allows, for example, the splitting of a Diceable siRNA into two RNA/DNA hybrid components with DNA toeholds which, when transfected into cells, reassemble into a DNA duplex and a Diceable siRNA. This hybrid approach was incorporated into our hexagonal nanorings and nanocubes. This approach permits, amongst other things, controlled activation of functionalities, incorporation of molecular beacons on the DNA strands without interfering with RNA functionality, and resistance to nuclease degradation. This approach was applied successfully in cell cultures and xenograft tumor mouse models.
Many of the computational systems have been adapted to other environments inside and outside our laboratory and the NIH and are accessible through our web site at https://rnastructure.cancer.gov
Bruce A. Shapiro, Ph.D.
Dr. Shapiro received his Ph.D. in computer science from the University of Maryland in 1978, with undergraduate work in mathematics and physics. During his association with the NIH, Dr. Shapiro did extensive work in image processing, nucleic acid structure prediction and analysis, and computational and experimental nanobiology, leading to several novel algorithms, computer systems, experimental techniques and discoveries in RNA biology. His interests included RNA nanobiology, understanding the relationships between RNA structure and function, and the use of parallel high-performance computer architectures to solve problems related to RNA computational and experimental biology and molecular modeling.
Dr. Shapiro retired as a Senior Investigator in 2021 and is now an NCI Scientist Emeritus.
RNA Nanostructures – Methods and Protocols
RNA nanotechnology is a young field with many potential applications. The goal is to utilize designed RNA strands, such that the obtained constructs have specific properties in terms of shape and functionality. RNA has potential functionalities that are comparable to those of proteins, but possesses (compared to proteins) simpler design principles akin to DNA. The promise is that designed RNA complexes may make possible novel types of molecular assemblies with applications in medicine (as therapeutics or diagnostics), material science, imaging, structural biology, and basic research.
Using this approach, scientists have shown that they can design RNAs that self-assemble into predefined shapes (such as rings, cubes, tetrahedrons, or lattices). Furthermore, designed RNAs can be programmed to impart different functionalities such as gene knockdown via RNA interference, temperature-specific behavior or RNA-based logic or multi-functional assemblies.
These successes, however, are typically only possible due to the use of specialized computational and experimental approaches. Repeating achievements based on regular research papers is frequently challenging if the methods are described only briefly. It is therefore particularly useful that detailed protocols provided by leading experts in the field are compiled as a unit, thus making the current state of the art accessible to scientists entering the field. Presented in this book are 23 chapters representing a spectrum of computational and experimental protocols pertaining to the creation, characterization, and utilization of RNA nanostructures.
Bindewald E, Shapiro BA (Editors). RNA Nanostructures – Methods and Protocols, Methods in Molecular Biology, vol. 1632, Humana Press, New York, 2017.
Advances in RNA Structure Determination
Recent years have witnessed a revolution in the field of RNA structure and function. Until recently the main contribution of RNA in cellular and disease functions was considered to be a role defined by the central dogma, namely DNA codes for mRNAs, which in turn encode proteins, a notion facilitated by non-coding ribosomal RNA and tRNA. It was also assumed at the time that less than 2% of DNA in the human genome was used to encode genes, the remainder considered “junk”. Subsequent research has unequivocally determined that RNA mediates a plethora of functions vital to cellular activity as well as clinically significant diseases. In turn, it was discovered that the amount of DNA that encodes functional RNAs is significantly larger than previously assumed. This special journal issue, containing 19 articles, describes several of the computational and experimental methodologies that are used to determine RNA structure and function, enabling the application of this knowledge for therapeutic purposes.
Shapiro BA, Le Grice SF (Editors). Advances in RNA Structure Determination. Methods. 2016 Jul 1;103:1-3. doi: 10.1016/j.ymeth.2016.06.006. PubMed PMID: 27342006.
The impact of dyskeratosis congenita mutations on the structure and dynamics of the human telomerase RNA pseudoknot domain
The pseudoknot domain is a functionally crucial part of telomerase RNA and influences the activity and stability of the ribonucleoprotein complex. Autosomal dominant dyskeratosis congenita (DKC) is an inherited disease that is linked to mutations in telomerase RNA and impairs telomerase function. In this paper, we present a computational prediction of the influence of two base DKC mutations on the structure, dynamics, and stability of the pseudoknot domain. We use molecular dynamics simulations, MM-GBSA free energy calculations, static analysis, and melting simulations analysis. Our results show that the DKC mutations stabilize the hairpin form and destabilize the pseudoknot form of telomerase RNA. Moreover, the P3 region of the predicted DKC-mutated pseudoknot structure is unstable and fails to form as a defined helical stem. We directly compare our predictions with experimental observations by calculating the enthalpy of folding and melting profiles for each structure. The enthalpy values are in very good agreement with values determined by thermal denaturation experiments. The melting simulations and simulations at elevated temperatures show the existence of an intermediate structure, which involves the formation of two UU base pairs observed in the hairpin form of the pseudoknot domain.
Yingling YG, Shapiro BA. The impact of dyskeratosis congenita mutations on the structure and dynamics of the human telomerase RNA pseudoknot domain. J Biomol Struct Dyn. 2007 Feb;24(4):303-20. PubMed PMID: 17206847.
Pattern Discovery in Biomolecular Data – Tools, Techniques, and Applications
Finding patterns in biomolecular data, particularly in DNA and RNA, is at the center of modern biological research. These data are complex and growing rapidly, so the search for patterns requires increasingly sophisticated computer methods. This book provides a summary of principal techniques. Each chapter describes techniques that are drawn from many fields, including graph theory, information theory, statistics, genetic algorithms, computer visualization, and vision. The chapters focus on finding patterns in DNA, RNA, and protein sequences, finding patterns in 2D and 3D structures, and choosing system components.
Wang JTL, Shapiro BA, Shasha D (Editors). Pattern Discovery in Biomolecular Data – Tools, Techniques, and Applications. Oxford University Press, New York, 1999.
CovaRNA (C++ sources) and CovStat (R package): The CovaRNA and CovStat software packages for detecting long-range covariations in nucleotide alignments. This is the software corresponding to the publication of Bindewald and Shapiro: Computational detection of abundant long-range nucleotide-covariation in Drosophila genomes. RNA 19: 1171-82, 2013. download
CorreLogo (2011): A C++ application (sources and 64-bit Linux binaries) for a stand-alone version of our CorreLogo server for the 3D sequence logos of RNA and DNA. download
KNetFold: KNetFold is software for predicting the consensus secondary structure for a given alignment of RNA sequences. download
RSMatch 2.0: A package for comparing RNA structures via: 1) pair-wise and DB searches, 2) multiple structure alignment with common structure computation, 3) iterative DB searches. download
StructureLab (2011): An RNA workbench that assists in the 2D structure elucidation (with limited 3D capabilities). download
RNA2D3D (2011): An interactive system for the conversion of RNA 2D structures to 3D and 3D modeling. download
NanoTiler (2012): A Java application for the design of RNA nanoscale structures from building blocks. download
Note: Not all of our software is available for download. If you cannot find the package you are looking for here, please contact Dr. Bruce A. Shapiro.
Databases and Web Applications
RNA Junction: A database of RNA structural elements including junctions, kissing loops, and bulges.
NanoFolder: Multi-strand RNA secondary structure prediction, as well as RNA sequence design
CyloFold: Single sequence RNA secondary structure prediction, including pseudoknots
rnashape: Normalization of RNA SHAPE experiment data
KNetFold: A webserver which predicts an RNA secondary structure from a sequence alignment. It uses compensatory base change information as well as energetic considerations to compute a structure. This algorithm is capable of predicting pseudoknots.
CorreLogo: A webserver that helps detect correlated mutations in RNA and DNA sequence alignments. It generates what we call a "3D sequence logo." This is an extension of the "sequence logo" concept.