Bruce A. Shapiro, Ph.D.

Senior Investigator
Head, RNA Structure and Design Section

Dr. Shapiro directs research on computational and experimental RNA structure prediction and analysis and has pioneered research in the emerging field of RNA nanobiology. His work has led to several novel RNA folding and analysis algorithms, experimental techniques and discoveries in RNA biology. His interests include RNA nanobiology, nucleic acid structure prediction and analysis, and the relationships between RNA structure and function. He has fostered a synergy between computational and experimental techniques: computationally designed RNA-based nanostructures have been shown to self-assemble as predicted and, when delivered to cell cultures and mouse models, to control gene expression, showing potential for use in RNA-based therapeutics.

Areas of Expertise
  1. RNA structure
  2. RNA folding
  3. RNA Nanobiology – computational and experimental
  4. Computational RNA structure prediction and analysis
  5. Molecular dynamics
  6. RNA 3D modeling

Contact Info

Bruce A. Shapiro, Ph.D.
Center for Cancer Research
National Cancer Institute
Building 560, Room 11-83
Frederick, MD 21702-1201
Ph: 301-846-5536

A complete understanding of the function of RNA molecules requires knowledge of their higher order structures (2D and 3D) as well as the characteristics of their primary sequence. RNA structure is important for many functions, including regulation of transcription and translation, catalysis, transport of proteins across membranes and the regulation of RNA viruses. Understanding these functions is important for basic biology as well as for the development of drugs that can intervene where these molecules function pathologically.

Our group researches and develops methodologies for improving RNA folding and analysis techniques to further our understanding of the functional properties of these molecules. In addition, we are focusing on the emerging field of RNA nanobiology. RNA represents a relatively new molecular material for the development of biologically oriented nanodevices. It is an interesting material because of its natural functionalities and its ability to fold into complex structures and self-assemble. We have developed computational and experimental methodologies that permit the design of RNA-based nanoparticles that potentially have a variety of uses. Thus, our research on RNA covers five highly related and integrated areas:

  1. Research in algorithms for RNA secondary structure prediction and analysis;
  2. RNA biology and its relationship to sequence and secondary structure folding characteristics;
  3. Research in algorithms for RNA 3D structure prediction and analysis and their application to RNA biology;
  4. Research in algorithms for the design and analysis of RNA nanoparticles;
  5. Experimental design, synthesis and delivery of RNA-based nanoparticles.

What is learned in one area is applied to the other areas, enhancing our understanding of RNA structure, function, and RNA nanobiology and self-assembly.

Parallel Computational Biology and RNA Structure
Revolutionary changes in computational paradigms are required to maintain the necessary computational power to solve problems in molecular biology. Methodologies based on sequential computer architectures could not be expected to continually keep pace with the needed computational speeds. In order to accommodate the high speeds that are necessary, highly parallel computational techniques are now employed. Our group was one of the pioneers in the area of computational biology and the use of parallel high performance computer architectures for this endeavor.

Computational Techniques for RNA Secondary Structure Prediction and Analysis
We were the first to develop an RNA folding technique that uses concepts from genetic algorithms. Our algorithm, MPGAfold, was originally developed to run on a massively parallel SIMD supercomputer, a MasPar MP-2 with 16384 processors. This algorithm was modified and now runs on parallel high performance Linux clusters. Exceptional scaling characteristics are obtained with the ability to run the algorithm with hundreds of thousands of population elements. RNA pseudoknot prediction is part of the genetic algorithm, resulting in its ability to predict tertiary interactions. Other features include simulation of co-transcriptional folding, the ability to incorporate different energy rules, and the forced inhibition and embedding of desired helical stems. In addition, STRUCTURELAB, our heterogeneous bioinformatical RNA analysis workbench, can be used in conjunction with MPGAfold and RNA2D3D to produce predicted 3D atomic coordinates of RNA structures along with the visualization of these structures. Also, we developed a novel interactive visualization methodology that is part of STRUCTURELAB. This technique enables the comparison and analysis of multiple sequence RNA folds from a phylogenetic point of view, thus allowing improvement of predicted structural results across a family of sequences.
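To make the genetic-algorithm idea concrete, the following is a toy Python sketch and not MPGAfold itself. It assumes a population of candidate structures, each a set of helical stems that share no nucleotides (so pseudoknotted combinations are permitted), and it scores a structure by its number of base pairs as a crude stand-in for real energy rules. All names (`find_stems`, `mutate`, `crossover`, `fold`) and parameter values are hypothetical.

```python
import random

# Watson-Crick and G-U wobble pairs.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def find_stems(seq, min_len=3, min_loop=3):
    """List candidate stems (i, j, length): pairs (i+k, j-k) for k < length."""
    stems = []
    for i in range(len(seq)):
        for j in range(i + min_loop + 1, len(seq)):
            length = 0
            # Extend inward while bases pair and at least min_loop bases stay enclosed.
            while (j - length) - (i + length) > min_loop and \
                    (seq[i + length], seq[j - length]) in PAIRS:
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems

def bases(stem):
    """Set of nucleotide positions occupied by a stem."""
    i, j, length = stem
    return set(range(i, i + length)) | set(range(j - length + 1, j + 1))

def fitness(structure):
    """Assumed score: total base pairs (a placeholder for thermodynamic energy)."""
    return sum(length for _, _, length in structure)

def mutate(structure, stems, rng):
    """Insert one random stem, dropping any existing stems that clash with it."""
    stem = rng.choice(stems)
    kept = [s for s in structure if not (bases(s) & bases(stem))]
    return kept + [stem]

def crossover(a, b, rng):
    """Merge the stems of two parents in random order, skipping clashing stems."""
    pool = a + b
    rng.shuffle(pool)
    child, used = [], set()
    for stem in pool:
        if not (bases(stem) & used):
            child.append(stem)
            used |= bases(stem)
    return child

def fold(seq, population_size=50, generations=40, seed=0):
    """Evolve stem sets; the fitter half survives unchanged each generation."""
    rng = random.Random(seed)
    stems = find_stems(seq)
    if not stems:
        return []
    population = [[rng.choice(stems)] for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), stems, rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

def pairs(structure):
    """Expand stems into a sorted list of (i, j) base pairs."""
    return sorted((i + k, j - k) for i, j, length in structure for k in range(length))

best = fold("GGGGAAAACCCC")
print(pairs(best))
```

Keeping the fitter half of the population unchanged means the best score never decreases between generations. A realistic implementation would replace `fitness` with a thermodynamic energy model and distribute the population across processors.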

We developed KNetFold, a novel and powerful algorithm for RNA structure prediction from sequence alignments. The algorithm uses a unique hierarchical classification network based on mutual information, thermodynamics and Watson-Crick base-pairedness to predict structures. In addition, we have developed a web-based application, CorreLogo, that uses mutual information derived from RNA sequence alignments to determine covariations amongst base-paired positions. The algorithm includes a unique error measure and depicts results in 3D.
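The covariation signal described above can be illustrated with the standard mutual information between two alignment columns. The Python sketch below is a minimal illustration under stated assumptions and is not code from KNetFold or CorreLogo: the function name and toy alignment are hypothetical, gaps are ignored, and no error measure or small-sample correction is applied.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length, gap-free sequences (an assumption
    made to keep the sketch short).
    """
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Toy alignment: columns 0 and 4 always form a complementary pair,
# while columns 1 to 3 are fully conserved.
alignment = ["GACUC", "CACUG", "AACUU", "UACUA"]
print(mutual_information(alignment, 0, 4))  # covarying columns
print(mutual_information(alignment, 1, 2))  # conserved columns
```

Columns that change together while staying complementary score high, whereas fully conserved columns score zero even when they are base-paired. This is one reason a predictor would combine mutual information with other evidence such as thermodynamics.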

We developed CyloFold, a unique algorithm for predicting, from a single sequence, RNA secondary structures that may include pseudoknots. This algorithm utilizes a novel technique that approximates the potential for 3D steric clashes in the predicted structures, thus filtering out those structures from consideration. The algorithm has been shown to have high accuracy when compared to other algorithms of its type.

We developed web software based on a Bayesian statistical approach that estimates the accuracy of base pair formation from data derived from SHAPE (Selective 2'-Hydroxyl Acylation analyzed by Primer Extension) experiments. The statistical/probabilistic results were derived by analyzing known RNA 3D structures having various types of known base interactions, and correlating them with SHAPE values. It was shown that low SHAPE values correlate well with Watson-Crick base pairing and stacking interactions, while high SHAPE values indicate single-stranded regions. Results improved when a two- or three-base context was also taken into account. We also showed that other types of known interactions did not correlate well. This type of information is helpful in ultimately determining the secondary structure of RNAs.
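The Bayesian reasoning behind this kind of estimate can be sketched in a few lines of Python. This is an illustration only, not the published model: the reactivity values, the 0.3 threshold separating "low" from "high" SHAPE values, and the function names are all made up for the example.

```python
def likelihoods_from_reference(observations, threshold):
    """Estimate P(low | paired), P(low | unpaired) and P(paired).

    `observations` is a list of (shape_value, is_paired) tuples, standing in
    for nucleotides taken from RNAs with known 3D structures.
    """
    paired = [s for s, is_paired in observations if is_paired]
    unpaired = [s for s, is_paired in observations if not is_paired]
    p_low_paired = sum(s < threshold for s in paired) / len(paired)
    p_low_unpaired = sum(s < threshold for s in unpaired) / len(unpaired)
    prior_paired = len(paired) / len(observations)
    return p_low_paired, p_low_unpaired, prior_paired

def posterior_paired(p_low_paired, p_low_unpaired, prior_paired):
    """Bayes' rule: probability that a nucleotide with a low SHAPE value is paired."""
    evidence = p_low_paired * prior_paired + p_low_unpaired * (1 - prior_paired)
    return p_low_paired * prior_paired / evidence

# Hypothetical reference data: (SHAPE reactivity, paired in the known structure?)
reference = [
    (0.05, True), (0.10, True), (0.20, True), (0.60, True),
    (0.10, False), (0.70, False), (0.90, False), (1.20, False),
]
p_lp, p_lu, prior = likelihoods_from_reference(reference, threshold=0.3)
print(posterior_paired(p_lp, p_lu, prior))
```

Extending the conditioning event from a single nucleotide to a two- or three-base window would follow the same pattern, with the likelihoods estimated per context rather than per base.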

Computational Studies of RNA Folding Pathways
RNA folding pathways are proving to be quite important in the determination of RNA function. Studies indicate that RNA may enter intermediate conformational states that are key to its functionality. These states may have a significant impact on gene expression. It is known that the biologically functional states of RNA molecules may not correspond to their minimum energy state, that kinetic barriers may exist that trap the molecule in a local minimum, that folding often occurs during transcription, and that in some cases a molecule will transition between one or more functional conformations before reaching its native state. Thus, methods for simulating the folding pathways of an RNA molecule, including co-transcriptional folding, and locating significant intermediate states are important for the prediction of RNA structure and its associated function. Several biological RNA folding pathways have been successfully studied using MPGAfold and STRUCTURELAB. Examples include the potato spindle tuber viroid, the host-killing mechanism of Escherichia coli plasmid R1, the hepatitis delta virus, HIV, and the dengue virus. These computational results are consistent with those derived from biological experiments. In addition, novel structural interactions and important functional intermediate and native states have been predicted. These have led to further successful confirmatory experiments.

Computational Prediction of RNA Interaction Networks
We have also developed the programs CovaRNA and CovStat to explore long-range co-varying RNA interaction networks using whole genome alignments. This new methodology, which was first applied to Drosophila genomes, is currently being applied to other genomes. A parallel version of the program was devised to speed up processing. The algorithms also rely on fast indexing schemes and conservative statistical methods to determine highly significant interactions. The methodology has found interesting interactions that are related to endogenous siRNAs, gene transport and genes related to morphogenesis.

Computational Studies of Three-Dimensional RNA Structures
Some structural elements of RNA molecules have been studied using molecular mechanics and molecular dynamics simulations. The structures examined include an RNA tetraloop, for which temperature-dependent denaturation and the subsequent refolding to the original crystal structure were simulated. A three-way junction from the core central domain of the 30S ribosomal subunit from Thermus thermophilus was also explored. It has been experimentally determined that the intermolecular interactions between the three-way junction and the S15 ribosomal protein initiate the process of the assembly of the 30S ribosomal subunit. By using molecular dynamics simulations, we obtained insights into the conformational transitions of the junction associated with the binding of S15. Using molecular dynamics simulations, we determined the structural effects of new types of modified RNA nucleotides containing carbocyclic sugars that are constrained to north or south conformations (C2' or C3' exo). In addition, we showed, using molecular dynamics simulations, how ions and flanking bases play a very important role in human immunodeficiency virus (HIV) kissing loop monomer conformations. These results correlate well with, and may explain in detail, experimental studies that indicate the importance of ions for HIV-1 dimerization.

We have also examined the pseudoknot domain of telomerase. Molecular modeling and molecular dynamics of the pseudoknot domain, including its hairpin loop, were performed. Results indicated how the hairpin loop dynamics affected the opening and closing of the non-canonical U-U base pairs found in the stem. The opening suggested nucleation points for the formation of the pseudoknot. We have also examined the effect of dyskeratosis congenita (DKC) mutations in the loop and how they reduced the propensity for the opening of the stem by forming a relatively stable hydrogen bond network in the hairpin loop. We modeled the pseudoknot itself using our RNA2D3D software combined with phylogenetic analysis. We studied the dynamical impact of the DKC mutations on the pseudoknot with the result that the pseudoknot became unstable while the hairpin form became more stable.

We discovered and elucidated the 3D structures of new types of translational enhancers that are found in the 3' UTRs of the Turnip Crinkle Virus (the first of its kind found) and the Pea Enation Mosaic Virus. The discovery of these structural elements has brought to light new mechanisms for translational enhancement in eukaryotic plant viruses that may have broader implications for understanding translational mechanisms in general. This was accomplished with the combined use of MPGAfold, our 3D molecular modeling software RNA2D3D, and close interactions with our experimental collaborators. We also modeled a novel pseudoknot found in the CCR5 mRNA. This pseudoknot is involved in frameshifting and appears to be stabilized by a microRNA, a novel function for a microRNA.

In addition, we have employed methods based on elastic network interpolation to reduce the computational costs related to RNA 3D dynamics. Three-dimensional dynamics trajectories can be determined using a reduced atom representation and given conformational states. Compute time can be reduced from weeks to hours using this approach.

Computational RNA Nanobiology
RNA nanobiology represents a new modality for the development of nanodevices that have the potential for use in a number of areas, including therapeutics. Building on our experience as outlined above, we developed several computational and experimental techniques (see below) that provide a means to determine a set of nucleotide sequences that can assemble into desired nano complexes. One of these tools is a relational database called RNAJunction. The database contains structure and sequence information for known RNA helical junctions and kissing loop interactions. These motifs can be searched for in a variety of ways, providing a source for RNA nano building blocks. Another computational tool, NanoTiler, permits a user to construct specified RNA-based nanoscale shapes. NanoTiler provides a 3D graphical view of the objects being designed and provides the means to work interactively or with computer scripts on the design process even though the precise RNA sequences may not yet be specified, and an all-atom model is not available. NanoTiler can use the 3D motifs found in the RNAJunction database with those derived from specified RNA secondary structure patterns to build a defined RNA nano shape. Also, a combinatorial search can be applied to enumerate structures that would not normally be considered.

Another web-based software tool for RNA nanostructure design is NanoFolder, which is one of the few software tools that are capable of predicting the structure and sequence attributes of multi-stranded RNA constructs. With this software it is possible to specify the desired secondary structure motifs and have the software predict the set of sequences that generate these desired motifs with the correct intra- and inter-strand folding characteristics.

Experimental RNA Nanobiology
Based on the computational approaches to RNA nanodesign described above, we have demonstrated the ability to experimentally self-assemble and functionalize several RNA-based nanoparticles. This was accomplished through close interaction between the experimental and computational approaches, which led to enhancements to both sets of methodologies. Examples include the self-assembly of 6- and 10-stranded cubes; the self-assembly of hexagonal rings of various sizes and double rings utilizing an RNA motif extracted from nature; the modification of sequences in the motif to improve yield while also maintaining appropriate geometries; and the self-assembly of triangular structures. We also developed techniques that define self-assembly protocols and that allow for co-transcriptional assembly of constructs that can also include modified bases to increase the chemical stability of these nanoparticles. In addition, we have functionalized these particles with up to six different siRNAs to enable controlled stoichiometry and gene silencing, and showed that these particles do indeed silence the designated genes when transfected into various cell lines.

We have also been exploring another paradigm based on the use of RNA/DNA hybrid nanoconstructs containing split functionalities. This allows, for example, the splitting of a Diceable siRNA into two DNA/RNA hybrid components with DNA toeholds, which, when transfected into cells, reassemble into a DNA duplex and a Diceable siRNA. This hybrid approach has been incorporated in our hexagonal nanorings and nanocubes. This approach permits, amongst other things, controlled activation of functionalities, incorporation of molecular beacons on the DNA strands without interfering with RNA functionality, and resistance to nuclease degradation. This approach has been tried successfully in cell cultures and xenograft tumor mouse models.

Many of the computational systems have been adapted to other environments inside and outside our laboratory and the NIH and are accessible through our web site at

Scientific Focus Areas:
Cancer Biology, Computational Biology, Immunology, Structural Biology, Virology
  1. Afonin KA, Viard M, Koyfman AY, Martins AN, Kasprzak WK, Panigaj M, Desai R, Santhanam A, Grabow WW, Jaeger L, Heldman E, Reiser J, Chiu W, Freed EO, and Shapiro BA.
    Nano Lett. 14: 5662-71, 2014. [ Journal Article ]
  2. Belew AT, Meskauskas A, Musalgaonkar S, Advani VM, Sulima SO, Kasprzak WK, Shapiro BA, and Dinman JD.
    Nature. 512: 265-9, 2014. [ Journal Article ]
  3. Afonin KA, Kasprzak W, Bindewald E, Puppala PS, Diehl AR, Hall KT, Kim TJ, Zimmermann MT, Jernigan RL, Jaeger L, and Shapiro BA.
    Methods. 67: 256-65, 2014. [ Journal Article ]
  4. Bindewald E, and Shapiro BA.
    RNA. 19: 1171-82, 2013. [ Journal Article ]
  5. Kasprzak WK, and Shapiro BA.
    Methods Mol Biol. 1138: 199-224, 2014. [ Journal Article ]

Dr. Shapiro received his Ph.D. in computer science from the University of Maryland in 1978, with undergraduate work in mathematics and physics. During his association with the NIH, Dr. Shapiro has done extensive work in image processing, nucleic acid structure prediction and analysis, and computational and experimental nanobiology, leading to several novel algorithms, computer systems, experimental techniques and discoveries in RNA biology. His latest interests include RNA nanobiology, understanding the relationships between RNA structure and function, and the use of parallel high performance computer architectures to solve problems related to RNA computational and experimental biology and molecular modeling.

Name                      Position
Anu Puri, Ph.D.           Staff Scientist
Eckart Bindewald, Ph.D.   Senior Computational Scientist (Leidos)
Neil Dold                 Postbaccalaureate Fellow
Wojciech Kasprzak         Programmer Analyst IV (Leidos)
Lorena Parlea, Ph.D.      Postdoctoral Fellow (CRTA)
Shannon Tsai              Postbaccalaureate Fellow (CRTA)
Mathias Viard, Ph.D.      Scientist II (Leidos)
Paul Zakrevsky, Ph.D.     Postdoctoral Fellow (CRTA)

Available Software

CovaRNA (C++ sources) and CovStat (R package) - The CovaRNA and CovStat software packages for detecting long-range covariations in nucleotide alignments. This is the software corresponding to the publication of Bindewald and Shapiro:  Computational detection of abundant long-range nucleotide-covariation in Drosophila genomes. RNA 19: 1171-82, 2013.   download

MPGAfold - A massively parallel genetic algorithm that predicts RNA secondary structure.  To obtain a copy of this software, please contact Dr. Bruce A. Shapiro.

MPGAfold Visualizer - A Java application that allows the user to visually see an MPGAfold run.  To obtain a copy of this software, please contact Dr. Bruce A. Shapiro.

CorreLogo (2011) - A C++ application (sources and 64-bit Linux binaries) for a stand-alone version of our CorreLogo server for the 3D sequence logos of RNA and DNA.  download

KNetFold - Software for predicting the consensus secondary structure for a given alignment of RNA sequences.  download

RSMatch 2.0 - A package for comparing RNA structures via: 1) pair-wise and DB searches, 2) multiple structure alignment with common structure computation, 3) iterative DB searches.  download

StructureLab (2011) - An RNA workbench that assists in the 2D structure elucidation (with limited 3D capabilities).  download

RNA2D3D (2011) - An interactive system for the conversion of RNA 2D structures to 3D and 3D modeling.  download

NanoTiler (2012) - A Java application for the design of RNA nanoscale structures from building blocks.  download

Note:  Not all of our software is available for download. If you cannot find the package you are looking for here, please contact Dr. Bruce A. Shapiro.

Databases and Web Applications

RNA Junction:  A database of RNA structural elements including junctions, kissing loops, and bulges

NanoFolder:  Multi-strand RNA secondary structure prediction, as well as RNA sequence design

CyloFold:  Single sequence RNA secondary structure prediction, including pseudoknots

rnashape:  Normalization of RNA SHAPE experiment data


KNetFold:  A webserver which predicts an RNA secondary structure from a sequence alignment. It uses compensatory base change information as well as energetic considerations to compute a structure. This algorithm is capable of predicting pseudoknots.

CorreLogo:  A webserver that helps detect correlated mutations in RNA and DNA sequence alignments. It generates what we call a "3D sequence logo." This is an extension of the "sequence logo" concept.