Thomas D. Schneider, Ph.D.
Dr. Schneider is interested in discovering and exploring the fundamental mathematics of biology: "Living things are too beautiful for there not to be a mathematics that describes them." He uses the mathematics of information theory, first developed by Claude Shannon in 1948.
Dr. Schneider first discovered that binding sites on nucleic acids usually contain just about the amount of information needed for molecules to find the sites in the genome. Information is measured in bits; one bit is the choice between two equally likely possibilities. The number of bits is the number of times one needs to halve the possibilities to reach a subset of objects. That is, the log base 2 of the number of possibilities is the number of bits. For example, ribosome binding sites in E. coli have about 10 bits of information per site on average. Finding the roughly 4,000 gene starts in the 4-million-base E. coli genome requires about log2(4,000,000/4,000) ≈ 10 bits, close to the information measured in the ribosome binding sites.
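The arithmetic in the E. coli example can be checked with a minimal Python sketch. The function name bits_to_locate is hypothetical and chosen only for illustration; the genome size and number of gene starts are the rounded figures quoted above.

```python
import math


def bits_to_locate(genome_positions: int, site_count: int) -> float:
    """Bits needed to single out `site_count` sites among `genome_positions` positions.

    This is log2 of the number of positions per site, i.e. how many times the
    set of possibilities must be halved to reach the sites.
    """
    return math.log2(genome_positions / site_count)


# Rounded figures from the text: ~4,000,000 bases and ~4,000 gene starts.
print(round(bits_to_locate(4_000_000, 4_000), 2))  # 9.97
```

The result, just under 10 bits, is the figure compared with the roughly 10 bits measured in ribosome binding sites.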
Dr. Schneider and Mike Stephens, then a high-school student, invented sequence logos to understand the patterns at human RNA splice donor and acceptor junctions. Sequence logos are now widely used in molecular biology.
The relationship between information, measured in bits, and the binding energy is a fundamental problem in biology. The Second Law of Thermodynamics gives the ideal relationship for converting the energy dissipated during molecular binding to bits. Using this conversion factor Dr. Schneider discovered that binding sites are 70% efficient. It turns out that rhodopsin in the eye and muscle are also 70% efficient. Dr. Schneider has discovered the basic mathematics that gives this general result.
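As a rough illustration of the energy-to-bits conversion, the sketch below assumes the ideal Second Law cost of k_B T ln 2 joules per bit and defines efficiency as the bits gained divided by the bits the dissipated energy could ideally buy. The function names, the temperature, and the sample numbers are hypothetical; they are not measurements from Dr. Schneider's work.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23   # Boltzmann constant (exact SI value)
AVOGADRO_PER_MOL = 6.02214076e23   # Avogadro constant (exact SI value)


def joules_per_bit(temperature_kelvin: float) -> float:
    """Assumed ideal minimum energy per bit for one molecule: k_B * T * ln 2."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def efficiency(bits_gained: float, energy_j_per_mol: float,
               temperature_kelvin: float) -> float:
    """Bits actually gained divided by the bits the dissipated energy could buy."""
    energy_per_molecule = energy_j_per_mol / AVOGADRO_PER_MOL
    max_bits = energy_per_molecule / joules_per_bit(temperature_kelvin)
    return bits_gained / max_bits


# Hypothetical example: a site carrying 7 bits whose binding dissipates
# enough energy to pay for 10 bits at 298.15 K.
T = 298.15
energy = 10 * joules_per_bit(T) * AVOGADRO_PER_MOL  # joules per mole
print(round(efficiency(7.0, energy, T), 2))  # 0.7
```

At 298.15 K the assumed conversion factor works out to roughly 1.7 kJ/mol per bit, so the efficiency is simply the measured information divided by the dissipated energy expressed in those units.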
For more information, see https://alum.mit.edu/www/toms/
Information Theory in Molecular Biology
Shannon's measure of information is useful for characterizing the DNA and RNA patterns that define genetic control systems. Dr. Schneider has shown that binding sites on nucleic acids usually contain just about the amount of information needed for molecules to find the sites in the genome. This is a working hypothesis, and exceptions can either destroy the hypothesis or reveal new phenomena. For this reason, he is studying several interesting anomalies.
The first major anomaly was found at bacteriophage T7 promoters. These sequences conserve twice as much information as the polymerase requires to locate them. The most likely explanation is that a second protein binds to the DNA. In another case, he discovered that the F incD region has a three-fold excess conservation, which implies that three proteins bind there. Both anomalies are being investigated experimentally. Thus, the project has three major components: theory, computer analysis, and molecular biology experiments.
The theoretical work can be divided into several levels. Level 0 is the study of genetic sequences bound by proteins or other macromolecules, briefly described above. The success of this theory suggested that other work of Shannon should also apply to molecular biology. Level 1 theory introduces the more general concept of the molecular machine, and the concept of a machine capacity equivalent to Shannon's channel capacity. In Level 2, the Second Law of Thermodynamics is connected to the capacity theorem, and the limits on the functioning of Maxwell's Demon become clear.
The practical application of this work for most molecular biologists will be the replacement of consensus sequences with better models. Consensus sequences are commonly used to characterize the binding sites of macromolecules on DNA and RNA. After aligning a set of binding-site sequences, the most frequent base at each position is chosen. A position that contains 100 percent As will be represented by an A, while a position that is only 75 percent A will also be represented by an A. The consensus is frequently used to search for binding sites, and the number of mismatches to the consensus is counted. A mismatch to a 100 percent A position is much more severe than one to a 75 percent A position, but the mismatch count does not account for this, so the researcher is misled. Mathematically robust graphical replacements for consensus sequences, called the Sequence Logo and Sequence Walkers, do not discard hard-earned data.
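The difference between a 100 percent A position and a 75 percent A position can be made concrete with a minimal Python sketch. It assumes four equally likely bases as the background (a 2-bit maximum per position) and omits any small-sample correction; the aligned sites and the function names are hypothetical, and the per-base weight is only in the spirit of the individual-information idea behind walkers.

```python
import math
from collections import Counter

BASES = "ACGT"


def base_frequencies(aligned_sites):
    """Per-position base frequencies for equal-length aligned sites."""
    n = len(aligned_sites)
    freqs = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        freqs.append({b: counts[b] / n for b in BASES})
    return freqs


def position_information(freqs):
    """Information at one position: 2 bits minus the Shannon uncertainty."""
    uncertainty = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    return 2.0 - uncertainty


def base_weight(freqs, base):
    """Bits contributed by one base at a position: 2 + log2(frequency)."""
    f = freqs[base]
    return 2.0 + math.log2(f) if f > 0 else float("-inf")


# Hypothetical alignment: position 1 is 100% A, position 2 is 75% A and 25% C.
sites = ["AA", "AA", "AA", "AC"]
freqs = base_frequencies(sites)

for i, f in enumerate(freqs, start=1):
    print(i, round(position_information(f), 3), base_weight(f, "A"), base_weight(f, "C"))
```

A consensus of "AA" treats both positions alike, whereas here the first position carries 2 bits and the second about 1.19 bits. A C at the first position was never observed and is penalized without limit, while a C at the second position simply contributes 0 bits.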
The Walker, which is patented, has direct medical application because it can be used to distinguish polymorphisms from mutations in human sequences. The Walker method allows one to display many different binding sites simultaneously. This bird's-eye view is a powerful tool for gene structure analysis. We collaborated on this research with Peter Rogan of Allegheny University of the Health Sciences, and with many other people.
Dr. Schneider's most recently published discovery is that many molecular machines are 70% efficient. This result is explained using high-dimensional geometry.
Molecular information theory tells us how molecules function. The theory therefore tells us how to build useful devices at the molecular level. We have several projects in nanotechnology. Nanoprobes are single-molecule detectors while the Medusa(TM) Sequencer is a single-molecule DNA sequencing device.
Further information may be found on the web at:
- Mol. Microbiol. 83: 612-22, 2012.
- Promoter variants in the MSMB gene associated with prostate cancer regulate MSMB/NCOA4 fusion transcripts. Hum. Genet. 131: 1453-66, 2012.
- 70% efficiency of bistate molecular machines explained by information theory, high dimensional geometry and evolutionary convergence. Nucleic Acids Res. 38: 5995-6006, 2010.
- Nano Commun Netw. 1: 173-180, 2010.
- Nucleic Acids Res. 36(11): 3828-33, 2008.
After receiving a B.S. in biology at the Massachusetts Institute of Technology, Dr. Schneider obtained his Ph.D. in molecular biology with Dr. Larry Gold in the Department of Molecular, Cellular and Developmental Biology at the University of Colorado, Boulder. His thesis and postdoctoral work, also done at Boulder, were on the application of information theory to nucleic-acid binding sites. He helped to organize and run the international news group bionet.info-theory, which is devoted to the application of information theory to biology. He was a member of the GenBank Advisory Committee and is a member of the American Association for the Advancement of Science (AAAS), The Scientific Research Society Sigma Xi and the Institute of Electrical and Electronics Engineers (IEEE) Information Theory Society.