Byungkook Lee, Ph.D.

NIH Scientist Emeritus

Dr. Lee is a pioneer in the analysis of protein structure. He invented the concept of the accessible surface area of protein molecules and developed a paradigm-shifting theory of hydrophobicity, one of the most important forces governing the structure and interactions of biological molecules. His group’s recent studies of internally symmetric proteins and of protein domain determination using locally similar structural pieces (LSSPs) promise to identify the fundamental units and elementary interactions that make up protein structures.

Dr. Lee’s group also collaborates in the development of immunotoxins as anti-cancer agents, using protein engineering to design improved molecules and mathematical modeling to study their delivery process.

Areas of Expertise

1) protein structure analysis, 2) hydrophobicity, 3) structural bioinformatics, 4) computational biology, 5) protein structure modeling, 6) antibody engineering

Contact Info

Byungkook Lee, Ph.D.
Center for Cancer Research
National Cancer Institute
Building 37, Room 5120
Bethesda, MD 20892-4264
Ph: 301-496-6580

We use theoretical and computational techniques to solve important problems in biochemistry and molecular biology. The PI invented the concept of the solvent-accessible surface area of protein molecules and proposed a paradigm-changing theory of the origin of hydrophobicity. Since moving to NCI, the group has designed mutations that improve the properties of immunotoxins as anti-cancer agents: greater stability of the molecule, lower non-specific toxicity, higher affinity for the specific antigen, and reduced immunogenicity. The group also found many interesting and potentially useful organ-specific differentiation antigens by searching the EST (Expressed Sequence Tag) database.

Currently, the group is focused on the following three specific research topics.

Internally symmetric proteins. Many functional proteins have a symmetric structure. Most of these are multimeric complexes made of non-symmetric monomers arranged in a symmetric manner. However, a large number of proteins are symmetric in the monomeric state. Well-known examples of such internally symmetric structures include the 8-fold symmetric 'TIM' barrel folds, the β-blade propellers, the α/α superhelices, and the horseshoe-shaped leucine-rich repeat structures. The occurrence of internally symmetric structures poses a number of questions: What sequence and energetic features make repeating units fold into similar structures and arrange themselves in a symmetric pattern? What is the biological function of such symmetric chains? How do they differ from the symmetric structures of multimeric complexes, which are formed by symmetrically assembling non-symmetric monomers? How many symmetric chains, and what types of symmetry, exist in the protein universe? What is their evolutionary history?

Symmetric structures also tend to cause problems for automatic domain partition programs, which may assign a single repeating unit, or several such units, as a domain in some chains, yet assign the whole repeat set with its full symmetry as a domain in other chains of similar structure. For structures with super-helical symmetry, automatic structure comparison can also be a problem because the structure is flexible between the repeating units. One or a few units in two such structures can be recognized as similar, but the whole structures are often too different for the similarity to be detected routinely.

To begin studying these interesting objects, we devised a 3D structural symmetry detection program called SymD (for Symmetry Detection). It works through the 'alignment scan' procedure, in which a structure is systematically compared with a copy of itself after the copy has been circularly permuted by k residues, where k varies from 1 to N-3 and N is the number of amino acid residues in the protein. This procedure is sensitive because (1) it detects symmetry even when the structure contains symmetry-breaking insertions or deletions within or between the repeating units and (2) it amplifies the symmetry signal. When a protein is found to be symmetric, the procedure also yields the direction and position of the symmetry axis, the rotation angle, and, for helical symmetry, the pitch.

Among the symmetric folds that SymD finds in the ASTRAL 40 domain database are 70% to 80% of the TIM barrels, more than half of the alpha-alpha superhelices, most of the alpha-alpha toroids, LRRs (leucine-rich repeats), and transmembrane beta-barrels, and all beta-trefoils and beta-propellers. Globally, 10% to 15% of the proteins in the ASTRAL 40 domain database may be considered symmetric, depending on the precise cutoff value used to measure the degree of perfection of the symmetry. Symmetric proteins occur in all structural classes and can have a closed, circular structure, a cylindrical barrel-like structure, or an open, helical structure.

Our plan for this project is first to complete the automation of symmetry type determination and then to explore the relation between the symmetry type and the sequence, function and evolutionary history of the protein. Of particular interest is the interface between repeating units. This interface can serve as a prototype both for protein-protein interaction surfaces and for the interfaces at which different folding units assemble within a single globular protein.
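
The alignment scan idea can be illustrated with a deliberately simplified sketch. This is not the SymD program: it assumes the permutation is a plain circular shift, uses rigid, gapless Kabsch superposition of C-alpha-like coordinates, and therefore cannot handle the insertions and deletions that the real procedure tolerates. The function names and the synthetic test structure are hypothetical.

```python
import numpy as np

def superpose(P, Q):
    """Kabsch superposition of P onto Q (both (n, 3) arrays).

    Returns (rmsd, rotation angle in degrees)."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    rmsd = np.sqrt(np.mean(np.sum((P @ R.T - Q) ** 2, axis=1)))
    angle = np.degrees(np.arccos(np.clip((np.trace(R) - 1) / 2, -1, 1)))
    return rmsd, angle

def alignment_scan(coords):
    """Compare a structure with circularly shifted copies of itself.

    Returns {k: (rmsd, angle)} for k = 1 .. N-3 (gapless toy version)."""
    n = len(coords)
    return {k: superpose(coords, np.roll(coords, -k, axis=0))
            for k in range(1, n - 2)}

def make_symmetric_ring(n_fold=4, unit_size=5, seed=0):
    """Synthetic test data: one random unit copied n_fold times around z."""
    rng = np.random.default_rng(seed)
    unit = rng.normal(size=(unit_size, 3)) + np.array([8.0, 0.0, 0.0])
    units = []
    for j in range(n_fold):
        t = 2 * np.pi * j / n_fold
        Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
                       [np.sin(t),  np.cos(t), 0.0],
                       [0.0,        0.0,       1.0]])
        units.append(unit @ Rz.T)
    return np.vstack(units)

if __name__ == "__main__":
    scan = alignment_scan(make_symmetric_ring())
    for k, (rmsd, angle) in scan.items():
        print(f"shift {k:2d}: rmsd {rmsd:6.3f}  angle {angle:6.1f}")
```

In this toy, shifts that are a multiple of the repeat length superpose almost perfectly, and the rotation recovered at those shifts gives the rotation angle of the symmetry. A real structure would need an alignment step that allows gaps rather than a fixed residue-to-residue pairing.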

Protein domain structure and recurrence of locally similar structural pieces. Structural domains are basic units of protein structure; they are essential for understanding protein function and for exploring protein fold space and structure evolution. With the structural genomics initiative, the number of protein structures in the PDB is increasing dramatically, and there is a clear need for a reliable automatic domain assignment procedure. However, domain parsing is an old problem that has not been solved satisfactorily after some 30 years of effort. For example, Holland et al. (J. Mol. Biol., 2006) reported substantial differences between the domain definitions produced by different domain partition programs and those from manual partition for their set of target structures with relatively non-controversial domain boundaries. We think that at least part of this difficulty arises because domains have ultimately been defined by subjective criteria. Essentially all automatic procedures rely on recognizing groups of residues that are 'separated' from others geometrically, energetically, or both. But there is no precise, objective guidance on when groups are separated enough to be in separate domains.

The criterion of recurrence is different, at least in principle. Here, one defines a group of residues as a domain if they occur together in otherwise unrelated structures. Because the decision is made by observation rather than by judgment, an objective procedure may emerge from this criterion. We have developed initial versions of a domain definition procedure based on this principle. There is much room for improvement, and we will continue to refine it.

We made an unexpected discovery while developing this procedure: a large collection of locally similar structural pieces, which we named LSSPs, defines domains nearly as well as programs that operate on the principle of separation. LSSPs are small pieces, typically covering only about 10% to 20% of the domain but containing 3 or more secondary structure elements. It is surprising that a collection of such small pieces can define domains so well, and we are still trying to understand why. However, we have become aware of many cases in which LSSP-like pieces figure prominently. These include the fragments used in the highly successful fragment assembly method of protein structure prediction (Bystroff, et al., J. Mol. Biol. 1998; Simons et al., Proteins 1999; Bystroff, et al., J. Mol. Biol. 2000); the SSS (super secondary structure) library of Szustakowski et al. (Bioinformatics 2005), which is built from a collection of what look like our LSSPs and which one can use like a Lego set to build most protein domains; the LSSP-like pieces of Petrey et al. (Proc. Nat. Acad. Sci. 2009), which seem to move through proteins of different folds carrying a common function; and the proposal by Lupas et al. (J. Struct. Biol. 2001) that domains originated from a conglomerate of LSSP-like ADSs (antecedent domain segments). These separate reports seem to point to a common idea: domains are made of some non-random collection of small pieces that preserve their structure, and perhaps their sequence and function as well, across different protein folds. If so, domains and symmetric proteins have something in common, since symmetric proteins are also made of small units. One may think of symmetric proteins as homomers of small units and of domains as heteromers of small units. These ideas also give us some confidence that domains can, or perhaps should, be defined by means of LSSPs.

We will refine the automatic domain partition procedure using the recurrence principle. In addition, we plan to investigate the nature of the LSSPs and their relation to domain structure. For example, is there detectable homology among the more common LSSPs? Is the universe of protein folds made of different combinations of these small pieces and therefore effectively continuous? What, if any, is the relation between these pieces and the function and active site of the protein? Answers to these questions are relevant to protein folding and design and are important for understanding the function and evolution of protein structures.
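
As a rough illustration of the recurrence criterion, the toy sketch below groups residues that are repeatedly covered together by matched pieces. It is only one possible reading of the principle and is not the group's procedure: the input format (each piece given as the set of residue indices it covers in the chain of interest, assumed to come from matches in otherwise unrelated structures), the function name `recurrence_domains`, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def recurrence_domains(n_residues, pieces, min_support=2):
    """Group residues that co-occur in at least `min_support` pieces.

    `pieces` is an iterable of residue-index collections (hypothetical
    input: residues of the query chain covered by one matched piece)."""
    # Count how many pieces cover each pair of residues together.
    support = Counter()
    for residues in pieces:
        for a, b in combinations(sorted(set(residues)), 2):
            support[(a, b)] += 1

    # Union-find: merge residues whose pair support reaches the threshold.
    parent = list(range(n_residues))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in support.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    # Collect the merged groups, dropping residues that joined no group.
    groups = {}
    for r in range(n_residues):
        groups.setdefault(find(r), []).append(r)
    return sorted(g for g in groups.values() if len(g) > 1)

if __name__ == "__main__":
    pieces = [range(0, 4), range(1, 5), range(0, 5),
              range(5, 9), range(6, 10), range(5, 10), [4, 5]]
    print(recurrence_domains(10, pieces))
```

In the example, residues 4 and 5 are covered together by only one piece, so the two groups stay separate. The grouping here comes from counting observed co-occurrences, with the threshold as the only tunable choice.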

Mathematical model of the immunotoxin delivery process. Immunotoxins (ITs) are molecules constructed by joining the Fv part of an anticancer antibody to a suitable toxin, in our case a part of Pseudomonas exotoxin A. These molecules are designed to bind only to the target cancer cells and kill them. Dr. Pastan's group has made many such molecules, each with an antibody specific for a particular cancer. Some of these have been or are being tested in phase I and II clinical trials against leukemia, mesothelioma and other forms of cancer. The toxin is potent; it has been estimated that just a few molecules inside a cell can kill it. However, the effective dose needed to reduce tumor size in mouse xenograft models is at least several hundred times higher than this potency would suggest. We are building a mathematical model of the delivery process in order to understand it quantitatively, to identify the cellular and other factors that influence its effectiveness, to identify the sources of waste, and to help determine dosing methods and possible combination therapies that make delivery more efficient.

We have published the initial mathematical model of the immunotoxin delivery process (Chen et al., Annals of Biomedical Engineering, 36: 486-512, 2008). The model consists of a set of differential equations that represent the rates of various processes: translocation of the immunotoxin from the blood vessel into the tumor tissue, diffusion through the intercellular space of the tumor tissue, non-specific decay and clearance from the tumor tissue, uptake by the tumor cells, endocytosis and transport through the cell interior into the cytoplasm, decay during this process, cell killing, and the change in tumor volume (growth or shrinkage). The model identified nearly 20 factors that influence the delivery process. We were able to determine or estimate values for these parameters that reproduce fairly accurately the experimentally observed tumor volume changes upon IT administration in mouse xenograft tumor models, and we identified some of the sensitive parameters. For example, the natural tumor growth rate was found to be one of the most sensitive parameters. This means that any other drug that slows tumor growth will increase the potency of the IT, which partly explains the observed synergistic effect of taxol and IT combination therapy (Zhang et al., Clin. Cancer Res. 2006). (However, there are probably other, perhaps more important, mechanisms by which taxol exerts its synergistic effect. See Zhang et al., Proc. Nat. Acad. Sci. 2007.) Similarly, the blood vessel density in the tumor tissue is another highly sensitive parameter, which suggests that anti-angiogenesis agents should also exert a synergistic effect.

The model works at two different scales: one at the molecular level, tracking the concentration of the immunotoxin in various compartments of the tumor tissue, and another at the cell level, tracking the numbers of fast-dividing cancer cells, of cells intoxicated by the toxin, and of dead cells waiting to be cleared. The initial model unfortunately contained an error in coupling these two scales. We have now developed a new model (Pak et al., Cancer Res. 72: 3143-3152, 2012) that couples the two scales correctly and seamlessly. The new model also handles the effect of antigen shedding, which is known to happen for most, if not all, known tumor-specific antigens. From this new model, we found the surprising result that antigen shedding increases the delivery efficacy of SS1P, the anti-mesothelin immunotoxin against mesothelioma and lung and pancreatic cancers. We are now working on a third-generation model, which has much improved parameters. We are using this most up-to-date model to delimit the range of parameter values in which antigen shedding becomes beneficial instead of detrimental, as would normally be expected, and to determine the mechanism by which agents like taxol exert their synergistic effect.
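
To give a feel for what a rate-equation model of this kind looks like, here is a heavily reduced toy with three state variables: extracellular IT concentration in the tumor, internalized toxin, and tumor volume. It is not the published model. The equations, the parameter names, and every numerical value are made up for illustration, and the two-scale coupling and antigen shedding discussed above are omitted entirely.

```python
import math

def simulate(dose, growth=0.2, days=10.0, dt=0.01):
    """Integrate a toy delivery model with classical RK4.

    Returns the tumor volume at `days`, relative to an initial volume
    of 1. All rate constants (per day) are hypothetical placeholders."""
    k_clear = 2.0  # clearance of IT from blood
    k_in = 1.0     # exchange between blood and tumor tissue
    k_decay = 0.5  # non-specific decay in the tissue
    k_up = 0.5     # uptake by tumor cells
    k_deg = 0.3    # loss of internalized toxin
    k_kill = 1.5   # killing rate per unit of internalized toxin

    def rhs(t, y):
        c_e, i_c, vol = y
        c_b = dose * math.exp(-k_clear * t)  # blood level after a bolus
        return (k_in * (c_b - c_e) - (k_decay + k_up) * c_e,
                k_up * c_e - k_deg * i_c,
                (growth - k_kill * i_c) * vol)

    def step(y, k, h):
        return tuple(a + h * b for a, b in zip(y, k))

    y, t = (0.0, 0.0, 1.0), 0.0
    for _ in range(round(days / dt)):
        k1 = rhs(t, y)
        k2 = rhs(t + dt / 2, step(y, k1, dt / 2))
        k3 = rhs(t + dt / 2, step(y, k2, dt / 2))
        k4 = rhs(t + dt, step(y, k3, dt))
        y = tuple(a + dt / 6 * (p + 2 * q + 2 * r + s)
                  for a, p, q, r, s in zip(y, k1, k2, k3, k4))
        t += dt
    return y[2]

if __name__ == "__main__":
    print("untreated volume:", simulate(0.0))
    print("treated volume:  ", simulate(1.0))
```

A crude way to probe parameter sensitivity in such a sketch is to rerun `simulate` with one parameter perturbed at a time and compare the final volumes. Conclusions drawn from this toy should not be read as results of the actual model.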


NIH Scientific Focus Areas:
Biomedical Engineering and Biophysics, Computational Biology, Structural Biology

  1. Lee B, Richards FM. J. Mol. Biol. 55: 379-400, 1971.
  2. Lee B. Biopolymers. 31: 993-1008, 1991.
  3. Kim C, Basner J, Lee B. BMC Bioinformatics. 11: 303, 2010.
  4. Pak Y, Zhang Y, Pastan I, Lee B. Cancer Res. 72: 3143-3152, 2012.
  5. Tai C, Bai H, Taylor TJ, Lee B. Proteins. 82 Suppl 2: 57-83, 2014.

Dr. Lee received his Ph.D. from Cornell University in 1967 and studied at Yale University. From 1970 to 1980 he taught at the University of Kansas. He has been at the NIH since 1981 and in his present position since 1991.