Mikhail Kolmogorov, Ph.D.

Center for Cancer Research
National Cancer Institute

Building 41, Room A100C
Bethesda, MD 20892
240-858-3169
mikhail.kolmogorov@nih.gov

Stadtman Investigator

Cancer Data Science Laboratory

Head, Structural Variation and Advanced Sequencing Technologies Section

RESEARCH SUMMARY

The focus of Dr. Kolmogorov’s research is computational genomics, with particular interest in cancer. The Kolmogorov lab develops new algorithms and tools that take advantage of new genomic technologies (such as long-read sequencing or Hi-C) to understand how genomic mutations and rearrangements affect cancer evolution and treatment response.

Areas of Expertise

Cancer Genomics

Computational Genomics

Bioinformatics Algorithms

Data Science and Computational Biology

Long-Read DNA Sequencing

Big Data Analysis

Research

New sequencing technologies and algorithms to study cryptic variation in cancer genomes

Cancer is a disease of the genome. Most cancers are driven by somatic mutations, such as single nucleotide variations (SNVs) that can alter protein sequence and function. Another hallmark of cancer is structural variation (SV), a process that can insert, delete or rearrange large chromosomal fragments. SVs vary greatly in size and complexity: from local oncogene amplifications to catastrophic events that shuffle megabase-scale fragments from one or multiple chromosomes.

Recent analysis of 2,658 tumors by the Pan-Cancer Analysis of Whole Genomes project showed that ~50% of cancer driver mutations are explained by SVs. Despite that, somatic SVs in cancer remain understudied because of technological and methodological challenges. Most current cancer genomics projects rely on short-read sequencing data, which systematically miss certain classes of somatic SVs and often produce many false-positive calls.

Our lab is developing new computational approaches that use novel sequencing technologies (such as long-read sequencing or chromatin conformation capture) to better understand the prevalence and role of SV in cancer. In collaboration with other NIH and extramural investigators, we apply these new methods to various cancer types and patient cohorts. We also aim to improve the scalability and reduce the cost of long-read sequencing projects, paving the road towards the complete variational landscape of the human genome and microbiome.

Current research highlights

New open-source tools for characterization of complex somatic SVs and CNVs using long reads. Despite the recent successes of long-read genomics (including the methods developed in our group), most popular current approaches were not designed to handle the complexity of a cancer genome. Our lab is developing several new long-read tools to address this. Severus is a breakpoint graph-based algorithm for somatic SV calling that supports matched tumor/normal analysis, characterizes complex multi-break rearrangements, and produces phased calls. Wakhan is a tool for haplotype-specific copy number variant (CNV) calling, designed to explore large chromosomal aberrations, such as aneuploidy, while remaining sensitive to small focal amplifications.

Long-read genome assembly of heterogeneous human microbiomes. Bacterial species in microbial communities are often represented by mixtures of strains, distinguished by small variations in their genomes. Short-read approaches can detect small-scale variation between strains but fail to phase these variants into contiguous haplotypes. We have developed Strainy, a tool for strain-level metagenome assembly and phasing from long-read metagenomic sequencing. Strainy takes a de novo metagenomic assembly as input and identifies strain variants, which are then phased and assembled into contiguous haplotypes.

Scalable methods for population-scale long-read sequencing. Despite the advances of long-read technologies, cost and scalability have remained prohibitive barriers to their use in population-scale studies. We are developing new methods and engaging with various groups and consortia to ultimately enable long-read analysis of thousands of samples. This will pave the road towards understanding the population-scale structural diversity of the human genome. For example, in collaboration with NIH CARD, we developed the Napu pipeline with the ultimate goal of sequencing and analyzing 4000+ human brain genomes.

Publications

Severus detects somatic structural variation and complex rearrangements in cancer genomes using long-read sequencing

Ayse G Keskus, Asher Bryant, Tanveer Ahmad, Byunggil Yoo, Sergey Aganezov, Anton Goretsky, Ataberk Donmez, Lisa A Lansdon, Isabel Rodriguez, Jimin Park, Yuelin Liu, Xiwen Cui, Joshua Gardner, Brandy McNulty, Samuel Sacco, Jyoti Shetty, Yongmei Zhao, Bao Tran, Giuseppe Narzisi, Adrienne Helland, Daniel E Cook, Pi-Chuan Chang, Alexey Kolesnikov, Andrew Carroll, Erin K Molloy, Chengpeng Bi, Adam Walter, Margaret Gibson, Irina Pushel, Erin Guest, Tomi Pastinen, Kishwar Shafin, Karen H Miga, Salem Malikic, Chi-Ping Day, Nicolas Robine, Cenk Sahinalp, Michael Dean, Midhat S Farooqi, Benedict Paten, Mikhail Kolmogorov

Nature Biotechnology. 2025.

[ Journal Article ]

Unraveling the hidden complexity of cancer through long-read sequencing

Qiuhui Li, Ayse G Keskus, Justin Wagner, Michal B Izydorczyk, Winston Timp, Fritz J Sedlazeck, Alison P Klein, Justin M Zook, Mikhail Kolmogorov, Michael C Schatz

Genome Research. 2025.

[ Journal Article ]

Strainy: phasing and assembly of strain haplotypes from long-read metagenome sequencing

E Kazantseva, A Donmez, M Frolova, M Pop, M Kolmogorov

Nature Methods. 2024.

[ Journal Article ]

Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation

Mikhail Kolmogorov, Kimberley J Billingsley, Mira Mastoras, Melissa Meredith, Jean Monlong, Ryan Lorig-Roach, Mobin Asri, Pilar Alvarez Jerez, Laksh Malik, Ramita Dewan, Xylena Reed, Rylee M Genner, Kensuke Daida, Sairam Behera, Kishwar Shafin, Trevor Pesout, Jeshuwin Prabakaran, Paolo Carnevali, Jianzhi Yang, Arang Rhie, Sonja W Scholz, Bryan J Traynor, Karen H Miga, Miten Jain, Winston Timp, Adam M Phillippy, Mark Chaisson, Fritz J Sedlazeck, Cornelis Blauwendraat, Benedict Paten

Nature Methods. 2023.

[ Journal Article ]

Assembly of long, error-prone reads using repeat graphs

Kolmogorov M, Yuan J, Lin Y, Pevzner PA.

Nature Biotechnology. 37(5): 540-546, 2019.

[ Journal Article ]

Biography

Mikhail is currently a Stadtman Investigator at the National Cancer Institute, where he leads a group focusing on computational and cancer genomics. Prior to that, Mikhail was a postdoctoral fellow at UC Santa Cruz, supervised by Dr. Benedict Paten. Mikhail completed his Ph.D. in Computer Science at UC San Diego in September 2019, under the mentorship of Dr. Pavel Pevzner. Mikhail received his M.S. in bioinformatics from St. Petersburg Academic University, Russia.

Job Vacancies

We have no open positions in our group at this time. Please check back later.

To see all available positions at CCR, take a look at our Careers page. You can also subscribe to receive CCR's latest job and training opportunities in your inbox.

Team

POSTDOCTORAL VISITING FELLOW

Ayse Gokce Keskus, Ph.D.

PREDOCTORAL VISITING FELLOW

Ataberk Donmez, M.S.

POSTDOCTORAL VISITING FELLOW

Tanveer Ahmad, Ph.D.

PREDOCTORAL FELLOW

Asher Bryant, MSc

PREDOCTORAL CRTA

Anton Yuri Goretsky

POST-BACCALAUREATE FELLOW (SUMMER STUDENT)

Julissa Zelaya Portillo

News

Learn more about CCR research advances, new discoveries and more in our news section.

Resources

Our Software

Severus

Severus is a somatic structural variation (SV) caller for long reads (both PacBio and ONT). It is designed for matched tumor/normal analysis, supports multiple tumor samples, and produces accurate and complete somatic and germline calls. Severus takes advantage of long-read phasing and uses the breakpoint graph framework to model complex chromosomal rearrangements.

Wakhan

Wakhan is a tool for analyzing haplotype-specific chromosome-scale somatic copy number aberrations (CNAs) and aneuploidy using long reads (Oxford Nanopore, PacBio). Wakhan takes long-read alignments and phased heterozygous variants as input, and first extends the phased blocks, taking advantage of the CNA differences between the haplotypes. Wakhan then generates interactive haplotype-specific coverage plots.

Strainy

Strainy is a tool for phasing and assembly of bacterial strains from long-read sequencing data (either Oxford Nanopore or PacBio). Given a reference (or collapsed de novo assembly) and set of aligned reads as input, Strainy produces multi-allelic phasing, individual strain haplotypes and strain-specific variant calls.

Flye & metaFlye

Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. Flye also has a special mode for metagenome assembly.

Alumni

Jeshuwin Prabakaran

June 2022-August 2022

Summer Student
