Javed Khan, M.D.

Deputy Chief
Senior Investigator
Head, Oncogenomics Section

The mission of the Oncogenomics Section is to harness the power of high-throughput genomic and proteomic methods to improve the outcome of children with high-risk metastatic, refractory, and recurrent cancers. The research goals are to integrate the data, decipher the biology of these cancers, identify and validate biomarkers and novel therapeutic targets, and rapidly translate our findings to the clinic.

Areas of Expertise

1) oncogenomics, 2) genomics, 3) proteomics, 4) immunogenomics, 5) bioinformatics

Contact Info

Javed Khan, M.D.
Center for Cancer Research
National Cancer Institute
Building 37, Room 2016B
Bethesda, MD 20892
Ph: 240-760-6135

Mission of the Oncogenomics Section

1. Comprehensive Omics Analysis: a) Applying high-throughput genomics, proteomics, mathematical modeling, and bioinformatics to characterize currently incurable malignancies, including metastatic, resistant, and relapsed tumors, for the identification and validation of biomarkers and therapeutic targets. b) Genome-wide association studies and mutational screening of germline DNA.

2. Targeted Therapeutics: a) High-throughput siRNA, small-molecule, natural-product, and drug screening for high-risk pediatric malignancies. b) Development of molecularly targeted therapeutic agents against existing and newly identified targets.

3. Translational Oncology: Leveraging the existing clinical and scientific strengths within the Genetics Branch, including phase 1/2 therapeutics, immune/vaccine therapy, and molecular biology, to translate these findings to the clinic in an environment with state-of-the-art clinical care.

NIH Scientific Focus Areas:
Cancer Biology, Chemical Biology, Computational Biology, Genetics and Genomics, Molecular Biology and Biochemistry

Selected Publications

  1. Gryder BE, Pomella S, Sayers C, Wu XS, Song Y, Chiarella AM, Bagchi S, Chou HC, Sinniah RS, Walton A, Wen X, Rota R, Hathaway NA, Zhao K, Chen J, Vakoc CR, Shern JF, Stanton BZ, Khan J.
    Nature Genetics. 51(12): 1714-1722, 2019. [ Journal Article ]
  2. Gryder BE, Wu L, Woldemichael GM, Pomella S, Quinn TR, Park PMC, Cleveland A, Stanton BZ, Song Y, Rota R, Wiest O, Yohe ME, Shern JF, Qi J, Khan J.
    Nat Commun. 10(1): 3004, 2019. [ Journal Article ]
  3. Gryder BE, Yohe ME, Chou H-C, Zhang X, Marques J, Wachtel M, Schaefer B, Sen N, Song Y, Gualtieri A, Pomella S, Rota R, Cleveland A, Wen X, Sindiri S, Wei JS, Barr FG, Das S, Andresson T, Guha R, Lal-Nag M, Ferrer M, Shern JF, Zhao K, Thomas CJ, Khan J.
    Cancer Discov. 7(8): 884-899, 2017. [ Journal Article ]
  4. Chang W, Brohl AS, Patidar R, Sindiri S, Shern JF, Wei JS, Song YK, Yohe ME, Gryder B, Zhang S, Calzone KA, Shivaprasad N, Wen X, Badgett TC, Miettinen M, Hartman KR, League-Pascual JC, Trahair TN, Widemann BC, Merchant MS, Kaplan RN, Lin JC, Khan J.
    Clin Cancer Res. 22(15): 3810-20, 2016. [ Journal Article ]
  5. Shern JF, Chen L, Chmielecki J, Wei JS, Patidar R, Rosenberg M, Ambrogio L, Auclair D, Wang J, Song YK, Tolman C, Hurd L, Liao H, Zhang S, Bogen D, Brohl AS, Sindiri S, Catchpoole D, Badgett T, Getz G, Mora J, Anderson JR, Skapek SX, Barr FG, Meyerson M, Hawkins DS, Khan J.
    Cancer Discov. 4: 216-31, 2014. [ Journal Article ]

Dr. Khan obtained his bachelor's degree in 1984 and his master's degree in 1989 in immunology and parasitology at England's University of Cambridge. He subsequently obtained his M.D. there and the postgraduate degree of MRCP (Membership of the Royal College of Physicians), equivalent to board certification in the United States. After clinical training in internal medicine and pediatrics, as well as other specialties, he received a Leukemia Research Fellowship. In May 2001, Dr. Khan joined the Pediatric Branch, NCI, as a tenure-track investigator. Dr. Khan and colleagues have published a new model for the diagnosis of cancer using artificial neural networks (ANN), a form of artificial intelligence, and microarray technology. In April 2001, Dr. Khan was recognized by the American Association for Cancer Research for his work in tumor profiling with a Scholar in Training Award. Recently, Dr. Khan has led an international collaboration to perform comprehensive analysis of pediatric cancer genomes using next-generation sequencing strategies.

Name                             Position
Abdalla Abdelmaksoud, B.S.       Bioinformatics Analyst (Leidos)
Andrew Brohl, M.D.               Guest Researcher
Tai Chi (Adam) Cheuk, Ph.D.      Scientist (Contr.)
Hsien-Chao Chou, Ph.D.           Bioinformatics Analyst (Contr.)
Berkley Gryder, Ph.D.            Scientist (Contr.)
Robert G. Hawley, Ph.D.          Research Collaborator
Yong Kim, M.D., Ph.D.            Clinical Fellow
Igor Kuznetsov, Ph.D.            Research Collaborator
Katherine Masih, B.S.            Predoctoral Fellow (Medical Student)
David E. Milewski, Ph.D.         Postdoctoral Fellow (CRTA)
Young Song, Ph.D.                Biologist
Beverly Stalker                  Program Specialist
Meijie Tian, Ph.D.               Postdoctoral Fellow (Visiting)
Chaoyu Wang, B.S.                Biologist
Jun S. Wei, Ph.D.                Staff Scientist
Xinyu Wen, M.S.                  Bioinformatics Analyst (Leidos)

Data Analysis Tools

  • Online Array CGH Analysis Tool (under construction)
  • Primer Database: Contains validated genome-wide sequencing primers from the published article by Sjöblom et al., Science 314:268-74, 2006
  • NB-cMAP:  NCI-Neuroblastoma Connectivity Map

Bioinformatic Analysis Training

Code locations: 

   ChIP-seq: /data/khanlab/projects/ChIP_seq/projects/

   RNA-seq: /data/khanlab/projects/ChIP_seq/RNA_DATA/RNA_projects/


Section 1: Navigating Biowulf, Pipeline Output, and Unsupervised Correlation 

Video 1.1: What is biowulf? What is the pipeline? 
  • ChIPseq Bioinformatic Pipeline Overview (0:00)

  • Locate your data (2:45)

  • Input metadata into ChIP_seq_samples.xlsx (4:20)

  • Example of pipeline output (8:15)

  • Viewing files in IGV (8:50)

  • Load from ENCODE (10:10)

  • Choosing a p-value (12:10 and 21:40)

  • GREAT ontology (14:35) 

  • Viewing summary files to determine total number of peaks (20:15)

  • Homer motif analysis (23:10)

  • Identifying mutations from ATACseq data (27:50)


Video 1.2: Summarize peaks 
  • Where to find scripts (1:10)

  • Logging on to biowulf and running scripts (2:15)

    • in PuTTY, enter biowulf2.nih.gov as the host name and click “Open”

    • answer username and password prompts

  • Navigating in biowulf (4:40)

    • type: cd /data/khanlab/projects/ChIP_seq/

  • Code_Builder (8:40)

    • ***code_builder created during these videos is saved onto the google drive***

  • Summarize peaks across samples (10:40)

    • Find ChIP_seq_samples: khanlab/projects/ChIP_seq/manage_samples

  • Using summarized peaks to determine p-value (21:40)

  • Summary of other analyses that can be done with ATACseq data (28:00)
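
The idea of using peak counts to settle on a p-value cutoff can be sketched in a few lines. This is a minimal illustration, not the lab's script. It assumes peaks are stored in the standard narrowPeak layout, where column 8 holds -log10(p-value). The function name `count_peaks` and the example rows are hypothetical.

```python
import math
from typing import Iterable


def count_peaks(lines: Iterable[str], p_cutoff: float) -> int:
    """Count peaks whose p-value is at or below p_cutoff.

    Assumes narrowPeak-style rows: column 8 stores -log10(p-value).
    """
    min_score = -math.log10(p_cutoff)
    count = 0
    for line in lines:
        # Skip blank lines and common header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[7]) >= min_score:
            count += 1
    return count


# Hypothetical peaks, for illustration only.
example = [
    "chr1\t100\t200\tpeak1\t500\t.\t12.0\t9.5\t7.0\t50",
    "chr1\t900\t990\tpeak2\t80\t.\t3.1\t4.2\t2.0\t40",
    "chr2\t50\t150\tpeak3\t300\t.\t8.0\t7.1\t5.5\t60",
]

for cutoff in (1e-5, 1e-7, 1e-9):
    print(cutoff, count_peaks(example, cutoff))
```

Tabulating the count at several cutoffs for each sample shows where the number of peaks starts to drop sharply, which is one way to compare candidate thresholds.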


Video 1.3: Run an unsupervised correlation 
  • Figures produced by unbiased correlation of ATACseq data (0:00)

  • What is the folder path for correlations? (0:45)

  • What files are generated? (1:30)

  • Code_Builder for correlation (2:30)

  • Create sample files using RStudio (4:00)

  • Defining different p-values for your samples for all downstream analysis (14:00)

  • Run correlation on created sample files using biowulf (30:15)

    • ***Ultimately does not work, skip to Video 1.4 ***

  • Checking your “queue” on biowulf (31:40)

  • Output of correlation (38:00)
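
For orientation, an unsupervised correlation of this kind boils down to computing a pairwise correlation between the signal profiles of the samples. The sketch below assumes Pearson correlation over a shared set of regions. The metric actually used by the lab script is not stated here, and the sample names and values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(signal):
    """Return a nested dict of all pairwise correlations between samples."""
    names = list(signal)
    return {a: {b: pearson(signal[a], signal[b]) for b in names} for a in names}


# Hypothetical signal over the same four regions in each sample.
signal = {
    "sample_A": [10.0, 2.0, 7.5, 0.5],
    "sample_B": [20.0, 4.0, 15.0, 1.0],
    "sample_C": [1.0, 9.0, 2.0, 8.0],
}

corr = correlation_matrix(signal)
for name, row in corr.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Samples with similar profiles (here sample_A and sample_B) score close to 1, which is what drives the clustering seen in the correlation figures.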


Video 1.4: 
  • What was done in Video 1.3 (0:00)

  • Code_builder explanation (1:10)

  • How to format bed, bam, and sample list files to be Unix compatible (1:30)

  • Checking the slurm (5:50)

  • Viewing peaks from generated beds in IGV (11:00)
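
A common reason list files fail on a Linux cluster is Windows-style line endings (CRLF) left behind by Excel or a Windows text editor. The sketch below shows one way to normalize them. It assumes the problem is line endings only and is meant for text files such as bed files and sample lists, never for binary files such as BAM. The function names are hypothetical.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path: str) -> None:
    """Rewrite a text file in place with Unix line endings."""
    with open(path, "rb") as handle:
        raw = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix_line_endings(raw))


# In-memory demonstration with a two-line, tab-separated sample list.
windows_text = b"sample_A\tsample_A.bed\r\nsample_B\tsample_B.bed\r\n"
print(to_unix_line_endings(windows_text))
```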


Section 2: Running BCHNV 

Video 2.1: 
  • What is BCHNV? (0:00)

  • Gathering necessary (configuration, run, etc.) files (7:00)

  • Code_Builder BCHNV (9:20)

  • Customizing BCHNV run parameters in the shell script and code_builder (11:50)

  • Editing configuration file (22:20)

  • Determining color scheme (31:20)

  • Gathering .bed files (38:00)

  • Edit permissions (49:00)


Section 3: Gathering, Merging, and Combining Bed Files

Video 3.1: “bed tools” 

  • Using ChIP_seq_samples.xlsx to gather beds (0:00)

  • Intro to RStudio (2:30)

  • Gathering .beds with RStudio (6:00)

  • Merge .beds using bedtools merge, how to count, etc (20:00)

  • Running script on biowulf (29:00)
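
To make the merge step concrete, the sketch below reproduces the core behavior of `bedtools merge` with default settings: intervals on the same chromosome that overlap or touch are collapsed into one. It is an illustration of the logic only and is not a replacement for bedtools. The function name and example peaks are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (chrom, start, end) intervals."""
    merged = []
    # Sorting by chromosome, then start, mirrors the sorted input bedtools expects.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(item) for item in merged]


# Hypothetical peaks gathered from several samples.
peaks = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 300, 400),
    ("chr2", 10, 50),
    ("chr1", 500, 600),
]

merged_peaks = merge_intervals(peaks)
print(merged_peaks)
print("regions after merging:", len(merged_peaks))
```

Counting the merged regions, as in the last line, gives the size of the combined peak set across samples.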


Video 3.2: Combining beds with “.cat” 
  • How to combine bed files that are not in the ChIP_seq_samples.xlsx spreadsheet

  • Shortcut to combine all files with the same ending in a folder (2:40)
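
The shortcut amounts to concatenating every file in a folder that shares an ending. A minimal sketch of that idea follows. The function name, file names, and output name are hypothetical, and the demonstration runs in a temporary folder so that nothing on disk is touched.

```python
import tempfile
from pathlib import Path


def combine_files(folder, suffix, out_name):
    """Concatenate all files in `folder` ending with `suffix` into `out_name`."""
    folder = Path(folder)
    out_path = folder / out_name
    # Exclude the output file itself in case it shares the suffix.
    inputs = sorted(p for p in folder.glob(f"*{suffix}") if p != out_path)
    with open(out_path, "w") as out:
        for path in inputs:
            text = path.read_text()
            # Make sure each file ends with a newline so rows do not run together.
            out.write(text if text.endswith("\n") or not text else text + "\n")
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.bed").write_text("chr1\t1\t2\n")
    Path(tmp, "b.bed").write_text("chr2\t3\t4")
    Path(tmp, "notes.txt").write_text("ignored\n")
    combined = combine_files(tmp, ".bed", "combined.bed").read_text()

print(combined)
```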


Section 4: Downloading Data from Gene Expression Omnibus (GEO)

Video 4.1: RNA data


Video 4.2: ChIPseq or ATACseq data 


Section 5: Processing and Analyzing RNAseq Data (NOT main Khan Lab Pipeline)

Video 5.1: 
  • Overview on pipeline output (0:00)

  • RNAseq Code Builder (3:40)

  • Putting RNA samples in ChIP_seq_samples.xlsx (6:45)


Video 5.2: 
  • Review of RNAseq pipeline outputs (0:00)

  • Locating R scripts (1:40)

  • TPM vs. FPKM (4:30)

  • Viewing values in R (7:55)

  • Build Matrix (11:00)

  • Make a gene expression heat map (13:40)

  • Create a comparison scatter plot (16:20)

  • GSEA rank list maker (24:30)

  • Download GSEA (39:40)

    • software.broadinstitute.org/gsea/login.jsp

  • Heat map for smaller gene sets (48:00)

  • GSEA Output (57:55)
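
On the TPM versus FPKM point, the two units differ only in normalization: within one sample, TPM for a gene equals its FPKM divided by the sum of all FPKM values in that sample, multiplied by one million. The sketch below shows the conversion. The gene names and values are hypothetical.

```python
def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values to TPM.

    TPM_i = FPKM_i / sum(FPKM) * 1e6, so TPM values always sum to one million.
    """
    total = sum(fpkm.values())
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


# Hypothetical FPKM values for a single sample.
fpkm = {"GENE_A": 10.0, "GENE_B": 30.0, "GENE_C": 60.0}
tpm = fpkm_to_tpm(fpkm)

for gene, value in tpm.items():
    print(gene, round(value, 1))
print("sum of TPM:", round(sum(tpm.values()), 1))
```

Because every sample sums to the same total in TPM, values are easier to compare across samples than FPKM.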


Video 5.3: 
  • Where to find/What is RNA_projects (0:00)

    • Notes: 

      • Clear environment in R = “Ctrl” + “Shift” + “F10”

      • View a “value” in R = start typing the name, “up arrow”, “enter”

  • Create TPM Matrix for RNAseq data (1:45)

  • Inversion grep (14:09)

    • “invert=T” 

  • Visualize with a heatmap (18:10)

  • Install “pheatmap” (19:08)

  • Heatmap with only genes of interest (23:20)

    • Change subset, meaning select columns from matrix (25:30)

  • Confirming expression level with IGV (29:20)

    • VERY important when using Access RNAseq data

    • Look for 5’UTRs and long exons without probes 

  • Identify housekeeping genes (34:00)

    • K:/projects/ChIP_seq/RNA_DATA/RNA_projects/Genesets/Qlucore_format/HouseKeeping_genes.txt 

    • From Cancer Discovery paper… a compilation of a variety of normal tissues

  • Define housekeeping genes within your own sample set (39:00)

  • Use a list to compare 2 gene sets (44:00)

  • Transcription Factors gene set - Based on the Broad’s + Berkley’s additions (49:00)

    • K:/projects/ChIP_seq/RNA_DATA/RNA_projects/Genesets/Qlucore_format/TFs_Epimachines.genelist.txt 

  • Find specialized TFs for your sample set (52:30) 

  • Edit heat map’s scale (1:08:15)

  • Overview of TF identification strategy (1:13:00)

  • Video summary/where to find files (1:19:40) 
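
The inversion grep shown in the video is done in R with “invert=T”, which keeps every entry that does not match the pattern. For readers who want to see the logic spelled out, here is an equivalent sketch in Python. The helper name `grep` and the sample names are hypothetical.

```python
import re


def grep(pattern, names, invert=False):
    """Return names matching `pattern`, or the non-matching ones if invert=True."""
    regex = re.compile(pattern)
    return [name for name in names if bool(regex.search(name)) != invert]


# Hypothetical column names from an expression matrix.
samples = ["RH4_RNA", "RH4_input", "RD_RNA", "CTR_input"]

print(grep("input", samples))               # entries that match
print(grep("input", samples, invert=True))  # entries that do not match
```

Inverting the match is handy for dropping unwanted columns from a matrix without having to list every column that should stay.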


Video 5.4 (Continuation of Video 5.3): 
  • Load in second set of samples (1:00)

  • Identify maximum in a matrix (3:30)

  • List out genes in a defined gene set (9:00)

  • Explanation of how we could theoretically ignore UTRs and compare multiple RNAseq data types (20:00)

  • What you can compare using Access RNAseq data (21:40)

  • Ideas of how to look at your data after you’ve filtered out multiple gene lists (26:00)

  • Using Microsoft Excel to look at your data and run a t-test, apply a log scale, etc. (36:30)
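
The same two steps done in Excel, a log transform followed by a t-test, can be sketched as below. This is an illustration under assumptions: it uses log2(TPM + 1) and the unequal-variance (Welch) form of the t statistic, and it returns only the statistic, not a p-value. The expression values are hypothetical.

```python
import math
from statistics import mean, variance


def log2_transform(values, pseudocount=1.0):
    """Apply log2(value + pseudocount) so that zeros stay defined."""
    return [math.log2(v + pseudocount) for v in values]


def welch_t(a, b):
    """Welch's t statistic for two independent groups (sample variances)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical TPM values for one gene in two groups of samples.
group_1 = log2_transform([3.0, 7.0, 15.0])
group_2 = log2_transform([0.0, 1.0, 3.0])

t_stat = welch_t(group_1, group_2)
print("log2 values:", group_1, group_2)
print("t statistic:", round(t_stat, 3))
```

Working on the log scale keeps a few very highly expressed samples from dominating the comparison.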


Section 6: ChIPseq qPCR, Pipeline, and Analysis 

Video 6.1: Designing ChIP qPCR Primers
  • Rationale on where to design primers (0:20)

  • Visualizing data in IGV (3:00)

  • Choosing locations (8:00)

  • “Designing Primers for ChIP.docx” (10:00)

    • follow links and instructions in file

  • Go to www.idtdna.com/pages 

    • use Young’s account info, and tell him you have primers to order

    • 25 nmole DNA Oligo with Standard Desalting

  • Add primers to qPCR_primers_v2.bed using genome.ucsc.edu (18:30)


Video 6.2: How to run ChIP qPCR
  • Where do I find the files and which files are there? (1:00)

  • Setting up the plate in Excel (3:00)

  • Run qPCR following the protocol in “ChIP-qPCR ViiA7 protocol.docx” (12:00)


Video 6.3: Auto-launching ChIPseq Pipeline 
  • See “Video 1.1” for a more detailed overview of the pipeline output

  • Input metadata (0:10)

    • preferably before sequencing run

  • Importance of file naming (3:00)

  • How to manually launch the pipeline (7:30)

    • SEE YOUNG OR JUN FIRST! (not for a normal situation)


Video 6.4: Gathering Homer Motifs and Summarizing Enhancers
  • Where to find files and creating input file (1:00) 

  • Viewing all motif data based on all called peaks in Excel (3:50)

    • ***Note mistake made in sorting… corrected at (16:00)

  • Look at homerMotifs.R to see scripts that you can use to manipulate the data (7:00)

    • can upload the created matrix into R and use pheatmap 

  • Correlating to Coltron Output (13:30) 

    • “DEGREE_Table” 

    • Use to tease out which TFs are likely more significant 

      • what is expressed? 

      • what has super enhancers? 


Video 6.5: Compare motif enrichment across various groups of samples
  • R script = gatherBCHN_motifs.R (0:15)

    • Gather files (7:00) and grep

    • Make matrix (11:00)

    • Heatmap data (13:30)

    • Filter for highly enriched motifs (15:00)

  • “Wide format”? (11:30)

  • Using generated “allmotifs.wide.text” file with RNAseq data (21:50)
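
As a reminder of what “wide format” means: a long table has one row per sample and motif pair, while a wide table has one row per motif and one column per sample. The sketch below shows the reshaping on made-up values. The function name, sample names, and scores are hypothetical, and missing pairs are left as None instead of being filled with a guess.

```python
def long_to_wide(records, missing=None):
    """Pivot (sample, motif, value) records into a motif-by-sample table."""
    motifs = sorted({motif for _, motif, _ in records})
    samples = sorted({sample for sample, _, _ in records})
    lookup = {(sample, motif): value for sample, motif, value in records}
    header = ["motif"] + samples
    rows = [
        [motif] + [lookup.get((sample, motif), missing) for sample in samples]
        for motif in motifs
    ]
    return header, rows


# Hypothetical enrichment scores in long format.
records = [
    ("sample_A", "MYOD1", 12.0),
    ("sample_A", "CTCF", 30.0),
    ("sample_B", "MYOD1", 2.5),
]

header, rows = long_to_wide(records)
print(header)
for row in rows:
    print(row)
```

The wide layout lines motifs up against samples in the same way an expression matrix lines up genes against samples, which makes the two easier to compare.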


Section 7: Making and Editing Figures 

Video 7.1: Figures for Western Blots in Adobe Illustrator 

  • “masking” Illustrator (0:00)

    • object > clipping mask > make 

    • double click inside the box to see inside mask (9:50)

  • “rectangle tool” (3:00)

  • “rotation” of a bitmap (4:00)

  • “align” (9:00)

    • Can make any line straight by using the “Shift” ke