Peng Jiang, Ph.D.
- Center for Cancer Research
- National Cancer Institute
- Building 41, Room A100D
- Bethesda, MD, 20892
Dr. Jiang's research focuses on developing integrative frameworks that leverage public big-data resources to identify regulators of cancer therapy resistance. A general challenge in cancer research is the lack of data for understanding the clinical efficacy of each treatment, even as new drugs with distinct mechanisms of action are approved every year. To fill this gap, we are developing statistical and machine learning infrastructures that transfer knowledge from the large body of previous data cohorts to the study of new cancer biology problems.
Areas of Expertise
1) big data integration, 2) cancer genomics, 3) machine learning, 4) biostatistics, 5) precision medicine, 6) cancer immunotherapy
For most anticancer drugs, we lack both precise rules for response prediction and a mechanistic understanding of therapy resistance. Moreover, new drugs with distinct mechanisms of action are approved every year, but it takes many years to accumulate clinical data. This creates a significant gap between our current ability and the goal of cancer precision medicine. Our vision is that data integration, leveraging the ever-growing volume of public data, is a cost-effective way to fill this gap. Many statistical and machine learning methods can transfer knowledge from previous data to the study of a new problem. Therefore, the general theme of our research is to develop infrastructures that transfer knowledge from big data to inform cancer therapy decisions.
The specific focus of our current work is how to use both genomics and imaging data to identify new regulators of cancer immune evasion. In the first direction, we study how to predict immune evasion regulators by leveraging the large collection of functional genomics datasets and the spatial transcriptomics data produced by recent technological progress. In the second direction, we develop machine learning infrastructures for feature selection in imaging data to understand how spatial interactions among different cells determine the anticancer immune response. Our deliverables are infrastructures that enable users to leverage public data resources to find immune evasion mechanisms in their own clinical studies.
A description of my research before joining NCI is available at https://scholar.harvard.edu/pengjiang/research
Dr. Peng Jiang started his research program at the National Cancer Institute (NCI) in July 2019. His lab focuses on developing big-data and artificial intelligence frameworks to identify biomarkers and new therapeutic approaches for cancer immunotherapies in solid tumors. Before joining NCI, he completed his postdoctoral training at the Dana-Farber Cancer Institute and Harvard University. During his postdoctoral research, he developed computational frameworks that repurposed public domain data to identify biomarkers and regulators of cancer immunotherapy resistance. Notably, his computational model TIDE revealed that cancer cells could utilize the self-protection strategy of cytotoxic lymphocytes to resist lymphocyte killing under immune checkpoint blockade. Dr. Jiang received his Ph.D. from the Department of Computer Science & Lewis-Sigler Genomics Institute at Princeton University, and completed his undergraduate study with the highest national honors at the Department of Computer Science at Tsinghua University (GPA rank 1st in his year). He is a recipient of the NCI K99 Pathway to Independence Award, the Scholar-In-Training Award of the American Association for Cancer Research, and the Technology Innovation Award of the Cancer Research Institute.
There are no open positions at this time. Check back again later, or take a look at CCR's Careers page.
Large-scale Data Integration
FDC (Framework for Data Curation)
The Framework for Data Curation (FDC, Jiang et al., Nature Methods 2021) allows researchers to annotate the meta-information of datasets in the GEO and ArrayExpress databases so that the datasets can be analyzed automatically by algorithms. Focusing on a research topic, users upload a query result, comprising a list of dataset IDs, downloaded from the GEO and ArrayExpress databases. The server then downloads the meta-information for the uploaded dataset IDs, and curators annotate it according to a set of pre-defined schemes. The annotated sample information is combined with the processed data matrices from the GEO and ArrayExpress databases to enable algorithmic analysis.
Cancer Therapy Response and Resistance
TRES (Tumor-Resilient T cell)
Despite breakthroughs in cancer immunotherapy, most T cells reactive to tumor targets cannot persist in immunosuppressive solid tumors. Identifying the molecular programs that allow T cells to sustain effective antitumor immunity is a central question in cancer research. We developed a computational framework named the tumor-resilient T cell (Tres) model. Tres uses single-cell transcriptomic data from solid tumors to identify signatures of T cells that are resilient to immunosuppressive signals, including TGF-beta, TRAIL, and PGE2. By analyzing single-cell data cohorts, the Tres model can predict the clinical efficacy of T cells in immune checkpoint blockade and adoptive cell transfer.
TIDE (Tumor Immune Dysfunction and Exclusion)
TIDE is an infrastructure with several modules to assist cancer immunotherapy applications and research (Jiang et al., Nature Medicine, 2018). The first component is a gene expression biomarker that predicts the clinical response to immune checkpoint blockade. The input is a gene expression profile of a cancer sample, measured genome-wide by RNA-Seq or on a gene panel by NanoString. The output is a likelihood score of therapy response or resistance. The second component provides gene query functions for the associations of gene activity with T-cell dysfunction and immunotherapy response. The input is a gene name. The output is the set of associations between gene activity and cancer immune evasion potential, computed from a large number of datasets from human clinical studies and pre-clinical models.
CARE (Computational Analysis of REsistance)
CARE is a software tool developed to identify genome-scale biomarkers of targeted therapy response using compound screen data (Jiang et al., Cell Systems 2018). For each gene, the CARE score indicates the association between its molecular alteration and drug efficacy. A positive score indicates that a higher expression value (or the presence of a mutation) is associated with drug response, while a negative score indicates an association with drug resistance. For each drug, the CARE score vector across genes can serve as the signature of a good responder. Patients are predicted as responders or non-responders depending on the Pearson correlation between the gene expression profile of their cancer sample and the CARE score vector. You can search the results on CCLE and CTRP datasets here. Please use the auto-completed name when available.
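The correlation-based prediction described above can be illustrated with a minimal Python sketch. This is not the CARE implementation: the function names, the gene names, the score values, and the decision threshold of 0 are all hypothetical and chosen only to show the idea of comparing a sample's expression profile against a drug's score vector over their shared genes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def predict_response(expression, score_vector, threshold=0.0):
    """Correlate a sample's expression with a drug's per-gene score vector.

    Both arguments are dicts mapping gene name -> value. Only genes present
    in both are used. The threshold is an illustrative assumption, not a
    value taken from the CARE paper.
    """
    genes = sorted(set(expression) & set(score_vector))
    r = pearson([expression[g] for g in genes],
                [score_vector[g] for g in genes])
    label = "responder" if r > threshold else "non-responder"
    return r, label


# Hypothetical per-gene scores for one drug and two hypothetical samples.
drug_scores = {"GENE_A": 2.0, "GENE_B": -1.0, "GENE_C": 0.5}
sample_1 = {"GENE_A": 3.0, "GENE_B": -2.0, "GENE_C": 1.0}
sample_2 = {"GENE_A": -3.0, "GENE_B": 2.0, "GENE_C": -1.0}

for name, sample in [("sample_1", sample_1), ("sample_2", sample_2)]:
    r, label = predict_response(sample, drug_scores)
    print(f"{name}: r = {r:.3f}, predicted {label}")
```

A sample whose expression rises and falls with the drug's scores gets a positive correlation and is labeled a responder in this sketch; a sample with the opposite pattern gets a negative correlation. In practice the cutoff would need to be calibrated on data rather than fixed at 0.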
Biological Network Analysis
CytoSig (Cytokine Signaling Analyzer)
The Cytokine Signaling Analyzer (CytoSig, Jiang et al., Nature Methods 2021) platform helps biologists study the cellular response to signaling molecules (e.g., cytokines, chemokines, and growth factors) by leveraging public expression data from treatment experiments deposited in the NCBI GEO and ArrayExpress databases. You can query a signal and analyze the genes it induces or represses (SEARCH module). You can also input a gene expression profile and analyze which signals are enriched, based on the collected treatment response profiles (RUN module).
NEST (Network Essentiality Scoring Tool)
NEST is designed to predict gene essentiality based on protein interaction networks and gene expression or epigenetic profiles (Jiang et al., Genome Biology 2015). NEST can also be used to enhance the quality of CRISPR or shRNA screen results.
RABIT (Regression Analysis with Background InTegration)
RABIT is an efficient feature selection algorithm (Jiang et al., PNAS 2015). We applied RABIT to find the regulators that shape tumor-specific gene expression patterns. Such a regulator could be a transcription factor or an RNA binding protein. Beyond this application, you can use RABIT as a general algorithm for feature selection.
SPICi (Speed and Performance In ClusterIng)
SPICi is a fast local network clustering algorithm (Jiang et al., Bioinformatics 2010). SPICi runs in time O(V log V + E) and space O(E), where V and E are the numbers of vertices and edges in the network. It also achieves state-of-the-art performance with respect to the quality of the clusters it uncovers.
CCAT (Combinatorial Code Analysis Tool)
CCAT is a software package for predicting genome-wide co-binding between biological regulators such as transcription factors (TFs) (Jiang et al., Nucleic Acids Res 2014) or RNA binding proteins (RBPs) (Jiang et al., PLoS Comput Biol 2013). The CCAT package also includes accompanying tools to group similar position weight matrices (PWMs) of different TFs or RBPs into clusters, and to search multiple genome alignments for conserved motif instances of PWMs.