Resources

Bioinformatic Analysis Training

Code locations:

   ChIP-seq: /data/khanlab/projects/ChIP_seq/projects/

   RNA-seq: /data/khanlab/projects/ChIP_seq/RNA_DATA/RNA_projects/

Section 1: Navigating Biowulf, Pipeline Output, and Unsupervised Correlation 

Video 1.1: What is biowulf? What is the pipeline? 
  • ChIPseq Bioinformatic Pipeline Overview (0:00)

  • Locate your data (2:45)

  • Input metadata into ChIP_seq_samples.xlsx (4:20)

  • Example of pipeline output (8:15)

  • Viewing files in IGV (8:50)

  • Load from ENCODE (10:10)

  • Choosing a p-value (12:10 and 21:40)

  • GREAT ontology (14:35) 

  • Viewing summary files to determine total number of peaks (20:15)

  • Homer motif analysis (23:10)

  • Identifying mutations from ATACseq data (27:50)

Video 1.2: Summarize peaks 
  • Where to find scripts (1:10)

  • Logging on to biowulf and running scripts (2:15)

    • in PuTTY, enter biowulf2.nih.gov as the host name and click “Open”

    • answer username and password prompts

  • Navigating in biowulf (4:40)

    • type: cd /data/khanlab/projects/ChIP_seq/

  • Code_Builder (8:40)

    • ***The code_builder created during these videos is saved on the Google Drive***

  • Summarize peaks across samples (10:40)

    • Find ChIP_seq_samples: khanlab/projects/ChIP_seq/manage_samples

  • Using summarized peaks to determine p-value (21:40)

  • Summary of other analyses that can be done with ATACseq data (28:00)

Video 1.3: Run an unsupervised correlation 
  • Figures produced by unsupervised correlation of ATACseq data (0:00)

  • What is the folder path for correlations? (0:45)

  • What files are generated? (1:30)

  • Code_Builder for correlation (2:30)

  • Create sample files using RStudio (4:00)

  • Defining different p-values for your samples for all downstream analysis (14:00)

  • Run correlation on created sample files using biowulf (30:15)

    • ***This run ultimately does not work; skip to Video 1.4***

  • Checking your “queue” on biowulf (31:40)

  • Output of correlation (38:00)

Video 1.4: 
  • What was done in Video 1.3 (0:00)

  • Code_builder explanation (1:10)

  • How to format bed, bam, and sample list files to be Unix compatible (1:30)

  • Checking the slurm (5:50)

  • Viewing peaks from generated beds in IGV (11:00)
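
Note on Unix-compatible files: lists saved from Windows programs often carry Windows (CRLF) line endings, which can break scripts on biowulf. The sketch below is one possible way to normalize a file to Unix (LF) endings. It is an illustration, not the method shown in the video, and the function name is hypothetical.

```python
from pathlib import Path


def to_unix_line_endings(path):
    """Rewrite a text file in place so every line ends with LF only."""
    raw = Path(path).read_bytes()
    # Convert Windows (CRLF) endings first, then any remaining lone CR endings.
    fixed = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Unix tools generally expect the last line to end with a newline too.
    if fixed and not fixed.endswith(b"\n"):
        fixed += b"\n"
    Path(path).write_bytes(fixed)
    return fixed != raw  # True if the file had to be changed
```

Run it on each bed, bam, or sample list file before submitting a job; a return value of True means the file was rewritten.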

Section 2: Running BCHNV 

Video 2.1: 
  • What is BCHNV? (0:00)

  • Gathering necessary (configuration, run, etc.) files (7:00)

  • Code_Builder BCHNV (9:20)

  • Customizing BCHNV run parameters in the shell script and code_builder (11:50)

  • Editing configuration file (22:20)

  • Determining color scheme (31:20)

  • Gathering .bed files (38:00)

  • Edit permissions (49:00)

Section 3: Gathering, Merging, and Combining Bed Files

Video 3.1: “bed tools” 

  • Using ChIP_seq_samples.xlsx to gather beds (0:00)

  • Intro to RStudio (2:30)

  • Gathering .beds with RStudio (6:00)

  • Merge .beds using bedtools merge, how to count, etc (20:00)

  • Running script on biowulf (29:00)
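
Note on merging: with default settings, bedtools merge combines overlapping or directly adjacent intervals on sorted input and can report how many intervals went into each merged region. The pure-Python sketch below only illustrates that idea on a small list; it is not the lab script, and the function name and example coordinates are made up.

```python
def merge_bed_intervals(intervals):
    """Merge overlapping or directly adjacent (chrom, start, end) intervals.

    Returns (chrom, start, end, count) tuples, where count is the number of
    input intervals that were folded into each merged region.
    """
    merged = []
    # Sort by chromosome name, then start, so neighbors can be compared in order.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous region: extend it and count it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end), prev[3] + 1)
        else:
            merged.append((chrom, start, end, 1))
    return merged


peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
for region in merge_bed_intervals(peaks):
    print(*region, sep="\t")
```

For real data, use bedtools on biowulf as shown in the video; this sketch is only meant to show what "merge" and "count" are doing.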

Video 3.2: Combining beds with “cat” 
  • How to combine bed files that are not in the ChIP_seq_samples.xlsx spreadsheet

  • Shortcut to combine all files with the same ending in a folder (2:40)
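
Note on combining files: joining every file with the same ending in a folder amounts to writing their contents one after another into a single output file. A minimal sketch of that idea follows, assuming plain-text bed files; the function name, folder, and file names are hypothetical.

```python
from pathlib import Path


def combine_files(folder, suffix, output_path):
    """Concatenate every file in `folder` whose name ends with `suffix`."""
    output_path = Path(output_path)
    # Sort for a reproducible order, and skip the output file itself
    # in case it is written into the same folder.
    inputs = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.name.endswith(suffix)
        and p.resolve() != output_path.resolve()
    )
    with open(output_path, "wb") as out:
        for path in inputs:
            out.write(path.read_bytes())
    return [p.name for p in inputs]  # names of the files that were combined
```

The combined file is not sorted or merged, so overlapping peaks from different samples remain as separate lines.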

Section 4: Downloading Data from Gene Expression Omnibus (GEO)

Video 4.1: RNA data
Video 4.2: ChIPseq or ATACseq data 

Section 5: Processing and Analyzing RNAseq Data (NOT main Khan Lab Pipeline)

Video 5.1: 
  • Overview on pipeline output (0:00)

  • RNAseq Code Builder (3:40)

  • Putting RNA samples in ChIP_seq_samples.xlsx (6:45)

Video 5.2: 
  • Review of RNAseq pipeline outputs (0:00)

  • Locating R scripts (1:40)

  • TPM vs. FPKM (4:30)

  • Viewing values in R (7:55)

  • Build Matrix (11:00)

  • Make a gene expression heat map (13:40)

  • Create a comparison scatter plot (16:20)

  • GSEA rank list maker (24:30)

  • Download GSEA (39:40)

    • software.broadinstitute.org/gsea/login.jsp

  • Heat map for smaller gene sets (48:00)

  • GSEA Output (57:55)
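
Note on TPM vs. FPKM: both normalize read counts for gene length and sequencing depth, but in a different order, so TPM values always sum to one million per sample while FPKM totals vary between samples. The sketch below uses the standard textbook definitions with made-up counts and lengths; it is not taken from the pipeline's R scripts.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million).
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


# Hypothetical example: three genes in one sample.
counts = [100, 200, 300]
lengths = [1000, 4000, 1500]
print("FPKM:", [round(v, 1) for v in fpkm(counts, lengths)])
print("TPM: ", [round(v, 1) for v in tpm(counts, lengths)])
```

Rescaling a sample's FPKM values so that they sum to one million gives that sample's TPM values.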

Video 5.3: 
  • Where to find/What is RNA_projects (0:00)

    • Notes: 

      • Clear environment in R = “Ctrl” + “Shift” + “F10”

      • View a “value” in R = start typing the name, “up arrow”, “enter”

  • Create TPM Matrix for RNAseq data (1:45)

  • Inversion grep (14:09)

    • “invert=T” 

  • Visualize with a heatmap (18:10)

  • Install “pheatmap” (19:08)

  • Heatmap with only genes of interest (23:20)

    • Change subset, meaning select columns from matrix (25:30)

  • Confirming expression level with IGV (29:20)

    • VERY important when using Access RNAseq data

    • Look for 5’UTRs and long exons without probes 

  • Identify housekeeping genes (34:00)

    • K:/projects/ChIP_seq/RNA_DATA/RNA_projects/Genesets/Qlucore_format/HouseKeeping_genes.txt 

    • From Cancer Discovery paper… a compilation of a variety of normal tissues

  • Define housekeeping genes within your own sample set (39:00)

  • Use a list to compare 2 gene sets (44:00)

  • Transcription Factors gene set - Based on the Broad’s + Berkeley’s additions (49:00)

    • K:/projects/ChIP_seq/RNA_DATA/RNA_projects/Genesets/Qlucore_format/TFs_Epimachines.genelist.txt 

  • Find specialized TFs for your sample set (52:30) 

  • Edit heat map’s scale (1:08:15)

  • Overview of TF identification strategy (1:13:00)

  • Video summary/where to find files (1:19:40) 
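
Note on inversion grep: in R, grep with invert=T returns the entries that do NOT match the pattern, which is handy for dropping unwanted samples or columns. The sketch below shows the same idea in Python for readers more familiar with it. It returns the values themselves (like R's value=TRUE) rather than positions, and the sample names are made up.

```python
import re


def grep(pattern, values, invert=False):
    """Return values matching `pattern`, or the non-matching ones if invert=True."""
    regex = re.compile(pattern)
    # A value is kept when "it matched" differs from "invert was requested".
    return [v for v in values if bool(regex.search(v)) != invert]


# Hypothetical sample names.
names = ["RH4_RNA", "RH4_input", "RD_RNA", "CTR_input"]
print(grep("input", names))               # samples whose name contains "input"
print(grep("input", names, invert=True))  # everything else
```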

Video 5.4 (Continuation of Video 5.3): 
  • Load in second set of samples (1:00)

  • Identify maximum in a matrix (3:30)

  • List out genes in a defined gene set (9:00)

  • Explanation of how we could theoretically ignore UTRs and compare multiple RNAseq data types (20:00)

  • What you can compare using Access RNAseq data (21:40)

  • Ideas of how to look at your data after you’ve filtered out multiple gene lists (26:00)

  • Using Microsoft Excel to look at your data and run a t-test, apply a log scale, etc. (36:30)

Section 6: ChIPseq qPCR, Pipeline, and Analysis 

Video 6.1: Designing ChIP qPCR Primers
  • Rationale on where to design primers (0:20)

  • Visualizing data in IGV (3:00)

  • Choosing locations (8:00)

  • “Designing Primers for ChIP.docx” (10:00)

    • follow links and instructions in file

  • Go to www.idtdna.com/pages 

    • use Young’s account info, and tell him you have primers to order

    • 25 nmole DNA Oligo with Standard Desalting

  • Add primers to qPCR_primers_v2.bed using genome.ucsc.edu (18:30)

Video 6.2: How to run ChIP qPCR
  • Where do I find the files and which files are there? (1:00)

  • Setting up the plate in excel (3:00)

  • Run qPCR following the protocol in “ChIP-qPCR ViiA7 protocol.docx” (12:00)

Video 6.3: Auto-launching ChIPseq Pipeline 
  • See “Video 1.1” for a more detailed overview of the pipeline output

  • Input metadata (0:10)

    • preferably before sequencing run

  • Importance of file naming (3:00)

  • How to manually launch the pipeline (7:30)

    • SEE YOUNG OR JUN FIRST! (not for a normal situation)

Video 6.4: Gathering Homer Motifs and Summarizing Enhancers
  • Where to find files and creating input file (1:00) 

  • Viewing all motif data based on all called peaks in excel (3:50)

    • ***Note: a mistake is made in sorting here; it is corrected at 16:00***

  • Look at homerMotifs.R to see scripts that you can use to manipulate the data (7:00)

    • can upload the created matrix into R and use pheatmap 

  • Correlating to Coltron Output (13:30) 

    • “DEGREE_Table” 

    • Use to tease out which TFs are likely more significant 

      • what is expressed? 

      • what has super enhancers? 

Video 6.5: Compare motif enrichment across various groups of samples
  • R script = gatherBCHN_motifs.R (0:15)

    • Gather files (7:00) and grep

    • Make matrix (11:00)

    • Heatmap data (13:30)

    • Filter for highly enriched motifs (15:00)

  • “Wide format”? (11:30)

  • Using generated “allmotifs.wide.text” file with RNAseq data (21:50)
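
Note on wide format: in general, a "long" table has one row per sample and motif pair, while a "wide" table has one row per motif and one column per sample, which is the shape a heatmap expects. The sketch below illustrates that reshaping in plain Python. The (sample, motif, value) layout, the fill value, and the names are assumptions for illustration, not the actual contents of the motif files.

```python
def long_to_wide(rows, fill=0.0):
    """Pivot (sample, motif, value) rows into one row per motif.

    Returns (header, table): header is ["motif", sample1, sample2, ...] and
    each table row is [motif, value_for_sample1, value_for_sample2, ...].
    """
    samples = sorted({sample for sample, _, _ in rows})
    by_motif = {}
    for sample, motif, value in rows:
        by_motif.setdefault(motif, {})[sample] = value
    # Motifs missing from a sample get the fill value (an assumption).
    table = [
        [motif] + [by_motif[motif].get(s, fill) for s in samples]
        for motif in sorted(by_motif)
    ]
    return ["motif"] + samples, table


# Hypothetical enrichment scores.
long_rows = [("A", "MYOD", 5.0), ("B", "MYOD", 1.0), ("A", "SOX", 2.0)]
header, table = long_to_wide(long_rows)
print(header)
for row in table:
    print(row)
```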

Section 7: Making and Editing Figures 

Video 7.1: Figures for Western Blots in Adobe Illustrator 
  • “masking” Illustrator (0:00)

    • object > clipping mask > make 

    • double click inside the box to see inside mask (9:50)

  • “rectangle tool” (3:00)

  • “rotation” of a bitmap (4:00)

  • “align” (9:00)

    • Can make any line straight by using the “Shift” key