Resources

Bioinformatic Analysis Training

Code locations:

ChIP-seq: /data/khanlab/projects/ChIP_seq/projects/

RNA-seq: /data/khanlab/projects/ChIP_seq/RNA_DATA/RNA_projects/

Section 1: Navigating Biowulf, Pipeline Output, and Unsupervised Correlation

Video 1.1: What is Biowulf? What is the pipeline?

ChIPseq Bioinformatic Pipeline Overview (0:00)
Locate your data (2:45)
Input metadata into ChIP_seq_samples.xlsx (4:20)
Example of pipeline output (8:15)
Viewing files in IGV (8:50)
Load from ENCODE (10:10)
Choosing a p-value (12:10 and 21:40)
GREAT ontology (14:35)
Viewing summary files to determine total number of peaks (20:15)
Homer motif analysis (23:10)
Identifying mutations from ATACseq data (27:50)

Video 1.2: Summarize peaks

Where to find scripts (1:10)
Logging on to Biowulf and running scripts (2:15)
- in PuTTY, type biowulf2.nih.gov and click “Open”
- answer username and password prompts
Navigating in Biowulf (4:40)
- type: cd /data/khanlab/projects/ChIP_seq/
Code_Builder (8:40)
- ***code_builder created during these videos is saved onto the google drive***
Summarize peaks across samples (10:40)
- Find ChIP_seq_samples: khanlab/projects/ChIP_seq/manage_samples
Using summarized peaks to determine p-value (21:40)
Summary of other analyses that can be done with ATACseq data (28:00)

Video 1.3: Run an unsupervised correlation

Figures produced by unsupervised correlation of ATACseq data (0:00)
What is the folder path for correlations? (0:45)
What files are generated? (1:30)
Code_Builder for correlation (2:30)
Create sample files using RStudio (4:00)
Defining different p-values for your samples for all downstream analysis (14:00)
Run correlation on created sample files using biowulf (30:15)
- ***Ultimately does not work; skip to Video 1.4***
Checking your “queue” on biowulf (31:40)
Output of correlation (38:00)
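
An unsupervised correlation compares samples with one another without using group labels. As a rough illustration only (this is not the lab's Code_Builder script, which the video walks through), the sketch below computes a pairwise Pearson correlation matrix from per-sample signal vectors. The sample names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(signals):
    """Return a nested dict of sample-by-sample Pearson correlations."""
    names = sorted(signals)
    return {a: {b: pearson(signals[a], signals[b]) for b in names} for a in names}

# Hypothetical signal at four shared regions for three samples
signals = {
    "sample_A": [1.0, 2.0, 3.0, 4.0],
    "sample_B": [2.0, 4.0, 6.0, 8.0],
    "sample_C": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(signals)
```

Samples whose signal rises and falls together score near 1, and samples with opposite patterns score near -1. A clustered heatmap of such a matrix is one common way to display the result.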

Video 1.4:

What was done in Video 1.3 (0:00)
Code_builder explanation (1:10)
How to format bed, bam, and sample list files to be Unix compatible (1:30)
Checking the slurm (5:50)
Viewing peaks from generated beds in IGV (11:00)
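
One common reason a list file fails on a Unix system is Windows-style line endings (carriage return plus line feed). Assuming that is the kind of formatting problem meant by "Unix compatible" here, which the outline does not state explicitly, a minimal sketch of the conversion looks like this. The function names are hypothetical and not part of the pipeline.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def convert_file(path):
    """Rewrite a text file in place with Unix line endings."""
    with open(path, "rb") as fh:
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(to_unix_newlines(data))

# Usage (hypothetical file name):
# convert_file("my_sample_list.txt")
```

This applies only to text files such as bed files and sample lists. Bam files are binary and should not be passed through a conversion like this.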

Section 2: Running BCHNV

Video 2.1:

What is BCHNV? (0:00)
Gathering necessary (configuration, run, etc.) files (7:00)
Code_Builder BCHNV (9:20)
Customizing BCHNV run parameters in the shell script and code_builder (11:50)
Editing configuration file (22:20)
Determining color scheme (31:20)
Gathering .bed files (38:00)
Edit permissions (49:00)

Section 3: Gathering, Merging, and Combining Bed Files

Video 3.1: “bed tools”

Using ChIP_seq_samples.xlsx to gather beds (0:00)
Intro to RStudio (2:30)
Gathering .beds with RStudio (6:00)
Merge .beds using bedtools merge, how to count, etc (20:00)
Running script on biowulf (29:00)
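
To make the merging step concrete, the sketch below mimics the default behavior of bedtools merge in plain Python: intervals on the same chromosome that overlap or touch are collapsed into one, and the number of input intervals in each merged region is counted. It is an illustration of the idea, not a replacement for bedtools, and the peak coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (chrom, start, end) intervals.

    Returns a list of (chrom, start, end, count), where count is the
    number of input intervals collapsed into each merged region.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region and increment its count
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + 1)
        else:
            merged.append((chrom, start, end, 1))
    return merged

# Hypothetical peaks pooled from several samples
peaks = [
    ("chr1", 100, 200),
    ("chr1", 150, 250),
    ("chr1", 300, 400),
    ("chr2", 50, 60),
]
merged_peaks = merge_intervals(peaks)
```

Like bedtools merge, the sketch depends on the input being sorted by chromosome and start position, which is why it sorts first.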

Video 3.2: Combining beds with “cat”

How to combine bed files that are not in the ChIPseq_samples excel spreadsheet
Shortcut to combine all files with the same ending in a folder (2:40)
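
The shortcut in this video is a shell one-liner. For readers who prefer a script, the same idea can be sketched in Python: collect every file in a folder that shares an ending and write them into one combined file. The function name, folder, and suffix below are hypothetical.

```python
import glob
import os

def concatenate_by_suffix(folder, suffix, out_path):
    """Concatenate all files in `folder` ending with `suffix` into `out_path`.

    Returns the list of input files used, in sorted order.
    """
    paths = sorted(glob.glob(os.path.join(folder, "*" + suffix)))
    # Never read the output file as one of the inputs
    out_abs = os.path.abspath(out_path)
    paths = [p for p in paths if os.path.abspath(p) != out_abs]
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as fh:
                for line in fh:
                    # Guard against a missing final newline gluing two records
                    out.write(line if line.endswith("\n") else line + "\n")
    return paths

# Usage (hypothetical folder and suffix):
# concatenate_by_suffix("my_beds", ".bed", "combined_peaks.txt")
```

A combined bed file generally needs to be sorted before it is merged.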

Section 4: Downloading Data from Gene Expression Omnibus (GEO)

Video 4.1: RNA data

Downloading RNA data using ChIPstack and the ChIPseq pipeline (0:00)
- https://www.ncbi.nlm.nih.gov/geo/

Video 4.2: ChIPseq or ATACseq data

Downloading ChIP or ATAC data using ChIPstack and the ChIPseq pipeline (0:00)
- https://www.ncbi.nlm.nih.gov/geo/

Section 5: Processing and Analyzing RNAseq Data (NOT main Khan Lab Pipeline)

Video 5.1:

Overview on pipeline output (0:00)
RNAseq Code Builder (3:40)
Putting RNA samples in ChIP_seq_samples.xlsx (6:45)

Video 5.2:

Review of RNAseq pipeline outputs (0:00)
Locating R scripts (1:40)
TPM vs. FPKM (4:30)
- https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/
Viewing values in R (7:55)
Build Matrix (11:00)
Make a gene expression heat map (13:40)
Create a comparison scatter plot (16:20)
GSEA rank list maker (24:30)
Download GSEA (39:40)
- software.broadinstitute.org/gsea/login.jsp
Heat map for smaller gene sets. (48:00)
GSEA Output (57:55)
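
For the TPM vs. FPKM topic in this video, a small worked example may help. FPKM divides each gene's count by the sample's total count (in millions) and the gene length (in kilobases). TPM divides by gene length first and then rescales so that every sample sums to one million, which makes values easier to compare across samples. The sketch below uses hypothetical counts and gene lengths and is not taken from the lab's R scripts.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {g: counts[g] / lengths_bp[g] for g in counts}
    total_rate = sum(rate.values())
    return {g: rate[g] * 1e6 / total_rate for g in counts}

# Hypothetical fragment counts and gene lengths (bp) for one sample
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

fpkm_values = fpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

In this example geneA and geneB get the same value under either measure, because geneB has three times the counts but is also three times as long. Only the TPM values are guaranteed to sum to one million.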

Video 5.3:

Where to find/What is RNA_projects (0:00)
- Notes:
  - Clear environment in R = “Ctrl” + “Shift” + “F10”
  - View a “value” in R = start typing the name, “up arrow”, “enter”
Create TPM Matrix for RNAseq data (1:45)
Inversion grep (14:09)
- “invert=T”
Visualize with a heatmap (18:10)
Install “pheatmap” (19:08)
Heatmap with only genes of interest (23:20)
- Change subset, meaning select columns from matrix (25:30)
Confirming expression level with IGV (29:20)
- VERY important when using Access RNAseq data
- Look for 5’UTRs and long exons without probes
Identify housekeeping genes (34:00)
- K:/projects/ChIP_seq/RNA_DATA/RNA_projects/Genesets/Qlucore_format/HouseKeeping_genes.txt
- From Cancer Discovery paper… a compilation of a variety of normal tissues
Define housekeeping genes within your own sample set (39:00)
Use a list to compare 2 gene sets (44:00)
Transcription Factors gene set - Based on the Broad’s + Berkeley’s additions (49:00)
- K:/projects/ChIP_seq/RNA_DATA/RNA_projects/Genesets/Qlucore_format/TFs_Epimachines.genelist.txt
Find specialized TFs for your sample set (52:30)
Edit heat map’s scale (1:08:15)
Overview of TF identification strategy (1:13:00)
Video summary/where to find files (1:19:40)
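
The “invert=T” option covered under inversion grep keeps the entries that do not match a pattern. As an illustration of the same logic outside R, here is a minimal Python sketch. The gene names and the pattern are hypothetical examples, not the ones used in the video.

```python
import re

def grep(pattern, names, invert=False):
    """Return names matching `pattern`, or the non-matching ones if invert=True."""
    rx = re.compile(pattern)
    return [n for n in names if bool(rx.search(n)) != invert]

# Hypothetical row names from an expression matrix
genes = ["MYOD1", "MT-CO1", "PAX3", "MT-ND1"]

kept = grep("^MT-", genes, invert=True)      # drop names starting with "MT-"
matched = grep("^MT-", genes)                # the names that were dropped
```

Filtering row names this way and then subsetting the matrix with the kept names is one way to exclude a class of genes before plotting a heatmap.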

Video 5.4 (Continuation of Video 5.3):

Load in second set of samples (1:00)
Identify maximum in a matrix (3:30)
List out genes in a defined gene set (9:00)
Explanation of how we could theoretically ignore UTRs and compare multiple RNAseq data types (20:00)
What you can compare using Access RNAseq data (21:40)
Ideas of how to look at your data after you’ve filtered out multiple gene lists (26:00)
Using Microsoft Excel to look at your data, run a t-test, apply a log scale, etc. (36:30)

Section 6: ChIPseq qPCR, Pipeline, and Analysis

Video 6.1: Designing ChIP qPCR Primers

Rationale on where to design primers (0:20)
Visualizing data in IGV (3:00)
Choosing locations (8:00)
“Designing Primers for ChIP.docx” (10:00)
- follow links and instructions in file
Go to www.idtdna.com/pages
- use Young’s account info, and tell him you have primers to order
- 25 nmole DNA Oligo with Standard Desalting
Add primers to qPCR_primers_v2.bed using genome.ucsc.edu (18:30)
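
When copying coordinates from the UCSC Genome Browser into a bed file, note that the browser's position box shows 1-based, inclusive coordinates, while BED uses a 0-based start and an exclusive end. The sketch below shows the conversion. The four-column layout (chromosome, start, end, name) and the primer name are assumptions, since the actual columns of qPCR_primers_v2.bed are not described here.

```python
def bed_line(chrom, start_1based, end_1based, name):
    """Format a BED line from 1-based, inclusive browser coordinates.

    BED start is 0-based, so subtract 1; BED end is exclusive, so the
    1-based inclusive end is used unchanged.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return f"{chrom}\t{start_1based - 1}\t{end_1based}\t{name}"

# Hypothetical amplicon at chr1:1,001-1,100
line = bed_line("chr1", 1001, 1100, "example_primer_pair")
```

The length of the region is then end minus start in BED coordinates, here 100 bp.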

Video 6.2: How to run ChIP qPCR

Where do I find the files and which files are there? (1:00)
Setting up the plate in excel (3:00)
Run qPCR following the protocol in “ChIP-qPCR ViiA7 protocol.docx” (12:00)

Video 6.3: Auto-launching ChIPseq Pipeline

See “Video 1.1” for a more detailed overview of the pipeline output
Input metadata (0:10)
- preferably before sequencing run
Importance of file naming (3:00)
How to manually launch the pipeline (7:30)
- SEE YOUNG OR JUN FIRST! (not for a normal situation)

Video 6.4: Gathering Homer Motifs and Summarizing Enhancers

Where to find files and creating input file (1:00)
Viewing all motif data based on all called peaks in excel (3:50)
- ***Note: mistake made in sorting, corrected at (16:00)***
Look at homerMotifs.R to see scripts that you can use to manipulate the data (7:00)
- can upload the created matrix into R and use pheatmap
Correlating to Coltron Output (13:30)
- “DEGREE_Table”
- Use to tease out which TFs are likely more significant
  - what is expressed?
  - what has super enhancers?

Video 6.5: Compare motif enrichment across various groups of samples

R script = gatherBCHN_motifs.R (0:15)
- Gather files (7:00) and grep
- Make matrix (11:00)
- Heatmap data (13:30)
- Filter for highly enriched motifs (15:00)
“Wide format”? (11:30)
Using generated “allmotifs.wide.text” file with RNAseq data (21:50)
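
“Wide format” means one row per motif and one column per sample, as opposed to “long format” with one row per sample and motif pair. The sketch below illustrates the reshaping in Python. The record layout, sample names, motif names, and values are hypothetical and are not taken from gatherBCHN_motifs.R.

```python
def to_wide(records, fill=0.0):
    """Pivot (sample, motif, value) records into a motif-by-sample table.

    Returns (samples, table), where table[motif][sample] holds the value
    and missing combinations are filled with `fill`.
    """
    samples = sorted({s for s, _, _ in records})
    motifs = sorted({m for _, m, _ in records})
    table = {m: {s: fill for s in samples} for m in motifs}
    for sample, motif, value in records:
        table[motif][sample] = value
    return samples, table

# Hypothetical long-format enrichment scores
long_records = [
    ("sample_A", "MOTIF_1", 12.5),
    ("sample_A", "MOTIF_2", 3.0),
    ("sample_B", "MOTIF_1", 8.0),
]
samples, wide = to_wide(long_records)
```

A wide table like this can be used directly as a heatmap matrix, or joined by name to an RNAseq expression matrix.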

Section 7: Making and Editing Figures

Video 7.1: Figures for Western Blots in Adobe Illustrator

“masking” Illustrator (0:00)
- object > clipping mask > make
- double click inside the box to see inside mask (9:50)
“rectangle tool” (3:00)
“rotation” of a bitmap (4:00)
“align” (9:00)
- Can make any line straight by using the “Shift” key
