scientific-skills/deeptools/references/workflows.md
This document provides complete workflow examples for common deepTools analyses.
Complete quality control assessment for ChIP-seq experiments.
Compare replicates and samples to verify experimental quality:
# Generate coverage matrix across genome
multiBamSummary bins \
--bamfiles Input1.bam Input2.bam ChIP1.bam ChIP2.bam \
--labels Input_rep1 Input_rep2 ChIP_rep1 ChIP_rep2 \
-o readCounts.npz \
--numberOfProcessors 8
# Create correlation heatmap
plotCorrelation \
-in readCounts.npz \
--corMethod pearson \
--whatToShow heatmap \
--plotFile correlation_heatmap.png \
--plotNumbers
# Generate PCA plot
plotPCA \
-in readCounts.npz \
-o PCA_plot.png \
-T "PCA of ChIP-seq samples"
Expected Results:
# Check sequencing depth and coverage
plotCoverage \
--bamfiles Input1.bam ChIP1.bam ChIP2.bam \
--labels Input ChIP_rep1 ChIP_rep2 \
--plotFile coverage.png \
--ignoreDuplicates \
--numberOfProcessors 8
Interpretation: Assess whether sequencing depth is adequate for downstream analysis.
# Verify expected fragment sizes
bamPEFragmentSize \
--bamfiles Input1.bam ChIP1.bam ChIP2.bam \
--histogram fragmentSizes.png \
--plotTitle "Fragment Size Distribution"
Expected Results: Fragment sizes should match library preparation protocols (typically 200-600bp for ChIP-seq).
# Compute GC bias
computeGCBias \
--bamfile ChIP1.bam \
--effectiveGenomeSize 2913022398 \
--genome genome.2bit \
--fragmentLength 200 \
--biasPlot GCbias.png \
--frequenciesFile freq.txt
# If bias detected, correct it
correctGCBias \
--bamfile ChIP1.bam \
--effectiveGenomeSize 2913022398 \
--genome genome.2bit \
--GCbiasFrequenciesFile freq.txt \
--correctedFile ChIP1_GCcorrected.bam
Note: Only correct if significant bias is observed. Do NOT use --ignoreDuplicates with GC-corrected files.
# Evaluate ChIP enrichment quality
plotFingerprint \
--bamfiles Input1.bam ChIP1.bam ChIP2.bam \
--labels Input ChIP_rep1 ChIP_rep2 \
--plotFile fingerprint.png \
--extendReads 200 \
--ignoreDuplicates \
--numberOfProcessors 8 \
--outQualityMetrics fingerprint_metrics.txt
Interpretation:
Complete workflow from BAM files to publication-quality visualizations.
# Input control
bamCoverage \
--bam Input.bam \
--outFileName Input_coverage.bw \
--normalizeUsing RPGC \
--effectiveGenomeSize 2913022398 \
--binSize 10 \
--extendReads 200 \
--ignoreDuplicates \
--numberOfProcessors 8
# ChIP sample
bamCoverage \
--bam ChIP.bam \
--outFileName ChIP_coverage.bw \
--normalizeUsing RPGC \
--effectiveGenomeSize 2913022398 \
--binSize 10 \
--extendReads 200 \
--ignoreDuplicates \
--numberOfProcessors 8
# Compare ChIP to Input
bamCompare \
--bamfile1 ChIP.bam \
--bamfile2 Input.bam \
--outFileName ChIP_vs_Input_log2ratio.bw \
--operation log2 \
--scaleFactorsMethod readCount \
--binSize 10 \
--extendReads 200 \
--ignoreDuplicates \
--numberOfProcessors 8
Result: Log2 ratio track showing enrichment (positive values) and depletion (negative values).
# Prepare data for heatmap/profile around transcription start sites
computeMatrix reference-point \
--referencePoint TSS \
--scoreFileName ChIP_coverage.bw \
--regionsFileName genes.bed \
--beforeRegionStartLength 3000 \
--afterRegionStartLength 3000 \
--binSize 10 \
--sortRegions descend \
--sortUsing mean \
--outFileName matrix_TSS.gz \
--outFileNameMatrix matrix_TSS.tab \
--numberOfProcessors 8
# Create heatmap around TSS
plotHeatmap \
--matrixFile matrix_TSS.gz \
--outFileName heatmap_TSS.png \
--colorMap RdBu \
--whatToShow 'plot, heatmap and colorbar' \
--zMin -3 --zMax 3 \
--yAxisLabel "Genes" \
--xAxisLabel "Distance from TSS (bp)" \
--refPointLabel "TSS" \
--heatmapHeight 15 \
--kmeans 3
# Create meta-profile around TSS
plotProfile \
--matrixFile matrix_TSS.gz \
--outFileName profile_TSS.png \
--plotType lines \
--perGroup \
--colors blue \
--plotTitle "ChIP-seq signal around TSS" \
--yAxisLabel "Average signal" \
--xAxisLabel "Distance from TSS (bp)" \
--refPointLabel "TSS"
# Calculate enrichment in peak regions
plotEnrichment \
--bamfiles Input.bam ChIP.bam \
--BED peaks.bed \
--labels Input ChIP \
--plotFile enrichment.png \
--outRawCounts enrichment_counts.tab \
--extendReads 200 \
--ignoreDuplicates
Generate strand-specific coverage tracks for RNA-seq data.
bamCoverage \
--bam rnaseq.bam \
--outFileName forward_coverage.bw \
--filterRNAstrand forward \
--normalizeUsing CPM \
--binSize 1 \
--numberOfProcessors 8
bamCoverage \
--bam rnaseq.bam \
--outFileName reverse_coverage.bw \
--filterRNAstrand reverse \
--normalizeUsing CPM \
--binSize 1 \
--numberOfProcessors 8
Important: Do NOT use --extendReads for RNA-seq (would extend over splice junctions).
Compare multiple ChIP-seq samples (e.g., different conditions or time points).
# For each sample
for sample in Control_ChIP Treated_ChIP; do
bamCoverage \
--bam ${sample}.bam \
--outFileName ${sample}.bw \
--normalizeUsing RPGC \
--effectiveGenomeSize 2913022398 \
--binSize 10 \
--extendReads 200 \
--ignoreDuplicates \
--numberOfProcessors 8
done
computeMatrix scale-regions \
--scoreFileName Control_ChIP.bw Treated_ChIP.bw \
--regionsFileName genes.bed \
--beforeRegionStartLength 1000 \
--afterRegionStartLength 1000 \
--regionBodyLength 3000 \
--binSize 10 \
--sortRegions descend \
--sortUsing mean \
--outFileName matrix_multi.gz \
--numberOfProcessors 8
plotHeatmap \
--matrixFile matrix_multi.gz \
--outFileName heatmap_comparison.png \
--colorMap Blues \
--whatToShow 'plot, heatmap and colorbar' \
--samplesLabel Control Treated \
--yAxisLabel "Genes" \
--heatmapHeight 15 \
--kmeans 4
plotProfile \
--matrixFile matrix_multi.gz \
--outFileName profile_comparison.png \
--plotType lines \
--perGroup \
--colors blue red \
--samplesLabel Control Treated \
--plotTitle "ChIP-seq signal comparison" \
--startLabel "TSS" \
--endLabel "TES"
Specialized workflow for ATAC-seq data with Tn5 offset correction.
alignmentSieve \
--bam atacseq.bam \
--outFile atacseq_shifted.bam \
--ATACshift \
--minFragmentLength 38 \
--maxFragmentLength 2000 \
--ignoreDuplicates
bamCoverage \
--bam atacseq_shifted.bam \
--outFileName atacseq_coverage.bw \
--normalizeUsing RPGC \
--effectiveGenomeSize 2913022398 \
--binSize 1 \
--numberOfProcessors 8
bamPEFragmentSize \
--bamfiles atacseq.bam \
--histogram fragmentSizes_atac.png \
--maxFragmentLength 1000
Expected Pattern: Nucleosome ladder with peaks at ~50bp (nucleosome-free), ~200bp (mono-nucleosome), ~400bp (di-nucleosome).
Analyze ChIP-seq signal specifically at peak regions.
computeMatrix reference-point \
--referencePoint center \
--scoreFileName ChIP_coverage.bw \
--regionsFileName peaks.bed \
--beforeRegionStartLength 2000 \
--afterRegionStartLength 2000 \
--binSize 10 \
--outFileName matrix_peaks.gz \
--numberOfProcessors 8
plotHeatmap \
--matrixFile matrix_peaks.gz \
--outFileName heatmap_peaks.png \
--colorMap YlOrRd \
--refPointLabel "Peak Center" \
--heatmapHeight 15 \
--sortUsing max
Solution: Use --region parameter to process chromosomes individually:
bamCoverage --bam input.bam -o chr1.bw --region chr1
Solution: Index BAM files before running deepTools:
samtools index input.bam
Solution: Increase --numberOfProcessors:
# Use 8 cores instead of default
--numberOfProcessors 8
Solution: Increase bin size:
--binSize 50 # or larger (default is 10-50)
--numberOfProcessors to available cores--region for testing or memory-limited environmentsalignmentSieve to create filtered BAM files once, then reuse--region chr1:1-1000000 for testing parameters