scientific-skills/deeptools/references/tools_reference.md
This document provides a comprehensive reference for all deepTools command-line utilities organized by category.
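multiBamSummary computes read coverage for genomic regions across multiple BAM files and writes a compressed numpy array for downstream correlation and PCA analysis.
Modes:
- bins: Genome-wide comparison using consecutive windows of --binSize bases
- BED-file: Comparison restricted to the regions given with --BED
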
Key Parameters:
- --bamfiles, -b: Indexed BAM files (space-separated, required)
- --outFileName, -o: Output coverage matrix file (required)
- --BED: Region specification file (BED-file mode only)
- --binSize: Window size in bases (default: 10,000)
- --labels: Custom sample identifiers
- --minMappingQuality: Quality threshold for read inclusion
- --numberOfProcessors, -p: Parallel processing cores
- --extendReads: Fragment size extension
- --ignoreDuplicates: Remove PCR duplicates
- --outRawCounts: Export tab-delimited file with coordinate columns and per-sample counts

Output: Compressed numpy array (.npz) for plotCorrelation and plotPCA

Common Usage:
# Genome-wide comparison
multiBamSummary bins --bamfiles sample1.bam sample2.bam -o results.npz
# Peak region comparison
multiBamSummary BED-file --BED peaks.bed --bamfiles sample1.bam sample2.bam -o results.npz
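
The table written by --outRawCounts can be checked with standard tools before moving on to plotCorrelation or plotPCA. The sketch below assumes a header line followed by tab-separated chr, start, end, and one count column per BAM file; the temporary file, the helper name `summarize_counts`, and the numbers are made up for illustration and stand in for real multiBamSummary output.

```shell
# Hypothetical stand-in for a file written by: multiBamSummary ... --outRawCounts raw_counts.tab
# Assumed layout: header line, then chr, start, end, and one count column per BAM file.
counts_file="$(mktemp)"
printf "#'chr'\t'start'\t'end'\t'sample1.bam'\t'sample2.bam'\n" >  "$counts_file"
printf "chr1\t0\t10000\t12\t30\n"                               >> "$counts_file"
printf "chr1\t10000\t20000\t0\t0\n"                             >> "$counts_file"
printf "chr1\t20000\t30000\t8\t10\n"                            >> "$counts_file"

# Sum each sample column and count bins in which every sample is zero.
summarize_counts() {
    awk -F'\t' '
        NR == 1 { next }                      # skip the header line
        {
            all_zero = 1
            for (i = 4; i <= NF; i++) {
                total[i] += $i
                if ($i != 0) all_zero = 0
            }
            if (all_zero) empty++
            ncol = NF
        }
        END {
            for (i = 4; i <= ncol; i++) printf "sample%d_total=%d\n", i - 3, total[i]
            printf "all_zero_bins=%d\n", empty
        }
    ' "$1"
}

summarize_counts "$counts_file"
```

A large share of all-zero bins is a hint that --skipZeros may be worth setting in plotCorrelation.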
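multiBigwigSummary is similar to multiBamSummary but operates on bigWig files instead of BAM files. It is used for comparing coverage tracks across samples.
Modes:
- bins: Genome-wide comparison using consecutive fixed-size windows
- BED-file: Comparison restricted to the regions in a BED file
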
Key Parameters: Similar to multiBamSummary but accepts bigWig files
bamCoverage converts a BAM alignment file into a normalized coverage track in bigWig or bedGraph format. Coverage is calculated as the number of reads per bin.
Key Parameters:
- --bam, -b: Input BAM file (required)
- --outFileName, -o: Output filename (required)
- --outFileFormat, -of: Output type (bigwig or bedgraph)
- --normalizeUsing: Normalization method (RPKM, CPM, BPM, RPGC, or None)
- --effectiveGenomeSize: Mappable genome size (required for RPGC)
- --binSize: Resolution in base pairs (default: 50)
- --extendReads, -e: Extend reads to fragment length (recommended for ChIP-seq, NOT for RNA-seq)
- --centerReads: Center reads at fragment length for sharper signals
- --ignoreDuplicates: Count identical reads only once
- --minMappingQuality: Filter reads below quality threshold
- --minFragmentLength / --maxFragmentLength: Fragment length filtering
- --smoothLength: Window averaging for noise reduction
- --MNase: Analyze MNase-seq data for nucleosome positioning
- --Offset: Position-specific offsets (useful for RiboSeq, GROseq)
- --filterRNAstrand: Separate forward/reverse strand reads
- --ignoreForNormalization: Exclude chromosomes from normalization (e.g., sex chromosomes)
- --numberOfProcessors, -p: Parallel processing

Important Notes:
Common Usage:
# Basic coverage with RPKM normalization
bamCoverage --bam input.bam --outFileName coverage.bw --normalizeUsing RPKM
# ChIP-seq with extension
bamCoverage --bam chip.bam --outFileName chip_coverage.bw \
--binSize 10 --extendReads 200 --ignoreDuplicates
# Strand-specific RNA-seq
bamCoverage --bam rnaseq.bam --outFileName forward.bw \
--filterRNAstrand forward
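
Strand-specific tracks usually come in pairs, so a small loop avoids writing the command twice. This is a sketch only: the helper name `strand_commands` and the output naming scheme are assumptions, and the function prints the commands for review instead of running them.

```shell
# Print one bamCoverage command per strand; pipe the output to `sh` to execute.
# rnaseq.bam and the derived output names are placeholders.
strand_commands() {
    bam="$1"
    for strand in forward reverse; do
        echo "bamCoverage --bam $bam --outFileName ${bam%.bam}_${strand}.bw --filterRNAstrand $strand"
    done
}

strand_commands rnaseq.bam
```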
bamCompare compares two BAM files and writes the result as a bigWig or bedGraph file, normalizing for differences in sequencing depth. It processes the genome in equal-sized bins and performs the calculation per bin.
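Comparison Methods (--operation): log2 (default), ratio, subtract, add, mean, reciprocal_ratio, first, second

Normalization Methods (--scaleFactorsMethod): readCount (default), SES, None
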
Key Parameters:
- --bamfile1, -b1: First BAM file (required)
- --bamfile2, -b2: Second BAM file (required)
- --outFileName, -o: Output filename (required)
- --outFileFormat: bigwig or bedgraph
- --operation: Comparison method (see above)
- --scaleFactorsMethod: Normalization method (see above)
- --binSize: Bin width for output (default: 50bp)
- --pseudocount: Avoid division by zero (default: 1)
- --extendReads: Extend reads to fragment length
- --ignoreDuplicates: Count identical reads once
- --minMappingQuality: Quality threshold
- --numberOfProcessors, -p: Parallelization

Common Usage:
# Log2 ratio of treatment vs control
bamCompare -b1 treatment.bam -b2 control.bam -o log2ratio.bw
# Subtract control from treatment
bamCompare -b1 treatment.bam -b2 control.bam -o difference.bw \
--operation subtract --scaleFactorsMethod readCount
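
The role of --pseudocount is easiest to see on single bins. The sketch below is a simplified illustration, not bamCompare's implementation: it assumes the two counts have already been scaled for sequencing depth and adds the same pseudocount to both before taking the log2 ratio. The helper name `log2_ratio` is hypothetical.

```shell
# Per-bin log2 ratio with a pseudocount (simplified illustration).
# Inputs are assumed to be depth-scaled counts; bamCompare applies its scaling first.
log2_ratio() {
    # $1 = treatment count, $2 = control count, $3 = pseudocount (defaults to 1)
    awk -v t="$1" -v c="$2" -v p="${3:-1}" \
        'BEGIN { printf "%.3f\n", log((t + p) / (c + p)) / log(2) }'
}

log2_ratio 7 1     # (7+1)/(1+1) = 4    -> 2.000
log2_ratio 0 0     # (0+1)/(0+1) = 1    -> 0.000
log2_ratio 0 3     # (0+1)/(3+1) = 0.25 -> -2.000
```

Without the pseudocount, the second and third bins would involve a division by zero or the logarithm of zero.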
computeGCBias: Identifies GC-content bias from sequencing and PCR amplification.
correctGCBias: Corrects BAM files for GC bias detected by computeGCBias.
Key Parameters (computeGCBias):
- --bamfile, -b: Input BAM file
- --effectiveGenomeSize: Mappable genome size
- --genome, -g: Reference genome in 2bit format
- --fragmentLength, -l: Fragment length (for single-end)
- --biasPlot: Output diagnostic plot

Key Parameters (correctGCBias):
- --bamfile, -b: Input BAM file
- --effectiveGenomeSize: Mappable genome size
- --genome, -g: Reference genome in 2bit format
- --GCbiasFrequenciesFile: Frequencies from computeGCBias
- --correctedFile, -o: Output corrected BAM

Important: Never use --ignoreDuplicates after GC bias correction
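
The two tools are meant to be run back to back, with the frequencies file from the first step feeding the second. The sketch below prints both commands for review. It assumes computeGCBias writes that file through its own --GCbiasFrequenciesFile option; the helper name `gc_bias_commands`, all file names, the fragment length of 200, and the genome size are placeholders.

```shell
# Print the two-step GC-bias workflow; pipe the output to `sh` to run it.
# File names, fragment length, and genome size are placeholders for your own data.
gc_bias_commands() {
    bam="$1"; genome="$2"; size="$3"
    freq="${bam%.bam}_gc_freq.txt"   # shared between the two steps
    echo "computeGCBias -b $bam --effectiveGenomeSize $size -g $genome -l 200 --GCbiasFrequenciesFile $freq --biasPlot ${bam%.bam}_gc.png"
    echo "correctGCBias -b $bam --effectiveGenomeSize $size -g $genome --GCbiasFrequenciesFile $freq -o ${bam%.bam}_gc_corrected.bam"
}

gc_bias_commands chip.bam genome.2bit 2150570000
```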
alignmentSieve filters BAM files by various quality metrics on the fly. It is useful for creating filtered BAM files for specific analyses.
Key Parameters:
- --bam, -b: Input BAM file
- --outFile, -o: Output BAM file
- --minMappingQuality: Minimum mapping quality
- --ignoreDuplicates: Remove duplicates
- --minFragmentLength / --maxFragmentLength: Fragment length filters
- --samFlagInclude / --samFlagExclude: SAM flag filtering
- --shift: Shift reads (e.g., for ATAC-seq Tn5 correction)
- --ATACshift: Automatically shift for ATAC-seq data

computeMatrix calculates scores per genomic region and prepares matrices for plotHeatmap and plotProfile. It processes bigWig score files and BED/GTF region files.
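Modes:
- reference-point: Signal relative to a single position in each region, such as the TSS (selected with --referencePoint)
- scale-regions: All regions stretched or shrunk to a common body length (-m)
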
Key Parameters:
- -R: Region file(s) in BED/GTF format (required)
- -S: BigWig score file(s) (required)
- -o: Output matrix file (required)
- -b: Upstream distance from reference point
- -a: Downstream distance from reference point
- -m: Region body length (scale-regions only)
- -bs, --binSize: Bin size for averaging scores
- --skipZeros: Skip regions with all zeros
- --minThreshold / --maxThreshold: Filter by signal intensity
- --sortRegions: descend, ascend, keep, no
- --sortUsing: mean, median, max, min, sum, region_length
- -p, --numberOfProcessors: Parallel processing
- --averageTypeBins: Statistical method (mean, median, min, max, sum, std)

Output Options:
- --outFileNameMatrix: Export tab-delimited data
- --outFileSortedRegions: Save filtered/sorted BED file

Common Usage:
# TSS analysis
computeMatrix reference-point -S signal.bw -R genes.bed \
-o matrix.gz -b 2000 -a 2000 --referencePoint TSS
# Scaled gene body
computeMatrix scale-regions -S signal.bw -R genes.bed \
-o matrix.gz -b 1000 -a 1000 -m 3000
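
The flanking distances, body length, and bin size together determine how wide the resulting matrix is, which matters for memory use and plot resolution. The arithmetic can be sketched as below, assuming a bin size of 10 when -bs is not given and no unscaled regions; the helper name `matrix_columns` is hypothetical.

```shell
# Number of matrix columns per sample = covered length / bin size.
# A bin size of 10 is assumed when none is passed; use your own -bs value if you set one.
matrix_columns() {
    upstream="$1"; body="$2"; downstream="$3"; bin_size="${4:-10}"
    echo $(( ($upstream + $body + $downstream) / $bin_size ))
}

matrix_columns 2000 0 2000       # reference-point: -b 2000 -a 2000         -> 400
matrix_columns 1000 3000 1000    # scale-regions:   -b 1000 -m 3000 -a 1000 -> 500
matrix_columns 2000 0 2000 50    # same window with -bs 50                  -> 80
```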
plotFingerprint is a quality control tool, primarily for ChIP-seq experiments, that assesses whether antibody enrichment was successful. It generates cumulative read coverage profiles to distinguish signal from noise.
Key Parameters:
- --bamfiles, -b: Indexed BAM files (required)
- --plotFile, -plot, -o: Output image filename (required)
- --extendReads, -e: Extend reads to fragment length
- --ignoreDuplicates: Count identical reads once
- --minMappingQuality: Mapping quality filter
- --centerReads: Center reads at fragment length
- --minFragmentLength / --maxFragmentLength: Fragment filters
- --outRawCounts: Save per-bin read counts
- --outQualityMetrics: Output QC metrics (Jensen-Shannon distance)
- --labels: Custom sample names
- --numberOfProcessors, -p: Parallel processing

Interpretation:
Common Usage:
plotFingerprint -b input.bam chip1.bam chip2.bam \
--labels Input ChIP1 ChIP2 -o fingerprint.png \
--extendReads 200 --ignoreDuplicates
plotCoverage visualizes the average read distribution across the genome. It shows genome coverage and helps determine whether sequencing depth is adequate.
Key Parameters:
- --bamfiles, -b: BAM files to analyze (required)
- --plotFile, -o: Output plot filename (required)
- --ignoreDuplicates: Remove PCR duplicates
- --minMappingQuality: Quality threshold
- --outRawCounts: Save underlying data
- --labels: Sample names
- --numberOfSamples: Number of positions to sample (default: 1,000,000)

bamPEFragmentSize determines the fragment length distribution for paired-end sequencing data. It is an essential QC step for verifying the expected fragment sizes from library preparation.
Key Parameters:
- --bamfiles, -b: BAM files (required)
- --histogram, -hist: Output histogram filename (required)
- --plotTitle, -T: Plot title
- --maxFragmentLength: Maximum length to consider (default: 1000)
- --logScale: Use logarithmic Y-axis
- --outRawFragmentLengths: Save raw fragment lengths

plotCorrelation analyzes sample correlations from multiBamSummary or multiBigwigSummary output. It shows how similar different samples are.
Correlation Methods: pearson or spearman (set with --corMethod)

Visualization Options: heatmap or scatterplot (set with --whatToShow)

Key Parameters:
- --corData, -in: Input matrix from multiBamSummary/multiBigwigSummary (required)
- --corMethod: pearson or spearman (required)
- --whatToShow: heatmap or scatterplot (required)
- --plotFile, -o: Output filename (required)
- --skipZeros: Exclude zero-value regions
- --removeOutliers: Use median absolute deviation (MAD) filtering
- --outFileCorMatrix: Export correlation matrix
- --labels: Custom sample names
- --plotTitle: Plot title
- --colorMap: Color scheme (50+ options)
- --plotNumbers: Display correlation values on heatmap

Common Usage:
# Heatmap with Pearson correlation
plotCorrelation -in readCounts.npz --corMethod pearson \
--whatToShow heatmap -o correlation_heatmap.png --plotNumbers
# Scatterplot with Spearman correlation
plotCorrelation -in readCounts.npz --corMethod spearman \
--whatToShow scatterplot -o correlation_scatter.png
plotPCA generates principal component analysis plots from multiBamSummary or multiBigwigSummary output. It displays sample relationships in reduced dimensionality.
Key Parameters:
- --corData, -in: Coverage file from multiBamSummary/multiBigwigSummary (required)
- --plotFile, -o: Output image (png, eps, pdf, svg) (required)
- --outFileNameData: Export PCA data (loadings/rotation and eigenvalues)
- --labels, -l: Custom sample labels
- --plotTitle, -T: Plot title
- --plotHeight / --plotWidth: Dimensions in centimeters
- --colors: Custom symbol colors
- --markers: Symbol shapes
- --transpose: Perform PCA on transposed matrix (rows=samples)
- --ntop: Use top N variable rows (default: 1000)
- --PCs: Components to plot (default: 1 2)
- --log2: Log2-transform data before analysis
- --rowCenter: Center each row at 0

Common Usage:
plotPCA -in readCounts.npz -o PCA_plot.png \
-T "PCA of read counts" --transpose
plotHeatmap creates genomic region heatmaps from computeMatrix output. It generates publication-quality visualizations.
Key Parameters:
- --matrixFile, -m: Matrix from computeMatrix (required)
- --outFileName, -o: Output image (png, eps, pdf, svg) (required)
- --outFileSortedRegions: Save regions after filtering
- --outFileNameMatrix: Export matrix values
- --interpolationMethod: auto, nearest, bilinear, bicubic, gaussian
- --dpi: Figure resolution

Clustering:
- --kmeans: k-means clustering
- --hclust: Hierarchical clustering (slower for >1000 regions)
- --silhouette: Calculate cluster quality metrics

Visual Customization:
- --heatmapHeight / --heatmapWidth: Dimensions (3-100 cm)
- --whatToShow: plot, heatmap, colorbar (combinations)
- --alpha: Transparency (0-1)
- --colorMap: 50+ color schemes
- --colorList: Custom gradient colors
- --zMin / --zMax: Intensity scale limits
- --boxAroundHeatmaps: yes/no (default: yes)

Labels:
- --xAxisLabel / --yAxisLabel: Axis labels
- --regionsLabel: Region set identifiers
- --samplesLabel: Sample names
- --refPointLabel: Reference point label
- --startLabel / --endLabel: Region boundary labels

Common Usage:
# Basic heatmap
plotHeatmap -m matrix.gz -o heatmap.png
# With clustering and custom colors
plotHeatmap -m matrix.gz -o heatmap.png \
--kmeans 3 --colorMap RdBu --zMin -3 --zMax 3
plotProfile generates profile plots showing scores across genomic regions using computeMatrix output.
Key Parameters:
- --matrixFile, -m: Matrix from computeMatrix (required)
- --outFileName, -o: Output image (png, eps, pdf, svg) (required)
- --plotType: lines, fill, se, std, overlapped_lines, heatmap
- --colors: Color palette (names or hex codes)
- --plotHeight / --plotWidth: Dimensions in centimeters
- --yMin / --yMax: Y-axis range
- --averageType: mean, median, min, max, std, sum

Clustering:
- --kmeans: k-means clustering
- --hclust: Hierarchical clustering
- --silhouette: Cluster quality metrics

Labels:
- --plotTitle: Main heading
- --regionsLabel: Region set identifiers
- --samplesLabel: Sample names
- --startLabel / --endLabel: Region boundary labels (scale-regions mode)

Output Options:
- --outFileNameData: Export data as tab-separated values
- --outFileSortedRegions: Save filtered/sorted regions as BED

Common Usage:
# Line plot
plotProfile -m matrix.gz -o profile.png --plotType lines
# With standard error shading
plotProfile -m matrix.gz -o profile.png --plotType se \
--colors blue red green
plotEnrichment calculates and visualizes signal enrichment across genomic regions. It measures the percentage of alignments overlapping each group of regions, which makes it useful for FRiP (Fraction of Reads in Peaks) scores.
Key Parameters:
- --bamfiles, -b: Indexed BAM files (required)
- --BED: Region files in BED/GTF format (required)
- --plotFile, -o: Output visualization (png, pdf, eps, svg)
- --labels, -l: Custom sample identifiers
- --outRawCounts: Export numerical data
- --perSample: Group plots by sample instead of by feature (grouping by feature is the default)
- --regionLabels: Custom region names

Read Processing:
- --minFragmentLength / --maxFragmentLength: Fragment filters
- --minMappingQuality: Quality threshold
- --samFlagInclude / --samFlagExclude: SAM flag filters
- --ignoreDuplicates: Remove duplicates
- --centerReads: Center reads for sharper signal

Common Usage:
plotEnrichment -b Input.bam H3K4me3.bam \
--BED peaks_up.bed peaks_down.bed \
--regionLabels "Up regulated" "Down regulated" \
-o enrichment.png
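
A FRiP-style score is the number of reads overlapping the regions divided by the total number of reads considered. The sketch below recomputes that fraction from a table like the one --outRawCounts writes. The column layout (file, featureType, percent, featureReadCount, totalReadCount) is an assumption to check against your own output, and the helper name `frip` and all numbers are invented for illustration.

```shell
# Hypothetical stand-in for a table written by: plotEnrichment ... --outRawCounts enrichment.tab
# Assumed columns: file, featureType, percent, featureReadCount, totalReadCount.
enrich_file="$(mktemp)"
printf 'file\tfeatureType\tpercent\tfeatureReadCount\ttotalReadCount\n' >  "$enrich_file"
printf 'Input.bam\tpeaks\t2.00\t20000\t1000000\n'                     >> "$enrich_file"
printf 'H3K4me3.bam\tpeaks\t25.00\t500000\t2000000\n'                 >> "$enrich_file"

# Fraction of reads in each feature type = featureReadCount / totalReadCount.
frip() {
    awk -F'\t' 'NR > 1 { printf "%s\t%s\t%.3f\n", $1, $2, $4 / $5 }' "$1"
}

frip "$enrich_file"
```

In this made-up example the ChIP sample has a fraction of 0.250 against 0.020 for the input control.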
computeMatrixOperations is an advanced matrix manipulation tool for combining or subsetting matrices from computeMatrix. It enables complex multi-sample, multi-region analyses.
Operations:
- cbind: Combine matrices column-wise
- rbind: Combine matrices row-wise
- subset: Extract specific samples or regions
- filterStrand: Keep only regions on specific strand
- filterValues: Apply signal intensity filters
- sort: Order regions by various criteria
- dataRange: Report min/max values

Common Usage:
# Combine matrices
computeMatrixOperations cbind -m matrix1.gz matrix2.gz -o combined.gz
# Extract specific samples
computeMatrixOperations subset -m matrix.gz --samples 0 2 -o subset.gz
estimateReadFiltering predicts the impact of various filtering parameters without actually filtering. It helps optimize filtering strategies before running full analyses.
Key Parameters:
- --bamfiles, -b: BAM files to analyze
- --sampleSize: Number of reads to sample (default: 100,000)
- --binSize: Bin size for analysis
- --distanceBetweenBins: Spacing between sampled bins

Filtration Options to Test:
- --minMappingQuality: Test quality thresholds
- --ignoreDuplicates: Assess duplicate impact
- --minFragmentLength / --maxFragmentLength: Test fragment filters

Many deepTools commands share these filtering and performance options:

Read Filtering:
- --ignoreDuplicates: Remove PCR duplicates
- --minMappingQuality: Filter by alignment confidence
- --samFlagInclude / --samFlagExclude: SAM format filtering
- --minFragmentLength / --maxFragmentLength: Fragment length bounds

Performance:
- --numberOfProcessors, -p: Enable parallel processing
- --region: Process specific genomic regions (chr:start-end)

Read Processing:
- --extendReads: Extend to fragment length
- --centerReads: Center at fragment midpoint
- --ignoreDuplicates: Count unique reads only
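
Because these options are shared, keeping them in one variable helps ensure every step of a workflow filters reads the same way. A minimal sketch, with illustrative thresholds and placeholder file names; the variable `COMMON_OPTS` and the helper `shared_option_commands` are hypothetical, and the commands are printed for review, not executed.

```shell
# Keep shared options in one place so each tool sees identical read filtering.
# The thresholds and file names are illustrative, not recommendations.
COMMON_OPTS="--minMappingQuality 30 --ignoreDuplicates --numberOfProcessors 4"

shared_option_commands() {
    echo "bamCoverage --bam chip.bam --outFileName chip.bw $COMMON_OPTS"
    echo "plotFingerprint -b input.bam chip.bam -o fingerprint.png $COMMON_OPTS"
}

shared_option_commands
```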