skills/bulk-rnaseq/references/design-and-qc.md
The statistics downstream are only as good as the design and the QC gates. Decide design before sequencing; apply QC before, during, and after quantification. This is what makes a bulk RNA-seq result defensible.
- `~batch + condition`.
- `~age + condition` (ensure it's numeric).
- `~genotype + condition + genotype:condition`.
- pydeseq2 will error. Check `pd.crosstab(metadata.condition, metadata.batch)` for empty cells.
- `logs/salmon_quant.log`): usually >70%. Low → wrong transcriptome, no decoys, or contamination.
- `-s`).
- `--remove_ribo_rna` on Path A.
- `upstream-manual.md`).

PCR/optical duplicates look alarming but in RNA-seq mostly reflect genuine high expression. Standard gene-level DE (DESeq2) does not remove duplicates. Only consider dedup with UMIs (use the UMI, not coordinate dedup).
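The `pd.crosstab` check for empty cells mentioned above can be paired with a rank check on the design matrix. The following is a minimal sketch, not part of any pipeline: the sample sheet, the helper name `design_report`, and the treatment-coded dummy matrix are illustrative assumptions, and only pandas and NumPy are used (pydeseq2 itself is not called).

```python
import numpy as np
import pandas as pd

# Hypothetical sample sheet; column names mirror the text (condition, batch).
metadata = pd.DataFrame(
    {
        "condition": ["ctrl", "ctrl", "ctrl", "treat", "treat", "treat"],
        "batch": ["b1", "b1", "b2", "b2", "b2", "b1"],
    },
    index=[f"s{i}" for i in range(1, 7)],
)


def design_report(meta, factors=("batch", "condition")):
    """Return the batch x condition table, the number of empty cells,
    and whether an additive design (~batch + condition) is full rank."""
    table = pd.crosstab(meta[factors[1]], meta[factors[0]])
    empty_cells = int((table == 0).to_numpy().sum())

    # Treatment-coded design matrix: intercept plus one dummy per non-reference level.
    dummies = pd.get_dummies(meta[list(factors)], drop_first=True).astype(float)
    X = np.column_stack([np.ones(len(meta)), dummies.to_numpy()])
    full_rank = bool(np.linalg.matrix_rank(X) == X.shape[1])
    return table, empty_cells, full_rank


table, empty_cells, full_rank = design_report(metadata)
print(table)
print("empty cells:", empty_cells, "| full rank:", full_rank)

# Fully confounded layout: all controls in b1, all treated samples in b2.
confounded = metadata.assign(batch=["b1"] * 3 + ["b2"] * 3)
_, empty_bad, full_rank_bad = design_report(confounded)
print("confounded -> empty cells:", empty_bad, "| full rank:", full_rank_bad)
```

Empty cells are a warning sign rather than proof of a rank-deficient design, which is why the sketch reports both. In the confounded layout the batch and condition columns are identical, so no model can separate the two effects.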
Always do this on the counts, ideally on variance-stabilized/log values:
`~batch + condition`); if it's unknown, consider surrogate-variable / RUV approaches (out of scope here; note it).

[ ] >=3 biological replicates per group
[ ] batch recorded and NOT confounded with condition
[ ] raw FastQC reviewed; adapters trimmed
[ ] mapping/assignment rate acceptable; strandedness verified
[ ] PCA + sample-distance heatmap inspected; outliers/swaps resolved
[ ] design formula full-rank, adjustment vars before variable of interest
[ ] p-value histogram sane after DE
[ ] versions pinned (pipeline -r, tools, genome+annotation release)
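
The PCA and sample-distance inspection from the checklist can be sketched on log-scale values as follows. This is an illustration under stated assumptions: the counts are simulated, the helper names (`log_cpm`, `pca_scores`, `sample_distances`) are hypothetical, and a plain log2-CPM transform stands in for a proper variance-stabilizing transform.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated counts (samples x genes), purely illustrative.
n_genes = 500
samples = ["ctrl_1", "ctrl_2", "ctrl_3", "treat_1", "treat_2", "treat_3"]
base = rng.gamma(shape=2.0, scale=50.0, size=n_genes)
fold = np.ones(n_genes)
fold[:50] = 8.0  # first 50 genes are up in the treated group
mu = np.vstack([base] * 3 + [base * fold] * 3)
counts = pd.DataFrame(rng.poisson(mu), index=samples)


def log_cpm(counts):
    """log2(CPM + 1): a simple stand-in for a variance-stabilizing transform."""
    lib_size = counts.sum(axis=1)
    return np.log2(counts.div(lib_size, axis=0) * 1e6 + 1.0)


def pca_scores(logged, n_components=2):
    """PCA via SVD on gene-centered values; returns scores and explained variance."""
    centered = (logged - logged.mean(axis=0)).to_numpy()
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s**2 / np.sum(s**2))[:n_components]
    cols = [f"PC{i + 1}" for i in range(n_components)]
    return pd.DataFrame(scores, index=logged.index, columns=cols), explained


def sample_distances(logged):
    """Euclidean distance between every pair of samples."""
    x = logged.to_numpy()
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff**2).sum(axis=2))
    return pd.DataFrame(d, index=logged.index, columns=logged.index)


logged = log_cpm(counts)
scores, explained = pca_scores(logged)
dist = sample_distances(logged)

print(scores.round(2))
print("explained variance (PC1, PC2):", explained.round(3))
print(dist.round(1))
```

In real data, a sample that sits with the wrong group on PC1 or in the distance matrix is a candidate swap or outlier. A component that tracks batch rather than condition suggests including batch in the design.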