scientific-skills/depmap/references/dependency_analysis.md
Chronos is the current (v5+) algorithm for computing gene dependency scores from CRISPR screen data. It addresses systematic biases including:
| Score Range | Interpretation |
|---|---|
| > 0 | Knockout likely promotes growth (small positive values may be noise) |
| 0 to −0.3 | Non-essential: minimal fitness effect |
| −0.3 to −0.5 | Mild dependency |
| −0.5 to −1.0 | Significant dependency |
| < −1.0 | Strong dependency (common essential range) |
| ≈ −1.0 | Median of pan-essential genes (e.g., proteasome subunits) |
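
The bins in the table above can be applied programmatically. The following is a minimal sketch; the function name `interpret_gene_effect` and the handling of exact boundary values (for example, whether −0.5 counts as mild or significant) are assumptions made for illustration, not part of DepMap.

```python
import math


def interpret_gene_effect(score: float) -> str:
    """Map a gene effect score to the interpretation bins in the table above.

    Boundary handling is an assumption: each bin includes its lower
    (more negative) edge and excludes its upper edge.
    """
    if math.isnan(score):
        return "missing"
    if score > 0:
        return "likely growth-promoting when knocked out"
    if score > -0.3:
        return "non-essential"
    if score > -0.5:
        return "mild dependency"
    if score >= -1.0:
        return "significant dependency"
    return "strong dependency"


# Example scores, chosen only to hit each bin once
labels = [interpret_gene_effect(s) for s in (0.2, -0.1, -0.4, -0.7, -1.5)]
```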
Genes that are essential in nearly all cell lines (score ~−1 to −2):
These can be used as positive controls for screen quality.
Genes with negligible fitness effect (score ~ 0):
To determine if a dependency is cancer-selective:
import pandas as pd
import numpy as np


def compute_selectivity(gene_effect_df, target_gene, cancer_lineage):
    """Compute selectivity score for a cancer lineage."""
    scores = gene_effect_df[target_gene].dropna()

    # Get cell line metadata
    from depmap_utils import load_cell_line_info
    cell_info = load_cell_line_info()

    # Attach each cell line's lineage to its score
    scores_df = scores.reset_index()
    scores_df.columns = ["DepMap_ID", "score"]
    scores_df = scores_df.merge(cell_info[["DepMap_ID", "lineage"]], on="DepMap_ID")

    cancer_scores = scores_df[scores_df["lineage"] == cancer_lineage]["score"]
    other_scores = scores_df[scores_df["lineage"] != cancer_lineage]["score"]

    # Selectivity: lower mean in cancer lineage vs others
    selectivity = other_scores.mean() - cancer_scores.mean()

    return {
        "target_gene": target_gene,
        "cancer_lineage": cancer_lineage,
        "cancer_mean": cancer_scores.mean(),
        "other_mean": other_scores.mean(),
        "selectivity_score": selectivity,
        "n_cancer": len(cancer_scores),
        "fraction_dependent": (cancer_scores < -0.5).mean()
    }
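
To try the selectivity calculation without the metadata loader, the same difference of means can be computed from a lineage label per cell line. This is a sketch on toy data: the function name `selectivity_from_labels`, the cell line IDs, and the scores are all made up for illustration.

```python
import pandas as pd


def selectivity_from_labels(scores: pd.Series, lineages: pd.Series,
                            cancer_lineage: str) -> float:
    """Mean score outside the lineage minus mean score inside it.

    Positive values mean the lineage is, on average, more dependent
    on the gene than other lineages.
    """
    # Keep only cell lines that have both a score and a lineage label
    aligned = pd.concat({"score": scores, "lineage": lineages},
                        axis=1, join="inner").dropna()
    in_lineage = aligned["lineage"] == cancer_lineage
    return (aligned.loc[~in_lineage, "score"].mean()
            - aligned.loc[in_lineage, "score"].mean())


# Toy data (hypothetical IDs and scores)
toy_scores = pd.Series({"ACH-A": -1.2, "ACH-B": -0.8, "ACH-C": -0.1, "ACH-D": 0.1})
toy_lineages = pd.Series({"ACH-A": "skin", "ACH-B": "skin",
                          "ACH-C": "lung", "ACH-D": "breast"})
toy_selectivity = selectivity_from_labels(toy_scores, toy_lineages, "skin")
```

A difference of means is sensitive to outliers and to small lineages, so it is worth reading it alongside `n_cancer` and `fraction_dependent` from the function above.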
| Dataset | Description | Recommended |
|---|---|---|
| CRISPRGeneEffect | Chronos-corrected gene effect | Yes (current) |
| Achilles_gene_effect | Older CERES algorithm | Legacy only |
| RNAi_merged | DEMETER2 RNAi | For cross-validation |
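
Gene effect matrices are commonly distributed with column labels of the form `SYMBOL (ENTREZ_ID)`, which does not match lookups by bare gene symbol. The sketch below assumes that label format; the helper name `strip_entrez_ids` and the toy values are hypothetical.

```python
import re

import pandas as pd


def strip_entrez_ids(gene_effect_df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns like 'KRAS (3845)' to 'KRAS'; leave other columns unchanged."""
    pattern = re.compile(r"^(?P<symbol>.+?) \(\d+\)$")

    def clean(col):
        match = pattern.match(str(col))
        return match.group("symbol") if match else col

    return gene_effect_df.rename(columns=clean)


# Toy matrix: rows are cell lines, columns are genes (values are made up)
toy = pd.DataFrame(
    {"KRAS (3845)": [-1.1, -0.2], "TP53 (7157)": [0.3, 0.1]},
    index=["ACH-A", "ACH-B"],
)
cleaned = strip_entrez_ids(toy)
```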
DepMap reports quality control metrics per screen:
Good screens: skewness < −1, AUC > 0.85
Common values for lineage field in sample_info.csv:
| Lineage | Description |
|---|---|
| lung | Lung cancer |
| breast | Breast cancer |
| colorectal | Colorectal cancer |
| brain_cancer | Brain cancer (GBM, etc.) |
| leukemia | Leukemia |
| lymphoma | Lymphoma |
| prostate | Prostate cancer |
| ovarian | Ovarian cancer |
| pancreatic | Pancreatic cancer |
| skin | Melanoma and other skin |
| liver | Liver cancer |
| kidney | Kidney cancer |
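
The lineage values above can be used to select cell lines before subsetting a gene effect matrix. A minimal sketch, assuming the metadata has `DepMap_ID` and `lineage` columns as used elsewhere in this document; the helper name and the toy rows are illustrative only.

```python
import pandas as pd


def cell_lines_for_lineage(sample_info: pd.DataFrame, lineage: str) -> list:
    """Return the DepMap_ID values whose lineage matches (case-insensitive)."""
    # Rows with a missing lineage compare as False and are dropped
    mask = sample_info["lineage"].str.lower() == lineage.lower()
    return sample_info.loc[mask, "DepMap_ID"].tolist()


# Toy metadata standing in for sample_info.csv (hypothetical IDs)
toy_info = pd.DataFrame({
    "DepMap_ID": ["ACH-A", "ACH-B", "ACH-C", "ACH-D"],
    "lineage": ["skin", "lung", "Skin", None],
})
skin_lines = cell_lines_for_lineage(toy_info, "skin")
```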
import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import false_discovery_control  # requires SciPy >= 1.11


def find_synthetic_lethal(gene_effect_df, mutation_df, biomarker_gene,
                          fdr_threshold=0.1):
    """
    Find synthetic lethal partners for a loss-of-function mutation.

    For each gene, tests if cell lines mutant in biomarker_gene
    are more dependent on that gene vs. WT lines.
    """
    if biomarker_gene not in mutation_df.columns:
        return pd.DataFrame()

    # Get mutant vs WT cell lines (mutation_df holds 1 for mutant lines)
    common = gene_effect_df.index.intersection(mutation_df.index)
    is_mutant = (mutation_df.loc[common, biomarker_gene] == 1).to_numpy()
    mutant_lines = common[is_mutant]
    wt_lines = common[~is_mutant]

    results = []
    for gene in gene_effect_df.columns:
        mut_scores = gene_effect_df.loc[mutant_lines, gene].dropna()
        wt_scores = gene_effect_df.loc[wt_lines, gene].dropna()

        # Skip genes with too few observations in either group
        if len(mut_scores) < 5 or len(wt_scores) < 10:
            continue

        # One-sided test: are mutant scores lower (more dependent) than WT?
        stat, pval = stats.mannwhitneyu(mut_scores, wt_scores, alternative='less')
        results.append({
            "gene": gene,
            "mean_mutant": mut_scores.mean(),
            "mean_wt": wt_scores.mean(),
            "effect_size": wt_scores.mean() - mut_scores.mean(),
            "pval": pval,
            "n_mutant": len(mut_scores),
            "n_wt": len(wt_scores)
        })

    # No gene passed the sample-size filter: nothing to correct or rank
    if not results:
        return pd.DataFrame()

    df = pd.DataFrame(results)

    # FDR correction (Benjamini-Hochberg)
    df["qval"] = false_discovery_control(df["pval"], method="bh")
    df = df[df["qval"] < fdr_threshold].sort_values("effect_size", ascending=False)
    return df
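
The function above depends on `alternative='less'` testing in the right direction, so that small p-values mean mutant lines have lower (more dependent) scores. The sketch below checks this on simulated scores for a single gene; the group sizes, means, and random seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated gene effect scores: mutant lines are strongly dependent, WT are not
mut_scores = rng.normal(loc=-1.0, scale=0.2, size=8)
wt_scores = rng.normal(loc=-0.1, scale=0.2, size=30)

# 'less' asks whether the first sample tends to be lower than the second
_, p_less = stats.mannwhitneyu(mut_scores, wt_scores, alternative="less")

# The opposite alternative should give no evidence on the same data
_, p_greater = stats.mannwhitneyu(mut_scores, wt_scores, alternative="greater")
```

Swapping the argument order or the alternative would flag genes that mutant lines depend on less, which is the opposite of a synthetic lethal candidate.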
DepMap also contains compound sensitivity data from the PRISM assay:
import pandas as pd


def load_prism_data(filepath="primary-screen-replicate-collapsed-logfold-change.csv"):
    """
    Load PRISM drug sensitivity data.

    Rows = cell lines, Columns = compounds (broad_id::name::dose)
    Values = log2 fold change (more negative = more sensitive)
    """
    return pd.read_csv(filepath, index_col=0)
# Available datasets:
# primary-screen: 4,518 compounds at a single dose
# secondary-screen: 1,448 compounds at 8 doses (AUC available)
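
Because each column label packs several fields into one string, it is often convenient to split the labels into a lookup table. The sketch below assumes the `broad_id::name::dose` layout described in the docstring above; the helper name and the example labels are hypothetical, and the layout should be checked against the downloaded file.

```python
import pandas as pd


def parse_compound_columns(columns) -> pd.DataFrame:
    """Split 'broad_id::name::dose' labels into one row per column.

    Labels that do not have exactly three '::'-separated parts are skipped.
    The dose is kept as a string so no unit or format is assumed.
    """
    records = []
    for col in columns:
        parts = str(col).split("::")
        if len(parts) != 3:
            continue
        broad_id, name, dose = parts
        records.append({"column": col, "broad_id": broad_id,
                        "name": name, "dose": dose})
    return pd.DataFrame(records, columns=["column", "broad_id", "name", "dose"])


# Made-up labels in the assumed layout, plus one malformed label
example_columns = ["BRD-0001::drug_a::2.5", "BRD-0002::drug_b::10", "not_a_compound"]
compound_table = parse_compound_columns(example_columns)
```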