DepMap Dependency Analysis Guide

Understanding Chronos Scores

Chronos is the current (v5+) algorithm DepMap uses to compute gene dependency scores from CRISPR knockout screen data. It corrects for systematic biases, including:

  • Copy number effects (guides targeting amplified regions cut many times, so genes there can appear essential regardless of their function)
  • Guide RNA efficiency variation
  • Cell line growth rates

Score Interpretation

| Score range | Interpretation |
|---|---|
| > 0 | Knockout tends to increase growth; small positive values are often noise |
| 0 to −0.3 | Non-essential: minimal fitness effect |
| −0.3 to −0.5 | Mild dependency |
| −0.5 to −1.0 | Significant dependency |
| < −1.0 | Strong dependency, typical of common essential genes |
| ≈ −1.0 | Median of pan-essential genes (e.g., proteasome subunits) |
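As a rough illustration, the thresholds in the table can be applied to a column of scores with pandas. This is a minimal sketch: the function name classify_scores and the short bin labels are illustrative choices, not DepMap terminology, and values lying exactly on a boundary fall into the more negative bin.

```python
import numpy as np
import pandas as pd

# Bin edges follow the score table; they are rules of thumb, not hard cutoffs
BINS = [-np.inf, -1.0, -0.5, -0.3, 0.0, np.inf]
LABELS = ["strong", "significant", "mild", "non-essential", "positive"]

def classify_scores(scores: pd.Series) -> pd.Series:
    """Map Chronos gene-effect scores to the qualitative bins in the table."""
    # right=True (default): intervals are (-inf, -1], (-1, -0.5], ..., (0, inf)
    return pd.cut(scores, bins=BINS, labels=LABELS)
```

Missing scores stay NaN, so the result can be joined back onto the original matrix without dropping cell lines.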

Common Essential Genes (Controls)

Genes that are essential in nearly all cell lines (score ~−1 to −2):

  • Ribosomal proteins: RPL..., RPS...
  • Proteasome: PSMA..., PSMB...
  • Spliceosome: SNRPD1, SNRNP70
  • DNA replication: MCM2, PCNA
  • Transcription: POLR2A, TAF...

These can be used as positive controls for screen quality.

Non-Essential Controls

Genes with negligible fitness effect (score ~ 0):

  • Genes not expressed in the cell line (e.g., tissue-specific genes)
  • Safe harbor loci
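The two control sets also define the score scale: non-essential genes sit near 0 and pan-essential genes near −1. The sketch below rescales a raw score matrix to that convention using the two control lists. It assumes row-wise (per cell line) scaling and hypothetical inputs (a raw cell line x gene matrix and lists of control gene symbols); it illustrates the convention and is not the Chronos implementation.

```python
import pandas as pd

def scale_to_controls(raw: pd.DataFrame, essential: list[str],
                      nonessential: list[str]) -> pd.DataFrame:
    """Rescale each row so control medians land at 0 (non-essential) and -1 (essential)."""
    ess_median = raw[essential].median(axis=1)
    non_median = raw[nonessential].median(axis=1)
    # Shift so the non-essential median is 0, then divide by the gap between medians
    return raw.sub(non_median, axis=0).div(non_median - ess_median, axis=0)
```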

Selectivity Assessment

To determine whether a dependency is selective for a cancer lineage, compare the gene's scores in that lineage with those in all other cell lines:

```python
import pandas as pd

def compute_selectivity(gene_effect_df, cell_info, target_gene, cancer_lineage,
                        threshold=-0.5):
    """Compare a gene's dependency scores in one lineage against all other lineages.

    gene_effect_df: cell lines (rows, DepMap_ID) x genes (columns).
    cell_info: cell line metadata with "DepMap_ID" and "lineage" columns,
    e.g. sample_info.csv loaded with pd.read_csv.
    """
    scores = gene_effect_df[target_gene].dropna()

    # Attach a lineage label to each cell line's score
    scores_df = scores.rename("score").rename_axis("DepMap_ID").reset_index()
    scores_df = scores_df.merge(cell_info[["DepMap_ID", "lineage"]], on="DepMap_ID")

    in_lineage = scores_df["lineage"] == cancer_lineage
    cancer_scores = scores_df.loc[in_lineage, "score"]
    other_scores = scores_df.loc[~in_lineage, "score"]

    # Positive selectivity = lower (more dependent) mean in the target lineage
    selectivity = other_scores.mean() - cancer_scores.mean()
    return {
        "target_gene": target_gene,
        "cancer_lineage": cancer_lineage,
        "cancer_mean": cancer_scores.mean(),
        "other_mean": other_scores.mean(),
        "selectivity_score": selectivity,
        "n_cancer": len(cancer_scores),
        "n_other": len(other_scores),
        # Fraction of lineage cell lines scoring below the dependency threshold
        "fraction_dependent": (cancer_scores < threshold).mean(),
    }
```
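compute_selectivity compares one lineage against everything else. To screen every lineage at once, the same comparison can be repeated per lineage with a groupby. A minimal sketch, assuming cell_info has DepMap_ID and lineage columns as in sample_info.csv; lineage_summary is a hypothetical helper name and the −0.5 default cutoff is the same rule of thumb used for fraction_dependent.

```python
import pandas as pd

def lineage_summary(gene_effect_df: pd.DataFrame, cell_info: pd.DataFrame,
                    gene: str, threshold: float = -0.5) -> pd.DataFrame:
    """Per-lineage summary of one gene's scores, most selective lineage first."""
    # Map each cell line ID to its lineage (assumes DepMap_ID / lineage columns)
    lineage = cell_info.set_index("DepMap_ID")["lineage"]
    scores = gene_effect_df[gene].dropna()
    df = pd.DataFrame({"score": scores, "lineage": lineage.reindex(scores.index)}).dropna()
    df["dependent"] = df["score"] < threshold

    grouped = df.groupby("lineage")
    summary = pd.DataFrame({
        "n": grouped.size(),
        "mean": grouped["score"].mean(),
        "fraction_dependent": grouped["dependent"].mean(),
    })

    # Mean of all other cell lines, derived from the overall sum and count
    rest_n = len(df) - summary["n"]
    rest_mean = (df["score"].sum() - summary["mean"] * summary["n"]) / rest_n
    # Positive selectivity = lower (more dependent) mean in this lineage than elsewhere
    summary["selectivity"] = rest_mean - summary["mean"]
    return summary.sort_values("selectivity", ascending=False)
```

Lineages with only a handful of cell lines can top this ranking by chance, so check the n column before reading much into a high selectivity value.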

CRISPR Dataset Versions

| Dataset | Description | Recommended |
|---|---|---|
| CRISPRGeneEffect | Chronos-corrected gene effect | Yes (current) |
| Achilles_gene_effect | Older CERES algorithm | Legacy only |
| RNAi_merged | DEMETER2 RNAi | For cross-validation |
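Gene-effect files are wide matrices that load directly with pandas. A minimal loader sketch, assuming rows are cell lines indexed by ID in the first column and gene columns carry an Entrez ID suffix such as "KRAS (3845)"; load_gene_effect is a hypothetical name, and columns without that suffix are left unchanged.

```python
import io
import re

import pandas as pd

def load_gene_effect(path_or_buffer) -> pd.DataFrame:
    """Load a gene-effect matrix and simplify 'SYMBOL (EntrezID)' column names."""
    df = pd.read_csv(path_or_buffer, index_col=0)
    # Strip a trailing " (12345)" suffix if present; other names are kept as-is
    df.columns = [re.sub(r"\s*\(\d+\)$", "", c) for c in df.columns]
    return df

if __name__ == "__main__":
    # Tiny in-memory example standing in for a real CRISPRGeneEffect.csv
    demo = io.StringIO(",KRAS (3845),TP53 (7157)\nACH-000001,-1.2,0.3\n")
    print(load_gene_effect(demo).columns.tolist())  # ['KRAS', 'TP53']
```

Stripping the suffix makes lookups like df["KRAS"] work; keep the original labels if you need Entrez IDs to disambiguate symbols.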

Quality Metrics

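DepMap reports quality control metrics for each screen:

  • Skewness: the distribution of gene scores should be negatively skewed, with pan-essential genes forming the left tail
  • AUC: area under the ROC curve for separating pan-essential from non-essential control genes

Typical thresholds for a good screen: skewness < −1 and AUC > 0.85.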
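A sketch of how both metrics could be computed for one screen. It assumes skewness is taken over all gene scores in the screen and that AUC is the probability that a random pan-essential control scores below a random non-essential control (ties count half); DepMap's own QC pipeline may define them differently. screen_qc and passes_qc are hypothetical names.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

def screen_qc(scores: pd.Series, essential: list[str], nonessential: list[str]) -> dict:
    """Skewness of all gene scores plus ROC AUC for essential vs non-essential controls."""
    scores = scores.dropna()
    ess = scores.reindex(essential).dropna().to_numpy()
    non = scores.reindex(nonessential).dropna().to_numpy()
    # Pairwise comparison of every essential score with every non-essential score
    lower = (ess[:, None] < non[None, :]).mean()
    ties = (ess[:, None] == non[None, :]).mean()
    return {"skewness": float(skew(scores.to_numpy())), "auc": float(lower + 0.5 * ties)}

def passes_qc(qc: dict, max_skew: float = -1.0, min_auc: float = 0.85) -> bool:
    """Apply the thresholds quoted above: skewness < -1 and AUC > 0.85."""
    return qc["skewness"] < max_skew and qc["auc"] > min_auc
```

The pairwise comparison is O(n_essential x n_nonessential) in memory, which is fine for control lists of a few thousand genes.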

Cancer Lineage Codes

Common values of the lineage field in sample_info.csv (exact strings can differ between DepMap releases, so check the unique values in your copy):

| Lineage | Description |
|---|---|
| lung | Lung cancer |
| breast | Breast cancer |
| colorectal | Colorectal cancer |
| brain_cancer | Brain cancer (GBM, etc.) |
| leukemia | Leukemia |
| lymphoma | Lymphoma |
| prostate | Prostate cancer |
| ovarian | Ovarian cancer |
| pancreatic | Pancreatic cancer |
| skin | Melanoma and other skin cancers |
| liver | Liver cancer |
| kidney | Kidney cancer |

Synthetic Lethality Analysis

```python
import pandas as pd
from scipy import stats
from scipy.stats import false_discovery_control  # available in SciPy >= 1.11

def find_synthetic_lethal(gene_effect_df, mutation_df, biomarker_gene,
                          fdr_threshold=0.1, min_mutant=5, min_wt=10):
    """
    Find synthetic lethal partners for a loss-of-function mutation.

    For each gene, tests whether cell lines mutant in biomarker_gene are
    more dependent on that gene (lower scores) than wild-type (WT) lines.
    mutation_df is a cell line x gene matrix where 1 marks a mutant line.
    """
    if biomarker_gene not in mutation_df.columns:
        return pd.DataFrame()

    # Split shared cell lines into mutant vs WT; lines with missing calls are dropped
    common = gene_effect_df.index.intersection(mutation_df.index)
    status = mutation_df.loc[common, biomarker_gene].dropna()
    is_mutant = (status == 1).to_numpy()
    mutant_lines = status.index[is_mutant]
    wt_lines = status.index[~is_mutant]

    results = []
    for gene in gene_effect_df.columns:
        mut_scores = gene_effect_df.loc[mutant_lines, gene].dropna()
        wt_scores = gene_effect_df.loc[wt_lines, gene].dropna()

        # Skip genes without enough observations in either group
        if len(mut_scores) < min_mutant or len(wt_scores) < min_wt:
            continue

        # One-sided test: are scores lower (stronger dependency) in mutant lines?
        _, pval = stats.mannwhitneyu(mut_scores, wt_scores, alternative="less")
        results.append({
            "gene": gene,
            "mean_mutant": mut_scores.mean(),
            "mean_wt": wt_scores.mean(),
            "effect_size": wt_scores.mean() - mut_scores.mean(),
            "pval": pval,
            "n_mutant": len(mut_scores),
            "n_wt": len(wt_scores),
        })

    df = pd.DataFrame(results)
    if df.empty:
        return df

    # Benjamini-Hochberg FDR correction across all tested genes
    df["qval"] = false_discovery_control(df["pval"], method="bh")
    return df[df["qval"] < fdr_threshold].sort_values("effect_size", ascending=False)
```
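scipy.stats.false_discovery_control was added in SciPy 1.11. On older SciPy versions the Benjamini-Hochberg adjustment can be computed directly; the sketch below is a plain NumPy version (benjamini_hochberg is a hypothetical helper name) that could stand in for that call.

```python
import numpy as np

def benjamini_hochberg(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: running minimum taken from the largest p-value downwards
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```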

Drug Sensitivity (PRISM)

DepMap also contains compound sensitivity data from the PRISM assay:

```python
import pandas as pd

def load_prism_data(filepath="primary-screen-replicate-collapsed-logfold-change.csv"):
    """
    Load PRISM drug sensitivity data.
    Rows = cell lines, columns = compounds (broad_id::name::dose).
    Values = log2 fold change (more negative = more sensitive).
    """
    return pd.read_csv(filepath, index_col=0)

# Available datasets:
# primary-screen: 4,518 compounds at a single dose
# secondary-screen: a subset of compounds retested at multiple doses (AUC available)
```
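The compound columns pack several fields into one label. A sketch that splits them into an annotation table and pulls out the most sensitive cell lines for one compound. It assumes the "::"-separated broad_id::name::dose layout described above and falls back to generic field names when labels have a different number of parts; both helper names are hypothetical.

```python
import pandas as pd

def compound_annotation(prism_df: pd.DataFrame,
                        names=("broad_id", "name", "dose")) -> pd.DataFrame:
    """One row per compound column, with its '::'-separated fields split out."""
    labels = pd.Series(prism_df.columns, index=prism_df.columns)
    fields = labels.str.split("::", expand=True)
    # Only apply the assumed field names when the label layout matches them
    if fields.shape[1] == len(names):
        fields.columns = list(names)
    else:
        fields.columns = [f"field_{i}" for i in range(fields.shape[1])]
    return fields

def most_sensitive_lines(prism_df: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """Cell lines with the most negative log2 fold change for one compound column."""
    return prism_df[column].dropna().nsmallest(n)
```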