skills/scvi-tools/references/models-scrna-seq.md
This document covers core models for analyzing single-cell RNA sequencing data in scvi-tools.
Purpose: Unsupervised analysis, dimensionality reduction, and batch correction for scRNA-seq data.
Key Features:
When to Use:
Basic Usage:
import scvi
# Setup data
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",
    batch_key="batch"
)
# Train model
model = scvi.model.SCVI(adata, n_latent=30)
model.train()
# Extract results
latent = model.get_latent_representation()
normalized = model.get_normalized_expression()
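A common next step is to keep the embedding on the AnnData object so that neighbor graphs and clustering can use it. The sketch below is one minimal way to do that. The helper name `store_latent` and the key `"X_scVI"` are illustrative choices, not part of scvi-tools; the helper only assumes that `model` exposes `get_latent_representation()` and that `adata` has `n_obs` and `obsm`.

```python
def store_latent(adata, model, key="X_scVI"):
    """Save the model's latent embedding under adata.obsm[key] (hypothetical helper)."""
    latent = model.get_latent_representation()
    # One row per cell is expected; fail early if the model saw different data.
    if len(latent) != adata.n_obs:
        raise ValueError(
            f"latent has {len(latent)} rows but adata has {adata.n_obs} cells"
        )
    adata.obsm[key] = latent
    return key
```

With scanpy installed, the stored key can then be passed on, for example `sc.pp.neighbors(adata, use_rep="X_scVI")`.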
Key Parameters:
n_latent: Dimensionality of the latent space (default: 10)
n_layers: Number of hidden layers (default: 1)
n_hidden: Number of nodes per hidden layer (default: 128)
dropout_rate: Dropout rate for the neural networks (default: 0.1)
dispersion: How the dispersion parameter is shared ("gene", "gene-batch", "gene-label", or "gene-cell")
gene_likelihood: Distribution for the count data ("zinb", "nb", "poisson")
Outputs:
get_latent_representation(): Batch-corrected low-dimensional embeddings
get_normalized_expression(): Denoised, normalized expression values
differential_expression(): Probabilistic DE testing between groups
get_feature_correlation_matrix(): Gene-gene correlation estimates
Purpose: Semi-supervised cell type annotation and integration using labeled and unlabeled cells.
Key Features:
When to Use:
Basic Usage:
# Option 1: Train from scratch
scvi.model.SCANVI.setup_anndata(
    adata,
    layer="counts",
    batch_key="batch",
    labels_key="cell_type",
    unlabeled_category="Unknown"
)
model = scvi.model.SCANVI(adata)
model.train()
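# Option 2: Initialize from a pretrained scVI model
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
scvi_model = scvi.model.SCVI(adata)
scvi_model.train()
# labels_key is needed here because the scVI model was set up without labels
scanvi_model = scvi.model.SCANVI.from_scvi_model(
    scvi_model,
    unlabeled_category="Unknown",
    labels_key="cell_type"
)
scanvi_model.train()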
# Predict cell types
predictions = scanvi_model.predict()
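Hard predictions hide how certain the classifier is. The sketch below flags low-confidence cells, assuming `scanvi_model.predict(soft=True)` returns a pandas DataFrame with one row per cell and one column per label. The helper name `confident_labels` and the 0.8 cutoff are illustrative assumptions, not scvi-tools defaults.

```python
import pandas as pd


def confident_labels(soft_predictions: pd.DataFrame,
                     min_prob: float = 0.8,
                     fallback: str = "Unknown") -> pd.Series:
    """Keep the top label per cell only when its probability reaches min_prob."""
    best_label = soft_predictions.idxmax(axis=1)   # most probable label per cell
    best_prob = soft_predictions.max(axis=1)       # probability of that label
    # Cells below the cutoff fall back to the unlabeled category.
    return best_label.where(best_prob >= min_prob, fallback)


# Intended use (requires a trained model):
# adata.obs["scanvi_label"] = confident_labels(scanvi_model.predict(soft=True))
```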
Key Parameters:
labels_key: Column in adata.obs containing cell type labels
unlabeled_category: Label for cells without annotations
Outputs:
predict(): Cell type predictions for all cells
predict(soft=True): Per-class prediction probabilities
get_latent_representation(): Cell type-aware latent space
Purpose: Automatic identification and modeling of zero-inflated genes in scRNA-seq data.
Key Features:
When to Use:
Basic Usage:
scvi.model.AUTOZI.setup_anndata(adata, layer="counts")
model = scvi.model.AUTOZI(adata)
model.train()
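# Get the per-gene Beta posterior parameters of the zero-inflation mixing weights
# (a dict with "alpha_posterior" and "beta_posterior" arrays, not probabilities)
outputs = model.get_alphas_betas()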
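`get_alphas_betas()` returns the parameters of a per-gene Beta posterior, so one more step is needed to obtain zero-inflation calls. The sketch below assumes the returned dict has the keys `"alpha_posterior"` and `"beta_posterior"`, and that a gene is called zero-inflated when the posterior mass of its mixing weight below 0.5 exceeds a threshold. The helper name `zero_inflation_calls` and the default threshold of 0.5 are illustrative assumptions.

```python
import numpy as np
from scipy.stats import beta


def zero_inflation_calls(alpha_posterior, beta_posterior, threshold=0.5):
    """Return (zi_probs, is_zi) from per-gene Beta posterior parameters."""
    a = np.asarray(alpha_posterior, dtype=float)
    b = np.asarray(beta_posterior, dtype=float)
    # Posterior probability that each gene's mixing weight lies below 0.5.
    zi_probs = beta.cdf(0.5, a, b)
    return zi_probs, zi_probs > threshold


# Intended use (requires a trained AUTOZI model):
# params = model.get_alphas_betas()
# zi_probs, is_zi = zero_inflation_calls(
#     params["alpha_posterior"], params["beta_posterior"]
# )
```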
Purpose: RNA velocity analysis using variational inference.
Key Features:
When to Use:
Basic Usage:
import scvelo as scv
# Prepare velocity data
scv.pp.filter_and_normalize(adata)
scv.pp.moments(adata)
# Train VeloVI (lives in scvi.external)
scvi.external.VELOVI.setup_anndata(adata, spliced_layer="Ms", unspliced_layer="Mu")
model = scvi.external.VELOVI(adata)
model.train()
# Get velocity estimates
latent_time = model.get_latent_time()
velocities = model.get_velocity()
Purpose: Isolating perturbation-specific variations from background biological variation.
Key Features:
When to Use:
Basic Usage (contrastiveVI lives in scvi.external):
import numpy as np
scvi.external.ContrastiveVI.setup_anndata(adata, layer="counts")
model = scvi.external.ContrastiveVI(
    adata,
    n_background_latent=10,  # Shared/background variation
    n_salient_latent=10,  # Target-specific (salient) variation
)
# Train with explicit background (control) and target (perturbed) cell indices
background_idx = np.where(adata.obs["condition"] == "control")[0]
target_idx = np.where(adata.obs["condition"] == "treated")[0]
model.train(background_indices=background_idx, target_indices=target_idx)
# Extract representations
background = model.get_latent_representation(representation_kind="background")
salient = model.get_latent_representation(representation_kind="salient")
Purpose: Marker-based cell type annotation using known marker genes.
Key Features:
When to Use:
Basic Usage:
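import numpy as np
import pandas as pd
# Create a binary marker gene matrix (genes x cell types)
marker_gene_mat = pd.DataFrame({
    "CD4 T cells": [1, 1, 0, 0],  # CD3D, CD4, CD8A, CD19
    "CD8 T cells": [1, 0, 1, 0],
    "B cells": [0, 0, 0, 1]
}, index=["CD3D", "CD4", "CD8A", "CD19"])
# CellAssign lives in scvi.external and needs a size factor per cell,
# computed from total counts (adata.X is assumed to hold raw counts)
lib_size = np.asarray(adata.X.sum(axis=1)).ravel()
adata.obs["size_factor"] = lib_size / lib_size.mean()
# The model expects the data restricted to the marker genes
marker_adata = adata[:, marker_gene_mat.index].copy()
scvi.external.CellAssign.setup_anndata(marker_adata, size_factor_key="size_factor")
model = scvi.external.CellAssign(marker_adata, marker_gene_mat)
model.train()
predictions = model.predict()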
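Writing the binary matrix by hand becomes error-prone once there are more than a few markers. The sketch below builds the same genes x cell types layout from a dictionary of marker lists. The helper name `build_marker_matrix` is hypothetical and not part of scvi-tools.

```python
import pandas as pd


def build_marker_matrix(markers: dict) -> pd.DataFrame:
    """Build a binary genes x cell types matrix from {cell_type: [marker genes]}."""
    genes = sorted({gene for gene_list in markers.values() for gene in gene_list})
    mat = pd.DataFrame(0, index=genes, columns=list(markers), dtype=int)
    for cell_type, gene_list in markers.items():
        mat.loc[gene_list, cell_type] = 1  # mark each gene as a marker for this type
    return mat


marker_gene_mat = build_marker_matrix({
    "CD4 T cells": ["CD3D", "CD4"],
    "CD8 T cells": ["CD3D", "CD8A"],
    "B cells": ["CD19"],
})
```

Genes are sorted alphabetically here, so the row order differs from a hand-written matrix; subsetting the AnnData object with `marker_gene_mat.index` keeps the two aligned.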
Purpose: Identifying doublets (droplets that captured two or more cells) in scRNA-seq data.
Key Features:
When to Use:
Basic Usage:
# Train scVI model first
scvi.model.SCVI.setup_anndata(adata, layer="counts")
scvi_model = scvi.model.SCVI(adata)
scvi_model.train()
# Train Solo for doublet detection
solo_model = scvi.external.SOLO.from_scvi_model(scvi_model)
solo_model.train()
# Predict doublets
predictions = solo_model.predict()
doublet_scores = predictions["doublet"]
adata.obs["doublet_score"] = doublet_scores
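Scores usually need to become calls before cells can be filtered. The sketch below assumes `solo_model.predict()` returns a DataFrame with `"doublet"` and `"singlet"` columns, as used above, and labels a cell as a doublet when its doublet score is the larger of the two. Comparing the two columns avoids assuming whether the scores are probabilities or logits. The helper name `label_doublets` is hypothetical.

```python
import pandas as pd


def label_doublets(predictions: pd.DataFrame) -> pd.Series:
    """Label each cell by whichever of the two Solo scores is larger."""
    is_doublet = predictions["doublet"] > predictions["singlet"]
    return is_doublet.map({True: "doublet", False: "singlet"})


# Intended use (requires a trained Solo model):
# labels = label_doublets(solo_model.predict())
# adata = adata[(labels == "singlet").to_numpy()].copy()
```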
Purpose: Topic modeling for gene expression using Latent Dirichlet Allocation.
Key Features:
When to Use:
Basic Usage:
scvi.model.AmortizedLDA.setup_anndata(adata, layer="counts")
model = scvi.model.AmortizedLDA(adata, n_topics=10)
model.train()
# Get topic compositions per cell (Monte Carlo estimate of topic proportions)
topic_proportions = model.get_latent_representation()
# Get gene-by-topic loadings
topic_gene_loadings = model.get_feature_by_topic()
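Topics are easier to interpret through their highest-weighted genes. The sketch below assumes `get_feature_by_topic()` returns a pandas DataFrame with genes as rows and topics as columns. The helper name `top_genes_per_topic` and the default of 10 genes are illustrative choices.

```python
import pandas as pd


def top_genes_per_topic(feature_by_topic: pd.DataFrame, n_top: int = 10) -> dict:
    """Map each topic column to its n_top highest-weighted genes."""
    return {
        topic: feature_by_topic[topic].nlargest(n_top).index.tolist()
        for topic in feature_by_topic.columns
    }


# Intended use (requires a trained AmortizedLDA model):
# top_genes = top_genes_per_topic(model.get_feature_by_topic(), n_top=20)
```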
Choose scVI when:
Choose scANVI when:
Choose AUTOZI when:
Choose VeloVI when:
Choose contrastiveVI when:
Choose CellAssign when:
Choose Solo when: