skills/scvi-tools/references/models-specialized.md
This document covers models for specialized single-cell data modalities in scvi-tools.
Purpose: Analysis of single-cell bisulfite sequencing (scBS-seq) data for DNA methylation.
Key Features:
When to Use:
Data Requirements:
Basic Usage:
import scvi
# Setup methylation data
scvi.external.METHYLVI.setup_anndata(
adata,
layer="methylation_counts", # Methylation data
batch_key="batch"
)
model = scvi.external.METHYLVI(adata)
model.train()
# Get latent representation
latent = model.get_latent_representation()
# Get normalized methylation values
normalized_meth = model.get_normalized_methylation()
Basic Usage:
# Setup with cell type labels
scvi.external.METHYLANVI.setup_anndata(
adata,
layer="methylation_counts",
batch_key="batch",
labels_key="cell_type",
unlabeled_category="Unknown"
)
model = scvi.external.METHYLANVI(adata)
model.train()
# Predict cell types
predictions = model.predict()
Key Parameters:
n_latent: Latent dimensionality
region_factors: Model region-specific effects
Use Cases:
Purpose: Batch correction and integration of flow cytometry and mass cytometry (CyTOF) data.
Key Features:
When to Use:
Data Requirements:
Basic Usage:
scvi.external.CYTOVI.setup_anndata(
adata,
protein_expression_obsm_key="protein_expression",
batch_key="batch"
)
model = scvi.external.CYTOVI(adata)
model.train()
# Get batch-corrected representation
latent = model.get_latent_representation()
# Get normalized protein values
normalized = model.get_normalized_expression()
Key Parameters:
n_latent: Latent space dimensionality
n_layers: Network depth
Typical Workflow:
import scanpy as sc
# 1. Load cytometry data
adata = sc.read_h5ad("cytof_data.h5ad")
# 2. Train CytoVI
scvi.external.CYTOVI.setup_anndata(
adata,
protein_expression_obsm_key="protein",
batch_key="experiment"
)
model = scvi.external.CYTOVI(adata)
model.train()
# 3. Get batch-corrected values
latent = model.get_latent_representation()
adata.obsm["X_CytoVI"] = latent
# 4. Downstream analysis
sc.pp.neighbors(adata, use_rep="X_CytoVI")
sc.tl.umap(adata)
sc.tl.leiden(adata)
# 5. Visualize batch correction
sc.pl.umap(adata, color=["experiment", "leiden"])  # "experiment" is the batch_key used in setup
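A UMAP plot gives only a visual impression of batch mixing. As a rough numeric complement, the sketch below tabulates the fraction of each batch within each cluster. The helper name `batch_fractions_per_cluster` and the toy labels are assumptions made for illustration; they are not part of scvi-tools or scanpy.

```python
from collections import Counter, defaultdict


def batch_fractions_per_cluster(clusters, batches):
    """Return {cluster: {batch: fraction of the cluster's cells from that batch}}."""
    counts = defaultdict(Counter)
    for cluster, batch in zip(clusters, batches):
        counts[cluster][batch] += 1
    return {
        cluster: {batch: n / sum(counter.values()) for batch, n in counter.items()}
        for cluster, counter in counts.items()
    }


# Toy labels for six cells (illustrative only, not real data)
clusters = ["0", "0", "0", "0", "1", "1"]
batches = ["A", "B", "A", "B", "A", "A"]

fractions = batch_fractions_per_cluster(clusters, batches)
for cluster, mix in sorted(fractions.items()):
    print(cluster, mix)
```

With a real AnnData object, the inputs would be the cluster and batch columns of `adata.obs`, for example `adata.obs["leiden"]` and `adata.obs["experiment"]` in the workflow above. A cluster made up almost entirely of one batch may reflect residual batch effects or a genuinely batch-specific population, so the table is a prompt for inspection rather than a verdict.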
Purpose: Batch effect correction with emphasis on preserving biological variation.
Key Features:
When to Use:
Basic Usage:
scvi.external.SysVI.setup_anndata(
adata,
layer="counts",
batch_key="batch"
)
model = scvi.external.SysVI(adata)
model.train()
latent = model.get_latent_representation()
Purpose: Trajectory inference and pseudotime analysis for single-cell data.
Key Features:
When to Use:
Basic Usage (Decipher lives in scvi.external):
# Decipher learns its own interpretable low-dimensional representation
scvi.external.Decipher.setup_anndata(adata, layer="counts")
decipher_model = scvi.external.Decipher(adata)
decipher_model.train()
# Interpretable Decipher representation for trajectory/structure analysis
adata.obsm["X_decipher"] = decipher_model.get_latent_representation()
Visualization:
import scanpy as sc
# Build a neighborhood graph on the Decipher representation, then embed
sc.pp.neighbors(adata, use_rep="X_decipher")
sc.tl.umap(adata)
sc.pl.umap(adata, color="cell_type")
Many specialized models work well in combination:
Methylation + Expression:
# Analyze separately, then integrate
methylvi_model = scvi.external.METHYLVI(meth_adata)
scvi_model = scvi.model.SCVI(rna_adata)
# Integrate results at analysis level
# E.g., correlate methylation and expression patterns
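One simple way to integrate at the analysis level is to summarize both modalities per cluster and correlate the summaries. The sketch below computes a Pearson correlation between per-cluster mean promoter methylation and per-cluster mean expression of one gene. The `pearson` helper and all numbers are hypothetical, chosen only to show the mechanics; in practice the inputs would come from the two trained models' normalized outputs, averaged over matched clusters.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Toy per-cluster summaries for four matched clusters (made-up values)
promoter_methylation = [0.9, 0.7, 0.4, 0.2]  # mean methylation level
gene_expression = [1.0, 2.0, 3.5, 5.0]       # mean normalized expression

r = pearson(promoter_methylation, gene_expression)
print(f"Pearson r = {r:.3f}")
```

A strongly negative `r` in this toy case matches the common expectation that promoter methylation is associated with lower expression. With only a handful of clusters, such a correlation is descriptive at best.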
Cytometry + CITE-seq:
# CytoVI for flow/CyTOF
cyto_model = scvi.external.CYTOVI(cyto_adata)
# totalVI for CITE-seq
cite_model = scvi.model.TOTALVI(cite_adata)
# Compare protein measurements across platforms
ATAC + RNA (Multiome):
from mudata import MuData
# MultiVI for joint analysis (configured from a MuData object; see models-multimodal.md)
mdata = MuData({"rna": rna_adata, "atac": atac_adata})
scvi.model.MULTIVI.setup_mudata(
mdata, modalities={"rna_layer": "rna", "atac_layer": "atac"}
)
multivi_model = scvi.model.MULTIVI(
mdata, n_genes=rna_adata.n_vars, n_regions=atac_adata.n_vars
)
What data modality?
Do you have labels?
What's your main goal?
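The three questions above can be sketched as a small lookup. The function `suggest_model` is a hypothetical helper, not part of scvi-tools; its mapping only restates the models covered in this document, and the modality and goal strings are assumptions chosen for the example.

```python
from typing import Optional


def suggest_model(modality: str, has_labels: bool = False,
                  goal: Optional[str] = None) -> str:
    """Map data modality, label availability, and goal to a model name."""
    modality = modality.lower()
    if modality == "methylation":
        # Labels enable semi-supervised annotation with MethylANVI
        return "MethylANVI" if has_labels else "MethylVI"
    if modality == "cytometry":
        return "CytoVI"
    if modality == "scrna-seq":
        by_goal = {
            "integration": "SysVI",
            "trajectory": "Decipher",
            "doublet detection": "SOLO",
        }
        if goal in by_goal:
            return by_goal[goal]
        raise ValueError(f"unknown goal for scRNA-seq: {goal!r}")
    raise ValueError(f"unknown modality: {modality!r}")


print(suggest_model("methylation", has_labels=True))   # MethylANVI
print(suggest_model("scRNA-seq", goal="trajectory"))   # Decipher
```

A lookup like this is only a starting point; the per-model "When to Use" notes should drive the final choice.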
import scvi
import scanpy as sc
# 1. Load methylation data
meth_adata = sc.read_h5ad("methylation_data.h5ad")
# 2. QC: filter low-coverage CpG sites
sc.pp.filter_genes(meth_adata, min_cells=10)
# 3. Setup MethylVI
scvi.external.METHYLVI.setup_anndata(
meth_adata,
layer="methylation",
batch_key="batch"
)
# 4. Train model
model = scvi.external.METHYLVI(meth_adata, n_latent=15)
model.train(max_epochs=400)
# 5. Get latent representation
latent = model.get_latent_representation()
meth_adata.obsm["X_MethylVI"] = latent
# 6. Clustering
sc.pp.neighbors(meth_adata, use_rep="X_MethylVI")
sc.tl.umap(meth_adata)
sc.tl.leiden(meth_adata)
# 7. Differential methylation
dm_results = model.differential_methylation(
groupby="leiden",
group1="0",
group2="1"
)
# 8. Save
model.save("methylvi_model")
meth_adata.write("methylation_analyzed.h5ad")
Some specialized functionality is provided through scvi.external or built into the core models:
SOLO (doublet detection):
from scvi.external import SOLO
solo = SOLO.from_scvi_model(scvi_model)
solo.train()
# Per-cell doublet/singlet scores (a pandas DataFrame by default)
doublets = solo.predict()
scArches (reference mapping): scArches-style surgery is built into the
core models via ArchesMixin, not a separate SCARCHES class. Train a
reference model, then map a query with load_query_data:
# Reference model already trained and saved to "./reference_model"
query_model = scvi.model.SCVI.load_query_data(query_adata, "./reference_model")
query_model.train(max_epochs=200, plan_kwargs={"weight_decay": 0.0})
query_latent = query_model.get_latent_representation()
These tools extend scvi-tools functionality for specific use cases.
| Model | Data Type | Primary Use | Supervised? |
|---|---|---|---|
| MethylVI | Methylation | Unsupervised analysis | No |
| MethylANVI | Methylation | Cell type annotation | Semi |
| CytoVI | Cytometry | Batch correction | No |
| SysVI | scRNA-seq | Large-scale integration | No |
| Decipher | scRNA-seq | Trajectory inference | No |
| SOLO | scRNA-seq | Doublet detection | Semi |