Back to Claude Scientific Skills

Spatial Transcriptomics Models

skills/scvi-tools/references/models-spatial.md

2.46.012.2 KB
Original Source

Spatial Transcriptomics Models

This document covers models for analyzing spatially-resolved transcriptomics data in scvi-tools.

DestVI (Deconvolution of Spatial Transcriptomics using Variational Inference)

Purpose: Multi-resolution deconvolution of spatial transcriptomics using single-cell reference data.

Key Features:

  • Estimates cell type proportions at each spatial location
  • Uses single-cell RNA-seq reference for deconvolution
  • Multi-resolution approach (global and local patterns)
  • Accounts for spatial correlation
  • Provides uncertainty quantification

When to Use:

  • Deconvolving Visium or similar spatial transcriptomics
  • Have scRNA-seq reference data with cell type labels
  • Want to map cell types to spatial locations
  • Interested in spatial organization of cell types
  • Need probabilistic estimates of cell type abundance

Data Requirements:

  • Spatial data: Visium or similar spot-based measurements (target data)
  • Single-cell reference: scRNA-seq with cell type annotations
  • Both datasets should share genes

Basic Usage (DestVI requires a CondSCVI reference model, not plain SCVI):

python
from scvi.model import CondSCVI, DestVI

# Step 1: Train the single-cell LVM (CondSCVI) on the labeled reference
CondSCVI.setup_anndata(sc_adata, layer="counts", labels_key="cell_type")
sc_model = CondSCVI(sc_adata, weight_obs=False)
sc_model.train(max_epochs=300)

# Step 2: Set up the spatial data and build DestVI from the trained reference
DestVI.setup_anndata(spatial_adata, layer="counts")
st_model = DestVI.from_rna_model(spatial_adata, sc_model)
st_model.train(max_epochs=2500)

# Step 3: Get cell type proportions per spot
proportions = st_model.get_proportions()
spatial_adata.obsm["proportions"] = proportions

# Step 4: Get cell type-specific expression
# Expression of genes specific to each cell type at each spot
ct_expression = st_model.get_scale_for_ct("T cells")

Key Parameters:

  • weight_obs (CondSCVI): downweight common cell types when training the reference
  • amortization: Amortization strategy ("both", "latent", "proportion")
  • n_latent: Latent dimensionality (inherited from the CondSCVI reference)

Outputs:

  • get_proportions(): Cell type proportions at each spot
  • get_scale_for_ct(cell_type): Cell type-specific expression patterns
  • get_gamma(): Proportion-specific gene expression scaling

Visualization:

python
import scanpy as sc
import matplotlib.pyplot as plt

# Visualize specific cell type proportions spatially
sc.pl.spatial(
    spatial_adata,
    color="T cells",  # If proportions added to .obs
    spot_size=150
)

# Or use obsm directly
for ct in cell_types:
    plt.figure()
    sc.pl.spatial(
        spatial_adata,
        color=spatial_adata.obsm["proportions"][ct],
        title=f"{ct} proportions"
    )

Stereoscope

Purpose: Cell type deconvolution for spatial transcriptomics using probabilistic modeling.

Key Features:

  • Reference-based deconvolution
  • Probabilistic framework for cell type proportions
  • Works with various spatial technologies
  • Handles gene selection and normalization

When to Use:

  • Similar to DestVI but simpler approach
  • Deconvolving spatial data with reference
  • Faster alternative for basic deconvolution

Basic Usage (Stereoscope is split into two scvi.external models):

python
from scvi.external import RNAStereoscope, SpatialStereoscope

# Train the single-cell reference model
RNAStereoscope.setup_anndata(sc_adata, labels_key="cell_type", layer="counts")
sc_model = RNAStereoscope(sc_adata)
sc_model.train(max_epochs=100)

# Build the spatial model from the trained reference
SpatialStereoscope.setup_anndata(spatial_adata, layer="counts")
spatial_model = SpatialStereoscope.from_rna_model(spatial_adata, sc_model)
spatial_model.train(max_epochs=2000)

# Get cell-type proportions per spot
proportions = spatial_model.get_proportions()

Tangram

Purpose: Spatial mapping and integration of single-cell data to spatial locations.

Key Features:

  • Maps single cells to spatial coordinates
  • Learns optimal transport between single-cell and spatial data
  • Gene imputation at spatial locations
  • Cell type mapping

When to Use:

  • Mapping cells from scRNA-seq to spatial locations
  • Imputing unmeasured genes in spatial data
  • Understanding spatial organization at single-cell resolution
  • Integrating scRNA-seq and spatial transcriptomics

Data Requirements:

  • Single-cell RNA-seq data with annotations
  • Spatial transcriptomics data
  • Shared genes between modalities

scvi-tools wraps the Tangram method as scvi.external.Tangram. The example below uses the standalone tangram-sc package, whose helper functions (annotation/gene projection) are convenient for downstream analysis.

Basic Usage:

python
import tangram as tg

# Map cells to spatial locations
ad_map = tg.map_cells_to_space(
    adata_sc=sc_adata,
    adata_sp=spatial_adata,
    mode="cells",  # or "clusters" for cell type mapping
    density_prior="rna_count_based"
)

# Get mapping matrix (cells × spots)
mapping = ad_map.X

# Project cell annotations to space
tg.project_cell_annotations(
    ad_map,
    spatial_adata,
    annotation="cell_type"
)

# Impute genes in spatial data
genes_to_impute = ["CD3D", "CD8A", "CD4"]
tg.project_genes(ad_map, spatial_adata, genes=genes_to_impute)

Visualization:

python
# Visualize cell type mapping
sc.pl.spatial(
    spatial_adata,
    color="cell_type_projected",
    spot_size=100
)

gimVI (Gaussian Identity Multivi for Imputation)

Purpose: Cross-modality imputation between spatial and single-cell data.

Key Features:

  • Joint model of spatial and single-cell data
  • Imputes missing genes in spatial data
  • Enables cross-dataset queries
  • Learns shared representations

When to Use:

  • Imputing genes not measured in spatial data
  • Joint analysis of spatial and single-cell datasets
  • Mapping between modalities

Basic Usage:

python
# gimVI (scvi.external) jointly models a single-cell and a spatial dataset
scvi.external.GIMVI.setup_anndata(sc_adata, layer="counts")
scvi.external.GIMVI.setup_anndata(spatial_adata, layer="counts")

model = scvi.external.GIMVI(sc_adata, spatial_adata)
model.train()

# Per-dataset latent representations and imputed expression
sc_latent, spatial_latent = model.get_latent_representation()
_, imputed_spatial = model.get_imputed_values(normalized=True)

scVIVA (Variation in Variational Autoencoders for Spatial)

Purpose: Analyzing cell-environment relationships in spatial data.

Key Features:

  • Models cellular neighborhoods and environments
  • Identifies environment-associated gene expression
  • Accounts for spatial correlation structure
  • Cell-cell interaction analysis

When to Use:

  • Understanding how spatial context affects cells
  • Identifying niche-specific gene programs
  • Cell-cell interaction studies
  • Microenvironment analysis

Data Requirements:

  • Spatial transcriptomics with coordinates
  • Cell type annotations (optional)

Basic Usage (scVIVA lives in scvi.external):

python
# scVIVA jointly models each cell and its niche. Precompute neighborhood
# composition / embeddings first, then register the resulting keys.
scvi.external.SCVIVA.preprocess_anndata(
    spatial_adata,
    sample_key="sample",
    labels_key="cell_type",
    cell_coordinates_key="spatial",  # coordinates in .obsm
)

scvi.external.SCVIVA.setup_anndata(
    spatial_adata,
    labels_key="cell_type",
    sample_key="sample",
)

model = scvi.external.SCVIVA(spatial_adata)
model.train()

# Latent representation capturing cell state in its spatial context
spatial_adata.obsm["X_scVIVA"] = model.get_latent_representation()

See the scVIVA user guide for the exact preprocessing keys and niche/environment outputs.

ResolVI

Purpose: Addressing spatial transcriptomics noise through resolution-aware modeling.

Key Features:

  • Accounts for spatial resolution effects
  • Denoises spatial data
  • Multi-scale analysis
  • Improves downstream analysis quality

When to Use:

  • Noisy spatial data
  • Multiple spatial resolutions
  • Need denoising before analysis
  • Improving data quality

Basic Usage (ResolVI lives in scvi.external):

python
scvi.external.RESOLVI.setup_anndata(
    spatial_adata,
    layer="counts",
)

model = scvi.external.RESOLVI(spatial_adata)
model.train()

# Denoised / corrected expression
denoised = model.get_normalized_expression()

Model Selection for Spatial Transcriptomics

DestVI

Choose when:

  • Need detailed deconvolution with reference
  • Have high-quality scRNA-seq reference
  • Want multi-resolution analysis
  • Need uncertainty quantification

Best for: Visium, spot-based technologies

Stereoscope

Choose when:

  • Need simpler, faster deconvolution
  • Basic cell type proportion estimates
  • Limited computational resources

Best for: Quick deconvolution tasks

Tangram

Choose when:

  • Want single-cell resolution mapping
  • Need to impute many genes
  • Interested in cell positioning
  • Optimal transport approach preferred

Best for: Detailed spatial mapping

gimVI

Choose when:

  • Need bidirectional imputation
  • Joint modeling of spatial and single-cell
  • Cross-dataset queries

Best for: Integration and imputation

scVIVA

Choose when:

  • Interested in cellular environments
  • Cell-cell interaction analysis
  • Neighborhood effects

Best for: Microenvironment studies

ResolVI

Choose when:

  • Data quality is a concern
  • Need denoising
  • Multi-scale analysis

Best for: Noisy data preprocessing

Complete Workflow: Spatial Deconvolution with DestVI

python
import scvi
import scanpy as sc
import squidpy as sq

# ===== Part 1: Prepare single-cell reference =====
# Load and process scRNA-seq reference
sc_adata = sc.read_h5ad("reference_scrna.h5ad")

# QC and filtering
sc.pp.filter_genes(sc_adata, min_cells=10)
sc.pp.highly_variable_genes(sc_adata, n_top_genes=4000)

# Train the CondSCVI reference model (labels_key carries cell types)
from scvi.model import CondSCVI, DestVI

CondSCVI.setup_anndata(
    sc_adata,
    layer="counts",
    labels_key="cell_type",
)

sc_model = CondSCVI(sc_adata, weight_obs=False)
sc_model.train(max_epochs=300)

# ===== Part 2: Load spatial data =====
spatial_adata = sc.read_visium("path/to/visium")
spatial_adata.var_names_make_unique()

# QC spatial data
sc.pp.filter_genes(spatial_adata, min_cells=10)

# ===== Part 3: Run DestVI =====
DestVI.setup_anndata(
    spatial_adata,
    layer="counts"
)

destvi_model = DestVI.from_rna_model(
    spatial_adata,
    sc_model,
)

destvi_model.train(max_epochs=2500)

# ===== Part 4: Extract results =====
# Get proportions
proportions = destvi_model.get_proportions()
spatial_adata.obsm["proportions"] = proportions

# Add proportions to .obs for easy plotting
for i, ct in enumerate(sc_model.adata.obs["cell_type"].cat.categories):
    spatial_adata.obs[f"prop_{ct}"] = proportions[:, i]

# ===== Part 5: Visualization =====
# Plot specific cell types
cell_types = ["T cells", "B cells", "Macrophages"]

for ct in cell_types:
    sc.pl.spatial(
        spatial_adata,
        color=f"prop_{ct}",
        title=f"{ct} proportions",
        spot_size=150,
        cmap="viridis"
    )

# ===== Part 6: Spatial analysis =====
# Compute spatial neighbors
sq.gr.spatial_neighbors(spatial_adata)

# Spatial autocorrelation of cell types
for ct in cell_types:
    sq.gr.spatial_autocorr(
        spatial_adata,
        attr="obs",
        mode="moran",
        genes=[f"prop_{ct}"]
    )

# ===== Part 7: Save results =====
destvi_model.save("destvi_model")
spatial_adata.write("spatial_deconvolved.h5ad")

Best Practices for Spatial Analysis

  1. Reference quality: Use high-quality, well-annotated scRNA-seq reference
  2. Gene overlap: Ensure sufficient shared genes between reference and spatial
  3. Spatial coordinates: Properly register spatial coordinates in .obsm["spatial"]
  4. Validation: Use known marker genes to validate deconvolution
  5. Visualization: Always visualize results spatially to check biological plausibility
  6. Cell type granularity: Consider appropriate cell type resolution
  7. Computational resources: Spatial models can be memory-intensive
  8. Quality control: Filter low-quality spots before analysis