# Theoretical Foundations of scvi-tools
This document explains the mathematical and statistical principles underlying scvi-tools.
What is it? Variational inference is a technique for approximating complex probability distributions. In single-cell analysis, we want to understand the posterior distribution p(z|x): the distribution over the latent variables z given the observed data x.
Why use it? The exact posterior p(z|x) is intractable for nonlinear models such as deep generative models, so variational inference replaces it with a tractable approximation q(z|x) and turns inference into an optimization problem that scales to large datasets.
How does it work? The parameters of q(z|x) are adjusted to maximize the evidence lower bound (ELBO).
ELBO Objective:
ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z))

The first term is the expected reconstruction log-likelihood; the second is a regularization term that keeps q(z|x) close to the prior p(z).
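For context, the standard identity behind this objective (not specific to scvi-tools) shows why maximizing the ELBO both fits the data and approximates the true posterior:

$$
\log p(x) \;=\; \underbrace{\mathbb{E}_{q(z \mid x)}\!\left[\log p(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q(z \mid x)\,\|\,p(z)\right)}_{\text{ELBO}} \;+\; \mathrm{KL}\!\left(q(z \mid x)\,\|\,p(z \mid x)\right)
$$

Because the last KL term is non-negative, the ELBO is a lower bound on log p(x), and raising it shrinks the gap between q(z|x) and the intractable posterior p(z|x).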
Architecture:
x (observed data)
↓
[Encoder Neural Network]
↓
z (latent representation)
↓
[Decoder Neural Network]
↓
x̂ (reconstructed data)
Encoder: Maps cells (x) to latent space (z)
Decoder: Maps latent space (z) back to gene space
Reparameterization Trick: To backpropagate through the sampling step z ~ N(μ_z, σ²_z), the sample is rewritten as z = μ_z + σ_z·ε with ε ~ N(0, I), so gradients flow through μ_z and σ_z while the randomness stays in ε.
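A minimal PyTorch sketch of this step (an illustrative helper, not the scvi-tools implementation):

```python
import torch

def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    The randomness is isolated in eps, so gradients flow through mu and log_var.
    """
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * eps
```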
Amortized Inference:
Concept: Share encoder parameters across all cells.
Traditional inference: Learn separate variational parameters for every cell.
Amortized inference: Learn a single encoder network that maps any cell to its approximate posterior q(z|x).
Benefits: The number of inference parameters does not grow with the number of cells, and new cells can be embedded with a single forward pass instead of per-cell optimization.
Single-cell data are counts (integer-valued), requiring appropriate distributions.
Negative Binomial (NB):
x ~ NB(μ, θ)
When to use: Gene expression counts without excess zero-inflation
Zero-Inflated Negative Binomial (ZINB):
x ~ π·δ₀ + (1-π)·NB(μ, θ)
When to use: Sparse scRNA-seq data with more zeros than the NB alone can explain
Poisson:
x ~ Poisson(μ)
When to use: Less common; e.g., ATAC-seq fragment counts
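In scvi-tools the count likelihood is chosen with the `gene_likelihood` argument; a short sketch, assuming `adata` is an AnnData object holding raw counts:

```python
import scvi

# Register the data once (raw counts expected in adata.X).
scvi.model.SCVI.setup_anndata(adata)

model_nb = scvi.model.SCVI(adata, gene_likelihood="nb")        # negative binomial
model_zinb = scvi.model.SCVI(adata, gene_likelihood="zinb")    # zero-inflated NB
model_poisson = scvi.model.SCVI(adata, gene_likelihood="poisson")
```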
Problem: Technical variation between batches confounds the biological signal.
scvi-tools approach: Condition the model on a batch covariate s, so the latent space captures biology while batch-specific effects are absorbed by the decoder.
Mathematical formulation:
Encoder: q(z|x, s) - batch-aware encoding
Latent: z - batch-corrected representation
Decoder: p(x|z, s) - batch-specific decoding
Key insight: Batch info flows through decoder, not latent space
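A typical usage sketch, assuming `adata` holds raw counts and a batch annotation in `adata.obs["batch"]` (the column name is an assumption):

```python
import scvi

scvi.model.SCVI.setup_anndata(adata, batch_key="batch")  # the covariate s enters here
model = scvi.model.SCVI(adata, n_latent=10)
model.train()

# Batch-corrected latent representation: the z described above.
adata.obsm["X_scVI"] = model.get_latent_representation()
```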
Generative model: Learns p(x|z) and, through it, the data distribution p(x).
Process: Sample z from the prior, decode it into distribution parameters (μ, θ, π), and sample counts x from the chosen count likelihood.
Benefits: The fitted model can simulate realistic counts, denoise and impute expression, and attach uncertainty to downstream estimates.
Inference network: Approximately inverts the generative process, mapping observed counts x to the parameters of q(z|x).
Input: Gene expression counts x ∈ ℕ^G (G genes)
Encoder:
h = ReLU(W₁·x + b₁)
μ_z = W₂·h + b₂
log σ²_z = W₃·h + b₃
z ~ N(μ_z, σ²_z)
Latent space: z ∈ ℝ^d (typically d=10-30)
Decoder:
h = ReLU(W₄·z + b₄)
μ = softmax(W₅·h + b₅) · library_size
θ = exp(W₆·h + b₆)
π = sigmoid(W₇·h + b₇) # for ZINB
x ~ ZINB(μ, θ, π)
Loss function (ELBO):
L = E_q[log p(x|z)] - KL(q(z|x) || N(0,I))
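A compact PyTorch sketch of this architecture. It is a simplification of the real scvi-tools modules: a single hidden layer, raw counts fed straight into the encoder, and illustrative layer names throughout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySCVI(nn.Module):
    """Minimal VAE mirroring the encoder/decoder equations above."""

    def __init__(self, n_genes: int, n_latent: int = 10, n_hidden: int = 128):
        super().__init__()
        # Encoder: x -> (mu_z, log sigma^2_z)
        self.enc = nn.Linear(n_genes, n_hidden)
        self.mu_z = nn.Linear(n_hidden, n_latent)
        self.logvar_z = nn.Linear(n_hidden, n_latent)
        # Decoder: z -> (mu, theta, pi)
        self.dec = nn.Linear(n_latent, n_hidden)
        self.scale = nn.Linear(n_hidden, n_genes)          # softmax head for mu
        self.log_theta = nn.Linear(n_hidden, n_genes)      # inverse dispersion head
        self.dropout_logit = nn.Linear(n_hidden, n_genes)  # pi head (ZINB)

    def forward(self, x: torch.Tensor):
        library = x.sum(dim=1, keepdim=True)               # per-cell library size
        h = F.relu(self.enc(x))
        mu_z, logvar_z = self.mu_z(h), self.logvar_z(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu_z + torch.exp(0.5 * logvar_z) * torch.randn_like(mu_z)
        h_dec = F.relu(self.dec(z))
        mu = F.softmax(self.scale(h_dec), dim=-1) * library
        theta = torch.exp(self.log_theta(h_dec))
        pi = torch.sigmoid(self.dropout_logit(h_dec))
        # KL(q(z|x) || N(0, I)) for a diagonal Gaussian, per cell
        kl = -0.5 * torch.sum(1 + logvar_z - mu_z.pow(2) - logvar_z.exp(), dim=1)
        return mu, theta, pi, kl
```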
Categorical covariates (batch, donor, etc.): Encoded as one-hot vectors and provided to the networks alongside x or z.
Continuous covariates (library size, percent_mito): Provided as numeric inputs, typically standardized.
Covariate injection strategies: Covariates can be concatenated to the encoder input, the decoder input, or both, as shown in the sketch below.
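In scvi-tools, covariates are registered when the data are set up; a sketch, with the column names ("batch", "donor", "percent_mito") as illustrative assumptions:

```python
import scvi

scvi.model.SCVI.setup_anndata(
    adata,
    batch_key="batch",
    categorical_covariate_keys=["donor"],
    continuous_covariate_keys=["percent_mito"],
)
model = scvi.model.SCVI(adata)
model.train()
```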
Concept: Use a pretrained reference model as the initialization for a model of new (query) data.
Process: Freeze the reference weights, add parameters for the batches present in the query data, and fine-tune on the query cells (see the sketch after this list).
Why it works: The biological structure learned from the reference transfers directly, so only batch-specific parameters need to be learned from the query data.
Applications: Mapping new datasets onto a reference atlas, transferring cell-type labels, and incremental atlas building.
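A reference-mapping sketch using the scvi-tools query API, assuming `ref_model` is a trained `scvi.model.SCVI` and `adata_query` is the new dataset:

```python
import scvi

# Align the query genes and fields with the reference model.
scvi.model.SCVI.prepare_query_anndata(adata_query, ref_model)

# Initialize from the reference weights and fine-tune on the query cells only.
query_model = scvi.model.SCVI.load_query_data(adata_query, ref_model)
query_model.train(max_epochs=100)

adata_query.obsm["X_scVI"] = query_model.get_latent_representation()
```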
Idea: Separate shared and sample-specific variation
Latent space decomposition:
z = z_shared + z_sample
Hierarchical structure:
Sample level: ρ_s ~ N(0, I)
Cell level: z_i ~ N(ρ_{s(i)}, σ²)
Benefits: Sample-level and cell-level variation can be analyzed separately, enabling comparisons across donors or conditions without conflating them with cell-to-cell heterogeneity.
Goal: Predict outcome under different conditions
Example: "What would this cell look like if from different batch?"
Method: Encode the cell with its observed covariates to obtain z, then decode z with a different batch or condition code (see the sketch below).
Applications: Computing batch-corrected normalized expression, harmonizing datasets for visualization, and simple what-if analyses across conditions.
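In scvi-tools this is exposed through `transform_batch`; a sketch, assuming a trained model and that "batch_B" is one of the registered batch categories:

```python
# Decode every cell as if it had been observed in batch_B.
counterfactual = model.get_normalized_expression(
    adata,
    transform_batch="batch_B",
    library_size=1e4,
)
```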
Definition: Distribution of new data given observed data
p(x_new | x_observed) = ∫ p(x_new|z) q(z|x_observed) dz
Estimation: Sample z from q(z|x), generate x_new from p(x_new|z)
Uses: Posterior predictive checks (model criticism), denoising and imputation, and simulating datasets that match the fitted model.
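A sketch of drawing posterior predictive samples from a trained model (method availability and return format may vary across scvi-tools versions):

```python
# Each sample is a synthetic count matrix x_new drawn via z ~ q(z|x), x_new ~ p(x_new|z).
samples = model.posterior_predictive_sample(adata, n_samples=10)
```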
Traditional methods: Compare point estimates
scvi-tools approach: Compare distributions
Definition: Ratio of posterior odds to prior odds
BF = [P(H₁|data) / P(H₀|data)] / [P(H₁) / P(H₀)]
Interpretation: BF > 1 favors H₁ (the gene is differentially expressed); in practice the log Bayes factor is reported, with larger absolute values indicating stronger evidence.
In scvi-tools: Used to rank genes by evidence for DE
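A usage sketch of the DE test in scvi-tools, assuming a trained model and a cell-type column in `adata.obs` (the column and group names are illustrative, and output column names may differ slightly across versions):

```python
de = model.differential_expression(
    groupby="cell_type",
    group1="B cells",
    group2="T cells",
)
# Rank genes by the evidence for differential expression.
de = de.sort_values("bayes_factor", ascending=False)
```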
Goal: Control expected false discovery rate
Procedure: Compute each gene's posterior probability of being differentially expressed, rank genes by it, and grow the reported gene set as long as its expected FDR (the average of one minus those probabilities) stays below the target (see the sketch below).
Advantage over p-values: The expected FDR of the reported set is controlled directly from the posterior, without a separate multiple-testing correction step.
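A minimal NumPy sketch of the selection rule described above (an illustrative helper, not the scvi-tools implementation):

```python
import numpy as np

def select_genes(p_de: np.ndarray, target_fdr: float = 0.05) -> np.ndarray:
    """Largest gene set whose posterior expected FDR stays below target_fdr.

    p_de[i] is the posterior probability that gene i is differentially expressed.
    """
    order = np.argsort(-p_de)                                   # most confident genes first
    expected_fdr = np.cumsum(1.0 - p_de[order]) / np.arange(1, p_de.size + 1)
    k = int(np.sum(expected_fdr <= target_fdr))                 # largest admissible prefix
    selected = np.zeros(p_de.size, dtype=bool)
    selected[order[:k]] = True
    return selected
```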
Optimizer: Adam (adaptive learning rates)
Training loop: For each mini-batch, run the encoder and decoder, compute the negative ELBO, backpropagate, and take an Adam step (see the sketch below).
Convergence criteria: Stop when the validation ELBO plateaus (early stopping) or after a fixed number of epochs.
KL annealing: Gradually increase the weight on the KL term from 0 to 1 early in training, which helps avoid posterior collapse.
Dropout: Random neuron dropping during training
Weight decay: L2 regularization on weights
Mini-batch training: Gradients are computed on random subsets of cells, so memory use does not grow with dataset size.
Stochastic optimization: The ELBO is estimated from Monte Carlo samples of z and mini-batches of cells, giving unbiased but noisy gradients.
GPU acceleration: The dense matrix operations in the encoder and decoder map directly onto GPU hardware.
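A minimal training-loop sketch tying these pieces together. It reuses the illustrative `TinySCVI` module sketched earlier, assumes a DataLoader `loader` yielding float count matrices of shape (batch, n_genes), and drops the zero-inflation term from the likelihood for brevity:

```python
import torch
from torch.distributions import NegativeBinomial

model = TinySCVI(n_genes=2000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
n_epochs, anneal_epochs = 100, 20

for epoch in range(n_epochs):
    kl_weight = min(1.0, epoch / anneal_epochs)        # KL annealing: ramp 0 -> 1
    for x in loader:                                   # mini-batches of cells
        mu, theta, _pi, kl = model(x)
        # NB reconstruction term, parameterized by mean mu and inverse dispersion theta.
        nb = NegativeBinomial(
            total_count=theta,
            logits=(mu + 1e-8).log() - (theta + 1e-8).log(),
        )
        recon = nb.log_prob(x).sum(dim=1)
        loss = (-recon + kl_weight * kl).mean()        # negative (annealed) ELBO
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```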
Common symbols:
x: observed gene expression counts (x ∈ ℕ^G, G genes)
z: latent representation (z ∈ ℝ^d)
s: batch / covariate label
μ, θ, π: mean, inverse dispersion, and zero-inflation probability of the count likelihood
q(z|x): approximate (variational) posterior; p(x|z): decoder likelihood; p(z): prior
Key Papers:
Concepts to explore:
Mathematical background: