scientific-skills/scvelo/references/velocity_models.md
RNA velocity is based on the kinetic model of transcription:
dx_s/dt = β·x_u - γ·x_s (spliced dynamics)
dx_u/dt = α(t) - β·x_u (unspliced dynamics)
Where:
x_s: spliced mRNA abundancex_u: unspliced (pre-mRNA) abundanceα(t): transcription rate (varies over time)β: splicing rateγ: degradation rateVelocity is defined as: v = dx_s/dt = β·x_u - γ·x_s
# Use with scVelo for backward compatibility
scv.tl.velocity(adata, mode='steady_state')
scv.tl.velocity(adata, mode='stochastic')
scv.tl.recover_dynamics(adata, n_jobs=4)
scv.tl.velocity(adata, mode='dynamical')
Kinetic states identified by dynamical model:
| State | Description |
|---|---|
| Induction | α > 0, x_u increasing |
| Steady-state on | α > 0, constant high expression |
| Repression | α = 0, x_u decreasing |
| Steady-state off | α = 0, constant low expression |
The velocity graph connects cells based on their velocity similarity to neighboring cells' states:
scv.tl.velocity_graph(adata)
# Stored in adata.uns['velocity_graph']
# Entry [i,j] = probability that cell i transitions to cell j
Parameters:
n_neighbors: Number of neighbors consideredsqrt_transform: Apply sqrt transform to data (default: False for spliced)approx: Use approximate nearest neighbor search (faster for large datasets)Latent time τ ∈ [0, 1] for each gene represents:
Shared latent time is computed by taking the average over all velocity genes, weighted by fit_likelihood.
fit_likelihood: Goodness-of-fit of dynamical model (0-1; higher = better)
adata.var[adata.var['fit_likelihood'] > 0.1]fit_alpha: Transcription rate during inductionfit_gamma: mRNA degradation ratefit_r2: R² of kinetic fitvelocity_length: Magnitude of velocity vector (cell speed)velocity_confidence: Coherence of velocity with neighboring cells (0-1)# Check overall velocity quality
scv.pl.proportions(adata) # Ratio of spliced/unspliced per cell
scv.pl.velocity_confidence(adata, groupby='leiden')
| Parameter | Function | Default | When to Change |
|---|---|---|---|
min_shared_counts | Filter genes | 20 | Increase for deep sequencing; decrease for shallow |
n_top_genes | HVG selection | 2000 | Increase for complex datasets |
n_neighbors | kNN graph | 30 | Decrease for small datasets; increase for noisy |
n_pcs | PCA dimensions | 30 | Match to elbow in scree plot |
t_max_rank | Latent time constraint | None | Set if known developmental direction |
import cellrank as cr
from cellrank.kernels import VelocityKernel, ConnectivityKernel
# Combine velocity and connectivity kernels
vk = VelocityKernel(adata).compute_transition_matrix()
ck = ConnectivityKernel(adata).compute_transition_matrix()
combined = 0.8 * vk + 0.2 * ck
# Compute macrostates (terminal and initial states)
g = cr.estimators.GPCCA(combined)
g.compute_macrostates(n_states=4, cluster_key='leiden')
g.plot_macrostates(which="all")
# Compute fate probabilities
g.compute_fate_probabilities()
g.plot_fate_probabilities()
scVelo works natively with Scanpy's AnnData:
import scanpy as sc
import scvelo as scv
# Run standard Scanpy pipeline first
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)
# Then add velocity on top
scv.pp.moments(adata)
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
scv.tl.latent_time(adata)