skills/scanpy/references/r_interop.md
Many single-cell datasets arrive as R objects (.rds, .RData, Seurat, or SingleCellExperiment) even when the downstream analysis should happen in Scanpy. Agents should convert these inputs to AnnData .h5ad first, then continue with normal Scanpy workflows.
.rds directly in Python. Use R to deserialize R objects and write .h5ad..h5ad as the Python handoff format. After conversion, all QC, clustering, plotting, and exports should use Scanpy/AnnData.obs), gene metadata (var), raw counts/layers, and dimensional reductions when available.Rscript -e or script files, set CRAN repos explicitly, and pass ask = FALSE, update = FALSE for Bioconductor installs.Use these checks before installing anything:
uname -s 2>/dev/null || true
command -v Rscript || command -v R || true
Rscript --version 2>/dev/null || R --version 2>/dev/null || true
On Windows from PowerShell:
Get-Command Rscript -ErrorAction SilentlyContinue | Select-Object -ExpandProperty Source
Get-Command R -ErrorAction SilentlyContinue | Select-Object -ExpandProperty Source
If Git Bash cannot find R on Windows, query PowerShell or common install paths:
Get-ChildItem "C:\Program Files\R" -Filter Rscript.exe -Recurse -ErrorAction SilentlyContinue |
Select-Object -First 1 -ExpandProperty FullName
Then call the discovered executable with quotes, for example:
"/c/Program Files/R/R-4.6.0/bin/Rscript.exe" --version
Prefer existing system package managers. If installation requires GUI approval, admin credentials, or an unavailable package manager, stop and report the blocker.
Use Homebrew when available:
brew install --cask r
For packages with compiled code, install command-line build tools if the system asks for compilers:
xcode-select --install
Some R packages with Fortran code may require the CRAN macOS toolchain from https://mac.R-project.org/tools/. Prefer CRAN binary packages where possible to avoid compiler work.
Debian/Ubuntu:
sudo apt-get update
sudo apt-get install -y \
r-base r-base-dev build-essential gfortran \
libcurl4-openssl-dev libssl-dev libxml2-dev libhdf5-dev \
libharfbuzz-dev libfribidi-dev libfontconfig1-dev libfreetype6-dev \
libpng-dev libtiff5-dev libjpeg-dev
Fedora/RHEL-like systems:
sudo dnf install -y \
R R-devel gcc gcc-c++ gcc-gfortran make \
libcurl-devel openssl-devel libxml2-devel hdf5-devel \
harfbuzz-devel fribidi-devel fontconfig-devel freetype-devel \
libpng-devel libtiff-devel libjpeg-turbo-devel
If sudo is unavailable, use a managed environment such as Conda/Mamba if already present:
conda install -c conda-forge r-base r-essentials
Use winget from PowerShell when available:
winget install --id RProject.R -e
Install Rtools only if packages need compilation:
winget install --id RProject.Rtools -e
After installation, open a new shell or locate Rscript.exe under C:\Program Files\R\R-*\bin\. In Git Bash, call Windows executables through quoted paths or PowerShell; do not assume /usr/bin/R exists.
Create a project-local R library when you do not want to alter the user's global R library:
mkdir -p .r-lib
export R_LIBS_USER="$PWD/.r-lib"
Install the core conversion stack:
Rscript -e 'options(repos = c(CRAN = "https://cloud.r-project.org")); install.packages(c("BiocManager", "remotes"), Ncpus = max(1, parallel::detectCores() - 1)); BiocManager::install(c("SingleCellExperiment", "zellkonverter"), ask = FALSE, update = FALSE)'
Install Seurat support only when the input is a Seurat object:
Rscript -e 'options(repos = c(CRAN = "https://cloud.r-project.org")); install.packages(c("Seurat", "SeuratObject"), Ncpus = max(1, parallel::detectCores() - 1))'
Optional SeuratDisk fallback for h5Seurat-to-h5ad conversion:
Rscript -e 'options(repos = c(CRAN = "https://cloud.r-project.org")); if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes"); remotes::install_github("mojaveazure/seurat-disk", upgrade = "never")'
Avoid sceasy as the first choice. It can work, but it depends on reticulate/Python environment coupling and has more version-specific failure modes. Use it only after zellkonverter and SeuratDisk paths fail.
For .rds files:
Rscript -e 'obj <- readRDS("input.rds"); print(class(obj)); if (is.list(obj) && !is.data.frame(obj)) print(names(obj))'
For .RData/.rda files:
Rscript -e 'e <- new.env(parent = emptyenv()); load("input.RData", envir = e); print(ls(e)); print(lapply(as.list(e), class))'
If multiple objects are present, choose the object with class Seurat, SingleCellExperiment, or SummarizedExperiment. If there is ambiguity, ask the user which object to convert.
.rds to .h5adUse this script as the default conversion path. It handles SingleCellExperiment directly and converts Seurat objects through SingleCellExperiment before writing .h5ad.
#!/usr/bin/env Rscript
args <- commandArgs(trailingOnly = TRUE)
if (length(args) < 2) {
stop("Usage: Rscript convert_rds_to_h5ad.R input.rds output.h5ad [assay]", call. = FALSE)
}
input <- normalizePath(args[[1]], mustWork = TRUE)
output <- args[[2]]
assay <- if (length(args) >= 3) args[[3]] else NULL
options(repos = c(CRAN = "https://cloud.r-project.org"))
ensure_pkg <- function(pkg, bioc = FALSE) {
if (!requireNamespace(pkg, quietly = TRUE)) {
if (bioc) {
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install(pkg, ask = FALSE, update = FALSE)
} else {
install.packages(pkg)
}
}
}
ensure_pkg("SingleCellExperiment", bioc = TRUE)
ensure_pkg("SummarizedExperiment", bioc = TRUE)
ensure_pkg("zellkonverter", bioc = TRUE)
obj <- readRDS(input)
message("Input classes: ", paste(class(obj), collapse = ", "))
if (inherits(obj, "SingleCellExperiment")) {
sce <- obj
} else if (inherits(obj, "Seurat")) {
ensure_pkg("Seurat")
ensure_pkg("SeuratObject")
obj <- Seurat::UpdateSeuratObject(obj, verbose = FALSE)
if (is.null(assay)) {
assay <- SeuratObject::DefaultAssay(obj)
}
if ("JoinLayers" %in% getNamespaceExports("SeuratObject")) {
obj <- tryCatch(SeuratObject::JoinLayers(obj, assay = assay), error = function(e) obj)
}
sce <- Seurat::as.SingleCellExperiment(obj, assay = assay)
} else {
stop("Unsupported RDS class: ", paste(class(obj), collapse = ", "), call. = FALSE)
}
x_name <- if ("counts" %in% SummarizedExperiment::assayNames(sce)) "counts" else NULL
zellkonverter::writeH5AD(sce, output, X_name = x_name)
message("Wrote: ", normalizePath(output, mustWork = FALSE))
Run it:
Rscript convert_rds_to_h5ad.R input.rds output.h5ad
If the Seurat object has multiple assays and the user specified one, pass it explicitly:
Rscript convert_rds_to_h5ad.R input.rds output.h5ad RNA
If Seurat-to-SingleCellExperiment conversion fails, try SeuratDisk:
library(Seurat)
library(SeuratDisk)
obj <- readRDS("input.rds")
obj <- UpdateSeuratObject(obj)
DefaultAssay(obj) <- "RNA"
SaveH5Seurat(obj, filename = "output.h5Seurat", overwrite = TRUE)
Convert("output.h5Seurat", dest = "h5ad", overwrite = TRUE)
Be aware that SeuratDisk chooses which assay/layer becomes AnnData .X based on the available Seurat slots. If raw counts are essential, validate where counts landed after conversion and copy them into adata.layers["counts"] if needed.
After conversion, always validate with Scanpy before continuing:
import scanpy as sc
adata = sc.read_h5ad("output.h5ad")
adata.var_names_make_unique()
print(adata)
print(adata.obs.head())
print(adata.var.head())
print("layers:", list(adata.layers.keys()))
print("obsm:", list(adata.obsm.keys()))
adata.write_h5ad("output.validated.h5ad", compression="gzip")
If the user requested metadata and expression exports:
import scipy.io as sio
adata.obs.to_csv("cell_metadata.csv")
adata.var.to_csv("gene_metadata.csv")
sio.mmwrite("expression_matrix.mtx", adata.X)
For very large datasets, avoid dense CSV expression exports unless the user explicitly asks. Prefer .h5ad, Matrix Market (.mtx), or sparse-aware downstream analysis.
Rscript: command not found: R is not installed or not on PATH. Use the OS-specific install steps above, then reopen the shell or call the full Rscript path.Rscript.exe and invoke the quoted path from Git Bash.r-base-dev, compilers, HDF5/libcurl/OpenSSL/XML headers, Rtools on Windows, or macOS command-line tools).SeuratObject::JoinLayers() before conversion, pass the assay explicitly, or use SeuratDisk fallback..X: Inspect adata.layers, adata.raw, and value ranges. Keep raw counts in adata.layers["counts"] before normalization if available..h5ad checkpoints after successful conversion.https://www.r-project.org/, https://cran.r-project.org/bin/macosx, https://cran.r-project.org/bin/windows/base/rw-FAQ.htmlhttps://www.bioconductor.org/install/, https://cran.r-project.org/web/packages/BiocManager/vignettes/BiocManager.htmlhttps://www.bioconductor.org/packages/release/bioc/html/zellkonverter.html, https://github.com/theislab/zellkonverterhttps://mojaveazure.github.io/seurat-disk/, https://github.com/mojaveazure/seurat-diskhttps://github.com/cellgeni/sceasy