Overview of databases queried by gget modules, including update frequencies and important considerations.
The databases queried by gget are continuously being updated, which sometimes changes their structure. gget modules are tested automatically on a biweekly basis and updated to match new database structures when necessary. For reproducible environments matching this skill, pin the current verified version:
```bash
uv pip install "gget==0.30.5"
```
`--download_all_accessions`
`OPENAI_API_KEY` in Python workflows and avoid hard-coded keys
To ensure reproducibility in analyses:
Specify database versions/releases:
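```python
import gget

# Use a specific Ensembl release
gget.ref("homo_sapiens", release=110)

# Use a specific CELLxGENE Census version
# (gget cellxgene requires a one-time gget.setup("cellxgene") beforehand)
gget.cellxgene(gene=["PAX7"], census_version="2023-07-25")
```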
Document gget version:
```python
import gget
print(gget.__version__)
```
Current verified version for this skill: 0.30.5 (requires Python >=3.8).
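To make the pin actionable, a script can compare the installed version with 0.30.5 before doing any work. Below is a minimal sketch using the standard-library `importlib.metadata`, which is available from Python 3.8. The helper names `installed_gget_version` and `matches_pin` are hypothetical, and it assumes that an exact string match is what a strict pin should require.

```python
from importlib.metadata import PackageNotFoundError, version

# Version this skill was verified against (taken from the text above).
PINNED_GGET_VERSION = "0.30.5"


def installed_gget_version():
    """Return the installed gget version string, or None if gget is not installed."""
    try:
        return version("gget")
    except PackageNotFoundError:
        return None


def matches_pin(installed, pinned=PINNED_GGET_VERSION):
    """True only when the installed version string equals the pinned one exactly."""
    return installed is not None and installed == pinned


if __name__ == "__main__":
    found = installed_gget_version()
    if matches_pin(found):
        print(f"gget {found} matches the pinned version")
    else:
        print(f"warning: gget {found} installed, expected {PINNED_GGET_VERSION}")
```

Because this only reads package metadata, it does not import gget itself, so it stays fast and safe to run at the top of a pipeline.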
Save raw data:
```python
import gget

# Always save results for reproducibility
results = gget.search(["ACE2"], species="homo_sapiens")
results.to_csv("search_results_2025-01-15.csv", index=False)
```
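The example above hard-codes the date in the filename. The sketch below shows one way to generate that date stamp automatically and to write a JSON sidecar next to each CSV that records the gget version and query parameters. `save_with_metadata` and the sidecar layout are hypothetical conventions and are not part of gget. The function only assumes that the result object has a pandas-style `to_csv(path, index=False)` method.

```python
import json
from datetime import date
from pathlib import Path


def save_with_metadata(df, stem, params, gget_version, out_dir=".", today=None):
    """Write df to <stem>_<YYYY-MM-DD>.csv plus a JSON sidecar describing the query.

    df only needs a pandas-style to_csv(path, index=False) method.
    """
    if today is None:
        today = date.today()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path = out / f"{stem}_{today.isoformat()}.csv"
    meta_path = csv_path.with_suffix(".json")  # same name, .json extension
    df.to_csv(csv_path, index=False)
    meta = {
        "gget_version": gget_version,
        "date": today.isoformat(),
        "params": params,
        "csv": csv_path.name,
    }
    meta_path.write_text(json.dumps(meta, indent=2, sort_keys=True))
    return csv_path, meta_path


# Usage (requires gget and network access):
# import gget
# results = gget.search(["ACE2"], species="homo_sapiens")
# save_with_metadata(results, "search_results",
#                    {"searchwords": ["ACE2"], "species": "homo_sapiens"},
#                    gget.__version__)
```

Keeping the parameters in a sidecar rather than in the filename keeps filenames short while still recording exactly what was queried.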
Regular gget updates:
Error handling:
API rate limiting:
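For transient network errors or rate-limit responses from the underlying web APIs, a common pattern is to retry with exponential backoff. The sketch below assumes that the failure surfaces as a raised exception. Several gget modules may instead log an error and return `None`, and this helper does not catch that case. `call_with_backoff` is a hypothetical helper that is not part of gget, and the delay values are illustrative rather than limits published by any database.

```python
import random
import time


def call_with_backoff(func, *args, retries=4, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying failed calls with exponential backoff.

    Waits about base_delay, 2*base_delay, 4*base_delay, ... (capped at max_delay),
    plus random jitter, between attempts; re-raises the last error once
    `retries` retries have been used up.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps parallel jobs from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


# Usage (requires gget and network access):
# import gget
# results = call_with_backoff(gget.search, ["ACE2"], species="homo_sapiens")
```

The `sleep` parameter is injectable so the retry logic can be tested without actually waiting.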
`gget virus`, use restrictive filters and resume partial downloads with baseline/merge options
`gget ref --list_species`
`gget virus` queries over all-accession downloads
`command_summary.txt` after each run for errors, software versions, and output paths
When using gget, cite both the gget publication and the underlying databases:
gget: Luebbert, L. & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836
Database-specific citations: check the `references/` directory or the database websites for the appropriate citations.