scientific-skills/gget/references/database_info.md
Overview of databases queried by gget modules, including update frequencies and important considerations.
The databases queried by gget are continuously being updated, which sometimes changes their structure. gget modules are tested automatically on a biweekly basis and updated to match new database structures when necessary. Always keep gget updated:
pip install --upgrade gget
To ensure reproducibility in analyses:
Specify database versions/releases:
import gget

# Use a specific Ensembl release
gget.ref("homo_sapiens", release=110)
# Use specific Census version
gget.cellxgene(gene=["PAX7"], census_version="2023-07-25")
Document gget version:
import gget
print(gget.__version__)
Save raw data:
# Always save results for reproducibility
results = gget.search(["ACE2"], species="homo_sapiens")
results.to_csv("search_results_2025-01-15.csv", index=False)
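The reproducibility points above (pin versions, document the gget version, save raw data) can be combined by writing a small provenance record next to each saved result. The following is a minimal sketch, not part of gget: `build_provenance` and `write_provenance` are hypothetical helper names, the keys in the record are illustrative, and the version is read with the standard-library `importlib.metadata` so the snippet also runs where gget is not installed.

```python
import json
from datetime import date
from importlib import metadata


def build_provenance(package="gget", **query_params):
    """Collect the details needed to reproduce a query later (hypothetical helper)."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this environment
    return {
        "package": package,
        "version": version,
        "date": date.today().isoformat(),
        "query": query_params,  # free-form labels chosen by the caller
    }


def write_provenance(path, record):
    """Write the provenance record as a JSON sidecar file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)


# Describe the query that produced the saved CSV.
record = build_provenance(module="search", searchwords=["ACE2"], species="homo_sapiens")

if __name__ == "__main__":
    write_provenance("search_results_provenance.json", record)
```

Keeping the record in a separate JSON file leaves the CSV untouched, so the results can still be loaded directly by downstream tools.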
Regular gget updates:
Error handling:
API rate limiting:
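The error-handling and rate-limiting points above can be sketched with two small wrappers. This is an illustrative pattern under stated assumptions, not gget functionality: `call_with_retry` and `query_all` are hypothetical helper names, the retry count and delays are arbitrary defaults, and the appropriate pause depends on the limits of the database being queried.

```python
import time


def call_with_retry(func, *args, retries=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying with exponential backoff when it raises.

    Hypothetical helper; retries must be at least 1.
    """
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except Exception as error:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({error}); retrying in {delay:.1f}s")
            sleep(delay)


def query_all(func, items, pause=1.0, sleep=time.sleep, **kwargs):
    """Run func on each item, pausing between calls to limit the request rate."""
    results = {}
    for index, item in enumerate(items):
        if index > 0:
            sleep(pause)  # assumed pause length; adjust to the database's limits
        results[item] = call_with_retry(func, item, sleep=sleep, **kwargs)
    return results
```

With gget installed, a single query could then be wrapped as `call_with_retry(gget.search, ["ACE2"], species="homo_sapiens")`. Passing `sleep` as a parameter keeps the helpers easy to test without real waiting.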
gget ref --list_species
When using gget, cite both the gget publication and the underlying databases:
gget: Luebbert, L. & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836
Database-specific citations: Check references/ directory or database websites for appropriate citations.