
gget Module Reference


Comprehensive parameter reference for all gget modules.

Reference & Gene Information Modules

gget ref

Retrieve Ensembl reference genome FTPs and metadata.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| species | str | Species in Genus_species format or shortcuts ('human', 'mouse') | Required |
| -w/--which | str | File types to return: gtf, cdna, dna, cds, cdrna, pep | All |
| -r/--release | int | Ensembl release number | Latest |
| -od/--out_dir | str | Output directory path | None |
| -o/--out | str | JSON file path for results | None |
| -l/--list_species | flag | List available vertebrate species | False |
| -liv/--list_iv_species | flag | List available invertebrate species | False |
| -ftp | flag | Return only FTP links | False |
| -d/--download | flag | Download files (requires curl) | False |
| -q/--quiet | flag | Suppress progress information | False |

Returns: JSON containing FTP links, Ensembl release numbers, release dates, file sizes
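If you load the JSON written with -o/--out, a short recursive walk can pull out every FTP link regardless of nesting depth. This is a minimal sketch that assumes the links sit under keys named "ftp" somewhere in the nested structure; the sample below is made up to show the idea and is not real gget output.

```python
import json


def collect_ftp_links(node):
    """Recursively gather every string stored under an "ftp" key.

    Assumption: the results nest dicts/lists and store links under "ftp".
    """
    links = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "ftp" and isinstance(value, str):
                links.append(value)
            else:
                links.extend(collect_ftp_links(value))
    elif isinstance(node, list):
        for item in node:
            links.extend(collect_ftp_links(item))
    return links


# Hypothetical, abbreviated results shape with placeholder URLs.
example = json.loads("""
{"homo_sapiens": {
    "annotation_gtf": {"ftp": "ftp://example.org/genes.gtf.gz"},
    "genome_dna": {"ftp": "ftp://example.org/genome.fa.gz"}
}}
""")
print(collect_ftp_links(example))
```

Walking the structure generically, rather than hard-coding file-type keys, keeps the helper usable even if the exact key names differ from this assumption.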


gget search

Search for genes by name or description in Ensembl.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| searchwords | str/list | Search terms (case-insensitive) | Required |
| -s/--species | str | Target species or core database name | Required |
| -r/--release | int | Ensembl release number | Latest |
| -t/--id_type | str | Return 'gene' or 'transcript' | 'gene' |
| -ao/--andor | str | 'or' (ANY term) or 'and' (ALL terms) | 'or' |
| -l/--limit | int | Maximum results to return | None |
| -o/--out | str | Output file path (CSV/JSON) | None |

Returns: ensembl_id, gene_name, ensembl_description, ext_ref_description, biotype, URL
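The -ao/--andor switch decides whether a hit must mention any or all of the search terms. The local sketch below mimics that rule on an in-memory list of made-up descriptions, matching case-insensitively; it only illustrates the semantics and does not query Ensembl or reproduce gget's internal matching.

```python
def match_descriptions(descriptions, searchwords, andor="or"):
    """Return descriptions containing ANY ('or') or ALL ('and') of the terms.

    Case-insensitive substring matching is an illustrative assumption.
    """
    if isinstance(searchwords, str):
        searchwords = [searchwords]
    if andor not in ("or", "and"):
        raise ValueError("andor must be 'or' or 'and'")
    terms = [w.lower() for w in searchwords]
    combine = any if andor == "or" else all
    return [d for d in descriptions if combine(t in d.lower() for t in terms)]


# Made-up descriptions for illustration.
genes = [
    "Receptor tyrosine kinase",
    "Serine/threonine kinase",
    "Olfactory receptor",
]
print(match_descriptions(genes, ["kinase", "receptor"], andor="or"))
print(match_descriptions(genes, ["KINASE", "receptor"], andor="and"))
```

With 'or', all three descriptions match; with 'and', only the one mentioning both terms does, which is why 'and' is the narrower choice for multi-word searches.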


gget info

Get comprehensive gene/transcript metadata from Ensembl, UniProt, and NCBI.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| ens_ids | str/list | Ensembl IDs (WormBase and FlyBase IDs also supported) | Required |
| -o/--out | str | Output file path (CSV/JSON) | None |
| -n/--ncbi | bool | Disable NCBI data retrieval | False |
| -u/--uniprot | bool | Disable UniProt data retrieval | False |
| -pdb | bool | Include PDB identifiers | False |
| -csv | flag | Return CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress display | False |

Python-specific:

  • save=True: Save output to current directory
  • wrap_text=True: Format dataframe with wrapped text

Note: Processing >1000 IDs simultaneously may cause server errors.

Returns: UniProt ID, NCBI gene ID, gene name, synonyms, protein names, descriptions, biotype, canonical transcript
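Given the note about more than 1000 IDs, large lists can be split into batches and queried one batch at a time. The sketch below assumes gget.info accepts a list of IDs and returns a pandas DataFrame for each call; the imports are deferred so the batching helper itself runs without gget or pandas installed.

```python
def batch_ids(ens_ids, batch_size=1000):
    """Split a list of IDs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [ens_ids[i:i + batch_size] for i in range(0, len(ens_ids), batch_size)]


def info_in_batches(ens_ids, batch_size=1000):
    """Call gget.info once per batch and stack the results.

    Assumes each gget.info call returns a pandas DataFrame.
    """
    import gget
    import pandas as pd

    frames = [gget.info(batch) for batch in batch_ids(ens_ids, batch_size)]
    return pd.concat(frames)
```

A smaller batch_size trades more round trips for less load per request, which may be gentler on the servers.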


gget seq

Retrieve nucleotide or amino acid sequences in FASTA format.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| ens_ids | str/list | Ensembl identifiers | Required |
| -o/--out | str | Output file path | stdout |
| -t/--translate | flag | Fetch amino acid sequences | False |
| -iso/--isoforms | flag | Return all transcript variants | False |
| -q/--quiet | flag | Suppress progress information | False |

Data sources: Ensembl (nucleotide), UniProt (amino acid)

Returns: FASTA format sequences
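FASTA text, for example a file written with -o/--out, can be turned into a dictionary with a few lines of standard-library code. This is a minimal sketch; the record names and sequences in the example are placeholders, not real Ensembl entries.

```python
def parse_fasta(text):
    """Parse FASTA text into a {header: sequence} dict, preserving order.

    The header is everything after '>' (ID plus any description);
    wrapped sequence lines are concatenated.
    """
    chunks = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            chunks.setdefault(header, [])
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks[header].append(line)
    return {h: "".join(parts) for h, parts in chunks.items()}


# Placeholder records, not real Ensembl IDs or sequences.
fasta_text = ">EXAMPLE_1 demo transcript\nATGGCC\nTTGA\n>EXAMPLE_2\nMKV\n"
print(parse_fasta(fasta_text))
```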


Sequence Analysis & Alignment Modules

gget blast

BLAST sequences against standard databases.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| sequence | str | Sequence or path to FASTA/.txt | Required |
| -p/--program | str | blastn, blastp, blastx, tblastn, tblastx | Auto-detect |
| -db/--database | str | nt, refseq_rna, pdbnt, nr, swissprot, pdbaa, refseq_protein | nt or nr (depending on program) |
| -l/--limit | int | Max hits returned | 50 |
| -e/--expect | float | E-value cutoff | 10.0 |
| -lcf/--low_comp_filt | flag | Enable low complexity filtering | False |
| -mbo/--megablast_off | flag | Disable MegaBLAST (blastn only) | False |
| -o/--out | str | Output file path | None |
| -q/--quiet | flag | Suppress progress | False |

Returns: Description, Scientific Name, Common Name, Taxid, Max Score, Total Score, Query Coverage
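Even when -p/--program is left to auto-detect, it helps to know which BLAST program pairs with which query and database type. The helper below encodes the standard BLAST pairings and classifies the databases from the table as nucleotide or protein. It is a reference sketch, not gget's own auto-detection logic.

```python
NUCLEOTIDE_DBS = {"nt", "refseq_rna", "pdbnt"}
PROTEIN_DBS = {"nr", "swissprot", "pdbaa", "refseq_protein"}


def choose_blast_program(query_is_protein, database, translate_query=False):
    """Pick a BLAST program from the query type and the target database.

    translate_query only matters for nucleotide queries against nucleotide
    databases, where it selects tblastx (translated vs. translated).
    """
    if database in NUCLEOTIDE_DBS:
        db_is_protein = False
    elif database in PROTEIN_DBS:
        db_is_protein = True
    else:
        raise ValueError(f"unknown database: {database!r}")

    if query_is_protein:
        # Protein query: direct vs. protein DB, or vs. translated nucleotide DB.
        return "blastp" if db_is_protein else "tblastn"
    if db_is_protein:
        # Nucleotide query translated in all frames, searched against proteins.
        return "blastx"
    return "tblastx" if translate_query else "blastn"
```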


gget blat

Find genomic positions using UCSC BLAT.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| sequence | str | Sequence or path to FASTA/.txt | Required |
| -st/--seqtype | str | 'DNA', 'protein', 'translated%20RNA', 'translated%20DNA' | Auto-detect |
| -a/--assembly | str | Target assembly (hg38, mm39, taeGut2, etc.) | 'human' (hg38) |
| -o/--out | str | Output file path | None |
| -csv | flag | Return CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Returns: genome, query size, alignment start/end, matches, mismatches, alignment percentage


gget muscle

Align multiple sequences using Muscle5.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| fasta | str/list | Sequences or FASTA file path | Required |
| -o/--out | str | Output file path | stdout |
| -s5/--super5 | flag | Use Super5 algorithm (faster, large datasets) | False |
| -q/--quiet | flag | Suppress progress | False |

Returns: ClustalW format alignment or aligned FASTA (.afa)
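Aligned FASTA (.afa) output is ordinary FASTA with gap characters, so simple pairwise statistics need no extra tools. The sketch below computes percent identity between two aligned rows, counting only columns where neither row has a gap ('-'). That is one common convention among several, chosen here as an assumption.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns entirely
        compared += 1
        matches += a == b
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


print(percent_identity("ACGT-A", "ACGTTC"))
```

Here five columns are compared and four match, giving 80.0. Counting gapped columns in the denominator instead would lower the score, so state the convention when reporting identities.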


gget diamond

Fast local protein/translated DNA alignment.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| query | str/list | Query sequences or FASTA file | Required |
| --reference | str/list | Reference sequences or FASTA file | Required |
| --sensitivity | str | fast, mid-sensitive, sensitive, more-sensitive, very-sensitive, ultra-sensitive | very-sensitive |
| --threads | int | CPU threads | 1 |
| --diamond_binary | str | Path to DIAMOND installation | Auto-detect |
| --diamond_db | str | Save database for reuse | None |
| --translated | flag | Enable nucleotide-to-amino acid alignment | False |
| -o/--out | str | Output file path | None |
| -csv | flag | CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Returns: Identity %, sequence lengths, match positions, gap openings, E-values, bit scores


Structural & Protein Analysis Modules

gget pdb

Query RCSB Protein Data Bank.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| pdb_id | str | PDB identifier (e.g., '7S7U') | Required |
| -r/--resource | str | pdb, entry, pubmed, assembly, or an entity type | 'pdb' |
| -i/--identifier | str | Assembly, entity, or chain ID | None |
| -o/--out | str | Output file path | stdout |

Returns: PDB format (structures) or JSON (metadata)
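PDB-format text uses fixed columns, so basic summaries can be computed with plain string slicing. The sketch below counts residues per chain from the CA atoms of ATOM records (atom name in columns 13-16, chain ID in column 22, residue number plus insertion code in columns 23-27). The example lines are synthetic and truncated after the residue number.

```python
def residues_per_chain(pdb_text):
    """Count residues per chain using the CA atoms of ATOM records.

    Fixed PDB columns (1-based): atom name 13-16, chain ID 22,
    residue number 23-26 plus insertion code 27.
    """
    seen = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 26:
            continue  # skips HETATM, headers, and short lines
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        residue = line[22:27].strip()  # residue number + insertion code
        seen.setdefault(chain, set()).add(residue)
    return {chain: len(residues) for chain, residues in seen.items()}


# Synthetic ATOM lines built column-exactly (coordinates omitted).
example = "\n".join(
    f"ATOM  {i:>5} {name:<4} {res} {chain}{num:>4}"
    for i, (name, res, chain, num) in enumerate(
        [(" N", "MET", "A", 1), (" CA", "MET", "A", 1),
         (" CA", "GLY", "A", 2), (" CA", "SER", "B", 1)],
        start=1,
    )
)
print(residues_per_chain(example))
```

Using a set per chain means alternate-location duplicates of the same CA are counted once.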


gget alphafold

Predict 3D protein structures using AlphaFold2.

Setup: Requires OpenMM and running gget setup alphafold (~4 GB download)

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| sequence | str/list | Amino acid sequence(s) or FASTA file | Required |
| -mr/--multimer_recycles | int | Recycling iterations for multimers | 3 |
| -o/--out | str | Output folder path | Timestamped folder |
| -mfm/--multimer_for_monomer | flag | Apply multimer model to monomers | False |
| -r/--relax | flag | AMBER relaxation for top model | False |
| -q/--quiet | flag | Suppress progress | False |

Python-only:

  • plot (bool): Generate 3D visualization (default: True)
  • show_sidechains (bool): Include side chains (default: True)

Note: Multiple sequences automatically trigger multimer modeling

Returns: PDB structure file, JSON alignment error data, optional 3D plot
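AlphaFold-style PDB files conventionally store the per-residue confidence score (pLDDT, 0 to 100) in the B-factor field (columns 61-66). Assuming the PDB file written by gget alphafold follows that convention, the sketch below averages pLDDT over CA atoms. The example lines are synthetic, with only the fields the function reads filled in.

```python
def mean_plddt(pdb_text):
    """Average the B-factor field (columns 61-66) over CA atoms.

    Assumes the B-factor field holds pLDDT, as in AlphaFold-style PDB files.
    """
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found")
    return sum(scores) / len(scores)


# Synthetic CA lines: residue fields, padding, then a B-factor value.
example = "\n".join(
    f"ATOM  {i:>5}  CA  ALA A{i:>4}".ljust(60) + f"{score:6.2f}"
    for i, score in enumerate([90.0, 70.0], start=1)
)
print(mean_plddt(example))
```

A single average hides local variation, so per-residue values are usually worth inspecting for flexible regions.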


gget elm

Predict Eukaryotic Linear Motifs.

Setup: Requires gget setup elm

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| sequence | str | Amino acid sequence or UniProt accession | Required |
| -s/--sensitivity | str | DIAMOND alignment sensitivity | very-sensitive |
| -t/--threads | int | Number of threads | 1 |
| -bin/--diamond_binary | str | Path to DIAMOND binary | Auto-detect |
| -o/--out | str | Output directory path | None |
| -u/--uniprot | flag | Input is a UniProt accession | False |
| -e/--expand | flag | Include protein names, organisms, references | False |
| -csv | flag | CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Returns: Two outputs:

  1. ortholog_df: Motifs from orthologous proteins
  2. regex_df: Motifs matched in input sequence
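ELM motifs are defined as regular expressions, and regex_df reports motifs matched directly in the input sequence. The sketch below shows the general idea with Python's re module, wrapping the pattern in a lookahead so overlapping matches are not missed and reporting 1-based positions. The pattern is a hypothetical example, not a real ELM class, and gget's own matching may differ.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, end, match) for every, possibly overlapping, match.

    Positions are 1-based and inclusive, as is usual for protein sequences.
    """
    hits = []
    # The outer capture group inside a lookahead finds overlapping matches.
    for m in re.finditer(f"(?=({pattern}))", sequence):
        match = m.group(1)
        start = m.start() + 1
        hits.append((start, start + len(match) - 1, match))
    return hits


# Hypothetical motif: R or K, any residue, then S or T.
print(find_motif("MKRSTAKLS", "[RK].[ST]"))
```

Without the lookahead, re.finditer would report KRS but skip the overlapping RST.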

Expression & Disease Data Modules

gget archs4

Query ARCHS4 for gene correlation or tissue expression.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| gene | str | Gene symbol or Ensembl ID | Required |
| -w/--which | str | 'correlation' or 'tissue' | 'correlation' |
| -s/--species | str | 'human' or 'mouse' (tissue only) | 'human' |
| -o/--out | str | Output file path | None |
| -e/--ensembl | flag | Input is Ensembl ID | False |
| -csv | flag | CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Returns:

  • correlation: Gene symbols, Pearson correlation coefficients (top 100)
  • tissue: Tissue IDs, min/Q1/median/Q3/max expression
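The correlation mode ranks genes by Pearson correlation coefficient, r = cov(x, y) / (sd(x) sd(y)), for two expression vectors x and y. The standard-library sketch below computes it on made-up values for reference; it is not tied to ARCHS4's data or preprocessing.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / denom


print(pearson([1, 2, 3], [1, 3, 2]))
```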

gget cellxgene

Query CZ CELLxGENE Discover Census for single-cell data.

Setup: Requires gget setup cellxgene

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| -g/--gene | list | Gene names or Ensembl IDs (case-sensitive!) | Required |
| --tissue | list | Tissue type(s) | None |
| --cell_type | list | Cell type(s) | None |
| -s/--species | str | 'homo_sapiens' or 'mus_musculus' | 'homo_sapiens' |
| -cv/--census_version | str | "stable", "latest", or dated version | "stable" |
| -o/--out | str | Output file path (required for CLI) | Required |
| -e/--ensembl | flag | Use Ensembl IDs | False |
| -mo/--meta_only | flag | Return metadata only | False |
| -q/--quiet | flag | Suppress progress | False |

Additional filters: disease, development_stage, sex, assay, dataset_id, donor_id, ethnicity, suspension_type

Important: Gene symbols are case-sensitive ('PAX7' for human, 'Pax7' for mouse)

Returns: AnnData object with count matrices and metadata
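Because symbols are matched case-sensitively, a query written for human can silently miss mouse genes. Under the usual nomenclature conventions, human symbols are all upper case (PAX7) while mouse symbols capitalize only the first letter (Pax7). The hypothetical helper below applies that convention as a rough heuristic only: it does not map orthologs, and some symbols do not follow the pattern, so check results or use Ensembl IDs with -e/--ensembl when precision matters.

```python
def census_symbol(symbol, species):
    """Re-case a gene symbol for the given Census species (heuristic only)."""
    if species == "homo_sapiens":
        return symbol.upper()
    if species == "mus_musculus":
        return symbol[:1].upper() + symbol[1:].lower()
    raise ValueError("species must be 'homo_sapiens' or 'mus_musculus'")


print(census_symbol("PAX7", "mus_musculus"))
```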


gget enrichr

Perform enrichment analysis using Enrichr/modEnrichr.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| genes | list | Gene symbols or Ensembl IDs | Required |
| -db/--database | str | Reference database or shortcut | Required |
| -s/--species | str | human, mouse, fly, yeast, worm, fish | 'human' |
| -bkg_l/--background_list | list | Background genes | None |
| -o/--out | str | Output file path | None |
| -ko/--kegg_out | str | KEGG pathway images directory | None |

Python-only:

  • plot (bool): Generate graphical results

Database shortcuts:

  • 'pathway' → KEGG_2021_Human
  • 'transcription' → ChEA_2016
  • 'ontology' → GO_Biological_Process_2021
  • 'diseases_drugs' → GWAS_Catalog_2019
  • 'celltypes' → PanglaoDB_Augmented_2021

Returns: Pathway/function associations with adjusted p-values, overlapping gene counts
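When recording which library an analysis used, for example in a methods section, it helps to expand shortcuts to their full names. The lookup below simply mirrors the shortcut list in this reference; the mapping may change between gget versions, so treat it as a snapshot rather than an authoritative table.

```python
# Shortcut table as listed in this reference.
ENRICHR_SHORTCUTS = {
    "pathway": "KEGG_2021_Human",
    "transcription": "ChEA_2016",
    "ontology": "GO_Biological_Process_2021",
    "diseases_drugs": "GWAS_Catalog_2019",
    "celltypes": "PanglaoDB_Augmented_2021",
}


def resolve_database(database):
    """Expand a shortcut to its full library name; pass other names through."""
    return ENRICHR_SHORTCUTS.get(database, database)


print(resolve_database("ontology"))
```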


gget bgee

Retrieve orthology and expression from Bgee.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| ens_id | str/list | Ensembl or NCBI gene ID | Required |
| -t/--type | str | 'orthologs' or 'expression' | 'orthologs' |
| -o/--out | str | Output file path | None |
| -csv | flag | CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Note: Multiple IDs supported when type='expression'

Returns:

  • orthologs: Genes across species with IDs, names, taxonomic info
  • expression: Anatomical entities, confidence scores, expression status
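Since multiple IDs are only supported for expression queries, a small pre-flight check can catch a mismatch before any request is sent. The function below is a hypothetical helper, not part of gget; it normalizes the input to a list and enforces the rule from the note above. The IDs in the example are placeholders.

```python
def normalize_bgee_ids(ens_id, query_type="orthologs"):
    """Return IDs as a list, enforcing that multiple IDs need 'expression'."""
    if query_type not in ("orthologs", "expression"):
        raise ValueError("query_type must be 'orthologs' or 'expression'")
    ids = [ens_id] if isinstance(ens_id, str) else list(ens_id)
    if not ids:
        raise ValueError("at least one ID is required")
    if len(ids) > 1 and query_type != "expression":
        raise ValueError("multiple IDs are only supported with 'expression'")
    return ids


print(normalize_bgee_ids(["GENE_A", "GENE_B"], "expression"))
```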

gget opentargets

Retrieve disease/drug associations from OpenTargets.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| ens_id | str | Ensembl gene ID | Required |
| -r/--resource | str | diseases, drugs, tractability, pharmacogenetics, expression, depmap, interactions | 'diseases' |
| -l/--limit | int | Maximum results | None |
| -o/--out | str | Output file path | None |
| -csv | flag | CSV format (CLI) | False |
| -q/--quiet | flag | Suppress progress | False |

Resource-specific filters:

  • drugs: --filter_disease
  • pharmacogenetics: --filter_drug
  • expression/depmap: --filter_tissue, --filter_anat_sys, --filter_organ
  • interactions: --filter_protein_a, --filter_protein_b, --filter_gene_b

Returns: Disease/drug associations, tractability, pharmacogenetics, expression, DepMap, interactions
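Each filter only applies to particular resources, so a quick check can flag combinations that will not do what you expect. The table below copies the resource-to-filter pairings listed in this reference (using the CLI option names without dashes); whether a given gget version accepts exactly these names is an assumption, and the helper itself is hypothetical.

```python
# Which --filter_* options apply to which resource, per this reference.
RESOURCE_FILTERS = {
    "drugs": {"filter_disease"},
    "pharmacogenetics": {"filter_drug"},
    "expression": {"filter_tissue", "filter_anat_sys", "filter_organ"},
    "depmap": {"filter_tissue", "filter_anat_sys", "filter_organ"},
    "interactions": {"filter_protein_a", "filter_protein_b", "filter_gene_b"},
}


def check_filters(resource, filters):
    """Raise ValueError if any requested filter does not apply to the resource."""
    allowed = RESOURCE_FILTERS.get(resource, set())
    unsupported = sorted(set(filters) - allowed)
    if unsupported:
        raise ValueError(f"{resource!r} does not accept: {', '.join(unsupported)}")


check_filters("interactions", ["filter_protein_a"])  # passes silently
```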


gget cbio

Plot cancer genomics heatmaps from cBioPortal.

Subcommands: search, plot

search parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| keywords | list | Search terms | Required |

plot parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| -s/--study_ids | list | cBioPortal study IDs | Required |
| -g/--genes | list | Gene names or Ensembl IDs | Required |
| -st/--stratification | str | tissue, cancer_type, cancer_type_detailed, study_id, sample | None |
| -vt/--variation_type | str | mutation_occurrences, cna_nonbinary, sv_occurrences, cna_occurrences, Consequence | None |
| -f/--filter | str | Filter by column value (e.g., 'study_id:msk_impact_2017') | None |
| -dd/--data_dir | str | Cache directory | ./gget_cbio_cache |
| -fd/--figure_dir | str | Output directory | ./gget_cbio_figures |
| -t/--title | str | Custom figure title | None |
| -dpi | int | Resolution | 100 |
| -q/--quiet | flag | Suppress progress | False |
| -nc/--no_confirm | flag | Skip download confirmations | False |
| -sh/--show | flag | Display plot in window | False |

Returns: PNG heatmap figure


gget cosmic

Search COSMIC database for cancer mutations.

Important: COSMIC charges license fees for commercial use, and access requires a COSMIC account.

Query parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| searchterm | str | Gene name, Ensembl ID, mutation, sample ID | Required |
| -ctp/--cosmic_tsv_path | str | Path to COSMIC TSV file | Required |
| -l/--limit | int | Maximum results | 100 |
| -csv | flag | CSV format (CLI) | False |

Download parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| -d/--download_cosmic | flag | Activate download mode | False |
| -gm/--gget_mutate | flag | Create version for gget mutate | False |
| -cp/--cosmic_project | str | cancer, census, cell_line, resistance, genome_screen, targeted_screen | None |
| -cv/--cosmic_version | str | COSMIC version | Latest |
| -gv/--grch_version | int | Human reference genome (37 or 38) | None |
| --email | str | COSMIC account email | Required |
| --password | str | COSMIC account password | Required |

Note: First-time users must download the database (download mode) before querying

Returns: Mutation data from COSMIC


Additional Tools

gget mutate

Generate mutated nucleotide sequences.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| sequences | str/list | FASTA file or sequences | Required |
| -m/--mutations | str/df | CSV/TSV file or DataFrame | Required |
| -mc/--mut_column | str | Mutation column name | 'mutation' |
| -sic/--seq_id_column | str | Sequence ID column | 'seq_ID' |
| -mic/--mut_id_column | str | Mutation ID column | None |
| -k/--k | int | Length of flanking sequences | 30 |
| -o/--out | str | Output FASTA file path | stdout |
| -q/--quiet | flag | Suppress progress | False |

Returns: Mutated sequences in FASTA format
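To show what the -k/--k flank length controls, the sketch below applies a single substitution written in HGVS-style c. notation (for example c.3G>T, assumed here to resemble the mutation strings gget mutate reads) and returns the mutant base with up to k flanking nucleotides on each side. It handles only simple substitutions and is not gget's implementation.

```python
import re

SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGTN])>([ACGTN])$")


def mutate_substitution(sequence, mutation, k=30):
    """Apply a 1-based substitution like 'c.3G>T'; return the k-flanked window."""
    m = SUBSTITUTION.match(mutation)
    if not m:
        raise ValueError(f"unsupported mutation format: {mutation!r}")
    pos, ref, alt = int(m.group(1)), m.group(2), m.group(3)
    i = pos - 1  # convert to a 0-based index
    if not 0 <= i < len(sequence):
        raise ValueError("position outside the sequence")
    if sequence[i].upper() != ref:
        raise ValueError(f"reference base mismatch at {pos}: found {sequence[i]!r}")
    mutated = sequence[:i] + alt + sequence[i + 1:]
    # Keep up to k bases on each side; windows are truncated at sequence ends.
    return mutated[max(0, i - k): i + 1 + k]


print(mutate_substitution("ATGCATGC", "c.3G>T", k=2))
```

Checking the reference base before substituting catches coordinate or strand mix-ups early instead of silently producing a wrong sequence.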


gget gpt

Generate text using OpenAI's API.

Setup: Requires gget setup gpt and OpenAI API key

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| prompt | str | Text input for generation | Required |
| api_key | str | OpenAI API key | Required |
| model | str | OpenAI model name | gpt-3.5-turbo |
| temperature | float | Sampling temperature (0-2) | 1.0 |
| top_p | float | Nucleus sampling | 1.0 |
| max_tokens | int | Maximum tokens to generate | None |
| frequency_penalty | float | Frequency penalty (0-2) | 0 |
| presence_penalty | float | Presence penalty (0-2) | 0 |

Important: The OpenAI free tier is limited to 3 months. Set billing limits on your OpenAI account.

Returns: Generated text string


gget setup

Install/download dependencies for modules.

Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| module | str | Module name | Required |
| -o/--out | str | Output folder (elm only) | Package install folder |
| -q/--quiet | flag | Suppress progress | False |

Modules requiring setup:

  • alphafold - Downloads ~4GB model parameters
  • cellxgene - Installs cellxgene-census
  • elm - Downloads local ELM database
  • gpt - Configures OpenAI integration

Returns: None (installs dependencies)