SRA (Sequence Read Archive) sequencing metadata covers experiments, samples, studies, and runs. It is accessible through the E-utilities with `db=sra`, which returns XML metadata describing sequencing experiments, platforms, library strategies, and sample attributes.
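Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/

Add `&api_key=YOUR_KEY` to raise the rate limit from 3 to 10 requests per second. Identify your client with the `tool` and `email` parameters.

Search (ESearch):

GET esearch.fcgi?db=sra&term=QUERY&retmax=N&retmode=json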
Example -- search for human RNA-seq experiments (the term is shown unencoded; URL-encode it in a real request):
GET esearch.fcgi?db=sra&term=RNA-seq[Strategy] AND Homo sapiens[Organism]&retmax=5&retmode=json
Response:
```json
{
  "esearchresult": {
    "count": "584231",
    "retmax": "5",
    "idlist": ["28574913", "28574912", "28574911", ...]
  }
}
```
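
The ESearch call above can be scripted with only the Python standard library. This is a minimal sketch: `esearch_url` and `parse_esearch` are hypothetical helper names, the `tool`/`email` values in the usage line are placeholders, and the HTTP request itself is left to the caller (for example `urllib.request.urlopen`). `urlencode` escapes the spaces and square brackets in the query.

```python
import json
from urllib.parse import urlencode

# Base URL given in this reference.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, retmax=20, api_key=None, tool=None, email=None):
    """Build an ESearch URL against db=sra; urlencode escapes spaces and brackets."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    # Optional parameters are only added when a value is supplied.
    for key, value in (("api_key", api_key), ("tool", tool), ("email", email)):
        if value:
            params[key] = value
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def parse_esearch(body):
    """Return (total hit count, list of SRA UIDs) from an ESearch JSON body."""
    result = json.loads(body)["esearchresult"]
    # "count" is a string in the JSON response, so convert it explicitly.
    return int(result["count"]), result["idlist"]


# Usage (placeholder tool/email values):
url = esearch_url("RNA-seq[Strategy] AND Homo sapiens[Organism]", retmax=5,
                  tool="my_pipeline", email="me@example.org")
```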
Example -- search by accession:
GET esearch.fcgi?db=sra&term=SRP123456[Accession] OR SRR123456[Accession]&retmode=json
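
When looking up many accessions at once, the `OR` pattern above can be generated instead of typed. A small sketch (the helper name `accession_query` is hypothetical); it normalizes case, skips blanks, and drops duplicates while keeping the input order:

```python
def accession_query(accessions):
    """Join accessions into one ESearch term, each tagged with [Accession]."""
    # dict.fromkeys removes duplicates but preserves first-seen order.
    unique = dict.fromkeys(a.strip().upper() for a in accessions if a.strip())
    return " OR ".join(f"{acc}[Accession]" for acc in unique)


print(accession_query(["SRP123456", "srr123456", "SRP123456"]))
# SRP123456[Accession] OR SRR123456[Accession]
```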
Fetch full records (EFetch):

GET efetch.fcgi?db=sra&id=IDS&rettype=full&retmode=xml

`IDS` is a comma-separated list of SRA UIDs returned by ESearch.
Example -- fetch metadata for an SRA record:
GET efetch.fcgi?db=sra&id=28574913&rettype=full&retmode=xml
Response (abbreviated XML):
```xml
<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <EXPERIMENT accession="SRX12345" alias="...">
      <TITLE>RNA-seq of human liver tissue</TITLE>
      <STUDY_REF accession="SRP12345"/>
      <DESIGN>
        <LIBRARY_DESCRIPTOR>
          <LIBRARY_STRATEGY>RNA-Seq</LIBRARY_STRATEGY>
          <LIBRARY_SOURCE>TRANSCRIPTOMIC</LIBRARY_SOURCE>
          <LIBRARY_SELECTION>cDNA</LIBRARY_SELECTION>
          <LIBRARY_LAYOUT><PAIRED/></LIBRARY_LAYOUT>
        </LIBRARY_DESCRIPTOR>
      </DESIGN>
      <PLATFORM>
        <ILLUMINA><INSTRUMENT_MODEL>Illumina NovaSeq 6000</INSTRUMENT_MODEL></ILLUMINA>
      </PLATFORM>
    </EXPERIMENT>
    <SUBMISSION accession="SRA12345" center_name="GEO"/>
    <Organization><Name>Some Institute</Name></Organization>
    <STUDY accession="SRP12345">
      <DESCRIPTOR>
        <STUDY_TITLE>Transcriptomic analysis of human tissues</STUDY_TITLE>
        <STUDY_TYPE existing_study_type="Transcriptome Analysis"/>
      </DESCRIPTOR>
    </STUDY>
    <SAMPLE accession="SRS12345">
      <TITLE>Human liver RNA</TITLE>
      <SAMPLE_ATTRIBUTES>
        <SAMPLE_ATTRIBUTE><TAG>tissue</TAG><VALUE>liver</VALUE></SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE><TAG>cell_type</TAG><VALUE>hepatocyte</VALUE></SAMPLE_ATTRIBUTE>
      </SAMPLE_ATTRIBUTES>
    </SAMPLE>
    <RUN_SET>
      <RUN accession="SRR12345" total_spots="45000000" total_bases="13500000000">
        <Statistics nreads="2">
          <Read index="0" average="150" count="45000000"/>
          <Read index="1" average="150" count="45000000"/>
        </Statistics>
      </RUN>
    </RUN_SET>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>
```
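
The EXPERIMENT_PACKAGE XML nests the commonly needed fields at different depths. The sketch below flattens each package with Python's standard `xml.etree.ElementTree`. The function name `summarize_packages` and the dictionary keys are our own choices, and the element paths follow the example response above; real records can omit elements (a missing `PLATFORM` would raise here), so production code should check for `None` more thoroughly.

```python
import xml.etree.ElementTree as ET


def summarize_packages(xml_text):
    """Flatten each EXPERIMENT_PACKAGE in an EFetch response into a dict."""
    records = []
    for pkg in ET.fromstring(xml_text).iter("EXPERIMENT_PACKAGE"):
        exp = pkg.find("EXPERIMENT")
        lib = exp.find("DESIGN/LIBRARY_DESCRIPTOR")
        layout = lib.find("LIBRARY_LAYOUT")
        # PLATFORM wraps a single vendor element (e.g. ILLUMINA) holding the model.
        vendor = exp.find("PLATFORM")[0]
        study_ref = exp.find("STUDY_REF")
        sample = pkg.find("SAMPLE")
        records.append({
            "experiment": exp.get("accession"),
            "study": study_ref.get("accession") if study_ref is not None else None,
            "strategy": lib.findtext("LIBRARY_STRATEGY"),
            "source": lib.findtext("LIBRARY_SOURCE"),
            # LIBRARY_LAYOUT contains an empty <PAIRED/> or <SINGLE/> child.
            "layout": layout[0].tag if layout is not None and len(layout) else None,
            "platform": vendor.tag,
            "instrument": vendor.findtext("INSTRUMENT_MODEL"),
            "sample": sample.get("accession") if sample is not None else None,
            "attributes": {
                attr.findtext("TAG"): attr.findtext("VALUE")
                for attr in pkg.iter("SAMPLE_ATTRIBUTE")
            },
            # (run accession, spots, bases); missing counts default to 0.
            "runs": [
                (run.get("accession"),
                 int(run.get("total_spots") or 0),
                 int(run.get("total_bases") or 0))
                for run in pkg.iter("RUN")
            ],
        })
    return records
```

Returning plain dictionaries keeps the result easy to load into a table (for example one row per run) without tying the parser to any particular dataframe library.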
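Summaries (ESummary):

GET esummary.fcgi?db=sra&id=IDS&retmode=json

Returns JSON with the experiment title, platform, total runs/spots/bases, create date, and study/sample accessions. Much of this is packed as escaped XML strings in each record's `expxml` and `runs` fields, so it needs a second parse.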
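
Because `expxml` and `runs` are XML fragments embedded in JSON strings, they need their own parse. A minimal sketch, assuming the usual ESummary JSON layout (`result` holds a `uids` list plus one object per UID). Each fragment can have several top-level elements and no single root, so it is wrapped in a synthetic `<root>` element first. The function name is hypothetical, and element names you query afterwards (such as `Summary/Title`) are assumptions to check against a live response.

```python
import json
import xml.etree.ElementTree as ET


def parse_esummary(body):
    """Map each UID to (expxml root element, list of run attribute dicts)."""
    result = json.loads(body)["result"]
    parsed = {}
    for uid in result["uids"]:
        record = result[uid]
        # Wrap each fragment so ElementTree sees exactly one root element.
        exp = ET.fromstring(f"<root>{record.get('expxml', '')}</root>")
        runs = ET.fromstring(f"<root>{record.get('runs', '')}</root>")
        # Each direct child of the wrapped runs fragment describes one run.
        parsed[uid] = (exp, [dict(run.attrib) for run in runs])
    return parsed
```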
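Linked records (ELink):

GET elink.fcgi?dbfrom=sra&db=biosample&id=SRA_UID
GET elink.fcgi?dbfrom=sra&db=gds&id=SRA_UID

The first returns the BioSample records linked to an SRA UID; the second returns linked GEO DataSets (`gds`) records.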
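
ELink returns XML by default, grouping results into `LinkSet` elements. Repeating the `id` parameter (instead of comma-joining UIDs) asks ELink to link each UID separately, which preserves the mapping from each SRA record to its BioSample or GEO records. A sketch with hypothetical helper names; the parser assumes the standard `eLinkResult` / `LinkSet` / `LinkSetDb` layout.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(uids, db):
    """ELink from SRA UIDs to another database, one id= parameter per UID."""
    params = [("dbfrom", "sra"), ("db", db)] + [("id", uid) for uid in uids]
    return EUTILS + "elink.fcgi?" + urlencode(params)


def parse_elink(xml_text):
    """Return {source UID(s): {target db: [linked UIDs]}} from ELink XML."""
    links = {}
    for linkset in ET.fromstring(xml_text).iter("LinkSet"):
        # With comma-joined ids a LinkSet lists several sources; join them as the key.
        source = ",".join(i.text for i in linkset.findall("IdList/Id"))
        targets = links.setdefault(source, {})
        for db_block in linkset.findall("LinkSetDb"):
            targets.setdefault(db_block.findtext("DbTo"), []).extend(
                i.text for i in db_block.findall("Link/Id"))
    return links
```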
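Accession prefixes (the first letter identifies the archive: S = NCBI SRA, E = EBI ENA, D = DDBJ):

| Prefix | Entity |
|---|---|
| SRP / ERP / DRP | Study |
| SRX / ERX / DRX | Experiment |
| SRS / ERS / DRS | Sample |
| SRR / ERR / DRR | Run |
| SRA / ERA / DRA | Submission |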
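
The prefixes follow a fixed pattern: the first letter names the archive (S for NCBI, E for EBI's ENA, D for DDBJ), the second is always R, and the third gives the entity type. A sketch that decodes an accession on that basis; the function name is hypothetical, and it only checks the prefix shape, not whether the accession actually exists.

```python
import re

# First letter: archive; third letter: entity type.
ARCHIVES = {"S": "NCBI SRA", "E": "EBI ENA", "D": "DDBJ"}
ENTITIES = {"P": "Study", "X": "Experiment", "S": "Sample", "R": "Run", "A": "Submission"}
ACCESSION = re.compile(r"([SED])R([PXSRA])\d+")


def classify_accession(accession):
    """Return (archive, entity) for an SRA-style accession such as SRR12345."""
    match = ACCESSION.fullmatch(accession.strip().upper())
    if match is None:
        raise ValueError(f"not an SRA-style accession: {accession!r}")
    return ARCHIVES[match.group(1)], ENTITIES[match.group(2)]
```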
Common `term` patterns (shown unencoded; URL-encode them in a real request):

```
# By organism and strategy
term=Mus musculus[Organism] AND WGS[Strategy]
# By platform
term=Illumina[Platform] AND ATAC-seq[Strategy] AND human[Organism]
# By study accession
term=SRP123456[Accession]
# By BioProject
term=PRJNA123456[BioProject]
# By date range
term=("2024/01/01"[Publication Date] : "2024/12/31"[Publication Date])
# By library source
term=GENOMIC[Source] AND ChIP-Seq[Strategy] AND cancer[Text Word]
# By read length range (bp)
term=100:150[ReadLength]
# Combined complex query
term=(RNA-Seq[Strategy] AND paired[Layout] AND Homo sapiens[Organism] AND Illumina[Platform])
```
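
These patterns compose mechanically: field-tagged terms joined with `AND`/`OR`, plus the `"start"[field] : "end"[field]` range form. A sketch of small builder functions (all names hypothetical) that reproduce the date-range and combined queries and URL-encode the result:

```python
from urllib.parse import quote_plus


def tagged(value, field):
    """Attach an Entrez field tag, e.g. tagged('WGS', 'Strategy') -> 'WGS[Strategy]'."""
    return f"{value}[{field}]"


def all_of(*terms):
    """AND several terms together inside one pair of parentheses."""
    return "(" + " AND ".join(terms) + ")"


def date_range(start, end, field="Publication Date"):
    """Entrez range syntax: ("start"[field] : "end"[field]), dates as YYYY/MM/DD."""
    return f'("{start}"[{field}] : "{end}"[{field}])'


def term_param(term):
    """URL-encode a finished query for the term= parameter."""
    return "term=" + quote_plus(term)
```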
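For large result sets, run ESearch with `usehistory=y`, pass the returned `WebEnv`/`query_key` to EFetch or ESummary, and fetch in batches. The E-utilities return metadata only; download the reads themselves with the SRA Toolkit (`fastq-dump`/`fasterq-dump`) or the SRA cloud URLs.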
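
The batched fetch can be sketched as follows. ESearch's JSON output with `usehistory=y` reports the two history values as `webenv` and `querykey`; EFetch takes them as `WebEnv` and `query_key` and pages with `retstart`/`retmax`. The function name and the batch size of 500 are assumptions, and this sketch only builds the URLs.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def history_fetch_urls(count, webenv, query_key, batch_size=500, api_key=None):
    """Yield EFetch URLs covering `count` stored hits in `batch_size` chunks."""
    for start in range(0, count, batch_size):
        params = {
            "db": "sra",
            "query_key": query_key,
            "WebEnv": webenv,
            "retstart": start,  # 0-based offset into the stored result set
            "retmax": batch_size,
            "rettype": "full",
            "retmode": "xml",
        }
        if api_key:
            params["api_key"] = api_key
        yield EUTILS + "efetch.fcgi?" + urlencode(params)
```

When requesting these URLs in a loop, pause between calls (for example with `time.sleep`) so the client stays within 3 requests per second without an API key, or 10 with one.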