scientific-skills/citation-management/references/pubmed_search.md
Comprehensive guide to searching PubMed for biomedical and life sciences literature, including MeSH terms, field tags, advanced search strategies, and E-utilities API usage.
PubMed is the premier database for biomedical literature:
PubMed automatically maps terms to MeSH and searches multiple fields:
diabetes
CRISPR gene editing
Alzheimer's disease treatment
cancer immunotherapy
Automatic Features:
Use quotation marks for exact phrases:
"CRISPR-Cas9"
"systematic review"
"randomized controlled trial"
"machine learning"
MeSH is a controlled vocabulary thesaurus for indexing biomedical literature:
MeSH Browser: https://meshb.nlm.nih.gov/search
Example:
Search: "heart attack"
MeSH term: "Myocardial Infarction"
In PubMed:
Basic MeSH search:
"Diabetes Mellitus"[MeSH]
"CRISPR-Cas Systems"[MeSH]
"Alzheimer Disease"[MeSH]
"Neoplasms"[MeSH]
MeSH with subheadings:
"Diabetes Mellitus/drug therapy"[MeSH]
"Neoplasms/genetics"[MeSH]
"Heart Failure/prevention and control"[MeSH]
Common subheadings:
/drug therapy: Drug treatment
/diagnosis: Diagnostic aspects
/genetics: Genetic aspects
/epidemiology: Occurrence and distribution
/prevention and control: Prevention methods
/etiology: Causes
/surgery: Surgical treatment
/metabolism: Metabolic aspects
By default, MeSH searches include narrower terms (explosion):
"Neoplasms"[MeSH]
# Includes: Breast Neoplasms, Lung Neoplasms, etc.
Disable explosion (exact term only):
"Neoplasms"[MeSH:NoExp]
Search only where MeSH term is a major focus:
"Diabetes Mellitus"[MeSH Major Topic]
# Only papers where diabetes is main topic
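The MeSH variants above (subheadings, explosion control, major topic) differ only in how the term and tag are assembled, so they can be generated by one small function. The following is a minimal sketch: `mesh_term` is a hypothetical helper name, not part of any PubMed library, and it only builds the query string without checking that the heading exists in MeSH. Combining major topic with `:NoExp` in the long tag form is an assumption here.

```python
def mesh_term(heading, subheading=None, explode=True, major=False):
    """Format a MeSH heading as a PubMed search term.

    Hypothetical helper: it assembles the tag syntax shown above and
    does not validate the heading or subheading against MeSH.
    """
    # A subheading is attached to the heading with a slash.
    term = f"{heading}/{subheading}" if subheading else heading
    # Major-topic searches use a different tag name.
    tag = "MeSH Major Topic" if major else "MeSH"
    # ":NoExp" switches off the default inclusion of narrower terms.
    if not explode:
        tag += ":NoExp"
    return f'"{term}"[{tag}]'


print(mesh_term("Diabetes Mellitus", subheading="drug therapy"))
print(mesh_term("Neoplasms", explode=False))
```

The two calls print `"Diabetes Mellitus/drug therapy"[MeSH]` and `"Neoplasms"[MeSH:NoExp]`, matching the hand-written examples above.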
Field tags specify which part of the record to search.
Title and Abstract:
cancer[Title] # In title only
treatment[Title/Abstract] # In title or abstract
"machine learning"[Title/Abstract]
Author:
"Smith J"[Author]
"Doudna JA"[Author]
"Collins FS"[Author]
Author - Full Name:
"Smith, John"[Full Author Name]
Journal:
"Nature"[Journal]
"Science"[Journal]
"New England Journal of Medicine"[Journal]
"Nat Commun"[Journal] # Abbreviated form
Publication Date:
2023[Publication Date]
2020:2024[Publication Date] # Date range
2023/01/01:2023/12/31[Publication Date]
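When date ranges come from a script rather than being typed by hand, the `YYYY/MM/DD:YYYY/MM/DD` form above can be produced from `datetime.date` objects. This is a small sketch; `date_range` is a hypothetical helper name and the default field is simply the tag used in the examples above.

```python
from datetime import date


def date_range(start, end, field="Publication Date"):
    """Return a PubMed date-range term such as 2023/01/01:2023/12/31[Publication Date]."""
    # strftime-style codes inside the format spec zero-pad month and day.
    return f"{start:%Y/%m/%d}:{end:%Y/%m/%d}[{field}]"


print(date_range(date(2023, 1, 1), date(2023, 12, 31)))
```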
Date Created:
2023[Date - Create] # When added to PubMed
Publication Type:
"Review"[Publication Type]
"Clinical Trial"[Publication Type]
"Meta-Analysis"[Publication Type]
"Randomized Controlled Trial"[Publication Type]
Language:
English[Language]
French[Language]
DOI:
10.1038/nature12345[DOI]
PMID (PubMed ID):
12345678[PMID]
Article ID:
PMC1234567[PMC] # PubMed Central ID
humans[MeSH Terms] # Only human studies
animals[MeSH Terms] # Only animal studies
"United States"[Place of Publication]
nih[Grant Number] # NIH-funded research
"Female"[MeSH Terms] # Female subjects
"Aged, 80 and over"[MeSH Terms] # Elderly subjects
Combine search terms with Boolean logic.
Both terms must be present (default behavior):
diabetes AND treatment
"CRISPR-Cas9" AND "gene editing"
cancer AND immunotherapy AND "clinical trial"[Publication Type]
Either term must be present:
"heart attack" OR "myocardial infarction"
diabetes OR "diabetes mellitus"
CRISPR OR Cas9 OR "gene editing"
Use case: Synonyms and related terms
Exclude terms:
cancer NOT review
diabetes NOT animal
"machine learning" NOT "deep learning"
Caution: May exclude relevant papers that mention both terms.
Use parentheses for complex logic:
(diabetes OR "diabetes mellitus") AND (treatment OR therapy)
("CRISPR" OR "gene editing") AND ("therapeutic" OR "therapy")
AND 2020:2024[Publication Date]
(cancer OR neoplasm) AND (immunotherapy OR "immune checkpoint inhibitor")
AND ("clinical trial"[Publication Type] OR "randomized controlled trial"[Publication Type])
Access: https://pubmed.ncbi.nlm.nih.gov/advanced/
Features:
Workflow:
Example built query:
#1: "Diabetes Mellitus, Type 2"[MeSH]
#2: "Metformin"[MeSH]
#3: "Clinical Trial"[Publication Type]
#4: 2020:2024[Publication Date]
#5: #1 AND #2 AND #3 AND #4
"Review"[Publication Type]
"Systematic Review"[Publication Type]
"Meta-Analysis"[Publication Type]
"Clinical Trial"[Publication Type]
"Randomized Controlled Trial"[Publication Type]
"Case Reports"[Publication Type]
"Comparative Study"[Publication Type]
humans[MeSH Terms]
mice[MeSH Terms]
rats[MeSH Terms]
"Female"[MeSH Terms]
"Male"[MeSH Terms]
"Infant"[MeSH Terms]
"Child"[MeSH Terms]
"Adolescent"[MeSH Terms]
"Adult"[MeSH Terms]
"Aged"[MeSH Terms]
"Aged, 80 and over"[MeSH Terms]
free full text[Filter] # Free full-text available
"Journal Article"[Publication Type]
NCBI provides programmatic access via E-utilities (Entrez Programming Utilities).
Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/
Main Tools (covered below): ESearch, EFetch, ESummary, ELink
No API key required, but recommended for:
Get API key: https://www.ncbi.nlm.nih.gov/account/
Retrieve PMIDs for a query.
Endpoint: /esearch.fcgi
Parameters:
db: Database (pubmed)
term: Search query
retmax: Maximum results (default 20, max 10000)
retstart: Starting position (for pagination)
retmode: Response format (xml by default, or json)
sort: Sort order (relevance, pub_date, author)
api_key: Your API key (optional but recommended)
Example URL:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?
db=pubmed&
term=diabetes+AND+treatment&
retmax=100&
retmode=json&
api_key=YOUR_API_KEY
Response:
{
"esearchresult": {
"count": "250000",
"retmax": "100",
"idlist": ["12345678", "12345679", ...]
}
}
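The request and response above can be handled with the Python standard library alone. The sketch below builds the ESearch URL from the documented parameters and parses a JSON body of the shape shown; `build_esearch_url` and `parse_esearch` are hypothetical helper names, and the sample body reuses the placeholder values from the example response rather than real data.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_url(term, retmax=100, retstart=0, api_key=None):
    """Build an ESearch URL for PubMed that requests a JSON response."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": retmax,
        "retstart": retstart,  # offset of the first result, for pagination
        "retmode": "json",
    }
    if api_key:
        params["api_key"] = api_key
    # urlencode escapes spaces as "+" and percent-encodes quotes and brackets.
    return BASE_URL + "esearch.fcgi?" + urlencode(params)


def parse_esearch(body):
    """Return (total_count, pmids) from an ESearch JSON response body."""
    result = json.loads(body)["esearchresult"]
    return int(result["count"]), result["idlist"]


# Placeholder body with the same shape as the example response above.
SAMPLE_RESPONSE = (
    '{"esearchresult": {"count": "250000", "retmax": "100", '
    '"idlist": ["12345678", "12345679"]}}'
)

print(build_esearch_url("diabetes AND treatment"))
print(parse_esearch(SAMPLE_RESPONSE))
```

To run a live query, the URL can be passed to `urllib.request.urlopen` and the decoded body to `parse_esearch`. For result sets larger than `retmax`, repeating the call with `retstart` increased by `retmax` each time walks through the pages.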
Get full metadata for PMIDs.
Endpoint: /efetch.fcgi
Parameters:
db: Database (pubmed)
id: Comma-separated PMIDs
retmode: Format (xml or text; EFetch does not return JSON for PubMed)
rettype: Type (abstract, medline; omit for the full XML record)
api_key: Your API key
Example URL:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?
db=pubmed&
id=12345678,12345679&
retmode=xml&
api_key=YOUR_API_KEY
Response: XML with complete metadata including:
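A few of the most commonly needed fields can be pulled out of the EFetch XML with `xml.etree.ElementTree`. This is a sketch under stated assumptions: `parse_efetch` is a hypothetical helper, the sample document is a heavily trimmed placeholder (real records carry many more elements), and only PMID, title, journal, and abstract are extracted.

```python
import xml.etree.ElementTree as ET

# Trimmed placeholder record; values are illustrative, not real data.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345678</PMID>
      <Article>
        <Journal><Title>Example Journal</Title></Journal>
        <ArticleTitle>Placeholder title</ArticleTitle>
        <Abstract><AbstractText>Placeholder abstract.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def _text(element):
    """Return all text inside an element, including nested inline markup."""
    return "".join(element.itertext()).strip() if element is not None else None


def parse_efetch(xml_text):
    """Extract PMID, title, journal, and abstract from EFetch PubMed XML."""
    records = []
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        # Structured abstracts contain several AbstractText sections.
        sections = citation.findall("Article/Abstract/AbstractText")
        records.append({
            "pmid": _text(citation.find("PMID")),
            "title": _text(citation.find("Article/ArticleTitle")),
            "journal": _text(citation.find("Article/Journal/Title")),
            "abstract": " ".join(_text(s) for s in sections) or None,
        })
    return records


print(parse_efetch(SAMPLE_XML))
```

`itertext()` is used instead of `.text` because titles and abstracts may contain inline tags such as italics or subscripts, which would otherwise truncate the extracted string.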
Lighter-weight alternative to EFetch.
Example:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?
db=pubmed&
id=12345678&
retmode=json&
api_key=YOUR_API_KEY
Returns: Key metadata without full abstract and details.
Find related articles or links to other databases.
Example:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?
dbfrom=pubmed&
db=pubmed&
id=12345678&
linkname=pubmed_pubmed_citedin
Link types:
pubmed_pubmed: Related articles
pubmed_pubmed_citedin: Papers citing this article
pubmed_pmc: PMC full-text versions
pubmed_protein: Related protein records
Without API key: 3 requests/second
With API key: 10 requests/second
Best practice:
import time
time.sleep(0.34) # ~3 requests/second
# or
time.sleep(0.11) # ~10 requests/second with API key
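A fixed `time.sleep` after every call also waits when the previous request was already slow. One alternative is to sleep only for the time still missing since the last request. The class below is a minimal sketch of that idea; `RateLimiter` is a hypothetical name, and the injectable `clock` and `sleep` arguments exist only so the behavior can be tested without real delays.

```python
import time


class RateLimiter:
    """Space out calls so that at most `requests_per_second` are issued."""

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the minimum interval since the last call has passed."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Typical use is to create one limiter (`RateLimiter(3)` without an API key, `RateLimiter(10)` with one) and call `limiter.wait()` immediately before each E-utilities request.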
Get API key:
Use in requests:
&api_key=YOUR_API_KEY_HERE
Store securely:
# In environment variable
export NCBI_API_KEY="your_key_here"
# In script
import os
api_key = os.getenv('NCBI_API_KEY')
For systematic reviews and meta-analyses:
# 1. Identify key concepts
Concept 1: Diabetes
Concept 2: Treatment
Concept 3: Outcomes
# 2. Find MeSH terms and synonyms
Concept 1: "Diabetes Mellitus"[MeSH] OR diabetes OR diabetic
Concept 2: "Drug Therapy"[MeSH] OR treatment OR therapy OR medication
Concept 3: "Treatment Outcome"[MeSH] OR outcome OR efficacy OR effectiveness
# 3. Combine with AND
("Diabetes Mellitus"[MeSH] OR diabetes OR diabetic)
AND ("Drug Therapy"[MeSH] OR treatment OR therapy OR medication)
AND ("Treatment Outcome"[MeSH] OR outcome OR efficacy OR effectiveness)
# 4. Add filters
AND 2015:2024[Publication Date]
AND ("Clinical Trial"[Publication Type] OR "Randomized Controlled Trial"[Publication Type])
AND English[Language]
AND humans[MeSH Terms]
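The four steps above follow a fixed pattern: OR the synonyms within each concept, then AND the concepts and filters together. That pattern can be sketched in a few lines of Python. `or_group` and `build_query` are hypothetical helper names, and the terms are the ones from the walkthrough; a filter that itself contains OR should be passed in already parenthesized.

```python
def or_group(terms):
    """Join synonyms for one concept with OR, wrapped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts, filters=()):
    """AND together one OR-group per concept, followed by any filters."""
    parts = [or_group(terms) for terms in concepts]
    parts.extend(filters)
    return " AND ".join(parts)


concepts = [
    ['"Diabetes Mellitus"[MeSH]', "diabetes", "diabetic"],
    ['"Drug Therapy"[MeSH]', "treatment", "therapy", "medication"],
    ['"Treatment Outcome"[MeSH]', "outcome", "efficacy", "effectiveness"],
]
filters = [
    "2015:2024[Publication Date]",
    or_group(['"Clinical Trial"[Publication Type]',
              '"Randomized Controlled Trial"[Publication Type]']),
    "English[Language]",
    "humans[MeSH Terms]",
]

print(build_query(concepts, filters))
```

Keeping concepts as lists makes it easy to record the exact strategy alongside the results, which helps when the search has to be documented or rerun later.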
# Specific disease + clinical trials
"Alzheimer Disease"[MeSH]
AND ("Clinical Trial"[Publication Type]
OR "Randomized Controlled Trial"[Publication Type])
AND 2020:2024[Publication Date]
# Specific drug trials
"Metformin"[MeSH]
AND "Diabetes Mellitus, Type 2"[MeSH]
AND "Randomized Controlled Trial"[Publication Type]
# Systematic reviews on topic
"CRISPR-Cas Systems"[MeSH]
AND ("Systematic Review"[Publication Type] OR "Meta-Analysis"[Publication Type])
# Reviews in high-impact journals
cancer immunotherapy
AND "Review"[Publication Type]
AND ("Nature"[Journal] OR "Science"[Journal] OR "Cell"[Journal])
# Papers from last year
"machine learning"[Title/Abstract]
AND "drug discovery"[Title/Abstract]
AND 2024[Publication Date]
# Recent papers in specific journal
"CRISPR"[Title/Abstract]
AND "Nature"[Journal]
AND 2023:2024[Publication Date]
# Specific author's recent work
"Doudna JA"[Author] AND 2020:2024[Publication Date]
# Author + topic
"Church GM"[Author] AND "synthetic biology"[Title/Abstract]
# Meta-analyses and systematic reviews
(diabetes OR "diabetes mellitus")
AND (treatment OR therapy)
AND ("Meta-Analysis"[Publication Type] OR "Systematic Review"[Publication Type])
# RCTs only
cancer immunotherapy
AND "Randomized Controlled Trial"[Publication Type]
AND 2020:2024[Publication Date]
Basic search:
python scripts/search_pubmed.py "diabetes treatment"
With MeSH terms:
python scripts/search_pubmed.py \
--query '"Diabetes Mellitus"[MeSH] AND "Drug Therapy"[MeSH]'
Date range filter:
python scripts/search_pubmed.py "CRISPR" \
--date-start 2020-01-01 \
--date-end 2024-12-31 \
--limit 200
Publication type filter:
python scripts/search_pubmed.py "cancer immunotherapy" \
--publication-types "Clinical Trial,Randomized Controlled Trial" \
--limit 100
Export to BibTeX:
python scripts/search_pubmed.py "Alzheimer's disease" \
--limit 100 \
--format bibtex \
--output alzheimers.bib
Complex query from file:
# Save complex query in query.txt
cat > query.txt << 'EOF'
("Diabetes Mellitus, Type 2"[MeSH] OR "diabetes"[Title/Abstract])
AND ("Metformin"[MeSH] OR "metformin"[Title/Abstract])
AND "Randomized Controlled Trial"[Publication Type]
AND 2015:2024[Publication Date]
AND English[Language]
EOF
# Run search
python scripts/search_pubmed.py --query-file query.txt --limit 500
# Search multiple topics
TOPICS=("diabetes treatment" "cancer immunotherapy" "CRISPR gene editing")
for topic in "${TOPICS[@]}"; do
python scripts/search_pubmed.py "$topic" \
--limit 100 \
--output "${topic// /_}.json"
sleep 1
done
# Search returns PMIDs
python scripts/search_pubmed.py "topic" --output results.json
# Extract full metadata
python scripts/extract_metadata.py \
--input results.json \
--output references.bib
Start with MeSH terms:
Include text word variants:
# Better coverage
("Diabetes Mellitus"[MeSH] OR diabetes OR diabetic)
Use field tags appropriately:
[MeSH] for standardized concepts
[Title/Abstract] for specific terms
[Author] for known authors
[Journal] for specific venues
Build incrementally:
# Step 1: Basic search
diabetes
# Step 2: Add specificity
"Diabetes Mellitus, Type 2"[MeSH]
# Step 3: Add treatment
"Diabetes Mellitus, Type 2"[MeSH] AND "Metformin"[MeSH]
# Step 4: Add study type
"Diabetes Mellitus, Type 2"[MeSH] AND "Metformin"[MeSH]
AND "Clinical Trial"[Publication Type]
# Step 5: Add date range
... AND 2020:2024[Publication Date]
Too many results: Add filters
[MeSH Major Topic]
Too few results: Broaden search
Irrelevant results: Refine terms
Document search strategy:
Export systematically:
Validate retrieved citations:
python scripts/validate_citations.py pubmed_results.bib
Set up search alerts:
Track specific journals:
"Nature"[Journal] AND CRISPR[Title]
Follow key authors:
"Church GM"[Author]
Solution:
Solution:
Solution:
Solution:
python scripts/format_bibtex.py results.bib \
--deduplicate \
--output clean.bib
Solution:
PubMed provides authoritative biomedical literature search:
✓ Curated content: MeSH indexing, quality control
✓ Precise search: Field tags, MeSH terms, filters
✓ Programmatic access: E-utilities API
✓ Free access: No subscription required
✓ Comprehensive: 35M+ citations, daily updates
Key strategies:
For broader coverage across disciplines, complement with Google Scholar.