scientific-skills/hypothesis-generation/references/literature_search_strategies.md
A comprehensive literature search is essential for grounding hypotheses in existing evidence. This reference provides strategies for both PubMed (biomedical literature) and general scientific search.
Clarify search goals:
Use WebFetch with PubMed URLs for:
Why: Reviews synthesize literature, identify key concepts, and provide comprehensive reference lists.
Search strategy:
Example searches:
https://pubmed.ncbi.nlm.nih.gov/?term=wound+healing+diabetes+review
https://pubmed.ncbi.nlm.nih.gov/?term=gut+microbiome+cognition+systematic+review
Why: MeSH terms are a standardized vocabulary that captures concept variations.
Strategy:
Example:
AND: Narrow search (all terms must be present)
diabetes AND wound healing AND inflammation
OR: Broaden search (any term can be present)
(Alzheimer OR dementia) AND gut microbiome
NOT: Exclude terms
cancer treatment NOT surgery
Quotes: Exact phrases
"oxidative stress"
Wildcards: Variations
gene* finds gene, genes, genetic, genetics
Publication types:
Date filters:
Strategy:
Mechanistic understanding:
https://pubmed.ncbi.nlm.nih.gov/?term=(mechanism+OR+pathway)+AND+[phenomenon]+AND+(molecular+OR+cellular)
Causal relationships:
https://pubmed.ncbi.nlm.nih.gov/?term=[exposure]+AND+[outcome]+AND+(randomized+controlled+trial+OR+cohort+study)
Biomarkers and associations:
https://pubmed.ncbi.nlm.nih.gov/?term=[biomarker]+AND+[disease]+AND+(association+OR+correlation+OR+prediction)
Treatment effectiveness:
https://pubmed.ncbi.nlm.nih.gov/?term=[intervention]+AND+[condition]+AND+(efficacy+OR+effectiveness+OR+clinical+trial)
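The templates above combine OR groups with AND and leave spaces and parentheses to be encoded. As a minimal sketch of how such a URL could be assembled, the following uses only the Python standard library. The helper name `pubmed_url` and its list-of-synonym-groups input are assumptions made for illustration, not part of any PubMed tooling.

```python
from urllib.parse import urlencode


def pubmed_url(*groups):
    """Build a PubMed search URL from groups of alternative terms.

    Terms inside a group are joined with OR; groups are joined with AND.
    (Hypothetical helper for illustration only.)
    """
    clauses = []
    for group in groups:
        clause = " OR ".join(group)
        # Parenthesize only when the group actually offers alternatives.
        clauses.append(f"({clause})" if len(group) > 1 else clause)
    query = " AND ".join(clauses)
    # urlencode percent-encodes parentheses and turns spaces into '+'.
    return "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode({"term": query})


url = pubmed_url(
    ["mechanism", "pathway"],
    ["wound healing"],
    ["molecular", "cellular"],
)
print(url)
```

Here the bracketed placeholder `[phenomenon]` from the mechanistic template is filled with `wound healing`. Encoding the parentheses keeps the grouping intact when the URL is passed to a fetch tool.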
Use WebSearch for:
Include field-specific terminology:
Search operators:
site:arxiv.org - Preprints (physics, CS, math, quantitative biology)
site:biorxiv.org - Biology preprints
site:edu - Academic institutions
filetype:pdf - Academic papers (often)
Example searches:
superconductivity high temperature mechanism site:arxiv.org
CRISPR off-target effects site:biorxiv.org
When you find a relevant paper:
Strategies:
Structure:
Boolean logic:
"spike protein mutation"
(transmissibility OR transmission rate)
"spike protein" AND (transmissibility OR virulence) AND mutation
Start with reviews (PubMed or Web Search):
Focused primary research (PubMed):
Broaden with web search:
Citation mining:
Iterative refinement:
Goal: Understand how something works
Search components:
Examples:
diabetic wound healing mechanism inflammation
autophagy pathway cancer
Goal: Find what factors are related
Search components:
Examples:
vitamin D cardiovascular disease association
gut microbiome diversity predicts cognitive function
Goal: Evidence for what works
Search components:
Examples:
probiotic intervention depression randomized controlled trial
exercise intervention cognitive decline efficacy
Goal: How to test hypothesis
Search components:
Examples:
CRISPR screen cancer drug resistance
measure protein-protein interaction methods
Goal: Find insights from related phenomena
Search components:
Examples:
nitrogen fixation rhizobia legumes
antibiotic resistance evolution mechanisms
Citation counts indicate influence and importance in the field. Interpret citations relative to paper age and field norms:
| Paper Age | Citations | Interpretation |
|---|---|---|
| 0-3 years | 20+ | Noteworthy - gaining traction |
| 0-3 years | 100+ | Highly Influential - significant impact already |
| 3-7 years | 100+ | Significant - established contribution |
| 3-7 years | 500+ | Landmark - major contribution to field |
| 7+ years | 500+ | Seminal - widely recognized important work |
| 7+ years | 1000+ | Foundational - field-defining paper |
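The thresholds in the table can be applied mechanically when triaging a long result list. The sketch below is one possible encoding in Python. The function name `interpret_citations` is hypothetical, and the treatment of boundary ages (a paper exactly 3 years old falls in the 3-7 band, one exactly 7 years old in the 7+ band) is an assumption, since the table's ranges overlap at their edges. It does not adjust for field norms.

```python
def interpret_citations(age_years, citations):
    """Return the table's label for a paper, or None if below every threshold."""
    # Each band: (min age inclusive, max age exclusive, thresholds high to low).
    # Half-open age intervals are an assumption; the table does not specify them.
    bands = [
        (0, 3, [(100, "Highly Influential"), (20, "Noteworthy")]),
        (3, 7, [(500, "Landmark"), (100, "Significant")]),
        (7, float("inf"), [(1000, "Foundational"), (500, "Seminal")]),
    ]
    for low, high, thresholds in bands:
        if low <= age_years < high:
            # Check the higher threshold first so the strongest label wins.
            for minimum, label in thresholds:
                if citations >= minimum:
                    return label
            return None
    return None


print(interpret_citations(2, 150))
print(interpret_citations(5, 50))
```

A `None` result only means the paper does not clear the table's thresholds for its age. It may still be worth reading on other quality signals.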
Field-specific considerations:
Tier 1 - Premier Venues (Always Prefer):
Tier 2 - High-Impact Specialized (Strong Preference):
Tier 3 - Respected Specialized (Include When Relevant):
Tier 4 - Other Peer-Reviewed (Use Sparingly):
Prefer papers from established researchers:
Strong Author Indicators:
How to Check Author Reputation:
For ML/AI and computer science topics, conference rankings matter:
A* (Flagship) - Equivalent to Nature/Science:
A (Excellent) - Equivalent to Tier-2 Journals:
B (Good) - Equivalent to Tier-3 Journals:
Strong quality signals:
Red flags:
Systematic reviews (highest quality):
Narrative reviews (variable quality):
For straightforward hypotheses (30-60 min):
For complex hypotheses (1-3 hours):
For contentious topics (3+ hours):
Signs you've searched enough:
When to search more:
For each relevant paper:
Group by:
Synthesis notes:
For report structure: Organize citations for two audiences:
Main Text (15-20 key citations):
Appendix A: Comprehensive Literature Review (40-60+ citations):
Target citation density: Aim for 50+ total references to provide comprehensive support for all claims and demonstrate thorough literature grounding.
Grouping strategy for Appendix A:
Define search goals (5 min):
Broad review search (15-20 min):
Targeted primary research (30-45 min):
Cross-domain search (15-30 min):
Citation mining (15-30 min):
Synthesize findings (20-30 min):
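Adding up the per-step estimates above gives a rough overall time budget for one pass through the workflow. The short Python sketch below does that arithmetic; the dictionary simply restates the step names and minute ranges listed above, and the variable names are illustrative.

```python
# (low, high) minute estimates, copied from the workflow steps above.
steps = {
    "Define search goals": (5, 5),
    "Broad review search": (15, 20),
    "Targeted primary research": (30, 45),
    "Cross-domain search": (15, 30),
    "Citation mining": (15, 30),
    "Synthesize findings": (20, 30),
}

total_low = sum(low for low, _ in steps.values())
total_high = sum(high for _, high in steps.values())
print(f"Total: {total_low}-{total_high} min")  # Total: 100-160 min
```

That is roughly 1 hour 40 minutes to 2 hours 40 minutes before any iterative refinement.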
When initial search is insufficient:
Red flags requiring more search:
Confirmation bias: Only seeking evidence supporting preferred hypothesis
Recency bias: Only considering recent work, missing foundational studies
Too narrow: Missing relevant work due to restrictive terms
Too broad: Overwhelmed by irrelevant results
Single database: Missing important work in other fields
Stopping too soon: Insufficient evidence to ground hypotheses
Cherry-picking: Citing only supportive papers
When little published work exists:
When evidence is contradictory:
When spanning multiple fields:
Direct applications:
Indirect applications:
Too literature-dependent:
Too literature-independent:
Optimal balance: