# Database Search Strategies
This document provides comprehensive guidance for searching multiple literature databases systematically and effectively.
Database access methods and example queries:

- PubMed: gget skill or WebFetch tool. Example: "CRISPR"[Title] AND "gene editing"[Title/Abstract] AND 2020:2024[Publication Date]
- arXiv example: cat:q-bio.QM AND title:"single cell"
- Other databases are accessed through the gget skill, the bioservices skill, direct download, or a direct API; the gget skill also provides an alphafold command.

For clinical/biomedical reviews, frame the research question with PICO (Population, Intervention, Comparison, Outcome):
Example: "What is the efficacy of CRISPR-Cas9 gene therapy (I) for treating sickle cell disease (P) compared to standard care (C) in improving patient outcomes (O)?"
Identify 2-4 main concepts from your research question.
Example: (1) CRISPR-Cas9 gene editing, (2) sickle cell disease, (3) gene therapy.
List alternative terms, abbreviations, and related concepts.
Tool: use the MeSH (Medical Subject Headings) browser to find standardized terms.
Combine the synonyms within each concept with OR, then join the concepts with AND.
Example: (CRISPR OR Cas9 OR "gene editing") AND ("sickle cell" OR SCD) AND therapy
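The OR-within, AND-between rule is mechanical enough to script, which keeps the same query consistent across databases. A minimal sketch, assuming each concept is given as a list of synonyms; `build_boolean_query` and `quote_term` are hypothetical helper names, not part of any skill:

```python
def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes so databases treat them as phrases."""
    return f'"{term}"' if " " in term else term


def build_boolean_query(concepts: list[list[str]]) -> str:
    """OR together the synonyms of each concept, then AND the concept groups."""
    groups = []
    for synonyms in concepts:
        if not synonyms:
            continue  # skip empty concept lists rather than emitting "()"
        terms = [quote_term(t) for t in synonyms]
        # A single-term concept needs no parentheses.
        groups.append(terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


query = build_boolean_query([
    ["CRISPR", "Cas9", "gene editing"],
    ["sickle cell", "SCD"],
    ["therapy"],
])
print(query)
# (CRISPR OR Cas9 OR "gene editing") AND ("sickle cell" OR SCD) AND therapy
```

Database-specific field tags (for example PubMed's [Title/Abstract]) can be appended to each term before quoting is applied, if a search needs them.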
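Use truncation and wildcards to capture word variants (supported syntax varies by database):

- `*` or `%`: matches any run of characters
- `?`: matches a single character

Example: `genom*` matches genome, genomic, genomics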
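Before running a truncated term, it can help to preview which words it would capture in a list of candidate terms (for instance, terms collected from known relevant abstracts). The sketch below approximates the two wildcard rules with Python's `fnmatch`; it is a local approximation only, since each database applies its own truncation rules, and `expand_truncation` is a hypothetical name:

```python
import fnmatch


def expand_truncation(pattern: str, vocabulary: list[str]) -> list[str]:
    """Return the vocabulary words a truncation/wildcard pattern would match.

    `%` is treated like `*` (any run of characters) and `?` as exactly one
    character. Matching is case-insensitive. Note that fnmatch also treats
    `[...]` as a character class, which databases generally do not.
    """
    glob = pattern.replace("%", "*").lower()
    return [w for w in vocabulary if fnmatch.fnmatchcase(w.lower(), glob)]


vocab = ["genome", "genomic", "genomics", "genetic", "woman", "women"]
print(expand_truncation("genom*", vocab))  # ['genome', 'genomic', 'genomics']
print(expand_truncation("wom?n", vocab))   # ['woman', 'women']
```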
Search at least 3 complementary databases:
| Database | Field Tags | Example |
|---|---|---|
| PubMed | [Title], [Author], [MeSH] | "CRISPR"[Title] AND 2020:2024[DP] |
| arXiv | ti:, au:, cat: | ti:"machine learning" AND cat:q-bio.QM |
| Semantic Scholar | title:, author:, year: | title:"deep learning" year:2020-2024 |
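As one way to run a field-tagged PubMed query programmatically, the sketch below calls NCBI's E-utilities ESearch endpoint directly with the Python standard library, as an alternative to the skills listed earlier. The helper names are illustrative, and the parser assumes ESearch's JSON layout (an `esearchresult` object with `count` and `idlist`):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 100) -> str:
    """Build an ESearch URL for PubMed that requests JSON output."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_esearch(payload: dict) -> tuple[int, list[str]]:
    """Extract the total hit count and the returned PMIDs from ESearch JSON."""
    result = payload["esearchresult"]
    return int(result["count"]), result["idlist"]


def search_pubmed(term: str, retmax: int = 100) -> tuple[int, list[str]]:
    """Run the query. Network call: NCBI allows about 3 requests/second without an API key."""
    with urlopen(build_esearch_url(term, retmax), timeout=30) as resp:
        return parse_esearch(json.load(resp))
```

For example, `search_pubmed('"CRISPR"[Title] AND 2020:2024[DP]', retmax=20)` would return the total hit count plus up to 20 PMIDs, which can then be recorded in the search log.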
Use search_databases.py --deduplicate to remove duplicates across databases.

All searches must be documented for reproducibility:
## Search Strategy
### Database: PubMed
- **Date searched**: 2024-10-25
- **Date range**: 2015-01-01 to 2024-10-25
- **Search string**:
("CRISPR"[Title] OR "Cas9"[Title] OR "gene editing"[Title/Abstract]) AND ("sickle cell disease"[MeSH] OR "SCD"[Title/Abstract]) AND ("gene therapy"[MeSH] OR "therapeutic editing"[Title/Abstract]) AND 2015:2024[Publication Date] AND English[Language]
- **Results**: 247 articles
- **After deduplication**: 189 articles
### Database: bioRxiv
- **Date searched**: 2024-10-25
- **Date range**: 2015-01-01 to 2024-10-25
- **Search string**: "CRISPR" AND "sickle cell" (in title/abstract)
- **Results**: 34 preprints
- **After deduplication**: 28 preprints
### Total Unique Articles
- **Combined results**: 217 unique articles
- **After title screening**: 156 articles
- **After abstract screening**: 89 articles
- **After full-text screening**: 52 articles included in review
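Keeping each search as a structured record makes the log above easy to regenerate and its totals easy to check. A minimal sketch that renders the per-database part of the template; `DatabaseSearch` and `render_search_log` are hypothetical names, and the combined total assumes, as in the example, that the per-database "after deduplication" counts are already disjoint:

```python
from dataclasses import dataclass


@dataclass
class DatabaseSearch:
    """One database search, with the fields the template records."""
    database: str
    date_searched: str
    date_range: str
    search_string: str
    results: int
    after_dedup: int
    unit: str = "articles"


def render_search_log(searches: list[DatabaseSearch]) -> str:
    """Render searches in the markdown layout of the template above."""
    lines = ["## Search Strategy"]
    for s in searches:
        lines += [
            f"### Database: {s.database}",
            f"- **Date searched**: {s.date_searched}",
            f"- **Date range**: {s.date_range}",
            f"- **Search string**: {s.search_string}",
            f"- **Results**: {s.results} {s.unit}",
            f"- **After deduplication**: {s.after_dedup} {s.unit}",
        ]
    # Assumes the deduplicated per-database sets do not overlap.
    total = sum(s.after_dedup for s in searches)
    lines += ["### Total Unique Articles", f"- **Combined results**: {total} unique articles"]
    return "\n".join(lines)
```

With the example figures (189 PubMed articles and 28 bioRxiv preprints after deduplication), the combined line reads 217 unique articles, matching the template.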
Always prioritize papers based on citation count, venue quality, and author reputation. Quality matters more than quantity.
Use citation counts to identify influential work:
| Paper Age | Citations | Classification |
|---|---|---|
| 0-3 years | 20+ | Noteworthy |
| 0-3 years | 100+ | Highly Influential |
| 3-7 years | 100+ | Significant |
| 3-7 years | 500+ | Landmark |
| 7+ years | 500+ | Seminal |
| 7+ years | 1000+ | Foundational |
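The table can be applied mechanically when sorting a large result set. A minimal sketch; the table's age bands overlap at 3 and 7 years, so this assumes half-open bands [0, 3), [3, 7), and 7+, and `classify_by_citations` is a hypothetical name:

```python
from typing import Optional


def classify_by_citations(age_years: float, citations: int) -> Optional[str]:
    """Map a paper's age and citation count to the labels in the table above.

    Returns None when the paper is below every threshold for its age band.
    """
    if age_years < 3:
        tiers = [(100, "Highly Influential"), (20, "Noteworthy")]
    elif age_years < 7:
        tiers = [(500, "Landmark"), (100, "Significant")]
    else:
        tiers = [(1000, "Foundational"), (500, "Seminal")]
    # Check the higher threshold first so the stronger label wins.
    for threshold, label in tiers:
        if citations >= threshold:
            return label
    return None


print(classify_by_citations(2, 150))   # Highly Influential
print(classify_by_citations(10, 700))  # Seminal
```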
Database-Specific Citation Features:
Prioritize papers from higher-tier venues:
Tier 1 (Always Prefer):
Use source:Nature or journal:Nature in Google Scholar.

Tier 2 (High Priority):
Tier 3 (Include When Relevant):
PubMed Journal Filtering:
"Nature"[Journal] OR "Science"[Journal] OR "Cell"[Journal]
Google Scholar Journal Filtering:
source:Nature source:Science source:Cell
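To apply the PubMed journal filter shown above to a topic query, both parts need their own parentheses, because PubMed evaluates Boolean operators left to right. A minimal sketch with hypothetical helper names:

```python
def journal_filter(journals: list[str]) -> str:
    """OR together PubMed [Journal] field tags for a list of journal names."""
    return "(" + " OR ".join(f'"{j}"[Journal]' for j in journals) + ")"


def restrict_to_journals(topic_query: str, journals: list[str]) -> str:
    """AND a topic query with a journal filter; parentheses keep the intended grouping."""
    return f"({topic_query}) AND {journal_filter(journals)}"


print(restrict_to_journals('"CRISPR"[Title]', ["Nature", "Science", "Cell"]))
# ("CRISPR"[Title]) AND ("Nature"[Journal] OR "Science"[Journal] OR "Cell"[Journal])
```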
Finding Influential Work:
Identifying Seminal Papers:
Semantic Scholar Features:
Find papers that cite a key paper:
Review references in key papers:
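Forward citation searching (papers that cite a key paper) and backward searching (its references) can both be done through the Semantic Scholar Graph API. A sketch under these assumptions: the `/paper/{id}/citations` and `/paper/{id}/references` endpoints, a paper ID such as `DOI:10.xxxx/...`, and results that wrap each paper in `citingPaper` or `citedPaper`. The helper names are hypothetical, and only the first page of results is fetched:

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

S2_GRAPH = "https://api.semanticscholar.org/graph/v1/paper"


def citation_url(paper_id: str, direction: str, limit: int = 100) -> str:
    """URL for forward ('citations') or backward ('references') citation lists."""
    if direction not in ("citations", "references"):
        raise ValueError("direction must be 'citations' or 'references'")
    params = urlencode({"fields": "title,year,citationCount", "limit": limit})
    # Keep ':' and '/' so DOI-style IDs like 'DOI:10.1038/xyz' stay intact.
    return f"{S2_GRAPH}/{quote(paper_id, safe=':/')}/{direction}?{params}"


def parse_citation_page(payload: dict, direction: str) -> list[dict]:
    """Unwrap one page of results, skipping entries with no paper record."""
    key = "citingPaper" if direction == "citations" else "citedPaper"
    return [item[key] for item in payload.get("data", []) if item.get(key)]


def fetch_citation_page(paper_id: str, direction: str, limit: int = 100) -> list[dict]:
    """Network call; unauthenticated requests are rate-limited by Semantic Scholar."""
    with urlopen(citation_url(paper_id, direction, limit), timeout=30) as resp:
        return parse_citation_page(json.load(resp), direction)
```

Sorting the returned papers with `sorted(papers, key=lambda p: p.get("citationCount") or 0, reverse=True)` puts the most-cited citing or cited papers first.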
Follow prolific and reputable authors in the field:
Many databases suggest related articles:
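In PubMed, the "Similar articles" list can also be retrieved through E-utilities ELink. A sketch assuming the `cmd=neighbor` and `linkname=pubmed_pubmed` parameters and a JSON response organized as `linksets` → `linksetdbs` → `links`; check the layout against a live response before relying on it. Helper names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def similar_articles_url(pmid: str) -> str:
    """ELink URL asking PubMed for its similar-articles (pubmed_pubmed) links."""
    params = {
        "dbfrom": "pubmed",
        "db": "pubmed",
        "id": pmid,
        "cmd": "neighbor",
        "linkname": "pubmed_pubmed",
        "retmode": "json",
    }
    return f"{ELINK}?{urlencode(params)}"


def parse_similar(payload: dict, pmid: str) -> list[str]:
    """Collect PMIDs from the pubmed_pubmed link set, excluding the query PMID itself."""
    links: list[str] = []
    for linkset in payload.get("linksets", []):
        for linkdb in linkset.get("linksetdbs", []):
            if linkdb.get("linkname") == "pubmed_pubmed":
                links.extend(str(x) for x in linkdb.get("links", []))
    return [x for x in links if x != pmid]


def fetch_similar(pmid: str) -> list[str]:
    """Network call; the same ~3 requests/second courtesy limit applies as for ESearch."""
    with urlopen(similar_articles_url(pmid), timeout=30) as resp:
        return parse_similar(json.load(resp), pmid)
```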
Common pitfalls:

- **Too narrow search**: missing relevant papers
- **Too broad search**: thousands of irrelevant results
- **Single database**: incomplete coverage
- **Ignoring preprints**: missing the latest findings
- **No documentation**: irreproducible search
- **Manual deduplication**: time-consuming and error-prone
- **Unverified citations**: broken DOIs, incorrect metadata
- **Publication bias**: including only published positive results
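Automating deduplication avoids the manual-deduplication pitfall above. The sketch below shows one common approach (match on normalized DOI, fall back to normalized title); it is not the implementation of search_databases.py, and it assumes each record is a dict with optional `doi` and `title` keys. Note that the title fallback also merges a preprint with its published version when the titles are identical, which may or may not be wanted:

```python
import re


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip any https://doi.org/ style prefix."""
    doi = doi.strip().lower()
    return re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)


def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", title.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each DOI or normalized title."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        doi = normalize_doi(rec.get("doi") or "")
        title = normalize_title(rec.get("title") or "")
        keys = {f"doi:{doi}"} if doi else set()
        if title:
            keys.add(f"title:{title}")
        if keys & seen:
            continue  # matches an earlier record by DOI or by title
        seen |= keys
        unique.append(rec)
    return unique
```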
```python
# Example workflow using available skills
# 1. Search PubMed via gget
search_term = "CRISPR AND sickle cell disease"
# Use gget search pubmed search_term
# 2. Search bioRxiv
# Use gget search biorxiv search_term
# 3. Search arXiv for computational papers
# Search arXiv with: cat:q-bio AND "CRISPR" AND "sickle cell"
# 4. Search Semantic Scholar via API
# Use semantic scholar API with search query
# 5. Aggregate and deduplicate results
# python search_databases.py combined_results.json --deduplicate --format markdown --output review_papers.md
# 6. Verify all citations
# python verify_citations.py review_papers.md
# 7. Generate final PDF
# python generate_pdf.py review_papers.md --citation-style nature
```
- MeSH Browser: https://meshb.nlm.nih.gov/search
- PubMed Help: https://www.ncbi.nlm.nih.gov/books/NBK3827/
- Citation styles: see references/citation_styles.md in this skill
- PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses): http://www.prisma-statement.org/