The Hugging Science catalog organizes scientific resources into 17 topics. Each topic has its own Markdown file at `https://huggingscience.co/topics/<slug>.md`, where slugs are lowercase and hyphenated.
| Slug | What's in here |
|---|---|
| `astronomy` | Galaxy/stellar surveys, cosmology, exoplanets, telescope imagery, foundation models for astronomical data |
| `benchmark` | Cross-domain evaluation suites; useful when comparing methods or running standard tests |
| `biology` | Protein/DNA/single-cell data and models, antibodies, bioacoustics, microbiome; the broad biology umbrella |
| `biotechnology` | Synthetic biology, fermentation, applied genetic engineering data |
| `chemistry` | Molecules, reactions, drug discovery, SMILES corpora, DFT data, ligand-protein interactions |
| `climate` | Weather forecasting, climate models, atmospheric data, storm/flood prediction |
| `conservation` | Wildlife monitoring, biodiversity, camera-trap and bioacoustic models |
| `earth-science` | Remote sensing, satellite imagery, geospatial foundation models |
| `ecology` | Species distribution, ecosystem dynamics, biogeography (overlaps with conservation/biology) |
| `energy` | Battery materials, fusion plasma, grid simulation, renewables modeling |
| `engineering` | CAD, mechanical/structural simulation, robotics datasets |
| `genomics` | DNA language models, variant effect, single-cell, phylogenetics (overlaps heavily with biology) |
| `materials-science` | Crystal structures, band gaps, catalysts, perovskites, alloys, materials foundation models |
| `mathematics` | Theorem proving, formal math, mathematical reasoning datasets |
| `medicine` | Pathology, radiology, clinical NLP, drug-disease, EHR (overlaps with biology/chemistry) |
| `physics` | PDE solvers, fluid/plasma simulation, particle physics, physics-informed ML |
| `scientific-reasoning` | LLMs for scientific QA, paper understanding, multi-step scientific reasoning |
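For scripted access, it can help to validate a slug before making any request. The sketch below assumes the 17 slugs in the table are the complete set; `TOPIC_SLUGS` and `topic_url` are hypothetical names used for illustration, not part of `fetch_catalog.py`.

```python
# Hypothetical helper: slugs copied from the table; update if topics are added.
TOPIC_SLUGS = frozenset({
    "astronomy", "benchmark", "biology", "biotechnology", "chemistry",
    "climate", "conservation", "earth-science", "ecology", "energy",
    "engineering", "genomics", "materials-science", "mathematics",
    "medicine", "physics", "scientific-reasoning",
})

BASE_URL = "https://huggingscience.co/topics"


def topic_url(slug: str) -> str:
    """Return the Markdown URL for one topic slug, rejecting unknown slugs."""
    # Slugs are lowercase and hyphenated, so normalize case and spaces first.
    normalized = slug.strip().lower().replace(" ", "-")
    if normalized not in TOPIC_SLUGS:
        raise ValueError(f"unknown topic slug: {slug!r}")
    return f"{BASE_URL}/{normalized}.md"
```

For example, `topic_url("Earth Science")` normalizes to the `earth-science` slug, while a typo such as `"biochem"` fails fast instead of producing a 404.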
Most real tasks span multiple slugs. Pull all relevant ones rather than guessing, for example:

- `chemistry`, `biology`, `medicine`
- `biology`, `chemistry`
- `climate`, `earth-science`, `physics`
- `biology`, `genomics`, `medicine`
- `materials-science`, `chemistry`, `energy`
- `biology`, `ecology`, `conservation`

When in doubt, fall back to `python scripts/fetch_catalog.py search "<keyword>"` to search the full catalog.
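A sketch of pulling several topics at once and merging them. It assumes (unverified) that an entry tagged with more than one topic appears in each of those topic files, so it drops repeats by H3 title. `merged_entry_titles` and its injectable `fetch` parameter are hypothetical; the injectable fetcher just makes the merge logic testable without network access.

```python
import urllib.request
from typing import Callable, Iterable, List

BASE_URL = "https://huggingscience.co/topics"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and decode the body as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def merged_entry_titles(slugs: Iterable[str],
                        fetch: Callable[[str], str] = fetch_text) -> List[str]:
    """Collect H3 entry titles from several topic files, keeping first occurrences.

    Assumption: overlapping topics (e.g. genomics and biology) may list the same
    entry, so titles already seen are skipped. Repeated slugs are fetched once.
    """
    seen_slugs = set()
    seen_titles = set()
    titles: List[str] = []
    for slug in slugs:
        if slug in seen_slugs:
            continue
        seen_slugs.add(slug)
        for line in fetch(f"{BASE_URL}/{slug}.md").splitlines():
            if line.startswith("### "):
                title = line[4:].strip()
                if title not in seen_titles:
                    seen_titles.add(title)
                    titles.append(title)
    return titles
```

Calling `merged_entry_titles(["biology", "genomics", "medicine"])` would fetch three topic files and return one ordered, de-duplicated list of entry titles.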
Each catalog entry is an H3 block with bulleted metadata followed by a description. Entries come in three flavors.

Dataset:

```markdown
### org/dataset-name
- **Type**: <category, e.g. "Genomics", "Pathology", "PDE Simulation">
- **Tags**: <comma-separated topic tags>
- **HuggingFace**: https://huggingface.co/datasets/org/dataset-name
<one-line description>
```

Model:

```markdown
### Model Display Name
- **Type**: <category, e.g. "Protein Language Model", "Materials Foundation Model">
- **Tags**: <comma-separated topic tags>
- **HuggingFace**: https://huggingface.co/org/model-id
<one-line description>
```

Blog post:

```markdown
### Post Title
- **Author**: <username>
- **Date**: <YYYY-MM-DD>
- **Tags**: <comma-separated>
- **Link**: <URL, usually huggingface.co/blog/...>
<one-line description>
```
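If you need to parse topic files yourself, the entry layout above maps onto a small line-based parser. This is a sketch assuming every entry starts with a `### ` heading and that metadata bullets follow the `- **Key**: value` pattern exactly; `parse_entries` is a hypothetical helper, and `fetch_catalog.py` remains the preferred route for structured access.

```python
import re
from typing import Dict, List, Optional

# Matches metadata bullets such as "- **Type**: Genomics".
FIELD_RE = re.compile(r"^- \*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_entries(markdown: str) -> List[Dict[str, object]]:
    """Split catalog Markdown into one dict per H3 entry.

    Keys: "title", one lowercased key per metadata bullet ("type", "tags",
    "huggingface", "author", "date", "link", ...), and "description".
    "tags" is split on commas into a list.
    """
    entries: List[Dict[str, object]] = []
    current: Optional[Dict[str, object]] = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("### "):
            # An H3 heading opens a new entry.
            current = {"title": line[4:].strip(), "description": ""}
            entries.append(current)
            continue
        if line.startswith("#"):
            current = None  # any other heading ends the current entry
            continue
        if current is None or not line:
            continue  # skip intro text before the first entry and blank lines
        match = FIELD_RE.match(line)
        if match:
            key = match.group("key").strip().lower()
            value = match.group("value").strip()
            if key == "tags":
                current[key] = [t.strip() for t in value.split(",") if t.strip()]
            else:
                current[key] = value
        else:
            # Non-bullet text is description; multiple lines are joined by spaces.
            current["description"] = f'{current["description"]} {line}'.strip()
    return entries
```

Because every field is keyed by its bullet label, the same function handles all three flavors: datasets and models get `"huggingface"`, while blog posts get `"author"`, `"date"`, and `"link"`.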
The catalog is also published as raw files:

- `https://huggingscience.co/llms.txt`: compact site index
- `https://huggingscience.co/llms-full.txt`: every entry across every domain (fetch this file when you want to grep the whole catalog)
- `https://huggingscience.co/topics/<slug>.md`: a single domain
- `https://huggingscience.co/feed.xml`: RSS feed of new entries

The `fetch_catalog.py` script wraps these endpoints and adds parsing, filtering, and JSON output. Prefer the script for structured access; use raw WebFetch or `curl` only if the script fails.
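When the script is unavailable, a rough stand-in for keyword search is to download `llms-full.txt` and filter its H3 blocks. This sketch assumes `llms-full.txt` uses the same `### ` entry headings as the topic files, and it is not necessarily how `fetch_catalog.py search` matches keywords; `download_full_catalog` and `search_blocks` are hypothetical names.

```python
import urllib.request
from typing import List

FULL_CATALOG_URL = "https://huggingscience.co/llms-full.txt"


def download_full_catalog(timeout: float = 60.0) -> str:
    """Fetch the whole catalog as a single UTF-8 text file."""
    with urllib.request.urlopen(FULL_CATALOG_URL, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def search_blocks(text: str, keyword: str) -> List[str]:
    """Return every H3 entry block containing `keyword` (case-insensitive)."""
    needle = keyword.lower()
    blocks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Any heading closes the previous entry; only an H3 opens a new one.
            if current:
                blocks.append("\n".join(current).strip())
            current = [line] if line.startswith("### ") else []
        elif current:
            current.append(line)
    if current:
        blocks.append("\n".join(current).strip())
    return [block for block in blocks if needle in block.lower()]
```

A typical call is `search_blocks(download_full_catalog(), "perovskite")`; printing the first line of each returned block gives the matching entry titles.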