metadata-ingestion/datahub-skills.md
datahub-skills is a Claude Code plugin that accelerates connector development with specialized skills for planning, standards loading, and code review. It is designed to be used alongside the adding a source guide: the skills handle research, scaffolding, and review; you handle the code.
:::note
This guide assumes you have already set up your local development environment per the developing guide.
:::
Option 1: install script (recommended)

```bash
curl -fsSL https://raw.githubusercontent.com/acryldata/datahub-skills/main/install.sh | bash
```

Option 2: npx

```bash
npx datahub-skills install
```
After installation, confirm the plugin is registered:
```bash
cat ~/.claude/plugins/installed_plugins.json
```

```json
{
  "version": 2,
  "plugins": {
    "datahub-skills@datahub-skills": [
      {
        "scope": "user",
        "installPath": "/Users/you/.claude/plugins/cache/datahub-skills/datahub-skills/1.2.0",
        "version": "1.2.0"
      }
    ]
  }
}
```
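If you script your environment setup, you can check the manifest programmatically instead of eyeballing the JSON. A minimal sketch, assuming the manifest shape shown above (the `installed_versions` helper and plugin key are taken from that example, not from any official API):

```python
import json
from pathlib import Path

# Manifest location used by Claude Code's plugin manager (from the example above).
MANIFEST = Path.home() / ".claude" / "plugins" / "installed_plugins.json"


def installed_versions(manifest_text: str, plugin_key: str) -> list:
    """Return the versions recorded for a plugin in the manifest JSON."""
    data = json.loads(manifest_text)
    return [entry["version"] for entry in data.get("plugins", {}).get(plugin_key, [])]


if MANIFEST.exists():
    versions = installed_versions(MANIFEST.read_text(), "datahub-skills@datahub-skills")
    print("datahub-skills versions:", versions or "not installed")
```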
| Skill | Command | When to use |
|---|---|---|
| Load standards | /datahub-skills:load-standards | Start of every session; loads golden connector standards into context |
| Plan connector | /datahub-skills:datahub-connector-planning | Before writing any code; research, entity mapping, architecture decisions |
| Review connector | /datahub-skills:datahub-connector-pr-review | Before opening a PR; automated review against standards |
Run this at the start of every Claude Code session. It loads the golden connector standards into context so all subsequent skills and code generation follow established patterns.
```
/datahub-skills:load-standards
```
Expected output:
```
Loaded connector standards. Ready to review.

Standards loaded:
- main.md: base classes, SDK V2, config patterns
- patterns.md: file organization, error handling, warning/failure reporting
- testing.md: test requirements, golden files, anti-patterns
- code_style.md: type safety, naming conventions, mypy compliance
- containers.md: container hierarchy design
- lineage.md: SqlParsingAggregator, view lineage, query log lineage
...
```
:::note
Standards are loaded into your current Claude Code context window. If you start a new session, run /load-standards again before continuing.
:::
Run the planning skill before writing any implementation code. It researches the source system, asks you a set of scoping questions, then generates a _PLANNING.md document that serves as the implementation blueprint.
```
/datahub-skills:datahub-connector-planning build a connector for [source name]
```
The skill runs in four stages, ending with a `_PLANNING.md` at the repo root that captures the entity mapping, architecture decisions, and an ordered implementation plan. After the document is generated, you will see a summary and a prompt:
```
## Planning Document Created

Location: `_PLANNING.md`

### Key Decisions:
- Base class: StatefulIngestionSourceBase (no SQLAlchemy dialect exists)
- Entity mapping: Pipeline → DataFlow, Table → DataJob, outlet lineage to destination Datasets
- Lineage approach: outlet URNs from state files; inlet URNs user-configured
- Test strategy: filesystem fixtures (no live service required)

Do you approve proceeding to implementation?
```

Reply `approved` to proceed, or describe changes and the skill will revise the document before continuing.
This is your implementation step. Use the plan in _PLANNING.md as the guide and refer to CLAUDE.md for build, lint, and test commands.
Key commands during development:
```bash
# Lint and format Python code
./gradlew :metadata-ingestion:lintFix

# Run unit tests
./gradlew :metadata-ingestion:testQuick

# Run a single test file
pytest tests/unit/myconnector/test_myconnector_source.py -v
```
See the developing guide for the complete command reference.
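The testing standards loaded earlier call for golden-file tests: the connector's output is compared against a checked-in JSON fixture. A simplified sketch of that pattern, assuming a hypothetical `assert_matches_golden` helper (DataHub's actual test helpers differ; this only illustrates the compare-or-regenerate workflow):

```python
import json
from pathlib import Path


def assert_matches_golden(produced, golden_path: Path, update: bool = False) -> None:
    """Compare produced metadata dicts against a checked-in golden file.

    With update=True, regenerate the golden file instead of asserting,
    mirroring the update-golden-files workflow used in ingestion tests.
    """
    if update:
        golden_path.write_text(json.dumps(produced, indent=2, sort_keys=True))
        return
    expected = json.loads(golden_path.read_text())
    assert produced == expected, f"Output diverges from golden file {golden_path}"
```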
Run the review skill before opening a PR. It performs a full standards compliance review covering architecture, code quality, type safety, error handling, test coverage, and documentation, and produces a verdict with prioritized findings.
```
/datahub-skills:datahub-connector-pr-review
```
The skill outputs a structured report:
```
## Connector Review Report

Verdict: NEEDS CHANGES (3 blockers require fixes before merge).

| Category          | Status         |
| ----------------- | -------------- |
| Architecture      | ✅ PASS        |
| Code Organization | ✅ PASS        |
| Error Handling    | 🔴 NEEDS FIXES |
| Type Safety       | 🟡 NEEDS FIXES |
| Test Quality      | 🔴 NEEDS FIXES |
| Performance       | ✅ PASS        |

### 🔴 BLOCKERS (Must Fix)

B1: Silent exception swallow in state.json parsing
dlt_client.py:269: except Exception: pass with no logging...
```
Severity levels:
| Level | Meaning | Action |
|---|---|---|
| 🔴 BLOCKER | Violates standards or will cause issues in production | Must fix before merge |
| 🟡 WARNING | Significant issue that should be addressed | Should fix |
| ℹ️ SUGGESTION | Would improve quality | Optional |
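To illustrate what fixing a blocker like B1 looks like: a hedged sketch, not the actual `dlt_client.py` code. The `load_state` function name and messages are invented for the example; the point is that each failure mode is logged (and in a real DataHub source would typically also be recorded via the source report) instead of being swallowed by `except Exception: pass`:

```python
import json
import logging

logger = logging.getLogger(__name__)


def load_state(path: str) -> dict:
    """Parse a state.json file without silently swallowing failures.

    Each failure mode is logged so the operator can see why lineage is
    missing; returning an empty state lets ingestion continue.
    """
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        logger.warning("State file %s not found; skipping outlet lineage", path)
    except json.JSONDecodeError as exc:
        logger.warning("State file %s is not valid JSON: %s", path, exc)
    return {}
```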
After fixing blockers, re-run the review skill. It resumes with full context from the previous session:
```
/datahub-skills:datahub-connector-pr-review
```
Commit the skill-generated artifacts alongside your connector code (the `skill_docs/` convention). This gives PR reviewers full context on design decisions and known issues, and creates a paper trail for follow-up work.
Place them in `src/datahub/ingestion/source/<connector>/skill_docs/`:
```
src/datahub/ingestion/source/myconnector/
├── __init__.py
├── config.py
├── myconnector.py
├── myconnector_client.py
├── myconnector_report.py
└── skill_docs/
    ├── _PLANNING.md                                # /datahub-connector-planning output
    └── _datahub-connector-pr-review-YYYY-MM-DD.md  # /datahub-connector-pr-review output
```
- Run /load-standards at the start of every new Claude Code session; standards are not persisted across sessions.
- Answer the planning skill's scoping questions rather than skipping them, as they determine the architecture decisions in `_PLANNING.md`.
- Commit `skill_docs/` alongside the connector code; PR reviewers use it to understand design decisions without re-reading the full diff.