Prepare Providers Documentation (AI-driven)

.agents/skills/prepare-providers-documentation/SKILL.md

<!-- SPDX-License-Identifier: Apache-2.0 https://www.apache.org/licenses/LICENSE-2.0 -->

This skill replaces the manual commit-by-commit classification step that the release manager normally performs when running breeze release-management prepare-provider-documentation. Instead of asking the release manager to type d/b/f/x/m/s/v for each commit, the skill drives the classification itself — inspecting every PR (with extra care for potentially breaking changes), scoping multi-provider PRs to the slice that touched the current provider, and asking the release manager only when genuinely uncertain.

The skill keeps the existing breeze tooling as the source of truth for template generation. Claude only owns the classification + version bump + changelog entries; everything else (__init__.py, README.rst, pyproject.toml, conf.py, get_provider_info.py, index.rst) is still regenerated by breeze release-management prepare-provider-documentation --reapply-templates-only.

[!IMPORTANT] This is a release-manager workflow. It mutates provider.yaml and changelog.rst for many providers in one pass. Always run on a clean working tree (or in a dedicated branch) and let the release manager review the diff before committing.


When to Use This Skill

Use during the regular provider release cycle, in place of either of:

shell
breeze release-management prepare-provider-documentation
breeze release-management prepare-provider-documentation --incremental-update

…when the release manager wants Claude to classify the changes instead of doing it by hand. The skill covers the same scope: classifying changes, bumping versions, generating changelog sections, reapplying templates, and folding new commits into an already-prepared release PR (incremental update).

Two entry points:

  • Initial run — classify everything from scratch for a new release. Follow Phases 1–5 below.
  • Incremental update — extend an existing release PR with commits that landed on main since the changelog was first generated (typical when rebasing a release PR before merging). Skip ahead to the Incremental Update section after Phase 5.

Do not use this skill for:

  • --only-min-version-update runs (these don't need classification — just run breeze directly).
  • Releasing from a non-main base branch unless you also pass the right --base-branch to the breeze invocations described below.
  • Removing providers (state changes belong in a separate PR).

Inputs You Need Before Starting

Ask the release manager (and confirm by reading the answers back) for:

  1. RELEASE_DATE in YYYY-MM-DD (or YYYY-MM-DD_NN) format, e.g. 2026-04-26. This is what breeze stamps into providers/.last_release_date.txt.
  2. Base branch — defaults to main. Only override when releasing from a provider-specific branch (e.g. provider-cncf-kubernetes/v4-4).
  3. Subset of providers, if any. By default, classify every provider that has pending changes since its last release tag. If the release manager wants a subset (or has set DISTRIBUTIONS_LIST), use that list.
  4. Include flags: whether to include --include-not-ready-providers and/or --include-removed-providers.

Set the environment for the session:

bash
export RELEASE_DATE=<date>
# Optional, scopes everything to a subset
export DISTRIBUTIONS_LIST="<provider1> <provider2> ..."

Make sure the apache-https-for-providers git remote exists and is up to date — running breeze the first time below will recreate and fetch it.
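
Before exporting RELEASE_DATE, it can help to check that the value the release manager read back matches one of the two accepted shapes. The following is a minimal sketch, not part of breeze: the helper name `is_valid_release_date` is hypothetical, and treating `NN` as exactly two digits is an assumption based on the `YYYY-MM-DD_NN` notation above.

```python
import re
from datetime import datetime

# Accepts YYYY-MM-DD or YYYY-MM-DD_NN. Two digits for NN is an assumption.
_RELEASE_DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})(?:_(\d{2}))?")


def is_valid_release_date(value: str) -> bool:
    """Return True when value looks like a usable RELEASE_DATE (hypothetical helper)."""
    match = _RELEASE_DATE_RE.fullmatch(value)
    if not match:
        return False
    try:
        # Reject impossible calendar dates such as 2026-13-01.
        datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return False
    return True
```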


Workflow

The skill runs in five phases. Mark tasks with TaskCreate for each phase and tick them off as you go — the release manager wants to see progress.

Phase 1 — Discover providers with pending changes

For each provider, the source of truth for "what changed since last release" is the same git query breeze uses internally: commits between the latest release tag for that provider (providers-<id>/<version>) and apache-https-for-providers/<base-branch>, restricted to the provider's own folders.

Discover in batch by running:

bash
breeze release-management prepare-provider-documentation \
    --non-interactive \
    --skip-changelog \
    --skip-readme \
    --release-date "$RELEASE_DATE"

[!WARNING] Do not commit the result of that command. --non-interactive answers the classification prompts with random values — Claude will overwrite the changelog and version bumps in Phase 4 with real classifications. The only reason to run breeze first is to refresh the apache remote, regenerate build files, and confirm which providers have pending changes (read the "Summary of prepared documentation" block at the end).

Record from the summary:

  • Success — providers that had real changes (these need classification).
  • Docs only — providers with only documentation changes (already handled by breeze; skip in Phase 2).
  • Skipped on no changes — nothing to do.

Reset the per-provider files that breeze touched but you'll be rewriting yourself before continuing:

bash
git checkout -- $(git diff --name-only -- '**/provider.yaml' '**/changelog.rst')

This leaves the regenerated build files (__init__.py, README.rst, pyproject.toml, conf.py, get_provider_info.py, index.rst) in place and discards only the files Claude is about to rewrite (provider.yaml and changelog.rst).

Phase 2 — Per-provider commit list

For each provider in Success from Phase 1, get the same commit list that breeze would have shown. From the repo root:

bash
PROVIDER_ID=<dotted.id>      # e.g. amazon, cncf.kubernetes
PROVIDER_PATH=$(echo "$PROVIDER_ID" | tr '.' '/')   # folder path: cncf/kubernetes
PROVIDER_TAG=$(echo "$PROVIDER_ID" | tr '.' '-')    # tag segment: cncf-kubernetes
# Pick the latest *final* release tag. Two gotchas the tag pattern must handle:
#  * dotted provider ids use HYPHENS in tag names (providers-cncf-kubernetes/<ver>),
#    even though the source folder uses slashes — build the tag prefix from
#    PROVIDER_TAG, not PROVIDER_PATH;
#  * skip the sentinel upper-bound tags (providers-<id>/99.98.0, /99.99.0) and rc
#    tags — git's default version sort orders "1.2.0rc1" AFTER "1.2.0", so a bare
#    `head -n1` would otherwise select a sentinel or a release candidate.
LAST_TAG=$(git tag --list "providers-${PROVIDER_TAG}/*" --sort=-v:refname \
    | grep -vE '/99\.9[0-9]\.' | grep -vE 'rc[0-9]+$' | head -n1)
# If you overrode the base branch, replace "main" below with that branch name.
git log --pretty=format:'%H %h %cd %s' --date=short \
    "${LAST_TAG}..apache-https-for-providers/main" \
    -- "providers/${PROVIDER_PATH}/"

[!WARNING] This git query is a convenience for building the per-provider commit list, but the authoritative set is what breeze prints in the Phase 1 "Commit" tables for each provider. The tag-based range can still diverge from breeze when a provider's most recent final tag is not the last actually-published release (for example, a wave commit bumped the version on main but the published baseline is older), which makes breeze include repo-wide commits this query misses. When the two disagree, trust breeze's list and reconcile against it before classifying.
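
The tag-selection rules in the shell snippet above can also be expressed as a small pure function, which is easier to unit-test than a grep pipeline. This is a sketch under stated assumptions, not breeze's own logic: `pick_last_final_tag` is a hypothetical name, and it is slightly stricter than the shell version because it keeps only tags whose version is exactly X.Y.Z.

```python
from __future__ import annotations

import re

_FINAL_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def pick_last_final_tag(tags: list[str], provider_id: str) -> str | None:
    """Pick the newest final release tag for a provider (hypothetical helper)."""
    # Dotted provider ids use hyphens in tag names: cncf.kubernetes -> cncf-kubernetes.
    prefix = f"providers-{provider_id.replace('.', '-')}/"
    candidates = []
    for tag in tags:
        if not tag.startswith(prefix):
            continue
        match = _FINAL_VERSION_RE.fullmatch(tag[len(prefix):])
        if not match:
            continue  # rc tags and anything else that is not plain X.Y.Z
        version = tuple(int(part) for part in match.groups())
        if version[0] == 99 and 90 <= version[1] <= 99:
            continue  # sentinel upper-bound tags such as 99.98.0 and 99.99.0
        candidates.append((version, tag))
    # Tuples compare numerically, so 10.2.0 sorts above 9.9.0.
    return max(candidates)[1] if candidates else None
```

Feeding it the output of `git tag --list` split into lines should give the same answer as the pipeline for ordinary tag sets. Where the two differ, the breeze "Commit" tables remain authoritative.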

Capture the full hash, short hash, date, subject, and #NNNN PR number for each commit. Some older providers also have legacy paths under airflow/providers/<id>/; include those in the path filter when present. If you are unsure whether a provider has legacy paths, check the history of its provider.yaml (breeze tracks these paths as provider_details.possible_old_provider_paths).

Phase 3 — Classify each PR with sub-agents

For each commit, classify it into one of:

| Code | Meaning | Version bump |
|------|---------|--------------|
| d | Documentation-only | none (patch if combined) |
| b | Bug fix | patch |
| f | Feature | minor |
| x | Breaking change | major |
| m | Misc (deps, refactors, internal only) | patch |
| s | Skip (test/CI/example only, no user impact) | none |
| v | Min Airflow version bump | minor (treated as misc + bump) |

Auto-classify cheap cases first

Before spawning a sub-agent, apply the same fast heuristics breeze uses (see classify_provider_pr_files in dev/breeze/src/airflow_breeze/prepare_providers/provider_documentation.py):

  • All changed files match providers/<id>/docs/**/*.rst → d (docs).
  • All changed files match providers/<id>/tests/** or providers/<id>/src/airflow/providers/<id>/example_dags/** → s (skip).
  • Subject contains Bump minimum Airflow version and only __init__.py / provider.yaml changed → v.

Note these classifications and move on — no sub-agent needed.
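
The three heuristics above could be sketched as follows. This is an illustration only, not a copy of breeze's classify_provider_pr_files: the function name `auto_classify` is hypothetical, and it assumes `files` is the list of changed paths (relative to the repo root) that the caller wants to test.

```python
from __future__ import annotations


def auto_classify(provider_id: str, subject: str, files: list[str]) -> str | None:
    """Return 'd', 's' or 'v' for the cheap cases, else None (hypothetical helper)."""
    if not files:
        return None
    path = provider_id.replace(".", "/")  # cncf.kubernetes -> cncf/kubernetes
    root = f"providers/{path}/"
    docs = root + "docs/"
    tests = root + "tests/"
    examples = f"{root}src/airflow/providers/{path}/example_dags/"

    # Only .rst files under the provider's docs folder -> documentation.
    if all(f.startswith(docs) and f.endswith(".rst") for f in files):
        return "d"
    # Only tests and example DAGs -> skip.
    if all(f.startswith((tests, examples)) for f in files):
        return "s"
    # Min-Airflow bump that touched nothing but __init__.py / provider.yaml.
    only_bump_files = all(f.rsplit("/", 1)[-1] in ("__init__.py", "provider.yaml") for f in files)
    if "Bump minimum Airflow version" in subject and only_bump_files:
        return "v"
    return None  # needs a sub-agent
```

A return value of None means the commit goes on to the sub-agent step.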

Sub-agent per PR for the rest

For the remaining commits, spawn sub-agents in parallel (batches of 5–10 to avoid context pressure). Use the Explore agent type — they need read-only access. Brief each sub-agent with:

Classify a single Apache Airflow provider PR.

PR:        #<NNNN>
Commit:    <full-hash>
Subject:   <subject>
Provider:  <provider-id>      (path: providers/<provider-path>/)

Tasks:
1. Read the PR's title, body, and labels:
   `gh pr view <NNNN> --json title,body,labels,files`
2. Read the diff for the slice of the PR that touched
   providers/<provider-path>/ only:
   `gh pr diff <NNNN> -- 'providers/<provider-path>/**'`
   (When the PR touches multiple providers, you only care about the slice
   for THIS provider — ignore the others when classifying.)
3. Decide a single classification:
   - documentation: only docs/comments/typos in the provider slice
   - bugfix:        fixes incorrect behavior, no API changes
   - feature:       adds new capability, parameter, operator, sensor, hook,
                    or extends an existing one in a backwards-compatible way
   - breaking:      see "Breaking-change checklist" below
   - misc:          dependency bumps, internal refactors, packaging-only
                    changes, type-hint cleanups, no user-visible behavior
   - skip:          only tests/examples/CI for this provider's slice
   - min_airflow_bump: explicitly bumps the minimum Airflow version pin
4. Output strictly:
   CLASSIFICATION: <one of: documentation|bugfix|feature|breaking|misc|skip|min_airflow_bump>
   CONFIDENCE: <high|medium|low>
   JUSTIFICATION: <one sentence>
   BREAKING_RISK: <none|maybe|yes>     (set "maybe" when the diff has any
                                         signal from the breaking-change
                                         checklist, even if you think the
                                         author intended otherwise)

Breaking-change checklist (any of these → BREAKING_RISK >= maybe; usually
breaking unless clearly behind a deprecation shim):
  * Public class/function/method removed or renamed
    in the **public interface** of the provider — i.e. files under
    `providers/<path>/src/**/{hooks,operators,sensors,triggers,
    notifications,decorators,executors}/**`, the provider's
    top-level package `__init__.py`, plus anything imported by
    `provider.yaml` (`hook-class-names`, `extra-links`, etc.).
    Internal helpers (e.g. `utils/`, `_internal/`, `pod_manager.py`,
    or any module not re-exported from the package or referenced
    in `provider.yaml`) are NOT breaking on their own. NOT in tests/.
  * Required parameter added to a public constructor or operator __init__
  * Default value of a public parameter changed
  * Return type or signature of a public method changed
  * `extra_dejson` / connection-form fields removed or renamed
  * Behavior change in `execute()`, `poke()`, `get_conn()` that produces
    different results for the same inputs
  * Minimum Python or Airflow version bumped (separate: that's
    min_airflow_bump unless the bump excludes a previously supported version
    of a provider's hard dependency, in which case it's also breaking)
  * Removed deprecation: a previously-deprecated symbol is now deleted
  * Schema change in stored data (xcom, connection, asset metadata,
    or the serialized state/context of a `BaseTrigger` subclass —
    deferred tasks survive provider upgrades only if the trigger's
    `serialize()` payload stays compatible)

Do NOT trust the PR title alone — read the diff. A PR titled "Refactor X"
that removes a public method is breaking. A PR titled "BREAKING: rename
foo" that only renames a private symbol is not.

Collect all sub-agent results into a table.
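
Because the brief asks sub-agents to answer in a strict four-field format, each reply can be validated mechanically before it enters the table. A minimal sketch, assuming the reply contains one `FIELD: value` pair per line; `SubAgentResult` and `parse_sub_agent_output` are hypothetical names.

```python
import re
from dataclasses import dataclass

ALLOWED = {
    "CLASSIFICATION": {"documentation", "bugfix", "feature", "breaking", "misc", "skip", "min_airflow_bump"},
    "CONFIDENCE": {"high", "medium", "low"},
    "BREAKING_RISK": {"none", "maybe", "yes"},
}
_FIELD_RE = re.compile(r"^\s*(CLASSIFICATION|CONFIDENCE|JUSTIFICATION|BREAKING_RISK):\s*(.*\S)\s*$")


@dataclass(frozen=True)
class SubAgentResult:
    classification: str
    confidence: str
    justification: str
    breaking_risk: str


def parse_sub_agent_output(text: str) -> SubAgentResult:
    """Parse and validate one sub-agent reply; raise ValueError if malformed."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    for name in ("CLASSIFICATION", "CONFIDENCE", "JUSTIFICATION", "BREAKING_RISK"):
        if name not in fields:
            raise ValueError(f"missing field: {name}")
    for name, allowed in ALLOWED.items():
        fields[name] = fields[name].lower()
        if fields[name] not in allowed:
            raise ValueError(f"unexpected {name}: {fields[name]!r}")
    return SubAgentResult(
        classification=fields["CLASSIFICATION"],
        confidence=fields["CONFIDENCE"],
        justification=fields["JUSTIFICATION"],
        breaking_risk=fields["BREAKING_RISK"],
    )
```

A reply that fails validation is probably best treated like a low-confidence result and escalated rather than guessed at.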

Phase 3.5 — Confirm with the release manager

Print a per-provider summary in this exact format (so the release manager can scan it quickly):

Provider: amazon
Current version: 9.12.0
Most-impactful change: feature → next version: 9.13.0

Commits (12):
  abc1234  d   high   docs: fix S3 example                                  #65000
  def5678  b   high   Fix retry on transient SQS error                      #65010
  9ab0123  f   high   Add wait_for_completion to AthenaOperator              #65020
  4cd5678  x   med    Remove deprecated S3Hook.list_objects                  #65030  ⚠ BREAKING
  7ef9012  m   high   Bump aiobotocore to 2.13                              #65040
  ...
Uncertain: 2 commits below — please confirm:
  4cd5678  x   med    Remove deprecated S3Hook.list_objects (#65030)
    Why: list_objects is documented as deprecated since 8.0.0 but never
    raised DeprecationWarning, so removal may surprise users.
  abc4321  ?   low    "Refactor Athena client" (#65060)
    Why: PR description says non-breaking but diff changes the default
    region resolution from env to provider extras.

Always escalate to the release manager when:

  • CONFIDENCE: low from any sub-agent.
  • BREAKING_RISK: maybe but the sub-agent classified as anything other than breaking.
  • Same PR appears in multiple providers and got different classifications across them — explain why and let the RM call it.
  • Most-impactful change is breaking (major bump): always reconfirm explicitly before applying. Major bumps are never silent.

If the release manager corrects a classification, save it in your classification table and re-derive the most-impactful change.
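
The four escalation triggers listed above can be written as a single predicate that returns the reasons to show the release manager. This is a sketch, with `escalation_reasons` as a hypothetical name; it takes the per-commit fields plus the classifications the same PR received in other providers.

```python
from __future__ import annotations


def escalation_reasons(
    classification: str,
    confidence: str,
    breaking_risk: str,
    other_provider_classifications: tuple[str, ...] = (),
) -> list[str]:
    """Return the reasons this commit must be confirmed by the release manager."""
    reasons = []
    if confidence == "low":
        reasons.append("sub-agent confidence is low")
    if breaking_risk == "maybe" and classification != "breaking":
        reasons.append("breaking risk flagged but not classified as breaking")
    if any(other != classification for other in other_provider_classifications):
        reasons.append("same PR classified differently in another provider")
    if classification == "breaking":
        # Any breaking commit makes the most-impactful change a major bump.
        reasons.append("major bump requires explicit confirmation")
    return reasons
```

An empty list means the commit can be listed in the summary without a question attached.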

Phase 4 — Apply classifications

For each provider, in order:

4a. Bump the version in provider.yaml

Open providers/<provider-path>/provider.yaml, find the versions: block, and prepend the new version. The bump rule (most-impactful classification across all commits for this provider, computed in Phase 3.5):

| Most-impactful | Bump |
|----------------|------|
| breaking | major (X+1.0.0) |
| feature | minor (X.Y+1.0) |
| min_airflow_bump | minor (X.Y+1.0) |
| bugfix | patch (X.Y.Z+1) |
| misc | patch (X.Y.Z+1) |
| documentation only | no bump; handle as doc-only (see below) |
| skip only | no bump; nothing to do |

Also update source-date-epoch: to the current int(time.time()).

For doc-only providers, do not bump the version. Instead, write the latest commit hash from the doc-only batch into providers/<provider-path>/docs/.latest-doc-only-change.txt (newline terminated). This is what breeze checks on the next release to know the provider hasn't really changed.
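
The bump rule can be checked with a few lines of arithmetic. The sketch below is illustrative rather than breeze's implementation: `next_version` is a hypothetical helper, and the relative order of classifications that map to the same bump (misc vs. bugfix, min_airflow_bump vs. feature) is an assumption that does not affect the resulting version.

```python
from __future__ import annotations

# Least to most impactful. Ties within the same bump kind are ordered arbitrarily.
IMPACT_ORDER = ["skip", "documentation", "misc", "bugfix", "min_airflow_bump", "feature", "breaking"]
BUMP_FOR = {
    "breaking": "major",
    "feature": "minor",
    "min_airflow_bump": "minor",
    "bugfix": "patch",
    "misc": "patch",
    "documentation": None,  # doc-only: no bump
    "skip": None,  # nothing to do
}


def next_version(current: str, classifications: list[str]) -> tuple[str, str | None]:
    """Return (most_impactful, new_version); new_version is None when no bump applies.

    Raises ValueError when classifications is empty.
    """
    most_impactful = max(classifications, key=IMPACT_ORDER.index)
    major, minor, patch = (int(part) for part in current.split("."))
    bump = BUMP_FOR[most_impactful]
    if bump == "major":
        return most_impactful, f"{major + 1}.0.0"
    if bump == "minor":
        return most_impactful, f"{major}.{minor + 1}.0"
    if bump == "patch":
        return most_impactful, f"{major}.{minor}.{patch + 1}"
    return most_impactful, None
```

For the amazon example in Phase 3.5, a current version of 9.12.0 with a feature as the most impactful change yields 9.13.0, matching the summary shown there.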

4b. Write the changelog entry

Open providers/<provider-path>/docs/changelog.rst. Insert a new section above the most recent existing version section. The exact format must match dev/breeze/src/airflow_breeze/templates/CHANGELOG_TEMPLATE.rst.jinja2 — don't paraphrase it. The skeleton:

rst
<NEW_VERSION>
<dots matching length of NEW_VERSION>

.. note::
    This release of provider is only available for Airflow X.Y+ as explained in the
    `Apache Airflow providers support policy <https://github.com/apache/airflow/blob/main/PROVIDERS.rst#minimum-supported-version-of-airflow-for-community-managed-providers>`_.

Breaking changes
~~~~~~~~~~~~~~~~

* ``<commit subject for breaking change> (#NNNN)``

Features
~~~~~~~~

* ``<commit subject for feature> (#NNNN)``

Bug Fixes
~~~~~~~~~

* ``<commit subject for bugfix> (#NNNN)``

Misc
~~~~

* ``<commit subject for misc/min_airflow_bump> (#NNNN)``

Doc-only
~~~~~~~~

* ``<commit subject for doc> (#NNNN)``

.. Below changes are excluded from the changelog. Move them to
   appropriate section above if needed. Do not delete the lines(!):
   * ``<commit subject for skip> (#NNNN)``

Rules:

  • Include the .. note:: block only when the version bump was driven by a min_airflow_bump (or by a breaking whose breaking aspect is the Airflow min bump).
  • Drop a section entirely if it has no entries (e.g. no Breaking changes section if there were none — don't leave an empty header).
  • The .. Below changes are excluded ... block at the end is required even if empty. Lines under it use the indented form "   * ``<subject> (#NNNN)``" (three-space indent, double backticks).
  • Subjects must be the original commit subject with backticks replaced by single quotes (matches message_without_backticks). Don't paraphrase.
  • Always keep the (#NNNN) PR suffix.
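
To make the rules above concrete, here is a rough sketch that renders a section from classified commits. It is not a substitute for CHANGELOG_TEMPLATE.rst.jinja2, which remains the authority on format. The function name `render_changelog_section` is hypothetical, the `.. note::` block is deliberately left out, and the PR numbers used with it are illustrative.

```python
from __future__ import annotations

# (classification, section title) in template order. min_airflow_bump lands under Misc.
SECTIONS = [
    ("breaking", "Breaking changes"),
    ("feature", "Features"),
    ("bugfix", "Bug Fixes"),
    ("misc", "Misc"),
    ("min_airflow_bump", "Misc"),
    ("documentation", "Doc-only"),
]


def _entry(subject: str) -> str:
    # Backticks in the subject are replaced with single quotes, never paraphrased.
    cleaned = subject.replace("`", "'")
    return f"* ``{cleaned}``"


def render_changelog_section(version: str, commits: list[tuple[str, str]]) -> str:
    """Render one version section; commits are (classification, subject incl. '(#NNNN)')."""
    lines = [version, "." * len(version), ""]
    titles = list(dict.fromkeys(title for _, title in SECTIONS))  # unique, order kept
    for title in titles:
        keys = {key for key, t in SECTIONS if t == title}
        subjects = [subject for cls, subject in commits if cls in keys]
        if not subjects:
            continue  # drop empty sections entirely
        lines += [title, "~" * len(title), ""]
        lines += [_entry(subject) for subject in subjects]
        lines.append("")
    # The excluded block is always emitted, even with no skipped commits.
    lines += [
        ".. Below changes are excluded from the changelog. Move them to",
        "   appropriate section above if needed. Do not delete the lines(!):",
    ]
    lines += ["   " + _entry(subject) for cls, subject in commits if cls == "skip"]
    return "\n".join(lines) + "\n"
```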

4c. Regenerate templates with breeze

Once all providers have their provider.yaml and changelog.rst updated, run:

bash
breeze release-management prepare-provider-documentation \
    --reapply-templates-only \
    --skip-git-fetch \
    --release-date "$RELEASE_DATE"

This regenerates __init__.py, README.rst, pyproject.toml, conf.py, get_provider_info.py, and index.rst for every provider — picking up the new versions you just wrote. It will not touch changelog.rst.

[!NOTE] commits.rst per provider is also stable template content (the actual commit list is rendered at doc-build time via the airflow-providers-commits directive). It will be regenerated on the next full release. No action needed here.

Phase 5 — Validate

Run the same checks the release manager would run:

bash
# RST lint + license headers + ruff on Python files
prek run --from-ref main --hook-stage pre-commit

# Spot-check that provider.yaml versions parse
breeze release-management prepare-provider-documentation \
    --reapply-templates-only --skip-git-fetch \
    --release-date "$RELEASE_DATE"   # idempotent — should be a no-op diff

Then git diff --stat and walk the release manager through the diff provider-by-provider:

  • Confirm the version in provider.yaml matches the bump rule.
  • Confirm changelog.rst has the right sections populated.
  • Flag anything where Phase 3.5 had to escalate, so the RM can double-check.

Stop here. Do not commit, do not push — the release manager opens the PR themselves following the regular release workflow in dev/README_RELEASE_PROVIDERS.md.


Incremental Update

Use this flow when the release PR has already been opened (changelog and version bumps applied via Phases 1–5) and the release manager rebases it to pick up commits that landed on main after the original classification. This is the equivalent of breeze release-management prepare-provider-documentation --incremental-update, but driven by the same AI classification logic as the initial run.

[!IMPORTANT] Run on the release PR branch after rebasing onto the latest base branch. Do not start the incremental flow on a clean checkout — it needs the prior classifications already written into changelog.rst to diff against.

Incremental Phase 1 — Refresh the apache remote

bash
breeze release-management prepare-provider-documentation \
    --reapply-templates-only \
    --release-date "$RELEASE_DATE"

This re-fetches apache-https-for-providers/<base-branch> and regenerates the auto-generated build files for every provider — picking up any upstream template changes that landed since the original PR was opened. It does not touch provider.yaml or changelog.rst.

Incremental Phase 2 — Detect new commits per provider

For each provider that already has a new version section in its changelog.rst (the providers in the release PR), get the current commit list the same way as Phase 2 of the initial run:

bash
PROVIDER_ID=<dotted.id>
PROVIDER_PATH=$(echo "$PROVIDER_ID" | tr '.' '/')   # folder path (slashes)
PROVIDER_TAG=$(echo "$PROVIDER_ID" | tr '.' '-')    # tag segment (hyphens)
# Same tag-selection rules as Phase 2: hyphenated tag segment, and skip sentinel
# (99.98.0/99.99.0) and rc tags so we compare against the last *final* release.
LAST_TAG=$(git tag --list "providers-${PROVIDER_TAG}/*" --sort=-v:refname \
    | grep -vE '/99\.9[0-9]\.' | grep -vE 'rc[0-9]+$' | head -n1)
# If you overrode the base branch, replace "main" below with that branch name.
git log --pretty=format:'%H %h %cd %s' --date=short \
    "${LAST_TAG}..apache-https-for-providers/main" \
    -- "providers/${PROVIDER_PATH}/"

Then identify new commits by comparing PR numbers to the existing changelog. A commit is "new" if its (#NNNN) PR suffix is not already present anywhere in providers/${PROVIDER_PATH}/docs/changelog.rst. This is exactly the same predicate breeze uses internally (see _generate_new_changelog append branch in dev/breeze/src/airflow_breeze/prepare_providers/provider_documentation.py).

bash
CHANGELOG="providers/${PROVIDER_PATH}/docs/changelog.rst"
# Emit only commits whose (#NNNN) suffix is NOT already in the changelog.
# Commits without any PR number are emitted too, so they get reviewed.
git log --pretty=format:'%H %h %cd %s' --date=short \
    "${LAST_TAG}..apache-https-for-providers/main" \
    -- "providers/${PROVIDER_PATH}/" \
  | python3 -c "
import re, sys
seen = open('${CHANGELOG}').read()
for line in sys.stdin:
    m = re.search(r'\(#(\d+)\)', line)
    if not m or f'(#{m.group(1)})' not in seen:
        print(line, end='')
"

If there are zero new commits for a provider, skip it.

Incremental Phase 3 — Classify the new commits

Same logic as Phase 3 of the initial run — including the auto-classify heuristic for docs/test-only changes and the sub-agent-per-PR pattern with the breaking-change checklist. The output is a per-provider table mapping each new commit hash to a classification.

Incremental Phase 3.5 — Decide whether to escalate the version bump

Compute the most-impactful classification across both the existing classified commits in the changelog and the new ones. If the most impactful is now stronger than what's already in provider.yaml, the version needs to be re-bumped. The escalation table:

| Was bumped to | Now most-impactful is | Action |
|---------------|-----------------------|--------|
| patch | feature | re-bump to next minor (X.Y+1.0) |
| patch | min_airflow_bump | re-bump to next minor (X.Y+1.0) |
| patch / minor | breaking | re-bump to next major (X+1.0.0) |
| minor | feature | no change (already minor) |
| anything | bugfix or misc | no change |

A re-bump means: replace the prepended version in provider.yaml AND update the version header in changelog.rst's new section to match.

Always confirm a re-bump with the release manager — explicitly state the old version, the new version, and which incoming commit forced the escalation. Don't silently re-bump.
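
The escalation table reduces to comparing the bump kind that was already applied with the one the combined commit set now requires. A minimal sketch, with `rebump_action` as a hypothetical helper name:

```python
from __future__ import annotations

BUMP_RANK = {"patch": 0, "minor": 1, "major": 2}
REQUIRED_BUMP = {
    "breaking": "major",
    "feature": "minor",
    "min_airflow_bump": "minor",
    "bugfix": "patch",
    "misc": "patch",
}


def rebump_action(applied_bump: str, most_impactful: str) -> str | None:
    """Return the stronger bump kind to re-apply, or None when no re-bump is needed."""
    required = REQUIRED_BUMP.get(most_impactful)
    if required is None:
        return None  # documentation / skip never escalate
    if BUMP_RANK[required] <= BUMP_RANK[applied_bump]:
        return None  # the existing bump already covers it
    return required
```

A non-None result is the signal to stop and confirm with the release manager before touching provider.yaml.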

Incremental Phase 4 — Apply the new entries

For each new commit, insert into the existing latest-version section of changelog.rst under the right header:

| Classification | Section |
|----------------|---------|
| breaking | Breaking changes |
| feature | Features |
| bugfix | Bug Fixes |
| misc | Misc |
| min_airflow_bump | Misc |
| documentation | Doc-only |
| skip | excluded block at end |

If the section header doesn't exist yet (e.g. previously there were no breaking changes, but a new commit introduced one), create the header above the next existing section, matching the order in CHANGELOG_TEMPLATE.rst.jinja2: Breaking changes → Features → Bug Fixes → Misc → Doc-only.
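
Finding where a missing header belongs is a lookup in the template order. The sketch below returns the existing header that the new one should be placed above; `header_to_insert_before` is a hypothetical helper, and a None result is taken to mean "after the last existing section, before the excluded block".

```python
from __future__ import annotations

SECTION_ORDER = ["Breaking changes", "Features", "Bug Fixes", "Misc", "Doc-only"]


def header_to_insert_before(new_header: str, existing_headers: list[str]) -> str | None:
    """Return the first existing header that follows new_header in template order."""
    position = SECTION_ORDER.index(new_header)
    for later in SECTION_ORDER[position + 1:]:
        if later in existing_headers:
            return later
    return None  # no later section exists: place it last, above the excluded block
```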

If you re-bumped the version in Incremental Phase 3.5, also add or remove the .. note:: block about the Airflow min version requirement to match the new bump kind.

Incremental Phase 5 — Validate

Same as Phase 5 of the initial run plus an extra check: confirm there are no leftover "Please review …" markers from a prior interactive breeze release-management prepare-provider-documentation --incremental-update run. If any are present (someone ran the breeze incremental flow before invoking this skill), remove them as part of the final pass. Then walk the diff with the release manager.


Cross-Cutting Rules

PRs covering multiple providers

When a single PR touches several providers (e.g. Add Python 3.14 Support (#63520) touches dozens), classify it independently per provider. The same PR can be feature in one provider (a real new capability) and misc in another (just a constraint bump in pyproject.toml). Always scope the sub-agent's diff inspection to the current provider's path:

bash
gh pr diff <NNNN> -- 'providers/<provider-path>/**'

If the per-provider classifications come back different, do NOT try to "reconcile" them — that's a feature, not a bug. The release manager wants each provider's changelog to reflect what changed in that provider.
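
Scoping a multi-provider PR to one provider starts with the list of changed files. The sketch below filters that list by provider path. It assumes the JSON printed by `gh pr view <NNNN> --json files` has a top-level "files" array whose items carry a "path" key; treat that shape as an assumption and check it against real output. `provider_slice` is a hypothetical helper.

```python
from __future__ import annotations

import json


def provider_slice(pr_view_json: str, provider_id: str) -> list[str]:
    """Return the changed paths of a PR that fall under one provider's folder."""
    # Trailing slash keeps e.g. providers/common/sql/ separate from providers/common/sqlx/.
    prefix = f"providers/{provider_id.replace('.', '/')}/"
    files = json.loads(pr_view_json).get("files", [])
    return [item["path"] for item in files if item["path"].startswith(prefix)]
```

An empty slice for a provider suggests the commit reached that provider's list through a legacy path or a repo-wide change, which is worth flagging rather than classifying blindly.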

Asking the release manager — phrasing

When you ask, state your best guess and the alternative explicitly:

Provider amazon, commit 4cd5678 ("Remove deprecated S3Hook.list_objects" #65030): I classified this as breaking because the symbol is removed from the public API in providers/amazon/src/airflow/providers/amazon/aws/hooks/s3.py, even though the PR description says "deprecated since 8.0.0". Confirm breaking (major bump 9.x → 10.0.0) or override to misc (patch)?

Don't ask vague yes/no questions ("is this breaking?"); always offer the two alternatives with the version-bump consequence.

Things you must NOT do silently

  • Bump major version without explicit confirmation from the release manager.
  • Reclassify a commit the RM already confirmed.
  • Skip commits that don't fit a category — flag them as ? and ask.
  • Edit commits.rst, index.rst, __init__.py, README.rst, pyproject.toml, conf.py, get_provider_info.py directly. Those are template-generated by breeze.
  • Run git add or git commit — the release manager owns the PR.

When to give up and fall back to interactive breeze

If the per-provider commit count is huge (50+) and the sub-agents come back with low confidence on most of them (typically because the diffs require deep domain knowledge), tell the release manager you're stopping the AI classification and recommend they run the regular interactive breeze release-management prepare-provider-documentation for that specific provider. Don't try to power through guesswork — the wrong classification at major-bump granularity is worse than a slower manual run.


References

  • dev/breeze/src/airflow_breeze/prepare_providers/provider_documentation.py — the breeze module this skill replaces (classification + changelog generation). Read this when in doubt about format.
  • dev/breeze/src/airflow_breeze/templates/CHANGELOG_TEMPLATE.rst.jinja2 — exact format for the changelog section you write in Phase 4b.
  • dev/README_RELEASE_PROVIDERS.md §"Convert commits to changelog entries and bump provider versions" — the human workflow this skill automates.
  • PROVIDERS.rst §"Upgrading minimum supported version of Airflow" — policy for min_airflow_bump classifications.