.agents/skills/airflow-java-sdk/SKILL.md
The Java SDK lets Airflow tasks execute JVM code (Java, Kotlin, or any JVM language). You are helping a contributor work in one or both of these locations:

- java-sdk/: the JVM-side library (Kotlin source, published to Maven)
- task-sdk/src/airflow/sdk/coordinators/java/: the Python coordinator that launches the JVM subprocess

Read these two documents early in every session; they contain the authoritative reference material:

- airflow-core/docs/authoring-and-scheduling/language-sdks/java.rst (user-facing guide):
  annotation vs. interface API, XCom type mapping, Gradle/Maven steps, coordinator config.
- java-sdk/README.md (contributor guide): repository layout, detailed execution walkthrough,
  Gradle + Breeze test commands, coding conventions, common tasks, and PR checklist.

The JVM-side library is split into two packages with distinct visibility rules:

- org.apache.airflow.sdk: public, user-facing API. Classes here (e.g. Client, Bundle,
  BundleBuilder, Server) are stable contracts that DAG authors and task implementers import
  directly. Changes to this package are breaking changes.
- org.apache.airflow.sdk.execution: internal implementation detail. Everything in this
  package (CoordinatorComm, LogSender, Log, Client in execution/, generated schema
  models, etc.) is not intended to be imported by users. It may change between releases without
  notice.

When reviewing or writing code, enforce this boundary: user task code and BundleBuilder
subclasses must only import from org.apache.airflow.sdk; any import of
org.apache.airflow.sdk.execution.* in user-facing API surface is a red flag.
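
The boundary rule above lends itself to a mechanical check during review. The following is a minimal sketch, not part of the SDK tooling: `find_internal_imports` is a hypothetical helper that scans Kotlin or Java source text for import statements that reach into the internal package.

```python
import re

# Hypothetical review helper (not part of the Airflow Java SDK).
# Matches Kotlin/Java import lines that target the internal execution package.
INTERNAL_IMPORT = re.compile(
    r"^[ \t]*import[ \t]+(?:static[ \t]+)?org\.apache\.airflow\.sdk\.execution\b",
    re.MULTILINE,
)


def find_internal_imports(source):
    """Return the 1-based line numbers of imports from org.apache.airflow.sdk.execution."""
    return [
        source.count("\n", 0, match.start()) + 1
        for match in INTERNAL_IMPORT.finditer(source)
    ]
```

Running it over each file of a user-facing module and failing on a non-empty result would surface violations early; it only inspects import statements, so fully-qualified references inside code bodies would need a separate check.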
A bundle is a directory of JAR files (typically build/bundle/) placed on the coordinator's
jars_root. The coordinator scans the directory at task-dispatch time to find:

- Main-Class (standard JAR manifest attribute): the fully-qualified class name of the
  entry point that the coordinator invokes with java -classpath … <Main-Class> --comm … --logs ….
  This must be a class with a public static void main(String[] args) method; the Gradle plugin
  org.apache.airflow.sdk writes it automatically from airflowBundle { mainClass = "…" } and
  validates that the class exists and has the right signature at build time.
- Airflow-Supervisor-Schema-Version (Airflow-specific manifest attribute): the wire
  protocol version the JVM side expects when talking to the Python supervisor. In fat-JAR mode
  (the default), the Gradle plugin reads this value from the airflow-sdk JAR in
  runtimeClasspath and copies it into the shadow JAR manifest. In thin-JAR mode
  (fatJar = false), the value stays in the airflow-sdk JAR deployed alongside the bundle JAR.

The Python coordinator (JavaCoordinator) scans every JAR under jars_root with
_JarInfo.find(), reads META-INF/MANIFEST.MF out of each ZIP, and collects Main-Class and
Airflow-Supervisor-Schema-Version from whichever JARs carry them. The resolved schema version
is then passed as the schema_version return value from _build_execute_task_command, which
the base SubprocessCoordinator uses to negotiate the supervisor wire protocol.
If main_class is set explicitly on the JavaCoordinator instance (via [sdk] coordinators
kwargs), the scan uses it as a filter; otherwise the first JAR with a Main-Class attribute
wins. Either way, Airflow-Supervisor-Schema-Version must be present in at least one JAR in
jars_root or startup fails.
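
The manifest scan described above can be sketched with the standard library alone. This is an illustration of the technique, not the actual `_JarInfo.find()` implementation: `read_manifest` and `resolve_bundle` are hypothetical names, and the sorted scan order and the error raised when no entry point is found are assumptions.

```python
import zipfile
from pathlib import Path


def read_manifest(jar_path):
    """Parse the main section of META-INF/MANIFEST.MF into a dict of attributes."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:
            return {}  # JAR without a manifest
    attrs = {}
    key = None
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and key is not None:
            attrs[key] += line[1:]  # continuation of a wrapped value
        elif ":" in line:
            key, _, value = line.partition(":")
            attrs[key] = value.strip()
    return attrs


def resolve_bundle(jars_root, main_class=None):
    """Return (main_class, schema_version) collected from the JARs under jars_root."""
    found_main = None
    schema_version = None
    # Assumption: JARs are visited in sorted path order.
    for jar_path in sorted(Path(jars_root).rglob("*.jar")):
        attrs = read_manifest(jar_path)
        candidate = attrs.get("Main-Class")
        # An explicit main_class acts as a filter; otherwise the first one wins.
        if candidate and found_main is None and main_class in (None, candidate):
            found_main = candidate
        if schema_version is None:
            schema_version = attrs.get("Airflow-Supervisor-Schema-Version")
    if schema_version is None:
        raise RuntimeError("no JAR carries Airflow-Supervisor-Schema-Version")
    if found_main is None:
        raise RuntimeError("no JAR carries a matching Main-Class")
    return found_main, schema_version
```

Because the two attributes are collected independently, the sketch handles both layouts: a fat JAR that carries both attributes, and a thin bundle JAR whose schema version lives in the accompanying airflow-sdk JAR.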
| File | Purpose |
|---|---|
| java-sdk/sdk/.../Client.kt | Public API (Variables, Connections, XCom) |
| java-sdk/sdk/.../execution/Client.kt | Supervisor wire calls |
| java-sdk/sdk/.../execution/Comm.kt | 4-byte-prefix MessagePack framing |
| java-sdk/sdk/.../Server.kt | Entry-point; drives the execution loop |
| java-sdk/processor/.../BuilderProcessor.kt | Kapt annotation processor |
| java-sdk/plugin/.../AirflowSdkPlugin.kt | Gradle bundle plugin |
| task-sdk/.../coordinators/java/coordinator.py | Python side: spawns the JVM |
| task-sdk/.../schema/schema.json | Wire protocol definition (both sides) |
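
The "4-byte-prefix" framing listed for Comm.kt follows a common length-prefix pattern, which can be sketched as below. This is an illustration only: the byte order (big-endian here) is an assumption, the function names are hypothetical, and the payload is treated as opaque bytes where the real protocol would carry a MessagePack-encoded message.

```python
import io
import struct


def write_frame(stream, payload):
    """Write one frame: a 4-byte unsigned length (big-endian, assumed) plus the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_frame(stream):
    """Read one frame and return its payload, or None on a clean end of stream."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        raise EOFError("stream ended inside a frame header")
    (length,) = struct.unpack(">I", header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream ended inside a frame payload")
    return payload


# Round trip through an in-memory buffer.
buffer = io.BytesIO()
write_frame(buffer, b"hello")
buffer.seek(0)
assert read_frame(buffer) == b"hello"
```

The sketch relies on `read(n)` returning all `n` bytes unless the stream ends, which holds for `io.BytesIO` and buffered files; a raw socket would need a loop that keeps reading until the requested count is reached.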
Always use ./gradlew from inside java-sdk/; never run Gradle via apt's gradle.
See java-sdk/README.md#testing for the full list of Gradle commands.
For the Python coordinator, use Breeze (never pytest directly on the host):

    breeze testing task-sdk-tests -- task_sdk/coordinators/java

End-to-end test suite:

    E2E_TEST_MODE=java_sdk uv run --project airflow-e2e-tests pytest \
      tests/airflow_e2e_tests/java_sdk_tests/ -xvs

coordinator.py extends SubprocessCoordinator. The only method subclasses must implement is
_build_execute_task_command, which returns (argv, schema_version). Look at the existing
implementation for how jars_root, java_executable, jvm_args, and main_class are
assembled into the command. Do not reach into the JVM process from Python beyond what this
method provides.
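
To make the `(argv, schema_version)` contract concrete, here is a standalone sketch of how such a command could be assembled. It is not the real method: the function name, the `comm_target` and `logs_target` parameters, and the wildcard classpath are all assumptions for illustration, and the existing implementation in coordinator.py remains the reference.

```python
import os


def build_execute_task_command(
    jars_root,
    main_class,
    schema_version,
    comm_target,  # hypothetical: value passed after --comm
    logs_target,  # hypothetical: value passed after --logs
    java_executable="java",
    jvm_args=(),
):
    """Return (argv, schema_version) for launching the JVM entry point."""
    # Assumption: a "dir/*" classpath entry, which the JVM launcher itself
    # expands to every JAR in that directory (no shell involved).
    classpath = os.path.join(jars_root, "*")
    argv = [
        java_executable,
        *jvm_args,
        "-classpath",
        classpath,
        main_class,
        "--comm",
        comm_target,
        "--logs",
        logs_target,
    ]
    return argv, schema_version
```

Returning the command as a list rather than a shell string keeps quoting out of the picture and leaves process spawning to the base class.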
When upgrading to a newer Supervisor Schema version:

1. Run ./gradlew generateJsonSchema2Pojo
2. Update execution/Client.kt to handle changes

The java-sdk/README.md#contributing section walks through the full "adding a new Client
method" sequence step by step.