java-sdk/adr/0005-coordinator-packaging.md
Status: Proposed

> Note: This ADR describes coordinator packaging as a standalone distribution separate from the Task SDK. The current plan is to ship coordinators as part of the Task SDK (apache-airflow-task-sdk). A separate distribution may be introduced later, once the coordinator interface exits experimental status, but this is not committed. This document is retained for reference in case that split is revisited. Tracked operationally in apache/airflow#66451.


ADR-0001 introduces a coordinator extension point. Reviewers on PR #65958 raised three related but separable questions:

1. Should the Java coordinator ship as apache-airflow-providers-sdk-java (consistent with every other provider) or under airflow.sdk.coordinators as part of the Task SDK (recognizing that "language coordinator" is a structurally new kind of distribution that does not behave like operators/hooks/sensors)?
2. Should the code live under providers/sdk/java/ alongside other providers, or as a subpackage of the Task SDK?
3. Should coordinators be discovered through ProvidersManager, or through some other mechanism?

A second concern, raised separately, is runtime configuration: a single JavaCoordinator
class is not enough to express "use JDK 11 for the legacy queue and JDK 17 for the modern
queue, with different -Xmx values." Class-only registration forces operators to subclass for
every variant or hardcode environment lookups, which the issue calls out explicitly:

> How can I use different JDK version? How can I use different JVM arguments? We hardcoded the subprocess cmd … so users have to subclass another Coordinator to override the Java config. (apache/airflow#66451)

The Java coordinator ships as part of the Task SDK (apache-airflow-task-sdk) and is
importable as airflow.sdk.coordinators.java.JavaCoordinator. This avoids extra packaging
infrastructure (separate release cadence, testing matrix, constraints files) while the
coordinator interface is still stabilising. New language coordinators (go, typescript, …)
follow the same model.

A coordinator distribution exposes:

- A BaseCoordinator subclass under airflow.sdk.coordinators.<lang>.
- … provider.yaml.

### airflow.sdk.coordinators

Each coordinator contributes a subpackage to the namespace package airflow.sdk.coordinators.
The Task SDK owns the namespace; individual language coordinators add
airflow.sdk.coordinators.<lang>.
The Java coordinator therefore resolves as:

```python
from airflow.utils.module_loading import import_string

# Resolve the coordinator class from its dotted import path.
JavaCoordinator = import_string("airflow.sdk.coordinators.java.JavaCoordinator")
```

Both Airflow Core (DAG processor) and the Task SDK (task runner) import coordinators by this path. The namespace package layout means the physical distribution can change in the future without altering import paths or user configuration.
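
The namespace-package behaviour described above can be illustrated with a short, self-contained Python sketch. It assumes that airflow.sdk.coordinators behaves like a PEP 420 implicit namespace package. The top-level name demo_coordinators, the two temporary "site" directories, the GoCoordinator class and the local import_string helper are all hypothetical. The helper is a simplified stand-in for airflow.utils.module_loading.import_string. Each directory plays the role of a separately installed distribution that contributes one language subpackage, and both classes resolve by dotted path from the same namespace.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical top-level name standing in for airflow.sdk.coordinators.
NAMESPACE = "demo_coordinators"


def make_distribution(root: Path, lang: str, class_name: str) -> None:
    """Lay out one fake 'distribution' that contributes NAMESPACE.<lang>.

    No __init__.py is written for NAMESPACE itself, so Python treats it as a
    PEP 420 implicit namespace package that can span several sys.path entries.
    """
    pkg = root / NAMESPACE / lang
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(
        f"class {class_name}:\n    language = {lang!r}\n"
    )


def import_string(dotted_path: str):
    """Simplified stand-in: resolve 'package.module.Attr' to the attribute."""
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


with tempfile.TemporaryDirectory() as tmp:
    roots = [Path(tmp) / "site_java", Path(tmp) / "site_go"]
    make_distribution(roots[0], "java", "JavaCoordinator")
    make_distribution(roots[1], "go", "GoCoordinator")

    saved_path = list(sys.path)
    sys.path[:0] = [str(r) for r in roots]  # both "installed" side by side
    importlib.invalidate_caches()
    try:
        java_cls = import_string(f"{NAMESPACE}.java.JavaCoordinator")
        go_cls = import_string(f"{NAMESPACE}.go.GoCoordinator")
        # The namespace package's __path__ spans both site roots.
        ns_portions = len(list(importlib.import_module(NAMESPACE).__path__))
    finally:
        sys.path[:] = saved_path

print(java_cls.language, go_cls.language, ns_portions)
```

The physical location of each subpackage (here, a separate directory) is independent of the dotted path used to import it. That independence is the property the ADR relies on to keep import paths stable if packaging changes.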

### [sdk] coordinators (Airflow configuration)

Coordinators are not discovered through ProvidersManager /
ProvidersManagerTaskRuntime, and there is no coordinators key in provider.yaml. They are
registered as named instances in airflow.cfg:
```ini
[sdk]
coordinators = {
    "jdk-11": {
        "classpath": "airflow.sdk.coordinators.java.JavaCoordinator",
        "kwargs": {
            "java_executable": "/usr/lib/jvm/java-11-openjdk-amd64/bin/java",
            "jvm_args": ["-Xmx512m"],
            "jars_root": ["/files/legacy/lib"]
        }
    },
    "jdk-17": {
        "classpath": "airflow.sdk.coordinators.java.JavaCoordinator",
        "kwargs": {
            "java_executable": "/usr/lib/jvm/java-17-openjdk-amd64/bin/java",
            "jvm_args": ["-Xmx1024m", "-Xms256m"],
            "jars_root": ["/files/new/lib"]
        }
    }
    }
queue_to_coordinator = {"legacy-java-queue": "jdk-11", "modern-java-queue": "jdk-17"}
```

[sdk] coordinators is a JSON object: the key is the coordinator's name (used as the routing
target in [sdk] queue_to_coordinator), and the value supplies classpath and free-form
kwargs passed to the constructor. Two entries with the same classpath (e.g., both
JavaCoordinator) but different keys and kwargs are independent instances; this is how
JDK 11 and JDK 17 tasks run on the same worker without subclassing.
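
The lookup described above can be sketched in plain Python. This illustrates the stated semantics and is not Airflow's implementation. The sketch parses an airflow.cfg-style fragment with configparser and decodes both options as JSON. It then builds one instance per named entry and resolves a queue to its instance. types.SimpleNamespace stands in for JavaCoordinator so that the sketch runs without Airflow installed. The helpers build_coordinators and coordinator_for_queue are hypothetical, and so is the choice to return None for a queue that has no route.

```python
import configparser
import importlib
import json

# airflow.cfg-style fragment mirroring the example above. Continuation lines
# are indented so configparser folds them into one multi-line value.
# types.SimpleNamespace stands in for JavaCoordinator (Airflow not required).
CFG = """
[sdk]
coordinators = {
    "jdk-11": {"classpath": "types.SimpleNamespace",
               "kwargs": {"java_executable": "/usr/lib/jvm/java-11-openjdk-amd64/bin/java",
                          "jvm_args": ["-Xmx512m"]}},
    "jdk-17": {"classpath": "types.SimpleNamespace",
               "kwargs": {"java_executable": "/usr/lib/jvm/java-17-openjdk-amd64/bin/java",
                          "jvm_args": ["-Xmx1024m", "-Xms256m"]}}
    }
queue_to_coordinator = {"legacy-java-queue": "jdk-11", "modern-java-queue": "jdk-17"}
"""


def import_string(dotted_path: str):
    """Resolve 'package.module.Attr' to the named attribute."""
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def build_coordinators(parser: configparser.ConfigParser) -> dict:
    """Create one instance per named entry; a classpath may repeat freely."""
    specs = json.loads(parser.get("sdk", "coordinators"))
    return {
        name: import_string(spec["classpath"])(**spec.get("kwargs", {}))
        for name, spec in specs.items()
    }


def coordinator_for_queue(parser, registry: dict, queue: str):
    """Return the instance routed to `queue`, or None if the queue is unrouted."""
    routing = json.loads(parser.get("sdk", "queue_to_coordinator"))
    name = routing.get(queue)
    if name is None:
        return None  # assumption: unrouted queues run without a coordinator
    if name not in registry:
        raise LookupError(f"queue {queue!r} routes to unknown coordinator {name!r}")
    return registry[name]


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CFG)
registry = build_coordinators(parser)

legacy = coordinator_for_queue(parser, registry, "legacy-java-queue")
modern = coordinator_for_queue(parser, registry, "modern-java-queue")
print(type(legacy).__name__, legacy.jvm_args, modern.jvm_args)
```

Both entries share a classpath, so the two objects have the same type but carry independent kwargs. This is the property that lets one worker serve both queues without subclassing.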

### Why not provider.yaml / ProvidersManager?

- Coordinators are not providers in the Airflow sense: provider.yaml registers classes, not instances, and bolting kwargs onto provider entries would distort the provider data model.
- … airflow providers list, etc.). On the contrary, listing apache-airflow-providers-sdk-java next to AWS/GCP providers is misleading for users.

Putting the registry in airflow.cfg keeps the data model honest (instances, with their kwargs)
and makes the per-host opt-in (install + config-edit) explicit rather than implicit
(install-implies-active).

- airflow.sdk.coordinators is a namespace package owned by the Task SDK; language coordinator modules contribute subpackages to it, so multiple coordinator modules can be installed side by side without colliding.
- [sdk] coordinators carries instance-level configuration; [sdk] queue_to_coordinator carries queue → instance routing.
- Multiple instances of the same coordinator class (e.g., jdk-11 and jdk-17) can be registered with different kwargs and bound to different queues. This solves the multi-JDK and JVM-flag use cases raised in apache/airflow#66451 without subclassing.