pip/pip-465.md
Apache Pulsar ships ~30 IO connectors (Kafka, Kinesis, Cassandra, Elasticsearch, JDBC, Debezium,
etc.) as part of its main repository. These connectors are packaged as NAR files and bundled into
a pulsar-all Docker image alongside the core broker, client, and functions runtime.
Each connector brings its own dependency tree — often large and conflicting with other connectors
or with Pulsar's core dependencies. The connectors interact with Pulsar exclusively through the
stable pulsar-io-core API, making them natural candidates for independent development and release.
The primary goal of this PIP is to make development of Pulsar easier by shrinking the core codebase. Removing ~30 connectors and their dependency trees from the main repository will substantially reduce compile time, test execution time, and CI resource consumption, and improve CI stability.
Build and CI impact. Compiling and packaging 30+ connector NARs adds significant time to every CI run and local build, even when a developer is only working on the broker or client. The connectors collectively bring hundreds of transitive dependencies into the build graph, which slows down dependency resolution, inflates vulnerability reports (OWASP checks must scan connector dependencies), and creates version conflicts that require careful management in the main repository's BOM. Removing them dramatically reduces the surface area of the build.
Release coupling. Connectors are tied to the Pulsar release cycle. A bug fix in a single connector (e.g., updating the Elasticsearch client) requires waiting for the next Pulsar release. Conversely, a Pulsar patch release must rebuild all connectors even when none of them changed. The release cadence for connectors will be independent from Pulsar releases, similar to what we already do for client SDKs (Go, Python, Node.js).
Low integration risk. The pulsar-io-core API that connectors depend on has been very
stable for a long time. There have been no breaking changes to the connector API in years,
so there is essentially no risk of integration pain from this split.
Docker image bloat. The pulsar-all image bundles every connector NAR, weighing in at
~2.9 GB — a very large image that most deployments don't need. Users typically deploy only
1-2 connectors but pay the image pull cost for all of them. The main reason users chose
pulsar-all over pulsar was to get the tiered-storage offloaders; this PIP addresses that by packaging the
offloader NARs directly into the pulsar image. Users who need specific connectors can still
build tailored images by adding just the connector NARs they need on top of apachepulsar/pulsar.
Independent velocity. Connector maintainers should be able to release new connector versions against a stable Pulsar API without coordinating with the core release train.
Create apache/pulsar-connectors repository containing all IO connector modules, with
their own Gradle build, version catalog, and CI pipeline. The repository is forked from the
main Pulsar repository to preserve full git history.
Remove connector modules from the main Pulsar repository. Retain only:
- pulsar-io-core (the connector API)
- pulsar-io-data-generator (minimal connector used in integration tests)

Remove the pulsar-all Docker image. The image is too large and most users don't need
all connectors in a single image. The pulsar image becomes the single official image.
Tiered-storage offloader NARs — the main reason users chose pulsar-all — are included
directly in the pulsar image.
Independent connector releases. The pulsar-connectors repository has its own versioning
and release cadence, independent from Pulsar releases — similar to what we already do for
client SDKs. It can release new connector versions against any compatible Pulsar release.
Connector distribution packaging. The connectors repository produces a single release containing all connector NARs, as a distribution tarball that users can deploy into an existing Pulsar installation.
pulsar-io-core)

The split creates two repositories from what is currently one:
apache/pulsar (main repo)
├── pulsar-io/core/ # Connector API (retained)
├── pulsar-io/data-generator/ # Test connector (retained)
├── pulsar-functions/ # Runtime + worker (retained)
├── docker/pulsar/ # Single Docker image
└── (broker, client, etc.)
apache/pulsar-connectors (new repo)
├── aerospike/
├── aws/
├── cassandra/
├── debezium/
│ ├── core/
│ ├── mysql/
│ ├── postgres/
│ └── ...
├── elastic-search/
├── jdbc/
│ ├── core/
│ ├── postgres/
│ └── ...
├── kafka/
├── kafka-connect-adaptor/
├── kinesis/
├── rabbitmq/
├── ... (all other connectors)
├── distribution/io/ # Distribution packaging
└── docs/ # Connector docs generation
The connectors repository consumes Pulsar artifacts (pulsar-io-core, pulsar-client, etc.)
as external Maven dependencies, not as source dependencies. This ensures connectors build against
the published API and don't accidentally depend on internals.
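As a rough illustration of consuming Pulsar as published artifacts, the pinning could live in a Gradle platform module such as the pulsar-dependencies/ module this proposal describes. The sketch below is an assumption about how that module might look, not the actual build: the `pulsarVersion` property name follows the snippet used elsewhere in this document, and the list of pinned artifacts is abbreviated.

```kotlin
// pulsar-dependencies/build.gradle.kts (illustrative sketch, not the actual build file)
plugins {
    `java-platform`
}

// Assumption: pulsarVersion is supplied as a Gradle property,
// for example in gradle.properties or via -PpulsarVersion=<version>.
val pulsarVersion: String by project

dependencies {
    constraints {
        // Pin the published Pulsar artifacts that connectors compile against.
        api("org.apache.pulsar:pulsar-io-core:$pulsarVersion")
        api("org.apache.pulsar:pulsar-client:$pulsarVersion")
    }
}
```

A connector module could then import the platform with `implementation(platform(project(":pulsar-dependencies")))` and declare its Pulsar dependencies without repeating version numbers.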
The new pulsar-connectors repository is forked from the main Pulsar repository to preserve
git history, then trimmed to contain only connector-related modules. Connectors are promoted
from nested pulsar-io/<name> paths to top-level <name>/ directories for a flatter structure.
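To make the flatter layout concrete, a settings file for the new repository might look roughly like the following. This is a sketch only: the module names are taken from the directory tree in this proposal, the list is abbreviated, and the root project name and the inclusion of the platform module are assumptions.

```kotlin
// settings.gradle.kts (illustrative sketch; module list abbreviated)
rootProject.name = "pulsar-connectors" // assumption: named after the repository

include(
    // Connectors that previously lived under pulsar-io/<name> become top-level projects.
    "aerospike",
    "cassandra",
    "elastic-search",
    "kafka",
    "kafka-connect-adaptor",
    "kinesis",
    "rabbitmq",
    // Connector families keep their nested modules, e.g. debezium/core on disk.
    "debezium:core",
    "debezium:mysql",
    "debezium:postgres",
    "jdbc:core",
    "jdbc:postgres",
    // Assumption: the platform module is a regular included project.
    "pulsar-dependencies"
)
```

With Gradle's default mapping, a project path such as `debezium:core` resolves to the `debezium/core/` directory, so no extra `projectDir` configuration would be needed for this layout.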
The connectors repository has its own:
- settings.gradle.kts with all connector modules
- gradle/libs.versions.toml with connector-specific dependency versions
- pulsar-dependencies/ platform module pinning Pulsar artifact versions
- build.gradle.kts root build with shared configuration

Pulsar core artifacts are declared as dependencies with a configurable version:
implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
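One way the configurable version could be wired into a connector module is sketched below. It assumes `pulsarVersion` is provided as a Gradle property; the proposal does not specify the mechanism, so the property source and the surrounding plugin and repository setup are illustrative only.

```kotlin
// build.gradle.kts of a single connector module (illustrative sketch)
plugins {
    `java-library`
}

// Assumption: the version comes from gradle.properties or from the command line,
// e.g. ./gradlew build -PpulsarVersion=<version>. The build fails if it is not set.
val pulsarVersion: String by project

repositories {
    mavenCentral()
}

dependencies {
    // Pulsar is consumed as a published artifact, not as a source dependency.
    implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
}
```

Keeping the version in a single property would let CI build the same connector sources against several Pulsar releases by changing only that value.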
The initial release of pulsar-connectors will use the same version as the next Pulsar
release (whether that is 4.3 or 5.0), to make the transition clear. After that, the
connectors repository follows its own independent release cadence.
All connectors are released together as a single release (not individually), and each
release specifies which Pulsar versions it is compatible with.
The pulsar-all image is removed. It bundled all connector NARs alongside the broker,
producing a very large image that most deployments didn't need. The main reason users chose
pulsar-all over pulsar was to get the tiered-storage offloaders. With this change:
- Tiered-storage offloader NARs are included directly in the pulsar image, eliminating the
  primary reason for pulsar-all to exist
- The pulsar Docker image becomes the single official image, containing the broker, functions
  runtime, and tiered-storage offloader NARs
- Users who need connectors add the connector NARs on top of apachepulsar/pulsar, or mount them
  via volume mounts
- The main repository retains data-generator for testing the connector loading and runtime
  machinery

Users who currently use the pulsar-all Docker image:

- Switch to the pulsar Docker image
- Add the connector NARs they need to the connectors directory (/pulsar/connectors/)

Users who build from source:

- Build connectors from the pulsar-connectors repository

| Before | After |
|---|---|
| pulsar: core only | pulsar: core + tiered-storage offloaders |
| pulsar-all: core + all connectors + offloaders | (removed) |

No changes to broker, client, or functions worker configuration.
The connector API (pulsar-io-core) does not change. Existing connector NARs continue
to work with the functions worker without modification.
The pulsar-io-core API has been very stable for years with no breaking changes, so connectors
built against older API versions will continue to work with newer Pulsar releases and vice versa.
New connector releases can target older Pulsar versions, as long as the pulsar-io-core
API they depend on is compatible. Given the long track record of API stability, this is
expected to work seamlessly across Pulsar 4.x releases.
No security implications. Connectors continue to be loaded through the same NAR classloader isolation mechanism. The split does not change the security model.
Separating connector dependencies from the main repository actually improves security posture by reducing the attack surface of the core Pulsar build and making connector dependency updates independently releasable.