ARCHITECTURE.md
Apache Pulsar is a distributed pub-sub messaging and streaming platform. The codebase is performance-critical, heavily asynchronous, and concurrency-sensitive (brokers, storage, networking).
The authoritative documentation lives at https://pulsar.apache.org — see the Architecture Overview for the conceptual model. For a deeper, generated architecture description see DeepWiki; coding agents can install the DeepWiki MCP for richer coverage of Pulsar's architecture. This file is a map of the repository for contributors (and AI coding agents) who need to find their way around the modules quickly.
Pulsar separates a stateless serving layer (brokers) from durable storage (Apache BookKeeper) and a metadata store (Oxia / ZooKeeper). The Gradle modules layer accordingly:
- pulsar-client-api, pulsar-client-admin-api: public, backward-compatible interfaces only.
  pulsar-client-api-v5 / pulsar-client-v5 are the newer V5 client API (PIP-466/468).
- pulsar-client (:pulsar-client-original): the Java client implementation
  (producer/consumer/reader, connection pooling). pulsar-client-admin implements the admin REST
  client.
- pulsar-common: wire protocol and shared types. Protobuf / lightproto messages are
  generated into generated-lightproto/ / generated-sources/ (excluded from checkstyle and
  spotless).
- pulsar-metadata: pluggable metadata store abstraction (Oxia / ZooKeeper, plus RocksDB and memory)
  used by broker and bookkeeper.
- managed-ledger: the storage abstraction over Apache BookKeeper, built from append-only ledgers
  plus cursors that track consumer/subscription positions. This is the durability layer the broker
  reads and writes through.
- pulsar-broker: the server. PulsarService is the composition root wiring everything
  together; BrokerService manages topics, subscriptions, and client connections. Entry points:
  PulsarBrokerStarter (broker), PulsarStandalone / PulsarStandaloneStarter (all-in-one),
  PulsarClusterMetadataSetup (cluster init).
- pulsar-proxy: optional proxy/gateway in front of brokers.
- pulsar-functions/*: serverless compute (Functions), split into proto, api-java, instance,
  runtime, worker, localrun.
- pulsar-io/*: connector framework core only; most built-in connectors were moved to the
  separate pulsar-connectors repo (PIP-465).
- pulsar-transaction/*: transaction coordinator and common types.
- tiered-storage/*, offloaders/: offload ledger data to cloud/filesystem storage.
- pulsar-websocket: WebSocket-to-Pulsar bridge. pulsar-client-tools: the
  pulsar-admin / pulsar-client CLIs.
- pulsar-client-shaded, pulsar-client-all, pulsar-client-admin-shaded produce relocated fat jars;
  distribution/* assembles server/shell/offloader tarballs.

The pip/ directory holds Pulsar Improvement Proposals (pip-<N>.md), the design
documents for significant changes, referenced as PIP-<N> throughout commit messages and code (e.g.
PIP-463 = Maven→Gradle migration, PIP-465 = IO connectors moved out, PIP-466/468 = V5 client).
pip/README.md describes the process and pip/TEMPLATE.md is the proposal template. Consult the
relevant PIP for the rationale behind a non-trivial feature or architectural decision. A PIP number
is reserved by the first dev@pulsar.apache.org thread that uses it; start the discussion to claim
the next free number.
Pulsar does not have a clearly established, documented concurrency model, which makes it hard to
evaluate whether a given piece of code is correct by construction. (Contrast Netty, which has a clear
rule: all handling on the IO thread is non-blocking, which by extension means avoiding synchronization
and locks on that path.) Pulsar does not strictly follow such a rule; modern JVMs and hardware
optimize synchronized code well enough that this has not blocked high performance, but it does make
reasoning about correctness harder than it needs to be.
Conventions that should be documented (and largely are not yet):
- when a CompletableFuture's completion thread should be switched to another thread, and which one.

Until such a model is written down, follow the surrounding code's conventions and the Java-Memory-Model
rules in CODING.md. Once a model is defined, it becomes far more tractable to
"lift and shift" existing code toward it and enforce the rules consistently rather than having each
contributor rediscover the conventions case by case.
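
To make the CompletableFuture question above concrete, here is a minimal sketch using only JDK classes. It is not Pulsar code: the class name and the "io-thread" / "worker-thread" executors are hypothetical stand-ins for an IO thread and an application pool. It shows that a dependent stage registered with thenApply before completion runs on whichever thread completes the future, while thenApplyAsync with an explicit executor moves it elsewhere.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only; names are hypothetical and not part of the Pulsar codebase.
final class CompletionThreadHandoff {

    /** Returns the name of the thread that executed the dependent stage. */
    static String runDependentStage(boolean handOff) throws Exception {
        // Stand-ins for an IO thread and an application worker pool.
        ExecutorService ioThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "io-thread"));
        ExecutorService workerPool =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "worker-thread"));
        try {
            CompletableFuture<String> response = new CompletableFuture<>();

            // The stage is attached while the future is still incomplete, so the
            // non-async variant will run on the thread that later calls complete().
            CompletableFuture<String> stage = handOff
                    ? response.thenApplyAsync(v -> Thread.currentThread().getName(), workerPool)
                    : response.thenApply(v -> Thread.currentThread().getName());

            // Complete the future from the simulated IO thread.
            ioThread.execute(() -> response.complete("payload"));

            return stage.get(5, TimeUnit.SECONDS);
        } finally {
            ioThread.shutdown();
            workerPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("thenApply ran on:      " + runDependentStage(false));
        System.out.println("thenApplyAsync ran on: " + runDependentStage(true));
    }
}
```

In this sketch the first call reports io-thread and the second reports worker-thread. Which executor is appropriate in a given Pulsar code path is exactly the convention this section says is not yet documented, so treat the choice of pool here as an assumption.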
Closely tied to the concurrency model is backpressure — how the system avoids accepting more work than it can handle, particularly with respect to memory. The memory side is described in PIP-442 "Existing Broker Memory Management". Broader backpressure (beyond memory) is not yet documented and would benefit from being defined alongside the concurrency model.
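
As a generic illustration of memory-based backpressure (not the mechanism described in PIP-442, and not a Pulsar class), the sketch below models a fixed byte budget with a JDK Semaphore. The class name MemoryBudget and its methods are hypothetical. The reservation call never blocks, so a caller on an IO thread can react to a refusal by pausing reads or rejecting the request instead of waiting.

```java
import java.util.concurrent.Semaphore;

// Hypothetical byte-budget limiter for illustration; not part of the Pulsar codebase.
final class MemoryBudget {
    private final Semaphore permits;

    MemoryBudget(int capacityBytes) {
        // One permit per byte of budget.
        this.permits = new Semaphore(capacityBytes);
    }

    /** Non-blocking: reserves the bytes, or returns false if the budget would be exceeded. */
    boolean tryReserve(int bytes) {
        return permits.tryAcquire(bytes);
    }

    /** Returns previously reserved bytes to the budget once the work has finished. */
    void release(int bytes) {
        permits.release(bytes);
    }

    int availableBytes() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        MemoryBudget budget = new MemoryBudget(1024);
        System.out.println(budget.tryReserve(800)); // true: 800 of 1024 bytes reserved
        System.out.println(budget.tryReserve(400)); // false: only 224 bytes remain
        budget.release(800);
        System.out.println(budget.tryReserve(400)); // true: budget was returned
    }
}
```

A refusal is the backpressure signal: the system declines new work rather than buffering it without bound. How that signal should propagate (for example, to the network layer) is part of the broader model this section notes is still undocumented.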
Apache Pulsar uses a Gradle build (migrated from Maven via PIP-463; some older tooling and docs
elsewhere still reference Maven). The wrapper ./gradlew requires JDK 21 or 25 (bytecode targets
Java 17). See CONTRIBUTING.md → Building for the build and lint commands.
- settings.gradle.kts: all modules, organized in dependency tiers (Tier 0 has no internal deps,
  higher tiers build on lower ones).
- build-logic/conventions/: convention plugins (pulsar.java-conventions,
  pulsar.code-quality-conventions, pulsar.shadow-conventions, etc.) applied by modules. Shared
  compile/test/dependency config lives here; edit it here rather than duplicating across modules.
- gradle/libs.versions.toml: version catalog (single source of truth for dependency versions;
  referenced as libs.* in build scripts).
- pulsar-dependencies: enforced platform (BOM) pinning all dependency versions; applied to every
  module.

The build enables both the configuration cache (org.gradle.configuration-cache=true) and
configure-on-demand (org.gradle.configureondemand=true).
Several Gradle project paths do not match their directory because the Maven artifactId is preserved. Most importantly:
- pulsar-client/ → project :pulsar-client-original
- pulsar-client-admin/ → project :pulsar-client-admin-original
- pulsar-functions/localrun/ → project :pulsar-functions:pulsar-functions-local-runner-original

Always use the Gradle project path (left of any --tests), e.g. ./gradlew :pulsar-client-original:test.
Check settings.gradle.kts when a path is ambiguous.
When editing build-logic/, settings.gradle.kts, a module build.gradle.kts, gradle.properties,
gradle/libs.versions.toml, or the pulsar-dependencies platform:
- Shared configuration belongs in build-logic/conventions/, not per-module.
- Dependency versions come from gradle/libs.versions.toml (libs.* / pulsar-dependencies); never
  hardcode a version in a build script.
- Avoid Project access in task actions; use Provider / value sources, and verify with
  --configuration-cache. Tasks reached by the common flows (assemble, test, integrationTest,
  rat / spotlessCheck / checkstyle*, checkBinaryLicense, docker*) must be configuration-cache
  compatible; one-off tooling tasks not part of those flows (e.g. verifyTestGroups, ad-hoc
  report tasks) may be exempt.
- pulsar.public-java-library-conventions.
- When a dependency changes, run ./gradlew checkBinaryLicense and update the distribution
  LICENSE/NOTICE; justify any genuinely new dependency (see CODING.md → Dependencies).

Before finishing a build change, confirm the affected task and ./gradlew help run clean with
--configuration-cache, and that assemble and rat spotlessCheck checkstyleMain checkstyleTest pass
(plus checkBinaryLicense if a dependency changed).