pip/pip-447.md
This PIP proposes a mechanism to selectively expose existing Pulsar Topic Properties as Prometheus metric labels. Instead of introducing a new mechanism for custom labels, administrators will define a set of allowed property keys at the broker level and, optionally, at the namespace level. If a topic has a property matching one of these allowed keys, its value will be automatically sanitized and injected as a Prometheus label. This enables granular metric filtering and alerting using metadata that users already maintain, while keeping cardinality under control through centralized configuration.
Currently, Pulsar topic metrics exposed to Prometheus have a fixed set of labels: cluster, namespace, and topic. This limits users' ability to categorize topics for alerting and dashboarding based on custom criteria (e.g., SLA tier, data sensitivity, application owner) that are not easily expressed through topic names alone. Relying on regular expressions (regex) applied to the topic label in Prometheus for such grouping is often complex, inefficient, and error-prone.
This limitation can lead to:
Imprecise Alerting: Difficulty in setting distinct alerting thresholds for different categories of topics.
Alert Fatigue or Missed Alerts: Overly broad or overly complex alerting rules.
Operational Overhead: Increased effort in managing and maintaining Prometheus alert configurations.
Users require a native Pulsar mechanism to inject queryable, custom metadata directly into topic metrics to improve alerting precision, simplify dashboarding, and enhance overall observability.
Users often already attach this metadata to topics using Topic Properties (e.g., sla_tier=gold, owner=team_a). However, these properties are currently invisible to the monitoring layer.
The primary goals of this proposal are to:
Allow administrators to define a configurable list of allowed topic property keys in the broker configuration.
Allow administrators to define namespace-level allowed topic property keys via the admin API, providing finer-grained control per namespace.
Update the Prometheus metrics generation logic to retrieve values from the existing Topic Properties map and inject them as labels if they match the allowed list (either broker-level or namespace-level).
Provide robust control over Prometheus metric cardinality by strictly enforcing the allow-list.
Configuration Implementation: Add parameters to broker.conf to define the broker-level allow-list (e.g., allowedTopicPropertyKeysForMetrics). Add namespace policy settings to enable namespace-level allow-list configuration via the admin API.
Metrics Generation Modification: Update the PrometheusMetricsServlet (or equivalent exporter) to:
Read the topic's existing properties.
Filter keys against the effective allow-list (the namespace-level list when one is set, otherwise the broker-level list).
Append them to the outgoing Prometheus metric lines.
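The three steps above can be sketched as follows. This is a minimal illustration, not the actual exporter code: the class TopicLabelRenderer and its methods are hypothetical names, and the escaping shown is the standard escaping of the Prometheus text exposition format (backslash, double quote, newline), which may be only part of the sanitization this proposal ends up applying.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of the Pulsar code base. */
final class TopicLabelRenderer {

    /** Escapes a label value for the Prometheus text exposition format. */
    static String escapeLabelValue(String value) {
        return value.replace("\\", "\\\\")   // backslash first, so later escapes are not doubled
                .replace("\"", "\\\"")
                .replace("\n", "\\n");
    }

    /**
     * Returns the extra label pairs for every allowed key present in the topic
     * properties, formatted as ,key="value" so the result can be appended
     * directly after the fixed labels (cluster, namespace, topic).
     */
    static String renderExtraLabels(Map<String, String> topicProperties, Set<String> allowedKeys) {
        StringBuilder sb = new StringBuilder();
        // A TreeMap copy gives a stable label order across scrapes.
        for (Map.Entry<String, String> e : new TreeMap<>(topicProperties).entrySet()) {
            if (allowedKeys.contains(e.getKey()) && e.getValue() != null) {
                sb.append(',').append(e.getKey())
                  .append("=\"").append(escapeLabelValue(e.getValue())).append('"');
            }
        }
        return sb.toString();
    }
}
```

For a topic with properties sla_tier=gold, owner=team_a, and internal_note=x, and an allow-list of sla_tier and owner, renderExtraLabels returns ,owner="team_a",sla_tier="gold"; the property that is not on the allow-list never reaches the metric line.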
We utilize the existing Map<String, String> properties in topic metadata. No new storage is needed.
The Prometheus metrics generation component will be modified to:
Read allowedTopicPropertyKeysForMetrics from broker config and namespace policy (namespace-level settings override broker-level).
To ensure compatibility with Prometheus and prevent conflicts with internal metrics, custom metric label keys must pass the following validation rules:
Non-empty: The label key must not be null or empty.
Valid characters: Must match the regex [a-zA-Z_][a-zA-Z0-9_]* (starts with a letter or underscore, followed by letters, digits, or underscores).
Reserved prefixes (internal use):
Keys starting with __ are reserved for Prometheus internal use.
Keys starting with pulsar_ or pulsar. are reserved for Pulsar internal use.
Keys starting with otel_ or otel. are reserved for OpenTelemetry internal use.
If a topic property key matches the allowed list but fails validation, it will be rejected with an error message indicating the invalid key and the expected format.
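The validation rules above could be checked along these lines. This is a sketch under stated assumptions: the class name MetricLabelKeyValidator, the use of IllegalArgumentException, and the message wording are all hypothetical; only the regex and the reserved prefixes come from the rules listed in this section.

```java
import java.util.regex.Pattern;

/** Hypothetical validator for illustration; names and messages are assumptions. */
final class MetricLabelKeyValidator {

    // Character rule from this section: letter or underscore, then letters, digits, or underscores.
    private static final Pattern VALID_KEY = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    // Reserved prefixes from this section.
    private static final String[] RESERVED_PREFIXES = {"__", "pulsar_", "pulsar.", "otel_", "otel."};

    /** Throws IllegalArgumentException naming the invalid key and the expected format. */
    static void validate(String key) {
        if (key == null || key.isEmpty()) {
            throw new IllegalArgumentException("Label key must not be null or empty");
        }
        // Check reserved prefixes first so the error names the more specific problem.
        for (String prefix : RESERVED_PREFIXES) {
            if (key.startsWith(prefix)) {
                throw new IllegalArgumentException(
                        "Label key '" + key + "' uses reserved prefix '" + prefix + "'");
            }
        }
        if (!VALID_KEY.matcher(key).matches()) {
            throw new IllegalArgumentException(
                    "Label key '" + key + "' must match " + VALID_KEY.pattern());
        }
    }
}
```

With this sketch, sla_tier passes, while 1abc, my-key, __name, pulsar_x, and otel.x are all rejected.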
The following new configuration parameters will be introduced in broker.conf:
exposeCustomTopicMetricLabelsEnabled=(true|false)
Description: Enables or disables the custom topic metric labels feature.
Default: false
allowedTopicPropertyKeysForMetrics=<key1>,<key2>,...
Description: A comma-separated list of Topic Property keys that are allowed to be exposed as metrics. Only keys explicitly listed here will be exposed.
Default: Empty string (if the feature is enabled but no keys are defined, no custom metric labels can be exposed).
Namespace-level configuration allows per-namespace control of which topic properties can be exposed as metrics. The namespace-level settings will override the broker-level settings for that specific namespace.
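To make the precedence concrete, the resolution of the effective allow-list might look like the sketch below. The class AllowedKeysResolver and its methods are hypothetical, as is the convention that a null namespace set means "no namespace-level policy". The sketch also assumes that an explicitly set but empty namespace-level list overrides the broker-level list; the proposal does not state how that case is handled.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical resolver for illustration; not part of the Pulsar code base. */
final class AllowedKeysResolver {

    /** Parses the comma-separated broker.conf value, trimming whitespace and dropping blanks. */
    static Set<String> parseBrokerConfig(String value) {
        Set<String> keys = new LinkedHashSet<>();
        if (value != null) {
            for (String part : value.split(",")) {
                String key = part.trim();
                if (!key.isEmpty()) {
                    keys.add(key);
                }
            }
        }
        return keys;
    }

    /**
     * Resolves the allow-list that applies to topics in one namespace.
     *
     * @param featureEnabled    value of exposeCustomTopicMetricLabelsEnabled
     * @param brokerConfigValue value of allowedTopicPropertyKeysForMetrics in broker.conf
     * @param namespaceKeys     namespace-level keys, or null if no namespace policy is set (assumption)
     */
    static Set<String> effectiveKeys(boolean featureEnabled, String brokerConfigValue,
                                     Set<String> namespaceKeys) {
        if (!featureEnabled) {
            return Collections.emptySet();   // feature off: no custom labels at all
        }
        if (namespaceKeys != null) {
            return namespaceKeys;            // namespace-level setting overrides broker-level
        }
        return parseBrokerConfig(brokerConfigValue);
    }
}
```

Removing the namespace-level setting corresponds to passing null here, at which point the broker-level list applies again.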
REST API:
POST /admin/v2/namespaces/{tenant}/{namespace}/allowedTopicPropertyKeysForMetrics - Set allowed property keys
GET /admin/v2/namespaces/{tenant}/{namespace}/allowedTopicPropertyKeysForMetrics - Get allowed property keys
DELETE /admin/v2/namespaces/{tenant}/{namespace}/allowedTopicPropertyKeysForMetrics - Remove allowed property keys
CLI Examples:
# Set the allowed property keys at the namespace level
pulsar-admin namespaces set-allowed-topic-property-keys-for-metrics \
  --keys sla_tier,owner,environment \
  my-tenant/my-namespace
# Get the allowed property keys at the namespace level
pulsar-admin namespaces get-allowed-topic-property-keys-for-metrics my-tenant/my-namespace
# Remove the allowed property keys at the namespace level; the namespace falls back to the broker-level configuration
pulsar-admin namespaces remove-allowed-topic-property-keys-for-metrics my-tenant/my-namespace
Java Admin API:
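public interface Namespaces {
    // Set the namespace-level allow-list of topic property keys exposed as metric labels.
    void setAllowedTopicPropertyKeysForMetrics(String namespace, Set<String> allowedKeys) throws PulsarAdminException;
    // Get the namespace-level allow-list.
    Set<String> getAllowedTopicPropertyKeysForMetrics(String namespace) throws PulsarAdminException;
    // Remove the namespace-level allow-list; the broker-level configuration applies again.
    void removeAllowedTopicPropertyKeysForMetrics(String namespace) throws PulsarAdminException;
}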
Disabled by Default: The feature will be disabled by default (exposeCustomTopicMetricLabelsEnabled=false).
Existing Pulsar deployments will see no change in behavior or metric format.
No Impact if Unused: If the feature is enabled but allowedTopicPropertyKeysForMetrics is not configured, or no topic has a property matching an allowed key, metrics will remain unchanged.
Existing APIs: Existing pulsar-admin commands and REST APIs are unaffected.
Prometheus Systems: If a Pulsar broker with this feature enabled sends metrics with custom metric labels to an older Prometheus server or a monitoring system not expecting these additional labels, those systems will typically ignore the extra labels without issue.
Future Enhancements: Future Pulsar versions could extend this feature, for example, by allowing more dynamic management of allowedTopicPropertyKeysForMetrics if deemed safe and necessary.
OpenTelemetry Alignment: The key-value structure of custom metric labels aligns well with OpenTelemetry attributes, ensuring that this feature remains relevant and compatible with Pulsar's evolving metrics infrastructure.
A comprehensive testing strategy will be required:
Test the filtering logic against allowedTopicPropertyKeysForMetrics.
Integration Tests:
Verify correct setting, getting, and removing of custom metric labels via admin tools and REST APIs.
Test validation logic for allowed keys.
End-to-End Flow: Set a property via pulsar-admin topics update-properties, scrape the metrics endpoint, and verify the label appears.
Dynamic Updates: Verify that updating a property value changes the metric label value in subsequent scrapes.
Removal: Verify that removing a property removes the label.
Documentation updates will include:
allowedTopicPropertyKeysForMetrics
Briefly, three other approaches were considered and rejected:
A. Single Composite Tag Label:
Description: Exposing a list of user-defined tags as a single, comma-separated string label (e.g., custom_tags="tagA,tagB,tagC").
Reason for Rejection: This approach can lead to extremely high cardinality of the label value itself if tag combinations are diverse. It also necessitates complex and less performant regex queries in Prometheus and loses the semantic key-value structure.
B. Prometheus Relabeling with External Metadata:
Description: Keeping Pulsar metrics unchanged and using Prometheus's relabel_configs to enrich metrics with labels from an external metadata source (e.g., a file or a separate API).
Reason for Rejection: This shifts the implementation complexity and maintenance burden to the Prometheus configuration and external systems. It introduces risks of stale metadata and potential performance overhead on Prometheus. Crucially, it is not a Pulsar-native solution, which is the aim of this proposal.
C. Storing Labels in Topic Policies:
Description: Storing the custom metric labels in topic policies instead of reading them from topic properties.
Reason for Rejection: Storing labels in topic policies adds complexity to topic management and requires additional code to handle policy updates, whereas topic properties already exist and need no new storage.