docs/proposals-done/202304-query-path-tenancy.md
Owners:
Related Tickets:
Other docs:
This design doc proposes to add tenancy awareness in the query path.
In a multi-tenant environment, it is important to be able to identify which tenants are experiencing issues and configure (e.g. with different limits) each one of them individually and according to their usage of the platform so that the quality of service can be guaranteed to all the tenants.
The current lack of tenancy awareness in Thanos' query path makes it impossible to investigate issues related to multi-tenancy without the use of external tools, proxies, or deploying a full dedicated query path for each tenant (including one Query Frontend if the setup needs it, one Querier, and one Storage Gateway, to be as complete as possible). For example, it's impossible to determine which tenants are experiencing high latency or high error rates.
The tenancy model for queries differs from the one used in Thanos Receive, where the tenant indicates who owns the data. From a query perspective, the tenant indicates who is initiating the query. Although this proposal uses this tenant to automatically enforce a tenant label matcher in the query, this behavior will likely change whenever cross-tenant querying is implemented (not part of this proposal).
Any team running Thanos in a multi-tenant environment. For example, a team running a monitoring-as-a-service platform.
tenant_id=~"A|B"), how does Thanos export a request duration metric about this query? This is out of scope and a specific proposal can be created for it.
--receive.tenant-header="THANOS-TENANT" flag to configure the tenant header, adapting its name to each component. So in the Querier, the flag name will be --querier.tenant-header="THANOS-TENANT" and in the Query Frontend it will be --query-frontend.tenant-header="THANOS-TENANT". This ensures consistency across components and makes adoption easier for those already using a custom header name in Receive or coming from alternative tools, like Cortex.
--querier.tenancy="false".
--query.tenant-label-name="tenant_id" flag to identify the tenancy label.
Identifying and transporting tenant information between requests:
THANOS-TENANT as default value. Consistent with the behavior of Thanos Receive.
Enforcing tenancy label in queries:
tenant_id as default label name. Consistent with the behavior of Thanos Receive.
The Query Frontend is an optional component in any Thanos deployment, while the Querier is always present. In addition, there might be deployments with multiple Querier layers where one or more might need to apply tenant verification and enforcement. On top of this, doing it in the Querier supports future work on using the new Thanos PromQL engine, which can potentially make the Query Frontend unnecessary.
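To make the verification and enforcement step mentioned above more concrete, here is a minimal sketch of the idea on an already-parsed selector. It is not the proposal's implementation: the proposal delegates this work to prom-label-proxy, whose parsing and exact behavior are not reproduced here. The `Matcher` type, the `enforceTenant` function, and the choice to reject a conflicting matcher rather than overwrite it are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Matcher is a hypothetical, simplified stand-in for a PromQL label matcher.
type Matcher struct {
	Name  string
	Op    string // "=", "!=", "=~" or "!~"
	Value string
}

// ErrTenantMismatch is returned when the query already carries a matcher on
// the tenant label that does not select exactly the requesting tenant.
var ErrTenantMismatch = errors.New("query tenant matcher conflicts with request tenant")

// enforceTenant verifies any existing matcher on the tenant label and makes
// sure the returned selector contains exactly one label="tenant" matcher.
// Assumption: conflicting matchers are rejected instead of being rewritten.
func enforceTenant(ms []Matcher, label, tenant string) ([]Matcher, error) {
	out := make([]Matcher, 0, len(ms)+1)
	found := false
	for _, m := range ms {
		if m.Name != label {
			out = append(out, m)
			continue
		}
		// Verification: only an equality matcher for the requesting tenant passes.
		if m.Op != "=" || m.Value != tenant {
			return nil, ErrTenantMismatch
		}
		if !found {
			out = append(out, m)
			found = true
		}
	}
	// Enforcement: add the tenant matcher when the query did not include one.
	if !found {
		out = append(out, Matcher{Name: label, Op: "=", Value: tenant})
	}
	return out, nil
}

func main() {
	sel := []Matcher{{Name: "__name__", Op: "=", Value: "up"}}
	enforced, err := enforceTenant(sel, "tenant_id", "team-a")
	fmt.Println(enforced, err)

	// A cross-tenant matcher is rejected in this sketch.
	_, err = enforceTenant([]Matcher{{Name: "tenant_id", Op: "=~", Value: "A|B"}}, "tenant_id", "A")
	fmt.Println(err)
}
```

Rejecting a mismatch keeps the sketch strict by default. Whether a real deployment should reject or silently overwrite the matcher is a policy decision this example does not settle.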
Pros:
Cons:
While this could work for some of the features, like exporting per-tenant metrics, it would have to be inserted in front of many different components. Meanwhile, it doesn't solve the requirement for per-tenant configuration.
This incurs a lot of wasted resources and demands manual work, unless a central tenant configuration is used and a controller is built around it to automatically manage the query paths.
--query-frontend.tenant-header="THANOS-TENANT" flag to forward the tenant ID from incoming requests to downstream query endpoints.
Thanos-Tenant (hardcoded) as the internal header name for these downstream requests.
--querier.tenant-header="THANOS-TENANT", --querier.default-tenant="default-tenant" to identify and forward tenant ID in internal communications.
--querier.tenant-label="tenant-id" and --querier.tenancy="false" to enable tenancy, identify the tenant, verify and enforce the tenant label in queries using prom-label-proxy's Enforce.EnforceMatchers.
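The identification and forwarding steps listed above could look roughly like the following HTTP middleware. This is a sketch under stated assumptions, not Thanos code: `resolveTenant`, `tenantMiddleware`, and `internalTenantHeader` are hypothetical names, the header name and default tenant are passed as plain arguments standing in for the flag values, and only the HTTP side is shown (internal communication between Thanos components is not modeled).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// internalTenantHeader mirrors the hardcoded internal header name from the
// proposal. The constant itself is a hypothetical name.
const internalTenantHeader = "Thanos-Tenant"

// resolveTenant returns the tenant sent by the client, or the configured
// default when the header is missing or empty.
func resolveTenant(headerValue, defaultTenant string) string {
	if headerValue == "" {
		return defaultTenant
	}
	return headerValue
}

// tenantMiddleware reads the tenant from the configurable external header and
// forwards it to the next handler under the internal header name.
func tenantMiddleware(tenantHeader, defaultTenant string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tenant := resolveTenant(r.Header.Get(tenantHeader), defaultTenant)
		// Clone so the caller's request headers are not mutated.
		r = r.Clone(r.Context())
		r.Header.Del(tenantHeader)
		r.Header.Set(internalTenantHeader, tenant)
		next.ServeHTTP(w, r)
	})
}

func main() {
	// The downstream handler only sees the internal header.
	next := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "tenant=%s", r.Header.Get(internalTenantHeader))
	})
	h := tenantMiddleware("THANOS-TENANT", "default-tenant", next)

	req := httptest.NewRequest(http.MethodGet, "/api/v1/query?query=up", nil)
	req.Header.Set("THANOS-TENANT", "team-a")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	fmt.Println(rec.Body.String())
}
```

Normalizing to a single internal header means downstream handlers do not need to know which external header name an operator configured.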