docs/usage_limits.md
Server-side guardrails that bound the resources a single Weaviate instance can consume. Designed for the upcoming Weaviate Cloud Free Tier; usable in any deployment that fits the supported deployment shape (see Scope below).
All limits are opt-in: env vars unset means no enforcement.
Source of truth for the design: the RFC. This file is the codebase-internal pointer that explains what is implemented and where the hooks live; the RFC has the full rationale and out-of-scope discussion.
| Variable | Type | Default | Effect |
|---|---|---|---|
| USAGE_LIMITS_ERROR_MESSAGE | string | "{limit} count limit of {value} reached for this instance." | Operator-overridable template for the user-facing error message. Placeholders: {limit} (resource type) and {value} (configured threshold). |
| MAXIMUM_ALLOWED_OBJECTS_COUNT | int | -1 (unlimited) | Cap on live object count, summed across all loaded local shards (node-wide). Checked on every single + batch insert at the storage chokepoint. |
| MAXIMUM_ALLOWED_COLLECTIONS_COUNT | int | -1 (unlimited) | Cap on number of collections. Pre-existing env var; behavior preserved. |
| MAXIMUM_ALLOWED_TENANTS_PER_COLLECTION | int | -1 (unlimited) | Cap on tenants per multi-tenant collection. Checked at tenant-create time only. |
| MAXIMUM_ALLOWED_SHARDS_PER_COLLECTION | int | -1 (unlimited) | Cap on desiredCount of a class create request's shardingConfig. Config-time check. |
All values are runtime-overridable via the existing runtime overrides YAML file (see RUNTIME_OVERRIDES_*). Field names in the YAML are the lowercase snake_case forms of the env-var names.
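As an illustration of the USAGE_LIMITS_ERROR_MESSAGE placeholders, here is a minimal sketch of how such a template could be rendered. The helper name `renderLimitMessage` and the display form passed for `{limit}` are assumptions made for this example, not the actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultTemplate mirrors the documented default of USAGE_LIMITS_ERROR_MESSAGE.
const defaultTemplate = "{limit} count limit of {value} reached for this instance."

// renderLimitMessage is a hypothetical helper: it substitutes the two
// documented placeholders and leaves all other text untouched.
func renderLimitMessage(tmpl, limit string, value int64) string {
	r := strings.NewReplacer(
		"{limit}", limit,
		"{value}", strconv.FormatInt(value, 10),
	)
	return r.Replace(tmpl)
}

func main() {
	// Default template.
	fmt.Println(renderLimitMessage(defaultTemplate, "Object", 10000))
	// Operator override using the same placeholders in a different order.
	fmt.Println(renderLimitMessage("Free tier allows {value} {limit}.", "objects", 10000))
}
```

Because the structured fields travel separately from the message, an operator can reword the template freely without breaking clients.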
REPLICATION_MAXIMUM_FACTOR=1

The object/tenant/shard caps only work in the RF=1 deployment shape (see Scope below). When any of MAXIMUM_ALLOWED_OBJECTS_COUNT, MAXIMUM_ALLOWED_TENANTS_PER_COLLECTION, or MAXIMUM_ALLOWED_SHARDS_PER_COLLECTION is set, you must also set REPLICATION_MAXIMUM_FACTOR=1. Startup fails otherwise. REPLICATION_MAXIMUM_FACTOR also caps the per-class replicationConfig.factor for new classes, so the invariant holds at runtime too.
MAXIMUM_ALLOWED_COLLECTIONS_COUNT is not part of the linkage — it predates this RFC and tying it would break existing operators.
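The linkage rule can be sketched as a startup check. The struct and function names below are hypothetical. Treating "set" as "different from -1" and "unset replication cap" as 0 are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// unlimited is the documented default for every cap.
const unlimited = -1

// limitsConfig is a hypothetical stand-in for the relevant config fields.
type limitsConfig struct {
	MaxObjects           int
	MaxCollections       int
	MaxTenantsPerColl    int
	MaxShardsPerColl     int
	ReplicationMaxFactor int // assumption: 0 means "not configured"
}

// validateLinkage fails when an object/tenant/shard cap is set without
// REPLICATION_MAXIMUM_FACTOR=1. The collections cap is deliberately ignored.
func validateLinkage(c limitsConfig) error {
	linked := c.MaxObjects != unlimited ||
		c.MaxTenantsPerColl != unlimited ||
		c.MaxShardsPerColl != unlimited
	if linked && c.ReplicationMaxFactor != 1 {
		return errors.New("object/tenant/shard caps require REPLICATION_MAXIMUM_FACTOR=1")
	}
	return nil
}

func main() {
	base := limitsConfig{
		MaxObjects:        unlimited,
		MaxCollections:    unlimited,
		MaxTenantsPerColl: unlimited,
		MaxShardsPerColl:  unlimited,
	}

	onlyCollections := base
	onlyCollections.MaxCollections = 5
	fmt.Println("collections cap only:", validateLinkage(onlyCollections))

	objectsNoRF := base
	objectsNoRF.MaxObjects = 10000
	fmt.Println("objects cap, no RF cap:", validateLinkage(objectsNoRF))

	objectsRF1 := objectsNoRF
	objectsRF1.ReplicationMaxFactor = 1
	fmt.Println("objects cap, RF cap 1:", validateLinkage(objectsRF1))
}
```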
| Limit | Hook | File |
|---|---|---|
| Objects (single) | Shard.PutObject (top of function, before LSM write) | adapters/repos/db/shard_write_put.go |
| Objects (batch) | Shard.PutObjectBatch (top of function) | adapters/repos/db/shard_write_batch_objects.go |
| Collections | usecases/schema/class.go AddClass() | usecases/schema/class.go |
| Tenants | usecases/schema/tenant.go AddTenants() | usecases/schema/tenant.go |
| Shards | usecases/schema/class.go AddClass() (sharding-config validation) | usecases/schema/class.go |
The object check sits at the storage chokepoint rather than at the use-case layer. That covers both writes that arrive locally and writes that were forwarded from another node — both converge at Shard.PutObject{,Batch} on the home node for RF=1. The use-case layer (usecases/objects/) does not enforce the object cap; that hook was deliberately removed when we moved the chokepoint deeper.
The schema-side limits (collections, tenants, shards) stay at the use-case layer because that's the single coordinator path — AddClass/AddTenants go through RAFT, no forwarded-write concern.
The object count is node-wide across local shards: the manager sums each loaded shard's bucket.CountAsync() (adapters/repos/db/lsmkv/bucket.go) on every enforced write. Each CountAsync() is O(segments-per-shard) — it walks the live segment list and sums each segment's already-loaded net-additions counter, no I/O. For the Free-Tier shape (few shards, few segments) that's a handful of atomic reads on the hot path.
We deliberately don't route through UsageForIndex — that path triggers other usage-module computations beyond a count.
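The node-wide count described above can be sketched as follows. The `shardCounter` interface, the `fakeShard` type, and the "current count plus incoming must not exceed the cap" comparison are assumptions for illustration. The real `CountAsync()` signature and the exact comparison may differ.

```go
package main

import "fmt"

// shardCounter is a hypothetical abstraction over a loaded shard's
// object bucket; only the async count is needed for the cap check.
type shardCounter interface {
	CountAsync() int
}

// fakeShard models a shard as a list of per-segment net-addition counters.
type fakeShard struct {
	segmentNetAdditions []int
}

// CountAsync sums the already-loaded per-segment counters: O(segments), no I/O.
func (s fakeShard) CountAsync() int {
	total := 0
	for _, n := range s.segmentNetAdditions {
		total += n
	}
	return total
}

// nodeWideCount sums the async count of every loaded local shard.
func nodeWideCount(shards []shardCounter) int {
	total := 0
	for _, s := range shards {
		total += s.CountAsync()
	}
	return total
}

// checkObjectCap rejects a write of `incoming` objects if it would push the
// node-wide count past `limit`. A negative limit means unlimited.
func checkObjectCap(shards []shardCounter, limit, incoming int) error {
	if limit < 0 {
		return nil
	}
	if current := nodeWideCount(shards); current+incoming > limit {
		return fmt.Errorf("object cap %d would be exceeded (current %d, incoming %d)", limit, current, incoming)
	}
	return nil
}

func main() {
	shards := []shardCounter{
		fakeShard{segmentNetAdditions: []int{4, 3}},
		fakeShard{segmentNetAdditions: []int{2}},
	}
	fmt.Println("count:", nodeWideCount(shards))
	fmt.Println("insert 1:", checkObjectCap(shards, 10, 1))
	fmt.Println("insert 2:", checkObjectCap(shards, 10, 2))
}
```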
When any limit is hit, Weaviate returns:
HTTP: 429 Too Many Requests with body
{
"errorCode": "USAGE_LIMIT_EXCEEDED",
"limit": "objects",
"value": 10000,
"message": "Object count limit of 10000 reached for this instance."
}
gRPC: codes.ResourceExhausted with errdetails.ErrorInfo carrying the same limit/value/message fields under Reason="USAGE_LIMIT_EXCEEDED", Domain="weaviate.usagelimits".

The structured fields (errorCode, limit, value) are a stable contract regardless of the USAGE_LIMITS_ERROR_MESSAGE template; only the human-facing message text changes.
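On the client side, the stable fields make it possible to branch on the error without parsing the message text. The following is a minimal sketch using only the Go standard library. The type and function names are hypothetical and not part of any Weaviate client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// usageLimitError mirrors the documented HTTP error body.
type usageLimitError struct {
	ErrorCode string `json:"errorCode"`
	Limit     string `json:"limit"`
	Value     int64  `json:"value"`
	Message   string `json:"message"`
}

// parseUsageLimitError reports whether a response is a usage-limit rejection.
// It matches on the status code and errorCode only, never on the message.
func parseUsageLimitError(status int, body []byte) (*usageLimitError, bool) {
	if status != http.StatusTooManyRequests {
		return nil, false
	}
	var e usageLimitError
	if err := json.Unmarshal(body, &e); err != nil {
		return nil, false
	}
	if e.ErrorCode != "USAGE_LIMIT_EXCEEDED" {
		return nil, false
	}
	return &e, true
}

func main() {
	body := []byte(`{
		"errorCode": "USAGE_LIMIT_EXCEEDED",
		"limit": "objects",
		"value": 10000,
		"message": "Object count limit of 10000 reached for this instance."
	}`)
	if e, ok := parseUsageLimitError(http.StatusTooManyRequests, body); ok {
		fmt.Printf("limit=%s value=%d\n", e.Limit, e.Value)
	}
}
```

Checking errorCode as well as the status keeps ordinary rate-limit 429s from being mistaken for a usage-limit rejection.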
When a batch insert would exceed MAXIMUM_ALLOWED_OBJECTS_COUNT, the shard-slice is rejected as a unit:
Index.putObjectBatch partitions a client batch by shard before forwarding, so the chokepoint sees one slice per shard. A single client batch can therefore produce per-shard partial success on multi-shard collections, which is accepted under our current scope (see Scope below).

Supported deployment shapes (where the cap is meaningful and exact):
Out of scope:
Shard.PutObject{,Batch} (it goes through shard_replication.go's preparePutObject{,s} → s.putOne / s.putBatch directly). Supporting RF>1 would require either dropping the check one level deeper or a smarter scheme like a lease-based quota.

cap × min(N_collections, N_nodes_with_shards)). Not a deployment shape we ship the cap in.

The pre-existing MAXIMUM_ALLOWED_COLLECTIONS_COUNT enforcement previously returned HTTP 422 Unprocessable Entity with a free-text "maximum number of collections" message. As of this release it returns HTTP 429 with the structured USAGE_LIMIT_EXCEEDED body described above. Clients matching on the prior 422 status or message text must adapt.
CountAsync and exclude the in-memory memtable, so during fast bulk imports the count lags slightly behind on-disk state. Bounded by in-flight write volume between count refreshes; self-corrects on the next flush. Sync counting on every write would scan the entire memtable: wasteful at the 10K free-tier scale, fatal at 10M+ scale.

AddClass/AddTenants call, so two concurrent creates can both pass the check. Bounded overshoot; next request is correctly rejected.

A second class of opt-in guardrails that constrain what kind of class an operator's tenants may create. These are distinct from the usage limits above, which cap how much state they can produce. Like usage limits, these are unset by default; existing deployments are unaffected.
| Variable | Type | Default | Effect |
|---|---|---|---|
| ALLOWED_VECTOR_INDEX_TYPES | comma-separated list | unset (no restriction) | Allow-list for class vectorIndexType and named-vector vectorConfig[*].vectorIndexType. Valid entries: hnsw, flat, dynamic, hfresh. |
| ALLOWED_COMPRESSION_TYPES | comma-separated list | unset (no restriction) | Allow-list for the compression configured on a class's vector index. Valid entries: none, pq, sq, rq-1, rq-8, bq (same names accepted by DEFAULT_QUANTIZATION). Hfresh classes are exempt because hfresh has no compression knobs. |
| RESTRICTIONS_ERROR_MESSAGE | string | "{value} is not allowed for {restriction}. Allowed values: {allowed}." | Operator-overridable template for the user-facing message. Placeholders: {restriction}, {value}, {allowed}. |
All three are runtime-overridable via the runtime overrides YAML (allowed_vector_index_types, allowed_compression_types, restrictions_error_message).
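A comma-separated allow-list like the ones above could be parsed along these lines. This is a sketch: `parseAllowList` is a hypothetical helper, and trimming whitespace around entries and matching case-sensitively are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// validIndexTypes lists the documented entries for ALLOWED_VECTOR_INDEX_TYPES.
var validIndexTypes = map[string]bool{
	"hnsw": true, "flat": true, "dynamic": true, "hfresh": true,
}

// parseAllowList splits a comma-separated value and rejects unknown entries.
// An empty input means "unset", which the docs define as no restriction.
func parseAllowList(raw string, valid map[string]bool) ([]string, error) {
	if strings.TrimSpace(raw) == "" {
		return nil, nil
	}
	var out []string
	for _, part := range strings.Split(raw, ",") {
		entry := strings.TrimSpace(part)
		if !valid[entry] {
			return nil, fmt.Errorf("invalid allow-list entry %q", entry)
		}
		out = append(out, entry)
	}
	return out, nil
}

func main() {
	fmt.Println(parseAllowList("hfresh,hnsw", validIndexTypes))
	fmt.Println(parseAllowList("hnsw,ivf", validIndexTypes))
	fmt.Println(parseAllowList("", validIndexTypes))
}
```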
Validated at startup in Config.Validate() (usecases/config/config_handler.go):
DEFAULT_VECTOR_INDEX / DEFAULT_QUANTIZATION) must either be unset (in which case it is seeded to the single value) or match it.

ALLOWED_VECTOR_INDEX_TYPES=hfresh (only) paired with a non-empty ALLOWED_COMPRESSION_TYPES is rejected at startup because hfresh has no compression. Compression alongside hfresh in a mixed allow-list (e.g. hfresh,hnsw) is allowed because the non-hfresh members still need a compression policy.

# Force everyone to a single vector index type.
ALLOWED_VECTOR_INDEX_TYPES=hfresh
# DEFAULT_VECTOR_INDEX is seeded to "hfresh"; DEFAULT_QUANTIZATION and
# ALLOWED_COMPRESSION_TYPES must remain unset.
# Allow hfresh + hnsw with a forced compression on the hnsw side.
ALLOWED_VECTOR_INDEX_TYPES=hfresh,hnsw
DEFAULT_VECTOR_INDEX=hfresh # must be set: multi-entry list
ALLOWED_COMPRESSION_TYPES=rq-8
DEFAULT_QUANTIZATION=rq-8 # seeded if unset
# Maximum performance, cost no object: hnsw only, no compression.
ALLOWED_VECTOR_INDEX_TYPES=hnsw
ALLOWED_COMPRESSION_TYPES=none
# Defaults seeded to "hnsw" and "none" respectively.
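The seeding behaviour shown in the examples above can be sketched as a small resolver. The function names are hypothetical. The sketch also assumes that, for a multi-entry list, the default must be one of the listed values.

```go
package main

import (
	"errors"
	"fmt"
)

// contains reports whether v is present in list.
func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// resolveDefault returns the effective default for one allow-list.
//   - no allow-list: the default is left alone
//   - single entry:  an unset default is seeded, a set one must match
//   - many entries:  the default must be set (and, by assumption, listed)
func resolveDefault(name string, allowed []string, def string) (string, error) {
	switch {
	case len(allowed) == 0:
		return def, nil
	case len(allowed) == 1:
		if def == "" {
			return allowed[0], nil
		}
		if def != allowed[0] {
			return "", fmt.Errorf("%s=%q does not match the single allowed value %q", name, def, allowed[0])
		}
		return def, nil
	case def == "":
		return "", fmt.Errorf("%s must be set when the allow-list has several entries", name)
	case !contains(allowed, def):
		return "", fmt.Errorf("%s=%q is not in the allow-list", name, def)
	}
	return def, nil
}

// validateHfreshOnly rejects an hfresh-only index list combined with a
// non-empty compression allow-list, since hfresh has no compression.
func validateHfreshOnly(indexTypes, compression []string) error {
	if len(indexTypes) == 1 && indexTypes[0] == "hfresh" && len(compression) > 0 {
		return errors.New("ALLOWED_COMPRESSION_TYPES must be unset when only hfresh is allowed")
	}
	return nil
}

func main() {
	fmt.Println(resolveDefault("DEFAULT_VECTOR_INDEX", []string{"hfresh"}, ""))
	fmt.Println(resolveDefault("DEFAULT_VECTOR_INDEX", []string{"hfresh", "hnsw"}, ""))
	fmt.Println(validateHfreshOnly([]string{"hfresh"}, []string{"rq-8"}))
	fmt.Println(validateHfreshOnly([]string{"hfresh", "hnsw"}, []string{"rq-8"}))
}
```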
| Restriction | Hook | File |
|---|---|---|
| Vector index type (legacy + named) | Handler.validateVectorIndexType | usecases/schema/class.go |
| Compression (legacy + named) | Handler.validateAllowedCompression (invoked from validateVectorSettings) | usecases/schema/class.go |
The compression check inspects user-supplied config only; the default compression applied later (in enableQuantization) is guaranteed by startup validation to be in the allow-list, so a request that arrives with no compression block still produces a compatible class.
When a class create/update violates a restriction:
HTTP: 422 Unprocessable Entity with body
{
"errorCode": "CONFIG_NOT_ALLOWED",
"restriction": "compression",
"value": "pq",
"allowed": ["rq-8"],
"message": "pq is not allowed for compression. Allowed values: rq-8."
}
gRPC: codes.FailedPrecondition with errdetails.ErrorInfo carrying the same fields under Reason="CONFIG_NOT_ALLOWED", Domain="weaviate.restrictions".

The errorCode, restriction, value, and allowed fields are a stable wire contract; the message is rendered from RESTRICTIONS_ERROR_MESSAGE and varies across deployments. Example operator override:
RESTRICTIONS_ERROR_MESSAGE=Invalid config: {value} for {restriction} is not allowed on this tier — please upgrade.
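To show how the stable fields and the templated message fit together, here is a sketch that builds the documented body from a template. The struct and constructor names are hypothetical, and joining `{allowed}` with ", " is an assumption, since the documented example has a single entry.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// configNotAllowed mirrors the documented 422 body.
type configNotAllowed struct {
	ErrorCode   string   `json:"errorCode"`
	Restriction string   `json:"restriction"`
	Value       string   `json:"value"`
	Allowed     []string `json:"allowed"`
	Message     string   `json:"message"`
}

// defaultRestrictionTemplate mirrors the documented default of
// RESTRICTIONS_ERROR_MESSAGE.
const defaultRestrictionTemplate = "{value} is not allowed for {restriction}. Allowed values: {allowed}."

// newConfigNotAllowed fills the stable fields and renders the message.
// Only Message depends on the template.
func newConfigNotAllowed(tmpl, restriction, value string, allowed []string) configNotAllowed {
	msg := strings.NewReplacer(
		"{restriction}", restriction,
		"{value}", value,
		"{allowed}", strings.Join(allowed, ", "),
	).Replace(tmpl)
	return configNotAllowed{
		ErrorCode:   "CONFIG_NOT_ALLOWED",
		Restriction: restriction,
		Value:       value,
		Allowed:     allowed,
		Message:     msg,
	}
}

func main() {
	e := newConfigNotAllowed(defaultRestrictionTemplate, "compression", "pq", []string{"rq-8"})
	out, err := json.MarshalIndent(e, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```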
{"pq": {"enabled": false}} is treated identically to a class with no compression block at all. Both fall through to the default, which startup validation already vetted against the allow-list. The only way to opt out of all compression is skipDefaultQuantization: true, which the validator surfaces as the value none.

vectorIndexType is hfresh.