docs/usage_limits.md
Server-side guardrails that bound the resources a single Weaviate instance can consume. Designed for the upcoming Weaviate Cloud Free Tier; usable in any deployment that fits the supported deployment shape (see Scope below).
All limits are opt-in: an unset env var means no enforcement for that limit.
Source of truth for the design: the RFC. This file is the codebase-internal pointer that explains what is implemented and where the hooks live; the RFC has the full rationale and out-of-scope discussion.
| Variable | Type | Default | Effect |
|---|---|---|---|
| USAGE_LIMITS_ERROR_MESSAGE | string | "{limit} count limit of {value} reached for this instance." | Operator-overridable template for the user-facing error message. Placeholders: {limit} (resource type) and {value} (configured threshold). |
| MAXIMUM_ALLOWED_OBJECTS_COUNT | int | -1 (unlimited) | Cap on live object count, summed across all loaded local shards (node-wide). Checked on every single + batch insert at the storage chokepoint. |
| MAXIMUM_ALLOWED_COLLECTIONS_COUNT | int | -1 (unlimited) | Cap on number of collections. Pre-existing env var; behavior preserved. |
| MAXIMUM_ALLOWED_TENANTS_PER_COLLECTION | int | -1 (unlimited) | Cap on tenants per multi-tenant collection. Checked at tenant-create time only. |
| MAXIMUM_ALLOWED_SHARDS_PER_COLLECTION | int | -1 (unlimited) | Cap on desiredCount of a class create request's shardingConfig. Config-time check. |
All values are runtime-overrideable via the existing runtime overrides YAML file (see RUNTIME_OVERRIDES_*). Field names in the YAML are the lowercase-snake-case forms of the env-var names.
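For orientation, overriding two of these limits in the runtime-overrides file could look like the sketch below. The flat top-level key layout is an assumption derived from the naming rule above, not a verified schema of the overrides file.

```yaml
# Sketch only: keys are the snake_case forms of the env-var names; check the
# exact file structure against the RUNTIME_OVERRIDES_* documentation.
maximum_allowed_objects_count: 10000
maximum_allowed_collections_count: 5
```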
Linkage to REPLICATION_MAXIMUM_FACTOR=1: the object/tenant/shard caps only work in the RF=1 deployment shape (see Scope below). When any of MAXIMUM_ALLOWED_OBJECTS_COUNT, MAXIMUM_ALLOWED_TENANTS_PER_COLLECTION, or MAXIMUM_ALLOWED_SHARDS_PER_COLLECTION is set, you must also set REPLICATION_MAXIMUM_FACTOR=1; startup fails otherwise. REPLICATION_MAXIMUM_FACTOR also caps the per-class replicationConfig.factor for new classes, so the invariant holds at runtime too.
MAXIMUM_ALLOWED_COLLECTIONS_COUNT is not part of the linkage — it predates this RFC and tying it would break existing operators.
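A minimal sketch of the startup linkage rule, using a hypothetical validation helper (the package, function, and parameter names are illustrative, not the actual config-parsing code):

```go
package usagelimits // hypothetical package, for illustration only

import "fmt"

// validateUsageLimitLinkage sketches the rule described above: if any of the
// RF=1-scoped caps is set (>= 0, since -1 means unlimited), then
// REPLICATION_MAXIMUM_FACTOR must be exactly 1, otherwise startup fails.
func validateUsageLimitLinkage(objectsCap, tenantsCap, shardsCap, replicationMaxFactor int64) error {
	anyCapSet := objectsCap >= 0 || tenantsCap >= 0 || shardsCap >= 0
	if anyCapSet && replicationMaxFactor != 1 {
		return fmt.Errorf("usage limits require REPLICATION_MAXIMUM_FACTOR=1, got %d",
			replicationMaxFactor)
	}
	return nil
}
```

Enforcement hooks per limit: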
| Limit | Hook | File |
|---|---|---|
| Objects (single) | Shard.PutObject (top of function, before LSM write) | adapters/repos/db/shard_write_put.go |
| Objects (batch) | Shard.PutObjectBatch (top of function) | adapters/repos/db/shard_write_batch_objects.go |
| Collections | AddClass() | usecases/schema/class.go |
| Tenants | AddTenants() | usecases/schema/tenant.go |
| Shards | AddClass() (sharding-config validation) | usecases/schema/class.go |
The object check sits at the storage chokepoint rather than at the use-case layer. That covers both writes that arrive locally and writes that were forwarded from another node — both converge at Shard.PutObject{,Batch} on the home node for RF=1. The use-case layer (usecases/objects/) does not enforce the object cap; that hook was deliberately removed when we moved the chokepoint deeper.
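A hedged sketch of that placement, under assumed names (checkObjectCap and the surrounding function shape are illustrative, not the real code in shard_write_put.go):

```go
package sketch // illustrative only, not the real adapters/repos/db code

import (
	"context"
	"errors"
)

// errUsageLimitExceeded stands in for the structured USAGE_LIMIT_EXCEEDED error.
var errUsageLimitExceeded = errors.New("objects count limit reached for this instance")

// checkObjectCap is an assumed helper: liveCount is the node-wide live object
// count, limit is MAXIMUM_ALLOWED_OBJECTS_COUNT (-1 means unlimited).
func checkObjectCap(liveCount, incoming, limit int64) error {
	if limit >= 0 && liveCount+incoming > limit {
		return errUsageLimitExceeded
	}
	return nil
}

// putObject shows the hook position: the check runs before any LSM write, so
// local writes and writes forwarded from another node are both covered (they
// converge on the home node's shard for RF=1).
func putObject(ctx context.Context, liveCount, limit int64, write func(context.Context) error) error {
	if err := checkObjectCap(liveCount, 1, limit); err != nil {
		return err
	}
	return write(ctx)
}
```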
The schema-side limits (collections, tenants, shards) stay at the use-case layer because that's the single coordinator path — AddClass/AddTenants go through RAFT, no forwarded-write concern.
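A similarly hedged sketch of a schema-side check, here for tenants (the names and shape are illustrative; the real check lives in AddTenants at the use-case layer):

```go
package sketch // illustrative only, not the real usecases/schema code

import "fmt"

// addTenants evaluates the cap once, at tenant-create time, before the
// RAFT-coordinated schema change is applied. existing is the collection's
// current tenant count; limit is MAXIMUM_ALLOWED_TENANTS_PER_COLLECTION
// (-1 means unlimited).
func addTenants(existing, requested, limit int, apply func() error) error {
	if limit >= 0 && existing+requested > limit {
		return fmt.Errorf("tenants count limit of %d reached for this instance", limit)
	}
	return apply()
}
```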
The object count is node-wide across local shards: the manager sums each loaded shard's bucket.CountAsync() (adapters/repos/db/lsmkv/bucket.go) on every enforced write. Each CountAsync() is O(segments-per-shard) — it walks the live segment list and sums each segment's already-loaded net-additions counter, no I/O. For the Free-Tier shape (few shards, few segments) that's a handful of atomic reads on the hot path.
We deliberately don't route through UsageForIndex — that path triggers other usage-module computations beyond a count.
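A sketch of that summation; the shardCounter interface below is an assumed stand-in for a loaded shard's object bucket, whose real method is bucket.CountAsync():

```go
package sketch // illustrative only, not the real index/manager code

// shardCounter is an assumed view of a loaded shard's object bucket. The real
// CountAsync walks the live segment list and sums each segment's already-loaded
// net-additions counter, without touching disk.
type shardCounter interface {
	CountAsync() int
}

// liveObjectCount is the node-wide figure the cap is compared against on every
// enforced write: the sum over all loaded local shards.
func liveObjectCount(loadedShards []shardCounter) int {
	total := 0
	for _, s := range loadedShards {
		total += s.CountAsync()
	}
	return total
}
```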
When any limit is hit, Weaviate returns:
Over REST: 429 Too Many Requests with this body:

```json
{
  "errorCode": "USAGE_LIMIT_EXCEEDED",
  "limit": "objects",
  "value": 10000,
  "message": "Object count limit of 10000 reached for this instance."
}
```
Over gRPC: codes.ResourceExhausted with an errdetails.ErrorInfo carrying the same limit/value/message fields under Reason="USAGE_LIMIT_EXCEEDED", Domain="weaviate.usagelimits".

The structured fields (errorCode, limit, value) are a stable contract regardless of the USAGE_LIMITS_ERROR_MESSAGE template; only the human-facing message text changes.
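As a sketch, such a gRPC error can be built with the standard status/errdetails packages; the helper name and metadata keys below follow the JSON fields above, and the exact construction is illustrative, not the verified server code:

```go
package sketch // illustrative construction, not the verified server code

import (
	"fmt"

	"google.golang.org/genproto/googleapis/rpc/errdetails"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// usageLimitStatus builds a ResourceExhausted status carrying the same
// structured fields as the REST body.
func usageLimitStatus(limit string, value int64, message string) error {
	st := status.New(codes.ResourceExhausted, message)
	st, err := st.WithDetails(&errdetails.ErrorInfo{
		Reason: "USAGE_LIMIT_EXCEEDED",
		Domain: "weaviate.usagelimits",
		Metadata: map[string]string{
			"limit": limit,
			"value": fmt.Sprintf("%d", value),
		},
	})
	if err != nil {
		// WithDetails only fails for an OK status or an unmarshalable detail;
		// fall back to the bare status in that case.
		return status.Error(codes.ResourceExhausted, message)
	}
	return st.Err()
}
```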
When a batch insert would exceed MAXIMUM_ALLOWED_OBJECTS_COUNT, the shard-slice is rejected as a unit. Index.putObjectBatch partitions a client batch by shard before forwarding, so the chokepoint sees one slice per shard. A single client batch can therefore produce per-shard partial success on multi-shard collections; this is accepted under our current scope (see Scope below).
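A sketch of the whole-slice rejection, with assumed names (the real check sits at the top of Shard.PutObjectBatch):

```go
package sketch // illustrative only, not the real shard_write_batch_objects.go

import "fmt"

// putObjectBatch admits or rejects the per-shard slice as a unit: if admitting
// every object in the slice would exceed the cap, nothing from the slice is
// written. liveCount is the node-wide live object count; limit is
// MAXIMUM_ALLOWED_OBJECTS_COUNT (-1 means unlimited).
func putObjectBatch(liveCount, limit int64, batch []any, write func([]any) error) error {
	if limit >= 0 && liveCount+int64(len(batch)) > limit {
		return fmt.Errorf("objects count limit of %d reached for this instance", limit)
	}
	return write(batch)
}
```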
Scope. Supported deployment shapes (where the cap is meaningful and exact):

- A single node running with REPLICATION_MAXIMUM_FACTOR=1, i.e. RF=1 for every class.

Out of scope:

- RF>1: a replicated write does not reach Shard.PutObject{,Batch} (it goes through shard_replication.go's preparePutObject{,s} → s.putOne / s.putBatch directly). Supporting RF>1 would require either dropping the check one level deeper or a smarter scheme like a lease-based quota.
- Multi-node clusters: the object count is summed over local shards only, so the cluster-wide total can exceed the configured cap (up to cap × min(N_collections, N_nodes_with_shards)). Not a deployment shape we ship the cap in.

The pre-existing MAXIMUM_ALLOWED_COLLECTIONS_COUNT enforcement previously returned HTTP 422 Unprocessable Entity with a free-text "maximum number of collections" message. As of this release it returns HTTP 429 with the structured USAGE_LIMIT_EXCEEDED body described above. Clients matching on the prior 422 status or message text must adapt.
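For clients that matched on the old 422, a hedged sketch of matching on the structured fields instead (the struct and helper are illustrative; the JSON field names are the ones documented above):

```go
package sketch // illustrative client-side handling, not an official client change

import (
	"encoding/json"
	"net/http"
)

type usageLimitError struct {
	ErrorCode string `json:"errorCode"`
	Limit     string `json:"limit"`
	Value     int64  `json:"value"`
	Message   string `json:"message"`
}

// isUsageLimitExceeded matches on the stable structured fields (status 429 and
// errorCode) rather than on the old 422 status or free-text message.
func isUsageLimitExceeded(resp *http.Response) (usageLimitError, bool) {
	var e usageLimitError
	if resp.StatusCode != http.StatusTooManyRequests {
		return e, false
	}
	if err := json.NewDecoder(resp.Body).Decode(&e); err != nil {
		return e, false
	}
	return e, e.ErrorCode == "USAGE_LIMIT_EXCEEDED"
}
```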
Known trade-offs:

- Object counts come from CountAsync and exclude the in-memory memtable, so during fast bulk imports the count lags slightly behind on-disk state. The lag is bounded by the in-flight write volume between count refreshes and self-corrects on the next flush. Sync counting on every write would scan the entire memtable: wasteful at the 10K free-tier scale, fatal at 10M+ scale.
- The collection and tenant checks read the current count before the schema change is applied in the AddClass/AddTenants call, so two concurrent creates can both pass the check. Bounded overshoot; the next request is correctly rejected.