
# Usage Limits / Free-Tier Guardrails

Server-side guardrails that bound the resources a single Weaviate instance can consume. Designed for the upcoming Weaviate Cloud Free Tier, but usable in any deployment that fits one of the supported deployment shapes (see Scope below).

All limits are opt-in: leaving an env var unset means no enforcement for that limit.

Source of truth for the design: the RFC. This file is the codebase-internal pointer that explains what is implemented and where the hooks live; the RFC has the full rationale and out-of-scope discussion.

## Environment variables

| Variable | Type | Default | Effect |
|---|---|---|---|
| `USAGE_LIMITS_ERROR_MESSAGE` | string | `"{limit} count limit of {value} reached for this instance."` | Operator-overridable template for the user-facing error message. Placeholders: `{limit}` (resource type) and `{value}` (configured threshold). |
| `MAXIMUM_ALLOWED_OBJECTS_COUNT` | int | `-1` (unlimited) | Cap on live object count, summed across all loaded local shards (node-wide). Checked on every single and batch insert at the storage chokepoint. |
| `MAXIMUM_ALLOWED_COLLECTIONS_COUNT` | int | `-1` (unlimited) | Cap on number of collections. Pre-existing env var; behavior preserved. |
| `MAXIMUM_ALLOWED_TENANTS_PER_COLLECTION` | int | `-1` (unlimited) | Cap on tenants per multi-tenant collection. Checked at tenant-create time only. |
| `MAXIMUM_ALLOWED_SHARDS_PER_COLLECTION` | int | `-1` (unlimited) | Cap on `desiredCount` of a class create request's `shardingConfig`. Config-time check. |

All values are runtime-overridable via the existing runtime overrides YAML file (see `RUNTIME_OVERRIDES_*`). Field names in the YAML are the lowercase-snake-case forms of the env-var names.
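
For illustration, a runtime overrides file enabling these caps might look like the following; the field names simply apply the stated lowercase-snake-case rule, so verify them against your `RUNTIME_OVERRIDES_*` setup:

```yaml
# Sketch of a runtime overrides file. Field names assume the
# lowercase-snake-case mapping of the env-var names described above.
maximum_allowed_objects_count: 10000
maximum_allowed_tenants_per_collection: 5
maximum_allowed_shards_per_collection: 1
usage_limits_error_message: "{limit} count limit of {value} reached for this instance."
```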

## Required pairing: `REPLICATION_MAXIMUM_FACTOR=1`

The object/tenant/shard caps only work in the RF=1 deployment shape (see Scope below). When any of MAXIMUM_ALLOWED_OBJECTS_COUNT, MAXIMUM_ALLOWED_TENANTS_PER_COLLECTION, or MAXIMUM_ALLOWED_SHARDS_PER_COLLECTION is set, you must also set REPLICATION_MAXIMUM_FACTOR=1. Startup fails otherwise. REPLICATION_MAXIMUM_FACTOR also caps the per-class replicationConfig.factor for new classes, so the invariant holds at runtime too.
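
For example, a Free-Tier-style environment pairing the caps with the replication bound (values are illustrative):

```sh
# Setting any of the three linked caps without RF=1 fails startup.
MAXIMUM_ALLOWED_OBJECTS_COUNT=10000
MAXIMUM_ALLOWED_TENANTS_PER_COLLECTION=5
MAXIMUM_ALLOWED_SHARDS_PER_COLLECTION=1
REPLICATION_MAXIMUM_FACTOR=1
```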

MAXIMUM_ALLOWED_COLLECTIONS_COUNT is not part of the linkage — it predates this RFC and tying it would break existing operators.

## Where each check fires

| Limit | Hook | File |
|---|---|---|
| Objects (single) | `Shard.PutObject` (top of function, before the LSM write) | `adapters/repos/db/shard_write_put.go` |
| Objects (batch) | `Shard.PutObjectBatch` (top of function) | `adapters/repos/db/shard_write_batch_objects.go` |
| Collections | `AddClass()` | `usecases/schema/class.go` |
| Tenants | `AddTenants()` | `usecases/schema/tenant.go` |
| Shards | `AddClass()` (sharding-config validation) | `usecases/schema/class.go` |

The object check sits at the storage chokepoint rather than at the use-case layer. That covers both writes that arrive locally and writes that were forwarded from another node — both converge at Shard.PutObject{,Batch} on the home node for RF=1. The use-case layer (usecases/objects/) does not enforce the object cap; that hook was deliberately removed when we moved the chokepoint deeper.
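
A minimal sketch of what that chokepoint hook amounts to; `usageLimiter`, `CheckObjectQuota`, and `putObjectLSM` are hypothetical names, not the actual identifiers in `shard_write_put.go`:

```go
package sketch

import "context"

// quotaChecker is a hypothetical interface for the object-cap check.
type quotaChecker interface {
	// CheckObjectQuota returns a usage-limit error if adding n more
	// objects would exceed MAXIMUM_ALLOWED_OBJECTS_COUNT.
	CheckObjectQuota(n int) error
}

type Shard struct {
	usageLimiter quotaChecker
}

func (s *Shard) PutObject(ctx context.Context, obj any) error {
	// For RF=1, local and forwarded writes both converge here, so a
	// single check before the LSM write covers both paths.
	if err := s.usageLimiter.CheckObjectQuota(1); err != nil {
		return err // surfaces as HTTP 429 / gRPC ResourceExhausted
	}
	return s.putObjectLSM(ctx, obj) // existing write path (elided)
}

func (s *Shard) putObjectLSM(ctx context.Context, obj any) error { return nil }
```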

The schema-side limits (collections, tenants, shards) stay at the use-case layer because that's the single coordinator path — AddClass/AddTenants go through RAFT, no forwarded-write concern.

## Counter source

The object count is node-wide across local shards: the manager sums each loaded shard's bucket.CountAsync() (adapters/repos/db/lsmkv/bucket.go) on every enforced write. Each CountAsync() is O(segments-per-shard) — it walks the live segment list and sums each segment's already-loaded net-additions counter, no I/O. For the Free-Tier shape (few shards, few segments) that's a handful of atomic reads on the hot path.
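
In sketch form, assuming hypothetical `forEachLoadedShard` / `bucket` plumbing around the real `CountAsync()`:

```go
package sketch

// bucket models the one real call used here: CountAsync(), which adds
// up per-segment net-additions counters already held in memory (no I/O).
type bucket interface{ CountAsync() int }

// liveObjectCount sums the live object count across loaded shards.
// forEachLoadedShard is a hypothetical stand-in for the manager's
// shard iteration.
func liveObjectCount(forEachLoadedShard func(visit func(objects bucket))) int {
	total := 0
	forEachLoadedShard(func(objects bucket) {
		// Cold (lazy-load) shards never reach this callback, and the
		// memtable is excluded, so the total can lag slightly; see
		// Accepted imperfections.
		total += objects.CountAsync()
	})
	return total
}
```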

We deliberately don't route through UsageForIndex — that path triggers other usage-module computations beyond a count.

## Error response

When any limit is hit, Weaviate returns:

- HTTP: `429 Too Many Requests` with body

  ```json
  {
    "errorCode": "USAGE_LIMIT_EXCEEDED",
    "limit": "objects",
    "value": 10000,
    "message": "Object count limit of 10000 reached for this instance."
  }
  ```

- gRPC: `codes.ResourceExhausted` with an `errdetails.ErrorInfo` carrying the same limit/value/message fields under `Reason="USAGE_LIMIT_EXCEEDED"`, `Domain="weaviate.usagelimits"`.

The structured fields (errorCode, limit, value) are stable contract regardless of the USAGE_LIMITS_ERROR_MESSAGE template — only the human-facing message text changes.
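
For reference, the gRPC shape above can be produced with the standard `status`/`errdetails` packages. A minimal sketch; `usageLimitErr` is a hypothetical helper name, and the codebase's actual construction may differ:

```go
package sketch

import (
	"strconv"

	"google.golang.org/genproto/googleapis/rpc/errdetails"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// usageLimitErr builds a ResourceExhausted error matching the
// documented contract: Reason/Domain plus limit and value metadata.
func usageLimitErr(limit string, value int64, msg string) error {
	st := status.New(codes.ResourceExhausted, msg)
	withInfo, err := st.WithDetails(&errdetails.ErrorInfo{
		Reason: "USAGE_LIMIT_EXCEEDED",
		Domain: "weaviate.usagelimits",
		Metadata: map[string]string{
			"limit": limit,
			"value": strconv.FormatInt(value, 10),
		},
	})
	if err != nil {
		return st.Err() // WithDetails fails only on an OK status
	}
	return withInfo.Err()
}
```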

## Batch behavior

When a batch insert would exceed MAXIMUM_ALLOWED_OBJECTS_COUNT, the shard-slice is rejected as a unit:

- Single-shard collections (Free Tier): whole-batch rejection, no partial fill.
- Multi-shard collections: `Index.putObjectBatch` partitions a client batch by shard before forwarding, so the chokepoint sees one slice per shard. A single client batch can therefore produce per-shard partial success on multi-shard collections, which is accepted under our current scope (see Scope below and the sketch after this list).
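
A sketch of those semantics; `partitionByShard`, `shardRef`, and `markSliceFailed` are hypothetical names standing in for the real wiring in `Index.putObjectBatch`:

```go
package sketch

import "context"

type object struct{}

type shardRef interface {
	PutObjectBatch(ctx context.Context, slice []object) error
}

type index interface {
	partitionByShard(batch []object) map[string][]object
	shard(name string) shardRef
}

func putObjectBatch(ctx context.Context, idx index, batch []object) {
	// The chokepoint sees one slice per shard; each slice passes or
	// fails as a unit.
	for name, slice := range idx.partitionByShard(batch) {
		if err := idx.shard(name).PutObjectBatch(ctx, slice); err != nil {
			// Other shards' slices may still succeed, hence per-shard
			// partial success on multi-shard collections.
			markSliceFailed(slice, err)
		}
	}
}

func markSliceFailed(slice []object, err error) { /* record per-object errors */ }
```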

## Scope

Supported deployment shapes (where the cap is meaningful and exact):

- Single-node clusters (the Free Tier sandbox case): there is no other node.
- Namespaced clusters in phase 1: a namespace's collections/shards are pinned to a single node, so the per-namespace sum is local.

Out of scope:

- RF > 1. The replicated write path bypasses `Shard.PutObject{,Batch}` (it goes through `shard_replication.go`'s `preparePutObject{,s}` → `s.putOne` / `s.putBatch` directly). Supporting RF > 1 would require either dropping the check one level deeper or a smarter scheme such as a lease-based quota.
- Hypothetical multi-node, non-namespaced, RF=1, single-shard clusters where collections are distributed across nodes. Each node would only see its local slice of the count, so the effective cap stacks (cap × min(N_collections, N_nodes_with_shards)). Not a deployment shape we ship the cap in.
- Phase-2 namespaces that spread a namespace's collections across nodes: same problem as the previous bullet.
- Cluster-wide enforcement. Reserved for the future namespaces work; not API-stubbed here.

## Backward-compat note: collections-limit status code

The pre-existing MAXIMUM_ALLOWED_COLLECTIONS_COUNT enforcement previously returned HTTP 422 Unprocessable Entity with a free-text "maximum number of collections" message. As of this release it returns HTTP 429 with the structured USAGE_LIMIT_EXCEEDED body described above. Clients matching on the prior 422 status or message text must adapt.
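
For clients migrating, matching on the structured fields rather than status text might look like this (illustrative, not an official client helper):

```go
package sketch

import (
	"encoding/json"
	"net/http"
)

// usageLimitBody mirrors the documented structured error contract.
type usageLimitBody struct {
	ErrorCode string `json:"errorCode"`
	Limit     string `json:"limit"`
	Value     int64  `json:"value"`
	Message   string `json:"message"`
}

// isUsageLimit matches the stable contract (429 + errorCode) instead
// of the old 422 status and free-text message.
func isUsageLimit(resp *http.Response) bool {
	if resp.StatusCode != http.StatusTooManyRequests {
		return false
	}
	var body usageLimitBody
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return false
	}
	return body.ErrorCode == "USAGE_LIMIT_EXCEEDED"
}
```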

## Accepted imperfections

- Object count via async path. Counts come from `CountAsync` and exclude the in-memory memtable, so during fast bulk imports the count lags slightly behind on-disk state. The lag is bounded by in-flight write volume between count refreshes and self-corrects on the next flush. Sync counting on every write would scan the entire memtable: wasteful at the 10K free-tier scale, fatal at 10M+ scale.
- Cold lazy-load shards are skipped from the sum. Including them wouldn't force a load (counts can be read from on-disk segment metadata), but it would add a directory walk plus per-segment metadata reads per cold shard on every write, which is unacceptable on the hot path. Effect: accounts with dormant tenants may sit slightly under-counted. Future option: cache cold counts in memory at load time.
- Per-shard-slice batch rejection on multi-shard collections (see Batch behavior). Single-shard collections (Free Tier) see whole-batch rejection unchanged.
- Tenants are checked at create time only, not on subsequent multi-tenancy config changes.
- Schema-side caps are not transactional with RAFT. Read-check-write is not atomic across the RAFT-replicated AddClass/AddTenants call, so two concurrent creates can both pass the check. The overshoot is bounded; the next request is correctly rejected.