Kura

Kura is a Rust server for building low-latency cache meshes for tenants, handling distributed cache traffic for binary artifacts and metadata.

[!NOTE] Kura comes from the Japanese word kura, which refers to a storehouse or warehouse. The name fits the system's role: keeping build artifacts and cache metadata stored durably and close at hand so they can be served with low latency.

Summary ✨

  • ⚡ Hot reads come from local disk
  • 🪨 Local metadata, multipart state, and the replication outbox live in RocksDB
  • 🔁 Blobs and cache metadata replicate to peer nodes with eventual consistency
  • 🔎 Nodes can discover peers through DNS and bootstrap themselves from already-running nodes
  • 📦 Kura actively supports Bazel and Buck2 REAPI, Xcode Cache, Gradle, and Tuist Module Cache
  • 🧪 Compatibility endpoints for Nx and React Native Metro are available, but they are not a primary focus today
  • 🧰 The gRPC API exposes the Bazel Remote Execution cache services used by Bazel and Buck2
  • 📊 The local stack includes Grafana, Prometheus, Loki, Promtail, and Tempo for dashboards, metrics, logs, and traces

Supported cache protocols

Actively supported:

  • Bazel and Buck2: Bazel Remote Execution API v2 over gRPC on KURA_GRPC_PORT
  • Xcode Cache: HTTP CAS artifacts on POST/GET /api/cache/cas/{id} and action-cache style entries on PUT /api/cache/keyvalue and GET /api/cache/keyvalue/{cas_id}
  • Gradle: PUT/GET /api/cache/gradle/{cache_key}
  • Module Cache: multipart uploads on POST /api/cache/module/start, POST /api/cache/module/part, POST /api/cache/module/complete, and HEAD/GET /api/cache/module/{id}

Compatibility surfaces:

  • Nx: self-hosted remote cache API on GET/PUT /v1/cache/{hash}
  • React Native Metro: HttpStore / HttpGetStore on GET/PUT /api/metro/cache/{cache_key}

Local stack 🧪

Run:

bash
docker compose up --build -d

Useful endpoints:

  • http://localhost:4101/up
  • http://localhost:4101/ready
  • http://localhost:4101/status/rollout
  • http://localhost:4102/up
  • http://localhost:4102/ready
  • http://localhost:4102/status/rollout
  • http://localhost:4103/up
  • http://localhost:4103/ready
  • http://localhost:4103/status/rollout
  • grpc://localhost:5101 for Bazel/Buck2 REAPI against kura-us
  • grpc://localhost:5102 for Bazel/Buck2 REAPI against kura-eu
  • grpc://localhost:5103 for Bazel/Buck2 REAPI against kura-ap
  • http://localhost:3000 for Grafana with admin / admin
  • http://localhost:9090 for Prometheus
  • http://localhost:3100 for Loki
  • http://localhost:3200 for Tempo
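
After `docker compose up`, the health endpoints above can be polled until all three nodes are serving. The following is a minimal sketch: the ports come from the list above, while the function names, the retry count, and the assumption that a healthy node answers /up and /ready with a 2xx status are illustrative rather than taken from the Kura sources.

```shell
# Ports come from the endpoint list above; the function names are hypothetical.
KURA_LOCAL_PORTS="4101 4102 4103"

# Print the liveness and readiness URLs for every local node, one per line.
kura_health_urls() {
  for port in $KURA_LOCAL_PORTS; do
    echo "http://localhost:${port}/up"
    echo "http://localhost:${port}/ready"
  done
}

# Poll until every URL answers with a 2xx status (assumed to mean healthy),
# or give up after the given number of attempts (default 30).
kura_wait_ready() {
  attempts="${1:-30}"
  while [ "$attempts" -gt 0 ]; do
    pending=0
    for url in $(kura_health_urls); do
      curl --silent --fail --output /dev/null "$url" || pending=1
    done
    if [ "$pending" -eq 0 ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 2
  done
  return 1
}
```

Calling `kura_wait_ready 60 && echo "mesh ready"` after starting the stack would block for up to roughly two minutes before giving up.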

Toolchain 🛠️

Install Rust from mise.toml:

bash
mise trust mise.toml
mise install

Run tests:

bash
mise x [email protected] -- cargo test
mise x [email protected] -- shellspec

Runtime configuration is summarized in the table under Runtime Model And Limits. When the main file-descriptor, memory, and metadata-store budgets are not set explicitly, Kura derives defaults for them at startup.

🗺️ Project Areas

Kura is easier to read by subsystem than by tutorial step. The sections below group the project by the main areas you operate or extend.

🔌 Protocol Surfaces

Kura exposes multiple cache protocols behind one service. The actively supported surfaces are:

  • 🛠️ Bazel and Buck2: REAPI over gRPC on KURA_GRPC_PORT
  • 🍎 Xcode Cache: POST/GET /api/cache/cas/{id}?tenant_id=...&namespace_id=...
  • 🗂️ KeyValue / action-cache entries: PUT /api/cache/keyvalue?tenant_id=...&namespace_id=... and GET /api/cache/keyvalue/{cas_id}?tenant_id=...&namespace_id=...
  • 🐘 Gradle: PUT/GET /api/cache/gradle/{cache_key}?tenant_id=...&namespace_id=...
  • 📦 Module Cache: POST /api/cache/module/start?..., POST /api/cache/module/part?..., POST /api/cache/module/complete?..., HEAD/GET /api/cache/module/{id}?...

Kura also exposes compatibility endpoints that are not a primary focus today:

  • 🧱 Nx: PUT/GET /v1/cache/{hash}
  • 📱 Metro: PUT/GET /api/metro/cache/{cache_key}

The local compose stack is still the quickest way to exercise all of those surfaces together:

bash
docker compose up --build -d
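
Once the stack is up, a Gradle entry can be round-tripped in the same style as the Xcode and keyvalue examples in this section. This is a sketch only: the route and query parameters come from the surface list above, while the helper names, the tenant and namespace values, the payload, and the content type are placeholders.

```shell
# Build a Gradle cache URL from the route listed above.
# Usage: kura_gradle_url <base_url> <cache_key> <tenant_id> <namespace_id>
kura_gradle_url() {
  echo "${1}/api/cache/gradle/${2}?tenant_id=${3}&namespace_id=${4}"
}

# Upload through one node and read the entry back through another.
# The content type and payload are assumptions made for illustration.
kura_gradle_round_trip() {
  curl -X PUT "$(kura_gradle_url http://localhost:4101 "$1" acme android)" \
    -H "content-type: application/octet-stream" \
    --data-binary "gradle-entry" || return 1
  curl "$(kura_gradle_url http://localhost:4102 "$1" acme android)"
}
```

Because replication is eventually consistent, a read through the second node immediately after the write may briefly miss; retrying after a short pause is usually enough in the local stack.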

Example Xcode artifact round trip:

bash
curl -X POST \
  "http://localhost:4101/api/cache/cas/artifact-1?tenant_id=acme&namespace_id=ios" \
  -H "content-type: application/octet-stream" \
  --data-binary "xcode-binary"

curl \
  "http://localhost:4102/api/cache/cas/artifact-1?tenant_id=acme&namespace_id=ios"

Example keyvalue entry round trip:

bash
curl -X PUT \
  "http://localhost:4101/api/cache/keyvalue?tenant_id=acme&namespace_id=ios" \
  -H "content-type: application/json" \
  -d '{"cas_id":"cas-1","entries":[{"value":"hello"},{"value":"world"}]}'

curl \
  "http://localhost:4103/api/cache/keyvalue/cas-1?tenant_id=acme&namespace_id=ios"

🗄️ Storage And Replication

Kura splits storage into two planes:

  • 🪨 RocksDB stores metadata, keyvalue payloads, multipart state, tombstones, segment lifecycle state, and the replication outbox.
  • 📦 Segment files store large immutable binary artifacts for the hot path.

Replication is leaderless and eventually consistent:

  • 🔁 local writes become durable together with their outbox work
  • 🌍 peers bootstrap by pulling manifests, tombstones, and artifact bodies
  • 🔎 DNS discovery can expand the peer set automatically
  • 🧠 the outbox is processed incrementally so queue depth does not blow up heap usage during backlog

Peer-to-peer traffic always uses the dedicated internal plane:

  • KURA_INTERNAL_PORT

Peer-to-peer mTLS is optional on that plane:

  • KURA_INTERNAL_TLS_CA_CERT_PATH
  • KURA_INTERNAL_TLS_CERT_PATH
  • KURA_INTERNAL_TLS_KEY_PATH

When peer mTLS is disabled:

  • KURA_NODE_URL and every value in KURA_PEERS must use http://...:<KURA_INTERNAL_PORT>
  • /_internal/* is only served on the internal HTTP listener
  • 🌍 the public API still stays on KURA_PORT

When peer mTLS is enabled:

  • 🔒 KURA_NODE_URL and every value in KURA_PEERS must use https://...:<KURA_INTERNAL_PORT>
  • 🧱 /_internal/* is only served on the internal mTLS listener
  • 🌍 the public API still stays on KURA_PORT
  • 🪪 the certificate configured through KURA_INTERNAL_TLS_CERT_PATH should be valid for both server and client auth
  • 🏷️ the certificate SANs must cover the hostname used in KURA_NODE_URL
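
The URL rules for both modes can be checked before a node starts. The sketch below is a hypothetical pre-flight helper, not part of Kura: it assumes KURA_PEERS is a comma-separated list and that URLs carry no trailing path, neither of which is stated above.

```shell
# Succeeds when a URL uses the expected scheme and ends in the internal port.
# Usage: kura_check_peer_url <url> <http|https> <internal_port>
kura_check_peer_url() {
  case "$1" in
    "$2"://*:"$3") return 0 ;;
    *) return 1 ;;
  esac
}

# Check KURA_NODE_URL and every KURA_PEERS entry against one scheme:
# pass "http" when peer mTLS is disabled and "https" when it is enabled.
# Assumption: KURA_PEERS is comma-separated.
kura_check_peer_config() {
  scheme="$1"
  kura_check_peer_url "$KURA_NODE_URL" "$scheme" "$KURA_INTERNAL_PORT" || return 1
  for peer in $(echo "${KURA_PEERS:-}" | tr ',' ' '); do
    kura_check_peer_url "$peer" "$scheme" "$KURA_INTERNAL_PORT" || return 1
  done
}
```

For example, `kura_check_peer_config https || echo "peer URLs do not match the mTLS setting"` would flag a peer that still advertises http:// after mTLS is switched on.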

⚙️ Runtime Model And Limits

Kura is designed around explicit resource budgets instead of relying on ambient process limits.

When Optional is Yes, the Default column shows what Kura uses today. auto means Kura derives the value at startup from detected file-descriptor limits, memory limits, or CPU count.

| Name | Description | Optional | Default |
| --- | --- | --- | --- |
| KURA_PORT | Public HTTP port. | No | |
| KURA_GRPC_PORT | gRPC port for REAPI. | No | |
| KURA_INTERNAL_PORT | Internal HTTP or mTLS port used for peer replication and discovery. | No | |
| KURA_TENANT_ID | Default tenant identifier for the node. | No | |
| KURA_REGION | Region label advertised in metrics and replication state. | No | |
| KURA_TMP_DIR | Temporary directory for staged request bodies and multipart assembly. | No | |
| KURA_DATA_DIR | Persistent directory for metadata state and segment files. | No | |
| KURA_NODE_URL | Canonical internal URL other peers use to reach this node. | No | |
| KURA_PEERS | Seed peer list used before discovery converges. | Yes | KURA_NODE_URL |
| KURA_DISCOVERY_DNS_NAME | DNS name to probe for automatic peer discovery. | Yes | disabled |
| KURA_FILE_DESCRIPTOR_POOL_SIZE | App-managed file-descriptor budget for request and background I/O. | Yes | auto |
| KURA_FILE_DESCRIPTOR_ACQUIRE_TIMEOUT_MS | How long a request waits before FD backpressure fails the checkout. | Yes | 5000 |
| KURA_DRAIN_COMPLETION_TIMEOUT_MS | Maximum grace window Kura gives in-flight HTTP and gRPC work to finish during shutdown before forcing exit progression. | Yes | 240000 |
| KURA_SEGMENT_HANDLE_CACHE_SIZE | Maximum number of pinned segment read handles; must stay below the FD pool size. | Yes | auto |
| KURA_MEMORY_SOFT_LIMIT_BYTES | Soft watermark where Kura starts shedding optional memory use. | Yes | auto |
| KURA_MEMORY_HARD_LIMIT_BYTES | Hard watermark where Kura pauses replication work and trims hot caches aggressively. | Yes | auto |
| KURA_MANIFEST_CACHE_MAX_BYTES | Maximum size of the in-memory manifest hot cache. | Yes | auto |
| KURA_MAX_KEYVALUE_BYTES | Maximum per-request keyvalue payload size on public and replication APIs. | Yes | 1048576 |
| KURA_METADATA_STORE_MAX_OPEN_FILES | Descriptor budget reserved for the metadata store itself. | Yes | auto |
| KURA_METADATA_STORE_MAX_BACKGROUND_JOBS | Background flush and compaction concurrency for the metadata store. | Yes | auto |
| KURA_METADATA_STORE_READ_CACHE_BYTES | Capacity of the metadata-store read cache. | Yes | auto |
| KURA_METADATA_STORE_WRITE_BUFFER_POOL_BYTES | Total memory budget reserved for metadata write buffering. | Yes | auto |
| KURA_METADATA_STORE_WRITE_BUFFER_BYTES | Size of each metadata write buffer before flush. | Yes | auto |
| KURA_METADATA_STORE_MAX_WRITE_BUFFERS | Maximum number of metadata write buffers kept in memory. | Yes | auto |

Auto-derived defaults currently follow these rules:

  • file_descriptor_limit comes from RLIMIT_NOFILE when available, otherwise Kura falls back to a conservative host default.
  • memory_limit_bytes comes from the cgroup memory limit when available, otherwise Kura falls back to physical host memory.
  • cpu_count comes from detected parallelism via the runtime.
  • KURA_FILE_DESCRIPTOR_POOL_SIZE is usable_fds / 8, clamped to [64, 256], where usable_fds is the detected FD limit minus reserved headroom.
  • KURA_SEGMENT_HANDLE_CACHE_SIZE is KURA_FILE_DESCRIPTOR_POOL_SIZE / 4, clamped to [16, 64], and then capped below the FD pool so transient work keeps headroom.
  • KURA_MEMORY_SOFT_LIMIT_BYTES is 70% of detected memory, rounded down to MiB boundaries, with a minimum of 128 MiB.
  • KURA_MEMORY_HARD_LIMIT_BYTES is 85% of detected memory, rounded down to MiB boundaries, and always at least 64 MiB above the soft limit.
  • KURA_MANIFEST_CACHE_MAX_BYTES is KURA_MEMORY_SOFT_LIMIT_BYTES / 16, rounded down to MiB boundaries and clamped to [8 MiB, 64 MiB].
  • KURA_METADATA_STORE_MAX_OPEN_FILES is usable_fds / 2, clamped to [128, 1024].
  • KURA_METADATA_STORE_MAX_BACKGROUND_JOBS is cpu_count, clamped to [1, 8].
  • KURA_METADATA_STORE_READ_CACHE_BYTES is memory_limit_bytes / 32, rounded down to MiB boundaries and clamped to [16 MiB, 128 MiB].
  • KURA_METADATA_STORE_WRITE_BUFFER_POOL_BYTES follows the same memory_limit_bytes / 32 rule as the metadata-store read cache.
  • KURA_METADATA_STORE_WRITE_BUFFER_BYTES is KURA_METADATA_STORE_WRITE_BUFFER_POOL_BYTES / 4, rounded down to MiB boundaries and clamped to [4 MiB, 32 MiB].
  • KURA_METADATA_STORE_MAX_WRITE_BUFFERS is KURA_METADATA_STORE_WRITE_BUFFER_POOL_BYTES / KURA_METADATA_STORE_WRITE_BUFFER_BYTES, clamped to [2, 8].
  • KURA_MAX_KEYVALUE_BYTES defaults to 1048576, KURA_FILE_DESCRIPTOR_ACQUIRE_TIMEOUT_MS defaults to 5000, and KURA_DRAIN_COMPLETION_TIMEOUT_MS defaults to 240000.
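
To estimate what a given host would get, a few of these rules can be replayed with plain shell arithmetic. This is a rough sketch of the rules as written above, not Kura's actual code: the function names are hypothetical, usable_fds is taken as an input because the reserved headroom is not specified here, and the final "capped below the FD pool" step for the segment handle cache is not modeled.

```shell
MIB=$((1024 * 1024))

# clamp <value> <min> <max>
clamp() {
  if [ "$1" -lt "$2" ]; then
    echo "$2"
  elif [ "$1" -gt "$3" ]; then
    echo "$3"
  else
    echo "$1"
  fi
}

# Round a byte count down to a whole number of MiB.
floor_mib() {
  echo $(( $1 / MIB * MIB ))
}

# KURA_FILE_DESCRIPTOR_POOL_SIZE from usable_fds (FD limit minus headroom).
fd_pool_size() {
  clamp $(( $1 / 8 )) 64 256
}

# KURA_SEGMENT_HANDLE_CACHE_SIZE from the FD pool size.
segment_handle_cache_size() {
  clamp $(( $1 / 4 )) 16 64
}

# KURA_MEMORY_SOFT_LIMIT_BYTES: 70% of detected memory, at least 128 MiB.
memory_soft_limit() {
  soft=$(floor_mib $(( $1 * 70 / 100 )))
  if [ "$soft" -lt $((128 * MIB)) ]; then
    soft=$((128 * MIB))
  fi
  echo "$soft"
}

# KURA_MEMORY_HARD_LIMIT_BYTES: 85% of detected memory, at least 64 MiB above soft.
memory_hard_limit() {
  hard=$(floor_mib $(( $1 * 85 / 100 )))
  lowest=$(( $(memory_soft_limit "$1") + 64 * MIB ))
  if [ "$hard" -lt "$lowest" ]; then
    hard=$lowest
  fi
  echo "$hard"
}

# KURA_MANIFEST_CACHE_MAX_BYTES: soft limit / 16, clamped to [8 MiB, 64 MiB].
manifest_cache_max() {
  clamp "$(floor_mib $(( $(memory_soft_limit "$1") / 16 )))" $((8 * MIB)) $((64 * MIB))
}
```

For a container with a 4 GiB memory limit, this sketch yields a soft limit of 2867 MiB, a hard limit of 3481 MiB, and a 64 MiB manifest cache; with 1024 usable descriptors it yields an FD pool of 128 and a segment handle cache of 32.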

A minimal direct-binary deployment looks like:

bash
KURA_PORT=4000 \
KURA_GRPC_PORT=50051 \
KURA_INTERNAL_PORT=7443 \
KURA_TENANT_ID=default \
KURA_REGION=eu-central \
KURA_TMP_DIR=/tmp/kura \
KURA_DATA_DIR=/var/cache/kura \
KURA_NODE_URL=http://cache-1.internal:7443 \
KURA_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://otel-collector:4318/v1/traces \
KURA_OTEL_SERVICE_NAME=kura-eu-central \
KURA_OTEL_DEPLOYMENT_ENVIRONMENT=production \
./target/release/kura

Set KURA_SENTRY_DSN to also forward panics and tracing::error! events to Sentry. In Helm deployments, inject it via extraEnv or extraEnvFrom.

📊 Observability

Kura's local stack covers metrics, dashboards, logs, and traces, with optional error reporting:

  • 📈 Prometheus metrics
  • 📉 Grafana dashboards
  • 🪵 Loki and Promtail logs
  • 🧭 Tempo traces
  • 🚨 Optional Sentry error reporting for panics and error-level tracing events

Prometheus exposes live metadata-store memory gauges:

  • kura_rocksdb_block_cache_usage_bytes
  • kura_rocksdb_block_cache_pinned_usage_bytes
  • kura_rocksdb_block_cache_capacity_bytes
  • kura_rocksdb_write_buffer_usage_bytes
  • kura_rocksdb_write_buffer_capacity_bytes

Kura also exports:

  • 📦 artifact read and write counters by kind, client, artifact_class, and result
  • 🔁 replication latency and result metrics
  • 💾 file descriptor pool pressure metrics
  • 🧠 manifest cache occupancy and admission metrics

📣 Runtime Analytics

Analytics webhooks are a separate optional subsystem that mirrors the older Tuist cache contract for Xcode and Gradle traffic.

When enabled:

  • 🍎 Xcode upload and download events are sent to /webhooks/cache
  • 🐘 Gradle upload and download events are sent to /webhooks/gradle-cache
  • ✍️ requests are signed with x-cache-signature
  • 🧭 requests also include x-cache-endpoint
  • 🪶 delivery stays in-memory and best-effort, so analytics never block the hot path
  • 🧯 a per-pipeline circuit breaker opens after repeated delivery failures so Kura sheds analytics instead of backing up under a misbehaving upstream

Configure it with:

  • KURA_ANALYTICS_SERVER_URL
  • KURA_ANALYTICS_SIGNING_KEY
  • optional KURA_ANALYTICS_BATCH_SIZE default 100
  • optional KURA_ANALYTICS_BATCH_TIMEOUT_MS default 5000
  • optional KURA_ANALYTICS_QUEUE_CAPACITY default 1000
  • optional KURA_ANALYTICS_REQUEST_TIMEOUT_MS default 5000
  • optional KURA_ANALYTICS_CIRCUIT_BREAKER_FAILURE_THRESHOLD default 5
  • optional KURA_ANALYTICS_CIRCUIT_BREAKER_OPEN_MS default 30000

It also exposes analytics-specific runtime metrics for:

  • 📣 queue depth and drops
  • 📦 batch sizes and flush outcomes
  • 🧯 circuit-breaker state and open events
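
As an illustration, the analytics settings could be supplied as environment variables before starting the binary. The server URL and signing key below are placeholders, the optional values simply restate the defaults listed above, and whether the two required variables alone are enough to enable the subsystem is an assumption.

```shell
# Required settings (placeholder values, replace with your own).
export KURA_ANALYTICS_SERVER_URL="https://analytics.example.com"
export KURA_ANALYTICS_SIGNING_KEY="replace-with-a-real-secret"

# Optional settings, shown here with their documented defaults.
export KURA_ANALYTICS_BATCH_SIZE=100
export KURA_ANALYTICS_BATCH_TIMEOUT_MS=5000
export KURA_ANALYTICS_QUEUE_CAPACITY=1000
export KURA_ANALYTICS_REQUEST_TIMEOUT_MS=5000
export KURA_ANALYTICS_CIRCUIT_BREAKER_FAILURE_THRESHOLD=5
export KURA_ANALYTICS_CIRCUIT_BREAKER_OPEN_MS=30000
```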

☸️ Deployment Options

Helm And Kubernetes

The repository includes a Helm chart at ops/helm/kura that deploys Kura as a StatefulSet with:

  • 💾 one PVC per pod for metadata-state and segment storage
  • 🔒 single-writer fencing through a process-held data-dir lock plus ReadWriteOncePod by default
  • 🧭 a headless service for stable pod DNS and peer discovery
  • 🌐 a regular service exposing both HTTP and gRPC
  • 🚪 optional ingress for the HTTP API
  • 🧩 optional inline extension script mounting through a ConfigMap
  • 🔐 optional peer mTLS for /_internal/* traffic via a mounted Kubernetes Secret
  • 🚦 /ready for public readiness and /up for liveness, with a preStop SIGUSR1 drain hook that removes pods from traffic before SIGTERM
  • ⏱️ a pod grace period derived from Kura's own drain timeout plus small lifecycle buffers so Kubernetes does not cut shutdown short

Lint and render the chart:

bash
helm lint ops/helm/kura
helm template kura ops/helm/kura --namespace kura

Install it on a generic cluster:

bash
helm upgrade --install kura ./ops/helm/kura \
  --namespace kura \
  --create-namespace \
  --set image.repository=ghcr.io/tuist/kura \
  --set image.tag=latest \
  --set config.region=fr-par \
  --set config.telemetry.otlpTracesEndpoint=http://otel-collector.monitoring.svc.cluster.local:4318/v1/traces

The chart defaults persistence to ReadWriteOncePod so one Kura process owns each PVC. If your CSI driver does not support it, override persistence.accessModes[0]=ReadWriteOnce; Kura will still fence the volume with its app-level writer lock.

The chart computes terminationGracePeriodSeconds from config.shutdown.drainCompletionTimeoutMs, podLifecycle.preStopDelaySeconds, and podLifecycle.terminationGraceExtraSeconds. That keeps the platform budget aligned with the application's shared shutdown deadline instead of relying on a separate hard-coded Kubernetes timeout.
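
The chart template is the source of truth for this calculation. As a rough mental model only, and assuming the three inputs are simply summed after the drain timeout is rounded up to whole seconds, the budget could be estimated like this (the function name and the example buffer values are hypothetical):

```shell
# Estimate the pod grace period in seconds.
# Usage: termination_grace_seconds <drain_timeout_ms> <pre_stop_delay_s> <extra_s>
termination_grace_seconds() {
  # Round the millisecond drain budget up to whole seconds.
  drain_s=$(( ($1 + 999) / 1000 ))
  echo $(( drain_s + $2 + $3 ))
}
```

With the default drain timeout of 240000 ms and illustrative buffers of 5 and 15 seconds, this gives 260 seconds. To see the value the chart actually renders, run `helm template kura ops/helm/kura --namespace kura | grep terminationGracePeriodSeconds`.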

For a local kind smoke test, the repo includes:

bash
./test/e2e/kura_helm_kind.sh

For a gated in-place StatefulSet rollout, the repo also includes:

bash
./ops/helm/kura/rollout.sh kura kura --set image.tag=<new-tag>

That script is the Kubernetes adapter. The rollout gate itself lives in ops/rollout/gate.sh and only assumes it can fetch Kura's rollout status endpoint once per node per poll. The Helm adapter stages the new revision behind a StatefulSet partition, rolls the highest ordinal first, and only advances after all of the following hold:

  • every node reports the same membership generation
  • all nodes are back in serving
  • the updated pod stays ready
  • ring membership is restored cluster-wide
  • outbox depth stays near baseline
  • no node is under critical memory pressure
  • the cluster is not introducing new file-descriptor timeout activity

If the Kura container listens on a non-default HTTP port, set KURA_HTTP_PORT=<port> when invoking the rollout helper so the adapter samples the correct loopback endpoint inside each pod.

For adjacent-version mixed rollout and rollback validation on the same persistent Docker volumes, use:

bash
PREVIOUS_REF=origin/main ./test/e2e/kura_compatibility_rollout.sh

That harness exercises the PREVIOUS_REF -> HEAD -> PREVIOUS_REF upgrade and rollback path across a mixed-version window, but it validates protocol and on-disk compatibility only. It does not try to model Kubernetes PVC reattachment behavior.

To enable peer mTLS in Kubernetes, set:

  • peerTls.enabled=true
  • peerTls.internalPort=<port>
  • peerTls.secretName=<secret-with-ca-cert-and-key-material>

The referenced secret should contain the files configured by:

  • peerTls.caCertFileName
  • peerTls.certFileName
  • peerTls.keyFileName

When enabled, the chart advertises peer URLs over https on the internal port and mounts the secret into /etc/kura/peer-tls.
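
Put together, the peer mTLS values could be passed on the command line as sketched below. The helper names, the port 7443, and the secret name kura-peer-tls are placeholders; the secret is assumed to exist already and to contain the files named by the peerTls.*FileName values.

```shell
# Print the --set flags that turn on peer mTLS.
# Usage: kura_peer_tls_flags <internal_port> <secret_name>
kura_peer_tls_flags() {
  echo "--set peerTls.enabled=true --set peerTls.internalPort=$1 --set peerTls.secretName=$2"
}

# Apply the flags to the release used elsewhere in this README.
kura_enable_peer_tls() {
  # Word splitting of the generated flags is intentional here.
  helm upgrade --install kura ./ops/helm/kura --namespace kura \
    $(kura_peer_tls_flags "$1" "$2")
}
```

For example, `kura_enable_peer_tls 7443 kura-peer-tls` would upgrade the release with peer mTLS enabled on port 7443.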

Scaleway Kapsule

For Scaleway, start from the bundled overrides in ops/helm/kura/values-scaleway.yaml:

bash
helm upgrade --install kura ./ops/helm/kura \
  --namespace kura \
  --create-namespace \
  -f ./ops/helm/kura/values-scaleway.yaml \
  --set image.repository=ghcr.io/tuist/kura \
  --set image.tag=latest \
  --set config.region=fr-par \
  --set config.telemetry.otlpTracesEndpoint=http://otel-collector.monitoring.svc.cluster.local:4318/v1/traces

That values file does two important things:

  • 🚪 uses a LoadBalancer service, which is the simplest way to expose Kura on Kapsule
  • 💾 pins persistence to scw-bssd, which Scaleway documents as the default block storage class for Kapsule multi-AZ clusters

🧩 Extensions And Policy

Kura can load one operator-provided extension script at startup to customize authentication, authorization, and response headers without recompiling the binary.

Core env vars:

  • KURA_EXTENSION_ENABLED=true
  • KURA_EXTENSION_SCRIPT_PATH=/etc/kura/extensions/hooks.lua
  • KURA_EXTENSION_HOOK_TIMEOUT_MS=25
  • KURA_EXTENSION_AUTH_CACHE_ALLOW_TTL_SECONDS=600
  • KURA_EXTENSION_AUTH_CACHE_DENY_TTL_SECONDS=3
  • KURA_EXTENSION_FAIL_CLOSED_AUTHENTICATE=true
  • KURA_EXTENSION_FAIL_CLOSED_AUTHORIZE=true
  • KURA_EXTENSION_FAIL_OPEN_RESPONSE_HEADERS=true

Generic host resources are also env-driven:

  • ✍️ signers:
    • KURA_EXTENSION_SIGNER_<ID>_ALGORITHM
    • KURA_EXTENSION_SIGNER_<ID>_SECRET
  • 🪪 JWT verifiers:
    • KURA_EXTENSION_JWT_VERIFIER_<ID>_ALGORITHM
    • KURA_EXTENSION_JWT_VERIFIER_<ID>_SECRET
    • KURA_EXTENSION_JWT_VERIFIER_<ID>_ISSUER
    • KURA_EXTENSION_JWT_VERIFIER_<ID>_AUDIENCES
  • 🌐 HTTP clients:
    • KURA_EXTENSION_HTTP_CLIENT_<ID>_BASE_URL
    • KURA_EXTENSION_HTTP_CLIENT_<ID>_CONNECT_TIMEOUT_MS
    • KURA_EXTENSION_HTTP_CLIENT_<ID>_REQUEST_TIMEOUT_MS

The script may define these hooks:

  • authenticate(ctx)
  • authorize(ctx, principal)
  • response_headers(ctx, principal)

The runtime keeps decision caching, metrics, timeouts, and cryptographic primitives in Rust, while the script supplies policy.