Benchmark


This benchmark answers one question for the recommended production shape: at the 1:10 app-to-worker ratio, how does throughput scale as you grow the fleet from 40 to 160 workers? It runs the worker-is-the-sandbox model on a real GKE cluster, against a same-region object store with signed URLs and official piece tarballs served from the CDN.

The shape under test: app tier, Redis job queue, Postgres, S3, and a one-flow-per-worker execution tier.

What's measured

A 4-node synchronous webhook flow that holds the HTTP connection open until the flow returns:

<Steps>
  <Step title="Webhook trigger">
    Catches the request on a `/sync` URL and holds the connection until the flow finishes.
  </Step>
  <Step title="Math Helper">
    Adds `2 + 3`.
  </Step>
  <Step title="Code step">
    Runs `return inputs.sum + 1` inside an `isolated-vm` context.
  </Step>
  <Step title="Webhook response">
    Returns the result, closing the held connection.
  </Step>
</Steps>

The compute is sub-millisecond by design — everything measured below is orchestration (queueing, callbacks, sandbox boot), which is what actually shapes production latency.

Each fleet size is held at the recommended 1:10 ratio (1 app per 10 workers) and run warm (`AP_REUSE_SANDBOX=true`), meaning the engine process is reused between jobs.

Results

<CardGroup cols={2}>
  <Card title="686 req/s" icon="gauge-high">
    Peak warm throughput at 16 apps, 160 workers.
  </Card>
  <Card title="~4.5 req/s" icon="arrow-trend-up">
    Per worker, held flat from 40 to 160 workers: throughput scales linearly with the fleet.
  </Card>
</CardGroup>

Each worker is one sandbox at concurrency 1, hard-capped at 0.5 vCPU / 1 GB. Apps are 1 vCPU / 1 GB. Load concurrency is matched to the worker count so requests don't queue behind the concurrency-1 workers.

| Apps · Workers | Ratio | Warm req/s | Warm req/s per worker |
| --- | --- | --- | --- |
| 4 app · 40 workers | 1:10 | 185.3 | 4.6 |
| 8 app · 80 workers | 1:10 | 409.5 | 5.1 |
| 12 app · 120 workers | 1:10 | 553.0 | 4.6 |
| 16 app · 160 workers | 1:10 | 686.3 | 4.3 |
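
The per-worker column and the scaling factor can be re-derived from the table rows. The snippet below is a minimal sketch of that arithmetic; the function names are illustrative and not part of the benchmark tooling.

```python
# Rows copied from the results table: (apps, workers, warm req/s).
RESULTS = [
    (4, 40, 185.3),
    (8, 80, 409.5),
    (12, 120, 553.0),
    (16, 160, 686.3),
]


def per_worker_rates(rows):
    """Warm req/s per worker for each fleet size, rounded to one decimal."""
    return [round(rps / workers, 1) for _, workers, rps in rows]


def scaling_factor(rows):
    """(fleet growth, throughput growth) between the first and last row."""
    _, first_workers, first_rps = rows[0]
    _, last_workers, last_rps = rows[-1]
    return last_workers / first_workers, last_rps / first_rps


if __name__ == "__main__":
    print(per_worker_rates(RESULTS))  # [4.6, 5.1, 4.6, 4.3]
    fleet, throughput = scaling_factor(RESULTS)
    print(f"{fleet:.1f}x fleet -> {throughput:.1f}x throughput")  # 4.0x fleet -> 3.7x throughput
```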

What each tier ran — and what it was actually doing

Only the app and worker counts scale (1:10). Postgres and Redis are a single fixed-size pod each — the same for every row below. CPU is the average across three warm load tests; the singletons' figures are the whole pod, app/worker are per pod.

| Apps · Workers | Warm req/s | Postgres used / cap | Redis used / cap | App used / cap (per pod) | Worker used / cap (per pod) |
| --- | --- | --- | --- | --- | --- |
| 4 · 40 | 185 | 522m / 3000m | 134m / 2000m | 782m / 1000m | 102m / 500m |
| 8 · 80 | 410 | 640m / 3000m | 150m / 2000m | 518m / 1000m | 72m / 500m |
| 12 · 120 | 553 | 546m / 3000m | 132m / 2000m | 311m / 1000m | 50m / 500m |
| 16 · 160 | 686 | 396m / 3000m | 169m / 2000m | 205m / 1000m | 37m / 500m |

Postgres never crosses ~0.65 of a core and Redis never crosses ~0.17. Both stay far below their caps and roughly flat as the fleet quadruples, so they are not absorbing a growing share of the load. Workers use about 0.1 of a core or less against their 0.5-core cap. No tier approaches saturation, which is why each added worker keeps adding throughput. (The singletons are sized this large on purpose, as described under Test environment, so they provably stay off the critical path; the default Postgres max_connections=100 would cap the fleet at ~10 apps, which is the artifact behind the earlier "120 cliff".)
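
To compare tiers on the same scale, the used / cap pairs can be turned into utilization fractions. This is a small sketch using the millicore figures from the table above; the dictionary layout and function name are illustrative only.

```python
# CPU in millicores, copied from the table: fleet -> tier -> (used, cap).
# Postgres and Redis are whole-pod figures; app and worker are per pod.
CPU_MILLICORES = {
    "4 · 40": {"postgres": (522, 3000), "redis": (134, 2000), "app": (782, 1000), "worker": (102, 500)},
    "8 · 80": {"postgres": (640, 3000), "redis": (150, 2000), "app": (518, 1000), "worker": (72, 500)},
    "12 · 120": {"postgres": (546, 3000), "redis": (132, 2000), "app": (311, 1000), "worker": (50, 500)},
    "16 · 160": {"postgres": (396, 3000), "redis": (169, 2000), "app": (205, 1000), "worker": (37, 500)},
}


def peak_utilization(cpu):
    """Highest used/cap fraction each tier reached across all fleet sizes."""
    peaks = {}
    for tiers in cpu.values():
        for tier, (used, cap) in tiers.items():
            peaks[tier] = max(peaks.get(tier, 0.0), used / cap)
    return peaks


if __name__ == "__main__":
    for tier, fraction in peak_utilization(CPU_MILLICORES).items():
        print(f"{tier}: peak {fraction:.0%} of cap")
```

Per-pod app and worker usage falls as the fleet grows, while the Postgres and Redis figures move within a narrow band, which matches the reading that the singletons are not the limiting tier.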

How throughput scales

  • Warm scales linearly with the fleet. Per-worker throughput stays flat at ~4.5 req/s from 40 to 160 workers, so total throughput tracks the worker count (185 → 410 → 553 → 686; 3.7× for 4× the fleet).

    The shared Postgres and Redis singletons are not the wall: they sit near-idle at every fleet size (Postgres under 0.65 of a core, Redis under 0.2, both far below their caps), and raising their resources several-fold does not move the curve.

    The ceiling is the concurrency-1 worker model. Each worker is busy for the whole per-flow time (the engine run plus the end-of-run run-log persistence it finishes before taking the next job), so fleet throughput is workers ÷ per-flow-time, which is linear in the fleet. The synchronous response reaches the client sooner than that: it is sent at the response step, before the worker wraps up the log write, so client-perceived latency is lower than the worker-busy time that sets throughput.

    Per-flow time carries run-to-run variance (the object-store log-write tail), which is why a single run's curve looks bumpy. The per-worker rate holding roughly constant across fleet sizes is what shows the scaling is linear.

<Note>
Why Production Setup recommends 1:10. Apps at 1 vCPU are cheap relative to the worker fleet, and 1:10 is the warm-headroom margin that keeps the app tier from becoming the wall during bursts. See Production Setup.
</Note>

Latency anatomy

Where the worker's milliseconds go — warm at peak (16 app · 160 w):

| Layer | Warm |
| --- | --- |
| Provision (flow bundle + piece + engine, mostly disk-cache hits) | ~10 ms |
| Sandbox boot (engine process reused) | ~5 ms |
| Flow run (4 steps: engine→app callbacks + end-of-run log persist) | ~203 ms |
| Worker-busy avg per job | ~218 ms |

This is the time the worker is occupied per job — and at concurrency 1 it is what sets throughput (workers ÷ worker-busy-time). The synchronous client sees less: the response is published at the flow's response step, before the worker finishes persisting the run log, so client-perceived latency runs below the worker-busy figure.
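
The workers ÷ worker-busy-time relationship can be checked against the peak row. The sketch below plugs in the approximate layer timings from the table above; because those timings are rounded averages, the result is an estimate to compare against the measured 686 req/s, not an exact prediction. Function names are illustrative.

```python
# Approximate warm per-job timings at 16 apps · 160 workers, in milliseconds.
LAYERS_MS = {
    "provision": 10,
    "sandbox_boot": 5,
    "flow_run": 203,
}


def worker_busy_ms(layers):
    """Total time a concurrency-1 worker is occupied by one job."""
    return sum(layers.values())


def estimated_throughput(workers, busy_ms):
    """Concurrency-1 model: each worker completes 1000 / busy_ms jobs per second."""
    return workers * 1000.0 / busy_ms


if __name__ == "__main__":
    busy = worker_busy_ms(LAYERS_MS)
    print(busy)  # 218
    print(f"{estimated_throughput(160, busy):.0f} req/s")  # 734 req/s
```

The estimate of roughly 734 req/s lands within about 7% of the measured 686 req/s, which is consistent with worker-busy time being what bounds throughput.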

Test environment

  • Cluster: GKE n2-standard-16 × 10 nodes, europe-west1-b
  • Worker: 0.5 vCPU / 1 GB, concurrency 1, SANDBOX_CODE_ONLY (Node fork + isolated-vm)
  • App: 1 vCPU / 1 GB
  • Object store: same-region GCS bucket (europe-west1) over the S3-interop endpoint, path-style SigV4 presigned URLs (AP_S3_USE_SIGNED_URLS=true)
  • Piece bundles: official tarballs served from the Activepieces CDN (AP_USE_CDN_FOR_BUNDLES=true)
  • Postgres + Redis: in-cluster singletons, deliberately over-provisioned so they stay off the critical path. Postgres runs at 3 vCPU / 3 GB with max_connections=2000 (the default 100 would starve the app pools past ~10 apps), durability off, and its data dir on tmpfs; Redis runs at 2 vCPU / 2 GB with io-threads. Under load both stay near-idle (Postgres <0.65 core, Redis <0.2), confirming that the worker tier, not the singletons, is the ceiling.
  • Load: hey, concurrency matched to worker count (40/80/120/160) so requests don't queue behind the concurrency-1 workers — latency reflects real service time, not backlog

How to reproduce

```bash
benchmark/run-gke.sh [total_requests] [concurrency]
```

The script mints a worker token, deploys benchmark/k8s-sandbox.yaml to the cluster, runs the load test against the app LoadBalancer, and reports warm throughput and the per-run breakdown from worker-pod logs. Set APP_REPLICAS and WORKER_REPLICAS (keeping the 1:10 ratio) to reproduce any row in the results table.
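
As a rough illustration, the invocations for each row of the results table could be generated as below. This sketch assumes `APP_REPLICAS` and `WORKER_REPLICAS` are read as environment variables and that load concurrency is set equal to the worker count, as in the test environment. The `total_requests` value of 20000 is a hypothetical placeholder, not the figure used for the published runs.

```python
import shlex

WORKERS_PER_APP = 10  # the recommended 1:10 app-to-worker ratio


def run_command(workers, total_requests):
    """Build a shell command line for one fleet size, keeping the 1:10 ratio."""
    if workers % WORKERS_PER_APP != 0:
        raise ValueError("worker count must be a multiple of 10 to keep the 1:10 ratio")
    # Assumption: the script picks these up from the environment.
    env = {"APP_REPLICAS": workers // WORKERS_PER_APP, "WORKER_REPLICAS": workers}
    # Positional arguments follow the usage line: [total_requests] [concurrency].
    args = ["benchmark/run-gke.sh", str(total_requests), str(workers)]
    prefix = " ".join(f"{name}={value}" for name, value in env.items())
    return f"{prefix} {shlex.join(args)}"


if __name__ == "__main__":
    for workers in (40, 80, 120, 160):
        print(run_command(workers, total_requests=20000))  # hypothetical request count
```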

<Tip> This benchmark runs in `SANDBOX_CODE_ONLY` mode. It does **not** represent the performance of Activepieces Cloud, which uses a different sandboxing mechanism for multi-tenancy. See [Sandboxing](/install/architecture/sandboxing). </Tip>