When infrastructure fails mid-flow, your runs survive. A worker can crash, get OOM-killed, be evicted, or be rolled in a deploy, and the run continues on another worker. It picks up where it left off, without repeating completed work and without being dropped.
<Note> These guarantees describe the [recommended production setup](/install/configure-operate/production-setup): one flow per worker (`AP_WORKER_CONCURRENCY=1`), S3 file storage, and a reused engine process. They are converging with the shipped defaults. Activepieces Cloud upholds the same guarantees through a different sandbox mechanism; see [Sandboxing](/install/architecture/sandboxing). </Note>

Three promises follow from one mechanism:
Once a step's output is checkpointed, no crash, deploy, pause, or retry runs it again. On resume, every completed step is skipped and its recorded output reused. Only work that hadn't finished yet executes.
Once a run is on the queue, it executes, even if the worker holding it dies, is evicted, or loses its lease mid-run. Work is never silently dropped.
Because runs are durable and re-queued automatically, you can restart, redeploy, or evict workers mid-flow. No drain window, no graceful-shutdown hook, no in-flight runs lost.
The mechanism is the run log. Every run has one: a checkpoint of each completed step, persisted to Postgres/S3. Workers are stateless, so when one dies, its run is re-queued and a healthy worker replays the log, skipping completed steps and resuming at the point of interruption. See Durable Execution for how replay and skip work.
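To make the replay-and-skip idea concrete, here is a minimal sketch of a run surviving a worker crash. It is an illustrative model only: `Step`, `RunLog`, `executeRun`, the in-memory queue, and the simulated crash are all hypothetical names invented for this example, not the Activepieces engine API. The in-memory `runLog` object stands in for the checkpoints persisted to Postgres/S3.

```typescript
// Illustrative model only: these types and functions are hypothetical,
// not the Activepieces engine API.
interface Step {
  name: string;
  run: () => string;
}

// Stand-in for the persisted run log: step name -> recorded output.
type RunLog = { [stepName: string]: string | undefined };

type RunResult =
  | { status: "completed"; outputs: string[] }
  | { status: "crashed"; atStep: string };

// Execute a flow against its run log. Steps already in the log are skipped
// and their recorded output reused; each newly finished step is checkpointed.
// `crashBefore` simulates the worker dying before that step starts.
function executeRun(steps: Step[], log: RunLog, crashBefore?: string): RunResult {
  const outputs: string[] = [];
  for (const step of steps) {
    const recorded = log[step.name];
    if (recorded !== undefined) {
      outputs.push(recorded); // completed earlier: reuse, do not re-run
      continue;
    }
    if (step.name === crashBefore) {
      return { status: "crashed", atStep: step.name };
    }
    const output = step.run();
    log[step.name] = output; // checkpoint the finished step
    outputs.push(output);
  }
  return { status: "completed", outputs };
}

// Count how many times each step body actually executes.
const executions: { [stepName: string]: number } = {};
function countedStep(name: string): Step {
  return {
    name,
    run: () => {
      executions[name] = (executions[name] || 0) + 1;
      return `${name}-output`;
    },
  };
}

const flow: Step[] = [countedStep("fetch"), countedStep("transform"), countedStep("notify")];
const runLog: RunLog = {};
const queue: string[] = ["run-1"];
let attempts = 0;
let finalOutputs: string[] = [];

// Stateless workers: each attempt sees only the queue entry and the run log.
while (queue.length > 0) {
  const runId = queue.shift() as string;
  attempts += 1;
  // The first worker dies before "notify"; the second one is healthy.
  const result = executeRun(flow, runLog, attempts === 1 ? "notify" : undefined);
  if (result.status === "crashed") {
    queue.push(runId); // the run is re-queued, never dropped
  } else {
    finalOutputs = result.outputs;
  }
}

console.log(attempts, executions, finalOutputs);
```

In this model the run takes two attempts, yet every step body executes exactly once: the second worker reads `fetch` and `transform` from the log and runs only `notify`. Because the worker holds no state of its own, any worker can pick up the re-queued run.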
The replay guarantee is inherent to the engine. No flag enables or disables it, and the checkpoint cadence is fixed at 15 seconds. What you own is the durability of the backing stores: managed Postgres, S3, and Redis (see Production Setup).