docs/install/architecture/durable-execution.mdx
Without durable execution, a worker that dies mid-flow would restart the whole run from the trigger, re-sending emails, re-charging cards, and re-calling APIs. Instead, a fresh worker reuses the saved output of every finished step and runs only the first step that had not completed. The same mechanism covers crashes, deploys, long pauses, and retries.
Every flow run has a run log: one compressed checkpoint file with everything needed to resume the run on a fresh worker.
What is in it:
When it is written:
Each write overwrites the previous copy. Only the latest checkpoint is kept, and the file is compressed before upload.
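The overwrite-and-compress behaviour can be sketched as follows. This is a minimal illustration, not the engine's actual code: the in-memory `object_store`, the key layout in `checkpoint_key`, and the JSON-plus-gzip encoding are all assumptions chosen to keep the example self-contained.

```python
import gzip
import json

# Hypothetical stand-in for an object store: key -> compressed bytes.
object_store = {}


def checkpoint_key(run_id):
    # One fixed key per run (assumed layout), so each write replaces the last.
    return f"runs/{run_id}/checkpoint.json.gz"


def write_checkpoint(run_id, run_log):
    # Serialize, compress, then overwrite the previous copy under the same key.
    payload = json.dumps(run_log).encode("utf-8")
    object_store[checkpoint_key(run_id)] = gzip.compress(payload)


def read_checkpoint(run_id):
    # A run with no checkpoint yet starts from an empty log.
    blob = object_store.get(checkpoint_key(run_id))
    if blob is None:
        return {}
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

Because the key depends only on the run ID, writing a second checkpoint for the same run leaves a single stored object holding the newer log.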
Resume is not a special path. Each time a worker starts a run, it walks the flow graph from the trigger and asks at every step: is this step's output already in the log?
If it is (the step is recorded as SUCCEEDED or PAUSED), the engine returns the saved output and moves on. If it is not, the engine runs the step.

On the first run the log is empty, so every step runs. After a resume the log is full up to the interruption, so the engine skips through all of it and runs only what came next.
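The replay walk described above might look roughly like this. It is a sketch under simplifying assumptions: the flow graph is reduced to an ordered list of `(name, function)` pairs, the run log is a plain dictionary, and `run_flow` and `REPLAYABLE` are hypothetical names rather than the engine's real API.

```python
# Statuses whose saved output is reused instead of re-running the step.
REPLAYABLE = {"SUCCEEDED", "PAUSED"}


def run_flow(steps, run_log):
    """Walk the steps in order; return the names of steps actually executed.

    steps:   ordered list of (name, fn); fn receives the outputs so far.
    run_log: dict of name -> {"status": ..., "output": ...}, updated in place.
    """
    executed = []
    outputs = {}
    for name, fn in steps:
        entry = run_log.get(name)
        if entry is not None and entry["status"] in REPLAYABLE:
            # Output is already in the log: reuse it and move on.
            outputs[name] = entry["output"]
            continue
        # Not in the log: run the step and record its result.
        result = fn(outputs)
        run_log[name] = {"status": "SUCCEEDED", "output": result}
        outputs[name] = result
        executed.append(name)
    return executed
```

The same function handles both cases: with an empty log every step executes, and with a partially filled log only the steps after the last recorded one execute.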
The most a crash can lose is the single step that was running when the worker died. That step runs again from the last checkpoint, and everything before it is skipped.
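To make the crash bound concrete, here is a small simulation. It assumes a checkpoint is recorded after each completed step, which is what the one-step bound implies; the `WorkerCrash` exception, the step functions, and the dictionary checkpoint are hypothetical stand-ins.

```python
class WorkerCrash(Exception):
    """Simulates the worker process dying in the middle of a step."""


calls = {"charge": 0, "email": 0}  # how many times each step body ran
crash_once = {"armed": True}       # the email step crashes on first attempt


def charge(_outputs):
    calls["charge"] += 1
    return "charged"


def email(_outputs):
    calls["email"] += 1
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise WorkerCrash("worker died while sending email")
    return "sent"


STEPS = [("charge", charge), ("email", email)]


def run(steps, checkpoint):
    for name, fn in steps:
        if name in checkpoint:
            continue  # finished before the crash: skipped on replay
        # Recorded only if fn returns, so a crashed step leaves no entry.
        checkpoint[name] = fn(checkpoint)
    return checkpoint


checkpoint = {}
try:
    run(STEPS, checkpoint)   # first worker dies inside "email"
except WorkerCrash:
    pass
run(STEPS, checkpoint)       # fresh worker resumes from the checkpoint
```

After both runs, `charge` has executed once and `email` twice: only the step that was in flight when the worker died runs again.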
Every interruption resolves through the same replay path. Only the event that triggers the resume differs.