.agents/features/flow-runs.md
Flow Runs records every execution of a flow, tracking its full lifecycle from queuing through completion or failure. It stores compressed execution logs for step-by-step inspection, supports pause-and-resume for delay and webhook-based waits, provides retry strategies for recovering from failures, and emits WebSocket events and application events to notify the frontend and downstream systems in real time.
- packages/server/api/src/app/flows/flow-run/ - controller, service, entity
- packages/shared/src/lib/automation/flow-run/flow-run.ts - FlowRun type
- packages/shared/src/lib/automation/flow-run/dto/ - list, retry, bulk request types
- packages/shared/src/lib/automation/flow-run/execution/ - StepOutput, FlowExecution, ExecutionOutput
- packages/shared/src/lib/automation/flow-run/log-serializer.ts - zstd compress/decompress helpers
- packages/web/src/features/flow-runs/api/flow-runs-api.ts - flowRunsApi
- packages/web/src/features/flow-runs/hooks/flow-run-hooks.ts - flowRunQueries, flowRunMutations
- packages/web/src/features/flow-runs/components/runs-table/ - RunsTable, columns.tsx, retry/cancel/archive dialogs, failed-step-dialog.tsx
- packages/web/src/app/builder/flow-canvas/widgets/run-info-widget.tsx - builder widget that jumps to the failed step on the canvas
- packages/web/src/app/builder/state/run-state.ts - tracks the focused/failed step for the builder
- packages/web/src/app/builder/state/canvas-state.ts - tracks userManuallySelectedStepDuringRun and exposes the resumeLiveFollow action for live-follow control
- packages/server/api/src/app/ee/alerts/alerts-service.ts - sends the failure email via the EE Alerts feature (see .agents/features/alerts.md)
- packages/web/src/features/flow-runs/components/step-status-icon.tsx - per-step status badge
- packages/web/src/app/routes/runs/index.tsx - runs list page
- packages/web/src/app/routes/runs/id/index.tsx - individual run detail page
- packages/web/src/app/builder/run-details/ - step input/output inspector inside the builder
- packages/web/src/app/builder/run-list/ - recent runs sidebar in the builder

- EXECUTION_DATA_RETENTION_DAYS.
- POST /v1/admin/platforms/runs/retry) is Cloud-only.

- FlowExecutorContext after execution.
- waitpoint table representing one paused step on a run. Fields: type (DELAY|WEBHOOK), version (V0|V1; V1 is the current API), status (PENDING|COMPLETED), stepName, resumeDateTime, responseToSend, resumePayload, workerHandlerId, httpRequestId. Unique on (flow_run_id, step_name).
- flow_run distinguishing DELAY vs WEBHOOK pauses. Deprecated 2026-04-13 (0.82.0); still read for in-flight V0 runs, scheduled for removal. V1 runs store this information on the waitpoint row instead.
- WAITPOINT | RETRY. Set on every ExecutionType.RESUME engine operation and threaded through ExecuteFlowJobData. Determines whether the engine restores FAILED steps from the journal: waitpoint resumes preserve them, retry resumes drop them so the failed step re-executes. The two resume paths share the same ExecutionType, so this field is the only discriminator.
- parentRunId, created when a flow calls another flow as a step.
- { name, displayName, message? } for the step that caused failure. Enables filtered retries, the runs-table error-message search, the failure email's "Reason" line, and the builder's jump-to-failed-step affordance. message is truncated via truncateString from @activepieces/shared before being persisted, and the engine populates failedStep for every status in FAILED_STATES (FAILED, TIMEOUT, INTERNAL_ERROR, QUOTA_EXCEEDED, MEMORY_LIMIT_EXCEEDED), not just FAILED.

FlowRun: id, projectId, flowId, flowVersionId, environment (PRODUCTION/TESTING), logsFileId (nullable FK to File), parentRunId (nullable, self-reference for subflows), failParentOnFailure (default true), status, tags[] (nullable), startTime, triggeredBy (nullable FK to User), finishTime, pauseMetadata (JSONB), failedStep (JSONB: { name, displayName, message? }), archivedAt (soft delete), stepNameToTest (nullable), stepsCount.
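As a rough illustration of the waitpoint row and its (flow_run_id, step_name) uniqueness rule, the sketch below models the table as an in-memory map. The `WaitpointStore` class and its method names are hypothetical and are not taken from the codebase; only the field names and enum values come from this document. `complete` follows the early-callback behavior described in the pause/resume notes (complete the PENDING row, or insert a COMPLETED row if none exists yet). The sketch does not model the pessimistic write lock, and how `create` treats an already-COMPLETED row is an assumption.

```typescript
// Field names and enum values follow this document; the store itself is hypothetical.
type WaitpointType = 'DELAY' | 'WEBHOOK'
type WaitpointStatus = 'PENDING' | 'COMPLETED'

type Waitpoint = {
  flowRunId: string
  stepName: string
  type: WaitpointType
  version: 'V0' | 'V1'
  status: WaitpointStatus
  resumeDateTime?: string | undefined
  resumePayload?: unknown
}

class WaitpointStore {
  private rows = new Map<string, Waitpoint>()

  // Mirrors the unique constraint on (flow_run_id, step_name).
  private key(flowRunId: string, stepName: string): string {
    return JSON.stringify([flowRunId, stepName])
  }

  // Engine side: register a PENDING waitpoint for a paused step.
  // Assumption: if an early callback already inserted a COMPLETED row, keep it.
  create(flowRunId: string, stepName: string, type: WaitpointType, resumeDateTime?: string): Waitpoint {
    const k = this.key(flowRunId, stepName)
    const existing = this.rows.get(k)
    if (existing !== undefined) {
      return existing
    }
    const row: Waitpoint = { flowRunId, stepName, type, version: 'V1', status: 'PENDING', resumeDateTime }
    this.rows.set(k, row)
    return row
  }

  // Callback side: complete the PENDING row, or insert a COMPLETED row
  // when the callback arrives before the engine has created the waitpoint.
  complete(flowRunId: string, stepName: string, type: WaitpointType, resumePayload: unknown): Waitpoint {
    const k = this.key(flowRunId, stepName)
    const existing = this.rows.get(k)
    const row: Waitpoint = existing !== undefined
      ? { ...existing, status: 'COMPLETED', resumePayload }
      : { flowRunId, stepName, type, version: 'V1', status: 'COMPLETED', resumePayload }
    this.rows.set(k, row)
    return row
  }
}
```

Keying the map on the (run, step) pair means a second `create` for the same paused step returns the existing row instead of adding a duplicate, which is the property the unique constraint provides in the database.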
- Non-terminal: QUEUED, RUNNING, PAUSED
- Terminal: SUCCEEDED, FAILED, TIMEOUT, CANCELED, QUOTA_EXCEEDED, MEMORY_LIMIT_EXCEEDED, INTERNAL_ERROR, LOG_SIZE_EXCEEDED
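The status groups above can be written down as a small TypeScript sketch. The status names and the FAILED_STATES membership come from this document; the constant names `NON_TERMINAL` and `TERMINAL` and the helpers `isTerminal` and `isFailedState` are illustrative names, not necessarily the ones used in @activepieces/shared.

```typescript
const NON_TERMINAL = ['QUEUED', 'RUNNING', 'PAUSED'] as const
const TERMINAL = [
  'SUCCEEDED',
  'FAILED',
  'TIMEOUT',
  'CANCELED',
  'QUOTA_EXCEEDED',
  'MEMORY_LIMIT_EXCEEDED',
  'INTERNAL_ERROR',
  'LOG_SIZE_EXCEEDED',
] as const

type FlowRunStatus = typeof NON_TERMINAL[number] | typeof TERMINAL[number]

// Statuses for which the engine populates failedStep, per this document.
const FAILED_STATES: readonly FlowRunStatus[] = [
  'FAILED',
  'TIMEOUT',
  'INTERNAL_ERROR',
  'QUOTA_EXCEEDED',
  'MEMORY_LIMIT_EXCEEDED',
]

// True once a run can no longer change status.
function isTerminal(status: FlowRunStatus): boolean {
  return (TERMINAL as readonly string[]).indexOf(status) !== -1
}

// True when the run is expected to carry a failedStep.
function isFailedState(status: FlowRunStatus): boolean {
  return FAILED_STATES.indexOf(status) !== -1
}
```

Note that SUCCEEDED, CANCELED, and LOG_SIZE_EXCEEDED are terminal but are not listed in FAILED_STATES, so the two helpers answer different questions.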
- GET / - List runs (cursor pagination, filters: projectId, flowId, status, tags, createdAfter/Before, failedStepName, failedStepMessage). failedStepMessage is a case-insensitive ILIKE '%…%' against failedStep->>'message'. The status and failedStepMessage filters are independent: combining a non-failure status with a failedStepMessage simply returns empty (no implicit narrowing).
- GET /:id - Get single run with populated data
- POST /:id/retry - Retry single run (strategy: FROM_FAILED_STEP or ON_LATEST_VERSION)
- POST /retry - Bulk retry with filters
- POST /cancel - Bulk cancel paused/queued runs
- POST /archive - Bulk soft delete (set archivedAt)
- POST /v1/waitpoints - Engine-only: create a waitpoint (PENDING) for a paused step
- ALL /:id/waitpoints/:waitpointId - Resume a paused run via waitpoint (V1, async)
- ALL /:id/waitpoints/:waitpointId/sync - Resume and return the flow's response synchronously (V1)
- ALL /:id/requests/:requestId - V0 legacy resume route (pauseMetadata-based)
- ALL /:id/requests/:requestId/sync - V0 legacy sync resume

resumeReason: RETRY so the engine drops the FAILED step from the restored journal; without this, the failed step would be treated as already-complete and the retry would be a no-op.

- ctx.run.createWaitpoint({ type, ... }) + ctx.run.waitForWaitpoint(id). Engine POSTs /v1/waitpoints; server inserts a PENDING row keyed on (flow_run_id, step_name).
- DELAY waitpoint: server upserts a SystemJobName.RESUME_DELAY_WAITPOINT BullMQ job scheduled at resumeDateTime. When it fires, resumeService.resumeFromWaitpoint enqueues the resume.
- WEBHOOK waitpoint: resume signal arrives as an HTTP call on /:id/waitpoints/:waitpointId[/sync]. Optional responseToSend is replied immediately to the original webhook trigger.
- waitpoint-service.complete() takes a pessimistic write lock on the PENDING row. If no row yet, it inserts a COMPLETED row with the resumePayload. When the flow then transitions to PAUSED, flow-runs-queue.ts sees the COMPLETED waitpoint and enqueues the resume immediately. Prevents dropped early callbacks.
- FlowExecutorContext → re-run the paused step with ExecutionType.RESUME and ctx.resumePayload = waitpoint.resumePayload. When rebuilding flowContext in flow.operation.ts#getFlowExecutionState, steps in SUCCEEDED / PAUSED are always restored; FAILED steps are restored iff resumeReason === WAITPOINT. Dropping a FAILED step kept alive by continueOnFailure would re-execute it from BEGIN, re-firing its waitpoint (e.g. re-invoking a subflow) and letting the global constants.resumePayload pollute the new output. The retry path needs the opposite behavior, hence the discriminator.
- AP_PAUSED_FLOW_TIMEOUT_DAYS caps DELAY resumeDateTime; engine throws PausedFlowTimeoutError beyond that.
- pauseMetadata on flow_run + ctx.run.pause({ pauseMetadata }) + ctx.generateResumeUrl() + /requests/:requestId[/sync] routes. Still functional for in-flight runs; scheduled for removal.
- refill-paused-jobs migration re-queues legacy paused runs. Waitpoint DELAYs are BullMQ delayed jobs and survive restarts natively.

- onStart() → emit FLOW_RUN_STARTED application event
- onResume() → emit FLOW_RUN_RESUMED
- onFinish() → emit FLOW_RUN_FINISHED (terminal states only), notify via WebSocket

flowRunsApi.subscribeToTestFlowOrManualRun() uses Socket.IO to start a test run and stream progress updates via WebsocketClientEvent.UPDATE_RUN_PROGRESS.
The builder's run-list sidebar polls for recent runs and the run-details panel renders step-by-step input/output from the populated run's execution logs. flowRunMutations.useRetryRun handles the FLOW_RUN_RETRY_OUTSIDE_RETENTION error code with a user-facing toast showing the retention window.
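The journal-restoration rule from the pause/resume notes (SUCCEEDED and PAUSED steps are always restored, FAILED steps only when resumeReason is WAITPOINT) can be sketched as a filter. This is a simplified illustration, not the code in flow.operation.ts: the `JournalStep` type and the `stepsToRestore` function are hypothetical names, and real step outputs carry far more than a name and a status.

```typescript
type ResumeReason = 'WAITPOINT' | 'RETRY'
type StepStatus = 'SUCCEEDED' | 'PAUSED' | 'FAILED'

// Hypothetical, trimmed-down journal entry.
type JournalStep = { name: string; status: StepStatus }

// Decide which journal entries are carried into the resumed execution.
function stepsToRestore(journal: JournalStep[], resumeReason: ResumeReason): JournalStep[] {
  return journal.filter((step) => {
    // SUCCEEDED and PAUSED steps are always restored.
    if (step.status !== 'FAILED') {
      return true
    }
    // FAILED steps survive a waitpoint resume but are dropped on retry,
    // so the retried run executes them again.
    return resumeReason === 'WAITPOINT'
  })
}

const journal: JournalStep[] = [
  { name: 'step_1', status: 'SUCCEEDED' },
  { name: 'step_2', status: 'FAILED' },
  { name: 'step_3', status: 'PAUSED' },
]

const restoredOnWaitpoint = stepsToRestore(journal, 'WAITPOINT').map((s) => s.name)
const restoredOnRetry = stepsToRestore(journal, 'RETRY').map((s) => s.name)
```

With this input, a waitpoint resume keeps all three steps, while a retry keeps only step_1 and step_3, which is why the failed step re-executes on retry.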
The runs table surfaces a Status multi-select and an "Error message" text input (failedStepMessage URL param). Filters are independent AND'd dimensions — the backend applies them with no implicit narrowing. Combining a non-failure status with failedStepMessage returns empty (only FAILED_STATES runs carry failedStep.message); the conflict is left for the user to resolve via the visible chips.
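The "independent, AND'd" filter semantics can be illustrated with an in-memory predicate. This is only a sketch: the real filtering happens in SQL on the server, `RunSummary`, `RunFilters`, and `matchesFilters` are hypothetical names, and the substring check treats the search text literally (whether the server escapes `%` and `_` in the ILIKE pattern is not stated in this document).

```typescript
type FailedStep = { name: string; displayName: string; message?: string }
type RunSummary = { id: string; status: string; failedStep?: FailedStep }
type RunFilters = { status?: string[]; failedStepMessage?: string }

// Each filter is checked on its own; a run must pass all of them.
function matchesFilters(run: RunSummary, filters: RunFilters): boolean {
  if (filters.status !== undefined && filters.status.indexOf(run.status) === -1) {
    return false
  }
  if (filters.failedStepMessage !== undefined) {
    const message = run.failedStep?.message
    // Runs without a captured message can never match the message filter.
    if (message === undefined) {
      return false
    }
    // Case-insensitive "contains", approximating ILIKE '%…%'.
    const needle = filters.failedStepMessage.toLowerCase()
    if (message.toLowerCase().indexOf(needle) === -1) {
      return false
    }
  }
  return true
}

const failedRun: RunSummary = {
  id: 'run_1',
  status: 'FAILED',
  failedStep: { name: 'step_2', displayName: 'Send HTTP', message: 'Connection Timeout' },
}
const succeededRun: RunSummary = { id: 'run_2', status: 'SUCCEEDED' }
```

Because the two checks never consult each other, `{ status: ['SUCCEEDED'], failedStepMessage: 'timeout' }` matches neither run: the failed run fails the status check and the succeeded run has no message to search.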
- FailedStepDialog (full error + "Go to run" footer). Legacy runs without a captured message bypass the dialog and navigate straight to the run page.
- resumeLiveFollow, which clears the userManuallySelectedStepDuringRun flag and snaps loop indexes to their latest iteration so the canvas resumes following the engine live.
- goToFailedStep in run-state.
- userManuallySelectedStepDuringRun in canvas-state: selectStepByName sets it whenever the user picks a different step mid-run (fromAutoFocus is passed when the change came from the auto-follow effect, not a user click), and setRun resets it when a new run id arrives.
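The live-follow flag behavior described above can be sketched as three pure state transitions. The function names selectStepByName, setRun, and resumeLiveFollow and the flag name come from this document; the `CanvasRunState` shape, the positional `fromAutoFocus` argument, and treating "mid-run" as "a run id is set" are assumptions made for illustration. The real store also snaps loop indexes on resumeLiveFollow, which is omitted here.

```typescript
// Hypothetical, minimal slice of the canvas state.
type CanvasRunState = {
  runId: string | null
  selectedStep: string | null
  userManuallySelectedStepDuringRun: boolean
}

const initialCanvasState: CanvasRunState = {
  runId: null,
  selectedStep: null,
  userManuallySelectedStepDuringRun: false,
}

// A new run id resets the manual-selection flag.
function setRun(state: CanvasRunState, runId: string): CanvasRunState {
  if (state.runId === runId) {
    return state
  }
  return { ...state, runId, userManuallySelectedStepDuringRun: false }
}

// User clicks set the flag; auto-follow selections (fromAutoFocus) do not.
function selectStepByName(state: CanvasRunState, stepName: string, fromAutoFocus = false): CanvasRunState {
  const midRun = state.runId !== null // assumption: a set run id means a run is in view
  const pickedDifferentStep = stepName !== state.selectedStep
  const manual = !fromAutoFocus && midRun && pickedDifferentStep
  return {
    ...state,
    selectedStep: stepName,
    userManuallySelectedStepDuringRun: state.userManuallySelectedStepDuringRun || manual,
  }
}

// Clearing the flag lets the canvas follow the engine again.
function resumeLiveFollow(state: CanvasRunState): CanvasRunState {
  return { ...state, userManuallySelectedStepDuringRun: false }
}
```

Keeping the transitions pure makes the rule easy to check: only a user-initiated change to a different step during a run turns the flag on, and only a new run id or resumeLiveFollow turns it off.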