This guide shows how to run Promptfoo evaluations from an n8n workflow. You'll need the following:
| What | Why |
|---|---|
| Self‑hosted n8n ≥ v1 (Docker or bare‑metal) | Gives access to the “Execute Command” node. |
| Promptfoo CLI available in the container/host | Needed to run `promptfoo eval`. |
| (Optional) LLM provider API keys set as environment variables or n8n credentials | Example: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, … |
| (Optional) Slack / email / GitHub nodes in the same workflow | For notifications or comments once the eval finishes. |
The easiest way is to bake Promptfoo into your n8n image so every workflow run already has the CLI:
```dockerfile
# Dockerfile
# Pin a fixed tag in production instead of :latest
FROM n8nio/n8n:latest

# Switch to root to install packages system-wide
USER root
RUN npm install -g promptfoo

# Drop back to the unprivileged user n8n runs as
USER node
```
Update docker‑compose.yml:
```yaml
services:
  n8n:
    build: .
    env_file: .env # where your OPENAI_API_KEY lives
    volumes:
      - ./data:/data # prompts & configs live here
```
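Then rebuild and restart, and confirm the CLI is reachable inside the container (assuming Docker Compose v2):

```bash
docker compose up -d --build
docker compose exec n8n promptfoo --version # should print the installed version
```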
If you prefer not to rebuild the image, you can install Promptfoo on the fly inside the Execute Command node, but that adds 10‑15 s to every execution.
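For example, `npx` can fetch the CLI on demand (a sketch; the first run downloads the package, which is where the extra latency comes from):

```bash
# Runs inside the Execute Command node without rebuilding the image,
# at the cost of extra startup time on each execution.
npx promptfoo@latest eval -c /data/promptfooconfig.yaml --output /tmp/pf-results.json
```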
Below is the minimal pattern most teams start with:
| # | Node | Purpose |
|---|---|---|
| 1 | Trigger (Cron or Webhook) | Decide when to evaluate (nightly, on Git push webhook …). |
| 2 | Execute Command | Runs Promptfoo and emits raw stdout / stderr. |
| 3 | Code / Set node | Parses the resulting JSON, extracts pass/fail counts & share‑URL. |
| 4 | IF node | Branches on “failures > 0”. |
| 5 | Slack / Email / GitHub | Sends alert or PR comment when the gate fails. |
In the Execute Command node (node 2), run:

```bash
promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --prompts "/data/prompts/**/*.json" \
  --output /tmp/pf-results.json \
  --share --fail-on-error

cat /tmp/pf-results.json
```
Set the node's working directory to `/data` (mounted via the Docker volume above) and enable Execute Once so the command runs once per trigger. The node writes a machine‑readable results file and prints it to stdout, so the next node can simply `JSON.parse($json["stdout"])`.
:::info
The Execute Command node that we rely on is only available in self‑hosted n8n. n8n Cloud does not expose it yet.
:::
In the Code node (node 3), parse the results:

```js
// Input: raw JSON string printed by the previous node
const output = JSON.parse(items[0].json.stdout);
const { successes, failures } = output.results.stats;

items[0].json.passRate = successes / (successes + failures);
items[0].json.failures = failures;
items[0].json.shareUrl = output.shareableUrl;
return items;
```
An IF node can then route execution: set its condition to `{{ $json.failures }}` larger than 0 so failing runs go to the alert branch (Slack / email / GitHub) and clean runs end the workflow.
If your goal is to test the prompt inside an n8n AI Agent / OpenAI node (not just run Promptfoo from a workflow), treat the n8n node like any other app contract: extract the prompt into a file, map the node's inputs to `tests.vars`, and assert on the output shape. This works well when you want to regression-test an agent before wiring it into a larger workflow.
If your agent is supposed to emit structured data for a Set, Code, Switch, or HTTP Request node, validate the payload directly:
```yaml
prompts:
  - file://./prompts/n8n-support-router.txt

providers:
  - openai:gpt-5-mini

tests:
  - vars:
      customer_message: 'Customer wants to cancel order #4815 and asks for a refund'
    assert:
      - type: contains-json
        value:
          type: object
          required: [route, priority, reply]
          properties:
            route:
              type: string
              enum: [billing, support, sales]
            priority:
              type: string
              enum: [low, medium, high]
            reply:
              type: string
```
Use `contains-json` when the model may wrap JSON in prose or a markdown code block. If your node must return only JSON, use `is-json` instead.
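For instance, `contains-json` still passes on a reply like this illustrative output, while `is-json` would reject it because of the surrounding prose:

```text
Sure, here's the routing decision:

{"route": "billing", "priority": "high", "reply": "I've flagged order #4815 for a refund."}
```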
If your n8n setup uses an OpenAI-compatible agent that should call tools before continuing, validate that Promptfoo sees a real tool call and that it matches your schema:
```yaml
prompts:
  - file://./prompts/n8n-calendar-agent.txt

providers:
  - id: openai:gpt-5-mini
    config:
      tools: file://./tools/calendar-tools.yaml

tests:
  - vars:
      user_request: "Move tomorrow's standup to 3pm and notify the team"
    assert:
      - type: finish-reason
        value: tool_calls
      - type: is-valid-openai-tools-call
```
That pattern is especially useful when your n8n workflow branches on whether the LLM produced a tool invocation versus a final answer.
See also:

- `/docs/configuration/tools` for defining tool schemas
- `/docs/guides/evaluate-json` for JSON and schema assertions
- `examples/openai-tools-call` for a concrete OpenAI tool-calling config
- `examples/eval-tool-use` for finish-reason and tool-use checks across providers

To compare several models or configs in one run, make the first Execute Command node loop over an array of model IDs or config files and push each run as a separate item; downstream nodes will automatically fan out and handle each result independently (see the sketch below).
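A minimal Code-node sketch of that fan-out, assuming hypothetical config files under `/data`:

```js
// Code node placed before Execute Command: emit one item per eval config
// so n8n runs every downstream node once per item.
const configs = [
  '/data/promptfooconfig-gpt.yaml', // hypothetical config files
  '/data/promptfooconfig-claude.yaml',
];

return configs.map((path) => ({ json: { configPath: path } }));
```

The Execute Command node can then reference the path in its command field, e.g. `promptfoo eval -c {{ $json.configPath }} …`.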
Mount your prompts directory and config file into the container at `/data`. When you commit new prompts to Git, your CI/CD system can call the n8n REST API or a Webhook trigger to re‑evaluate immediately.
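For example, a CI step might hit the workflow's Webhook trigger; the URL and payload below are hypothetical, so copy the production URL from your own Webhook node:

```bash
curl -X POST "https://n8n.example.com/webhook/promptfoo-eval" \
  -H "Content-Type: application/json" \
  -d '{"ref": "main"}'
```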
If you run n8n headless (for example via `n8n start --tunnel`), you can call this workflow from CI pipelines (GitHub Actions, GitLab, …) with the `n8n execute` CLI command, or hit its webhook and check the HTTP response code; returning exit 1 from the Execute Command node will propagate the failure.
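A sketch of such a CI step, assuming a workflow with ID 42:

```bash
# Runs the workflow synchronously; a non-zero exit from the
# Execute Command node makes this command (and the CI job) fail.
n8n execute --id 42
```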
A few operational tips:

- Promptfoo caches provider responses; set `PROMPTFOO_CACHE_PATH` and mount that directory to persist the cache across runs.
- Wrap `promptfoo eval` with `timeout --signal=SIGKILL 15m …` (Linux) if you need hard execution limits.
- Route the `stderr` field of Execute Command to a dedicated log channel so you don’t miss stack traces.

| Symptom | Likely cause / fix |
|---|---|
| Execute Command node not available | You’re on n8n Cloud; switch to self‑hosted. |
| `promptfoo: command not found` | Promptfoo not installed inside the container. Rebuild your Docker image or add an install step. |
| Run fails with `ENOENT` on config paths | Make sure the prompts/config volume is mounted at the same path you reference in the command. |
| Large evals time out | Increase the node’s “Timeout (s)” setting, or chunk your test cases and iterate inside the workflow. |
Happy automating!