site/docs/integrations/langfuse.md
Langfuse is an open-source LLM engineering platform that includes collaborative prompt management, tracing, and evaluation capabilities.
Install the langfuse SDK:
npm install langfuse
Set the required environment variables:
export LANGFUSE_PUBLIC_KEY="your-public-key"
export LANGFUSE_SECRET_KEY="your-secret-key"
export LANGFUSE_HOST="https://cloud.langfuse.com" # or your self-hosted URL
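A missing key tends to surface only when the first prompt is fetched, so a quick pre-flight check can help. The following is a minimal sketch, not part of promptfoo or the Langfuse SDK: the helper name `missingLangfuseVars` is hypothetical, and it treats all three variables as required, as the text above does.

```typescript
// Variable names taken from the environment variables listed above.
const REQUIRED_LANGFUSE_VARS = [
  "LANGFUSE_PUBLIC_KEY",
  "LANGFUSE_SECRET_KEY",
  "LANGFUSE_HOST",
];

// Hypothetical helper: returns the names of variables that are unset or empty.
// The environment is passed in as a plain object so the function stays testable.
function missingLangfuseVars(
  env: Record<string, string | undefined>
): string[] {
  return REQUIRED_LANGFUSE_VARS.filter((name) => !env[name]);
}
```

In Node.js this could be called as `missingLangfuseVars(process.env)` before starting an evaluation, failing early if the returned list is not empty.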
Use the langfuse:// prefix in your promptfoo configuration to reference prompts managed in Langfuse.
You can reference prompts by version or label using two different syntaxes:
# By label
langfuse://prompt-name@label:type
# Examples
langfuse://my-prompt@production # Text prompt with production label
langfuse://chat-prompt@staging:chat # Chat prompt with staging label
# By version or label (auto-detected)
langfuse://prompt-name:version-or-label:type
The parser automatically detects:
- Numeric values → treated as versions (e.g., 1, 2, 3)
- String values → treated as labels (e.g., production, staging, latest)

Where:

- prompt-name: The name of your prompt in Langfuse
- version: Specific version number (e.g., 1, 2, 3)
- label: Label assigned to a prompt version (e.g., production, staging, latest)
- type: Either text or chat (defaults to text if omitted)

prompts:
  # Explicit @ syntax for labels (recommended)
  - 'langfuse://my-prompt@production' # Production label, text prompt
  - 'langfuse://chat-prompt@staging:chat' # Staging label, chat prompt
  - 'langfuse://my-prompt@latest:text' # Latest label, text prompt
  # Auto-detection with : syntax
  - 'langfuse://my-prompt:production' # String → treated as label
  - 'langfuse://chat-prompt:staging:chat' # String → treated as label
  - 'langfuse://my-prompt:latest' # "latest" → treated as label
  # Version references (numeric values only)
  - 'langfuse://my-prompt:3:text' # Numeric → version 3
  - 'langfuse://chat-prompt:2:chat' # Numeric → version 2
providers:
  - openai:gpt-5-mini
tests:
  - vars:
      user_query: 'What is the capital of France?'
      context: 'European geography'
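To make the two reference syntaxes concrete, here is a rough sketch of how a `langfuse://` reference might be broken into its parts. It is an illustration under stated assumptions, not promptfoo's actual parser: the `LangfuseRef` type and `parseLangfuseRef` function are hypothetical names, and the sketch assumes a trailing `:text` or `:chat` is always the type (so a label literally named `chat` would be ambiguous in the `:` form).

```typescript
// Hypothetical types and names; a sketch, not promptfoo's implementation.
type PromptType = "text" | "chat";

interface LangfuseRef {
  name: string;
  version?: number;
  label?: string;
  type: PromptType;
}

function parseLangfuseRef(ref: string): LangfuseRef {
  const prefix = "langfuse://";
  if (ref.indexOf(prefix) !== 0) {
    throw new Error(`Not a langfuse:// reference: ${ref}`);
  }
  let rest = ref.slice(prefix.length);

  // Peel off an optional trailing :text or :chat; the type defaults to text.
  let type: PromptType = "text";
  const typeMatch = rest.match(/:(text|chat)$/);
  if (typeMatch) {
    type = typeMatch[1] as PromptType;
    rest = rest.slice(0, -typeMatch[0].length);
  }

  // Explicit label syntax: everything after the last @ is the label.
  const at = rest.lastIndexOf("@");
  if (at > 0) {
    return { name: rest.slice(0, at), label: rest.slice(at + 1), type };
  }

  // Auto-detect syntax: numeric selector is a version, anything else a label.
  const colon = rest.lastIndexOf(":");
  if (colon === -1) {
    throw new Error(`Missing version or label: ${ref}`);
  }
  const name = rest.slice(0, colon);
  const selector = rest.slice(colon + 1);
  if (/^\d+$/.test(selector)) {
    return { name, version: Number(selector), type };
  }
  return { name, label: selector, type };
}
```

Under these assumptions, `langfuse://chat-prompt:2:chat` resolves to version 2 of a chat prompt, while `langfuse://my-prompt:latest` resolves to the `latest` label of a text prompt.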
Variables from your promptfoo test cases are automatically passed to Langfuse prompts. If your Langfuse prompt contains variables like {{user_query}} or {{context}}, they will be replaced with the corresponding values from your test cases.
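The substitution described above can be pictured with a small sketch. This is not the Langfuse SDK's implementation; `fillTemplate` is a hypothetical helper, and leaving unknown placeholders untouched is a choice made for this example only.

```typescript
// Hypothetical helper illustrating {{variable}} substitution.
function fillTemplate(
  template: string,
  vars: Record<string, string>
): string {
  // Match {{name}} with optional whitespace inside the braces.
  return template.replace(
    /\{\{\s*([\w-]+)\s*\}\}/g,
    (placeholder: string, name: string) => {
      const value = vars[name];
      // Assumption of this sketch: unknown variables are left as written.
      return value !== undefined ? value : placeholder;
    }
  );
}

// Values mirror the test case in the configuration above.
const filled = fillTemplate("Context: {{context}}. Question: {{user_query}}", {
  user_query: "What is the capital of France?",
  context: "European geography",
});
```

Here `filled` would read `Context: European geography. Question: What is the capital of France?`.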
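Using labels is recommended for production scenarios, because a label can be reassigned to a different prompt version in Langfuse without any change to your promptfoo configuration.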
Common label patterns:
- production - Current production version
- staging - Testing before production
- latest - Most recently created version
- experiment-a, experiment-b - A/B testing
- tenant-xyz - Multi-tenant scenarios

While prompt IDs containing @ symbols are supported, we recommend avoiding them for clarity. The parser looks for the last @ followed by a label pattern to distinguish between the prompt ID and label.

If you need to use @ in your label names, consider using a different naming convention.
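The "last @" rule can be illustrated with a short sketch. The function name `splitAtLastAt` is hypothetical, and the source does not spell out what counts as a "label pattern", so this example simply assumes any non-empty text after the last @ is the label.

```typescript
// Hypothetical helper: splits "prompt-id@label" on the LAST @ only,
// so earlier @ characters remain part of the prompt ID.
function splitAtLastAt(idAndLabel: string): { promptId: string; label?: string } {
  const at = idAndLabel.lastIndexOf("@");
  // No @, a leading @, or a trailing @ means there is no usable label.
  if (at <= 0 || at === idAndLabel.length - 1) {
    return { promptId: idAndLabel };
  }
  return {
    promptId: idAndLabel.slice(0, at),
    label: idAndLabel.slice(at + 1),
  };
}
```

With this rule, `team@acme-prompt@production` would yield the prompt ID `team@acme-prompt` and the label `production`, which also shows why an @ inside a label name cannot be told apart from the separator.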