# eval-conversation-relevance
You can run this example with:

```bash
npx promptfoo@latest init --example eval-conversation-relevance
cd eval-conversation-relevance
```
This example demonstrates how to use the `conversation-relevance` assertion to evaluate whether chatbot responses remain relevant throughout a conversation.

The conversation relevance metric evaluates whether each response in a conversation is relevant to the context and previous messages. It uses a sliding window approach to analyze conversation segments.
Install dependencies:

```bash
npm install -g promptfoo
```

Set your OpenAI API key:

```bash
export OPENAI_API_KEY=your-api-key
```

Run the evaluation:

```bash
promptfoo eval
```
## Test Cases

1. Tests basic relevance for a single query-response pair about travel to Paris.
2. Evaluates a complete conversation about travel planning where all responses should be relevant.
3. Demonstrates detection of an off-topic response (a stock market comment) in the middle of a conversation about wedding planning.
4. Shows a high-quality technical support conversation with a high relevance threshold (0.95).
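A test entry using this assertion might look like the following sketch. Only `threshold` and `config.windowSize` are documented in this README; the surrounding test shape and the example variable are illustrative assumptions following promptfoo's standard config layout:

```yaml
# Hypothetical test entry (sketch); the query var is illustrative.
tests:
  - vars:
      query: What should I see in Paris?
    assert:
      - type: conversation-relevance
        threshold: 0.8
        config:
          windowSize: 5
```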
## Configuration Options

- `threshold`: Minimum score required to pass (0-1)
- `config.windowSize`: Number of messages in each sliding window (default: 5)
- `provider`: Override the default grading model

## How It Works

The metric evaluates each message position using a sliding window approach. For example, a 5-message conversation with a window size of 3 produces overlapping windows covering messages 1-3, 2-4, and 3-5.
Each window evaluates whether the **last** assistant response in that window is relevant. The final score is:

```
Score = Number of Relevant Windows / Total Number of Windows
```
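The windowing and scoring described above can be sketched as follows. This is a minimal illustration, not promptfoo's implementation; the relevance judgment itself (normally done by a grading model) is stubbed out with precomputed labels:

```python
def sliding_windows(messages, window_size=5):
    """Yield overlapping windows of up to `window_size` consecutive messages."""
    if len(messages) <= window_size:
        yield messages
        return
    for i in range(len(messages) - window_size + 1):
        yield messages[i:i + window_size]

def score_conversation(window_relevance):
    """Final score = relevant windows / total windows."""
    if not window_relevance:
        return 0.0
    return sum(window_relevance) / len(window_relevance)

# A 5-message conversation with window size 3 yields 3 windows:
# messages 1-3, 2-4, and 3-5.
messages = ["m1", "m2", "m3", "m4", "m5"]
windows = list(sliding_windows(messages, window_size=3))
print(len(windows))  # 3

# Suppose a grading model judged 2 of the 3 windows relevant.
print(score_conversation([True, True, False]))  # 2/3
```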