Measures what fraction of retrieved context is minimally needed to answer the query.
Use when: You want to check if your retrieval is returning too much irrelevant content.
How it works: Extracts only the sentences absolutely required to answer the query. Score = required sentences / total sentences.
:::warning

This metric finds the MINIMUM needed, not all relevant content. A low score might mean good retrieval (found the answer plus supporting context) or bad retrieval (lots of irrelevant content).

:::
Example:

Query: "What is the capital of France?"

Context: "Paris is the capital. France has great wine. The Eiffel Tower is in Paris."

Score: 0.33 (only the first sentence is required)

```yaml
assert:
  - type: context-relevance
    threshold: 0.3 # At least 30% should be essential
```
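The arithmetic behind that score can be sketched in a few lines. Note that the sentence splitting and the `required` set below are illustrative stand-ins: in practice the grading LLM decides which sentences are necessary.

```python
# Illustrative sketch of the context-relevance score. The real grader uses an
# LLM to pick the required sentences; here that selection is hard-coded.
context = "Paris is the capital. France has great wine. The Eiffel Tower is in Paris."

# Naive sentence split (stand-in for the grader's sentence extraction).
sentences = [s.strip() for s in context.split(".") if s.strip()]

# Suppose the grader judged only the first sentence necessary for the query.
required = {"Paris is the capital"}

score = len(required) / len(sentences)
print(round(score, 2))  # 0.33
```

A score of 0.33 passes the `threshold: 0.3` assertion above because exactly one of the three sentences is essential.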
Parameters:

- `query` - User's question (in test vars)
- `context` - Retrieved text (in vars or via `contextTransform`)
- `threshold` - Minimum score, 0-1 (default: 0)

```yaml
tests:
  - vars:
      query: 'What is the capital of France?'
      context: 'Paris is the capital of France.'
    assert:
      - type: context-relevance
        threshold: 0.8 # Most content should be essential
```
Context can be provided as an array of chunks:
```yaml
tests:
  - vars:
      query: 'What are the benefits of RAG systems?'
      context:
        - 'RAG systems improve factual accuracy by incorporating external knowledge sources.'
        - 'They reduce hallucinations in large language models through grounded responses.'
        - 'RAG enables up-to-date information retrieval beyond training data cutoffs.'
        - 'The weather forecast shows rain this weekend.' # irrelevant chunk
    assert:
      - type: context-relevance
        threshold: 0.5 # Score: 3/4 = 0.75
```
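With chunked context, the same ratio is taken over chunks rather than sentences. A minimal sketch of that pass/fail check follows; the per-chunk relevance judgments are hard-coded here, whereas in practice they come from the grading LLM:

```python
# Sketch of chunk-level scoring: score = relevant chunks / total chunks.
# The True/False judgments are illustrative stand-ins for the LLM grader.
chunks = [
    ("RAG systems improve factual accuracy by incorporating external knowledge sources.", True),
    ("They reduce hallucinations in large language models through grounded responses.", True),
    ("RAG enables up-to-date information retrieval beyond training data cutoffs.", True),
    ("The weather forecast shows rain this weekend.", False),  # irrelevant chunk
]

score = sum(relevant for _, relevant in chunks) / len(chunks)
threshold = 0.5
print(score, score >= threshold)  # 0.75 True
```

One irrelevant chunk out of four lowers the score to 0.75, which still clears the 0.5 threshold in the example above.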
For RAG systems that return context with their response:
```yaml
# Provider returns { answer: "...", context: "..." }
assert:
  - type: context-relevance
    contextTransform: 'output.context' # Extract context field
    threshold: 0.3
```
`contextTransform` can also return an array:

```yaml
assert:
  - type: context-relevance
    contextTransform: 'output.chunks' # Extract chunks array
    threshold: 0.5
```
Related metrics:

- `context-faithfulness` - Does the output stay faithful to the context?
- `context-recall` - Does the context support the expected answer?