# search-rubric
The `search-rubric` assertion type is like `llm-rubric` but with web search capabilities. It evaluates outputs against a rubric and can search the web for current information when needed.
```yaml
assert:
  - type: search-rubric
    value: 'Provides accurate current Bitcoin price within 5% of market value'
```
The `search-rubric` assertion behaves exactly like `llm-rubric`, but automatically uses a provider with web search capabilities:
```yaml
# These are equivalent:
assert:
  # Using llm-rubric with a web-search capable provider
  - type: llm-rubric
    value: 'Contains current stock price for Apple (AAPL) within $5'
    provider: openai:responses:gpt-5.1 # Must configure web search tool

  # Using search-rubric (automatically selects a web-search provider)
  - type: search-rubric
    value: 'Contains current stock price for Apple (AAPL) within $5'
```
Like `llm-rubric`, you can use test variables:
```yaml
prompts:
  - 'What is the current weather in {{city}}?'

defaultTest:
  assert:
    - type: search-rubric
      value: 'Provides current temperature in {{city}} with units (F or C)'

tests:
  - vars:
      city: San Francisco
  - vars:
      city: Tokyo
```
## Grading providers

The `search-rubric` assertion requires a grading provider with web search capabilities.
Anthropic Claude models support web search through the `web_search_20250305` tool:

```yaml
grading:
  provider: anthropic:messages:claude-opus-4-6
  providerOptions:
    config:
      tools:
        - type: web_search_20250305
          name: web_search
          max_uses: 5
```
OpenAI's responses API supports web search through the `web_search_preview` tool:

```yaml
grading:
  provider: openai:responses:gpt-5.1
  providerOptions:
    config:
      tools:
        - type: web_search_preview
```
Perplexity models have built-in web search:

```yaml
grading:
  provider: perplexity:sonar
```
Google's Gemini models support web search through the `googleSearch` tool:

```yaml
grading:
  provider: google:gemini-3-pro-preview
  providerOptions:
    config:
      tools:
        - googleSearch: {}
```
xAI's Grok models have built-in web search capabilities:

```yaml
grading:
  provider: xai:grok-4-1-fast-reasoning
  providerOptions:
    config:
      search_parameters:
        mode: 'on'
```
## Examples

```yaml
prompts:
  - 'Who won the latest Super Bowl?'

defaultTest:
  assert:
    - type: search-rubric
      value: 'Names the correct winner of the most recent Super Bowl with the final score'
```
```yaml
prompts:
  - "What's the current stock price of {{ticker}}?"

defaultTest:
  assert:
    - type: search-rubric
      value: |
        Provides accurate stock price for {{ticker}} that:
        1. Is within 2% of current market price
        2. Includes currency (USD)
        3. Mentions if market is open or closed
      threshold: 0.8
```
```yaml
prompts:
  - "What's the weather like in Tokyo?"

defaultTest:
  assert:
    - type: search-rubric
      value: |
        Describes current Tokyo weather including:
        - Temperature (with units)
        - General conditions (sunny, rainy, etc.)
        - Humidity or precipitation if relevant
```
```yaml
prompts:
  - "What's the latest version of Node.js?"

defaultTest:
  assert:
    - type: search-rubric
      value: 'States the correct latest LTS version of Node.js (not experimental or nightly)'
```
Web search assertions add provider-specific search costs on top of normal model token usage. As of November 2025, most providers bill each web search separately, while Perplexity's `sonar` models include search in their standard pricing.
Like `llm-rubric`, the `search-rubric` assertion supports thresholds:

```yaml
assert:
  - type: search-rubric
    value: 'Contains accurate information about current US inflation rate'
    threshold: 0.9 # Requires 90% accuracy for economic data
```
Run `promptfoo eval --no-cache` to force fresh searches.

Understanding how `search-rubric` evaluates different scenarios helps you write better tests.
The search-enabled grader identifies several types of failures:
| SUT Response | Grader Verdict | Reason |
|---|---|---|
| "I don't have access to real-time data" | Fail | No actual answer provided |
| Stale price from training data | Fail | Value differs from current market |
| Correct current price | Pass | Matches web search results |
| Partially correct answer | Partial | Score reflects completeness |
Models like `gpt-4o-mini` without web search enabled will often refuse to answer real-time questions:

> "I don't have access to real-time stock data. For current prices, please check a financial website."
The `search-rubric` grader correctly flags this as a failure, since no actual information was provided. This is the expected behavior: the assertion verifies whether your system provides accurate current information, not whether it gracefully declines.
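If you do want to verify the graceful-decline behavior itself, a plain `llm-rubric` assertion (no web search required) can cover that case. A minimal sketch, with illustrative rubric wording:

```yaml
assert:
  - type: llm-rubric
    value: 'Politely declines to provide real-time data and directs the user to an authoritative source'
```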
To test models that confidently answer (and potentially hallucinate), run them against the same `search-rubric` assertion: the grader's web search will fail stale or invented values, as in the verdict table above.
The grader returns a score from 0.0 to 1.0 based on how well the output matches the rubric. Use the `threshold` parameter to set your acceptable score level.
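For example, a sketch with an illustrative rubric and cutoff:

```yaml
assert:
  - type: search-rubric
    value: 'Reports the current EUR/USD exchange rate' # illustrative rubric
    threshold: 0.7 # accept partially complete answers scoring 0.7 or higher
```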
Ensure your grading provider supports web search. Default providers without web search configuration will fail. Check the Grading Providers section above.
If your SUT consistently refuses to answer real-time questions, this is expected behavior for models without web access. The `search-rubric` grader is correctly identifying that no factual answer was provided.
Solutions:

- Use `llm-rubric` instead if you only need to verify the response format

The grader relies on web search results, which may occasionally be wrong or ambiguous.
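For instance, a format-only check that needs no web access might look like this (the rubric wording is illustrative):

```yaml
assert:
  - type: llm-rubric
    value: 'Includes a numeric price, a currency symbol, and a timestamp' # illustrative rubric
```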
Best practices:
Web search adds cost on top of model tokens.
Cost reduction strategies:
- Use `search-rubric` only for tests that truly need real-time verification
- Use `llm-rubric` for static fact-checking that doesn't require current data
- Use Perplexity's `sonar` model for built-in search without per-call fees
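Putting these together, a config might route grading through Perplexity and reserve `search-rubric` for the one assertion that needs live data. A sketch; the provider choice and rubric wording are illustrative:

```yaml
grading:
  provider: perplexity:sonar # built-in search, no per-call search fee

prompts:
  - 'Summarize the latest Node.js LTS release'

defaultTest:
  assert:
    # Static format check: no web search needed
    - type: llm-rubric
      value: 'Mentions a specific version number and at least one feature'
    # Real-time check: uses the search-enabled grader
    - type: search-rubric
      value: 'The version cited matches the current Node.js LTS release'
      threshold: 0.8
```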