site/docs/configuration/expected-outputs/model-graded/factuality.md
The factuality assertion evaluates the factual consistency between an LLM output and a reference answer. It uses a structured prompt based on OpenAI's evals to determine if the output is factually consistent with the reference.
To use the factuality assertion type, add it to your test configuration like this:
```yaml
assert:
  - type: factuality
    # Specify the reference statement to check against:
    value: The Earth orbits around the Sun
```
For non-English evaluation output, see the multilingual evaluation guide.
The factuality checker evaluates whether the completion (the LLM output) is factually consistent with the reference (the `value`). It categorizes the relationship as one of:

- **(A)** The output is a subset of the reference and is fully consistent with it
- **(B)** The output is a superset of the reference and is fully consistent with it
- **(C)** The output contains all the same details as the reference
- **(D)** The output and the reference disagree
- **(E)** The output and the reference differ, but the differences don't matter for factuality

By default, categories A, B, C, and E are considered passing grades, while D is considered failing.
Here's a complete example showing how to use factuality checks:
```yaml
prompts:
  - 'What is the capital of {{state}}?'

providers:
  - openai:gpt-5
  - anthropic:claude-sonnet-4-5-20250929

tests:
  - vars:
      state: California
    assert:
      - type: factuality
        value: Sacramento is the capital of California
  - vars:
      state: New York
    assert:
      - type: factuality
        value: Albany is the capital city of New York state
```
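To run this example, invoke `promptfoo eval` from the directory containing the config. This assumes the file above is saved as `promptfooconfig.yaml`:

```sh
# Evaluates every test in promptfooconfig.yaml and applies the factuality grader to each output
npx promptfoo@latest eval
```

You can then inspect per-test results with `npx promptfoo@latest view`.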
You can customize which factuality categories are considered passing by setting scores in your test configuration:
```yaml
defaultTest:
  options:
    factuality:
      subset: 1 # Score for category A (default: 1)
      superset: 1 # Score for category B (default: 1)
      agree: 1 # Score for category C (default: 1)
      disagree: 0 # Score for category D (default: 0)
      differButFactual: 1 # Score for category E (default: 1)
```
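For example, to treat subset answers (category A) as failing, perhaps because partial answers aren't acceptable for your use case, you could score that category 0. This sketch assumes a category scored 0 is treated as a failure, just as category D is by default:

```yaml
defaultTest:
  options:
    factuality:
      subset: 0 # Category A now fails: answers that only partially cover the reference are rejected
      superset: 1
      agree: 1
      disagree: 0
      differButFactual: 1
```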
Like other model-graded assertions, you can override the default grader:
Using the CLI:
```sh
promptfoo eval --grader openai:gpt-5-mini
```
Using test options:
```yaml
defaultTest:
  options:
    provider: anthropic:claude-sonnet-4-5-20250929
```
Using an assertion-level override:
```yaml
assert:
  - type: factuality
    value: Sacramento is the capital of California
    provider: openai:gpt-5-mini
```
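When overriding via test options, the grading provider can also be specified as an object if you want to pin its parameters. This is a sketch, assuming your grading provider accepts a `temperature` setting:

```yaml
defaultTest:
  options:
    provider:
      id: anthropic:claude-sonnet-4-5-20250929
      config:
        temperature: 0 # keep grading as deterministic as possible
```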
You can customize the evaluation prompt using the `rubricPrompt` property. The prompt has access to the following Nunjucks template variables:

- `{{input}}`: The original prompt/question
- `{{ideal}}`: The reference answer (from the `value` field)
- `{{completion}}`: The LLM's actual response (provided automatically by promptfoo)

Your custom prompt should instruct the model to either:

- answer with a single letter (A/B/C/D/E), or
- return a JSON object with `category` and `reason` fields.

Here's an example of a custom prompt:
```yaml
defaultTest:
  options:
    rubricPrompt: |
      Input: {{input}}
      Reference: {{ideal}}
      Completion: {{completion}}

      Evaluate the factual consistency between the completion and reference.
      Choose the most appropriate option:
      (A) Completion is a subset of reference
      (B) Completion is a superset of reference
      (C) Completion and reference are equivalent
      (D) Completion and reference disagree
      (E) Completion and reference differ, but differences don't affect factuality

      Answer with a single letter (A/B/C/D/E).
```
The factuality checker will parse either format:

- A single letter (A/B/C/D/E)
- A JSON object such as `{"category": "A", "reason": "Detailed explanation..."}`

You can also use factuality checks in CSV test files. Use the `factuality:` prefix in `__expected` columns:
```csv
question,__expected
"What does GPT stand for?","factuality:Generative Pre-trained Transformer"
"What is photosynthesis?","factuality:Plants convert sunlight into chemical energy"
```
To apply factuality to all rows, see CSV with defaultTest.
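For instance, here is a minimal sketch that applies the same factuality check to every row, assuming the CSV has a column named `expected_answer` holding the reference for each question:

```yaml
defaultTest:
  assert:
    - type: factuality
      # `expected_answer` is an assumed column name; use whichever column holds your reference answers
      value: '{{expected_answer}}'

tests: file://tests.csv
```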