site/docs/configuration/expected-outputs/model-graded/factuality.md
The factuality assertion evaluates the factual consistency between an LLM output and a reference answer. It uses a structured prompt based on OpenAI's evals to determine if the output is factually consistent with the reference.
To use the factuality assertion type, add it to your test configuration like this:
```yaml
assert:
  - type: factuality
    # Specify the reference statement to check against:
    value: The Earth orbits around the Sun
```
For non-English evaluation output, see the multilingual evaluation guide.
The factuality checker evaluates whether the completion (the LLM output) is factually consistent with the reference (the `value`). It categorizes the relationship as one of:

- **(A)** The output is a subset of the reference and is fully consistent with it
- **(B)** The output is a superset of the reference and is fully consistent with it
- **(C)** The output contains all the same details as the reference
- **(D)** The output and the reference disagree
- **(E)** The output and the reference differ, but the differences don't matter for factuality

By default, categories A, B, C, and E are considered passing grades, while D is considered failing.
Here's a complete example showing how to use factuality checks:
```yaml
prompts:
  - 'What is the capital of {{state}}?'

providers:
  - openai:gpt-5
  - anthropic:claude-sonnet-4-5-20250929

tests:
  - vars:
      state: California
    assert:
      - type: factuality
        value: Sacramento is the capital of California
  - vars:
      state: New York
    assert:
      - type: factuality
        value: Albany is the capital city of New York state
```
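To run this example, invoke `promptfoo eval` from the directory containing the config. This assumes the file above is saved as `promptfooconfig.yaml`:

```sh
# Evaluates every test in promptfooconfig.yaml and applies the factuality grader to each output
npx promptfoo@latest eval
```

You can then inspect per-test results with `npx promptfoo@latest view`.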
You can customize which factuality categories are considered passing by setting scores in your test configuration:
```yaml
defaultTest:
  options:
    factuality:
      subset: 1 # Score for category A (default: 1)
      superset: 1 # Score for category B (default: 1)
      agree: 1 # Score for category C (default: 1)
      disagree: 0 # Score for category D (default: 0)
      differButFactual: 1 # Score for category E (default: 1)
```
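For example, to treat subset answers (category A) as failing, perhaps because partial answers aren't acceptable for your use case, you could score that category 0. This sketch assumes a category scored 0 is treated as a failure, just as category D is by default:

```yaml
defaultTest:
  options:
    factuality:
      subset: 0 # Category A now fails: answers that only partially cover the reference are rejected
      superset: 1
      agree: 1
      disagree: 0
      differButFactual: 1
```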
Like other model-graded assertions, you can override the default grader:
Using the CLI:
```sh
promptfoo eval --grader openai:gpt-5-mini
```
Using test options:
```yaml
defaultTest:
  options:
    provider: anthropic:claude-sonnet-4-5-20250929
```
Using an assertion-level override:
```yaml
assert:
  - type: factuality
    value: Sacramento is the capital of California
    provider: openai:gpt-5-mini
```
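When overriding via test options, the grading provider can also be specified as an object if you want to pin its parameters. This is a sketch, assuming your grading provider accepts a `temperature` setting:

```yaml
defaultTest:
  options:
    provider:
      id: anthropic:claude-sonnet-4-5-20250929
      config:
        temperature: 0 # keep grading as deterministic as possible
```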
You can customize the evaluation prompt using the `rubricPrompt` property. The prompt has access to the following Nunjucks template variables:

- `{{input}}`: The original prompt/question
- `{{ideal}}`: The reference answer (from the `value` field)
- `{{completion}}`: The LLM's actual response (provided automatically by promptfoo)

Your custom prompt should instruct the model to either:

- answer with a single letter (A/B/C/D/E), or
- return a JSON object with `category` and `reason` fields.

Here's an example of a custom prompt:
```yaml
defaultTest:
  options:
    rubricPrompt: |
      Input: {{input}}
      Reference: {{ideal}}
      Completion: {{completion}}

      Evaluate the factual consistency between the completion and reference.
      Choose the most appropriate option:
      (A) Completion is a subset of reference
      (B) Completion is a superset of reference
      (C) Completion and reference are equivalent
      (D) Completion and reference disagree
      (E) Completion and reference differ, but differences don't affect factuality

      Answer with a single letter (A/B/C/D/E).
```
The factuality checker will parse either format:

- A single letter (A/B/C/D/E)
- A JSON object such as `{"category": "A", "reason": "Detailed explanation..."}`

You can also use factuality checks in CSV test files. Use the `factuality:` prefix in `__expected` columns:
```csv
question,__expected
"What does GPT stand for?","factuality:Generative Pre-trained Transformer"
"What is photosynthesis?","factuality:Plants convert sunlight into chemical energy"
```
To apply factuality to all rows, see CSV with defaultTest.
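For instance, here is a minimal sketch that applies the same factuality check to every row, assuming the CSV has a column named `expected_answer` holding the reference for each question:

```yaml
defaultTest:
  assert:
    - type: factuality
      # `expected_answer` is an assumed column name; use whichever column holds your reference answers
      value: '{{expected_answer}}'

tests: file://tests.csv
```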