apps/opik-documentation/documentation/fern/docs-v2/evaluation/metrics/meaning_match.mdx
The Meaning Match metric evaluates whether an LLM's output semantically matches a ground truth answer, regardless of phrasing or formatting. This metric is particularly useful for evaluating question-answering systems where the same answer can be expressed in different ways.
The Meaning Match metric is available as an LLM-as-a-Judge metric in automation rules. You can use it to automatically evaluate traces in your project by creating a new rule.
The Meaning Match metric returns a boolean score:

- `true` — the output is semantically equivalent to the ground truth answer
- `false` — the output contradicts, differs from, or fails to convey the ground truth answer

Each score includes a detailed reason explaining the judgment.
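For illustration, a matching output produces a score payload shaped like this (the shape mirrors the judge's output format defined in the prompt template further below):

```json
{
  "score": true,
  "reason": ["Output conveys the same factual answer as the ground truth."]
}
```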
The examples below show how the metric applies its evaluation rules:
| Input | Ground Truth | Output | Score | Reason |
|---|---|---|---|---|
| What's the capital of France? | Paris | It's Paris | ✅ true | Output conveys the same factual answer as the ground truth |
| Who painted the Mona Lisa? | Leonardo da Vinci | Da Vinci | ✅ true | "Da Vinci" is an accepted alias for "Leonardo da Vinci" |
| Who painted the Mona Lisa? | Leonardo da Vinci | Pablo Picasso | ❌ false | Output names a different painter than the ground truth |
| What's 10 + 10? | 20 | The answer is twenty | ✅ true | Numeric and textual forms are treated as equivalent |
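The Meaning Match metric runs inside automation rules, so no code is needed to use it. If you want a similar check in offline experiments, one option is a custom metric. The sketch below assumes the Opik Python SDK's custom-metric pattern (subclassing `BaseMetric` and returning a `ScoreResult`); the class name `SimpleMeaningMatch` and the normalized-string comparison are illustrative stand-ins, not the metric's actual implementation:

```python
from opik.evaluation.metrics import base_metric, score_result


class SimpleMeaningMatch(base_metric.BaseMetric):
    """Illustrative stand-in: the real metric asks an LLM judge (see the
    prompt template below); here we only normalize and compare strings."""

    def __init__(self, name: str = "simple_meaning_match"):
        self.name = name

    def score(self, ground_truth: str, output: str, **ignored_kwargs) -> score_result.ScoreResult:
        def normalize(text: str) -> str:
            # Lowercase, collapse whitespace, and drop punctuation that
            # doesn't change meaning (rules 3 and 8 above).
            collapsed = " ".join(text.lower().split())
            return "".join(ch for ch in collapsed if ch.isalnum() or ch == " ")

        matched = normalize(ground_truth) in normalize(output)
        return score_result.ScoreResult(
            name=self.name,
            value=1.0 if matched else 0.0,
            reason="Output contains the ground truth answer"
            if matched
            else "Output does not contain the ground truth answer",
        )
```

Note that string normalization alone cannot handle aliases or numeric-to-text equivalence (rules 2 and 7), which is exactly why the real metric delegates the judgment to an LLM.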
Opik uses an LLM as a Judge to evaluate semantic equivalence. By default, the evaluation uses the model you select when creating the rule. The prompt template used for evaluation is:
```
You are an expert semantic equivalence judge. Your task is to decide whether the OUTPUT conveys the same essential answer as the GROUND_TRUTH, regardless of phrasing or formatting.

## What to judge
- TRUE if the OUTPUT expresses the same core fact/entity/value as the GROUND_TRUTH.
- FALSE if the OUTPUT contradicts, differs from, or fails to include the core fact/value in GROUND_TRUTH.

## Rules
1. Focus only on the factual equivalence of the core answer. Ignore style, grammar, or verbosity.
2. Accept aliases, synonyms, paraphrases, or equivalent expressions.
   Examples: "NYC" ≈ "New York City"; "Da Vinci" ≈ "Leonardo da Vinci".
3. Ignore case, punctuation, and formatting differences.
4. Extra contextual details are acceptable **only if they don't change or contradict** the main answer.
5. If the OUTPUT includes the correct answer along with additional unrelated or incorrect alternatives → FALSE.
6. Uncertain, hedged, or incomplete answers → FALSE.
7. Treat numeric and textual forms as equivalent (e.g., "100" = "one hundred").
8. Ignore whitespace, articles, and small typos that don't change meaning.

## Output Format
Your response **must** be a single JSON object in the following format:

{
  "score": true or false,
  "reason": ["short reason for the response"]
}

## Example
INPUT: "Who painted the Mona Lisa?"
GROUND_TRUTH: "Leonardo da Vinci"
OUTPUT: "It was painted by Leonardo da Vinci."
→ {"score": true, "reason": ["Output conveys the same factual answer as the ground truth."]}

OUTPUT: "Pablo Picasso"
→ {"score": false, "reason": ["Output names a different painter than the ground truth."]}

INPUT:
{{input}}

GROUND_TRUTH:
{{ground_truth}}

OUTPUT:
{{output}}
```
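To reproduce the judge outside of automation rules, you can fill the template's `{{input}}`, `{{ground_truth}}`, and `{{output}}` placeholders and send the result to any chat model. Below is a minimal sketch assuming the OpenAI Python client as the judge backend; the model name, the `meaning_match_prompt.txt` file, and the `judge_meaning_match` helper are illustrative assumptions, not part of Opik:

```python
import json

from openai import OpenAI

client = OpenAI()

# Assumed to hold the full template shown above, including the
# {{input}}, {{ground_truth}}, and {{output}} placeholders.
PROMPT_TEMPLATE = open("meaning_match_prompt.txt").read()


def judge_meaning_match(input_text: str, ground_truth: str, output: str) -> dict:
    """Fill the template's placeholders and ask the judge model for a verdict."""
    prompt = (
        PROMPT_TEMPLATE
        .replace("{{input}}", input_text)
        .replace("{{ground_truth}}", ground_truth)
        .replace("{{output}}", output)
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: use whichever model your rule is configured with
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # the template demands a single JSON object
    )
    return json.loads(response.choices[0].message.content)


# Expected verdict: {"score": True, "reason": [...]} per rule 7 (numeric/textual equivalence)
print(judge_meaning_match("What's 10 + 10?", "20", "The answer is twenty"))
```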
The Meaning Match metric is ideal for:

- Question-answering systems where the same answer can be phrased in many different ways
- Checking factual correctness against a known ground truth, regardless of formatting
- Automatically scoring production traces through automation rules