apps/opik-documentation/documentation/fern/docs/evaluation/metrics/dialogue_helpfulness.mdx
DialogueHelpfulnessJudge inspects the latest assistant reply in the context of preceding turns. It rewards responses that acknowledge the user’s request, use the available context, and offer actionable guidance.

```python
from opik.evaluation.metrics import DialogueHelpfulnessJudge

turns = """USER: My VPN disconnects every 5 minutes.\nASSISTANT: Try reinstalling the client.\nUSER: I already did.\n"""

metric = DialogueHelpfulnessJudge()
score = metric.score(
    input=turns,
    output="Can you send logs? I'll escalate to network engineering.",
)

print(score.value)
print(score.reason)
```

| Argument | Type | Required | Description |
|---|---|---|---|
| input | str | Optional | Conversation history (alternating USER / ASSISTANT blocks). |
| conversation | list[dict] | Optional | Structured turns (`{"role": "user", "content": "..."}`). |
| output | str | Yes | Latest assistant reply to score. |
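
If your history is stored as structured turns but you want to pass it through `input`, it can be flattened into the alternating USER / ASSISTANT transcript used in the example above. The helper below is a minimal sketch: `format_turns` is a hypothetical name, not part of Opik, and it assumes every turn carries `role` and `content` keys.

```python
def format_turns(turns):
    """Render structured turns as 'ROLE: content' lines, one per turn.

    Hypothetical helper (not part of Opik); assumes each turn is a dict
    with "role" and "content" keys.
    """
    lines = []
    for turn in turns:
        role = turn["role"].upper()        # "user" -> "USER"
        content = turn["content"].strip()  # drop stray whitespace
        lines.append(f"{role}: {content}")
    # End with a newline so the transcript matches the string used earlier.
    return "\n".join(lines) + "\n"


history = [
    {"role": "user", "content": "My VPN disconnects every 5 minutes."},
    {"role": "assistant", "content": "Try reinstalling the client."},
    {"role": "user", "content": "I already did."},
]

transcript = format_turns(history)
print(transcript)
```

The resulting `transcript` string can then be passed as the `input` argument of `metric.score`.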

| Parameter | Default | Notes |
|---|---|---|
| model | gpt-5-nano | Switch to a larger evaluator for complex enterprise workflows. |
| temperature | 0.0 | Use low temperature for reproducible benchmarks. |
| track | True | Record the evaluation in Opik. |
| project_name | None | Set when routing results to a different project. |
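
One way to keep these settings in a single place is a plain dictionary that is unpacked into the judge. This is a sketch that assumes the parameters in the table are accepted as keyword arguments by the `DialogueHelpfulnessJudge` constructor. The project name is a made-up placeholder, and `build_judge` is a hypothetical helper rather than an Opik function.

```python
# Settings mirror the parameter table; values other than project_name are
# the documented defaults.
judge_settings = {
    "model": "gpt-5-nano",
    "temperature": 0.0,          # low temperature for reproducible benchmarks
    "track": True,               # record the evaluation in Opik
    "project_name": "helpfulness-benchmarks",  # placeholder project name
}


def build_judge(settings):
    """Create the judge from a settings dict (hypothetical helper).

    Assumes the constructor accepts these names as keyword arguments.
    The import is deferred so the settings can be inspected without Opik.
    """
    from opik.evaluation.metrics import DialogueHelpfulnessJudge

    return DialogueHelpfulnessJudge(**settings)
```

Calling `build_judge(judge_settings)` returns a metric that can be scored in the same way as the default instance.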

Integrate this judge into regression suites to catch drops in helpfulness after prompt changes or upgrades to your assistant model.
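
A regression check can be as simple as averaging scores over a fixed set of conversations and comparing the result with a threshold. The sketch below is illustrative only: `check_helpfulness`, the `0.7` threshold, and the stub scorer are assumptions, and scores are assumed to lie between 0 and 1. The stub stands in for the real judge so the example runs without an LLM call.

```python
from statistics import mean


def check_helpfulness(score_fn, cases, threshold=0.7):
    """Return (passed, average) for a list of conversation cases.

    score_fn takes (history, reply) and returns a numeric score.
    The threshold is an arbitrary example value, not an Opik default.
    """
    scores = [score_fn(case["input"], case["output"]) for case in cases]
    average = mean(scores)
    return average >= threshold, average


def stub_score(history, reply):
    # Stand-in for the real judge: rewards replies that ask for logs.
    return 0.9 if "logs" in reply else 0.2


cases = [
    {
        "input": "USER: My VPN disconnects every 5 minutes.\n",
        "output": "Can you send logs? I'll escalate to network engineering.",
    },
    {
        "input": "USER: The client crashes on startup.\n",
        "output": "Please attach the crash logs so I can check the error.",
    },
]

passed, average = check_helpfulness(stub_score, cases)
print(passed, average)
```

To run the check against the actual judge, replace `stub_score` with a function such as `lambda history, reply: metric.score(input=history, output=reply).value`.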