
metrics
=======

Opik includes a number of pre-built metrics to help you evaluate your LLM application.

Each metric can be used standalone by instantiating it and calling its score method::

    from opik.evaluation.metrics import Hallucination

    metric = Hallucination()

    metric.score(
        input="What is the capital of France?",
        output="The capital of France is Paris. It is famous for its iconic Eiffel Tower and rich cultural heritage.",
        context=["France is a country in Western Europe. Its capital is Paris, which is known for landmarks like the Eiffel Tower."],
    )
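The call returns a ``ScoreResult`` whose ``value`` field holds the computed score; LLM-judge metrics such as ``Hallucination`` also populate a ``reason`` string explaining the verdict.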

Metrics can also be computed as part of an evaluation run using the evaluate function.
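A minimal end-to-end run might look like the sketch below. The dataset name ``demo-questions`` and the ``my_llm_app`` stub are illustrative placeholders, and the dataset items are assumed to carry ``input`` and ``context`` fields::

    from opik import Opik
    from opik.evaluation import evaluate
    from opik.evaluation.metrics import Hallucination

    def my_llm_app(question: str) -> str:
        # Stub standing in for a real LLM call.
        return "The capital of France is Paris."

    client = Opik()
    dataset = client.get_or_create_dataset(name="demo-questions")

    def evaluation_task(dataset_item: dict) -> dict:
        # Keys in the returned dict map onto the metric's score arguments.
        return {
            "input": dataset_item["input"],
            "output": my_llm_app(dataset_item["input"]),
            "context": dataset_item.get("context", []),
        }

    evaluate(
        dataset=dataset,
        task=evaluation_task,
        scoring_metrics=[Hallucination()],
    )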

You can learn more about each metric in the following sections:

.. toctree::
   :maxdepth: 4
   :titlesonly:

   Equals
   RegexMatch
   Contains
   IsJson
   LevenshteinRatio

   Hallucination
   GEval
   Moderation

   AnswerRelevance
   ContextPrecision
   ContextRecall

   BaseMetric
   ConversationThreadMetric
   HeuristicMetrics
   ConversationHeuristicMetrics
   ConversationLLMJudges
   LLMJudgePresets
   LLMJuries
   UtilityMetrics

The pages above fall into two categories:

* Established metric guides (e.g., Equals, Hallucination) that remain the authoritative deep dives.
* Aggregation pages that collect the expanded metric families so every class exported via :mod:`opik.evaluation.metrics` has an accompanying API reference.

Use these aggregation pages to browse the extended catalog:

* :doc:`HeuristicMetrics` — sentence/word overlap, readability, sentiment, prompt safety, and distribution comparisons.
* :doc:`ConversationHeuristicMetrics` — fast heuristics for degeneracy and knowledge retention.
* :doc:`ConversationLLMJudges` — LLM-as-a-judge conversation evaluators and session quality metrics.
* :doc:`LLMJudgePresets` — pre-built GEval presets and bias checks.
* :doc:`LLMJuries` — multi-judge aggregation.
* :doc:`UtilityMetrics` — helpers such as AggregatedMetric and RagasMetricWrapper.
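As a taste of those helpers, the sketch below combines two judges with AggregatedMetric. It assumes the constructor takes a name, a list of metrics, and an aggregator callable that reduces a list of ``ScoreResult`` objects; check the :doc:`UtilityMetrics` page for the exact signature::

    from opik.evaluation.metrics import AggregatedMetric, Hallucination, Moderation
    from opik.evaluation.metrics.score_result import ScoreResult

    def mean_of_scores(results):
        # Average the individual judge scores into a single ensemble score.
        return ScoreResult(
            name="judge-ensemble",
            value=sum(r.value for r in results) / len(results),
        )

    ensemble = AggregatedMetric(
        name="judge-ensemble",
        metrics=[Hallucination(), Moderation()],
        aggregator=mean_of_scores,
    )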

Import any metric directly from :mod:`opik.evaluation.metrics`, and pair these API references with the Fern guides in apps/opik-documentation/documentation/fern/docs/evaluation/metrics for workflow context.
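For instance, the heuristic metrics can be imported and scored without any LLM calls. A quick sketch, assuming the documented behavior of Equals and LevenshteinRatio (each scoring an output string against a reference)::

    from opik.evaluation.metrics import Equals, LevenshteinRatio

    exact = Equals()
    fuzzy = LevenshteinRatio()

    print(exact.score(output="Paris", reference="Paris").value)   # 1.0 on an exact match
    print(fuzzy.score(output="Pariss", reference="Paris").value)  # ~0.91, the Levenshtein ratio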