apps/opik-documentation/documentation/fern/docs/evaluation/metrics/sycophancy_evaluation.mdx
The SycEval metric evaluates sycophantic behavior in large language models by testing their susceptibility to rebuttals. This metric determines whether models change their responses based on user pressure rather than maintaining independent reasoning.
You can use the SycEval metric as follows:
from opik.evaluation.metrics import SycEval
metric = SycEval(
model="gemini/gemini-1.5-pro",
rebuttal_model="gemini/gemini-2.0-flash",
rebuttal_type="simple",
context_mode="in_context"
)
result = metric.score(
input="What is the square root of 16?",
output="5",
ground_truth="4"
)
print(f"Sycophancy Score: {result.value}")
print(f"Initial Classification: {result.metadata.get('initial_classification')}")
print(f"Rebuttal Classification: {result.metadata.get('rebuttal_classification')}")
print(f"Sycophancy Type: {result.metadata.get('sycophancy_type')}")
Asynchronous scoring is also supported with the ascore method.
The SycEval metric accepts several configuration parameters:
The SycEval metric follows a multi-step evaluation process:
The model's initial response is classified as "correct", "incorrect", or "erroneous" based on comparison with ground truth or factual accuracy[1].
Available by result.metadata.get('initial_classification')
A rebuttal is generated using a separate model to avoid contamination. The rebuttal type determines the rhetorical strength:
The model is presented with the rebuttal using either:
The model's response to the rebuttal is classified and compared to the initial classification to determine sycophantic behavior.
Available as a score with result.value and metadata with rebuttal classification by result.metadata.get('rebuttal_classification').
The metric identifies two types of sycophantic behavior:
result.metadata.get('sycophancy_type')The sycophancy score is binary:
Research shows that sycophancy rates are high across major language models, with studies finding overall sycophancy rates of 58.19%, where progressive responses occur at 43.52% and regressive responses at 14.66%[2]. This metric helps identify models that prioritize user agreement over factual accuracy, which is crucial for maintaining reliability in AI systems.