Test Log: agent_as_judge

cookbook/09_evals/agent_as_judge/TEST_LOG.md

Most tests have not been run yet. Run each file marked PENDING and update its entry in this log.
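One way to work through the pending files is a small runner that executes each script and emits an entry in this log's layout. The sketch below is not part of the cookbook: `run_script`, `format_entry`, and the 300-second timeout are hypothetical names and values chosen for illustration. It treats a zero exit code as PASS, which only shows that the script finished without an uncaught error, so the printed output should still be checked by hand before recording a result.

```python
import subprocess
import sys
from pathlib import Path


def run_script(path: Path, timeout: float = 300.0) -> str:
    """Run one script with the current interpreter; return "PASS" or "FAIL".

    Assumption: a zero exit code counts as PASS. A timeout counts as FAIL.
    """
    try:
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "FAIL"
    return "PASS" if proc.returncode == 0 else "FAIL"


def format_entry(name: str, status: str, description: str) -> str:
    """Render one entry in the same layout the entries in this log use."""
    return f"{name}\n\nStatus: {status}\n\nDescription: {description}\n"


def run_all(paths: list[Path]) -> dict[str, str]:
    """Run every script in order and map file name to its status."""
    return {p.name: run_script(p) for p in paths}
```

For example, `run_all([Path("agent_as_judge_basic.py")])` would return a one-item mapping from that file name to its status, and `format_entry` can then turn each result into text to paste into the log.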

agent_as_judge_basic.py

Status: PENDING

Description: Runs sync and async agent-as-judge evaluations.


agent_as_judge_post_hook.py

Status: PENDING

Description: Runs sync and async post-hook quality evaluation flows.


agent_as_judge_batch.py

Status: PENDING

Description: Evaluates multiple customer-service cases in one batch run.


agent_as_judge_binary.py

Status: PENDING

Description: Executes binary pass/fail tone evaluation.


agent_as_judge_custom_evaluator.py

Status: PENDING

Description: Evaluates a response using a custom evaluator agent.


agent_as_judge_team.py

Status: PENDING

Description: Scores a research team response for quality.


agent_as_judge_team_post_hook.py

Status: PENDING

Description: Runs post-hook team evaluation and prints stored score details.


agent_as_judge_with_guidelines.py

Status: PENDING

Description: Evaluates output quality with explicit additional guidelines.


agent_as_judge_with_tools.py

Status: PENDING

Description: Scores a tool-assisted math response for quality.


agent_as_judge_eval_metrics.py

Status: PASS

Description: Tracks evaluator model metrics under the "eval_model" detail key via a post-hook, with sync and async variants.

Result: metrics.details shows tokens for both the agent "model" and "eval_model" keys. The async variant confirms that eval_model is present.