Agent-as-Judge Eval Cookbooks

cookbook/09_evals/agent_as_judge/README.md

Agent-as-judge examples evaluate output quality with model-based scoring.

Files

  • agent_as_judge_basic.py - Sync and async numeric scoring with persisted results.
  • agent_as_judge_post_hook.py - Sync and async post-hook evaluation examples.
  • agent_as_judge_batch.py - Batch case evaluation with summary output.
  • agent_as_judge_binary.py - PASS/FAIL quality evaluation example.
  • agent_as_judge_custom_evaluator.py - Uses a custom evaluator agent.
  • agent_as_judge_team.py - Evaluates quality of team-generated responses.
  • agent_as_judge_team_post_hook.py - Team post-hook quality checking.
  • agent_as_judge_with_guidelines.py - Numeric scoring with additional guidelines.
  • agent_as_judge_with_tools.py - Evaluates responses from a tool-using agent.
  • agent_as_judge_eval_metrics.py - Tracks eval model metrics under the "eval_model" detail key via a post-hook.
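
For readers new to the pattern, the following is a minimal, library-free sketch of what numeric scoring, PASS/FAIL thresholds, and batch summaries look like in principle. It does not use the Agno API: `JudgeResult`, `build_prompt`, `evaluate`, `evaluate_batch`, and `stub_judge` are hypothetical names made up for illustration, the 1-10 scale and the `SCORE: ...; REASON: ...` reply format are assumptions, and `stub_judge` stands in for a real model call. See the files listed above for the actual usage.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class JudgeResult:
    score: int    # score parsed from the judge's reply, clamped to 1-10
    passed: bool  # True when score >= threshold (the PASS/FAIL view)
    reason: str   # free-text justification from the judge


def build_prompt(criteria: str, question: str, answer: str) -> str:
    """Format the evaluation request sent to the judge (assumed format)."""
    return (
        f"Criteria: {criteria}\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply as 'SCORE: <1-10>; REASON: <text>'."
    )


def evaluate(judge: Callable[[str], str], criteria: str, question: str,
             answer: str, threshold: int = 7) -> JudgeResult:
    """Ask the judge for a score and derive PASS/FAIL from the threshold."""
    reply = judge(build_prompt(criteria, question, answer))
    match = re.search(r"SCORE:\s*(\d+);\s*REASON:\s*(.*)", reply, re.DOTALL)
    if match is None:
        raise ValueError(f"Unparseable judge reply: {reply!r}")
    score = max(1, min(10, int(match.group(1))))
    return JudgeResult(score, score >= threshold, match.group(2).strip())


def evaluate_batch(judge: Callable[[str], str], criteria: str,
                   cases: List[Tuple[str, str]], threshold: int = 7) -> Dict:
    """Evaluate (question, answer) pairs and summarize the pass rate."""
    results = [evaluate(judge, criteria, q, a, threshold) for q, a in cases]
    passed = sum(r.passed for r in results)
    rate = passed / len(results) if results else 0.0
    return {"total": len(results), "passed": passed,
            "pass_rate": rate, "results": results}


def stub_judge(prompt: str) -> str:
    """Hypothetical stand-in for a model call: longer answers score higher."""
    answer = prompt.split("Answer: ", 1)[1].split("\n", 1)[0]
    words = len(answer.split())
    score = 9 if words >= 5 else 3
    return f"SCORE: {score}; REASON: answer has {words} words"


if __name__ == "__main__":
    cases = [
        ("What is 2+2?", "Two plus two equals four exactly."),
        ("What is 2+2?", "Four."),
    ]
    summary = evaluate_batch(stub_judge, "Answer in a full sentence.", cases)
    print(summary["passed"], "of", summary["total"], "cases passed")
```

Passing the judge in as a plain callable keeps the scoring logic testable without a model; in a real setup that callable would wrap the evaluator agent.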