cookbook/09_evals/agent_as_judge/README.md
Agent-as-judge examples evaluate output quality with model-based scoring.
- agent_as_judge_basic.py - Sync and async numeric scoring with persisted results.
- agent_as_judge_post_hook.py - Sync and async post-hook evaluation examples.
- agent_as_judge_batch.py - Batch case evaluation with summary output.
- agent_as_judge_binary.py - PASS/FAIL quality evaluation example.
- agent_as_judge_custom_evaluator.py - Uses a custom evaluator agent.
- agent_as_judge_team.py - Evaluates quality of team-generated responses.
- agent_as_judge_team_post_hook.py - Team post-hook quality checking.
- agent_as_judge_with_guidelines.py - Numeric scoring with additional guidelines.
- agent_as_judge_with_tools.py - Evaluates responses from a tool-using agent.
- agent_as_judge_eval_metrics.py - Eval model metrics tracked under "eval_model" detail key via post-hook.
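The core pattern shared by these examples can be sketched in plain Python. This is an illustration of agent-as-judge scoring, not the library's actual API: `JudgeResult`, `judge_response`, and `stub_judge` are hypothetical names, and a stub stands in for the judge model so the sketch runs without an LLM backend.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class JudgeResult:
    score: int   # numeric quality rating, e.g. 1-10
    reason: str  # judge's justification for the score

def judge_response(question: str, answer: str,
                   judge_model: Callable[[str], str]) -> JudgeResult:
    """Ask a judge model to rate an answer; parse a 'score|reason' reply."""
    prompt = (
        "Rate the answer to the question on a 1-10 scale.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "Reply in the form: <score>|<reason>"
    )
    raw = judge_model(prompt)
    score_text, _, reason = raw.partition("|")
    return JudgeResult(score=int(score_text.strip()), reason=reason.strip())

# Stub judge model (hypothetical) so the example is self-contained.
def stub_judge(prompt: str) -> str:
    return "8|Accurate and concise."

result = judge_response("What is 2+2?", "4", stub_judge)
print(result.score, result.reason)
```

A binary PASS/FAIL variant (as in agent_as_judge_binary.py) would simply have the judge reply with a verdict instead of a number, and a post-hook variant would run the same check automatically after each agent response.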