cookbook/09_evals/agent_as_judge/TEST_LOG.md
Most tests have not yet been run. Run each pending file and update this log.
Status: PENDING
Description: Runs sync and async agent-as-judge evaluations.
Status: PENDING
Description: Runs sync and async post-hook quality evaluation flows.
Status: PENDING
Description: Evaluates multiple customer-service cases in one batch run.
Status: PENDING
Description: Executes binary pass/fail tone evaluation.
Status: PENDING
Description: Evaluates a response using a custom evaluator agent.
Status: PENDING
Description: Scores a research team response for quality.
Status: PENDING
Description: Runs post-hook team evaluation and prints stored score details.
Status: PENDING
Description: Evaluates output quality with explicit additional guidelines.
Status: PENDING
Description: Scores a tool-assisted math response for quality.
Status: PASS
Description: Tracks eval model metrics under the "eval_model" detail key via a post-hook, in sync and async variants.
Result: Shows token counts for both the agent "model" and "eval_model" in metrics.details. The async variant confirms that "eval_model" is present.