cookbook/09_evals/accuracy/TEST_LOG.md
Most tests have not been run yet. Run each file and update this log.
Status: PENDING
Description: Runs sync and async calculator accuracy evaluations.
Status: PENDING
Description: Checks comparison accuracy for decimal values.
Status: PENDING
Description: Evaluates team routing accuracy for language handling.
Status: PENDING
Description: Scores a manually provided answer against expected output.
Status: PENDING
Description: Evaluates accuracy for factorial tool usage.
Status: PENDING
Description: Runs accuracy evaluation and stores results in PostgreSQL.
Status: PENDING
Description: Uses a custom evaluator agent for accuracy scoring.
Status: PASS
Description: Eval model metrics are accumulated into the agent's run_output under the "eval_model" detail key.
Result: Agent "model" tokens and "eval_model" tokens appear separately in metrics.details, with a full breakdown.