cookbook/09_evals/README.md
This directory contains runnable examples for Agno evaluation patterns.
- `accuracy/` - Accuracy evaluation examples for agents and teams.
- `agent_as_judge/` - LLM-as-judge evaluation examples with scoring and hooks.
- `performance/` - Runtime and memory performance benchmark examples.
- `performance/comparison/` - Instantiation benchmarks for non-Agno frameworks.
- `reliability/` - Tool-call reliability evaluation examples.
- `reliability/single_tool_calls/` - Reliability examples for single expected tool calls.
- `reliability/multiple_tool_calls/` - Reliability examples for multi-tool workflows.
- `reliability/team/` - Reliability examples for team tool-call flows.
- `RESTRUCTURE_PLAN.md` - Detailed restructuring plan and file dispositions.
- `RESTRUCTURE_PROMPT.md` - Implementation prompt for the restructuring task.
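
For readers new to the idea of a tool-call reliability check, the following is a rough, framework-independent sketch of the concept: compare the tool names a run was expected to call against the ones it actually called. It does not use Agno's API; `ReliabilityCheck` and `check_tool_calls` are hypothetical names invented for this illustration, and the real examples in `reliability/` may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class ReliabilityCheck:
    """Hypothetical result type: which expected tool calls were seen."""

    passed: list[str] = field(default_factory=list)
    missing: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # The check succeeds only when no expected tool call is missing.
        return not self.missing


def check_tool_calls(expected: list[str], observed: list[str]) -> ReliabilityCheck:
    """Sort each expected tool name into passed or missing.

    Assumption: order and call counts are ignored; only presence matters.
    """
    result = ReliabilityCheck()
    seen = set(observed)
    for name in expected:
        if name in seen:
            result.passed.append(name)
        else:
            result.missing.append(name)
    return result


if __name__ == "__main__":
    # Tool names here are made up for the demonstration.
    check = check_tool_calls(
        expected=["multiply", "add"],
        observed=["add", "factorial"],
    )
    print("ok:", check.ok)
    print("passed:", check.passed)
    print("missing:", check.missing)
```

A presence-only comparison keeps the sketch small; a stricter check might also compare call order or arguments.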