Evals Cookbook

cookbook/09_evals/README.md

This directory contains runnable examples for Agno evaluation patterns.

Directory Overview

  • accuracy/ - Accuracy evaluation examples for agents and teams.
  • agent_as_judge/ - LLM-as-judge evaluation examples with scoring and hooks.
  • performance/ - Runtime and memory performance benchmark examples.
  • performance/comparison/ - Instantiation benchmarks for non-Agno frameworks.
  • reliability/ - Tool-call reliability evaluation examples.
  • reliability/single_tool_calls/ - Reliability examples for single expected tool calls.
  • reliability/multiple_tool_calls/ - Reliability examples for multi-tool workflows.
  • reliability/team/ - Reliability examples for team tool-call flows.
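To give a feel for what a tool-call reliability check does, here is a minimal sketch in plain Python. It does not use Agno's evaluation classes; `ReliabilityResult` and `check_tool_calls` are hypothetical names chosen for illustration, and the assumption that call order is ignored while call counts matter is ours, not something this README states.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReliabilityResult:
    """Outcome of comparing expected tool calls with observed ones (hypothetical type)."""
    passed: bool
    missing: list[str]
    unexpected: list[str]


def check_tool_calls(expected: list[str], observed: list[str]) -> ReliabilityResult:
    """Compare tool names by count, ignoring the order in which they were called."""
    expected_counts = Counter(expected)
    observed_counts = Counter(observed)
    # Counter subtraction keeps only positive counts, so each difference
    # lists the calls present on one side but not matched on the other.
    missing = sorted((expected_counts - observed_counts).elements())
    unexpected = sorted((observed_counts - expected_counts).elements())
    return ReliabilityResult(
        passed=not missing and not unexpected,
        missing=missing,
        unexpected=unexpected,
    )


if __name__ == "__main__":
    # Tool names here are made up; in a real run they would be taken
    # from the agent's or team's recorded tool calls.
    result = check_tool_calls(expected=["multiply", "add"], observed=["add"])
    print(result)
```

A single-tool check is the special case where `expected` holds one name; a multi-tool or team check passes the full list of calls expected across the run.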

Root Files

  • RESTRUCTURE_PLAN.md - Detailed restructuring plan and file dispositions.
  • RESTRUCTURE_PROMPT.md - Implementation prompt for the restructuring task.