Back to Agno

Demo AgentOS Test Log

cookbook/01_demo/TEST_LOG.md

2.6.133.8 KB
Original Source

Demo AgentOS Test Log

Last updated: 2026-06-05

Test Environment

  • Python: .venvs/demo/bin/python
  • Model: gpt-5.5 (agents + eval judge)
  • Database: local SQLite at data/demo.db
  • Backends configured (via local .envrc): OpenAI, Parallel, Google, Git wiki (WIKI_REPO_URL + WIKI_GITHUB_TOKEN), Notion wiki (NOTION_API_KEY + NOTION_DATABASE_ID).

HTML generation requires agno's file-generation support (#8241), which ships in agno >= 2.6.12. demo_setup.sh installs the local editable agno, which includes it; a released wheel <= 2.6.11 does not register generate_html_file.


Change Summary (this session)

  • Removed the standalone FileGenerator agent; folded HTML-only file generation (settings.html_tools() -> FileGenerationTools) into the three wiki agents (LocalWiki, GitWiki, NotionWiki). CodeSearch is excluded.
  • Turned off enable_agentic_memory on all agents (cleaner responses; the wiki is the persistent store).
  • Moved assets/ -> evals/assets/ (the sample diagram is an eval fixture).

Static Checks

Status: PASS

  • ruff format + ruff check over cookbook/01_demo: clean.
  • App builds (import run): registers LocalWiki + CodeSearch; GitWiki/NotionWiki register only when their env vars are set. All three wiki agents expose generate_html_file; CodeSearch does not.

cookbook/scripts/check_cookbook_pattern.py reports advisory missing_main_gate / missing_sections — these assume standalone scripts; 01_demo is a served app, and the checker is not wired into validate.sh/CI.


Eval Suite (python -m evals)

Result this run: 6/7 (the one failure is environmental — see below).

CaseAgentJudgeReliabilityStatus
local_wiki_reports_state_honestlyLocalWikiPASSPASSPASS
local_wiki_ingests_imageLocalWikiPASSPASSPASS
local_wiki_generates_htmlLocalWikiPASSPASS
code_search_lists_registered_agentsCodeSearchPASSPASSPASS
code_search_admits_unknown_functionCodeSearchPASSPASS
git_wiki_reports_state_honestlyGitWikiPASSPASSPASS
notion_wiki_reports_state_honestlyNotionWikiFAILPASSFAIL

Core demo (LocalWiki + CodeSearch + HTML): all PASS.

HTML case is reliability-based

local_wiki_generates_html asserts generate_html_file fires. The agent reliably calls the tool and produces a valid <!doctype html> file under data/generated/. A strict response-text judge is intentionally omitted: the wiki-curator persona narrates the result as "filed/recorded a note" rather than "here is your HTML file." The file is correct; the narration wording is a known limitation (instructions + memory-mode changes did not reliably move it).

Git / Notion failure is environmental, not a code defect

These cases are env-gated and read real external backends. In this run:

  • Git: git clone of WIKI_REPO_URL failed (exit 128) — repo/branch/auth. The case still passed (agent honestly reported it could not read the wiki).
  • Notion: the local mirror data/notion-wiki was empty (the gitignored data/ dir was wiped mid-session), so the read returned nothing and the judge rejected the agent's response.

Both backends were verified reading real content earlier this session (full suite 6/6), before data/ was wiped. With a reachable git repo and a populated Notion database, both cases pass. In CI (no creds) these cases do not run.


Notes

  • The Git and Notion eval cases are read-only; running the suite does not push to a real repo or Notion database.
  • Local wiki pages, the Git clone, the Notion mirror, and generated HTML all live under data/ (gitignored); fresh clones self-seed / re-sync on setup.