Dossier: ADR-088 (LongMemEval Benchmark)

v3/docs/examples/dossiers/adr-088/adr-088.md

Generated by dossier-collect skill (ruflo-goals plugin, ADR-099)
Seed: ADR-088 · Seed type: adr · Depth: 2 · Truncated: false
Generated: 2026-05-03

Executive summary

ADR-088 establishes a reproducible benchmark for AgentDB memory retrieval against the LongMemEval dataset (ICLR 2025, 500 long-conversational-memory questions). It was prompted by MemPalace reporting 96.6% raw / 100% hybrid; the goal was to position AgentDB on the same axis. The decision touches three living artifacts: the harness in v3/@claude-flow/memory/benchmarks/longmemeval/, multiple result JSONs from runs in April-May 2026, and a chain of related ADRs. ADR-076, ADR-077, and ADR-075 supply the underlying memory-bridge, DiskANN, and learning-pipeline components; ADR-089 follows on with retrieval improvements. Recent commits show iterative improvement: the BM25+RRF hybrid reached C@1=26.8% and MRR=0.3269, the best result recorded in the git log (labeled SOTA in the commit messages).
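The hybrid technique named above combines a lexical ranking (BM25) with another ranking through Reciprocal Rank Fusion (RRF). The sketch below shows the standard RRF formula only. It is not taken from the harness: the function name, the session IDs, and the constant `k = 60` (a commonly used default) are all assumptions for illustration.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank(d)),
// where rank is 1-based. k = 60 is an assumed, commonly used default.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1;
      scores[docId] = (scores[docId] ?? 0) + 1 / (k + rank);
    });
  }
  // Sort by fused score (descending); break ties by ID for determinism.
  return Object.keys(scores).sort(
    (a, b) => scores[b] - scores[a] || a.localeCompare(b)
  );
}

// Hypothetical session IDs returned by two rankers for one question.
const bm25Ranking = ["s3", "s1", "s7"];
const denseRanking = ["s1", "s4", "s3"];

const fused = reciprocalRankFusion([bm25Ranking, denseRanking]);
console.log(fused); // → ["s1", "s3", "s4", "s7"]
```

Because RRF uses only rank positions, it needs no score normalization between the BM25 scores and embedding similarities, which is one reason it is a common choice for hybrid retrieval.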

Entity table

| Entity | Type | Key attrs | Sources |
|---|---|---|---|
| ADR-088 | adr | Status: Accepted, Date: 2026-04-08 | Read |
| LongMemEval | benchmark | ICLR 2025, 500 Qs, 6 question types | adr-text, WebSearch |
| MemPalace | external-system | 100% hybrid, 96.6% raw | adr-text |
| AgentDB | system | Ruflo's memory backend | adr-text, codebase |
| harness.ts | file | benchmark runner | Glob, Read |
| agentdb-adapter.ts | file | adapter for AgentDB | Glob |
| BM25+RRF | technique | hybrid retrieval, current SOTA | git-log |
| MiniLM | model | embedding model used | git-log |
| ADR-076 | adr | Memory Bridge (related) | adr-text |
| ADR-077 | adr | DiskANN (related) | adr-text |
| ADR-075 | adr | Learning Pipeline (related) | adr-text |
| ADR-089 | adr | retrieval improvements (follow-on) | git-log |
| OMEGA | external-system | 95.4% on LongMemEval | adr-text |
| Supermemory | external-system | ~93% on gpt-4o | adr-text |

Graph

```mermaid
graph TD
  ADR088[ADR-088] -->|benchmarks| LongMemEval
  ADR088 -->|targets| AgentDB
  ADR088 -->|prompted-by| MemPalace
  ADR088 -->|relates-to| ADR076[ADR-076]
  ADR088 -->|relates-to| ADR077[ADR-077]
  ADR088 -->|relates-to| ADR075[ADR-075]
  ADR088 -->|followed-by| ADR089[ADR-089]
  AgentDB -->|implemented-in| Harness["harness.ts"]
  Harness -->|delegates-to| Adapter["agentdb-adapter.ts"]
  ADR088 -->|adopts| BM25RRF["BM25+RRF"]
  ADR088 -->|uses| MiniLM
  LongMemEval -->|compared-against| OMEGA
  LongMemEval -->|compared-against| Supermemory
  LongMemEval -->|compared-against| MemPalace
```

Source provenance

| Round | Sources used (parallel batch) | Entities surfaced |
|---|---|---|
| 0 | Read `v3/docs/adr/ADR-088-longmemeval-benchmark.md`, Glob `v3/@claude-flow/memory/benchmarks/longmemeval/**`, Bash `git log --all -- ADR-088*` | ADR-088, LongMemEval, MemPalace, AgentDB, harness.ts, agentdb-adapter.ts, OMEGA, Supermemory |
| 1 | Bash `git log --oneline --all` (filtered "adr-088"), Grep "LongMemEval" | BM25+RRF, MiniLM, ADR-076, ADR-077, ADR-075, ADR-089 |

Recent git history (provenance for "iterative SOTA"):

  • b6ca2dd5d docs(adr-088): record smart+hybrid SOTA (C@1=26.8%, MRR=0.3269)
  • afc75cc71 bench(longmemeval): MiniLM + BM25 hybrid ablation
  • 6bbbdbe2a bench(adr-088): BM25 + RRF hybrid retrieval; new SOTA at C@1=26.8%
  • f88e99ba1 docs(adr-088): add 2026-05-01 run results + tiered optimization roadmap
  • 7331fdd5a feat: LongMemEval benchmark results and ADR-089 retrieval improvements
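The commit messages above report two retrieval metrics, C@1 and MRR. The sketch below shows how such numbers are typically computed from per-question ranks. The dossier does not define C@1, so it is treated here as the fraction of questions whose top-ranked result is relevant, which is an assumption. MRR follows the standard mean-reciprocal-rank definition. The function names and the input shape are hypothetical and not taken from the harness.

```typescript
// Each entry is the 1-based rank of the first relevant result for one
// question, or null when no relevant result was retrieved at all.
type FirstRelevantRank = number | null;

// Assumed reading of "C@1": share of questions with a relevant result at rank 1.
function hitRateAt1(ranks: FirstRelevantRank[]): number {
  if (ranks.length === 0) return 0;
  const hits = ranks.filter((r) => r === 1).length;
  return hits / ranks.length;
}

// Mean Reciprocal Rank: average of 1/rank, counting misses as 0.
function meanReciprocalRank(ranks: FirstRelevantRank[]): number {
  if (ranks.length === 0) return 0;
  let total = 0;
  for (const r of ranks) {
    total += r === null ? 0 : 1 / r;
  }
  return total / ranks.length;
}

// Hypothetical ranks for four questions.
const exampleRanks: FirstRelevantRank[] = [1, 2, null, 4];
console.log(hitRateAt1(exampleRanks)); // → 0.25
console.log(meanReciprocalRank(exampleRanks)); // → 0.4375
```

Under these definitions MRR is always at least as large as the top-1 hit rate, since every rank-1 hit contributes a full 1 and lower-ranked hits add partial credit. The reported pair (26.8% and 0.3269) is consistent with that ordering.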

Stats

  • Nodes: 14 (1 adr-seed, 4 related ADRs, 1 benchmark, 1 system, 3 external-systems, 2 files, 1 technique, 1 model)
  • Edges: 14
  • Tokens: ~1.1k
  • Wall: ~3 seconds (Read + Glob + Grep + git in one batch)

Open questions / depth-3 candidates

  • ADR-089 (retrieval improvements) is referenced but not expanded; expanding it would surface the actual algorithmic delta.
  • ADR-076 (Memory Bridge) contains the AgentDB write path; expanding it would link back to the ruflo-rag-memory plugin.
  • The run-result JSON files in the results/ subdirectory contain per-question scores that would merit a statistical summary.
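As a rough illustration of the statistical summary suggested in the last item, the sketch below groups per-question scores by question type and averages them. The schema of the result JSONs is not shown in this dossier, so the `QuestionResult` shape, its field names, and the type labels are hypothetical and would need adapting to the real files.

```typescript
// Hypothetical per-question record; the real result JSON schema may differ.
interface QuestionResult {
  questionType: string;
  reciprocalRank: number; // 0 when no relevant result was retrieved
}

interface TypeSummary {
  count: number;
  meanReciprocalRank: number;
}

// Group results by question type and compute the mean reciprocal rank per group.
function summarizeByType(results: QuestionResult[]): Record<string, TypeSummary> {
  const totals: Record<string, { count: number; sum: number }> = {};
  for (const r of results) {
    const entry = totals[r.questionType] ?? (totals[r.questionType] = { count: 0, sum: 0 });
    entry.count += 1;
    entry.sum += r.reciprocalRank;
  }
  const summary: Record<string, TypeSummary> = {};
  for (const type of Object.keys(totals)) {
    const { count, sum } = totals[type];
    summary[type] = { count, meanReciprocalRank: sum / count };
  }
  return summary;
}

// Placeholder data with made-up type labels.
const sampleResults: QuestionResult[] = [
  { questionType: "type-a", reciprocalRank: 1 },
  { questionType: "type-a", reciprocalRank: 0.5 },
  { questionType: "type-b", reciprocalRank: 0.25 },
];
const perType = summarizeByType(sampleResults);
console.log(perType["type-a"].meanReciprocalRank); // → 0.75
```

A per-type breakdown like this would show whether an aggregate gain comes from all six LongMemEval question types or from only a few of them.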