skills/mem0-test-integration/SKILL.md
Verifies what /mem0-integrate produced. Runs in the same workspace,
on the same feature branch. Loose coupling — fast, catches compile and
runtime bugs, does not catch logical errors.
All static checks and smoke-test shapes validate against these URLs.
WebFetch each before running step 3.
- @ai-sdk/*): https://raw.githubusercontent.com/mem0ai/mem0/main/skills/mem0-vercel-ai-sdk/SKILL.md
- mem0_tested_versions):
Read the Delegated skill: field in .mem0-integration/plan.md — if it
names a skill URL, fetch that skill and use its example blocks as the
reference for both static checks (step 3) and the smoke test (step 5).
Every check in this skill assumes the integration is additive and
feature-flagged (see /mem0-integrate "Integration principles").
Specifically:
- product.json must contain a feature_flag field.
- With the flag unset, the pre-existing test suite must pass exactly as it does on main. Any failure here is a hard fail; do not let the self-heal loop attempt a patch.
- On such a failure, the scorecard records non_invasive: false and sets overall: fail with a distinct reason code the integrator's heal loop refuses to touch.

Refuse to start unless ALL of the following are true:
- The .mem0-integration/ directory exists in the repo root.
- .mem0-integration/product.json, goal.md, and plan.md are readable and internally consistent (JSON parses, docs non-empty).
- The current branch starts with mem0-integrate/ (set by the companion skill). Prevents accidental runs on unrelated branches.
- The required API key is set (MEM0_API_KEY for Platform, OPENAI_API_KEY for OSS; read which from product.json). Interactive mode asks if missing; CI mode exits 2.

Exit with a written rationale on any precondition failure. Never attempt to "fix up" state.
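The preconditions above lend themselves to a single gate function. The following is a minimal sketch, not the skill's actual implementation: the function name `check_preconditions`, the `product` key inside product.json, and passing the branch name in as an argument are all assumptions made for illustration.

```python
import json
import os
from pathlib import Path

REQUIRED_DOCS = ("goal.md", "plan.md")
BRANCH_PREFIX = "mem0-integrate/"


def check_preconditions(repo_root, branch, env=None):
    """Return a list of human-readable failures; an empty list means go."""
    env = os.environ if env is None else env
    state_dir = Path(repo_root) / ".mem0-integration"
    if not state_dir.is_dir():
        return [f"{state_dir} does not exist"]

    failures = []
    try:
        product = json.loads((state_dir / "product.json").read_text())
        if not isinstance(product, dict):
            raise ValueError("top-level value is not an object")
    except (OSError, ValueError) as exc:
        failures.append(f"product.json unreadable or invalid: {exc}")
        product = {}

    for name in REQUIRED_DOCS:
        doc = state_dir / name
        if not doc.is_file() or not doc.read_text().strip():
            failures.append(f"{name} is missing or empty")

    if not branch.startswith(BRANCH_PREFIX):
        failures.append(f"branch {branch!r} does not start with {BRANCH_PREFIX!r}")

    # Assumption: product.json stores the product as "platform" or "oss".
    key = "MEM0_API_KEY" if product.get("product") == "platform" else "OPENAI_API_KEY"
    if not env.get(key):
        failures.append(f"{key} is not set")
    return failures
```

The function only reports; it never repairs state, which matches the rule that a precondition failure exits with a rationale instead of a fix-up.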
Load:
- product.json → which language, which product (Platform vs OSS), which mem0 version, write_site, read_site.
- plan.md → the mechanical contract (write pattern, read pattern, preserved behavior).
- goal.md → the intent (displayed in the scorecard only; not tested).

Route by language from product.json:
| Language | Command |
|---|---|
| Python | pip install -e . if editable, else pip install -r requirements.txt. Then pip install mem0ai if not already present at the pinned version. |
| TypeScript / JavaScript | npm install (or pnpm install / yarn install if detected by lockfile). |
If install fails → exit code 2 with stderr tail. Never move to testing if dependencies don't resolve.
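The routing in the table can be sketched as a small pure function. This is illustrative only: the name `install_commands`, the language strings, and the order in which lockfiles are checked are assumptions, not something the skill specifies.

```python
from pathlib import Path

# Lockfile -> package manager. The check order is an assumption.
LOCKFILES = (("pnpm-lock.yaml", "pnpm"), ("yarn.lock", "yarn"))


def install_commands(repo_root, language, editable=False, mem0_version=None):
    """Return the install commands to run, in order, as argv lists."""
    if language == "python":
        first = (
            ["pip", "install", "-e", "."]
            if editable
            else ["pip", "install", "-r", "requirements.txt"]
        )
        # Pinning with == lets pip skip the install when the version is present.
        pin = f"mem0ai=={mem0_version}" if mem0_version else "mem0ai"
        return [first, ["pip", "install", pin]]

    # TypeScript / JavaScript: prefer the manager whose lockfile is present.
    root = Path(repo_root)
    for lockfile, manager in LOCKFILES:
        if (root / lockfile).exists():
            return [[manager, "install"]]
    return [["npm", "install"]]
```

A caller would run each command in turn and exit with code 2 on the first non-zero status, keeping the stderr tail for the rationale.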
Import check: does the write-site file import the expected Mem0
surface? Authoritative list comes from ## Identify the User's Setup
in https://docs.mem0.ai/llms.txt:
- Platform (Python): from mem0 import MemoryClient
- Platform (TS): import MemoryClient from "mem0ai"
- OSS (Python): from mem0 import Memory
- OSS (TS): import { Memory } from "mem0ai/oss"

If plan.md names a delegated skill (e.g., Vercel AI), use that skill's import signature instead of the list above. Mismatch → fail with line number.
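One way to approximate the import check is a per-line regex scan of the write-site file. The sketch below assumes the four import shapes named in this step; the function name, the `(product, language)` keys, and the regexes themselves are illustrative, and a delegated skill's import signature would replace them.

```python
import re

# Expected import per (product, language), following the import shapes in this step.
EXPECTED_IMPORTS = {
    ("platform", "python"): r"^\s*from\s+mem0\s+import\s+.*\bMemoryClient\b",
    ("platform", "typescript"): r"^\s*import\s+MemoryClient\s+from\s+[\"']mem0ai[\"']",
    ("oss", "python"): r"^\s*from\s+mem0\s+import\s+.*\bMemory\b",
    ("oss", "typescript"): r"^\s*import\s*\{[^}]*\bMemory\b[^}]*\}\s*from\s+[\"']mem0ai/oss[\"']",
}


def find_import_line(source, product, language):
    """Return the 1-based line number of the expected import, or None."""
    pattern = re.compile(EXPECTED_IMPORTS[(product, language)])
    for lineno, line in enumerate(source.splitlines(), start=1):
        if pattern.search(line):
            return lineno
    return None
```

A `None` result corresponds to the mismatch case; the word boundaries keep `MemoryClient` from satisfying the OSS `Memory` pattern.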
Version check: installed mem0ai version falls in the range from
this skill's mem0_tested_versions. Out of range → warn but continue.
Type check (TS tracks only): run tsc --noEmit or tsup --dts.
Non-zero → fail.
Lint (if the repo has a linter configured): run the repo's own lint command. Lint failures from this skill's changes → fail; pre-existing lint failures → surface as a warning.
Eager-init check: grep the write_site and read_site files (paths
from product.json) for MemoryClient( or Memory( at module scope —
i.e., not inside a function, method, or class body. MemoryClient()
validates the API key in __init__ (network call) and OSS Memory()
can eagerly initialize embedding/LLM providers — module-level
instantiation hits the wire on import and breaks Pass A's test
collection whenever the key is unset. Hit → fail with file:line and
the lazy-init guidance from /mem0-integrate step 8 constraint #7.
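For Python files, a plain grep cannot tell module scope from function scope, so a syntax-tree walk is a more reliable way to run the eager-init check. The following is a sketch under that assumption, using only the standard `ast` module; the helper names are hypothetical, and TypeScript sources would need a different parser.

```python
import ast

CLIENT_NAMES = {"MemoryClient", "Memory"}
SCOPED = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Lambda)


def _called_name(call):
    """Return the constructor name for MemoryClient(...) or mem0.Memory(...)."""
    func = call.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def module_scope_instantiations(source):
    """Return sorted (lineno, name) pairs for clients built at module scope."""
    hits = []

    def visit(node):
        for child in ast.iter_child_nodes(node):
            # Skip function, method, and class bodies, as the check specifies.
            if isinstance(child, SCOPED):
                continue
            if isinstance(child, ast.Call) and _called_name(child) in CLIENT_NAMES:
                hits.append((child.lineno, _called_name(child)))
            visit(child)

    visit(ast.parse(source))
    return sorted(hits)
```

Each hit maps directly to the file:line the failure message needs. Skipping whole definitions also skips decorators and default arguments, which run at import time, so this sketch can under-report in those rare cases.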
| Language | Test command (in priority order) |
|---|---|
| Python | pytest with the test files from step 5 of the companion skill, else python -m unittest discover. |
| TypeScript / JavaScript | npm test if defined in package.json; else auto-detect vitest or jest. |
Pass A — feature_flag unset. Run the entire pre-existing suite
(excluding the new test_mem0_* files). Must be 100% green. Any
failure here marks non_invasive: false in the scorecard and is
a hard fail — the integrator's self-heal loop refuses to touch it.
Pass B — feature_flag set (value from product.json). Run the
full suite including the new tests. All must pass.
Isolate integration-introduced failures using git diff main..HEAD --name-only. A test file that exists on main and fails only under
the integration branch (flag set or unset) counts against the
scorecard regardless of pass. A test file that already failed on main
is surfaced as pre_existing_unrelated and does not count — but is
still reported so the user can clean it up.
Capture output to .mem0-integration/test-stdout-flag-off.log and
.mem0-integration/test-stdout-flag-on.log. Scorecard reports pass/fail
per pass.
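The bookkeeping behind the two passes can be sketched as set arithmetic over failing test files. This assumes the runner has already collected three lists (failures with the flag unset, failures with the flag set, and failures observed on main); how those lists are gathered, and the function name, are assumptions for illustration.

```python
def classify_failures(failed_flag_off, failed_flag_on, failed_on_main):
    """Split failing test files into integration-introduced and pre-existing."""
    failed_on_main = set(failed_on_main)
    branch_failures = set(failed_flag_off) | set(failed_flag_on)
    return {
        # Fails on the branch (either pass) but not on main: counts against the scorecard.
        "introduced": sorted(branch_failures - failed_on_main),
        # Already failing on main: reported, but does not count.
        "pre_existing_unrelated": sorted(branch_failures & failed_on_main),
        # A new failure with the flag unset breaks non-invasiveness.
        "non_invasive": not (set(failed_flag_off) - failed_on_main),
    }
```

Treating a Pass A failure that already existed on main as non-blocking is one reading of the rules above; a stricter runner could mark any Pass A failure as a violation.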
Scripted end-to-end flow tailored to product.json. The call shapes
below are the minimal ones; if plan.md names a delegated skill, use
that skill's minimal example verbatim instead — it is the canonical
shape for the detected stack.
Platform (Python):
```python
import os

from mem0 import MemoryClient

c = MemoryClient()  # uses MEM0_API_KEY
uid = f"mem0-test-integration-{os.urandom(4).hex()}"  # disposable user_id
c.add([{"role": "user", "content": "I prefer aisle seats"}], user_id=uid)
hits = c.search("seat preference", user_id=uid)
assert any("aisle" in h.get("memory", "") for h in hits), hits
c.delete_all(user_id=uid)  # clean up
```
Platform (TS): same shape with MemoryClient from "mem0ai".
OSS (Python / TS): uses Memory() / new Memory() with default config
(OpenAI LLM via OPENAI_API_KEY, local Qdrant). If the repo ships a
docker-compose.yml with a Qdrant service, the skill starts it first and
tears it down after. If no backing store is reachable → fail with a
clear message naming the fix.
The smoke test always uses a disposable random user_id prefixed with
mem0-test-integration- so a failed cleanup doesn't pollute the user's
real data. A background tidy step deletes any prefix-matching entries
older than 24 hours on the next run.
Capture output to .mem0-integration/smoke-stdout.log.
Unit tests + smoke prove the SDK works in isolation. This step is the real signal: does memory actually appear in the app's user-visible output when the integration runs end-to-end?
Requires plan.md to contain an E2E recipe: section (authored by
/mem0-integrate step 5). If absent → status skipped (not fail),
note in scorecard that the repo has no runnable entry point.
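The pass gate for this step is the recipe's read_assert field, which takes a plain substring, regex=..., or jsonpath=<expr>=<value>. A minimal matcher might look like the sketch below. It is an illustration only: the function name is hypothetical, and the jsonpath branch handles just simple dotted paths such as $.reply.seat, not the full JSONPath syntax.

```python
import json
import re


def read_assert_matches(spec, stdout):
    """Return True if read_call's stdout satisfies the read_assert spec."""
    if spec.startswith("regex="):
        return re.search(spec[len("regex="):], stdout) is not None

    if spec.startswith("jsonpath="):
        # Assumption: only dotted paths are supported; the last "=" splits
        # the path expression from the expected value.
        expr, _, expected = spec[len("jsonpath="):].rpartition("=")
        try:
            node = json.loads(stdout)
        except ValueError:
            return False
        for key in expr.lstrip("$").strip(".").split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            elif isinstance(node, list) and key.isdigit() and int(key) < len(node):
                node = node[int(key)]
            else:
                return False
        return str(node) == expected

    return spec in stdout  # plain substring
```

For example, `read_assert_matches("aisle", output)` covers the common case where the stored preference should simply show up in the app's reply.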
Recipe fields the skill reads:
- start: shell command to launch the app using $PORT for any network port. Run in background with stdout/stderr teed to .mem0-integration/e2e-app.log.
- ready_probe: how to detect readiness. url=... status=... polls an HTTP endpoint; log="..." waits for a substring in e2e-app.log; sleep=N waits N seconds (last resort). 60-second hard timeout.
- compose_services: optional. If set, bring them up via docker compose up -d <services> before start, tear them down with docker compose down at the end.
- write_call: triggers the Mem0 write path exactly once. Output is captured and surfaced on failure. 60-second hard timeout.
- write_async_wait_ms: pause after write_call to let async memory flushes land. Default 0.
- read_call: triggers the Mem0 read path. Typically a fresh session or new request that should surface the stored memory.
- read_assert: substring, regex=..., or jsonpath=<expr>=<value> that must appear in read_call's stdout. This is the E2E pass gate.

Execution order:
1. Pick a free port and export it as PORT.
2. Set MEM0_USER_ID to a disposable mem0-test-integration-<rand> value and export it, so the app can use the same scoping the smoke test does if the recipe wants cleanup.
3. Bring up compose_services if named.
4. Run start in the background.
5. Poll ready_probe until success or 60s timeout. Timeout → fail.
6. Run write_call. Non-zero exit → fail (but continue to cleanup).
7. Sleep for write_async_wait_ms.
8. Run read_call.
9. Check read_assert against read_call's stdout. Miss → fail.
10. Cleanup: docker compose down if services were started, delete_all memories matching mem0-test-integration-* on Platform scenarios.

On any failure, the scorecard includes:
- e2e-app.log
- write_call output
- read_call output
- read_assert

Write .mem0-integration/scorecard.md and .mem0-integration/scorecard.json:
```json
{
  "timestamp": "2026-04-20T14:03:11Z",
  "branch": "mem0-integrate/remember-user-preferences",
  "product": "platform",
  "language": "python",
  "mem0_version": "2.0.0",
  "non_invasive": true,
  "feature_flag": "MEM0_ENABLED",
  "results": {
    "install": {"status": "pass", "duration_ms": 12043},
    "static_checks": {"status": "pass", "duration_ms": 812},
    "unit_tests_flag_off": {"status": "pass", "duration_ms": 3920, "count": 47,
                            "reason": "all pre-existing tests green with flag unset"},
    "unit_tests_flag_on": {"status": "pass", "duration_ms": 4321, "count": 49},
    "smoke_test": {"status": "pass", "duration_ms": 2890, "memory_id": "mem_..."},
    "e2e_test": {"status": "pass", "duration_ms": 14200,
                 "ready_probe_ms": 3100, "write_exit": 0,
                 "read_assert_matched": true}
  },
  "friction": {
    "dependency_install_retries": 0,
    "pre_existing_test_failures": 0,
    "warnings": ["mem0ai 2.0.0 pinned; consider 2.0.1 for fix X"]
  },
  "overall": "pass"
}
```
The markdown version is human-readable and includes:
- .mem0-integration/, which is gitignored. The user can inspect and optionally pin.

Files under .mem0-integration/:

| File | Purpose | Retention |
|---|---|---|
| scorecard.md | Human-readable verdict. | Overwritten per run. |
| scorecard.json | Machine-readable verdict. Consumed by the CI scorecard workflow later. | Overwritten per run. |
| test-stdout-flag-off.log | Step 4 Pass A (pre-existing suite, flag unset). | Overwritten per run. |
| test-stdout-flag-on.log | Step 4 Pass B (full suite, flag set). | Overwritten per run. |
| smoke-stdout.log | Full output from step 5. | Overwritten per run. |
| e2e-app.log | Background app stdout/stderr from step 6. | Overwritten per run. |
| e2e-calls.log | write_call + read_call invocations and outputs. | Overwritten per run. |
| Mode | Trigger | Behavior |
|---|---|---|
| Interactive (default) | TTY present, MEM0_TEST_CI unset | Asks for missing keys, prints friendly summaries. |
| CI | MEM0_TEST_CI=1 | Keys must be in env, no prompts, non-zero exit on any fail. JSON scorecard goes to stdout's tail for workflow parsing. |
```
/mem0-test-integration               # interactive, all steps
/mem0-test-integration --ci          # non-interactive
/mem0-test-integration --skip-smoke  # no API calls, no E2E
/mem0-test-integration --skip-e2e    # unit + smoke only (faster CI)
/mem0-test-integration --only-smoke  # just smoke
/mem0-test-integration --only-e2e    # just E2E (assumes deps installed)
```
Composition: --skip-* can stack (--skip-smoke --skip-e2e = static +
unit only, zero API cost). --only-* is mutually exclusive with all
other flags.
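The composition rule can be enforced up front, before any step runs. The sketch below is an assumption about how such validation might look; the function name and the choice to raise `ValueError` are illustrative, while the flag names come from the usage lines above.

```python
KNOWN_FLAGS = {
    "--ci",
    "--skip-smoke",
    "--skip-e2e",
    "--only-smoke",
    "--only-e2e",
}


def validate_flags(flags):
    """Return the flags unchanged, or raise ValueError on an invalid combination."""
    unknown = [f for f in flags if f not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown flag: {unknown[0]}")
    only = [f for f in flags if f.startswith("--only-")]
    # --only-* is mutually exclusive with all other flags; --skip-* may stack.
    if only and len(flags) > 1:
        raise ValueError(f"{only[0]} cannot be combined with other flags")
    return flags
```

Rejecting the combination early keeps a mistyped CI invocation from silently running a different set of steps than intended.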
| Code | Meaning |
|---|---|
| 0 | All checks passed. |
| 1 | Precondition failed (no .mem0-integration/, wrong branch, dirty tree). |
| 2 | Missing env key (CI mode) or dependency install failure. |
| 3 | Static sanity check failed (wrong import, type error). |
| 4 | Unit tests failed (Pass B — integration itself broken). |
| 5 | Smoke test failed. |
| 6 | E2E test failed (ready_probe timeout, write/read call failed, or read_assert miss). |
| 7 | Non-invasiveness violation: Pass A failed (pre-existing tests broke). Integrator's heal loop refuses to touch this. |
| 8 | Internal error (skill bug — report it). |
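A CI wrapper that only has scorecard.json can recover most of this table from the `results` block. The mapping below is a sketch: the step keys and status strings follow the sample scorecard, but the precedence when several steps fail at once is an assumption, and codes 1 and 8 are not derivable from the scorecard at all.

```python
# Checked in this order; the first failing step decides the exit code.
# The precedence between simultaneous failures is an assumption.
STEP_EXIT_CODES = (
    ("install", 2),
    ("static_checks", 3),
    ("unit_tests_flag_off", 7),
    ("unit_tests_flag_on", 4),
    ("smoke_test", 5),
    ("e2e_test", 6),
)


def exit_code(scorecard):
    """Map a scorecard dict (scorecard.json shape) to an exit code."""
    # A non-invasiveness violation outranks everything else.
    if scorecard.get("non_invasive") is False:
        return 7
    results = scorecard.get("results", {})
    for step, code in STEP_EXIT_CODES:
        if results.get(step, {}).get("status") == "fail":
            return code
    return 0
```

A step with status skipped, such as an E2E run with no recipe, does not change the exit code in this sketch.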
- To fix a failure, re-run /mem0-integrate on the same goal + plan; do not hand-patch.
- Does not verify that the integration scopes user_id correctly across real users, or handles conflict resolution well. That's human review territory.
- The /mem0-integrate skill in its default --heal mode consumes the scorecard produced here and drives its own remediation loop. Exit code 7 (non-invasiveness violation) is the explicit signal the heal loop must stop and surface to the user.
- Does not do main baseline diffing. The scorecard reflects this branch only.