mem0-test-integration

Verifies what /mem0-integrate produced. Runs in the same workspace, on the same feature branch. The checks are loosely coupled to the integration. They are fast and catch compile-time and runtime bugs, but they do not catch logical errors.

Canonical sources (use these, not ambient knowledge)

All static checks and smoke-test shapes validate against these URLs. WebFetch each before running step 3.

Scope-tagged docs index: https://docs.mem0.ai/llms.txt
OpenAPI (Platform REST): https://docs.mem0.ai/openapi.json
Published SDK skill (canonical call patterns): https://raw.githubusercontent.com/mem0ai/mem0/main/skills/mem0/SKILL.md
Vercel AI SDK skill (if the target repo uses @ai-sdk/*): https://raw.githubusercontent.com/mem0ai/mem0/main/skills/mem0-vercel-ai-sdk/SKILL.md
SDK source (cross-check version against frontmatter mem0_tested_versions):
- Repo root: https://github.com/mem0ai/mem0
- Python: https://github.com/mem0ai/mem0/tree/main/mem0
- TypeScript: https://github.com/mem0ai/mem0/tree/main/mem0-ts

Read the Delegated skill: field in .mem0-integration/plan.md — if it names a skill URL, fetch that skill and use its example blocks as the reference for both static checks (step 3) and the smoke test (step 5).

Non-invasiveness contract

Every check in this skill assumes the integration is additive and feature-flagged (see /mem0-integrate "Integration principles"). Specifically:

product.json must contain a feature_flag field.
Steps 4–6 run in two passes:
- Pass A — flag unset. All pre-existing tests must pass, smoke/E2E skip. The repo must behave like main. Any failure here is a hard fail — do not let the self-heal loop attempt a patch.
- Pass B — flag set. New tests must pass, smoke and E2E run.
If Pass A fails, the scorecard marks non_invasive: false and sets overall: fail with a distinct reason code (exit code 7) that the integrator's heal loop refuses to touch.
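Below is a minimal sketch of how the two pass outcomes could map onto scorecard fields. It assumes each pass is reduced to a single boolean covering its test suite only. The function name `classify_passes` and the reason strings are hypothetical. Only `non_invasive`, `overall`, and exit codes 4 and 7 come from this skill.

```python
from typing import Optional, TypedDict


class Verdict(TypedDict):
    non_invasive: bool
    overall: str
    reason: Optional[str]   # hypothetical reason code, not a documented value
    exit_code: int


def classify_passes(pass_a_ok: bool, pass_b_ok: bool) -> Verdict:
    """Combine Pass A (flag unset) and Pass B (flag set) test results."""
    if not pass_a_ok:
        # Pre-existing behavior broke with the flag off: hard fail, heal loop must not patch.
        return {"non_invasive": False, "overall": "fail",
                "reason": "non_invasive_violation", "exit_code": 7}
    if not pass_b_ok:
        # Flag-on failures are ordinary integration bugs (exit 4).
        return {"non_invasive": True, "overall": "fail",
                "reason": "unit_tests_flag_on", "exit_code": 4}
    return {"non_invasive": True, "overall": "pass", "reason": None, "exit_code": 0}
```

Pass A is checked first on purpose. A flag-off failure must never be reported as an ordinary unit-test failure, because that would invite the heal loop to patch it.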

Preconditions

Refuse to start unless ALL of the following are true:

.mem0-integration/ directory exists in the repo root.
.mem0-integration/product.json, goal.md, and plan.md are readable and internally consistent (JSON parses, docs non-empty).
Current branch name begins with mem0-integrate/ (set by the companion skill). Prevents accidental runs on unrelated branches.
Working tree is clean. The skill never modifies source files; any dirty state means the integration is mid-edit and not ready to verify.
The same API key the integration used is available in the environment (MEM0_API_KEY for Platform, OPENAI_API_KEY for OSS — read which from product.json). Interactive mode asks if missing; CI mode exits 2.

Exit with a written rationale on any precondition failure. Never attempt to "fix up" state.
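The precondition list above can be sketched as one function that collects every failure instead of stopping at the first. This is an illustration under stated assumptions. The caller is assumed to supply the branch name (e.g. from `git rev-parse --abbrev-ref HEAD`) and the output of `git status --porcelain`. It also assumes product.json carries a `"product"` field whose value is `"platform"` or `"oss"` (the scorecard example shows `"platform"`; `"oss"` is a guess). `check_preconditions` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import List, Mapping


def check_preconditions(root: Path, branch: str, porcelain: str,
                        env: Mapping[str, str]) -> List[str]:
    """Return human-readable precondition failures; an empty list means OK to start."""
    d = root / ".mem0-integration"
    if not d.is_dir():
        return [".mem0-integration/ directory is missing"]
    failures: List[str] = []
    try:
        product = json.loads((d / "product.json").read_text())
    except (OSError, json.JSONDecodeError) as exc:
        product = {}
        failures.append(f"product.json unreadable: {exc}")
    if not isinstance(product, dict):
        failures.append("product.json is not a JSON object")
        product = {}
    for name in ("goal.md", "plan.md"):
        p = d / name
        if not p.is_file() or not p.read_text().strip():
            failures.append(f"{name} is missing or empty")
    if not branch.startswith("mem0-integrate/"):
        failures.append(f"branch {branch!r} does not start with mem0-integrate/")
    if porcelain.strip():
        failures.append("working tree is dirty")
    # Assumption: "product" is "platform" or "oss" in product.json.
    kind = product.get("product")
    key = {"platform": "MEM0_API_KEY", "oss": "OPENAI_API_KEY"}.get(kind)
    if key is None:
        failures.append(f"product.json has unknown product {kind!r}")
    elif not env.get(key):
        failures.append(f"{key} is not set")
    return failures
```

Returning the whole list lets the written rationale name every problem at once. Fixing one precondition and re-running would otherwise reveal the next one.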

Pipeline

1. Read the contract

Load:

product.json → which language, which product (Platform vs OSS), which mem0 version, write_site, read_site.
plan.md → the mechanical contract (write pattern, read pattern, preserved behavior).
goal.md → the intent (displayed in the scorecard only; not tested).
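A sketch of loading the three contract files into one object follows. It assumes product.json uses the keys `language`, `product`, and `mem0_version`, mirroring the scorecard.json example, plus `write_site`, `read_site`, and `feature_flag`, which this skill names explicitly. `Contract` and `load_contract` are hypothetical names.

```python
import json
from dataclasses import dataclass
from pathlib import Path

# Assumed key names; only write_site, read_site and feature_flag are named by this skill.
REQUIRED = ("language", "product", "mem0_version", "write_site", "read_site", "feature_flag")


@dataclass(frozen=True)
class Contract:
    product: dict   # parsed product.json
    plan: str       # mechanical contract: write pattern, read pattern, preserved behavior
    goal: str       # intent; shown in the scorecard only, never tested


def load_contract(root: Path) -> Contract:
    d = root / ".mem0-integration"
    product = json.loads((d / "product.json").read_text())
    missing = [k for k in REQUIRED if k not in product]
    if missing:
        raise ValueError(f"product.json is missing fields: {', '.join(missing)}")
    return Contract(product, (d / "plan.md").read_text(), (d / "goal.md").read_text())
```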

2. Install dependencies

Route by language from product.json:

Language	Command
Python	`pip install -e .` if the repo is an installable package (pyproject.toml or setup.py present), else `pip install -r requirements.txt`. Then `pip install mem0ai` if it is not already installed at the pinned version.
TypeScript / JavaScript	`npm install` (or `pnpm install` / `yarn install` if detected by lockfile).

If install fails → exit code 2 with stderr tail. Never move to testing if dependencies don't resolve.
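The routing table above can be sketched as a function that returns the commands to run. It assumes that "editable" means a pyproject.toml or setup.py exists and that product.json spells languages as `python`, `typescript`, or `javascript`. Both are assumptions. Installing `mem0ai==<version>` unconditionally is safe because pip reports "Requirement already satisfied" when the pinned version is already present. `install_commands` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def install_commands(root: Path, language: str, mem0_version: str) -> List[List[str]]:
    """Pick install commands from the repo layout (argv lists for subprocess.run)."""
    if language == "python":
        if (root / "pyproject.toml").exists() or (root / "setup.py").exists():
            cmds = [["pip", "install", "-e", "."]]
        else:
            cmds = [["pip", "install", "-r", "requirements.txt"]]
        # Pin mem0ai to the version product.json records; a no-op if already satisfied.
        cmds.append(["pip", "install", f"mem0ai=={mem0_version}"])
        return cmds
    if language in ("typescript", "javascript"):
        # Lockfile decides the package manager; npm is the fallback.
        if (root / "pnpm-lock.yaml").exists():
            return [["pnpm", "install"]]
        if (root / "yarn.lock").exists():
            return [["yarn", "install"]]
        return [["npm", "install"]]
    raise ValueError(f"unsupported language: {language}")
```

Each command would then run via `subprocess.run(cmd, cwd=root, check=True, capture_output=True, text=True)`. A `CalledProcessError` maps to exit code 2, with the tail of its `stderr`.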

3. Static sanity checks (fast, local, no API calls)

Import check: does the write-site file import the expected Mem0 surface? Authoritative list comes from ## Identify the User's Setup in https://docs.mem0.ai/llms.txt:
- Platform Python → from mem0 import MemoryClient
- Platform TS → import MemoryClient from "mem0ai"
- OSS Python → from mem0 import Memory
- OSS TS → import { Memory } from "mem0ai/oss"
If plan.md names a delegated skill (e.g., Vercel AI), use that skill's import signature instead of the list above. Mismatch → fail with line number.
Version check: installed mem0ai version falls in the range from this skill's mem0_tested_versions. Out of range → warn but continue.
Type check (TS tracks only): run tsc --noEmit or tsup --dts. Non-zero → fail.
Lint (if the repo has a linter configured): run the repo's own lint command. Lint failures in code changed by the integration → fail; pre-existing lint failures → surface as a warning.
Eager-init check: grep the write_site and read_site files (paths from product.json) for MemoryClient( or Memory( at module scope — i.e., not inside a function, method, or class body. MemoryClient() validates the API key in __init__ (network call) and OSS Memory() can eagerly initialize embedding/LLM providers — module-level instantiation hits the wire on import and breaks Pass A's test collection whenever the key is unset. Hit → fail with file:line and the lazy-init guidance from /mem0-integrate step 8 constraint #7.

4. Run the repo's native test suite (two passes)

Language	Test command (in priority order)
Python	`pytest` with the test files from step 5 of the companion skill, else `python -m unittest discover`.
TypeScript / JavaScript	`npm test` if defined in package.json; else auto-detect `vitest` or `jest`.

Pass A — feature_flag unset. Run the entire pre-existing suite (excluding the new test_mem0_* files). Must be 100% green. Any failure here marks non_invasive: false in the scorecard and is a hard fail — the integrator's self-heal loop refuses to touch it.

Pass B — feature_flag set (value from product.json). Run the full suite including the new tests. All must pass.

Isolate integration-introduced failures using git diff main..HEAD --name-only. A test file that exists on main and fails only on the integration branch (flag set or unset) counts against the scorecard regardless of which pass it fails in. A test file that already fails on main is reported as pre_existing_unrelated and does not count, but it is still listed so the user can clean it up.
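One way to express the attribution rule is a small set computation. This is a sketch under an assumed interpretation. A failing test counts unless it already fails on main and was not itself touched by the integration, as judged from the `git diff main..HEAD --name-only` output. How the "already fails on main" set is obtained is left open. `attribute_failures` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set


def attribute_failures(branch_failures: Iterable[str], main_failures: Set[str],
                       changed_files: Set[str]) -> Dict[str, List[str]]:
    """Split failing test files into counted vs pre_existing_unrelated.

    branch_failures: test files failing on this branch (either pass)
    main_failures:   test files that already fail on main
    changed_files:   paths from `git diff main..HEAD --name-only`
    """
    out: Dict[str, List[str]] = {"counted": [], "pre_existing_unrelated": []}
    for f in sorted(set(branch_failures)):
        if f in main_failures and f not in changed_files:
            out["pre_existing_unrelated"].append(f)  # reported, not counted
        else:
            out["counted"].append(f)                 # new failure, or a test the integration edited
    return out
```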

Capture output to .mem0-integration/test-stdout-flag-off.log and .mem0-integration/test-stdout-flag-on.log. Scorecard reports pass/fail per pass.

5. Smoke test (real API call, shortest round-trip)

Scripted end-to-end flow tailored to product.json. The call shapes below are the minimal ones; if plan.md names a delegated skill, use that skill's minimal example verbatim instead — it is the canonical shape for the detected stack.

Platform (Python):

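import os
from mem0 import MemoryClient

c = MemoryClient()                                # reads MEM0_API_KEY from the environment
uid = f"mem0-test-integration-{os.urandom(4).hex()}"
try:
    c.add([{"role": "user", "content": "I prefer aisle seats"}], user_id=uid)
    hits = c.search("seat preference", user_id=uid)
    # Depending on SDK version, search returns a list or {"results": [...]}
    results = hits.get("results", []) if isinstance(hits, dict) else hits
    assert any("aisle" in h.get("memory", "") for h in results), hits
finally:
    c.delete_all(user_id=uid)                     # clean up even if the assert fails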

Platform (TS): same shape with MemoryClient from "mem0ai".

OSS (Python / TS): uses Memory() / new Memory() with default config (OpenAI LLM via OPENAI_API_KEY, local Qdrant). If the repo ships a docker-compose.yml with a Qdrant service, the skill starts it first and tears it down after. If no backing store is reachable → fail with a clear message naming the fix.

The smoke test always uses a disposable random user_id prefixed with mem0-test-integration- so that a failed cleanup doesn't pollute the user's real data. On the next run, a tidy step deletes any prefix-matching entries older than 24 hours.
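Below is a sketch of the disposable-ID scheme and the selection half of the tidy step. It assumes each candidate entry is a mapping with a `user_id` and a timezone-aware ISO-8601 `created_at`. That entry shape is hypothetical, and how entries are listed from Mem0 is left open here. Each returned ID would then be passed to `delete_all(user_id=...)`, the same cleanup call the smoke test uses.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Mapping, Optional

PREFIX = "mem0-test-integration-"


def new_test_user_id() -> str:
    """Disposable, collision-resistant user_id for one smoke or E2E run."""
    return f"{PREFIX}{os.urandom(4).hex()}"


def stale_test_user_ids(entries: Iterable[Mapping[str, str]],
                        now: Optional[datetime] = None,
                        max_age: timedelta = timedelta(hours=24)) -> List[str]:
    """IDs with the test prefix whose entries are older than max_age."""
    now = now or datetime.now(timezone.utc)
    stale = set()
    for e in entries:
        uid = e.get("user_id", "")
        if not uid.startswith(PREFIX):
            continue  # never touch real users' data
        # Assumes timezone-aware timestamps such as "2026-04-20T14:03:11Z".
        created = datetime.fromisoformat(e["created_at"].replace("Z", "+00:00"))
        if now - created > max_age:
            stale.add(uid)
    return sorted(stale)
```

The prefix check comes before any date handling. An entry that does not belong to the test namespace is therefore skipped even if its timestamp is malformed.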

Capture output to .mem0-integration/smoke-stdout.log.

6. E2E integration test (run the app, exercise the flow)

Unit tests + smoke prove the SDK works in isolation. This step is the real signal: does memory actually appear in the app's user-visible output when the integration runs end-to-end?

Requires plan.md to contain an E2E recipe: section (authored by /mem0-integrate step 5). If absent → status skipped (not fail), note in scorecard that the repo has no runnable entry point.

Recipe fields the skill reads:

start — shell command to launch the app using $PORT for any network port. Run in background with stdout/stderr teed to .mem0-integration/e2e-app.log.
ready_probe — how to detect readiness. url=... status=... polls an HTTP endpoint; log="..." waits for a substring in e2e-app.log; sleep=N waits N seconds (last resort). 60-second hard timeout.
compose_services — optional. If set, bring them up via docker compose up -d <services> before start, tear them down with docker compose down at the end.
write_call — triggers the Mem0 write path exactly once. Output is captured and surfaced on failure. 60-second hard timeout.
write_async_wait_ms — pause after write_call to let async memory flushes land. Default 0.
read_call — triggers the Mem0 read path. Typically a fresh session or new request that should surface the stored memory.
read_assert — substring, regex=..., or jsonpath=<expr>=<value> that must appear in read_call's stdout. This is the E2E pass gate.
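The three read_assert forms can be evaluated with the standard library. This sketch assumes the jsonpath form uses only a dotted subset (`$.a.b.0.c`, where integer segments index lists), so the first `=` after the expression separates it from the expected value. Full JSONPath filters would need a real JSONPath library. Non-string values are compared in their JSON spelling, so a boolean matches `true`. `read_assert_matches` is a hypothetical name.

```python
import json
import re
from typing import Any


def _dotted_get(doc: Any, expr: str) -> Any:
    """Tiny JSONPath subset: $.a.b.0.c (dots only; integer segments index lists)."""
    cur = doc
    path = expr[1:] if expr.startswith("$") else expr
    for seg in filter(None, path.split(".")):
        cur = cur[int(seg)] if isinstance(cur, list) else cur[seg]
    return cur


def read_assert_matches(spec: str, stdout: str) -> bool:
    """E2E pass gate: substring, regex=<pattern>, or jsonpath=<expr>=<value>."""
    if spec.startswith("regex="):
        return re.search(spec[len("regex="):], stdout) is not None
    if spec.startswith("jsonpath="):
        expr, _, expected = spec[len("jsonpath="):].partition("=")
        try:
            actual = _dotted_get(json.loads(stdout), expr)
        except (ValueError, KeyError, IndexError, TypeError):
            return False  # unparsable output or missing path counts as a miss
        return (actual if isinstance(actual, str) else json.dumps(actual)) == expected
    return spec in stdout  # plain substring
```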

Execution order:

Allocate an ephemeral TCP port; export as PORT.
Set MEM0_USER_ID to a disposable mem0-test-integration-<rand> value and export it, so the app can use the same scoping the smoke test does if the recipe wants cleanup.
Bring up compose_services if named.
Run start in the background.
Poll ready_probe until success or 60s timeout. Timeout → fail.
Run write_call. Non-zero exit → fail (but continue to cleanup).
Sleep write_async_wait_ms.
Run read_call.
Evaluate read_assert against read_call's stdout. Miss → fail.
Cleanup (always, even on failure): SIGTERM the app, SIGKILL after 5s, docker compose down if services were started, delete_all memories matching mem0-test-integration-* on Platform scenarios.
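The process-lifecycle parts of this sequence (ephemeral port, polling with a hard timeout, and SIGTERM followed by SIGKILL after 5s) can be sketched with `socket`, `subprocess`, and `time`. Only the `log="..."` probe variant is shown. The helper names are hypothetical. The app would be launched with something like `subprocess.Popen(start, shell=True, env={**os.environ, "PORT": str(port), "MEM0_USER_ID": uid}, stdout=log_file, stderr=subprocess.STDOUT)`. Because `shell=True` makes the shell the direct child, a real runner may prefer `start_new_session=True` plus `os.killpg` so grandchildren are stopped too.

```python
import socket
import subprocess
import time
from pathlib import Path
from typing import Callable


def free_port() -> int:
    """Ask the OS for an ephemeral TCP port (it could be taken again before the app binds)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def wait_until(probe: Callable[[], bool], timeout_s: float = 60.0,
               interval_s: float = 0.5) -> bool:
    """Poll probe until it returns True or the hard timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False


def log_contains(log: Path, needle: str) -> Callable[[], bool]:
    """ready_probe log="..." variant: substring appears in e2e-app.log."""
    return lambda: log.exists() and needle in log.read_text(errors="replace")


def stop_app(proc: subprocess.Popen, grace_s: float = 5.0) -> None:
    """SIGTERM, then SIGKILL if the app is still alive after grace_s."""
    proc.terminate()
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
```

`stop_app` belongs in a `finally:` block around the write, read, and assert steps so that cleanup runs on every failure path.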

On any failure, the scorecard includes:

Last 40 lines of e2e-app.log
Full write_call output
Full read_call output
The expected vs actual for read_assert
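Both the "last 40 lines of e2e-app.log" here and the log tail printed in step 8 can come from a bounded `collections.deque`. That keeps only the final lines in memory even for large logs. `tail` is a hypothetical helper name.

```python
from collections import deque
from pathlib import Path
from typing import List


def tail(path: Path, n: int = 40) -> List[str]:
    """Last n lines of a log file; empty list if the file was never created."""
    if not path.exists():
        return []
    with path.open("r", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```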

7. Scorecard

Write .mem0-integration/scorecard.md and .mem0-integration/scorecard.json:

{
  "timestamp": "2026-04-20T14:03:11Z",
  "branch": "mem0-integrate/remember-user-preferences",
  "product": "platform",
  "language": "python",
  "mem0_version": "2.0.0",
  "non_invasive": true,
  "feature_flag": "MEM0_ENABLED",
  "results": {
    "install":      {"status": "pass", "duration_ms": 12043},
    "static_checks":{"status": "pass", "duration_ms": 812},
    "unit_tests_flag_off": {"status": "pass", "duration_ms": 3920, "count": 47,
                            "reason": "all pre-existing tests green with flag unset"},
    "unit_tests_flag_on":  {"status": "pass", "duration_ms": 4321, "count": 49},
    "smoke_test":   {"status": "pass", "duration_ms": 2890, "memory_id": "mem_..."},
    "e2e_test":     {"status": "pass", "duration_ms": 14200,
                     "ready_probe_ms": 3100, "write_exit": 0,
                     "read_assert_matched": true}
  },
  "friction": {
    "dependency_install_retries": 0,
    "pre_existing_test_failures": 0,
    "warnings": ["mem0ai 2.0.0 pinned; consider 2.0.1 for fix X"]
  },
  "overall": "pass"
}
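The next sketch shows how the derived fields in the example above could be computed from per-step results. It assumes three statuses: `pass`, `fail`, and `skipped`. A skipped step (e.g. no E2E recipe) does not fail the run. `non_invasive` is false only when Pass A actually failed. The function names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def finalize_scorecard(card: Dict[str, Any]) -> Dict[str, Any]:
    """Fill non_invasive, overall and timestamp from card["results"]."""
    results = card["results"]
    flag_off = results.get("unit_tests_flag_off", {}).get("status")
    card["non_invasive"] = flag_off != "fail"   # false only if Pass A was run and failed
    failed = [k for k, r in results.items() if r.get("status") == "fail"]
    card["overall"] = "fail" if failed else "pass"   # "skipped" never fails the run
    card.setdefault("timestamp", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    return card


def write_scorecard(card: Dict[str, Any], out_dir: Path) -> Path:
    path = out_dir / "scorecard.json"
    path.write_text(json.dumps(card, indent=2) + "\n")
    return path
```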

The markdown version is human-readable and includes:

Goal doc + plan doc reprinted at top (so reviewers don't have to hunt).
Each check with pass/fail + log excerpt.
Friction summary.
Verbatim warnings from mem0 SDK (if any — e.g., deprecated field usage).
Explicit "NOT checked" section listing what loose coupling misses: "Whether the stored data is what the user wants stored. Whether search runs at the right moment. Whether user_id matches the actual session scope. Human review required."

8. Report + exit

Print the scorecard path + overall pass/fail to stdout.
Do not commit the scorecard files. They live in .mem0-integration/, which is gitignored. The user can inspect and optionally pin.
On fail: print the first failing step's log tail (last 40 lines) and stop. Do not attempt to fix anything.

Artifacts (all under `.mem0-integration/`)

File	Purpose	Retention
`scorecard.md`	Human-readable verdict.	Overwritten per run.
`scorecard.json`	Machine-readable verdict. Consumed by the CI scorecard workflow later.	Overwritten per run.
`test-stdout-flag-off.log`	Step 4 Pass A (pre-existing suite, flag unset).	Overwritten per run.
`test-stdout-flag-on.log`	Step 4 Pass B (full suite, flag set).	Overwritten per run.
`smoke-stdout.log`	Full output from step 5.	Overwritten per run.
`e2e-app.log`	Background app stdout/stderr from step 6.	Overwritten per run.
`e2e-calls.log`	write_call + read_call invocations and outputs.	Overwritten per run.

Modes

Mode	Trigger	Behavior
Interactive (default)	TTY present, `MEM0_TEST_CI` unset	Asks for missing keys, prints friendly summaries.
CI	`MEM0_TEST_CI=1`	Keys must be in env, no prompts, non-zero exit on any fail. The JSON scorecard is printed at the end of stdout for workflow parsing.

Invocation

/mem0-test-integration                       # interactive, all steps
/mem0-test-integration --ci                  # non-interactive
/mem0-test-integration --skip-smoke          # no API calls, no E2E
/mem0-test-integration --skip-e2e            # unit + smoke only (faster CI)
/mem0-test-integration --only-smoke          # just smoke
/mem0-test-integration --only-e2e            # just E2E (assumes deps installed)

Composition: --skip-* can stack (--skip-smoke --skip-e2e = static + unit only, zero API cost). --only-* is mutually exclusive with all other flags.
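The flag rules above can be sketched with `argparse`. Two choices here are assumptions. First, `--skip-smoke` also drops E2E, following its "no API calls, no E2E" comment. Second, a bad flag combination raises `ValueError` rather than calling `parser.error()`. The second choice is deliberate: `parser.error()` exits with status 2, which would collide with exit code 2 above, and the exit-code table defines no usage-error code. Unknown flags still go through argparse's own error path. `parse_flags` and `steps_to_run` are hypothetical names.

```python
import argparse
from typing import List, Optional, Set

ALL_STEPS = {"install", "static", "unit", "smoke", "e2e"}


def parse_flags(argv: Optional[List[str]] = None) -> argparse.Namespace:
    p = argparse.ArgumentParser(prog="mem0-test-integration")
    p.add_argument("--ci", action="store_true", help="non-interactive")
    p.add_argument("--skip-smoke", action="store_true", help="no API calls, no E2E")
    p.add_argument("--skip-e2e", action="store_true", help="unit + smoke only")
    p.add_argument("--only-smoke", action="store_true", help="just smoke")
    p.add_argument("--only-e2e", action="store_true", help="just E2E")
    args = p.parse_args(argv)
    given = [n for n in ("ci", "skip_smoke", "skip_e2e", "only_smoke", "only_e2e")
             if getattr(args, n)]
    if (args.only_smoke or args.only_e2e) and len(given) > 1:
        raise ValueError("--only-* cannot be combined with any other flag")
    return args


def steps_to_run(args: argparse.Namespace) -> Set[str]:
    if args.only_smoke:
        return {"smoke"}
    if args.only_e2e:
        return {"e2e"}          # assumes dependencies are already installed
    steps = set(ALL_STEPS)
    if args.skip_smoke:
        steps -= {"smoke", "e2e"}
    if args.skip_e2e:
        steps.discard("e2e")
    return steps
```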

Exit codes

Code	Meaning
0	All checks passed.
1	Precondition failed (no `.mem0-integration/`, wrong branch, dirty tree).
2	Missing env key (CI mode) or dependency install failure.
3	Static sanity check failed (wrong import, type error).
4	Unit tests failed (Pass B — integration itself broken).
5	Smoke test failed.
6	E2E test failed (ready_probe timeout, write/read call failed, or read_assert miss).
7	Non-invasiveness violation: Pass A failed (pre-existing tests broke). Integrator's heal loop refuses to touch this.
8	Internal error (skill bug — report it).
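Since step 8 stops at the first failing step, the exit code can be chosen by walking the steps in pipeline order. The sketch below encodes the table above. Pass A (`unit_tests_flag_off`) is checked before Pass B, so a non-invasiveness violation always surfaces as 7 rather than 4. The step keys reuse the scorecard.json result names. The `preconditions` key and the helper names are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class ExitCode(IntEnum):
    OK = 0
    PRECONDITION = 1
    ENV_OR_INSTALL = 2
    STATIC = 3
    UNIT_TESTS = 4
    SMOKE = 5
    E2E = 6
    NON_INVASIVE = 7
    INTERNAL = 8


# Pipeline order; the first step with status "fail" decides the exit code.
STEP_CODES: List[Tuple[str, ExitCode]] = [
    ("preconditions", ExitCode.PRECONDITION),
    ("install", ExitCode.ENV_OR_INSTALL),
    ("static_checks", ExitCode.STATIC),
    ("unit_tests_flag_off", ExitCode.NON_INVASIVE),
    ("unit_tests_flag_on", ExitCode.UNIT_TESTS),
    ("smoke_test", ExitCode.SMOKE),
    ("e2e_test", ExitCode.E2E),
]


def exit_code_for(statuses: Dict[str, str]) -> ExitCode:
    for step, code in STEP_CODES:
        if statuses.get(step) == "fail":
            return code
    return ExitCode.OK   # "skipped" steps do not change the code
```

Because `IntEnum` members are integers, `sys.exit(exit_code_for(statuses))` works directly. Unexpected exceptions in the runner itself would map to `ExitCode.INTERNAL`.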

Explicitly out of scope

Modifying source files. The skill is read-only against the repo. If verification exposes a bug, re-run /mem0-integrate on the same goal + plan; do not hand-patch.
Fixing broken tests. Failing unit tests are a signal that the integration is wrong, not that the tests are wrong. The skill does not "try a different test."
Deep logical correctness. The E2E step proves "something the user said earlier comes back later," which is a useful but shallow signal. It does NOT prove the integration picks the right facts to store, scopes user_id correctly across real users, or handles conflict resolution well. That's human review territory.
Self-healing. The paired /mem0-integrate skill, in its default --heal mode, consumes the scorecard produced here and drives its own remediation loop. Exit code 7 (non-invasiveness violation) is the explicit signal that the heal loop must stop and surface the problem to the user.
Cross-branch comparisons. Apart from the failure attribution in step 4, there is no main baseline diffing; the scorecard reflects this branch only.
Running against production data. Every smoke test uses a disposable random user_id and cleans up after. Never touches any other user's data.
