scripts/e2e/mock_llm_fixtures/README.md
This directory holds canned OpenAI Chat Completions responses keyed by a
SHA-256 fingerprint of the request. The stub server at
`scripts/e2e/mock_llm.py` looks up `<hash>.json` here for every
`POST /v1/chat/completions` request; if no fixture matches, the server returns
a generic deterministic fallback and logs the missing hash to stderr so you
can promote it into a fixture later.

Embeddings do not use fixtures; they are generated on the fly from a
hash-seeded RNG. Only chat completions are fixtured.
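For reference, a minimal sketch of what hash-seeded embedding generation can
look like. The actual logic lives in `scripts/e2e/mock_llm.py`; the dimension
and seeding scheme below are assumptions, not its real parameters:

```python
import hashlib
import random

def fake_embedding(text: str, dim: int = 1536) -> list[float]:
    # Hypothetical sketch: the dimension and seeding scheme are assumptions.
    # Seeding the RNG from a hash of the input makes the vector deterministic
    # per input, with no fixtures involved.
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]
```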
Fixture files are named `<sha256-hex>.json`: a lowercase hex SHA-256 digest
(64 characters, no prefix, no extension beyond `.json`). Example:
`3f5a7b9c...d12ef0.json`.
The fingerprint covers only the fields that should control which canned answer is returned:

```python
KEPT_KEYS = {"role", "content", "name", "tool_call_id", "tool_calls"}

canonical = {
    "model": payload.get("model"),
    # Each message is reduced to the keys above; everything else is dropped.
    "messages": [
        {k: v for k, v in msg.items() if k in KEPT_KEYS}
        for msg in payload.get("messages", [])
    ],
    "tool_choice": payload.get("tool_choice"),
}
blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
digest = hashlib.sha256(blob.encode("utf-8")).hexdigest()
```
Notes:

- `temperature`, `top_p`, `seed`, `max_tokens`, `stream`, and any other
  sampling/transport knobs do not influence the hash. Streaming and
  non-streaming variants of the same request resolve to the same fixture.
- `content` is kept verbatim; if the app passes a list of content parts
  (vision / multi-modal), the list is hashed as-is.
- `tool_calls` on assistant messages and `tool_call_id` / `name` on tool
  messages are included because they change what the model is replying to.
- The canonical source of the hashing logic is `_compute_request_digest` in
  `scripts/e2e/mock_llm.py`. If you change that function, regenerate all
  fixtures.
The easiest path: run the e2e suite once with your new flow, grep stderr for
`[mock-llm] unknown fixture hash <hash>` along with the request dump on the
following line, and save the canned answer under `<hash>.json`. The `up.sh`
log tail preserves both lines.
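For example, assuming orchestration pipes the stub's stderr to
`/tmp/docsgpt-e2e/mock_llm.log` (the path mentioned at the end of this
README), `-A1` also prints the request dump on the next line:

```sh
grep -A1 'unknown fixture hash' /tmp/docsgpt-e2e/mock_llm.log
```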
If you need to compute a hash by hand from a request payload:

```python
# scripts/e2e/compute_hash.py (not committed; run ad hoc)
import hashlib, json, sys

payload = json.load(sys.stdin)
canonical = {
    "model": payload.get("model"),
    "messages": [
        {k: v for k, v in msg.items() if k in {"role", "content", "name", "tool_call_id", "tool_calls"}}
        for msg in payload.get("messages", [])
    ],
    "tool_choice": payload.get("tool_choice"),
}
blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
print(hashlib.sha256(blob.encode("utf-8")).hexdigest())
```

```sh
python scripts/e2e/compute_hash.py < request.json
```
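To see the invariance from the Notes above in action (assuming `jq` is
available), add sampling knobs to a request and confirm the digest is
unchanged:

```sh
python scripts/e2e/compute_hash.py < request.json
jq '. + {stream: true, temperature: 0.9, max_tokens: 256}' request.json \
  | python scripts/e2e/compute_hash.py
# Both commands print the same digest.
```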
A fixture file looks like this:

```json
{
  "request_digest": "<hash>",
  "description": "Human description of when this is used",
  "response": {
    "content": "The canned assistant text, may include markdown.",
    "tool_calls": null,
    "finish_reason": "stop",
    "usage": {"prompt_tokens": 12, "completion_tokens": 34}
  }
}
```
| Field | Type | Required | Notes |
|---|---|---|---|
| `request_digest` | string | no (documentation only) | Must match the filename. The loader does not re-verify this; it's here for human review. |
| `description` | string | no | Short note on what flow this covers; makes grepping fixtures easier. |
| `response.content` | string | yes (if no `tool_calls`) | The assistant's reply body. Plain text or markdown. Empty string is legal when `tool_calls` is set. |
| `response.tool_calls` | array \| null | no | OpenAI tool-call shape: `[{"id": "call_x", "type": "function", "function": {"name": "...", "arguments": "{...}"}}]`. `arguments` must be a JSON string, not an object. |
| `response.finish_reason` | string | no (defaults to `"stop"`) | Use `"tool_calls"` when returning tool calls, `"length"` to simulate truncation. |
| `response.usage.prompt_tokens` | number | no | Used verbatim in the non-streaming envelope. Default: estimated from the request messages. |
| `response.usage.completion_tokens` | number | no | Default: estimated from `content`. |

`response.usage.total_tokens` is always recomputed as the sum; do not set it.
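To make the mapping concrete, here is a rough sketch of how these fields
could be assembled into the non-streaming envelope. This is a reading of the
table, not the stub's actual code; `id` and `created` are placeholders:

```python
def build_envelope(fixture: dict, model: str) -> dict:
    """Hypothetical mapping from a fixture's `response` block to the
    OpenAI non-streaming chat.completion shape."""
    resp = fixture["response"]
    usage = resp.get("usage", {})
    prompt = usage.get("prompt_tokens", 0)          # 0 stands in for the stub's estimate
    completion = usage.get("completion_tokens", 0)  # 0 stands in for the stub's estimate
    return {
        "id": "chatcmpl-mock",   # placeholder
        "object": "chat.completion",
        "created": 0,            # placeholder
        "model": model,
        "choices": [{
            "index": 0,
            "message": {
                "role": "assistant",
                "content": resp.get("content", ""),
                "tool_calls": resp.get("tool_calls"),
            },
            "finish_reason": resp.get("finish_reason", "stop"),
        }],
        "usage": {
            "prompt_tokens": prompt,
            "completion_tokens": completion,
            # total_tokens is always the recomputed sum, per the note above.
            "total_tokens": prompt + completion,
        },
    }
```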
The stub handles streaming vs non-streaming transparently for both content
and tool-call fixtures: content is replayed as incremental deltas, and tool
calls are emitted as a `tool_calls` delta array, followed by a final empty
delta carrying `finish_reason`. No fixture change is needed to toggle between
streaming and non-streaming; the app's `stream=true` flag alone controls it.
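As a rough illustration (chunk sizing, `id`, and `created` below are
assumptions, not the stub's actual output), streaming a content fixture
amounts to slicing `response.content` into delta chunks:

```python
import json

def sse_chunks(content: str, model: str, chunk_size: int = 16):
    """Yield OpenAI-style chat.completion.chunk SSE lines for a content fixture."""
    base = {
        "id": "chatcmpl-mock",  # placeholder id
        "object": "chat.completion.chunk",
        "created": 0,           # fixed timestamp keeps output deterministic
        "model": model,
    }
    for i in range(0, len(content), chunk_size):
        piece = {"choices": [{"index": 0,
                              "delta": {"content": content[i:i + chunk_size]},
                              "finish_reason": None}]}
        yield f"data: {json.dumps({**base, **piece})}\n\n"
    # Final empty delta carries finish_reason, then the SSE terminator.
    final = {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
    yield f"data: {json.dumps({**base, **final})}\n\n"
    yield "data: [DONE]\n\n"
```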
Example tool-call fixture:

```json
{
  "request_digest": "abc123...",
  "description": "Agent calls the weather tool for 'weather in London?'",
  "response": {
    "content": "",
    "tool_calls": [
      {
        "id": "call_e2e_weather_1",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"city\":\"London\"}"
        }
      }
    ],
    "finish_reason": "tool_calls"
  }
}
```
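Once a fixture is in place, you can replay the captured request against the
stub to confirm it is served. The URL below is an assumption; use whatever
address `up.sh` actually publishes:

```sh
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d @request.json | jq '.choices[0]'
```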
Step by step, to add a new fixture:

1. Run the flow and watch `scripts/e2e/up.sh`'s log tail (or
   `/tmp/docsgpt-e2e/mock_llm.log`, depending on how orchestration pipes it).
2. Find the `[mock-llm] unknown fixture hash <hash>` warning and the request
   dump on the following line.
3. Save the canned answer as `mock_llm_fixtures/<hash>.json` with the schema
   above.

The hash is fully deterministic: no `time.time()`, no random seeds, no
environment dependence.