plugins/plugin-local-inference/native/verify/fused-attn-op-contract.md
This note defines the runtime contract for the optional fused attention path:
QJL-compressed K score -> online softmax -> quantized V mix
The fused op is an optimization layered on top of the required Eliza-1 cache kernels. It is not a release-required manifest kernel by itself until every publish target has runtime-ready graph-dispatch evidence for the fused route.
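As an illustration of the score -> online softmax -> V mix shape, here is a minimal single-head sketch in plain Python. It is not the kernel itself: the QJL score and the quantized V decode are replaced by ordinary floats, and the function names are hypothetical. It only shows that a one-pass online softmax gives the same result as the usual two-pass form.

```python
import math


def fused_attention_reference(scores, values):
    """One-pass attention for a single query using online softmax.

    scores: finite pre-softmax scores, one per cached position. Assumption:
            these stand in for QJL-compressed K scores, already plain floats;
            masked positions are simply omitted.
    values: one vector per cached position. Assumption: these stand in for
            already-decoded quantized V rows.
    """
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for s, v in zip(scores, values):
        new_max = max(running_max, s)
        # Rescale what was accumulated so far to the new running maximum.
        # On the first step this is exp(-inf) == 0.0.
        scale = math.exp(running_max - new_max)
        w = math.exp(s - new_max)
        running_sum = running_sum * scale + w
        acc = [a * scale + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


def two_pass_attention(scores, values):
    """Conventional softmax followed by a weighted sum, for comparison."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]
```

Comparing the two functions on the same inputs is a cheap way to sanity-check a fused implementation before involving quantized caches.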
- GGML_OP_FUSED_ATTN_QJL_TBQ or a documented fused equivalent.
- qjl_score -> causal/non-causal softmax -> TBQ/Polar V decode -> weighted sum.
- h_kv = h_q / (n_heads / n_kv_heads).

Runtime-ready status requires all of the following:

- verify/fixtures/fused_attn_qjl_tbq.json or the matching fused fixture for the value-cache type,
- maxDiff recorded in the backend runtime-dispatch evidence file,

The current Vulkan evidence satisfies this for GGML_OP_FUSED_ATTN_QJL_TBQ.
Metal standalone fused kernels are verified separately, but Metal fused graph
dispatch remains a distinct gate.
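
The head mapping and the maxDiff measurement mentioned above can be sketched as two small helpers. This is a hedged illustration only: the function names are hypothetical, the division in h_kv = h_q / (n_heads / n_kv_heads) is assumed to be integer division (the usual grouped-query convention), and no fixture schema or tolerance value is implied, since this note does not state them.

```python
def kv_head_for_query_head(h_q, n_heads, n_kv_heads):
    """Map a query head index to its KV head index.

    Assumption: n_heads is a multiple of n_kv_heads and the division in
    h_kv = h_q / (n_heads / n_kv_heads) is integer (floor) division.
    """
    if n_kv_heads <= 0 or n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a positive multiple of n_kv_heads")
    if not 0 <= h_q < n_heads:
        raise ValueError("h_q out of range")
    group_size = n_heads // n_kv_heads
    return h_q // group_size


def max_abs_diff(candidate, reference):
    """Largest element-wise absolute difference between two flat outputs."""
    if len(candidate) != len(reference):
        raise ValueError("output lengths differ")
    return max((abs(c - r) for c, r in zip(candidate, reference)), default=0.0)
```

For example, with 8 query heads and 2 KV heads, query heads 0 to 3 share KV head 0 and query heads 4 to 7 share KV head 1. The value returned by `max_abs_diff` is the kind of number a backend would record as maxDiff; the threshold it is compared against is whatever the fixture or evidence file specifies.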