docs/architecture/X402_STREAMING_PAYMENT.md
NOFX calls AI models (DeepSeek, GPT, Claude, etc.) through the claw402 gateway, using the x402 protocol to pay per request with USDC on Base L2.
This document describes the full implementation of the SSE streaming call mode, including client, server, and billing logic.
NOFX (client) ──→ Cloudflare (100s idle limit) ──→ claw402 (gateway) ──→ AI upstream
NOFX Client claw402 Gateway AI Upstream
│ │ │
── Phase 1: Payment ──────────────────────────────────────────────────────────────────────────────
│ │ │
1. POST /api/v1/ai/... │ ─── body + stream:true ──────────→ │ │
(no payment header) │ │ │
│ ←── 402 + Payment-Required ────── │ │
│ (base64 JSON: price/chain/asset) │
│ │ │
2. EIP-712 signing │ │ │
(USDC TransferWithAuth)│ │ │
│ │ │
3. POST + X-Payment hdr │ ─── body + signature ────────────→ │ │
│ │ ── verify signature → Facilitator
│ │ ←── OK ──────────── Facilitator
│ │ ── settle USDC ───→ Facilitator
│ │ ←── tx hash ─────── Facilitator
│ │ │
── Phase 2: Streaming Response ───────────────────────────────────────────────────────────────────
│ │ │
│ ←── 200 OK ────────────────────── │ ─── POST stream:true ────────→ │
│ ←── data: {"choices":[...]} ───── │ ←── SSE chunk ──────────────── │
│ ←── data: {"choices":[...]} ───── │ ←── SSE chunk ──────────────── │
│ ←── ... (continuous) ──────────── │ ←── ... ─────────────────────── │
│ ←── data: [DONE] ──────────────── │ ←── data: [DONE] ────────────── │
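The Phase 1 handshake in the diagram can be sketched as a plain two-request exchange. This is a minimal illustration and not the actual DoX402RequestStream: `doPaidRequest`, `signFunc`, and the in-process fake gateway are hypothetical names introduced here, and the signer is a stub that stands in for the EIP-712 step.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// signFunc is a hypothetical stand-in for the EIP-712 signer: it maps the
// Payment-Required header value to an X-Payment header value.
type signFunc func(paymentRequired string) (string, error)

// doPaidRequest sends body once without payment. On 402 it signs the quoted
// requirements and retries once with X-Payment. The caller owns resp.Body.
func doPaidRequest(c *http.Client, url string, body []byte, sign signFunc) (*http.Response, error) {
	post := func(payment string) (*http.Response, error) {
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Content-Type", "application/json")
		if payment != "" {
			req.Header.Set("X-Payment", payment)
		}
		return c.Do(req)
	}

	resp, err := post("")
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusPaymentRequired {
		return resp, nil // no payment was requested
	}
	quote := resp.Header.Get("Payment-Required")
	io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	resp.Body.Close()
	if quote == "" {
		return nil, fmt.Errorf("402 without Payment-Required header")
	}
	payment, err := sign(quote)
	if err != nil {
		return nil, fmt.Errorf("sign payment: %w", err)
	}
	return post(payment)
}

// demo runs the handshake against a fake gateway and returns status and body.
func demo() (int, string, error) {
	gw := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Payment") == "" {
			w.Header().Set("Payment-Required", "placeholder-quote")
			w.WriteHeader(http.StatusPaymentRequired)
			return
		}
		fmt.Fprint(w, "data: [DONE]\n\n")
	}))
	defer gw.Close()

	sign := func(q string) (string, error) { return "signed:" + q, nil }
	resp, err := doPaidRequest(gw.Client(), gw.URL, []byte(`{"stream":true}`), sign)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(b), err
}

func main() {
	code, body, err := demo()
	fmt.Println(code, body, err)
}
```

Returning the open response rather than its bytes is what lets the caller start parsing SSE lines as soon as they arrive.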
| File | Responsibility |
|---|---|
| mcp/payment/claw402.go | Claw402Client: model routing, wallet management |
| mcp/payment/x402.go | x402 payment flow core: DoX402RequestStream, X402CallStream |
| mcp/client.go | ParseSSEStream: shared SSE parsing function |
Claw402Client.Call()
└→ X402CallStream() // x402.go:380
├→ Build request body + inject stream:true
├→ DoX402RequestStream() // x402.go:239
│ ├→ Send initial request (no payment header)
│ ├→ Receive 402 → parse Payment-Required header
│ ├→ signFn() → EIP-712 signature
│ └→ Send retry request with X-Payment header → return open *http.Response
│
├→ Start idle timeout watchdog (90s with no data → disconnect)
├→ TeeReader: simultaneous SSE parsing + raw byte buffering
├→ ParseSSEStream() // client.go:703
│ ├→ bufio.Scanner line-by-line read
│ ├→ Parse "data: {...}" → OpenAI chunk format
│ └→ Accumulate text + call onChunk callback
│
└→ Fallback: if SSE yields nothing, try JSON parsing on buffered bodyBuf
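The parsing step in the chain above can be illustrated with a small scanner loop. This is a sketch under assumptions, not the real ParseSSEStream: `parseSSE` and `parseSSEString` are hypothetical names, the chunk struct covers only the `choices[].delta.content` field of the OpenAI streaming format, and malformed chunks are skipped rather than reported.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk holds the only part of an OpenAI-style streaming chunk this sketch reads.
type chunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// parseSSE reads "data: ..." lines until [DONE] or EOF, accumulates the text,
// and calls onChunk (if non-nil) for every non-empty content delta.
func parseSSE(r io.Reader, onChunk func(string)) (string, error) {
	var text strings.Builder
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // accept lines up to 1 MiB
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators, comments, other SSE fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			continue // assumption: tolerate a malformed chunk instead of failing
		}
		for _, ch := range c.Choices {
			if ch.Delta.Content == "" {
				continue
			}
			text.WriteString(ch.Delta.Content)
			if onChunk != nil {
				onChunk(ch.Delta.Content)
			}
		}
	}
	return text.String(), sc.Err()
}

// parseSSEString is a convenience wrapper that also counts callback invocations.
func parseSSEString(s string) (string, int, error) {
	calls := 0
	text, err := parseSSE(strings.NewReader(s), func(string) { calls++ })
	return text, calls, err
}

func main() {
	in := "data: {\"choices\":[{\"delta\":{\"content\":\"Hel\"}}]}\n\n" +
		"data: {\"choices\":[{\"delta\":{\"content\":\"lo\"}}]}\n\n" +
		"data: [DONE]\n\n"
	text, calls, err := parseSSEString(in)
	fmt.Println(text, calls, err)
}
```

An empty result with a nil error is the signal the caller would use to fall back to JSON parsing of the buffered body.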
Every request carries an X-Client-ID: nofx header (x402.go:473), allowing claw402 to identify the request source for logging and monitoring.
claw402ModelEndpoints maps user-friendly model names to API paths:
"deepseek" → "/api/v1/ai/deepseek/chat"
"gpt-5.4" → "/api/v1/ai/openai/chat/5.4"
"claude-opus" → "/api/v1/ai/anthropic/messages/opus"
"qwen-max" → "/api/v1/ai/qwen/chat/max"
// ... more
Anthropic endpoints (containing /anthropic/) automatically switch to the Messages API wire format.
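A lookup of this kind might look like the sketch below. The four map entries are the ones listed above; `resolveModel` and the `wireFormat` type are hypothetical names, and treating every non-Anthropic path as OpenAI-style is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// modelEndpoints reproduces the entries shown above; the real map has more.
var modelEndpoints = map[string]string{
	"deepseek":    "/api/v1/ai/deepseek/chat",
	"gpt-5.4":     "/api/v1/ai/openai/chat/5.4",
	"claude-opus": "/api/v1/ai/anthropic/messages/opus",
	"qwen-max":    "/api/v1/ai/qwen/chat/max",
}

// wireFormat is a hypothetical enum for the request/response wire format.
type wireFormat int

const (
	wireOpenAI    wireFormat = iota // assumption: default for non-Anthropic paths
	wireAnthropic                   // Messages API
)

// resolveModel maps a friendly model name to its API path and wire format.
func resolveModel(name string) (string, wireFormat, error) {
	path, ok := modelEndpoints[name]
	if !ok {
		return "", wireOpenAI, fmt.Errorf("unknown model %q", name)
	}
	if strings.Contains(path, "/anthropic/") {
		return path, wireAnthropic, nil
	}
	return path, wireOpenAI, nil
}

func main() {
	for _, name := range []string{"deepseek", "claude-opus", "nope"} {
		path, format, err := resolveModel(name)
		fmt.Println(name, path, format, err)
	}
}
```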
Coinbase's standard Gin middleware ginmw.PaymentMiddlewareFromConfig internally works as follows:
1. Wrap c.Writer with responseCapture (all writes go to buffer)
2. c.Next() — handler runs, SSE chunks all go into buffer
3. Settle payment after handler completes
4. Write buffered content to client only after successful settlement
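Problems: the buffer holds every SSE chunk until the handler finishes and settlement succeeds, so the client receives nothing while the model is generating. Behind Cloudflare's 100s idle limit, a long generation can be cut off before the first byte is delivered, and a slow handler can let the request context expire before settlement runs.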
Dual-path design (internal/gateway/x402.go):
func streamAwareX402Middleware(streamServer, standardMW) {
    return func(c *gin.Context) {
        if !isStreamingBody(c) {
            standardMW(c) // Non-streaming → standard ginmw (battle-tested)
            return
        }
        // Streaming → custom path
    }
}
The non-streaming path delegates entirely to ginmw.PaymentMiddlewareFromConfig with no custom logic.
The streaming path runs these steps:
1. isStreamingBody(c): read the body to check for {"stream": true}, then restore it
2. streamServer.RequiresPayment(reqCtx): does this route require payment?
3. streamServer.ProcessHTTPRequest(): verify the X-Payment signature
4. handleStreamingPayment():
   a. ProcessSettlement(): settle USDC on-chain (collect payment first)
   b. c.Next(): pass control to HandleAPIKeyStream
   c. SSE chunks are written directly to c.Writer (no responseCapture buffer)
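Step 1 is the easiest to get wrong, because reading a request body consumes it. The sketch below shows one way to peek and restore. It is an illustration only: the real isStreamingBody takes a *gin.Context, whereas this version uses a plain *http.Request to stay within the standard library, and `newPost` and `bodyString` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// isStreamingBody reports whether the JSON body sets "stream": true.
// The body is always restored so downstream handlers can read it again.
func isStreamingBody(r *http.Request) bool {
	if r.Body == nil {
		return false
	}
	raw, err := io.ReadAll(r.Body)
	r.Body.Close()
	r.Body = io.NopCloser(bytes.NewReader(raw)) // restore whatever was read
	if err != nil {
		return false
	}
	var probe struct {
		Stream bool `json:"stream"`
	}
	if json.Unmarshal(raw, &probe) != nil {
		return false // not JSON: treat as non-streaming
	}
	return probe.Stream
}

// newPost builds a POST request with the given body (hypothetical helper).
func newPost(body string) *http.Request {
	r, err := http.NewRequest(http.MethodPost, "http://gateway.invalid/api", strings.NewReader(body))
	if err != nil {
		panic(err)
	}
	return r
}

// bodyString drains the request body (hypothetical helper).
func bodyString(r *http.Request) string {
	b, _ := io.ReadAll(r.Body)
	return string(b)
}

func main() {
	r := newPost(`{"model":"deepseek","stream":true}`)
	fmt.Println(isStreamingBody(r), bodyString(r))
}
```

A production version would likely cap the number of bytes it reads before deciding, which this sketch omits.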
Key differences:
| | Standard ginmw (non-streaming) | Custom path (streaming) |
|---|---|---|
| Settlement timing | After handler completes | Before handler starts |
| Response buffer | responseCapture buffers everything | No buffer, writes directly to client |
| Timeout risk | Slow handler causes context expiry | Settlement uses context.Background() |
| SSE compatible | No | Yes |
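x402 is an HTTP 402 payment protocol proposed by Coinbase. Core roles: the client (NOFX) signs the payment authorization, the resource server (claw402) quotes the price and requests payment, and the Facilitator verifies the signature and settles USDC on-chain.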
Client signature type: USDC TransferWithAuthorization
1. Receive Payment-Required header from 402 response (base64 JSON)
2. Decode to get:
- scheme: "exact"
- network: "eip155:8453" (Base L2)
- amount: USDC amount (e.g., "3000" = $0.003)
- asset: USDC contract address
- payTo: claw402 recipient address
3. Sign with wallet private key using EIP-712, authorizing USDC transfer from user wallet to payTo
4. Place signature in X-Payment + Payment-Signature headers
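Steps 1 and 2 amount to a base64 decode followed by a JSON decode. The sketch below assumes a flat JSON object carrying exactly the fields listed above; the real payload may wrap them in additional structure, so the `paymentRequirements` struct and both function names are hypothetical. It also shows why "3000" corresponds to $0.003: USDC uses 6 decimal places.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strconv"
)

// paymentRequirements mirrors the fields listed above (flat layout assumed).
type paymentRequirements struct {
	Scheme  string `json:"scheme"`
	Network string `json:"network"`
	Amount  string `json:"amount"` // USDC base units, 6 decimals
	Asset   string `json:"asset"`
	PayTo   string `json:"payTo"`
}

// decodePaymentRequired decodes a base64 JSON header value.
func decodePaymentRequired(header string) (paymentRequirements, error) {
	var pr paymentRequirements
	raw, err := base64.StdEncoding.DecodeString(header)
	if err != nil {
		return pr, fmt.Errorf("decode base64: %w", err)
	}
	if err := json.Unmarshal(raw, &pr); err != nil {
		return pr, fmt.Errorf("decode JSON: %w", err)
	}
	return pr, nil
}

// encodePaymentRequired is the inverse, used here only to build sample input.
func encodePaymentRequired(pr paymentRequirements) (string, error) {
	raw, err := json.Marshal(pr)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// usdcToDollars formats an amount in USDC base units as a dollar string.
func usdcToDollars(amount string) (string, error) {
	n, err := strconv.ParseUint(amount, 10, 64)
	if err != nil {
		return "", fmt.Errorf("invalid amount %q: %w", amount, err)
	}
	return fmt.Sprintf("$%d.%06d", n/1_000_000, n%1_000_000), nil
}

func main() {
	header, _ := encodePaymentRequired(paymentRequirements{
		Scheme: "exact", Network: "eip155:8453", Amount: "3000",
		Asset: "0xAssetPlaceholder", PayTo: "0xPayToPlaceholder",
	})
	pr, err := decodePaymentRequired(header)
	if err != nil {
		panic(err)
	}
	usd, _ := usdcToDollars(pr.Amount)
	fmt.Println(pr.Network, pr.Amount, usd)
}
```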
Each AI model route has its own price configured in claw402:
| Mode | Description | Example |
|---|---|---|
| Fixed price | Specified directly via user_price field | $0.003 per request |
| Token-based dynamic pricing | Calculated from request token count | $0.001 per 1K tokens |
| Dispatch fallback | Default price for SDK-compatible routes | $0.01 per request |
// Fixed price
price := fmt.Sprintf("$%s", route.UserPrice)
// Dynamic pricing
price = DynamicPriceFunc(func(ctx, reqCtx) (Price, error) {
return resolveDynamicPrice(ctx, reqCtx, rule)
})
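To make the arithmetic behind the two pricing modes concrete, the sketch below converts a dollar price into USDC base units and computes a token-based price. It is not the gateway's resolveDynamicPrice: `dollarsToMicro` and `tokenPriceMicro` are hypothetical helpers, and rounding partial blocks of 1K tokens up to the next base unit is an assumption.

```go
package main

import (
	"fmt"
	"math/big"
)

// dollarsToMicro converts a decimal dollar string such as "0.003" into USDC
// base units (6 decimals) without going through floating point.
func dollarsToMicro(price string) (uint64, error) {
	r, ok := new(big.Rat).SetString(price)
	if !ok || r.Sign() < 0 {
		return 0, fmt.Errorf("invalid price %q", price)
	}
	r.Mul(r, big.NewRat(1_000_000, 1))
	if !r.IsInt() || !r.Num().IsUint64() {
		return 0, fmt.Errorf("price %q is not a whole number of base units", price)
	}
	return r.Num().Uint64(), nil
}

// tokenPriceMicro prices a request at ratePer1K base units per 1,000 tokens,
// rounding up (assumption) so a non-empty request never costs zero.
func tokenPriceMicro(tokens, ratePer1K uint64) uint64 {
	return (tokens*ratePer1K + 999) / 1000
}

func main() {
	fixed, _ := dollarsToMicro("0.003") // fixed price from user_price
	rate, _ := dollarsToMicro("0.001")  // $0.001 per 1K tokens
	fmt.Println(fixed, tokenPriceMicro(2500, rate))
}
```

With these numbers, a 2,500-token request at $0.001 per 1K tokens costs 2,500 base units, or $0.0025.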
const X402MaxPaymentRetries = 5
const X402RetryBaseWait = 3 * time.Second
WithMaxRetries(1)) to prevent outer retries from causing duplicate payments

| | Non-Streaming | Streaming |
|---|---|---|
| Settlement timing | After receiving full response | Before streaming begins |
| Risk | Low (content confirmed before charge) | Slightly higher (charge before seeing content) |
| Necessity | Standard mode | Must charge first, otherwise SSE is buffered |
| Location | Timeout | Purpose |
|---|---|---|
| NOFX X402Timeout | 5 min | HTTP client overall timeout |
| NOFX x402StreamIdleTimeout | 90s | SSE idle disconnect (prevent hangs) |
| NOFX CallWithRequestStream idle | 60s | Idle timeout for non-x402 streaming |
| claw402 ResponseHeaderTimeout | 120s | Wait for first byte from AI upstream |
| claw402 streamingHTTP.Timeout | 0 (unlimited) | SSE stream can last indefinitely |
| claw402 standardMW WithTimeout | 10 min | Non-streaming ginmw overall timeout |
| claw402 x402PaymentTimeout | 30s | Payment verification/settlement timeout |
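On the gateway side, three of these values map directly onto standard library settings. The sketch below shows one plausible wiring. The durations come from the table, while `newStreamingClient`, `newPaymentContext`, and `headerTimeoutSeconds` are hypothetical names and not claw402's actual code.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// newStreamingClient builds an upstream client for SSE: no overall deadline,
// but a bounded wait for the upstream response headers.
func newStreamingClient() *http.Client {
	return &http.Client{
		Timeout: 0, // 0 means no overall timeout, so long streams are not cut off
		Transport: &http.Transport{
			ResponseHeaderTimeout: 120 * time.Second,
		},
	}
}

// newPaymentContext returns a context for verification and settlement that is
// detached from the incoming request, so a client disconnect or slow handler
// cannot expire it early.
func newPaymentContext() (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.Background(), 30*time.Second)
}

// headerTimeoutSeconds reads the configured header timeout back in seconds.
func headerTimeoutSeconds(c *http.Client) int {
	t, ok := c.Transport.(*http.Transport)
	if !ok {
		return 0
	}
	return int(t.ResponseHeaderTimeout / time.Second)
}

func main() {
	c := newStreamingClient()
	ctx, cancel := newPaymentContext()
	defer cancel()
	deadline, _ := ctx.Deadline()
	fmt.Println(c.Timeout, headerTimeoutSeconds(c), time.Until(deadline) > 0)
}
```

Because the streaming client has no overall timeout, the idle watchdog on the NOFX side is what ultimately bounds a stalled stream.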
var bodyBuf bytes.Buffer
tee := io.TeeReader(resp.Body, &bodyBuf)
text, sseErr := ParseSSEStream(tee, onChunk, onLine)
if text != "" {
return text, nil // SSE succeeded
}
// SSE yielded nothing → try JSON parsing on bodyBuf (server may have returned non-streaming JSON)
jsonText, _ := ParseMCPResponse(bodyBuf.Bytes())
go func() {
    t := time.NewTimer(90 * time.Second)
    defer t.Stop()
    for {
        select {
        case <-t.C:
            cancel() // timeout → cancel context → close TCP → body.Read() returns error
            return   // stop the watchdog once it has fired
        case <-resetCh:
            t.Reset(90 * time.Second) // received SSE line → reset timer
        }
    }
}()
Every incoming SSE line resets the timer. If no data arrives for 90 seconds, the context is cancelled and the TCP connection is closed, preventing indefinite blocking.
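A self-contained version of this watchdog, with the timeout as a parameter so it can be exercised quickly, might look like the following. This is a sketch and not the code in x402.go: `startIdleWatchdog` and `cancelledAfterMs` are hypothetical names, and exiting when the context ends (so the goroutine does not outlive a finished stream) is an addition for the example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// startIdleWatchdog cancels the context when no reset arrives within idle.
// The returned channel is buffered so the reader never blocks on a reset.
func startIdleWatchdog(ctx context.Context, cancel context.CancelFunc, idle time.Duration) chan<- struct{} {
	resetCh := make(chan struct{}, 1)
	go func() {
		t := time.NewTimer(idle)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				cancel() // idle for too long: abort the in-flight read
				return
			case <-resetCh:
				if !t.Stop() {
					select { // drain a stale tick, if any, before resetting
					case <-t.C:
					default:
					}
				}
				t.Reset(idle)
			case <-ctx.Done():
				return // stream finished or was cancelled elsewhere
			}
		}
	}()
	return resetCh
}

// cancelledAfterMs sends `resets` signals gapMs apart, then reports how many
// milliseconds passed before the watchdog cancelled the context.
func cancelledAfterMs(idleMs, gapMs, resets int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	reset := startIdleWatchdog(ctx, cancel, time.Duration(idleMs)*time.Millisecond)
	start := time.Now()
	for i := 0; i < resets; i++ {
		time.Sleep(time.Duration(gapMs) * time.Millisecond)
		select {
		case reset <- struct{}{}: // one call per received SSE line
		default: // a reset is already pending
		}
	}
	<-ctx.Done()
	return int(time.Since(start) / time.Millisecond)
}

func main() {
	fmt.Println("cancelled after ms:", cancelledAfterMs(200, 20, 3))
}
```

The non-blocking send matters: the SSE reader should never stall because the watchdog goroutine is momentarily busy.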
- mcp/payment/claw402.go: Claw402Client entry point
- mcp/payment/x402.go: x402 payment flow (DoX402Request, DoX402RequestStream, X402CallStream)
- mcp/payment/x402_sign.go: EIP-712 signing implementation
- mcp/client.go: ParseSSEStream, CallWithRequestStream
- internal/gateway/x402.go: x402 middleware (streamAwareX402Middleware)
- internal/gateway/proxy/stream.go: SSE proxy (HandleAPIKeyStream)
- internal/config/: route configuration (pricing, model mapping)