Back to Nofx

x402 Streaming Payment Architecture

docs/architecture/X402_STREAMING_PAYMENT.md

latest12.4 KB
Original Source

x402 Streaming Payment Architecture

Overview

NOFX calls AI models (DeepSeek, GPT, Claude, etc.) through the claw402 gateway, using the x402 protocol to pay per request with USDC on Base L2.

This document describes the full implementation of the SSE streaming call mode, including client, server, and billing logic.

Why Streaming Is Needed

NOFX (client)  ──→  Cloudflare (100s idle limit)  ──→  claw402 (gateway)  ──→  AI upstream
  • DeepSeek inference takes 60–180 seconds (up to 5 minutes)
  • Cloudflare enforces a 100-second hard limit on idle connections, returning 520/EOF on timeout
  • Non-streaming mode: the client receives no data until inference completes — Cloudflare disconnects after 100 seconds
  • Streaming mode: the first byte arrives within seconds, subsequent chunks flow continuously, keeping Cloudflare alive

End-to-End Request Flow

                       NOFX Client                           claw402 Gateway                    AI Upstream
                            │                                    │                                │
  ── Phase 1: Payment ──────────────────────────────────────────────────────────────────────────────
                            │                                    │                                │
  1. POST /api/v1/ai/...   │ ─── body + stream:true ──────────→ │                                │
     (no payment header)    │                                    │                                │
                            │ ←── 402 + Payment-Required ────── │                                │
                            │     (base64 JSON: price/chain/asset)                                │
                            │                                    │                                │
  2. EIP-712 signing        │                                    │                                │
     (USDC TransferWithAuth)│                                    │                                │
                            │                                    │                                │
  3. POST + X-Payment hdr  │ ─── body + signature ────────────→ │                                │
                            │                                    │ ── verify signature → Facilitator
                            │                                    │ ←── OK ──────────── Facilitator
                            │                                    │ ── settle USDC ───→ Facilitator
                            │                                    │ ←── tx hash ─────── Facilitator
                            │                                    │                                │
  ── Phase 2: Streaming Response ───────────────────────────────────────────────────────────────────
                            │                                    │                                │
                            │ ←── 200 OK ────────────────────── │ ─── POST stream:true ────────→ │
                            │ ←── data: {"choices":[...]} ───── │ ←── SSE chunk ────────────────  │
                            │ ←── data: {"choices":[...]} ───── │ ←── SSE chunk ────────────────  │
                            │ ←── ... (continuous) ──────────── │ ←── ... ─────────────────────── │
                            │ ←── data: [DONE] ──────────────── │ ←── data: [DONE] ────────────── │

Client Implementation (NOFX)

File Structure

FileResponsibility
mcp/payment/claw402.goClaw402Client — model routing, wallet management
mcp/payment/x402.gox402 payment flow core — DoX402RequestStream, X402CallStream
mcp/client.goParseSSEStream — shared SSE parsing function

Call Chain

Claw402Client.Call()
  └→ X402CallStream()                          // x402.go:380
       ├→ Build request body + inject stream:true
       ├→ DoX402RequestStream()                // x402.go:239
       │    ├→ Send initial request (no payment header)
       │    ├→ Receive 402 → parse Payment-Required header
       │    ├→ signFn() → EIP-712 signature
       │    └→ Send retry request with X-Payment header → return open *http.Response
       │
       ├→ Start idle timeout watchdog (90s with no data → disconnect)
       ├→ TeeReader: simultaneous SSE parsing + raw byte buffering
       ├→ ParseSSEStream()                     // client.go:703
       │    ├→ bufio.Scanner line-by-line read
       │    ├→ Parse "data: {...}" → OpenAI chunk format
       │    └→ Accumulate text + call onChunk callback
       │
       └→ Fallback: if SSE yields nothing, try JSON parsing on buffered bodyBuf

Request Identification

Every request carries an X-Client-ID: nofx header (x402.go:473), allowing claw402 to identify the request source for logging and monitoring.

Model Routing

claw402ModelEndpoints maps user-friendly model names to API paths:

go
"deepseek""/api/v1/ai/deepseek/chat"
"gpt-5.4""/api/v1/ai/openai/chat/5.4"
"claude-opus""/api/v1/ai/anthropic/messages/opus"
"qwen-max""/api/v1/ai/qwen/chat/max"
// ... more

Anthropic endpoints (containing /anthropic/) automatically switch to the Messages API wire format.

Server Implementation (claw402)

Core Problem: ginmw Is Incompatible with SSE

Coinbase's standard Gin middleware ginmw.PaymentMiddlewareFromConfig internally works as follows:

1. Wrap c.Writer with responseCapture (all writes go to buffer)
2. c.Next() — handler runs, SSE chunks all go into buffer
3. Settle payment after handler completes
4. Write buffered content to client only after successful settlement

Problems:

  • SSE chunks are buffered — the client receives no data for minutes
  • Cloudflare disconnects after 100 seconds → 520 error
  • Handler runs too long (5 min), settlement context expires

Solution: streamAwareX402Middleware

Dual-path design (internal/gateway/x402.go):

go
func streamAwareX402Middleware(streamServer, standardMW) {
    return func(c *gin.Context) {
        if !isStreamingBody(c) {
            standardMW(c)   // Non-streaming → standard ginmw (battle-tested)
            return
        }
        // Streaming → custom path
    }
}

Non-Streaming Path

Delegates entirely to ginmw.PaymentMiddlewareFromConfig with no custom logic.

Streaming Path (Pre-Settlement)

1. isStreamingBody(c) — read body to check for {"stream": true}, restore body
2. streamServer.RequiresPayment(reqCtx) — does this route require payment?
3. streamServer.ProcessHTTPRequest() — verify X-Payment signature
4. handleStreamingPayment():
   a. ProcessSettlement() — settle USDC on-chain (collect payment first)
   b. c.Next() — pass to HandleAPIKeyStream
   c. SSE chunks write directly to c.Writer (no responseCapture buffer)

Key differences:

Standard ginmw (non-streaming)Custom path (streaming)
Settlement timingAfter handler completesBefore handler starts
Response bufferresponseCapture buffers everythingNo buffer, writes directly to client
Timeout riskSlow handler causes context expirySettlement uses context.Background()
SSE compatibleNoYes

Billing Logic

x402 Protocol Flow

x402 is an HTTP 402 payment protocol proposed by Coinbase. Core roles:

  • Resource Server (claw402) — provides paid APIs
  • Client (NOFX) — consumer, holds an EVM wallet
  • Facilitator (Coinbase CDP) — verifies signatures, executes on-chain settlement

Payment Signing (EIP-712)

Client signature type: USDC TransferWithAuthorization

1. Receive Payment-Required header from 402 response (base64 JSON)
2. Decode to get:
   - scheme: "exact"
   - network: "eip155:8453" (Base L2)
   - amount: USDC amount (e.g., "3000" = $0.003)
   - asset: USDC contract address
   - payTo: claw402 recipient address
3. Sign with wallet private key using EIP-712, authorizing USDC transfer from user wallet to payTo
4. Place signature in X-Payment + Payment-Signature headers

Pricing Models

Each AI model route has its own price configured in claw402:

ModeDescriptionExample
Fixed priceSpecified directly via user_price field$0.003 per request
Token-based dynamic pricingCalculated from request token count$0.001 per 1K tokens
Dispatch fallbackDefault price for SDK-compatible routes$0.01 per request
go
// Fixed price
price := fmt.Sprintf("$%s", route.UserPrice)

// Dynamic pricing
price = DynamicPriceFunc(func(ctx, reqCtx) (Price, error) {
    return resolveDynamicPrice(ctx, reqCtx, rule)
})

Retry Logic and Double-Charge Prevention

go
const X402MaxPaymentRetries = 5
const X402RetryBaseWait     = 3 * time.Second
  • 5xx errors → exponential backoff retry (3s, 6s, 9s...), no re-signing (same payment authorization)
  • Another 402 → previous signature expired, re-sign and retry (on-chain authorization auto-invalidates, no double charge)
  • 4xx (non-402) → non-retryable, fail immediately
  • Outer retry is set to 1 (WithMaxRetries(1)) to prevent outer retries from causing duplicate payments

Settlement Timing: Streaming vs Non-Streaming

Non-StreamingStreaming
Settlement timingAfter receiving full responseBefore streaming begins
RiskLow (content confirmed before charge)Slightly higher (charge before seeing content)
NecessityStandard modeMust charge first, otherwise SSE is buffered

Timeout Configuration

LocationTimeoutPurpose
NOFX X402Timeout5 minHTTP client overall timeout
NOFX x402StreamIdleTimeout90sSSE idle disconnect (prevent hangs)
NOFX CallWithRequestStream idle60sIdle timeout for non-x402 streaming
claw402 ResponseHeaderTimeout120sWait for first byte from AI upstream
claw402 streamingHTTP.Timeout0 (unlimited)SSE stream can last indefinitely
claw402 standardMW WithTimeout10 minNon-streaming ginmw overall timeout
claw402 x402PaymentTimeout30sPayment verification/settlement timeout

SSE Fault Tolerance

TeeReader Dual Parsing

go
var bodyBuf bytes.Buffer
tee := io.TeeReader(resp.Body, &bodyBuf)
text, sseErr := ParseSSEStream(tee, onChunk, onLine)

if text != "" {
    return text, nil  // SSE succeeded
}
// SSE yielded nothing → try JSON parsing on bodyBuf (server may have returned non-streaming JSON)
jsonText, _ := ParseMCPResponse(bodyBuf.Bytes())

Idle Timeout Watchdog

go
go func() {
    t := time.NewTimer(90s)
    for {
        select {
        case <-t.C:
            cancel()  // timeout → cancel context → close TCP → body.Read() returns error
        case <-resetCh:
            t.Reset(90s)  // received SSE line → reset timer
        }
    }
}()

Every incoming SSE line resets the timer. If no data arrives for 90 seconds, the context is cancelled and the TCP connection is closed, preventing indefinite blocking.

NOFX (Client)

  • mcp/payment/claw402.go — Claw402Client entry point
  • mcp/payment/x402.go — x402 payment flow (DoX402Request, DoX402RequestStream, X402CallStream)
  • mcp/payment/x402_sign.go — EIP-712 signing implementation
  • mcp/client.go — ParseSSEStream, CallWithRequestStream

claw402 (Server)

  • internal/gateway/x402.go — x402 middleware (streamAwareX402Middleware)
  • internal/gateway/proxy/stream.go — SSE proxy (HandleAPIKeyStream)
  • internal/config/ — Route configuration (pricing, model mapping)