Back to Sglang

WASM Rate Limit Example for sgl-model-gateway

sgl-model-gateway/examples/wasm/wasm-guest-ratelimit/README.md

0.5.111.8 KB
Original Source

WASM Rate Limit Example for sgl-model-gateway

This example demonstrates rate limiting middleware for sgl-model-gateway using the WebAssembly Component Model.

Overview

This middleware provides rate limiting:

  • Default: 60 requests per minute per identifier
  • Identifier Priority: API Key > IP Address > Request ID
  • Response: Returns 429 Too Many Requests when limit exceeded

Important: This is a simplified demonstration. Since WASM components are stateless, each worker thread maintains its own counter. For production, implement rate limiting at the router/host level with shared state.

Quick Start

Build and Deploy

bash
# Build
cd examples/wasm-guest-ratelimit
./build.sh

# Deploy (replace file_path with actual path)
curl -X POST http://localhost:3000/wasm \
  -H "Content-Type: application/json" \
  -d '{
    "modules": [{
      "name": "ratelimit-middleware",
      "file_path": "/absolute/path/to/wasm_guest_ratelimit.component.wasm",
      "module_type": "Middleware",
      "attach_points": [{"Middleware": "OnRequest"}]
    }]
  }'

Customization

Modify constants in src/lib.rs:

rust
const RATE_LIMIT_REQUESTS: u64 = 100; // requests per window
const RATE_LIMIT_WINDOW_MS: u64 = 60_000; // time window in ms

Testing

bash
# Send multiple requests (first 60 succeed, then 429)
for i in {1..65}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    http://localhost:3000/v1/models \
    -H "Authorization: Bearer secret-api-key-12345"
done

Limitations

  • Per-instance state (not shared across workers)
  • No cross-process state sharing
  • Memory growth with unique identifiers
  • State lost on instance restart

Troubleshooting

  • Verify module attached to OnRequest phase
  • Check identifier extraction logic matches request format
  • Note: Each WASM worker has separate counter