LLM Analytics Capture Implementation Plan

Overview

This document outlines the implementation steps for the LLM Analytics capture pipeline based on the design specified in llma-capture-overview.md.

Implementation Phases

Phase 0: Local Development Setup

0.1 Routing Configuration

  • Create new /i/v0/ai endpoint in capture service
  • Set up routing for /i/v0/ai endpoint to capture service (Caddy routes in docker-compose, capture-ai service on port 3308)
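
A rough sketch of how the new route could be registered, assuming an axum 0.7+ router like the rest of the capture service; the handler body is a placeholder and only the path and port come from the plan above:

```rust
use axum::{routing::post, Router};

// Placeholder handler; the real one parses multipart AI events (Phase 1).
async fn handle_ai_capture() -> &'static str {
    "ok"
}

#[tokio::main]
async fn main() {
    // Expose /i/v0/ai on port 3308 to match the capture-ai service above.
    let app = Router::new().route("/i/v0/ai", post(handle_ai_capture));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3308").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```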

0.2 End-to-End Integration Tests

  • Implement Rust integration tests for multipart parsing and validation
  • Create Python acceptance test scenarios with multipart requests and blob data
  • Test Kafka message output and S3 storage integration
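
A minimal sketch of what an end-to-end test request could look like, using reqwest to post a multipart payload; the endpoint URL follows the plan above, while the part names and payload shape are assumptions:

```rust
use reqwest::multipart::{Form, Part};

#[tokio::test]
async fn posts_ai_event_with_blob() -> Result<(), Box<dyn std::error::Error>> {
    // Event part first, blob part second (Phase 1.2 requires the event part to come first).
    let event = Part::text(
        r#"{"event":"$ai_generation","distinct_id":"user-1","properties":{"$ai_model":"gpt-4"}}"#,
    )
    .mime_str("application/json")?;
    let blob = Part::bytes(b"large prompt payload".to_vec()).mime_str("application/octet-stream")?;
    let form = Form::new().part("event", event).part("$ai_input", blob);

    let resp = reqwest::Client::new()
        .post("http://localhost:3308/i/v0/ai")
        .multipart(form)
        .send()
        .await?;
    assert!(resp.status().is_success());
    Ok(())
}
```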

Phase 1: HTTP Endpoint

1.1 HTTP Endpoint Foundation

  • Implement multipart/form-data request parsing
  • Add server-side boundary validation
  • Support separate event.properties multipart part
  • Implement gzip decompression for compressed requests
  • Output events with blob placeholders to Kafka
  • Implement error schema
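
A minimal sketch of the handler shape, assuming axum's Multipart extractor; the production parser may be lower-level to enforce boundary validation and stream large parts instead of buffering them:

```rust
use axum::extract::Multipart;
use axum::http::StatusCode;

// Sketch only: collect the event part and any blob parts from a /i/v0/ai request.
async fn handle_ai_capture(mut multipart: Multipart) -> Result<StatusCode, StatusCode> {
    let mut event_json: Option<String> = None;
    let mut blobs: Vec<(String, Vec<u8>)> = Vec::new();

    while let Some(field) = multipart
        .next_field()
        .await
        .map_err(|_| StatusCode::BAD_REQUEST)?
    {
        let name = field.name().unwrap_or_default().to_string();
        let data = field.bytes().await.map_err(|_| StatusCode::BAD_REQUEST)?;

        if name == "event" {
            event_json =
                Some(String::from_utf8(data.to_vec()).map_err(|_| StatusCode::BAD_REQUEST)?);
        } else {
            // Blob parts are held for S3 upload; the event keeps a placeholder reference.
            blobs.push((name, data.to_vec()));
        }
    }

    if event_json.is_none() {
        return Err(StatusCode::BAD_REQUEST);
    }
    // Here the event (with blob placeholders) would be produced to Kafka.
    Ok(StatusCode::OK)
}
```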

1.2 Basic Validation

  • Implement specific AI event type validation ($ai_generation, $ai_trace, $ai_span, $ai_embedding, $ai_metric, $ai_feedback)
  • Validate blob part names against event properties
  • Prevent blob parts from overwriting existing properties (reject when properties are supplied both embedded in the event and as a separate part)
  • Validate event part is first in multipart request
  • Validate required fields (event name, distinct_id, $ai_model)
  • Implement size limits (32KB event, 960KB combined, 25MB total, 27.5MB request body)
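
A sketch of the Phase 1.2 checks; the constants mirror the event types, required fields, and 32KB limit in the checklist above, and the error strings are illustrative:

```rust
use serde_json::Value;

const ALLOWED_EVENTS: &[&str] = &[
    "$ai_generation",
    "$ai_trace",
    "$ai_span",
    "$ai_embedding",
    "$ai_metric",
    "$ai_feedback",
];
const MAX_EVENT_BYTES: usize = 32 * 1024;

fn validate_event(raw: &str) -> Result<Value, String> {
    if raw.len() > MAX_EVENT_BYTES {
        return Err("event part exceeds 32KB".into());
    }
    let event: Value = serde_json::from_str(raw).map_err(|e| e.to_string())?;

    let name = event["event"].as_str().ok_or("missing event name")?;
    if !ALLOWED_EVENTS.contains(&name) {
        return Err(format!("unsupported AI event type: {name}"));
    }
    event["distinct_id"].as_str().ok_or("missing distinct_id")?;
    event["properties"]["$ai_model"].as_str().ok_or("missing $ai_model")?;

    Ok(event)
}

fn main() {
    let ok = r#"{"event":"$ai_generation","distinct_id":"u1","properties":{"$ai_model":"gpt-4"}}"#;
    assert!(validate_event(ok).is_ok());
}
```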

1.3 Initial Deployment (Dev)

  • Deploy capture-ai service to dev with basic /i/v0/ai endpoint
  • Test basic multipart parsing and Kafka output functionality
  • Verify endpoint responds correctly to AI events

Phase 2: Basic S3 Uploads

2.1 Simple S3 Upload (per blob)

  • Upload individual blobs to S3 as separate objects (concatenated into a single file per event)
  • Generate S3 URLs for blobs (format: s3://{bucket}/{prefix}{token}/{uuid}?range={start}-{end})
  • Store S3 blob metadata (URLs stored in event properties)
  • Track S3 upload success/failure rates
  • Monitor blob size distributions
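
A sketch of building the blob reference stored in event properties, following the URL format above; the bucket name and token value are illustrative placeholders:

```rust
use uuid::Uuid;

// s3://{bucket}/{prefix}{token}/{uuid}?range={start}-{end}
fn blob_url(bucket: &str, prefix: &str, token: &str, object_id: Uuid, start: u64, end: u64) -> String {
    format!("s3://{bucket}/{prefix}{token}/{object_id}?range={start}-{end}")
}

fn main() {
    let url = blob_url("posthog-ai-blobs", "llma/", "phc_abc123", Uuid::new_v4(), 0, 4095);
    println!("{url}");
}
```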

Phase 3: S3 Infrastructure & Deployment

3.1 S3 Bucket Configuration

  • Set up S3 buckets for local dev (MinIO: ai-blobs bucket via docker-compose)
  • Set up bucket structure with llma/ prefix
  • Set up S3 buckets for dev environment
  • Configure S3 lifecycle policies for retention (30d default)
  • Set up S3 access policies for capture service
  • Create service accounts with appropriate S3 permissions

3.2 Capture S3 Configuration (Dev)

  • Configure capture-ai service for local dev with S3 (bin/start-rust-service, mprocs.yaml)
  • Test S3 connectivity and uploads (acceptance tests pass)
  • Deploy capture-ai service to dev environment with S3 configuration
  • Set up IAM roles and permissions for capture-ai service
  • Configure S3 read/write permissions
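
A sketch of pointing the capture-ai S3 client at local MinIO with the aws-sdk-s3 crate; the endpoint, credentials, and region shown are assumed docker-compose defaults, not confirmed values:

```rust
use aws_sdk_s3::config::{BehaviorVersion, Builder, Credentials, Region};
use aws_sdk_s3::Client;

fn local_s3_client() -> Client {
    let creds = Credentials::new(
        "object_storage_root_user",     // assumed MinIO access key
        "object_storage_root_password", // assumed MinIO secret key
        None,
        None,
        "static",
    );
    let config = Builder::new()
        .behavior_version(BehaviorVersion::latest())
        .endpoint_url("http://localhost:19000") // assumed local MinIO endpoint
        .credentials_provider(creds)
        .region(Region::new("us-east-1"))
        .force_path_style(true) // MinIO requires path-style addressing
        .build();
    Client::from_conf(config)
}

fn main() {
    let _client = local_s3_client();
}
```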

Phase 4: Multipart File Processing

4.1 Multipart File Creation

  • Implement multipart/mixed format for S3 storage
  • Store headers (Content-Type, Content-Disposition, Content-Encoding) within multipart format
  • Generate S3 URLs with byte range parameters (ranges exclude boundaries, include headers + body)
  • Full document parseable as valid multipart/mixed
  • Individual ranges parseable as MIME parts using standard parsers (httparse in Rust, email.parser in Python)
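
A sketch of assembling the multipart/mixed document and recording per-blob byte ranges (headers plus body, boundary lines excluded); only Content-Type is written here, whereas the stored format above also keeps Content-Disposition and Content-Encoding:

```rust
struct BlobPart {
    content_type: String,
    body: Vec<u8>,
}

fn build_multipart_mixed(boundary: &str, parts: &[BlobPart]) -> (Vec<u8>, Vec<(u64, u64)>) {
    let mut doc: Vec<u8> = Vec::new();
    let mut ranges = Vec::new();

    for part in parts {
        doc.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
        let start = doc.len() as u64; // range starts at the part headers, after the boundary line
        doc.extend_from_slice(format!("Content-Type: {}\r\n\r\n", part.content_type).as_bytes());
        doc.extend_from_slice(&part.body);
        let end = doc.len() as u64 - 1; // inclusive end of the part body
        ranges.push((start, end));
        doc.extend_from_slice(b"\r\n");
    }
    doc.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());

    (doc, ranges)
}

fn main() {
    let parts = vec![BlobPart {
        content_type: "application/json".into(),
        body: br#"{"prompt":"hello"}"#.to_vec(),
    }];
    let (doc, ranges) = build_multipart_mixed("blob-boundary", &parts);
    println!("{} bytes, ranges: {:?}", doc.len(), ranges);
}
```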

Phase 5: Authorization

5.1 Request Signature Verification

  • Implement basic API key validation (Bearer token authentication)
  • Implement PostHog API key authentication
  • Add request signature verification
  • Validate API key before processing multipart data
  • Add proper error responses for authentication failures
  • Test authentication with valid and invalid keys
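
A sketch of pulling the Bearer token out of the Authorization header before any multipart parsing; verifying the key against PostHog is not shown, and the example key is a placeholder:

```rust
use axum::http::HeaderMap;

fn extract_api_key(headers: &HeaderMap) -> Result<&str, &'static str> {
    let value = headers
        .get("authorization")
        .and_then(|v| v.to_str().ok())
        .ok_or("missing Authorization header")?;
    value
        .strip_prefix("Bearer ")
        .filter(|token| !token.is_empty())
        .ok_or("expected a non-empty Bearer token")
}

fn main() {
    let mut headers = HeaderMap::new();
    headers.insert("authorization", "Bearer phx_secret".parse().unwrap());
    assert_eq!(extract_api_key(&headers), Ok("phx_secret"));
}
```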

Phase 6: Operations

6.1 Monitoring Setup

  • Set up monitoring dashboards for capture-ai

6.2 Alerting

  • Configure alerts for S3 upload failures
  • Set up alerts for high error rates on /i/v0/ai endpoint
  • Set up alerts for high latency on /i/v0/ai endpoint

6.3 Runbooks

  • Create runbook for capture-ai S3 connectivity issues

Phase 7: Compression

7.1 Compression Support

  • Parse Content-Encoding: gzip header for request-level compression
  • Implement streaming gzip decompression for compressed requests
  • Test with gzip-compressed multipart requests
  • Implement server-side compression for uncompressed blobs before S3 storage
  • Add compression metadata to S3 objects
  • Track compression ratio effectiveness
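
A sketch of both Phase 7 paths using the flate2 crate: decompressing a gzip request body and gzip-compressing an uncompressed blob before S3 storage. The production path would stream rather than buffer whole payloads:

```rust
use std::io::{Read, Write};

use flate2::read::GzDecoder;
use flate2::write::GzEncoder;
use flate2::Compression;

fn decompress_request_body(compressed: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut decoder = GzDecoder::new(compressed);
    let mut out = Vec::new();
    decoder.read_to_end(&mut out)?;
    Ok(out)
}

fn compress_blob(raw: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(raw)?;
    encoder.finish()
}

fn main() -> std::io::Result<()> {
    let blob = b"uncompressed blob payload".to_vec();
    let compressed = compress_blob(&blob)?;
    assert_eq!(decompress_request_body(&compressed)?, blob);
    Ok(())
}
```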

Phase 8: Schema Validation

8.1 Schema Validation

  • Validate Content-Type headers on blob parts (required: application/json, text/plain, application/octet-stream)
  • Validate event JSON structure (event, distinct_id, properties fields)
  • Validate required AI properties ($ai_model)
  • Test with different supported content types
  • Create comprehensive schema definitions for each AI event type
  • Add detailed schema validation for event-specific properties
  • Add Content-Length validation beyond size limits
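
A sketch of the blob-part Content-Type check listed above:

```rust
const ALLOWED_BLOB_CONTENT_TYPES: &[&str] =
    &["application/json", "text/plain", "application/octet-stream"];

fn validate_blob_content_type(content_type: Option<&str>) -> Result<(), String> {
    match content_type {
        Some(ct) if ALLOWED_BLOB_CONTENT_TYPES.iter().any(|a| ct.starts_with(a)) => Ok(()),
        Some(ct) => Err(format!("unsupported blob Content-Type: {ct}")),
        None => Err("blob part is missing a Content-Type header".into()),
    }
}

fn main() {
    assert!(validate_blob_content_type(Some("application/json")).is_ok());
    assert!(validate_blob_content_type(Some("image/png")).is_err());
    assert!(validate_blob_content_type(None).is_err());
}
```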

Phase 9: Limits (Optional)

9.1 Request Validation & Limits

  • Add request size limits and validation (configurable via ai_max_sum_of_parts_bytes)
  • Implement event part size limit (32KB)
  • Implement combined event+properties size limit (960KB)
  • Implement total parts size limit (25MB default, configurable)
  • Implement request body size limit (110% of total parts limit)
  • Return 413 Payload Too Large for size violations
  • Add quota limiting per team (via quota_limiter.check_and_filter(), returns BillingLimit error when exceeded)
  • Implement per-team payload size limits
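
A sketch of the limit checks; the defaults mirror the numbers above (the 27.5MB body limit is 110% of the 25MB parts limit), while the struct and field names are placeholders apart from the intent of ai_max_sum_of_parts_bytes:

```rust
use axum::http::StatusCode;

struct Limits {
    max_event_bytes: usize,                 // 32KB event part
    max_event_plus_properties_bytes: usize, // 960KB combined
    max_sum_of_parts_bytes: usize,          // 25MB default, configurable
}

impl Limits {
    fn default_limits() -> Self {
        Limits {
            max_event_bytes: 32 * 1024,
            max_event_plus_properties_bytes: 960 * 1024,
            max_sum_of_parts_bytes: 25 * 1024 * 1024,
        }
    }

    // Request bodies get 10% headroom over the parts limit for boundaries and headers.
    fn max_request_body_bytes(&self) -> usize {
        self.max_sum_of_parts_bytes + self.max_sum_of_parts_bytes / 10
    }

    fn check_total_parts(&self, sum_of_parts: usize) -> Result<(), StatusCode> {
        if sum_of_parts > self.max_sum_of_parts_bytes {
            return Err(StatusCode::PAYLOAD_TOO_LARGE);
        }
        Ok(())
    }
}

fn main() {
    let limits = Limits::default_limits();
    assert_eq!(limits.max_request_body_bytes(), 27 * 1024 * 1024 + 512 * 1024); // 27.5MB
    assert!(limits.check_total_parts(26 * 1024 * 1024).is_err());
}
```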

Phase 10: Production Deployment

10.1 Production S3 Infrastructure

  • Set up S3 buckets for production environment
  • Configure S3 lifecycle policies for production
  • Set up S3 access policies for production capture service
  • Create production service accounts with appropriate S3 permissions

10.2 Capture Service Production Deployment

  • Deploy capture-ai service to production with basic /i/v0/ai endpoint
  • Test basic multipart parsing and Kafka output functionality in production
  • Verify endpoint responds correctly to AI events in production
  • Deploy capture-ai service to production environment with S3 configuration
  • Set up production IAM roles and permissions for capture-ai service
  • Configure production S3 read/write permissions

Phase 11: Data Deletion (Optional)

11.1 Data Deletion (Choose One Approach)

  • Option A: S3 expiry (passive) - rely on lifecycle policies
  • Option B: S3 delete by prefix functionality
  • Option C: Per-team encryption keys
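
For Option B, a sketch with a recent aws-sdk-s3, assuming blobs live under an llma/{token}/ prefix as described in Phase 3; a production version would batch deletions with DeleteObjects:

```rust
use aws_sdk_s3::Client;

async fn delete_team_blobs(client: &Client, bucket: &str, token: &str) -> Result<(), aws_sdk_s3::Error> {
    let prefix = format!("llma/{token}/");
    let mut pages = client
        .list_objects_v2()
        .bucket(bucket)
        .prefix(&prefix)
        .into_paginator()
        .send();

    while let Some(page) = pages.next().await {
        for object in page?.contents() {
            if let Some(key) = object.key() {
                client.delete_object().bucket(bucket).key(key).send().await?;
            }
        }
    }
    Ok(())
}
```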

Phase 12: Automated Testing

12.1 Continuous Validation

  • Set up automated test suite for continuous validation
  • Configure CI/CD pipeline integration for capture-ai tests
  • Set up automated regression testing for /i/v0/ai endpoint
  • Implement automated S3 integration validation tests