rust/capture/docs/llma-capture-overview.md
Implement a dedicated capture pathway for LLM Analytics events that enables efficient processing of large-scale language model interactions. This approach allows us to capture comprehensive LLM usage data without impacting the performance of our primary event ingestion system.
The LLM Analytics capture endpoint supports four primary AI event types. Events are sent to the /i/v0/ai endpoint with multipart payloads to handle large context data efficiently.
**`$ai_generation`**

A generation represents a single call to an LLM (e.g., a chat completion request).

Core Properties:

- `$ai_trace_id` (required) - UUID to group AI events (e.g., conversation_id)
- `$ai_model` (required) - The model used (e.g., "gpt-4o", "claude-3-opus")
- `$ai_provider` (required) - The LLM provider (e.g., "openai", "anthropic", "gemini")
- `$ai_input` - List of messages sent to the LLM (can be stored as blob); each message has a role ("user", "system", or "assistant") and a content array
- `$ai_output_choices` - List of response choices from the LLM (can be stored as blob); each choice has a role and a content array
- `$ai_input_tokens` - Number of tokens in the input
- `$ai_output_tokens` - Number of tokens in the output
- `$ai_span_id` (optional) - Unique identifier for this generation
- `$ai_span_name` (optional) - Name given to this generation
- `$ai_parent_id` (optional) - Parent span ID for tree view grouping
- `$ai_latency` (optional) - LLM call latency in seconds
- `$ai_time_to_first_token` (optional) - Time to first token in seconds (streaming only)
- `$ai_http_status` (optional) - HTTP status code of the response
- `$ai_base_url` (optional) - Base URL of the LLM provider
- `$ai_request_url` (optional) - Full URL of the request
- `$ai_is_error` (optional) - Boolean indicating if the request was an error
- `$ai_error` (optional) - Error message or object

Cost Properties (optional, auto-calculated from model and token counts if not provided):

- `$ai_input_cost_usd` - Cost in USD of input tokens
- `$ai_output_cost_usd` - Cost in USD of output tokens
- `$ai_total_cost_usd` - Total cost in USD

Cache Properties (optional):

- `$ai_cache_read_input_tokens` - Number of tokens read from cache
- `$ai_cache_creation_input_tokens` - Number of tokens written to cache (Anthropic-specific)

Model Parameters (optional):

- `$ai_temperature` - Temperature parameter used
- `$ai_stream` - Whether the response was streamed
- `$ai_max_tokens` - Maximum tokens setting
- `$ai_tools` - Tools/functions available to the LLM

**`$ai_trace`**

A trace represents a complete AI interaction flow (e.g., a full conversation or agent execution).

Key Properties:

- `$ai_trace_id` (required) - UUID identifying this trace
- `$ai_input_state` - Initial state of the trace (can be stored as blob)
- `$ai_output_state` - Final state of the trace (can be stored as blob)

**`$ai_span`**

A span represents a logical unit of work within a trace (e.g., a tool call, a retrieval step).

Key Properties:

- `$ai_trace_id` (required) - Parent trace UUID
- `$ai_span_id` (required) - Unique identifier for this span
- `$ai_parent_id` (optional) - Parent span ID for nesting
- `$ai_span_name` - Name describing this span
- `$ai_input_state` - Input state for this span (can be stored as blob)
- `$ai_output_state` - Output state for this span (can be stored as blob)

**`$ai_embedding`**

An embedding event captures vector generation for semantic search or RAG systems.

Key Properties:

- `$ai_trace_id` (required) - Parent trace UUID
- `$ai_model` (required) - Embedding model used
- `$ai_provider` (required) - Provider (e.g., "openai", "cohere")
- `$ai_input` - Text or data being embedded (can be stored as blob)
- `$ai_input_tokens` - Number of tokens in the input

These events are lightweight and processed through the regular pipeline:

- `$ai_metric` - Performance metrics, usage statistics
- `$ai_feedback` - User feedback on AI responses

Properties that can contain large payloads (marked as "can be stored as blob" above) should be sent as separate multipart parts with names like `event.properties.$ai_input` or `event.properties.$ai_output_choices`. This keeps the event JSON small while allowing arbitrarily large context data to be stored efficiently in S3.
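As an illustration of these property shapes, here is a minimal sketch of the `$ai_generation` core properties as a serde struct. The struct name and field selection are illustrative, not the capture service's actual types; large fields like `$ai_input` would normally travel as separate blob parts rather than inline JSON.

```rust
use serde::{Deserialize, Serialize};

/// Abbreviated sketch of `$ai_generation` properties (hypothetical type).
/// Optional fields are omitted from the JSON when absent.
#[derive(Debug, Serialize, Deserialize)]
pub struct AiGenerationProperties {
    #[serde(rename = "$ai_trace_id")]
    pub trace_id: String,
    #[serde(rename = "$ai_model")]
    pub model: String,
    #[serde(rename = "$ai_provider")]
    pub provider: String,
    /// Large payload: typically sent as a separate multipart blob part
    /// named `event.properties.$ai_input` rather than embedded here.
    #[serde(rename = "$ai_input", skip_serializing_if = "Option::is_none")]
    pub input: Option<serde_json::Value>,
    #[serde(rename = "$ai_input_tokens", skip_serializing_if = "Option::is_none")]
    pub input_tokens: Option<u64>,
    #[serde(rename = "$ai_output_tokens", skip_serializing_if = "Option::is_none")]
    pub output_tokens: Option<u64>,
    #[serde(rename = "$ai_latency", skip_serializing_if = "Option::is_none")]
    pub latency: Option<f64>,
}
```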
Reference: PostHog LLM Analytics Manual Capture Documentation
The LLM Analytics capture system implements a specialized data flow that efficiently handles large language model payloads:
1. **Event Ingestion** - multipart requests arrive at the `/i/v0/ai` endpoint
2. **Blob Processing**
3. **Event Routing**
4. **Evaluation Pipeline**
The /i/v0/ai endpoint accepts multipart POST requests with the following structure:
Headers:

- `Content-Type: multipart/form-data; boundary=<boundary>`

Multipart Parts:

**Event Part** (required)

- `Content-Disposition: form-data; name="event"` (required)
- `Content-Type: application/json` (required)
- Must not embed a `properties` field if an `event.properties` part is also present

**Event Properties Part** (optional)

- `Content-Disposition: form-data; name="event.properties"` (required)
- `Content-Type: application/json` (required)
- Contains the event's `properties` field

**Blob Parts** (optional, multiple allowed)

- `Content-Disposition: form-data; name="event.properties.<property_name>"; filename="<blob_id>"` (required)
- `Content-Type: application/octet-stream | application/json | text/plain` (required)
- The part name identifies the property the stored blob replaces (e.g., `event.properties.$ai_input_state`)

Allowed Part Headers:

- `Content-Disposition` (required for all parts)
- `Content-Type` (required for all parts)
- No per-part compression (`Content-Encoding` is not allowed on parts)

Note: Individual parts cannot have their own compression. To compress the entire request payload, use the `Content-Encoding: gzip` header at the HTTP request level.
Example Request:

```http
POST /i/v0/ai HTTP/1.1
Content-Type: multipart/form-data; boundary=----boundary123

------boundary123
Content-Disposition: form-data; name="event"
Content-Type: application/json

{
  "event": "$ai_generation",
  "distinct_id": "user_123",
  "timestamp": "2024-01-15T10:30:00Z"
}
------boundary123
Content-Disposition: form-data; name="event.properties"
Content-Type: application/json

{
  "$ai_model": "gpt-4",
  "completion_tokens": 150
}
------boundary123
Content-Disposition: form-data; name="event.properties.$ai_input"; filename="blob_abc123"
Content-Type: application/json

[JSON LLM input data]
------boundary123
Content-Disposition: form-data; name="event.properties.$ai_output_choices"; filename="blob_def456"
Content-Type: application/json

[JSON LLM output data]
------boundary123
Content-Disposition: form-data; name="event.properties.$ai_embedding_vector"; filename="blob_ghi789"
Content-Type: application/octet-stream

[Binary embedding vector data]
------boundary123--
```
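For reference, a client-side sketch of the same request built with the reqwest crate. The host URL, API key handling, and payload literals are placeholders; reqwest generates its own random boundary for the form.

```rust
use reqwest::multipart::{Form, Part};

/// Send a `$ai_generation` capture request (sketch; host is hypothetical).
async fn send_ai_event(api_key: &str, input_blob: Vec<u8>) -> Result<(), reqwest::Error> {
    let event = Part::text(r#"{"event":"$ai_generation","distinct_id":"user_123"}"#)
        .mime_str("application/json")?;
    let props = Part::text(r#"{"$ai_trace_id":"trace_1","$ai_model":"gpt-4","$ai_provider":"openai"}"#)
        .mime_str("application/json")?;
    // Large context goes in its own blob part; the part name tells the
    // server which property the stored blob should replace.
    let input = Part::bytes(input_blob)
        .file_name("blob_abc123")
        .mime_str("application/json")?;

    let form = Form::new()
        .part("event", event)
        .part("event.properties", props)
        .part("event.properties.$ai_input", input);

    reqwest::Client::new()
        .post("https://us.i.posthog.com/i/v0/ai") // placeholder host
        .bearer_auth(api_key)
        .multipart(form)
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}
```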
To prevent LLM data from accidentally containing the multipart boundary sequence, the boundary must be chosen so that it does not occur in any part's payload; clients typically use a long random string.
The endpoint processes each request in the following steps:

1. **Parse multipart request** - extract the `event` part
2. **Handle event properties**
   - If an `event.properties` part exists, extract the properties JSON from it
   - If the `event` part embeds properties and an `event.properties` part also exists, reject with `400 Bad Request`
   - (use the `event.properties` part if present, otherwise use embedded properties)
3. **Validate event structure** - the event name must start with `$ai_` and required fields must be present (e.g., `$ai_model`)
4. **Collect all blob parts** (named like `event.properties.$ai_input`)
5. **Validate size limits**
6. **Create multipart file** containing all blobs with index
7. **Upload single multipart file to S3**
8. **Replace blob properties with S3 URLs**
9. **Send modified event to Kafka**
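A condensed sketch of steps 1–4 using the multer crate. The crate choice and function shape are illustrative, not the capture service's actual handler; error mapping to HTTP responses is left to the caller.

```rust
use bytes::Bytes;
use multer::Multipart;

/// Split a request's parts into event JSON, properties JSON, and blob
/// payloads (sketch). A real handler also enforces the duplicate-properties
/// and unknown-part rules with 400 responses.
async fn split_parts(
    mut multipart: Multipart<'_>,
) -> Result<(Option<Bytes>, Option<Bytes>, Vec<(String, Bytes)>), multer::Error> {
    let (mut event, mut props, mut blobs) = (None, None, Vec::new());
    while let Some(field) = multipart.next_field().await? {
        let name = field.name().unwrap_or_default().to_owned();
        let data = field.bytes().await?;
        match name.as_str() {
            "event" => event = Some(data),
            "event.properties" => props = Some(data),
            // Blob parts are keyed by the property their S3 URL will replace.
            n if n.starts_with("event.properties.") => {
                blobs.push((n.trim_start_matches("event.properties.").to_owned(), data));
            }
            _ => { /* unknown part name: reject with 400 Bad Request */ }
        }
    }
    Ok((event, props, blobs))
}
```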
All blobs for an event are stored as a single multipart file in S3:
Multipart/mixed format - Similar to email MIME, with boundaries separating each blob part
```
s3://<bucket>/
  llma/
    <team_id>/
      <YYYY-MM-DD>/
        <event_id>_<random_string>.multipart
```
With retention prefixes:
```
s3://<bucket>/
  llma/
    <retention>/
      <team_id>/
        <YYYY-MM-DD>/
          <event_id>_<random_string>.multipart
```
All objects live under the `llma/` prefix. Each object carries metadata:

- `team_id`
- `event_id`
- `upload_timestamp`
- `content_type` (multipart/mixed or similar)

Properties contain S3 URLs with byte range parameters:
```json
{
  "event": "$ai_generation",
  "properties": {
    "$ai_input": "s3://bucket/llma/123/2024-01-15/event_456_x7y9z.multipart?range=0-50000",
    "$ai_output_choices": "s3://bucket/llma/123/2024-01-15/event_456_x7y9z.multipart?range=50001-75000",
    "model": "gpt-4",
    "completion_tokens": 150
  }
}
```
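A sketch of how the blob file and the byte ranges behind these URLs could be assembled. The exact on-disk multipart format, part headers, and index layout are assumptions; only the inclusive `?range=start-end` convention is taken from the example above.

```rust
use std::collections::HashMap;

/// Assemble blobs into one multipart/mixed body and record the byte range
/// of each part's payload so properties can point into the file (sketch).
fn build_multipart_file(
    boundary: &str,
    blobs: &[(String, Vec<u8>)], // (property name, payload)
) -> (Vec<u8>, HashMap<String, (usize, usize)>) {
    let mut body = Vec::new();
    let mut ranges = HashMap::new();
    for (name, payload) in blobs {
        body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
        body.extend_from_slice(
            format!("Content-Disposition: attachment; name=\"{name}\"\r\n\r\n").as_bytes(),
        );
        let start = body.len();
        body.extend_from_slice(payload);
        // Inclusive range, matching the ?range=start-end URL parameter.
        ranges.insert(name.clone(), (start, body.len().saturating_sub(1)));
        body.extend_from_slice(b"\r\n");
    }
    body.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());
    (body, ranges)
}
```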
Without retention prefix (default 30 days):

```
s3://posthog-llm-analytics/llma/123/2024-01-15/event_456_x7y9z.multipart
s3://posthog-llm-analytics/llma/456/2024-01-15/event_789_a3b5c.multipart
```
With retention prefixes:

```
s3://posthog-llm-analytics/llma/30d/123/2024-01-15/event_012_m2n4p.multipart
s3://posthog-llm-analytics/llma/90d/456/2024-01-15/event_345_q6r8s.multipart
s3://posthog-llm-analytics/llma/1y/789/2024-01-15/event_678_t1u3v.multipart
```

Supported retention prefixes are `30d/`, `90d/`, and `1y/`.
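A hypothetical helper assembling keys in this layout; the function name and signature are illustrative.

```rust
/// Build the object key for an event's blob file (sketch; follows the
/// layout above, with an optional retention prefix like "30d").
fn blob_object_key(
    retention: Option<&str>,
    team_id: u64,
    date: &str, // YYYY-MM-DD
    event_id: &str,
    random: &str,
) -> String {
    match retention {
        Some(r) => format!("llma/{r}/{team_id}/{date}/{event_id}_{random}.multipart"),
        None => format!("llma/{team_id}/{date}/{event_id}_{random}.multipart"),
    }
}
```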
The following content types are accepted for blob parts:

- `application/octet-stream` - For binary data
- `application/json` - For JSON formatted LLM context
- `text/plain` - For plain text LLM inputs/outputs

The `event` and `event.properties` parts must use `application/json`.
The endpoint supports request-level gzip compression to reduce bandwidth usage:
- Clients add the `Content-Encoding: gzip` header to the HTTP request and compress the entire multipart request body
- The server detects the `Content-Encoding: gzip` header and decompresses the entire request before processing the multipart data

Example Compressed Request:
```http
POST /i/v0/ai HTTP/1.1
Content-Type: multipart/form-data; boundary=----boundary123
Content-Encoding: gzip

[Gzipped multipart request body]
```
The entire multipart body (including all parts) is compressed as a single gzip stream.
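A client-side sketch of this compression step using the flate2 crate; how the multipart body was assembled is out of scope here.

```rust
use flate2::{write::GzEncoder, Compression};
use std::io::Write;

/// Gzip an already-assembled multipart body before sending (sketch).
/// The boundary in the Content-Type header must match the one used
/// when the body was built.
fn compress_body(multipart_body: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(multipart_body)?;
    encoder.finish()
}

// The request then carries both headers:
//   Content-Type: multipart/form-data; boundary=<boundary>
//   Content-Encoding: gzip
```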
For data received from SDKs (after request decompression, if any), blob data with the following content types is automatically compressed for storage:

- `application/json`
- `text/*` (all text subtypes)

Binary data (`application/octet-stream`) will not be automatically compressed.

All requests to the `/i/v0/ai` endpoint must be authenticated using the project's private API key. The authentication process for LLM analytics events follows these steps:
1. **API Key Extraction** - the key is read from the `Authorization: Bearer <api_key>` header on requests to `/i/v0/ai`
2. **Early Validation**
3. **Team Resolution**
4. **Request Processing**
5. **Error Handling**
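A minimal sketch of the extraction step, assuming the key arrives as a bearer token as shown above; the helper name is illustrative.

```rust
use http::header::{HeaderMap, AUTHORIZATION};

/// Pull the project's private API key out of the Authorization header
/// (sketch; returns None when the header is missing or malformed).
fn extract_api_key(headers: &HeaderMap) -> Option<&str> {
    headers
        .get(AUTHORIZATION)?
        .to_str()
        .ok()?
        .strip_prefix("Bearer ")
        .map(str::trim)
}
```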
Three approaches for handling data deletion requests:
1. **S3 Expiry (Passive)**
2. **S3 Delete by Prefix**
   - Use S3's delete by prefix functionality to remove all objects for a team
   - Simple to implement but requires listing and deleting potentially many objects
   - Example: delete all data for team 123:

     ```
     aws s3 rm s3://posthog-llm-analytics/llma/123/ --recursive
     ```

   - Or use the S3 API to delete objects with prefix `llma/123/`
3. **Per-Team Encryption**
The capture service enforces strict validation on incoming events:
- **Event Name Validation**
  - Events sent to `/i/v0/ai` must have an event name starting with `$ai_`
- **Required Fields**
- **Blob Property Validation**
- **Property Path Validation**
  - Blob part names must reference top-level properties; nested paths (e.g., `event.properties.nested.$ai_input`) are rejected
- **Size Limits** - all size limit violations return `413 Payload Too Large`:
  - The HTTP body limit is set slightly above `AI_MAX_SUM_OF_PARTS_BYTES` to account for multipart overhead
  - The sum of all part sizes must not exceed `AI_MAX_SUM_OF_PARTS_BYTES` (enforced by the handler)
- **Strict Schema Validation**
  - Each `$ai_` event type has a strictly defined schema

Architecture:
Downsides:
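As a concrete illustration of the event-name and size-limit rules described above, a minimal sketch; the limit value and error type are illustrative, and the real limit comes from the `AI_MAX_SUM_OF_PARTS_BYTES` configuration.

```rust
/// Illustrative limit; the real value is read from configuration.
const AI_MAX_SUM_OF_PARTS_BYTES: usize = 50 * 1024 * 1024;

/// Hypothetical error type mapped to HTTP status codes by the handler.
enum CaptureError {
    BadRequest(&'static str),      // -> 400 Bad Request
    PayloadTooLarge(&'static str), // -> 413 Payload Too Large
}

fn validate_event(name: &str, part_sizes: &[usize]) -> Result<(), CaptureError> {
    // Events on this endpoint must use the $ai_ prefix.
    if !name.starts_with("$ai_") {
        return Err(CaptureError::BadRequest("event name must start with $ai_"));
    }
    // The sum of all part sizes is capped by AI_MAX_SUM_OF_PARTS_BYTES.
    let total: usize = part_sizes.iter().sum();
    if total > AI_MAX_SUM_OF_PARTS_BYTES {
        return Err(CaptureError::PayloadTooLarge("sum of parts exceeds limit"));
    }
    Ok(())
}
```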