Rust feature flags service

docs/internal/feature-flags/rust-service-overview.md


The Rust feature flags service (rust/feature-flags/) handles all runtime feature flag evaluation. It serves the /flags and /decide endpoints that SDKs call. Django remains the admin API for flag CRUD operations (/api/projects/{id}/feature_flags/) and serves the local evaluation endpoint (/api/feature_flag/local_evaluation).

Infrastructure routing

Traffic routing happens at the Kubernetes infrastructure level using Contour HTTPProxy resources (Envoy-based). The Rust service never receives requests through Django -- they are routed directly by Contour.

```text
Client
  │
  ▼
AWS ALB
  │
  ▼
Contour / Envoy (path-based routing)
  │
  ├── /decide/*              ──▶ posthog-feature-flags:3001  (Rust)
  ├── /flags/?               ──▶ posthog-feature-flags:3001  (Rust)
  ├── /api/feature_flag/local_evaluation ──▶ posthog-local-evaluation:8000 (Django, dedicated deployment)
  ├── /api/*                 ──▶ posthog-web-django:8000     (Django, catch-all)
  └── /*                     ──▶ posthog-web-django:8000     (Django, final catch-all)
```

Key routing details:

  • The decide and feature-flags proxy blocks are included before the api block in Contour, so they match first
  • /decide adds an X-Original-Endpoint: decide header so the Rust service can adjust response format
  • A dedicated subdomain (us-d.i.posthog.com / eu-d.i.posthog.com) routes only to decide + feature-flags with no Django fallback
  • All flag routes have a 5-second timeout and 2 retries on reset/cancelled
  • Canary rollouts are supported via Argo Rollouts adjusting weights on the HTTPProxy resources

Routing config lives in the charts repo: argocd/contour-ingress/values/values.prod-us.yaml (and prod-eu, dev variants).
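
The first-match ordering in the diagram can be sketched as a plain function. This is only an illustration of the precedence rules: real routing is done by Contour HTTPProxy configuration, not application code, and `Upstream`, `under`, and `route` are hypothetical names. The sketch assumes `path` carries no query string and that `/decide` without a trailing slash also matches.

```rust
/// Upstream services named in the routing diagram.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Upstream {
    FeatureFlags,    // posthog-feature-flags:3001 (Rust)
    LocalEvaluation, // posthog-local-evaluation:8000 (Django, dedicated)
    WebDjango,       // posthog-web-django:8000 (Django catch-all)
}

/// True when `path` equals `prefix` or continues it at a `/` boundary.
fn under(path: &str, prefix: &str) -> bool {
    match path.strip_prefix(prefix) {
        Some(rest) => rest.is_empty() || rest.starts_with('/'),
        None => false,
    }
}

/// Hypothetical first-match router: flag routes are checked before the
/// Django rules, mirroring the block ordering described above.
pub fn route(path: &str) -> Upstream {
    if under(path, "/decide") || path == "/flags" || path == "/flags/" {
        Upstream::FeatureFlags
    } else if under(path, "/api/feature_flag/local_evaluation") {
        Upstream::LocalEvaluation
    } else {
        // Covers both /api/* and the final /* catch-all.
        Upstream::WebDjango
    }
}
```

Under these assumptions, `/flags/definitions` falls through to the Django catch-all, which is consistent with that endpoint not being routed to the Rust service in production.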

Architecture overview

```text
┌─────────────────────────────────────────────────────────────────┐
│                          SDK Request                            │
│                    POST /flags or /decide                       │
└─────────────────────────────────────────────────────────────────┘
                               │
                        Contour / Envoy
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Rust Feature Flags Service                  │
│                     (Axum, port 3001)                           │
├─────────────────────────────────────────────────────────────────┤
│  Rate limiting ──▶ Auth ──▶ Decode ──▶ Evaluate ──▶ Response   │
└─────────────────────────────────────────────────────────────────┘
        │                │                    │
        ▼                ▼                    ▼
  ┌──────────┐   ┌──────────────┐   ┌──────────────────┐
  │  Redis   │   │  PostgreSQL  │   │   S3 (fallback)  │
  │ (cache)  │   │  (source of  │   │   via HyperCache │
  │          │   │   truth)     │   │                  │
  └──────────┘   └──────────────┘   └──────────────────┘
```

Project structure

| Directory | Purpose |
|---|---|
| src/api/ | HTTP endpoint handlers, auth, rate limiting, request/response types |
| src/handler/ | Request processing pipeline: decoding, billing, evaluation, session recording, config assembly |
| src/flags/ | Core domain: flag models, matching engine, property filters, analytics, dependency graph |
| src/cohorts/ | Cohort models, DB operations, in-memory cache (moka), realtime membership providers |
| src/properties/ | Property models, operator matching, relative date parsing |
| src/team/ | Team model and DB operations |
| src/database/ | Connection management, persons DB routing |
| src/metrics/ | Prometheus metric constants and utilities |
| src/utils/ | User-agent parsing, graph algorithms |
| src/site_apps/ | Site apps support |
| tests/ | Integration tests (flag matching, HTTP methods, rate limiting, experience continuity) |

HTTP endpoints

All routes are defined in rust/feature-flags/src/router.rs.

| Route | Method | Handler | Purpose |
|---|---|---|---|
| /flags | POST | endpoint::flags | Feature flag evaluation (primary endpoint) |
| /flags | GET | endpoint::flags | Returns minimal response with empty flags |
| /decide | POST | endpoint::flags | Same handler as /flags, response format varies via X-Original-Endpoint: decide header |
| /flags/definitions | GET | flag_definitions::flags_definitions | WIP, not routed in production. Flag definitions for local SDK evaluation (requires secret token or personal API key) |
| / | GET | index | Returns "feature flags" (basic health check) |
| /_readiness | GET | readiness | Kubernetes readiness probe, tests all 4 DB pool connections |
| /_liveness | GET | liveness | Kubernetes liveness probe, heartbeat-based |
| /_startup | GET | startup | Kubernetes startup probe, warms DB pools |
| /metrics | GET | Prometheus | Metrics scrape endpoint (when ENABLE_METRICS=true) |

All flag routes accept trailing slashes.

/flags request processing

The POST handler follows this pipeline:

  1. Rate limiting: IP-based check (DDoS defense), then token-based check (per-project limits)
  2. Body decoding: JSON, base64, or gzip-compressed bodies
  3. Authentication: Extracts API token from body, query params, or headers
  4. Team lookup: HyperCache (Redis -> S3) with PostgreSQL fallback
  5. Flag definitions fetch: HyperCache (Redis -> S3) with PostgreSQL fallback
  6. Billing check: Verifies the team's feature flag quota hasn't been exceeded
  7. Flag evaluation: Core matching logic (see flag-evaluation-engine.md)
  8. Config assembly: Session recording settings, error tracking, site apps
  9. Response formatting: Version-specific serialization
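
Steps 4 and 5 share the same lookup order: Redis first, then S3, then PostgreSQL. A minimal sketch of that tiered fallback is below. It is synchronous and uses closures in place of real clients, whereas the actual service is async; `Source` and `lookup_with_fallback` are hypothetical names, not the HyperCache API.

```rust
/// Tier that ultimately produced the value.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Source {
    Redis,
    S3,
    Postgres,
}

/// Tries each tier in order and returns the first hit with its tier.
/// Later closures are never called once an earlier tier returns a value,
/// so the database is only touched when both cache tiers miss.
pub fn lookup_with_fallback<T>(
    redis: impl FnOnce() -> Option<T>,
    s3: impl FnOnce() -> Option<T>,
    postgres: impl FnOnce() -> Option<T>,
) -> Option<(Source, T)> {
    if let Some(value) = redis() {
        return Some((Source::Redis, value));
    }
    if let Some(value) = s3() {
        return Some((Source::S3, value));
    }
    postgres().map(|value| (Source::Postgres, value))
}
```

Returning the tier alongside the value makes it straightforward to emit per-tier hit metrics, although whether the real service reports hits this way is not stated here.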

Response versioning

The response format depends on the v query parameter and the endpoint:

| Version | Endpoint | Response format |
|---|---|---|
| (default) | /flags | LegacyFlagsResponse: flat feature_flags: { key: value } map |
| v=2 | /flags | FlagsResponse: detailed flags: { key: FlagDetails } map with reasons, metadata, payloads |
| v=1 | /decide | DecideV1Response: list of active flag keys |
| v=2 | /decide | DecideV2Response: flat feature_flags: { key: value } map |
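
The table can be read as a lookup from (endpoint, `v`) to a response type. The sketch below encodes only the four listed combinations and returns `None` for anything else, because the behavior for other combinations is not documented here. `Endpoint`, `ResponseFormat`, and `response_format` are hypothetical names; the variant names mirror the response types in the table.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Endpoint {
    Flags,
    Decide,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ResponseFormat {
    LegacyFlagsResponse, // flat feature_flags map
    FlagsResponse,       // detailed flags map
    DecideV1Response,    // list of active flag keys
    DecideV2Response,    // flat feature_flags map
}

/// Maps the endpoint and the parsed `v` query parameter to a format.
/// `None` for `v` means the parameter was absent.
pub fn response_format(endpoint: Endpoint, v: Option<u32>) -> Option<ResponseFormat> {
    match (endpoint, v) {
        (Endpoint::Flags, None) => Some(ResponseFormat::LegacyFlagsResponse),
        (Endpoint::Flags, Some(2)) => Some(ResponseFormat::FlagsResponse),
        (Endpoint::Decide, Some(1)) => Some(ResponseFormat::DecideV1Response),
        (Endpoint::Decide, Some(2)) => Some(ResponseFormat::DecideV2Response),
        // Combinations not listed in the table are left undecided here.
        _ => None,
    }
}
```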

/flags/definitions endpoint (under construction)

Not live in production. This endpoint is under active development and is not routed by Contour. Local evaluation is currently served by Django at /api/feature_flag/local_evaluation (see Django API endpoints), which remains the production endpoint for server-side SDKs.

The goal is for this Rust endpoint to replace the Django local evaluation endpoint. When complete, it will serve flag definitions for SDKs that evaluate flags locally, authenticated via:

  • Team secret API token (Authorization: Bearer phs_...), or
  • Personal API key with feature_flag:read scope

The current implementation returns flag definitions with cohort data from HyperCache, falling back to PostgreSQL on a cache miss. It supports ETag-based conditional requests (If-None-Match header) to avoid re-transferring unchanged definitions. Requests are rate limited per team (default 600/minute, with per-team overrides via LOCAL_EVAL_RATE_LIMITS).

Billing quota enforcement matches Django's /api/feature_flag/local_evaluation behavior:

  • Quota check: Uses FeatureFlagsLimiter.is_limited(token) to verify the team hasn't exceeded their feature flag request quota. Returns HTTP 402 with a JSON body ({"type": "quota_limited", "code": "payment_required", ...}) when the quota is exceeded.
  • Non-billable flag filtering: Usage tracking skips requests where the response contains only non-billable flags — i.e., flags with keys starting with survey-targeting- or product-tour-targeting-. The shared is_billable_flag_key() predicate (in flag_analytics.rs) is used by both this endpoint and the /flags billing handler.
  • 304 responses skip billing: Usage is recorded after the ETag/304 path, so conditional responses that return 304 Not Modified are not counted toward billing. This matches Django's behavior.
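
The last two rules can be combined into one decision. The sketch below is an illustration, not the code in flag_analytics.rs: the body of `is_billable_flag_key` is inferred from the prefix list above, and `should_record_usage` is a hypothetical helper. It also assumes that a response with no flags at all is not billed, which the text does not state.

```rust
/// Key prefixes listed above as non-billable.
const NON_BILLABLE_PREFIXES: [&str; 2] = ["survey-targeting-", "product-tour-targeting-"];

/// Sketch of the shared predicate: a key is billable unless it starts
/// with one of the non-billable prefixes.
pub fn is_billable_flag_key(key: &str) -> bool {
    !NON_BILLABLE_PREFIXES
        .iter()
        .any(|prefix| key.starts_with(*prefix))
}

/// Hypothetical helper: usage is recorded only for full responses
/// (not 304 Not Modified) that contain at least one billable flag.
pub fn should_record_usage(not_modified: bool, flag_keys: &[&str]) -> bool {
    !not_modified && flag_keys.iter().any(|key| is_billable_flag_key(key))
}
```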

Request and response types

FlagRequest (POST body)

```rust
pub struct FlagRequest {
    pub token: Option<String>,               // aliases: $token, api_key
    pub distinct_id: Option<String>,         // alias: $distinct_id
    pub geoip_disable: Option<bool>,
    pub disable_flags: Option<bool>,
    pub person_properties: Option<HashMap<String, Value>>,
    pub groups: Option<HashMap<String, Value>>,
    pub group_properties: Option<HashMap<String, HashMap<String, Value>>>,
    pub anon_distinct_id: Option<String>,    // alias: $anon_distinct_id
    pub device_id: Option<String>,           // alias: $device_id
    pub flag_keys: Option<Vec<String>>,      // evaluate only these flags
    pub timezone: Option<String>,
    pub evaluation_contexts: Option<Vec<String>>,
    pub evaluation_runtime: Option<EvaluationRuntime>,
}
```

FlagsResponse (v2 response)

```rust
pub struct FlagsResponse {
    pub errors_while_computing_flags: bool,
    pub flags: HashMap<String, FlagDetails>,
    pub quota_limited: Option<Vec<String>>,
    pub request_id: Uuid,
    pub evaluated_at: i64,
    pub config: ConfigResponse,
}

pub struct FlagDetails {
    pub key: String,
    pub enabled: bool,
    pub variant: Option<String>,
    pub reason: FlagEvaluationReason,
    pub metadata: FlagDetailsMetadata,
}
```

Rate limiting

The service runs three independent rate limiters (IP, token, definitions), all in-process and built on the governor crate. The /flags IP and token limiters support a warn-then-enforce model with X-PostHog-Rate-Limit-Warning headers and per-token custom overrides. See rate-limiting.md for the full model, configuration modes, and migration path.
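
To make the warn-then-enforce idea concrete, here is a hand-rolled token bucket with two modes. It does not use or reproduce the governor API, and it assumes that "warn" means an over-limit request is still served but flagged (for example via the X-PostHog-Rate-Limit-Warning header); rate-limiting.md is the authority on the real semantics. `Mode`, `Outcome`, and `TokenBucket` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Mode {
    Warn,
    Enforce,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Outcome {
    Allowed,
    AllowedWithWarning, // over the limit, served anyway, warning attached
    Rejected,           // over the limit, request refused
}

pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    /// Starts full, with the clock at zero seconds.
    pub fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_refill_secs: 0.0 }
    }

    /// `now_secs` is passed in rather than read from a clock so the
    /// sketch stays deterministic.
    pub fn check(&mut self, now_secs: f64, mode: Mode) -> Outcome {
        if now_secs > self.last_refill_secs {
            let elapsed = now_secs - self.last_refill_secs;
            self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
            self.last_refill_secs = now_secs;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            return Outcome::Allowed;
        }
        match mode {
            Mode::Warn => Outcome::AllowedWithWarning,
            Mode::Enforce => Outcome::Rejected,
        }
    }
}
```

Running a limiter in `Warn` mode first lets operators see which tokens would be throttled before switching to `Enforce`.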

Server initialization

The serve() function in rust/feature-flags/src/server.rs orchestrates startup:

  1. Redis clients: Shared ReadWriteClient (auto-routes reads to replica). Optional dedicated flags Redis with 3-mode migration: shared-only -> dual-write -> dedicated-only.
  2. Database pools: PostgresRouter with 4 pools (persons reader/writer, non-persons reader/writer), plus an optional behavioral cohorts reader pool. See database-interaction-patterns.md.
  3. GeoIP: MaxMind database for IP geolocation.
  4. Cohort cache: In-memory CohortCacheManager (moka, 256 MB default, 5-minute TTL).
  5. HyperCache readers: 4 pre-initialized readers for flags, flags+cohorts, team metadata, and config.
  6. Billing limiters: Redis-backed quota enforcement for feature flags and session replay.
  7. Cookieless manager: Redis-backed cookieless identity resolution.
  8. Background tasks: DB pool monitoring, cohort cache monitoring, rate limiter cleanup, health heartbeat.

Configuration reference

All values come from environment variables via the envconfig crate. Defined in rust/feature-flags/src/config.rs.

Server

| Variable | Default | Purpose |
|---|---|---|
| ADDRESS | 127.0.0.1:3001 | Listen address |
| MAX_CONCURRENCY | 1000 | Max concurrent flag evaluation requests |
| DEBUG | false | Pretty console logging vs JSON structured logging |
| ENABLE_METRICS | false | Expose /metrics endpoint |

PostgreSQL

| Variable | Default | Purpose |
|---|---|---|
| WRITE_DATABASE_URL | postgres://posthog:posthog@localhost:5432/posthog | Main database primary |
| READ_DATABASE_URL | same | Main database replica |
| PERSONS_WRITE_DATABASE_URL | (empty, aliases to main) | Persons database primary |
| PERSONS_READ_DATABASE_URL | (empty, aliases to main) | Persons database replica |
| MAX_PG_CONNECTIONS | 10 | Max connections per pool |
| ACQUIRE_TIMEOUT_SECS | 5 | Connection acquisition timeout |
| IDLE_TIMEOUT_SECS | 300 | Close idle connections after this |
| NON_PERSONS_READER_STATEMENT_TIMEOUT_MS | 2000 | Statement timeout for flag/team reads |
| PERSONS_READER_STATEMENT_TIMEOUT_MS | 3000 | Statement timeout for person lookups |
| WRITER_STATEMENT_TIMEOUT_MS | 3000 | Statement timeout for writes |

Behavioral cohorts

| Variable | Default | Purpose |
|---|---|---|
| BEHAVIORAL_COHORTS_READ_DATABASE_URL | (empty) | Optional PostgreSQL connection for realtime cohort membership lookups. When empty, realtime cohort evaluation is disabled |
| COHORT_MEMBERSHIP_CACHE_TTL_SECONDS | 60 | Cache TTL for cohort membership lookups |
| COHORT_MEMBERSHIP_CACHE_MAX_ENTRIES | 500000 | Max entries in cohort membership cache |

The behavioral cohorts pool uses tight limits (max 5 connections, 1s statement timeout) since it only performs simple key lookups against the cohort_membership table. When BEHAVIORAL_COHORTS_READ_DATABASE_URL is not set, a NoOpCohortMembershipProvider is used and all realtime cohort checks return false (graceful degradation).
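
The graceful-degradation behavior follows the null-object pattern. The sketch below shows the shape of it under stated assumptions: only the name `NoOpCohortMembershipProvider` comes from the text above, while the trait, its synchronous `is_member` signature, `InMemoryProvider`, and `select_provider` are hypothetical stand-ins.

```rust
use std::collections::HashSet;

/// Hypothetical trait; the real interface is likely async and may take
/// different arguments.
pub trait CohortMembershipProvider {
    fn is_member(&self, cohort_id: i64, person_id: &str) -> bool;
}

/// Used when no behavioral cohorts database is configured:
/// every realtime membership check returns false.
pub struct NoOpCohortMembershipProvider;

impl CohortMembershipProvider for NoOpCohortMembershipProvider {
    fn is_member(&self, _cohort_id: i64, _person_id: &str) -> bool {
        false
    }
}

/// Stand-in for a database-backed provider.
pub struct InMemoryProvider {
    pub members: HashSet<(i64, String)>,
}

impl CohortMembershipProvider for InMemoryProvider {
    fn is_member(&self, cohort_id: i64, person_id: &str) -> bool {
        self.members.contains(&(cohort_id, person_id.to_string()))
    }
}

/// Picks the no-op provider when the URL is empty, otherwise the
/// configured one.
pub fn select_provider(
    database_url: &str,
    configured: Box<dyn CohortMembershipProvider>,
) -> Box<dyn CohortMembershipProvider> {
    if database_url.is_empty() {
        Box::new(NoOpCohortMembershipProvider)
    } else {
        configured
    }
}
```

Because callers only see the trait, flag evaluation code does not need a separate "cohorts disabled" branch.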

Redis

| Variable | Default | Purpose |
|---|---|---|
| REDIS_URL | redis://localhost:6379/ | Shared Redis primary |
| REDIS_READER_URL | (falls back to REDIS_URL) | Shared Redis replica |
| FLAGS_REDIS_URL | (empty) | Dedicated flags Redis primary |
| FLAGS_REDIS_READER_URL | (empty) | Dedicated flags Redis replica |
| FLAGS_REDIS_ENABLED | false | Read from dedicated flags Redis |
| REDIS_RESPONSE_TIMEOUT_MS | 100 | Redis response timeout (capped at 30s) |
| REDIS_CONNECTION_TIMEOUT_MS | 5000 | Redis connection timeout (capped at 60s) |
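
The "capped at" notes mean a configured timeout above the cap is reduced to the cap. A minimal sketch of that clamping is below. `capped_timeout` is a hypothetical helper that takes the raw variable value as an argument instead of reading the environment, and its fallback to the default on an unparsable value is this sketch's choice, not necessarily how envconfig behaves.

```rust
use std::time::Duration;

/// Parses a millisecond value, falls back to `default_ms` when the
/// variable is unset or unparsable, then clamps the result to `cap`.
pub fn capped_timeout(raw: Option<&str>, default_ms: u64, cap: Duration) -> Duration {
    let ms = raw
        .and_then(|value| value.trim().parse::<u64>().ok())
        .unwrap_or(default_ms);
    Duration::from_millis(ms).min(cap)
}
```

For example, with the defaults above, an unset REDIS_RESPONSE_TIMEOUT_MS yields 100 ms, while a REDIS_CONNECTION_TIMEOUT_MS of 120000 is reduced to the 60-second cap.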

S3 / HyperCache

| Variable | Default | Purpose |
|---|---|---|
| OBJECT_STORAGE_BUCKET | posthog | S3 bucket name |
| OBJECT_STORAGE_REGION | us-east-1 | AWS region |
| OBJECT_STORAGE_ENDPOINT | (empty) | Custom S3 endpoint for local dev |

Rate limiting

See rate-limiting.md for the full configuration reference.

Caching

| Variable | Default | Purpose |
|---|---|---|
| COHORT_CACHE_CAPACITY_BYTES | 268435456 (256 MB) | Moka cache memory limit |
| CACHE_TTL_SECONDS | 300 | Cohort cache TTL |
| BILLING_LIMITER_CACHE_TTL_SECS | 5 | Billing limiter cache TTL |

See Behavioral cohorts for cohort membership cache settings.

Observability

| Variable | Default | Purpose |
|---|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | (disabled) | OpenTelemetry collector endpoint |
| OTEL_TRACES_SAMPLER_ARG | 0.001 | Trace sampling rate (0.1%) |
| OTEL_SERVICE_NAME | posthog-feature-flags | Service name in traces |
| TEAM_IDS_TO_TRACK | all | Teams to emit detailed metrics for (all, none, comma-separated, or range 1:100) |
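
TEAM_IDS_TO_TRACK accepts four shapes, which a small parser can illustrate. This sketch is not the parser in config.rs: `TeamIdsToTrack` and its methods are hypothetical, ranges are assumed to be inclusive on both ends, and mixing ranges with single IDs in one comma-separated list is an assumption of the sketch.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TeamIdsToTrack {
    All,
    NoTeams,
    Ids(Vec<i32>),
}

impl TeamIdsToTrack {
    /// Accepts "all", "none", "1,2,3", or "1:100".
    pub fn parse(raw: &str) -> Result<Self, String> {
        let raw = raw.trim();
        match raw {
            "all" => return Ok(Self::All),
            "none" => return Ok(Self::NoTeams),
            _ => {}
        }
        let mut ids: Vec<i32> = Vec::new();
        for part in raw.split(',') {
            let part = part.trim();
            if let Some((start, end)) = part.split_once(':') {
                let start: i32 = start
                    .trim()
                    .parse()
                    .map_err(|_| format!("bad range start: {}", part))?;
                let end: i32 = end
                    .trim()
                    .parse()
                    .map_err(|_| format!("bad range end: {}", part))?;
                if start > end {
                    return Err(format!("empty range: {}", part));
                }
                // Assumed inclusive; materialized for simplicity.
                ids.extend(start..=end);
            } else {
                let id: i32 = part
                    .parse()
                    .map_err(|_| format!("bad team id: {}", part))?;
                ids.push(id);
            }
        }
        Ok(Self::Ids(ids))
    }

    /// Whether detailed metrics should be emitted for `team_id`.
    pub fn tracks(&self, team_id: i32) -> bool {
        match self {
            Self::All => true,
            Self::NoTeams => false,
            Self::Ids(ids) => ids.contains(&team_id),
        }
    }
}
```

Materializing a range into a vector keeps the sketch short; a production parser handling very wide ranges would more likely store the bounds.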

Other

| Variable | Default | Purpose |
|---|---|---|
| MAXMIND_DB_PATH | share/GeoLite2-City.mmdb | GeoIP database path |
| OPTIMIZE_EXPERIENCE_CONTINUITY_LOOKUPS | true | Skip DB lookups for 100%-rollout flags |
| FLAGS_SESSION_REPLAY_QUOTA_CHECK | false | Check session replay quota |

Key dependencies

| Crate | Purpose |
|---|---|
| axum | HTTP framework |
| sqlx | Async PostgreSQL driver |
| tokio | Async runtime |
| serde / serde_json / serde-pickle | Serialization (pickle for HyperCache interop with Python) |
| governor | Token-bucket rate limiting |
| moka | Concurrent in-memory cache (cohorts) |
| sha1 / sha2 | Hashing for flag rollout and variant selection |
| petgraph | Dependency graph (flag-on-flag dependencies, cohort dependencies) |
| fancy-regex | Regex property matching with backtrack limits |
| semver | Semantic versioning operator support |
| rayon | Parallel flag evaluation within dependency stages |
| tokio-retry | Exponential backoff for DB operations |

Middleware

Applied in order via Axum layers (defined in router.rs):

  1. ConcurrencyLimitLayer: Caps concurrent flag evaluation requests (default 1000)
  2. TraceLayer: HTTP request tracing with spans
  3. CorsLayer: Permissive CORS (mirrors request origin, allows credentials, exposes x-posthog-rate-limit-warning)
  4. track_metrics: Prometheus HTTP request metrics

Key files

| File | Purpose |
|---|---|
| rust/feature-flags/src/main.rs | Binary entry point, tracing setup |
| rust/feature-flags/src/server.rs | Service initialization, resource creation |
| rust/feature-flags/src/router.rs | Axum router, routes, shared state |
| rust/feature-flags/src/config.rs | Environment variable configuration |
| rust/feature-flags/src/api/endpoint.rs | /flags and /decide handler |
| rust/feature-flags/src/api/flag_definitions.rs | /flags/definitions handler |
| rust/feature-flags/src/api/auth.rs | Authentication (secret tokens, personal API keys) |
| rust/feature-flags/src/api/types.rs | Request/response types |
| rust/feature-flags/src/handler/flags.rs | Core request processing pipeline |

See also