docs/internal/feature-flags/rust-service-overview.md
The Rust feature flags service (rust/feature-flags/) handles all runtime feature flag evaluation. It serves the /flags and /decide endpoints that SDKs call. Django remains the admin API for flag CRUD operations (/api/projects/{id}/feature_flags/) and serves the local evaluation endpoint (/api/feature_flag/local_evaluation).
Traffic routing happens at the Kubernetes infrastructure level using Contour HTTPProxy resources (Envoy-based). The Rust service never receives requests through Django -- they are routed directly by Contour.
Client
│
▼
AWS ALB
│
▼
Contour / Envoy (path-based routing)
│
├── /decide/* ──▶ posthog-feature-flags:3001 (Rust)
├── /flags/? ──▶ posthog-feature-flags:3001 (Rust)
├── /api/feature_flag/local_evaluation ──▶ posthog-local-evaluation:8000 (Django, dedicated deployment)
├── /api/* ──▶ posthog-web-django:8000 (Django, catch-all)
└── /* ──▶ posthog-web-django:8000 (Django, final catch-all)
Key routing details:
- decide and feature-flags proxy blocks are included before the api block in Contour, so they match first
- /decide adds an X-Original-Endpoint: decide header so the Rust service can adjust response format
- The us-d.i.posthog.com / eu-d.i.posthog.com hosts route only to decide + feature-flags with no Django fallback
- […] reset/cancelled

Routing config lives in the charts repo: argocd/contour-ingress/values/values.prod-us.yaml (and prod-eu, dev variants).
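The first-match behaviour in the routing details above can be modelled in a few lines. This is only a simplified sketch of the routing table in the diagram, not Contour's actual matching logic: the `Upstream` enum and `route` function are hypothetical names, and treating `/decide/*` as "`/decide` or anything under `/decide/`" is an assumption.

```rust
/// Upstreams named in the routing diagram.
#[derive(Debug, PartialEq)]
enum Upstream {
    RustFlags,       // posthog-feature-flags:3001
    LocalEvaluation, // posthog-local-evaluation:8000
    Django,          // posthog-web-django:8000
}

/// Hypothetical first-match router: rules are checked top to bottom,
/// mirroring the order of the proxy blocks.
fn route(path: &str) -> Upstream {
    // Ignore any query string when matching.
    let path = path.split('?').next().unwrap_or(path);

    if path == "/decide" || path.starts_with("/decide/") {
        Upstream::RustFlags
    } else if path == "/flags" || path == "/flags/" {
        // "/flags/?" in the diagram: /flags with an optional trailing slash.
        Upstream::RustFlags
    } else if path.starts_with("/api/feature_flag/local_evaluation") {
        Upstream::LocalEvaluation
    } else {
        // Both the /api/* block and the final /* block point at Django.
        Upstream::Django
    }
}
```

Under this model `/flags/definitions` falls through to the Django catch-all, which is consistent with that endpoint not being routed to the Rust service in production.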
┌─────────────────────────────────────────────────────────────────┐
│ SDK Request │
│ POST /flags or /decide │
└─────────────────────────────────────────────────────────────────┘
│
Contour / Envoy
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Rust Feature Flags Service │
│ (Axum, port 3001) │
├─────────────────────────────────────────────────────────────────┤
│ Rate limiting ──▶ Auth ──▶ Decode ──▶ Evaluate ──▶ Response │
└─────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────────┐ ┌──────────────────┐
│ Redis │ │ PostgreSQL │ │ S3 (fallback) │
│ (cache) │ │ (source of │ │ via HyperCache │
│ │ │ truth) │ │ │
└──────────┘ └──────────────┘ └──────────────────┘
| Directory | Purpose |
|---|---|
| src/api/ | HTTP endpoint handlers, auth, rate limiting, request/response types |
| src/handler/ | Request processing pipeline: decoding, billing, evaluation, session recording, config assembly |
| src/flags/ | Core domain: flag models, matching engine, property filters, analytics, dependency graph |
| src/cohorts/ | Cohort models, DB operations, in-memory cache (moka), realtime membership providers |
| src/properties/ | Property models, operator matching, relative date parsing |
| src/team/ | Team model and DB operations |
| src/database/ | Connection management, persons DB routing |
| src/metrics/ | Prometheus metric constants and utilities |
| src/utils/ | User-agent parsing, graph algorithms |
| src/site_apps/ | Site apps support |
| tests/ | Integration tests (flag matching, HTTP methods, rate limiting, experience continuity) |
All routes are defined in rust/feature-flags/src/router.rs.
| Route | Method | Handler | Purpose |
|---|---|---|---|
| /flags | POST | endpoint::flags | Feature flag evaluation (primary endpoint) |
| /flags | GET | endpoint::flags | Returns minimal response with empty flags |
| /decide | POST | endpoint::flags | Same handler as /flags, response format varies via X-Original-Endpoint: decide header |
| /flags/definitions | GET | flag_definitions::flags_definitions | WIP, not routed in production. Flag definitions for local SDK evaluation (requires secret token or personal API key) |
| / | GET | index | Returns "feature flags" (basic health check) |
| /_readiness | GET | readiness | Kubernetes readiness probe, tests all 4 DB pool connections |
| /_liveness | GET | liveness | Kubernetes liveness probe, heartbeat-based |
| /_startup | GET | startup | Kubernetes startup probe, warms DB pools |
| /metrics | GET | Prometheus | Metrics scrape endpoint (when ENABLE_METRICS=true) |
All flag routes accept trailing slashes.
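/flags request processing

The POST handler follows the pipeline shown in the service diagram: rate limiting, authentication, request decoding, flag evaluation, then response assembly.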
The response format depends on the v query parameter and the endpoint:
| Version | Endpoint | Response format |
|---|---|---|
| (default) | /flags | LegacyFlagsResponse: flat feature_flags: { key: value } map |
| v=2 | /flags | FlagsResponse: detailed flags: { key: FlagDetails } map with reasons, metadata, payloads |
| v=1 | /decide | DecideV1Response: list of active flag keys |
| v=2 | /decide | DecideV2Response: flat feature_flags: { key: value } map |
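The table above amounts to a lookup on two inputs: whether the request arrived via /decide (signalled by the X-Original-Endpoint header) and the value of the v query parameter. A minimal sketch of that lookup follows. `ResponseFormat`, `is_decide_request`, and `select_format` are hypothetical names, and combinations the table does not list are returned as `None` rather than guessed.

```rust
/// One variant per row of the response format table.
#[derive(Debug, PartialEq)]
enum ResponseFormat {
    LegacyFlags, // LegacyFlagsResponse
    FlagsV2,     // FlagsResponse
    DecideV1,    // DecideV1Response
    DecideV2,    // DecideV2Response
}

/// True when the proxy marked the request as coming from /decide.
fn is_decide_request(x_original_endpoint: Option<&str>) -> bool {
    x_original_endpoint == Some("decide")
}

/// Maps (endpoint, v) to a response format. Combinations not covered
/// by the table yield None; the real service may handle them differently.
fn select_format(is_decide: bool, v: Option<&str>) -> Option<ResponseFormat> {
    match (is_decide, v) {
        (false, None) => Some(ResponseFormat::LegacyFlags),
        (false, Some("2")) => Some(ResponseFormat::FlagsV2),
        (true, Some("1")) => Some(ResponseFormat::DecideV1),
        (true, Some("2")) => Some(ResponseFormat::DecideV2),
        _ => None,
    }
}
```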
/flags/definitions endpoint (under construction)

Not live in production. This endpoint is under active development and is not routed by Contour. Local evaluation is currently served by Django at /api/feature_flag/local_evaluation (see Django API endpoints), which remains the production endpoint for server-side SDKs.
The goal is for this Rust endpoint to replace the Django local evaluation endpoint. When complete, it will serve flag definitions for SDKs that evaluate flags locally, authenticated via:
- A secret token (Authorization: Bearer phs_...), or
- A personal API key with the feature_flag:read scope

The current implementation returns flag definitions with cohort data from HyperCache, with PostgreSQL fallback on cache miss. It supports ETag-based conditional requests (If-None-Match header) to avoid re-transferring unchanged definitions. It is rate limited per team (default 600/minute, per-team overrides via LOCAL_EVAL_RATE_LIMITS).
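The ETag handling mentioned above follows the usual conditional GET pattern: if any validator in If-None-Match matches the current ETag, the server can answer 304 Not Modified with no body. The sketch below illustrates that comparison only. `Conditional`, `strip_weak`, and `evaluate_if_none_match` are hypothetical names, and splitting the header on commas is a simplification that assumes ETag values contain no commas.

```rust
#[derive(Debug, PartialEq)]
enum Conditional {
    NotModified, // respond with 304 and no body
    Full,        // respond with 200 and the definitions
}

/// Drops the weak-validator prefix so W/"abc" and "abc" compare equal,
/// which is the weak comparison used for If-None-Match.
fn strip_weak(tag: &str) -> &str {
    let tag = tag.trim();
    tag.strip_prefix("W/").unwrap_or(tag)
}

/// Decides between 304 and a full response for a given request header.
fn evaluate_if_none_match(if_none_match: Option<&str>, current_etag: &str) -> Conditional {
    let header = match if_none_match {
        Some(h) => h,
        None => return Conditional::Full,
    };
    let current = strip_weak(current_etag);
    let matched = header
        .split(',')
        .map(strip_weak)
        .any(|candidate| candidate == "*" || candidate == current);
    if matched {
        Conditional::NotModified
    } else {
        Conditional::Full
    }
}
```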
Billing quota enforcement matches Django's /api/feature_flag/local_evaluation behavior:
- Calls FeatureFlagsLimiter.is_limited(token) to verify the team hasn't exceeded their feature flag request quota. Returns HTTP 402 with a JSON body ({"type": "quota_limited", "code": "payment_required", ...}) when the quota is exceeded.
- Excludes flags whose keys start with survey-targeting- or product-tour-targeting- from billing. The shared is_billable_flag_key() predicate (in flag_analytics.rs) is used by both this endpoint and the /flags billing handler.
- Responses returning 304 Not Modified are not counted toward billing. This matches Django's behavior.

FlagRequest (POST body)

pub struct FlagRequest {
pub token: Option<String>, // aliases: $token, api_key
pub distinct_id: Option<String>, // alias: $distinct_id
pub geoip_disable: Option<bool>,
pub disable_flags: Option<bool>,
pub person_properties: Option<HashMap<String, Value>>,
pub groups: Option<HashMap<String, Value>>,
pub group_properties: Option<HashMap<String, HashMap<String, Value>>>,
pub anon_distinct_id: Option<String>, // alias: $anon_distinct_id
pub device_id: Option<String>, // alias: $device_id
pub flag_keys: Option<Vec<String>>, // evaluate only these flags
pub timezone: Option<String>,
pub evaluation_contexts: Option<Vec<String>>,
pub evaluation_runtime: Option<EvaluationRuntime>,
}
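As an illustration of how one of these optional fields might be consumed, the sketch below applies flag_keys ("evaluate only these flags") to a list of known flag keys. `select_flags` is a hypothetical helper, not a function from the service, and treating an empty list as "evaluate nothing" is an assumption.

```rust
/// Returns the flag keys to evaluate. With no flag_keys in the request,
/// every known flag is evaluated; otherwise only the requested ones that
/// actually exist are kept, in the order of `all_keys`.
fn select_flags<'a>(all_keys: &[&'a str], flag_keys: Option<&[String]>) -> Vec<&'a str> {
    match flag_keys {
        None => all_keys.to_vec(),
        Some(wanted) => all_keys
            .iter()
            .copied()
            .filter(|key| wanted.iter().any(|w| w.as_str() == *key))
            .collect(),
    }
}
```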
FlagsResponse (v2 response)

pub struct FlagsResponse {
pub errors_while_computing_flags: bool,
pub flags: HashMap<String, FlagDetails>,
pub quota_limited: Option<Vec<String>>,
pub request_id: Uuid,
pub evaluated_at: i64,
pub config: ConfigResponse,
}
pub struct FlagDetails {
pub key: String,
pub enabled: bool,
pub variant: Option<String>,
pub reason: FlagEvaluationReason,
pub metadata: FlagDetailsMetadata,
}
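To relate the detailed v2 shape to the flat feature_flags: { key: value } map used by the legacy formats, the sketch below collapses each flag to a single value. It assumes the flat value is the variant name when an enabled flag has one and a boolean otherwise; that rule is an assumption, not something stated in this document. `FlagDetailsLite`, `LegacyValue`, `to_legacy_value`, and `flatten` are hypothetical names.

```rust
use std::collections::{BTreeMap, HashMap};

/// Trimmed-down stand-in for FlagDetails; key, reason and metadata are omitted.
struct FlagDetailsLite {
    enabled: bool,
    variant: Option<String>,
}

/// A flat legacy value: either a boolean or a variant name.
#[derive(Debug, PartialEq)]
enum LegacyValue {
    Bool(bool),
    Variant(String),
}

/// Assumed rule: an enabled flag with a variant flattens to the variant
/// name; everything else flattens to its enabled boolean.
fn to_legacy_value(details: &FlagDetailsLite) -> LegacyValue {
    match (&details.variant, details.enabled) {
        (Some(v), true) => LegacyValue::Variant(v.clone()),
        (_, enabled) => LegacyValue::Bool(enabled),
    }
}

/// Flattens a detailed flags map into a key-ordered flat map.
fn flatten(flags: &HashMap<String, FlagDetailsLite>) -> BTreeMap<String, LegacyValue> {
    flags
        .iter()
        .map(|(key, details)| (key.clone(), to_legacy_value(details)))
        .collect()
}
```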
The service runs three independent rate limiters (IP, token, definitions), all in-process using the governor crate. The /flags IP and token limiters support a warn-then-enforce model with X-PostHog-Rate-Limit-Warning headers and per-token custom overrides. See rate-limiting.md for the full model, configuration modes, and migration path.
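The warn-then-enforce idea can be sketched with a small hand-rolled token bucket. This does not use the governor crate and is not the service's implementation; `TokenBucket`, `Mode`, and `Decision` are hypothetical names, and time is passed in explicitly (in seconds) to keep the example deterministic. In warn mode an over-limit request is still allowed but flagged, which is where a warning header would be attached; in enforce mode it is rejected.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    AllowWithWarning, // over the limit, but the limiter is in warn mode
    Reject,           // over the limit and the limiter is enforcing
}

enum Mode {
    Warn,
    Enforce,
}

struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_seen: f64,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_seen: 0.0 }
    }

    /// Refills the bucket for the elapsed time, then tries to take one token.
    fn check(&mut self, now: f64, mode: &Mode) -> Decision {
        let elapsed = (now - self.last_seen).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_seen = self.last_seen.max(now);

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Decision::Allow
        } else {
            match mode {
                Mode::Warn => Decision::AllowWithWarning,
                Mode::Enforce => Decision::Reject,
            }
        }
    }
}
```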
The serve() function in rust/feature-flags/src/server.rs orchestrates startup:
- Redis: ReadWriteClient (auto-routes reads to replica). Optional dedicated flags Redis with 3-mode migration: shared-only -> dual-write -> dedicated-only.
- PostgreSQL: PostgresRouter with 4 pools (persons reader/writer, non-persons reader/writer), plus an optional behavioral cohorts reader pool. See database-interaction-patterns.md.
- Cohort cache: CohortCacheManager (moka, 256 MB default, 5-minute TTL).

All values come from environment variables via the envconfig crate. They are defined in rust/feature-flags/src/config.rs.
| Variable | Default | Purpose |
|---|---|---|
| ADDRESS | 127.0.0.1:3001 | Listen address |
| MAX_CONCURRENCY | 1000 | Max concurrent flag evaluation requests |
| DEBUG | false | Pretty console logging vs JSON structured logging |
| ENABLE_METRICS | false | Expose /metrics endpoint |
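The defaults in this table can be read as "use the environment value if present, otherwise parse the default". The sketch below shows that pattern with the standard library only. It does not use the envconfig crate, and `ServerConfig`, `parse_or`, and `load_server_config` are hypothetical names. Variables are passed in as a map so the behaviour is easy to test.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::str::FromStr;

#[derive(Debug)]
struct ServerConfig {
    address: SocketAddr,
    max_concurrency: usize,
    debug: bool,
    enable_metrics: bool,
}

/// Parses the variable `key` if it is set, otherwise parses `default`.
fn parse_or<T: FromStr>(
    vars: &HashMap<String, String>,
    key: &str,
    default: &str,
) -> Result<T, String> {
    let raw = vars.get(key).map(String::as_str).unwrap_or(default);
    raw.parse::<T>()
        .map_err(|_| format!("invalid value for {key}: {raw}"))
}

/// Builds the server settings using the defaults from the table.
fn load_server_config(vars: &HashMap<String, String>) -> Result<ServerConfig, String> {
    Ok(ServerConfig {
        address: parse_or(vars, "ADDRESS", "127.0.0.1:3001")?,
        max_concurrency: parse_or(vars, "MAX_CONCURRENCY", "1000")?,
        debug: parse_or(vars, "DEBUG", "false")?,
        enable_metrics: parse_or(vars, "ENABLE_METRICS", "false")?,
    })
}
```

To read the real process environment, the map could be built with `std::env::vars().collect()`.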
| Variable | Default | Purpose |
|---|---|---|
| WRITE_DATABASE_URL | postgres://posthog:posthog@localhost:5432/posthog | Main database primary |
| READ_DATABASE_URL | same | Main database replica |
| PERSONS_WRITE_DATABASE_URL | (empty, aliases to main) | Persons database primary |
| PERSONS_READ_DATABASE_URL | (empty, aliases to main) | Persons database replica |
| MAX_PG_CONNECTIONS | 10 | Max connections per pool |
| ACQUIRE_TIMEOUT_SECS | 5 | Connection acquisition timeout |
| IDLE_TIMEOUT_SECS | 300 | Close idle connections after this |
| NON_PERSONS_READER_STATEMENT_TIMEOUT_MS | 2000 | Statement timeout for flag/team reads |
| PERSONS_READER_STATEMENT_TIMEOUT_MS | 3000 | Statement timeout for person lookups |
| WRITER_STATEMENT_TIMEOUT_MS | 3000 | Statement timeout for writes |
| Variable | Default | Purpose |
|---|---|---|
| BEHAVIORAL_COHORTS_READ_DATABASE_URL | (empty) | Optional PostgreSQL connection for realtime cohort membership lookups. When empty, realtime cohort evaluation is disabled |
| COHORT_MEMBERSHIP_CACHE_TTL_SECONDS | 60 | Cache TTL for cohort membership lookups |
| COHORT_MEMBERSHIP_CACHE_MAX_ENTRIES | 500000 | Max entries in cohort membership cache |
The behavioral cohorts pool uses tight limits (max 5 connections, 1s statement timeout) since it only performs simple key lookups against the cohort_membership table. When BEHAVIORAL_COHORTS_READ_DATABASE_URL is not set, a NoOpCohortMembershipProvider is used and all realtime cohort checks return false (graceful degradation).
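The graceful degradation described above is a null-object pattern: when no database URL is configured, a provider that always answers false is swapped in. A minimal sketch follows. Only the name NoOpCohortMembershipProvider comes from this document; the trait shape, the synchronous signature, `InMemoryProvider` (a stand-in for a database-backed provider), and `build_provider` are all assumptions for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical trait shape; the real provider interface may differ
/// (for example, it is likely async).
trait CohortMembershipProvider {
    fn is_member(&self, person_id: &str, cohort_id: i64) -> bool;
}

/// Used when realtime cohorts are disabled: every check returns false.
struct NoOpCohortMembershipProvider;

impl CohortMembershipProvider for NoOpCohortMembershipProvider {
    fn is_member(&self, _person_id: &str, _cohort_id: i64) -> bool {
        false
    }
}

/// Stand-in for a database-backed provider, holding memberships in memory.
struct InMemoryProvider {
    members: HashSet<(String, i64)>,
}

impl CohortMembershipProvider for InMemoryProvider {
    fn is_member(&self, person_id: &str, cohort_id: i64) -> bool {
        self.members.contains(&(person_id.to_string(), cohort_id))
    }
}

/// Picks the no-op provider when the URL is empty, mirroring the
/// BEHAVIORAL_COHORTS_READ_DATABASE_URL behaviour described above.
fn build_provider(
    database_url: &str,
    members: HashSet<(String, i64)>,
) -> Box<dyn CohortMembershipProvider> {
    if database_url.is_empty() {
        Box::new(NoOpCohortMembershipProvider)
    } else {
        Box::new(InMemoryProvider { members })
    }
}
```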
| Variable | Default | Purpose |
|---|---|---|
| REDIS_URL | redis://localhost:6379/ | Shared Redis primary |
| REDIS_READER_URL | (falls back to REDIS_URL) | Shared Redis replica |
| FLAGS_REDIS_URL | (empty) | Dedicated flags Redis primary |
| FLAGS_REDIS_READER_URL | (empty) | Dedicated flags Redis replica |
| FLAGS_REDIS_ENABLED | false | Read from dedicated flags Redis |
| REDIS_RESPONSE_TIMEOUT_MS | 100 | Redis response timeout (capped at 30s) |
| REDIS_CONNECTION_TIMEOUT_MS | 5000 | Redis connection timeout (capped at 60s) |
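One plausible reading of how these variables combine into the three migration modes (shared-only, dual-write, dedicated-only) is sketched below. The mapping from FLAGS_REDIS_URL and FLAGS_REDIS_ENABLED to a mode is an assumption inferred from the table, not confirmed behaviour, and `RedisMode`, `redis_mode`, and `capped_timeout_ms` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum RedisMode {
    SharedOnly,    // no dedicated flags Redis configured
    DualWrite,     // dedicated Redis configured, reads not yet switched
    DedicatedOnly, // dedicated Redis configured and used for reads
}

/// Assumed mapping: an empty FLAGS_REDIS_URL means shared-only;
/// otherwise FLAGS_REDIS_ENABLED decides between the other two modes.
fn redis_mode(flags_redis_url: &str, flags_redis_enabled: bool) -> RedisMode {
    if flags_redis_url.is_empty() {
        RedisMode::SharedOnly
    } else if flags_redis_enabled {
        RedisMode::DedicatedOnly
    } else {
        RedisMode::DualWrite
    }
}

/// Applies the caps from the table (30s for responses, 60s for connections).
fn capped_timeout_ms(configured_ms: u64, cap_ms: u64) -> u64 {
    configured_ms.min(cap_ms)
}
```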
| Variable | Default | Purpose |
|---|---|---|
| OBJECT_STORAGE_BUCKET | posthog | S3 bucket name |
| OBJECT_STORAGE_REGION | us-east-1 | AWS region |
| OBJECT_STORAGE_ENDPOINT | (empty) | Custom S3 endpoint for local dev |
See rate-limiting.md for the full configuration reference.
| Variable | Default | Purpose |
|---|---|---|
| COHORT_CACHE_CAPACITY_BYTES | 268435456 (256 MB) | Moka cache memory limit |
| CACHE_TTL_SECONDS | 300 | Cohort cache TTL |
| BILLING_LIMITER_CACHE_TTL_SECS | 5 | Billing limiter cache TTL |
See Behavioral cohorts for cohort membership cache settings.
| Variable | Default | Purpose |
|---|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | (disabled) | OpenTelemetry collector endpoint |
| OTEL_TRACES_SAMPLER_ARG | 0.001 | Trace sampling rate (0.1%) |
| OTEL_SERVICE_NAME | posthog-feature-flags | Service name in traces |
| TEAM_IDS_TO_TRACK | all | Teams to emit detailed metrics for (all, none, comma-separated, or range 1:100) |
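TEAM_IDS_TO_TRACK accepts four shapes, which a small parser makes concrete. The sketch below is illustrative only: `TeamIdsToTrack` and `parse_team_ids` are hypothetical names, and treating the range 1:100 as inclusive on both ends is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum TeamIdsToTrack {
    AllTeams,
    NoTeams,
    Ids(Vec<i32>),
    Range(i32, i32), // assumed inclusive on both ends
}

impl TeamIdsToTrack {
    /// True when detailed metrics should be emitted for `team_id`.
    fn contains(&self, team_id: i32) -> bool {
        match self {
            TeamIdsToTrack::AllTeams => true,
            TeamIdsToTrack::NoTeams => false,
            TeamIdsToTrack::Ids(ids) => ids.contains(&team_id),
            TeamIdsToTrack::Range(start, end) => (*start..=*end).contains(&team_id),
        }
    }
}

/// Parses "all", "none", "1,5,9" or "1:100".
fn parse_team_ids(raw: &str) -> Result<TeamIdsToTrack, String> {
    let raw = raw.trim();
    let parse_id = |s: &str| {
        s.trim()
            .parse::<i32>()
            .map_err(|_| format!("invalid team id: {s}"))
    };
    match raw {
        "all" => Ok(TeamIdsToTrack::AllTeams),
        "none" => Ok(TeamIdsToTrack::NoTeams),
        _ => {
            if let Some((start, end)) = raw.split_once(':') {
                let (start, end) = (parse_id(start)?, parse_id(end)?);
                if start > end {
                    return Err(format!("empty range: {raw}"));
                }
                Ok(TeamIdsToTrack::Range(start, end))
            } else {
                let ids = raw.split(',').map(parse_id).collect::<Result<Vec<_>, _>>()?;
                Ok(TeamIdsToTrack::Ids(ids))
            }
        }
    }
}
```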
| Variable | Default | Purpose |
|---|---|---|
| MAXMIND_DB_PATH | share/GeoLite2-City.mmdb | GeoIP database path |
| OPTIMIZE_EXPERIENCE_CONTINUITY_LOOKUPS | true | Skip DB lookups for 100%-rollout flags |
| FLAGS_SESSION_REPLAY_QUOTA_CHECK | false | Check session replay quota |
| Crate | Purpose |
|---|---|
| axum | HTTP framework |
| sqlx | Async PostgreSQL driver |
| tokio | Async runtime |
| serde / serde_json / serde-pickle | Serialization (pickle for HyperCache interop with Python) |
| governor | Token-bucket rate limiting |
| moka | Concurrent in-memory cache (cohorts) |
| sha1 / sha2 | Hashing for flag rollout and variant selection |
| petgraph | Dependency graph (flag-on-flag dependencies, cohort dependencies) |
| fancy-regex | Regex property matching with backtrack limits |
| semver | Semantic versioning operator support |
| rayon | Parallel flag evaluation within dependency stages |
| tokio-retry | Exponential backoff for DB operations |
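The "dependency stages" mentioned for petgraph and rayon can be pictured as a layered topological sort: flags with no unresolved dependencies form one stage, and everything in a stage could be evaluated in parallel before moving on. The sketch below builds such stages with the standard library only, using Kahn's algorithm grouped by layer. It does not use petgraph or rayon, and `dependency_stages` is a hypothetical function, not the service's implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// `deps` maps each flag key to the flag keys it depends on.
/// Returns stages in evaluation order, or an error if there is a cycle.
fn dependency_stages<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<Vec<String>>, String> {
    // Unresolved dependencies per flag, including flags that only
    // appear as someone else's dependency.
    let mut remaining: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for (flag, flag_deps) in deps {
        remaining.entry(*flag).or_default().extend(flag_deps.iter().copied());
        for dep in flag_deps {
            remaining.entry(*dep).or_default();
        }
    }

    let mut stages = Vec::new();
    while !remaining.is_empty() {
        // Every flag whose dependencies are all resolved joins this stage.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, unresolved)| unresolved.is_empty())
            .map(|(flag, _)| *flag)
            .collect();
        if ready.is_empty() {
            return Err("dependency cycle detected".to_string());
        }
        for flag in &ready {
            remaining.remove(flag);
        }
        for unresolved in remaining.values_mut() {
            for flag in &ready {
                unresolved.remove(flag);
            }
        }
        stages.push(ready.iter().map(|flag| flag.to_string()).collect());
    }
    Ok(stages)
}
```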
Applied in order via Axum layers (defined in router.rs):
- […] x-posthog-rate-limit-warning)

| File | Purpose |
|---|---|
| rust/feature-flags/src/main.rs | Binary entry point, tracing setup |
| rust/feature-flags/src/server.rs | Service initialization, resource creation |
| rust/feature-flags/src/router.rs | Axum router, routes, shared state |
| rust/feature-flags/src/config.rs | Environment variable configuration |
| rust/feature-flags/src/api/endpoint.rs | /flags and /decide handler |
| rust/feature-flags/src/api/flag_definitions.rs | /flags/definitions handler |
| rust/feature-flags/src/api/auth.rs | Authentication (secret tokens, personal API keys) |
| rust/feature-flags/src/api/types.rs | Request/response types |
| rust/feature-flags/src/handler/flags.rs | Core request processing pipeline |