As feature stores become a critical part of production ML systems, the question shifts from "Can I serve features?" to "Can I trust what I'm serving?" Are my features fresh? Is latency within SLA? Are materialization pipelines succeeding? How long are my on-demand transformations taking?
Until now, answering these questions for Feast required ad-hoc monitoring — parsing logs, writing custom health checks, or bolting on external instrumentation. That changes today.
Feast now ships built-in Prometheus metrics for the feature server, covering the full request lifecycle — from HTTP request handling through online store reads and on-demand feature transformations to materialization pipelines and feature freshness tracking. Enable it with a single flag, point Prometheus at the metrics endpoint, and get production-grade observability for your feature serving infrastructure.
This post walks through the metrics available, what each one tells you, how to enable and configure them, and how to build a Grafana dashboard that gives you a complete operational picture of your feature server.
Feast's feature server now exposes a comprehensive set of Prometheus metrics across seven categories, designed to give ML platform teams full visibility into their feature serving infrastructure:
- Request latency histograms carry feature_count and feature_view_count labels, so you can correlate latency with request complexity.
- ODFV transformation histograms cover both read-path (during /get-online-features) and write-path (during push/materialize) transformations, with odfv_name and mode labels to compare Pandas vs Python vs Substrait performance.
- The Feast Operator generates a ServiceMonitor when metrics are enabled, so Prometheus Operator discovers the scrape target automatically.

All metrics are fully opt-in with zero overhead when disabled. Per-category toggles let you enable exactly the metrics you need.
The simplest way — one flag, everything enabled:
```bash
feast serve --metrics
```
This starts the feature server on its default port (6566) and a Prometheus metrics endpoint on port 8000.
For production deployments, configure metrics in feature_store.yaml with per-category toggles:
```yaml
feature_server:
  metrics:
    enabled: true
    resource: true          # CPU and memory gauges
    request: true           # HTTP request counters and latency histograms
    online_features: true   # Entity count and retrieval tracking
    push: true              # Push/ingestion request counters
    materialization: true   # Pipeline success/failure and duration
    freshness: true         # Per-feature-view data staleness
```
If you're running Feast on Kubernetes with the Feast Operator, set metrics: true on the online store server:
```yaml
apiVersion: feast.dev/v1alpha1
kind: FeatureStore
metadata:
  name: production-feast
spec:
  feastProject: my_project
  services:
    onlineStore:
      server:
        metrics: true
```
The operator automatically appends --metrics to the serve command and exposes port 8000 as a metrics port on the Service. It also auto-generates a ServiceMonitor resource for Prometheus Operator discovery. The operator detects the monitoring.coreos.com API group at startup; if the Prometheus Operator CRD is absent, ServiceMonitor creation is silently skipped, so vanilla Kubernetes clusters are unaffected.
Feast exposes metrics across seven categories. Here's the full reference, organized by what each category helps you answer.
| Metric | Type | Labels |
|---|---|---|
| feast_feature_server_request_total | Counter | endpoint, status |
| feast_feature_server_request_latency_seconds | Histogram | endpoint, feature_count, feature_view_count |
These are the core RED metrics (Rate, Errors, Duration) for your feature server. The latency histogram includes feature_count and feature_view_count labels so you can correlate latency with request complexity — a request fetching 200 features from 15 feature views will naturally be slower than one fetching 5 features from 2 views.
The histogram uses bucket boundaries tuned for feature serving workloads: 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s. Most online feature requests should complete in the lower buckets.
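To make the bucket boundaries concrete, here is a minimal sketch of how a percentile can be estimated from cumulative bucket counts using linear interpolation, the same general idea behind PromQL's histogram_quantile. The function name `estimate_quantile` and the sample counts are illustrative assumptions, not Feast code or measured data.

```python
import math
from typing import List, Tuple

# Upper bounds (in seconds) quoted above for the request latency histogram.
LATENCY_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) pairs.

    The last pair must be the +Inf bucket, as in a Prometheus histogram.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts for one window (made up for illustration).
counts = [400, 700, 900, 960, 985, 995, 999, 1000, 1000, 1000, 1000, 1000]
sample = list(zip(LATENCY_BUCKETS + [math.inf], counts))
print(estimate_quantile(0.50, sample), estimate_quantile(0.99, sample))
```

Because the estimate is interpolated within a bucket, its resolution is only as fine as the bucket that contains the rank. That is why tight boundaries at the low end matter for a workload where most requests finish in a few milliseconds.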
| Metric | Type | Labels |
|---|---|---|
| feast_online_features_request_total | Counter | — |
| feast_online_features_entity_count | Histogram | — |
The entity count histogram (buckets: 1, 5, 10, 25, 50, 100, 250, 500, 1000) tells you the shape of your traffic. Are callers sending single-entity lookups (real-time inference) or batch requests of hundreds (batch scoring)? A sudden spike in entity count per request usually means an upstream service changed its batching strategy, which directly impacts latency and memory.
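As a rough illustration of what "traffic shape" looks like in this histogram, the sketch below sorts a handful of per-request entity counts into the bucket boundaries listed above and prints cumulative counts, the way Prometheus `le` buckets are reported. The helper name and the sample observations are assumptions for illustration only.

```python
from bisect import bisect_left
from typing import Dict, List

# Bucket upper bounds quoted above for feast_online_features_entity_count.
ENTITY_COUNT_BUCKETS = [1, 5, 10, 25, 50, 100, 250, 500, 1000]


def cumulative_bucket_counts(observations: List[int]) -> Dict[str, int]:
    """Count observations per cumulative bucket (le = "less than or equal")."""
    per_bucket = [0] * (len(ENTITY_COUNT_BUCKETS) + 1)  # last slot is +Inf
    for value in observations:
        # bisect_left finds the first bucket whose upper bound is >= value.
        per_bucket[bisect_left(ENTITY_COUNT_BUCKETS, value)] += 1
    labels = [str(bound) for bound in ENTITY_COUNT_BUCKETS] + ["+Inf"]
    counts: Dict[str, int] = {}
    running = 0
    for label, n in zip(labels, per_bucket):
        running += n
        counts[label] = running
    return counts


# Made-up mix: mostly single-entity lookups plus a few batch requests.
shape = cumulative_bucket_counts([1, 1, 1, 2, 50, 300, 1200])
print(shape)
```

In this made-up sample, three of seven requests land in the `le="1"` bucket and one falls past the largest finite bound, which is the kind of bimodal pattern that separates real-time callers from batch scorers.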
| Metric | Type | Labels |
|---|---|---|
| feast_feature_server_online_store_read_duration_seconds | Histogram | — |
This metric captures the total time spent reading from the online store (Redis, DynamoDB, PostgreSQL, etc.) during a /get-online-features request. It covers both the synchronous for-loop path and the async asyncio.gather path across all backends.
By comparing this with the overall request latency, you can determine whether latency is dominated by the store read or by other processing (serialization, transformation, network overhead).
| Metric | Type | Labels |
|---|---|---|
| feast_feature_server_transformation_duration_seconds | Histogram | odfv_name, mode |
| feast_feature_server_write_transformation_duration_seconds | Histogram | odfv_name, mode |
These metrics capture per-ODFV transformation time for both read-path (during /get-online-features) and write-path (during push/materialize with write_to_online_store=True) operations. The mode label distinguishes pandas, python, and substrait transformation modes, making it easy to compare their performance characteristics side by side.
ODFV transformation metrics are opt-in at the definition level via track_metrics=True:
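```python
from typing import Any, Dict

from feast import Field
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64

# driver_stats_fv and input_request are assumed to be defined elsewhere in the repo.
@on_demand_feature_view(
    sources=[driver_stats_fv, input_request],
    schema=[Field(name="conv_rate_plus_val1", dtype=Float64)],
    mode="python",
    track_metrics=True,  # Enable Prometheus metrics for this ODFV
)
def transformed_conv_rate_python(inputs: Dict[str, Any]) -> Dict[str, Any]:
    return {"conv_rate_plus_val1": inputs["conv_rate"] + inputs["val_to_add"]}
```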
When track_metrics=False (the default), zero metrics code runs for that ODFV — no timing, no Prometheus recording. This lets you selectively instrument the transforms you care about without adding overhead to others.
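One way to picture this opt-in behavior is a wrapper that is only installed when tracking is requested. The sketch below is a generic illustration of that pattern, not Feast's actual implementation; `maybe_timed`, the `record` callback, and `add_one` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Transform = Callable[[Dict[str, Any]], Dict[str, Any]]


def maybe_timed(
    fn: Transform,
    track_metrics: bool,
    record: Callable[[str, float], None],
) -> Transform:
    """Return fn unchanged unless tracking is requested."""
    if not track_metrics:
        return fn  # no wrapper is installed, so no timing code runs per call

    def timed(inputs: Dict[str, Any]) -> Dict[str, Any]:
        start = time.perf_counter()
        try:
            return fn(inputs)
        finally:
            # Record the duration even if the transform raises.
            record(fn.__name__, time.perf_counter() - start)

    return timed


def add_one(inputs: Dict[str, Any]) -> Dict[str, Any]:
    return {"y": inputs["x"] + 1}


samples: List[Tuple[str, float]] = []
tracked = maybe_timed(add_one, True, lambda name, secs: samples.append((name, secs)))
untracked = maybe_timed(add_one, False, lambda name, secs: samples.append((name, secs)))
print(tracked({"x": 1}), untracked is add_one)
```

Deciding once at wrap time, rather than checking a flag on every call, is what keeps the untracked path free of instrumentation cost in this sketch.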
| Metric | Type | Labels |
|---|---|---|
| feast_push_request_total | Counter | push_source, mode |
The push_source label identifies which source is pushing data. The mode label is one of online, offline, or online_and_offline. A push source that stops sending data is an early signal that an upstream pipeline is broken — long before feature staleness becomes visible.
| Metric | Type | Labels |
|---|---|---|
| feast_materialization_total | Counter | feature_view, status |
| feast_materialization_duration_seconds | Histogram | feature_view |
The status label is success or failure. The duration histogram uses wide buckets (1s, 5s, 10s, 30s, 60s, 2min, 5min, 10min, 30min, 1hr) because materialization jobs can range from seconds to tens of minutes depending on the feature view size and offline store.
| Metric | Type | Labels |
|---|---|---|
| feast_feature_freshness_seconds | Gauge | feature_view, project |
This is arguably the most important metric for ML teams. It measures data staleness per feature view: the gap between "now" and the end time of the last successful materialization. A background thread recomputes it every 30 seconds.
If your model was trained on hourly features and the freshness gauge crosses 2 hours, the model is scoring on data staler than anything it saw during training. Before this metric existed, this was a silent failure. Now you can set an alert and catch it in minutes.
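The freshness definition can be sketched in a few lines: subtract the last successful materialization end time from the current time, then compare against a staleness budget. The function names, the feature view names, and the timestamps below are hypothetical; Feast's own computation may differ in detail.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def freshness_seconds(
    last_materialized_end: datetime, now: Optional[datetime] = None
) -> float:
    """Seconds elapsed since the last successful materialization ended."""
    current = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (current - last_materialized_end).total_seconds())


def stale_views(
    last_ends: Dict[str, datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> Dict[str, float]:
    """Return the views whose freshness exceeds max_age, with their age in seconds."""
    ages = {name: freshness_seconds(end, now) for name, end in last_ends.items()}
    return {name: age for name, age in ages.items() if age > max_age.total_seconds()}


# Hypothetical feature views and timestamps, fixed so the result is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_ends = {
    "driver_hourly_stats": now - timedelta(hours=3),
    "customer_profile": now - timedelta(minutes=20),
}
print(stale_views(last_ends, timedelta(hours=2), now))
```

With a 2-hour budget, only the view last materialized 3 hours ago is reported, which mirrors the hourly-features scenario described above.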
The dashboard below shows these ML-specific metrics in action — latency correlated with feature count, online feature request rate, average entities per request, feature freshness per feature view, and materialization success counts with duration:
<div class="content-image"> </div>

| Metric | Type | Labels |
|---|---|---|
| feast_feature_server_cpu_usage | Gauge | (per worker PID) |
| feast_feature_server_memory_usage | Gauge | (per worker PID) |
Per-worker CPU and memory gauges, updated every 5 seconds by a background thread. In Gunicorn deployments, each worker reports independently, so you can spot an individual worker consuming excessive resources.
One of the most powerful uses of these metrics is latency decomposition. By overlaying the overall request latency with the online store read duration and ODFV transformation duration, you can pinpoint exactly where time is spent:
Total request latency = Store read + ODFV transforms + Serialization/overhead
The Grafana dashboard below shows this decomposition in action — online store read latency (p50/p95/p99), per-ODFV read-path and write-path transform latency, and a side-by-side Pandas vs Python ODFV comparison:
<div class="content-image"> </div>

If store reads dominate, the bottleneck is your online store (consider Redis instead of PostgreSQL, or tune your connection pool). If ODFV transforms dominate, consider switching from Pandas mode to Python mode, or re-evaluate whether the transformation should be precomputed during materialization instead of computed on the fly.
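The decomposition can also be done by hand from three averages. The sketch below is illustrative only: the function names and the sample numbers are assumptions, and it treats the ODFV figure as the total transform time per request, which only holds if each request runs a single ODFV.

```python
from typing import Dict


def decompose_latency(
    total_s: float, store_read_s: float, transform_s: float
) -> Dict[str, float]:
    """Split average request latency into shares of the total (each 0..1)."""
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    # Whatever is not store read or transform is serialization and other overhead.
    overhead_s = max(0.0, total_s - store_read_s - transform_s)
    return {
        "store_read": store_read_s / total_s,
        "odfv_transform": transform_s / total_s,
        "overhead": overhead_s / total_s,
    }


def dominant_component(shares: Dict[str, float]) -> str:
    """Name of the component with the largest share of request latency."""
    return max(shares, key=lambda name: shares[name])


# Made-up averages in seconds: 40 ms total, 25 ms store read, 10 ms transforms.
shares = decompose_latency(0.040, 0.025, 0.010)
print(shares, dominant_component(shares))
```

In this made-up case the store read accounts for roughly 62% of request time, so the online store would be the first place to look.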
The mode label on transformation metrics makes it straightforward to compare Pandas and Python ODFV performance. The bottom-left panel in the dashboard above shows p50/p95 latencies for Pandas-mode and Python-mode ODFVs overlaid, making the comparison immediate. You can also query these directly:
```promql
# Pandas p95 read-path latency
histogram_quantile(0.95,
  sum(rate(feast_feature_server_transformation_duration_seconds_bucket{mode="pandas"}[1m])) by (le))

# Python p95 read-path latency
histogram_quantile(0.95,
  sum(rate(feast_feature_server_transformation_duration_seconds_bucket{mode="python"}[1m])) by (le))
```
Here are the recommended alert rules, ordered by impact:
```yaml
- alert: FeastFeatureViewStale
  expr: feast_feature_freshness_seconds > 3600
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: >
      Feature view {{ $labels.feature_view }} in project {{ $labels.project }}
      has not been materialized in {{ $value | humanizeDuration }}.
    impact: Models consuming this feature view are receiving stale data.

- alert: FeastMaterializationFailing
  expr: rate(feast_materialization_total{status="failure"}[15m]) > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: >
      Materialization is failing for feature view {{ $labels.feature_view }}.

- alert: FeastHighLatency
  expr: |
    histogram_quantile(0.99,
      rate(feast_feature_server_request_latency_seconds_bucket{
        endpoint="/get-online-features"
      }[5m])
    ) > 1.0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: >
      Feast p99 latency for online features is {{ $value }}s.

- alert: FeastHighErrorRate
  expr: |
    sum(rate(feast_feature_server_request_total{status="error"}[5m]))
    / sum(rate(feast_feature_server_request_total[5m]))
    > 0.01
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: >
      Feast feature server error rate is {{ $value | humanizePercentage }}.
```
With these metrics exposed, you can build a Grafana dashboard that gives you a complete operational picture of your feature server. We've published a ready-to-import Grafana dashboard JSON that covers all the panels described below — import it into your Grafana instance and point it at your Prometheus datasource to get started immediately.
Add the Feast metrics endpoint to your Prometheus scrape configuration:
```yaml
scrape_configs:
  - job_name: feast
    static_configs:
      - targets: ["<feast-host>:8000"]
    scrape_interval: 15s
```
Once Prometheus is scraping, verify the raw metrics output:
```bash
curl -s http://localhost:8000 | grep feast_
```
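The same check can be scripted. The sketch below fetches the endpoint with the standard library and lists the feast_ metric families declared in `# TYPE` lines of the Prometheus text format. The helper names and the sample payload (including its help text and values) are illustrative assumptions; real output will contain more families.

```python
import urllib.request
from typing import Dict


def feast_metric_types(exposition_text: str) -> Dict[str, str]:
    """Map feast_* metric family names to the type declared on their '# TYPE' line."""
    families: Dict[str, str] = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        if (
            len(parts) == 4
            and parts[0] == "#"
            and parts[1] == "TYPE"
            and parts[2].startswith("feast_")
        ):
            families[parts[2]] = parts[3]
    return families


def fetch_metrics(url: str = "http://localhost:8000", timeout: float = 5.0) -> str:
    """Download the raw metrics payload (needs a running server, so not called here)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Illustrative payload; the help text and values are made up.
SAMPLE = """\
# HELP feast_feature_freshness_seconds Seconds since last materialization
# TYPE feast_feature_freshness_seconds gauge
feast_feature_freshness_seconds{feature_view="driver_stats",project="my_project"} 42.0
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1.5
"""
print(feast_metric_types(SAMPLE))
```

Against a live server, `feast_metric_types(fetch_metrics())` would give a quick inventory of which metric categories are actually being exported.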
Here are the most useful queries for building your own panels or running ad-hoc investigations in the Prometheus UI:
Throughput and errors:
```promql
# Request rate by endpoint
rate(feast_feature_server_request_total[5m])

# Error rate
sum(rate(feast_feature_server_request_total{status="error"}[5m]))
  / sum(rate(feast_feature_server_request_total[5m]))
```
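For intuition, the error-rate ratio boils down to dividing the increase of the error counter by the increase of the total counter over the same window. The sketch below works on two raw samples per counter and treats a drop as a counter reset; it is a simplification, since PromQL's rate() also extrapolates to the window edges, and the function names and sample values are assumptions.

```python
def counter_increase(first: float, last: float) -> float:
    """Increase between two samples, treating a drop as a counter reset."""
    # After a restart the counter begins again at zero, so the new value
    # is itself the increase since the reset.
    return last if last < first else last - first


def error_ratio(
    errors_first: float,
    errors_last: float,
    total_first: float,
    total_last: float,
) -> float:
    """Fraction of requests in the window that ended in an error."""
    total = counter_increase(total_first, total_last)
    if total == 0:
        return 0.0  # no traffic in the window
    return counter_increase(errors_first, errors_last) / total


# Made-up samples: errors went 10 -> 16 while total requests went 1000 -> 1600.
print(error_ratio(10, 16, 1000, 1600))
```

Because both counters are measured over the same window, the window length cancels out of the ratio, which is why the PromQL expression divides one rate by the other directly.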
Latency percentiles:
```promql
# p99 latency for online features
histogram_quantile(0.99,
  rate(feast_feature_server_request_latency_seconds_bucket{endpoint="/get-online-features"}[5m]))

# Online store read p95
histogram_quantile(0.95,
  sum(rate(feast_feature_server_online_store_read_duration_seconds_bucket[1m])) by (le))

# ODFV transform p95 by name and mode
histogram_quantile(0.95,
  sum(rate(feast_feature_server_transformation_duration_seconds_bucket[1m])) by (le, odfv_name, mode))

# ODFV write-path transform p95 by name
histogram_quantile(0.95,
  sum(rate(feast_feature_server_write_transformation_duration_seconds_bucket[1m])) by (le, odfv_name))
```
Latency decomposition:
```promql
# Average total request latency
rate(feast_feature_server_request_latency_seconds_sum{endpoint="/get-online-features"}[5m])
  / rate(feast_feature_server_request_latency_seconds_count{endpoint="/get-online-features"}[5m])

# Average store read time
rate(feast_feature_server_online_store_read_duration_seconds_sum[5m])
  / rate(feast_feature_server_online_store_read_duration_seconds_count[5m])

# Average ODFV transform time
rate(feast_feature_server_transformation_duration_seconds_sum[5m])
  / rate(feast_feature_server_transformation_duration_seconds_count[5m])
```
Pandas vs Python comparison:
```promql
# Pandas p95
histogram_quantile(0.95,
  sum(rate(feast_feature_server_transformation_duration_seconds_bucket{mode="pandas"}[1m])) by (le))

# Python p95
histogram_quantile(0.95,
  sum(rate(feast_feature_server_transformation_duration_seconds_bucket{mode="python"}[1m])) by (le))
```
ML-specific signals:
```promql
# Feature freshness: views stale beyond 1 hour
feast_feature_freshness_seconds > 3600

# Materialization failure rate
rate(feast_materialization_total{status="failure"}[1h])

# Average entities per request
rate(feast_online_features_entity_count_sum[5m])
  / rate(feast_online_features_entity_count_count[5m])

# Push rate by source
rate(feast_push_request_total[5m])
```
Want to see all of this in action without manual setup? We've published an automated demo that deploys a Feast feature server with metrics, a Prometheus instance, and the pre-built Grafana dashboard — all with a single ./setup.sh command. It includes a traffic generator that exercises every metric category (plain online features, Pandas and Python ODFVs, push, materialize, and write-path transforms), so the dashboard populates immediately.
For teams running Feast on Kubernetes, the Feast Operator now auto-generates a ServiceMonitor when metrics: true is set on the online store. The operator:
- Detects the monitoring.coreos.com API group at startup.
- Creates a ServiceMonitor owned by the FeatureStore CR, targeting the metrics port (8000).

This means that on an OpenShift or Prometheus-Operator-enabled cluster, metrics discovery is fully automatic, with no manual ServiceMonitor creation required. The ServiceMonitor is cleaned up automatically when the FeatureStore CR is deleted or metrics is set back to false.
For teams using KEDA for autoscaling, these Prometheus metrics also serve as scaling signals. For example, you can scale the feature server based on request rate:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: feast-scaledobject
spec:
  scaleTargetRef:
    apiVersion: feast.dev/v1
    kind: FeatureStore
    name: my-feast
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(feast_feature_server_request_total[2m]))
        threshold: "100"
```
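To get a feel for what a threshold of 100 implies, the sketch below approximates the replica count the autoscaler would aim for: the query result divided by the threshold, rounded up, then clamped to replica limits. This assumes the trigger behaves like an average-value target and ignores tolerances and stabilization windows; the function name and the min/max limits are hypothetical and not taken from the manifest above.

```python
import math


def desired_replicas(
    metric_value: float,
    threshold: float,
    min_replicas: int = 1,   # hypothetical lower limit
    max_replicas: int = 10,  # hypothetical upper limit
) -> int:
    """Approximate replica target: ceil(metric / threshold), clamped to limits."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    wanted = math.ceil(metric_value / threshold)
    return max(min_replicas, min(max_replicas, wanted))


# 450 requests/second against a threshold of 100 per replica.
print(desired_replicas(450, 100))
```

Under these assumptions, 450 requests per second maps to 5 replicas, so the threshold is effectively the request rate you are comfortable handling per feature server instance.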
| Category | Metric | What It Answers |
|---|---|---|
| Request | feast_feature_server_request_total | What is my throughput and error rate? |
| Request | feast_feature_server_request_latency_seconds | What are my p50/p99 latencies? |
| Online Features | feast_online_features_entity_count | What is my traffic shape? |
| Store Read | feast_feature_server_online_store_read_duration_seconds | Is my online store the bottleneck? |
| ODFV Transform | feast_feature_server_transformation_duration_seconds | How expensive are my read-path transforms? |
| ODFV Transform | feast_feature_server_write_transformation_duration_seconds | How expensive are my write-path transforms? |
| Push | feast_push_request_total | Is my ingestion pipeline sending data? |
| Materialization | feast_materialization_total | Are my pipelines succeeding? |
| Materialization | feast_materialization_duration_seconds | How long do my pipelines take? |
| Freshness | feast_feature_freshness_seconds | How stale is the data my models are using? |
| Resource | feast_feature_server_cpu_usage / memory_usage | Is my server healthy? |
- Run feast serve --metrics, or set metrics.enabled: true in your feature_store.yaml.
- Verify the endpoint: curl http://localhost:8000

We're excited to bring production-grade observability to Feast and welcome feedback from the community!