The Python feature server is an HTTP endpoint that serves features with JSON I/O. This enables users to write and read features from the online store using any programming language that can make HTTP requests.
There is a CLI command that starts the server: `feast serve`. By default, Feast uses port 6566; the port can be overridden with the `--port` flag.
For production deployments, the feature server supports several performance optimization options:
```bash
# Basic usage
feast serve

# Production configuration with multiple workers
feast serve --workers -1 --worker-connections 1000 --registry_ttl_sec 60

# Manual worker configuration
feast serve --workers 8 --worker-connections 2000 --max-requests 1000
```
Key performance options:
- `--workers`, `-w`: Number of worker processes. Use `-1` to auto-calculate based on CPU cores (recommended for production).
- `--worker-connections`: Maximum simultaneous clients per worker process (default: 1000).
- `--max-requests`: Maximum requests before worker restart, which prevents memory leaks (default: 1000).
- `--max-requests-jitter`: Jitter to prevent a thundering herd on worker restart (default: 50).
- `--registry_ttl_sec`, `-r`: Registry refresh interval in seconds. Higher values reduce overhead but increase staleness (default: 60).
- `--keep-alive-timeout`: Keep-alive connection timeout in seconds (default: 30).

Worker Configuration:

- Use `--workers -1` to auto-calculate the optimal worker count (2 × CPU cores + 1); for simple or development setups, a single worker (`--workers 1`) suffices.

Registry TTL:

- Use `--registry_ttl_sec 60` or higher to reduce refresh overhead.

Connection Tuning:

- Increase `--worker-connections` for high-concurrency workloads.
- Set `--max-requests` to prevent memory leaks in long-running deployments.
- Tune `--keep-alive-timeout` based on client connection patterns.

Container Deployments:
See the Feast Operator documentation for an example of how to run Feast on Kubernetes using the Operator.
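The auto-calculated worker count (`--workers -1`) follows the 2 × CPU cores + 1 heuristic mentioned above. A quick sketch to preview what that works out to on your machine (illustrative only, not a Feast API):

```python
import os

# Heuristic used when --workers -1 is passed: 2 × CPU cores + 1
# keeps all cores busy while one extra worker absorbs I/O waits.
cpu_cores = os.cpu_count() or 1
workers = 2 * cpu_cores + 1
print(f"{cpu_cores} CPU cores -> {workers} workers")
```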
Here's an example of how to start the Python feature server with a local feature repo:
```bash
$ feast init feature_repo
Creating a new Feast repository in /home/tsotne/feast/feature_repo.

$ cd feature_repo

$ feast apply
Created entity driver
Created feature view driver_hourly_stats
Created feature service driver_activity

Created sqlite table feature_repo_driver_hourly_stats

$ feast materialize-incremental $(date +%Y-%m-%d)
Materializing 1 feature views to 2021-09-09 17:00:00-07:00 into the sqlite online store.

driver_hourly_stats from 2021-09-09 16:51:08-07:00 to 2021-09-09 17:00:00-07:00:
100%|████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 295.24it/s]

$ feast serve
09/10/2021 10:42:11 AM INFO:Started server process [8889]
INFO: Waiting for application startup.
09/10/2021 10:42:11 AM INFO:Waiting for application startup.
INFO: Application startup complete.
09/10/2021 10:42:11 AM INFO:Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:6566 (Press CTRL+C to quit)
09/10/2021 10:42:11 AM INFO:Uvicorn running on http://127.0.0.1:6566 (Press CTRL+C to quit)
```
After the server starts, we can execute cURL commands from another terminal tab:
```bash
$ curl -X POST \
  "http://localhost:6566/get-online-features" \
  -d '{
    "features": [
      "driver_hourly_stats:conv_rate",
      "driver_hourly_stats:acc_rate",
      "driver_hourly_stats:avg_daily_trips"
    ],
    "entities": {
      "driver_id": [1001, 1002, 1003]
    }
  }' | jq
```
```json
{
  "metadata": {
    "feature_names": [
      "driver_id",
      "conv_rate",
      "avg_daily_trips",
      "acc_rate"
    ]
  },
  "results": [
    {
      "values": [
        1001,
        0.7037263512611389,
        308,
        0.8724706768989563
      ],
      "statuses": [
        "PRESENT",
        "PRESENT",
        "PRESENT",
        "PRESENT"
      ],
      "event_timestamps": [
        "1970-01-01T00:00:00Z",
        "2021-12-31T23:00:00Z",
        "2021-12-31T23:00:00Z",
        "2021-12-31T23:00:00Z"
      ]
    },
    {
      "values": [
        1002,
        0.038169607520103455,
        332,
        0.48534533381462097
      ],
      "statuses": [
        "PRESENT",
        "PRESENT",
        "PRESENT",
        "PRESENT"
      ],
      "event_timestamps": [
        "1970-01-01T00:00:00Z",
        "2021-12-31T23:00:00Z",
        "2021-12-31T23:00:00Z",
        "2021-12-31T23:00:00Z"
      ]
    },
    {
      "values": [
        1003,
        0.9665873050689697,
        779,
        0.7793770432472229
      ],
      "statuses": [
        "PRESENT",
        "PRESENT",
        "PRESENT",
        "PRESENT"
      ],
      "event_timestamps": [
        "1970-01-01T00:00:00Z",
        "2021-12-31T23:00:00Z",
        "2021-12-31T23:00:00Z",
        "2021-12-31T23:00:00Z"
      ]
    }
  ]
}
```
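The same request can be issued from Python with the `requests` library. A minimal sketch (the `try/except` only keeps it from crashing when the server isn't running):

```python
import json

import requests

# Same payload as the curl example above.
payload = {
    "features": [
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    "entities": {"driver_id": [1001, 1002, 1003]},
}

try:
    resp = requests.post(
        "http://localhost:6566/get-online-features",
        data=json.dumps(payload),
        timeout=5,
    )
    print(resp.json())
except requests.exceptions.RequestException:
    print("Feature server is not running on localhost:6566")
```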
It's also possible to specify a feature service name instead of the list of features:
```bash
curl -X POST \
  "http://localhost:6566/get-online-features" \
  -d '{
    "feature_service": "<feature-service-name>",
    "entities": {
      "driver_id": [1001, 1002, 1003]
    }
  }' | jq
```
The Python feature server also exposes an endpoint for push sources. This endpoint allows you to push data to the online and/or offline store.
The push mode is controlled by the `to` field, a string parameter whose options are: `["online", "offline", "online_and_offline"]`.
Note: timestamps need to be strings, and may need to be timezone-aware (matching the schema of the offline store).
```bash
curl -X POST "http://localhost:6566/push" -d '{
  "push_source_name": "driver_stats_push_source",
  "df": {
    "driver_id": [1001],
    "event_timestamp": ["2022-05-13 10:59:42+00:00"],
    "created": ["2022-05-13 10:59:42"],
    "conv_rate": [1.0],
    "acc_rate": [1.0],
    "avg_daily_trips": [1000]
  },
  "to": "online_and_offline"
}' | jq
```
or equivalently from Python:
```python
import json
from datetime import datetime

import requests

event_dict = {
    "driver_id": [1001],
    "event_timestamp": [str(datetime(2021, 5, 13, 10, 59, 42))],
    "created": [str(datetime(2021, 5, 13, 10, 59, 42))],
    "conv_rate": [1.0],
    "acc_rate": [1.0],
    "avg_daily_trips": [1000],
    "string_feature": "test2",
}
push_data = {
    "push_source_name": "driver_stats_push_source",
    "df": event_dict,
    "to": "online",
}
requests.post(
    "http://localhost:6566/push",
    data=json.dumps(push_data),
)
```
The Python feature server supports configurable batching for the offline portion of writes executed via the `/push` endpoint.

Only the offline part of a push is affected:

- `to: "offline"`: fully batched
- `to: "online_and_offline"`: online written immediately, offline batched
- `to: "online"`: unaffected, always immediate

Enable batching in your `feature_store.yaml`:
```yaml
feature_server:
  type: local
  offline_push_batching_enabled: true
  offline_push_batching_batch_size: 1000
  offline_push_batching_batch_interval_seconds: 10
```
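The two settings combine as size-or-time flush semantics: a batch is written when either `offline_push_batching_batch_size` rows accumulate or `offline_push_batching_batch_interval_seconds` elapse. The sketch below illustrates that logic generically; it is not Feast's internal implementation:

```python
import time

class OfflineBatcher:
    """Size-or-time batching sketch (illustrative, NOT Feast internals):
    buffer rows, then flush when the batch fills or the interval elapses."""

    def __init__(self, batch_size=1000, batch_interval_seconds=10, flush_fn=print):
        self.batch_size = batch_size
        self.batch_interval = batch_interval_seconds
        self.flush_fn = flush_fn
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, row):
        self.buffer.append(row)
        # Flush when the batch is full or the interval has elapsed.
        if (len(self.buffer) >= self.batch_size
                or time.monotonic() - self.last_flush >= self.batch_interval):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)
        self.buffer = []
        self.last_flush = time.monotonic()

# Rows stay buffered until the size or time threshold triggers a flush.
batches = []
batcher = OfflineBatcher(batch_size=2, batch_interval_seconds=3600, flush_fn=batches.append)
batcher.add({"driver_id": 1001})       # buffered, no flush yet
batcher.add({"driver_id": 1002})       # second row fills the batch -> one flush
print(len(batches))  # 1
```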
The Python feature server also exposes an endpoint for materializing features from the offline store to the online store.
Standard materialization with timestamps:
```bash
curl -X POST "http://localhost:6566/materialize" -d '{
  "start_ts": "2021-01-01T00:00:00",
  "end_ts": "2021-01-02T00:00:00",
  "feature_views": ["driver_hourly_stats"]
}' | jq
```
Materialize all data without event timestamps:
```bash
curl -X POST "http://localhost:6566/materialize" -d '{
  "feature_views": ["driver_hourly_stats"],
  "disable_event_timestamp": true
}' | jq
```
When disable_event_timestamp is set to true, the start_ts and end_ts parameters are not required, and all available data is materialized using the current datetime as the event timestamp. This is useful when your source data lacks proper event timestamp columns.
Or from Python:
```python
import json

import requests

# Standard materialization
materialize_data = {
    "start_ts": "2021-01-01T00:00:00",
    "end_ts": "2021-01-02T00:00:00",
    "feature_views": ["driver_hourly_stats"],
}

# Materialize without event timestamps
materialize_data_no_timestamps = {
    "feature_views": ["driver_hourly_stats"],
    "disable_event_timestamp": True,
}

requests.post(
    "http://localhost:6566/materialize",
    data=json.dumps(materialize_data),
)
```
The Python feature server can expose Prometheus-compatible metrics on a dedicated
HTTP endpoint (default port 8000). Metrics are opt-in and carry zero overhead
when disabled.
Option 1 — CLI flag (useful for one-off runs):
```bash
feast serve --metrics
```
Option 2 — feature_store.yaml (recommended for production):
```yaml
feature_server:
  type: local
  metrics:
    enabled: true
```
Either option is sufficient. When both are set, metrics are enabled.
By default, enabling metrics turns on all categories. You can selectively
disable individual categories within the same metrics block:
```yaml
feature_server:
  type: local
  metrics:
    enabled: true
    resource: true          # CPU / memory gauges
    request: false          # disable endpoint latency & request counters
    online_features: true   # online feature retrieval counters
    push: true              # push request counters
    materialization: true   # materialization counters & duration
    freshness: true         # feature freshness gauges
```
Any category set to false will emit no metrics and start no background
threads (e.g., setting freshness: false prevents the registry polling
thread from starting). All categories default to true.
| Metric | Type | Labels | Category | Description |
|---|---|---|---|---|
| `feast_feature_server_cpu_usage` | Gauge | — | resource | Process CPU usage % |
| `feast_feature_server_memory_usage` | Gauge | — | resource | Process memory usage % |
| `feast_feature_server_request_total` | Counter | endpoint, status | request | Total requests per endpoint |
| `feast_feature_server_request_latency_seconds` | Histogram | endpoint, feature_count, feature_view_count | request | Request latency with p50/p95/p99 support |
| `feast_online_features_request_total` | Counter | — | online_features | Total online feature retrieval requests |
| `feast_online_features_entity_count` | Histogram | — | online_features | Entity rows per online feature request |
| `feast_feature_server_online_store_read_duration_seconds` | Histogram | — | online_features | Online store read phase duration (sync and async) |
| `feast_feature_server_transformation_duration_seconds` | Histogram | odfv_name, mode | online_features | ODFV read-path transformation duration (requires `track_metrics=True` on the ODFV) |
| `feast_feature_server_write_transformation_duration_seconds` | Histogram | odfv_name, mode | online_features | ODFV write-path transformation duration (requires `track_metrics=True` on the ODFV) |
| `feast_push_request_total` | Counter | push_source, mode | push | Push requests by source and mode |
| `feast_materialization_result_total` | Counter | feature_view, status | materialization | Materialization runs (success/failure) |
| `feast_materialization_duration_seconds` | Histogram | feature_view | materialization | Materialization duration per feature view |
| `feast_feature_freshness_seconds` | Gauge | feature_view, project | freshness | Seconds since last materialization |
The `transformation_duration_seconds` and `write_transformation_duration_seconds` metrics are gated behind two conditions; both must be true for any instrumentation to run:

1. The `online_features` category must be enabled in the metrics configuration.
2. The `OnDemandFeatureView` must have `track_metrics=True`.

`track_metrics` defaults to `False`, so no ODFV incurs timing overhead unless explicitly opted in:
```python
import pandas as pd

from feast import Field
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64

@on_demand_feature_view(
    sources=[my_feature_view, my_request_source],
    schema=[Field(name="output", dtype=Float64)],
    track_metrics=True,  # opt in to transformation timing
)
def my_transform(inputs: pd.DataFrame) -> pd.DataFrame:
    ...
```
The odfv_name label lets you filter or group by individual ODFV,
and the mode label (python, pandas, substrait) lets you compare
transformation engines.
Example Prometheus scrape configuration for the metrics endpoint:

```yaml
scrape_configs:
  - job_name: feast
    static_configs:
      - targets: ["localhost:8000"]
```
Set `metrics: true` in your `FeatureStore` CR:

```yaml
spec:
  services:
    onlineStore:
      server:
        metrics: true
```
The operator automatically exposes port 8000 and creates the corresponding Service port so Prometheus can discover it.
Feast uses Prometheus multiprocess mode so that metrics are correct regardless of the number of Gunicorn workers or Kubernetes replicas.
How it works:
- Each worker writes its metric values to files in a shared directory (`PROMETHEUS_MULTIPROCESS_DIR`). Feast creates this directory automatically; you can override it by setting the environment variable yourself.
- The metrics endpoint aggregates the per-worker data with `MultiProcessCollector`, so a single scrape returns accurate totals.
- Dead workers are cleaned up (Gunicorn's `child_exit` hook calls `mark_process_dead`).
- Resource gauges use `multiprocess_mode=liveall`: Prometheus shows per-worker values distinguished by a `pid` label.
- Freshness gauges use `multiprocess_mode=max`: Prometheus shows the worst-case staleness (all workers compute the same value).
- Multiple replicas (HPA): each pod runs its own metrics endpoint. Prometheus adds an `instance` label per pod, so there is no duplication. Use `sum(rate(...))` or `histogram_quantile(...)` across instances as usual.
Enabling TLS mode ensures that data between the Feast client and server is transmitted securely. For production environments, it is recommended to start the feature server in TLS mode.

In development, you can generate a self-signed certificate for testing. In production, always obtain a certificate from a trusted TLS certificate provider:
```bash
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes
```
The above command will generate two files:

- `key.pem`: the certificate private key
- `cert.pem`: the certificate public key

To start the feature server in TLS mode, provide the private and public keys using the `--key` and `--cert` arguments with the `feast serve` command:

```bash
feast serve --key /path/to/key.pem --cert /path/to/cert.pem
```
Warning: This is an experimental feature. To our knowledge, this is stable, but there are still rough edges in the experience.
Static artifacts loading allows you to load models, lookup tables, and other static resources once during feature server startup instead of loading them on each request. This improves performance for on-demand feature views that require external resources.
Create a static_artifacts.py file in your feature repository:
```python
# static_artifacts.py
from fastapi import FastAPI
from transformers import pipeline

def load_artifacts(app: FastAPI):
    """Load static artifacts into app.state."""
    app.state.sentiment_model = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    # Update global references for access from feature views
    import example_repo
    example_repo._sentiment_model = app.state.sentiment_model
```
Access pre-loaded artifacts in your on-demand feature views:
```python
# example_repo.py
import pandas as pd

from feast import on_demand_feature_view

# Populated at server startup by load_artifacts() in static_artifacts.py
_sentiment_model = None

@on_demand_feature_view(...)
def sentiment_prediction(inputs: pd.DataFrame) -> pd.DataFrame:
    global _sentiment_model
    return _sentiment_model(inputs["text"])
```
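One caveat with the global-reference pattern: if `load_artifacts` never ran (for example, in a unit test or a misconfigured deployment), the global stays `None` and the feature view fails with an opaque `TypeError`. A small defensive accessor (illustrative, not part of the Feast API) gives a clearer error:

```python
# Hypothetical helper for example_repo.py; names are illustrative.
_sentiment_model = None

def get_sentiment_model():
    """Return the preloaded model, failing loudly if startup loading never ran."""
    if _sentiment_model is None:
        raise RuntimeError(
            "Sentiment model not loaded; ensure load_artifacts() ran at server startup."
        )
    return _sentiment_model
```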
For comprehensive documentation, examples, and best practices, see the Alpha Static Artifacts Loading reference guide.
The PyTorch NLP template provides a complete working example.
| Endpoint | Resource Type | Permission | Description |
|---|---|---|---|
| /get-online-features | FeatureView, OnDemandFeatureView | Read Online | Get online features from the feature store |
| /retrieve-online-documents | FeatureView | Read Online | Retrieve online documents from the feature store for RAG |
| /push | FeatureView | Write Online, Write Offline, Write Online and Offline | Push features to the feature store (online, offline, or both) |
| /write-to-online-store | FeatureView | Write Online | Write features to the online store |
| /materialize | FeatureView | Write Online | Materialize features within a specified time range |
| /materialize-incremental | FeatureView | Write Online | Incrementally materialize features up to a specified timestamp |
Please refer to the permissions page for more details on how to configure authentication and authorization.