A Feast project requires:
- A feature_store.yaml config file
- feast apply to register definitions

feast init my_project
cd my_project
feast apply
An entity is a collection of semantically related features (e.g., a customer, a driver). Entities have join keys used to look up features.
from feast import Entity
from feast.value_type import ValueType
driver = Entity(
    name="driver_id",
    description="Driver identifier",
    value_type=ValueType.INT64,
)
Data sources describe where raw feature data lives.
from feast import Field, FileSource, BigQuerySource, KafkaSource, PushSource, RequestSource
from feast.data_format import ParquetFormat
from feast.types import Float64
# Batch source (file)
driver_stats_source = FileSource(
    name="driver_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
# Request source (for on-demand features)
input_request = RequestSource(
    name="vals_to_add",
    schema=[Field(name="val_to_add", dtype=Float64)],
)
A feature view maps features from a data source to entities, with a schema, a TTL, and online/offline settings.
from feast import FeatureView, Field
from feast.types import Float32, Int64, String
from datetime import timedelta
driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=365),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    source=driver_stats_source,
)
An on-demand feature view computes features at request time from other feature views and/or request data.
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64
import pandas as pd
@on_demand_feature_view(
    sources=[driver_hourly_stats, input_request],
    schema=[Field(name="conv_rate_plus_val", dtype=Float64)],
    mode="pandas",
)
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    # Add the request-time value to the stored conversion rate.
    df = pd.DataFrame()
    df["conv_rate_plus_val"] = inputs["conv_rate"] + inputs["val_to_add"]
    return df
A feature service groups features from multiple views for retrieval.
from feast import FeatureService
driver_fs = FeatureService(
    name="driver_ranking",
    features=[driver_hourly_stats, transformed_conv_rate],
)
from feast import FeatureStore
store = FeatureStore(repo_path=".")
# Online retrieval: read the latest feature values for each entity row.
features = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_rows=[{"driver_id": 1001}, {"driver_id": 1002}],
).to_dict()
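The dictionary returned by `to_dict()` is column-oriented: each key maps to a list with one value per entity row. When downstream code wants one record per entity, the columns can be transposed. The following is a minimal sketch under that assumption; the helper name `columns_to_rows` and the sample values are hypothetical and not part of Feast.

```python
from typing import Any, Dict, List


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Transpose a dict of equal-length lists into a list of row dicts."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


# Hypothetical values shaped like the result of the call above.
sample = {
    "driver_id": [1001, 1002],
    "conv_rate": [0.5, 0.25],
    "acc_rate": [0.9, None],  # None stands in for a value missing from the online store
}
rows = columns_to_rows(sample)
```

Each element of `rows` then holds the entity key and its feature values together, which is often easier to pass to a model or log per request.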
from datetime import datetime
import pandas as pd
# Historical retrieval: join feature values as of each row's event_timestamp.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": [datetime(2023, 1, 1), datetime(2023, 1, 2)],
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
).to_df()
Or use a FeatureService:
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=driver_fs,
).to_df()
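Historical retrieval is a point-in-time join: for each entity row, only feature values observed at or before that row's `event_timestamp` (and within the feature view's TTL) are eligible, which avoids leaking future data into training sets. The sketch below illustrates the idea in plain Python. It is a conceptual model, not Feast's implementation; the data, the function name `point_in_time_lookup`, and the inclusive boundary handling are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (driver_id, event_timestamp, conv_rate) rows; all values are made up.
FEATURE_ROWS: List[Tuple[int, datetime, float]] = [
    (1001, datetime(2022, 12, 30), 0.40),
    (1001, datetime(2023, 1, 1), 0.55),
    (1001, datetime(2023, 1, 5), 0.70),  # later than the query time, so ignored
    (1002, datetime(2021, 6, 1), 0.10),  # older than the TTL window, so ignored
]


def point_in_time_lookup(
    driver_id: int, as_of: datetime, ttl: timedelta
) -> Optional[float]:
    """Return the latest conv_rate observed in [as_of - ttl, as_of], if any."""
    candidates = [
        (ts, value)
        for key, ts, value in FEATURE_ROWS
        if key == driver_id and as_of - ttl <= ts <= as_of
    ]
    if not candidates:
        return None
    # The most recent eligible observation wins.
    return max(candidates, key=lambda pair: pair[0])[1]


entity_rows = [(1001, datetime(2023, 1, 1)), (1002, datetime(2023, 1, 2))]
training = [
    {
        "driver_id": driver_id,
        "event_timestamp": ts,
        "conv_rate": point_in_time_lookup(driver_id, ts, ttl=timedelta(days=365)),
    }
    for driver_id, ts in entity_rows
]
```

In this toy data, driver 1001 gets the value recorded on its query date, while driver 1002 gets no value because its only observation falls outside the 365-day window.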
Load features from the offline store into the online store:
# Full materialization over a time range
feast materialize 2023-01-01T00:00:00 2023-12-31T23:59:59
# Incremental (from last materialized timestamp)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
Python API:
from datetime import datetime
store.materialize(start_date=datetime(2023, 1, 1), end_date=datetime(2023, 12, 31))
store.materialize_incremental(end_date=datetime.utcnow())
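Incremental materialization only needs an end date because the start of the window comes from the previous run. The sketch below models that bookkeeping in plain Python. It is an illustration under stated assumptions, not Feast's code: the function name `incremental_window` is hypothetical, and falling back to `end_date - ttl` on a first run is an assumption about the behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def incremental_window(
    last_end: Optional[datetime], end_date: datetime, ttl: timedelta
) -> Tuple[datetime, datetime]:
    """Pick the time range for the next incremental run."""
    # Resume from the previous run's end; on a first run, look back one TTL.
    start = last_end if last_end is not None else end_date - ttl
    if start >= end_date:
        raise ValueError("nothing to materialize: start is not before end_date")
    return start, end_date


now = datetime(2023, 6, 1, tzinfo=timezone.utc)
first_run = incremental_window(None, now, ttl=timedelta(days=365))
next_run = incremental_window(now, now + timedelta(hours=1), ttl=timedelta(days=365))
```

Under this model, running the incremental command on a schedule produces back-to-back windows with no gaps or overlap, which is why it suits cron-style jobs better than a fixed start and end.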
| Command | Purpose |
|---|---|
| feast init [DIR] | Create new feature repository |
| feast apply | Register/update feature definitions |
| feast plan | Preview changes without applying |
| feast materialize START END | Materialize features to online store |
| feast materialize-incremental END | Incremental materialization |
| feast entities list | List registered entities |
| feast feature-views list | List feature views |
| feast feature-services list | List feature services |
| feast on-demand-feature-views list | List on-demand feature views |
| feast teardown | Remove infrastructure resources |
| feast version | Show SDK version |
Options: --chdir / -c (run in different directory), --feature-store-yaml / -f (override config path).
Define a feature view with vector fields for similarity search:
from feast import FeatureView, Field
from feast.types import Array, Float32, String
# passage_entity and passages_source are assumed to be defined elsewhere in the repository.
wiki_passages = FeatureView(
    name="wiki_passages",
    entities=[passage_entity],
    schema=[
        Field(name="passage_text", dtype=String),
        Field(
            name="embedding",
            dtype=Array(Float32),
            vector_index=True,
            vector_length=384,
            vector_search_metric="COSINE",
        ),
    ],
    source=passages_source,
    online=True,
)
Retrieve similar documents:
# query_embedding is assumed to be a list of floats with the same length as the stored vectors.
results = store.retrieve_online_documents(
    feature="wiki_passages:embedding",
    query=query_embedding,
    top_k=5,
)
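With `vector_search_metric="COSINE"`, retrieval ranks stored embeddings by cosine similarity to the query vector and returns the `top_k` closest. The following plain-Python sketch shows that ranking on toy three-dimensional vectors. It only illustrates the metric; the function names and data are hypothetical, and a real online store would typically use an index rather than a full scan.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], passages: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every passage against the query and keep the k most similar."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in passages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings keyed by a made-up passage id.
passages = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.0, 1.0, 0.0],
    "p3": [0.7, 0.7, 0.0],
}
nearest = top_k([1.0, 0.1, 0.0], passages, k=2)
```

Because cosine similarity ignores vector magnitude, `p3` ranks above `p2` here even though neither is normalized to unit length.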
# feature_store.yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
from feast import (
    Entity, FeatureView, OnDemandFeatureView, FeatureService,
    Field, FileSource, RequestSource, FeatureStore,
)
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float32, Float64, Int64, String, Bool, Array
from feast.value_type import ValueType
from datetime import timedelta