Back to Feast

Feast User Guide

skills/SKILL.md

0.63.06.3 KB
Original Source

Feast User Guide

Quick Start

A Feast project requires:

  1. A feature_store.yaml config file
  2. Python files defining entities, data sources, feature views, and feature services
  3. Running feast apply to register definitions
bash
feast init my_project
cd my_project
feast apply

Core Concepts

Entity

An entity is a collection of semantically related features (e.g., a customer, a driver). Entities have join keys used to look up features.

python
from feast import Entity
from feast.value_type import ValueType

driver = Entity(
    name="driver_id",
    description="Driver identifier",
    value_type=ValueType.INT64,
)

Data Sources

Data sources describe where raw feature data lives.

python
from feast import FileSource, BigQuerySource, KafkaSource, PushSource, RequestSource
from feast.data_format import ParquetFormat

# Batch source (file)
driver_stats_source = FileSource(
    name="driver_stats_source",
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

# Request source (for on-demand features)
input_request = RequestSource(
    name="vals_to_add",
    schema=[Field(name="val_to_add", dtype=Float64)],
)

FeatureView

Maps features from a data source to entities with a schema, TTL, and online/offline settings.

python
from feast import FeatureView, Field
from feast.types import Float32, Int64, String
from datetime import timedelta

driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=365),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    source=driver_stats_source,
)

OnDemandFeatureView

Computes features at request time from other feature views and/or request data.

python
from feast import on_demand_feature_view
import pandas as pd

@on_demand_feature_view(
    sources=[driver_hourly_stats, input_request],
    schema=[Field(name="conv_rate_plus_val", dtype=Float64)],
    mode="pandas",
)
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val"] = inputs["conv_rate"] + inputs["val_to_add"]
    return df

FeatureService

Groups features from multiple views for retrieval.

python
from feast import FeatureService

driver_fs = FeatureService(
    name="driver_ranking",
    features=[driver_hourly_stats, transformed_conv_rate],
)

Feature Retrieval

Online (low-latency)

python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_rows=[{"driver_id": 1001}, {"driver_id": 1002}],
).to_dict()

Historical (training data with point-in-time joins)

python
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": [datetime(2023, 1, 1), datetime(2023, 1, 2)],
})

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
).to_df()

Or use a FeatureService:

python
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=driver_fs,
).to_df()

Materialization

Load features from offline store into online store:

bash
# Full materialization over a time range
feast materialize 2023-01-01T00:00:00 2023-12-31T23:59:59

# Incremental (from last materialized timestamp)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

Python API:

python
from datetime import datetime
store.materialize(start_date=datetime(2023, 1, 1), end_date=datetime(2023, 12, 31))
store.materialize_incremental(end_date=datetime.utcnow())

CLI Commands

CommandPurpose
feast init [DIR]Create new feature repository
feast applyRegister/update feature definitions
feast planPreview changes without applying
feast materialize START ENDMaterialize features to online store
feast materialize-incremental ENDIncremental materialization
feast entities listList registered entities
feast feature-views listList feature views
feast feature-services listList feature services
feast on-demand-feature-views listList on-demand feature views
feast teardownRemove infrastructure resources
feast versionShow SDK version

Options: --chdir / -c (run in different directory), --feature-store-yaml / -f (override config path).

Vector Search / RAG

Define a feature view with vector fields for similarity search:

python
from feast.types import Array, Float32

wiki_passages = FeatureView(
    name="wiki_passages",
    entities=[passage_entity],
    schema=[
        Field(name="passage_text", dtype=String),
        Field(
            name="embedding",
            dtype=Array(Float32),
            vector_index=True,
            vector_length=384,
            vector_search_metric="COSINE",
        ),
    ],
    source=passages_source,
    online=True,
)

Retrieve similar documents:

python
results = store.retrieve_online_documents(
    feature="wiki_passages:embedding",
    query=query_embedding,
    top_k=5,
)

feature_store.yaml Minimal Config

yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

Common Imports

python
from feast import (
    Entity, FeatureView, OnDemandFeatureView, FeatureService,
    Field, FileSource, RequestSource, FeatureStore,
)
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float32, Float64, Int64, String, Bool, Array
from feast.value_type import ValueType
from datetime import timedelta

Detailed References