docs/adr/ADR-0005-stream-transformations.md
Status: Accepted
Feast supported batch features well but lacked in-house support for pull-based stream ingestion or registered stream transformations. While Kafka and Kinesis data sources could be registered, users had to build and operate their own pipelines to transform stream data and load the results into Feast. This stream transformation pipeline existed entirely outside of Feast, making streaming features harder to track, version, and manage.
Introduce a StreamFeatureView and a StreamProcessor interface to provide a standardized pipeline for ingesting and transforming stream data. A stream feature view is declared with the stream_feature_view decorator; the (dummy) example below defines a Spark-mode view with windowed aggregations:
```python
from datetime import timedelta

from feast import Aggregation, Field
from feast.stream_feature_view import stream_feature_view
from feast.types import Float32

# `entity` and `stream_source` (e.g. a KafkaSource) are assumed to be
# defined elsewhere in the feature repository.

@stream_feature_view(
    entities=[entity],
    ttl=timedelta(days=30),
    owner="[email protected]",
    online=True,
    schema=[Field(name="dummy_field", dtype=Float32)],
    description="Stream feature view with aggregations",
    aggregations=[
        Aggregation(column="dummy_field", function="max", time_window=timedelta(days=1)),
        Aggregation(column="dummy_field2", function="count", time_window=timedelta(days=24)),
    ],
    timestamp_field="event_timestamp",
    mode="spark",
    source=stream_source,
)
def pandas_view(pandas_df):
    # The UDF runs against micro-batches of incoming stream data.
    return pandas_df.transform(lambda x: x + 10, axis=1)
```
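Once registered (e.g. via feast apply), the view is tracked and versioned like any other Feast object, closing the management gap described in the context above.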
The StreamProcessor is a pluggable interface implemented per stream processing engine (Spark, Flink, etc.):
```python
from abc import ABC

# StreamTable is an alias for the engine's table type (e.g. a Spark DataFrame).
class StreamProcessor(ABC):
    sfv: StreamFeatureView      # the view whose pipeline this processor runs
    data_source: DataSource     # the stream source (Kafka, Kinesis, ...)

    def ingest_stream_feature_view(self) -> None: ...  # orchestrates the three steps below
    def _ingest_stream_data(self) -> StreamTable: ...  # read raw records from the stream source
    def _construct_transformation_plan(self, table: StreamTable) -> StreamTable: ...  # apply the view's UDF and aggregations
    def _write_to_online_store(self, table: StreamTable) -> None: ...  # write transformed rows online
```
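A hypothetical subclass helps show the contract. The sketch below is illustrative only: a toy that treats a single pandas DataFrame as the stream, not Feast's actual Spark or Flink implementation, and it assumes the view exposes its transformation as `sfv.udf`:

```python
import pandas as pd

# Toy illustration of the StreamProcessor contract; a real engine would read
# from Kafka/Kinesis and write through the provider's online store instead.
class ToyStreamProcessor(StreamProcessor):
    def __init__(self, sfv, data_source):
        self.sfv = sfv
        self.data_source = data_source

    def ingest_stream_feature_view(self) -> None:
        # The template method: read, transform, write.
        table = self._ingest_stream_data()
        table = self._construct_transformation_plan(table)
        self._write_to_online_store(table)

    def _ingest_stream_data(self) -> pd.DataFrame:
        # Stand-in for attaching to the stream source.
        return pd.DataFrame({"dummy_field": [1.0, 2.0], "dummy_field2": [1, 1]})

    def _construct_transformation_plan(self, table: pd.DataFrame) -> pd.DataFrame:
        # Assumption: the registered view's UDF is available as `sfv.udf`.
        return self.sfv.udf(table)

    def _write_to_online_store(self, table: pd.DataFrame) -> None:
        print(f"writing {len(table)} transformed rows to the online store")
```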
A unified push API was introduced to allow pushing features to both the online and offline stores, supporting the Kappa architectural approach in which the stream is the single source of truth.
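A minimal sketch of the unified push API; the push source name "push_source" and the dataframe columns here are illustrative, not part of the ADR:

```python
import pandas as pd
from feast import FeatureStore
from feast.data_source import PushMode

store = FeatureStore(repo_path=".")

event_df = pd.DataFrame({
    "driver_id": [1001],                         # illustrative entity key
    "event_timestamp": [pd.Timestamp.utcnow()],
    "dummy_field": [42.0],
})

# Push the same rows to both the online and offline stores.
store.push("push_source", event_df, to=PushMode.ONLINE_AND_OFFLINE)
```

PushMode.ONLINE or PushMode.OFFLINE can be used instead to target a single store.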
Built-in aggregation functions: sum, count, mean, max, min. The initial implementation uses full aggregation (recomputing each aggregate over all events in the window) with RocksDB as the state store, keeping the design simple while avoiding aggregation work at request time.
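To make "full aggregation" concrete, here is an illustrative (non-Feast) sketch: the max over a trailing one-day window is recomputed from every event inside the window, rather than maintained as incremental partial state:

```python
from datetime import datetime, timedelta

import pandas as pd

# Illustrative only: full aggregation scans all events in the time window
# each time state is updated.
events = pd.DataFrame({
    "event_timestamp": pd.to_datetime(
        ["2024-01-01 10:00", "2024-01-01 18:00", "2024-01-02 09:00"]
    ),
    "dummy_field": [1.0, 5.0, 3.0],
})

window_end = datetime(2024, 1, 2, 10, 0)
window_start = window_end - timedelta(days=1)
in_window = events[
    (events.event_timestamp > window_start) & (events.event_timestamp <= window_end)
]
print(in_window["dummy_field"].max())  # 5.0: max over the trailing 1-day window
```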
Implementation: sdk/python/feast/stream_feature_view.py