
feature_store.yaml

Overview

feature_store.yaml is used to configure a feature store. The file must be located at the root of a feature repository. An example feature_store.yaml is shown below:

{% code title="feature_store.yaml" %}

```yaml
project: loyal_spider
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
```

{% endcode %}

Options

The following top-level configuration options exist in the feature_store.yaml file.

  • provider — Configures the environment in which Feast will deploy and operate.
  • registry — Configures the location of the feature registry.
  • online_store — Configures the online store.
  • offline_store — Configures the offline store.
  • project — Defines a namespace for the entire feature store. Can be used to isolate multiple deployments in a single installation of Feast. Should only contain letters, numbers, and underscores.
  • engine — Configures the batch materialization engine.
  • materialization — Configures materialization behavior (write batching, feature pull strategy). See below.

Please see the RepoConfig API reference for the full list of configuration options.
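
For context, this file is what Feast loads when a FeatureStore object is created against the repository root. A minimal sketch (the repo path and printed field are illustrative):

```python
# Minimal sketch: Feast reads feature_store.yaml from the repository root
# when a FeatureStore is instantiated. The repo path below is illustrative.
from feast import FeatureStore

store = FeatureStore(repo_path=".")   # looks for ./feature_store.yaml
print(store.config.project)           # e.g. "loyal_spider"
```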


materialization configuration

The materialization block controls how Feast reads from the offline store and writes to the online store during feast materialize / feast materialize-incremental runs.

{% code title="feature_store.yaml" %}

```yaml
project: my_feature_repo
registry: data/registry.db
provider: local
online_store:
  type: redis
  connection_string: "localhost:6379"
materialization:
  online_write_batch_size: 10000   # write rows in chunks of 10,000
  pull_latest_features: false      # pull full time range (default)
```

{% endcode %}

online_write_batch_size

| Field | Type | Default | Supported engines |
| --- | --- | --- | --- |
| online_write_batch_size | int (positive) | null | local, spark, ray |

Controls how many rows are converted to protobuf and written to the online store per batch during materialization.

Default behaviour (null): All rows fetched from the offline store are converted to protobuf in a single in-memory operation before writing. This is fast but can exhaust memory for large datasets — every row must be held as a Python proto object simultaneously.

With online_write_batch_size set: The Arrow table returned by the offline store is split into chunks of at most online_write_batch_size rows. Each chunk is converted and written independently, keeping peak memory proportional to the batch size rather than the full dataset size.

```yaml
# Recommended for datasets > a few million rows or memory-constrained workers
materialization:
  online_write_batch_size: 10000
```
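
For intuition, the chunked write loop looks roughly like the sketch below. The function and parameter names are illustrative, not actual Feast internals:

```python
# Illustrative sketch of batched online writes; not the actual Feast code.
from typing import Callable, Optional

import pyarrow as pa


def write_in_batches(
    table: pa.Table,
    batch_size: Optional[int],
    write_batch: Callable[[list], None],
) -> None:
    if batch_size is None:
        # Default behaviour: convert the whole table at once (fast, memory-hungry).
        write_batch(table.to_pylist())
        return
    for start in range(0, table.num_rows, batch_size):
        chunk = table.slice(start, batch_size)  # zero-copy Arrow slice
        write_batch(chunk.to_pylist())          # peak memory ~ one chunk of rows
```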

Choosing a value:

| Dataset size | Worker memory | Recommended batch size |
| --- | --- | --- |
| < 1 M rows | Any | null (default — single batch is fine) |
| 1–10 M rows | ≥ 4 GB | 50000 |
| 10–100 M rows | ≥ 8 GB | 10000 |
| > 100 M rows | Any | 5000–10000 |

A smaller batch size reduces peak memory at the cost of more online_write_batch calls to the online store. For Redis, each call is a pipelined batch, so the overhead is low. For stores with higher per-call latency (e.g. DynamoDB), prefer larger batch sizes.

{% hint style="info" %} online_write_batch_size is applied per feature view within a single materialization job. If you materialize five feature views in parallel, peak memory is 5 × batch_size × bytes_per_row. {% endhint %}
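
As a back-of-envelope check of that formula (the 1 KiB average row size is an assumption, not a Feast measurement):

```python
# Rough peak-memory estimate for parallel materialization.
# bytes_per_row is an assumed average serialized row size.
feature_views = 5
batch_size = 10_000
bytes_per_row = 1_024

peak_bytes = feature_views * batch_size * bytes_per_row
print(f"~{peak_bytes / 2**20:.0f} MiB")   # ~49 MiB
```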

pull_latest_features

| Field | Type | Default |
| --- | --- | --- |
| pull_latest_features | bool | false |

When false (default), the offline store retrieves all feature values within the requested time range for each entity.

When true, only the latest value per entity is retrieved. This reduces I/O and memory for feature views where historical values are not needed (e.g., slowly changing dimensions). It is equivalent to grouping the offline data by entity and keeping only the row with the maximum event_timestamp before writing, as sketched below.
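
A pandas sketch of that reduction, for intuition; the column names and data are illustrative, not Feast internals:

```python
# Illustrative equivalent of pull_latest_features: keep only the most
# recent row per entity. Column names are made up for the example.
import pandas as pd

df = pd.DataFrame({
    "driver_id":       [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
    "conv_rate":       [0.5, 0.7, 0.9],
})

latest = df.sort_values("event_timestamp").groupby("driver_id").tail(1)
print(latest)   # one row per driver_id: the latest value
```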