feature_store.yaml

Overview

feature_store.yaml is placed at the root of the feature repository and contains the configuration that controls how the feature store runs. An example feature_store.yaml is shown below:

{% code title="feature_store.yaml" %}

```yaml
project: loyal_spider
registry: data/registry.db
provider: local
online_store:
    type: sqlite
    path: data/online_store.db
```

{% endcode %}

Fields in feature_store.yaml

  • provider ("local" or "gcp"): Defines the environment in which Feast executes data flows.
  • registry (a local or GCS file path): Defines the location of the feature registry.
  • online_store: Configures the online store. Its subfields depend on the type of online store:
    • type ("sqlite" or "datastore"): Defines the type of online store.
    • path (a local file path): Parameter for the sqlite online store. Defines the path to the SQLite database file.
    • project_id: Optional parameter for the datastore online store. Sets the GCP project ID used by Feast. If it is not set, Feast uses the default GCP project ID of the local environment.
  • project: Defines a namespace for the entire feature store. Can be used to isolate multiple deployments in a single installation of Feast.
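The field rules listed above can be expressed as a small validation routine. The following is a minimal sketch, not Feast's own validation logic: the helper name `validate_config` and its error messages are hypothetical, treating `project`, `registry`, and `provider` as mandatory is an assumption of the sketch, and the configuration is passed as an already-parsed dictionary so that no YAML library is needed.

```python
from typing import Any, Dict, List

# Allowed values as listed in the field reference above.
PROVIDERS = {"local", "gcp"}
ONLINE_STORE_TYPES = {"sqlite", "datastore"}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return the problems found in a parsed feature_store.yaml (hypothetical helper)."""
    errors: List[str] = []

    # Assumption of this sketch: these three top-level fields are mandatory.
    for field in ("project", "registry", "provider"):
        if not config.get(field):
            errors.append(f"missing field: {field}")

    provider = config.get("provider")
    if provider and provider not in PROVIDERS:
        errors.append(f"unknown provider: {provider}")

    # online_store may be omitted, in which case the provider default applies.
    online_store = config.get("online_store")
    if online_store is not None:
        store_type = online_store.get("type")
        if store_type not in ONLINE_STORE_TYPES:
            errors.append(f"unknown online_store type: {store_type}")
        elif store_type == "sqlite" and "project_id" in online_store:
            errors.append("project_id applies only to the datastore online store")
        elif store_type == "datastore" and "path" in online_store:
            errors.append("path applies only to the sqlite online store")
    return errors


example = {
    "project": "loyal_spider",
    "registry": "data/registry.db",
    "provider": "local",
    "online_store": {"type": "sqlite", "path": "data/online_store.db"},
}
print(validate_config(example))  # prints []
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in the file at once.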

feature_server

The feature_server block configures the Python feature server, which serves online features and handles /push requests. The block is optional and applies only when the Python feature server is running.

An example configuration:

```yaml
feature_server:
  type: local
  metrics: # Prometheus metrics configuration. Also achievable via `feast serve --metrics`.
    enabled: true             # Enable Prometheus metrics server on port 8000
    resource: true            # CPU / memory gauges
    request: true             # endpoint latency histograms & request counters
    online_features: true     # online feature retrieval counters + store read & ODFV transform timing
    push: true                # push request counters
    materialization: true     # materialization counters & duration histograms
    freshness: true           # per-feature-view freshness gauges
  offline_push_batching_enabled: true # Enables batching of offline writes processed by /push. Online writes are unaffected.
  offline_push_batching_batch_size: 100 # Maximum number of buffered rows before writing to the offline store.
  offline_push_batching_batch_interval_seconds: 5 # Maximum time rows may remain buffered before a forced flush.
```
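The two batching settings describe a "flush on size or age" policy: rows are written once the buffer reaches the batch size, or once the oldest buffered row has waited for the batch interval. The sketch below illustrates that policy only. It is not Feast's implementation; the class `OfflinePushBatcher`, the `write_batch` callback, and the injectable `clock` are hypothetical names introduced for the example.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class OfflinePushBatcher:
    """Buffers rows and flushes on size or age (illustrative, not Feast's code)."""

    def __init__(
        self,
        write_batch: Callable[[List[Row]], None],
        batch_size: int = 100,
        batch_interval_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._interval = batch_interval_seconds
        self._clock = clock
        self._rows: List[Row] = []
        self._oldest: Optional[float] = None  # arrival time of the oldest buffered row

    def push(self, row: Row) -> None:
        if not self._rows:
            self._oldest = self._clock()
        self._rows.append(row)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        """Flush when the buffer is full or its oldest row has waited too long."""
        if not self._rows:
            return
        full = len(self._rows) >= self._batch_size
        stale = self._clock() - self._oldest >= self._interval
        if full or stale:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            self._write_batch(self._rows)
            self._rows = []  # start a fresh list so written batches are not mutated
            self._oldest = None


# Usage with a fake clock so the example is deterministic.
written: List[List[Row]] = []
now = [0.0]
batcher = OfflinePushBatcher(
    written.append, batch_size=3, batch_interval_seconds=5, clock=lambda: now[0]
)
for i in range(3):
    batcher.push({"id": i})  # the third push fills the buffer and triggers a write
batcher.push({"id": 3})
now[0] = 6.0
batcher.flush_if_due()  # row 3 has now waited longer than the interval
print([len(batch) for batch in written])  # prints [3, 1]
```

In a long-running server, something such as a background timer would need to call `flush_if_due` periodically so that a partially filled buffer is still written when no new rows arrive. That scheduling is left out of the sketch.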

Providers

The provider field defines the environment in which Feast will execute data flows. As a result, it also determines the default values for other fields.

Local

When using the local provider:

  • Feast can read from local Parquet data sources.
  • Feast performs historical feature retrieval (point-in-time joins) using pandas.
  • Feast performs online feature serving from a SQLite database.

GCP

When using the GCP provider:

  • Feast can read data from BigQuery data sources.
  • Feast performs historical feature retrieval (point-in-time joins) in BigQuery.
  • Feast performs online feature serving from Google Cloud Datastore.
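The way a provider supplies defaults for other fields can be pictured as a lookup table that is consulted only when a field is left out. This is a rough sketch based on the two provider descriptions above. The table `PROVIDER_DEFAULTS` and the helper `resolve_online_store` are hypothetical, and only the online store type is modelled; any other default values Feast may apply are not represented.

```python
from typing import Any, Dict

# Behaviour of each provider as described above.
PROVIDER_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "local": {
        "data_source": "Parquet",
        "historical_retrieval": "pandas",
        "online_store": {"type": "sqlite"},
    },
    "gcp": {
        "data_source": "BigQuery",
        "historical_retrieval": "BigQuery",
        "online_store": {"type": "datastore"},
    },
}


def resolve_online_store(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the explicit online_store block, or the provider's default (hypothetical helper)."""
    explicit = config.get("online_store")
    if explicit:
        return dict(explicit)
    return dict(PROVIDER_DEFAULTS[config["provider"]]["online_store"])


print(resolve_online_store({"provider": "gcp"}))  # prints {'type': 'datastore'}
```

An explicit `online_store` block always takes precedence in this sketch, so a configuration that sets both `provider: local` and its own `online_store` section is returned unchanged.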

Permissions

<table>
  <thead>
    <tr>
      <th style="text-align:left"><b>Command</b> </th>
      <th style="text-align:left">Component</th>
      <th style="text-align:left">Permissions</th>
      <th style="text-align:left">Recommended Role</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align:left"><b>Apply</b> </td>
      <td style="text-align:left">BigQuery (source)</td>
      <td style="text-align:left">
        <p>bigquery.jobs.create</p>
        <p>bigquery.readsessions.create</p>
        <p>bigquery.readsessions.getData</p>
      </td>
      <td style="text-align:left">roles/bigquery.user</td>
    </tr>
    <tr>
      <td style="text-align:left"><b>Apply</b> </td>
      <td style="text-align:left">Datastore (destination)</td>
      <td style="text-align:left">
        <p>datastore.entities.allocateIds</p>
        <p>datastore.entities.create</p>
        <p>datastore.entities.delete</p>
        <p>datastore.entities.get</p>
        <p>datastore.entities.list</p>
        <p>datastore.entities.update</p>
      </td>
      <td style="text-align:left">roles/datastore.owner</td>
    </tr>
    <tr>
      <td style="text-align:left"><b>Materialize</b> </td>
      <td style="text-align:left">BigQuery (source)</td>
      <td style="text-align:left">bigquery.jobs.create</td>
      <td style="text-align:left">roles/bigquery.user</td>
    </tr>
    <tr>
      <td style="text-align:left"><b>Materialize</b> </td>
      <td style="text-align:left">Datastore (destination)</td>
      <td style="text-align:left">
        <p>datastore.entities.allocateIds</p>
        <p>datastore.entities.create</p>
        <p>datastore.entities.delete</p>
        <p>datastore.entities.get</p>
        <p>datastore.entities.list</p>
        <p>datastore.entities.update</p>
        <p>datastore.databases.get</p>
      </td>
      <td style="text-align:left">roles/datastore.owner</td>
    </tr>
    <tr>
      <td style="text-align:left"><b>Get Online Features</b> </td>
      <td style="text-align:left">Datastore</td>
      <td style="text-align:left">datastore.entities.get</td>
      <td style="text-align:left">roles/datastore.user</td>
    </tr>
    <tr>
      <td style="text-align:left"><b>Get Historical Features</b> </td>
      <td style="text-align:left">BigQuery (source)</td>
      <td style="text-align:left">
        <p>bigquery.datasets.get</p>
        <p>bigquery.tables.get</p>
        <p>bigquery.tables.create</p>
        <p>bigquery.tables.updateData</p>
        <p>bigquery.tables.update</p>
        <p>bigquery.tables.delete</p>
        <p>bigquery.tables.getData</p>
      </td>
      <td style="text-align:left">roles/bigquery.dataEditor</td>
    </tr>
  </tbody>
</table>
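The permissions table can also be kept as data and used for an offline check before running a command. The sketch below is one possible way to do that. It does not call any GCP API; the command keys, the `REQUIRED_PERMISSIONS` mapping, and the helper `missing_permissions` are hypothetical names, and the set of granted permissions is assumed to have been obtained elsewhere.

```python
from typing import Dict, Iterable, List, Set

# Entity permissions shared by the Apply and Materialize rows of the table.
DATASTORE_ENTITY_WRITE: Set[str] = {
    "datastore.entities.allocateIds",
    "datastore.entities.create",
    "datastore.entities.delete",
    "datastore.entities.get",
    "datastore.entities.list",
    "datastore.entities.update",
}

# Permissions per command, combining the BigQuery and Datastore rows of the table.
REQUIRED_PERMISSIONS: Dict[str, Set[str]] = {
    "apply": {
        "bigquery.jobs.create",
        "bigquery.readsessions.create",
        "bigquery.readsessions.getData",
    } | DATASTORE_ENTITY_WRITE,
    "materialize": {"bigquery.jobs.create", "datastore.databases.get"}
    | DATASTORE_ENTITY_WRITE,
    "get_online_features": {"datastore.entities.get"},
    "get_historical_features": {
        "bigquery.datasets.get",
        "bigquery.tables.get",
        "bigquery.tables.create",
        "bigquery.tables.updateData",
        "bigquery.tables.update",
        "bigquery.tables.delete",
        "bigquery.tables.getData",
    },
}


def missing_permissions(command: str, granted: Iterable[str]) -> List[str]:
    """Return the permissions a command needs that are absent from `granted`, sorted."""
    return sorted(REQUIRED_PERMISSIONS[command] - set(granted))


granted = {"datastore.entities.get", "bigquery.jobs.create"}
print(missing_permissions("get_online_features", granted))  # prints []
print(missing_permissions("materialize", granted))
```

With only those two permissions granted, online feature retrieval is covered, while the second call lists the Datastore permissions that materialization would still need.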