docs/how-to-guides/dbt-integration.md
{% hint style="warning" %}
**Alpha Feature**: The dbt integration is currently in early development and subject to change. Breaking changes may occur in future releases.
{% endhint %}
This guide explains how to use Feast's dbt integration to automatically import dbt models as Feast FeatureViews. This enables you to leverage your existing dbt transformations as feature definitions without manual duplication.
dbt (data build tool) is a popular tool for transforming data in your warehouse, and many teams already use dbt to create feature tables. Feast's dbt integration allows you to:

- Discover dbt models tagged for Feast
- Generate Entity, DataSource, and FeatureView definitions from dbt model metadata
- Keep column types and descriptions in sync with your warehouse transformations

This eliminates the need to manually define Feast objects that mirror your dbt models.
The integration works by parsing dbt's compiled manifest (`target/manifest.json`). Install the dbt extra:

```bash
pip install 'feast[dbt]'
```
Or install the parser directly:

```bash
pip install dbt-artifacts-parser
```
In your dbt project, add a `feast` tag to models you want to import:
{% code title="models/driver_features.sql" %}
```sql
{{ config(
    materialized='table',
    tags=['feast']
) }}

SELECT
    driver_id,
    event_timestamp,
    avg_rating,
    total_trips,
    is_active
FROM {{ ref('stg_drivers') }}
```
{% endcode %}
Feast uses column metadata from your `schema.yml` to determine feature types:
{% code title="models/schema.yml" %}
```yaml
version: 2

models:
  - name: driver_features
    description: "Driver aggregated features for ML models"
    columns:
      - name: driver_id
        description: "Unique driver identifier"
        data_type: STRING
      - name: event_timestamp
        description: "Feature timestamp"
        data_type: TIMESTAMP
      - name: avg_rating
        description: "Average driver rating"
        data_type: FLOAT64
      - name: total_trips
        description: "Total completed trips"
        data_type: INT64
      - name: is_active
        description: "Whether driver is currently active"
        data_type: BOOLEAN
```
{% endcode %}
```bash
cd your_dbt_project
dbt compile
```

This generates `target/manifest.json`, which Feast will read.
Use the Feast CLI to discover tagged models:
```bash
feast dbt list target/manifest.json --tag-filter feast
```

Output:

```
Found 1 model(s) with tag 'feast':

driver_features
  Description: Driver aggregated features for ML models
  Columns: driver_id, event_timestamp, avg_rating, total_trips, is_active
  Tags: feast
```
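Under the hood, discovery amounts to filtering the manifest's `nodes` mapping by tag. A minimal standard-library sketch (not Feast's actual implementation, but it assumes the standard dbt manifest layout, where each node records its `resource_type`, `name`, and `tags`):

```python
import json

def list_tagged_models(manifest_path: str, tag: str) -> list[str]:
    """Return names of dbt models in the manifest that carry the given tag."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return [
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model" and tag in node.get("tags", [])
    ]
```

For example, `list_tagged_models("target/manifest.json", "feast")` would return `["driver_features"]` for the project above.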
Generate a Python file with Feast object definitions:
```bash
feast dbt import target/manifest.json \
  --entity-column driver_id \
  --data-source-type bigquery \
  --tag-filter feast \
  --output features/driver_features.py
```
This generates:
{% code title="features/driver_features.py" %}
```python
"""
Feast feature definitions generated from dbt models.

Source: target/manifest.json
Project: my_dbt_project
Generated by: feast dbt import
"""
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Bool, Float64, Int64
from feast.infra.offline_stores.bigquery_source import BigQuerySource

# Entities
driver_id = Entity(
    name="driver_id",
    join_keys=["driver_id"],
    description="Entity key for dbt models",
    tags={'source': 'dbt'},
)

# Data Sources
driver_features_source = BigQuerySource(
    name="driver_features_source",
    table="my_project.my_dataset.driver_features",
    timestamp_field="event_timestamp",
    description="Driver aggregated features for ML models",
    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'},
)

# Feature Views
driver_features_fv = FeatureView(
    name="driver_features",
    entities=[driver_id],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_rating", dtype=Float64, description="Average driver rating"),
        Field(name="total_trips", dtype=Int64, description="Total completed trips"),
        Field(name="is_active", dtype=Bool, description="Whether driver is currently active"),
    ],
    online=True,
    source=driver_features_source,
    description="Driver aggregated features for ML models",
    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'},
)
```
{% endcode %}
The dbt integration supports feature views with multiple entities, useful for modeling relationships involving multiple keys.
Specify multiple entity columns using repeated `-e` flags:

```bash
feast dbt import target/manifest.json \
  -e user_id \
  -e merchant_id \
  -t feast \
  -o features/transactions.py
```
This creates a FeatureView with both user_id and merchant_id as entities, which is useful for data keyed by more than one ID, such as user-merchant transaction features.
Single entity usage:

```bash
feast dbt import target/manifest.json -e driver_id -t feast
```
All specified entity columns must exist in each dbt model being imported. Models missing any entity column will be skipped with a warning.
The `--output` flag generates code like:

```python
user_id = Entity(name="user_id", join_keys=["user_id"], ...)
merchant_id = Entity(name="merchant_id", join_keys=["merchant_id"], ...)

transaction_fv = FeatureView(
    name="transactions",
    entities=[user_id, merchant_id],  # Multiple entities
    schema=[...],
    ...
)
```
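Conceptually, the generator emits one `Entity` definition per `-e` flag. A hypothetical string-templating sketch of that step (illustrative only, not Feast's actual codegen):

```python
def gen_entity_defs(entity_columns):
    """Emit one Feast Entity definition per entity column (codegen sketch)."""
    template = '{name} = Entity(name="{name}", join_keys=["{name}"])'
    return "\n".join(template.format(name=col) for col in entity_columns)
```

Calling `gen_entity_defs(["user_id", "merchant_id"])` produces the two `Entity` lines shown above (minus the elided keyword arguments).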
### `feast dbt list`

Discover dbt models available for import.

```bash
feast dbt list <manifest_path> [OPTIONS]
```

Arguments:
- `manifest_path`: Path to dbt's `manifest.json` file

Options:
- `--tag-filter, -t`: Filter models by dbt tag (e.g., `feast`)
- `--model, -m`: Filter to specific model name(s)

### `feast dbt import`

Import dbt models as Feast object definitions.

```bash
feast dbt import <manifest_path> [OPTIONS]
```

Arguments:
- `manifest_path`: Path to dbt's `manifest.json` file

Options:
| Option | Description | Default |
|---|---|---|
| `--entity-column, -e` | Entity column name (can be specified multiple times) | (required) |
| `--data-source-type, -d` | Data source type: `bigquery`, `snowflake`, `file` | `bigquery` |
| `--tag-filter, -t` | Filter models by dbt tag | None |
| `--model, -m` | Import specific model(s) only | None |
| `--timestamp-field` | Timestamp column name | `event_timestamp` |
| `--ttl-days` | Feature TTL in days | 1 |
| `--exclude-columns` | Columns to exclude from features | None |
| `--no-online` | Disable online serving | False |
| `--output, -o` | Output Python file path | None (stdout) |
| `--dry-run` | Preview without generating code | False |
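The `--entity-column`, `--timestamp-field`, and `--exclude-columns` options together determine which columns become features. Conceptually (a sketch of the selection logic, not the actual implementation):

```python
def feature_columns(all_columns, entity_columns, timestamp_field, exclude=()):
    """Columns that become Feast Fields: everything except the entity keys,
    the timestamp column, and any explicitly excluded columns."""
    skip = set(entity_columns) | {timestamp_field} | set(exclude)
    return [c for c in all_columns if c not in skip]
```

For the `driver_features` model above, this selects `avg_rating`, `total_trips`, and `is_active` as features.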
Feast automatically maps dbt/warehouse column types to Feast types:
| dbt/SQL Type | Feast Type |
|---|---|
| STRING, VARCHAR, TEXT | String |
| INT, INTEGER, BIGINT | Int64 |
| SMALLINT, TINYINT | Int32 |
| FLOAT, REAL | Float32 |
| DOUBLE, FLOAT64 | Float64 |
| BOOLEAN, BOOL | Bool |
| TIMESTAMP, DATETIME | UnixTimestamp |
| BYTES, BINARY | Bytes |
| ARRAY&lt;type&gt; | Array(type) |
| JSON, JSONB | Map (or Json if declared in schema) |
| VARIANT, OBJECT | Map |
| SUPER | Map |
| MAP&lt;string,string&gt; | Map |
| STRUCT, RECORD | Struct (BigQuery) |
| struct&lt;...&gt; | Struct (Spark) |
Snowflake NUMBER(precision, scale) types are handled specially: a nonzero scale maps to Float64, while NUMBER(p, 0) maps to Int32 or Int64 depending on precision, falling back to Float64 for very large precisions.

```bash
feast dbt import manifest.json -e user_id -d bigquery -o features.py
```
Generates `BigQuerySource` with the full table path from dbt metadata:

```python
BigQuerySource(
    table="project.dataset.table_name",
    ...
)
```
```bash
feast dbt import manifest.json -e user_id -d snowflake -o features.py
```

Generates `SnowflakeSource` with database, schema, and table:

```python
SnowflakeSource(
    database="MY_DB",
    schema="MY_SCHEMA",
    table="TABLE_NAME",
    ...
)
```
```bash
feast dbt import manifest.json -e user_id -d file -o features.py
```

Generates `FileSource` with a placeholder path:

```python
FileSource(
    path="/data/table_name.parquet",
    ...
)
```
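The three data source types differ mainly in how the table or path reference is rendered. A hypothetical sketch of that formatting step (the default database/schema/project names here are illustrative, not Feast's):

```python
def source_reference(kind, name, database="my_db", schema="my_schema",
                     project="my_project", dataset="my_dataset"):
    """Render the table/path reference each data source type expects."""
    if kind == "bigquery":
        return f"{project}.{dataset}.{name}"          # full table path
    if kind == "snowflake":
        return f"{database}.{schema}.{name}".upper()  # uppercased identifiers
    if kind == "file":
        return f"/data/{name}.parquet"                # placeholder path to edit
    raise ValueError(f"unsupported data source type: {kind}")
```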
{% hint style="info" %} For file sources, update the generated path to point to your actual data files. {% endhint %}
Create a standard tagging convention in your dbt project:
```yaml
# dbt_project.yml
models:
  my_project:
    features:
      +tags: ['feast']  # All models in features/ get the feast tag
```
Column descriptions from `schema.yml` are preserved in the generated Feast definitions, making your feature catalog self-documenting.
Use `--dry-run` to preview what will be generated:

```bash
feast dbt import manifest.json -e user_id -d bigquery --dry-run
```
Commit the generated Python files to your repository. This allows you to review feature definition changes in pull requests, track their history, and roll back when needed.
Add dbt import to your CI pipeline:
```yaml
# .github/workflows/features.yml
- name: Compile dbt
  run: dbt compile

- name: Generate Feast definitions
  run: |
    feast dbt import target/manifest.json \
      -e user_id -d bigquery -t feast \
      -o feature_repo/features.py

- name: Apply Feast changes
  run: feast apply
```
Columns without a `data_type` in `schema.yml` default to the String type.

Run `dbt compile` or `dbt run` first to generate the manifest file.
Check that your models have the correct tag in their config:
```sql
{{ config(tags=['feast']) }}
```
Ensure your dbt model includes the entity column specified with `--entity-column`. Models missing this column are skipped with a warning.

By default, Feast looks for `event_timestamp`. Use `--timestamp-field` to specify a different column name.
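Both of the checks above boil down to verifying required columns before import. A small sketch of that validation (illustrative, not Feast's actual code):

```python
def missing_required_columns(model_columns, entity_columns,
                             timestamp_field="event_timestamp"):
    """Return the entity/timestamp columns a dbt model lacks; a non-empty
    result means the model would be skipped with a warning."""
    present = set(model_columns)
    required = list(entity_columns) + [timestamp_field]
    return [c for c in required if c not in present]
```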