Back to Feast

OpenLineage Integration

docs/reference/openlineage.md

0.63.06.1 KB
Original Source

OpenLineage Integration

This module provides native integration between Feast and OpenLineage, enabling automatic data lineage tracking for ML feature engineering workflows.

Overview

When enabled, the integration automatically emits OpenLineage events for:

  • Registry changes - Events when feature views, feature services, and entities are applied
  • Feature materialization - START, COMPLETE, and FAIL events when features are materialized

No code changes required - just enable OpenLineage in your feature_store.yaml!

Installation

OpenLineage is an optional dependency. Install it with:

bash
pip install openlineage-python

Or install Feast with the OpenLineage extra:

bash
pip install feast[openlineage]

Configuration

Add the openlineage section to your feature_store.yaml:

yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

openlineage:
  enabled: true
  transport_type: http
  transport_url: http://localhost:5000
  transport_endpoint: api/v1/lineage
  namespace: feast
  emit_on_apply: true
  emit_on_materialize: true

Once configured, all Feast operations will automatically emit lineage events.

Environment Variables

You can also configure via environment variables:

bash
export FEAST_OPENLINEAGE_ENABLED=true
export FEAST_OPENLINEAGE_TRANSPORT_TYPE=http
export FEAST_OPENLINEAGE_URL=http://localhost:5000
export FEAST_OPENLINEAGE_ENDPOINT=api/v1/lineage
export FEAST_OPENLINEAGE_NAMESPACE=feast

Usage

Once configured, lineage is tracked automatically:

python
from feast import FeatureStore
from datetime import datetime, timedelta

# Create FeatureStore - OpenLineage is initialized automatically if configured
fs = FeatureStore(repo_path="feature_repo")

# Apply operations emit lineage events automatically
fs.apply([driver_entity, driver_hourly_stats_view])

# Materialize emits START, COMPLETE/FAIL events automatically
fs.materialize(
    start_date=datetime.now() - timedelta(days=1),
    end_date=datetime.now()
)

Configuration Options

OptionDefaultDescription
enabledfalseEnable/disable OpenLineage integration
transport_typehttpTransport type: http, file, kafka
transport_url-URL for HTTP transport (required)
transport_endpointapi/v1/lineageAPI endpoint for HTTP transport
api_key-Optional API key for authentication
namespacefeastNamespace for lineage events (uses project name if set to "feast")
producerfeastProducer identifier
emit_on_applytrueEmit events on feast apply
emit_on_materializetrueEmit events on materialization

Lineage Graph Structure

When you run feast apply, Feast creates a lineage graph that matches the Feast UI:

DataSources ──┐
              ├──→ feast_feature_views_{project} ──→ FeatureViews
Entities ─────┘                                          │
                                                         │
                                                         ▼
                                     feature_service_{name} ──→ FeatureService

Jobs created:

  • feast_feature_views_{project}: Shows DataSources + Entities → FeatureViews
  • feature_service_{name}: Shows specific FeatureViews → FeatureService (one per service)

Datasets include:

  • Schema with feature names, types, descriptions, and tags
  • Feast-specific facets with metadata (TTL, entities, owner, etc.)
  • Documentation facets with descriptions

Transport Types

yaml
openlineage:
  enabled: true
  transport_type: http
  transport_url: http://marquez:5000
  transport_endpoint: api/v1/lineage
  api_key: your-api-key  # Optional

File Transport

yaml
openlineage:
  enabled: true
  transport_type: file
  additional_config:
    log_file_path: openlineage_events.json

Kafka Transport

yaml
openlineage:
  enabled: true
  transport_type: kafka
  additional_config:
    bootstrap_servers: localhost:9092
    topic: openlineage.events

Custom Feast Facets

The integration includes custom Feast-specific facets in lineage events:

FeastFeatureViewFacet

Captures metadata about feature views:

  • name: Feature view name
  • ttl_seconds: Time-to-live in seconds
  • entities: List of entity names
  • features: List of feature names
  • online_enabled / offline_enabled: Store configuration
  • description: Feature view description
  • tags: Key-value tags

FeastFeatureServiceFacet

Captures metadata about feature services:

  • name: Feature service name
  • feature_views: List of feature view names
  • feature_count: Total number of features
  • description: Feature service description
  • tags: Key-value tags

FeastMaterializationFacet

Captures materialization run metadata:

  • feature_views: Feature views being materialized
  • start_date / end_date: Materialization window
  • rows_written: Number of rows written

Lineage Visualization

Use Marquez to visualize your Feast lineage:

bash
# Start Marquez
docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez

# Configure Feast to emit to Marquez (in feature_store.yaml)
# openlineage:
#   enabled: true
#   transport_type: http
#   transport_url: http://localhost:5000

Then access the Marquez UI at http://localhost:3000 to see your feature lineage.

Namespace Behavior

  • If namespace is set to "feast" (default): Uses project name as namespace (e.g., my_project)
  • If namespace is set to a custom value: Uses {namespace}/{project} (e.g., custom/my_project)

Feast to OpenLineage Mapping

Feast ConceptOpenLineage Concept
DataSourceInputDataset
FeatureViewOutputDataset (of feature views job) / InputDataset (of feature service job)
FeatureSchema field
EntityInputDataset
FeatureServiceOutputDataset
MaterializationRunEvent (START/COMPLETE/FAIL)