Back to Feast

ADR-0004: Entity Join Key Mapping

docs/adr/ADR-0004-entity-join-key-mapping.md

0.63.03.6 KB
Original Source

ADR-0004: Entity Join Key Mapping

Status

Accepted

Context

Multiple different entity keys in the source data may need to map onto the same entity from the feature data table during a join. For example, spammer_id and reporter_id may both need the years_on_platform feature from a table keyed by user_id.

Without entity join key mapping:

  • Users had to rename columns in their entity dataframe to match the feature view's join key before retrieval.
  • It was impossible to join a feature view twice on two different columns in the entity data (e.g., getting user features for both spammer_id and reporter_id in the same query).

Example

Entity source data:

spammer_idreporter_idtimestamp
281629909366
121629909323

Desired joined data should include spammer_feature_a and reporter_feature_a, both sourced from the same user feature view but joined on different keys.

Decision

Implement join key overrides using a with_join_key_map() method on feature views, combined with with_name() for disambiguation. This was Option 8b from the RFC.

API

python
abuse_feature_service = FeatureService(
    name="my_abuse_model_v1",
    features=[
        user_features
            .with_name("reporter_features")
            .with_join_key_map({"user_id": "reporter_id"}),
        user_features
            .with_name("spammer_features")
            .with_join_key_map({"user_id": "spammer_id"}),
    ],
)

Key Decisions

  • Query-time mapping rather than registration-time. This provides flexibility since the same feature view can be used with different mappings in different contexts.
  • Join key level mapping rather than entity-level mapping. While entity-level mapping (Option 10) better preserves abstraction boundaries, join key mapping is more flexible and doesn't require registering additional entities.
  • with_name() required when using the same feature view multiple times to avoid output column name collisions. If omitted, a name collision error is raised.
  • Mapping overwrites wholly: specifying a mapping replaces the default join behavior entirely. If you want the original join key included, it must be explicitly listed.

Implementation

  • Offline (historical) retrieval: After feature subtable cleaning and dedup, entity columns are renamed based on the mapping before the join.
  • Online retrieval: Shadow entity keys are translated to the original join key for the online store lookup, then results are remapped to the shadow entity names.
  • The join_key_map is stored on FeatureViewProjection and flows through both online and offline retrieval paths.

Consequences

Positive

  • Users can join the same feature view on different entity columns in a single query.
  • No need to register additional entities or manually rename columns before retrieval.
  • Works consistently across both online and offline retrieval.
  • Feature view definitions remain clean and reusable.

Negative

  • Adds complexity to the retrieval path with column renaming logic.
  • Users must remember to use with_name() to avoid collisions when joining the same feature view multiple times.

References

  • Original RFC: Feast RFC-023: Shadow Entities Mapping
  • GitHub Issue: #1762
  • Implementation: sdk/python/feast/feature_view.py (with_join_key_map method), sdk/python/feast/feature_view_projection.py (join_key_map field)