metadata-models/docs/entities/mlFeatureTable.md
The ML Feature Table entity represents a collection of related machine learning features organized together in a feature store. Feature tables are fundamental building blocks in the ML feature management ecosystem, grouping features that share common characteristics such as the same primary keys, update cadence, or data source. They bridge the gap between raw data in data warehouses and the features consumed by ML models during training and inference.
ML Feature Tables are identified by two pieces of information:
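- The platform that hosts the feature table, such as feast, tecton, sagemaker, etc. See dataplatform for more details.
- The name of the feature table within that platform.

An example of an ML Feature Table identifier is urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,users_feature_table).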
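As a minimal sketch of how the identifier is composed, the snippet below builds and parses a feature table URN with plain string handling. The helper names `make_feature_table_urn` and `parse_feature_table_urn` are hypothetical and are not part of the DataHub SDK; only the URN layout is taken from the example above.

```python
from typing import Tuple

PLATFORM_PREFIX = "urn:li:dataPlatform:"
TABLE_PREFIX = "urn:li:mlFeatureTable:"


def make_feature_table_urn(platform: str, name: str) -> str:
    """Build a feature table URN (hypothetical helper, not a DataHub API)."""
    return f"{TABLE_PREFIX}({PLATFORM_PREFIX}{platform},{name})"


def parse_feature_table_urn(urn: str) -> Tuple[str, str]:
    """Split a feature table URN back into (platform, name)."""
    if not (urn.startswith(TABLE_PREFIX + "(") and urn.endswith(")")):
        raise ValueError(f"not an mlFeatureTable URN: {urn}")
    # Strip the entity prefix and the surrounding parentheses.
    inner = urn[len(TABLE_PREFIX) + 1 : -1]
    # The platform URN contains no comma, so the first comma separates the two parts.
    platform_urn, _, name = inner.partition(",")
    if not platform_urn.startswith(PLATFORM_PREFIX) or not name:
        raise ValueError(f"malformed mlFeatureTable URN: {urn}")
    return platform_urn[len(PLATFORM_PREFIX):], name


urn = make_feature_table_urn("feast", "users_feature_table")
print(urn)
print(parse_feature_table_urn(urn))
```

In real ingestion code you would normally rely on the URN helpers shipped with the SDK rather than hand-rolled string formatting.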
The identity is defined by the mlFeatureTableKey aspect, which contains:
- platform: A URN reference to the data platform hosting the feature table
- name: The unique name of the feature table within that platform

ML Feature Tables support comprehensive metadata through the mlFeatureTableProperties aspect. This aspect captures the essential characteristics of the feature table:
Feature tables can have detailed descriptions explaining their purpose, the type of features they contain, and when they should be used. This documentation helps data scientists and ML engineers discover and understand feature tables in their organization.
<details>
<summary>Python SDK: Create an ML Feature Table with properties</summary>

{{ inline /metadata-ingestion/examples/library/mlfeature_table_create_with_properties.py show_path_as_comment }}

</details>
The most important property of a feature table is the collection of features it contains. Feature tables maintain explicit relationships to their constituent features through the mlFeatures property. This creates a "Contains" relationship between the feature table and each individual feature, enabling:
{{ inline /metadata-ingestion/examples/library/mlfeature_table_add_features.py show_path_as_comment }}
Feature tables define one or more primary keys that uniquely identify each row in the table. These primary keys are critical for:
When multiple primary keys are specified, they act as a composite key. The mlPrimaryKeys property creates a "KeyedBy" relationship to each primary key entity.
{{ inline /metadata-ingestion/examples/library/mlfeature_table_add_primary_keys.py show_path_as_comment }}
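To illustrate what "composite key" means in practice, here is a small sketch that checks whether a set of key columns uniquely identifies each row. The rows, column names, and the `find_duplicate_keys` helper are made up for illustration and do not come from DataHub or any feature store.

```python
from typing import Dict, Iterable, List, Tuple


def find_duplicate_keys(
    rows: Iterable[Dict[str, object]], key_columns: List[str]
) -> List[Tuple[object, ...]]:
    """Return composite key values that appear on more than one row."""
    seen = set()
    duplicates: List[Tuple[object, ...]] = []
    for row in rows:
        # The composite key is the tuple of all key column values, in order.
        key = tuple(row[column] for column in key_columns)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


# Hypothetical feature rows keyed by (user_id, region).
rows = [
    {"user_id": 1, "region": "eu", "purchases_7d": 3},
    {"user_id": 1, "region": "us", "purchases_7d": 5},
    {"user_id": 2, "region": "eu", "purchases_7d": 0},
]

print(find_duplicate_keys(rows, ["user_id", "region"]))  # []
print(find_duplicate_keys(rows, ["user_id"]))  # [(1,)]
```

Here `user_id` alone is not unique, but the combination of `user_id` and `region` is, which is the situation where a feature table would declare both as primary keys.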
Feature tables support custom properties through the customProperties field, allowing you to capture platform-specific or organization-specific metadata that doesn't fit into the standard schema. This might include information like:
While primary keys are referenced from feature tables, they are separate entities with their own properties defined in the mlPrimaryKeyProperties aspect. Understanding primary key metadata is essential for proper feature table usage:
Primary keys have a data type (defined using MLFeatureDataType) that specifies the type of values:
- ORDINAL: Integer values
- NOMINAL: Categorical values
- BINARY: Boolean values
- COUNT: Count values
- TIME: Timestamp values
- TEXT: String values
- CONTINUOUS, INTERVAL

Primary keys can declare their source datasets through the sources property. This creates lineage relationships showing which upstream datasets the primary key values are derived from. This is crucial for understanding data provenance and impact analysis.
Primary keys support versioning through the version property, allowing teams to track changes to key definitions over time and maintain multiple versions in parallel.
Like other DataHub entities, ML Feature Tables support tags and glossary terms for classification and discovery:
- Tags (globalTags aspect) provide lightweight categorization
- Glossary terms (glossaryTerms aspect) link to business definitions and concepts

Read this blog to understand when to use tags vs terms.
Ownership is associated with feature tables using the ownership aspect. Owners can be individuals or teams responsible for maintaining the feature table. Clear ownership is essential for:
Feature tables can be organized into domains (via the domains aspect) to represent organizational structure or functional areas. This helps teams manage large feature catalogs by grouping related feature tables together.
Here's a comprehensive example that creates a feature table with all core aspects:
<details>
<summary>Python SDK: Create a complete ML Feature Table</summary>

{{ inline /metadata-ingestion/examples/library/mlfeature_table_create_complete.py show_path_as_comment }}

</details>
You can retrieve ML Feature Table metadata using both the Python SDK and REST API:
<details>
<summary>Python SDK: Read an ML Feature Table</summary>

{{ inline /metadata-ingestion/examples/library/mlfeature_table_read.py show_path_as_comment }}

</details>
```bash
# Get the complete entity with all aspects
curl 'http://localhost:8080/entities/urn%3Ali%3AmlFeatureTable%3A(urn%3Ali%3AdataPlatform%3Afeast,users_feature_table)'

# Get relationships to see features and primary keys
curl 'http://localhost:8080/relationships?direction=OUTGOING&urn=urn%3Ali%3AmlFeatureTable%3A(urn%3Ali%3AdataPlatform%3Afeast,users_feature_table)&types=Contains,KeyedBy'
```
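The URNs in the requests above are percent-encoded. The following sketch shows one way to produce the same URLs from a raw URN using only the Python standard library. The helper names are hypothetical, the host and port are copied from the curl examples, and the code only builds the URLs without sending any request.

```python
from typing import Iterable
from urllib.parse import quote

GMS_URL = "http://localhost:8080"  # assumption: same host and port as the curl examples


def encode_urn(urn: str) -> str:
    """Percent-encode a URN, leaving parentheses and commas as in the curl examples."""
    return quote(urn, safe="(),")


def entity_url(urn: str) -> str:
    """URL for fetching the complete entity."""
    return f"{GMS_URL}/entities/{encode_urn(urn)}"


def relationships_url(urn: str, types: Iterable[str], direction: str = "OUTGOING") -> str:
    """URL for listing relationships of the given types."""
    return (
        f"{GMS_URL}/relationships?direction={direction}"
        f"&urn={encode_urn(urn)}&types={','.join(types)}"
    )


table_urn = "urn:li:mlFeatureTable:(urn:li:dataPlatform:feast,users_feature_table)"
print(entity_url(table_urn))
print(relationships_url(table_urn, ["Contains", "KeyedBy"]))
```

The resulting strings can be passed to curl or to any HTTP client of your choice.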
ML Feature Tables integrate with multiple other entities in DataHub's metadata model:
Feature tables contain ML Features through the "Contains" relationship. Each feature in the mlFeatures array represents an individual feature that can be:
Navigation works in both directions: from a feature table to its features, and from features back to their parent tables.
Feature tables reference ML Primary Keys through the "KeyedBy" relationship. Primary keys:
- Declare their source datasets through the sources property

While not directly referenced in feature table metadata, ML Models consume features through the mlFeatures property in MLModelProperties. This creates a "Consumes" lineage relationship showing which models use features from a particular feature table. This lineage enables:
Feature tables have indirect relationships to datasets through two paths, the features and the primary keys they contain. Source datasets declared through the sources property create "DerivedFrom" lineage.

This lineage connects the feature store to upstream data warehouses, enabling end-to-end data lineage from raw data to model predictions.
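The relationships described in this section can be pictured as a small graph. The sketch below is a toy in-memory model, not DataHub code: the node names are shortened placeholders rather than real URNs, and only the relationship names ("Contains", "KeyedBy", "DerivedFrom", "Consumes") are taken from the text. It shows how a feature table's upstream datasets and downstream models could be found by following those edges.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relationship, target)

# Hypothetical metadata graph with placeholder identifiers.
EDGES: List[Edge] = [
    ("table:users", "Contains", "feature:age"),
    ("table:users", "Contains", "feature:purchases_7d"),
    ("table:users", "KeyedBy", "key:user_id"),
    ("feature:age", "DerivedFrom", "dataset:profiles"),
    ("feature:purchases_7d", "DerivedFrom", "dataset:orders"),
    ("key:user_id", "DerivedFrom", "dataset:profiles"),
    ("model:churn", "Consumes", "feature:purchases_7d"),
]


def build_index(edges: List[Edge]) -> Dict[Tuple[str, str], List[str]]:
    """Index targets by (source, relationship) for quick outgoing lookups."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for source, relationship, target in edges:
        index[(source, relationship)].append(target)
    return index


def upstream_datasets(table: str, edges: List[Edge]) -> Set[str]:
    """Datasets reachable from a table via its features and primary keys."""
    index = build_index(edges)
    members = index[(table, "Contains")] + index[(table, "KeyedBy")]
    datasets: Set[str] = set()
    for member in members:
        datasets.update(index[(member, "DerivedFrom")])
    return datasets


def consuming_models(table: str, edges: List[Edge]) -> Set[str]:
    """Models that consume at least one feature contained in the table."""
    features = {t for s, r, t in edges if s == table and r == "Contains"}
    return {s for s, r, t in edges if r == "Consumes" and t in features}


print(sorted(upstream_datasets("table:users", EDGES)))
print(sorted(consuming_models("table:users", EDGES)))
```

The feature table itself carries no dataset edge in this model. Its lineage is derived entirely from the entities it contains.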
Feature tables are associated with a specific data platform (e.g., Feast, Tecton) through the platform property in the key aspect. This creates a "SourcePlatform" relationship that:
Different feature store platforms have different capabilities and concepts:
When ingesting from these platforms, ensure the naming conventions match the platform's terminology for consistency.
Unlike datasets which have both datasetProperties and editableDatasetProperties, feature tables have:
- mlFeatureTableProperties: The main properties aspect (usually from ingestion)
- editableMlFeatureTableProperties: UI-editable description only

For custom metadata, use the customProperties map in mlFeatureTableProperties rather than creating custom aspects.
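As a rough sketch of preparing values for the customProperties map, the helper below flattens arbitrary metadata into string keys and string values. It assumes the map holds strings only, and the helper name and sample keys are hypothetical. Verify the expected value types against the aspect schema before relying on this.

```python
import json
from typing import Any, Dict


def to_custom_properties(metadata: Dict[str, Any]) -> Dict[str, str]:
    """Convert arbitrary metadata into a string-to-string map (hypothetical helper)."""
    props: Dict[str, str] = {}
    for key, value in metadata.items():
        if value is None:
            continue  # skip unset values instead of storing the string "None"
        # Strings pass through unchanged; everything else is JSON-encoded.
        props[str(key)] = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
    return props


# Hypothetical platform-specific metadata.
props = to_custom_properties(
    {
        "refresh_schedule": "hourly",
        "ttl_days": 30,
        "online_serving": True,
        "regions": ["eu", "us"],
        "deprecated_since": None,
    }
)
print(props)
```

JSON encoding keeps numbers, booleans, and lists recoverable when the properties are read back.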
When using the SDK to create feature tables:
This is different from some other DataHub entities where child entities can be created inline.
Feature table lineage is typically established through the features and primary keys it contains:
- upstreamLineage aspects
- sources properties

This design reflects that features are the atomic unit of lineage in ML systems, while feature tables are organizational constructs.