docs/tutorials/azure/notebooks/part2-register-features.ipynb
Copyright (c) Microsoft Corporation. Licensed under the MIT license.
In this notebook you will connect to your feature store and register features into a central repository hosted on Azure Blob Storage. Note that the best practice is to register features through a CI/CD process, e.g. GitHub Actions or Azure DevOps, rather than by hand from a notebook.
Feast is an operational data system for managing and serving machine learning features to models in production. Feast is able to serve feature data to models from a low-latency online store (for real-time prediction) or from an offline store (for scale-out batch scoring or model training).
The cell below displays the feature_store.yaml file - a file that contains infrastructural configuration, such as where the registry file is located, and connection strings to data.
There is no need to change the details in this file. When you connect to the feature store afterwards, the credentials are resolved from the Azure ML default keyvault.
!cat feature_repo/feature_store.yaml
Below you connect to the feature store.
import os
from feast import FeatureStore
from azureml.core import Workspace
# access key vault to get secrets
ws = Workspace.from_config()
kv = ws.get_default_keyvault()
os.environ['REGISTRY_PATH']=kv.get_secret("FEAST-REGISTRY-PATH")
os.environ['SQL_CONN']=kv.get_secret("FEAST-OFFLINE-STORE-CONN")
os.environ['REDIS_CONN']=kv.get_secret("FEAST-ONLINE-STORE-CONN")
# connect to feature store
fs = FeatureStore("./feature_repo")
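The three environment variables set above are presumably the ones that feature_store.yaml refers to through ${NAME} placeholders. The sketch below illustrates that substitution with the standard library only. The YAML excerpt and its keys are hypothetical and may not match the real file, and string.Template stands in for whatever Feast does internally.

```python
import os
from string import Template

# Hypothetical excerpt; the real feature_store.yaml may use other keys.
SAMPLE_CONFIG = """\
registry:
  path: ${REGISTRY_PATH}
offline_store:
  connection_string: ${SQL_CONN}
online_store:
  connection_string: ${REDIS_CONN}
"""

# Variable names taken from the cell above.
REQUIRED_VARS = ("REGISTRY_PATH", "SQL_CONN", "REDIS_CONN")


def missing_variables(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


def render_config(template_text, env):
    """Fill ${NAME} placeholders from env; raises KeyError if one is absent."""
    return Template(template_text).substitute(env)


if __name__ == "__main__":
    # Report which variables still need to be set before connecting.
    print("Missing variables:", missing_variables(os.environ))
```

Running a check like missing_variables(os.environ) before creating the FeatureStore object can make a missing secret easier to spot than a connection error later on.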
The data source refers to raw underlying data (a table in Azure SQL DB or Synapse SQL). Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store.
from feast.infra.offline_stores.contrib.mssql_offline_store.mssqlserver_source import MsSqlServerSource
orders_table = "orders"
driver_hourly_table = "driver_hourly"
customer_profile_table = "customer_profile"
driver_source = MsSqlServerSource(
    table_ref=driver_hourly_table,
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)
customer_source = MsSqlServerSource(
    table_ref=customer_profile_table,
    event_timestamp_column="datetime",
    created_timestamp_column="",
)
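To make the roles of the two timestamp columns more concrete: the event timestamp says when a feature value was observed, and the created timestamp is, as far as we understand, used to break ties when several rows share the same entity and event timestamp. The sketch below imitates that tie-break in plain Python. The rows are invented for illustration, the driver_id column name is an assumption, and this is not Feast's implementation.

```python
from datetime import datetime


def latest_rows(rows, key="driver_id", event_col="datetime", created_col="created"):
    """Keep one row per (key, event timestamp): the one that was created last."""
    best = {}
    for row in rows:
        ident = (row[key], row[event_col])
        if ident not in best or row[created_col] > best[ident][created_col]:
            best[ident] = row
    return sorted(best.values(), key=lambda r: (r[key], r[event_col]))


# Invented sample data: driver 1 has two rows for the same event time.
rows = [
    {"driver_id": 1, "datetime": datetime(2022, 1, 1, 10),
     "created": datetime(2022, 1, 1, 11), "conv_rate": 0.50},
    {"driver_id": 1, "datetime": datetime(2022, 1, 1, 10),
     "created": datetime(2022, 1, 1, 12), "conv_rate": 0.55},
    {"driver_id": 2, "datetime": datetime(2022, 1, 1, 10),
     "created": datetime(2022, 1, 1, 11), "conv_rate": 0.70},
]

deduplicated = latest_rows(rows)
for row in deduplicated:
    print(row["driver_id"], row["datetime"], row["conv_rate"])
```

For driver 1 only the row created at 12:00 survives, which is the behaviour you would want when a corrected value is written after the original one.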
A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline (training) and online (serving) environment.
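Feature views are used when generating training datasets from the offline store, when loading feature values into the online store, and when retrieving features from the online store.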
NOTE: Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form.
from feast import Feature, FeatureView, ValueType
from datetime import timedelta
driver_fv = FeatureView(
    name="driver_stats",
    entities=["driver"],
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT32),
    ],
    batch_source=driver_source,
    ttl=timedelta(hours=2),
)
customer_fv = FeatureView(
    name="customer_profile",
    entities=["customer_id"],
    features=[
        Feature(name="current_balance", dtype=ValueType.FLOAT),
        Feature(name="avg_passenger_count", dtype=ValueType.FLOAT),
        Feature(name="lifetime_trip_count", dtype=ValueType.INT32),
    ],
    batch_source=customer_source,
    ttl=timedelta(days=2),
)
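The ttl values above limit how far back a lookup may reach when feature values are joined to a point in time. A minimal sketch of that idea follows, assuming the window is inclusive on both ends; the exact boundary handling in Feast is not confirmed here. The history values are invented, and point_in_time_value is a hypothetical helper, not a Feast function.

```python
from datetime import datetime, timedelta


def point_in_time_value(feature_rows, as_of, ttl):
    """Return the newest value with as_of - ttl <= event time <= as_of, else None.

    feature_rows is an iterable of (event_time, value) pairs for one entity.
    """
    candidates = [
        (event_time, value)
        for event_time, value in feature_rows
        if as_of - ttl <= event_time <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


# Invented conv_rate history for a single driver.
history = [
    (datetime(2022, 1, 1, 8), 0.40),
    (datetime(2022, 1, 1, 11), 0.55),
    (datetime(2022, 1, 1, 13), 0.60),
]

at_noon = point_in_time_value(history, datetime(2022, 1, 1, 12), timedelta(hours=2))
print(at_noon)
```

Values observed after the requested time are never used, which is what keeps future information from leaking into a training dataset. A short ttl such as two hours also means that a stale value is treated as missing rather than reused.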
An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers.
Entities are defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities (more than one entity object) in a feature view. Entities should be reused across feature views.
A related concept is an entity key. These are one or more entity values that uniquely describe a feature view record. In the case of an entity (like a driver) that only has a single entity field, the entity is an entity key. However, it is also possible for an entity key to consist of multiple entity values. For example, a feature view with the composite entity of (customer, country) might have an entity key of (1001, 5).
Entity keys act as primary keys. They are used during the lookup of features from the online store, and they are also used to match feature rows across feature views during point-in-time joins.
from feast import Entity
driver = Entity(name="driver", join_key="driver_id", value_type=ValueType.INT64)
customer = Entity(name="customer_id", value_type=ValueType.INT64)
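The role of an entity key as a primary key can be sketched with a plain dictionary that stands in for the online store. Everything here is illustrative: the store layout, the customer_by_country view name, and the entity_key and lookup helpers are hypothetical and do not reflect how Feast or Redis store data.

```python
def entity_key(entity_row, join_keys):
    """Build a hashable key from the join-key values, in the order given."""
    return tuple(entity_row[name] for name in join_keys)


def lookup(store, view_name, join_keys, entity_row):
    """Return the feature values stored for this entity key, or None."""
    return store.get(view_name, {}).get(entity_key(entity_row, join_keys))


# Toy online store: {feature view name: {entity key: feature values}}.
online_store = {
    "driver_stats": {(1001,): {"conv_rate": 0.55}},
    "customer_by_country": {(1001, 5): {"lifetime_trip_count": 12}},
}

single = lookup(online_store, "driver_stats", ("driver_id",), {"driver_id": 1001})
composite = lookup(
    online_store,
    "customer_by_country",
    ("customer", "country"),
    {"customer": 1001, "country": 5},
)
print(single, composite)
```

The composite case mirrors the (customer, country) example in the text: both values are needed to identify a record, so both go into the key.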
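Calling apply() on the feature store registers the entities and feature views passed to it in the registry and updates any related infrastructure: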
fs.apply([driver, driver_fv, customer, customer_fv])
If you look in your Feast registry storage account, you will see a registry.db file that contains the metadata for your registered features. Below you can list the registered feature views:
import pandas as pd
from google.protobuf.json_format import MessageToDict
for x in fs.list_feature_views():
    # Convert the protobuf representation into a plain dictionary
    d = MessageToDict(x.to_proto())
    print("Feature view name:", d['spec']['name'])
    print("Entities:", d['spec']['entities'])
    print("Features:", d['spec']['features'])
    print("Batch source type:", d['spec']['batchSource']['dataSourceClassType'])
    print("\n")
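If the printed dictionaries are too verbose, a small helper can reduce each one to the fields of interest. The sketch below works on plain dictionaries shaped like the output used in the loop above (a spec with name, entities, and features). The valueType key and the sample values are assumptions made for illustration, and summarize_feature_view is a hypothetical helper.

```python
def summarize_feature_view(view_dict):
    """Reduce a feature view dictionary to its name, entities, and feature names."""
    spec = view_dict.get("spec", {})
    return {
        "name": spec.get("name"),
        "entities": list(spec.get("entities", [])),
        "features": [feature.get("name") for feature in spec.get("features", [])],
    }


# Hand-written sample mimicking the dictionary layout printed above.
sample_view = {
    "spec": {
        "name": "driver_stats",
        "entities": ["driver"],
        "features": [
            {"name": "conv_rate", "valueType": "FLOAT"},
            {"name": "acc_rate", "valueType": "FLOAT"},
            {"name": "avg_daily_trips", "valueType": "INT32"},
        ],
    }
}

summary = summarize_feature_view(sample_view)
print(summary)
```

In the notebook you could pass MessageToDict(x.to_proto()) for each feature view to the helper and collect the results in a list, for example to load them into a pandas DataFrame.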
In the next part of this tutorial you will: