WE MUST ENSURE PYTHON CONSISTENCY BETWEEN NOTEBOOK AND FEAST SERVERS

examples/rhoai-quickstart/feast-demo-quickstart.ipynb

Installing Feast

Feast is a Python package, so we install it with pip.

python
# WE MUST ENSURE PYTHON CONSISTENCY BETWEEN NOTEBOOK AND FEAST SERVERS
# LAUNCH THIS NOTEBOOK FROM A CLEAN PYTHON ENVIRONMENT >3.9
%pip install -q feast==0.40.1
# grpcio is needed later in this example to run the Feast registry server.
%pip install -q grpcio

Creating and initializing Feast project

python
# Display the current directory so we know where the Feast files will be created and can review them using the Jupyter console or file explorer.
%pwd
python
# Create the Feast repository. If one already exists, remove it first.
!rm -rf my_feast_project
!feast init my_feast_project

The output above shows where the Feast repo was created. The location may differ depending on your environment configuration.

python
# Change the current directory to feature_repo so that we can run Feast CLI commands.
%cd my_feast_project/feature_repo
python
# Inspect the files in the Feast repo by displaying the folder structure as a tree. The purpose of each file and folder is described below.
!find . | sed -e 's/[^-][^\/]*\// |-- /g' -e 's/|-- \(.*\)/+-- \1/'

Now the feast repo has been created for you. Running the feast init command populated the directory with an example feature store structure, complete with example data.

This example defines an entity for the driver. You can think of an entity as a primary key used to fetch features. The rest of the example works with driver data, all of which comes from the data/driver_stats.parquet file, which acts as the offline store in this example.

Inspect the following files before going further.

The data directory contains the parquet file used in this example.

The example_repo.py file contains the code that creates Feast objects such as FeatureViews, FeatureServices, and OnDemandFeatureViews. Path: my_feast_project/feature_repo/example_repo.py

The feature_store.yaml file holds all Feast configuration. Path: my_feast_project/feature_repo/feature_store.yaml

The test_workflow.py file contains Python code that runs all key Feast commands, including defining, retrieving, and pushing features. Path: my_feast_project/feature_repo/test_workflow.py

python
!cat feature_store.yaml

The data/driver_stats.parquet file is generated by the feast init command and acts as the source of historical data for this example. This source is defined in the my_feast_project/feature_repo/example_repo.py file.

python
# Excerpt from example_repo.py; the absolute path depends on where feast init was run.
from feast import FileSource

driver_stats_source = FileSource(
    name="driver_hourly_stats_source",
    path="/opt/app-root/src/feast/examples/rhoai-quickstart/my_feast_project/feature_repo/data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
python
import pandas as pd
pd.read_parquet("data/driver_stats.parquet")

No Feast objects have been created yet. To create them, run feast apply in the directory that contains feature_store.yaml. Let's do that now.

python
# The .ipynb_checkpoints folder interferes with the feast apply command, so delete it if it exists.
!rm -rf .ipynb_checkpoints/
python
# This command creates the Feast objects defined in `example_repo.py`.
!feast apply

Generating the training Data

To train a model, we need features and labels. Often, this label data is stored separately (e.g. you have one table storing user survey results and another set of tables with feature values). Feast can help generate the features that map to these labels.

Feast needs a list of entities (e.g. driver ids) and timestamps. Feast will join relevant tables to create the relevant feature vectors. There are two ways to generate this list:

  • The user can query that table of labels with timestamps and pass that into Feast as an entity dataframe for training data generation.

  • The user can also query that table with a SQL query which pulls entities. See the documentation on feature retrieval for details.

Note: we include timestamps because we want the features for the same driver at various timestamps to be used in a model.
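
To make the role of the timestamps concrete, here is a minimal sketch of a point-in-time lookup in plain Python. It is only an illustration of the idea, not how Feast implements the join: the feature rows are made up, `point_in_time_lookup` is a hypothetical helper, and details such as feature view TTLs are ignored.

```python
from bisect import bisect_right
from datetime import datetime

# Made-up feature rows for a single driver, sorted by event timestamp.
feature_rows = [
    (datetime(2021, 4, 12, 8, 0), {"conv_rate": 0.10}),
    (datetime(2021, 4, 12, 10, 0), {"conv_rate": 0.25}),
    (datetime(2021, 4, 12, 16, 0), {"conv_rate": 0.40}),
]


def point_in_time_lookup(rows, as_of):
    """Return the newest feature values with event timestamp <= as_of, or None."""
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx > 0 else None


# The same driver gets different feature values depending on the label timestamp.
early = point_in_time_lookup(feature_rows, datetime(2021, 4, 12, 10, 59, 42))
late = point_in_time_lookup(feature_rows, datetime(2021, 4, 12, 16, 40, 26))
print(early, late)
```

Each label row is matched only with feature values that were already known at its timestamp, which keeps future information out of the training data.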

python
from datetime import datetime
import pandas as pd

from feast import FeatureStore

# Note: see https://docs.feast.dev/getting-started/concepts/feature-retrieval for 
# more details on how to retrieve for all entities in the offline store instead
entity_df = pd.DataFrame.from_dict(
    {
        # entity's join key -> entity values
        "driver_id": [1001, 1002, 1003],
        # "event_timestamp" (reserved key) -> timestamps
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
        ],
        # (optional) label name -> label values. Feast does not process these
        "label_driver_reported_satisfaction": [1, 5, 3],
        # values we're using for an on-demand transformation
        "val_to_add": [1, 2, 3],
        "val_to_add_2": [10, 20, 30],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())

Run offline inference (batch scoring)

To power a batch model, we primarily need to pull features with the get_historical_features call, but using the current timestamp.

python
entity_df["event_timestamp"] = pd.to_datetime("now", utc=True)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

print("\n----- Example features -----\n")
print(training_df.head())

Ingest batch features into your online store

This command loads the latest feature values from the offline store into the online store, covering the period from the last materialization up to the timestamp passed on the command line (here, the current UTC time).

python
!feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
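
As a rough mental model of what ends up in the online store, the sketch below keeps only the newest row per driver up to an end date. This is a conceptual illustration in plain Python, not Feast's materialization code: the rows are made up, and `materialize` and `online_store` are hypothetical names.

```python
from datetime import datetime

# Made-up offline rows: (driver_id, event_timestamp, conv_rate).
offline_rows = [
    (1004, datetime(2021, 4, 12, 8, 0), 0.10),
    (1004, datetime(2021, 4, 12, 10, 0), 0.25),
    (1005, datetime(2021, 4, 12, 9, 0), 0.70),
    (1005, datetime(2021, 4, 13, 9, 0), 0.90),  # after end_date, so skipped
]


def materialize(rows, end_date):
    """Keep only the newest row per driver_id with event_timestamp <= end_date."""
    latest = {}
    for driver_id, ts, conv_rate in rows:
        if ts > end_date:
            continue
        if driver_id not in latest or ts > latest[driver_id][0]:
            latest[driver_id] = (ts, conv_rate)
    # The "online store" here is just a dict keyed by the entity value.
    return {key: {"conv_rate": value} for key, (_, value) in latest.items()}


online_store = materialize(offline_rows, end_date=datetime(2021, 4, 12, 23, 59))
print(online_store)
```

The point of the sketch is that the online store holds one current value per entity key, which is what makes lookups at inference time fast.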

Fetching feature vectors for inference

At inference time, we need to quickly read the latest feature values for different drivers (which otherwise might have existed only in batch sources) from the online feature store using get_online_features(). These feature vectors can then be fed to the model.

python
from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()

pprint(feature_vector)

Using a feature service to fetch online features instead

You can also use feature services to manage multiple features, and decouple feature view definitions and the features needed by end applications. The feature store can also be used to fetch either online or historical features using the same API below.

The driver_activity_v4 feature service pulls all features from the driver_stats_fresh_fv feature view defined in example_repo.py:

python
import example_repo
from feast import FeatureService, FeatureStore

# Define a feature service that exposes every feature in the fresh driver stats feature view.
driver_activity_v4 = FeatureService(
    name="driver_activity_v4",
    features=[example_repo.driver_stats_fresh_fv],
)

feature_store = FeatureStore(".")  # Initialize the feature store from the current repo

# Register the feature service in the registry.
feature_store.apply([driver_activity_v4])

print("FeatureService driver_activity_v4 created.")
python
from pprint import pprint

# Alternatively, load the registered service from the registry with:
# driver_activity_v4 = feature_store.get_feature_service("driver_activity_v4")
feature_vector = feature_store.get_online_features(
    features=driver_activity_v4,
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()
pprint(feature_vector)

Accessing Features using remote online store

In this section we run Feast in client-server mode: we start the Feast online feature server and retrieve online features through a remote online store.

By default, the online server listens on port 6566. To keep the example simple, the client still refers to the same registry file as my_feast_project instead of starting a registry server alongside the online server. You can review the client feature store configuration in remote-online/feature_store.yaml once it has been written in the cells below.

In a production environment you can run the registry, online, and offline servers separately and access them remotely using feature store clients.

Starting feast online feature server

python
import subprocess

# Run feast serve in the background
feast_online_server_process = subprocess.Popen(["feast", "serve"])
python
%%sh
# checking if the online server process started.
ps -ef | grep 'feast serve'
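
The ps check shows that the process exists, but not that the server is already accepting connections. A minimal sketch of a readiness check using only the standard library is shown below; `wait_for_port` is a hypothetical helper, and the host and port in the usage note assume the default 127.0.0.1:6566 mentioned above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling wait_for_port("127.0.0.1", 6566) before the client cells would avoid connection errors when the server is still starting up.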

Retrieving the features using online remote client

python
import os
import yaml

directory = os.path.abspath("./../../remote-online")
os.makedirs(directory, exist_ok=True)

data = {
    'project': 'my_feast_project',
    'registry': './../my_feast_project/feature_repo/data/registry.db',
    'provider': 'local',
    'online_store': {
        'type': 'remote',
        'path': 'http://127.0.0.1:6566'
    },
    'entity_key_serialization_version': 3
}

file_path = os.path.join(directory, 'feature_store.yaml')

# Write to a YAML file
with open(file_path, 'w') as file:
    yaml.dump(data, file, default_flow_style=False)

print("remote-online feature_store.yaml file has been created.")

python
%cd ./../../remote-online
python
online_feature_store_client = FeatureStore('.')
online_feature_store_client.apply([])
print("remote online feature store client has been initialized.")

Now we retrieve the same features as in the previous section, this time through the client feature store, which reads them from the remote online server.

python
online_features_stores_client = online_feature_store_client.get_online_features(
    features=driver_activity_v4,
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()
pprint(online_features_stores_client)
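
For comparison, the running feature server can also be queried directly over HTTP. The sketch below only builds the request, assuming the server is reachable at http://127.0.0.1:6566 and exposes the POST /get-online-features endpoint described in Feast's Python feature server documentation; `build_online_features_request` is a hypothetical helper, and the feature names are the ones used earlier in this example.

```python
import json
import urllib.request


def build_online_features_request(base_url, features, entities):
    """Build (but do not send) a POST request for the feature server."""
    payload = {"features": features, "entities": entities}
    return urllib.request.Request(
        url=f"{base_url}/get-online-features",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_online_features_request(
    "http://127.0.0.1:6566",
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
    entities={"driver_id": [1004, 1005]},
)

# To actually send it while the server is running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

The remote online store client hides this exchange behind the usual get_online_features call, so application code does not need to change when switching between a local and a remote store.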

Accessing Feast Registry metadata using remote registry store

The registry holds the metadata for all Feast objects, such as FeatureServices and FeatureViews. You can access this information directly, as in the sections above.

Alternatively, you can use the client-server model: start the registry server and access the metadata through a remote registry client, as shown in this section.

The default port for the registry server is 6570.

Starting the registry server as remote

Change the current directory context to initial feature store so that we can start the registry server.

python
%cd ./../my_feast_project/feature_repo
python
import subprocess

# Run feast serve_registry in the background
feast_remote_registry_server_process = subprocess.Popen(["feast", "serve_registry"])
print("Registry server started on the default port 6570. Go to the next cell and check that the process is running.")
python
%%sh
# checking if the registry server process started.
pwd
ps -ef | grep 'feast serve_registry'

Initializing the remote registry client and retrieving the feast metadata

python
import os
import yaml

directory = os.path.abspath("./../../remote-registry")
os.makedirs(directory, exist_ok=True)

data = {
    'project': 'my_feast_project',
    'registry': {
        'registry_type': 'remote',
        'path': 'localhost:6570'
    },
    'provider': 'local',
    'online_store': {
        'type': 'remote',
        'path': 'http://127.0.0.1:6566'
    },
    'entity_key_serialization_version': 3
}

file_path = os.path.join(directory, 'feature_store.yaml')

# Write to a YAML file
with open(file_path, 'w') as file:
    yaml.dump(data, file, default_flow_style=False)

print("remote-registry feature_store.yaml file has been created.")
python
%cd ./../../remote-registry
python
registry_feature_store_client = FeatureStore('.')
registry_feature_store_client.apply([])
print("Remote registry feature store client has been initialized.")
python
# Listing all feature views using remote registry client
registry_feature_store_client.list_all_feature_views(allow_cache=False)
python
# Listing all feature services using remote registry client
registry_feature_store_client.list_feature_services()

Stopping the online and registry servers

python
%%sh
# Check whether the registry server and online server processes are running.
pwd
ps -ef | grep 'feast serve'
python
feast_online_server_process.terminate()  # Stop the remote Feast online server
feast_remote_registry_server_process.terminate() # stops the remote registry server
print("remote online and registry server has been stopped.")
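
Note that terminate() only sends the stop signal and returns immediately. If you want the cell to block until a server has really exited, a small helper along these lines may be useful. This is a sketch under assumptions: `stop_process` is a hypothetical name, and the demo uses a stand-in child process that just sleeps rather than a real Feast server.

```python
import subprocess
import sys


def stop_process(process, timeout=10.0):
    """Terminate a subprocess.Popen, wait for it to exit, and kill it on timeout."""
    if process.poll() is not None:
        return process.returncode  # already exited
    process.terminate()
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The process ignored the terminate request, so force it to stop.
        process.kill()
        return process.wait()


# Demo with a stand-in child process that just sleeps.
demo = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(demo)
print("exit code:", exit_code)
```

The same helper could be called with feast_online_server_process and feast_remote_registry_server_process in place of the demo process.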

python
%%sh
# Check that the registry server and online server processes have stopped. It may take a moment for them to exit.
pwd
ps -ef | grep 'feast serve'