examples/credit-risk-end-to-end/02_Deploying_the_Feature_Store.ipynb
Feast enables AI/ML teams to serve (and consume) features via feature stores. In this notebook, we will configure the feature stores and feature definitions, and deploy a Feast feature store server. We will also materialize (move) data from the offline store to the online store.
In Feast, offline stores support pulling large amounts of data for model training, using tools like Redshift, Snowflake, BigQuery, and Spark. In contrast, Feast online stores focus on feature serving in support of model inference, using tools like Redis, Snowflake, PostgreSQL, and SQLite.
In this notebook, we will set up a file-based (Dask) offline store and a SQLite online store. The online store will be made available through the Feast server.
This notebook assumes that you have prepared the data by running the notebook 01_Credit_Risk_Data_Prep.ipynb.
The following code assumes that you have read the example README.md file, and that you have set up an environment where the code can be run. Please make sure you have addressed the prerequisite needs.
# Imports
import re
import sys
import time
import signal
import sqlite3
import subprocess
import datetime as dt
from feast import FeatureStore
For model training, we usually don't need (or want) a constantly running feature server. All we need is the ability to efficiently query and pull all of the training data at training time. In contrast, during model serving we need servers that are always ready to supply feature records in response to application requests.
This training-serving dichotomy is reflected in Feast using "offline" and "online" stores. Offline stores are configured to work with database technologies typically used for training, while online stores are configured to use storage and streaming technologies that are popular for feature serving.
We need to create a feature_store.yaml config file to tell Feast the structure we want in our offline and online feature stores. Below, we write the configuration for a local "Dask" offline store and a local SQLite online store. We give the feature store a project name of loan_applications and a provider of local. The registry is where the feature store keeps track of feature definitions and online store updates; we choose a file location in this case.
See the feature_store.yaml documentation for further details.
%%writefile Feature_Store/feature_store.yaml
project: loan_applications
registry: data/registry.db
provider: local
offline_store:
  type: dask
online_store:
  type: sqlite
  path: data/online_store.db
entity_key_serialization_version: 3
We also need to create feature definitions and other feature constructs in a Python file, which we name feature_definitions.py. For our purposes, we define the following: two data sources (one per Parquet file), an entity that supplies the join key, two feature views that group the features, and a feature service that bundles the feature views.
For more information on these, see the Concepts section of the Feast documentation.
%%writefile Feature_Store/feature_definitions.py
# Imports
import os
from pathlib import Path
from feast import (
    FileSource,
    Entity,
    FeatureView,
    Field,
    FeatureService
)
from feast.types import Float32, String
from feast.data_format import ParquetFormat

CURRENT_DIR = os.path.abspath(os.curdir)

# Data Sources
# A data source tells Feast where the data lives
data_a = FileSource(
    file_format=ParquetFormat(),
    path=Path(CURRENT_DIR, "data/data_a.parquet").as_uri()
)
data_b = FileSource(
    file_format=ParquetFormat(),
    path=Path(CURRENT_DIR, "data/data_b.parquet").as_uri()
)

# Entity
# An entity tells Feast the column it can use to join tables
loan_id = Entity(
    name="loan_id",
    join_keys=["ID"]
)

# Feature views
# A feature view is how Feast groups features
features_a = FeatureView(
    name="data_a",
    entities=[loan_id],
    schema=[
        Field(name="checking_status", dtype=String),
        Field(name="duration", dtype=Float32),
        Field(name="credit_history", dtype=String),
        Field(name="purpose", dtype=String),
        Field(name="credit_amount", dtype=Float32),
        Field(name="savings_status", dtype=String),
        Field(name="employment", dtype=String),
        Field(name="installment_commitment", dtype=Float32),
        Field(name="personal_status", dtype=String),
        Field(name="other_parties", dtype=String),
    ],
    source=data_a
)
features_b = FeatureView(
    name="data_b",
    entities=[loan_id],
    schema=[
        Field(name="residence_since", dtype=Float32),
        Field(name="property_magnitude", dtype=String),
        Field(name="age", dtype=Float32),
        Field(name="other_payment_plans", dtype=String),
        Field(name="housing", dtype=String),
        Field(name="existing_credits", dtype=Float32),
        Field(name="job", dtype=String),
        Field(name="num_dependents", dtype=Float32),
        Field(name="own_telephone", dtype=String),
        Field(name="foreign_worker", dtype=String),
    ],
    source=data_b
)

# Feature Service
# A feature service in Feast represents a logical group of features
loan_fs = FeatureService(
    name="loan_fs",
    features=[features_a, features_b]
)
Now that we have our feature store configuration (feature_store.yaml) and feature definitions (feature_definitions.py), we are ready to "apply" them. The feast apply command creates a registry file (Feature_Store/data/registry.db) and sets up data connections; in this case, it creates a SQLite database (Feature_Store/data/online_store.db).
# Run 'feast apply' in the Feature_Store directory
!feast --chdir ./Feature_Store apply
# List the Feature_Store/data/ directory to see newly created files
!ls -nlh Feature_Store/data/
Note that while feast apply set up the SQLite online database, online_store.db, no data has been added to it yet. We can verify this by connecting with the sqlite3 library.
# Connect to sqlite database
conn = sqlite3.connect("Feature_Store/data/online_store.db")
cursor = conn.cursor()

# Query table data (3 tables)
print(
    "Online Store Tables: ",
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
)
print(
    "loan_applications_data_a data: ",
    cursor.execute("SELECT * FROM loan_applications_data_a").fetchall()
)
print(
    "loan_applications_data_b data: ",
    cursor.execute("SELECT * FROM loan_applications_data_b").fetchall()
)
conn.close()
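When only a quick sanity check is needed, a small helper that counts rows per table avoids pulling every record. The `table_counts` function below is a hypothetical convenience, not part of Feast or this example; here it is demonstrated against an in-memory database:

```python
import sqlite3

def table_counts(conn):
    """Return {table_name: row_count} for every table in a SQLite database."""
    cursor = conn.cursor()
    tables = [row[0] for row in cursor.execute(
        "SELECT name FROM sqlite_master WHERE type='table'").fetchall()]
    return {t: cursor.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in tables}

# Demonstration against an in-memory database
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE loans (id INTEGER)")
demo.execute("INSERT INTO loans VALUES (1), (2)")
print(table_counts(demo))  # {'loans': 2}
demo.close()
```

Pointing the same helper at Feature_Store/data/online_store.db at this stage would report zero rows in each feature table.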
Since we have used feast apply to create the registry, we can now use the Feast Python SDK to interact with our new feature store. For other available commands, see the Feast Python SDK documentation.
# Get feature store config
store = FeatureStore(repo_path="./Feature_Store")
store.config
# List feature views
feature_views = store.list_batch_feature_views()
for fv in feature_views:
    print(f"Feature view: {fv.name} | Features: {fv.features}")
If you wish to share a feature store with your team, Feast provides feature servers. To spin up an offline feature server process, we can use the feast serve_offline command, while to spin up a Feast online feature server, we use the feast serve command.
Let's spin up an offline and an online server that we can use in the subsequent notebooks to get features during model training and model serving. We will run both servers as background processes that we can communicate with from the other notebooks.
First, we write a helper function to capture the first few printed log lines (so we can print them in the notebook cell output).
# TimeoutError class
class TimeoutError(Exception):
    pass

# TimeoutError raise function (signal handlers receive signum and frame)
def timeout(signum, frame):
    raise TimeoutError("timeout")

# Get first few log lines function
def print_first_proc_lines(proc, wait):
    '''Given a process, `proc`, read and print output lines until they stop
    coming (waiting up to `wait` seconds for new lines to appear)'''
    lines = ""
    while True:
        signal.signal(signal.SIGALRM, timeout)
        signal.alarm(wait)
        try:
            lines += proc.stderr.readline()
        except TimeoutError:
            break
    signal.alarm(0)  # cancel any pending alarm
    if lines:
        print(lines, file=sys.stderr)
Launch the offline server with the command feast --chdir ./Feature_Store serve_offline.
# Feast offline server process
offline_server_proc = subprocess.Popen(
    "feast --chdir ./Feature_Store serve_offline 2>&2 & echo $! > server_proc.txt",
    shell=True,
    text=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    bufsize=0
)
print_first_proc_lines(offline_server_proc, 2)
The tail end of the command above, 2>&2 & echo $! > server_proc.txt, captures log messages (in the offline case there are none), and writes the process PID to the file server_proc.txt (we will use this in the cleanup notebook, 05_Credit_Risk_Cleanup.ipynb).
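As a sketch of how that PID file can be used later (the helper names below are hypothetical, not part of the notebook), we can read the recorded PIDs back and probe them with signal 0, which checks for existence without actually sending a signal:

```python
import os

def pids_from_file(path="server_proc.txt"):
    """Read the PIDs recorded by the shell's 'echo $!' redirections."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]

def pid_is_running(pid):
    """Return True if a process with this PID exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence probe only, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True

# The current (notebook) process is certainly running
print(pid_is_running(os.getpid()))  # True
```

The cleanup notebook can loop over `pids_from_file()` and terminate any PID that is still running.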
Next, launch the online server with the command feast --chdir ./Feature_Store serve.
# Feast online server (master and worker) processes
online_server_proc = subprocess.Popen(
    "feast --chdir ./Feature_Store serve 2>&2 & echo $! >> server_proc.txt",
    shell=True,
    text=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    bufsize=0
)
print_first_proc_lines(online_server_proc, 3)
Note that the output helpfully lets us know that the online server is "Listening at: http://127.0.0.1:6566" (the default host:port).
List the running processes to verify they are up.
# List running Feast processes (paths redacted)
running_procs = !ps -ef | grep feast | grep serve
for line in running_procs:
    redacted = re.sub(r'/*[^\s]*(?P<cmd>(python )|(feast ))', r'**/\g<cmd>', line)
    print(redacted)
Note that there are two processes for the online server (master and worker).
At this point, there is no data in the online store yet. Let's use the SDK feature store object (that we created above) to "materialize" data; this is Feast lingo for moving/updating data from the offline store to the online store.
# Materialize
# Recall that we mocked the outcome data to have timestamps from
# 'Tue Sep 24 12:00:00 2023'out to "Wed Oct 9 12:00:00 2023"
# The loan outcome timestamps were then lagged by 30-90 days (which is Jan 7 12:00:00 2024)
res = store.materialize(
start_date=dt.datetime(2023,9,24,12,0,0),
end_date=dt.datetime(2024,1,7,12,0,0)
)
Now, we can query the SQLite database again and see data in the response!
# Query the online store database to verify materialized data
conn = sqlite3.connect("Feature_Store/data/online_store.db")
cursor = conn.cursor()
print(
    "loan_applications_data_a data: ",
    cursor.execute("SELECT * FROM loan_applications_data_a LIMIT 2").fetchall()
)
print(
    "loan_applications_data_b data: ",
    cursor.execute("SELECT * FROM loan_applications_data_b LIMIT 2").fetchall()
)
conn.close()
Note that the data is stored as binary strings, which is part of Feast's optimization for online queries. To get human-readable data, use the get-online-features REST API endpoint, which returns a JSON response.
# curl command to online server to get data from the online store
cmd = """http://localhost:6566/get-online-features \
-d '{
"feature_service": "loan_fs",
"entities": {"ID": [18, 764]}
}'
"""
response = !curl -X POST {cmd}
response
The curl command gave us a quick validation. In the 04_Credit_Risk_Model_Serving.ipynb notebook, we'll use the Python requests library to issue the query and handle the response more cleanly.
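To preview that parsing step: the online response is column-oriented, with metadata.feature_names listing the columns and each entry of results holding one column's values across the requested entities. The helper below and its mocked payload are illustrative sketches, not part of the notebook:

```python
def online_response_to_rows(payload):
    """Pivot a column-oriented get-online-features payload into per-entity dicts."""
    names = payload["metadata"]["feature_names"]
    columns = [result["values"] for result in payload["results"]]
    return [dict(zip(names, row)) for row in zip(*columns)]

# Mocked two-entity response with two features
sample = {
    "metadata": {"feature_names": ["ID", "age"]},
    "results": [
        {"values": [18, 764]},
        {"values": [36.0, 52.0]},
    ],
}
print(online_response_to_rows(sample))
# [{'ID': 18, 'age': 36.0}, {'ID': 764, 'age': 52.0}]
```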
Now that the feature stores and their respective servers have been configured and deployed, we can proceed to train an AI model in 03_Credit_Risk_Model_Training.ipynb.