docs/tutorials/azure/notebooks/part1-load-data.ipynb
The objective of this tutorial is to build a model that predicts if a driver will complete a trip based on a number of features ingested into Feast.
This notebook creates you will create and load data into the following 3 feature tables.
Customer Profile: This contains features related to a customer entity such as current balance, lifetime trip count, average number of passengers per trip. A snippet of data:
| datetime | customer_id | current_balance | avg_passenger_count | lifetime_trip_count | created |
|---|---|---|---|---|---|
| 2021-03-17 19:31 | 1010 | 0.889188 | 0.049057 | 412 | 2021-03-24 19:38 |
| 2021-03-18 19:31 | 1010 | 0.979273 | 0.212630 | 639 | 2021-03-24 19:38 |
| 2021-03-19 19:31 | 1010 | 0.976549 | 0.176881 | 70 | 2021-03-24 19:38 |
| 2021-03-20 19:31 | 1010 | 0.273697 | 0.325012 | 68 | 2021-03-24 19:38 |
Driver table: This contains features related to a driver entity such as conversion rate, average number of daily trips. A snippet of data:
| datetime | driver_id | conv_rate | acc_rate | avg_daily_trips | created |
|---|---|---|---|---|---|
| 2021-03-17 19:31 | 5010 | 0.229297 | 0.685843 | 861 | 2021-03-24 19:34 |
| 2021-03-17 20:31 | 5010 | 0.781655 | 0.861280 | 769 | 2021-03-24 19:34 |
| 2021-03-17 21:31 | 5010 | 0.150333 | 0.525581 | 778 | 2021-03-24 19:34 |
| 2021-03-17 22:31 | 5010 | 0.951701 | 0.228883 | 570 | 2021-03-24 19:34 |
Orders table: This is a typical fact table that contains the order information such driver/customer id and whether the trip was completed. A snippet of data:
| order_id | driver_id | customer_id | order_is_success | event_timestamp |
|---|---|---|---|---|
| 100 | 5004 | 1007 | 0 | 2021-03-10 19:31:15 |
| 101 | 5003 | 1006 | 0 | 2021-03-11 22:02:50 |
| 102 | 5010 | 1005 | 0 | 2021-03-13 00:34:24 |
| 103 | 5010 | 1001 | 1 | 2021-03-14 03:05:59 |
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.sql import text
from azureml.core import Workspace
ws = Workspace.from_config()
kv = ws.get_default_keyvault()
engine = create_engine(kv.get_secret("FEAST-OFFLINE-STORE-CONN"))
The cell below will create the customer profile table and load the data into the Synapse table. Loading is achieved using the ``COPY INTO` bulk load available in Synapse (the CSV data is available on public blob):
COPY INTO dbo.customer_profile
FROM 'https://feastonazuredatasamples.blob.core.windows.net/feastdatasamples/customer_profile.csv'
WITH
(
FILE_TYPE = 'CSV'
,FIRSTROW = 2
,MAXERRORS = 0
)
with engine.connect() as con:
# create and load customer profile table
file = open("../sql/create_cx_profile_table.sql")
query = text(file.read())
print("creating customer profile table...", end="")
con.execute(query)
print("done")
file = open("../sql/load_cx_profile_data.sql")
query = text(file.read())
print("loading customer profile data...", end="")
con.execute(query)
print("done")
pd.read_sql_query("select top 10 * from dbo.customer_profile", engine)
The cell below will create the drivers table and load the data into the Synapse table. Loading is achieved using the ``COPY INTO` bulk load available in Synapse (the CSV data is available on public blob):
COPY INTO dbo.driver_hourly
FROM 'https://feastonazuredatasamples.blob.core.windows.net/feastdatasamples/driver_hourly.csv'
WITH
(
FILE_TYPE = 'CSV'
,FIRSTROW = 2
,MAXERRORS = 0
)
with engine.connect() as con:
file = open("../sql/create_drivers_table.sql")
query = text(file.read())
print("creating drivers table...", end="")
con.execute(query)
print("done")
file = open("../sql/load_drivers_data.sql")
query = text(file.read())
print("loading drivers data...", end="")
con.execute(query)
print("done")
pd.read_sql_query("select top 10 * from dbo.driver_hourly", engine)
The cell below will create the orders table and load the data into the Synapse table. Loading is achieved using the ``COPY INTO` bulk load available in Synapse (the CSV data is available on public blob):
COPY INTO dbo.orders
FROM 'https://feastonazuredatasamples.blob.core.windows.net/feastdatasamples/orders.csv'
WITH
(
FILE_TYPE = 'CSV'
,FIRSTROW = 2
,MAXERRORS = 0
)
with engine.connect() as con:
file = open("../sql/create_orders_table.sql")
query = text(file.read())
print("creating orders table...", end="")
con.execute(query)
print("done")
file = open("../sql/load_orders_data.sql")
query = text(file.read())
print("loading orders data...", end="")
con.execute(query)
print("done")
pd.read_sql_query("select top 10 * from dbo.orders", engine)
With the feature values loaded into the feature store, you will need to register the features in the feast central registry. Follow the Register Features part of the tutorial.