examples/00_quick_start/sasrec_amazon.ipynb
This notebook covers a class of sequential recommendation models that use a Transformer [2] to encode a user's preferences, represented as the sequence of items the user purchased or viewed. Instead of using a CNN (Caser [3]) or an RNN (GRU [4], SLI-Rec [5], etc.), the approach relies on a Transformer-based encoder that generates a new representation of the item sequence. Two variants of this Transformer-based approach are included here: SASRec [1] and SSE-PT [6].
This notebook provides an example of the steps needed to train and test either a SASRec or an SSE-PT model.
%load_ext autoreload
%autoreload 2
import os
import sys
import pandas as pd
import torch
from recommenders.utils.timer import Timer
from recommenders.datasets.amazon_reviews import get_review_data
from recommenders.datasets.split_utils import filter_k_core
from recommenders.models.sasrec.model import SASREC
from recommenders.models.sasrec.ssept import SSEPT
from recommenders.models.sasrec.sampler import WarpSampler
from recommenders.models.sasrec.util import SASRecDataSet
from recommenders.utils.notebook_utils import store_metadata
print(f"System version: {sys.version}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
print(f"CUDA version: {torch.version.cuda}")
num_epochs = 5
batch_size = 128
seed = 100 # Set None for non-deterministic result
# data_dir = os.path.join("tests", "recsys_data", "RecSys", "SASRec-tf2", "data")
data_dir = os.path.join("..", "..", "tests", "resources", "deeprec", "sasrec")
# Amazon Electronics Data
dataset = "reviews_Electronics_5"
lr = 0.001 # learning rate
maxlen = 50 # maximum sequence length for each user
num_blocks = 2 # number of transformer blocks
hidden_units = 100 # number of units in the attention calculation
num_heads = 1 # number of attention heads
dropout_rate = 0.1 # dropout rate
l2_emb = 0.0 # L2 regularization coefficient
num_neg_test = 100 # number of negative examples per positive example
model_name = "sasrec" # "sasrec" or "ssept"
reviews_name = dataset + '.json'
outfile = dataset + '.txt'
reviews_file = os.path.join(data_dir, reviews_name)
if not os.path.exists(reviews_file):
    reviews_output = get_review_data(reviews_file)
else:
    reviews_output = os.path.join(data_dir, dataset + ".json_output")
if not os.path.exists(os.path.join(data_dir, outfile)):
    df = pd.read_csv(reviews_output, sep="\t", names=["userID", "itemID", "time"])
    df = filter_k_core(df, 10)  # keep only users & items with at least 10 interactions
    user_set, item_set = set(df["userID"].unique()), set(df["itemID"].unique())
    user_map = dict()
    item_map = dict()
    for u, user in enumerate(user_set):
        user_map[user] = u + 1
    for i, item in enumerate(item_set):
        item_map[item] = i + 1
    df["userID"] = df["userID"].apply(lambda x: user_map[x])
    df["itemID"] = df["itemID"].apply(lambda x: item_map[x])
    df = df.sort_values(by=["userID", "time"])
    df.drop(columns=["time"], inplace=True)
    df.to_csv(os.path.join(data_dir, outfile), sep="\t", header=False, index=False)
SASRec requires a sequence input and a sequence target, with targets provided for both positive and negative examples. From each user's history three samples are created: if there are $N_u$ items for user $u$, then $N_u-2$ items are used for training and the last two items are used for validation and testing, respectively. The inputs to the model are prepared so that
- each row has a user ID and an item ID converted into integers (starting from 1)
- the rows are sorted by user ID and time of interaction
- for every user the last item is used for testing and the last but one for validation

For example, for user 30449 the sorted inputs are:

30449 2771
30449 61842
30449 60293
30449 30047
30449 63296
30449 22042
30449 6717
30449 75780

then the train inputs are
- [2771, 61842, 60293, 30047, 63296] (input sequence)
- [61842, 60293, 30047, 63296, 22042] (target sequence for positive examples)
- [1001, 50490, 33312, 19294, 45342] (sampled negative examples)

the validation inputs are
- [2771, 61842, 60293, 30047, 63296, 22042] (input sequence)
- [61842, 60293, 30047, 63296, 22042, 6717] (target sequence for positive examples)
- [4401, 60351, 22176, 23456, 45342, 1193] (sampled negative examples)

and the test inputs are
- [2771, 61842, 60293, 30047, 63296, 22042, 6717] (input sequence)
- [61842, 60293, 30047, 63296, 22042, 6717, 75780] (target sequence for positive examples)
- [4401, 60351, 22176, 23456, 45342, 1193, 54231] (sampled negative examples)

inp_file = os.path.join(data_dir, dataset + ".txt")
print(inp_file)
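To make the construction above concrete, here is a minimal sketch (not the library code, which additionally pads sequences and batches users) of how the train/validation/test sequences for a single user can be derived; num_items is a hypothetical vocabulary size used only for negative sampling.

# Sketch of the leave-one-out construction for one user's sorted items
import random

items = [2771, 61842, 60293, 30047, 63296, 22042, 6717, 75780]  # user 30449
num_items = 85930  # hypothetical size of the item vocabulary

train_input = items[:-3]  # [2771, 61842, 60293, 30047, 63296]
train_pos = items[1:-2]   # positive targets: the inputs shifted by one step
valid_item = items[-2]    # 6717 -> held out for validation
test_item = items[-1]     # 75780 -> held out for testing

# negatives: random items the user never interacted with, one per position
seen = set(items)
train_neg = []
while len(train_neg) < len(train_pos):
    cand = random.randint(1, num_items)
    if cand not in seen:
        train_neg.append(cand)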
# instantiate the dataset class
data = SASRecDataSet(filename=inp_file, col_sep="\t")
# create train, validation and test splits using leave-one-out strategy
# - valid_size: number of items per user for validation
# - test_size: number of items per user for testing
# - min_interactions: minimum interactions for a user to have valid/test splits
# - verbose: print split statistics
split_stats = data.split(valid_size=1, test_size=1, min_interactions=3, verbose=True)
print(split_stats)
# some statistics
num_steps = int(len(data.user_train) / batch_size)
cc = 0.0
for u in data.user_train:
    cc += len(data.user_train[u])
print('average sequence length: %.2f' % (cc / len(data.user_train)))
Model parameters are:
- number of items
- maximum sequence length of the user interaction history
- number of Transformer blocks
- embedding dimension for the item embedding
- dimension of the attention
- number of attention heads
- dropout rate
- dimensions of the convolution (point-wise feed-forward) layers, given as a list
- $L_2$ regularization coefficient
if model_name == "sasrec":
    model = SASREC(
        item_num=data.itemnum,
        seq_max_len=maxlen,
        num_blocks=num_blocks,
        embedding_dim=hidden_units,
        attention_dim=hidden_units,
        attention_num_heads=num_heads,
        dropout_rate=dropout_rate,
        conv_dims=[100, 100],
        l2_reg=l2_emb,
        num_neg_test=num_neg_test,
    )
elif model_name == "ssept":
    model = SSEPT(
        item_num=data.itemnum,
        user_num=data.usernum,
        seq_max_len=maxlen,
        num_blocks=num_blocks,
        # embedding_dim=hidden_units,  # optional
        user_embedding_dim=10,
        item_embedding_dim=hidden_units,
        attention_dim=hidden_units,
        attention_num_heads=num_heads,
        dropout_rate=dropout_rate,
        conv_dims=[110, 110],
        l2_reg=l2_emb,
        num_neg_test=num_neg_test,
    )
else:
    raise ValueError(f"Model {model_name} not found")
- the sampler creates negative samples from the training data for each batch
- this is done by looking at each user's interaction history and sampling items that do not appear in it
- the sampler generates a sequence of negative items of the same length as the user's history
sampler = WarpSampler(data.user_train, data.usernum, data.itemnum, batch_size=batch_size, maxlen=maxlen, n_workers=3)
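The core idea behind the sampler's negative generation is simple rejection sampling, illustrated by the sketch below (the real WarpSampler additionally pads sequences to maxlen and runs in n_workers background processes; sample_negative is an illustrative helper, not part of the library).

import numpy as np

def sample_negative(low, high, exclude):
    # draw a random item id in [low, high) that the user has not interacted with
    t = np.random.randint(low, high)
    while t in exclude:
        t = np.random.randint(low, high)
    return t

history = {2771, 61842, 60293, 30047, 63296, 22042}
negatives = [sample_negative(1, data.itemnum + 1, history) for _ in range(len(history))]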
- the loss function is defined over all the positive and negative logits
- a mask is applied so that only the non-padding (non-zero) items in the output contribute to the loss
- the regularization loss is also added here
- defining the train step as a function with a fixed input signature can speed up training
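A minimal PyTorch sketch of such a masked loss (the actual computation happens inside model.train_model; the names here are illustrative):

import torch
import torch.nn.functional as F

def masked_bce_loss(pos_logits, neg_logits, pos_targets, l2_reg, item_embeddings):
    # padding positions use item id 0 and must not contribute to the loss
    mask = (pos_targets != 0).float()
    pos_loss = F.binary_cross_entropy_with_logits(
        pos_logits, torch.ones_like(pos_logits), reduction="none")
    neg_loss = F.binary_cross_entropy_with_logits(
        neg_logits, torch.zeros_like(neg_logits), reduction="none")
    loss = ((pos_loss + neg_loss) * mask).sum() / mask.sum()
    # add the L2 regularization term on the item embeddings
    loss = loss + l2_reg * item_embeddings.pow(2).sum()
    return loss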
# Train the model
# - val_epoch: evaluate on validation set every N epochs (0 to disable)
# - eval_batch_size: batch size for evaluation (higher = faster)
with Timer() as train_time:
    history = model.train_model(
        data,
        sampler,
        num_epochs=num_epochs,
        batch_size=batch_size,
        learning_rate=lr,
        val_epoch=5,  # validate every 5 epochs
        eval_batch_size=256,
        verbose=True,
    )
print(f'\nTraining time: {train_time.interval/60.0:.2f} mins')
print(f'Final training loss: {history["loss"][-1]:.4f}')
# Evaluate on test set
# - seed: for reproducibility when sampling users (if > 10000 users)
# - eval_batch_size: batch size for evaluation
print("Evaluating on test set...")
with Timer() as eval_time:
    test_metrics = model.evaluate(data, seed=seed, eval_batch_size=256)
print(f"Evaluation time: {eval_time.interval:.2f}s")
print(f"\nTest Results:")
print(f" NDCG@10: {test_metrics[0]:.6f}")
print(f" HR@10: {test_metrics[1]:.6f}")
# Record results for tests - ignore this cell
store_metadata("ndcg@10", test_metrics[0])
store_metadata("Hit@10", test_metrics[1])
[1] Wang-Cheng Kang, Julian McAuley: Self-Attentive Sequential Recommendation, arXiv preprint arXiv:1808.09781 (2018)
[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998–6008
[3] Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 565–573.
[4] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078. 2014.
[5] Zeping Yu, Jianxun Lian, Ahmad Mahmoody, Gongshen Liu, Xing Xie. Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI'19, Pages 4213-4219. AAAI Press, 2019.
[6] Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James Sharpnack. SSE-PT: Sequential Recommendation Via Personalized Transformer. In Fourteenth ACM Conference on Recommender Systems, RecSys '20, Pages 328-337, 2020.