SASRec & SSEPT

Sequential Recommendation Using Transformer

This is a class of sequential recommenders that uses a Transformer [2] to encode a user's preferences, represented as the sequence of items purchased or viewed before. Instead of a CNN (Caser [3]) or an RNN (GRU [4], SLI-Rec [5], etc.), the approach relies on a Transformer-based encoder that generates a new representation of the item sequence. Two variants of this Transformer-based approach are included here:

  • Self-Attentive Sequential Recommendation (SASRec [1]), which is based on the vanilla Transformer and models only the item sequence, and
  • Stochastic Shared Embedding based Personalized Transformer (SSE-PT [6]), which models the users along with the items.

This notebook walks through the steps necessary to train and test either a SASRec or an SSE-PT model.

python
%load_ext autoreload
%autoreload 2
python
import os
import sys
import pandas as pd 
import torch

from recommenders.utils.timer import Timer
from recommenders.datasets.amazon_reviews import get_review_data
from recommenders.datasets.split_utils import filter_k_core
from recommenders.models.sasrec.model import SASREC
from recommenders.models.sasrec.ssept import SSEPT
from recommenders.models.sasrec.sampler import WarpSampler
from recommenders.models.sasrec.util import SASRecDataSet
from recommenders.utils.notebook_utils import store_metadata

print(f"System version: {sys.version}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")

Input Parameters

python
num_epochs = 5
batch_size = 128
seed = 100  # Set None for non-deterministic result

# data_dir = os.path.join("tests", "recsys_data", "RecSys", "SASRec-tf2", "data")
data_dir = os.path.join("..", "..", "tests", "resources", "deeprec", "sasrec")

# Amazon Electronics Data
dataset = "reviews_Electronics_5"

lr = 0.001             # learning rate
maxlen = 50            # maximum sequence length for each user
num_blocks = 2         # number of transformer blocks
hidden_units = 100     # number of units in the attention calculation
num_heads = 1          # number of attention heads
dropout_rate = 0.1     # dropout rate
l2_emb = 0.0           # L2 regularization coefficient
num_neg_test = 100     # number of negative examples per positive example during evaluation
model_name = "sasrec"  # "sasrec" or "ssept"
python
reviews_name = dataset + '.json'
outfile = dataset + '.txt'

reviews_file = os.path.join(data_dir, reviews_name)
if not os.path.exists(reviews_file):
    reviews_output = get_review_data(reviews_file)
else:
    reviews_output = os.path.join(data_dir, dataset+".json_output")
python
if not os.path.exists(os.path.join(data_dir, outfile)):
    df = pd.read_csv(reviews_output, sep="\t", names=["userID", "itemID", "time"])
    df = filter_k_core(df, 10)  # keep only users and items with at least 10 interactions
    
    user_set, item_set = set(df['userID'].unique()), set(df['itemID'].unique())
    user_map = dict()
    item_map = dict()
    for u, user in enumerate(user_set):
        user_map[user] = u+1
    for i, item in enumerate(item_set):
        item_map[item] = i+1
    
    df["userID"] = df["userID"].apply(lambda x: user_map[x])
    df["itemID"] = df["itemID"].apply(lambda x: item_map[x])
    df = df.sort_values(by=["userID", "time"])
    df.drop(columns=["time"], inplace=True)
    df.to_csv(os.path.join(data_dir, outfile), sep="\t", header=False, index=False)

SASRec requires a sequence input and sequence targets, with targets for both positive and negative examples. The inputs to the model are:

  • the user's item history, as input to the transformer,
  • the user's item history shifted by one, as the target for the transformer (positive examples), and
  • a sequence of items that do not appear among the positive examples (negative examples).

Three sequences are created from each user's history. If there are $N_u$ items for user $u$, then $N_u-2$ items are used in training and the last two items are held out for validation and testing, respectively; a short code sketch follows the format example below.

Dataset Format

  • The input files should have the following format:
    • each row has user-id and item-id converted into integers (starting from 1)

    • the rows are sorted by user-id and time of interaction

    • for every user the last item is used for testing and the last but one is used for validation

    • for example, for user 30449 the sorted inputs are:

      • 30449 2771
      • 30449 61842
      • 30449 60293
      • 30449 30047
      • 30449 63296
      • 30449 22042
      • 30449 6717
      • 30449 75780

      then the train inputs are

      • [2771, 61842, 60293, 30047, 63296] (input sequence)
      • [61842, 60293, 30047, 63296, 22042] (target sequence for positive examples)
      • [1001, 50490, 33312, 19294, 45342] (sample negative examples)

      and the validation inputs are

      • [2771, 61842, 60293, 30047, 63296, 22042] (input sequence)
      • [61842, 60293, 30047, 63296, 22042, 6717] (target sequence for positive examples)
      • [4401, 60351, 22176, 23456, 45342, 1193] (sample negative examples)

      and the test inputs are

      • [2771, 61842, 60293, 30047, 63296, 22042, 6717] (input sequence)
      • [61842, 60293, 30047, 63296, 22042, 6717, 75780] (target sequence for positive examples)
      • [4401, 60351, 22176, 23456, 45342, 1193, 54231] (sample negative examples)
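The leave-one-out split and the shifted positive targets above can be reproduced in a few lines of plain Python. `leave_one_out` below is a hypothetical helper written for illustration, not part of the recommenders library; negative sampling is covered in the Sampler section further down.

python
def leave_one_out(history, maxlen=5):
    """Split one user's chronologically sorted history into a SASRec-style
    training pair plus held-out validation and test items.
    Hypothetical helper, for illustration only."""
    train = history[:-2]            # first N_u - 2 items are for training
    valid_item, test_item = history[-2], history[-1]
    inputs = train[:-1][-maxlen:]   # item history as model input
    targets = train[1:][-maxlen:]   # the same history shifted by one step
    return inputs, targets, valid_item, test_item

history = [2771, 61842, 60293, 30047, 63296, 22042, 6717, 75780]
inp, pos, valid_item, test_item = leave_one_out(history)
print(inp)                    # [2771, 61842, 60293, 30047, 63296]
print(pos)                    # [61842, 60293, 30047, 63296, 22042]
print(valid_item, test_item)  # 6717 75780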
python
inp_file = os.path.join(data_dir, dataset + ".txt")
print(inp_file)

# instantiate the dataset class
data = SASRecDataSet(filename=inp_file, col_sep="\t")

# create train, validation and test splits using leave-one-out strategy
# - valid_size: number of items per user for validation
# - test_size: number of items per user for testing  
# - min_interactions: minimum interactions for a user to have valid/test splits
# - verbose: print split statistics
split_stats = data.split(valid_size=1, test_size=1, min_interactions=3, verbose=True)
print(split_stats)

# some statistics
num_steps = int(len(data.user_train) / batch_size)
cc = 0.0
for u in data.user_train:
    cc += len(data.user_train[u])
print('average sequence length: %.2f' % (cc / len(data.user_train)))

Model Creation

Model parameters are

- number of items
- maximum sequence length of the user interaction history
- number of Transformer blocks
- embedding dimension of the item embedding
- dimension of the attention layers
- number of attention heads
- dropout rate
- dimensions of the convolution (pointwise feed-forward) layers, as a list
- $L_2$-regularization coefficient
python
if model_name == 'sasrec':
    model = SASREC(item_num=data.itemnum,
                   seq_max_len=maxlen,
                   num_blocks=num_blocks,
                   embedding_dim=hidden_units,
                   attention_dim=hidden_units,
                   attention_num_heads=num_heads,
                   dropout_rate=dropout_rate,
                   conv_dims=[100, 100],
                   l2_reg=l2_emb,
                   num_neg_test=num_neg_test
    )
elif model_name == "ssept":
    model = SSEPT(item_num=data.itemnum,
                  user_num=data.usernum,
                  seq_max_len=maxlen,
                  num_blocks=num_blocks,
                  # embedding_dim=hidden_units,  # optional
                  user_embedding_dim=10,
                  item_embedding_dim=hidden_units,
                  attention_dim=hidden_units,
                  attention_num_heads=num_heads,
                  dropout_rate=dropout_rate,
                  conv_dims=[110, 110],
                  l2_reg=l2_emb,
                  num_neg_test=num_neg_test
    )
else:
    raise ValueError(f"Unknown model name: {model_name}")

Sampler

- the sampler creates negative samples from the training data for each batch
- it does this by looking at the user's interaction history and drawing items that do not appear in it at all
- the sampler generates a sequence of negative items of the same length as the original history; a minimal sketch of the idea follows below
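As a rough illustration of the per-position negative draw (`sample_negative_sequence` is a hypothetical function, not the library's sampler; `WarpSampler` below additionally pads/truncates sequences to `maxlen` and parallelizes sampling across `n_workers`):

python
import numpy as np

def sample_negative_sequence(history, itemnum, seed=100):
    """Draw one negative item per position in the history, avoiding every
    item the user has interacted with. Hypothetical sketch of the idea
    behind WarpSampler."""
    rng = np.random.default_rng(seed)
    seen = set(history)
    negatives = []
    for _ in history:
        neg = int(rng.integers(1, itemnum + 1))  # item ids start at 1
        while neg in seen:                       # resample until unseen
            neg = int(rng.integers(1, itemnum + 1))
        negatives.append(neg)
    return negatives

print(sample_negative_sequence([2771, 61842, 60293], itemnum=75780))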
python
sampler = WarpSampler(data.user_train, data.usernum, data.itemnum, batch_size=batch_size, maxlen=maxlen, n_workers=3)

Model Training

- the loss function is defined over all the positive and negative logits
- a mask is applied so that only positions holding real (non-padding) items contribute to the loss; see the sketch below
- the regularization loss is also added here
- defining a train-step function with a fixed input signature can speed up the training process
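As a rough sketch of this masked loss, assuming PyTorch tensors (`masked_bce_loss` is a hypothetical helper illustrating the description above, not the library's implementation inside `train_model`):

python
import torch
import torch.nn.functional as F

def masked_bce_loss(pos_logits, neg_logits, target_seq, l2_reg=0.0, params=()):
    """Binary cross-entropy over positive and negative logits, masked so
    that padded positions (item id 0) do not contribute, plus an optional
    L2 penalty. Hypothetical sketch, for illustration only."""
    mask = (target_seq != 0).float()
    pos_loss = F.binary_cross_entropy_with_logits(
        pos_logits, torch.ones_like(pos_logits), reduction="none")
    neg_loss = F.binary_cross_entropy_with_logits(
        neg_logits, torch.zeros_like(neg_logits), reduction="none")
    loss = ((pos_loss + neg_loss) * mask).sum() / mask.sum()
    for p in params:  # optional L2 regularization over model parameters
        loss = loss + l2_reg * (p ** 2).sum()
    return loss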
python
# Train the model
# - val_epoch: evaluate on validation set every N epochs (0 to disable)
# - eval_batch_size: batch size for evaluation (higher = faster)
with Timer() as train_time:
    history = model.train_model(
        data, 
        sampler, 
        num_epochs=num_epochs, 
        batch_size=batch_size, 
        learning_rate=lr, 
        val_epoch=5,  # validate every 5 epochs
        eval_batch_size=256,
        verbose=True
    )

print(f'\nTraining time: {train_time.interval/60.0:.2f} mins')
print(f'Final training loss: {history["loss"][-1]:.4f}')
python
# Evaluate on test set
# - seed: for reproducibility when sampling users (if > 10000 users)
# - eval_batch_size: batch size for evaluation
print("Evaluating on test set...")
with Timer() as eval_time:
    test_metrics = model.evaluate(data, seed=seed, eval_batch_size=256)

print(f"Evaluation time: {eval_time.interval:.2f}s")
print(f"\nTest Results:")
print(f"  NDCG@10: {test_metrics[0]:.6f}")
print(f"  HR@10:   {test_metrics[1]:.6f}")

python
# Record results for tests - ignore this cell
store_metadata("ndcg@10", test_metrics[0])
store_metadata("Hit@10", test_metrics[1])

References

[1] Wang-Cheng Kang, Julian McAuley: Self-Attentive Sequential Recommendation, arXiv preprint arXiv:1808.09781 (2018)

[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998–6008

[3] Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 565–573.

[4] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078. 2014.

[5] Zeping Yu, Jianxun Lian, Ahmad Mahmoody, Gongshen Liu, Xing Xie. Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI'19, Pages 4213-4219. AAAI Press, 2019.

[6] Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James Sharpnack. SSE-PT: Sequential Recommendation Via Personalized Transformer. In Fourteenth ACM Conference on Recommender Systems, RecSys '20, Pages 328-337, 2020.