examples/00_quick_start/rbm_movielens.ipynb
<i>Copyright (c) Recommenders contributors.</i>
<i>Licensed under the MIT License.</i>
A Restricted Boltzmann Machine (RBM) is a generative neural network model typically used to perform unsupervised learning. The main task of an RBM is to learn the joint probability distribution $P(v,h)$, where $v$ are the visible units and $h$ the hidden ones. The hidden units represent latent variables while the visible units are clamped on the input data. Once the joint distribution is learnt, new examples are generated by sampling from it.
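For reference, in the standard binary RBM the joint distribution is defined through an energy function (textbook form; the multinomial-unit variant used here changes the visible units but keeps the same structure):

$$E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j, \qquad P(v,h) = \frac{e^{-E(v,h)}}{Z}$$

where $a$ and $b$ are the visible and hidden biases, $W$ is the weight matrix and $Z = \sum_{v,h} e^{-E(v,h)}$ is the partition function.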
In this notebook, we provide an example of how to use the RBM to perform user/item recommendations. As a case study, we use the MovieLens dataset, which contains users' ratings of movies on a scale of 1 to 5.
This notebook provides a quick start, showing the basic steps needed to use and evaluate the algorithm. A detailed discussion of the RBM model together with a deeper analysis of the recommendation task is provided in the RBM Deep Dive section. The RBM implementation presented here is based on the article by Ruslan Salakhutdinov, Andriy Mnih and Geoffrey Hinton, <i>Restricted Boltzmann Machines for Collaborative Filtering</i>, with the exception that here we use multinomial units instead of the one-hot encoding used in the paper.
The model generates ratings for a user/movie pair using a collaborative filtering approach. While matrix factorization methods learn how to reproduce an instance of the user/item affinity matrix, the RBM learns its underlying probability distribution, which brings several advantages, discussed in the RBM Deep Dive section.
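Since every movie rating takes one of five values, the visible units are multinomial: each is a distribution over the five rating levels, typically parametrized through a softmax. A minimal illustration of the idea (the logits below are made up, not produced by the library):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unnormalized activations for one movie's visible unit,
# one logit per possible rating value 1..5
logits = np.array([0.1, 0.3, 1.5, 2.0, 0.4])

# Softmax turns the logits into a probability distribution over the 5 rating levels
probs = np.exp(logits) / np.exp(logits).sum()

# Sampling the multinomial unit picks one rating level per draw
rating = rng.choice([1, 2, 3, 4, 5], p=probs)
print(probs.round(3), rating)
```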
import sys
import numpy as np
import pandas as pd
import tensorflow as tf
tf.get_logger().setLevel('ERROR') # only show error messages
from recommenders.models.rbm.rbm import RBM
from recommenders.datasets.python_splitters import numpy_stratified_split
from recommenders.datasets.sparse import AffinityMatrix
from recommenders.datasets import movielens
from recommenders.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k, recall_at_k
from recommenders.utils.timer import Timer
from recommenders.utils.plot import line_graph
from recommenders.utils.notebook_utils import store_metadata
# For interactive mode only
%load_ext autoreload
%autoreload 2
%matplotlib inline
print(f"System version: {sys.version}")
print(f"Pandas version: {pd.__version__}")
print(f"Tensorflow version: {tf.__version__}")
Here we select the size of the MovieLens dataset. In this example we consider the 100k ratings dataset, with ratings provided by 943 users on 1682 movies. The data are imported into a pandas dataframe including the user ID, the item ID, the rating and a timestamp denoting when a particular user rated a particular item.
# Select MovieLens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=['userID', 'movieID', 'rating', 'timestamp']
)
data.head()
As a second step, we generate the user/item affinity matrix and then split the data into train and test sets. If you are familiar with training supervised learning models, here you will notice a first difference. In the supervised case, we cut off a certain proportion of training examples (e.g. images) from the dataset; here that would correspond to removing users (or items), ending up with two matrices (train and test) having different row dimensions. Instead, we need to maintain the same matrix size for the train and test sets, but the two will contain different amounts of ratings; see the deep dive notebook for more details. In the affinity matrix X, rows correspond to users and columns to items, with X[u, i] holding the rating user u gave to item i (0 if unrated).
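Conceptually, the affinity matrix is just a users-by-items pivot of the ratings dataframe; a pandas sketch of the idea on toy data (the library's AffinityMatrix additionally keeps track of the index mappings):

```python
import pandas as pd

# Toy ratings dataframe with the same column names used in this notebook
ratings = pd.DataFrame({
    "userID": [1, 1, 2],
    "movieID": [10, 20, 10],
    "rating": [5, 3, 4],
})

# Users as rows, movies as columns, 0 where no rating exists
X_demo = ratings.pivot(index="userID", columns="movieID", values="rating").fillna(0)
print(X_demo)
```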
# To use standard names across the analysis
header = {
    "col_user": "userID",
    "col_item": "movieID",
    "col_rating": "rating",
}
# Instantiate the sparse matrix generation
am = AffinityMatrix(df=data, **header)
# Obtain the sparse matrix
X, _, _ = am.gen_affinity_matrix()
The method also returns information on the sparseness of the dataset and the size of the user/item affinity matrix. The former is given by the ratio between the unrated elements and the total number of matrix elements. This is what makes a recommendation task hard: we try to predict the 93% of missing data using only 7% of the information!
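The sparsity figure can be reproduced directly from the matrix itself; a minimal sketch on a toy matrix (variable names are illustrative, not part of the library):

```python
import numpy as np

# Toy affinity matrix: rows = users, columns = items, 0 = unrated
X_toy = np.array([
    [5, 0, 0, 1],
    [0, 3, 0, 0],
    [0, 0, 0, 4],
])

rated = np.count_nonzero(X_toy)   # observed ratings
total = X_toy.size                # all user/item pairs
sparsity = 1 - rated / total      # fraction of missing entries
print(f"rated: {rated}/{total}, sparsity: {sparsity:.2%}")  # → rated: 4/12, sparsity: 66.67%
```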
We split the matrix using the default ratio of 0.75, i.e. 75% of the ratings will constitute the train set.
Xtr, Xtst = numpy_stratified_split(X)
The splitter returns the train and test set matrices, Xtr and Xtst.
Note that the train and test matrices have exactly the same dimensions, but different entries, as can be explicitly verified:
print('train matrix size', Xtr.shape)
print('test matrix size', Xtst.shape)
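One way to make this concrete: the two matrices share their shape, while each observed rating lands in exactly one of them. A toy check of that invariant (this is a hand-made assignment, not the library's stratified splitter):

```python
import numpy as np

# Full toy rating matrix: 0 = unrated
X_full = np.array([
    [5, 3, 0, 1],
    [0, 4, 2, 0],
])

# Manually assign observed ratings to train/test
# (the real split is stratified per user)
train_mask = np.array([
    [1, 1, 0, 0],
    [0, 1, 0, 0],
], dtype=bool)
X_train = np.where(train_mask, X_full, 0)
X_test = np.where(~train_mask, X_full, 0)

assert X_train.shape == X_test.shape == X_full.shape  # same dimensions
# every rating is in exactly one of the two matrices
assert np.count_nonzero(X_train) + np.count_nonzero(X_test) == np.count_nonzero(X_full)
assert not np.any((X_train != 0) & (X_test != 0))
```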
The model has been implemented as a TensorFlow (TF) class. TF does not support probabilistic models natively, so the implementation of the algorithm has a different structure than the one you may be used to seeing in popular supervised models. The class has been implemented in such a way that the TF session is hidden inside the fit() method and no explicit call is needed. The algorithm operates in three steps:
Model initialization: This is where we tell TF how to build the computational graph. The main parameters to specify are the number of hidden units, the number of training epochs and the minibatch size. Other parameters can be optionally tweaked for experimentation and to achieve better performance, as explained in the RBM Deep Dive section.
Model fit: This is where we train the model on the data. The method takes the training set matrix as input; the held-out test set can be used to assess the generalization accuracy of the trained model, which is useful when tuning the hyperparameters.
Model prediction: This is where we generate ratings for the unseen items. Once the model has been trained and we are satisfied with its overall accuracy, we sample new ratings from the learned distribution. In particular, we extract the top_k (e.g. 10) most relevant recommendations according to some predefined score. The prediction is then returned in a dataframe format ready to be analysed and deployed.
#First we initialize the model class
model = RBM(
    possible_ratings=np.setdiff1d(np.unique(Xtr), np.array([0])),
    visible_units=Xtr.shape[1],
    hidden_units=600,
    training_epoch=30,
    minibatch_size=60,
    keep_prob=0.9,
    with_metrics=True
)
Note that the first time the fit method is called it may take longer to return the result. This is because TF needs to initialize the GPU session. You will notice that this is not the case when training the algorithm a second time or later.
# Model Fit
with Timer() as train_time:
    model.fit(Xtr)

print("Took {:.2f} seconds for training.".format(train_time.interval))
# Plot the train RMSE as a function of the epochs
line_graph(values=model.rmse_train, labels='train', x_name='epoch', y_name='rmse_train')
During training, we can optionally evaluate the root mean squared error to get an idea of how the learning is proceeding. We would generally like to see this quantity decrease as a function of the learning epochs. To visualize it, choose with_metrics=True in the RBM() constructor.
Once the model has been trained, we can predict new ratings on the test set.
# number of top score elements to be recommended
K = 10
# Model prediction on the test set Xtst.
with Timer() as prediction_time:
    top_k = model.recommend_k_items(Xtst)

print("Took {:.2f} seconds for prediction.".format(prediction_time.interval))
The top_k variable contains the first K elements with the highest recommendation score. Here the recommendation score is evaluated by multiplying the predicted rating by its probability, i.e. the confidence the algorithm has in its output. So if we have two items, both with predicted rating 5, but one with probability 0.5 and the other 0.9, the latter will be considered more relevant. In order to inspect the predictions and use the evaluation metrics in this repository, we convert both top_k and Xtst to the pandas dataframe format:
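The scoring rule described above can be sketched in a few lines (the numbers are illustrative, not actual model output):

```python
import numpy as np

# Two candidate items, both predicted rating 5, with different confidence
pred_rating = np.array([5.0, 5.0])
pred_prob = np.array([0.5, 0.9])

score = pred_rating * pred_prob  # relevance score = rating x probability
ranking = np.argsort(-score)     # higher score first
print(score, ranking)            # → [2.5 4.5] [1 0]
```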
top_k_df = am.map_back_sparse(top_k, kind='prediction')
test_df = am.map_back_sparse(Xtst, kind='ratings')
top_k_df.head(10)
Here we evaluate the performance of the algorithm using the metrics provided in the PythonRankingEvaluation class. Note that the following metrics take into account only the first K elements, therefore their values may differ from the one displayed by the model.fit() method.
def ranking_metrics(
    data_size,
    data_true,
    data_pred,
    K
):
    eval_map = map_at_k(data_true, data_pred, col_user="userID", col_item="movieID",
                        col_rating="rating", col_prediction="prediction",
                        relevancy_method="top_k", k=K)
    eval_ndcg = ndcg_at_k(data_true, data_pred, col_user="userID", col_item="movieID",
                          col_rating="rating", col_prediction="prediction",
                          relevancy_method="top_k", k=K)
    eval_precision = precision_at_k(data_true, data_pred, col_user="userID", col_item="movieID",
                                    col_rating="rating", col_prediction="prediction",
                                    relevancy_method="top_k", k=K)
    eval_recall = recall_at_k(data_true, data_pred, col_user="userID", col_item="movieID",
                              col_rating="rating", col_prediction="prediction",
                              relevancy_method="top_k", k=K)
    df_result = pd.DataFrame(
        {
            "Dataset": data_size,
            "K": K,
            "MAP": eval_map,
            "nDCG@k": eval_ndcg,
            "Precision@k": eval_precision,
            "Recall@k": eval_recall,
        },
        index=[0]
    )
    return df_result
eval_100k = ranking_metrics(
    data_size="mv 100k",
    data_true=test_df,
    data_pred=top_k_df,
    K=10
)
eval_100k
# Record results for tests - ignore this cell
store_metadata("map", eval_100k['MAP'][0])
store_metadata("ndcg", eval_100k['nDCG@k'][0])
store_metadata("precision", eval_100k['Precision@k'][0])
store_metadata("recall", eval_100k['Recall@k'][0])
The trained model checkpoint can be saved to a specified directory using the save function.
model.save(file_path='./models/rbm_model.ckpt')
A pre-trained RBM model can be loaded using the load function, which makes it possible to resume training.
# Initialize the model class
model = RBM(
    possible_ratings=np.setdiff1d(np.unique(Xtr), np.array([0])),
    visible_units=Xtr.shape[1],
    hidden_units=600,
    training_epoch=30,
    minibatch_size=60,
    keep_prob=0.9,
    with_metrics=True
)
# Load the model checkpoint
model.load(file_path='./models/rbm_model.ckpt')