
Model Comparison for NCF Using the Neural Network Intelligence Toolkit

examples/04_model_select_and_optimize/nni_ncf.ipynb


Copyright (c) Recommenders contributors.

Licensed under the MIT License.


This notebook shows how to use the Neural Network Intelligence toolkit (NNI) to tune the hyperparameters of the Neural Collaborative Filtering (NCF) model.

To learn about the tuners NNI offers, see the NNI documentation.

NNI is a toolkit that helps users design and tune machine learning models (e.g., hyperparameters), neural network architectures, or the parameters of complex systems in an efficient and automatic way. NNI has several appealing properties: ease of use, scalability, flexibility, and efficiency. NNI can be executed in a distributed way on a local machine, a remote server, or a large-scale training platform such as OpenPAI or Kubernetes.

In this notebook, we see how NNI works with two different model types and the differences between their hyperparameter search spaces, YAML config files, and training scripts.

For this notebook we use a local machine as the training platform (this can be any machine running the reco_base conda environment). In this case, NNI uses the available processors of the machine to parallelize the trials, subject to the value of trialConcurrency we specify in the configuration. Our runs and the results we report were obtained on a Standard_D16_v3 virtual machine with 16 vCPUs and 64 GB of memory.

1. Global Settings

python
import sys
import json
import os
import surprise
import pandas as pd
import shutil
import subprocess
import yaml
import pkg_resources
from tempfile import TemporaryDirectory
import tensorflow as tf
tf.get_logger().setLevel('ERROR') # only show error messages

import recommenders
from recommenders.utils.timer import Timer
from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_chrono_split
from recommenders.evaluation.python_evaluation import rmse, precision_at_k, ndcg_at_k
from recommenders.tuning.nni.nni_utils import (
    check_experiment_status, 
    check_stopped, 
    check_metrics_written, 
    get_trials,
    stop_nni, start_nni
)
from recommenders.models.ncf.dataset import Dataset as NCFDataset
from recommenders.models.ncf.ncf_singlenode import NCF
from recommenders.tuning.nni.ncf_utils import compute_test_results, combine_metrics_dicts

print("System version: {}".format(sys.version))
print("Tensorflow version: {}".format(tf.__version__))
print("NNI version: {}".format(pkg_resources.get_distribution("nni").version))

tmp_dir = TemporaryDirectory()

%load_ext autoreload
%autoreload 2

2. Prepare Dataset

  1. Download data and split into training, validation and test sets
  2. Store the data sets to a local directory.
python
# Parameters used by papermill
# Select Movielens data size: 100k, 1m
MOVIELENS_DATA_SIZE = '100k'
SURPRISE_READER = 'ml-100k'
TMP_DIR = tmp_dir.name
NUM_EPOCHS = 10
MAX_TRIAL_NUM = 16
DEFAULT_SEED = 42

# time (in seconds) to wait for each tuning experiment to complete
WAITING_TIME = 20
MAX_RETRIES = MAX_TRIAL_NUM*4 # it is recommended to have MAX_RETRIES>=4*MAX_TRIAL_NUM

python
# Load the MovieLens data into a pandas DataFrame
df = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=["userID", "itemID", "rating", "timestamp"]
)

df.head()
python
train, validation, test = python_chrono_split(df, [0.7, 0.15, 0.15])
train = train.drop(['timestamp'], axis=1)
validation = validation.drop(['timestamp'], axis=1)
test = test.drop(['timestamp'], axis=1)
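
The split above is chronological per user: each user's earliest interactions go to the training set. A minimal sketch of that idea on a toy DataFrame (the function name `toy_chrono_split` and the toy data are made up for illustration; the real implementation in `recommenders.datasets.python_splitters.python_chrono_split` handles multiple ratios, filtering, and edge cases):

```python
import pandas as pd

def toy_chrono_split(df, train_ratio=0.7):
    """Split each user's interactions chronologically: earliest rows go to train."""
    df = df.sort_values(["userID", "timestamp"])
    train_parts, test_parts = [], []
    for _, user_df in df.groupby("userID"):
        cut = int(len(user_df) * train_ratio)
        train_parts.append(user_df.iloc[:cut])
        test_parts.append(user_df.iloc[cut:])
    return pd.concat(train_parts), pd.concat(test_parts)

toy = pd.DataFrame({
    "userID":    [1, 1, 1, 1, 2, 2],
    "itemID":    [10, 11, 12, 13, 10, 12],
    "rating":    [4, 3, 5, 2, 1, 5],
    "timestamp": [100, 200, 300, 400, 150, 250],
})
tr, te = toy_chrono_split(toy)
# Every training interaction for a user precedes that user's test interactions.
```
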
python
LOG_DIR = os.path.join(TMP_DIR, "experiments")
os.makedirs(LOG_DIR, exist_ok=True)

DATA_DIR = os.path.join(TMP_DIR, "data") 
os.makedirs(DATA_DIR, exist_ok=True)

TRAIN_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_train.pkl"
train.to_pickle(os.path.join(DATA_DIR, TRAIN_FILE_NAME))

VAL_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_val.pkl"
validation.to_pickle(os.path.join(DATA_DIR, VAL_FILE_NAME))

TEST_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_test.pkl"
test.to_pickle(os.path.join(DATA_DIR, TEST_FILE_NAME))

3. Prepare Hyperparameter Tuning

To run an experiment on NNI we need a general training script for our model of choice. A typical training script comprises the following components:

  1. Parsing arguments for the fixed parameters (dataset location, metrics to use)
  2. Preprocessing the data in the way the model requires
  3. Fitting the model on the train set
  4. Evaluating the model on the validation set with each metric (ranking and rating)
  5. Saving the metrics and the model
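
The argument parsing in step 1 can be sketched with argparse. The flag names below mirror the script_params assembled later in this notebook; this is a minimal sketch, not the actual ncf_training.py shipped with Recommenders:

```python
import argparse

def get_parser():
    parser = argparse.ArgumentParser(description="NCF training script (sketch)")
    parser.add_argument("--datastore", type=str, required=True)
    parser.add_argument("--train-datapath", type=str, required=True)
    parser.add_argument("--validation-datapath", type=str, required=True)
    parser.add_argument("--rating-metrics", type=str, nargs="*", default=[])
    parser.add_argument("--ranking-metrics", type=str, nargs="*", default=[])
    parser.add_argument("--k", type=int, default=10)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--primary-metric", type=str, default="precision_at_k")
    parser.add_argument("--remove-seen", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    return parser

# Parse an example command line like the one script_params produces
args = get_parser().parse_args([
    "--datastore", "/tmp/data",
    "--train-datapath", "train.pkl",
    "--validation-datapath", "val.pkl",
    "--ranking-metrics", "precision_at_k", "ndcg_at_k",
    "--k", "10",
    "--remove-seen",
])
# At runtime the tuned hyperparameters come from NNI via nni.get_next_parameter(),
# and the primary metric is reported back with nni.report_final_result(value).
```
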

To utilize NNI we also require a hyperparameter search space. Only the hyperparameters we want to tune need to appear in the dictionary. NNI supports different methods of hyperparameter sampling.
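
A few of NNI's built-in sampling types, shown in the JSON search-space format NNI reads (the hyperparameter values here are purely illustrative, not recommendations):

```python
import json

# "_type" selects the sampling method:
#   "choice"     - pick one value from a list
#   "uniform"    - sample uniformly from [low, high]
#   "loguniform" - sample log-uniformly; useful for learning rates
#   "randint"    - sample an integer from [lower, upper)
example_space = {
    "n_factors": {"_type": "choice", "_value": [2, 4, 8, 12]},
    "learning_rate": {"_type": "loguniform", "_value": [1e-4, 1e-1]},
    "batch_size": {"_type": "randint", "_value": [64, 512]},
}

# NNI reads the search space from a JSON file, so the dict must serialize cleanly.
serialized = json.dumps(example_space)
```
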

The script_params below are the parameters of the training script that are fixed (unlike hyper_params which are tuned).

python
PRIMARY_METRIC = "precision_at_k"
RATING_METRICS = ["rmse"]
RANKING_METRICS = ["precision_at_k", "ndcg_at_k"]  
USERCOL = "userID"
ITEMCOL = "itemID"
REMOVE_SEEN = True
K = 10
RANDOM_STATE = 42
VERBOSE = True
BIASED = True

script_params = " ".join([
    "--datastore", DATA_DIR,
    "--train-datapath", TRAIN_FILE_NAME,
    "--validation-datapath", VAL_FILE_NAME,
    "--surprise-reader", SURPRISE_READER,
    "--rating-metrics", " ".join(RATING_METRICS),
    "--ranking-metrics", " ".join(RANKING_METRICS),
    "--usercol", USERCOL,
    "--itemcol", ITEMCOL,
    "--k", str(K),
    "--random-state", str(RANDOM_STATE),
    "--epochs", str(NUM_EPOCHS),
    "--primary-metric", PRIMARY_METRIC
])

if BIASED:
    script_params += " --biased"
if VERBOSE:
    script_params += " --verbose"
if REMOVE_SEEN:
    script_params += " --remove-seen"

We specify the search space for the NCF hyperparameters:

python
ncf_hyper_params = {
    'n_factors': {"_type": "choice", "_value": [2, 4, 8, 12]},
    'learning_rate': {"_type": "uniform", "_value": [1e-3, 1e-2]},
}
python
with open(os.path.join(TMP_DIR, 'search_space_ncf.json'), 'w') as fp:
    json.dump(ncf_hyper_params, fp)

This config file follows the guidelines provided in the NNI Experiment Config documentation.

The options to pay attention to are:

  • The "searchSpacePath", which points to the file containing the hyperparameter search space we defined above
  • The "tuner", which specifies the hyperparameter tuning algorithm that will sample from our search space and optimize our model
python
config = {
    "authorName": "default",
    "experimentName": "tensorflow_ncf",
    "trialConcurrency": 8,
    "maxExecDuration": "1h",
    "maxTrialNum": MAX_TRIAL_NUM,
    "trainingServicePlatform": "local",
    # The path to Search Space
    "searchSpacePath": "search_space_ncf.json",
    "useAnnotation": False,
    "logDir": LOG_DIR,
    "tuner": {
        "builtinTunerName": "TPE",
        "classArgs": {
            #choice: maximize, minimize
            "optimize_mode": "maximize"
        }
    },
    # The path and the running command of trial
    "trial":  {
      "command": f"{sys.executable} ncf_training.py {script_params}",
      "codeDir": os.path.join(os.path.split(os.path.abspath(recommenders.__file__))[0], "tuning", "nni"),
      "gpuNum": 0
    }
}
 
with open(os.path.join(TMP_DIR, "config_ncf.yml"), "w") as fp:
    fp.write(yaml.dump(config, default_flow_style=False))

4. Execute NNI Trials

The conda environment comes with NNI installed, which includes the command line tool nnictl for controlling and getting information about NNI experiments.

To start the NNI tuning trials from the command line, execute the following command:

nnictl create --config <path of config.yml>

The start_nni function will run the nnictl create command for us. To find the URL of an active experiment, run nnictl webui url in your terminal.

In this notebook the 16 NCF trials run in a single experiment, 8 at a time (the trialConcurrency value set in the config). While NNI can run two separate experiments simultaneously by adding the --port <port_num> flag to nnictl create, the total training time would likely be the same as running the batches sequentially, since these are CPU-bound processes.
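
The expected number of sequential batches follows directly from the trial count and the concurrency (a sketch of the arithmetic, using the values from this notebook):

```python
import math

MAX_TRIAL_NUM = 16      # total trials in the experiment
TRIAL_CONCURRENCY = 8   # "trialConcurrency" in the config above

# Trials run TRIAL_CONCURRENCY at a time, so on a CPU-bound machine the
# wall-clock time is roughly batches * time_per_trial.
batches = math.ceil(MAX_TRIAL_NUM / TRIAL_CONCURRENCY)
```
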

python
stop_nni()
config_path_ncf = os.path.join(TMP_DIR, 'config_ncf.yml')
with Timer() as time_ncf:
    start_nni(config_path_ncf, wait=WAITING_TIME, max_retries=MAX_RETRIES)
python
check_metrics_written(wait=WAITING_TIME, max_retries=MAX_RETRIES)
trials_ncf, best_metrics_ncf, best_params_ncf, best_trial_path_ncf = get_trials('maximize')
python
best_metrics_ncf
python
best_params_ncf

5. Baseline Model

Although we hope that the additional effort of using an AutoML framework like NNI for hyperparameter tuning leads to better results, we should also draw comparisons against a baseline model (the model trained with its default hyperparameters). This allows us to understand precisely what performance benefit NNI is or isn't providing.

python
data = NCFDataset(train, validation, seed=DEFAULT_SEED)
model = NCF(
    n_users=data.n_users, 
    n_items=data.n_items,
    model_type="NeuMF",
    n_factors=4,
    layer_sizes=[16,8,4],
    n_epochs=NUM_EPOCHS,
    learning_rate=1e-3,  
    verbose=True,
    seed=DEFAULT_SEED
)
model.fit(data)
python
test_results = compute_test_results(model, train, validation, RATING_METRICS, RANKING_METRICS)
test_results

6. Show Results

The metrics for each model are reported on the validation set. At this point we can compare the metrics for each model and select the one with the best score on the primary metric(s) of interest.

python
test_results['name'] = 'ncf_baseline'
best_metrics_ncf['name'] = 'ncf_tuned'
combine_metrics_dicts(test_results, best_metrics_ncf)

Based on the above metrics, we see that NNI has identified a set of hyperparameters that does demonstrate an improvement on our metrics of interest. In this example, an n_factors of 12 led to better performance than an n_factors of 4. While the difference in precision_at_k and ndcg_at_k is small, NNI has helped us determine that a slightly larger embedding dimension for NCF may be useful for the MovieLens dataset.

python
# Stop the NNI experiment 
stop_nni()
python
tmp_dir.cleanup()

7. Concluding Remarks

In this notebook we showed how to use the NNI framework to tune an NCF model. Inspecting the training script should help you identify which components would need to be modified to run another model with NNI.

In practice, an AutoML framework like NNI is just a tool to help you explore a large space of hyperparameters quickly with a prescribed level of randomization. In addition to using NNI, it is recommended to train baseline models with typical hyperparameter choices (learning rates of 0.005 or 0.001, regularization rates of 0.05 or 0.01, etc.) to draw more meaningful comparisons between model performances. This may help determine whether the tuner is overfitting or whether there is a statistically significant improvement.

Another thing to note is the added computational cost of training models with an AutoML framework. In this case, it takes about 6 minutes to train each model on a Standard_NC6 VM. With this in mind, while NNI can easily train hundreds of models over all hyperparameters of a model, in practice it may be beneficial to choose the subset of hyperparameters deemed most important and tune those. Too small a hyperparameter search space may restrict exploration, but too large a space may let a specific combination of hyperparameters exploit random noise in the data.
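
To make the cost trade-off concrete: a sketch comparing the size of a fully discretized grid against the trial budget used here. The five learning-rate values are a hypothetical discretization chosen for illustration:

```python
from itertools import product

n_factors_choices = [2, 4, 8, 12]
learning_rate_grid = [1e-3, 2.5e-3, 5e-3, 7.5e-3, 1e-2]  # illustrative discretization

# Exhaustive grid search over this discretization needs one trial per combination,
# whereas the TPE experiment above sampled only MAX_TRIAL_NUM = 16 trials.
grid_size = len(list(product(n_factors_choices, learning_rate_grid)))
```

At roughly 6 minutes per trial, even this small grid already costs about two hours, which is why pruning the search space matters.
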

For examples of scaling larger tuning workloads on clusters of machines, see the notebooks that employ the Azure Machine Learning service.
