Licensed to the Apache Software Foundation (ASF) under one

or more contributor license agreements. See the NOTICE file

distributed with this work for additional information

regarding copyright ownership. The ASF licenses this file

to you under the Apache License, Version 2.0 (the

"License"); you may not use this file except in compliance

with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,

software distributed under the License is distributed on an

"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

KIND, either express or implied. See the License for the

specific language governing permissions and limitations

under the License.

Content-based recommender using Deep Structured Semantic Model

An example of how to build a Deep Structured Semantic Model (DSSM) for incorporating complex content-based features into a recommender system. See Learning Deep Structured Semantic Models for Web Search using Clickthrough Data. This example does not attempt to provide a datasource or train a model, but merely show how to structure a complex DSSM network.

python

import warnings

import mxnet as mx
from mxnet import gluon, np, npx, autograd, sym
import numpy as onp
from sklearn.random_projection import johnson_lindenstrauss_min_dim

python

# Define some constants
max_user = int(1e5)
title_vocab_size = int(3e4)
query_vocab_size = int(3e4)
num_samples = int(1e4)
hidden_units = 128
epsilon_proj = 0.25

ctx = mx.gpu() if mx.device.num_gpus() > 0 else mx.cpu()

Bag of words random projection

A previous version of this example contained a bag of word random projection example, it is kept here for reference but not used in the next example. Random Projection is a dimension reduction technique that guarantees the disruption of the pair-wise distance between your original data point within a certain bound. What is even more interesting is that the dimension to project onto to guarantee that bound does not depend on the original number of dimension but solely on the total number of datapoints. You can see more explanation in this blog post

python

proj_dim = johnson_lindenstrauss_min_dim(num_samples, epsilon_proj)
print("To keep a distance disruption ~< {}% of our {} samples we need to randomly project to at least {} dimensions".format(epsilon_proj*100, num_samples, proj_dim))

python

class BagOfWordsRandomProjection(gluon.HybridBlock):
    def __init__(self, vocab_size, output_dim, random_seed=54321, pad_index=0):
        """
        :param int vocab_size: number of element in the vocabulary
        :param int output_dim: projection dimension
        :param int ramdon_seed: seed to use to guarantee same projection
        :param int pad_index: index of the vocabulary used for padding sentences
        """
        super(BagOfWordsRandomProjection, self).__init__()
        self._vocab_size = vocab_size
        self._output_dim = output_dim
        proj = self._random_unit_vecs(vocab_size=vocab_size, output_dim=output_dim, random_seed=random_seed)
        # we set the projection of the padding word to 0
        proj[pad_index, :] = 0
        self.proj = self.params.get_constant('proj', value=proj)

    def _random_unit_vecs(self, vocab_size, output_dim, random_seed):
        rs = onp.random.RandomState(seed=random_seed)
        W = rs.normal(size=(vocab_size, output_dim))
        Wlen = np.linalg.norm(W, axis=1)
        W_unit = W / Wlen[:,None]
        return W_unit

    def forward(self, x, proj):
        """
        :param nd or sym F:
        :param nd.NDArray x: index of tokens
        returns the sum of the projected embeddings of each token
        """
        embedded = npx.embedding(x, proj, input_dim=self._vocab_size, output_dim=self._output_dim)
        return embedded.sum(axis=1)

python

bowrp = BagOfWordsRandomProjection(1000, 20)
bowrp.initialize()

python

bowrp(mx.np.array([[10, 50, 100], [5, 10, 0]]))

With padding:

python

bowrp(mx.np.array([[10, 50, 100, 0], [5, 10, 0, 0]]))

Content-based recommender / ranking system using DSSM

For example in the search result ranking problem: You have users, that have performed text-based searches. They were presented with results, and selected one of them. Results are composed of a title and an image.

Your positive examples will be the clicked items in the search results, and the negative examples are sampled from the non-clicked examples.

The network will jointly learn embeddings for users and query text making up the "Query", title and image making the "Item" and learn how similar they are.

After training, you can index the embeddings for your items and do a knn search with your query embeddings using the cosine similarity to return ranked items

python

proj_dim = 128

python

class DSSMRecommenderNetwork(gluon.HybridBlock):
    def __init__(self, query_vocab_size, proj_dim, max_user, title_vocab_size, hidden_units, random_seed=54321, p=0.5):
        super(DSSMRecommenderNetwork, self).__init__()
            
        # User/Query pipeline
        self.user_embedding = gluon.nn.Embedding(max_user, proj_dim)
        self.user_mlp = gluon.nn.Dense(hidden_units, activation="relu")
        
        # Instead of bag of words, we use learned embeddings + stacked biLSTM average
        self.query_text_embedding = gluon.nn.Embedding(query_vocab_size, proj_dim)
        self.query_lstm = gluon.rnn.LSTM(hidden_units, 2, bidirectional=True)
        self.query_text_mlp = gluon.nn.Dense(hidden_units, activation="relu")            
        
        self.query_dropout = gluon.nn.Dropout(p)
        self.query_mlp = gluon.nn.Dense(hidden_units, activation="relu")

        # Item pipeline
        # Instead of bag of words, we use learned embeddings + stacked biLSTM average
        self.title_embedding = gluon.nn.Embedding(title_vocab_size, proj_dim)
        self.title_lstm = gluon.rnn.LSTM(hidden_units, 2, bidirectional=True)
        self.title_mlp = gluon.nn.Dense(hidden_units, activation="relu")
        
        # You could use vgg here for example
        self.image_embedding = gluon.model_zoo.vision.resnet18_v2(pretrained=False).features 
        self.image_mlp = gluon.nn.Dense(hidden_units, activation="relu")
        
        self.item_dropout = gluon.nn.Dropout(p)
        self.item_mlp = gluon.nn.Dense(hidden_units, activation="relu")
    
    def forward(self, user, query_text, title, image):
        # Query
        user = self.user_embedding(user)
        user = self.user_mlp(user)

        query_text = self.query_text_embedding(query_text)
        query_text = self.query_lstm(query_text.transpose((1,0,2)))
        # average the states
        query_text = query_text.mean(axis=0)
        query_text = self.query_text_mlp(query_text)
        
        query = np.concatenate([user, query_text])
        query = self.query_dropout(query)
        query = self.query_mlp(query)
        
        # Item
        title_text = self.title_embedding(title)
        title_text = self.title_lstm(title_text.transpose((1,0,2)))
        # average the states
        title_text = title_text.mean(axis=0)
        title_text = self.title_mlp(title_text)
        
        image = self.image_embedding(image)
        image = self.image_mlp(image)
        
        item = np.concatenate([title_text, image])
        item = self.item_dropout(item)
        item = self.item_mlp(item)
        
        # Cosine Similarity
        query = query.expand_dims(axis=2)
        item = item.expand_dims(axis=2)
        sim = npx.batch_dot(query, item, transpose_a=True) / np.expand_dims((np.norm(query, axis=1) * np.norm(item, axis=1) + 1e-9), axis=2)
        
        return sim.squeeze(axis=2)

python

network = DSSMRecommenderNetwork(
    query_vocab_size,
    proj_dim,
    max_user,
    title_vocab_size,
    hidden_units
)


network.initialize(mx.init.Xavier(), ctx)

# Load pre-trained vgg16 weights
with network.name_scope():
    network.image_embedding = gluon.model_zoo.vision.resnet18_v2(pretrained=True, ctx=ctx).features

It is quite hard to visualize the network since it is relatively complex but you can see the two-pronged structure, and the resnet18 branch

python

mx.viz.plot_network(network(
                        mx.sym.var('user'), mx.sym.var('query_text'), mx.sym.var('title'), mx.sym.var('image')),
                    shape={'user': (1,1), 'query_text': (1,30), 'title': (1,30), 'image': (1,3,224,224)},
                    node_attrs={"fixedsize":"False"})

We can print the summary of the network using dummy data. We can see it is already training on 32M parameters!

python

user  = mx.np.array([[200], [100]], ctx)
query = mx.np.array([[10, 20, 0, 0, 0], [40, 50, 0, 0, 0]], ctx) # Example of an encoded text
title = mx.np.array([[10, 20, 0, 0, 0], [40, 50, 0, 0, 0]], ctx) # Example of an encoded text
image = mx.np.random.uniform(size=(2,3, 224,224), ctx=ctx) # Example of an encoded image


network.summary(user, query, title, image)

python

network(user, query, title, image)

The output is the similarity, if we wanted to train it on real data, we would need to minimize the Cosine loss, 1 - cosine_similarity.