Objectives:
Master the improvements NeuralCF makes over traditional collaborative filtering based on matrix factorization, as well as the advantages and disadvantages of the algorithm.
Content:
The first part studies the most classic recommendation algorithm, collaborative filtering: matrix factorization produces embedding vectors for users and items, and the similarity between them, obtained via dot product, can be used for ranking and recommendation. However, traditional collaborative filtering predicts directly from a very sparse co-occurrence matrix, so the model generalizes poorly; for users with very little historical behavior it cannot produce accurate recommendations. Moreover, matrix factorization models the interaction between the user vector and the item vector with a simple inner product, so its fitting ability is also weak.
- Improvement points
1. Can we use deep learning to improve the collaborative filtering algorithm? This concerns both how the user and item embedding vectors are computed and how their similarity (the final dot product) is calculated.
2. Researchers in Singapore improved traditional collaborative filtering with a deep neural network; the resulting model is called NeuralCF (Neural Collaborative Filtering).
Algorithm idea:
Several algorithm ideas are compared below.
- 1. Principle of the matrix factorization algorithm
The co-occurrence matrix is factorized into two small matrices whose product approximates it; the rows of these small matrices are the user and item embedding vectors (a runnable sketch follows this list).
- 2. Traditional dot product for similarity
- 3. Basic idea of NeuralCF
The improvement is to replace the original dot product interaction with an MLP.
- 4. Improved version - double tower model
- The output of the last layer on the user side is treated as the user-side embedding.
- The output of the last layer on the item side is treated as the item-side embedding.
- Advantage: for online recommendation, the user-side and item-side embeddings can be precomputed and cached; similarity is then obtained by directly taking the dot product of the cached user and item embeddings (see the serving sketch after the double tower code below).
- 5. Improved version 2 - double tower model + MLP
The dot product interaction is still too simple to fit complex patterns, so the dot product is replaced with an MLP.
- 6. Improved version 3 - double tower model + multi-feature combination + MLP
The embeddings so far use only the user/item IDs from the co-occurrence matrix and ignore the other inherent attributes of items and users, so too few features are used. More features can therefore be fed into the multi-layer neural networks on the user side and the item side, making full use of the available features.
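As a concrete reference for ideas 1 and 2, here is a minimal, self-contained sketch of matrix factorization with TensorFlow; the toy co-occurrence matrix, embedding dimension, and training loop are illustrative assumptions, not course code:

```python
import tensorflow as tf

# Toy co-occurrence (rating) matrix: 4 users x 5 items, 0 = unobserved (assumed data)
R = tf.constant([[5, 3, 0, 1, 0],
                 [4, 0, 0, 1, 0],
                 [1, 1, 0, 5, 4],
                 [0, 1, 5, 4, 0]], dtype=tf.float32)
mask = tf.cast(R > 0, tf.float32)  # only fit the observed entries

k = 3  # embedding dimension (hypothetical)
U = tf.Variable(tf.random.normal([4, k], stddev=0.1))  # user embedding matrix
V = tf.Variable(tf.random.normal([5, k], stddev=0.1))  # item embedding matrix

opt = tf.keras.optimizers.Adam(0.1)
for _ in range(500):
    with tf.GradientTape() as tape:
        pred = tf.matmul(U, V, transpose_b=True)      # idea 2: dot product scores
        loss = tf.reduce_sum(mask * (R - pred) ** 2)  # squared error on observed cells
    grads = tape.gradient(loss, [U, V])
    opt.apply_gradients(zip(grads, [U, V]))

# Score user 0 against all items with a dot product, then rank the items
scores = tf.linalg.matvec(V, U[0])
print(tf.argsort(scores, direction='DESCENDING').numpy())
```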
Model Code:
GitHub address: github source code
For example:
1. NeuralCF basic model
```python
# neural cf model arch two. only embedding in each tower, then MLP as the interaction layers
def neural_cf_model_1(feature_inputs, item_feature_columns, user_feature_columns, hidden_units):
    item_tower = tf.keras.layers.DenseFeatures(item_feature_columns)(feature_inputs)
    user_tower = tf.keras.layers.DenseFeatures(user_feature_columns)(feature_inputs)
    interact_layer = tf.keras.layers.concatenate([item_tower, user_tower])
    for num_nodes in hidden_units:
        interact_layer = tf.keras.layers.Dense(num_nodes, activation='relu')(interact_layer)
    output_layer = tf.keras.layers.Dense(1, activation='sigmoid')(interact_layer)
    neural_cf_model = tf.keras.Model(feature_inputs, output_layer)
    return neural_cf_model
```
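To make the snippet concrete, here is a hedged usage sketch. The column names, vocabulary sizes, embedding dimension, and `train_dataset` are assumptions for illustration, not taken from the course repository:

```python
# Hypothetical id-based embedding feature columns for the two towers
movie_col = tf.feature_column.embedding_column(
    tf.feature_column.categorical_column_with_identity('movieId', num_buckets=1001), 10)
user_col = tf.feature_column.embedding_column(
    tf.feature_column.categorical_column_with_identity('userId', num_buckets=30001), 10)
inputs = {
    'movieId': tf.keras.layers.Input(name='movieId', shape=(), dtype='int32'),
    'userId': tf.keras.layers.Input(name='userId', shape=(), dtype='int32'),
}

model = neural_cf_model_1(inputs, [movie_col], [user_col], [10, 10])
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy',
                       tf.keras.metrics.AUC(curve='ROC'),
                       tf.keras.metrics.AUC(curve='PR')])
# model.fit(train_dataset, epochs=5)  # train_dataset: a tf.data.Dataset of (features, label)
```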
2. Improved version - double tower model
```python
# neural cf model arch one. embedding+MLP in each tower, then dot product layer as the output
def neural_cf_model_2(feature_inputs, item_feature_columns, user_feature_columns, hidden_units):
    item_tower = tf.keras.layers.DenseFeatures(item_feature_columns)(feature_inputs)
    for num_nodes in hidden_units:
        item_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(item_tower)
    user_tower = tf.keras.layers.DenseFeatures(user_feature_columns)(feature_inputs)
    for num_nodes in hidden_units:
        user_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(user_tower)
    output = tf.keras.layers.Dot(axes=1)([item_tower, user_tower])
    output = tf.keras.layers.Dense(1, activation='sigmoid')(output)
    # output = tf.keras.layers.Dense(1)(output)
    neural_cf_model = tf.keras.Model(feature_inputs, output)
    return neural_cf_model
```
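This architecture enables the serving advantage noted earlier: the two towers can be run offline and their outputs cached as embeddings, so online recommendation only needs a dot product. A minimal sketch, assuming `item_model` and `user_model` are sub-models exposing each tower's final layer and that the feature arrays are precomputed (all names here are illustrative):

```python
import numpy as np

# Offline: run the item tower over the full catalogue and cache the embeddings.
item_embeddings = item_model.predict(all_item_features)    # shape (num_items, dim)

# Online: one pass through the user tower, then a dot product against the cache.
user_embedding = user_model.predict(one_user_features)[0]  # shape (dim,)
scores = item_embeddings @ user_embedding                  # similarity to every item
top_10 = np.argsort(-scores)[:10]                          # indices of the best items
```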
The results show that the accuracy is not very high and the model is seriously underfitted.
3. Improved version 2 - double tower model + MLP
```python
# neural cf model arch one. embedding+MLP in each tower, then MLP layer as the output
def neural_cf_model_3(feature_inputs, item_feature_columns, user_feature_columns, hidden_units):
    item_tower = tf.keras.layers.DenseFeatures(item_feature_columns)(feature_inputs)
    for num_nodes in hidden_units:
        item_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(item_tower)
    user_tower = tf.keras.layers.DenseFeatures(user_feature_columns)(feature_inputs)
    for num_nodes in hidden_units:
        user_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(user_tower)
    output = tf.keras.layers.concatenate([item_tower, user_tower])
    # output = tf.keras.layers.Dot(axes=1)([item_tower, user_tower])
    for num_nodes in hidden_units:
        output = tf.keras.layers.Dense(num_nodes, activation='relu')(output)
    output = tf.keras.layers.Dense(1, activation='sigmoid')(output)
    # output = tf.keras.layers.Dense(1)(output)
    neural_cf_model = tf.keras.Model(feature_inputs, output)
    return neural_cf_model
```
The running results show that this model's loss is lower and its accuracy higher:

```
Test Loss 0.19877538084983826, Test Accuracy 0.6881847977638245, Test ROC AUC 0.7592607140541077, Test PR AUC 0.7094590663909912
```
4. Improved version 3 - double tower model + multi-feature combination + MLP
Final version:
```python
# neural cf model arch one. embedding+MLP in each tower (with extra side features), then MLP layer as the output
def neural_cf_model_4(feature_inputs, item_feature_columns, user_feature_columns,
                      item_extra_feature_columns, user_extra_feature_columns, hidden_units):
    # item tower: id embedding concatenated with the item's other attribute features
    item_tower = tf.keras.layers.DenseFeatures(item_feature_columns)(feature_inputs)
    item_extra = tf.keras.layers.DenseFeatures(item_extra_feature_columns)(feature_inputs)
    item_tower = tf.keras.layers.concatenate([item_tower, item_extra])
    for num_nodes in hidden_units:
        item_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(item_tower)
    # user tower: id embedding concatenated with the user's other attribute features
    user_tower = tf.keras.layers.DenseFeatures(user_feature_columns)(feature_inputs)
    user_extra = tf.keras.layers.DenseFeatures(user_extra_feature_columns)(feature_inputs)
    user_tower = tf.keras.layers.concatenate([user_tower, user_extra])
    for num_nodes in hidden_units:
        user_tower = tf.keras.layers.Dense(num_nodes, activation='relu')(user_tower)
    output = tf.keras.layers.concatenate([item_tower, user_tower])
    # output = tf.keras.layers.Dot(axes=1)([item_tower, user_tower])
    for num_nodes in hidden_units:
        output = tf.keras.layers.Dense(num_nodes, activation='relu')(output)
    output = tf.keras.layers.Dense(1, activation='sigmoid')(output)
    # output = tf.keras.layers.Dense(1)(output)
    neural_cf_model = tf.keras.Model(feature_inputs, output)
    return neural_cf_model
```
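For illustration, the extra feature columns could carry numeric attributes of items and users; the names below are assumptions for this sketch, reusing `inputs`, `movie_col`, and `user_col` from the earlier training sketch (whose `inputs` dict would also need `Input` layers for these extra features):

```python
# Hypothetical extra attribute features; the names are illustrative assumptions.
item_extra_columns = [
    tf.feature_column.numeric_column('releaseYear'),
    tf.feature_column.numeric_column('movieAvgRating'),
]
user_extra_columns = [
    tf.feature_column.numeric_column('userRatingCount'),
    tf.feature_column.numeric_column('userAvgRating'),
]
model = neural_cf_model_4(inputs, [movie_col], [user_col],
                          item_extra_columns, user_extra_columns, [10, 10])
```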
Final running results:

```
Test Loss 0.6841861605644226, Test Accuracy 0.6669825315475464, Test ROC AUC 0.715860903263092, Test PR AUC 0.6257403492927551
```
On this dataset the metrics are no better than those of the third model, but in theory, with a large amount of data, the fourth model, which uses the most features, should perform best.