Implementation of CF algorithm based on matrix decomposition (I): LFM
LFM is the Funk SVD matrix factorization mentioned earlier.
Analysis of LFM principle
The core idea of LFM (Latent Factor Model) is to connect users and items through latent features, as shown in the figure below:
- The P matrix is the user-LF matrix, i.e., the matrix of users and latent features. In the figure there are three LFs, meaning three latent features are assumed.
- The Q matrix is the LF-item matrix, i.e., the matrix of latent features and items.
- The R matrix is the user-item matrix, obtained from P*Q.
- This approach can handle a sparse rating matrix.
Using matrix factorization, the original user-item rating matrix (dense or sparse) is decomposed into the P and Q matrices, and the product $P \cdot Q$ then reconstructs the user-item rating matrix $R$. The whole process is equivalent to dimensionality reduction, in which:
- The value $P_{11}$ represents the weight of user 1 on latent feature 1.
- The value $Q_{11}$ represents the weight of latent feature 1 on item 1.
- The value $R_{11}$ represents the predicted rating of user 1 on item 1, with $R_{11} = \vec{P_{1,k}} \cdot \vec{Q_{k,1}}$.
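To make the shapes concrete, here is a minimal NumPy sketch; the matrices, sizes, and the latent dimension k = 3 are all made-up illustrative values:

```python
import numpy as np

np.random.seed(0)
n_users, n_items, k = 4, 5, 3   # toy sizes; k = number of latent features

P = np.random.rand(n_users, k)  # user-LF matrix, shape (n_users, k)
Q = np.random.rand(k, n_items)  # LF-item matrix, shape (k, n_items)

R = P @ Q                       # reconstructed user-item rating matrix, shape (n_users, n_items)

# R[0, 0] is the predicted rating of user 1 on item 1: the dot product of
# user 1's latent vector and item 1's latent vector
assert np.isclose(R[0, 0], np.dot(P[0, :], Q[:, 0]))
print(R.shape)  # (4, 5)
```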
Use LFM to predict a user's rating of an item, where $K$ denotes the number of latent features:

$$\hat{r}_{ui} = \vec{p_u} \cdot \vec{q_i} = \sum_{k=1}^{K} p_{uk} q_{ik}$$
Therefore, our final goal is to learn the P and Q matrices and every value in them, and then use them to predict user-item ratings.
Loss function

Similarly, for rating prediction we use the squared error to construct the loss function:

$$SSE = \sum_{u,i\in R} \left(r_{ui} - \hat{r}_{ui}\right)^2 = \sum_{u,i\in R} \left(r_{ui} - \sum_{k=1}^{K} p_{uk} q_{ik}\right)^2$$

Adding L2 regularization:
$$Cost = \sum_{u,i\in R} \left(r_{ui} - \sum_{k=1}^{K} p_{uk} q_{ik}\right)^2 + \lambda\left(\sum_U p_{uk}^2 + \sum_I q_{ik}^2\right)$$
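As a sanity check, the cost can be computed directly from the observed ratings. A minimal sketch, assuming P and Q are stored as dicts mapping user/item ids to latent vectors (the same layout the implementation below uses); lfm_cost is a hypothetical helper, not part of the model class:

```python
import numpy as np

def lfm_cost(ratings, P, Q, lam=0.01):
    """Regularized squared-error cost over observed (uid, iid, r_ui) triples."""
    sse = sum((r_ui - np.dot(P[uid], Q[iid])) ** 2 for uid, iid, r_ui in ratings)
    reg = lam * (sum(np.sum(p ** 2) for p in P.values()) +
                 sum(np.sum(q ** 2) for q in Q.values()))
    return sse + reg
```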
Taking partial derivatives of the loss function:

$$\frac{\partial Cost}{\partial p_{uk}} = -2 \sum_{u,i\in R} \left(r_{ui} - \sum_{k=1}^{K} p_{uk} q_{ik}\right) q_{ik} + 2\lambda p_{uk}$$

$$\frac{\partial Cost}{\partial q_{ik}} = -2 \sum_{u,i\in R} \left(r_{ui} - \sum_{k=1}^{K} p_{uk} q_{ik}\right) p_{uk} + 2\lambda q_{ik}$$
Stochastic gradient descent optimization

Gradient descent update for $p_{uk}$:

$$p_{uk} := p_{uk} + \alpha\left[\sum_{u,i\in R} \left(r_{ui} - \sum_{k=1}^{K} p_{uk} q_{ik}\right) q_{ik} - \lambda p_{uk}\right]$$

Similarly, for $q_{ik}$:

$$q_{ik} := q_{ik} + \alpha\left[\sum_{u,i\in R} \left(r_{ui} - \sum_{k=1}^{K} p_{uk} q_{ik}\right) p_{uk} - \lambda q_{ik}\right]$$

Stochastic gradient descent updates one sample at a time; the sum over $k$ is just a vector dot product (multiply the components and sum):

$$p_{uk} := p_{uk} + \alpha\left[\left(r_{ui} - \sum_{k=1}^{K} p_{uk} q_{ik}\right) q_{ik} - \lambda_1 p_{uk}\right]$$

$$q_{ik} := q_{ik} + \alpha\left[\left(r_{ui} - \sum_{k=1}^{K} p_{uk} q_{ik}\right) p_{uk} - \lambda_2 q_{ik}\right]$$
Because the P matrix and the Q matrix are two different matrices, they usually use different regularization parameters, such as $\lambda_1$ and $\lambda_2$.
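The single-sample update translates directly into vectorized code. A minimal sketch; sgd_step is a hypothetical standalone function that mirrors what the sgd() method below does for one rating:

```python
import numpy as np

def sgd_step(p_u, q_i, r_ui, alpha=0.02, reg_p=0.01, reg_q=0.01):
    """One SGD step for a single observed rating r_ui.

    p_u, q_i: latent vectors (1-D arrays of length k) of the user and item.
    Returns updated copies of both vectors.
    """
    err = r_ui - np.dot(p_u, q_i)                      # prediction error
    p_u_new = p_u + alpha * (err * q_i - reg_p * p_u)
    q_i_new = q_i + alpha * (err * p_u - reg_q * q_i)  # uses the old p_u
    return p_u_new, q_i_new
```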
Algorithm implementation
```python
'''
LFM Model
'''
import pandas as pd
import numpy as np

# Rating prediction on a 1-5 scale
class LFM(object):

    def __init__(self, alpha, reg_p, reg_q, number_LatentFactors=10, number_epochs=10,
                 columns=["uid", "iid", "rating"]):
        self.alpha = alpha  # learning rate
        self.reg_p = reg_p  # regularization for the P matrix
        self.reg_q = reg_q  # regularization for the Q matrix
        self.number_LatentFactors = number_LatentFactors  # number of latent features
        self.number_epochs = number_epochs  # maximum number of iterations
        self.columns = columns

    def fit(self, dataset):
        '''
        fit dataset
        :param dataset: uid, iid, rating
        :return:
        '''
        self.dataset = pd.DataFrame(dataset)

        self.users_ratings = self.dataset.groupby(self.columns[0]).agg([list])[[self.columns[1], self.columns[2]]]
        self.items_ratings = self.dataset.groupby(self.columns[1]).agg([list])[[self.columns[0], self.columns[2]]]

        self.globalMean = self.dataset[self.columns[2]].mean()

        self.P, self.Q = self.sgd()

    def _init_matrix(self):
        '''
        Initialize the P and Q matrices with random values between 0 and 1
        :return:
        '''
        # User-LF
        P = dict(zip(
            self.users_ratings.index,
            np.random.rand(len(self.users_ratings), self.number_LatentFactors).astype(np.float32)
        ))
        # Item-LF
        Q = dict(zip(
            self.items_ratings.index,
            np.random.rand(len(self.items_ratings), self.number_LatentFactors).astype(np.float32)
        ))
        return P, Q

    def sgd(self):
        '''
        Optimize P and Q with stochastic gradient descent
        :return:
        '''
        P, Q = self._init_matrix()

        for i in range(self.number_epochs):
            print("iter%d" % i)
            error_list = []
            for uid, iid, r_ui in self.dataset.itertuples(index=False):
                v_pu = P[uid]  # user vector
                v_qi = Q[iid]  # item vector
                err = np.float32(r_ui - np.dot(v_pu, v_qi))

                # update both vectors from the same (pre-update) values
                v_pu_old = v_pu.copy()
                v_pu = v_pu + self.alpha * (err * v_qi - self.reg_p * v_pu)
                v_qi = v_qi + self.alpha * (err * v_pu_old - self.reg_q * v_qi)

                P[uid] = v_pu
                Q[iid] = v_qi

                error_list.append(err ** 2)
            print(np.sqrt(np.mean(error_list)))  # training RMSE for this epoch
        return P, Q

    def predict(self, uid, iid):
        # If uid or iid is unknown, fall back to the global mean rating
        if uid not in self.users_ratings.index or iid not in self.items_ratings.index:
            return self.globalMean

        p_u = self.P[uid]
        q_i = self.Q[iid]

        return np.dot(p_u, q_i)

    def test(self, testset):
        '''Predict on the test set'''
        for uid, iid, real_rating in testset.itertuples(index=False):
            try:
                pred_rating = self.predict(uid, iid)
            except Exception as e:
                print(e)
            else:
                yield uid, iid, real_rating, pred_rating


if __name__ == '__main__':
    dtype = [("userId", np.int32), ("movieId", np.int32), ("rating", np.float32)]
    dataset = pd.read_csv("datasets/ml-latest-small/ratings.csv", usecols=range(3), dtype=dict(dtype))

    lfm = LFM(0.02, 0.01, 0.01, 10, 100, ["userId", "movieId", "rating"])
    lfm.fit(dataset)

    while True:
        uid = input("uid: ")
        iid = input("iid: ")
        print(lfm.predict(int(uid), int(iid)))
```
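The test() generator makes it straightforward to score the model on held-out data. A minimal sketch, assuming testset is a DataFrame with the same three columns as the training set; rmse is a hypothetical helper:

```python
import numpy as np

def rmse(model, testset):
    """Root-mean-square error of the model's predictions over a test set."""
    errors = [(real - pred) ** 2 for _, _, real, pred in model.test(testset)]
    return np.sqrt(np.mean(errors))
```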
Implementation of CF algorithm based on matrix decomposition (II): BiasSvd
BiasSvd is simply the previously mentioned Funk SVD matrix factorization with bias (offset) terms added.
BiasSvd
Use BiasSvd to predict a user's rating of an item, where $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, and $K$ denotes the number of latent features:

$$\hat{r}_{ui} = \mu + b_u + b_i + \sum_{k=1}^{K} p_{uk} q_{ik}$$
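In code the prediction is just a few additions and a dot product; a toy sketch with made-up numbers:

```python
import numpy as np

mu = 3.5                         # global mean rating (made-up value)
b_u, b_i = 0.2, -0.4             # user and item biases (made-up values)
p_u = np.array([0.1, 0.8, 0.3])  # user latent vector, k = 3
q_i = np.array([0.5, 0.2, 0.7])  # item latent vector, k = 3

r_hat = mu + b_u + b_i + np.dot(p_u, q_i)
print(r_hat)  # 3.5 + 0.2 - 0.4 + 0.42 = 3.72
```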
Loss function

Similarly, for rating prediction we use the squared error to construct the loss function:

$$SSE = \sum_{u,i\in R} \left(r_{ui} - \mu - b_u - b_i - \sum_{k=1}^{K} p_{uk} q_{ik}\right)^2$$

Adding L2 regularization:
$$Cost = \sum_{u,i\in R} \left(r_{ui} - \mu - b_u - b_i - \sum_{k=1}^{K} p_{uk} q_{ik}\right)^2 + \lambda\left(\sum_U b_u^2 + \sum_I b_i^2 + \sum_U p_{uk}^2 + \sum_I q_{ik}^2\right)$$
Taking partial derivatives of the loss function (the derivations mirror the LFM case, with extra terms for the biases):

Stochastic gradient descent optimization

Gradient descent update for $p_{uk}$:

$$p_{uk} := p_{uk} + \alpha\left[\sum_{u,i\in R} \left(r_{ui} - \mu - b_u - b_i - \sum_{k=1}^{K} p_{uk} q_{ik}\right) q_{ik} - \lambda p_{uk}\right]$$

Similarly:
$$b_u := b_u + \alpha\left[\sum_{u,i\in R} \left(r_{ui} - \mu - b_u - b_i - \sum_{k=1}^{K} p_{uk} q_{ik}\right) - \lambda b_u\right]$$

$$b_i := b_i + \alpha\left[\sum_{u,i\in R} \left(r_{ui} - \mu - b_u - b_i - \sum_{k=1}^{K} p_{uk} q_{ik}\right) - \lambda b_i\right]$$
Stochastic gradient descent (single-sample updates):

$$b_u := b_u + \alpha\left[\left(r_{ui} - \mu - b_u - b_i - \sum_{k=1}^{K} p_{uk} q_{ik}\right) - \lambda_3 b_u\right]$$

$$b_i := b_i + \alpha\left[\left(r_{ui} - \mu - b_u - b_i - \sum_{k=1}^{K} p_{uk} q_{ik}\right) - \lambda_4 b_i\right]$$
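A minimal single-sample update sketch with the bias terms included; biassvd_sgd_step is a hypothetical standalone function that mirrors what the sgd() method below does for one rating (a single reg value stands in for $\lambda_1$..$\lambda_4$):

```python
import numpy as np

def biassvd_sgd_step(p_u, q_i, b_u, b_i, mu, r_ui, alpha=0.02, reg=0.01):
    """One SGD step for a single rating, with global mean mu and biases b_u, b_i."""
    err = r_ui - mu - b_u - b_i - np.dot(p_u, q_i)    # prediction error
    p_u_new = p_u + alpha * (err * q_i - reg * p_u)
    q_i_new = q_i + alpha * (err * p_u - reg * q_i)   # uses the old p_u
    b_u_new = b_u + alpha * (err - reg * b_u)
    b_i_new = b_i + alpha * (err - reg * b_i)
    return p_u_new, q_i_new, b_u_new, b_i_new
```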
Because the P matrix, the Q matrix and the bias terms are different sets of parameters, they usually use different regularization parameters, such as $\lambda_1$ and $\lambda_2$ for P and Q, and $\lambda_3$ and $\lambda_4$ for the biases.
Algorithm implementation
```python
'''
BiasSvd Model
'''
import pandas as pd
import numpy as np


class BiasSvd(object):

    def __init__(self, alpha, reg_p, reg_q, reg_bu, reg_bi, number_LatentFactors=10,
                 number_epochs=10, columns=["uid", "iid", "rating"]):
        self.alpha = alpha    # learning rate
        self.reg_p = reg_p    # regularization for the P matrix
        self.reg_q = reg_q    # regularization for the Q matrix
        self.reg_bu = reg_bu  # regularization for the user biases
        self.reg_bi = reg_bi  # regularization for the item biases
        self.number_LatentFactors = number_LatentFactors  # number of latent features
        self.number_epochs = number_epochs
        self.columns = columns

    def fit(self, dataset):
        '''
        fit dataset
        :param dataset: uid, iid, rating
        :return:
        '''
        self.dataset = pd.DataFrame(dataset)

        self.users_ratings = self.dataset.groupby(self.columns[0]).agg([list])[[self.columns[1], self.columns[2]]]
        self.items_ratings = self.dataset.groupby(self.columns[1]).agg([list])[[self.columns[0], self.columns[2]]]
        self.globalMean = self.dataset[self.columns[2]].mean()

        self.P, self.Q, self.bu, self.bi = self.sgd()

    def _init_matrix(self):
        '''
        Initialize the P and Q matrices with random values between 0 and 1
        :return:
        '''
        # User-LF
        P = dict(zip(
            self.users_ratings.index,
            np.random.rand(len(self.users_ratings), self.number_LatentFactors).astype(np.float32)
        ))
        # Item-LF
        Q = dict(zip(
            self.items_ratings.index,
            np.random.rand(len(self.items_ratings), self.number_LatentFactors).astype(np.float32)
        ))
        return P, Q

    def sgd(self):
        '''
        Optimize P, Q and the biases with stochastic gradient descent
        :return:
        '''
        P, Q = self._init_matrix()

        # Initialize bu and bi to all zeros
        bu = dict(zip(self.users_ratings.index, np.zeros(len(self.users_ratings))))
        bi = dict(zip(self.items_ratings.index, np.zeros(len(self.items_ratings))))

        for i in range(self.number_epochs):
            print("iter%d" % i)
            error_list = []
            for uid, iid, r_ui in self.dataset.itertuples(index=False):
                v_pu = P[uid]  # user vector
                v_qi = Q[iid]  # item vector
                err = np.float32(r_ui - self.globalMean - bu[uid] - bi[iid] - np.dot(v_pu, v_qi))

                # update both latent vectors from the same (pre-update) values
                v_pu_old = v_pu.copy()
                v_pu = v_pu + self.alpha * (err * v_qi - self.reg_p * v_pu)
                v_qi = v_qi + self.alpha * (err * v_pu_old - self.reg_q * v_qi)
                P[uid] = v_pu
                Q[iid] = v_qi

                bu[uid] += self.alpha * (err - self.reg_bu * bu[uid])
                bi[iid] += self.alpha * (err - self.reg_bi * bi[iid])

                error_list.append(err ** 2)
            print(np.sqrt(np.mean(error_list)))  # training RMSE for this epoch
        return P, Q, bu, bi

    def predict(self, uid, iid):
        # If uid or iid is unknown, fall back to the global mean rating
        if uid not in self.users_ratings.index or iid not in self.items_ratings.index:
            return self.globalMean

        p_u = self.P[uid]
        q_i = self.Q[iid]

        return self.globalMean + self.bu[uid] + self.bi[iid] + np.dot(p_u, q_i)


if __name__ == '__main__':
    dtype = [("userId", np.int32), ("movieId", np.int32), ("rating", np.float32)]
    dataset = pd.read_csv("datasets/ml-latest-small/ratings.csv", usecols=range(3), dtype=dict(dtype))

    bsvd = BiasSvd(0.02, 0.01, 0.01, 0.01, 0.01, 10, 20)
    bsvd.fit(dataset)

    while True:
        uid = input("uid: ")
        iid = input("iid: ")
        print(bsvd.predict(int(uid), int(iid)))
```