Find Common Friends - Data Mining - Scala Edition

Hello! There are many implementations of the "find common friends" algorithm available online in various languages. I had some time today, so I worked out a Scala version myself.

The complete code is available on GitHub: https://github.com/benben7466/SparkDemo/blob/master/spark-test/src/main/scala/testCommendFriend.scala

 

Input data:

A:B,C,D,F,E,O
B:A,C,E,K
C:F,A,D,I
D:A,E,F,L
E:B,C,D,M,L
F:A,B,C,D,E,O,M
G:A,C,D,E,F
H:A,C,D,E,O
I:A,O
J:B,O
K:A,C,D
L:D,E,F
M:E,F,G
O:A,H,I,J
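
Each line follows the pattern person:friend1,friend2,..., i.e. the user before the colon followed by that user's friend list. As a quick check of how one such line gets parsed (a minimal plain-Scala sketch, independent of Spark and not part of the original code; the object name is made up for illustration):

object ParseLineDemo {
  def main(args: Array[String]): Unit = {
    //Split "A:B,C,D,F,E,O" into the user and the friend list
    val line = "A:B,C,D,F,E,O"
    val fields = line.split(":")
    val person = fields(0)
    val friends = fields(1).split(",").toList
    println((person, friends)) //prints (A,List(B, C, D, F, E, O))
  }
}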

 

Core algorithm:

package chunbo.recommend

import org.apache.spark.SparkContext

//Common friend statistics
//Reference: http://www.cnblogs.com/charlesblc/p/6126346.html
object testCommendFriend {
  def index(_spark_sc: SparkContext): Unit = {

    //Read the input; each line looks like "A:B,C,D,F,E,O"
    //(Config.HDFS_HOSH is a path prefix defined elsewhere in the project)
    val friendRDD = _spark_sc.textFile(Config.HDFS_HOSH + "test/common_friend")

    //Map: parse each line into (person, that person's friend list)
    val friendKV = friendRDD.map(x => {
      val fields = x.split(":")
      val person = fields(0)
      val friends = fields(1).split(",").toList
      (person, friends)
    })

    //Invert the relation: emit one (friend, person) pair for every friend in the list
    val mapRDD = friendKV.flatMap(x => {
      for (i <- 0 until x._2.length) yield (x._2(i), x._1)
    })

    //Reduce: for each friend, concatenate every person who lists them as a friend
    val reduceRDD = mapRDD.reduceByKey(_ + "::" + _)

    //Print the result
    reduceRDD.foreach(println)

  }

}
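
To run this, you need a SparkContext and an input file at the path the code builds from Config.HDFS_HOSH (a path prefix defined elsewhere in the project). A minimal driver sketch is below; the object name, app name, and local[*] master are example values I am assuming here, not taken from the original repo:

package chunbo.recommend

import org.apache.spark.{SparkConf, SparkContext}

//Hypothetical driver: builds a local SparkContext and invokes the job above.
object RunCommonFriend {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CommonFriendDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    testCommendFriend.index(sc)
    sc.stop()
  }
}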

 

The output data is as follows:

(L,D::E)
(B,A::E::F::J)
(J,O)
(H,O)
(F,A::C::D::G::L::M)
(D,A::C::E::F::G::H::K::L)
(G,M)
(M,E::F)
(O,A::F::H::I::J)
(A,B::C::D::F::G::H::I::K::O)
(I,C::O)
(K,B)
(C,A::B::E::F::G::H::K)
(E,A::B::D::F::G::H::L::M)

Explanation:

In each output pair, the element on the left is a common friend of every user in the collection on the right (the users are separated by ::).

For example, in (L,D::E), L is a common friend of users D and E.
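
The code above stops at this friend-to-users view. If you instead want the result keyed by user pairs, i.e. for every two users the list of friends they share, a common follow-up step (a sketch along the lines of the referenced article, not part of the code above) is to sort the users sharing each friend, emit one record per pair, and reduce again:

//Second stage (sketch): reduceRDD is the (friend, "A::B::...") RDD produced above
val pairRDD = reduceRDD.flatMap { case (friend, people) =>
  val persons = people.split("::").sorted //sorting makes (A,B) and (B,A) the same key
  for {
    i <- 0 until persons.length
    j <- i + 1 until persons.length
  } yield ((persons(i), persons(j)), friend)
}
val commonFriends = pairRDD.reduceByKey(_ + "::" + _)
commonFriends.foreach(println) //for the sample data this includes ((D,E),L)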


Reference: http://www.cnblogs.com/charlesblc/p/6126346.html
