Hello. There are many implementations of the "find common friends" algorithm on the Internet, in many languages. I had some time today, so I worked through a Scala version myself.
The complete code is available on GitHub: https://github.com/benben7466/SparkDemo/blob/master/spark-test/src/main/scala/testCommendFriend.scala
Input data (one record per line, in the form person:friend1,friend2,...):
A:B,C,D,F,E,O
B:A,C,E,K
C:F,A,D,I
D:A,E,F,L
E:B,C,D,M,L
F:A,B,C,D,E,O,M
G:A,C,D,E,F
H:A,C,D,E,O
I:A,O
J:B,O
K:A,C,D
L:D,E,F
M:E,F,G
O:A,H,I,J
Core algorithm:
package chunbo.recommend

import org.apache.spark.SparkContext

// Common friend statistics
// Reference: http://www.cnblogs.com/charlesblc/p/6126346.html
object testCommendFriend {
  def index(_spark_sc: SparkContext): Unit = {

    // Read the input data, one "person:friend1,friend2,..." record per line
    val friendRDD = _spark_sc.textFile(Config.HDFS_HOSH + "test/common_friend")

    // Map: parse each line into (person, list of friends)
    val friendKV = friendRDD.map(x => {
      val fields = x.split(":")
      val person = fields(0)
      val friends = fields(1).split(",").toList
      (person, friends)
    })

    // Emit one (friend, person) pair for every friend in a person's list
    val mapRDD = friendKV.flatMap { case (person, friends) =>
      friends.map(friend => (friend, person))
    }

    // Reduce: join all persons who list the same friend with "::"
    val reduceRDD = mapRDD.reduceByKey(_ + "::" + _)

    // Print
    reduceRDD.foreach(println)

  }

}
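The map and reduce steps above can also be tried without a cluster. The following is only a minimal sketch on plain Scala collections, not the code from the repository: the object name LocalCommonFriend and the method invert are made up for illustration, and the names are sorted before joining so that the result is stable, which the Spark job itself does not do.

```scala
// Hypothetical local stand-in for the Spark job (names are illustrative only).
object LocalCommonFriend {

  // Turns "person:friend1,friend2,..." records into friend -> "person1::person2::..."
  def invert(lines: Seq[String]): Map[String, String] =
    lines
      .flatMap { line =>
        val fields = line.split(":")
        val person = fields(0)
        // Same idea as the flatMap step: one (friend, person) pair per friend
        fields(1).split(",").toList.map(friend => (friend, person))
      }
      .groupBy(_._1) // stands in for grouping by key in reduceByKey
      .map { case (friend, pairs) =>
        // Sorting is an addition of this sketch, to get a deterministic string
        friend -> pairs.map(_._2).sorted.mkString("::")
      }
}
```

For example, LocalCommonFriend.invert(Seq("D:A,E,F,L", "E:B,C,D,M,L"))("L") gives "D::E", which matches the (L,D::E) row of the Spark output.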
Running the Spark job on the input data prints the following output:
(L,D::E)
(B,A::E::F::J)
(J,O)
(H,O)
(F,A::C::D::G::L::M)
(D,A::C::E::F::G::H::K::L)
(G,M)
(M,E::F)
(O,A::F::H::I::J)
(A,B::C::D::F::G::H::I::K::O)
(I,C::O)
(K,B)
(C,A::B::E::F::G::H::K)
(E,A::B::D::F::G::H::L::M)
Explanation:
In each output tuple, the element to the left of the comma is a common friend of all the users in the "::"-separated collection on the right.
For example, (L,D::E) means that L is a common friend of users D and E.
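To make this reading of the output concrete, one output row can be expanded into every pair of users who share that friend. This is a hedged sketch rather than part of the original job: the object name PairSketch and the method pairsFor are hypothetical, and the users are sorted only so that each pair comes out in a fixed order.

```scala
// Hypothetical helper for reading one output row (not in the original code).
object PairSketch {

  // Expands a row such as ("B", "A::E::F::J") into ((user1, user2), friend) entries,
  // one entry for every pair of users who both list `friend`.
  def pairsFor(friend: String, users: String): List[((String, String), String)] =
    users
      .split("::")
      .toList
      .sorted
      .combinations(2) // yields nothing when only one user lists the friend
      .map(pair => ((pair(0), pair(1)), friend))
      .toList
}
```

With the row (L,D::E), PairSketch.pairsFor("L", "D::E") returns List(((D,E),L)). A row with a single user, such as (K,B), produces no pairs, because there K is not shared between two users.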
Reference: http://www.cnblogs.com/charlesblc/p/6126346.html