1. Introduction
Put simply, friend recommendation means predicting whether two people who are not yet friends know each other and, if so, recommending them to each other as friends.
2. Approach
For two users who are not friends, the more common friends they have, the more likely they are to know each other.
For example, suppose the raw data is as follows:
Tom Cat Hello Hadoop Spring
Cat Hello Spring
Hello Tom Netty Hadoop Cat
Hadoop Tom Hello Netty Spring
Spring Tom Cat Hadoop
Netty Hello Hadoop
Each line is one user's friend list: the first name is the user, and the names after it are his friends.
From this data we want to produce:
Tom Netty 2
Cat Netty 1
Cat Hadoop 3
Hello Spring 3
Spring Netty 1
Each output line names two users who are predicted to know each other, followed by the number of friends they have in common.
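To check the expected output by hand, the whole map, shuffle and reduce flow can be simulated in memory. The following is a minimal sketch in plain Java, not Hadoop code; the class `FriendRecommendationSimulation` and its methods are hypothetical names. As in the approach described here, a "common friend" of two users is a user whose list contains both of them. The sketch orders each pair with `String.compareTo`, so its keys may list the two names in a different order from the table above, but the counts should agree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the map -> shuffle -> reduce flow (no Hadoop involved).
class FriendRecommendationSimulation {

    // Put the two names in a fixed order so (a, b) and (b, a) give the same key.
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + " " + b : b + " " + a;
    }

    static Map<String, Integer> recommend(List<String> lines) {
        // "Map" + "shuffle": collect every emitted flag under its pair key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] names = line.trim().split("\\s+");
            for (int i = 0; i < names.length; i++) {
                for (int j = i + 1; j < names.length; j++) {
                    // i == 0: line owner + one of his friends -> 0 (direct)
                    // i > 0: two people on the same list -> 1 (candidate)
                    int flag = (i == 0) ? 0 : 1;
                    grouped.computeIfAbsent(pairKey(names[i], names[j]), k -> new ArrayList<>()).add(flag);
                }
            }
        }
        // "Reduce": drop any pair that saw a 0, otherwise sum the 1s.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            if (!e.getValue().contains(0)) {
                int sum = 0;
                for (int flag : e.getValue()) {
                    sum += flag;
                }
                result.put(e.getKey(), sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Tom Cat Hello Hadoop Spring",
                "Cat Hello Spring",
                "Hello Tom Netty Hadoop Cat",
                "Hadoop Tom Hello Netty Spring",
                "Spring Tom Cat Hadoop",
                "Netty Hello Hadoop");
        recommend(lines).forEach((pair, common) -> System.out.println(pair + " " + common));
    }
}
```

Run on the six sample lines, it prints `Cat Hadoop 3`, `Cat Netty 1`, `Hello Spring 3`, `Netty Spring 1` and `Netty Tom 2`, which matches the expected data up to the order of names within each pair.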
map:
For each line, pair up the names on it and emit each pair as the key, with a flag bit as the value. The two names in a key must always be put in the same order; otherwise one pair could appear under two different keys (for example "Cat Hadoop" and "Hadoop Cat") and reach the reducer as two separate groups.
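Any deterministic ordering rule works. A minimal sketch, using a hypothetical helper class `PairKeySketch` that puts the lexicographically smaller name first with `String.compareTo`:

```java
// Hypothetical helper: builds an order-independent key for a pair of user names.
class PairKeySketch {

    static String of(String a, String b) {
        // String.compareTo is a total order, so the same arrangement is chosen
        // no matter which of the two names is passed first.
        return a.compareTo(b) <= 0 ? a + " " + b : b + " " + a;
    }

    public static void main(String[] args) {
        System.out.println(of("Tom", "Cat")); // Cat Tom
        System.out.println(of("Cat", "Tom")); // Cat Tom
    }
}
```

The mapper's `sort` method in the implementation uses `compareToIgnoreCase` and puts the larger name first instead; that is equally valid as long as no two user names differ only in letter case, since such names compare as equal there and could still produce two keys.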
Flag 0 marks a definite direct friendship: the owner of the line (the first name) is a direct friend of every user on his list.
Flag 1 marks a possible indirect friendship: any two users on the same list share at least one friend, namely the list's owner. Because the mapper sees only one line at a time, it cannot tell whether those two users are also direct friends according to some other line. Take Cat and Hello: on the first line (Tom's list) they look like indirect friends, but the second line (Cat's list) shows they are actually direct friends. Cat and Hadoop, by contrast, are genuine indirect friends. The flag therefore lets the reduce phase make the final decision, and choosing 1 as its value means that summing the flags directly gives the number of common friends.
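The map step for a single line can be sketched as a plain function that returns the (key, flag) pairs it would emit. This is an illustration only; `MapStepSketch` and `emit` are hypothetical names, and keys are ordered with `String.compareTo` rather than the mapper's own `sort` method.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of what the mapper emits for one input line.
class MapStepSketch {

    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + " " + b : b + " " + a;
    }

    static List<Map.Entry<String, Integer>> emit(String line) {
        String[] names = line.trim().split("\\s+");
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // The owner (names[0]) with each friend: flag 0, a known direct friendship.
        for (int j = 1; j < names.length; j++) {
            out.add(new SimpleEntry<>(pairKey(names[0], names[j]), 0));
        }
        // Two friends of the owner: flag 1, one shared friend (the owner).
        for (int i = 1; i < names.length; i++) {
            for (int j = i + 1; j < names.length; j++) {
                out.add(new SimpleEntry<>(pairKey(names[i], names[j]), 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : emit("Cat Hello Spring")) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For Cat's line this yields `Cat Hello 0`, `Cat Spring 0` and `Hello Spring 1`: Cat is a direct friend of both, and Hello and Spring gain one candidate common friend (Cat).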
reduce:
The reducer receives all flags for one name pair. If any of them is 0, the two users are already direct friends and the pair is discarded. If every flag is 1, the two are genuine indirect friends, and the sum of the flags is the number of friends they have in common.
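The reduce decision for one key can be sketched as a small function over the grouped flags. `ReduceStepSketch` and `commonFriends` are hypothetical names; an empty `OptionalInt` stands for "discard this pair".

```java
import java.util.Arrays;
import java.util.OptionalInt;

// Hypothetical sketch of the reducer's logic for a single pair key.
class ReduceStepSketch {

    static OptionalInt commonFriends(Iterable<Integer> flags) {
        int sum = 0;
        for (int flag : flags) {
            if (flag == 0) {
                return OptionalInt.empty(); // already direct friends: never recommend
            }
            sum += flag; // each 1 stands for one shared friend
        }
        return OptionalInt.of(sum);
    }

    public static void main(String[] args) {
        // Cat/Hadoop: flagged 1 on Tom's, Hello's and Spring's lines.
        System.out.println(commonFriends(Arrays.asList(1, 1, 1))); // OptionalInt[3]
        // Cat/Hello: 1 on Tom's line, 0 on Cat's and Hello's own lines.
        System.out.println(commonFriends(Arrays.asList(1, 0, 0))); // OptionalInt.empty
    }
}
```

Returning as soon as a 0 appears is safe because the sum is thrown away for direct friends anyway; the reducer in the implementation does the same with `break`.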
3. Implementation
Driver class
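```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecommentFriendJob {

    /**
     * @param args 0|profile;1|input;2|output;3|master-ip;4|operator;5|homeDir
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        // JobUtil is the author's own helper (not shown); it builds the Configuration from args.
        Configuration config = JobUtil.init(args);
        Job job = Job.getInstance(config);
        job.setJarByClass(RecommentFriendJob.class);
        job.setJobName("recommentFriend");

        Path inputPath = new Path(args[1]);
        FileInputFormat.addInputPath(job, inputPath);

        // Remove a leftover output directory; otherwise the job refuses to start.
        Path outputPath = new Path(args[2]);
        FileSystem fs = outputPath.getFileSystem(config);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }
        FileOutputFormat.setOutputPath(job, outputPath);

        job.setMapperClass(ReconmentFriendMapper.class);
        job.setReducerClass(RecommentFriendReducer.class);
        // Map and reduce both emit <Text, IntWritable>, so these settings cover both stages.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(1);

        boolean isSuccess = job.waitForCompletion(true);
        System.out.println("isSuccess:" + isSuccess);
        System.exit(isSuccess ? 0 : 1);
    }
}
```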
Mapper class
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReconmentFriendMapper extends Mapper<Object, Text, Text, IntWritable> {

    /**
     * The output key is a pair of user names; the value is
     * 0 | direct friends; 1 | possibly indirect friends, to be resolved in reduce.
     */
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        System.out.println("key:" + key + ",value:" + value);
        // Split on any run of whitespace so doubled spaces do not produce empty names.
        String[] friends = value.toString().trim().split("\\s+");
        for (int i = 0; i < friends.length; i++) {
            String self = friends[i];
            for (int j = i + 1; j < friends.length; j++) {
                Text pairKey = new Text(sort(self, friends[j]));
                if (i == 0) {
                    // The line's owner and one of his friends: direct friends
                    context.write(pairKey, new IntWritable(0));
                } else {
                    // Two people on the same friend list: may be indirect friends
                    context.write(pairKey, new IntWritable(1));
                }
            }
        }
    }

    // Orders the two names so the same pair always yields the same key
    // (the larger name, compared ignoring case, comes first).
    private String sort(String self, String other) {
        if (self.compareToIgnoreCase(other) < 0) {
            return other + " " + self;
        }
        return self + " " + other;
    }
}
```
Reducer class
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class RecommentFriendReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        System.out.println("key:" + key);
        int sum = 0;
        boolean isDirectFriend = false;
        for (IntWritable value : values) {
            if (value.get() == 0) {
                // Direct friends on some line: this pair is never recommended
                isDirectFriend = true;
                break;
            }
            // Each 1 is one shared friend
            sum += value.get();
        }
        if (!isDirectFriend) {
            context.write(key, new IntWritable(sum));
        }
    }
}
```