Hadoop learning notes - implementing friend recommendation with MapReduce

1, Introduction

Friend recommendation comes down to predicting whether two users who are not yet friends are likely to know each other and, if so, recommending them to each other as friends.

2, Approach

For two users who are not friends, the more common friends they have, the more likely they are to know each other.

For example, suppose the raw data is as follows:

Tom Cat Hello Hadoop Spring

Cat Hello Spring

Hello Tom Netty Hadoop Cat 

Hadoop Tom Hello Netty Spring 

Spring Tom Cat Hadoop

Netty Hello Hadoop

Each line is one user's friend list: the first name on the line is the user, and the names that follow are that user's friends.

From this input we want to produce the following output:

Tom Netty 2
Cat Netty 1
Cat Hadoop 3
Hello Spring 3
Spring Netty 1

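Each line names two users who are not yet friends but may know each other, followed by the number of friends they have in common. The order of the two names within a line is not significant; the job writes them in whatever order its key-sorting rule produces.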
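As a quick sanity check of the first output line, the count for Tom and Netty can be reproduced by intersecting their two friend lists. This is only a minimal sketch: the class name CommonFriendsSketch and the helper commonFriends are hypothetical names introduced here, and the MapReduce job described below does not intersect lists directly. It counts how often two names appear together in someone else's list, which gives the same number whenever friendships are recorded on both sides.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CommonFriendsSketch {

    // Hypothetical helper: returns the names that appear in both friend lists.
    static Set<String> commonFriends(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a); // copy so the inputs stay untouched
        common.retainAll(b);                   // keep only names that are also in b
        return common;
    }

    public static void main(String[] args) {
        // Friend lists copied from the sample input.
        Set<String> tom = new HashSet<>(Arrays.asList("Cat", "Hello", "Hadoop", "Spring"));
        Set<String> netty = new HashSet<>(Arrays.asList("Hello", "Hadoop"));

        Set<String> common = commonFriends(tom, netty);
        System.out.println("Tom Netty " + common.size()); // prints "Tom Netty 2"
    }
}
```

Tom and Netty share Hello and Hadoop, which matches the count of 2 in the expected output.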

map:

Pair up the names on each line and emit every pair as a key, with a flag as the value. The two names in a pair must always be put in the same fixed order. Otherwise the same two users could produce two different keys (for example "Tom Cat" and "Cat Tom"), and their values would never arrive in the same reduce call.
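The ordering requirement can be illustrated with a small standalone helper. This is a sketch under assumptions: the class PairKeySketch and the method pairKey are hypothetical names, and putting the lexicographically larger name first is just one possible rule (it mirrors what the sort method in the mapper further down appears to do, except that this sketch compares case-sensitively). Any fixed rule works as long as it is applied everywhere.

```java
public class PairKeySketch {

    // Hypothetical helper: builds one canonical key for an unordered pair of names.
    // Rule assumed here: the lexicographically larger name goes first.
    static String pairKey(String a, String b) {
        return a.compareTo(b) >= 0 ? a + " " + b : b + " " + a;
    }

    public static void main(String[] args) {
        // Both argument orders yield the same key.
        System.out.println(pairKey("Tom", "Cat")); // prints "Tom Cat"
        System.out.println(pairKey("Cat", "Tom")); // prints "Tom Cat"
    }
}
```

Because both calls produce the same string, all values for the pair end up under a single key.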

Flag 0 means the two users are definitely direct friends: the user who owns the line (the first name) is a direct friend of every user in the list that follows.

Flag 1 means the two users may be indirect friends. Two users who both appear in someone's friend list share that person as a common friend, but because the mapper sees only one line at a time, it cannot tell whether another line lists them as direct friends. Take Cat and Hello: the first line only tells us that they might be indirect friends, while the second line shows that they are in fact direct friends. Cat and Hadoop, on the other hand, are never listed as direct friends, so they are genuine indirect friends. The flag therefore lets the reduce phase decide whether a pair is genuinely indirect. Choosing 1 as the flag also makes counting easy, because the reducer can simply sum the values.
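The flag rules can be tried out on a single line without Hadoop. The following is a minimal sketch, not the job itself: MapStepSketch, mapLine and pairKey are hypothetical names, the output is a list of "key, tab, flag" strings rather than Hadoop writables, and the larger-name-first key order is an assumption chosen to resemble the mapper shown later.

```java
import java.util.ArrayList;
import java.util.List;

public class MapStepSketch {

    // Hypothetical helper: canonical key with the lexicographically larger name first.
    static String pairKey(String a, String b) {
        return a.compareTo(b) >= 0 ? a + " " + b : b + " " + a;
    }

    // Emits one "key<TAB>flag" string per pair of names on the line.
    static List<String> mapLine(String line) {
        String[] names = line.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            for (int j = i + 1; j < names.length; j++) {
                // Pairs that include the first name are direct friends (flag 0);
                // pairs of two listed friends are only candidates (flag 1).
                int flag = (i == 0) ? 0 : 1;
                out.add(pairKey(names[i], names[j]) + "\t" + flag);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Second line of the sample input.
        mapLine("Cat Hello Spring").forEach(System.out::println);
    }
}
```

For the line "Cat Hello Spring" this emits flag 0 for the pairs Cat/Hello and Cat/Spring, and flag 1 for Hello/Spring.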

reduce:

The reducer handles one pair of names at a time. If any value for the pair is 0, the two users are already direct friends and the pair is discarded. Otherwise the pair is a genuine indirect-friend pair, and the sum of its values is the number of friends the two users have in common.
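Before running anything on a cluster, the map and reduce rules can be replayed in memory on the sample data. This is a rough sketch under assumptions, not the Hadoop job: FriendRecommendSketch, recommend and pairKey are hypothetical names, a TreeMap stands in for the shuffle that groups values by key, and the larger-name-first key order is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FriendRecommendSketch {

    // Hypothetical helper: canonical key with the lexicographically larger name first.
    static String pairKey(String a, String b) {
        return a.compareTo(b) >= 0 ? a + " " + b : b + " " + a;
    }

    // Returns pair -> number of common friends, for pairs never flagged as direct friends.
    static Map<String, Integer> recommend(List<String> lines) {
        // "Map" and "shuffle": collect every flag emitted for a pair under one key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] names = line.trim().split("\\s+");
            for (int i = 0; i < names.length; i++) {
                for (int j = i + 1; j < names.length; j++) {
                    int flag = (i == 0) ? 0 : 1; // 0 = direct, 1 = candidate
                    grouped.computeIfAbsent(pairKey(names[i], names[j]),
                            k -> new ArrayList<>()).add(flag);
                }
            }
        }
        // "Reduce": drop pairs that carry a 0 flag, sum the flags of the rest.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            if (e.getValue().contains(0)) {
                continue; // already direct friends
            }
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Tom Cat Hello Hadoop Spring",
                "Cat Hello Spring",
                "Hello Tom Netty Hadoop Cat",
                "Hadoop Tom Hello Netty Spring",
                "Spring Tom Cat Hadoop",
                "Netty Hello Hadoop");
        recommend(lines).forEach((pair, count) -> System.out.println(pair + " " + count));
    }
}
```

On the sample input this yields the same five pairs and counts as the expected output (Tom/Netty 2, Cat/Netty 1, Cat/Hadoop 3, Hello/Spring 3, Spring/Netty 1), although the order of names within a pair and the order of the lines differ.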

3, Implementation

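import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecommentFriendJob {
	
	/**
	 * 
	 * @param args 0|profile;1|input;2|output;3|master-ip;4|operator;5|homeDir
	 * @throws Exception
	 */
	public static void main(String[] args) throws Exception {
		
		// JobUtil is a helper class that is not shown in this post; it builds the Configuration from args
		Configuration config = JobUtil.init(args);
		
		Job job = Job.getInstance(config);
		
		job.setJarByClass(RecommentFriendJob.class);
		job.setJobName("recommentFriend");
		
		Path inputPath = new Path(args[1]);
		FileInputFormat.addInputPath(job, inputPath);
		
		// Delete the output directory if it already exists, otherwise the job refuses to start
		Path outputPath = new Path(args[2]);
		FileSystem fs = outputPath.getFileSystem(config);
		if(fs.exists(outputPath)) {
			fs.delete(outputPath, true);
		}
		FileOutputFormat.setOutputPath(job, outputPath);
		
		job.setMapperClass(ReconmentFriendMapper.class);
		// Mapper and reducer use the same output types: Text key, IntWritable value
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		job.setNumReduceTasks(1);
		job.setReducerClass(RecommentFriendReducer.class);
		
		boolean isSuccess = job.waitForCompletion(true);
		System.out.println("isSuccess:" + isSuccess);
		System.exit(isSuccess ? 0 : 1);
		
	}
	
}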

FriendMapper class

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReconmentFriendMapper extends Mapper<Object, Text, Text, IntWritable> {
 
	/**
	 * 
	 * The output key is a pair of names; the output value is a flag:
	 * 0 | direct friends; 1 | possibly indirect friends, to be confirmed in reduce
	 * 
	 */
	@Override
	public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
		
		System.out.println("key:" + key + ",value:" + value);
		// Split on any run of whitespace so that trailing or repeated spaces do not produce empty names
		String[] friends = value.toString().trim().split("\\s+");
		for(int i=0; i < friends.length; i++) {
			
			String self = friends[i];
			
			for(int j=i+1; j < friends.length; j++) {
				
				System.out.println("i:" + i + ",j:" + j);
				if(i == 0) {
					// Direct friends: the owner of the line paired with a name from the list
					String directFriend = friends[j];
					Text directFriendKey = new Text(sort(self, directFriend));
					System.out.println("direct:" + directFriendKey.toString());
					context.write(directFriendKey, new IntWritable(0));
				} else {
					// Possibly indirect friends: two names from the same list
					String indirectFriend = friends[j];
					Text indirectFriendKey = new Text(sort(self, indirectFriend));
					System.out.println("indirect:" + indirectFriendKey.toString());
					context.write(indirectFriendKey, new IntWritable(1));
				}
				
			}
			
		}
		
	}
	
	/**
	 * Puts the two names in a fixed order (the larger name first, ignoring case)
	 * so that the same pair always produces the same key
	 */
	private String sort(String self, String directFriend) {
		if(self.compareToIgnoreCase(directFriend) < 0) {
			return directFriend + " " + self;
		}
		return self + " " + directFriend;
	}
	
}

FriendReducer class

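import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class RecommentFriendReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
 
	@Override
	public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
		System.out.println("key:" + key);
		int sum = 0;
		boolean isDirectFriend = false;
		for(IntWritable value : values) {
			if(value.get() == 0) {
				// Direct friends: no recommendation needed, stop counting
				System.out.println("direct friend");
				isDirectFriend = true;
				break;
			}
			// Every flag 1 stands for one common friend
			sum = sum + value.get();
		}
		
		// Only pairs that are never direct friends are written, with their common-friend count
		if(!isDirectFriend) {
			context.write(key, new IntWritable(sum));
		}
	}
	
}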

Tags: Big Data Hadoop mapreduce

Posted on Thu, 11 Nov 2021 03:44:53 -0500 by flashicon