Anhui Province Big Data and Artificial Intelligence Application Competition 2021 - MapReduce (Data Preprocessing) Topic Answer (Second Question)

Link to Topic 1
Anhui Province Big Data and Artificial Intelligence Application Competition 2021-Topic Answer of MapReduce

Problem: Use MapReduce to compute, for each mobile phone number in calls.txt, the total outgoing call duration and number of outgoing calls, and the total incoming call duration and number of incoming calls. The output format is: mobile phone number, outgoing call duration, outgoing call count, incoming call duration, incoming call count.

calls.txt (call log)
Example: 18620192711,15733218050,1506628174,1506628265,650000,810000
The fields are:
caller's mobile number, recipient's mobile number, start timestamp, end timestamp, caller's province code, recipient's province code
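To make the record layout concrete, here is a minimal sketch that parses the example line and derives the call duration. It assumes the timestamps are Unix seconds (the job code multiplies them by 1000 before building a Date, which suggests this); the class and method names are hypothetical and not part of the original job.

```java
// Hypothetical helper, not part of the original job: parses one calls.txt record.
class CallRecordDemo {
    /** Returns the call duration in seconds: end timestamp minus start timestamp. */
    static long durationSeconds(String line) {
        String[] fields = line.split(",");
        long start = Long.parseLong(fields[2]); // start timestamp, assumed to be in seconds
        long end = Long.parseLong(fields[3]);   // end timestamp, assumed to be in seconds
        return end - start;
    }

    public static void main(String[] args) {
        String line = "18620192711,15733218050,1506628174,1506628265,650000,810000";
        String[] fields = line.split(",");
        // prints: caller=18620192711 callee=15733218050 duration=91s
        System.out.println("caller=" + fields[0] + " callee=" + fields[1]
                + " duration=" + durationSeconds(line) + "s");
    }
}
```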

package Demo.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.log4j.BasicConfigurator;

import java.io.IOException;
import java.net.URI;
import java.util.Date;
/**
 * send_phone    = caller's mobile number
 * receive_phone = recipient's mobile number
 * talk_time     = duration of a single call, in seconds
 * send_time     = total outgoing (caller) call duration
 * receive_time  = total incoming (callee) call duration
 * send_count    = number of outgoing calls
 * receive_count = number of incoming calls
 */
public class subject2 {
    public static class demoMapper extends Mapper<LongWritable,Text,Text,Text>{
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Record layout: caller, callee, start timestamp, end timestamp, caller province, callee province
            String line = value.toString();
            String[] split = line.split(",");
            String send_phone = split[0];
            String receive_phone = split[1];
            // Timestamps are in seconds; Date expects milliseconds
            Date time1 = new Date(Long.parseLong(split[2]) * 1000L);
            Date time2 = new Date(Long.parseLong(split[3]) * 1000L);
            long talk_time = (time2.getTime() - time1.getTime()) / 1000;

            // Emit the call once under each phone number, tagged with that number's role
            context.write(new Text(send_phone), new Text("send," + talk_time));
            context.write(new Text(receive_phone), new Text("receive," + talk_time));
        }
    }

    public static class demoReducer extends Reducer<Text,Text,Text,Text>{
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            long send_time = 0;
            long receive_time = 0;
            int send_count = 0;
            int receive_count = 0;
            for (Text value : values) {
                String string = value.toString();
                String[] split = string.split(",");
                // The prefix written by the mapper identifies the role of this phone number in the call
                if ("send".equals(split[0])) {
                    send_time += Long.parseLong(split[1]);
                    send_count++;
                } else if ("receive".equals(split[0])) {
                    receive_time += Long.parseLong(split[1]);
                    receive_count++;
                }
            }
            context.write(key, new Text("," + send_time + "seconds," + send_count + "times," + receive_time + "seconds," + receive_count + "times"));
        }
    }

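    public static void main(String[] args) throws Exception {
        // Configure mapreduce
        Job job = Job.getInstance();
        // Specify paths
        Path input1 = new Path("hdfs://master:9000/data/calls.txt");

        Path output = new Path("hdfs://master:9000/output"); // output path must not already exist

        // Get the file system object fs and use it to manipulate files in HDFS
        FileSystem fs = FileSystem.get(new URI("hdfs://master:9000"), new Configuration());

        // NOTE: the listing breaks off here in the source. Everything below is a
        // conventional completion of the driver, not necessarily the author's exact code.
        BasicConfigurator.configure(); // basic log4j console logging (the import suggests it was used)
        if (fs.exists(output)) {
            fs.delete(output, true); // remove a stale output directory so the job can start
        }
        job.setJobName("subject2");
        job.setJarByClass(subject2.class);
        job.setMapperClass(demoMapper.class);
        job.setReducerClass(demoReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, input1);
        FileOutputFormat.setOutputPath(job, output);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}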

The result is

The main difficulty with this question is realizing that the map side needs two context.write() statements: each of the two phone-number fields is emitted as a key, and the accompanying values are tagged so they can be told apart.
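The double emission can be sketched in plain Java without a Hadoop cluster. The class below is a hypothetical stand-in for the mapper: instead of calling context.write(), it returns the two {key, value} pairs that one input record would produce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the mapper; returns pairs instead of calling context.write().
class MapEmitDemo {
    /** Produces two {key, value} pairs per record: one for the caller, one for the callee. */
    static List<String[]> emit(String line) {
        String[] f = line.split(",");
        long talkTime = Long.parseLong(f[3]) - Long.parseLong(f[2]); // duration in seconds
        List<String[]> out = new ArrayList<>();
        out.add(new String[] {f[0], "send," + talkTime});    // caller's number as key
        out.add(new String[] {f[1], "receive," + talkTime}); // callee's number as key
        return out;
    }

    public static void main(String[] args) {
        String line = "18620192711,15733218050,1506628174,1506628265,650000,810000";
        for (String[] pair : emit(line)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
        // prints:
        // 18620192711 -> send,91
        // 15733218050 -> receive,91
    }
}
```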

Each record contains two phone-number fields, and both hold the same kind of value: a mobile number. So even though the mapper issues two context.write() calls, all records with the same key, that is, the same mobile phone number, are grouped together on the reduce side. The values, however, need to be distinguished, because the same mobile phone number can play two roles. As the caller, it contributes to the outgoing call duration and the outgoing call count; as the callee, it contributes to the incoming call duration and the incoming call count.
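The grouping that happens between map and reduce can be imitated with an ordinary map from key to list of values. This is only a rough sketch of the idea, not how Hadoop implements the shuffle; the class name is hypothetical, and the second call (30 seconds, in the opposite direction) is invented sample data. In a real job the order of the values within a key is not guaranteed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of grouping by key; not Hadoop's actual shuffle.
class ShuffleDemo {
    /** Collects the values of all {key, value} pairs that share the same key. */
    static Map<String, List<String>> groupByKey(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        // Call 1: 18620192711 calls 15733218050 for 91 seconds
        pairs.add(new String[] {"18620192711", "send,91"});
        pairs.add(new String[] {"15733218050", "receive,91"});
        // Call 2 (invented): 15733218050 calls 18620192711 for 30 seconds
        pairs.add(new String[] {"15733218050", "send,30"});
        pairs.add(new String[] {"18620192711", "receive,30"});

        // Each phone number ends up with both "send" and "receive" values
        groupByKey(pairs).forEach((phone, values) -> System.out.println(phone + " -> " + values));
    }
}
```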

The outgoing and incoming call counts can be obtained with a count++ while iterating over the values on the reduce side.

But what the map side passes to the reduce side is only the call duration. So give the value, which is the call duration, a prefix. Then match the prefix with equals on the reduce side, which distinguishes the outgoing call duration from the incoming call duration for the same phone number.

Also note that the outgoing and incoming call durations must be accumulated rather than output directly. The values for one key contain multiple call durations; the prefix splits them into two groups, outgoing and incoming, and each group can still contain several durations, so the final result for each group is the sum.
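The prefix matching, counting, and accumulation described above can be sketched as a plain Java method that processes the values of a single phone number. The class name and the sample values are hypothetical; the logic follows the description in this post rather than any particular Hadoop API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the reducer logic for one key (one phone number).
class ReduceDemo {
    /** Returns {sendTime, sendCount, receiveTime, receiveCount} for the given tagged values. */
    static long[] aggregate(List<String> values) {
        long sendTime = 0, sendCount = 0, receiveTime = 0, receiveCount = 0;
        for (String value : values) {
            String[] split = value.split(",");
            long seconds = Long.parseLong(split[1]);
            if ("send".equals(split[0])) {            // this number was the caller
                sendTime += seconds;
                sendCount++;
            } else if ("receive".equals(split[0])) {  // this number was the callee
                receiveTime += seconds;
                receiveCount++;
            }
        }
        return new long[] {sendTime, sendCount, receiveTime, receiveCount};
    }

    public static void main(String[] args) {
        // Invented sample: two outgoing calls (91 s and 9 s) and one incoming call (30 s)
        List<String> values = Arrays.asList("send,91", "send,9", "receive,30");
        System.out.println(Arrays.toString(aggregate(values))); // prints [100, 2, 30, 1]
    }
}
```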

Tags: Big Data mapreduce

Posted on Fri, 26 Nov 2021 16:46:57 -0500 by haddydaddy