Hadoop introductory notes XX: MapReduce Counters

1, Counter overview

When a MapReduce program is executed, the console output usually contains a fragment like the following:
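An abbreviated, illustrative sketch of such a fragment (the group names below are Hadoop's standard counter groups; the actual values depend on the job and are elided here):

    File System Counters
            FILE: Number of bytes read=...
            HDFS: Number of bytes read=...
    Job Counters
            Launched map tasks=...
            Launched reduce tasks=...
    Map-Reduce Framework
            Map input records=...
            Map output records=...
            Reduce input records=...
            Reduce output records=...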

The key word in this output is Counters. Hadoop's built-in counter facility collects the main statistics of a job, helping users understand how the program ran and diagnose faults.

2, MapReduce built-in counters

Hadoop maintains a set of built-in counters for every MapReduce job. These counters report various metrics, such as the amount of data read and written in each phase of the job, which helps users judge whether the program logic is effective and correct.

The built-in counters are grouped by function, and each group contains several counters: Map-Reduce Framework counters, File System Counters, Job Counters, File Input Format Counters, and File Output Format Counters.

Note that the built-in counters are global: the framework aggregates them across all tasks of the distributed job, so they are job-wide totals rather than per-task local statistics.
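As a minimal sketch of how these job-wide totals can be read programmatically after a job finishes (the class and method names here are hypothetical; the TaskCounter enum and the Counters API are standard Hadoop MapReduce):

import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class CounterInspector {
    // Prints a few built-in framework counters of a job that has finished
    public static void printFrameworkCounters(Job job) throws Exception {
        Counters counters = job.getCounters();
        long mapIn  = counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
        long mapOut = counters.findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();
        long redOut = counters.findCounter(TaskCounter.REDUCE_OUTPUT_RECORDS).getValue();
        System.out.println("map input records: " + mapIn);
        System.out.println("map output records: " + mapOut);
        System.out.println("reduce output records: " + redOut);
    }
}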

1. Map-Reduce Framework Counters

MAP_INPUT_RECORDS: the number of input records processed by all mappers
MAP_OUTPUT_RECORDS: the number of output records produced by all mappers
MAP_OUTPUT_BYTES: the number of bytes of uncompressed output produced by all mappers
MAP_OUTPUT_MATERIALIZED_BYTES: the number of bytes of mapper output actually written to disk
COMBINE_INPUT_RECORDS: the number of input records processed by all combiners (if any)
COMBINE_OUTPUT_RECORDS: the number of output records produced by all combiners (if any)
REDUCE_INPUT_GROUPS: the number of key groups processed by all reducers
REDUCE_INPUT_RECORDS: the number of input records processed by all reducers; this counter increases each time a reducer's value iterator reads a value
REDUCE_OUTPUT_RECORDS: the number of output records produced by all reducers
REDUCE_SHUFFLE_BYTES: the number of bytes copied to reducers during the shuffle
SPILLED_RECORDS: the number of records spilled to disk by all map and reduce tasks
CPU_MILLISECONDS: the total CPU time of a task, in milliseconds, as reported by /proc/cpuinfo
PHYSICAL_MEMORY_BYTES: the physical memory used by a task, in bytes, as reported by /proc/meminfo
VIRTUAL_MEMORY_BYTES: the virtual memory used by a task, in bytes, as reported by /proc/meminfo

2. File System Counters

The file system counters track the usage of the different file systems, such as HDFS and the local file system:

BYTES_READ: the number of bytes the program reads from the file system
BYTES_WRITTEN: the number of bytes the program writes to the file system
READ_OPS: the number of read operations performed on the file system (for example, open and getFileStatus operations)
LARGE_READ_OPS: the number of large-scale read operations performed on the file system
WRITE_OPS: the number of write operations performed on the file system (for example, create and append operations)

3. Job Counters

Launched map tasks: the number of map tasks launched, including tasks started under speculative execution
Launched reduce tasks: the number of reduce tasks launched, including tasks started under speculative execution
Data-local map tasks: the number of map tasks that ran on the same node as their input data
Total time spent by all maps in occupied slots (ms): the total time, in milliseconds, all map tasks spent in occupied slots
Total time spent by all reduces in occupied slots (ms): the total time, in milliseconds, all reduce tasks spent in occupied slots
Total time spent by all map tasks (ms): the total time, in milliseconds, spent by all map tasks
Total time spent by all reduce tasks (ms): the total time, in milliseconds, spent by all reduce tasks

4. File Input|Output Format Counters

BYTES_READ: the number of bytes read by map tasks through FileInputFormat
BYTES_WRITTEN: the number of bytes written through FileOutputFormat by reduce tasks (or by map tasks, for map-only jobs)

3, MapReduce custom counters

Although Hadoop's built-in counters are fairly comprehensive and make it easy to monitor a running job, some businesses have specific needs (such as counting particular events during processing), so MapReduce also lets users define their own counters. Most importantly, counters are global statistics aggregated by the framework, which saves users from maintaining global variables themselves.

Using a custom counter takes two steps:

  1. Obtain a global counter through the context.getCounter method, specifying the group the counter belongs to and the counter's name.
  2. Wherever the counter is needed in the program, call the methods the Counter object provides, such as increment for a +1 operation, as shown in the sketch below.
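A minimal sketch of these two steps as they would appear inside a Mapper's map method (the group name "my_group" and counter name "my_counter" are arbitrary examples; a complete runnable example follows in the next section):

// Step 1: obtain a global counter by group name and counter name
Counter myCounter = context.getCounter("my_group", "my_counter");
// Step 2: call the Counter methods where needed, e.g. a +1 operation
myCounter.increment(1);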

4, Case: MapReduce custom counter usage

1. Requirement

A word-frequency count must be run over a batch of files. For unknown reasons, the word "apple" may have been inserted anywhere in any of the files. Use a counter to count the number of occurrences of "apple" in the data, so the user can check it conveniently when the program executes.

2. Code implementation

1. Mapper class

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Get a global counter from the program context object, used to count occurrences of "apple".
        // The counter group name and counter name must be specified.
        Counter counter = context.getCounter("itcast_counters", "apple Counter");

        String[] words = value.toString().split("\\s+");
        for (String word : words) {
            // If the word read is "apple", add 1 to the counter
            if ("apple".equals(word)) {
                counter.increment(1);
            }
            context.write(new Text(word), new LongWritable(1));
        }
    }
}

2. Reducer class

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the counts emitted for this word by all mappers
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        context.write(key, new LongWritable(count));
    }
}

3. Main driver class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Create the job instance
        Job job = Job.getInstance(getConf(), WordCountDriver.class.getSimpleName());
        // Set the job driver class
        job.setJarByClass(this.getClass());

        // Set the job's mapper and reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Set the key/value output types of the mapper stage
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Set the key/value output types of the reducer stage, i.e. the program's final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Configure the job's input data path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Configure the job's output data path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and wait for it to complete
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Configuration object
        Configuration conf = new Configuration();
        // Submit via the ToolRunner utility class
        int status = ToolRunner.run(conf, new WordCountDriver(), args);
        // Exit the client; the exit status code is bound to the MapReduce program's result
        System.exit(status);
    }
}

4. Execution results
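When the job finishes, the counter summary printed to the console includes the custom group itcast_counters with the accumulated value of apple Counter. The value can also be read programmatically once the job has completed; a minimal sketch of an optional addition to the run method above, replacing its final return statement (the group and counter names must match those used in WordCountMapper):

// Read the custom counter after the job completes
boolean success = job.waitForCompletion(true);
long apples = job.getCounters()
        .findCounter("itcast_counters", "apple Counter").getValue();
System.out.println("apple occurrences: " + apples);
return success ? 0 : 1;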
