Hadoop introductory notes XX: MapReduce counters

1. Counter overview

When a MapReduce program runs, the console output usually contains a fragment like the following:
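The exact fragment varies by job and Hadoop version; an abridged, illustrative example of the kind of counter summary a job prints at completion looks roughly like this (the numbers are invented for illustration):

```
        File System Counters
                FILE: Number of bytes read=316
                HDFS: Number of bytes read=174
                HDFS: Number of bytes written=114
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
        Map-Reduce Framework
                Map input records=5
                Map output records=20
                Reduce input records=20
                Reduce output records=14
```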

The key word in this output is "Counters". Hadoop's built-in counter facility collects the main statistics of a job, which helps users understand how the program ran and diagnose faults.

2. MapReduce built-in counters

Hadoop maintains a set of built-in counters for every MapReduce job. These counters report various metrics, such as the amount of data input and output at each stage of execution, which help users judge whether the program logic is working correctly.

The built-in counters are grouped by function, and each group contains several counters: Map-Reduce Framework, File System Counters, Job Counters, File Input Format Counters, and File Output Format Counters.

Note that the built-in counters are global counters for the MapReduce program: their values are aggregated across the whole distributed job, not kept as per-task local statistics.

1. Map-Reduce Framework Counters

| Counter name | Explanation |
| --- | --- |
| MAP_INPUT_RECORDS | Number of input records processed by all mappers |
| MAP_OUTPUT_RECORDS | Number of output records produced by all mappers |
| MAP_OUTPUT_BYTES | Number of bytes of uncompressed output produced by all mappers |
| MAP_OUTPUT_MATERIALIZED_BYTES | Number of bytes of mapper output actually written to disk |
| COMBINE_INPUT_RECORDS | Number of input records processed by all combiners (if any) |
| COMBINE_OUTPUT_RECORDS | Number of output records produced by all combiners (if any) |
| REDUCE_INPUT_GROUPS | Number of key groups processed by all reducers |
| REDUCE_INPUT_RECORDS | Number of input records processed by all reducers; incremented each time a reducer's iterator reads a value |
| REDUCE_OUTPUT_RECORDS | Number of output records produced by all reducers |
| REDUCE_SHUFFLE_BYTES | Number of bytes copied to reducers during the shuffle |
| SPILLED_RECORDS | Number of records spilled to disk by all map and reduce tasks |
| CPU_MILLISECONDS | Total CPU time of a task in milliseconds, obtained from /proc/cpuinfo |
| PHYSICAL_MEMORY_BYTES | Physical memory used by a task in bytes, obtained from /proc/meminfo |
| VIRTUAL_MEMORY_BYTES | Virtual memory used by a task in bytes, obtained from /proc/meminfo |
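On the client side, these built-in counters can also be read programmatically after a job finishes. A minimal sketch, assuming the Hadoop MapReduce client API is on the classpath and `job` is a `Job` whose `waitForCompletion(true)` has already returned:

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

// Sketch: read two framework counters from a completed job.
public static void printFrameworkCounters(Job job) throws Exception {
    Counters counters = job.getCounters();
    long mapIn  = counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
    long spills = counters.findCounter(TaskCounter.SPILLED_RECORDS).getValue();
    System.out.println("map input records = " + mapIn);
    System.out.println("spilled records   = " + spills);
}
```

This cannot run standalone because it needs a completed job on a Hadoop cluster; it only illustrates the `Counters` lookup pattern.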

2. File System Counters

The file system counters track usage of each file system the job touches, such as HDFS and the local file system:

| Counter name | Explanation |
| --- | --- |
| BYTES_READ | Number of bytes the program reads from the file system |
| BYTES_WRITTEN | Number of bytes the program writes to the file system |
| READ_OPS | Number of read operations performed on the file system (for example, open and getFileStatus operations) |
| LARGE_READ_OPS | Number of large read operations performed on the file system |
| WRITE_OPS | Number of write operations performed on the file system (for example, create and append operations) |

3. Job Counters

| Counter name | Explanation |
| --- | --- |
| Launched map tasks | Number of map tasks launched, including those started by speculative execution |
| Launched reduce tasks | Number of reduce tasks launched, including those started by speculative execution |
| Data-local map tasks | Number of map tasks that ran on the same node as their input data |
| Total time spent by all maps in occupied slots (ms) | Total time, in milliseconds, all map tasks spent occupying slots |
| Total time spent by all reduces in occupied slots (ms) | Total time, in milliseconds, all reduce tasks spent occupying slots |
| Total time spent by all map tasks (ms) | Total time spent by all map tasks |
| Total time spent by all reduce tasks (ms) | Total time spent by all reduce tasks |

4. File Input|Output Format Counters

| Counter name | Explanation |
| --- | --- |
| Bytes Read | Number of bytes read by map tasks through FileInputFormat |
| Bytes Written | Number of bytes written through FileOutputFormat by reduce tasks (or by map tasks, for map-only jobs) |

3. MapReduce custom counters

Although Hadoop's built-in counters are fairly comprehensive and make it easy to monitor a running job, some business scenarios need to count specific situations of their own during processing. For this, MapReduce also lets users define custom counters. Most importantly, counters are global statistics, so users are spared the drawback of maintaining global variables themselves.
Using a custom counter takes two steps:

  1. Obtain a global counter through the context.getCounter method, specifying the group name the counter belongs to and the counter's own name.
  2. Wherever the counter is needed in the program, call the methods the Counter object provides, such as incrementing it by 1.
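The two steps above can be sketched as follows (the group and counter names here are arbitrary examples, not fixed by the API):

```java
// Step 1: obtain a global counter, specifying the group name and counter name
Counter counter = context.getCounter("MY_GROUP", "MY_COUNTER");

// Step 2: wherever needed in the program, update it, e.g. add 1
counter.increment(1);
```

Both calls belong inside a Mapper or Reducer method, where a `context` object is available.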

4. Case: MapReduce custom counter usage

1. Requirement

We need word-frequency statistics over a batch of files. For unknown reasons, the word "apple" may have been inserted anywhere in any file. The requirement is to use a counter to count the number of occurrences of "apple" in the data, so that the user can see it when the program runs.

2. Code implementation

1. Mapper class

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Get a global counter from the context object, used to count occurrences of "apple".
        // The counter group name and counter name must be specified.
        Counter counter = context.getCounter("itcast_counters", "apple Counter");

        String[] words = value.toString().split("\\s+");
        for (String word : words) {
            // If the word read is "apple", increment the counter by 1
            if ("apple".equals(word)) {
                counter.increment(1);
            }
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
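As a quick sanity check, the mapper's tokenize-and-count logic can be exercised in plain Java before wiring it into MapReduce (the class and method names here are our own, for illustration only):

```java
public class AppleCount {
    // Split a line on whitespace and count how many tokens equal "apple",
    // mirroring what the mapper does per input record.
    public static long countApples(String line) {
        long count = 0;
        for (String word : line.split("\\s+")) {
            if ("apple".equals(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countApples("apple hadoop apple spark"));
    }
}
```

Note that `"apples"` or `"Apple"` would not match; the comparison is an exact, case-sensitive equality, just like the mapper's.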

2. Reducer class

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        context.write(key, new LongWritable(count));
    }
}

3. Driver main class

public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Create the job instance
        Job job = Job.getInstance(getConf(), WordCountDriver.class.getSimpleName());
        // Set the job driver class
        job.setJarByClass(WordCountDriver.class);

        // Set the job mapper and reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Set the key/value output types of the mapper stage
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Set the key/value output types of the reducer stage, i.e. the program's final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Configure the job's input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Configure the job's output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and wait for it to complete
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Configuration object
        Configuration conf = new Configuration();
        // Submit via the ToolRunner utility class
        int status = ToolRunner.run(conf, new WordCountDriver(), args);
        // Exit the client; the exit status code is bound to the MapReduce program's result
        System.exit(status);
    }
}
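Besides appearing in the console counter summary, the custom counter's value can be read on the client once the job finishes. A sketch, assuming the same group and counter names used in the mapper above and a `Job` whose `waitForCompletion(true)` has returned:

```java
// Sketch: read the custom counter from the completed job.
long apples = job.getCounters()
                 .findCounter("itcast_counters", "apple Counter")
                 .getValue();
System.out.println("apple occurrences: " + apples);
```

This fragment needs a completed job on a cluster, so it is illustrative only.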

4. Execution results

Tags: Hadoop

Posted on Mon, 01 Nov 2021 22:35:50 -0400 by returnButton