Error while running a job
Symptom: the write fails on close with "Unable to close file because the last block does not have enough number of replicas":
```
java.io.IOException: Unable to close file because the last block BP-1820686335-10.201.48.27-1448169181587:blk_1850383542_781036567 does not have enough number of replicas.
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2705)
	at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2667)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2621)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:248)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:380)
	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1060)
	at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:67)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:83)
	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:937)
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2299)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2388)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2119)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2081)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1972)
	at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1898)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:514)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:475)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:263)
	at java.lang.Thread.run(Thread.java:745)
```
Reference: [HDFS] a Hive task reports the HDFS exception "last block does not have enough number of replicas". It is known to be caused by excessive load on the Hadoop servers, so simply re-executing the Hive SQL script usually succeeds. To solve the problem for good, reduce the task concurrency or cap CPU utilization so that network traffic drops and the DataNodes can report the block to the NameNode in time; a client-side mitigation is sketched below.
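A mitigation that often helps here is to give the HDFS client a larger retry budget when it calls completeFile() on close, so a loaded DataNode has more time to report the last block. This is a sketch only: the property and its default of 5 are from stock Hadoop 2.x, so verify them against your version.

```bash
# Sketch: raise the HDFS client's completeFile() retry count. The property
# is read by the process doing the writing (the HBase RegionServer in the
# stack trace above), so it belongs in the hdfs-site.xml that process loads:
#
#   <property>
#     <name>dfs.client.block.write.locateFollowingBlock.retries</name>
#     <value>10</value>
#   </property>
#
# Verify which value the client currently resolves:
hdfs getconf -confKey dfs.client.block.write.locateFollowingBlock.retries
```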
Problem conclusion:
Reduce system load. When the problem occurs the cluster is under heavy load: all 32 CPU cores are at 100% utilization, while MapReduce expects at least 20% of the CPU to be kept free.
The root cause is that there are too many blocks. Consider scanning the large directories to identify those holding too many small files, and consolidate them before processing; a sketch of such a scan follows.
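A minimal sketch of such a scan, assuming the data sits under a hypothetical /data root; it ranks first-level directories by file count so the worst small-file offenders surface first:

```bash
#!/usr/bin/env bash
# Rank the directories under /data (hypothetical root) by number of files.
# `hdfs dfs -count` prints: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
hdfs dfs -ls /data | awk 'NR > 1 {print $NF}' | while read -r dir; do
  hdfs dfs -count "$dir"
done | sort -k2,2nr | head -20   # top 20 directories by file count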
java.lang.IllegalArgumentException: java.net.UnknownHostException
Solution path: check the ResourceManager. It turned out that one node's hostname could not be resolved; after removing that node the problem went away (one way to remove it is sketched below).
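One way to take such a node out of scheduling (a sketch; the exclude-file path is an assumption and must match yarn.resourcemanager.nodes.exclude-path in yarn-site.xml):

```bash
# Add the bad host to the YARN exclude list and tell the RM to re-read it.
echo "BGhadoop08" >> /etc/hadoop/conf/yarn.exclude   # path is an assumption
yarn rmadmin -refreshNodes                           # RM reloads the node lists
yarn node -list -all                                 # confirm the node left RUNNING state
```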
The behavior is still not fully explained, though. In YARN I only saw that the hostname could not be resolved and the container token could not be issued, yet other containers were allocated normally and their applications completed successfully:
```
2017-12-21 13:34:36,732 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hive OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1513834407876_0012 CONTAINERID=container_e91_1513834407876_0012_01_000086
2017-12-21 13:34:36,732 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e91_1513834407876_0012_01_000086 of capacity <memory:4096, vCores:1> on host slave19.bl.bigdata:8041, which has 6 containers, <memory:27648, vCores:12> used and <memory:54272, vCores:36> available after allocation
2017-12-21 13:34:36,748 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_e91_1513694506641_4872_01_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: BGhadoop08
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:406)
	at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:256)
	at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:220)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:455)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:823)
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:532)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2220)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2214)
Caused by: java.net.UnknownHostException: BGhadoop08
```
**Analysis**
- Are all the stuck tasks assigned to the server whose hostname is not configured?
- How is Hadoop's speculative execution triggered? (See the sketch after this list.)
- Why can some tasks be assigned to the node without a resolvable hostname while others cannot?
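On the second question: speculative execution in MR2 is enabled by default and launches a duplicate attempt for a task that runs noticeably slower than its siblings, which is how attempts can end up on a different (here, broken) node. The per-job switches look like this (a sketch; the jar and class are hypothetical, and the -D options assume a ToolRunner-based job):

```bash
# Hypothetical submission with speculative execution disabled, to stop
# duplicate attempts from landing on the unresolvable node while debugging.
hadoop jar my-job.jar com.example.MyJob \
  -Dmapreduce.map.speculative=false \
  -Dmapreduce.reduce.speculative=false \
  /input /output
```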
In fact, the exception itself makes the cause clear: UnknownHostException means the ResourceManager cannot resolve the NodeManager's hostname while building the container token (see SecurityUtil.buildTokenService in the trace above).
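The quickest confirmation is to test name resolution on the ResourceManager host itself (BGhadoop08 is taken from the log above):

```bash
getent hosts BGhadoop08   # empty output: neither /etc/hosts nor DNS knows the name
nslookup BGhadoop08       # DNS-only check
# The usual fix is an /etc/hosts entry on every node (the IP below is illustrative):
# 10.201.48.99  BGhadoop08
```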
Cluster service
The host's NTP service could not be located, or the service did not respond to a clock offset request.
Scenario
The CDH cluster starts successfully, but some hosts show the warning "the host's NTP service could not be located, or the service did not respond to a clock offset request".
Initial suspicions
- The NTP service is not running properly
- The CDH agent daemon is misbehaving
Solution steps
1. First stop the CDH services, then stop the cluster services from the web console.
2. Restart the NTP service on every host:
   ```bash
   systemctl restart ntpd
   ```
3. Restart the Cloudera Manager agent on every host:
   ```bash
   systemctl restart cloudera-scm-agent
   ```
Wait about five minutes, then check the result in the CDH console; the alert should be resolved.
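To confirm the fix from a shell rather than waiting on the console, the usual checks are:

```bash
ntpstat                               # "synchronised to NTP server ..." on success
ntpq -p                               # peer list; '*' marks the selected time source
systemctl status cloudera-scm-agent   # agent should be "active (running)"
```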