Hadoop fully distributed installation -- hadoop-2.7.3

Contents: 1. Installation preparation; 2. Installing Hadoop on the master node; 3. Installing Hadoop on the slave nodes; 4. Starting Hadoop; 5. Verifying the installation. 1. Installation preparation: three virtual machines are required; the master node is hadoop001 and the slave nodes are hadoop002 and hadoop003. hadoop001, hadoop002, hado ...
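Step 5 of that outline verifies the installation. As a companion sketch (not from the article), the Java client check below lists the HDFS root, which only succeeds once the NameNode is up; the address hdfs://hadoop001:9000 is an assumption based on the node names above and a common fs.defaultFS setting in Hadoop 2.7 tutorials.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class VerifyCluster {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed NameNode address; must match fs.defaultFS in core-site.xml.
            conf.set("fs.defaultFS", "hdfs://hadoop001:9000");
            FileSystem fs = FileSystem.get(conf);
            // Listing the root directory fails unless the NameNode is reachable.
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
            fs.close();
        }
    }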

Posted on Wed, 01 Dec 2021 09:27:56 -0500 by Slip

7. Hadoop 3.3.1 HA High Availability Cluster with QJM (based on ZooKeeper, NameNode High Availability + YARN High Availability)

Previous posts: 1. CentOS7 Hadoop 3.3.1 installation (standalone, pseudo-distributed, distributed); 2. Implementation of HDFS by Java API; 3. MapReduce programming examples; 4. ZooKeeper 3.7 installation; 5. Shell operation of ZooKeeper; 6. Java API operations on ZooKeeper nodes. Setup of a Hadoop 3.3.1 HA high availability cluster (NameNode High ...
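To make the HA wiring concrete, here is a hedged Java sketch of how a client reaches a QJM-backed NameNode pair; the nameservice mycluster, the NameNode IDs nn1/nn2 and the host:port pairs are illustrative assumptions, and in a real cluster they belong in hdfs-site.xml and core-site.xml rather than in code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HaClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Illustrative HA settings; normally defined in hdfs-site.xml.
            conf.set("dfs.nameservices", "mycluster");
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1", "hadoop001:8020");
            conf.set("dfs.namenode.rpc-address.mycluster.nn2", "hadoop002:8020");
            // Lets the client fail over to whichever NameNode is active.
            conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
            conf.set("fs.defaultFS", "hdfs://mycluster");

            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.exists(new Path("/")));
            fs.close();
        }
    }

On the server side, automatic failover additionally relies on dfs.namenode.shared.edits.dir pointing at the JournalNode quorum, dfs.ha.automatic-failover.enabled, and a ha.zookeeper.quorum entry for the ZKFC processes.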

Posted on Mon, 22 Nov 2021 20:33:21 -0500 by Hendricus

Memory overflow caused by Spark reading Snappy-compressed files on HDFS

Some files on HDFS grow every day and are currently stored with Snappy compression; one day the job suddenly hit an OOM. 1. Cause: Snappy files are not splittable, so each file is read by a single task, and after decompression the data expands to many times its compressed size. If there are many such files and your parallelism is very high, it will lead to ...
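A common mitigation is to repartition immediately after reading, so the decompressed records are spread over many small tasks instead of one oversized task per unsplittable file. The Java sketch below illustrates this under assumptions: the input path and the partition count of 400 are placeholders to be tuned to the cluster.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.SparkSession;

    public class SnappyReadRepartition {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("snappy-oom-mitigation")
                    .getOrCreate();

            // Each .snappy file is unsplittable, so it arrives as one partition
            // whose decompressed size can be several times the on-disk size.
            Dataset<String> lines = spark.read()
                    .textFile("hdfs:///data/logs/*.snappy"); // hypothetical path

            // Spread the data before any heavy transformation runs.
            Dataset<String> spread = lines.repartition(400); // tune per cluster
            System.out.println(spread.count());

            spark.stop();
        }
    }

Switching to a splittable layout at the source (for example, block-compressed SequenceFile or Parquet with Snappy) avoids the single-task bottleneck entirely.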

Posted on Fri, 19 Nov 2021 01:54:04 -0500 by tstout2

Big data HDFS application development

1. HDFS shell operation (development focus). Through the previous lessons we have gained a basic understanding of HDFS; now we practice to deepen it. HDFS can be operated from the shell command line, much like working with the file system in Linux, but there are some differences in the operation format of s ...
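Since HDFS is also commonly driven through the Java API (as in the HDFS Java API post listed above), here is a hedged Java sketch pairing a few common shell commands with their FileSystem equivalents; all paths are illustrative examples, not from the article.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsShellEquivalents {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // hdfs dfs -mkdir /demo
            fs.mkdirs(new Path("/demo"));

            // hdfs dfs -put local.txt /demo
            fs.copyFromLocalFile(new Path("local.txt"), new Path("/demo"));

            // hdfs dfs -ls /demo
            for (FileStatus s : fs.listStatus(new Path("/demo"))) {
                System.out.println(s.getPath() + " " + s.getLen());
            }

            // hdfs dfs -cat /demo/local.txt
            IOUtils.copyBytes(fs.open(new Path("/demo/local.txt")),
                    System.out, 4096, false);

            fs.close();
        }
    }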

Posted on Mon, 08 Nov 2021 09:38:12 -0500 by mathieumg

[Big data: Hadoop] cluster environment construction

1 Introduction to Hadoop 1.1 Advantages 1) High reliability: Hadoop keeps multiple copies of data at the storage layer, so the failure of a single compute or storage element does not cause data loss. 2) High scalability: tasks and data are distributed across the cluster, which can easily be expanded to thousands of nodes. 3) Efficiency: under the MapReduce model, H ...

Posted on Sun, 31 Oct 2021 15:09:49 -0400 by nicephotog