Simple Distributed Cluster Setup for HDFS


This article describes how to set up a simple, fully distributed HDFS cluster. It is "simple" because it is not a highly available (HA) HDFS. The next article will describe how to build an HA distributed HDFS cluster.

1. Planning for Cluster Building

A total of 4 machines need to be prepared.
One machine serves as the NameNode, and all four machines serve as DataNodes, so one DataNode shares a machine with the NameNode.
hadoop3: NameNode & DataNode
hadoop4: DataNode
hadoop5: DataNode
hadoop6: DataNode

2. HDFS Distributed Cluster Setup

1. Clone 4 virtual machines

All four cloned machines already have the JDK installed and its environment variables configured.

2. Set the IP address, the hostname, and the IP-to-hostname mapping file on each of the 4 machines

2.1 Modify the IP address of each machine
vim /etc/sysconfig/network-scripts/ifcfg-ens33

2.2 Modify the hostname of each machine
vim /etc/hostname

2.3 Modify the hosts file, which maps hostnames to IP addresses, on each machine
vim /etc/hosts

The planned IP-to-hostname mapping for all 4 machines must be present in /etc/hosts on every machine
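As an illustration of the mapping from step 2.3, the sketch below writes the four entries to a scratch file instead of editing /etc/hosts directly. The 192.168.1.x addresses and the file name hosts.cluster are placeholders of my own, not the addresses used for this cluster; substitute the static IPs assigned in step 2.1.

```shell
#!/usr/bin/env bash
# Sketch only: the IP addresses below are hypothetical placeholders.
# Replace them with the static addresses configured in step 2.1.
HOSTS_SNIPPET="${HOSTS_SNIPPET:-./hosts.cluster}"

cat > "$HOSTS_SNIPPET" <<'EOF'
192.168.1.103 hadoop3
192.168.1.104 hadoop4
192.168.1.105 hadoop5
192.168.1.106 hadoop6
EOF

# Review the result, then append it on every machine, for example:
#   cat hosts.cluster >> /etc/hosts
cat "$HOSTS_SNIPPET"
```

Keeping the same four lines on every machine means each node can resolve every other node by name, which the later ssh and scp steps rely on.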

2.4 Reboot 4 machines

3. Configure passwordless SSH login

3.1. Generate an SSH key pair on the NameNode node
3.2. Copy the public key to the three other machines
ssh-copy-id hadoop4
ssh-copy-id hadoop5
ssh-copy-id hadoop6
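Step 3.1 does not show a command, so here is one way the two steps might be scripted. It is a sketch under my own assumptions: an RSA key with an empty passphrase, the default key path, and a small run helper (not part of any tool) that only prints the commands unless DRY_RUN=0 is set. hadoop3 is added to the loop because the Hadoop start scripts also connect to the local machine over SSH when it is listed as a DataNode.

```shell
#!/usr/bin/env bash
# Sketch: prints the commands by default; set DRY_RUN=0 to really run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 3.1 Generate an RSA key pair without a passphrase (skipped if one exists).
[ -f "$HOME/.ssh/id_rsa" ] || run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"

# 3.2 Copy the public key to every node, including this one (hadoop3).
for host in hadoop3 hadoop4 hadoop5 hadoop6; do
  run ssh-copy-id "$host"
done
```

Afterwards, ssh hadoop4 from the NameNode node should log in without asking for a password.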

4. Unzip the hadoop package on the NameNode node and configure it accordingly

4.1 Unzip hadoop package
tar -zxvf /root/hadoop/hadoop-2.9.2.tar.gz -C /root/hadoop

4.2 Configure hadoop environment variables
vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/root/hadoop/hadoop-2.9.2

Reload the configuration file

source /etc/profile
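Later steps call hdfs by its bare name, which only works if Hadoop's bin directory is on PATH. The listing for /etc/profile does not show a PATH line, so the last line below is an assumed addition rather than something taken from the original file; it would need to be in /etc/profile before the reload.

```shell
# Assumed addition to /etc/profile (not shown in the original listing):
# put the Java and Hadoop executables on PATH.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export HADOOP_HOME=/root/hadoop/hadoop-2.9.2
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

Including sbin as well as bin also makes the cluster start and stop scripts callable without a full path.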
4.3. Configure
vim /root/hadoop/hadoop-2.9.2/etc/hadoop/

Set the Java path in hadoop's environment file

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0
4.4. Configure core-site.xml
vim /root/hadoop/hadoop-2.9.2/etc/hadoop/core-site.xml

1. Specify which machine is the NameNode

2. By default, Hadoop stores its data under the system temporary directory, /tmp/hadoop-${}. Files in the system temporary directory can be cleaned up by the system, which puts the cluster data at risk, so change the configuration to store the data in a dedicated directory. This article stores the data in the data folder under the unzipped hadoop-2.9.2 directory
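The property listing for these two settings is not reproduced in the text, so the following is only a sketch of what core-site.xml might look like. fs.defaultFS and hadoop.tmp.dir are the standard Hadoop property names; port 9000 is my assumption, since no port is stated here, and the file is written to a scratch directory (CONF_DIR, a name I made up) so it can be reviewed before being copied into /root/hadoop/hadoop-2.9.2/etc/hadoop/.

```shell
#!/usr/bin/env bash
# Sketch: writes a candidate core-site.xml into a scratch directory.
# Port 9000 is an assumed choice; the article does not state a port.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- 1. Which machine is the NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop3:9000</value>
  </property>
  <!-- 2. Keep HDFS data out of the system temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/hadoop-2.9.2/data</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/core-site.xml"
```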


4.5. Configure hdfs-site.xml
vim /root/hadoop/hadoop-2.9.2/etc/hadoop/hdfs-site.xml

Modify the permission settings so that non-root users can also operate on HDFS
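The exact setting is not shown in the text. One common way to get this effect in Hadoop 2.x is to disable permission checking with dfs.permissions.enabled, and the sketch below assumes that is what is meant; it suits a test cluster rather than a shared one. As before, CONF_DIR is a scratch directory of my own choosing.

```shell
#!/usr/bin/env bash
# Sketch: writes a candidate hdfs-site.xml into a scratch directory.
# Assumption: "modify the permissions" means disabling HDFS permission checks.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Turn off permission checking so non-root users can read and write -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/hdfs-site.xml"
```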


4.6. Configure slaves file
vim /root/hadoop/hadoop-2.9.2/etc/hadoop/slaves

List hadoop3, hadoop4, hadoop5 and hadoop6, one hostname per line, as the DataNode machines
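For reference, the slaves file is plain text with one hostname per line. A minimal sketch that produces it in a scratch directory (CONF_DIR is again a hypothetical location, to be copied into etc/hadoop afterwards):

```shell
#!/usr/bin/env bash
# Sketch: writes the slaves file with one DataNode hostname per line.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

printf '%s\n' hadoop3 hadoop4 hadoop5 hadoop6 > "$CONF_DIR/slaves"

cat "$CONF_DIR/slaves"
```

hadoop3 appears in the list because the plan has the NameNode machine double as a DataNode.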


5. On the NameNode node, copy the configured hadoop-2.9.2 directory to the other cluster nodes

scp -r /root/hadoop/hadoop-2.9.2 root@hadoop4:/root/hadoop
scp -r /root/hadoop/hadoop-2.9.2 root@hadoop5:/root/hadoop
scp -r /root/hadoop/hadoop-2.9.2 root@hadoop6:/root/hadoop

6. Format the NameNode on the NameNode node

hdfs namenode -format

7. Turn off the firewall on all 4 machines

systemctl stop firewalld

It is also best to reload the profile file on the other three machines

source /etc/profile

8. Start Cluster
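No command is shown for this step. In Hadoop 2.x the HDFS daemons are normally started with the start-dfs.sh script from $HADOOP_HOME/sbin, so a sketch of the step, assuming that script is on PATH, could look like this. The status variable is only there to report what happened.

```shell
#!/usr/bin/env bash
# Sketch for the NameNode machine (hadoop3).
# Assumes $HADOOP_HOME/bin and $HADOOP_HOME/sbin are on PATH.
if command -v start-dfs.sh >/dev/null 2>&1; then
  start-dfs.sh            # start the HDFS daemons on the nodes listed in slaves
  jps                     # list the Java processes running on this machine
  hdfs dfsadmin -report   # print capacity and the DataNodes that have registered
  status="start attempted"
else
  status="start-dfs.sh not found on PATH"
fi
echo "$status"
```

On hadoop3 the jps output would be expected to include both a NameNode and a DataNode process, and the report should list four live DataNodes once all machines have registered.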

9. Access the HDFS cluster web interface

Open the IP address of the NameNode machine in a browser
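The port is not mentioned above. In Hadoop 2.x the NameNode web interface listens on port 50070 unless dfs.namenode.http-address has been changed, so the address can be built as in this small sketch. The variable names are my own, and the hostname only works from a machine that can resolve hadoop3; otherwise use the IP address.

```shell
#!/usr/bin/env bash
# Sketch: build the NameNode web UI address (Hadoop 2.x default port).
NAMENODE_HOST="${NAMENODE_HOST:-hadoop3}"
NAMENODE_HTTP_PORT="${NAMENODE_HTTP_PORT:-50070}"
UI_URL="http://${NAMENODE_HOST}:${NAMENODE_HTTP_PORT}"

echo "Open $UI_URL in a browser"
# Optional reachability check from the command line:
#   curl -sI "$UI_URL" | head -n 1
```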

Posted on Sat, 20 Jun 2020 21:56:51 -0400 by abhi_10_20