Simple Distributed Cluster Setup for HDFS

Preface

This article describes how to set up a simple, fully distributed HDFS cluster. It is "simple" because it is not a highly available (HA) HDFS. The next article will describe how to build an HA distributed HDFS cluster.

1. Planning for Cluster Building

A total of 4 machines need to be prepared.
One machine serves as the NameNode node and all four machines serve as DataNode nodes, so one DataNode shares a machine with the NameNode.
hadoop3(192.168.23.133): NameNode & DataNode
hadoop4(192.168.23.134): DataNode
hadoop5(192.168.23.135): DataNode
hadoop6(192.168.23.136): DataNode

2. HDFS Distributed Cluster Setup

1. Clone 4 virtual machines

All four cloned machines must already have the JDK installed and its environment variables configured.

2. Set the IP address, hostname, and IP-to-hostname mapping file on each of the 4 machines

2.1 Modify the IP addresses of the 4 machines

vim /etc/sysconfig/network-scripts/ifcfg-ens33

2.2 Modify hostname of 4 machines

vim /etc/hostname

2.3 Modify the hosts file (IP-to-hostname mapping) on the 4 machines

vim /etc/hosts

Add the planned IP-to-hostname mappings for all 4 machines to /etc/hosts on every machine.
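Based on the plan in section 1, the entries could look like the following. This is only a sketch: it writes the mappings to a staging file first (the file name hosts.cluster and the variable HOSTS_STAGING are assumptions made for illustration) so they can be reviewed before they are added to /etc/hosts.

```shell
# Staging file name is hypothetical; override it with HOSTS_STAGING if desired.
HOSTS_STAGING="${HOSTS_STAGING:-./hosts.cluster}"

# IP-to-hostname mappings taken from the cluster plan in section 1.
cat > "$HOSTS_STAGING" <<'EOF'
192.168.23.133 hadoop3
192.168.23.134 hadoop4
192.168.23.135 hadoop5
192.168.23.136 hadoop6
EOF

# Show the result for review.
cat "$HOSTS_STAGING"
```

After reviewing the file, its contents can be appended on each machine, for example with cat hosts.cluster >> /etc/hosts.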

2.4 Reboot 4 machines

reboot

3. Configure Passwordless SSH Login

3.1. Generate an SSH key pair on the NameNode node

ssh-keygen

3.2. Copy the public key to three other machines

ssh-copy-id hadoop4
ssh-copy-id hadoop5
ssh-copy-id hadoop6

4. Unzip the Hadoop package on the NameNode node and configure it

4.1 Unzip the Hadoop package

tar -zxvf /root/hadoop/hadoop-2.9.2.tar.gz -C /root/hadoop

4.2 Configuring hadoop environment variables

vim /etc/profile

export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/root/hadoop/hadoop-2.9.2
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload the configuration file

source /etc/profile

4.3. Configure hadoop-env.sh

vim /root/hadoop/hadoop-2.9.2/etc/hadoop/hadoop-env.sh

Set the Java path in Hadoop's environment file

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0

4.4. Configuring core-site.xml

vim /root/hadoop/hadoop-2.9.2/etc/hadoop/core-site.xml

1. Specify which machine is the NameNode

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop3:9000</value>
</property>

2. By default, Hadoop stores its data in the system temporary directory, /tmp/hadoop-${user.name}. Because the system may clean out the temporary directory, cluster data kept there is not safe, so change the configuration to store the data in a specified directory. This article stores the data in the data folder under the unzipped hadoop-2.9.2 directory.

<property>
  <name>hadoop.tmp.dir</name>
  <value>/root/hadoop/hadoop-2.9.2/data</value>
</property>
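Both properties belong inside the single <configuration> element of core-site.xml. The following sketch shows how the complete file might look once the two settings are combined. It writes to a staging file rather than the live configuration; the file name core-site.xml.staging and the variable CORE_SITE_STAGING are assumptions made for illustration.

```shell
# Staging path is hypothetical; review the file, then copy it over
# /root/hadoop/hadoop-2.9.2/etc/hadoop/core-site.xml.
CORE_SITE_STAGING="${CORE_SITE_STAGING:-./core-site.xml.staging}"

cat > "$CORE_SITE_STAGING" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Address of the NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop3:9000</value>
  </property>
  <!-- Base directory for Hadoop data instead of the system /tmp -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/hadoop-2.9.2/data</value>
  </property>
</configuration>
EOF

# Quick sanity check: count the <property> entries that were written.
grep -c '<property>' "$CORE_SITE_STAGING"
```

Staging the file first keeps the working configuration untouched until the new contents have been checked.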

4.5. Configure hdfs-site.xml

Disable HDFS permission checking so that users other than root can also operate on HDFS

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

4.6. Configure slaves file

vim /root/hadoop/hadoop-2.9.2/etc/hadoop/slaves

Use hadoop3, hadoop4, hadoop5, and hadoop6 as the DataNode machines, one hostname per line

hadoop3
hadoop4
hadoop5
hadoop6

5. On the NameNode node, synchronize the configured hadoop-2.9.2 directory to the other cluster nodes

scp -r /root/hadoop/hadoop-2.9.2 root@hadoop4:/root/hadoop
scp -r /root/hadoop/hadoop-2.9.2 root@hadoop5:/root/hadoop
scp -r /root/hadoop/hadoop-2.9.2 root@hadoop6:/root/hadoop
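Because the same copy is repeated for every remaining node, it can also be written as a loop. The sketch below assumes the Hadoop directory is /root/hadoop/hadoop-2.9.2, as set in HADOOP_HOME in step 4.2. The function name sync_hadoop_dir and the DRY_RUN switch are hypothetical and only serve the illustration. By default it just prints the commands it would run.

```shell
# Hypothetical helper: copy the configured Hadoop directory to each other node.
# With DRY_RUN=1 (the default) the scp commands are printed, not executed.
sync_hadoop_dir() {
  src=/root/hadoop/hadoop-2.9.2
  for host in hadoop4 hadoop5 hadoop6; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "scp -r $src root@$host:/root/hadoop"
    else
      scp -r "$src" "root@$host:/root/hadoop"
    fi
  done
}

# Preview the commands first.
sync_hadoop_dir
```

Once the printed commands look right, running DRY_RUN=0 sync_hadoop_dir on the NameNode node performs the actual copy.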

6. Format NameNode on NameNode node

hdfs namenode -format

7. Turn off firewalls for 4 machines

systemctl stop firewalld

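It is also best to reload the profile file on the other three machines so that the environment variables take effect there (this assumes the same variables from step 4.2 have been added to /etc/profile on those machines)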

source /etc/profile

8. Start Cluster

start-dfs.sh
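Once start-dfs.sh returns, it can help to confirm that all four DataNodes have registered with the NameNode. The following is a minimal sketch, assuming that hdfs dfsadmin -report prints a line of the form "Live datanodes (N):", as Hadoop 2.x releases do. The helper name live_datanodes is hypothetical.

```shell
# Hypothetical helper: read a dfsadmin report on stdin and print N from
# a line of the form "Live datanodes (N):".
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Only query the cluster when the hdfs command is actually on the PATH.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfsadmin -report | live_datanodes
else
  echo "hdfs not found on PATH; run this on the NameNode node after step 4.2"
fi
```

With the plan from section 1, the expected value is 4. Running jps on each machine should likewise list a DataNode process, plus a NameNode process on hadoop3.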

9. Access HDFS cluster browser interface

Open the IP address of the NameNode machine in a browser

http://192.168.23.133:50070/
