Simple Distributed Cluster Setup for HDFS
Preface
This article describes how to set up a simple fully distributed HDFS cluster. It is called simple because it is not a highly available (HA) HDFS. The next article will describe how to build an HA HDFS cluster.
1. Planning for Cluster Building
A total of 4 machines need to be prepared.
One machine serves as the NameNode, and all four machines serve as DataNodes; one of the DataNodes shares a machine with the NameNode.
hadoop3(192.168.23.133): NameNode & DataNode
hadoop4(192.168.23.134): DataNode
hadoop5(192.168.23.135): DataNode
hadoop6(192.168.23.136): DataNode
2. HDFS Distributed Cluster Setup
1. Clone 4 virtual machines
All four cloned machines should already have the JDK installed and environment variables configured; how to install the JDK and configure environment variables is not covered in this article.
2. Modify the network configuration of the 4 machines
2.1 Modify the static IP of the 4 machines
vim /etc/sysconfig/network-scripts/ifcfg-ens33
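A minimal sketch of the static IP configuration for hadoop3 (the GATEWAY and DNS1 values here are assumptions for a typical VMware NAT network; adjust them to your own environment, and use the corresponding planned IP on each of the other machines):

TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.23.133
NETMASK=255.255.255.0
GATEWAY=192.168.23.2
DNS1=192.168.23.2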
2.2 Modify the hostname of the 4 machines
vim /etc/hostname

2.3 Modify the hostname-to-IP mapping file (hosts) on the 4 machines
vim /etc/hosts
Configure the planned IP-to-hostname mappings of all 4 machines in the hosts file on each machine.
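Based on the planning above, each machine's /etc/hosts would contain these entries:

192.168.23.133 hadoop3
192.168.23.134 hadoop4
192.168.23.135 hadoop5
192.168.23.136 hadoop6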
reboot

3. Configure passwordless SSH login
3.1 Generate an SSH key pair on the NameNode node
ssh-keygen

3.2 Copy the public key to the other three machines
ssh-copy-id hadoop4
ssh-copy-id hadoop5
ssh-copy-id hadoop6
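Because hadoop3 itself also acts as a DataNode, start-dfs.sh will SSH into hadoop3 too. If that still asks for a password, the key can also be copied to the local machine (this extra step is an assumption, not part of the original list):

ssh-copy-id hadoop3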
4. Unpack the hadoop package on the NameNode node and configure it
4.1 Unpack the hadoop package
tar -zxvf /root/hadoop/hadoop-2.9.2.tar.gz
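Note that the command above extracts into the current working directory. To make sure the result ends up at /root/hadoop/hadoop-2.9.2 (the HADOOP_HOME used below), either run it from /root/hadoop or add the -C option, for example:

tar -zxvf /root/hadoop/hadoop-2.9.2.tar.gz -C /root/hadoop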
4.2 Configure hadoop environment variables
vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/root/hadoop/hadoop-2.9.2
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Reload the configuration file:
source /etc/profile
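To confirm that the variables took effect, the Hadoop version can be checked:

hadoop version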
4.3 Configure hadoop-env.sh
vim /root/hadoop/hadoop-2.9.2/etc/hadoop/hadoop-env.sh
Set the JAVA_HOME path in hadoop's environment file:
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0

4.4 Configure core-site.xml
vim /root/hadoop/hadoop-2.9.2/etc/hadoop/core-site.xml
1. Configure which machine is the NameNode:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop3:9000</value>
</property>
2. By default, Hadoop stores its data in the system temporary directory (/tmp/hadoop-$). Data kept in the temporary directory can be lost, so change the configuration to store the data in a specified directory. In this article the data is stored in the data folder under the unpacked hadoop-2.9.2 directory:
<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/hadoop-2.9.2/data</value>
</property>

4.5 Configure hdfs-site.xml
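The file lives next to the other configuration files; assuming the same unpack path as above:

vim /root/hadoop/hadoop-2.9.2/etc/hadoop/hdfs-site.xml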
Turn off HDFS permission checking so that non-root users can also operate on HDFS:
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>

4.6 Configure the slaves file
vim /root/hadoop/hadoop-2.9.2/etc/hadoop/slaves
Use hadoop3, hadoop4, hadoop5, and hadoop6 as the DataNode machines:
hadoop3
hadoop4
hadoop5
hadoop6

5. On the NameNode node, synchronize the configured hadoop-2.9.2 directory to the other cluster nodes
scp -r /root/hadoop/hadoop-2.9.2 root@hadoop4:/root/hadoop
scp -r /root/hadoop/hadoop-2.9.2 root@hadoop5:/root/hadoop
scp -r /root/hadoop/hadoop-2.9.2 root@hadoop6:/root/hadoop
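If the /root/hadoop directory does not exist yet on the other machines, create it first, for example:

ssh hadoop4 "mkdir -p /root/hadoop"
ssh hadoop5 "mkdir -p /root/hadoop"
ssh hadoop6 "mkdir -p /root/hadoop"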
6. Format the NameNode on the NameNode node
hdfs namenode -format

7. Turn off the firewall on all 4 machines
systemctl stop firewalld
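To keep the firewall from starting again after a reboot, it can also be disabled permanently:

systemctl disable firewalld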
It is best to reload the profile file on the other three machines as well:
source /etc/profile

8. Start the cluster
start-dfs.sh
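To check that the daemons started, run jps on each machine. On hadoop3 there should normally be NameNode and DataNode processes (and usually a SecondaryNameNode, since start-dfs.sh was run there); on hadoop4, hadoop5, and hadoop6 a DataNode process:

jps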
9. Access the HDFS web interface in a browser
Use the IP of the NameNode machine to access it:
http://192.168.23.133:50070/
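As a quick smoke test (the /test path here is just an example), create a directory and upload a file from the NameNode node, then check that they show up in the web interface:

hdfs dfs -mkdir /test
hdfs dfs -put /etc/profile /test
hdfs dfs -ls /test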