Setup of Hadoop Cluster

Required installation packages:
1. JDK archive (tar.gz)
2. Hadoop archive (tar.gz)

Download from Baidu Cloud if necessary:
jdk: Link: https://pan.baidu.com/s/1Jaclnw1-Ml4lrLgnBJIV9A  Extraction code: 6pgi
hadoop: Link: https://pan.baidu.com/s/1Lr1NDR00DpjcMG1Kczg59A  Extraction code: mqig

Start building the Hadoop cluster (make sure the virtual machines have network connectivity before doing the following):
1. Use VMware to create a virtual machine. I use CentOS 7, so CentOS 7 is the example here. (You could also create three VMs right away, but my personal habit is to create one first, set up the environment, and then clone it.)

It is customary to call this machine master because it will be the host, but the name is a matter of personal preference.
Once the virtual machine is up, you can use SecureCRT or Xshell to log in and operate the server remotely.
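For example, assuming the master VM ends up at 192.168.100.10 (the address used later in this guide), a plain SSH client also works:

ssh root@192.168.100.10   #Log in as root; enter the VM's root password when prompted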

2. Installation of jdk

2.1 Upload and unpack the installation package

mkdir /home/software   #Create a folder to store installation packages; the location is personal preference
cd /home/software/
rz               #Upload the JDK archive with rz; if rz is missing, install it with yum install lrzsz -y, then run rz again and use ls to confirm the file arrived
tar -zxvf jdk-8u144-linux-x64.tar.gz    #Unpack the JDK into the current directory
ls                                      #Check that it unpacked successfully
mv jdk1.8.0_144/ jdk                    #Rename
pwd                                     #Show the current directory

2.2 Configure the /etc/profile file

vim /etc/profile  #Configure Java's environment variables (vi works too)
Add:
#java path configuration
export	JAVA_HOME=/home/software/jdk         #The JDK installation path
export	PATH=$JAVA_HOME/bin:$PATH
export	CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

#Note: the whitespace after export is a Tab
:wq   #Save and exit

2.3 Make the profile take effect

 source /etc/profile
 java -version       #See if java was successfully installed and configured
 javac
 java
 chkconfig iptables off  #Permanently disable the iptables firewall if you forgot to turn it off earlier

If java -version prints the JDK version, the JDK has been installed and configured successfully.
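The output should look roughly like the following (build numbers may differ slightly):

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)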

3. Installation of Hadoop

3.1 Upload and unpack the installation package

cd /home/software/
rz               #Upload hadoop installation package
tar -zxvf hadoop-2.6.5.tar.gz
ls                             #Check if decompression was successful
mv hadoop-2.6.5 hadoop

3.2 Configure the /etc/profile file

 vim /etc/profile  #Configure Hadoop's environment variables
 Add:
 #hadoop configuration
 export  HADOOP_HOME=/home/software/hadoop
 export  PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
 #Note: the whitespace after export is a Tab
:wq   #Save and exit

3.3 Make the profile take effect

source /etc/profile
hadoop version #Check for success
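If the environment variables are correct, the output should start with something like the following (the remaining lines show Subversion, compile, and checksum details):

Hadoop 2.6.5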

4. Clone the Virtual Machines (slave1, slave2)

4.1 Duplicate two virtual machines
Find the virtual machine's directory and copy it twice, naming the copies Slave1 and Slave2. Open each copied VM's folder and delete the leftover lock file (otherwise the VM may not open).

When VMware asks on first boot, click "I copied the virtual machine".

4.2 Remember to modify the IP address in each clone's network card profile:

vi /etc/sysconfig/network-scripts/ifcfg-ens33
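A minimal sketch of the fields to check, assuming the 192.168.100.0/24 addresses used later in this guide (the GATEWAY and DNS values here are placeholders; use the ones from your own VMware network settings):

BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.100.11        #use 192.168.100.12 on Slave2
NETMASK=255.255.255.0
GATEWAY=192.168.100.2        #placeholder; your VMware NAT gateway may differ
DNS1=114.114.114.114         #placeholder; any reachable DNS server works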

Remember to restart the network card:

systemctl restart network

4.3 Close the firewall for master, slave1,slave2

systemctl stop firewalld.service   #Stop firewall 
systemctl disable firewalld.service #Disable firewall startup
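You can confirm the firewall is off with:

systemctl status firewalld   #should report inactive (dead)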

4.4 Modify their hostnames separately

To modify the hostname on CentOS 7:

hostnamectl set-hostname <hostname>
hostnamectl set-hostname Master
hostname  #Check the result
ls

Do the same for slave1 and slave2.
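For example, on the two clones:

hostnamectl set-hostname Slave1   #run on the first clone
hostnamectl set-hostname Slave2   #run on the second clone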

On CentOS 6, modify the hostname by editing /etc/sysconfig/network instead:

On Master:
vi /etc/sysconfig/network
Set:
HOSTNAME=Master

On Slave1:
vi /etc/sysconfig/network
Set:
HOSTNAME=Slave1

On Slave2:
vi /etc/sysconfig/network
Set:
HOSTNAME=Slave2

4.5 Modify /etc/hosts for domain names (do this on all three nodes)

vi /etc/hosts
 Change to:
192.168.100.10 Master Master.cn 
192.168.100.11 Slave1 Slave1.cn
192.168.100.12 Slave2 Slave2.cn

#Format: IP address  hostname  domain name

To save time:

scp /etc/hosts slave1.cn:/etc/    #Copy the file to the other two machines (use the receiving VM's IP address here instead of the domain name, since its hosts file is not set up yet)

Otherwise you have to edit /etc/hosts on each machine one by one; either way, all three machines must end up with the same entries.
Once this is set up, the machines can reach each other by domain name.
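For example, using the addresses from the table above:

scp /etc/hosts 192.168.100.11:/etc/
scp /etc/hosts 192.168.100.12:/etc/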
Restart

reboot

Test whether the domain names are configured successfully:
Create a file on Master, for example:

touch 18hadoop.bc
scp 18hadoop.bc Slave1.cn:/root    #Copy it to Slave1
Then on Slave1:
ls               #The file is there, so transfer by domain name works

5. Configure SSH password-free login

cd ~
ssh-keygen -t rsa        #Keep pressing Enter
ssh-copy-id Master.cn    #Copy the public key to another virtual machine
ssh-copy-id Slave2.cn    #Copy the public key to another virtual machine
touch 2.c
scp 2.c Master.cn:/root  #Create a file and copy it to Master to test that password-free login works

ssh-copy-id Master.cn    #Copy the public key to Master
ssh-copy-id Slave2.cn    #Copy the public key to Slave2

Do this once on the other two virtual machines as well.
This lets all three virtual machines log in to each other and transfer files without a password.
Use scp to test whether password-free login succeeds.
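A quick check, assuming the hosts entries above, is to ssh between the machines by domain name; no password prompt should appear:

ssh Slave1.cn    #should log in without asking for a password
exit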

6. Modify the configuration files for the cluster on Master

6.1 First create a folder to store the Hadoop data

  cd /home/software/   #Go into software, create a data directory, and ls to check
  mkdir data
  cd data
  mkdir -p hadoop/tmp    #Create hadoop/tmp under data, and ls to check

6.2 Modify the hadoop configuration file

  1. Modify the hadoop-env.sh file
cd /home/software/hadoop/etc/
ls
cd hadoop/            #Hadoop's configuration files are in etc/hadoop
vim hadoop-env.sh     #Set JAVA_HOME to the JDK installation directory

  2. Modify the mapred-env.sh file
vim mapred-env.sh   #Also set JAVA_HOME to the JDK installation directory; see the sketch below
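In both hadoop-env.sh and mapred-env.sh, the JAVA_HOME line should point at the JDK installed earlier, for example:

export JAVA_HOME=/home/software/jdk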

  3. Modify the core-site.xml file
 vim core-site.xml    #Configure core-site.xml
 Between <configuration> and </configuration>, add:
 <property>
	<name>hadoop.tmp.dir</name>
	<value>/home/software/data/hadoop/tmp</value>
</property>
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://192.168.100.10:9000</value>
</property>
#/home/software/data/hadoop/tmp is the directory created earlier to hold Hadoop data
#In hdfs://192.168.100.10:9000, the address before the port is the Master host's IP; the port can stay at the default 9000

  4. Modify the yarn-site.xml file
vim yarn-site.xml       #Configure yarn-site.xml
 Between <configuration> and </configuration>, add:
 <property>
	<name>yarn.resourcemanager.hostname</name>
	<value>192.168.100.10</value>
 </property>
 <property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
 </property>
 #192.168.100.10 is the IP address of the host Master

  5. Modify the hdfs-site.xml file
 vim hdfs-site.xml  #Configure hdfs-site.xml
  Between <configuration> and </configuration>, add:
  <property>
	<name>dfs.replication</name>
	<value>3</value>
  </property>
  <property>
	<name>dfs.secondary.http.address</name>
	<value>192.168.100.10:50090</value>
  </property>
 #192.168.100.10 is the IP address of the host Master; 3 is the replication factor (one copy per node here)

  6. Modify the mapred-site.xml file
cp mapred-site.xml.template mapred-site.xml #Create mapred-site.xml from the template
vim mapred-site.xml         #Configure mapred-site.xml (run MapReduce on YARN)
Between <configuration> and </configuration>, add:
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>

  7. Modify the slaves file
 vim slaves       #Configure slaves
 Replace localhost with:
 Slave1.cn
 Slave2.cn
 #Slave1.cn and Slave2.cn are the machines you want in the cluster


#All of the configuration files above are now complete

Distribute the configured Hadoop to the other machines:

 cd /home/software
 ls
 scp -r hadoop slave2.cn:/home/software/
 scp -r hadoop slave1.cn:/home/software/

7. Start the Cluster

Format the file system before starting (on Master, only the first time):

hdfs namenode -format
start-dfs.sh   #Start dfs
start-yarn.sh #Start yarn
jps   #Shows the processes started on each node: on Master you should see NameNode, ResourceManager, and SecondaryNameNode (plus JobHistoryServer if the history server is running); jps on the Slave1 and Slave2 nodes should show DataNode and NodeManager
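A rough sketch of what jps prints (the process IDs are examples only):

On Master:
2481 NameNode
2675 SecondaryNameNode
2832 ResourceManager
3101 Jps

On Slave1 / Slave2:
2210 DataNode
2318 NodeManager
2505 Jps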

If these processes appear, congratulations: the cluster has been built successfully.

The cluster is also shut down from Master, by running these commands in order:

 stop-yarn.sh 
 stop-dfs.sh 



 start-all.sh #Start the entire cluster
 stop-all.sh  #Stop the entire cluster
