Setup of Hadoop Cluster
Required installation packages:
1. jdk compressed archive
2. hadoop compressed archive
Download from Baidu Cloud if necessary:
jdk: Link: https://pan.baidu.com/s/1Jaclnw1-Ml4lrLgnBJIV9A Extraction Code: 6pgi
hadoop: Link: https://pan.baidu.com/s/1Lr1NDR00DpjcMG1Kczg59A Extraction Code: mqig
Start building the hadoop cluster (before doing the steps below, make sure the virtual machines' network is connected):
1. Use VMware to create a virtual machine. I use CentOS 7, so CentOS 7 is the example here (you can also create three machines directly, but my personal habit is to create one first, set up the environment on it, and then clone it).
It is customary to call it master because it is the host node, but the name is a matter of personal preference.
After the virtual machine is set up, SecureCRT or Xshell can be used to log in and operate the server remotely.
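For example, a minimal way to connect from any SSH client, assuming the master VM uses the IP address 192.168.100.10 that appears later in this guide:
ssh root@192.168.100.10    #Log in to the virtual machine remotely; enter the root password when prompted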
2. Installation of jdk
2.1 Preparations for uploading installation packages
mkdir /home/software    #Create a folder to store installation packages; where you put it is up to your personal habits
cd /home/software/
rz    #Use the rz command to upload the jdk archive; if rz does not exist, install it with yum install lrzsz. After it installs successfully, run rz in this directory to upload the jdk file, then ls to confirm the jdk archive was uploaded
tar -zxvf jdk-8u144-linux-x64.tar.gz    #Unzip jdk into the current directory
ls    #Check that decompression succeeded
mv jdk1.8.0_144/ jdk    #Rename
pwd    #View the current directory
2.2 Configure the /etc/profile file
vim /etc/profile    #Configure java's environment variables to point at the jdk directory (the vi command also works). Add:
#java path configuration
export JAVA_HOME=/home/software/jdk    #This is the jdk installation path
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
#Note: the separator in the middle is a Tab key
:wq    #Save and exit
Make the profile take effect:
source /etc/profile
java -version    #See if java was successfully installed and configured
javac
java
chkconfig iptables off    #Use this command to permanently close the firewall if you forgot to turn it off before
The jdk is installed and configured successfully if output like the following appears.
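As a rough reference only (build numbers may vary), java -version for jdk 8u144 typically prints something like:
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)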
3. Installation of Hadoop
3.1 Preparations for uploading installation packages
cd /home/software/
rz    #Upload the hadoop installation package
tar -zxvf hadoop-2.6.5.tar.gz
ls    #Check that decompression succeeded
mv hadoop-2.6.5 hadoop
3.2 Configure the /etc/profile file
vim /etc/profile    #Configure environment variables for hadoop. Add:
#hadoop configuration
export HADOOP_HOME=/home/software/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
#Note: the separator in the middle is a Tab key
:wq    #Save and exit
3.3 Make the profile take effect
source /etc/profile
hadoop version    #Check for success
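If the environment variables are correct, the output begins with the Hadoop version; it looks roughly like the following (build details will differ):
Hadoop 2.6.5
...
This command was run using /home/software/hadoop/share/hadoop/common/hadoop-common-2.6.5.jar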
4. Clone the Virtual Machine (slave1, slave2)
4.1 Duplicate two virtual machines
Find the virtual machine's directory and copy it twice, naming the copies Slave1 and Slave2. Before opening each copy, open the copied folder and delete the corresponding lock files (otherwise the copy may not open).
When VMware asks, click "I copied the virtual machine".
4.2 Remember to modify the IP address in each machine's network card configuration file separately:
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Remember to restart the network card:
systemctl restart network
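For reference, a minimal static-IP configuration in ifcfg-ens33 might look like the following; the addresses follow the 192.168.100.x scheme used later in this guide, while GATEWAY and DNS1 are assumptions that must match your own VMware network settings:
BOOTPROTO=static          #Use a static IP instead of DHCP
ONBOOT=yes                #Bring the interface up at boot
IPADDR=192.168.100.11     #e.g. Slave1; use .10 for Master and .12 for Slave2
NETMASK=255.255.255.0
GATEWAY=192.168.100.2     #Assumed gateway for a VMware NAT network
DNS1=114.114.114.114      #Assumed DNS server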
4.3 Turn off the firewall on master, slave1, and slave2
systemctl stop firewalld.service       #Stop the firewall
systemctl disable firewalld.service    #Disable firewall startup at boot
4.4 Modify their hostnames separately
To modify the hostname on CentOS 7:
hostnamectl set-hostname <host name>
hostnamectl set-hostname Master
hostname    #Check it
The same is true for slave1 and slave2.
If you are on CentOS 6, modify the hostname with:
vi /etc/sysconfig/network
On Master, modify HOSTNAME=Master
On Slave1, set HOSTNAME=Slave1
On Slave2, set HOSTNAME=Slave2
4.5 Modify the hosts file to map the domain names (do this once on all three nodes):
vi /etc/hosts
Modify it to:
192.168.100.10 Master Master.cn
192.168.100.11 Slave1 Slave1.cn
192.168.100.12 Slave2 Slave2.cn
#Format: ip address  hostname  domain name
To save time:
scp /etc/hosts slave1.cn:/etc/    #(If the domain name does not resolve yet, use the IP address of the virtual machine that receives the file instead) and send it to the other two machines
Otherwise you have to modify each machine one by one; either way, all three must end up modified.
Once this is set up, the machines can be reached by domain name.
Restart
reboot
Test whether the domain name is configured successfully:
Create a file on Master, for example:
touch 18hadoop.bc
scp 18hadoop.bc Slave1.cn:/root    #Send it to Slave1
On Slave1, run ls    #You can see that the file has been transferred successfully by domain name
5. Configure ssh password-free login
cd ~
ssh-keygen -t rsa    #Keep pressing Enter
ssh-copy-id Master.cn    #Copy the public key to Master
ssh-copy-id Slave2.cn    #Copy the public key to Slave2
touch 2.c
scp 2.c Master.cn:/root    #Create a file and send it to Master to test whether password-free login succeeds
The commands above are run on one machine (here, Slave1); do the same once on the other two virtual machines.
This enables all three virtual machines to log in to each other without a password and to transfer files.
Use scp (or ssh directly) to test whether password-free login succeeds.
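For example, a quick check from Master, assuming the keys were exchanged as above; if no password prompt appears, password-free login is working:
ssh Slave1.cn    #Should log in without asking for a password
hostname         #Prints Slave1, confirming you are on the slave
exit             #Return to Master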
6. Modify the configuration files on Master to build the cluster
6.1 First create a folder to store Hadoop's data
cd /home/software/    #Go into software to create a new directory for data
mkdir data
ls
cd data
mkdir -p hadoop/tmp    #Create hadoop/tmp under data
ls
6.2 Modify the hadoop configuration file
- Modify the hadoop-env.sh file
cd /home/software/hadoop/etc/
ls
cd hadoop/    #Hadoop's configuration files are under etc/hadoop
vim hadoop-env.sh    #Change JAVA_HOME to the jdk installation directory
- Modify mapred-env.sh file
vim mapred-env.sh    #Also change JAVA_HOME to the jdk installation directory
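In both files the JAVA_HOME line ends up looking something like the following, matching the jdk path used earlier in this guide (in mapred-env.sh the line usually ships commented out and needs to be uncommented):
export JAVA_HOME=/home/software/jdk    #Point Hadoop at the jdk installed above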
- Modify the core-site.xml file
vim core-site.xml    #Configure core-site.xml; add the following between <configuration> and </configuration>:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/software/data/hadoop/tmp</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.100.10:9000</value>
</property>
#/home/software/data/hadoop/tmp is the directory created earlier to hold Hadoop data
#In hdfs://192.168.100.10:9000, the part before the port is the IP address of the host Master; the port can default to 9000
- Modify yarn-site.xml file
vim yarn-site.xml    #Configure yarn-site.xml; add the following between <configuration> and </configuration>:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>192.168.100.10</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
#192.168.100.10 is the IP address of the host Master
- Modify the hdfs-site.xml file
vim hdfs-site.xml    #Configure hdfs-site.xml; add the following between <configuration> and </configuration>:
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.secondary.http.address</name>
  <value>192.168.100.10:50090</value>
</property>
#192.168.100.10 is the IP address of the host Master; 3 is the replication factor (here, the number of machines)
- Modify mapred-site.xml file
cp mapred-site.xml.template mapred-site.xml    #Copy mapred-site.xml from the template
vim mapred-site.xml    #Configure mapred-site.xml (run MapReduce on the YARN resource manager); add the following between <configuration> and </configuration>:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
- Modify slaves file
vim slaves    #Configure slaves: replace localhost with
Slave1.cn
Slave2.cn
#Slave1.cn and Slave2.cn are the machines you want in the cluster
#The configuration files above are now complete
Distribute the configured hadoop directory to the other machines:
cd /home/software
ls
scp -r hadoop slave2.cn:/home/software/
scp -r hadoop slave1.cn:/home/software/
7. Start the Cluster
Format the file system before starting
hdfs namenode -format    #Format the file system (only before the first start)
start-dfs.sh    #Start dfs
start-yarn.sh    #Start yarn
jps    #View the processes started on each node: on the Slave1 and Slave2 nodes you can see the DataNode and NodeManager processes, and on Master the NameNode, ResourceManager, and SecondaryNameNode processes (plus JobHistoryServer if the history server has been started)
If these processes appear, congratulations, the cluster has been built successfully.
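As a rough illustration only (the process IDs here are made up and will differ), jps output on the nodes looks something like this:
On Master:
2481 NameNode
2934 ResourceManager
2771 SecondaryNameNode
3248 Jps
On Slave1 / Slave2:
2210 DataNode
2319 NodeManager
2543 Jps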
Shutting down the cluster is also done on Master, executing the commands in order:
stop-yarn.sh
stop-dfs.sh
start-all.sh    #Start the entire cluster
stop-all.sh     #Stop the entire cluster