Building a Hadoop Cluster on CentOS 6.8

1. Prepare a clean CentOS 6.8 virtual machine

2. Turn off the firewall

1. Stop the firewall temporarily

service iptables stop

2. Disable firewall autostart at boot

chkconfig iptables off

3. View firewall status

service iptables status

3. Set static ip

vim /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
# the value can be written with or without double quotes
NAME="eth0"
IPADDR=192.168.5.101
PREFIX=24
GATEWAY=192.168.5.2
DNS1=192.168.5.2
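
After saving the file, restart the network service so the new address takes effect; on CentOS 6 this is done with:

service network restart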

4. Modify the host name

On CentOS 6 the persistent hostname is set in /etc/sysconfig/network (the HOSTNAME= line, e.g. HOSTNAME=hadoop101 for this template machine):

vim /etc/sysconfig/network

5. modify the hosts file

vim /etc/hosts
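
Add one line per cluster machine. For example, following the 192.168.5.x addressing used above (the exact hostnames and addresses are an assumption based on that pattern):

192.168.5.101 hadoop101
192.168.5.102 hadoop102
192.168.5.103 hadoop103
192.168.5.104 hadoop104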

6. Prepare the atguigu user

Perform the following operations as root.

1. Create the atguigu user and set its password

useradd atguigu
passwd atguigu

2. Make atguigu a sudoers user (sudo without a password)

visudo
# add the following line (uncommented):
atguigu    ALL=(ALL)       NOPASSWD:ALL

3. Create two directories under /opt

mkdir /opt/software /opt/module

Assign ownership of both directories to the atguigu user:

chown atguigu:atguigu /opt/module /opt/software

7. Clone the machine (use the ordinary atguigu user for subsequent operations)

1. Fix the network card udev rule

vim /etc/udev/rules.d/70-persistent-net.rules

Delete the first entry (the template machine's original NIC); in the remaining (second) entry, change NAME="eth1" to NAME="eth0".
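
2. Update the clone's own IP address and hostname

After fixing the udev rule, each clone still carries the template machine's address and hostname, so both must be changed per machine. A sketch for hadoop102 (the addresses follow the 192.168.5.x pattern assumed above):

# give this clone its own static IP
vim /etc/sysconfig/network-scripts/ifcfg-eth0    # e.g. IPADDR=192.168.5.102
# give this clone its own hostname
vim /etc/sysconfig/network                       # e.g. HOSTNAME=hadoop102
# reboot so the NIC rename, IP and hostname all take effect
reboot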

8. Write a distribution script to copy files between machines

1. rsync, the remote synchronization tool

rsync transfers only the files that differ between source and destination.

Basic syntax:

rsync -av $pdir/$fname $user@hadoop$host:$pdir/$fname
-a	archive mode (preserve permissions, times, links, etc.)
-v	verbose, show the transfer progress

Example:

Synchronize the /opt/software directory on hadoop101 to the hadoop102 server:

rsync -av /opt/software/ hadoop102:/opt/software
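
Note that rsync must be installed on both machines; a minimal CentOS 6.8 install may not have it, in which case it can be added with yum:

sudo yum install -y rsync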

2. The distribution script (xsync):

#!/bin/bash
# check the number of arguments
if [ $# -lt 1 ]
then
	echo "Not enough arguments"
	exit
fi
# loop over the machines in the cluster
for host in hadoop102 hadoop103 hadoop104
do
	echo --------------------$host--------------------
	# loop over every file/directory passed as an argument
	for file in "$@"
	do
		# -e tests whether the path exists (matches both files and directories)
		if [ -e "$file" ]
		then
			# get the parent directory;
			# -P resolves symlinks so we work with the physical (absolute) path
			dir=$(cd -P "$(dirname "$file")"; pwd)
			# get the file name
			base=$(basename "$file")
			# create the directory on the remote machine
			# -p: create parent directories as needed, no error if it already exists
			ssh "$host" "mkdir -p $dir"
			# rsync copies only the differences
			rsync -av "$dir/$base" "$host:$dir"
		else
			echo "$file does not exist!"
		fi
	done
done

3. Give the script (xsync) executable permission and put it on the PATH

chmod +x xsync
sudo cp ./xsync /bin
# test the script
sudo xsync /bin/xsync

9. Configure password free login

1. Generate a key pair

# -t specifies the type of key to create; press Enter three times to accept the defaults
ssh-keygen -t rsa
# copy the public key to hadoop102 (the password is asked once)
ssh-copy-id hadoop102
# distribute the .ssh directory (keys and authorized_keys) to the other machines in the cluster
xsync /home/atguigu/.ssh
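
A quick check that the setup works (hostname taken from the cluster layout above) is that an ssh session now opens without a password prompt:

ssh hadoop103
exit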

10. Upload the installation packages and configure environment variables

sudo vim /etc/profile.d/my_envi.sh
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
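
Before these variables are useful, the JDK and Hadoop archives have to be unpacked into /opt/module and the new profile loaded. A minimal sketch, assuming the archives jdk-8u144-linux-x64.tar.gz and hadoop-2.7.2.tar.gz were uploaded to /opt/software (the exact archive names are an assumption):

tar -zxvf /opt/software/jdk-8u144-linux-x64.tar.gz -C /opt/module
tar -zxvf /opt/software/hadoop-2.7.2.tar.gz -C /opt/module
source /etc/profile
java -version
hadoop version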

11. Configure the core-site.xml file

    <!-- Address of the NameNode in HDFS -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop102:9000</value>
    </property>

    <!-- Directory where Hadoop stores files generated at run time -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
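
The property blocks in steps 11 through 14 all go inside the <configuration> element of the corresponding file under $HADOOP_HOME/etc/hadoop (here /opt/module/hadoop-2.7.2/etc/hadoop), i.e. each file has the shape:

<configuration>
    <!-- the property blocks from the step go here -->
</configuration>

If the daemons later complain that JAVA_HOME is not set, it usually also has to be exported in hadoop-env.sh (and yarn-env.sh, mapred-env.sh) in the same directory.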

12. Configure the hdfs-site.xml file

    <!-- Number of copies of the data -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Host of the Hadoop SecondaryNameNode -->
    <property>
          <name>dfs.namenode.secondary.http-address</name>
          <value>hadoop104:50090</value>
    </property>

13. Configure the yarn-site.xml file

    <!-- Site specific YARN configuration properties -->
    <!-- How the Reducer obtains its data -->
    <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
    </property>

    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
    </property>
    <!-- Log aggregation enabled -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <!-- Log retention time: 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

14. Configure mapred-site.xml
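
Hadoop 2.7.2 ships this file only as a template, so it normally has to be created first (run in $HADOOP_HOME/etc/hadoop; assuming the stock distribution layout):

cp mapred-site.xml.template mapred-site.xml

Then add the following properties: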

    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop104:10020</value>
    </property>
    <!-- JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop104:19888</value>
    </property>

15. Start the history server

# run on hadoop104, the host configured for the JobHistory server above
mr-jobhistory-daemon.sh start historyserver

16. Configure the slaves file ($HADOOP_HOME/etc/hadoop/slaves), one hostname per line with no extra blank lines or trailing spaces

hadoop102
hadoop103
hadoop104

17. Distribute the configuration files

xsync /opt/module/hadoop-2.7.2/etc

18. Format namenode

# run on the machine where the NameNode is configured (hadoop102)
hdfs namenode -format

19. Start HDFS

# run on the NameNode machine (hadoop102)
# the cluster start script reads the slaves file to decide where to start DataNodes
start-dfs.sh

20. Start YARN

# run on the machine where the ResourceManager is configured (hadoop103)
start-yarn.sh
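
With everything running, jps on each node would be expected to show roughly the following daemons (a layout derived from the configuration in steps 11-16 above):

# hadoop102: NameNode, DataNode, NodeManager
# hadoop103: ResourceManager, DataNode, NodeManager
# hadoop104: SecondaryNameNode, DataNode, NodeManager
jps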

21. Start a single daemon

hadoop-daemon.sh start namenode        # or: datanode
yarn-daemon.sh start resourcemanager   # or: nodemanager

22. Shut down the cluster

stop-dfs.sh 
stop-yarn.sh

23. If the cluster gets into a bad state

stop-dfs.sh
stop-yarn.sh
# run the following on all three machines
cd $HADOOP_HOME
rm -rf data logs
# then go back to step 18 and reformat the NameNode
Tags: Hadoop rsync vim firewall

Posted on Thu, 20 Feb 2020 02:33:01 -0500 by igorek