Contents
- Preparation
- View local network information
- View network connection status
- Change network information
- Change the hostname
- Clone the virtual machine to obtain the slave1 and slave2 nodes
- Configure slave1 and slave2
- Map hostnames to IPs
- Configure SSH passwordless login
- Turn off the firewall and SELinux
- Install the JDK
- Create a new user
- Hadoop environment configuration
- Run the WordCount program
Preparation
First, start the virtual machine. I use CentOS 7, but the steps are largely the same on other Linux distributions.
View local network information
Enter the virtual network editor
Open NAT Settings and record the subnet and gateway information shown there; you will need them later.
View network connection status
You can see that the network is connected successfully.
Running the ifconfig command shows there is no eth0, which is not what we are used to (if yours is already eth0, you can skip this step), and remote SSH connections are not possible yet. Rename the interface configuration file:
cd /etc/sysconfig/network-scripts/
mv ifcfg-ens33 ifcfg-eth0
Change network information
If your interface is already eth0, you can start from here.
Switch to root first; otherwise vim will complain that the file cannot be saved.
su
vim /etc/sysconfig/network-scripts/ifcfg-eth0
Change the following information. Note that you will need the IP and gateway you recorded earlier.
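For reference, a minimal static configuration might look like the sketch below; the addresses are placeholders, so substitute the IP you chose for this node and the gateway you recorded from the NAT settings.
TYPE=Ethernet
BOOTPROTO=static        # static address instead of DHCP
NAME=eth0
DEVICE=eth0
ONBOOT=yes              # bring the interface up at boot
IPADDR=192.168.128.10   # placeholder: the IP you chose for this node
NETMASK=255.255.255.0
GATEWAY=192.168.128.2   # placeholder: the NAT gateway you recorded
DNS1=192.168.128.2      # placeholder: often the same as the gateway under NAT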
Restart the network card, and you can see that the changes take effect
service network restart
If the network service fails to restart, edit the GRUB configuration:
vim /etc/default/grub
Add the following
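The exact content of the original screenshot is missing; on CentOS 7 the usual change is to append net.ifnames=0 biosdevname=0 to the existing GRUB_CMDLINE_LINUX entry so the kernel keeps the legacy interface names. For example (the other options shown are CentOS defaults and may differ on your machine):
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet net.ifnames=0 biosdevname=0"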
Execute the following command to regenerate the GRUB configuration:
grub2-mkconfig -o /boot/grub2/grub.cfg
If it still doesn't work, reboot:
reboot
Change the hostname
vim /etc/hostname
Reboot the computer by executing the reboot command
Finally, check whether the configuration is correct:
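For example, on the master node (assuming you named it master):
hostname   # should print: master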
Clone the virtual machine to obtain the slave1 and slave2 nodes
Shut down the virtual machine and open the clone wizard.
Click Next.
Click Next again.
Select "Create a full clone".
Choose a location for the clone.
Click Finish and wait.
Configure slave1 and slave2
Configure slave1 and slave2 using the same method as above. Remember to give each node a different IP address and to set the hostnames to slave1 and slave2 respectively.
vim /etc/sysconfig/network-scripts/ifcfg-eth0
vim /etc/hostname
Reboot and check whether the configuration succeeded.
Map hostnames to IPs
vim /etc/hosts
Add the following
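The entries from the original screenshot are missing; they map each hostname to the static IP you assigned earlier, along these lines (placeholder addresses, substitute your own):
192.168.128.10 master
192.168.128.11 slave1
192.168.128.12 slave2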
Check whether the configuration is successful
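A quick check is to ping each node by name:
ping -c 3 slave1
ping -c 3 slave2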
Configure SSH passwordless login
For convenience, I have combined the steps into one command; just press Enter at every prompt while it runs.
ssh-keygen -t rsa&&cd ~/.ssh/&&cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys&&chmod 600 ~/.ssh/authorized_keys&&cat ~/.ssh/authorized_keys&&ls
The results are as follows
Run this command on master, slave1, and slave2, then copy each slave's public key (the part highlighted in red in the screenshot above) into the master node's authorized_keys.
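One way to do the copy, assuming root logins are allowed and ssh-copy-id is available:
# Run on slave1 and on slave2; appends that node's public key to master's authorized_keys
ssh-copy-id root@master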
The final results are as follows:
Distribute the combined authorized_keys to the child nodes, then test that passwordless login works.
scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
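From the master you can then verify that no password is requested:
ssh slave1   # should log in without a password prompt
exit
ssh slave2
exit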
Turn off the firewall and SELinux
Do the following on all nodes:
yum install iptables-services
systemctl stop firewalld
On the master node, do the following:
vim /etc/selinux/config
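The content of the edit is missing from the original; the standard change is to set SELINUX to disabled (it takes effect after a reboot):
SELINUX=disabled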
Install the JDK
Download address: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Enter the folder containing the JDK archive and execute the following (remember to substitute the name of your own archive):
mkdir -p /usr/local/java   # Create the target folder
tar -vzxf jdk-8u251-linux-x64.tar.gz -C /usr/local/java/   # Extract to the specified location
Check the name of the extracted directory, then add the following at the bottom of /etc/profile:
export JAVA_HOME=/usr/local/java/jdk1.8.0_251
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin
Apply the environment variables:
source /etc/profile
To see if the installation was successful:
java -version
Create a new user
adduser hadoop
Do the following
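The original screenshot is missing; presumably this step sets the new user's password with the standard passwd command:
passwd hadoop   # enter the new password twice when prompted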
Output like the following indicates that the configuration succeeded.
Give the hadoop user superuser privileges:
vim /etc/sudoers
Change to the following form
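The screenshot is missing; the usual edit is to add a line for hadoop directly below root's entry:
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL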
Hadoop environment configuration
Download and install
Download website: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
Place the hadoop archive in a folder of your choice, enter that directory, and execute the following command.
tar -vzxf hadoop-3.1.3.tar.gz -C /usr/&&cd /usr/hadoop-3.1.3&&mkdir -p dfs/name&&mkdir -p dfs/data&&mkdir temp&&ls
Environment configuration
cd ./etc/hadoop/&&vim hadoop-env.sh
Add the following environment variables
export JAVA_HOME=/usr/local/java/jdk1.8.0_251/
export HADOOP_PREFIX=/usr/hadoop-3.1.3
vim yarn-env.sh
Add the following:
if [ "$JAVA_HOME" != "" ];then #echo "run java in $JAVA_HOME" JAVA_HOME=/usr/local/java/jdk1.8.0_251/ fi
Open slaves or workers in the current folder (Hadoop 2.x uses slaves, Hadoop 3.x uses workers):
vim workers
Delete the default hostname and add your own node names, as shown below.
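Assuming slave1 and slave2 are the worker (DataNode) nodes, the file would contain:
slave1
slave2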
vim /etc/profile
Add the following environment variables
export HADOOP_HOME=/usr/hadoop-3.1.3
export PATH=$HADOOP_HOME/bin:$PATH
Apply the profile:
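As with the JDK setup earlier, this would be:
source /etc/profile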
Change the .sh files
cd /usr/hadoop-3.1.3/sbin/
In start-dfs.sh and stop-dfs.sh, add the following parameters at the top of both files:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
In start-yarn.sh and stop-yarn.sh, the following should also be added at the top:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Change the .xml files
cd /usr/hadoop-3.1.3/etc/hadoop/
vim core-site.xml
Add the following information
<configuration> <property> <name>fs.defaultFS</name> <value>hdfs://master:9000</value> </property> <property> <name>io.file.buffer.size</name> <value>131072</value> </property> <property> <name>hadoop.tmp.dir</name> <value>file:/usr/local/hadoop-3.1.3/tmp</value> <description>Abase for other temporary directories.</description> </property> <property> <name>hadoop.proxyuser.hduser.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hduser.groups</name> <value>*</value> </property> </configuration>
vim hdfs-site.xml
Add the following information
<configuration> <property> <name>dfs.namenode.secondary.http-address</name> <value>master:9001</value> </property> <property> <name>dfs.namenode.name.dir</name> <value>file:/usr/hadoop-3.1.3/dfs/name</value> </property> <property> <name>dfs.datanode.data.dir</name> <value>file:/usr/hadoop-3.1.3/dfs/data</value> </property> <property> <name>dfs.replication</name> <value>3</value> </property> <property> <name>dfs.webhdfs.enabled</name> <value>true</value> </property> </configuration>
vim mapred-site.xml
Add the following information
<configuration> <property> <name>mapreduce.jobhistory.address</name> <value>master:10020</value> </property> <property> <name>mapreduce.jobhistory.webapp.address</name> <value>master:19888</value> </property> </configuration>
vim yarn-site.xml
Add the following information
<configuration> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <property> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property> <property> <name>yarn.resourcemanager.address</name> <value>master:8032</value> </property> <property> <name>yarn.resourcemanager.scheduler.address</name> <value>master:8030</value> </property> <property> <name>yarn.resourcemanager.resource-tracker.address</name> <value>master:8031</value> </property> <property> <name>yarn.resourcemanager.admin.address</name> <value>master:8033</value> </property> <property> <name>yarn.resourcemanager.webapp.address</name> <value>master:8088</value> </property> </configuration>
Check success
hadoop version
Execute
hadoop classpath
Copy the printed information and add it to yarn-site.xml as I do below.
<property> <name>yarn.application.classpath</name> <value>/usr/hadoop-3.1.3/etc/hadoop:/usr/hadoop-3.1.3/share/hadoop/common/lib/*:/usr/hadoop-3.1.3/share/hadoop/common/*:/usr/hadoop-3.1.3/share/hadoop/hdfs:/usr/hadoop-3.1.3/share/hadoop/hdfs/lib/*:/usr/hadoop-3.1.3/share/hadoop/hdfs/*:/usr/hadoop-3.1.3/share/hadoop/mapreduce/lib/*:/usr/hadoop-3.1.3/share/hadoop/mapreduce/*:/usr/hadoop-3.1.3/share/hadoop/yarn:/usr/hadoop-3.1.3/share/hadoop/yarn/lib/*:/usr/hadoop-3.1.3/share/hadoop/yarn/*</value> </property>
Transfer and connect
Transfer to two child nodes
scp -r /usr/hadoop-3.1.3/ root@slave1:/usr/&&scp -r /usr/hadoop-3.1.3/ root@slave2:/usr/
Format namenode
/usr/hadoop-3.1.3/bin/hdfs namenode -format
Start the cluster
/usr/hadoop-3.1.3/sbin/stop-all.sh&&/usr/hadoop-3.1.3/sbin/start-dfs.sh&&/usr/hadoop-3.1.3/sbin/start-yarn.sh
Check whether it started successfully
hdfs dfsadmin -report
If "Live datanodes" is not 0, the cluster started successfully.
If unsuccessful: solution 1
If executing jps on a slave node shows no DataNode process, it is usually caused by having run /usr/hadoop-3.1.3/bin/hdfs namenode -format multiple times:
Open hdfs-site.xml, find the two paths configured there (dfs.namenode.name.dir and dfs.datanode.data.dir), and delete everything under those directories on both the master and slave nodes.
Execute the following command again
/usr/hadoop-3.1.3/bin/hdfs namenode -format
/usr/hadoop-3.1.3/sbin/stop-all.sh&&/usr/hadoop-3.1.3/sbin/start-dfs.sh&&/usr/hadoop-3.1.3/sbin/start-yarn.sh
hdfs dfsadmin -report
If unsuccessful: solution 2
If the datanodes appear but still cannot connect, it is probably because the firewall was not turned off:
Execute the following command on all nodes:
systemctl stop firewalld
Execute the following command again
/usr/hadoop-3.1.3/sbin/stop-all.sh&&/usr/hadoop-3.1.3/sbin/start-dfs.sh&&/usr/hadoop-3.1.3/sbin/start-yarn.sh
hdfs dfsadmin -report
Run the WordCount program
Find a few txt files and place them in the input path (replace /path/to/your/txt below with the directory that holds them):
hadoop dfs -mkdir -p /usr/hadoop-3.1.3/input&&hadoop dfs -put /path/to/your/txt/* /usr/hadoop-3.1.3/input&&hadoop dfs -ls /usr/hadoop-3.1.3/input
Note that the output path cannot exist in advance. If it exists, delete it with the following command:
hadoop dfs -rmr /usr/hadoop-3.1.3/output
Run the WordCount program:
hadoop jar /usr/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /usr/hadoop-3.1.3/input /usr/hadoop-3.1.3/output
Results like the following indicate that the job ran successfully.
View output folder
hadoop dfs -ls /usr/hadoop-3.1.3/output
Print the results
hadoop dfs -cat /usr/hadoop-3.1.3/output/part-r-00000