Operation and maintenance project training - Hadoop Distributed File System (HDFS)

1, Hadoop Distributed File System (HDFS): single data storage node

1. Install and deploy Hadoop

Specific steps: https://blog.csdn.net/Hannah_zh/article/details/81169416

2. Modify the configuration file
<1> Specify the NameNode address
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim core-site.xml 
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.25.51.1:9000</value>
    </property>
</configuration>

<2> Specify the DataNode address and the number of replicas HDFS keeps
[hadoop@server1 hadoop]$ vim slaves 
172.25.51.1
[hadoop@server1 hadoop]$ vim hdfs-site.xml 
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>     ##HDFS keeps one replica (copy) of each data block
    </property>
</configuration>
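
Once both files are saved, whether Hadoop actually picks up the settings can be confirmed with hdfs getconf (a minimal check; it assumes the commands are run from /home/hadoop/hadoop):

[hadoop@server1 hadoop]$ cd ~/hadoop
[hadoop@server1 hadoop]$ bin/hdfs getconf -confKey fs.defaultFS      ##should print hdfs://172.25.51.1:9000
[hadoop@server1 hadoop]$ bin/hdfs getconf -confKey dfs.replication   ##should print 1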

3. Set up SSH passwordless login (prerequisite: the ssh service is installed)
[hadoop@server1 ~]$ ssh-keygen
[hadoop@server1 ~]$ cd .ssh/
[hadoop@server1 .ssh]$ ls
id_rsa  id_rsa.pub
[hadoop@server1 .ssh]$ cp id_rsa.pub authorized_keys
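
A quick sanity check of the key setup (the chmod and the test command below are an assumption, not part of the original post; sshd with StrictModes rejects a group- or world-writable authorized_keys file):

[hadoop@server1 .ssh]$ chmod 600 authorized_keys       ##restrict permissions so sshd accepts the key file
[hadoop@server1 .ssh]$ ssh 172.25.51.1 hostname        ##should print the hostname without asking for a password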

Figure: verifying passwordless login

4. Format metadata node (Namenode)
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ pwd
/home/hadoop/hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ ls /tmp/
hadoop-hadoop  hsperfdata_hadoop

Figure: files generated after formatting

5. Start DFS
[hadoop@server1 hadoop]$ sbin/start-dfs.sh

6. Configure environment variables

Note: after configuring the environment variables, you need to log in again for them to take effect

[hadoop@server1 ~]$ vim .bash_profile 
PATH=$PATH:$HOME/bin:~/java/bin
[hadoop@server1 ~]$ logout    
7. Use the jps command to view the Java processes
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ jps
2082 DataNode              ##Data node
2239 SecondaryNameNode     ##Secondary metadata node
1989 NameNode              ##Metadata (master) node
2941 Jps
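
Besides jps, the daemons can also be checked through HDFS itself; the NameNode web interface at http://172.25.51.1:50070 (the default port in Hadoop 2.7.x) shows the same information. A minimal command-line check:

[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report       ##shows the capacity and the list of live DataNodes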



8. Create the HDFS user directory and upload input/
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user
[hadoop@server1 hadoop]$ bin/hdfs dfs -mkdir /user/hadoop
[hadoop@server1 hadoop]$ bin/hdfs dfs -put input/   ##Upload the local input/ directory to /user/hadoop/input in HDFS
[hadoop@server1 hadoop]$ bin/hdfs dfs -ls
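
The local input/ directory uploaded above is assumed to already exist; one common way to build a small test data set for wordcount (an assumption, the post does not show this step) is to copy the sample XML configuration files:

[hadoop@server1 hadoop]$ mkdir input
[hadoop@server1 hadoop]$ cp etc/hadoop/*.xml input/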


9. Execute the wordcount program in hadoop
[hadoop@server1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output

Figure: output like the following shows that the job ran successfully
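
The result can also be read back from the command line (a sketch; with the default settings the reducer output file is named part-r-00000):

[hadoop@server1 hadoop]$ bin/hdfs dfs -ls output
[hadoop@server1 hadoop]$ bin/hdfs dfs -cat output/*      ##prints each word and its count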

2, Hadoop Distributed File System (HDFS): multiple data storage nodes

Experimental environment: Red Hat Enterprise Linux 6.5

NameNode (1G memory):

server1: 172.25.51.1

DataNodes (1G memory each):

server2: 172.25.51.2
server3: 172.25.51.3

The specific deployment process is as follows:

On the NameNode, remove the data from the previous single-node experiment and stop DFS
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ rm -fr input/ output/        ##remove the local input/output copies
[hadoop@server1 hadoop]$ bin/hdfs dfs -get output     ##fetch the wordcount output from HDFS
[hadoop@server1 hadoop]$ rm -fr output/
[hadoop@server1 hadoop]$ sbin/stop-dfs.sh 

1. NameNode (exporting /home/hadoop over NFS keeps the Hadoop installation, configuration, and SSH keys identical on every node)
[root@server1 ~]# yum install -y nfs-utils
[root@server1 ~]# /etc/init.d/rpcbind start   ##rpcbind must be running before the nfs service is started
[root@server1 ~]# vim /etc/exports 
/home/hadoop      *(rw,anonuid=800,anongid=800)    ##read-write for all hosts; anonymous access is mapped to uid/gid 800 (the hadoop user)
[root@server1 ~]# /etc/init.d/nfs start
[root@server1 ~]# exportfs -v     ##list the currently exported directories
[root@server1 ~]# exportfs -rv    ##re-export everything in /etc/exports
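
Whether the export is actually active can be verified on server1 before touching the DataNodes (a sketch; rpcinfo and showmount come with the rpcbind and nfs-utils packages):

[root@server1 ~]# rpcinfo -p localhost | grep -E 'nfs|mountd'   ##nfs and mountd must be registered with rpcbind
[root@server1 ~]# showmount -e localhost                        ##should list /home/hadoop *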
2. DataNode (the operations on 172.25.51.2 and 172.25.51.3 are the same)
[root@server2 ~]# useradd -u 800 hadoop
[root@server2 ~]# id hadoop
uid=800(hadoop) gid=800(hadoop) groups=800(hadoop)
[root@server2 ~]# yum install -y nfs-utils
[root@server2 ~]# /etc/init.d/rpcbind start
[root@server2 ~]# showmount -e 172.25.51.1
[root@server2 ~]# mount 172.25.51.1:/home/hadoop/ /home/hadoop/
[root@server2 ~]# df
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root  19134332  925180  17237172   6% /
tmpfs                           510188       0    510188   0% /dev/shm
/dev/vda1                       495844   33451    436793   8% /boot
172.25.51.1:/home/hadoop/     19134336 1962240  16200192  11% /home/hadoop
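
The mount above does not survive a reboot; if persistence is wanted, an /etc/fstab entry along these lines can be added on each DataNode (an assumption, not part of the original steps):

172.25.51.1:/home/hadoop  /home/hadoop  nfs  defaults  0 0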

3. Configure HDFS
[root@server1 ~]# su - hadoop
[hadoop@server1 ~]$ cd hadoop/etc/hadoop/
[hadoop@server1 hadoop]$ vim slaves     ##List the DataNode addresses
172.25.51.2
172.25.51.3
[hadoop@server1 hadoop]$ vim hdfs-site.xml 
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>     ##HDFS now keeps two replicas of each data block
    </property>
</configuration>
[hadoop@server1 hadoop]$ cd /tmp/
[hadoop@server1 tmp]$ rm -fr *    ##remove the old single-node HDFS data and metadata under /tmp before reformatting
4. Test SSH passwordless login (since /home/hadoop is shared over NFS, the keys from the first experiment work for server2 and server3 as well)
[hadoop@server1 tmp]$ ssh server2
[hadoop@server2 ~]$ logout
[hadoop@server1 tmp]$ ssh server3
[hadoop@server3 ~]$ logout
[hadoop@server1 tmp]$ ssh 172.25.51.2
[hadoop@server2 ~]$ logout
[hadoop@server1 tmp]$ ssh 172.25.51.3
[hadoop@server3 ~]$ logout
5. Reformat the NameNode
[hadoop@server1 ~]$ cd hadoop
[hadoop@server1 hadoop]$ bin/hdfs namenode -format
[hadoop@server1 hadoop]$ ls /tmp/
hadoop-hadoop  hsperfdata_hadoop
6. Start DFS
[hadoop@server1 hadoop]$ sbin/start-dfs.sh


Figure: jps showing the Java processes on each node
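
In addition to the jps output in the figure, the NameNode can report the cluster state directly (a sketch; with dfs.replication set to 2 and both DataNodes up, the report should list two live nodes, 172.25.51.2 and 172.25.51.3):

[hadoop@server1 hadoop]$ bin/hdfs dfsadmin -report       ##the report should show two live DataNodes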

