Hadoop High Availability Environment Setup (QJM)


1. Virtual Machine Configuration

host name    NN    DN    ZK    ZKFC    JNN    RM    NM
node1        *                 *       *      *
node2        *     *     *     *       *      *     *
node3              *     *             *            *
node4              *     *                          *

2. Hadoop High Availability (HA) Implementation (QJM)

1. Modify configuration files on the node1 host

  • Modify core-site.xml

[root@node1 ~]# vi /opt/hadoop/etc/hadoop/core-site.xml
#Modify to the following:
<configuration>
       <property>
               <name>fs.defaultFS</name>
               <value>hdfs://mycluster</value>
       </property>
       <property>
               <name>hadoop.tmp.dir</name>
               <value>/hadoop-full/</value>
       </property>
</configuration>
  • Modify hdfs-site.xml

[root@node1 hadoop]# vi /opt/hadoop/etc/hadoop/hdfs-site.xml
#The modifications are as follows:
<configuration>
       <property>
               <name>dfs.nameservices</name>
               <value>mycluster</value>
       </property>
       <property>
               <name>dfs.ha.namenodes.mycluster</name>
               <value>nn1,nn2</value>
       </property>
       <property>
               <name>dfs.namenode.rpc-address.mycluster.nn1</name>
               <value>node1:8020</value>
       </property>
       <property>
               <name>dfs.namenode.rpc-address.mycluster.nn2</name>
               <value>node2:8020</value>
       </property>
       <property>
               <name>dfs.namenode.http-address.mycluster.nn1</name>
               <value>node1:50070</value>
       </property>
       <property>
               <name>dfs.namenode.http-address.mycluster.nn2</name>
               <value>node2:50070</value>
       </property>
       <property>
               <name>dfs.namenode.shared.edits.dir</name>
               <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
       </property>
       <property>
               <name>dfs.client.failover.proxy.provider.mycluster</name>
             <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
       </property>
       <property>
               <name>dfs.ha.fencing.methods</name>
               <value>sshfence</value>
       </property>
       <property>
               <name>dfs.ha.fencing.ssh.private-key-files</name>
               <value>/root/.ssh/id_rsa</value>
       </property>
       <property>
               <name>dfs.journalnode.edits.dir</name>
               <value>/hadoop-full/journalnode</value>
       </property>
       <property>
               <name>dfs.replication</name>
               <value>3</value>
       </property>
       <property>
               <name>dfs.permissions.enabled</name>
               <value>false</value>
       </property>
</configuration>
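Before distributing the file, it is worth confirming that the key HA properties actually made it into hdfs-site.xml. The following is a minimal pre-flight sketch, not part of the original steps; it writes a throwaway copy so it runs anywhere, and on the cluster you would point CONF at /opt/hadoop/etc/hadoop/hdfs-site.xml instead:

```shell
#!/bin/sh
# Pre-flight sketch: check that the essential HA properties are present.
# CONF is a throwaway sample here; point it at the real file on the cluster.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value></property>
</configuration>
EOF
# Report each required property as present or missing.
for prop in dfs.nameservices dfs.ha.namenodes.mycluster dfs.namenode.shared.edits.dir; do
  grep -q "<name>$prop</name>" "$CONF" && echo "ok: $prop" || echo "MISSING: $prop"
done
```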

2. Distribute configuration files to node2, node3, node4

[root@node1 ~]# cd /opt/hadoop/etc/hadoop/
[root@node1 hadoop]# scp core-site.xml hdfs-site.xml node2:`pwd`
[root@node1 hadoop]# scp core-site.xml hdfs-site.xml node3:`pwd`
[root@node1 hadoop]# scp core-site.xml hdfs-site.xml node4:`pwd`
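Typing the same scp line once per host is error-prone. A small loop (a sketch, assuming the same /opt/hadoop/etc/hadoop path exists on every node and passwordless SSH is configured) prints the commands first as a dry run; drop the echo to actually copy:

```shell
#!/bin/sh
# Dry-run sketch: print one scp command per worker host.
# Remove "echo" once the output looks right to perform the copy.
CONF_DIR=/opt/hadoop/etc/hadoop
for host in node2 node3 node4; do
  echo scp "$CONF_DIR/core-site.xml" "$CONF_DIR/hdfs-site.xml" "$host:$CONF_DIR/"
done
```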

3. Start journalnode

[root@node1 ~]# hadoop-daemon.sh start journalnode
[root@node2 ~]# hadoop-daemon.sh start journalnode
[root@node3 ~]# hadoop-daemon.sh start journalnode

4. HDFS Namenode data synchronization

  • Format the NameNode (first-time setup only; never run this against a cluster that already holds data)

[root@node1 ~]# hdfs namenode -format
2020-01-27 11:09:37,394 INFO common.Storage: Storage directory /hadoop-full/dfs/name has been successfully formatted.
[root@node1 ~]# hadoop-daemon.sh start namenode
  • Initialize the shared edits directory (only when converting an existing non-HA cluster to HA; skip this after a fresh format)

[root@node1 ~]#hdfs namenode -initializeSharedEdits
  • Start the NameNode on node1 (if it is not already running from the step above)

[root@node1 ~]#hadoop-daemon.sh start namenode
  • Synchronize the namespace image to node2

[root@node2 ~]# hdfs namenode -bootstrapStandby
  • Start the NameNode on node2

[root@node2 ~]#hadoop-daemon.sh start namenode

5. Start the DataNodes

[root@node2 ~]#hadoop-daemon.sh start datanode
[root@node3 ~]# hadoop-daemon.sh start datanode
[root@node4 ~]# hadoop-daemon.sh start datanode

6. Promote a NameNode to the active state

[root@node1 ~]# hdfs haadmin -transitionToActive nn1

7. Validation

  • Command Line Validation

[root@node1 ~]# jps
2948 Jps
1829 NameNode
2013 JournalNode
[root@node2 ~]# jps
2029 Jps
1455 NameNode
1519 DataNode
1599 JournalNode
[root@node3 ~]# jps
1335 Jps
1195 DataNode
1275 JournalNode
[root@node4 ~]# jps
997 Jps
967 DataNode
  • Web UI validation: browse to http://node1:50070 and http://node2:50070; one NameNode should report active and the other standby

8. Operating the HA Cluster from the Command Line

  • View service status

[root@node1 ~]#hdfs haadmin -getServiceState nn1
  • Set a NameNode to active

[root@node1 ~]#hdfs haadmin -transitionToActive nn1
  • Set a NameNode to standby

[root@node1 ~]#hdfs haadmin -transitionToStandby nn1
  • Manual Failover

[root@node1 ~]#hdfs haadmin -failover nn1 nn2

3. Zookeeper Installation Configuration

1. Modify hosts file

[root@node2 conf]# vi /etc/hosts
#Make sure the file contains localhost plus an entry mapping each cluster host name (node1-node4) to its IP address
127.0.0.1       localhost

2. Upload zookeeper installation package to node2

3. Unzip the installation package to the specified directory

[root@node2 ~]# tar -zxvf zookeeper-3.4.6.tar.gz -C /opt/

4. Rename directory

[root@node2 ~]# mv /opt/zookeeper-3.4.6/ /opt/zookeeper

5. Modify the Configuration File

  • Copy the sample configuration

[root@node2 ~]#cd /opt/zookeeper/conf/
[root@node2 conf]#cp zoo_sample.cfg zoo.cfg
  • Edit the configuration

[root@node2 conf]#vi zoo.cfg
#Modify line 12 (dataDir) as follows
dataDir=/hadoop-full/zookeeper
#Add the following at the end of the file
server.1=node2:2888:3888
server.2=node3:2888:3888
server.3=node4:2888:3888

6. Add environment variables

  • Edit the environment file

[root@node2 ~]# vi /etc/profile.d/hadoop.sh
#Modify to the following
export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop
export ZOOKEEPER_HOME=/opt/zookeeper
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin
  • Reload the environment

[root@node2 ~]# source /etc/profile

7. Create a working directory

[root@node2 ~]# mkdir -p /hadoop-full/zookeeper
[root@node2 ~]# echo 1 >/hadoop-full/zookeeper/myid

8. Distribute Files

  • Distribute the hosts file

[root@node2 conf]# scp /etc/hosts node3:/etc/hosts
[root@node2 conf]# scp /etc/hosts node4:/etc/hosts
  • Distribute the environment file

[root@node2 ~]# scp /etc/profile.d/hadoop.sh node3:/etc/profile.d/
[root@node2 ~]# scp /etc/profile.d/hadoop.sh node4:/etc/profile.d/
  • Reload the environment on node3 and node4

[root@node3 ~]# source /etc/profile
[root@node4 ~]# source /etc/profile
  • Distribute the ZooKeeper installation

[root@node2 ~]# scp -r /opt/zookeeper node3:/opt/
[root@node2 ~]# scp -r /opt/zookeeper node4:/opt/
  • Distribute the working directory

[root@node2 ~]# scp -r /hadoop-full/zookeeper node3:/hadoop-full/
[root@node2 ~]# scp -r /hadoop-full/zookeeper node4:/hadoop-full/

9. Modify the myid files

[root@node3 ~]# echo 2 >/hadoop-full/zookeeper/myid
[root@node4 ~]# echo 3 >/hadoop-full/zookeeper/myid
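The myid written on each host must agree with the server.N entries in zoo.cfg, or the node will be rejected from the quorum. As a sanity check (a sketch, not part of the original steps; it works on a local copy of the server lines), the id can be derived from the config itself:

```shell
#!/bin/sh
# Sketch: derive each host's myid from the server.N lines in zoo.cfg,
# so the two files can never silently disagree.
CFG=$(mktemp)
printf '%s\n' \
  'server.1=node2:2888:3888' \
  'server.2=node3:2888:3888' \
  'server.3=node4:2888:3888' > "$CFG"
for host in node2 node3 node4; do
  # Extract N from "server.N=<host>:..." for this host.
  id=$(sed -n "s/^server\.\([0-9][0-9]*\)=$host:.*/\1/p" "$CFG")
  echo "$host -> myid=$id"
done
# → node2 -> myid=1, node3 -> myid=2, node4 -> myid=3
```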

10. Start Services

[root@node2 ~]# zkServer.sh start
[root@node3 ~]# zkServer.sh start
[root@node4 ~]# zkServer.sh start

11. View Status

[root@node2 opt]# zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@node3 ~]# zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[root@node4 ~]# zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower

4. Automatic NameNode Failover with ZooKeeper

1. Modify configuration files on the node1 host

  • Modify core-site.xml

[root@node1 ~]# vi /opt/hadoop/etc/hadoop/core-site.xml
#Add the following to the original content:
<configuration>
       <property>
               <name>ha.zookeeper.quorum</name>
               <value>node2:2181,node3:2181,node4:2181</value>
       </property>
</configuration>
  • Modify hdfs-site.xml

[root@node1 hadoop]# vi /opt/hadoop/etc/hadoop/hdfs-site.xml
#Add the following to the original content:
<configuration>
       <property>
               <name>dfs.ha.automatic-failover.enabled</name>
               <value>true</value>
       </property>
</configuration>
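The ha.zookeeper.quorum value added above is a plain comma-separated host:port list. As an optional sketch (the variable names here are illustrative, not from the article), it can be generated from a single host list so config files and helper scripts stay in sync if the ensemble ever changes:

```shell
#!/bin/sh
# Sketch: build the ZooKeeper quorum string from one host list.
ZK_HOSTS="node2 node3 node4"
# Unquoted expansion word-splits the list; each host gets ":2181,".
quorum=$(printf '%s:2181,' $ZK_HOSTS)
quorum=${quorum%,}          # strip the trailing comma
echo "$quorum"
# → node2:2181,node3:2181,node4:2181
```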

2. Distribute configuration files to node2, node3, node4

[root@node1 ~]# cd /opt/hadoop/etc/hadoop/
[root@node1 hadoop]# scp core-site.xml hdfs-site.xml node2:`pwd`
[root@node1 hadoop]# scp core-site.xml hdfs-site.xml node3:`pwd`
[root@node1 hadoop]# scp core-site.xml hdfs-site.xml node4:`pwd`

3. Format the ZooKeeper Failover Controller (ZKFC)

[root@node1 ~]# hdfs zkfc -formatZK
2020-01-27 11:26:40,326 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK

4. Install psmisc on the NameNode (ZKFC) hosts

The sshfence fencing method kills the old active NameNode with the fuser command, which the psmisc package provides.

[root@node1 ~]#yum install psmisc -y
[root@node2 ~]#yum install psmisc -y

5. Configure Passwordless SSH from node2 to node1

[root@node2 ~]# ssh-keygen 
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:21X44T83NPr3k/FabLoSGZZDCoGck1ncNiowoLJGlr0 root@node2
The key's randomart image is:
+---[RSA 2048]----+
|  .. . Boo       |
| .o o B o + ..   |
|o+ . o . + +..o  |
|+.  . . . . =+ . |
|.. E   .S  ..+oo |
|.        o .o o+.|
|        . .  o oX|
|            . .**|
|             .++=|
+----[SHA256]-----+
[root@node2 ~]# cd ~/.ssh/
[root@node2 .ssh]# ssh-copy-id node1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'node1 (192.168.30.11)' can't be established.
ECDSA key fingerprint is SHA256:/V6z9w2ts2Ei8dgcKAlJCGozcmoeWNSNyctvHWjdoJk.
ECDSA key fingerprint is MD5:09:41:c7:ad:2b:65:77:6f:eb:af:77:be:8f:e3:1f:15.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'node1'"
and check to make sure that only the key(s) you wanted were added.

[root@node2 .ssh]# ssh node1
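The interactive session above can also be done non-interactively. This sketch (illustrative only; it writes into a temp dir so it cannot clobber a real /root/.ssh keypair) shows the equivalent one-liner; on node2 you would use the default key path and then run ssh-copy-id node1 as shown:

```shell
#!/bin/sh
# Sketch: generate an RSA keypair with an empty passphrase, no prompts.
DIR=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$DIR/id_rsa" -q
ls "$DIR"
```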

6. Start Cluster

[root@node1 hadoop]# start-dfs.sh

7. Verify: open the NameNode web UIs; one NameNode should report active and the other standby. Killing the active NameNode process should now trigger an automatic failover.

5. ResourceManager HA Configuration

1. Modify mapred-site.xml

[root@node1 hadoop]# vi /opt/hadoop/etc/hadoop/mapred-site.xml
#Modify to the following
<configuration>
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>

2. Modify yarn-site.xml

[root@node1 hadoop]# vi /opt/hadoop/etc/hadoop/yarn-site.xml
#Modify to the following
<configuration>
<!-- Site specific YARN configuration properties -->
       <property>
               <name>yarn.nodemanager.aux-services</name>
               <value>mapreduce_shuffle</value>
       </property>
         <!-- Turn on log aggregation -->
       <property>
               <name>yarn.log-aggregation-enable</name>
               <value>true</value>
       </property>      
       <property>
               <!-- Enable ResourceManager high availability -->
               <name>yarn.resourcemanager.ha.enabled</name>
               <value>true</value>
       </property>
       <property>
               <!-- Unique identifier for the cluster -->
               <name>yarn.resourcemanager.cluster-id</name>
               <value>cluster1</value>
       </property>
       <property>
               <!-- Logical IDs of the two ResourceManagers -->
               <name>yarn.resourcemanager.ha.rm-ids</name>
               <value>rm1,rm2</value>
       </property>
       <property>
               <!-- Host running ResourceManager rm1 -->
               <name>yarn.resourcemanager.hostname.rm1</name>
               <value>node1</value>
       </property>
       <property>
               <!-- HTTP address of ResourceManager rm1 -->
               <name>yarn.resourcemanager.webapp.address.rm1</name>
               <value>node1:8088</value>
       </property>
        <property>
               <!-- Host running ResourceManager rm2 -->
               <name>yarn.resourcemanager.hostname.rm2</name>
               <value>node2</value>
       </property>
       <property>
               <!-- HTTP address of ResourceManager rm2 -->
               <name>yarn.resourcemanager.webapp.address.rm2</name>
               <value>node2:8088</value>
       </property>
       <property>
               <!-- ZooKeeper quorum nodes -->
               <name>yarn.resourcemanager.zk-address</name>
               <value>node2:2181,node3:2181,node4:2181</value>
       </property>
</configuration>
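A stray space inside a <value> element (for example "cl uster1" instead of "cluster1" for yarn.resourcemanager.cluster-id) breaks RM HA in a hard-to-debug way. This sketch (not part of the original steps; it checks a throwaway sample, and on the cluster you would point CONF at yarn-site.xml) flags any value containing internal whitespace:

```shell
#!/bin/sh
# Sketch: flag <value> elements that contain a space.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
EOF
if grep -n '<value>[^<]* [^<]*</value>' "$CONF"; then
  echo 'warning: whitespace inside a value'
else
  echo 'values look clean'
fi
```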

3. Distribute the modified configuration files to the node2, node3, node4 hosts

[root@node1 hadoop]# cd /opt/hadoop/etc/hadoop/
[root@node1 hadoop]# scp hadoop-env.sh mapred-site.xml yarn-site.xml node2:`pwd`
[root@node1 hadoop]# scp hadoop-env.sh mapred-site.xml yarn-site.xml node3:`pwd`
[root@node1 hadoop]# scp hadoop-env.sh mapred-site.xml yarn-site.xml node4:`pwd`

4. Start Services

  • Execute on node1

[root@node1 hadoop]# start-yarn.sh
  • Execute on node2

[root@node2 ~]# yarn-daemon.sh start resourcemanager
  • Test: browse to http://node1:8088 and http://node2:8088; the standby ResourceManager redirects to the active one



Finally, this environment was deliberately built up step by step, with each component deployed separately, so that the underlying principles are easy to follow. Once you are comfortable with the process you can combine steps into a single configuration pass, and if you run into problems, feel free to leave a comment or message me directly.


Tags: Big Data Hadoop xml Zookeeper ssh

Posted on Thu, 07 May 2020 14:02:49 -0400 by jotate