Building a Fully Distributed Big Data Platform Based on CentOS 7

1. Environment

One Windows 10 machine
PuTTY 64-bit (including its SSH and file-transfer tools)
Three cloud server nodes

2. Network

Hostnames map to IP addresses as follows:

master.novalocal		192.168.72.126
slave1.novalocal		192.168.72.127
slave2.novalocal		192.168.72.128

3. Goals

Fully distributed Hadoop, single-node Hive

4. List of installation packages

Note: packages whose comment contains '(required)' are necessary to build the environment; the remaining packages are optional.

# Hadoop installation package (required)
hadoop-2.7.4.tar.gz

# Hive installation package (required)
apache-hive-2.1.1-bin.tar.gz

# mysql installation package (required)
mysql57-community-release-el7-8.noarch.rpm

# mysql JDBC (required)
mysql-connector-java-5.0.4-bin.jar

# JDK (required)
jdk-8u151-linux-x64.tar.gz

# Alibaba Cloud open-source mirror repo file
Centos-7.repo

# yum acceleration
axel-2.4.tar.gz

# Configuration file for yum acceleration
axelget.conf

# Plugin script for yum acceleration
axelget.py

5. Preparations

Note: steps 5 through 6.4 are executed on the master node only.

5.1 Transfer the installation packages

Transfer all packages from the installation list above to the master server using PuTTY's file-transfer tools. (Screenshot of the uploaded packages omitted.)
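A minimal sketch of the upload, assuming the packages sit in C:\pkgs on the Windows machine (a placeholder path) and PuTTY's pscp.exe is on the PATH:

# Run from a Windows command prompt; a password prompt follows
pscp C:\pkgs\* root@192.168.72.126:/root/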

5.2 Configure yum source

# Back up the original yum source
mv /etc/yum.repos.d/CentOS-Base.repo CentOS-Base.repo.cp

# Configure Ali Cloud Mirror Source
cp Centos-7.repo /etc/yum.repos.d/

# Generate a new yum cache
yum makecache

5.3 Install GCC and GCC-C++

yum install gcc gcc-c++

5.4 yum acceleration

Reference article: Download speed optimization for Linux software
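The referenced article amounts to building axel from source and installing the axelget yum plugin; a sketch, assuming axel builds with the usual autotools steps (yum plugins are enabled by default on CentOS 7):

# Build and install the axel download accelerator
tar -zxvf axel-2.4.tar.gz
cd axel-2.4
./configure && make && make install
cd ..

# Drop the plugin into the standard yum plugin locations
cp axelget.py /usr/lib/yum-plugins/
cp axelget.conf /etc/yum/pluginconf.d/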

5.5 Host Name to IP Mapping

Configure IP to hostname mapping

# Host name and IP mapping profile
vi /etc/hosts

The configuration results are as follows
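Based on the host table in section 2, the appended entries are:

192.168.72.126 master.novalocal
192.168.72.127 slave1.novalocal
192.168.72.128 slave2.novalocal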

6. Installation

6.1 Unzip the installation package

# Unzip the JDK installation package
tar -zxvf jdk-8u151-linux-x64.tar.gz -C /usr/local/

# Unzip the Hadoop installation package
tar -zxvf hadoop-2.7.4.tar.gz -C /usr/local/

# Unzip the Hive installation package
tar -zxvf apache-hive-2.1.1-bin.tar.gz -C /usr/local/

6.2 Install MySQL

Reference article: Common software management for Linux
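The install itself comes down to registering the repo rpm from the package list and installing the server package; a sketch of what the referenced article covers:

# Register the MySQL 5.7 community repo
rpm -ivh mysql57-community-release-el7-8.noarch.rpm

# Install the MySQL server package
yum install -y mysql-community-server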

6.2.1 Configuring MySQL

# Start the MySQL service
systemctl start mysqld.service

# View initial password
grep 'temporary password' /var/log/mysqld.log

# Log on with initial password
mysql -u root -p

# Change the root password; replace PASSWORD with a password of your own
ALTER USER 'root'@'localhost' IDENTIFIED BY 'PASSWORD';

Note: MySQL 5.7 enables the validate_password plugin by default, so the new password must satisfy its policy (length, mixed case, digits, special characters) unless the plugin is reconfigured.

6.2.2 MySQL Open Remote Connection

Reference article: Tips - MySQL
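A common way to open remote access on MySQL 5.7, sketched with the PASSWORD placeholder from above; tighten the host pattern to taste:

-- Allow root to connect from any host
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'PASSWORD' WITH GRANT OPTION;
FLUSH PRIVILEGES;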

6.3 Configure Hadoop

Hadoop configuration directory: $HADOOP_HOME/etc/hadoop/ (here, /usr/local/hadoop-2.7.4/etc/hadoop/)
core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.novalocal:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-2.7.4/tmp</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop-2.7.4/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop-2.7.4/tmp/dfs/data</value>
  </property>
</configuration>

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master.novalocal:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master.novalocal:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master.novalocal:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master.novalocal:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master.novalocal:8088</value>
  </property>
</configuration>

mapred-site.xml

# Generate this file from the bundled template
cp mapred-site.xml.template mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master.novalocal:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master.novalocal:19888</value>
  </property>
</configuration>

slaves

slave1.novalocal
slave2.novalocal

hadoop-env.sh

# Add the following to the file
export JAVA_HOME=/usr/local/jdk1.8.0_151

6.4 Configuring environment variables

# Edit /etc/profile and add the following
export JAVA_HOME=/usr/local/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/local/hadoop-2.7.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Make the configuration file take effect immediately
source /etc/profile
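
A quick check that the variables took effect:

# Both commands should print version information
java -version
hadoop version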

6.5 Password-free login

# Execute on every node: generate a key pair (press Enter at each prompt;
# ssh-keygen creates ~/.ssh if it does not yet exist)
ssh-keygen
# Execute on slave1 and slave2 nodes, respectively
scp ~/.ssh/id_rsa.pub root@192.168.72.126:~/id_rsa.pub.1
scp ~/.ssh/id_rsa.pub root@192.168.72.126:~/id_rsa.pub.2
# Execute on the master node: merge all public keys
cat ~/id_rsa.pub.1 >> ~/.ssh/authorized_keys
cat ~/id_rsa.pub.2 >> ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

scp ~/.ssh/authorized_keys root@slave1.novalocal:~/.ssh/
scp ~/.ssh/authorized_keys root@slave2.novalocal:~/.ssh/
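
Every node should now trust the master's key; a quick check from the master node:

# Each command should return without asking for a password
ssh slave1.novalocal exit
ssh slave2.novalocal exit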

6.6 Synchronize files between nodes

There are the following files that need to be synchronized:

  1. /etc/hosts
  2. /etc/profile
  3. /usr/local/jdk1.8.0_151
  4. /usr/local/hadoop-2.7.4

# Execute the following commands on the master node
# Copy the hostname-IP mapping file
scp /etc/hosts root@slave1.novalocal:/etc/
scp /etc/hosts root@slave2.novalocal:/etc/

# Copy Environment Variable Profile
scp /etc/profile root@slave1.novalocal:/etc/
scp /etc/profile root@slave2.novalocal:/etc/

# Copy JDK
scp -r /usr/local/jdk1.8.0_151/ root@slave1.novalocal:/usr/local/
scp -r /usr/local/jdk1.8.0_151/ root@slave2.novalocal:/usr/local/

# Copy Hadoop
scp -r /usr/local/hadoop-2.7.4/ root@slave1.novalocal:/usr/local/
scp -r /usr/local/hadoop-2.7.4/ root@slave2.novalocal:/usr/local/
# Then execute the following command on both slave nodes
source /etc/profile

6.7 Format Hadoop Cluster

# Execute the following command on the master node; format only once,
# since reformatting leaves existing DataNode data with a mismatched clusterID
hdfs namenode -format

6.8 Start Hadoop Cluster and Test

# Execute the following command on the master node
start-all.sh

# Execute jps on three nodes to view process status
jps

The expected jps processes on each node:

master:  NameNode, SecondaryNameNode, ResourceManager, Jps
slave1:  DataNode, NodeManager, Jps
slave2:  DataNode, NodeManager, Jps

Open a browser and visit the NameNode web UI at http://192.168.72.126:50070

If the number of live DataNodes matches the cluster (two here), the Hadoop cluster build is complete.
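
The same check works from the shell on the master node:

# Lists live DataNodes and cluster capacity
hdfs dfsadmin -report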

6.9 Install Hive

Reference article: Building a pseudo-distributed big data environment based on CentOS 7
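The gist of that article's Hive steps, sketched here for this cluster; the hive-site.xml values are assumptions matching the layout above, and PASSWORD is the root password set in 6.2.1:

# Put the MySQL JDBC driver on Hive's classpath
cp mysql-connector-java-5.0.4-bin.jar /usr/local/apache-hive-2.1.1-bin/lib/

Then create /usr/local/apache-hive-2.1.1-bin/conf/hive-site.xml pointing the metastore at MySQL:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master.novalocal:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>PASSWORD</value>
  </property>
</configuration>

# Initialize the metastore schema, then start Hive
/usr/local/apache-hive-2.1.1-bin/bin/schematool -dbType mysql -initSchema
/usr/local/apache-hive-2.1.1-bin/bin/hive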

Reference websites

Hadoop Installation Tutorial: Single-Machine / Pseudo-Distributed Configuration
Fully Distributed Installation


Tags: Hadoop MySQL ssh yum

Posted on Mon, 16 Mar 2020 21:54:08 -0400 by felipeebs