Docker + Hadoop: building a fully distributed cluster

This article assumes some Linux experience and is not aimed at beginners; it sketches the overall approach rather than every detail.

System deployment

Reference resources

Docker Download: https://docs.docker.com/desktop/windows/install/

Docker installation: https://www.runoob.com/docker/windows-docker-install.html

Deploying CentOS 7

Docker registry mirror (for users in China)

Reference source

Add the registry mirror configuration:

# Linux
vi /etc/docker/daemon.json

# macOS
Go to Preferences -> Daemon and edit the settings there
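
A minimal daemon.json sketch; the mirror address below is only a placeholder for whichever registry mirror you choose:

{
  "registry-mirrors": ["https://「your mirror address」"]
}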

Restart docker

# Linux
systemctl restart docker

Image deployment

Pull

docker pull centos:centos7

List local images

docker images

Start a container

docker run -itd --name 「container name」 centos:centos7 /bin/bash

View running containers

docker ps

Manage containers

docker start/kill/stop/rm 「CONTAINER ID」

Enter a container

docker exec -it 「CONTAINER ID」 /bin/bash


Create a development environment image

CentOS repository mirror (for users in China)

Reference resources
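
The referenced steps are not reproduced here. One common approach (a sketch only, using the Aliyun mirror as an example) is to swap out the default yum repo file:

mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
yum clean all && yum makecache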

Install basic tools

vim

The best text editor for Linux

yum -y install vim

net-tools

Linux network management tools

yum -y install net-tools

openssh-clients/openssh-server

SSH client and server

yum -y install openssh-clients && yum -y install openssh-server

openssl

A toolkit that secures communication via SSL/TLS

yum -y install openssl

wget

Remote download tool

yum -y install wget

Install development tools

MySQL installation

Please refer to my other article:

Section 5.1 of Hadoop high concurrency cluster and development environment deployment

Java installation

Please refer to my other article:

Section 5.2 of Hadoop high concurrency cluster and development environment deployment

Python 3 installation

Please refer to my other article:

Section 5.3 of Hadoop high concurrency cluster and development environment deployment

Scala installation

Please refer to my other article:

Section 5.4 of Hadoop high concurrency cluster and development environment deployment

Make image

Commit the container as an image

docker commit -a "「author」" -m "「comment」" 「CONTAINER ID」 「Image Name」:v「version」
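
For example (the author, comment, container ID, and image name here are hypothetical):

docker commit -a "alice" -m "hadoop base environment" 3f2a1b4c5d6e hadoop-base:v1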


Start production cluster

Standardize naming and addressing

Create network

docker network create --subnet=192.168.10.0/24 「Net Name」
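
The run commands below use --net hadoop, so for this cluster the concrete command is:

docker network create --subnet=192.168.10.0/24 hadoop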

Fixed IPs and hostnames

NameNode

docker run -itd --name nn \
--privileged=true -p 50070:50070 -p 8080:8080 \
--hostname nn \
--net hadoop --ip 192.168.10.10 \
--add-host dn1:192.168.10.11 \
--add-host dn2:192.168.10.12 \
「Image Name:version」 \
/usr/sbin/init

To expose additional ports, add more -p flags in the form 「host port」:「container port」.

DataNode 1

docker run -itd --name dn1 \
--privileged=true \
--hostname dn1 \
--net hadoop --ip 192.168.10.11 \
--add-host nn:192.168.10.10 \
--add-host dn2:192.168.10.12 \
「Image Name:version」 \
/usr/sbin/init

DataNode 2

docker run -itd --name dn2 \
--privileged=true \
--hostname dn2 \
--net hadoop --ip 192.168.10.12 \
--add-host dn1:192.168.10.11 \
--add-host nn:192.168.10.10 \
「Image Name:version」 \
/usr/sbin/init

SSH configuration

Please refer to my other article:

Section 6.2.2 of Hadoop high concurrency cluster and development environment deployment
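
The referenced section is not reproduced here; a minimal passwordless-SSH sketch for the three nodes (assuming the root user and that sshd is installed in each container) looks like this:

# run on every node: start sshd, generate a key pair, then push it to all nodes
systemctl start sshd
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id root@nn && ssh-copy-id root@dn1 && ssh-copy-id root@dn2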


Hadoop fully distributed deployment

Prepare

Download address

wget https://dlcdn.apache.org/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz

Prepare directories

mkdir -p /usr/hadoop/tmp \
&& mkdir -p /usr/hadoop/hdfs/name \
&& mkdir -p /usr/hadoop/hdfs/data

Extract the archive and configure the environment variables yourself.

The variable name must be HADOOP_HOME.
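
A minimal sketch (the install path /usr/local/hadoop-2.10.1 is only an example):

tar -zxvf hadoop-2.10.1.tar.gz -C /usr/local/
echo 'export HADOOP_HOME=/usr/local/hadoop-2.10.1' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /etc/profile
source /etc/profile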


Modify configuration

Location: $HADOOP_HOME/etc/hadoop/

core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://nn:9000</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <!-- this cluster has two DataNodes, so replication is set to 2 -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/hadoop/hdfs/data</value>
    </property>
</configuration>

mapred-site.xml

Copy mapred-site.xml.template to mapred-site.xml

cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>nn:9001</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>nn</value>
    </property>
</configuration>

masters

Address: $HADOOP_HOME/etc/hadoop/masters

Content:

nn

slaves

Address: $HADOOP_HOME/etc/hadoop/slaves

Content (one DataNode hostname per line; for this cluster, dn1 and dn2):

「datanode HOST 1」

「datanode HOST 2」

............

「datanode HOST n」

hadoop-env.sh

Open the file and set JAVA_HOME explicitly, so the JDK is found when Hadoop starts its daemons over SSH.
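
For example (the JDK path is a placeholder; use your actual install location):

# in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=「your JDK path」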


Synchronization

Use the scp command to synchronize all changed files

For example:

scp -r /usr/dt dn1:/usr/
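
For this cluster, pushing the whole Hadoop directory to both DataNodes might look like this (assuming the same paths on every node; the install path follows the earlier example):

scp -r /usr/local/hadoop-2.10.1 dn1:/usr/local/ \
&& scp -r /usr/local/hadoop-2.10.1 dn2:/usr/local/
# the /usr/hadoop data directories must also be created on every node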

Test

Initialize HDFS (run on nn only)

hadoop namenode -format

Start Hadoop

sh $HADOOP_HOME/sbin/start-all.sh
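
As a quick sanity check (jps ships with the JDK): on nn you should see NameNode, SecondaryNameNode, and ResourceManager; on dn1 and dn2, DataNode and NodeManager.

jps
ssh dn1 jps && ssh dn2 jps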

View cluster status

hdfs dfsadmin -report
