This article assumes some existing Linux knowledge and is not aimed at complete beginners; it is meant to sketch an approach rather than walk through every detail.
System deployment
Reference resources
Docker Download: https://docs.docker.com/desktop/windows/install/
Docker installation: https://www.runoob.com/docker/windows-docker-install.html
Deploying CentOS 7
Docker domestic mirrors (China)
Reference source:
- Aliyun image accelerator (recommended)
Log in to the Alibaba Cloud console, go to Products and Services -> Elastic Compute -> Container Registry -> Image Accelerator, and copy your personal accelerator address.
Add the accelerator address to Docker's configuration:
# Linux: vi /etc/docker/daemon.json
# macOS: Docker Desktop -> Preferences -> Daemon
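A minimal daemon.json sketch (the accelerator URL is a placeholder; use the address copied from your console):
{
  "registry-mirrors": ["https://「your id」.mirror.aliyuncs.com"]
}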
Restart Docker:
# Linux
systemctl restart docker
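To confirm the mirror took effect, look for the Registry Mirrors section in docker info:
docker info | grep -A 1 'Registry Mirrors'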
Image deployment
Pull
docker pull centos:centos7
Check local images
docker images
Start the container
docker run -itd --name 「Container Name」 centos:centos7 /bin/bash
View running containers
docker ps
Manage containers
docker start/kill/stop/rm 「CONTAINER ID」
Enter container
docker exec -it 「CONTAINER ID」 /bin/bash
Create a development environment image
CentOS domestic mirrors (China)
Reference resources (a repo-swap sketch follows the list):
- NetEase open source mirror station
- Tencent software source
- Tsinghua University open source mirror station (TUNA)
- USTC open source mirror station
- Aliyun official mirror station
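A minimal sketch, assuming you pick the Aliyun mirror (the URL is Aliyun's published CentOS 7 repo file):
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
yum clean all && yum makecache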
Installation of basic tools
vim
The best text editor for Linux
yum -y install vim
net-tools
Linux network management tools
yum -y install net-tools
openssh-clients/openssh-server
SSH client and server
yum -y install openssh-clients openssh-server
openssl
A software library that provides SSL/TLS support for secure communication
yum -y install openssl
wget
Remote download tool
yum -y install wget
Development tool installation
MySQL installation
Please refer to my other article:
Section 5.1 of Hadoop high concurrency cluster and development environment deployment
Java installation
Please refer to my other article:
Section 5.2 of Hadoop high concurrency cluster and development environment deployment
Python 3 installation
Please refer to my other article:
Section 5.3 of Hadoop high concurrency cluster and development environment deployment
Scala installation
Please refer to my other article:
Section 5.4 of Hadoop high concurrency cluster and development environment deployment
Build the image
Commit the container as an image:
docker commit -a "「Author」" -m "「Comment」" 「CONTAINER ID」 「Image Name」:v「version」
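The new image should then show up locally:
docker images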
Start production cluster
Unified conventions
Create network
docker network create --subnet=192.168.10.0/24 「Net Name」
The run commands below assume the network is named hadoop.
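To confirm the network and its subnet:
docker network inspect 「Net Name」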
Unified IP and Host
NameNode
docker run -itd --name nn \
  --privileged=true \
  -p 50070:50070 -p 8080:8080 \
  --hostname nn \
  --net hadoop --ip 192.168.10.10 \
  --add-host dn1:192.168.10.11 \
  --add-host dn2:192.168.10.12 \
  「Image Name」:v「version」 \
  /usr/sbin/init
If you need to expose more ports, add additional -p options; the format is 「host port」:「container port」.
DataNode 1
docker run -itd --name dn1 \
  --privileged=true \
  --hostname dn1 \
  --net hadoop --ip 192.168.10.11 \
  --add-host nn:192.168.10.10 \
  --add-host dn2:192.168.10.12 \
  「Image Name」:v「version」 \
  /usr/sbin/init
DataNode 2
docker run -itd --name dn2 \
  --privileged=true \
  --hostname dn2 \
  --net hadoop --ip 192.168.10.12 \
  --add-host nn:192.168.10.10 \
  --add-host dn1:192.168.10.11 \
  「Image Name」:v「version」 \
  /usr/sbin/init
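To verify that the fixed IPs and --add-host entries landed, inspect each container's hosts file:
docker exec nn cat /etc/hosts
docker exec dn1 cat /etc/hosts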
SSH configuration
Please refer to my other article:
Section 6.2.2 of Hadoop high concurrency cluster and development environment deployment
Hadoop fully distributed deployment
Prepare
Download address
wget https://dlcdn.apache.org/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
Prepare the directories (mkdir -p creates the missing parent directories):
mkdir -p /usr/hadoop/tmp /usr/hadoop/hdfs/name /usr/hadoop/hdfs/data
Extract the tarball and configure the environment variables yourself; the variable name must be HADOOP_HOME. A minimal sketch follows.
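For example, assuming the tarball is extracted under /usr/ (the install path is an assumption; adjust it to your layout):
tar -zxf hadoop-2.10.1.tar.gz -C /usr/
echo 'export HADOOP_HOME=/usr/hadoop-2.10.1' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /etc/profile
source /etc/profile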
Modify configuration
Path: $HADOOP_HOME/etc/hadoop/
core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- must match the tmp directory created above -->
    <value>file:/usr/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn:9000</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- with only two DataNodes, a replication factor above 2 cannot be satisfied -->
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/hadoop/hdfs/data</value>
  </property>
</configuration>
mapred-site.xml
Copy mapred-site.xml.template to mapred-site.xml
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://nn:9001</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>nn</value>
  </property>
</configuration>
masters
Path: $HADOOP_HOME/etc/hadoop/masters
Content:
nn
slaves
Path: $HADOOP_HOME/etc/hadoop/slaves
Content (one DataNode hostname per line):
「datanode HOST 1」
「datanode HOST 2」
............
「datanode HOST n」
hadoop-env.sh
Edit the file and set JAVA_HOME explicitly, to avoid cases where the JDK is not picked up:
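For example (the JDK path is a placeholder; use the path from your own Java installation in section 5.2):
export JAVA_HOME=「your JDK path」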
Synchronization
Use the scp command to synchronize all changed files to the other nodes.
For example:
scp -r /usr/dt dn1:/usr/
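Likewise, assuming the install path used above (an assumption), create the data directories on each DataNode and push the Hadoop directory and profile over:
ssh dn1 "mkdir -p /usr/hadoop/tmp /usr/hadoop/hdfs/name /usr/hadoop/hdfs/data"
ssh dn2 "mkdir -p /usr/hadoop/tmp /usr/hadoop/hdfs/name /usr/hadoop/hdfs/data"
scp -r /usr/hadoop-2.10.1 dn1:/usr/ && scp -r /usr/hadoop-2.10.1 dn2:/usr/
scp /etc/profile dn1:/etc/ && scp /etc/profile dn2:/etc/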
Test
Initialize HDFS (run this once, on nn only):
hadoop namenode -format
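A successful format should print a message along the lines of "Storage directory /usr/hadoop/hdfs/name has been successfully formatted."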
Start Hadoop
sh $HADOOP_HOME/sbin/start-all.sh
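If startup succeeds, jps on nn should list NameNode, SecondaryNameNode and ResourceManager, while jps on dn1 and dn2 should list DataNode and NodeManager.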
View cluster status
hdfs dfsadmin -report
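Since the nn container maps port 50070, the HDFS web UI should also be reachable from the host at http://localhost:50070.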