HADOOP learning notes

rpm -qa | grep vim        # check which packages provide the vim command
# output: vim-minimal-7.4.160-4.el7.x86_64
yum install -y vim*       # install the vim-related packages
rpm -qa | grep vim        # check again to confirm the installation


1, Install virtual machine (CentOS)

2, Modify host name (host name of current virtual machine)

1. View the current host name

Command: hostname

2. Modify host name

Command: vi /etc/hostname

(1) Be sure to be in vi's command mode (press the Esc key if you are in insert mode)

:wq   save and exit

:wq!  force save and exit

:q    exit

:q!   force exit without saving

(2) Press i to enter insert (edit) mode at the current cursor position.

3. Reboot so the change takes effect

Command: reboot

Supplement: a second way to modify the host name

hostnamectl set-hostname <new host name>

3, Modify ip address (ip address of current virtual machine)

Command:

vi /etc/sysconfig/network-scripts/ifcfg-ens33

1. Modify to obtain ip statically

BOOTPROTO="static"

2. Add IP address

IPADDR=192.168.1.100

3. Add gateway

GATEWAY=192.168.1.2

4. Add subnet mask

NETMASK=255.255.255.0

5. Add the DNS resolver (a combined sketch of the resulting file follows this list)

DNS1=192.168.1.2
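
Putting the five settings above together, the relevant part of ifcfg-ens33 might look like the sketch below (the values are the ones used in this walkthrough; ONBOOT is an extra assumption so the interface comes up at boot, and the other generated lines in the file are left as they are):

BOOTPROTO="static"       # obtain the IP statically instead of via DHCP
ONBOOT="yes"             # assumption: bring the interface up at boot
IPADDR=192.168.1.100     # IP address
GATEWAY=192.168.1.2      # gateway
NETMASK=255.255.255.0    # subnet mask
DNS1=192.168.1.2         # DNS resolver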

4, Modify the mapping between ip address and host name

command: vi /etc/hosts

Add a line mapping the IP address to the corresponding host name, as in the sketch below
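
For example, with the IP address and host name assumed in this walkthrough, the added line might look like this:

192.168.1.100   hadoop100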

1, Modify the network configuration of the virtual machine

2, Modify the network configuration of windows

3, Firewall switch on the virtual machine (if ping already works, you do not need to turn it off; if ping fails, turn it off)

1. View firewall status

systemctl status firewalld

2. Turn off the firewall

systemctl stop firewalld

3. The firewall does not start after startup

systemctl disable firewalld

5, View the current ip address of the virtual machine

1. ifconfig -a
2. ip addr
From Windows, ping the virtual machine's IP address.
If the ping succeeds, the configuration works; for example:
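
From a Windows command prompt (using the IP address configured above):

ping 192.168.1.100
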
Shutdown command: shutdown -h now

6, Open MobaXterm and create a new SSH session

The new-session page appears (screenshot not included in these notes)

(1) If the earlier ping to 192.168.1.100 failed, the session cannot connect
(2) If the ping to 192.168.1.128 succeeds instead, the new session must connect to 192.168.1.128
(3) If the virtual machine is not started, the same failure occurs; press R to refresh and retry

We need to create two folders under /opt:

(1) software: holds the compressed software packages
Command to create the software folder: mkdir /opt/software
(2) module: holds the unpacked software
Command to create the module folder: mkdir /opt/module

Then:
(1) Switch to the software folder
cd /opt/software
(2) Unpack the JDK into the module folder

Command:

 tar -zxvf jdk-8u212-linux-x64.tar.gz -C /opt/module/
(4) Configure the JDK environment variables (be careful here)
1. Open the file: vi /etc/profile
2. Press Shift+g to jump to the last line and append the JAVA_HOME settings (see the sketch below)
3. Press Esc, then :wq to save and exit
4. Run: source /etc/profile
5. Run: java -version; if the version information appears, it worked
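
The notes do not show the exact lines appended to /etc/profile; a minimal sketch, assuming the JDK was unpacked to /opt/module/jdk1.8.0_212 as above:

# appended at the end of /etc/profile
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin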

(5) Configure the Hadoop environment variables

(1) Open the file: vi /etc/profile
(2) Press Shift+g to jump to the last line and append the HADOOP_HOME settings (see the sketch below)
(3) Press Esc, then :wq to save and exit
(4) Run: source /etc/profile
(5) Run: hadoop version; if the version information appears, it worked
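
Again the appended lines are not shown; a minimal sketch, assuming Hadoop was unpacked to /opt/module/hadoop-3.1.3 (the path used throughout these notes):

# appended at the end of /etc/profile
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin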

1, Local deployment of Hadoop

Goal 1: count the number of occurrences of each word

1. First, there must be a file containing some content

Create a directory (folder) named test under /opt

Create an input directory (folder) and an output directory (folder) under the /opt/test directory

Create a file containing some text in the /opt/test/input directory (this is simply editing text into a file)

The commands for these three steps are sketched below.
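
A minimal sketch of the three commands (the file name wc.input is an assumption, borrowed from the HDFS example later in these notes):

mkdir /opt/test                                # test directory under /opt
mkdir /opt/test/input /opt/test/output         # input and output directories
vi /opt/test/input/wc.input                    # type a few words, then save with :wq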

2. Use hadoop to process this file

Switch to the /opt/module/hadoop-3.1.3/share/hadoop/mapreduce directory

Run: hadoop jar hadoop-mapreduce-examples-3.1.3.jar wordcount /opt/test/input/ /opt/test/output/count.txt

3. View the results after execution

Command: cat /opt/test/output/count.txt/part-r-00000

2, Pseudo distributed deployment of Hadoop

1. Configure the cluster environment

(1) Modify the first configuration

In / opt/module/hadoop-3.1.3/etc/hadoop directory

Set hadoop-env.sh file

vi hadoop-env.sh

In command mode, type /JAVA_HOME to search for the JAVA_HOME line

export JAVA_HOME=/opt/module/jdk1.8.0_212

(2) Modify the second configuration

In / opt/module/hadoop-3.1.3/etc/hadoop directory

Set up the core-site.xml file

vi core-site.xml
(3) Modify the third configuration

In / opt/module/hadoop-3.1.3/etc/hadoop directory

Set hdfs-site.xml file

Command: vi hdfs-site.xml

Specify the number of HDFS replicas in the configuration

<configuration>
  <!-- specify the number of HDFS replicas -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

2. Start the cluster

(1) Format the namenode

Command: hdfs namenode -format

(2) Start the namenode

Command: hdfs --daemon start namenode

HDFS is for storage and YARN is for scheduling.

1. Switch to etc under hadoop (all configuration files are under etc)

2. Configure core-site.xml in hadoop

vi core-site.xml

Specify the address of the NameNode in HDFS. Place the properties inside the <configuration> tag.

<configuration>
  <!-- specify the address of the HDFS NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop100:9820</value>
  </property>

  <!-- specify the directory where Hadoop stores files generated at run time -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.1.3/data/tmp</value>
  </property>
</configuration>

3. Configure hdfs-site.xml in hadoop

Command: vi hdfs-site.xml

Specify the number of HDFS replicas inside <configuration>:

<configuration>
  <!-- specify the number of HDFS replicas -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

4. Format the NameNode (only format it before the first startup; do not keep re-formatting it later)

Format command: hdfs namenode -format

5. Start the namenode

Command: hdfs --daemon start namenode

6. Start the datanode

Command: hdfs --daemon start datanode

7. Configure yarn-site.xml

Command: vi yarn-site.xml

 
<configuration>

<!-- Site specific YARN configuration properties -->

  <!-- how the Reducer obtains data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <!-- specify the address of the YARN ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop100</value>
  </property>

  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>

8. Configure mapred-site.xml

Command:[root@hadoop100 hadoop]# vi mapred-site.xml

 

<configuration>
</configuration>

(The file is left with an empty <configuration> at this step; the fully distributed section later sets mapreduce.framework.name to yarn in this same file.)

9. Start the resourcemanager

Command:[root@hadoop100 hadoop]# yarn --daemon start resourcemanager

10. Start the nodemanager

Command:[root@hadoop100 hadoop]# yarn --daemon start nodemanager

11. jps: view the Java processes

Command: jps

12. Create the /user/input directory on HDFS

Command: hdfs dfs -mkdir -p /user/input

13. Upload files to HDFS

Command: hdfs dfs -put <local file to upload> <destination HDFS path>

Case: hdfs dfs -put wcinput/wc.input /user/input/

14. Check the file directory of hdfs

Command: hdfs dfs -ls <HDFS path>

Note that the root directory here is the HDFS root directory, not the Linux root directory

Case: hdfs dfs -ls /user/input/

15. View the file contents in hdfs

Command: hdfs dfs -cat <HDFS file path>

Case: hdfs dfs -cat /user/input/wc.input

16. Run the job

Command: hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount <input path on HDFS> <output path on HDFS>

Case: hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /user/input /user/output

17. View the results after execution

Command: hdfs dfs -cat <output path>/*

Case:

hdfs dfs -cat /user/output/*

18. Stop a process: hdfs --daemon stop namenode

HDFS maintains an abstract directory tree over the files it stores.

Fully distributed Hadoop

1. Namenode: stores the metadata of the files.

2. Datanode: stores the file block data (and block checksums) in the local file system.

3. Secondary Namenode: periodically backs up (checkpoints) the Namenode's metadata.

        hadoop100              hadoop101                      hadoop102
HDFS    NameNode, DataNode     DataNode                       Secondary NameNode
YARN    NodeManager            ResourceManager, NodeManager   NodeManager

Start:

1) Start the HDFS-related daemons
		hdfs --daemon start namenode
		hdfs --daemon start datanode
2) Start the YARN-related daemons
		yarn --daemon start resourcemanager
		yarn --daemon start nodemanager

YARN architecture

1) Main roles of the ResourceManager (RM):

(1) Processing client requests

(2) Starting and monitoring the ApplicationMaster

(3) Resource allocation and scheduling

1. Cluster configuration

Core configuration files

Configuration: hadoop-env.sh (in / opt/module/hadoop-3.1.3/etc/hadoop directory)

Get the installation path of JDK in Linux system:

[soft863@hadoop100 ~]# echo $JAVA_HOME

/opt/module/jdk1.8.0_212

Modify the JAVA_HOME path in the hadoop-env.sh file (use / in command mode to search for the line to change):

export JAVA_HOME=/opt/module/jdk1.8.0_212

1. Configure core-site.xml (NameNode address)

cd $HADOOP_HOME/etc/hadoop

vim core-site.xml

The contents of the file are as follows:

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<configuration>

  <property>

   <name>fs.defaultFS</name>

   <value>hdfs://hadoop1000:9820</value>

</property>

 

<!-- hadoop.data.dir is a custom variable that is reused in the configuration files below -->

  <property>

    <name>hadoop.data.dir</name>

    <value>/opt/module/hadoop-3.1.3/data</value>

  </property>

</configuration>

2. HDFS configuration file (datanode)

Configure hdfs-site.xml
vim hdfs-site.xml
The contents of the file are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <!-- namenode data storage location -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.data.dir}/name</value>
  </property>

  <!-- datanode data storage location -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.data.dir}/data</value>
  </property>

  <!-- secondary namenode data storage location -->
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file://${hadoop.data.dir}/namesecondary</value>
  </property>

  <!-- datanode restart timeout of 30 s; resolves a compatibility issue and can be skipped -->
  <property>
    <name>dfs.client.datanode-restart.timeout</name>
    <value>30</value>
  </property>

  <!-- web address for accessing the namenode -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop1000:9870</value>
  </property>

  <!-- web address for accessing the secondary namenode -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop1002:9868</value>
  </property>
</configuration>

3. YARN configuration file

Configure yarn-site.xml
vim yarn-site.xml
The contents of the file are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop101</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>

4. MapReduce configuration file

Configure mapred-site.xml
vim mapred-site.xml
The contents of the file are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 
<configuration>
 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>
</configuration>

2. Cluster distribution

scp -r (recursive) (full copy)

rsync -av (differentiated copy)

Copy the etc/hadoop/ directory to hadoop1001:

[root@hadoop1000 opt]# cd /opt

[root@hadoop1000 opt]#  scp -r hadoop/ root@hadoop1001:/opt/module/hadoop-3.1.3/etc/

Copy the etc/hadoop/ directory to hadoop1002:

[root@hadoop1000 opt]# scp -r hadoop/ root@hadoop1002:/opt/module/hadoop-3.1.3/etc/

 
Copy /etc/profile to hadoop100 and hadoop101

[root@hadoop102 opt]# rsync -av /etc/profile hadoop101:/etc

[root@hadoop102 opt]# rsync -av /etc/profile hadoop100:/etc


On hadoop100 and hadoop101, run source /etc/profile separately:

[root@hadoop100 opt]# source /etc/profile

[root@hadoop101 opt]# source /etc/profile

3. Distributed cluster formatting

The distributed cluster should be formatted before starting for the first time

Before formatting, delete the data directory and logs directory under the hadoop installation directory on the three servers

[root@hadoop1001 opt]# cd /opt/module/hadoop-3.1.3

[root@hadoop1001 opt]# rm -rf data

[root@hadoop1001 opt]# rm -rf logs

 

Perform the formatting on the server where the namenode is specified to run:

(the namenode is specified to run on hadoop1000)

[root@hadoop1000 hadoop-3.1.3]# hdfs namenode -format

ssh password free login

1. Generate public and private keys at each node and copy them

Hadoop1000:

Generate public and private keys

[root@hadoop1000] ssh-keygen -t rsa

Then press Enter three times

Copy the public key to the target machines for password-free login

[root@hadoop1000] ssh-copy-id hadoop1000

[root@hadoop1000] ssh-copy-id hadoop1001

[root@hadoop1000] ssh-copy-id hadoop1002

Hadoop1001:

	Generate public and private keys

[root@hadoop1001] ssh-keygen -t rsa

Then press Enter three times

Copy the public key to the target machines for password-free login

[root@hadoop1001] ssh-copy-id hadoop1000

[root@hadoop1001] ssh-copy-id hadoop1001

[root@hadoop1001] ssh-copy-id hadoop1002

Hadoop1002:

Generate public and private keys

[root@hadoop1002] ssh-keygen -t rsa

Then press Enter three times

Copy the public key to the target machines for password-free login

[root@hadoop1002] ssh-copy-id hadoop1000

[root@hadoop1002] ssh-copy-id hadoop1001

[root@hadoop1002] ssh-copy-id hadoop1002

Start the cluster with a script

1. Modify hadoop configuration file

Add a few lines at the top of the start-dfs.sh and stop-dfs.sh files on hadoop100

[root@hadoop100] cd /opt/module/hadoop-3.1.3/sbin

[root@hadoop100] vi start-dfs.sh

HDFS_DATANODE_USER=root

HADOOP_SECURE_DN_USER=hdfs

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root

[root@hadoop100] vi stop-dfs.sh

HDFS_DATANODE_USER=root

HADOOP_SECURE_DN_USER=hdfs

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root

Add a few lines of data at the top of the start-yarn.sh and stop-yarn.sh files

[root@hadoop100] vi start-yarn.sh

[root@hadoop100] vi stop-yarn.sh

YARN_RESOURCEMANAGER_USER=root

HADOOP_SECURE_DN_USER=yarn

YARN_NODEMANAGER_USER=root

Modify workers on hadoop100:

[root@hadoop100] cd /opt/module/hadoop-3.1.3/etc/hadoop

[root@hadoop100] vi workers

hadoop100

hadoop101

hadoop102

Synchronize the above changes to hadoop101 and hadoop102:

[root@hadoop100] rsync -av /opt/module/hadoop-3.1.3/sbin/ hadoop101:/opt/module/hadoop-3.1.3/sbin/

 

[root@hadoop100] rsync -av /opt/module/hadoop-3.1.3/sbin/ hadoop102:/opt/module/hadoop-3.1.3/sbin/

 

[root@hadoop100] rsync -av /opt/module/hadoop-3.1.3/etc/hadoop/ hadoop101:/opt/module/hadoop-3.1.3/etc/hadoop/

 

[root@hadoop100] rsync -av /opt/module/hadoop-3.1.3/etc/hadoop/ hadoop102:/opt/module/hadoop-3.1.3/etc/hadoop/

Start and stop the cluster

Start the cluster:

If Hadoop-related processes are already running on the cluster, stop them first.

Execute the following script on hadoop100 to start hdfs:

[root@hadoop100] start-dfs.sh

Execute the following script on hadoop101 to start yarn:

[root@hadoop101] start-yarn.sh

Stop the cluster:

Execute the following script on hadoop100 to stop hdfs:

[root@hadoop100] stop-dfs.sh

Execute the following script on hadoop101 to stop yarn:

[root@hadoop101] stop-yarn.sh
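
After starting, one quick way to confirm which daemons are running on each node is to run jps over ssh (a sketch, relying on the password-free login configured earlier):

for host in hadoop100 hadoop101 hadoop102; do
    echo "== $host =="
    ssh $host jps    # lists the Java daemons (NameNode, DataNode, ResourceManager, ...)
                     # if jps is not found over ssh, use the full path $JAVA_HOME/bin/jps
done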

Hive setup and installation

Mysql installation

1, Download MySQL (also available in the DingTalk group)

https://dev.mysql.com/downloads/mysql/5.7.html#downloads


2, Upload it to /opt/software on the Linux system

3, Check whether Mysql has been installed in the current system

rpm -qa|grep mariadb

mariadb-libs-5.5.56-2.el7.x86_64    # if this appears, uninstall it with the following command

4, rpm -e --nodeps mariadb-libs    # uninstall mariadb with this command

5, Unpack it to /opt/module

The command is: tar -xf <archive to unpack> -C <destination directory>
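
For example, assuming the download is the 5.7.36 rpm bundle whose individual rpm files are installed in the next step (the exact archive name is an assumption):

tar -xf mysql-5.7.36-1.el7.x86_64.rpm-bundle.tar -C /opt/module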

6, Install the corresponding rpm file

If a problem is reported:


1. Install the dependency: yum install -y libaio

2. Execute commands (in order):

sudo rpm -ivh --nodeps mysql-community-common-5.7.36-1.el7.x86_64.rpm

sudo rpm -ivh --nodeps mysql-community-libs-5.7.36-1.el7.x86_64.rpm

sudo rpm -ivh --nodeps mysql-community-libs-compat-5.7.36-1.el7.x86_64.rpm

sudo rpm -ivh --nodeps mysql-community-client-5.7.36-1.el7.x86_64.rpm

sudo rpm -ivh --nodeps mysql-community-server-5.7.36-1.el7.x86_64.rpm

7, Switch to /etc

8, cat my.cnf


9, Switch to /var/lib/mysql and delete all files there: rm -rf ./*

10, Initialize mysql: mysqld --initialize --user=mysql

11, View the generated random password: cat /var/log/mysqld.log


12, Start the MySQL service

systemctl start mysqld

Log in to MySQL

mysql -uroot -p

Enter password:  Enter the temporarily generated password

Login succeeded

The password of the root user must be modified first, otherwise an error will be reported when performing other operations

mysql> set password = password("New password");

13, Modify the root user in the user table of the mysql database to allow connections from any IP

mysql> update mysql.user set host='%' where user='root';

mysql> flush privileges;

Hive installation

1. Download the installation package: apache-hive-3.1.2-bin.tar.gz

Upload it to the /opt/software/ path on the Linux system

2. Unpack the software

cd /opt/software/
tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/module/

3. Modify system environment variables

vim /etc/profile

Add content:

export HIVE_HOME=/opt/module/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/sbin:$HIVE_HOME/bin

Restart environment configuration:

source /etc/profile

4. Modify hive environment variable

cd  /opt/module/apache-hive-3.1.2-bin/bin/

Edit the hive-config.sh file

vi hive-config.sh

New content:

export JAVA_HOME=/opt/module/jdk1.8.0_212
export HIVE_HOME=/opt/module/apache-hive-3.1.2-bin
export HADOOP_HOME=/opt/module/hadoop-3.2.0
export HIVE_CONF_DIR=/opt/module/apache-hive-3.1.2-bin/conf

5. Copy the Hive configuration file

cd  /opt/module/apache-hive-3.1.2-bin/conf/
cp hive-default.xml.template hive-site.xml

6. Modify the Hive configuration file: find each of the following properties and change it accordingly

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.100:3306/hive?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;serverTimezone=GMT</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
    <description>Auto creates necessary schema on a startup if one doesn't exist. Set this to false, after creating it once.To enable auto create also set hive.metastore.schema.verification=false. Auto creation is not recommended for production use cases, run schematool command instead.</description>
  </property>
<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
            schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
            proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
  </property>
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/opt/module/apache-hive-3.1.2-bin/tmp/${user.name}</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
<name>system:java.io.tmpdir</name>
<value>/opt/module/apache-hive-3.1.2-bin/iotmp</value>
<description/>
</property>

  <property>
    <name>hive.downloaded.resources.dir</name>
<value>/opt/module/apache-hive-3.1.2-bin/tmp/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
<property>
    <name>hive.querylog.location</name>
    <value>/opt/module/apache-hive-3.1.2-bin/tmp/${system:user.name}</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
<value>/opt/module/apache-hive-3.1.2-bin/tmp/${system:user.name}/operation_logs</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>
  <property>
    <name>hive.metastore.db.type</name>
    <value>mysql</value>
    <description>
      Expects one of [derby, oracle, mysql, mssql, postgres].
      Type of database used by the metastore. Information schema &amp; JDBCStorageHandler depend on it.
    </description>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    <description>Whether to include the current database in the Hive prompt.</description>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
    <description>Whether to print the names of the columns in query output.</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/opt/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>

7. Upload mysql driver package to /opt/module/apache-hive-3.1.2-bin/lib/

Driver package: mysql-connector-java-8.0.15.zip. Extract the jar package from it

8. Make sure there is a database named hive in the mysql database
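
If the hive database does not exist yet, it can be created from the shell with the root account configured during the MySQL installation (a one-line sketch):

mysql -uroot -p -e "CREATE DATABASE IF NOT EXISTS hive;"    # enter the MySQL root password when prompted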

9. Initialize metabase

schematool -dbType mysql -initSchema

10. Make sure Hadoop starts

11. Start hive

hive

12. Check whether the startup is successful

show databases;

ZOOKEEPER!

Prerequisite: turn off the firewall

1. Decompress

cd /opt/module/
tar -zxvf apache-zookeeper-3.5.5-bin.tar.gz

2. Create the data and log directories

Create two folders, data and log, under the zookeeper directory

cd /opt/module/apache-zookeeper-3.5.5-bin/
  mkdir data
  mkdir log

3. Copy the sample configuration file

cd /opt/module/apache-zookeeper-3.5.5-bin/conf/
cp zoo_sample.cfg zoo.cfg

Modify the configuration file

vi zoo.cfg

# The number of milliseconds of each tick
tickTime=2000

# The number of ticks that the initial
# synchronization phase can take
initLimit=10

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5

# The directory where the snapshot is stored.
# Do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/module/apache-zookeeper-3.5.5-bin/data
dataLogDir=/opt/module/apache-zookeeper-3.5.5-bin/log

# The port at which the clients will connect
clientPort=2181

# The maximum number of client connections.
# Increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

server.0=192.168.1.100:2888:3888
server.1=192.168.1.101:2888:3888
server.2=192.168.1.102:2888:3888

4. Create server myid

Create a myid file in the data directory. The value written in the file must correspond to that node's server.x entry configured above (see the sketch below).
cd /opt/module/apache-zookeeper-3.5.5-bin/data/
touch myid
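
For example, a minimal sketch for this node, assuming it is 192.168.1.100 and therefore server.0 in zoo.cfg above:

echo 0 > /opt/module/apache-zookeeper-3.5.5-bin/data/myid    # id 0 matches server.0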

5. Cluster copy

scp -r /opt/module/apache-zookeeper-3.5.5-bin root@hadoop101:/opt/module/apache-zookeeper-3.5.5-bin

scp -r /opt/module/apache-zookeeper-3.5.5-bin root@hadoop102:/opt/module/apache-zookeeper-3.5.5-bin

6. Cluster myid change

Log in to each node and change its myid value so that it matches that node's server.x entry in zoo.cfg
Add cluster system environment variable: vi /etc/profile
export ZOOKEEPER_HOME=/opt/module/apache-zookeeper-3.5.5-bin
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Save the system environment variable: source /etc/profile

Turn off the cluster firewall

7. Cluster startup

Log in to each node and start it

cd /opt/module/apache-zookeeper-3.5.5-bin
zkServer.sh start
zkServer.sh status

zkCli connection verification

zkCli.sh -server hadoop1001:2181

HBASE setup!

Supporting component versions:
Hadoop 3.1.3
Zookeeper 3.5.7
HBase 2.2.0

1, File decompression

cd /opt/module/
tar -zxvf hbase-2.2.0-bin.tar.gz

2, Modify startup variable

Add the system environment variables
vi /etc/profile
export HBASE_HOME=/opt/module/hbase-2.2.0
export PATH=$PATH:$HBASE_HOME/bin
Save the system environment variable: source /etc/profile
Modify the HBase environment variables
cd /opt/module/hbase-2.2.0/conf/
vi hbase-env.sh
 Use / search in vi to find and change the following settings in the file
export JAVA_HOME=/opt/module/jdk1.8.0_212/
export HBASE_MANAGES_ZK=false

3, Configuration file

Configure hbase-site.xml file
 vi hbase-site.xml
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hadoop1000:9820/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hadoop1000</value>
</property>
    <property>
        <name>hbase.master.info.port</name>
        <value>60010</value>
    </property>
    <property>
        <name>hbase.master.maxclockskew</name>
        <value>180000</value>
        <description>Time difference of regionserver from master</description>
    </property>
    <property>
        <name>hbase.coprocessor.abortonerror</name>
        <value>false</value>
    </property>
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
</configuration>
Note: if you use an external ZooKeeper, hbase.cluster.distributed needs to be set to true
In the regionservers configuration file, list the region servers: hadoop1000

4, Start

Start the components in sequence (HBase itself only needs to be started on the master node):
Zookeeper,Hadoop,Hbase

Hbase startup mode:

start-hbase.sh
Note: if HRegionServer is still not started, you can try the following statement
bin/hbase-daemon.sh start regionserver

5, Check

Web view: http://hadoop100:60010/master-status
Note: the master web UI does not run on this port by default; you need to configure the port in the configuration file

If Zookeeper cannot be started, check the log information under /usr/local/soft/hbase-2.2.0/logs/
Consider deleting all hbase nodes in zk and then restarting to try again

6, hbase shell usage

hbase shell

Create a table named myHbase with a column family named myCard that keeps five versions:
create 'myHbase',{NAME => 'myCard',VERSIONS => 5}
View the table list

All table names and column names need to be enclosed in quotation marks

1. View status:

status

2. View all tables:

list

3. Exit:

quit
