Hive is an essential tool in the Hadoop ecosystem.
It can map structured data stored in HDFS to tables in a database and provides an SQL dialect to query them.
These SQL statements are eventually translated into MapReduce programs for execution. In essence, Hive is a framework built to spare users from writing MapReduce programs by hand; it does not store or compute data itself and depends entirely on HDFS and MapReduce.
Hive provides an SQL dialect called Hive Query Language (HiveQL, or HQL for short) to query data stored in Hadoop clusters. Hive lowers the difficulty of migrating a traditional data analysis system to Hadoop: any developer who can use SQL can easily learn and use Hive. Without Hive, these developers would have to learn new languages and tools before they could work in the new production environment. However, Hive differs in important ways from other SQL-based environments such as MySQL.
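As a minimal sketch of this table mapping (the path /data/orders, the column layout, and the table name below are illustrative assumptions, not part of this tutorial's data), a delimited file already stored in HDFS can be exposed as a table and queried with HiveQL:
-- Map a comma-delimited HDFS file to a Hive table (path and columns are hypothetical).
CREATE EXTERNAL TABLE orders (
  order_id INT,
  customer STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/orders';
-- This query is compiled into MapReduce jobs and run on the cluster.
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer;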
1. Hive features
Hive is an application built on top of Hadoop, and because of Hadoop's design it cannot provide complete database functionality. The biggest limitation is that Hive does not support row-level update, insert, or delete operations. In addition, because starting a MapReduce job takes a long time, every Hive query carries significant latency: queries that finish in seconds in a traditional database often take far longer in Hive, even on relatively small data sets. Finally, Hive does not support transactions.
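As a hedged illustration of these limitations (reusing the hypothetical orders table above; exact behavior depends on the Hive version and configuration), data is normally changed by bulk loads or full rewrites, while row-level statements are rejected on a default, non-transactional table:
-- Bulk loading and rewriting whole tables or partitions is the supported way to change data.
LOAD DATA INPATH '/data/new_orders' INTO TABLE orders;
INSERT OVERWRITE TABLE orders SELECT * FROM orders_staging;
-- Row-level changes like these fail on a default (non-ACID) table:
-- UPDATE orders SET amount = 0 WHERE order_id = 1;
-- DELETE FROM orders WHERE order_id = 1;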
2. Hive installation
Hive depends on Hadoop to run, so Hadoop needs to be installed before installing Hive.
Hive's basic installation configuration includes the following steps:
1. Check Hadoop environment
2. Install MySQL
3. Install Hive
4. Configure Hive
1. Check Hadoop environment
(1) View the Hadoop version
The code is as follows:
hadoop version
(2) Start the Hadoop processes
The current directory is /home/hadoop, so first switch to the Hadoop installation directory.
The code is as follows:
cd /usr/local/hadoop
Start the HDFS and YARN processes and check that they are running
The code is as follows:
./sbin/start-dfs.sh
./sbin/start-yarn.sh
jps
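If both scripts start cleanly, jps should list roughly the following daemons on a single-node setup (process IDs are omitted here, and the exact list depends on the cluster configuration):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps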
2. Install MySQL
(1) Install MySQL
The code is as follows:
sudo apt-get install mysql-server
(2) View the default maintenance account and password
The code is as follows:
sudo cat /etc/mysql/debian.cnf
(3) Log in to the MySQL database with the default account
The code is as follows:
mysql -u debian-sys-maint -p
(4) Create the Hive account
The code is as follows:
CREATE USER 'hive'@'%' IDENTIFIED BY '123456';
(5) Grant the hive user permission to manipulate the database
The code is as follows:
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
(6) Exit the MySQL database
The code is as follows:
exit
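As an optional check that is not part of the original steps, you can confirm the new account works by logging back in with it (enter 123456 when prompted) and then exiting again:
mysql -u hive -p
exit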
3. Install Hive
(1) Upload the Hive installation package (apache-hive-2.3.7-bin.tar.gz) to /home/hadoop
(2) Unpack Hive into /usr/local
The code is as follows:
sudo tar -xvf apache-hive-2.3.7-bin.tar.gz -C /usr/local
(3) Enter the /usr/local directory and rename the extracted directory to hive
The code is as follows:
cd /usr/local
sudo mv apache-hive-2.3.7-bin hive
(4) Change the owner of the hive directory to hadoop
The code is as follows:
sudo chown -R hadoop hive
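An optional check (not in the original steps) to confirm the new owner of the directory:
ls -ld /usr/local/hive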
4. Configure Hive
(1) Enter the Hive configuration directory
The code is as follows:
cd /usr/local/hive/conf
(2) Create the hive-site.xml file with the configuration information
The code is as follows:
vim hive-site.xml
The configuration below refers to a tmp directory under the Hive installation directory, so that directory needs to be created first; a sample command is shown next, followed by the configuration contents.
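A minimal way to create that directory, assuming Hive was unpacked to /usr/local/hive and its owner was already changed to hadoop in the previous steps:
mkdir -p /usr/local/hive/tmp
With the directory in place, the configuration contents are as follows: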
<configuration>
  <property>
    <name>system:java.io.tmpdir</name>
    <value>/usr/local/hive/tmp</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
</configuration>
(3) Enter the dependency library directory of Hive
The code is as follows:
cd /usr/local/hive/lib
(4) Upload the MySQL JDBC driver file to the lib directory
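For example, if the Connector/J jar was uploaded to /home/hadoop (the file name and version below are assumptions; use whichever driver matches the installed MySQL and the com.mysql.jdbc.Driver class configured above):
cp /home/hadoop/mysql-connector-java-5.1.49.jar /usr/local/hive/lib/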
(5) Enter the configuration file directory of the Hadoop installation
The code is as follows:
cd /usr/local/hadoop/etc/hadoop
(6) Edit the core-site.xml file
The code is as follows:
vim core-site.xml
Add the following properties inside the existing <configuration> element:
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
(7) Go to the hadoop user's home directory and edit the environment variable file
The code is as follows:
cd ~
vim .bashrc
(8) Add content to the environment variable file
Add the following:
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
(9) Refresh the environment variables
The code is as follows:
source .bashrc
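As an optional check (not part of the original steps), confirm that the hive command is now on the PATH:
hive --version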
(10) Initialize Hive
The code is as follows:
schematool -dbType mysql -initSchema
If the command completes without errors, initialization has succeeded.
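As an optional check (not in the original steps), you can confirm that the metastore tables were created in the hive database in MySQL:
mysql -u hive -p -e 'USE hive; SHOW TABLES;'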
(11) Query the Hive default database list to verify the installation
The code is as follows:
hive -e 'show databases'
If the output lists the default database, the installation succeeded.
Hive installation file download:
https://pan.baidu.com/s/1iCjmb9hdhnnL1kI0VzaCxg