Use and installation of Sqoop


SQOOP installation

1. Upload and unzip

Installation package download address http://archive.apache.org/dist/sqoop/1.4.7/

tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/soft/
2. Modify folder name
mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7
3. Modify the configuration file
# Switch to the sqoop configuration directory
cd /usr/local/soft/sqoop-1.4.7/conf
# Copy the template configuration file and rename it
cp sqoop-env-template.sh sqoop-env.sh
# vim sqoop-env.sh: edit the configuration file and add the following
export HADOOP_COMMON_HOME=/usr/local/soft/hadoop-2.7.6
export HADOOP_MAPRED_HOME=/usr/local/soft/hadoop-2.7.6/share/hadoop/mapreduce
export HBASE_HOME=/usr/local/soft/hbase-1.4.6
export HIVE_HOME=/usr/local/soft/hive-1.2.1
export ZOOCFGDIR=/usr/local/soft/zookeeper-3.4.6/conf
export ZOOKEEPER_HOME=/usr/local/soft/zookeeper-3.4.6
# Switch to the bin directory
cd /usr/local/soft/sqoop-1.4.7/bin
# vim configure-sqoop: comment out the checks for components that are not used (this only removes the startup warning messages)
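The warnings usually come from the checks for HCatalog and Accumulo near the end of bin/configure-sqoop. A minimal sketch of the kind of block to comment out (the exact wording and positions depend on the sqoop version):

# In bin/configure-sqoop, comment out checks like these to silence the startup warnings
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi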
4. Modify environment variables
# Add the sqoop directory to the environment variables
vim /etc/profile
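A minimal sketch of the lines to add, assuming sqoop was unpacked to /usr/local/soft/sqoop-1.4.7 as above:

# Append to /etc/profile
export SQOOP_HOME=/usr/local/soft/sqoop-1.4.7
export PATH=$PATH:$SQOOP_HOME/bin
# Reload the environment variables
source /etc/profile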
5. Add MySQL connection driver
# Copy the MySQL connector driver from Hive to $SQOOP_HOME/lib
cp /usr/local/soft/hive-1.2.1/lib/mysql-connector-java-5.1.49.jar /usr/local/soft/sqoop-1.4.7/lib/
6. Testing
# Print the sqoop version
sqoop version
# Test MySQL connectivity
sqoop list-databases --connect jdbc:mysql://master:3306?useSSL=false --username root --password 123456

Prepare MySQL data

Log in to MySQL database
mysql -u root -p123456
Create student database
create database student;
Switch databases and import data
# Execute in the mysql shell
use student;
source /root/student.sql;
source /root/score.sql;
Another way to import data
# Execute in the Linux shell
mysql -u root -p123456 student < /root/student.sql
mysql -u root -p123456 student < /root/score.sql
Running SQL files using Navicat

You can also import via Navicat

Export MySQL database
mysqldump -u root -p123456 <database name> > <file name>.sql
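For example, to dump the student database created above into a backup file (the file name here is just an illustration):

mysqldump -u root -p123456 student > /root/student_backup.sql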

import

Import from a traditional relational database into HDFS, Hive, and HBase

MySQLToHDFS

Write a script and save it as MySQLToHDFS.conf

MySQL needs to contain the student table in the student database (imported above)

import
# Connect to MySQL over JDBC
--connect
jdbc:mysql://master:3306/student?useSSL=false
--username
root
--password
123456
--table
student
# Two map tasks (sqoop runs on MapReduce underneath, but there is no reduce task)
--m
2
# Split the data between the map tasks by the age column; if this is removed, the student table must have a primary key, and the split is done by the primary key
--split-by
age
# Target directory on HDFS (it does not need to be created manually)
--target-dir
/sqoop/data/student1
# Separate fields with a comma
--fields-terminated-by
','
Execute script
sqoop --options-file MySQLToHDFS.conf
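To check the result, list the target directory on HDFS; with --m 2 there should be one part file per map task (a quick sketch, assuming the paths used above):

# List the imported files (one part-m-* file per map task)
hdfs dfs -ls /sqoop/data/student1
# Inspect the first few rows of the first file
hdfs dfs -cat /sqoop/data/student1/part-m-00000 | head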
Notes:

1. --m specifies how many map tasks to generate. More is not necessarily better, because the MySQL server has limited capacity

2. When the specified number of map tasks is > 1, you need to specify a split key with the --split-by parameter to determine which part of the data each map task reads. It is best to use a numeric column such as the primary key (or an evenly distributed column, to avoid large differences in the amount of data each map task processes)

3. If the values of the specified split key are unevenly distributed, data skew may occur

4. The split key is best a column of a numeric type, such as int or bigint

5. When writing an options file, note that a parameter value cannot be on the same line as the parameter name. For example, for the --username parameter:

--username root    # WRONG: name and value on one line
# It should be split into two lines:
--username
root

6. An InterruptedException warning may be reported while running; it can be ignored, as it is a known issue in Hadoop 2.7.6:

21/01/25 14:32:32 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:716)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:476)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:652)

7. Sqoop reads MySQL data through JDBC, so efficiency is not very high when the data volume is large

8. Sqoop imports and exports data through MapReduce under the hood; only map tasks are needed, no reduce tasks

9. Each map task generates one output file

MySQLToHive

First, the MySQL data is exported to a temporary directory on HDFS. The default is /user/<user name>/<table name>

Then load the data into Hive. After loading, the temporarily stored directory will be deleted

Write a script and save it as MySQLToHIVE.conf file

Process: first the score table in the MySQL test database is imported to HDFS, and then the data is loaded from HDFS into a Hive table

In MySQL, a score table (with data) needs to exist in the test database

import
--connect
jdbc:mysql://master:3306/test?useSSL=false
--username
root
--password
123456
# Existing table score in the MySQL test database
--table
score
# Field (column) delimiter
--fields-terminated-by
"\t"
# Line delimiter
--lines-terminated-by
"\n"
--m
3
--split-by
student_id
--hive-import
# If run again, the existing data will be overwritten
--hive-overwrite
--create-hive-table
# Hive database to import into (created below)
--hive-database
testsqoop
--hive-table
score
# Delete the HDFS target directory if it already exists (needed when running again)
--delete-target-dir
Create testsqoop Library in Hive
hive> create database testsqoop;
Add HADOOP_CLASSPATH to the environment variables
vim /etc/profile
# Add the following
export HADOOP_CLASSPATH=$HADOOP_HOME/lib:$HIVE_HOME/lib/*
# Reload environment variables
source /etc/profile
Put hive-site.xml into SQOOP_HOME/conf/
cp /usr/local/soft/hive-1.2.1/conf/hive-site.xml /usr/local/soft/sqoop-1.4.7/conf/
Execute script
sqoop --options-file MySQLToHIVE.conf
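A quick way to check the result, assuming the database and table names used above (run in the hive shell):

hive> select * from testsqoop.score limit 10;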
--direct (just add it to the end of the options file above; see the sketch after the scp commands below)

With this parameter, MySQL data is exported using MySQL's own mysqldump tool, which speeds up the export and improves efficiency

You need to distribute /usr/bin/mysqldump from the master to the /usr/bin directories of node1 and node2.
Note: the MapReduce tasks run under the NodeManagers of node1 and node2.

scp /usr/bin/mysqldump node1:/usr/bin/
scp /usr/bin/mysqldump node2:/usr/bin/
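A minimal sketch of the change itself: --direct is simply appended as a line of its own at the end of MySQLToHIVE.conf.

# Appended at the end of MySQLToHIVE.conf
--direct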
Use of the -e parameter
import
--connect
jdbc:mysql://master:3306/test
--username
root
--password
123456
--fields-terminated-by
"\t"
--lines-terminated-by
"\n"
--m
2
--split-by
student_id
# Free-form query; the $CONDITIONS placeholder is required so that sqoop can append the split condition for each map task
--e
"select * from score where student_id=1500100011 and $CONDITIONS"
# Temporary directory on HDFS: the data is first written here and then loaded into Hive
--target-dir
/sqoop/data/score
--hive-import
--hive-overwrite
--create-hive-table
--hive-database
testsqoop
--hive-table
score2
MySQLToHBase

Write a script and save it as MySQLToHBase.conf
import
--connect
jdbc:mysql://master:3306/student?useSSL=false
--username
root
--password
123456
--table
student
--hbase-table
student
--hbase-create-table
--hbase-row-key
id
--m
1
--column-family
cf1
Create student table in HBase
create 'student','cf1'
Execute script
sqoop --options-file MySQLToHBase.conf
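To check the result, scan the table in the hbase shell (a sketch; the LIMIT just keeps the output short):

scan 'student', {LIMIT => 5}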

export

HDFSToMySQL

Write a script and save it as HDFSToMySQL.conf
export
--connect
jdbc:mysql://master:3306/student?useSSL=false
--username
root
--password
123456
--table
student
-m
1
--columns
id,name,age,gender,clazz
--export-dir
/sqoop/data/student1/
--fields-terminated-by
','
Clear the data in the MySQL student table first, otherwise there will be primary key conflicts.
Execute script
sqoop --options-file HDFSToMySQL.conf
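To verify the export, count the rows written back to MySQL (a sketch, run in the mysql shell):

use student;
select count(*) from student;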
View sqoop help
sqoop help
21/04/26 15:50:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
# View detailed help for import
sqoop import --help
