Sqoop of big data technology

Chapter 1 Introduction to Sqoop

Sqoop is an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional relational databases (MySQL, PostgreSQL, etc.). It can import data from a relational database (such as MySQL, Oracle, or PostgreSQL) into Hadoop's HDFS, or export data from HDFS into a relational database.
The Sqoop project started in 2009, first as a third-party module of Hadoop. Later, to let users deploy quickly and developers iterate faster, Sqoop became an independent Apache project.
The latest version of Sqoop2 is 1.99.7. Note that Sqoop2 is not compatible with Sqoop1 and is not feature complete; it is not intended for production deployment.

Chapter 2 Sqoop principle

Sqoop translates an import or export command into a MapReduce program.
In the generated MapReduce program, the main customization lies in the InputFormat and OutputFormat.
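
You can see part of this translation directly: Sqoop's codegen tool emits the Java class that the generated MapReduce job uses to serialize and deserialize each table row. A sketch, reusing the placeholder connection details from the examples below:

sqoop codegen \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--table emp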

Chapter 3 Sqoop installation (setup)

Installing Sqoop presupposes that working Java and Hadoop environments are already in place.
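You can check both first (assuming java and hadoop are already on the PATH):

java -version
hadoop version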
3.1 download and unzip the Sqoop package

mkdir /usr/local/sqoop
cd /usr/local/sqoop
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
rm -rf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

3.2 modify the configuration file (under conf)

  1. Rename the configuration file
cd /usr/local/sqoop/sqoop-1.4.7.bin__hadoop-2.6.0/conf/
cp sqoop-env-template.sh sqoop-env.sh
  2. Modify the configuration file (sqoop-env.sh)
vim sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/hadoop/hadoop-2.9.2
export HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-2.9.2
export HIVE_HOME=/usr/local/hive
export ZOOKEEPER_HOME=/usr/local/zookeeper/zookeeper-3.4.10
export ZOOCFGDIR=/usr/local/zookeeper/zookeeper-3.4.10
export HBASE_HOME=/usr/local/hbase

3.3 upload the MySQL driver jar package

Copy a MySQL JDBC driver jar into the lib directory under Sqoop's installation directory:
mysql-connector-java-5.1.39.jar

cd ../lib/
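
For example, if the driver jar was downloaded to /root (an assumed path):

cp /root/mysql-connector-java-5.1.39.jar .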

3.4 verify startup (in bin directory)

cd ../bin/
./sqoop-version
./sqoop-list-databases --connect jdbc:mysql://localhost:3306 --username root --password 123456
./sqoop-list-tables --connect jdbc:mysql://localhost:3306 --username root --password 123456
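
Tip: a plaintext --password is recorded in the shell history; Sqoop can prompt for the password instead via -P, for example:

./sqoop-list-databases --connect jdbc:mysql://localhost:3306 --username root -P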

Chapter 4 Simple use cases of Sqoop

4.1 import data
In Sqoop, "import" means transferring data from a non-big-data cluster (an RDBMS) into the big data cluster (HDFS, Hive, HBase); hence the import keyword.
4.1.1 RDBMS to HDFS

  1. Make sure the MySQL service is running normally
  2. Create a table in MySQL and insert some data (a minimal sketch follows)
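
A minimal sketch of such a table; the schema and rows are assumptions chosen to match the later examples:

mysql -uroot -p123456 <<'SQL'
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE TABLE emp (
  id     INT PRIMARY KEY,
  name   VARCHAR(50),
  salary INT
);
-- salary 30000 feeds the query-import example below, id 1203 the where-filter example
INSERT INTO emp VALUES
  (1201, 'gopal',   50000),
  (1202, 'manisha', 50000),
  (1203, 'khalil',  30000);
SQL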
  3. Import the data
    (1) Import all (import the emp table from MySQL into HDFS)
./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--table emp \
-m 1

Check the result:

hadoop fs -cat /user/root/emp/part-m-00000

(2) Import all (import the emp table from MySQL into a specified HDFS directory)

./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp \
--delete-target-dir \
--fields-terminated-by '\001' \
--table emp \
-m 1

Check the result:

hadoop fs -cat /emp/part-m-00000


(3) Query import

./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp3 \
--fields-terminated-by '\001' \
--delete-target-dir \
-m 1 \
--query 'select id,name from emp where salary <= 30000 and $CONDITIONS'


Check the result:

hadoop fs -cat /emp3/part-m-00000
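
Tip: the free-form query must contain the literal $CONDITIONS token, which Sqoop replaces with each mapper's split predicate. With more than one mapper, --split-by is also required; a sketch:

./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp3 \
--delete-target-dir \
--split-by id \
-m 2 \
--query 'select id,name from emp where salary <= 30000 and $CONDITIONS'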

(4) Import specified columns

./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp1 \
--fields-terminated-by '\001' \
--delete-target-dir \
--table emp \
-m 1 \
--columns id,name


Check the result:

hadoop fs -cat /emp1/part-m-00000

Tip: when --columns lists multiple columns, separate them with commas and do not put spaces after the commas.

(5) Filter imported data with the where keyword

./sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--target-dir /emp2 \
--fields-terminated-by '\001' \
--delete-target-dir \
--table emp \
-m 1 \
--where "id = 1203"

Check the result:

hadoop fs -cat /emp2/part-m-00000

4.2 export data
In Sqoop, "export" means transferring data from the big data cluster (HDFS, Hive, HBase) into a non-big-data cluster (an RDBMS); hence the export keyword.
4.2.1 HIVE/HDFS to RDBMS

./sqoop export \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password 123456 \
--table dept \
--export-dir /user/hive/warehouse/dept/dept.txt \
-m 1 \
--input-fields-terminated-by "\t"

Tip: if the target table does not exist in MySQL, it will not be created automatically; create it in advance.
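
For example, a matching table could be created first (a sketch; the two-column layout of dept.txt is an assumption):

mysql -uroot -p123456 <<'SQL'
USE test;
-- the columns must line up with the tab-separated fields in dept.txt
CREATE TABLE dept (
  deptno INT PRIMARY KEY,
  dname  VARCHAR(50)
);
SQL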

