Learn Sqoop II: exporting Hadoop data to MySQL, and Sqoop job tasks

1. Sqoop exports Hadoop data to MySQL

Prerequisite: before exporting data from the Hadoop ecosystem to an RDBMS, the target table must already exist in the target database.
There are three modes of export:

1.1 INSERT mode (default): data from the file is inserted into the table using INSERT statements.

  • Inspect the HDFS data to be exported

    We can see that the field separator is SOH, whose octal code is '\001'
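A quick way to confirm the delimiter (a minimal sketch; the path matches the export command below, and cat -A is a GNU coreutils flag that renders SOH as ^A):
hadoop fs -cat /user/hive/warehouse/test.db/user_info/* | head -n 3 | cat -A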
  • Create the target table in the `user` database in MySQL
USE `user`;
CREATE TABLE `user_info` (
  `uid` int(10) NOT NULL,
  `sex` varchar(2) NOT NULL,
  `career` varchar(3) NOT NULL,
  `education` varchar(3) NOT NULL,
  `marriage` varchar(2) NOT NULL,
  `usertype` varchar(3) NOT NULL,
  `birthday` varchar(10) NOT NULL,
  `province` varchar(3) NOT NULL,
  PRIMARY KEY (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • Execute the export command
Be careful: Sqoop uses a comma as the field delimiter by default, so the SOH delimiter must be specified explicitly
bin/sqoop export \
--connect jdbc:mysql://node03:3306/user \
--username root \
--password 123456 \
--table user_info \
--input-fields-terminated-by '\001' \
--export-dir /user/hive/warehouse/test.db/user_info/*
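After the export completes, a quick sanity check in MySQL (a sketch to run in the `user` database):
SELECT COUNT(*) FROM user_info;
SELECT * FROM user_info LIMIT 5;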
  • Related configuration parameters
--input-fields-terminated-by '\t'  
	Specifies the field separator used in the input files
	
--columns 
	Selects columns and controls their ordering. It can be omitted when the column order in the export data files exactly matches the target table; otherwise, list the columns comma-separated in the desired order. Any target-table columns not listed after --columns must either have default values or allow NULLs; otherwise the database will reject the data exported by Sqoop and the Sqoop job will fail (see the combined sketch after this list)
	
--export-dir 
	The HDFS directory to export; this parameter is required when performing an export. It must be combined with either --table or --call: --table names the target table in the database,

--call
	while --call names a stored procedure to invoke for each record instead of a table
--input-null-string / --input-null-non-string
	On import, if the RDBMS table contains NULL values, then after the data lands on HDFS:
			string-typed fields: NULL is converted to the literal string "null"
			numeric-typed fields: NULL remains null
		On export, --input-null-string converts the given string back to an RDBMS NULL for string columns,
						  and --input-null-non-string does the same for non-string columns
		Solution when the table data contains NULL values:
			--input-null-string "\\N" --input-null-non-string "\\N"
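Putting these parameters together, a minimal sketch of an export that lists the columns explicitly and maps Hive's \N null representation back to RDBMS NULLs (the column list simply mirrors the user_info table above; adjust it to the actual file layout):
bin/sqoop export \
--connect jdbc:mysql://node03:3306/user \
--username root \
--password 123456 \
--table user_info \
--columns "uid,sex,career,education,marriage,usertype,birthday,province" \
--input-fields-terminated-by '\001' \
--input-null-string "\\N" \
--input-null-non-string "\\N" \
--export-dir /user/hive/warehouse/test.db/user_info/*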

1.2 UPDATE mode: Sqoop generates UPDATE statements that replace existing records in the database.

bin/sqoop export \
--connect jdbc:mysql://node01:3306/user \
--username root \
--password 123456 \
--table user_info \
--export-dir /user/hive/warehouse/test.db/user_info/* \
--update-key uid \
--update-mode updateonly

Common properties:
   --update-key: the key column to match on, i.e. records are updated according to this field
   
   --update-mode: the update mode 
   	Values: 
   		updateonly: only updates existing records, regardless of new ones
   		allowinsert: updates records that already exist and inserts records that do not
   	
Be careful:
   In updateonly mode only modified records are processed; no operation is performed for new records
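For intuition, the per-record statement Sqoop generates in updateonly mode looks roughly like this (a sketch assuming --update-key uid and the user_info table above):
UPDATE user_info
SET sex = ?, career = ?, education = ?, marriage = ?, usertype = ?, birthday = ?, province = ?
WHERE uid = ?;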

1.3 call mode: Sqoop invokes a stored procedure call for each record. (Note: the command below actually demonstrates the allowinsert variant of update mode, i.e. an upsert; a call-mode sketch follows it.)

 bin/sqoop export \
--connect jdbc:mysql://node01:3306/user \
--username root \
--password 123456 \
--table user_info \
--export-dir /user/hive/warehouse/test.db/user_info/* \
--update-key uid \
--update-mode allowinsert
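By contrast, a minimal call-mode sketch, assuming a stored procedure named insert_user_info (a hypothetical name) exists in the user database and takes one parameter per exported column; --call replaces --table:
bin/sqoop export \
--connect jdbc:mysql://node01:3306/user \
--username root \
--password 123456 \
--call insert_user_info \
--input-fields-terminated-by '\001' \
--export-dir /user/hive/warehouse/test.db/user_info/*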

2. Sqoop job tasks

  • Create a job that imports all the tables of MySQL's test database into Hive
bin/sqoop job --create testjob \
-- import-all-tables \
--connect jdbc:mysql://node03:3306/test \
--hive-overwrite \
--username root \
--password 123456 \
--hive-import \
--hive-database test
  • Verify that the job was created
bin/sqoop job --list
  • Show the job's configuration
bin/sqoop job --show testjob
  • Execute the job
bin/sqoop job --exec testjob
  • Extension: executing a job without entering the password
#Step 1: configure the following in sqoop-site.xml:
<property>
   <name>sqoop.metastore.client.record.password</name>
   <value>true</value>
   <description>If true, allow saved passwords in the metastore.</description>
</property>
#Step 2: put the password file that sqoop will read on HDFS; the file permission must be 400
#Step 3: create the sqoop job with the --password-file parameter; using --password instead will produce a warning
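A minimal end-to-end sketch (the HDFS path /sqoop/pwd/mysql.pwd and the job name testjob2 are illustrative assumptions):
# write the password to HDFS without a trailing newline, then lock down permissions
echo -n "123456" | hadoop fs -put - /sqoop/pwd/mysql.pwd
hadoop fs -chmod 400 /sqoop/pwd/mysql.pwd
# create the job with --password-file; it can then be executed without a password prompt
bin/sqoop job --create testjob2 \
-- import-all-tables \
--connect jdbc:mysql://node03:3306/test \
--username root \
--password-file /sqoop/pwd/mysql.pwd \
--hive-import \
--hive-overwrite \
--hive-database test
bin/sqoop job --exec testjob2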

Tags: MySQL Database hive JDBC

Posted on Fri, 13 Mar 2020 04:01:51 -0400 by eskimowned