Integrating the Flink 1.10 SQL Client with Hive to read real-time data

For anyone who has been looking forward to processing streaming data in pure SQL, Flink 1.10 brings production-ready Hive integration and stronger streaming SQL processing capabilities. Let's give it a try this time~~

 

 

[outline]

1. Environment preparation

2. SQL Client and Hive integration configuration

3. Reading Kafka data with the SQL Client

1. Environment preparation

Relevant software versions:

Linux version: CentOS 6.5

Java version: JDK 1.8

Hive version: hive-2.3.4

Hadoop version: hadoop-2.7.3

Flink version: flink-1.10.0

Scala version: scala-2.11

Kafka version: kafka_2.11-2.3.0

For the installation of Java, Hive, and Hadoop, see my earlier post: Hive source code series (I): hive2.1.1 + hadoop2.7.3 environment setup.

Now let's prepare the Flink, Scala, and Kafka environments.

1.1 Scala installation

Download scala-2.11.12.tgz

tar -zxvf scala-2.11.12.tgz    ## decompress scala
ln -s scala-2.11.12 scala      ## soft link
vim /etc/profile               ## set environment variables
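The exact entries depend on where things are unpacked; as a minimal sketch, the Scala-related lines added to /etc/profile might look like this (the install/soft-link path is an assumption):

export SCALA_HOME=/usr/local/scala    ## assumed soft-link path
export PATH=$PATH:$SCALA_HOME/bin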

 

source /etc/profile ##Take effect

Test:
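For example, you can verify from the shell (assuming the updated PATH has taken effect):

scala -version    ## should report Scala version 2.11.12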

 

1.2 Kafka installation

Download kafka_2.11-2.3.0.tgz

tar -zxvf kafka_2.11-2.3.0.tgz    ## decompress kafka
ln -s kafka_2.11-2.3.0 kafka      ## soft link
vim /etc/profile                  ## set environment variables
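As with Scala, the Kafka entries in /etc/profile might look like this (path assumed), so that $KAFKA_HOME is available for the commands below:

export KAFKA_HOME=/usr/local/kafka    ## assumed soft-link path
export PATH=$PATH:$KAFKA_HOME/bin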

source /etc/profile ##Take effect

 

Start kafka service:

zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties &
kafka-server-start.sh $KAFKA_HOME/config/server.properties &
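Once both are up, a quick sanity check with jps should show the ZooKeeper and Kafka broker processes:

jps
## QuorumPeerMain   -> ZooKeeper
## Kafka            -> Kafka broker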

Create the topic of the test (flinktest):

kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic flinktest
kafka-topics.sh --list --bootstrap-server localhost:9092    ## view the created topic

 

 

Start the producer and consumer tests respectively:

kafka-console-producer.sh --broker-list localhost:9092 --topic flinktest    ## producer
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic flinktest --from-beginning    ## consumer
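For example, a line typed at the producer prompt should show up almost immediately in the consumer terminal (the sample message is illustrative):

> hello flink    ## typed in the producer window
hello flink      ## echoed by the consumer window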

 

 

As shown in the following figure, data is produced without any problems:

With that, the real-time data source is ready for the tests that follow.

 

1.3 Flink installation

Download flink-1.10.0-bin-scala_2.11.tgz

tar -zxvf flink-1.10.0-bin-scala_2.11.tgz    ## decompress flink
ln -s flink-1.10.0 flink                     ## soft link
vim /etc/profile                             ## set environment variables
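Again, the Flink entries in /etc/profile might look like this (path assumed), so that $FLINK_HOME works in the configuration steps below:

export FLINK_HOME=/usr/local/flink    ## assumed soft-link path
export PATH=$PATH:$FLINK_HOME/bin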

 

 

source /etc/profile ##Take effect

 

Configure Flink in standalone mode:

## flink-conf.yaml configuration
vim $FLINK_HOME/conf/flink-conf.yaml    ## configure the IP of the master node
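The essential setting here is the JobManager address; in flink-conf.yaml it is a single key (the hostname below is an assumption based on this setup):

jobmanager.rpc.address: dataming   # hostname/IP of the master node (assumed)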

 

 

## slaves configuration
vim $FLINK_HOME/conf/slaves    ## configure the slave node IP; here write dataming
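The conf/slaves file simply lists one worker host per line; for this single-node setup it would contain just:

dataming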

With that, the standalone-mode Flink setup is complete.

 

2. SQL Client and Hive integration configuration

2.1 Prepare the yaml file

cp $FLINK_HOME/conf/sql-client-defaults.yaml sql-client-hive.yaml
vim $FLINK_HOME/conf/sql-client-hive.yaml
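The key edit is registering a Hive catalog. In Flink 1.10, the catalogs section of sql-client-hive.yaml might look like the following (the catalog name, hive-conf-dir path, and version are assumptions based on this environment):

catalogs:
  - name: myhive
    type: hive
    hive-conf-dir: /usr/local/hive/conf   # directory containing hive-site.xml (assumed path)
    hive-version: 2.3.4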

 

 

2.2 Add dependency packages

This is the step where most problems occur.

Hive-related dependency packages:

$HIVE_HOME/lib/hive-exec-2.3.4.jar
$HIVE_HOME/lib/hive-common-2.3.4.jar
$HIVE_HOME/lib/hive-metastore-2.3.4.jar
$HIVE_HOME/lib/hive-shims-common-2.3.4.jar
$HIVE_HOME/lib/antlr-runtime-3.5.2.jar
$HIVE_HOME/lib/datanucleus-api-jdo-4.2.4.jar
$HIVE_HOME/lib/datanucleus-core-4.1.17.jar
$HIVE_HOME/lib/datanucleus-rdbms-4.1.19.jar
$HIVE_HOME/lib/javax.jdo-3.2.0-m3.jar
$HIVE_HOME/lib/libfb303-0.9.3.jar
$HIVE_HOME/lib/jackson-core-2.6.5.jar

Other packages:

commons-cli-1.3.1.jar
flink-connector-hive_2.11-1.10.0.jar
flink-hadoop-compatibility_2.11-1.10.0.jar
flink-shaded-hadoop2-uber-blink-3.2.4.jar
flink-table-api-java-bridge_2.11-1.10.0.jar
mysql-connector-java-5.1.9.jar

Put all the jars above into the $FLINK_HOME/lib directory.
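For example, the Hive jars can be copied over like this (a sketch; repeat for every jar listed above):

cp $HIVE_HOME/lib/hive-exec-2.3.4.jar $FLINK_HOME/lib/
cp $HIVE_HOME/lib/hive-metastore-2.3.4.jar $FLINK_HOME/lib/
## ...and so on for the remaining jars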

2.3 Start the cluster

start-cluster.sh
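After start-cluster.sh finishes, jps should show the standalone JobManager and TaskManager processes:

jps
## StandaloneSessionClusterEntrypoint   -> JobManager
## TaskManagerRunner                    -> TaskManager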

 

3. Reading Kafka data with the SQL Client

3.1 Start the SQL Client

sql-client.sh embedded -d conf/sql-client-hive.yaml

3.2 Create a table

CREATE TABLE mykafka (name String, age Int) WITH (
  'connector.type' = 'kafka',
  'connector.version' = 'universal',
  'connector.topic' = 'flinktest',
  'connector.properties.zookeeper.connect' = 'localhost:2181',
  'connector.properties.bootstrap.servers' = 'localhost:9092',
  'format.type' = 'csv',
  'update-mode' = 'append'
);

 

At this point, the table created from the Flink SQL Client is also visible in Hive:
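For example, listing the tables should include the new one (assuming the Hive catalog configured in section 2.1 is the current catalog in the SQL Client):

show tables;    -- should list mykafka in the Flink SQL Client, and likewise in the Hive CLI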

 

3.3 Write data

Now write a few records from the Kafka producer side, and you can query them from the Flink side:
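For example, since the table uses the csv format with columns (name, age), the lines typed at the console producer might look like the following, and a simple query from the SQL Client should return them (the sample values are illustrative):

## at the kafka-console-producer prompt:
tom,25
jerry,30

-- in the Flink SQL Client:
select * from mykafka;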

 

 

 

In this way, real-time data can be worked with through the SQL Client in a pure-SQL fashion.

 

The future of SQL Client

 

If you found this useful, please follow me. Thanks for reading~~

  
