Integrating the Flink 1.10 SQL Client with Hive to read real-time data

For anyone looking forward to processing streaming data in pure SQL: Flink 1.10 introduced Hive integration that is ready for production, along with much stronger streaming SQL processing capability. Let's try it out this time~~




1. Environmental preparation

2. SQL Client and hive integrated configuration

3. Reading kafka data with SQL Client

1. Environmental preparation

Relevant software version:

linux version: centos 6.5

Java version: jdk1.8

Hive version: hive-2.3.4

Hadoop version: hadoop-2.7.3

flink: flink-1.10.0



For the installation of Java, Hive and Hadoop, see my earlier post: Hive source code series (I): hive2.1.1 + hadoop2.7.3 environment setup.

Now let's prepare the Flink, Scala and Kafka environments.

1.1 scala installation

Download scala-2.11.12.tgz

tar -zxvf scala-2.11.12.tgz      ## extract scala
ln -s scala-2.11.12 scala        ## create soft link
vim /etc/profile                 ## set environment variables

source /etc/profile              ## take effect



1.2 kafka installation

Download kafka_2.11-2.3.0.tgz

tar -zxvf kafka_2.11-2.3.0.tgz   ## extract kafka
ln -s kafka_2.11-2.3.0 kafka     ## create soft link
vim /etc/profile                 ## set environment variables

source /etc/profile              ## take effect


Start the kafka services:

$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties &
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &

Create the test topic (flinktest):

$KAFKA_HOME/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic flinktest
$KAFKA_HOME/bin/kafka-topics.sh --list --bootstrap-server localhost:9092   ## view the created topic



Start the producer and consumer tests respectively:

$KAFKA_HOME/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic flinktest                       ## producer
$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic flinktest --from-beginning  ## consumer



As shown in the figure below, the data is produced without any problem:

With that, the real-time data source is ready for the tests later on.


1.3 flink installation

Download flink-1.10.0-bin-scala_2.11.tgz

tar -zxvf flink-1.10.0-bin-scala_2.11.tgz   ## extract flink
ln -s flink-1.10.0 flink                    ## create soft link
vim /etc/profile                            ## set environment variables

source /etc/profile                         ## take effect
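For reference, the /etc/profile additions across these three installs might look like the following; the /usr/local install prefix is an assumption, adjust it to wherever the soft links above actually live:

```shell
# Assumed install prefix: /usr/local (adjust to your actual paths)
export SCALA_HOME=/usr/local/scala
export KAFKA_HOME=/usr/local/kafka
export FLINK_HOME=/usr/local/flink
export PATH=$PATH:$SCALA_HOME/bin:$KAFKA_HOME/bin:$FLINK_HOME/bin
```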


Configure Flink in standalone mode:

vim $FLINK_HOME/conf/flink-conf.yaml   ## configure the IP of the master node

vim $FLINK_HOME/conf/slaves            ## configure the slave node IPs (write dataming here)

With that, the single-node (standalone) Flink setup is configured.
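For reference, the flink-conf.yaml change above boils down to pointing the JobManager address at the master node; a minimal sketch, assuming the host is named dataming as in the slaves file:

```yaml
# $FLINK_HOME/conf/flink-conf.yaml (only the line touched here)
jobmanager.rpc.address: dataming   # IP/hostname of the master node
```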


2. SQL Client and hive integrated configuration

2.1 preparation of yaml file

cp $FLINK_HOME/conf/sql-client-defaults.yaml $FLINK_HOME/conf/sql-client-hive.yaml
vim $FLINK_HOME/conf/sql-client-hive.yaml
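The edit registers a Hive catalog in the copied yaml file; a minimal sketch of the catalogs section, where the hive-conf-dir path is an assumption for this setup:

```yaml
catalogs:
  - name: myhive
    type: hive
    hive-conf-dir: /usr/local/hive/conf   # directory containing hive-site.xml (assumed path)
    hive-version: 2.3.4
```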



2.2 add dependency package

This is the step where most problems arise.

Dependent on hive related packages:


Other packages:


Put all of the above jars in the $FLINK_HOME/lib directory.
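As a concrete reference, for Hive 2.3.4 plus the Kafka/csv connector used below, the required jars are typically the following; the exact artifact names and versions are assumptions based on the Flink 1.10 Hive-integration docs, not a verified list for this exact setup:

```shell
# copy into $FLINK_HOME/lib (jar versions assumed to match this setup)
cp flink-connector-hive_2.11-1.10.0.jar $FLINK_HOME/lib/
cp hive-exec-2.3.4.jar $FLINK_HOME/lib/
cp flink-sql-connector-kafka_2.11-1.10.0.jar $FLINK_HOME/lib/
cp flink-csv-1.10.0.jar $FLINK_HOME/lib/     # needed for format.type = 'csv'
```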

2.3 Startup
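Booting the standalone cluster is a one-liner; a minimal sketch (the jps process names are what Flink 1.10 registers in standalone mode):

```shell
$FLINK_HOME/bin/start-cluster.sh   # starts the JobManager and TaskManager
jps   # should list StandaloneSessionClusterEntrypoint and TaskManagerRunner
```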


3. Reading kafka data with SQL Client

3.1 Start the SQL Client

$FLINK_HOME/bin/sql-client.sh embedded -d conf/sql-client-hive.yaml

3.2 create table

CREATE TABLE mykafka (name String, age Int) WITH (
  'connector.type' = 'kafka',
  'connector.version' = 'universal',
  'connector.topic' = 'flinktest',
  'connector.properties.zookeeper.connect' = 'localhost:2181',
  'connector.properties.bootstrap.servers' = 'localhost:9092',
  'format.type' = 'csv',
  'update-mode' = 'append'
);


At this point, the table just created through the Flink SQL Client can also be seen from Hive:


3.3 write data

Now write a few records from the kafka producer side, and they can be queried from the Flink side:
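Concretely, given the csv format and the (name, age) schema defined above, the console-producer input would be comma-separated rows like these (the sample values are made up), and running `SELECT * FROM mykafka;` in the SQL Client then shows them arriving in real time:

```
tom,25
jerry,23
```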




In this way, real-time data can be worked with through the SQL Client, in a pure-SQL fashion.


The future of SQL Client


If you find this useful, please follow me. Thanks for reading~~


Tags: Big Data SQL hive kafka Scala

Posted on Wed, 04 Mar 2020 01:51:22 -0500 by xtheonex