Kafka production operations


1. Why write this operation document?

In big data development, most of the data comes from Kafka, so being skilled with the Kafka commands is essential.

2. What problems does this article solve?

Cover the Kafka consumer commands needed in daily work.

Production use

The requirements document will provide the Kafka directory, the topic, and the Kafka address.

Here is my test environment:

Kafka directory [the path on the server; the Kafka commands below are executed from this directory by default]: /opt/app/kafka

topic : test1

ZooKeeper address [I have hosts configured; enter IPs if you have not]: node01:2181,node02:2181,node03:2181

Generally, these three items are enough for basic Kafka use on the server. The Kafka version and connection information are provided by the upstream team and are not our responsibility.

1. [Consume a sample message] most commonly used

Example: consume a single Kafka message [generally used to grab a sample and learn the format of the Kafka data] for subsequent development

bin/kafka-console-consumer.sh --zookeeper node01:2181,node02:2181,node03:2181 --topic bigdata2301 --from-beginning --max-messages 1

2. [Consume into a file] consume Kafka data from the beginning and write it into the file wsy.log [`> wsy.log` overwrites, `>> wsy.log` appends]

bin/kafka-console-consumer.sh --zookeeper node01:2181,node02:2181,node03:2181 --topic bigdata2301 --from-beginning > wsy.log
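The difference between `>` and `>>` can be checked without Kafka; a quick standalone demo using echo in place of the consumer (wsy.log as above):

```shell
echo "first"  > wsy.log    # '>'  truncates the file: it now contains only "first"
echo "second" > wsy.log    # overwritten again: only "second" remains
echo "third" >> wsy.log    # '>>' appends: the file now holds "second" then "third"
cat wsy.log
```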

3. [Filtered consumption] consume Kafka and filter line by line on specified fields [here: data from day 20210817 at hour 9; SEND_DATE and SEND_TIME are fields in the Kafka messages]

bin/kafka-console-consumer.sh --zookeeper node01:2181,node02:2181,node03:2181 --topic bigdata2301 --from-beginning | grep --line-buffered '"SEND_DATE":20210817' | grep --line-buffered '"SEND_TIME":9' > wsy.log
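The grep pipeline itself can be verified offline by feeding it sample JSON lines instead of a live consumer (the three records below are made-up examples using the SEND_DATE/SEND_TIME fields from above):

```shell
# only the first record matches both filters (day 20210817 AND hour 9)
printf '%s\n' \
  '{"SEND_DATE":20210817,"SEND_TIME":9,"IP":"10.0.0.1"}' \
  '{"SEND_DATE":20210817,"SEND_TIME":10,"IP":"10.0.0.2"}' \
  '{"SEND_DATE":20210816,"SEND_TIME":9,"IP":"10.0.0.3"}' \
  | grep --line-buffered '"SEND_DATE":20210817' \
  | grep --line-buffered '"SEND_TIME":9' > wsy.log
cat wsy.log
```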

Development and use

1. [Re-consume for backfill] While our program runs, we sometimes need to re-consume Kafka data because requirements were added or changed.

Re-consuming Kafka data: for a consumer to consume the full data of a topic from the beginning (with spring-kafka), two conditions must be met:
(1) Use a brand-new "group.id" (one that has never been used by any consumer before);
(2) Set the "auto.offset.reset" parameter to earliest;  // there are three possible values: earliest, latest, none
1. If there is already a committed offset, consumption starts from the committed offset, regardless of whether earliest or latest is set.
2. If there is no committed offset, earliest means consume from the beginning, and latest means consume from the latest data, i.e. newly produced records.
3. none: if every partition of the topic has a committed offset, consumption starts from the committed offsets; if any partition has no committed offset, an exception is thrown.

Just remember this: if there is no committed offset, earliest consumes from the beginning, and latest consumes from the latest data, i.e. newly produced records.
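In spring-kafka, the two conditions above map directly to two consumer properties. A minimal application.properties sketch (the group id my-fresh-group is a placeholder; pick any name that has never been used before):

```properties
# condition (1): a never-before-used group id, so the broker has no committed offsets for it
spring.kafka.consumer.group-id=my-fresh-group
# condition (2): with no committed offset, start from the earliest available record
spring.kafka.consumer.auto-offset-reset=earliest
```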

Test use

Generally you can test this yourself; just change the topic and the Kafka address.

1. Start Kafka [start ZooKeeper first if needed]

bin/kafka-server-start.sh config/server.properties >>/dev/null 2>&1 &
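The `>>/dev/null 2>&1 &` part discards stdout, sends stderr to the same place, and backgrounds the process. A standalone sketch of the same redirections using plain echo (no Kafka required; both.log is a throwaway filename):

```shell
rm -f both.log     # start clean so the append below is reproducible
# '2>&1' redirects stderr (2) to wherever stdout (1) points, so both streams land in both.log
(echo "to stdout"; echo "to stderr" >&2) >> both.log 2>&1
cat both.log
```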

2. Create topic

bin/kafka-topics.sh  --create --topic test1 --partitions 3 --replication-factor 2 --zookeeper node01:2181,node02:2181,node03:2181

3. List all topics

bin/kafka-topics.sh  --list --zookeeper node01:2181,node02:2181,node03:2181

4. View a topic [shows its partition information]

bin/kafka-topics.sh  --describe --topic test1 --zookeeper node01:2181,node02:2181,node03:2181

5. Delete topic

bin/kafka-topics.sh  --delete --topic test1 --zookeeper node01:2181,node02:2181,node03:2181

6. Producer

bin/kafka-console-producer.sh --topic test1 --broker-list  node01:9092,node02:9092,node03:9092

7. Consumer

bin/kafka-console-consumer.sh --topic test1 --bootstrap-server node01:9092,node02:9092,node03:9092 --from-beginning

Appendix I: zookeeper startup script

You need to configure passwordless SSH login between the cluster nodes first.


echo "start zkServer"
for i in 01 02 03
ssh node$i "source /etc/profile;zkServer.sh start"

Execution process

[root@node01 bin]# ./zkstart.sh 
start zkServer
JMX enabled by default
Using config: /opt/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
JMX enabled by default
Using config: /opt/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
JMX enabled by default
Using config: /opt/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

Appendix II: kafka startup script

Change the Kafka deployment directory and the host names (or IPs) before using it.


#!/bin/bash
for host in node01 node02 node03
do
        ssh $host "source /etc/profile;/opt/app/kafka/bin/kafka-server-start.sh /opt/app/kafka/config/server.properties >/dev/null 2>&1 &"
        echo "$host kafka is running"
done

Execution process

[root@node01 bin]# ./kafka-start.sh 
node01 kafka is running
node02 kafka is running
node03 kafka is running

Posted on Wed, 03 Nov 2021 18:05:30 -0400 by Senate