Preface
1. Why write this operation document?
In big data development, most of the data comes from Kafka, so being proficient with Kafka commands is essential.
2. What problems does this article solve?
It covers the Kafka consumption commands needed in day-to-day work.
Production use
The requirements document will provide the Kafka directory, topic, and Kafka address.
Here is my test environment:
Kafka directory [on the server; by default the Kafka commands below are executed from this directory]: /opt/app/kafka
topic: test1
Kafka address [I configured hosts; enter raw IPs if hosts are not configured]: node01:2181,node02:2181,node03:2181
In general, these three pieces of information cover basic Kafka usage on a server. The Kafka version and connection details are provided by the upstream team and are not our responsibility.
1. [Consume one sample message] most commonly used
Example: consume a single Kafka message [generally used to grab a sample and learn the format of the Kafka data] for subsequent development.
bin/kafka-console-consumer.sh --zookeeper 192.168.88.100:2181,192.168.88.101:2181,192.168.88.102:2181 --topic bigdata2301 --from-beginning --max-messages 1
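Note: the --zookeeper option belongs to the old consumer and was removed from the console consumer in newer Kafka releases; on those versions the equivalent command (assuming the default broker port 9092) is:
bin/kafka-console-consumer.sh --bootstrap-server 192.168.88.100:9092,192.168.88.101:9092,192.168.88.102:9092 --topic bigdata2301 --from-beginning --max-messages 1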
2. [Consume into a file] consume Kafka data from the beginning and write it to the file wsy.log ['> wsy.log' overwrites the file, '>> wsy.log' appends to it]
bin/kafka-console-consumer.sh --zookeeper 192.168.88.100:2181,192.168.88.101:2181,192.168.88.102:2181 --topic bigdata2301 --from-beginning > wsy.log
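To check progress while the consumer is writing, plain shell tools are enough (nothing Kafka-specific here):
wc -l wsy.log    # count the messages written so far
tail -f wsy.log  # follow new messages as they arrive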
3. [Filtered consumption] consume Kafka and filter line by line on specified fields [here: keep the data for day 20210817 at hour 9; SEND_DATE and SEND_TIME are fields in the Kafka messages] [again using IPs as the example]
bin/kafka-console-consumer.sh --zookeeper 192.168.88.100:2181,192.168.88.101:2181,192.168.88.102:2181 --topic bigdata2301 --from-beginning | grep --line-buffered '"SEND_DATE":20210817' | grep --line-buffered '"SEND_TIME":9' > wsy.log
Development use
1. [Re-consumption for backfill] While our program is running, Kafka data sometimes needs to be re-consumed because features are added or changed.
Re-consuming Kafka data: for a consumer to consume a topic's full data from the beginning (with spring-kafka), two conditions must be met: (1) use a brand-new "group.id" (one that has never been used by any consumer before); (2) set "auto.offset.reset" to earliest. // the three possible values are earliest, latest, and none
Conclusion: 1. If there is already a committed offset, consumption starts from that committed offset regardless of whether earliest or latest is set. 2. If there is no committed offset, earliest means consuming from the beginning, while latest means consuming only the latest data, i.e., newly produced messages. 3. With none, if every partition of the topic has a committed offset, consumption starts from those committed offsets; if any partition lacks a committed offset, an exception is thrown.
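The two conditions can also be demonstrated with the console consumer (a minimal sketch, assuming a Kafka version that supports --bootstrap-server, --group, and --consumer-property; the group name fresh-group-001 is a made-up example):
# (1) a brand-new group.id; (2) auto.offset.reset=earliest
bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic test1 --group fresh-group-001 --consumer-property auto.offset.reset=earliest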
Test use
In general you can test this yourself; just change the topic and the Kafka address.
1. Start Kafka [ZooKeeper needs to be started first]
bin/kafka-server-start.sh config/server.properties >>/dev/null 2>&1 &
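If you use the ZooKeeper bundled with the Kafka distribution, it can be started the same way (a sketch; adjust the config path to your deployment):
bin/zookeeper-server-start.sh config/zookeeper.properties >>/dev/null 2>&1 &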
2. Create topic
bin/kafka-topics.sh --create --topic test1 --partitions 3 --replication-factor 2 --zookeeper node01:2181,node02:2181,node03:2181
3. List all topics
bin/kafka-topics.sh --list --zookeeper node01:2181,node02:2181,node03:2181
4. Describe a topic [shows partition and replica information]
bin/kafka-topics.sh --describe --topic test1 --zookeeper node01:2181,node02:2181,node03:2181
5. Delete topic
bin/kafka-topics.sh --delete --topic test1 --zookeeper node01:2181,node02:2181,node03:2181
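Note: deletion only takes effect if the broker allows it; if the topic is merely marked for deletion, check this setting in config/server.properties (the default varies by Kafka version):
delete.topic.enable=true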
6. Producer
bin/kafka-console-producer.sh --topic test1 --broker-list node01:9092,node02:9092,node03:9092
7. Consumer
bin/kafka-console-consumer.sh --topic test1 --bootstrap-server node01:9092,node02:9092,node03:9092 --from-beginning
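A quick round trip to verify the setup: type messages in the producer terminal and watch them appear in the consumer terminal (illustrative input and output):
# producer terminal
>hello kafka
>test message
# consumer terminal
hello kafka
test message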
Appendix I: zookeeper startup script
You need to configure passwordless SSH login between the cluster nodes first.
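A minimal sketch of that setup (run on node01; assumes the node names node01..node03 used above):
ssh-keygen -t rsa                                # accept the defaults at each prompt
for i in 01 02 03; do ssh-copy-id node$i; done   # copy the public key to every node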
zkstart.sh
#!/bin/bash
echo "start zkServer"
for i in 01 02 03
do
  ssh node$i "source /etc/profile;zkServer.sh start"
done
Execution output
[root@node01 bin]# ./zkstart.sh
start zkServer
JMX enabled by default
Using config: /opt/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
JMX enabled by default
Using config: /opt/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
JMX enabled by default
Using config: /opt/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Appendix II: kafka startup script
Change Kafka's deployment directory and the IP addresses (or hostnames) before using it.
kafka-start.sh
#!/bin/sh
for host in node01 node02 node03
do
  ssh $host "source /etc/profile;/opt/app/kafka/bin/kafka-server-start.sh /opt/app/kafka/config/server.properties >/dev/null 2>&1 &"
  echo "$host kafka is running"
done
Execution output
[root@node01 bin]# ./kafka-start.sh
node01 kafka is running
node02 kafka is running
node03 kafka is running