Detailed explanation of Kafka core concepts
Preface
The previous article covered topics and Kafka core concepts, including cluster partitioning, the persistent message storage mechanism, and the leader election mechanism. This article continues with topics and the message storage mechanism, then explains the Producer and Consumer in detail, including transaction mode.
Message storage mechanism
Message offset (sequence number)

Each message in a partition is assigned a sequential offset. Every consumer records the offset it has reached while consuming. Individual messages do not need to be acknowledged, because the offset alone tells how far the consumer has read.
TimeIndex (time index)
- Each message carries a timestamp
- The time index file .timeindex stores entries of the form { timestamp, offset }

The timestamp can be configured as either the message creation time or the log append (write) time; the default is the message creation time.

On the consumer side, the offset corresponding to a timestamp can be looked up with KafkaConsumer#offsetsForTimes. For each queried partition it returns the partition together with the earliest offset whose timestamp is greater than or equal to the given time, and consumption can then start from that offset; a sketch follows.
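A minimal sketch of this lookup, assuming an already configured KafkaConsumer named consumer and a topic named "test" with partition 0 (both illustrative):

TopicPartition tp = new TopicPartition("test", 0);
long oneHourAgo = System.currentTimeMillis() - 3600 * 1000L;
// For each queried partition, the broker returns the earliest offset whose timestamp is >= the given time
Map<TopicPartition, OffsetAndTimestamp> result =
        consumer.offsetsForTimes(Collections.singletonMap(tp, oneHourAgo));
OffsetAndTimestamp ot = result.get(tp);
if (ot != null) {
    System.out.println("partition " + tp.partition() + " can be consumed from offset " + ot.offset());
}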
Configuration parameters of Topic
Official parameter reference: the Apache Kafka documentation
Parameter modification
> bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1 --replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1
You can also change or set override values later using the configs command. This example updates the max message size for my-topic:
> bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --add-config max.message.bytes=128000
To check the overrides set on the topic:
> bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --describe
To remove an override:
> bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --delete-config max.message.bytes
Delete Topic
[root@node4 latest]# bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test-1
# Send a message
[root@node4 latest]# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test-1
[root@node4 latest]# bin/kafka-topics.sh --delete --bootstrap-server localhost:9092 --topic test-1
After deletion, the partition directory on disk is first renamed with a -delete suffix and then removed asynchronously:
drwxr-xr-x. 2 root root 141 Aug 26 20:49 test-1-0.20bb2388c6f04c34b79419411bc1bda6-delete
Producer
public static void main(String[] args) throws Exception {
    // Description of the Producer configuration parameters: http://kafka.apachecn.org/documentation.html#configuration
    Properties props = new Properties();
    props.put("bootstrap.servers", "192.168.120.41:9092");
    // The client ID lets the broker better distinguish clients. It is not required.
    props.put(CommonClientConfigs.CLIENT_ID_CONFIG, "client-1");
    // How the broker acknowledges published messages
    props.put("acks", "all");
    // How many times to retry when sending fails
    props.put("retries", 0);
    // The producer sends in batches to improve throughput. Here the batch size is specified in bytes.
    props.put("batch.size", 16384);
    // How long to wait after a record arrives so that other records can be batched with it; a complement to batch.size.
    props.put("linger.ms", 1);
    // The size of the buffer in which records are kept before sending
    props.put("buffer.memory", 33554432);
    // Serializer for the key
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    // Serializer for the message value
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    try (Producer<String, String> producer = new KafkaProducer<>(props)) {
        for (int i = 0; i < 100; i++) {
            String message = "message-" + i;
            // The producer sends messages asynchronously in batches; send() returns immediately.
            Future<RecordMetadata> resultFuture =
                    producer.send(new ProducerRecord<String, String>("test", Integer.toString(i), message));
            // To block synchronously, wait for the result
            RecordMetadata rm = resultFuture.get();
            System.out.println("sent: " + message + " hasOffset: " + rm.hasOffset()
                    + " partition: " + rm.partition() + " offset: " + rm.offset());
            TimeUnit.SECONDS.sleep(1L);
        }
    }
}
Producer publish acknowledgement mechanism
// How the broker acknowledges published messages
props.put("acks", "all");
The meaning of this configuration parameter is documented in ProducerConfig:
- 0: the producer does not wait for any acknowledgement from the broker
- 1: the leader sends an ack after it has written the message
- all: the leader waits until the full set of in-sync replicas have acknowledged the record before sending an ack. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive, and is the strongest available guarantee. It is equivalent to acks=-1.
- -1: equivalent to all
Handling the acknowledgement asynchronously with a Callback
The acknowledgement result can be obtained asynchronously by passing a Callback to send().

Note that RecordMetadata does not contain the message payload itself, only metadata such as topic, partition, offset and timestamp. To know which message an acknowledgement belongs to, the callback has to be correlated with the record that was sent. The callback also receives the exception, if any, such as a timeout or a broker-side error; a sketch follows.
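A sketch (reusing the producer and the "test" topic from the example above) of passing a Callback to send(); capturing the record in the lambda is one way to correlate the acknowledgement with the message:

ProducerRecord<String, String> record = new ProducerRecord<>("test", "key-1", "message-1");
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        // e.g. a TimeoutException or a broker-side error surfaces here
        System.err.println("send failed for " + record.value() + ": " + exception);
    } else {
        // RecordMetadata carries only topic/partition/offset/timestamp, not the payload,
        // so the captured record is what tells us which message was acknowledged
        System.out.println("acked " + record.value() + " at " + metadata.topic()
                + "-" + metadata.partition() + "@" + metadata.offset());
    }
});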
Using it in Spring
In Spring, messages are sent through KafkaTemplate. Acknowledgements can also be observed by registering a ProducerListener:
/**
 * Set a {@link ProducerListener} which will be invoked when Kafka acknowledges
 * a send operation. By default a {@link LoggingProducerListener} is configured
 * which logs errors only.
 * @param producerListener the listener; may be {@code null}.
 */
public void setProducerListener(@Nullable ProducerListener<K, V> producerListener) {
    this.producerListener = producerListener;
}
The listener is called with the corresponding ProducerRecord, so the acknowledgement can be matched to the message that was sent, as in the sketch below.
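A sketch of wiring a ProducerListener, assuming a KafkaTemplate bean named kafkaTemplate and a spring-kafka version whose listener callbacks receive the record and metadata (the exact signatures differ slightly between versions):

kafkaTemplate.setProducerListener(new ProducerListener<String, String>() {
    @Override
    public void onSuccess(ProducerRecord<String, String> record, RecordMetadata metadata) {
        // The original record is passed back, so the ack can be matched to the message
        System.out.println("acked " + record.value() + " at offset " + metadata.offset());
    }
    @Override
    public void onError(ProducerRecord<String, String> record, RecordMetadata metadata, Exception e) {
        System.err.println("failed to send " + record.value() + ": " + e.getMessage());
    }
});
kafkaTemplate.send("test", "key-1", "message-1");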
Producer interceptor (ProducerInterceptor)

Interceptors have no priority ordering; they are invoked in the order in which they were added.
Usage
Specify the implementation class names through interceptor.classes:
// Configure ProducerInterceptor implementations
props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,
        "com.study.xxx.MyProducerInterceptor1,com.study.xxx.MyProducerInterceptor2");
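A sketch of what such an interceptor might look like (the class name matches the illustrative config above):

public class MyProducerInterceptor1 implements ProducerInterceptor<String, String> {

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // Invoked before the record is serialized and partitioned; here the value is simply prefixed
        return new ProducerRecord<>(record.topic(), record.partition(), record.key(),
                "intercepted-" + record.value());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // Invoked when the broker acknowledges the record, or when the send fails
        if (exception != null) {
            System.err.println("send failed: " + exception.getMessage());
        }
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}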
Idempotent mode
Idempotence ensures that producer retries do not write duplicate messages to a partition, i.e. each message is persisted exactly once.
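A minimal sketch of enabling it on the producer (idempotence requires acks=all and retries enabled):

// Enable idempotent delivery: the broker de-duplicates retried sends using a producer id and sequence numbers
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);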
Transaction mode
- To use the transactional mode and its API, the transactional.id property must be set. Once transactional.id is configured, idempotence is enabled automatically, along with the producer configs that idempotence depends on. The transactional.id is used to enable transaction recovery across multiple sessions of a single producer instance; it should be unique for each producer instance running in a partitioned application.
- The Kafka cluster needs at least 3 nodes (the transaction state log is replicated 3 ways by default)
- In order to achieve transaction assurance from end to end, the consumer must also be configured to read only committed messages.

On the consumer, configure the isolation level, e.g. isolation.level=read_committed. A sketch of the transactional producer API follows.
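A minimal sketch of the transactional producer API, reusing the props from the producer example (topic, keys and transactional id are illustrative):

props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "tx-producer-1");  // also enables idempotence
Producer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();                       // registers the transactional.id with the coordinator
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("test", "k1", "v1"));
    producer.send(new ProducerRecord<>("test", "k2", "v2"));
    producer.commitTransaction();                  // both records become visible atomically
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    producer.close();                              // fatal errors: this producer instance must be closed
} catch (KafkaException e) {
    producer.abortTransaction();                   // for other errors, abort and retry the transaction
}
// On the consumer side, read only committed data:
// consumerProps.put("isolation.level", "read_committed");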
Consumer
In Kafka, consumers pull messages with poll(); messages remain in the log until deleted by the retention policy, whether or not they have been consumed.
public static void main(String[] args) {
    // For consumer parameters see: http://kafka.apachecn.org/documentation.html#newconsumerconfigs
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put(CommonClientConfigs.CLIENT_ID_CONFIG, "client-1");
    // Set the consumer group
    props.put("group.id", "test");
    // Enable automatic offset commit.
    // If set to true, the consumer periodically saves the offset of the current position; after a failure
    // and restart, consumption resumes from that saved offset.
    props.put("enable.auto.commit", "true");
    // The interval between automatic offset commits
    props.put("auto.commit.interval.ms", "1000");
    // Deserializer for the key
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    // Deserializer for the message value
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    String topic = "test";
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        // Subscribe to topics
        consumer.subscribe(Arrays.asList(topic));
        while (true) {
            // Kafka uses a pull model. The time parameter of poll tells Kafka how long to wait
            // before responding if there is currently no data.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1L));
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset = %d, key = %s, value = %s%n",
                        record.offset(), record.key(), record.value());
            // Set where to start pulling messages (if necessary)
            // consumer.seek(new TopicPartition(topic, 0), 0);
        }
    }
}
KafkaConsumer is not thread-safe; with multiple threads, each thread should have its own consumer instance.
Consumer group
Why does the consumer group concept exist? To balance the consumption load across multiple consumer instances.
- Consumers identify themselves with a consumer group name. Each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be spread across multiple processes or machines.
- If all consumer instances are in the same consumer group, the message record will be load balanced to each consumer instance
- If all consumer instances are in different consumer groups, each message record will be broadcast to all consumer processes
- Generally, each topic will have some consumption groups. One consumption group corresponds to a "logical subscriber". A consumer group consists of many consumer instances, which is easy to expand and fault-tolerant. This is the concept of publish and subscribe, except that subscribers are a group of consumers rather than a single process.
- The way consumption is implemented in Kafka is to divide the partitions of the log among the consumer instances, so that each partition has exactly one consumer within the group at any time. Membership in the group is handled dynamically by the Kafka protocol: if new instances join the group, they take over some partitions from other members; if an instance dies, its partitions are distributed to the remaining instances.
- Kafka only guarantees that records within a partition are ordered; it does not guarantee ordering across different partitions of a topic. If all messages must be consumed in order, use a topic with only one partition, which also means at most one consumer process per consumer group.
group rebalance
Group rebalancing; the broker-side parameter group.initial.rebalance.delay.ms controls how long the group coordinator delays the first rebalance after a consumer group is created.
Events that trigger a rebalance:
- Changes in the number of consumers in the group
- Increase or decrease in the number of partitions
Either of these causes the group to rebalance.
How to specify the consumer group id in Spring
It uses the group id configured for the application, for example the spring.kafka.consumer.group-id property in Spring Boot or the groupId attribute of @KafkaListener.
How does Kafka know that a consumer is offline and a rebalance is needed?
It relies on heartbeats and the session timeout.
Heartbeat and session
The relevant parameters are heartbeat.interval.ms, session.timeout.ms and max.poll.interval.ms.
Each has a default value. The consumer sends heartbeats in the background; if no heartbeat arrives within session.timeout.ms, or poll() is not called within max.poll.interval.ms, the consumer is considered dead and its partitions are reassigned.
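A sketch of the related consumer settings (the values shown are typical defaults and vary between Kafka versions):

props.put("heartbeat.interval.ms", "3000");    // how often the consumer sends heartbeats to the coordinator
props.put("session.timeout.ms", "10000");      // no heartbeat within this window -> consumer considered dead
props.put("max.poll.interval.ms", "300000");   // maximum gap between poll() calls before the consumer is evicted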
Consumption offset
What should happen when there is no initial committed offset, or the current offset is out of range? This is controlled by auto.offset.reset:
- smallest: automatically reset the offset to the smallest (earliest) offset
- largest: automatically reset the offset to the largest (latest) offset
- anything else: throw an exception to the consumer
(In the newer consumer API the corresponding values are earliest, latest and none.)
// Set where a newly created consumer starts consuming messages
props.put("auto.offset.reset", "smallest");
Automatic commit vs. manual commit
Automatic commit
// Enable automatic offset commit.
// If set to true, the consumer periodically saves the offset of the current position; after a failure
// and restart, consumption resumes from that saved offset.
props.put("enable.auto.commit", "true");
// The interval between automatic offset commits
props.put("auto.commit.interval.ms", "1000");
Automatic commit can cause two problems:
- Repeated consumption: the consumer crashes after processing messages but before the offset is committed, so the same messages are pulled again after a restart
- Message loss: the offset is committed before the messages have actually been processed, so a crash loses them
These problems are inherent to automatic commit; duplicates can only be detected and handled by the application when it pulls again.
Manual commit
In code:
// Disable automatic commit
props.put("enable.auto.commit", "false");
...
// Manually commit the consumer offset synchronously
consumer.commitSync();
If the commit fails, commitSync() throws an exception, which can be caught and handled here.
Commit methods of the Consumer

The consumer can also commit offsets per partition, for finer-grained control; a sketch follows.
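A sketch of committing offsets partition by partition after processing, assuming the consumer and a records batch from poll() as in the earlier example (process() is a hypothetical handler):

for (TopicPartition partition : records.partitions()) {
    List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
    for (ConsumerRecord<String, String> record : partitionRecords) {
        process(record);  // hypothetical per-record processing
    }
    long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
    // The committed offset should be the offset of the next message to read, hence +1
    consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
}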

Control the location of consumption
- A consumer program stops at some point in time and, after a restart, needs to continue consuming from that time: the consumer has to be positioned at the offset corresponding to that time.
- A consumer has stored the consumed messages locally, but the local store is broken and the data needs to be fetched from Kafka again.
String topic = "foo";
TopicPartition partition0 = new TopicPartition(topic, 0);
TopicPartition partition1 = new TopicPartition(topic, 1);
consumer.assign(Arrays.asList(partition0, partition1));
To read data starting from a given time, look up the offsets for that timestamp and seek to them; a sketch follows.
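A sketch (continuing from the assign() call above) of seeking both partitions to the offsets of a given timestamp:

long startTime = System.currentTimeMillis() - 24 * 3600 * 1000L;  // e.g. 24 hours ago
Map<TopicPartition, Long> query = new HashMap<>();
query.put(partition0, startTime);
query.put(partition1, startTime);
// For each partition, Kafka returns the earliest offset whose timestamp is >= the requested time
Map<TopicPartition, OffsetAndTimestamp> found = consumer.offsetsForTimes(query);
found.forEach((tp, offsetAndTimestamp) -> {
    if (offsetAndTimestamp != null) {
        consumer.seek(tp, offsetAndTimestamp.offset());  // subsequent poll() calls start from this offset
    }
});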
subscribe vs assign
The difference between subscribe and assign in use:
- subscribe: the partitions consumed are assigned dynamically by Kafka based on the consumer group, and can be rebalanced dynamically
- assign: the user specifies which partitions to consume; this is not affected by rebalancing
// Not subscribing to a topic here
// consumer.subscribe(Arrays.asList("test", "test-group"));
// Instead, directly assign specific partitions to the consumer. subscribe and assign cannot be used
// together on the same consumer; an assignment is not affected by rebalancing.
TopicPartition partition = new TopicPartition("test", 0);
consumer.assign(Arrays.asList(partition));
Rebalance events can be observed by passing a ConsumerRebalanceListener to subscribe(), as in the sketch below.
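A sketch of observing rebalances via a ConsumerRebalanceListener (the topic name is illustrative):

consumer.subscribe(Arrays.asList("test"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Called before partitions are taken away, e.g. commit offsets here
        System.out.println("revoked: " + partitions);
    }
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called after new partitions have been assigned, e.g. seek to stored positions here
        System.out.println("assigned: " + partitions);
    }
});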
poll settings
How much data should a single pull return: everything available or only part of it? This can be controlled with the following settings.
- fetch.min.bytes: the minimum amount of data the server should return for a fetch request. If not enough data is available, the request waits for that much data to accumulate before responding. The default is 1 byte, meaning a fetch is answered as soon as a single byte of data is available, or when the fetch request times out waiting for data. Setting this higher makes the server wait for larger amounts of data to accumulate, which can improve throughput a bit at the cost of some additional latency.
- max.partition.fetch.bytes: the maximum amount of data the server returns per partition. Records are fetched in batches; if the first record batch in the first non-empty partition is larger than this limit, the batch is still returned so that the consumer can make progress. The maximum record batch size accepted by the broker is defined by message.max.bytes (broker config) or max.message.bytes (topic config). See fetch.max.bytes for limiting the total request size.
- fetch.max.bytes: the maximum amount of data the server should return for a fetch request. Records are fetched in batches; if the first record batch in the first non-empty partition is larger than this value, it is still returned so that the consumer can make progress, so this is not an absolute maximum. The maximum record batch size accepted by the broker is defined by message.max.bytes (broker config) or max.message.bytes (topic config). Note that the consumer performs multiple fetches in parallel.
- max.poll.records: the maximum number of records returned by a single call to poll().
The full parameter reference is in the official documentation.
These limits keep a single fetch from being dominated by one partition and keep each poll() small enough to be processed within max.poll.interval.ms, which helps avoid rebalances and the repeated consumption they cause.
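A sketch of setting these on the consumer (the values shown are the usual defaults and are illustrative, not recommendations):

// How much data a fetch and a poll may return
props.put("fetch.min.bytes", "1");                  // respond as soon as any data is available
props.put("max.partition.fetch.bytes", "1048576");  // at most ~1 MB per partition per fetch
props.put("fetch.max.bytes", "52428800");           // at most ~50 MB per fetch request
props.put("max.poll.records", "500");               // at most 500 records per poll()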
Flow Control of consumption
- In streaming processing, the program needs to perform a join operation on the stream data in two topics, and the production speed of one topic is faster than that of the other. At this time, it is necessary to reduce the consumption speed of the fast topic to match the slow topic.
- Another scenario: when starting the consumer, there are already a large number of messages piled up in these specified topics. The program needs to give priority to the topics containing the latest data, and then deal with the topics of old data.
- pause(Collection<TopicPartition> partitions) pauses fetching from the given partitions
- resume(Collection<TopicPartition> partitions) resumes fetching from the given partitions
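A sketch (topic names are illustrative) of pausing the faster topic while the slower or backlogged one catches up:

TopicPartition fast = new TopicPartition("fast-topic", 0);
TopicPartition slow = new TopicPartition("slow-topic", 0);
consumer.assign(Arrays.asList(fast, slow));

consumer.pause(Collections.singleton(fast));    // temporarily stop fetching from the fast topic
ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
// ... process the slow/backlogged topic first ...
consumer.resume(Collections.singleton(fast));   // then fetch from the fast topic again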
Transactions
API in Spring
- @KafkaListener
- @TopicPartition
- @PartitionOffset
- KafkaListenerErrorHandler
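A hedged sketch (topic, group and handler names are illustrative) of how these spring-kafka annotations fit together:

@Component
public class MyKafkaListener {

    // Listen to a specific partition of a topic, starting from a given offset
    @KafkaListener(id = "my-listener", groupId = "test-group",
            topicPartitions = @TopicPartition(topic = "test",
                    partitionOffsets = @PartitionOffset(partition = "0", initialOffset = "0")))
    public void listen(ConsumerRecord<String, String> record) {
        System.out.println("received: " + record.value());
    }

    // A KafkaListenerErrorHandler bean can be referenced by name to handle listener exceptions:
    // @KafkaListener(topics = "test", errorHandler = "myErrorHandler")
    // public void listenWithErrorHandler(String payload) { ... }
}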