This article covers the basic usage of, and caveats around, the Producer API and Consumer API in the Go client sarama.
1. General
For the Kafka-related code, see GitHub.
Kafka has five core APIs:
- Producer API
- Consumer API
- Stream API
- Connect API
- Admin API
At present, the Go sarama client implements only the Producer, Consumer, and Admin APIs.
The maintainers have stated that the Streams API will not be supported; the status of the Connect API is unclear.
2. Producer API
Producers in Kafka come in two kinds: synchronous and asynchronous.
As the names imply, a synchronous producer sends each message to Kafka immediately, while an asynchronous producer, to improve performance, buffers messages and sends them to Kafka in one go once a batch has accumulated or a configured interval has elapsed.
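For reference, here is a minimal sketch of how those batch thresholds map onto sarama's producer Flush settings. The import path, package name, and the concrete values are assumptions for illustration, not taken from the original code:

```go
package producer

import (
	"time"

	"github.com/Shopify/sarama" // newer releases are published as github.com/IBM/sarama
)

// newBatchingConfig returns a producer config with explicit batch thresholds.
// The values are illustrative, not recommendations.
func newBatchingConfig() *sarama.Config {
	config := sarama.NewConfig()
	config.Producer.Flush.Messages = 100                     // flush once this many messages are buffered...
	config.Producer.Flush.Bytes = 64 * 1024                  // ...or this many bytes are buffered...
	config.Producer.Flush.Frequency = 500 * time.Millisecond // ...or this much time has passed
	return config
}
```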
Async Producer
A demo of an asynchronous producer in sarama:
```go
// count is a package-level counter shared across goroutines.
var count int64

func Producer(topic string, limit int) {
	config := sarama.NewConfig()
	// For an async producer it is usually not recommended to enable both Errors and Successes;
	// enabling Errors alone is generally enough.
	// A sync producer must enable both, because it returns the result of every send.
	config.Producer.Return.Errors = false   // whether to return error information
	config.Producer.Return.Successes = true // whether to return success information

	producer, err := sarama.NewAsyncProducer([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewAsyncProducer err:", err)
	}
	defer producer.AsyncClose()

	go func() {
		// [Important] After sending, the async producer must read from Errors or Successes.
		// Otherwise sarama's internal processing blocks and only one message ever gets sent.
		for {
			select {
			case s := <-producer.Successes():
				log.Printf("[Producer] key:%v msg:%+v \n", s.Key, s.Value)
			case e := <-producer.Errors():
				if e != nil {
					log.Printf("[Producer] err:%v msg:%+v \n", e.Err, e.Msg)
				}
			}
		}
	}()

	// Send asynchronously.
	for i := 0; i < limit; i++ {
		str := strconv.Itoa(int(time.Now().UnixNano()))
		msg := &sarama.ProducerMessage{Topic: topic, Key: nil, Value: sarama.StringEncoder(str)}
		// An async send returns as soon as the message is written to memory; it has not actually been sent yet.
		// sarama uses a channel to receive messages; a background goroutine takes them off the channel and sends them.
		producer.Input() <- msg
		atomic.AddInt64(&count, 1)
		if atomic.LoadInt64(&count)%1000 == 0 {
			log.Printf("Number of messages sent: %v\n", count)
		}
	}
	log.Printf("Done, total messages sent: %v\n", limit)
}
```
Note:
An asynchronous producer returns as soon as the message is pushed onto a channel; likewise, the concrete result, Success or Error, comes back asynchronously through channels.
The results must be read from Errors or Successes, otherwise producer.Input() will eventually block.
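One way to meet this requirement and still shut down cleanly is to drain both channels until they are closed: AsyncClose flushes whatever is still buffered and then closes Successes and Errors. A minimal sketch, assuming a producer created with Return.Successes and Return.Errors both enabled; the sendAll helper and the messages slice are hypothetical names:

```go
func sendAll(producer sarama.AsyncProducer, messages []*sarama.ProducerMessage) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // drain successes
		defer wg.Done()
		for s := range producer.Successes() {
			log.Printf("sent partition:%d offset:%d", s.Partition, s.Offset)
		}
	}()
	go func() { // drain errors
		defer wg.Done()
		for e := range producer.Errors() {
			log.Println("produce error:", e.Err)
		}
	}()
	for _, msg := range messages {
		producer.Input() <- msg
	}
	// AsyncClose flushes anything still buffered, then closes Successes and Errors,
	// which lets both drain goroutines exit.
	producer.AsyncClose()
	wg.Wait()
}
```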
Sync Producer
The synchronous producer is simpler:
```go
func Producer(topic string, limit int) {
	config := sarama.NewConfig()
	// A sync producer must enable both Return.Successes and Return.Errors,
	// because it has to report the result of every send back to the caller.
	config.Producer.Return.Successes = true
	config.Producer.Return.Errors = true // defaults to true; set explicitly here for clarity

	// The sync producer reuses the async producer's logic: Successes and Errors still come back
	// through channels, but the sync producer wraps them and hands the result to the caller.
	// See newSyncProducerFromAsyncProducer (around line 72 of sync_producer.go): it starts two
	// goroutines to drain the Successes and Errors channels. In other words, the sync producer
	// is a wrapped async producer:
	//
	//	type syncProducer struct {
	//		producer *asyncProducer
	//		wg       sync.WaitGroup
	//	}
	producer, err := sarama.NewSyncProducer([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewSyncProducer err:", err)
	}
	defer producer.Close()

	for i := 0; i < limit; i++ {
		str := strconv.Itoa(int(time.Now().UnixNano()))
		msg := &sarama.ProducerMessage{Topic: topic, Key: nil, Value: sarama.StringEncoder(str)}
		// SendMessage wraps the async send path, effectively turning it into a synchronous call.
		partition, offset, err := producer.SendMessage(msg)
		if err != nil {
			log.Println("SendMessage err: ", err)
			return
		}
		log.Printf("[Producer] partitionid: %d; offset:%d, value: %s\n", partition, offset, str)
	}
}
```
Note:
Both Return.Successes and Return.Errors must be enabled
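As far as I can tell, sarama enforces this when the SyncProducer is constructed, before any broker connection is attempted, so forgetting one of the flags fails fast. A hedged sketch; the broker address is a placeholder:

```go
func checkSyncProducerConfig() {
	config := sarama.NewConfig()
	config.Producer.Return.Errors = true
	// Return.Successes deliberately left at its default (false).
	if _, err := sarama.NewSyncProducer([]string{"localhost:9092"}, config); err != nil {
		// Expected: a configuration error explaining that Return.Successes must be true.
		log.Println("NewSyncProducer rejected the config:", err)
	}
}
```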
3. Consumer API
In Kafka, consumers are divided into standalone consumers and consumer groups.
StandaloneConsumer
```go
// SinglePartition consumes a single partition of a topic.
func SinglePartition(topic string) {
	config := sarama.NewConfig()
	consumer, err := sarama.NewConsumer([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewConsumer err: ", err)
	}
	defer consumer.Close()
	// Argument 1: the topic to consume.
	// Argument 2: the partition; the demo consumes partition 0. Partitions in Kafka are similar
	//   to shards in ES/MongoDB or split tables in MySQL.
	// Argument 3: the offset to start from. Normally the offset is committed to Kafka after each
	//   consumption so the next run can resume from it. With sarama.OffsetNewest the consumer
	//   starts from the latest message, so messages produced before it started are never seen;
	//   with sarama.OffsetOldest (used here) it starts from the oldest message, i.e. every
	//   restart re-consumes all messages in the topic.
	partitionConsumer, err := consumer.ConsumePartition(topic, 0, sarama.OffsetOldest)
	if err != nil {
		log.Fatal("ConsumePartition err: ", err)
	}
	defer partitionConsumer.Close()
	// This loop blocks indefinitely, waiting for messages.
	for message := range partitionConsumer.Messages() {
		log.Printf("[Consumer] partitionid: %d; offset:%d, value: %s\n",
			message.Partition, message.Offset, string(message.Value))
	}
}
```
```go
// Partitions consumes all partitions of a topic.
func Partitions(topic string) {
	config := sarama.NewConfig()
	consumer, err := sarama.NewConsumer([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewConsumer err: ", err)
	}
	defer consumer.Close()
	// First ask how many partitions the topic has.
	partitions, err := consumer.Partitions(topic)
	if err != nil {
		log.Fatal("Partitions err: ", err)
	}
	var wg sync.WaitGroup
	// Then start one goroutine per partition to consume it.
	for _, partitionId := range partitions {
		wg.Add(1)
		go consumeByPartition(consumer, partitionId, &wg)
	}
	wg.Wait()
}

func consumeByPartition(consumer sarama.Consumer, partitionId int32, wg *sync.WaitGroup) {
	defer wg.Done()
	partitionConsumer, err := consumer.ConsumePartition(kafka.Topic, partitionId, sarama.OffsetOldest)
	if err != nil {
		log.Fatal("ConsumePartition err: ", err)
	}
	defer partitionConsumer.Close()
	for message := range partitionConsumer.Messages() {
		log.Printf("[Consumer] partitionid: %d; offset:%d, value: %s\n",
			message.Partition, message.Offset, string(message.Value))
	}
}
```
If you run the demo above repeatedly, you will find that it consumes every message from the very first one each time.
Isn't that duplicate consumption?
The biggest difference between Kafka and other message queues is that messages in Kafka are not deleted once consumed; they are retained until they expire.
To keep the consumer from starting over at the first message on every restart, we need to commit the offset to Kafka after consuming, so that after a restart consumption can continue from the last committed offset.
OffsetManager
Standalone consumers do not implement offset committing themselves, so we use an OffsetManager to do it.
```go
func OffsetManager(topic string) {
	config := sarama.NewConfig()
	// Enable automatic offset committing so that sarama periodically commits
	// the latest offset information to Kafka for us.
	config.Consumer.Offsets.AutoCommit.Enable = true              // enable auto-commit
	config.Consumer.Offsets.AutoCommit.Interval = 1 * time.Second // auto-commit interval

	client, err := sarama.NewClient([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewClient err: ", err)
	}
	defer client.Close()

	// The offset manager manages offsets for a consumer group.
	// Consumers are distinguished by groupID, and every committed offset is associated with it.
	offsetManager, _ := sarama.NewOffsetManagerFromClient("myGroupID", client)
	defer offsetManager.Close()

	// Offsets are managed per partition. The demo uses partition 0 because the topic has only one partition.
	partitionOffsetManager, _ := offsetManager.ManagePartition(topic, kafka.DefaultPartition)
	defer partitionOffsetManager.Close()

	// Commit once more when the program exits, so offsets marked between auto-commit intervals are not lost.
	defer offsetManager.Commit()

	consumer, _ := sarama.NewConsumerFromClient(client)
	// Resume from the offset recorded in Kafka: NextOffset returns the offset of the next message to consume.
	nextOffset, _ := partitionOffsetManager.NextOffset()
	pc, _ := consumer.ConsumePartition(topic, kafka.DefaultPartition, nextOffset)
	defer pc.Close()

	for message := range pc.Messages() {
		value := string(message.Value)
		log.Printf("[Consumer] partitionid: %d; offset:%d, value: %s\n", message.Partition, message.Offset, value)
		// Mark the offset after each message. This only updates the value in program memory;
		// it reaches Kafka via the auto-commit loop or an explicit Commit.
		partitionOffsetManager.MarkOffset(message.Offset+1, "modified metadata")
	}
}
```
1) Create the OffsetManager
offsetManager, _ := sarama.NewOffsetManagerFromClient("myGroupID", client)
2) Create the offset manager for the corresponding partition
The offset of each partition in Kafka is managed separately
partitionOffsetManager, _ := offsetManager.ManagePartition(topic, kafka.DefaultPartition)
3) Record offset
What is recorded is the offset of the next message to fetch, not the last message consumed, hence the +1.
partitionOffsetManager.MarkOffset(message.Offset+1, "modified metadata")
4) Commit offset
In sarama, offsets are committed automatically by default, but it is recommended to also commit manually with defer when the program exits:
defer offsetManager.Commit()
ConsumerGroup
A Kafka consumer group can contain multiple consumers. Kafka distributes messages to the consumers in the group on a per-partition basis, and each message is consumed by exactly one consumer in the group.
Note that distribution is per partition: if the group has two consumers but the subscribed topic has only one partition, one of the consumers is guaranteed never to receive any messages.
The advantage of a consumer group is concurrent consumption: Kafka implements the distribution logic, and we only need to start multiple consumers.
With a single consumer we would have to fetch messages ourselves and hand them out to multiple goroutines, which means extra code, and maintaining offsets becomes troublesome.
```go
// MyConsumerGroupHandler implements the sarama.ConsumerGroupHandler interface.
type MyConsumerGroupHandler struct {
	name  string
	count int64
}

// Setup runs at the start of a new session, before ConsumeClaim.
func (MyConsumerGroupHandler) Setup(_ sarama.ConsumerGroupSession) error { return nil }

// Cleanup runs at the end of a session, once all ConsumeClaim goroutines have exited.
func (MyConsumerGroupHandler) Cleanup(_ sarama.ConsumerGroupSession) error { return nil }

// ConsumeClaim contains the actual consumption logic.
func (h MyConsumerGroupHandler) ConsumeClaim(sess sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
	for msg := range claim.Messages() {
		// fmt.Printf("[consumer] name:%s topic:%q partition:%d offset:%d\n", h.name, msg.Topic, msg.Partition, msg.Offset)
		// Mark the message as consumed; the consumer offset is updated internally.
		sess.MarkMessage(msg, "")
		sess.Commit()
		h.count++
		if h.count%100 == 0 {
			fmt.Printf("name:%s consumed:%v\n", h.name, h.count)
		}
	}
	return nil
}

func ConsumerGroup(topic, group, name string) {
	config := sarama.NewConfig()
	config.Consumer.Return.Errors = true
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	cg, err := sarama.NewConsumerGroup([]string{kafka.HOST}, group, config)
	if err != nil {
		log.Fatal("NewConsumerGroup err: ", err)
	}
	defer cg.Close()
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		handler := MyConsumerGroupHandler{name: name}
		for {
			fmt.Println("running: ", name)
			/*
				Consume() must be called in an infinite loop: after every rebalance,
				Consume() has to be called again to re-establish the session.
				Consume() starts by sending a JoinGroup request; if this consumer becomes the
				group leader it performs the rebalance, reassigning the subscribed topics and
				partitions among the group members, and consumption starts after SyncGroup.
			*/
			err = cg.Consume(ctx, []string{topic}, handler)
			if err != nil {
				log.Println("Consume err: ", err)
			}
			// Exit if the context has been cancelled.
			if ctx.Err() != nil {
				return
			}
		}
	}()
	wg.Wait()
}
```
Note:
The key is implementing the sarama.ConsumerGroupHandler interface. Setup and Cleanup are auxiliary; the real logic lives in the ConsumeClaim method.
```go
func (h MyConsumerGroupHandler) ConsumeClaim(sess sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
	for msg := range claim.Messages() {
		// Mark the message as consumed; the consumer offset is updated internally.
		sess.MarkMessage(msg, "")
	}
	return nil
}
```
You need to call the sess.MarkMessage() method to update the Offset.
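To see the per-partition distribution in action, one option is to start two members of the same group in one process. This is a sketch that reuses the ConsumerGroup function from the demo above; the topic and group names are placeholders:

```go
func main() {
	topic, group := "my-topic", "my-group" // placeholder names
	// With a topic that has at least two partitions, each member is assigned its own partition(s);
	// with a single partition, one member stays idle, as noted earlier.
	go ConsumerGroup(topic, group, "member-1")
	go ConsumerGroup(topic, group, "member-2")
	select {} // block forever; stop with Ctrl-C
}
```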
For the Kafka-related code, see GitHub.
4. Summary
1) Producer
- Synchronous producer
  - Sends synchronously: lower throughput, but each send returns its result immediately
- Asynchronous producer
  - Sends asynchronously: higher throughput
  - A batch is sent when the buffered message count or size reaches a threshold, or when the flush interval elapses
Asynchronous producers do not block and send messages to Kafka in batches, so they outperform synchronous producers.
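If you need synchronous semantics but want to amortize the round trips, sarama's SyncProducer also exposes SendMessages, which sends a whole slice in one call. A small sketch; the topic name is a placeholder and the producer is assumed to be a SyncProducer created as in the demo earlier:

```go
// sendBatchSync builds a small batch and sends it in a single synchronous call.
func sendBatchSync(producer sarama.SyncProducer) error {
	msgs := make([]*sarama.ProducerMessage, 0, 10)
	for i := 0; i < 10; i++ {
		msgs = append(msgs, &sarama.ProducerMessage{
			Topic: "my-topic", // placeholder
			Value: sarama.StringEncoder(strconv.Itoa(i)),
		})
	}
	// SendMessages returns once every message in the slice has been acknowledged or has failed.
	return producer.SendMessages(msgs)
}
```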
2) Consumer
- Standalone consumer
  - Needs to be used together with an OffsetManager
- Consumer group
  - Kafka distributes messages to the consumers in the group on a per-partition basis
  - If there are more consumers than partitions, some consumers will never receive any messages
- Original author: Yiqixing
- Original link: Kafka(Go) Tutorial (V) -- Basic Use of the Producer and Consumer APIs (author's personal blog)
- Copyright notice: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. For non-commercial reproduction, please credit the source (author and original link); for commercial reproduction, please contact the author for authorization.
See Also
- Kafka(Go) tutorial (IV) -- Kafka online deployment and cluster parameter configuration
- Kafka(Go) tutorial (III) -- Introduction to Kafka related concepts
- Kafka(Go) tutorial (II) -- hello Kafka
- Kafka(Go) tutorial (I) -- install Kafka through docker compose