Kafka (Go) -- basic usage of the Producer and Consumer APIs

This article explains the basic usage of, and caveats around, the Producer API and Consumer API in the Go client sarama.

1. General

For the Kafka-related code, see GitHub.

Kafka has five core APIs:

  • Producer API
  • Consumer API
  • Stream API
  • Connect API
  • Admin API

The Go client sarama currently implements only the Producer, Consumer, and Admin APIs.

It has been made clear that the Stream API will not be supported; whether the Connect API will be supported is unknown.

2. Producer API

Producers in Kafka are divided into synchronous producers and asynchronous producers.

As the names imply, a synchronous producer sends each message to Kafka and waits for the result, while an asynchronous producer buffers messages and sends them in batches (once enough messages accumulate or a time interval elapses) to improve performance.
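
The batching thresholds of the asynchronous producer are configurable through sarama's Producer.Flush settings. A minimal sketch, assuming the same imports as the demos below (sarama, time); the concrete values are illustrative only, not recommendations:

// flushConfig shows how the async producer's batch thresholds can be tuned.
// A batch is flushed when any of these conditions is met.
func flushConfig() *sarama.Config {
	config := sarama.NewConfig()
	config.Producer.Flush.Messages = 100                      // flush after this many buffered messages
	config.Producer.Flush.Bytes = 1 << 20                     // ...or after roughly 1 MiB is buffered
	config.Producer.Flush.Frequency = 500 * time.Millisecond // ...or at least every 500 ms
	return config
}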

Async Producer

A demo of the asynchronous producer in sarama is as follows:

func Producer(topic string, limit int) {
	config := sarama.NewConfig()
	// For an async producer it is not recommended to enable both Errors and Successes; usually enabling Errors is enough
	// A sync producer must enable both, because the send call returns success or failure synchronously
	config.Producer.Return.Errors = false   // whether to deliver errors on the Errors channel
	config.Producer.Return.Successes = true // whether to deliver successes on the Successes channel
	producer, err := sarama.NewAsyncProducer([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewSyncProducer err:", err)
	}
	defer producer.AsyncClose()
	go func() {
		// [!important] After sending, the async producer must read the results from Errors or Successes; otherwise sarama's internal processing logic blocks and sending stalls
		for {
			select {
			case s := <-producer.Successes():
				log.Printf("[Producer] key:%v msg:%+v \n", s.Key, s.Value)
			case e := <-producer.Errors():
				if e != nil {
					log.Printf("[Producer] err:%v msg:%+v \n", e.Msg, e.Err)
				}
			}
		}
	}()
	// Asynchronous transmission
	for i := 0; i < limit; i++ {
		str := strconv.Itoa(int(time.Now().UnixNano()))
		msg := &sarama.ProducerMessage{Topic: topic, Key: nil, Value: sarama.StringEncoder(str)}
		// An asynchronous send returns as soon as the message is written to an in-memory channel; it has not actually been sent yet
		// sarama receives messages on that channel, and a background goroutine takes them off the channel and sends them in batches
		producer.Input() <- msg
		atomic.AddInt64(&count, 1)
		if atomic.LoadInt64(&count)%1000 == 0 {
			log.Printf("Number of messages sent:%v\n", count)
		}

	}
	log.Printf("Total messages sent after sending:%v\n", limit)
}

Note:

An asynchronous producer returns as soon as the message has been handed to its input channel; the results, whether Successes or Errors, are likewise delivered asynchronously through channels.

The results must be read from Errors or Successes; otherwise producer.Input() will eventually block.
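
If per-message success notifications are not needed, a common pattern (matching the comment in the demo above) is to enable only Return.Errors and drain just that channel. A minimal sketch under the same assumptions as the demo (kafka.HOST, standard imports):

// errorsOnlyProducer is a sketch of an async producer that only reads the Errors channel.
func errorsOnlyProducer(topic string, limit int) {
	config := sarama.NewConfig()
	config.Producer.Return.Errors = true     // failed messages are delivered on Errors()
	config.Producer.Return.Successes = false // no per-message success notifications
	producer, err := sarama.NewAsyncProducer([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewAsyncProducer err:", err)
	}
	defer producer.AsyncClose()
	go func() {
		// Errors() must still be drained, otherwise Input() eventually blocks
		for e := range producer.Errors() {
			log.Printf("[Producer] err:%v msg:%+v\n", e.Err, e.Msg)
		}
	}()
	for i := 0; i < limit; i++ {
		producer.Input() <- &sarama.ProducerMessage{Topic: topic, Value: sarama.StringEncoder(strconv.Itoa(i))}
	}
}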

Sync Producer

The synchronous producer is simpler:

func Producer(topic string, limit int) {
	config := sarama.NewConfig()
	// A sync producer must enable both Return.Successes and Return.Errors,
	// because it has to report the result of every send
	config.Producer.Return.Successes = true
	config.Producer.Return.Errors = true // true by default, set explicitly here for clarity
	// The sync producer reuses the async producer's logic: successes and errors still come back through channels,
	// the sync layer simply waits on those channels and returns the result to the caller.
	// See the newSyncProducerFromAsyncProducer function in sarama's sync_producer.go:
	// it starts two goroutines internally to drain the Successes and Errors channels.
	// In other words, the synchronous producer is a wrapper around the asynchronous one:
	// type syncProducer struct {
	// 	producer *asyncProducer
	// 	wg       sync.WaitGroup
	// }
	producer, err := sarama.NewSyncProducer([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewSyncProducer err:", err)
	}
	defer producer.Close()
	for i := 0; i < limit; i++ {
		str := strconv.Itoa(int(time.Now().UnixNano()))
		msg := &sarama.ProducerMessage{Topic: topic, Key: nil, Value: sarama.StringEncoder(str)}
		partition, offset, err := producer.SendMessage(msg) // SendMessage wraps the async send path and blocks until the result arrives
		if err != nil {
			log.Println("SendMessage err: ", err)
			return
		}
		log.Printf("[Producer] partitionid: %d; offset:%d, value: %s\n", partition, offset, str)
	}
}

Note:

Both Return.Successes and Return.Errors must be enabled.

3. Consumer API

In Kafka, consumers come in two flavors: standalone consumers and consumer groups.

Standalone Consumer

// SinglePartition single partition consumption
func SinglePartition(topic string) {
	config := sarama.NewConfig()
	consumer, err := sarama.NewConsumer([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewConsumer err: ", err)
	}
	defer consumer.Close()
	// Parameter 1: the topic to consume.
	// Parameter 2: the partition; this demo consumes partition 0. Kafka partitions are similar to shards in ES/MongoDB or split tables in MySQL.
	// Parameter 3: the offset to start consuming from. Normally the offset is committed to Kafka after each consumption so the next run can pick up where it left off.
	// This demo uses sarama.OffsetOldest, so every restart consumes all messages under the topic from the beginning.
	// With sarama.OffsetNewest it would start from the latest offset, i.e. messages produced before the consumer started would be skipped.
	partitionConsumer, err := consumer.ConsumePartition(topic, 0, sarama.OffsetOldest)
	if err != nil {
		log.Fatal("ConsumePartition err: ", err)
	}
	defer partitionConsumer.Close()
	// This loop blocks, consuming messages as they arrive
	for message := range partitionConsumer.Messages() {
		log.Printf("[Consumer] partitionid: %d; offset:%d, value: %s\n", message.Partition, message.Offset, string(message.Value))
	}
}
// Partitions consumes all partitions of the topic
func Partitions(topic string) {
	config := sarama.NewConfig()
	consumer, err := sarama.NewConsumer([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewConsumer err: ", err)
	}
	defer consumer.Close()
	// First query how many partitions the topic has
	partitions, err := consumer.Partitions(topic)
	if err != nil {
		log.Fatal("Partitions err: ", err)
	}
	var wg sync.WaitGroup
	// Then start one goroutine per partition to consume it
	for _, partitionId := range partitions {
		wg.Add(1)
		go consumeByPartition(consumer, topic, partitionId, &wg)
	}
	wg.Wait()
}

func consumeByPartition(consumer sarama.Consumer, topic string, partitionId int32, wg *sync.WaitGroup) {
	defer wg.Done()
	partitionConsumer, err := consumer.ConsumePartition(topic, partitionId, sarama.OffsetOldest)
	if err != nil {
		log.Fatal("ConsumePartition err: ", err)
	}
	defer partitionConsumer.Close()
	for message := range partitionConsumer.Messages() {
		log.Printf("[Consumer] partitionid: %d; offset:%d, value: %s\n", message.Partition, message.Offset, string(message.Value))
	}
}

If you run the demo above repeatedly, you will find that every run consumes all of the messages, starting from the very first one.

Isn't that duplicate consumption?

The biggest difference between Kafka and other message queues is that messages in Kafka are not deleted after being consumed; they are retained until they expire.

To keep a consumer from starting over at the first message on every restart, we need to commit the offset to Kafka after consuming. After a restart, consumption can then continue from the last committed offset.

OffsetManager

Standalone consumers do not commit offsets on their own, so we use an OffsetManager to do it.

func OffsetManager(topic string) {
	config := sarama.NewConfig()
	// Enable automatic offset commits so that sarama periodically commits the latest marked offsets to Kafka
	config.Consumer.Offsets.AutoCommit.Enable = true              // Enable auto commit offset
	config.Consumer.Offsets.AutoCommit.Interval = 1 * time.Second // Automatic commit interval
	client, err := sarama.NewClient([]string{kafka.HOST}, config)
	if err != nil {
		log.Fatal("NewClient err: ", err)
	}
	defer client.Close()
	// The offset manager manages offsets per consumer group.
	// Consumers are distinguished by groupID; note that committed offsets are associated with that groupID.
	offsetManager, _ := sarama.NewOffsetManagerFromClient("myGroupID", client) // Offset Manager
	defer offsetManager.Close()
	// Offsets are managed per partition. The demo uses partition 0 because the topic has only one partition.
	partitionOffsetManager, _ := offsetManager.ManagePartition(topic, kafka.DefaultPartition) // Offset manager for the corresponding partition
	defer partitionOffsetManager.Close()
	// Commit once on exit so that offsets marked since the last auto-commit are not lost
	defer offsetManager.Commit()
	consumer, _ := sarama.NewConsumerFromClient(client)
	// Resume from the offset recorded in Kafka: the last committed offset + 1
	nextOffset, _ := partitionOffsetManager.NextOffset() // Get the offset of the next message as the starting point of this consumption
	pc, _ := consumer.ConsumePartition(topic, kafka.DefaultPartition, nextOffset)
	defer pc.Close()

	for message := range pc.Messages() {
		value := string(message.Value)
		log.Printf("[Consumer] partitionid: %d; offset:%d, value: %s\n", message.Partition, message.Offset, value)
		// Mark the offset after each message. This only updates the value in memory; it is committed to Kafka later.
		partitionOffsetManager.MarkOffset(message.Offset+1, "modified metadata") // MarkOffset records the next offset to consume
	}
}

1) Create offset Manager

offsetManager, _ := sarama.NewOffsetManagerFromClient("myGroupID", client)

2) Create the offset manager for the corresponding partition

Each partition's offset in Kafka is managed separately.

partitionOffsetManager, _ := offsetManager.ManagePartition(topic, kafka.DefaultPartition)

3) Record offset

What gets recorded is the offset of the next message to fetch, not the last one consumed, hence the +1.

partitionOffsetManager.MarkOffset(message.Offset+1, "modified metadata")

4) Commit offset

sarama commits offsets automatically by default, but it is still recommended to commit once manually (for example with defer) when the program exits.

defer offsetManager.Commit()

ConsumerGroup

A Kafka consumer group can contain multiple consumers. Kafka distributes messages to the consumers partition by partition, and each message is consumed by only one consumer in the group.

Note: the distribution is per partition. If the group has two consumers but the subscribed topic has only one partition, one of those consumers is guaranteed to never receive a message.

The advantage of a consumer group is concurrent consumption: Kafka implements the distribution logic, so we only need to start several consumers (a usage sketch follows the code below).

With a single consumer we would have to fetch messages ourselves and hand them out to multiple goroutines, which means extra code and fiddly offset maintenance.

// MyConsumerGroupHandler implements the sarama.ConsumerGroupHandler interface, providing the consumption callbacks
type MyConsumerGroupHandler struct {
	name  string
	count int64
}

// Setup runs when a new session is obtained, before ConsumeClaim()
func (MyConsumerGroupHandler) Setup(_ sarama.ConsumerGroupSession) error { return nil }

// Cleanup runs at the end of the session, after all ConsumeClaim goroutines have exited
func (MyConsumerGroupHandler) Cleanup(_ sarama.ConsumerGroupSession) error { return nil }

// ConsumeClaim contains the actual consumption logic
func (h MyConsumerGroupHandler) ConsumeClaim(sess sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
	for msg := range claim.Messages() {
		// fmt.Printf("[consumer] name:%s topic:%q partition:%d offset:%d\n", h.name, msg.Topic, msg.DefaultPartition, msg.Offset)
		// Mark that the message has been consumed. The consumer offset will be updated internally
		sess.MarkMessage(msg, "")
		sess.Commit()
		h.count++ // value receiver: the counter is tracked per ConsumeClaim call
		if h.count%100 == 0 {
			fmt.Printf("name:%s consumed:%v\n", h.name, h.count)
		}
	}
	return nil
}

func ConsumerGroup(topic, group, name string) {
	config := sarama.NewConfig()
	config.Consumer.Return.Errors = true
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	cg, err := sarama.NewConsumerGroup([]string{kafka.HOST}, group, config)
	if err != nil {
		log.Fatal("NewConsumerGroup err: ", err)
	}
	defer cg.Close()
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		handler := MyConsumerGroupHandler{name: name}
		for {
			fmt.Println("running: ", name)
			/*
				Consume() should be called in an infinite loop:
				after every rebalance, Consume() must be called again to re-establish the session.
				Consume() starts by sending a Join Group request; if this consumer becomes the group
				leader after joining, it runs the rebalance and reassigns the topics and partitions
				to the members of the group, and consumption starts once the Sync Group step completes.
			*/
			err = cg.Consume(ctx, []string{topic}, handler)
			if err != nil {
				log.Println("Consume err: ", err)
			}
			// If the context has been cancelled, exit
			if ctx.Err() != nil {
				return
			}
		}
	}()
	wg.Wait()
}
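
As a usage sketch of the ConsumerGroup function above: starting several consumers with the same group ID is all that is needed for concurrent consumption. The topic and group names here are placeholders:

// runGroup starts three consumers in the same group; Kafka assigns each of them a share of the partitions.
func runGroup() {
	topic, group := "my-topic", "my-group" // placeholder names
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			ConsumerGroup(topic, group, name) // blocks until its context is cancelled
		}(fmt.Sprintf("consumer-%d", i))
	}
	wg.Wait()
}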

Note:

The key is implementing the sarama.ConsumerGroupHandler interface. Setup and Cleanup are auxiliary; the real logic lives in the ConsumeClaim method.

func (h MyConsumerGroupHandler) ConsumeClaim(sess sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
	for msg := range claim.Messages() {
		// Mark the message as consumed; the consumer group's offset is updated internally
		sess.MarkMessage(msg, "")
	}
	return nil
}

You need to call the sess.MarkMessage() method to update the Offset.
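
MarkMessage only marks the offset in memory. It reaches Kafka either through the consumer's auto-commit loop (enabled by default) or through an explicit sess.Commit(), as in the full demo above. A minimal sketch of the related settings; the interval value is illustrative:

// Marked offsets are flushed to Kafka by the auto-commit loop, which is on by default.
config := sarama.NewConfig()
config.Consumer.Offsets.AutoCommit.Enable = true              // enabled by default
config.Consumer.Offsets.AutoCommit.Interval = 1 * time.Second // how often marked offsets are committed
// Alternatively, call sess.Commit() inside ConsumeClaim to commit marked offsets immediately.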

For the Kafka-related code, see GitHub.

4. Summary

1) Producer

  • Synchronous producer
    • Sends synchronously: lower throughput, but the result of each send is known immediately
  • Asynchronous producer
    • Sends asynchronously: higher throughput
    • A batch is sent once the buffered messages reach a size or count threshold, or the flush interval elapses

An asynchronous producer does not block and sends messages to Kafka in batches, so it outperforms the synchronous producer.

2) Consumer

  • Standalone consumer
    • Needs to be paired with an OffsetManager to commit offsets
  • Consumer group
    • Distributes messages to the consumers in the group partition by partition
    • If there are more consumers than partitions, some consumers are assigned no partitions and never receive messages
