Kafka Core API Producer

Producer Asynchronous Send Demo

The previous article introduced the AdminClient API, so we now know how to manage Kafka from our applications through the API. In most application development, however, the most common scenario is sending messages to or consuming messages from Kafka, a typical producer/consumer model. This article demonstrates how to use the Producer API to send messages to Kafka, making an application a producer.

The Producer API has the following send modes:

  • Asynchronous Send
  • Asynchronous Blocking Send
  • Asynchronous Callback Send

Next, a simple example demonstrates sending messages asynchronously to Kafka. First, we need to create a Producer instance and configure three required parameters: the address and port of the Kafka service, and the serializers for the message key and value (a message is a key-value structure).

In this case, both the key and value of the message are of String type, so StringSerializer is used as the serializer. Code example:

/**
 * Create Producer Instance
 */
public static Producer<String, String> createProducer() {
    Properties properties = new Properties();
    // Specify the ip address and port number of the Kafka service
    properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
    // Specify the serializer of the message key
    properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    // Specify the serializer of the message value
    properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");

    return new KafkaProducer<>(properties);
}

The constructor of new KafkaProducer does the following:

  • Reads the configuration items in Properties to initialize a ProducerConfig
  • Initializes a number of configuration fields based on the ProducerConfig
  • Initializes the MetricConfig monitoring configuration, the MetricsReporter list, and the Metrics repository
  • Loads the partitioner (load balancer) from the configuration; when a topic has multiple partitions, it distributes messages evenly across them
  • Loads the serializers for the message key and value from the configuration
  • Initializes the RecordAccumulator, an accumulator for message batches. The Producer does not send each message as soon as it receives it; it sends a batch once enough messages have accumulated, so an accumulator is needed to store and count them
  • Initializes the Sender used to send messages, then creates a daemon thread for it and starts it
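The batching behaviour handled by the RecordAccumulator is controlled by two producer configuration items, batch.size and linger.ms (both real Kafka configs; the values below are illustrative assumptions, and plain string keys are used here in place of the ProducerConfig constants):

```java
import java.util.Properties;

public class BatchingConfigDemo {

    /** Builds producer properties that tune how the RecordAccumulator batches messages. */
    public static Properties batchingProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "127.0.0.1:9092");
        // A batch is sent once it reaches 16 KB of records...
        props.setProperty("batch.size", "16384");
        // ...or once 5 ms have passed since the first record was appended,
        // whichever comes first (the default linger.ms is 0: send as soon as possible)
        props.setProperty("linger.ms", "5");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(batchingProps());
    }
}
```

Raising linger.ms trades a little latency for larger batches and better throughput.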

Tips:

  • If you look closely at the source code of the KafkaProducer constructor, you will find that all of its fields are final and initialized in the constructor: there is no unsafe publication and no shared mutable state. This implies that KafkaProducer is thread-safe, so a single instance can be shared across threads.

Asynchronous sending can then be achieved simply by calling the Producer's send method. Code example:

/**
 * Demonstrate Producer Asynchronous Sending
 */
public static void producerAsyncSend() {
    String topicName = "MyTopic";
    String key = "test-key";
    String value = "this is test message!";

    try (Producer<String, String> producer = createProducer()) {
        // Build Message Object
        ProducerRecord<String, String> record =
                new ProducerRecord<>(topicName, key, value);
        // Send a message
        producer.send(record);
    }
}

The main things producer.send(record) does are the following:

  • Serializes the message key and value using the configured serializers
  • Computes the partition, i.e. which partition the message goes to; this is the load-balancing step
  • Computes the batch: determines whether a new batch needs to be created, then calls accumulator.append to append the message to the batch
  • When a batch is full, calls sender.wakeup so the daemon thread sends the messages


Producer Asynchronous Blocking Send Demo

The send method returns a Future, and calling the Future's get method blocks the current thread. This achieves asynchronous blocking sending: the send itself is asynchronous, but retrieving the result blocks. It is also how we obtain the RecordMetadata stored in the Future. Code example:

/**
 * Demonstrate Producer Asynchronous Blocking Send
 */
public static void producerAsyncBlockSend() throws Exception {
    String topicName = "MyTopic";
    String key = "test-key";
    String value = "this is test message!";

    try (Producer<String, String> producer = createProducer()) {
        // Build Message Object
        ProducerRecord<String, String> record =
                new ProducerRecord<>(topicName, key, value);
        // Send a message
        Future<RecordMetadata> future = producer.send(record);
        // Calling get blocks the current thread until the result is available,
        // achieving an asynchronous blocking send; calling get immediately
        // after send is effectively a synchronous send
        RecordMetadata metadata = future.get();
        System.out.println(String.format(
                "hasTimestamp: %s, timestamp: %s, hasOffset: %s, offset: %s, partition: %s, topic: %s",
                metadata.hasTimestamp(), metadata.timestamp(),
                metadata.hasOffset(), metadata.offset(),
                metadata.partition(), metadata.topic()
        ));
    }
}

Run the above code, and the console output is as follows:

hasTimestamp: true, timestamp: 1589637627231, hasOffset: true, offset: 5, partition: 1, topic: MyTopic

Producer Asynchronous Callback Send Demo

If you want the result after sending a message, a better approach than calling the Future's get method directly is to send the message with an asynchronous callback.

The send method also accepts a callback function. When the send completes, the callback is invoked with the result as its parameters, so we can process the result inside the callback. Code example:

/**
 * Demonstrate Producer Asynchronous Callback Sending
 */
public static void producerAsyncCallbackSend() throws Exception {
    String topicName = "MyTopic";
    String key = "test-key";
    String value = "this is test message!";

    try (Producer<String, String> producer = createProducer()) {
        // Build Message Object
        ProducerRecord<String, String> record =
                new ProducerRecord<>(topicName, key, value);
        // Send the message, passing in a callback that is invoked when the send completes
        producer.send(record, (metadata, err) -> {
            if (err != null) {
                // On failure the metadata is null, so log the error and return
                err.printStackTrace();
                return;
            }

            System.out.println(String.format(
                    "hasTimestamp: %s, timestamp: %s, hasOffset: %s, offset: %s, partition: %s, topic: %s",
                    metadata.hasTimestamp(), metadata.timestamp(),
                    metadata.hasOffset(), metadata.offset(),
                    metadata.partition(), metadata.topic()
            ));
        });
    }
}

Run the above code, and the console output is as follows:

hasTimestamp: true, timestamp: 1589639553024, hasOffset: true, offset: 7, partition: 1, topic: MyTopic

Custom Partition Load Balancer

In some business scenarios we need a custom load-balancing algorithm. In Kafka, the partition load balancer can be customized by implementing the Partitioner interface.

The load-balancing algorithm in this example is relatively simple: it takes the key's hashCode modulo the number of partitions to get a partition index. Code example:

package com.zj.study.kafka.producer;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;

/**
 * Custom Partition Load Balancer
 *
 * @author 01
 * @date 2020-05-17
 **/
public class MyPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key,
                         byte[] keyBytes, Object value,
                         byte[] valueBytes, Cluster cluster) {

        int partitionsNum = cluster.partitionsForTopic(topic).size();
        // Route keyless messages to partition 0; key is null for records sent without a key
        if (key == null) {
            return 0;
        }
        // Mask off the sign bit: hashCode can be negative, and a negative
        // remainder would not be a valid partition index
        return (key.hashCode() & Integer.MAX_VALUE) % partitionsNum;
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}
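One pitfall with this simple algorithm: in Java, hashCode can be negative and the % operator keeps the sign of the dividend, so a plain key.hashCode() % partitionsNum can return a negative, invalid partition index. Math.abs does not fully help either, since Math.abs(Integer.MIN_VALUE) is still negative. A safer variant masks off the sign bit. A small standalone demonstration (pure Java, no broker needed):

```java
public class SignBitDemo {

    /** Naive routing: a negative hashCode yields a negative remainder. */
    static int naive(String key, int numPartitions) {
        return key.hashCode() % numPartitions;
    }

    /** Safe routing: clear the sign bit before taking the remainder. */
    static int safe(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // This string famously hashes to Integer.MIN_VALUE, the one value
        // for which Math.abs also returns a negative number
        String key = "polygenelubricants";
        System.out.println(key.hashCode());   // prints -2147483648
        System.out.println(naive(key, 3));    // negative, invalid partition index
        System.out.println(safe(key, 3));     // valid partition index
    }
}
```

Kafka's built-in default partitioner does the equivalent masking via Utils.toPositive on its murmur2 hash of the serialized key bytes.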

Then, when creating the Producer instance, specify the fully qualified class name of MyPartitioner. Code example:

/**
 * Create Producer Instance
 */
public static Producer<String, String> createProducer() {
    Properties properties = new Properties();
    ...
    // Specify the custom partition load balancer
    properties.setProperty(ProducerConfig.PARTITIONER_CLASS_CONFIG
            , "com.zj.study.kafka.producer.MyPartitioner");

    return new KafkaProducer<>(properties);
}

Kafka's Message Delivery Guarantee

First, let's look at message delivery semantics, of which there are generally three:

  • At most once: a message may be lost in transit, and a lost message is not re-delivered. This guarantees that a message is never sent or consumed more than once, but it may not arrive at all
  • At least once: a message must not be lost in transit; a failed delivery is retried. This guarantees that messages are not lost, but a message may be sent or consumed more than once
  • Exactly once: the semantics most scenarios require; a message is guaranteed to be delivered exactly once, neither lost nor duplicated

In Kafka, message delivery is mainly guaranteed through the message retry and ACK mechanisms. The retry mechanism only improves the probability that a send succeeds; it does not guarantee that a message will definitely be sent. We can turn the retry mechanism on or off by setting the retries configuration item when creating a Producer instance. Code example:

// Setting a value of 0 means off, and greater than 0 means on
properties.setProperty(ProducerConfig.RETRIES_CONFIG, "0");
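The retry mechanism is usually paired with retry.backoff.ms, which controls how long the producer waits between attempts (a real producer config; the values below are illustrative, and plain string keys stand in for the ProducerConfig constants):

```java
import java.util.Properties;

public class RetryConfigDemo {

    /** Producer properties that enable and pace the retry mechanism. */
    public static Properties retryProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "127.0.0.1:9092");
        // Retry a failed send up to 3 times before giving up
        props.setProperty("retries", "3");
        // Wait 100 ms between attempts, so a briefly unavailable broker can recover
        props.setProperty("retry.backoff.ms", "100");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(retryProps());
    }
}
```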

The other delivery guarantee is the ACK mechanism, which has three modes in Kafka, specified through configuration. The three settings mean the following:

  • acks=0:
    • The Producer considers a send successful as soon as the message is written to its send buffer; it does not care whether the message actually reaches the broker, and even if the send fails, the retry mechanism described above does not kick in. Messages can therefore be lost in this mode (a bit like UDP: just send, regardless of whether the other side receives it)
  • acks=1:
    • A message counts as successfully sent only once it has been written to the log of the partition's leader replica; failed sends are retried. In this mode a message is lost only if the broker hosting the leader replica crashes after the message was written to the leader but before any follower replica synchronized it
  • acks=all:
    • A message counts as successfully sent only once it has been written to the logs of all replicas in the partition's ISR list; failed sends are retried. In this mode it is hard to lose a message, unless the brokers hosting all of the replicas go down

This configuration item can likewise be set when creating the Producer instance. Code example:

properties.setProperty(ProducerConfig.ACKS_CONFIG, "all");

The value can be chosen according to the actual business scenario: the more reliable the delivery, the worse the performance. The three values are a trade-off between message reliability and performance:

  • If performance matters most and some message loss is acceptable, choose acks=0
  • To balance performance and reliability, choose acks=1
  • If performance can be sacrificed for high reliability, choose acks=all
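As a sketch of the high-reliability end of this trade-off: acks=all is commonly combined with the idempotent producer (enable.idempotence, a real producer config), which deduplicates messages re-sent by the retry mechanism so that retries cannot introduce duplicates. Plain string keys stand in for the ProducerConfig constants, and the values are illustrative:

```java
import java.util.Properties;

public class DurableProducerConfig {

    /** Producer properties for the high-reliability scenario. */
    public static Properties durableProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "127.0.0.1:9092");
        // Every replica in the ISR list must persist the message before the send succeeds
        props.setProperty("acks", "all");
        // The broker deduplicates retried sends, so retries cannot create duplicates
        props.setProperty("enable.idempotence", "true");
        // Retries are exactly what the idempotence guarantee protects against
        props.setProperty("retries", "3");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(durableProps());
    }
}
```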

Tags: Big Data kafka Apache Java

Posted on Sun, 17 May 2020 13:35:32 -0400 by klaibert26