Kafka usage summary, with producer and consumer demo implementations

What is Kafka?

The Kafka website introduces it as a distributed streaming platform.
kafka official website

 Kafka has three key capabilities:
     1. Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system
     2. Store streams of records in a fault-tolerant, durable way
     3. Process streams of records as they occur

Kafka is usually used for two kinds of applications:
    1. Building real-time streaming data pipelines that reliably move data between systems or applications
    2. Building real-time streaming applications that transform or react to streams of data

Some basic Kafka concepts:
    1. Kafka runs as a cluster on one or more servers, which can span multiple data centers.
    2. The Kafka cluster stores streams of records in categories called topics.
    3. Each record consists of a key, a value, and a timestamp.

Kafka's core APIs:
    1. Producer API: allows applications to publish streams of records to one or more topics.
    2. Consumer API: allows applications to subscribe to one or more topics and process the streams of records produced to them.
    3. Streams API: allows applications to act as stream processors, consuming input streams from one or more topics
    and producing output streams to one or more output topics, effectively transforming input streams into output streams.
    4. Connector API: allows building and running reusable producers or consumers that connect topics to existing applications or data systems.
    For example, a connector to a relational database might capture every change to a table.

Kafka as a messaging system

There are two traditional messaging models: message queuing and publish/subscribe. In a message queue, a pool of consumers reads from a server and each record goes to exactly one of them; in publish/subscribe, every record is broadcast to all subscribers. Both models have trade-offs:

    Message queuing:
        It lets you divide data processing across multiple consumer instances, so processing scales out.
        But a queue is not multi-subscriber: once one process has read a record, it is gone for everyone else.

    Publish/subscribe:
        Publish/subscribe lets you broadcast data to multiple processes,
        but because every message is delivered to every subscriber, there is no way to scale out processing.
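Kafka generalizes both models through consumer groups: consumers that share a `group.id` split a topic's partitions among themselves (queue semantics), while consumers in different groups each receive every record (publish/subscribe semantics). A minimal sketch of the two configurations, using only plain `java.util.Properties` (the broker address and group names are placeholders, not from the original article):

```java
import java.util.Properties;

public class GroupSemantics {
    // Build a consumer config for the given group id; other values are illustrative.
    static Properties consumerConfig(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Queue semantics: two consumers in the SAME group share the topic's
        // partitions, so each record is processed by only one of them.
        Properties workerA = consumerConfig("order-processors");
        Properties workerB = consumerConfig("order-processors");

        // Publish/subscribe semantics: a consumer in a DIFFERENT group
        // receives the full stream of records independently.
        Properties auditor = consumerConfig("audit-log");

        System.out.println(workerA.getProperty("group.id").equals(workerB.getProperty("group.id")));
        System.out.println(workerA.getProperty("group.id").equals(auditor.getProperty("group.id")));
    }
}
```

The point of the design is that the choice between the two models is made entirely by the consumers' configuration; the producer and the topic stay the same.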

As a messaging system, how does Kafka differ from other MQs? (RabbitMQ, Redis, RocketMQ, ActiveMQ)

RabbitMQ:
     Follows the AMQP protocol and is developed in Erlang with high concurrency; used for real-time message delivery with high reliability requirements.
     Throughput on the order of tens of thousands of messages per second, an active community, and a rich management UI.
     It provides comprehensive core features and is an excellent message-queue product.
     But because it is written in Erlang, it is hard to maintain and hard for most developers to extend.

Redis:
    Redis's main use case is as an in-memory database. As a message queue its reliability is poor, and its speed depends heavily on network I/O.
    It is fast on the server side but data piles up easily; it suits only lightweight scenarios.

RocketMQ:
    RocketMQ handles on the order of hundreds of thousands of messages per second and is developed in Java. It is an open-source messaging product from Alibaba.
    It has stood the test of Taobao's Double 11 festival, has very thorough documentation, and offers some advanced features other message queues lack,
    for example scheduled (absolute-time) delivery, whereas other message queues only offer delayed delivery; RabbitMQ, for instance, implements delayed delivery by setting a message expiry (TTL) field.
    RocketMQ also implements distributed transactions, which makes it more reliable; it is the only product among these in common use that supports distributed transactions.

Kafka:
    Kafka was originally designed for log collection and statistical analysis, and in the big-data era it is also used for analytics on operational data.
    Kafka is a truly large-scale distributed message queue with a deliberately small set of core features, with distributed subscription coordinated via ZooKeeper.
    Throughput on the order of hundreds of thousands of messages per second, higher than RocketMQ.
    Communication between clients and servers uses a simple, high-performance, language-agnostic TCP protocol.

ActiveMQ:
    Apache ActiveMQ is the most popular open-source, multi-protocol, Java-based message broker. It supports industry-standard protocols,
    so users can choose clients in many languages and platforms, with connectivity from C, C++, Python, .NET, and more.
    Integrate multi-platform applications using the cross-language AMQP protocol; exchange messages between web applications using STOMP over WebSockets;
    manage IoT devices with MQTT. It also supports existing JMS infrastructure. ActiveMQ provides the power
    and flexibility to support any messaging use case.

Note: since this article focuses on Kafka, the above is just a quick list of features. If you are interested, you can analyze each product in detail; I plan to write articles comparing them later, so they are only briefly introduced here.

Why use message queuing?

This part is extended background. Many people, myself included in the year I graduated, use message queues without a clear understanding of why, so I'll discuss it here and hopefully help those who need it.

So why use a message queue? Let's first review message passing in general. On the front end, the traditional approach was to pass data through global variables; later came the idea of a data bus, and then solutions such as Vuex, Redux, and stores. On the back end, early inter-system communication depended tightly on the specific peer being addressed, coupling systems heavily. Products such as web services appeared to solve this, but they were unfriendly to work with: maintenance was cumbersome, responsibilities were hard to separate, and the workload kept growing. After the birth of MQ products, these problems were largely solved.

Message queuing is introduced to:

1. Decoupling:
    Suppose system A must deliver messages to systems B and C whenever it performs operation P. Without a message queue, A sends a message to B
    and then another to C. One day systems D, E, and F say, "system A, send us messages about P too," so A has to modify its code
    and redeploy before D, E, and F can receive messages. N days later, C says, "stop sending me messages; remove the part that notifies me,"
    so A's developers have to strip it out and deploy again. Day after day, as the number of systems joining and leaving grows,
    system A must release and redeploy frequently, which reduces its stability and uptime, and the cost of testing and regression on every release,
    along with the risk, is self-evident. Once a message queue is introduced, A no longer cares who consumes or who stops consuming: A is only responsible for putting messages on the queue,
    and other systems simply listen to it. Even if they leave, A is unaffected and can keep
    providing service without interruption. Isn't that nice?

2. Asynchrony:
    For example, if sending messages to B, C, and D synchronously takes 120 ms in total, adopting a message queue lets the caller return after merely enqueueing, greatly reducing latency. But
    this only applies to business logic that does not actually need to be synchronous.

3. Peak shaving
    In the traditional model, requests hit the database directly, and once traffic peaks past a certain point the system inevitably falls over. With message-queue middleware buffering requests, the system can keep serving normally. This is the rate limiting that flash-sale (seckill) systems often talk about: it prevents the system from crashing and preserves availability.
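As an illustration only (no Kafka involved), peak shaving can be sketched with a bounded in-memory queue: requests beyond the buffer's capacity are rejected or deferred instead of overwhelming the downstream worker. The capacity and burst size below are made-up numbers:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PeakShavingSketch {
    public static void main(String[] args) {
        // A bounded buffer: at most 5 requests may be queued at once.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(5);

        int accepted = 0, rejected = 0;
        // A burst of 20 requests arrives faster than the worker can drain them.
        for (int i = 1; i <= 20; i++) {
            if (queue.offer("request-" + i)) { // offer() fails instead of blocking when full
                accepted++;
            } else {
                rejected++; // in a real system: retry later, or shed load
            }
        }
        System.out.println("accepted=" + accepted + " rejected=" + rejected);
        // prints: accepted=5 rejected=15

        // The worker drains the buffer at its own pace, protected from the burst.
        while (!queue.isEmpty()) {
            queue.poll(); // process one request
        }
    }
}
```

A broker like Kafka plays the role of this buffer, but durable and distributed, so the peak is absorbed on disk rather than lost.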

Maven configuration

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.12</artifactId>
            <version>1.0.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-streams</artifactId>
            <version>1.0.0</version>
        </dependency>

Producer

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

/**
 * @author chandlerHuang
 * @description Simple Kafka producer demo
 * @date 2020/1/15
 */
public class KafkaProducerService implements Runnable {

    private final KafkaProducer<String, String> producer;

    private final String topic;

    public KafkaProducerService(String topic) {
        Properties props = new Properties();
        // Replace with your broker's public address
        props.put("bootstrap.servers", "Bound Internet IP:9092");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
        this.topic = topic;
    }

    @Override
    public void run() {
        int messageNo = 1;
        try {
            for (;;) {
                String messageStr = "[" + messageNo + "]:hello,boys!";
                producer.send(new ProducerRecord<String, String>(topic, "Message", messageStr));
                // Print every 100 messages produced
                if (messageNo % 100 == 0) {
                    System.out.println("sendMessages:" + messageStr);
                }
                // Stop after producing 1000 messages
                if (messageNo % 1000 == 0) {
                    System.out.println("successCount:" + messageNo);
                    break;
                }
                messageNo++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }

    public static void main(String args[]) {
        KafkaProducerService test = new KafkaProducerService(TopicConstant.CHART_TOPIC);
        Thread thread = new Thread(test);
        thread.start();
    }
}
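One detail worth noting in the demo configuration: `acks=all` requests the strongest acknowledgment, but `retries=0` means any transient send failure is silently dropped. Below is a sketch of a more failure-tolerant producer configuration; the values are illustrative, built with plain `java.util.Properties`, and should be tuned for your cluster:

```java
import java.util.Properties;

public class ReliableProducerConfig {
    // Illustrative values only; tune for your cluster.
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("acks", "all");        // wait for all in-sync replicas to acknowledge
        props.put("retries", "3");       // retry transient failures instead of dropping
        props.put("batch.size", "16384");
        props.put("linger.ms", "5");     // small delay to let batches fill
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println("acks=" + props.getProperty("acks")
                + " retries=" + props.getProperty("retries"));
    }
}
```

Note that with `retries > 0`, message reordering becomes possible unless `max.in.flight.requests.per.connection` is limited to 1.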


Consumer

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

/**
 * @author chandlerHuang
 * @description Simple Kafka consumer demo
 * @date 2020/1/15
 */
public class KafkaConsumerService implements Runnable {

    private final KafkaConsumer<String, String> consumer;
    private ConsumerRecords<String, String> msgList;
    private final String topic;
    private static final String GROUPID = "groupA";

    public KafkaConsumerService(String topicName) {
        Properties props = new Properties();
        // Replace with your broker's public address
        props.put("bootstrap.servers", "Bound Internet IP:9092");
        props.put("group.id", GROUPID);
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        this.consumer = new KafkaConsumer<>(props);
        this.topic = topicName;
        this.consumer.subscribe(Arrays.asList(topic));
    }

    @Override
    public void run() {
        int messageNo = 1;
        System.out.println("---------Start consuming---------");
        try {
            consumeLoop:
            for (;;) {
                msgList = consumer.poll(1000);
                if (null != msgList && msgList.count() > 0) {
                    for (ConsumerRecord<String, String> record : msgList) {
                        // Print every 100 messages (the printed values are not necessarily in order)
                        if (messageNo % 100 == 0) {
                            System.out.println(messageNo + "=======receive: key = " + record.key() + ", value = " + record.value() + " offset===" + record.offset());
                        }
                        // Stop after consuming 1000 messages
                        // (a labeled break is needed: a plain break would only exit the current batch)
                        if (messageNo % 1000 == 0) {
                            break consumeLoop;
                        }
                        messageNo++;
                    }
                } else {
                    Thread.sleep(1000);
                }
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            consumer.close();
        }
    }
    public static void main(String args[]) {
        KafkaConsumerService test1 = new KafkaConsumerService(TopicConstant.CHART_TOPIC);
        Thread thread1 = new Thread(test1);
        thread1.start();
    }
}
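The demo relies on auto-commit (`enable.auto.commit=true` with a one-second interval), which can commit offsets for records that were polled but not yet fully processed; a crash at the wrong moment then skips those records. A sketch of the at-least-once alternative is below; only the configuration is shown (the broker address is a placeholder), with the commit call noted in a comment:

```java
import java.util.Properties;

public class ManualCommitConfig {
    // Illustrative values; the deserializer class names are the standard Kafka ones.
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "groupA");
        props.put("enable.auto.commit", "false"); // commit offsets explicitly
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    // With this config, the poll loop calls consumer.commitSync() only AFTER a
    // batch has been fully processed, so a crash mid-batch re-processes the
    // batch on restart instead of silently losing it.
    public static void main(String[] args) {
        System.out.println("enable.auto.commit=" + build().getProperty("enable.auto.commit"));
    }
}
```

The trade-off is possible duplicate processing after a crash, so the processing logic should be idempotent.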

Note: while writing the demo above, an exception came up: a Kafka Java client connection error (org.apache.kafka.common.errors.TimeoutException: Failed to update metadata).

To fix it, the following must be configured in Kafka's server.properties file:

advertised.listeners=PLAINTEXT://Internet address:9092

zookeeper.connect=Intranet address:2181

If you are on a cloud ECS instance, also open the corresponding ports in the security group; otherwise the service cannot be reached from outside.

Tags: Linux kafka Apache Database RabbitMQ

Posted on Wed, 15 Jan 2020 07:25:05 -0500 by xydra