Kafka Installation, Application and System Architecture

1. Environmental Construction

1.1 Installation package and environment preparation

zookeeper: 3.7.0  (apache-zookeeper-3.7.0-bin.tar.gz)
kafka: 2.8.0  (kafka_2.13-2.8.0.tgz)
jdk: jdk-8u301  (jdk-8u301-linux-x64.tar.gz)

ZooKeeper and Kafka are both deployed as clusters. The hosts run CentOS 7.9; the environment information is as follows:

Host name    IP                Application deployment
node1        192.168.206.201   zookeeper, kafka
node2        192.168.206.202   zookeeper, kafka
node3        192.168.206.203   zookeeper, kafka

The hosts file is configured as follows:

$ vi /etc/hosts

192.168.206.201 node1
192.168.206.202 node2
192.168.206.203 node3

1.2 JDK Installation

Unpack the installation package on the server and configure the environment variables as follows (all three servers need this):

$ tar -xzvf jdk-8u301-linux-x64.tar.gz -C /usr/local/

$ vi .bash_profile
  PATH=$PATH:$HOME/bin
  export PATH
  export JAVA_HOME=/usr/local/jdk1.8.0_301
  export PATH=$JAVA_HOME/bin:$PATH

$ source .bash_profile

Test:

$ java -version

The output is as follows:
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)

1.3 Zookeeper installation

Unpack the package on each server:

$ tar -xzvf apache-zookeeper-3.7.0-bin.tar.gz

$ cd apache-zookeeper-3.7.0-bin/conf/
$ vi zoo.cfg
  tickTime=2000
  dataDir=/data/zookeeper
  clientPort=2181
  initLimit=5
  syncLimit=2
  server.1=node1:2888:3888
  server.2=node2:2888:3888
  server.3=node3:2888:3888

$ mkdir -p /data/zookeeper
$ vi /data/zookeeper/myid
  1
  ## Write here the ordinal N from the matching server.N=host:2888:3888 line in zoo.cfg: 1 on node1, 2 on node2, 3 on node3

Repeat this configuration on all three servers, making sure each myid value matches the server.N ordinal for that host in zoo.cfg.
Start zk:

$ bin/zkServer.sh start

A successful start prints:
ZooKeeper JMX enabled by default
Using config: /root/apache-zookeeper-3.7.0-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

Client test:

$ bin/zkCli.sh -server 127.0.0.1:2181

This opens the zk interactive shell; run a test command:
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[zookeeper]

1.4 Kafka Installation

Unpack the package on each server:

$ tar -xzvf kafka_2.13-2.8.0.tgz
$ cd kafka_2.13-2.8.0
$ vi config/server.properties

broker.id=1     # Must be unique on every server (e.g. 1, 2, 3); distinguishes the different brokers
log.dirs=/data/kafka    # Directory where Kafka stores its log (data) files
zookeeper.connect=node1:2181,node2:2181,node3:2181/kafka    # ZooKeeper connection string; the /kafka chroot keeps all of Kafka's znodes under one path

Start the service on each server:

$ bin/kafka-server-start.sh config/server.properties

Test:

Check the broker registrations in ZooKeeper (from the zkCli.sh shell):
ls /kafka/brokers/ids
Output:
[1, 2, 3]
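
As an alternative to zkCli.sh, the same check can be done from Java with the AdminClient in the kafka-clients library (the Maven dependency is listed in section 3.1). A minimal sketch; the class name and output format are illustrative:

package com.kafka;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

import java.util.Collection;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class ClusterCheckDemo {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties properties = new Properties();
        properties.setProperty(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "node1:9092,node2:9092,node3:9092");
        try (Admin admin = Admin.create(properties)) {
            //Fetch cluster metadata; every live broker appears as a Node
            Collection<Node> nodes = admin.describeCluster().nodes().get();
            nodes.forEach(n -> System.out.println("broker id: " + n.id() + ", host: " + n.host()));
        }
    }
}

With brokers 1, 2 and 3 up, this should print one line per broker, matching the ids seen in ZooKeeper.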

2. Use of Basic Commands

2.1 Create Topic

$ bin/kafka-topics.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --create  --topic hello-topic --partitions 3 --replication-factor 3

- --topic hello-topic: the topic name (here, hello-topic)
- --partitions 3: the number of partitions (here, the topic has three partitions)
- --replication-factor 3: the replication factor (here, each partition of the topic has three replicas)
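
Topics can also be created programmatically. A minimal sketch using the AdminClient from kafka-clients (section 3.1), mirroring the CLI example above; the class name is illustrative:

package com.kafka;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class CreateTopicDemo {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties properties = new Properties();
        properties.setProperty(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "node1:9092,node2:9092,node3:9092");
        try (Admin admin = Admin.create(properties)) {
            //3 partitions, replication factor 3, same as the CLI example
            NewTopic topic = new NewTopic("hello-topic", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}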

2.2 List all topics

$ bin/kafka-topics.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --list
This prints the full list of topics.

2.3 Get more information about a Topic

$ bin/kafka-topics.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --describe --topic hello-topic
The output is as follows (it shows each partition, its leader, its replicas, and its ISR):
Topic: hello-topic	TopicId: uX53uX8dSDW-qP-0KKreDg	PartitionCount: 3	ReplicationFactor: 3	Configs: segment.bytes=1073741824
Topic: hello-topic	Partition: 0	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
Topic: hello-topic	Partition: 1	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
Topic: hello-topic	Partition: 2	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1

Explanation
Partition: 0 is the partition number
Leader: 1 is the broker id of this partition's leader
Replicas: 1,3,2 are the broker ids holding replicas of this partition
Isr: 1,3,2 are the currently in-sync replicas (ISR) of this partition
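
The same per-partition details (leader, replicas, ISR) are available programmatically via AdminClient#describeTopics. A minimal sketch (class name illustrative, error handling omitted):

package com.kafka;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class DescribeTopicDemo {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties properties = new Properties();
        properties.setProperty(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "node1:9092,node2:9092,node3:9092");
        try (Admin admin = Admin.create(properties)) {
            TopicDescription desc = admin.describeTopics(Collections.singleton("hello-topic"))
                    .all().get().get("hello-topic");
            //One line per partition: leader id, replica nodes and current ISR
            desc.partitions().forEach(p -> System.out.printf("partition:%d leader:%d replicas:%s isr:%s%n",
                    p.partition(), p.leader().id(), p.replicas(), p.isr()));
        }
    }
}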

2.4 Send Messages

$ bin/kafka-console-producer.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --topic hello-topic
This opens an interactive prompt from which messages can be sent to the topic:
>This is my first event
>This is my second event
>This is my third event
>

2.5 Consume Messages

$ bin/kafka-console-consumer.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --topic hello-topic --from-beginning
Consumed messages are printed as they arrive:
This is my first event
This is my third event
This is my second event

Note that "third" can appear before "second": ordering is only guaranteed within a single partition, and these messages were spread across the topic's three partitions.

Explanation
The following options control where consumption starts:
- --from-beginning: if the consumer has no established offset yet, start from the earliest message in the log
- --offset: the offset to consume from (a non-negative number), or the string "earliest" to start from the beginning or "latest" to start from the end (default: latest). When a numeric offset is used to pinpoint a position, the --partition option must also be given to name the partition. For example:
$ bin/kafka-console-consumer.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --topic hello-topic --offset 11 --partition 2
- --group: the consumer group to consume as

2.6 Get a list of consumer groups

$  bin/kafka-consumer-groups.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --all-groups --list

2.7 Get consumer Group details

$  bin/kafka-consumer-groups.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --group hello-group --describe
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                 HOST             CLIENT-ID
hello-group     hello-topic     0          14              14              0               consumer-hello-group-1-cc2ea682-4bb7-41a0-b87c-5970522de997 /192.168.206.201 consumer-hello-group-1
hello-group     hello-topic     1          19              19              0               consumer-hello-group-1-cc2ea682-4bb7-41a0-b87c-5970522de997 /192.168.206.201 consumer-hello-group-1
hello-group     hello-topic     2          24              24              0               consumer-hello-group-1-cc2ea682-4bb7-41a0-b87c-5970522de997 /192.168.206.201 consumer-hello-group-1

Explanation
CURRENT-OFFSET: the group's committed consumption offset for that partition
LOG-END-OFFSET: the offset at the end of that partition's log, i.e. the offset of the next message to be written (CURRENT-OFFSET <= LOG-END-OFFSET)
LAG: LOG-END-OFFSET minus CURRENT-OFFSET; all three partitions above show 0, so the group is fully caught up
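
The committed offsets (the CURRENT-OFFSET column) can also be read from code with AdminClient#listConsumerGroupOffsets. A minimal sketch (class name illustrative):

package com.kafka;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class GroupOffsetsDemo {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties properties = new Properties();
        properties.setProperty(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "node1:9092,node2:9092,node3:9092");
        try (Admin admin = Admin.create(properties)) {
            //Committed offset per partition for the group (the CURRENT-OFFSET column)
            Map<TopicPartition, OffsetAndMetadata> offsets =
                    admin.listConsumerGroupOffsets("hello-group").partitionsToOffsetAndMetadata().get();
            offsets.forEach((tp, om) -> System.out.println(tp + " -> " + om.offset()));
        }
    }
}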

2.8 View the total number of messages in a topic

$ bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list node1:9092,node3:9092,node2:9092 --topic hello-topic --time -1
The output format is topic:partition:offset; with --time -1, each partition's latest (end) offset is returned:
hello-topic:0:20
hello-topic:1:27
hello-topic:2:29

Explanation
kafka-consumer-groups.sh also shows LOG-END-OFFSET, but this command works without any consumer in the group. Assuming nothing has been deleted from the log, summing the partition end offsets gives the total number of messages in the topic: 20 + 27 + 29 = 76 here.
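
The same total can be computed from Java: KafkaConsumer exposes beginningOffsets and endOffsets, and summing the per-partition differences gives the message count. A sketch assuming nothing has been deleted from the log (class name illustrative):

package com.kafka;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class MessageCountDemo {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "node1:9092,node2:9092,node3:9092");
        properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        //No group.id needed: the consumer is only used for metadata lookups, not for subscribing
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties)) {
            List<TopicPartition> parts = consumer.partitionsFor("hello-topic").stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());
            Map<TopicPartition, Long> begin = consumer.beginningOffsets(parts);
            Map<TopicPartition, Long> end = consumer.endOffsets(parts);
            //Total = sum over all partitions of (end offset - beginning offset)
            long total = parts.stream().mapToLong(tp -> end.get(tp) - begin.get(tp)).sum();
            System.out.println("total messages: " + total);
        }
    }
}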

3. Use of SDK

3.1 Project Maven Dependency

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.8.0</version>
</dependency>

3.2 Producer Demo

package com.kafka;

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.Properties;

public class ProducerDemo {
    private static final Logger LOGGER = LoggerFactory.getLogger(ProducerDemo.class);

    public static void main(String[] args) throws IOException {
        String topic = "hello-topic";
        String bootstrapServer = "node1:9092,node2:9092,node3:9092";

        Properties properties = new Properties();
        //broker Connection Information
        properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
        //Message Key Serializer
        properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        //Message Value Serializer
        properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        //acks: 0 = don't wait for any broker acknowledgement (returns as soon as the message is handed to the local socket); 1 = return once the partition leader has written the message (replication not guaranteed); -1/all = return only after the leader and all in-sync replicas (ISR) have written it (recommended for distributed environments)
        properties.setProperty(ProducerConfig.ACKS_CONFIG, "-1");
        
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        for (int i = 0; i < 100000; i++) {
            for (int j = 0; j < 3; j++) {
                ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, j, j + "", j + "-" + i + "-msg");
                producer.send(record, (metadata, exception) -> {
                    //Check the exception first: on failure the metadata fields are not meaningful
                    if (exception != null) {
                        LOGGER.error("error!", exception);
                        return;
                    }
                    LOGGER.debug("partition:{},offset:{},key:{},value:{}!", metadata.partition(), metadata.offset(), record.key(), record.value());
                });
            }
        }
        System.in.read();
    }
}

The demo sends messages with String keys and values, continuously writing to explicitly specified partitions.

Parameters: pay attention to the acks setting:
0: do not wait for the broker's acknowledgement; the call returns as soon as the message is handed to the local socket
1: return once the partition leader has written the message; successful replication is not guaranteed
-1 (all): return only after the leader and all in-sync replicas (ISR) have written the message (recommended for distributed environments)

3.3 Consumer Demo

package com.kafka;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.time.Duration;
import java.util.*;

public class ConsumerDemo {
    private static final Logger LOGGER = LoggerFactory.getLogger(ConsumerDemo.class);

    public static void main(String[] args) throws IOException {
        String topic = "hello-topic";
        String bootstrapServer = "node1:9092,node2:9092,node3:9092";
        String groupId = "consumer-java-group";

        Properties properties = new Properties();
        //broker Connection Information
        properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
        //Message Key Deserializer
        properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        //Message Value Deserializer
        properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        //Consumer Group ID
        properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        //With auto-commit enabled, consumer offsets are committed automatically at the interval given by AUTO_COMMIT_INTERVAL_MS_CONFIG
        properties.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, Boolean.TRUE.toString());
        //Interval (in milliseconds) between automatic offset commits
        properties.setProperty(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000");
        //Where to start consuming when the group has no usable offset (auto.offset.reset):
        //earliest: automatically reset to the earliest offset when no committed offset exists, or when the committed offset no longer exists on the server (for example, because the data has been deleted)
        //latest: automatically reset to the latest offset
        //none: throw an exception to the consumer if no previous offset is found for the group
        properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);

        //Subscribe with a rebalance listener to observe partition assignment changes
        consumer.subscribe(Collections.singletonList(topic), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                partitions.forEach(e -> LOGGER.info("onPartitionsRevoked -> topic:{},partition:{}", e.topic(), e.partition()));
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                partitions.forEach(e -> LOGGER.info("onPartitionsAssigned -> topic:{},partition:{}", e.topic(), e.partition()));
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(5000L));
            //Iterate over the records fetched in this poll
            records.forEach(r -> {
                LOGGER.debug("partition:{},offset:{},key:{},value:{}", r.partition(), r.offset(), r.key(), r.value());
            });
        }
    }
}

About offsets
A consumer's offsets can also be maintained by your own code in third-party storage such as a database or Redis; on startup, the consumer restores its position from that stored information with seek(TopicPartition, long) and resumes consuming from there. Absent such a business requirement, Kafka's built-in mechanism stores each consumer group's offsets in the internal __consumer_offsets topic, committing them automatically every five seconds by default. Alternatively, turn off auto-commit and commit explicitly by calling commitSync() or commitAsync().
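
A minimal sketch of the manual-commit variant just described: auto-commit is disabled and commitSync() is called only after each batch has been processed, giving at-least-once delivery. (For offsets kept in external storage you would instead assign() partitions and seek() to the stored position at startup; that lookup is your own code.)

package com.kafka;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ManualCommitConsumerDemo {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "node1:9092,node2:9092,node3:9092");
        properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "consumer-java-group");
        //Key difference from ConsumerDemo: offsets are committed explicitly, not on a timer
        properties.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, Boolean.FALSE.toString());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties)) {
            consumer.subscribe(Collections.singletonList("hello-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(5000L));
                records.forEach(r -> System.out.printf("partition:%d,offset:%d,value:%s%n",
                        r.partition(), r.offset(), r.value()));
                //Commit only after the whole batch has been handled
                consumer.commitSync();
            }
        }
    }
}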

4. System Architecture

//todo
