Installation and configuration
This section describes in detail how to set up a Kafka runtime environment. To save space, Linux CentOS is used as the operating system for the installation demonstration; other Linux distributions can follow the same steps. The specific operating system information is as follows:
[root@node1 ~]# uname -a
Linux node1 2.6.32-504.23.4.el6.x86_64 #1 SMP Tue Jun 9 20:57:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@node1 ~]# cat /etc/issue
CentOS release 6.6 (Final)
Kernel \r on an \m
As can be seen from the first figure in Section 1, ZooKeeper is required to build a Kafka runtime environment. Kafka and ZooKeeper are both services that run on the JVM, so a JDK needs to be installed first. Since version 2.0.0, Kafka no longer supports JDK 7 and below, so this section takes JDK 8 as an example.
1. JDK installation and configuration
Many readers who study Kafka are also users of JVM languages. If JDK 8 or above is already installed on your operating system, you can skip this step.
The first step in installing the JDK is to download the JDK 1.8 installation package, which can be obtained from the Oracle official website. The installation package used in the example is jdk-8u181-linux-x64.tar.gz. Copy it to the /opt directory first; all installation-related operations in this book are carried out in this directory.
Next, unzip the installation package in the /opt directory. The relevant information is as follows:
[root@node1 opt]# ll jdk-8u181-linux-x64.tar.gz
-rw-r--r-- 1 root root 185646832 Aug 31 14:48 jdk-8u181-linux-x64.tar.gz
[root@node1 opt]# tar zxvf jdk-8u181-linux-x64.tar.gz
# After decompression, a folder named jdk1.8.0_181 is generated in the current /opt directory
[root@node1 opt]# cd jdk1.8.0_181/
[root@node1 jdk1.8.0_181]# pwd
/opt/jdk1.8.0_181
# The above is the current JDK 8 installation directory
Then configure the JDK environment variables. Modify the /etc/profile file and add the following configuration to it:
export JAVA_HOME=/opt/jdk1.8.0_181
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
Then execute the source /etc/profile command to make the configuration take effect. Finally, you can verify whether the JDK has been installed and configured successfully with the java -version command. If the installation and configuration are successful, the JDK version information will be displayed correctly, as shown below:
[root@node1 ~]# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
2. ZooKeeper installation and configuration
ZooKeeper is a required component for installing a Kafka cluster. Kafka uses ZooKeeper to manage metadata, including information about the cluster, brokers, topics, partitions, and so on.
ZooKeeper is an open source distributed coordination service and an open source implementation of Google Chubby. Based on ZooKeeper, distributed applications can implement functions such as data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, master election, and configuration maintenance.
There are three roles in ZooKeeper: leader, follower, and observer. At any time there is only one leader in a ZooKeeper cluster; the other nodes are followers and observers. Observers do not participate in voting. By default, a ZooKeeper cluster has only the leader and follower roles. More details can be found on the ZooKeeper official website.
The first step in installing ZooKeeper is to download the corresponding installation package, which can be obtained from the official website. The installation package used in the example is zookeeper-3.4.12.tar.gz. Similarly, copy it to the /opt directory and unzip it, as shown below:
[root@node1 opt]# ll zookeeper-3.4.12.tar.gz
-rw-r--r-- 1 root root 36667596 Aug 31 15:55 zookeeper-3.4.12.tar.gz
[root@node1 opt]# tar zxvf zookeeper-3.4.12.tar.gz
# After decompression, a folder named zookeeper-3.4.12 is generated under the current /opt directory
[root@node1 opt]# cd zookeeper-3.4.12
[root@node1 zookeeper-3.4.12]# pwd
/opt/zookeeper-3.4.12
Step 2: add the following content to the /etc/profile configuration file, and execute the source /etc/profile command to make the configuration take effect:
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.12
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Step 3: modify the ZooKeeper configuration file. First enter the $ZOOKEEPER_HOME/conf directory and copy the zoo_sample.cfg file to zoo.cfg:
[root@node1 zookeeper-3.4.12]# cd conf
[root@node1 conf]# cp zoo_sample.cfg zoo.cfg
Then modify the zoo.cfg configuration file. The contents of the zoo.cfg file are as follows:
# Heartbeat interval of the ZooKeeper server, in ms
tickTime=2000
# The maximum time, expressed as a multiple of tickTime, allowed for followers to connect to and sync with the leader during initialization
initLimit=10
# The maximum tolerance for leader-follower heartbeats; if it exceeds syncLimit*tickTime, the leader considers the
# follower "dead" and removes it from the server list
syncLimit=5
# Data directory
dataDir=/tmp/zookeeper/data
# Log directory
dataLogDir=/tmp/zookeeper/log
# The port on which ZooKeeper provides service to clients
clientPort=2181
By default, the /tmp/zookeeper/data and /tmp/zookeeper/log directories do not exist on a Linux system, so you need to create these two directories next:
[root@node1 conf]# mkdir -p /tmp/zookeeper/data
[root@node1 conf]# mkdir -p /tmp/zookeeper/log
Step 4: create a myid file in the ${dataDir} directory (that is, /tmp/zookeeper/data) and write a value, such as 0, into it. The myid file stores the server number.
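For reference, the myid file can be created and checked as follows (assuming the server number 0 used in this example):
[root@node1 conf]# echo 0 > /tmp/zookeeper/data/myid
[root@node1 conf]# cat /tmp/zookeeper/data/myid
0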
Step 5: start the ZooKeeper service, as shown below:
[root@node1 conf]# zkServer.sh start
JMX enabled by default
Using config: /opt/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
You can view the ZooKeeper service status through the zkServer.sh status command. An example is as follows:
[root@node1 ~]# zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: standalone
The above covers the installation and configuration of ZooKeeper in standalone mode. In a production environment, cluster mode is generally used. The configuration of cluster mode is also relatively simple; compared with standalone mode, only a few settings need to be changed. Next, a ZooKeeper cluster is configured with three machines as an example. First, add the mapping between the IP addresses and host names of the three machines to the /etc/hosts file on each of them, as shown below (the three IP addresses correspond to the three machines respectively):
192.168.0.2 node1
192.168.0.3 node2
192.168.0.4 node3
Then add the following configuration to the zoo.cfg file of these three machines:
server.0=192.168.0.2:2888:3888
server.1=192.168.0.3:2888:3888
server.2=192.168.0.4:2888:3888
To explain the above configuration, it can be abstracted as the formula server.A=B:C:D. A is a number representing the server's number, which is the value in the myid file mentioned above; the number of each server in the cluster must be unique, so make sure the value in the myid file on each server is different. B is the IP address of the server. C is the port used by the server to exchange information with the leader of the cluster. D is the port used by the servers to communicate with each other during leader election. This completes the cluster mode configuration; you can then start the service by executing the zkServer.sh start command on each of the three machines, as sketched below.
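As a reference sketch under the node names and IP addresses assumed above, the per-node steps differ only in the value written to the myid file:
# On node1 (192.168.0.2)
echo 0 > /tmp/zookeeper/data/myid
zkServer.sh start
# On node2 (192.168.0.3)
echo 1 > /tmp/zookeeper/data/myid
zkServer.sh start
# On node3 (192.168.0.4)
echo 2 > /tmp/zookeeper/data/myid
zkServer.sh start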
3. Installation and configuration of Kafka
After installing the JDK and ZooKeeper, you can install the Kafka broker. First, download the installation package from the official website. The installation package used in the example is kafka_2.11-2.0.0.tgz. Copy it to the /opt directory and decompress it, as shown below:
[root@node1 opt]# ll kafka_2.11-2.0.0.tgz
-rw-r--r-- 1 root root 55751827 Jul 31 10:45 kafka_2.11-2.0.0.tgz
[root@node1 opt]# tar zxvf kafka_2.11-2.0.0.tgz
# After decompression, a folder named kafka_2.11-2.0.0 is generated in the current /opt directory
[root@node1 opt]# cd kafka_2.11-2.0.0
[root@node1 kafka_2.11-2.0.0]#
# Kafka's root directory $KAFKA_HOME is /opt/kafka_2.11-2.0.0. Add KAFKA_HOME to the /etc/profile file;
# refer to the previous JDK and ZooKeeper installation examples for details
Next, you need to modify the broker's configuration file $KAFKA_HOME/config/server.properties, focusing mainly on the following configuration parameters:
# The ID of the broker. If there are multiple brokers in the cluster, each broker's ID must be set to a different value
broker.id=0
# The service entry address provided by the broker
listeners=PLAINTEXT://localhost:9092
# The directory where the message log files are stored
log.dirs=/tmp/kafka-logs
# The ZooKeeper cluster address required by Kafka. To simplify the demonstration, we assume that Kafka and ZooKeeper are installed on the same machine
zookeeper.connect=localhost:2181/kafka
In standalone mode, you can start the service after modifying the above configuration parameters. In cluster mode, only a few changes to the standalone configuration are needed: make sure the value of the broker.id parameter is different for each broker in the cluster, and change the listeners parameter to the IP address or host name of the corresponding broker; then the services can be started separately, as sketched below. Note that before starting the Kafka services, you must also make sure the ZooKeeper ensemble specified by the zookeeper.connect parameter has been started correctly.
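For example, a minimal sketch of the per-broker differences, assuming the three hosts node1, node2, and node3 from the earlier /etc/hosts example and a ZooKeeper ensemble running on the same machines:
# server.properties on node1
broker.id=0
listeners=PLAINTEXT://node1:9092
zookeeper.connect=node1:2181,node2:2181,node3:2181/kafka
# server.properties on node2
broker.id=1
listeners=PLAINTEXT://node2:9092
zookeeper.connect=node1:2181,node2:2181,node3:2181/kafka
# server.properties on node3
broker.id=2
listeners=PLAINTEXT://node3:9092
zookeeper.connect=node1:2181,node2:2181,node3:2181/kafka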
Starting the Kafka service is relatively simple: execute the following command in the $KAFKA_HOME directory:
bin/kafka-server-start.sh config/server.properties
If you want to run the Kafka service in the background, you can add the -daemon parameter or the & character to the startup command, for example:
bin/kafka-server-start.sh -daemon config/server.properties
# or
bin/kafka-server-start.sh config/server.properties &
You can check whether the Kafka service process has been started through the jps command. An example is as follows:
[root@node1 kafka_2.11-2.0.0]# jps -l
23152 sun.tools.jps.Jps
16052 org.apache.zookeeper.server.quorum.QuorumPeerMain
22807 kafka.Kafka   # This is the Kafka server process
The jps command only confirms that the Kafka service process has started; whether Kafka can correctly serve external clients still needs to be verified by producing and consuming messages. The verification process is described in the following section.
Production and consumption
As can be seen from Section 1, producers send messages to Kafka topics, or more precisely, to topic partitions, and consumers consume messages by subscribing to topics. Before demonstrating the production and consumption of messages, you need to create a topic to act as the carrier of the messages.
Kafka provides many practical script tools, which are stored in the bin directory of $KAFKA_HOME; among them, the kafka-topics.sh script is related to topics. Below, we use it to create a topic named topic-demo with 4 partitions and a replication factor of 3 (the Kafka cluster in this example consists of 3 brokers):
[root@node1 kafka_2.11-2.0.0]# bin/kafka-topics.sh --zookeeper localhost:2181/kafka --create --topic topic-demo --replication-factor 3 --partitions 4
Created topic "topic-demo".
Here, --zookeeper specifies the address of the ZooKeeper service that Kafka is connected to, --topic specifies the name of the topic to be created, --replication-factor specifies the replication factor, --partitions specifies the number of partitions, and --create is the action directive for creating the topic.
You can also display more detailed information about the topic through --describe, for example:
[root@node1 kafka_2.11-2.0.0]# bin/kafka-topics.sh --zookeeper localhost:2181/kafka --describe --topic topic-demo
Topic:topic-demo    PartitionCount:4    ReplicationFactor:3    Configs:
    Topic: topic-demo    Partition: 0    Leader: 2    Replicas: 2,1,0    Isr: 2,1,0
    Topic: topic-demo    Partition: 1    Leader: 0    Replicas: 0,2,1    Isr: 0,2,1
    Topic: topic-demo    Partition: 2    Leader: 1    Replicas: 1,0,2    Isr: 1,0,2
    Topic: topic-demo    Partition: 3    Leader: 2    Replicas: 2,0,1    Isr: 2,0,1
After creating topic-demo, let's check whether the Kafka cluster can produce and consume messages normally. The $KAFKA_HOME/bin directory also provides two scripts, kafka-console-producer.sh and kafka-console-consumer.sh, for sending and receiving messages through the console. First, open a shell terminal and subscribe to topic-demo through the kafka-console-consumer.sh script, as shown below:
[root@node1 kafka_2.11-2.0.0]# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic-demo
Here, --bootstrap-server specifies the address of the Kafka cluster to connect to, and --topic specifies the topic the consumer subscribes to. At this point, there are no messages stored in topic-demo, so this script cannot consume any messages yet.
We open another shell terminal and use the kafka-console-producer.sh script to send a message "Hello, Kafka!" to topic-demo, as shown below:
[root@node1 kafka_2.11-2.0.0]# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-demo
>Hello, Kafka!
>
Here, --broker-list specifies the address of the Kafka cluster to connect to, and --topic specifies the topic to which messages are sent. The second line in the example is entered manually; pressing Enter moves to the third line, which shows the ">" character again. At this point, the message "Hello, Kafka!" just entered appears in the shell terminal that is running the kafka-console-consumer.sh script, as shown below:
[root@node1 kafka_2.11-2.0.0]# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic-demo
Hello, Kafka!
Readers can also become familiar with sending and receiving messages and the usage of these two scripts by entering some other custom messages. However, these two scripts are generally only used for testing; in practical applications, they are not used for complex, business-related message production and consumption, which needs to be implemented through programming. Below, the Java client provided by Kafka is used to demonstrate sending and receiving messages. The Maven dependency for Kafka's Java client is as follows:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.0.0</version>
</dependency>
To write messages to Kafka, first create a producer client instance and set some configuration parameters, then build a ProducerRecord object for the message, which must contain the target topic and the message body, then send the message through the producer client instance, and finally call the close() method to close the producer client instance and release the corresponding resources.
The specific example is shown in Listing 2-1. As in the script demonstration, only one message with the content "Hello, Kafka!" is sent to topic-demo.
//Listing 2-1 Producer client sample code
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ProducerFastStart {
    public static final String brokerList = "localhost:9092";
    public static final String topic = "topic-demo";

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("bootstrap.servers", brokerList);

        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        ProducerRecord<String, String> record = new ProducerRecord<>(topic, "hello, Kafka!");
        try {
            producer.send(record);
        } catch (Exception e) {
            e.printStackTrace();
        }
        producer.close();
    }
}
Consuming messages is also relatively simple. First, create a consumer client instance and configure the corresponding parameters, then subscribe to the topic and consume messages. The specific example code is shown in Listing 2-2.
//Listing 2-2 Consumer client sample code
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ConsumerFastStart {
    public static final String brokerList = "localhost:9092";
    public static final String topic = "topic-demo";
    public static final String groupId = "group.demo";

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("bootstrap.servers", brokerList);
        //Set the name of the consumer group; see Chapter 3 for the specific definition
        properties.put("group.id", groupId);
        //Create a consumer client instance
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        //Subscribe to the topic
        consumer.subscribe(Collections.singletonList(topic));
        //Consume messages in a loop
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}
Through these examples, readers should now have a preliminary understanding of Kafka. This is just the beginning; to use Kafka correctly and flexibly, we still need to explore it in depth, including the usage details and principles of the producer and consumer clients, the details and principles of the server side, operations and maintenance, monitoring, and so on, each of which awaits further exploration by the reader.
Server parameter configuration
The previous description of Kafka installation and configuration only briefly listed a few essential server-side parameters without a detailed introduction, and these are by no means the only Kafka server-side (broker) parameters. Kafka has many server-side parameters, covering all aspects of usage and tuning. Although most of them do not need to be changed in most cases, understanding these parameters and tuning them for special application requirements can make Kafka work better for us.
Next, some important server-side parameters are selected for detailed description. These parameters are configured in the $KAFKA_HOME/config/server.properties file.
1. zookeeper.connect
This parameter specifies the address (including port) of the ZooKeeper ensemble to which the broker connects. It has no default value and is required. It can be configured as localhost:2181. If the ZooKeeper cluster has multiple nodes, they can be separated by commas, in a format like localhost1:2181,localhost2:2181,localhost3:2181. The best practice is to add a chroot path, which not only clearly indicates that the nodes under that path are used by Kafka, but also allows multiple Kafka clusters to reuse one ZooKeeper ensemble, saving hardware resources. A configuration with a chroot path looks like localhost1:2181,localhost2:2181,localhost3:2181/kafka. If no chroot is specified, the root path of ZooKeeper is used by default.
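For example, a server.properties entry with a chroot path might look like the following (the host names are placeholders):
# Three ZooKeeper nodes; the /kafka chroot path is appended only once, after the last address
zookeeper.connect=localhost1:2181,localhost2:2181,localhost3:2181/kafka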
2. listeners
This parameter specifies the list of addresses on which the broker listens for client connections, that is, the entry addresses clients use to connect to the broker. The configuration format is protocol1://hostname1:port1,protocol2://hostname2:port2, where protocol is the protocol type; the types currently supported by Kafka include PLAINTEXT, SSL, SASL_SSL, and so on. If security authentication is not enabled, plain PLAINTEXT can be used. hostname is the host name and port is the service port. The default value of this parameter is null. A typical configuration is PLAINTEXT://198.162.0.2:9092; multiple addresses are separated by commas. If the host name is not specified, the default network interface is bound; note that it may be bound to 127.0.0.1, which cannot provide external services, so the host name should preferably not be left empty. If the host name is 0.0.0.0, all network interfaces are bound.
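For reference, a few possible forms of this parameter (the IP address follows the example in the text and is only an illustration):
# A plaintext listener bound to a specific IP address
listeners=PLAINTEXT://198.162.0.2:9092
# Bind all network interfaces
#listeners=PLAINTEXT://0.0.0.0:9092
# Multiple listeners with different protocols and ports, separated by commas
#listeners=PLAINTEXT://198.162.0.2:9092,SSL://198.162.0.2:9093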
Also related to this parameter is advertised.listeners, which has a similar structure and also defaults to null. However, advertised.listeners is mainly used in IaaS (Infrastructure as a Service) environments. For example, machines on a public cloud are usually equipped with multiple network interfaces, including a private one and a public one. In this case, the advertised.listeners parameter can be set to bind the public IP address for external clients to use, while the listeners parameter is configured to bind the private IP address for communication between brokers.
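A minimal sketch of such a setup, assuming hypothetical private and public IP addresses:
# Brokers communicate with each other over the private network interface
listeners=PLAINTEXT://10.0.0.2:9092
# External clients connect through the advertised public address
advertised.listeners=PLAINTEXT://203.0.113.10:9092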
3. broker.id
This parameter specifies the broker's unique ID in the Kafka cluster. The default value is -1; if it is not set, Kafka automatically generates one. This parameter is also related to the meta.properties file and to the server-side parameters broker.id.generation.enable and reserved.broker.max.id. For a deeper analysis, refer to the related content of Illustrating the Core Principles of Kafka.
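For reference, the broker.id actually in use can also be seen in the meta.properties file under the log directory; the output below is only illustrative and assumes the default log directory used earlier:
[root@node1 kafka_2.11-2.0.0]# cat /tmp/kafka-logs/meta.properties
#Tue Jul 31 10:45:00 CST 2018
version=0
broker.id=0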
4. log.dir and log.dirs
Kafka stores all messages on disk, and these two parameters are used to configure the root directories in which Kafka log files are stored. Generally, log.dir is used to configure a single root directory, while log.dirs is used to configure multiple root directories (separated by commas), but Kafka does not enforce this, that is, both log.dir and log.dirs can be used to configure single or multiple root directories. log.dirs has a higher priority than log.dir, but if log.dirs is not configured, the configuration of log.dir is used. By default, only the log.dir parameter is configured, and its default value is /tmp/kafka-logs.
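For example, a sketch that spreads the log directories across multiple disks (the paths are assumptions for illustration):
# Multiple root directories, separated by commas; Kafka spreads new partitions across them
log.dirs=/disk1/kafka-logs,/disk2/kafka-logs,/disk3/kafka-logs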
5. message.max.bytes
This parameter specifies the maximum size of a message that the broker can accept. The default value is 1000012 (B), about 976.6 KB. If a producer sends a message larger than the value set by this parameter, the producer will report a RecordTooLargeException. If you need to modify this parameter, you should also consider the impact of related parameters such as max.request.size (a client parameter) and max.message.bytes (a topic-level parameter). To avoid cascading effects caused by modifying this parameter, it is recommended to first consider whether the messages can be split before changing it.
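For reference, a sketch that raises the limit to about 10 MB (the value is an assumption for illustration); the client-side max.request.size and topic-level max.message.bytes mentioned above would usually need to be adjusted accordingly:
# Maximum message size accepted by the broker, in bytes (about 10 MB here)
message.max.bytes=10485760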
Some server-side parameters are not covered in this section. They are also very important, but they need to be described in separate chapters or specific scenarios; for example, unclean.leader.election.enable, log.segment.bytes, and other parameters will be mentioned in later chapters.
Summary
Through the content of the previous two chapters, readers should now have a preliminary understanding of Kafka, and we can now officially begin to study how to use Kafka correctly and effectively.