Big Data: Kafka

1 Kafka definition

Kafka is a distributed message queue based on the publish/subscribe model, mainly used for real-time processing of big data.

1.1 Functions of a message queue

1) Decoupling
Producers and consumers can be extended or modified independently, as long as both sides honor the same interface contract.
2) Recoverability
Message queuing reduces the coupling between processes, so the failure of one component does not bring down the whole system. Even if the process handling messages crashes, the messages already in the queue can still be processed after the system recovers.
3) Buffering
Buffering helps control and optimize the speed of data flowing through the system, smoothing out the mismatch between the rate at which messages are produced and the rate at which they are consumed.
4) Flexibility & peak processing power
Applications must keep working when traffic spikes sharply, yet such bursts are uncommon, and keeping resources on permanent standby just for peak load would be a huge waste. With a message queue in between, key components can absorb sudden access pressure without collapsing under the overload.
5) Asynchronous communication
Often users do not want or need to process a message immediately. A message queue provides an asynchronous processing mechanism: put a message on the queue without processing it right away, enqueue as many messages as needed, and process them later.

1.2 Two modes of message queue

(1) Point-to-point mode (one-to-one; consumers actively pull data, and a message is removed once it has been received)
The producer sends messages to a queue, and a consumer takes them out of the queue and consumes them. Once a message has been consumed it is no longer stored in the queue, so it cannot be consumed again.
A queue supports multiple consumers, but each individual message can be consumed by only one of them.
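The point-to-point rule above can be sketched in plain Python with the standard library's `queue.Queue` (a hypothetical toy illustration, not Kafka's API): once one consumer takes a message, it is gone from the queue and no other consumer can see it.

```python
from queue import Queue, Empty

q = Queue()
for i in range(6):
    q.put(f"msg-{i}")  # the producer sends messages to the queue

def drain(consumer_name, q, out):
    # Each consumer pulls until the queue is empty; a message taken
    # by one consumer is removed and never seen by another.
    while True:
        try:
            msg = q.get_nowait()
        except Empty:
            return
        out.append((consumer_name, msg))

received = []
drain("c1", q, received)
drain("c2", q, received)

print(len(received))  # 6: all messages consumed exactly once in total
print(q.empty())      # True: nothing left in the queue to re-consume
```

Note that the total across both consumers is exactly six: the queue delivers each message once, regardless of how many consumers pull from it.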

(2) Publish/subscribe mode (one-to-many; messages are not removed after consumers read them)
The producer (publisher) publishes messages to a topic, and multiple consumers (subscribers) consume the same messages at the same time.
There are two variants of publish/subscribe:
1. Consumers actively pull data.
2. The queue actively pushes data. Since consumers consume at different rates, pushing can waste resources on fast consumers and overwhelm slow ones, possibly to the point of collapse.
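The pull variant of publish/subscribe can be sketched as follows (again a toy model, not Kafka code; the `Topic` class and its methods are invented for illustration): messages are retained in a log, every subscriber sees every message, and each subscriber reads from its own position at its own pace.

```python
class Topic:
    """Toy topic: retains all messages; each subscriber tracks its own offset."""
    def __init__(self):
        self.log = []      # messages are appended, never deleted on consume
        self.offsets = {}  # subscriber name -> next index to read

    def publish(self, msg):
        self.log.append(msg)

    def poll(self, subscriber):
        # Pull model: the subscriber reads from its own offset at its own pace,
        # so a slow subscriber is never overwhelmed by pushes.
        start = self.offsets.get(subscriber, 0)
        msgs = self.log[start:]
        self.offsets[subscriber] = len(self.log)
        return msgs

t = Topic()
for i in range(3):
    t.publish(f"event-{i}")

a = t.poll("alice")
b = t.poll("bob")  # bob still sees every message alice already read
print(a == b == ["event-0", "event-1", "event-2"])  # True
```

Contrast this with the point-to-point queue: here consuming a message does not remove it, so both subscribers receive the full stream.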

2 Kafka

2.1 Kafka architecture

  1. Producers produce messages.
  2. The Kafka cluster manages messages.
    1. Each Kafka broker is a process running on a server.
    2. The cluster partitions messages according to their topics.
    3. Kafka keeps redundant copies of data on different servers: each partition has a leader and followers (data backup).
  3. Consumers consume messages.
    1. Consumer group: within a consumer group, a partition can be consumed by only one consumer. Consumer groups improve consumption capacity.
  4. ZooKeeper stores metadata.
    1. Brokers register themselves in ZooKeeper.
    2. Consumer offsets are stored so that a consumer that crashes can resume where it left off. Before version 0.9 they were kept in ZooKeeper; since 0.9 they are saved in an internal Kafka topic and persisted to disk (retained 7 days by default).
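The consumer-group rule above can be sketched with a small round-robin assignment function (a hypothetical helper similar in spirit to Kafka's RoundRobinAssignor, not its actual API): partitions are spread across the group's members, and no partition is ever given to two consumers in the same group.

```python
def assign_partitions(partitions, consumers):
    # Round-robin: partition i goes to consumer i mod len(consumers),
    # so each partition belongs to exactly one consumer in the group.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = ["first-0", "first-1", "first-2", "first-3"]
groups = assign_partitions(partitions, ["c1", "c2"])
print(groups)  # {'c1': ['first-0', 'first-2'], 'c2': ['first-1', 'first-3']}

# With more consumers than partitions, the extra consumers sit idle,
# which is why adding consumers beyond the partition count does not help.
print(assign_partitions(["first-0"], ["c1", "c2"]))  # {'c1': ['first-0'], 'c2': []}
```

This also shows why the partition count caps a group's parallelism: consumption capacity grows with group size only up to the number of partitions.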

2.2 Deploying a Kafka cluster with Docker

2.2.1 Installing docker-compose

1. sudo curl -L "https://github.com/docker/compose/releases/download/1.28.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
2. chmod +x /usr/local/bin/docker-compose

2.2.2 Kafka cluster installation

1. Create a virtual network:
docker network create --driver bridge --subnet 172.23.0.0/16 --gateway 172.23.0.1 zoo_kafka
2. The ZooKeeper docker-compose.yml is as follows:

version: '2'
services:
 zoo1:
  image: zookeeper:3.4 # Image name
  restart: always # Automatic restart in case of error
  hostname: zoo1
  container_name: zoo1
  privileged: true
  ports: # port
   - 2181:2181
  volumes: # Mount data volume
   - ./zoo1/data:/data
   - ./zoo1/datalog:/datalog 
  environment:
   TZ: Asia/Shanghai
   ZOO_MY_ID: 1 # Node ID
   ZOO_PORT: 2181 # zookeeper port number
   ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888 # zookeeper node list
  networks:
   default:
    ipv4_address: 172.23.0.11

 zoo2:
  image: zookeeper:3.4
  restart: always
  hostname: zoo2
  container_name: zoo2
  privileged: true
  ports:
   - 2182:2181
  volumes:
   - ./zoo2/data:/data
   - ./zoo2/datalog:/datalog
  environment:
   TZ: Asia/Shanghai
   ZOO_MY_ID: 2
   ZOO_PORT: 2181
   ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
  networks:
   default:
    ipv4_address: 172.23.0.12

 zoo3:
  image: zookeeper:3.4
  restart: always
  hostname: zoo3
  container_name: zoo3
  privileged: true
  ports:
   - 2183:2181
  volumes:
   - ./zoo3/data:/data
   - ./zoo3/datalog:/datalog
  environment:
   TZ: Asia/Shanghai
   ZOO_MY_ID: 3
   ZOO_PORT: 2181
   ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
  networks:
   default:
    ipv4_address: 172.23.0.13

networks:
 default:
  external:
   name: zoo_kafka

The Kafka docker-compose.yml is as follows; start the cluster with `docker-compose up -d`:

version: '2'

services:
 broker1:
  image: wurstmeister/kafka
  restart: always
  hostname: broker1
  container_name: broker1
  privileged: true
  ports:
   - "9091:9092"
  environment:
   KAFKA_BROKER_ID: 1
   KAFKA_LISTENERS: PLAINTEXT://broker1:9092
   KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker1:9092
   KAFKA_ADVERTISED_HOST_NAME: broker1
   KAFKA_ADVERTISED_PORT: 9092
   KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
   #JMX_PORT: 9987
  volumes:
   - /var/run/docker.sock:/var/run/docker.sock
   - ./broker1:/kafka/kafka-logs-broker1
  external_links:
  - zoo1
  - zoo2
  - zoo3
  networks:
   default:
    ipv4_address: 172.23.0.14

 broker2:
  image: wurstmeister/kafka
  restart: always
  hostname: broker2
  container_name: broker2
  privileged: true
  ports:
   - "9092:9092"
  environment:
   KAFKA_BROKER_ID: 2
   KAFKA_LISTENERS: PLAINTEXT://broker2:9092
   KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker2:9092
   KAFKA_ADVERTISED_HOST_NAME: broker2
   KAFKA_ADVERTISED_PORT: 9092
   KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
   #JMX_PORT: 9988
  volumes:
   - /var/run/docker.sock:/var/run/docker.sock
   - ./broker2:/kafka/kafka-logs-broker2
  external_links: # Connect to containers outside this compose file
  - zoo1
  - zoo2
  - zoo3
  networks:
   default:
    ipv4_address: 172.23.0.15

 broker3:
  image: wurstmeister/kafka
  restart: always
  hostname: broker3
  container_name: broker3
  privileged: true
  ports:
   - "9093:9092"
  environment:
   KAFKA_BROKER_ID: 3
   KAFKA_LISTENERS: PLAINTEXT://broker3:9092
   KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker3:9092
   KAFKA_ADVERTISED_HOST_NAME: broker3
   KAFKA_ADVERTISED_PORT: 9092
   KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
   #JMX_PORT: 9989
  volumes:
   - /var/run/docker.sock:/var/run/docker.sock
   - ./broker3:/kafka/kafka-logs-broker3
  external_links: # Connect to containers outside this compose file
  - zoo1
  - zoo2
  - zoo3
  networks:
   default:
    ipv4_address: 172.23.0.16

 kafka-manager:
  image: sheepkiller/kafka-manager:latest
  restart: always
  container_name: kafka-manager
  hostname: kafka-manager
  ports:
   - "9000:9000"
  links:      # Connect to containers created by this compose file
   - broker1
   - broker2
   - broker3
  external_links:  # Connect to containers outside this compose file
   - zoo1
   - zoo2
   - zoo3
  environment:
   ZK_HOSTS: zoo1:2181,zoo2:2181,zoo3:2181
   KAFKA_BROKERS: broker1:9092,broker2:9092,broker3:9092
   APPLICATION_SECRET: letmein
   KM_ARGS: -Djava.net.preferIPv4Stack=true
  networks:
   default:
    ipv4_address: 172.23.0.10

networks:
 default:
  external:  # Use created network
   name: zoo_kafka

3 Kafka command-line test

The kafka-topics.sh command manages topics and their partitions:

  1. kafka-topics.sh --list --zookeeper zoo1:2181 -- list topics
  2. kafka-topics.sh --zookeeper zoo1:2181 --create --replication-factor 3 --partitions 1 --topic first -- create a topic
    Option description:
    --topic defines the topic name
    --replication-factor defines the number of replicas; it cannot exceed the number of brokers, otherwise an error is reported
    --partitions defines the number of partitions
  3. kafka-topics.sh --describe --topic first --zookeeper zoo1:2181 -- describe a topic
  4. kafka-topics.sh --zookeeper zoo1:2181 --delete --topic first -- delete a topic

Tags: Big Data kafka Distribution

Posted on Thu, 11 Nov 2021 23:26:27 -0500 by will83