Big Data: Kafka

1 Kafka definition

Kafka is a distributed message queue based on the publish/subscribe model, mainly used for real-time processing of big data.

1.1 Functions of a message queue

1) Decoupling
Producers and consumers can be extended or modified independently, as long as both sides honor the same interface contract.
2) Recoverability
Message queuing reduces the coupling between processes, so the failure of one component does not bring down the whole system. Even if the process handling messages crashes, the messages already in the queue can still be processed after the system recovers.
3) Buffering
Buffering helps control and optimize the speed of data flowing through the system, smoothing out the mismatch between the rate at which messages are produced and the rate at which they are consumed.
4) Flexibility & peak processing power
Applications must keep working when traffic spikes sharply, yet such bursts are uncommon, and keeping resources on permanent standby just for peak load would be a huge waste. With a message queue in between, key components can absorb sudden access pressure without collapsing under the overload.
5) Asynchronous communication
Often users do not want or need to process a message immediately. A message queue provides an asynchronous processing mechanism: put a message on the queue without processing it right away, enqueue as many messages as needed, and process them later.

1.2 Two modes of message queue

(1) Point-to-point mode (one-to-one; consumers actively pull data, and a message is removed once it has been received)
The producer sends messages to a queue, and a consumer takes them out of the queue and consumes them. Once a message has been consumed it is no longer stored in the queue, so it cannot be consumed again.
A queue supports multiple consumers, but each individual message can be consumed by only one of them.
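The point-to-point rule above can be sketched in plain Python with the standard library's `queue.Queue` (a hypothetical toy illustration, not Kafka's API): once one consumer takes a message, it is gone from the queue and no other consumer can see it.

```python
from queue import Queue, Empty

q = Queue()
for i in range(6):
    q.put(f"msg-{i}")  # the producer sends messages to the queue

def drain(consumer_name, q, out):
    # Each consumer pulls until the queue is empty; a message taken
    # by one consumer is removed and never seen by another.
    while True:
        try:
            msg = q.get_nowait()
        except Empty:
            return
        out.append((consumer_name, msg))

received = []
drain("c1", q, received)
drain("c2", q, received)

print(len(received))  # 6: all messages consumed exactly once in total
print(q.empty())      # True: nothing left in the queue to re-consume
```

Note that the total across both consumers is exactly six: the queue delivers each message once, regardless of how many consumers pull from it.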

(2) Publish/subscribe mode (one-to-many; messages are not removed after consumers read them)
The producer (publisher) publishes messages to a topic, and multiple consumers (subscribers) consume the same messages at the same time.
There are two variants of publish/subscribe:
1. Consumers actively pull data.
2. The queue actively pushes data. Since consumers consume at different rates, pushing can waste resources on fast consumers and overwhelm slow ones, possibly to the point of collapse.
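The pull variant of publish/subscribe can be sketched as follows (again a toy model, not Kafka code; the `Topic` class and its methods are invented for illustration): messages are retained in a log, every subscriber sees every message, and each subscriber reads from its own position at its own pace.

```python
class Topic:
    """Toy topic: retains all messages; each subscriber tracks its own offset."""
    def __init__(self):
        self.log = []      # messages are appended, never deleted on consume
        self.offsets = {}  # subscriber name -> next index to read

    def publish(self, msg):
        self.log.append(msg)

    def poll(self, subscriber):
        # Pull model: the subscriber reads from its own offset at its own pace,
        # so a slow subscriber is never overwhelmed by pushes.
        start = self.offsets.get(subscriber, 0)
        msgs = self.log[start:]
        self.offsets[subscriber] = len(self.log)
        return msgs

t = Topic()
for i in range(3):
    t.publish(f"event-{i}")

a = t.poll("alice")
b = t.poll("bob")  # bob still sees every message alice already read
print(a == b == ["event-0", "event-1", "event-2"])  # True
```

Contrast this with the point-to-point queue: here consuming a message does not remove it, so both subscribers receive the full stream.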

2 Kafka

2.1 Kafka architecture

  1. Producers produce messages.
  2. The Kafka cluster manages messages.
    1. Each Kafka broker is a process running on a server.
    2. The cluster partitions messages according to their topics.
    3. Kafka keeps redundant copies of data on different servers: each partition has a leader and followers (data backup).
  3. Consumers consume messages.
    1. Consumer group: within a consumer group, a partition can be consumed by only one consumer. Consumer groups improve consumption capacity.
  4. ZooKeeper stores metadata.
    1. Brokers register themselves in ZooKeeper.
    2. Consumer offsets are stored so that a consumer that crashes can resume where it left off. Before version 0.9 they were kept in ZooKeeper; since 0.9 they are saved in an internal Kafka topic and persisted to disk (retained 7 days by default).
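The consumer-group rule above can be sketched with a small round-robin assignment function (a hypothetical helper similar in spirit to Kafka's RoundRobinAssignor, not its actual API): partitions are spread across the group's members, and no partition is ever given to two consumers in the same group.

```python
def assign_partitions(partitions, consumers):
    # Round-robin: partition i goes to consumer i mod len(consumers),
    # so each partition belongs to exactly one consumer in the group.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = ["first-0", "first-1", "first-2", "first-3"]
groups = assign_partitions(partitions, ["c1", "c2"])
print(groups)  # {'c1': ['first-0', 'first-2'], 'c2': ['first-1', 'first-3']}

# With more consumers than partitions, the extra consumers sit idle,
# which is why adding consumers beyond the partition count does not help.
print(assign_partitions(["first-0"], ["c1", "c2"]))  # {'c1': ['first-0'], 'c2': []}
```

This also shows why the partition count caps a group's parallelism: consumption capacity grows with group size only up to the number of partitions.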

2.2 Deploying a Kafka cluster with Docker

2.2.1 Installing docker-compose

1. sudo curl -L "https://github.com/docker/compose/releases/download/1.28.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
2. chmod +x /usr/local/bin/docker-compose

2.2.2 Kafka cluster installation

1. Create a virtual network:
docker network create --driver bridge --subnet 172.23.0.0/16 --gateway 172.23.0.1 zoo_kafka
2. The ZooKeeper docker-compose.yml is as follows:

version: '2'
services:
 zoo1:
  image: zookeeper:3.4 # Image name
  restart: always # Automatic restart in case of error
  hostname: zoo1
  container_name: zoo1
  privileged: true
  ports: # port
   - 2181:2181
  volumes: # Mount data volume
   - ./zoo1/data:/data
   - ./zoo1/datalog:/datalog 
  environment:
   TZ: Asia/Shanghai
   ZOO_MY_ID: 1 # Node ID
   ZOO_PORT: 2181 # zookeeper port number
   ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888 # zookeeper node list
  networks:
   default:
    ipv4_address: 172.23.0.11

 zoo2:
  image: zookeeper:3.4
  restart: always
  hostname: zoo2
  container_name: zoo2
  privileged: true
  ports:
   - 2182:2181
  volumes:
   - ./zoo2/data:/data
   - ./zoo2/datalog:/datalog
  environment:
   TZ: Asia/Shanghai
   ZOO_MY_ID: 2
   ZOO_PORT: 2181
   ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
  networks:
   default:
    ipv4_address: 172.23.0.12

 zoo3:
  image: zookeeper:3.4
  restart: always
  hostname: zoo3
  container_name: zoo3
  privileged: true
  ports:
   - 2183:2181
  volumes:
   - ./zoo3/data:/data
   - ./zoo3/datalog:/datalog
  environment:
   TZ: Asia/Shanghai
   ZOO_MY_ID: 3
   ZOO_PORT: 2181
   ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
  networks:
   default:
    ipv4_address: 172.23.0.13

networks:
 default:
  external:
   name: zoo_kafka

The Kafka docker-compose.yml is as follows; start the cluster with `docker-compose up -d`:

version: '2'

services:
 broker1:
  image: wurstmeister/kafka
  restart: always
  hostname: broker1
  container_name: broker1
  privileged: true
  ports:
   - "9091:9092"
  environment:
   KAFKA_BROKER_ID: 1
   KAFKA_LISTENERS: PLAINTEXT://broker1:9092
   KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker1:9092
   KAFKA_ADVERTISED_HOST_NAME: broker1
   KAFKA_ADVERTISED_PORT: 9092
   KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
   #JMX_PORT: 9987
  volumes:
   - /var/run/docker.sock:/var/run/docker.sock
   - ./broker1:/kafka/kafka-logs-broker1
  external_links:
  - zoo1
  - zoo2
  - zoo3
  networks:
   default:
    ipv4_address: 172.23.0.14

 broker2:
  image: wurstmeister/kafka
  restart: always
  hostname: broker2
  container_name: broker2
  privileged: true
  ports:
   - "9092:9092"
  environment:
   KAFKA_BROKER_ID: 2
   KAFKA_LISTENERS: PLAINTEXT://broker2:9092
   KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker2:9092
   KAFKA_ADVERTISED_HOST_NAME: broker2
   KAFKA_ADVERTISED_PORT: 9092
   KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
   #JMX_PORT: 9988
  volumes:
   - /var/run/docker.sock:/var/run/docker.sock
   - ./broker2:/kafka/kafka-logs-broker2
  external_links: # Connect to containers outside this compose file
  - zoo1
  - zoo2
  - zoo3
  networks:
   default:
    ipv4_address: 172.23.0.15

 broker3:
  image: wurstmeister/kafka
  restart: always
  hostname: broker3
  container_name: broker3
  privileged: true
  ports:
   - "9093:9092"
  environment:
   KAFKA_BROKER_ID: 3
   KAFKA_LISTENERS: PLAINTEXT://broker3:9092
   KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker3:9092
   KAFKA_ADVERTISED_HOST_NAME: broker3
   KAFKA_ADVERTISED_PORT: 9092
   KAFKA_ZOOKEEPER_CONNECT: zoo1:2181,zoo2:2181,zoo3:2181
   #JMX_PORT: 9989
  volumes:
   - /var/run/docker.sock:/var/run/docker.sock
   - ./broker3:/kafka/kafka-logs-broker3
  external_links: # Connect to containers outside this compose file
  - zoo1
  - zoo2
  - zoo3
  networks:
   default:
    ipv4_address: 172.23.0.16

 kafka-manager:
  image: sheepkiller/kafka-manager:latest
  restart: always
  container_name: kafka-manager
  hostname: kafka-manager
  ports:
   - "9000:9000"
  links:      # Connect to containers created by this compose file
   - broker1
   - broker2
   - broker3
  external_links:  # Connect to containers outside this compose file
   - zoo1
   - zoo2
   - zoo3
  environment:
   ZK_HOSTS: zoo1:2181,zoo2:2181,zoo3:2181
   KAFKA_BROKERS: broker1:9092,broker2:9092,broker3:9092
   APPLICATION_SECRET: letmein
   KM_ARGS: -Djava.net.preferIPv4Stack=true
  networks:
   default:
    ipv4_address: 172.23.0.10

networks:
 default:
  external:  # Use created network
   name: zoo_kafka

3 Kafka command-line test

The kafka-topics.sh command manages topics and their partitions:

  1. kafka-topics.sh --list --zookeeper zoo1:2181 -- list topics
  2. kafka-topics.sh --zookeeper zoo1:2181 --create --replication-factor 3 --partitions 1 --topic first -- create a topic
    Option description:
    --topic defines the topic name
    --replication-factor defines the number of replicas; it cannot exceed the number of brokers, otherwise an error is reported
    --partitions defines the number of partitions
  3. kafka-topics.sh --describe --topic first --zookeeper zoo1:2181 -- describe a topic
  4. kafka-topics.sh --zookeeper zoo1:2181 --delete --topic first -- delete a topic

Tags: Big Data kafka Distribution

Posted on Thu, 11 Nov 2021 23:26:27 -0500 by will83