03 Elastic log system - Filebeat + Kafka + Logstash + Elasticsearch + Kibana 6.8.0 build process

1. Introduction

In a previous article, redis was used as the intermediate cache and message queue to smooth out peaks in the log flow. This worked fine while the log volume was small, but once the volume kept growing and the redis queue kept piling up, the problems started. Redis is an in-memory database: when logs cannot be consumed in real time, memory usage keeps climbing until OOM and the system crashes. The rate at which logstash consumes logs is also a factor. The approach taken here is to replace the single-node redis with a three-node kafka cluster and adjust the elasticsearch startup parameters. The following only covers the kafka configuration and the problems encountered; refer to the previous articles for the other components.

2. Preparations

Node:
192.168.72.56
192.168.72.57
192.168.72.58

2.1 Software versions

All elastic components are installed from the 6.8.0 rpm packages
zookeeper: 3.4.14 (download link in section 3)
kafka: 2.11-2.4.0 (download link in section 4)

System version: CentOS Linux release 7.7.1908 (Core)

2.2 Log flow

Filebeat -> Kafka cluster -> Logstash -> Elasticsearch cluster -> Kibana

3. Configure zookeeper cluster

We use a zookeeper cluster that is separate from kafka (the kafka installation package also ships with a built-in zookeeper). Reference: https://www.cnblogs.com/longBlogs/p/10340251.html

The configuration is described below.

wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
tar -xvf zookeeper-3.4.14.tar.gz -C /usr/local
cd /usr/local
ln -sv zookeeper-3.4.14 zookeeper
cd zookeeper/conf
cp zoo_sample.cfg zoo.cfg
mkdir -pv /usr/local/zookeeper/{data,logs}

Edit the configuration file zoo.cfg on node 1 (the same file is used on all three nodes)

# Specify data folder, log folder
dataDir=/usr/local/zookeeper/data
dataLogDir=/usr/local/zookeeper/logs

clientPort=2181

server.1=192.168.72.56:2888:3888
server.2=192.168.72.57:2888:3888
server.3=192.168.72.58:2888:3888
# For each server line, the first port (2888 by default) is used for follower-to-leader communication; the second port (3888 by default) is used for leader election, both at cluster startup and whenever the current leader goes down
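
Since zoo.cfg is the same on every node, one option is to copy the prepared directory from node 1 to the other two nodes and then set each node's myid (next step). A sketch, assuming root SSH access between the nodes:

scp -r /usr/local/zookeeper-3.4.14 root@192.168.72.57:/usr/local/
scp -r /usr/local/zookeeper-3.4.14 root@192.168.72.58:/usr/local/
# Recreate the symlink on each target node
ssh root@192.168.72.57 "ln -sv /usr/local/zookeeper-3.4.14 /usr/local/zookeeper"
ssh root@192.168.72.58 "ln -sv /usr/local/zookeeper-3.4.14 /usr/local/zookeeper"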

Configure the node id (myid)

echo "1" > /usr/local/zookeeper/data/myid   #server1 configuration, different nodes, the same number as server.1 configured above
echo "2" > /usr/local/zookeeper/data/myid   #server2 configuration. Each node is different. It is the same as the number of server.2 configured above
echo "3" > /usr/local/zookeeper/data/myid   #server3 configuration. Each node is different. It is the same as the number of server.3 configured above

Start / stop zookeeper

# Start
/usr/local/zookeeper/bin/zkServer.sh start
# Stop
/usr/local/zookeeper/bin/zkServer.sh stop
# Check status
/usr/local/zookeeper/bin/zkServer.sh status
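
With all three nodes started, zkServer.sh status is a quick sanity check that the cluster has formed: one node should report itself as leader and the other two as followers (which node wins the election can vary).

/usr/local/zookeeper/bin/zkServer.sh status
# Typical output on a healthy follower node:
# ZooKeeper JMX enabled by default
# Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
# Mode: follower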

Configure zookeeper as a systemd service

cd /usr/lib/systemd/system
# vim zookeeper.service

=========================================
[Unit]
Description=zookeeper server daemon
After=network.target

[Service]
Type=forking
ExecStart=/usr/local/zookeeper/bin/zkServer.sh start
ExecReload=/bin/bash -c '/usr/local/zookeeper/bin/zkServer.sh stop && sleep 2 && /usr/local/zookeeper/bin/zkServer.sh start'
ExecStop=/usr/local/zookeeper/bin/zkServer.sh stop
Restart=always

[Install]
WantedBy=multi-user.target
=======================================================
# systemctl start  zookeeper
# systemctl enable zookeeper

4. Configure kafka cluster

Download and install

wget http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.4.0/kafka_2.11-2.4.0.tgz
tar -xvf kafka_2.11-2.4.0.tgz  -C /usr/local
cd /usr/local
ln -sv kafka_2.11-2.4.0 kafka
cd kafka/config

Modify configuration

# vim server.properties
broker.id=1 # Unique id of this broker in the cluster; must be a positive integer and different on each of the three nodes
host.name=192.168.72.56 # Added entry: the IP of this node
num.network.threads=3 # Number of threads the broker uses to handle network requests
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/var/log/kafka  # Directory where kafka stores its message data (log segments), not its application logs
num.partitions=3 # Default number of partitions per topic; more partitions allow more parallel consumption
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168 # Maximum retention time of a log segment, in hours; segments older than this (7 days) are deleted
log.segment.bytes=1073741824  # Maximum size of each log segment in bytes (1G by default)
log.retention.check.interval.ms=300000
log.cleaner.enable=true # Enable log cleanup
zookeeper.connect=192.168.72.56:2181,192.168.72.57:2181,192.168.72.58:2181 # zookeeper cluster addresses (comma-separated)
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
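
Only broker.id and host.name change between the three brokers; the rest of server.properties stays the same. For reference, the values assumed here for nodes 2 and 3:

# 192.168.72.57
broker.id=2
host.name=192.168.72.57

# 192.168.72.58
broker.id=3
host.name=192.168.72.58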

A kafka broker uses a 1G heap by default. To change the memory, edit kafka-server-start.sh and modify the KAFKA_HEAP_OPTS setting, for example:
export KAFKA_HEAP_OPTS="-Xmx2G -Xms2G"

Start kafka

cd /usr/local/kafka
./bin/kafka-server-start.sh -daemon ./config/server.properties
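
Once the broker has been started on every node, you can check that all three registered with zookeeper. A minimal check using the zookeeper-shell.sh script that ships with kafka (broker ids follow the server.properties above):

cd /usr/local/kafka
./bin/zookeeper-shell.sh 192.168.72.56:2181 ls /brokers/ids
# A healthy cluster lists all three broker ids, e.g. [1, 2, 3]

# You can also confirm the broker is listening locally on 9092
ss -tlnp | grep 9092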

Set kafka to start at boot (systemd service)

# cd /usr/lib/systemd/system
# vim kafka.service
=========================================
[Unit]
Description=kafka server daemon
After=network.target zookeeper.service

[Service]
Type=forking
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties
ExecReload=/bin/bash -c '/usr/local/kafka/bin/kafka-server-stop.sh && sleep 2 && /usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties'
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
Restart=always

[Install]
WantedBy=multi-user.target
=======================================================

# systemctl start kafka
# systemctl enable kafka

Create a topic
Create it with 3 partitions and 3 replicas

cd /usr/local/kafka
./bin/kafka-topics.sh --create --zookeeper 192.168.72.56:2181,192.168.72.57:2181,192.168.72.58:2181 --replication-factor 3 --partitions 3 --topic java
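
You can confirm the partition and replica layout right away with --describe (the same command appears in the list below):

./bin/kafka-topics.sh --describe --zookeeper 192.168.72.56:2181,192.168.72.57:2181,192.168.72.58:2181 --topic java
# Each of the 3 partitions should show a leader plus 3 replicas spread over brokers 1, 2 and 3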

Frequently used commands

1) Stop kafka
./bin/kafka-server-stop.sh 

2) Create topic
./bin/kafka-topics.sh --create --zookeeper 192.168.72.56:2181,192.168.72.57:2181,192.168.72.58:2181 --replication-factor 1 --partitions 1 --topic topic_name

Partition expansion
./bin/kafka-topics.sh --zookeeper 192.168.72.56:2181,192.168.72.57:2181,192.168.72.58:2181 --alter --topic java --partitions 40

3) List topics
./bin/kafka-topics.sh --list --zookeeper 192.168.72.56:2181,192.168.72.57:2181,192.168.72.58:2181

4) Describe a topic
./bin/kafka-topics.sh --describe --zookeeper 192.168.72.56:2181,192.168.72.57:2181,192.168.72.58:2181 --topic topic_name

5) Produce messages
./bin/kafka-console-producer.sh --broker-list 192.168.72.56:9092 --topic topic_name

6) Consume messages
./bin/kafka-console-consumer.sh --bootstrap-server 192.168.72.56:9092,192.168.72.57:9092,192.168.72.58:9092 --topic topic_name

7) Delete topic
./bin/kafka-topics.sh --delete --topic topic_name --zookeeper 192.168.72.56:2181,192.168.72.57:2181,192.168.72.58:2181

8) Describe the __consumer_offsets topic (shows which brokers host each partition of committed consumer offsets)
./bin/kafka-topics.sh --describe --zookeeper 192.168.72.56:2181,192.168.72.57:2181,192.168.72.58:2181 --topic __consumer_offsets
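
Putting commands 5) and 6) together gives a quick end-to-end smoke test of the cluster: produce a line on one node and watch it arrive on another (Ctrl+C exits both tools).

# On node 1: each line you type is sent as one message
./bin/kafka-console-producer.sh --broker-list 192.168.72.56:9092 --topic java
# On node 2: read the topic from the beginning
./bin/kafka-console-consumer.sh --bootstrap-server 192.168.72.57:9092 --topic java --from-beginning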

5. Configure filebeat output

For details, please refer to https://www.elastic.co/guide/en/beats/filebeat/current/kafka-output.html

output.kafka:
  enabled: true
  hosts: ["192.168.72.56:9092","192.168.72.56:9092","192.168.72.56:9092"]
  topic: java
  required_acks: 1
  compression: gzip
  max_message_bytes: 500000000  # Maximum permitted size of a single JSON-encoded message in bytes; larger events are dropped

Restart filebeat

systemctl restart filebeat
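
To confirm that filebeat is really publishing to kafka, consume a few events from the java topic on any broker; each message should be a JSON document produced by filebeat (--max-messages just limits the output for a quick look):

cd /usr/local/kafka
./bin/kafka-console-consumer.sh --bootstrap-server 192.168.72.56:9092 --topic java --from-beginning --max-messages 5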

6. Configure logstash input

Install the kafka input plugin first

/usr/share/logstash/bin/logstash-plugin install logstash-input-kafka

Add the configuration file:

vim /etc/logstash/conf.d/kafka.conf
======================
input {
    kafka {
        bootstrap_servers => "192.168.72.56:9092,192.168.72.57:9092,192.168.72.58:9092"
        group_id => "java"
        auto_offset_reset => "latest"
        consumer_threads => "5"
        decorate_events => "false"
        topics => ["java"]
        codec => json
    }
}

output {
    elasticsearch {
        hosts => ["192.168.72.56:9200","192.168.72.57:9200","192.168.72.58:9200"]
        user => "elastic"
        password => "changme"
        index => "logs-other-%{+YYYY.MM.dd}"
        http_compression => true
  }
}

After adding, test the configuration file

/usr/share/logstash/bin/logstash -t -f  /etc/logstash/conf.d/kafka.conf

If the test passes, restart logstash.
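
For example, restart it via systemd and then check that the daily index is being created in elasticsearch (credentials taken from the output section above; adjust if yours differ):

systemctl restart logstash
# After some events have flowed through, the index should appear
curl -u elastic:changme 'http://192.168.72.56:9200/_cat/indices/logs-other-*?v'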

7. Possible problems

Filebeat errors

  1. WARN producer/broker/0 maximum request accumulated, waiting for space
    Reference: https://linux.xiao5tech.com/bigdata/elk/elk_2.2.1_error_filebeat_kafka_waiting_for_space.html
    Reason: the max_message_bytes buffer value is configured too small

  2. dropping too large message of size
    Reference: https://www.cnblogs.com/zhaosc-haha/p/12133699.html
    Reason: the size of a message exceeds the configured limit. Adjust the log scan frequency, or check whether the application is emitting abnormally large or unnecessary log output; oversized messages seriously hurt kafka performance.
    Setting used: 10000000 (10MB)
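
If you do raise the message size limit, keep the filebeat producer setting and the broker setting in agreement, otherwise the broker still rejects large batches. A sketch at the 10MB value mentioned above (replica.fetch.max.bytes must be at least message.max.bytes so replicas can copy the large messages):

# filebeat.yml, output.kafka section
max_message_bytes: 10000000

# server.properties on every kafka broker
message.max.bytes=10000000
replica.fetch.max.bytes=10485760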

