03 elastic log system - building Filebeat + Kafka + Logstash + Elasticsearch + Kibana 6.8.0

1. Introduction

An earlier article in this series used Redis as the intermediate cache and message queue to smooth out peaks in the log flow. While log volume was low this worked without any problem, but once the logs kept growing and the Redis queue kept piling up, trouble appeared: Redis is an in-memory database, so when Logstash cannot consume the queue in real time, memory usage keeps climbing until OOM and the system crashes. The rate at which Logstash consumes logs is also a factor. The fix described here is to replace the single-node Redis with a three-node Kafka cluster and to adjust the Elasticsearch startup parameters. The following only describes the Kafka-related configuration and the problems encountered; refer to the previous articles for the other configuration.

2. Preparations


2.1 Software versions

All Elastic components are installed from the 6.8.0 RPM packages
zookeeper: 3.4.14 (download command below)
kafka: 2.11-2.4.0 (download command below)

System version: CentOS Linux release 7.7.1908 (Core)

2.2 Log flow

Filebeat > Kafka cluster > Logstash > Elasticsearch cluster > Kibana

3. Configure zookeeper cluster

We run a zookeeper cluster separate from kafka, although the kafka package also ships with a zookeeper component. Reference: https://www.cnblogs.com/longBlogs/p/10340251.html

Configuration is described below

wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
tar -xvf zookeeper-3.4.14.tar.gz -C /usr/local
cd /usr/local
ln -sv zookeeper-3.4.14 zookeeper
cd zookeeper/conf
cp zoo_sample.cfg zoo.cfg
mkdir -pv /usr/local/zookeeper/{data,logs}

Edit the configuration file zoo.cfg (the file is identical on all three nodes)

# Specify the data folder and log folder (dataDir, dataLogDir)


# Each server.N entry lists two ports: the first (2888 by default) is used for communication between the leader and followers; the second (3888 by default) is used for leader election, both at cluster startup and when the current leader goes down
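Putting those settings together, a minimal zoo.cfg could look like the sketch below. The node addresses zk1/zk2/zk3 are placeholders and must be replaced with the real IPs of the three nodes:

```properties
tickTime=2000
initLimit=10
syncLimit=5
# data and transaction-log folders created earlier with mkdir
dataDir=/usr/local/zookeeper/data
dataLogDir=/usr/local/zookeeper/logs
clientPort=2181
# one entry per node: server.<myid>=<host>:<peer port>:<election port>
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```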

Configure the node id

echo "1" > /usr/local/zookeeper/data/myid   # on node 1; must match the server.1 entry configured above
echo "2" > /usr/local/zookeeper/data/myid   # on node 2; each node is different, must match the server.2 entry configured above
echo "3" > /usr/local/zookeeper/data/myid   # on node 3; each node is different, must match the server.3 entry configured above

Start/stop zookeeper

# start
/usr/local/zookeeper/bin/zkServer.sh start
# stop
/usr/local/zookeeper/bin/zkServer.sh stop
# check status
/usr/local/zookeeper/bin/zkServer.sh status

Configure zookeeper service

# cd /usr/lib/systemd/system
# vim zookeeper.service

[Unit]
Description=zookeeper server daemon
After=network.target

[Service]
Type=forking
ExecStart=/usr/local/zookeeper/bin/zkServer.sh start
ExecReload=/usr/local/zookeeper/bin/zkServer.sh restart
ExecStop=/usr/local/zookeeper/bin/zkServer.sh stop

[Install]
WantedBy=multi-user.target


# systemctl start  zookeeper
# systemctl enable zookeeper

4. Configure kafka cluster

Download and install

wget http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.4.0/kafka_2.11-2.4.0.tgz
tar -xvf kafka_2.11-2.4.0.tgz  -C /usr/local
cd /usr/local
ln -sv kafka_2.11-2.4.0 kafka
cd kafka/config

Modify configuration

# vim server.properties
broker.id=1  # unique id of this broker in the cluster; an integer, different on each of the three nodes
host.name=  # new item, set to this node's IP (deprecated in newer versions in favour of listeners)
num.network.threads=3  # number of threads handling network requests
num.partitions=3  # default number of partitions per topic; more partitions allow more parallel operations
log.dirs=/var/log/kafka  # data (log segment) folder
log.retention.hours=168  # maximum retention time of a segment file in hours; older segments are deleted, i.e. data from more than 7 days ago is cleaned up
log.segment.bytes=1073741824  # size in bytes of each segment file, 1G by default
log.cleaner.enable=true  # enable log cleanup
zookeeper.connect=,,  # addresses of the zookeeper cluster; multiple entries are allowed

A Kafka node uses 1G of heap memory by default. To change it, edit kafka-server-start.sh,
find the KAFKA_HEAP_OPTS line, and modify it, for example:
export KAFKA_HEAP_OPTS="-Xmx2G -Xms2G"
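As an alternative sketch to editing the script: kafka-server-start.sh only applies its built-in default heap (-Xmx1G -Xms1G) when KAFKA_HEAP_OPTS is empty, so the variable can also be exported in the environment before starting the broker:

```shell
# kafka-server-start.sh sets its default heap only when KAFKA_HEAP_OPTS
# is unset, so an exported value takes precedence.
export KAFKA_HEAP_OPTS="-Xmx2G -Xms2G"
echo "$KAFKA_HEAP_OPTS"
# then start the broker as usual:
# ./bin/kafka-server-start.sh -daemon ./config/server.properties
```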

Start kafka

cd /usr/local/kafka
./bin/kafka-server-start.sh -daemon ./config/server.properties

Configure the kafka service

# cd /usr/lib/systemd/system
# vim kafka.service

[Unit]
Description=kafka server daemon
After=network.target zookeeper.service

[Service]
Type=forking
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties
ExecReload=/bin/sh -c '/usr/local/kafka/bin/kafka-server-stop.sh && sleep 2 && /usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties'
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target


# systemctl start kafka
# systemctl enable kafka

Create a topic
Create it with 3 partitions and 3 replicas

cd /usr/local/kafka
./bin/kafka-topics.sh --create --zookeeper,, --replication-factor 3 --partitions 3 --topic java

Frequently used commands

1) Stop kafka
./bin/kafka-server-stop.sh

2) Create a topic
./bin/kafka-topics.sh --create --zookeeper,, --replication-factor 1 --partitions 1 --topic topic_name

Expand the partitions of a topic
./bin/kafka-topics.sh --zookeeper,, --alter --topic java --partitions 40

3) List topics
./bin/kafka-topics.sh --list --zookeeper,,

4) Describe a topic
./bin/kafka-topics.sh --describe --zookeeper,, --topic topic_name

5) Produce messages from the console
./bin/kafka-console-producer.sh --broker-list --topic topic_name

6) Consume messages from the console
./bin/kafka-console-consumer.sh --bootstrap-server,, --topic topic_name

7) Delete a topic
./bin/kafka-topics.sh --delete --topic topic_name --zookeeper,,

8) Inspect the __consumer_offsets topic (per-partition consumer offsets)
./bin/kafka-topics.sh --describe --zookeeper,, --topic __consumer_offsets

5. Configure filebeat output

For details, please refer to https://www.elastic.co/guide/en/beats/filebeat/current/kafka-output.html

output.kafka:
  enabled: true
  hosts: ["","",""]
  topic: java
  required_acks: 1
  compression: gzip
  max_message_bytes: 500000000  # maximum permitted size in bytes of a single event; larger events are dropped

Restart filebeat

systemctl restart filebeat

6. Configure logstash input

Install the kafka input plugin first

/usr/share/logstash/bin/logstash-plugin install logstash-input-kafka

Add a configuration file:

vim /etc/logstash/conf.d/kafka.conf
input {
    kafka {
        bootstrap_servers => ""
        group_id => "java"
        auto_offset_reset => "latest"
        consumer_threads => "5"
        decorate_events => "false"
        topics => ["java"]
        codec => json
    }
}

output {
    elasticsearch {
        hosts => ["","",""]
        user => "elastic"
        password => "changeme"
        index => "logs-other-%{+YYYY.MM.dd}"
        http_compression => true
    }
}
After adding, test the configuration file

/usr/share/logstash/bin/logstash -t -f  /etc/logstash/conf.d/kafka.conf

If the test passes, restart logstash

systemctl restart logstash

7. Possible problems

filebeat errors

  1. WARN producer/broker/0 maximum request accumulated, waiting for space
    Reference: https://linux.xiao5tech.com/bigdata/elk/elk_2.2.1_error_filebeat_kafka_waiting_for_space.html
    Reason: the configured max_message_bytes buffer value is too small

  2. dropping too large message of size
    Reference: https://www.cnblogs.com/zhaosc-haha/p/12133699.html
    Reason: a message exceeded the configured size limit. Reduce the log scan frequency, or check whether the log output is abnormal or unnecessary; oversized logs can seriously hurt kafka performance.
    Setting value: 10000000 (10MB)
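For reference, a sketch of where that 10MB limit would go if it is applied on the broker side (an assumption; the article does not say which side was changed). message.max.bytes and replica.fetch.max.bytes are standard kafka broker settings, and the replica fetch size must be at least as large as the biggest accepted message:

```properties
# kafka server.properties: raise the per-message size limit on the broker
message.max.bytes=10000000
# replicas must be able to fetch the largest accepted message
replica.fetch.max.bytes=10485760
```

On the filebeat side, max_message_bytes in the kafka output should stay at or below the broker's message.max.bytes, otherwise the broker rejects the oversized produce requests.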


Tags: kafka Zookeeper Java Redis

Posted on Sun, 19 Jan 2020 04:46:47 -0500 by XenoPhage