A detailed guide to ELK, the enterprise log analysis system

Detailed explanation of ELK theory
ELK deployment
ELK details


  • Log analysis is the main way operations engineers find problems and resolve system faults. Logs mainly include system logs, application logs, and security logs.
  • A large-scale system usually has a distributed architecture, with different service modules deployed on different servers. When a problem occurs, the key information it exposes has to be traced to a specific server and service module. A centralized log system makes locating the problem much faster.
  • Analyzing logs regularly reveals the load, performance, and security of each server, so that errors can be corrected in time. Logs are usually scattered across different devices. If you manage dozens or hundreds of servers and still log in to each machine in turn, the work is cumbersome and inefficient. Centralized log management, for example with the open-source syslog, collects the logs from all servers in one place.
  • Once logs are centralized, searching and counting them becomes the next difficulty. Linux commands such as grep, awk, and wc handle simple retrieval and statistics, but they fall short when queries, sorting, and statistics become more demanding and the number of machines is large.
  • The open-source real-time log analysis platform ELK addresses these problems. ELK consists of ElasticSearch, Logstash, and Kibana.
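As a point of comparison, the grep/awk/wc approach mentioned above can be sketched as follows. This is a minimal illustration, not part of the original setup: the function name count_status and the three sample log lines are made up, and the log is assumed to be in the Apache common log format, where the status code is the ninth whitespace-separated field.

```shell
#!/bin/sh
# Count requests per HTTP status code in an access log.
# Assumes the common log format, where field 9 is the status code.
count_status() {
    awk '{ counts[$9]++ } END { for (s in counts) print s, counts[s] }' "$1" | sort
}

# Build a tiny sample log (made-up lines) so the function can be tried locally.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
192.168.10.1 - - [19/Nov/2021:22:24:31 +0800] "GET / HTTP/1.1" 200 4897
192.168.10.1 - - [19/Nov/2021:22:24:32 +0800] "GET /missing HTTP/1.1" 404 209
192.168.10.2 - - [19/Nov/2021:22:24:33 +0800] "GET / HTTP/1.1" 200 4897
EOF

count_status "$sample_log"          # one line per status code: "200 2" and "404 1"
grep -c ' 404 ' "$sample_log"       # number of lines with a 404 status
```

This works for one file on one machine. Repeating it across many servers and combining the results is the part that becomes hard to manage by hand.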

1, ELK log analysis system

1. ELK introduction

   ELK is a complete centralized log processing solution. It combines ElasticSearch, Logstash, and Kibana to meet more demanding requirements for log querying, sorting, and statistics.

2. Component description

2.1 ElasticSearch

  • ES is a distributed storage and retrieval engine built on Lucene (a full-text search library). It is used to store all kinds of logs.
  • ES is written in Java. Users can communicate with ES from a browser through its RESTful web interface.
  • As a distributed search and analytics engine, its strength is storing, searching, and analyzing large volumes of data in near real time.

2.2 Logstash

  • Logstash acts as the data collection engine. It collects data dynamically from a variety of sources, filters, parses, enriches, and normalizes it, and then stores it at a destination chosen by the user, usually ES.
  • Logstash is written in JRuby and runs on the Java virtual machine (JVM). It is a powerful data processing tool that handles data transport, format processing, and formatted output. Logstash has a rich plug-in system and is commonly used for log processing.

2.3 Kibana

  • Kibana is a visualization tool built on Node.js. It provides a graphical web interface for analyzing the logs handled by Logstash and ES, and can summarize, analyze, and search important log data.

2.4 Filebeat

  • Filebeat is a lightweight open-source log file collector. You typically install Filebeat on the client whose data you want to collect and specify the directories and log format. Filebeat then collects the data and sends it to Logstash for parsing, or directly to ES for storage. It is much lighter than Logstash, which runs on the JVM, and can replace Logstash for log collection.

The Beats family for centralized log management includes four tools:
Packetbeat (collects network traffic data)
Topbeat (collects data such as CPU and memory usage at the system, process, and file system levels)
Filebeat (collects file data)
Winlogbeat (collects Windows event log data)

3. Basic characteristics of complete log system

  • Collection: collects log data from multiple sources
  • Transmission: parses, filters, and reliably transmits the log data to the storage system
  • Storage: stores the log data
  • Analysis: supports analysis through a UI
  • Alerting: provides error reporting and monitoring mechanisms

4. Working principle of ELK

  • AppServer is a cluster of servers such as Nginx or Apache, whose log information is collected by Logstash
  • To reduce the bottleneck caused by network problems, the Logstash service is usually deployed on the AppServer cluster itself, which cuts network consumption
  • Logstash formats the collected log data and transfers it to the ES database (this is the centralized log management step)
  • ES then indexes and stores the formatted log data
  • Finally, Kibana displays it to the client

2, Deploy ELK log analysis system

1. Server configuration

Server        Configuration   Host name   IP address        Main software
node1 node    2C/4G           node1       192.168.10.100    ElasticSearch, Kibana
node2 node    2C/4G           node2       192.168.10.101    ElasticSearch
apache node   -               apache      192.168.10.102    Logstash, Apache

2. Turn off the firewall

systemctl stop firewalld && systemctl disable firewalld
setenforce 0
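Turning the firewall off is the simplest choice for a lab. As an alternative sketch, the ports used in this guide could be opened individually with firewall-cmd. The helper below only prints the commands so that they can be reviewed first. The function name firewall_rules is made up, and the port list is an assumption based on this guide (9200 for ES HTTP, 9100 for elasticsearch-head, 5601 for Kibana, 5044 for Beats) plus 9300, the default ES transport port.

```shell
#!/bin/sh
# Print the firewall-cmd invocations needed to open the given TCP ports.
# Nothing is changed on the system; the commands are only written to stdout.
firewall_rules() {
    for port in "$@"; do
        printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$port"
    done
    printf 'firewall-cmd --reload\n'
}

# Ports assumed for this guide: ES HTTP/transport, head, Kibana, Beats input.
firewall_rules 9200 9300 9100 5601 5044
```

After reviewing the output, it can be piped to sh as root on each node, keeping only the ports that node actually serves.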

3. ElasticSearch cluster deployment (node1, node2)

3.1 environmental preparation

Take node1 as an example

[root@localhost ~]# hostnamectl set-hostname node1
[root@localhost ~]# su
[root@node1 ~]# echo " node1" >> /etc/hosts
[root@node1 ~]# echo " node2" >> /etc/hosts
[root@node1 ~]# java -version	#openjdk is not recommended

# rpm installation jdk (method 1)
cd /opt
#Transfer the software package to this directory
rpm -ivh jdk-8u201-linux-x64.rpm

vim /etc/profile.d/
export JAVA_HOME=/usr/java/jdk1.8.0_201-amd64
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$PATH
#1. The first export defines the Java working directory (JAVA_HOME)
#2. The second export specifies the class files Java needs (CLASSPATH)
#3. The third export redefines PATH; $JAVA_HOME/bin must come before $PATH so that the system finds this JDK's java first

source /etc/profile.d/
java -version

# tarball installation jdk (method 2)
cd /opt
tar zxvf jdk-8u91-linux-x64.tar.gz -C /usr/local
mv /usr/local/jdk1.8.0_91/ /usr/local/jdk

vim /etc/profile
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

source /etc/profile
java -version
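The host name mapping step at the start of this section appends lines to /etc/hosts with echo, which adds duplicates if it is run twice. A minimal sketch of an idempotent variant is shown below. The function name add_host_entry is made up, and the IP addresses are taken from the server table above on the assumption that node1 and node2 use them. The example writes to a scratch file rather than the real /etc/hosts.

```shell
#!/bin/sh
# Append "<ip> <name>" to a hosts file unless that exact line already exists.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "$ip $name" "$hosts_file"; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Try it on a scratch file instead of the real /etc/hosts.
scratch=$(mktemp)
add_host_entry "$scratch" 192.168.10.100 node1
add_host_entry "$scratch" 192.168.10.101 node2
add_host_entry "$scratch" 192.168.10.100 node1   # repeated call adds nothing
cat "$scratch"
```

To apply it for real, pass /etc/hosts as the first argument while running as root.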

3.2 deployment of ElasticSearch software

3.2.1 installing elasticsearch RPM package

Take node1 as an example

[root@node1 ~]# cd /opt
[root@node1 opt]# rz -E
#Upload elasticsearch-5.5.0.rpm to the /opt directory
rz waiting to receive.
[root@node1 opt]# rpm -ivh elasticsearch-5.5.0.rpm 

3.2.2 loading system services

Take node1 as an example

systemctl daemon-reload && systemctl enable elasticsearch.service

3.2.3 modify the elasticsearch master configuration file

Take node1 as an example

[root@node1 opt]# cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak
#Back up the configuration file
[root@node1 opt]# vim /etc/elasticsearch/elasticsearch.yml
##Line 17, uncomment and specify the cluster name
cluster.name: my-elk-cluster
##Line 23, uncomment and specify the node name (node1 on node1, node2 on node2)
node.name: node1
##Line 33, uncomment and specify the data storage path
path.data: /data/elk_data
##Line 37, uncomment and specify the log storage path
path.logs: /var/log/elasticsearch/
##Line 43, uncomment; do not lock memory at startup
bootstrap.memory_lock: false
##Line 55, uncomment and set the listening address; 0.0.0.0 represents all addresses
network.host: 0.0.0.0
##Line 59, uncomment. The default listening port of the ES service is 9200
http.port: 9200
##Line 68, uncomment. Cluster discovery uses unicast; list the nodes to be discovered, node1 and node2
discovery.zen.ping.unicast.hosts: ["node1", "node2"]
[root@node1 opt]# grep -v "^#" /etc/elasticsearch/elasticsearch.yml
cluster.name: my-elk-cluster
node.name: node1
path.data: /data/elk_data
path.logs: /var/log/elasticsearch/
bootstrap.memory_lock: false
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["node1", "node2"]

scp /etc/elasticsearch/elasticsearch.yml root@
#Transfer the configured file to node2 with scp, and then just change the node name

3.2.4 create data storage path and authorize

Take node1 as an example

[root@node1 opt]# mkdir -p /data/elk_data
[root@node1 opt]# chown elasticsearch:elasticsearch /data/elk_data/

3.2.5 start elasticsearch

Take node1 as an example

[root@node1 opt]# systemctl start elasticsearch.service 
[root@node1 opt]# netstat -natp | grep 9200		#Startup is slow; wait a moment before checking
tcp6       0      0 :::9200                 :::*                    LISTEN      4216/java           
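Because Elasticsearch takes a while to start listening, a small retry helper can replace checking the port by hand. The sketch below is a generic wrapper, not something from the original setup: the name wait_until is made up, and it simply reruns any command once per second until it succeeds or the timeout is reached.

```shell
#!/bin/sh
# Retry a command once per second until it succeeds or the timeout expires.
# Usage: wait_until <timeout_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if the timeout is reached.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

For example, assuming curl is installed, `wait_until 60 curl -sf http://localhost:9200 && echo "ES is up"` waits up to about a minute for the HTTP port to answer.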

3.2.6 viewing node information

In a browser, open port 9200 on node1 and on node2 to view the information for each node.

In a browser, open the /_cluster/health?pretty path on port 9200 to view the health of the cluster. A status value of green indicates that the cluster is running healthily.

In a browser, open the /_cluster/state?pretty path on port 9200 to check the cluster state information.

Viewing the cluster status this way is not user-friendly. Installing the elasticsearch-head plug-in makes the cluster easier to manage.
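For scripted checks, the status can also be read from the cluster health response in the shell. The sketch below extracts the "status" field with sed. The function name health_status is made up, and the JSON in the example is an abbreviated, hand-written sample in the shape of the health response, not output captured from a real cluster.

```shell
#!/bin/sh
# Read a cluster health JSON document on stdin and print the value of "status"
# (green, yellow, or red). Uses sed only, so no JSON tooling is required.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Abbreviated, hand-written sample of a pretty-printed health response.
sample_health='{
  "cluster_name" : "my-elk-cluster",
  "status" : "green",
  "number_of_nodes" : 2
}'

printf '%s\n' "$sample_health" | health_status    # prints: green
```

Against a running node, and assuming curl is installed, the same function can be fed with `curl -s 'http://localhost:9200/_cluster/health?pretty' | health_status`.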

3.3 install elasticsearch head plug-in

Since ES 5.0, the plug-in has to be installed as a standalone service using npm (the Node.js package manager). elasticsearch-head depends on node and phantomjs, which must be installed first.

  • node
    A JavaScript runtime built on the Chrome V8 engine.
  • phantomjs
    A WebKit-based JavaScript API, which can be thought of as an invisible (headless) browser. It can do anything a WebKit-based browser can do.

3.3.1 compiling and installing node

Take node1 as an example

[root@node1 ~]# cd /opt
[root@node1 opt]# rz -E
#Upload the software package node-v8.2.1.tar.gz to the /opt directory
rz waiting to receive.
[root@node1 opt]# yum install -y gcc gcc-c++ make
[root@node1 opt]# tar zxvf node-v8.2.1.tar.gz 
[root@node1 opt]# cd node-v8.2.1/
[root@node1 node-v8.2.1]# ./configure
[root@node1 node-v8.2.1]# make -j 4 && make install
#Compilation time is very long

3.3.2 installation of phantomjs

Take node1 as an example

[root@node1 node-v8.2.1]# cd /opt
[root@node1 opt]# rz -E
#Upload the software package phantomjs-2.1.1-linux-x86_64.tar.bz2 to the /opt directory
rz waiting to receive.
[root@node1 opt]# tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /usr/local/src
[root@node1 opt]# cd /usr/local/src/phantomjs-2.1.1-linux-x86_64/bin
[root@node1 bin]# cp phantomjs /usr/local/bin

3.3.3 install elasticsearch head data visualization tool

Take node1 as an example

[root@node1 bin]# cd /opt
[root@node1 opt]# rz -E
#Upload the package elasticsearch-head.tar.gz to the /opt directory
rz waiting to receive.
[root@node1 opt]# tar zxvf elasticsearch-head.tar.gz -C /usr/local/src/
[root@node1 opt]# cd /usr/local/src/elasticsearch-head/
[root@node1 elasticsearch-head]# npm install

3.3.4 modify Elasticsearch master configuration file

Take node1 as an example

[root@node1 elasticsearch-head]# vim /etc/elasticsearch/elasticsearch.yml
##Add the following to the last line
http.cors.enabled: true			##Enable cross-domain access support. The default value is false
http.cors.allow-origin: "*"		##Allow cross-domain access from all domain names and addresses
[root@node1 elasticsearch-head]# systemctl restart elasticsearch.service
[root@node1 elasticsearch-head]# netstat -antp | grep 9200

3.3.5 start elasticsearch head service

Take node1 as an example

[root@node1 elasticsearch-head]# cd /usr/local/src/elasticsearch-head/
[root@node1 elasticsearch-head]# npm run start &
[1] 71012
> elasticsearch-head@0.0.0 start /usr/local/src/elasticsearch-head
> grunt server
Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://localhost:9100
[root@node1 elasticsearch-head]# netstat -natp | grep 9100
tcp        0      0  *               LISTEN      71022/grunt         

Note: the service must be started in the extracted elasticsearch head directory, and the process will read the gruntfile.js file in this directory, otherwise the startup may fail.

3.3.6 view ES information through elasticsearch head

In a browser, open the elasticsearch-head address (port 9100) and connect to the cluster. A green cluster health value means that the cluster is healthy.
If it shows that there is no connection, change localhost to the IP address.

3.3.7 Insert Index

Insert a test index from the command line. The index is index-demo and the type is test.

[root@node1 elasticsearch-head]# curl -X PUT 'localhost:9200/index-demo/test/1?pretty' -H 'Content-Type: application/json' -d '{"user":"zhangsan","mesg":"hello world"}'
{
  "_index" : "index-demo",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}

3.3.8 browser view index information

In the browser, view the index information. By default the index is split into 5 shards, with one replica.

Click the data browsing tab to see that the index created on node1 is index-demo and its type is test.

4. ELK Logstash deployment (on the apache node)

  • Logstash is generally deployed on servers that need to monitor their logs. In this case, logstash is deployed on the Apache server to collect Apache log information and send it to Elasticsearch.

4.1 change host name

[root@localhost ~]# hostnamectl set-hostname apache
[root@localhost ~]# su
[root@apache ~]#

4.2 installing Apache service (httpd)

[root@apache ~]# yum install -y httpd
[root@apache ~]# systemctl start httpd && systemctl enable httpd

4.3 installing the Java environment

cd /opt
tar zxvf jdk-8u91-linux-x64.tar.gz -C /usr/local
mv /usr/local/jdk1.8.0_91/ /usr/local/jdk

vim /etc/profile
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

source /etc/profile
java -version

4.4 installing logstash

[root@apache ~]# cd /opt
[root@apache opt]# rz -E	#Upload the installation package logstash-5.5.1.rpm
[root@apache opt]# rpm -ivh logstash-5.5.1.rpm 
[root@apache opt]# systemctl start logstash.service && systemctl enable logstash.service
[root@apache opt]# ln -s /usr/share/logstash/bin/logstash /usr/local/bin/

4.5 test Logstash

4.5.1 common options of logstash command

Option   Description
-f       Specifies the Logstash configuration file; the input and output streams are configured according to that file
-e       Takes the configuration as a string on the command line (if the string is empty, stdin is used as the input and stdout as the output by default)
-t       Tests whether the configuration file is correct, then exits

4.5.2 define input and output streams

standard input and output

The input adopts standard input and the output adopts standard output (similar to pipeline)

[root@apache /opt]# logstash -e 'input { stdin{} } output { stdout{} }'
The stdin plugin is now waiting for input:
22:24:31.510 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}		#Type (standard input)
2021-11-19T14:28:36.175Z apache	#Input results (standard output)
2021-11-19T14:29:01.315Z apache
2021-11-19T14:29:10.569Z apache
2021-11-19T14:29:10.569Z apache
^C22:30:07.071 [SIGINT handler] WARN  logstash.runner - SIGINT received. Shutting down the agent.
22:30:07.081 [LogStash::Runner] WARN  logstash.agent - stopping pipeline {:id=>"main"}

rubydebug output

Use the rubydebug codec to display the output in a detailed format (codec means coder/decoder)

[root@apache /opt]# logstash -e 'input { stdin{} } output { stdout{ codec=>rubydebug } }'
The stdin plugin is now waiting for input:
22:37:46.417 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}		#Input content
{					#Output content
    "@timestamp" => 2021-11-19T14:38:03.535Z,
      "@version" => "1",
          "host" => "apache",
       "message" => ""
}
^C22:38:35.333 [SIGINT handler] WARN  logstash.runner - SIGINT received. Shutting down the agent.
22:38:35.343 [LogStash::Runner] WARN  logstash.agent - stopping pipeline {:id=>"main"}

output to ES

Use Logstash to write data into ES

[root@apache opt]# logstash -e 'input { stdin{} } output { elasticsearch { hosts=>[""] } }'
The stdin plugin is now waiting for input:
22:40:57.485 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}	#Type input here (standard input)

The results are not shown on standard output. They are sent to ES, where the indexes and data can be viewed in the browser.

4.6 define logstash configuration file

A Logstash configuration file basically consists of three parts:

  • input: where events are read from
  • output: where events are sent
  • filter: how events are processed in between (optional)

The format is as follows:

input {...}
output {...}
filter {...}

Each section can contain more than one plug-in entry. For example, to specify two log source files, the format is as follows:

input {
	file { path =>"/var/log/messages" type =>"syslog"}
	file { path =>"/var/log/httpd/access.log" type =>"apache"}
}

Modify the Logstash configuration file to collect the system log /var/log/messages and output it to ES.

[root@apache opt]# chmod o+r /var/log/messages
#Grant read permission so that Logstash can read the contents of the file
[root@apache opt]# vim /etc/logstash/conf.d/system.conf
##Create this file yourself; the file name can be anything you like
input {
    file {
        path =>"/var/log/messages"
##Specify the location of the logs to collect
        type =>"system"
##Custom log type ID
        start_position =>"beginning"
##Indicates collection from the beginning
    }
}
output {
    elasticsearch {
##Output to ES
        hosts =>["", ""]
##Specify the address and port of the ES servers. To avoid a single point of failure, it is recommended to list all of them
        index =>"system-%{+YYYY.MM.dd}"
##Specifies the index format for output to ES
    }
}
[root@apache opt]# systemctl restart logstash.service 
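The same pipeline can also be generated from the shell, which avoids editing the file by hand on every log server. The sketch below prints a complete system.conf to stdout. The function name render_system_conf is made up, and the two ES addresses in the example call are an assumption taken from the server table in this guide (node1 and node2 on port 9200).

```shell
#!/bin/sh
# Print a Logstash pipeline that reads /var/log/messages and writes to two ES nodes.
# Usage: render_system_conf <es_host1:port> <es_host2:port>
render_system_conf() {
    cat <<EOF
input {
    file {
        path => "/var/log/messages"
        type => "system"
        start_position => "beginning"
    }
}
output {
    elasticsearch {
        hosts => ["$1", "$2"]
        index => "system-%{+YYYY.MM.dd}"
    }
}
EOF
}

# Assumed addresses: node1 and node2 from the server table, default ES HTTP port.
render_system_conf 192.168.10.100:9200 192.168.10.101:9200
```

Redirecting the output to /etc/logstash/conf.d/system.conf and then running logstash with the -t and -f options described earlier checks the file before the service is restarted.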

4.7 access test

In the browser, view the index information.

5. ELK Kibana deployment (on node1)

5.1 installation of Kibana

[root@node1 elasticsearch-head]# cd /opt
[root@node1 opt]# rz -E		#Upload package kibana-5.5.1-x86_64.rpm
[root@node1 opt]# rpm -ivh kibana-5.5.1-x86_64.rpm 

5.2 setting up Kibana's master profile

[root@node1 opt]# cp /etc/kibana/kibana.yml /etc/kibana/kibana.yml.bak
#Backup profile

[root@node1 opt]# vim /etc/kibana/kibana.yml
##Line 2, uncomment. The default listening port of the Kibana service is 5601
server.port: 5601
##Line 7, uncomment and set the listening address of Kibana. 0.0.0.0 represents all addresses
server.host: "0.0.0.0"
##Line 21, uncomment and set the address and port used to connect to ES
elasticsearch.url: ""
##Line 30, uncomment so that the .kibana index is added in ES
kibana.index: ".kibana"

5.3 start Kibana service

[root@node1 opt]# systemctl start kibana.service && systemctl enable kibana.service 
[root@node1 opt]# netstat -natp | grep 5601
tcp        0      0  *               LISTEN      82765/node     

5.4 verification of Kibana

Browser access

On the first login, an ES index has to be added.

Click Create to create it.

After the index has been added, click the Discover button to view the chart and log information.

The displayed data can be filtered by field, for example by host under Available Fields.

5.5 add the Apache server logs (access and error) to ES and display them through Kibana

apache server

[root@apache opt]# mkdir -p /etc/logstash/conf.d/
[root@apache opt]# vim /etc/logstash/conf.d/apache_log.conf
input {
    file {
        path => "/etc/httpd/logs/access_log"
        type => "access"
        start_position => "beginning"
    }
    file {
        path => "/etc/httpd/logs/error_log"
        type => "error"
        start_position => "beginning"
    }
}
output {
    if [type] == "access" {
        elasticsearch {
            hosts => ["", ""]
            index => "apache_access-%{+YYYY.MM.dd}"
        }
    }
    if [type] == "error" {
        elasticsearch {
            hosts => ["", ""]
            index => "apache_error-%{+YYYY.MM.dd}"
        }
    }
}
[root@apache opt]# cd /etc/logstash/conf.d/
[root@apache conf.d]# /usr/share/logstash/bin/logstash -f apache_log.conf
21:55:40.494 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9601}

5.6 browser access

In the browser, check whether the indexes have been created.

In the browser, log in to Kibana and add the apache_access-* and apache_error-* index patterns to view the log information.

3, ELFK (Filebeat + ELK)

1. Function of filebeat

Logstash uses a lot of system memory, so Filebeat is generally used to take over the log collection role from Logstash, forming the ELFK architecture.
Alternatively, Fluentd can replace Logstash to form EFK (Elasticsearch / Fluentd / Kibana). Fluentd, which is written mainly in Ruby with performance-critical parts in C, is used mostly in K8S environments.

2. ELFK workflow

(1) Filebeat collects logs and sends them to Logstash for processing
(2) Logstash filters and formats the data, and sends the data that meets the filter conditions to ES
(3) ES stores the data in shards and provides indexing
(4) Kibana displays the data graphically on the web and provides an interface to the indexes

3. ELFK deployment

3.1 server configuration

Server          Configuration   Host name   IP address        Main software
node1 node      2C/4G           node1       192.168.122.10    ElasticSearch, Kibana
node2 node      2C/4G           node2       192.168.122.11    ElasticSearch
apache node     -               apache      192.168.122.12    Logstash, Apache
filebeat node   -               filebeat    192.168.122.13    Filebeat

ELFK adds a Filebeat server to the ELK setup, so only the additional steps below are needed on top of the ELK deployment described earlier.

3.2 server environment

filebeat node

[root@localhost ~]# hostnamectl set-hostname filebeat
[root@localhost ~]# su
[root@filebeat ~]# systemctl stop firewalld
[root@filebeat ~]# systemctl disable firewalld
[root@filebeat ~]# setenforce 0

3.3 installing filebeat

filebeat node

[root@filebeat ~]# cd /opt
[root@filebeat opt]# rz -E
rz waiting to receive.
[root@filebeat opt]# tar zxvf filebeat-6.2.4-linux-x86_64.tar.gz 
[root@filebeat opt]# mv filebeat-6.2.4-linux-x86_64 /usr/local/filebeat

3.4 modifying the filebeat master configuration file

filebeat node

[root@filebeat opt]# cd /usr/local/filebeat/
[root@filebeat filebeat]# cp filebeat.yml filebeat.yml.bak
[root@filebeat filebeat]# vim filebeat.yml
##Line 21, specify the log type; messages are read from log files
- type: log
##Line 24, enable log collection. The default value is false
  enabled: true
  paths:
##Line 28, specify the log files to monitor
    - /var/log/*.log
##Line 29, add /var/log/messages to the collection
    - /var/log/messages
##Line 31, add the following, and pay attention to the indentation
  fields:
    service_name: filebeat
    log_type: log
#-------------------------- Elasticsearch output ------------------------------
Comment out everything in this section
#----------------------------- Logstash output --------------------------------
##Line 157, uncomment
output.logstash:
##Line 159, uncomment and specify the IP and port number of Logstash
  hosts: [""]
[root@filebeat filebeat]# ./filebeat -e -c filebeat.yml
#Start Filebeat. -e logs to stderr and disables syslog/file output; -c specifies the configuration file

3.5 create a new logstash configuration file on the node where the logstash component is located (apache node)

[root@apache ~]# cd /etc/logstash/conf.d/
[root@apache conf.d]# vim logstash.conf
input {
    beats {
        port => "5044"
    }
}
output {
    elasticsearch {
        hosts => ["", ""]
        index => "%{[fields][service_name]}-%{+YYYY.MM.dd}"
    }
    stdout {
        codec => rubydebug
    }
}
[root@apache conf.d]# /usr/share/logstash/bin/logstash -f logstash.conf

3.6 browser verification

In the browser, log in to Kibana.
After adding the "filebeat-*" index pattern, view the logs collected by Filebeat under "Discover".
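When checking that data has arrived, it helps to know the exact index name to look for. The index pattern %{[fields][service_name]}-%{+YYYY.MM.dd} combines the service_name field set in filebeat.yml with the event date, which Logstash formats in UTC. The sketch below computes the name expected for events received today. The function name expected_index is made up for illustration.

```shell
#!/bin/sh
# Print the index name Logstash would use today for a given service_name,
# following the pattern <service_name>-YYYY.MM.dd with the date taken in UTC.
expected_index() {
    printf '%s-%s\n' "$1" "$(date -u +%Y.%m.%d)"
}

expected_index filebeat
```

Near midnight local time, the UTC date can differ from the local date, so the newest index may carry the previous or the next day's date.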


Posted on Fri, 19 Nov 2021 11:04:36 -0500 by twinzen