Detailed explanation of ELK theory and deployment
Preface
- Log analysis is the main means by which operations engineers locate problems and troubleshoot system faults. Logs mainly include system logs, application logs and security logs.
- A large-scale system is generally deployed as a distributed architecture, with different service modules on different servers. When a problem occurs, the specific server and service module must be located from the key information the problem exposes; building a centralized log system greatly improves the efficiency of locating problems.
- Regular log analysis reveals the load, performance and security status of servers, so that errors can be corrected in time. Logs are usually stored on different devices, and if you manage dozens or hundreds of servers, logging in to each machine in turn is cumbersome and inefficient. Centralized log management, such as the open source syslog, can collect and aggregate the logs of all servers.
- Once logs are centralized, statistics and retrieval become troublesome. Linux commands such as grep, awk and wc can handle simple retrieval and statistics, but for more demanding querying, sorting and statistics across a huge number of machines, these tools fall short.
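As a small illustration of the traditional grep/awk approach mentioned above, the snippet below tallies the top client IPs in an Apache-style access log with awk, sort and uniq. The sample log lines are made up for the demonstration; in practice you would read a real file such as /var/log/httpd/access_log.

```shell
# Write a tiny sample access log (stand-in for a real /var/log/httpd/access_log)
cat > /tmp/access_log.sample <<'EOF'
192.168.10.50 - - [19/Nov/2021:22:00:01 +0800] "GET / HTTP/1.1" 200 45
192.168.10.51 - - [19/Nov/2021:22:00:02 +0800] "GET /a HTTP/1.1" 404 12
192.168.10.50 - - [19/Nov/2021:22:00:03 +0800] "GET /b HTTP/1.1" 200 7
EOF

# Field 1 is the client IP; count occurrences and list the busiest first
awk '{print $1}' /tmp/access_log.sample | sort | uniq -c | sort -rn
```

This works for one machine, but repeating it across dozens of servers is exactly the pain point that centralized log management removes.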
- The open source real-time log analysis platform ELK solves all of the above problems. ELK is composed of ElasticSearch, Logstash and Kibana.
1, ELK log analysis system
1. ELK introduction
The ELK platform is a complete centralized log processing solution that combines ElasticSearch, Logstash and Kibana to meet more demanding requirements for log querying, sorting and statistics.
2. Component description
2.1 ElasticSearch
- ES is a distributed storage and retrieval engine based on Lucene (a full-text search engine library), used to store all kinds of logs.
- ES is developed in Java. Users communicate with ES through its RESTful web interface, for example from a browser.
- ES is a distributed search and analysis engine. Its advantage is that it can store, search and analyze large volumes of data in near real time.
2.2 Logstash
- Logstash acts as the data collection engine. It supports dynamically collecting data from various sources, filtering, analyzing, enriching and unifying it, and then storing it in a location specified by the user, generally ES.
- Logstash is written in JRuby and runs on the Java virtual machine (JVM). It is a powerful data processing tool that handles data transmission, format processing and formatted output. Logstash has a powerful plug-in ecosystem and is commonly used for log processing.
2.3 Kibana
- Kibana is a display tool developed based on Node.js. It can provide graphical log analysis Web interface display for Logstash and ES, and summarize, analyze and search important data logs.
2.4 Filebeat
- Filebeat is a lightweight open source log file data shipper. Usually, Filebeat is installed on the client whose data needs to be collected, with the directory and log format specified. Filebeat can quickly collect data and send it to Logstash for parsing, or directly to ES for storage. In terms of performance it has an obvious advantage over the JVM-based Logstash and is a common substitute for it as a collector.
The Beats family of centralized log collectors includes four tools:
Packetbeat (collects network traffic data)
Topbeat (collects data such as CPU and memory usage at the system, process, and file system levels)
Filebeat (collects file data)
Winlogbeat (collects Windows event log data)
3. Basic characteristics of a complete log system
- Collection: able to collect log data from multiple sources
- Transmission: able to parse, filter and stably transmit the log data to the storage system
- Storage: stores the log data
- Analysis: supports UI-based analysis
- Alerting: provides error reporting and monitoring mechanisms
4. How ELK works
- AppServer is a cluster of servers such as Nginx or Apache; its log information is collected by Logstash
- To reduce bottlenecks caused by network problems, the Logstash service is often placed close to the cluster it collects from, reducing network consumption
- Logstash formats the collected log data and forwards it to the ES cluster (this is the centralized log management step)
- ES then indexes and stores the formatted log data
- Finally, Kibana queries ES and presents the data to the client
2, Deploy ELK log analysis system
1. Server configuration
Server | Configuration | Hostname | IP address | Main software |
---|---|---|---|---|
node1 node | 2C/4G | node1 | 192.168.10.100 | ElasticSearch,Kibana |
node2 node | 2C/4G | node2 | 192.168.10.101 | ElasticSearch |
apache node | - | apache | 192.168.10.102 | Logstash,Apache |
2. Turn off the firewall and synchronize the time

```shell
systemctl stop firewalld && systemctl disable firewalld
setenforce 0
ntpdate ntp.aliyun.com
```
3. ElasticSearch cluster deployment (node1, node2)
3.1 environmental preparation
Take node1 as an example
```shell
[root@localhost ~]# hostnamectl set-hostname node1
[root@localhost ~]# su
[root@node1 ~]# echo "192.168.10.100 node1" >> /etc/hosts
[root@node1 ~]# echo "192.168.10.101 node2" >> /etc/hosts
[root@node1 ~]# java -version    #openjdk is not recommended

# Install the JDK from the rpm package (method 1)
cd /opt    #transfer the software package to this directory first
rpm -ivh jdk-8u201-linux-x64.rpm
vim /etc/profile.d/java.sh
export JAVA_HOME=/usr/java/jdk1.8.0_201-amd64
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$PATH
#Notes:
#1. JAVA_HOME defines the Java working directory
#2. CLASSPATH points to the class files Java requires
#3. PATH is redefined; $JAVA_HOME must come before $PATH so the system
#   reads the version in the working directory first
source /etc/profile.d/java.sh
java -version

# Install the JDK from the tarball (method 2)
cd /opt
tar zxvf jdk-8u91-linux-x64.tar.gz -C /usr/local
mv /usr/local/jdk1.8.0_91/ /usr/local/jdk
vim /etc/profile
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
source /etc/profile
java -version
```
3.2 deployment of ElasticSearch software
3.2.1 installing elasticsearch RPM package
Take node1 as an example
```shell
[root@node1 ~]# cd /opt
[root@node1 opt]# rz -E    #upload elasticsearch-5.5.0.rpm to the /opt directory
rz waiting to receive.
[root@node1 opt]# rpm -ivh elasticsearch-5.5.0.rpm
```
3.2.2 loading system services
Take node1 as an example
```shell
systemctl daemon-reload && systemctl enable elasticsearch.service
```
3.2.3 modify the elasticsearch master configuration file
Take node1 as an example
```shell
[root@node1 opt]# cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak    #back up the configuration file
[root@node1 opt]# vim /etc/elasticsearch/elasticsearch.yml
##Line 17, uncomment and specify the cluster name
cluster.name: my-elk-cluster
##Line 23, uncomment and specify the node name (node1 node is node1, node2 node is node2)
node.name: node1
##Line 33, uncomment and specify the data storage path
path.data: /data/elk_data
##Line 37, uncomment and specify the log storage path
path.logs: /var/log/elasticsearch/
##Line 43, uncomment; do not lock memory at startup (front-end cache, related to IOPS performance and reads/writes per second)
bootstrap.memory_lock: false
##Line 55, uncomment and set the listening address; 0.0.0.0 means all addresses
network.host: 0.0.0.0
##Line 59, uncomment; the default listening port of the ES service is 9200
http.port: 9200
##Line 68, uncomment; cluster discovery is implemented through unicast, specifying the nodes node1 and node2
discovery.zen.ping.unicast.hosts: ["node1", "node2"]

[root@node1 opt]# grep -v "^#" /etc/elasticsearch/elasticsearch.yml
cluster.name: my-elk-cluster
node.name: node1
path.data: /data/elk_data
path.logs: /var/log/elasticsearch/
bootstrap.memory_lock: false
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["node1", "node2"]

#Transfer the configured file to node2 with scp, then just change the node name there
scp /etc/elasticsearch/elasticsearch.yml root@192.168.10.101:/etc/elasticsearch/elasticsearch.yml
```
3.2.4 create data storage path and authorize
Take node1 as an example
```shell
[root@node1 opt]# mkdir -p /data/elk_data
[root@node1 opt]# chown elasticsearch:elasticsearch /data/elk_data/
```
3.2.5 start elasticsearch
Take node1 as an example
```shell
[root@node1 opt]# systemctl start elasticsearch.service
[root@node1 opt]# netstat -natp | grep 9200    #starts slowly; wait a moment
tcp6       0      0 :::9200          :::*          LISTEN      4216/java
```
3.2.6 viewing node information
Browser access http://192.168.10.100:9200 and http://192.168.10.101:9200 to view the information of nodes node1 and node2
Browser access http://192.168.10.100:9200/_cluster/health?pretty and http://192.168.10.101:9200/_cluster/health?pretty to view the health status of the cluster; a status value of green indicates that the nodes are running healthily
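The same health check can also be scripted. The snippet below extracts the status field with grep/sed; the JSON is a hand-written sample of a few fields that `_cluster/health` returns, and in a real check you would instead capture `curl -s 'http://192.168.10.100:9200/_cluster/health'`.

```shell
# Hypothetical sample of the cluster-health JSON (trimmed to a few fields);
# in practice: health=$(curl -s 'http://192.168.10.100:9200/_cluster/health?pretty')
health='{
  "cluster_name" : "my-elk-cluster",
  "status" : "green",
  "number_of_nodes" : 2
}'

# Pull out the status value so a monitoring script can act on it
status=$(echo "$health" | grep '"status"' | sed 's/.*: *"\([a-z]*\)".*/\1/')
echo "cluster status: $status"
```

A cron job could run this periodically and alert whenever the status is not green.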
Browser access http://192.168.10.100:9200/_cluster/state?pretty and http://192.168.10.101:9200/_cluster/state?pretty to check the cluster state information
Viewing the cluster status this way is not user-friendly; installing the elasticsearch-head plug-in makes managing the cluster more convenient.
3.3 install elasticsearch head plug-in
After ES version 5.0, the head plug-in needs to be installed as an independent service using the npm tool (the package manager of Node.js). Installing elasticsearch-head requires installing its dependencies, node and phantomjs, in advance.
- node: a JavaScript runtime environment based on the Chrome V8 engine
- phantomjs: a headless, WebKit-based browser scriptable through a JavaScript API; it can do anything a WebKit browser can, invisibly
3.3.1 compiling and installing node
Take node1 as an example
```shell
[root@node1 ~]# cd /opt
[root@node1 opt]# rz -E    #upload node-v8.2.1.tar.gz to the /opt directory
rz waiting to receive.
[root@node1 opt]# yum install -y gcc gcc-c++ make
[root@node1 opt]# tar zxvf node-v8.2.1.tar.gz
[root@node1 opt]# cd node-v8.2.1/
[root@node1 node-v8.2.1]# ./configure
[root@node1 node-v8.2.1]# make -j 4 && make install    #compilation takes a long time
```
3.3.2 installation of phantomjs
Take node1 as an example
```shell
[root@node1 node-v8.2.1]# cd /opt
[root@node1 opt]# rz -E    #upload phantomjs-2.1.1-linux-x86_64.tar.bz2 to the /opt directory
rz waiting to receive.
[root@node1 opt]# tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /usr/local/src
[root@node1 opt]# cd /usr/local/src/phantomjs-2.1.1-linux-x86_64/bin
[root@node1 bin]# cp phantomjs /usr/local/bin
```
3.3.3 install elasticsearch head data visualization tool
Take node1 as an example
```shell
[root@node1 bin]# cd /opt
[root@node1 opt]# rz -E    #upload elasticsearch-head.tar.gz to the /opt directory
rz waiting to receive.
[root@node1 opt]# tar zxvf elasticsearch-head.tar.gz -C /usr/local/src/
[root@node1 opt]# cd /usr/local/src/elasticsearch-head/
[root@node1 elasticsearch-head]# npm install
```
3.3.4 modify Elasticsearch master configuration file
Take node1 as an example
```shell
[root@node1 elasticsearch-head]# vim /etc/elasticsearch/elasticsearch.yml
##Add the following at the end of the file
http.cors.enabled: true         ##enable cross-origin access support; the default is false
http.cors.allow-origin: "*"     ##allow cross-origin access from all domains and addresses
[root@node1 elasticsearch-head]# systemctl restart elasticsearch.service
[root@node1 elasticsearch-head]# netstat -antp | grep 9200
```
3.3.5 start elasticsearch head service
Take node1 as an example
```shell
[root@node1 elasticsearch-head]# cd /usr/local/src/elasticsearch-head/
[root@node1 elasticsearch-head]# npm run start &
[1] 71012
> elasticsearch-head@0.0.0 start /usr/local/src/elasticsearch-head
> grunt server

Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://localhost:9100
[root@node1 elasticsearch-head]# netstat -natp | grep 9100
tcp        0      0 0.0.0.0:9100       0.0.0.0:*       LISTEN      71022/grunt
```
Note: the service must be started in the extracted elasticsearch head directory, and the process will read the gruntfile.js file in this directory, otherwise the startup may fail.
3.3.6 view ES information through elasticsearch head
Access http://192.168.10.100:9100 in a browser and connect to the cluster. If the cluster health value shows green, the cluster is healthy.
If the page shows "not connected", replace localhost in the connection field with the node's IP address.
3.3.7 Insert Index
Insert a test index with the command below; the index name is index-demo and the type is test.
```shell
[root@node1 elasticsearch-head]# curl -X PUT 'localhost:9200/index-demo/test/1?pretty&pretty' -H 'content-Type: application/json' -d '{"user":"zhangsan","mesg":"hello world"}'
{
  "_index" : "index-demo",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "created" : true
}
```
3.3.8 browser view index information
Browser access http://192.168.10.100:9100 to view the index information; you can see that the index is split into 5 shards by default and has 1 replica.
Click **Data Browsing** and you will see that the index created on node1 is index-demo with type test, along with the related document.
4. ELK Logstash deployment (operate on the Apache node)
- Logstash is generally deployed on the servers whose logs need to be monitored. In this case, Logstash is deployed on the Apache server to collect its log information and send it to ElasticSearch.
4.1 change host name
```shell
[root@localhost ~]# hostnamectl set-hostname apache
[root@localhost ~]# su
[root@apache ~]#
```
4.2 installing Apache service (httpd)
```shell
[root@apache ~]# yum install -y httpd
[root@apache ~]# systemctl start httpd && systemctl enable httpd
```
4.3 installing the Java environment
```shell
cd /opt
tar zxvf jdk-8u91-linux-x64.tar.gz -C /usr/local
mv /usr/local/jdk1.8.0_91/ /usr/local/jdk
vim /etc/profile
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
source /etc/profile
java -version
```
4.4 installing logstash
```shell
[root@apache ~]# cd /opt
[root@apache opt]# rz -E    #upload the installation package logstash-5.5.1.rpm
[root@apache opt]# rpm -ivh logstash-5.5.1.rpm
[root@apache opt]# systemctl start logstash.service && systemctl enable logstash.service
[root@apache opt]# ln -s /usr/share/logstash/bin/logstash /usr/local/bin/
```
4.5 test Logstash
4.5.1 common options of logstash command
Common Logstash command options | Explanation |
---|---|
-f | With this option, you can specify the configuration file of Logstash, and configure the input and output streams of Logstash according to the configuration file |
-e | Take the string that follows on the command line as the Logstash configuration (if empty, stdin is used as input and stdout as output by default) |
-t | Test whether the configuration file is correct, then exit |
4.5.2 define input and output streams
4.5.2.1 standard input and output
The input adopts standard input and the output adopts standard output (similar to pipeline)
```shell
[root@apache /opt]# logstash -e 'input { stdin{} } output { stdout{} }'
......
The stdin plugin is now waiting for input:
22:24:31.510 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
www.test.com                                    #typed (standard input)
2021-11-19T14:28:36.175Z apache www.test.com    #result (standard output)
www.4399.com
2021-11-19T14:29:01.315Z apache www.4399.com
www.baidu.com
2021-11-19T14:29:10.569Z apache www.baidu.com
^C22:30:07.071 [SIGINT handler] WARN  logstash.runner - SIGINT received. Shutting down the agent.
22:30:07.081 [LogStash::Runner] WARN  logstash.agent - stopping pipeline {:id=>"main"}
```
4.5.2.2 rubydebug output
Use the rubydebug codec to display the output in a detailed format (a codec is a coder/decoder)
```shell
[root@apache /opt]# logstash -e 'input { stdin{} } output { stdout{ codec=>rubydebug } }'
......
The stdin plugin is now waiting for input:
22:37:46.417 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
www.test.com            #input
{                       #output
    "@timestamp" => 2021-11-19T14:38:03.535Z,
      "@version" => "1",
          "host" => "apache",
       "message" => "www.test.com"
}
^C22:38:35.333 [SIGINT handler] WARN  logstash.runner - SIGINT received. Shutting down the agent.
22:38:35.343 [LogStash::Runner] WARN  logstash.agent - stopping pipeline {:id=>"main"}
```
4.5.2.3 output to ES
Use Logstash to write the typed input into ES
```shell
[root@apache opt]# logstash -e 'input { stdin{} } output { elasticsearch { hosts=>["192.168.10.100:9200"] } }'
······
The stdin plugin is now waiting for input:
22:40:57.485 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}
www.test.com    #typed (standard input)
```
The result is not displayed on standard output; it is sent to ES instead. Access http://192.168.10.100:9100 in a browser to view the index and its data.
4.6 define logstash configuration file
The Logstash configuration file basically consists of three parts:
- input
- filter (optional)
- output

The format is as follows:

```
input {...}
filter {...}
output {...}
```
In each section, you can also specify multiple access methods. For example, to specify two log source files, the format is as follows:
```
input {
    file { path =>"/var/log/messages" type =>"syslog" }
    file { path =>"/var/log/httpd/access.log" type =>"apache" }
}
```
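The filter section, which the configurations in this article omit, is where log parsing typically happens. As an illustrative sketch (not part of this deployment), the grok plug-in ships with a COMBINEDAPACHELOG pattern that splits Apache access-log lines into named fields, and the date plug-in can take the event time from the log line itself:

```
filter {
    if [type] == "apache" {
        grok {
            ##Parse the raw line into fields such as clientip, verb, request, response
            match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
        date {
            ##Use the timestamp from the log line as @timestamp instead of the ingest time
            match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
        }
    }
}
```

With fields split out this way, Kibana can aggregate by client IP or response code rather than searching raw message strings.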
Modify the logstash configuration file to collect the system log / var/log/messages and output it to ES.
```shell
[root@apache opt]# chmod o+r /var/log/messages    #grant read permission so that Logstash can read the file
[root@apache opt]# vim /etc/logstash/conf.d/system.conf
##Create this file yourself; the file name can be customized
input {
    file{
        path =>"/var/log/messages"          ##location of the logs to collect
        type =>"system"                     ##custom log type identifier
        start_position =>"beginning"        ##collect from the beginning of the file
    }
}
output {
    elasticsearch{                          ##output to ES
        hosts =>["192.168.10.100:9200", "192.168.10.101:9200"]    ##ES server addresses and ports; list all of them to avoid single-point failure
        index =>"system-%{+YYYY.MM.dd}"     ##index name format for output to ES
    }
}
[root@apache opt]# systemctl restart logstash.service
```
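The %{+YYYY.MM.dd} suffix in the index setting expands to each event's date, so Logstash creates one index per day. The shell snippet below merely mimics that naming with the date command to show what today's index would be called; the real expansion is done by Logstash itself.

```shell
# Approximate Logstash's daily index naming scheme: system-%{+YYYY.MM.dd}
today_index="system-$(date +%Y.%m.%d)"
echo "$today_index"    # e.g. system-2021.11.19 for events dated that day
```

Daily indices make it easy to expire old logs by simply deleting indices older than the retention period.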
4.7 access test
Browser access http://192.168.10.100:9100 View index information
5. ELK Kibana deployment (operate on the node1 node)
5.1 installation of Kibana
```shell
[root@node1 elasticsearch-head]# cd /opt
[root@node1 opt]# rz -E    #upload the package kibana-5.5.1-x86_64.rpm
[root@node1 opt]# rpm -ivh kibana-5.5.1-x86_64.rpm
```
5.2 setting up Kibana's master profile
```shell
[root@node1 opt]# cp /etc/kibana/kibana.yml /etc/kibana/kibana.yml.bak    #back up the configuration file
[root@node1 opt]# vim /etc/kibana/kibana.yml
##Line 2, uncomment; the default listening port of the Kibana service is 5601
server.port: 5601
##Line 7, uncomment and set the listening address of Kibana; 0.0.0.0 means all addresses
server.host: "0.0.0.0"
##Line 21, uncomment and set the address and port for connecting to ES
elasticsearch.url: "http://192.168.10.100:9200"
##Line 30, uncomment and set the .kibana index to be added in ES
kibana.index: ".kibana"
```
5.3 start Kibana service
```shell
[root@node1 opt]# systemctl start kibana.service && systemctl enable kibana.service
[root@node1 opt]# netstat -natp | grep 5601
tcp        0      0 0.0.0.0:5601       0.0.0.0:*       LISTEN      82765/node
```
5.4 verification of Kibana
Browser access http://192.168.10.100:5601
An ES index pattern needs to be added on first login
Click **Create** to create it
After the index is added, click the **Discover** button to view chart and log information
The data display can be filtered by field, such as the host field under Available Fields
5.5 add the Apache server logs (access and error) to ES and display them through Kibana
Apache server
```shell
[root@apache opt]# mkdir -p /etc/logstash/conf.d/
[root@apache opt]# vim /etc/logstash/conf.d/apache_log.conf
input {
    file{
        path => "/etc/httpd/logs/access_log"
        type => "access"
        start_position => "beginning"
    }
    file{
        path => "/etc/httpd/logs/error_log"
        type => "error"
        start_position => "beginning"
    }
}
output {
    if [type] == "access" {
        elasticsearch {
            hosts => ["192.168.10.100:9200", "192.168.10.101:9200"]
            index => "apache_access-%{+YYYY.MM.dd}"
        }
    }
    if [type] == "error" {
        elasticsearch {
            hosts => ["192.168.10.100:9200", "192.168.10.101:9200"]
            index => "apache_error-%{+YYYY.MM.dd}"
        }
    }
}
[root@apache opt]# cd /etc/logstash/conf.d/
[root@apache conf.d]# /usr/share/logstash/bin/logstash -f apache_log.conf
······
21:55:40.494 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9601}
```
5.6 browser access
Browser access http://192.168.10.100:9100 Check whether the index is created
Browser access http://192.168.10.100:5601 to log in to Kibana, add the "apache_access-*" and "apache_error-*" index patterns, and view the log information.
3, ELFK (Filebeat + ELK)
1. Function of filebeat
Since Logstash occupies a lot of system memory, Filebeat is generally used to replace the log collection function of Logstash, forming the ELFK architecture.
Alternatively, Fluentd can replace Logstash to form EFK (ElasticSearch/Fluentd/Kibana); Fluentd is generally the more common choice in K8S environments.
2. ELFK workflow
(1) filebeat collects logs and sends them to logstash for processing
(2) logstash performs filtering, formatting and other operations, and the data meeting the filtering conditions will be sent to ES
(3) ES stores data in pieces and provides indexing function
(4) kibana displays data graphically on the web and provides an index interface
3. ELFK deployment
3.1 server configuration
Server | Configuration | Hostname | IP address | Main software |
---|---|---|---|---|
node1 node | 2C/4G | node1 | 192.168.122.10 | ElasticSearch,Kibana |
node2 node | 2C/4G | node2 | 192.168.122.11 | ElasticSearch |
apache node | - | apache | 192.168.122.12 | Logstash,Apache |
filebeat node | - | filebeat | 192.168.122.13 | Filebeat |
A Filebeat server is added on top of the ELK setup, so only the following additional operations are required beyond the ELK deployment described above.
3.2 server environment
filebeat node
```shell
[root@localhost ~]# hostnamectl set-hostname filebeat
[root@localhost ~]# su
[root@filebeat ~]# systemctl stop firewalld
[root@filebeat ~]# systemctl disable firewalld
[root@filebeat ~]# setenforce 0
```
3.3 installing filebeat
filebeat node
```shell
[root@filebeat ~]# cd /opt
[root@filebeat opt]# rz -E
rz waiting to receive.
[root@filebeat opt]# tar zxvf filebeat-6.2.4-linux-x86_64.tar.gz
[root@filebeat opt]# mv filebeat-6.2.4-linux-x86_64 /usr/local/filebeat
```
3.4 modifying the filebeat master configuration file
filebeat node
```shell
[root@filebeat opt]# cd /usr/local/filebeat/
[root@filebeat filebeat]# cp filebeat.yml filebeat.yml.bak
[root@filebeat filebeat]# vim filebeat.yml
filebeat.prospectors:
##Line 21, specify the log type; messages are read from log files
- type: log
  ##Line 24, enable the log collection function; the default is false
  enabled: true
  paths:
    ##Line 28, specify the log files to monitor
    - /var/log/*.log
    ##Line 29, add collection of /var/log/messages
    - /var/log/messages
  ##Line 31, add the following fields; pay attention to the format
  fields:
    service_name: filebeat
    log_type: log
    service_id: 192.168.122.13

#-------------------------- Elasticsearch output ------------------------------
##Comment out everything in this section

#----------------------------- Logstash output --------------------------------
##Line 157, uncomment
output.logstash:
  ##Line 159, uncomment and specify the IP and port of Logstash
  hosts: ["192.168.122.12:5044"]

[root@filebeat filebeat]# ./filebeat -e -c filebeat.yml
##Start Filebeat: -e logs to stderr and disables syslog/file output; -c specifies the configuration file
```
3.5 create a new logstash configuration file on the node where the logstash component is located (apache node)
```shell
[root@apache ~]# cd /etc/logstash/conf.d/
[root@apache conf.d]# vim logstash.conf
input {
    beats {
        port => "5044"
    }
}
output {
    elasticsearch {
        hosts => ["192.168.122.10:9200", "192.168.122.11:9200"]
        index => "%{[fields][service_name]}-%{+YYYY.MM.dd}"
    }
    stdout {
        codec => rubydebug
    }
}
[root@apache conf.d]# /usr/share/logstash/bin/logstash -f logstash.conf
```
3.6 browser verification
Browser access http://192.168.122.10:5601 to log in to Kibana.
After adding the "filebeat-*" index pattern, view the collected Filebeat logs in "Discover".