Filebeat log collector

1, Filebeat log collector

1.1 Introduction to Filebeat

Filebeat is a lightweight data shipper for forwarding and centralizing log data. It is written in Go and is lighter-weight than Logstash.
Filebeat monitors the specified log paths, collects log events, and forwards the data to Elasticsearch, Logstash, Redis, Kafka, or other storage backends.

1.2 Main components of Filebeat

Filebeat consists of two main components, inputs and harvesters, which work together to tail files and ship new data as it is appended:

  1. Input: an input manages the harvesters and finds all sources (paths) to read from.
  2. Harvester: a harvester reads the contents of a single file line by line and sends them to the output (a configuration sketch follows this list).
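A minimal sketch of a single input illustrates the division of labor; scan_frequency and close_inactive are standard log-input options, shown here with their default values:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log     # the input scans this glob and starts one harvester per file
  scan_frequency: 10s    # how often the input checks the paths for new files (default 10s)
  close_inactive: 5m     # a harvester closes its file after it has been idle this long (default 5m)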

1.3 Filebeat workflow

After Filebeat starts, each input looks at the configured log paths and starts a harvester for every file found there. Each harvester reads the new content of its log file and sends it to the spooler, which aggregates the events; finally Filebeat ships the aggregated data to the configured output.

1.4 Filebeat configuration description
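The main configuration file is /etc/filebeat/filebeat.yml. A minimal annotated sketch of its typical sections (the path and host values are illustrative, taken from the examples below):

filebeat.inputs:                   # where to read events from
- type: log
  enabled: true
  paths:
    - /var/log/test.log
output.elasticsearch:              # where to send the collected events
  hosts: ["172.16.1.161:9200"]
setup.ilm.enabled: false           # required when customizing index names (see 2.7)
setup.template.name: "test"        # index template name
setup.template.pattern: "test-*"   # index pattern the template applies to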

2, Filebeat basic usage

2.1 Installation

Filebeat needs to be installed on each business host whose logs are to be collected.
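If the rpm package is not already on the host, it can be downloaded from Elastic's artifact repository first (the URL assumes the standard artifacts.elastic.co layout for version 7.8.1):

[root@web01 ~]# wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.8.1-x86_64.rpm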

[root@web01 ~]# rpm -ivh filebeat-7.8.1-x86_64.rpm 

If startup fails with the error "Exiting: could not start registrar: error loading state: error decoding states: EOF", reset the registry and restart:

rm -r /var/lib/filebeat/registry
systemctl reset-failed filebeat
systemctl start filebeat

2.2 Test: read from the terminal (stdin) and output to the console

[root@web01 ~]# cat /etc/filebeat/test.yml 
filebeat.inputs:
- type: stdin
  enabled: true
output.console:
  pretty: true
  enabled: true

[root@web01 filebeat]# filebeat -e -c /etc/filebeat/test.yml
hello world
{
  "@timestamp": "2021-10-27T13:29:07.422Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.8.1"
  },
  "log": {
    "offset": 0,
    "file": {
      "path": ""
    }
  },
  "message": "hello world",
  "input": {
    "type": "stdin"
  },
  "ecs": {
    "version": "1.5.0"
  },
  "host": {
    "name": "web01"
  },
  "agent": {
    "version": "7.8.1",
    "hostname": "web01",
    "ephemeral_id": "3d0de9b0-b486-494a-823d-305491f44950",
    "id": "457b924d-450b-49eb-8126-047091c09920",
    "name": "web01",
    "type": "filebeat"
  }
}

2.3 Read data from a file and output to the console

1. Modify the yml file:

[root@web01 ~]# cat /etc/filebeat/test.yml 
filebeat.inputs:
- type: log
  enabled: true
  paths: 
    - /var/log/test.log
output.console:
  pretty: true
  enabled: true


2. Create the /var/log/test.log file:
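The file can be created empty up front so Filebeat has something to watch (otherwise step 4 creates it on first write):

[root@web01 ~]# touch /var/log/test.log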

3. Start Filebeat

[root@web01 filebeat]# filebeat -e -c /etc/filebeat/test.yml 

4. From another terminal, append data to the log:

[root@web01 ~]# echo "test log" > /var/log/test.log

5. Check that the console shows the data:

{
  "@timestamp": "2021-10-27T13:35:20.083Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.8.1"
  },
  "log": {
    "offset": 0,
    "file": {
      "path": "/var/log/test.log"
    }
  },
  "message": "test log",
  "input": {
    "type": "log"
  },
  "host": {
    "name": "web01"
  },
  "agent": {
    "hostname": "web01",
    "ephemeral_id": "cce5fd00-ba6f-44bb-b40a-1f9e39e27986",
    "id": "457b924d-450b-49eb-8126-047091c09920",
    "name": "web01",
    "type": "filebeat",
    "version": "7.8.1"
  },
  "ecs": {
    "version": "1.5.0"
  }
}

2.4 Read data from a file and output to the ES cluster

[root@web01 filebeat]# cat /etc/filebeat/test.yml 
filebeat.inputs:
- type: log  # Log type
  enabled: true # Start collection
  paths: 
    - /var/log/test.log  # Log path
output.elasticsearch:
  hosts: ["172.16.1.161:9200"]  # es cluster ip+port
# Without a custom index name, data goes to the default filebeat-* index

Append data to the simulated log:

[root@web01 ~]# echo "filebeat  test data" > /var/log/test.log
[root@web01 ~]# echo "filebeat  test data123" > /var/log/test.log

Check in cerebro and you can see the index; then go to Kibana to view the documents.
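The index can also be verified from the shell with the _cat indices API (using one of the cluster nodes from the examples):

[root@web01 ~]# curl -s 'http://172.16.1.161:9200/_cat/indices?v' | grep filebeat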

2.5 Output to the ES cluster in practice

[root@web01 filebeat]# cat filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/messages


output.elasticsearch:
  hosts: ["172.16.1.161:9200","172.16.1.162:9200","172.16.1.163:9200"]

2.6 Creating an index pattern in Kibana

By default the fields are inconvenient to browse in Kibana. They can be viewed in Discover, but an index pattern must be created first:
in Kibana, click Stack Management -> Index Patterns -> Create index pattern.


Click Discover

Append a line of your own to the log:

[root@web01 ~]# echo "test bertwu" >> /var/log/messages

2.7 Filebeat custom index name

By default all index names start with filebeat-, which makes them hard to tell apart. The index name can be customized as follows:

  1. Modify the Filebeat configuration file;
  2. Delete the old index in ES and the index pattern in Kibana;
  3. Restart the Filebeat service to generate the new index:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/messages


output.elasticsearch:
  hosts: ["172.16.1.161:9200","172.16.1.162:9200","172.16.1.163:9200"]
  index: "message-%{[agent.version]}-%{+yyyy.MM.dd}" #Custom index name



setup.ilm.enabled: false # index lifecycle management must be disabled for the custom index name to take effect
setup.template.name: "message"       #Define template name
setup.template.pattern: "message-*"  #Defines the matching index name of the template

# In a Filebeat -> Elasticsearch -> Kibana architecture, shards can be set this way; otherwise the setting has no effect.
#setup.template.settings:
#  index.number_of_shards: 3
#  index.number_of_replicas: 0

Create a message-* index pattern in Kibana to search the data.
By default, an index written by Filebeat to ES has one primary shard. To change the shard count, use one of the following two methods:
Method 1: modify the Filebeat configuration file and add the following; then delete the index template and the index, and let the data regenerate:

setup.template.settings:
  index.number_of_shards: 3
  index.number_of_replicas: 1

Method 2: use the cerebro web UI:
1. Modify the template settings, adjusting shards and replicas;
2. Delete the indices associated with the template;
3. Restart Filebeat to generate a new index.
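Alternatively, the same template change can be made from the shell with the legacy 7.x template API; a sketch, using the template name and pattern defined above (note that Filebeat may re-push its own template on restart):

curl -XPUT 'http://172.16.1.161:9200/_template/message' \
  -H 'Content-Type: application/json' \
  -d '{"index_patterns": ["message-*"], "settings": {"number_of_shards": 3, "number_of_replicas": 1}}'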

3, Filebeat collects system logs

3.1 What are system logs

"System logs" is a broad term; it usually refers to the messages, secure, cron, dmesg, ssh, boot and similar logs.

3.2 System log collection approach

There are many logs on a system, and configuring collection for each one separately is tedious, so we manage them centrally: write all local logs to the /var/log/system.log file through rsyslog, then collect that single file with Filebeat.

rsyslog + filebeat --> elasticsearch cluster <-- kibana

3.3 Environment preparation

Hostname    Service              IP address
web01       rsyslog + filebeat   172.16.1.7
es-node1    es                   172.16.1.161
es-node2    es                   172.16.1.162
es-node3    es                   172.16.1.163

3.4 rsyslog installation and configuration

[root@web01 ~]# yum install rsyslog -y

[root@web01 ~]# vim /etc/rsyslog.conf
# Configure where logs are written
#*.* @IP:514            # send all local logs to a remote server over the network
*.* /var/log/system.log # save all local logs to the local file /var/log/system.log


# start rsyslog
systemctl start rsyslog
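To verify that rsyslog is writing the file, generate a test message with logger (part of util-linux) and check the file:

[root@web01 ~]# logger "rsyslog test message"
[root@web01 ~]# tail -1 /var/log/system.log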

3.5 Configuring Filebeat

[root@web01 ~]# cat /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/system.log


output.elasticsearch:
  hosts: ["172.16.1.161:9200","172.16.1.162:9200","172.16.1.163:9200"]
  index: "system-%{[agent.version]}-%{+yyyy.MM.dd}" #Custom index name


setup.ilm.enabled: false
setup.template.name: "system"       #Define template name
setup.template.pattern: "system-*"  #Defines the matching index name of the template

3.6 Create the system index pattern in Kibana and view it

3.7 Optimization

Kibana shows many DEBUG messages in the results; messages like these do not need to be collected. We can filter what is collected and keep only lines related to WARN, ERR, and sshd:

[root@web01 ~]# cat /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/system.log
  include_lines: ["WARN","ERR","sshd"]  # collect only lines matching these patterns
# exclude_lines: ["DEBUG","INFO"]       # or, alternatively, drop lines matching these

output.elasticsearch:
  hosts: ["172.16.1.161:9200","172.16.1.162:9200","172.16.1.163:9200"]
  index: "system-%{[agent.version]}-%{+yyyy.MM.dd}" #Custom index name


setup.ilm.enabled: false
setup.template.name: "system"       #Define template name
setup.template.pattern: "system-*"  #Defines the matching index name of the template

4, Filebeat collects Nginx logs

We need information about site users, such as which region each source IP comes from, plus the site's PV, UV, status codes, access times, and so on; for that we need to collect the Nginx logs.

4.1 Nginx log collection architecture

nginx + filebeat --> elasticsearch <-- kibana

4.2 Install Nginx and configure the default site

[root@web01 filebeat]# cat /etc/nginx/conf.d/elk.conf 
server {
    listen 5555;
    server_name elk.bertwu.net;

    location / {
        root /code;
        index index.html;
    }
}

4.3 Configuring Filebeat

[root@web01 filebeat]# cat filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log

output.elasticsearch:
  hosts: ["172.16.1.161:9200","172.16.1.162:9200","172.16.1.163:9200"]
  index: "nginx-access-%{[agent.version]}-%{+yyyy.MM.dd}" #Custom index name


setup.ilm.enabled: false
setup.template.name: "nginx"       #Define template name
setup.template.pattern: "nginx-*"  #Defines the matching index name of the template

4.4 Create the index pattern in Kibana and view it

4.5 Nginx JSON log collection

4.5.1 Problems with the original collection

We have collected the Nginx logs, but all the data lands in a single message field, which cannot support analysis needs such as:
counting the status codes;
totaling the traffic generated by all requests;
breaking down the clients visitors use; and so on.
None of this is possible with one opaque field.

4.5.2 Solution

Each field in the log needs to be split into its own key/value pair, which is what the JSON format provides.

4.5.3 Converting the Nginx log format to JSON

1. Redefine the Nginx log format as JSON (log_format must be declared in the http context, i.e. in /etc/nginx/nginx.conf):

log_format json '{"time_local": "$time_local",'
                '"remote_addr": "$remote_addr",'
                '"referer": "$http_referer",'
                '"request": "$request",'
                '"status": $status,'
                '"bytes": $body_bytes_sent,'
                '"test_agent": "$http_user_agent",'
                '"x_forwarded": "$http_x_forwarded_for",'
                '"up_addr": "$upstream_addr",'
                '"up_host": "$upstream_http_host",'
                '"upstream_time": "$upstream_response_time",'
                '"request_time": "$request_time"'
                '}';

2. Reference the json format in the site configuration:

[root@web01 filebeat]# cat /etc/nginx/conf.d/elk.conf 
server {
    listen 5555;
    server_name elk.bertwu.net;
    access_log /var/log/nginx/access.log json;  # use the json log format defined above

    location / {
        root /code;
        index index.html;
    }
}

3. Reconfigure the Filebeat file:

[root@web01 filebeat]# cat filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  json.keys_under_root: true  # default false keeps the parsed JSON under a "json" key; true promotes the keys to top-level fields
  json.overwrite_keys: true   # let the parsed JSON keys overwrite conflicting default fields such as message

output.elasticsearch:
  hosts: ["172.16.1.161:9200","172.16.1.162:9200","172.16.1.163:9200"]
  index: "nginx-access-%{[agent.version]}-%{+yyyy.MM.dd}" #Custom index name


setup.ilm.enabled: false
setup.template.name: "nginx"       #Define template name
setup.template.pattern: "nginx-*"  #Defines the matching index name of the template


4. Restart Filebeat and Nginx, then truncate the old log so that new entries are written in JSON format:

[root@web01 nginx]# > /var/log/nginx/access.log
[root@web01 nginx]# 
[root@web01 nginx]# 
[root@web01 nginx]# systemctl restart nginx
[root@web01 nginx]# systemctl restart filebeat

5. View
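Before checking Kibana, the new format can be verified from the shell; the curl below assumes the elk.bertwu.net site on port 5555 defined earlier:

[root@web01 nginx]# curl -s -H "Host: elk.bertwu.net" http://127.0.0.1:5555/ >/dev/null
[root@web01 nginx]# tail -1 /var/log/nginx/access.log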

4.6 Collecting multiple Nginx log types

Nginx has both an access log and an error log; how can Filebeat collect both at the same time?
The desired result is:
nginx access log --> stored in --> nginx-access-xxx index
nginx error log --> stored in --> nginx-error-xxx index

1. Configure Filebeat to collect multiple logs, distinguished by tags:

[root@web01 filebeat]# cat filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  json.keys_under_root: true
  json.overwrite_keys: true
  tags: ["access"]

- type: log
  enabled: true
  paths:
    - /var/log/nginx/error.log
  tags: ["error"]

output.elasticsearch:
  hosts: ["172.16.1.161:9200","172.16.1.162:9200","172.16.1.163:9200"]
  indices:
    - index: "nginx-access-%{[agent.version]}-%{+yyyy.MM.dd}" #Custom index name
      when.contains: 
        tags: "access"
    - index: "nginx-error-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.contains:
        tags: "error"

setup.ilm.enabled: false
setup.template.name: "nginx"       #Define template name
setup.template.pattern: "nginx-*"  #Defines the matching index name of the template

2. Create nginx err index in kibana and view it

4.7 Nginx multi-virtual-host collection

If Nginx serves multiple sites, how does Filebeat collect the access logs of multiple domains?

Building on the previous configuration:
1. Configure multiple Nginx sites:

[root@web01 ~]# cat /etc/nginx/conf.d/elk.conf 
server {
    listen 5555;
    server_name elk.bertwu.net;
    access_log /var/log/nginx/access.log json;

    location / {
        root /code;
        index index.html;
    }
}

server {
    listen 5555;
    server_name blog.bertwu.net;
    access_log /var/log/nginx/blog.log json;

    location / {
        root /code;
        index index.html;
    }
}

server {
    listen 5555;
    server_name www.bertwu.net;
    access_log /var/log/nginx/www.log json;

    location / {
        root /code;
        index index.html;
    }
}
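To generate traffic for each virtual host without DNS records, the Host header can be set explicitly (assuming the three server names above):

[root@web01 ~]# curl -s -H "Host: blog.bertwu.net" http://127.0.0.1:5555/ >/dev/null
[root@web01 ~]# curl -s -H "Host: www.bertwu.net" http://127.0.0.1:5555/ >/dev/null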

2. Configure filebeat

[root@web01 filebeat]# cat filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  json.keys_under_root: true
  json.overwrite_keys: true
  tags: ["access"]

- type: log
  enabled: true
  paths:
    - /var/log/nginx/error.log
  tags: ["error"]

- type: log
  enabled: true
  paths:
    - /var/log/nginx/www.log
  json.keys_under_root: true
  json.overwrite_keys: true
  tags: ["nginx-www"]

- type: log
  enabled: true
  paths:
    - /var/log/nginx/blog.log
  json.keys_under_root: true
  json.overwrite_keys: true
  tags: ["nginx-blog"]

output.elasticsearch:
  hosts: ["172.16.1.161:9200","172.16.1.162:9200","172.16.1.163:9200"]
  indices:
    - index: "nginx-access-%{[agent.version]}-%{+yyyy.MM.dd}" #Custom index name
      when.contains: 
        tags: "access"
    - index: "nginx-error-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.contains:
        tags: "error"

    - index: "nginx-www-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.contains:
        tags: "nginx-www"

    - index: "nginx-blog-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.contains:
        tags: "nginx-blog"

setup.ilm.enabled: false
setup.template.name: "nginx"       #Define template name
setup.template.pattern: "nginx-*"  #Defines the matching index name of the template

3. View in Kibana

5, Filebeat collects Tomcat logs

We only need to install Tomcat, switch its access log to JSON format, and collect it with Filebeat.

5.1 Tomcat log collection architecture

tomcat + filebeat --> elasticsearch <-- kibana

5.2 Tomcat access log collection

1. Install Tomcat and set up the site:

[root@web01 ~]# mkdir -p /soft/ && cd /soft
[root@web01 soft]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/tomcat/tomcat-9/v9.0.26/bin/apache-tomcat-9.0.26.tar.gz
[root@web01 soft]# tar xf apache-tomcat-9.0.26.tar.gz
[root@web01 soft]# ln -s /soft/apache-tomcat-9.0.26 /soft/tomcat

2. Modify Tomcat's server.xml to change the access log format (the pattern attribute must stay on one logical line, otherwise the embedded whitespace ends up inside the JSON keys):

<Host name="elk.bertwu.net" appBase="webapps"
      unpackWARs="true" autoDeploy="true">
  <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
         prefix="json_elk_log" suffix=".txt"
         pattern="{&quot;clientip&quot;:&quot;%h&quot;,&quot;ClientUser&quot;:&quot;%l&quot;,&quot;authenticated&quot;:&quot;%u&quot;,&quot;AccessTime&quot;:&quot;%t&quot;,&quot;method&quot;:&quot;%r&quot;,&quot;status&quot;:&quot;%s&quot;,&quot;SendBytes&quot;:&quot;%b&quot;,&quot;Query?string&quot;:&quot;%q&quot;,&quot;partner&quot;:&quot;%{Referer}i&quot;,&quot;AgentVersion&quot;:&quot;%{User-Agent}i&quot;}" />
</Host>

3. Restart tomcat

[root@web01 tomcat]# /soft/tomcat/bin/startup.sh

4. Check whether the access log is in json format

[root@web01 tomcat]# cat /soft/tomcat/logs/json_elk_log.2021-10-30.txt 
      {"clientip":"10.0.0.1","      ClientUser":"-","    authenticated":"-","    AccessTime":"[30/Oct/2021:11:08:45 +0800]","    method":"GET / HTTP/1.1","    status":"200","    SendBytes":"200","    Query?string":"","    partner":"-","    AgentVersion":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"}

5. Modify the filebeat configuration

[root@web01 filebeat]# cat filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /soft/tomcat/logs/json_elk_log*.txt
  json.keys_under_root: true
  json.overwrite_keys: true
  tags: ["access"]


output.elasticsearch:
  hosts: ["172.16.1.161:9200","172.16.1.162:9200","172.16.1.163:9200"]
  index: "tomcat-access-%{[agent.version]}-%{+yyyy.MM.dd}"

setup.ilm.enabled: false
setup.template.name: "tomcat"       #Define template name
setup.template.pattern: "tomcat-*"  #Defines the matching index name of the template


6. Create the tomcat-access index pattern in Kibana and view it

5.3 Tomcat error log collection

5.3.1 Characteristics of Java error logs

1. There is a lot of error output.
2. A single error spans many lines.

5.3.2 Collection approach

Example 1: normal Tomcat log lines start with a date, while the continuation lines of an error (the stack trace) do not; so we can treat everything from one date-prefixed line up to the next as a single event.
Example 2: normal Elasticsearch log lines start with [], and error continuation lines do not, so lines can be merged from one [ up to the next occurrence into a single event. This is the official multiline matching approach; an example follows.
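For illustration, a typical catalina.out error looks like the made-up excerpt below; with the multiline pattern '^\d{2}' used in the next section, only the date-prefixed line matches, so the stack trace lines are merged into the same event:

30-Oct-2021 11:20:01.123 SEVERE [main] org.apache.catalina.core.StandardContext.startInternal One or more listeners failed to start.
java.lang.NullPointerException
	at com.example.App.init(App.java:42)
	at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4685)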

5.3.3 Filebeat configuration

[root@web01 filebeat]# cat filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /soft/tomcat/logs/json_elk_log*.txt
  json.keys_under_root: true    # default false keeps the parsed JSON under a "json" key; true promotes the keys to top-level fields
  json.overwrite_keys: true     # let the parsed JSON keys overwrite conflicting default fields such as message
  tags: ["access"]

- type: log
  enabled: true
  paths:
    - /soft/tomcat/logs/catalina.out
  tags: ["error"]
  multiline.pattern: '^\d{2}'     # a line starting with two digits (a date) begins a new event
  multiline.negate: true
  multiline.match: after          # non-matching lines are appended after the matching line
  multiline.max_lines: 1000       # maximum number of merged lines (default 500)


output.elasticsearch:
  hosts: ["172.16.1.161:9200","172.16.1.162:9200","172.16.1.163:9200"]
  indices:
    - index: "tomcat-access-%{[agent.version]}-%{+yyyy.MM.dd}" #Custom index name
      when.contains:
        tags: "access"
    - index: "tomcat-error-%{[agent.version]}-%{+yyyy.MM.dd}"
      when.contains:
        tags: "error"



setup.ilm.enabled: false
setup.template.name: "tomcat"       #Define template name
setup.template.pattern: "tomcat-*"  #Defines the matching index name of the template
