An article on filebeat (ELK)

The filebeat used in this article is version 7.7.0.
This article covers the following points:

  • What is filebeat and what can it be used for
  • How filebeat works and what it is composed of
  • How to use filebeat

1. What is filebeat

1.1. filebeat and beats

First of all, Filebeat is a member of the Beats family.
Beats is a family of lightweight log and data collectors; it currently has six members. In the early ELK architecture, Logstash was used to collect and parse logs, but it consumes a lot of resources such as memory, CPU and I/O. Compared with Logstash, Beats take up almost negligible CPU and memory on the system they run on.
Beats currently has six tools:

  • Packetbeat: Network data (collects network traffic data)
  • Metricbeat: Indicator (collecting system, process, and file system-level CPU and memory usage data, etc.)
  • Filebeat: Log file (collects file data)
  • Winlogbeat: Windows Event Log (collects Windows Event Log data)
  • Auditbeat: Audit data (collect audit logs)
  • Heartbeat: Uptime monitoring (periodically checks whether systems and services are up and reachable)

1.2. What is filebeat

Filebeat is a lightweight transport tool for forwarding and centralizing log data. Filebeat monitors the log files or locations you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing.

Filebeat works as follows: when you start Filebeat, it starts one or more inputs, which look in the locations you have specified for log data. For each log file Filebeat finds, it starts a harvester. Each harvester reads a single log file for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.
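As a concrete illustration, a minimal filebeat.yml expressing this flow might look like the following (the log path and the Elasticsearch address are placeholders):

filebeat.inputs:
- type: log                    # one input; it discovers files matching the glob below
  enabled: true
  paths:
    - /var/log/app/*.log       # each matching file gets its own harvester

output.elasticsearch:          # libbeat batches the events and ships them here
  hosts: ["localhost:9200"]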

Flow chart of the workflow: inputs → harvesters → libbeat → configured output.

1.3. filebeat and logstash

Since Logstash runs on the JVM and consumes a lot of resources, its author later wrote a lightweight logstash-forwarder in Golang, which had fewer features but used far fewer resources. The author was only one person, however. After he joined elastic.co, and because Elastic had also acquired Packetbeat, another open-source Golang project with an entire team behind it, the development of logstash-forwarder was simply merged into the same Golang team, and the new project was named Filebeat.

 

2. How filebeat works

2.1. Composition of filebeat

Filebeat consists of two components, inputs and harvesters, which work together to tail files and send event data to the output you specify. A harvester is responsible for reading the contents of a single file: it reads the file line by line and sends the content to the output. One harvester is started for each file. The harvester is responsible for opening and closing the file, which means that the file descriptor remains open while the harvester is running. If you delete or rename a file while it is being collected, Filebeat continues to read it; the side effect is that the disk space is not freed until the harvester closes. By default, Filebeat keeps the file open until close_inactive is reached.

Closing a harvester has the following consequences:

  • The file handle is closed, which frees the underlying resources if the file was deleted while the harvester was still reading it.
  • Collection of the file will only start again after scan_frequency has elapsed.
  • If the file is moved or deleted while the harvester is closed, collection of that file will not continue.

An input manages the harvesters and looks for all sources to read from. If the input type is log, the input finds all files on the drive that match the defined paths and starts a harvester for each file. Each input runs in its own Go routine. Filebeat currently supports multiple input types, and each input type can be defined multiple times. The log input checks each file to see whether a harvester needs to be started, whether one is already running, or whether the file can be ignored.
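For example, here is a sketch of defining the log input type twice, so that files with very different update rates get their own close_inactive/scan_frequency settings (the paths and values are illustrative):

filebeat.inputs:
- type: log                    # fast-changing application logs
  paths:
    - /var/log/app/*.log
  scan_frequency: 10s          # how often this input looks for new files
  close_inactive: 5m           # close a harvester after 5 minutes without new lines
- type: log                    # slow-changing audit logs: a second definition of the same input type
  paths:
    - /var/log/audit/*.log
  scan_frequency: 1m
  close_inactive: 1h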

2.2. How filebeat saves the state of a file

Filebeat keeps the state of each file and frequently flushes that state to a registry file on disk. The state is used to remember the last offset a harvester was reading from and to ensure that all log lines are sent. If the output (such as Elasticsearch or Logstash) is not reachable, Filebeat keeps track of the last lines sent and continues reading the files as soon as the output becomes available again. While Filebeat runs, the state information for each input is also kept in memory. When Filebeat restarts, the data from the registry file is used to rebuild the state, and Filebeat continues each harvester at the last known position. For each input, Filebeat keeps the state of every file it finds. Because files can be renamed or moved, the file name and path are not enough to identify a file, so for each file Filebeat stores a unique identifier to detect whether it has been harvested before.
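As an illustration of what is stored, a single entry in the registry (kept under Filebeat's data/registry directory; the exact file layout varies by version, and the values below are made up) looks roughly like this:

{
  "source": "/var/log/app/app.log",
  "offset": 13762,
  "timestamp": "2020-06-15T10:21:33.000Z",
  "ttl": -1,
  "type": "log",
  "FileStateOS": { "inode": 394172, "device": 2049 }
}

The device/inode pair in FileStateOS is the unique identifier mentioned above, which is why a renamed or moved file keeps its read offset.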

2.3. How filebeat guarantees at-least-once delivery

Filebeat guarantees that events are delivered to the configured output at least once and that no data is lost. It achieves this by storing the delivery state of each event in the registry file. In situations where the defined output is blocked and has not acknowledged all events, Filebeat keeps trying to send the events until the output acknowledges that it has received them. If Filebeat is shut down while it is sending events, it does not wait for the output to acknowledge all events before closing; when Filebeat restarts, all events that were not acknowledged before the shutdown are sent again. This ensures that every event is sent at least once, although duplicate events may end up in the output. By setting the shutdown_timeout option, you can configure Filebeat to wait a specific amount of time before shutting down.
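For example, to give Filebeat a short grace period to wait for outstanding acknowledgements on shutdown (the 5s value is only an illustration):

filebeat.shutdown_timeout: 5s   # wait up to 5 seconds for the output to acknowledge pending events before exiting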

3. How to use filebeat

3.1. Compressed package installation

Here Filebeat is installed from the compressed package; the Linux version is filebeat-7.7.0-linux-x86_64.tar.gz.

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.7.0-linux-x86_64.tar.gz
tar -xzvf filebeat-7.7.0-linux-x86_64.tar.gz

Sample configuration file: filebeat.reference.yml (contains all non-deprecated configuration options)
Configuration file: filebeat.yml

3.2. Basic Commands

See the official website for details: https://www.elastic.co/guide/en/beats/filebeat/current/command-line-options.html

export   # Export the configuration, index template, ILM policy, or dashboards
run      # Run filebeat (this is the default command)
test     # Test the configuration file or the connection to the output
keystore # Manage the secrets keystore
modules  # Manage (list, enable, disable) configured modules
setup    # Set up the initial environment: index template, ILM policy, dashboards, etc.

For example: ./filebeat test config   # checks whether the configuration file is correct
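Similarly, the test command can also verify connectivity to the configured output, which is a quick sanity check before going live:

./filebeat test output   # tests the connection to the configured output (e.g. Elasticsearch)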

3.3. Input and Output

Supported input components:

Multiline messages, Azure Event Hub, Cloud Foundry, Container, Docker, Google Pub/Sub, HTTP JSON, Kafka, Log, MQTT, NetFlow, Office 365 Management Activity API, Redis, S3, Stdin, Syslog, TCP, UDP (the most commonly used is log)

Supported output components:

Elasticsearch, Logstash, Kafka, Redis, File, Console, Elastic Cloud (the most commonly used are Elasticsearch and Logstash); the output codec can also be changed.

3.4. Using the keystore

The keystore is designed to prevent sensitive information such as passwords from being exposed in plain text. For example, for the Elasticsearch password you can create a key named ES_PWD that maps to the ES password, and then reference it as ${ES_PWD} wherever the ES password is needed.

Create a keystore to store secrets: filebeat keystore create
 Add a key/value pair to it, for example: filebeat keystore add ES_PWD
 Overwrite the value of an existing key: filebeat keystore add ES_PWD --force
 Delete a key/value pair: filebeat keystore remove ES_PWD
 List the existing keys: filebeat keystore list

Later, ${ES_PWD} can be used to reference its value, for example:

output.elasticsearch.password: "${ES_PWD}"

3.5. filebeat.yml configuration (log input type example)

See the official website for details: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html

type: log                      # input type is log
enabled: true                  # this log input configuration is active
paths:                         # the logs to monitor; patterns are handled by Go's glob function and the configured directories are NOT processed recursively, for example:
  - /var/log/*/*.log           # only matches .log files in the immediate subdirectories of /var/log, not .log files in /var/log itself
recursive_glob.enabled:        # enables expanding ** into recursive glob patterns, e.g. /foo/** expands to /foo, /foo/*, /foo/*/*, and so on
encoding:                      # encoding of the monitored files; use plain or utf-8 to handle Chinese logs
exclude_lines: ['^DBG']        # drop lines that match the regular expression
include_lines: ['^ERR', '^WARN']  # keep only lines that match the regular expressions
harvester_buffer_size: 16384   # buffer size in bytes each harvester uses when reading a file
max_bytes: 10485760            # maximum number of bytes in a single log message; bytes beyond max_bytes are discarded and not sent; default 10MB (10485760)
exclude_files: ['\.gz$']       # list of regular expressions matching files that Filebeat should ignore
ignore_older: 0                # default 0, i.e. disabled; values such as 2h or 2m can be configured. Note that ignore_older must be greater than close_inactive.
                               # Files whose last update is older than this value are ignored and never picked up by a harvester.
close_*                        # the close_* options close the harvester after a certain condition or time; closing the harvester means closing the file handle.
                               # If the file is updated after its harvester was closed, it is picked up again after scan_frequency.
                               # However, if the file is moved or deleted while the harvester is closed, Filebeat cannot pick it up again,
                               # and any data the harvester had not yet read is lost.
close_inactive                 # when enabled, the file handle is closed if the file has not been read from within the set time.
                               # The countdown starts from when the last log line was read, not from the file's modification time.
                               # If the closed file changes again, a new harvester is started after the next scan_frequency.
                               # It is recommended to set a value larger than the update frequency of your log files; if files update at very different speeds, configure multiple inputs.
                               # Typical values are 2h or 5m; the internal timestamp mechanism restarts the countdown each time a new line is read.
close_rename                   # when enabled, filebeat closes the processing and reading of a file once it is renamed or moved
close_removed                  # when enabled, filebeat closes the processing of a file once it is deleted; if this option is enabled, clean_removed must be enabled as well
close_eof                      # suitable for files that are written only once; filebeat closes the file as soon as it reaches EOF
close_timeout                  # when enabled, filebeat gives every harvester a predefined lifetime and closes it when that time is reached, whether the file is still being read or not.
                               # close_timeout must not equal ignore_older, otherwise file updates may never be read; if the output produces no events the timeout does not trigger,
                               # because at least one event must have been sent before the harvester can be closed. Setting 0 disables it.
clean_inactive                 # removes the state of previously harvested files from the registry file.
                               # It must be greater than ignore_older + scan_frequency, to ensure no state is removed while a file is still being collected.
                               # This option helps keep the registry file small, especially when a large number of new files are generated every day,
                               # and it can also be used to work around the inode-reuse problem of Filebeat on Linux.
clean_removed                  # when enabled, filebeat removes a file's state from the registry if the file can no longer be found on disk;
                               # clean_removed must be disabled if close_removed is disabled
scan_frequency                 # how often the input checks the configured paths for new files to harvest; default 10s
tail_files                     # if true, Filebeat starts reading new files at their end and sends each newly appended line as an event,
                               # instead of re-sending everything from the beginning of the file
symlinks                       # allows Filebeat to collect symbolic links in addition to regular files; when collecting a symlink,
                               # Filebeat opens and reads the original file even though the symlink path is reported
backoff                        # defines how aggressively Filebeat checks a file for updates: the time Filebeat waits after reaching EOF before checking the file again; default 1s
max_backoff                    # the maximum time Filebeat waits before checking a file again after EOF has been reached
backoff_factor                 # the factor by which the backoff interval grows between retries; default 2
harvester_limit                # limits the number of harvesters one input starts in parallel, which directly limits the number of open file handles

tags                           # a list of tags added to every event, useful for filtering, e.g. tags: ["json"]
fields                         # optional extra fields added to the output; values can be scalars, arrays, dictionaries or any nested combination.
                               # By default they are placed under a "fields" sub-dictionary, for example:
filebeat.inputs:
- type: log
  fields:
    app_id: query_engine_12
fields_under_root              # if true, the custom fields are stored as top-level fields in the output document instead of under "fields"

multiline.pattern              # the regular expression pattern that has to be matched
multiline.negate               # whether the pattern match is negated; default false.
                               # With pattern '^b' and the default negate: false, consecutive lines that start with b are merged into the previous line;
                               # with negate: true, lines that do NOT start with b are merged instead
multiline.match                # specifies how Filebeat merges matching lines into an event: "before" or "after", depending on the negate setting above
multiline.max_lines            # maximum number of lines that can be merged into one event; additional lines are discarded; default 500
multiline.timeout              # if no new matching line is found within this timeout, the aggregated event is sent anyway; default 5s

max_procs                      # maximum number of CPUs that can execute simultaneously; default is the number of logical CPUs available on the system
name                           # a name for this filebeat shipper; defaults to the hostname of the host
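Pulling the most common of these options together, here is a sketch of a log input for multi-line (e.g. Java stack trace) logs; the paths, pattern, and timing values are illustrative and should be adapted to your environment:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/myapp/*.log        # placeholder path
  encoding: utf-8
  exclude_files: ['\.gz$']
  ignore_older: 48h               # must be greater than close_inactive
  close_inactive: 5m
  scan_frequency: 10s
  tags: ["myapp"]
  fields:
    app_id: query_engine_12
  fields_under_root: false
  multiline.pattern: '^\['        # a new log record starts with '['
  multiline.negate: true          # lines NOT starting with '[' ...
  multiline.match: after          # ... are appended to the previous record

name: myapp-filebeat              # shipper name (defaults to the hostname)
max_procs: 2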
 

3.6. Example 1: Logstash as output

filebeat.yml configuration:

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:  #Configure multiple log paths
    - /var/logs/es_aaa_index_search_slowlog.log
    - /var/logs/es_bbb_index_search_slowlog.log
    - /var/logs/es_ccc_index_search_slowlog.log
    - /var/logs/es_ddd_index_search_slowlog.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after


#================================ Outputs =====================================

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts; with multiple Logstash hosts configured, load balancing can be used
  hosts: ["192.168.110.130:5044","192.168.110.131:5044","192.168.110.132:5044","192.168.110.133:5044"]
  loadbalance: true  # enable load balancing across the hosts

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

./filebeat -e   # Start filebeat

Configuration of logstash

input {
  beats {
    port => 5044   
  }
}

output {
  elasticsearch {
    hosts => ["http://192.168.110.130:9200"]   # multiple hosts can be configured
    index => "query-%{+yyyyMMdd}"
  }
}
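To run this pipeline, save the Logstash configuration to a file and start Logstash with it (the file name beats.conf is just an example):

bin/logstash -f beats.conf   # listens on port 5044 for Beats and forwards to Elasticsearch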

 

3.7. Example 2: Elasticsearch as output

filebeat.yml configuration:

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/logs/es_aaa_index_search_slowlog.log
    - /var/logs/es_bbb_index_search_slowlog.log
    - /var/logs/es_ccc_index_search_slowlog.log
    - /var/logs/es_dddd_index_search_slowlog.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after


#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================


#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
name: filebeat222

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

#cloud.auth:

#================================ Outputs =====================================


#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.110.130:9200","92.168.110.131:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "${ES_PWD}"   #Set password through keystore

./filebeat -e   # Start filebeat

Looking at the Elasticsearch cluster, you will see an index with the default name pattern filebeat-%{[beat.version]}-%{+yyyy.MM.dd}.
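If you want to write to your own index name instead of the default, Filebeat also needs a matching template name and pattern; a minimal sketch (the index and template names here are only examples):

setup.ilm.enabled: false          # otherwise ILM, enabled by default on 7.x, overrides the custom index
setup.template.name: "query"
setup.template.pattern: "query-*"
output.elasticsearch:
  hosts: ["192.168.110.130:9200"]
  index: "query-%{+yyyy.MM.dd}"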


3.8. filebeat modules

Official website: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-modules.html

Here I use the elasticsearch module to parse ES's slow query log; other modules follow the same steps:

Prerequisite: Elasticsearch and Kibana must already be installed before using Filebeat this way.

The detailed quick-start steps are at: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-modules-quickstart.html

Step 1: Configure the filebeat.yml file

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "192.168.110.130:5601"  #Specify kibana
  username: "elastic"   #user
  password: "${ES_PWD}"  #Password, keystore used here to prevent plain text passwords

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.110.130:9200","192.168.110.131:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"  #Users of es
  password: "${ES_PWD}" # Password for es
  # Do not specify an index here; since no template is configured, the default index filebeat-%{[beat.version]}-%{+yyyy.MM.dd} will be used

Step 2: Configure the slow log path for elasticsearch

cd filebeat-7.7.0-linux-x86_64/modules.d

vim  elasticsearch.yml
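The screenshots for this step are not reproduced here; modules.d/elasticsearch.yml roughly follows the shape below, where the fileset names come from the module and the slowlog path is the illustrative one used earlier in this article:

- module: elasticsearch
  server:
    enabled: true
    #var.paths:
  gc:
    enabled: false
  audit:
    enabled: false
  slowlog:
    enabled: true
    var.paths:
      - /var/logs/es_aaa_index_search_slowlog.log   # point this at your slow-log files
  deprecation:
    enabled: false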


Step 3: Enable the elasticsearch module

./filebeat modules enable elasticsearch

View the enabled modules:

./filebeat modules list


Step 4: Initialize the environment

./filebeat setup -e


Step 5: Start filebeat

./filebeat -e

Looking at the Elasticsearch cluster (screenshot omitted), you can see that the slow query logs have been automatically parsed into fields:

 

At this point, the elasticsearch module is working successfully.

 

Reference resources

Official website: https://www.elastic.co/guide/en/beats/filebeat/current/index.html
