Logstash Pitfall Avoidance Guide (basic syntax, grok, date)

1. Background
2. Introduction to grok / date plug-in
2.1. grok plug-in
2.2. date plug-in
3. Pitfalls
3.1. grok gets the input source file name
3.2. grok parses multiple log text fields and outputs custom fields
3.3. grok and date plug-ins replace @timestamp with the parsed log date
3.4. Output to ES using an index template

1. Background

Logstash is an open-source data collection engine with real-time pipelining. Through the three stages of input, filter, and output, Logstash can process data from different sources and write to multiple destinations at the same time.
The data received by Logstash is generally log output from various business systems, and it must pass through a filter before being output. The pitfalls encountered during filtering and output are described below (the data source is Filebeat).
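
As a rough sketch of the three-stage pipeline (the Beats port and the console output here are assumptions for illustration, not part of the original setup):

input {
  # Receive events from Filebeat (port is an assumption for illustration)
  beats {
    port => 5044
  }
}

filter {
  # Parsing and enrichment happen here (grok, date, mutate, ...)
}

output {
  # Print processed events to the console for debugging
  stdout { codec => rubydebug }
}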

2. Introduction to grok / date plug-in

2.1. grok plug-in

grok comes with about 120 built-in matching patterns, all of which can be viewed and debugged online.
In addition to supporting built-in matching patterns, it also supports custom matching rules through regular expressions:

(?<field_name>regular expression pattern)

field_name is the name of the matched variable, such as:
(?<num>\d+) defines a field named num whose matching rule is one or more digits.
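
As a minimal sketch, assuming a message field of the form "count=42" (the field content and prefix are made up for illustration), the custom pattern could be used inside a grok filter like this:

filter {
  grok {
    # Capture one or more digits after "count=" into a field named num
    match => { "message" => "count=(?<num>\d+)" }
  }
}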

2.2. date plug-in

The date plug-in has five attributes, as follows:

match (array): The format is [ field, formats... ]. field is the name of the field to be parsed, which can be either a built-in or a user-defined field; the remaining elements are the date format patterns to try. If the field appears in several date formats, list all of them.
locale (string): The locale used for date parsing. The platform default is used if not set.
tag_on_failure (array): Defaults to ["_dateparsefailure"]; the value is appended when no format matches.
target (string): Store the matched timestamp in the given target field. If not provided, the @timestamp field is replaced by default.
timezone (string): The time zone ID used for date parsing. This is useful when the time zone cannot be extracted from the value and is not the platform default. If not specified, the platform default is used.
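
As a minimal sketch of how these attributes fit together (the logTime field, its format, and the time zone are assumptions for illustration):

filter {
  date {
    # Try to parse the assumed logTime field with the given format
    match  => [ "logTime", "MM/dd/yyyy HH:mm:ss" ]
    # Write the parsed time back to logTime instead of replacing @timestamp
    target => "logTime"
    # Interpret the value in this zone when the log itself carries no zone info
    timezone => "Asia/Shanghai"
  }
}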

3. Pitfalls

3.1. grok gets the input source file name

Each data source carries its own file name. When Filebeat is the data source, the source file path is stored in the source field, with values like the following:

/export/Logs/serverLog1.log

If the file name consists of letters, digits, and underscores, it can be matched with \w+; the filter is as follows:

#Filtering rules
filter {
  grok {
    match => { "source" => "/export/Logs/(?<logName>\w+).log" }
    overwrite => [ "source" ]
  }
}

#Output rule
output {
  #Output to console
  stdout {
    codec => rubydebug
  }
}

After filtering, the console output shows an additional logName attribute in the result.

3.2. grok parses multiple log text fields and outputs custom fields

To parse multiple fields, simply write multiple match entries in the filter, and several user-defined attributes can be parsed from the text of each field. Note: if multiple custom attributes need to be parsed across fields, the break_on_match property must be added and set to false. This property defaults to true, meaning grok returns as soon as one pattern matches.
If the log is as follows:

sum= 2124 avg= 14.3 size=312343kB deal success...

filter {
  grok {
    match => {
      "message" => "sum=\s*(?<total>\d+)\s*avg=\s*(?<num>\d+[\.\d+]*)\s*.*size=\s*(?<fileSize>\d+[\.\d+]*)kB\s*.*"
      "source"  => "/export/Logs/(?<logName>\w+).log"
    }
    overwrite => [ "message", "source" ]
    break_on_match => false
  }
}

After filtering, the new attributes total, num, fileSize and logName can be seen on the console.

3.3. grok and date plug-ins replace @timestamp with the parsed log date

After Logstash parsing, there is a @timestamp attribute by default, which holds the current timestamp. In some scenarios this attribute needs to be replaced with the time recorded in the log. However, the time in the log is written in a custom format and cannot be obtained directly through a built-in pattern, so the grok and date plug-ins need to be used in combination. Note: if the output goes to ES, a ruby plug-in can also be added to add 8 hours to the timestamp to avoid the 8-hour offset problem in ES.
If the log format is as follows:

11/24/2021, 15:22:13 0x2e1a230] UDP timeout, retrying with TCP...

filter {
  grok {
    match => {
      "message" => "(?<logTime>%{MONTHNUM}/%{MONTHDAY}/%{YEAR}[,]?\s+%{TIME}\s+[A|P]M)\s+frame=\s*(?<frame>\d+)\s*fps=\s*(?<fps>\d+[\.\d+]*)\s*.*size=\s*(?<bits>\d+[\.\d+]*)kB\s*.*bitrate=\s*(?<bitrate>\d+[\.\d+]*)kbits/s.*"
      "source"  => "/export/Logs/(?<conferenceId>\d+)[_\d+]*.log"
    }
    overwrite => [ "message", "source" ]
    break_on_match => false
  }

  date {
    match => [ "logTime", "MM/dd/yyyy, hh:mm:ss a", "M/dd/yyyy, hh:mm:ss a" ]
    # With target, the parsed value replaces logTime; without target, @timestamp is replaced by default
    # target => "logTime"
  }

  # Add 8 hours to the logTime attribute to work around the 8-hour UTC offset
  # ruby {
  #   code => "event.set('logTime', event.get('logTime').time.localtime + 8*60*60)"
  # }

  # Replace the @timestamp attribute value with the logTime attribute
  # ruby {
  #   code => "event.set('@timestamp', event.get('logTime'))"
  # }

  mutate {
    remove_field => ["logTime"]
  }
}

3.4. Output to ES using an index template

If the volume of business data is large and the ES indices are generated periodically from a default index template, Logstash should write to a dynamic index name so that data is stored in the correct index.

output {
  # Output to console
  # stdout {
  #   codec => rubydebug
  # }

  # Output to ES
  elasticsearch {
    hosts => [ "es1.demo.com:40000", "es2.demo.com:40000" ]
    user => "es-username"
    password => "1qaz2wsx3dc"
    index => "log_monitor_%{+yyyy_MM}"
    document_type => "log_monitor_type"
  }
}

In the above configuration, ES has two connection addresses. Indices prefixed with log_monitor_ are generated monthly according to the template, with the document type log_monitor_type (after ES 6.x an index can only have one type, and type has been removed since ES 7.x). %{+yyyy_MM} dynamically evaluates the built-in date pattern so that events are written to different indices.
