ELK log analysis platform: Elasticsearch

Part 1: Elasticsearch in practice

Elasticsearch is an open-source, distributed search and analytics engine built on Apache Lucene, a full-text search engine library.
Elasticsearch is more than Lucene and more than a full-text search engine:
A distributed document store in which every field can be indexed and searched
A distributed, real-time search and analytics engine
It can scale out to hundreds of server nodes and handle petabytes of structured or unstructured data.

Basic modules:

cluster Manages cluster state and maintains cluster-level configuration
allocation Functions and strategies for shard allocation
discovery Discovers the nodes in the cluster and elects the master node
gateway Persists the cluster state data broadcast by the master
indices Manages index settings at the global level
http Allows access to the ES API through JSON over HTTP
transport Handles internal communication between the nodes in the cluster
engine Wraps Lucene operations and translog calls

Elasticsearch application scenarios:
Information retrieval
Log analysis
Business data analysis
Database acceleration
Operations metrics monitoring

Official website: https://www.elastic.co/cn/

1. Installation
https://elasticsearch.cn/download

[root@server1 elk]# rpm -ivh elasticsearch-7.7.0-x86_64.rpm 
[root@server1 elk]# cd /etc/elasticsearch/
[root@server1 elasticsearch]# vim elasticsearch.yml 		#Main configuration file
cluster.name: my-es											#Cluster name
node.name: server1											#Node name (this host)
bootstrap.memory_lock: true									#Lock process memory so it is not swapped out
network.host: 172.25.1.1									#Listening address
http.port: 9200												#HTTP listening port
discovery.seed_hosts: ["server1", "server3","server5"]      #Must be set, or the service will not start



[root@server1 elasticsearch]# vim /etc/security/limits.conf 
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
elasticsearch - nofile 65535
elasticsearch - nproc  4096

[root@server1 elasticsearch]# vim /usr/lib/systemd/system/elasticsearch.service
LimitNPROC=4096
LimitMEMLOCK=infinity

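[root@server1 elasticsearch]# swapoff -a			#Disable swap
[root@server1 elasticsearch]# systemctl daemon-reload		#Reload the edited unit file
[root@server1 elasticsearch]# systemctl start elasticsearch
[root@server1 elasticsearch]# netstat -antlp			#Check that ports 9200 and 9300 are listening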
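
One way to confirm that the settings above took effect is to ask the node itself. The sketch below assumes the node answers on http://172.25.1.1:9200 as configured above and that curl is installed; es_url and check_mlockall are hypothetical helper names chosen for this example, not part of Elasticsearch.

```shell
# Hypothetical helper: build the base URL of a node's HTTP API.
# Defaults follow the elasticsearch.yml shown above.
es_url() {
    host="${1:-172.25.1.1}"
    port="${2:-9200}"
    printf 'http://%s:%s' "$host" "$port"
}

# Ask every node whether memory locking succeeded.
# The filter_path parameter trims the response down to the mlockall field.
check_mlockall() {
    curl -s "$(es_url "$@")/_nodes?filter_path=**.mlockall&pretty"
}

# Usage (needs a running node):
#   check_mlockall              # queries 172.25.1.1:9200
#   check_mlockall server3      # queries another host
```

If the response shows "mlockall" : true for a node, bootstrap.memory_lock and the memlock limits were most likely applied; false suggests the limits in limits.conf or the systemd unit did not take effect.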



2. Graphical interface (elasticsearch-head)
(1) Installation

[root@server1 elk]# ls
elasticsearch-7.7.0-x86_64.rpm  elasticsearch-head-master  nodejs-9.11.2-1nodesource.x86_64.rpm
https://mirrors.tuna.tsinghua.edu.cn/nodesource/rpm_9.x/el/7/x86_64/		#Download address
[root@server1 elk]# yum install -y nodejs-9.11.2-1nodesource.x86_64.rpm 
[root@server1 elk]# node -v
v9.11.2
[root@server1 elk]# npm -v
5.6.0

yum install unzip
unzip elasticsearch-head-master.zip
cd elasticsearch-head-master
npm install --registry=https://registry.npm.taobao.org


yum install -y bzip2
tar jxf phantomjs-2.1.1-linux-x86_64.tar.bz2 
cd phantomjs-2.1.1-linux-x86_64/bin/
mv phantomjs /usr/local/bin/
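phantomjs 					#Test run
yum provides */libfontconfig.so.1		#Find the package that provides the library phantomjs needs
yum install -y fontconfig-2.13.0-4.3.el7.x86_64
phantomjs 					#Test run again after installing fontconfig
cd /root/elk/elasticsearch-head-master

npm install --registry=https://registry.npm.taobao.org		#Run the install again with phantomjs in place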


(2) Start

[root@server1 elasticsearch-head-master]# cd _site/
[root@server1 _site]# ls
app.css  app.js  background.js  base  fonts  i18n.js  index.html  lang  manifest.json  vendor.css  vendor.js
[root@server1 _site]# vim app.js 
this.base_uri = this.config.base_uri || this.prefs.get("app-base_uri") || "http://172.25.1.1:9200";
[root@server1 elasticsearch-head-master]# npm run start &


Open http://172.25.1.1:9100/ in a browser.
At this point the page cannot connect to Elasticsearch.

(3) Enable cross-origin (CORS) access in ES

[root@server1 elasticsearch-head-master]# vim /etc/elasticsearch/elasticsearch.yml
http.cors.enabled: true
http.cors.allow-origin: "*"
[root@server1 elasticsearch-head-master]# systemctl restart elasticsearch
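
To check the CORS change from the command line instead of the browser, one option is to send a request with an Origin header and look for the Access-Control-Allow-Origin response header. This is a minimal sketch, assuming curl is installed and the addresses used above; has_cors_header and check_cors are hypothetical names for this example.

```shell
# Succeeds if the HTTP response headers read from stdin
# contain an Access-Control-Allow-Origin header.
has_cors_header() {
    grep -qi '^access-control-allow-origin:'
}

# Send a request that carries an Origin header, as the head page would,
# dump only the response headers, and test them.
check_cors() {
    es="${1:-http://172.25.1.1:9200}"
    origin="${2:-http://172.25.1.1:9100}"
    curl -s -D - -o /dev/null -H "Origin: ${origin}" "${es}/" | has_cors_header
}

# Usage (needs a running node):
#   check_cors && echo "CORS enabled" || echo "CORS header missing"
```

Note that allow-origin "*" accepts requests from any origin, which is convenient in a lab but is usually narrowed to specific origins elsewhere.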


The connection still fails, so modify the configuration file:

[root@server1 elasticsearch]# vim elasticsearch.yml 

discovery.seed_hosts: ["server1", "server3","server5"]
#
#Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["server1"]
[root@server1 elasticsearch]# systemctl restart elasticsearch


After this change, the connection succeeds.

(4) Add more nodes (the procedure is the same as before)

[root@server3 ~]# cd elk/
[root@server3 elk]# ls
elasticsearch-7.7.0-x86_64.rpm
[root@server3 elk]# rpm -ivh elasticsearch-7.7.0-x86_64.rpm 

[root@server5 ~]# cd elk/
[root@server5 elk]# ls
elasticsearch-7.7.0-x86_64.rpm
[root@server5 elk]# rpm -ivh elasticsearch-7.7.0-x86_64.rpm

[root@server1 elasticsearch]# scp -p elasticsearch.yml server3:/etc/elasticsearch/elasticsearch.yml
[root@server1 elasticsearch]# scp -p elasticsearch.yml server5:/etc/elasticsearch/elasticsearch.yml
#On each node, change node.name and network.host to that node's own host name and IP address
discovery.seed_hosts: ["server1", "server3","server5"]
#
#Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["server1","server3","server5"]

[root@server1 security]# scp limits.conf server3:/etc/security/limits.conf 
[root@server1 security]# scp limits.conf server5:/etc/security/limits.conf 
[root@server3 elasticsearch]# vim /usr/lib/systemd/system/elasticsearch.service
LimitNPROC=4096
LimitMEMLOCK=infinity
[root@server5 elasticsearch]# vim /usr/lib/systemd/system/elasticsearch.service
LimitNPROC=4096
LimitMEMLOCK=infinity

[root@server3 elk]# systemctl start elasticsearch
[root@server2 elk]# systemctl restart elasticsearch
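
After the extra nodes start, it can help to confirm that they actually joined the cluster. The sketch below compares the node names reported by the _cat/nodes API with the names you expect; it assumes curl and awk are available and uses the address from this article. list_nodes and missing_nodes are hypothetical helper names.

```shell
# Print the name of every node currently in the cluster, one per line.
list_nodes() {
    curl -s "${1:-http://172.25.1.1:9200}/_cat/nodes?h=name" | awk '{ print $1 }'
}

# Read actual node names on stdin and print each expected name
# (given as arguments) that is not among them.
missing_nodes() {
    actual="$(cat)"
    for name in "$@"; do
        printf '%s\n' "$actual" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Usage (needs a running cluster):
#   list_nodes | missing_nodes server1 server3 server5
```

An empty result means every expected node is present; any name printed is a node that has not joined, which usually points back at discovery.seed_hosts, the cluster name, or the network settings on that host.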


Elasticsearch node roles
Master: mainly responsible for creating and deleting indices in the cluster and for rebalancing data. The master does not handle data indexing or retrieval, so its load is light. When the master node loses contact or goes down, the ES cluster automatically elects a new master from the other master-eligible nodes.
Data Node: mainly responsible for indexing and retrieving data in the cluster. It is usually under heavy load.
Coordinating Node: formerly called the client node. Its main job is to distribute requests and merge the results. By default every node is a coordinating node, and this role cannot be turned off.
Ingest Node: pre-processes documents before they are indexed.
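
The _cat/nodes API reports these roles as compact letters in its node.role column (m for master-eligible, d for data, i for ingest, and - for a coordinating-only node). As a small illustration, the hypothetical helper below expands those letters into words; it only handles the roles discussed in this article and ignores any other letters a given version may print.

```shell
# Expand a node.role string from _cat/nodes into readable role names.
# Only the m, d and i letters are interpreted; others are ignored.
describe_roles() {
    roles="$1"
    if [ -z "$roles" ] || [ "$roles" = "-" ]; then
        echo "coordinating-only"
        return 0
    fi
    out=""
    case "$roles" in *m*) out="$out master-eligible" ;; esac
    case "$roles" in *d*) out="$out data" ;; esac
    case "$roles" in *i*) out="$out ingest" ;; esac
    if [ -z "$out" ]; then
        echo "other"
    else
        echo "${out# }"      # drop the leading space
    fi
}

# Usage:
#   describe_roles dim
#   describe_roles -
```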

3. Optimize nodes
By default, all three nodes are master-eligible. This section tunes the three nodes and gives each one a clear role.

(1)
In a production environment, if you do not change the role settings of the Elasticsearch nodes, the cluster is prone to split brain and other problems under high data volume and high concurrency. By default, every node in an Elasticsearch cluster is eligible to be the master, also stores data, and can serve queries.
Node roles are controlled by the following settings:

node.master true/false
node.data true/false
node.ingest true/false
search.remote.connect true/false

By default, the values of these properties are true.

(2)

node.master Whether the node is eligible to be the master node. Note: a value of true does not mean the node is the master, because the actual master is elected from among all master-eligible nodes.
node.data Whether the node stores data
node.ingest Whether the node pre-processes documents
search.remote.connect Whether the node may connect to remote clusters for cross-cluster search

(3)
First combination: (default)
node.master: true
node.data: true
node.ingest: true
search.remote.connect: true
This combination means that the node is master-eligible and also stores data. If such a node is elected as the actual master, it still stores data, so the load on it is higher. This is fine in a test environment but is not recommended in production.
(4)
Second combination: (Data node)
node.master: false
node.data: true
node.ingest: false
search.remote.connect: false
This combination means that the node is not master-eligible, so it does not take part in elections and only stores data.
Such a node is called a data node. A cluster needs several dedicated nodes of this kind to store data and to serve storage and query requests.

(5)
Third combination: (master node)
node.master: true
node.data: false
node.ingest: false
search.remote.connect: false
This combination means that this node will not store data, has the qualification to become a master node, can participate in the election, and may become a real master node.
This node is called the master node.

(6)
The fourth combination: (Coordinating Node)
node.master: false
node.data: false
node.ingest: false
search.remote.connect: false
This combination means that the node will neither become the master node nor store data.
Such a node acts as a coordinating node, which can spread the load when there is a large volume of requests.

(7)
The fifth combination: (Ingest Node)
node.master: false
node.data: false
node.ingest: true
search.remote.connect: false
This combination means that the node will neither become the master node nor store data.
Such a node acts as an ingest node, which pre-processes documents before they are indexed.
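
The five combinations above can be summarized as a small lookup. The sketch below is only an illustration: role_settings is a hypothetical helper that prints the four settings for a named role, using exactly the setting names and values listed in this article, so the output can be pasted into elasticsearch.yml.

```shell
# Print the elasticsearch.yml role settings for one of the five
# combinations described above. The role names are labels chosen
# for this example, not Elasticsearch keywords.
role_settings() {
    case "$1" in
        default)      m=true  d=true  i=true  r=true  ;;
        data)         m=false d=true  i=false r=false ;;
        master)       m=true  d=false i=false r=false ;;
        coordinating) m=false d=false i=false r=false ;;
        ingest)       m=false d=false i=true  r=false ;;
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
    printf 'node.master: %s\nnode.data: %s\nnode.ingest: %s\nsearch.remote.connect: %s\n' \
        "$m" "$d" "$i" "$r"
}

# Usage:
#   role_settings data
#   role_settings master >> /etc/elasticsearch/elasticsearch.yml
```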

In a production cluster, these responsibilities can be divided among the nodes.
It is recommended to configure three or more nodes as dedicated master nodes, which are only responsible for acting as master and maintaining the state of the whole cluster.
Then add a group of data nodes sized to the amount of data. These nodes only store data and serve indexing and query requests, so if users send requests frequently, the load on them is higher.
Therefore, it is recommended to add a further group of coordinating nodes to the cluster, which only handle user requests and provide request forwarding, load balancing and similar functions.
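
The reason for an odd number of three or more master-eligible nodes is majority voting: a master can only be elected while more than half of the master-eligible nodes can reach each other, which is what prevents split brain. Elasticsearch 7 manages this quorum itself, so the sketch below is just the arithmetic, with quorum and tolerated_failures as hypothetical helper names.

```shell
# Smallest number of master-eligible nodes that forms a majority.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# How many master-eligible nodes can be lost while a majority remains.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# Usage:
#   quorum 3              # majority needed with 3 master-eligible nodes
#   tolerated_failures 3  # nodes that may fail without losing the majority
```

With three master-eligible nodes the majority is two, so one node may fail. Going from three to four nodes raises the majority to three without increasing the number of tolerated failures, which is why odd counts are usually preferred.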

Node requirements
master node: an ordinary server is enough (moderate CPU and memory consumption)
data node: mainly consumes disk and memory.
path.data: data1,data2,data3
A multi-path configuration like this may lead to uneven data writes, so it is recommended to specify only one data path. The disks can be combined into a RAID 0 array; expensive SSDs are not required.
Coordinating node: high demands on CPU and memory

Experiment
server1:

[root@server1 elasticsearch]# vim elasticsearch.yml 
node.name: server1
node.master: true
node.data: false
node.ingest: false
search.remote.connect: false
[root@server1 elasticsearch]# systemctl restart elasticsearch

server2 and server3:

[root@server2 elasticsearch]# vim elasticsearch.yml 
node.master: true
node.data: true
node.ingest: false
search.remote.connect: false
[root@server2 elasticsearch]# systemctl restart elasticsearch

#After the settings take effect, the master node changes to server1
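
Besides the head page, the elected master can be read from the _cat/nodes API, whose master column marks the current master with an asterisk. This is a minimal sketch, assuming curl and awk are available and the address used throughout this article; elected_master and show_master are hypothetical helper names.

```shell
# Read "name node.role master" rows on stdin and print the name of
# the node whose last column is "*", i.e. the elected master.
elected_master() {
    awk '$NF == "*" { print $1 }'
}

# Fetch the three columns from a running cluster and extract the master.
show_master() {
    curl -s "${1:-http://172.25.1.1:9200}/_cat/nodes?h=name,node.role,master" | elected_master
}

# Usage (needs a running cluster):
#   show_master
```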

Tags: ElasticSearch RPM vim npm

Posted on Sun, 21 Jun 2020 06:22:37 -0400 by egpis