Contents
1, Introduction to Elasticsearch
2, Elasticsearch cluster installation
3, Configure the IK Chinese analyzer
4, Integrate Spring Boot
1, Introduction to Elasticsearch
Elasticsearch is a distributed document store. Instead of storing information as rows and columns, it stores complex data structures that have been serialized into JSON documents. When a cluster has multiple nodes, the stored documents are distributed across the whole cluster and can be accessed immediately from any node.
Once a document is stored, it is indexed and becomes searchable in near real time (within about one second). Elasticsearch uses a data structure called an inverted index, which supports fast full-text search. An inverted index lists every unique word that appears in any document and identifies all the documents each word occurs in.
An index can be regarded as an optimized collection of documents, and each document is a collection of fields, the key-value pairs that contain the data. By default, Elasticsearch indexes all data in every field, and each field type gets a dedicated, optimized data structure: text fields are stored in inverted indexes, while numeric and geo fields are stored in BKD trees. This per-field choice of data structure is what makes Elasticsearch searches so fast.
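As a toy illustration of the idea (not Elasticsearch's actual implementation), an inverted index can be sketched as a map from each term to the set of document ids that contain it:

```java
import java.util.*;

public class InvertedIndexDemo {
    // Build a toy inverted index: term -> sorted set of document ids containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            // Lowercase and split on non-word characters, a crude stand-in for analysis
            for (String term : docs.get(id).toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Elasticsearch stores JSON documents",
                "documents are indexed in near real time");
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println(index.get("documents")); // appears in both docs: [0, 1]
        System.out.println(index.get("json"));      // only in the first doc: [0]
    }
}
```

Looking up a term is then a single map access, which is why full-text search over an inverted index is fast regardless of document length.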
2, Elasticsearch cluster installation
This article installs version 7.15.2; some parameters differ from older versions. The JDK version is JDK 17.
Traditional way
1. Install the JDK environment
See: Installing JDK 17 & JDK 8 on Linux (Familiar Snail blog, CSDN)
2. Unzip the installation package
# Unzip the archive
tar -zxvf elasticsearch-7.15.2-linux-x86_64.tar.gz -C /usr/local/
# Rename the directory
mv /usr/local/elasticsearch-7.15.2 /usr/local/elasticsearch
3. Create a user and group
Elasticsearch refuses to start under the root account, so a dedicated account needs to be created.
# Create the group
groupadd es
# Create the user
useradd -g es snail_es
# Grant ownership
chown -R snail_es:es /usr/local/elasticsearch/
4. Create the es directory for data and logs, and grant ownership
# Create the directory and grant ownership
mkdir /usr/local/es
chown -R snail_es:es /usr/local/es
5. Modify the configuration file (adjust the per-node settings on each node as noted in the comments below)
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what you are trying to accomplish and the consequences.
#
# ---------------------------------- Cluster -----------------------------------
# Cluster name: identical on all three nodes
cluster.name: my-es
# ------------------------------------ Node ------------------------------------
# Node name: different on each node (node-1, node-2, node-3)
node.name: node-1
# ----------------------------------- Paths ------------------------------------
# Data directory
path.data: /usr/local/es/data
# Log directory
path.logs: /usr/local/es/logs
# ----------------------------------- Memory -----------------------------------
# Lock the memory on startup:
#bootstrap.memory_lock: true
# ---------------------------------- Network -----------------------------------
# By default Elasticsearch is only accessible on localhost.
# IP address of the current host
network.host: 192.168.139.160
# HTTP port exposed to clients (defaults to the first free port from 9200)
http.port: 9200
# --------------------------------- Discovery ----------------------------------
# Cluster discovery hosts; the transport port defaults to 9300
discovery.seed_hosts: ["192.168.139.160", "192.168.139.161", "192.168.139.162"]
# Master-eligible nodes used to bootstrap the cluster
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
# ---------------------------------- Various -----------------------------------
# Require explicit names when deleting indices:
#action.destructive_requires_name: true
6. Modify the system parameters, otherwise errors will be reported at startup
1. Elasticsearch uses a large number of file descriptors (file handles). Running out of file descriptors can be catastrophic and may result in data loss. Make sure the open-file-descriptor limit for the user running Elasticsearch is raised to 65536 or higher (the nofile entry in /etc/security/limits.conf).
2. Elasticsearch uses many thread pools for different types of operations, so it must be able to create new threads when needed. Ensure the user running Elasticsearch can create at least 4096 threads, either by running `ulimit -u 4096` as root before starting Elasticsearch, or by setting nproc in /etc/security/limits.conf.
# Solution: edit /etc/security/limits.conf and add the following, then re-login (or restart the server) for it to take effect
* soft nofile 65536
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096
3. Elasticsearch uses an mmapfs directory by default to store its indices. The operating system's default limit on mmap counts may be too low, which can cause out-of-memory exceptions: `sysctl -w vm.max_map_count=262144`
# Solution: append the following line to the end of /etc/sysctl.conf, then run /sbin/sysctl -p to apply it immediately
vm.max_map_count=262144
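The limits.conf requirements above can be sanity-checked programmatically. A minimal sketch (the parsing logic and thresholds mirror the text above; this is not any Elasticsearch API):

```java
import java.util.*;

public class LimitsCheck {
    // Parse /etc/security/limits.conf-style lines and return the soft limit
    // configured for the given item via a wildcard "*" entry (-1 if absent).
    static long softLimit(String conf, String item) {
        long value = -1;
        for (String line : conf.split("\n")) {
            String[] f = line.trim().split("\\s+");
            if (f.length == 4 && f[0].equals("*") && f[1].equals("soft") && f[2].equals(item)) {
                value = Long.parseLong(f[3]); // the last matching entry wins
            }
        }
        return value;
    }

    public static void main(String[] args) {
        String conf = String.join("\n",
                "* soft nofile 65536",
                "* hard nofile 131072",
                "* soft nproc 4096",
                "* hard nproc 4096");
        // Elasticsearch needs nofile >= 65536 and nproc >= 4096
        System.out.println(softLimit(conf, "nofile") >= 65536); // true
        System.out.println(softLimit(conf, "nproc") >= 4096);   // true
    }
}
```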
7. Switch to the new user, go to the installation directory, and start each of the three nodes
# Foreground start
./bin/elasticsearch
# Background (daemon) start
./bin/elasticsearch -d
8. Test the result: open the following URL in a browser
http://192.168.139.160:9200/_cat/nodes?pretty
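The `_cat/nodes` response is a plain-text table; the second-to-last column marks the elected master with `*`. A small sketch that picks the master node out of such a response (the sample rows below are illustrative, not captured from a real cluster):

```java
public class CatNodesParser {
    // Default _cat/nodes columns in 7.x:
    // ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
    // Returns the name of the elected master, or null if no row is marked '*'.
    static String electedMaster(String catNodes) {
        for (String line : catNodes.strip().split("\n")) {
            String[] f = line.trim().split("\\s+");
            if (f.length >= 2 && f[f.length - 2].equals("*")) {
                return f[f.length - 1]; // last column is the node name
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String sample =
                "192.168.139.160 23 95 1 0.02 0.05 0.06 cdfhilmrstw - node-1\n" +
                "192.168.139.161 31 94 1 0.01 0.04 0.05 cdfhilmrstw * node-2\n" +
                "192.168.139.162 18 93 1 0.03 0.05 0.05 cdfhilmrstw - node-3";
        System.out.println(electedMaster(sample)); // node-2
    }
}
```

If all three nodes appear in the table and exactly one row carries `*`, the cluster has formed correctly.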
Docker installation
1. Pull the image
[root@bogon ~]# docker pull elasticsearch:7.14.2
2. Create mount directory and authorize
[root@localhost ~]# mkdir -p /data/es/{conf,data,logs,plugins}
# Grant permissions
[root@localhost ~]# chmod 777 -R /data/
3. In the configuration file, only node.name needs to change per node (node-1, node-2, node-3):
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#
# ---------------------------------- Cluster -----------------------------------
# Cluster name: identical on all three nodes
cluster.name: my-es
# ------------------------------------ Node ------------------------------------
# Node name: different on each node (node-1, node-2, node-3)
node.name: node-1
# ----------------------------------- Paths ------------------------------------
# Data and log directories: the container defaults are used here
#path.data: /usr/local/es/data
#path.logs: /usr/local/es/logs
# ----------------------------------- Memory -----------------------------------
# Lock the memory on startup:
#bootstrap.memory_lock: true
# ---------------------------------- Network -----------------------------------
# Bind to all interfaces inside the container
network.host: 0.0.0.0
# HTTP port exposed to clients
http.port: 9200
# --------------------------------- Discovery ----------------------------------
# Cluster discovery hosts; the transport port defaults to 9300
discovery.seed_hosts: ["192.168.139.160", "192.168.139.161", "192.168.139.162"]
# Master-eligible nodes used to bootstrap the cluster
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
# ---------------------------------- Various -----------------------------------
# Require explicit names when deleting indices:
#action.destructive_requires_name: true
4. Apply the system settings from step 6 above, then create the Docker container
docker run --name elasticsearch --net=host \
  -v /data/es/conf/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /data/es/data:/usr/share/elasticsearch/data \
  -v /data/es/logs:/usr/share/elasticsearch/logs \
  -v /data/es/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.14.2
5. Verify
http://192.168.139.160:9200/_cat/nodes?pretty
3, Configure the IK Chinese analyzer
Download the IK analyzer release matching your Elasticsearch version:
https://github.com/medcl/elasticsearch-analysis-ik/releases
Traditional way
1. Create ik directory
[root@bogon plugins]# mkdir -p /usr/local/elasticsearch/plugins/ik
2. Unzip the downloaded analyzer into the ik directory
[root@bogon ik]# unzip /root/elasticsearch-analysis-ik-7.15.2.zip -d /usr/local/elasticsearch/plugins/ik/
3. Restart Elasticsearch so the analyzer takes effect
Docker mode
1. Extract it to the mount directory
[root@bogon plugins]# unzip /root/elasticsearch-analysis-ik-7.14.2.zip -d /data/es/plugins/ik
2. Restart the container
[root@bogon plugins]# docker start elasticsearch
3. Verify in the same way as above
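For intuition about why a dictionary-based Chinese analyzer like IK is needed (the default standard analyzer splits CJK text character by character), here is a toy forward maximum-matching segmenter. It is a deliberate simplification for illustration, not IK's actual algorithm:

```java
import java.util.*;

public class GreedySegmenter {
    // Toy forward maximum-matching segmentation: at each position, take the
    // longest dictionary word starting there, falling back to a single character.
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            String match = text.substring(i, i + 1); // single-char fallback
            for (int j = end; j > i + 1; j--) {      // try longest candidates first
                String cand = text.substring(i, j);
                if (dict.contains(cand)) {
                    match = cand;
                    break;
                }
            }
            tokens.add(match);
            i += match.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("中华人民共和国", "人民", "共和国");
        // With a dictionary, whole words are kept instead of single characters
        System.out.println(segment("中华人民共和国人民", dict, 7));
        // [中华人民共和国, 人民]
    }
}
```

A real check against the cluster is a POST to `/_analyze` with `"analyzer": "ik_smart"` and some Chinese text, comparing the tokens against the standard analyzer's per-character output.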
4, Integrate Spring Boot
Maven dependencies
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <!-- swagger dependency -->
    <dependency>
        <groupId>io.springfox</groupId>
        <artifactId>springfox-boot-starter</artifactId>
        <version>3.0.0</version>
    </dependency>
</dependencies>
Core code
package com.xiaojie.es.service;

import com.xiaojie.es.entity.User;
import com.xiaojie.es.mapper.UserMapper;
import com.xiaojie.es.util.ElasticSearchUtils;
import org.apache.commons.lang3.RandomStringUtils;
import org.apache.commons.lang3.RandomUtils;
import org.apache.commons.lang3.StringUtils;
import org.elasticsearch.common.Strings;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.List;
import java.util.Map;

/**
 * @Description:
 * @author: yan
 * @date: 2021.11.30
 */
@Service
public class UserService {

    @Autowired
    private UserMapper userMapper;

    @Autowired
    private ElasticSearchUtils elasticSearchUtils;

    // Add users
    public void add() throws IOException {
        // elasticSearchUtils.createIndex("user");
        for (int i = 0; i < 100; i++) {
            User user = new User();
            String chars = "11 On June 29, in the women's doubles final of the 2021 World Table Tennis Championships in Houston, the United States, Chinese team sun yingsha Wang Manyu beat Japanese team ITO Meicheng Zaotian Xina 3-0 to win the championship";
            user.setName(RandomStringUtils.random(3, chars));
            user.setAge(RandomUtils.nextInt(18, 40));
            userMapper.add(user);
            // Also write to Elasticsearch
            elasticSearchUtils.addData(user, "user");
        }
    }

    /**
     * @todo Query users
     * @author yan
     * @date 2021/11/30 16:24
     * @return java.util.List<java.util.Map<java.lang.String, java.lang.Object>>
     */
    public List<Map<String, Object>> search() throws IOException {
        // Build the query conditions
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
        // Exact query
        // boolQueryBuilder.must(QueryBuilders.wildcardQuery("name", "Zhang San"));
        // Fuzzy query
        boolQueryBuilder.filter(QueryBuilders.wildcardQuery("name", "king"));
        // Range query. from/to: closed interval; gt: open (>); gte: closed (>=); lt: open (<); lte: closed (<=)
        boolQueryBuilder.filter(QueryBuilders.rangeQuery("age").from(18).to(32));
        SearchSourceBuilder query = new SearchSourceBuilder();
        query.query(boolQueryBuilder);
        // All fields are queried by default
        String fields = "";
        // Field to highlight
        String highlightField = "name";
        if (StringUtils.isNotBlank(fields)) {
            // Query only the given fields; leave unset to query all fields
            query.fetchSource(new FetchSourceContext(true, fields.split(","), Strings.EMPTY_ARRAY));
        }
        // Paging parameter, the document offset (derived from pageNum)
        Integer from = 0;
        // Paging parameter, equivalent to pageSize
        Integer size = 10;
        query.from(from);
        query.size(size);
        // Sort field and order. Note: text fields must be sorted on the ".keyword" sub-field
        // query.sort("age", SortOrder.DESC);
        query.sort("name" + ".keyword", SortOrder.ASC);
        return elasticSearchUtils.searchListData("user", query, highlightField);
    }
}
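One detail in the code above that is easy to get wrong: `SearchSourceBuilder.from()` takes a document offset, not a page number. A small helper makes the conversion explicit (this helper is hypothetical, not part of the article's ElasticSearchUtils):

```java
public class EsPaging {
    // SearchSourceBuilder.from() expects an offset into the result set:
    // offset = (pageNum - 1) * pageSize, where pageNum is 1-based.
    static int offset(int pageNum, int pageSize) {
        if (pageNum < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNum and pageSize must be >= 1");
        }
        return (pageNum - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(offset(1, 10)); // first page starts at offset 0
        System.out.println(offset(3, 10)); // third page starts at offset 20
    }
}
```

Note also that from/size paging is capped by `index.max_result_window` (10000 by default), so deep paging should use `search_after` instead.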
Full code: the es module of the author's spring-boot repository (Spring Boot integration with Redis, message-oriented middleware, and related examples)
References:
Elasticsearch Chinese documentation | Elasticsearch technology forum
https://www.cnblogs.com/wqp001/p/14478900.html