Gulimall – Elasticsearch – advanced note 1
1. What is Elasticsearch?
Elasticsearch is a distributed search and analytics engine at the core of the Elastic Stack. Logstash and Beats help you collect, aggregate, enrich, and store your data in Elasticsearch. Kibana lets you interactively explore, visualize, and share insights into the data, and manage and monitor the stack. Elasticsearch is where the indexing, searching, and analysis happen.
Elasticsearch provides near-real-time search and analytics for all types of data. Whether you have structured or unstructured text, numeric data, or geospatial data, Elasticsearch can efficiently store and index it in a way that supports fast search. You can go beyond simple data retrieval and aggregate information to discover trends and patterns in your data. As your data and query volume grow, Elasticsearch's distributed nature lets your deployment scale seamlessly with it.
Not every problem is a search problem, but Elasticsearch offers the speed and flexibility to handle data in a variety of use cases:
- Add a search box to an app or website
- Store and analyze logs, metrics, and security event data
- Use machine learning to automatically model the behavior of data in real time
- Automate business workflows using Elasticsearch as a storage engine
- Use Elasticsearch as a geographic information system (GIS) to manage, integrate and analyze spatial information
- Use Elasticsearch as a bioinformatics research tool to store and process genetic data
People constantly surprise us with novel ways of using search. But whether your use case is similar to one of these or you are using Elasticsearch to solve a new problem, you work with data, documents, and indices in Elasticsearch the same way.
2. Introduction
Full-text search is the most common requirement, and the open-source Elasticsearch is the first choice among full-text search engines. It can quickly store, search, and analyze massive amounts of data; Wikipedia, Stack Overflow, and GitHub all use it. The bottom layer of Elastic is the open-source library Lucene, but you cannot use Lucene directly: you must write your own code to call its interface. Elastic is a wrapper around Lucene that provides a REST API for operations, usable out of the box.
REST API: naturally cross-platform.
- Official documentation
- Official Chinese documentation
- Community Chinese documentation
3. Basic concepts
3.1 Index
- As a verb: to index, equivalent to an insert in MySQL
- As a noun: an index, equivalent to a database in MySQL
3.2 Type
3.2.1 concept
One or more types can be defined in an index, and the data of each type is stored together.
This is similar to MySQL, where one or more tables can be defined in a database.
3.2.2 ElasticSearch7 - concept
In Elasticsearch 7.x, the type parameter in the URL is optional; for example, indexing a document no longer requires a document type.
Elasticsearch 8.x no longer supports the type parameter in URLs.
Reason:
In a relational database, two tables are independent: even columns with the same name in different tables do not affect each other. This is not the case in ES. Elasticsearch is a search engine built on Lucene, and fields with the same name under different types of one ES index are ultimately handled the same way inside Lucene.
- Two user_name fields under two different types are actually treated as the same field under one ES index. You would have to define identical field mappings in both types; otherwise, the same field name in different types conflicts during processing and degrades Lucene's efficiency.
- Removing type improves the efficiency of ES data processing.
3.2.3 Elasticsearch version upgrade problem (upgrade to 8)
Solution: migrate indices from multiple types to a single type, giving each type of document its own index.
3.3 Document
A document is a piece of data of a certain type saved under an index, in JSON format.
A document is like a row in a MySQL table.
3.4 Inverted index
Inverted index: it is called inverted because the locations of records are determined from attribute values, rather than attribute values being looked up from records.
Index storage example
The whole sentence is split into words (tokens), and each token is stored together with the positions of the records that contain it. A query looks up record positions by token, and the hits are then sorted by their relevance score against the search conditions.
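As a quick illustration (a minimal sketch; the sample text is made up), the _analyze API shows the tokens a sentence is split into, together with each token's position and character offsets, which is exactly the information an inverted index stores:

POST _analyze
{
  "analyzer": "standard",
  "text": "Quick brown fox"
}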
4. Installing ES and Kibana with Docker
4.1 Download the image files
The Elasticsearch and Kibana versions must stay in sync.
docker pull elasticsearch:7.4.2   # stores and retrieves data
docker pull kibana:7.4.2          # visualizes the retrieved data
4.2 create instance
4.2.1 create ElasticSearch instance
# First create the folders to be mapped for es data and configuration
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
# Configure the es address
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
# Grant permissions
chmod -R 777 /mydata/elasticsearch/
# Create and start an instance
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.4.2
**Special attention: `-e ES_JAVA_OPTS="-Xms64m -Xmx256m"` sets the initial and maximum heap memory of ES in a test environment; without it, the default heap is too large and ES will fail to start.**
4.2.2 Difference between ElasticSearch ports 9200 and 9300
- 9200 is the HTTP port, mainly used for external communication
- 9300 is the TCP port; Java programs (jars) communicate with each other over TCP, and ES cluster nodes communicate with each other over 9300
Test that the instance was created successfully:
192.168.157.128:9200 (virtual machine address + port 9200)
4.2.3 create Kibana instance
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.157.128:9200 -p 5601:5601 \
  -d kibana:7.4.2
Test that the instance was created successfully:
192.168.157.128:5601
4.3 Set ES and Kibana to start when Docker starts
# Set es to start when Docker starts
docker update elasticsearch --restart=always
# Set Kibana to start when Docker starts
docker update kibana --restart=always
After restarting Docker, ES and Kibana still work.
5. Preliminary search
5.1 _cat
- GET /_cat/nodes: view all nodes
- GET /_cat/health: View es health status
- GET /_cat/master: View master node
- GET /_cat/indices: view all indices (equivalent to show databases; in MySQL)
5.2 index (save) a document
5.2.1 PUT method (must be with id)
To save a piece of data, specify which index and type to save it under and which unique identifier to use.
# Save data No. 1 under the external type of the customer index
PUT customer/external/1
{
  "name": "Zhang shan"
}
5.2.2 POST method (id optional)

# Index (save) a document (POST)
POST customer/external/2
{
  "name": "Li Si"
}
5.2.3 Difference between indexing (saving) documents with POST and with PUT
Personal understanding:
Both PUT and POST can add and modify documents; when a document is modified, it is re-indexed under the specified id.
- POST can add a document without an id (one is generated automatically) or with a user-defined id. When the specified id already exists, the data is modified and the version number increases.
  - POST add with id: e.g. 5.2.2
  - POST add without id: the id is generated automatically (see the sketch after this list)
  - POST modify: send again with an existing id
- PUT can also add or modify, but it must always specify an id; a PUT without an id reports an error. Because PUT requires an id, it is generally used for modification.
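A minimal sketch of a POST without an id (the document content here is made up for illustration); ES responds with an auto-generated _id:

POST customer/external
{
  "name": "Wang Wu"
}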
5.3 Query a document by id
# Query the document with the specified id
GET /customer/external/1
# Query result
{
  "_index" : "customer",   // which index
  "_type" : "external",    // which type
  "_id" : "1",             // record id
  "_version" : 2,          // version number
  "_seq_no" : 10,          // concurrency control field, +1 on every update; used as an optimistic lock
  "_primary_term" : 1,     // similar; changes when the primary shard is reassigned, e.g. after a restart
  "found" : true,
  "_source" : {            // the real content
    "name" : "Zhang shan2"
  }
}
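As a hedged illustration of using these fields as an optimistic lock (the values 10 and 1 are taken from the response above), an update can be made conditional with if_seq_no and if_primary_term; if another writer has updated the document in the meantime, ES rejects the request with a 409 version conflict:

PUT customer/external/1?if_seq_no=10&if_primary_term=1
{
  "name": "Zhang shan3"
}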
5.4 updating documents
5.4.1 POST update mode I
POST customer/external/1/_update
{
  "doc": {
    "name": "John Doew"
  }
}
5.4.2 POST update mode II
# The previous way of writing it
POST customer/external/1
{
  "name": "John Doe2"
}
5.4.3 PUT update
PUT customer/external/1
{
  "name": "John Doe3"
}
5.4.4 Add attributes while updating
# Update and add attributes at the same time
POST customer/external/1/_update
{
  "doc": {
    "name": "Jane Doe",
    "age": 20
  }
}
5.4.5 Characteristics of the three update methods
The difference between POST mode 1 and POST mode 2 is whether _update is included.
- POST with _update: the source document data is compared first; if nothing has changed, no operation is performed and the document version does not increase.
- POST without _update: the data is always saved again and the version increases.

POST usage scenarios:
- For heavy concurrent updates, use POST without _update.
- For heavy concurrent queries that are only occasionally updated, use POST with _update: it compares before updating, so the allocation rules are recalculated only when something actually changed.
- A PUT operation always saves the data again and increases the version.
5.5 Delete a document with a specified id & delete an index
5.5.1 delete the specified id document
# Remove a document
DELETE customer/external/1
5.5.2 delete index
# Delete an index
DELETE customer
5.6 bulk batch API
Perform multiple index or delete operations in a single API call. This reduces overhead and can greatly improve indexing speed.
5.6.1 syntax format
{ action: { metadata }}   // action: the operation; metadata: which data to operate on
{ request body }          // the content of the operation

Action lines and request bodies come in pairs.
# Batch operation on the external type under the customer index
POST customer/external/_bulk
{"index":{"_id":"1"}}    // index (add) a document with id=1
{"name": "John Doe" }    // attribute values for the document with id=1
{"index":{"_id":"2"}}    // index (add) a document with id=2
{"name": "Jane Doe" }    // attribute values for the document with id=2
5.6.2 complex examples
POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123"} }
{ "doc" : {"title" : "My updated blog post"} }
The bulk API executes all actions in order. If a single action fails for any reason, the remaining actions after it are still processed. When the bulk API returns, it reports the status of each action (in the same order they were sent), so you can check whether a given action failed.
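For illustration, an abbreviated sketch of the shape of a bulk response (the values are made up); each entry in items reports one action, in request order:

{
  "took": 30,
  "errors": false,
  "items": [
    { "delete": { "_index": "website", "_id": "123", "status": 200 } },
    { "create": { "_index": "website", "_id": "123", "status": 201 } }
  ]
}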
5.6.3 sample test data
The original test data URL is no longer available.
New test data address: https://gitee.com/zhourui815/gulimall/blob/master/doc/es%E6%B5%8B%E8%AF%95%E6%95%B0%E6%8D%AE.json
Import the test data with POST bank/account/_bulk, pasting the test data as the request body.
6. Advanced search
6.1 SearchAPI
ES supports two basic retrieval methods:
- One is to send the search parameters through the REST request URI (uri + retrieval parameters)
- The other is to send them through the REST request body (uri + request body)
6.1.1 retrieval information
**All retrieval starts from _search**
# Retrieve all information under bank, including types and docs
GET bank/_search
6.1.1.1 request parameter retrieval
# Request-parameter retrieval
# q=*                        means query everything, similar to select *
# sort=account_number:asc    means sort by the account_number field in ascending order
GET bank/_search?q=*&sort=account_number:asc
6.1.1.2 uri + request body for retrieval
# uri + request body retrieval
GET bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": { "order": "desc" } }
  ]
}
6.2 Query DSL
Elasticsearch provides a JSON-style DSL (domain-specific language) for executing queries, called the Query DSL. The query language is very comprehensive and feels a little complicated at first; the way to really learn it is to start with a few basic examples.
6.2.1 basic syntax format
6.2.1.1 query structure
# Typical structure of a query statement
{
  QUERY_NAME: {
    ARGUMENT: VALUE,
    ARGUMENT: VALUE, ...
  }
}
# If it targets a field, the structure is:
{
  QUERY_NAME: {
    FIELD_NAME: {
      ARGUMENT: VALUE,
      ARGUMENT: VALUE, ...
    }
  }
}
6.2.1.2 query example
Query the bank index, sorted by the account_number field in descending order, with a page size of 5.
# Basic query example
GET bank/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 5,
  "sort": [
    { "account_number": { "order": "desc" } }
  ]
}
- query defines how to query
- match_all is a query type meaning "match everything"; in es, many query types can be combined inside query to build complex queries
- Besides query, other parameters can change the query result, such as sort and size
- from + size together implement paging
- sort supports multi-field sorting: later fields break ties when earlier fields are equal; otherwise the earlier order prevails
6.2.2 match
# match on a basic (non-string) type: exact match
GET bank/_search
{
  "query": {
    "match": { "account_number": "20" }
  }
}
match returns account_number=20
# match on a string type: full-text retrieval
GET bank/_search
{
  "query": {
    "match": { "address": "mill" }
  }
}
This queries all records whose address contains the word mill. When a string field is searched, full-text retrieval is performed and each record gets a relevance score.
# match on a string with multiple words (tokenization + full-text retrieval)
GET bank/_search
{
  "query": {
    "match": { "address": "mill road" }
  }
}
This queries all records whose address contains mill, road, or mill road, each with a relevance score. (To require all terms to match instead, see the sketch below.)
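As a hedged aside (not part of the original note), the match query accepts an operator parameter that requires every term to match, instead of the default or behavior:

GET bank/_search
{
  "query": {
    "match": {
      "address": {
        "query": "mill road",
        "operator": "and"
      }
    }
  }
}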
6.2.3 match_phrase matching
**Phrase matching: the value to be matched is retrieved as one whole phrase (not split into separate words)**
# match_phrase matching
GET bank/_search
{
  "query": {
    "match_phrase": { "address": "mill road" }
  }
}
Finds all records whose address contains the phrase mill road, with relevance scores. (For exact whole-field matching, see the keyword sketch below.)
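A related hedged sketch (not in the original note): matching on the keyword subfield requires the entire field value to equal the given text exactly, which is stricter than a phrase match:

GET bank/_search
{
  "query": {
    "match": { "address.keyword": "990 Mill Road" }
  }
}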
6.2.4 multi_match multi field matching
# multi_match: multi-field matching
GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill",
      "fields": ["address", "state"]
    }
  }
}
Queries records whose state or address contains mill.
6.2.5 bool compound query
bool is used for compound query:
It is important to understand that compound statements can combine any other query statements, including compound statements. This means that compound statements can be nested with each other and can express very complex logic.
6.2.5.1 must
All conditions listed in must must be met
# All conditions listed in must must be met
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "Mill" } },
        { "match": { "gender": "M" } }
      ]
    }
  }
}
6.2.5.2 should
The conditions listed in should are preferred but not required: matching them increases the relevance score of a document without changing the query result. However, if the query contains only a should clause with a single matching rule, the should condition becomes the default matching condition and does change the query result.
# The conditions in should are preferred: matching them increases the
# relevance score of a document but does not change the query result
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "gender": "M" } }
      ],
      "should": [
        { "match": { "address": "lane" } }
      ]
    }
  }
}
6.2.5.3 must_not
The conditions listed in must_not must not be matched.
# must_not: must not match the specified condition
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "gender": "M" } }
      ],
      "should": [
        { "match": { "address": "lane" } }
      ],
      "must_not": [
        { "match": { "FIELD": "TEXT" } }
      ]
    }
  }
}
"email" queried in 6.2.5.2:“ winnieholland@neteria.com ”, my record is missing
The address contains mill and the gender is M. if there is lane in the address, it is best, but the email must not contain baluba.com
6.2.5.4 filter result filtering
Not all queries need to generate scores, especially ones used only for **filtering**. To avoid computing scores, Elasticsearch automatically detects these scenarios and optimizes query execution. A sketch follows.
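A hedged sketch of a filter clause (the balance range is made up): the range condition narrows the results without contributing to the relevance score:

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } }
      ],
      "filter": {
        "range": {
          "balance": { "gte": 10000, "lte": 20000 }
        }
      }
    }
  }
}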
6.2.5.5 Summary
| Clause | Description |
|---|---|
| must | The clause (query) must appear in matching documents and contributes to the score. |
| filter | The clause (query) must appear in matching documents, but unlike must, the score of this query is ignored. |
| should | The clause (query) should appear in matching documents. If a boolean query contains no must or filter clause, at least one should clause must match. The minimum number of should clauses that must match can be set with the minimum_should_match parameter. |
| must_not | The clause (query) must not appear in matching documents. |
6.2.6 term: non-text field retrieval
Works like match, matching the value of an attribute; match is used for full-text (text) fields, while term is used for other, non-text fields.
# term: retrieval on non-text fields
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "account_number": { "value": "970" } } },
        { "match": { "address": "Mill" } }
      ]
    }
  }
}
6.2.7 aggregations: aggregation retrieval
Aggregations provide the ability to group data and extract statistics from it. The simplest aggregations are roughly equivalent to SQL GROUP BY and SQL aggregate functions. In Elasticsearch, a single search can return hits and aggregation results together in one response. This is very powerful and efficient: you can run a query and multiple aggregations in one round trip, getting all of their results at once through a concise, simplified API and avoiding extra network round trips.
6.2.7.1 Age distribution and average age
Search for the age distribution and the average age of everyone whose address contains mill, without displaying their details.
# aggregations: age distribution and average age
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "Mill" } }
      ]
    }
  },
  "aggs": {
    "group_by_state": {
      "terms": { "field": "age", "size": 10 }
    },
    "avgAge": {
      "avg": { "field": "age" }
    }
  }
}
"size": 0 means do not return the search hits (only the aggregation results).
aggs performs the aggregation; the syntax is as follows:

"aggs": {
  "aggs_name (the name of this aggregation, used to find it in the result set)": {
    "AGG_TYPE (the aggregation type: avg, term, terms, ...)": {}
  }
}
6.2.7.2 Age distribution with the average balance per age
# Average balance within each age bucket
GET bank/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "agg_avg": {
      "terms": { "field": "age", "size": 10 },
      "aggs": {
        "banlances_avg": {
          "avg": { "field": "balance" }
        }
      }
    }
  },
  "size": 0
}
Note the difference from 6.2.7.1: that example runs two sibling aggregations, while this example nests a second aggregation inside the first one.
6.2.7.3 Averages by age distribution and gender distribution
Find the distribution of all ages and, within each age bucket, the average balance of gender M and of gender F, as well as the overall average balance of that age bucket.
# For each age bucket, find the average balance of M and of F,
# as well as the overall average balance of the bucket
GET bank/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "age_state": {
      "terms": { "field": "age", "size": 100 },
      "aggs": {
        "sex_agg": {
          "terms": { "field": "gender.keyword", "size": 10 },
          "aggs": {
            "banlances_avg": {
              "avg": { "field": "balance" }
            }
          }
        }
      }
    }
  },
  "size": 0
}
6.3 Mapping
6.3.1 field type
6.3.2 mapping
Mapping is used to define a document and how its attributes (fields) are stored and indexed. For example, mappings are used to define:
- Which string attributes should be considered full text fields.
- Which attributes contain numbers, dates, or geographic locations.
- Whether all attributes in the document can be indexed (the _all configuration).
- Format of the date.
- Custom mapping rules to perform dynamic attribute addition.
6.3.2.1 viewing mapping information
# View mapping information
GET bank/_mapping
We did not specify any field types when creating the index, so why does the mapping query return them?
Answer: es automatically guesses the mapping types from the data (dynamic mapping).
6.3.3 new version change
ES 7 and above removed the concept of type.
- In a relational database, two tables are independent: even columns with the same name do not affect each other. This is not the case in es. Elasticsearch is a search engine built on Lucene, and fields with the same name under different types of one ES index are ultimately handled the same way inside Lucene.
- Two user_name fields under two different types are actually treated as the same field under one ES index. You would have to define identical field mappings in both types; otherwise, the same field name in different types conflicts during processing and degrades Lucene's efficiency.
- Removing type improves the efficiency of ES data processing.
Elasticsearch 7.x
- The type parameter in the URL is optional; for example, indexing a document no longer requires a document type.
Elasticsearch 8.x
- The type parameter in the URL is no longer supported.
Solutions:
1) Migrate indices from multiple types to a single type, giving each type of document its own index.
2) Migrate all the type data under the existing index to a specified location; see Data migration (6.3.3.4) for details.
6.3.3.1 create mapping
# Create an index and specify its mapping
PUT my-index
{
  "mappings": {
    "properties": {
      "age":   { "type": "integer" },
      "email": { "type": "keyword" },
      "name":  { "type": "text" }
    }
  }
}
6.3.3.2 add new field mapping
# Add a new field mapping
PUT my-index/_mapping
{
  "properties": {
    "employee-id": {
      "type": "text",
      "index": false
    }
  }
}
6.3.3.3 update mapping
We cannot update a mapping field that already exists. To update it, we must create a new index and migrate the data.
6.3.3.4 data migration
6.3.3.4.1 query the mapping type you want to modify
# Query the existing mapping
GET bank/_mapping

Copy the properties from the response.
6.3.3.4.2 add a new index
- Paste the properties copied in 6.3.3.4.1 into the new index definition, but do not execute it yet:
# Create a new index (copied mapping, to be adjusted)
PUT newbank
{
  "properties": {
    "account_number" : { "type" : "long" },
    "address" : {
      "type" : "text",
      "fields" : {
        "keyword" : { "type" : "keyword", "ignore_above" : 256 }
      }
    },
    "age" : { "type" : "long" },
    "balance" : { "type" : "long" },
    "city" : {
      "type" : "text",
      "fields" : {
        "keyword" : { "type" : "keyword", "ignore_above" : 256 }
      }
    },
    "email" : {
      "type" : "text",
      "fields" : {
        "keyword" : { "type" : "keyword", "ignore_above" : 256 }
      }
    },
    "employer" : {
      "type" : "text",
      "fields" : {
        "keyword" : { "type" : "keyword", "ignore_above" : 256 }
      }
    },
    "firstname" : {
      "type" : "text",
      "fields" : {
        "keyword" : { "type" : "keyword", "ignore_above" : 256 }
      }
    },
    "gender" : {
      "type" : "text",
      "fields" : {
        "keyword" : { "type" : "keyword", "ignore_above" : 256 }
      }
    },
    "lastname" : {
      "type" : "text",
      "fields" : {
        "keyword" : { "type" : "keyword", "ignore_above" : 256 }
      }
    },
    "state" : {
      "type" : "text",
      "fields" : {
        "keyword" : { "type" : "keyword", "ignore_above" : 256 }
      }
    }
  }
}
- Modify and save the mapping type of each field according to your own needs
# Create the new index with adjusted field types
PUT /newbank
{
  "mappings": {
    "properties": {
      "account_number": { "type": "long" },
      "address": { "type": "text" },
      "age": { "type": "integer" },
      "balance": { "type": "long" },
      "city": { "type": "keyword" },
      "email": { "type": "keyword" },
      "employer": { "type": "keyword" },
      "firstname": { "type": "text" },
      "gender": { "type": "keyword" },
      "lastname": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "state": { "type": "keyword" }
    }
  }
}
6.3.3.4.3 data migration
First create newbank with the correct mapping, then migrate the data as follows.
Fixed syntax:
POST _reindex
{
"source": {
"index": "twitter"
},
"dest": {
"index": "new_twitter"
}
}
Migrate the data stored under the type of the old index:
# Data migration
POST _reindex
{
  "source": { "index": "bank" },
  "dest": { "index": "newbank" }
}
Migration succeeded
6.4 Tokenization (word segmentation)
- A tokenizer receives a character stream, divides it into independent tokens (usually individual words), and then outputs the token stream.
- For example, the whitespace tokenizer splits text whenever it encounters whitespace characters: it splits the text "Quick brown fox!" into [Quick, brown, fox!].
- The tokenizer is also responsible for recording the order or position of each term (used for phrase and word-proximity queries) and the start and end character offsets of the original word each term represents (used for highlighting search results).
- Elasticsearch provides many built-in tokenizers that can be used to build custom analyzers.
6.4.1 Install the ik tokenizer
Effect before installing the ik tokenizer
**Note: the default elasticsearch-plugin install xxx.zip automatic installation cannot be used here**
https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v6.4.2 Install the release matching your es version (7.4.2 selected here)
- Since the plugins directory was mapped earlier, download elasticsearch-analysis-ik-7.4.2.zip into /mydata/elasticsearch/plugins/:
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
- Unzip the downloaded file
unzip elasticsearch-analysis-ik-7.4.2.zip
- Delete zip file
rm -rf *.zip
- Move all files in the elasticsearch folder into an ik directory (created yourself):
mv elasticsearch/ ik
- Confirm that the tokenizer is installed:

# Enter the container
docker exec -it <container id> /bin/bash
# List the installed plugins
cd ../bin
elasticsearch-plugin list
- After confirming ik is listed, restart the container:
docker restart elasticsearch
6.4.2 Test the tokenizers
6.4.2.1 Use the default tokenizer
# Use the default tokenizer ("我是中国人" means "I am Chinese")
POST _analyze
{
  "text": "我是中国人"
}
6.4.2.2 Use the ik_smart tokenizer
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "我是中国人"
}
6.4.2.3 Use the ik_max_word tokenizer
# Use the ik_max_word tokenizer
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "我是中国人"
}
6.4.2.4 Summary
Different tokenizers clearly segment text in different ways, so do not rely on the default mapping when defining an index from now on: create the mapping manually, because you need to choose an appropriate tokenizer (a sketch follows).
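A hedged sketch (the index and field names are made up) of choosing the ik analyzers in a manually created mapping; documents would be tokenized with ik_max_word at index time and queries with ik_smart at search time:

PUT my_text_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}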
6.4.4 Custom dictionary
6.4.4.1 Before defining a custom dictionary
6.4.4.2 Custom dictionary test
6.4.4.2.1 Create the dictionary file
This builds on the nginx setup; see chapter 8 for building nginx.
cd /mydata/nginx/html/
# Create the custom dictionary file
vim fenci.txt
Add new words
6.4.4.2.2 Configure the custom dictionary
Modify the tokenizer's configuration file:
vim /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
Specify the custom dictionary address (see the sketch below):
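A hedged sketch of the relevant part of IKAnalyzer.cfg.xml, assuming the remote_ext_dict entry supported by the ik plugin and pointing at the fenci.txt served by the nginx from chapter 8:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- remote extension dictionary -->
    <entry key="remote_ext_dict">http://192.168.157.128/fenci.txt</entry>
</properties>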
Restart es
docker restart elasticsearch
test result
7. Elasticsearch-Rest-Client
7.1 Why Elasticsearch-Rest-Client?
9300: TCP
- spring-data-elasticsearch:transport-api.jar;
- It is hard to keep different Spring Boot versions of transport-api.jar matched with the es version
- 7.x is no longer recommended and will be abandoned after 8
9200: HTTP
- JestClient: unofficial, slow update
- RestTemplate: simulate sending HTTP requests. Many ES operations need to be encapsulated by themselves, which is troublesome
- HttpClient: same as above
- Elasticsearch-Rest-Client: the official RestClient, which encapsulates ES operations; the API is layered and easy to use
Final choice: Elasticsearch-Rest-Client (elasticsearch-rest-high-level-client)
Why use the high-level client?
The difference between the low-level and high-level clients is like the difference between JDBC and MyBatis.
7.2 SpringBoot integration
7.2.1 Add a new module: gulimall-search
Startup class:
package site.zhourui.gilimall.search;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

@EnableDiscoveryClient
@SpringBootApplication(exclude = DataSourceAutoConfiguration.class)
public class GulimallSearchApplication {

    public static void main(String[] args) {
        SpringApplication.run(GulimallSearchApplication.class, args);
    }
}
pom file
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.2.1.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>site.zhourui.gulimall</groupId>
    <artifactId>gulimall-search</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>gulimall-search</name>
    <description>ElasticSearch retrieval service</description>

    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>7.4.2</elasticsearch.version>
        <spring-cloud.version>Hoxton.SR9</spring-cloud.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.zhourui.gulimall</groupId>
            <artifactId>gulimall-common</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.4.2</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
            <version>2.2.0.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
    </dependencies>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>${spring-cloud.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
7.2.2 add configuration class
gulimall-search/src/main/java/site/zhourui/gilimall/search/config/GulimallElasticSearchConfig.java
package site.zhourui.gilimall.search.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author zr
 * @date 2021/10/25 14:27
 */
@Configuration
public class GulimallElasticSearchConfig {

    // Global settings shared by all requests (singleton): authorization headers, async options, etc.
    public static final RequestOptions COMMON_OPTIONS;

    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        // builder.addHeader("Authorization", "Bearer " + TOKEN);
        // builder.setHttpAsyncResponseConsumerFactory(
        //     new HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory(30 * 1024 * 1024 * 1024));
        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("192.168.157.128", 9200, "http")));
        return client;
    }
}
7.2.3 testing
gulimall-search/src/test/java/site/zhourui/gilimall/search/GulimallSearchApplicationTests.java
package site.zhourui.gilimall.search;

import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

@SpringBootTest
@RunWith(SpringRunner.class)
class GulimallSearchApplicationTests {

    @Autowired
    RestHighLevelClient client;

    @Test
    void contextLoads() {
        System.out.println(client);
    }
}
test result
7.3 Usage
Official API reference documentation
7.3.1 Index (save) data
gulimall-search/src/test/java/site/zhourui/gilimall/search/GulimallSearchApplicationTests.java
package site.zhourui.gilimall.search;

import com.alibaba.fastjson.JSON;
import lombok.Data;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import site.zhourui.gilimall.search.config.GulimallElasticSearchConfig;

import java.io.IOException;

@SpringBootTest
@RunWith(SpringRunner.class)
class GulimallSearchApplicationTests {

    @Autowired
    RestHighLevelClient client;

    /**
     * Test indexing into es
     */
    @Test
    void index() throws IOException {
        IndexRequest request = new IndexRequest("users"); // index name
        request.id("1"); // document id

        User user = new User();
        user.setUserName("Zhang San");
        user.setAge(18);
        user.setGender("male");
        String jsonString = JSON.toJSONString(user);
        request.source(jsonString, XContentType.JSON); // content to save

        // Perform the operation
        IndexResponse index = client.index(request, GulimallElasticSearchConfig.COMMON_OPTIONS);

        // Extract the useful response data
        System.out.println(index);
    }

    @Data
    class User {
        private String userName;
        private Integer age;
        private String gender;
    }

    @Test
    void contextLoads() {
        System.out.println(client);
    }
}
Successful indexing
7.3.2 Retrieve data
/**
 * Test querying es
 * @throws IOException
 */
@Test
void search() throws IOException {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest();
    // Specify the index
    searchRequest.indices("bank");
    // Specify the DSL / search criteria
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    // Other search conditions could be built here:
    // searchSourceBuilder.from();
    // searchSourceBuilder.size();
    // searchSourceBuilder.aggregation();
    searchRequest.source(searchSourceBuilder);

    // Perform the retrieval
    SearchResponse searchResponse = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);

    // Analyze the result
    System.out.println(searchResponse);
}
The searchResponse result is consistent with the result of the equivalent console query:

# match on a string type: full-text retrieval
GET bank/_search
{
  "query": {
    "match": { "address": "mill" }
  }
}

Both return:
{ "took": 15, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 4, "relation": "eq" }, "max_score": 5.4032025, "hits": [{ "_index": "bank", "_type": "account", "_id": "970", "_score": 5.4032025, "_source": { "account_number": 970, "balance": 19648, "firstname": "Forbes", "lastname": "Wallace", "age": 28, "gender": "M", "address": "990 Mill Road", "employer": "Pheast", "email": "forbeswallace@pheast.com", "city": "Lopezo", "state": "AK" } }, { "_index": "bank", "_type": "account", "_id": "136", "_score": 5.4032025, "_source": { "account_number": 136, "balance": 45801, "firstname": "Winnie", "lastname": "Holland", "age": 38, "gender": "M", "address": "198 Mill Lane", "employer": "Neteria", "email": "winnieholland@neteria.com", "city": "Urie", "state": "IL" } }, { "_index": "bank", "_type": "account", "_id": "345", "_score": 5.4032025, "_source": { "account_number": 345, "balance": 9812, "firstname": "Parker", "lastname": "Hines", "age": 38, "gender": "M", "address": "715 Mill Avenue", "employer": "Baluba", "email": "parkerhines@baluba.com", "city": "Blackgum", "state": "KY" } }, { "_index": "bank", "_type": "account", "_id": "472", "_score": 5.4032025, "_source": { "account_number": 472, "balance": 25571, "firstname": "Lee", "lastname": "Long", "age": 32, "gender": "F", "address": "288 Mill Street", "employer": "Comverges", "email": "leelong@comverges.com", "city": "Movico", "state": "MT" } }] } }
7.3.3 aggregate query
7.3.3.1 age distribution
/**
 * Test aggregation queries against es
 * @throws IOException
 */
@Test
void aggSearch1() throws IOException {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest();
    // Specify the index
    searchRequest.indices("bank");
    // Specify the DSL / search criteria
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    // Aggregation condition
    TermsAggregationBuilder ageAgg = AggregationBuilders.terms("ageAgg").field("age").size(10);
    // Add the aggregation to the search criteria
    searchSourceBuilder.aggregation(ageAgg);
    searchRequest.source(searchSourceBuilder);

    // Perform the retrieval
    SearchResponse searchResponse = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);

    // Analyze the result
    System.out.println(searchResponse);
}
The query result is consistent with the result of the equivalent console aggregation:

# aggregations: age distribution
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "Mill" } }
      ]
    }
  },
  "aggs": {
    "group_by_state": {
      "terms": { "field": "age", "size": 10 }
    }
  },
  "size": 0
}

Both return:
{ "took": 19, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 4, "relation": "eq" }, "max_score": 5.4032025, "hits": [{ "_index": "bank", "_type": "account", "_id": "970", "_score": 5.4032025, "_source": { "account_number": 970, "balance": 19648, "firstname": "Forbes", "lastname": "Wallace", "age": 28, "gender": "M", "address": "990 Mill Road", "employer": "Pheast", "email": "forbeswallace@pheast.com", "city": "Lopezo", "state": "AK" } }, { "_index": "bank", "_type": "account", "_id": "136", "_score": 5.4032025, "_source": { "account_number": 136, "balance": 45801, "firstname": "Winnie", "lastname": "Holland", "age": 38, "gender": "M", "address": "198 Mill Lane", "employer": "Neteria", "email": "winnieholland@neteria.com", "city": "Urie", "state": "IL" } }, { "_index": "bank", "_type": "account", "_id": "345", "_score": 5.4032025, "_source": { "account_number": 345, "balance": 9812, "firstname": "Parker", "lastname": "Hines", "age": 38, "gender": "M", "address": "715 Mill Avenue", "employer": "Baluba", "email": "parkerhines@baluba.com", "city": "Blackgum", "state": "KY" } }, { "_index": "bank", "_type": "account", "_id": "472", "_score": 5.4032025, "_source": { "account_number": 472, "balance": 25571, "firstname": "Lee", "lastname": "Long", "age": 32, "gender": "F", "address": "288 Mill Street", "employer": "Comverges", "email": "leelong@comverges.com", "city": "Movico", "state": "MT" } }] }, "aggregations": { "lterms#ageAgg": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [{ "key": 38, "doc_count": 2 }, { "key": 28, "doc_count": 1 }, { "key": 32, "doc_count": 1 }] } } }
7.3.4 Get query results (convert to objects)
The concrete results are inside hits of the response.
Generate a Java bean from the JSON (e.g. with bejson.com) and use Lombok:
/**
 * Auto-generated: 2021-10-25 16:57:49
 *
 * @author bejson.com (i@bejson.com)
 * @website http://www.bejson.com/java2pojo/
 */
@Data
@ToString
public static class Accout {
    private int account_number;
    private int balance;
    private String firstname;
    private String lastname;
    private int age;
    private String gender;
    private String address;
    private String employer;
    private String email;
    private String city;
    private String state;
}
/**
 * Test aggregation query against es, then convert the hits to objects
 * @throws IOException
 */
@Test
void aggSearch1() throws IOException {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest();
    // Specify the index
    searchRequest.indices("bank");
    // Specify the DSL / search criteria
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    // Aggregation condition
    TermsAggregationBuilder ageAgg = AggregationBuilders.terms("ageAgg").field("age").size(10);
    // Add the aggregation to the search criteria
    searchSourceBuilder.aggregation(ageAgg);
    searchRequest.source(searchSourceBuilder);

    // Perform the retrieval
    SearchResponse searchResponse = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);

    // Analyze the result
    System.out.println(searchResponse);

    // Get all query hits
    SearchHits hits = searchResponse.getHits();
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit searchHit : searchHits) {
        // searchHit.getId();
        // searchHit.getIndex();
        // searchHit.getType();
        // Convert the JSON string to an object
        String sourceAsString = searchHit.getSourceAsString();
        Accout accout = JSON.parseObject(sourceAsString, Accout.class);
        System.out.println(accout);
    }
}
Query results
8. Install nginx
8.1 Start an nginx instance just to copy the configuration
# Create an empty folder
cd /mydata/
mkdir nginx
# Start an nginx instance
# If the nginx image is missing, it will be downloaded automatically and then started
docker run -p 80:80 --name nginx -d nginx:1.10
8.2 copy the configuration file in the container to the current directory
# The current directory is /mydata
docker container cp nginx:/etc/nginx .
8.3 Rename and arrange the configuration folders
# Rename the copied nginx folder to conf
mv nginx conf
# Create a new nginx folder
mkdir nginx
# Move the conf folder into nginx
mv conf nginx/
8.4 delete the original container:
# Stop the original container
docker stop nginx
# Delete the original container
docker rm <container id>
8.5 creating a new nginx
docker run -p 80:80 --name nginx \
  -v /mydata/nginx/html:/usr/share/nginx/html \
  -v /mydata/nginx/logs:/var/log/nginx \
  -v /mydata/nginx/conf:/etc/nginx \
  -d nginx:1.10
Visit 192.168.157.128 (the virtual machine address).
The setup succeeded, but no file is served yet because html is still empty.
Create a new hello-world file:
vim index.html
Test passed