Gulimall – Elasticsearch – Advanced Note 1

1. What is Elasticsearch?

Elasticsearch is a distributed search and analytics engine at the core of the Elastic Stack. Logstash and Beats help you collect, aggregate, enrich, and store your data in Elasticsearch. Kibana lets you interactively explore, visualize, and share insights into the data, and manage and monitor the stack. Elasticsearch is where the indexing, searching, and analysis happen.

Elasticsearch provides near-real-time search and analytics for all types of data. Whether you have structured or unstructured text, numeric data, or geospatial data, Elasticsearch can efficiently store and index it in a way that supports fast searches. You can go beyond simple data retrieval and aggregate information to discover trends and patterns in your data. As your data and query volume grow, Elasticsearch's distributed nature lets your deployment scale seamlessly.

Although not every problem is a search problem, Elasticsearch offers the speed and flexibility to handle data in a wide range of use cases:

  • Add a search box to an app or website
  • Store and analyze logs, metrics, and security event data
  • Use machine learning to automatically model the behavior of data in real time
  • Automate business workflows using Elasticsearch as a storage engine
  • Use Elasticsearch as a geographic information system (GIS) to manage, integrate and analyze spatial information
  • Use Elasticsearch as a bioinformatics research tool to store and process genetic data

We are constantly surprised by the novel ways people use search. Whether your use case resembles one of these or you are using Elasticsearch to solve a new problem, you work with data, documents, and indices in the same way.

2. Introduction

Full-text search is one of the most common requirements, and the open-source Elasticsearch is the first choice among full-text search engines. It can quickly store, search, and analyze massive amounts of data; Wikipedia, Stack Overflow, and GitHub all use it. Under the hood, Elasticsearch is built on the open-source library Lucene. You cannot easily use Lucene directly, however: you would have to write your own code to call its interfaces. Elasticsearch wraps Lucene and provides a REST API, so it can be used out of the box.
REST API: naturally cross-platform.
Official documentation

Official Chinese documentation
Chinese community documentation

3. Basic concepts

3.1 Index

  • As a verb, indexing is equivalent to INSERT in MySQL
  • As a noun, an index is equivalent to a database in MySQL

3.2 Type

3.2.1 Concept

Within an index, one or more types can be defined, and the documents of each type are stored together.

This is similar to MySQL, where one or more tables can be defined within a database.

3.2.2 Elasticsearch 7 – concept

In Elasticsearch 7.x, the type parameter in URLs is optional. For example, indexing a document no longer requires a document type.

Elasticsearch 8.x no longer supports the type parameter in URLs.

Reason

In a relational database, two tables are independent: even if they have columns with the same name, that does not affect how they are used. This is not the case in ES. Elasticsearch is a search engine built on Lucene, and fields with the same name under different types of one ES index are ultimately handled as the same field in Lucene.

  • Two user_name fields under two different types of the same ES index are actually treated as the same field, so you would have to define identical field mappings in both types. Otherwise, identical field names in different types conflict during processing, reducing Lucene's efficiency.
  • Removing type improves the efficiency of data processing in ES.

3.2.3 Elasticsearch version upgrade (to 8)

Solution: migrate indices from multi-type to single-type, so that each type of document gets an independent index.

3.3 Document

A document is a piece of data of a certain type saved to an index, stored in JSON format.

A document is like a row of a table in MySQL.

3.4 Inverted index

Inverted index: it is called inverted because instead of looking up attribute values from a record, you look up the locations of records from an attribute value.

Index storage example

Split each sentence into words and store, for every word, the documents that contain it; at query time, look up document positions by word, then rank the matches by relevance score against the search conditions.
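As a minimal illustration (the documents and terms here are invented for the example), an inverted index maps each term to the documents containing it:

#Hypothetical documents:
#  doc1: "red bananas"
#  doc2: "red apples"
#Inverted index (term -> posting list), after splitting and normalizing:
#  red    -> [doc1, doc2]
#  banana -> [doc1]
#  apple  -> [doc2]
#A search for "red apples" looks up both terms and ranks doc2 highest,
#since it matches more of the query terms.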

4. Installing ES and Kibana with Docker

4.1 Downloading the image files

The Kibana version must match the Elasticsearch version.

docker pull elasticsearch:7.4.2 #stores and retrieves data
docker pull kibana:7.4.2 #visualizes and explores the data

4.2 Create the instances

4.2.1 Create the ElasticSearch instance

#First create the folders for es data and configuration that will be mapped into the container
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
#Configure the es bind address
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
#Grant permissions
chmod -R 777 /mydata/elasticsearch/
#Create and start the instance
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2

**Special attention: -e ES_JAVA_OPTS="-Xms64m -Xmx512m" sets the initial and maximum heap memory of ES for a test environment. Without it, ES defaults to a heap that may be too large for the machine, and ES will fail to start.**

4.2.2 Difference between ElasticSearch ports 9200 and 9300

  • 9200 uses the HTTP protocol and is mainly for external communication

  • 9300 uses the TCP protocol; Java clients and ES cluster nodes communicate with each other over 9300

Verify that the instance was created successfully:

192.168.157.128:9200 (virtual machine address + port 9200)

4.2.3 Create the Kibana instance

docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.157.128:9200 -p 5601:5601 \
-d kibana:7.4.2

Verify that the instance was created successfully:

192.168.157.128:5601

4.3 Set es and Kibana to start with docker

#Set es to start whenever docker starts
docker update elasticsearch --restart=always
#Set Kibana to start whenever docker starts
docker update kibana --restart=always

Restart docker and verify that es and Kibana still come up.

5. Preliminary search

5.1 _cat

  • GET /_cat/nodes: view all nodes

  • GET /_cat/health: view es health status

  • GET /_cat/master: view the master node

  • GET /_cat/indices: view all indices, like show databases; in MySQL (see the tip below)
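A small usage tip: appending ?v to a _cat endpoint adds column headers to the output.

#List all indices with column headers
GET /_cat/indices?v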

5.2 Index (save) a document

5.2.1 PUT method (id required)

To save a piece of data, specify which index and type it goes under and which unique identifier to use.

#Save document no. 1 under the external type of the customer index
PUT customer/external/1
{
  "name": "Zhang shan"
}

5.2.2 POST method (id optional)

#Index (save) a document (POST)
POST customer/external/2
{
  "name":"Li Si"
}
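A sketch of the id-less form (the name is just a sample value): when no id is given in the URL, es generates one automatically and returns it in the response.

#Index a document without specifying an id; the id is auto-generated
POST customer/external
{
  "name":"Wang Wu"
}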

5.2.3 Differences between POST and PUT when indexing (saving) documents

Personal understanding:

Both PUT and POST can create and update documents; when an id is specified, both re-index (overwrite) the document under that id.

POST can create without an id (one is generated automatically) or with a user-defined id. If the given id already exists, the data is updated and the version number increases.

  • POST create with id: see 5.2.2
  • POST create without id: the id is generated automatically
  • POST update: POST to an existing id

PUT can also create or update, but it must be given an id. Because an id is required, PUT is normally used for updates; calling PUT without an id reports an error.

  • PUT must always be used with an id, otherwise an error is reported

5.3 Query the document with a specified id

#Query the document with the specified id
GET /customer/external/1

#Query result
{
  "_index" : "customer",   //which index
  "_type" : "external",    //which type
  "_id" : "1",             //document id
  "_version" : 2,          //version number
  "_seq_no" : 10,          //concurrency-control field, incremented on every update; used as an optimistic lock
  "_primary_term" : 1,     //plays a similar role; changes whenever the primary shard is reassigned, e.g. after a restart
  "found" : true,
  "_source" : {            //the actual content
    "name" : "Zhang shan2"
  }
}
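These two fields can be used as an optimistic lock by passing them as request parameters; the write succeeds only if they still match the current document (a sketch, using the values from the response above):

#Optimistic-lock update: returns a 409 conflict if another writer has
#already changed the document (and therefore its _seq_no)
PUT customer/external/1?if_seq_no=10&if_primary_term=1
{
  "name": "Zhang shan3"
}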

5.4 Updating documents

5.4.1 POST update, mode I (with _update)

POST customer/external/1/_update
{
  "doc": {
    "name": "John Doew"
  }
}

5.4.2 POST update, mode II (without _update)

#The plain form used earlier
POST customer/external/1
{
  "name": "John Doe2"
}

5.4.3 PUT update

PUT customer/external/1
{
  "name": "John Doe3"
}

5.4.4 Add attributes while updating

#Update and add attributes
POST customer/external/1/_update
{
  "doc": {
    "name": "Jane Doe",
    "age": 20
  }
}

5.4.5 Characteristics of the three update methods

The difference between POST mode I and mode II is whether _update is included.

  • POST with _update: the request is compared with the source document; if they are identical, nothing happens and the document version does not increase.

  • POST without _update: the data is always saved again and the version increases.

Usage scenarios

  • For high-concurrency updates, use POST without _update;

  • For high-concurrency reads that are updated only occasionally, use _update, which compares before writing and skips redundant updates;

  • A PUT operation always saves the data again and increases the version.

5.5 Delete a document or index by id

5.5.1 Delete the document with the specified id

#remove document
DELETE customer/external/1

5.5.2 delete index

#Delete index
DELETE customer
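Note that es provides no operation for deleting a type. To clear all documents while keeping the index itself, one option is _delete_by_query (a sketch):

#Delete every document in the index without deleting the index
POST customer/_delete_by_query
{
  "query": { "match_all": {} }
}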

5.6 bulk batch API

Perform multiple index or delete operations in a single API call. This reduces overhead and can greatly improve indexing speed.

5.6.1 syntax format

{ action: { metadata } }    //action: the operation; metadata: which document to operate on
{ request body }            //the content of the operation

{ action: { metadata } }
{ request body }

Actions and request bodies come in pairs.

#Batch operation
POST customer/external/_bulk                //Batch operation on external type under customer index
{"index":{"_id":"1"}}                       //Index (add) a document, specifying id=1
{"name": "John Doe" }					    //Attribute value with id=1
{"index":{"_id":"2"}}						//Index (add) a document, specifying id=2
{"name": "Jane Doe" }						//id=2 attribute value

5.6.2 complex examples

POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" }}				  
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123"}}
{ "doc" : {"title" : "My updated blog post"} }

The bulk API executes all actions in order. If a single action fails for any reason, it continues to process the remaining actions after it. When the bulk API returns, it provides the status of each action (in the same order as they were sent), so you can check whether a specific action failed.

5.6.3 Sample test data

The original test data URL is no longer available.

New test data address: https://gitee.com/zhourui815/gulimall/blob/master/doc/es%E6%B5%8B%E8%AF%95%E6%95%B0%E6%8D%AE.json

POST bank/account/_bulk
(paste the test data as the request body)

6. Advanced search

6.1 SearchAPI

ES supports two basic retrieval methods:

  • sending search parameters through the REST request URI (uri + query parameters)
  • sending them through a REST request body (uri + request body)

6.1.1 Retrieving information

**Everything retrieved starts from _search**

#Retrieve all information under the bank index, including type and docs
GET bank/_search

6.1.1.1 Retrieval via request parameters

#Retrieval via request parameters
#q=*                        query everything, like SELECT *
#sort=account_number:asc    sort ascending by the account_number field
GET bank/_search?q=*&sort=account_number:asc

6.1.1.2 Retrieval via uri + request body

#uri + request body retrieval
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "account_number": {
        "order": "desc"
      }
    }
  ]
}

6.2 Query DSL

Elasticsearch provides a JSON-style DSL (domain-specific language) for executing queries, called the Query DSL. The query language is very comprehensive and can feel a little complicated at first; the way to really learn it is to start from a few basic examples.

6.2.1 basic syntax format

6.2.1.1 Query structure

#Typical structure of a query statement
{
  QUERY_NAME: {
    ARGUMENT: VALUE,
    ARGUMENT: VALUE,...
  }
}
#If the query targets a specific field, the structure is:
{
  QUERY_NAME: {
    FIELD_NAME: {
      ARGUMENT: VALUE,
      ARGUMENT: VALUE,...
    }
  }
}
6.2.1.2 Query example

Query the bank index, sorted by the account_number field in descending order, with a page size of 5.

#Basic query example
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 5,
  "sort": [
    {
      "account_number": {
        "order": "desc"
      }
    }
  ]
}
  • query defines how to query
  • match_all is a query type [meaning: match everything]; in es, many query types can be combined inside query to build complex queries
  • besides query, other parameters can change the query result, such as sort and size
  • from + size together implement paging
  • sort supports multi-field sorting: later fields break ties when earlier fields are equal, otherwise the earlier order prevails

6.2.2 match

#Match basic type (non string type), exact match  
GET bank/_search
{
  "query": {
    "match": {
      "account_number": "20"
    }
  }
}

match returns the record with account_number = 20.

#match string type, full text retrieval
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  }
}

This queries all records whose address contains the word mill. When a string field is searched, full-text retrieval is performed and each record gets a relevance score.

#match string, multiple words (word segmentation + full text search)
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill road"
    }
  }
}

This queries all records whose address contains mill or road or mill road, each with a relevance score.

6.2.3 match_phrase matching

**Phrase matching: the value to be matched is treated as one whole phrase (no word segmentation)**

#match_phrase matching
GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill road"
    }
  }
}

Finds all records whose address contains the phrase mill road, with relevance scores.

6.2.4 multi_match multi field matching

#multi_match multi field matching
GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill",
      "fields": ["address","state"]
    }
  }
}

Queries records whose state or address contains mill.

6.2.5 bool compound query

bool is used for compound queries:
It is important to understand that a compound statement can combine any other query statements, including other compound statements. Compound statements can therefore be nested and can express very complex logic.

6.2.5.1 must

All conditions listed in must must be met

#Must meet all the conditions listed in must
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address": "Mill"
          }
        },
        {
          "match": {
            "gender": "M"
          }
        }
      ]
    }
  }
}

6.2.5.2 should

Conditions listed in should are preferred but not required: documents that match them get a higher relevance score, but the result set does not change. However, if the query contains only a should clause with a single rule, that condition becomes the default match condition and does change the query result.

#Conditions listed in should are preferred; matching them increases the
#score of relevant documents without changing the query results.
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address": "mill"
          }
        },
        {
          "match": {
            "gender": "M"
          }
        }
      ],
      "should": [
        {
          "match": {
            "address": "lane"
          }
        }
      ]
    }
  }
}

6.2.5.3 must_not

The document must not match the specified conditions.

#must_not: must not match the specified conditions
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address": "mill"
          }
        },
        {
          "match": {
            "gender": "M"
          }
        }
      ],
      "should": [
        {
          "match": {
            "address": "lane"
          }
        }
      ],
      "must_not": [
        {"match": {
          "FIELD": "TEXT"
        }}
      ]
    }
  }
}

"email" queried in 6.2.5.2:“ winnieholland@neteria.com ”, my record is missing

The address contains mill and the gender is M. if there is lane in the address, it is best, but the email must not contain baluba.com

6.2.5.4 filter: result filtering

Not all queries need to produce scores, especially those used purely for filtering. **To avoid computing scores**, Elasticsearch automatically detects these scenarios and optimizes query execution.
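A minimal filter sketch (assuming the bank test data from 5.6.3): the range condition narrows the results but contributes nothing to the relevance score.

#Filter by balance range; the range clause is not scored
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } }
      ],
      "filter": {
        "range": {
          "balance": { "gte": 10000, "lte": 20000 }
        }
      }
    }
  }
}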

6.2.5.5 Summary

  • must — the clause (query) must appear in matching documents and contributes to the score.
  • filter — the clause (query) must appear in matching documents, but unlike must, its score is ignored.
  • should — the clause (query) should appear in matching documents. If a bool query contains no must or filter clause, at least one should clause must match. The minimum number of should clauses that must match can be set with the minimum_should_match parameter.
  • must_not — the clause (query) must not appear in matching documents.

6.2.6 term: retrieval on non-text fields

The counterpart of match: it matches the value of a field. Use match for full-text (text) fields and term for other, non-text fields.

#term retrieval on a non-text field
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "account_number": {
              "value": "970"
            }
          }
        },
        {
          "match": {
            "address": "Mill"
          }
        }
      ]
    }
  }
}
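For exact matching on a text field, the keyword sub-field created by the default dynamic mapping can also be used (a sketch; "990 Mill Road" is an address from the bank test data):

#Exact match on the keyword sub-field: only documents whose whole
#address equals the string will match
GET bank/_search
{
  "query": {
    "match": { "address.keyword": "990 Mill Road" }
  }
}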

6.2.7 aggregations: aggregation retrieval

Official aggregation API documentation

Aggregations provide the ability to group data and extract statistics from it. The simplest aggregations are roughly equivalent to SQL GROUP BY and the SQL aggregate functions. In Elasticsearch, a single search can return the hits and the aggregation results together in one response, kept separate from each other. This is very powerful and efficient: you can run a query and multiple aggregations in a single round trip and get all the results at once, through a concise and simplified API that avoids extra network round trips.

6.2.7.1 Age distribution and average age

Search the age distribution and average age of everyone with mill in their address, without returning the matching documents themselves ("size": 0 below).

#aggregations perform aggregation
#Age distribution and mean aggregation
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "address": "Mill"
        }}
      ]
    }
  },
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "age",
        "size": 10
      }
    },
    "avgAge": {
      "avg": {
        "field": "age"
      }
    }
  },
  "size": 0
}

size: 0 means the search hits themselves are not displayed.

aggs performs the aggregation. The aggregation syntax is as follows:

"aggs": {
  "aggs_name": {     //the name of this aggregation, used to identify it in the result set
    "AGG_TYPE": {}   //the aggregation type (avg, term, terms, ...)
  }
}

6.2.7.2 Age distribution with the average salary at each age

#Average salary within each age bucket
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "agg_avg": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "banlances_avg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

Note the difference from 6.2.7.1: there, two sibling aggregations are computed side by side; here, a second aggregation is nested inside the first.

6.2.7.3 Average values within the age and gender distribution

Find the full age distribution, the average salary of M and F within each age bucket, and the overall average salary of each age bucket.

#Find the age distribution, the average salary of M and F within each
#age bucket, and the overall average salary of each age bucket
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_state": {
      "terms": {
        "field": "age",
        "size": 100
      },
      "aggs": {
        "sex_agg": {
          "terms": {
            "field": "gender.keyword",
            "size": 10
          },
          "aggs": {
            "banlances_avg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "balance_avg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

6.3 Mapping

6.3.1 Field types

6.3.2 Mapping

Mapping defines how a document and its properties (fields) are stored and indexed. For example, mappings define:

  • which string properties should be treated as full-text fields;
  • which properties contain numbers, dates, or geographic locations;
  • whether all properties of the document should be indexed (the _all configuration);
  • the format of dates;
  • custom mapping rules for dynamically added properties.
6.3.2.1 Viewing mapping information

#View mapping information
GET bank/_mapping

We didn't specify any types when creating the index, so why does the query show them?

A: es automatically guesses the mapping types from the data.

6.3.3 Changes in the new versions

Es7 and above remove the concept of type.

  • In a relational database, two tables are independent: even if they have columns with the same name, that does not affect how they are used. This is not the case in es. elasticsearch is a search engine built on Lucene, and fields with the same name under different types of one ES index are ultimately handled as the same field in Lucene.

  • Two user_name fields under two different types of the same ES index are actually treated as the same field, so you must define identical field mappings in both types. Otherwise, identical field names in different types conflict during processing, reducing Lucene's efficiency.

  • Removing type improves the efficiency of data processing in ES.

    Elasticsearch 7.x

  • The type parameter in the URL is optional. For example, indexing a document no longer requires a document type.

    Elasticsearch 8.x

  • The type parameter in the URL is no longer supported.
    Solutions:
    1) Migrate indices from multi-type to single-type, giving each type of document an independent index.
    2) Migrate all the type data under an existing index to a new index (see the data migration section for details).

6.3.3.1 create mapping
#Create index and specify mapping
PUT my-index
{
  "mappings": {
    "properties": {
      "age":{
        "type": "integer"
      },
      "emali":{
        "type": "keyword"
      },
      "name":{
        "type": "text"
      }
    }
  }
}

6.3.3.2 add new field mapping
#Add new field mapping
PUT my-index/_mapping
{
  "properties": {
    "employee-id": {
      "type": "text",
      "index": false
    }
  }
}

6.3.3.3 Update a mapping

We cannot update a mapping field that already exists. To change it, we must create a new index and migrate the data.

6.3.3.4 Data migration
6.3.3.4.1 Query the mapping you want to modify

GET bank/_mapping

Copy the properties object from the response.

6.3.3.4.2 Add a new index
  1. Paste the properties copied in 6.3.3.4.1 into the mapping below; do not execute it yet

    #Create a new index
    PUT newbank
    {
      "properties": {
        "account_number" : {
              "type" : "long"
            },
            "address" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "age" : {
              "type" : "long"
            },
            "balance" : {
              "type" : "long"
            },
            "city" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "email" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "employer" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "firstname" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "gender" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "lastname" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "state" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
        }
      }
    }
    

  2. Modify the mapping type of each field according to your own needs, then execute:
#Create a new index
PUT /newbank
{
  "mappings": {
    "properties": {
      "account_number": {
        "type": "long"
      },
      "address": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "balance": {
        "type": "long"
      },
      "city": {
        "type": "keyword"
      },
      "email": {
        "type": "keyword"
      },
      "employer": {
        "type": "keyword"
      },
      "firstname": {
        "type": "text"
      },
      "gender": {
        "type": "keyword"
      },
      "lastname": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "state": {
        "type": "keyword"
      }
    }
  }
}

6.3.3.4.3 Data migration

First create newbank with the correct mapping, then migrate the data using the following fixed syntax:

POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}

Migrate the data from the old bank index:

#data migration 
POST _reindex
{
  "source": {
    "index": "bank"
  },
  "dest": {
    "index": "newbank"
  }
}

Migration succeeded
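If the source index still keeps its documents under a type (pre-7.x data), the source can be restricted to that type; this form is deprecated in 7.x but still accepted (a sketch; "account" is the type used by the bank test data):

#Migrate only the documents under the account type of the old index
POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "newbank"
  }
}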

6.4 Tokenization

  • A tokenizer receives a stream of characters, divides it into individual tokens (usually individual words), and outputs a stream of tokens.

  • For example, the whitespace tokenizer splits text on whitespace characters: it splits the text "Quick brown fox!" into [Quick, brown, fox!].

  • The tokenizer is also responsible for recording the order or position of each term (used for phrase and word-proximity queries) and the character offsets of the start and end of the original word each term represents (used for highlighting search matches).

  • Elasticsearch provides many built-in tokenizers that can be used to build custom analyzers.

6.4.1 Install the ik tokenizer

Effect before installing the ik tokenizer

**Note: the usual elasticsearch-plugin install xxx.zip cannot be used for automatic installation here**

https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v6.4.2 Install the release matching your es version (7.4.2 is chosen here)

  1. Since the plugins directory was mapped earlier, download elasticsearch-analysis-ik-7.4.2.zip into /mydata/elasticsearch/plugins/
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
  2. Unzip the downloaded file
unzip elasticsearch-analysis-ik-7.4.2.zip
  3. Delete the zip file
rm -rf *.zip
  4. Move everything in the unzipped elasticsearch folder into an ik directory (created by yourself)
mv elasticsearch/ ik
  5. Confirm the tokenizer is installed
#Enter the container
docker exec -it elasticsearch /bin/bash
#List the installed plugins (elasticsearch-plugin is on the PATH)
elasticsearch-plugin list

  6. After confirming ik is listed, restart the container
docker restart elasticsearch

6.4.2 Test the tokenizers

6.4.2.1 Use the default tokenizer

#Use the default tokenizer
POST _analyze
{
  "text": "我是中国人"
}

6.4.2.2 Use the ik_smart tokenizer

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "我是中国人"
}

6.4.2.3 Use the ik_max_word tokenizer

#Use the ik_max_word tokenizer
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "我是中国人"
}
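Roughly, the three differ as follows (a sketch of the expected behavior, not captured output): the default standard tokenizer breaks the Chinese sentence into single characters, ik_smart produces the coarsest split, and ik_max_word emits every plausible word.

#standard:    我 / 是 / 中 / 国 / 人
#ik_smart:    我 / 是 / 中国人
#ik_max_word: 我 / 是 / 中国人 / 中国 / 国人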

6.4.2.4 Summary

Different tokenizers clearly segment text very differently, so when defining an index you can no longer rely on the default mapping: create the mapping manually, because you need to choose a tokenizer.

6.4.4 Custom dictionary

6.4.4.1 Without a custom dictionary

6.4.4.2 Custom dictionary test
6.4.4.2.1 Create the dictionary

This builds on nginx; see the nginx setup in Chapter 8.

cd /mydata/nginx/html/
#Create the custom dictionary file
vim fenci.txt

Add the new words to the file.

6.4.4.2.2 Configure the custom dictionary

Modify the tokenizer's configuration file:

vim /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml

Point the remote extension dictionary at the custom dictionary's address.
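A sketch of the relevant entry (assuming nginx from Chapter 8 serves fenci.txt at the virtual machine address; the other entries ship with the plugin):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- point the remote extension dictionary at the file served by nginx -->
    <entry key="remote_ext_dict">http://192.168.157.128/fenci.txt</entry>
</properties>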

Restart es

docker restart elasticsearch

test result

7. Elasticsearch-Rest-Client

7.1 Why Elasticsearch-Rest-Client

9300: TCP

  • spring-data-elasticsearch: transport-api.jar;
  • different SpringBoot versions ship different transport-api.jar versions, which cannot reliably be matched to the es version
  • no longer recommended as of 7.x, and will be removed after 8

9200: HTTP

  • JestClient: unofficial, slow to update
  • RestTemplate: simulates HTTP requests; many ES operations would have to be wrapped by hand, which is cumbersome
  • HttpClient: same as above
  • Elasticsearch-Rest-Client: the official RestClient, which wraps ES operations; the API is layered and easy to use

Final choice: Elasticsearch-Rest-Client (elasticsearch-rest-high-level-client)

Why use the high-level client?

The difference between the low-level and high-level clients is like the difference between JDBC and MyBatis.

7.2 SpringBoot integration

7.2.1 Add a new module, gulimall-search

Startup class

package site.zhourui.gilimall.search;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

@EnableDiscoveryClient
@SpringBootApplication(exclude = DataSourceAutoConfiguration.class)
public class GulimallSearchApplication {

    public static void main(String[] args) {
        SpringApplication.run(GulimallSearchApplication.class, args);
    }

}

pom file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.2.1.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>site.zhourui.gulimall</groupId>
    <artifactId>gulimall-search</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>gulimall-search</name>
    <description>ElasticSearch Retrieval service</description>
    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>7.4.2</elasticsearch.version>
        <spring-cloud.version>Hoxton.SR9</spring-cloud.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>com.zhourui.gulimall</groupId>
            <artifactId>gulimall-common</artifactId>
            <version>0.0.1-SNAPSHOT</version>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.4.2</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
            <version>2.2.0.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

    </dependencies>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>${spring-cloud.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

7.2.2 add configuration class

gulimall-search/src/main/java/site/zhourui/gilimall/search/config/GulimallElasticSearchConfig.java

package site.zhourui.gilimall.search.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author zr
 * @date 2021/10/25 14:27
 */
@Configuration
public class GulimallElasticSearchConfig {

    //Global request options, built once (singleton); can carry authorization headers, async settings, etc.
    public static final RequestOptions COMMON_OPTIONS;
    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
//        builder.addHeader("Authorization","Bearer"+TOKEN);
//        builder.setHttpAsyncResponseConsumerFactory(
//                new HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory(30*1024*1024*1024));
        COMMON_OPTIONS = builder.build();
    }
    @Bean
    public RestHighLevelClient esRestClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("192.168.157.128", 9200, "http")));
        return client;
    }

}

7.2.3 testing

gulimall-search/src/test/java/site/zhourui/gilimall/search/GulimallSearchApplicationTests.java

package site.zhourui.gilimall.search;


import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;


@SpringBootTest
@RunWith(SpringRunner.class)
class GulimallSearchApplicationTests {

    @Autowired
    RestHighLevelClient client;
    @Test
    void contextLoads() {
        System.out.println(client);
    }

}

test result

7.3 Usage

Official API reference documentation

7.3.1 Index (save) data

gulimall-search/src/test/java/site/zhourui/gilimall/search/GulimallSearchApplicationTests.java

package site.zhourui.gilimall.search;


import com.alibaba.fastjson.JSON;
import lombok.Data;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import site.zhourui.gilimall.search.config.GulimallElasticSearchConfig;

import java.io.IOException;


@SpringBootTest
@RunWith(SpringRunner.class)
class GulimallSearchApplicationTests {

    @Autowired
    RestHighLevelClient client;

    /**
     * Test index to es
     */
    @Test
    void index() throws IOException {
        IndexRequest request = new IndexRequest("users");//Index name
        request.id("1");//Document id
        User user = new User();
        user.setUserName("Zhang San");
        user.setAge(18);
        user.setGender("male");
        String jsonString = JSON.toJSONString(user);
        request.source(jsonString,XContentType.JSON);//What to save
        //Perform operation
        IndexResponse index = client.index(request, GulimallElasticSearchConfig.COMMON_OPTIONS);
        //Extract useful response data
        System.out.println(index);
    }
    @Data
    class User{
        private String userName;
        private Integer age;
        private String gender;
    }
    @Test
    void contextLoads() {
        System.out.println(client);
    }

}

Successful indexing

7.3.2 Retrieve data

/**
     * Test query es
     * @throws IOException
     */
    @Test
    void search() throws IOException {
        //Create index request
        SearchRequest searchRequest = new SearchRequest();
        //Specify index
        searchRequest.indices("bank");
        //Specify DSL, search criteria
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        searchSourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
        //Construct search conditions
//        searchSourceBuilder.query();
//        searchSourceBuilder.from();
//        searchSourceBuilder.size();
//        searchSourceBuilder.aggregation();
        searchRequest.source(searchSourceBuilder);

        //Perform retrieval
        SearchResponse searchResponse = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);

        //Analysis result searchResponse
        System.out.println(searchResponse);

    }

The searchResponse printed above and the equivalent console query:

#match string type full text retrieval
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  }
}

return the same result:

{
	"took": 15,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 4,
			"relation": "eq"
		},
		"max_score": 5.4032025,
		"hits": [{
			"_index": "bank",
			"_type": "account",
			"_id": "970",
			"_score": 5.4032025,
			"_source": {
				"account_number": 970,
				"balance": 19648,
				"firstname": "Forbes",
				"lastname": "Wallace",
				"age": 28,
				"gender": "M",
				"address": "990 Mill Road",
				"employer": "Pheast",
				"email": "forbeswallace@pheast.com",
				"city": "Lopezo",
				"state": "AK"
			}
		}, {
			"_index": "bank",
			"_type": "account",
			"_id": "136",
			"_score": 5.4032025,
			"_source": {
				"account_number": 136,
				"balance": 45801,
				"firstname": "Winnie",
				"lastname": "Holland",
				"age": 38,
				"gender": "M",
				"address": "198 Mill Lane",
				"employer": "Neteria",
				"email": "winnieholland@neteria.com",
				"city": "Urie",
				"state": "IL"
			}
		}, {
			"_index": "bank",
			"_type": "account",
			"_id": "345",
			"_score": 5.4032025,
			"_source": {
				"account_number": 345,
				"balance": 9812,
				"firstname": "Parker",
				"lastname": "Hines",
				"age": 38,
				"gender": "M",
				"address": "715 Mill Avenue",
				"employer": "Baluba",
				"email": "parkerhines@baluba.com",
				"city": "Blackgum",
				"state": "KY"
			}
		}, {
			"_index": "bank",
			"_type": "account",
			"_id": "472",
			"_score": 5.4032025,
			"_source": {
				"account_number": 472,
				"balance": 25571,
				"firstname": "Lee",
				"lastname": "Long",
				"age": 32,
				"gender": "F",
				"address": "288 Mill Street",
				"employer": "Comverges",
				"email": "leelong@comverges.com",
				"city": "Movico",
				"state": "MT"
			}
		}]
	}
}

7.3.3 aggregate query

7.3.3.1 age distribution
/**
     * Test aggregate query es
     * @throws IOException
     */
    @Test
    void aggSearch1() throws IOException {
        //Create index request
        SearchRequest searchRequest = new SearchRequest();
        //Specify index
        searchRequest.indices("bank");
        //Specify DSL, search criteria
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        searchSourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
        //Aggregation conditions
        TermsAggregationBuilder ageAgg = AggregationBuilders.terms("ageAgg").field("age").size(10);
        //Add aggregation condition to search condition
        searchSourceBuilder.aggregation(ageAgg);
        //Construct search conditions
//        searchSourceBuilder.query();
//        searchSourceBuilder.from();
//        searchSourceBuilder.size();
//        searchSourceBuilder.aggregation();
        searchRequest.source(searchSourceBuilder);

        //Perform retrieval
        SearchResponse searchResponse = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);

        //Analysis result searchResponse
        System.out.println(searchResponse);
    }

The printed result and the equivalent console aggregation query:

#aggregations perform aggregation
#Age distribution and mean aggregation
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "address": "Mill"
        }}
      ]
    }
  },
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "age",
        "size": 10
      }
    }
  },
  "size": 0
}

return the same result:

{
	"took": 19,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 4,
			"relation": "eq"
		},
		"max_score": 5.4032025,
		"hits": [{
			"_index": "bank",
			"_type": "account",
			"_id": "970",
			"_score": 5.4032025,
			"_source": {
				"account_number": 970,
				"balance": 19648,
				"firstname": "Forbes",
				"lastname": "Wallace",
				"age": 28,
				"gender": "M",
				"address": "990 Mill Road",
				"employer": "Pheast",
				"email": "forbeswallace@pheast.com",
				"city": "Lopezo",
				"state": "AK"
			}
		}, {
			"_index": "bank",
			"_type": "account",
			"_id": "136",
			"_score": 5.4032025,
			"_source": {
				"account_number": 136,
				"balance": 45801,
				"firstname": "Winnie",
				"lastname": "Holland",
				"age": 38,
				"gender": "M",
				"address": "198 Mill Lane",
				"employer": "Neteria",
				"email": "winnieholland@neteria.com",
				"city": "Urie",
				"state": "IL"
			}
		}, {
			"_index": "bank",
			"_type": "account",
			"_id": "345",
			"_score": 5.4032025,
			"_source": {
				"account_number": 345,
				"balance": 9812,
				"firstname": "Parker",
				"lastname": "Hines",
				"age": 38,
				"gender": "M",
				"address": "715 Mill Avenue",
				"employer": "Baluba",
				"email": "parkerhines@baluba.com",
				"city": "Blackgum",
				"state": "KY"
			}
		}, {
			"_index": "bank",
			"_type": "account",
			"_id": "472",
			"_score": 5.4032025,
			"_source": {
				"account_number": 472,
				"balance": 25571,
				"firstname": "Lee",
				"lastname": "Long",
				"age": 32,
				"gender": "F",
				"address": "288 Mill Street",
				"employer": "Comverges",
				"email": "leelong@comverges.com",
				"city": "Movico",
				"state": "MT"
			}
		}]
	},
	"aggregations": {
		"lterms#ageAgg": {
			"doc_count_error_upper_bound": 0,
			"sum_other_doc_count": 0,
			"buckets": [{
				"key": 38,
				"doc_count": 2
			}, {
				"key": 28,
				"doc_count": 1
			}, {
				"key": 32,
				"doc_count": 1
			}]
		}
	}
}

7.3.4 Get the query results (converted to objects)

The actual documents are inside hits.

Generate a Java bean from the JSON (e.g. on bejson.com) and use Lombok:

/**
     * Auto-generated: 2021-10-25 16:57:49
     *
     * @author bejson.com (i@bejson.com)
     * @website http://www.bejson.com/java2pojo/
     */
    @Data
    @ToString
    public static class Accout {

        private int account_number;
        private int balance;
        private String firstname;
        private String lastname;
        private int age;
        private String gender;
        private String address;
        private String employer;
        private String email;
        private String city;
        private String state;
    }
/**
     * Test aggregate query es
     * @throws IOException
     */
    @Test
    void aggSearch1() throws IOException {
        //Create index request
        SearchRequest searchRequest = new SearchRequest();
        //Specify index
        searchRequest.indices("bank");
        //Specify DSL, search criteria
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        searchSourceBuilder.query(QueryBuilders.matchQuery("address","mill"));
        //Aggregation conditions
        TermsAggregationBuilder ageAgg = AggregationBuilders.terms("ageAgg").field("age").size(10);
        //Add aggregation condition to search condition
        searchSourceBuilder.aggregation(ageAgg);
        //Construct search conditions
//        searchSourceBuilder.query();
//        searchSourceBuilder.from();
//        searchSourceBuilder.size();
//        searchSourceBuilder.aggregation();
        searchRequest.source(searchSourceBuilder);

        //Perform retrieval
        SearchResponse searchResponse = client.search(searchRequest, GulimallElasticSearchConfig.COMMON_OPTIONS);

        //Analysis result searchResponse
        System.out.println(searchResponse);

        //Get all query results
        SearchHits hits = searchResponse.getHits();
        SearchHit[] searchHits = hits.getHits();
        for (SearchHit searchHit : searchHits) {
//            searchHit.getId();
//            searchHit.getIndex();
//            searchHit.getType();
            //Convert to json string
            String sourceAsString = searchHit.getSourceAsString();
            Accout accout = JSON.parseObject(sourceAsString, Accout.class);
            System.out.println(accout);
        }
    }

Query results

8. Install nginx

8.1 Start a throwaway nginx instance just to copy out its configuration

#Create an empty folder
cd /mydata/
mkdir nginx
#Start an nginx instance
#If the nginx image is missing, it is downloaded automatically before starting
docker run -p 80:80 --name nginx -d nginx:1.10

8.2 copy the configuration file in the container to the current directory

#The current directory is mydata
docker container cp nginx:/etc/nginx .

8.3 Rearrange the configuration directory

#Rename the copied folder
mv nginx conf
#Create a new nginx folder
mkdir nginx
#Move the conf folder into nginx
mv conf nginx/

8.4 Delete the original container

 #Stop the original container
 docker stop nginx
 #Delete the original container
 docker rm nginx

8.5 creating a new nginx

docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10

Visit 192.168.157.128 (the virtual machine address).

nginx starts successfully, but no page is served yet because the html directory is empty.

Create a hello-world page under /mydata/nginx/html:

vim index.html

Test passed
