ElasticSearch -- full text search

Source: [Atguigu (尚硅谷)] - Guli Mall advanced chapter

1, Introduction

Official website: https://www.elastic.co/cn/what-is/elasticsearch

Full-text search is a very common requirement, and the open-source Elasticsearch is the first choice among full-text search engines. It can quickly store, search and analyze massive data; Wikipedia, Stack Overflow and GitHub all use it. Under the hood, Elasticsearch is built on the open-source library Lucene. However, you can't use Lucene directly: you must write your own code to call its interfaces. Elasticsearch wraps Lucene and exposes a REST API, so it works out of the box.

Kibana (visualization tool)

REST API: naturally cross-platform.

Official documents: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

Official Chinese: https://www.elastic.co/guide/cn/elasticsearch/guide/current/foreword_id.html

Community Chinese: https://es.xiaoleilu.com/index.html http://doc.codingdict.com/elasticsearch/0

2, Basic concepts

1. index

It is equivalent to a database in MySQL

2. type

It is equivalent to a data table in MySQL

3. Document

Equivalent to a row of data in MySQL

4. Inverted indexing mechanism

Word               Records
Crimson Sea        1,2,3,4,5
get some action    1,2,3
explore            2,5
especially         3,5
Record article     4
special agent      5
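
Reading the table above: a search for, say, "get some action especially" looks up each term's record list (1,2,3 for "get some action" and 3,5 for "especially"), unions them, and ranks record 3 highest because it hits the most terms; that hit ratio is the basis of the relevance score.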

3, Preliminary search

1. _cat

  • GET /: view ES information

    {
        "name": "616a9e1efbf4",
        "cluster_name": "elasticsearch",
        "cluster_uuid": "Op8Z3VQzSeKTG2Q3rN17yQ",
        "version": {
            "number": "7.4.2",
            "build_flavor": "default",
            "build_type": "docker",
            "build_hash": "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
            "build_date": "2019-10-28T20:40:44.881551Z",
            "build_snapshot": false,
            "lucene_version": "8.2.0",
            "minimum_wire_compatibility_version": "6.8.0",
            "minimum_index_compatibility_version": "6.0.0-beta1"
        },
        "tagline": "You Know, for Search"
    }
    
  • GET /_cat/nodes: view all nodes

    127.0.0.1 22 77 4 0.00 0.01 0.06 dilm * 616a9e1efbf4
    
  • GET /_cat/health: View es health status

    1637633495 02:11:35 elasticsearch yellow 1 1 9 9 0 0 6 0 - 60.0%
    
  • GET /_cat/master: View master node

    9QX5tgTTQW-djqKkz6tnFQ 127.0.0.1 127.0.0.1 616a9e1efbf4
    
  • GET /_cat/indices: view all indices (equivalent to show databases)

    yellow open bank                     s8dd9wdKSC2j4HA9pfCrig 1 1 1000 0 428.4kb 428.4kb
    yellow open website                  YmVaHWr9SL6ilyWfaRyHog 1 1    2 2   8.6kb   8.6kb
    green  open .kibana_task_manager_1   vhVToLqpT4-NCXd5rXicPw 1 0    2 0  21.7kb  21.7kb
    yellow open my_index                 diw79fzHRRKi7MdS1Seflg 1 1    0 0    283b    283b
    green  open .apm-agent-configuration DQhr7FobQg6tV9q5z7QA8g 1 0    0 0    283b    283b
    yellow open newbank                  N-6EQ7M4TcmDKIMBQQv4Aw 1 1 1000 0 286.5kb 286.5kb
    green  open .kibana_1                vzWLn08uRK2R8xDGukZzdw 1 0    9 1    30kb    30kb
    yellow open users                    T1m16_wQSiyHxKiA5CIv-A 1 1    1 0   4.3kb   4.3kb
    yellow open customer                 LlfbJVXnSXWHaZr-_n-w2g 1 1    2 0   3.5kb   3.5kb
    

2. Index a document (save)

Saving a document means choosing an index and a type and giving the document a unique identifier. For example, PUT customer/external/1 saves the body below as document No. 1 under the external type of the customer index:

PUT customer/external/1
{ 
	"name": "John Doe"
}

Both PUT and POST work. POST is for adding: if you do not specify an id, one is generated automatically, and specifying an existing id turns the call into a modification that increments the version number. PUT can also modify data, but PUT must always specify an id; because of that, we generally use it for updates, and a PUT without an id is an error.
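
For example (a minimal sketch): a POST without an id lets ES generate one, so each call like this creates a new document with a fresh auto-generated _id.

POST customer/external
{ 
	"name": "John Doe"
}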

3. Query documents

GET customer/external/1

result:

{ 
	"_index": "customer", //At which index
	"_type": "external", //In which type
	"_id": "1", //Record id
	"_version": 2, //Version number
	"_seq_no": 1, //The concurrency control field will be + 1 each time it is updated, which is used as an optimistic lock
	"_primary_term": 1, //As above, the main partition will be reassigned. If it is restarted, it will change
	"found": true, "_source": { //Real content
		"name": "John Doe"
	}
}

Updates can carry optimistic-lock conditions:

?if_seq_no=0&if_primary_term=
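
A sketch with illustrative values: if the document's stored _seq_no / _primary_term no longer match the ones given, ES rejects the write with 409 Conflict.

PUT customer/external/1?if_seq_no=1&if_primary_term=1
{ 
	"name": "John Doe 2"
}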

4. Update documents

POST customer/external/1/_update
{ 
    "doc":{ 
        "name": "John Doew"
	}
}
//Or===========================
POST customer/external/1
{ 
    "name": "John Doe2"
}
//Or===========================
PUT customer/external/1
{ 
    "name": "John Doe"
}

Differences:

POST with _update compares against the source document: if nothing changed, it performs no operation and the version is not incremented. PUT (and plain POST) always saves the data again and increases the version. Choose by scenario: for high-concurrency writes where the data is always new, skip _update; for high-concurrency reads with only occasional updates, use _update so identical data is compared and skipped instead of being re-written and re-allocated.
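
For example, sending the same _update twice in a row: the second call changes nothing, reports a no-op, and leaves _version unchanged (response abridged):

POST customer/external/1/_update
{ 
	"doc": { "name": "John Doew" }
}
//the second identical call returns (abridged):
//{ "_version": 2, "result": "noop" }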

Update and add attributes:

POST customer/external/1/_update
{ 
	"doc": { 
		"name": "Jane Doe", 
		"age": 20 
    }
}
//PUT and POST without _update also work

5. Delete document & Index

DELETE customer/external/1
DELETE customer

6. bulk batch API

POST customer/external/_bulk
{"index":{"_id":"1"}} 
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }

Syntax format:

{ action: { metadata }}\n 
{ request body }\n 

{ action: { metadata }}\n 
{ request body }\n 

Complex instance:

POST /_bulk 
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }} 
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }} 
{ "title": "My first blog post" } 
{ "index": { "_index": "website", "_type": "blog" }} 
{ "title": "My second blog post" } 
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} } 
{ "doc" : {"title" : "My updated blog post"} } 

The bulk API executes all actions in order. If a single action fails for any reason, it continues to process the remaining actions after it. When the bulk API returns, it provides the status of each action (in the same order they were sent), so you can check whether a specific action failed.
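
An abridged sketch of the response shape (status values illustrative): one entry per action, in order.

{
	"took": 30,
	"errors": false,
	"items": [
		{ "delete": { "_index": "website", "_id": "123", "status": 404 } },
		{ "create": { "_index": "website", "_id": "123", "status": 201 } },
		...
	]
}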

4, Advanced retrieval

1. SearchAPI

ES supports two basic retrieval methods:

  • One is to send search parameters (uri + retrieval parameters) by using the REST request URI
  • The other is to send them by using REST request body (uri + request body)

1) Retrieve information

retrieval                                       explain
GET bank/_search                                retrieve all information under the bank index, including type and docs
GET bank/_search?q=*&sort=account_number:asc    retrieval via URI request parameters

Response body:

response body property    explain
took                      how long Elasticsearch took to execute the search (ms)
timed_out                 whether the search timed out
_shards                   how many shards were searched, with success / failure counts
hits                      search results
hits.total                total number of matching documents
hits.hits                 the actual array of search results (defaults to the top 10 documents)
sort                      sort key of the results (absent when sorting by score)
_score and max_score      relevance score and highest score (for full-text retrieval)

uri + request body for retrieval

GET bank/_search
{ 
    "query": { 
        "match_all": {}
	},
    "sort": [
		{ 
            "account_number": { 
            	"order": "desc"
			}
		}
	]
}

HTTP client tools such as POSTMAN cannot carry a request body on a GET request; switching to POST gives the same result. We POST a JSON-style query body to the _search API. Understand that once the search results are returned, Elasticsearch is completely done with the request: it does not maintain any server-side resources or cursors over the results.

2. Query DSL

1) Basic syntax format

Elasticsearch provides a JSON-style DSL (domain-specific language) for executing queries, called the Query DSL. The query language is very comprehensive and feels a little complicated at first; the way to really learn it is to start with a few basic examples.

Typical structure of a query statement

{ 
    QUERY_NAME: { 
        ARGUMENT: VALUE, 
        ARGUMENT: VALUE,
        ... 
    } 
} 

If it is for a field, its structure is as follows:

{ 
    QUERY_NAME: { 
        FIELD_NAME: { 
            ARGUMENT: VALUE, 
            ARGUMENT: VALUE,
            ... 
        } 
    }
}

For example:

GET bank/_search
{ 
    "query": { 
        "match_all": {}
    },
    "from": 0, 
    "size": 5, 
    "sort": [
        { 
            "account_number": { 
                "order": "desc"
            }
        }
    ]
}
  • query defines how to query
  • match_all is a query type [it matches everything]; in ES you can combine many query types inside query to build complex queries
  • besides query, other parameters can change the result, such as sort and size
  • from + size together implement paging
  • sort supports multi-field sorting: when earlier fields are equal, later fields break the tie; otherwise the earlier order prevails (see the sketch below)
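
A quick sketch of a two-key sort: equal account_number values fall back to balance.

GET bank/_search
{
    "query": { "match_all": {} },
    "from": 0,
    "size": 5,
    "sort": [
        { "account_number": { "order": "desc" } },
        { "balance": { "order": "asc" } }
    ]
}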

2) Return partial fields

GET bank/_search
{ 
    "query": {
        "match_all": {}
    },
    "from": 0, 
    "size": 5, 
    "_source": [
        "age","balance"
    ]
}

3) match query

  • Basic type (non string), exact match

    GET bank/_search 
    { 
        "query": { 
            "match": { 
                "account_number": "20" 
            } 
        } 
    } 
    

    match returns account_number=20

  • String, full text retrieval

    GET bank/_search
    { 
        "query": { 
            "match": { 
                "address": "mill"
            }
        }
    }
    

    Queries all records whose address contains the word "mill". When a string field is searched, a full-text search is performed and each record gets a relevance score.

  • String, multiple words (word segmentation + full text search)

    GET bank/_search 
    { 
        "query": { 
            "match": { 
                "address": "mill road" 
            } 
        } 
    } 
    

    Finally, query all records containing mill or road or mill road in address, and give correlation scores

4)match_phrase [phrase matching]

The value to be matched is treated as a whole phrase (no word segmentation):

GET bank/_search
{ 
    "query": { 
        "match_phrase": { 
            "address": "mill road"
		}
	}
}

Find out all records containing mill road in address and give correlation scores

5)multi_match [multi field matching]

GET bank/_search 
{ 
    "query": { 
        "multi_match": { 
            "query": "mill", 
            "fields": ["state","address"] 
        } 
    } 
}

Matches documents whose state or address contains mill.

6) bool [compound query]

bool is used for compound query:

It is important to understand that compound statements can combine any other query statements, including compound statements. This means that compound statements can be nested with each other and can express very complex logic.

  • Must: all conditions listed in must must be met

    GET bank/_search
    { 
        "query": { 
            "bool": { 
                "must": [
    				{ "match": { "address": "mill" } },
    				{ "match": { "gender": "M" } }
    			]
    		}
    	}
    }
    
  • Should: conditions listed in should *should* be satisfied; matching them raises the relevance score but by itself does not change which documents match. Exception: if the query contains only a single should rule (no must/filter), that should condition becomes the effective match condition and does change the result

    GET bank/_search
    { 
        "query": { 
            "bool": { 
                "must": [
    				{ "match": { "address": "mill" } }, 
                    { "match": { "gender": "M" } }
    			],"should": [
    				{"match": { "address": "lane" }}
    			]
    		}
    	}
    }
    
  • must_not: the listed conditions must not match

    GET bank/_search
    { 
        "query": { 
            "bool": { 
                "must": [
    				{"match": { "address": "mill" } }, 
                    {"match": { "gender": "M" } }
    			],
                "should": [
    				{"match": { "address": "lane" }}
    			],
                "must_not": [
    				{"match": { "email": "baluba.com" }}
    			]
    		}
    	}
    }
    

    The address contains mill and the gender is M; if the address also contains lane, all the better, but the email must not contain baluba.com.

    event       describe
    must        clauses (queries) must appear in matching documents and contribute to the score
    filter      clauses (queries) must appear in matching documents, but unlike must, the query's score is ignored
    should      clauses (queries) should appear in matching documents; in a boolean query with no must or filter clause, at least one should clause must match; the minimum number of matching should clauses can be set with the minimum_should_match parameter
    must_not    clauses (queries) must not appear in matching documents

7) filter [result filtering]

Not all queries need to produce scores, especially queries used only for "filtering". To avoid computing scores, Elasticsearch automatically detects these scenarios and optimizes query execution.

GET bank/_search
{
	"query": {
		"bool": {
			"must": [{
				"match": {
					"address": "mill"
				}
			}],
			"filter": {
				"range": {
					"balance": {
						"gte": 10000,
						"lte": 20000
					}
				}
			}
		}
	}
}
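
If a bool query contains only filter clauses, no relevance score is computed at all: every hit comes back with a _score of 0. A minimal sketch:

GET bank/_search
{
	"query": {
		"bool": {
			"filter": [
				{ "range": { "balance": { "gte": 10000, "lte": 20000 } } }
			]
		}
	}
}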

8)term

Like match, term matches the value of an attribute, but it performs exact matching: use match for full-text (analyzed text) fields and term for non-text fields (keyword, numeric, date, boolean).

GET bank/_search
{
	"query": {
		"bool": {
			"must": [{
				"term": {
					"age": {
						"value": "28"
					}
				}
			}, {
				"match": {
					"address": "990 Mill Road"
				}
			}]
		}
	}
}
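
For exact whole-value matching on a string, use its keyword sub-field (a sketch, assuming the bank index's default mapping gives address an address.keyword sub-field). Unlike the match above, which tokenizes the query, this hits only documents whose address is exactly "990 Mill Road":

GET bank/_search
{
	"query": {
		"term": {
			"address.keyword": "990 Mill Road"
		}
	}
}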

9) aggregations

Aggregations provide the ability to group data and extract statistics from it. The simplest aggregations are roughly equivalent to SQL GROUP BY and SQL aggregate functions. In Elasticsearch, a single search can return hits and aggregation results at the same time, in one response. You can run a query together with multiple aggregations and get all of their results in a single round trip, using a concise and simplified API that avoids extra network traffic.

  • Search for the age distribution and average age of all people with mill in address, but do not display the details of these people.

    GET bank/_search
    {
    	"query": {
    		"match": {
    			"address": "mill"
    		}
    	},
    	"aggs": {
    		"group_by_state": {
    			"terms": {
    				"field": "age"
    			}
    		},
    		"avg_age": {
    			"avg": {
    				"field": "age"
    			}
    		}
    	},
    	"size": 0
    }
    //========================================
    "size": 0   //do not return the search hits, only the aggregation results
    //"aggs" performs aggregations; the syntax is:
    "aggs": {
    	"aggs_name": {     //a name for this aggregation, used to find it in the result set
    		"AGG_TYPE": {} //the aggregation type (avg, term, terms, ...)
    	}
    }
    
  • Aggregate by age and request the average salary of these people in these age groups

    GET bank/account/_search
    {
    	"query": {
    		"match_all": {}
    	},
    	"aggs": {
    		"age_avg": {
    			"terms": {
    				"field": "age",
    				"size": 1000
    			},
    			"aggs": {
    				"banlances_avg": {
    					"avg": {
    						"field": "balance"
    					}
    				}
    			}
    		}
    	},
    	"size": 1000
    }
    
  • Find out all age distributions, and the average salary of M and F in these age groups, as well as the overall average salary of this age group

    GET bank/account/_search
    {
    	"query": {
    		"match_all": {}
    	},
    	"aggs": {
    		"age_agg": {
    			"terms": {
    				"field": "age",
    				"size": 100
    			},
    			"aggs": {
    				"gender_agg": {
    					"terms": {
    						"field": "gender.keyword",
    						"size": 100
    					},
    					"aggs": {
    						"balance_avg": {
    							"avg": {
    								"field": "balance"
    							}
    						}
    					}
    				},
    				"balance_avg": {
    					"avg": {
    						"field": "balance"
    					}
    				}
    			}
    		}
    	},
    	"size": 1000
    }
    

3. Mapping

1) Field type

2) Mapping

Mapping is used to define a document and how its contained fields are stored and indexed. For example, use mappings to define:

  • Which string attributes should be considered full text fields.

  • Which attributes contain numbers, dates, or geographic locations.

  • Whether all properties in the document can be indexed (the _all configuration).

  • Format of the date.

  • Custom mapping rules to perform dynamic attribute addition.

  • Viewing mapping information

    GET bank/_mapping
    
  • Modifying mapping information

    https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

    ES auto-guesses mapping types for new fields:

    JSON type                           Domain type
    Boolean: true or false              boolean
    Integer: 123                        long
    Floating point: 123.45              double
    String, valid date: "2014-09-15"    date
    String: "foo bar"                   string

3) New version changes

ES 7 and above removed the concept of type.

  • In a relational database, two tables are independent: even if they have columns with the same name, it doesn't affect their use. This is not the case in ES. Elasticsearch is a search engine built on Lucene, and same-named fields under different types of one index are ultimately handled as one and the same field in Lucene.
    • Two user_name fields under two different types of the same ES index are actually treated as the same field, so they must be given the same field mapping in both types; otherwise same-named fields in different types conflict during processing, lowering Lucene's efficiency.
    • Removing type improves the efficiency of ES data processing.
  • Elasticsearch 7.x: the type parameter in the URL is optional. For example, indexing a document no longer requires a document type.
  • Elasticsearch 8.x: the type parameter in the URL is no longer supported. Solution:
    1. Migrate the index from multi-type to single-type: each type of document gets an independent index.
    2. Migrate all the type data under the existing index to the specified location; see "Data migration" below.
1. Create mapping

Create an index and specify its mapping:
PUT /my-index
{
	"mappings": {
		"properties": {
			"age": {
				"type": "integer"
			},
			"email": {
				"type": "keyword"
			},
			"name": {
				"type": "text"
			}
		}
	}
}
2. Add new field mapping
PUT /my-index/_mapping
{
	"properties": {
		"employee-id": {
			"type": "keyword",
			"index": false
		}
	}
}
3. Update mapping

We cannot update a mapping field that already exists. To change it, we must create a new index and migrate the data.
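
For example (a sketch; the exact error text varies by version), trying to change the type of the existing age field is rejected:

PUT /my-index/_mapping
{
	"properties": {
		"age": { "type": "text" }
	}
}
//responds 400: the mapper for [age] cannot be changed from [integer] to [text]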

4. Data migration

First create new_twitter with the correct mapping, then migrate the data as follows:

POST _reindex  //fixed syntax
{
	"source": {
		"index": "twitter"
	},
	"dest": {
		"index": "new_twitter"
	}
}

Migrate the data under the type of the old index

POST _reindex
{
	"source": {
		"index": "twitter",
		"type": "tweet"
	},
	"dest": {
		"index": "tweets"
	}
}

4. Tokenization

A tokenizer receives a stream of characters, divides it into independent tokens (usually individual words), and outputs a stream of tokens.

For example, the whitespace tokenizer splits text whenever it encounters whitespace characters: it splits the text "Quick brown fox!" into the terms [Quick, brown, fox!].

The tokenizer is also responsible for recording the order, or position, of each term (used for phrase and word-proximity queries) and the character offsets of the start and end of the original word each term represents (used for highlighting search matches).

Elasticsearch provides many built-in tokenizers that can also be used to build custom analyzers.
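
The behavior described above can be checked directly with the _analyze API (a quick sketch using the built-in whitespace tokenizer):

POST _analyze
{
    "tokenizer": "whitespace",
    "text": "Quick brown fox!"
}
//returns the tokens [Quick, brown, fox!], each with its position and start/end offsets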

1) Install the ik tokenizer

Note: automatic installation with the default elasticsearch-plugin install xxx.zip command cannot be used here.

https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v6.4.2 (install the release matching your ES version)

Enter the plugins directory inside the ES container:

docker exec -it <container id> /bin/bash

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip

Unzip the downloaded file, then:

unzip elasticsearch-analysis-ik-7.4.2.zip
rm -rf *.zip
mv elasticsearch/ ik

To confirm the tokenizer is installed:

cd .../bin

elasticsearch-plugin list   //lists the installed plugins

2) Test the tokenizer

Using the default analyzer:

POST _analyze 
{ 
    "text": "我是中国人"
} 

Using the ik_smart tokenizer:

POST _analyze 
{ 
    "analyzer": "ik_smart", 
    "text": "我是中国人"
} 

Another tokenizer, ik_max_word:

POST _analyze 
{ 
    "analyzer": "ik_max_word", 
    "text": "我是中国人"
} 

The results show clear differences between tokenizers. Because of this, the default mapping is often not enough: when defining an index, create the mapping manually so the appropriate tokenizer can be selected for each field.
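
For reference, a hedged sketch of typical output for the sentence 我是中国人: ik_smart usually produces the coarse split [我, 是, 中国人], while ik_max_word enumerates every plausible sub-word, e.g. [我, 是, 中国人, 中国, 国人].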

3) Custom Thesaurus

Modify /usr/share/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer Extended configuration</comment>
	<!--Users can configure their own extended dictionary here -->
	<entry key="ext_dict"></entry>
	<!--Users can configure their own extended stop word dictionary here-->
	<entry key="ext_stopwords"></entry>
	<!--Users can configure the remote extension dictionary here -->
	<entry key="remote_ext_dict">http://192.168.128.130/fenci/myword.txt</entry>
	<!--Users can configure the remote extended stop word dictionary here-->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

Original xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer Extended configuration</comment>
	<!--Users can configure their own extended dictionary here -->
	<entry key="ext_dict"></entry>
	<!--Users can configure their own extended stop word dictionary here-->
	<entry key="ext_stopwords"></entry>
	<!--Users can configure the remote extension dictionary here -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!--Users can configure the remote extended stop word dictionary here-->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

Use nginx to publish the static resource: following the path configured in remote_ext_dict above, create the corresponding folder and file under nginx's html directory.

Then restart the es server and nginx.

After the update completes, ES applies the new segmentation only to newly added data; historical data is not re-segmented. To re-segment historical data, run:

POST my_index/_update_by_query?conflicts=proceed

5, Elasticsearch rest client

1),9300: TCP

  • spring-data-elasticsearch: transport-api.jar;
    • the transport-api.jar versions pinned by different Spring Boot releases may not match the ES version
    • the transport client is no longer recommended in 7.x and will be abandoned after 8

2),9200: HTTP

  • JestClient: unofficial, updated slowly
  • RestTemplate: simulates sending HTTP requests; many ES operations still have to be encapsulated by yourself, which is troublesome
  • HttpClient: same as above
  • Elasticsearch Rest Client: the official RestClient, which wraps ES operations; the API is layered and easy to use

In the end, Elasticsearch Rest Client (elasticsearch-rest-high-level-client) is selected: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html

1. SpringBoot integration

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.2</version>
</dependency>
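
Note: Spring Boot's dependency management pins its own Elasticsearch version, which can silently override 7.4.2. A common fix (a sketch for this project's pom.xml) is to override the managed version property:

<properties>
    <elasticsearch.version>7.4.2</elasticsearch.version>
</properties>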

2. Configuration

import org.apache.http.HttpHost;
import org.elasticsearch.client.*;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @Description :
 * @Author : sherlock
 * @Date : 2021/11/22 14:29
 */

@Configuration
public class ElasticSearchConfig {

    public static final RequestOptions COMMON_OPTIONS;

    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
//        builder.addHeader("authorization", "Bearer" + Token);
//        builder.setHttpAsyncResponseConsumerFactory(
//                new HttpAsyncResponseConsumerFactory
//                        .HeapBufferedResponseConsumerFactory(30 * 1024 * 1024 * 1024)
//        );
        COMMON_OPTIONS = builder.build();
    }

    //elasticsearch address
    @Bean
    public RestHighLevelClient client() {
        RestClientBuilder builder = RestClient.builder(
                new HttpHost("192.168.56.10", 9200, "http")
        );
        return new RestHighLevelClient(builder);
    }
}

3. Use

Refer to official documents:

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html

import cn.hutool.json.JSON;
import cn.hutool.json.JSONObject;
import cn.hutool.json.JSONUtil;
import com.syq.gulimail.search.config.ElasticSearchConfig;
import lombok.Data;
import lombok.ToString;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.aggregations.metrics.AvgAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

import java.io.IOException;

@RunWith(SpringRunner.class)
@SpringBootTest
public class GulimailSearchApplicationTests {

    @Autowired
    private RestHighLevelClient client;

    @ToString
    @Data
    //static so the JSON tool can instantiate it via reflection
    static class Account {

        private int account_number;
        private int balance;
        private String firstname;
        private String lastname;
        private int age;
        private String gender;
        private String address;
        private String employer;
        private String email;
        private String city;
        private String state;

    }

    /**
     * Retrieve data
     *
     * @throws IOException
     */
    @Test
    public void searchData() throws IOException {
        SearchRequest searchRequest = new SearchRequest();
        //1. Specify index
        searchRequest.indices("bank");
        //2. Specify search criteria
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        //Construct search conditions
//        sourceBuilder.query();
//        sourceBuilder.from();
//        sourceBuilder.size();
//        sourceBuilder.aggregation();
        sourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));

        //3. Create aggregation conditions
        //3.1) aggregate by age value
        TermsAggregationBuilder ageAgg = AggregationBuilders.terms("ageAgg").field("age").size(10);
        sourceBuilder.aggregation(ageAgg);

        //3.2) calculate average salary
        AvgAggregationBuilder balanceAvg = AggregationBuilders.avg("balanceAvg").field("balance");
        sourceBuilder.aggregation(balanceAvg);

        System.out.println("Search criteria:" + sourceBuilder.toString());
        searchRequest.source(sourceBuilder);

        //4. Perform retrieval
        SearchResponse searchResponse = client.search(searchRequest, ElasticSearchConfig.COMMON_OPTIONS);

        //5. Analysis results
        System.out.println("Query results:" + searchResponse.toString());
        SearchHit[] searchHits = searchResponse.getHits().getHits();
        for (SearchHit hit : searchHits) {
//            hit.getIndex();hit.getType();hit.getId();
            hit.getSourceAsString();
            Account account = JSONUtil.toBean(hit.getSourceAsString(), Account.class);
            System.out.println("account:" + account);
        }
        Aggregations aggregations = searchResponse.getAggregations();
//        for (Aggregation aggregation : aggregations.asList()) {
//            System.out.println("current aggregation:" + aggregation.getName());
//        }
        Terms ageAgg1 = aggregations.get("ageAgg");
        for (Terms.Bucket bucket : ageAgg1.getBuckets()) {
            String keyAsString = bucket.getKeyAsString();
            System.out.println("Age:" + keyAsString + "==>" + bucket.getDocCount());
        }

        Avg balanceAvg1 = aggregations.get("balanceAvg");
        System.out.println("Average salary:" + balanceAvg1.getValue());

    }

    /**
     * Store / update data to es
     *
     * @throws IOException
     */
    @Test
    public void indexData() throws IOException {
        IndexRequest indexRequest = new IndexRequest("users");
        indexRequest.id("1");//Data id
//        indexRequest.source("username","zhangsan","age",18,"gender", "male");
        User user = new User();
        user.setUsername("sherlock");
        user.setAge(18);
        user.setGender("male");
        String s = JSONUtil.toJsonStr(user);
        indexRequest.source(s, XContentType.JSON);//Content to save, content type

        //Perform operation
        IndexResponse index = client.index(indexRequest, ElasticSearchConfig.COMMON_OPTIONS);

        //Extract useful data
        System.out.println(index);
    }

    @Data
    class User {
        private String username;
        private String gender;
        private Integer age;
    }

    @Test
    public void contextLoads() {
        System.out.println(client);
    }

}
