Preface
The previous posts covered two ways of integrating ES with Spring Boot and simple ES create, read, update and delete operations. For more advanced retrieval with the RestHighLevelClient recommended for ES 7.x, you need to understand the DSL query format. Today, let's look at how to translate DSL queries into Java code.
1, How to use ES to build Baidu-like retrieval?
1. First, think about how Baidu search behaves: hot search words are shown on the Baidu home page
2. When typing in the search box, the input is auto-completed
3. You can also search using pinyin
2, Auto-completion for full-text retrieval
1. Create index
Generally speaking, the words suggested in the input box are words that are searched frequently, so auto-completion queries the index of hot search words.
1. Create the index. For later pinyin retrieval, the searched fileName field should use both the pinyin analyzer and the ik analyzer. However, the hot-word sub-field here is a keyword and cannot use pinyin analysis.
Guess: a field can define multiple sub-fields (multi-fields); this will be verified later
{ "properties": { "createTime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss || yyyy-MM-dd || yyyy/MM/dd HH:mm:ss|| yyyy/MM/dd ||epoch_millis" }, "searchInput": { "type": "completion", "analyzer": "ik_max_word", "fields": { "hot": { "type": "keyword" //For hot search words } } } } }
Conjectured structure:
{ "properties": { "createTime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss || yyyy-MM-dd || yyyy/MM/dd HH:mm:ss|| yyyy/MM/dd ||epoch_millis" }, "searchInput": { "type": "completion", "analyzer": "ik_max_word", "fields": { "hot": { "type": "keyword" //For hot search words }, "hot-py":{ "type": "text", "term_vector": "with_positions_offsets", "analyzer": "pinyin_analyzer", //For hot search Pinyin "boost": 10.0 } } } } }
2. Add data
I use an encapsulated utility class to add data; of course, you can also use a Spring Data repository.
/**
 * Add data with a custom id.
 *
 * @param object data to add
 * @param index  index, similar to a database
 * @param id     document id; a random id is generated when it is null
 * @return the id of the indexed document
 */
public String addData(Object object, String index, String id) throws IOException {
    if (null == id) {
        return addData(object, index);
    }
    if (this.existsById(index, id)) {
        return this.updateDataByIdNoRealTime(object, index, id);
    }
    // Create the request
    IndexRequest request = new IndexRequest(index);
    request.id(id);
    request.timeout(TimeValue.timeValueSeconds(1));
    // Put the data into the request as JSON
    request.source(JSON.toJSONString(object), XContentType.JSON);
    // Client sends the request
    IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
    log.info("Successfully added data. Index: {}, response status: {}, id: {}",
            index, response.status().getStatus(), response.getId());
    return response.getId();
}
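For example, a hot-search record could be written like this. This is only a sketch: HotKeyDoc, esUtil and the sample keyword are hypothetical names that mirror the conjectured mapping above.

// Hypothetical document class matching the conjectured mapping above (Lombok @Data).
@Data
public class HotKeyDoc {
    private String searchInput;   // the word typed into the search box
    private String createTime;    // formatted as yyyy-MM-dd HH:mm:ss
}

// Writing one hot-search record through the utility method above;
// ESConst.HOT_KEY_INDEX is the hot-key index name used later in this post.
HotKeyDoc doc = new HotKeyDoc();
doc.setSearchInput("cat videos");
doc.setCreateTime(LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")));
String id = esUtil.addData(doc, ESConst.HOT_KEY_INDEX, null);   // null id -> random id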
3. Advanced search
1. ES provides an auto-completion feature. You need to build a SuggestBuilder query object; there are several suggester types:
term suggester: gives suggestions based on edit distance; the result is a single word
phrase suggester: builds on the term suggester but also considers the relationship between multiple terms, such as whether they appear together in the indexed text, how close they are, and their word frequency; the result is a phrase
completion suggester: uses a dedicated field for completion. The field type must be defined as completion; the data is encoded into an FST and stored alongside the index, and the FST is loaded into memory. It can only be used for prefix completion.
DSL query statement demo
{ "suggest": { "suggest": { "prefix": "cat", "completion": { "field": "search_input.hot", "size": 10 } } } }
suggest (outer): indicates that this is a suggest-type query
suggest (inner): the name of this suggestion, user-defined
prefix: the prefix to complete; in this example we search for content starting with "cat"
completion: indicates a completion-type suggester; the other types are term and phrase
field: the field to query
2. Java code implementation
@Override
public Set<String> getSearchSuggest(Integer pageSize, String key) {
    // Define the suggest object
    SuggestBuilder suggestBuilder = new SuggestBuilder();
    // Completion suggestion on the SEARCH_INPUT field: prefix = the text to complete,
    // size = pageSize, skipDuplicates = true to drop duplicate results
    CompletionSuggestionBuilder suggestion = SuggestBuilders
            .completionSuggestion(SEARCH_INPUT)
            .prefix(key)
            .size(pageSize)
            .skipDuplicates(true);
    // SEARCH_SUGGEST is the custom name of this suggestion
    suggestBuilder.addSuggestion(SEARCH_SUGGEST, suggestion);
    SearchRequest request = new SearchRequest()
            .indices(ESConst.HOT_KEY_INDEX)
            .source(new SearchSourceBuilder().suggest(suggestBuilder));
    // Returned fields can be filtered via SearchSourceBuilder's _source filtering
    SearchResponse searchResponse = null;
    try {
        searchResponse = restHighLevelClient.search(request, RequestOptions.DEFAULT);
        System.out.println(searchResponse);
    } catch (IOException e) {
        e.printStackTrace();
    }
    Suggest suggest = searchResponse == null ? null : searchResponse.getSuggest();
    Set<String> keywords = null;
    if (suggest != null) {
        keywords = new HashSet<>();
        List<? extends Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option>> entries =
                suggest.getSuggestion(SEARCH_SUGGEST).getEntries();
        for (Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option> entry : entries) {
            for (Suggest.Suggestion.Entry.Option option : entry.getOptions()) {
                /* Return at most pageSize suggestions; each suggestion is at most 20 characters */
                String keyword = option.getText().string();
                if (!StringUtils.isEmpty(keyword) && keyword.length() <= 20) {
                    /* Skip the input itself */
                    if (keyword.equals(key)) {
                        continue;
                    }
                    keywords.add(keyword);
                    if (keywords.size() >= pageSize) {
                        break;
                    }
                }
            }
        }
    }
    return keywords;
}
With this, we can get the completion word set during retrieval. You can also return spelling-correction suggestions; you only need to change the suggester type, as sketched below.
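For example, a rough sketch of a spelling-correction query with the term suggester. The suggestion name correct_suggest is arbitrary, and since a term suggester normally runs against an analyzed text field, the field used here is an assumption; you may need to point it at a text sub-field instead.

// Term-suggester sketch for spelling correction: suggests terms within a small
// edit distance of the (possibly misspelled) input text.
SuggestBuilder suggestBuilder = new SuggestBuilder();
TermSuggestionBuilder termSuggestion = SuggestBuilders
        .termSuggestion(SEARCH_INPUT)   // field to correct against (assumption)
        .text(key)                      // the user's input
        .size(pageSize);
suggestBuilder.addSuggestion("correct_suggest", termSuggestion);
SearchRequest request = new SearchRequest()
        .indices(ESConst.HOT_KEY_INDEX)
        .source(new SearchSourceBuilder().suggest(suggestBuilder));
SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);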
3, Hot search words
1. Ideas
The key to hot search words is deciding what to count. Generally speaking, the hot search words are simply the words that are searched most often. So when running the full-text search for streaming media, I write an aspect class that saves the words from the input box into the ES hot-search-word index before the query runs (a rough sketch of such an aspect is shown below). After that it is simple: we just group-query the hot-key index, count how often each word appears, and sort by the count.
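Here is that rough sketch. The pointcut expression, EsUtil and HotKeyDoc are placeholders for your own classes and service method; this is only an illustration of the idea, not the actual implementation.

// Sketch: an aspect that records each search keyword into the hot-key index
// before the full-text query runs.
@Slf4j
@Aspect
@Component
public class HotKeyAspect {

    private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    @Resource
    private EsUtil esUtil;

    // Intercept the full-text search method and record its keyword first;
    // adjust the pointcut expression to your own service method.
    @Before("execution(* com.example.service.FileSearchService.search(..)) && args(key,..)")
    public void saveHotKey(String key) {
        try {
            HotKeyDoc doc = new HotKeyDoc();
            doc.setSearchInput(key);
            doc.setCreateTime(LocalDateTime.now().format(formatter));
            esUtil.addData(doc, ESConst.HOT_KEY_INDEX, null);
        } catch (IOException e) {
            // Recording a hot key should never break the actual search
            log.warn("failed to record hot key: {}", key, e);
        }
    }
}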
2. DSL statement
Aggregations (aggs) in ES are equivalent to GROUP BY in SQL.
ES aggregations fall into four categories: metric aggregations, bucket aggregations, matrix aggregations, and pipeline aggregations.
Here we use a terms (bucket) aggregation: it groups documents by field value, and each bucket's document count is equivalent to COUNT(*) per group after GROUP BY.
"aggs": { "hot_count" : { "terms" : { "field" : "search_input.hot" } } }
3. Java code implementation
@Override
public List<Map<String, Object>> getHotKey(Integer pageSize, String type) {
    SearchRequest searchRequest = new SearchRequest(ESConst.HOT_KEY_INDEX);
    // Build the search criteria
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // select keyword, count(*) as hot_count from hot-key group by keyword;  --> grouping
    TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms(HOT_COUNT).field(SEARCH_INPUT + ".hot");
    // No scoring explanation needed
    sourceBuilder.explain(false);
    // No raw documents needed
    sourceBuilder.fetchSource(false);
    // No version number needed
    sourceBuilder.version(false);
    sourceBuilder.aggregation(aggregationBuilder);
    RangeQueryBuilder rangeQueryBuilder;
    switch (type) {
        case ESConst.MONTH:
            rangeQueryBuilder = QueryBuilders.rangeQuery("createTime")
                    .gte(LocalDateTime.now().minusMonths(1).format(formatter))
                    .lte(LocalDateTime.now().format(formatter));
            break;
        case ESConst.WEEK:
            rangeQueryBuilder = QueryBuilders.rangeQuery("createTime")
                    .gte(LocalDateTime.now().minusWeeks(1).format(formatter))
                    .lte(LocalDateTime.now().format(formatter));
            break;
        case ESConst.DAY:
        default:
            rangeQueryBuilder = QueryBuilders.rangeQuery("createTime")
                    .gte(LocalDateTime.now().minusDays(1).format(formatter))
                    .lte(LocalDateTime.now().format(formatter));
            break;
    }
    // Time range
    sourceBuilder.query(rangeQueryBuilder);
    searchRequest.source(sourceBuilder);
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(searchResponse);
        Terms terms = searchResponse.getAggregations().get(HOT_COUNT);
        List<Map<String, Object>> result = new ArrayList<>();
        for (Terms.Bucket bucket : terms.getBuckets()) {
            Map<String, Object> map = new HashMap<>();
            map.put("hotKey", bucket.getKeyAsString());
            map.put("hotValue", bucket.getDocCount());
            result.add(map);
        }
        return result;
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
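One more note: by default the terms aggregation returns at most 10 buckets, already ordered by document count descending, so the result list above is effectively the hot-word ranking. If you want pageSize to control how many hot words come back, you could (this is my assumption about the intent) set the bucket count explicitly:

// Limit the number of hot-word buckets to the requested page size;
// the default is 10 buckets, ordered by doc count descending.
TermsAggregationBuilder aggregationBuilder = AggregationBuilders
        .terms(HOT_COUNT)
        .field(SEARCH_INPUT + ".hot")
        .size(pageSize);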
4, Pinyin completion
The following code has not been tested yet; it is just my guess, and I will verify it later.
1. DSL syntax
{ "suggest": { "suggest": { "prefix": "cat", "completion": { "field": "suggestion", "size": 10 } }, "py_suggest":{ "prefix": "maohe", "completion": { "field": "search_input.py-hot", "size": 10 } } } }
Because the searchInput field was mapped with two analyzers when the index was created, I think adding another suggestion clause to the query should work.
2. Java code implementation
Based on the original Java code, add the corresponding py_suggest custom suggestion for the pinyin field.
/* Try pinyin completion */
CompletionSuggestionBuilder pySuggestion = SuggestBuilders
        .completionSuggestion(SEARCH_INPUT + ".hot-py")
        .prefix(key)
        .size(pageSize)
        .skipDuplicates(true);
suggestBuilder.addSuggestion("py_suggest", pySuggestion);
/*------------------*/
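Correspondingly, when parsing the response, the py_suggest results would be read the same way as the default suggestion. Something like the following (again untested, just mirroring the earlier parsing loop):

// Merge the pinyin suggestions from the same response into the result set;
// "py_suggest" must match the name passed to addSuggestion above.
if (suggest.getSuggestion("py_suggest") != null) {
    List<? extends Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option>> pyEntries =
            suggest.getSuggestion("py_suggest").getEntries();
    for (Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option> entry : pyEntries) {
        for (Suggest.Suggestion.Entry.Option option : entry.getOptions()) {
            String keyword = option.getText().string();
            if (!StringUtils.isEmpty(keyword) && !keyword.equals(key)) {
                keywords.add(keyword);
            }
        }
    }
}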
Summary
You don't need to memorize the DSL; look it up when you need it. Once you understand the structure of a DSL query, you can map it onto the corresponding builder objects in the ES Java client and write the Java code.