ES 7.x: Baidu-style search, search autocomplete, hot search words, and Pinyin retrieval

Preface

Earlier articles covered two ways of integrating ES with Spring Boot, along with simple CRUD operations against ES. For advanced retrieval, the RestHighLevelClient recommended by ES 7.x requires an understanding of the DSL query format, so today let's learn how to translate DSL queries into Java code.

1. How to use ES for Baidu-like retrieval?

1. First, consider what Baidu search looks like: the Baidu home page displays hot search words.

2. As you type in the search box, your input is auto-completed.

3. You can also search using Pinyin.

2. Autocomplete for full-text retrieval

1. Create index

Generally, the words suggested in the input box are frequently searched terms, so autocomplete runs against the index of hot search words.

1. Create the index. To support Pinyin retrieval later, the searched field should be analyzed with both a Pinyin analyzer and the ik analyzer. The hot-word sub-field, however, must be a keyword and must not be Pinyin-analyzed.

Guess: a field can define multiple sub-fields via `fields`; this is verified below.

{
  "properties": {
    "createTime": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss || yyyy-MM-dd || yyyy/MM/dd HH:mm:ss|| yyyy/MM/dd ||epoch_millis"
    },
    "searchInput": {
      "type": "completion",
      "analyzer": "ik_max_word",
      "fields": {
        "hot": {
          "type": "keyword"  //For hot search words
        }
      }
    }
  }
}

Conjectured structure:

{
  "properties": {
    "createTime": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss || yyyy-MM-dd || yyyy/MM/dd HH:mm:ss|| yyyy/MM/dd ||epoch_millis"
    },
    "searchInput": {
      "type": "completion",
      "analyzer": "ik_max_word",
      "fields": {
        "hot": {
          "type": "keyword"  //For hot search words
        },
        "hot-py":{
          "type": "text",
          "term_vector": "with_positions_offsets",
          "analyzer": "pinyin_analyzer", //For hot search Pinyin
          "boost": 10.0
        }
      }
    }
  }
}
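The `pinyin_analyzer` referenced above is not built in; it has to be defined in the index settings when the index is created, and it requires the elasticsearch-analysis-pinyin plugin. A minimal sketch (the tokenizer name and parameter values are illustrative, not from my actual index):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin"
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": true,
          "keep_original": true,
          "lowercase": true
        }
      }
    }
  }
}
```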

2. Add data

I use an encapsulated utility class to add data; a Spring Data repository would work as well.

    /**
     * New data, custom id
     *
     * @param object Data to add
     * @param index  Index, similar to database
     * @param id     The data ID is generated randomly when it is null
     * @return the id of the indexed document
     */
    public String addData(Object object, String index, String id) throws IOException {
        if (null == id) {
            return addData(object, index);
        }
        if (this.existsById(index, id)) {
            return this.updateDataByIdNoRealTime(object, index, id);
        }
        //Create request
        IndexRequest request = new IndexRequest(index);
        request.id(id);
        request.timeout(TimeValue.timeValueSeconds(1));
        //Put the data into the request as JSON
        request.source(JSON.toJSONString(object), XContentType.JSON);
        //Execute the request with the client
        IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
        log.info("Successfully added data. Index is: {}, response state: {}, id by: {}", index, response.status().getStatus(), response.getId());
        return response.getId();
    }
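For reference, a document written into the hot-word index by the method above might look like this (the index name `hot-key` and the field values are hypothetical; the fields follow the mapping defined earlier):

```json
POST /hot-key/_doc/1
{
  "createTime": "2021-10-27 10:00:00",
  "searchInput": "cat food"
}
```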

3. Advanced search

1. ES provides an autocomplete feature. You need to define a SuggestBuilder query object; the suggester comes in several types:

term suggester: gives suggestions based on edit distance; the result is a single word
phrase suggester: builds on the term suggester but also considers relationships between terms, such as whether they co-occur in the indexed text, their adjacency, and their frequency; the result is a phrase
completion suggester: completes partial input from a dedicated field whose type must be completion; the data is encoded into an FST, stored alongside the index, and loaded into memory, so it can only complete prefixes

DSL query statement demo

{
  "suggest": {
    "suggest": {
      "prefix": "cat",
      "completion": {
        "field": "search_input.hot",
        "size": 10
      }
    }
  }
}

suggest (outer): marks this as a suggest query
suggest (inner): the name of this query, chosen by the user
prefix: the prefix to complete; this example searches for content starting with "cat"
completion: a suggestion of the completion type; the other types are term and phrase
field: the field to query
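Not shown above, but worth knowing: the completion suggester also supports fuzzy prefix matching, which tolerates typos in the typed prefix. A sketch (the prefix and fuzziness value are illustrative):

```json
{
  "suggest": {
    "suggest": {
      "prefix": "cta",
      "completion": {
        "field": "searchInput",
        "fuzzy": { "fuzziness": 1 },
        "size": 10
      }
    }
  }
}
```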

2. Java code implementation

    @Override
    public Set<String> getSearchSuggest(Integer pageSize, String key) {
        //Define the suggest object
        SuggestBuilder suggestBuilder = new SuggestBuilder();
        //Name this query SEARCH_SUGGEST, set the prefix to complete, the result size (pageSize), and skipDuplicates to remove duplicates
        CompletionSuggestionBuilder suggestion = SuggestBuilders
                .completionSuggestion(SEARCH_INPUT).prefix(key).size(pageSize).skipDuplicates(true);

        suggestBuilder.addSuggestion(SEARCH_SUGGEST,suggestion);
        SearchRequest request = new SearchRequest().indices(ESConst.HOT_KEY_INDEX).source(new SearchSourceBuilder().suggest(suggestBuilder));
        //Returned fields can be filtered via SearchSourceBuilder#fetchSource
        SearchResponse searchResponse;
        try {
            searchResponse = restHighLevelClient.search(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
            return new HashSet<>();
        }
        Suggest suggest = searchResponse.getSuggest();

        Set<String> keywords = new HashSet<>();
        if (suggest != null) {
            List<? extends Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option>> entries = suggest.getSuggestion(SEARCH_SUGGEST).getEntries();
            outer:
            for (Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option> entry : entries) {
                for (Suggest.Suggestion.Entry.Option option : entry.getOptions()) {
                    String keyword = option.getText().string();
                    //Cap each suggestion at 20 characters
                    if (!StringUtils.isEmpty(keyword) && keyword.length() <= 20) {
                        //Skip the search input itself
                        if (keyword.equals(key)) continue;
                        keywords.add(keyword);
                        if (keywords.size() >= pageSize) {
                            break outer;
                        }
                    }
                }
            }
        }
        return keywords;
    }

With this, we can retrieve a set of completion suggestions during search. You can also provide spell-correction suggestions by simply changing the suggester type on the suggest object.
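For example, spell correction could use the term suggester mentioned earlier; a sketch in DSL (the misspelled text and the `fileName` field are illustrative assumptions):

```json
{
  "suggest": {
    "spell-check": {
      "text": "elasticsaerch",
      "term": {
        "field": "fileName"
      }
    }
  }
}
```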

3. Hot search words

1. Ideas

The key question for hot search words is what to query. Generally, hot search words are simply the terms searched most often. So in my full-text retrieval service, whenever a media query runs, an aspect (AOP) class first saves the search-box input into the ES hot-search-word index. The task then becomes simple: group the hot-key index by term, count the occurrences of each term, and sort by count.

2. DSL statement

Aggregations in ES are the equivalent of GROUP BY in SQL.
ES aggregations come in four kinds: metric, bucket, matrix, and pipeline aggregations.

A terms aggregation is a bucket aggregation: it groups documents by field value and returns the document count of each bucket, like COUNT(*) per group after a GROUP BY.

 "aggs": {
     "hot_count" : {
       "terms" : {
         "field" : "search_input.hot"
       }
     }
   }
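By default, a terms aggregation returns at most ten buckets, ordered by document count descending. The `size` parameter widens that, and `order` makes the sort explicit (a sketch; the values here are illustrative):

```json
 "aggs": {
   "hot_count": {
     "terms": {
       "field": "searchInput.hot",
       "size": 20,
       "order": { "_count": "desc" }
     }
   }
 }
```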

3. Java code implementation

    @Override
    public List<Map<String, Object>> getHotKey(Integer pageSize,String type) {

        SearchRequest searchRequest = new SearchRequest(ESConst.HOT_KEY_INDEX);
        //Build search criteria
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        //select keyword, count(*) as hot_count from hot-key group by keyword --> grouping
        TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms(HOT_COUNT).field(SEARCH_INPUT+".hot");
        //No scoring explanation needed
        sourceBuilder.explain(false);
        //No raw data required
        sourceBuilder.fetchSource(false);
        //No version number is required
        sourceBuilder.version(false);
        sourceBuilder.aggregation(aggregationBuilder);

        RangeQueryBuilder rangeQueryBuilder;
        LocalDateTime now = LocalDateTime.now();

        switch (type) {
            case ESConst.MONTH:
                rangeQueryBuilder = QueryBuilders.rangeQuery("createTime")
                        .gte(now.minusMonths(1).format(formatter)).lte(now.format(formatter));
                break;
            case ESConst.WEEK:
                rangeQueryBuilder = QueryBuilders.rangeQuery("createTime")
                        .gte(now.minusWeeks(1).format(formatter)).lte(now.format(formatter));
                break;
            case ESConst.DAY:
            default:
                rangeQueryBuilder = QueryBuilders.rangeQuery("createTime")
                        .gte(now.minusDays(1).format(formatter)).lte(now.format(formatter));
                break;
        }

        //Restrict to the chosen time range
        sourceBuilder.query(rangeQueryBuilder);

        searchRequest.source(sourceBuilder);
        try {
            SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(searchResponse);
            Terms terms = searchResponse.getAggregations().get(HOT_COUNT);
            List<Map<String,Object>> result = new ArrayList<>();
            for (Terms.Bucket bucket : terms.getBuckets()) {
                Map<String, Object> map = new HashMap<>();
                map.put("hotKey", bucket.getKeyAsString());
                map.put("hotValue",bucket.getDocCount());
                result.add(map);
            }
            return result;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return new ArrayList<>();
    }
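The switch above only changes how far back the lower bound of the createTime range reaches. That date arithmetic can be checked in isolation with plain java.time (the class name, string constants, and formatter pattern below are illustrative, not from the article's codebase):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimeRangeDemo {
    //Hypothetical formatter matching the mapping's "yyyy-MM-dd HH:mm:ss" date format
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    //Compute the lower bound of the hot-key time window for a given type
    public static String lowerBound(LocalDateTime now, String type) {
        switch (type) {
            case "month": return now.minusMonths(1).format(FMT);
            case "week":  return now.minusWeeks(1).format(FMT);
            default:      return now.minusDays(1).format(FMT); //"day" and fallback
        }
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2021, 10, 27, 12, 0, 0);
        System.out.println(lowerBound(now, "month")); //prints 2021-09-27 12:00:00
        System.out.println(lowerBound(now, "week"));  //prints 2021-10-20 12:00:00
        System.out.println(lowerBound(now, "day"));   //prints 2021-10-26 12:00:00
    }
}
```

The upper bound is always `now`, so only the lower bound needs a branch.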

4. Pinyin completion

The following code has not been tested yet; it is just my expectation, and I will verify it later.

1. DSL syntax

{
  "suggest": {
    "suggest": {
      "prefix": "cat",
      "completion": {
        "field": "searchInput",
        "size": 10
      }
    },
    "py_suggest": {
      "prefix": "maohe",
      "completion": {
        "field": "searchInput.hot-py",
        "size": 10
      }
    }
  }
}

Because the mapping of the searchInput field defines both analyzers when the index is created, I think there should be no problem adding a second suggestion query on the Pinyin sub-field.

2. Java code implementation

Based on the original Java code, add the corresponding py_suggest custom query:

     /*Try Pinyin completion*/
     CompletionSuggestionBuilder pySuggestion = SuggestBuilders
             .completionSuggestion(SEARCH_INPUT + ".hot-py")
             .prefix(key)
             .size(pageSize)
             .skipDuplicates(true);
     suggestBuilder.addSuggestion("py_suggest", pySuggestion);
     /*------------------*/

Summary

The DSL does not need to be memorized. When you need it, look up the structure of the DSL query, then find the corresponding builder objects in the ES client, and the Java code follows naturally.

Tags: Big Data ElasticSearch

Posted on Wed, 27 Oct 2021 02:05:40 -0400 by padma