Elasticsearch 15 -- ES: term-based and full-text queries

Term-based queries

A term is the smallest unit that carries meaning. Both search and natural language processing with statistical language models operate on terms.

Characteristics

  • Term-level queries: term query / range query / exists query / prefix query / wildcard query
  • In ES, a term query does not analyze (tokenize) its input. The input is treated as a single term, matched exactly against the index, and each match is scored
  • You can wrap the query in a Constant Score query to turn it into a filter, which skips scoring and lets ES use its cache to improve performance

term query

Suppose we have the following document:

{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "ID10032_1194_1634175220_535254",
  "_score" : 6.151313,
  "_source" : {
    "user_id" : 10032,
    "consultant_company_id" : 1194,
    "type" : 9,
    "source" : 1,
    "operation_at" : 1634175220,
    "create_time" : "2021-10-14 09:33:40",
    "update_time" : "2021-10-14 09:33:40",
    "login_type" : 3,
    "true_name" : "Zhang San",
    "id" : "ID10032_1194_1634175220_535254"
  }
}

If we use a match query, both the uppercase ID10032_1194_1634175220_535254 and the lowercase id10032_1194_1634175220_535254 work:

GET test_user_action_detail/_search
{
  "query": {
    "match": {
      "id": "ID10032_1194_1634175220_535254"
    }
  }
}

This returns the document.

If you use a term query, the lowercase id10032_1194_1634175220_535254 finds the document, but the uppercase ID10032_1194_1634175220_535254 does not:

GET test_user_action_detail/_search
{
  "query": {
    "term": {
      "id": {
        "value": "id10032_1194_1634175220_535254"
      }
    }
  }
}

The reason lies in the mapping of the id field, which is a text field with a keyword sub-field:

"id" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}

A text field is analyzed at index time, and the analyzer (the standard analyzer, unless another one is configured) lowercases the tokens it produces. The ID is therefore stored in the index as the lowercase id10032_1194_1634175220_535254. A term query, on the other hand, is not analyzed and matches exactly, case included, so only the lowercase input finds the document. If you want to match the original value, query the keyword sub-field, which is not analyzed, by changing the term query to the following:

GET test_user_action_detail/_search
{
  "query": {
    "term": {
      "id.keyword": {
        "value": "ID10032_1194_1634175220_535254"
      }
    }
  }
}

This returns the document. Because a keyword field is not analyzed, the inserted ID10032_1194_1634175220_535254 is stored exactly as is. Note that in this case the lowercase id10032_1194_1634175220_535254 no longer finds anything.
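The difference between the analyzed text field and the keyword sub-field can be mimicked with a toy inverted index. This is only a sketch in Python, not how ES is implemented: analyze() is a rough stand-in for the standard analyzer (lowercase, split on non-word characters), and ToyIndex and its methods are hypothetical names made up for this illustration.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())

class ToyIndex:
    """Hypothetical two-field index: 'id' (analyzed) and 'id.keyword' (raw)."""

    def __init__(self):
        self.text_terms = defaultdict(set)     # analyzed term -> doc ids
        self.keyword_terms = defaultdict(set)  # raw value -> doc ids

    def index(self, doc_id, value):
        for token in analyze(value):
            self.text_terms[token].add(doc_id)
        self.keyword_terms[value].add(doc_id)

    def term(self, field, value):
        """Exact lookup: the input is NOT analyzed."""
        postings = self.keyword_terms if field.endswith(".keyword") else self.text_terms
        return set(postings.get(value, set()))

    def match(self, value):
        """Analyze the input first, then look up each resulting term."""
        hits = set()
        for token in analyze(value):
            hits |= self.text_terms.get(token, set())
        return hits

idx = ToyIndex()
raw = "ID10032_1194_1634175220_535254"
idx.index(1, raw)

print(idx.match(raw))                        # {1}
print(idx.term("id", raw))                   # set()
print(idx.term("id", raw.lower()))           # {1}
print(idx.term("id.keyword", raw))           # {1}
print(idx.term("id.keyword", raw.lower()))   # set()
```

The underscore counts as a word character here, so the whole ID stays a single token, which is consistent with the lowercase term query above matching the full string.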

Filters skip scoring

A term query is an exact match, so a relevance score adds little. You can wrap it in a filter with constant_score, which skips scoring and lets ES use its cache to improve performance

GET test_user_action_detail/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "id.keyword": {
            "value": "ID10032_1194_1634175220_535254"
          }
        }
      }
    }
  }
}
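As a rough analogy for what the filter buys you, the Python sketch below separates yes/no matching from scoring and memoizes the matching step. It is not how ES caches filters internally; DOCS, term_filter and constant_score are hypothetical names, and the second document ID is made up for the example. The one ES fact it relies on is that constant_score gives every matching document a score equal to its boost, which defaults to 1.0.

```python
from functools import lru_cache

# Hypothetical in-memory "index"; the second ID is invented for illustration.
DOCS = {
    1: {"id.keyword": "ID10032_1194_1634175220_535254"},
    2: {"id.keyword": "ID10033_1194_1634175221_000001"},
}

@lru_cache(maxsize=128)
def term_filter(field, value):
    """Yes/no matching only: returns the ids of matching docs, no relevance math."""
    return frozenset(doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value)

def constant_score(field, value, boost=1.0):
    """Every doc that passes the filter gets the same score (the boost)."""
    return {doc_id: boost for doc_id in term_filter(field, value)}

first = constant_score("id.keyword", "ID10032_1194_1634175220_535254")
second = constant_score("id.keyword", "ID10032_1194_1634175220_535254")

print(first)                               # {1: 1.0}
print(term_filter.cache_info().hits >= 1)  # True: the repeat call reused the cached result
```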

Full-text queries

  • match query / match phrase query / query string query

  • Characteristics

    • Text is analyzed both at index time and at search time. The query string is first passed to the appropriate analyzer, which produces the list of terms to search for
    • At query time, the input is tokenized first, then each term is looked up, and finally the results are merged, with a score computed for each document
    • For example, querying "matrix reloaded" searches for matrix and for reloaded, and then merges the results
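The analyze, look up, merge and score steps can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the three documents are made up, analyze() is a toy analyzer, and the score is just the number of query terms a document contains, whereas ES computes a real relevance score (BM25 by default).

```python
import re
from collections import Counter

def analyze(text):
    """Toy analyzer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())

# Hypothetical documents, invented for this illustration.
DOCS = {
    1: "The Matrix",
    2: "The Matrix Reloaded",
    3: "Reloaded and ready",
}

def match(query):
    """Look up each analyzed query term, merge the hits, and rank by matched-term count."""
    scores = Counter()
    for term in analyze(query):
        for doc_id, text in DOCS.items():
            if term in analyze(text):
                scores[doc_id] += 1
    return scores.most_common()

print(match("Matrix reloaded"))  # [(2, 2), (1, 1), (3, 1)]
```

Documents containing only one of the two terms are still returned, which mirrors the default OR behaviour of a match query; the document containing both simply ranks first.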

range query

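A range query matches documents whose field value falls within a given range. The bounds are expressed with gt, gte, lt and lte:

GET test_user_action_detail/_search
{
  "query": {
    "range": {
      "created_at": {
        "gt": "1889237773"
      }
    }
  }
}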
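The meaning of the four bound names can be spelled out with a small Python sketch. in_range and OPS are hypothetical helpers written for this note; they only illustrate the comparison semantics and say nothing about how ES evaluates ranges internally.

```python
import operator

# Maps the range query's bound names to Python comparisons.
OPS = {
    "gt": operator.gt,   # strictly greater than
    "gte": operator.ge,  # greater than or equal to
    "lt": operator.lt,   # strictly less than
    "lte": operator.le,  # less than or equal to
}

def in_range(value, bounds):
    """True if value satisfies every bound, e.g. {"gt": 10, "lte": 20}."""
    return all(OPS[name](value, bound) for name, bound in bounds.items())

print(in_range(1889237774, {"gt": 1889237773}))   # True
print(in_range(1889237773, {"gt": 1889237773}))   # False: gt excludes the bound itself
print(in_range(1889237773, {"gte": 1889237773}))  # True
```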

exists query

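An exists query matches documents that have an indexed value for the given field:

GET test_user_action_detail/_search
{
  "query": {
    "exists": { "field": "created_at" }
  }
}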
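What counts as "existing" has a few edge cases: a field that is null, an empty array, or an array holding only nulls is treated as missing, while an empty string still counts as a value. The Python sketch below models that rule on plain dictionaries; field_exists is a hypothetical helper, and it ignores mapping-level settings that can also keep a value out of the index.

```python
def field_exists(doc, field):
    """Rough model of exists: at least one non-null value is present for the field."""
    value = doc.get(field)
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)

print(field_exists({"created_at": 1634175220}, "created_at"))    # True
print(field_exists({"created_at": ""}, "created_at"))            # True: empty string is a value
print(field_exists({"created_at": None}, "created_at"))          # False
print(field_exists({"created_at": []}, "created_at"))            # False
print(field_exists({"created_at": [None, 1634175220]}, "created_at"))  # True
print(field_exists({}, "created_at"))                            # False
```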

Processing multivalued fields

For example, suppose a document's city field holds the array ["Beijing", "Shanghai"].

If we run a term query on it

GET test_user_action_detail/_search
{
  "query": {
    "term": {
      "city.keyword": { 
        "value": "Beijing"
      }
    }
  }
}

The document above is returned: for an array field, a term query matches as long as any one of the values equals the term. It means "contains", not "equals the whole array".
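One way to picture this: each value of the array is indexed as its own term pointing at the same document. The Python sketch below is a toy model with hypothetical names (postings, index_keyword), and document 2 is made up to show a single-valued field next to the array.

```python
from collections import defaultdict

postings = defaultdict(set)  # term -> ids of documents containing it

def index_keyword(doc_id, value):
    """Index a keyword field; every element of an array becomes a separate term."""
    values = value if isinstance(value, list) else [value]
    for v in values:
        postings[v].add(doc_id)

index_keyword(1, ["Beijing", "Shanghai"])  # the array from the example above
index_keyword(2, "Shanghai")               # hypothetical single-valued document

print(sorted(postings["Beijing"]))   # [1]
print(sorted(postings["Shanghai"]))  # [1, 2]
```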

Geek time ES learning notes

Tags: Big Data ElasticSearch search engine

Posted on Thu, 21 Oct 2021 00:14:34 -0400 by Hafkas