Elasticsearch Series - Several advanced features

Outline

This article covers the basics of search templates, mapping templates, highlighted search, and geo-location search.

Standard Search Template

Search templates are one of the advanced features of Elasticsearch. A search template lets you parameterize a search: you define the query once and pass in the specified parameters when you use it, which avoids writing duplicate code. Templates can encapsulate commonly used queries, making them easier to use.

This is similar to interface encapsulation in programming: the details are hidden behind an interface for others to call, so users only need to care about the parameters and the response, which improves code reuse.

Let's look at the basic usage.

Parameter substitution

GET /music/children/_search/template
{
  "source": {
    "query": {
      "match": {
        "{{field}}":"{{value}}"
      }
    }
  },
  "params": {
    "field":"name",
    "value":"bye-bye"
  }
}

The search template is compiled to be equivalent to:

GET /music/children/_search
{
  "query": {
    "match": {
      "name":"bye-bye"
    }
  }
}
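
To make the substitution step concrete, here is a minimal Python sketch of what happens to the {{field}} and {{value}} placeholders. It is a toy regex substitution written for illustration, not Elasticsearch's actual Mustache engine, and the render function is a hypothetical helper.

```python
import json
import re

def render(template: str, params: dict) -> str:
    """Replace every {{name}} placeholder with the matching parameter value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params[m.group(1)]), template)

# The "source" of the search template, written as a string.
source = '{"query": {"match": {"{{field}}": "{{value}}"}}}'
params = {"field": "name", "value": "bye-bye"}

# After rendering, the result is plain query JSON that can be parsed as usual.
query = json.loads(render(source, params))
print(query)
```

The rendered query is the same body that the equivalent _search request above sends.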

Conditional queries using JSON format

A slightly more complex condition can be passed as an object and rendered inside a {{#toJson}} block:

GET /music/children/_search/template
{
  "source": "{\"query\":{\"match\": {{#toJson}}condition{{/toJson}}}}",
  "params": {
    "condition": {
      "name":"bye-bye"
    }
  }
}

The compiled search template is equivalent to the following:

GET /music/children/_search
{
  "query": {
    "match": {
      "name":"bye-bye"
    }
  }
}

join syntax

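The {{#join}} function concatenates the elements of an array parameter into a single string. The delimiter defaults to a comma, and a different one can be given with the delimiter option: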

GET /music/children/_search/template
{
  "source": {
    "query": {
      "match": {
        "name": "{{#join delimiter=' '}}names{{/join delimiter=' '}}"
      }
    }
  },
  "params": {
    "names":["gymbo","you are my sunshine","bye-bye"]
  }
}

The compiled search template is equivalent to the following:

GET /music/children/_search
{
  "query": {
    "match": {
      "name":"gymbo you are my sunshine bye-bye"
    }
  }
}
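
The effect of the join block can be sketched in a few lines of Python. This only mirrors the string concatenation; the variable names are chosen for illustration.

```python
# Array parameter as it would be passed in "params".
names = ["gymbo", "you are my sunshine", "bye-bye"]

# delimiter=' ' in the template corresponds to joining with a single space.
joined = " ".join(names)

# The rendered query body, equivalent to the compiled request above.
query = {"query": {"match": {"name": joined}}}
print(query)
```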

Default Value Settings for Search Templates

You can set default values in a search template. For example, {{^end}}500{{/end}} means that if the end parameter is missing, the value 500 is used instead:

GET /music/children/_search/template
{
  "source":{
    "query":{
      "range":{
        "likes":{
          "gte":"{{start}}",
          "lte":"{{end}}{{^end}}500{{/end}}"
        }
      }
    }
  },
  "params": {
    "start":1,
    "end":300
  }
}

The search template is compiled to be equivalent to:

GET /music/children/_search
{
  "query": {
    "range": {
      "likes": {
        "gte": 1,
        "lte": 300
      }
    }
  }
}
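
The default-value behaviour comes from Mustache's inverted section ({{^name}}...{{/name}}), whose body is kept only when the variable is missing or false. The following Python sketch imitates that rule for the lte expression. It is a simplified stand-in rather than a real Mustache implementation, and render is a hypothetical helper.

```python
import re

def render(template: str, params: dict) -> str:
    """Toy renderer for {{name}} and inverted sections {{^name}}...{{/name}}."""
    def inverted(match):
        value = params.get(match.group(1))
        # Keep the section body only when the parameter is missing or false.
        return match.group(2) if value is None or value is False else ""

    out = re.sub(r"\{\{\^(\w+)\}\}(.*?)\{\{/\1\}\}", inverted, template)
    # Missing variables render as an empty string.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params.get(m.group(1), "")), out)

lte = "{{end}}{{^end}}500{{/end}}"
with_end = render(lte, {"start": 1, "end": 300})   # end supplied
without_end = render(lte, {"start": 1})            # end omitted, default applies
print(with_end, without_end)
```

When end is supplied, the inverted section disappears and only the value remains; when it is omitted, {{end}} renders as nothing and the default 500 is left.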

Conditional Judgment

Mustache has no if/else statement, but a section is skipped when its variable is false or not defined:

{{#param1}}
    "This section is skipped if param1 is null or false"
{{/param1}}

Example: create a stored Mustache script:

POST _scripts/condition
{
  "script": {
    "lang": "mustache",
    "source": 
    """
        {
          "query": {
              "bool": {
                "must": {
                  "match": {
                    "name": "{{name}}"
                  }
                },
                "filter":{
                  {{#isLike}}
                    "range":{
                      "likes":{
                        {{#start}}
                          "gte":"{{start}}"
                          {{#end}},{{/end}}
                        {{/start}}
                        {{#end}}
                          "lte":"{{end}}"
                        {{/end}}
                      }
                    }
                  {{/isLike}}
                }
              }
            }
        }
    """
  }
}

Query using the stored Mustache template:

GET _search/template
{
    "id": "condition", 
    "params": {
      "name":"gymbo",
      "isLike":true,
      "start":1,
      "end":500
    }
}

These are several commonly used search template features. In a large project with a dedicated Elasticsearch engineer, common queries are often wrapped in templates, so the developers of business systems only need to call the templates.

Custom Mapping Template

ES has its own rules for mapping the types of inserted data. For example, the number 10 is automatically mapped to long, while the string "10" is mapped to text with a built-in keyword sub-field. This is convenient, but sometimes these are not the types we want: we may expect the integer value 10 to be mapped to integer, and the string "10" to keyword. In that case we can predefine a template so that, when data is inserted, fields are matched against our rules to determine their types.

It should also be noted that coding standards are usually stricter in practice: the field types of all documents are defined before data is inserted, and even when a field is added later, the mapping is updated before any data containing it is inserted.

Even so, custom dynamic mapping templates are worth understanding.

Default dynamic mapping effect

Try inserting a piece of data:

PUT /test_index/type/1
{
  "test_string":"hello kitty",
  "test_number":10
}

View mapping information

GET /test_index/_mapping/type

The response is as follows:

{
  "test_index": {
    "mappings": {
      "type": {
        "properties": {
          "test_number": {
            "type": "long"
          },
          "test_string": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

The default dynamic mapping rules may not be what we want.

For example, we may want numbers to map to integer by default, and strings to map to text, but with the built-in sub-field named raw instead of keyword and with ignore_above set to 128 characters.

Dynamic Mapping Template

There are two ways:

  1. Match on the data type that dynamic mapping detects for the newly added field.
  2. Match on the name of the newly added field, either an exact name or a wildcard pattern.

Matching based on data type

PUT /test_index
{
  "mappings": {
    "type": {
      "dynamic_templates": [
        {
          "integers" : {
            "match_mapping_type": "long",
            "mapping": {
              "type":"integer"
            }
          }
        },
        {
          "strings" : {
            "match_mapping_type": "string",
            "mapping": {
              "type":"text",
              "fields": {
                "raw": {
                  "type": "keyword",
                  "ignore_above": 128
                }
              }
            }
          }
        }
      ]
    }
  }
}

Delete the old index first, create it again with the template above, reinsert the data, and view the mapping information:

{
  "test_index": {
    "mappings": {
      "type": {
        "dynamic_templates": [
          {
            "integers": {
              "match_mapping_type": "long",
              "mapping": {
                "type": "integer"
              }
            }
          },
          {
            "strings": {
              "match_mapping_type": "string",
              "mapping": {
                "fields": {
                  "raw": {
                    "ignore_above": 128,
                    "type": "keyword"
                  }
                },
                "type": "text"
              }
            }
          }
        ],
        "properties": {
          "test_number": {
            "type": "integer"
          },
          "test_string": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 128
              }
            }
          }
        }
      }
    }
  }
}

The fields are mapped to the types we expected.

Matching based on field name

  • A field whose name begins with "long_" and is detected as long is mapped to integer.
  • A field whose name begins with "string_" and is detected as string is mapped to text with a raw sub-field.
  • A field whose name ends with "_text" and is detected as string keeps the default mapping.

PUT /test_index
{
  "mappings": {
    "type": {
      "dynamic_templates":[
       {
         "long_as_integer": {
           "match_mapping_type":"long",
           "match": "long_*",
           "mapping":{
             "type":"integer"
           }
         }
       },
       {
         "string_as_raw": {
           "match_mapping_type":"string",
           "match": "string_*",
           "unmatch":"*_text",
           "mapping": {
              "type":"text",
              "fields": {
                "raw": {
                  "type": "keyword",
                  "ignore_above": 128
                }
              }
            }
         }
       }
      ]
    }
  }
}

Insert data:

PUT /test_index/type/1
{
  "string_test":"hello kitty",
  "long_test": 10,
  "title_text":"Hello everyone"
}

Query mapping information

{
  "test_index": {
    "mappings": {
      "type": {
        "dynamic_templates": [
          {
            "long_as_integer": {
              "match": "long_*",
              "match_mapping_type": "long",
              "mapping": {
                "type": "integer"
              }
            }
          },
          {
            "string_as_raw": {
              "match": "string_*",
              "unmatch": "*_text",
              "match_mapping_type": "string",
              "mapping": {
                "fields": {
                  "raw": {
                    "ignore_above": 128,
                    "type": "keyword"
                  }
                },
                "type": "text"
              }
            }
          }
        ],
        "properties": {
          "long_test": {
            "type": "integer"
          },
          "string_test": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 128
              }
            }
          },
          "title_text": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

The result is as expected.
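
The matching rules can be summarised as: templates are checked in order, and the first one whose match_mapping_type, match, and unmatch conditions all hold is applied. The Python sketch below walks the three sample fields through that logic. The detect_type and resolve helpers are hypothetical simplifications, and fnmatch is used here only as an approximation of the wildcard matching.

```python
from fnmatch import fnmatchcase

# The two dynamic templates from the example, in declaration order.
DYNAMIC_TEMPLATES = [
    {"name": "long_as_integer", "match_mapping_type": "long", "match": "long_*"},
    {"name": "string_as_raw", "match_mapping_type": "string",
     "match": "string_*", "unmatch": "*_text"},
]

def detect_type(value):
    """Rough stand-in for dynamic type detection (illustrative assumption)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, str):
        return "string"
    return "object"

def resolve(field, value):
    """Return the first matching template name, or None for the default mapping."""
    detected = detect_type(value)
    for template in DYNAMIC_TEMPLATES:
        if template["match_mapping_type"] != detected:
            continue
        if not fnmatchcase(field, template["match"]):
            continue
        if "unmatch" in template and fnmatchcase(field, template["unmatch"]):
            continue
        return template["name"]
    return None

doc = {"string_test": "hello kitty", "long_test": 10, "title_text": "Hello everyone"}
resolved = {field: resolve(field, value) for field, value in doc.items()}
print(resolved)
```

title_text resolves to no template, which is why it keeps the default text plus keyword mapping in the response above.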

In some log management scenarios, we can define the type once, create a new index for each day, and let the mapping template apply the mappings we defined to every new index.

Highlight Search

When we search for text in a browser, the keyword we entered is highlighted in the results. Looking at the HTML source, we can see that the highlighted part is wrapped in an <em> tag. ES also supports highlighted search: it automatically wraps the matched terms in <em> tags in the returned highlight snippets, which works well on HTML5 pages.

Basic highlight syntax

Let's again use the music website as the example for highlighted search:

GET /music/children/_search 
{
  "query": {
    "match": {
      "content": "love"
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

The highlight parameter holds the highlight settings; here the highlighted field is content. In the response below, the matched term love is wrapped in <em> tags. Browsers render <em> in italics by default, and a page can style it (for example, in red) with CSS, so when the specified field contains the search term, the term stands out in the text of that field.

{
  "took": 35,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "music",
        "_type": "children",
        "_id": "5",
        "_score": 0.2876821,
        "_source": {
          "id": "1740e61c-63da-474f-9058-c2ab3c4f0b0a",
          "author_first_name": "Jean",
          "author_last_name": "Ritchie",
          "author": "Jean Ritchie",
          "name": "love somebody",
          "content": "love somebody, yes I do",
          "language": "english",
          "tags": "love",
          "length": 38,
          "likes": 3,
          "isRelease": true,
          "releaseDate": "2019-12-22"
        },
        "highlight": {
          "content": [
            "<em>love</em> somebody, yes I do"
          ]
        }
      }
    ]
  }
}

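The fields object under highlight can list more than one field, so that keywords hit in several fields are all highlighted. Note that by default (require_field_match is true) only fields that the query actually matched are highlighted. For example: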

GET /music/children/_search 
{
  "query": {
    "match": {
      "content": "love"
    }
  },
  "highlight": {
    "fields": {
      "name":{},
      "content": {}
    }
  }
}

Three Highlighter Types

There are three highlighters:

  1. plain: uses the standard Lucene highlighter and supports simple queries very well.
  2. unified: the default highlighter. It uses the Lucene Unified Highlighter, which breaks the text into sentences and scores them with BM25, and it supports both exact and fuzzy queries.
  3. fvh (fast vector highlighter): uses the Lucene Fast Vector Highlighter. It can be used on fields whose mapping sets term_vector to with_positions_offsets, and it has a performance advantage on extremely long text (larger than 1 MB).

For example:

PUT /music
{
  "mappings": {
    "children": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "term_vector" : "with_positions_offsets"
        }
      }
    }
  }
}

In general, the plain highlighter is sufficient and needs no extra mapping settings. If you have high requirements for highlighting performance, try the unified highlighter. If the field values are particularly large, exceeding 1 MB, the fast vector highlighter can be used.

Customize highlighted html tags

The default highlight tag is <em>. The tag can be customized, so you can apply whatever style you like:

GET /music/children/_search 
{
  "query": {
    "match": {
      "content": "Love"
    }
  },
  "highlight": {
    "pre_tags": ["<tag1>"],
    "post_tags": ["</tag1>"],
    "fields": {
      "content": {
        "type": "plain"
      }
    }
  }
}
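
To show what custom tags do to a returned snippet, here is a deliberately naive Python sketch that wraps whole-word, case-insensitive matches in a pair of tags. It is not how the Lucene highlighters work internally (there is no analysis, scoring, or fragmenting), and the highlight function is a hypothetical helper.

```python
import re

def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap whole-word, case-insensitive occurrences of term in the given tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)

# Same content value as the sample document, with a matching pair of custom tags.
snippet = highlight("love somebody, yes I do", "Love", "<tag1>", "</tag1>")
print(snippet)
```

The opening and closing tags should form a matching pair so that the page can style the element with CSS.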

Highlight fragment settings

Some text is too long to display in full on a page; we only need to show the context around the keyword. Fragments can be configured as follows:

GET /_search
{
    "query" : {
        "match": { "content": "friend" }
    },
    "highlight" : {
        "fields" : {
            "content" : {"fragment_size" : 150, "number_of_fragments" : 3, "no_match_size": 150 }
        }
    }
}

fragment_size: the size of each highlighted fragment, in characters; defaults to 100.

number_of_fragments: the maximum number of fragments to return when the text contains several highlighted fragments; defaults to 5.

no_match_size: the amount of text to return from the beginning of the field when there is nothing to highlight; defaults to 0.

Geographic Location

Location-based apps are increasingly common, and many components support geographic data. Elasticsearch is one of them: it can combine geo-location search with full-text search, structured search, and analytics.

geo_point data type

For location-based search, Elasticsearch provides a dedicated data type, geo_point, which stores a location as a latitude/longitude pair, and a set of basic queries over it, such as geo_bounding_box.

Establishing a mapping of type geo_point

PUT /location
{
  "mappings": {
    "hotels": {
      "properties": {
        "location": {
          "type": "geo_point"
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

Insert data

The following insertion method is recommended:

# lat: latitude, lon: longitude
PUT /location/hotels/1
{
  "content":"7days hotel",
  "location": {
    "lon": 113.928619,
    "lat": 22.528091
  }
}

There are two other ways to insert data, but it's particularly easy to confuse latitude and longitude positions, so it's not recommended:

# In the location array, the first value is longitude and the second is latitude
PUT /location/hotels/2
{
  "content":"7days hotel ",
  "location": [113.923567,22.523988]
}

# In location, the first is latitude and the second is longitude
PUT /location/hotels/3
{
  "text": "7days hotel Orient Sunseed Hotel",
  "location": "22.521184, 113.914578" 
}

Query Method

A geo_bounding_box query finds the points that fall inside a rectangle defined by its top-left and bottom-right corners:

GET /location/hotels/_search
{
  "query": {
     "geo_bounding_box": {
      "location": {
        "top_left":{
          "lon": 112,
          "lat": 23
        },
        "bottom_right":{
          "lon": 114,
          "lat": 21
        }
      }
    } 
  }
}
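
The rectangle test itself is simple: a point is inside when its latitude lies between the bottom and top edges and its longitude lies between the left and right edges. A minimal Python sketch, assuming the box does not cross the 180° meridian (in_bounding_box and the second test point are made up for illustration):

```python
def in_bounding_box(point, top_left, bottom_right):
    """Return True when point lies inside the rectangle (no date-line handling)."""
    return (bottom_right["lat"] <= point["lat"] <= top_left["lat"]
            and top_left["lon"] <= point["lon"] <= bottom_right["lon"])

top_left = {"lon": 112, "lat": 23}
bottom_right = {"lon": 114, "lat": 21}

# Hotel 1 from the insert example above.
hotel_1 = {"lon": 113.928619, "lat": 22.528091}
# A hypothetical point far to the north-east, outside the rectangle.
far_away = {"lon": 116.4, "lat": 39.9}

inside = in_bounding_box(hotel_1, top_left, bottom_right)
outside = in_bounding_box(far_away, top_left, bottom_right)
print(inside, outside)
```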

Common query scenarios

geo_bounding_box mode

GET /location/hotels/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_all": {}}
      ],
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left":{
              "lon": 112,
              "lat": 23
            },
            "bottom_right":{
              "lon": 114,
              "lat": 21
            }
          }
        }
      }
    }
  }
}

geo_polygon: a polygonal area, here a triangle defined by three points

Polygons are supported, but this filter is expensive, so use it sparingly.

GET /location/hotels/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_all": {}}
      ],
      "filter": {
        "geo_polygon": {
          "location": {
            "points": [
              {"lon": 115,"lat": 23},
              {"lon": 113,"lat": 25},
              {"lon": 112,"lat": 21}
            ]
          }
        }
      }
    }
  }
}

geo_distance method

Searching by distance from the current location is very useful:

GET /location/hotels/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_all": {}}
      ],
      "filter": {
        "geo_distance": {
          "distance": 500, 
          "location": {
            "lon": 113.911231,
            "lat": 22.523375
          }
        }
      }
    }
  }
}

Sort by distance

A very common scenario is searching around the current location with an upper distance limit, such as 2 km or 5 km, and returning the matching results together with their distance from the current location (in a unit you specify), sorted from nearest to farthest.

Example request:

GET /location/hotels/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_all": {}}
      ],
      "filter": {
        "geo_distance": {
          "distance": 2000, 
          "location": {
            "lon": 113.911231,
            "lat": 22.523375
          }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": { 
          "lon": 113.911231,
          "lat": 22.523375
        },
        "order":         "asc",
        "unit":          "m", 
        "distance_type": "plane" 
      }
    }
  ]
}
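
  • filter.geo_distance.distance: the maximum distance, 2000 m here
  • _geo_distance: the fixed sort key; it contains the field name and the latitude and longitude of the reference location
  • order: sort order, asc or desc
  • unit: the unit of the returned distances, for example m or km
  • distance_type: how distances are calculated: arc (more accurate, the default in recent versions) or plane (faster, but less accurate over long distances and near the poles); older versions also offered sloppy_arc and used it as the default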

The response is as follows:

"hits": [
      {
        "_index": "location",
        "_type": "hotels",
        "_id": "3",
        "_score": null,
        "_source": {
          "text": "7days hotel Orient Sunseed Hotel",
          "location": "22.521184, 113.914578"
        },
        "sort": [
          421.35435857277366
        ]
      },
      {
        "_index": "location",
        "_type": "hotels",
        "_id": "2",
        "_score": null,
        "_source": {
          "content": "7days hotel",
          "location": [
            113.923567,
            22.523988
          ]
        },
        "sort": [
          1268.8952707727062
        ]
      }

The value inside sort is the distance from the current location, in meters.
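
As a sanity check on those sort values, the distances can be recomputed with the haversine (great-circle) formula. This Python sketch is not the code Elasticsearch runs, and the Earth radius constant is an assumed mean value, so the results should only land close to the figures in the response, not match them digit for digit.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius, in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

origin = (22.523375, 113.911231)                    # lat, lon of the current location
d3 = haversine_m(*origin, 22.521184, 113.914578)    # hotel 3
d2 = haversine_m(*origin, 22.523988, 113.923567)    # hotel 2
print(round(d3), round(d2))
```

Over distances this short, the plane and arc calculations differ very little, which is why plane is a reasonable choice for nearby searches.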

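Count the hotels within several distance ranges of the current location

unit is the unit of distance, commonly mi or km.

distance_type is how distances are calculated: arc (more accurate, the default in recent versions) or plane (faster, but less accurate over long distances and near the poles); older versions also offered sloppy_arc and used it as the default.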

GET /location/hotels/_search
{
  "size": 0,
  "aggs": {
    "group_by_distance": {
      "geo_distance": {
        "field": "location",
        "origin": {
          "lon": 113.911231,
          "lat": 22.523375
        },
        "unit": "mi", 
        "distance_type": "arc", 
        "ranges": [
          {"from": 0,"to": 500},
          {"from": 500,"to": 1500},
          {"from": 1500,"to": 2000}
        ]
      }
    }
  }
}
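
Conceptually, the aggregation computes each document's distance from the origin and counts how many fall in each range, where from is inclusive and to is exclusive. A small Python sketch of that bucketing follows; the bucket_counts helper, the bucket key format, and the use of meters with three non-overlapping ranges are assumptions made for illustration.

```python
def bucket_counts(distances, ranges):
    """Count how many distances fall in each [from, to) range."""
    counts = {}
    for lower, upper in ranges:
        counts[f"{lower}-{upper}"] = sum(1 for d in distances if lower <= d < upper)
    return counts

# Approximate distances of hotels 3 and 2 from the sort example, in meters.
distances_m = [421.35, 1268.90]
ranges = [(0, 500), (500, 1500), (1500, 2000)]

counts = bucket_counts(distances_m, ranges)
print(counts)
```

If the ranges overlap, a document is counted in every range that contains it, so non-overlapping boundaries are usually what you want.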

Summary

This article gave a brief introduction to search templates, mapping templates, highlighted search, and geo-location search, which are some of the more advanced ES features. Search templates and mapping templates are quite useful in practice. Highlighted search is what you typically see on search engine result pages, and geo-location search is an interesting feature for location-based apps.

Posted on Fri, 15 May 2020 20:02:51 -0400 by garydt