Aggregations make it easy to compute statistics over data and to analyze it. For example:
- Which phone brand is the most popular?
- What are the average, highest and lowest prices of these phones?
- What are the monthly sales of these phones?
These statistics are much more convenient to implement with aggregations than with SQL in a database, and the queries are fast enough to deliver near-real-time results.
1.1. Types of aggregation
There are three common types of aggregation:
- Bucket aggregation: groups documents
  - Terms aggregation: groups by a document field value, such as brand or country
  - Date Histogram: groups by date interval, for example one bucket per week or per month
- Metric aggregation: computes values such as the maximum, minimum or average
  - Avg: average value
  - Max: maximum value
  - Min: minimum value
  - Stats: max, min, avg, sum and so on at the same time
- Pipeline aggregation: aggregates the results of other aggregations
Note: a field used in an aggregation must be of keyword, date, numeric or boolean type.
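To make the difference between the bucket and metric types concrete, here is a minimal plain-Java sketch of what each one computes. It does not call Elasticsearch; the class name, method names and sample values are all hypothetical and exist only for illustration.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only: mimics the idea of the aggregation types in memory.
class AggregationTypesSketch {

    // Bucket idea (like a Date Histogram with a monthly interval):
    // one bucket per calendar month, holding the number of documents in it.
    static Map<YearMonth, Long> countByMonth(List<LocalDate> saleDates) {
        return saleDates.stream()
                .collect(Collectors.groupingBy(YearMonth::from, TreeMap::new, Collectors.counting()));
    }

    // Metric idea (like Avg): a single value computed over a set of documents.
    static double average(List<Double> prices) {
        return prices.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<LocalDate> dates = Arrays.asList(
                LocalDate.of(2021, 1, 5), LocalDate.of(2021, 1, 20), LocalDate.of(2021, 2, 3));
        System.out.println(countByMonth(dates));
        System.out.println(average(Arrays.asList(100.0, 200.0, 300.0)));
    }
}
```

A bucket aggregation answers "which groups exist and how many documents fall into each", while a metric aggregation reduces a set of documents to numbers.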
1.2. DSL aggregation
Suppose we want to know which hotel brands appear in the data. This amounts to grouping the documents by brand, so we aggregate on the hotel brand field, that is, we use a Bucket aggregation.
1.2.1. Bucket aggregation syntax
The syntax is as follows:

    GET /hotel/_search
    {
      "size": 0,  // Set size to 0 so the response contains only aggregation results, no documents
      "aggs": {   // Define the aggregations
        "brandAgg": {  // Aggregation name (user-defined)
          "terms": {   // Aggregation type: we group by brand value, so use terms
            "field": "brand",  // Field to aggregate on
            "size": 10         // Number of aggregation results to return
          }
        }
      }
    }
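The response contains an aggregations section in which brandAgg lists one bucket per brand, each with the brand as its key and the number of documents in the bucket as doc_count.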
1.2.2. Sorting of aggregation results
By default, a Bucket aggregation counts the documents in each bucket, records the count as _count, and sorts the buckets by _count in descending order.
We can specify the order attribute to customize how the buckets are sorted:

    GET /hotel/_search
    {
      "size": 0,
      "aggs": {
        "brandAgg": {
          "terms": {
            "field": "brand",
            "order": {
              "_count": "asc"  // Sort by _count in ascending order
            },
            "size": 10
          }
        }
      }
    }
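The effect of order and size can be sketched in plain Java without Elasticsearch. The class below is a hypothetical in-memory stand-in: it groups brand values, counts them, sorts the buckets by count, and keeps the first size buckets. Breaking ties by key is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only: an in-memory imitation of a terms aggregation.
class TermsAggSketch {

    // Returns up to `size` (brand, count) buckets, ordered by count.
    static List<Map.Entry<String, Long>> termsAgg(List<String> brands, boolean ascending, int size) {
        // Bucket step: one bucket per distinct brand, value = document count.
        Map<String, Long> counts = brands.stream()
                .collect(Collectors.groupingBy(b -> b, TreeMap::new, Collectors.counting()));

        // Order step: by count (like "_count"), ties broken by key (sketch assumption).
        Comparator<Map.Entry<String, Long>> byCount = Map.Entry.comparingByValue();
        Comparator<Map.Entry<String, Long>> byKey = Map.Entry.comparingByKey();
        if (!ascending) {
            byCount = byCount.reversed();
        }

        // Size step: keep only the first `size` buckets.
        return counts.entrySet().stream()
                .sorted(byCount.thenComparing(byKey))
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("A", "B", "A", "C", "A", "B");
        System.out.println(termsAgg(docs, false, 2)); // largest buckets first
        System.out.println(termsAgg(docs, true, 2));  // smallest buckets first
    }
}
```

Note that size is applied after sorting, so changing the order also changes which buckets are returned, not only their sequence.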
1.2.3. Limiting the aggregation scope
By default, a Bucket aggregation covers every document in the index. In a real scenario, however, users enter search criteria, so the aggregation should cover only the search results. The aggregation therefore needs a restricting condition.
We can limit the documents being aggregated by adding a query condition:

    GET /hotel/_search
    {
      "query": {
        "range": {
          "price": {
            "lte": 200  // Only documents priced at 200 yuan or less
          }
        }
      },
      "size": 0,
      "aggs": {
        "brandAgg": {
          "terms": {
            "field": "brand",
            "size": 20
          }
        }
      }
    }
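The interaction between query and aggs can be pictured as "filter first, then group". The following plain-Java sketch illustrates that order of operations on a small in-memory list. The Hotel class, the sample data and the method names are hypothetical and not part of the hotel project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only: query narrows the documents, aggs groups what is left.
class FilteredAggSketch {

    static final class Hotel {
        final String brand;
        final int price;

        Hotel(String brand, int price) {
            this.brand = brand;
            this.price = price;
        }
    }

    // Made-up sample documents.
    static List<Hotel> sample() {
        return Arrays.asList(
                new Hotel("Brand A", 150), new Hotel("Brand A", 180),
                new Hotel("Brand B", 200), new Hotel("Brand B", 450),
                new Hotel("Brand C", 900));
    }

    // "query" step: keep hotels with price <= maxPrice (like range/lte).
    // "aggs" step: count the remaining hotels per brand (like terms on brand).
    static Map<String, Long> brandCounts(List<Hotel> hotels, int maxPrice) {
        return hotels.stream()
                .filter((Hotel h) -> h.price <= maxPrice)
                .collect(Collectors.groupingBy((Hotel h) -> h.brand, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(brandCounts(sample(), Integer.MAX_VALUE)); // all brands
        System.out.println(brandCounts(sample(), 200));               // fewer brands
    }
}
```

With the price limit in place, brands that have no hotel at or below the limit produce no bucket at all.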
This time, the aggregation returns noticeably fewer brands.
1.2.4. Metric aggregation syntax
In the previous section we grouped hotels by brand into buckets. Now we want to compute values over the hotels in each bucket, namely the min, max, avg and so on of each brand's user scores.
This calls for a Metric aggregation, such as the stats aggregation, which returns min, max, avg and other results.
The syntax is as follows:

    GET /hotel/_search
    {
      "size": 0,
      "aggs": {
        "brandAgg": {
          "terms": {
            "field": "brand",
            "size": 20
          },
          "aggs": {  // Sub-aggregation of brandAgg: computed separately for each bucket after grouping
            "score_stats": {  // Aggregation name
              "stats": {      // Aggregation type: stats computes min, max, avg, etc.
                "field": "score"  // Field to aggregate on, here score
              }
            }
          }
        }
      }
    }
The score_stats aggregation is a sub-aggregation nested inside brandAgg, because it has to be computed separately for each bucket.
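As a rough mental model, the nested aggregation first buckets by brand and then runs the metric inside each bucket. The plain-Java sketch below imitates this with DoubleSummaryStatistics, which happens to expose count, min, max, average and sum. The Hotel class and the sample scores are hypothetical.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only: a metric computed once per bucket.
class StatsSubAggSketch {

    static final class Hotel {
        final String brand;
        final double score;

        Hotel(String brand, double score) {
            this.brand = brand;
            this.score = score;
        }
    }

    // Made-up sample documents.
    static List<Hotel> sample() {
        return Arrays.asList(
                new Hotel("Brand A", 4.0), new Hotel("Brand A", 5.0),
                new Hotel("Brand B", 3.0));
    }

    // Outer step: group by brand (the bucket aggregation).
    // Inner step: summarize the scores of each group (the stats sub-aggregation).
    static Map<String, DoubleSummaryStatistics> scoreStatsByBrand(List<Hotel> hotels) {
        return hotels.stream().collect(Collectors.groupingBy(
                (Hotel h) -> h.brand,
                TreeMap::new,
                Collectors.summarizingDouble((Hotel h) -> h.score)));
    }

    public static void main(String[] args) {
        scoreStatsByBrand(sample()).forEach((brand, stats) ->
                System.out.println(brand + " -> " + stats));
    }
}
```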
We can also sort the buckets by a sub-aggregation result, for example by each bucket's average score, by setting the order of the terms aggregation to "score_stats.avg": "desc".
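Sorting buckets by a metric means the metric has to be known for every bucket before the buckets are ordered. A minimal plain-Java sketch of that idea follows; the class, the method names and the sample scores are hypothetical, and the code only imitates the ordering in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: order buckets by their average score, highest first.
class OrderByAvgSketch {

    // Made-up sample: brand -> user scores of the hotels in that bucket.
    static Map<String, List<Double>> sample() {
        Map<String, List<Double>> scores = new TreeMap<>();
        scores.put("Brand A", Arrays.asList(4.0, 5.0)); // avg 4.5
        scores.put("Brand B", Arrays.asList(5.0, 4.5)); // avg 4.75
        scores.put("Brand C", Arrays.asList(3.0));      // avg 3.0
        return scores;
    }

    static List<String> brandsByAvgScoreDesc(Map<String, List<Double>> scoresByBrand) {
        // Metric step: average score per bucket.
        Map<String, Double> avg = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : scoresByBrand.entrySet()) {
            avg.put(e.getKey(),
                    e.getValue().stream().mapToDouble(Double::doubleValue).average().orElse(0.0));
        }
        // Order step: sort bucket keys by that average, descending.
        List<String> brands = new ArrayList<>(avg.keySet());
        brands.sort(Comparator.comparingDouble((String b) -> avg.get(b)).reversed());
        return brands;
    }

    public static void main(String[] args) {
        System.out.println(brandsByAvgScoreDesc(sample()));
    }
}
```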
1.2.5. Summary
aggs stands for aggregations and sits at the same level as query. What role does query play in that case?
- Limit the scope of aggregated documents
The three essential elements of an aggregation:
- Aggregation name
- Aggregation type
- Aggregation field
Configurable aggregation properties include:
- size: the number of aggregation results to return
- order: how the aggregation results are sorted
- field: the field to aggregate on
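1.3. RestAPI aggregation
1.3.1. API syntax
Aggregation conditions sit at the same level as the query conditions, so they are also specified through request.source().
Syntax of the aggregation condition: build each aggregation with AggregationBuilders, for example terms(name).field(field).size(n), and attach it with request.source().aggregation(...).
The aggregation result differs from the query result and has its own API, but it is still parsed layer by layer, following the JSON structure: take the Aggregations object from the response, look up an aggregation by name, get its buckets, and read the key of each bucket.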
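The layer-by-layer parsing can be illustrated without the client library by walking a nested Map shaped like the JSON response (aggregations, then the aggregation name, then buckets, then key). This is a hypothetical stand-in, not the client's API; the sample brands and counts are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: nested Maps stand in for the JSON response body.
class AggResponseSketch {

    static Map<String, Object> bucket(String key, long docCount) {
        Map<String, Object> b = new LinkedHashMap<>();
        b.put("key", key);
        b.put("doc_count", docCount);
        return b;
    }

    // Shape: {"aggregations": {"brandAgg": {"buckets": [{"key": ..., "doc_count": ...}]}}}
    static Map<String, Object> sampleResponse() {
        Map<String, Object> brandAgg = new LinkedHashMap<>();
        brandAgg.put("buckets", Arrays.asList(bucket("Brand A", 30), bucket("Brand B", 12)));
        Map<String, Object> aggregations = new LinkedHashMap<>();
        aggregations.put("brandAgg", brandAgg);
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("aggregations", aggregations);
        return response;
    }

    @SuppressWarnings("unchecked")
    static List<String> bucketKeys(Map<String, Object> response, String aggName) {
        // Layer 1: the aggregations section of the response.
        Map<String, Object> aggregations = (Map<String, Object>) response.get("aggregations");
        // Layer 2: one aggregation, looked up by the name given in the request.
        Map<String, Object> agg = (Map<String, Object>) aggregations.get(aggName);
        // Layer 3: its buckets.
        List<Map<String, Object>> buckets = (List<Map<String, Object>>) agg.get("buckets");
        // Layer 4: the key of each bucket.
        List<String> keys = new ArrayList<>();
        for (Map<String, Object> b : buckets) {
            keys.add(String.valueOf(b.get("key")));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(bucketKeys(sampleResponse(), "brandAgg"));
    }
}
```

In the real client the same four steps appear as response.getAggregations(), aggregations.get(name), getBuckets() and getKeyAsString(), as used in the implementation in section 1.3.3.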
1.3.2. Business requirements
We continue to build on the project from the previous section.
Requirement: the brand, city and other filter options on the search page should not be hard-coded in the page, but obtained by aggregating the hotel data in the index.
Analysis:
At present, the city list, star list and brand list on the page are hard-coded and do not change with the search results, even though the search results change whenever the user's search conditions change.
For example, if a user searches for the Oriental Pearl, the matching hotels must be near the Oriental Pearl in Shanghai, so the only possible city is Shanghai. In that case Beijing, Shenzhen and Hangzhou should not appear in the city list.
In other words, the page should list only the cities and brands that actually occur in the search results.
How do we know which brands and which cities the search results contain?
We use Bucket aggregations to group the documents in the search results by brand and by city, which tells us exactly which brands and cities are present.
Because we are aggregating over search results, the aggregation has a limited scope: its restricting conditions are the same as the conditions used to search for documents.
In the browser you can see that the front end already issues such a request.
The request parameters are exactly the same as those of the document search.
The return value is the final result to be displayed on the page. It is a Map structure:
- key is a string, including city, star, brand and price
- value is a collection, such as the names of multiple cities
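The shape of this result, and the reason it has to follow the search results, can be sketched in plain Java. The example below derives the filter lists from an in-memory list of search hits. The Hotel class, the sample hits and the alphabetical ordering are assumptions of the sketch; the keys brand, city and starName mirror the ones used in the implementation in section 1.3.3.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only: filter options derived from the current search results.
class FiltersSketch {

    static final class Hotel {
        final String brand;
        final String city;
        final String starName;

        Hotel(String brand, String city, String starName) {
            this.brand = brand;
            this.city = city;
            this.starName = starName;
        }
    }

    // Made-up hits for a search that only matches hotels in Shanghai.
    static List<Hotel> sample() {
        return Arrays.asList(
                new Hotel("Brand A", "Shanghai", "4-star"),
                new Hotel("Brand B", "Shanghai", "5-star"),
                new Hotel("Brand A", "Shanghai", "5-star"));
    }

    // Distinct values of one field, sorted alphabetically (a simplification of this sketch).
    static List<String> distinct(List<Hotel> hotels, Function<Hotel, String> field) {
        return hotels.stream().map(field).distinct().sorted().collect(Collectors.toList());
    }

    // key = filter name, value = the options that occur in the search results.
    static Map<String, List<String>> filters(List<Hotel> searchResults) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("brand", distinct(searchResults, h -> h.brand));
        result.put("city", distinct(searchResults, h -> h.city));
        result.put("starName", distinct(searchResults, h -> h.starName));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(filters(sample()));
    }
}
```

Because every hit in the sample is in Shanghai, the city list contains only Shanghai, which is the behavior described for the Oriental Pearl search.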
1.3.3. Business realization
Add a method to HotelController in the cn.itcast.hotel.web package that meets the following requirements:
- Request method: POST
- Request path: /hotel/filters
- Request parameters: RequestParams, the same as the parameters of the document search
- Return value type: Map<String, List<String>>
Code:

    @PostMapping("filters")
    public Map<String, List<String>> getFilters(@RequestBody RequestParams params) {
        // Delegates to the service method declared in IHotelService
        return hotelService.filters(params);
    }

This calls the filters method of IHotelService, which has not been implemented yet.
Define a new method in cn.itcast.hotel.service.IHotelService:
Map<String, List<String>> filters(RequestParams params);
Implement this method in cn.itcast.hotel.service.impl.HotelService.
The implementation builds on the code from the previous section and reuses its buildBasicQuery method:

    // Requires, besides java.io.IOException and java.util.*:
    //   org.elasticsearch.action.search.SearchRequest, SearchResponse
    //   org.elasticsearch.client.RequestOptions
    //   org.elasticsearch.search.aggregations.AggregationBuilders, Aggregations
    //   org.elasticsearch.search.aggregations.bucket.terms.Terms
    @Override
    public Map<String, List<String>> filters(RequestParams params) {
        try {
            // 1. Prepare the request
            SearchRequest request = new SearchRequest("hotel");
            // 2. Prepare the DSL
            // 2.1. query: reuse the same query conditions as the document search
            buildBasicQuery(params, request);
            // 2.2. Set size to 0 so that no documents are returned
            request.source().size(0);
            // 2.3. Add the aggregations
            buildAggregation(request);
            // 3. Send the request
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            // 4. Parse the results
            Map<String, List<String>> result = new HashMap<>();
            Aggregations aggregations = response.getAggregations();
            // 4.1. Get the brand results by aggregation name
            List<String> brandList = getAggByName(aggregations, "brandAgg");
            result.put("brand", brandList);
            // 4.2. Get the city results by aggregation name
            List<String> cityList = getAggByName(aggregations, "cityAgg");
            result.put("city", cityList);
            // 4.3. Get the star rating results by aggregation name
            List<String> starList = getAggByName(aggregations, "starAgg");
            result.put("starName", starList);
            return result;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Adds one terms aggregation each for brand, city and star rating
    private void buildAggregation(SearchRequest request) {
        request.source().aggregation(AggregationBuilders
                .terms("brandAgg")
                .field("brand")
                .size(100)
        );
        request.source().aggregation(AggregationBuilders
                .terms("cityAgg")
                .field("city")
                .size(100)
        );
        request.source().aggregation(AggregationBuilders
                .terms("starAgg")
                .field("starName")
                .size(100)
        );
    }

    // Collects the bucket keys of the terms aggregation with the given name
    private List<String> getAggByName(Aggregations aggregations, String aggName) {
        // 1. Get the aggregation result by aggregation name
        Terms terms = aggregations.get(aggName);
        // 2. Get the buckets
        List<? extends Terms.Bucket> buckets = terms.getBuckets();
        // 3. Traverse the buckets
        List<String> keys = new ArrayList<>();
        for (Terms.Bucket bucket : buckets) {
            // 4. Get the key of each bucket
            String key = bucket.getKeyAsString();
            keys.add(key);
        }
        return keys;
    }