1. How to use Elasticsearch's paging search syntax
from: the offset to start from, i.e. which result the page begins at (0-based)
size: how many documents to return
GET /_search?size=10
GET /_search?size=10&from=0
GET /_search?size=10&from=20
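The same paging parameters can also be sent in the request body instead of the query string; a minimal sketch (the match_all query here is only a placeholder for illustration):

GET /_search
{
  "query": { "match_all": {} },
  "from": 20,
  "size": 10
}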
Paging example
First, query all the documents to see the full data set:
GET /test_index/test_type/_search
Response results
{ "took": 27, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 9, "max_score": 1, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 1, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "10", "_score": 1, "_source": { "test_field1": "test1", "test_field2": "updated test2" } }, { "_index": "test_index", "_type": "test_type", "_id": "12", "_score": 1, "_source": { "test_field": "test12" } }, { "_index": "test_index", "_type": "test_type", "_id": "4", "_score": 1, "_source": { "test_field1": "test field111111" } }, { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 1, "_source": { "test_field": "test test" } }, { "_index": "test_index", "_type": "test_type", "_id": "2", "_score": 1, "_source": { "test_field": "replaced test2" } }, { "_index": "test_index", "_type": "test_type", "_id": "7", "_score": 1, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "1", "_score": 1, "_source": { "test_field1": "test field1", "test_field2": "bulk test1" } }, { "_index": "test_index", "_type": "test_type", "_id": "11", "_score": 1, "_source": { "num": 1, "tags": [] } } ] } }
All 9 documents come back in a single response because Elasticsearch defaults to from=0 and size=10. Now let's split these 9 documents into 3 pages of 3 documents each to see how paging behaves.
GET /test_index/test_type/_search?from=0&size=3
Response results
{ "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 9, "max_score": 1, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 1, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "10", "_score": 1, "_source": { "test_field1": "test1", "test_field2": "updated test2" } }, { "_index": "test_index", "_type": "test_type", "_id": "12", "_score": 1, "_source": { "test_field": "test12" } } ] } }
Page 1: id=8,10,12
GET /test_index/test_type/_search?from=3&size=3
Response results
{ "took": 0, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 9, "max_score": 1, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "4", "_score": 1, "_source": { "test_field1": "test field111111" } }, { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 1, "_source": { "test_field": "test test" } }, { "_index": "test_index", "_type": "test_type", "_id": "2", "_score": 1, "_source": { "test_field": "replaced test2" } } ] } }
Page 2: id=4,6,2
GET /test_index/test_type/_search?from=6&size=3
Response results
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 9, "max_score": 1, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "7", "_score": 1, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "1", "_score": 1, "_source": { "test_field1": "test field1", "test_field2": "bulk test1" } }, { "_index": "test_index", "_type": "test_type", "_id": "11", "_score": 1, "_source": { "num": 1, "tags": [] } } ] } }
Page 3: id=7,1,11
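In general, for a 1-indexed page number the offset is from = (page - 1) * size. Page 3 with a page size of 3 therefore gives from = (3 - 1) * 3 = 6, which is exactly the last request above.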
2. What is the deep paging problem? Why does it arise, and what is the underlying principle?
Deep paging causes real performance problems; this part digs into why, down to the underlying mechanics.
What is deep paging?
In short, the search pages very deep into the result set. For example, suppose there are 60,000 documents in total, spread evenly over 3 shards with 20,000 documents each, and each page holds 10 documents (so a single shard alone covers 2,000 pages).
Now suppose you want a page roughly 1,000 pages deep: documents 10,001 through 10,010, i.e. from=10000, size=10.
Does each shard simply return its own documents 10,001 through 10,010? Not at all!
Your request may land on a node that does not even hold a shard of this index. That node then acts as the coordinating node and forwards the search request to the nodes holding the index's three shards.
To serve documents 10,001 through 10,010 out of the 60,000 total, each shard cannot return just those 10 documents from its own 20,000: any of its top hits could end up in the global top 10,010, so each shard has to return its top 10,010 documents. The three shards therefore send 3 × 10,010 = 30,030 documents to the coordinating node, which sorts them all by relevance (_score) and only then extracts the 10 documents for the requested page.
When the search pages too deep, the coordinating node has to buffer and sort a huge amount of data just to return one small page. That costs network bandwidth, memory, and CPU, so deep paging should be avoided whenever possible.
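As a concrete sketch of the kind of request that triggers this (using the 60,000-document scenario above; the index and type names are just the ones from this article's test data and are only illustrative):

GET /test_index/test_type/_search?from=10000&size=10

Each shard has to produce its top from + size = 10,010 hits internally just to answer this single request for 10 documents.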