Dynamically expanding the number of shards of an index under ES 7.5

In older versions of ES (for example, 2.3), once the number of shards of an index was set, it could not be changed; the only option was to reindex the data.

Starting with ES 6.1, ES supports expanding the number of shards online via the split API (note: the index must be write-blocked during the operation).

Starting with ES 7.0, it is no longer necessary to set the `index.number_of_routing_shards` parameter at index creation time in order to split later.
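On ES 6.x, by contrast, the maximum split factor had to be declared when the index was created. A minimal sketch (the host and index name `my_index` are placeholders, not part of the experiment below):

```shell
# ES 6.x: declare the routing-shards ceiling at creation time.
# With number_of_routing_shards=8 and 2 primaries, this index can later
# be split to 4 or 8 shards. From ES 7.0 onward this setting is optional.
curl -s -X PUT "http://1.1.1.1:9200/my_index?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_routing_shards": 8
  }
}'
```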



Refer to the official documentation for details:

    https://www.elastic.co/guide/en/elasticsearch/reference/7.5/indices-split-index.html

    https://www.elastic.co/guide/en/elasticsearch/reference/6.1/indices-split-index.html



The split process works as follows:

1. Create a new target index with the same definition as the source index, but with a larger number of primary shards.

2. Hard-link the segments from the source index into the target index. (If the file system does not support hard links, all segments are copied into the new index, which is far more time-consuming.)

3. After the low-level files are created, all documents are hashed again, and each target shard deletes the documents that do not belong to it.

4. Recover the target index as if it were a closed index that had just been reopened.



Why does ES not support incremental resharding?

Going from N shards to N+1 shards (incremental resharding) is indeed a feature supported by many key-value stores. Simply adding a new shard and pushing new data into it is not feasible: the new shard would likely become an indexing bottleneck, and determining which shard a document lives on from its `_id`, which is necessary for get, delete, and update requests, would become very complex. Instead, the existing data has to be rebalanced using a different hashing scheme.


The most common way for key-value stores to do this efficiently is consistent hashing. When the number of shards grows from N to N+1, consistent hashing only needs to relocate 1/N of the keys. But Elasticsearch's unit of storage, the shard, is a Lucene index. Because of its search-oriented data structure, taking even a small portion of a Lucene index, say 5% of its documents, then deleting those documents and reindexing them on another shard is typically far more expensive than the equivalent move in a key-value store. As described above, this cost stays reasonable when the number of shards grows by a multiplicative factor: it allows Elasticsearch to perform the split locally, which in turn makes it possible to split at the index level rather than reindexing documents, and to use hard links for efficient file copying.
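The "multiplicative factor" argument follows from ES's routing formula: `shard = (hash(_routing) % num_routing_shards) / routing_factor`, where `routing_factor = num_routing_shards / num_primary_shards`. A toy sketch of the slot arithmetic (ignoring the real Murmur3 hash) for the 2-to-8 split used in the experiment below:

```shell
# With num_routing_shards=8: before the split (2 primaries) routing_factor=4,
# after the split (8 primaries) routing_factor=1. Slots 0-3 stay on source
# shard 0 and slots 4-7 on source shard 1, so every target shard receives
# documents from exactly one source shard and the split can happen locally.
for slot in 0 1 2 3 4 5 6 7; do
  echo "slot=$slot source_shard=$((slot / 4)) target_shard=$((slot / 1))"
done
```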


For append-only data, you can get more flexibility by creating a new index and pushing new data into it, while adding an alias that spans both the old and the new index for read operations. Assuming the old and new indexes have M and N shards respectively, searching through the alias has no overhead compared with searching a single index with M+N shards.
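A sketch of that append-only pattern using `is_write_index` (available since ES 6.4) to direct writes at the new index; the names `logs-old` and `logs-new` are placeholders:

```shell
# One alias spans both indexes for reads; writes go only to logs-new.
curl -s -X POST "http://1.1.1.1:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{
  "actions": [
    { "add": { "index": "logs-old", "alias": "logs", "is_write_index": false } },
    { "add": { "index": "logs-new", "alias": "logs", "is_write_index": true } }
  ]
}'
```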



Preconditions for splitting an index:

1. The target index cannot exist.

2. The source index must have fewer primary shards than the target index.

3. The number of primary shards in the target index must be a multiple of the number of primary shards in the source index.

4. The node handling the split process must have enough free disk space to hold a second copy of the existing index.
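Precondition 4 can be checked up front with the `_cat` APIs; a sketch (the host follows the experiment below):

```shell
# Free disk per node, to compare against the size of the index to be split
curl -s "http://1.1.1.1:9200/_cat/allocation?v&h=node,disk.used,disk.avail,disk.percent"
# Primary shard count and on-disk size of the source index
curl -s "http://1.1.1.1:9200/_cat/indices/twitter?v&h=index,pri,store.size"
```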



The following is the hands-on experiment:

Tip: since the test machines are limited, the replica count of the index is set to 0 here; in production, always use at least 1 replica.


#Create an index with 2 primary shards and no replicas

curl -s -X PUT "http://1.1.1.1:9200/twitter?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 0
  },
    "aliases": {
    "my_search_indices": {}
  }
}'



#Write some test data

curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/11?pretty" -H 'Content-Type: application/json' -d '{
  "id": 11,
  "name":"lee",
  "age":"23"
}'
curl -s -X PUT "http://1.1.1.1:9200/my_search_indices/_doc/22?pretty" -H 'Content-Type: application/json' -d '{
  "id": 22,
  "name":"amd",
  "age":"22"
}'


#Query data

curl -s -XGET "http://1.1.1.1:9200/my_search_indices/_search" | jq .


#Block writes on the index so that the split operation can be performed

curl -s -X PUT "http://1.1.1.1:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.blocks.write": true
  }
}'


#Try a write to confirm the block is effective (it should fail)

curl -s -X PUT "http://1.1.1.1:9200/twitter/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name":"amd",
  "age":"33"
}'


#Remove the alias from the twitter index

curl -s -X POST "http://1.1.1.1:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{
    "actions" : [
        { "remove" : { "index" : "twitter", "alias" : "my_search_indices" } }
    ]
}'



#Start the split. The target index is named new_twitter and has 8 primary shards

curl -s -X POST "http://1.1.1.1:9200/twitter/_split/new_twitter?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.number_of_shards": 8,
    "index.number_of_replicas": 0
  }
}'


#Add the alias to the new index

curl -s -X POST "http://1.1.1.1:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{
    "actions" : [
        { "add" : { "index" : "new_twitter", "alias" : "my_search_indices" } }
    ]
}'

Result:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "new_twitter"
}


Supplement:

To view the progress of the split, use the `_cat/recovery` API, or watch it in the cerebro UI.
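For example, a sketch of checking recovery progress for the target index (host and index name follow the experiment above):

```shell
# active_only=true shows only recoveries that are still running;
# drop it to see completed ones as well
curl -s "http://1.1.1.1:9200/_cat/recovery/new_twitter?v&active_only=true"
```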



#Query the new index through the alias; the data reads back normally

curl -s -XGET "http://1.1.1.1:9200/my_search_indices/_search" | jq .


#Try writing to the new index; the write fails, because the write block carries over to the target index

curl -s -X PUT "1.1.1.1:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name":"amd",
  "age":"33"
}'



#Re-enable writes on the index

curl -s -X PUT "1.1.1.1:9200/my_search_indices/_settings?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.blocks.write": false 
  }
}'


#Test writing to the new index again; this time the writes succeed

curl -s -X PUT "1.1.1.1:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name":"amd",
  "age":"33"
}'

curl -s -X PUT "1.1.1.1:9200/my_search_indices/_doc/44?pretty" -H 'Content-Type: application/json' -d '{
  "id": 44,
  "name":"intel",
  "age":"4"
}'



#At this point the old index is still read-only. Once the new index is confirmed to be working, the old twitter index can be closed or deleted.






Tags: Linux curl JSON ElasticSearch Shard

Posted on Thu, 16 Jan 2020 10:47:45 -0500 by kevinkorb