1. mget and size
ES returns ten hits per query by default; you can ask for more with the size parameter:
{ "query": { "bool": { "must": [ { "match": { "entname": "Huawei Technology Co., Ltd" } } ] } }, "size":20 }
When size is set to a value greater than 10000, the query returns an error:
Error message: "reason": "Result window is too large, from + size must be less than or equal to: [10000]..."
This limit (the index setting index.max_result_window) can be raised in the ES configuration, but doing so is not recommended.
mget can be used when multiple query conditions each return one or more results, for example fetching the documents with id 1 and 2:
GET /_mget
{
  "docs": [
    { "_index": "test_index", "_type": "test_type", "_id": 1 },
    { "_index": "test_index", "_type": "test_type", "_id": 2 }
  ]
}
In Java, MultiGetRequest is used to build the same request (a short sketch follows the DSL below). Incidentally, an ES query can also match against a set of values, built with QueryBuilders.termsQuery("showtemp", "1", "2", "3", ...). The corresponding query statement:
{ "query": { "bool": { "must": [ { "terms": { "showTemp": [ "1", "2" ] } } ] } } }
bulk API
In Java this is built with BulkRequestBuilder (transport client) or BulkRequest (high-level REST client); you can look up how to use them. Note that the bulk API batches write operations such as index, update and delete rather than queries (batching several searches is what the separate multi-search API is for), so batch insertion is its typical use, as sketched below.
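A minimal sketch of a batch insert with BulkRequest, assuming the same RestHighLevelClient (client) as elsewhere in this article; the index, type, ids and field values are made up:

// Put several index operations into one bulk request
BulkRequest bulkRequest = new BulkRequest();
bulkRequest.add(new IndexRequest("test_index", "test_type", "1")
        .source(XContentType.JSON, "entname", "Huawei Technology Co., Ltd"));
bulkRequest.add(new IndexRequest("test_index", "test_type", "2")
        .source(XContentType.JSON, "entname", "another company"));
BulkResponse bulkResponse = client.bulk(bulkRequest);
if (bulkResponse.hasFailures()) {
    // individual operations can fail independently; inspect bulkResponse.getItems()
}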
scrollId
The bulk and mget calls above only batch the query conditions, and size only enlarges a single page of results. If you want to pull back all of the matching results, use the scroll (cursor) query. Java example:
/**
 * Batch cursor (scroll) query by time range
 *
 * @param index
 * @param type
 * @param dateKey   name of the update-time field stored in ES
 * @param startDate start time
 * @param endDate   end time
 */
public List<Map<String, Object>> searchByDate(String index, String type, String dateKey, String startDate, String endDate) throws IOException {
    List<Map<String, Object>> list = new ArrayList<>();
    RestHighLevelClient client = EsService.getClient();
    final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
    SearchRequest searchRequest = new SearchRequest(index);
    searchRequest.scroll(scroll);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.size(2000);
    searchSourceBuilder.query(QueryBuilders.rangeQuery(dateKey).gte(startDate).lt(endDate));
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest);
    String scrollId = searchResponse.getScrollId();
    SearchHit[] searchHits = searchResponse.getHits().getHits();
    while (searchHits != null && searchHits.length > 0) {
        for (int i = 0; i < searchHits.length; i++) {
            list.add(searchHits[i].getSourceAsMap());
        }
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
        scrollRequest.scroll(scroll);
        searchResponse = client.searchScroll(scrollRequest);
        scrollId = searchResponse.getScrollId();
        searchHits = searchResponse.getHits().getHits();
    }
    // Once scrolling is complete, clear the scroll context
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    clearScrollRequest.addScrollId(scrollId);
    ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest);
    boolean succeeded = clearScrollResponse.isSucceeded();
    return list;
}
snapshot
If you need to back up ES data, you can choose snapshot. Note that if you configure automatic snapshots for certain indices, ES has to be restarted for the configuration change to take effect.
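For reference, a minimal sketch of registering a filesystem repository and taking a snapshot through the low-level REST client (reachable via client.getLowLevelClient()); the repository name, snapshot name and location are made up, and the location must be listed under path.repo in elasticsearch.yml, which is exactly the kind of config change that needs a restart:

// Register a filesystem snapshot repository (location must be whitelisted in path.repo)
Request putRepo = new Request("PUT", "/_snapshot/my_backup");
putRepo.setJsonEntity("{\"type\": \"fs\", \"settings\": {\"location\": \"/data/es_backup\"}}");
client.getLowLevelClient().performRequest(putRepo);

// Take a snapshot of one index and wait for it to finish
Request createSnapshot = new Request("PUT", "/_snapshot/my_backup/snapshot_1");
createSnapshot.addParameter("wait_for_completion", "true");
createSnapshot.setJsonEntity("{\"indices\": \"test_index\"}");
client.getLowLevelClient().performRequest(createSnapshot);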
reindex
Similarly, reindex can also be used for ES data backup and migration, but it only moves data from one ES index (or cluster) to another, as sketched below. If you need to back up to files, you can choose one of the following tools.
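A minimal sketch, again via the low-level REST client; the index names are placeholders, and copying from a remote cluster additionally needs a source.remote block in the request body plus the reindex.remote.whitelist setting on the destination cluster:

// Copy all documents from old_index into new_index on the same cluster
Request reindex = new Request("POST", "/_reindex");
reindex.setJsonEntity("{\"source\": {\"index\": \"old_index\"}, \"dest\": {\"index\": \"new_index\"}}");
Response reindexResponse = client.getLowLevelClient().performRequest(reindex);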
esdump
install
npm install elasticdump
There are many ways to install it, and it can even be installed and run via Docker; pick whichever fits your situation. Usage looks like this:
# Copy analyzer / word-segmentation settings
elasticdump \
  --input=http://ip1:9200/my_index \
  --output=http://ip2:9200/my_index \
  --type=analyzer

# Copy mapping
elasticdump \
  --input=http://ip1.com:9200/my_index \
  --output=http://ip2:9200/my_index \
  --type=mapping

# Copy data
elasticdump \
  --input=http://ip1:9200/my_index \
  --output=http://ip2:9200/my_index \
  --type=data
The above copies from ES to ES; --output can also point to a file.
logstash
Anyone familiar with ELK knows what Logstash is for: it is usually used to ship log files into ES for statistics and viewing. Can it work the other way round? Of course; you can also export ES data to files or even to other databases.
Installing Logstash is fairly simple. Here is an example of exporting ES data to HDFS with Logstash:
input {
  elasticsearch {
    hosts => "es:9200"
    index => "food_business_license"
    size => 10000
    query => '
      {
        "query": {
          "bool": {
            "must": [
              { "term": { "PROVINCE": { "value": "Tibet" } } }
            ]
          }
        }
      }
    '
    scroll => "5m"
    docinfo => true
  }
}
filter {
  if ![hh] {
    mutate {
      add_field => { "hh" => "ha-ha" }
    }
  }
}
output {
  webhdfs {
    host => "ip"
    port => "port"
    user => spark
    flush_size => 5000
    idle_flush_time => 10
    path => "/tmp/%{+YYYY}-%{+MM}-%{+dd}/food-%{+YYYY}%{+MM}%{+dd}.csv"
    codec => line {
      format => "%{LEGAL_PERSON}\u0001%{TAXPAYER_NAME}\u0001%{hh}\u0001%{LEGAL_PERSON}"
    }
  }
}
The input block is the data source, here read from ES, so the ES connection is written there; the filter block handles data formats and field conversions; the output block configures the destination and output format.
Under the hood, batch export in both esdump and Logstash is also based on scroll queries. In my tests, Logstash felt faster.