Elasticsearch batch data import (1)

Preface

Elasticsearch is a very efficient full-text search engine.

Elasticsearch also makes multidimensional data analysis very convenient, so it appears frequently in the big data field. In a production environment, most newly generated data can be written directly by the application, but historical or initial data usually has to be handled separately, and that is when a large amount of data may need to be imported at once.

Here I'd like to briefly share the method for batch-importing data, the basics behind it, and some problems you may run into. For details, please refer to the official documentation.

Tip: the latest version at the time of writing is Elasticsearch 2.2.0

Outline

bulk API

ES provides an API called bulk for batch operations.

It can index, update, or delete a large number of documents in a single API call, which greatly improves efficiency.

Format

API

The API comes in three forms: /_bulk, /{index}/_bulk, and /{index}/{type}/_bulk. When the index or type is specified in the URL, any action in the data file that does not explicitly declare one will use the value from the URL by default.

The endpoint therefore ends with /_bulk and is followed by JSON data in the following form:

Data content format

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

Note: the last line must also end with \n
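The trailing newline is easy to get wrong, so it helps to verify it from the shell before posting. A minimal sketch (the file name requests and the field values are illustrative):

```shell
# Build a minimal two-line bulk body: one action line, one source line.
# printf '%s\n' appends \n to every argument, including the last line.
printf '%s\n' \
  '{"index":{"_index":"stuff_orders","_type":"order_list","_id":1}}' \
  '{"real_name":"test","price":1.0}' \
  > requests

# tail -c 1 prints the last byte; the substitution is empty only when
# that byte is \n, i.e. the file ends with a newline as bulk requires.
[ -z "$(tail -c 1 requests)" ] && echo "trailing newline OK"
```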

Available actions

The available actions are index, create, delete, and update:

  • index and create must be followed on the next line by the document source (this format is mandatory; an example of a failed operation caused by violating it is shown later)
  • delete takes only the metadata line, with no source (the metadata alone is enough to locate the document)
  • update must be followed on the next line by the partial document (or script) describing the change
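Putting the four actions together, a single bulk body might look like this (a sketch only; the index, type, IDs, and field names are made up for illustration):

```shell
# One bulk body exercising all four actions.
# index/create: action line + source line; delete: action line only;
# update: action line + a "doc" line carrying the partial document.
cat > requests <<'EOF'
{"index":{"_index":"stuff_orders","_type":"order_list","_id":"1"}}
{"real_name":"Liu Bei","price":30.0}
{"create":{"_index":"stuff_orders","_type":"order_list","_id":"2"}}
{"real_name":"Guan Yu","price":25.0}
{"delete":{"_index":"stuff_orders","_type":"order_list","_id":"3"}}
{"update":{"_index":"stuff_orders","_type":"order_list","_id":"1"}}
{"doc":{"price":35.0}}
EOF
wc -l requests
```

With the file in place, curl -s -XPOST localhost:9200/_bulk --data-binary "@requests" would submit all four operations in one call (assuming a local ES node is running).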

Using a data file

Because this is a batch operation, you are unlikely to type the data manually on the command line; it is far more common to use a file. If a text file is used, it must follow the format above and be posted like this:

curl -s -XPOST localhost:9200/_bulk --data-binary "@requests"

Tip: requests is the file name. -s enables silent mode, which suppresses curl's progress output; you can also redirect to /dev/null instead. Note that --data-binary must be used rather than plain -d, because -d strips newlines, which would break the bulk format.

Import data

Attempting to index data in the wrong format

[root@es-bulk tmp]# curl localhost:9200/stuff_orders/order_list/903713?pretty
{
  "_index" : "stuff_orders",
  "_type" : "order_list",
  "_id" : "903713",
  "found" : false
}
[root@es-bulk tmp]# cat test.json 
{"index":{"_index":"stuff_orders","_type":"order_list","_id":903713}}{"real_name":"Liu Bei","user_id":48430,"address_province":"Shanghai","address_city":"Pudong New Area","address_district":null,"address_street":"Room 345, No. 2, Lane 1, Guanglan Road, Pudong New Area, Shanghai","price":30.0,"carriage":6.0,"state":"canceled","created_at":"2013-10-24T09:09:28.000Z","payed_at":null,"goods":["Nutritious breakfast: Full Score of ham and wheat"],"position":[121.53,31.22],"weight":70.0,"height":172.0,"sex_type":"female","birthday":"1988-01-01"}
[root@es-bulk tmp]# curl -XPOST 'localhost:9200/stuff_orders/_bulk?pretty' --data-binary @test.json
{
  "error" : {
    "root_cause" : [ {
      "type" : "action_request_validation_exception",
      "reason" : "Validation Failed: 1: no requests added;"
    } ],
    "type" : "action_request_validation_exception",
    "reason" : "Validation Failed: 1: no requests added;"
  },
  "status" : 400
}
[root@es-bulk tmp]# curl localhost:9200/stuff_orders/order_list/903713?pretty
{
  "_index" : "stuff_orders",
  "_type" : "order_list",
  "_id" : "903713",
  "found" : false
}
[root@es-bulk tmp]#
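The failure above ("no requests added") happens because test.json puts the action metadata and the source document on the same line. A hedged sketch of the fix (fields abbreviated from the original document; the curl step assumes a local ES node, so only the file rewrite is executable here):

```shell
# Rewrite test.json with the source on its own line after the action
# line, and let printf guarantee the trailing \n on the last line.
printf '%s\n' \
  '{"index":{"_index":"stuff_orders","_type":"order_list","_id":903713}}' \
  '{"real_name":"Liu Bei","user_id":48430,"price":30.0,"state":"canceled"}' \
  > test.json

# Posting it again should now succeed for this item:
#   curl -XPOST 'localhost:9200/stuff_orders/_bulk?pretty' --data-binary @test.json
```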

Posted on Wed, 01 Dec 2021 23:24:35 -0500 by tecmeister