JSON Bulk import to Elasticsearch

Accepted answer
Score: 33

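According to the Bulk API documentation, you need to supply the bulk operation with a file formatted very specifically: newline-delimited JSON, where each action line is followed (when the action requires it) by its source on the next line.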

NOTE: the final line of data must end with a newline character \n.

The possible actions are index, create, delete and update. index and create expect a source on the next line, and have the same semantics as the op_type parameter to the standard index API (i.e. create will fail if a document with the same index and type exists already, whereas index will add or replace a document as necessary). delete does not expect a source on the following line, and has the same semantics as the standard delete API. update expects that the partial doc, upsert and script and its options are specified on the next line.
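To make that layout concrete, here is a minimal sketch of a bulk file that uses all four actions. The file name bulk-example.ndjson, the ids and the field values are made up for illustration; only the cp index and products type come from the question. The last step checks the trailing newline mentioned in the NOTE above.

```shell
# Write a small bulk file: one action line, then (where the action needs it) one source line.
cat > bulk-example.ndjson <<'EOF'
{"index":{"_index":"cp","_type":"products","_id":"1"}}
{"Title":"Product 1"}
{"create":{"_index":"cp","_type":"products","_id":"2"}}
{"Title":"Product 2"}
{"update":{"_index":"cp","_type":"products","_id":"1"}}
{"doc":{"Size":"Small"}}
{"delete":{"_index":"cp","_type":"products","_id":"2"}}
EOF

# Command substitution strips a trailing newline, so an empty result
# means the last byte of the file is \n.
if [ -z "$(tail -c 1 bulk-example.ndjson)" ]; then
  echo "ends with newline"
else
  echo "missing trailing newline"
fi
```

Note that the delete action is the only one here without a source line after it.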

If you’re providing text file input to curl, you must use the --data-binary flag instead of plain -d. The latter doesn’t preserve newlines.
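The difference can be seen without sending a request. The sketch below only imitates what each flag does to the file: tr stands in for -d (which strips carriage returns and newlines when reading from a file), and a plain copy stands in for --data-binary. The file names are made up for illustration.

```shell
# Two-line bulk body (action line + source line), each ending in \n.
printf '%s\n' '{"index":{"_id":"1"}}' '{"Title":"Product 1"}' > two-lines.ndjson

# Imitate `curl -d @file`: carriage returns and newlines are removed from the body.
tr -d '\r\n' < two-lines.ndjson > as-sent-by-d.txt

# Imitate `curl --data-binary @file`: the file is sent byte for byte.
cp two-lines.ndjson as-sent-by-data-binary.txt

echo "newlines with -d:            $(wc -l < as-sent-by-d.txt | tr -d ' ')"
echo "newlines with --data-binary: $(wc -l < as-sent-by-data-binary.txt | tr -d ' ')"
```

With the newlines gone, the action and source lines run together into a single line, which the Bulk API cannot parse.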

So you will need to change the contents of your products.json file to the following:

 {"index":{"_index":"cp", "_type":"products", "_id": "1"}}
 { "Title":"Product 1", "Description":"Product 1 Description", "Size":"Small", "Location":[{"url":"website.com", "price":"9.99", "anchor":"Prodcut 1"}],"Images":[{ "url":"product1.jpg"}],"Slug":"prodcut1"}
 {"index":{"_index":"cp", "_type":"products", "_id":"2"}}
 {"Title":"Product 2", "Description":"Prodcut 2 Desctiption", "Size":"large","Location":[{"url":"website2.com", "price":"99.94","anchor":"Product 2"},{"url":"website3.com","price":"79.95","anchor":"discount product 2"}],"Images":[{"url":"image.jpg"},{"url":"image2.jpg"}],"Slug":"product2"}

And be sure to use --data-binary in your curl command (like your first command). Also note that the index and type can be omitted from the action lines if you use the index- and type-specific endpoint. Yours is /cp/products, as in your 3rd curl command.

Score: 7

This was fast and worked for me on an array of JSON objects.

cat data.json | \
jq -c '.[]  | .id = ._id | del (._id) | {"index": {"_index": "profiles", "_type": "gps", "_id": .id}}, .' |\
curl  -XPOST 127.0.0.1:9200/_bulk --data-binary @-

I had to copy the _id field to id and delete the original, because the import threw an error (Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters.) if it was not renamed. Most data is unlikely to have an _id field, in which case this part should be omitted.

Credit for this to Kevin Marsh

Score: 3

I ended up writing a bash script that is "not at all optimized" to do this for me. The dataset is relatively small so this will work for my needs.

#!/bin/bash
# Post each element of the .Products array as its own document, one request per item.
COUNTER=0
CURLURL="http://127.0.0.1:9200/cp/products"
COUNT=$(jq '.Products | length' products.json)
while [ "$COUNTER" -lt "$COUNT" ]; do
  echo "$COUNTER"
  CURLDATA=$(jq ".Products[$COUNTER]" products.json)
  RESPONSE=$(curl -XPOST "$CURLURL" -d "$CURLDATA" -vn)
  COUNTER=$((COUNTER + 1))
done

Score: 3

I was able to add the necessary headers with the following sed script:

sed -e 's/^/{ "index" : {} }\n/' -i products.json

This will add an empty index action line above each line in the file. An empty index action is allowed as long as the index and type are specified in the URL. After that, the proper call would be:

curl -s -XPOST http://localhost:9200/cp/products/_bulk --data-binary @products.json
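The sed one-liner above relies on GNU sed; on BSD/macOS sed, the \n in the replacement and -i without a suffix behave differently. A portable sketch of the same idea with awk follows. The sample data and the file names sample-products.json and sample-products.bulk.json are made up for illustration, and it writes to a new file rather than editing in place.

```shell
# Sample input: one JSON document per line (hypothetical data).
printf '%s\n' '{"Title":"Product 1"}' '{"Title":"Product 2"}' > sample-products.json

# Print an empty index action line before every document line.
awk '{ print "{ \"index\" : {} }"; print }' sample-products.json > sample-products.bulk.json

cat sample-products.bulk.json
```

The resulting file can then be passed to the same curl call with --data-binary @sample-products.bulk.json.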
Score: 3

Another option is to use the json-to-es-bulk tool.

Run the following to convert your JSON file to NDJSON:

node ./index.js -f file.json --index index_name --type type_name

It will create the request-data.txt file, which can be imported with bulk:

curl -H "Content-Type: application/json" -XPOST "http://localhost:9200/my_index/my_type/_bulk?pretty" --data-binary "@request-data.txt"
