Different id when I use bulk or create #1475

Closed
lepi1 opened this issue Aug 27, 2021 · 5 comments

@lepi1

lepi1 commented Aug 27, 2021

elasticsearch-ruby 7.0.0

If I do a bulk with this id: Mi LED 4K TV 4S 44'' and then do a create with the same id, it will save two different records.

Example (steps):
client = Elasticsearch::Client.new
client.bulk(body: [{create: { _index: @index_name, _id: "Mi LED 4K TV 4S 44''", data: {doc: 'Hello' }}}])
client.create(index: @index_name, id: "Mi LED 4K TV 4S 44''", body: {doc: ''})
then
client.search(index: @index_name, id: "Mi LED 4K TV 4S 44''")

result:

=> {"took"=>7,
 "timed_out"=>false,
 "_shards"=>{"total"=>1, "successful"=>1, "skipped"=>0, "failed"=>0},
 "hits"=>
  {"total"=>{"value"=>2, "relation"=>"eq"},
   "max_score"=>1.0,
   "hits"=>
    [{"_index"=>"lepi_6", "_type"=>"_doc", "_id"=>"Mi LED 4K TV 4S 44''", "_score"=>1.0, "_source"=>{"doc"=>"Hello"}},
     {"_index"=>"lepi_6", "_type"=>"_doc", "_id"=>"Mi+LED+4K+TV+4S+44''", "_score"=>1.0, "_source"=>{"doc"=>""}}]}}

Expected behaviour: save only one record

@lepi1
Author

lepi1 commented Nov 1, 2021

Can someone please take a look?

@picandocodigo
Member

Hi @lepi1,
The issue here is that index (called by create) calls a Utils function, listify, on the id, which escapes its characters. bulk sends the id in the request body without escaping it, so the two calls end up with different ids. A workaround is to apply CGI.escape or EscapeUtils.escape_url (if you're using that gem in your project) to the id before passing it to bulk.
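
A minimal sketch of the suggested workaround, using only Ruby's standard library and building the bulk body without contacting a cluster. The index name `my_index` is a placeholder and not from the report. Note that CGI.escape also percent-encodes the single quotes, so it is worth checking that the `_id` stored by bulk matches the one stored by create for your data.

```ruby
require 'cgi'

id = "Mi LED 4K TV 4S 44''"

# CGI.escape encodes spaces as "+" and percent-encodes the single quotes.
escaped_id = CGI.escape(id)

# Bulk action that uses the escaped id, as the workaround suggests.
# 'my_index' is a placeholder index name (an assumption for this sketch).
bulk_body = [
  { create: { _index: 'my_index', _id: escaped_id, data: { doc: 'Hello' } } }
]

puts escaped_id
```

The body could then be passed as `client.bulk(body: bulk_body)`, in the same way as in the original report.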

@picandocodigo
Member

We're planning to discuss a common approach within the team to address this issue. A workaround has been provided in the meantime.

@picandocodigo
Member

This has been fixed in #1618 and backported to 7.16, 7.17 and 8.0, so the correct behaviour is available from versions 8.0.0, 7.16.3 and 7.17.0 of the client. The __escape function in utils was updated to escape spaces to %20. This way, when you use a value such as Mi LED 4K TV 4S 44'', Elasticsearch will store Mi LED 4K TV 4S 44'' instead of Mi+LED+4K+TV+4S+44''.
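
To illustrate why escaping spaces as %20 rather than + changes the stored id, here is a rough sketch. It is not the client's actual __escape implementation: `ERB::Util.url_encode` stands in for an encoder that emits %20, and `percent_decode` is a hypothetical helper that decodes %XX sequences but leaves + untouched, which is an assumption consistent with the ids shown in the search result above.

```ruby
require 'cgi'
require 'erb'

# Hypothetical helper: decodes %XX sequences only and leaves "+" as-is,
# approximating how the ids in this report appear to have been stored.
def percent_decode(str)
  str.gsub(/%([0-9A-Fa-f]{2})/) { Regexp.last_match(1).hex.chr }
end

id = "Mi LED 4K TV 4S 44''"

plus_escaped    = CGI.escape(id)           # spaces become "+"
percent_escaped = ERB::Util.url_encode(id) # spaces become "%20"

# What a path decoder that ignores "+" would recover in each case.
old_id = percent_decode(plus_escaped)
new_id = percent_decode(percent_escaped)

puts old_id
puts new_id
```

Under these assumptions only the %20 form decodes back to the original id, which is the form bulk stores when the id is sent unescaped in the request body.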

@picandocodigo
Member

Closing this issue as the fix is now available in v7.16.3.
