
[BUG] Extreme latency in BulkIndexer #113

Open
dokterbob opened this issue Jun 2, 2022 · 7 comments
Labels
bug (Something isn't working) · good first issue (Good for newcomers) · performance (Make it fast!)

Comments


What is the bug?
With a BulkIndexer configured with 2 workers, I am getting unexpected latency on BulkIndexer.Add(). The workers do not appear to be consuming the queue within any reasonable timeframe; I'm seeing delays of over 20s!

For example, in the last hour I've seen 53 cases of >1s latency on Add() alone, out of a total of 174 calls.

How can one reproduce the bug?
With 2 workers running, adding items from different goroutines against a relatively busy search cluster (see the sketch below).

What is the expected behavior?
Sub-millisecond latencies, basically the time it takes to shove something into a channel.
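For reference, here is a minimal, untested sketch of the kind of setup described: 2 workers, Add() called concurrently from several goroutines, with each call timed. The index name, document body, and iteration counts are made up for illustration; it assumes the opensearchutil.BulkIndexer API in this client.

```go
package main

import (
	"context"
	"log"
	"strings"
	"sync"
	"time"

	opensearch "github.com/opensearch-project/opensearch-go"
	"github.com/opensearch-project/opensearch-go/opensearchutil"
)

func main() {
	client, err := opensearch.NewDefaultClient()
	if err != nil {
		log.Fatalf("client: %s", err)
	}

	// Two workers, as in the report.
	bi, err := opensearchutil.NewBulkIndexer(opensearchutil.BulkIndexerConfig{
		Client:     client,
		Index:      "latency-test", // hypothetical index name
		NumWorkers: 2,
	})
	if err != nil {
		log.Fatalf("bulk indexer: %s", err)
	}

	// Add items from several goroutines and log any Add() call that
	// takes longer than a second (or fails outright).
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				start := time.Now()
				err := bi.Add(context.Background(), opensearchutil.BulkIndexerItem{
					Action: "index",
					Body:   strings.NewReader(`{"field": "value"}`),
				})
				if elapsed := time.Since(start); err != nil || elapsed > time.Second {
					log.Printf("Add: err=%v elapsed=%s", err, elapsed)
				}
			}
		}()
	}
	wg.Wait()

	if err := bi.Close(context.Background()); err != nil {
		log.Fatalf("close: %s", err)
	}
}
```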

What is your host/environment?

  • OS: Ubuntu 20.02
  • Version: 1.1.0 (but nothing has changed in the BulkIndexer since the fork from Elasticsearch)

Do you have any screenshots?
[screenshot: Add() latency measurements]

dokterbob added the bug label on Jun 2, 2022

dokterbob commented Jun 2, 2022

Note: returning to the default of NumCPU workers seems to alleviate the issue, but given the significant delays (seconds versus sub-millisecond) I would still strongly argue that there is an underlying issue here. At the very least I would suggest documenting this unexpected behaviour.
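Concretely, the change was just dropping the explicit worker count so the config falls back to its default of runtime.NumCPU() workers. A fragment of the sketch above, with the same hypothetical names:

```go
// With NumWorkers left unset, the bulk indexer defaults to runtime.NumCPU()
// workers; this is the configuration that alleviated the >1s Add() calls.
bi, err := opensearchutil.NewBulkIndexer(opensearchutil.BulkIndexerConfig{
	Client: client,
	Index:  "latency-test", // hypothetical index name, as in the sketch above
	// NumWorkers: 2,       // the explicit 2-worker setting from the original report
})
```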

Please see the difference below:
[screenshot: Add() latency with the default worker count]

The issue seems reduced but it is still occurring!
[screenshot: remaining >1s Add() calls]

CPU load on this server is around 10% and the load average is around 4. About 10% of Add() calls still take >1s.

dokterbob changed the title from [BUG] Race condition in BulkIndexer to [BUG] Extreme latency in BulkIndexer on Jun 2, 2022
dokterbob commented

@VijayanB @VachaShah Any ideas?

dokterbob commented

Poke!


dblock commented Jul 11, 2022

@dokterbob This looks visibly problematic, but it doesn't look like the folks here have gotten around to looking into it. Let's try to move this forward. First, what's the easiest way to reproduce this (maybe post code similar to the benchmarks in this project)? Are you able to bulk load data a lot faster into this instance with other mechanisms (i.e. is this definitely a client issue)?


APoolio commented Aug 5, 2022

@dokterbob Do you mind posting some of the code you were using to help pinpoint this issue?

dokterbob commented

Sorry, I didn't see the messages. The code is at https://github.com/ipfs-search/ipfs-search/, but of course you'll need a more detailed test case.

After increasing the number of workers, the problem seems to have become less severe. Now that this has been picked up, I'll see if I can get more concrete feedback over the next couple of weeks.

wbeckler added the good first issue label on Jan 18, 2023

zethuman commented Apr 4, 2023

@dokterbob Hey, I'd like to work on solving this issue; is it still relevant?
Are there still any problems, or did it turn out to be on the client side?

dblock added the performance label on Dec 5, 2023