Reduce memory pressure when reading large topic from beginning #1046
Comments
As you correctly note, Sarama does not use a memory pool and simply relies on Go's garbage collector. You are also correct that the garbage collector does not return memory to the OS very frequently. However, Go's allocator is very good about reusing memory internally, so I wouldn't expect enormous memory usage or growth; it's not something I've seen in similar use cases before. I would note that your messages are very large (1-2MB as you mention), which means the default Sarama configuration will not work well for you to begin with. Have you increased the consumer's default fetch size (Consumer.Fetch.Default)? Note that the default value of that config was recently bumped to 1MB (after the version you're using) in #1024, which is probably still too small for your use case. If your messages are as large as you claim, I would try setting it to 8MB.
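For concreteness, a minimal sketch of such a config using the Shopify/sarama API; the 8MB value comes from the suggestion above, while the 32MB cap on Fetch.Max is purely illustrative:

```go
package main

import "github.com/Shopify/sarama"

func newConfig() *sarama.Config {
	cfg := sarama.NewConfig()
	// Allow fetch responses large enough to hold a few 1-2MB messages.
	// 8MB is the figure suggested above; tune it to your actual message sizes.
	cfg.Consumer.Fetch.Default = 8 * 1024 * 1024
	// Illustrative upper bound on how large a single fetch is allowed to grow.
	cfg.Consumer.Fetch.Max = 32 * 1024 * 1024
	return cfg
}
```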
Thanks for the answer. Indeed I had to increase it (to 20MB actually, but I think 5MB would have been enough). From what I see, it doesn't seem to reuse the memory at all, because it consumes up to 2GB of memory, then after a few minutes (of total inactivity, since all the daemon is doing is reacting to consumed messages) goes back to 200MB. Which means the total active memory shouldn't be much more than 200MB. Also, I set metrics.UseNilMetrics = true, because I feared that metrics gathering was the culprit, but it didn't change anything.
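For reference, disabling metrics as described above is a one-line switch on the rcrowley/go-metrics package that sarama depends on; a minimal sketch, assuming it is set before any sarama client is created:

```go
package main

import "github.com/rcrowley/go-metrics"

func init() {
	// Make all metrics created from here on no-ops, so metric gathering
	// can be ruled out as a source of allocations.
	metrics.UseNilMetrics = true
}
```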
Hmm... is your code doing anything unusual or potentially long-running with the messages that the consumer returns to you? It's possible that those have some references back to the original packet data, so if you're holding on to them it might prevent anything from being reclaimed.
I should add that the memory consumed during the draining of the topic seems proportional to the size of the topic (which is another thing I didn't expect).
Probably the most useful investigation you could do right now is to use Go's builtin memory profiling tools (https://golang.org/pkg/runtime/pprof/) to figure out more precisely where the bulk of the memory is being allocated, and why it isn't being reused/reclaimed. If there's a bug, that will show us where it is; and if it's just the GC being lazy, that will show us where to use memory pools (e.g. I suspect the decode process is probably going to be worse in this regard than reading the raw packet).
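A minimal way to expose the heap profile over HTTP, using the standard net/http/pprof package (the port is arbitrary):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	go func() {
		// The heap profile is then available at
		// http://localhost:6060/debug/pprof/heap
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	// ... run the consumer as usual ...
	select {}
}
```

It can then be inspected with `go tool pprof http://localhost:6060/debug/pprof/heap`; alternatively, runtime/pprof's WriteHeapProfile can write an on-demand snapshot to a file.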
Messages are just JSON-encoded structures, which I decode and then extract some info from to store in a local map. It doesn't do anything else. In particular, if this were a memory leak on my part, the memory couldn't be reclaimed after some time of inactivity.
I did use pprof, and 99% of the memory is allocated by the decode function in broker.go.
One interesting update: we've actually run the daemon in a more memory-constrained environment, using Kubernetes to limit the allowed memory to 1GB, and it doesn't seem to crash. Still, I'm curious to understand why the memory buffers inside the Go runtime aren't reused.
I have no idea why this is behaving the way it is; this is getting into the guts of the Go memory allocator and GC, which are well beyond my knowledge. If the program runs fine in a more constrained system, then my best guess is that Go has decided allocating more (since it's available) is easier than reusing the existing chunks for some reason. I'm happy to take pull requests to e.g. use a memory pool where the profile shows it would help.
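To make that suggestion concrete, here is a rough sketch of the sync.Pool approach; this is not sarama's actual code, and the buffer sizes and helper name are invented for illustration:

```go
package main

import "sync"

// bufPool reuses byte slices across decode calls instead of allocating a
// fresh buffer for every response.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 1<<20) }, // 1MB initial capacity
}

// withBuffer hands fn a buffer of length n drawn from the pool and returns
// it to the pool afterwards; fn must not retain the buffer after it returns.
func withBuffer(n int, fn func(buf []byte)) {
	buf := bufPool.Get().([]byte)
	if cap(buf) < n {
		buf = make([]byte, n)
	}
	fn(buf[:n])
	bufPool.Put(buf[:0]) // reset length before returning to the pool
}
```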
@SimpleApp @eapache, I have faced a similar issue too, and this is how we solved it; see if that makes any difference for you:
https://github.com/Shopify/sarama/blob/master/config.go#L262
@chandradeepak Hi, I have the same issue as you. My situation is that I have a consumer that consumes a lot of data from consumer.Messages(), and you said to try not reading a lot of data from this function. So how can we handle this case? Do you have any suggestions for me, please?
@l2eady Somewhere in your code you should have a loop that ranges over the messages channel and does something with each message. If that "do something" hangs, it won't go on to the next iteration to read the next message, and that would slow it down.
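As a hedged sketch of what that loop looks like with the plain sarama API (broker addresses, topic, partition, and the handle callback are placeholders; the sarama-cluster consumer used in this thread exposes a similar Messages() channel):

```go
package main

import "github.com/Shopify/sarama"

func consumeFromBeginning(brokers []string, topic string, handle func(*sarama.ConsumerMessage)) error {
	consumer, err := sarama.NewConsumer(brokers, sarama.NewConfig())
	if err != nil {
		return err
	}
	defer consumer.Close()

	// Partition 0 only, for brevity; a real consumer would iterate over Partitions(topic).
	pc, err := consumer.ConsumePartition(topic, 0, sarama.OffsetOldest)
	if err != nil {
		return err
	}
	defer pc.Close()

	for msg := range pc.Messages() {
		handle(msg) // if this call blocks, the next message is not read
	}
	return nil
}
```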
@chandradeepak Thanks for the information. I did it that way, but it still doesn't solve my problem with the memory leak. I tried to trace the memory with pprof, and it shows that sarama uses a lot of memory in the Broker file (sendAndReceive, and also github.com/rcrowley/go-metrics NewExpDecaySample).
Versions
Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.
Sarama Version: 3b1b388
Kafka Version: 0.11.1
Go Version: go1.9.3 darwin/amd64
sarama-cluster Version: v2.1.12 (3001c2453136632aa3219a58ea3795bb584b83b5)
Configuration
The problem arises in multiple configurations.
Logs
N/A
Problem Description
Upon start of a daemon, I'm reading a fairly large topic (containing large messages, around 1-2MB each) from the beginning, "draining" it of all the messages stored (using it like a poor man's DB table). I know the limits of this approach, and I'm fine with it, but there is one problem I didn't expect:
it seems like reading a lot of those messages makes the process memory rise tremendously, and the memory is only given back after many minutes (of doing nothing). I'm worried that under a lower memory limit (using Kubernetes), the process would simply crash when trying to read the topic.
My current understanding of the issue so far, from reading similar issues, is that the Go garbage collector doesn't give memory back very easily, but I'm wondering whether sarama itself reuses memory for the decoding operations from a memory pool, or whether it asks the runtime for new buffers for every message.
In particular, this line in broker.go (line 530) made me wonder: