Unbounded memory growth / leak in pipelines #209
Here are some screenshots from our continuous profiler showing 24 hours of data (averaged over 250 samples), from yesterday with olric and from the same day last week without olric running.

In Use Heap Profiles: [screenshots: Before Olric / After Olric]
Alloc Heap Profiles: [screenshots: Before Olric / After Olric]
CPU Profiles: I expected an increase in CPU, given that there would be serialization during lookups compared to the local cache with ristretto, but I wasn't really expecting that 5% of CPU would turn into > 80% of the whole service's CPU being dedicated to Olric. A decent chunk of that is from increased GC caused by all the allocations. [screenshots: Before Olric / After Olric]
The good news is that this also supports the conclusion that it's primarily an issue with pipelines. In a separate service that only does single gets/puts with no pipelines, and that has been running in production longer, there is definitely an increase in heap usage and GC, but it doesn't completely dominate the entire service, of which olric is just a small part.

In Use Heap Profiles: [screenshots: Before Olric / After Olric]
Alloc Heap Profiles: [screenshots: Before Olric / After Olric]
CPU Profiles: [screenshots: Before Olric / After Olric]
@derekperkins thank you for the detailed bug report. I applied a fix to overcome the problem. Previously, […]. The other problem was the routing table fetch interval. Now you can use […]. So my theory was that you have many pipeline instances at runtime and they each had their own […]. There is a lot of room to improve the pipeline implementation, but the latest version should fix the memory leak problem. Could you upgrade to Olric v0.5.2 and observe the system?
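The shape of the leak described above is a common Go pattern. Below is a minimal, generic sketch of the difference between per-instance background refreshers and a single shared one, assuming the per-pipeline resource was something like a periodic routing-table fetcher; every name in it (leakyPipeline, sharedFetcher, fetchRoutingTable) is illustrative and none of it is olric's actual code.

```go
// Package sketch illustrates, in generic terms, why creating many
// short-lived pipeline-like objects that each own a background refresher
// leaks goroutines and memory, and how a single shared refresher avoids it.
// Nothing here is olric code; all names are made up for illustration.
package sketch

import (
	"sync"
	"time"
)

// routingTable stands in for whatever cluster metadata gets refreshed.
type routingTable struct{}

func fetchRoutingTable() routingTable { return routingTable{} }

// leakyPipeline owns its own refresher goroutine. If callers create one
// per request and never call Close, every request leaks a goroutine, a
// ticker, and whatever the goroutine's closure keeps reachable.
type leakyPipeline struct {
	stop chan struct{}
}

func newLeakyPipeline(interval time.Duration) *leakyPipeline {
	p := &leakyPipeline{stop: make(chan struct{})}
	go func() {
		t := time.NewTicker(interval)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				_ = fetchRoutingTable()
			case <-p.stop:
				return
			}
		}
	}()
	return p
}

// Close stops the refresher; forgetting to call it is the leak.
func (p *leakyPipeline) Close() { close(p.stop) }

// sharedFetcher refreshes the table once per process, no matter how many
// pipelines exist; pipelines just read the cached copy.
type sharedFetcher struct {
	mu    sync.RWMutex
	table routingTable
}

func newSharedFetcher(interval time.Duration) *sharedFetcher {
	f := &sharedFetcher{}
	go func() {
		t := time.NewTicker(interval)
		defer t.Stop()
		for range t.C {
			tbl := fetchRoutingTable()
			f.mu.Lock()
			f.table = tbl
			f.mu.Unlock()
		}
	}()
	return f
}

// current returns the most recently fetched table.
func (f *sharedFetcher) current() routingTable {
	f.mu.RLock()
	defer f.mu.RUnlock()
	return f.table
}
```

In the shared shape, creating and discarding many pipelines adds no background work, which is roughly the direction the comment above describes with a single, configurable fetch interval.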
v0.5.2 has been running for about 10 hours now and looks solid. I can confirm that these fixes solved the memory leak.

In Use Heap Profile: [screenshot]
Alloc Heap Profile: [screenshot]
CPU Profile: [screenshot]

As you mentioned, there is definitely still room for improvement, but at least this is stable enough to keep running. I'm seeing an increased number of […]. Thanks for the quick fix!
Here's the RAM usage over the course of 2.5 hours, showing unbounded memory growth. This happens on every pod until it gets OOMKilled. There are about 100 RPS to the gRPC API, across 30 separate pods. Each request makes on average 1 pipeline Exec call with about 100 keys per call. I've run pprof repeatedly, but it gives different results than what my OS metrics are telling me. I'm only seeing this when using the pipeline; our other service just does single Get/Put calls and maintains steady memory usage at just a little over the DMap allocation. There are 2 DMaps, with the following config: […]
Attached pprof profiles:

- pprof -alloc_objects: pprof.main_linux64.alloc_objects.alloc_space.inuse_objects.inuse_space.012.pb.gz
- pprof -alloc_objects: pprof.main_linux64.alloc_objects.alloc_space.inuse_objects.inuse_space.011.pb.gz
- pprof -inuse_space: pprof.main_linux64.alloc_objects.alloc_space.inuse_objects.inuse_space.014.pb.gz
- pprof -alloc_space: pprof.main_linux64.alloc_objects.alloc_space.inuse_objects.inuse_space.013.pb.gz
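For reference, here is a minimal sketch of one way heap profiles like these can be collected from a running Go service, using the standard net/http/pprof package and go tool pprof; the listen address and the binary/profile names in the comments are assumptions for illustration, not details taken from this issue.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof handlers on http.DefaultServeMux
)

func main() {
	// Expose the pprof endpoints on a side port; the address here is an
	// illustrative assumption.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... run the actual service here ...
	select {}
}

// Profiles like the attachments above can then be pulled with, e.g.:
//
//   go tool pprof -inuse_space   http://localhost:6060/debug/pprof/heap
//   go tool pprof -alloc_space   http://localhost:6060/debug/pprof/heap
//   go tool pprof -alloc_objects http://localhost:6060/debug/pprof/heap
//
// or inspected later against a saved profile file and the binary, e.g.:
//
//   go tool pprof -inuse_space ./main_linux64 pprof.main_linux64.<...>.014.pb.gz
```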