Fix memory leak in pods #474
base: master
Conversation
This still doesn't feel right, please don't merge it. I tracked down the call to the Clojure agent pool to be here. More to come 😄
The call to `shutdown-agents` is actually already done in the …

Having spent a lot of time profiling for memory leaks with pods (see #268), my own experience (once those fixes were made) has been that either your own code or libraries are creating resources that need to be cleared. We use this as our custom destroy-pod function:

```clojure
(defn destroy-pod
  [^ClojureRuntimeShim pod]
  (when pod
    (boot.pod/with-eval-in pod
      (when (find-ns 'clojure.core.async.impl.exec.threadpool)
        (let [e (resolve 'clojure.core.async.impl.exec.threadpool/the-executor)]
          (.shutdownNow ^java.util.concurrent.ExecutorService (var-get e))))
      (when (find-ns 'clojure.core.async.impl.timers)
        ;; This will cause the thread to go into a terminated state (and print a ThreadDeath error)
        (let [t (resolve 'clojure.core.async.impl.timers/timeout-daemon)]
          (try (.interrupt ^Thread (var-get t))
               (catch Throwable t))))
      (when (find-ns 'datomic.api)
        (let [f (var-get (resolve 'datomic.api/shutdown))]
          (f false))))
    (.close pod)
    (.close ^java.net.URLClassLoader (.getClassLoader pod))))
```
Oh, I am using …
Same applies to thread pools, I guess... which is maybe why you have the Datomic shutdown...
Yep, any resource at all will do it. It's unfortunate that a decent number of libraries create resources for you by virtue of requiring them and don't properly expose those resources to be managed.
Yes, unfortunate. I avoided starting the db and added your destroy fn, but still the memory grows and grows... I am using the Hikari pool, which I now need to try to shut down properly, I guess. Btw, thanks for the hint.
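A minimal sketch of that kind of shutdown, following the same `with-eval-in` pattern as the destroy-pod above; `my.app.db/datasource` is a hypothetical var name, only HikariDataSource's `.close` is real:

```clojure
;; Sketch only: my.app.db/datasource is a hypothetical var holding a
;; com.zaxxer.hikari.HikariDataSource. Closing it releases the pool's
;; connections and housekeeping threads before the pod is destroyed.
(boot.pod/with-eval-in pod
  (when (find-ns 'my.app.db)
    (when-let [ds (some-> (resolve 'my.app.db/datasource) var-get)]
      (.close ^com.zaxxer.hikari.HikariDataSource ds))))
```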
Sure thing. Good luck. I spent way too many hours chasing this; I feel the pain.
It is still very weird that the biggest memory consumption happens in …
Ah well, not at all weird, as we are running inside a …
Yep, the symptom, not the cause.
So basically every …
Seems like you're using mount. I'm only somewhat familiar with mount, but I thought it offered an API to terminate the resource held by the defstate var. Otherwise, you can just alter-var-root and set the var to nil (or explicitly close the resource at the var, if it has such a method)... or do the equivalent with an atom.
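A minimal sketch of the two options mentioned here; `mount.core/stop` is mount's own API, while `my.app.db/conn` is a hypothetical defstate-style var:

```clojure
;; Option 1: let mount tear everything down (runs each defstate's :stop logic).
(require '[mount.core :as mount])
(mount/stop)

;; Option 2 (hypothetical var): close the resource and nil out the var by hand.
(alter-var-root #'my.app.db/conn
                (fn [conn]
                  (when conn (.close conn))  ; assumes the resource is Closeable
                  nil))
```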
Yes, it does provide that; on stop I will …
This also means that each boot task which wants to be used in a …
About this, maybe we should enable custom code in …
Perhaps JMX can be used to find leaks automatically? The …
Ah, maybe something like a shutdown hook that can be installed in the pod to perform cleanup...
Yes, you are right, there are already many hooks, but the problem arises when a task (…
Yeah, exactly.
Makes sense to me.
Maybe instead of an object here https://github.com/arichiardi/boot/blob/fix-pod-leak/boot/base/src/main/java/boot/App.java#L326 we pass a (thread-safe) container of functions? I was checking where/when it is better to add this.
Or an atom in boot.pod: `(with-eval-in p (doseq [x @boot.pod/destroy-hooks] (x)))`
Yes, that would work too.
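A rough sketch of how the atom-of-hooks idea could look; `boot.pod/destroy-hooks` does not exist in boot, it is only the proposal being discussed here, and `p` is assumed to be an existing pod:

```clojure
;; Hypothetical: suppose each pod's copy of boot.pod carried
;;   (def destroy-hooks (atom []))
;; Code running inside the pod (e.g. a task that loads core.async there)
;; would register its own cleanup function:
(boot.pod/with-eval-in p
  (swap! boot.pod/destroy-hooks conj
         (fn [] (println "shut down my thread pools / close my connections here"))))

;; ...and destroy-pod (or a wrapper around it) would drain the hooks
;; inside the pod before closing it:
(boot.pod/with-eval-in p
  (doseq [f @boot.pod/destroy-hooks] (f)))
(boot.pod/destroy-pod p)
```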
Any objection to having a … instead? The caller would need to do (simplified): …
The latter also means that all the code inside the …
Possibly... although it seems like there is a more elegant solution somewhere, maybe? Another option to consider is reaping defunct pods out-of-band. Another thread could periodically iterate over the pods in …

I would really prefer some way to use reflection or JMX debugging APIs or something like that to find threads that need to be shut down, automatically, without the need for hooks. I would also like to find out which things can prevent a pod from being eligible for GC.
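One possible sketch of that kind of automatic detection, outside of boot itself: enumerate the live threads and flag the ones whose context class loader is the pod's class loader, a common reason the loader cannot be collected. This is only a heuristic, since pool threads do not always carry the pod's loader as their context class loader:

```clojure
;; Sketch: walk up to the root ThreadGroup, enumerate all live threads,
;; and keep those whose context class loader is the given (pod) class loader.
(defn threads-holding-classloader [^ClassLoader cl]
  (let [root (loop [g (.getThreadGroup (Thread/currentThread))]
               (if-let [parent (.getParent g)] (recur parent) g))
        arr  (make-array Thread (* 2 (.activeCount root)))
        n    (.enumerate root arr true)]
    (filter #(identical? cl (.getContextClassLoader ^Thread %))
            (take n arr))))

;; e.g. (threads-holding-classloader (.getClassLoader pod))
```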
Regarding the …
Good points in both your comments. I was already working on this, but then stopped. I could create a task that does what you said above, though; I am in a …
Also, an atom probably would not solve my problem because, again, how can I know the pod …
Yes, that's why maybe we need both. I mean, if I make a task that creates a pod and my task is loading core.async in the pod, then my task should also ensure it gets cleaned up. How would that work with the dynamic var?
Can you do a …
You can use …
Yeah... there are ways of solving this with brute force without changing …
In order to fix memory leaks in task-local pods, we need a way to execute custom cleanup at pod destroy time. This was particularly evident when doing `boot watch test` in a resource-heavy app. Fixes adzerk-oss/boot-test#26
Or …
Uhm, this is odd, I have to say that now I am a bit lost: http://pastebin.com/ZHVZ4rk8
Being a pod pool in …
The pod pool creates new pods; it doesn't reuse them...
For completeness, the above is generated with: …
Some simple …
Oh man, really? https://logging.apache.org/log4j/2.x/manual/migration.html
Oddly enough, `alter-var-root` does not seem to work from outside the pod, probably because I need to alter the …
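Each pod is a separate Clojure runtime behind its own class loader, so it has its own copies of every var; altering a var therefore has to be evaluated inside the pod. A minimal sketch, with `my.app.db/conn` again hypothetical:

```clojure
;; Sketch: run alter-var-root *inside* the pod so it touches the pod's own
;; copy of the var, not the parent runtime's.
(boot.pod/with-eval-in pod
  (when-let [v (resolve 'my.app.db/conn)]
    (alter-var-root v (constantly nil))))
```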
@micha I went through a debugging session again and this is the result: … Maybe I asked you this question already, but shouldn't … I excluded the primary cause of my leak (log4j2), but by debugging I noticed this as well, and I think it should be fixed; then again, maybe I don't see other factors in play now. As you can see in the bin as well, the …
The …
Ah right, I remember now 😂 so it is my printing keeping it there, then? Because I see it growing and growing; even if I trigger the GC in VisualVM they are not GC-ed... someone is keeping them there.
Seems like that's the symptom, not the cause. Those keys will remain in the weak hash map until the pods are collected by GC, not the other way around.
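A tiny REPL illustration of that WeakHashMap behaviour (GC timing is not deterministic, so the last step only usually clears the entry):

```clojure
(import 'java.util.WeakHashMap)

(def m (WeakHashMap.))
(def k (Object.))

(.put m k "pod")
(.size m)     ;; => 1: the entry stays as long as k is strongly reachable

(def k nil)   ;; drop the only strong reference to the key...
(System/gc)
(.size m)     ;; ...usually => 0 once a GC has cleared the weak entry
```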
You are right, sorry to bother you with these weird questions, but I am more and more confused. If I trigger the GC manually then the keys in …
I can close this one, nobody has complained and I have not found a solution anyways 😄
@arichiardi but don't we all agree that it still exists? If so, I think we should keep it open just for reference.
To be honest, I don't know if it still exists. I can leave it open, yes; maybe it is better after all.
Just a note to myself, potential steps to reproduce: …
This was particularly evident when doing `boot watch test` and has been fixed thanks to Micha's intuition of adding a `clojure.core/shutdown-agents` call in `boot.pod/destroy-pod`.
Fixes adzerk-oss/boot-test#26