Unloading rJava and/or restarting JVM #25
Comments
👍 Apart from the mclapply scenario described by @jeroenooms, there is another concern: the JVM grabs heap memory and never again returns it to the OS. So, if you have a task that will perform better in Java but requires a large amount of memory, I would really like to be able to:
Thus, being able to load/unload the JVM several times over time in a single R program could be really useful. Even better if the heap size (as per -Xmx) can be easily set when the JVM is loaded.
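To make the -Xmx point concrete, here is a minimal Java sketch for inspecting the heap limits of a running JVM. The class name HeapReport and its methods are hypothetical names chosen for illustration; only the java.lang.Runtime calls are standard API. In rJava the heap flag is usually passed through the java.parameters option before the JVM starts, but treat that as an assumption to verify against the rJava documentation.

```java
// Illustrative sketch only: HeapReport is a hypothetical helper, not part of rJava.
final class HeapReport {

    /** Upper bound the heap may grow to; governed by -Xmx when that flag is set. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap the JVM currently has reserved from the operating system. */
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Portion of the committed heap that is occupied by objects right now. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One-line summary in megabytes. */
    static String summary() {
        long mb = 1024L * 1024L;
        return "max=" + (maxHeapBytes() / mb) + "MB"
             + " committed=" + (committedHeapBytes() / mb) + "MB"
             + " used=" + (usedHeapBytes() / mb) + "MB";
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Running it as `java -Xmx512m HeapReport.java` and again with a different -Xmx value shows how the flag moves the reported maximum, while the committed figure is what the OS actually sees.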
I have exactly the same need as you, @asieira. I run a machine learning algorithm which consumes a lot of memory, from the R extraTrees package (http://cran.r-project.org/web/packages/extraTrees). I need to run the algorithm several times on benchmark data, and after a while I repeatedly get an OutOfMemory error. My understanding of the problem is that there must be some references to Java objects which are kept indefinitely in my R session between calls to extraTrees(), and thus cannot be cleaned up by the JVM's garbage collector. After some repeated calls, the JVM heap grows too large and ends in an OutOfMemory exception. Is there a correct way to overcome the problem? I thought about restarting the JVM each time but it doesn't seem to work:
@gasse yours seems like a completely different issue. The point @jeroenooms and I are making with this issue is that we would very much like to be able to stop the entire JVM. For example, the implementation of a
The first thing I would check in your case is whether you have a 32-bit or 64-bit JRE installed by running
Hi, the problem may be quite different, but a solution to your problem (clean Java memory once and for all) is also a solution to my problem (clean Java memory after each call to the extraTrees() function). I am running a 64-bit JRE, and I am also interested in being able to change the heap size of the JVM without having to restart my entire R session, by restarting the JVM for example.
I see what you mean now, @gasse. So yes, being able to unload and reload the JVM at different times with different heap sizes would make it possible to accommodate any memory leaks and/or differences in data size between runs. Hope the nice people that wrote rJava get to work on this issue soon.
There are two problems here and those are the main reason why this is not supported and why it wouldn't work in most cases even if rJava allowed it. First, rJava has no control over any existing Java references, so it's essentially impossible to shut down the JVM. Even if rJava tried, those references would lead to a crash. In addition, destroying the VM is a voluntary operation, e.g., if any threads are running, they will prevent the VM from shutting down. So, it's impossible to solve the practical issue above - if the JVM is already started before you get control, then whichever code started it has already created references that exist and cannot be removed (without that code providing a way to shut down and clear the references). What rJava could do is provide ways to remove its own references and attempt a shutdown. However, for the above reasons that is pretty much guaranteed to fail, because it cannot release its class loader as long as there is even a single live object around, which is pretty much guaranteed (unless someone just called .jinit() but didn't create any object - which is unlikely).
Thank you for your answer, Simon. Do I understand it correctly, that even if we were to
Hi, @s-u. Thank you for replying. Looking at the Oracle documentation on unloading the VM here I can only see an issue being raised with other threads left running. So yes, the R programmer would need to be diligent in making sure that no other threads are forgotten about in a running state. I am not clear though on what you mean by references. Are you concerned that R variables containing underlying C pointers to Java data that no longer exists will be accessed and cause a segmentation fault or similar?
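As a rough illustration of the thread concern: the JNI DestroyJavaVM call waits until the calling thread is the only remaining non-daemon thread, so a forgotten worker can hold up a shutdown. The sketch below lists such threads from inside the JVM. LiveThreads and the thread name "forgotten-worker" are hypothetical; the Thread calls are standard Java.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: LiveThreads is a hypothetical helper, not part of rJava.
final class LiveThreads {

    /** Names of all threads that are alive and not daemons, sorted alphabetically. */
    static List<String> nonDaemonThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && !t.isDaemon())
                .map(Thread::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        // A worker that nobody joins or interrupts stands in for a "forgotten" thread.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "forgotten-worker");
        worker.start();

        System.out.println("non-daemon threads: " + nonDaemonThreadNames());
        worker.join();
    }
}
```

A check like this only covers threads. It says nothing about the outstanding object references that the maintainer describes as the main obstacle.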
@jeroenooms yes, the only reliable way to terminate the JVM is via
@asieira yes, exactly. The only way for those to get removed is if there is no reference to them anywhere in R. rJava also keeps some references cached internally, but those I can remove on rJava unload.
@s-u aren't all Java references encapsulated in R objects obtained through rJava? So, isn't it possible in theory to:
Is any of that even remotely feasible? Sorry if I'm suggesting vague and impossible things here, not really sure how the R and C integration works.
No, because rJava doesn't own those references. Except for a few internal objects, rJava releases all references to objects created through its API, so they get released by R if they are not in use. If they are in use, then even if rJava kept track of them, they may not be released since someone uses them. Technically, it would be possible to keep a list of all allocated references and force-deallocate them, but it would be very expensive and would break all packages that use rJava if someone destroys the JVM. As a user I would certainly not be happy about such behavior.
I agree, that would be awful for other packages. What if the packages could provide two optional functions to .jinit that would be called respectively prior to unloading (to free up references) and when the JVM is reloaded (to return to a sane state), so they could handle this scenario themselves? In this case, it would still be backwards compatible if the JVM was never unloaded. And if it was, it would fall to the user to ensure they only load rJava-dependent packages that can handle this situation. Maybe you could just fail the unloading if at least one loaded package didn't register these optional functions in .jinit, to make things even safer.
Hi,
It would be nice if it would be possible to set the max heap size:
And afterwards restart the JVM. As a workaround I can just restart the whole rsession in RStudio. /Manuel
Yes, you should really set
If I read the conversation up until now correctly, we are limited to one JVM per application and we can't guarantee/control a close of a JVM without generating overhead. A naive follow-up question: is it conceptually possible to somehow isolate the JVM in a separate process and then communicate between that separate process and R via a socket? A back-of-the-napkin script with DBI/RJDBC seemed to confirm that I could hold a separate db connection on a single SNOW node on localhost at the same time as the db connection on the master node. Of course in that case the communication with the JVM happens entirely within the slave node rather than needing to access/interact with R objects on the master - so maybe there is no way to accomplish anything like this without a massive rewrite? Of course, the socket would have to be invalidated on a fork and spawn its own JVM - but that part seems trivial relative to the potential costs/issues of isolating the JVM in a separate process/in a separate application. There would be overhead, of course, so this probably would have to be implemented as an option rather than as a default mode of operation.
Sure, it's relatively easy to have separate processes. However, the point of rJava is exactly to NOT have a separate process, otherwise you could have used Rserve, SNOW or anything like that which has other benefits - but speed and memory efficiency are not among them. So you should pick the tool that is best for the requirements you have.
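For readers weighing the separate-process route, the following is a minimal Java sketch of launching a child JVM with its own heap limit. When the child exits, all of its memory goes back to the OS, at the cost of the speed and memory efficiency mentioned above. ChildJvm, command and run are hypothetical names; ProcessBuilder and the java.home system property are standard Java. Communication with the child (sockets, pipes) is left out.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: ChildJvm is a hypothetical helper, not part of rJava.
final class ChildJvm {

    /** Builds a command line that starts a new JVM with the given -Xmx limit. */
    static List<String> command(int maxHeapMb, String... javaArgs) {
        if (maxHeapMb <= 0) {
            throw new IllegalArgumentException("maxHeapMb must be positive");
        }
        // Reuse the java binary of the JVM that is running this code.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeapMb + "m");
        cmd.addAll(Arrays.asList(javaArgs));
        return cmd;
    }

    /** Starts the child, forwards its output, and waits for its exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(cmd).inheritIO().start();
        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // "-version" keeps the demo self-contained; a real worker class would go here.
        int exit = run(command(64, "-version"));
        System.out.println("child JVM exited with code " + exit);
    }
}
```

Each call to run can use a different heap size, which is the flexibility requested earlier in the thread, but every object passed between R and the child has to be serialized.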
To everyone reading this who has problems with the JVM not releasing the heap memory back to the OS: @asieira's claim that "the JVM grabs heap memory and never again returns it to the OS" is true only for the default GC in some early versions of Java (prior to 1.8 IIRC). It is indeed possible to choose a GC that releases the heap when it performs a full garbage collection, see https://www.javacodegeeks.com/2017/11/minimize-java-memory-usage-right-garbage-collector.html. Furthermore, some (modern) GCs do not even require a full garbage collection to keep the JVM's memory footprint down to the amount it actually uses, see https://openjdk.java.net/jeps/346 and https://wiki.openjdk.java.net/display/shenandoah/Main.
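One way to check how a given collector behaves is to compare the committed heap before and after a garbage collection. The sketch below does that with the standard java.lang.management API. GcFootprint is a hypothetical name, and whether the committed figure actually drops depends on the JVM version, the collector and its tuning flags, so no particular output is claimed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: GcFootprint is a hypothetical helper, not part of rJava.
final class GcFootprint {

    /** Names of the garbage collectors active in this JVM. */
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    /** Snapshot of heap usage; "committed" is what the JVM holds from the OS. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        System.out.println("collectors: " + collectorNames());

        // Allocate about 64 MB, then drop the only reference to it.
        byte[][] junk = new byte[64][];
        for (int i = 0; i < junk.length; i++) {
            junk[i] = new byte[1 << 20];
        }
        MemoryUsage peak = heap();
        junk = null;

        // System.gc() is only a request; the JVM may ignore or defer it.
        System.gc();
        MemoryUsage after = heap();

        System.out.println("committed at peak: " + peak.getCommitted() + " bytes");
        System.out.println("committed after gc: " + after.getCommitted() + " bytes");
    }
}
```

Running the same file under different collectors, for example `java -XX:+UseSerialGC GcFootprint.java` versus `java -XX:+UseG1GC GcFootprint.java`, gives a quick comparison for the Java version at hand.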
While trying to address #334 I now have final proof that it is impossible to truly unload or even re-initialize a VM. I have implemented a fully clean approach (just a pure C program) which dynamically loads a JVM with
Thanks for the update, @s-u. Not to hijack this thread, but I do have a question related to JVM instantiation: will JVM instances ever be shared across child R session processes, such that the rJava interface for calling out to the JVM host would amount to basically a new thread invocation on a shared VM across different R sessions, or is it the case that as long as child R sessions materialize in separate processes, you will get a new JVM? And taking this a little further, if R ever introduces a way of doing parallel compute that doesn't involve new R processes (i.e., something more single-process/multi-thread), would that mean that child 'sessions' of R would share a process and therefore the JVM? I'm asking so that we can avoid pitfalls in our programming decisions around these technical considerations.
[Repost from this SO question].
I would like to use rJava in combination with mcparallel, but obviously the JVM cannot be forked. Therefore a separate JVM instance needs to be initiated for each child process, e.g.:
However, the problem in my case is that the JVM has already been initiated in the (main) parent process as well. This makes it impossible to use rJava in the child process:
So what I really need is a way to shut down/kill and restart the JVM in the child process. Simply detach("package:rJava", unload = TRUE) doesn't seem to do the trick. The force.init parameter doesn't seem to result in a restart either:
Is there some way I can forcefully shut down/kill the JVM in order to reinitiate it in the child process?