Java based learners fail with parallelMap multicore #1898
Comments
A current workaround is to load a learner from a savefile. E.g. if a learner is loaded from the
> library("parallelMap")
> parallelStartMulticore(2)
Starting parallelization in mode=multicore with cpus=2.
> library("mlr")
Loading required package: ParamHelpers
> lrn = makeLearner("classif.IBk")
> resample(lrn, pid.task, cv5)
Mapping in parallel: mode = multicore; cpus = 2; elements = 5.
[Resample] cross-validation iter 1: [Resample] cross-validation iter 2: ^C^C^C^C^C
> q("yes")
$ R
R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
[...]
> library("mlr")
Loading required package: ParamHelpers
> library("parallelMap")
> parallelStartMulticore(2)
Starting parallelization in mode=multicore with cpus=2.
> resample(lrn, pid.task, cv5)
Mapping in parallel: mode = multicore; cpus = 2; elements = 5.
[Resample] cross-validation iter 1: [Resample] cross-validation iter 2: mmce.test.mean=0.266
mmce.test.mean=0.318
# no hang
I ran my own rJava-based custom learner. It works fine in a single thread; however, with parallelStartSocket() it fails:
Exporting objects to slaves for mode socket: .mlr.slave.options
00001: Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
(Note that
Hi, I confirmed that the timeout is caused by something different from this issue, even though the single-threaded run did not take that long.
Multicore uses the operating system's (I don't know parallelStartSocket that well, however, so don't take my word for it.)
Thanks for answering such a general question. I understand that parallelStartSocket has significant overhead compared to parallelStartMulticore. In my case, I cannot make use of the 40 CPU cores without multiple threads or processes, and the multicore option cannot be used for my Java-based code (because of the original issue in this thread).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is because fork(), which multicore is ultimately based on, and the Java VM don't play well together if Java is started before the forking happens. Loading Java-based packages, e.g. "RWeka", seems to start the Java VM, so if the package gets loaded outside of the parallelMap call, it fails. If, on the other hand, the fork happens before the Java VM is loaded, it works fine:
I therefore suggest adding a configureMlr option to defer loading of packages until a learner's train or predict function gets called. Users would still need to be careful not to load "RWeka" themselves when they want to use multicore, but this would at least give them the option. When a learner gets constructed, instead of loading the learner's package, mlr should simply check whether the requested package exists.
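The existence check proposed above could look roughly like the following in base R. This is only a minimal sketch, not mlr's actual implementation: the helper name packageIsInstalled is hypothetical. It relies on find.package() with quiet = TRUE, which looks the package up in the library paths without loading or attaching it, so a Java-based package such as "RWeka" would not start the Java VM at this point.

```r
# Hypothetical helper (not part of mlr): returns TRUE if `pkg` is installed.
# find.package() only locates the package directory; it does not load the
# namespace, so no package start-up code (e.g. starting a JVM) is run.
packageIsInstalled = function(pkg) {
  length(find.package(pkg, quiet = TRUE)) > 0L
}

# Usage: check at learner construction time, load later inside train/predict.
if (!packageIsInstalled("RWeka")) {
  message("Package 'RWeka' is not installed; the learner could not be trained.")
}
```

The actual loading (e.g. via requireNamespace()) would then be deferred to the train or predict call, which under multicore runs in the forked child process.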