
clustermq falsely claims to submit jobs when running in local mode #196

Closed
dapritchard opened this issue Jun 4, 2020 · 3 comments
Comments

@dapritchard

Thank you so much for this package. I've started using it as a parallel backend for a drake pipeline and have been very impressed by the performance improvements I've observed. I'm interested in evaluating clustermq / rzmq in settings outside of drake, but I can't seem to get the foreach example listed in the User Guide (in the subsection titled "As parallel foreach backend") to work. What am I missing here?

In the example below, on my 4-core machine, I would expect the following code to run in close to 5 seconds, yet it takes close to 20 seconds. When I use similar code to run some heavy processing, only one core does significant work.

library(foreach)
(n_cores <- parallel::detectCores())
#> [1] 4
clustermq::register_dopar_cmq(n_jobs = n_cores)
system.time(foreach(i = seq_len(n_cores)) %dopar% Sys.sleep(5))
#> Submitting 4 worker jobs (ID: 6856) ...
#>    user  system elapsed 
#>   0.118   0.022  20.187

(Note that the text of this issue is copied from a Stack Overflow question: https://stackoverflow.com/questions/62134030/using-clustermq-r-package-as-a-parallel-backend-for-foreach).


mschubert commented Jun 4, 2020

Are you using the local instead of the multicore backend?

If so, the following should fix it:

getOption("clustermq.scheduler") # check which scheduler is set
options(clustermq.scheduler = "multicore") # select multicore
# foreach example..

The reason for this is that we never select multicore automatically: it's bad locally because it may duplicate memory (and crash your computer), and it's bad on a computing cluster because it may exceed the number of cores you reserved.
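Putting the two pieces together, a minimal sketch of what the corrected session might look like (assuming a 4-core machine, as in the original example):

library(foreach)
options(clustermq.scheduler = "multicore")  # must be set before workers are started
n_cores <- parallel::detectCores()
clustermq::register_dopar_cmq(n_jobs = n_cores)
# with multicore workers the four 5-second sleeps overlap,
# so elapsed time should now be close to 5 seconds rather than 20
system.time(foreach(i = seq_len(n_cores)) %dopar% Sys.sleep(5))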

When you load library(clustermq) and haven't set up your scheduler, it will tell you this:

Option 'clustermq.scheduler' not set, defaulting to ‘LOCAL’
--- see: https://mschubert.github.io/clustermq/articles/userguide.html#configuration

In your case it won't, because you only access the package namespace via clustermq::, and printing messages on namespace access alone is considered bad practice by CRAN.

But I can see how this is confusing. It should probably be more obvious than it is right now, or at least not claim to be submitting jobs when it isn't.


dapritchard commented Jun 4, 2020

Yes, that was exactly it! Thanks so much for your response. I have to admit that I had seen the Option 'clustermq.scheduler' not set, defaulting to 'LOCAL' message, but I actually thought that was the correct setting, although in retrospect I probably should have deduced otherwise.

A couple of things would have tipped me off immediately as to what the problem was:

  1. If the "As parallel foreach backend" section of the User Guide had the options(clustermq.scheduler = "multicore") command mentioned in it.
  2. If the defaulting to 'LOCAL' message had the word "sequential" in it somewhere (maybe something like defaulting to 'SEQUENTIAL' or defaulting to 'LOCAL' (sequential)).
  3. As you mentioned, if the Submitting 4 worker jobs didn't appear when using the LOCAL scheduler.

I'd be happy to try and tackle any of those tasks in a PR if you'd be interested (presumably the first 2 are trivial). Thanks again for this wonderful package!

@mschubert mschubert changed the title Using clustermq locally for parallel processing clustermq falsely claims to submit jobs when running in local mode Jun 6, 2020
@mschubert

I'd be happy to take a PR. I suggest 1 and 3:

  1. Add a comment line here: # set up the scheduler first, otherwise this will run sequentially
  2. Renaming the "LOCAL" scheduler is a no-go because it would break existing user setups. Adding a special case for the message would work
  3. Easiest would be to check qsys$id in workers.r, but it would probably be better to somehow handle this in the qsys's
