Parallelise tasks with parallel #84

Open
dfalster opened this issue Apr 12, 2016 · 7 comments

Comments

@dfalster
Contributor

I know proper parallelisation is a pretty big, longer-term challenge for remake. But I wonder about this as a relatively easy implementation for the use case of iterating over a list. The idea came to me via @kunstler.

Let's say you have a target of the form

ret:
    command: my_wrapper(my_list)

where

my_wrapper <- function(input) {
    lapply(input, my_fun)
}

If this is slow, we might want to do some parallel compute. At present, we can force (trick) remake into doing this with a function like the one below (based on the parallel package):

my_wrapper_parallel <- function(input, ncores = detectCores() - 2) {
  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl))
  # NB: if my_fun uses other global objects, those would need to be sent
  # to the workers with clusterExport() first.
  parLapply(cl, input, my_fun)
}

Then the remake target would be

ret:
    command: my_wrapper_parallel(my_list)
    packages: parallel

So we can already do this. But then we'll be writing lots of wrapper functions, so why not make it a remake option?

Would probably need to come as part of a general list target (#8). E.g.

ret:
    command: remake_list_target(my_list, my_fun)
    parallel: parallel 

Eventually one might add other (more complicated) backends (like farming out to AWS), but starting with parallel seems like at least a potentially manageable shorter-term goal.
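
For concreteness, here's a rough sketch of what such a helper might do internally. remake_list_target() and its parallel argument are hypothetical names for illustration, not existing remake features; presumably remake itself would wire the backend choice through from the target's parallel: field.

library(parallel)

# Hypothetical helper: apply fun over input, optionally via a local cluster.
remake_list_target <- function(input, fun, parallel = c("none", "parallel"),
                               ncores = max(1L, detectCores() - 2L)) {
  parallel <- match.arg(parallel)
  if (parallel == "none") {
    return(lapply(input, fun))
  }
  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl))
  parLapply(cl, input, fun)
}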

@wlandau

wlandau commented May 20, 2016

Cool idea. Unfortunately, though, in a distributed computing scenario, parallel in a single R session only reaches a single node. Like you said, proper parallelization is a long-term challenge. What you really want is something like make with the -j flag. So for the short term, I wrote a package called parallelRemake. It generates multiple remake/YAML files and then creates an overarching Makefile to arrange them into parallelizable stages. That way, you can call make -j <whatever> to run parallel instances of remake. Clunky, I know, but it seems to work. There's also a function that helps automate the generation of remake/YAML files by producing them from named lists.

@wlandau

wlandau commented Jun 3, 2016

By the way, I also built workflowHelper on top of parallelRemake to handle certain kinds of common workflows in parallel without having to go through YAML (maybe a short-term special case of #20). Update: remakeGenerator is the successor to workflowHelper.

@dfalster
Contributor Author

dfalster commented Jun 5, 2016

Thanks for the suggestions @wlandau. As I see it, there are at least two levels of parallelisation remake needs:

  1. Local machine (single node)
  2. Remote (cluster) machines (multiple nodes)

The solutions to these might be different. Your suggestion was aimed at No 2 but would be a bit heavy for No 1.

I know No 2 has been on @richfitz's todo list, but it's good to see you have some ideas on this and have made a start via parallelRemake. One thing I was wondering was whether, instead of writing multiple remake files, you could write a Makefile that executes parts of the one master remake file.

So lets say your remake.yml file has:

targets:
  all:
    depends:
      - target1
      - target2

  target1:
    command: f1()
  target2:
    command: f2()

You could write a Makefile like this:

all: target1 target2

target1:  
    Rscript -e "remake::make('target1')"

target2:  
    Rscript -e "remake::make('target2')"

This would still enable you to exploit the make -j option, or alternatively submit jobs via a queuing system, while still working with a single remake file. To get it to work, you'd want to make sure you symlinked the .remake folder on each local node back to the master, so that it accessed any dependencies and also wrote results into the correct place.
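
On each node, that might be as little as the following (a hedged sketch; the master path is a placeholder):

# Point this node's .remake store at the master copy, so that dependencies
# are read from, and results written to, one shared location.
# "/path/to/master/.remake" stands in for the real master store.
if (!file.exists(".remake")) {
  file.symlink("/path/to/master/.remake", ".remake")
}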

@wlandau

wlandau commented Jun 6, 2016

Thanks for the great idea, @dfalster! I just implemented it in the single_yaml_file branch of parallelRemake, which I'll merge to master along with a major revision of workflowHelper. It's really gratifying how the whole structure cleaned up instantly.

Right now, I need to parse commands a bit more intelligently to figure out dependencies for the master Makefile, but the guts of remake should take care of that.

Edit: Now using remake's parse_command function to resolve dependencies for the master Makefile.

@wlandau

wlandau commented Jun 6, 2016

Both parallelRemake and workflowHelper now implement the suggestion by @dfalster on master. That was a quicker update than I thought it would be.

@wlandau

wlandau commented Nov 26, 2016

Regarding @dfalster's suggestion for a single-node solution, how hard would it be to resolve parallelizable groups of commands within the existing topological sort? With that accomplished, it would seem easy to iterate sequentially over groups and use parallel::mclapply() within groups.
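
A minimal sketch of that idea, assuming the graph is available as a named list mapping each target to its direct dependencies. topo_levels(), make_parallel(), and build_target() are hypothetical names, not remake internals:

library(parallel)

# Group targets into "levels": a target's level is 1 + the maximum level of
# its dependencies, so all targets within a level are mutually independent.
topo_levels <- function(deps) {
  level <- integer(0)
  remaining <- names(deps)
  while (length(remaining) > 0) {
    ready <- remaining[vapply(remaining, function(t) {
      all(deps[[t]] %in% names(level))
    }, logical(1))]
    stopifnot(length(ready) > 0)  # a dependency cycle would leave nothing ready
    lvl <- vapply(ready, function(t) {
      if (length(deps[[t]]) == 0) 1L else max(level[deps[[t]]]) + 1L
    }, integer(1))
    level <- c(level, lvl)
    remaining <- setdiff(remaining, ready)
  }
  split(names(level), level)
}

# Run levels sequentially, targets within a level in parallel.
# (mclapply() forks, so this is for Unix-alikes; Windows would need a cluster.)
make_parallel <- function(deps, build_target, ncores = 2L) {
  for (group in topo_levels(deps)) {
    mclapply(group, build_target, mc.cores = ncores)
  }
  invisible(NULL)
}

With deps = list(a = character(0), b = character(0), c = c("a", "b")), targets a and b would build in parallel, followed by c.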

@richfitz
Owner

Hi Will - with the current interfaces available to us in the parallel package, the scope for using it for this is pretty narrow; it will work in a few use cases where the tree has a very particular shape, but in general you'd be lucky to get a 2x speed-up.

This problem was the motivation for some queuing packages that I wrote (rrqueue and rrq), but it's possible that Henrik's amazing-looking future package might be a better interface.
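
For reference, the wrapper from the top of this thread might look like the following with future. This is a sketch only, using future_lapply() (now in the future.apply package), with my_fun as the same placeholder as above:

library(future)
library(future.apply)  # provides future_lapply()

my_wrapper_future <- function(input) {
  # plan() selects the backend: multisession uses background R sessions on
  # the local machine; plan(cluster, workers = ...) can reach remote nodes.
  plan(multisession)
  future_lapply(input, my_fun)
}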
