Parallelizing julia calls #18

Open
scottlcarter79 opened this issue Jul 15, 2018 · 2 comments

Comments

@scottlcarter79

As expected, calling a Julia function within a parallel loop (using e.g. foreach and %dopar%) results in errors as the threads collide.

Intuitively, it seems like there should be a way to start up a Julia process for each thread and then pass the connection as an argument to the parallel function.

If someone could help walk me through this I would be very grateful.

Thanks

@oliviaAB

oliviaAB commented Oct 4, 2018

Hi,

Here is what I did to parallelise my calls to Julia. I used the package parallel.

  1. Get the number of cores you want to use:
no_cores = parallel::detectCores()-1

and the port ID of the current Julia evaluator:

mybaseport = ev$port ## ev is your existing Julia evaluator (e.g. created with RJulia()); get its port
  2. Start a cluster:
mycluster = parallel::makeCluster(no_cores, outfile = "")

If you need any packages on the cluster nodes, you can load them with:

parallel::clusterEvalQ(mycluster, library(XRJulia))

Same if you need a specific variable, or a function:

parallel::clusterExport(mycluster, "no_cores")
parallel::clusterExport(mycluster, "mysuperfunction")
  3. Create a function that starts a new Julia evaluator on each cluster node. It takes as argument the port to use for the connection to the new Julia process:
startJuliaEvCluster = function(portid){
  myev = RJulia(port = as.integer(portid), .makeNew = TRUE) ## start a Julia evaluator on the node with the specified port number
  return(myev) ## return the Julia evaluator
}

and call this function on each cluster node:

portList = sapply(1:no_cores, sum, mybaseport) ## Assign to each core a port number, starting from 1+port number of the current evaluator
infocores = parallel::clusterApply(mycluster, portList, startJuliaEvCluster)

infocores is then a list, with no_cores elements, each element being a Julia evaluator.
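As a small sketch of the port assignment above: assuming a base port of 1234 and 3 workers (both values are made up for illustration, not taken from a real evaluator), the sapply call yields one consecutive port per node, and a vectorised form gives the same result.

```r
## Hypothetical values, for illustration only
mybaseport = 1234L
no_cores = 3L

## Same expression as in the comment above: computes sum(i, mybaseport) for each i
portList = sapply(1:no_cores, sum, mybaseport)

## Equivalent vectorised form
portList2 = mybaseport + seq_len(no_cores)

print(portList)
print(all(portList == portList2))
```

Whether those ports are actually free on your machine is not checked here.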

  4. Write the function that you want to run inside your threads. Mine simulates a system:
simulateInCluster = function(i, infocores, param, ...){
  myev = infocores[[i - no_cores*(i-1)%/%no_cores]] ##  get the Julia evaluator corresponding to the current cluster node
  myparam = param[i] ## if for example you want to use a different value for a specific parameter at each iteration
  ... ## the calculation you want to make
  return(someresult)
}

The argument i is the index of the iteration being performed. Say that you want to run 1000 iterations of your computation on 3 cores. clusterApply hands the iterations to the nodes in order, recycling nodes as needed: the 1st iteration runs on the 1st node, the 2nd on the 2nd node, ..., the 4th on the 1st node again, and so on. So each iteration must use the Julia evaluator belonging to the cluster node it runs on, hence the index i - no_cores*(i-1)%/%no_cores, which is equivalent to ((i-1) %% no_cores) + 1.
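To make the index expression in simulateInCluster concrete, here is a minimal sketch that evaluates it for the first few iterations, assuming 3 cores (a made-up value). It also compares it with a modulo form, which is just another way of writing the same mapping.

```r
no_cores = 3L  ## hypothetical number of cores
i = 1:7        ## first seven iteration indices

## Expression used in simulateInCluster (%/% binds tighter than *)
idx_original = i - no_cores * (i - 1) %/% no_cores

## Same mapping written with the modulo operator
idx_modulo = ((i - 1) %% no_cores) + 1

print(idx_original)  ## 1 2 3 1 2 3 1
print(all(idx_original == idx_modulo))
```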

  5. Start the parallel computation:
myresults = parallel::clusterApply(mycluster, 1:niterations, simulateInCluster, infocores = infocores, param=param, ...)
  6. Don't forget to close the cluster when you're done:
parallel::stopCluster(mycluster)
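The evaluator lookup in step 4 relies on clusterApply sending iteration i to node ((i-1) %% no_cores) + 1. The sketch below checks that assumption with base R only, no Julia involved: each node stores a made-up node_id in its own global environment (a stand-in for its Julia evaluator), and every iteration reports the node_id it sees. The names n_workers and node_id are hypothetical.

```r
library(parallel)

n_workers = 2L  ## hypothetical, kept small for the example
cl = makeCluster(n_workers)

## Give each node its own ID, stored on the node itself
invisible(clusterApply(cl, seq_len(n_workers), function(id) {
  assign("node_id", id, envir = globalenv())
}))

## Each iteration returns the ID of the node it ran on
res = clusterApply(cl, 1:6, function(i) get("node_id", envir = globalenv()))

stopCluster(cl)

node_of_iteration = unlist(res)
print(node_of_iteration)
```

Note that this holds for clusterApply; the load-balancing variant clusterApplyLB does not guarantee a fixed iteration-to-node mapping, so the index trick would not apply there.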

I hope this helped!

@scottlcarter79
Author

scottlcarter79 commented Oct 4, 2018 via email
