An R package that provides `mclapply()` syntax on Windows machines. It has no effect on other platforms.
Note: this is an update of the script formerly found at
If you wish to continue using that version (for whatever reason), you can find the script at
and the accompanying blog post describing its use here.
Step 0: If you do not already have `devtools` installed, install it using the instructions here. Note that installing Rtools is not necessary for the purposes of this package.
Step 1: Load `devtools` and install `parallelsugar` directly from my GitHub repository using `install_github('nathanvan/parallelsugar')`. For the purposes of this package, you may ignore the warning about Rtools (if you already have Rtools installed, the warning will not appear).
```
> library(devtools)
WARNING: Rtools is required to build R packages, but is not currently
installed.
... snip ...
> install_github('nathanvan/parallelsugar')
Downloading github repo nathanvan/parallelsugar@master
Installing parallelsugar
... snip ...
* DONE (parallelsugar)
```
On Windows, the following line takes about 40 seconds to run because, by default, `mclapply` from the `parallel` package is implemented as a serial function on Windows systems.
```r
library(parallel)
system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed
##    0.00    0.00   40.06
```
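As a rough sanity check on these timings: if each of n tasks sleeps for s seconds and w workers share them, the elapsed time cannot be less than ceiling(n / w) × s. The helper below (a hypothetical name, not part of `parallel` or `parallelsugar`) computes that idealized lower bound; it ignores worker start-up and the time spent copying objects to the workers.

```r
## Hypothetical helper (not part of parallel or parallelsugar): idealized
## lower bound on elapsed time for n.tasks equal-length tasks spread over
## n.workers workers, ignoring start-up and data-copying overhead.
ideal_elapsed <- function(n.tasks, secs.per.task, n.workers = 1L) {
  ## Each worker runs at most ceiling(n.tasks / n.workers) tasks in sequence
  ceiling(n.tasks / n.workers) * secs.per.task
}

## The serial Windows mclapply behaves like a single worker: 4 x 10 seconds
ideal_elapsed(4, 10, n.workers = 1)
## [1] 40

## Four workers can run all four 10-second tasks at the same time
ideal_elapsed(4, 10, n.workers = 4)
## [1] 10
```

The measured 40.06 seconds above is essentially the single-worker bound.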
If we load `parallelsugar`, the default implementation, `parallel::mclapply` (which uses fork-based clusters on platforms that support forking), is masked by `parallelsugar::mclapply`, which is implemented with socket clusters. The same `system.time()` call then takes closer to 10 seconds.
```r
library(parallelsugar)
##
## Attaching package: ‘parallelsugar’
##
## The following object is masked from ‘package:parallel’:
##
##     mclapply
system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed
##    0.04    0.08   12.98
```
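To illustrate what "implemented with socket clusters" means, here is a minimal sketch built only on functions from the `parallel` package (`makeCluster`, `clusterExport`, `parLapply`, `stopCluster`). The name `socket_mclapply` and its `use.sockets` argument are hypothetical. This is not the actual source of `parallelsugar::mclapply`: it only copies the global environment to the workers and does not re-attach loaded packages there.

```r
library(parallel)

## Hypothetical sketch, NOT the actual parallelsugar source: an mclapply-like
## function that uses a socket (PSOCK) cluster on Windows and defers to the
## fork-based parallel::mclapply on other platforms.
socket_mclapply <- function(X, FUN, ..., mc.cores = 2L,
                            use.sockets = .Platform$OS.type == "windows") {
  if (!use.sockets) {
    ## Forking is available, so use the standard implementation unchanged
    return(parallel::mclapply(X, FUN, ..., mc.cores = mc.cores))
  }
  ## Start mc.cores fresh R worker processes that talk to this one over sockets
  cl <- parallel::makeCluster(mc.cores)
  ## Shut the workers down on exit, even if FUN throws an error
  on.exit(parallel::stopCluster(cl), add = TRUE)
  ## Socket workers start with empty workspaces, so copy every (non-hidden)
  ## object in the master's global environment to each of them
  parallel::clusterExport(cl, ls(envir = globalenv()), envir = globalenv())
  ## Distribute the elements of X across the workers
  parallel::parLapply(cl, X, FUN, ...)
}
```

For example, `socket_mclapply(1:4, function(xx) Sys.sleep(10), mc.cores = 4L, use.sockets = TRUE)` should take roughly 10 seconds plus worker start-up time on a machine with at least four cores.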
By design, `parallelsugar` approximates a fork-based cluster: every object that is within scope of the master R process is copied to each of the worker processes on the other sockets. This implies that
- you can quickly run out of memory, and
- you can waste a lot of time copying unnecessary objects that are hanging around in your R session.
Be warned!
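Because every object in scope is copied, it can be worth checking what is sitting in your workspace before calling `mclapply`. The sketch below uses base R's `ls()` and `utils::object.size()`; the helper name `global_object_sizes` is hypothetical and not part of `parallelsugar`. Keep in mind that `object.size()` only estimates the memory an object occupies.

```r
## Hypothetical helper (not part of parallelsugar): size in bytes of each
## object in an environment, largest first, as estimated by object.size().
global_object_sizes <- function(env = globalenv()) {
  nms <- ls(envir = env)
  if (length(nms) == 0L) {
    return(numeric(0))
  }
  ## vapply over a character vector keeps the object names as names
  sizes <- vapply(nms, function(nm) {
    as.numeric(utils::object.size(get(nm, envir = env)))
  }, numeric(1))
  sort(sizes, decreasing = TRUE)
}

## List the objects in the global environment, biggest first
global_object_sizes()
```

Large objects that the parallel computation does not need can then be removed with `rm()` before the call.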
The following example uses a loaded package, a global variable, and a global function inside the function passed to `mclapply`:
```r
## Load a package
library(Matrix)

## Define a global variable
a.global.variable <- Matrix::Diagonal(3)

## Define a global function
wait.then.square <- function(xx) {
  ## Wait for 5 seconds
  Sys.sleep(5)
  ## Square the argument
  xx^2
}

## Check that it works with plain lapply
serial.output <- lapply(1:4, function(xx) {
  return(wait.then.square(xx) + a.global.variable)
})

## Test with the modified mclapply
par.output <- mclapply(1:4, function(xx) {
  return(wait.then.square(xx) + a.global.variable)
})

## Are they equal?
all.equal(serial.output, par.output)
## [1] TRUE
```
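If copying the whole workspace is too expensive, one alternative (a different approach from `parallelsugar`'s, using the `parallel` package directly) is to create the socket cluster yourself and export only what the worker function needs. The sketch below repeats the example with explicit `clusterExport()` and `clusterEvalQ()` calls; the sleep is shortened to 1 second, and the two-worker cluster size is an arbitrary choice.

```r
library(parallel)
library(Matrix)

a.global.variable <- Matrix::Diagonal(3)
wait.then.square <- function(xx) {
  Sys.sleep(1)  # shortened from 5 seconds for this sketch
  xx^2
}

## Two socket workers (an arbitrary choice for this sketch)
cl <- makeCluster(2)
par.output <- tryCatch({
  ## Copy only the two objects the worker function uses, not the whole workspace
  clusterExport(cl, c("a.global.variable", "wait.then.square"),
                envir = environment())
  ## Attach Matrix on every worker so arithmetic on the Matrix object works there
  invisible(clusterEvalQ(cl, library(Matrix)))
  parLapply(cl, 1:4, function(xx) wait.then.square(xx) + a.global.variable)
}, finally = stopCluster(cl))

serial.output <- lapply(1:4, function(xx) wait.then.square(xx) + a.global.variable)
all.equal(serial.output, par.output)
## [1] TRUE
```

The trade-off is that you must list every object and package the workers need yourself; forgetting one produces an "object not found" error from the workers instead of a silent copy.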
I put this together because it helped to solve a specific problem that I was having. If it solves your problem, please let me know. If it needs to be modified to solve your problem, please either
- open an issue on GitHub, or
- even better, fork, fix, and issue a pull request.