-
Notifications
You must be signed in to change notification settings - Fork 28
/
Copy pathuserguide.Rmd
679 lines (531 loc) · 25.2 KB
/
userguide.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
---
title: "User Guide"
output:
rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{User Guide}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{css echo=FALSE}
img {
border: 0px !important;
margin: 2em 2em 2em 2em !important;
}
code {
border: 0px !important;
}
```
```{r echo=FALSE, results="hide"}
knitr::opts_chunk$set(
cache = FALSE,
echo = TRUE,
collapse = TRUE,
comment = "#>"
)
options(clustermq.scheduler = "local")
suppressPackageStartupMessages(library(clustermq))
```
## Installation
Install the `clustermq` package in R from CRAN. This will automatically detect
if [ZeroMQ](https://github.com/zeromq/libzmq) is installed and otherwise use
the bundled library:
```{r eval=FALSE}
# Recommended:
# If your system has `libzmq` installed but you want to enable the worker
# crash monitor, set the environment variable below to use the bundled
# `libzmq` library with the required feature (`-DZMQ_BUILD_DRAFT_API=1`):
# Sys.setenv(CLUSTERMQ_USE_SYSTEM_LIBZMQ=0)
install.packages("clustermq")
```
Alternatively you can use the `remotes` package to install directly from
Github. Note that this version needs `autoconf`/`automake` and `CMake` for
compilation:
```{r eval=FALSE}
# Sys.setenv(CLUSTERMQ_USE_SYSTEM_LIBZMQ=0)
# install.packages('remotes')
remotes::install_github("mschubert/clustermq")
# remotes::install_github("mschubert/clustermq@develop") # dev version
```
In the [`develop`](https://github.com/mschubert/clustermq/tree/develop) branch,
we will introduce code changes and new features. These may contain bugs, poor
documentation, or other inconveniences. This branch may not install at times.
However, [feedback is very
welcome](https://github.com/mschubert/clustermq/issues/new).
For any installation issues please see the
[FAQ](https://mschubert.github.io/clustermq/articles/faq.html).
## Configuration
An HPC cluster's scheduler ensures that computing jobs are distributed to
available worker nodes. Hence, this is what `clustermq` interfaces with in
order to do computations.
By default, we will take whichever scheduler we find and fall back on local
processing. This will work in most, but not all cases. You may need to
configure your scheduler.
### Setting up the scheduler {#scheduler-setup}
To set up a scheduler explicitly, see the following links:
* [SLURM](#slurm) - *should work without setup*
* [LSF](#lsf) - *should work without setup*
* [SGE](#sge) - *may require configuration*
* [PBS](#pbs)/[Torque](#torque) - *needs* `options(clustermq.scheduler="PBS"/"Torque")`
* you can suggest another scheduler by [opening an
issue](https://github.com/mschubert/clustermq/issues)
You may in addition need to activate [compute environments or
containers](#environments) if your shell (_e.g._ `~/.bashrc`) does not do this
automatically.
Check the
[FAQ](https://mschubert.github.io/clustermq/articles/faq.html) if your
job submission/call to `Q` errors or gets stuck.
### Local parallelization
While this is not the main focus of the package, you can use it to parallelize
function calls locally on multiple cores or processes. This can also be useful
to test your code on a subset of the data before submitting it to a scheduler.
* Multiprocess (*recommended*) - Use the `callr` package to run and manage
multiple parallel R processes with `options(clustermq.scheduler="multiprocess")`
* Multicore - Uses the `parallel` package to fork the current R process into
multiple threads with `options(clustermq.scheduler="multicore")`. This
sometimes causes problems (macOS, RStudio) and is not available on Windows.
### SSH connector
There are reasons why you might prefer to not to work on the computing cluster
directly but rather on your local machine instead.
[RStudio](https://posit.co/products/open-source/rstudio/) is an excellent local
IDE, it's more responsive than and feature-rich than browser-based solutions
([RStudio server](https://posit.co/products/open-source/rstudio-server/),
[Project Jupyter](https://jupyter.org/)), and it avoids X forwarding issues
when you want to look at plots you just made.
Using this setup, however, you lost access to the computing cluster. Instead,
you had to copy your data there, and then submit individual scripts as jobs,
aggregating the data in the end again. `clustermq` is trying to solve this by
providing a transparent SSH interface.
In order to use `clustermq` from your local machine, the package needs to be
installed on both there and on the computing cluster. On the computing cluster,
[set up your scheduler](#scheduler-setup) and make sure `clustermq` runs there
without problems. Note that the *remote scheduler* can not be `LOCAL` (default
if no HPC scheduler found) or `SSH` for this to work.
```{r eval=FALSE}
# If this is set to 'LOCAL' or 'SSH' you will get the following error:
# Expected PROXY_READY, received ‘PROXY_ERROR: Remote SSH QSys is not allowed’
options(
clustermq.scheduler = "multiprocess" # or multicore, LSF, SGE, Slurm etc.
)
```
On your *local machine*, add the following options:
```{r eval=FALSE}
options(
clustermq.scheduler = "ssh",
clustermq.ssh.host = "user@host", # use your user and host, obviously
clustermq.ssh.log = "~/cmq_ssh.log" # log for easier debugging
)
```
We recommend that you [set up SSH keys](https://www.digitalocean.com/community/tutorials/how-to-configure-ssh-key-based-authentication-on-a-linux-server) for password-less login.
## Usage
### The `Q` function
The following arguments are supported by `Q`:
* `fun` - The function to call. This needs to be self-sufficient (because it
will not have access to the `master` environment)
* `...` - All iterated arguments passed to the function. If there is more than
one, all of them need to be named
* `const` - A named list of non-iterated arguments passed to `fun`
* `export` - A named list of objects to export to the worker environment
Behavior can further be fine-tuned using the options below:
* `fail_on_error` - Whether to stop if one of the calls returns an error
* `seed` - A common seed that is combined with job number for reproducible results
* `memory` - Amount of memory to request for the job (`bsub -M`)
* `n_jobs` - Number of jobs to submit for all the function calls
* `job_size` - Number of function calls per job. If used in combination with
`n_jobs` the latter will be overall limit
* `chunk_size` - How many calls a worker should process before reporting back
to the master. Default: every worker will report back 100 times total
The full documentation is available by typing `?Q`.
### Examples
The package is designed to distribute arbitrary function calls on HPC worker
nodes. There are, however, a couple of caveats to observe as the R session
running on a worker does not share your local memory.
The simplest example is to a function call that is completely self-sufficient,
and there is one argument (`x`) that we iterate through:
```{r}
fx = function(x) x * 2
Q(fx, x=1:3, n_jobs=1)
```
Non-iterated arguments are supported by the `const` argument:
```{r}
fx = function(x, y) x * 2 + y
Q(fx, x=1:3, const=list(y=10), n_jobs=1)
```
If a function relies on objects in its environment that are not passed as
arguments (including other functions), they can be exported using the `export` argument:
```{r}
fx = function(x) x * 2 + y
Q(fx, x=1:3, export=list(y=10), n_jobs=1)
```
If we want to use a package function we need to load it on the worker using the
`pkgs` parameter, or referencing it with `package_name::`:
```{r}
f1 = function(x) splitIndices(x, 3)
Q(f1, x=3, n_jobs=1, pkgs="parallel")
f2 = function(x) parallel::splitIndices(x, 3)
Q(f2, x=8, n_jobs=1)
# Q(f1, x=5, n_jobs=1)
# (Error #1) could not find function "splitIndices"
```
### As parallel `foreach` backend
The [`foreach`](https://cran.r-project.org/package=foreach) package provides an
interface to perform repeated tasks on different backends. While it can perform
the function of simple loops using `%do%`:
```{r}
library(foreach)
foreach(i=1:3) %do% sqrt(i)
```
it can also perform these operations in parallel using `%dopar%`:
```{r}
foreach(i=1:3) %dopar% sqrt(i)
```
The latter allows registering different handlers for parallel execution, where
we can use `clustermq`:
```{r}
# set up the scheduler first, otherwise this will run sequentially
# this accepts same arguments as `Q`
# the number of jobs is ignored here since we're using the LOCAL scheduler
clustermq::register_dopar_cmq(n_jobs=2, memory=1024)
# this will be executed as jobs
foreach(i=1:3) %dopar% sqrt(i)
```
As [BiocParallel](https://bioconductor.org/packages/release/bioc/html/BiocParallel.html)
supports `foreach` too, this means we can run all packages that use `BiocParallel`
on the cluster as well via `DoparParam`.
```{r}
library(BiocParallel)
# the number of jobs is ignored here since we're using the LOCAL scheduler
clustermq::register_dopar_cmq(n_jobs=2, memory=1024)
register(DoparParam())
bplapply(1:3, sqrt)
```
### With `targets`
The [`targets`](https://github.com/ropensci/targets) package enables users to
define a dependency structure of different function calls, and only evaluate
them if the underlying data changed.
> The `targets` package is a [Make](https://www.gnu.org/software/make/)-like
> pipeline tool for statistics and data science in R. The package skips costly
> runtime for tasks that are already up to date, orchestrates the necessary
> computation with implicit parallel computing, and abstracts files as R
> objects. If all the current output matches the current upstream code and
> data, then the whole pipeline is up to date, and the results are more
> trustworthy than otherwise.
It can use `clustermq` to [perform calculations as
jobs](https://books.ropensci.org/targets/hpc.html#clustermq).
## Options
The various configurable options are mentioned throughout the documentation,
where applicable, however, we list all of the options here for reference.
Options can be set by including a call to `options(<key> = <value>)` in your
current session or added as a line to your `~/.Rprofile`. The former will only
be available in your active session, while the latter will be available any
time after you restart R.
* `clustermq.scheduler` - One of the supported
[`clustermq` schedulers](#configuration); options are `"LOCAL"`,
`"multiprocess"`, `"multicore"`, `"lsf"`, `"sge"`, `"slurm"`, `"pbs"`,
`"Torque"`, or `"ssh"` (default is the HPC scheduler found in `$PATH`,
otherwise `"LOCAL"`)
* `clustermq.host` - The name of the node or device for constructing the
`ZeroMQ` host address (default is `Sys.info()["nodename"]`)
* `clustermq.ports` - A port range used by `clustermq` to initiate connections.
(default: `6000:9999`) Important: This option - when used with the ssh
connector - must be set as an option on the remote host.
* `clustermq.ssh.host` - The user name and host for
[connecting to the HPC via SSH](#ssh-connector) (e.g. `user@host`); we
recommend setting up SSH keys for password-less login
* `clustermq.ssh.log` - Path for a file (on the SSH host) that will be created
and populated with logging information regarding the SSH connection
(e.g. `"~/cmq_ssh.log"`); helpful for debugging purposes
* `clustermq.ssh.timeout` - The amount of time to wait (in seconds) for a SSH
start-up connection before timing out (default is `10` seconds)
* `clustermq.ssh.hpc_fwd_port` - Port that will be opened for SSH reverse
tunneling between the workers on the HPC and a local session.
Can also be specified as a port range that clustermq will sample from.
(default: one integer randomly sampled from the range between 50000 and
55000)
* `clustermq.worker.timeout` - The amount of time to wait (in seconds) for
master-worker communication before timing out (default is to wait
indefinitely)
* `clustermq.template` - Path to a [template file](#scheduler-templates) for
submitting HPC jobs; only necessary if using your own template, otherwise
the default template will be used (default depends on set or inferred
`clustermq.scheduler`)
* `clustermq.data.warning` - The threshold for the size of the common data (in
Mb) before `clustermq` throws a warning (default is `1000`)
* `clustermq.defaults` - A named-list of default values for the HPC template;
this takes precedence over defaults specified in the template file
(default is an empty list)
## Debugging workers
Function calls evaluated by workers are wrapped in event handlers, which means
that even if a call evaluation throws an error, this should be reported back to
the main R session.
However, there are reasons why workers might crash, and in which case they can
not report back. These include:
* A segfault in a low-level process
* Process kill due to resource constraints (e.g. walltime)
* Reaching the wait timeout without any signal from the master process
* Probably others
In this case, it is useful to have the worker(s) create a log file that will
also include events that are not reported back. It can be requested using:
```{r eval=FALSE}
Q(..., log_worker=TRUE)
```
This will create a file called *<cmq_id>-<array_index>.log* in your current
working directory, irrespective of which scheduler you use.
You can customize the file name using
```{r eval=FALSE}
Q(..., template=list(log_file = <yourlog>))
```
Note that in this case `log_file` is a template field of your scheduler script,
and hence needs to be present there in order for this to work. The default
templates all have this field included.
In order to log each worker separately, some schedulers support wildcards in
their log file names. For instance:
* Multicore/Multiprocess: `log_file="/path/to.file.%i"`
* SGE: `log_file="/path/to.file.$TASK_ID"`
* LSF: `log_file="/path/to.file.%I"`
* Slurm: `log_file="/path/to.file.%a"`
* PBS: `log_file="/path/to.file.$PBS_ARRAY_INDEX"`
* Torque: `log_file="/path/to.file.$PBS_ARRAYID"`
Your scheduler documentation will have more details about the available
options.
When reporting a bug that includes worker crashes, please always include a log
file.
## Environments
In some cases, it may be necessary to activate a specific computing environment
on the scheduler jobs prior to starting up the worker. This can be, for
instance, because *R* was only installed in a specific environment or
container.
Examples for such environments or containers are:
* [Bash module](https://modules.sourceforge.net/) environments
* [Conda](https://docs.conda.io/) environments
* [Docker](https://www.docker.com/)/[Singularity](https://singularity.lbl.gov/) containers
It should be possible to activate them in the job submission script (i.e., the
template file). This is widely untested, but would look the following for the
[LSF](#lsf) scheduler (analogous for others):
```{sh eval=FALSE}
#BSUB-J {{ job_name }}[1-{{ n_jobs }}] # name of the job / array jobs
#BSUB-o {{ log_file | /dev/null }} # stdout + stderr
#BSUB-M {{ memory | 4096 }} # Memory requirements in Mbytes
#BSUB-R rusage[mem={{ memory | 4096 }}] # Memory requirements in Mbytes
##BSUB-q default # name of the queue (uncomment)
##BSUB-W {{ walltime | 6:00 }} # walltime (uncomment)
module load {{ bashenv | default_bash_env }}
# or: source activate {{ conda | default_conda_env_name }}
# or: your environment activation command
ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
```
This template still needs to be filled, so in the above example you need to
pass either
```{r eval=FALSE}
Q(..., template=list(bashenv="my environment name"))
```
or set it via an option:
```{r eval=FALSE}
options(
clustermq.defaults = list(bashenv="my default env")
)
```
## Scheduler templates
The package provides its own default scheduler templates, similar to the ones
listed below. Which template is used is decided based on which scheduler
submission executable is present in the user's `$PATH`, e.g. `sbatch` for SLURM
or `bsub` for LSF. `qsub` is ambiguous between SGE and PBS/Torque, so in this
case `options(clustermq.scheduler = "<opt>")` should be set to the correct one.
A user can provide their own template file via `options(clustermq.template =
"<file>")`, containing arbitrary template values `{{ value | default }}`. These
values will be filled upon job submission in the following order of priority:
1. The argument provided to `Q(..., template=list(key = value))` or
`workers(... template=list(key = value))`
2. The value of `getOption("clustermq.defaults")`
3. The default value inside the template
### LSF
Set the following options in your _R_ session that will submit jobs:
```{r eval=FALSE}
options(
clustermq.scheduler = "lsf",
clustermq.template = "/path/to/file/below" # if using your own template
)
```
To supply your own template, save the contents below with any desired changes
to a file and have `clustermq.template` point to it.
```{sh eval=FALSE}
#BSUB-J {{ job_name }}[1-{{ n_jobs }}] # name of the job / array jobs
#BSUB-n {{ cores | 1 }} # number of cores to use per job
#BSUB-o {{ log_file | /dev/null }} # stdout + stderr; %I for array index
#BSUB-M {{ memory | 4096 }} # Memory requirements in Mbytes
#BSUB-R rusage[mem={{ memory | 4096 }}] # Memory requirements in Mbytes
##BSUB-q default # name of the queue (uncomment)
##BSUB-W {{ walltime | 6:00 }} # walltime (uncomment)
ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
```
In this file, `#BSUB-*` defines command-line arguments to the `bsub` program.
* Memory: defined by `BSUB-M` and `BSUB-R`. Check your local setup if the
memory values supplied are MiB or KiB, default is `4096` if not requesting
memory when calling `Q()`
* Queue: `BSUB-q default`. Use the queue with name *default*. This will most
likely not exist on your system, so choose the right name and uncomment by
removing the additional `#`
* Walltime: `BSUB-W {{ walltime }}`. Set the maximum time a job is allowed to
run before being killed. The default here is to disable this line. If you
enable it, enter a fixed value or pass the `walltime` argument to each
function call. The way it is written, it will use 6 hours if no arguemnt is
given.
* For other options, see [the LSF
documentation](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=bsub-options)
and add them via `#BSUB-*` (where `*` represents the argument)
* Do not change the identifiers in curly braces (`{{ ... }}`), as they are used
to fill in the right variables
Once this is done, the package will use your settings and no longer warn you of
the missing options.
### SGE
Set the following options in your _R_ session that will submit jobs:
```{r eval=FALSE}
options(
clustermq.scheduler = "sge",
clustermq.template = "/path/to/file/below" # if using your own template
)
```
To supply your own template, save the contents below with any desired changes
to a file and have `clustermq.template` point to it.
```{sh eval=FALSE}
#$ -N {{ job_name }} # job name
##$ -q default # submit to queue named "default"
#$ -j y # combine stdout/error in one file
#$ -o {{ log_file | /dev/null }} # output file
#$ -cwd # use pwd as work dir
#$ -V # use environment variable
#$ -t 1-{{ n_jobs }} # submit jobs as array
#$ -pe smp {{ cores | 1 }} # number of cores to use per job
#$ -l m_mem_free={{ memory | 1073741824 }} # 1 Gb in bytes
ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
```
In this file, `#$-*` defines command-line arguments to the `qsub` program.
* Queue: `$ -q default`. Use the queue with name *default*. This will most
likely not exist on your system, so choose the right name and uncomment by
removing the additional `#`
* For other options, see [the SGE
documentation](https://gridscheduler.sourceforge.net/htmlman/manuals.html). Do
not change the identifiers in curly braces (`{{ ... }}`), as they are used to
fill in the right variables.
Once this is done, the package will use your settings and no longer warn you of
the missing options.
### SLURM
Set the following options in your _R_ session that will submit jobs:
```{r eval=FALSE}
options(
clustermq.scheduler = "slurm",
clustermq.template = "/path/to/file/below" # if using your own template
)
```
To supply your own template, save the contents below with any desired changes
to a file and have `clustermq.template` point to it.
```{sh eval=FALSE}
#!/bin/sh
#SBATCH --job-name={{ job_name }}
##SBATCH --partition=default
#SBATCH --output={{ log_file | /dev/null }}
#SBATCH --error={{ log_file | /dev/null }}
#SBATCH --mem-per-cpu={{ memory | 4096 }}
#SBATCH --array=1-{{ n_jobs }}
#SBATCH --cpus-per-task={{ cores | 1 }}
ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
```
In this file, `#SBATCH` defines command-line arguments to the `sbatch` program.
* Partition: `SBATCH --partition default`. Use the queue with name *default*.
This will most likely not exist on your system, so choose the right name and
uncomment by removing the additional `#`
* For other options, see [the SLURM
documentation](https://slurm.schedmd.com/sbatch.html). Do not change the
identifiers in curly braces (`{{ ... }}`), as they are used to fill in the
right variables.
Once this is done, the package will use your settings and no longer warn you of
the missing options.
### PBS
Set the following options in your _R_ session that will submit jobs:
```{r eval=FALSE}
options(
clustermq.scheduler = "pbs",
clustermq.template = "/path/to/file/below" # if using your own template
)
```
To supply your own template, save the contents below with any desired changes
to a file and have `clustermq.template` point to it.
```{sh eval=FALSE}
#PBS -N {{ job_name }}
#PBS -J 1-{{ n_jobs }}
#PBS -l select=1:ncpus={{ cores | 1 }}:mpiprocs={{ cores | 1 }}:mem={{ memory | 4096 }}MB
#PBS -l walltime={{ walltime | 12:00:00 }}
#PBS -o {{ log_file | /dev/null }}
#PBS -j oe
##PBS -q default
ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
```
In this file, `#PBS-*` defines command-line arguments to the `qsub` program.
* Queue: `#PBS-q default`. Use the queue with name *default*. This will most
likely not exist on your system, so choose the right name and uncomment by
removing the additional `#`
* For other options, see the PBS documentation. Do not change the identifiers
in curly braces (`{{ ... }}`), as they are used to fill in the right
variables.
Once this is done, the package will use your settings and no longer warn you of
the missing options.
### Torque
Set the following options in your _R_ session that will submit jobs:
```{r eval=FALSE}
options(
clustermq.scheduler = "Torque",
clustermq.template = "/path/to/file/below" # if using your own template
)
```
To supply your own template, save the contents below with any desired changes
to a file and have `clustermq.template` point to it.
```{sh eval=FALSE}
#PBS -N {{ job_name }}
#PBS -l nodes={{ n_jobs }}:ppn={{ cores | 1 }},walltime={{ walltime | 12:00:00 }}
#PBS -o {{ log_file | /dev/null }}
#PBS -j oe
##PBS -q default
ulimit -v $(( 1024 * {{ memory | 4096 }} ))
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
```
In this file, `#PBS-*` defines command-line arguments to the `qsub` program.
* Queue: `#PBS-q default`. Use the queue with name *default*. This will most
likely not exist on your system, so choose the right name and uncomment by
removing the additional `#`
* For other options, see the Torque documentation. Do not change the
identifiers in curly braces (`{{ ... }}`), as they are used to fill in the
right variables.
Once this is done, the package will use your settings and no longer warn you of
the missing options.
### SSH {#ssh-template}
While SSH is not a scheduler, we can access remote schedulers via SSH. If you
want to use it, first make sure that `clustermq` works on your server with the
real scheduler. Only then move on to setting up SSH.
```{r eval=FALSE}
options(
clustermq.scheduler = "ssh",
clustermq.ssh.host = "myhost", # set this up in your local ~/.ssh/config
clustermq.ssh.log = "~/ssh_proxy.log", # log file on your HPC
clustermq.ssh.timeout = 30, # if changing default connection timeout
clustermq.template = "/path/to/file/below" # if using your own template
)
```
The default template is shown below. If `R` is not in your HPC `$PATH`, you may
need to specify its path or [load the required bash modules/conda
environments](https://github.com/mschubert/clustermq/issues/281).
To supply your own template, save its contents with any desired changes to a
file _on your local machine_ and have `clustermq.template` point to it.
```{sh eval=FALSE}
ssh -o "ExitOnForwardFailure yes" -f
-R {{ ctl_port }}:localhost:{{ local_port }}
-R {{ job_port }}:localhost:{{ fwd_port }}
{{ ssh_host }}
"R --no-save --no-restore -e
'clustermq:::ssh_proxy(ctl={{ ctl_port }}, job={{ job_port }})'
> {{ ssh_log | /dev/null }} 2>&1"
```