tar_make_clustermq not shutting down cleanly in recent versions of targets #265
Comments
Of note, I noticed you've added this message to the end of the build...
I get that message for |
Are you using development
remotes::install_github("wlandau/targets")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'targets' from a github remote, the SHA1 (0aeabe85) has not changed since last install.
#> Use `force = TRUE` to force installation
packageDescription("targets")$GithubSHA1
#> [1] "0aeabe85431e5e10416f42253319cb5226558643"
library(targets)
writeLines("#$ -N {{ job_name }}
#$ -t 1-{{ n_jobs }}
#$ -j y
#$ -cwd
#$ -V
module load R/4.0.3
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker(\"{{ master }}\")'
", "cmq.tmpl")
tar_script({
  options(
    clustermq.scheduler = "sge",
    clustermq.template = "cmq.tmpl"
  )
  sleep <- function(x) {
    Sys.sleep(1)
    x
  }
  list(
    tar_target(x, seq_len(12L)),
    tar_target(y, sleep(x), pattern = map(x)),
    tar_target(z, sleep(y), pattern = map(y))
  )
})
tar_make_clustermq(workers = 4L)
#> ● run target x
#> ● run branch y_29239c8a
#> ● run branch y_7cc32924
#> ● run branch y_bd602d50
#> ● run branch y_05f206d7
#> ● run branch y_d8bc1b56
#> ● run branch y_356ae18f
#> ● run branch y_47b4a323
#> ● run branch y_9964667b
#> ● run branch y_a4d6927f
#> ● run branch y_00e82c73
#> ● run branch y_7ffb6cff
#> ● run branch y_e45d295b
#> ● run branch z_d0697b30
#> ● run branch z_959da4d3
#> ● run branch z_92be5b5f
#> ● run branch z_f698fbf4
#> ● run branch z_80b956fa
#> ● run branch z_2a3aa09e
#> ● run branch z_ab885e3c
#> ● run branch z_ae3fc0b8
#> ● run branch z_ce6eccbc
#> ● run branch z_c3b501f1
#> ● run branch z_f0291ce7
#> ● run branch z_d742c285
#> Master: [13.0s 2.6% CPU]; Worker: [avg 8.2% CPU, max 280.6 Mb]
#> ● end pipeline
tar_read(z)
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12
But not after I install mschubert/clustermq@b61d8da.
targets::tar_make_clustermq(workers = 4L)
#> ● run target x
#> ● run branch y_29239c8a
#> ● run branch y_7cc32924
#> ● run branch y_bd602d50
#> ● run branch y_05f206d7
#> ● run branch y_d8bc1b56
#> ● run branch y_356ae18f
#> ● run branch y_47b4a323
#> ● run branch y_9964667b
#> ● run branch y_a4d6927f
#> ● run branch y_00e82c73
#> ● run branch y_7ffb6cff
#> ● run branch y_e45d295b
#> ● run branch z_d0697b30
#> ● run branch z_959da4d3
#> ● run branch z_92be5b5f
#> ● run branch z_f698fbf4
#> ● run branch z_80b956fa
#> ● run branch z_2a3aa09e
#> ● run branch z_ab885e3c
#> ● run branch z_ae3fc0b8
#> ● run branch z_ce6eccbc
#> ● run branch z_c3b501f1
#> ● run branch z_f0291ce7
#> ● run branch z_d742c285
#> Error in private$zmq$poll(sid, timeout) : 1 peer(s) lost
#> <CENSORED> has registered the job-array task 77764053.1 for deletion
#> <CENSORED> has registered the job-array task 77764053.4 for deletion
#> job 77764053.1 is already in deletion
#> job 77764053.2 is already in deletion
#> job 77764053.3 is already in deletion
#> job 77764053.4 is already in deletion
#> Error: callr subprocess failed: 1 peer(s) lost
#> Type .Last.error.trace to see where the error occured
Doesn't seem to affect
options(
  clustermq.scheduler = "sge",
  clustermq.template = "cmq.tmpl"
)
clustermq::Q(function(x) x, x = 1:3, n_jobs = 2)
#> Submitting 2 worker jobs (ID: cmq8202) ...
#> Running 3 calculations (0 objs/0 Mb common; 1 calls/chunk) ...
#> Master: [1.9s 0.7% CPU]; Worker: [avg 33.1% CPU, max 248.6 Mb]
Maybe it's from a recent change to |
If the new worker API really is supposed to be different, I am afraid all I can do for now is just ...unless the recommended way to shut down workers is now just |
I wonder, could this be related to mschubert/clustermq#223?
Yes, that's 223. Unfortunately, it's difficult to solve because my dev code behaves differently than the ZeroMQ documentation says it should (or at least not as I expected after reading the docs).
Prework
Description
Hi @wlandau,
Has anything changed with targets recently that would change the way a pipeline finishes? With the last couple of installations I get unclean shutdowns for some of my clustermq workers, but the pipeline seems to actually finish successfully. It just seems to be related to how the main process is handled when finishing, perhaps?