Configure some iterations to run in the same job? #32
If I understand you correctly, this can be achieved by chunking jobs: https://mllg.github.io/batchtools/reference/submitJobs.html#chunking-of-jobs
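For context, the "Chunking of Jobs" pattern from that link groups several batchtools jobs into one scheduler job. A minimal sketch, assuming an existing registry `reg`; the chunk size of 50 is arbitrary:

```r
library(batchtools)
ids <- findNotSubmitted(reg = reg)               # data.table with a job.id column
ids$chunk <- chunk(ids$job.id, chunk.size = 50)  # assign roughly 50 jobs per chunk
submitJobs(ids, reg = reg)                       # one scheduler job per chunk
```

Jobs sharing a chunk id run sequentially inside the same scheduler job.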
Hi Seb, but I need to use array jobs at the same time; is that compatible? Usually, to create a single job array (with many tasks that do similar calculations), I do something like:

```r
job.table <- batchtools::getJobTable(reg = reg)
chunks <- data.frame(job.table, chunk = 1)
batchtools::submitJobs(chunks, resources = list(
  walltime = 24 * 60 * 60, # seconds
  memory = 2000,           # megabytes per CPU
  ncpus = 1,               # >1 for multicore/parallel jobs
  ntasks = 1,              # >1 for MPI jobs
  chunks.as.arrayjobs = TRUE), reg = reg)
```

So, at least from the docs (sections "Chunking of Jobs" and "Array Jobs"), it seems that these are two different uses of chunking.
I would like to do both at the same time, so I wonder if we could specify both.
Probably the best option for me would be if I could specify this as an argument to `batchmark`:

```r
bench.grid <- mlr3::benchmark_grid(
  tasks = task.list,
  learners = learner.list,
  resamplings = train.test.cv)
mlr3batchmark::batchmark(
  bench.grid, store_models = TRUE, reg = reg,
  job.for.each = c("resampling", "task")) # could also specify "learner" here
```

Does that seem reasonable to you, or would you suggest another approach? If you don't have time to write this functionality, I could give it a try, but it would be useful to have some guidance about what you think the interface should look like and where the best place to edit the current code would be.
Why do you want to run multiple resampling iterations in one job?
Some clusters have limits on the number of jobs/tasks that can be running/queued simultaneously.
Yes, I understand the problem. In this case, you would normally set `max.concurrent.jobs`.
Or, another option: why don't you merge more jobs into one array job? As far as I know, they are then counted as one job on a Slurm cluster. If you use …
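If I read the docs right, that merging is what `chunks.as.arrayjobs` does when combined with a chunk column: each chunk is submitted as one array job whose tasks are the jobs in that chunk. A sketch, assuming an existing registry `reg`; the split into 10 chunks is arbitrary:

```r
library(batchtools)
library(data.table)
ids <- findNotSubmitted(reg = reg)            # data.table of jobs still to submit
ids[, chunk := chunk(job.id, n.chunks = 10)]  # split the jobs into 10 groups
submitJobs(ids,
  resources = list(chunks.as.arrayjobs = TRUE),  # each chunk -> one Slurm array job
  reg = reg)
```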
I did not know about `max.concurrent.jobs`, that is helpful.
If I understand your suggestion correctly, you think I could submit all 1200 batchtools jobs as a single Slurm job with 1200 tasks? Actually, on our cluster I think each task in the job array is counted toward the limit of 1000, so that solution would not work. I believe that limit is implemented via https://slurm.schedmd.com/resource_limits.html#assoc_maxsubmitjobs but that page does not specify how job arrays are handled.
Okay, then use `max.concurrent.jobs`.
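For readers finding this later: `max.concurrent.jobs` is set in the batchtools configuration file (`batchtools.conf.R`), which is sourced when the registry is created. A sketch; the template filename is an assumption:

```r
# batchtools.conf.R (sourced by makeRegistry()/loadRegistry())
cluster.functions = makeClusterFunctionsSlurm(template = "slurm.tmpl")  # template path is an assumption
max.concurrent.jobs = 999  # keep queued/running jobs below the cluster's limit of 1000
```

With this set, `submitJobs()` holds back submissions once the limit is reached.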
Thanks for the quick feedback! I guess that could be a workaround in the short term.
I don't think this requires a change to mlr3batchmark but to batchtools. @mllg (maintainer) Do you think this could be useful?
Hi @sebffischer,
I was wondering whether it is currently possible to run different benchmark iterations in the same cluster job?
In particular, I would like to tell mlr3batchmark to create a new job for every data set and cross-validation fold, but have all the different algorithms run one after another within that job. Is that possible?
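A possible interim approach with plain batchtools chunking, assuming mlr3batchmark records the task/resampling pair in the `problem` column and the fold in the `repl` column of the job table (worth verifying on a real registry): group all learners for one task and fold into a single scheduler job.

```r
library(batchtools)
library(data.table)
tab <- getJobTable(reg = reg)                # one row per job: job.id, problem, repl, ...
tab[, chunk := .GRP, by = .(problem, repl)]  # same task + same fold -> same chunk
submitJobs(tab[, .(job.id, chunk)], reg = reg)
```

Each chunk then runs all learners for that dataset/fold sequentially in one cluster job.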