Right now jobs are split by number of files. However, the number of entries varies wildly between nanoAOD files. If the submission routine in `pico.py` allowed event-based splitting of jobs, jobs and output files could be made more uniform in length and size, and batch submission parameters such as the maximum run time would be easier to fine-tune. With event splitting, smaller files can be combined into one job, or a single large file can be split into several jobs.

It would not be too hard to implement, I think.
The post-processor already allows one to define a start event index and a maximum number of events, so "all" one needs to do is add these as options to the job argument list.
But first one needs to split the files into chunks that may or may not overlap. Right now chunks are made here: `TauFW/PicoProducer/scripts/pico.py`, line 671 in 4a6311c.
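As a rough illustration of the splitting step, here is a minimal sketch that assumes the number of events per file is already known. The function name `split_by_events`, its arguments, and the `(filename, firstevt, nevts)` tuple layout are hypothetical and not taken from `pico.py`. It also assumes a job can process a different event range for each of its input files, which may not match how the post-processor is actually called.

```python
def split_by_events(file_nevents, maxevts):
    """Group (filename, nevents) pairs into chunks of at most maxevts events.

    Hypothetical helper: each chunk is a list of (filename, firstevt, nevts)
    tuples, so small files get merged and large files get split.
    """
    if maxevts <= 0:
        raise ValueError("maxevts must be positive")
    chunks = []
    current = []    # pieces collected for the chunk being filled
    room = maxevts  # number of events that still fit in the current chunk
    for fname, nevents in file_nevents:
        first = 0
        while first < nevents:  # files with zero events are skipped
            take = min(room, nevents - first)
            current.append((fname, first, take))
            first += take
            room -= take
            if room == 0:  # chunk is full, start a new one
                chunks.append(current)
                current, room = [], maxevts
    if current:  # last, partially filled chunk
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    files = [("a.root", 250), ("b.root", 50), ("c.root", 100)]
    for i, chunk in enumerate(split_by_events(files, 100)):
        print(i, chunk)
```

With this greedy approach every chunk except possibly the last holds exactly `maxevts` events, at the cost of some jobs reading from more than one file.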
Currently, the chunks are saved as a dictionary in the JSON job config file for bookkeeping during resubmission, e.g.
The trickiest part is to save it in this config format for bookkeeping in the resubmission and status routines. This is where a lot of bugs might creep in if the information is not stored and retrieved correctly. The simplest and most compact solution would be to append it to the end of the usual filename in the chunk dictionary of the config JSON file, and parse it in `checkchunks`.

It should be possible. I plan to implement it in the near future.
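One way the filename-suffix idea could look is sketched below. The `<filename>:<firstevt>:<maxevts>` convention, the helper names, and the use of `-1` to mean "all events" are assumptions made for illustration, not the format used in the config file. The pattern is anchored to the end of the string so that the colons in an XRootD URL such as `root://host//path` are not mistaken for an event range.

```python
import re

# Hypothetical convention: "<filename>:<firstevt>:<maxevts>".
# The greedy ".+" leaves only the last two numeric fields for the range.
EVTSPLIT = re.compile(r"^(?P<fname>.+):(?P<first>\d+):(?P<max>\d+)$")


def add_event_range(fname, firstevt, maxevts):
    """Append the event range to a filename for storage in the JSON config."""
    return "%s:%d:%d" % (fname, firstevt, maxevts)


def parse_event_range(entry):
    """Return (filename, firstevt, maxevts) for a stored entry.

    Entries without a suffix are returned unchanged with firstevt=0 and
    maxevts=-1 (assumed here to mean "process the whole file").
    """
    match = EVTSPLIT.match(entry)
    if match is None:
        return entry, 0, -1
    return match.group("fname"), int(match.group("first")), int(match.group("max"))


if __name__ == "__main__":
    entry = add_event_range("root://host.example//store/nano_1.root", 1000, 500)
    print(entry)
    print(parse_event_range(entry))
    print(parse_event_range("root://host.example//store/nano_2.root"))
```

Keeping entries without a suffix valid means existing config files written with file-based splitting would still parse.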