Describe the bug

HTEX has a poll_period parameter that defaults to 10ms. This is used in two places:

- in a tight loop in the interchange
- in a tight loop in the process_worker_pool
In the process worker pool, this poll period controls how often results are sent back to the interchange, and therefore how often the worker pool receives new tasks, so it has some effect on throughput.
In the interchange, it only affects how long the interchange takes to notice that kill_event has been set. That happens only in the (probably rare) circumstance that a manager with the wrong parsl/python version registers, and that registration is handled inside the loop that kill_event guards.
Meanwhile the loop generates "nothing happening in the interchange" log messages at quite a high volume - up to around 100 per second at the 10ms default - probably unnecessarily.
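As a rough illustration of the pattern (a hypothetical stand-in, not the actual interchange source), the loop below can only notice kill_event once per poll wakeup, so shutdown latency after a bad registration is bounded by poll_period, and every idle wakeup is an opportunity to log that nothing happened:

```python
import threading
import time

poll_period_ms = 10  # HTEX default
kill_event = threading.Event()

def interchange_style_loop():
    while not kill_event.is_set():
        # The real interchange polls zmq sockets with a poll_period
        # timeout; time.sleep stands in for an idle poll here.
        time.sleep(poll_period_ms / 1000.0)
        # ... handle socket activity; a manager registering with a
        # mismatched parsl/python version is the only thing that sets
        # kill_event, and it does so from inside this body ...

# Simulate the bad registration arriving 50ms in: the loop exits on
# its next wakeup, i.e. within one poll_period of the event being set.
threading.Timer(0.05, kill_event.set).start()
interchange_style_loop()
```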
This poll period could probably be made much larger (1 second?), depending on how quickly the interchange should be expected to exit after a bad manager registration.
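For example, assuming poll_period is exposed on the HighThroughputExecutor constructor in milliseconds (as the 10ms default above suggests), raising it would look something like this sketch:

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor

# Hypothetical configuration: raise poll_period from the 10ms default
# to 1s, trading up to ~1s of extra shutdown latency after a bad
# manager registration for ~100x fewer idle wakeups and log lines.
config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex",
            poll_period=1000,  # milliseconds
        )
    ]
)
```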
Other things to consider:

- This loop does not need to keep iterating at all on inactivity: nothing sets the kill event except the body of the loop itself, so a set event will be discovered exactly once, at the end of the loop body / start of the next iteration - see the sketch after this list.
- Using exceptions rather than an exit flag might be better (or worse).
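A minimal sketch of that first point, under the assumption that the loop blocks on its event source with no timeout (this is a redesign idea, not existing Parsl code - a blocking queue stands in for the zmq sockets): since only the loop body sets the flag, the check at the top of the next iteration is guaranteed to see it, so no timeout-driven wakeups are needed at all:

```python
import queue

events: queue.Queue = queue.Queue()

def quiet_loop():
    # Blocking get() with no timeout: zero idle wakeups, and so zero
    # "nothing happening" log lines.
    kill = False
    while not kill:
        msg = events.get()
        if msg == "bad-manager-registration":
            # Only this body ever requests shutdown, so the while
            # condition is re-checked immediately afterwards; no
            # poll_period is needed to bound shutdown latency.
            kill = True

events.put("bad-manager-registration")
quiet_loop()
```

The exception-based variant would raise from the same place and catch outside the loop, removing the flag at the cost of hiding the exit path in a try/except.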
To Reproduce
Turn on worker_debug in htex and watch interchange.log grow.

Expected behavior

This doesn't need to happen.

Environment

Parsl master branch at 92c5802
changed my mind: this poll period is also used as part of the communication protocol between the task puller thread (which gets tasks from the submit side and puts them on self.pending_task_queue) and the main thread, which only looks for new tasks when the poll loop in the main thread iterates. That happens either when something happens on the two other sockets, or when the poll period expires.
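A hedged sketch of that coupling (the name pending_task_queue follows the comment above; the rest is illustrative, not Parsl's actual code): the puller thread can enqueue tasks at any time, but the main thread only drains the queue when its poll wakes, so with nothing happening on the other sockets, task pickup latency is bounded by poll_period.

```python
import queue
import threading
import time

poll_period_ms = 10
pending_task_queue: queue.Queue = queue.Queue()

def task_puller():
    # Stands in for the thread receiving tasks from the submit side.
    for n in range(3):
        pending_task_queue.put(f"task-{n}")
        time.sleep(0.02)

def main_loop():
    deadline = time.time() + 0.2
    while time.time() < deadline:
        # The real main thread does a zmq poll; with no socket
        # activity it only wakes when poll_period expires, and only
        # then looks at pending_task_queue.
        time.sleep(poll_period_ms / 1000.0)
        while not pending_task_queue.empty():
            print("dispatching", pending_task_queue.get())

threading.Thread(target=task_puller).start()
main_loop()
```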