interchange poll period makes the interchange generate load unnecessarily #3659

Open
benclifford opened this issue Oct 25, 2024 · 1 comment

@benclifford
Collaborator

Describe the bug

HTEX has a poll_period parameter that defaults to 10ms.

This is used in two places:

  • in a tight loop in the interchange
  • in a tight loop in the process_worker_pool

In the process worker pool, this affects how often results are sent back to the interchange, and so how often the worker pool receives new tasks, and so has some effect on throughput.

In the interchange, this only affects how long it takes for the interchange to notice that kill_event has been set. That happens only in the (probably rare) circumstance that a manager with the wrong parsl/python version registers, and that registration is itself handled inside the loop that kill_event is guarding.

This generates "nothing happening in the interchange" log messages at quite a large volume, probably unnecessarily.

This poll period could probably be made much larger (1 second?) depending on how fast the interchange should be expected to exit after a bad manager registration.
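As a rough back-of-the-envelope sketch of the difference, the snippet below counts how many times an otherwise idle poll loop would time out for a given poll period. The helper name `idle_wakeups` is made up for illustration, and it assumes roughly one "nothing happening" log line per idle wakeup, which may not match the interchange exactly.

```python
def idle_wakeups(poll_period_ms: float, idle_seconds: float) -> int:
    """Number of poll timeouts hit while nothing else wakes the loop.

    Hypothetical helper: assumes the loop wakes exactly once per poll period.
    """
    return int(idle_seconds * 1000 // poll_period_ms)


# Compare the current 10ms default with the suggested 1 second, over one idle hour.
for period_ms in (10, 1000):
    print(period_ms, idle_wakeups(period_ms, idle_seconds=3600))
# 10 360000
# 1000 3600
```

So under these assumptions, going from 10ms to 1 second would cut idle wakeups (and the matching debug lines) by a factor of 100.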

Other things to consider:

  • This loop does not need to keep iterating at all on inactivity: nothing will set the kill event except the body of the loop, and that will be discovered once, at the end of the loop body/start of the next iteration.

  • Using exceptions rather than an exit flag might be better (or worse)
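A minimal sketch of both ideas together, not the interchange code: the loop blocks with no timeout until there is something to handle, and a bad registration unwinds the loop with an exception instead of setting a flag. `BadRegistration`, `handle_registration` and `main_loop` are hypothetical names, a `queue.Queue` stands in for the ZMQ sockets, and it assumes nothing else depends on the loop waking up periodically.

```python
import queue


class BadRegistration(Exception):
    """Hypothetical: raised when a manager registers with a mismatched version."""


def handle_registration(version: str, expected: str) -> None:
    # Stand-in for the version check done when a manager registers.
    if version != expected:
        raise BadRegistration(version)


def main_loop(registrations: "queue.Queue[str]", expected: str) -> tuple:
    """Handle registrations until a bad one arrives.

    Returns (number of good registrations handled, offending version).
    """
    handled = 0
    try:
        while True:
            # No timeout: blocks until there is something to do,
            # so an idle loop never iterates.
            version = registrations.get()
            handle_registration(version, expected)
            handled += 1
    except BadRegistration as e:
        # The exception replaces the exit flag: no later iteration is
        # needed to discover that the loop should stop.
        return handled, str(e)


q = queue.Queue()
for v in ("3.12", "3.12", "3.9"):
    q.put(v)
print(main_loop(q, expected="3.12"))
# (2, '3.9')
```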

To Reproduce
Turn on worker_debug in htex and watch interchange.log grow.

Expected behavior
This doesn't need to happen: interchange.log should not keep growing while the interchange is idle.

Environment
Parsl master branch at 92c5802

@benclifford
Collaborator Author

Changed my mind: this poll period is also used as part of the communication protocol between the task puller thread (which gets tasks from the submit side and puts them on self.pending_task_queue) and the main thread, which only looks for tasks when its poll loop iterates. That happens either when something happens on one of two other sockets, or when the poll period expires.
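To make that dependency concrete, here is a small self-contained sketch of the pattern, not the interchange code. It uses `selectors` and a socket pair in place of ZMQ, and `main_loop`, `demo` and the `pending` queue are hypothetical stand-ins. The puller thread enqueues a task without touching any socket, so the main loop only sees it after the poll timeout expires.

```python
import queue
import selectors
import socket
import threading


def main_loop(sock, pending, poll_period_s, stop_after):
    """Poll `sock`, then drain `pending`; return the tasks seen, in order."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    seen = []
    try:
        while len(seen) < stop_after:
            # Wakes on socket activity or when the poll period expires.
            for key, _events in sel.select(timeout=poll_period_s):
                key.fileobj.recv(4096)  # stand-in for handling other messages
            # Only at this point does the main thread look at the puller's queue.
            while True:
                try:
                    seen.append(pending.get_nowait())
                except queue.Empty:
                    break
    finally:
        sel.close()
    return seen


def demo(poll_period_s=0.05):
    a, b = socket.socketpair()
    pending = queue.Queue()
    # Stand-in for the task puller thread: it enqueues a task but sends
    # nothing on the socket, so only the timeout can wake the main loop.
    puller = threading.Thread(target=lambda: pending.put("task-1"))
    puller.start()
    try:
        return main_loop(a, pending, poll_period_s, stop_after=1)
    finally:
        puller.join()
        a.close()
        b.close()


print(demo())
```

In this sketch, raising `poll_period_s` delays task pickup by up to one poll period, and blocking forever (`timeout=None`) would mean the queued task is never noticed unless the puller also wakes the main loop some other way.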
