Abstract more block handling from HighThroughputExecutor and share with WorkQueue #2071
Conversation
The ExtremeScale case will never fire in the current ExtremeScale implementation, because an ExtremeScale executor is also a high throughput executor, and so the earlier htex case fires first. It is possible that ExtremeScale scaling was broken because of this. This patch should make it neither better nor worse, because it only eliminates dead code. When an executor is not an htex instance, no cases match, but no error is raised here, and so tasks_per_node is never assigned; the later use of tasks_per_node (line 206) is then an error. This entire case analysis is removed, and executor.workers_per_node is always used.
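A minimal sketch of the dead-code pattern described above (the executor classes are parsl's, but the dispatch function itself is hypothetical): because ExtremeScaleExecutor subclasses HighThroughputExecutor, the first isinstance check always matches and the ExtremeScale branch is unreachable.

```python
from parsl.executors import HighThroughputExecutor, ExtremeScaleExecutor

def tasks_per_node_for(executor):
    # Hypothetical dispatch, illustrating only the shape of the bug.
    if isinstance(executor, HighThroughputExecutor):
        # An ExtremeScaleExecutor is also a HighThroughputExecutor,
        # so this branch catches it too.
        tasks_per_node = executor.workers_per_node
    elif isinstance(executor, ExtremeScaleExecutor):
        # Dead code: never reached because of the branch above.
        tasks_per_node = executor.ranks_per_node
    # If no case matched, tasks_per_node was never assigned and this
    # return raises UnboundLocalError - the error described above.
    return tasks_per_node
```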
…nforced/documented in the executor base classes. This patch makes it obligatory for StatusHandlingExecutors to have that, on the assumption that StatusHandlingExecutor will generally become a scaling-capable base class.
which will handle provider based scaling, rather than htex knowing about it. This should then more easily allow the WorkQueue executor to implement provider/block based scaling. This might then merge with StatusHandlingExecutor entirely - as in, no executor would implement StatusHandlingExecutor except via this block based executor. This commit should be a fairly minimal attempt to extract code from htex and move it into a superclass, rather than any attempt to refactor the parent classes - that seems useful but should come in a subsequent commit. A sketch of the intended shape follows.
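As a rough illustration of that extraction (an assumed sketch, not the actual parsl code): the superclass owns the block bookkeeping and provider interaction, while each concrete executor only supplies its own launch command.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class BlockProviderExecutor(ABC):
    """Sketch: provider/block bookkeeping shared by htex and WorkQueue."""

    def __init__(self, provider) -> None:
        self.provider = provider
        self.blocks: Dict[str, Any] = {}  # block id -> provider job id

    @abstractmethod
    def _get_launch_command(self, block_id: str) -> str:
        """Each concrete executor builds its own worker launch command."""

    def scale_out(self, blocks: int = 1) -> List[str]:
        # Shared scaling logic: ask the provider for one job per block.
        block_ids = []
        for _ in range(blocks):
            block_id = str(len(self.blocks))
            launch_cmd = self._get_launch_command(block_id)
            self.blocks[block_id] = self.provider.submit(launch_cmd, 1)
            block_ids.append(block_id)
        return block_ids
```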
…ider-executor-abstraction
…for merging that class with status handling executor class
```diff
@@ -20,7 +20,7 @@
     UnsupportedFeatureError
 )

-from parsl.executors.status_handling import StatusHandlingExecutor
+from parsl.executors.status_handling import BlockProviderExecutor
```
I wonder if BlockManagingExecutor might be clearer
These changes are reasonable. I've added some comments about moving the scale_in logic, but I'm not sure if that makes sense here. These changes would also affect funcX when a new parsl release is made, so we ought to sync up there (@kylechard)
```diff
@@ -124,6 +154,54 @@ def _filter_scale_in_ids(self, to_kill, killed):
         # Filters first iterable by bool values in second
         return list(compress(to_kill, killed))

+    def scale_out(self, blocks: int = 1) -> List[str]:
```
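For readers unfamiliar with itertools.compress, a quick illustration of the filtering that _filter_scale_in_ids performs (the block ids and flags here are made up):

```python
from itertools import compress

to_kill = ['block0', 'block1', 'block2']
killed = [True, False, True]  # per-block: did the kill attempt succeed?

# Keep only the ids whose corresponding flag is truthy.
print(list(compress(to_kill, killed)))  # ['block0', 'block2']
```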
Why don't we move scale_in here? I'm guessing it's because WQ doesn't have the hold block and then cancel mechanism? Maybe we could add the simpler method here, and have HTEX override it.
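A minimal sketch of what this comment suggests (hypothetical; nothing like this is in the PR): a plain cancel-based default in the base class, which HighThroughputExecutor could override with its hold-block-then-cancel behaviour.

```python
from typing import List

class BlockProviderExecutor:
    # Assumes self.provider and self.blocks (block id -> job id) exist,
    # as in the scale_out sketch earlier.
    def scale_in(self, blocks: int) -> List[str]:
        block_ids = list(self.blocks)[:blocks]
        job_ids = [self.blocks.pop(bid) for bid in block_ids]
        self.provider.cancel(job_ids)  # simple path: just cancel the jobs
        return block_ids
```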
Scaling out and scaling in, despite having similar names, work really differently.
We've had long experience with block style scaling out across multiple generations of parsl executors - ipp, htex (+exex) and wq. We don't have much experience with managing scaling in, and the experience we've had so far has been pretty poor.
This PR attempts to factor out the thing we have long, positive experience with: scaling out.
I don't want it to try to have abstractions for things we do not understand well / cannot do well: scaling in.
parsl/executors/status_handling.py
Outdated

```python
    def _launch_block(self, block_id: str) -> Any:
        launch_cmd = self._get_launch_command(block_id)
        # if self.launch_cmd is None:
```
We can remove these comments since _get_launch_command is now doing these steps.
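A hedged sketch of the cleanup this comment asks for (the provider.submit call shape is assumed; this is not the exact PR code): the stale commented-out launch_cmd check moves into _get_launch_command, so _launch_block just fetches the command and submits it.

```python
    def _launch_block(self, block_id: str) -> Any:
        # _get_launch_command now builds and validates the command itself,
        # so no leftover launch_cmd checks are needed here.
        launch_cmd = self._get_launch_command(block_id)
        job_id = self.provider.submit(launch_cmd, 1)
        return job_id
```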
The low latency executor has been broken since at least commit 674291f ("Abstract more block handling from HighThroughputExecutor and share with WorkQueue (#2071)", Ben Clifford <[email protected]>, Wed Aug 25 12:26:38 2021 +0000), when _get_launch_command was introduced. So I think there are probably no active users.
Executors which want to use status handling should subclass the BlockProviderExecutor, formerly (PR #2071) known as the StatusHandlingExecutor. Code which performs status handling should only operate on BlockProviderExecutor instances - the status handling code shouldn't do anything to other ParslExecutor instances.
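A small sketch of the rule stated above (the polling function is hypothetical; the import path is the one introduced by this PR): status handling code guards on the executor type and leaves everything else alone.

```python
from parsl.executors.status_handling import BlockProviderExecutor

def poll_executors(executors):
    for executor in executors:
        if not isinstance(executor, BlockProviderExecutor):
            continue  # e.g. ThreadPoolExecutor: no blocks to manage
        executor.status()  # provider-backed block status polling
```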
Description
Several (most) executors past and present (ipp, htex, and in this PR, WorkQueue) make use of "blocks": a collection of worker processes launched using Providers, Launchers and Channels, usually into some kind of batch system. Not all executors use blocks, though.
The StatusHandlingExecutor provides a base class that executor implementations can use to get some common error handling behaviour related to blocks. This PR moves more block handling code from the high throughput executor into that base class, and renames it BlockProviderExecutor. The immediate intention is to allow the block handling code from the high throughput executor to be re-used by the WorkQueue executor.
This pushes subclasses of the (former) StatusHandlingExecutor to use blocks in the same way as the high throughput executor does. I think that is fine. An executor can still be implemented without any parsl support for block-style scaling (as happens with the ThreadPoolExecutor) and can still make use of providers, launchers and channels if desired.
After making that abstraction, this PR then uses BlockProviderExecutor to allow the workqueue executor to use the block system in a similar way to HighThroughputExecutor. This has been tested with the DESC project for a couple of months.
As with some uses of HighThroughputExecutor, but even more so due to per-task resource specification, when using WorkQueue the scaling code doesn't always know the number of simultaneous tasks that can be executed in a block.
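For context, here is an illustrative configuration (parameter values are made up) showing the block-provider pattern the description refers to: a provider/launcher pair that launches blocks of workers into a batch system on the executor's behalf.

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider
from parsl.launchers import SrunLauncher

config = Config(
    executors=[
        HighThroughputExecutor(
            label='htex',
            provider=SlurmProvider(
                launcher=SrunLauncher(),
                nodes_per_block=1,  # size of each block
                init_blocks=1,      # blocks to start with
                max_blocks=4,       # scaling ceiling for the strategy code
            ),
        )
    ]
)
```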
Type of change