
Abstract more block handling from HighThroughputExecutor and share with WorkQueue #2071

Merged

benclifford merged 28 commits into master from benc-block-provider-executor-abstraction on Aug 25, 2021

Conversation

@benclifford (Collaborator) commented May 17, 2021

Description

Several (most) executors past and present (ipp, htex, and, in this PR, WorkQueue) make use of "blocks": collections of worker processes launched using Providers, Launchers and Channels, usually into some kind of batch system. Not all executors use blocks, though.

The StatusHandlingExecutor provides a base class that executor implementations can subclass to gain some common error handling behaviour related to blocks.

This PR moves more block handling code from the high throughput executor into that base class, and renames it BlockProviderExecutor. The immediate intention is to allow the block handling code from the high throughput executor to be re-used by the WorkQueue executor.

This pushes subclasses of the (former) StatusHandlingExecutor to use blocks in the same way the high throughput executor does. I think that is fine: an executor can still be implemented without any parsl support for block-style scaling (as happens with the ThreadPoolExecutor), and can still make use of providers, launchers and channels if desired.

After making that abstraction, this PR then uses BlockProviderExecutor to allow the workqueue executor to use the block system in a similar way to HighThroughputExecutor. This has been tested with the DESC project for a couple of months.

As with some uses of HighThroughputExecutor, but even more so because of per-task resource specification, when using WorkQueue the scaling code does not always know how many tasks can execute simultaneously in a block.
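For context, here is a hypothetical configuration sketch of what block-based scaling for WorkQueue looks like from the user side; the SlurmProvider choice and all parameter values are illustrative assumptions, not taken from this PR:

# Sketch only: a WorkQueueExecutor given a provider, so that the shared
# block handling code can request blocks from a batch system.
from parsl.config import Config
from parsl.executors import WorkQueueExecutor
from parsl.providers import SlurmProvider

config = Config(
    executors=[
        WorkQueueExecutor(
            label="work_queue",
            provider=SlurmProvider(
                nodes_per_block=1,
                init_blocks=1,
                max_blocks=4,
            ),
        )
    ]
)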

Type of change

  • New feature (non-breaking change that adds functionality)

the ExtremeScale case will never fire in the current extreme scale implementation,
because an extreme scale executor is also a high throughput executor, and so the
earlier htex case will fire first.

It is possible that extreme scale scaling was broken because of this case. This
patch should make it neither better nor worse, because it only eliminates dead
code.
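A hypothetical sketch of the case analysis being described (not the exact parsl source) makes the problem visible:

# Because ExtremeScaleExecutor subclasses HighThroughputExecutor, the first
# isinstance test also matches extreme scale executors, so the elif branch
# below is unreachable dead code.
if isinstance(executor, HighThroughputExecutor):
    tasks_per_node = executor.workers_per_node
elif isinstance(executor, ExtremeScaleExecutor):  # never fires
    tasks_per_node = executor.ranks_per_node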

when an executor is not an htex instance, no cases match, but no error is raised
here, and so tasks_per_node is never assigned. The later use of tasks_per_node
(line 206) is then an error.

this entire case analysis is removed, and executor.workers_per_node is always used.
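After this change the strategy code needs no case analysis at all; sketched:

# Every scalable executor now provides workers_per_node, so nothing can
# fall through the old case analysis leaving tasks_per_node unassigned.
tasks_per_node = executor.workers_per_node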
…nforced/documented in the executor base classes.

This patch makes it obligatory for statushandlingexecutors to have that, on the assumption
that statushandlingexecutor will become generally a scaling-capable base class.
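A minimal sketch of one way to make the property obligatory, assuming the base class shape this PR introduces:

from abc import abstractmethod
from typing import Union

from parsl.executors.base import ParslExecutor


class BlockProviderExecutor(ParslExecutor):
    @property
    @abstractmethod
    def workers_per_node(self) -> Union[int, float]:
        # Subclasses must supply this so provider-based scaling can
        # estimate block capacity; it need not be exact (see the
        # WorkQueue caveat in the description).
        ...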
which will handle provider based scaling, rather than htex knowing about
it.

this should then more easily allow the workqueue executor to implement
provider/block based scaling

this might then merge with statushandlingexecutor entirely - as in,
no executor would implement the statushandlingexecutor except via this
block based executor

this commit should be a fairly minimal attempt to extract code from
htex and move it into a superclass, rather than any attempt to refactor
the parent classes - that seems useful but should be in a subsequent commit
…for merging that class with status handling executor class
@benclifford benclifford changed the title Block provider abstraction Abstract more block handling from HighThroughputExecutor May 17, 2021
@benclifford benclifford changed the title Abstract more block handling from HighThroughputExecutor Abstract more block handling from HighThroughputExecutor and share with WorkQueue May 17, 2021
@benclifford benclifford added this to the 1.2 milestone May 19, 2021
@benclifford benclifford requested a review from yadudoc May 19, 2021 10:10
@benclifford benclifford marked this pull request as ready for review May 19, 2021 10:58
@@ -20,7 +20,7 @@
     UnsupportedFeatureError
 )
 
-from parsl.executors.status_handling import StatusHandlingExecutor
+from parsl.executors.status_handling import BlockProviderExecutor
A Member commented:

I wonder if BlockManagingExecutor might be clearer

@yadudoc (Member) left a comment:
These changes are reasonable. I've added some comments about moving the scale_in logic, but not sure if that makes sense here. These changes would also affect funcX when a new parsl release is made, so we ought to do some sync up there (@kylechard)

@@ -124,6 +154,54 @@ def _filter_scale_in_ids(self, to_kill, killed):
         # Filters first iterable by bool values in second
         return list(compress(to_kill, killed))
 
+    def scale_out(self, blocks: int = 1) -> List[str]:
A Member commented:
Why don't we move scale_in here? I'm guessing it's because WQ doesn't have the hold block and then cancel mechanism? Maybe we could add the simpler method here, and have HTEX override it.

@benclifford (Collaborator, Author) replied:
Scaling out and scaling in, despite having similar names, work really differently.

We've had experience with block style scaling out across multiple generations of parsl executors - ipp, htex (+exex) and wq. We have much less experience with scaling in, and the experience we've had so far has been pretty poor.

This PR attempts to factor out the stuff that we had long, positive experience with: scaling out.

I don't want it to try to have abstractions for things we do not understand well / cannot do well: scaling in.
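For illustration, a hedged sketch of the shape of what does get shared, with scale-in deliberately left to subclasses; the bookkeeping details here are assumptions, not the exact parsl implementation:

from typing import Any, Dict, List


class BlockProviderExecutorSketch:
    # Sketch only: shared scale-out, per-executor scale-in.

    def __init__(self) -> None:
        self.blocks: Dict[str, Any] = {}  # block id -> provider job id

    def _launch_block(self, block_id: str) -> Any:
        raise NotImplementedError  # supplied by the concrete executor

    def scale_out(self, blocks: int = 1) -> List[str]:
        # The well-understood direction: ask the provider for new blocks
        # and record which provider job backs each block id.
        block_ids = []
        for _ in range(blocks):
            block_id = str(len(self.blocks))
            job_id = self._launch_block(block_id)
            self.blocks[block_id] = job_id
            block_ids.append(block_id)
        return block_ids

    def scale_in(self, blocks: int) -> List[str]:
        # Deliberately not abstracted here: each executor decides how to
        # choose and drain blocks, which is where experience is poor.
        raise NotImplementedError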


def _launch_block(self, block_id: str) -> Any:
    launch_cmd = self._get_launch_command(block_id)
    # if self.launch_cmd is None:
A Member commented:
We can remove these comments since _get_launch_command is now doing these steps.
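Applying that suggestion, _launch_block reduces to something like the following sketch; the ScalingFailed error handling and logging are assumptions about the surrounding code, not quoted from the PR:

def _launch_block(self, block_id: str) -> Any:
    # _get_launch_command now builds the per-block command, so the old
    # inline launch_cmd handling (and its commented-out remnants) can go.
    launch_cmd = self._get_launch_command(block_id)
    job_id = self.provider.submit(launch_cmd, 1)
    logger.debug("Launched block %s -> job %s", block_id, job_id)
    if not job_id:
        raise ScalingFailed(self.provider.label,
                            "Attempt to provision nodes via provider failed")
    return job_id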

@benclifford benclifford merged commit 674291f into master Aug 25, 2021
@benclifford benclifford deleted the benc-block-provider-executor-abstraction branch August 25, 2021 12:26
benclifford added a commit that referenced this pull request Jan 19, 2023
commit 674291f
Author: Ben Clifford <[email protected]>
Date:   Wed Aug 25 12:26:38 2021 +0000

    Abstract more block handling from HighThroughputExecutor and share with WorkQueue (#2071)


when _get_launch_command was introduced

removes 5-10 errors
benclifford added a commit that referenced this pull request Jan 25, 2023, quoting the same merge commit 674291f as the Jan 19 entry above.
benclifford added a commit that referenced this pull request Jan 31, 2023
The low latency executor has been broken since at least

commit 674291f
Author: Ben Clifford <[email protected]>
Date:   Wed Aug 25 12:26:38 2021 +0000

    Abstract more block handling from HighThroughputExecutor and share with WorkQueue (#2071)
when _get_launch_command was introduced.

So I think there are probably no active users.
benclifford added a commit that referenced this pull request Aug 7, 2023
Executors which want to use status handling should subclass the
BlockProviderExecutor, formerly (PR #2071) known as the
StatusHandlingExecutor.

Code which performs status handling should only operate on
BlockProviderExecutor instances - the status handling code shouldn't
do anything to other ParslExecutor instances.
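A minimal sketch of the guard this implies, assuming scaling code that iterates over a DataFlowKernel's executors (the dfk variable is hypothetical context):

from parsl.executors.status_handling import BlockProviderExecutor

# Only executors that opted in by subclassing BlockProviderExecutor are
# touched; any other ParslExecutor instance is skipped entirely.
for executor in dfk.config.executors:
    if not isinstance(executor, BlockProviderExecutor):
        continue
    status = executor.status()
    # ... provider-based scaling decisions happen only here ...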
yadudoc pushed a commit that referenced this pull request Aug 7, 2023, with the same message as the commit above.