Abstract more block handling from HighThroughputExecutor and share with WorkQueue #2071
Conversation
The ExtremeScale case will never fire in the current ExtremeScale implementation, because an ExtremeScale executor is also a high throughput executor, and so the earlier htex case fires first. It is possible that ExtremeScale scaling was broken because of this. This patch should make it neither better nor worse, because it only eliminates dead code. When an executor is not an htex instance, no cases match, but no error is raised here, and so tasks_per_node is never assigned; the later use of tasks_per_node (line 206) is then an error. This entire case analysis is removed, and executor.workers_per_node is always used.
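A minimal sketch of the dead-code pattern described above (the executor classes are parsl's, but the dispatch function itself is hypothetical): because ExtremeScaleExecutor subclasses HighThroughputExecutor, the first isinstance check always matches and the ExtremeScale branch is unreachable.

```python
from parsl.executors import HighThroughputExecutor, ExtremeScaleExecutor

def tasks_per_node_for(executor):
    # Hypothetical dispatch, illustrating only the shape of the bug.
    if isinstance(executor, HighThroughputExecutor):
        # An ExtremeScaleExecutor is also a HighThroughputExecutor,
        # so this branch catches it too.
        tasks_per_node = executor.workers_per_node
    elif isinstance(executor, ExtremeScaleExecutor):
        # Dead code: never reached because of the branch above.
        tasks_per_node = executor.ranks_per_node
    # If no case matched, tasks_per_node was never assigned and this
    # return raises UnboundLocalError - the error described above.
    return tasks_per_node
```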
…nforced/documented in the executor base classes. This patch makes it obligatory for StatusHandlingExecutors to have that, on the assumption that StatusHandlingExecutor will generally become a scaling-capable base class.
which will handle provider based scaling, rather than htex knowing about it. This should then more easily allow the WorkQueue executor to implement provider/block based scaling. This might then merge with StatusHandlingExecutor entirely - as in, no executor would implement StatusHandlingExecutor except via this block based executor. This commit should be a fairly minimal attempt to extract code from htex and move it into a superclass, rather than any attempt to refactor the parent classes - that seems useful but should come in a subsequent commit. A sketch of the intended shape follows.
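As a rough illustration of that extraction (an assumed sketch, not the actual parsl code): the superclass owns the block bookkeeping and provider interaction, while each concrete executor only supplies its own launch command.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class BlockProviderExecutor(ABC):
    """Sketch: provider/block bookkeeping shared by htex and WorkQueue."""

    def __init__(self, provider) -> None:
        self.provider = provider
        self.blocks: Dict[str, Any] = {}  # block id -> provider job id

    @abstractmethod
    def _get_launch_command(self, block_id: str) -> str:
        """Each concrete executor builds its own worker launch command."""

    def scale_out(self, blocks: int = 1) -> List[str]:
        # Shared scaling logic: ask the provider for one job per block.
        block_ids = []
        for _ in range(blocks):
            block_id = str(len(self.blocks))
            launch_cmd = self._get_launch_command(block_id)
            self.blocks[block_id] = self.provider.submit(launch_cmd, 1)
            block_ids.append(block_id)
        return block_ids
```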
…ider-executor-abstraction
…for merging that class with status handling executor class
```diff
@@ -20,7 +20,7 @@
     UnsupportedFeatureError
 )

-from parsl.executors.status_handling import StatusHandlingExecutor
+from parsl.executors.status_handling import BlockProviderExecutor
```
I wonder if BlockManagingExecutor might be clearer
These changes are reasonable. I've added some comments about moving the scale_in logic, but I'm not sure if that makes sense here. These changes would also affect funcX when a new parsl release is made, so we ought to sync up there (@kylechard)
```diff
@@ -124,6 +154,54 @@ def _filter_scale_in_ids(self, to_kill, killed):
         # Filters first iterable by bool values in second
         return list(compress(to_kill, killed))

+    def scale_out(self, blocks: int = 1) -> List[str]:
```
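For readers unfamiliar with itertools.compress, a quick illustration of the filtering that _filter_scale_in_ids performs (the block ids and flags here are made up):

```python
from itertools import compress

to_kill = ['block0', 'block1', 'block2']
killed = [True, False, True]  # per-block: did the kill attempt succeed?

# Keep only the ids whose corresponding flag is truthy.
print(list(compress(to_kill, killed)))  # ['block0', 'block2']
```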
Why don't we move scale_in here? I'm guessing it's because WQ doesn't have the hold block and then cancel mechanism? Maybe we could add the simpler method here, and have HTEX override it.
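A minimal sketch of what this comment suggests (hypothetical; nothing like this is in the PR): a plain cancel-based default in the base class, which HighThroughputExecutor could override with its hold-block-then-cancel behaviour.

```python
from typing import List

class BlockProviderExecutor:
    # Assumes self.provider and self.blocks (block id -> job id) exist,
    # as in the scale_out sketch earlier.
    def scale_in(self, blocks: int) -> List[str]:
        block_ids = list(self.blocks)[:blocks]
        job_ids = [self.blocks.pop(bid) for bid in block_ids]
        self.provider.cancel(job_ids)  # simple path: just cancel the jobs
        return block_ids
```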
Scaling out and scaling in, despite having similar names, work really differently.
We've had long experience with block style scaling out across multiple generations of parsl executors - ipp, htex (+exex) and wq. We don't have much experience with managing scaling in, and the experience we've had so far has been pretty poor.
This PR attempts to factor out the thing we have long, positive experience with: scaling out.
I don't want it to try to have abstractions for things we do not understand well / cannot do well: scaling in.
parsl/executors/status_handling.py
Outdated

```python
    def _launch_block(self, block_id: str) -> Any:
        launch_cmd = self._get_launch_command(block_id)
        # if self.launch_cmd is None:
```
We can remove these comments since _get_launch_command is now doing these steps.
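A hedged sketch of the cleanup this comment asks for (the provider.submit call shape is assumed; this is not the exact PR code): the stale commented-out launch_cmd check moves into _get_launch_command, so _launch_block just fetches the command and submits it.

```python
    def _launch_block(self, block_id: str) -> Any:
        # _get_launch_command now builds and validates the command itself,
        # so no leftover launch_cmd checks are needed here.
        launch_cmd = self._get_launch_command(block_id)
        job_id = self.provider.submit(launch_cmd, 1)
        return job_id
```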
The low latency executor has been broken since at least commit 674291f ("Abstract more block handling from HighThroughputExecutor and share with WorkQueue (#2071)", Ben Clifford <[email protected]>, Wed Aug 25 12:26:38 2021 +0000), when _get_launch_command was introduced. So I think there are probably no active users.
Executors which want to use status handling should subclass the BlockProviderExecutor, formerly (PR #2071) known as the StatusHandlingExecutor. Code which performs status handling should only operate on BlockProviderExecutor instances - the status handling code shouldn't do anything to other ParslExecutor instances.
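A small sketch of the rule stated above (the polling function is hypothetical; the import path is the one introduced by this PR): status handling code guards on the executor type and leaves everything else alone.

```python
from parsl.executors.status_handling import BlockProviderExecutor

def poll_executors(executors):
    for executor in executors:
        if not isinstance(executor, BlockProviderExecutor):
            continue  # e.g. ThreadPoolExecutor: no blocks to manage
        executor.status()  # provider-backed block status polling
```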
Description
Several (most) executors past and present (ipp, htex, and in this PR, WorkQueue) make use of "blocks": a collection of worker processes launched using Providers, Launchers and Channels, usually into some kind of batch system. Not all executors use blocks, though.
The StatusHandlingExecutor provides a base class that executor implementations can use to get some common error handling behaviour related to blocks. This PR moves more block handling code from the high throughput executor into that base class, and renames it BlockProviderExecutor. The immediate intention is to allow the block handling code from the high throughput executor to be re-used by the WorkQueue executor.
This pushes subclasses of the (former) StatusHandlingExecutor to use blocks in the same way as the high throughput executor does. I think that is fine. An executor can still be implemented without any parsl support for block-style scaling (as happens with the ThreadPoolExecutor) and can still make use of providers, launchers and channels if desired.
After making that abstraction, this PR then uses BlockProviderExecutor to allow the workqueue executor to use the block system in a similar way to HighThroughputExecutor. This has been tested with the DESC project for a couple of months.
As with some uses of HighThroughputExecutor, but even more so due to per-task resource specification, when using WorkQueue the scaling code doesn't always know the number of simultaneous tasks that can be executed in a block.
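For context, here is an illustrative configuration (parameter values are made up) showing the block-provider pattern the description refers to: a provider/launcher pair that launches blocks of workers into a batch system on the executor's behalf.

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider
from parsl.launchers import SrunLauncher

config = Config(
    executors=[
        HighThroughputExecutor(
            label='htex',
            provider=SlurmProvider(
                launcher=SrunLauncher(),
                nodes_per_block=1,  # size of each block
                init_blocks=1,      # blocks to start with
                max_blocks=4,       # scaling ceiling for the strategy code
            ),
        )
    ]
)
```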
Type of change