
Implement resource hints for the SlurmProvider #1217

Merged 6 commits on Aug 27, 2019

Conversation

@annawoodard (Collaborator) commented Aug 23, 2019

Partially addresses #942.

This commit adds the `cores_per_node` and `mem_per_node` keyword args to
the `SlurmProvider`. These default to `None`, and behavior is not modified
in the default case. Setting either has three effects. First, it
modifies the Slurm submit script to request the appropriate cores and/or
memory. Second, it sets the environment variables `PARSL_MEMORY_GB` and
`PARSL_CORES` on the node. Finally, a `workers_per_node` attribute is
added to the `HighThroughputExecutor`, which is calculated from the
resource hints when they are available. This is read by the scaling
strategy, enabling a more accurate calculation when scaling resources
up and down. An example configuration, tested on Midway, is provided
below. It requests four 1-core workers, each with 3 GB of memory.

```python
from parsl.config import Config
from parsl.providers import SlurmProvider
from parsl.addresses import address_by_hostname
from parsl.executors import HighThroughputExecutor

config = Config(
    executors=[
        HighThroughputExecutor(
            cores_per_worker=1,
            mem_per_worker=3,
            address=address_by_hostname(),
            provider=SlurmProvider(
                'broadwl',
                nodes_per_block=1,
                init_blocks=1,
                min_blocks=1,
                max_blocks=1,
                mem_per_node=12,
                cores_per_node=4,
                exclusive=False
            ),
        )
    ],
)
```
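
For reference, the four workers in this example presumably come from taking the minimum of the core-limited and memory-limited counts. A small sketch of that arithmetic (the exact formula is my reading of the PR, not quoted from it):

```python
# Hints from the example config above.
cores_per_node, mem_per_node = 4, 12      # provider resource hints
cores_per_worker, mem_per_worker = 1, 3   # executor worker settings

# Assumed calculation: the node supports whichever limit is hit first.
workers_per_node = min(cores_per_node // cores_per_worker,   # 4, core-limited
                       int(mem_per_node // mem_per_worker))  # 4, memory-limited
assert workers_per_node == 4
```
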
@annawoodard force-pushed the support-env-resource-spec-#942 branch from 90f13e6 to bb00a12 on August 23, 2019 20:33
@benclifford (Collaborator) commented Aug 25, 2019

CI is failing because mypy is upset that:

error: "ExecutionProvider" has no attribute "mem_per_node"
error: "ExecutionProvider" has no attribute "cores_per_node"

mypy doesn't infer that those fields will exist, even though the access is guarded by hasattr; see mypy issue 1424 for context.

Discussion on Slack leads to either:

  • defining those fields as None in ExecutionProvider (and then hasattr isn't needed at all),

or

  • replacing the hasattr test with x = getattr(self, 'cores_per_node', None), which will always work.

Either of those approaches seems fine to me. The first may be better for capturing the provider API in one place: the definition of ExecutionProvider.
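
A minimal sketch of the two options (abbreviated; not the actual parsl source):

```python
from typing import Optional

# Option 1: declare the fields (defaulting to None) on the base class so
# mypy knows they always exist; the hasattr() guard can then be dropped.
class ExecutionProvider:
    cores_per_node: Optional[int] = None
    mem_per_node: Optional[int] = None

# Option 2: leave the base class alone and replace the hasattr() test with
# getattr and a default, which type-checks either way:
#
#     cores = getattr(self.provider, 'cores_per_node', None)
#     if cores is not None:
#         ...
```
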

@ZhuozhaoLi (Contributor) commented

`workers_per_node` is computed at the executor. I think the manager in `process_worker_pool` uses another mechanism to determine the number of workers (it checks the entire node's resources with psutil). I feel like `workers_per_node` should also be passed to `process_worker_pool` so that the manager launches the correct number of workers?

@annawoodard (Collaborator, Author) commented

@ZhuozhaoLi The `process_worker_pool` can now be passed the cores available to it via the environment variable `PARSL_CORES` (see #1207).
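
As a rough sketch of that mechanism (assumed shape; see #1207 for the real implementation, which uses psutil for whole-node detection), the worker pool can prefer the environment hint over probing the node:

```python
import multiprocessing
import os

def available_cores() -> int:
    # Prefer the PARSL_CORES hint set by the provider, since on a shared
    # (non-exclusive) node the whole-node count would overstate our share.
    hint = os.environ.get('PARSL_CORES')
    if hint is not None:
        return int(hint)
    # Fall back to whole-node detection (the real pool uses psutil).
    return multiprocessing.cpu_count()
```
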

@benclifford (Collaborator) commented

I think it's important to be clear which config fields mean "I am asking parsl to request that this attribute be true" vs. "I am asserting that, because of the nature of the resource and other things specified that parsl doesn't necessarily know about, this fact is true". This has been an ongoing confusion for me with e.g. MPI and batch systems for two decades, and I see other users encountering the same confusion when I talk to them.

PARSL_CORES in master now is a way to assert to parsl a truth that you know but that parsl has no other way to discover.

I'm not sure which of the above categories each of the fields this PR adds falls into.
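
As an illustration of the distinction (my interpretation, not something stated in the PR): `mem_per_node`/`cores_per_node` look like requests that shape what the generated submit script asks Slurm for, while `PARSL_CORES` is an assertion that the workers consume as fact:

```python
import os

# "Request": handed to the provider, these values are translated into the
# generated Slurm submit script, asking the scheduler for resources.
provider_hints = {'mem_per_node': 12, 'cores_per_node': 4}

# "Assert": exported on the allocated node, this tells the worker pool a
# fact to trust instead of probing the (possibly shared) whole node.
os.environ['PARSL_CORES'] = str(provider_hints['cores_per_node'])
```
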

@ZhuozhaoLi (Contributor) commented

I see. I hadn't realized the other PR existed.
