
Implement resource hints for the SlurmProvider #1217

Merged 6 commits on Aug 27, 2019

Conversation

@annawoodard (Collaborator) commented Aug 23, 2019

Partially addresses #942.

This commit adds the `cores_per_node` and `mem_per_node` keyword args to
the `SlurmProvider`. These default to `None`, and behavior is not modified
in the default case. Setting either has three effects. First, it
modifies the Slurm submit script to request the appropriate cores and/or
memory. Second, it sets the environment variables `PARSL_MEMORY_GB` and
`PARSL_CORES` on the node. Finally, a `workers_per_node` attribute is
added to the `HighThroughputExecutor`, which is calculated from the
resource hints when they are available. This is read by the scaling
strategy, enabling a more accurate calculation when scaling resources
up and down. An example configuration, tested on Midway, is provided
below. It requests four 1-core workers, each with 3 GB of memory.

```python
from parsl.config import Config
from parsl.providers import SlurmProvider
from parsl.addresses import address_by_hostname
from parsl.executors import HighThroughputExecutor

config = Config(
    executors=[
        HighThroughputExecutor(
            cores_per_worker=1,
            mem_per_worker=3,
            address=address_by_hostname(),
            provider=SlurmProvider(
                'broadwl',
                nodes_per_block=1,
                init_blocks=1,
                min_blocks=1,
                max_blocks=1,
                mem_per_node=12,
                cores_per_node=4,
                exclusive=False
            ),
        )
    ],
)
```
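
For reference, the four workers in this example presumably come from taking the minimum of the core-limited and memory-limited counts. A small sketch of that arithmetic (the exact formula is my reading of the PR, not quoted from it):

```python
# Hints from the example config above.
cores_per_node, mem_per_node = 4, 12      # provider resource hints
cores_per_worker, mem_per_worker = 1, 3   # executor worker settings

# Assumed calculation: the node supports whichever limit is hit first.
workers_per_node = min(cores_per_node // cores_per_worker,   # 4, core-limited
                       int(mem_per_node // mem_per_worker))  # 4, memory-limited
assert workers_per_node == 4
```
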
@annawoodard force-pushed the support-env-resource-spec-#942 branch from 90f13e6 to bb00a12 on August 23, 2019 20:33
@benclifford (Collaborator) commented Aug 25, 2019

CI is failing because mypy is upset that:

error: "ExecutionProvider" has no attribute "mem_per_node"
error: "ExecutionProvider" has no attribute "cores_per_node"

mypy doesn't infer that those fields will exist, even though the access is guarded by hasattr; see mypy issue 1424 for context.

Discussion on Slack leads to either:

  • defining those fields as None in ExecutionProvider (and then hasattr isn't needed at all),

or

  • replacing the hasattr test with x = getattr(self, 'cores_per_node', None), which will always work.

Either of those approaches seems fine to me. The first may be better for capturing the provider API in one place: the definition of ExecutionProvider.
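
A minimal sketch of the two options (abbreviated; not the actual parsl source):

```python
from typing import Optional

# Option 1: declare the fields (defaulting to None) on the base class so
# mypy knows they always exist; the hasattr() guard can then be dropped.
class ExecutionProvider:
    cores_per_node: Optional[int] = None
    mem_per_node: Optional[int] = None

# Option 2: leave the base class alone and replace the hasattr() test with
# getattr and a default, which type-checks either way:
#
#     cores = getattr(self.provider, 'cores_per_node', None)
#     if cores is not None:
#         ...
```
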

@ZhuozhaoLi (Contributor) commented

`workers_per_node` is computed at the executor. I think the manager in `process_worker_pool` uses another mechanism to determine the number of workers (it checks the entire node's resources with psutil). I feel like `workers_per_node` should also be passed to `process_worker_pool` so that the manager launches the correct number of workers?

@annawoodard (Collaborator, Author) commented

@ZhuozhaoLi The `process_worker_pool` can now be passed the cores available to it via the environment variable `PARSL_CORES` (see #1207).
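
As a rough sketch of that mechanism (assumed shape; see #1207 for the real implementation, which uses psutil for whole-node detection), the worker pool can prefer the environment hint over probing the node:

```python
import multiprocessing
import os

def available_cores() -> int:
    # Prefer the PARSL_CORES hint set by the provider, since on a shared
    # (non-exclusive) node the whole-node count would overstate our share.
    hint = os.environ.get('PARSL_CORES')
    if hint is not None:
        return int(hint)
    # Fall back to whole-node detection (the real pool uses psutil).
    return multiprocessing.cpu_count()
```
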

@benclifford (Collaborator) commented

I think it's important to be clear which config fields mean "I am asking parsl to request that this attribute be true" vs. "I am asserting that, because of the nature of the resource and other things specified that parsl doesn't necessarily know about, this fact is true". This has been an ongoing confusion for me with e.g. MPI and batch systems for two decades, and I see other users encountering the same confusion when I talk to them.

PARSL_CORES in master now is a way to assert to parsl a truth that you know but that parsl has no other way to discover.

I'm not sure which of the above categories each of the fields this PR adds falls into.
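
As an illustration of the distinction (my interpretation, not something stated in the PR): `mem_per_node`/`cores_per_node` look like requests that shape what the generated submit script asks Slurm for, while `PARSL_CORES` is an assertion that the workers consume as fact:

```python
import os

# "Request": handed to the provider, these values are translated into the
# generated Slurm submit script, asking the scheduler for resources.
provider_hints = {'mem_per_node': 12, 'cores_per_node': 4}

# "Assert": exported on the allocated node, this tells the worker pool a
# fact to trust instead of probing the (possibly shared) whole node.
os.environ['PARSL_CORES'] = str(provider_hints['cores_per_node'])
```
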

@ZhuozhaoLi (Contributor) commented

I see. I hadn't realized the other PR existed.
