Skip to content

Commit

Permalink
Remove defaults for Manager init, forcing keywords to be specified, a…
Browse files Browse the repository at this point in the history
…nd perhaps revealign a bug that start_method is unused (#2973)

This PR removes theoretically-always-unused argument defaults in the process worker pool manager, and forces arguments to be supplied as kwargs.

These defaults have two downsides:

i) users reading the code might believe the defaults will be used as end-Parsl-user defaults. However, generally there will be up to two other layers of defaults that will override the defaults removed by this PR: defaults specified in the argparser of process_worker_pool; and defaults specified in the constructor to the HighThroughputExecutor object.

These defaults, then, are misleading, as they are defaults only in the presence of a coding error such as ii) below.

ii) in the presence of a coding error adding a new parameter, where the new parameter is not wired all the way through, rather than raise an error about a missing parameter, the otherwise-unused default is used. This was a practical problem demonstrated by the threads/spawn/fork start method parameter introduced in #2433 and removed as broken in #2980.

Switching to mandatory kwargs can help with accidentally misarranged parameters in the construction of a Manager object, a similar but different coding error to ii) above.
  • Loading branch information
benclifford authored Dec 12, 2023
1 parent 982fdca commit 15290a9
Showing 1 changed file with 23 additions and 25 deletions.
48 changes: 23 additions & 25 deletions parsl/executors/high_throughput/process_worker_pool.py
Original file line number Diff line number Diff line change
Expand Up @@ -48,22 +48,22 @@ class Manager:
| | IPC-Qeueues
"""
def __init__(self,
addresses="127.0.0.1",
address_probe_timeout=30,
task_port="50097",
result_port="50098",
cores_per_worker=1,
mem_per_worker=None,
max_workers=float('inf'),
prefetch_capacity=0,
uid=None,
block_id=None,
heartbeat_threshold=120,
heartbeat_period=30,
poll_period=10,
cpu_affinity=False,
available_accelerators: Sequence[str] = ()):
def __init__(self, *,
addresses,
address_probe_timeout,
task_port,
result_port,
cores_per_worker,
mem_per_worker,
max_workers,
prefetch_capacity,
uid,
block_id,
heartbeat_threshold,
heartbeat_period,
poll_period,
cpu_affinity,
available_accelerators: Sequence[str]):
"""
Parameters
----------
Expand All @@ -72,7 +72,7 @@ def __init__(self,
address_probe_timeout : int
Timeout in seconds for the address probe to detect viable addresses
to the interchange. Default : 30s
to the interchange.
uid : str
string unique identifier
Expand All @@ -82,43 +82,41 @@ def __init__(self,
cores_per_worker : float
cores to be assigned to each worker. Oversubscription is possible
by setting cores_per_worker < 1.0. Default=1
by setting cores_per_worker < 1.0.
mem_per_worker : float
GB of memory required per worker. If this option is specified, the node manager
will check the available memory at startup and limit the number of workers such that
the there's sufficient memory for each worker. If set to None, memory on node is not
considered in the determination of workers to be launched on node by the manager.
Default: None
max_workers : int
caps the maximum number of workers that can be launched.
default: infinity
prefetch_capacity : int
Number of tasks that could be prefetched over available worker capacity.
When there are a few tasks (<100) or when tasks are long running, this option should
be set to 0 for better load balancing. Default is 0.
be set to 0 for better load balancing.
heartbeat_threshold : int
Seconds since the last message from the interchange after which the
interchange is assumed to be un-available, and the manager initiates shutdown. Default:120s
interchange is assumed to be un-available, and the manager initiates shutdown.
Number of seconds since the last message from the interchange after which the worker
assumes that the interchange is lost and the manager shuts down. Default:120
assumes that the interchange is lost and the manager shuts down.
heartbeat_period : int
Number of seconds after which a heartbeat message is sent to the interchange, and workers
are checked for liveness.
poll_period : int
Timeout period used by the manager in milliseconds. Default: 10ms
Timeout period used by the manager in milliseconds.
cpu_affinity : str
Whether or how each worker should force its affinity to different CPUs
available_accelerators: list of str
List of accelerators available to the workers. Default: Empty list
List of accelerators available to the workers.
"""

Expand Down

0 comments on commit 15290a9

Please sign in to comment.