Scaling strategy infinite error loop for workerless executors #3726

Open
pdobbelaere opened this issue Dec 20, 2024 · 1 comment

@pdobbelaere

Describe the bug
I stupidly configured Parsl so my htex has 0 workers (htex._workers_per_node == 0). This leads to the following logs:

parsl.jobs.strategy:214 _general_strategy DEBUG: Slot ratio calculation: active_slots = 0, active_tasks = 3
parsl.jobs.strategy:217 _general_strategy DEBUG: Executor HighThroughputExecutor has 3 active tasks, 0/1 running/pending blocks, and 0 connected workers
parsl.jobs.strategy:266 _general_strategy DEBUG: Strategy case 2: slots are overloaded - (slot_ratio = active_slots/active_tasks) < parallelism
parsl.jobs.strategy:275 _general_strategy DEBUG: Strategy case 2b: active_blocks 1 < max_blocks 2 so scaling out

htex never launches any workers, so the strategy keeps asking for more blocks. Because tasks_per_node == htex.workers_per_node == 0, the block calculation divides by zero and we get the following error:

parsl.utils:352 make_callback ERROR: Callback threw an exception - logging and proceeding anyway
Traceback (most recent call last):
  File "/kyukon/data/gent/vo/000/gvo00003/vsc43633/micromamba/envs/test_psiflow/lib/python3.10/site-packages/parsl/utils.py", line 350, in make_callback
    self.callback(*self.cb_args)
  File "/kyukon/data/gent/vo/000/gvo00003/vsc43633/micromamba/envs/test_psiflow/lib/python3.10/site-packages/parsl/jobs/job_status_poller.py", line 22, in poll
    self._strategy.strategize(self._executors)
  File "/kyukon/data/gent/vo/000/gvo00003/vsc43633/micromamba/envs/test_psiflow/lib/python3.10/site-packages/parsl/jobs/strategy.py", line 163, in _strategy_simple
    self._general_strategy(executors, strategy_type='simple')
  File "/kyukon/data/gent/vo/000/gvo00003/vsc43633/micromamba/envs/test_psiflow/lib/python3.10/site-packages/parsl/process_loggers.py", line 26, in wrapped
    r = func(*args, **kwargs)
  File "/kyukon/data/gent/vo/000/gvo00003/vsc43633/micromamba/envs/test_psiflow/lib/python3.10/site-packages/parsl/jobs/strategy.py", line 277, in _general_strategy
    excess_blocks = math.ceil(float(excess_slots) / (tasks_per_node * nodes_per_block))
ZeroDivisionError: float division by zero

Nothing (useful) happens and every call to strategy fails with this same error.
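
For illustration, the failing expression from the traceback can be reproduced in isolation. This is only a sketch: the function name `excess_blocks_needed` and the input values are made up for the example, and only the arithmetic is copied from the traceback line in `strategy.py`.

```python
import math


def excess_blocks_needed(excess_slots, tasks_per_node, nodes_per_block):
    # Same expression as the line shown in the traceback above.
    # The function name and the arguments below are illustrative assumptions.
    return math.ceil(float(excess_slots) / (tasks_per_node * nodes_per_block))


# With at least one worker per node the calculation is well defined.
print(excess_blocks_needed(3, 1, 1))

# With zero workers per node the denominator is 0, so every call raises.
try:
    excess_blocks_needed(3, 0, 1)
except ZeroDivisionError as exc:
    print(f"ZeroDivisionError: {exc}")
```

Since the inputs do not change between polls, each strategy call hits the same exception, which matches the repeated error in the logs.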

I have two questions:

  • Is there a use case for an executor without any workers? Otherwise, you could catch and throw during initialisation.
  • If the scaling strategy fails repeatedly, maybe Parsl should give up after a fixed number of retries instead of retrying indefinitely and getting stuck?
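
To make the second question concrete, here is a rough sketch of a callback wrapper that gives up after a number of consecutive failures. It is not Parsl code: the class `GiveUpAfterRepeatedFailures`, its `max_failures` parameter, and `broken_strategy` are all hypothetical names used only for this example.

```python
class GiveUpAfterRepeatedFailures:
    """Hypothetical wrapper: swallow failures until too many occur in a row."""

    def __init__(self, callback, max_failures=3):
        self.callback = callback
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def __call__(self, *args):
        try:
            result = self.callback(*args)
        except Exception as exc:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                # Escalate instead of logging and proceeding forever.
                raise RuntimeError(
                    f"callback failed {self.consecutive_failures} times in a row; giving up"
                ) from exc
            return None
        # Any success resets the counter.
        self.consecutive_failures = 0
        return result


def broken_strategy():
    # Stand-in for a strategy call that always fails the same way.
    return 1 / 0


poll = GiveUpAfterRepeatedFailures(broken_strategy, max_failures=3)
poll()  # first failure, swallowed
poll()  # second failure, swallowed
try:
    poll()  # third consecutive failure, escalated
except RuntimeError as exc:
    print(exc)
```

Counting consecutive failures rather than total failures is a design choice in this sketch: an occasional transient error would not stop scaling, while a deterministic error like the one above would.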
@benclifford
Collaborator

I don't think there's any use case for specifying workers = 0.

So

> you could catch and throw during initialisation
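
A minimal sketch of what such an initialisation check might look like. `WorkerConfig` is a hypothetical stand-in class, not Parsl's executor; only the idea of rejecting a zero worker count up front is taken from the discussion.

```python
class WorkerConfig:
    """Hypothetical stand-in for an executor configuration (not Parsl's class)."""

    def __init__(self, workers_per_node: int = 1):
        # Fail fast at construction time instead of dividing by zero later.
        if workers_per_node < 1:
            raise ValueError(
                f"workers_per_node must be at least 1, got {workers_per_node}"
            )
        self.workers_per_node = workers_per_node


try:
    WorkerConfig(workers_per_node=0)
except ValueError as exc:
    print(exc)
```

With a check like this, the misconfiguration would surface once, at start-up, rather than as a repeated error inside the scaling strategy.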
