[feature] control "submit"-level resource distribution #262

Closed
mgt16-LANL opened this issue Feb 13, 2024 · 8 comments · Fixed by #293
Labels
enhancement New feature or request

Comments

@mgt16-LANL

Hi @jan-janssen - is there a way to control the distribution of cores/threads/gpus at executor.submit() time (e.g. per-"job")? I was looking over the more recent versions and it seems like the interface has gone more the way of initializing the executors with this information - but I very likely could have missed something.

@jan-janssen
Member

Hi @mgt16-LANL, that is correct. Typically we define the resources once for the executor, and each function submitted to that executor then uses the same predefined set of resources. The background is that the executor gets a set of reserved resources. In principle it would be technically possible to assign resources at the submit level, but that is currently not implemented.

@mgt16-LANL
Author

I guess I'm pretty interested in having the resource allocation be dynamically available, especially from flux/slurm backends for more dynamic/load-balancing workflows! I'll tag as a request. Is there a reason we couldn't just add these to *args or **kwargs in the BaseExecutor class submit function, to be handled differently by the FluxPythonInterface bootup function, for example?

@mgt16-LANL mgt16-LANL added the enhancement New feature or request label Feb 13, 2024
@jan-janssen
Member

There are two reasons:

  • it could lead to confusion with the function arguments, for example if the function has an argument cores and pympipool also uses an argument cores.
  • it requires us to start a new python process for each task. In the current implementation we start one python process per executor per slot and then reuse these. An executor can execute multiple slots in parallel, depending on how max_workers is set.
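The first point can be illustrated with a minimal sketch. It is not pympipool code: `naive_submit` and `simulate` are hypothetical names, and the function is simply run inline to keep the example short. It assumes a submit function that takes its resource settings out of **kwargs, which is the approach asked about above.

```python
from concurrent.futures import Future


def naive_submit(fn, *args, **kwargs):
    """Hypothetical submit() that takes resource settings from **kwargs."""
    # The executor claims "cores" for itself, so it never reaches fn.
    cores = kwargs.pop("cores", 1)
    future = Future()
    try:
        future.set_result(fn(*args, **kwargs))
    except Exception as exc:
        future.set_exception(exc)
    return cores, future


def simulate(steps, cores=4):
    """User function whose own 'cores' argument clashes with the executor's."""
    return f"{steps} steps on {cores} cores"


reserved, future = naive_submit(simulate, 10, cores=2)
print(reserved)         # 2
print(future.result())  # 10 steps on 4 cores
```

The caller cannot tell from the call site whether `cores=2` was meant for the executor or for `simulate`, and here the function silently falls back to its default.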

@jan-janssen jan-janssen changed the title How to control "submit"-level resource distribution [Feature] control "submit"-level resource distribution Feb 14, 2024
@jan-janssen jan-janssen changed the title [Feature] control "submit"-level resource distribution [feature] control "submit"-level resource distribution Feb 14, 2024
@mgt16-LANL
Author

  • it could lead to confusion with the function arguments, for example if the function has an argument cores and pympipool also uses an argument cores.

My suggestion would be to use something like "runtime cores" or a different nomenclature for the submit-time resource assignment.
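A rough sketch of what such a separate name could look like, assuming a keyword-only parameter that carries all submit-time resources. The parameter name `runtime_resources` and both functions are hypothetical and are not taken from pympipool; the function runs inline purely for illustration.

```python
from concurrent.futures import Future


def submit(fn, *args, runtime_resources=None, **kwargs):
    """Hypothetical submit() that keeps resources apart from fn's arguments."""
    # Defaults are merged with whatever the caller requests for this task.
    resources = {"cores": 1, **(runtime_resources or {})}
    future = Future()
    try:
        # kwargs are forwarded untouched, so fn's own 'cores' survives.
        future.set_result(fn(*args, **kwargs))
    except Exception as exc:
        future.set_exception(exc)
    return resources, future


def simulate(steps, cores=4):
    return f"{steps} steps on {cores} cores"


resources, future = submit(simulate, 10, cores=8, runtime_resources={"cores": 2})
print(resources)        # {'cores': 2}
print(future.result())  # 10 steps on 8 cores
```

A clash is still possible if a user function happens to take an argument with the same name as the reserved keyword, but a single unusual name is much less likely to collide than `cores`.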

  • it requires us to start a new python process for each task. In the current implementation we start one python process per executor per slot and then reuse these. An executor can execute multiple slots in parallel, depending on how max_workers is set.

I'm not sure I understand this one - from the https://github.com/pyiron/pympipool/blob/main/pympipool/flux/executor.py code:

    def bootup(self, command_lst):
        if self._oversubscribe:
            raise ValueError(
                "Oversubscribing is currently not supported for the Flux adapter."
            )
        if self._executor is None:
            self._executor = flux.job.FluxExecutor()
        jobspec = flux.job.JobspecV1.from_command(
            command=command_lst,
            num_tasks=self._cores,
            cores_per_task=self._threads_per_core,
            gpus_per_task=self._gpus_per_core,
            num_nodes=None,
            exclusive=False,
        )
        jobspec.environment = dict(os.environ)
        if self._cwd is not None:
            jobspec.cwd = self._cwd
        self._future = self._executor.submit(jobspec)

It would seem like under the single python process, you could expose the underlying Jobspec to the user at submission time without requiring an additional python process?

@jan-janssen
Member

Regarding the second part: bootup() happens when the executor is created, and the python process then remains active until the executor is closed. That means you create an executor, and it creates N workers (defined by max_workers), each with a specific set of resources as defined by the executor. Functions are then submitted to the executor, which internally distributes them to its workers.
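The lifecycle described here can be sketched with a toy executor. This is an assumption-laden illustration, not the pympipool implementation: the class name `FixedResourceExecutor` is made up, and threads stand in for the long-lived python processes that the real executor starts through flux.

```python
import queue
import threading
from concurrent.futures import Future


class FixedResourceExecutor:
    """Toy executor: workers start once with fixed resources and are reused."""

    def __init__(self, max_workers, cores):
        self._tasks = queue.Queue()
        self._workers = [
            threading.Thread(target=self._run, args=(i, cores), daemon=True)
            for i in range(max_workers)
        ]
        for worker in self._workers:
            worker.start()  # stand-in for bootup(): happens once per worker

    def _run(self, worker_id, cores):
        # Each worker keeps pulling tasks until it receives the stop signal.
        while True:
            item = self._tasks.get()
            if item is None:
                break
            future, fn, args, kwargs = item
            try:
                future.set_result((worker_id, cores, fn(*args, **kwargs)))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, fn, *args, **kwargs):
        # Submitting only enqueues work; no new worker is started here.
        future = Future()
        self._tasks.put((future, fn, args, kwargs))
        return future

    def shutdown(self):
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()


executor = FixedResourceExecutor(max_workers=2, cores=4)
results = [executor.submit(pow, 2, n).result() for n in range(5)]
executor.shutdown()
print(results)  # tuples of (worker_id, cores, value)
```

Every task reports the same `cores` value because the resources are fixed when the workers start, which is why changing them per submit would require starting a fresh worker.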

@mgt16-LANL
Author

Ah! That makes sense. Is the preferred method for getting this type of functionality with pympipool to define a set of executors to use as "queues" with more/less resources?

@jan-janssen
Member

Ah! That makes sense. Is the preferred method for getting this type of functionality with pympipool to define a set of executors to use as "queues" with more/less resources?

Yes, at least that is how I was using it so far. This allows pympipool to reuse the python processes it started, meaning it no longer has to go through flux for every function that you submit.
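The "executors as queues" pattern could look roughly like the sketch below. `ThreadPoolExecutor` from the standard library is used only as a stand-in for a pympipool executor, and the resource classes, the routing rule `pick_queue`, and the `simulate` function are all hypothetical. In a real setup each executor would be created with its own resources instead of receiving `cores` as a function argument.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical resource classes; each one gets its own executor ("queue").
RESOURCE_CLASSES = {"small": {"cores": 1}, "large": {"cores": 8}}


def pick_queue(n_atoms):
    """Toy routing rule: big inputs go to the executor with more cores."""
    return "large" if n_atoms > 100 else "small"


def simulate(n_atoms, cores):
    return f"{n_atoms} atoms on {cores} cores"


# One long-lived executor per resource class, created once and reused.
executors = {name: ThreadPoolExecutor(max_workers=2) for name in RESOURCE_CLASSES}
try:
    futures = []
    for n_atoms in (10, 500, 50):
        name = pick_queue(n_atoms)
        cores = RESOURCE_CLASSES[name]["cores"]
        futures.append(executors[name].submit(simulate, n_atoms, cores))
    results = [f.result() for f in futures]
finally:
    for ex in executors.values():
        ex.shutdown()

print(results)
# ['10 atoms on 1 cores', '500 atoms on 8 cores', '50 atoms on 1 cores']
```

The routing decision moves into user code, while each executor keeps reusing the workers it started.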

@jan-janssen
Member

@mgt16-LANL I have an initial draft for this interface available in #293. It would be very interesting to see if this also solves your needs.
