From 9600e9a89b54b9243a95d9682b4bc480f6549805 Mon Sep 17 00:00:00 2001 From: Jan Janssen Date: Wed, 20 Nov 2024 13:02:27 +0100 Subject: [PATCH] More updates --- docs/development.md | 147 ------------------------------- docs/trouble_shooting.md | 14 +-- notebooks/1-local.ipynb | 8 +- notebooks/2-hpc-submission.ipynb | 12 +-- notebooks/3-hpc-allocation.ipynb | 34 +++---- notebooks/4-developer.ipynb | 7 +- 6 files changed, 33 insertions(+), 189 deletions(-) delete mode 100644 docs/development.md diff --git a/docs/development.md b/docs/development.md deleted file mode 100644 index 071475b7..00000000 --- a/docs/development.md +++ /dev/null @@ -1,147 +0,0 @@ -# Development -The `executorlib` package is developed based on the need to simplify the up-scaling of python functions over multiple -compute nodes. The project is used for Exascale simualtion in the context of computational chemistry and materials -science. Still it remains a scientific research project with the goal to maximize the utilization of computational -resources for scientific applications. No formal support is provided. - -## Contributions -Any feedback and contributions are welcome. - -## License -``` -BSD 3-Clause License - -Copyright (c) 2022, Jan Janssen -All rights reserved. - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - -* Redistributions of source code must retain the above copyright notice, this - list of conditions and the following disclaimer. - -* Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. - -* Neither the name of the copyright holder nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. 
- -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -``` - -## Integration -The key functionality of the `executorlib` package is the up-scaling of python functions with thread based parallelism, -MPI based parallelism or by assigning GPUs to individual python functions. In the background this is realized using a -combination of the [zero message queue](https://zeromq.org) and [cloudpickle](https://github.com/cloudpipe/cloudpickle) -to communicate binary python objects. The `executorlib.communication.SocketInterface` is an abstraction of this -interface, which is used in the other classes inside `executorlib` and might also be helpful for other projects. 
It -comes with a series of utility functions: - -* `executorlib.communication.interface_bootup()`: To initialize the interface -* `executorlib.communication.interface_connect()`: To connect the interface to another instance -* `executorlib.communication.interface_send()`: To send messages via this interface -* `executorlib.communication.interface_receive()`: To receive messages via this interface -* `executorlib.communication.interface_shutdown()`: To shutdown the interface - -While `executorlib` was initially designed for up-scaling python functions for HPC, the same functionality can be -leveraged to up-scale any executable independent of the programming language it is developed in. This approach follows -the design of the `flux.job.FluxExecutor` included in the [flux framework](https://flux-framework.org). In `executorlib` this approach -is extended to support any kind of subprocess, so it is no longer limited to the [flux framework](https://flux-framework.org). - -### Subprocess -Following the [`subprocess.check_output()`](https://docs.python.org/3/library/subprocess.html) interface of the standard -python libraries, any kind of command can be submitted to the `executorlib.SubprocessExecutor`. The command can either be -specified as a list `["echo", "test"]` in which the first entry is typically the executable followed by the corresponding -parameters or the command can be specified as a string `"echo test"` with the additional parameter `shell=True`. -```python -from executorlib import SubprocessExecutor - -with SubprocessExecutor(max_workers=2) as exe: - future = exe.submit(["echo", "test"], universal_newlines=True) - print(future.done(), future.result(), future.done()) -``` -``` ->>> (False, "test", True) -``` -In analogy to the previous examples the `SubprocessExecutor` class is directly imported from the `executorlib` module and -the maximum number of workers is set to two `max_workers=2`. 
In contrast to the `executorlib.Executor` class no other -settings to assign specific hardware to the command via the python interface are available in the `SubprocessExecutor` -class. To specify the hardware requirements for the individual commands, the user has to manually assign the resources -using the commands of the resource schedulers like `srun`, `flux run` or `mpiexec`. - -The `concurrent.futures.Future` object returned after submitting a command to the `executorlib.SubprocessExecutor` behaves -just like any other future object. It provides a `done()` function to check if the execution completed as well as a -`result()` function to return the output of the submitted command. - -In comparison to the `flux.job.FluxExecutor` included in the [flux framework](https://flux-framework.org) the -`executorlib.SubprocessExecutor` differs in two ways. One the `executorlib.SubprocessExecutor` does not provide any -option for resource assignment and two the `executorlib.SubprocessExecutor` returns the output of the command rather -than just returning the exit status when calling `result()`. - -### Interactive Shell -Beyond external executables which are called once with a set of input parameters and or input files and return one set -of outputs, there are some executables which allow the user to interact with the executable during the execution. The -challenge of interfacing a python process with such an interactive executable is to identify when the executable is ready -to receive the next input. A very basis example for an interactive executable is a script which counts to the number -input by the user. 
This can be written in python as `count.py`: -```python -def count(iterations): - for i in range(int(iterations)): - print(i) - print("done") - - -if __name__ == "__main__": - while True: - user_input = input() - if "shutdown" in user_input: - break - else: - count(iterations=int(user_input)) -``` -This example is challenging in terms of interfacing it with a python process as the length of the output changes depending -on the input. The first option that the `executorlib.ShellExecutor` provides is specifying the number of lines to read for -each call submitted to the executable using the `lines_to_read` parameter. In comparison to the `SubprocessExecutor` -defined above the `ShellExecutor` only supports the execution of a single executable at a time, correspondingly the input -parameters for calling the executable are provided at the time of initialization of the `ShellExecutor` and the inputs -are submitted using the `submit()` function: -```python -from executorlib import ShellExecutor - -with ShellExecutor(["python", "count.py"], universal_newlines=True) as exe: - future_lines = exe.submit(string_input="4", lines_to_read=5) - print(future_lines.done(), future_lines.result(), future_lines.done()) -``` -``` ->>> (False, "0\n1\n2\n3\ndone\n", True) -``` -The response for a given set of input is again returned as `concurrent.futures.Future` object, this allows the user to -execute other steps on the python side while waiting for the completion of the external executable. In this case the -example counts the numbers from `0` to `3` and prints each of them in one line followed by `done` to notify the user its -waiting for new inputs. This results in `n+1` lines of output for the input of `n`. 
Still predicting the number of lines -for a given input can be challenging, so the `executorlib.ShellExecutor` class also provides the option to wait until a -specific pattern is found in the output using the `stop_read_pattern`: -```python -from executorlib import ShellExecutor - -with ShellExecutor(["python", "count.py"], universal_newlines=True) as exe: - future_pattern = exe.submit(string_input="4", stop_read_pattern="done") - print(future_pattern.done(), future_pattern.result(), future_pattern.done()) -``` -``` ->>> (False, "0\n1\n2\n3\ndone\n", True) -``` -In this example the pattern simply searches for the string `done` in the output of the program and returns all the output -gathered from the executable since the last input as the result of the `concurrent.futures.Future` object returned after -the submission of the interactive command. diff --git a/docs/trouble_shooting.md b/docs/trouble_shooting.md index b82cde29..ca868bcd 100644 --- a/docs/trouble_shooting.md +++ b/docs/trouble_shooting.md @@ -16,13 +16,13 @@ local mode on computers with strict firewall rules. ## Message Passing Interface To use the message passing interface (MPI) executorlib requires [mpi4py](https://mpi4py.readthedocs.io/) as optional -dependency. The installation of this and other optional dependencies is covered in the [installation section](). +dependency. The installation of this and other optional dependencies is covered in the [installation section](https://executorlib.readthedocs.io/en/latest/installation.html#mpi-support). ## Missing Dependencies The default installation of executorlib only comes with a limited number of dependencies, especially the [zero message queue](https://zeromq.org) -and [cloudpickle](https://github.com/cloudpipe/cloudpickle). Additional features like [caching](), [HPC submission mode]() -and [HPC allocation mode]() require additional dependencies. The dependencies are explained in more detail in the -[installation section](). 
+and [cloudpickle](https://github.com/cloudpipe/cloudpickle). Additional features like [caching](https://executorlib.readthedocs.io/en/latest/installation.html#caching), [HPC submission mode](https://executorlib.readthedocs.io/en/latest/installation.html#hpc-submission-mode) +and [HPC allocation mode](https://executorlib.readthedocs.io/en/latest/installation.html#hpc-allocation-mode) require additional dependencies. The dependencies are explained in more detail in the +[installation section](https://executorlib.readthedocs.io/en/latest/installation.html#). ## Python Version Executorlib supports all current Python version ranging from 3.9 to 3.13. Still some of the dependencies and especially @@ -38,9 +38,9 @@ The resource dictionary parameter `resource_dict` can contain one or more of the * `openmpi_oversubscribe` (bool): adds the `--oversubscribe` command line flag (OpenMPI and SLURM only) - default False * `slurm_cmd_args` (list): Additional command line arguments for the srun call (SLURM only) -For the special case of the [HPC allocation mode]() the resource dictionary parameter `resource_dict` can also include -additional parameters define in the submission script of the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io) -these include but are not limited to: +For the special case of the [HPC allocation mode](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html) +the resource dictionary parameter `resource_dict` can also include additional parameters define in the submission script +of the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io) these include but are not limited to: * `run_time_max` (int): the maximum time the execution of the submitted Python function is allowed to take in seconds. * `memory_max` (int): the maximum amount of memory the Python function is allowed to use in Gigabytes. * `partition` (str): the partition of the queuing system the Python function is submitted to. 
diff --git a/notebooks/1-local.ipynb b/notebooks/1-local.ipynb index e656974d..a89eba96 100644 --- a/notebooks/1-local.ipynb +++ b/notebooks/1-local.ipynb @@ -230,7 +230,7 @@ "metadata": {}, "source": [ "In addition, to the compute cores `cores`, the resource dictionary parameter `resource_dict` can also define the threads per core as `threads_per_core`, the GPUs per core as `gpus_per_core`, the working directory with `cwd`, the option to use the OpenMPI oversubscribe feature with `openmpi_oversubscribe` and finally for the [Simple Linux Utility for Resource \n", - "Management (SLURM)](https://slurm.schedmd.com) queuing system the option to provide additional command line arguments with the `slurm_cmd_args` parameter - [resource dictionary]()." + "Management (SLURM)](https://slurm.schedmd.com) queuing system the option to provide additional command line arguments with the `slurm_cmd_args` parameter - [resource dictionary](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#resource-dictionary)." ] }, { @@ -325,7 +325,7 @@ "source": [ "For most cases MPI based parallelism leads to higher computational efficiency in comparison to thread based parallelism, still the choice of parallelism depends on the specific Python function which should be executed in parallel. Careful benchmarks are required to achieve the optimal performance for a given computational architecture. \n", "\n", - "Beyond MPI based parallelism and thread based parallelism the [HPC Submission Mode]() and the [HPC Allocation Mode]() also provide the option to assign GPUs to the execution of individual Python functions. " + "Beyond MPI based parallelism and thread based parallelism the [HPC Submission Mode](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html) and the [HPC Allocation Mode](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html) also provide the option to assign GPUs to the execution of individual Python functions. 
" ] }, { @@ -457,9 +457,7 @@ } ], "source": [ - "with Executor(\n", - " backend=\"local\", resource_dict={\"cores\": 2}, block_allocation=True\n", - ") as exe:\n", + "with Executor(backend=\"local\", resource_dict={\"cores\": 2}, block_allocation=True) as exe:\n", " fs = exe.submit(calc_mpi, 3)\n", " print(fs.result())" ] diff --git a/notebooks/2-hpc-submission.ipynb b/notebooks/2-hpc-submission.ipynb index 28c9f25f..f0998333 100644 --- a/notebooks/2-hpc-submission.ipynb +++ b/notebooks/2-hpc-submission.ipynb @@ -6,7 +6,7 @@ "metadata": {}, "source": [ "# HPC Submission Mode\n", - "In contrast to the [local mode] and the [HPC allocation mode] the HPC Submission Mode does not communicate via the [zero message queue](https://zeromq.org) but instead stores the python functions on the file system and uses the job scheduler to handle the dependencies of the Python functions. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in HPC Submission mode. At the same time it is possible to close the Python process which created the `Executor`, wait until the execution of the submitted Python functions is completed and afterwards reload the results from the cache. \n", + "In contrast to the [Local Mode](https://executorlib.readthedocs.io/en/latest/1-local.html) and the [HPC allocation mode](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html) the HPC Submission Mode does not communicate via the [zero message queue](https://zeromq.org) but instead stores the python functions on the file system and uses the job scheduler to handle the dependencies of the Python functions. Consequently, the block allocation `block_allocation` and the init function `init_function` are not available in HPC Submission mode. 
At the same time it is possible to close the Python process which created the `Executor`, wait until the execution of the submitted Python functions is completed and afterwards reload the results from the cache. \n", "\n", "Internally the HPC submission mode is using the [Python simple queuing system adatper (pysqa)](https://pysqa.readthedocs.io) to connect to HPC job schedulers and the [h5py](https://www.h5py.org) package for serializing the Python functions to store them on the file system. Both packages are optional dependency of executorlib. The installation of the [pysqa](https://pysqa.readthedocs.io) package and the [h5py](https://www.h5py.org) package are covered in the installation section. " ] @@ -37,7 +37,7 @@ "id": "b20913f3-59e4-418c-a399-866124f8e497", "metadata": {}, "source": [ - "In comparison to the [Local Mode](), the only two parameters which are changed are the specification of the backend as `backend=\"slurm_submission\"` and the requirement to specify the cache directory using the `cache_directory=\"./cache\"`. The rest of the syntax remains exactly the same, to simplify the up-scaling of simulation workflows. " + "In comparison to the [Local Mode](https://executorlib.readthedocs.io/en/latest/1-local.html), the only two parameters which are changed are the specification of the backend as `backend=\"slurm_submission\"` and the requirement to specify the cache directory using the `cache_directory=\"./cache\"`. The rest of the syntax remains exactly the same, to simplify the up-scaling of simulation workflows. " ] }, { @@ -112,7 +112,7 @@ "metadata": {}, "source": [ "## Flux\n", - "While most HPC job schedulers require extensive configuration before they can be tested, the [flux framework](http://flux-framework.org) can be installed with the conda package manager, as explained in the [installation section](). This simple installation makes the flux framework especially suitable for demonstrations, testing and continous integration. 
So below a number of features for the HPC submission mode are demonstrated based on the example of the [flux framework](http://flux-framework.org) still the same applies to other job schedulers like SLURM introduced above." + "While most HPC job schedulers require extensive configuration before they can be tested, the [flux framework](http://flux-framework.org) can be installed with the conda package manager, as explained in the [installation section](https://executorlib.readthedocs.io/en/latest/installation.html#alternative-installations). This simple installation makes the flux framework especially suitable for demonstrations, testing and continous integration. So below a number of features for the HPC submission mode are demonstrated based on the example of the [flux framework](http://flux-framework.org) still the same applies to other job schedulers like SLURM introduced above." ] }, { @@ -121,7 +121,7 @@ "metadata": {}, "source": [ "### Dependencies\n", - "As already demonstrated for the [Local Mode]() the `Executor` class from executorlib is capable of resolving the dependencies of serial functions, when [concurrent futures Future](https://docs.python.org/3/library/concurrent.futures.html#future-objects) objects are used as inputs for subsequent function calls. For the case of the HPC submission these dependencies are communicated to the job scheduler, which allows to stop the Python process which created the `Executor` class, wait until the execution of the submitted Python functions is completed and afterwards restart the Python process for the `Executor` class and reload the calculation results from the cache defined by the `cache_directory` parameter." 
+ "As already demonstrated for the [Local Mode](https://executorlib.readthedocs.io/en/latest/1-local.html) the `Executor` class from executorlib is capable of resolving the dependencies of serial functions, when [concurrent futures Future](https://docs.python.org/3/library/concurrent.futures.html#future-objects) objects are used as inputs for subsequent function calls. For the case of the HPC submission these dependencies are communicated to the job scheduler, which allows to stop the Python process which created the `Executor` class, wait until the execution of the submitted Python functions is completed and afterwards restart the Python process for the `Executor` class and reload the calculation results from the cache defined by the `cache_directory` parameter." ] }, { @@ -155,7 +155,7 @@ "metadata": {}, "source": [ "### Resource Assignment\n", - "In analogy to the [Local Mode]() the resource assignment for the HPC submission mode is handled by either including the resource dictionary parameter `resource_dict` in the initialization of the `Executor` class or in every call of the `submit()` function. \n", + "In analogy to the [Local Mode](https://executorlib.readthedocs.io/en/latest/1-local.html) the resource assignment for the HPC submission mode is handled by either including the resource dictionary parameter `resource_dict` in the initialization of the `Executor` class or in every call of the `submit()` function. \n", "\n", "Below this is demonstrated once for the assignment of muliple CPU cores for the execution of a Python function which internally uses the message passing interface (MPI) via the [mpi4py](https://mpi4py.readthedocs.io) package. 
" ] @@ -192,7 +192,7 @@ "id": "d91499d7-5c6c-4c10-b7b7-bfc4b87ddaa8", "metadata": {}, "source": [ - "Beyond CPU cores and threads which were previously also introduced for the [Local Mode]() the HPC submission mode also provides the option to select the available accelerator cards or GPUs, by specifying the `\"gpus_per_core\"` parameter in the resource dictionary `resource_dict`. For demonstration we create a Python function which reads the GPU device IDs and submit it to the `Executor` class:\n", + "Beyond CPU cores and threads which were previously also introduced for the [Local Mode](https://executorlib.readthedocs.io/en/latest/1-local.html) the HPC submission mode also provides the option to select the available accelerator cards or GPUs, by specifying the `\"gpus_per_core\"` parameter in the resource dictionary `resource_dict`. For demonstration we create a Python function which reads the GPU device IDs and submit it to the `Executor` class:\n", "```python\n", "def get_available_gpus():\n", " import socket\n", diff --git a/notebooks/3-hpc-allocation.ipynb b/notebooks/3-hpc-allocation.ipynb index f0d2c604..e5325880 100644 --- a/notebooks/3-hpc-allocation.ipynb +++ b/notebooks/3-hpc-allocation.ipynb @@ -6,12 +6,12 @@ "metadata": {}, "source": [ "# HPC Allocation Mode\n", - "In contrast to the [HPC Submission Mode]() which submitts individual Python functions to HPC job schedulers, the HPC Allocation Mode takes a given allocation of the HPC job scheduler and executes Python functions with the resources available in this allocation. 
In this regard it is similar to the [Local Mode]() as it communicates with the individual Python processes using the [zero message queue](https://zeromq.org/), still it is more advanced as it can access the computational resources of all compute nodes of the given HPC allocation and also provides the option to assign GPUs as accelerators for parallel execution.\n", + "In contrast to the [HPC Submission Mode](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html) which submitts individual Python functions to HPC job schedulers, the HPC Allocation Mode takes a given allocation of the HPC job scheduler and executes Python functions with the resources available in this allocation. In this regard it is similar to the [Local Mode](https://executorlib.readthedocs.io/en/latest/1-local.html) as it communicates with the individual Python processes using the [zero message queue](https://zeromq.org/), still it is more advanced as it can access the computational resources of all compute nodes of the given HPC allocation and also provides the option to assign GPUs as accelerators for parallel execution.\n", "\n", "Available Functionality: \n", - "* Submit Python functions with the [submit() function or the map() function]().\n", - "* Support for parallel execution, either using the [message passing interface (MPI)](), [thread based parallelism]() or by [assigning dedicated GPUs]() to selected Python functions. 
All these resources assignments are handled via the [resource dictionary parameter resource_dict]().\n", - "* Performance optimization features, like [block allocation](), [dependency resolution]() and [caching]().\n", + "* Submit Python functions with the [submit() function or the map() function](https://executorlib.readthedocs.io/en/latest/1-local.html#basic-functionality).\n", + "* Support for parallel execution, either using the [message passing interface (MPI)](https://executorlib.readthedocs.io/en/latest/1-local.html#mpi-parallel-functions), [thread based parallelism](https://executorlib.readthedocs.io/en/latest/1-local.html#thread-parallel-functions) or by [assigning dedicated GPUs](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html#resource-assignment) to selected Python functions. All these resources assignments are handled via the [resource dictionary parameter resource_dict](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#resource-dictionary).\n", + "* Performance optimization features, like [block allocation](https://executorlib.readthedocs.io/en/latest/1-local.html#block-allocation), [dependency resolution](https://executorlib.readthedocs.io/en/latest/1-local.html#dependencies) and [caching](https://executorlib.readthedocs.io/en/latest/1-local.html#cache).\n", "\n", "The only parameter the user has to change is the `backend` parameter. " ] @@ -22,7 +22,7 @@ "metadata": {}, "source": [ "## SLURM\n", - "With the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com/) currently being the most commonly used job scheduler, executorlib provides an interface to submit Python functions to SLURM. Internally, this is based on the [srun](https://slurm.schedmd.com/srun.html) command of the SLURM scheduler, which creates job steps in a given allocation. 
Given that all resource requests in SLURM are communicated via a central database a large number of submitted Python functions and resulting job steps can slow down the performance of SLURM. To address this limitation it is recommended to install the hierarchical job scheduler [flux](https://flux-framework.org/) in addition to SLURM, to use flux for distributing the resources within a given allocation. This configuration is discussed in more detail below in the section [SLURM with flux]()." + "With the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com/) currently being the most commonly used job scheduler, executorlib provides an interface to submit Python functions to SLURM. Internally, this is based on the [srun](https://slurm.schedmd.com/srun.html) command of the SLURM scheduler, which creates job steps in a given allocation. Given that all resource requests in SLURM are communicated via a central database a large number of submitted Python functions and resulting job steps can slow down the performance of SLURM. To address this limitation it is recommended to install the hierarchical job scheduler [flux](https://flux-framework.org/) in addition to SLURM, to use flux for distributing the resources within a given allocation. This configuration is discussed in more detail below in the section [SLURM with flux](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html#slurm-with-flux)." ] }, { @@ -53,7 +53,7 @@ "metadata": {}, "source": [ "## SLURM with Flux \n", - "As discussed in the installation section it is important to select the [flux](https://flux-framework.org/) version compatible to the installation of a given HPC cluster. Which GPUs are available? Who manufactured these GPUs? Does the HPC use [mpich](https://www.mpich.org/) or [OpenMPI](https://www.open-mpi.org/) or one of their commercial counter parts like cray MPI or intel MPI? 
Depending on the configuration different installation options can be choosen, as explained in the [installation section](). \n", + "As discussed in the installation section it is important to select the [flux](https://flux-framework.org/) version compatible to the installation of a given HPC cluster. Which GPUs are available? Who manufactured these GPUs? Does the HPC use [mpich](https://www.mpich.org/) or [OpenMPI](https://www.open-mpi.org/) or one of their commercial counter parts like cray MPI or intel MPI? Depending on the configuration different installation options can be choosen, as explained in the [installation section](https://executorlib.readthedocs.io/en/latest/installation.html#hpc-allocation-mode). \n", "\n", "Afterwards flux can be started in an [sbatch](https://slurm.schedmd.com/sbatch.html) submission script using:\n", "```\n", @@ -68,7 +68,7 @@ "metadata": {}, "source": [ "### Resource Assignment\n", - "Independent of the selected backend [local mode](), [HPC submission mode]() or HPC allocation mode the assignment of the computational resoruces remains the same. They can either be specified in the `submit()` function by adding the resource dictionary parameter [resource_dict]() or alternatively during the initialization of the `Executor` class by adding the resource dictionary parameter [resource_dict]() there. \n", + "Independent of the selected backend [local mode](https://executorlib.readthedocs.io/en/latest/1-local.html), [HPC submission mode](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html) or HPC allocation mode the assignment of the computational resoruces remains the same. 
They can either be specified in the `submit()` function by adding the resource dictionary parameter [resource_dict](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#resource-dictionary) or alternatively during the initialization of the `Executor` class by adding the resource dictionary parameter [resource_dict](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#resource-dictionary) there. \n", "\n", "This functionality of executorlib is commonly used to rewrite individual Python functions to use MPI while the rest of the Python program remains serial." ] @@ -122,7 +122,7 @@ "metadata": {}, "source": [ "### Block Allocation\n", - "The block allocation for the HPC allocation mode follows the same implementation as the [block allocation for the local mode](). It starts by defining the initialization function `init_function()` which returns a dictionary which is internally used to look up input parameters for Python functions submitted to the `Executor` class. Commonly this functionality is used to store large data objects inside the Python process created for the block allocation, rather than reloading these Python objects for each submitted function. " + "The block allocation for the HPC allocation mode follows the same implementation as the [block allocation for the local mode](https://executorlib.readthedocs.io/en/latest/1-local.html#block-allocation). It starts by defining the initialization function `init_function()` which returns a dictionary which is internally used to look up input parameters for Python functions submitted to the `Executor` class. Commonly this functionality is used to store large data objects inside the Python process created for the block allocation, rather than reloading these Python objects for each submitted function. 
" ] }, { @@ -163,14 +163,10 @@ ], "source": [ "with Executor(\n", - " backend=\"flux_allocation\",\n", - " flux_executor_pmi_mode=\"pmix\",\n", - " max_workers=2,\n", - " init_function=init_function,\n", - " block_allocation=True,\n", + " backend=\"flux_allocation\", flux_executor_pmi_mode=\"pmix\", max_workers=2, init_function=init_function, block_allocation=True\n", ") as exe:\n", " fs = exe.submit(calc_with_preload, 2, j=5)\n", - " print(fs.result())" + " print(fs.result())\n" ] }, { @@ -189,7 +185,7 @@ "metadata": {}, "source": [ "### Dependencies\n", - "Python functions with rather different computational resource requirements should not be merged into a single function. So to able to execute a series of Python functions which each depend on the output of the previous Python function executorlib internally handles the dependencies based on the [concurrent futures future](https://docs.python.org/3/library/concurrent.futures.html#future-objects) objects from the Python standard library. This implementation is independent of the selected backend and works for HPC allocation mode just like explained in the [local mode section](). " + "Python functions with rather different computational resource requirements should not be merged into a single function. So to able to execute a series of Python functions which each depend on the output of the previous Python function executorlib internally handles the dependencies based on the [concurrent futures future](https://docs.python.org/3/library/concurrent.futures.html#future-objects) objects from the Python standard library. This implementation is independent of the selected backend and works for HPC allocation mode just like explained in the [local mode section](https://executorlib.readthedocs.io/en/latest/1-local.html#dependencies). " ] }, { @@ -231,7 +227,7 @@ "metadata": {}, "source": [ "### Caching\n", - "Finally, also the caching is available for HPC allocation mode, in analogy to the [local mode](). 
Again this functionality is not designed to identify function calls with the same parameters, but rather provides the option to reload previously cached results even after the Python processes which contained the executorlib `Executor` class is closed. As the cache is stored on the file system, this option can decrease the performance of executorlib. Consequently the caching option should primarily be used during the prototyping phase. " + "Finally, caching is also available for HPC allocation mode, in analogy to the [local mode](https://executorlib.readthedocs.io/en/latest/1-local.html#cache). Again, this functionality is not designed to identify function calls with the same parameters, but rather provides the option to reload previously cached results even after the Python process which contained the executorlib `Executor` class is closed. As the cache is stored on the file system, this option can decrease the performance of executorlib. Consequently, the caching option should primarily be used during the prototyping phase. 
" ] }, { @@ -249,9 +245,7 @@ } ], "source": [ - "with Executor(\n", - " backend=\"flux_allocation\", flux_executor_pmi_mode=\"pmix\", cache_directory=\"./cache\"\n", - ") as exe:\n", + "with Executor(backend=\"flux_allocation\", flux_executor_pmi_mode=\"pmix\", cache_directory=\"./cache\") as exe:\n", " future_lst = [exe.submit(sum, [i, i]) for i in range(1, 4)]\n", " print([f.result() for f in future_lst])" ] @@ -301,7 +295,7 @@ "source": [ "def calc_nested():\n", " from executorlib import Executor\n", - "\n", + " \n", " with Executor(backend=\"flux_allocation\", flux_executor_pmi_mode=\"pmix\") as exe:\n", " fs = exe.submit(sum, [1, 1])\n", " return fs.result()" diff --git a/notebooks/4-developer.ipynb b/notebooks/4-developer.ipynb index 83c14282..ec02cdc3 100644 --- a/notebooks/4-developer.ipynb +++ b/notebooks/4-developer.ipynb @@ -71,7 +71,7 @@ " command: list, universal_newlines: bool = True, shell: bool = False\n", "):\n", " import subprocess\n", - "\n", + " \n", " return subprocess.check_output(\n", " command, universal_newlines=universal_newlines, shell=shell\n", " )" @@ -158,7 +158,6 @@ "source": [ "def init_process():\n", " import subprocess\n", - "\n", " return {\n", " \"process\": subprocess.Popen(\n", " [\"python\", \"count.py\"],\n", @@ -311,8 +310,8 @@ "While it is not recommended to link to specific internal components of executorlib in external Python packages but rather only the `Executor` class should be used as central interface to executorlib, the internal architecture is briefly outlined below. \n", "* `backend` - the backend module contains the functionality for the Python processes created by executorlib to execute the submitted Python functions.\n", "* `base` - the base module contains the definition of the executorlib `ExecutorBase` class which is internally used to create the different interfaces. 
To compare if an given `Executor` class is based on executorlib compare with the `ExecutorBase` class which can be imported as `from executorlib.base.executor import ExecutorBase`.\n", - "* `cache` - the cache module defines the file based communication for the [HPC submission mode]().\n", - "* `interactive` - the interactive modules defines the [zero message queue](https://zeromq.org) based communication for the [local mode]() and the [HPC allocation mode]().\n", + "* `cache` - the cache module defines the file based communication for the [HPC submission mode](https://executorlib.readthedocs.io/en/latest/2-hpc-submission.html).\n", + "* `interactive` - the interactive module defines the [zero message queue](https://zeromq.org) based communication for the [local mode](https://executorlib.readthedocs.io/en/latest/1-local.html) and the [HPC allocation mode](https://executorlib.readthedocs.io/en/latest/3-hpc-allocation.html).\n", "* `standalone` - the standalone module contains a number of utility functions which only depend on external libraries and do not have any internal dependency to other parts of `executorlib`. This includes the functionality to generate executable commands, the [h5py](https://www.h5py.org) based interface for caching, a number of input checks, routines to plot the dependencies of a number of future objects, functionality to interact with the [queues defined in the Python standard library](https://docs.python.org/3/library/queue.html), the interface for serialization based on [cloudpickle](https://github.com/cloudpipe/cloudpickle) and finally an extension to the [threading](https://docs.python.org/3/library/threading.html) of the Python standard library.\n", "\n", "Given the level of separation the integration of submodules from the standalone module in external software packages should be the easiest way to benefit from the developments in executorlib beyond just using the `Executor` class. "