Add markdown syntax highlighting
jan-janssen committed Apr 16, 2024
1 parent 2ee834b commit 5125ad6
Showing 3 changed files with 31 additions and 27 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -37,7 +37,7 @@ no central data storage is required as the workers and the scheduling task can c
## Examples
The following example illustrates how `pympipool` can be used to distribute a series of MPI parallel function calls
within a queuing system allocation. `example.py`:
```
```python
from pympipool import Executor

def calc(i):
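    # The diff truncates the example here; a possible completion is sketched below.
    # The Executor keyword names are assumptions, not part of this commit.
    from mpi4py import MPI
    return i, MPI.COMM_WORLD.Get_size(), MPI.COMM_WORLD.Get_rank()

with Executor(max_workers=2, cores_per_worker=2) as exe:
    future = exe.submit(calc, 3)  # one MPI parallel function call per submit()
    print(future.result())        # e.g. one tuple per MPI rank: [(3, 2, 0), (3, 2, 1)]
```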
32 changes: 18 additions & 14 deletions docs/source/examples.md
@@ -5,7 +5,7 @@ to simplify the up-scaling of individual functions in a given workflow.
## Compatibility
We start with the basic example of `1+1=2`. With the `ThreadPoolExecutor` from the [`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures)
standard library this can be written as - `test_thread.py`:
```
```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(
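    # the diff truncates the example here; a minimal completion (assumed):
    max_workers=1,
) as exe:
    future = exe.submit(sum, [1, 1])
    print(future.result())  # prints 2
```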
@@ -36,7 +36,7 @@ worker `gpus_per_worker`. Finally, for those backends which support over-subscri
replacement for the [`concurrent.futures.Executor`](https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures).

The previous example is rewritten for the `pympipool.Executor` in - `test_sum.py`:
```
```python
from pympipool import Executor

with Executor(
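    # The diff truncates the example here; a minimal completion is sketched below.
    # The keyword argument names are assumptions, not part of this commit.
    max_workers=1,
    cores_per_worker=1,
) as exe:
    future = exe.submit(sum, [1, 1])
    print(future.result())  # prints 2, matching the ThreadPoolExecutor example above
```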
@@ -58,7 +58,7 @@ The result of the calculation is again `1+1=2`.

Beyond pre-defined functions like the `sum()` function, the same functionality can be used to submit user-defined
functions. In the `test_serial.py` example a custom summation function is defined:
```
```python
from pympipool import Executor

def calc(*args):
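    # the diff truncates the example here; a possible completion (assumed):
    return sum(*args)

with Executor(max_workers=1) as exe:
    future = exe.submit(calc, [2, 2])
    print(future.result())  # prints 4
```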
@@ -95,7 +95,7 @@ For backwards compatibility with the [`multiprocessing.Pool`](https://docs.pytho
class the [`concurrent.futures.Executor`](https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures)
also implements the `map()` function to map a series of inputs to a function. The same `map()` function is also
available in the `pympipool.Executor` - `test_map.py`:
```
```python
from pympipool import Executor

def calc(*args):
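    # the diff truncates the example here; a possible completion (assumed):
    return sum(*args)

with Executor(max_workers=2) as exe:
    # map() distributes one function call per input, analogous to concurrent.futures
    print(list(exe.map(calc, [[1, 1], [2, 2], [3, 3]])))  # prints [2, 4, 6]
```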
@@ -118,7 +118,7 @@ each function submitted to this worker has access to the dataset, as it is alrea
the user defines an initialization function `init_function` which returns a dictionary with one key per dataset. The
keys of the dictionary can then be used as additional input parameters in each function submitted to the `pympipool.Executor`.
This functionality is illustrated in the `test_data.py` example:
```
```python
from pympipool import Executor

def calc(i, j, k):
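    # The diff truncates the example here; a possible completion (assumed).
    return i + j + k

def init_function():
    # one key per dataset; the keys become optional parameters of calc()
    return {"j": 4, "k": 3}

with Executor(max_workers=1, init_function=init_function) as exe:
    future = exe.submit(calc, 2, j=5)  # j overrides the value provided by init_function
    print(future.result())             # prints 2 + 5 + 3 = 10
```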
@@ -179,7 +179,7 @@ uses for thread based parallelism, so it might be necessary to set certain envir
At the current stage `pympipool.Executor` does not set these parameters itself, so you have to set them inside the function
you submit before importing the corresponding library:

```
```python
def calc(i):
    import os
    os.environ["OMP_NUM_THREADS"] = "2"
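    # the diff truncates the example here; a possible continuation (assumed):
    os.environ["OPENBLAS_NUM_THREADS"] = "2"
    os.environ["MKL_NUM_THREADS"] = "2"
    import numpy as np  # import only after the thread counts have been set
    return np.linalg.eigvalsh(np.ones((i, i)))
```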
@@ -214,7 +214,7 @@ function while these functions can still be submitted to the `pympipool.Executor`
advantage of this approach is that users can parallelize their workflows one function at a time.

The example in `test_mpi.py` illustrates the submission of a simple MPI parallel python function:
```
```python
from pympipool import Executor

def calc(i):
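    # The diff truncates the example here; a possible completion is sketched below.
    # The Executor keyword names are assumptions, not part of this commit.
    from mpi4py import MPI
    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank

with Executor(max_workers=1, cores_per_worker=2) as exe:
    future = exe.submit(calc, 3)
    print(future.result())  # one tuple per MPI rank, e.g. [(3, 2, 0), (3, 2, 1)]
```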
@@ -246,7 +246,7 @@ and finally the index of the specific process `0` or `1`.
With the rise of machine learning applications, the use of GPUs for scientific applications is becoming more and more popular.
Consequently, it is essential to have full control over the assignment of GPUs to specific python functions. In the
`test_gpu.py` example the `tensorflow` library is used to identify the GPUs and return their configuration:
```
```python
import socket
from pympipool import Executor
from tensorflow.python.client import device_lib
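# The diff truncates the example here; a possible completion is sketched below.
# The Executor keyword names are assumptions, not part of this commit.
def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [
        (x.name, x.physical_device_desc, socket.gethostname())
        for x in local_device_protos
        if x.device_type == "GPU"
    ]

with Executor(max_workers=2, gpus_per_worker=1) as exe:
    fs_1 = exe.submit(get_available_gpus)
    fs_2 = exe.submit(get_available_gpus)
    print(fs_1.result(), fs_2.result())  # one GPU per worker, possibly on different hosts
```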
@@ -274,7 +274,7 @@ as two functions are submitted and the results are printed.

To illustrate the execution of such an example on a high performance computing (HPC) cluster using the [SLURM workload manager](https://www.schedmd.com),
the submission script is given below:
```
```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gpus-per-node=1
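# the diff truncates the script here; a possible ending (assumed, script name included):

python test_gpu.py
```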
@@ -290,7 +290,7 @@ For the more complex setup of running the [flux framework](https://flux-framewor
within the [SLURM workload manager](https://www.schedmd.com) it is essential that the resources are passed from the
[SLURM workload manager](https://www.schedmd.com) to the [flux framework](https://flux-framework.org). This is achieved
by calling `srun flux start` in the submission script:
```
```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gpus-per-node=1
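# The diff truncates the script here; a possible ending using the srun flux start
# call described in the text (the script name is assumed):

srun flux start python test_gpu.py
```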
@@ -316,7 +316,7 @@ Following the [`subprocess.check_output()`](https://docs.python.org/3/library/su
python libraries, any kind of command can be submitted to the `pympipool.SubprocessExecutor`. The command can either be
specified as a list `["echo", "test"]`, in which the first entry is typically the executable followed by the corresponding
parameters, or as a string `"echo test"` with the additional parameter `shell=True`.
```
```python
from pympipool import SubprocessExecutor

with SubprocessExecutor(max_workers=2) as exe:
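    # The diff truncates the example here; a possible completion (assumed; the
    # submit() arguments mirror the subprocess interface described above):
    future = exe.submit(["echo", "test"], universal_newlines=True)
    print(future.done(), future.result(), future.done())
```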
@@ -345,7 +345,7 @@ of outputs, there are some executables which allow the user to interact with the
challenge of interfacing a python process with such an interactive executable is to identify when the executable is ready
to receive the next input. A very basic example for an interactive executable is a script which counts to the number
input by the user. This can be written in python as `count.py`:
```
```python
def count(iterations):
    for i in range(int(iterations)):
        print(i)
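    # The diff truncates count.py here; a possible completion (assumed). It prints
    # "done" after counting, matching the expected output shown below, and then
    # waits on stdin for the next input.
    print("done")


if __name__ == "__main__":
    while True:
        count(iterations=input())
```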
@@ -366,12 +366,14 @@ each call submitted to the executable using the `lines_to_read` parameter. In co
defined above, the `ShellExecutor` only supports the execution of a single executable at a time; correspondingly, the input
parameters for calling the executable are provided at the time of initialization of the `ShellExecutor` and the inputs
are submitted using the `submit()` function:
```
```python
from pympipool import ShellExecutor

with ShellExecutor(["python", "count.py"], universal_newlines=True) as exe:
    future_lines = exe.submit(string_input="4", lines_to_read=5)
    print(future_lines.done(), future_lines.result(), future_lines.done())
```
```
>>> (False, "0\n1\n2\n3\ndone\n", True)
```
The response for a given set of inputs is again returned as a `concurrent.futures.Future` object; this allows the user to
@@ -380,12 +382,14 @@ example counts the numbers from `0` to `3` and prints each of them in one line f
waiting for new inputs. This results in `n+1` lines of output for the input of `n`. Still, predicting the number of lines
for a given input can be challenging, so the `pympipool.ShellExecutor` class also provides the option to wait until a
specific pattern is found in the output using the `stop_read_pattern`:
```
```python
from pympipool import ShellExecutor

with ShellExecutor(["python", "count.py"], universal_newlines=True) as exe:
    future_pattern = exe.submit(string_input="4", stop_read_pattern="done")
    print(future_pattern.done(), future_pattern.result(), future_pattern.done())
```
```
>>> (False, "0\n1\n2\n3\ndone\n", True)
```
In this example the pattern simply searches for the string `done` in the output of the program and returns all the output
24 changes: 12 additions & 12 deletions docs/source/installation.md
@@ -19,19 +19,19 @@ their own version of `mpi` and `mpi4py` the `pympipool` package is also provided

### conda-based installation
In the same way `pympipool` can be installed with the [conda package manager](https://anaconda.org/conda-forge/pympipool):
```
```shell
conda install -c conda-forge pympipool
```
When resolving the dependencies with `conda` gets slow, it is recommended to use `mamba` instead of `conda`. In that case
you can also install `pympipool` using:
```
```shell
mamba install -c conda-forge pympipool
```

### pypi-based installation
`pympipool` can be installed from the [python package index (pypi)](https://pypi.org/project/pympipool/) using the
following command:
```
```shell
pip install pympipool
```

@@ -52,7 +52,7 @@ Still the user would not call these interfaces directly, but rather use it throu
### Flux Framework
For Linux users without a pre-installed resource scheduler in their high performance computing (HPC) environment, the
[flux framework](https://flux-framework.org) can be installed with the `conda` package manager:
```
```shell
conda install -c conda-forge flux-core
```
For alternative ways to install the [flux framework](https://flux-framework.org) please refer to their official
@@ -61,30 +61,30 @@ For alternative ways to install the [flux framework](https://flux-framework.org)
#### Nvidia
For adding GPU support in the [flux framework](https://flux-framework.org) you want to install `flux-sched` in addition
to `flux-core`. For Nvidia GPUs you need:
```
```shell
conda install -c conda-forge flux-core flux-sched libhwloc=*=cuda*
```
In case this fails because there is no GPU on the login node and the `cudatoolkit` cannot be installed, you can use the
`CONDA_OVERRIDE_CUDA` environment variable to pretend that a local cuda version is installed which `conda` can link to:
```
```shell
CONDA_OVERRIDE_CUDA="11.6" conda install -c conda-forge flux-core flux-sched libhwloc=*=cuda*
```

#### AMD
For adding GPU support in the [flux framework](https://flux-framework.org) you want to install `flux-sched` in addition
to `flux-core`. For AMD GPUs you need:
```
```shell
conda install -c conda-forge flux-core flux-sched
```

#### Test Flux
To test the [flux framework](https://flux-framework.org) and validate that the GPUs are correctly recognized, you can start
a flux instance using:
```
```shell
flux start
```
Afterwards, you can list the resources accessible to flux using:
```
```shell
flux resource list
```
This should contain a column for the GPUs if you installed the required dependencies. Here is an example output for a
@@ -101,15 +101,15 @@ hyper-threading the total number of CPU cores might be half the number of cores
When the [flux framework](https://flux-framework.org) is used inside an existing queuing system, you have to
communicate these resources to it. For the [SLURM workload manager](https://www.schedmd.com) this is achieved by calling
`flux start` with `srun`. For an interactive session use:
```
```shell
srun --pty flux start
```
Alternatively, to execute a python script which uses `pympipool` you can call it with:
```
```shell
srun flux start python <your python script.py>
```
In the same way to start a Jupyter Notebook in an interactive allocation you can use:
```
```shell
srun --pty flux start jupyter notebook
```
Then each jupyter notebook you execute on this jupyter notebook server has access to the resources of the interactive
