docs: grammar tweaks
Problem: some of the grammar in the docs could be improved.

Move some commas around and add some dashes.
jameshcorbett committed Nov 30, 2023
1 parent 2c616ac commit d98e924
Showing 3 changed files with 14 additions and 15 deletions.
2 changes: 1 addition & 1 deletion docs/source/examples.md
@@ -14,7 +14,7 @@ with ThreadPoolExecutor(
future = exe.submit(sum, [1, 1])
print(future.result())
```
-In this case `max_workers=1` limits the number of threads uses by the `ThreadPoolExecutor` to one. Then the `sum()`
+In this case `max_workers=1` limits the number of threads used by the `ThreadPoolExecutor` to one. Then the `sum()`
function is submitted to the executor with a list with two ones `[1, 1]` as input. A [`concurrent.futures.Future`](https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures)
object is returned. The `Future` object allows checking the status of the execution with the `done()` method, which
returns `True` or `False` depending on the state of the execution. Or the main process can wait until the execution is
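Since the hunk above only shows the tail of the example, here is a minimal, self-contained sketch of the pattern the changed paragraph describes; the extra `done()` calls are illustrative additions, not part of the documented example.

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as exe:
    future = exe.submit(sum, [1, 1])  # schedule sum([1, 1]) on the single worker thread
    print(future.done())              # may still be False while the task is running
    print(future.result())            # blocks until the result, 2, is available
    print(future.done())              # True once the result has been returned
```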
21 changes: 10 additions & 11 deletions docs/source/index.rst
@@ -8,29 +8,27 @@ pympipool - up-scale python functions for high performance computing
Up-scaling python functions for high performance computing (HPC) can be challenging. While the python standard library
provides interfaces for multiprocessing and asynchronous task execution, namely
`multiprocessing <https://docs.python.org/3/library/multiprocessing.html>`_ and
-`concurrent.futures <https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures>`_ both are
+`concurrent.futures <https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures>`_, both are
limited to the execution on a single compute node. So a series of python libraries have been developed to address the
up-scaling of python functions for HPC. Starting in the datascience and machine learning community with solutions
-like `dask <https://www.dask.org>`_ over more HPC focused solutions like
-`fireworks <https://materialsproject.github.io/fireworks/>`_ and `parsl <http://parsl-project.org>`_ up to Python
+like `dask <https://www.dask.org>`_, over to more HPC-focused solutions like
+`fireworks <https://materialsproject.github.io/fireworks/>`_ and `parsl <http://parsl-project.org>`_, up to Python
bindings for the message passing interface (MPI) named `mpi4py <https://mpi4py.readthedocs.io>`_. Each of these
-solutions has their advantages and disadvantages, in particular scaling beyond serial python functions, including thread
-based parallelism, MPI parallel python application or assignment of GPUs to individual python function remains
-challenging.
+solutions has its advantages and disadvantages. However, one disadvantage common to all these libraries is the relative difficulty of scaling from serial functions to functions that make use of thread-based, MPI-based, or GPU-based parallelism.

To address these challenges :code:`pympipool` is developed with three goals in mind:

* Extend the standard python library `concurrent.futures.Executor <https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures>`_ interface, to minimize the barrier of up-scaling an existing workflow to be used on HPC resources.
-* Integrate thread based parallelism, MPI parallel python functions based on `mpi4py <https://mpi4py.readthedocs.io>`_ and GPU assignment. This allows the users to accelerate their workflows one function at a time.
+* Integrate thread-based parallelism, MPI-parallel python functions based on `mpi4py <https://mpi4py.readthedocs.io>`_, and GPU assignment. This allows users to accelerate their workflows one function at a time.
* Embrace `Jupyter <https://jupyter.org>`_ notebooks for the interactive development of HPC workflows, as they allow the users to document their thought process right next to the python code and their results all within one document.

HPC Context
-----------
-In contrast to frameworks like `dask <https://www.dask.org>`_, `fireworks <https://materialsproject.github.io/fireworks/>`_
-and `parsl <http://parsl-project.org>`_ which can be used to submit a number of worker processes directly the the HPC
+Frameworks like `dask <https://www.dask.org>`_, `fireworks <https://materialsproject.github.io/fireworks/>`_
+and `parsl <http://parsl-project.org>`_ can be used to submit a number of worker processes directly to the HPC
queuing system and then transfer tasks from either the login node or an interactive allocation to these worker processes
-to accelerate the execution, `mpi4py <https://mpi4py.readthedocs.io>`_ and :code:`pympipool` follow a different
-approach. Here the user creates their HPC allocation first and then `mpi4py <https://mpi4py.readthedocs.io>`_ or
+to accelerate the execution. By contrast, `mpi4py <https://mpi4py.readthedocs.io>`_ and :code:`pympipool` follow a different
+approach, in which the user creates their HPC allocation first and then `mpi4py <https://mpi4py.readthedocs.io>`_ or
:code:`pympipool` can be used to distribute the tasks within this allocation. The advantage of this approach is that
no central data storage is required as the workers and the scheduling task can communicate directly.

@@ -69,6 +67,7 @@ The same code can also be executed inside a jupyter notebook directly which enab
The standard `concurrent.futures.Executor <https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures>`_
interface is extended by adding the option :code:`cores_per_worker=2` to assign multiple MPI ranks to each function call.
To create two workers :code:`max_workers=2` with two cores each requires a total of four CPU cores to be available.
+
After submitting the function :code:`calc()` with the corresponding parameter to the executor :code:`exe.submit(calc, 0)`
a python `concurrent.futures.Future <https://docs.python.org/3/library/concurrent.futures.html#future-objects>`_ is
returned. Consequently, the :code:`pympipool.Executor` can be used as a drop-in replacement for the
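As a rough sketch of the usage described above, assuming that `pympipool.Executor` accepts `max_workers` and `cores_per_worker` as keyword arguments (as the surrounding text suggests) and that `calc()` is a placeholder function, the drop-in pattern could look like this:

```python
from pympipool import Executor

def calc(i):
    # placeholder function; the text above only states that calc()
    # is submitted via exe.submit(calc, 0)
    return i + 1

# two workers (max_workers=2) with two cores each (cores_per_worker=2),
# so four CPU cores are required in total
with Executor(max_workers=2, cores_per_worker=2) as exe:
    future = exe.submit(calc, 0)
    print(future.result())
```

Because the returned object is a standard `concurrent.futures.Future`, the same `done()`/`result()` calls shown in the `ThreadPoolExecutor` example apply unchanged.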
6 changes: 3 additions & 3 deletions docs/source/installation.md
@@ -38,10 +38,10 @@ pip install pympipool
## High Performance Computing
`pympipool` currently provides interfaces to the [SLURM workload manager](https://www.schedmd.com) and the
[flux framework](https://flux-framework.org). With the [flux framework](https://flux-framework.org) being the
-recommended solution as it can be installed without root user rights and it can be integrated in existing resource
+recommended solution as it can be installed without root permissions and it can be integrated in existing resource
managers like the [SLURM workload manager](https://www.schedmd.com). The advantages of using `pympipool` in combination
-with these resource schedulers is the fine-grained resource allocation. In addition, to scaling beyond a single compute
-node they add the ability to assign GPUs and thread based parallelism. The two resource manager are internally linked to
+with these resource schedulers is the fine-grained resource allocation. In addition to scaling beyond a single compute
+node, they add the ability to assign GPUs and thread based parallelism. The two resource manager are internally linked to
two interfaces:

* `pympipool.slurm.PySlurmExecutor`: The interface for the [SLURM workload manager](https://www.schedmd.com).
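A hypothetical sketch of selecting one of these interfaces; it assumes `PySlurmExecutor` mirrors the `pympipool.Executor` keyword arguments, which is not stated in this excerpt:

```python
from pympipool.slurm import PySlurmExecutor

# assumption: PySlurmExecutor accepts the same max_workers argument
# as the generic pympipool.Executor interface
with PySlurmExecutor(max_workers=1) as exe:
    future = exe.submit(sum, [2, 2])
    print(future.result())
```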
