diff --git a/notebooks/1-local.ipynb b/notebooks/1-local.ipynb index bde52e9d..efd39b38 100644 --- a/notebooks/1-local.ipynb +++ b/notebooks/1-local.ipynb @@ -5,7 +5,19 @@ "id": "6915218e-7cf3-4bf4-9618-7e6942b4762f", "metadata": {}, "source": [ - "# Local Testing" + "# Local Mode\n", + "The local mode in executorlib which is selected by setting the `backend` parameter to `\"local\"` is primarily used to enable rapid prototyping on a workstation computer to test your parallel Python program with executorlib before transferring it to an high performance computer (HPC). With the added capability of executorlib it is typically 10% slower than the [ProcessPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) from the Python standard library on a single node, when all acceleration features are enabled. This overhead is primarily related to the creation of new tasks. So the performance of executorlib improves when the individual Python function calls require extensive computations. \n", + "\n", + "On advantage that executorlib has over the [ProcessPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) and the [ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor) from the Python standard libary, is the use of [cloudpickle](https://github.com/cloudpipe/cloudpickle) as serialization backend to transfer Python functions between processes. This enables the use of dynamically defined Python functions for example in the case of a Jupyter notebook. " + ] + }, + { + "cell_type": "markdown", + "id": "ccc686dd-8fc5-4755-8a19-f40010ebb1b8", + "metadata": {}, + "source": [ + "## Basic Functionality\n", + "The general functionality of executorlib follows the [Executor interface](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor) of the Python standard library. You can import the `Executor` class directly from executorlib and then just replace the [ProcessPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) or [ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor) with the `Executor` class to start using executorlib. " ] }, { @@ -18,6 +30,14 @@ "from executorlib import Executor" ] }, + { + "cell_type": "markdown", + "id": "1654679f-38b3-4699-9bfe-b48cbde0b2db", + "metadata": {}, + "source": [ + "It is recommended to use the `Executor` class in combination with a `with`-statement. This gurantees the processes created by the `Executor` class to evaluate the Python functions are afterward closed and do not remain ghost processes. A function is then submitted using the `submit(fn, /, *args, **kwargs)` function which executes a given function `fn` as `fn(*args, **kwargs)`. The `submit()` function returns a [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object, as defined by the Python Standard Library. As a first example we submit the function `sum()` to calculate the sum of the list `[1, 1]`:" + ] + }, { "cell_type": "code", "execution_count": 2, @@ -29,18 +49,26 @@ "output_type": "stream", "text": [ "2\n", - "CPU times: user 28.8 ms, sys: 14.3 ms, total: 43.2 ms\n", - "Wall time: 709 ms\n" + "CPU times: user 19.9 ms, sys: 14.6 ms, total: 34.5 ms\n", + "Wall time: 210 ms\n" ] } ], "source": [ "%%time\n", - "with Executor(backend=\"local\") as exe:\n", + "with Executor() as exe:\n", " future = exe.submit(sum, [1, 1])\n", " print(future.result())" ] }, + { + "cell_type": "markdown", + "id": "a1109584-9db2-4f9d-b3ed-494d96241396", + "metadata": {}, + "source": [ + "As expected the result of the summation `sum([1, 1])` is `2`. The same result is retrieved from the [concurrent.futures.Future](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object received from the submission of the `sum()` as it is printed here `print(future.result())`. For most Python functions and especially the `sum()` function it is computationally not efficient to initialize the `Executor` class only for the execution of a single function call, rather it is more computationally efficient to initialize the `Executor` class once and then submit a number of functions. This can be achieved with a loop. For example the sum of the pairs `[2, 2]`, `[3, 3]` and `[4, 4]` can be achieved with a for-loop inside the context of the `Executor()` class as provided by the `with`-statement. " + ] + }, { "cell_type": "code", "execution_count": 3, @@ -52,18 +80,26 @@ "output_type": "stream", "text": [ "[4, 6, 8]\n", - "CPU times: user 9.98 ms, sys: 5.93 ms, total: 15.9 ms\n", - "Wall time: 471 ms\n" + "CPU times: user 7.72 ms, sys: 7.54 ms, total: 15.3 ms\n", + "Wall time: 187 ms\n" ] } ], "source": [ "%%time\n", - "with Executor(backend=\"local\") as exe:\n", + "with Executor() as exe:\n", " future_lst = [exe.submit(sum, [i, i]) for i in range(2, 5)]\n", " print([f.result() for f in future_lst])" ] }, + { + "cell_type": "markdown", + "id": "7db58f70-8137-4f1c-a87b-0d282f2bc3c5", + "metadata": {}, + "source": [ + "If only the parameters change but the function, which is applied to these parameters, remains the same, like in the case above the `sum()` function is applied to three pairs of parameters, then the `map(fn, *iterables, timeout=None, chunksize=1)` function can be used to map the function to the different sets of parameters - as it is defined in the [Python standard library](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map). " + ] + }, { "cell_type": "code", "execution_count": 4, @@ -75,52 +111,54 @@ "output_type": "stream", "text": [ "[10, 12, 14]\n", - "CPU times: user 13.6 ms, sys: 1.71 ms, total: 15.3 ms\n", - "Wall time: 471 ms\n" + "CPU times: user 7.16 ms, sys: 7.72 ms, total: 14.9 ms\n", + "Wall time: 191 ms\n" ] } ], "source": [ "%%time\n", - "with Executor(backend=\"local\") as exe:\n", + "with Executor() as exe:\n", " results = exe.map(sum, [[5, 5], [6, 6], [7, 7]])\n", " print(list(results))" ] }, { "cell_type": "markdown", - "id": "5de0f0f2-bf5c-46b3-8171-a3a206ce6775", + "id": "ac86bf47-4eb6-4d7c-acae-760b880803a8", "metadata": {}, "source": [ - "## Parallel functions " + "These three examples cover the general functionality of the `Executor` class. Following the [Executor](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor) interface as it is defined in the Python standard library. " ] }, { "cell_type": "markdown", - "id": "dc8e692f-bf6c-4838-bb82-6a6b8454a2e7", + "id": "5de0f0f2-bf5c-46b3-8171-a3a206ce6775", "metadata": {}, "source": [ - "### MPI parallel functions" + "## Parallel Functions\n", + "Writing parallel software is not trivial. So rather than writing the whole Python program in a parallel way, executorlib allows developers to implement parallel execution on a function by function level. In this way individual functions can be replaced by parallel functions as needed without the need to modify the rest of the program. With the Local Mode executorlib supports two levels of parallel execution, parallel execution based on the Message Passing Interface (MPI) using the [mpi4py](https://mpi4py.readthedocs.io) package, or thread based parallel execution. Both levels of parallelism can be defined inside the function and do not require any modifications to the rest of the Python program. " ] }, { - "cell_type": "code", - "execution_count": 5, - "id": "f3a8268d-b852-4be4-98e5-7190479c3eda", + "cell_type": "markdown", + "id": "dc8e692f-bf6c-4838-bb82-6a6b8454a2e7", "metadata": {}, - "outputs": [], "source": [ - "from executorlib import Executor" + "### MPI Parallel Functions\n", + "MPI is the default way to develop parallel programs for HPCs. Still it can be challenging to refactor a previously serial program to efficiently use MPI to achieve optimal computational efficiency for parallel execution, even with libraries like [mpi4py](https://mpi4py.readthedocs.io). To simplify the up-scaling of Python programs executorlib provides the option to use MPI parallel Python code inside a given Python function and then submit this parallel Python function to an `Executor` for evaluation. \n", + "\n", + "The following `calc_mpi()` function imports the [mpi4py](https://mpi4py.readthedocs.io) package and then uses the internal functionality of MPI to get the total number of parallel CPU cores in the current MPI group `MPI.COMM_WORLD.Get_size()` and the index of the current processor in the MPI group `MPI.COMM_WORLD.Get_rank()`." ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 5, "id": "a251d083-489e-41c1-9e49-c86093858006", "metadata": {}, "outputs": [], "source": [ - "def calc(i):\n", + "def calc_mpi(i):\n", " from mpi4py import MPI\n", "\n", " size = MPI.COMM_WORLD.Get_size()\n", @@ -130,7 +168,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 6, "id": "266864f1-d29e-4934-9b5d-51f4ffb11f5c", "metadata": {}, "outputs": [ @@ -140,35 +178,17 @@ "text": [ "[(3, 2, 0), (3, 2, 1)]\n" ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "--------------------------------------------------------------------------\n", - "WARNING: Linux kernel CMA support was requested via the\n", - "btl_vader_single_copy_mechanism MCA variable, but CMA support is\n", - "not available due to restrictive ptrace settings.\n", - "\n", - "The vader shared memory BTL will fall back on another single-copy\n", - "mechanism if one is available. This may result in lower performance.\n", - "\n", - " Local host: cmpc06\n", - "--------------------------------------------------------------------------\n", - "[cmpc06:12990] 1 more process has sent help message help-btl-vader.txt / cma-permission-denied\n", - "[cmpc06:12990] Set MCA parameter \"orte_base_help_aggregate\" to 0 to see all help / error messages\n" - ] } ], "source": [ - "with Executor(max_workers=1, max_cores=2, backend=\"local\") as exe:\n", - " fs = exe.submit(calc, 3, resource_dict={\"cores\": 2})\n", + "with Executor(backend=\"local\") as exe:\n", + " fs = exe.submit(calc_mpi, 3, resource_dict={\"cores\": 2})\n", " print(fs.result())" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 7, "id": "cb4ad978-bdf2-47bb-a7df-846641a54ec2", "metadata": {}, "outputs": [ @@ -178,34 +198,11 @@ "text": [ "[(3, 2, 0), (3, 2, 1)]\n" ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "--------------------------------------------------------------------------\n", - "WARNING: Linux kernel CMA support was requested via the\n", - "btl_vader_single_copy_mechanism MCA variable, but CMA support is\n", - "not available due to restrictive ptrace settings.\n", - "\n", - "The vader shared memory BTL will fall back on another single-copy\n", - "mechanism if one is available. This may result in lower performance.\n", - "\n", - " Local host: cmpc06\n", - "--------------------------------------------------------------------------\n", - "[cmpc06:13024] 1 more process has sent help message help-btl-vader.txt / cma-permission-denied\n", - "[cmpc06:13024] Set MCA parameter \"orte_base_help_aggregate\" to 0 to see all help / error messages\n" - ] } ], "source": [ - "with Executor(\n", - " max_workers=1, max_cores=2, resource_dict={\"cores\": 2}, backend=\"local\"\n", - ") as exe:\n", - " fs = exe.submit(\n", - " calc,\n", - " 3,\n", - " )\n", + "with Executor(resource_dict={\"cores\": 2}, backend=\"local\") as exe:\n", + " fs = exe.submit(calc_mpi, 3)\n", " print(fs.result())" ] }, @@ -214,15 +211,67 @@ "id": "4f5c5221-d99c-4614-82b1-9d6d3260c1bf", "metadata": {}, "source": [ - "### Thread parallelism" + "### Thread Parallel Functions" ] }, { - "cell_type": "markdown", - "id": "6d85583f-4b2e-454d-9867-d02035577eb3", + "cell_type": "code", + "execution_count": 8, + "id": "7a7d21f6-9f1a-4f30-8024-9993e156dc75", + "metadata": {}, + "outputs": [], + "source": [ + "def calc_with_threads(i):\n", + " import os\n", + "\n", + " os.environ[\"OMP_NUM_THREADS\"] = \"2\"\n", + " os.environ[\"OPENBLAS_NUM_THREADS\"] = \"2\"\n", + " os.environ[\"MKL_NUM_THREADS\"] = \"2\"\n", + " os.environ[\"VECLIB_MAXIMUM_THREADS\"] = \"2\"\n", + " os.environ[\"NUMEXPR_NUM_THREADS\"] = \"2\"\n", + " import numpy as np\n", + "\n", + " return i" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "b8ed330d-ee77-44a0-a02f-670fa945b043", "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n" + ] + } + ], "source": [ - "### Combined parallelism" + "with Executor(backend=\"local\") as exe:\n", + " fs = exe.submit(calc_with_threads, 3, resource_dict={\"threads_per_core\": 2})\n", + " print(fs.result())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cfb2e2ce-0030-40fe-a9f5-3be420d0ae23", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n" + ] + } + ], + "source": [ + "with Executor(backend=\"local\", resource_dict={\"threads_per_core\": 2}) as exe:\n", + " fs = exe.submit(calc_with_threads, 3)\n", + " print(fs.result())" ] }, { @@ -243,7 +292,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 11, "id": "1271887b-68a5-4dbb-afb7-c88c04ebbdf1", "metadata": {}, "outputs": [], @@ -253,7 +302,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 12, "id": "0da4c7d0-2268-4ea8-b62d-5d94c79ebc72", "metadata": {}, "outputs": [ @@ -262,8 +311,8 @@ "output_type": "stream", "text": [ "2\n", - "CPU times: user 14.2 ms, sys: 4.5 ms, total: 18.7 ms\n", - "Wall time: 684 ms\n" + "CPU times: user 12.2 ms, sys: 27.3 ms, total: 39.5 ms\n", + "Wall time: 280 ms\n" ] } ], @@ -276,7 +325,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 13, "id": "ea1eaaf7-15a1-4032-83f0-b6e5ae6267b9", "metadata": {}, "outputs": [ @@ -285,8 +334,8 @@ "output_type": "stream", "text": [ "[4, 6, 8]\n", - "CPU times: user 12.5 ms, sys: 7.49 ms, total: 19.9 ms\n", - "Wall time: 689 ms\n" + "CPU times: user 11.7 ms, sys: 28.1 ms, total: 39.8 ms\n", + "Wall time: 288 ms\n" ] } ], @@ -299,7 +348,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 14, "id": "4988a169-cdd9-4e95-aadc-eedf1cd693d9", "metadata": {}, "outputs": [ @@ -308,8 +357,8 @@ "output_type": "stream", "text": [ "[10, 12, 14]\n", - "CPU times: user 13.2 ms, sys: 6.64 ms, total: 19.8 ms\n", - "Wall time: 695 ms\n" + "CPU times: user 11.6 ms, sys: 26.2 ms, total: 37.8 ms\n", + "Wall time: 268 ms\n" ] } ], @@ -322,7 +371,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 15, "id": "cb8c4943-4c78-4203-95f2-1db758e588d9", "metadata": {}, "outputs": [], @@ -337,7 +386,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 16, "id": "5ebf7195-58f9-40f2-8203-2d4b9f0e9602", "metadata": {}, "outputs": [ @@ -347,24 +396,6 @@ "text": [ "[(3, 2, 0), (3, 2, 1)]\n" ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "--------------------------------------------------------------------------\n", - "WARNING: Linux kernel CMA support was requested via the\n", - "btl_vader_single_copy_mechanism MCA variable, but CMA support is\n", - "not available due to restrictive ptrace settings.\n", - "\n", - "The vader shared memory BTL will fall back on another single-copy\n", - "mechanism if one is available. This may result in lower performance.\n", - "\n", - " Local host: cmpc06\n", - "--------------------------------------------------------------------------\n", - "[cmpc06:13322] 1 more process has sent help message help-btl-vader.txt / cma-permission-denied\n", - "[cmpc06:13322] Set MCA parameter \"orte_base_help_aggregate\" to 0 to see all help / error messages\n" - ] } ], "source": [ @@ -384,7 +415,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 17, "id": "cc648799-a0c6-4878-a469-97457bce024f", "metadata": {}, "outputs": [], @@ -395,7 +426,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 18, "id": "8aa754cc-eb1a-4fa1-bd72-272246df1d2f", "metadata": {}, "outputs": [], @@ -406,7 +437,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 19, "id": "18bafb74-aacd-4744-9dfe-2a8ed07a1d25", "metadata": {}, "outputs": [ @@ -436,7 +467,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 20, "id": "7722b49c-7253-4ec8-9c73-d70e4d475106", "metadata": {}, "outputs": [], @@ -446,7 +477,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 21, "id": "ecdcef49-5c89-4538-b377-d53979673bf7", "metadata": {}, "outputs": [ @@ -455,8 +486,8 @@ "output_type": "stream", "text": [ "[2, 4, 6]\n", - "CPU times: user 89.7 ms, sys: 47 ms, total: 137 ms\n", - "Wall time: 1.62 s\n" + "CPU times: user 159 ms, sys: 50.1 ms, total: 209 ms\n", + "Wall time: 521 ms\n" ] } ], @@ -469,7 +500,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 22, "id": "c39babe8-4370-4d31-9520-9a7ce63378c8", "metadata": {}, "outputs": [ @@ -478,8 +509,8 @@ "output_type": "stream", "text": [ "[2, 4, 6]\n", - "CPU times: user 18.6 ms, sys: 5.68 ms, total: 24.3 ms\n", - "Wall time: 528 ms\n" + "CPU times: user 10.8 ms, sys: 10.6 ms, total: 21.4 ms\n", + "Wall time: 188 ms\n" ] } ], @@ -492,7 +523,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 23, "id": "34a9316d-577f-4a63-af14-736fb4e6b219", "metadata": {}, "outputs": [ @@ -500,7 +531,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "['sumb6a5053f96b7031239c2e8d0e7563ce4.h5out', 'sumd1bf4ee658f1ac42924a2e4690e797f4.h5out', 'sum5171356dfe527405c606081cfbd2dffe.h5out']\n" + "['sumb6a5053f96b7031239c2e8d0e7563ce4.h5out', 'sum5171356dfe527405c606081cfbd2dffe.h5out', 'sumd1bf4ee658f1ac42924a2e4690e797f4.h5out']\n" ] } ], @@ -527,7 +558,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 24, "id": "e7fe0048-8ab4-4a9e-bd25-d56911153528", "metadata": {}, "outputs": [], @@ -537,7 +568,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 25, "id": "d8b75a26-479d-405e-8895-a8d56b3f0f4b", "metadata": {}, "outputs": [], @@ -548,7 +579,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 26, "id": "35fd5747-c57d-4926-8d83-d5c55a130ad6", "metadata": {}, "outputs": [ @@ -573,7 +604,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 27, "id": "f67470b5-af1d-4add-9de8-7f259ca67324", "metadata": {}, "outputs": [ @@ -715,7 +746,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.10" + "version": "3.12.5" } }, "nbformat": 4,