From bb0a49dc8f80ac3830c6614608d8ba7760be011e Mon Sep 17 00:00:00 2001
From: Danila Fedorin
Date: Wed, 18 Sep 2024 12:42:17 -0700
Subject: [PATCH 1/3] Update GPU attribute documentation

Signed-off-by: Danila Fedorin
---
 doc/rst/technotes/gpu.rst | 47 +++++++++++++++++++++++++++------------
 1 file changed, 33 insertions(+), 14 deletions(-)

diff --git a/doc/rst/technotes/gpu.rst b/doc/rst/technotes/gpu.rst
index 68ed74d7dfb7..3c3be230327a 100644
--- a/doc/rst/technotes/gpu.rst
+++ b/doc/rst/technotes/gpu.rst
@@ -265,8 +265,7 @@ GPU-Related Attributes
 Chapel's GPU support makes use of attributes (see `Attributes in Chapel
 <./attributes.html>`_) to control various aspects of how code is compiled or
 executed on the GPU. Currently the following GPU-specific attributes are available:
-``@assertOnGpu`` (described in `Diagnostics and Utilities`_),
-``@gpu.assertEligible``,
+``@assertOnGpu`` and ``@gpu.assertEligible`` (described in `Diagnostics and Utilities`_),
 ``@gpu.blockSize``,
 ``@gpu.itersPerThread``.
 Because
@@ -297,10 +296,11 @@ sequentially within the same GPU thread.
 Users must ensure that the arguments to the "blockSize" and "itersPerThread"
 attributes are positive and non-zero.
 
-In addition to applying GPU attributes to loops, Chapel provides (experimental)
-support for applying them to variable declarations. This is intended for use
-with variables whose initializers contain GPU-bound code. The following example
-demonstrates initializing an array ``A`` from a ``foreach`` expression:
+To apply attributes to expression-level loops such as promoted function calls
+or ``foreach`` expressions, Chapel also (experimentally) supports decorating
+variable declarations with GPU attributes. In the following example, an array
+``A`` is initialized from a ``foreach`` expression, where two GPU attributes
+are used to control the execution of the expression on the GPU:
 
 .. code-block:: chapel
 
@@ -308,6 +308,19 @@ demonstrates initializing an array ``A`` from a ``foreach`` expression:
    @gpu.itersPerThread(4)
    var A = foreach i in 1..1000000 do i * i;
 
+This integrates with Chapel's support for `Remote Variable Declarations <./remote.html>`_;
+the following piece of code demonstrates declaring a (GPU-allocated) array
+``A`` in code that otherwise runs on a CPU locale:
+
+.. code-block:: chapel
+
+   @assertOnGpu
+   on here.gpus[0] var A = foreach i in 1..1000000 do i * i;
+
+The ``@assertOnGpu`` attribute applies and checks the GPU eligibility of the
+``foreach`` expression. The expression is then executed on the GPU locale,
+which ensures the runtime GPU assertion is satisfied.
+
 CPU-as-Device Mode
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The ``CHPL_GPU`` environment variable can be set to ``cpu`` to enable many GPU
@@ -380,14 +393,20 @@ will actually run on a GPU or not) pass ``chpl`` the ``--report-gpu`` flag.
 
 Since not all Chapel loops are eligible for conversion into GPU kernels, it is
 helpful to be able to ensure that a particular loop is being executed
-on the GPU. This can be achieved by marking the loop with the ``@assertOnGpu``
-attribute. When a ``forall`` or ``foreach`` loop is marked with this attribute,
-the compiler will perform a compile-time check and produce an error if one of
-the aforementioned requirements is not met. Loops marked with the
-``@assertOnGpu`` attribute will also conduct a runtime assertion that will halt
-execution when not being performed on a GPU. This can happen when the loop
-is eligible for GPU execution, but is being executed outside of a GPU locale.
-The :mod:`GPU` module contains additional utility functions.
+on the GPU. This can be achieved by marking the loop with the
+:annotation:`~GPU.@assertOnGpu` attribute. When a ``forall`` or ``foreach``
+loop is marked with this attribute, the compiler will perform a compile-time
+check and produce an error if one of the aforementioned requirements is not met.
+Loops marked with the ``@assertOnGpu`` attribute will also conduct a runtime
+assertion that will halt execution when not being performed on a GPU. This can
+happen when the loop is eligible for GPU execution, but is being executed
+outside of a GPU locale. The :mod:`GPU` module contains additional utility
+functions.
+
+In some cases, it is desirable to write code that can execute on the GPU, but is
+not required to do so. In this case, ``@assertOnGpu``'s runtime component
+is unnecessary. The :annotation:`@gpu.assertEligible ` attribute has the
+same compile-time behavior as ``@assertOnGpu``, but does not perform this check.
 
 Utilities in the :mod:`MemDiagnostics` module can be used to monitor GPU memory
 allocations and detect memory leaks. For example, :proc:`startVerboseMem()

From 58e540a48a6e226f41d840ea4ce4d9e72cd689fe Mon Sep 17 00:00:00 2001
From: Danila Fedorin
Date: Wed, 18 Sep 2024 12:58:30 -0700
Subject: [PATCH 2/3] Add a link to documentation for promotion

Signed-off-by: Danila Fedorin
---
 doc/rst/technotes/gpu.rst | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/doc/rst/technotes/gpu.rst b/doc/rst/technotes/gpu.rst
index 3c3be230327a..6065e0357c50 100644
--- a/doc/rst/technotes/gpu.rst
+++ b/doc/rst/technotes/gpu.rst
@@ -296,11 +296,12 @@ sequentially within the same GPU thread.
 Users must ensure that the arguments to the "blockSize" and "itersPerThread"
 attributes are positive and non-zero.
 
-To apply attributes to expression-level loops such as promoted function calls
-or ``foreach`` expressions, Chapel also (experimentally) supports decorating
-variable declarations with GPU attributes. In the following example, an array
-``A`` is initialized from a ``foreach`` expression, where two GPU attributes
-are used to control the execution of the expression on the GPU:
+To apply attributes to expression-level loops such as
+:ref:`promoted function calls ` or ``foreach`` expressions, Chapel
+also (experimentally) supports decorating variable declarations with GPU
+attributes. In the following example, an array ``A`` is initialized from a
+``foreach`` expression, where two GPU attributes are used to control the
+execution of the expression on the GPU:
 
 .. code-block:: chapel
 

From b5e5d0b9dc787a75cc5f155ed4f190f8aa6c0748 Mon Sep 17 00:00:00 2001
From: Daniel
Date: Wed, 18 Sep 2024 13:50:28 -0700
Subject: [PATCH 3/3] Incorporate reviewer feedback

Signed-off-by: Danila Fedorin
---
 doc/rst/technotes/gpu.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/rst/technotes/gpu.rst b/doc/rst/technotes/gpu.rst
index 6065e0357c50..186ffe9b6e6b 100644
--- a/doc/rst/technotes/gpu.rst
+++ b/doc/rst/technotes/gpu.rst
@@ -407,7 +407,8 @@ functions.
 In some cases, it is desirable to write code that can execute on the GPU, but is
 not required to do so. In this case, ``@assertOnGpu``'s runtime component
 is unnecessary. The :annotation:`@gpu.assertEligible ` attribute has the
-same compile-time behavior as ``@assertOnGpu``, but does not perform this check.
+same compile-time behavior as ``@assertOnGpu``, but does not perform this
+execution-time check.
 
 Utilities in the :mod:`MemDiagnostics` module can be used to monitor GPU memory
 allocations and detect memory leaks. For example, :proc:`startVerboseMem()