GPU Attribute Improvements #25826

DanilaFe · 2024-08-27T21:30:42Z

This PR seeks to improve the experience of using various GPU attributes in several cases. These changes are motivated by some of the difficulties I've observed @Iainmon to have had while working on his GPU-enabled machine learning code.

Case 1: Writing CPU and GPU code using the GPU locale model

The way to ensure that a loop is GPU-eligible in user code (and to fail compilation if the loop is not GPU eligible), is to use @assertOnGpu. However, one cannot do this when writing code that is expected to support both GPUs and CPUs. I've observed Iain's code to have something like the following:

if onGpu {
  @assertOnGpu
  foreach ... { }
} else {
  foreach ... { /* the same loop as above */ }
}

This way, the code could be used on both the GPU and the CPU, and the compiler will ensure that the GPU version is eligible. However, this introduces a maintenance burden, and makes the code rather verbose. To work around this problem, I introduce a new GPU primitive + attribute: @gpu.assertEligible. This attribute has the same behavior as @assertOnGpu at compile-time, but it does not have a runtime effect. Thus, the code above can be flattened and continue to support both CPU and GPU runs:

if onGpu {
  @gpu.assertEligible
  foreach ... { }
}

In my opinion, we should phase out the use of @assertOnGpu in favor of @gpu.assertEligible. It's unclear to me that having a runtime assertion using this attribute is worth keeping it around two similar attributes. Personally, I think that the compile-time assertion can be handled by @gpu.assertEligible, and various utilities from GpuDiagnostics for tracking kernel launches etc. can be used to ensure that GPU execution occurs at runtime. This PR doesn't make this (potentially more controversial change).

Case 2: Disabling GPU support and compiling with `CHPL_LOCALE_MODEL=flat`

When I told Iain to run his performance experiments in the flat locale model (to get started with initial performance results via the CPU), he immediately ran into internal errors. This is an instance of #22822.

My chosen solution to this problem is to make @assertOnGpu a compile-time error under CHPL_LOCALE_MODEL=flat. This is because of the semantics of @assertOnGpu: this attribute has a runtime check; without a GPU, the check is guaranteed to fail, and cause a "certain" failure. This error is now user-facing, and tells the user to switch to @gpu.assertEligible if all they want is a compile-time check.

On the other hand, the @gpu.assertEligible attribute, which does not have any runtime semantics, does not cause a compilation error with the flat locale model. Instead, the attribute is simply ignored (we don't perform any GPU logic with the flat locale model, and it doesn't seem worth it to actually perform GPU transformations / analysis for the sole purpose of validating GPU eligibility). The same is true for @gpu.blockSize, and the non-user-facing "GPU primitive block" primitive which is used to group GPU primitives created via attributes. Thus, the following code compiles and runs just fine in the flat locale model:

@gpu.assertEligible
@gpu.blockSize(128)
foreach i in 1..128 { /* ... */ }

Reviewed by @e-kayrakli -- thanks!

Testing

new flat tests for GPU primitives, including a new user-facing error.
GPU tests, including new tests for @gpu.assertEligible
paratest

Signed-off-by: Danila Fedorin <[email protected]>

e-kayrakli

Looks good generally. I have mostly nits. I have more testing ideas that I'd like to see go in with this PR, though:

Error message for blockSize(1,2) if we don't have it. (I know it is not about this PR, but the PR touches relevant areas)
More importantly, what happens if I use both flavors of assertions together in flat and gpu locale models? Some of those cases will error, some should just work. It'd be good to see that they can be used without an embarrassing issue.

e-kayrakli · 2024-08-28T19:54:47Z

compiler/optimizations/gpuTransforms.cpp

+      reason = "is marked with @gpu.assertEligible";
+    } else {
+      CHPL_ASSERT(assertion->isPrimitive(PRIM_ASSERT_ON_GPU));
+      reason = "contains assertOnGpu()";


it doesn't look like we need this line.

We do; the legacy assertOnGpu is still present in the code. I deprecated and can remove it in a separate PR, but would rather not do it in this one.

Ah, I see. Would you be up for doing that before the release?

Yes, for sure.

compiler/optimizations/gpuTransforms.cpp

e-kayrakli · 2024-08-28T20:06:41Z

compiler/passes/convert-uast.cpp

-    body->insertAtTail(new CallExpr(PRIM_ASSERT_ON_GPU,
-                                    new SymExpr(gTrue)));
-    return true;
+    body->insertAtTail(new CallExpr("chpl__assertOnGpuAttr"));


I know I said this wouldn't bother me, but could you explain to me (or point me to an issue) why this is better?

Some of my reasons:

I can use the overload resolution process to select candidate functions. For instance, I may use a where clause to avoid inserting GPU primitives when CHPL_LOCALE_MODEL=flat

I can ensure that the arguments to the primitives are correctly typed (since calls are just “regular functions”).

I don’t have to change compiler code to adjust the behavior of attributes, to some extent

Using an experimental annotation (which I plan to work on next after this PR) I can get chpldoc to use these function definitions (like chpl__assertOnGpuAttr) to generate documentation for attributes, which addresses Should we provide a specific way to document attributes? #23149.

Using an experimental annotation (which I plan to work on next after this PR) I can get chpldoc to use these function definitions (like chpl__assertOnGpuAttr) to generate documentation for attributes, which addresses

Nice.

Thanks for explanation. You are right on all counts.

e-kayrakli · 2024-08-28T20:07:00Z

compiler/passes/convert-uast.cpp

  if (assertOnGpuAttr) {
-    body->insertAtTail(new CallExpr(PRIM_ASSERT_ON_GPU,


Have you removed this primitive? I don't think you did.

I didn't, and don't intend to in this PR, though my goal in the long term is to remove it.

compiler/passes/convert-uast.cpp

modules/standard/AutoGpu.chpl

Signed-off-by: Danila Fedorin <[email protected]>

e-kayrakli · 2024-08-29T17:24:28Z

runtime/include/chpl-gpu.h

+// Provide a fallback for the chpl_assert_on_gpu function for non-GPU locales.
+// This works exactly the same as the standard one.
+
+static inline void chpl_assert_on_gpu(int32_t lineno, int32_t filenameIdx) {


Consider ln and fn for formal names to have consistency in the runtime. (I don't love those names at all, I love consistency)

Signed-off-by: Danila Fedorin <[email protected]>

@e-kayrakli

Closes #23149. This PR implements my initial vision for documenting attributes based on procedures that implement them. In PR #25826, I made GPU attributes be backed by Chapel functions. One of my goals with this approach was to be able to document these attributes using the existing Chapel mechanism (doc comments attached to procedures). This PR does so, by adding a new chpldoc attribute (`@chpldoc.attributeSignature`) that marks a function as defining the signature (formals etc.) of an attribute. Since the functions in the GPU module that back attributes are internal (not intended to be called directly), they would normally be skipped by chpldoc's output. This PR adjusts the behavior of chpldoc to not skip the functions in this case. It also adds documentation to the various attributes using this new mechanism. This PR depends on chapel-lang/sphinxcontrib-chapeldomain#97, which implements the RST-to-HTML conversion of the new 'annotation' directive chpldoc will emit. <img width="725" alt="Screen Shot 2024-09-05 at 10 16 43 AM" src="https://github.com/user-attachments/assets/bcc3fbe8-301f-4727-ae92-be2968d4e0f4"> Reviewed by @e-kayrakli, @mppf, and @lydia-duncan -- thanks! ## Testing - [x] paratest (just in case)

@e-kayrakli

This documents and makes use of the various changes I made in #25826, chapel-lang/sphinxcontrib-chapeldomain#97, and #25884. Reviewed by @e-kayrakli -- thanks!

DanilaFe added 10 commits August 26, 2024 11:20

Use Chapel functions to back GPU attributes

bfc96dd

Signed-off-by: Danila Fedorin <[email protected]>

Improve the last-resort error case for blockSize

f645734

Signed-off-by: Danila Fedorin <[email protected]>

Add a static version of 'assertOnGpu'

984e44e

Signed-off-by: Danila Fedorin <[email protected]>

Ensure leftover GPU primitives are left in the tree

7ed6e71

Signed-off-by: Danila Fedorin <[email protected]>

Emit an error when runtime-only 'assert on gpu' is used

1c4b678

Signed-off-by: Danila Fedorin <[email protected]>

Add tests for how GPU code behaves with LOCALE!=gpu

f05768a

Signed-off-by: Danila Fedorin <[email protected]>

Lock down 'assertEligible' not causing assertions

2e73c19

Signed-off-by: Danila Fedorin <[email protected]>

Add a test that locks down 'assert eligible' errors

91b974a

Signed-off-by: Danila Fedorin <[email protected]>

Avoid internal errors when reporting errors with 'assert eligible'

8cd6a12

Signed-off-by: Danila Fedorin <[email protected]>

Update tests with new module

b94d40d

Signed-off-by: Danila Fedorin <[email protected]>

e-kayrakli reviewed Aug 28, 2024

View reviewed changes

DanilaFe added 3 commits August 28, 2024 18:36

Apply most reviewer feedback

f9d29d9

Signed-off-by: Danila Fedorin <[email protected]>

Allow compilation to succeed on LM=flat w/assert_on_gpu

099ba0b

Signed-off-by: Danila Fedorin <[email protected]>

Move assert-on-gpu warning into module code and allow silencing it

0b19479

Signed-off-by: Danila Fedorin <[email protected]>

e-kayrakli reviewed Aug 29, 2024

View reviewed changes

DanilaFe added 3 commits August 30, 2024 09:21

Update function arg names per reviewer request

bce213a

Signed-off-by: Danila Fedorin <[email protected]>

Improve and lock down error messages for insufficient arguments

56f4e09

Signed-off-by: Danila Fedorin <[email protected]>

Update assertOnGpu-in-LM=flat tests

594dee7

Signed-off-by: Danila Fedorin <[email protected]>

DanilaFe merged commit d301f0a into chapel-lang:main Aug 30, 2024
7 checks passed

DanilaFe mentioned this pull request Sep 5, 2024

Document GPU attributes #25884

Merged

1 task

DanilaFe mentioned this pull request Sep 18, 2024

Update GPU attribute documentation #25963

Merged

DanilaFe added a commit that referenced this pull request Sep 18, 2024

Update GPU attribute documentation (#25963)

f6b4319

This documents and makes use of the various changes I made in #25826, chapel-lang/sphinxcontrib-chapeldomain#97, and #25884. Reviewed by @e-kayrakli -- thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU Attribute Improvements #25826

GPU Attribute Improvements #25826

DanilaFe commented Aug 27, 2024 •

edited

Loading

e-kayrakli left a comment

e-kayrakli Aug 28, 2024

DanilaFe Aug 29, 2024

e-kayrakli Aug 29, 2024

DanilaFe Aug 29, 2024

e-kayrakli Aug 28, 2024

DanilaFe Aug 29, 2024

e-kayrakli Aug 29, 2024

e-kayrakli Aug 28, 2024

DanilaFe Aug 29, 2024

e-kayrakli Aug 29, 2024

		if (assertOnGpuAttr) {
		body->insertAtTail(new CallExpr(PRIM_ASSERT_ON_GPU,

GPU Attribute Improvements #25826

GPU Attribute Improvements #25826

Conversation

DanilaFe commented Aug 27, 2024 • edited Loading

Case 1: Writing CPU and GPU code using the GPU locale model

Case 2: Disabling GPU support and compiling with CHPL_LOCALE_MODEL=flat

Testing

e-kayrakli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DanilaFe commented Aug 27, 2024 •

edited

Loading

Case 2: Disabling GPU support and compiling with `CHPL_LOCALE_MODEL=flat`