Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU Attribute Improvements #25826

Merged
merged 16 commits into from
Aug 30, 2024
Merged

Conversation

DanilaFe
Copy link
Contributor

@DanilaFe DanilaFe commented Aug 27, 2024

Closes #22822.

This PR seeks to improve the experience of using various GPU attributes in several cases. These changes are motivated by some of the difficulties I've observed @Iainmon to have had while working on his GPU-enabled machine learning code.

Case 1: Writing CPU and GPU code using the GPU locale model

The way to ensure that a loop is GPU-eligible in user code (and to fail compilation if the loop is not GPU eligible), is to use @assertOnGpu. However, one cannot do this when writing code that is expected to support both GPUs and CPUs. I've observed Iain's code to have something like the following:

if onGpu {
  @assertOnGpu
  foreach ... { }
} else {
  foreach ... { /* the same loop as above */ }
}

This way, the code could be used on both the GPU and the CPU, and the compiler will ensure that the GPU version is eligible. However, this introduces a maintenance burden, and makes the code rather verbose. To work around this problem, I introduce a new GPU primitive + attribute: @gpu.assertEligible. This attribute has the same behavior as @assertOnGpu at compile-time, but it does not have a runtime effect. Thus, the code above can be flattened and continue to support both CPU and GPU runs:

if onGpu {
  @gpu.assertEligible
  foreach ... { }
}

In my opinion, we should phase out the use of @assertOnGpu in favor of @gpu.assertEligible. It's unclear to me that having a runtime assertion using this attribute is worth keeping it around two similar attributes. Personally, I think that the compile-time assertion can be handled by @gpu.assertEligible, and various utilities from GpuDiagnostics for tracking kernel launches etc. can be used to ensure that GPU execution occurs at runtime. This PR doesn't make this (potentially more controversial change).

Case 2: Disabling GPU support and compiling with CHPL_LOCALE_MODEL=flat

When I told Iain to run his performance experiments in the flat locale model (to get started with initial performance results via the CPU), he immediately ran into internal errors. This is an instance of #22822.

My chosen solution to this problem is to make @assertOnGpu a compile-time error under CHPL_LOCALE_MODEL=flat. This is because of the semantics of @assertOnGpu: this attribute has a runtime check; without a GPU, the check is guaranteed to fail, and cause a "certain" failure. This error is now user-facing, and tells the user to switch to @gpu.assertEligible if all they want is a compile-time check.

On the other hand, the @gpu.assertEligible attribute, which does not have any runtime semantics, does not cause a compilation error with the flat locale model. Instead, the attribute is simply ignored (we don't perform any GPU logic with the flat locale model, and it doesn't seem worth it to actually perform GPU transformations / analysis for the sole purpose of validating GPU eligibility). The same is true for @gpu.blockSize, and the non-user-facing "GPU primitive block" primitive which is used to group GPU primitives created via attributes. Thus, the following code compiles and runs just fine in the flat locale model:

@gpu.assertEligible
@gpu.blockSize(128)
foreach i in 1..128 { /* ... */ }

Reviewed by @e-kayrakli -- thanks!

Testing

  • new flat tests for GPU primitives, including a new user-facing error.
  • GPU tests, including new tests for @gpu.assertEligible
  • paratest

Copy link
Contributor

@e-kayrakli e-kayrakli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good generally. I have mostly nits. I have more testing ideas that I'd like to see go in with this PR, though:

  • Error message for blockSize(1,2) if we don't have it. (I know it is not about this PR, but the PR touches relevant areas)
  • More importantly, what happens if I use both flavors of assertions together in flat and gpu locale models? Some of those cases will error, some should just work. It'd be good to see that they can be used without an embarrassing issue.

reason = "is marked with @gpu.assertEligible";
} else {
CHPL_ASSERT(assertion->isPrimitive(PRIM_ASSERT_ON_GPU));
reason = "contains assertOnGpu()";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like we need this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do; the legacy assertOnGpu is still present in the code. I deprecated and can remove it in a separate PR, but would rather not do it in this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Would you be up for doing that before the release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for sure.

compiler/optimizations/gpuTransforms.cpp Outdated Show resolved Hide resolved
compiler/optimizations/gpuTransforms.cpp Outdated Show resolved Hide resolved
compiler/optimizations/gpuTransforms.cpp Outdated Show resolved Hide resolved
compiler/optimizations/gpuTransforms.cpp Outdated Show resolved Hide resolved
body->insertAtTail(new CallExpr(PRIM_ASSERT_ON_GPU,
new SymExpr(gTrue)));
return true;
body->insertAtTail(new CallExpr("chpl__assertOnGpuAttr"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I said this wouldn't bother me, but could you explain to me (or point me to an issue) why this is better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of my reasons:

  1. I can use the overload resolution process to select candidate functions. For instance, I may use a where clause to avoid inserting GPU primitives when CHPL_LOCALE_MODEL=flat
  2. I can ensure that the arguments to the primitives are correctly typed (since calls are just “regular functions”).
  3. I don’t have to change compiler code to adjust the behavior of attributes, to some extent
  4. Using an experimental annotation (which I plan to work on next after this PR) I can get chpldoc to use these function definitions (like chpl__assertOnGpuAttr) to generate documentation for attributes, which addresses Should we provide a specific way to document attributes? #23149.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an experimental annotation (which I plan to work on next after this PR) I can get chpldoc to use these function definitions (like chpl__assertOnGpuAttr) to generate documentation for attributes, which addresses

Nice.

Thanks for explanation. You are right on all counts.

if (assertOnGpuAttr) {
body->insertAtTail(new CallExpr(PRIM_ASSERT_ON_GPU,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you removed this primitive? I don't think you did.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't, and don't intend to in this PR, though my goal in the long term is to remove it.

compiler/passes/convert-uast.cpp Show resolved Hide resolved
modules/standard/AutoGpu.chpl Show resolved Hide resolved
modules/standard/AutoGpu.chpl Show resolved Hide resolved
// Provide a fallback for the chpl_assert_on_gpu function for non-GPU locales.
// This works exactly the same as the standard one.

static inline void chpl_assert_on_gpu(int32_t lineno, int32_t filenameIdx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider ln and fn for formal names to have consistency in the runtime. (I don't love those names at all, I love consistency)

@DanilaFe DanilaFe merged commit d301f0a into chapel-lang:main Aug 30, 2024
7 checks passed
@DanilaFe DanilaFe mentioned this pull request Sep 5, 2024
1 task
DanilaFe added a commit that referenced this pull request Sep 5, 2024
Closes #23149.

This PR implements my initial vision for documenting attributes based on
procedures that implement them. In PR
#25826, I made GPU attributes
be backed by Chapel functions. One of my goals with this approach was to
be able to document these attributes using the existing Chapel mechanism
(doc comments attached to procedures). This PR does so, by adding a new
chpldoc attribute (`@chpldoc.attributeSignature`) that marks a function
as defining the signature (formals etc.) of an attribute.

Since the functions in the GPU module that back attributes are internal
(not intended to be called directly), they would normally be skipped by
chpldoc's output. This PR adjusts the behavior of chpldoc to not skip
the functions in this case.

It also adds documentation to the various attributes using this new
mechanism.

This PR depends on
chapel-lang/sphinxcontrib-chapeldomain#97, which
implements the RST-to-HTML conversion of the new 'annotation' directive
chpldoc will emit.

<img width="725" alt="Screen Shot 2024-09-05 at 10 16 43 AM"
src="https://github.com/user-attachments/assets/bcc3fbe8-301f-4727-ae92-be2968d4e0f4">

Reviewed by @e-kayrakli, @mppf, and @lydia-duncan -- thanks!

## Testing

- [x] paratest (just in case)
DanilaFe added a commit that referenced this pull request Sep 18, 2024
This documents and makes use of the various changes I made in
#25826,
chapel-lang/sphinxcontrib-chapeldomain#97, and
#25884.

Reviewed by @e-kayrakli -- thanks!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

assertOnGpu causes compiler segfault with CHPL_LOCALE_MODEL!=gpu
2 participants