Add CUDF_UNREACHABLE macro. #9727
Codecov Report
@@ Coverage Diff @@
## branch-22.04 #9727 +/- ##
================================================
- Coverage 86.13% 86.01% -0.13%
================================================
Files 139 139
Lines 22438 22426 -12
================================================
- Hits 19328 19290 -38
- Misses 3110 3136 +26
…ail rather than be unreachable).
This PR has been labeled
@bdice can we get this wrapped up?
@jrhemstad Yes! I might need guidance on a few cases where I am unsure about whether the path should actually be marked as unreachable. I think the current state of the PR may be too aggressive in applying that.
rerun tests
rerun tests
 #ifndef __CUDA_ARCH__
-  CUDF_FAIL("Unsupported type_id.");
+  CUDF_FAIL("Invalid type_id.");
 #else
-  cudf_assert(false && "Unsupported type_id.");
-
-  // The following code will never be reached, but the compiler generates a
-  // warning if there isn't a return value.
-
-  // Need to find out what the return type is in order to have a default
-  // return value and solve the compiler warning for lack of a default
-  // return
-  using return_type = decltype(f.template operator()<int8_t>(std::forward<Ts>(args)...));
-  return return_type();
+  CUDF_UNREACHABLE("Invalid type_id.");
 #endif
Would it be useful to make this into a single macro (maybe this should be CUDF_UNREACHABLE, so it covers both host and device code)? I see the pattern in a few places in the PR.
I considered that, but I didn't want to hide the dependence on #ifndef __CUDA_ARCH__. Failure/raising an error and unreachable code mean very different things in my opinion, and I didn't want to conflate them by replacing this with an idiom that has potential for misuse. What do you think?
I'm not sure. It's weird because we do have the uneven handling between host and device as it is. Maybe it should be the other way around, and CUDF_FAIL can call CUDF_UNREACHABLE if in device code. As in: "we failed on the device, here's an assert if debug and don't expect a return".
Tagging @jrhemstad for thoughts on this. I would defer that change to a later PR if possible.
I think I'm still in favor of keeping these macros separate. Letting CUDF_FAIL defer to an unreachable path seems dangerous. Developers that see CUDF_FAIL should be able to reasonably expect an error, and should not use it to signify branches that can be optimized out as impossible to reach. A macro named something like CUDF_IMPOSSIBLE might be a compromise, but I think a combined macro like that would obscure the intention (in harmful ways) more than it helps with cleanliness/brevity.
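To make the distinction in this comment concrete, here is a minimal host-only sketch of two macros with different contracts. The names EXAMPLE_FAIL and EXAMPLE_UNREACHABLE, and the helper functions, are hypothetical and are not cuDF's actual definitions; the sketch assumes a GCC- or Clang-compatible compiler for the unreachable hint.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical macros for illustration only; not cuDF's real definitions.

// EXAMPLE_FAIL: an error that callers may legitimately trigger. Always throws.
#define EXAMPLE_FAIL(msg) throw std::logic_error(std::string{"failure: "} + (msg))

// EXAMPLE_UNREACHABLE: a promise that control never gets here. The assert
// fires in debug builds; with the hint, the optimizer may drop the branch,
// so actually reaching it in a release build is undefined behavior.
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_UNREACHABLE(msg) \
  do {                           \
    assert(false && msg);        \
    __builtin_unreachable();     \
  } while (0)
#else
#define EXAMPLE_UNREACHABLE(msg) \
  do {                           \
    assert(false && msg);        \
  } while (0)
#endif

// A zero divisor can come from user input, so this is a failure.
inline int checked_divide(int a, int b)
{
  if (b == 0) { EXAMPLE_FAIL("division by zero"); }
  return a / b;
}

// The caller contract guarantees x is 0 or 1, so the tail is unreachable.
inline const char* bit_name(int x)
{
  if (x == 0) { return "zero"; }
  if (x == 1) { return "one"; }
  EXAMPLE_UNREACHABLE("x must be 0 or 1");
}
```

A reader who sees EXAMPLE_FAIL can rely on an exception being thrown, while EXAMPLE_UNREACHABLE gives no such guarantee; merging the two would blur that difference.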
Yeah, obscuring the intention is the main issue I can see.
Here's what bugs me: we are using CUDF_UNREACHABLE both for truly unreachable code and failure. Ideally, the CUDF_UNREACHABLE macro would call GCC's __builtin_unreachable() if in host code. But we call CUDF_FAIL instead in such cases.
Feels like code that should not be executed should use CUDF_FAIL (both host and device) and truly unreachable code should use CUDF_UNREACHABLE (both host and device). I understand that this may do more harm than good, just bringing it up for consideration.
I believe all the cases handled in this way are actually unreachable (by enum exhaustion, in most cases). We’re just taking the opportunity to raise an error on the host because we can do that without any significant performance or compile time penalty.
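As an illustration of "unreachable by enum exhaustion", the sketch below handles every enumerator in a switch and then treats the fall-through differently on host and device. The enum, the function, and the EXAMPLE_HD macro are hypothetical; the device branch assumes the CUDA compiler accepts __builtin_unreachable() in device code.

```cpp
#include <cstdint>
#include <stdexcept>

#ifdef __CUDACC__
#define EXAMPLE_HD __host__ __device__
#else
#define EXAMPLE_HD  // plain C++ build: no CUDA annotations
#endif

// Hypothetical stand-in for a type enumeration such as cudf::type_id.
enum class example_type_id : int32_t { INT8, INT32, FLOAT64 };

// Returns the element size in bytes for the type named by `id`.
EXAMPLE_HD inline int size_of(example_type_id id)
{
  switch (id) {
    case example_type_id::INT8: return 1;
    case example_type_id::INT32: return 4;
    case example_type_id::FLOAT64: return 8;
  }
  // Every enumerator is handled above, so this point is reached only if `id`
  // holds a value outside the enum (for example, after a bad cast).
#ifndef __CUDA_ARCH__
  // Host pass: throwing is cheap, so report the problem.
  throw std::logic_error("Invalid type_id.");
#else
  // Device pass: no exceptions; tell the compiler no return value is needed.
  __builtin_unreachable();
#endif
}
```

On the host, an out-of-range value such as `static_cast<example_type_id>(99)` produces an exception rather than silent misbehavior, which is the opportunity the comment describes.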
couple more nitpicks, looks 🔥 otherwise
rerun tests
rerun tests
@gpucibot merge
This reverts commit 48cebf7.
…seen in NDS q72 in Spark (#10534)
The following change addresses a performance degradation we noticed in the `mixed_join` and `compute_mixed_join_output_size` kernels that looks to be tied to their theoretical occupancy, as limited by the number of registers used.
The regression is triggered by this patch: #9727, which improves handling of unreachable code paths. Somehow, that change alters the number of registers these kernels need. Both `mixed_join` and `compute_mixed_join_output_size` are very sensitive to the register count, per Nsight Compute. With the patch, the registers required changed from 92 to 102, and from 118 to 141, respectively.
The fix here hints to the compiler what our block size is (128 threads). This, from our testing, allows the compiler to reduce the number of registers required to 128 for `compute_mixed_join_output_size` and 96 for `mixed_join`. This led to better occupancy (I think @nvdbaranec measured it going from 30% to 50%), and I saw the wall clock time of q72 (which started all this) go from 133s to 121s, which is within the ballpark I'd expect.
Authors:
- Alessandro Bellina (https://github.com/abellina)
Approvers:
- Mike Wilson (https://github.com/hyperbolic2346)
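A block-size hint of the kind described in this commit message is commonly written with CUDA's `__launch_bounds__` qualifier. The sketch below is an assumption about the general technique, not the code from #10534: the kernel `scale`, the EXAMPLE_* macros, and `example_block_size` are hypothetical, and the non-CUDA fallback exists only so the snippet also builds as ordinary C++.

```cpp
#include <cstddef>

#ifdef __CUDACC__
#define EXAMPLE_GLOBAL __global__
#define EXAMPLE_LAUNCH_BOUNDS(n) __launch_bounds__(n)
#else
// Fallback so the sketch also compiles without nvcc (illustration only).
#define EXAMPLE_GLOBAL
#define EXAMPLE_LAUNCH_BOUNDS(n)
#endif

// Assumed to match the block size used at launch (128 threads).
constexpr int example_block_size = 128;

// Promising at most `block_size` threads per block lets the compiler budget
// registers for that size rather than for the largest possible block.
template <int block_size>
EXAMPLE_LAUNCH_BOUNDS(block_size) EXAMPLE_GLOBAL void scale(float* data, float factor, std::size_t n)
{
#ifdef __CUDACC__
  // One element per thread.
  std::size_t const i = blockIdx.x * static_cast<std::size_t>(block_size) + threadIdx.x;
  if (i < n) { data[i] *= factor; }
#else
  // Serial stand-in for the whole grid when built as plain C++.
  for (std::size_t i = 0; i < n; ++i) { data[i] *= factor; }
#endif
}
```

Under nvcc, such a kernel would be launched as `scale<example_block_size><<<grid, example_block_size>>>(data, factor, n)`; launching with more threads per block than the declared bound fails.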
Resolves #7753. I replaced all instances of cudf_assert(false && "message"); with CUDF_UNREACHABLE("message");. There are a few instances where the condition of the assertion is not always false, and thus the code following it may still be reachable. I did not change those cases.