Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CUDF_UNREACHABLE macro. #9727

Merged
merged 19 commits into from
Mar 18, 2022

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Nov 19, 2021

Resolves #7753. I replaced all instances of cudf_assert(false && "message"); with CUDF_UNREACHABLE("message");. There are a few instances where the condition of the assertion is not always false, and thus the code following it may still be reachable. I did not change those cases.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Nov 19, 2021
@bdice bdice changed the title Add CUDF_UNREACHABLE Add CUDF_UNREACHABLE macro. Nov 19, 2021
@bdice bdice added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Nov 19, 2021
@bdice bdice self-assigned this Nov 19, 2021
@codecov
Copy link

codecov bot commented Nov 20, 2021

Codecov Report

Merging #9727 (3d3000c) into branch-22.04 (4596244) will decrease coverage by 0.12%.
The diff coverage is 93.70%.

@@               Coverage Diff                @@
##           branch-22.04    #9727      +/-   ##
================================================
- Coverage         86.13%   86.01%   -0.13%     
================================================
  Files               139      139              
  Lines             22438    22426      -12     
================================================
- Hits              19328    19290      -38     
- Misses             3110     3136      +26     
Impacted Files Coverage Δ
python/cudf/cudf/core/dataframe.py 93.57% <ø> (ø)
python/cudf/cudf/core/series.py 95.16% <ø> (ø)
python/dask_cudf/dask_cudf/tests/test_accessor.py 98.41% <ø> (ø)
python/cudf/cudf/core/column/decimal.py 91.30% <73.68%> (-1.01%) ⬇️
python/cudf/cudf/core/column/categorical.py 89.63% <84.61%> (-0.29%) ⬇️
python/cudf/cudf/core/column/string.py 88.91% <94.44%> (+0.64%) ⬆️
python/cudf/cudf/core/column/numerical.py 95.62% <95.83%> (+0.64%) ⬆️
python/cudf/cudf/api/types.py 89.79% <100.00%> (ø)
python/cudf/cudf/core/column/column.py 89.27% <100.00%> (+0.10%) ⬆️
python/cudf/cudf/core/column/datetime.py 88.55% <100.00%> (-0.52%) ⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 621d26f...3d3000c. Read the comment docs.

@bdice bdice changed the base branch from branch-22.02 to branch-22.04 January 20, 2022 04:41
@github-actions
Copy link

github-actions bot commented Mar 2, 2022

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@jrhemstad
Copy link
Contributor

@bdice can we get this wrapped up?

@bdice
Copy link
Contributor Author

bdice commented Mar 2, 2022

@jrhemstad Yes! I might need guidance on a few cases where I am unsure about whether the path should actually be marked as unreachable. I think the current state of the PR may be too aggressive in applying that.

@bdice bdice marked this pull request as ready for review March 15, 2022 21:02
@bdice bdice requested a review from a team as a code owner March 15, 2022 21:02
@bdice bdice requested review from vuule and nvdbaranec March 15, 2022 21:02
cpp/src/hash/hashing.cu Outdated Show resolved Hide resolved
@bdice
Copy link
Contributor Author

bdice commented Mar 15, 2022

rerun tests

@nvdbaranec
Copy link
Contributor

rerun tests

Comment on lines 513 to 517
#ifndef __CUDA_ARCH__
CUDF_FAIL("Unsupported type_id.");
CUDF_FAIL("Invalid type_id.");
#else
cudf_assert(false && "Unsupported type_id.");

// The following code will never be reached, but the compiler generates a
// warning if there isn't a return value.

// Need to find out what the return type is in order to have a default
// return value and solve the compiler warning for lack of a default
// return
using return_type = decltype(f.template operator()<int8_t>(std::forward<Ts>(args)...));
return return_type();
CUDF_UNREACHABLE("Invalid type_id.");
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to make this into a single macro (maybe this should be CUDF_UNREACHABLE, so it covers both host and device code)? I see the pattern in a few places in the PR.

Copy link
Contributor Author

@bdice bdice Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but I didn't want to hide the dependence on #ifndef __CUDA_ARCH__. Failure/raising an error and unreachable code mean very different things in my opinion, and I didn't want to conflate them by replacing this with an idiom that has potential for misuse. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. It's weird because we do have the uneven handling between host and device as it is. Maybe it should be the other way around, and CUDF_FAIL can call CUDF_UNREACHABLE if in device code. As in - "we failed on the device, here's an assert if debug and don't expect a return".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tagging @jrhemstad for thoughts on this. I would defer that change to a later PR if possible.

Copy link
Contributor Author

@bdice bdice Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm still in favor of keeping these macros separate. Letting CUDF_FAIL defer to an unreachable path seems dangerous. Developers that see CUDF_FAIL should be able to reasonably expect an error, and should not use it to signify branches that can be optimized out as impossible to reach. A macro named something like CUDF_IMPOSSIBLE might be a compromise, but I think a combined macro like that would obscure the intention (in harmful ways) more than it helps with cleanliness/brevity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, obscuring the intention is the main issue I can see.

Here's what bugs me: we are using CUDF_UNREACHABLE both for truly unreachable code and failure. Ideally, CUDF_UNREACHABLE macro would call GCC's __builtin_unreachable() if in host code. But we call CUDF_FAIL instead in such cases.
Feels like code that should not be executed should use CUDF_FAIL (both host and device) and truly unreachable code should use CUDF_UNREACHABLE (both host and device). I understand that this may do more hard than good, just bringing it up for consideration.

Copy link
Contributor Author

@bdice bdice Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe all the cases handled in this way are actually unreachable (by enum exhaustion, in most cases). We’re just taking the opportunity to raise an error on the host because we can do that without any significant performance or compile time penalty.

Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple more nitpicks, looks 🔥 otherwise

cpp/src/io/parquet/chunk_dict.cu Outdated Show resolved Hide resolved
cpp/src/io/parquet/chunk_dict.cu Outdated Show resolved Hide resolved
@bdice
Copy link
Contributor Author

bdice commented Mar 17, 2022

rerun tests

1 similar comment
@jrhemstad
Copy link
Contributor

rerun tests

@bdice bdice added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Mar 18, 2022
@bdice
Copy link
Contributor Author

bdice commented Mar 18, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 48cebf7 into rapidsai:branch-22.04 Mar 18, 2022
abellina added a commit to abellina/cudf that referenced this pull request Mar 28, 2022
ajschmidt8 pushed a commit that referenced this pull request Mar 30, 2022
…seen in NDS q72 in Spark (#10534)

The following change addresses a performance degradation we noticed in the `mixed_join` and `compute_mixed_join_output_size` that looks to be tied to the theoretical occupancy of these kernels, as limited by the number of registers used.

The regression is triggered by this patch: #9727, which improves handling of unreachable code paths. That said, somehow, this change is altering the number of registers these kernels need. Both `mixed_join` and `compute_mixed_join_output_size` are very sensitive to the register count, per NSight compute. With the patch, the register required changed from 92 to 102, and 118 to 141 respectively. 

The fix here hints the compiler what our block size is (128 threads). This, from our testing, allows the compiler to reduce the number of registers required to 128 for `compute_mixed_join_output_size` and 96 for `mixed_join`. This lead to better occupancy (I think @nvdbaranec measured it going from 30% to 50%) and I saw the wall clock time of q72 (which started all this) to go from 133s to 121s, which is within the ballpark I'd expect.

Authors:
   - Alessandro Bellina (https://github.com/abellina)

Approvers:
   - Mike Wilson (https://github.com/hyperbolic2346)
abellina added a commit to abellina/cudf that referenced this pull request Apr 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Use __builtin_unreachable for unreachable code paths
5 participants