
[FEA] Consider adopting C++20 style Span class abstraction #3118

Closed
hcho3 opened this issue Nov 5, 2020 · 21 comments
Assignees: hcho3
Labels: CUDA / C++ (CUDA issue), feature request (New feature or request)

Comments

@hcho3 (Contributor) commented Nov 5, 2020

Problem.

#3107 was caused by an out-of-bounds access to an array. Many cuML algorithms, such as random forest, use lots of arrays, and it is easy to introduce an out-of-bounds access into the codebase by accident. In addition, an out-of-bounds access in a CUDA kernel is quite difficult to pinpoint and debug: the kernel crashes with a cudaErrorIllegalAddress error, and we then have to manually run cuda-memcheck, which can take a while.

Proposal. We should adopt a C++20-style Span class to model arrays with defined bounds. The Span class will perform automatic bounds checks, allowing us to quickly detect and fix out-of-bounds access bugs; it also follows the fail-fast principle of software development. Furthermore, it is a nicer abstraction because it packs arrays together with their bounds information. (Think Java-style arrays, where every array has a length field.) Passing arrays between functions will become less error-prone when we pass them as Span objects.

XGBoost has a device-side Span class (credit to @trivialfis): https://github.com/dmlc/xgboost/blob/2fcc4f2886340e66e988688473a73d2c46525630/include/xgboost/span.h#L412

Possible disadvantages. Bounds checks may introduce a performance penalty due to the extra branching. My opinion is that the benefit of automatic bounds checking (fewer bugs, improved developer productivity) outweighs a slight performance penalty. The performance impact can be mitigated by supplying a data() method on the Span class: for performance-critical loops, we can extract the raw pointer from the Span and use it directly, avoiding the overhead of bounds checking in operator[]. This should be done sparingly, and only for small tight loops where the bounds of the loop variable are clearly identified.
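
To make the proposal concrete, here is a minimal sketch of what such a device-side Span might look like, assuming debug-only checking via CUDA's device-side assert. The class name and details are hypothetical, not the implementation that was ultimately adopted:

```cpp
// Minimal sketch only -- device_span here is a hypothetical name, not the
// class that cuML/RAFT ultimately adopted. Bounds checking is debug-only,
// via CUDA's device-side assert().
#include <cassert>
#include <cstddef>

template <typename T>
class device_span {
 public:
  __host__ __device__ device_span(T* data, std::size_t size)
      : data_(data), size_(size) {}

  // Checked element access: the assert compiles away when NDEBUG is defined
  // (release builds), so only debug builds pay for the branch.
  __device__ T& operator[](std::size_t idx) const {
    assert(idx < size_ && "device_span: index out of bounds");
    return data_[idx];
  }

  // Escape hatch for performance-critical loops: raw, unchecked access.
  __host__ __device__ T* data() const { return data_; }
  __host__ __device__ std::size_t size() const { return size_; }

 private:
  T* data_;
  std::size_t size_;
};
```

A kernel would index through the span with operator[] in most code and drop down to data() only in the small tight loops mentioned above.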

hcho3 added the "? - Needs Triage" (Need team to review and classify) and "feature request" (New feature or request) labels on Nov 5, 2020
@hcho3 (Contributor, Author) commented Nov 5, 2020

Assigning myself to this ticket. TODO: file a pull request implementing a proof of concept for a device-side Span class in cuML.

@hcho3 hcho3 self-assigned this Nov 5, 2020
hcho3 added the "CUDA / C++" (CUDA issue) label and removed the "? - Needs Triage" (Need team to review and classify) label on Nov 5, 2020
@JohnZed (Contributor) commented Nov 5, 2020

It looks like some Span implementations do bounds checking in debug builds but not release builds, e.g. https://github.com/tcbrindle/span
Would this be desirable behavior to mitigate performance concerns? Frequent bounds checks do seem expensive to me unless the compiler can do a lot of magic to lift them out of key loops and kernels.

@jrhemstad (Contributor) commented Nov 5, 2020

Note that libcudf also has a device_span class. I'm a big fan of using span-like abstractions. We don't do any bounds checking on accesses (debug or otherwise). Doing checks in debug mode with assert seems reasonable.

OOB accesses for operator[] in std::span is UB: https://en.cppreference.com/w/cpp/container/span/operator_at . UB does mean that it's perfectly acceptable to crash the program with an assert or otherwise.

Furthermore, bounds checking from device code is dubious. It looks like XGBoost's Span type uses a mixture of printf and asm("trap;"). The trap corrupts the CUDA context and is an unrecoverable error. Furthermore, the presence of a printf in a kernel, even if it's never executed, can cause noticeable performance degradation as the runtime has to make the necessary syscalls to setup the staging buffer to receive printf output from threads.

Instead of using printf, using the __assert_fail built-in is preferred. It has the same drawback as asm("trap;") in that it will corrupt the CUDA context, but at least you don't need a printf. See libcudf's usage here: https://github.com/rapidsai/cudf/blob/89b802e6cecffe2425048f1f70cd682b865730b8/cpp/include/cudf/detail/utilities/release_assert.cuh#L33-L36
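
To illustrate the debug-only policy being discussed, a hedged sketch (the macro and helper names are made up; this is not the libcudf macro linked above). As far as I know, on Linux the device-side assert() below lowers to the __device__ overload of __assert_fail, so it reports file/line information and aborts the kernel without needing a user-level printf, though the CUDA context is still unusable afterwards, just as with asm("trap;"):

```cpp
// Hypothetical debug-only check; SPAN_CHECK and checked_element are made-up
// names, not actual libcudf/RAFT identifiers.
#include <cassert>
#include <cstddef>

#ifndef NDEBUG
#define SPAN_CHECK(cond) assert(cond)           // active in debug builds
#else
#define SPAN_CHECK(cond) static_cast<void>(0)   // no branch in release builds
#endif

template <typename T>
__device__ T& checked_element(T* data, std::size_t idx, std::size_t size) {
  SPAN_CHECK(idx < size);  // fires a device-side assert only in debug builds
  return data[idx];
}
```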

@trivialfis (Member)

Interesting. I will try the built-in __assert_fail.

@teju85 (Member) commented Nov 5, 2020

Even though skipping it is dangerous, bounds checking has performance concerns (as stated by everyone before me). So, for key algos like RF, I'd not want bounds checks to be done in device code, at least not in release builds.

Ideally, we should emphasize better unit tests; those, coupled with cuda-memcheck-enabled runs, should catch most of these issues. We have had success catching such issues in ml-prims this way. Sadly, we didn't spend time after the 0.14 release on enabling this flow for the cuML unit tests.

@hcho3 (Contributor, Author) commented Nov 5, 2020

Got it. In that case, we can conditionally enable bounds checks in debug builds.

Even without bounds checks, Span is a useful abstraction. For example, it eliminates the mistake of using the wrong bounds when multiple arrays are passed as function arguments. (RF, for example, has multiple functions with 5+ array arguments; it's easy to use the wrong size information.) Span lets us keep the array size information close to the array pointer.
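
As a hypothetical before/after illustrating the point (made-up signatures, not actual cuML functions; device_span refers to the class sketched earlier in this thread):

```cpp
#include <cstddef>

template <typename T>
class device_span;  // hypothetical span type from the earlier sketch

// Before: every array travels with a separately supplied length, and it is
// easy to pair a pointer with the wrong size argument.
void build_forest(const float* features, std::size_t n_feature_values,
                  const int* labels, std::size_t n_labels,
                  int* node_assignments, std::size_t n_rows);

// After: each span carries its own extent, so pointer and size cannot drift
// apart as arguments are added, removed, or reordered.
void build_forest(device_span<const float> features,
                  device_span<const int> labels,
                  device_span<int> node_assignments);
```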

@canonizer (Contributor)

I'm in favor of bounds checking. Are there any measurements for performance penalties caused by device-side bounds checking, especially for memory bandwidth-bound problems?

@trivialfis (Member) commented Nov 5, 2020

@canonizer I believe it varies between different cards. Also, most of the hot loops have a predefined bound, so the overhead can be avoided manually in those loops (see the sketch below). I'm also in favor of having span: it's not only safer, but also a nice abstraction.
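
For example, a sketch of that manual workaround, assuming the hypothetical device_span from the earlier sketch: the bound is read once up front and the raw pointer is used inside the loop, so the per-element check in operator[] is never paid in the hot path.

```cpp
// Illustrative only; sum_row is a made-up helper.
template <typename T>
__device__ T sum_row(device_span<const T> row) {
  const T* p = row.data();           // unchecked raw pointer
  const std::size_t n = row.size();  // loop bound established once
  T acc = T(0);
  for (std::size_t i = 0; i < n; ++i) {
    acc += p[i];
  }
  return acc;
}
```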

hcho3 changed the title from "[FEA] Consider adopting C++20 style Span class to automate bounds check" to "[FEA] Consider adopting C++20 style Span class abstraction" on Nov 5, 2020
@hcho3 (Contributor, Author) commented Nov 5, 2020

I've updated the title of this issue, to separate the two concerns: 1) introducing Span in general; and 2) performing bounds check in operator[].

@teju85 (Member) commented Nov 6, 2020

I too am completely in favor of span. Just a couple of notes:

  1. Relying on this feature for the safety of our code is not enough (I have made this mistake in the past!). Rigorous testing, as well as regularly running our tests with cuda-memcheck tooling enabled, is a more robust and sure-shot way to achieve safer code.
  2. Since we already have a lot of algos with varying roofline behavior, we'll need to introduce bounds checks with a lot of care. I'd strongly prefer doing this only after we have our ASV-based reporting enabled for all of our C++ benchmarks as well.

Thank you @jrhemstad. I think the device_span class in cudf pretty much covers all our needs in cuML too!

@hcho3 (Contributor, Author) commented Nov 6, 2020

Thanks for pointing to device_span, @jrhemstad. Would it be a good idea to submit a PR to put it in RAFT, so that cuML can also use it? (I don't think cuML has direct access to libcudf's headers.)

cc @dantegd

@trivialfis (Member) commented Nov 6, 2020

Hi @jrhemstad, I looked into the __assertfail you mentioned in your previous comment and have a few questions:

  • Is __CUDACC_ARCH__ a real macro defined by nvcc?
  • Skimming through the CUDA header common_functions.h, __assertfail seems to be defined only for runtime compilation?

I also looked into the host/device function __assert_fail. There might be an nvcc bug that turns it into an undefined identifier in some device functions (4 functions in XGBoost), even though it works in all the other device functions.

@jrhemstad (Contributor)

  • Is __CUDACC_ARCH__ a real macro defined by nvcc?

Wow, that's definitely wrong and definitely not a real macro. Looks like a mishmash of __CUDACC__ and __CUDA_ARCH__. It looks like our code hasn't actually been using __assertfail for a long time.

I corrected it to __CUDA_ARCH__ and confirmed that I'm seeing undefined reference errors. I'm investigating now.

@jrhemstad (Contributor)

Thanks for pointing to device_span, @jrhemstad. Would it be a good idea to submit a PR to put it in RAFT, so that cuML can also use it? (I don't think cuML has direct access to libcudf's headers.)

cc @dantegd

I'm hoping to ultimately get it into Thrust. That would be the ideal place for it to live.

@jrhemstad (Contributor)

@trivialfis thank you for pointing out this mistake. It looks like a perfect storm of typos/misunderstanding and a dropped unit test let this go uncaught for a very long time. I've corrected it here: rapidsai/cudf#6696

@jrhemstad (Contributor)

I've opened NVIDIA/cccl#752. We'll work with the Thrust team to try and get this added in the medium term. It probably won't be ready in the short term, so if you want something sooner than later, copying what is in cudf is probably the best thing to do.

@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@trivialfis (Member)

Hi, I think it will take some time before Thrust adds span, and some more time before we can use it. Any chance we can make our own implementation in cuML as a temporary solution?

@trivialfis (Member)

Being worked on here: rapidsai/raft#399 .

@trivialfis (Member)

@hcho3 It's merged into raft now.

@hcho3 hcho3 closed this as completed Apr 27, 2022