Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CUML_USING_RANGE macro for easier NVTX profiling #4436

Closed

Conversation

achirkin
Copy link
Contributor

@achirkin achirkin commented Dec 9, 2021

Add an AUTO_RANGE type and a CUML_USING_RANGE macro for easier profiling with NVTX.

  1. CUML_USING_RANGE creates a variable AUTO_RANGE on the stack, so it's destructed at the end of the enclosing code block (and triggers POP_RANGE). The benefit is to (a) never forget to pop the range and (b) do not mess up the NVTX profile in the presence of exceptions.
  2. The macro and the constructor support printf-style varargs to label things in a more meaningful way (e.g. add an epoch number to the label).

@achirkin achirkin requested a review from a team as a code owner December 9, 2021 14:48
@achirkin achirkin added 2 - In Progress Currenty a work in progress feature request New feature or request non-breaking Non-breaking change 3 - Ready for Review Ready for review by team and removed CUDA/C++ 2 - In Progress Currenty a work in progress labels Dec 9, 2021
Copy link
Contributor

@wphicks wphicks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this approach! Is there any particular reason why we need to introduce a new macro to facilitate it, though? Something along the lines of

auto nvtx_range = AUTO_RANGE(...);

shouldn't be any more onerous to type and makes it clearer what is going on. It would be nice to avoid adding macros unless there's a really compelling reason to do so.

I'll also do a more thorough line-level review, but please be sure to add brackets around all your if/else clauses.

@@ -42,4 +42,81 @@ void PUSH_RANGE(const char* name);
/** Pop the latest range */
void POP_RANGE();

/** Push a named nvtx range that would be popped at the end of the object lifetime. */
class AUTO_RANGE {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be used more generally both inside of RAFT primitives and across different projects. Is there any particular reason we shouldn't push this whole file to RAFT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I asked the same question in the gtests for your current RAFT PR but I haven't finished the review yet so I figured I'd ask here in comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be fantastic! I'd definitely use it in Triton backends if it were available in RAFT.

Copy link
Contributor Author

@achirkin achirkin Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your permission, I'd love to move this to raft :) My PR rapidsai/raft#401 would benefit from that indeed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, I think that's a great idea!

@achirkin
Copy link
Contributor Author

achirkin commented Dec 9, 2021

Thanks, @wphicks for the comments! I aimed at allowing both direct usage and the macro (the macro being obviously very simple). My reason for using the macro was that it generated unique names, so the user wouldn't need to think about any shadowing warnings coming from gcc. Also the macro compiles to absolutely nothing when NVTX is disabled (I'm not sure whether the compiler would completely optimize away the unused variable).

Btw, do we have the readability-braces rule for the single-line if statements? I didn't know that. Why don't we add it to the clang-tidy then? :)

@wphicks
Copy link
Contributor

wphicks commented Dec 9, 2021

I aimed at allowing both direct usage and the macro

My concern would be that if the macro is there, folks will see it as "the right way" to invoke this, and avoiding widespread usage of an additional macro throughout the codebase would be nice.

My reason for using the macro was that it generated unique names

That is a convenient feature, but having human-defined names may also make it clearer why a range is being added to a particular block in the first place.

I'm not sure whether the compiler would completely optimize away the unused variable

That's a more compelling argument to me. Practically speaking, it is true that all compilers we work with (and all that I know of that we don't) will optimize this away, but there's no guarantee.

I'm not sure. I'm generally pretty anti-macro unless we have a super compelling reason to use one, but let's get other folks to weigh in. @cjnolet do you have thoughts on this? Worth a macro or no?

Why don't we add it to the clang-tidy then?

We definitely should!

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.02@0114c67). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-22.02    #4436   +/-   ##
===============================================
  Coverage                ?   85.72%           
===============================================
  Files                   ?      236           
  Lines                   ?    19326           
  Branches                ?        0           
===============================================
  Hits                    ?    16568           
  Misses                  ?     2758           
  Partials                ?        0           
Flag Coverage Δ
dask 46.51% <0.00%> (?)
non-dask 78.62% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0114c67...99e2bf5. Read the comment docs.

@achirkin
Copy link
Contributor Author

Moving this PR to rapidsai/raft#416 Interested parties are invited to continue the discussion there :)

@achirkin achirkin closed this Dec 10, 2021
@achirkin achirkin deleted the fea-using-nvtx-ranges branch December 10, 2021 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CUDA/C++ feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants