-
Notifications
You must be signed in to change notification settings - Fork 540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CUML_USING_RANGE macro for easier NVTX profiling #4436
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this approach! Is there any particular reason why we need to introduce a new macro to facilitate it, though? Something along the lines of
auto nvtx_range = AUTO_RANGE(...);
shouldn't be any more onerous to type and makes it clearer what is going on. It would be nice to avoid adding macros unless there's a really compelling reason to do so.
I'll also do a more thorough line-level review, but please be sure to add brackets around all your if/else clauses.
@@ -42,4 +42,81 @@ void PUSH_RANGE(const char* name); | |||
/** Pop the latest range */ | |||
void POP_RANGE(); | |||
|
|||
/** Push a named nvtx range that would be popped at the end of the object lifetime. */ | |||
class AUTO_RANGE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be used more generally both inside of RAFT primitives and across different projects. Is there any particular reason we shouldn't push this whole file to RAFT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I asked the same question in the gtests for your current RAFT PR but I haven't finished the review yet so I figured I'd ask here in comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be fantastic! I'd definitely use it in Triton backends if it were available in RAFT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With your permission, I'd love to move this to raft :) My PR rapidsai/raft#401 would benefit from that indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, I think that's a great idea!
Thanks, @wphicks for the comments! I aimed at allowing both direct usage and the macro (the macro being obviously very simple). My reason for using the macro was that it generated unique names, so the user wouldn't need to think about any shadowing warnings coming from gcc. Also the macro compiles to absolutely nothing when NVTX is disabled (I'm not sure whether the compiler would completely optimize away the unused variable). Btw, do we have the readability-braces rule for the single-line if statements? I didn't know that. Why don't we add it to the clang-tidy then? :) |
My concern would be that if the macro is there, folks will see it as "the right way" to invoke this, and avoiding widespread usage of an additional macro throughout the codebase would be nice.
That is a convenient feature, but having human-defined names may also make it clearer why a range is being added to a particular block in the first place.
That's a more compelling argument to me. Practically speaking, it is true that all compilers we work with (and all that I know of that we don't) will optimize this away, but there's no guarantee. I'm not sure. I'm generally pretty anti-macro unless we have a super compelling reason to use one, but let's get other folks to weigh in. @cjnolet do you have thoughts on this? Worth a macro or no?
We definitely should! |
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #4436 +/- ##
===============================================
Coverage ? 85.72%
===============================================
Files ? 236
Lines ? 19326
Branches ? 0
===============================================
Hits ? 16568
Misses ? 2758
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
Moving this PR to rapidsai/raft#416 Interested parties are invited to continue the discussion there :) |
Add an
AUTO_RANGE
type and aCUML_USING_RANGE
macro for easier profiling with NVTX.CUML_USING_RANGE
creates a variableAUTO_RANGE
on the stack, so it's destructed at the end of the enclosing code block (and triggersPOP_RANGE
). The benefit is to (a) never forget to pop the range and (b) do not mess up the NVTX profile in the presence of exceptions.