[REVIEW] Add cuda_event type #870
A few days ago @jrhemstad mentioned you'd welcome this wrapper. I tried to follow |
My feeling is that both `wait` and `record` should actually be members of `cuda_stream_view`. In thinking of member functions as verbs, a stream waits on an event or a stream records an event. The relationship between the thing doing the action and the thing being acted on feels inverted if these are members of the event.
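To make the "stream as the subject of the verb" reading concrete, here is a minimal sketch. `stream_view`, `event_view`, and `producer_consumer_demo` are hypothetical stand-ins written so the example compiles without the CUDA toolkit; they are not RMM's API. In real code the two members would presumably forward to `cudaEventRecord(event, stream)` and `cudaStreamWaitEvent(stream, event, 0)`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an event handle (a real one would wrap cudaEvent_t).
struct event_view {
  int id;
};

// Hypothetical stand-in for a stream handle (a real one would wrap cudaStream_t).
class stream_view {
 public:
  explicit stream_view(int id) : id_{id} {}

  // "The stream records an event". Real code would call cudaEventRecord(event, stream).
  std::string record(event_view e) const
  {
    return "stream " + std::to_string(id_) + " records event " + std::to_string(e.id);
  }

  // "The stream waits on an event". Real code would call cudaStreamWaitEvent(stream, event, 0).
  std::string wait(event_view e) const
  {
    return "stream " + std::to_string(id_) + " waits on event " + std::to_string(e.id);
  }

 private:
  int id_;
};

// One stream records the event, another stream waits on it.
inline std::vector<std::string> producer_consumer_demo()
{
  stream_view producer{1};
  stream_view consumer{2};
  event_view done{7};
  std::vector<std::string> steps{producer.record(done), consumer.wait(done)};
  assert(steps.size() == 2);
  return steps;
}
```

With this shape the call sites read as `producer.record(done)` and `consumer.wait(done)`, which matches the subject/verb/object order described in the comment.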
Thanks, @jrhemstad ! I wouldn't mind moving the On the other hand, I can just change the wording to something like... |
Yes, I think we will need an
Good point. We should keep that as a |
There would also be some inconvenience in having to explicitly convert stream types to the |
We should design for a world where Eventually these types will all exist in libcu++. |
Done. Sorry I don't have the rights to set github labels here. |
Also note, I added a function that doesn't strictly belong to the PR (I hope it's ok). I need it in cuml to decide if I need one worker stream or two (if I cannot use the default). I thought it would be useful for others too: rmm/include/rmm/cuda_stream_view.hpp Lines 96 to 118 in b55088b
include/rmm/cuda_event_view.hpp
Outdated
enum cuda_event_flags {
  /** Default event flag. */
  EVENT_DEFAULT = cudaEventDefault,
  /** Event uses blocking synchronization. */
  EVENT_BLOCKING_SYNC = cudaEventBlockingSync,
  /** Event will not record timing data. */
  EVENT_DISABLE_TIMING = cudaEventDisableTiming,
  /** Event is suitable for interprocess use. cudaEventDisableTiming must be set. */
  EVENT_INTERPROCESS = cudaEventInterprocess
};
Come to think of it, I wonder if these things should be encoded in the type of the `cuda_event`/`event_view`. For example, the `elapsed_time_since` function currently accepts any `event_view`, but really it only works with an event that was created where `cudaEventDisableTiming` wasn't specified. That's the kind of thing that the type system should be used to enforce.
Hmm, I'm personally not sure if it's worth the cost. I think lifting `cudaEventDisableTiming` into something like `cuda_event_with_timing` would be nice. But the same thing would cause problems for `cuda_event_view`: we'd get the exception from `elapsed_time_since` moved to the implicit conversion/constructor from `cudaEvent_t`. `cudaEventBlockingSync` does not seem to affect the api at all, so there is no point in lifting it. Not sure if it makes sense for `cudaEventInterprocess` and the future flags to come either, and it may seem illogical to a user that the meaning of some of the flags is duplicated in the type system.
I was thinking something more like `cuda_event<Properties...>`.
So the API for `elapsed_time_since` would be something like:
template <typename... Properties>
float elapsed_time_since(event_view<Properties...> e) {
  static_assert(/* Properties does not contain `cudaEventDisableTiming` */);
}
> cudaEventBlockingSync does not seem to affect the api at all, so there is no point in lifting it

Maybe. Maybe not though. I might want to specialize/overload a function for an event that is blocking vs. non-blocking.
Likewise with `cudaEventInterprocess`, I might write a function that requires the specified event to be capable of working with IPC. I don't believe there is even a way to query a `cudaEvent_t` after the fact to detect if it was created with `cudaEventInterprocess`, so if a user passed in an incorrectly created event, an error likely wouldn't be detected until I attempted to use that event in another API like `cudaIpcGetEventHandle`.
My point is that there are a number of ways a user can misuse events. We can either detect those at runtime and throw exceptions, or we can detect them at compile time. Personally, I try and push as many errors to compile time as possible.
Thinking through all the complexities and corner cases like this is exactly why we shouldn't have to be designing these things ourselves; they should be provided by CUDA. But I digress...
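A minimal sketch of how the `static_assert` above could be filled in, assuming C++17 and assuming that properties are modelled as empty tag types. The tag names (`disable_timing`, `blocking_sync`, `interprocess`), the `contains_v` helper, and the empty `event_view` stand-in are all hypothetical; none of them exist in RMM or CUDA.

```cpp
#include <type_traits>

// Hypothetical property tags, one per creation flag.
struct disable_timing {};
struct blocking_sync {};
struct interprocess {};

// True when tag T appears anywhere in the pack Ts (false for an empty pack).
template <typename T, typename... Ts>
inline constexpr bool contains_v = (std::is_same_v<T, Ts> || ...);

// Stand-in for an event view; a real one would hold a cudaEvent_t.
template <typename... Properties>
struct event_view {};

// Compiles only for event types whose properties do not include disable_timing.
template <typename... Properties>
float elapsed_time_since(event_view<Properties...> /*start*/)
{
  static_assert(!contains_v<disable_timing, Properties...>,
                "elapsed_time_since requires an event created with timing enabled");
  return 0.0f;  // placeholder; a real implementation would call cudaEventElapsedTime
}
```

Under these assumptions, `elapsed_time_since(event_view<blocking_sync>{})` compiles, while `elapsed_time_since(event_view<disable_timing>{})` is rejected at compile time with the message above instead of failing at run time.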
Can you convince me, @jrhemstad, that there is never a need to decide the properties of an event at runtime?
Nope, I definitely can't. I think the right thing to do here is to use a pattern like `std::span`.
A span can have a statically known size, like `std::span<int, 5>`, which is a span of 5 `int`s.
Or it can have a dynamically known size, `std::span<int>`. The way they do this is they have a special sentinel value for the second template argument that says "the size of this span is dynamic". It's called `std::dynamic_extent` and it's the default value for the second template argument.
So we could use this same pattern for an `event` type where you could have:
namespace cuda_event {
struct timing {};
struct IPC {};
struct dynamic {
  unsigned int f;  // cudaEventCreateWithFlags takes its flags as unsigned int
  dynamic(unsigned int f) : f{f} {}
};
}  // namespace cuda_event
cuda_event<timing> e0;       // This is an event where timing hasn't been disabled
cuda_event<timing, IPC> e1;  // Event that supports timing and IPC
cuda_event<dynamic> e2{/*dynamic value*/};
This way you can have the best of both worlds. If I'm writing a function and I want to statically declare that an event passed into my function supports timing, then I can specify `cuda_event<timing,...>`. If I want to support an event with dynamic properties, I can just do `cuda_event<dynamic>`.
Don't get me wrong, it's a lot of machinery to make this work, but that's precisely why CUDA should be providing it :)
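One way the sentinel idea could look in compilable form is sketched below, assuming C++17. The `event` template, the `timing`/`ipc`/`dynamic` tags, and `supports_timing()` are hypothetical names chosen for illustration. The only CUDA fact relied on is that `cudaEventDisableTiming` has the value `0x02`, which is mirrored as a local constant so the example builds without the CUDA headers.

```cpp
#include <type_traits>

// Hypothetical static property tags.
struct timing {};
struct ipc {};

// Sentinel tag, playing the role of std::dynamic_extent: flags are known only at run time.
struct dynamic {
  unsigned int flags;
};

template <typename T, typename... Ts>
inline constexpr bool contains_v = (std::is_same_v<T, Ts> || ...);

// Primary template: properties are fixed at compile time, so no runtime state is needed.
template <typename... Properties>
struct event {
  static constexpr bool is_dynamic = false;
  static constexpr bool supports_timing() { return contains_v<timing, Properties...>; }
};

// Specialization selected by the sentinel: flags are stored and checked at run time.
template <>
struct event<dynamic> {
  static constexpr bool is_dynamic = true;

  explicit constexpr event(dynamic d) : flags_{d.flags} {}

  constexpr bool supports_timing() const { return (flags_ & disable_timing_bit) == 0; }

 private:
  // Same numeric value as cudaEventDisableTiming in the CUDA runtime headers.
  static constexpr unsigned int disable_timing_bit = 0x02;
  unsigned int flags_;
};
```

A function that needs timing can then take `event<timing>` and get a compile-time guarantee, while a function taking `event<dynamic>` has to call `supports_timing()` and handle the failure case itself.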
I think having a variable number of template arguments would not be ideal, because this would imply the markers can go in any order, and users would later have problems with `cuda_event<timing, IPC>` and `cuda_event<IPC, timing>` being different types. Perhaps I can force the order and other constraints using lots of `static_assert`s...
In general, I'm always up for more type safety. It just feels a bit foreign to me in rmm, where there are no similar constructs on the streams (legacy/default/(non-)blocking) and stream pools (non-empty?:).
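The ordering concern can be shown directly. In the sketch below (C++17, with hypothetical `event`, `timing`, and `ipc` names), the two spellings are distinct types. The `same_properties_v` trait is one possible workaround, not something proposed in the thread: it compares the tag lists as sets, and it assumes no tag is repeated within a list.

```cpp
#include <type_traits>

// Hypothetical property tags and event template.
struct timing {};
struct ipc {};

template <typename... Properties>
struct event {};

// Same set of properties in a different order gives two unrelated types.
static_assert(!std::is_same_v<event<timing, ipc>, event<ipc, timing>>);

template <typename T, typename... Ts>
inline constexpr bool contains_v = (std::is_same_v<T, Ts> || ...);

// Order-insensitive comparison of two event types (assumes no duplicate tags).
template <typename A, typename B>
inline constexpr bool same_properties_v = false;

template <typename... As, typename... Bs>
inline constexpr bool same_properties_v<event<As...>, event<Bs...>> =
  sizeof...(As) == sizeof...(Bs) && (contains_v<As, Bs...> && ...);
```

A trait like this only helps inside templates that opt into it; a plain function parameter of type `event<timing, ipc>` would still reject `event<ipc, timing>`.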
How about this, @jrhemstad ?
rmm/include/rmm/cuda_event_view.hpp
Lines 82 to 83 in 9aadcb4
/** @brief An event view with flags provided at runtime. */
class cuda_event_view_ {
...
rmm/include/rmm/cuda_event_view.hpp
Lines 162 to 163 in 9aadcb4
template <cuda_event_flags Flags = EVENT_DEFAULT>
class cuda_event_view : public cuda_event_view_ {
This way, `cuda_event_view` (without brackets) defaults to the most commonly used, statically enforced `cuda_event_view<EVENT_DEFAULT>`, while it's still possible to set flags dynamically via `cuda_event_view_`. And since I use `cuda_event_flags` as the template argument, I can combine flags with the bitwise OR (`|`) operator and not worry about the order in which the flags appear.
I think this still requires a bit of design work, and we are in burn down for 21.10 so I'm moving this to the next release. @achirkin please change the target branch.
Ok, though I hoped to push it in 21.10, cause I need events in rapidsai/cuml#4201 .
@achirkin you can use CUDA events now without a wrapper class. We already use events in RMM. I think we should take our time and get this right, and perhaps it should just live in libcu++ from the beginning.
Removing |
This PR has been labeled |
This PR has been labeled |
This is going to be added in CCCL. Closing. |
- `cuda_event` and `cuda_event_view` wrappers, similar to `cuda_stream` and `cuda_stream_view`.
- `cuda_stream` and `cuda_stream_view` to interact with the added event types.