[BUG] Make detail APIs order MR parameter after stream #5119
Comments
Point number 1 is quite subtle. The common case here depends on whether or not the called function is just the detail implementation of the calling function. For example
In this case, you always want to pass the mr from the public API into the detail API. But in other cases, a decision needs to be made. Example (using proposed ordering, not current ordering)
Note the reason we got these switched around is marked by ***** above. A common use of detail APIs is just to implement the public API, and in this case we always pass 0 for stream, so we often ended up putting the detail's stream param last to allow using a default of 0. Inside detail APIs, we should always be passing |
Good points. Also, in your ***** case I noticed that an explicit cudf/cpp/src/copying/slice.cpp Lines 65 to 70 in 17ad431
I suppose that's even more reason to remove the default value for `stream`. I'll add a note about this in the description.
|
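A minimal sketch of the distinction discussed above, using the proposed ordering (stream first with no default, then a defaulted `mr`). The types `stream_t` and `memory_resource` and the function `helper` are stand-ins invented for this illustration so it compiles without CUDA or RMM; they are not the real libcudf or RMM declarations.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins (assumptions): these only mimic the roles of cudaStream_t and
// rmm::mr::device_memory_resource so the sketch is self-contained.
using stream_t = int;
struct memory_resource {
  std::size_t allocations = 0;
  void allocate(std::size_t, stream_t) { ++allocations; }
};
inline memory_resource* get_default_resource()
{
  static memory_resource mr;
  return &mr;
}

namespace detail {
// Hypothetical helper that allocates once from the resource it is given.
inline void helper(std::size_t n, stream_t stream, memory_resource* mr = get_default_resource())
{
  mr->allocate(n, stream);
}

// Proposed ordering: stream has no default, mr is last and defaulted.
inline void example(std::size_t n, stream_t stream, memory_resource* mr = get_default_resource())
{
  helper(n, stream);      // intermediate temporary: stream passed, default mr kept
  helper(n, stream, mr);  // result returned to the caller: caller's mr passed on
}
}  // namespace detail

// Public API: no stream parameter; always forwards the caller's mr.
inline void example(std::size_t n, memory_resource* mr = get_default_resource())
{
  detail::example(n, stream_t{0}, mr);
}
```

With the stream parameter lacking a default, a call such as `helper(n)` inside a detail function would fail to compile, so the stream cannot be dropped by accident.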
Side note: this discussion made me realize that with talk of using PTDS, we should probably stop using a "raw" zero for the default stream and instead use a function that conditionally returns
|
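One way such a function could look, as a sketch only. `CUDA_API_PER_THREAD_DEFAULT_STREAM` is the macro the CUDA runtime uses to enable per-thread default streams; the stream type and the helper names below are stand-ins defined here so the example compiles without `cuda_runtime.h`. In real code the two special handles would be `cudaStreamLegacy` or `0`, and `cudaStreamPerThread`.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins (assumptions) for the CUDA runtime's opaque stream handle.
struct CUstream_st;
using cudaStream_t = CUstream_st*;

// The "raw" zero stream.
inline cudaStream_t legacy_default_stream() { return nullptr; }

// Mirrors the value the CUDA runtime assigns to cudaStreamPerThread.
inline cudaStream_t per_thread_default_stream()
{
  return reinterpret_cast<cudaStream_t>(std::uintptr_t{2});
}

// Hypothetical helper: callers ask for "the default stream" instead of
// hard-coding 0, and the build configuration decides which handle that is.
inline cudaStream_t default_stream()
{
#ifdef CUDA_API_PER_THREAD_DEFAULT_STREAM
  return per_thread_default_stream();
#else
  return legacy_default_stream();
#endif
}
```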
@jrhemstad what do you think about replacing

```cpp
class cudf_stream {
  cudaStream_t _ptr = default_stream();

 public:
  cudf_stream() = default;                        // {} gives default_stream()
  cudf_stream(int) = delete;                      // prevent cast from 0
  cudf_stream(std::nullptr_t) = delete;           // prevent cast from nullptr
  cudf_stream(cudaStream_t ptr) : _ptr{ptr} {}    // implicit cast from cudaStream_t
  operator cudaStream_t() const { return _ptr; }  // implicit cast to cudaStream_t
};
```

Edit: here's an example https://wandbox.org/permlink/BGZ5nJMkAhsrCyN2 |
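The effect of the deleted constructors can be checked at compile time. The sketch below repeats the class so that it stands alone, and uses stand-in definitions for `cudaStream_t` and `default_stream()` (assumptions made here so it compiles without CUDA). `uses_default` is a hypothetical function added only to show a call site.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Stand-ins (assumptions) for the CUDA stream handle and the default-stream helper.
struct CUstream_st;
using cudaStream_t = CUstream_st*;
inline cudaStream_t default_stream() { return nullptr; }

class cudf_stream {
  cudaStream_t _ptr = default_stream();

 public:
  cudf_stream() = default;
  cudf_stream(int) = delete;             // rejects a literal 0
  cudf_stream(std::nullptr_t) = delete;  // rejects nullptr
  cudf_stream(cudaStream_t ptr) : _ptr{ptr} {}
  operator cudaStream_t() const { return _ptr; }
};

// Neither an int nor nullptr can be used to build the wrapper...
static_assert(!std::is_constructible<cudf_stream, int>::value, "0 must be rejected");
static_assert(!std::is_constructible<cudf_stream, std::nullptr_t>::value, "nullptr must be rejected");
// ...while conversions to and from a real handle stay implicit.
static_assert(std::is_convertible<cudaStream_t, cudf_stream>::value, "handle -> wrapper");
static_assert(std::is_convertible<cudf_stream, cudaStream_t>::value, "wrapper -> handle");

// Hypothetical call site: uses_default(0) and uses_default(nullptr) do not compile.
inline bool uses_default(cudf_stream s)
{
  return static_cast<cudaStream_t>(s) == default_stream();
}
```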
It could even be worth considering something similar for |
I'm always a fan of strong typing. I think that makes sense in this case. |
I like this. And I also like the idea of RMM owning stream lifetime (solves some corner cases with Async allocators). So maybe this should be an RMM class? |
Great minds! I just ran into a situation where RMM needs a strongly typed stream object:
When you try and construct |
@harrism so are you envisioning an owning stream object? I.e., one that creates the stream when the object is created and destroys it when the object is destroyed? So APIs would become:
The class would have to be moveable, but not copyable (can't have multiple objects trying to destroy the same stream handle), so it should probably just be a wrapper around a |
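A sketch of what a movable, non-copyable owning stream could look like. Using `std::unique_ptr` with a custom deleter is one possible choice assumed here, not something the thread confirms. `create_stream` and `destroy_stream` are stand-ins for `cudaStreamCreate` and `cudaStreamDestroy`, and `owning_stream` and `demo` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Stand-ins (assumptions) so the sketch runs without CUDA.
struct CUstream_st {};
using cudaStream_t = CUstream_st*;

inline int& live_streams()
{
  static int n = 0;
  return n;
}
inline cudaStream_t create_stream()  // plays the role of cudaStreamCreate
{
  ++live_streams();
  return new CUstream_st{};
}
inline void destroy_stream(cudaStream_t s)  // plays the role of cudaStreamDestroy
{
  --live_streams();
  delete s;
}

class owning_stream {
  struct deleter {
    void operator()(CUstream_st* s) const { destroy_stream(s); }
  };
  // unique_ptr makes the class move-only and destroys the handle exactly once.
  std::unique_ptr<CUstream_st, deleter> _stream{create_stream()};

 public:
  owning_stream() = default;
  cudaStream_t value() const { return _stream.get(); }
};

static_assert(!std::is_copy_constructible<owning_stream>::value, "must not be copyable");
static_assert(std::is_move_constructible<owning_stream>::value, "must be movable");

// Creates one stream, moves ownership, and returns how many streams remain alive.
inline int demo()
{
  {
    owning_stream a;
    owning_stream b{std::move(a)};  // ownership transfers; no second handle is created
    assert(live_streams() == 1);
    assert(a.value() == nullptr);   // the moved-from object no longer owns a stream
  }
  return live_streams();
}
```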
Yeah, ultimately I was thinking about a stream pool owned by RMM. The problem I'm concerned about is an allocator that maintains a free list of blocks associated with a stream: currently that stream could get destroyed without the allocator knowing, and then that list is orphaned. Normally to "steal" those blocks from the stream you cudaStreamSynchronize(), which is UB if the stream has been destroyed. Moreover, since cudaStream_t is just a pointer, that pointer could potentially get reused for another stream created without the RMM allocator knowing anything about it. Synchronizing this new stream to steal those blocks is actually unnecessary. So you have this free list of blocks incorrectly associated with this new stream. If RMM owns stream creation and destruction, we can provide a way to ensure this is handled correctly and safely. And yes, I think making an owning stream object makes sense. But it might just get/return streams from/to the pool rather than calling cudaStreamCreate(). (That can be future work of course) |
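To make the pool idea concrete, here is a deliberately small sketch. Everything in it is hypothetical: `stream_pool`, `acquire`, and `release` are names invented for illustration, and the stream type is a stand-in so the code runs without CUDA. It is not an RMM API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-ins (assumptions) for the CUDA stream handle.
struct CUstream_st {};
using cudaStream_t = CUstream_st*;

class stream_pool {
  std::vector<std::unique_ptr<CUstream_st>> _owned;  // every stream the pool created
  std::vector<cudaStream_t> _idle;                   // streams currently not handed out

 public:
  // Hands out an idle stream if there is one, otherwise creates a new one.
  cudaStream_t acquire()
  {
    if (_idle.empty()) {
      _owned.push_back(std::unique_ptr<CUstream_st>(new CUstream_st{}));
      return _owned.back().get();
    }
    cudaStream_t s = _idle.back();
    _idle.pop_back();
    return s;
  }

  // Returns a stream to the pool. Because the pool owns the handle, this is
  // the point where an allocator could be told to reclaim blocks tied to `s`
  // while the handle is still valid.
  void release(cudaStream_t s) { _idle.push_back(s); }

  std::size_t size() const { return _owned.size(); }
};
```

Since streams are destroyed only when the pool itself is destroyed, a handle can never be freed and reused behind the allocator's back, which is the failure mode described above.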
Ok, so for now I can disallow construction of the |
Converting libcudf to use `rmm::cuda_stream_view` will require a LOT of changes, so I'm splitting it into multiple PRs to ease reviewing. This is the first PR in the series. This series of PRs will

- Replace usage of `cudaStream_t` with `rmm::cuda_stream_view`
- Replace usage of `0` or `nullptr` as a stream identifier with `rmm::cuda_stream_default`
- Ensure all APIs always order the stream parameter *before* the memory resource parameter. #5119

This first PR converts:

- column.hpp (and source)
- device_column_view.cuh
- copying.hpp (and source): moves functions that had streams in public APIs to `namespace detail` and adds streamless public versions.
- null_mask.hpp (and source): moves functions that had streams in public APIs to `namespace detail` and adds streamless public versions.
- AST (transform)
- Usages of the above APIs in other source files
- Some benchmarks

Contributes to #6645 and #5119

~Depends on #6732.~
Converting libcudf to use `rmm::cuda_stream_view` will require a LOT of changes, so I'm splitting it into multiple PRs to ease reviewing. This is the second PR in the series. This series of PRs will

- Replace usage of `cudaStream_t` with `rmm::cuda_stream_view`
- Replace usage of `0` or `nullptr` as a stream identifier with `rmm::cuda_stream_default`
- Ensure all APIs always order the stream parameter before the memory resource parameter. #5119

Contributes to #6645 and #5119

Depends on #6646 so this PR will look much bigger until that one is merged.

Also fixes #6706 (to_arrow and to_dlpack are not properly synchronized).

This second PR converts:

- table.hpp (and source / dependencies)
- column_factories.hpp (and source / dependencies)
- headers in include/detail/aggregation (and source / dependencies)
- include/detail/groupby/sort_helper.hpp (and source / dependencies)
- include/detail/utilities/cuda.cuh (and dependencies)
- binary ops
- concatenate
- copy_if
- copy_range
- fill
- gather
- get_value
- hash groupby
- quantiles
- reductions
- repeat
- replace
- reshape
- round
- scatter
- search
- sequence
- sorting
- stream compaction
Converting libcudf to use `rmm::cuda_stream_view` will require a LOT of changes, so I'm splitting it into multiple PRs to ease reviewing. This is the third PR in the series. This series of PRs will

- Replace usage of `cudaStream_t` with `rmm::cuda_stream_view`
- Replace usage of `0` or `nullptr` as a stream identifier with `rmm::cuda_stream_default`
- Ensure all APIs always order the stream parameter before the memory resource parameter. #5119

Contributes to #6645 and #5119

Depends on #6646 and #6648 so this PR will look much bigger until they are merged.

This third PR converts:

- remaining dictionary functionality
- cuio
- lists
- scalar
- strings
- groupby
- join
- contiguous_split
- get_element
- datetime_ops
- extract
- merge
- partitioning
- minmax reduction
- scan
- byte_cast
- clamp
- interleave_columns
- is_sorted
- groupby
- rank
- tests
- concurrent map classes
Fixed by #6744 |
Describe the bug
Internal (`detail` namespace) APIs that mirror public APIs often include both memory resource and stream parameters with defaulted values, usually in this order:

`cudf::detail::example(...other parameters..., rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource(), cudaStream_t stream = 0)`

However, placing `mr` before `stream` is somewhat inconvenient because:

1. The `mr` parameter will often use the default, because the caller-specified memory resource should only be used for allocations that are returned to the caller, and not for intermediate temporary allocations.
2. It is easier to pass `0` or `nullptr` than `rmm::mr::get_default_resource()`.

Expected behavior
Re-order the `stream` and `mr` parameters of `detail` APIs to make it easier to specify a stream while keeping the implicitly defaulted memory resource. Also, the default value can be removed from the `stream` parameter to prevent forgetting to hook it up when using the default value for `mr`.

Additionally, the `TRANSITION.md` guide could be updated to explain the reasoning for this ordering.
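The difference at the call site can be sketched as follows. The namespaces `current_order` and `proposed_order`, the function `call_sites`, and the types `stream_t` and `memory_resource` are all stand-ins made up for this comparison so that it compiles without CUDA or RMM.

```cpp
#include <cassert>

// Stand-ins (assumptions) for cudaStream_t and rmm::mr::device_memory_resource.
struct stream_t {
  int id;
};
struct memory_resource {};
inline memory_resource* get_default_resource()
{
  static memory_resource mr;
  return &mr;
}

namespace current_order {
// mr first, both defaulted. Returns the id of the stream actually used.
inline int example(int, memory_resource* mr = get_default_resource(), stream_t stream = stream_t{0})
{
  (void)mr;
  return stream.id;
}
}  // namespace current_order

namespace proposed_order {
// stream first with no default, mr last and defaulted.
inline int example(int, stream_t stream, memory_resource* mr = get_default_resource())
{
  (void)mr;
  return stream.id;
}
}  // namespace proposed_order

inline int call_sites(stream_t stream)
{
  // Compiles, but the caller's stream was silently dropped in favor of stream 0.
  int forgotten = current_order::example(1);
  // Passing the stream requires spelling out the default resource first.
  int verbose = current_order::example(1, get_default_resource(), stream);
  // Proposed ordering: the stream is mandatory and the default mr stays implicit.
  int concise = proposed_order::example(1, stream);
  // proposed_order::example(1);  // would not compile: the stream is required
  assert(verbose == concise);
  return forgotten;
}
```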