[BUG] Make detail APIs order MR parameter after stream #5119

Closed

trevorsm7 opened this issue May 7, 2020 · 12 comments

@trevorsm7
Contributor

trevorsm7 commented May 7, 2020

Describe the bug
Internal (detail namespace) APIs that mirror public APIs often include both memory resource and stream parameters with defaulted values, and usually in this order:

cudf::detail::example(...other parameters...,
  rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource(),
  cudaStream_t stream = 0)

However, placing mr before stream is somewhat inconvenient because:

  1. The mr parameter will often use the default because the caller-specified memory resource should only be used for allocations that are returned to the caller, and not for intermediate temporary allocations.
  2. If there is no stream in the calling scope, the public API can usually be used instead.
  3. It's less onerous to explicitly specify 0 or nullptr for the stream than to spell out rmm::mr::get_default_resource() (see the call-site sketch below).
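
As a hypothetical call-site sketch (the input and stream names are placeholders, not from cudf), the current mr-before-stream ordering forces a caller that only wants to pass a stream to spell out the default memory resource as well:

// Current ordering: mr must be written out just to reach the stream parameter
auto out = cudf::detail::example(input, rmm::mr::get_default_resource(), stream);

// Proposed ordering: the mr default stays implicit
auto out2 = cudf::detail::example(input, stream);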

Expected behavior
Re-order the stream and mr parameters of detail APIs to make it easier to specify a stream while keeping the implicitly defaulted memory resource. The default value should also be removed from the stream parameter, so that callers cannot forget to pass the stream through when they rely on the default mr.

cudf::detail::example(...other parameters...,
  cudaStream_t stream, // no longer defaults to 0
  rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource())

Additionally, the TRANSITION.md guide could be updated to explain the reasoning for this ordering.

@trevorsm7 trevorsm7 added libcudf Affects libcudf (C++/CUDA) code. tech debt labels May 7, 2020
@trevorsm7 trevorsm7 self-assigned this May 7, 2020
@harrism
Member

harrism commented May 7, 2020

Point number 1 is quite subtle. The common case here depends on whether or not the called function is just the detail implementation of the calling function. For example

cudf::fill(..., mr) calls cudf::detail::fill(..., mr, stream=0).

In this case, you always want to pass the mr from the public API into the detail API.

But in other cases, a decision needs to be made. Example (using proposed ordering, not current ordering)

std::unique_ptr<column> cudf::api_A(..., mr) {
    return cudf::detail::api_A(..., 0, mr); // passes MR, stream set to 0 *****
}

std::unique_ptr<column> cudf::detail::api_A(..., mr, stream=0) {
    ...
    auto temp = cudf::detail::api_B(..., stream); // don't pass non-default MR because output is temp
    ...
    return cudf::detail::api_C(..., stream, mr); // pass non-def MR because output is returned
}

Note the reason we got these switched around is marked by ***** above. A common use of detail APIs is just to implement the public API, and in this case we always pass 0 for stream, so we often ended up putting the detail's stream param last to allow using a default of 0.

Inside detail APIs, we should always be passing stream. And since fixing the present issue would make sure stream is always before mr in the parameter list, we should probably not have a default value for stream parameters.

@trevorsm7
Contributor Author

Good points. Also, in your ***** case, I noticed that an explicit , 0 is frequently passed anyway even where the implicit default would do. For example:

std::vector<cudf::column_view> slice(cudf::column_view const& input,
                                     std::vector<size_type> const& indices)
{
  CUDF_FUNC_RANGE();
  return detail::slice(input, indices, 0);
}

I suppose that's even more reason to remove the default value for stream. I'll add a note about this in the description.

@jrhemstad
Contributor

jrhemstad commented May 8, 2020

Side note: this discussion made me realize that with talk of using PTDS, we should probably stop using a "raw" zero for the default stream and instead use a function that conditionally returns cudaStreamDefault or cudaStreamPerThread.

std::unique_ptr<column> cudf::detail::api_A(..., mr, stream = default_stream()) {

https://github.com/thrust/thrust/pull/717/files
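
For reference, a minimal sketch of what such a helper could look like (an illustration only, not the actual libcudf/RMM implementation); it keys off the CUDA_API_PER_THREAD_DEFAULT_STREAM macro that is defined when compiling with --default-stream per-thread:

#include <cuda_runtime_api.h>

// Hypothetical helper: per-thread default stream when PTDS is enabled at
// compile time, otherwise the legacy (global) default stream that a raw 0
// refers to.
inline cudaStream_t default_stream()
{
#ifdef CUDA_API_PER_THREAD_DEFAULT_STREAM
  return cudaStreamPerThread;
#else
  return cudaStreamLegacy;
#endif
}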

@trevorsm7
Contributor Author

trevorsm7 commented May 8, 2020

@jrhemstad what do you think about replacing cudaStream_t stream with something like this? This would prevent using 0 or nullptr, but it would allow {} as a shorthand for default_stream(), as well as implicitly casting to/from cudaStream_t.

class cudf_stream {
  cudaStream_t _ptr = default_stream();
  
 public:
  cudf_stream() = default; // {} gives default_stream()
  cudf_stream(int) = delete; // prevent cast from 0
  cudf_stream(std::nullptr_t) = delete; // prevent cast from nullptr
  cudf_stream(cudaStream_t ptr): _ptr{ptr} {} // implicit cast from cudaStream_t
  
  operator cudaStream_t() { return _ptr; } // implicit cast to cudaStream_t
};

Edit: here's an example https://wandbox.org/permlink/BGZ5nJMkAhsrCyN2
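
For illustration, here is how the wrapper above behaves at a call site (example_api is a made-up function for this sketch):

void example_api(cudf_stream stream = {});  // hypothetical API taking the wrapper

void caller(cudaStream_t s)
{
  example_api(s);          // OK: implicit conversion from cudaStream_t
  example_api({});         // OK: shorthand for default_stream()
  // example_api(0);       // error: cudf_stream(int) is deleted
  // example_api(nullptr); // error: cudf_stream(std::nullptr_t) is deleted
}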

@trevorsm7
Contributor Author

It could even be worth considering something similar for rmm::mr::device_memory_resource*, so that rmm::mr::get_default_resource() only needs to be spelled out in one place and all the defaults could just be mr = {}.
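
A rough sketch of that idea (hypothetical, not an actual RMM type; it assumes the rmm device_memory_resource headers are included):

// Hypothetical handle so detail APIs can default the parameter with mr = {}
// instead of repeating rmm::mr::get_default_resource() everywhere.
class memory_resource_handle {
  rmm::mr::device_memory_resource* _mr = rmm::mr::get_default_resource();

 public:
  memory_resource_handle() = default;  // {} gives the default resource
  memory_resource_handle(rmm::mr::device_memory_resource* mr) : _mr{mr} {}

  operator rmm::mr::device_memory_resource*() const { return _mr; }  // implicit cast back
};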

@jrhemstad
Contributor

I'm always a fan of strong typing. I think that makes sense in this case.

@harrism
Member

harrism commented May 9, 2020

I like this. And I also like the idea of RMM owning stream lifetime (solves some corner cases with Async allocators). So maybe this should be an RMM class?

@jrhemstad
Contributor

> I like this. And I also like the idea of RMM owning stream lifetime (solves some corner cases with Async allocators). So maybe this should be an RMM class?

Great minds!

I just ran into a situation where RMM needs a strongly typed stream object:

  explicit device_uvector(std::size_t size,
                          cudaStream_t stream                 = 0,
                          rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource())

  device_uvector(std::size_t size,
                 value_type const& v,
                 cudaStream_t stream                 = 0,
                 rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource())

When you try to construct device_uvector(100, 0), it's ambiguous which of the two ctors should be selected.
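
To illustrate (hypothetical free-function overloads, reusing the cudf_stream sketch from the earlier comment as the strong type): once the conversion from int is deleted, a literal 0 can only bind to the value parameter, so the call resolves unambiguously.

#include <cstddef>

void ctor_like(std::size_t size, cudf_stream stream = {});             // (size, stream)
void ctor_like(std::size_t size, int value, cudf_stream stream = {});  // (size, value, stream)

void caller()
{
  ctor_like(100, 0);  // 0 is an exact match for int and cannot silently become
                      // a stream, so the (size, value, stream) overload wins
}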

@jrhemstad
Contributor

> I like this. And I also like the idea of RMM owning stream lifetime (solves some corner cases with Async allocators). So maybe this should be an RMM class?

@harrism so are you envisioning an owning stream object? I.e., one that creates the stream when the object is created and destroys it when the object is destroyed?

So APIs would become:

foo(..., rmm::stream const& s);

The class would have to be moveable, but not copyable (can't have multiple objects trying to destroy the same stream handle), so it should probably just be a wrapper around a unique_ptr<cudaStream_t>.
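
A minimal sketch of that owning-stream idea (not the class RMM eventually shipped; error checking omitted): move-only, creates a stream on construction and destroys it on destruction.

#include <cuda_runtime_api.h>
#include <utility>

class owning_stream {
  cudaStream_t _stream{};  // nullptr in the moved-from state

 public:
  owning_stream() { cudaStreamCreate(&_stream); }
  ~owning_stream()
  {
    if (_stream != nullptr) { cudaStreamDestroy(_stream); }
  }

  owning_stream(owning_stream&& other) noexcept : _stream{other._stream} { other._stream = nullptr; }
  owning_stream& operator=(owning_stream&& other) noexcept
  {
    std::swap(_stream, other._stream);
    return *this;
  }

  owning_stream(owning_stream const&) = delete;             // copying would double-destroy
  owning_stream& operator=(owning_stream const&) = delete;

  cudaStream_t value() const { return _stream; }  // non-owning handle for kernel launches
};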

@harrism
Member

harrism commented May 12, 2020

Yeah, ultimately I was thinking about a stream pool owned by RMM. The problem I'm concerned about is an allocator that maintains a free list of blocks associated with a stream: currently that stream could get destroyed without the allocator knowing, and then that list is orphaned. Normally, to "steal" those blocks from the stream you call cudaStreamSynchronize(), which is UB if the stream has been destroyed.

Moreover, since cudaStream_t is just a pointer, that pointer could get reused for a new stream created without the RMM allocator knowing anything about it. The allocator would then synchronize this new stream to steal those blocks, which is actually unnecessary, and you end up with a free list of blocks incorrectly associated with the new stream.

If RMM owns stream creation and destruction, we can provide a way to ensure this is handled correctly and safely.

And yes, I think making an owning stream object makes sense. But it might just get/return streams from/to the pool rather than calling cudaStreamCreate(). (That can be future work of course)
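
A rough sketch of the pool idea (hypothetical, not an actual RMM API; error checking omitted): streams are acquired from and returned to a pool, so handles stay alive for the lifetime of the pool and per-stream free lists are never orphaned behind the allocator's back.

#include <cuda_runtime_api.h>
#include <mutex>
#include <vector>

class stream_pool {
  std::vector<cudaStream_t> _free;  // released streams, kept alive for reuse
  std::mutex _mtx;

 public:
  cudaStream_t acquire()
  {
    std::lock_guard<std::mutex> lock(_mtx);
    if (_free.empty()) {
      cudaStream_t s{};
      cudaStreamCreate(&s);
      return s;
    }
    cudaStream_t s = _free.back();
    _free.pop_back();
    return s;
  }

  void release(cudaStream_t s)
  {
    std::lock_guard<std::mutex> lock(_mtx);
    _free.push_back(s);  // not destroyed, so the handle cannot be reused by CUDA
  }

  ~stream_pool()
  {
    for (auto s : _free) { cudaStreamDestroy(s); }
  }
};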

@trevorsm7
Contributor Author

Ok, so for now I can disallow construction of the rmm::stream from a cudaStream_t (while still allowing it to be cast the other way around) and just have a default constructor/factory that returns the default stream. Also, it'll be passed around by reference as Jake suggested. I'll split this off into another PR.

kkraus14 pushed a commit that referenced this issue Nov 13, 2020
Converting libcudf to use `rmm::cuda_stream_view` will require a LOT of changes, so I'm splitting it into multiple PRs to ease reviewing. This is the first PR in the series. This series of PRs will

 - Replace usage of `cudaStream_t` with `rmm::cuda_stream_view`
 - Replace usage of `0` or `nullptr` as a stream identifier with `rmm::cuda_stream_default`
 - Ensure all APIs always order the stream parameter *before* the memory resource parameter. #5119

This first PR converts:
 - column.hpp (and source)
 - column_device_view.cuh
 - copying.hpp (and source) : moves functions that had streams in public APIs to `namespace detail` and adds streamless public versions.
 - null_mask.hpp (and source) : moves functions that had streams in public APIs to `namespace detail` and adds streamless public versions.
 - AST (transform)
 - Usages of the above APIs in other source files
 - Some benchmarks

Contributes to #6645 and #5119

~Depends on #6732.~
harrism added a commit that referenced this issue Nov 20, 2020
Converting libcudf to use rmm::cuda_stream_view will require a LOT of changes, so I'm splitting it into multiple PRs to ease reviewing. This is the second PR in the series. This series of PRs will

    Replace usage of cudaStream_t with rmm::cuda_stream_view
    Replace usage of 0 or nullptr as a stream identifier with rmm::cuda_stream_default
    Ensure all APIs always order the stream parameter before the memory resource parameter. #5119

Contributes to #6645 and #5119

Depends on #6646 so this PR will look much bigger until that one is merged.

Also fixes #6706 (to_arrow and to_dlpack are not properly synchronized).

This second PR converts:

    table.hpp (and source / dependencies)
    column_factories.hpp (and source / dependencies)
    headers in include/detail/aggregation (and source / dependencies)
    include/detail/groupby/sort_helper.hpp (and source / dependencies)
    include/detail/utilities/cuda.cuh (and dependencies)
    binary ops
    concatenate
    copy_if
    copy_range
    fill
    gather
    get_value
    hash groupby
    quantiles
    reductions
    repeat
    replace
    reshape
    round
    scatter
    search
    sequence
    sorting
    stream compaction
@harrism harrism assigned harrism and unassigned trevorsm7 Nov 23, 2020
kkraus14 pushed a commit that referenced this issue Nov 24, 2020
Converting libcudf to use `rmm::cuda_stream_view` will require a LOT of changes, so I'm splitting it into multiple PRs to ease reviewing. This is the third PR in the series. This series of PRs will

- Replace usage of `cudaStream_t` with `rmm::cuda_stream_view`
- Replace usage of `0` or `nullptr` as a stream identifier with `rmm::cuda_stream_default`
- Ensure all APIs always order the stream parameter before the memory resource parameter. #5119

Contributes to #6645 and #5119

Depends on #6646 and #6648 so this PR will look much bigger until they are merged.

This third PR converts:

 - remaining dictionary functionality
 - cuio
 - lists
 - scalar
 - strings
 - groupby
 - join
 - contiguous_split
 - get_element
 - datetime_ops
 - extract
 - merge
 - partitioning
 - minmax reduction
 - scan
 - byte_cast
 - clamp
 - interleave_columns
 - is_sorted
 - groupby
 - rank
 - tests
 - concurrent map classes
@harrism
Member

harrism commented Dec 2, 2020

Fixed by #6744
