[BUG] Make detail APIs order MR parameter after stream #5119

Closed

trevorsm7 opened this issue May 7, 2020 · 12 comments

@trevorsm7
Contributor

trevorsm7 commented May 7, 2020

Describe the bug
Internal (detail namespace) APIs that mirror public APIs often include both memory resource and stream parameters with defaulted values, and usually in this order:

cudf::detail::example(...other parameters...,
  rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource(),
  cudaStream_t stream = 0)

However, placing mr before stream is somewhat inconvenient because:

  1. The mr parameter will often use the default because the caller-specified memory resource should only be used for allocations that are returned to the caller, and not for intermediate temporary allocations.
  2. If there is no stream in the calling scope, the public API can usually be used instead.
  3. It's less onerous to explicitly specify 0 or nullptr for the stream than to spell out rmm::mr::get_default_resource() (see the call-site sketch below).
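
As a hypothetical call-site sketch (the input and stream names are placeholders, not from cudf), the current mr-before-stream ordering forces a caller that only wants to pass a stream to spell out the default memory resource as well:

// Current ordering: mr must be written out just to reach the stream parameter
auto out = cudf::detail::example(input, rmm::mr::get_default_resource(), stream);

// Proposed ordering: the mr default stays implicit
auto out2 = cudf::detail::example(input, stream);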

Expected behavior
Re-order the stream and mr parameters of detail APIs to make it easier to specify a stream while keeping the implicitly defaulted memory resource. The default value should also be removed from the stream parameter, so that callers cannot forget to pass the stream through when they rely on the default mr.

cudf::detail::example(...other parameters...,
  cudaStream_t stream, // no longer defaults to 0
  rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource())

Additionally, the TRANSITION.md guide could be updated to explain the reasoning for this ordering.

@trevorsm7 trevorsm7 added libcudf Affects libcudf (C++/CUDA) code. tech debt labels May 7, 2020
@trevorsm7 trevorsm7 self-assigned this May 7, 2020
@harrism
Member

harrism commented May 7, 2020

Point number 1 is quite subtle. The common case here depends on whether or not the called function is just the detail implementation of the calling function. For example

cudf::fill(..., mr) calls cudf::detail::fill(..., mr, stream=0).

In this case, you always want to pass the mr from the public API into the detail API.

But in other cases, a decision needs to be made. Example (using proposed ordering, not current ordering)

std::unique_ptr<column> cudf::api_A(..., mr) {
    return cudf::detail::api_A(..., 0, mr); // passes MR, stream set to 0 *****
}

std::unique_ptr<column> cudf::detail::api_A(..., mr, stream=0) {
    ...
    auto temp = cudf::detail::api_B(..., stream); // don't pass non-default MR because output is temp
    ...
    return cudf::detail::api_C(..., stream, mr); // pass non-def MR because output is returned
}

Note the reason we got these switched around is marked by ***** above. A common use of detail APIs is just to implement the public API, and in this case we always pass 0 for stream, so we often ended up putting the detail's stream param last to allow using a default of 0.

Inside detail APIs, we should always be passing stream. And since fixing the present issue would make sure stream is always before mr in the parameter list, we should probably not have a default value for stream parameters.

@trevorsm7
Contributor Author

Good points. Also, in your ***** case, I noticed that an explicit , 0 is frequently passed anyway even where the implicit default would do. For example:

std::vector<cudf::column_view> slice(cudf::column_view const& input,
                                     std::vector<size_type> const& indices)
{
  CUDF_FUNC_RANGE();
  return detail::slice(input, indices, 0);
}

I suppose that's even more reason to remove the default value for stream. I'll add a note about this in the description.

@jrhemstad
Contributor

jrhemstad commented May 8, 2020

Side note: this discussion made me realize that with talk of using PTDS, we should probably stop using a "raw" zero for the default stream and instead use a function that conditionally returns cudaStreamDefault or cudaStreamPerThread.

std::unique_ptr<column> cudf::detail::api_A(..., mr, stream = default_stream()) {

https://github.com/thrust/thrust/pull/717/files
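
For reference, a minimal sketch of what such a helper could look like (an illustration only, not the actual libcudf/RMM implementation); it keys off the CUDA_API_PER_THREAD_DEFAULT_STREAM macro that is defined when compiling with --default-stream per-thread:

#include <cuda_runtime_api.h>

// Hypothetical helper: per-thread default stream when PTDS is enabled at
// compile time, otherwise the legacy (global) default stream that a raw 0
// refers to.
inline cudaStream_t default_stream()
{
#ifdef CUDA_API_PER_THREAD_DEFAULT_STREAM
  return cudaStreamPerThread;
#else
  return cudaStreamLegacy;
#endif
}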

@trevorsm7
Contributor Author

trevorsm7 commented May 8, 2020

@jrhemstad what do you think about replacing cudaStream_t stream with something like this? This would prevent using 0 or nullptr, but it would allow {} as a shorthand for default_stream(), as well as implicitly casting to/from cudaStream_t.

class cudf_stream {
  cudaStream_t _ptr = default_stream();
  
 public:
  cudf_stream() = default; // {} gives default_stream()
  cudf_stream(int) = delete; // prevent cast from 0
  cudf_stream(std::nullptr_t) = delete; // prevent cast from nullptr
  cudf_stream(cudaStream_t ptr): _ptr{ptr} {} // implicit cast from cudaStream_t
  
  operator cudaStream_t() { return _ptr; } // implicit cast to cudaStream_t
};

Edit: here's an example https://wandbox.org/permlink/BGZ5nJMkAhsrCyN2
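
For illustration, here is how the wrapper above behaves at a call site (example_api is a made-up function for this sketch):

void example_api(cudf_stream stream = {});  // hypothetical API taking the wrapper

void caller(cudaStream_t s)
{
  example_api(s);          // OK: implicit conversion from cudaStream_t
  example_api({});         // OK: shorthand for default_stream()
  // example_api(0);       // error: cudf_stream(int) is deleted
  // example_api(nullptr); // error: cudf_stream(std::nullptr_t) is deleted
}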

@trevorsm7
Contributor Author

It could even be worth considering something similar for rmm::mr::device_memory_resource*, so that rmm::mr::get_default_resource() only needs to be spelled out in one place and all the defaults could just be mr = {}.
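
A rough sketch of that idea (hypothetical, not an actual RMM type; it assumes the rmm device_memory_resource headers are included):

// Hypothetical handle so detail APIs can default the parameter with mr = {}
// instead of repeating rmm::mr::get_default_resource() everywhere.
class memory_resource_handle {
  rmm::mr::device_memory_resource* _mr = rmm::mr::get_default_resource();

 public:
  memory_resource_handle() = default;  // {} gives the default resource
  memory_resource_handle(rmm::mr::device_memory_resource* mr) : _mr{mr} {}

  operator rmm::mr::device_memory_resource*() const { return _mr; }  // implicit cast back
};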

@jrhemstad
Contributor

I'm always a fan of strong typing. I think that makes sense in this case.

@harrism
Member

harrism commented May 9, 2020

I like this. And I also like the idea of RMM owning stream lifetime (solves some corner cases with Async allocators). So maybe this should be an RMM class?

@jrhemstad
Contributor

> I like this. And I also like the idea of RMM owning stream lifetime (solves some corner cases with Async allocators). So maybe this should be an RMM class?

Great minds!

I just ran into a situation where RMM needs a strongly typed stream object:

  explicit device_uvector(std::size_t size,
                          cudaStream_t stream                 = 0,
                          rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource())

  device_uvector(std::size_t size,
                 value_type const& v,
                 cudaStream_t stream                 = 0,
                 rmm::mr::device_memory_resource* mr = rmm::mr::get_default_resource())

When you try to construct device_uvector(100, 0), it's ambiguous which of the two ctors should be selected.
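
To illustrate (hypothetical free-function overloads, reusing the cudf_stream sketch from the earlier comment as the strong type): once the conversion from int is deleted, a literal 0 can only bind to the value parameter, so the call resolves unambiguously.

#include <cstddef>

void ctor_like(std::size_t size, cudf_stream stream = {});             // (size, stream)
void ctor_like(std::size_t size, int value, cudf_stream stream = {});  // (size, value, stream)

void caller()
{
  ctor_like(100, 0);  // 0 is an exact match for int and cannot silently become
                      // a stream, so the (size, value, stream) overload wins
}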

@jrhemstad
Contributor

> I like this. And I also like the idea of RMM owning stream lifetime (solves some corner cases with Async allocators). So maybe this should be an RMM class?

@harrism so are you envisioning an owning stream object? I.e., one that creates the stream when the object is created and destroys it when the object is destroyed?

So APIs would become:

foo(..., rmm::stream const& s);

The class would have to be moveable, but not copyable (can't have multiple objects trying to destroy the same stream handle), so it should probably just be a wrapper around a unique_ptr<cudaStream_t>.
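
A minimal sketch of that owning-stream idea (not the class RMM eventually shipped; error checking omitted): move-only, creates a stream on construction and destroys it on destruction.

#include <cuda_runtime_api.h>
#include <utility>

class owning_stream {
  cudaStream_t _stream{};  // nullptr in the moved-from state

 public:
  owning_stream() { cudaStreamCreate(&_stream); }
  ~owning_stream()
  {
    if (_stream != nullptr) { cudaStreamDestroy(_stream); }
  }

  owning_stream(owning_stream&& other) noexcept : _stream{other._stream} { other._stream = nullptr; }
  owning_stream& operator=(owning_stream&& other) noexcept
  {
    std::swap(_stream, other._stream);
    return *this;
  }

  owning_stream(owning_stream const&) = delete;             // copying would double-destroy
  owning_stream& operator=(owning_stream const&) = delete;

  cudaStream_t value() const { return _stream; }  // non-owning handle for kernel launches
};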

@harrism
Member

harrism commented May 12, 2020

Yeah, ultimately I was thinking about a stream pool owned by RMM. The problem I'm concerned about is an allocator that maintains a free list of blocks associated with a stream: currently that stream could get destroyed without the allocator knowing, and then that list is orphaned. Normally, to "steal" those blocks from the stream you call cudaStreamSynchronize(), which is UB if the stream has been destroyed.

Moreover, since cudaStream_t is just a pointer, that pointer could get reused for a new stream created without the RMM allocator knowing anything about it. The allocator would then synchronize this new stream to steal those blocks, which is actually unnecessary, and you end up with a free list of blocks incorrectly associated with the new stream.

If RMM owns stream creation and destruction, we can provide a way to ensure this is handled correctly and safely.

And yes, I think making an owning stream object makes sense. But it might just get/return streams from/to the pool rather than calling cudaStreamCreate(). (That can be future work of course)
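
A rough sketch of the pool idea (hypothetical, not an actual RMM API; error checking omitted): streams are acquired from and returned to a pool, so handles stay alive for the lifetime of the pool and per-stream free lists are never orphaned behind the allocator's back.

#include <cuda_runtime_api.h>
#include <mutex>
#include <vector>

class stream_pool {
  std::vector<cudaStream_t> _free;  // released streams, kept alive for reuse
  std::mutex _mtx;

 public:
  cudaStream_t acquire()
  {
    std::lock_guard<std::mutex> lock(_mtx);
    if (_free.empty()) {
      cudaStream_t s{};
      cudaStreamCreate(&s);
      return s;
    }
    cudaStream_t s = _free.back();
    _free.pop_back();
    return s;
  }

  void release(cudaStream_t s)
  {
    std::lock_guard<std::mutex> lock(_mtx);
    _free.push_back(s);  // not destroyed, so the handle cannot be reused by CUDA
  }

  ~stream_pool()
  {
    for (auto s : _free) { cudaStreamDestroy(s); }
  }
};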

@trevorsm7
Contributor Author

Ok, so for now I can disallow construction of the rmm::stream from a cudaStream_t (while still allowing it to be cast the other way around) and just have a default constructor/factory that returns the default stream. Also, it'll be passed around by reference as Jake suggested. I'll split this off into another PR.

kkraus14 pushed a commit that referenced this issue Nov 13, 2020
Converting libcudf to use `rmm::cuda_stream_view` will require a LOT of changes, so I'm splitting it into multiple PRs to ease reviewing. This is the first PR in the series. This series of PRs will

 - Replace usage of `cudaStream_t` with `rmm::cuda_stream_view`
 - Replace usage of `0` or `nullptr` as a stream identifier with `rmm::cuda_stream_default`
 - Ensure all APIs always order the stream parameter *before* the memory resource parameter. #5119

This first PR converts:
 - column.hpp (and source)
 - column_device_view.cuh
 - copying.hpp (and source) : moves functions that had streams in public APIs to `namespace detail` and adds streamless public versions.
 - null_mask.hpp (and source) : moves functions that had streams in public APIs to `namespace detail` and adds streamless public versions.
 - AST (transform)
 - Usages of the above APIs in other source files
 - Some benchmarks

Contributes to #6645 and #5119

~Depends on #6732.~
harrism added a commit that referenced this issue Nov 20, 2020
Converting libcudf to use rmm::cuda_stream_view will require a LOT of changes, so I'm splitting it into multiple PRs to ease reviewing. This is the second PR in the series. This series of PRs will

    Replace usage of cudaStream_t with rmm::cuda_stream_view
    Replace usage of 0 or nullptr as a stream identifier with rmm::cuda_stream_default
    Ensure all APIs always order the stream parameter before the memory resource parameter. #5119

Contributes to #6645 and #5119

Depends on #6646 so this PR will look much bigger until that one is merged.

Also fixes #6706 (to_arrow and to_dlpack are not properly synchronized).

This second PR converts:

    table.hpp (and source / dependencies)
    column_factories.hpp (and source / dependencies)
    headers in include/detail/aggregation (and source / dependencies)
    include/detail/groupby/sort_helper.hpp (and source / dependencies)
    include/detail/utilities/cuda.cuh (and dependencies)
    binary ops
    concatenate
    copy_if
    copy_range
    fill
    gather
    get_value
    hash groupby
    quantiles
    reductions
    repeat
    replace
    reshape
    round
    scatter
    search
    sequence
    sorting
    stream compaction
@harrism harrism assigned harrism and unassigned trevorsm7 Nov 23, 2020
kkraus14 pushed a commit that referenced this issue Nov 24, 2020
Converting libcudf to use `rmm::cuda_stream_view` will require a LOT of changes, so I'm splitting it into multiple PRs to ease reviewing. This is the third PR in the series. This series of PRs will

- Replace usage of `cudaStream_t` with `rmm::cuda_stream_view`
- Replace usage of `0` or `nullptr` as a stream identifier with `rmm::cuda_stream_default`
- Ensure all APIs always order the stream parameter before the memory resource parameter. #5119

Contributes to #6645 and #5119

Depends on #6646 and #6648 so this PR will look much bigger until they are merged.

This third PR converts:

 - remaining dictionary functionality
 - cuio
 - lists
 - scalar
 - strings
 - groupby
 - join
 - contiguous_split
 - get_element
 - datetime_ops
 - extract
 - merge
 - partitioning
 - minmax reduction
 - scan
 - byte_cast
 - clamp
 - interleave_columns
 - is_sorted
 - groupby
 - rank
 - tests
 - concurrent map classes
@harrism
Member

harrism commented Dec 2, 2020

Fixed by #6744
