-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement maybe-owning multi-dimensional container (mdbuffer) #1999
Conversation
/ok to test |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Artem M. Chirkin <[email protected]>
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @whpicks for this feature! Due to lack of time I just skimmed through a large part of implementation details, so my review is restricted to the high level API.
The API is well polished and seems convenient to use. I think it will be a really helpful utility!
You are handling host
, device
, managed
, and pinned
memory types. I agree that these are the variants that we expect to encounter in practice.
We have APIs in RAFT, specifically some of the ANN build algorithms, that just care about the data shape, and host/device dispatch is happenning only deeper in the call chain. At that point mdbuffer
is a great help to simplify the code when we finally need to guarantee accessibility from device. And I see that actually all four allocation types could be useful when we create an mdbuffer
. So having the vocabulary to express this using raft::memory_type
seems helpful.
Still, I am a little bit concerned by the proliferation of mdarray
types. The allocation mechanism of mdarray
can be defined by simply passing the desired memory_resource
during creation. When we want to use the data, we care about whether it is accessible from host or device memory space, and that is expressed by the accessor_type
. We shall ask the question: it is really useful to have so many mdarray
/ mdspan
variants? I am not sure about this, but I do not have any objection for the current proposal. We should still keep in mind that other memory requirements will could come up (e.g. array allocated with huge page attributes) and we probably don't want to define new mdarray
type for each of these.
Thanks once more for this work, it is a great addition to RAFT! The PR looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks great! I think this code will really help simplify some of our code, and I'm looking forward to taking advantage of it
I actually strongly agree with this. As you say, there are use cases where we really only care about whether or not we can access those data at a particular point in runtime and what the shape/layout is. In that case, we pay fairly dearly for the proliferation of compile-time types. The compile-time specification of memory type also runs counter to some of the general ways I see folks thinking about the purpose of I think there may be room for both constructs to exist alongside one another, however. There's no question that having the ability to dispatch to radically different compile-time paths based on the runtime condition of memory type is useful, and I'm hoping that's what mdbuffer will support. On the other hand, it would be good to revisit how we're using mdspan in cases where we explicitly wish to elide the difference between memory types for e.g. GPU/CPU compatible code. I had proposed something along those lines quite some time ago, but at the time others objected to it. Maybe we can review some of those ideas again as a follow-on to this PR and see if there is greater clarity on the different use cases now that |
Thank you everyone for all of the valuable feedback on this! I'm going to try to implement @benfred 's suggestion to align the memory_type values with cudaMemoryType, and if there are no objections, I'll trigger a merge after that. |
This is counter to the original use of mdspan within RAFT also. We intentionally created only two different types (device and host) with a general accessor wrapper that specifies whether or not the memory can be read from device or host. Note that the "both" case can still be specified through that pattern, which also covers managed memory, for example. The idea is to allow functions to define which access level they are interested in without having to introduce new template types for each possible combination. The idea behind this lightweight accessor wrapper is so that we can support any accessor the user might be using so long as the accessor can produce the correct values, either on host or device, when asked... (intentionally side stepping the submdspan can of worms discussion here too). I also agree that we should be able to simplify mdarray in this way- since the template itself should only care about what mdspan it needs to produce in order to interact with APIs that require it. Ideally, the mdarray would accept the memory type as a runtime argument so long as it produces an mdspan that is either device or host accessible. One problem here is that the way our current mdarray is designed (which now slightly differs from libcu++ and the actual spec), the memory type is coupled to the container policy, which is a template argument. It's on the todo list to reconcile the mdarray with libcu++, and even eventually replace the current one with the one in libcu++. It just keeps getting put on the back-burner behind cuVS priorities. I'm not convinced we need to be worrying about CUDA IPC (or even SCADA) at this layer. That seems like details we would want to delegate to the container policy. Hopefully we can someday find a good way to purge that from the templates altogether. |
The decision to make separate "device" and "host" mdspan typedefs adheres to the design of the other c++ vocabulary types in RAPIDS and in the CCCL (including Thrust). As an organization, we should be striving to follow CCCL's lead here. I personally think the mdbuffer is going to be very useful for gluing together the container and view layer, allowing things like smart copy-back in between. However, this shouldn't change the vocab types in our public APIs (which is why it's so great that mdbuffer supports implicit conversion). |
Thanks! That's not really what I was getting at, but I think this is getting beyond the scope of this PR. We can discuss more elsewhere. |
Also update necessary copyright headers
/ok to test |
/merge |
@wphicks, my prior response was also in reply to @tfeher's earlier comment:
(I can only speak to the decision for "device"-accessible and "host"-accessible variants) |
/ok to test |
…ai#1999) ### What is mdbuffer? This PR introduces a maybe-owning multi-dimensional abstraction called `mdbuffer` to help simplify code that _may_ require an `mdarray` but only if the data are not already in a desired form or location. As a concrete example, consider a function `foo_device` which operates on memory accessible from the device. If we wish to pass it data originating on the host, a separate code path must be created in which a `device_mdarray` is created and the data are explicitly copied from host to device. This leads to a proliferation of branches as `foo_device` interacts with other functions with similar requirements. As an initial simplification, `mdbuffer` allows us to write a single template that accepts an `mdspan` pointing to memory on either host _or_ device and routes it through the same code: ```c++ template <typename mdspan_type> void foo_device(raft::resources const& res, mdspan_type data) { auto buf = raft::mdbuffer{res, raft::mdbuffer{data}, raft::memory_type::device}; // Data in buf is now guaranteed to be accessible from device. // If it was already accessible from device, no copy was performed. If it // was not, a copy was performed. some_kernel<<<...>>>(buf.view<raft::memory_type::device>()); // It is sometimes useful to know whether or not a copy was performed to // e.g. determine whether the transformed data should be copied back to its original // location. This can be checked via the `is_owning()` method. if (buf.is_owning()) { raft::copy(res, data, buf.view<raft::memory_type::device>()); } } foo_device(res, some_host_mdspan); // Still works; memory is allocated and copy is performed foo_device(res, some_device_mdspan); // Still works and no allocation or copy is required foo_device(res, some_managed_mdspan); // Still works and no allocation or copy is required ``` While this is a useful simplification, it still leads to a proliferation of template instantiations. If this is undesirable, `mdbuffer` permits a further consolidation through implicit conversion of an mdspan to an mdbuffer: ```c++ void foo_device(raft::resources const& res, raft::mdbuffer<float, raft::matrix_extent<int>>&& data) { auto buf = raft::mdbuffer{res, data, raft::memory_type::device}; some_kernel<<<...>>>(buf.view<raft::memory_type::device>()); if (buf.is_owning()) { raft::copy(res, data, buf.view<raft::memory_type::device>()); } } // All of the following work exactly as before but no longer require separate template instantiations foo_device(res, some_host_mdspan); foo_device(res, some_device_mdspan); foo_device(res, some_managed_mdspan); ``` `mdbuffer` also offers a simple way to perform runtime dispatching based on the memory type passed to it using standard C++ patterns. While mdbuffer's `.view()` method takes an optional template parameter indicating the mdspan type to retrieve as a view, that parameter can be omitted to retrieve a `std::variant` of all mdspan types which may provide a view on the `mdbuffer`'s data (depending on its memory type). We can then use `std::visit` to perform runtime dispatching based on where the data are stored: ```c++ void foo(raft::resources const& res, raft::mdbuffer<float, raft::matrix_extent<int>>&& data) { std::visit([](auto view) { if constexpr (typename decltype(view)::accessor_type::is_device_accessible) { // Do something with these data on device } else { // Do something with these data on host } }, data.view()); } ``` In addition to moving data among various memory types (host, device, managed, and pinned currently), `mdbuffer` can be used to coerce data to a desired in-memory layout or to a compatible data type (e.g. floats to doubles). As with changes in the memory type, a copy will be performed if and only if it is necessary. ```c++ template <typename mdspan_type> void foo_device(raft::resources const& res, mdspan_type data) { auto buf = raft::mdbuffer<float, raft::matrix_extent<int>, raft::row_major>{res, raft::mdbuffer{data}, raft::memory_type::device}; // Data in buf is now guaranteed to be accessible from device, and // represented by floats in row-major order. some_kernel<<<...>>>(buf.view<raft::memory_type::device>()); // The same check can be used to determine whether or not a copy was // required, regardless of the cause. I.e. if the data were already on // device but in column-major order, the is_owning() method would still // return true because new storage needed to be allocated. if (buf.is_owning()) { raft::copy(res, data, buf.view<raft::memory_type::device>()); } } ``` ### What mdbuffer is **not** `mdbuffer` is **not** a replacement for either `mdspan` or `mdarray`. `mdspan` remains the standard object for passing data views throughout the RAFT codebase, and `mdarray` remains the standard object for allocating new multi-dimensional data. This is reflected in the fact that `mdbuffer` can _only_ be constructed from an existing `mdspan` or `mdarray` or another `mdbuffer`. `mdbuffer` is intended to be used solely to simplify code where data _may_ need to be copied to a different location. ### Follow-ups - I have omitted the mdbuffer-based replacement for and generalization of `temporary_device_buffer` since this PR is already enormous. I have this partially written however, and I'll post a link to its current state to help motivate the changes here. - For all necessary copies, `mdbuffer` uses `raft::copy`. For _some_ transformations that require a change in data type or layout, `raft::copy` is not fully optimized. See rapidsai#1842 for more information. Optimizing this will be an important change to ensure that `mdbuffer` can be used with absolutely minimal overhead in all cases. These non-optimized cases represent a small fraction of the real-world use cases we can expect for `mdbuffer`, however, so there should be little concern about beginning to use it as is. - `std::visit`'s performance for a small number of variants is sometimes non-optimal. As a followup, it would be good to benchmark `mdbuffer`'s current performance and compare to internal use of a `visit` implementation that uses a `switch` on the available memory types. Resolve rapidsai#1602 Authors: - William Hicks (https://github.com/wphicks) - Tarang Jain (https://github.com/tarang-jain) Approvers: - Divye Gala (https://github.com/divyegala) - Corey J. Nolet (https://github.com/cjnolet) - Artem M. Chirkin (https://github.com/achirkin) - Tamas Bela Feher (https://github.com/tfeher) - Ben Frederickson (https://github.com/benfred) URL: rapidsai#1999
What is mdbuffer?
This PR introduces a maybe-owning multi-dimensional abstraction called
mdbuffer
to help simplify code that may require anmdarray
but only if the data are not already in a desired form or location.As a concrete example, consider a function
foo_device
which operates on memory accessible from the device. If we wish to pass it data originating on the host, a separate code path must be created in which adevice_mdarray
is created and the data are explicitly copied from host to device. This leads to a proliferation of branches asfoo_device
interacts with other functions with similar requirements.As an initial simplification,
mdbuffer
allows us to write a single template that accepts anmdspan
pointing to memory on either host or device and routes it through the same code:While this is a useful simplification, it still leads to a proliferation of template instantiations. If this is undesirable,
mdbuffer
permits a further consolidation through implicit conversion of an mdspan to an mdbuffer:mdbuffer
also offers a simple way to perform runtime dispatching based on the memory type passed to it using standard C++ patterns. While mdbuffer's.view()
method takes an optional template parameter indicating the mdspan type to retrieve as a view, that parameter can be omitted to retrieve astd::variant
of all mdspan types which may provide a view on themdbuffer
's data (depending on its memory type). We can then usestd::visit
to perform runtime dispatching based on where the data are stored:In addition to moving data among various memory types (host, device, managed, and pinned currently),
mdbuffer
can be used to coerce data to a desired in-memory layout or to a compatible data type (e.g. floats to doubles). As with changes in the memory type, a copy will be performed if and only if it is necessary.What mdbuffer is not
mdbuffer
is not a replacement for eithermdspan
ormdarray
.mdspan
remains the standard object for passing data views throughout the RAFT codebase, andmdarray
remains the standard object for allocating new multi-dimensional data. This is reflected in the fact thatmdbuffer
can only be constructed from an existingmdspan
ormdarray
or anothermdbuffer
.mdbuffer
is intended to be used solely to simplify code where data may need to be copied to a different location.Follow-ups
temporary_device_buffer
since this PR is already enormous. I have this partially written however, and I'll post a link to its current state to help motivate the changes here.mdbuffer
usesraft::copy
. For some transformations that require a change in data type or layout,raft::copy
is not fully optimized. See [FEA] Use cache-oblivious copies for arbitrary copies in raft::copy #1842 for more information. Optimizing this will be an important change to ensure thatmdbuffer
can be used with absolutely minimal overhead in all cases. These non-optimized cases represent a small fraction of the real-world use cases we can expect formdbuffer
, however, so there should be little concern about beginning to use it as is.std::visit
's performance for a small number of variants is sometimes non-optimal. As a followup, it would be good to benchmarkmdbuffer
's current performance and compare to internal use of avisit
implementation that uses aswitch
on the available memory types.Resolve #1602