[FEA] Use cache-oblivious copies for arbitrary copies in raft::copy #1842

wphicks · 2023-09-22T15:57:23Z

As a follow up to PR #1818, the logic for arbitrary host-to-host copies as well as the custom device kernel would benefit from cache-oblivious transfers. This would be almost trivial with #501, which (as I understand it) would extend submdspan support to arbitrary layouts. On the host side, the smallest submdspan accessed in the cache-oblivious transfer could (generally speaking) be SIMD accelerated, as well.

### What is mdbuffer? This PR introduces a maybe-owning multi-dimensional abstraction called `mdbuffer` to help simplify code that _may_ require an `mdarray` but only if the data are not already in a desired form or location. As a concrete example, consider a function `foo_device` which operates on memory accessible from the device. If we wish to pass it data originating on the host, a separate code path must be created in which a `device_mdarray` is created and the data are explicitly copied from host to device. This leads to a proliferation of branches as `foo_device` interacts with other functions with similar requirements. As an initial simplification, `mdbuffer` allows us to write a single template that accepts an `mdspan` pointing to memory on either host _or_ device and routes it through the same code: ```c++ template <typename mdspan_type> void foo_device(raft::resources const& res, mdspan_type data) { auto buf = raft::mdbuffer{res, raft::mdbuffer{data}, raft::memory_type::device}; // Data in buf is now guaranteed to be accessible from device. // If it was already accessible from device, no copy was performed. If it // was not, a copy was performed. some_kernel<<<...>>>(buf.view<raft::memory_type::device>()); // It is sometimes useful to know whether or not a copy was performed to // e.g. determine whether the transformed data should be copied back to its original // location. This can be checked via the `is_owning()` method. if (buf.is_owning()) { raft::copy(res, data, buf.view<raft::memory_type::device>()); } } foo_device(res, some_host_mdspan); // Still works; memory is allocated and copy is performed foo_device(res, some_device_mdspan); // Still works and no allocation or copy is required foo_device(res, some_managed_mdspan); // Still works and no allocation or copy is required ``` While this is a useful simplification, it still leads to a proliferation of template instantiations. If this is undesirable, `mdbuffer` permits a further consolidation through implicit conversion of an mdspan to an mdbuffer: ```c++ void foo_device(raft::resources const& res, raft::mdbuffer<float, raft::matrix_extent<int>>&& data) { auto buf = raft::mdbuffer{res, data, raft::memory_type::device}; some_kernel<<<...>>>(buf.view<raft::memory_type::device>()); if (buf.is_owning()) { raft::copy(res, data, buf.view<raft::memory_type::device>()); } } // All of the following work exactly as before but no longer require separate template instantiations foo_device(res, some_host_mdspan); foo_device(res, some_device_mdspan); foo_device(res, some_managed_mdspan); ``` `mdbuffer` also offers a simple way to perform runtime dispatching based on the memory type passed to it using standard C++ patterns. While mdbuffer's `.view()` method takes an optional template parameter indicating the mdspan type to retrieve as a view, that parameter can be omitted to retrieve a `std::variant` of all mdspan types which may provide a view on the `mdbuffer`'s data (depending on its memory type). We can then use `std::visit` to perform runtime dispatching based on where the data are stored: ```c++ void foo(raft::resources const& res, raft::mdbuffer<float, raft::matrix_extent<int>>&& data) { std::visit([](auto view) { if constexpr (typename decltype(view)::accessor_type::is_device_accessible) { // Do something with these data on device } else { // Do something with these data on host } }, data.view()); } ``` In addition to moving data among various memory types (host, device, managed, and pinned currently), `mdbuffer` can be used to coerce data to a desired in-memory layout or to a compatible data type (e.g. floats to doubles). As with changes in the memory type, a copy will be performed if and only if it is necessary. ```c++ template <typename mdspan_type> void foo_device(raft::resources const& res, mdspan_type data) { auto buf = raft::mdbuffer<float, raft::matrix_extent<int>, raft::row_major>{res, raft::mdbuffer{data}, raft::memory_type::device}; // Data in buf is now guaranteed to be accessible from device, and // represented by floats in row-major order. some_kernel<<<...>>>(buf.view<raft::memory_type::device>()); // The same check can be used to determine whether or not a copy was // required, regardless of the cause. I.e. if the data were already on // device but in column-major order, the is_owning() method would still // return true because new storage needed to be allocated. if (buf.is_owning()) { raft::copy(res, data, buf.view<raft::memory_type::device>()); } } ``` ### What mdbuffer is **not** `mdbuffer` is **not** a replacement for either `mdspan` or `mdarray`. `mdspan` remains the standard object for passing data views throughout the RAFT codebase, and `mdarray` remains the standard object for allocating new multi-dimensional data. This is reflected in the fact that `mdbuffer` can _only_ be constructed from an existing `mdspan` or `mdarray` or another `mdbuffer`. `mdbuffer` is intended to be used solely to simplify code where data _may_ need to be copied to a different location. ### Follow-ups - I have omitted the mdbuffer-based replacement for and generalization of `temporary_device_buffer` since this PR is already enormous. I have this partially written however, and I'll post a link to its current state to help motivate the changes here. - For all necessary copies, `mdbuffer` uses `raft::copy`. For _some_ transformations that require a change in data type or layout, `raft::copy` is not fully optimized. See #1842 for more information. Optimizing this will be an important change to ensure that `mdbuffer` can be used with absolutely minimal overhead in all cases. These non-optimized cases represent a small fraction of the real-world use cases we can expect for `mdbuffer`, however, so there should be little concern about beginning to use it as is. - `std::visit`'s performance for a small number of variants is sometimes non-optimal. As a followup, it would be good to benchmark `mdbuffer`'s current performance and compare to internal use of a `visit` implementation that uses a `switch` on the available memory types. Resolve #1602 Authors: - William Hicks (https://github.com/wphicks) - Tarang Jain (https://github.com/tarang-jain) Approvers: - Divye Gala (https://github.com/divyegala) - Corey J. Nolet (https://github.com/cjnolet) - Artem M. Chirkin (https://github.com/achirkin) - Tamas Bela Feher (https://github.com/tfeher) - Ben Frederickson (https://github.com/benfred) URL: #1999

…ai#1999) ### What is mdbuffer? This PR introduces a maybe-owning multi-dimensional abstraction called `mdbuffer` to help simplify code that _may_ require an `mdarray` but only if the data are not already in a desired form or location. As a concrete example, consider a function `foo_device` which operates on memory accessible from the device. If we wish to pass it data originating on the host, a separate code path must be created in which a `device_mdarray` is created and the data are explicitly copied from host to device. This leads to a proliferation of branches as `foo_device` interacts with other functions with similar requirements. As an initial simplification, `mdbuffer` allows us to write a single template that accepts an `mdspan` pointing to memory on either host _or_ device and routes it through the same code: ```c++ template <typename mdspan_type> void foo_device(raft::resources const& res, mdspan_type data) { auto buf = raft::mdbuffer{res, raft::mdbuffer{data}, raft::memory_type::device}; // Data in buf is now guaranteed to be accessible from device. // If it was already accessible from device, no copy was performed. If it // was not, a copy was performed. some_kernel<<<...>>>(buf.view<raft::memory_type::device>()); // It is sometimes useful to know whether or not a copy was performed to // e.g. determine whether the transformed data should be copied back to its original // location. This can be checked via the `is_owning()` method. if (buf.is_owning()) { raft::copy(res, data, buf.view<raft::memory_type::device>()); } } foo_device(res, some_host_mdspan); // Still works; memory is allocated and copy is performed foo_device(res, some_device_mdspan); // Still works and no allocation or copy is required foo_device(res, some_managed_mdspan); // Still works and no allocation or copy is required ``` While this is a useful simplification, it still leads to a proliferation of template instantiations. If this is undesirable, `mdbuffer` permits a further consolidation through implicit conversion of an mdspan to an mdbuffer: ```c++ void foo_device(raft::resources const& res, raft::mdbuffer<float, raft::matrix_extent<int>>&& data) { auto buf = raft::mdbuffer{res, data, raft::memory_type::device}; some_kernel<<<...>>>(buf.view<raft::memory_type::device>()); if (buf.is_owning()) { raft::copy(res, data, buf.view<raft::memory_type::device>()); } } // All of the following work exactly as before but no longer require separate template instantiations foo_device(res, some_host_mdspan); foo_device(res, some_device_mdspan); foo_device(res, some_managed_mdspan); ``` `mdbuffer` also offers a simple way to perform runtime dispatching based on the memory type passed to it using standard C++ patterns. While mdbuffer's `.view()` method takes an optional template parameter indicating the mdspan type to retrieve as a view, that parameter can be omitted to retrieve a `std::variant` of all mdspan types which may provide a view on the `mdbuffer`'s data (depending on its memory type). We can then use `std::visit` to perform runtime dispatching based on where the data are stored: ```c++ void foo(raft::resources const& res, raft::mdbuffer<float, raft::matrix_extent<int>>&& data) { std::visit([](auto view) { if constexpr (typename decltype(view)::accessor_type::is_device_accessible) { // Do something with these data on device } else { // Do something with these data on host } }, data.view()); } ``` In addition to moving data among various memory types (host, device, managed, and pinned currently), `mdbuffer` can be used to coerce data to a desired in-memory layout or to a compatible data type (e.g. floats to doubles). As with changes in the memory type, a copy will be performed if and only if it is necessary. ```c++ template <typename mdspan_type> void foo_device(raft::resources const& res, mdspan_type data) { auto buf = raft::mdbuffer<float, raft::matrix_extent<int>, raft::row_major>{res, raft::mdbuffer{data}, raft::memory_type::device}; // Data in buf is now guaranteed to be accessible from device, and // represented by floats in row-major order. some_kernel<<<...>>>(buf.view<raft::memory_type::device>()); // The same check can be used to determine whether or not a copy was // required, regardless of the cause. I.e. if the data were already on // device but in column-major order, the is_owning() method would still // return true because new storage needed to be allocated. if (buf.is_owning()) { raft::copy(res, data, buf.view<raft::memory_type::device>()); } } ``` ### What mdbuffer is **not** `mdbuffer` is **not** a replacement for either `mdspan` or `mdarray`. `mdspan` remains the standard object for passing data views throughout the RAFT codebase, and `mdarray` remains the standard object for allocating new multi-dimensional data. This is reflected in the fact that `mdbuffer` can _only_ be constructed from an existing `mdspan` or `mdarray` or another `mdbuffer`. `mdbuffer` is intended to be used solely to simplify code where data _may_ need to be copied to a different location. ### Follow-ups - I have omitted the mdbuffer-based replacement for and generalization of `temporary_device_buffer` since this PR is already enormous. I have this partially written however, and I'll post a link to its current state to help motivate the changes here. - For all necessary copies, `mdbuffer` uses `raft::copy`. For _some_ transformations that require a change in data type or layout, `raft::copy` is not fully optimized. See rapidsai#1842 for more information. Optimizing this will be an important change to ensure that `mdbuffer` can be used with absolutely minimal overhead in all cases. These non-optimized cases represent a small fraction of the real-world use cases we can expect for `mdbuffer`, however, so there should be little concern about beginning to use it as is. - `std::visit`'s performance for a small number of variants is sometimes non-optimal. As a followup, it would be good to benchmark `mdbuffer`'s current performance and compare to internal use of a `visit` implementation that uses a `switch` on the available memory types. Resolve rapidsai#1602 Authors: - William Hicks (https://github.com/wphicks) - Tarang Jain (https://github.com/tarang-jain) Approvers: - Divye Gala (https://github.com/divyegala) - Corey J. Nolet (https://github.com/cjnolet) - Artem M. Chirkin (https://github.com/achirkin) - Tamas Bela Feher (https://github.com/tfeher) - Ben Frederickson (https://github.com/benfred) URL: rapidsai#1999

wphicks added the feature request New feature or request label Sep 22, 2023

wphicks self-assigned this Sep 22, 2023

wphicks mentioned this issue Nov 15, 2023

Implement maybe-owning multi-dimensional container (mdbuffer) #1999

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Use cache-oblivious copies for arbitrary copies in raft::copy #1842

[FEA] Use cache-oblivious copies for arbitrary copies in raft::copy #1842

wphicks commented Sep 22, 2023

[FEA] Use cache-oblivious copies for arbitrary copies in raft::copy #1842

[FEA] Use cache-oblivious copies for arbitrary copies in raft::copy #1842

Comments

wphicks commented Sep 22, 2023