Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Use cache-oblivious copies for arbitrary copies in raft::copy #1842

Open
wphicks opened this issue Sep 22, 2023 · 0 comments
Open

[FEA] Use cache-oblivious copies for arbitrary copies in raft::copy #1842

wphicks opened this issue Sep 22, 2023 · 0 comments
Assignees
Labels
feature request New feature or request

Comments

@wphicks
Copy link
Contributor

wphicks commented Sep 22, 2023

As a follow up to PR #1818, the logic for arbitrary host-to-host copies as well as the custom device kernel would benefit from cache-oblivious transfers. This would be almost trivial with #501, which (as I understand it) would extend submdspan support to arbitrary layouts. On the host side, the smallest submdspan accessed in the cache-oblivious transfer could (generally speaking) be SIMD accelerated, as well.

@wphicks wphicks added the feature request New feature or request label Sep 22, 2023
@wphicks wphicks self-assigned this Sep 22, 2023
rapids-bot bot pushed a commit that referenced this issue Jan 4, 2024
### What is mdbuffer?

This PR introduces a maybe-owning multi-dimensional abstraction called `mdbuffer` to help simplify code that _may_ require an `mdarray` but only if the data are not already in a desired form or location.

As a concrete example, consider a function `foo_device` which operates on memory accessible from the device. If we wish to pass it data originating on the host, a separate code path must be created in which a `device_mdarray` is created and the data are explicitly copied from host to device. This leads to a proliferation of branches as `foo_device` interacts with other functions with similar requirements.

As an initial simplification, `mdbuffer` allows us to write a single template that accepts an `mdspan` pointing to memory on either host _or_ device and routes it through the same code:
```c++
template <typename mdspan_type>
void foo_device(raft::resources const& res, mdspan_type data) {
  auto buf = raft::mdbuffer{res, raft::mdbuffer{data}, raft::memory_type::device};
  // Data in buf is now guaranteed to be accessible from device.
  // If it was already accessible from device, no copy was performed. If it
  // was not, a copy was performed.

  some_kernel<<<...>>>(buf.view<raft::memory_type::device>());

  // It is sometimes useful to know whether or not a copy was performed to
  // e.g. determine whether the transformed data should be copied back to its original
  // location. This can be checked via the `is_owning()` method.
  if (buf.is_owning()) {
    raft::copy(res, data, buf.view<raft::memory_type::device>());
  }
}

foo_device(res, some_host_mdspan);  // Still works; memory is allocated and copy is performed
foo_device(res, some_device_mdspan);  // Still works and no allocation or copy is required
foo_device(res, some_managed_mdspan);  // Still works and no allocation or copy is required
```

While this is a useful simplification, it still leads to a proliferation of template instantiations. If this is undesirable, `mdbuffer` permits a further consolidation through implicit conversion of an mdspan to an mdbuffer:

```c++
void foo_device(raft::resources const& res, raft::mdbuffer<float, raft::matrix_extent<int>>&& data)
{ auto buf = raft::mdbuffer{res, data, raft::memory_type::device};
  some_kernel<<<...>>>(buf.view<raft::memory_type::device>());
  if (buf.is_owning()) {
    raft::copy(res, data, buf.view<raft::memory_type::device>());
  }
}

// All of the following work exactly as before but no longer require separate template instantiations
foo_device(res, some_host_mdspan);
foo_device(res, some_device_mdspan);
foo_device(res, some_managed_mdspan);
```

`mdbuffer` also offers a simple way to perform runtime dispatching based on the memory type passed to it using standard C++ patterns. While mdbuffer's `.view()` method takes an optional template parameter indicating the mdspan type to retrieve as a view, that parameter can be omitted to retrieve a `std::variant` of all mdspan types which may provide a view on the `mdbuffer`'s data (depending on its memory type). We can then use `std::visit` to perform runtime dispatching based on where the data are stored:

```c++
void foo(raft::resources const& res, raft::mdbuffer<float, raft::matrix_extent<int>>&& data) {
  std::visit([](auto view) {
    if constexpr (typename decltype(view)::accessor_type::is_device_accessible) {
      // Do something with these data on device
    } else {
      // Do something with these data on host
    }
  }, data.view());
}
```

In addition to moving data among various memory types (host, device, managed, and pinned currently), `mdbuffer` can be used to coerce data to a desired in-memory layout or to a compatible data type (e.g. floats to doubles). As with changes in the memory type, a copy will be performed if and only if it is necessary.

```c++
template <typename mdspan_type>
void foo_device(raft::resources const& res, mdspan_type data) {
  auto buf = raft::mdbuffer<float, raft::matrix_extent<int>, raft::row_major>{res,
raft::mdbuffer{data}, raft::memory_type::device};
  // Data in buf is now guaranteed to be accessible from device, and
  // represented by floats in row-major order.

  some_kernel<<<...>>>(buf.view<raft::memory_type::device>());

  // The same check can be used to determine whether or not a copy was
  // required, regardless of the cause. I.e. if the data were already on
  // device but in column-major order, the is_owning() method would still
  // return true because new storage needed to be allocated.
  if (buf.is_owning()) {
    raft::copy(res, data, buf.view<raft::memory_type::device>());
  }
}
```

### What mdbuffer is **not**
`mdbuffer` is **not** a replacement for either `mdspan` or `mdarray`. `mdspan` remains the standard object for passing data views throughout the RAFT codebase, and `mdarray` remains the standard object for allocating new multi-dimensional data. This is reflected in the fact that `mdbuffer` can _only_ be constructed from an existing `mdspan` or `mdarray` or another `mdbuffer`. `mdbuffer` is intended to be used solely to simplify code where data _may_ need to be copied to a different location.

### Follow-ups

-  I have omitted the mdbuffer-based replacement for and generalization of `temporary_device_buffer` since this PR is already enormous. I have this partially written however, and I'll post a link to its current state to help motivate the changes here.
- For all necessary copies, `mdbuffer` uses `raft::copy`. For _some_ transformations that require a change in data type or layout, `raft::copy` is not fully optimized. See #1842 for more information. Optimizing this will be an important change to ensure that `mdbuffer` can be used with absolutely minimal overhead in all cases. These non-optimized cases represent a small fraction of the real-world use cases we can expect for `mdbuffer`, however, so there should be little concern about beginning to use it as is.
- `std::visit`'s performance for a small number of variants is sometimes non-optimal. As a followup, it would be good to benchmark `mdbuffer`'s current performance and compare to internal use of a `visit` implementation that uses a `switch` on the available memory types.

Resolve #1602

Authors:
  - William Hicks (https://github.com/wphicks)
  - Tarang Jain (https://github.com/tarang-jain)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Artem M. Chirkin (https://github.com/achirkin)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Ben Frederickson (https://github.com/benfred)

URL: #1999
ChristinaZ pushed a commit to ChristinaZ/raft that referenced this issue Jan 17, 2024
…ai#1999)

### What is mdbuffer?

This PR introduces a maybe-owning multi-dimensional abstraction called `mdbuffer` to help simplify code that _may_ require an `mdarray` but only if the data are not already in a desired form or location.

As a concrete example, consider a function `foo_device` which operates on memory accessible from the device. If we wish to pass it data originating on the host, a separate code path must be created in which a `device_mdarray` is created and the data are explicitly copied from host to device. This leads to a proliferation of branches as `foo_device` interacts with other functions with similar requirements.

As an initial simplification, `mdbuffer` allows us to write a single template that accepts an `mdspan` pointing to memory on either host _or_ device and routes it through the same code:
```c++
template <typename mdspan_type>
void foo_device(raft::resources const& res, mdspan_type data) {
  auto buf = raft::mdbuffer{res, raft::mdbuffer{data}, raft::memory_type::device};
  // Data in buf is now guaranteed to be accessible from device.
  // If it was already accessible from device, no copy was performed. If it
  // was not, a copy was performed.

  some_kernel<<<...>>>(buf.view<raft::memory_type::device>());

  // It is sometimes useful to know whether or not a copy was performed to
  // e.g. determine whether the transformed data should be copied back to its original
  // location. This can be checked via the `is_owning()` method.
  if (buf.is_owning()) {
    raft::copy(res, data, buf.view<raft::memory_type::device>());
  }
}

foo_device(res, some_host_mdspan);  // Still works; memory is allocated and copy is performed
foo_device(res, some_device_mdspan);  // Still works and no allocation or copy is required
foo_device(res, some_managed_mdspan);  // Still works and no allocation or copy is required
```

While this is a useful simplification, it still leads to a proliferation of template instantiations. If this is undesirable, `mdbuffer` permits a further consolidation through implicit conversion of an mdspan to an mdbuffer:

```c++
void foo_device(raft::resources const& res, raft::mdbuffer<float, raft::matrix_extent<int>>&& data)
{ auto buf = raft::mdbuffer{res, data, raft::memory_type::device};
  some_kernel<<<...>>>(buf.view<raft::memory_type::device>());
  if (buf.is_owning()) {
    raft::copy(res, data, buf.view<raft::memory_type::device>());
  }
}

// All of the following work exactly as before but no longer require separate template instantiations
foo_device(res, some_host_mdspan);
foo_device(res, some_device_mdspan);
foo_device(res, some_managed_mdspan);
```

`mdbuffer` also offers a simple way to perform runtime dispatching based on the memory type passed to it using standard C++ patterns. While mdbuffer's `.view()` method takes an optional template parameter indicating the mdspan type to retrieve as a view, that parameter can be omitted to retrieve a `std::variant` of all mdspan types which may provide a view on the `mdbuffer`'s data (depending on its memory type). We can then use `std::visit` to perform runtime dispatching based on where the data are stored:

```c++
void foo(raft::resources const& res, raft::mdbuffer<float, raft::matrix_extent<int>>&& data) {
  std::visit([](auto view) {
    if constexpr (typename decltype(view)::accessor_type::is_device_accessible) {
      // Do something with these data on device
    } else {
      // Do something with these data on host
    }
  }, data.view());
}
```

In addition to moving data among various memory types (host, device, managed, and pinned currently), `mdbuffer` can be used to coerce data to a desired in-memory layout or to a compatible data type (e.g. floats to doubles). As with changes in the memory type, a copy will be performed if and only if it is necessary.

```c++
template <typename mdspan_type>
void foo_device(raft::resources const& res, mdspan_type data) {
  auto buf = raft::mdbuffer<float, raft::matrix_extent<int>, raft::row_major>{res,
raft::mdbuffer{data}, raft::memory_type::device};
  // Data in buf is now guaranteed to be accessible from device, and
  // represented by floats in row-major order.

  some_kernel<<<...>>>(buf.view<raft::memory_type::device>());

  // The same check can be used to determine whether or not a copy was
  // required, regardless of the cause. I.e. if the data were already on
  // device but in column-major order, the is_owning() method would still
  // return true because new storage needed to be allocated.
  if (buf.is_owning()) {
    raft::copy(res, data, buf.view<raft::memory_type::device>());
  }
}
```

### What mdbuffer is **not**
`mdbuffer` is **not** a replacement for either `mdspan` or `mdarray`. `mdspan` remains the standard object for passing data views throughout the RAFT codebase, and `mdarray` remains the standard object for allocating new multi-dimensional data. This is reflected in the fact that `mdbuffer` can _only_ be constructed from an existing `mdspan` or `mdarray` or another `mdbuffer`. `mdbuffer` is intended to be used solely to simplify code where data _may_ need to be copied to a different location.

### Follow-ups

-  I have omitted the mdbuffer-based replacement for and generalization of `temporary_device_buffer` since this PR is already enormous. I have this partially written however, and I'll post a link to its current state to help motivate the changes here.
- For all necessary copies, `mdbuffer` uses `raft::copy`. For _some_ transformations that require a change in data type or layout, `raft::copy` is not fully optimized. See rapidsai#1842 for more information. Optimizing this will be an important change to ensure that `mdbuffer` can be used with absolutely minimal overhead in all cases. These non-optimized cases represent a small fraction of the real-world use cases we can expect for `mdbuffer`, however, so there should be little concern about beginning to use it as is.
- `std::visit`'s performance for a small number of variants is sometimes non-optimal. As a followup, it would be good to benchmark `mdbuffer`'s current performance and compare to internal use of a `visit` implementation that uses a `switch` on the available memory types.

Resolve rapidsai#1602

Authors:
  - William Hicks (https://github.com/wphicks)
  - Tarang Jain (https://github.com/tarang-jain)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Artem M. Chirkin (https://github.com/achirkin)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Ben Frederickson (https://github.com/benfred)

URL: rapidsai#1999
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant