-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cagra memory optimizations #1790
Conversation
This just gets copied over to the output, and requires multiple copies of the output graph on host memory.
was still making a copy from device->host inside serialize_mdspan, and with the include_dataset changes this branch won't even be called
template <typename T, typename IdxT> | ||
class device_matrix_view_from_host { | ||
public: | ||
device_matrix_view_from_host(raft::resources const& res, host_matrix_view<T, IdxT> host_view) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @wphicks this pattern seems a lot like the mdbuffer
to me. The goal here is to make a device_mdspan
when the pointer can be accessed from device or copy memory to device when it can't.
/** | ||
* Utility to sync memory from a host_matrix_view to a device_matrix_view | ||
* | ||
* In certain situations (UVM/HMM/ATS) host memory might be directly accessible on the | ||
* device, and no extra allocations need to be performed. This class checks | ||
* if the host_matrix_view is already accessible on the device, and only creates device | ||
* memory and copies over if necessary. In memory limited situations this is preferable | ||
* to having both a host and device copy | ||
*/ | ||
template <typename T, typename IdxT> | ||
class device_matrix_view_from_host { | ||
public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely okay keeping this as an internal utility for now. Could you add a todo to the docs here (and for the host->device conversion function) to use mdbuffer
for this once it's available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a TODO here 6552c66
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @benfred!
/merge |
When trying to build a CAGRA index with 500M embeddings, we were running out of memory - even when using managed memory.
This PR contains some changes to reduce the memory usage:
optimize
call, and just use host memory passed in by caller