Cagra memory optimizations #1790

benfred · 2023-08-30T23:40:06Z

When trying to build a CAGRA index with 500M embeddings, we were running out of memory, even when using managed memory.

This PR contains some changes to reduce the memory usage:

For certain large matrices, don't make a second copy on the device or host if the memory is already accessible via UVM/ATS/HMM. For instance, we were copying the intermediate graph from host to device memory, and in certain cases (500M dataset, intermediate_graph_degree=128) the intermediate graph alone was 256GB.
Don't create a separate 'pruned_graph' host matrix in the optimize call; just use the host memory passed in by the caller.
Free the intermediate graph before creating the index.
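
For reference, the 256GB figure above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: `graph_bytes` is a hypothetical helper, not part of RAFT, and it assumes the graph is a dense row-major matrix of 32-bit indices (the description does not state the index type, but 4-byte indices are what make the quoted number work out).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a RAFT function): bytes needed for a dense
// row-major graph that stores one neighbor index per (row, column).
// IdxT is an assumption; std::uint32_t matches the 256GB quoted in the PR.
template <typename IdxT>
std::uint64_t graph_bytes(std::uint64_t n_rows, std::uint64_t degree)
{
  // Guard against 64-bit overflow for unreasonable inputs.
  assert(degree == 0 ||
         n_rows <= std::numeric_limits<std::uint64_t>::max() / degree / sizeof(IdxT));
  return n_rows * degree * sizeof(IdxT);
}
```

With these assumptions, `graph_bytes<std::uint32_t>(500'000'000, 128)` is 256,000,000,000 bytes (256GB, decimal), so keeping a host copy and a device copy of the same graph at once would need 512GB for the graph alone.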


cjnolet · 2023-08-31T14:31:49Z

cpp/include/raft/neighbors/detail/cagra/utils.hpp

+template <typename T, typename IdxT>
+class device_matrix_view_from_host {
+ public:
+  device_matrix_view_from_host(raft::resources const& res, host_matrix_view<T, IdxT> host_view)


cc @wphicks this pattern seems a lot like the mdbuffer to me. The goal here is to make a device_mdspan when the pointer can be accessed from device or copy memory to device when it can't.
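
A minimal sketch of the view-or-copy pattern described in this comment, for readers who do not have the diff open. It is deliberately plain C++ with no CUDA dependency, so it is not the PR's implementation: `view_or_copy_sketch` is a hypothetical name, the `device_accessible` flag stands in for a real pointer-attribute query (in CUDA code that would be something along the lines of `cudaPointerGetAttributes`), and `std::vector` stands in for a device allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the idea behind device_matrix_view_from_host.
// Assumptions: `device_accessible` replaces a real accessibility check, and
// std::vector replaces device memory, so this runs without a GPU.
template <typename T>
class view_or_copy_sketch {
 public:
  view_or_copy_sketch(const T* host_ptr,
                      std::size_t n_rows,
                      std::size_t n_cols,
                      bool device_accessible)
    : n_rows_(n_rows), n_cols_(n_cols)
  {
    assert(host_ptr != nullptr || n_rows * n_cols == 0);
    if (device_accessible) {
      // Memory is already reachable: alias it, allocate nothing.
      data_ = host_ptr;
    } else {
      // Fallback: allocate a second buffer and copy the contents over.
      owned_.assign(host_ptr, host_ptr + n_rows * n_cols);
      data_ = owned_.data();
    }
  }

  // data_ may point into owned_, so a default copy would leave it dangling.
  view_or_copy_sketch(const view_or_copy_sketch&)            = delete;
  view_or_copy_sketch& operator=(const view_or_copy_sketch&) = delete;

  const T* data() const { return data_; }
  std::size_t rows() const { return n_rows_; }
  std::size_t cols() const { return n_cols_; }

  // Extra memory this object had to allocate (0 when aliasing).
  std::size_t extra_bytes() const { return owned_.size() * sizeof(T); }

 private:
  std::size_t n_rows_;
  std::size_t n_cols_;
  std::vector<T> owned_;     // only populated on the copy path
  const T* data_ = nullptr;  // either the caller's pointer or owned_.data()
};
```

The point of the design is that callers always read through `data()` and never need to know which path was taken; only the accessible path avoids the second allocation, which is what matters for a graph of this size.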

cjnolet · 2023-09-05T17:12:26Z

cpp/include/raft/neighbors/detail/cagra/utils.hpp

+/**
+ * Utility to sync memory from a host_matrix_view to a device_matrix_view
+ *
+ * In certain situations (UVM/HMM/ATS) host memory might be directly accessible on the
+ * device, and no extra allocations need to be performed. This class checks
+ * if the host_matrix_view is already accessible on the device, and only creates device
+ * memory and copies over if necessary. In memory limited situations this is preferable
+ * to having both a host and device copy
+ */
+template <typename T, typename IdxT>
+class device_matrix_view_from_host {
+ public:


I'm definitely okay keeping this as an internal utility for now. Could you add a todo to the docs here (and for the host->device conversion function) to use mdbuffer for this once it's available?

added a TODO here 6552c66

cjnolet

LGTM. Thanks @benfred!

cjnolet · 2023-09-10T03:53:53Z

/merge

benfred added 10 commits August 28, 2023 15:19

Don't keep extra pruned_graph copy in optimize

4ba7510

This just gets copied over to the output, and requires multiple copies of the output graph on host memory.

in serialize, don't take copy of dataset if not strided

c5e4998

simplify kern_prune

5d9c959

avoid multiple detour_count matrices when using managed memory

203667f

free intermediate graph before creating index

f803deb

don't take device copy of intermediate graph unless necessary

c2eb1e3

refactor

eb86b1e

rev_graph

191e7fc

remove accidental checkin

f539655

revert serialization change

0e0c5f3

was still making a copy from device->host inside serialize_mdspan, and with the include_dataset changes this branch won't even be called

benfred requested a review from a team as a code owner August 30, 2023 23:40

github-actions bot added the cpp label Aug 30, 2023

benfred added the non-breaking (Non-breaking change) and improvement (Improvement / enhancement to an existing function) labels Aug 30, 2023

Merge branch 'branch-23.10' into cagra_mem

9d13475

cjnolet assigned benfred Aug 30, 2023

cjnolet reviewed Aug 31, 2023


cjnolet reviewed Sep 5, 2023


benfred and others added 4 commits September 5, 2023 10:28

Add TODO about using mdbuffer

6552c66

Merge branch 'cagra_mem' of github.com:benfred/raft into cagra_mem

3fa5829

Merge branch 'branch-23.10' into cagra_mem

0c1c718

Merge branch 'branch-23.10' into cagra_mem

10b9e5a

cjnolet approved these changes Sep 8, 2023


rapids-bot bot merged commit 12480cf into rapidsai:branch-23.10 Sep 10, 2023
