
[BUG] raft::allocate bypasses RMM #308

Closed
harrism opened this issue Aug 3, 2021 · 15 comments
Labels
bug Something isn't working inactive-30d inactive-90d

Comments

harrism (Member) commented Aug 3, 2021

Describe the bug

/** cuda malloc */
template <typename Type>
void allocate(Type*& ptr, size_t len, bool setZero = false) {
  CUDA_CHECK(cudaMalloc((void**)&ptr, sizeof(Type) * len));
  if (setZero) CUDA_CHECK(cudaMemset(ptr, 0, sizeof(Type) * len));
}

Surprised to see that this quite commonly called function (in RAFT and cuML) calls directly into cudaMalloc, making memory pool use impossible.

harrism added the bug (Something isn't working) label on Aug 3, 2021
dantegd (Member) commented Aug 3, 2021

Not sure where you found that allocate; indeed, it shouldn't be around unless it's perhaps for a test. But in general, the allocate that is used by RAFT (and cuML) is here:

void* ptr = rmm::mr::get_current_device_resource()->allocate(n, stream);

(which is being replaced to use RMM directly, but it works with the pool allocator currently AFAIK), so if something is using that allocate you mention, it is a bug and should be removed.

Cc @viclafargue: since you're doing the RAFT RMM refactor, it'd be good to double-check that nothing in the main code is using this cudaMalloc.
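For context, the reason that path works with the pool allocator is that get_current_device_resource() returns whichever resource was most recently installed for the device. A minimal sketch, assuming the RMM API of that era (the size 256 and stream are illustrative):

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

int main()
{
  // Upstream resource that calls cudaMalloc/cudaFree directly.
  rmm::mr::cuda_memory_resource upstream;

  // Pool that sub-allocates from large slabs obtained upstream.
  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool{&upstream};

  // Install the pool as the current device resource.
  rmm::mr::set_current_device_resource(&pool);

  // This allocation is now served from the pool, not a fresh cudaMalloc.
  auto* mr = rmm::mr::get_current_device_resource();
  void* p  = mr->allocate(256, rmm::cuda_stream_default);
  mr->deallocate(p, 256, rmm::cuda_stream_default);
}
```

Anything that bypasses get_current_device_resource() (such as a raw cudaMalloc) is invisible to this configuration, which is exactly the bug being reported.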

viclafargue (Contributor) commented:

Yes, with the RAFT refactor, raft::allocate should now use RMM:

template <typename Type>
void allocate(Type*& ptr, size_t len, cudaStream_t stream, bool setZero = false) {
  size_t size = len * sizeof(Type);
  ptr = (Type*)rmm::mr::get_current_device_resource()->allocate(size, stream);
  if (setZero) CUDA_CHECK(cudaMemset((void*)ptr, 0, size));
  std::lock_guard<std::mutex> _(mutex_);
  allocations[ptr] = size;
}

The new design of raft::allocate, raft::deallocate, and raft::deallocate_all is still in progress though.

harrism (Member, Author) commented Aug 4, 2021

@dantegd it's in <raft/cudart_utils.h>

And it's called all over the place (both RAFT and cuML), not just in tests. Just search the codebase for raft::allocate( and you will see.

harrism (Member, Author) commented Aug 4, 2021

Moreover, very little code should need to call memory_resource::allocate: only thrust allocators. Everything else should use containers: device_buffer for untyped or byte data, device_uvector and device_vector for typed data. No raw pointers. Searching the codebase for the string allocate( should turn up very few results.
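The container approach above can be sketched as follows (a minimal illustration assuming RMM's public headers; the function, sizes, and names are hypothetical):

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>
#include <rmm/device_uvector.hpp>

void example(rmm::cuda_stream_view stream)
{
  // Untyped byte storage: RAII-owned, allocated stream-ordered from the
  // current device resource, freed automatically on destruction.
  rmm::device_buffer bytes{256, stream};

  // Typed, uninitialized device storage for 100 ints; same ownership model.
  rmm::device_uvector<int> ints{100, stream};

  // No cudaMalloc/cudaFree, no mr->allocate()/deallocate(), and no raw
  // owning pointers anywhere in user code.
}
```

Because the containers capture the memory resource and stream at construction, swapping in a pool or tracking resource requires no changes to the code that uses them.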

As an example, if you search for this string in libcudf it comes up 10 times, 8 of which are inside allocators. And I think all 10 will be gone from libcudf in the near future.

dantegd (Member) commented Aug 9, 2021

@harrism I fully agree, though I commented that it is mostly in tests since I did the search for raft::allocate( and found it in a lot of tests. I couldn't find it in any non-test files in RAFT, and only in 2 non-test files in cuML:

cpp/src/ml_mg_utils.cuh
cpp/src_prims/metrics/scores.cuh

I was just curious whether I had missed some place else.

Regardless, the PR from @viclafargue should solve the bypassing issue. And yes, the whole codebase is moving to use containers directly and has been for a while; I can't find a tracking issue on a quick search, though. @divyegala, would you happen to know if we have an issue for the use of containers?

divyegala (Member) commented:

@dantegd @harrism I don't believe we have an issue in cuML for directly using containers. I am not against using raft::allocate as long as it's only in tests.

harrism (Member, Author) commented Aug 10, 2021

OK, I see now it's mostly in tests. I do see it in src_prims/metrics/scores.cuh and src/ml_mg_utils.cuh.

That said, why is it OK for test code to be lower quality than core library code? If the tests allocate most memory using a raw byte allocator, that means they are using raw pointers. No Raw Pointers is an important goal to achieve, so I would add a corollary: No Raw Allocation.

Second, bypassing RMM in tests is a bad idea, as it means you can't easily switch the underlying memory resource used in the tests to help isolate behavior happening in real apps where RMM MRs are used.

viclafargue (Contributor) commented:

@harrism I could remove it from src, src_prims and bench. I agree that it's probably better to avoid using raw allocation in tests as well. However, we would like to merge the changes in RMM, RAFT, cuML, cuGraph and cuHornet as soon as possible to avoid possible future conflicts with PRs people are currently working on. It could be interesting to have a follow-up PR to remove any call to raft::allocate in testing as well.

> Second, bypassing RMM in tests is a bad idea, as it means you can't easily switch the underlying memory resource used in the tests to help isolate behavior happening in real apps where RMM MRs are used.

raft::allocate used in tests should not bypass RMM anymore. It now calls rmm::mr::get_current_device_resource(), which should return a pointer to an RMM resource that follows the RMM configuration. Please correct me if I'm missing something, though. However, again, it might indeed be better to remove all calls to raft::allocate in the end, maybe in a follow-up PR.

dantegd (Member) commented Sep 7, 2021

@viclafargue this should now have been dealt with by #286, correct? Should we update this issue or open a new one about removing raft::allocate entirely?

viclafargue (Contributor) commented:

raft::allocate should indeed now use RMM thanks to #286. I just opened two new issues (#323 and rapidsai/cuml#4197) to remove any calls to raft::allocate in RAFT and cuML.

harrism (Member, Author) commented Oct 6, 2021

I'm reopening this because there are still calls to rmm::mr::get_current_device_resource()->allocate(size, stream); in RAFT. These will cause a problem in the near future as we transition to replacing the rmm::device_memory_resource interface with the proposed cuda::memory_resource and cuda::stream_ordered_memory_resource interfaces in libcu++. Those interfaces will somewhat change the API: for example, there will be separate allocate() and allocate_async() functions, the latter accepting a stream. There will also be an optional alignment parameter.

This is a great example of why RMM clients should use rmm::device_buffer for raw byte allocation rather than direct allocation using mr::allocate(). "No raw pointers."
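The suggested replacement can be sketched as follows (a hypothetical helper; assumes RMM's public headers, and the names are illustrative):

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>

// Instead of:
//   void* p = rmm::mr::get_current_device_resource()->allocate(size, stream);
// (which requires a matching manual deallocate with the same size and
// resource), let a container own the bytes:
rmm::device_buffer make_scratch(std::size_t size, rmm::cuda_stream_view stream)
{
  // Allocated stream-ordered from the current device resource; freed
  // automatically on destruction via the resource it was allocated from.
  return rmm::device_buffer{size, stream};
}
```

Because the container, not the caller, talks to the memory resource, a future change to the allocate()/allocate_async() split would be absorbed inside device_buffer rather than at every call site.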

harrism reopened this on Oct 6, 2021
harrism (Member, Author) commented Oct 6, 2021

Sorry, just saw there are new issues opened. Do they cover all uses of MR->allocate()?

viclafargue (Contributor) commented:

Well, replacing the raft::allocate calls (see #323) would allow removing the function from cpp/include/raft/cudart_utils.h.
Then, if my observations are right, the MR->allocate() pattern would only remain in these two files: cpp/include/raft/mr/buffer_base.hpp and cpp/include/raft/mr/device/allocator.hpp.

github-actions (bot) commented:

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions (bot) commented:

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

rapids-bot bot pushed a commit that referenced this issue Mar 18, 2022
Answers #308.
Requires the appropriate changes in `cuML` and `cuGraph` before merging.

Authors:
  - Victor Lafargue (https://github.com/viclafargue)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #400
cjnolet closed this as completed on Oct 12, 2022
5 participants