[FEA] Interface support to reserve an address space without actually allocating it #6

Open
teju85 opened this issue May 6, 2020 · 14 comments

Comments

@teju85
Member

teju85 commented May 6, 2020

Describe the solution you'd like
@seunghwak makes a very good point here about supporting memory reservation (aka over-subscription) using CUDA's virtual memory management APIs. I believe this would be a good improvement to our existing Allocator, device_buffer and host_buffer interfaces. I'm filing this issue so that this feature request is not lost.

Additional context
Ref: https://devblogs.nvidia.com/introducing-low-level-gpu-virtual-memory-management/

@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@divyegala
Member

@teju85 @seunghwak is this something that will be solved by using RMM directly?

@teju85
Member Author

teju85 commented Jul 19, 2021

Yes, I think we should be able to use RMM's managed_memory_resource for this purpose. But I'd like to hear from @seunghwak if this was what he had in mind or something else.

@seunghwak
Contributor

My understanding is that this issue is about reserving an address space (https://developer.nvidia.com/blog/introducing-low-level-gpu-virtual-memory-management/) rather than using managed memory.

This address space reservation feature is mainly to avoid reallocation. If we resize a vector, 1) it first allocates the memory block having the new size, 2) copies the old data to the new block, and 3) deallocates the old block. This address reservation feature allows us to replace 1), 2), 3) by just allocating a (new size - old size) block and mapping this to the reserved address space.
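To illustrate the reserve-then-map idea, here is a minimal host-side sketch. It is only an analogy: it uses POSIX `mmap` and `mprotect` on Linux rather than the CUDA VMM driver calls (`cuMemAddressReserve`, `cuMemCreate`, `cuMemMap`, `cuMemSetAccess`), and the class name `reserved_buffer` is hypothetical, not part of RAFT or RMM. Growing the buffer makes more of the reserved range usable without moving the existing data.

```cpp
// Host-side analogy only: POSIX mmap/mprotect stand in for the CUDA VMM calls.
// Assumes a Linux/POSIX system.
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical illustration class; not part of RAFT or RMM.
class reserved_buffer {
 public:
  explicit reserved_buffer(std::size_t max_bytes)
    : page_(static_cast<std::size_t>(sysconf(_SC_PAGESIZE))),
      capacity_(round_up(max_bytes)) {
    // Reserve address space only: PROT_NONE pages cannot be read or written yet.
    void* p = mmap(nullptr, capacity_, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) throw std::runtime_error("mmap reserve failed");
    base_ = static_cast<char*>(p);
  }
  ~reserved_buffer() { munmap(base_, capacity_); }
  reserved_buffer(const reserved_buffer&) = delete;
  reserved_buffer& operator=(const reserved_buffer&) = delete;

  // Grow in place: only the pages beyond the committed prefix are newly enabled.
  // Existing data keeps its address, so no copy and no second block is needed.
  void grow(std::size_t new_bytes) {
    const std::size_t target = round_up(new_bytes);
    if (target > capacity_) throw std::length_error("exceeds reservation");
    if (target > committed_) {
      if (mprotect(base_ + committed_, target - committed_,
                   PROT_READ | PROT_WRITE) != 0)
        throw std::runtime_error("mprotect commit failed");
      committed_ = target;
    }
  }

  char* data() const { return base_; }
  std::size_t committed() const { return committed_; }

 private:
  std::size_t round_up(std::size_t n) const { return (n + page_ - 1) / page_ * page_; }

  std::size_t page_;
  std::size_t capacity_;
  char* base_ = nullptr;
  std::size_t committed_ = 0;
};
```

The reservation size plays the role of the upper bound on the vector's size; a device-side version would additionally have to respect the allocation granularity reported by the driver.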

Managed memory allows host memory to serve as a lower-level buffer (larger but slower) for device memory.

So these are two different things. AFAIK, there was some brief discussion about supporting address space reservation in RMM, but it has not been implemented there yet.

@teju85
Member Author

teju85 commented Jul 19, 2021

I now remember this discussion. This needs support for CUDA's address-reservation API, which I don't see in RMM. @harrism and/or @jrhemstad, is this being planned for RMM?

@jrhemstad

Use of the CUDA VMM APIs would be an implementation detail of a memory resource implementation. There aren't any plans to expose explicit RMM APIs for CUDA VMM.

@teju85
Member Author

teju85 commented Jul 20, 2021

@jrhemstad if there's a use case like the one Seunghwa discussed above, is there a chance of getting this feature added to the RMM roadmap?

@harrism
Member

harrism commented Jul 20, 2021

OK, so the specific use case is for resizing buffers, and the reasoning is to avoid copies.

The trouble with this is that RMM implements the memory_resource interface, which originates from std. There is no reallocate in memory_resource (or anywhere in C++, for that matter; only C has realloc). So adding a feature like this would have to be downstream of the MR interface, which is not attractive. For this reason it has only been discussed for RMM; no decision has been made on whether to implement it.
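As a rough illustration of that point, the sketch below uses `std::pmr::memory_resource` from C++17 as a host-side stand-in for an RMM memory resource. The interface offers only allocate and deallocate, so a growing container has to request a new block and release the old one. The names `tracking_resource`, `growth_stats` and `grow_once` are hypothetical helpers for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical resource that counts calls and tracks the peak number of live bytes.
class tracking_resource : public std::pmr::memory_resource {
 public:
  std::size_t allocations = 0;
  std::size_t deallocations = 0;
  std::size_t live_bytes = 0;
  std::size_t peak_bytes = 0;

 private:
  // The only hooks the interface offers: allocate, deallocate, is_equal.
  void* do_allocate(std::size_t bytes, std::size_t alignment) override {
    ++allocations;
    live_bytes += bytes;
    if (live_bytes > peak_bytes) peak_bytes = live_bytes;
    return std::pmr::new_delete_resource()->allocate(bytes, alignment);
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
    ++deallocations;
    live_bytes -= bytes;
    std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }
};

struct growth_stats {
  std::size_t allocations;
  std::size_t deallocations;
  std::size_t peak_bytes;
};

// Grow a vector from old_n to new_n elements and report what the resource saw.
inline growth_stats grow_once(std::size_t old_n, std::size_t new_n) {
  tracking_resource mr;
  growth_stats stats{};
  {
    std::pmr::vector<int> v(&mr);
    v.reserve(old_n);  // first allocation
    v.resize(old_n);
    v.reserve(new_n);  // growth: allocate new block, copy, deallocate old block
    stats = {mr.allocations, mr.deallocations, mr.peak_bytes};
  }
  return stats;
}
```

With `old_n < new_n`, the resource sees two allocations and one deallocation, and the old and new blocks are both live while the elements are copied.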

So let me ask: is reallocation definitely a bottleneck?

@teju85
Member Author

teju85 commented Jul 20, 2021

Fair enough, @harrism. So far I haven't seen this being the bottleneck. Maybe @seunghwak had a use case in mind when he suggested this feature?

@seunghwak
Contributor

So, the biggest benefit of this is memory footprint. We can clearly live without this, but having this will allow more memory footprint optimization for us (and we can handle bigger graphs within the gpu memory limit).

To explain this in more detail,

Without this feature, a resize needs old_size + new_size of memory, while with address space reservation we need only max(old_size, new_size). Say we're filtering multiple blocks; we do something like:

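// Start empty; grow by one block's worth of candidates per iteration.
rmm::device_uvector<int> filtered_elements(0, stream);
size_t num_inserted = 0;
for (size_t i = 0; i < num_blocks; ++i) {
  // Make room for the case where every element of block i passes the filter.
  filtered_elements.resize(num_inserted + block_sizes[i], stream);
  filter(...); // num_inserted gets updated...
}
// Shrink to the number of elements actually kept.
filtered_elements.resize(num_inserted, stream);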

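So, with address reservation, we just need to reserve an address space of sum(block_sizes[i]) elements, and the memory actually in use never exceeds that. Without address reservation, this code requires an actual allocation of sum(block_sizes[i], i = 0 to num_blocks - 2) + sum(block_sizes[i], i = 0 to num_blocks - 1) in the worst case (if 100% of the elements pass filtering), because the old and new buffers coexist during the last growing resize.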
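The worst-case arithmetic can be checked with a small model. This is a sketch under two assumptions that are not stated in the thread: every growing resize reallocates (no spare capacity is kept), and sizes are counted in elements rather than bytes. The names `peak_footprint` and `worst_case_peak` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct peak_footprint {
  std::size_t with_copy;         // old and new buffers coexist during a resize
  std::size_t with_reservation;  // buffer grows in place inside a reserved range
};

// Worst case: 100% of the elements in every block pass the filter.
inline peak_footprint worst_case_peak(const std::vector<std::size_t>& block_sizes) {
  std::size_t size = 0;  // elements currently held
  peak_footprint peak{0, 0};
  for (std::size_t b : block_sizes) {
    const std::size_t new_size = size + b;
    // Copy-based resize: old_size + new_size must be allocated at once.
    peak.with_copy = std::max(peak.with_copy, size + new_size);
    // Reservation-based resize: only max(old_size, new_size) is ever mapped.
    peak.with_reservation = std::max(peak.with_reservation, std::max(size, new_size));
    size = new_size;
  }
  return peak;
}
```

For block sizes {10, 20, 30}, the copy-based peak is (10 + 20) + (10 + 20 + 30) = 90 elements, while the reservation-based peak is 60 elements.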

Avoiding the copy is a secondary benefit (less important, as the copy is pretty fast).

@harrism
Member

harrism commented Jul 20, 2021

Address reservation can be used (as Jake pointed out) as an implementation detail of the vector to reduce memory overhead of resizing. This does not require an external interface for address reservation.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

rapids-bot bot pushed a commit that referenced this issue Feb 15, 2024
Demangle the error stack trace provided by GCC.
Example output:
```bash
RAFT failure at file=/workspace/raft/cpp/bench/ann/src/raft/raft_ann_bench_utils.h line=127: Ooops!
Obtained 16 stack frames
#1 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::logic_error::logic_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x5e [0x7fb20acce45e]
#2 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::bench::ann::configured_raft_resources::stream_wait(CUstream_st*) const +0x2e3 [0x7fb20acd0ac3]
#3 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::bench::ann::RaftIvfPQ<float, long>::search(float const*, int, int, unsigned long*, float*, CUstream_st*) const +0x63e [0x7fb20acd44fe]
#4 in ./cpp/build/ANN_BENCH: void raft::bench::ann::bench_search<float>(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective) +0xf76 [0x55853859f586]
#5 in ./cpp/build/ANN_BENCH: benchmark::internal::LambdaBenchmark<benchmark::RegisterBenchmark<void (&)(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective), raft::bench::ann::Configuration::Index&, unsigned long&, std::shared_ptr<raft::bench::ann::Dataset<float> const>&, raft::bench::ann::Objective&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void (&)(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective), raft::bench::ann::Configuration::Index&, unsigned long&, std::shared_ptr<raft::bench::ann::Dataset<float> const>&, raft::bench::ann::Objective&)::{lambda(benchmark::State&)#1}>::Run(benchmark::State&) +0x84 [0x558538548f14]
#6 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkInstance::Run(long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const +0x168 [0x5585385d6498]
#7 in ./cpp/build/ANN_BENCH(+0x149108) [0x5585385b7108]
#8 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkRunner::DoNIterations() +0x34f [0x5585385b8c7f]
#9 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkRunner::DoOneRepetition() +0x119 [0x5585385b99b9]
#10 in ./cpp/build/ANN_BENCH(+0x13afdd) [0x5585385a8fdd]
#11 in ./cpp/build/ANN_BENCH: benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) +0x58e [0x5585385aa8fe]
#12 in ./cpp/build/ANN_BENCH: benchmark::RunSpecifiedBenchmarks() +0x6a [0x5585385aaada]
#13 in ./cpp/build/ANN_BENCH: raft::bench::ann::run_main(int, char**) +0x11ed [0x5585385980cd]
#14 in /lib/x86_64-linux-gnu/libc.so.6(+0x28150) [0x7fb213e28150]
#15 in /lib/x86_64-linux-gnu/libc.so.6: __libc_start_main +0x89 [0x7fb213e28209]
#16 in ./cpp/build/ANN_BENCH(+0xbfcef) [0x55853852dcef]


```

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #2188