
[BUG] C++ benchmarks memory leak #4525

Closed
tfeher opened this issue Jan 26, 2022 · 0 comments · Fixed by #4594
Labels
? - Needs Triage (Need team to review and classify) · bug (Something isn't working)

Comments


tfeher commented Jan 26, 2022

Describe the bug
Memory utilization grows continuously while running some of the C++ benchmarks, eventually leading to an out-of-memory (OOM) error.

Steps/Code to reproduce bug

cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -GNinja
ninja -j 40 install
bench/prims_benchmark

Output

...
DistanceCosineF/10/manual_time                                            25.6 ms         25.7 ms           27
DistanceCosineF/11/manual_time                                            50.0 ms         50.1 ms           14
DistanceCosineF/12/manual_time                                            3.63 ms         3.65 ms          193
terminate called after throwing an instance of 'rmm::out_of_memory'
  what():  std::bad_alloc: out_of_memory: CUDA error at: /opt/conda/envs/rapids/include/rmm/mr/device/cuda_memory_resource.hpp:70: cudaErrorMemoryAllocation out of memory
Aborted (core dumped)

The issue might be specific to particular benchmarks. The following lead to OOM (see the sketch after this list):

  • bench/prims_benchmark --benchmark_filter=Gram*
  • bench/prims_benchmark --benchmark_filter=Distance*

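For illustration only, here is a purely hypothetical sketch (not the actual prims_benchmark code): a Google Benchmark case that allocates device memory inside the timed loop and never frees it shows exactly this pattern, with utilization climbing every iteration until allocation fails.

// Hypothetical leak pattern (illustration only, not the real benchmark code):
// a fresh allocation on every iteration with no matching cudaFree makes GPU
// memory utilization grow until cudaMalloc (or RMM) reports out of memory.
#include <benchmark/benchmark.h>
#include <cuda_runtime.h>
#include <cstddef>

static void LeakyDistanceBench(benchmark::State& state)
{
  for (auto _ : state) {
    float* d_buf = nullptr;
    cudaMalloc(&d_buf, std::size_t{1} << 28);  // 256 MiB per iteration
    // ... distance kernel would run here ...
    // Missing cudaFree(d_buf): the buffer is leaked every iteration.
  }
}
BENCHMARK(LeakyDistanceBench);
BENCHMARK_MAIN();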
Expected behavior
Memory consumption within bounds.

Environment details (please complete the following information):

  • Environment location: Docker rapidsai/rapidsai-core-dev-nightly:22.02-cuda11.5-devel-ubuntu20.04-py3.8
  • Linux Distro/Architecture: Ubuntu 20.04 amd64
  • GPU Model/Driver: V100-SXM2-16GB, 460.32.03
  • CUDA: 11.5
  • Method of cuML install: from source, git hash 4e23f87, using tools in the dev docker image
tfeher added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Jan 26, 2022
rapids-bot pushed a commit that referenced this issue on Feb 24, 2022
Fix #4525 as well as a hard crash in c++ benchmarks due to some recent changes in raft.

Authors:
  - Rory Mitchell (https://github.com/RAMitchell)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4594
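A typical remedy for this class of leak, shown here only as a hedged sketch under assumed names and not necessarily what PR #4594 actually changes, is to hold per-case device buffers in RAII containers such as rmm::device_uvector so they are released when the benchmark case finishes.

// Hedged sketch of a possible fix pattern (assumption, not taken from #4594):
// owning the buffer in an RAII container ties its lifetime to the benchmark
// case, so device memory is returned as soon as the case completes.
#include <benchmark/benchmark.h>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>
#include <cstddef>

static void DistanceBenchFixed(benchmark::State& state)
{
  rmm::cuda_stream_view stream = rmm::cuda_stream_default;
  // Allocate once, outside the timed loop; freed automatically by the destructor.
  rmm::device_uvector<float> d_buf(std::size_t{1} << 26, stream);
  for (auto _ : state) {
    // ... distance kernel launches on `stream` using d_buf.data() ...
  }
}
BENCHMARK(DistanceBenchFixed);
BENCHMARK_MAIN();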
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue on Oct 9, 2023 (same commit as above).