
[BUG] C++ benchmarks memory leak #4525

Closed
tfeher opened this issue Jan 26, 2022 · 0 comments · Fixed by #4594
Labels
? - Needs Triage (Need team to review and classify) · bug (Something isn't working)

Comments


tfeher commented Jan 26, 2022

Describe the bug
Memory utilization grows continuously while running some of the C++ benchmarks, eventually leading to an out-of-memory (OOM) error.

Steps/Code to reproduce bug

cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -GNinja
ninja -j 40 install
bench/prims_benchmark

Output

...
DistanceCosineF/10/manual_time                                            25.6 ms         25.7 ms           27
DistanceCosineF/11/manual_time                                            50.0 ms         50.1 ms           14
DistanceCosineF/12/manual_time                                            3.63 ms         3.65 ms          193
terminate called after throwing an instance of 'rmm::out_of_memory'
  what():  std::bad_alloc: out_of_memory: CUDA error at: /opt/conda/envs/rapids/include/rmm/mr/device/cuda_memory_resource.hpp:70: cudaErrorMemoryAllocation out of memory
Aborted (core dumped)

The issue might be specific to particular benchmarks. The following lead to OOM (see the sketch after this list):

  • bench/prims_benchmark --benchmark_filter=Gram*
  • bench/prims_benchmark --benchmark_filter=Distance*

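For illustration only, here is a purely hypothetical sketch (not the actual prims_benchmark code): a Google Benchmark case that allocates device memory inside the timed loop and never frees it shows exactly this pattern, with utilization climbing every iteration until allocation fails.

// Hypothetical leak pattern (illustration only, not the real benchmark code):
// a fresh allocation on every iteration with no matching cudaFree makes GPU
// memory utilization grow until cudaMalloc (or RMM) reports out of memory.
#include <benchmark/benchmark.h>
#include <cuda_runtime.h>
#include <cstddef>

static void LeakyDistanceBench(benchmark::State& state)
{
  for (auto _ : state) {
    float* d_buf = nullptr;
    cudaMalloc(&d_buf, std::size_t{1} << 28);  // 256 MiB per iteration
    // ... distance kernel would run here ...
    // Missing cudaFree(d_buf): the buffer is leaked every iteration.
  }
}
BENCHMARK(LeakyDistanceBench);
BENCHMARK_MAIN();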
Expected behavior
Memory consumption within bounds.

Environment details (please complete the following information):

  • Environment location: Docker rapidsai/rapidsai-core-dev-nightly:22.02-cuda11.5-devel-ubuntu20.04-py3.8
  • Linux Distro/Architecture: Ubuntu 20.04 amd64
  • GPU Model/Driver: V100-SXM2-16GB, 460.32.03
  • CUDA: 11.5
  • Method of cuML install: from source, git hash 4e23f87, using tools in the dev docker image
tfeher added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Jan 26, 2022
rapids-bot pushed a commit that referenced this issue on Feb 24, 2022
Fix #4525 as well as a hard crash in c++ benchmarks due to some recent changes in raft.

Authors:
  - Rory Mitchell (https://github.com/RAMitchell)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4594
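A typical remedy for this class of leak, shown here only as a hedged sketch under assumed names and not necessarily what PR #4594 actually changes, is to hold per-case device buffers in RAII containers such as rmm::device_uvector so they are released when the benchmark case finishes.

// Hedged sketch of a possible fix pattern (assumption, not taken from #4594):
// owning the buffer in an RAII container ties its lifetime to the benchmark
// case, so device memory is returned as soon as the case completes.
#include <benchmark/benchmark.h>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>
#include <cstddef>

static void DistanceBenchFixed(benchmark::State& state)
{
  rmm::cuda_stream_view stream = rmm::cuda_stream_default;
  // Allocate once, outside the timed loop; freed automatically by the destructor.
  rmm::device_uvector<float> d_buf(std::size_t{1} << 26, stream);
  for (auto _ : state) {
    // ... distance kernel launches on `stream` using d_buf.data() ...
  }
}
BENCHMARK(DistanceBenchFixed);
BENCHMARK_MAIN();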
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue on Oct 9, 2023 (same commit as above).