Fix out-of-memory error in compiled-binaryop benchmark #10269

Merged
merged 1 commit into from
Feb 14, 2022

Conversation

davidwendt
Contributor

Fixes an out-of-memory error that occurs when the COMPILED_BINARYOP benchmark in BINARYOP_BENCH is run together with the BINARYOP benchmark. The following minimal command reproduces the error:

benchmarks/BINARYOP_BENCH '--benchmark_filter=COMPILED_BINARYOP|100000000/10'
...
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000000/10/manual_time                              40.4 ms         40.4 ms           17 bytes_per_second=202.953G/s
terminate called after throwing an instance of 'rmm::out_of_memory'
  what():  std::bad_alloc: out_of_memory: RMM failure at:/conda/envs/rapids/include/rmm/mr/device/pool_memory_resource.hpp:192: Maximum pool size exceeded
Aborted (core dumped)

The COMPILED_BINARYOP benchmark uses the TEMPLATED_BENCHMARK_F macro, which causes a new, separate memory pool to be created instead of reusing the one already created by BINARYOP.
This PR reworks the benchmark macros in compiled_binaryop.cpp to avoid TEMPLATED_BENCHMARK_F, allowing the benchmark to share the existing memory pool.
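
To make the effect of a second pool concrete, here is a minimal, self-contained sketch. It is not libcudf or RMM code: FakeDevice, FakePool, separate_pools_fit and shared_pool_fits are hypothetical names invented for illustration, and the sketch assumes only that a pool reserves its memory up front and holds it until it is destroyed. It does not model how TEMPLATED_BENCHMARK_F or rmm::mr::pool_memory_resource actually behave.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a device with a fixed amount of memory.
class FakeDevice {
 public:
  explicit FakeDevice(std::size_t capacity) : capacity_(capacity) {}

  // Throws if the request does not fit in the remaining capacity.
  void reserve(std::size_t bytes)
  {
    if (bytes > capacity_ - reserved_) { throw std::runtime_error("out_of_memory"); }
    reserved_ += bytes;
  }

  void release(std::size_t bytes)
  {
    assert(bytes <= reserved_);
    reserved_ -= bytes;
  }

 private:
  std::size_t capacity_;
  std::size_t reserved_{0};
};

// Hypothetical pool: reserves its full size on construction and
// gives it back only when the pool is destroyed.
class FakePool {
 public:
  FakePool(FakeDevice& dev, std::size_t bytes) : dev_(dev), bytes_(bytes) { dev_.reserve(bytes_); }
  ~FakePool() { dev_.release(bytes_); }
  FakePool(const FakePool&)            = delete;
  FakePool& operator=(const FakePool&) = delete;

 private:
  FakeDevice& dev_;
  std::size_t bytes_;
};

// Each benchmark family creates its own pool while the earlier ones are still alive.
inline bool separate_pools_fit(std::size_t capacity, std::size_t pool_bytes, int families)
{
  FakeDevice dev(capacity);
  std::vector<std::unique_ptr<FakePool>> pools;
  try {
    for (int i = 0; i < families; ++i) {
      pools.push_back(std::make_unique<FakePool>(dev, pool_bytes));
    }
  } catch (const std::runtime_error&) {
    return false;  // a later pool could not be created
  }
  return true;
}

// All benchmark families hold a handle to the same single pool.
inline bool shared_pool_fits(std::size_t capacity, std::size_t pool_bytes, int families)
{
  FakeDevice dev(capacity);
  try {
    auto pool = std::make_shared<FakePool>(dev, pool_bytes);
    std::vector<std::shared_ptr<FakePool>> handles(static_cast<std::size_t>(families), pool);
  } catch (const std::runtime_error&) {
    return false;  // even the single pool did not fit
  }
  return true;
}
```

With a capacity of 100 and a pool size of 60, separate_pools_fit(100, 60, 2) returns false because the second pool cannot be reserved, while shared_pool_fits(100, 60, 2) returns true because only one reservation is ever made.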

Similar to #10258

@davidwendt davidwendt added the bug, 3 - Ready for Review, libcudf, and non-breaking labels Feb 10, 2022
@davidwendt davidwendt requested a review from a team as a code owner February 10, 2022 16:19
@davidwendt davidwendt self-assigned this Feb 10, 2022
codecov bot commented Feb 10, 2022

Codecov Report

Merging #10269 (bbb3282) into branch-22.04 (eb5e3e3) will not change coverage.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           branch-22.04   #10269   +/-   ##
=============================================
  Coverage         10.43%   10.43%           
=============================================
  Files               122      122           
  Lines             20583    20583           
=============================================
  Hits               2147     2147           
  Misses            18436    18436           

Contributor

@mythrocks mythrocks left a comment

Thank you. TIL.

@karthikeyann
Contributor

I had also fixed the same bug locally with the same fix.
Thanks for doing this.

Regarding "TEMPLATED_BENCHMARK_F macro which causes a new separate memory pool": I am still unsure how this is caused.

Contributor

@karthikeyann karthikeyann left a comment

LGTM 👍

@karthikeyann
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 463266f into rapidsai:branch-22.04 Feb 14, 2022
@davidwendt davidwendt deleted the binaryop-benchmark-oom branch February 14, 2022 15:01