[BUG] `limiting_resource_adaptor` isn't actually thread safe #868

jrhemstad · 2021-09-13T14:35:32Z

Describe the bug

The limiting_resource_adaptor wraps an upstream and limits the amount of memory allocated from an upstream. If that limit is exceeded, it throws an rmm::bad_alloc.

An attempt was made at making this adaptor thread safe by using an atomic for the internal allocated_bytes_ used to track how much memory has been allocated from the upstream. However, this is not sufficient to guarantee the limit is enforced. This is the logic of allocating from the upstream:

  void* do_allocate(std::size_t bytes, cuda_stream_view stream) override
  {
    void* p = nullptr;


    std::size_t proposed_size = rmm::detail::align_up(bytes, allocation_alignment_);
    if (proposed_size + allocated_bytes_ <= allocation_limit_) {
      p = upstream_->allocate(bytes, stream);
      allocated_bytes_ += proposed_size;
    } else {
      throw rmm::bad_alloc{"Exceeded memory limit"};
    }


    return p;
  }

This line:

if (proposed_size + allocated_bytes_ <= allocation_limit_)

is equivalent to:

auto const tmp = allocated_bytes_.load(memory_order_seq_cst)
if(proposed_size + tmp <= allocation_limit_)

Imagine at time t0 that allocation_limit_ is 2GB and allocated_bytes_ is 1GB.

At t0, two threads i and j attempt to allocate 1GB. They concurrently load allocated_bytes_ and see it is 1GB such that the check if (proposed_size + allocated_bytes_ <= allocation_limit_) will pass.

Both i and j will then attempt to allocate 1GB from the upstream. If both succeed (depending on the state of the upstream), the total allocated from the upstream will be 3GB, surpassing the 2GB limit.

Solution

A trivial fix is to just declare this adaptor is not threadsafe and require wrapping with the thread_safe_adaptor .

Similarly trivial is we could just add a lock internally to limiting_resource_adaptor.

A bit more involved, but still pretty easy is to just improve the allocation logic with the atomic.

  void* do_allocate(std::size_t bytes, cuda_stream_view stream) override
  {
    auto const proposed_size = rmm::detail::align_up(bytes, allocation_alignment_);
    auto const old = allocated_bytes_.fetch_add(proposed_size);
    if (old + proposed_size <= allocation_limit_) {
     return upstream_->allocate(bytes, stream);
    }
    allocated_bytes_.fetch_sub(proposed_size);
    throw rmm::bad_alloc{"Exceeded memory limit"};
  }

  void do_deallocate(void* p, std::size_t bytes, cuda_stream_view stream) override
  {
    std::size_t allocated_size = rmm::detail::align_up(bytes, allocation_alignment_);
    upstream_->deallocate(p, bytes, stream);
    allocated_bytes.fetch_sub(allocated_size);
  }

In this "eager" algorithm, a thread will first atomically increment the allocated_bytes_ tracking variable before allocating from the upstream. In this way, two threads can never simultaneously observe a value of allocated_bytes_ that would allow them both to succeed even if the sum of their allocations would exceed the limit. If the old + proposed_size exceeds the limit, then you revert the previous fetch_add with a fetch_sub.

The text was updated successfully, but these errors were encountered:

abellina · 2021-09-13T15:15:06Z

@rongou fyi

rongou · 2021-09-13T17:47:02Z

I can take a look.

Fixes #868 Also fixed some clang-tidy warnings. Authors: - Rong Ou (https://github.com/rongou) Approvers: - Alessandro Bellina (https://github.com/abellina) - Jake Hemstad (https://github.com/jrhemstad) - Mike Wilson (https://github.com/hyperbolic2346) - Mark Harris (https://github.com/harrism) URL: #869

jrhemstad added bug Something isn't working good first issue Good for newcomers labels Sep 13, 2021

abellina mentioned this issue Sep 13, 2021

support CUDA async memory resource in JNI rapidsai/cudf#9201

Merged

rongou self-assigned this Sep 13, 2021

rongou mentioned this issue Sep 13, 2021

fix race condition in limiting resource adapter #869

Merged

rapids-bot bot closed this as completed in #869 Sep 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] `limiting_resource_adaptor` isn't actually thread safe #868

[BUG] `limiting_resource_adaptor` isn't actually thread safe #868

jrhemstad commented Sep 13, 2021 •

edited

Loading

abellina commented Sep 13, 2021

rongou commented Sep 13, 2021

[BUG] limiting_resource_adaptor isn't actually thread safe #868

[BUG] limiting_resource_adaptor isn't actually thread safe #868

Comments

jrhemstad commented Sep 13, 2021 • edited Loading

abellina commented Sep 13, 2021

rongou commented Sep 13, 2021

[BUG] `limiting_resource_adaptor` isn't actually thread safe #868

[BUG] `limiting_resource_adaptor` isn't actually thread safe #868

jrhemstad commented Sep 13, 2021 •

edited

Loading