Skip to content

Commit

Permalink
Fix for DeviceBuffer (#931)
Browse files Browse the repository at this point in the history
`DeviceBuffer` deallocation sometimes fails with a segfault. I wasn't able to create a minimal reproducer for now. However, a possible source for the problem is the way Cython releases the memory. Indeed the DeviceBuffer's `DeviceMemoryResource` appears to be sometimes released before the `device_buffer`. This causes a segfault in the the `device_buffer` deconstructor as the device memory resource pointer is set to null.

The code of this PR seems to fix the issue. It makes use of a holder class that seem to allow proper ordering of the class members destructions.

Authors:
  - Victor Lafargue (https://github.com/viclafargue)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: #931
  • Loading branch information
viclafargue authored Jan 5, 2022
1 parent 8c5ed6a commit 5a239d2
Show file tree
Hide file tree
Showing 2 changed files with 31 additions and 1 deletion.
5 changes: 5 additions & 0 deletions python/rmm/_lib/device_buffer.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,11 @@ from rmm._lib.lib cimport (
from rmm._lib.memory_resource cimport get_current_device_resource


# The DeviceMemoryResource attribute could be released prematurely
# by the gc if the DeviceBuffer is in a reference cycle. Removing
# the tp_clear function with the no_gc_clear decoration prevents that.
# See https://github.com/rapidsai/rmm/pull/931 for details.
@cython.no_gc_clear
cdef class DeviceBuffer:

def __cinit__(self, *,
Expand Down
27 changes: 26 additions & 1 deletion python/rmm/tests/test_rmm.py
Original file line number Diff line number Diff line change
Expand Up @@ -492,7 +492,8 @@ def test_rmm_enable_disable_logging(dtype, nelem, alloc, tmpdir):
def test_mr_devicebuffer_lifetime():
# Test ensures MR/Stream lifetime is longer than DeviceBuffer. Even if all
# references go out of scope
# Create new Pool MR
# It is necessary to verify that it also works when using an upstream :
# here a Pool MR with the current MR as upstream
rmm.mr.set_current_device_resource(
rmm.mr.PoolMemoryResource(rmm.mr.get_current_device_resource())
)
Expand Down Expand Up @@ -691,3 +692,27 @@ def callback(nbytes: int) -> bool:
with pytest.raises(MemoryError):
rmm.DeviceBuffer(size=int(1e11))
assert retried[0]


def test_dev_buf_circle_ref_dealloc():
rmm.mr.set_current_device_resource(rmm.mr.CudaMemoryResource())

dbuf1 = rmm.DeviceBuffer(size=1_000_000)

# Make dbuf1 part of a reference cycle:
l1 = [dbuf1]
l1.append(l1)

# due to the reference cycle, the device buffer doesn't actually get
# cleaned up until later, when we invoke `gc.collect()`:
del dbuf1, l1

rmm.mr.set_current_device_resource(rmm.mr.CudaMemoryResource())

# by now, the only remaining reference to the *original* memory
# resource should be in `dbuf1`. However, the cyclic garbage collector
# will eliminate that reference when it clears the object via its
# `tp_clear` method. Later, when `tp_dealloc` attemps to actually
# deallocate `dbuf1` (which needs the MR alive), a segfault occurs.

gc.collect()

0 comments on commit 5a239d2

Please sign in to comment.