Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent DeviceBuffer DeviceMemoryResource premature release #931

Merged
merged 8 commits into from
Jan 5, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions python/rmm/_lib/device_buffer.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,11 @@ from rmm._lib.lib cimport (
from rmm._lib.memory_resource cimport get_current_device_resource


# The DeviceMemoryResource attribute could be released prematurely
# by the gc if the DeviceBuffer is in a reference cycle. Removing
# the tp_clear function with the no_gc_clear decoration prevents that.
# See https://github.com/rapidsai/rmm/pull/931 for details.
@cython.no_gc_clear
cdef class DeviceBuffer:

def __cinit__(self, *,
Expand Down
27 changes: 26 additions & 1 deletion python/rmm/tests/test_rmm.py
Original file line number Diff line number Diff line change
Expand Up @@ -492,7 +492,8 @@ def test_rmm_enable_disable_logging(dtype, nelem, alloc, tmpdir):
def test_mr_devicebuffer_lifetime():
# Test ensures MR/Stream lifetime is longer than DeviceBuffer. Even if all
# references go out of scope
# Create new Pool MR
# It is necessary to verify that it also works when using an upstream :
# here a Pool MR with the current MR as upstream
rmm.mr.set_current_device_resource(
rmm.mr.PoolMemoryResource(rmm.mr.get_current_device_resource())
)
Expand Down Expand Up @@ -691,3 +692,27 @@ def callback(nbytes: int) -> bool:
with pytest.raises(MemoryError):
rmm.DeviceBuffer(size=int(1e11))
assert retried[0]


def test_dev_buf_circle_ref_dealloc():
viclafargue marked this conversation as resolved.
Show resolved Hide resolved
rmm.mr.set_current_device_resource(rmm.mr.CudaMemoryResource())

dbuf1 = rmm.DeviceBuffer(size=1_000_000)

# Make dbuf1 part of a reference cycle:
l1 = [dbuf1]
l1.append(l1)

# due to the reference cycle, the device buffer doesn't actually get
# cleaned up until later, when we invoke `gc.collect()`:
del dbuf1, l1

rmm.mr.set_current_device_resource(rmm.mr.CudaMemoryResource())

# by now, the only remaining reference to the *original* memory
# resource should be in `dbuf1`. However, the cyclic garbage collector
# will eliminate that reference when it clears the object via its
# `tp_clear` method. Later, when `tp_dealloc` attemps to actually
# deallocate `dbuf1` (which needs the MR alive), a segfault occurs.

gc.collect()